LINQ to XML
Hi all, and welcome to my tutorial on LINQ to XML. In this tutorial, we shall cover creating, reading and generally querying .xml documents using LINQ.
Why is this important to learn about?
I believe this part of LINQ is very important to learn for a number of reasons:
1) XML is a widely used, platform-independent way to represent complex information and convey meaning. It is used right across the computing industry, so knowing how to read and write XML easily is important.
2) If you have used other libraries in C# to work with XML, you will be amazed at how easy and logical LINQ to XML is. It can dramatically cut down the number of lines of code required.
3) It removes the horrible, laborious, error-prone code associated with the old DOM API.
4) LINQ is a very elegant solution and is .NET specific. It is one of the compelling advantages of using .NET.
Definitions of terms used.
Functional Construction: the ability to create an XML tree in a single statement.
Note: All examples were created using Visual Studio 2010, targeting the .NET Framework 4.0. We'll do our best to point out anything that might not work in older versions.
Introduction
A few points before we get started:
1) You should have an understanding of basic LINQ queries and the concepts surrounding them before you start this tutorial.
2) I have written the examples in this tutorial in isolation, one at a time, in a ConsoleApplication. If you try to run all the queries at once, you may get name clashes and unexpected results.
3) For this tutorial, make sure you have the following using statements (System is needed for the Random and Console classes used later on):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
4) Here is the basic .xml document (called People.xml) we shall be working with in this tutorial:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!--Jason's xml-->
<People>
  <Person id="1">
    <Name>Joe</Name>
    <Age>35</Age>
    <Job>Manager</Job>
  </Person>
  <Person id="2">
    <Name>Jason</Name>
    <Age>18</Age>
    <Job>Software Engineer</Job>
  </Person>
  <Person id="3">
    <Name>Lisa</Name>
    <Age>53</Age>
    <Job>Bakery Owner</Job>
  </Person>
  <Person id="4">
    <Name>Mary</Name>
    <Age>90</Age>
    <Job>Nurse</Job>
  </Person>
</People>
Creating .xml Documents
Creating XML Documents from scratch
First things first: in LINQ to XML, each type of node in an XML document (element, comment, attribute, CDATA section etc.) is mapped to a class. Generally, the name of the class for a specific type of node follows this pattern:
XNameOfNodeType
So, for example, the class used to represent elements is the XElement class, the class used to represent comments is the XComment class, and so on. We use instances of these classes to create XML documents.
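To make the naming pattern concrete, here is a minimal sketch that creates one node of each of four common types and prints it. The class name NodeTypesSketch and the sample values are just assumptions for illustration.

```csharp
using System;
using System.Xml.Linq;

class NodeTypesSketch
{
    static void Main()
    {
        // Each kind of node has its own class in System.Xml.Linq.
        XElement element = new XElement("Name", "Joe");    // an element
        XAttribute attribute = new XAttribute("id", 1);    // an attribute
        XComment comment = new XComment("a comment");      // a comment
        XCData cdata = new XCData("<not parsed>");         // a CDATA section

        // ToString() on each object gives its XML text.
        Console.WriteLine(element);
        Console.WriteLine(attribute);
        Console.WriteLine(comment);
        Console.WriteLine(cdata);
    }
}
```

Printing the objects is a handy way to check what a node looks like without saving a whole document.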
NOTE: A link to the documentation of all the classes in the System.Xml.Linq namespace is given at the end of the tutorial.
We shall begin by creating that document from scratch using Functional Construction (see the ‘Definitions of terms used’ section above).
//create xml document from scratch
XDocument document = new XDocument(
    new XDeclaration("1.0", "utf-8", "yes"),
    new XComment("Jason's xml"),
    new XElement("People",
        new XElement("Person", new XAttribute("id", 1),
            new XElement("Name", "Joe"),
            new XElement("Age", 35),
            new XElement("Job", "Manager")
        ),
        new XElement("Person", new XAttribute("id", 2),
            new XElement("Name", "Jason"),
            new XElement("Age", 18),
            new XElement("Job", "Software Engineer")
        ),
        new XElement("Person", new XAttribute("id", 3),
            new XElement("Name", "Lisa"),
            new XElement("Age", 53),
            new XElement("Job", "Bakery Owner")
        ),
        new XElement("Person", new XAttribute("id", 4),
            new XElement("Name", "Mary"),
            new XElement("Age", 90),
            new XElement("Job", "Nurse")
        )
    )
);
//save constructed document
document.Save("People.xml");
Firstly, we instantiate a new XDocument object, which represents the document as a whole. We then think to ourselves, ‘what comes next in the document we are trying to create?’. The answer is the declaration. So, we simply instantiate a new XDeclaration object, passing the relevant details of the declaration (version, encoding and standalone) to the constructor. We then create the comment using an XComment object. Notice how the declaration and comment are wrapped inside the XDocument object.
How easy was that! Now however, we come to the elements in the document. What is the first element in the document? It’s the ‘People’ element. So, we instantiate a new XElement object to represent this element, passing in the name of the element as the first parameter. How do we set the value of that element however?
We simply pass in a second argument to the constructor of the XElement class. However, in our example, the ‘People’ element doesn’t contain a single textual value, it contains other child/nested elements. How do we deal with that?
Simple: just instantiate new XElement objects and pass them to the XElement constructor. One of the constructors takes a params array of objects as its second parameter, giving you the power to nest as many nodes (comments, attributes, elements etc.) inside other nodes as you wish!
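Here is a small sketch of the two cases just described: an element with a single textual value, and an element whose content is a mix of other nodes passed through the params array. The class name ContentSketch and the sample values are assumptions for illustration only.

```csharp
using System;
using System.Xml.Linq;

class ContentSketch
{
    static void Main()
    {
        // Case 1: the second argument is a single value, which becomes the element's text.
        XElement name = new XElement("Name", "Joe");

        // Case 2: the params array lets us pass any number of nodes as content.
        // Attributes, comments and elements can all be mixed together.
        XElement person = new XElement("Person",
            new XAttribute("id", 1),
            new XComment("an example person"),
            name,
            new XElement("Age", 35));

        Console.WriteLine(person);
    }
}
```

Notice that an XElement built earlier (the ‘name’ variable) can be passed in just like a freshly constructed one.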
I have tried to indent the code to create the XML document in the same way the XML document itself is indented. This is to show you the very close relationship between the look of the code, and the resulting document.
Functional construction is the reason why creating XML documents with the LINQ to XML API is so intuitive. You just nest object instantiations in the same way that the nodes are nested in the actual XML document. The code looks just like the resulting XML, so just by looking at the code you can accurately visualise what the resulting XML document will look like.
It also helps that you have dedicated classes mapped directly to the different types of nodes.
Once we’ve built the document in code, we just call the Save() method on the XDocument instance, passing in the path of the resulting file, and that’s it!
Creating a document from an array of pre-built objects
Say we already have an array of people objects built in our application, and we want to use LINQ to XML to create the XML document above.
Say we have this basic Person class:
public class Person
{
public int ID { get; set; }
public string Name { get; set; }
public int Age { get; set; }
public string Job { get; set; }
}
We then create an array of 4 Person objects, each of which represents one of the people described in our XML file above:
Person[] people = new Person[] {
    new Person{ ID = 1, Name = "Joe", Age = 35, Job = "Manager"},
    new Person{ ID = 2, Name = "Jason", Age = 18, Job = "Software Engineer"},
    new Person{ ID = 3, Name = "Lisa", Age = 53, Job = "Bakery Owner"},
    new Person{ ID = 4, Name = "Mary", Age = 90, Job = "Nurse"},
};
Now, all we have to do to create exactly the same document as before, but this time using the array of pre-built ‘people’, is:
//create xml document from already constructed Person objects
//note: XML names are case sensitive, so the attribute must be "id" (not "ID") to match the original document
XDocument document = new XDocument(
    new XDeclaration("1.0", "utf-8", "yes"),
    new XComment("Jason's xml"),
    new XElement("People",
        from person in people
        select new XElement("Person", new XAttribute("id", person.ID),
            new XElement("Name", person.Name),
            new XElement("Age", person.Age),
            new XElement("Job", person.Job))
    )
);
document.Save("People.xml");
It’s exactly the same as the first example, right up until we get to the element construction.
Basically, inside the ‘People’ element, we want elements representing people (‘Person’ elements) just as before. This time however, we have all the details of the people sitting in an array, so we don’t have to hard code them into the construction code. We just use a basic LINQ query to create the relevant elements using the information in the ‘people’ array.
What that query does is go through each object in the ‘people’ array (assigning the current Person object to the ‘person’ variable), and construct 4 new XElements for each one (a ‘Person’ element plus its three nested child elements), using the details held in the properties of that Person object.
The result is an IEnumerable<XElement> collection in which each ‘space’ in the collection is occupied by one ‘Person’ XElement object, which in turn contains its three nested child elements (‘Name’, ‘Age’ and ‘Job’).
There will be 4 ‘spaces’ (the collection has a length of 4), one for each Person object enumerated.
So, for example, the first space in the resulting IEnumerable collection essentially looks like this when translated into XML:
<Person id="1">
  <Name>Joe</Name>
  <Age>35</Age>
  <Job>Manager</Job>
</Person>
Now, the clever thing is that the IEnumerable will automatically be enumerated (looped through), and thus the query will be executed, when it is passed to the XElement constructor. It therefore produces exactly the same output as the previous example, without us needing to loop through the elements in the IEnumerable and add them to the XElement ourselves!
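To show what the constructor is saving us from, here is a rough sketch that builds the same element twice: once by passing a query straight in, and once with a manual loop and Add(). The ‘names’ array and the class name are hypothetical stand-ins for the ‘people’ array above, used so the sketch runs on its own.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class EnumerableContentSketch
{
    static void Main()
    {
        // Hypothetical sample data standing in for the 'people' array.
        string[] names = { "Joe", "Jason", "Lisa", "Mary" };

        // The query is passed straight in; the constructor enumerates it for us.
        XElement viaQuery = new XElement("People",
            from n in names
            select new XElement("Person", new XElement("Name", n)));

        // The manual equivalent: loop and Add() each element ourselves.
        XElement viaLoop = new XElement("People");
        foreach (string name in names)
        {
            viaLoop.Add(new XElement("Person", new XElement("Name", name)));
        }

        Console.WriteLine(viaQuery.Elements("Person").Count()); // 4
        Console.WriteLine(XNode.DeepEquals(viaQuery, viaLoop)); // True
    }
}
```

Both approaches give the same tree; the query version simply keeps the code shaped like the XML it produces.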
Querying XML
You can load a document (or element) into memory and use methods such as Descendants() and Elements(), along with the extension methods in the Extensions class, to produce collections of nodes (these are crucial to unlocking the power of LINQ to XML). You can then query those collections using LINQ.
This means that querying XML documents is very easy, providing you have a grip on basic LINQ queries of course.
Here are a couple of example queries that could be performed on the example document above:
The first query:
//list of names of the people below 60 years of age
var names = (from person in XDocument.Load("People.xml").Descendants("Person")
             where int.Parse(person.Element("Age").Value) < 60
             select person.Element("Name").Value).ToList();
This query begins by loading the document at the specified file path, and building a collection of descendant elements by calling the Descendants() method on the XDocument object. We want to get all the descendant elements that have the name ‘Person’, so we pass in the string “Person”.
We now have a collection of ‘Person’ elements that we can query. What we then do is build a collection of names by looping through the element collection and selecting the value of the ‘Name’ element that is nested within the current ‘Person’ element, if (and only if) the value held within the ‘Age’ element (which is also nested within the current ‘Person’ element) is less than 60.
Thus, we build a collection of the names of the people that are below 60 years of age.
Finally, we call ToList() on this collection to execute the query (remember the deferred execution model of LINQ?) and convert the resulting collection to a generic list. This list is stored in the implicitly typed ‘names’ variable.
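If deferred execution is a bit hazy, the following sketch may help. It defines the same kind of query against a small in-memory tree (hypothetical data, so it doesn’t depend on People.xml), changes the tree afterwards, and only then calls ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class DeferredExecutionSketch
{
    static void Main()
    {
        // Hypothetical in-memory data, so the sketch does not depend on People.xml.
        XElement people = XElement.Parse(
            "<People>" +
            "<Person><Name>Joe</Name><Age>35</Age></Person>" +
            "<Person><Name>Mary</Name><Age>90</Age></Person>" +
            "</People>");

        // Defining the query does not run it.
        IEnumerable<string> names =
            from person in people.Elements("Person")
            where int.Parse(person.Element("Age").Value) < 60
            select person.Element("Name").Value;

        // Change the tree after the query has been defined.
        people.Add(new XElement("Person",
            new XElement("Name", "Jason"),
            new XElement("Age", 18)));

        // ToList() executes the query now, so Jason is included.
        List<string> snapshot = names.ToList();
        Console.WriteLine(string.Join(", ", snapshot)); // Joe, Jason
    }
}
```

Calling ToList() straight away, as the tutorial query does, takes a snapshot so later changes to the tree can’t affect the results.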
The second query:
Say we want to select a random person from our XML document, maybe for a competition or something. We could do something like this:
//select a random person based on id
//generate random id (the upper bound of Next() is exclusive, so this gives 1 to 4)
int random = new Random().Next(1, 5);
var person = (from p in XDocument.Load("People.xml").Descendants("Person")
              let id = int.Parse(p.Attribute("id").Value)
              where id == random
              select new Person
              {
                  ID = id,
                  Name = p.Element("Name").Value,
                  Age = int.Parse(p.Element("Age").Value),
                  Job = p.Element("Job").Value
              }).Single();
First, we get a random id (between 1 and 4 inclusive) using a Random object. We then once again get a collection of ‘Person’ elements using the Descendants() method.
Next, we define a variable in the query body using the ‘let’ keyword. We assign this variable to the value held in the ‘id’ attribute of the current ‘Person’. This is just because we have to access that value twice in our query, so rather than accessing it and parsing it twice, we just do it once at the start.
Now, we create and select a brand new Person object (filling in the relevant property values of the person, using the details in the XML file), where the person’s id is equal to the number generated by the Random object at the start. Therefore, at the end, as each id is unique to an individual, we have a collection of one person.
Consequently, we call the Single() method on the collection to return that element as a single person object. Therefore, at the end, the implicitly typed variable ‘person’ contains a Person object that represents the person from the .xml file with the randomly generated id!
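One thing worth knowing is that Single() throws an InvalidOperationException if there is no match (or more than one). As a hedged alternative, the sketch below uses SingleOrDefault(), which returns null when nothing matches, together with the explicit casts that XElement and XAttribute provide in place of int.Parse(...Value). The data and the ‘wanted’ id are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class SingleOrDefaultSketch
{
    static void Main()
    {
        // Hypothetical in-memory data, so the sketch does not depend on People.xml.
        XElement people = XElement.Parse(
            "<People>" +
            "<Person id='1'><Name>Joe</Name><Age>35</Age></Person>" +
            "<Person id='2'><Name>Jason</Name><Age>18</Age></Person>" +
            "</People>");

        int wanted = 7; // deliberately an id that is not in the data

        // The explicit cast does the same job as int.Parse(attribute.Value).
        XElement match = people.Elements("Person")
            .SingleOrDefault(p => (int)p.Attribute("id") == wanted);

        // Single() would throw here; SingleOrDefault() gives us null instead.
        Console.WriteLine(match == null
            ? "No person with id " + wanted
            : (string)match.Element("Name"));
    }
}
```

In the tutorial query the random id is always between 1 and 4, so Single() is safe there; SingleOrDefault() is more useful when the id comes from user input.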
Modifying an XML document
Modifying an XML document using LINQ to XML is very similar to querying the document and projecting values from it (as we did above). Except, this time, we modify the elements we get back from the queries, and then save the changes.
Here are three examples:
Adding to a .xml document
//adding an element
//load document
XDocument document = XDocument.Load("People.xml");
document.Element("People").Add(
    new XElement("Person", new XAttribute("id", 5),
        new XElement("Name", "Carl"),
        new XElement("Age", 24),
        new XElement("Job", "Banker")
    )
);
//Note, you could also use (and will often see being used) an XmlWriter from System.Xml here to output the new xml file
//For simplicity, I just use the Save() method to overwrite the current .xml file
document.Save("People.xml");
In this example, we add a new ‘Person’ element to the XML document, and overwrite the old XML document with the new one.
Firstly, we load the document and store it in the ‘document’ variable. We then need to specify where we want to add the element. In this case, we want to add the element so it is nested inside the ‘People’ tag (as that’s where the ‘Person’ elements go in our document). Therefore, we access the ‘People’ element through the ‘document’ instance’s Element() method, and call the Add() method on that, passing in the node we want to add (which in this case is an element with an attribute and 3 nested/child elements). Then we just save the new document. Easy as that!
Removing from a .xml document
//removing an element
//load document
XDocument document = XDocument.Load("People.xml");
document.Root.Elements()
    .Where(e => e.Attribute("id").Value.Equals("5"))
    .Single()
    .Remove();
document.Save("People.xml");
We get a collection of all the immediate child elements of the root element (the ‘Person’ elements) by calling the Elements() method on the root element of the ‘document’ variable. That’s the difference between Descendants() and Elements(): Descendants() recursively finds all descendant elements, whereas Elements() returns only immediate children.
The root element in this document is ‘People’.
We then select the element where the value of the current Person element’s id is 5. We then call Single() on the resulting collection to get the single XElement object back. We then call Remove() on that element, and save the changes.
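The difference between Elements() and Descendants() is easy to see by counting what each one returns. This is only a sketch against a tiny, hypothetical in-memory tree:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class ElementsVsDescendantsSketch
{
    static void Main()
    {
        // Hypothetical in-memory data: one Person with two child elements.
        XElement root = XElement.Parse(
            "<People>" +
            "<Person id='1'><Name>Joe</Name><Age>35</Age></Person>" +
            "</People>");

        // Elements() only looks one level down.
        Console.WriteLine(root.Elements().Count());          // 1 (Person)
        Console.WriteLine(root.Elements("Name").Count());    // 0 (Name is not a direct child)

        // Descendants() walks the whole subtree.
        Console.WriteLine(root.Descendants().Count());       // 3 (Person, Name, Age)
        Console.WriteLine(root.Descendants("Name").Count()); // 1
    }
}
```

So Elements() is the safer choice when you know exactly which level you want, and Descendants() is handy when the element could be nested at any depth.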
Updating a .xml document
We update nodes simply by retrieving the node to update, and changing its .Value property, and then saving the changes.
In this example, as Lisa has changed her job from a Bakery Owner to a Florist, I will update her job accordingly. Secondly, I will update the comment I placed in the original document. I will update it from “Jason’s xml” to “My new, updated comment!”. I shall then save the changes:
//updating an xml document
//load document
XDocument document = XDocument.Load("People.xml");
XElement root = document.Root;
// Update Lisa's job to florist
root.Elements("Person")
    .Where(e => e.Element("Name").Value.Equals("Lisa"))
    .Select(e => e.Element("Job"))
    .Single()
    .SetValue("Florist");
//update the comment
document.Nodes().OfType<XComment>().Single().Value = "My new, updated comment!";
document.Save("People.xml");
As usual, we begin by loading the document. We then get the root element of the document (‘People’) and store it in a variable for later use. We then get all the immediate child elements of the root element (the ‘Person’ elements). Then we select the ‘Job’ element associated with the Person that has a ‘Name’ element equal to “Lisa” (what a mouthful!). Then, we change this element's value from “Bakery Owner” to “Florist” using the SetValue() method.
Next, we update the comment. As I know I only have one comment in the document, I can just search for all nodes that are of type XComment and call Single() to get it. I can then just change its value using the .Value property.
Then, all that is left to do is to save the changes as usual.
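There are a few closely related ways to change a value, so here is a short sketch comparing them on a single, hypothetical ‘Person’ element. All four members used below (Value, SetValue(), SetElementValue() and SetAttributeValue()) are part of the XElement class; the data and new values are assumptions for illustration.

```csharp
using System;
using System.Xml.Linq;

class UpdateSketch
{
    static void Main()
    {
        // Hypothetical in-memory element, so the sketch does not depend on People.xml.
        XElement lisa = XElement.Parse(
            "<Person id='3'><Name>Lisa</Name><Age>53</Age><Job>Bakery Owner</Job></Person>");

        // The Value property takes a string.
        lisa.Element("Job").Value = "Florist";

        // SetValue() takes any object and converts it to text for us.
        lisa.Element("Age").SetValue(54);

        // SetElementValue() finds the named child (or adds it if it is missing) and sets its value.
        lisa.SetElementValue("Town", "Leeds");

        // SetAttributeValue() does the same job for attributes.
        lisa.SetAttributeValue("id", 6);

        Console.WriteLine(lisa);
    }
}
```

Which one you use is mostly a matter of taste; SetElementValue() and SetAttributeValue() save a lookup when you already hold the parent element.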
XML Events
A brief primer on the events available as part of the LINQ to XML API. There are two events specifically:
- Changed
- Changing
As the names suggest, they are fired when changes are made to the document OR specific parts of the document (which, due to deferred execution, may NOT be at the time of query definition). You can register with these events to be notified when a change is happening, or has happened.
You can register for either (or both) of these events on any of the objects that represent the various pieces of an XML document (XDocument, XElement, XComment, XAttribute etc).
The thing to remember is that when you register for one of the above events on an object, the event is raised for that object AND any child/descendant objects it has. Therefore, if you register for one of the events on the root element of the document (or the document itself!), you will be notified whenever a change is made anywhere in the xml document.
The great thing about this is that we can write a query to drill down to a very specific node within a complex xml document, register for one of the above events on that object, and get notified when that specific node is changed (or is changing).
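To illustrate the point about events being raised for descendants too, here is a minimal sketch that registers for both events on the root element and then changes an attribute further down the tree. The data is hypothetical and the handler messages are just my own wording.

```csharp
using System;
using System.Xml.Linq;

class RootEventsSketch
{
    static void Main()
    {
        // Hypothetical in-memory data, so the sketch does not depend on People.xml.
        XElement root = XElement.Parse(
            "<People><Person id='3'><Name>Lisa</Name></Person></People>");

        // Register on the root: changes anywhere beneath it are reported here.
        root.Changing += (sender, e) =>
            Console.WriteLine("About to change: {0} ({1})", sender.GetType().Name, e.ObjectChange);
        root.Changed += (sender, e) =>
            Console.WriteLine("Changed: {0} ({1})", sender.GetType().Name, e.ObjectChange);

        // Change an attribute on a descendant; both handlers on the root should run.
        root.Element("Person").SetAttributeValue("id", 6);
    }
}
```

The ‘sender’ is the object that was actually changed (the attribute here), not the object you registered on, which is what lets a single handler on the root report on the whole tree.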
Here is a brief example for completeness:
Say we want to know if Lisa’s id gets changed. We could register for the Changed event of the ‘id’ attribute of the ‘Person’ element that has a ‘Name’ element with the value “Lisa”, like so:
//load the document
XDocument document = XDocument.Load("People.xml");
//registering to get notified when Lisa's id gets changed
document.Root.Elements("Person")
    .Where(e => e.Element("Name").Value.Equals("Lisa"))
    .Select(e => e.Attribute("id"))
    .Single()
    .Changed += (object sender, XObjectChangeEventArgs e) =>
        Console.WriteLine("Type of object changed: {0}, Change Type: {1}", sender.GetType().Name, e.ObjectChange);
We load the document into the variable ‘document’. We then get all the immediate child elements of the root element ‘People’ that have the name “Person”. We then select the id attribute (an XAttribute object) of the ‘Person’ element that has a child ‘Name’ element with a value of “Lisa”.
We then access the ‘Changed’ event of the XAttribute object, and register for the event. We shall print out a message showing the type of object that was changed (should be XAttribute in this case), and the type of change made (should be a change to the .Value property in this case).
Now that we have registered for the event, we need to write a query to change that specific attribute value to see if the event is fired, and the message is printed…
Here is the update query:
//update query to change Lisa's id
document.Root.Elements("Person")
    .Where(e => e.Element("Name").Value.Equals("Lisa"))
    .Select(e => e.Attribute("id")).Single().SetValue("6");
Hopefully you can see what this is doing by now. It’s just drilling down to Lisa’s id attribute, and changing its value to 6.
Finally, we need to actually save the changes to file. I save a brand new document, just so I don’t change the original as that may mess up previous queries we have done:
document.Save("NewPeople.xml");
Oh, and look at that... This is the output you should get in the console:
Type of object changed: XAttribute, Change Type: Value
In Conclusion
So there you have it! This was meant to be an introduction to some of the key areas of LINQ to XML, but it is in no way shape or form an exhaustive guide. Not by a long way! Trust me; it’s a worthwhile API to learn if you do anything with XML documents. Plus, it’s just good to know to get general knowledge of LINQ and its various features.
Here are a few more links:
MSDN Documentation
System.Xml.Linq classes
Make sure you know all the methods in the Extensions class, as they are absolutely key to building queryable collections of objects from .xml files, and are the very essence of LINQ to XML!
So, until next time, enjoy your LINQ!
See all the C# Learning Series tutorials here!
This post has been edited by CodingSup3rnatur@l-360: 13 March 2011 - 09:48 AM











Dude, this was awesome!!!! Only place I've found that gives a good enough example of how to do the Linq query to update an XML node, based on a parent node having a certain value! Saved me lots of time and effort - many kudos!


