Stuck in an Infiniteloop

Parsing XML in C# Part 1: XmlReader

Yes, parsing Dream.In.Code XML feeds, now with C#!

You can find the Java XML tutorials here, here, and here.


Conceptually, very little has changed from the Java version(s). Since we are dealing with XML, the language you parse it with is largely irrelevant. What differs are the finer details: each language's take on the core XML parsing models (namely DOM and StAX). As of this writing I am unaware of a SAX (push parsing) implementation in C#. If you know of one, either in the standard library or from a third party, comment below or shoot me a PM.
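For contrast with pull parsing, here is a minimal sketch of SAX-style push callbacks layered on top of XmlReader. It is not a SAX implementation, just a loop that hands each node to caller-supplied delegates; `PushStyleParser` and its `Parse` method are hypothetical names for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: a push-style (SAX-like) facade over the pull-based XmlReader.
public static class PushStyleParser
{
    // The caller supplies callbacks; the loop "pushes" each node to them.
    public static void Parse(TextReader input,
                             Action<string> onStartElement,
                             Action<string> onEndElement,
                             Action<string> onText)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // <tag/> produces no EndElement node, so report its end here
                        bool isEmpty = reader.IsEmptyElement;
                        onStartElement(reader.Name);
                        if (isEmpty) onEndElement(reader.Name);
                        break;
                    case XmlNodeType.EndElement:
                        onEndElement(reader.Name);
                        break;
                    case XmlNodeType.Text:
                        onText(reader.Value);
                        break;
                }
            }
        }
    }
}
```

The caller no longer decides when to advance; the helper drives the loop to the end of the input, which is the essential difference between push and pull.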
--

XmlReader

XmlReader is .NET's counterpart to StAX: a forward-only, read-only "pull parser". We pull nodes one at a time, so we can read until we feel like stopping or reach the end of the XML feed. As mentioned in previous posts (or if you're already familiar with XML), this is the "middle ground" between the push parsing of SAX and the entire in-memory document of DOM.
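A minimal sketch of the "read until we feel like stopping" idea, assuming a feed where the member's name sits in a `<name>` element. The `PullDemo.FirstName` helper is hypothetical and not part of the app. It returns as soon as it has the value, so the rest of the document is never walked.

```csharp
using System.IO;
using System.Xml;

public static class PullDemo
{
    // Pulls nodes only until the text of the first <name> element is found.
    public static string FirstName(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            bool inName = false;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    inName = reader.Name == "name";
                }
                else if (inName && reader.NodeType == XmlNodeType.Text)
                {
                    return reader.Value; // we decide when to stop pulling
                }
            }
            return null; // reached the end without finding <name>
        }
    }
}
```

Because the reader only advances when `Read()` is called, `FirstName("<profile><name>KYA</name><posts>")` still returns `KYA` even though that input is cut off: the unclosed elements at the end are never reached.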

Grabbing information from Dream.In.Code requires a URL (XmlReader can parse a regular local file as well). The user provides the member ID, which can be found by visiting anyone's profile. The path is the same for every member except for that ID: http://www.dreaminco...l.php?showuser=
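A sketch of both input sources, assuming the base address held in the program's `path` constant. `FeedSource`, `BuildProfileUri` and `RootElementName` are hypothetical helpers. Parsing the ID as an `int` first means only digits end up in the query string, and `XmlReader.Create(string)` accepts a local file path as well as a URL.

```csharp
using System;
using System.Xml;

public static class FeedSource
{
    // Base address as used by MainForm's path constant; the member ID is appended.
    private const string BasePath = "http://www.dreamincode.net/forums/xml.php?showuser=";

    public static Uri BuildProfileUri(string memberIdText)
    {
        int id = int.Parse(memberIdText); // throws FormatException on non-numeric input
        if (id <= 0)
            throw new ArgumentOutOfRangeException(nameof(memberIdText), "Invalid user number");
        return new Uri(BasePath + id);
    }

    // Works the same whether uriOrPath is a URL or a file on disk.
    public static string RootElementName(string uriOrPath)
    {
        using (XmlReader reader = XmlReader.Create(uriOrPath))
        {
            reader.MoveToContent(); // skip the XML declaration, whitespace, comments
            return reader.Name;
        }
    }
}
```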


[Image: UML diagram of the app]

Setting up XmlReader in C#:

  • Get an instance of XmlReader from a URI or input stream
  • Read the stream


In code, it looks like:

private XmlReader reader;
//...
reader = XmlReader.Create(url);
//read stream
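The setup can also be tuned with `XmlReaderSettings`. A minimal sketch, assuming we want the loop to skip whitespace and comment nodes; `ReaderSetup` and its methods are hypothetical. The `using` statement disposes the reader when we are done, even if parsing throws.

```csharp
using System.IO;
using System.Xml;

public static class ReaderSetup
{
    // Skip insignificant whitespace and comments so the main loop sees less noise.
    public static XmlReader Open(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreWhitespace = true,
            IgnoreComments = true
        };
        return XmlReader.Create(input, settings);
    }

    // Counts every node the configured reader hands back.
    public static int CountNodes(string xml)
    {
        int count = 0;
        using (XmlReader reader = Open(new StringReader(xml)))
        {
            while (reader.Read()) count++;
        }
        return count;
    }
}
```

With the default settings, the indentation and the comment in such input would also show up as `Whitespace` and `Comment` nodes.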



If we wanted to read through the entire document given to our XmlReader, it's as easy as:

while(reader.Read())
{
     //do stuff
}
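As a concrete (hypothetical) example of "do stuff", this sketch walks the whole document and tallies how many times each element name appears. The loop ends when `Read()` returns `false`, which happens at the end of the input.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class WholeDocument
{
    // Reads every node to the end, counting start elements by name.
    public static Dictionary<string, int> CountElements(string xml)
    {
        var counts = new Dictionary<string, int>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;
                int n;
                counts.TryGetValue(reader.Name, out n); // n is 0 if not seen yet
                counts[reader.Name] = n + 1;
            }
        }
        return counts;
    }
}
```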



At the most basic level we have the start of an element, the contents of that element, and the end of the element. There is a corresponding node type (XmlNodeType) for each:

XmlNodeType.Element //start
XmlNodeType.EndElement //end
XmlNodeType.Text //actual content



The full list can be found in the documentation for the XmlNodeType enumeration.
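To see which node type each piece of markup produces, this hypothetical `NodeTypeDump.Describe` helper records the type, `Name` and `Value` of every node.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeTypeDump
{
    // One entry per node: "NodeType 'Name' 'Value'".
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                lines.Add(reader.NodeType + " '" + reader.Name + "' '" + reader.Value + "'");
            }
        }
        return lines;
    }
}
```

For `<posts>42</posts>` it yields three entries: an `Element` named `posts`, a `Text` node whose `Name` is empty and whose `Value` is `42`, and an `EndElement` named `posts`. That empty `Name` on the text node is why the loops in this post save `reader.Name` into `tagName` when the start element goes by.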

The generic steps are as follows:

  • Get the next node
  • Take an action based on node type
  • Exit when desired or at EOF


The above looks like this in code:

XmlNodeType curNode;
string tagName = "";
while (reader.Read())
{
    //grab some info from the current node
    curNode = reader.NodeType;
    switch (curNode)
    {
        case XmlNodeType.Element:
            tagName = reader.Name; //important: Text nodes have an empty Name, so remember the tag here
            //also a good time to get attribute information
            break;
        case XmlNodeType.EndElement:
            //do we want to get out at a certain node?
            break;
        case XmlNodeType.Text:
            //Do something with the actual content
            break;
        default:
            //other node types (XML declaration, whitespace, comments, ...) are ignored
            break;
    }
}
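Putting the pieces together, here is a self-contained sketch of the same switch run against an in-memory string. The `ipb/profile` layout and the `ProfileScanner` class are assumptions for illustration (modeled on the element names the program looks for), not the real feed's schema. Like the app, it stops at `</joined>`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ProfileScanner
{
    // Collects the text of a few elements, exiting once </joined> is seen.
    public static Dictionary<string, string> Scan(string xml)
    {
        var wanted = new HashSet<string> { "name", "posts", "joined" };
        var found = new Dictionary<string, string>();
        string tagName = "";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            bool done = false;
            while (!done && reader.Read()) // check the flag first so we don't read past the exit point
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        tagName = reader.Name; // Text nodes have an empty Name
                        break;
                    case XmlNodeType.EndElement:
                        if (reader.Name == "joined") done = true;
                        break;
                    case XmlNodeType.Text:
                        if (wanted.Contains(tagName)) found[tagName] = reader.Value;
                        break;
                    default:
                        break; // whitespace, comments, XML declaration, ...
                }
            }
        }
        return found;
    }
}
```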



Some screenshots of the program in action (visually designed to look about the same as the Java implementation):

[Image: screenshots of the program]

Source:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

namespace DICXML_Project
{
    static class Program
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            Application.EnableVisualStyles();
            Application.SetCompatibleTextRenderingDefault(false);
            Application.Run(new MainForm());
        }
    }
}



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Drawing;

namespace DICXML_Project
{
    public class DICHead
    {
        public string Name { get; set; }
        public string JoinDate { get; set; }
        public string Group { get; set; }
        public string TotalPosts { get; set; }
        public Bitmap Picture { get; set; }
        public Color GroupColor { get; set; }

        public DICHead()
        {
            Name = JoinDate = Group = TotalPosts = "";
        }

        //debug
        public void Display()
        {
            Console.WriteLine("Name: {0}", Name);
            Console.WriteLine("Join Date: {0}", JoinDate);
            Console.WriteLine("Group: {0}", Group);
            Console.WriteLine("TotalPosts: {0}", TotalPosts);
            Console.WriteLine("Image (string): {0}", Picture); //null-safe: Format prints an empty string for null
        }
    }
}



using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Xml;
using System.Net;
using System.IO;

namespace DICXML_Project
{
    public partial class MainForm : Form
    {
        //Dream.In.Code
        private const string path = @"http://www.dreamincode.net/forums/xml.php?showuser=";
        private DICHead thePerson;
        //XML
        private XmlReader reader;

        public MainForm()
        {
            InitializeComponent();
            thePerson = new DICHead();
        }

        private void ParseButton_Click(object sender, EventArgs e)
        {
            try
            {
                int userID = Int32.Parse(InputTextBox.Text);
                if (userID <= 0) throw new ArgumentException("Invalid user number");
                thePerson = new DICHead(); //start fresh so a previous lookup's data doesn't linger
                ReadStream(path + userID); //userID is an int, so no escaping is needed
                FillOutDetails();
                //debug
                //thePerson.Display();
            }
            catch (Exception ex)
            {
                //debug
                //Console.WriteLine(ex.InnerException);
                MessageBox.Show(ex.Message, "An Error Occurred");
            }
        }

        public void ReadStream(string url)
        {
            //debug
            //Console.WriteLine("URL passed in: {0}", url);
            reader = XmlReader.Create(url);
            bool notDone = true;
            string tagName = "", color = "";
            XmlNodeType curNode;
            //let us stream!
            while (notDone && reader.Read()) //check the flag first so we don't read one node past the exit point
            {
                //grab some info from the current node
                curNode = reader.NodeType;
                switch (curNode)
                {
                    case XmlNodeType.Element:
                        tagName = reader.Name; //important: Text nodes have an empty Name, so remember the tag here
                        if(tagName.Equals("span"))
                        {
                            color = reader.GetAttribute(0); //save for later
                        }
                        break;
                    case XmlNodeType.EndElement:
                        if (reader.Name.Equals("joined")) //on an EndElement, Name is the closing tag's name
                        {
                            //debug
                            //Console.WriteLine("Breaking out of the stream");
                            notDone = false;
                        }
                        break;
                    case XmlNodeType.Text:
                        //debug
                        //Console.WriteLine(tagName + ":\t" + reader.Value);
                        if (tagName.Equals("name"))
                        {
                            thePerson.Name = reader.Value;
                        }
                        else if (tagName.Equals("photo"))
                        {
                            using (WebClient web = new WebClient())
                            {
                                //copy the image into memory: a Bitmap needs its stream to stay open for its lifetime
                                thePerson.Picture = new Bitmap(new MemoryStream(web.DownloadData(reader.Value)));
                            }
                        }
                        else if (tagName.Equals("group"))
                        {
                            thePerson.Group = reader.Value;
                            thePerson.GroupColor = Color.Black;
                        }
                        else if (tagName.Equals("span"))
                        {
                            string groupName = reader.Value;
                            thePerson.Group = groupName;
                            if (groupName.Equals("Moderators"))
                            {
                                thePerson.GroupColor = Color.Blue;
                            }
                            else if (groupName.Equals("Admins"))
                            {
                                thePerson.GroupColor = Color.Green;
                            }
                            else
                            {
                                string html = color.Substring(7, 6);
                                int[] rgb =     {
                                                    Convert.ToInt32(html.Substring(0, 2), 16),
                                                    Convert.ToInt32(html.Substring(2, 2), 16),
                                                    Convert.ToInt32(html.Substring(4, 2), 16)
                                                };
                                thePerson.GroupColor = Color.FromArgb(rgb[0], rgb[1], rgb[2]);
                            }
                        }
                        else if (tagName.Equals("posts"))
                        {
                            thePerson.TotalPosts = reader.Value;
                        }
                        else if (tagName.Equals("joined"))
                        {
                            thePerson.JoinDate = reader.Value;
                        }
                        break;
                    default:
                        //debug
                        //Console.WriteLine("Encountered a node type we don't handle. Taking no action");
                        break;
                }
            }
            reader.Close(); //release the underlying network stream
        }
        public void FillOutDetails()
        {
            AvatarBox.Image = thePerson.Picture;
            NameLabel.Text = String.Format("Name: {0}", thePerson.Name);
            JoinDateLabel.Text = String.Format("Join Date: {0}", thePerson.JoinDate);
            GroupLabel.ForeColor = thePerson.GroupColor;
            GroupLabel.Text = String.Format("Group: {0}", thePerson.Group);
            TotalPostsLabel.Text = String.Format("Total Posts: {0}", thePerson.TotalPosts);
        }
    }
}
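The group color conversion in `ReadStream` relies on `color.Substring(7, 6)`, which assumes exactly seven characters precede the hex digits. A slightly more forgiving sketch, assuming the attribute holds an inline CSS value such as `color:#3366FF`, looks for the `#` instead; `GroupColorParser` is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class GroupColorParser
{
    // Assumption: the value contains "#RRGGBB" somewhere, e.g. "color:#3366FF".
    public static int[] ParseHexColor(string attributeValue)
    {
        int hash = attributeValue.IndexOf('#');
        if (hash < 0 || hash + 7 > attributeValue.Length)
            throw new FormatException("No #RRGGBB color in: " + attributeValue);

        string hex = attributeValue.Substring(hash + 1, 6);
        // Each pair of hex digits is one channel: red, green, blue.
        return new[]
        {
            int.Parse(hex.Substring(0, 2), NumberStyles.HexNumber),
            int.Parse(hex.Substring(2, 2), NumberStyles.HexNumber),
            int.Parse(hex.Substring(4, 2), NumberStyles.HexNumber)
        };
    }
}
```

The three values can then go to `Color.FromArgb(rgb[0], rgb[1], rgb[2])` as in `ReadStream`. System.Drawing's `ColorTranslator.FromHtml` can also turn a `#RRGGBB` string into a `Color` directly.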



Designer generated GUI code:

--

Happy coding!

9 Comments On This Entry


alias120

29 October 2010 - 06:17 PM
Thank you for sharing this KYA. How are you liking C# so far? I picked up one of the C# books you mentioned, going to start reading through it once I have some free time.

KYA

29 October 2010 - 07:01 PM
It's beautiful. I'm going to have a hard time going back to writing accessors/mutators by hand.

alias120

29 October 2010 - 07:27 PM
Glad to hear it. I have barely touched on C#, but that little bit of use was great. Instead of jumping from C++ to C# I decided to take a detour and go a little lower to x86 ASM. I am excited to come back to C# at some point, but I've been told that once you learn assembly you gain a greater appreciation for how everything "works".

Sergio Tapia

30 October 2010 - 06:03 AM
I'm parsing the DIC xml feeds as well using C# :P. Here's an example of my code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.Xml.Linq;
using System.Xml.XPath;

namespace SharpDIC.Entities
{
    /// <summary>
    /// Represents a user on the Dream.In.Code website.
    /// </summary>
    public class User
    {
        /********************************************************************************
         * Some of these attributes aren't even used. The API doesn't provide them yet, *
         * so I'll have to scrape the information from the HTML itself. Still thinking  *
         * about how to tackle this.                                                    *
         *                                                                              *
         * Author: Sergio Tapia                                                         *
         * Website: http://www.alphaot.com                                              *
         * ******************************************************************************/

        #region "Attributes"        
        public string ID { get; set; }
        public string Name { get; set; }
        public string Rating { get; set; }
        public string Photo { get; set; }
        public string LastActive { get; set; }
        public string Location { get; set; }
        public string Birthday { get; set; }
        public string Age { get; set; }
        public string Gender { get; set; }
        public string Email { get; set; }


        public string Title { get; set; }
        public string Reputation { get; set; }
        public string DreamKudos { get; set; }
        public string Group { get; set; }
        public string Posts { get; set; }
        public string PostsPerDay { get; set; }
        public string MostActiveIn { get; set; }
        public string JoinDate { get; set; }
        public string ProfileViews { get; set; }

        public string FavoriteOS { get; set; }
        public string FavoriteBrowser { get; set; }
        public string FavoriteProcessor { get; set; }
        public string FavoriteConsole { get; set; }

        public List<Visitor> Visitors { get; set; }
        public List<Friend> Friends { get; set; }
        public List<Comment> Comments { get; set; }
        public string ProgrammingLanguages { get; set; }

        public string AIM { get; set; }
        public string MSN { get; set; }
        public string Website { get; set; }
        public string ICQ { get; set; }
        public string Yahoo { get; set; }
        public string Jabber { get; set; }
        public string Skype { get; set; }
        public string LinkedIn { get; set; }
        public string Facebook { get; set; }
        public string Twitter { get; set; }
        public string XFire { get; set; }
        #endregion

        /// <summary>
        /// Load a user by providing an ID.
        /// </summary>
        /// <param name="ID">A user's individual ID number.</param>
        public User(string ID)
        {
            XDocument xmlResponse = GetUserXMLResponse(ID);            
                        
            LoadGeneralInformation(xmlResponse.Element("ipb").Element("profile"));
            LoadContactInformation(xmlResponse.Element("ipb").Element("profile").Element("contactinformation"));
            LoadLatestVisitors(xmlResponse.Element("ipb").Element("profile").Element("latestvisitors"));
            LoadFriends(xmlResponse.Element("ipb").Element("profile").Element("friends"));
            LoadComments(xmlResponse.Element("ipb").Element("profile").Element("comments"));            
        }

        #region "Loading Methods - give them XML and they'll do the job."        
        private void LoadGeneralInformation(XElement profileXML)
        {
            this.ID = (string)profileXML.Element("id");
            this.Name = (string)profileXML.Element("name");
            this.Rating = (string)profileXML.Element("rating");
            this.Photo = (string)profileXML.Element("photo");
            this.Reputation = (string)profileXML.Element("reputation");
            this.Group = (string)profileXML.Element("group").Element("span");
            this.Posts = (string)profileXML.Element("posts");
            this.PostsPerDay = (string)profileXML.Element("postsperday");
            this.JoinDate = (string)profileXML.Element("joined");
            this.ProfileViews = (string)profileXML.Element("views");
            this.LastActive = (string)profileXML.Element("lastactive");
            this.Location = (string)profileXML.Element("location");
            this.Title = (string)profileXML.Element("title");
            this.Age = (string)profileXML.Element("age");
            this.Birthday = (string)profileXML.Element("birthday");
            this.Gender = (string)profileXML.Element("gender").Element("gender").Element("value");
        }

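        private XDocument GetUserXMLResponse(string ID)
        {
            using (WebClient webClient = new WebClient())
            {
                string htmlSource = webClient.DownloadString(new Uri(String.Format("http://www.dreamincode.net/forums/xml.php?showuser={0}", ID)));
                return XDocument.Parse(htmlSource);
            }
        }

        //LoadContactInformation, LoadLatestVisitors, LoadFriends and LoadComments
        //are not shown in this excerpt.
        #endregion
    }
}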



If you use XDocument and XElement, your code will be much easier to read.

You can find the entire project hosted on BitBucket:
http://bitbucket.org...arpdic/overview
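For comparison, a minimal LINQ to XML sketch pulling some of the fields `DICHead` stores, assuming the `ipb/profile` layout used in the comment above (`LinqProfile` is a hypothetical name). Unlike `XmlReader`, `XDocument.Parse` builds the whole tree in memory first; casting a missing element to `string` yields `null` rather than throwing.

```csharp
using System.Xml.Linq;

public static class LinqProfile
{
    // Returns { name, posts, joined }; any missing element comes back as null.
    public static string[] Read(string xml)
    {
        XElement profile = XDocument.Parse(xml).Root.Element("profile");
        return new[]
        {
            (string)profile.Element("name"),
            (string)profile.Element("posts"),
            (string)profile.Element("joined")
        };
    }
}
```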

KYA

30 October 2010 - 08:14 AM
It is cleaner, but someone may want (or need) to use the "vanilla" XML implementations within the .NET platform rather than LINQ to XML. Additionally, DOM is out of the scope of this particular post (it's coming down the pipe though...)

Sergio Tapia

30 October 2010 - 11:47 AM
What do you mean, "need to use the "vanilla" XML implementations"? You're parsing XML as well by checking the NodeType property (an XmlNodeType).

KYA

30 October 2010 - 12:43 PM
XmlReader/XmlDocument (among other things) are the non-LINQ implementations on the .NET platform.

System.Xml and System.Xml.Linq are two separate entities for a reason.
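For reference, the DOM side of System.Xml looks like this: a minimal sketch with `XmlDocument` and an XPath query, assuming the same `ipb/profile/name` layout (`DomProfile` is a hypothetical name). Like `XDocument`, it loads the entire document before you query it.

```csharp
using System.Xml;

public static class DomProfile
{
    // Load the full tree, then select one node by XPath.
    public static string GetName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("/ipb/profile/name");
        return node == null ? null : node.InnerText;
    }
}
```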

Curtis Rutland

01 November 2010 - 07:13 AM

Quote: "System.Xml and System.Xml.Linq are two separate entities for a reason."


This is totally true, but remember that LINQ is part of the Framework, so it would be considered "vanilla" as well. Unless by that you mean "older", which System.Xml certainly is. It's available in versions prior to 3.5, and actually a lot of people still use it, because that's either what they first learned, or they don't care for LINQ.

I personally love the LINQ implementation of System.Xml.Linq, and I suggest you give both ways a try, KYA. If nothing else, it's good to learn about LINQ.

KYA

01 November 2010 - 04:30 PM
You are absolutely correct. Geez, if you guys would let me get to "Part 3". ;)
