3 Replies - 791 Views - Last Post: 19 January 2013 - 02:10 PM

#1 Alyssa Saila

  • D.I.C Head

Reputation: 0
  • Posts: 72
  • Joined: 07-January 12

How To Extract Every <Name> & Value From *Any XML File

Posted 19 January 2013 - 12:35 PM

Using System.Xml I can loop through the child and parent nodes of an XML file,
although I'm still a little fuzzy on the terms "node" and "XML element".
What I plan to achieve is to get all of the following data from any XML file:

INSIDE BRACKET TEXT:
<insideBracketText>

BRACKET VALUE:
bracketValue</insideBracketText>

ATTRIBUTE NAME:
xmlAttributeName=""

ATTRIBUTE VALUE:
="attributeValue"

As each piece of data is found I will add it to its corresponding List.

Because these XML files are not written by me, they could be nested or
written any number of ways. As a result, to my knowledge, System.Xml can
only get me so far with extracting the data this way.

I believe I will not be able to extract all of these things from any random
XML file without using a lot of .Substring() and .IndexOf() operations.

I have hundreds of lines of trial code but decided it was too messy and not
worth posting. My spaghetti code gets me 99% there with any xml file but
there's always one or two things not quite right about it.
Moreover, because my code is so excessive the errors are difficult to find.

My initial approach was to concatenate the entire file as a string and break
each line by the right bracket and equal sign to isolate each name and value.

This actually works well for simple xml files, but for more complex xml files
with bracket-attribute nested data (including CDATA bracket tags) it fails.

Simply asking if anyone has any cleaner recommendations for my goal.

My question...

Is there any program or .Net class that can extract these values the way I'm
trying to extract them more efficiently than looping through each line and
performing multiple .Substring() .IndexOf() operations?


Replies To: How To Extract Every <Name> & Value From *Any XML File

#2 andrewsw

  • It's just been revoked!

Reputation: 3838
  • Posts: 13,595
  • Joined: 12-December 12

Re: How To Extract Every <Name> & Value From *Any XML File

Posted 19 January 2013 - 12:57 PM

Look at XmlReader (in System.Xml). There are other ways to parse XML in .NET as well, such as XmlDocument and LINQ to XML (XDocument).

I agree with tlhIn`toq though: you are not taking advantage of the features of XML. You are ignoring them and currently reading the file as unstructured text.

This post has been edited by andrewsw: 19 January 2013 - 01:03 PM

#3 andrewsw

Re: How To Extract Every <Name> & Value From *Any XML File

Posted 19 January 2013 - 01:52 PM

The XmlReader can navigate the entire document, extracting nodes, node names, and attribute names and values. You might, for example, store the retrieved values in another structure (a List, etc.) - if you really needed to!
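
As a rough illustration of the above, here is a minimal sketch that walks a document with XmlReader and drops each name and value into its own List. The class name XmlWalker, the Collect method and the sample XML are made up for this example; only the System.Xml calls are real, and it assumes the input is well-formed and has no DOCTYPE.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlWalker
{
    // Hypothetical helper (not part of System.Xml): fills four caller-supplied lists.
    public static void Collect(string xml,
        List<string> elementNames, List<string> elementValues,
        List<string> attributeNames, List<string> attributeValues)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // Read() advances one node at a time, however deeply the file is nested.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        elementNames.Add(reader.Name);
                        if (reader.HasAttributes)
                        {
                            // Step through the attributes of the current element.
                            while (reader.MoveToNextAttribute())
                            {
                                attributeNames.Add(reader.Name);
                                attributeValues.Add(reader.Value);
                            }
                            // Return to the owning element before reading on.
                            reader.MoveToElement();
                        }
                        break;

                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        // Character data between tags, including CDATA sections.
                        elementValues.Add(reader.Value);
                        break;
                }
            }
        }
    }

    public static void Main()
    {
        // Made-up sample input with nesting, attributes and a CDATA section.
        string xml = "<root id=\"1\"><item kind=\"a\">hello</item>"
                   + "<note><![CDATA[x < y]]></note></root>";

        var names = new List<string>();
        var values = new List<string>();
        var attrNames = new List<string>();
        var attrValues = new List<string>();
        Collect(xml, names, values, attrNames, attrValues);

        Console.WriteLine("Elements:         " + string.Join(", ", names));
        Console.WriteLine("Element values:   " + string.Join(", ", values));
        Console.WriteLine("Attribute names:  " + string.Join(", ", attrNames));
        Console.WriteLine("Attribute values: " + string.Join(", ", attrValues));
    }
}
```

The parser does the bracket and quote handling, so no .Substring() or .IndexOf() calls are needed. Note that the four lists are independent, so this sketch does not record which value belongs to which element; if that matters, a list of small objects would probably be a better fit.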

Nodes (and NodeLists), and Attributes and their values, are the proper terminology for XML. In HTML we tend to use the term tags, but they are also nodes.

Element is an equally valid term, but a narrower one: an Element is one particular kind of Node, made up of a start tag, its matching end tag, and everything in between. Text, comments and CDATA sections are nodes as well, but they are not elements.
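
To make the Node / Element terminology a bit more concrete, here is a small sketch that prints the node type XmlReader reports for each thing it visits. The NodeKinds class, the Describe method and the sample XML are invented for illustration; XmlNodeType is the real enumeration from System.Xml.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeKinds
{
    // Hypothetical helper: one "NodeType: name-or-value" line per node visited.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Elements carry a Name; text and comments carry a Value instead.
                string label = reader.Name.Length > 0 ? reader.Name : reader.Value;
                lines.Add(reader.NodeType + ": " + label);
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up sample: one element with an attribute, some text and a comment.
        foreach (var line in Describe("<a x=\"1\">hi<!--c--></a>"))
            Console.WriteLine(line);
    }
}
```

For that sample the reader reports an Element, a Text node, a Comment and an EndElement, so the element is only one of several node kinds. The attribute does not show up in this loop because Read() does not stop on attributes; they are reached from their element with MoveToNextAttribute().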

This post has been edited by andrewsw: 19 January 2013 - 01:54 PM

#4 andrewsw

Re: How To Extract Every <Name> & Value From *Any XML File

Posted 19 January 2013 - 02:10 PM

Actually, just to bore a little further: the technical term in the XML specification is Element, while Node is the term used by the DOM (which models XML as well as HTML documents). But across the web (the world!) people have blurred the two, and now we use the terms interchangeably. The IT industry is full of such challenges!