Xpath, Python, and Strange XML


15 Replies - 1000 Views - Last Post: 19 January 2012 - 08:41 AM

#1 Wuzseen

Posted 17 January 2012 - 11:37 AM

Let's jump right in!

I have the following XML tag:

<Entities TotalResults="1">

I am interested in capturing that TotalResults value; however, I am unsure how to go about this. First off, I simply don't know what that part of the tag is called.

I'm hoping this can be captured with XPath and implemented in a Python script I'm making; if it can be done with ElementTree, so much the better. The script captures XML data from a REST API call, and the TotalResults value is the number of child elements in the Entities tag.

Any insight would be appreciated.

This post has been edited by Wuzseen: 17 January 2012 - 11:38 AM



Replies To: Xpath, Python, and Strange XML

#2 Motoma

Posted 17 January 2012 - 11:43 AM

Which XML parser module are you using?

#3 Wuzseen

Posted 17 January 2012 - 12:18 PM

ElementTree; it was suggested as a really simple XPath/XML parser module.

#4 Motoma

Posted 17 January 2012 - 12:23 PM

The term you are looking for is "attribute". If e is your "Entities" tag, you would use e.attrib["TotalResults"] to access the "TotalResults" attribute.
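
As a minimal sketch of that attribute lookup (the sample string below is made up to match the tag from the first post, and the variable names are only illustrative):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample built around the tag from the first post.
sample = '<Entities TotalResults="1"><Entity Type="defect"/></Entities>'

e = ET.fromstring(sample)            # e is the root <Entities> element
total = e.attrib["TotalResults"]     # attribute values are always strings
# .get() returns a default instead of raising KeyError when the attribute is absent.
total_or_default = e.get("TotalResults", "0")

print(total)
print(int(total) == len(e.findall("Entity")))
```

Note that the value comes back as the string "1", so it needs an int() conversion before it can be compared with a count.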

This post has been edited by Motoma: 17 January 2012 - 12:23 PM


#5 Wuzseen

Posted 17 January 2012 - 12:26 PM

Muchas gracias! That solves it. Danke.

#6 Wuzseen

Posted 18 January 2012 - 11:22 AM

I think I spoke too soon. I got the high level down; your solution was perfect for that. With the following code I got what I wanted:

import xml.etree.ElementTree as ET

tree = ET.parse('C:\\xmldump.xml')
root = tree.getroot()
print(root.attrib["TotalResults"])



xmldump.xml is the file where I dumped the XML returned by my API call.

That was the high level, but there is now another layer I need to get into. My tinkering hasn't yielded success, and I believe I'm misinterpreting the documentation for ElementTree.

Beyond the root Entities tag (described in the first post), there are a plethora of Field tags, each with a Name attribute and a Value child tag within. That value is what I need to capture. The names are always consistent from one API query to the next, so I'm simply going to map the order of values. What I need to do, then, is query all the child "Field" tags and store those values in a list.

There are two ways I see to handle this.

1) Make the API call multiple times and only return the specific values I'm looking for. This is ultimately going to be on a webpage, however, and multiple API queries really slow down page loading (I'm setting up this webpage via Django).

2) Grab all the XML at once and filter through it; this should be quicker, as it is parsing text as opposed to waiting for the server to query the API.

The second solution is probably best. What I'm querying in the API is a bunch of defects found in tests (I'm in Quality Assurance), and each defect is registered with a bunch of fields. I'm looking to compile metrics on the teams responsible for each defect; the responsible group is reflected as a field. The API returns the list of defects as XML, and the Name attribute in the Field tag I spoke of refers to the field registered with the defect. I can query defects per team with the API, or I can query all the defects and count the number of times the value for each team shows up. If team X shows up in Y field entities, then there are Y defects for team X.

So with ElementTree/XPath I need to return all matches of a certain XPath query, and I am honestly uncertain how to do this. ElementTree's documentation shows XPath expressions, but it doesn't really say how to use them in code. I am also unfamiliar with how XPath returns lists; I presume it's just an iterable object of strings, but I could be wrong.
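
On the question of what an XPath query returns: in ElementTree, findall() gives back an ordinary list of Element objects rather than strings, and the text or attributes are read off each element. A rough sketch, assuming XML shaped like the Field/Value layout described above (the sample data is invented for illustration):

```python
import xml.etree.ElementTree as ET

# Invented sample following the Field/Value layout described in the post.
sample = """<Entities TotalResults="1">
  <Entity Type="defect">
    <Fields>
      <Field Name="reproducible"><Value>Y</Value></Field>
      <Field Name="user-26"><Value/></Field>
      <Field Name="user-22"/>
    </Fields>
  </Entity>
</Entities>"""

root = ET.fromstring(sample)

# findall() returns a plain list of Element objects, in document order.
fields = root.findall(".//Field")

values = []
for field in fields:
    # findtext() gives the text of the first <Value> child: "" for an
    # empty <Value/>, or the default (None) when there is no <Value> at all.
    values.append((field.get("Name"), field.findtext("Value", default=None)))

# An attribute predicate narrows the match to one named field.
repro = [v.text for v in root.findall(".//Field[@Name='reproducible']/Value")]

print(values)
print(repro)
```

Counting how often a team's value appears would then be a matter of collecting the text for the relevant field name across all Entity elements.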

#7 Motoma

Posted 18 January 2012 - 12:09 PM

You will want to utilize the XMLParser to build a dataset that you can then use to perform your aggregate analysis.

If you post a small sample of your XML, I can help you through it. Also post the code you are starting with.

#8 Wuzseen

Posted 18 January 2012 - 12:19 PM

Alrighty, here's a chunk of sample XML. This is just a portion, but it's the same throughout (more or less; it should be sufficient, methinks). I didn't want to link all of it, as it's A: long and B: has private data!
<Entities TotalResults="2">
<Entity Type="defect">
<Fields>
<Field Name="has-change">
 <Value/>
 </Field>
<Field Name="planned-closing-ver">
 <Value/>
 </Field>
<Field Name="reproducible">
 <Value>Y</Value>
 </Field>
 <Field Name="user-22"/>
 <Field Name="user-23"/>
 <Field Name="user-20"/>
 <Field Name="user-21"/>
<Field Name="user-26">
 <Value/>
 </Field>

  ....

</Entity>
</Entities>




Also here's the code I'm using to get the XML (I'm deleting some of the URLs, there are endpoints in the API call that are confidential, my replacement statements are apt for the day methinks):

import urllib2
import xml.etree.ElementTree as ET

# urlbase is defined elsewhere; the real endpoints are redacted.
f = open('C:\\xmldump.xml', 'w')
contents = ""
try:
    resp = urllib2.urlopen("%s/%s" % (urlbase, "Stop internet censorship!"))
    #resp = urllib2.urlopen("%s/%s" % (urlbase, "No on SOPA and PIPA!"))
    contents = resp.read()
    f.write(contents)
    print(contents)
except urllib2.HTTPError as error:
    # On an HTTP error, show the error body instead of the XML.
    contents = error.read()
    print(contents)

f.close()

tree = ET.parse('C:\\xmldump.xml')
#tree = ET.XML(contents)
root = tree.getroot()
print(root.attrib["TotalResults"])



Most of those print statements will eventually be commented out; I'm debugging this with the Python Visual Studio plugin before I implement it in my Django app, thus the print statements. I was tinkering with parsing the XML from a string rather than an XML file, but despite what the documentation says, the XML function doesn't seem to return an Element object, thus getroot wasn't working. (I'm using Python 2.7 out of Django necessity.)

I imagine most of the necessary code would belong in the space between the last two lines.
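
On parsing from a string: ET.XML() (and its alias ET.fromstring()) returns the root Element itself, whereas ET.parse() returns an ElementTree wrapper. Only the wrapper has getroot(), which would explain the failure described above. A small sketch, using an invented string in place of the real API response:

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the string read from the API response.
contents = '<Entities TotalResults="2"><Entity Type="defect"/></Entities>'

# fromstring()/XML() hand back the root element directly, so no getroot() call.
root = ET.fromstring(contents)
print(root.tag)
print(root.attrib["TotalResults"])

# If an ElementTree object is wanted anyway, wrap the element in one.
tree = ET.ElementTree(root)
same_root = tree.getroot()
```

This avoids the round trip through a file on disk when the XML is already in memory.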

EDIT: You're suggesting using the built-in XML parser included with Python?

This post has been edited by Wuzseen: 18 January 2012 - 12:29 PM


#9 Motoma

Posted 18 January 2012 - 12:48 PM

Yes, I am recommending you use the XMLParser object to facilitate this, similar to:
from xml.etree.ElementTree import XMLParser

exml = 'XML that has been read() from a file or a URL'
class ElementParser:
    result_data = {}
    current_tag = ''
    current_name = ''

    def start(self, tag, attr):
        if tag == 'Field':
            self.current_name = attr['Name']

        self.current_tag = tag

    def data(self, data):
        # Only text inside a <Value> tag is recorded, keyed by the Field name.
        if self.current_tag == 'Value':
            self.result_data[self.current_name] = data

element_parser = ElementParser()
xml_parser = XMLParser(target=element_parser)
xml_parser.feed(exml)
xml_parser.close()

# element_parser.result_data is a dictionary of
# {'Name': Value, 'Name': Value}
print(element_parser.result_data)



#10 Wuzseen

Posted 18 January 2012 - 01:03 PM

I tried your code, setting exml to the large XML string I obtained (I left the rest alone), and got these errors at runtime:

Quote

xml_parser.feed(exml)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1622, in feed
self._parser.Parse(data, 0)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1531, in _end
return self.target.end(self._fixname(tag))
AttributeError: ElementParser instance has no attribute 'end'


EDIT: I'm still looking; I'm not sure what the interpreter wants from me, though.

This post has been edited by Wuzseen: 18 January 2012 - 01:10 PM


#11 Motoma

Posted 18 January 2012 - 01:37 PM

Inside the class, def end(self, tag): pass should do.

Sorry about that.

#12 Wuzseen

Posted 18 January 2012 - 01:44 PM

Progress! And no need to apologize man, you've been super helpful :)

This seems to have pushed a similar error, but for close. I attempted to use a similar function, like so:
    def close(self, tag):
        pass



Alas, I don't really know what this function or class does. It looks to me like the XMLParser object uses a model to iterate through the tags in a chunk of XML, said model being a class whose methods are the attributes it expects? I'm no super programmer by any means; C# and C++ are my "main" languages anyhow.

#13 Motoma

Posted 18 January 2012 - 01:47 PM

The way the XMLParser object works is:
  • You give the parser an instance of a class. This class implements start, end, data, and close. These are callbacks for the parser.
  • You feed your XML to the XMLParser instance.
  • When the start of a tag is found, your class's start() method is called; likewise for data, end, and close.
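
The callback sequence in the list above can be made visible with a small target class that only records what the parser calls. This is a sketch for illustration; EventLogger and the sample string are invented names, not part of ElementTree:

```python
from xml.etree.ElementTree import XMLParser

class EventLogger:
    """Hypothetical parser target that records each callback it receives."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):      # called for each opening tag
        self.events.append(("start", tag, dict(attrib)))

    def end(self, tag):                # called for each closing tag
        self.events.append(("end", tag))

    def data(self, text):              # called for text between tags
        if text.strip():               # skip whitespace-only chunks
            self.events.append(("data", text))

    def close(self):                   # called once, when parsing finishes
        return self.events

parser = XMLParser(target=EventLogger())
parser.feed('<Field Name="reproducible"><Value>Y</Value></Field>')
events = parser.close()                # returns whatever the target's close() returns

for event in events:
    print(event)
```

Whatever the target's close() returns is passed back from the parser's own close(), which is a convenient way to hand results to the caller.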


#14 Wuzseen

Posted 18 January 2012 - 01:55 PM

Word, that's kind of trippy.

Any ideas on what I need to do with close? It's giving me the same missing attribute error:

Quote

File "C:\Python27\lib\xml\etree\ElementTree.py", line 1637, in close
tree = self.target.close()
AttributeError: ElementParser instance has no attribute 'close'


#15 Motoma

Posted 18 January 2012 - 02:00 PM

If you look at the example in the documentation link I posted above, you will see that close() takes no arguments, and therefore should be defined like def close(self): pass.
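
Putting the pieces from this thread together, a complete version of the target class might look like the sketch below. It is one possible assembly rather than a definitive version: the sample XML is invented, state is kept per instance instead of on the class, and end() clears the current tag so that whitespace after a closing tag is not mistaken for a value.

```python
from xml.etree.ElementTree import XMLParser

class ElementParser:
    def __init__(self):
        # Per-instance state, so separate parsers do not share one dict.
        self.result_data = {}
        self.current_tag = ''
        self.current_name = ''

    def start(self, tag, attr):
        if tag == 'Field':
            self.current_name = attr['Name']
        self.current_tag = tag

    def end(self, tag):
        # Forget the tag so trailing whitespace is not stored as a value.
        self.current_tag = ''

    def data(self, data):
        # Long text may arrive in several chunks; this sketch keeps the last one.
        if self.current_tag == 'Value':
            self.result_data[self.current_name] = data

    def close(self):
        # close() takes no arguments; its return value is passed back to the caller.
        return self.result_data

# Invented sample in the shape of the XML posted earlier in the thread.
exml = """<Entities TotalResults="1"><Entity Type="defect"><Fields>
<Field Name="has-change"><Value/></Field>
<Field Name="reproducible"><Value>Y</Value></Field>
<Field Name="user-22"/>
</Fields></Entity></Entities>"""

xml_parser = XMLParser(target=ElementParser())
xml_parser.feed(exml)
result = xml_parser.close()
print(result)
```

Fields with an empty Value, or with no Value at all, produce no data() call and therefore do not appear in the result. With more than one Entity in the input, later values would overwrite earlier ones for the same field name, so a list of dictionaries (one per Entity) would be a natural next step.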
