5 Replies - 1144 Views - Last Post: 24 January 2013 - 11:12 AM

#1 hyperreal

  • New D.I.C Head

Reputation: 0
  • Posts: 3
  • Joined: 23-January 13

Iterating through XML files using lxml.etree

Posted 23 January 2013 - 07:26 PM

Hello. I'm trying to iterate through XML files with the lxml.etree module, but I'm having a bit of trouble. For now, I'm only using one RSS feed, http://distrowatch.com/news/dw.xml, but eventually the program will prompt the user for a feed URL, store the input in a variable, and pass that variable to etree.parse().

from lxml import etree

tree = etree.parse("http://distrowatch.com/news/dw.xml")
root = tree.getroot()

for channel in root:
    for item in channel:
        print(item.text)



This prints a good chunk of data from the XML file, but what I'd really like is to print only the <title> of each headline, which is contained within its <item> tag. At first I thought of doing something like this:
for channel in root:
    for item in channel:
        print(item[0].text)


But that didn't work, as I got a 'list index out of range' error.
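A likely explanation, sketched under an assumption: the RDF suggestion in the last reply hints that dw.xml is an RSS 1.0 (RDF) feed, where each <item> is a sibling of <channel> rather than a child of it. The sample document in this sketch is made up, not the real feed. With that layout the nested loop only reaches the root's children and grandchildren, such as <title> and <link>, and item[0] asks for an element's first child, which those leaf elements do not have. The code falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since it offers the same calls used here.

```python
# Use lxml if installed; ElementTree supports the same calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical RSS 1.0 (RDF) sample standing in for the live feed.
SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel><title>Example feed</title></channel>
  <item><title>First headline</title><link>http://example.com/1</link></item>
  <item><title>Second headline</title><link>http://example.com/2</link></item>
</rdf:RDF>"""

root = etree.fromstring(SAMPLE)

def local(tag):
    """Strip the '{namespace}' prefix that etree puts on tag names."""
    return tag.split("}")[-1]

# Record what the original nested loop actually visits.
visited = []
for channel in root:          # children of <rdf:RDF>: channel, item, item
    for item in channel:      # grandchildren: title, link, ...
        visited.append((local(channel.tag), local(item.tag), len(item)))

for parent, child, n_children in visited:
    print(parent, child, n_children)   # every grandchild has 0 children

# A <title> has no child elements, so indexing it raises IndexError.
first_title = root[1][0]
try:
    first_title[0]
except IndexError as exc:
    print("IndexError:", exc)
```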


Replies To: Iterating through XML files using lxml.etree

#2 andrewsw

  • Fire giant boob nipple gun!

Reputation: 3371
  • Posts: 11,420
  • Joined: 12-December 12

Re: Iterating through XML files using lxml.etree

Posted 23 January 2013 - 07:44 PM

for channel in root:
    for item in channel:
        print(item[0].text)




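The loop variable item that you create is not the same as the <item> tag that you are looking for; it is just whichever child element the loop is currently on. It is also not a list (or string), although lxml elements do behave like lists of their children, so item[0] means "first child" and raises 'list index out of range' when that element has no children.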

It is late, but I'm guessing that for channel in root visits the first-level children of the root and for item in channel loops through just their direct children. In that case you'll need to look in the etree docs to see how to refer to the child named title of each item and retrieve its text.

You'll probably want to include an if check in case some items don't have a title, even though presumably they should.
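A minimal sketch of that suggestion, assuming an RSS 2.0-style layout (<rss>, then <channel>, then <item>, then <title>) and a made-up sample feed; item_titles is a hypothetical helper name. find() returns the first direct child with the given tag, or None, which is what the if check guards against.

```python
# Use lxml if installed; ElementTree supports the same calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical RSS 2.0-style sample; the second item has no <title>.
SAMPLE = b"""<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First headline</title></item>
  <item><link>http://example.com/no-title</link></item>
</channel></rss>"""

def item_titles(root):
    """Return the text of each <item>'s <title>, skipping items without one."""
    titles = []
    for channel in root:                  # <channel>
        for child in channel:             # <title>, <item>, <item>
            if child.tag != "item":
                continue                  # skip the channel's own <title>, etc.
            title = child.find("title")   # direct child named "title", or None
            # Compare with None explicitly: an element with no children is falsy.
            if title is not None and title.text:
                titles.append(title.text)
    return titles

root = etree.fromstring(SAMPLE)
for t in item_titles(root):
    print(t)
```

The explicit "is not None" test matters: a plain "if title:" would skip a <title> that has text but no child elements.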

This post has been edited by andrewsw: 23 January 2013 - 07:46 PM


#3 andrewsw

Re: Iterating through XML files using lxml.etree

Posted 23 January 2013 - 07:51 PM

for itm in root.findall('item'):    # if the items are direct children of root
    print(itm.find('title').text)


Although I was looking at the wrong etree :whistling:, so perhaps 'item' doesn't need to be a direct child.

Found a correct reference here:
http://lxml.de/tutorial.html
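For reference, findall() with a bare tag name only searches the direct children of the element it is called on; a path such as 'channel/item' or the './/item' descendant search is needed to reach deeper. A minimal sketch comparing the three, using a made-up RSS 2.0-style sample:

```python
# Use lxml if installed; ElementTree supports the same calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical RSS 2.0-style sample: <item> lives inside <channel>.
SAMPLE = b"""<rss version="2.0"><channel>
  <item><title>First headline</title></item>
  <item><title>Second headline</title></item>
</channel></rss>"""

root = etree.fromstring(SAMPLE)            # root is <rss>

direct = root.findall("item")              # direct children of <rss> only: none here
by_path = root.findall("channel/item")     # explicit path through <channel>
anywhere = root.findall(".//item")         # any descendant named <item>

print(len(direct), len(by_path), len(anywhere))   # 0 2 2
for itm in anywhere:
    print(itm.findtext("title"))           # text of the first <title> child
```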

This post has been edited by andrewsw: 23 January 2013 - 07:59 PM


#4 hyperreal

Re: Iterating through XML files using lxml.etree

Posted 23 January 2013 - 10:40 PM

andrewsw, on 23 January 2013 - 07:51 PM, said:

for itm in root.findall('item'):    # if the items are direct children of root
    print(itm.find('title').text)


Although I was looking at the wrong etree :whistling:, so perhaps 'item' doesn't need to be a direct child.

Found a correct reference here:
http://lxml.de/tutorial.html


Yes, I've tried that and it didn't print anything to the console. I've already looked at the documentation too, and I can't seem to find anything that works.
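One likely reason nothing printed, assuming the feed is RSS 1.0 (RDF): that format declares a default namespace, and etree stores each tag as {namespace-URI}name, so neither 'item' nor './/item' matches. The sample is made up, and the rss prefix is an arbitrary name chosen for the mapping passed to findall(). Printing root.tag on the real feed is a quick way to check which namespace, if any, it uses.

```python
# Use lxml if installed; ElementTree supports the same calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical RSS 1.0 (RDF) sample: the default xmlns puts every <item> in a namespace.
SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel><title>Example feed</title></channel>
  <item><title>First headline</title></item>
  <item><title>Second headline</title></item>
</rdf:RDF>"""

root = etree.fromstring(SAMPLE)
print(root.tag)                          # the tag includes its namespace URI in braces

# A bare name does not match namespaced elements, so this finds nothing.
print(len(root.findall(".//item")))

# Map a prefix of our choosing ("rss") to the feed's namespace URI.
NS = {"rss": "http://purl.org/rss/1.0/"}
for itm in root.findall("rss:item", NS):
    print(itm.findtext("rss:title", namespaces=NS))
```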

#5 hyperreal

Re: Iterating through XML files using lxml.etree

Posted 23 January 2013 - 11:02 PM

Well, I have found something that works well enough. It's not *exactly* what I wanted, but it will do for now until I find a better way to extract the data by iteration.
for element in root.iter():
    print(element.text)


This prints the data neatly to the console. Thanks for your help!
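Building on the root.iter() approach, a sketch that keeps only the headline titles instead of every element's text. It assumes the feeds the program will eventually accept are RSS 1.0 or RSS 2.0 (Atom feeds use <entry> instead of <item> and are not handled), and it compares local tag names so that it works with or without a namespace. local_name and headline_titles are hypothetical helpers; the two sample documents are made up.

```python
# Use lxml if installed; ElementTree supports the same calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

def local_name(tag):
    """'{http://purl.org/rss/1.0/}item' -> 'item'; non-string tags (lxml comments) -> ''."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""

def headline_titles(root):
    """Collect the <title> text of every <item>, ignoring namespaces."""
    titles = []
    for element in root.iter():                  # same full-tree walk as root.iter() above
        if local_name(element.tag) != "item":
            continue
        for child in element:
            if local_name(child.tag) == "title" and child.text:
                titles.append(child.text.strip())
                break                            # one title per item
    return titles

# Hypothetical samples of the two RSS flavours.
RSS1 = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel><title>Example feed</title></channel>
  <item><title>RDF headline</title></item>
</rdf:RDF>"""
RSS2 = b"""<rss version="2.0"><channel><title>Example feed</title>
  <item><title>RSS 2.0 headline</title></item>
</channel></rss>"""

for sample in (RSS1, RSS2):
    for title in headline_titles(etree.fromstring(sample)):
        print(title)
```

Ignoring namespaces this way is a shortcut; when the feed's namespace is known in advance, findall() with a namespace mapping is the more precise option.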

#6 andrewsw

Re: Iterating through XML files using lxml.etree

Posted 24 January 2013 - 11:12 AM

You might want to look into dedicated RDF parsers, although the one at the link below is a little old.

https://github.com/RDFLib
