2 Replies - 1416 Views - Last Post: 19 November 2012 - 02:55 PM

#1 Acidogenic

  • New D.I.C Head

Reputation: 1
  • Posts: 5
  • Joined: 07-July 12

Parsing XML with multiple identical tags with xml.sax

Posted 15 November 2012 - 04:46 PM

The XML I'm trying to parse looks (roughly) like this. I want to pull out the <wanted> tags, which are unique to each <top>; each <top> contains anywhere from about 1 to 30 of them:


        ....
        </top>
        <top>
            <have0>Python</have0>
            <wanted>Spam</wanted>
            <wanted>Eggs</wanted>
            <wanted>Baked Beans</wanted>
            <wanted>Lobster Thermidor a Crevette with a mornay sauce served in a Provencale manner with shallots and aubergines garnished with truffle pate, brandy and with a fried egg on top</wanted>
            <have1>Eric Idle</have1>
            <have2>Flying Circus</have2>
        </top>
        <top>
       .....




I am currently using a SAX parser in Python 3.2 to grab the data from the <have> tags and put it into a dictionary for later use (the 'haves' already work). Is there any way to use SAX to collect all the <wanted> values for each <top> into a list and store them in the dictionary as shown below, or do I have to use another method?

D['Python'] = {
     'Wanted': ['Spam', 'Eggs', 'Baked Beans', 'Lobster Thermidor a...'],
     'Have1': 'Eric Idle',
     'Have2': 'Flying Circus'
}



Thanks in advance,

Chris
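A minimal sketch of one way to do this with xml.sax, assuming the <top> elements sit under a single root element (called <root> here, which is hypothetical because the real root is elided above) and that the <have0> text is the dictionary key. The handler buffers character data per element, because SAX may deliver one element's text in several chunks, and appends each <wanted> to a list instead of overwriting it:

```python
import xml.sax

# Hypothetical sample: the real root element is elided in the question,
# so <root> is an assumption made for this sketch.
SAMPLE = b"""<root>
  <top>
    <have0>Python</have0>
    <wanted>Spam</wanted>
    <wanted>Eggs</wanted>
    <wanted>Baked Beans</wanted>
    <have1>Eric Idle</have1>
    <have2>Flying Circus</have2>
  </top>
</root>"""


class TopHandler(xml.sax.ContentHandler):
    """Collects each <top> into a dict keyed by its <have0> text."""

    def __init__(self):
        super().__init__()
        self.result = {}    # final mapping: have0 text -> record
        self.record = None  # record for the <top> currently being parsed
        self.key = None     # <have0> text of the current <top>
        self.text = []      # character chunks for the current element

    def startElement(self, name, attrs):
        if name == "top":
            # Fresh record; 'Wanted' is a list so repeated tags can accumulate.
            self.record = {"Wanted": []}
            self.key = None
        self.text = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        self.text.append(content)

    def endElement(self, name):
        if self.record is None:
            return  # outside any <top> (e.g. the root's closing tag)
        value = "".join(self.text).strip()
        self.text = []
        if name == "have0":
            self.key = value
        elif name == "wanted":
            self.record["Wanted"].append(value)  # repeated tag: append, don't overwrite
        elif name in ("have1", "have2"):
            self.record[name.capitalize()] = value  # 'have1' -> 'Have1'
        elif name == "top":
            self.result[self.key] = self.record
            self.record = None


handler = TopHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.result)
```

For a file on disk, xml.sax.parse(filename, handler) works the same way. The key idea is that the list is created in startElement for <top> and only stored in endElement for </top>, so every <wanted> seen in between lands in the same list.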


Replies To: Parsing XML with multiple identical tags with xml.sax

#2 alexr1090

  • D.I.C Head

Reputation: 44
  • Posts: 124
  • Joined: 08-May 11

Re: Parsing XML with multiple identical tags with xml.sax

Posted 16 November 2012 - 08:10 PM

Have you tried using BeautifulSoup to do your parsing? I don't know anything about SAX, but I've had some experience with BeautifulSoup and it makes parsing pages quite easy. After you create a soup object from a page you fetched with urllib2 (urllib.request in Python 3), you can type commands like:
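from urllib.request import urlopen  # Python 3 home of urllib2's urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.fakepage.com').read()

soup = BeautifulSoup(html, 'xml')  # 'xml' selects the lxml XML parser (lxml must be installed)

print(soup.wanted)               # first <wanted> tag only
print(soup.have1)
print(soup.find_all('wanted'))   # every <wanted> tag in the document

# etc.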


hope this helps!
Oh, and here's a BeautifulSoup tutorial.

#3 Acidogenic

  • New D.I.C Head

Reputation: 1
  • Posts: 5
  • Joined: 07-July 12

Re: Parsing XML with multiple identical tags with xml.sax

Posted 19 November 2012 - 02:55 PM

Actually, I switched to xml.etree for the parsing. I only need to run it once every three months or so to generate a new pickle for another program. Thanks though!
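For reference, a minimal xml.etree.ElementTree sketch that builds the same dictionary and pickles it. It assumes the same hypothetical <root> wrapper, that <have0> is the key, and only the <have1>/<have2> fields from the question; for a real file you would start from ET.parse(path).getroot() instead of ET.fromstring. The save_pickle helper and the filename below are hypothetical:

```python
import pickle
import xml.etree.ElementTree as ET

# Hypothetical input: the real root element is elided in the thread,
# so <root> is an assumption.
SAMPLE = """<root>
  <top>
    <have0>Python</have0>
    <wanted>Spam</wanted>
    <wanted>Eggs</wanted>
    <have1>Eric Idle</have1>
    <have2>Flying Circus</have2>
  </top>
</root>"""


def build_dict(root):
    """Map each <top>'s <have0> text to its grouped fields."""
    result = {}
    for top in root.iter("top"):
        result[top.findtext("have0")] = {
            # findall returns every matching direct child, so repeated
            # <wanted> tags come back as a list in document order.
            "Wanted": [w.text for w in top.findall("wanted")],
            "Have1": top.findtext("have1"),
            "Have2": top.findtext("have2"),
        }
    return result


def save_pickle(data, path):
    """Hypothetical helper: write the dictionary for the other program."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


data = build_dict(ET.fromstring(SAMPLE))
```

Calling save_pickle(data, "haves.pickle") writes the file, and the other program can read it back with pickle.load. findall is used rather than find because find returns only the first matching child.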
