1 Reply - 849 Views - Last Post: 30 April 2011 - 10:29 PM

#1 unbound

  • New D.I.C Head

Reputation: 2
  • Posts: 20
  • Joined: 01-March 11

Python Dom

Posted 30 April 2011 - 03:28 PM

Okay, so I have a question. I am opening and reading an HTML file and converting it into XHTML, and all of that works fine. Then I'm trying to find all the elements in the XHTML with a certain attribute value, and I'm pretty sure I have that working too. The question is: how do I go about getting the text from those elements? Here is my attempt.

This is the code I am using to extract the values:
def extract_values(dm):
    # note: dm is not actually used here; get_elms_for_atr_val
    # reads the module-level dom instead
    l = get_elms_for_atr_val('tr', 'class', 'mwNormalLight')
    return get_text(l)


This is the code I am using to get the elements with a certain attribute value:
def get_elms_for_atr_val(tag, atr, val):
    # collect every <tag> element in the global dom whose attribute atr equals val
    lst = []
    elms = dom.getElementsByTagName(tag)
    for node in elms:
        if node.getAttribute(atr) == val:
            lst.append(node)
    return lst
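For comparison, here is a minimal, self-contained sketch of the same attribute filter, assuming the XHTML is parsed with `xml.dom.minidom`. It passes the document in explicitly instead of relying on a global `dom`. The function name `find_elements_by_attr` and the sample table markup are hypothetical, made up for illustration.

```python
from xml.dom import minidom


def find_elements_by_attr(doc, tag, attr, value):
    """Return every <tag> element in doc whose attribute attr equals value."""
    # getAttribute() returns '' when the attribute is missing, so rows
    # without a class attribute are simply skipped
    return [el for el in doc.getElementsByTagName(tag)
            if el.getAttribute(attr) == value]


# Hypothetical XHTML fragment standing in for the converted page
SAMPLE = """<table>
<tr class="mwNormalLight"><td>Alpha</td><td>1.00</td></tr>
<tr class="header"><td>Name</td><td>Price</td></tr>
<tr class="mwNormalLight"><td>Beta</td><td>2.50</td></tr>
</table>"""

doc = minidom.parseString(SAMPLE)
rows = find_elements_by_attr(doc, "tr", "class", "mwNormalLight")
print(len(rows))  # 2
```

Passing the document as a parameter also gives `extract_values(dm)` a use for its `dm` argument.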


This is the code I am using to get the text:
def get_text(e):
    lst = []
    for node in e:
        # replace_white_space and replace_non_alpha_numeric are helper
        # functions not shown here
        node = replace_white_space(str(node))
        node = replace_non_alpha_numeric(node)
        lst.append(node)
    return lst


This is the output I get
['DOM Element: tr at 0xb746496c', 'DOM Element: tr at 0xb746e6ac', 'DOM Element:tr at 0x9cca42c']
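The output above is what you get because `str()` on a minidom `Element` falls back to its repr (`<DOM Element: tr at 0x...>`); it does not return the element's text. A minimal sketch, assuming `xml.dom.minidom`, that instead walks the element's descendants and joins the `.data` of each text node (the name `element_text` is hypothetical):

```python
from xml.dom import minidom


def element_text(node):
    """Concatenate the data of every text/CDATA node below node, in document order."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)          # the actual character data
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(element_text(child))  # recurse into <td>, <b>, etc.
    return "".join(parts)


doc = minidom.parseString("<tr><td>Alpha</td><td>1.00</td></tr>")
row = doc.documentElement
print(str(row))           # still the repr, starting with "<DOM Element: tr"
print(element_text(row))  # Alpha1.00
```

This joins all cells with no separator; if you need one value per cell, read each child element separately.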




Replies To: Python Dom

#2 unbound


Re: Python Dom

Posted 30 April 2011 - 10:29 PM

Sorry to double post, but I think I might have figured it out. I changed my get_text function to this:
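def get_text(e):
    lst = []
    for elementNode in e:
        # walk the children (cells) of each matching <tr>
        for child in elementNode.childNodes:
            # skip children with no child nodes (e.g. whitespace text between
            # cells), which would otherwise raise IndexError; TEXT_NODE == 3
            if child.childNodes and child.childNodes[0].nodeType == child.TEXT_NODE:
                # keep the cleaned string in a local variable instead of
                # overwriting the node inside the DOM tree
                text = replace_white_space(str(child.childNodes[0]))
                text = replace_non_alpha_numeric(text)
                lst.append(text)
    return lst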


And I get output like this:
['DOM Text node u Citigroup', 'DOM Text node u 4 59', 'DOM Text node u 268 936 82']
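The `DOM Text node u ...` prefix is still there because `str()` on a minidom `Text` node also returns its repr. Reading the node's `.data` attribute gives just the string. A minimal sketch, assuming `xml.dom.minidom` and one text value per `<td>`, that returns a list of cell strings per row without modifying the DOM (`cell_texts` and the sample markup are hypothetical):

```python
from xml.dom import minidom


def cell_texts(row):
    """Return the whitespace-normalised text of each element child (e.g. <td>) of row."""
    cells = []
    for cell in row.childNodes:
        if cell.nodeType != cell.ELEMENT_NODE:
            continue  # skip whitespace text nodes between cells
        # use .data rather than str(), which would give the node's repr
        text = "".join(n.data for n in cell.childNodes
                       if n.nodeType == n.TEXT_NODE)
        cells.append(" ".join(text.split()))  # collapse runs of whitespace
    return cells


doc = minidom.parseString(
    "<table><tr class='mwNormalLight'>\n  <td>Alpha</td>\n  <td> 1.00 </td>\n</tr></table>")
rows = doc.getElementsByTagName("tr")
print([cell_texts(r) for r in rows])  # [['Alpha', '1.00']]
```

Text nested inside tags such as `<b>` within a cell is not picked up here; a recursive walk would be needed for that.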


