5 Replies - 5142 Views - Last Post: 30 April 2010 - 02:20 PM

#1 mapexx

PYTHON program to check HTML files for matching tags

Posted 29 April 2010 - 12:21 PM

My program has to read an HTML or XHTML file named on the command line and store all of its tags in a list. So if I had (these are just random tags):

<head>
<head>
<html>
<br />
<body>"Hello this is a file"</body>
</html>


I would need to store <head>, <head>, <html>, <br />, <body>, </body>, </html> in a list, ignoring any text between them. I also need to store self-closing tags such as <br />.

Right now I have this:
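tags = []  # avoid naming this "list", which shadows the built-in
infile = open("text.txt", "r")
for line in infile:
    for char in line:  # loop over the characters of the line, not the file
        if char == "<":
            tags.append(char)
infile.close()
print(tags)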



Obviously, this only stores the "<" characters in the list (this is still me just messing around trying to understand it).
I don't understand how to store each entire tag, including self-closing tags, in the list while ignoring any text or tags that are not properly written. Any help is appreciated. Thanks!! :)

This post has been edited by mapexx: 29 April 2010 - 12:21 PM



Replies To: PYTHON program to check HTML files for matching tags

#2 programble

Re: PYTHON program to check HTML files for matching tags

Posted 29 April 2010 - 02:00 PM

Try using regex. This pattern should match all the tags:
</?[^>]+>
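
For anyone who is allowed to use regex, here is a minimal sketch of how that pattern could be applied with Python's standard `re` module. It is written for Python 3, and the `sample` string and the `TAG_PATTERN` name are just assumptions for illustration, built from the tags in the first post.

```python
import re

# The pattern from the post: "<", an optional "/", one or more
# characters that are not ">", then the closing ">".
TAG_PATTERN = re.compile(r"</?[^>]+>")

# Sample input assumed for illustration (the tags from the first post).
sample = '<head>\n<head>\n<html>\n<br />\n<body>"Hello this is a file"</body>\n</html>\n'

# findall returns every non-overlapping match, in order, as a list of strings.
tags = TAG_PATTERN.findall(sample)
print(tags)
# ['<head>', '<head>', '<html>', '<br />', '<body>', '</body>', '</html>']
```

Note that the text between the tags is dropped and the self-closing `<br />` is kept, which is what the original question asks for.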

#3 mapexx

Re: PYTHON program to check HTML files for matching tags

Posted 29 April 2010 - 05:00 PM

Thank you for your response, programble, but I haven't learned any of that yet. I'm supposed to use only what I know :\

There has to be a simple way of doing it. Any other ideas?
Was This Post Helpful? 0
  • +
  • -

#4 baavgai

Re: PYTHON program to check HTML files for matching tags

Posted 29 April 2010 - 06:36 PM

It's kind of hard to show how this works without giving it away. Basically, when you hit '<' you simply record characters until you hit '>'.

This is what it looks like to scan through the line:
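line = "<html><body>This is a test</body>"
word = None  # None means we are outside a tag
for ch in line:
    if word is None:
        # Outside a tag: start recording when a tag opens.
        if ch == '<':
            word = ch
    else:
        # Inside a tag: keep recording until it closes.
        word += ch
        if ch == '>':
            print(word)
            word = None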



Hope this helps.
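
To go from printing to the list the question asks for, the same scan could be wrapped in a function that appends each finished tag instead of printing it. This is only a sketch in Python 3; the names `extract_tags` and `tags_from_file` are made up for illustration, and it assumes the whole file fits comfortably in memory.

```python
def extract_tags(text):
    """Return every '<...>' run found in text, in order of appearance."""
    tags = []
    word = None  # None means we are currently outside a tag
    for ch in text:
        if word is None:
            if ch == "<":
                word = ch  # a tag opens: start recording
        else:
            word += ch
            if ch == ">":
                tags.append(word)  # a tag closes: store it and reset
                word = None
    return tags


def tags_from_file(path):
    """Read the whole file at path and return its tags as a list."""
    with open(path) as infile:
        return extract_tags(infile.read())


print(extract_tags("<html><br /><body>This is a test</body></html>"))
# ['<html>', '<br />', '<body>', '</body>', '</html>']
```

To take the file name from the command line, one option would be `import sys` followed by `tags_from_file(sys.argv[1])`. A tag that is still open when the text ends (a "<" with no ">") is silently dropped by this version, which may or may not be what the assignment means by ignoring tags that are not properly written.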

#5 programble

Re: PYTHON program to check HTML files for matching tags

Posted 30 April 2010 - 12:57 PM

Which is what regex does for you; it's worthwhile to learn and master.

#6 baavgai

Re: PYTHON program to check HTML files for matching tags

Posted 30 April 2010 - 02:20 PM

programble, on 30 April 2010 - 01:57 PM, said:

Which is what regex does for you; it's worthwhile to learn and master.



Agreed. My answer to the other Python HTML question might help. It uses regular expressions: http://www.dreaminco...ar-expressions/

However, the OP said they couldn't use them. Knowing how to roll your own is also useful.
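
Once the tags are in a list, the "matching" part of the topic title could be rolled by hand as well, using a plain list as a stack. The sketch below is one possible approach in Python 3, not something the assignment prescribes; `tag_name` and `find_mismatches` are hypothetical names, and it assumes self-closing tags always end in "/>".

```python
def tag_name(tag):
    """Return the bare lowercase name of a tag such as '<body>' or '</body>'."""
    inner = tag.strip("<>/ ")
    return inner.split()[0].lower() if inner else ""


def find_mismatches(tags):
    """Return the tags that have no properly nested partner."""
    stack = []     # names of tags opened but not yet closed
    problems = []
    for tag in tags:
        if tag.endswith("/>"):
            continue  # self-closing tags need no partner
        name = tag_name(tag)
        if tag.startswith("</"):
            if stack and stack[-1] == name:
                stack.pop()           # closes the most recently opened tag
            else:
                problems.append(tag)  # closing tag with nothing to close
        else:
            stack.append(name)
    # Anything left on the stack was opened and never closed.
    problems.extend("<%s>" % name for name in stack)
    return problems


print(find_mismatches(['<head>', '<head>', '<html>', '<br />',
                       '<body>', '</body>', '</html>']))
# ['<head>', '<head>']
```

This treats everything that is not a closing or self-closing tag as an opening tag, so a doctype line, a comment, or a bare `<br>` would be reported as unclosed and would need special handling.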