Parsing an HTML file


34 Replies - 2673 Views - Last Post: 08 May 2011 - 03:05 PM

#1 paki123

  • D.I.C Head

Reputation: -3
  • Posts: 88
  • Joined: 18-February 11

Posted 08 May 2011 - 12:52 AM

I'm trying to get started on my assignment. I basically have to open an HTML file, parse its contents into a tree, and then print it out with proper formatting.

:o

I honestly do not know what any of this means. I am assuming that by "opening an HTML file" they want me to read in an HTML file.
What does it mean to parse its contents into a tree?

I do have a class called Node:
//ADD CODE HERE
abstract class Node
{
  private String type;
  private int level;
  private Node parent;
  //ADD CODE HERE

  //ADD CODE HERE

  public static int getElementCount(){
    //ADD CODE HERE
  }

  abstract public void print();
}
That's all I have right now on Node.

Afterwards I print it out with proper formatting.

Thanks, once again!

By the Way:

Here is the html file we have to parse just so you have an idea what I'm working with.

<HTML>
<HEAD>
<TITLE> 
A basic form
</TITLE>
</HEAD>
<BODY>
Enter your age in the form below, and then click submit.
<FORM>
<TABLE cellpadding="2">
<TR>
<TD>
<INPUT type="text" age="age" value="0">
</INPUT>
<INPUT type="button" age="Submit" value="Submit">
</INPUT>
</TD>
</TR>
<TR>
<TD>
<INPUT>
</INPUT>
</TD>
</TR>
</TABLE>
</FORM>
</BODY>
</HTML>


This post has been edited by paki123: 08 May 2011 - 01:31 AM



Replies To: Parsing an HTML file

#2 g00se

  • D.I.C Lover

Reputation: 3536
  • Posts: 16,029
  • Joined: 20-September 08

Posted 08 May 2011 - 03:46 AM

See http://nekohtml.sour.../DOMParser.html

#3 pbl

  • There is nothing you can't do with a JTable

Reputation: 8378
  • Posts: 31,956
  • Joined: 06-March 08

Posted 08 May 2011 - 04:29 AM

And I don't see any reason to make Node an abstract class.
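
For anyone unsure what "abstract" means in the skeleton: an abstract class cannot be instantiated directly, and it leaves methods such as print() for its subclasses to implement. One plausible reason for the design, assuming the teacher intends separate node types for tags and for text (the thread does not confirm this), is sketched below. The names ElementNode and TextNode, the constructor, and the helper methods are hypothetical; only type, level, parent, getElementCount() and print() come from the skeleton.

```java
class AbstractNodeDemo {
    // Mirrors the skeleton's fields; everything else is an assumption for illustration.
    static abstract class Node {
        private static int elementCount = 0;   // shared across all nodes
        private final String type;
        private final int level;
        private final Node parent;

        Node(String type, Node parent) {
            this.type = type;
            this.parent = parent;
            // A child sits one level deeper than its parent; the root is level 0.
            this.level = (parent == null) ? 0 : parent.level + 1;
        }

        String getType() { return type; }
        int getLevel() { return level; }
        Node getParent() { return parent; }

        protected static void countElement() { elementCount++; }
        public static int getElementCount() { return elementCount; }

        // Two spaces of indentation per level.
        protected String indent() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < level; i++) sb.append("  ");
            return sb.toString();
        }

        // Each concrete subclass decides how it prints itself.
        abstract public void print();
    }

    static class ElementNode extends Node {
        ElementNode(String tag, Node parent) {
            super(tag, parent);
            countElement();                     // only tags count as elements
        }
        @Override public void print() {
            System.out.println(indent() + "<" + getType() + ">");
        }
    }

    static class TextNode extends Node {
        private final String text;
        TextNode(String text, Node parent) {
            super("#text", parent);
            this.text = text;
        }
        @Override public void print() {
            System.out.println(indent() + text);
        }
    }

    public static void main(String[] args) {
        Node html = new ElementNode("HTML", null);
        Node body = new ElementNode("BODY", html);
        Node text = new TextNode("Enter your age", body);
        html.print();
        body.print();
        text.print();
        System.out.println("elements: " + Node.getElementCount());
    }
}
```

If every node were printed the same way, a single concrete class would do, which is the point being made above.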

#4 macosxnerd101

  • Games, Graphs, and Auctions

Reputation: 12278
  • Posts: 45,364
  • Joined: 27-December 08

Posted 08 May 2011 - 06:49 AM

If you just want to display/format the HTML, no need for an explicit Tree structure at all. HTML has a Tree structure to begin with, and you can use SAX as a Stack to add or remove formatting/spacing to the HTML, based on how the tags are nested.
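
A minimal sketch of that idea, assuming the input is well-formed enough for the JDK's XML SAX parser (the sample file in the first post closes every tag and quotes every attribute, so it should qualify; real-world HTML often would not). The class name SaxIndenter and the two-space indent are made up for illustration. The stack holds the tags that are currently open, and its size gives the indentation.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxIndenter extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();   // tags currently open
    private final StringBuilder out = new StringBuilder();

    private void indent() {
        for (int i = 0; i < open.size(); i++) out.append("  ");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        indent();
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(atts.getValue(i)).append('"');
        }
        out.append(">\n");
        open.push(qName);                 // everything until the end tag is nested deeper
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();                       // back out one level before printing the end tag
        indent();
        out.append("</").append(qName).append(">\n");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Note: a parser may deliver one run of text in several chunks.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            indent();
            out.append(text).append('\n');
        }
    }

    static String format(String markup) {
        try {
            SaxIndenter handler = new SaxIndenter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(format(
            "<HTML><BODY>Hello<FORM><INPUT type=\"text\"></INPUT></FORM></BODY></HTML>"));
    }
}
```

This prints the markup with one level of indentation per nesting level, without ever building a tree in memory.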

#5 paki123

Posted 08 May 2011 - 08:34 AM

g00se, on 08 May 2011 - 03:46 AM, said:


I will check that out.

pbl, on 08 May 2011 - 04:29 AM, said:

And I don't see any reason to make Node an abstract class.

I don't even know what that means; it was just skeleton code that my teacher gave us.

macosxnerd101, on 08 May 2011 - 06:49 AM, said:

If you just want to display/format the HTML, no need for an explicit Tree structure at all. HTML has a Tree structure to begin with, and you can use SAX as a Stack to add or remove formatting/spacing to the HTML, based on how the tags are nested.


The thing is that the tree structure is required, so I have to build it regardless of whether it is needed.

#6 paki123

Posted 08 May 2011 - 08:42 AM

I think basically I have to create a method that will read an HTML file, parse it, and then print out an HTML file from scratch. Can anyone tell me how to get started? I'm assuming I have to use regex to get started. Do I need to have a pattern for every type of tag?
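
On the regex question: a separate pattern per tag type is probably not needed. A minimal sketch, assuming input as regular as the sample file (no comments, no ">" inside attribute values), uses one pattern that matches any opening tag, any closing tag, or a run of text. The class name TagTokenizer and the OPEN/CLOSE/TEXT labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTokenizer {
    // group 1: "/" if this is a closing tag, otherwise empty
    // group 2: tag name
    // group 3: raw attribute text, if any
    // group 4: text between tags (only set when no tag matched)
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>|([^<]+)");

    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(4) != null) {
                String text = m.group(4).trim();
                if (!text.isEmpty()) tokens.add("TEXT " + text);   // skip pure whitespace
            } else if (m.group(1).isEmpty()) {
                tokens.add("OPEN " + m.group(2) + m.group(3));
            } else {
                tokens.add("CLOSE " + m.group(2));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "<TD>\n<INPUT type=\"text\" value=\"0\">\n</INPUT>\n</TD>";
        for (String token : tokenize(sample)) {
            System.out.println(token);
        }
    }
}
```

The regex only splits the file into tokens; it cannot check nesting by itself, so something else (a stack, for instance) still has to turn the token stream into a tree.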

#7 g00se

Posted 08 May 2011 - 08:45 AM

Quote

I'm assuming I have to use regex to get started


Why? I've just shown you an API you can use.

#8 paki123

Posted 08 May 2011 - 08:50 AM

I don't understand how the DOMParser works... Do I just read the file into the DOMParser?

Like, would I do this?
DOMParser dm = new DOMParser();

Then do what? What does a DOMParser actually do? Where do I go after this? Sorry for the questions; I know very little about this subject.
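
Roughly speaking, a DOM parser reads the whole document and hands back a tree of node objects that can then be walked. A minimal sketch of that, using the JDK's built-in DocumentBuilder rather than the NekoHTML DOMParser linked earlier (a swapped-in tool, chosen so the example needs no extra library), and assuming markup as well-formed as the sample file. The class name DomWalk and its methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomWalk {
    // Append one line per element or non-blank text node, indented by depth.
    static void walk(Node node, int depth, StringBuilder out) {
        String label = null;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            label = node.getNodeName();
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            label = "\"" + node.getNodeValue().trim() + "\"";
        }
        if (label != null) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(label).append('\n');
        }
        // Recurse into the children; this is where the tree shape shows up.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    static String describe(String markup) {
        try {
            // The parser does the reading; the result is the root of a node tree.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(describe(
            "<HTML><HEAD><TITLE>A basic form</TITLE></HEAD>"
            + "<BODY>Enter your age</BODY></HTML>"));
    }
}
```

So the step after constructing the parser is to give it the file, ask for the resulting document, and then walk its children recursively.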

#9 macosxnerd101

Posted 08 May 2011 - 09:14 AM

KYA has a blog entry demonstrating how to use DOM.

#10 paki123

Posted 08 May 2011 - 09:18 AM

macosxnerd101, on 08 May 2011 - 09:14 AM, said:

KYA has a blog entry demonstrating how to use DOM.

As much as I appreciate the help, I don't think my teacher will allow us to use open-source projects. I think she would rather we use regex to take the information from the file and then make a tree out of it.

#11 macosxnerd101

Posted 08 May 2011 - 09:21 AM

This is a standard Java API tool. If you want to write your own parser, you can. But this can take some time, in and of itself. Use an existing parser to populate a Tree. You can manage a Stack to build your Tree. That makes me think of SAX. You can use DOM as well, as g00se suggested.

Also, to be fair, we don't know your teacher, and therefore cannot guess what he or she wants in this instance. If you are unsure, talk to your teacher. It doesn't make sense to reinvent an XML parser when a number of good parsers already exist.

#12 paki123

Posted 08 May 2011 - 09:25 AM

macosxnerd101, on 08 May 2011 - 09:21 AM, said:

This is a standard Java API tool. If you want to write your own parser, you can. But this can take some time, in and of itself. Use an existing parser to populate a Tree. You can manage a Stack to build your Tree. That makes me think of SAX. You can use DOM as well, as g00se suggested.

Also, to be fair, we don't know your teacher, and therefore cannot guess what he or she wants in this instance. If you are unsure, talk to your teacher. It doesn't make sense to reinvent an XML parser when a number of good parsers already exist.

I'm supposed to make an HTML parser, not an XML parser. I just don't know how to get started. I think the reason for the project is to get experience making a parser.

This post has been edited by paki123: 08 May 2011 - 09:26 AM


#13 macosxnerd101

Posted 08 May 2011 - 09:27 AM

The basic syntax of HTML and XML is the same for the purposes of writing a parser. I would still pursue a stack-based solution if you want to make your own parser, as one of my above links described.
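
A minimal sketch of what a stack-based, hand-written parser could look like, assuming every tag in the input is explicitly closed and tag names match in case, as in the sample file. The names StackTreeBuilder and TreeNode are hypothetical and are not the assignment's Node class. The top of the stack is always the element currently being filled: an opening tag adds a child and pushes it, a closing tag pops.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StackTreeBuilder {
    // Hypothetical node type for this sketch.
    static class TreeNode {
        final String label;                    // tag name, or the text itself for text nodes
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String label) { this.label = label; }

        void print(int depth, StringBuilder out) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(label).append('\n');
            for (TreeNode child : children) child.print(depth + 1, out);
        }
    }

    // group 1: "/" for a closing tag; group 2: tag name; group 3: text between tags
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)");

    static TreeNode parse(String html) {
        TreeNode root = new TreeNode("#document");
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {                    // text: child of the current element
                String text = m.group(3).trim();
                if (!text.isEmpty()) stack.peek().children.add(new TreeNode(text));
            } else if (m.group(1).isEmpty()) {           // opening tag: add child, descend into it
                TreeNode node = new TreeNode(m.group(2));
                stack.peek().children.add(node);
                stack.push(node);
            } else {                                     // closing tag: must match the current element
                if (stack.size() == 1 || !stack.peek().label.equals(m.group(2))) {
                    throw new IllegalArgumentException("Unexpected </" + m.group(2) + ">");
                }
                stack.pop();
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Unclosed <" + stack.peek().label + ">");
        }
        return root;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        parse("<HTML><BODY>Enter your age<FORM><INPUT type=\"text\"></INPUT></FORM></BODY></HTML>")
                .print(0, out);
        System.out.print(out);
    }
}
```

Attributes are discarded here to keep the sketch short; a fuller version would keep them on the node so they can be printed back out.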

#14 paki123

Posted 08 May 2011 - 09:34 AM

I'm screwed. :)

I really wish I knew what you were talking about.

I think the hardest part for me is to actually understand what this all means. I just don't know how to get started.

#15 darek9576

  • D.I.C Lover

Reputation: 203
  • Posts: 1,731
  • Joined: 13-March 10

Posted 08 May 2011 - 09:37 AM

Do more research? Or order something else.
