9 Replies - 38785 Views - Last Post: 08 January 2010 - 12:40 PM

#1 jad2010

  • New D.I.C Head

Reputation: 0
  • Posts: 2
  • Joined: 02-January 10

How to read/extract data from a web page?

Posted 02 January 2010 - 07:14 PM

I am trying to build a VB.NET Windows app that gets car information from a web page (a web application) that requires a username and password.
I was able to log in to this page programmatically (by automatically populating the input boxes). After logging in, I can see the car data in the browser, but when I use "View Source", the car data (such as model, brand, color, etc.) does not appear in the page source. So how can I read this data with my application?
I hope my question is clear; I really need some help.
Thanks


Replies To: How to read/extract data from a web page?

#2 saichong

  • New D.I.C Head

Reputation: 0
  • Posts: 5
  • Joined: 21-November 09

Re: How to read/extract data from a web page?

Posted 04 January 2010 - 01:51 AM

Hello,

Could you post the URL here so we can see the source code?

Regards

#3 Jack Eagles1

  • Pugnacious Penguin (inspired by no2pencil)

Reputation: 183
  • Posts: 1,152
  • Joined: 10-December 08

Re: How to read/extract data from a web page?

Posted 04 January 2010 - 01:59 PM

To sign in to and out of a website programmatically, you could add a WebBrowser control to your project and make it navigate to the site, then use the HtmlElement.SetAttribute method to set the values of the username and password fields, and finally use the HtmlElement.InvokeMember method to invoke a click on the login button.
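A minimal sketch of that login sequence, assuming the page has already finished loading. The ids "username" and "password" are the ones used in the code later in this thread; "loginButton" is a hypothetical id, so check the real page's HTML for the button.

```vbnet
Imports System.Windows.Forms

Module LoginHelper
    ' Fills in the login form of an already-loaded page and clicks the login button.
    ' The element ids are assumptions; "loginButton" in particular is hypothetical.
    ' Returns False if the document or one of the elements is missing.
    Public Function TryLogin(doc As HtmlDocument, userName As String, password As String) As Boolean
        If doc Is Nothing Then Return False

        Dim userBox As HtmlElement = doc.GetElementById("username")
        Dim passBox As HtmlElement = doc.GetElementById("password")
        Dim loginButton As HtmlElement = doc.GetElementById("loginButton") ' hypothetical id
        If userBox Is Nothing OrElse passBox Is Nothing OrElse loginButton Is Nothing Then Return False

        ' SetAttribute("value", ...) sets the text shown in an input box
        userBox.SetAttribute("value", userName)
        passBox.SetAttribute("value", password)

        ' InvokeMember("click") fires the button's click, which submits the form
        loginButton.InvokeMember("click")
        Return True
    End Function
End Module
```

You could call it from the WebBrowser's DocumentCompleted event, for example TryLogin(WebBrowser.Document, "user", "pass"). If the button has no id, calling InvokeMember("submit") on the form element is an alternative.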


Next, you need to extract the text you want from the web page once you have logged in. To do this, you need to know the name of the HTML element you want to extract the data from. You can (usually) use HtmlElement.GetAttribute("value") to retrieve the text displayed in the HtmlElement.
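A small sketch of reading text by element id. GetAttribute("value") holds the current text of an <input> box; for ordinary elements such as <div>, <span> or <td>, the displayed text is exposed by the InnerText property instead. The element id you pass in is an assumption about the page.

```vbnet
Imports System
Imports System.Windows.Forms

Module ElementText
    ' Returns the text an element displays, or Nothing if the element is not found.
    Public Function ReadElementText(doc As HtmlDocument, elementId As String) As String
        If doc Is Nothing Then Return Nothing
        Dim el As HtmlElement = doc.GetElementById(elementId)
        If el Is Nothing Then Return Nothing

        ' Input boxes keep their current text in the "value" attribute
        If String.Equals(el.TagName, "INPUT", StringComparison.OrdinalIgnoreCase) Then
            Return el.GetAttribute("value")
        End If

        ' Other elements (div, span, td, ...) expose their visible text via InnerText
        Return el.InnerText
    End Function
End Module
```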

You mentioned that you can't find the car data in the source code. Is it displayed as a PDF or in Flash Player? If so, you won't be able to find it in the source code. If you want to get it anyway, I suggest you add a web browser to a form, make the form transparent, and then take a snapshot of the transparent window. Then you would have to crop the image to fit your needs.
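If the data really were drawn by a plug-in, one way to grab it as an image is Graphics.CopyFromScreen, which copies the screen pixels where the browser control is displayed. This is a swapped-in technique, not the transparent-form approach described above. A rough sketch, assuming the window is visible and not covered; the crop rectangle is something you would choose yourself:

```vbnet
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.Windows.Forms

Module ScreenGrab
    ' Copies the on-screen pixels inside cropArea (in the control's client coordinates)
    ' of a visible control, e.g. a WebBrowser. Returns Nothing if there is no control
    ' or the area does not overlap it.
    Public Function CaptureControl(ctrl As Control, cropArea As Rectangle) As Bitmap
        If ctrl Is Nothing Then Return Nothing

        ' Clip the requested area to the control's client area
        Dim clipped As Rectangle = Rectangle.Intersect(cropArea, New Rectangle(Point.Empty, ctrl.ClientSize))
        If clipped.Width <= 0 OrElse clipped.Height <= 0 Then Return Nothing

        ' Screen coordinates of the clipped area's top-left corner
        Dim origin As Point = ctrl.PointToScreen(clipped.Location)

        Dim result As New Bitmap(clipped.Width, clipped.Height, PixelFormat.Format32bppArgb)
        Using g As Graphics = Graphics.FromImage(result)
            ' Whatever is currently on screen there is copied, so the window must not be covered
            g.CopyFromScreen(origin.X, origin.Y, 0, 0, clipped.Size)
        End Using
        Return result
    End Function
End Module
```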

All of the above is possible, and I've done most of it; it just may take you some time. If you post more of your own source code, I might be more willing to post more of mine.

#4 jad2010

  • New D.I.C Head

Reputation: 0
  • Posts: 2
  • Joined: 02-January 10

Re: How to read/extract data from a web page?

Posted 04 January 2010 - 08:38 PM

Jack, I have already done the login part using the WebBrowser control, as you suggested.
The data is displayed neither as a PDF nor in Flash. I installed a Firefox tool (the Web Developer add-on), clicked "View Generated Source", and could see the data in the HTML there.
The data I want is populated into a dynamic HTML table: the table grows automatically as more cars arrive, and each incoming car is appended as a new row.
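Firefox's "View Generated Source" shows the page after its scripts have run. Inside the application, the WebBrowser control exposes the same live page through its Document object, so a sketch like the following (assuming the page has finished loading and the grid has been filled) returns the current markup:

```vbnet
Imports System.Windows.Forms

Module GeneratedSource
    ' Returns the current (script-generated) HTML of the page, similar to
    ' Firefox's "View Generated Source", or Nothing if no page is loaded.
    Public Function GetGeneratedSource(browser As WebBrowser) As String
        If browser Is Nothing OrElse browser.Document Is Nothing Then Return Nothing
        Dim body As HtmlElement = browser.Document.Body
        If body Is Nothing Then Return Nothing

        ' The <html> element is the parent of <body>; OuterHtml serializes the live DOM
        Dim root As HtmlElement = If(body.Parent, body)
        Return root.OuterHtml
    End Function
End Module
```

Internet Explorer re-serializes the DOM when producing OuterHtml, so tag case and attribute quoting may differ from what Firefox shows; keep that in mind before matching the string against exact patterns.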

The HTML source of one row of data looks like this:

<td class="x-grid3-col x-grid3-cell x-grid3-td-dealer x-grid3-cell-first " style="width: 78px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-dealer" unselectable="on">Privat</div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-manufacturer " style="width: 98px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-manufacturer" unselectable="on">AUDI</div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-modelDescription " style="width: 157px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-modelDescription" unselectable="on">A2 1.4 </div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-price " style="width: 88px; text-align: right;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-price" unselectable="on"><span class="format-right">7.000</span></div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-firstRegistration " style="width: 58px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-firstRegistration" unselectable="on">6/2000</div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-mileage " style="width: 73px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-mileage" unselectable="on"><span class="format-right">122.000</span></div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-powerInKw " style="width: 48px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-powerInKw" unselectable="on"><span class="format-right">55</span></div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-modificationDate " style="width: 98px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-modificationDate" unselectable="on"><span class="format-right">Heute - 21:29</span></div></td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-location " style="width: 98px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-location" unselectable="on">85256 Vierkirchen</div></td>

This table has 9 columns, and I need to programmatically get the 9 cell values of each row. The values I need are the text of each cell (for example Privat, AUDI, A2 1.4 and 7.000 in the row above). How can I send the HTTP requests and store this data in variables within my application?
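The cell values are already in the WebBrowser's live DOM once the grid is filled, so one approach that needs no HTTP requests of your own is to walk the table rows and read each cell's InnerText. A sketch that identifies data cells by the x-grid3-col class carried by every <td> in the row above (an assumption about the rest of the page). It reads the class with GetAttribute("className") and falls back to "class", since which name works may depend on the Internet Explorer mode the control runs in:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

Module GridReader
    ' Reads every grid row on the page into a list of cell texts, in column order
    ' (dealer, manufacturer, modelDescription, price, firstRegistration,
    '  mileage, powerInKw, modificationDate, location).
    Public Function ReadGridRows(doc As HtmlDocument) As List(Of List(Of String))
        Dim rows As New List(Of List(Of String))
        If doc Is Nothing Then Return rows

        For Each tr As HtmlElement In doc.GetElementsByTagName("tr")
            Dim cells As New List(Of String)
            ' Only direct children, so cells of nested tables are not merged into this row
            For Each td As HtmlElement In tr.Children
                If HasClass(td, "x-grid3-col") Then
                    ' InnerText is the visible text, including text inside nested <div>/<span>
                    cells.Add(If(td.InnerText, String.Empty).Trim())
                End If
            Next
            ' Rows without data cells (headers, layout rows, ...) are skipped
            If cells.Count > 0 Then rows.Add(cells)
        Next
        Return rows
    End Function

    ' True if the element's class list contains className.
    Private Function HasClass(el As HtmlElement, className As String) As Boolean
        ' Depending on the IE mode, the attribute is read as "className" or "class"
        Dim classes As String = el.GetAttribute("className")
        If String.IsNullOrEmpty(classes) Then classes = el.GetAttribute("class")
        If String.IsNullOrEmpty(classes) Then Return False
        Return Array.IndexOf(classes.Split(" "c), className) >= 0
    End Function
End Module
```

Call it as ReadGridRows(WebBrowser.Document) once the cars are visible; for the row posted above, the list should hold Privat, AUDI, A2 1.4, 7.000, 6/2000, 122.000, 55, Heute - 21:29 and 85256 Vierkirchen.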

Here is what I've done so far:
Private Sub btnNavigate_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnNavigate.Click
		' NavigateToUrlSync (my own helper, not shown) loads the login page and waits for it
		NavigateToUrlSync("url")

		' Fill in the username and password boxes
		Dim hElement As HtmlElement = WebBrowser.Document.GetElementById("username")
		hElement.SetAttribute("value", "user")
		hElement = WebBrowser.Document.GetElementById("password")
		hElement.SetAttribute("value", "pass")

		' Submit the first form on the page
		Dim forms As HtmlElementCollection = WebBrowser.Document.GetElementsByTagName("form")
		If forms.Count > 0 Then
			forms.Item(0).InvokeMember("submit")
		End If
	End Sub
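After the form is submitted, a new page loads, and the grid rows are added by the page's scripts, possibly after DocumentCompleted has fired (the table keeps growing as cars arrive). One hedged way to deal with that is to poll the document with a WinForms Timer until data cells show up. A self-contained sketch; the form, control and timer names are hypothetical, and the 500 ms interval is arbitrary:

```vbnet
Imports System
Imports System.Windows.Forms

' Hypothetical form that waits for script-generated grid cells to appear.
Public Class CarForm
    Inherits Form

    Private ReadOnly browser As New WebBrowser With {.Dock = DockStyle.Fill}
    Private ReadOnly gridPollTimer As New System.Windows.Forms.Timer With {.Interval = 500}

    Public Sub New()
        Controls.Add(browser)
        AddHandler browser.DocumentCompleted, AddressOf Browser_DocumentCompleted
        AddHandler gridPollTimer.Tick, AddressOf GridPollTimer_Tick
    End Sub

    Private Sub Browser_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        ' The page has loaded, but its scripts may still be filling the grid, so start polling
        gridPollTimer.Start()
    End Sub

    Private Sub GridPollTimer_Tick(sender As Object, e As EventArgs)
        Dim count As Integer = CountGridCells(browser.Document)
        If count > 0 Then
            gridPollTimer.Stop()
            ' The cells are in the DOM now; this is where you would read their InnerText
            Text = count.ToString() & " grid cells found"
        End If
    End Sub

    ' Counts the <td> elements carrying the x-grid3-col class (data cells in the posted HTML).
    Public Shared Function CountGridCells(doc As HtmlDocument) As Integer
        If doc Is Nothing Then Return 0
        Dim count As Integer = 0
        For Each td As HtmlElement In doc.GetElementsByTagName("td")
            ' Depending on the IE mode, the attribute is read as "className" or "class"
            Dim classes As String = td.GetAttribute("className")
            If String.IsNullOrEmpty(classes) Then classes = td.GetAttribute("class")
            If Not String.IsNullOrEmpty(classes) AndAlso
               Array.IndexOf(classes.Split(" "c), "x-grid3-col") >= 0 Then
                count += 1
            End If
        Next
        Return count
    End Function
End Class
```

Navigate the browser to the login page and log in as in the code above; once the Tick handler finds cells, that is the point at which to read them.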

This post has been edited by jad2010: 04 January 2010 - 08:38 PM


#5 Jack Eagles1

  • Pugnacious Penguin (inspired by no2pencil)

Reputation: 183
  • Posts: 1,152
  • Joined: 10-December 08

Re: How to read/extract data from a web page?

Posted 05 January 2010 - 10:30 AM

Hmm. All your code seems to work, as far as I can see.
The problem seems to be that we can't use the HtmlElement.GetAttribute function, because the text we want is not an attribute value enclosed in quotation marks. If it were another attribute (such as style), you could use it. Perhaps you could write some code to strip out the HTML you don't want.
But then the problem would be actually getting the HTML without having to use the Web Developer add-on for Firefox.
Perhaps you could tell me the address of the website you are trying to extract the data from.
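Stripping out the unwanted HTML can be sketched with a regular expression that removes everything between < and >, leaving only the text. This is a simplification: it assumes no > characters inside attribute values and does not decode entities such as &amp;, but it is enough for markup like the row posted above.

```vbnet
Imports System.Text.RegularExpressions

Module TagStripper
    ' Removes all tags from an HTML fragment and trims the remaining text.
    ' Simplified: entities such as &amp; are not decoded.
    Public Function StripTags(html As String) As String
        If html Is Nothing Then Return String.Empty
        Return Regex.Replace(html, "<[^>]*>", String.Empty).Trim()
    End Function
End Module
```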

#6 masterwaldo

  • New D.I.C Head

Reputation: 0
  • Posts: 5
  • Joined: 23-December 09

Re: How to read/extract data from a web page?

Posted 08 January 2010 - 03:23 AM

I just started learning VB and I'm not sure about the coding.

But to extract the data that you highlighted, I would use a regex. You can have a loop that looks for the 1st item, then the 2nd item, and so on, and keeps each one in a variable.

Here is an example that gets the word AUDI using a regex:
(?<=<div\sclass="x-grid3-cell-inner x-grid3-col-manufacturer"\sunselectable="on">).*?(?=</div>)

Hope that helps.
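The same idea can be generalized to all nine columns with a single pattern that captures the column name from the x-grid3-col-... class and the cell's inner HTML; nested tags such as <span class="format-right"> are then removed. A sketch that assumes the exact markup shown in the post (as Firefox displays it) and starts a new row whenever a column name repeats:

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CellRegex
    ' Matches one grid cell: captures the column name (e.g. "manufacturer") and the inner HTML.
    ' Assumes the exact attribute layout shown in the posted HTML.
    Private ReadOnly CellPattern As New Regex(
        "<div class=""x-grid3-cell-inner x-grid3-col-(?<col>\w+)"" unselectable=""on"">(?<val>.*?)</div>",
        RegexOptions.IgnoreCase)

    ' Returns one dictionary (column name -> cell text) per grid row found in the HTML.
    Public Function ExtractRows(html As String) As List(Of Dictionary(Of String, String))
        Dim rows As New List(Of Dictionary(Of String, String))
        If html Is Nothing Then Return rows

        Dim current As Dictionary(Of String, String) = Nothing
        For Each m As Match In CellPattern.Matches(html)
            Dim col As String = m.Groups("col").Value
            ' Drop nested tags such as <span class="format-right">
            Dim text As String = Regex.Replace(m.Groups("val").Value, "<[^>]*>", String.Empty).Trim()

            ' Seeing a column name again means the next row has started
            If current Is Nothing OrElse current.ContainsKey(col) Then
                current = New Dictionary(Of String, String)
                rows.Add(current)
            End If
            current(col) = text
        Next
        Return rows
    End Function
End Module
```

HTML obtained through Internet Explorer (for example via OuterHtml) may quote attributes differently, in which case the pattern would need loosening.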

#7 Programmist

  • CTO

Reputation: 252
  • Posts: 1,833
  • Joined: 02-January 06

Re: How to read/extract data from a web page?

Posted 08 January 2010 - 04:39 AM

Extracting data from HTML this way has traditionally been called screen scraping. It's bad practice, if you care.

#8 Jack Eagles1

  • Pugnacious Penguin (inspired by no2pencil)

Reputation: 183
  • Posts: 1,152
  • Joined: 10-December 08

Re: How to read/extract data from a web page?

Posted 08 January 2010 - 11:23 AM

masterwaldo, on 8 Jan, 2010 - 02:23 AM, said:

I just started learning VB and I'm not sure about the coding.

But to extract the data that you highlighted, I would use a regex. You can have a loop that looks for the 1st item, then the 2nd item, and so on, and keeps each one in a variable.

Here is an example that gets the word AUDI using a regex:
(?<=<div\sclass="x-grid3-cell-inner x-grid3-col-manufacturer"\sunselectable="on">).*?(?=</div>)

Hope that helps.

Actually, as you can see from the table's HTML, it won't let you highlight it:


<td class="x-grid3-col x-grid3-cell x-grid3-td-dealer x-grid3-cell-first " style="width: 78px;" tabindex="0"><div class="x-grid3-cell-inner x-grid3-col-dealer" unselectable="on">Privat</div>

#9 AMDKilla

  • D.I.C Head

Reputation: 6
  • Posts: 88
  • Joined: 30-December 09

Re: How to read/extract data from a web page?

Posted 08 January 2010 - 11:53 AM

Surely you can programmatically change unselectable="on" to "off"?

Then you could select the content.
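Changing the attribute through the DOM could look like the sketch below. Note that unselectable is an Internet Explorer attribute that only controls whether the user can select text with the mouse or keyboard; reading InnerText from code is not blocked by it, so this is only useful for manual copy-and-paste.

```vbnet
Imports System
Imports System.Windows.Forms

Module SelectionHelper
    ' Sets unselectable="off" on every element that has unselectable="on",
    ' so the user can select the text. Returns how many elements were changed.
    Public Function MakeSelectable(doc As HtmlDocument) As Integer
        If doc Is Nothing Then Return 0
        Dim changed As Integer = 0
        For Each el As HtmlElement In doc.All
            If String.Equals(el.GetAttribute("unselectable"), "on", StringComparison.OrdinalIgnoreCase) Then
                el.SetAttribute("unselectable", "off")
                changed += 1
            End If
        Next
        Return changed
    End Function
End Module
```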

This post has been edited by AMDKilla: 08 January 2010 - 11:54 AM


#10 Jack Eagles1

  • Pugnacious Penguin (inspired by no2pencil)

Reputation: 183
  • Posts: 1,152
  • Joined: 10-December 08

Re: How to read/extract data from a web page?

Posted 08 January 2010 - 12:40 PM

Yes, we could change it using the HtmlElement.SetAttribute method, but: 1) we can't actually access the code, for the reason described above; 2) this might mess up the layout of the web page; and 3) the website might run JavaScript that detects changes to certain HTML (I know a few that do).
