8 Replies - 286 Views - Last Post: 24 December 2017 - 09:42 PM

#1 JoeBobJr   User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 23
  • Joined: 23-December 17

Bad at parsing data please show me how!

Posted 24 December 2017 - 07:53 PM

I'm trying to pull some data from a webpage, but the markup repeats the same tag many times, so it's hard to parse the value out: I can't just grab whatever sits between two unique tags. I don't even know where to start on this, so all help is appreciated!

 </thead>
            <tbody>
            <tr>
                <td>3.88 USD</td>
                <td>0.16 USD</td>
                <td>3.88 USD</td>   <<<< This line here!
                <td>27.14 USD</td>
                <td>116.33 USD</td>
                <td>1,415.30 USD</td>
            </tr>
            </tbody>



I need to grab the contents of the third <td>, which is 3.88 USD. If it sat between two tags that were different from all the rest, I could get the data, but I have no idea how to parse it out when so many of the tags are the same.


Replies To: Bad at parsing data please show me how!

#2 Martyr2   User is offline

  • Programming Theoretician
  • member icon

Reputation: 5257
  • View blog
  • Posts: 14,073
  • Joined: 18-April 07

Re: Bad at parsing data please show me how!

Posted 24 December 2017 - 08:38 PM

Well, one way you can do this is by treating the HTML there as a group of XML elements. Here is how you do that:

' This goes at the top of the file (XPathSelectElements lives in System.Xml.XPath)
Imports System.Xml.XPath

' This goes in some function or event handler...
Dim elementItems As String = "<tbody>
           <tr>
               <td>3.88 USD</td>
               <td>0.16 USD</td>
               <td>3.88 USD</td>
               <td>27.14 USD</td>
               <td>116.33 USD</td>
               <td>1,415.30 USD</td>
           </tr>
           </tbody>"

' Load the string into a document object; this parses it into a structure you can query with XPath
Dim eleDoc As System.Xml.Linq.XDocument = System.Xml.Linq.XDocument.Parse(elementItems)

' Here we specify the path to the elements we want, in this case the td elements: start at tbody, go down to tr, then down to td
' This returns a sequence of td elements
Dim eleList = eleDoc.XPathSelectElements("/tbody/tr/td")

' Now we can reference the td we want as a list element. We want the third item (index 2, since indexes start at zero), which holds the value 3.88 USD
MessageBox.Show(eleList.ToList()(2).Value)



Hope this helps. :)
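
A slightly shorter variant, offered as a sketch rather than as part of the original answer: XPath positions are 1-based, so `td[3]` selects the third cell directly and no list indexing is needed. The module name `ThirdCellDemo` is made up for illustration, the markup is a shortened copy of the sample from the question, and it writes to the console instead of a message box so it can run as a plain console program.

```vbnet
Imports System
Imports System.Xml.Linq
Imports System.Xml.XPath

Module ThirdCellDemo
    Sub Main()
        ' Shortened copy of the sample markup from the question, kept on one line
        Dim elementItems As String = "<tbody><tr><td>3.88 USD</td><td>0.16 USD</td><td>3.88 USD</td><td>27.14 USD</td></tr></tbody>"

        Dim eleDoc As XDocument = XDocument.Parse(elementItems)

        ' XPath positions start at 1, so td[3] is the third cell
        Dim cell As XElement = eleDoc.XPathSelectElement("/tbody/tr/td[3]")

        ' XPathSelectElement returns Nothing when no element matches the path
        If cell IsNot Nothing Then
            Console.WriteLine(cell.Value)
        Else
            Console.WriteLine("No matching td found")
        End If
    End Sub
End Module
```

The `Nothing` check matters when the loaded markup does not have the shape the path expects; indexing into an empty list would throw instead.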

#3 JoeBobJr   User is offline

Posted 24 December 2017 - 09:15 PM

I can't seem to get it to work. I'm pulling the data from a webpage with regex, putting it into the elementItems string, and then using the rest of your code. After it completes I try to display the result in a textbox, but it stays empty and never shows anything.

How do I see the list that the XPath query creates, so I can tell whether the code has gotten that far? I'm not sure how to debug this other than stepping through each line and checking its result as I go.

I forgot to mention that I've displayed the string in a textbox right after pulling it from the webpage, and that portion is working. I'm just not sure how to debug the rest since I've never used XPath.

#4 Martyr2   User is offline

Posted 24 December 2017 - 09:18 PM

Put in a breakpoint. When execution hits it, you should see your "Locals" window pop up. There you can see the values of all variables and expand them to see what they hold. Just so you know, the code I provided works, so if you are not getting anything, perhaps you are not loading the string correctly, or perhaps you are loading too much. In my example I am loading just the tbody element. If you are loading more or less than that, you will have to adjust your select string (the /tbody/tr/td part) to reflect the structure of the markup you are loading.
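
If stepping through with the debugger is unfamiliar, one low-tech check is to print how many elements the XPath query matched. This is only a sketch: `DumpMatches` and `XPathDebugDemo` are hypothetical names, the markup is a shortened copy of the sample, and it assumes the string being loaded is well-formed XML.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq
Imports System.Xml.XPath

Module XPathDebugDemo
    ' Hypothetical helper: prints how many elements matched and each one's value
    Sub DumpMatches(doc As XDocument, path As String)
        Dim matches = doc.XPathSelectElements(path).ToList()
        Console.WriteLine(path & " matched " & matches.Count & " element(s)")
        For i As Integer = 0 To matches.Count - 1
            Console.WriteLine("  [" & i & "] " & matches(i).Value)
        Next
    End Sub

    Sub Main()
        Dim doc As XDocument = XDocument.Parse("<tbody><tr><td>3.88 USD</td><td>0.16 USD</td><td>3.88 USD</td></tr></tbody>")

        ' Path that mirrors the loaded structure
        DumpMatches(doc, "/tbody/tr/td")

        ' Path with the wrong root element, so nothing matches
        DumpMatches(doc, "/table/tbody/tr/td")
    End Sub
End Module
```

A count of zero means the select string does not reflect the structure that was actually loaded, which is the adjustment described above.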

#5 JoeBobJr   User is offline

Posted 24 December 2017 - 09:21 PM

Well, I'm pulling a </thead> tag in front of it as well, but once I have the code in the string I use Replace to get rid of it, so it should match the code I posted.

#6 Martyr2   User is offline

Posted 24 December 2017 - 09:23 PM

OK, well, that is part of the problem then: don't pull in that </thead>. You want the markup to consist of complete elements, not the ending of some other element. Notice that in my example everything is contained within the <tbody></tbody> tags.

XPath is much like the folder/file structure you are used to in things like Windows Explorer. tbody is our root folder, and you navigate down through the elements the way you follow folders to locate a file: /tbody/tr/td is much like c:/tbody/tr/td (you can think of it that way). By having the </thead> in there, you are including the ending of some other node.
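
To make the point about complete tags concrete, here is a small sketch (not from the original posts) showing that `XDocument.Parse` rejects a fragment that starts with a stray `</thead>` and accepts it once the stray tag is stripped. The `fragment` value is a shortened stand-in for the scraped text, and `StrayTagDemo` is a made-up module name.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.Linq

Module StrayTagDemo
    Sub Main()
        ' Shortened stand-in for the scraped text, including the stray closing tag
        Dim fragment As String = "</thead><tbody><tr><td>3.88 USD</td></tr></tbody>"

        Try
            XDocument.Parse(fragment)
            Console.WriteLine("Parsed with the stray tag")
        Catch ex As XmlException
            ' A closing tag with no matching opening tag is not well-formed XML
            Console.WriteLine("Parse failed: " & ex.Message)
        End Try

        ' Strip the stray tag so that tbody is the single root element
        Dim cleaned As String = fragment.Replace("</thead>", "").Trim()
        Dim doc As XDocument = XDocument.Parse(cleaned)
        Console.WriteLine("Root element: " & doc.Root.Name.LocalName)
    End Sub
End Module
```

Wrapping the parse in a `Try` block also surfaces the error message, which is easier to act on than a result that never shows up.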

This post has been edited by Martyr2: 24 December 2017 - 09:26 PM


#7 JoeBobJr   User is offline

Posted 24 December 2017 - 09:28 PM

I have to pull the </thead> with it. There are about 5 or 6 different <tbody> tags in the whole webpage, and the only way I can pull the exact code I need is to pull that </thead> as well; otherwise it will grab data from other tables.

Okay, now that you've said the tags need to be complete, I'll pull more than I need so that I get complete tags. I'll pull the entire div section and try that. Thanks for your help; if I can't get it to work I'll let you know!

#8 Martyr2   User is offline

Posted 24 December 2017 - 09:28 PM

Well, pull it, but strip it out before you load the string into the code I showed you. You want to pare it down to just the markup I am showing you. I don't see why you need to have </thead> in the string that you load into your parser.

#9 JoeBobJr   User is offline

Posted 24 December 2017 - 09:42 PM

I got it to work by pulling just <tbody> through </tbody>, but the only reason that worked is that it was the first <tbody> tag in the webpage source. There are about 5-6 other <tbody> tags in the webpage, which is why I was trying to pull something else along with it to make the match unique, so that it couldn't pull anything but the info I wanted. Thanks for this! By far the coolest way I've ever seen parsing done on HTML tags. Genius!
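
When a page holds several <tbody> elements, the XPath expression itself can pick the one that is wanted, so the string does not need a unique neighbor like </thead>. The following is a sketch under assumptions: the `section` markup (a div wrapping two tables) is invented for illustration, `FirstTbodyDemo` is a made-up name, and the approach only works if the chunk being loaded is well-formed XML, which real-world HTML often is not.

```vbnet
Imports System
Imports System.Xml.Linq
Imports System.Xml.XPath

Module FirstTbodyDemo
    Sub Main()
        ' Hypothetical page section: a div holding two tables, each with its own tbody
        Dim section As String =
            "<div>" &
            "<table><tbody><tr><td>3.88 USD</td><td>0.16 USD</td><td>3.88 USD</td></tr></tbody></table>" &
            "<table><tbody><tr><td>1.00 USD</td><td>2.00 USD</td><td>9.99 USD</td></tr></tbody></table>" &
            "</div>"

        Dim doc As XDocument = XDocument.Parse(section)

        ' (//tbody)[1] picks the first tbody in document order; td[3] is its third cell
        Dim cell As XElement = doc.XPathSelectElement("(//tbody)[1]/tr/td[3]")

        If cell IsNot Nothing Then
            Console.WriteLine(cell.Value)
        End If
    End Sub
End Module
```

Changing the `[1]` to another position would target a different table, again counting from 1 in document order.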