14 Replies - 1712 Views - Last Post: 12 February 2013 - 04:34 PM

#1 Mycro

  • New D.I.C Head

Reputation: 0
  • Posts: 1
  • Joined: 06-February 13

Getting a webpage title using httpwebrequest

Posted 06 February 2013 - 12:36 PM

Hello everyone, I've been playing with this for a few hours now and haven't gotten any results.

I'm trying to use an HttpWebRequest and HttpWebResponse to get what is between the <title></title> tags of a website.

Whenever I try to get back what I'm looking for, I get a blank result.

For testing, I used the following code (although I would rather not use a WebBrowser), and it still failed.

        Me.WebBrowser1.Navigate(New Uri("http://www.yahoo.com"))

        Me.ListBox1.Items.Add("Type" & "-->" & "Name")

        For Each element As HtmlElement In Me.WebBrowser1.Document.All
            Me.ListBox1.Items.Add(element.TagName() & "-->" & element.Id)
        Next

    End Sub



This code is supposed to list the elements of a web page (I was going to pick out the title element from them), but it won't even list them and throws an error on
For Each element As HtmlElement In Me.WebBrowser1.Document.All

saying "Object reference not set to an instance of an object."

I was only using this for testing and would rather not do it this way.

Any suggestions on where to start would be much appreciated. Thanks.


Replies To: Getting a webpage title using httpwebrequest

#2 lar3ry

  • Coding Geezer

Reputation: 310
  • Posts: 1,290
  • Joined: 12-September 12

Re: Getting a webpage title using httpwebrequest

Posted 06 February 2013 - 09:50 PM

First of all, when you use a WebBrowser, you can't just call Navigate and then look for the results, because the web page text has not arrived yet.

In Visual Studio, double-click on the WebBrowser control, and you will generate a DocumentCompleted event handler. Within that handler, you can do whatever you'd like. In your case, you want to look at WebBrowser1.DocumentTitle.

Once you have a document, whether you get it with a WebBrowser control or an HttpWebRequest, you can retrieve the title the same way.

Here's a snippet from one of my programs...
    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim page As String = WebBrowser1.DocumentText
        If WebBrowser1.DocumentTitle.Contains("AWWS - METAR / TAF") Then
            For Each element As HtmlElement In WebBrowser1.document.GetElementsByTagName("input")
                If element.GetAttribute("name") = "Stations" Then
                    element.InnerText = "CYQR"
                    Exit For
                End If
            Next
      '...



#3 LilGhost

  • D.I.C Head

Reputation: 8
  • Posts: 98
  • Joined: 12-October 12

Re: Getting a webpage title using httpwebrequest

Posted 07 February 2013 - 02:37 PM

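Imports System.Net
Imports System.IO
Imports System.Text.RegularExpressions

' Assumes this class is a Form with a Button named GetDatGhettoTitle
Public Class MUAHAHAHAHHAHAHA

    Dim letsGetBaked As New CookieContainer

    ' The Sub needs a different name from the button it handles
    Private Sub GetDatGhettoTitle_Click(sender As Object, e As EventArgs) Handles GetDatGhettoTitle.Click
        ' WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest
        Dim reckMe As HttpWebRequest = DirectCast(WebRequest.Create(New Uri("http://www.yahoo.com")), HttpWebRequest)
        reckMe.CookieContainer = letsGetBaked
        reckMe.Method = "GET"
        reckMe.UserAgent = "TROLOL THIS AINT A BROWSER SILLY BOY"
        ' Follow redirects; otherwise a 3xx reply has no page (and no title) to parse
        reckMe.AllowAutoRedirect = True

        Dim datData As String
        Using IgotWrecked As HttpWebResponse = DirectCast(reckMe.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(IgotWrecked.GetResponseStream())
                datData = reader.ReadToEnd()
            End Using
        End Using

        ' Lazy .*? stops at the first </title>; group 1 captures just the text between the tags
        Dim regMeGently As New Regex("<title[^>]*>(.*?)</title>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim daTitle As String = "OMNOMNOM - No Title Present"

        Dim datMatch As Match = regMeGently.Match(datData)
        If datMatch.Success Then
            daTitle = datMatch.Groups(1).Value.Trim()
        End If

        MsgBox(daTitle)
    End Sub

End Class

' LilGhost was here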



That's how I'd do it (didn't test it, just winging it). If nothing else, that should put you on the right track.

Oh, and the lack of formatting is pure laziness on my part; I didn't want to paste this into VB so it would autocorrect the alignment xD. I'll let you do that, since I just wrote all of that for you.

This post has been edited by LilGhost: 07 February 2013 - 02:39 PM


#4 LilGhost

  • D.I.C Head

Reputation: 8
  • Posts: 98
  • Joined: 12-October 12

Re: Getting a webpage title using httpwebrequest

Posted 07 February 2013 - 02:47 PM

lar3ry, on 06 February 2013 - 09:50 PM, said:

First of all, when you use a WebBrowser, you can't just call Navigate and then look for the results, because the web page text has not arrived yet.

In Visual Studio, double-click on the WebBrowser control, and you will generate a DocumentCompleted event handler. Within that handler, you can do whatever you'd like. In your case, you want to look at WebBrowser1.DocumentTitle.

Once you have a document, whether you get it with a WebBrowser control or an HttpWebRequest, you can retrieve the title the same way.

Here's a snippet from one of my programs...
    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim page As String = WebBrowser1.DocumentText
        If WebBrowser1.DocumentTitle.Contains("AWWS - METAR / TAF") Then
            For Each element As HtmlElement In WebBrowser1.document.GetElementsByTagName("input")
                If element.GetAttribute("name") = "Stations" Then
                    element.InnerText = "CYQR"
                    Exit For
                End If
            Next
      '...


I don't believe your approach will work, because in HTML you set the title as "<title>Text</title>", and thus the title object has no attributes.

#5 lar3ry

  • Coding Geezer

Reputation: 310
  • Posts: 1,290
  • Joined: 12-September 12

Re: Getting a webpage title using httpwebrequest

Posted 08 February 2013 - 09:26 PM

LilGhost, on 07 February 2013 - 03:47 PM, said:

I don't believe your approach will work, because in HTML you set the title as "<title>Text</title>", and thus the title object has no attributes.

Are you suggesting that my (working) program will not work? Did you actually READ the code? I am not getting the title by searching for attributes. I am getting the title by reading a property of the WebBrowser, i.e. WebBrowser1.DocumentTitle. You can also get it from a property of an HTML document, as in Htmldocument.Title

Edit: VERY weird! I capitalized the D in Htmldocument.Title, but it keeps dropping to lower case, as it does in this sentence.

This post has been edited by lar3ry: 08 February 2013 - 09:30 PM


#6 LilGhost

  • D.I.C Head

Reputation: 8
  • Posts: 98
  • Joined: 12-October 12

Re: Getting a webpage title using httpwebrequest

Posted 09 February 2013 - 09:26 AM

lar3ry, on 08 February 2013 - 09:26 PM, said:

LilGhost, on 07 February 2013 - 03:47 PM, said:

I don't believe your approach will work, because in HTML you set the title as "<title>Text</title>", and thus the title object has no attributes.

Are you suggesting that my (working) program will not work? Did you actually READ the code? I am not getting the title by searching for attributes. I am getting the title by reading a property of the WebBrowser, i.e. WebBrowser1.DocumentTitle. You can also get it from a property of an HTML document, as in Htmldocument.Title

Edit: VERY weird! I capitalized the D in Htmldocument.Title, but it keeps dropping to lower case, as it does in this sentence.


Upon rereading what you supplied to him, you are correct. I was looking at:
For Each element As HtmlElement In WebBrowser1.Document.GetElementsByTagName("input")
    If element.GetAttribute("name") = "Stations" Then
        element.InnerText = "CYQR"
        Exit For
    End If
Next


and didn't notice the If statement it was wrapped in.

On another note the OP did say:

Quote

(I would not like to use a WebBrowser)


So, a webrequest would still be a better approach.

This post has been edited by LilGhost: 09 February 2013 - 09:27 AM


#7 lar3ry

  • Coding Geezer

Reputation: 310
  • Posts: 1,290
  • Joined: 12-September 12

Re: Getting a webpage title using httpwebrequest

Posted 09 February 2013 - 12:08 PM

LilGhost, on 09 February 2013 - 10:26 AM, said:

On another note the OP did say:

Quote

(I would not like to use a WebBrowser)


So, a webrequest would still be a better approach.

I agree, and that's probably why I also said:

Quote

Once you have a document, whether you get it with a WebBrowser control or an HttpWebRequest, you can retrieve the title the same way.


#8 LilGhost

  • D.I.C Head

Reputation: 8
  • Posts: 98
  • Joined: 12-October 12

Re: Getting a webpage title using httpwebrequest

Posted 09 February 2013 - 05:21 PM

lar3ry, on 09 February 2013 - 12:08 PM, said:

LilGhost, on 09 February 2013 - 10:26 AM, said:

On another note the OP did say:

Quote

(I would not like to use a WebBrowser)


So, a webrequest would still be a better approach.

I agree, and that's probably why I also said:

Quote

Once you have a document, whether you get it with a WebBrowser control or an HttpWebRequest, you can retrieve the title the same way.

Well, as we can see, I failed to read your post xD. Question: if you get it as a web request and stream the response in as a string, how would you convert that into a loop of HtmlElements without loading it into a WebBrowser?

#9 lar3ry

  • Coding Geezer

Reputation: 310
  • Posts: 1,290
  • Joined: 12-September 12

Re: Getting a webpage title using httpwebrequest

Posted 09 February 2013 - 09:53 PM

LilGhost, on 09 February 2013 - 06:21 PM, said:

Well, as we can see, I failed to read your post xD. Question: if you get it as a web request and stream the response in as a string, how would you convert that into a loop of HtmlElements without loading it into a WebBrowser?

After a little research, I guess I wouldn't. It seems that HtmlDocument is tied to the WebBrowser.
So, if the OP insists on not using a WebBrowser, I guess he'll need to get the title in a Stringish manner.

That being said, I would use the WebBrowser if I needed to easily extract stuff from the HTML source, or to interact with objects on the form (buttons, textboxes, etc.). And if I didn't want to see the WebBrowser on my form, I would just create one programmatically, as in Dim WB As New WebBrowser, then call .Navigate, and in the .DocumentCompleted event handler work with WB.Document, which is already an HtmlDocument.
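For the "Stringish" route without any WebBrowser, here is a minimal sketch that uses only String.IndexOf. TitleByIndexOf is a hypothetical helper name, and it assumes the first <title ...> tag in the source is the page title; a "<title" written inside a script or comment would fool it.

```vbnet
Imports System

Module StringishTitle

    ' Hypothetical helper: returns the text between the first <title ...> and the next </title>,
    ' or Nothing when either tag is missing. Case-insensitive, no regex, no WebBrowser.
    Function TitleByIndexOf(html As String) As String
        Dim openStart As Integer = html.IndexOf("<title", StringComparison.OrdinalIgnoreCase)
        If openStart < 0 Then Return Nothing

        ' Skip to the end of the opening tag, which may carry attributes such as lang="en".
        Dim openEnd As Integer = html.IndexOf(">"c, openStart)
        If openEnd < 0 Then Return Nothing

        Dim closeStart As Integer = html.IndexOf("</title>", openEnd + 1, StringComparison.OrdinalIgnoreCase)
        If closeStart < 0 Then Return Nothing

        ' Everything between the two tags, minus surrounding whitespace.
        Return html.Substring(openEnd + 1, closeStart - openEnd - 1).Trim()
    End Function

    Sub Main()
        Console.WriteLine(TitleByIndexOf("<html><head><TITLE lang=""en"">  Yahoo!  </TITLE></head></html>"))
    End Sub

End Module
```

In practice you would pass it the string read from the HttpWebResponse stream.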

This post has been edited by lar3ry: 09 February 2013 - 09:55 PM


#10 LilGhost

  • D.I.C Head

Reputation: 8
  • Posts: 98
  • Joined: 12-October 12

Re: Getting a webpage title using httpwebrequest

Posted 10 February 2013 - 12:29 PM

lar3ry, on 09 February 2013 - 09:53 PM, said:

LilGhost, on 09 February 2013 - 06:21 PM, said:

Well, as we can see, I failed to read your post xD. Question: if you get it as a web request and stream the response in as a string, how would you convert that into a loop of HtmlElements without loading it into a WebBrowser?

After a little research, I guess I wouldn't. It seems that HtmlDocument is tied to the WebBrowser.
So, if the OP insists on not using a WebBrowser, I guess he'll need to get the title in a Stringish manner.

That being said, I would use the WebBrowser if I needed to easily extract stuff from the HTML source, or to interact with objects on the form (buttons, textboxes, etc.). And if I didn't want to see the WebBrowser on my form, I would just create one programmatically, as in Dim WB As New WebBrowser, then call .Navigate, and in the .DocumentCompleted event handler work with WB.Document, which is already an HtmlDocument.

If you want to mix the request and the browser, you could be so awesome as to get the response text as a string and then set the programmatically generated browser's .DocumentText property to that text. However, I still stand behind my WebRequest & regex solution.

#11 andrewsw

  • It's just been revoked!

Reputation: 3608
  • Posts: 12,399
  • Joined: 12-December 12

Re: Getting a webpage title using httpwebrequest

Posted 11 February 2013 - 04:15 AM

Quote

If you get it as a web request and stream the response in as a string, how would you convert that into a loop of HtmlElements without loading it into a WebBrowser?

In the following, I convert a WebResponse into a string, load it into an IHTMLDocument (htmlDocument within the function textFromHtml()), and then retrieve the title as:

htmldocument.title.toString()

Imports mshtml      'add reference
Imports System.Net
Imports System.IO
Imports System.Windows.Forms    'add reference

    Sub Main()
        strWebPage = getHTML("http://allenbrowne.com")
        strWebPage = textFromHtml(strWebPage)
        strWebPage = stripTags(strWebPage)
        Console.Write(strWebPage)
    End Sub

    Private Function getHTML(ByVal address As String) As String
        Dim RT As String = ""
        Dim WRequest As WebRequest
        Dim WResponse As WebResponse
        Dim SR As StreamReader

        WRequest = WebRequest.Create(address)
        WResponse = WRequest.GetResponse
        SR = New StreamReader(WResponse.GetResponseStream)
        RT = SR.ReadToEnd()
        SR.Close()
        Return RT
    End Function

    Function textFromHtml(ByVal htmlToParse As String) As String
        Dim htmlDocument As IHTMLDocument = New mshtml.HTMLDocument
        Dim sCollect As String = ""

        htmldocument.write(htmlToParse)
        htmldocument.close()
        System.Windows.Forms.MessageBox.Show(htmldocument.title.ToString)

        Dim allElements As IHTMLElementCollection = htmldocument.body.all
        Dim sTags() As String = {"P", "DIV", "SPAN", "H1", "H2", "H3"}
        For Each elem As IHTMLElement In allElements
            Dim sTagUpper As String = elem.tagName.ToUpper()
            If sTags.Contains(sTagUpper) Then
                sCollect += elem.innerText
                If sTagUpper <> "SPAN" Then
                    sCollect += Constants.vbCrLf
                End If
            End If
        Next

        Return sCollect
    End Function

    Public Function stripTags(ByVal htmlToParse As String) As String
        Return Text.RegularExpressions.Regex.Replace(htmlToParse, "<[^>]*>", "")
    End Function

This post has been edited by andrewsw: 11 February 2013 - 04:16 AM


#12 andrewsw

  • It's just been revoked!

Reputation: 3608
  • Posts: 12,399
  • Joined: 12-December 12

Re: Getting a webpage title using httpwebrequest

Posted 11 February 2013 - 04:29 AM

Here is the full code version (a Console App) with Option Strict Off and the variable strWebPage declared:

Option Strict Off

Imports mshtml      'add reference
Imports System.Net
Imports System.IO
Imports System.Windows.Forms    'add reference

Module Module1
    Sub Main()
        Dim strWebPage As String

        strWebPage = getHTML("http://allenbrowne.com")
        strWebPage = textFromHtml(strWebPage)
        strWebPage = stripTags(strWebPage)
        Console.Write(strWebPage)

        Console.ReadKey()
    End Sub

    Private Function getHTML(ByVal address As String) As String
        Dim RT As String = ""
        Dim WRequest As WebRequest
        Dim WResponse As WebResponse
        Dim SR As StreamReader

        WRequest = WebRequest.Create(address)
        WResponse = WRequest.GetResponse
        SR = New StreamReader(WResponse.GetResponseStream)
        RT = SR.ReadToEnd()
        SR.Close()
        Return RT
    End Function

    Function textFromHtml(ByVal htmlToParse As String) As String
        Dim htmlDocument As IHTMLDocument = New mshtml.HTMLDocument
        Dim sCollect As String = ""

        htmldocument.write(htmlToParse)
        htmldocument.close()
        System.Windows.Forms.MessageBox.Show(htmldocument.title.ToString)

        Dim allElements As IHTMLElementCollection = htmldocument.body.all
        Dim sTags() As String = {"P", "DIV", "SPAN", "H1", "H2", "H3"}
        For Each elem As IHTMLElement In allElements
            Dim sTagUpper As String = elem.tagName.ToUpper()
            If sTags.Contains(sTagUpper) Then
                sCollect += elem.innerText
                If sTagUpper <> "SPAN" Then
                    sCollect += Constants.vbCrLf
                End If
            End If
        Next

        Return sCollect
    End Function

    Public Function stripTags(ByVal htmlToParse As String) As String
        Return Text.RegularExpressions.Regex.Replace(htmlToParse, "<[^>]*>", "")
    End Function

End Module


Don't forget to add the two references (mshtml and System.Windows.Forms).

#13 LilGhost

  • D.I.C Head

Reputation: 8
  • Posts: 98
  • Joined: 12-October 12

Re: Getting a webpage title using httpwebrequest

Posted 11 February 2013 - 02:56 PM

andrewsw, on 11 February 2013 - 04:15 AM, said:

Quote

If you get it as a web request and stream the response in as a string, how would you convert that into a loop of HtmlElements without loading it into a WebBrowser?

In the following, I convert a WebResponse into a string, load it into an IHTMLDocument (htmlDocument within the function textFromHtml()), and then retrieve the title as:

htmldocument.title.toString()


as much fun as that is, once you have your webrequest's response, it'd be easier to just say this:
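Dim browse As New WebBrowser

Sub LoadAndPrepare(ByVal htmlSource As String)
    AddHandler browse.DocumentCompleted, AddressOf StripTitle
    browse.ScriptErrorsSuppressed = True
    browse.DocumentText = htmlSource
    ' THIS WILL TRIP THE STRIPTITLE SUB once the document has loaded
End Sub

Sub StripTitle(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
    ' Detach again so repeated LoadAndPrepare calls don't stack handlers
    RemoveHandler browse.DocumentCompleted, AddressOf StripTitle
    MsgBox(browse.DocumentTitle)
End Sub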



Thanks for your help though. And at the end of the day, I'd still regex it using: "<title>.*</title>"
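A sketch of why the pattern "<title>.*</title>" can surprise you: .* is greedy, so on a line that contains a second </title> (for example an inline SVG icon's title) the match runs to the last one. It is also case-sensitive and, without RegexOptions.Singleline, cannot cross a line break. The lazy pattern below is an assumed improvement, not something posted in the thread; GreedyTitle and LazyTitle are hypothetical names, and WebUtility.HtmlDecode (System.Net, .NET 4.0 and later) turns entities such as &amp; back into text.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module RegexTitle

    ' The pattern from the thread: .* is greedy, so it runs to the LAST </title> on the line.
    ' It is case-sensitive and, without Singleline, cannot cross a line break.
    Private ReadOnly GreedyPattern As New Regex("<title>.*</title>")

    ' Assumed improvement: .*? stops at the FIRST </title>, [^>]* allows attributes,
    ' IgnoreCase matches <TITLE>, Singleline lets . match line breaks.
    Private ReadOnly LazyPattern As New Regex("<title[^>]*>(.*?)</title>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    Function GreedyTitle(html As String) As String
        Dim m As Match = GreedyPattern.Match(html)
        If Not m.Success Then Return Nothing
        ' Same tag stripping as the code earlier in the thread
        Return m.Value.Replace("<title>", "").Replace("</title>", "")
    End Function

    Function LazyTitle(html As String) As String
        Dim m As Match = LazyPattern.Match(html)
        If Not m.Success Then Return Nothing
        ' Group 1 holds only the text between the tags; decode entities such as &amp;
        Return WebUtility.HtmlDecode(m.Groups(1).Value).Trim()
    End Function

    Sub Main()
        Dim page As String = "<head><title>Tom &amp; Jerry</title></head><svg><title>icon</title></svg>"
        Console.WriteLine(GreedyTitle(page))  ' Tom &amp; Jerry</head><svg>icon
        Console.WriteLine(LazyTitle(page))    ' Tom & Jerry
    End Sub

End Module
```

On a typical page with one title on its own line both patterns agree; the difference only shows up in cases like the one in Main.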

This post has been edited by LilGhost: 11 February 2013 - 02:57 PM


#14 andrewsw

  • It's just been revoked!

Reputation: 3608
  • Posts: 12,399
  • Joined: 12-December 12

Re: Getting a webpage title using httpwebrequest

Posted 11 February 2013 - 03:10 PM

Quote

as much fun as that is, once you have your webrequest's response, it'd be easier to just say this:


I left in a lot of code which isn't directly related to extracting the title. It could be stripped down to just a few lines.

I suspect it would be more efficient as well, but this is just a suspicion. A little testing would decide this, I suppose.
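Stripped down to just the title, the mshtml approach might look like the sketch below. It assumes Windows and a COM reference to the Microsoft HTML Object Library (mshtml). It casts to IHTMLDocument2, which exposes write, close and title directly, so it compiles with Option Strict On instead of relying on late binding through IHTMLDocument. TitleFromHtml is a hypothetical name.

```vbnet
Option Strict On

Imports System

Module MshtmlTitle

    ' Requires a reference to the COM "Microsoft HTML Object Library" (mshtml); Windows only.
    ' IHTMLDocument2 exposes write/close/title, so no late binding is needed.
    Function TitleFromHtml(html As String) As String
        Dim doc As mshtml.IHTMLDocument2 = DirectCast(New mshtml.HTMLDocument(), mshtml.IHTMLDocument2)
        doc.write(html)   ' parse the string that came back from the WebRequest
        doc.close()
        Return doc.title  ' the parsed <title> text
    End Function

    <STAThread>
    Sub Main()
        Console.WriteLine(TitleFromHtml("<html><head><title>Allen Browne</title></head><body></body></html>"))
    End Sub

End Module
```

You would call it with the string returned by getHTML in the full console version earlier in the thread.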

This post has been edited by andrewsw: 11 February 2013 - 03:11 PM


#15 LilGhost

  • D.I.C Head

Reputation: 8
  • Posts: 98
  • Joined: 12-October 12

Re: Getting a webpage title using httpwebrequest

Posted 12 February 2013 - 04:34 PM

andrewsw, on 11 February 2013 - 03:10 PM, said:

Quote

as much fun as that is, once you have your webrequest's response, it'd be easier to just say this:


I left in a lot of code which isn't directly related to extracting the title. It could be stripped down to just a few lines.

I suspect it would be more efficient as well, but this is just a suspicion. A little testing would decide this, I suppose.


We can run tests later, but it's really quite pointless to test, as I doubt speed matters down to the millisecond.
