
#1 euverve


How to extract dates from web page

Posted 13 December 2011 - 09:56 PM

I want to extract today's date from

http://www.whatsthedatetoday.com/

The expected output is: Wednesday, December 14th, 2011

Could anybody help with sample code?

Any help is appreciated.

Replies To: How to extract dates from web page

#2 Martyr2


Re: How to extract dates from web page

Posted 13 December 2011 - 11:01 PM

Try the WebRequest object and its WebResponse object. Wrap the response stream in a StreamReader and read in the web page. Once you have done this, you can parse the string any way you like.

' Create a request object
Dim reqObj As System.Net.WebRequest = System.Net.WebRequest.Create("http://www.whatsthedatetoday.com/")

' Get a response from the request
Dim response As System.Net.WebResponse = reqObj.GetResponse()

' Get the underlying stream of that response and wrap it in a reader
Dim responseStream As System.IO.Stream = response.GetResponseStream()
Dim streamRead As New System.IO.StreamReader(responseStream)

' Read the whole stream into a string
Dim strContent As String = streamRead.ReadToEnd()

' Release the reader and the connection when done
streamRead.Close()
response.Close()



Once you have the page content as a string, you can search for the tag that holds the date, split the string, or pull the value out with regular expressions or string functions.
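As a rough sketch of the regular-expression route: the module below looks for a "Weekday, Month 14th, 2011" style date in a string. The names DateExtractDemo and ExtractLongDate are made up for illustration, and the sample string is a hypothetical fragment, not the site's real markup. If the page wraps the "th" suffix in its own tag, strip the tags first.

```vb
Imports System
Imports System.Text.RegularExpressions

Module DateExtractDemo
    ' Returns the first "Weekday, Month 14th, 2011" style date found in text,
    ' or an empty string when nothing matches.
    Function ExtractLongDate(ByVal text As String) As String
        Dim pattern As String = _
            "(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),\s+" & _
            "[A-Z][a-z]+\s+\d{1,2}(st|nd|rd|th)?,\s+\d{4}"
        Dim m As Match = Regex.Match(text, pattern)
        If m.Success Then Return m.Value
        Return ""
    End Function

    Sub Main()
        ' Hypothetical page fragment; the real page may be laid out differently.
        Dim sample As String = "<p>Today's Date is: Wednesday, December 14th, 2011 GMT</p>"
        Console.WriteLine(ExtractLongDate(sample))
    End Sub
End Module
```

In practice you would pass strContent (or a tag-stripped copy of it) to ExtractLongDate instead of the sample string.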

:)

#3 euverve


Re: How to extract dates from web page

Posted 14 December 2011 - 01:54 AM

I have already done the same thing you suggest. What I want to know is how to apply regular expressions so that I can avoid lengthy code.

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        MsgBox(getcurrentdate())
    End Sub

    Public Function getcurrentdate() As String
        Try
            Dim sData = readpage("http://www.whatsthedatetoday.com/")

            'Process HTML source removing tags
            Dim str As String = sData
            sData = Mid(str, str.IndexOf("oday's Date is:"), str.IndexOf("The Date on your Computer is:"))

            Dim Data = sData.IndexOf("GMT")
            sData = CleanSpace(StripHTMLTags(Mid(sData, 1, Data)))

            sData = Replace(sData, " th ", "", 1)
            sData = Replace(sData, " st ", "", 1)
            sData = Replace(sData, " nd ", "", 1)
            sData = Replace(sData, " rd ", "", 1)
            sData = Replace(sData, "  ", "", 1)

            Dim tab As Char = ControlChars.Tab
            TextBox1.Text = Replace(sData, tab.ToString(), "", 1)

            'Cleanup Unused Lines
            TextBox1.Lines = sData.Split(New Char() {ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)

            Dim strLen As Integer
            For i As Integer = 0 To TextBox1.Lines.Length - 1
                If TextBox1.Lines(i).Contains("Monday") Or _
                    TextBox1.Lines(i).Contains("Tuesday") Or _
                    TextBox1.Lines(i).Contains("Wednesday") Or _
                    TextBox1.Lines(i).Contains("Thursday") Or _
                    TextBox1.Lines(i).Contains("Friday") Or _
                    TextBox1.Lines(i).Contains("Saturday") Or _
                    TextBox1.Lines(i).Contains("Sunday") Then
                    strLen = Len(TextBox1.Lines(i).ToString)
                End If
            Next
            Return Mid(sData.Remove(0, strLen), 1, strLen - 1)
        Catch ex As Exception
            Return ""
        End Try
    End Function

    'Get Source of the HTML page
    Public Shared Function readpage(ByVal url As String) As String
        Dim Str As System.IO.Stream
        Dim srRead As System.IO.StreamReader
        Try
            ' make a Web request
            Dim req As System.Net.WebRequest = System.Net.WebRequest.Create(url)
            Dim resp As System.Net.WebResponse = req.GetResponse
            Str = resp.GetResponseStream
            srRead = New System.IO.StreamReader(Str)

            ' read all the text 
            Dim Data As String = srRead.ReadToEnd
            Return Data
        Catch ex As Exception
            Return "Unable to download content"
        End Try
    End Function

    'Remove HTML Tags
    Public Shared Function StripHTMLTags(ByVal HTMLToStrip As String) As String
        Dim stripped As String
        If HTMLToStrip <> "" Then
            stripped = System.Text.RegularExpressions.Regex.Replace(HTMLToStrip, "<(.|\n)+?>", " ")
            Return stripped
        Else
            Return ""
        End If
    End Function

    'Remove Double Spaces
    Public Shared Function CleanSpace(ByVal strIn As String) As String
        ' // Remove leading or trailing spaces
        strIn = Trim(strIn)
        ' // Replace all double space pairings with single spaces
        Do While InStr(strIn, "  ") > 0
            strIn = Replace(strIn, "  ", " ")
        Loop
        ' // Return the result
        CleanSpace = strIn
    End Function
End Class




Is there any way to use this code without needing a TextBox?
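One possible way to drop the TextBox, as a minimal sketch: split the cleaned text into a plain String array and scan that instead of TextBox1.Lines. The names NoTextBoxDemo and FindDateLine are hypothetical, and the sample text only imitates what the tag-stripped page might look like.

```vb
Imports System
Imports Microsoft.VisualBasic

Module NoTextBoxDemo
    Private ReadOnly DayNames() As String = _
        {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"}

    ' Returns the first non-empty line that mentions a weekday name,
    ' or an empty string when no line does.
    Function FindDateLine(ByVal text As String) As String
        Dim lines() As String = text.Split(New Char() {ControlChars.Lf}, _
                                           StringSplitOptions.RemoveEmptyEntries)
        For Each line As String In lines
            For Each dayName As String In DayNames
                If line.Contains(dayName) Then Return line.Trim()
            Next
        Next
        Return ""
    End Function

    Sub Main()
        ' Assumed shape of the cleaned page text; the real text may differ.
        Dim sample As String = "Today's Date is:" & vbLf & vbLf & _
                               " Wednesday, December 14, 2011 " & vbLf & "GMT"
        Console.WriteLine(FindDateLine(sample))
    End Sub
End Module
```

The Split call is the same one used above for TextBox1.Lines, so the array can stand in for the control directly.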

#4 Bolter99


Re: How to extract dates from web page

Posted 14 December 2011 - 02:48 AM

You could use a WebBrowser control and put this in its DocumentCompleted event:

MsgBox(WebBrowser1.Document.GetElementById("content").InnerText)
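For context, here is a hedged sketch of how that line might sit inside a complete form. The class name DateForm and the field name browser are made up for illustration, and it assumes the page really has an element with the id "content". It needs a Windows Forms project and a live connection, so treat it as an outline, not tested code.

```vb
Imports System
Imports System.Windows.Forms

Public Class DateForm
    Inherits Form

    Private WithEvents browser As New WebBrowser()

    Public Sub New()
        browser.Dock = DockStyle.Fill
        Controls.Add(browser)
    End Sub

    Protected Overrides Sub OnLoad(ByVal e As EventArgs)
        MyBase.OnLoad(e)
        ' Start loading the page once the form is ready.
        browser.Navigate("http://www.whatsthedatetoday.com/")
    End Sub

    Private Sub browser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles browser.DocumentCompleted
        ' The event can fire more than once (for example, per frame),
        ' so wait until the whole document has finished loading.
        If browser.ReadyState <> WebBrowserReadyState.Complete Then Return

        Dim element As HtmlElement = browser.Document.GetElementById("content")
        If element Is Nothing Then
            MessageBox.Show("No element with id 'content' was found.")
        Else
            MessageBox.Show(element.InnerText)
        End If
    End Sub
End Class

Module Program
    <STAThread()> _
    Sub Main()
        Application.Run(New DateForm())
    End Sub
End Module
```

The Nothing check matters because GetElementById returns Nothing when the id is missing, and reading InnerText would then throw.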

