
Web Page Source Code, Accessing the source code of a web page to retrieve some data

SigurdSuhm
9 Sep, 2008 - 12:53 PM
Post #1

Hey again DreamInCode.

I have a slight problem. I've been given the challenge to make an application retrieve some data from a couple of web sites and throw it together in a .txt file. Does anyone have a good method to retrieve a web page's source code and store it as a string or similar so I can scan through it for the information I need?
I've been looking at the browser ActiveX control and scanning the web for tutorials or posts on this matter but so far without any luck.

Thanks in advance
Sigurd



JackOfAllTrades
9 Sep, 2008 - 01:08 PM
Post #2

Google for "scraper C#"


jacobjordan
9 Sep, 2008 - 01:13 PM
Post #3

Super simple, my friend. You can use the System.Net.WebClient class. For example, to get the source of this topic on Dream In Code:
csharp

System.Net.WebClient client = new System.Net.WebClient();
client.BaseAddress = "http://www.dreamincode.net";

That sets the base address that relative URLs are resolved against. It does not open a connection yet, and you can skip it if you always pass full URLs. Then use this method:
csharp

string source = client.DownloadString("http://www.dreamincode.net/forums/showtopic63212.htm");

That will return a string with the source of "http://www.dreamincode.net/forums/showtopic63212.htm", which happens to be this topic. There is also a similar method in the same class called DownloadFile(), which downloads the data and writes it straight into a file. I used Dream In Code as an example, but you can do this with any site.
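
Put together as a complete program, the two calls might look like the sketch below. The class name, the FetchSource helper, and the output file names are made up for illustration. BaseAddress is left out because a full URL is passed each time.

```csharp
using System;
using System.IO;
using System.Net;

class DownloadExample
{
    // Hypothetical helper: returns the source of the page at 'url' as a string.
    public static string FetchSource(string url)
    {
        // WebClient is IDisposable, so release it when done.
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    static void Main()
    {
        // The topic URL used in the post above; any absolute URL works the same way.
        string url = "http://www.dreamincode.net/forums/showtopic63212.htm";

        // Get the source as a string and save it to a text file (placeholder name).
        string source = FetchSource(url);
        Console.WriteLine("Downloaded {0} characters.", source.Length);
        File.WriteAllText("topic.txt", source);

        // Or let WebClient fetch the page and write it straight to disk.
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(url, "topic_copy.htm");
        }
    }
}
```

DownloadString is the better fit if you want to scan the text before saving anything. DownloadFile saves you a step if you only need the raw page on disk.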

This post has been edited by jacobjordan: 9 Sep, 2008 - 01:17 PM

PsychoCoder
9 Sep, 2008 - 01:32 PM
Post #4

Another approach, which gives you more control over the request (headers, timeouts, cookies), is to use the WebRequest class along with the StreamReader class.

csharp

public string GetURLData()
{
    try
    {
        //create a new WebRequest object
        System.Net.WebRequest request = System.Net.WebRequest.Create("http://www.dreamincode.net/forums/showtopic63212.htm");

        //StringBuilder to hold info from the request
        System.Text.StringBuilder builder = new System.Text.StringBuilder();

        //create StreamReader to read the response; the using blocks
        //close the response and the reader even if reading fails
        using (System.Net.WebResponse response = request.GetResponse())
        using (System.IO.StreamReader stream = new System.IO.StreamReader(response.GetResponseStream()))
        {
            string line;

            //now loop through the response; ReadLine returns null at the end
            while ((line = stream.ReadLine()) != null)
            {
                //skip blank lines, and read each line only once
                //so that no line gets thrown away
                if (line.Length > 0) builder.AppendLine(line);
            }
        }

        //return the information
        return builder.ToString();
    }
    catch (System.Exception)
    {
        //put your error handling here
        return string.Empty;
    }
}
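
Once the page source is in a string, picking a value out of it can be as simple as searching for the markup around it. The helper below is only a sketch: the names SourceScanner and ExtractBetween and the sample markup are made up for illustration, and for messy real-world pages a proper HTML parser is usually the safer choice.

```csharp
using System;

static class SourceScanner
{
    // Returns the text between the first occurrence of 'start' and the next
    // occurrence of 'end' after it, or null if either marker is missing.
    public static string ExtractBetween(string source, string start, string end)
    {
        int from = source.IndexOf(start, StringComparison.OrdinalIgnoreCase);
        if (from < 0) return null;
        from += start.Length;

        int to = source.IndexOf(end, from, StringComparison.OrdinalIgnoreCase);
        if (to < 0) return null;

        return source.Substring(from, to - from);
    }
}

class ScanExample
{
    static void Main()
    {
        // Sample markup standing in for the string returned by GetURLData().
        string html = "<html><head><title>Web Page Source Code</title></head></html>";

        // prints: Web Page Source Code
        Console.WriteLine(SourceScanner.ExtractBetween(html, "<title>", "</title>"));
    }
}
```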


SigurdSuhm
10 Sep, 2008 - 02:50 AM
Post #5

Wow. Thanks a lot for the great responses. I think this should get the job done.

Sigurd


SigurdSuhm
10 Sep, 2008 - 07:45 AM
Post #6

The presented solutions all seem to work fine, but another problem has come up. The site I'm accessing requires a login. Even though the user is logged in in a browser, the C# application always just gets the source code of the login screen. Is there a workaround for this? Possibly allowing the user to log in through the application, or something even easier?

// Edit:
I've been digging through the page's source code and I can see that the login form submits to another page called login_exec.asp. Of course, an ID and a password are submitted with it. Don't know if this helps.
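
One common way to handle this kind of situation is to have the application submit the login form itself and hold on to the session cookie it gets back, since the application does not share the browser's cookies. The sketch below rests on several assumptions: the host www.example.com, the form field names ID and Password, the page members.asp, and the credentials are all placeholders, and the real names have to be read from the login form's HTML. It also assumes the site tracks the login with cookies.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class LoginExample
{
    // Builds an application/x-www-form-urlencoded body from name/value pairs.
    public static string BuildFormBody(params string[] namesAndValues)
    {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i + 1 < namesAndValues.Length; i += 2)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(namesAndValues[i]));
            body.Append('=');
            body.Append(Uri.EscapeDataString(namesAndValues[i + 1]));
        }
        return body.ToString();
    }

    // Sends the request and returns the response body as a string.
    static string ReadBody(HttpWebRequest request)
    {
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // One cookie container shared by both requests keeps the session alive.
        CookieContainer cookies = new CookieContainer();

        // 1. Post the credentials to the page the login form submits to.
        //    Host and field names are assumptions, not taken from a real site.
        HttpWebRequest login = (HttpWebRequest)WebRequest.Create("http://www.example.com/login_exec.asp");
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;

        byte[] data = Encoding.UTF8.GetBytes(BuildFormBody("ID", "myUser", "Password", "myPassword"));
        login.ContentLength = data.Length;
        using (Stream body = login.GetRequestStream())
        {
            body.Write(data, 0, data.Length);
        }
        ReadBody(login); // the response text is not needed, only the cookies

        // 2. Request the protected page (placeholder URL) with the same cookies.
        HttpWebRequest page = (HttpWebRequest)WebRequest.Create("http://www.example.com/members.asp");
        page.CookieContainer = cookies;
        Console.WriteLine(ReadBody(page));
    }
}
```

If the login form contains hidden fields, those usually need to be posted along with the ID and password.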

This post has been edited by SigurdSuhm: 10 Sep, 2008 - 07:54 AM
