Web Page Source Code

 

Web Page Source Code, Accessing the source code of a web page to retrieve some data

SigurdSuhm

9 Sep, 2008 - 12:53 PM
Post #1

Hey again DreamInCode.

I have a slight problem. I've been given the challenge to make an application retrieve some data from a couple of web sites and throw it together in a .txt file. Does anyone have a good method to retrieve a web page's source code and store it as a string or similar so I can scan through it for the information I need?
I've been looking at the browser ActiveX control and scanning the web for tutorials or posts on this matter but so far without any luck.

Thanks in advance
Sigurd



JackOfAllTrades

RE: Web Page Source Code

9 Sep, 2008 - 01:08 PM
Post #2

Google for "scraper C#"


jacobjordan

RE: Web Page Source Code

9 Sep, 2008 - 01:13 PM
Post #3

Super simple my friend. You can use the System.Net.WebClient class. For example, to get the source of this topic on Dream In Code:
csharp

System.Net.WebClient client = new System.Net.WebClient();
client.BaseAddress = "http://www.dreamincode.net";

That sets the base address that relative URIs are resolved against; no connection is made until you actually request something. Then, use this method:
csharp

string source = client.DownloadString("http://www.dreamincode.net/forums/showtopic63212.htm");

That will return a string with the source of "http://www.dreamincode.net/forums/showtopic63212.htm", which happens to be this topic. There is also a similar method in the same class called DownloadFile(), which downloads the data and writes it straight to a file. I used Dream In Code as an example, but you can do this with any site.
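
Putting the pieces together for the original question (fetch a few pages, pull something out of each, write it to a .txt file) might look like the sketch below. The class name PageScraper, its method names, and the choice of the page title as the data to extract are all hypothetical, picked only to make the example concrete. A regular expression is enough for a simple, predictable pattern; it is not a general HTML parser.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original posts.
public static class PageScraper
{
    // Downloads the page at the given URL and returns its source as a string.
    public static string Download(string url)
    {
        // WebClient implements IDisposable, so release it when done.
        using (WebClient client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Returns the text of the first <title> element, or an empty string if none is found.
    public static string ExtractTitle(string html)
    {
        Match match = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value.Trim() : string.Empty;
    }

    // Writes one extracted line per URL to a text file, overwriting any existing file.
    public static void SaveTitles(string[] urls, string outputPath)
    {
        using (StreamWriter writer = new StreamWriter(outputPath))
        {
            foreach (string url in urls)
            {
                writer.WriteLine(ExtractTitle(Download(url)));
            }
        }
    }
}
```

Calling PageScraper.SaveTitles with an array of URLs and a path such as "titles.txt" would produce one line per page. Swapping ExtractTitle for a pattern that matches the data you actually need is the only part that should have to change.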

This post has been edited by jacobjordan: 9 Sep, 2008 - 01:17 PM

PsychoCoder

RE: Web Page Source Code

9 Sep, 2008 - 01:32 PM
Post #4

Another approach is to use the WebRequest class along with the StreamReader class, which gives you more control over the request and over how the response is read.

csharp

public string GetURLData()
{
    try
    {
        // Create a new WebRequest object
        System.Net.WebRequest request = System.Net.WebRequest.Create("http://www.dreamincode.net/forums/showtopic63212.htm");

        // Dispose the response and the reader even if reading throws
        using (System.Net.WebResponse response = request.GetResponse())
        using (System.IO.StreamReader stream = new System.IO.StreamReader(response.GetResponseStream()))
        {
            // StringBuilder to hold info from the request
            System.Text.StringBuilder builder = new System.Text.StringBuilder();

            // Read each line exactly once; ReadLine returns null at the end of the stream
            string line;
            while ((line = stream.ReadLine()) != null)
            {
                // Skip blank lines and keep the line breaks between the rest
                if (line.Length > 0) builder.AppendLine(line);
            }

            // Return the information
            return builder.ToString();
        }
    }
    catch (System.Exception)
    {
        // Put your error handling here
        return string.Empty;
    }
}
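
If you don't need to filter lines, a shorter variant is to read the whole response in one call with ReadToEnd. The sketch below also shows the kind of control this approach gives you over the request, here a timeout and a User-Agent header. The class name PageReader, the timeout value, and the User-Agent string are made up for illustration, and the code assumes the page is UTF-8 encoded.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, not part of the original posts.
public static class PageReader
{
    // Reads an entire stream as text using the given encoding.
    public static string ReadAll(Stream input, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(input, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Fetches a page with an explicit timeout and User-Agent header.
    public static string Fetch(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 10000;                  // milliseconds; arbitrary value
        request.UserAgent = "ExampleScraper/1.0"; // hypothetical identifier

        using (WebResponse response = request.GetResponse())
        {
            // Assumption: the page is UTF-8; pass another Encoding if it is not.
            return ReadAll(response.GetResponseStream(), Encoding.UTF8);
        }
    }
}
```

Keeping the stream-reading step in its own method means it can be exercised with a MemoryStream, without touching the network.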


SigurdSuhm

RE: Web Page Source Code

10 Sep, 2008 - 02:50 AM
Post #5

Wow. Thanks a lot for the great responses. I think this should get the job done.

Sigurd


SigurdSuhm

RE: Web Page Source Code

10 Sep, 2008 - 07:45 AM
Post #6

The presented solutions all seem to work fine, but another problem has come up. The site I'm accessing requires a login. Even though the user is logged in in a browser, the C# application always gets the source code of the login screen. Is there a workaround for this? Possibly allowing the user to log in through the application, or something even easier?

// Edit:
I've been digging through the web page's source code and I can see that the login form calls another page called login_exec.asp. Of course an ID and a password are submitted. I don't know if this helps.

This post has been edited by SigurdSuhm: 10 Sep, 2008 - 07:54 AM
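
The application gets the login screen because it does not share the browser's cookies; each request it makes arrives without a session. One common way around this is to post the login form from the application itself and reuse a single CookieContainer for the later requests. The sketch below rests on several assumptions: the form field names "ID" and "Password" are guesses based on the description above and need to be checked against the input elements of the real login form, the class name LoginSession is hypothetical, and the site is assumed to use a plain cookie-based form login with no extra hidden fields or tokens.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class; the form field names are assumptions.
public class LoginSession
{
    // One container shared by every request so the session cookie is sent back.
    private readonly CookieContainer cookies = new CookieContainer();

    // Builds the URL-encoded form body for the login request.
    public static string BuildLoginBody(string id, string password)
    {
        return "ID=" + Uri.EscapeDataString(id)
             + "&Password=" + Uri.EscapeDataString(password);
    }

    // Posts the credentials to the login page, e.g. the full URL of login_exec.asp.
    public void Login(string loginUrl, string id, string password)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildLoginBody(id, password));

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = cookies;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        // Completing the request stores any cookies the server sets in the container.
        using (request.GetResponse()) { }
    }

    // Fetches a page using the cookies collected during Login.
    public string Get(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be to create one LoginSession, call Login once with the address of login_exec.asp and the user's credentials, and then call Get for each page that needs scraping. If Get still returns the login screen, the form most likely expects different field names or additional fields.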