7 Replies - 5956 Views - Last Post: 29 October 2012 - 06:44 AM

#1 complete

WebRequest and WebResponse has issues

Posted 23 October 2012 - 05:53 PM

I wrote a C# program that uses WebRequest and WebResponse to implement a simple web crawler, and I discovered something about web sites. Web browsers such as IE and Firefox let you view the HTML source code, but it seems that the HTML sent to the browser is one thing and what the browser interprets and displays is something else. For example, if you run the same Google search in IE and in Firefox, the source you see in IE will NOT have the hyperlinks and content from the search results, but the source you see in Firefox does. So my question is this: how do you configure WebRequest and WebResponse to return the content after it is processed by the browser instead of before?


Replies To: WebRequest and WebResponse has issues

#2 Sergio Tapia

Posted 23 October 2012 - 07:20 PM

Look into appending a User-Agent to the WebRequest. ;) That should get you started.

http://msdn.microsof....useragent.aspx

Trick the web server into thinking you're Chrome, Firefox or whatever.
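
A minimal sketch of that suggestion, assuming the request is an HttpWebRequest. The class name, the example.com URL and the User-Agent string below are placeholders for illustration, not values from this thread; substitute whatever string your own browser actually sends. Nothing goes over the network until GetResponse() is called.

```csharp
using System;
using System.Net;

class UserAgentSketch
{
    // Builds a request that identifies itself with the given User-Agent.
    // The request is only configured here; it is not sent.
    public static HttpWebRequest Build(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        return request;
    }

    static void Main()
    {
        // Illustrative browser-like string (assumption, not from the thread).
        var request = Build(
            "http://example.com/",
            "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0");

        // Show the header value that would be sent with the request.
        Console.WriteLine(request.UserAgent);
    }
}
```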

This post has been edited by Sergio Tapia: 23 October 2012 - 07:22 PM


#3 complete

Posted 24 October 2012 - 01:04 AM

Sergio Tapia, on 23 October 2012 - 08:20 PM, said:

Look into appending a User-Agent to the WebRequest. ;) That should get you started.

http://msdn.microsof....useragent.aspx

Trick the web server into thinking you're Chrome, Firefox or whatever.

Interesting. I wonder what exactly a UserAgent is. That link does not provide much of a clue but, yeah, it is a starting point.

#4 Curtis Rutland

Posted 24 October 2012 - 09:36 AM

Hi, I've moved your topic to the C# forum. The Advanced Discussion forum is for discussion topics, not help topics.

The user agent string is a string all browsers send as a header in their requests, identifying themselves. It tells a server what type of browser is making the request. Of course, user agents are a big gigantic mess, as you'll see with a little research (IE's agent string has "Mozilla" in it, for example).

Now, I do have a question for you. Are you using the View Source feature in Firefox, or are you using the Inspect Element feature? If you're viewing source, you're actually viewing the HTML that was sent back to the browser, before any "processing" has been done to it. If you're using the Inspect Element feature of Chrome/Firefox (also in IE; it should be available in all the browsers by hitting F12), you are seeing the rendered HTML, after the browser has run all its scripts. You're actually seeing the page structure after it has been altered.

There's a major distinction there: if you're seeing different sources, that means the server is inspecting the user agent and sending different HTML back to different browsers. You'd have to send a user agent string that matches a browser's to get the same results.

If, however, you're seeing different rendered HTML, there's no way to get to that directly from the WebRequest, because the WebRequest/Response don't use a browser. They don't have any capability to parse and process HTML/CSS/JavaScript. All they can do is return the data the server sent to you. That can be an issue on some sites, since they don't necessarily return all their data in the initial request; many will return JavaScript that will run to fetch more data. There's basically no way for the WebRequest/Response to deal with that. You'd need to debug the page you're looking at, watch the web requests it makes, and simulate those requests yourself.
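
To make the "return the data the server sent" point concrete, here is a rough sketch that reads a response body as text. The class name, the default URL and the User-Agent string are made up for illustration, and the reader assumes the body is UTF-8 text. Any script blocks in the page come back as plain characters; nothing is parsed or run.

```csharp
using System;
using System.IO;
using System.Net;

class RawHtmlSketch
{
    // Returns the response body exactly as it was sent.
    // <script> blocks are returned as text; they are never executed.
    public static string Fetch(string url, string userAgent)
    {
        WebRequest request = WebRequest.Create(url);

        // Only HTTP(S) requests carry a User-Agent header.
        var http = request as HttpWebRequest;
        if (http != null)
            http.UserAgent = userAgent;

        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream)) // assumes UTF-8 text
        {
            return reader.ReadToEnd();
        }
    }

    static void Main(string[] args)
    {
        // Hypothetical default URL; pass a real one as the first argument.
        string url = args.Length > 0 ? args[0] : "http://example.com/";
        string html = Fetch(url, "Mozilla/5.0 (compatible; CrawlerSketch/0.1)");
        Console.WriteLine(html);
    }
}
```

If the page loads its results through a later script-driven request, the same helper can be pointed at that request's URL once you have found it in the browser's network tools.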

#5 complete

Posted 25 October 2012 - 01:14 PM

In Firefox, I am using the "View Page Source" feature. So I am viewing the HTML before any processing has been done to it.

That is why the sources are different. IE shows the HTML after JavaScript processing has been done, and this JavaScript processing is hiding the search engine results that I want to see programmatically.

I have a theory as to how the JavaScript is hiding the content. The content changes immediately after this HTML line:

<script>if(google.j.b)document.body.style.visibility='hidden';
</script>


If I am able to set the web request somehow to hidden, I would get back the HTML before it is processed. How would I do that?

#6 Curtis Rutland

Posted 25 October 2012 - 01:28 PM

You're still somewhat confused.

Quote

That is why the sources are different. IE shows the HTML after JavaScript processing has been done, and this JavaScript processing is hiding the search engine results that I want to see programmatically.


That's not true. If you're using the View Source feature of IE, it's still just showing you the text of the server's response.

If you're using the F12 developer's tools, then you're seeing the actual document object model representation.

Again, if you're getting different results in different browsers, it's likely because each browser sends a different user agent string with its request. Have you looked into that, and tried to send one along with your request?

Also, you have to understand: WebRequest/Response will never give you "processed" HTML. Web browsers do that. The WebRequest/Response classes aren't browsers. They don't interpret the HTML/JS that they get back. They simply make the response available to you as text, along with some of the other HTTP information.

Perhaps you should post these different results here so we can see the difference.

Also, this may be a dumb question, but if you're using Google... are you going to Google.com in both browsers? Firefox has a home page with Google on it, but that's not Google's actual page, so the results may actually be different.

#7 complete

Posted 27 October 2012 - 09:45 PM

I think one browser shows the HTML straight from the server and the other browser shows the content after it has been processed by the browser.

Do you think that this is wrong?

I will post the differences in the HTML if you still think that is important. It is something that anyone can see easily.

#8 Curtis Rutland

Posted 29 October 2012 - 06:44 AM

Yes, I know that's wrong. As I've said twice before, if you're using the View Source command, you're not looking at the rendered DOM HTML, you're looking at what the server streamed back to the client.

I dislike repeating myself. If you don't trust what I'm saying, feel free to find other sources of help that'll tell you what you want to hear.
