5 Replies - 783 Views - Last Post: 23 April 2014 - 05:35 PM

#1 texasdeck

HttpWebRequest redirects back to Login redirect page

Posted 22 April 2014 - 05:33 PM

I have been struggling with HttpWebRequest/HttpWebResponse for the better part of a week now, am no closer to a solution than when I started, and could really use some help.

I have looked at literally hundreds of pages and searched all over the place, but have not figured out how to get this to work correctly. I am amazed there isn't a good example of this anywhere (or at least I have been unable to find one yet). I am sure it is something I have overlooked, or I am just not grasping the solution.

So, what I am trying to do is scrape data from a secure site. Don't worry, it is not for nefarious purposes. It's for a game I play: I am part of a team, and within the site we have team data that we would like to share with each other (and store for future reference) without having to manually copy and paste. My goal is to create a web page where a teammate enters their game credentials; the page then logs in to the game website and retrieves the game data.

Here are the three main types of pages that I am dealing with and how they operate:
-There is a main login page. I am not accessing it via code yet, although I may need to in the future.

-There is a Login redirect page. When I attempt to access a secure page while not logged into the site, this page is displayed and lets me type in the username and password.

-Lastly, there are Data pages. Once you are logged into the site, you have access to a series of pages that contain game-related data (these are the pages I want to scrape).

Something to note:
I have used LiveHTTPHeaders to capture the string that is added to the URL, which shows up like this:
"textLogin=username&textPassword=password&token=xxxxxx&Logon=Login&LogonFake=Login";

I am unsure whether I have to mimic this or whether using the cookie within the HttpWebRequest will do this for me. The token is a fairly long string of characters, and I am unsure where it comes from; it doesn't match anything in the cookie. If I manually build a URL from the page location plus the string above and paste it into a browser's address bar, it goes directly to the desired page (logging me into the site behind the scenes).
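Whether or not the token turns out to be required, the POST body has to be encoded the way a browser encodes form fields: name=value pairs joined with &, with each name and value escaped, so a password containing & or = cannot split into extra fields. A minimal sketch using Uri.EscapeDataString; `FormBody` is a hypothetical helper name, and the field names are the ones from the LiveHTTPHeaders capture above. Browsers encode spaces in form data as +, while EscapeDataString produces %20; a standard form decoder should accept either.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
public static class FormBody
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        // Escape every name and value so '&', '=', spaces, etc. can't corrupt the body.
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? "")));
    }

    public static byte[] ToBytes(string body)
    {
        // After escaping, the body is plain ASCII, so ASCII encoding is safe.
        return Encoding.ASCII.GetBytes(body);
    }
}
```

For example, building the body from textLogin = bob and textPassword = a&b c yields `textLogin=bob&textPassword=a%26b%20c`, and `ToBytes` gives the bytes to write to the request stream along with `ContentLength`.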

Here is what I have coded so far, without any success:
    public void TestConnection()
    {
        //Build the form-encoded POST body (username and password only)
        ASCIIEncoding encoding = new ASCIIEncoding();
        string postData = "textLogin=" + txtUsername.Text + "&textPassword=" + txtPassword.Text;
        byte[] data = encoding.GetBytes(postData);

        //Make first request
        HttpWebRequest firstRequest = (HttpWebRequest)WebRequest.Create(targetURL);
        firstRequest.AllowAutoRedirect = false;
        firstRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
        firstRequest.Method = "POST";
        firstRequest.ContentType = "application/x-www-form-urlencoded";
        firstRequest.ContentLength = data.Length;
        
        //Add the cookie container that will be used to capture the cookie
        CookieContainer mycookie = new CookieContainer();
        firstRequest.CookieContainer = mycookie;

        //Create the request stream
        Stream firstRequestStream = firstRequest.GetRequestStream();
        firstRequestStream.Write(data, 0, data.Length);
        firstRequestStream.Close();

        //Build the response
        HttpWebResponse firstResponse = (HttpWebResponse)firstRequest.GetResponse();
        //Add any cookies that are found
        foreach (Cookie responseCookie in firstResponse.Cookies)
        {
            mycookie.Add(responseCookie);
        }

        //Make a second request adding the cookie container from first response
        HttpWebRequest secondRequest = (HttpWebRequest)WebRequest.Create(siteDomain + firstResponse.Headers["Location"]);
        secondRequest.KeepAlive = false;
        secondRequest.Method = "POST";
        secondRequest.ContentType = "application/x-www-form-urlencoded";
        secondRequest.ContentLength = data.Length;
        secondRequest.AllowAutoRedirect = true;
        secondRequest.CookieContainer = mycookie;

        //Make a second request stream
        Stream secondRequestStream = secondRequest.GetRequestStream();
        secondRequestStream.Write(data, 0, data.Length);
        secondRequestStream.Close();

        //Get the second response
        HttpWebResponse secondResponse = (HttpWebResponse)secondRequest.GetResponse();
        //Capture the HTML and show in the immediate window
        StreamReader sReader = new StreamReader(secondResponse.GetResponseStream());
        string HTML = sReader.ReadToEnd();
        Debug.Print (HTML);

    }



When it gets to the end of the procedure shown above, the string HTML contains the LoginRedirect page and not the desired Data page. I am uncertain why it isn't logging in and returning the correct page.
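A quick way to tell which page actually answered is `HttpWebResponse.ResponseUri`, which reports the final URL after any redirects were followed. The sketch below assumes the site sends anonymous requests back to a page named Login.asp (as in the placeholder URLs later in this thread); `LoginCheck.IsLoginPage` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: detects a bounce back to the login-redirect page.
// ASSUMPTION: the login page's path ends in /Login.asp.
public static class LoginCheck
{
    public static bool IsLoginPage(Uri responseUri)
    {
        // With AllowAutoRedirect = true, ResponseUri is the page that finally responded.
        return responseUri.AbsolutePath.EndsWith("/Login.asp", StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, if `LoginCheck.IsLoginPage(secondResponse.ResponseUri)` returns true, the session cookie was most likely not accepted.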

I am not sure whether the second request should use the GET method rather than POST. I don't quite understand the logic behind this kind of procedure; I am sure there is a better way to accomplish what I am trying to do. If you have a good example or know of one, I'd love to see it. I have come across quite a number of examples, but none of them really explain what they are doing or actually work with a secure site. Most examples of this kind of procedure truly suck and get you nowhere.
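For reference, a browser login like this usually follows the POST/redirect/GET pattern. The form is POSTed, the server answers with a 302 and a Location header, and the browser then GETs that location while sending the session cookie it just received. Below is a minimal sketch of doing those two steps by hand with HttpWebRequest. It assumes the login URL and an already-encoded form body come from a capture like the one above. `PostRedirectGet`, `ResolveLocation` and `LoginAndFollow` are hypothetical names. Resolving Location with `new Uri(baseUri, location)` handles a relative value as well as an absolute one, which is safer than concatenating it onto a domain string.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical sketch: POST the login form, then follow the redirect with a GET.
public static class PostRedirectGet
{
    // Resolve a Location header against the URL that returned it.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    public static string LoginAndFollow(Uri loginUri, string formBody, CookieContainer jar)
    {
        byte[] data = Encoding.ASCII.GetBytes(formBody);

        var post = (HttpWebRequest)WebRequest.Create(loginUri);
        post.Method = "POST";
        post.ContentType = "application/x-www-form-urlencoded";
        post.ContentLength = data.Length;
        post.CookieContainer = jar;      // Set-Cookie headers from the response land here
        post.AllowAutoRedirect = false;  // handle the 302 ourselves

        using (Stream body = post.GetRequestStream())
        {
            body.Write(data, 0, data.Length);
        }

        Uri next;
        using (var resp = (HttpWebResponse)post.GetResponse())
        {
            string location = resp.Headers[HttpResponseHeader.Location];
            if (location == null)
            {
                // No redirect: the server answered directly (often a failed login).
                using (var reader = new StreamReader(resp.GetResponseStream()))
                    return reader.ReadToEnd();
            }
            next = ResolveLocation(resp.ResponseUri, location);
        }

        // Like a browser, follow the redirect with a GET that has no body.
        var get = (HttpWebRequest)WebRequest.Create(next);
        get.Method = "GET";
        get.CookieContainer = jar;       // same container, so the session cookie is sent
        using (var resp = (HttpWebResponse)get.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Setting `AllowAutoRedirect = true` on the POST instead lets HttpWebRequest follow the redirect itself. The manual version is mainly useful for seeing each step.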

This has become extremely frustrating and any help would be most appreciated.


#2 Skydiver

Re: HttpWebRequest redirects back to Login redirect page

Posted 22 April 2014 - 08:21 PM

If you run Fiddler, what are the series of requests and responses that get sent when using a browser? How does that compare when using your program?
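If the program's requests don't appear in Fiddler alongside the browser's, one option is to point the request at Fiddler explicitly. This can happen, for example, when the code runs under IIS with a different account. The sketch below assumes Fiddler is listening on its default address, 127.0.0.1:8888. `FiddlerDebug.CreateThroughFiddler` is a hypothetical helper. For HTTPS traffic, Fiddler's HTTPS decryption must be enabled and its root certificate trusted before bodies and cookies become readable.

```csharp
using System;
using System.Net;

// Hypothetical helper: routes one request through a locally running Fiddler.
// ASSUMPTION: Fiddler listens on its default 127.0.0.1:8888.
public static class FiddlerDebug
{
    public static HttpWebRequest CreateThroughFiddler(Uri url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = new WebProxy("127.0.0.1", 8888); // Fiddler's default listener
        return request;
    }
}
```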

#3 texasdeck

Re: HttpWebRequest redirects back to Login redirect page

Posted 23 April 2014 - 12:16 PM

Skydiver, on 22 April 2014 - 09:21 PM, said:

If you run Fiddler, what are the series of requests and responses that get sent when using a browser? How does that compare when using your program?


Thanks for the reply. I'll have to look into it. At this point, I'd be willing to pay someone cash to fix this for me. I'll have to get back to you when I get home and have the ability to access Fiddler.

#4 Curtis Rutland

Re: HttpWebRequest redirects back to Login redirect page

Posted 23 April 2014 - 01:26 PM

I have a question first. Based on this:

Quote

I have used LiveHTTPHeaders to capture the string that is added to the URL, which shows up like this:
"textLogin=username&textPassword=password&token=xxxxxx&Logon=Login&LogonFake=Login";


Is that string actually added to the browser's URL? If so, that's GET (and it's a really badly designed site if it puts secure info like the password in the query string). If it's not actually part of the URL, but is sent in the body of the request, that's POST.

The second request probably should be GET, since you're not passing any particular data to an action.

But you should get Fiddler and track what happens on a normal login, then track what happens when your program tries to log in. See what's different. Make sure to inspect the headers on all requests, and check the cookies as well. That should show you what you have to do.

Then use breakpoints in your code to look at the cookies returned from the first request to make sure they match what you're expecting.
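Besides breakpoints, a small helper can print what the container will actually send for a given URL. That output is easy to compare with the Cookie header Fiddler shows for the browser's requests. The sketch below uses `CookieDump.Describe`, a hypothetical name.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper: lists the cookies a CookieContainer would send to a URL.
public static class CookieDump
{
    public static string Describe(CookieContainer jar, Uri url)
    {
        var sb = new StringBuilder();
        foreach (Cookie c in jar.GetCookies(url))
        {
            // One line per cookie, with the domain and path that decide where it is sent.
            sb.AppendLine(c.Name + "=" + c.Value + "; Domain=" + c.Domain + "; Path=" + c.Path);
        }
        return sb.ToString();
    }
}
```

For example, `Debug.Print(CookieDump.Describe(mycookie, new Uri(targetURL)));` right after the first response shows which cookies the second request would carry.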

#5 texasdeck

Re: HttpWebRequest redirects back to Login redirect page

Posted 23 April 2014 - 05:07 PM

No, it's not added to the URL. LiveHTTPHeaders (a Firefox add-on similar to Fiddler) shows it within the POST data, as you mentioned. You are correct: the first request is a POST and the second is a GET.

I was planning on using Fiddler when I get home, as Skydiver suggested. I tried to play around with it on my lunch hour, but unfortunately it doesn't work well at work.

I have revamped a lot of my code through trial and error and by dissecting the information that LiveHTTPHeaders shows, and I was finally able to get back the HTML of the desired web page.

It appears that I had a number of things wrong. The main one seems to have been that when I added the cookies to the container, only one of them was added instead of all of them (not sure why). I changed it from:
        foreach (Cookie responseCookie in firstResponse.Cookies)
        {
            mycookie.Add(responseCookie);
        }


to this:
        foreach (Cookie responseCookie in firstResponse.Cookies)
        {
            mycookie.Add(firstResponse.ResponseUri, responseCookie);
        }
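The two `CookieContainer.Add` overloads behave differently. `Add(Cookie)` requires the cookie's `Domain` to be set and throws `ArgumentException` when it is empty. `Add(Uri, Cookie)` fills in a missing domain and path from the URI. The snippet below only demonstrates that difference and does not prove why only one cookie was added originally. Also, when `request.CookieContainer` is set, HttpWebRequest already stores each response's cookies in that container itself, so the explicit loop is largely re-adding them. `CookieAddDemo` is a hypothetical name.

```csharp
using System;
using System.Net;

// Hypothetical demo of the two CookieContainer.Add overloads.
public static class CookieAddDemo
{
    // Add(Cookie) needs cookie.Domain; without it, ArgumentException is thrown.
    public static bool AddWithoutUriThrows()
    {
        var jar = new CookieContainer();
        try
        {
            jar.Add(new Cookie("session", "abc")); // no Domain set on this cookie
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Add(Uri, Cookie) takes the missing Domain/Path from the URI.
    public static int CountAfterAddWithUri(Uri site)
    {
        var jar = new CookieContainer();
        jar.Add(site, new Cookie("session", "abc"));
        return jar.GetCookies(site).Count;
    }
}
```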



--------------------------------
So here is all the code I used. I cleaned it up somewhat, removing the Debug.Print calls and commented-out code, and renamed a number of objects; hopefully you'll get the gist of it.

I added these at the top of the code-behind:
using System.Net;
using System.IO;
using System.Text;



Here is the button click event
    protected void btnSubmit_Click(object sender, EventArgs e)
    {
        TestConnection();
    }



Here is the main procedure. There are two TextBox controls (txtUsername and txtPassword) on the web page that I use. You'll notice that I have hard-coded the token that LiveHTTPHeaders showed being passed in (along with "&Logon=Login&LogonFake=Login"). I'll have to look into where the token is generated to see if I can capture it on my first request; for now I can live with it being hard-coded. Hopefully it will work for my teammates.

Here are two values that are used below (of course these are not the real addresses):
ThisIsTheRedirectedURLOfTheDesiredPage = "https://www.xyz123.com/Login.asp?Redirect=mydesiredpage.asp"

URLofDifferentPageThatIWouldLikeToScrape = "https://www.xyz123.com/mynextdesiredpage.asp"
    public void TestConnection()
    {

        CookieContainer mycookie = new CookieContainer();

        ASCIIEncoding encoding = new ASCIIEncoding();
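        //URL-encode the user input so characters such as & or = in a password can't break the form body
        string postData = "textLogin=" + Uri.EscapeDataString(txtUsername.Text)
            + "&textPassword=" + Uri.EscapeDataString(txtPassword.Text)
            + "&token=xxxxxxxxxxxxxxxxxxxxxxx&Logon=Login&LogonFake=Login";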
        byte[] data = encoding.GetBytes(postData);

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(ThisIsTheRedirectedURLOfTheDesiredPage);
        request.AllowAutoRedirect = true;
        request.KeepAlive = true;
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0";
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = data.Length;

        request.CookieContainer = mycookie;

        Stream RequestStream = request.GetRequestStream();
        RequestStream.Write(data, 0, data.Length);
        RequestStream.Close();

        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        foreach (Cookie responseCookie in response.Cookies)
        {
            mycookie.Add(response.ResponseUri, responseCookie);
        }
        StreamReader sReader = new StreamReader(response.GetResponseStream());
        string strHTML = sReader.ReadToEnd();
        //Closing the reader closes the response stream and frees the connection for the next request
        sReader.Close();
        
        //At this point the strHTML string contains the HTML of the desired page
        //One could just stop here and scrape out the data that you are trying to capture and exit

        //or you can continue on to the next section to grab additional data.
        //I'm thinking it may even be possible to keep the objects available (and capture when needed)
        //        until the user closes the browser.

        //----------------------------
        //This is to show how one would reuse the existing objects in order to capture the next page(s)

        request = (HttpWebRequest)WebRequest.Create(URLofDifferentPageThatIWouldLikeToScrape);
        request.KeepAlive = true;
        request.Method = "GET";
        //Every new request needs the same CookieContainer, otherwise the session cookies are not sent
        request.CookieContainer = mycookie;

        response = (HttpWebResponse)request.GetResponse();

        sReader = new StreamReader(response.GetResponseStream());
        strHTML = sReader.ReadToEnd();
        sReader.Close();

        //----------------------------
        //You can repeat this section to capture data from additional pages, as long as each request reuses the same CookieContainer (mycookie)
        //----------------------------
    }
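The hard-coded token could instead be read from the login page on a first GET. The sketch below assumes the site puts the token in a form field such as `<input type="hidden" name="token" value="...">`. That is a guess to confirm against the login page's HTML source. `LoginToken`, `ExtractInputValue` and `FetchToken` are hypothetical names. Regex matching of HTML is fragile, and the HtmlAgilityPack mentioned later in this thread would be a sturdier way to locate the field.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical sketch: read the login token from the login page instead of hard-coding it.
// ASSUMPTION: the token is an <input> whose name attribute is "token".
public static class LoginToken
{
    public static string ExtractInputValue(string html, string fieldName)
    {
        // Find the <input> tag whose name matches fieldName; attribute order doesn't matter.
        string tagPattern = "<input\\b[^>]*\\sname\\s*=\\s*[\"']" + Regex.Escape(fieldName) + "[\"'][^>]*>";
        Match tag = Regex.Match(html, tagPattern, RegexOptions.IgnoreCase);
        if (!tag.Success) return null;

        // Take that tag's value attribute and decode entities such as &amp;.
        Match value = Regex.Match(tag.Value, "\\svalue\\s*=\\s*[\"']([^\"']*)[\"']", RegexOptions.IgnoreCase);
        return value.Success ? WebUtility.HtmlDecode(value.Groups[1].Value) : null;
    }

    public static string FetchToken(Uri loginPage, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(loginPage);
        request.Method = "GET";
        request.CookieContainer = jar; // keep the session cookie; the token may be tied to it
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return ExtractInputValue(reader.ReadToEnd(), "token");
        }
    }
}
```

The returned value would replace the hard-coded token in postData. The login POST must then use the same CookieContainer that was passed to FetchToken.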



The next step is to clean this up quite a bit and see if I can convert it to a class. Quite honestly, I'm certain there is a better way to do this, and I would imagine there is some unnecessary code in there. Web development is certainly not my forte. While I was searching for ways to do this, I came across the InnerText property (I can't remember the object type). I would prefer to use that, since it may be easier to parse plain text than to parse HTML with regular expressions. So that will be one of my next endeavors.
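On converting it to a class, one possible shape is a small wrapper that owns a single CookieContainer and shares it with every request. This assumes the container should live as long as the scraping session. `GameSiteClient` is a hypothetical name. The login POST from the procedure above would become one more method that sets `request.CookieContainer = Cookies`. In an ASP.NET Web Forms page using in-process session state, an instance could be kept in `Session` between postbacks.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical wrapper: every request made through it shares one CookieContainer,
// so the session cookie set at login is sent with each later page request.
public class GameSiteClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies { get { return cookies; } }

    // Create a GET request that carries the shared cookies.
    public HttpWebRequest CreateGet(Uri url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = cookies;
        return request;
    }

    // Download a page's HTML and close the response so the connection is released.
    public string GetPage(Uri url)
    {
        using (var response = (HttpWebResponse)CreateGet(url).GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```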

Hopefully this helps anyone down the road that is running into this same sort of problem. Thanks guys for your help.

#6 texasdeck

Re: HttpWebRequest redirects back to Login redirect page

Posted 23 April 2014 - 05:35 PM

Just thought I'd add that I found the InnerText property I mentioned. It's part of the HtmlAgilityPack.

So one could use that to just snag the Text out of the HTML by doing the following (This would be inserted where the StreamReader is created):
...
        StreamReader sReader = new StreamReader(response.GetResponseStream());
        string strHTML = sReader.ReadToEnd();

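        //Requires the HtmlAgilityPack package; fully qualified to avoid clashing with other HtmlDocument types
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();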

        doc.LoadHtml(strHTML);
        string strInnerText = doc.DocumentNode.InnerText;
...



Hopefully, this will help others if need be.