9 Replies - 744 Views - Last Post: 27 February 2016 - 10:17 AM

#1 arortell   User is offline

  • D.I.C Head

Reputation: 5
  • View blog
  • Posts: 59
  • Joined: 26-August 14

Please help me understand this code.

Posted 04 February 2016 - 10:12 AM

Hey, I am working on a project for the CS50 online class from edX. They provided us with the skeleton of a web server, and we are supposed to fill in some of the functions they did not finish. I am going through the code they wrote because I want to understand every line before I get started. I think I understand most of this function except the offset variable. I just do not understand what it is meant to represent and why it must be less than or equal to 3. I wrote some comments in here to help me understand the lines I get, so hopefully they will help me understand the rest once I put it together.

The length is created in the main function and reset on every iteration of the infinite loop that keeps the server running, but it is not used in anything else. So I am assuming it is used to keep track of the bytes read from the socket across iterations of the loop, and bytes is the number of bytes read during this iteration, so by subtracting bytes from length they are calculating the difference between the total bytes and the bytes read this time. I hope I am explaining this okay. I just do not understand what that offset is and why it must be <= 3. I will post only the request function, as the whole program is over 1000 lines. Please let me know if you need more. Thank you for your time.
bool request(char** message, size_t* length)
{
	// <======================== MESSAGE WILL CONTAIN THE ENTIRE REQUEST ==========>
	// <======================= LENGTH IS # OF BYTES READ ===================> 
    // ensure socket is open
    if (cfd == -1)
    {
        return false;
    }
    // initialize message and its length
    *message = NULL;
    *length = 0;

    // read message --> length < (max size of the request + 4); the 4 is for \r\n\r\n, which signifies the end of the request headers; this is the max a request can be
    while (*length < LimitRequestLine + LimitRequestFields * LimitRequestFieldSize + 4) 
    {
        // read from socket
        BYTE buffer[BYTES];
        ssize_t bytes = read(cfd, buffer, BYTES);// bytes will contain # of bytes read from socket into buffer
        if (bytes < 0) // read failed (a negative return value signals an error, not 0 bytes read)
        {
            if (*message != NULL)
            {
                free(*message);
                *message = NULL;
            }
            *length = 0;
            break;
        }
		
        // append bytes to message 
        *message = (char*)realloc(*message, *length + bytes + 1);// 1 is for '\0'
        if (*message == NULL)
        {
            *length = 0;
            break;
        }
        memcpy(*message + *length, buffer, bytes); //add buffer onto end of message
        *length += bytes; // increase length by bytes

        // null-terminate message thus far
        *(*message + *length) = '\0'; // have 2 derefnce 2 times because message is **

        // search for CRLF CRLF // below (overall bytes read - bytes read this iteration )
        int offset = (*length - bytes < 3) ? *length - bytes : 3; /// (is 3 for "\r\n\o")
        char* haystack = *message + *length - bytes - offset; ///<-- haystack will point to where message was before this iteration - offset BUT WHAT IS OFFSET
        char* needle = strstr(haystack, "\r\n\r\n"); //now needle points to '\r\n\r\n'
        if (needle != NULL) /// end of request headers has been found
        {
            // trim to one CRLF and null-terminate
            *length = needle - *message + 2;  
            *message = (char*)realloc(*message, *length + 1);
            if (*message == NULL)
            {
                break;
            }
            *(*message + *length) = '\0';

            // ensure request-line is no longer than LimitRequestLine
            haystack = *message;
            needle = strstr(haystack, "\r\n"); ///<============ search 4 "\r\n"
            if (needle == NULL || (needle - haystack + 2) > LimitRequestLine)
            {
                break;
            }

            // count fields in message
            int fields = 0;
            haystack = needle + 2;
            while (*haystack != '\0')
            {
                // look for CRLF
                needle = strstr(haystack, "\r\n");
                if (needle == NULL)
                {
                    break;
                }

                // ensure field is no longer than LimitRequestFieldSize
                if (needle - haystack + 2 > LimitRequestFieldSize)
                {
                    break;
                }

                // look beyond CRLF
                haystack = needle + 2;
            }

            // if we didn't get to end of message, we must have erred
            if (*haystack != '\0')
            {
                break;
            }

            // ensure message has no more than LimitRequestFields
            if (fields > LimitRequestFields)
            {
                break;
            }

            // valid
            return true;
        }
    }

    // invalid
    if (*message != NULL)
    {
        free(*message);
    }
    *message = NULL;
    *length = 0;
    return false;
}


This post has been edited by arortell: 04 February 2016 - 10:16 AM



Replies To: Please help me understand this code.

#2 Skydiver   User is offline

  • Code herder
  • member icon

Reputation: 7189
  • View blog
  • Posts: 24,365
  • Joined: 05-May 12

Re: Please help me understand this code.

Posted 04 February 2016 - 12:16 PM

In my opinion, the comment on that line that says "/// (is 3 for "\r\n\o")" is wrong. I'm forming that opinion because, as far as I know, there is no C escape sequence '\o'. It could be a typo, and the original author meant '\0' instead of '\o'.

Assuming that it is a typo, that offset could be an attempt to deal with RFC 7230 compliant HTTP clients which may accidentally send '\0' to end a header field, but the original author is making a huge assumption that a complete header field was received.

Personally, I would always set offset to be the minimum of: the length of the previous message, or the length of the needle that you are looking for.

As an aside, it looks like that code leaks memory when realloc() fails to allocate a new chunk of memory.
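The leak comes from assigning the result of realloc() straight back to *message: if realloc() returns NULL, the old block is still allocated, but the only pointer to it has just been overwritten. A minimal sketch of the usual way around that, assuming a hypothetical helper named append_bytes that is not part of the CS50 code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not in the original server): appends n bytes from src
// to the NUL-terminated buffer *message, whose current length is *length.
// On allocation failure the old buffer is left untouched and is still owned
// by the caller, so it can be freed instead of leaked.
bool append_bytes(char **message, size_t *length, const char *src, size_t n)
{
    assert(message != NULL && length != NULL && src != NULL);

    // keep the result in a temporary so the old pointer is not lost
    char *grown = realloc(*message, *length + n + 1); // +1 for '\0'
    if (grown == NULL)
    {
        return false; // *message is still valid here
    }

    memcpy(grown + *length, src, n);
    *length += n;
    grown[*length] = '\0';
    *message = grown;
    return true;
}
```

With this shape, a caller that gets false back can still free(*message) in its cleanup path, which is what the "invalid" block at the bottom of request() already tries to do.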

#3 arortell   User is offline

Re: Please help me understand this code.

Posted 04 February 2016 - 12:38 PM

Yeah, that '\o' is a typo. I put that there while I was trying to figure out what that offset is there for. I am guessing that the 3 is for the '\r\n\0'. I read that '\r\n\r\n' signifies that the full request was received. Later in the function he trims off a '\r\n', leaving one behind, so I just assumed that adding the null terminating '\0' at the end plus the '\r\n' would be 3. Since it must be <= 3, I am assuming that either '\r' or '\n' or '\0' might be there. But the expression is not checking for negative numbers, so that is out. I just don't get it. Thanks for the response. By the way, the '\r\n' is a carriage return, right? If it is, how can that be in the middle of an HTTP request?

#4 Skydiver   User is offline

Re: Please help me understand this code.

Posted 04 February 2016 - 12:47 PM

In general, always refer to RFC 7230. What that code is doing is parsing the HTTP headers. (See section 3.)

     HTTP-message   = start-line
                      *( header-field CRLF )
                      CRLF
                      [ message-body ]



Notice on line 3 that the separator between the headers and the message body is a CRLF ("\r\n"). Also notice that every header line ends with a CRLF. Therefore, to find out where the headers end and where the message body starts, search for CRLFCRLF ("\r\n\r\n").
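As a rough illustration of that search, here is a small sketch. The function name find_body is made up for this example and does not appear in the CS50 code; it assumes the whole message is already in one NUL-terminated buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns a pointer to the first byte after the blank
// line that ends the header section, or NULL if "\r\n\r\n" has not arrived.
const char *find_body(const char *message)
{
    assert(message != NULL);

    // the last header's CRLF followed by the empty line's CRLF
    const char *end = strstr(message, "\r\n\r\n");
    return (end != NULL) ? end + 4 : NULL;
}
```

For "GET / HTTP/1.1\r\nHost: example.com\r\n\r\nbody" this would point at "body", and for a request that stops after the Host line it would return NULL.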

#5 Skydiver   User is offline

Re: Please help me understand this code.

Posted 04 February 2016 - 01:21 PM

This is the reason why magic numbers should be documented. Readers in the future will not be going "Why spoon cousin?" "Because it would hurt more." "Why the magic number 3?"

I just needed some time to get a snack, but I figured it out. Imagine what happens if the read call on line 19 of the posted function (the read(cfd, buffer, BYTES) call) always reads just one byte at a time. The search code only needs a maximum offset of 3 characters back into the previous message, because the current read will have read at least 1 character. This then sets everything up so that you can search at least 4 characters and find a potential match.

I would have written the code as something like:
const char EndOfHeaderSignature[] = "\r\n\r\n";
// At least one character will be received, so look back at most (signature length - 1)
int messageLookBackMax = strlen(EndOfHeaderSignature) - 1;
:
:
int previousMessageLength = *length - bytes;
// min(previousMessageLength, messageLookBackMax), written out because C has no standard min()
int previousMessageLookBack = (previousMessageLength < messageLookBackMax) ? previousMessageLength : messageLookBackMax;
char * messageBuffer = *message;
char * searchStart = &messageBuffer[previousMessageLength - previousMessageLookBack];
char * endFound = strstr(searchStart, EndOfHeaderSignature);
:
:
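To see the one-byte-at-a-time case in action, here is a small self-contained sketch. The Accumulator type and the feed and feed_one_byte_at_a_time functions are made up for this example; a fixed 256-byte buffer stands in for the realloc'd *message to keep it short:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIGNATURE "\r\n\r\n"
#define LOOKBACK  (sizeof(SIGNATURE) - 2) // strlen(SIGNATURE) - 1 == 3

// Hypothetical accumulator standing in for *message and *length.
typedef struct
{
    char data[256];
    size_t length;
} Accumulator;

// Appends one chunk and searches only the new bytes plus up to LOOKBACK
// bytes of older data. Returns the index of the signature, or -1 if it has
// not been seen yet (or if the chunk would not fit in the buffer).
long feed(Accumulator *acc, const char *chunk, size_t bytes)
{
    assert(acc != NULL && chunk != NULL);
    if (acc->length + bytes >= sizeof(acc->data))
    {
        return -1;
    }

    size_t previous = acc->length;
    memcpy(acc->data + previous, chunk, bytes);
    acc->length += bytes;
    acc->data[acc->length] = '\0';

    // go back at most LOOKBACK bytes, and never before the buffer start
    size_t offset = (previous < LOOKBACK) ? previous : LOOKBACK;
    const char *found = strstr(acc->data + previous - offset, SIGNATURE);
    return (found != NULL) ? (long)(found - acc->data) : -1;
}

// Delivers the request one byte per call, the worst case for the search.
long feed_one_byte_at_a_time(const char *request)
{
    Accumulator acc = { .length = 0 };
    for (size_t i = 0; request[i] != '\0'; i++)
    {
        long at = feed(&acc, &request[i], 1);
        if (at >= 0)
        {
            return at;
        }
    }
    return -1;
}
```

Even though every call searches only 4 bytes (3 old plus 1 new), feed_one_byte_at_a_time still reports the signature as soon as its final LF arrives.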



#6 arortell   User is offline

Re: Please help me understand this code.

Posted 06 February 2016 - 01:45 PM

I get it! Thanks a lot, I have spent a lot of time trying to figure this out. You are right, this is a class to help teach new programmers the proper way to write code. You would think they would have documented it better. But I think they never intended us to read this much into it, just to fill in the functions they wanted us to write. But I wanted to understand every line. I mean, how else are we to learn? Thank you again for your help, it is VERY much appreciated.

#7 arortell   User is offline

Re: Please help me understand this code.

Posted 06 February 2016 - 03:46 PM

So basically he is making sure that it goes back no more than 3 chars (he has 'typedef char BYTE;'), to make sure haystack is not pointing to '\r', '\n' or '\0' in the entire header that was read, excluding the buffer read THIS iteration. Because there will ALWAYS be a '\r\n' at the end of each header line, and every iteration he tacks on a '\0'.

This post has been edited by arortell: 06 February 2016 - 03:50 PM


#8 arortell   User is offline

Re: Please help me understand this code.

Posted 06 February 2016 - 04:05 PM

I understand that all that is needed to match the pattern is 3 chars. I am just guessing that he is making sure that nothing beyond the total 4 bytes is skipped during the matching. I am not sure I am explaining this correctly. If he made offset, let's say, 4, then 1 byte is read in. Then the last byte would be skipped and would not be searched by the strstr function.

This post has been edited by arortell: 06 February 2016 - 04:13 PM


#9 Skydiver   User is offline

Re: Please help me understand this code.

Posted 06 February 2016 - 05:11 PM

No. The offset is used to set the beginning of the haystack to guarantee that it will find the CRLFCRLF in case the previous receive ends with CRLFCR, and the current receive just read in the LF.

It uses 3 instead of 4 as an optimization, because if the needle were in the haystack on the previous iteration, it would have been found in the previous iteration. Since it was not, the only way to find the needle now is for the match to include at least one newly read character, which means it can start at most 3 characters before the new data.

Let's say the HTTP request looked like:
Host: www.dreamincode.net\r\n
Connection: keep-alive\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n
Upgrade-Insecure-Requests: 1\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36\r\n
Accept-Encoding: gzip, deflate, sdch\r\n
Accept-Language: en-US,en;q=0.8\r\n
\r\n
Fake message body
\r\n



On the first call to read(), only the following was read in:
Host: www.dreami



Host: www.dreami
^
|
haystack



On the next read we get:
ncode.net\r\n
Conne



The entire message buffer would look like:
Host: www.dreamincode.net\r\nConne



Host: www.dreamincode.net\r\nConne
             ^
             |
             haystack



Lather rinse repeat


The offset of 3 is meant to deal with the following case. Let's say we kept on reading and we got the following message buffer:
Host: www.dreamincode.net\r\n
Connection: keep-alive\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n
Upgrade-Insecure-Requests: 1\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36\r\n
Accept-Encoding: gzip, deflate, sdch\r\n
Accept-Language: en-US,en;q=0.8\r\n
\r



And the next read will read in:
\n
Fake message body
\r\n



The 3 sets things up so that we have:
Host: ww ... Language: en-US,en;q=0.8\r\n\r\nFake message body\r\n
                                     ^
                                     |
                                     haystack
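The same boundary case can be checked in code. This is only a sketch: found_with_lookback is a made-up name, and the buffer below is shortened to the last header line from the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: message holds everything received so far, and
// previous is how many of those bytes arrived before the latest read.
// Reports whether strstr finds CRLF CRLF when the search starts up to
// lookback bytes before the newly read data.
bool found_with_lookback(const char *message, size_t previous, size_t lookback)
{
    assert(message != NULL && previous <= strlen(message));

    size_t offset = (previous < lookback) ? previous : lookback;
    return strstr(message + previous - offset, "\r\n\r\n") != NULL;
}
```

With message set to "Accept-Language: en-US,en;q=0.8\r\n\r\nFake message body\r\n" and previous set to the length of everything up to and including the "\r\n\r" that ended the earlier read, a lookback of 0 or 2 misses the terminator, while a lookback of 3 starts the search on the first '\r' and finds it.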



#10 arortell   User is offline

Re: Please help me understand this code.

Posted 27 February 2016 - 10:17 AM

I got it. That helped a lot, thank you very much.

This post has been edited by arortell: 27 February 2016 - 10:17 AM

