Network programming under UNIX

#1 JWHSmith

Posted 21 January 2015 - 12:50 PM

After reading this (slightly outdated) request about a networking tutorial, I came across this post by James_Alex which seemed to answer it.

Now, because I don't like Windows, and because I find the excessive, yet sadly necessary complexity of WINSOCK completely absurd, I thought I'd go back to the roots and post a tutorial about networking in C, under Linux/BSD. This tutorial will cover the basics of the UNIX network API, as defined in Berkeley UNIX. This API should now be available in pretty much any Linux/BSD implementation, and most system calls are expected to be compatible across platforms.

Note: As I said earlier, Windows does not expose this API as-is. Instead, a different API is defined for Windows systems, called WINSOCK (in typical Microsoft ALL-CAPS FASHION). If you plan on developing a Windows application, I recommend you read the tutorial I linked earlier instead. Of course, the description of the UNIX API (which I find a little clearer) might help you understand the concepts, which remain similar.

For the rest of this tutorial, let's consider the following terms:
  • Local end point: the local client machine connecting to a remote server, or the local server machine waiting for remote clients.
  • Remote end point: the remote client machine trying to connect to a local server, or the remote server machine which we're trying to reach through our local client.
  • Message: a message is a sequence of bytes sent over the network. It must not be confused with network packets, which are the parts into which a message is divided before it can be handled by the lower layers of the OSI model.

The tutorial will be divided into the following parts:

  • IP protocols: TCP vs UDP
  • Representing hosts and addresses
    • IP addresses data structures
    • Host identification
  • Opening and closing IP sockets
  • Acting as a server
    • Binding to the local end point
    • Getting ready for incoming connections
    • Accepting an incoming connection
  • Acting as a client
    • Defining the remote end point
    • Connecting to the remote end point (server)
  • Exchanging messages
    • TCP style
    • UDP style
  • References


IP protocols: TCP vs UDP

Before we talk about the network API, it might be a good idea to know how communications over this network are organised. Without going into too much detail, the key notion here is that of protocols. A protocol is basically a set of rules on which systems agree, and thanks to which they understand one another. Just like with any human language, these rules must be followed in order for your interlocutor to understand you (cuz else it cant get no grasp of wat ur sayin). The difference here is that computers show no mercy, and will not make any effort to understand something that does not meet the standard.

To communicate over IP, there are 2 main protocols:
  • TCP: Transmission Control Protocol. This protocol is based on the concept of connection: before messages can be exchanged, the two machines must establish a connection with one another, and until then no message can be sent. Additionally, this protocol provides error checking, delivers data in order, and makes sure that anything lost on the way is resent. This is the protocol you want to use when your communication has to be reliable, but this comes at a cost: messages need more time to go from one point to another, and may stall over crappy networks.
  • UDP: User Datagram Protocol, which is basically the opposite. Messages are sent out on-the-fly, and may, or may not, reach their destination. Apart from a basic checksum, there is no error recovery: nothing is acknowledged or resent, and applications cannot tell whether their messages have been delivered. This protocol, on the other hand, has very little overhead: it is used for video streaming, DNS, and so on. It is a good idea to use it if you send a lot of messages, and do not care too much if some are lost.

Now, no matter what protocol suits you best, the API defines the same concepts for both. While a few system calls change (rather logically), you shouldn't have too much trouble switching from one to the other.


Representing hosts and addresses

IP addresses data structures

The second important concept is that of the IP address. Now, I won't go into deep detail telling you how IPs are stored and how they work, but it is important to know how they are represented by the API: basically, you need to know the key data structures we'll use.

Note: with IPv6 coming our way, this part of the API had to be completely revised. I'll present the IPv6-compatible way to do things, but you'll find many programs (like most of mine...) that don't use it (because they assume an IPv4 fallback is available). If you want to know more about that, have a look at the references at the end of this tutorial.

The first structure you need to know about is struct sockaddr. It is defined like this:

struct sockaddr {
    unsigned short    sa_family;
    char              sa_data[14];
};

To put it simply: sa_family is either AF_INET (IPv4) or AF_INET6 (IPv6). These constants are available through the sys/socket.h header file. The sa_data field is the most interesting one: it holds information about the address, in a numeric form. Now, whether the contents of that field are IPv4 or IPv6 depends on the first field. For this reason, you may cast a pointer to that structure into a pointer to one of two others: struct sockaddr_in or struct sockaddr_in6 (both declared in netinet/in.h). When you do so, the large sa_data blob is seen as a set of more specific fields:

struct sockaddr_in {
    short int          sin_family;  // matches first sa_family
    unsigned short int sin_port;
    struct in_addr     sin_addr;
    unsigned char      sin_zero[8]; // fills what's left of the original sockaddr size
};

struct sockaddr_in6 {
    u_int16_t       sin6_family;
    u_int16_t       sin6_port;
    u_int32_t       sin6_flowinfo;
    struct in6_addr sin6_addr;
    u_int32_t       sin6_scope_id;
};

While most fields can be understood by their names, a few notes: the first field is always the address family, and remains available even when the structure is seen as a struct sockaddr. Now, you notice the appearance of a port number, while the IP data is stored in sin(6)_addr. Those structures (in_addr and in6_addr) are basically holders for the raw address: in_addr has a single 32-bit field called s_addr, while in6_addr has a 16-byte unsigned char array called s6_addr. The sin_zero field can be ignored, since it is only here to pad struct sockaddr_in up to the size of struct sockaddr (the other fields don't use the whole sa_data array). Note that struct sockaddr_in6 is actually larger than struct sockaddr, so when you need room for either kind of address, declare a struct sockaddr_storage, which is big enough for both.

Note: with the new API definition (see below), you don't need to care too much about the shape of these structures, since you'll basically use struct sockaddr and let the system calls decide what to do based on the family field.
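To make these structures a bit more concrete, here is a minimal sketch that fills a struct sockaddr_in by hand and reads it back through a generic struct sockaddr pointer. The helper names fill_ipv4 and describe_ipv4 are made up for this illustration, and the sketch only handles IPv4:

```c
#include <arpa/inet.h>   /* inet_pton, inet_ntop, htons, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill an IPv4 socket address from a dotted-quad
 * string and a port. Returns 0 on success, -1 if the text is not a
 * valid IPv4 address. */
int fill_ipv4(struct sockaddr_in *addr, const char *ip, uint16_t port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);  /* ports are stored in network byte order */
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: write "address:port" into out, starting from the
 * generic struct sockaddr pointer that the system calls work with. */
int describe_ipv4(const struct sockaddr *sa, char *out, size_t outlen)
{
    char text[INET_ADDRSTRLEN];
    const struct sockaddr_in *in;

    if (sa->sa_family != AF_INET)  /* the family tells us what the rest holds */
        return -1;
    in = (const struct sockaddr_in *)sa;
    if (inet_ntop(AF_INET, &in->sin_addr, text, sizeof(text)) == NULL)
        return -1;
    snprintf(out, outlen, "%s:%u", text, (unsigned)ntohs(in->sin_port));
    return 0;
}
```

Calling fill_ipv4(&addr, "127.0.0.1", 8080) and then describe_ipv4((struct sockaddr *)&addr, out, sizeof(out)) should leave "127.0.0.1:8080" in out. In practice you rarely fill these structures yourself, since getaddrinfo does it for you.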

Host identification

Now that we know about addresses, let's talk about hosts in general. While the IP address is the main piece of information, hosts are defined by a larger data structure called struct addrinfo. Here are the main fields of this structure (I have removed some of them for clarity):
struct addrinfo {
    int              ai_family;    // IPv4 or IPv6?
    int              ai_socktype;  // Communication type: connection or datagrams?
    int              ai_protocol;  // Protocol: TCP or UDP?
    struct sockaddr *ai_addr;      // The IP address (may be sockaddr_in or sockaddr_in6)
    size_t           ai_addrlen;   // Address size
    char            *ai_canonname; // String hostname
};

The first 3 fields should be quite familiar to you. The family is still about IPv4 and IPv6, while the two others define the protocol. In most cases, you only need to define the communication type (connection or datagrams), and the system will automatically deduce TCP or UDP respectively. For this to happen, you need to set ai_protocol to 0 (auto-detection based on other fields). For this tutorial, ai_socktype may be either SOCK_STREAM (TCP) or SOCK_DGRAM (UDP).

The other fields should usually not be filled, since this is the information we want to retrieve. In order to do so, we need to use the getaddrinfo function. This is done in a few steps:

  • Declare an addrinfo structure containing everything we know (address family and protocol). This is basically "giving hints" to the function.
  • Declare a pointer ready to receive another addrinfo structure, in which getaddrinfo will have written everything it finds out based on your hints.

Here comes an example:
struct addrinfo hints; // What we know.
memset(&hints, 0, sizeof(hints)); // Always good to clear garbage memory.
hints.ai_family = AF_INET; // IPv4
hints.ai_socktype = SOCK_STREAM; // TCP (stream-based), use SOCK_DGRAM for UDP.

/* Flags are basically options you might want to specify.
 * Here, I'm setting them to AI_CANONNAME in order to fill the ai_canonname field,
 * but feel free to set them to zero if you don't care about options.
 * This should tell me that the host I'm trying to resolve is www.google.com. */
hints.ai_flags = AI_CANONNAME;

struct addrinfo* results; // What we want to know.

int status;
if((status = getaddrinfo("www.google.com", "80", &hints, &results)) != 0){
    fprintf(stderr, "getaddrinfo: %s.\n", gai_strerror(status));
    exit(1);
}

printf("Successfully resolved %s.\n", results->ai_canonname);

The only truly new thing here is the function call. The first argument is a string representation of the host: it can be a domain, or an IP. The second one is the port, which may also be a service name such as "http" or "ftp" (see your /etc/services file). The last two parameters reference our structures: our hints and the location where the results should be stored. Once the call returns, we just check for errors, and print the hostname we just resolved. This being done, we now have all the data we need to know about our host in order to connect to it. When you no longer need the results, release them with freeaddrinfo(results).
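One thing the example above doesn't show is that getaddrinfo actually returns a linked list: a host may have several addresses, chained through the ai_next field. Here is a small sketch that walks the whole list and prints each address in numeric form. The function name print_addresses and the 128-byte text buffer are my own choices for illustration:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve host/service for TCP, print every address
 * found, and return how many were printed (or -1 if resolution failed). */
int print_addresses(const char *host, const char *service)
{
    struct addrinfo hints, *results, *p;
    char text[128];  /* assumed large enough for a numeric address */
    int status, count = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    status = getaddrinfo(host, service, &hints, &results);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s.\n", gai_strerror(status));
        return -1;
    }

    for (p = results; p != NULL; p = p->ai_next) {
        /* NI_NUMERICHOST asks for the address as text, without a reverse lookup. */
        if (getnameinfo(p->ai_addr, p->ai_addrlen, text, sizeof(text),
                        NULL, 0, NI_NUMERICHOST) == 0) {
            printf("%s\n", text);
            count++;
        }
    }

    freeaddrinfo(results);  /* the list is allocated by getaddrinfo */
    return count;
}
```

For instance, print_addresses("www.google.com", "80") would typically print a mix of IPv4 and IPv6 addresses, depending on your resolver.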


Opening and closing IP sockets

Just like many things under UNIX systems, network communication requires... files (or more specifically here, descriptors). A socket is basically a virtual file representing your connection: reading from that file retrieves a message, while writing sends one: easy enough. In order to read and write, you'll first need a descriptor to that "file". This is done through the socket system call.
int remote_socket = socket(results->ai_family, results->ai_socktype, results->ai_protocol);

The parameters here are exactly the same as the first three fields in struct addrinfo: the address family, the communication (socket) type, and the protocol. This call returns -1 on error, so it might be a good idea to check it.

Once a socket is opened, we're able to perform network operations through it, which is what we'll see in the next sections. In the meantime, another interesting thing is closing that socket. When your application is done communicating, close the socket just like any other file descriptor. Keep in mind that the number of descriptors a process is allowed to have open is limited: don't create lots of sockets without ever closing them, or socket will eventually start failing.
close(remote_socket);
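Since both calls can fail, here is a minimal sketch of the open/close pair with error checking. It doesn't need getaddrinfo: it simply assumes IPv4, and the function name open_then_close is made up for the example:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open an IPv4 socket of the given type
 * (SOCK_STREAM or SOCK_DGRAM), then close it again.
 * Returns 0 on success, -1 on failure. */
int open_then_close(int socktype)
{
    /* Protocol 0 lets the system pick the default for this socket type. */
    int sock = socket(AF_INET, socktype, 0);
    if (sock < 0) {
        perror("socket");
        return -1;
    }

    printf("Got descriptor %d.\n", sock);

    if (close(sock) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

Both open_then_close(SOCK_STREAM) and open_then_close(SOCK_DGRAM) should succeed on a normal system, which shows that the descriptor itself looks the same no matter which protocol sits behind it.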


Acting as a server

A server basically needs one socket for itself (we'll call it the listening socket), and a socket for each client (we'll call them remote sockets). When your server starts, it only prepares the listening socket. Whenever a client connects, a remote socket is defined and ready to be used for further communication with that specific client.

Binding to the local end point

Let's see how to prepare the listening socket. First, you need to define where you are listening from, basically: what IP address/host your clients should use to reach you. We've seen how to define hosts in previous sections, so here's an example for a local end point...
struct addrinfo hints, *results;
memset(&hints, 0, sizeof(hints)); // Always good to clear garbage memory.
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;

int status;
if((status = getaddrinfo("localhost", "6000", &hints, &results)) != 0){
    fprintf(stderr, "getaddrinfo: %s.\n", gai_strerror(status));
    exit(1);
}

Now that we know about ourselves, let's create a socket and attach (bind) it to that information. This is done through the bind system call. Here's an example:
int listening_sock = socket(results->ai_family, results->ai_socktype, results->ai_protocol);
if(listening_sock < 0){
    perror("socket");
    exit(1);
}

// Use the length getaddrinfo gave us: it matches the actual address type.
if(bind(listening_sock, results->ai_addr, results->ai_addrlen) < 0){
    perror("bind");
    exit(1);
}

There you go, your socket is now ready to listen.
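The example above binds to localhost, which means only programs running on the same machine can reach you. A common variation, sketched below under a few assumptions of mine, is to pass NULL as the host together with the AI_PASSIVE flag, so that getaddrinfo returns a "any local address" wildcard. The function name bind_tcp_port is hypothetical, and setting SO_REUSEADDR is an optional extra that avoids "Address already in use" errors when you restart your server quickly:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to the given port on all
 * local IPv4 addresses. Returns the descriptor, or -1 on failure. */
int bind_tcp_port(const char *port)
{
    struct addrinfo hints, *results, *p;
    int sock = -1, status, yes = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;  /* with a NULL host: any local address */

    if ((status = getaddrinfo(NULL, port, &hints, &results)) != 0) {
        fprintf(stderr, "getaddrinfo: %s.\n", gai_strerror(status));
        return -1;
    }

    /* Try each candidate until one can be bound. */
    for (p = results; p != NULL; p = p->ai_next) {
        sock = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sock < 0)
            continue;
        setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
        if (bind(sock, p->ai_addr, p->ai_addrlen) == 0)
            break;  /* success: keep this socket */
        close(sock);
        sock = -1;
    }

    freeaddrinfo(results);
    return sock;
}
```

With this, int listening_sock = bind_tcp_port("6000"); would give you the same kind of socket as above, reachable from other machines as well.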

Getting ready for incoming connections

Note: If you are using UDP, there is no need to actively "listen", since there is no connection involved. You can start receiving messages right away, once your local socket is bound.

This should be kinda quick. Now that you have defined how you want to listen, you just need to tell the kernel that your program wants to handle connections coming in on port 6000 (which we chose earlier). Basically, you need to tell it that listening_sock actually is a listening socket. This is done through the listen system call, which shouldn't require too much explanation...
if(listen(listening_sock, 10) < 0){
    perror("listen");
    exit(1);
}

The second argument is the backlog: the maximum number of pending connections the kernel will queue for you while they wait to be accepted (here, 10). It does not limit how many clients you can serve once they have been accepted.

Accepting an incoming connection

Note: Again, if you are using UDP, this step doesn't concern you either. You can start exchanging messages right after binding.

The next step is to wait for an incoming connection. This is called accepting a connection, and is done through the... accept system call. This call returns a ready, brand new socket descriptor bound to the client which just reached you. It also requires space to store this client's address: a struct sockaddr. Here goes the example:

int i, remote_sock[10];
struct sockaddr remote_addr[10];
socklen_t addrlen;

for(i = 0; i < 10; i++){
    addrlen = sizeof(struct sockaddr); // accept overwrites it, so reset it each time.
    remote_sock[i] = accept(listening_sock, &remote_addr[i], &addrlen);
    if(remote_sock[i] < 0){
        perror("accept");
        continue;
    }
    printf("Client connected!\n");
    close(remote_sock[i]);
}

Here, we're basically allocating enough room for 10 clients (10 sockets, 10 addresses) and accepting in a loop. Note that this does not allow parallel connections, nor can it reuse previous sockets (it ends after 10 connections). These little problems can be fixed through the use of parallel programming (threads, processes and IPC) but we'll leave that aside today.

Once accept has returned, remote_sock[i] holds a socket connected to your client. You may now use this socket to receive messages from and send messages to that client, which we'll see a little further down. In the example above the socket is closed right away; a real server would do its talking before calling close.


Acting as a client

Defining the remote end point

As always, the first step is to define where we're going. This is done exactly the same way as in the server example, except you may not use localhost if your server is running on another machine. In this case, just put in this machine's IP address and you're good to go. You may also want to set ai_flags to AI_CANONNAME in order to resolve your server's hostname on the way.

Connecting to the remote end point (server)

Note: if you are using UDP, there is no connection involved in your communication. There is no need to call connect. Once you have retrieved information about your remote server, just start receiving and sending messages.

Once again, we'll need a socket bound to the information we retrieved previously, in order to connect to our server. Now that you have your filled results structure, you can use the connect system call to connect.
int remote_sock = socket(results->ai_family, results->ai_socktype, results->ai_protocol);

if(connect(remote_sock, results->ai_addr, results->ai_addrlen) < 0){
    perror("connect");
    exit(1);
}

printf("Connected to %s.\n", results->ai_canonname); // Requires AI_CANONNAME!

There you go, you are now connected to your server! You should see Client connected! on its side. Once this is done, you have a connected socket with your server: you are now ready to exchange messages with it, through remote_sock.
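Putting the client steps together, here is a sketch of a helper that resolves a host, then tries each returned address until one accepts the connection. Trying every entry of the list matters when a host has both IPv6 and IPv4 addresses and only one of them is reachable. The name connect_tcp is hypothetical:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to host:port over TCP.
 * Returns a connected descriptor, or -1 if every address failed. */
int connect_tcp(const char *host, const char *port)
{
    struct addrinfo hints, *results, *p;
    int sock = -1, status;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whatever works */
    hints.ai_socktype = SOCK_STREAM;

    if ((status = getaddrinfo(host, port, &hints, &results)) != 0) {
        fprintf(stderr, "getaddrinfo: %s.\n", gai_strerror(status));
        return -1;
    }

    for (p = results; p != NULL; p = p->ai_next) {
        sock = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sock < 0)
            continue;
        if (connect(sock, p->ai_addr, p->ai_addrlen) == 0)
            break;  /* connected: keep this socket */
        close(sock);  /* this address failed, try the next one */
        sock = -1;
    }

    freeaddrinfo(results);
    return sock;
}
```

Assuming the server from the previous sections is running on the same machine, int remote_sock = connect_tcp("localhost", "6000"); should give you a connected socket.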


Exchanging messages

Message transmission differs depending on whether you use TCP or UDP. For this reason, the next two sections present two different sets of system calls dedicated to message exchange. Before you start transmitting or receiving, it is important to prepare a clean buffer in which you'll be able to hold your messages. For instance:
char buffer[256];
memset(buffer, 0, 256); // Ready to receive 256-byte-long messages (max).

We'll now assume that this buffer is ready. The examples I'll give you are designed for the client side, but since you have a socket ready in both programs, you should be able to reproduce the same behaviour on the server side: nothing changes.

TCP style

Receiving and sending over TCP is done through the recv and send system calls, which are pretty self-explanatory. The two calls take the same arguments: the socket through which you want to communicate, your buffer (ready to hold your incoming messages, or containing your outgoing message) and the message length (or max. length when receiving). A final parameter holds flags, but let's forget about that. Here is a simple example of a client sending a message and waiting for a response:
strcpy(buffer, "Hello, World!");
if(send(remote_sock, buffer, strlen(buffer), 0) < 0){
    perror("send");
    exit(1);
}

memset(buffer, 0, 256); // Clean up!
if(recv(remote_sock, buffer, sizeof(buffer) - 1, 0) < 0){ // Keep one byte for the terminating '\0'.
    perror("recv");
    exit(1);
}

printf("Response from server: %s.\n", buffer);

It is very important that you don't use sizeof (which returns 256) when you actually know the length of your message, otherwise your system will try to send meaningless bytes situated at the end of the buffer. When sending, use strlen to give the actual string length, and keep sizeof for reception, when you can only give a maximum length. A possible server scenario here would be to wait for a message from the client (on remote_sock indexes) and send another one, let's say Hello, You! before closing the connection.
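One detail worth knowing: send returns the number of bytes it actually handed to the kernel, which may be fewer than you asked for, and recv returns the number of bytes received (0 meaning the other side closed the connection). Here is a sketch of two small wrappers that deal with this. The names send_all, send_text and recv_text are mine, not part of the API:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling send until all len bytes are out.
 * Returns 0 on success, -1 on error. */
int send_all(int sock, const char *data, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(sock, data + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: just retry */
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send a C string, without its terminating '\0'. */
int send_text(int sock, const char *text)
{
    return send_all(sock, text, strlen(text));
}

/* Hypothetical helper: receive at most size - 1 bytes and terminate them
 * so the result can be printed as a string. Returns the number of bytes
 * received, 0 if the peer closed the connection, -1 on error. */
ssize_t recv_text(int sock, char *buffer, size_t size)
{
    ssize_t n;

    if (size == 0)
        return -1;
    n = recv(sock, buffer, size - 1, 0);
    if (n >= 0)
        buffer[n] = '\0';
    return n;
}
```

Keep in mind that TCP is a stream of bytes: a single recv_text call may return only part of what the other side sent in one send, so real protocols usually add a length prefix or a delimiter to mark where each message ends.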

UDP style

UDP differs in that you don't actually have a connection to use. For this reason, once you've retrieved information about your local or remote end host, you can just start receiving and sending messages. This time, you'll have to use sendto and recvfrom, which require quite a few parameters...

  • On the server side, a socket bound to your local end point, through which you can both send and receive messages. On the client side, an unbound socket is enough: the system picks a local port for it the first time you send.
  • Your buffer.
  • Your buffer (max) size.
  • Some flags (we'll assume none, 0).
  • Enough room to store the sender's/recipient's IP address (empty struct sockaddr).
  • The size of the previous structure.

Here is an example of a client sending a UDP message to a server, and waiting for a response. Here, I'm assuming that I have retrieved information about the remote recipient through getaddrinfo:
struct sockaddr sender;              // Filled in by recvfrom with the sender's address.
socklen_t addrlen = sizeof(sender);

strcpy(buffer, "Hello, World!");
if(sendto(remote_sock, buffer, strlen(buffer), 0, results->ai_addr, results->ai_addrlen) < 0){
    perror("sendto");
    exit(1);
}

memset(buffer, 0, 256);
if(recvfrom(remote_sock, buffer, sizeof(buffer) - 1, 0, &sender, &addrlen) < 0){ // Keep one byte for the terminating '\0'.
    perror("recvfrom");
    exit(1);
}

printf("Received from the server: %s.\n", buffer);

Now, before you run this, make sure you have entirely designed your program for UDP: don't forget to use SOCK_DGRAM instead of SOCK_STREAM and remove TCP-only operations such as connecting, listening, accepting, and so on. Basically, on the server side, you should:

  • Retrieve information about the local end point.
  • Bind a socket to the local end point.
  • Call recvfrom to get the client's message.
  • Call sendto to respond.
  • Close your socket.

... while on the client side, you should:

  • Create a socket but don't bind it to any end point.
  • Call sendto to send your request to the server.
  • Call recvfrom to await its response.
  • Close your socket.

And there you have it: a UDP client/server pair.


References

Through this tutorial, I have given some content about the basics of IP networking under UNIX, but there is a lot more information available, and many other features which I didn't explore here. Here are a few links that you may find interesting for further study:

  • Beej's guide to network programming: this is a big reference on the topic. You'll find a lot more information and details about what we covered and more (asynchronous I/O, old school host identification, functions/data structures reference, ...).
  • The Wikipedia pages for TCP and UDP: always useful for a more in-depth description of the protocols.
  • For every system call and function I used in this tutorial, don't underestimate the man pages, for instance: man connect. You'll find a lot of information about each call, including these flags I kept setting to zero.
  • Again, this post about WINSOCK networking.

And now I think we're done! Thanks for reading, and see you next time!


