How does an HTTP Request work?

This post will attempt to break down the anatomy of an HTTP request. Recall that HTTP stands for Hypertext Transfer Protocol. It is a protocol for sending and receiving information, and it forms the foundation of what we know as the World Wide Web, which in turn runs on top of the internet.

You can read more about HTTP and its history, along with the person who invented the protocol, Tim Berners-Lee, here on Wikipedia, because I won't be going into that.

HTTP is an application layer protocol, which means it sits on top of another protocol, a transport layer protocol. This is often, but not always, the Transmission Control Protocol (TCP). We won't go into depth on how TCP works, but suffice it to say that it is critical to the workings of the internet and, as previously mentioned, serves as the base on which HTTP sits.

Now, let's get to what an HTTP request is. Here are the steps involved in a simple HTTP GET request to the root directory of a website ('/').

  1. Establish a TCP connection from the client to the server
  2. Client initiates an HTTP GET request to the server
  3. HTTP server response to the HTTP GET request
  4. HTML page is loaded on client browser

Now, let's break these steps down in a little more detail.

Step 1 - Establish a TCP connection from the client to the server

The first step before initiating an HTTP request is to establish a TCP connection with the server. This involves a three-way handshake between client and server. The client first sends a segment with a special flag set, the SYN flag (abbreviated "S" in tools such as tcpdump). The server responds with an additional flag set (SYN and ACK), and the client replies with an ACK to complete the handshake.
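To make the handshake a little more concrete, here is a minimal Python sketch. Application code never sends the SYN and ACK segments itself; the operating system does that when a socket connects. The function name loopback_handshake is my own, and a throwaway listener on 127.0.0.1 stands in for a real web server so the example runs without network access.

```python
import socket


def loopback_handshake():
    """Connect a client socket to a local listener and report success.

    The SYN / SYN-ACK / ACK exchange is carried out by the operating
    system inside create_connection(); we only see the result.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    with listener:
        listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]

        # Returns only once the three-way handshake has completed.
        client = socket.create_connection(("127.0.0.1", port), timeout=5)
        server_side, _ = listener.accept()
        with client, server_side:
            # Both ends agree on the server's address and port.
            return client.getpeername() == server_side.getsockname()


if __name__ == "__main__":
    print("handshake completed:", loopback_handshake())
```

Against a real site you would pass the site's host name and port 80 to create_connection instead of the loopback address.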

Now the client is ready for step 2.

Step 2 - Client initiates an HTTP GET request to the server

A client request can be as basic as the following:

GET /index.html HTTP/1.1
  • GET - refers to the type of request
  • /index.html - specifies the file that is being requested. It could alternatively just be a request to '/' in which case the server would decide what files to send back.
  • HTTP/1.1 - specifies the version of HTTP
  • Host: - specifies the host name of the site being requested. Several domain names can share the same IP address, so the Host header tells the server which of those sites the client wants. Note: the header section of a request always ends with a blank line (a double newline).

In the code above, Host is part of the header section of the request. Another way to break this down is the following, which draws on this good explanation of HTTP Transactions:

<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3

In our case, we only had one header but we could have had more such as User-Agent: HTTPTool/1.0 or others. Note that the headers include a colon between the Header item and the value.
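The layout above can be sketched in a few lines of Python. This is only an illustration: build_request is a name I made up, and www.example.com is a placeholder host, not something from the original example.

```python
def build_request(method, path, headers):
    """Assemble the raw bytes of an HTTP/1.1 request without a body."""
    # Initial line: method, path and protocol version.
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line closes the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    raw = build_request(
        "GET",
        "/index.html",
        {"Host": "www.example.com", "User-Agent": "HTTPTool/1.0"},
    )
    print(raw.decode("ascii"))
```

Note that on the wire each line ends with a carriage return and a line feed, which is why the code joins the lines with "\r\n".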

Step 3 - HTTP server response to the HTTP GET request

Once the server receives the GET request, it will respond with something like the following:

HTTP/1.1 200 OK
Date: Mon, 23 Mar 2015 22:38:34 GMT
Server: Apache/ (Unix) (Red-Hat/Linux)
Last-Modified: Thu, 08 Jan 2015 23:11:55 GMT
ETag: "3f80f-1b6-3e1cb03b"
Content-Type: text/html; charset=UTF-8
Content-Length: 138
Accept-Ranges: bytes
Connection: close

  <title>An Example Page</title>
  Hello World, this is a very simple HTML document.

Note that the initial line tells us the HTTP protocol version and the server response status. 200 OK tells us that the request succeeded. The next 8 lines, through Connection: close, are the headers, followed by a blank line and then the HTML that will be rendered into the webpage. Content-Type specifies the type of content, in this case text/html. Content-Length specifies the size of the response body in bytes. If the server has a lot of data to send back to the client, or does not know the total size in advance, it may send it in chunked format, in other words broken up into pieces. In that case, the header will show Transfer-Encoding: chunked instead of Content-Length, each chunk is prefixed with its size, and a final chunk of size zero tells the client that no more data is coming. Connection: close tells the client that the server will close the TCP connection after this response. ETag is an identifier assigned by the web server to a specific version of the resource requested.
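To see how a client might pull a response like this apart, here is a small Python sketch. The function name parse_response and the shortened sample response are mine, and the parser is deliberately naive: it assumes the whole response is already in memory and ignores repeated headers.

```python
def parse_response(raw):
    """Split raw response bytes into status line parts, headers and body."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Initial line, e.g. "HTTP/1.1 200 OK".
    version, status, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


if __name__ == "__main__":
    sample = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html; charset=UTF-8\r\n"
        b"Content-Length: 5\r\n"
        b"Connection: close\r\n"
        b"\r\n"
        b"Hello"
    )
    version, status, reason, headers, body = parse_response(sample)
    print(version, status, reason)
    print(headers["content-type"], len(body) == int(headers["content-length"]))
```

Header names are stored in lowercase because HTTP header names are case-insensitive.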

Step 4 - Render HTML to the browser

Once the client has received the complete response body (it has read Content-Length bytes, seen the final zero-length chunk, or the server has closed the connection), it can render the HTML in the browser.
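Going back to the chunked format mentioned in step 3, here is a rough Python sketch of how a client could reassemble a chunked body and recognise where it ends. The function name decode_chunked and the sample bytes are my own, and the sketch ignores the optional trailer headers that may follow the last chunk.

```python
def decode_chunked(data):
    """Reassemble a body sent with Transfer-Encoding: chunked."""
    body = b""
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal on its own line.
        line_end = data.index(b"\r\n", pos)
        # Anything after ';' on the size line is a chunk extension.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            # A zero-length chunk marks the end of the body.
            break
        start = line_end + 2
        body += data[start:start + size]
        # Skip the chunk data and the CRLF that follows it.
        pos = start + size + 2
    return body


if __name__ == "__main__":
    wire = b"5\r\nHello\r\n7\r\n, World\r\n0\r\n\r\n"
    print(decode_chunked(wire))  # b'Hello, World'
```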

So far we've only discussed a simple GET request, but how does the client tell the server what information it is looking for? One way is using query strings. A query string is appended to the URL by the client after the ? as key/value pairs separated by &, for example ?product1=blackshoes&product2=bluehats.


So, the key/value pairs here are product1=blackshoes and product2=bluehats. This gives the server extra information, perhaps that the client is requesting product information for black shoes and blue hats.
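On the receiving end, a server or framework splits the query string back into its key/value pairs. Here is a short Python sketch using the standard library. The host www.example.com and the path /products are placeholders I picked, and query_params is just an illustrative name.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Return the key/value pairs found after the '?' in a URL."""
    # urlsplit isolates the query part; parse_qs splits it on '&' and '='.
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    url = "http://www.example.com/products?product1=blackshoes&product2=bluehats"
    print(query_params(url))
```

parse_qs returns a list for each key because the same key is allowed to appear more than once in a query string.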

Earlier, I briefly mentioned status codes, like 200 OK. There are many more, like the famous 404 Not Found or 201 Created. You can find them listed here.
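If you have Python handy, its standard library also knows the registered status codes and their reason phrases. This is just a convenience sketch, and describe is a name I chose.

```python
from http import HTTPStatus


def describe(code):
    """Return a status code together with its standard reason phrase."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    return f"{status.value} {status.phrase}"


if __name__ == "__main__":
    for code in (200, 201, 404):
        print(describe(code))
```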

There are also request types other than GET:
GET - previously explained
HEAD - same as GET but only transfers the status line and headers
POST - send data to the server
PUT - usually used to update data
DELETE - removes something from the target resource
CONNECT - establishes a tunnel to the server with the given URI
OPTIONS - describes the communication options for the target resource
TRACE - does a loop-back test along the path to the resource
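To see the difference between GET and HEAD in practice, here is a small self-contained Python sketch. It starts a throwaway server on the loopback address and sends it one request. The handler class, its body text and the fetch helper are all made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ExampleHandler(BaseHTTPRequestHandler):
    BODY = b"<title>An Example Page</title>Hello World"

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_GET(self):
        # GET: status line, headers and body.
        self._send_headers()
        self.wfile.write(self.BODY)

    def do_HEAD(self):
        # HEAD: the same status line and headers, but no body.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(method):
    """Send one request to a temporary local server and return the result."""
    server = HTTPServer(("127.0.0.1", 0), ExampleHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        conn.request(method, "/")
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Content-Length"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("GET :", fetch("GET"))
    print("HEAD:", fetch("HEAD"))
```

Both calls report the same status and Content-Length header, but only the GET call returns any body bytes.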

Let's dig into the other most commonly used request type, POST. We will leave the rest for another day and another tutorial. POST is used to send data to the server, perhaps to create a record in a database or to perform some other action that involves the transfer of information from the client to the server. Typically, forms on an HTML page use the POST method to send their data to the server once submitted.

Here's what a typical POST request from the client might look like.

POST /users HTTP/1.1
User-Agent: ELinks/0.11.1 (textmode; Linux; 80x25-2)
Accept: */*
Accept-Encoding: gzip
Accept-Language: en
Connection: Keep-Alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 62

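It is the last line of the request, which follows the blank line that ends the headers, that contains the form data sent from the client, while everything above it, except the very first line, represents the headers. Content-Length gives the size of that form data in bytes.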
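Here is a hedged Python sketch of how a client could put such a request together, including the form data line. The field names and values, the host www.example.com and the function name build_form_post are all invented for illustration; they are not the data from the request above.

```python
from urllib.parse import urlencode


def build_form_post(host, path, fields):
    """Assemble the raw bytes of a form-encoded POST request."""
    # Form fields become key=value pairs joined by '&', with special
    # characters percent-encoded and spaces turned into '+'.
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # size of the body in bytes
        "\r\n"  # blank line: end of headers, body follows
    ).encode("ascii")
    return head + body


if __name__ == "__main__":
    raw = build_form_post(
        "www.example.com",
        "/users",
        {"name": "Jane Doe", "email": "jane@example.com"},
    )
    print(raw.decode("ascii"))
```

The body uses the same key/value encoding as a query string, only it travels after the headers instead of inside the URL.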

Upon successfully creating the new record, the server will respond with 201 Created, along with the location where the newly created resource can be found.

HTTP/1.1 201 Created
Location: http://myawesomesite/jerseycap
Cache-Control: public
Content-Type: text/html
Date: Tue, 28 Jan 2014 22:05:35 GMT
Content-Length: 930

The only other thing we haven't seen yet is Cache-Control: public, which tells any cache, including shared intermediaries such as a proxy run by your ISP, that it may store the response (perhaps for faster loading). If it were set to private, shared caches would not be allowed to store the response and only the end user's browser would be allowed to cache it.

I hope this has given you a brief introduction to HTTP Requests and expanded your knowledge on the subject. There is certainly much more I can say but limiting this post to GET and POST requests is a good place to stop before your eyes glaze over.