How does an HTTP Request work?
This post will attempt to break down the anatomy of an HTTP request. Recall that HTTP stands for Hypertext Transfer Protocol. It's basically a protocol for sending and receiving information, and it forms the foundation of what we know as the World Wide Web.
You can read more about HTTP and its history, along with the person who invented the protocol, Tim Berners-Lee, here on Wikipedia, because I won't be going into that.
HTTP is an application layer protocol, which means it sits on top of another protocol, a transport layer protocol. This is often, but not always, the Transmission Control Protocol (TCP). We won't go into depth on how TCP works, but suffice it to say that it is critical to the workings of the internet and, as previously mentioned, serves as the base on which HTTP sits.
Now, let's get to what an HTTP request is. Here are the steps involved in a simple HTTP GET request to the root directory of a website ('/'), which we will break down in more detail.
- Establish a TCP connection from the client to the server
- Client initiates an HTTP GET request to the server
- HTTP server response to the HTTP GET request
- HTML page is loaded on client browser
Now, let's break these steps down in a little more detail.
Step 1 - Establish a TCP connection from the client to the server
The first step, before initiating an HTTP request, is to establish a TCP connection with the server. This involves a three-way handshake between client and server. The client first sends a segment with a special flag set, the SYN flag (often shown as "S"). The server responds with an additional flag set (SYN and ACK, or "S & ACK"), and the client replies with an ACK to complete the handshake.
Now the client is ready for step 2.
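Here's a minimal sketch of that step in Python. You never send the SYN and ACK flags yourself; the operating system does the handshake when you ask for a connection. To keep the example self-contained, a throwaway listener on the loopback address stands in for the web server, and the function name handshake_demo is just something I made up for illustration.

```python
import socket


def handshake_demo() -> bool:
    """Connect to a local listener and report whether both ends agree on the peer."""
    # A throwaway listener on the loopback interface stands in for the web server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    host, port = listener.getsockname()

    # create_connection() returns only after the SYN, SYN-ACK, ACK exchange,
    # which the operating system carries out on our behalf.
    client = socket.create_connection((host, port), timeout=5.0)
    server_side, peer_address = listener.accept()
    try:
        # The address the server sees should be the client's own socket address.
        return peer_address == client.getsockname()
    finally:
        for sock in (client, server_side, listener):
            sock.close()


print("handshake completed:", handshake_demo())
```

Against a real site you would pass its host name and port 80 to create_connection() instead of the loopback listener.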
Step 2 - Client initiates an HTTP GET request to the server
A client request can be as basic as the following:

    GET /index.html HTTP/1.1
    Host: www.example.com

GET - refers to the type of request
/index.html - specifies the file that is being requested. It could alternatively just be a request to '/', in which case the server would decide what files to send back.
HTTP/1.1 - specifies the version of HTTP
Host: www.example.com - specifies the host name. The Host header lets the server distinguish between multiple domain names sharing the same IP address. Note: the header section of the request always ends with a blank line (a double newline).
In the request above, Host is part of the header section. Another way to break this down is the following, which draws on this [good explanation](http://www.jmarshall.com/easy/http/#http1.1) of HTTP transactions:

    <initial line, different for request vs. response>
    Header1: value1
    Header2: value2
    Header3: value3

In our case we only had one header, but we could have had more, such as User-Agent: HTTPTool/1.0. Note that each header has a colon between the header name and the value.
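To see how the pieces fit together on the wire, here's a small Python sketch that assembles the request text from an initial line plus headers. The helper name build_request is my own invention for this post, not a standard library function; the host, path, and User-Agent values are the ones used above.

```python
def build_request(method, path, host, extra_headers=None):
    """Assemble a minimal HTTP/1.1 request as bytes."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")  # header name, colon, value
    # Every line ends with CRLF, and an empty line closes the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET", "/index.html", "www.example.com", {"User-Agent": "HTTPTool/1.0"}
)
print(request.decode("ascii"))
```

The trailing blank line is what tells the server that the headers are finished.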
Step 3 - HTTP server response to the HTTP GET request
Once the server receives the GET request, it will respond with something like the following:

    HTTP/1.1 200 OK
    Date: Mon, 23 March 2015 22:38:34 GMT
    Server: Apache/220.127.116.11 (Unix) (Red-Hat/Linux)
    Last-Modified: Wed, 08 Jan 2015 23:11:55 GMT
    ETag: "3f80f-1b6-3e1cb03b"
    Content-Type: text/html; charset=UTF-8
    Content-Length: 138
    Accept-Ranges: bytes
    Connection: close

    <html>
    <head>
      <title>An Example Page</title>
    </head>
    <body>
      Hello World, this is a very simple HTML document.
    </body>
    </html>

Note that the initial line tells us the HTTP protocol version and the response status. 200 OK tells us that the request succeeded. The next 8 lines, through Connection: close, are the headers, followed by the HTML that will be rendered into the webpage.
Content-Type specifies the type of content, in this case text/html.
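To make the shape of a response concrete, here's a rough Python sketch that splits a raw response into its status line, headers, and body. The function name parse_response and the shortened sample response are mine for illustration; a real client would lean on a library such as http.client rather than parsing by hand.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status_code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(status_code), reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 13\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html></html>"
)
version, status, reason, headers, body = parse_response(sample)
print(version, status, reason, headers["content-type"], len(body))
```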
Content-Length specifies the size of the response body in bytes. If the server has a lot of data to send back to the client, it may send it in chunked format, in other words broken up into pieces. In that case, the header will show Transfer-Encoding: chunked in place of Content-Length, which tells the client that more data is coming until it sees a zero-length chunk.
ETag is an identifier assigned by the web server to a specific version of the requested resource.
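Chunked transfer is easier to picture with an example. In a chunked body, each chunk starts with its size in hexadecimal on a line of its own, followed by that many bytes of data, and a chunk of size 0 marks the end. Here's a simplified Python sketch of a decoder; decode_chunked is a made-up name, and the sketch ignores optional trailer headers and assumes the whole body is already in memory.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body that has been fully received."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size_text = data[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_text.decode("ascii"), 16)
        if size == 0:
            return bytes(body)  # zero-length chunk: no more data is coming
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF


wire = b"5\r\nHello\r\n7\r\n, World\r\n0\r\n\r\n"
print(decode_chunked(wire))
```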
Step 4 - Render HTML to the browser
After the client receives the response, it will render the HTML in the browser. The Connection: close header tells the client that the server will close the connection once the response has been sent.
So far we've only discussed a simple GET request, but how does the client tell the server what information it is looking for? One way is using query strings. In the example below, query strings are appended to the URL by the client after the ? as key/value pairs.
So, the key value pairs here are
This tells the server information, perhaps that the client is requesting product information for black shoes and blue hats.
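As a rough illustration, Python's standard library can pull the key/value pairs out of a URL. The address and parameter names below are hypothetical ones I picked to match the shoes and hats idea; they aren't from a real site.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL, invented for this sketch.
url = "http://www.example.com/products?shoes=black&hats=blue"

parts = urlsplit(url)
params = parse_qs(parts.query)  # each key maps to a list, since keys may repeat

print(parts.path)   # the resource being requested
print(params)       # the key/value pairs after the ?
```

Each value comes back as a list because the same key is allowed to appear more than once in a query string.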
Earlier, I briefly mentioned status codes, like 200 OK. There are many more, like the famous 404 Not Found or 201 Created. You can find them listed here.
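If you have Python handy, its standard library ships a table of the registered status codes, which makes for a quick way to look one up. The helper status_class is my own shorthand for the convention that the first digit of the code indicates the general category.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Name the general category indicated by the first digit of a status code."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes[code // 100]


for code in (200, 201, 404):
    status = HTTPStatus(code)
    print(status.value, status.phrase, "-", status_class(code))
```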
There are also request types other than GET:
GET - previously explained
HEAD - same as GET but only transfers the status line and headers
POST - send data to the server
PUT - creates or replaces the target resource; usually used to update data
DELETE - removes something from the target resource
CONNECT - establishes a tunnel to the server with the given URI
OPTIONS - describes the communication options for the target resource
TRACE - does a loop-back test along the path to the resource
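The difference between GET and HEAD is easy to see by trying both against the same resource. The sketch below is only an illustration under my own assumptions: it spins up a tiny server on the loopback address using Python's http.server module, and the names PAGE, DemoHandler, and fetch are invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello World</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)  # GET sends headers and body

    def do_HEAD(self):
        self._send_headers()  # HEAD sends the same headers, but no body

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(method):
    """Send one request to a temporary local server; return (status, length header, body)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(method, "/")
        response = conn.getresponse()
        body = response.read()
        conn.close()
        return response.status, response.getheader("Content-Length"), body
    finally:
        server.shutdown()
        server.server_close()


print("GET ", fetch("GET"))
print("HEAD", fetch("HEAD"))
```

Both responses carry the same Content-Length header, but only the GET response has a body to go with it.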
Let's dig into the other most commonly used request type, POST. We will leave the rest for another day and another tutorial. POST is used to send data to the server, perhaps to create a record in a database or to perform some other action that involves the transfer of information from the client to the server. Typically, forms on an HTML page use the POST method to send their data to the server once submitted.
Here's what a typical POST request from the client might look like.

    POST /users HTTP/1.1
    Host: www.myawesomeserver.com
    User-Agent: ELinks/0.11.1 (textmode; Linux; 80x25-2)
    Referer: www.myawesomeserver.com
    Accept: */*
    Accept-Encoding: gzip
    Accept-Language: en
    Connection: Keep-Alive
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 62

    firstname.lastname@example.org&telephone=707-453-7070&comments=great

It is the last line in this example that contains the form data sent from the client while everything above it, except the very first line, represents the headers.
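Here's a short Python sketch of how a client might produce that kind of form body and the matching Content-Length. It uses only the telephone and comments fields from the example above, so the resulting length differs from the 62 shown there. The variable names are my own.

```python
from urllib.parse import urlencode

form = {"telephone": "707-453-7070", "comments": "great"}

# application/x-www-form-urlencoded: key=value pairs joined with '&'.
body = urlencode(form).encode("ascii")

headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),  # size of the body in bytes
}

print(body.decode("ascii"))
print(headers["Content-Length"])

# Characters that are not safe in this format get escaped, e.g. a space becomes '+'.
print(urlencode({"comments": "really great"}))
```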
Upon successfully creating the new record, the server will return a 201 Created response along with the location where the newly created resource can be found.

    HTTP/1.1 201 Created
    Location: http://myawesomesite/jerseycap
    Cache-Control: public
    Content-Type: text/html
    Date: Fri, 28 Jan 2014 22:05:35 GMT
    Content-Length: 930

The only other thing we haven't seen yet is Cache-Control: public, which basically tells intermediary caches, like a proxy at your ISP, that they can cache the page (perhaps for faster loading). If it were set to private, it would tell such intermediaries that they should not cache the page, and only the end user's browser would be allowed to cache it.
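To tie the POST request and the 201 Created response together, here's a sketch of the whole round trip running locally. Everything about it is an assumption made for illustration: the handler name CreateHandler, the in-memory list standing in for a database, and the /users/<n> Location scheme are all invented, and a real application would do far more validation.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode


class CreateHandler(BaseHTTPRequestHandler):
    records = []  # stand-in for a database table (hypothetical)

    def do_POST(self):
        # Content-Length tells the server how many body bytes to read.
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("ascii"))
        self.records.append(form)
        self.send_response(201)  # 201 Created
        self.send_header("Location", f"/users/{len(self.records)}")
        self.send_header("Cache-Control", "public")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def post_form(fields):
    """POST form fields to a temporary local server; return (status, Location, Cache-Control)."""
    server = HTTPServer(("127.0.0.1", 0), CreateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(
            "POST",
            "/users",
            body=urlencode(fields).encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
        response = conn.getresponse()
        response.read()
        conn.close()
        return (
            response.status,
            response.getheader("Location"),
            response.getheader("Cache-Control"),
        )
    finally:
        server.shutdown()
        server.server_close()


print(post_form({"telephone": "707-453-7070", "comments": "great"}))
```

Note that http.client fills in the Content-Length request header on its own when the body is passed as bytes.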
I hope this has given you a brief introduction to HTTP requests and expanded your knowledge on the subject. There is certainly much more I could say, but limiting this post to GET and POST requests is a good place to stop before your eyes glaze over.