Hypertext Transfer Protocol

Objectives

By the end of this tutorial you should be able to:

  • Understand the basic concepts of HTTP
  • Know what an HTTP Request is
  • Know what an HTTP Response is
  • Understand the content of the HTTP Header
  • Understand the content of the HTTP Body
  • Understand how HTML Forms use HTTP and Web server extensions

Introduction

Many people mistakenly assume that the Internet and the World Wide Web are one and the same; they are not. The Internet is a network of interconnected nodes designed to carry information from one place to another. Think of the Internet as all of the cables, wires, wireless signals, routers and switches that transport the billions of bits of information, like email, Web pages, music downloads, streaming video, and tweets, all over the world every day. Numerous services, based on standardized Internet protocols and formats, operate over the Internet's data communication infrastructure. Examples include email, which uses the Simple Mail Transfer Protocol (SMTP); blog feeds, which are commonly published using Really Simple Syndication (RSS), a feed format that is itself delivered over HTTP; and the World Wide Web (WWW, pronounced dub dub dub), which uses the Hypertext Transfer Protocol (HTTP).

The technology at the heart of a Web server is the Hypertext Transfer Protocol (HTTP): Web servers and Web browsers communicate with each other using HTTP. Essentially, a Web server has three purposes:

  1. The main purpose of a Web server is to make your Web pages available to all who request them.
  2. Another job of the Web server is to provide an area to organize your Web pages (typically in a folder or a directory) that are part of a single Web site.
  3. A third duty of a Web server is to create Web pages dynamically through the use of Web server extensions like PHP, JSP, Cold Fusion, ASP, or ASP.NET; servers that do this are also referred to as application servers.

This tutorial focuses mainly on the communication protocol used between a Web browser and a Web server, the Hypertext Transfer Protocol (HTTP).

Table of Contents

The HTTP Protocol

To retrieve a Web page from the Internet you will most likely type a Web address into your Web browser or click on a hypertext link. In both cases the resource being retrieved is identified by a Uniform Resource Locator (URL). A URL consists of the protocol being used to communicate with the Web server, plus either the server's Fully Qualified Domain Name (FQDN) or IP address, and the name and location on the server where the specific resource can be found. The process of submitting your URL to the Web server is called making an HTTP request. The Web server interprets the URL in the request, locates the corresponding resource, and sends it back to the requesting device. The response message is appropriately called an HTTP response. If it is a Web page that is being requested, the Web browser then takes the code it has received from the Web server and renders a viewable page from it. The Web browser is referred to as the client or user agent in this interaction and the whole interaction is a client-server relationship.
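
As a rough illustration of the URL anatomy described above, the following Python sketch splits a URL into its protocol, host, and resource location using the standard library. The address www.example.com and its path are placeholders chosen for this example only.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "http://www.example.com/products/index.html"

parts = urlsplit(url)

print(parts.scheme)    # protocol used to talk to the Web server
print(parts.hostname)  # fully qualified domain name (or IP address)
print(parts.path)      # name and location of the resource on the server
```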

The message that is sent from the Web browser to the Web server is formatted using communication standards set forth by the Hypertext Transfer Protocol (HTTP), as defined in RFC 2616 (which has since been superseded by RFC 9110 and its companion documents). A protocol is nothing more than a set of rules that are used to exchange information between two devices. HTTP is the protocol used by Web servers to receive and respond to Web browser requests for data. When you click on a hypertext link or type in the URL of a Web page, the URL will begin with "http://" (or "https://" for the secure variant), indicating that the Web browser will be using the Hypertext Transfer Protocol to communicate with the Web server.

The message passed from the Web browser to the Web server is called the HTTP request. When the Web server receives the request, it looks for the file being requested. If the file is found, the server bundles it, e.g. an HTML page or a GIF file, in an HTTP response and sends it back across the network to the Web browser. If the Web server cannot find the requested page, it issues a response that contains an HTML page with an appropriate error message, e.g. an HTTP 404 Not Found message.
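
The exchange described above can be tried out locally. The sketch below is only a toy, not how a production Web server is built: it starts a tiny server on the local machine using Python's standard library, then requests one page that exists and one that does not. The PAGES dictionary, the ToyHandler class, and the fetch helper are names invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "files" that the toy server can deliver.
PAGES = {"/index.html": b"<html><body>Hello</body></html>"}

class ToyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            # Resource not found: reply with 404 and an HTML error page.
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch(port, path):
    """Send one GET request and return (status code, body bytes)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()

server = HTTPServer(("127.0.0.1", 0), ToyHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

found = fetch(port, "/index.html")
missing = fetch(port, "/no-such-page.html")

server.shutdown()
server.server_close()

print(found[0])    # status code for the page that exists
print(missing[0])  # status code for the page that does not exist
```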

HTTP is said to be a stateless protocol. This means that each request/response exchange stands on its own: in the basic model, once the response has been sent back to the Web browser, the connection between the Web server and the Web browser is terminated, and the server keeps no record of the exchange. Because of this, HTTP doesn't know whether a future request/response interaction is part of an ongoing conversation with a previous client or a request from a new one.

HTTP is stateless because it was originally intended to retrieve a single page for display. With all of the traffic on the Internet, imagine the problems that would occur if each client held a constant connection to the server. At the very least, the Internet would slow to a crawl if not collapse altogether. (Later versions of HTTP can reuse one connection for several requests, but each request is still handled independently of the others.)

Important!!!

It is important to remember what was just stated in the last two paragraphs. To reiterate, HTTP makes the connection, delivers the request, returns the response, and then disconnects. You may not think that this is such a big deal right now, but when you are a Web developer troubleshooting Web applications you will soon learn how important this concept is to understand and remember - write this down somewhere where you won't forget it.
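
One way to picture statelessness is a handler whose reply depends only on the single request in front of it. The Python sketch below is a simplified model rather than real server code; the handle_request function and the dictionary-shaped requests are assumptions made purely for illustration.

```python
def handle_request(request):
    """Toy stateless handler: the reply depends only on this one request."""
    name = request.get("name")
    if name is None:
        return "Hello, stranger"
    return f"Hello, {name}"

# First request: the client identifies itself.
first = handle_request({"path": "/greet", "name": "Ada"})

# Second request: nothing carries over, so the server cannot tell who is asking.
second = handle_request({"path": "/greet"})

# To be recognized again, the client has to resend the information.
third = handle_request({"path": "/greet", "name": "Ada"})

print(first)
print(second)
print(third)
```

Because nothing is remembered between calls, any continuity between requests has to travel inside the requests themselves.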


How HTTP Works

When a request is sent to the Web server, it carries more than just the desired URL. There is actually a lot of extra information that is sent as part of the request (this is also true for the response). Most of this extra information is generated automatically so that you don't have to deal with it programmatically. Although you don't typically have to fool with this information, you should know that it is there because Web server extensions like PHP, Cold Fusion, ASP, or ASP.NET can use the information provided by HTTP to have a direct effect on the content of the information sent back to the Web client.

Every HTTP message has the same format (whether it is the client request or the server response). You should think of all the information associated with the client request or server response as a packet. This packet can be broken down into three sections: the request/response line, the HTTP header, and the HTTP body.
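
To make the three sections concrete, here is a minimal Python sketch that splits a raw message into its first line, its header fields, and its body. The sample message and the split_message helper are made up for this illustration, and real-world parsing has more edge cases than are handled here.

```python
# Hypothetical raw response, written with the CRLF line endings HTTP uses.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<p>Hello.</p>"
)

def split_message(message):
    """Split an HTTP message into (first line, header dict, body)."""
    # A blank line separates the header from the body.
    head, _, body = message.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body

start_line, headers, body = split_message(raw)

print(start_line)  # the request/response line
print(headers)     # the HTTP header fields
print(body)        # the HTTP body
```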

Figure: Example HTTP Request and Response packets. A Web browser submits an HTTP GET request to a Web server, which executes ASP.NET code and returns the results to the Web browser in an HTTP response with a status code of 200.

As you can see, the packets for an HTTP request and an HTTP response are very similar, and some information is common to both. Pieces of information such as the server name, the date, and the acceptance code are called environment or server variables, and they can be used by Web server extensions like PHP, Cold Fusion, ASP, or ASP.NET to customize pages.


The HTTP Response

An HTTP Response is sent by the Web server back to the client and consists of three pieces of information:

  1. The Response line
  2. The HTTP header
  3. The HTTP body

The Response Line

The response line contains two key pieces of information:

  1. The HTTP version number
    1. typically 1.0 or 1.1
  2. An HTTP status code that reports the success or failure of the request, followed by a short reason phrase such as OK

HTTP Status Codes

Code Class   Description
100 - 199    Informational codes - the request is currently being processed
200 - 299    Success codes - the Web server received and processed the request successfully
300 - 399    Redirection codes - the request hasn't been performed because the information required has been moved
400 - 499    Client error codes - the request was incomplete, incorrect, or impossible to locate
500 - 599    Server error codes - the request appeared to be valid but the server failed to carry it out
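
Since the class of a status code is determined by its first digit, the ranges above can be looked up with a little arithmetic. The following Python sketch is one possible way to do it; the STATUS_CLASSES mapping and the status_class function are illustrative names, while HTTPStatus comes from Python's standard library.

```python
from http import HTTPStatus

# Class names paraphrase the table above, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def status_class(code):
    """Return the class of an HTTP status code, based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]

for code in (200, 301, 404, 500):
    # HTTPStatus supplies the standard reason phrase for well-known codes.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```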

The response header is similar to the request header. The header information falls into three types:

  1. General: contains information about either the client or server, but is not specific to one or the other
  2. Entity: contains information about the data being sent between the client and the server
  3. Response: contains information about the server sending the response, and how it can deal with the response

The header consists of a number of lines and uses a blank line to indicate that the header information is complete.

HTTP/1.1 200 OK                                // The Response Line
Date: Sun, 01 Jan 2006 16:12:00 GMT            // General Header
Server: Microsoft-IIS/5.0                      // Response Header
Last-Modified: Fri, 30 Dec 2005 12:08:03 GMT   // Entity Header

The third line of the server's response header indicates the type of software the Web server is running. The rest of the header is pretty much self-explanatory. (The // annotations are labels added for this tutorial; they are not part of the actual message.)
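
HTTP dates follow a fixed layout: an abbreviated day name, a two-digit day, an abbreviated month, a four-digit year, the time, and GMT. As a small sketch, Python's standard library can produce and read that layout. The moment below mirrors the sample Date header above; the seconds value of 00 is an assumption made for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# The moment from the sample Date header, expressed in UTC.
# The seconds value (00) is assumed for illustration.
moment = datetime(2006, 1, 1, 16, 12, 0, tzinfo=timezone.utc)

# usegmt=True produces the "GMT" suffix that HTTP headers use.
http_date = format_datetime(moment, usegmt=True)
print(http_date)

# Parsing goes the other way, from header text back to a datetime.
parsed = parsedate_to_datetime(http_date)
print(parsed == moment)
```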

The HTTP Response Body

If the response is successful then the HTTP response body contains HTML code along with any linked or embedded scripts that need to be executed by the browser. In addition, HTTP requests are used to retrieve any other resource, such as an image file, as dictated by the HTML code. For instance, once the Web browser processes the HTML code it received in the response from the Web server, if it encounters an <img> tag the Web browser will use the value of the tag's "src" attribute to submit another request to the Web server to retrieve the image file that was indicated by the <img> tag.
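
The follow-up requests described above can be sketched in Python: scan the HTML for <img> tags, then resolve each "src" value against the address of the page to get the URL of the next request. The page address, the HTML snippet, and the ImageCollector class are hypothetical and exist only for this example; a real browser does considerably more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

# Hypothetical page address and HTML body.
page_url = "http://www.example.com/products/index.html"
html = (
    "<html><body><h1>Products</h1>"
    '<img src="images/logo.gif"><img src="/banner.png">'
    "</body></html>"
)

collector = ImageCollector()
collector.feed(html)

# Each src is resolved against the page URL to form a new request URL.
follow_up_urls = [urljoin(page_url, src) for src in collector.sources]

for url in follow_up_urls:
    print(url)
```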


How Pages Using Web Server Extensions are Served

Web server extensions running on a Web server can provide very powerful tools for developing dynamic data-driven Web sites and mobile apps. The extension sits in the middle of the HTTP pipeline listening for specific requests for server-side code, variables, or processes that are used to do things like retrieve data from or write data to a table, read or write the contents of a file on the server, or dynamically modify page content based on certain conditions before delivery to the user agent. Using server extensions is a much more powerful way to deliver and update information than the static methods that were used in older Web systems.

The steps involved in the HTTP Request/Response process on a Web application server:
  1. The client requests a Web page
  2. The Web server needs to locate the page that was requested; and if it is a PHP, Cold Fusion, JSP, ASP, or ASP.NET page then the code will need to be processed by the appropriate Web server extension first in order to generate the HTML that is returned to the browser.
  3. If the file name of the Web page has an extension of .php, .cfm, .jsp, .asp, or .aspx, the server sends it to one of its dynamic-link libraries (DLLs) for processing before sending the page to the requesting Web browser. For example, if it is an ASP.NET page, the Web server will recognize this by the page's .aspx extension and will send it to aspnet_isapi.dll (which is installed on the Web server) for processing. The aspnet_isapi.dll doesn't actually do much itself; it just forwards the ASP.NET code to the Common Language Runtime (CLR), which is another program running on the Web server and is capable of processing the ASP.NET code in the page. If the ASP.NET code has not been compiled before, it is compiled at this point, and then executed. The result is that pure HTML comes out the other end and is sent to the requesting Web browser. In this way the HTML is created dynamically on the Web server.
  4. The HTML stream is returned to the browser
  5. The browser displays the web page
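
The steps above can be modeled in a few lines of Python. This is a deliberately simplified sketch and not how PHP, ASP.NET, or any real server extension is implemented: the SITE dictionary, the made-up .tmpl extension, and the serve and run_extension functions are all assumptions chosen for illustration.

```python
import html
import os

# Hypothetical site content: one static page and one "dynamic" template.
SITE = {
    "/about.html": "<html><body>About us</body></html>",
    "/hello.tmpl": "<html><body>Hello, {name}!</body></html>",
}

# Stand-in for extensions such as .php or .aspx.
DYNAMIC_EXTENSIONS = {".tmpl"}

def run_extension(template, params):
    """Stand-in for a server extension: turn server-side code into HTML."""
    name = html.escape(params.get("name", "guest"))
    return template.format(name=name)

def serve(path, params):
    """Return (status code, HTML) for a requested path."""
    source = SITE.get(path)
    if source is None:
        return 404, "<html><body>Not Found</body></html>"
    extension = os.path.splitext(path)[1]
    if extension in DYNAMIC_EXTENSIONS:
        # Dynamic page: process it first, then return the generated HTML.
        return 200, run_extension(source, params)
    # Static page: return the file contents unchanged.
    return 200, source

print(serve("/about.html", {}))
print(serve("/hello.tmpl", {"name": "Ada"}))
```

Either way the browser receives plain HTML; it cannot tell from the response alone whether the page was stored as a file or generated on the fly.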

There are a lot of advantages to creating the pages dynamically. You can return information to the user based on their response in a form, you can customize a web page for a particular browser, you can personalize information for the particular user, and a lot more.


Links Of Interest

You will not be tested on the information contained in the articles below, but if you are serious about being a Webmaster, you'll want to at least familiarize yourself with the people and organizations who are included here.