Brief HTTP Reference

Excerpts from the HTTP 1.1 Standard[RFC2068]

 

1.4 Overall Operation

The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity metainformation, and possible entity-body content. The relationship between HTTP and MIME is described in appendix 19.4.

Most HTTP communication is initiated by a user agent and consists of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished via a single connection (v) between the user agent (UA) and the origin server (O).

          request chain ------------------------>
       UA -------------------v------------------- O
          <----------------------- response chain
A more complicated situation occurs when one or more intermediaries are present in the request/response chain. There are three common forms of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or part of the message, and forwarding the reformatted request toward the server identified by the URI. A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating the requests to the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.

          request chain -------------------------------------->
       UA -----v----- A -----v----- B -----v----- C -----v----- O
          <------------------------------------- response chain
The figure above shows three intermediaries (A, B, and C) between the user agent and origin server. A request or response message that travels the whole chain will pass through four separate connections. This distinction is important because some HTTP communication options may apply only to the connection with the nearest, non-tunnel neighbor, only to the end-points of the chain, or to all connections along the chain. Although the diagram is linear, each participant may be engaged in multiple, simultaneous communications. For example, B may be receiving requests from many clients other than A, and/or forwarding requests to servers other than C, at the same time that it is handling A's request.

Any party to the communication which is not acting as a tunnel may employ an internal cache for handling requests. The effect of a cache is that the request/response chain is shortened if one of the participants along the chain has a cached response applicable to that request. The following illustrates the resulting chain if B has a cached copy of an earlier response from O (via C) for a request which has not been cached by UA or A.

          request chain ---------->
       UA -----v----- A -----v----- B - - - - - - C - - - - - - O
          <--------- response chain
Not all responses are usefully cachable, and some requests may contain modifiers which place special requirements on cache behavior. HTTP requirements for cache behavior and cachable responses are defined in section 13.

HTTP communication usually takes place over TCP/IP connections. The default port is TCP 80, but other ports can be used. This does not preclude HTTP from being implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used

In HTTP/1.0, most implementations used a new connection for each request/response exchange. In HTTP/1.1, a connection may be used for one or more request/response exchanges, although connections may be closed for a variety of reasons (see section 8.1).

 

3.3.1 Full Date Specification

       Sun, 06 Nov 1994 08:49:37 GMT 
as defined in RFC 1123 (an update to RFC 822). All HTTP 1.1 clients and servers must generate date/time only in that form (although they can accept a few obsolete formats)


 

4.1 Message Types

HTTP (Request and Response) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message). Both types of message consist of a start-line, one or more header fields (also known as headers), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and an optional message-body.


 

4.2 Message Headers

HTTP header fields, which include general-header (section 4.5), request- header (section 5.3), response-header (section 6.2), and entity-header (section 7.1) fields, follow the same generic format as that given in Section 3.1 of RFC 822 [9]. Each header field consists of a name followed by a colon (":") and the field value. Field names are case- insensitive. The field value may be preceded by any amount of LWS, though a single SP is preferred. Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT.

3.7.2 Multipart Types

MIME provides for a number of "multipart" types -- encapsulations of one or more entities within a single message-body. All multipart types share a common syntax, as defined in section 7.2.1 of RFC 1521 [7], and MUST include a boundary parameter as part of the media type value. The message body is itself a protocol element and MUST therefore use only CRLF to represent line breaks between body-parts. Unlike in RFC 1521, the epilogue of any multipart message MUST be empty; HTTP applications MUST NOT transmit the epilogue (even if the original multipart contains an epilogue).

In HTTP, multipart body-parts MAY contain header fields which are significant to the meaning of that part. A Content-Location header field (section 14.15) SHOULD be included in the body-part of each enclosed entity that can be identified by a URL.

In general, an HTTP user agent SHOULD follow the same or similar behavior as a MIME user agent would upon receipt of a multipart type. If an application receives an unrecognized multipart subtype, the application MUST treat it as being equivalent to multipart/mixed.

Note: The multipart/form-data type has been specifically defined for carrying form data suitable for processing via the POST request method, as described in RFC 1867 [15].

 

6.1.1 Response message status codes

The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.

The first digit of the Status-Code defines the class of response. The last two digits do not have any categorization role. There are 5 values for the first digit:

1xx: Informational
Request received, continuing process
2xx: Success
The action was successfully received, understood, and accepted
3xx: Redirection
Further action must be taken in order to complete the request
4xx: Client Error
The request contains bad syntax or cannot be fulfilled
5xx: Server Error
The server failed to fulfill an apparently valid request

100Continue
101Switching Protocols
200OK
201Created
202Accepted
203Non-Authoritative Information
204No Content
205Reset Content
206Partial Content
300Multiple Choices
301Moved Permanently
302Moved Temporarily
303See Other
304Not Modified
305Use Proxy
400Bad Request
401Unauthorized
402Payment Required
403Forbidden
404Not Found
405Method Not Allowed
406Not Acceptable
407Proxy Authentication Required
408Request Time-out
409Conflict
410Gone
411Length Required
412Precondition Failed
413Request Entity Too Large
414Request-URI Too Large
415Unsupported Media Type
500Internal Server Error
501Not Implemented
502Bad Gateway
503Service Unavailable
504Gateway Time-out
505HTTP Version not supported

10.1 Informational 1xx

This class of status code indicates a provisional response, consisting only of the Status-Line and optional headers, and is terminated by an empty line. Since HTTP/1.0 did not define any 1xx status codes, servers must not send a 1xx response to an HTTP/1.0 client except under experimental conditions.

 

10.1.1 100 Continue


10.2 Successful 2xx

This class of status code indicates that the client's request was successfully received, understood, and accepted.

 

10.2.1 200 OK
The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:
GET
an entity corresponding to the requested resource is sent in the response
HEAD
the entity-header fields corresponding to the requested resource are sent in the response without any message-body
POST
an entity describing or containing the result of the action;

 

10.2.2 201 Created
The request has been fulfilled and resulted in a new resource being created. The newly created resource can be referenced by the URI(s) returned in the entity of the response, with the most specific URL for the resource given by a Location header field. The origin server MUST create the resource before returning the 201 status code. If the action cannot be carried out immediately, the server should respond with 202 (Accepted) response instead.

 

10.2.3 202 Accepted
The request has been accepted for processing, but the processing has not been completed. The request may or may not eventually be acted upon, as it may be disallowed when processing actually takes place. There is no facility for re-sending a status code from an asynchronous operation such as this.

The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch- oriented process that is only run once per day) without requiring that the user agent's connection to the server persist until the process is completed. The entity returned with this response should include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.


 

10.3.5 304 Not Modified
If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server should respond with this status code. The response must not contain a message- body.

The response must include the following header fields:


 

10.4.2 401 Unauthorized
The request requires user authentication. The response must include a WWW-Authenticate header field (section 14.46) containing a challenge applicable to the requested resource. The client may repeat the request with a suitable Authorization header field (section 14.8). If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user should be presented the entity that was given in the response, since that entity may include relevant diagnostic information. HTTP access authentication is explained in section 11.


Transfer Codings

Transfer coding values are used to indicate an encoding transformation that has been, can be, or may need to be applied to an entity-body in order to ensure "safe transport" through the network. This differs from a content coding in that the transfer coding is a property of the message, not of the original entity.
       transfer-coding         = "chunked" | transfer-extension

       transfer-extension      = token
All transfer-coding values are case-insensitive. HTTP/1.1 uses transfer coding values in the Transfer-Encoding header field (section 14.40).

Transfer codings are analogous to the Content-Transfer-Encoding values of MIME , which were designed to enable safe transport of binary data over a 7-bit transport service.

 

Authorization and Access Authentication

 

14.8 Authorization

A user agent that wishes to authenticate itself with a server--usually, but not necessarily, after receiving a 401 response -- may do so by including an Authorization request-header field with the request. The Authorization field value consists of credentials containing the authentication information of the user agent for the realm of the resource being requested.
       Authorization  = "Authorization" ":" credentials
HTTP access authentication is described in section 11. If a request is authenticated and a realm specified, the same credentials should be valid for all other requests within this realm.  

11 Access Authentication

HTTP provides a simple challenge-response authentication mechanism which may be used by a server to challenge a client request and by a client to provide authentication information. It uses an extensible, case- insensitive token to identify the authentication scheme, followed by a comma-separated list of attribute-value pairs which carry the parameters necessary for achieving authentication via that scheme.
       auth-scheme    = token
       auth-param     = token "=" quoted-string
The 401 (Unauthorized) response message is used by an origin server to challenge the authorization of a user agent. This response must include a WWW-Authenticate header field containing at least one challenge applicable to the requested resource.
       challenge      = auth-scheme 1*SP realm *( "," auth-param )
       realm          = "realm" "=" realm-value
       realm-value    = quoted-string
The realm attribute (case-insensitive) is required for all authentication schemes which issue a challenge. The realm value (case- sensitive), in combination with the canonical root URL (see section 5.1.2) of the server being accessed, defines the protection space. These realms allow the protected resources on a server to be partitioned into a set of protection spaces, each with its own authentication scheme and/or authorization database. The realm value is a string, generally assigned by the origin server, which may have additional semantics specific to the authentication scheme.

A user agent that wishes to authenticate itself with a server--usually, but not necessarily, after receiving a 401 or 411 response -- may do so by including an Authorization header field with the request. The Authorization field value consists of credentials containing the authentication information of the user agent for the realm of the resource being requested.

       credentials    = basic-credentials
                      | auth-scheme #auth-param
The domain over which credentials can be automatically applied by a user agent is determined by the protection space. If a prior request has been authorized, the same credentials may be reused for all other requests within that protection space for a period of time determined by the authentication scheme, parameters, and/or user preference. Unless otherwise defined by the authentication scheme, a single protection space cannot extend outside the scope of its server.

If the server does not wish to accept the credentials sent with a request, it should return a 401 (Unauthorized) response. The response must include a WWW-Authenticate header field containing the (possibly new) challenge applicable to the requested resource and an entity explaining the refusal.

The HTTP protocol does not restrict applications to this simple challenge-response mechanism for access authentication. Additional mechanisms may be used, such as encryption at the transport level or via message encapsulation, and with additional header fields specifying authentication information. However, these additional mechanisms are not defined by this specification.

Proxies must be completely transparent regarding user agent authentication. That is, they must forward the WWW-Authenticate and Authorization headers untouched, and follow the rules found in section 14.8.

HTTP/1.1 allows a client to pass authentication information to and from a proxy via the Proxy-Authenticate and Proxy-Authorization headers.  

11.1 Basic Authentication Scheme

The basic authentication scheme is based on the model that the user agent must authenticate itself with a user-ID and a password for each realm. The realm value should be considered an opaque string which can only be compared for equality with other realms on that server. The server will service the request only if it can validate the user-ID and password for the protection space of the Request-URI. There are no optional authentication parameters.

Upon receipt of an unauthorized request for a URI within the protection space, the server MAY respond with a challenge like the following:

       WWW-Authenticate: Basic realm="WallyWorld"
where "WallyWorld" is the string assigned by the server to identify the protection space of the Request-URI.

To receive authorization, the client sends the userid and password, separated by a single colon (":") character, within a base64 encoded string in the credentials.

       basic-credentials = "Basic" SP basic-cookie
       basic-cookie   = <BASE64 [7] encoding of user-pass,
                        except not limited to 76 char/line>

       user-pass   = userid ":" password
       userid      = *<TEXT excluding ":">
       password    = *TEXT
Userids might be case sensitive.

If the user agent wishes to send the userid Aladdin and password open sesame, it would use the following header field:

       Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

 

The MAILCAP file(s)

Excerps from the metmail.1 man page, see also [RFC1524]

The primary purpose of the metamail program is to allow diverse mail reading programs to centralize their access to multimedia information. If all the mail reading programs call a single program to handle non-text mail, then only that program needs to know about the diverse types of nontext mail that might be received.

The metamail program is made more flexible in this role through the mechanism of one or more "mailcap" files. The purpose of the mailcap files is to tell metamail what program to run in order to show the user mail in a given format. Thus it becomes possible to add a new media type to all of the mail reading programs at a site simply by adding a line to a mailcap file.

Metamail uses a search path to find the mailcap file(s) to consult. Unlike many path searches, if necessary metamail will read all the mailcap files on its path. That is, it will keep reading mailcap files until it runs out of them, or until it finds a line that tells it how to handle the piece of mail it is looking at. If it finds a matching line, it will execute the command that is specified in the mailcap file.

The default search path is equivalent to

$HOME/.mailcap:/usr/local/etc/mailcap:/usr/etc/mailcap:/etc/mailcap:/etc/mail/mailcap:/usr/public/lib/mailcap"
It can be overridden by setting the MAILCAPS environment variable. Note: Metamail does not actually interpret environment variables such as $HOME or the "~" syntax in this path search.

The syntax of a mailcap file is quite simple, at least compared to termcap files. Any line that starts with "#" is a comment. Blank lines are ignored. Otherwise, each line defines a single mailcap entry for a single content type. Long lines may be continued by ending them with a backslash character, \.

Each individual mailcap entry consists of a content-type specification, a command to execute, and (possibly) a set of optional "flag" values. For example, a very simple mailcap entry (which is actually a built-in default behavior for metamail) would look like this:

text/plain; cat %s
The optional flags can be used to specify additional information about the mail-handling command. For example:
text/plain; cat %s; copiousoutput
can be used to indicate that the output of the 'cat' command may be voluminous, requiring either a scrolling window, a pager, or some other appropriate coping mechanism.

The type field (text/plain, in the above example) is simply any legal content type name, as defined by RFC 822. In practice, this is almost any string. It is the string that will be matched against the Content-type header (or the value passed in with -c) to decide if this is the mailcap entry that matches the current message. Additionally, the type field may specify a subtype (e.g. text/ISO-8859-1) or a wildcard to match all subtypes (e.g. image/*).

The command field is any UNIX command ("cat %s" in the above example), and is used to specify the interpreter for the given type of message. It will be passed to the shell via the system(3) facility. Semicolons and backslashes within the command must be quoted with backslashes. If the command contains %s, those two characters will be replaced by the name of a file that contains the body of the message. If it contains %t, those two characters will be replaced by the content-type field, including the subtype, if any. (That is, if the content-type was image/pbm; opt1=something-else, then %t would be replaced by image/pbm.) If the command field contains %{ followed by a parameter name and a closing }, then all those characters will be replaced by the value of the named parameter, if any, from the Content-type header. Thus, in the previous example, %{opt1} will be replaced by something-else. Finally, if the command contains \%, those two characters will be replaced by a single % character. (In fact, the backslash can be used to quote any character, including itself.

If no %s appears in the command field, then instead of placing the message body in a temporary file, metamail will pass the body to the command on the standard input. This is helpful in saving /tmp file space, but can be problematic for window-oriented applications under some window systems such as MGR.

The notes=xxx field is an uninterpreted string that is used to specify the name of the person who installed this entry in the mailcap file. (The xxx may be replaced by any text string.)

The test=xxx field is a command that is executed to determine whether or not the mailcap line actually applies. That is, if the content-type field matches the content-type on the message, but a test= field is present, then the test must succeed before the mailcap line is considered to "match" the message being viewed. The command may be any UNIX command, using the same syntax and the same %-escapes as for the viewing command, as described above. A command is considered to succeed if it exits with a zero exit status, and to fail otherwise.

needsterminal
If this flag is given, the named interpreter needs to interact with the user on a terminal. In some environments (e.g. a window-oriented mail reader under X11) this will require the creation of a new terminal emulation window, while in most environments it will not. If the mailcap entry specifies needsterminal and metamail is not running on a terminal (as determined by isatty(3), the -x option, and the MM_NOTTTY environment variable) then metamail will try to run the command in a new terminal emulation window. Currently, metamail knows how to create new windows under the X11, SunTools, and WM window systems.

copiousoutput
This flag should be given whenever the interpreter is capable of producing more than a few lines of output on stdout, and does no interaction with the user. If the mailcap entry specifies copiousoutput, and pagination has been requested via the -p command, then the output of the command being executed will be piped through a pagination program ("more" by default, but this can be overridden with the METAMAIL_PAGER environment variable).