Protocol Modules as State Machines
A part of the libwww thread model is to keep track of the current state of
the communication interface to the network. As an example, this section describes
the current implementation of the HTTP module and how it has been implemented
as a state machine. The HTTP module is based on the HTTP 1.0 specification
but is backward compatible with HTTP 0.9.
The major difference from implementations before version 3.0 of the
Library is that this version is a state machine based on the state diagram
illustrated below. This design has several advantages even though
the HTTP protocol is stateless by nature.
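
A minimal C sketch of this idea follows; it is not the libwww source. It keeps one enumeration value per state described in this section and a step function that chooses the next state from what happened in the current one. The names `HTTPState`, `HTTPEvent` and `http_next_state()` are hypothetical, and the event is reduced to an error flag plus the response status code (0 standing for an HTTP 0.9 response, which has no status line).

```c
#include <stdbool.h>

/* One value per state described in this section (hypothetical names). */
typedef enum {
    HTTP_BEGIN,
    HTTP_NEED_CONNECTION,
    HTTP_NEED_REQUEST,
    HTTP_SENT_REQUEST,
    HTTP_NEED_ACCESS_AUTHORIZATION,
    HTTP_REDIRECTION,
    HTTP_NO_DATA,
    HTTP_NEED_BODY,
    HTTP_GOT_DATA,
    HTTP_ERROR
} HTTPState;

/* What happened while the module was in its current state (simplified). */
typedef struct {
    bool failed;  /* a fatal error occurred */
    int  status;  /* response status code; 0 if none (e.g. HTTP 0.9) */
} HTTPEvent;

/* Return the state that follows `s`, given the event `ev`. */
HTTPState http_next_state(HTTPState s, HTTPEvent ev)
{
    if (ev.failed)
        return HTTP_ERROR;  /* every error aborts the request */

    switch (s) {
    case HTTP_BEGIN:
        return HTTP_NEED_CONNECTION;
    case HTTP_NEED_CONNECTION:
        return HTTP_NEED_REQUEST;
    case HTTP_NEED_REQUEST:
        return HTTP_SENT_REQUEST;
    case HTTP_SENT_REQUEST:
        if (ev.status == 401)
            return HTTP_NEED_ACCESS_AUTHORIZATION;
        if (ev.status == 301 || ev.status == 302)
            return HTTP_REDIRECTION;
        if (ev.status == 204 || ev.status == 304)
            return HTTP_NO_DATA;
        return HTTP_NEED_BODY;  /* includes HTTP 0.9 (status 0) */
    case HTTP_NEED_BODY:
        return HTTP_GOT_DATA;
    default:
        /* NEED_ACCESS_AUTHORIZATION, REDIRECTION, NO_DATA, GOT_DATA and
           ERROR end this request; a new or repeated request starts over. */
        return HTTP_BEGIN;
    }
}
```

Because the current state lives in a variable rather than on the call stack, a caller can return to its event loop after each step and resume the request later instead of blocking on the network. Which status codes lead to NO_DATA (here 204 and 304) is an assumption of this sketch.
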
The individual states and the transitions between them are explained in the
following sections.
  - BEGIN State
    This is the idle or initial state, in which the HTTP module waits for
    a new request passed from the application.
  
  - NEED_CONNECTION State
    The HTTP module is now ready to set up a connection to the remote host.
    The connection is always initiated by a connect system call. To minimize
    the number of lookups against the Domain Name Server, the names of
    previously visited hosts are stored in a local host cache, as explained in
    the section "DNS Cache and Host Name Canonicalization". The cache handles
    multi-homed hosts in a special way: it measures the time it takes to
    actually connect to one of the host's IP addresses and stores this time
    together with the IP address and the host name. On the next connection to
    the same host, the IP address with the fastest connect time is chosen.
  
  - NEED_REQUEST State
    The HTTP request is what the application sends to the remote HTTP server
    just after the connection has been established. The request consists of
    a request line, a set of HTTP headers, and possibly a data object to be
    posted to the server. The request line has the following format:

	<METHOD> <URI> <HTTP-VERSION> CRLF

  - SENT_REQUEST State
    When the request has been sent, the module waits until the server responds,
    the connection times out, or an error occurs. As the module does not know
    whether the remote server is an HTTP 0.9 or an HTTP 1.0 server, it must
    look at the first part of the response to determine which version of HTTP
    is returned. The reason is that an HTTP 0.9 response does not contain a
    status line: the server simply starts sending the requested data object
    as soon as it has handled the GET request.
  
  - NEED_ACCESS_AUTHORIZATION State
    If a 401 Unauthorized status code is returned, the module asks the user for
    a user ID and a password; see also the "HTTP Basic Access Authorization
    Scheme". The connection is closed before the user is asked for the user ID
    and password, so any new request initiated upon a 401 status code causes a
    new connection to be established. This avoids keeping the connection open
    while the application is waiting for user input.
  
  - REDIRECTION State
    The remote server returns a redirection status code if the URI has been
    moved, either temporarily or permanently, to another location, possibly on
    another HTTP server or on another service, for example FTP or Gopher. The
    HTTP module supports both the permanent and the temporary redirection
    codes returned from the server:

      - 301 Moved
        The load procedure is called recursively on a 301 redirection code.
        The new URI is passed back to the user as information via the Error
        and Information module, and a new request is generated. The new
        request can use any access scheme accepted in a URI. An upper limit on
        the number of redirections (10 by default) has been defined in order
        to avoid infinite loops. As the move is permanent, a clever
        application can use the returned URI to change the document in which
        the URI originates so that the URI points to the new location.

      - 302 Found
        The functionality is the same as for a 301 Moved return status, except
        that the move is only temporary, so the originating document should
        keep referring to the original URI.

  - NO_DATA State
    When the status code indicates that no data object or resource follows
    the HTTP headers, the HTTP module can terminate the request and pass
    control back to the application.
  
  - NEED_BODY State
    If the response from the server includes a body, the module must prepare
    to read the data from the network and direct it to the destination set up
    by the application. This is done by setting up a stream stack with the
    required conversions.
  
  - GOT_DATA State
    When the data object has been passed through the stream stack, the HTTP
    module terminates the request and hands control back to the application.
  
  - ERROR or FAILURE State
    If a fatal error occurs at any point during request handling, the request
    is aborted and the connection closed. All information about the error is
    passed back to the application via the Error and Information Module. As
    the HTTP protocol is stateless, all errors are fatal between the client
    and the server. If the failed request is to be repeated, it starts again
    in the initial state.
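
The connect-time bookkeeping described under NEED_CONNECTION can be sketched as below. This is an illustration under assumptions, not the libwww host cache: the `HostEntry` and `CachedAddr` types, the fixed limit of eight addresses per host and both function names are hypothetical, and addresses are stored as text for brevity. The caller is assumed to time each `connect()` itself and report the result.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ADDRS 8  /* hypothetical per-host limit */

/* One IP address of a (possibly multi-homed) host. */
typedef struct {
    char   addr[46];      /* textual IPv4 or IPv6 address */
    double connect_time;  /* seconds for the last connect(); < 0 if unknown */
} CachedAddr;

/* Cache entry for one host name. */
typedef struct {
    char       host[256];
    CachedAddr addrs[MAX_ADDRS];
    size_t     n_addrs;
} HostEntry;

/* Store the measured connect time for `addr`. */
void host_record_time(HostEntry *h, const char *addr, double seconds)
{
    for (size_t i = 0; i < h->n_addrs; i++)
        if (strcmp(h->addrs[i].addr, addr) == 0)
            h->addrs[i].connect_time = seconds;
}

/* Choose the address with the fastest measured connect time, or the first
   address if none has been measured yet. Returns NULL for an empty entry. */
const char *host_pick_addr(const HostEntry *h)
{
    const CachedAddr *best = NULL;

    for (size_t i = 0; i < h->n_addrs; i++) {
        const CachedAddr *a = &h->addrs[i];
        if (a->connect_time >= 0 &&
            (best == NULL || a->connect_time < best->connect_time))
            best = a;
    }
    if (best == NULL && h->n_addrs > 0)
        best = &h->addrs[0];
    return best ? best->addr : NULL;
}
```

Once any address has a measured time, unmeasured addresses are skipped; a real cache might instead try them now and then so that a faster address can still be discovered.
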
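
The request line shown under NEED_REQUEST can be formatted with the standard `snprintf()` function. The helper `http_request_line()` is hypothetical; it writes only the first line of the request, which the HTTP headers and any posted data object would follow.

```c
#include <stdio.h>

/* Format "<METHOD> <URI> <HTTP-VERSION>" followed by CRLF into buf.
   Returns the length written, or -1 if buf is too small. */
int http_request_line(char *buf, size_t size, const char *method,
                      const char *uri, const char *version)
{
    int n = snprintf(buf, size, "%s %s %s\r\n", method, uri, version);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```
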
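
The version test described under SENT_REQUEST only needs the first few bytes of the response: an HTTP 1.0 response begins with a status line such as `HTTP/1.0 200 OK`, whereas an HTTP 0.9 server starts sending the data object immediately. The sketch below assumes the start of the response is already buffered; `http_response_version()` and its result values are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    RESP_UNKNOWN,  /* too few bytes to decide yet */
    RESP_HTTP09,   /* no status line: the data object starts at once */
    RESP_HTTP10    /* starts with a status line such as "HTTP/1.0 200 OK" */
} RespVersion;

/* Classify a response from its first `len` buffered bytes. */
RespVersion http_response_version(const char *buf, size_t len)
{
    static const char prefix[] = "HTTP/";
    const size_t plen = sizeof prefix - 1;

    if (len < plen)  /* partial prefix: undecided unless it already differs */
        return memcmp(buf, prefix, len) == 0 ? RESP_UNKNOWN : RESP_HTTP09;
    return memcmp(buf, prefix, plen) == 0 ? RESP_HTTP10 : RESP_HTTP09;
}
```

Returning `RESP_UNKNOWN` for a short prefix lets a non-blocking caller wait for more data instead of guessing.
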
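
Once the user has supplied a user ID and password after a 401, the Basic scheme referred to under NEED_ACCESS_AUTHORIZATION sends them on the new connection in an `Authorization: Basic` header whose value is the Base64 encoding of `userid:password`. The encoder below is a minimal sketch of that step; `base64_encode()` is a hypothetical helper, not a libwww function.

```c
#include <stddef.h>
#include <string.h>

/* Base64-encode the NUL-terminated string `in` into `out`, which must hold
   at least 4 * ((strlen(in) + 2) / 3) + 1 bytes. Returns the output length. */
size_t base64_encode(const char *in, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t len = strlen(in), o = 0;

    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to three input bytes into one 24-bit group. */
        unsigned long v = (unsigned long)(unsigned char)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)(unsigned char)in[i + 1] << 8;
        if (i + 2 < len)
            v |= (unsigned char)in[i + 2];

        /* Emit four 6-bit characters, padding missing bytes with '='. */
        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < len) ? tbl[v & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

For example, the user ID `Aladdin` with the password `open sesame` gives the header `Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`.
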
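
The redirection limit described under 301 Moved can be enforced by counting hops. The sketch below replaces the recursive load procedure with an equivalent loop; `load_following_redirects()`, the `FetchFn` callback type and the `chain_fetch()` stub, which stands in for real network requests, are all hypothetical. Only the default limit of 10 comes from the text.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REDIRECTIONS 10  /* default limit given in the text */

/* Hypothetical loader: performs one request for `uri` and returns the status
   code; for 301 and 302 it also sets *location to the new URI. */
typedef int (*FetchFn)(const char *uri, const char **location, void *ctx);

/* Follow 301/302 redirections. Returns the first other status code and sets
   *final_uri, or returns -1 once the limit has been exceeded. */
int load_following_redirects(const char *uri, FetchFn fetch, void *ctx,
                             const char **final_uri)
{
    for (int hops = 0; hops <= MAX_REDIRECTIONS; hops++) {
        const char *location = NULL;
        int status = fetch(uri, &location, ctx);

        if ((status != 301 && status != 302) || location == NULL) {
            *final_uri = uri;
            return status;
        }
        uri = location;  /* may use any access scheme, e.g. ftp: */
    }
    return -1;  /* most likely a redirection loop */
}

/* Demo stub instead of the network: ctx is a NULL-terminated array of URIs,
   each redirecting (302) to the next; the last one answers 200. */
static int chain_fetch(const char *uri, const char **location, void *ctx)
{
    const char **chain = ctx;

    for (size_t i = 0; chain[i] != NULL; i++) {
        if (strcmp(chain[i], uri) == 0) {
            if (chain[i + 1] == NULL)
                return 200;
            *location = chain[i + 1];
            return 302;
        }
    }
    return 404;
}
```

With the limit at 10, the loop performs at most 11 requests: the original one plus ten redirections.
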
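
Which responses lead to NO_DATA rather than NEED_BODY is not spelled out above. Under the HTTP 1.0 specification, responses to a HEAD request and responses with a 1xx, 204 No Content or 304 Not Modified status never carry an entity body, so a minimal test could look like this; the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if the response carries no entity body under HTTP 1.0 rules:
   any response to HEAD, and any 1xx, 204 or 304 response. */
bool http_response_has_no_body(const char *method, int status)
{
    if (strcmp(method, "HEAD") == 0)
        return true;
    return (status >= 100 && status < 200) || status == 204 || status == 304;
}
```
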
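
The stream stack mentioned under NEED_BODY can be pictured as a chain of objects that each accept a block of data, convert it and pass the result to the next object, with the application's destination at the bottom. The sketch below is a simplified model with hypothetical names (`Stream`, `BufferSink`, `strip_cr_put()`), not libwww's actual stream interface; it stacks one converter, which turns CRLF line ends into LF, on top of a sink that collects the data in memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stream interface: a stream accepts a block of data, converts
   it and hands the result to the next stream (its target). */
typedef struct Stream Stream;
struct Stream {
    int (*put_block)(Stream *me, const char *buf, size_t len);
    Stream *target;  /* next stream in the stack; NULL for the sink */
};

/* Sink: the destination set up by the application, here a memory buffer. */
typedef struct {
    Stream base;     /* must be first so a Stream * can be cast back */
    char   data[256];
    size_t used;
} BufferSink;

static int sink_put(Stream *me, const char *buf, size_t len)
{
    BufferSink *s = (BufferSink *)me;

    if (s->used + len >= sizeof s->data)
        return -1;  /* destination full */
    memcpy(s->data + s->used, buf, len);
    s->used += len;
    s->data[s->used] = '\0';
    return 0;
}

/* Converter: drops every carriage return, so CRLF becomes LF, and forwards
   the remaining runs of bytes to the target stream. */
static int strip_cr_put(Stream *me, const char *buf, size_t len)
{
    size_t start = 0;

    for (size_t i = 0; i <= len; i++) {
        if (i == len || buf[i] == '\r') {
            if (i > start &&
                me->target->put_block(me->target, buf + start, i - start) < 0)
                return -1;
            start = i + 1;
        }
    }
    return 0;
}
```

Further conversions are added by pointing a new stream's `target` at the current top of the stack.
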
 
  
  Henrik Frystyk Nielsen,
  libwww@w3.org,
  @(#) $Id: HTTPFeatures.html,v 1.16 1996/12/09 03:20:54 jigsaw Exp $