The Format Manager

/*
**	(c) COPYRIGHT MIT 1995.
**	Please first read the full copyright statement in the file COPYRIGH.
*/

The Format Manager is responsible for setting up the stream pipe from the Channel Object to the Request Object when data is arriving, for example as a response to an HTTP GET request. As data arrives, we start to parse it, and the more we know, the more of the stream pipe we can build. For example, in the case of HTTP, we first have a stream that can parse the HTTP response line containing "200 OK". Then we have a MIME parser for handling the MIME headers. When the MIME headers have been parsed, we know the content type and any encoding of the MIME body. If we need to decode a chunked encoding then we set up a chunked decoder, and if we have to parse an HTML object then we set up an HTML parser.

The Format Manager is also responsible for keeping track of the "preferences" of the application and/or user. It is an integral part of the Web and HTTP that the client application can express its preferences as a set of "accept" headers in an HTTP request. This task is closely related to the one described above: there we use the modules that are registered, and here we tell the remote server what we are capable of doing and what we would prefer.

Note: The library core does not define any default decoders or parsers - they are all considered part of the application. The library comes with a default set of parsers, including the ones mentioned above, which can be initialized using the functions in the HTInit module. There are separate initialization functions for content type parsers and for content encodings.

Currently there are four dimensions for handling "accept" headers:

Content Type Converters and Presenters
Content Encoders and Decoders
Content Charsets
Natural Languages

The application can assign its preferences in two ways: either locally to a single request or globally to all requests. The local assignment can either add to or override the global settings depending on how they are registered. All local registration is handled by the Request Object and the global registration is handled by the Format Manager.
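As a minimal sketch of the "add to or override" behaviour, the following stand-alone code merges a local and a global preference list. The type PrefList, the function effective_prefs and the override flag are hypothetical and are not part of the Library; putting the local entries first is also an assumption made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a preference list; not a Library type. */
typedef struct {
    const char **items;   /* for example "text/html", "image/png" */
    size_t       count;
} PrefList;

/*
 * Fill `out` (capacity `cap`) with the preferences that apply to one request.
 * If `override` is nonzero only the local list is used; otherwise the local
 * entries come first, followed by the global ones.
 * Returns the number of entries written.
 */
static size_t effective_prefs(const PrefList *global, const PrefList *local,
                              int override, const char **out, size_t cap)
{
    size_t n = 0;
    assert(out != NULL);
    if (local != NULL)
        for (size_t i = 0; i < local->count && n < cap; i++)
            out[n++] = local->items[i];
    if (!override && global != NULL)
        for (size_t i = 0; i < global->count && n < cap; i++)
            out[n++] = global->items[i];
    return n;
}
```

With a global list of two content types and a local list of one, the call yields three entries when override is 0 and one entry when override is 1.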

Global Preferences

When data is arriving and the Format Manager is to build a stream pipe from the HTChannel Object to the Request Object, it uses a special set of stream stack algorithms. These algorithms take the complete set of content type converters, for example, and find the one best suited for the task. We currently have three stream stack algorithms: one for content types, one for content encodings, and one for content transfer encodings:

The Content Type Stream Stack
The Content Encoding Stream Stack
The Content Transfer Encoding Stream Stack

As part of the HTTP content negotiation, a server must be able to match the preferences sent by the client in an HTTP request with the possible set of documents that it has available for this URL. For example, it may have an English and a Danish version, in which case it looks at the Accept-Language header and sees what the client prefers. The Library has a simple "Match" algorithm for finding the best match.

Content negotiation

This module is implemented by HTFormat.c, and it is a part of the W3C Reference Library.

#ifndef HTFORMAT_H
#define HTFORMAT_H

#include "HTUtils.h"
#include "HTStream.h"
#include "HTAtom.h"
#include "HTList.h"
#include "HTAnchor.h"
#include "HTReq.h"

Content Type Converters and Presenters

All content type converters are subclassed from the generic stream object. That way, we allow the application to do very fast progressive display of incoming data. In other words, the stream model of the Library provides data as soon as it arrives from the network; the application does not have to wait until the whole document has been downloaded before it starts parsing it.

Content Type Converters

A converter is a stream with a special set of parameters that is registered as capable of converting from a MIME type to something else (possibly another MIME type). A converter is defined to be a function returning a stream and accepting the following parameters. The content type elements are atoms for which we have defined a prototype.

typedef HTStream * HTConverter	(HTRequest *	request,
				 void *		param,
				 HTFormat	input_format,
				 HTFormat	output_format,
				 HTStream *	output_stream);
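To illustrate how a function type like this is used, here is a minimal sketch of a converter that does no conversion and simply hands back the target stream. The opaque type declarations are stand-ins so that the sketch compiles on its own (in a real application they come from the Library headers), and my_pass_through and my_converter are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in declarations so the sketch compiles alone (assumptions). */
typedef struct HTStream  HTStream;     /* opaque */
typedef struct HTRequest HTRequest;    /* opaque */
typedef struct HTAtom    HTAtom;       /* opaque */
typedef HTAtom *HTFormat;              /* content types are atoms */

typedef HTStream *HTConverter(HTRequest *request, void *param,
                              HTFormat input_format, HTFormat output_format,
                              HTStream *output_stream);

/* Hypothetical converter: transforms nothing, returns the target stream. */
static HTStream *my_pass_through(HTRequest *request, void *param,
                                 HTFormat input_format, HTFormat output_format,
                                 HTStream *output_stream)
{
    (void)request; (void)param; (void)input_format; (void)output_format;
    assert(output_stream != NULL);  /* a converter needs somewhere to write */
    return output_stream;
}

/* HTConverter is a function type, so it is stored and passed as a pointer. */
static HTConverter *const my_converter = my_pass_through;
```

Because HTConverter names a function type and not a pointer type, registration functions take an HTConverter * argument and a converter can be declared with a plain "extern HTConverter name;".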

The HTPresentation Object

A presenter is a module (possibly an external program) which can present a graphic object of a certain MIME type to the user. That is, presenters are normally used to present objects that the converters are not able to handle. Data is transferred to the external program using for example the HTSaveAndExecute stream which writes to a local file. Both presenters and converters are of the type HTConverter.

typedef struct _HTPresentation {
    HTFormat	rep;			     /* representation name atomized */
    HTFormat	rep_out;			 /* resulting representation */
    HTConverter *converter;	      /* The routine to gen the stream stack */
    char *	command;			       /* MIME-format string */
    char *	test_command;			       /* MIME-format string */
    double	quality;		     /* Between 0 (bad) and 1 (good) */
    double	secs;
    double	secs_per_byte;
} HTPresentation;

Basic Content type Converters

We have a small set of basic converters that can be hooked in anywhere. They don't "convert" anything but are nice to have.

extern HTConverter HTThroughLine;
extern HTConverter HTBlackHoleConverter;

Predefined Content Types

These macros (which used to be constants) define some basic internally referenced representations. The www/xxx ones are of course not MIME standard. They are internal representations used in the Library but they can't be exported to other apps!

#define WWW_RAW		HTAtom_for("www/void")   /* Raw output from Protocol */

WWW_RAW is an output format which leaves the input untouched exactly as it is received by the protocol module. For example, in the case of FTP, this format returns raw ASCII objects for directory listings; for HTTP, everything including the header is returned; for Gopher, a raw ASCII object is returned for a menu, etc.

#define WWW_SOURCE	HTAtom_for("*/*")

WWW_SOURCE is an output format which leaves the input untouched exactly as it is received by the protocol module, unless a suitable converter has been registered with a quality factor higher than 1 (for example 2). In that case the SUPER CONVERTER is preferred for the raw output. This can be used as a filter that converts, for example, raw FTP directory listings into HTML but passes a MIME body through untouched.

#define WWW_PRESENT	HTAtom_for("www/present")

WWW_PRESENT represents the user's perception of the document. If you convert to WWW_PRESENT, you present the material to the user.

#define WWW_DEBUG	HTAtom_for("www/debug")

WWW_DEBUG represents the user's perception of debug information, for example sent as an HTML document in an HTTP redirection message.

#define WWW_UNKNOWN     HTAtom_for("www/unknown")

WWW_UNKNOWN is a really unknown type. It differs from the real MIME type "application/octet-stream" in that we haven't even tried to figure out the content type at this point.

The following regular MIME types are predefined. Others can be added!

#define WWW_HTML 	HTAtom_for("text/html")
#define WWW_PLAINTEXT 	HTAtom_for("text/plain")

#define WWW_MIME	HTAtom_for("message/rfc822")
#define WWW_MIME_HEAD	HTAtom_for("message/x-rfc822-head")
#define WWW_MIME_FOOT	HTAtom_for("message/x-rfc822-foot")

#define WWW_AUDIO       HTAtom_for("audio/basic")

#define WWW_VIDEO 	HTAtom_for("video/mpeg")

#define WWW_GIF 	HTAtom_for("image/gif")
#define WWW_JPEG 	HTAtom_for("image/jpeg")
#define WWW_TIFF 	HTAtom_for("image/tiff")
#define WWW_PNG 	HTAtom_for("image/png")

#define WWW_BINARY 	HTAtom_for("application/octet-stream")
#define WWW_POSTSCRIPT 	HTAtom_for("application/postscript")
#define WWW_RICHTEXT 	HTAtom_for("application/rtf")

We also have some MIME types that come from the various protocols when we convert from ASCII to HTML.

#define WWW_GOPHER_MENU HTAtom_for("text/x-gopher")
#define WWW_CSO_SEARCH	HTAtom_for("text/x-cso")

#define WWW_FTP_LNST	HTAtom_for("text/x-ftp-lnst")
#define WWW_FTP_LIST	HTAtom_for("text/x-ftp-list")

#define WWW_NNTP_LIST   HTAtom_for("text/x-nntp-list")
#define WWW_NNTP_OVER	HTAtom_for("text/x-nntp-over")
#define WWW_NNTP_HEAD	HTAtom_for("text/x-nntp-head")

#define WWW_HTTP	HTAtom_for("text/x-http")

Finally we have defined a special format for our RULE files as they can be handled by a special converter.

#define WWW_RULES	HTAtom_for("application/x-www-rules")

Register Presenters

This function creates a presenter object and adds it to the list of conversions. These are the parameters:

conversions: The list of converters and presenters
rep_in: the MIME-style format name
rep_out: is the resulting content-type after the conversion
converter: is the routine to call which actually does the conversion
quality: A degradation factor [0..1]
maxbytes: A limit on the length acceptable as input (0 infinite)
maxsecs: A limit on the time user will wait (0 for infinity)

extern void HTPresentation_add (HTList *	conversions,
				const char * 	representation,
				const char * 	command,
				const char * 	test_command,
				double		quality,
				double		secs, 
				double		secs_per_byte);

extern void HTPresentation_deleteAll	(HTList * list);

Register Converters

This function creates a presentation object for a converter and adds it to the list of conversions. These are the parameters:

conversions: The list of converters and presenters
rep_in: the MIME-style format name
rep_out: is the resulting content-type after the conversion
converter: is the routine to call which actually does the conversion
quality: A degradation factor [0..1]
maxbytes: A limit on the length acceptable as input (0 infinite)
maxsecs: A limit on the time user will wait (0 for infinity)

extern void HTConversion_add   (HTList *	conversions,
				const char * 	rep_in,
				const char * 	rep_out,
				HTConverter *	converter,
				double		quality,
				double		secs, 
				double		secs_per_byte);

extern void HTConversion_deleteAll	(HTList * list);

Content and Transfer Encoders and Decoders

Content codings are transformations applied to an entity object after it was created in its original form. The Library handles two types of codings:

Content Codings: Content coding values indicate an encoding transformation that has been applied to a resource. Content codings are primarily used to allow a document to be compressed or encrypted without losing the identity of its underlying media type.
Content Transfer Codings: Content transfer coding values are used to indicate an encoding transformation that has been, can be, or may need to be applied to an entity body in order to ensure safe transport through the network. This differs from a content coding in that the transfer coding is a property of the message, not of the original entity.

Both types of encodings use the same registration mechanism in the Library which we describe below:

Encoders and Decoders

Encoders and decoders are subclassed from the generic stream class. Encoders are capable of adding a content coding to a data object and decoders can remove a content coding.

typedef HTStream * HTCoder	(HTRequest *	request,
				 void *		param,
				 HTEncoding	coding,
				 HTStream *	target);

The encoding is the name of the encoding mechanism represented as an atom, for example "zip", "chunked", etc. Encodings are registered in lists, and content encodings are separated from transfer encodings by registering them in different lists.

The HTCoding Object

The HTCoding object represents a registered encoding together with an encoder and a decoder.

typedef struct _HTCoding HTCoding;

Predefined Coding Types

We have a set of predefined atoms for various types of content encodings and transfer encodings. "chunked" is not exactly in the same group as the other encodings such as "binary" but it really doesn't make any difference as it is just a matter of how the words are chosen. The first three transfer encodings are actually not encodings - they are just leftovers from brain dead mail systems.

#define WWW_CTE_7BIT		HTAtom_for("7bit")
#define WWW_CTE_8BIT		HTAtom_for("8bit")
#define WWW_CTE_BINARY		HTAtom_for("binary")

#define WWW_CTE_BASE64		HTAtom_for("base64")
#define WWW_CTE_MACBINHEX	HTAtom_for("macbinhex")
#define WWW_CTE_CHUNKED		HTAtom_for("chunked")

#define WWW_CE_COMPRESS		HTAtom_for("compress")
#define WWW_CE_GZIP		HTAtom_for("gzip")

Register Content Coders

There is no difference between registering a content encoder and registering a content decoder; it all depends on how you use the list of encoders/decoders.

extern BOOL HTCoding_add (HTList * 	list,
			 const char *	encoding,
			 HTCoder *	encoder,
			 HTCoder *	decoder,
			 double		quality);

extern void HTCoding_deleteAll (HTList * list);

extern const char * HTCoding_name (HTCoding * me);

Content Charsets

Register a Charset

extern void HTCharset_add (HTList *		list,
			   const char *		charset,
			   double		quality);

Delete a list of Charsets

typedef struct _HTAcceptNode {
    HTAtom *	atom;
    double	quality;
} HTAcceptNode;

extern void HTCharset_deleteAll	(HTList * list);

Content Languages

Register a Language

extern void HTLanguage_add (HTList *		list,
			    const char *	lang,
			    double		quality);

Delete a list of Languages

extern void HTLanguage_deleteAll (HTList * list);

Global Preferences

There are two places where these preferences can be registered: in a global list valid for all requests and in a local list valid for a particular request only. The lists in this section are global and valid for all requests. See the Request Manager for local sets.

Converters and Presenters

This is the global list of specific conversions which the Format Manager can do in order to fulfill the request. There is also a local list of conversions which contains a generic set of possible conversions.

extern void HTFormat_setConversion	(HTList * list);
extern HTList * HTFormat_conversion	(void);

Content Codings

extern void HTFormat_setContentCoding	(HTList * list);
extern HTList * HTFormat_contentCoding	(void);

Content Transfer Codings

extern void HTFormat_setTransferCoding	(HTList * list);
extern HTList * HTFormat_transferCoding	(void);

We also define a macro to find out whether a transfer encoding is really an encoding or whether it is just a "dummy" as for example 7bit, 8bit, and binary.

#define HTFormat_isUnityTransfer(me) \
	((me)==NULL \
	|| (me)==WWW_CTE_BINARY ||  (me)==WWW_CTE_7BIT || (me)==WWW_CTE_8BIT)
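The macro compares atoms with ==, which works because the same string always yields the same atom. The following stand-alone sketch shows that idea with a tiny interning table. Atom, atom_for, MAX_ATOMS and is_unity_transfer are hypothetical names used only here; they are not the Library's HTAtom implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATOMS 32

/* Hypothetical miniature atom: one unique object per distinct string. */
typedef struct { const char *name; } Atom;

static Atom   atom_table[MAX_ATOMS];
static size_t atom_count = 0;

/*
 * Return the unique Atom for `name`, creating it on first use.
 * Returns NULL if the table is full. `name` must outlive the table
 * (string literals are used in this sketch).
 */
static Atom *atom_for(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < atom_count; i++)
        if (strcmp(atom_table[i].name, name) == 0)
            return &atom_table[i];
    if (atom_count == MAX_ATOMS)
        return NULL;
    atom_table[atom_count].name = name;
    return &atom_table[atom_count++];
}

/* Same shape as the macro above: pointer comparison against known atoms. */
#define is_unity_transfer(me) \
    ((me) == NULL || (me) == atom_for("binary") || \
     (me) == atom_for("7bit") || (me) == atom_for("8bit"))
```

Note that a macro of this shape evaluates its argument several times, so the argument should be free of side effects.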

Content Languages

extern void HTFormat_setLanguage	(HTList * list);
extern HTList * HTFormat_language	(void);

Content Charsets

extern void HTFormat_setCharset		(HTList * list);
extern HTList * HTFormat_charset	(void);

Delete All Global Lists

This is a convenience function that might make life easier.

extern void HTFormat_deleteAll (void);

Ranking of Accepted Formats

This function is used when the best match among several possible documents is to be found as a function of the accept headers sent in the client request.

typedef struct _HTContentDescription {
    char *	filename;
    HTFormat	content_type;
    HTLanguage	content_language;
    HTEncoding	content_encoding;
    HTEncoding	content_transfer;
    int		content_length;
    double	quality;
} HTContentDescription;

extern BOOL HTRank (HTList * possibilities,
		    HTList * accepted_content_types,
		    HTList * accepted_content_languages,
		    HTList * accepted_content_encodings);
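The ranking idea can be sketched in a few lines for the language dimension alone. This is not the Library's algorithm: Accept, Variant, accept_q and best_variant are hypothetical names, and scoring a variant as its own quality multiplied by the client's quality value is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one entry of the client's Accept-Language header. */
typedef struct { const char *lang; double q; } Accept;

/* Hypothetical: one document variant available on the server. */
typedef struct { const char *filename; const char *lang; double quality; } Variant;

/* Look up the client's q value for a language; 0.0 if it is not listed. */
static double accept_q(const Accept *acc, size_t n, const char *lang)
{
    assert(lang != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(acc[i].lang, lang) == 0)
            return acc[i].q;
    return 0.0;
}

/*
 * Return the index of the variant with the highest score, where the score
 * is variant quality times the client's q value, or -1 if no variant is
 * acceptable to the client.
 */
static int best_variant(const Variant *v, size_t nv,
                        const Accept *acc, size_t na)
{
    int    best = -1;
    double best_score = 0.0;
    for (size_t i = 0; i < nv; i++) {
        double score = v[i].quality * accept_q(acc, na, v[i].lang);
        if (score > best_score) {
            best_score = score;
            best = (int)i;
        }
    }
    return best;
}
```

For an English and a Danish variant of equal quality and a client that prefers Danish over English, the Danish variant is selected; if the client accepts neither language the function reports that there is no match.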

The Content Type Stream Stack

This is the routine which actually sets up the content type conversion. It currently checks only for direct conversions, but multi-stage conversions are foreseen. It takes a stream into which the output should be sent in the final format, builds the conversion stack, and returns a stream into which the data in the input format should be fed. If guess is true and the input format is www/unknown, it tries to guess the format by looking at the first few bytes of the stream.

extern HTStream * HTStreamStack (HTFormat	rep_in,
				 HTFormat	rep_out,
				 HTStream *	output_stream,
				 HTRequest *	request,
				 BOOL		guess);
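The selection of a direct conversion can be sketched as a search for the registered entry with the highest quality. Conversion and best_direct are hypothetical names; the sketch matches format names exactly and ignores wildcards and the guessing of unknown formats, so it is a simplification and not the Library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: a direct conversion and its quality. */
typedef struct {
    const char *rep_in;
    const char *rep_out;
    double      quality;
} Conversion;

/*
 * Return the registered direct conversion from `rep_in` to `rep_out`
 * with the highest quality, or NULL if none is registered.
 */
static const Conversion *best_direct(const Conversion *list, size_t n,
                                     const char *rep_in, const char *rep_out)
{
    const Conversion *best = NULL;
    assert(rep_in != NULL && rep_out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].rep_in, rep_in) != 0)
            continue;
        if (strcmp(list[i].rep_out, rep_out) != 0)
            continue;
        if (best == NULL || list[i].quality > best->quality)
            best = &list[i];
    }
    return best;
}
```

If two converters are registered for the same pair of formats, the one with the higher quality wins; a NULL result corresponds to the case where no stream stack can be built.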

Cost of a Stream Stack

This function must return the cost of the same stack that HTStreamStack would set up.

extern double HTStackValue	(HTList *	conversions,
				 HTFormat	format_in,
				 HTFormat	format_out,
				 double		initial_value,
				 long int	length);
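One plausible cost model, given the quality, secs and secs_per_byte values stored with each conversion, is sketched below. The formula, the function stack_value and the max_secs parameter are assumptions made for illustration; the text above does not specify how the Library computes the value.

```c
#include <assert.h>

/*
 * Hypothetical cost model: scale the initial value by the conversion's
 * quality, then subtract the estimated conversion time as a fraction of
 * the time the user is willing to wait. A `max_secs` of 0 means that
 * time is not taken into account.
 */
static double stack_value(double initial_value, double quality,
                          double secs, double secs_per_byte,
                          long length, double max_secs)
{
    double value = initial_value * quality;
    assert(length >= 0);
    if (max_secs > 0.0)
        value -= (secs + secs_per_byte * (double)length) / max_secs;
    return value;
}
```

Under this model a conversion with quality 1 that takes 10 seconds, with a user limit of 20 seconds, is valued at 0.5, the same as an instantaneous conversion with quality 0.5.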

Content Encoding Stream Stack

When creating a coding stream stack, it is important that we keep the right order of encoders and decoders. As an example, the HTTP spec specifies that the list in the Content-Encoding header follows the order in which the encodings have been applied to the object. Internally, we represent the content encodings as atoms in a linked list object.

The creation of the content coding stack is not based on quality factors because we don't have the same freedom as with content types. When using content codings we must apply the codings specified or fail.

extern HTStream * HTContentCodingStack (HTEncoding	coding,
					HTStream *	target,
					HTRequest *	request,
					void *		param,
					BOOL		encoding);

Here you can provide a complete list instead of a single token. The list has to be filled up in the order the _encodings_ are to be applied.

extern HTStream * HTContentEncodingStack (HTList *	encodings,
					  HTStream *	target,
					  HTRequest *	request,
					  void *	param);

Here you can provide a complete list instead of a single token. The list has to be in the order the _encodings_ were applied - that is, the same way that _encodings_ are to be applied. This is all consistent with the order of the Content-Encoding header.

extern HTStream * HTContentDecodingStack (HTList *	encodings,
					  HTStream *	target,
					  HTRequest *	request,
					  void *	param);
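The effect of the ordering can be sketched with a toy chain of stages. Given the encodings in the order they were applied, each decoder is wrapped around the current head of the chain, so the decoder for the last applied encoding is the first to see the incoming data. Stage and decoding_stack are hypothetical names; the sketch only records the order of the stages and does no decoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pipeline stage: a name and the stage it writes to. */
typedef struct Stage {
    const char   *name;
    struct Stage *next;
} Stage;

/*
 * Build a decoder chain in front of `target`. `applied` lists the
 * encodings in the order they were applied (the order of the
 * Content-Encoding header) and `stages` provides storage for `n` stages.
 * Returns the head of the chain, that is, the stage that receives the
 * encoded data.
 */
static Stage *decoding_stack(const char *const *applied, size_t n,
                             Stage *stages, Stage *target)
{
    Stage *head = target;
    assert(n == 0 || (applied != NULL && stages != NULL));
    for (size_t i = 0; i < n; i++) {
        stages[i].name = applied[i];
        stages[i].next = head;  /* earlier codings end up nearer the target */
        head = &stages[i];
    }
    return head;
}
```

For a list containing "compress" followed by "gzip", data enters the gzip stage first, then the compress stage, and finally reaches the target, which undoes the codings in the reverse of the order in which they were applied.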

Content Transfer Encoding Stream Stack

Creating the content transfer encoding stream stack is not based on quality factors because we don't have the same freedom as with content types. Specify whether you want encoding or decoding using the BOOL "encode" flag.

extern HTStream * HTTransferCodingStack (HTEncoding	encoding,
					 HTStream *	target,
					 HTRequest *	request,
					 void *		param,
					 BOOL		encode);

#endif /* HTFORMAT_H */

@(#) $Id: HTFormat.html,v 2.67 1996/06/01 17:46:48 frystyk Exp $