/* ** (c) COPYRIGHT MIT 1995. ** Please first read the full copyright statement in the file COPYRIGH. */
The Format Manager is responsible for setting up the stream pipe from the
Channel Object to the Request Object when data is arriving, for example as a
response to an HTTP GET request. As data arrives, we start to parse it, and
the more we know, the more of the stream pipe we can build.
For example, in the case of HTTP, we first have a stream that can parse the
HTTP response line containing "200 OK". Then we have a MIME parser for
handling the MIME headers. When the MIME headers have been parsed, we know
the content type and any encoding of the MIME body. If we need to decode a
chunked encoding then we set up a chunked decoder, and if we have to parse
an HTML object then we set up an HTML parser.
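The progressive construction described above can be sketched as follows. This is only a self-contained model and not the Library's stream interface: `Pipe`, `pipe_push`, `pipe_after_headers`, `build_http_pipe`, and the stage names are all hypothetical, chosen to show how the pipe grows as more of the response is understood.

```c
#include <assert.h>
#include <string.h>

#define MAX_STAGES 8

/* Hypothetical model of a stream pipe: an ordered list of stage names. */
typedef struct {
    const char *stages[MAX_STAGES];
    int count;
} Pipe;

/* Append one stage to the end of the pipe; returns 0 if the pipe is full. */
static int pipe_push(Pipe *p, const char *stage)
{
    assert(p != NULL && stage != NULL);
    if (p->count >= MAX_STAGES) return 0;
    p->stages[p->count++] = stage;
    return 1;
}

/* Extend the pipe once the MIME headers are known. */
static void pipe_after_headers(Pipe *p, const char *content_type,
                               const char *transfer_encoding)
{
    if (transfer_encoding && strcmp(transfer_encoding, "chunked") == 0)
        pipe_push(p, "chunked-decoder");
    if (content_type && strcmp(content_type, "text/html") == 0)
        pipe_push(p, "html-parser");
}

/* Build the pipe for an HTTP response step by step, as more is learned. */
static Pipe build_http_pipe(const char *content_type,
                            const char *transfer_encoding)
{
    Pipe p = { {0}, 0 };
    pipe_push(&p, "status-line-parser");   /* known from the start */
    pipe_push(&p, "mime-header-parser");   /* known after the status line */
    pipe_after_headers(&p, content_type, transfer_encoding);
    return p;
}
```

In this model, a chunked HTML response ends up with four stages, while a response with an unregistered content type and no transfer encoding stops after the MIME header parser.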
The Format Manager is also responsible for keeping track of the "preferences" of the application and/or user. It is an integral part of the Web and HTTP that the client application can express its preferences as a set of "accept" headers in an HTTP request. This task is closely related to the one above: there we use the modules that are registered, and here we tell the remote server what we are capable of doing and what we would prefer.
Note: The library core does not define any default decoders or parsers; they are all considered part of the application. The library comes with a default set of parsers, including the ones mentioned above, which can be initialized using the functions in the HTInit module. There are separate initialization functions for content type parsers and for content encodings.
Currently there are four dimensions for handling "accept" headers:
The application can assign its preferences in two ways: either locally to a single request or globally to all requests. The local assignment can either add to or override the global settings depending on how they are registered. All local registration is handled by the Request Object and the global registration is handled by the Format Manager.
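As a rough illustration of the "add to or override" behaviour, the sketch below computes the effective preference list for one request. It is a simplified model under our own assumptions: `PrefList`, `effective_prefs`, and the `override` flag are hypothetical names, and the choice to place local entries ahead of global ones is an assumption, not something the Library guarantees.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PREFS 16

/* Hypothetical fixed-size preference list. */
typedef struct {
    const char *items[MAX_PREFS];
    size_t count;
} PrefList;

/* Compute the effective list for one request.
 * With override set, only the local entries are used.
 * Otherwise the local entries come first, followed by the global ones.
 * A request without a local list (local == NULL) always sees the globals. */
static PrefList effective_prefs(const PrefList *global,
                                const PrefList *local, int override)
{
    PrefList out = { {0}, 0 };
    size_t i;
    assert(global != NULL);
    if (local != NULL) {
        for (i = 0; i < local->count && out.count < MAX_PREFS; i++)
            out.items[out.count++] = local->items[i];
        if (override) return out;
    }
    for (i = 0; i < global->count && out.count < MAX_PREFS; i++)
        out.items[out.count++] = global->items[i];
    return out;
}
```

With two global entries and one local entry, the additive case yields three entries and the override case yields only the local one.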
When data is arriving and the Format Manager is to build a stream pipe from the HTChannel Object to the Request Object, it uses a special set of stream stack algorithms. These algorithms take the complete set of registered modules, for example the content type converters, and find the one best suited for the task. We currently have three stream stack algorithms: one for content types, one for content encodings, and one for content transfer encodings.
As part of the HTTP content negotiation, a server must be able to match the preferences sent by the client in an HTTP request with the possible set of documents that it has available for this URL. For example, it may have an English and a Danish version, in which case it looks at the Accept-Language header and sees what the client prefers. The Library has a simple "Match" algorithm for finding the best match.
This module is implemented by HTFormat.c, and it is a part of the W3C Reference Library.
#ifndef HTFORMAT_H
#define HTFORMAT_H

#include "HTUtils.h"
#include "HTStream.h"
#include "HTAtom.h"
#include "HTList.h"
#include "HTAnchor.h"
#include "HTReq.h"
All content type converters are subclassed from the generic stream object. That way, we allow the application to do very fast progressive display of incoming data. In other words, the stream model of the Library provides data as soon as it arrives from the network; the application does not have to wait until the whole document has been downloaded before it starts parsing it.
A converter is a stream with a special set of parameters and which is
registered as capable of converting from a MIME type to something else
(maybe another MIME type). A converter is defined to be a function returning
a stream and accepting the following parameters. The content type elements
are atoms for which we have defined a prototype.
typedef HTStream * HTConverter (HTRequest * request, void * param, HTFormat input_format, HTFormat output_format, HTStream * output_stream);
A presenter is a module (possibly an external program) which can present a
graphic object of a certain MIME type to the user. That is, presenters are
normally used to present objects that the converters are not able to handle.
Data is transferred to the external program using, for example, the
HTSaveAndExecute stream, which writes to a local file. Both presenters and
converters are of the type HTConverter.
typedef struct _HTPresentation {
    HTFormat      rep;            /* representation name atomized */
    HTFormat      rep_out;        /* resulting representation */
    HTConverter * converter;      /* The routine to gen the stream stack */
    char *        command;        /* MIME-format string */
    char *        test_command;   /* MIME-format string */
    double        quality;        /* Between 0 (bad) and 1 (good) */
    double        secs;
    double        secs_per_byte;
} HTPresentation;
We have a small set of basic converters that can be hooked in anywhere. They don't "convert" anything but are nice to have.
extern HTConverter HTThroughLine;
extern HTConverter HTBlackHoleConverter;
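To make the idea of converters that "don't convert anything" concrete, here is a small self-contained model of what their names suggest. It does not use the Library's HTStream or HTConverter types: `Sink`, `counting_put`, `discard_put`, `through_line`, and `black_hole` are hypothetical stand-ins, and the real converters take more parameters than shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for a stream:
 * a sink that accepts blocks of bytes. */
typedef struct Sink Sink;
struct Sink {
    void (*put_block)(Sink *self, const char *buf, size_t len);
    size_t bytes_seen;   /* total bytes delivered to this sink */
};

/* A target that just counts what it receives. */
static void counting_put(Sink *self, const char *buf, size_t len)
{
    (void)buf;
    self->bytes_seen += len;
}

/* Black hole behaviour: drop everything. */
static void discard_put(Sink *self, const char *buf, size_t len)
{
    (void)self; (void)buf; (void)len;
}

/* "Through line": adds no stage and simply hands back the target. */
static Sink *through_line(Sink *target)
{
    assert(target != NULL);
    return target;
}

/* "Black hole": returns a sink that ignores its input,
 * so the target is never written to. */
static Sink *black_hole(Sink *target)
{
    static Sink hole = { discard_put, 0 };
    (void)target;
    return &hole;
}
```

Writing five bytes through `through_line(&target)` increases the target's counter by five; writing through `black_hole(&target)` leaves it unchanged.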
These macros (which used to be constants) define some basic internally referenced
representations. The www/xxx
ones are of course not MIME standard.
They are internal representations used in the Library but they can't be exported
to other apps!
#define WWW_RAW HTAtom_for("www/void") /* Raw output from Protocol */
WWW_RAW is an output format which leaves the input untouched, exactly as it
is received by the protocol module. For example, in the case of FTP, this
format returns raw ASCII objects for directory listings; for HTTP, everything
including the header is returned; for Gopher, a raw ASCII object is returned
for a menu, etc.
#define WWW_SOURCE HTAtom_for("*/*")
WWW_SOURCE is an output format which leaves the input untouched, exactly as
it is received by the protocol module, unless a suitable converter has been
registered with a quality factor higher than 1 (for example 2). In that case
this "super converter" is preferred over the raw output. This can be used as
a filter that converts, for example, raw FTP directory listings into HTML but
passes a MIME body through untouched.
#define WWW_PRESENT HTAtom_for("www/present")
WWW_PRESENT represents the user's perception of the document. If you convert
to WWW_PRESENT, you present the material to the user.
#define WWW_DEBUG HTAtom_for("www/debug")
WWW_DEBUG represents the user's perception of debug information, for example
sent as an HTML document in an HTTP redirection message.
#define WWW_UNKNOWN HTAtom_for("www/unknown")
WWW_UNKNOWN is a really unknown type. It differs from the real MIME type
"application/octet-stream" in that we haven't even tried to figure out the
content type at this point.
These are regular MIME types defined. Others can be added!
#define WWW_HTML        HTAtom_for("text/html")
#define WWW_PLAINTEXT   HTAtom_for("text/plain")
#define WWW_MIME        HTAtom_for("message/rfc822")
#define WWW_MIME_HEAD   HTAtom_for("message/x-rfc822-head")
#define WWW_MIME_FOOT   HTAtom_for("message/x-rfc822-foot")
#define WWW_AUDIO       HTAtom_for("audio/basic")
#define WWW_VIDEO       HTAtom_for("video/mpeg")
#define WWW_GIF         HTAtom_for("image/gif")
#define WWW_JPEG        HTAtom_for("image/jpeg")
#define WWW_TIFF        HTAtom_for("image/tiff")
#define WWW_PNG         HTAtom_for("image/png")
#define WWW_BINARY      HTAtom_for("application/octet-stream")
#define WWW_POSTSCRIPT  HTAtom_for("application/postscript")
#define WWW_RICHTEXT    HTAtom_for("application/rtf")
We also have some MIME types that come from the various protocols when we convert from ASCII to HTML.
#define WWW_GOPHER_MENU HTAtom_for("text/x-gopher")
#define WWW_CSO_SEARCH  HTAtom_for("text/x-cso")
#define WWW_FTP_LNST    HTAtom_for("text/x-ftp-lnst")
#define WWW_FTP_LIST    HTAtom_for("text/x-ftp-list")
#define WWW_NNTP_LIST   HTAtom_for("text/x-nntp-list")
#define WWW_NNTP_OVER   HTAtom_for("text/x-nntp-over")
#define WWW_NNTP_HEAD   HTAtom_for("text/x-nntp-head")
#define WWW_HTTP        HTAtom_for("text/x-http")
Finally we have defined a special format for our RULE files as they can be handled by a special converter.
#define WWW_RULES HTAtom_for("application/x-www-rules")
This function creates a presenter object and adds it to the list of conversions. The conversions parameter is the list of converters and presenters.
extern void HTPresentation_add (HTList *     conversions,
                                const char * representation,
                                const char * command,
                                const char * test_command,
                                double       quality,
                                double       secs,
                                double       secs_per_byte);
extern void HTPresentation_deleteAll (HTList * list);
This function creates a converter object and adds it to the list of conversions. The conversions parameter is the list of converters and presenters.
extern void HTConversion_add (HTList *      conversions,
                              const char *  rep_in,
                              const char *  rep_out,
                              HTConverter * converter,
                              double        quality,
                              double        secs,
                              double        secs_per_byte);
extern void HTConversion_deleteAll (HTList * list);
Content codings are transformations applied to an entity object after it was created in its original form. The Library handles two types of codings: content codings and content transfer codings.
Both types of encodings use the same registration mechanism in the Library which we describe below:
Encoders and decoders are subclassed from the generic stream class. Encoders are capable of adding a content coding to a data object and decoders can remove a content coding.
typedef HTStream * HTCoder (HTRequest * request, void * param, HTEncoding coding, HTStream * target);
The encoding is the name of the encoding mechanism represented as an atom, for example "zip", "chunked", etc. Encodings are registered in lists, and content encodings are separated from transfer encodings by registering them in different lists.
The HTCoding object represents a registered encoding together with an encoder and a decoder.
typedef struct _HTCoding HTCoding;
Predefined Coding Types
We have a set of predefined atoms for various types of content encodings and transfer encodings. "chunked" is not exactly in the same group as the other encodings such as "binary", but it really doesn't make any difference as it is just a matter of how the words are chosen. The first three transfer encodings are actually not encodings; they are just leftovers from brain-dead mail systems.
#define WWW_CTE_7BIT      HTAtom_for("7bit")
#define WWW_CTE_8BIT      HTAtom_for("8bit")
#define WWW_CTE_BINARY    HTAtom_for("binary")
#define WWW_CTE_BASE64    HTAtom_for("base64")
#define WWW_CTE_MACBINHEX HTAtom_for("macbinhex")
#define WWW_CTE_CHUNKED   HTAtom_for("chunked")
#define WWW_CE_COMPRESS   HTAtom_for("compress")
#define WWW_CE_GZIP       HTAtom_for("gzip")
There is no difference between registering a content encoder and registering a content decoder; it all depends on how you use the list of encoders/decoders.
extern BOOL HTCoding_add (HTList *     list,
                          const char * encoding,
                          HTCoder *    encoder,
                          HTCoder *    decoder,
                          double       quality);
extern void HTCoding_deleteAll (HTList * list);
extern const char * HTCoding_name (HTCoding * me);
extern void HTCharset_add (HTList * list, const char * charset, double quality);
typedef struct _HTAcceptNode { HTAtom * atom; double quality; } HTAcceptNode;
extern void HTCharset_deleteAll (HTList * list);
extern void HTLanguage_add (HTList * list, const char * lang, double quality);
extern void HTLanguage_deleteAll (HTList * list);
There are two places where these preferences can be registered: in a global list valid for all requests and in a local list valid for a particular request only. The functions below manage the global lists, which are valid for all requests. See the Request Manager for local sets.
This is the global list of specific conversions which the Format Manager can do in order to fulfill the request. There is also a local list of conversions which contains a generic set of possible conversions.
extern void HTFormat_setConversion (HTList * list); extern HTList * HTFormat_conversion (void);
extern void HTFormat_setContentCoding (HTList * list); extern HTList * HTFormat_contentCoding (void);
extern void HTFormat_setTransferCoding (HTList * list); extern HTList * HTFormat_transferCoding (void);
We also define a macro to find out whether a transfer encoding is really an encoding or whether it is just a "dummy" as for example 7bit, 8bit, and binary.
#define HTFormat_isUnityTransfer(me) \
        ((me)==NULL \
        || (me)==WWW_CTE_BINARY || (me)==WWW_CTE_7BIT || (me)==WWW_CTE_8BIT)
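The macro compares atoms by address, which works because an atom table hands out one stable pointer per distinct name. The sketch below reproduces that idea in miniature so the check can be tried in isolation. The names `atom_for`, `atom_table`, and `is_unity_transfer` are hypothetical and the table is deliberately tiny; it is not the Library's HTAtom implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature atom table: each distinct name maps to one
 * stable pointer, so two atoms can be compared by address. */
#define MAX_ATOMS 16
static const char *atom_table[MAX_ATOMS];
static size_t atom_count = 0;

/* Return the atom for name, interning it on first use.
 * The caller must pass a string that outlives the table. */
static const char *atom_for(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < atom_count; i++)
        if (strcmp(atom_table[i], name) == 0) return atom_table[i];
    assert(atom_count < MAX_ATOMS);
    atom_table[atom_count] = name;
    return atom_table[atom_count++];
}

/* Same shape as the macro above: NULL, binary, 7bit, and 8bit
 * need no decoder because they do not transform the data. */
static int is_unity_transfer(const char *cte)
{
    return cte == NULL
        || cte == atom_for("binary")
        || cte == atom_for("7bit")
        || cte == atom_for("8bit");
}
```

A caller would typically skip building a transfer decoding stack whenever this check is true and only look for a registered decoder for values such as "chunked" or "base64".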
extern void HTFormat_setLanguage (HTList * list); extern HTList * HTFormat_language (void);
extern void HTFormat_setCharset (HTList * list); extern HTList * HTFormat_charset (void);
This is a convenience function that deletes all the global lists above in one call.
extern void HTFormat_deleteAll (void);
This function is used when the best match among several possible documents is to be found as a function of the accept headers sent in the client request.
typedef struct _HTContentDescription {
    char *      filename;
    HTFormat    content_type;
    HTLanguage  content_language;
    HTEncoding  content_encoding;
    HTEncoding  content_transfer;
    int         content_length;
    double      quality;
} HTContentDescription;

extern BOOL HTRank (HTList * possibilities,
                    HTList * accepted_content_types,
                    HTList * accepted_content_languages,
                    HTList * accepted_content_encodings);
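One simple way to picture such a match is to multiply the quality factors the client assigned to each property of a candidate document and pick the highest product. The sketch below does that for content type and language only. It is an illustration under our own assumptions and not the algorithm HTRank implements: `Accept`, `Variant`, `accept_quality`, and `rank_variants` are hypothetical names, and treating an empty accept list as "no preference" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One accept entry: a value and the quality the client assigned to it. */
typedef struct { const char *value; double quality; } Accept;

/* One candidate document on the server. */
typedef struct {
    const char *filename;
    const char *content_type;
    const char *content_language;
    double quality;              /* filled in by rank_variants */
} Variant;

/* Quality the client assigned to value, or 0.0 if it is not listed.
 * An empty list (n == 0) is taken to mean "no preference": 1.0. */
static double accept_quality(const Accept *list, size_t n, const char *value)
{
    size_t i;
    if (n == 0) return 1.0;
    for (i = 0; i < n; i++)
        if (strcmp(list[i].value, value) == 0) return list[i].quality;
    return 0.0;
}

/* Score every variant and return the index of the best one,
 * or -1 if none of them is acceptable to the client. */
static int rank_variants(Variant *v, size_t nv,
                         const Accept *types, size_t nt,
                         const Accept *langs, size_t nl)
{
    int best = -1;
    size_t i;
    assert(v != NULL || nv == 0);
    for (i = 0; i < nv; i++) {
        v[i].quality = accept_quality(types, nt, v[i].content_type)
                     * accept_quality(langs, nl, v[i].content_language);
        if (v[i].quality > 0.0
            && (best < 0 || v[i].quality > v[best].quality))
            best = (int)i;
    }
    return best;
}
```

For a client that accepts Danish with quality 1.0 and English with quality 0.5, the Danish variant wins; if the client accepts only a language the server does not have, no variant is selected.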
This is the routine which actually sets up the content type conversion. It
currently checks only for direct conversions, but multi-stage conversions
are foreseen. It takes a stream into which the output should be sent in the
final format, builds the conversion stack, and returns a stream into which
the data in the input format should be fed. If guess is true and the input
format is www/unknown, it tries to guess the format by looking at the first
few bytes of the stream.
extern HTStream * HTStreamStack (HTFormat rep_in, HTFormat rep_out, HTStream * output_stream, HTRequest * request, BOOL guess);
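The search for a direct conversion can be pictured as a scan over the registered conversions that keeps the best-quality entry whose input and output formats match. The following is a minimal sketch of that idea, not the Library's implementation: `Conversion`, `format_matches`, and `best_conversion` are hypothetical, formats are plain strings instead of atoms, and the wildcard rule is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registered conversion: formats as plain strings. */
typedef struct {
    const char *rep_in;
    const char *rep_out;
    double quality;
} Conversion;

/* A registered format matches if it equals the wanted format
 * or if it is the catch-all wildcard format. */
static int format_matches(const char *registered, const char *wanted)
{
    return strcmp(registered, "*/*") == 0 || strcmp(registered, wanted) == 0;
}

/* Return the highest-quality direct conversion from in to out,
 * or NULL if no registered conversion applies. */
static const Conversion *best_conversion(const Conversion *list, size_t n,
                                         const char *in, const char *out)
{
    const Conversion *best = NULL;
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (format_matches(list[i].rep_in, in)
            && format_matches(list[i].rep_out, out)
            && (best == NULL || list[i].quality > best->quality))
            best = &list[i];
    }
    return best;
}
```

When two converters are registered for the same pair of formats, the one with the higher quality is chosen; a wildcard entry acts as a fallback for input formats that have no specific converter.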
Must return the cost of the same stack which HTStreamStack would set up.
extern double HTStackValue (HTList * conversions, HTFormat format_in, HTFormat format_out, double initial_value, long int length);
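The parameters of HTStackValue, together with the quality, secs, and secs_per_byte fields of a registered conversion, suggest a cost model along the following lines. This is only a plausible sketch and an assumption on our part, not the formula the Library uses: `stack_value` and its `max_secs` parameter (the time we are willing to wait) are hypothetical.

```c
#include <assert.h>

/* Hypothetical cost model: scale the initial value by the converter's
 * quality, then subtract a penalty proportional to the estimated
 * conversion time relative to max_secs. With max_secs <= 0 the time
 * penalty is disabled. */
static double stack_value(double initial_value, double quality,
                          double secs, double secs_per_byte,
                          long length, double max_secs)
{
    double value = initial_value * quality;
    assert(length >= 0);
    if (max_secs > 0.0)
        value -= (secs + secs_per_byte * (double)length) / max_secs;
    return value;
}
```

Under this model a converter with perfect quality but a five second start-up cost scores the same as an instant converter of quality 0.5 when we are prepared to wait ten seconds.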
When creating a coding stream stack, it is important that we keep the right order of encoders and decoders. As an example, the HTTP spec specifies that the list in the Content-Encoding header follows the order in which the encodings have been applied to the object. Internally, we represent the content encodings as atoms in a linked list object.
The creation of the content coding stack is not based on quality factors as we don't have the same freedom as with content types. When using content codings we must apply the codings specified or fail.
extern HTStream * HTContentCodingStack (HTEncoding coding, HTStream * target, HTRequest * request, void * param, BOOL encoding);
Here you can provide a complete list instead of a single token. The list has to be filled up in the order the _encodings_ are to be applied.
extern HTStream * HTContentEncodingStack (HTList * encodings, HTStream * target, HTRequest * request, void * param);
Here you can provide a complete list instead of a single token. The list has to be in the order the _encodings_ were applied - that is, the same way that _encodings_ are to be applied. This is all consistent with the order of the Content-Encoding header.
extern HTStream * HTContentDecodingStack (HTList * encodings, HTStream * target, HTRequest * request, void * param);
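Because the list records the order in which the encodings were applied, the incoming data must pass through the decoders in the opposite order: the coding applied last is the outermost layer and has to be removed first. The helper below makes that ordering explicit. It is a hypothetical illustration (`decode_order` is not a Library function) and works on plain strings instead of atoms.

```c
#include <assert.h>
#include <stddef.h>

/* Fill order[] with the sequence in which decoders must see the data:
 * the coding that was applied last is removed first.
 * Returns the number of entries written, or 0 if order[] is too small. */
static size_t decode_order(const char *const *applied, size_t n,
                           const char **order, size_t cap)
{
    size_t i;
    assert(applied != NULL || n == 0);
    if (n > cap) return 0;
    for (i = 0; i < n; i++)
        order[i] = applied[n - 1 - i];
    return n;
}
```

For an object that was first gzipped and then compressed, the list reads "gzip, compress" and the data must be fed to the compress decoder first and the gzip decoder second.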
Creating the content transfer coding stream stack is not based on quality factors as we don't have the same freedom as with content types. Specify whether you want encoding or decoding using the BOOL "encode" flag.
extern HTStream * HTTransferCodingStack (HTEncoding encoding, HTStream * target, HTRequest * request, void * param, BOOL encode);
#endif /* HTFORMAT_H */