Revision 2.39, Wed May 10 23:49:40 1995 UTC, by frystyk
Branches: MAIN
CVS tags: HEAD
charset and level

<TITLE>The Format Manager in the WWW Library</TITLE>
<NEXTID N="z18">

<H1>The Format Manager</H1>

**	(c) COPYRIGHT CERN 1994.
**	Please first read the full copyright statement in the file COPYRIGH.

Here we describe the functions of the HTFormat module, which handles
conversion between different data representations. (In MIME parlance,
a representation is known as a content-type. In <A
HREF="">WWW</A> the
term <EM>format</EM> is often used, as it is shorter.) This module
contains:

<UL>
<LI><A HREF="#z15">Buffering for I/O</A>
<LI><A HREF="#FormatTypes">Predefined Format Types</A>
<LI><A HREF="#Encoding">Registration of Accepted Content Encodings</A>
<LI><A HREF="#Language">Registration of Accepted Content Languages</A>
<LI><A HREF="#Type">Registration of Accepted Converters and Presenters</A>
<LI><A HREF="#Rank">Ranking of Accepted Formats</A>
<LI><A HREF="#z3">The Stream Stack</A>
<LI><A HREF="#Read">Read from a Socket</A>
</UL>

This module is implemented by <A HREF="HTFormat.c">HTFormat.c</A>, and
it is a part of the <A NAME="z10">Library
of Common Code</A>.

#ifndef HTFORMAT_H
#define HTFORMAT_H

#include <A HREF="HTUtils.html">"HTUtils.h"</A>
#include <A HREF="HTStream.html">"HTStream.h"</A>
#include <A HREF="HTAtom.html">"HTAtom.h"</A>
#include <A HREF="HTList.html">"HTList.h"</A>

<H2><A NAME="z15">Buffering for network I/O</A></H2>

These routines provide buffering for READ (and, in the future, WRITE)
operations on the network. They are used by all the protocol modules.
The buffer size, <CODE>INPUT_BUFFER_SIZE</CODE>, is a compromise
between speed and memory.

#define INPUT_BUFFER_SIZE 8192

typedef struct _socket_buffer {
	char	input_buffer[INPUT_BUFFER_SIZE];
	char *	input_pointer;
	char *	input_limit;
	SOCKFD	input_file_number;
} HTInputSocket;

<H3>Create an Input Buffer</H3>

This function allocates an input buffer and binds it to the socket
descriptor given as a parameter.

extern HTInputSocket* HTInputSocket_new PARAMS((SOCKFD file_number));

<H3>Free an Input Buffer</H3>

extern void HTInputSocket_free PARAMS((HTInputSocket * isoc));
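The field names of <CODE>HTInputSocket</CODE> suggest a classic refill buffer: <CODE>input_pointer</CODE> marks the next unread byte, <CODE>input_limit</CODE> marks the end of the valid data, and the buffer is refilled with one read whenever the two meet. The sketch below illustrates that idea only; it is not the code in HTFormat.c. All names in it (<CODE>SketchInputSocket</CODE>, <CODE>sketch_new</CODE>, <CODE>sketch_getc</CODE>, <CODE>sketch_free</CODE>) are hypothetical, and it assumes a POSIX <CODE>read()</CODE> on a plain <CODE>int</CODE> descriptor in place of <CODE>SOCKFD</CODE>.

```c
#include <stdlib.h>
#include <unistd.h>

#define SKETCH_BUFFER_SIZE 8192   /* same value as INPUT_BUFFER_SIZE */

/* Hypothetical stand-in for HTInputSocket; a plain int replaces SOCKFD. */
typedef struct {
    char   buffer[SKETCH_BUFFER_SIZE];
    char * pointer;              /* next unread byte */
    char * limit;                /* one past the last valid byte */
    int    fd;
} SketchInputSocket;

/* Allocate an empty buffer bound to fd (compare HTInputSocket_new). */
SketchInputSocket * sketch_new(int fd)
{
    SketchInputSocket * isoc = malloc(sizeof *isoc);
    if (isoc != NULL) {
        isoc->pointer = isoc->limit = isoc->buffer;
        isoc->fd = fd;
    }
    return isoc;
}

/* Return the next byte (0..255), or -1 at end of file or on error.
   read() is called only when pointer has caught up with limit. */
int sketch_getc(SketchInputSocket * isoc)
{
    if (isoc->pointer >= isoc->limit) {
        ssize_t n = read(isoc->fd, isoc->buffer, sizeof isoc->buffer);
        if (n <= 0)
            return -1;
        isoc->pointer = isoc->buffer;
        isoc->limit = isoc->buffer + n;
    }
    return (unsigned char) *isoc->pointer++;
}

/* Release the buffer (compare HTInputSocket_free); fd is not closed. */
void sketch_free(SketchInputSocket * isoc)
{
    free(isoc);
}
```

Only one system call is made per buffer-full of data, which is the speed side of the speed/memory compromise described above.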

<A NAME="FormatTypes"><H2>The HTFormat types</H2></A>

We use the <A HREF="HTAtom.html">HTAtom</A> object for holding
representations. This allows faster manipulation (comparison and
copying) than if we stayed with strings. These macros (which used to
be constants) define some basic internally referenced representations.

typedef HTAtom * HTFormat;
typedef HTAtom * HTCharset;
typedef HTAtom * HTLevel;
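Atoms make comparison cheap because equal strings map to the same object, so two formats can be compared with <CODE>==</CODE> instead of <CODE>strcmp()</CODE>. The sketch below illustrates the idea with a hypothetical, deliberately tiny interning table (<CODE>SketchAtom</CODE>, <CODE>sketch_atom_for</CODE>) that is searched linearly; the data structure actually used by HTAtom.c may well differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical atom: exactly one shared object per distinct string. */
typedef struct { char * name; } SketchAtom;

#define SKETCH_MAX_ATOMS 64
static SketchAtom sketch_atoms[SKETCH_MAX_ATOMS];
static int        sketch_atom_count = 0;

/* Return the unique atom for string, creating it on first use.
   Returns NULL if the table is full or memory runs out. */
SketchAtom * sketch_atom_for(const char * string)
{
    int i;
    size_t len;
    char * copy;

    for (i = 0; i < sketch_atom_count; i++)
        if (strcmp(sketch_atoms[i].name, string) == 0)
            return &sketch_atoms[i];            /* already interned */

    if (sketch_atom_count == SKETCH_MAX_ATOMS)
        return NULL;
    len = strlen(string) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, string, len);                  /* private copy of the name */
    sketch_atoms[sketch_atom_count].name = copy;
    return &sketch_atoms[sketch_atom_count++];
}
```

Once both sides of a comparison are atoms, testing whether two content types are equal is a single pointer comparison.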

<H3>Internal ones</H3>

The <CODE>www/xxx</CODE> ones are of course not MIME
standard. <P>

<CODE>star/star</CODE> is an output format which leaves the input
untouched. It is useful for diagnostics, and for users who want to see
the original, whatever it is.

#define WWW_SOURCE	HTAtom_for("*/*")      /* Whatever it was originally */

<CODE>www/present</CODE> represents the user's perception of the
document.  If you convert to www/present, you present the material to
the user.

#define WWW_PRESENT	HTAtom_for("www/present")   /* The user's perception */

The <CODE>www/mime</CODE> format corresponds to MIME's message/rfc822: a
MIME message, or a plain text message with no MIME header. This is what
an HTTP server returns.

#define WWW_MIME	HTAtom_for("www/mime")		   /* A MIME message */

<CODE>www/print</CODE> is like www/present except it represents a
printed copy.

#define WWW_PRINT	HTAtom_for("www/print")		   /* A printed copy */

<CODE>www/unknown</CODE> is a truly unknown type, for which some
default action is appropriate.

#define WWW_UNKNOWN     HTAtom_for("www/unknown")

<H3>MIME ones</H3>

These are regular MIME types. Others can be added!

#define WWW_PLAINTEXT 	HTAtom_for("text/plain")
#define WWW_POSTSCRIPT 	HTAtom_for("application/postscript")
#define WWW_RICHTEXT 	HTAtom_for("application/rtf")
#define WWW_AUDIO       HTAtom_for("audio/basic")
#define WWW_HTML 	HTAtom_for("text/html")
#define WWW_BINARY 	HTAtom_for("application/octet-stream")
#define WWW_VIDEO 	HTAtom_for("video/mpeg")
#define WWW_GIF 	HTAtom_for("image/gif")

Extra types used in the library (EXPERIMENT)

#define WWW_NEWSLIST 	HTAtom_for("text/newslist")

We must include the following file after defining HTFormat, to which
it makes reference.

<H2>The HTEncoding type</H2>

typedef HTAtom * HTEncoding;
typedef HTAtom * HTCte;				/* Content transfer encoding */

The following are the MIME content transfer encoding values:

#define WWW_ENC_7BIT		HTAtom_for("7bit")
#define WWW_ENC_8BIT		HTAtom_for("8bit")
#define WWW_ENC_BINARY		HTAtom_for("binary")
#define WWW_ENC_BASE64		HTAtom_for("base64")

We also add <CODE>compress</CODE>, which is not a MIME content transfer encoding:

#define WWW_ENC_COMPRESS	HTAtom_for("compress")

<A NAME="Encoding"><H2>Registration of Accepted Content Encodings</H2></A>

This function is not currently used, as no encoding/decoding streams
have been written yet :-(

typedef struct _HTContentDescription {
    char *	filename;
    HTAtom *	content_type;
    HTAtom *	content_language;
    HTAtom *	content_encoding;
    int		content_length;
    double	quality;
} HTContentDescription;

extern void HTAcceptEncoding PARAMS((HTList *	list,
				     char *	enc,
				     double	quality));

<A NAME="Language"><H2>Registration of Accepted Content Languages</H2></A>

This function is not currently used.

extern void HTAcceptLanguage PARAMS((HTList *	list,
				     char *	lang,
				     double	quality));

<A NAME="Type"><H2>Registration of Accepted Converters and Presenters</H2></A>

A <CODE><A NAME="z12">converter</A></CODE> is a stream with a special
set of parameters, registered as capable of converting
from a MIME type to something else (possibly another MIME type). A
<CODE>presenter</CODE> is a module (possibly an external program)
that can present a graphic object of a certain MIME type to the
user. That is, <CODE>presenters</CODE> are normally used to present
graphic objects that the <CODE>converters</CODE> are not able to
handle. Data is transferred to the external program using the <A
HREF="HTFWrite.html">HTSaveAndExecute</A> stream, which writes to a
local file. This stream is actually a normal <CODE>converter</CODE>,
i.e., a stream having the following set of parameters:<P>

#include "HTAccess.h"			/* Required for HTRequest definition */

typedef HTStream * HTConverter	PARAMS((HTRequest *	request,
					void *		param,
					HTFormat	input_format,
					HTFormat	output_format,
					HTStream *	output_stream));
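In other words, a converter works like a factory: given <CODE>output_stream</CODE>, it returns a new stream that accepts data in <CODE>input_format</CODE> and pushes the converted data into <CODE>output_stream</CODE>, which is how a stream stack is built up. The sketch below shows only that chaining. It uses a hypothetical, much-reduced stream type (<CODE>SketchStream</CODE>, with a single <CODE>put_character</CODE> callback) instead of the real <CODE>HTStream</CODE> class, and its converter, <CODE>upper_case_converter</CODE>, is likewise hypothetical and omits the <CODE>request</CODE>, <CODE>param</CODE> and format arguments.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical minimal stream: one callback plus its private state. */
typedef struct SketchStream SketchStream;
struct SketchStream {
    void (*put_character)(SketchStream * me, char c);
    SketchStream * target;   /* next stream down the stack, or NULL */
    char *         sink;     /* used only by the terminal stream */
    size_t         used;
};

/* Terminal stream: appends to a caller-supplied buffer that is
   assumed to be large enough (no bounds check in this sketch). */
static void sink_put(SketchStream * me, char c)
{
    me->sink[me->used++] = c;
}

/* Filtering stream: converts each character, then passes it on. */
static void upper_put(SketchStream * me, char c)
{
    me->target->put_character(me->target, (char) toupper((unsigned char) c));
}

/* Hypothetical converter: wraps output_stream and returns the stream
   into which the caller writes data in the input format. */
SketchStream * upper_case_converter(SketchStream * output_stream)
{
    SketchStream * me = calloc(1, sizeof *me);
    if (me != NULL) {
        me->put_character = upper_put;
        me->target = output_stream;
    }
    return me;
}

/* Build a terminal stream writing into buffer. */
SketchStream make_sink(char * buffer)
{
    SketchStream s = { sink_put, NULL, buffer, 0 };
    return s;
}
```

Writing into the stream returned by the converter runs the data through every layer of the stack before it reaches the final output stream.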

Both <CODE>converters</CODE> and <CODE>presenters</CODE> are set up in
a list which is used by the <A HREF="#z3">StreamStack</A> module to
find the best way to pass the information to the user. <P>

The <CODE>HTPresentation</CODE> structure contains both
<CODE>converters</CODE> and <CODE>presenters</CODE>, and is defined
as follows:
typedef struct _HTPresentation {
	HTAtom* rep;			     /* representation name atomized */
	HTAtom* rep_out;			 /* resulting representation */
	HTConverter *converter;	      /* The routine to gen the stream stack */
	char *  command;			       /* MIME-format string */
	char *  test_command;			       /* MIME-format string */
	double	quality;		     /* Between 0 (bad) and 1 (good) */
	double   secs;
	double   secs_per_byte;
} HTPresentation;

<A NAME="z17"><H3>Global List of Converters</H3></A>

This module keeps a global list of converters. This can be used to get
the set of supported formats. <P>

<B>NOTE:</B> There is also a conversion list associated with each
request in the <A HREF="HTAccess.html#z1">HTRequest
structure</A>. This list can be used to expand the core set of
conversions in the global list. The <A HREF="#z3">StreamStack</A> uses
both lists.

extern HTList * HTConversions;

<H3>Register a Presenter</H3>

<DL>
<DD>The list of <CODE>converters</CODE> and <CODE>presenters</CODE>
<DD>The <A HREF="#Type">MIME-style</A> format name
<DD>The MAILCAP-style command template
<DD>A degradation factor [0..1]
<DD>A limit on the length acceptable as input (0 for infinity)
<DD>A limit on the time the user will wait (0 for infinity)
</DL>

extern void HTSetPresentation	PARAMS((HTList *	conversions,
					CONST char * 	representation,
					CONST char * 	command,
					CONST char * 	test_command,
					double		quality,
					double		secs, 
					double		secs_per_byte));
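The command argument is a MAILCAP-style template. In mailcap files (RFC 1524) a <CODE>%s</CODE> in the command stands for the name of the file holding the data, so a presenter registered with a command such as <CODE>xv %s</CODE> can be run on the local file written by <CODE>HTSaveAndExecute</CODE> once that substitution is made. The sketch below shows only the substitution step. The function name is hypothetical, other mailcap escapes are not handled, and whether the library quotes the file name for the shell is not covered here.

```c
#include <stddef.h>
#include <string.h>

/* Copy command into out, replacing every "%s" with filename.
   Returns 0 on success, or -1 if out (outsize bytes) is too small.
   The file name is NOT shell-quoted; other escapes such as %t or %%
   are copied through unchanged. */
int sketch_expand_command(char * out, size_t outsize,
                          const char * command, const char * filename)
{
    size_t used = 0;
    size_t flen = strlen(filename);

    while (*command != '\0') {
        if (command[0] == '%' && command[1] == 's') {
            if (used + flen >= outsize)
                return -1;
            memcpy(out + used, filename, flen);
            used += flen;
            command += 2;
        } else {
            if (used + 1 >= outsize)
                return -1;
            out[used++] = *command++;
        }
    }
    if (used >= outsize)          /* only possible when outsize == 0 */
        return -1;
    out[used] = '\0';
    return 0;
}
```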

<H3>Register a Converter</H3>

<DL>
<DD>The list of <CODE>converters</CODE> and <CODE>presenters</CODE>
<DD>The <A HREF="#Type">MIME-style</A> format name
<DD>The resulting content type after the conversion
<DD>The routine that actually performs the conversion
<DD>A degradation factor [0..1]
<DD>A limit on the length acceptable as input (0 for infinity)
<DD>A limit on the time the user will wait (0 for infinity)
</DL>

extern void HTSetConversion	PARAMS((HTList *	conversions,
					CONST char * 	rep_in,
					CONST char * 	rep_out,
					HTConverter *	converter,
					double		quality,
					double		secs, 
					double		secs_per_byte));
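Conceptually, each registration adds one <CODE>HTPresentation</CODE> entry to the <CODE>conversions</CODE> list, and a later lookup can then prefer the matching entry with the highest <CODE>quality</CODE>. The sketch below models that with a hypothetical fixed-size table (<CODE>SketchConversions</CODE>, <CODE>sketch_set_conversion</CODE>, <CODE>sketch_best</CODE>). It compares plain strings rather than atoms, ignores <CODE>secs</CODE> and <CODE>secs_per_byte</CODE>, and treats an output type of <CODE>*</CODE> as matching anything, which is an assumption about how wildcard entries behave.

```c
#include <string.h>

#define SKETCH_MAX_CONV 16

/* Hypothetical, reduced form of HTPresentation. */
typedef struct {
    const char * rep_in;
    const char * rep_out;
    double       quality;   /* between 0 (bad) and 1 (good) */
} SketchConversion;

typedef struct {
    SketchConversion entries[SKETCH_MAX_CONV];
    int              count;
} SketchConversions;

/* Register a conversion; returns 0 on success, -1 if the table is full. */
int sketch_set_conversion(SketchConversions * list, const char * rep_in,
                          const char * rep_out, double quality)
{
    if (list->count == SKETCH_MAX_CONV)
        return -1;
    list->entries[list->count].rep_in  = rep_in;
    list->entries[list->count].rep_out = rep_out;
    list->entries[list->count].quality = quality;
    list->count++;
    return 0;
}

/* Return the highest-quality direct conversion from rep_in to rep_out,
   accepting "*" as a wildcard output type; NULL if nothing matches. */
const SketchConversion * sketch_best(const SketchConversions * list,
                                     const char * rep_in, const char * rep_out)
{
    const SketchConversion * best = NULL;
    int i;
    for (i = 0; i < list->count; i++) {
        const SketchConversion * c = &list->entries[i];
        if (strcmp(c->rep_in, rep_in) != 0)
            continue;
        if (strcmp(c->rep_out, rep_out) != 0 && strcmp(c->rep_out, "*") != 0)
            continue;
        if (best == NULL || c->quality > best->quality)
            best = c;
    }
    return best;
}
```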

<H3>Set up Default Presenters and Converters</H3>

A default set of <CODE>converters</CODE> and <CODE>presenters</CODE>
is defined in <A HREF="HTInit.c">HTInit.c</A> (or <A
HREF="../../Daemon/Inplementation/HTSUtils.c">HTSInit.c</A> in the
server). <P>

<B>NOTE: </B> The Library does no automatic initialization, so
this is left to the application:

extern void HTFormatInit	PARAMS((HTList * conversions));

This function also exists in a version where no
<CODE>presenters</CODE> are initialized. This is intended for Non
Interactive Mode, e.g., in the Line Mode Browser.

extern void HTFormatInitNIM	PARAMS((HTList * conversions));

<H3>Remove presentations and conversions</H3>

This function deletes the <EM>LOCAL</EM> list of
<CODE>converters</CODE> and <CODE>presenters</CODE> associated with
the <A HREF="HTAccess.html#z1">HTRequest structure</A> given as a parameter.

extern void HTFormatDelete	PARAMS((HTRequest * request));

This function cleans up the <EM>GLOBAL</EM> list of converters. The
function is called from <A

extern void HTDisposeConversions NOPARAMS;

<A NAME="Rank"><H2>Ranking of Accepted Formats</H2></A>

This function is used when the best match among several possible
documents is to be found as a function of the accept headers sent in
the client request.

extern BOOL HTRank PARAMS((HTList * possibilities,
			   HTList * accepted_content_types,
			   HTList * accepted_content_languages,
			   HTList * accepted_content_encodings));
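One simple way to rank candidates, assumed here rather than taken from HTFormat.c, is to multiply the quality factors the client gave to each candidate's content type and language and keep the candidate with the largest product. All names below (<CODE>SketchAccept</CODE>, <CODE>SketchCandidate</CODE>, <CODE>sketch_rank</CODE>) are hypothetical; encodings are left out for brevity, and the rule that an empty accept list or an unspecified language scores 1.0 is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accept entry, e.g. "en" with quality 0.8. */
typedef struct {
    const char * value;
    double       quality;
} SketchAccept;

/* Hypothetical candidate document, loosely modelled on HTContentDescription. */
typedef struct {
    const char * content_type;
    const char * content_language;   /* NULL if unspecified */
    double       quality;            /* filled in by sketch_rank */
} SketchCandidate;

/* Client's quality for value: 1.0 if the list is empty or value is NULL
   (assumption), 0.0 if the list is non-empty and value is absent. */
static double sketch_q(const SketchAccept * list, size_t n, const char * value)
{
    size_t i;
    if (n == 0 || value == NULL)
        return 1.0;
    for (i = 0; i < n; i++)
        if (strcmp(list[i].value, value) == 0)
            return list[i].quality;
    return 0.0;
}

/* Score every candidate; return the index of the best, or -1 if all score 0. */
int sketch_rank(SketchCandidate * cands, size_t ncands,
                const SketchAccept * types, size_t ntypes,
                const SketchAccept * langs, size_t nlangs)
{
    int best = -1;
    size_t i;
    for (i = 0; i < ncands; i++) {
        cands[i].quality = sketch_q(types, ntypes, cands[i].content_type)
                         * sketch_q(langs, nlangs, cands[i].content_language);
        if (cands[i].quality > 0.0 &&
            (best < 0 || cands[i].quality > cands[best].quality))
            best = (int) i;
    }
    return best;
}
```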

<H2><A NAME="z3">HTStreamStack</A></H2>

This is the routine that actually sets up the conversion. It
currently checks only for direct conversions, but multi-stage
conversions are foreseen. It takes a stream into which the output
should be sent in the final format, builds the conversion stack, and
returns a stream into which the data in the input format should be
sent.
If <CODE>guess</CODE> is true and input format is
<CODE>www/unknown</CODE>, try to guess the format by looking at the
first few bytes of the stream. <P>

extern HTStream * HTStreamStack PARAMS((HTFormat	rep_in,
					HTFormat	rep_out,
					HTStream *	output_stream,
					HTRequest *	request,
					BOOL		guess));
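Guessing from the first few bytes usually comes down to deciding between text and binary data; compare <CODE>HTInputSocket_seemsBinary</CODE>, declared at the end of this module. The heuristic below is only an illustration, not the library's rule: data is treated as binary if it contains a NUL byte or if more than 10% of the bytes are control characters other than tab, LF, form feed and CR. The function name and the threshold are hypothetical.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) looks like binary data, 0 if it looks like text.
   Bytes >= 0x80 count as text so 8-bit character sets are not flagged. */
int sketch_seems_binary(const unsigned char * buf, size_t len)
{
    size_t i, control = 0;
    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c == 0)
            return 1;                           /* NUL: almost never text */
        if ((c < 0x20 && c != '\t' && c != '\n' && c != '\f' && c != '\r')
            || c == 0x7f)
            control++;
    }
    return control * 10 > len;                  /* more than 10% control bytes */
}
```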

<H3>Find the cost of a filter stack</H3>

Must return the cost of the same stack which HTStreamStack would set up.

#define NO_VALUE_FOUND	-1e20		/* returned if none found */

extern double HTStackValue	PARAMS((HTList *	conversions,
					HTFormat	format_in,
					HTFormat	format_out,
					double		initial_value,
					long int	length));
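Given the <CODE>quality</CODE>, <CODE>secs</CODE> and <CODE>secs_per_byte</CODE> fields of <CODE>HTPresentation</CODE>, one plausible value for a single conversion step is the initial value scaled by the quality, minus the estimated conversion time (<CODE>secs</CODE> plus <CODE>secs_per_byte</CODE> for each of <CODE>length</CODE> bytes) divided by the longest time the user is prepared to wait. The sketch below implements that formula under a hypothetical name; <CODE>max_secs</CODE> in particular is an assumed tuning value that this header does not declare.

```c
#define SKETCH_NO_VALUE_FOUND (-1e20)   /* mirrors NO_VALUE_FOUND */

/* Hypothetical value of one conversion step:
     initial_value * quality - (secs + length * secs_per_byte) / max_secs
   found == 0 means no conversion matched; max_secs == 0 disables the
   time penalty (both are assumptions of this sketch). */
double sketch_stack_value(int found, double quality, double secs,
                          double secs_per_byte, double initial_value,
                          long length, double max_secs)
{
    double value;
    if (!found)
        return SKETCH_NO_VALUE_FOUND;
    value = initial_value * quality;
    if (max_secs != 0.0)
        value -= (secs + length * secs_per_byte) / max_secs;
    return value;
}
```

For example, a converter of quality 0.5 that needs 1 s plus 0.001 s per byte, applied to 1000 bytes with a 10 s limit, scores 0.5 - 2/10 = 0.3.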

<A NAME="Read"><H2>Read Data from a Socket</H2></A>

This function has replaced many other functions for reading from a
socket. It automatically converts from ASCII if we are on a non-ASCII
machine. This assumes that we do <B>not</B> use this function to read
a local file on a non-ASCII machine. The following type definition
makes life easier when a state machine is looking for a
<CODE>&lt;CRLF&gt;</CODE> sequence.

typedef enum _HTSocketEOL {
    EOL_ERR = -1,
    EOL_BEGIN = 0,
} HTSocketEOL;

extern int HTSocketRead	PARAMS((HTRequest * request, HTStream * target));
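The point of such a state machine is that a CR LF pair can be split across two reads, so the reader must remember whether the previous chunk ended in CR. The sketch below shows this with its own hypothetical states (<CODE>SK_TEXT</CODE>, <CODE>SK_CR_SEEN</CODE>) rather than the <CODE>HTSocketEOL</CODE> values, whose full list is not reproduced above; it counts complete CRLF-terminated lines and keeps its state between calls.

```c
#include <stddef.h>

/* Hypothetical states, not the HTSocketEOL values. */
typedef enum { SK_TEXT, SK_CR_SEEN } SketchEOLState;

typedef struct {
    SketchEOLState state;
    int            lines;   /* complete CR LF terminated lines seen so far */
} SketchEOL;

/* Feed one chunk. Because state survives between calls, a CR at the end
   of one chunk and an LF at the start of the next count as one line end. */
void sketch_eol_feed(SketchEOL * eol, const char * data, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        char c = data[i];
        if (eol->state == SK_CR_SEEN && c == '\n')
            eol->lines++;
        eol->state = (c == '\r') ? SK_CR_SEEN : SK_TEXT;
    }
}
```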





<H2><A NAME="z1">HTCopy: Copy a socket to a stream</A></H2>
This is used by the protocol engines
to send data down a stream, typically
one which has been generated by HTStreamStack.
It returns the number of bytes transferred.

extern int HTCopy	PARAMS((SOCKFD		file_number,
				HTStream *	sink));
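At its core, copying a socket to a stream is a loop that reads a block, hands each byte to the sink, and stops at end of file or on error. The sketch below assumes a POSIX descriptor and replaces the <CODE>HTStream</CODE> sink with a hypothetical per-byte callback (<CODE>SketchSink</CODE>); like <CODE>HTCopy</CODE> it returns the number of bytes transferred, but returning -1 on a read error is an assumption about error reporting. <CODE>sketch_count_sink</CODE> is a trivial example sink.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sink: called once per byte, like a put_character method. */
typedef void (*SketchSink)(void * context, char c);

/* Copy everything from fd into sink; return bytes copied, or -1 on error.
   A production loop would also retry when read() fails with EINTR. */
long sketch_copy(int fd, SketchSink sink, void * context)
{
    char buffer[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buffer, sizeof buffer);
        ssize_t i;
        if (n == 0)
            return total;            /* end of file */
        if (n < 0)
            return -1;               /* read error */
        for (i = 0; i < n; i++)
            sink(context, buffer[i]);
        total += n;
    }
}

/* Example sink that only counts bytes; context points to a long. */
void sketch_count_sink(void * context, char c)
{
    (void) c;
    ++*(long *) context;
}
```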

<H2><A NAME="c6">HTFileCopy: Copy a file to a stream</A></H2>
This is used by the protocol engines
to send data down a stream, typically
one which has been generated by HTStreamStack.
It is currently called by <A
NAME="z9" HREF="#c7">HTParseFile</A>.

extern void HTFileCopy PARAMS((
	FILE*			fp,
	HTStream*		sink));

<H2><A NAME="c2">HTCopyNoCR: Copy a socket to a stream,
stripping CR characters.</A></H2>
It is slower than <A
NAME="z2" HREF="#z1">HTCopy</A>.

extern void HTCopyNoCR PARAMS((
	SOCKFD			file_number,
	HTStream*		sink));
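The filtering itself only needs to drop every CR byte on its way to the sink. The sketch below applies that step to an in-memory buffer rather than a socket; the function name is hypothetical.

```c
#include <stddef.h>

/* Copy src[0..len) to dst, dropping every CR; return the new length.
   dst needs room for len bytes. dst == src is allowed, because the
   write position never overtakes the read position. */
size_t sketch_strip_cr(char * dst, const char * src, size_t len)
{
    size_t i, out = 0;
    for (i = 0; i < len; i++)
        if (src[i] != '\r')
            dst[out++] = src[i];
    return out;
}
```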

<H2>HTParseSocket: Parse a socket given
its format</H2>This routine is called by protocol
modules to load an object. It uses <A
NAME="z4" HREF="#z3">HTStreamStack</A> and the copy routines
above. It returns HT_LOADED if successful and &lt;0 if not.

extern int HTParseSocket PARAMS((
	HTFormat	format_in,
	SOCKFD 		file_number,
	HTRequest *	request));

<H2><A NAME="c1">HTParseFile: Parse a file through
a file pointer</A></H2>This routine is called by protocol
modules to load an object. It uses <A
NAME="z4" HREF="#z3">HTStreamStack</A>
and <A
NAME="c7" HREF="#c6">HTFileCopy</A>. It returns HT_LOADED
if successful and &lt;0 if not.

extern int HTParseFile PARAMS((
	HTFormat	format_in,
	FILE		*fp,
	HTRequest *	request));


<H3>Get next character from buffer</H3>

extern int HTInputSocket_getCharacter PARAMS((HTInputSocket* isoc));

<H3>Read block from input socket</H3>Reads <CODE>*len</CODE> characters
and returns a buffer (do not free it) containing <CODE>*len</CODE>
characters; note that <CODE>*len</CODE> may have changed on return.
The buffer is not NUL-terminated.

extern char * HTInputSocket_getBlock PARAMS((HTInputSocket * isoc,
						  int *           len));
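The in/out <CODE>len</CODE> argument lets the caller ask for <CODE>*len</CODE> bytes and learn how many it actually received. A natural way to meet the "don't free" and "not NUL-terminated" rules is to return a pointer straight into the input buffer, which is what the sketch below does on a hypothetical, already-filled buffer (<CODE>SketchBlockBuffer</CODE>, <CODE>sketch_get_block</CODE>). A real implementation would presumably refill from the socket when the buffer is empty; this sketch does not.

```c
#include <stddef.h>

/* Hypothetical buffer state: pointer/limit delimit the unread bytes,
   as input_pointer/input_limit appear to in HTInputSocket. */
typedef struct {
    char * pointer;
    char * limit;
} SketchBlockBuffer;

/* Return a pointer into the buffer (do not free it) and set *len to the
   number of bytes actually available, at most the amount requested.
   Returns NULL with *len == 0 when nothing is left. */
char * sketch_get_block(SketchBlockBuffer * b, int * len)
{
    char * start = b->pointer;
    int available = (int) (b->limit - b->pointer);
    if (available <= 0) {
        *len = 0;
        return NULL;
    }
    if (*len < 0)
        *len = 0;
    if (*len > available)
        *len = available;          /* caller sees the reduced count */
    b->pointer += *len;
    return start;
}
```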

extern char * HTInputSocket_getLine PARAMS((HTInputSocket * isoc));
extern char * HTInputSocket_getUnfoldedLine PARAMS((HTInputSocket * isoc));
extern char * HTInputSocket_getStatusLine PARAMS((HTInputSocket * isoc));
extern BOOL   HTInputSocket_seemsBinary PARAMS((HTInputSocket * isoc));



#endif /* HTFORMAT_H */

End of definition module

