File:  [Public] / libwww / Library / src / HTFormat.html
Revision 2.39
Wed May 10 23:49:40 1995 UTC (29 years, 1 month ago) by frystyk
Branches: MAIN
CVS tags: HEAD
charset and level

<HTML>
<HEAD>
<TITLE>The Format Manager in the WWW Library</TITLE>
<NEXTID N="z18">
</HEAD>
<BODY>

<H1>The Format Manager</H1>

<PRE>
/*
**	(c) COPYRIGHT CERN 1994.
**	Please first read the full copyright statement in the file COPYRIGH.
*/
</PRE>

Here we describe the functions of the HTFormat module which handles
conversion between different data representations. (In MIME parlance,
a representation is known as a content-type. In <A
HREF="http://www.w3.org/hypertext/WWW/TheProject.html">WWW</A> the
term <EM>format</EM> is often used as it is shorter). The content of
this module is:

<UL>
<LI><A HREF="#z15">Buffering for I/O</A>
<LI><A HREF="#FormatTypes">Predefined Format Types</A>
<LI><A HREF="#Encoding">Registration of Accepted Content Encodings</A>
<LI><A HREF="#Language">Registration of Accepted Content Languages</A>
<LI><A HREF="#Type">Registration of Accepted Converters and Presenters</A>
<LI><A HREF="#Rank">Ranking of Accepted Formats</A>
<LI><A HREF="#z3">The Stream Stack</A>
<LI><A HREF="#Read">Read from a Socket</A>
</UL>

This module is implemented by <A HREF="HTFormat.c">HTFormat.c</A>, and
it is a part of the <A NAME="z10"
HREF="http://www.w3.org/hypertext/WWW/Library/User/Guide/Guide.html">Library
of Common Code</A>.

<PRE>
#ifndef HTFORMAT_H
#define HTFORMAT_H

#include <A HREF="HTUtils.html">"HTUtils.h"</A>
#include <A HREF="HTStream.html">"HTStream.h"</A>
#include <A HREF="HTAtom.html">"HTAtom.h"</A>
#include <A HREF="HTList.html">"HTList.h"</A>
</PRE>

<H2><A NAME="z15">Buffering for network I/O</A></H2>

These routines provide buffering for READ (and future WRITE) to the
network. They are used by all the protocol modules. The size of the
buffer, <CODE>INPUT_BUFFER_SIZE</CODE>, is a compromise between speed
and memory.

<PRE>
#define INPUT_BUFFER_SIZE 8192

typedef struct _socket_buffer {
	char	input_buffer[INPUT_BUFFER_SIZE];
	char *	input_pointer;
	char *	input_limit;
	SOCKFD	input_file_number;
} HTInputSocket;
</PRE>
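As a minimal sketch of how such a buffer is typically consumed, the
following stand-alone code refills the buffer from the descriptor only
when it has been drained and otherwise hands out bytes from memory. All
names beginning with <CODE>sketch_</CODE> are hypothetical, the
descriptor type is assumed to be a plain <CODE>int</CODE>, and POSIX
<CODE>read()</CODE> is assumed to be available. It is not the code in
HTFormat.c.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>                     /* read(): POSIX, an assumption */

#define SKETCH_BUFFER_SIZE 8192

typedef int sketch_sockfd;              /* hypothetical stand-in for SOCKFD */

typedef struct {
    char          input_buffer[SKETCH_BUFFER_SIZE];
    char *        input_pointer;        /* next unread byte */
    char *        input_limit;          /* one past the last valid byte */
    sketch_sockfd input_file_number;
} sketch_input;

/* Bind an empty buffer to a descriptor. */
static void sketch_input_init(sketch_input *isoc, sketch_sockfd fd)
{
    assert(isoc != NULL);
    isoc->input_pointer = isoc->input_limit = isoc->input_buffer;
    isoc->input_file_number = fd;
}

/* Return the next byte (0..255), or -1 on end of file or read error. */
static int sketch_get_character(sketch_input *isoc)
{
    if (isoc->input_pointer >= isoc->input_limit) {
        /* Buffer drained: one system call fetches up to a full buffer. */
        ssize_t n = read(isoc->input_file_number,
                         isoc->input_buffer, SKETCH_BUFFER_SIZE);
        if (n <= 0) return -1;
        isoc->input_pointer = isoc->input_buffer;
        isoc->input_limit = isoc->input_buffer + n;
    }
    return (unsigned char) *isoc->input_pointer++;
}
```

A larger buffer means fewer system calls per document, but more memory
for every open connection, which is the trade-off mentioned above.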

<H3>Create an Input Buffer</H3>

This function allocates an input buffer and binds it to the socket
descriptor given as parameter.

<PRE>
extern HTInputSocket* HTInputSocket_new PARAMS((SOCKFD file_number));
</PRE>

<H3>Free an Input Buffer</H3>

<PRE>
extern void HTInputSocket_free PARAMS((HTInputSocket * isoc));
</PRE>

<A NAME="FormatTypes"><H2>The HTFormat types</H2></A>

We use the <A HREF="HTAtom.html">HTAtom</A> object for holding
representations. This allows faster manipulation (comparison and
copying) than if we stayed with strings. These macros (which used to
be constants) define some basic internally referenced
representations.<P>

<PRE>
typedef HTAtom * HTFormat;
typedef HTAtom * HTCharset;
typedef HTAtom * HTLevel;
</PRE>
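The speed argument can be illustrated with a small sketch of string
interning: every distinct name is stored once, so two formats are equal
exactly when their pointers are equal. The names
<CODE>sketch_atom</CODE> and <CODE>sketch_atom_for</CODE> are
hypothetical, and the linear list used here is an assumption made for
brevity; the real HTAtom module may be organized differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct sketch_atom {
    struct sketch_atom *next;
    char *name;
} sketch_atom;

static sketch_atom *sketch_atoms = NULL;    /* all atoms created so far */

/* Return the unique atom for a name, creating it on first use.
   Returns NULL only if memory runs out. */
static sketch_atom *sketch_atom_for(const char *name)
{
    sketch_atom *a;
    assert(name != NULL);
    for (a = sketch_atoms; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0) return a;   /* already interned */
    a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->name = malloc(strlen(name) + 1);
    if (a->name == NULL) { free(a); return NULL; }
    strcpy(a->name, name);
    a->next = sketch_atoms;
    sketch_atoms = a;
    return a;
}
```

Once two formats have been turned into atoms, comparing them is a
single pointer comparison and copying one is a pointer assignment.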

<H3>Internal ones</H3>

The <CODE>www/xxx</CODE> ones are of course not MIME
standard. <P>

<CODE>star/star</CODE> is an output format which leaves the input
untouched. It is useful for diagnostics, and for users who want to see
the original, whatever it is.

<PRE>
#define WWW_SOURCE	HTAtom_for("*/*")      /* Whatever it was originally */
</PRE>

<CODE>www/present</CODE> represents the user's perception of the
document.  If you convert to www/present, you present the material to
the user.

<PRE>
#define WWW_PRESENT	HTAtom_for("www/present")   /* The user's perception */
</PRE>

<CODE>www/mime</CODE> corresponds to the message/rfc822 format: a MIME
message, or a plain text message with no MIME header. This is what is
returned by an HTTP server.

<PRE>
#define WWW_MIME	HTAtom_for("www/mime")		   /* A MIME message */
</PRE>

<CODE>www/print</CODE> is like www/present except it represents a
printed copy.

<PRE>
#define WWW_PRINT	HTAtom_for("www/print")		   /* A printed copy */
</PRE>

<CODE>www/unknown</CODE> is a really unknown type.  Some default
action is appropriate.

<PRE>
#define WWW_UNKNOWN     HTAtom_for("www/unknown")
</PRE>


<H3>MIME ones</H3>

These are the regular MIME types that are predefined. Others can be added!

<PRE>
#define WWW_PLAINTEXT 	HTAtom_for("text/plain")
#define WWW_POSTSCRIPT 	HTAtom_for("application/postscript")
#define WWW_RICHTEXT 	HTAtom_for("application/rtf")
#define WWW_AUDIO       HTAtom_for("audio/basic")
#define WWW_HTML 	HTAtom_for("text/html")
#define WWW_BINARY 	HTAtom_for("application/octet-stream")
#define WWW_VIDEO 	HTAtom_for("video/mpeg")
#define WWW_GIF 	HTAtom_for("image/gif")
</PRE>

Extra types used in the library (EXPERIMENT)

<PRE>
#define WWW_NEWSLIST 	HTAtom_for("text/newslist")
</PRE>

We must include the following file after defining HTFormat, to which
it makes reference.

<H2>The HTEncoding type</H2>

<PRE>
typedef HTAtom * HTEncoding;
typedef HTAtom * HTCte;				/* Content transfer encoding */
</PRE>

The following are the values for the MIME content transfer encodings:

<PRE>
#define WWW_ENC_7BIT		HTAtom_for("7bit")
#define WWW_ENC_8BIT		HTAtom_for("8bit")
#define WWW_ENC_BINARY		HTAtom_for("binary")
#define WWW_ENC_BASE64		HTAtom_for("base64")
</PRE>

We also add

<PRE>
#define WWW_ENC_COMPRESS	HTAtom_for("compress")
</PRE>

<A NAME="Encoding"><H2>Registration of Accepted Content Encodings</H2></A>

This function is not currently used, as no encoding/decoding streams
have been made :-(

<PRE>
typedef struct _HTContentDescription {
    char *	filename;
    HTAtom *	content_type;
    HTAtom *	content_language;
    HTAtom *	content_encoding;
    int		content_length;
    double	quality;
} HTContentDescription;

extern void HTAcceptEncoding PARAMS((HTList *	list,
				     char *	enc,
				     double	quality));
</PRE>

<A NAME="Language"><H2>Registration of Accepted Content Languages</H2></A>

This function is not currently used.

<PRE>
extern void HTAcceptLanguage PARAMS((HTList *	list,
				     char *	lang,
				     double	quality));
</PRE>

<A NAME="Type"><H2>Registration of Accepted Converters and
Presenters</H2></A>

A <CODE><A NAME="z12">converter</A></CODE> is a stream with a special
set of parameters and which is registered as capable of converting
from a MIME type to something else (maybe another MIME-type). A
<CODE>presenter</CODE> is a module (possibly an external program)
which can present a graphic object of a certain MIME type to the
user. That is, <CODE>presenters</CODE> are normally used to present
graphic objects that the <CODE>converters</CODE> are not able to
handle. Data is transferred to the external program using the <A
HREF="HTFWrite.html">HTSaveAndExecute</A> stream which writes to a
local file. This stream is actually a normal <CODE>converter</CODE>,
i.e., a stream having the following set of parameters:<P>

<PRE>
#include "HTAccess.h"			/* Required for HTRequest definition */

typedef HTStream * HTConverter	PARAMS((HTRequest *	request,
					void *		param,
					HTFormat	input_format,
					HTFormat	output_format,
					HTStream *	output_stream));
</PRE>
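To make the shape of a converter concrete, here is a minimal sketch
with a much reduced stream type. The converter receives the stream that
wants the output and returns a new stream into which the input is fed.
Everything named <CODE>sketch_</CODE> is hypothetical: the real
HTStream has more methods, and the request and format arguments are
replaced here by untyped stand-ins that the sketch ignores. The
conversion itself (upper-casing) is only a placeholder.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for HTStream. */
typedef struct sketch_stream sketch_stream;
struct sketch_stream {
    void (*put_character)(sketch_stream *me, char c);
    sketch_stream *target;  /* next stream in the stack, NULL for the last */
    char *sink;             /* collecting stream only: caller's buffer */
    size_t used, size;      /* collecting stream only: fill level, capacity */
};

/* Final stream: collects what it receives as a C string. */
static void sketch_collect_put(sketch_stream *me, char c)
{
    if (me->used + 1 < me->size) {
        me->sink[me->used++] = c;
        me->sink[me->used] = '\0';
    }
}

/* Converting stream: transforms each byte and passes it on. */
static void sketch_upper_put(sketch_stream *me, char c)
{
    me->target->put_character(me->target, (char) toupper((unsigned char) c));
}

/* Converter-shaped function: returns the stream to feed the input into,
   or NULL if memory runs out. The caller frees the returned stream. */
static sketch_stream *sketch_upper_converter(void *request, void *param,
                                             const char *input_format,
                                             const char *output_format,
                                             sketch_stream *output_stream)
{
    sketch_stream *me;
    (void) request; (void) param; (void) input_format; (void) output_format;
    assert(output_stream != NULL);
    me = malloc(sizeof *me);
    if (me == NULL) return NULL;
    me->put_character = sketch_upper_put;
    me->target = output_stream;
    me->sink = NULL;
    me->used = me->size = 0;
    return me;
}
```

Because a converter both consumes and produces a stream, several of
them can be chained by passing the result of one as the
<CODE>output_stream</CODE> of the next.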

Both <CODE>converters</CODE> and <CODE>presenters</CODE> are set up in
a list which is used by the <A HREF="#z3">StreamStack</A> module to
find the best way to pass the information to the user. <P>

The <CODE>HTPresentation</CODE> structure contains both
<CODE>converters</CODE> and <CODE>presenters</CODE>, and it is defined
as:

<PRE>
typedef struct _HTPresentation {
	HTAtom* rep;			     /* representation name atomized */
	HTAtom* rep_out;			 /* resulting representation */
	HTConverter *converter;	      /* The routine to gen the stream stack */
	char *  command;			       /* MIME-format string */
	char *  test_command;			       /* MIME-format string */
	double	quality;		     /* Between 0 (bad) and 1 (good) */
	double   secs;
	double   secs_per_byte;
} HTPresentation;
</PRE>


<A NAME="z17"><H3>Global List of Converters</H3></A>

This module keeps a global list of converters. This can be used to get
the set of supported formats. <P>

<B>NOTE:</B> There is also a conversion list associated with each
request in the <A HREF="HTAccess.html#z1">HTRequest
structure</A>. This list can be used to expand the core set of
conversions in the global list. The <A HREF="#z3">StreamStack</A> uses
both lists.

<PRE>
extern HTList * HTConversions;
</PRE>

<H3>Register a Presenter</H3>

<DL>
<DT>conversions
<DD>The list of <CODE>converters</CODE> and <CODE>presenters</CODE>
<DT>representation
<DD>the <A HREF="#Type">MIME-style</A> format name
<DT>command
<DD>the MAILCAP-style command template
<DT>quality
<DD>A degradation factor [0..1]
<DT>maxbytes
<DD>A limit on the length acceptable as input (0 for infinity)
<DT>maxsecs
<DD>A limit on the time user will wait (0 for infinity)
</DL>

<PRE>
extern void HTSetPresentation	PARAMS((HTList *	conversions,
					CONST char * 	representation,
					CONST char * 	command,
					CONST char * 	test_command,
					double		quality,
					double		secs, 
					double		secs_per_byte));
</PRE>

<H3>Register a Converter</H3>

<DL>
<DT>conversions
<DD>The list of <CODE>converters</CODE> and <CODE>presenters</CODE>
<DT>rep_in
<DD>the <A HREF="#Type">MIME-style</A> format name
<DT>rep_out
<DD>is the resulting content-type after the conversion
<DT>converter
<DD>is the routine to call which actually does the conversion
<DT>quality
<DD>A degradation factor [0..1]
<DT>maxbytes
<DD>A limit on the length acceptable as input (0 for infinity)
<DT>maxsecs
<DD>A limit on the time user will wait (0 for infinity)
</DL>

<PRE>
extern void HTSetConversion	PARAMS((HTList *	conversions,
					CONST char * 	rep_in,
					CONST char * 	rep_out,
					HTConverter *	converter,
					double		quality,
					double		secs, 
					double		secs_per_byte));
</PRE>
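The effect of registration can be sketched as appending a record to a
list and later searching that list for the best entry matching a pair
of formats. The sketch below is stand-alone and deliberately simple:
formats are plain strings instead of atoms, the list is a fixed-size
array instead of an HTList, and the converter routine is left out. All
<CODE>sketch_</CODE> names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX 16

typedef struct {
    const char *rep;            /* input representation */
    const char *rep_out;        /* resulting representation */
    double quality;             /* between 0 (bad) and 1 (good) */
    double secs;
    double secs_per_byte;
} sketch_presentation;

typedef struct {
    sketch_presentation item[SKETCH_MAX];
    size_t count;
} sketch_list;

/* Append one conversion; returns 0 on success, -1 if the list is full. */
static int sketch_set_conversion(sketch_list *list,
                                 const char *rep_in, const char *rep_out,
                                 double quality,
                                 double secs, double secs_per_byte)
{
    sketch_presentation *p;
    assert(list != NULL && rep_in != NULL && rep_out != NULL);
    if (list->count >= SKETCH_MAX) return -1;
    p = &list->item[list->count++];
    p->rep = rep_in;
    p->rep_out = rep_out;
    p->quality = quality;
    p->secs = secs;
    p->secs_per_byte = secs_per_byte;
    return 0;
}

/* Highest-quality direct conversion rep_in -> rep_out, or NULL if none. */
static const sketch_presentation *sketch_find(const sketch_list *list,
                                              const char *rep_in,
                                              const char *rep_out)
{
    const sketch_presentation *best = NULL;
    size_t i;
    for (i = 0; i < list->count; i++) {
        const sketch_presentation *p = &list->item[i];
        if (strcmp(p->rep, rep_in) == 0 && strcmp(p->rep_out, rep_out) == 0
            && (best == NULL || p->quality > best->quality))
            best = p;
    }
    return best;
}
```

Registering the same pair of formats twice is allowed in this sketch;
the lookup simply prefers the entry with the higher quality.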

<H3>Set up Default Presenters and Converters</H3>

A default set of <CODE>converters</CODE> and <CODE>presenters</CODE>
is defined in <A HREF="HTInit.c">HTInit.c</A> (or <A
HREF="../../Daemon/Inplementation/HTSUtils.c">HTSInit.c</A> in the
server). <P>

<B>NOTE: </B> No automatic initialization is done in the Library, so
this is for the application to do.

<PRE>
extern void HTFormatInit	PARAMS((HTList * conversions));
</PRE>

This function also exists in a version where no
<CODE>presenters</CODE> are initialized. This is intended for Non
Interactive Mode, e.g., in the Line Mode Browser.

<PRE>
extern void HTFormatInitNIM	PARAMS((HTList * conversions));
</PRE>

<H3>Remove presentations and conversions</H3>

This function deletes the <EM>LOCAL</EM> list of
<CODE>converters</CODE> and <CODE>presenters</CODE> associated with
each <A HREF="HTAccess.html#z1">HTRequest structure</A>.

<PRE>
extern void HTFormatDelete	PARAMS((HTRequest * request));
</PRE>

This function cleans up the <EM>GLOBAL</EM> list of converters. The
function is called from <A
HREF="HTAccess.html#Library">HTLibTerminate</A>.

<PRE>
extern void HTDisposeConversions NOPARAMS;
</PRE>

<A NAME="Rank"><H2>Ranking of Accepted Formats</H2></A>

This function is used when the best match among several possible
documents is to be found as a function of the accept headers sent in
the client request.

<PRE>
extern BOOL HTRank PARAMS((HTList * possibilities,
			   HTList * accepted_content_types,
			   HTList * accepted_content_languages,
			   HTList * accepted_content_encodings));
</PRE>
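One plausible way to compute such a ranking, shown here only as a
sketch, is to multiply the quality of each candidate document by the
quality the client assigned to its content type and language, and to
treat anything the client did not list as unacceptable. This scoring
rule is an assumption and is not taken from HTFormat.c; content
encodings are left out for brevity, and all <CODE>sketch_</CODE> names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of an accept list: a value and the client's quality for it. */
typedef struct { const char *name; double quality; } sketch_accept;

/* Reduced stand-in for HTContentDescription, using strings for atoms. */
typedef struct {
    const char *filename;
    const char *content_type;
    const char *content_language;
    double quality;
} sketch_description;

/* Quality the client assigns to a value; 0 if it is not accepted. */
static double sketch_accept_q(const sketch_accept *list, size_t n,
                              const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(list[i].name, name) == 0) return list[i].quality;
    return 0.0;
}

/* Index of the best possibility, or -1 if none is acceptable. */
static int sketch_rank(const sketch_description *poss, size_t n_poss,
                       const sketch_accept *types, size_t n_types,
                       const sketch_accept *langs, size_t n_langs)
{
    int best = -1;
    double best_q = 0.0;
    size_t i;
    for (i = 0; i < n_poss; i++) {
        double q = poss[i].quality
                 * sketch_accept_q(types, n_types, poss[i].content_type)
                 * sketch_accept_q(langs, n_langs, poss[i].content_language);
        if (q > best_q) { best_q = q; best = (int) i; }
    }
    return best;
}
```

With a multiplicative rule, a single factor of zero rules a document
out no matter how good its other properties are.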

<H2><A NAME="z3">HTStreamStack</A></H2>

This is the routine which actually sets up the conversion. It
currently checks only for direct conversions, but multi-stage
conversions are foreseen.  It takes a stream into which the output
should be sent in the final format, builds the conversion stack, and
returns a stream into which the data in the input format should be
fed.<P>

If <CODE>guess</CODE> is true and input format is
<CODE>www/unknown</CODE>, try to guess the format by looking at the
first few bytes of the stream. <P>

<PRE>
extern HTStream * HTStreamStack PARAMS((HTFormat	rep_in,
					HTFormat	rep_out,
					HTStream *	output_stream,
					HTRequest *	request,
					BOOL		guess));
</PRE>
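Guessing a format from the first few bytes can be sketched as a short
list of signature tests followed by a text/binary check. The particular
rules below are illustrative assumptions and are not the heuristics
used by the Library; <CODE>sketch_guess_format</CODE> is a hypothetical
name, and formats are returned as strings rather than atoms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess a MIME-style format name from the first len bytes of a stream. */
static const char *sketch_guess_format(const char *buf, size_t len)
{
    size_t i;
    assert(buf != NULL);

    /* Well-known signatures at the very start of the data. */
    if (len >= 4 && memcmp(buf, "GIF8", 4) == 0)
        return "image/gif";
    if (len >= 2 && memcmp(buf, "%!", 2) == 0)
        return "application/postscript";

    /* Control characters other than white space suggest binary data. */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) buf[i];
        if (c < 32 && c != '\t' && c != '\n' && c != '\r')
            return "application/octet-stream";
    }

    /* Text whose first non-blank character opens a tag looks like HTML. */
    for (i = 0; i < len && (buf[i] == ' ' || buf[i] == '\t' ||
                            buf[i] == '\n' || buf[i] == '\r'); i++)
        ;
    if (i < len && buf[i] == '<')
        return "text/html";

    return "text/plain";
}
```

A guess of this kind can only be made once some data has arrived, so
the choice of converter has to be postponed until then.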

<H3>Find the cost of a filter stack</H3>

Must return the cost of the same stack which HTStreamStack would set
up.

<PRE>
#define NO_VALUE_FOUND	(-1e20)		/* returned if none found */

extern double HTStackValue	PARAMS((HTList *	conversions,
					HTFormat	format_in,
					HTFormat	format_out,
					double		initial_value,
					long int	length));
</PRE>
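As a sketch of what such a cost could look like, the value of a
conversion can be modelled as the incoming value scaled by the
conversion's quality, minus a penalty for the expected delay. The
formula below and the <CODE>max_secs</CODE> time budget are
assumptions made for illustration, not a statement of what
HTStackValue computes; the <CODE>sketch_</CODE> names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NO_VALUE_FOUND (-1e20)   /* returned if no conversion given */

typedef struct {
    double quality;         /* between 0 (bad) and 1 (good) */
    double secs;            /* fixed delay of the conversion */
    double secs_per_byte;   /* delay that grows with the input length */
} sketch_cost;

/* Value of sending `length` bytes through conversion c.
   max_secs is a hypothetical time budget that scales the delay into
   quality units; 0 disables the time penalty. */
static double sketch_stack_value(const sketch_cost *c, double initial_value,
                                 long length, double max_secs)
{
    double value;
    assert(length >= 0);
    if (c == NULL) return SKETCH_NO_VALUE_FOUND;
    value = initial_value * c->quality;
    if (max_secs > 0.0)
        value -= (c->secs + c->secs_per_byte * (double) length) / max_secs;
    return value;
}
```

Under this model a slow but faithful conversion can lose to a faster
one of lower quality once the document becomes long enough.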

<A NAME="Read"><H2>Read Data from a Socket</H2></A>

This function has replaced many other functions for doing read from a
socket. It automatically converts from ASCII if we are on a NON-ASCII
machine. This assumes that we do <B>not</B> use this function to read
a local file on a NON-ASCII machine. The following type definition is
to make life easier when having a state machine looking for a
<CODE>&lt;CRLF&gt;</CODE> sequence.

<PRE>
typedef enum _HTSocketEOL {
    EOL_ERR = -1,
    EOL_BEGIN = 0,
    EOL_FCR,
    EOL_FLF,
    EOL_DOT,
    EOL_SCR,
    EOL_SLF
} HTSocketEOL;

extern int HTSocketRead	PARAMS((HTRequest * request, HTStream * target));
</PRE>
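A state machine of this kind can be sketched as follows. The sketch
looks for the first empty line, i.e., two consecutive
<CODE>&lt;CRLF&gt;</CODE> sequences, and keeps its state between
bytes so that the sequence may be split across several reads. The
reading of the state names (first CR, first LF, second CR, second LF)
is an assumption, the <CODE>EOL_DOT</CODE> state is left out, and the
<CODE>sketch_</CODE> and <CODE>S_</CODE> names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    S_ERR = -1,
    S_BEGIN = 0,
    S_FCR,      /* seen first CR */
    S_FLF,      /* seen first CR LF */
    S_SCR,      /* seen CR LF CR */
    S_SLF       /* seen CR LF CR LF: empty line found */
} sketch_eol;

/* Advance the state by one input byte. */
static sketch_eol sketch_eol_next(sketch_eol state, char c)
{
    switch (state) {
    case S_BEGIN: return c == '\r' ? S_FCR : S_BEGIN;
    case S_FCR:   return c == '\n' ? S_FLF : (c == '\r' ? S_FCR : S_BEGIN);
    case S_FLF:   return c == '\r' ? S_SCR : S_BEGIN;
    case S_SCR:   return c == '\n' ? S_SLF : (c == '\r' ? S_FCR : S_BEGIN);
    case S_SLF:   return S_SLF;
    default:      return S_ERR;
    }
}

/* Offset just past the first CR LF CR LF in buf, or 0 if there is none. */
static size_t sketch_header_end(const char *buf, size_t len)
{
    sketch_eol state = S_BEGIN;
    size_t i;
    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        state = sketch_eol_next(state, buf[i]);
        if (state == S_SLF) return i + 1;
    }
    return 0;
}
```

Keeping the state in a variable rather than re-scanning the buffer
means that no data has to be pushed back when a read ends in the
middle of the sequence.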

<HR>

<B><IMG
ALIGN=middle SRC="http://www.w3.org/hypertext/WWW/Icons/32x32/caution.gif"> ATTENTION <IMG ALIGN=middle SRC="http://www.w3.org/hypertext/WWW/Icons/32x32/caution.gif"> <P>

THE REST OF THE FUNCTIONS DEFINED IN THIS MODULE ARE GOING TO BE
OBSOLETE SO DO NOT USE THEM - THEY ARE NOT REENTRANT.</B>

<HR>

<H2><A
NAME="z1">HTCopy:  Copy a socket to a stream</A></H2>This is used by the protocol engines
to send data down a stream, typically
one which has been generated by HTStreamStack.
Returns the number of bytes transferred.

<PRE>
extern int HTCopy	PARAMS((SOCKFD		file_number,
				HTStream *	sink));
</PRE>

<H2><A
NAME="c6">HTFileCopy:  Copy a file to a stream</A></H2>This is used by the protocol engines
to send data down a stream, typically
one which has been generated by HTStreamStack.
It is currently called by <A
NAME="z9" HREF="#c7">HTParseFile</A>.
<PRE>extern void HTFileCopy PARAMS((
	FILE*			fp,
	HTStream*		sink));

	
</PRE>
<H2><A
NAME="c2">HTCopyNoCR: Copy a socket to a stream,
stripping CR characters.</A></H2>It is slower than <A
NAME="z2" HREF="#z1">HTCopy</A>.
<PRE>
extern void HTCopyNoCR PARAMS((
	SOCKFD			file_number,
	HTStream*		sink));


</PRE>
<H2>HTParseSocket: Parse a socket given
its format</H2>This routine is called by protocol
modules to load an object. It uses <A
NAME="z4" HREF="#z3">HTStreamStack</A> and the copy routines
above.  Returns HT_LOADED if successful,
&lt;0 if not.
<PRE>extern int HTParseSocket PARAMS((
	HTFormat	format_in,
	SOCKFD 		file_number,
	HTRequest *	request));

</PRE>
<H2><A
NAME="c1">HTParseFile: Parse a File through
a file pointer</A></H2>This routine is called by protocol
modules to load an object. It uses <A
NAME="z4" HREF="#z3">HTStreamStack</A>
and <A
NAME="c7" HREF="#c6">HTFileCopy</A>.  Returns HT_LOADED
if successful, &lt;0 if not.
<PRE>extern int HTParseFile PARAMS((
	HTFormat	format_in,
	FILE		*fp,
	HTRequest *	request));

</PRE>


<H3>Get next character from buffer</H3>

<PRE>extern int HTInputSocket_getCharacter PARAMS((HTInputSocket* isoc));
</PRE>

<H3>Read block from input socket</H3>Reads *len characters and returns a
buffer (do not free it) containing *len
characters (the value of *len may have changed on return).
The buffer is not NUL-terminated.
<PRE>extern char * HTInputSocket_getBlock PARAMS((HTInputSocket * isoc,
						  int *           len));

extern char * HTInputSocket_getLine PARAMS((HTInputSocket * isoc));
extern char * HTInputSocket_getUnfoldedLine PARAMS((HTInputSocket * isoc));
extern char * HTInputSocket_getStatusLine PARAMS((HTInputSocket * isoc));
extern BOOL   HTInputSocket_seemsBinary PARAMS((HTInputSocket * isoc));

#endif

</PRE>


End of definition module

</BODY>
</HTML>
