Annotation of libwww/Library/src/HTFormat.html, revision 2.13

2.10      timbl       1: <HTML>
                      2: <HEAD>
2.1       timbl       3: <TITLE>HTFormat: The format manager in the WWW Library</TITLE>
2.13    ! timbl       4: <NEXTID N="z13">
2.10      timbl       5: </HEAD>
2.1       timbl       6: <BODY>
                      7: <H1>Manage different document formats</H1>Here we describe the functions of
                      8: the HTFormat module which handles
                      9: conversion between different data
                     10: representations.  (In MIME parlance,
                     11: a representation is known as a content-type.
2.2       timbl      12: In WWW  the term "format" is often
2.1       timbl      13: used as it is shorter).<P>
                     14: This module is implemented by <A
2.10      timbl      15: NAME="z0" HREF="HTFormat.c">HTFormat.c</A>
2.7       timbl      16: . This hypertext document is used
                     17: to generate the <A
2.11      timbl      18: NAME="z8" HREF="HTFormat.h">HTFormat.h</A> include
2.9       timbl      19: file.  Part of the <A
2.10      timbl      20: NAME="z10" HREF="Overview.html">WWW library</A> .
2.1       timbl      21: <H2>Preamble</H2>
                     22: <PRE>#ifndef HTFORMAT_H
                     23: #define HTFORMAT_H
                     24: 
                     25: #include "HTUtils.h"
                     26: #include <A
2.10      timbl      27: NAME="z7" HREF="HTStream.html">"HTStream.h"</A>
2.1       timbl      28: #include "HTAtom.h"
2.2       timbl      29: #include "HTList.h"
2.1       timbl      30: 
                     31: #ifdef SHORT_NAMES
                     32: #define HTOutputSource HTOuSour
                     33: #define HTOutputBinary HTOuBina
                     34: #endif
                     35: 
                     36: </PRE>
                     37: <H2>The HTFormat type</H2>We use the HTAtom object for holding
                     38: representations. This allows faster
                     39: manipulation (comparison and copying)
                     40: than if we stayed with strings.
                     41: <PRE>typedef HTAtom * HTFormat;
2.13    ! timbl      42: 
        !            43: </PRE>Now we can pick up the HTAccess definitions
        !            44: which we need later on but which
        !            45: use HTFormat.
        !            46: <PRE>#include "HTAccess.h"   /* Required for HTRequest definition */
        !            47: 
2.1       timbl      48:                        
                     49: </PRE>These macros (which used to be constants)
                     50: define some basic internally referenced
2.13    ! timbl      51: representations. 
        !            52: <H3>Internal ones</H3>The www/xxx ones are of course not
        !            53: MIME standard.<P>
2.1       timbl      54: www/source  is an output format which
                     55: leaves the input untouched. It is
                     56: useful for diagnostics, and for users
                     57: who want to see the original, whatever
                     58: it is.
2.13    ! timbl      59: <H3></H3>
        !            60: <PRE>#define WWW_SOURCE HTAtom_for("www/source")       /* Whatever it was originally*/
2.1       timbl      61: 
                     62: </PRE>www/present represents the user's
                     63: perception of the document.  If you
                     64: convert to www/present, you present
                     65: the material to the user. 
                     66: <PRE>#define WWW_PRESENT HTAtom_for("www/present")     /* The user's perception */
                     67: 
                     68: </PRE>The message/rfc822 format means a
                     69: MIME message or a plain text message
                     70: with no MIME header. This is what
                     71: is returned by an HTTP server.
                     72: <PRE>#define WWW_MIME HTAtom_for("www/mime")           /* A MIME message */
2.10      timbl      73: 
2.1       timbl      74: </PRE>www/print is like www/present except
                     75: it represents a printed copy.
                     76: <PRE>#define WWW_PRINT HTAtom_for("www/print") /* A printed copy */
                     77: 
2.10      timbl      78: </PRE>www/unknown is a really unknown type.
2.11      timbl      79: Some default action is appropriate.
2.10      timbl      80: <PRE>#define WWW_UNKNOWN     HTAtom_for("www/unknown")
                     81: 
2.13    ! timbl      82: 
        !            83: 
        !            84: </PRE>
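The WWW_* macros above rely on atoms being interned strings: asking twice for the same name yields the same pointer, so two formats can be compared with == instead of strcmp. The following is a minimal sketch of that idea only, not the HTAtom implementation. The names Atom, atom_table, atom_hash and atom_for, and the fixed bucket count, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HTAtom: one node per distinct name. */
typedef struct Atom {
    struct Atom *next;   /* next atom in the same hash bucket */
    char        *name;   /* private copy of the string */
} Atom;

#define ATOM_BUCKETS 101  /* arbitrary size chosen for this sketch */

static Atom *atom_table[ATOM_BUCKETS];

/* Simple multiplicative string hash; any reasonable hash would do. */
static unsigned atom_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % ATOM_BUCKETS;
}

/* Return the unique atom for `name`, creating it on first use.
 * Returns NULL only if memory runs out. */
static Atom *atom_for(const char *name)
{
    unsigned h;
    Atom *a;

    assert(name != NULL);
    h = atom_hash(name);
    for (a = atom_table[h]; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a;               /* already interned */

    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->name = malloc(strlen(name) + 1);
    if (a->name == NULL) {
        free(a);
        return NULL;
    }
    strcpy(a->name, name);
    a->next = atom_table[h];        /* push onto the bucket's chain */
    atom_table[h] = a;
    return a;
}
```

With this scheme, atom_for("text/html") == atom_for("text/html") holds by pointer identity, which is what makes comparison and copying cheap.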
        !            85: <H3>MIME ones (a few)</H3>These are regular MIME types.  HTML
2.11      timbl      86: is assumed to be added by the W3
                     87: code. application/octet-stream was
                     88: mistakenly application/binary in
                     89: earlier libwww versions (pre 2.11).
2.10      timbl      90: <PRE>#define WWW_PLAINTEXT     HTAtom_for("text/plain")
2.1       timbl      91: #define WWW_POSTSCRIPT         HTAtom_for("application/postscript")
                     92: #define WWW_RICHTEXT   HTAtom_for("application/rtf")
2.10      timbl      93: #define WWW_AUDIO       HTAtom_for("audio/basic")
2.1       timbl      94: #define WWW_HTML       HTAtom_for("text/html")
2.11      timbl      95: #define WWW_BINARY     HTAtom_for("application/octet-stream")
2.7       timbl      96: 
2.1       timbl      97: </PRE>We must include the following file
                     98: after defining HTFormat, to which
2.10      timbl      99: it makes reference.
                    100: <H2>The HTEncoding type</H2>
                    101: <PRE>typedef HTAtom* HTEncoding;
                    102: 
                    103: </PRE>The following are values for the
                    104: MIME content transfer encodings:
                    105: <PRE>#define WWW_ENC_7BIT              HTAtom_for("7bit")
                    106: #define WWW_ENC_8BIT           HTAtom_for("8bit")
                    107: #define WWW_ENC_BINARY         HTAtom_for("binary")
                    108: 
                    109: </PRE>We also add
                    110: <PRE>#define WWW_ENC_COMPRESS  HTAtom_for("compress")
                    111: 
                    112: #include "HTAnchor.h"
2.1       timbl     113: 
                    114: </PRE>
                    115: <H2>The HTPresentation and HTConverter
                    116: types</H2>This HTPresentation structure represents
                    117: a possible conversion algorithm from
                    118: one format to another.  It includes
                    119: a pointer to a conversion routine.
                    120: The conversion routine returns a
                    121: stream to which data should be fed.
                    122: See also <A
2.12      timbl     123: NAME="z5" HREF="#z3">HTStreamStack</A> which scans
2.1       timbl     124: the list of registered converters
                    125: and calls one. See the <A
2.10      timbl     126: NAME="z6" HREF="HTInit.html">initialisation
2.1       timbl     127: module</A> for a list of conversion routines.
                    128: <PRE>typedef struct _HTPresentation HTPresentation;
                    129: 
2.13    ! timbl     130: typedef HTStream * <A
        !           131: NAME="z12">HTConverter</A> PARAMS((
        !           132:        HTRequest *             request,
        !           133:        void *                  param,
        !           134:        HTFormat                input_format,
        !           135:        HTFormat                output_format,
        !           136:        HTStream *              output_stream));
2.1       timbl     137:        
                    138: struct _HTPresentation {
2.13    ! timbl     139:        HTAtom* rep;            /* representation name atomized */
2.1       timbl     140:        HTAtom* rep_out;        /* resulting representation */
2.2       timbl     141:        HTConverter *converter; /* The routine to gen the stream stack */
2.1       timbl     142:        char *  command;        /* MIME-format string */
                    143:        float   quality;        /* Between 0 (bad) and 1 (good) */
                    144:        float   secs;
                    145:        float   secs_per_byte;
                    146: };
                    147: 
                    148: </PRE>The list of presentations is kept
                    149: by this module.  It is also scanned
                    150: by modules which want to know the
                    151: set of formats supported, for example.
2.13    ! timbl     152: <PRE>/* extern HTList * HTPresentations; */
2.1       timbl     153: 
2.12      timbl     154: </PRE>The default presentation is used
2.13    ! timbl     155: when no other is appropriate.
        !           156: <PRE>/* extern  HTPresentation* default_presentation; */
2.12      timbl     157: 
2.1       timbl     158: </PRE>
                    159: <H2>HTSetPresentation: Register a system
                    160: command to present a format</H2>
2.8       timbl     161: <H3>On entry,</H3>
2.1       timbl     162: <DL>
                    163: <DT>rep
                    164: <DD> is the MIME-style format name
                    165: <DT>command
                    166: <DD> is the MAILCAP-style command
                    167: template
                    168: <DT>quality
                    169: <DD> A degradation factor in the range 0..1
                    170: <DT>maxbytes
                    171: <DD> A limit on the length acceptable
                    172: as input (0 infinite)
                    173: <DT>maxsecs
                    174: <DD> A limit on the time user
                    175: will wait (0 for infinity)
                    176: </DL>
                    177: 
                    178: <PRE>extern void HTSetPresentation PARAMS((
2.13    ! timbl     179:        HTList *        conversions,
        !           180:        CONST char *    representation,
        !           181:        CONST char *    command,
        !           182:        float           quality,
        !           183:        float           secs, 
        !           184:        float           secs_per_byte
2.1       timbl     185: ));
                    186: 
                    187: 
                    188: </PRE>
                    189: <H2>HTSetConversion:   Register a conversion
                    190: routine</H2>
2.8       timbl     191: <H3>On entry,</H3>
2.1       timbl     192: <DL>
                    193: <DT>rep_in
                    194: <DD> is the content-type input
                    195: <DT>rep_out
                    196: <DD> is the resulting content-type
                    197: <DT>converter
                    198: <DD> is the routine to make
                    199: the stream to do it
                    200: </DL>
                    201: 
                    202: <PRE>
                    203: extern void HTSetConversion PARAMS((
2.13    ! timbl     204:        HTList *        conversions,
2.1       timbl     205:        CONST char *    rep_in,
                    206:        CONST char *    rep_out,
2.2       timbl     207:        HTConverter *   converter,
2.1       timbl     208:        float           quality,
                    209:        float           secs, 
                    210:        float           secs_per_byte
                    211: ));
                    212: 
                    213: 
                    214: </PRE>
                    215: <H2><A
2.10      timbl     216: NAME="z3">HTStreamStack:   Create a stack of
2.1       timbl     217: streams</A></H2>This is the routine which actually
                    218: sets up the conversion. It currently
                    219: checks only for direct conversions,
2.8       timbl     220: but multi-stage conversions are foreseen.
2.2       timbl     221: It takes a stream into which the
2.1       timbl     222: output should be sent in the final
                    223: format, builds the conversion stack,
                    224: and returns a stream into which the
                    225: data in the input format should be
                    226: fed.  The anchor is passed because
                    227: hypertext objects load information
                    228: into the anchor object which represents
                    229: them.
                    230: <PRE>extern HTStream * HTStreamStack PARAMS((
                    231:        HTFormat                format_in,
2.13    ! timbl     232:        HTRequest *             request));
2.1       timbl     233: 
                    234: </PRE>
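As described above, only direct conversions are considered: the registered list is scanned for an entry whose input and output representations both match. A minimal sketch of such a scan follows. It is not the HTStreamStack code. The Conversion type and find_direct function are hypothetical, and plain strings stand in for atoms to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified registry entry (strings instead of atoms). */
typedef struct Conversion {
    const char *rep_in;    /* input representation, e.g. "text/html" */
    const char *rep_out;   /* resulting representation */
    float       quality;   /* between 0 (bad) and 1 (good) */
} Conversion;

/* Scan `table` for the first direct conversion from rep_in to rep_out.
 * Returns a pointer to the matching entry, or NULL if there is none. */
static const Conversion *find_direct(const Conversion *table, size_t count,
                                     const char *rep_in, const char *rep_out)
{
    size_t i;

    assert(rep_in != NULL && rep_out != NULL);
    for (i = 0; i < count; i++)
        if (strcmp(table[i].rep_in, rep_in) == 0 &&
            strcmp(table[i].rep_out, rep_out) == 0)
            return &table[i];
    return NULL;   /* no single-step conversion registered */
}
```

A caller holding a match would then invoke that entry's converter routine with the output stream to obtain the stream to feed. A multi-stage version would have to search for chains of entries instead of a single match.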
                    235: <H2>HTStackValue: Find the cost of a
                    236: filter stack</H2>Must return the cost of the same
                    237: stack which HTStreamStack would set
                    238: up.
2.8       timbl     239: <H3>On entry,</H3>
2.1       timbl     240: <DL>
                    241: <DT>format_in
                    242: <DD> The format of the data to
                    243: be converted
                    244: <DT>format_out
                    245: <DD> The format required
                    246: <DT>initial_value
                    247: <DD> The intrinsic "value"
                    248: of the data before conversion on
                    249: a scale from 0 to 1
                    250: <DT>length
                    251: <DD> The number of bytes expected
                    252: in the input format
                    253: </DL>
                    254: 
                    255: <PRE>extern float HTStackValue PARAMS((
2.13    ! timbl     256:        HTList *                conversions,
2.1       timbl     257:        HTFormat                format_in,
2.13    ! timbl     258:        HTFormat                format_out,
2.1       timbl     259:        float                   initial_value,
                    260:        long int                length));
                    261: 
                    262: #define NO_VALUE_FOUND -1e20           /* returned if none found */
                    263: 
                    264: </PRE>
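One plausible way to combine the quality, secs and secs_per_byte fields into a single value is sketched below. The formula is an assumption made for illustration and is not taken from this document: the intrinsic value is scaled by the quality factor, then reduced by the estimated conversion time as a fraction of the longest wait the user will accept. The names ConversionCost, stack_value, max_secs and SKETCH_NO_VALUE_FOUND are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors NO_VALUE_FOUND above; renamed to avoid clashing with it. */
#define SKETCH_NO_VALUE_FOUND (-1e20f)

/* Hypothetical subset of the presentation fields that affect cost. */
typedef struct ConversionCost {
    float quality;        /* between 0 (bad) and 1 (good) */
    float secs;           /* fixed start-up time */
    float secs_per_byte;  /* time per byte of input */
} ConversionCost;

/* Value of `length` bytes after conversion `c`, on the same 0..1 scale
 * as initial_value. max_secs <= 0 means time is not penalised.
 * A NULL `c` means no conversion was found. */
static float stack_value(const ConversionCost *c, float initial_value,
                         long length, float max_secs)
{
    float value;

    assert(initial_value >= 0.0f && initial_value <= 1.0f);
    if (c == NULL)
        return SKETCH_NO_VALUE_FOUND;

    value = initial_value * c->quality;
    if (max_secs > 0.0f)
        value -= ((float)length * c->secs_per_byte + c->secs) / max_secs;
    return value;
}
```

Whatever formula is used, the constraint stated above still applies: the cost reported must be that of the same stack which HTStreamStack would actually build.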
                    265: <H2><A
2.10      timbl     266: NAME="z1">HTCopy:  Copy a socket to a stream</A></H2>This is used by the protocol engines
2.6       secret    267: to send data down a stream, typically
2.1       timbl     268: one which has been generated by HTStreamStack.
                    269: <PRE>extern void HTCopy PARAMS((
                    270:        int                     file_number,
                    271:        HTStream*               sink));
                    272: 
                    273:        
2.6       secret    274: </PRE>
                    275: <H2><A
2.10      timbl     276: NAME="c6">HTFileCopy:  Copy a file to a stream</A></H2>This is used by the protocol engines
2.6       secret    277: to send data down a stream, typically
2.7       timbl     278: one which has been generated by HTStreamStack.
                    279: It is currently called by <A
2.12      timbl     280: NAME="z9" HREF="#c7">HTParseFile</A>
2.6       secret    281: <PRE>extern void HTFileCopy PARAMS((
                    282:        FILE*                   fp,
                    283:        HTStream*               sink));
                    284: 
                    285:        
2.7       timbl     286: </PRE>
                    287: <H2><A
2.10      timbl     288: NAME="c2">HTCopyNoCR: Copy a socket to a stream,
2.7       timbl     289: stripping CR characters.</A></H2>It is slower than <A
2.12      timbl     290: NAME="z2" HREF="#z1">HTCopy</A> .
2.1       timbl     291: <PRE>
                    292: extern void HTCopyNoCR PARAMS((
                    293:        int                     file_number,
                    294:        HTStream*               sink));
                    295: 
                    296: 
                    297: </PRE>
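The extra cost of stripping CR characters presumably comes from having to examine every byte before passing it on. A minimal sketch of that filtering step on an in-memory buffer follows. The function name copy_no_cr and the buffer-to-buffer interface are assumptions for illustration. The routine declared above reads from a socket and writes to an HTStream instead.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes from src to dst, dropping every carriage return ('\r').
 * dst must have room for n bytes. Returns the number of bytes written. */
static size_t copy_no_cr(const char *src, size_t n, char *dst)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++)
        if (src[i] != '\r')
            dst[out++] = src[i];   /* keep everything except CR */
    return out;
}
```

Applied to the six bytes "a\r\nb\r\n", this writes the four bytes "a\nb\n".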
                    298: <H2>Clear input buffer and set file number</H2>This routine and the one below provide
                    299: simple character input from sockets.
                    300: (They are left over from the older
                    301: architecture and may not be used very
                    302: much.)  The existence of a common
                    303: routine and buffer saves memory space
                    304: in small implementations.
                    305: <PRE>extern void HTInitInput PARAMS((int file_number));
                    306: 
                    307: </PRE>
                    308: <H2>Get next character from buffer</H2>
                    309: <PRE>extern char HTGetChararcter NOPARAMS;
                    310: 
                    311: 
                    312: </PRE>
                    313: <H2>HTParseSocket: Parse a socket given
                    314: its format</H2>This routine is called by protocol
                    315: modules to load an object.  It uses <A
2.12      timbl     316: NAME="z4" HREF="#z3">
2.1       timbl     317: HTStreamStack</A> and the copy routines
                    318: above.  Returns HT_LOADED if successful,
                    319: &lt;0 if not.
                    320: <PRE>extern int HTParseSocket PARAMS((
                    321:        HTFormat        format_in,
                    322:        int             file_number,
2.13    ! timbl     323:        HTRequest *     request));
2.6       secret    324: 
                    325: </PRE>
                    326: <H2><A
2.10      timbl     327: NAME="c1">HTParseFile: Parse a File through
2.7       timbl     328: a file pointer</A></H2>This routine is called by protocol
                    329: modules to load an object. It uses <A
2.12      timbl     330: NAME="z4" HREF="#z3"> HTStreamStack</A>
2.7       timbl     331: and <A
2.12      timbl     332: NAME="c7" HREF="#c6">HTFileCopy</A> .  Returns HT_LOADED
2.7       timbl     333: if successful, &lt;0 if not.
2.6       secret    334: <PRE>extern int HTParseFile PARAMS((
                    335:        HTFormat        format_in,
                    336:        FILE            *fp,
2.13    ! timbl     337:        HTRequest *     request));
2.8       timbl     338: 
                    339: </PRE>
2.11      timbl     340: <H2><A
                    341: NAME="z11">HTNetToText: Convert Net ASCII to
                    342: local representation</A></H2>This is a filter stream suitable
                    343: for taking text from a socket and
                    344: passing it into a stream which expects
                    345: text in the local C representation.
                    346: It does ASCII and newline conversion.
                    347: As usual, pass its output stream
                    348: to it when creating it.
                    349: <PRE>extern HTStream *  HTNetToText PARAMS ((HTStream * sink));
                    350: 
                    351: </PRE>
2.8       timbl     352: <H2>HTFormatInit: Set up default presentations
                    353: and conversions</H2>These are defined in HTInit.c or
                    354: HTSInit.c if these have been replaced.
                    355: If you don't call this routine, and
                    356: you don't define any presentations,
                    357: then this routine will automatically
                    358: be called the first time a conversion
                    359: is needed. However, if you explicitly
                    360: add some conversions (e.g. using HTLoadRules)
                    361: then you may want also to explicitly
                    362: call this to get the defaults as
                    363: well.
2.13    ! timbl     364: <PRE>extern void HTFormatInit PARAMS((HTList * conversions));
2.1       timbl     365: 
                    366: </PRE>
                    367: <H2>Epilogue</H2>
                    368: <PRE>extern BOOL HTOutputSource;       /* Flag: shortcut parser */
                    369: #endif
                    370: 
2.11      timbl     371: </PRE>end</BODY>
2.10      timbl     372: </HTML>
