Annotation of libwww/Library/src/HTFormat.html, revision 2.12
2.10 timbl 1: <HTML>
2: <HEAD>
2.1 timbl 3: <TITLE>HTFormat: The format manager in the WWW Library</TITLE>
2.11 timbl 4: <NEXTID N="z12">
2.10 timbl 5: </HEAD>
2.1 timbl 6: <BODY>
7: <H1>Manage different document formats</H1>Here we describe the functions of
8: the HTFormat module which handles
9: conversion between different data
10: representations. (In MIME parlance,
11: a representation is known as a content-type.
2.2 timbl 12: In WWW the term "format" is often
2.1 timbl 13: used as it is shorter).<P>
14: This module is implemented by <A
2.10 timbl 15: NAME="z0" HREF="HTFormat.c">HTFormat.c</A>
2.7 timbl 16: . This hypertext document is used
17: to generate the <A
2.11 timbl 18: NAME="z8" HREF="HTFormat.h">HTFormat.h</A> include
2.9 timbl 19: file. Part of the <A
2.10 timbl 20: NAME="z10" HREF="Overview.html">WWW library</A> .
2.1 timbl 21: <H2>Preamble</H2>
22: <PRE>#ifndef HTFORMAT_H
23: #define HTFORMAT_H
24:
25: #include "HTUtils.h"
26: #include <A
2.10 timbl 27: NAME="z7" HREF="HTStream.html">"HTStream.h"</A>
2.1 timbl 28: #include "HTAtom.h"
2.2 timbl 29: #include "HTList.h"
2.1 timbl 30:
31: #ifdef SHORT_NAMES
32: #define HTOutputSource HTOuSour
33: #define HTOutputBinary HTOuBina
34: #endif
35:
36: </PRE>
37: <H2>The HTFormat type</H2>We use the HTAtom object for holding
38: representations. This allows faster
39: manipulation (comparison and copying)
40: than if we stayed with strings.
41: <PRE>typedef HTAtom * HTFormat;
42:
43: </PRE>These macros (which used to be constants)
44: define some basic internally referenced
2.2 timbl 45: representations. The www/xxx ones
2.1 timbl 46: are of course not MIME standard.<P>
47: www/source is an output format which
48: leaves the input untouched. It is
49: useful for diagnostics, and for users
50: who want to see the original, whatever
51: it is.
52: <PRE> /* Internal ones */
53: #define WWW_SOURCE HTAtom_for("www/source") /* Whatever it was originally*/
54:
55: </PRE>www/present represents the user's
56: perception of the document. If you
57: convert to www/present, you present
58: the material to the user.
59: <PRE>#define WWW_PRESENT HTAtom_for("www/present") /* The user's perception */
60:
61: </PRE>The message/rfc822 format means a
62: MIME message or a plain text message
63: with no MIME header. This is what
64: is returned by an HTTP server.
65: <PRE>#define WWW_MIME HTAtom_for("www/mime") /* A MIME message */
2.10 timbl 66:
2.1 timbl 67: </PRE>www/print is like www/present except
68: it represents a printed copy.
69: <PRE>#define WWW_PRINT HTAtom_for("www/print") /* A printed copy */
70:
2.10 timbl 71: </PRE>www/unknown is a really unknown type.
2.11 timbl 72: Some default action is appropriate.
2.10 timbl 73: <PRE>#define WWW_UNKNOWN HTAtom_for("www/unknown")
74:
2.11 timbl 75: </PRE>These are regular MIME types. HTML
76: is assumed to be added by the W3
77: code. application/octet-stream was
78: mistakenly application/binary in
79: earlier libwww versions (pre 2.11).
2.10 timbl 80: <PRE>#define WWW_PLAINTEXT HTAtom_for("text/plain")
2.1 timbl 81: #define WWW_POSTSCRIPT HTAtom_for("application/postscript")
82: #define WWW_RICHTEXT HTAtom_for("application/rtf")
2.10 timbl 83: #define WWW_AUDIO HTAtom_for("audio/basic")
2.1 timbl 84: #define WWW_HTML HTAtom_for("text/html")
2.11 timbl 85: #define WWW_BINARY HTAtom_for("application/octet-stream")
2.7 timbl 86:
2.1 timbl 87: </PRE>We must include the following file
88: after defining HTFormat, to which
2.10 timbl 89: it makes reference.
90: <H2>The HTEncoding type</H2>
91: <PRE>typedef HTAtom* HTEncoding;
92:
93: </PRE>The following are values for the
94: MIME content transfer encodings:
95: <PRE>#define WWW_ENC_7BIT HTAtom_for("7bit")
96: #define WWW_ENC_8BIT HTAtom_for("8bit")
97: #define WWW_ENC_BINARY HTAtom_for("binary")
98:
99: </PRE>We also add
100: <PRE>#define WWW_ENC_COMPRESS HTAtom_for("compress")
101:
102: #include "HTAnchor.h"
2.1 timbl 103:
104: </PRE>
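As a rough illustration of why atoms make comparing formats and encodings cheap, here is a minimal sketch of string interning. The names toy_atom and toy_atom_for and the linked-list table are hypothetical stand-ins, not the real HTAtom implementation. The only point is that two lookups of the same name yield the same pointer, so two formats can be compared with == rather than strcmp.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HTAtom: one shared record per distinct name. */
typedef struct toy_atom {
    struct toy_atom *next;
    char *name;
} toy_atom;

typedef toy_atom *toy_format;   /* mirrors "typedef HTAtom * HTFormat" */

static toy_atom *toy_atoms = NULL;   /* simple linked list of all atoms */

/* Return the unique atom for a name, creating it on first use. */
static toy_atom *toy_atom_for(const char *name)
{
    toy_atom *a;

    for (a = toy_atoms; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a;               /* already interned */

    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->name = malloc(strlen(name) + 1);
    if (a->name == NULL) {
        free(a);
        return NULL;
    }
    strcpy(a->name, name);
    a->next = toy_atoms;
    toy_atoms = a;
    return a;
}

/* Self-check: equal names give identical pointers, different names do not. */
static void toy_atom_demo(void)
{
    toy_format html1 = toy_atom_for("text/html");
    toy_format html2 = toy_atom_for("text/html");
    toy_format plain = toy_atom_for("text/plain");

    assert(html1 != NULL && plain != NULL);
    assert(html1 == html2);     /* pointer comparison replaces strcmp */
    assert(html1 != plain);
}
```

The string comparison is paid once per lookup; every later comparison or copy of the format is a single pointer operation.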
105: <H2>The HTPresentation and HTConverter
106: types</H2>This HTPresentation structure represents
107: a possible conversion algorithm from
108: one format to another. It includes
109: a pointer to a conversion routine.
110: The conversion routine returns a
111: stream to which data should be fed.
112: See also <A
2.12 ! timbl 113: NAME="z5" HREF="#z3">HTStreamStack</A> which scans
2.1 timbl 114: the list of registered converters
115: and calls one. See the <A
2.10 timbl 116: NAME="z6" HREF="HTInit.html">initialisation
2.1 timbl 117: module</A> for a list of conversion routines.
118: <PRE>typedef struct _HTPresentation HTPresentation;
119:
2.2 timbl 120: typedef HTStream * HTConverter PARAMS((
2.1 timbl 121: HTPresentation * pres,
122: HTParentAnchor * anchor,
123: HTStream * sink));
124:
125: struct _HTPresentation {
126: HTAtom* rep; /* representation name atomized */
127: HTAtom* rep_out; /* resulting representation */
2.2 timbl 128: HTConverter *converter; /* The routine to gen the stream stack */
2.1 timbl 129: char * command; /* MIME-format string */
130: float quality; /* Between 0 (bad) and 1 (good) */
131: float secs;
132: float secs_per_byte;
133: };
134:
135: </PRE>The list of presentations is kept
136: by this module. It is also scanned
137: by modules which want to know the
138: set of formats supported, for example.
139: <PRE>extern HTList * HTPresentations;
140:
2.12 ! timbl 141: </PRE>The default presentation is used
! 142: when no other is appropriate.
! 143: <PRE>extern HTPresentation* default_presentation;
! 144:
2.1 timbl 145: </PRE>
146: <H2>HTSetPresentation: Register a system
147: command to present a format</H2>
2.8 timbl 148: <H3>On entry,</H3>
2.1 timbl 149: <DL>
150: <DT>rep
151: <DD> is the MIME-style format name
152: <DT>command
153: <DD> is the MAILCAP-style command
154: template
155: <DT>quality
156: <DD> A degradation factor 0..1
157: <DT>maxbytes
158: <DD> A limit on the length acceptable
159: as input (0 infinite)
160: <DT>maxsecs
161: <DD> A limit on the time user
162: will wait (0 for infinity)
163: </DL>
164:
165: <PRE>extern void HTSetPresentation PARAMS((
166: CONST char * representation,
167: CONST char * command,
168: float quality,
169: float secs,
170: float secs_per_byte
171: ));
172:
173:
174: </PRE>
175: <H2>HTSetConversion: Register a conversion
176: routine</H2>
2.8 timbl 177: <H3>On entry,</H3>
2.1 timbl 178: <DL>
179: <DT>rep_in
180: <DD> is the content-type input
181: <DT>rep_out
182: <DD> is the resulting content-type
183: <DT>converter
184: <DD> is the routine to make
185: the stream to do it
186: </DL>
187:
188: <PRE>
189: extern void HTSetConversion PARAMS((
190: CONST char * rep_in,
191: CONST char * rep_out,
2.2 timbl 192: HTConverter * converter,
2.1 timbl 193: float quality,
194: float secs,
195: float secs_per_byte
196: ));
197:
198:
199: </PRE>
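The registration step can be pictured as appending a record to a table that is scanned later. The sketch below is only an assumption about the shape of such a registry: toy_pres, toy_converter, toy_set_conversion and the fixed-size array are hypothetical, the converter signature is simplified, and format names are kept as plain strings rather than atoms to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PRES 16

typedef struct toy_pres toy_pres;

/* Hypothetical converter type: returns whatever should receive the input data. */
typedef void *toy_converter(toy_pres *pres, void *sink);

struct toy_pres {
    const char *rep;          /* input format name (pointer is not copied) */
    const char *rep_out;      /* resulting format name */
    toy_converter *converter; /* routine that builds the converting stream */
    float quality;            /* between 0 (bad) and 1 (good) */
    float secs;               /* fixed time cost */
    float secs_per_byte;      /* time cost per input byte */
};

static toy_pres toy_table[TOY_MAX_PRES];
static size_t toy_count = 0;

/* Append one conversion; returns 0 on success, -1 if the table is full. */
static int toy_set_conversion(const char *rep_in, const char *rep_out,
                              toy_converter *converter,
                              float quality, float secs, float secs_per_byte)
{
    toy_pres *p;

    if (toy_count == TOY_MAX_PRES)
        return -1;
    p = &toy_table[toy_count++];
    p->rep = rep_in;
    p->rep_out = rep_out;
    p->converter = converter;
    p->quality = quality;
    p->secs = secs;
    p->secs_per_byte = secs_per_byte;
    return 0;
}

/* A do-nothing converter: the sink itself receives the data unchanged. */
static void *toy_pass_through(toy_pres *pres, void *sink)
{
    (void)pres;
    return sink;
}

/* Self-check: register one conversion and call it back through the table. */
static void toy_registry_demo(void)
{
    int sink = 0;

    assert(toy_set_conversion("text/plain", "www/present",
                              toy_pass_through, 1.0f, 0.0f, 0.0f) == 0);
    assert(toy_count == 1);
    assert(strcmp(toy_table[0].rep_out, "www/present") == 0);
    assert(toy_table[0].converter(&toy_table[0], &sink) == &sink);
}
```

Storing the converter as a function pointer is what lets a later scan of the table pick a routine by format pair without knowing anything about how it works.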
200: <H2><A
2.10 timbl 201: NAME="z3">HTStreamStack: Create a stack of
2.1 timbl 202: streams</A></H2>This is the routine which actually
203: sets up the conversion. It currently
204: checks only for direct conversions,
2.8 timbl 205: but multi-stage conversions are foreseen.
2.2 timbl 206: It takes a stream into which the
2.1 timbl 207: output should be sent in the final
208: format, builds the conversion stack,
209: and returns a stream into which the
210: data in the input format should be
211: fed. The anchor is passed because
212: hypertext objects load information
213: into the anchor object which represents
214: them.
215: <PRE>extern HTStream * HTStreamStack PARAMS((
216: HTFormat format_in,
217: HTFormat format_out,
218: HTStream* stream_out,
219: HTParentAnchor* anchor));
220:
221: </PRE>
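A direct-conversion lookup of the kind described above might be sketched as follows. Everything here is hypothetical: toy_stream, toy_stream_stack, the made-up format names and the upper-casing converter stand in for the real stream and converter types, the anchor argument is left out, and returning the output stream unchanged when both formats are equal is an assumption. The function scans a table for an entry matching both formats and lets its converter wrap the output stream.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream: a put routine plus the stream it forwards to. */
typedef struct toy_stream toy_stream;
struct toy_stream {
    void (*put_char)(toy_stream *me, char c);
    toy_stream *sink;       /* next stream in the stack, or NULL */
    char buf[64];           /* used only by the collecting stream */
    size_t len;
};

typedef toy_stream *toy_converter(toy_stream *sink);

typedef struct {
    const char *rep;
    const char *rep_out;
    toy_converter *converter;
} toy_pres;

/* Final stream: collects characters into its buffer. */
static void collect_put(toy_stream *me, char c)
{
    if (me->len + 1 < sizeof me->buf) {
        me->buf[me->len++] = c;
        me->buf[me->len] = '\0';
    }
}

/* Converting stream: upper-cases each character and forwards it. */
static void upper_put(toy_stream *me, char c)
{
    me->sink->put_char(me->sink, (char)toupper((unsigned char)c));
}

static toy_stream upper_stream;   /* one static instance keeps the sketch short */

static toy_stream *upper_converter(toy_stream *sink)
{
    upper_stream.put_char = upper_put;
    upper_stream.sink = sink;
    return &upper_stream;
}

static const toy_pres toy_table[] = {
    { "text/x-lower", "text/x-upper", upper_converter },
};

/* Return a stream accepting format_in, or NULL if no direct conversion exists. */
static toy_stream *toy_stream_stack(const char *format_in, const char *format_out,
                                    toy_stream *stream_out)
{
    size_t i;

    if (strcmp(format_in, format_out) == 0)
        return stream_out;          /* assumption: nothing to convert */
    for (i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++)
        if (strcmp(toy_table[i].rep, format_in) == 0 &&
            strcmp(toy_table[i].rep_out, format_out) == 0)
            return toy_table[i].converter(stream_out);
    return NULL;
}

/* Self-check: feed lower-case input, observe upper-case output. */
static void toy_stack_demo(void)
{
    toy_stream out = { collect_put, NULL, "", 0 };
    toy_stream *in = toy_stream_stack("text/x-lower", "text/x-upper", &out);
    const char *p;

    assert(in != NULL);
    for (p = "abc"; *p != '\0'; p++)
        in->put_char(in, *p);
    assert(strcmp(out.buf, "ABC") == 0);
    assert(toy_stream_stack("text/x-lower", "image/gif", &out) == NULL);
}
```

The caller supplies the final stream and receives the head of the stack, so it never needs to know how many stages sit in between.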
222: <H2>HTStackValue: Find the cost of a
223: filter stack</H2>Must return the cost of the same
224: stack which HTStreamStack would set
225: up.
2.8 timbl 226: <H3>On entry,</H3>
2.1 timbl 227: <DL>
228: <DT>format_in
229: <DD> The format of the data to
230: be converted
231: <DT>format_out
232: <DD> The format required
233: <DT>initial_value
234: <DD> The intrinsic "value"
235: of the data before conversion on
236: a scale from 0 to 1
237: <DT>length
238: <DD> The number of bytes expected
239: in the input format
240: </DL>
241:
242: <PRE>extern float HTStackValue PARAMS((
243: HTFormat format_in,
244: HTFormat rep_out,
245: float initial_value,
246: long int length));
247:
248: #define NO_VALUE_FOUND -1e20 /* returned if none found */
249:
250: </PRE>
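The text does not say how quality and the two time costs are combined into a single value. The sketch below shows one plausible cost model, purely as an assumption: the initial value is scaled by the quality, and the expected conversion time is subtracted as a fraction of the longest wait the user will accept. The names toy_cost and toy_stack_value and the max_secs parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_VALUE_FOUND -1e20f   /* returned if no conversion is given */

/* Hypothetical subset of a presentation record: only the cost fields. */
typedef struct {
    float quality;        /* between 0 (bad) and 1 (good) */
    float secs;           /* fixed time cost */
    float secs_per_byte;  /* time cost per input byte */
} toy_cost;

/*
 * Assumed model, not taken from the source:
 *   value = initial_value * quality - (secs + secs_per_byte * length) / max_secs
 * A max_secs of 0 means the user will wait forever, so time is ignored.
 */
static float toy_stack_value(const toy_cost *c, float initial_value,
                             long length, float max_secs)
{
    float value;

    if (c == NULL)
        return TOY_NO_VALUE_FOUND;
    value = initial_value * c->quality;
    if (max_secs > 0.0f)
        value -= (c->secs + c->secs_per_byte * (float)length) / max_secs;
    return value;
}

/* Self-check with easy numbers: 1.0 * 0.5 - 1/10 is about 0.4. */
static void toy_value_demo(void)
{
    toy_cost c = { 0.5f, 1.0f, 0.0f };
    float diff = toy_stack_value(&c, 1.0f, 1000L, 10.0f) - 0.4f;

    assert(diff < 1e-6f && diff > -1e-6f);
    assert(toy_stack_value(&c, 1.0f, 1000L, 0.0f) == 0.5f);
    assert(toy_stack_value(NULL, 1.0f, 1000L, 10.0f) < -1e19f);
}
```

Whatever the real formula is, the sentinel has to lie far below any value a genuine conversion could produce, which is why a very large negative number is used.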
251: <H2><A
2.10 timbl 252: NAME="z1">HTCopy: Copy a socket to a stream</A></H2>This is used by the protocol engines
2.6 secret 253: to send data down a stream, typically
2.1 timbl 254: one which has been generated by HTStreamStack.
255: <PRE>extern void HTCopy PARAMS((
256: int file_number,
257: HTStream* sink));
258:
259:
2.6 secret 260: </PRE>
261: <H2><A
2.10 timbl 262: NAME="c6">HTFileCopy: Copy a file to a stream</A></H2>This is used by the protocol engines
2.6 secret 263: to send data down a stream, typically
2.7 timbl 264: one which has been generated by HTStreamStack.
265: It is currently called by <A
2.12 ! timbl 266: NAME="z9" HREF="#c7">HTParseFile</A> .
2.6 secret 267: <PRE>extern void HTFileCopy PARAMS((
268: FILE* fp,
269: HTStream* sink));
270:
271:
2.7 timbl 272: </PRE>
273: <H2><A
2.10 timbl 274: NAME="c2">HTCopyNoCR: Copy a socket to a stream,
2.7 timbl 275: stripping CR characters.</A></H2>It is slower than <A
2.12 ! timbl 276: NAME="z2" HREF="#z1">HTCopy</A> .
2.1 timbl 277: <PRE>
278: extern void HTCopyNoCR PARAMS((
279: int file_number,
280: HTStream* sink));
281:
282:
283: </PRE>
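The extra cost of stripping CR characters comes from having to look at every byte instead of handing whole blocks to the stream. The following sketch shows the idea on a memory buffer rather than a socket, so that it stays self-contained. The names toy_sink, toy_sink_put and toy_copy_no_cr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink: appends characters to a fixed buffer. */
typedef struct {
    char buf[128];
    size_t len;
} toy_sink;

static void toy_sink_put(toy_sink *sink, char c)
{
    if (sink->len + 1 < sizeof sink->buf) {
        sink->buf[sink->len++] = c;
        sink->buf[sink->len] = '\0';
    }
}

/* Copy n bytes from data to sink, dropping every carriage return. */
static void toy_copy_no_cr(const char *data, size_t n, toy_sink *sink)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (data[i] != '\r')
            toy_sink_put(sink, data[i]);
}

/* Self-check: CR LF line ends arrive at the sink as bare LF. */
static void toy_copy_demo(void)
{
    static const char input[] = "line one\r\nline two\r\n";
    toy_sink sink = { "", 0 };

    toy_copy_no_cr(input, sizeof input - 1, &sink);
    assert(strcmp(sink.buf, "line one\nline two\n") == 0);
}
```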
284: <H2>Clear input buffer and set file number</H2>This routine and the one below provide
285: simple character input from sockets.
286: (They are left over from the older
287: architecture and may not be used very
288: much.) The existence of a common
289: routine and buffer saves memory space
290: in small implementations.
291: <PRE>extern void HTInitInput PARAMS((int file_number));
292:
293: </PRE>
294: <H2>Get next character from buffer</H2>
295: <PRE>extern char HTGetChararcter NOPARAMS;
296:
297:
298: </PRE>
299: <H2>HTParseSocket: Parse a socket given
300: its format</H2>This routine is called by protocol
301: modules to load an object. It uses <A
2.12 ! timbl 302: NAME="z4" HREF="#z3">
2.1 timbl 303: HTStreamStack</A> and the copy routines
304: above. Returns HT_LOADED if successful,
305: <0 if not.
306: <PRE>extern int HTParseSocket PARAMS((
307: HTFormat format_in,
308: HTFormat format_out,
309: HTParentAnchor *anchor,
310: int file_number,
2.6 secret 311: HTStream* sink));
312:
313: </PRE>
314: <H2><A
2.10 timbl 315: NAME="c1">HTParseFile: Parse a File through
2.7 timbl 316: a file pointer</A></H2>This routine is called by protocol
317: modules to load an object. It uses <A
2.12 ! timbl 318: NAME="z4" HREF="#z3"> HTStreamStack</A>
2.7 timbl 319: and <A
2.12 ! timbl 320: NAME="c7" HREF="#c6">HTFileCopy</A> . Returns HT_LOADED
2.7 timbl 321: if successful, <0 if not.
2.6 secret 322: <PRE>extern int HTParseFile PARAMS((
323: HTFormat format_in,
324: HTFormat format_out,
325: HTParentAnchor *anchor,
326: FILE *fp,
2.1 timbl 327: HTStream* sink));
2.8 timbl 328:
329: </PRE>
2.11 timbl 330: <H2><A
331: NAME="z11">HTNetToText: Convert Net ASCII to
332: local representation</A></H2>This is a filter stream suitable
333: for taking text from a socket and
334: passing it into a stream which expects
335: text in the local C representation.
336: It does ASCII and newline conversion.
337: As usual, pass its output stream
338: to it when creating it.
339: <PRE>extern HTStream * HTNetToText PARAMS ((HTStream * sink));
340:
341: </PRE>
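The newline half of such a filter can be sketched as a small state machine placed in front of the output stream. This is an illustration under stated assumptions, not the library's code: toy_stream and toy_net_to_text are hypothetical, a CR LF pair is turned into a single '\n', a lone CR is passed through, and the character-set half of the conversion is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream: a put routine, a sink to forward to, and some state. */
typedef struct toy_stream toy_stream;
struct toy_stream {
    void (*put_char)(toy_stream *me, char c);
    toy_stream *sink;   /* where converted characters go */
    int had_cr;         /* filter state: previous character was CR */
    char buf[64];       /* used only by the collecting stream */
    size_t len;
};

/* Final stream: collects characters into its buffer. */
static void collect_put(toy_stream *me, char c)
{
    if (me->len + 1 < sizeof me->buf) {
        me->buf[me->len++] = c;
        me->buf[me->len] = '\0';
    }
}

/* Filter: CR LF becomes '\n'; a CR followed by anything else is kept. */
static void net_to_text_put(toy_stream *me, char c)
{
    if (me->had_cr) {
        me->had_cr = 0;
        if (c == '\n') {                        /* CR LF: one local newline */
            me->sink->put_char(me->sink, '\n');
            return;
        }
        me->sink->put_char(me->sink, '\r');     /* lone CR: pass it through */
    }
    if (c == '\r')
        me->had_cr = 1;     /* hold it until we see what follows */
    else
        me->sink->put_char(me->sink, c);
}

/* Set up the filter; the output stream is supplied at creation time. */
static toy_stream *toy_net_to_text(toy_stream *me, toy_stream *sink)
{
    me->put_char = net_to_text_put;
    me->sink = sink;
    me->had_cr = 0;
    me->len = 0;
    return me;
}

/* Self-check: one CR LF pair, one lone CR, one bare LF. */
static void toy_net_demo(void)
{
    toy_stream out = { collect_put, NULL, 0, "", 0 };
    toy_stream filter;
    toy_stream *in = toy_net_to_text(&filter, &out);
    const char *p;

    for (p = "a\r\nb\rc\n"; *p != '\0'; p++)
        in->put_char(in, *p);
    assert(strcmp(out.buf, "a\nb\rc\n") == 0);
}
```

A CR that arrives as the very last character stays held in the filter. A fuller version would need an end-of-stream routine to flush it.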
2.8 timbl 342: <H2>HTFormatInit: Set up default presentations
343: and conversions</H2>These are defined in HTInit.c or
344: HTSInit.c if these have been replaced.
345: If you don't call this routine, and
346: you don't define any presentations,
347: then this routine will automatically
348: be called the first time a conversion
349: is needed. However, if you explicitly
350: add some conversions (e.g. using HTLoadRules)
351: then you may want also to explicitly
352: call this to get the defaults as
353: well.
354: <PRE>extern void HTFormatInit NOPARAMS;
2.1 timbl 355:
356: </PRE>
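The initialisation rule above (defaults are loaded automatically only when nothing at all has been registered) can be sketched as a check made on first use. The names toy_add, toy_format_init and toy_supported are hypothetical, and the table holds bare format names instead of full presentation records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

static const char *toy_formats[TOY_MAX];
static size_t toy_count = 0;

/* Register one format name; silently ignored if the table is full. */
static void toy_add(const char *name)
{
    if (toy_count < TOY_MAX)
        toy_formats[toy_count++] = name;
}

/* Stand-in for the default set-up routine. */
static void toy_format_init(void)
{
    toy_add("text/plain");
    toy_add("text/html");
}

/* Look a format up, loading the defaults first if the table is empty. */
static int toy_supported(const char *name)
{
    size_t i;

    if (toy_count == 0)
        toy_format_init();      /* automatic call on first use */
    for (i = 0; i < toy_count; i++)
        if (strcmp(toy_formats[i], name) == 0)
            return 1;
    return 0;
}

static void toy_init_demo(void)
{
    /* Nothing registered: the first lookup pulls in the defaults. */
    assert(toy_supported("text/html"));

    /* Start again with an explicit registration: the defaults are skipped... */
    toy_count = 0;
    toy_add("audio/basic");
    assert(!toy_supported("text/html"));

    /* ...unless the caller asks for them as well. */
    toy_format_init();
    assert(toy_supported("text/html"));
    assert(toy_supported("audio/basic"));
}
```

The middle case is the one the text warns about: once anything has been added by hand, the table is no longer empty and the automatic call never happens.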
357: <H2>Epilogue</H2>
358: <PRE>extern BOOL HTOutputSource; /* Flag: shortcut parser */
359: #endif
360:
2.11 timbl 361: </PRE>end</BODY>
2.10 timbl 362: </HTML>