Annotation of libwww/Library/src/HTFormat.html, revision 2.11
2.10 timbl 1: <HTML>
2: <HEAD>
2.1 timbl 3: <TITLE>HTFormat: The format manager in the WWW Library</TITLE>
2.11 ! timbl 4: <NEXTID N="z12">
2.10 timbl 5: </HEAD>
2.1 timbl 6: <BODY>
7: <H1>Manage different document formats</H1>Here we describe the functions of
8: the HTFormat module which handles
9: conversion between different data
10: representations. (In MIME parlance,
11: a representation is known as a content-type.
2.2 timbl 12: In WWW the term "format" is often
2.1 timbl 13: used as it is shorter).<P>
14: This module is implemented by <A
2.10 timbl 15: NAME="z0" HREF="HTFormat.c">HTFormat.c</A>
2.7 timbl 16: . This hypertext document is used
17: to generate the <A
2.11 ! timbl 18: NAME="z8" HREF="HTFormat.h">HTFormat.h</A> include
2.9 timbl 19: file. Part of the <A
2.10 timbl 20: NAME="z10" HREF="Overview.html">WWW library</A> .
2.1 timbl 21: <H2>Preamble</H2>
22: <PRE>#ifndef HTFORMAT_H
23: #define HTFORMAT_H
24:
25: #include "HTUtils.h"
26: #include <A
2.10 timbl 27: NAME="z7" HREF="HTStream.html">"HTStream.h"</A>
2.1 timbl 28: #include "HTAtom.h"
2.2 timbl 29: #include "HTList.h"
2.1 timbl 30:
31: #ifdef SHORT_NAMES
32: #define HTOutputSource HTOuSour
33: #define HTOutputBinary HTOuBina
34: #endif
35:
36: </PRE>
37: <H2>The HTFormat type</H2>We use the HTAtom object for holding
38: representations. This allows faster
39: manipulation (comparison and copying)
40: than if we stayed with strings.
41: <PRE>typedef HTAtom * HTFormat;
42:
43: </PRE>These macros (which used to be constants)
44: define some basic internally referenced
2.2 timbl 45: representations. The www/xxx ones
2.1 timbl 46: are of course not MIME standard.<P>
47: www/source is an output format which
48: leaves the input untouched. It is
49: useful for diagnostics, and for users
50: who want to see the original, whatever
51: it is.
52: <PRE> /* Internal ones */
53: #define WWW_SOURCE HTAtom_for("www/source") /* Whatever it was originally*/
54:
55: </PRE>www/present represents the user's
56: perception of the document. If you
57: convert to www/present, you present
58: the material to the user.
59: <PRE>#define WWW_PRESENT HTAtom_for("www/present") /* The user's perception */
60:
61: </PRE>The message/rfc822 format means a
62: MIME message or a plain text message
63: with no MIME header. This is what
64: is returned by an HTTP server.
65: <PRE>#define WWW_MIME HTAtom_for("www/mime") /* A MIME message */
2.10 timbl 66:
2.1 timbl 67: </PRE>www/print is like www/present except
68: it represents a printed copy.
69: <PRE>#define WWW_PRINT HTAtom_for("www/print") /* A printed copy */
70:
2.10 timbl 71: </PRE>www/unknown is a really unknown type.
2.11 ! timbl 72: Some default action is appropriate.
2.10 timbl 73: <PRE>#define WWW_UNKNOWN HTAtom_for("www/unknown")
74:
2.11 ! timbl 75: </PRE>These are regular MIME types. HTML
! 76: is assumed to be added by the W3
! 77: code. application/octet-stream was
! 78: mistakenly application/binary in
! 79: earlier libwww versions (pre 2.11).
2.10 timbl 80: <PRE>#define WWW_PLAINTEXT HTAtom_for("text/plain")
2.1 timbl 81: #define WWW_POSTSCRIPT HTAtom_for("application/postscript")
82: #define WWW_RICHTEXT HTAtom_for("application/rtf")
2.10 timbl 83: #define WWW_AUDIO HTAtom_for("audio/basic")
2.1 timbl 84: #define WWW_HTML HTAtom_for("text/html")
2.11 ! timbl 85: #define WWW_BINARY HTAtom_for("application/octet-stream")
2.7 timbl 86:
2.1 timbl 87: </PRE>We must include the following file
88: after defining HTFormat, to which
2.10 timbl 89: it makes reference.
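The HTFormat section above says that atoms make comparison and copying faster than strings. A minimal sketch of why, assuming an atom is simply a unique shared node per distinct name: once two names are interned, comparing formats is a pointer comparison. The names Atom and atom_for are hypothetical stand-ins, not the HTAtom API, and the linear list is chosen only for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an atom: one shared node per distinct name. */
typedef struct Atom {
    struct Atom *next;
    char *name;
} Atom;

static Atom *atom_list = NULL;   /* every atom interned so far */

/* Return the unique atom for `name`, creating it on first use.
 * Returns NULL only if memory runs out. */
Atom *atom_for(const char *name)
{
    Atom *a;

    assert(name != NULL);
    for (a = atom_list; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a;                 /* already interned */

    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->name = malloc(strlen(name) + 1);
    if (a->name == NULL) {
        free(a);
        return NULL;
    }
    strcpy(a->name, name);
    a->next = atom_list;
    atom_list = a;
    return a;
}
```

With this in place, a test such as atom_for("text/html") == atom_for("text/html") holds, so code that already has two atoms in hand never needs strcmp to compare them.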
90: <H2>The HTEncoding type</H2>
91: <PRE>typedef HTAtom* HTEncoding;
92:
93: </PRE>The following are values for the
94: MIME content-transfer-encodings:
95: <PRE>#define WWW_ENC_7BIT HTAtom_for("7bit")
96: #define WWW_ENC_8BIT HTAtom_for("8bit")
97: #define WWW_ENC_BINARY HTAtom_for("binary")
98:
99: </PRE>We also add
100: <PRE>#define WWW_ENC_COMPRESS HTAtom_for("compress")
101:
102: #include "HTAnchor.h"
2.1 timbl 103:
104: </PRE>
105: <H2>The HTPresentation and HTConverter
106: types</H2>This HTPresentation structure represents
107: a possible conversion algorithm from
108: one format to another. It includes
109: a pointer to a conversion routine.
110: The conversion routine returns a
111: stream to which data should be fed.
112: See also <A
2.11 ! timbl 113: NAME="z5" HREF="HTFormat.html#z3">HTStreamStack</A> which scans
2.1 timbl 114: the list of registered converters
115: and calls one. See the <A
2.10 timbl 116: NAME="z6" HREF="HTInit.html">initialisation
2.1 timbl 117: module</A> for a list of conversion routines.
118: <PRE>typedef struct _HTPresentation HTPresentation;
119:
2.2 timbl 120: typedef HTStream * HTConverter PARAMS((
2.1 timbl 121: HTPresentation * pres,
122: HTParentAnchor * anchor,
123: HTStream * sink));
124:
125: struct _HTPresentation {
126: HTAtom* rep; /* representation name atomized */
127: HTAtom* rep_out; /* resulting representation */
2.2 timbl 128: HTConverter *converter; /* The routine to gen the stream stack */
2.1 timbl 129: char * command; /* MIME-format string */
130: float quality; /* Between 0 (bad) and 1 (good) */
131: float secs;
132: float secs_per_byte;
133: };
134:
135: </PRE>The list of presentations is kept
136: by this module. It is also scanned
137: by modules which want to know the
138: set of formats supported, for example.
139: <PRE>extern HTList * HTPresentations;
140:
141: </PRE>
142: <H2>HTSetPresentation: Register a system
143: command to present a format</H2>
2.8 timbl 144: <H3>On entry,</H3>
2.1 timbl 145: <DL>
146: <DT>rep
147: <DD> is the MIME - style format name
148: <DT>command
149: <DD> is the MAILCAP - style command
150: template
151: <DT>quality
152: <DD> A degradation factor 0..1
153: <DT>maxbytes
154: <DD> A limit on the length acceptable
155: as input (0 infinite)
156: <DT>maxsecs
157: <DD> A limit on the time user
158: will wait (0 for infinity)
159: </DL>
160:
161: <PRE>extern void HTSetPresentation PARAMS((
162: CONST char * representation,
163: CONST char * command,
164: float quality,
165: float secs,
166: float secs_per_byte
167: ));
168:
169:
170: </PRE>
171: <H2>HTSetConversion: Register a conversion
172: routine</H2>
2.8 timbl 173: <H3>On entry,</H3>
2.1 timbl 174: <DL>
175: <DT>rep_in
176: <DD> is the content-type input
177: <DT>rep_out
178: <DD> is the resulting content-type
179: <DT>converter
180: <DD> is the routine to make
181: the stream to do it
182: </DL>
183:
184: <PRE>
185: extern void HTSetConversion PARAMS((
186: CONST char * rep_in,
187: CONST char * rep_out,
2.2 timbl 188: HTConverter * converter,
2.1 timbl 189: float quality,
190: float secs,
191: float secs_per_byte
192: ));
193:
194:
195: </PRE>
196: <H2><A
2.10 timbl 197: NAME="z3">HTStreamStack: Create a stack of
2.1 timbl 198: streams</A></H2>This is the routine which actually
199: sets up the conversion. It currently
200: checks only for direct conversions,
2.8 timbl 201: but multi-stage conversions are foreseen.
2.2 timbl 202: It takes a stream into which the
2.1 timbl 203: output should be sent in the final
204: format, builds the conversion stack,
205: and returns a stream into which the
206: data in the input format should be
207: fed. The anchor is passed because
208: hypertext objects load information
209: into the anchor object which represents
210: them.
211: <PRE>extern HTStream * HTStreamStack PARAMS((
212: HTFormat format_in,
213: HTFormat format_out,
214: HTStream* stream_out,
215: HTParentAnchor* anchor));
216:
217: </PRE>
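The direct-conversion lookup described for HTStreamStack can be sketched as follows. This is an illustration under stated assumptions, not the library's code: formats are plain strings rather than atoms, Stream, Conversion, stream_stack and html_to_present are hypothetical names, the anchor argument is omitted, and returning the sink unchanged when the two formats are equal is an assumed shortcut.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream stage: a label plus the stage it feeds. */
typedef struct Stream {
    const char *label;
    struct Stream *sink;      /* next stage, or NULL for the final sink */
} Stream;

/* A converter builds a stage that writes into `sink`. */
typedef Stream *Converter(Stream *sink);

typedef struct {
    const char *rep_in;
    const char *rep_out;
    Converter *converter;
} Conversion;

static Stream html_stage = { "html-to-present", NULL };

static Stream *html_to_present(Stream *sink)
{
    html_stage.sink = sink;
    return &html_stage;
}

/* The registered conversions (a fixed table here, a list in practice). */
static const Conversion conversions[] = {
    { "text/html", "www/present", html_to_present },
};

/* Return the stream to feed input into, or NULL if no direct
 * conversion from format_in to format_out is registered. */
Stream *stream_stack(const char *format_in, const char *format_out,
                     Stream *stream_out)
{
    size_t i;

    assert(format_in != NULL && format_out != NULL);
    if (strcmp(format_in, format_out) == 0)
        return stream_out;            /* assumed: nothing to convert */

    for (i = 0; i < sizeof conversions / sizeof conversions[0]; i++) {
        if (strcmp(conversions[i].rep_in, format_in) == 0 &&
            strcmp(conversions[i].rep_out, format_out) == 0)
            return conversions[i].converter(stream_out);
    }
    return NULL;
}
```

The caller passes the final output stream in and gets back the head of the stack, which mirrors the direction of construction described above: the stack is built from the output end towards the input end.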
218: <H2>HTStackValue: Find the cost of a
219: filter stack</H2>Must return the cost of the same
220: stack which HTStreamStack would set
221: up.
2.8 timbl 222: <H3>On entry,</H3>
2.1 timbl 223: <DL>
224: <DT>format_in
225: <DD> The format of the data to
226: be converted
227: <DT>format_out
228: <DD> The format required
229: <DT>initial_value
230: <DD> The intrinsic "value"
231: of the data before conversion on
232: a scale from 0 to 1
233: <DT>length
234: <DD> The number of bytes expected
235: in the input format
236: </DL>
237:
238: <PRE>extern float HTStackValue PARAMS((
239: HTFormat format_in,
240: HTFormat rep_out,
241: float initial_value,
242: long int length));
243:
244: #define NO_VALUE_FOUND -1e20 /* returned if none found */
245:
246: </PRE>
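One plausible way to turn the quality, secs and secs_per_byte fields of a presentation into a single value is sketched below. The formula is an assumption made for illustration and is not stated in this document: the initial value is scaled by the quality, then reduced by the estimated conversion time expressed as a fraction of the time the user is willing to wait. Cost, stack_value, NO_VALUE and max_secs are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NO_VALUE (-1e20f)   /* plays the role of NO_VALUE_FOUND */

/* The cost-related fields of one registered conversion. */
typedef struct {
    float quality;          /* between 0 (bad) and 1 (good) */
    float secs;             /* fixed start-up time */
    float secs_per_byte;    /* time per input byte */
} Cost;

/* Value of converting `length` bytes with conversion `c`.
 * `c` is NULL when no conversion was found.
 * `max_secs` is the longest wait the user accepts; 0 means
 * time is not taken into account. */
float stack_value(const Cost *c, float initial_value,
                  long length, float max_secs)
{
    float value;

    if (c == NULL)
        return NO_VALUE;
    assert(length >= 0);

    value = initial_value * c->quality;
    if (max_secs > 0.0f)
        value -= ((float)length * c->secs_per_byte + c->secs) / max_secs;
    return value;
}
```

For example, with quality 0.5, secs 1.0 and secs_per_byte 0.25, four bytes of input take an estimated 2 seconds; with max_secs 4.0 that costs 0.5, so an initial value of 1.0 ends up as 0.0.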
247: <H2><A
2.10 timbl 248: NAME="z1">HTCopy: Copy a socket to a stream</A></H2>This is used by the protocol engines
2.6 secret 249: to send data down a stream, typically
2.1 timbl 250: one which has been generated by HTStreamStack.
251: <PRE>extern void HTCopy PARAMS((
252: int file_number,
253: HTStream* sink));
254:
255:
2.6 secret 256: </PRE>
257: <H2><A
2.10 timbl 258: NAME="c6">HTFileCopy: Copy a file to a stream</A></H2>This is used by the protocol engines
2.6 secret 259: to send data down a stream, typically
2.7 timbl 260: one which has been generated by HTStreamStack.
261: It is currently called by <A
2.11 ! timbl 262: NAME="z9" HREF="HTFormat.html#c7">HTParseFile</A> .
2.6 secret 263: <PRE>extern void HTFileCopy PARAMS((
264: FILE* fp,
265: HTStream* sink));
266:
267:
2.7 timbl 268: </PRE>
269: <H2><A
2.10 timbl 270: NAME="c2">HTCopyNoCR: Copy a socket to a stream,
2.7 timbl 271: stripping CR characters.</A></H2>It is slower than <A
2.11 ! timbl 272: NAME="z2" HREF="HTFormat.html#z1">HTCopy</A> .
2.1 timbl 273: <PRE>
274: extern void HTCopyNoCR PARAMS((
275: int file_number,
276: HTStream* sink));
277:
278:
279: </PRE>
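The CR stripping done by HTCopyNoCR can be illustrated on a memory buffer. This is a sketch only: strip_cr is a hypothetical helper that works on an array instead of a socket and a stream. It also suggests why such a copy is slower than a plain one, since every byte has to be examined rather than passed on as a block.

```c
#include <assert.h>
#include <stddef.h>

/* Copy `len` bytes from `in` to `out`, dropping every CR character.
 * `out` must have room for at least `len` bytes.
 * Returns the number of bytes written. */
size_t strip_cr(const char *in, size_t len, char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++)
        if (in[i] != '\r')
            out[n++] = in[i];
    return n;
}
```

Given the six bytes 'a', CR, LF, 'b', CR, LF as input, the function writes the four bytes 'a', LF, 'b', LF.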
280: <H2>Clear input buffer and set file number</H2>This routine and the one below provide
281: simple character input from sockets.
282: (They are left over from the older
283: architecture and may not be used very
284: much.) The existence of a common
285: routine and buffer saves memory space
286: in small implementations.
287: <PRE>extern void HTInitInput PARAMS((int file_number));
288:
289: </PRE>
290: <H2>Get next character from buffer</H2>
291: <PRE>extern char HTGetChararcter NOPARAMS;
292:
293:
294: </PRE>
295: <H2>HTParseSocket: Parse a socket given
296: its format</H2>This routine is called by protocol
297: modules to load an object. It uses <A
2.11 ! timbl 298: NAME="z4" HREF="HTFormat.html#z3">
2.1 timbl 299: HTStreamStack</A> and the copy routines
300: above. Returns HT_LOADED if successful,
301: <0 if not.
302: <PRE>extern int HTParseSocket PARAMS((
303: HTFormat format_in,
304: HTFormat format_out,
305: HTParentAnchor *anchor,
306: int file_number,
2.6 secret 307: HTStream* sink));
308:
309: </PRE>
310: <H2><A
2.10 timbl 311: NAME="c1">HTParseFile: Parse a File through
2.7 timbl 312: a file pointer</A></H2>This routine is called by protocol
313: modules to load an object. It uses <A
2.11 ! timbl 314: NAME="z4" HREF="HTFormat.html#z3"> HTStreamStack</A>
2.7 timbl 315: and <A
2.11 ! timbl 316: NAME="c7" HREF="HTFormat.html#c6">HTFileCopy</A> . Returns HT_LOADED
2.7 timbl 317: if successful, <0 if not.
2.6 secret 318: <PRE>extern int HTParseFile PARAMS((
319: HTFormat format_in,
320: HTFormat format_out,
321: HTParentAnchor *anchor,
322: FILE *fp,
2.1 timbl 323: HTStream* sink));
2.8 timbl 324:
325: </PRE>
2.11 ! timbl 326: <H2><A
! 327: NAME="z11">HTNetToText: Convert Net ASCII to
! 328: local representation</A></H2>This is a filter stream suitable
! 329: for taking text from a socket and
! 330: passing it into a stream which expects
! 331: text in the local C representation.
! 332: It does ASCII and newline conversion.
! 333: As usual, pass its output stream
! 334: to it when creating it.
! 335: <PRE>extern HTStream * HTNetToText PARAMS ((HTStream * sink));
! 336:
! 337: </PRE>
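The newline half of the conversion described for HTNetToText can be sketched as a small character filter. The assumptions here are that the local character set is already ASCII, so only the CR LF to newline translation is shown, and that a CR not followed by LF is passed through unchanged. Sink, NetToText and net_to_text_put are hypothetical names, and the fixed-size buffer stands in for a real output stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output stream: collects characters in a buffer. */
typedef struct Sink {
    char buf[64];
    size_t len;
} Sink;

static void sink_put(Sink *s, char c)
{
    if (s->len < sizeof s->buf)     /* silently drop on overflow */
        s->buf[s->len++] = c;
}

/* Filter state: the sink it writes to and whether the previous
 * character was a CR that has not been emitted yet. */
typedef struct {
    Sink *sink;
    int had_cr;
} NetToText;

/* Feed one network character through the filter. */
void net_to_text_put(NetToText *f, char c)
{
    assert(f != NULL && f->sink != NULL);

    if (f->had_cr) {
        f->had_cr = 0;
        if (c == '\n') {            /* CR LF becomes a local newline */
            sink_put(f->sink, '\n');
            return;
        }
        sink_put(f->sink, '\r');    /* lone CR is passed through */
    }
    if (c == '\r')
        f->had_cr = 1;              /* hold it until the next character */
    else
        sink_put(f->sink, c);
}
```

As with the real filter, the output stream is supplied when the filter is set up: a NetToText is initialised with a pointer to its sink and had_cr set to 0, after which each incoming character is passed to net_to_text_put.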
2.8 timbl 338: <H2>HTFormatInit: Set up default presentations
339: and conversions</H2>These are defined in HTInit.c or
340: HTSInit.c if these have been replaced.
341: If you don't call this routine, and
342: you don't define any presentations,
343: then this routine will automatically
344: be called the first time a conversion
345: is needed. However, if you explicitly
346: add some conversions (e.g. using HTLoadRules)
347: then you may want also to explicitly
348: call this to get the defaults as
349: well.
350: <PRE>extern void HTFormatInit NOPARAMS;
2.1 timbl 351:
352: </PRE>
353: <H2>Epilogue</H2>
354: <PRE>extern BOOL HTOutputSource; /* Flag: shortcut parser */
355: #endif
356:
2.11 ! timbl 357: </PRE>end</BODY>
2.10 timbl 358: </HTML>