Annotation of libwww/Library/src/HTFormat.html, revision 2.16
2.10 timbl 1: <HTML>
2: <HEAD>
2.1 timbl 3: <TITLE>HTFormat: The format manager in the WWW Library</TITLE>
2.15 timbl 4: <NEXTID N="z18">
2.10 timbl 5: </HEAD>
2.1 timbl 6: <BODY>
7: <H1>Manage different document formats</H1>Here we describe the functions of
8: the HTFormat module which handles
9: conversion between different data
10: representations. (In MIME parlance,
11: a representation is known as a content-type.
2.2 timbl 12: In WWW the term "format" is often
2.1 timbl 13: used as it is shorter).<P>
14: This module is implemented by <A
2.10 timbl 15: NAME="z0" HREF="HTFormat.c">HTFormat.c</A>
2.7 timbl 16: . This hypertext document is used
17: to generate the <A
2.11 timbl 18: NAME="z8" HREF="HTFormat.h">HTFormat.h</A> include
2.9 timbl 19: file. Part of the <A
2.10 timbl 20: NAME="z10" HREF="Overview.html">WWW library</A> .
2.1 timbl 21: <H2>Preamble</H2>
22: <PRE>#ifndef HTFORMAT_H
23: #define HTFORMAT_H
24:
25: #include "HTUtils.h"
26: #include <A
2.10 timbl 27: NAME="z7" HREF="HTStream.html">"HTStream.h"</A>
2.1 timbl 28: #include "HTAtom.h"
2.2 timbl 29: #include "HTList.h"
2.1 timbl 30:
31: #ifdef SHORT_NAMES
32: #define HTOutputSource HTOuSour
33: #define HTOutputBinary HTOuBina
34: #endif
35:
36: </PRE>
37: <H2>The HTFormat type</H2>We use the HTAtom object for holding
38: representations. This allows faster
39: manipulation (comparison and copying)
2.14 timbl 40: than if we stayed with strings.<P>
41: The following have to be defined
42: in advance of the other include files
43: because of circular references.
2.1 timbl 44: <PRE>typedef HTAtom * HTFormat;
2.13 timbl 45:
2.14 timbl 46: #include <A
47: NAME="z14" HREF="HTAccess.html">"HTAccess.h"</A> /* Required for HTRequest definition */
48:
2.1 timbl 49: </PRE>These macros (which used to be constants)
50: define some basic internally referenced
2.13 timbl 51: representations.
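To illustrate why atoms make comparison and copying cheap, here is a minimal sketch of string interning. It is not the HTAtom implementation: the names Atom, atom_for and same_format are hypothetical, and a plain linked list stands in for whatever table the library really uses. The only point shown is that once each distinct name maps to a single shared node, comparing two formats is a pointer comparison and copying a format is a pointer copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HTAtom: one shared node per distinct name. */
typedef struct atom {
    struct atom *next;
    char *name;
} Atom;

static Atom *atom_table = NULL;   /* a linked list keeps the sketch short */

/* Return the unique Atom for `name`, creating it on first use.
 * Returns NULL only if memory runs out. */
static Atom *atom_for(const char *name)
{
    Atom *a;

    assert(name != NULL);
    for (a = atom_table; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a;                 /* already interned */

    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->name = malloc(strlen(name) + 1);
    if (a->name == NULL) {
        free(a);
        return NULL;
    }
    strcpy(a->name, name);
    a->next = atom_table;
    atom_table = a;
    return a;
}

/* Two formats are the same exactly when their atom pointers are equal. */
static int same_format(const Atom *a, const Atom *b)
{
    return a == b;
}
```

Note that the macros in this file expand to a call of HTAtom_for at each use, so code that compares against the same format many times may prefer to fetch the atom once and keep the pointer.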
52: <H3>Internal ones</H3>The www/xxx ones are of course not
53: MIME standard.<P>
2.1 timbl 54: www/source is an output format which
55: leaves the input untouched. It is
56: useful for diagnostics, and for users
57: who want to see the original, whatever
58: it is.
2.13 timbl 59: <H3></H3>
60: <PRE>#define WWW_SOURCE HTAtom_for("www/source") /* Whatever it was originally*/
2.1 timbl 61:
62: </PRE>www/present represents the user's
63: perception of the document. If you
64: convert to www/present, you present
65: the material to the user.
66: <PRE>#define WWW_PRESENT HTAtom_for("www/present") /* The user's perception */
67:
68: </PRE>The www/mime format means a
69: MIME message (message/rfc822 in MIME terms) or a plain text message
70: with no MIME header. This is what
71: is returned by an HTTP server.
72: <PRE>#define WWW_MIME HTAtom_for("www/mime") /* A MIME message */
2.10 timbl 73:
2.1 timbl 74: </PRE>www/print is like www/present except
75: it represents a printed copy.
76: <PRE>#define WWW_PRINT HTAtom_for("www/print") /* A printed copy */
77:
2.10 timbl 78: </PRE>www/unknown is a really unknown type.
2.11 timbl 79: Some default action is appropriate.
2.10 timbl 80: <PRE>#define WWW_UNKNOWN HTAtom_for("www/unknown")
81:
2.13 timbl 82:
83:
84: </PRE>
85: <H3>MIME ones (a few)</H3>These are regular MIME types. HTML
2.11 timbl 86: is assumed to be added by the W3
87: code. application/octet-stream was
88: mistakenly application/binary in
89: earlier libwww versions (pre 2.11).
2.10 timbl 90: <PRE>#define WWW_PLAINTEXT HTAtom_for("text/plain")
2.1 timbl 91: #define WWW_POSTSCRIPT HTAtom_for("application/postscript")
92: #define WWW_RICHTEXT HTAtom_for("application/rtf")
2.10 timbl 93: #define WWW_AUDIO HTAtom_for("audio/basic")
2.1 timbl 94: #define WWW_HTML HTAtom_for("text/html")
2.11 timbl 95: #define WWW_BINARY HTAtom_for("application/octet-stream")
2.7 timbl 96:
2.1 timbl 97: </PRE>We must include the following file
98: after defining HTFormat, to which
2.10 timbl 99: it makes reference.
100: <H2>The HTEncoding type</H2>
101: <PRE>typedef HTAtom* HTEncoding;
102:
103: </PRE>The following are values for the
104: MIME content transfer encodings:
105: <PRE>#define WWW_ENC_7BIT HTAtom_for("7bit")
106: #define WWW_ENC_8BIT HTAtom_for("8bit")
107: #define WWW_ENC_BINARY HTAtom_for("binary")
108:
109: </PRE>We also add
110: <PRE>#define WWW_ENC_COMPRESS HTAtom_for("compress")
111:
112: #include "HTAnchor.h"
2.1 timbl 113:
114: </PRE>
115: <H2>The HTPresentation and HTConverter
116: types</H2>This HTPresentation structure represents
117: a possible conversion algorithm from
118: one format to another. It includes
119: a pointer to a conversion routine.
120: The conversion routine returns a
121: stream to which data should be fed.
122: See also <A
2.12 timbl 123: NAME="z5" HREF="#z3">HTStreamStack</A> which scans
2.1 timbl 124: the list of registered converters
125: and calls one. See the <A
2.10 timbl 126: NAME="z6" HREF="HTInit.html">initialisation
2.1 timbl 127: module</A> for a list of conversion routines.
128: <PRE>typedef struct _HTPresentation HTPresentation;
129:
2.13 timbl 130: typedef HTStream * <A
131: NAME="z12">HTConverter</A> PARAMS((
132: HTRequest * request,
133: void * param,
134: HTFormat input_format,
135: HTFormat output_format,
136: HTStream * output_stream));
2.1 timbl 137:
138: struct _HTPresentation {
2.13 timbl 139: HTAtom* rep; /* representation name atomized */
2.1 timbl 140: HTAtom* rep_out; /* resulting representation */
2.2 timbl 141: HTConverter *converter; /* The routine to gen the stream stack */
2.1 timbl 142: char * command; /* MIME-format string */
143: float quality; /* Between 0 (bad) and 1 (good) */
144: float secs;
145: float secs_per_byte;
146: };
147:
2.15 timbl 148: </PRE>A global list of converters is kept
2.1 timbl 149: by this module. It is also scanned
150: by modules which want to know the
151: set of formats supported, for example.
2.15 timbl 152: Note there is also an additional
153: list associated with each <A
154: NAME="z16" HREF="HTAccess.html#z5">request</A>.
155: <PRE>extern HTList * <A
156: NAME="z17">HTConversions</A> ;
2.1 timbl 157:
2.12 timbl 158:
2.1 timbl 159: </PRE>
160: <H2>HTSetPresentation: Register a system
161: command to present a format</H2>
2.8 timbl 162: <H3>On entry,</H3>
2.1 timbl 163: <DL>
164: <DT>rep
165: <DD> is the MIME-style format name
166: <DT>command
167: <DD> is the MAILCAP-style command
168: template
169: <DT>quality
170: <DD> A degradation factor 0..1
171: <DT>maxbytes
172: <DD> A limit on the length acceptable
173: as input (0 infinite)
174: <DT>maxsecs
175: <DD> A limit on the time user
176: will wait (0 for infinity)
177: </DL>
178:
179: <PRE>extern void HTSetPresentation PARAMS((
2.13 timbl 180: HTList * conversions,
181: CONST char * representation,
182: CONST char * command,
183: float quality,
184: float secs,
185: float secs_per_byte
2.1 timbl 186: ));
187:
188:
189: </PRE>
190: <H2>HTSetConversion: Register a conversion
191: routine</H2>
2.8 timbl 192: <H3>On entry,</H3>
2.1 timbl 193: <DL>
194: <DT>rep_in
195: <DD> is the content-type input
196: <DT>rep_out
197: <DD> is the resulting content-type
198: <DT>converter
199: <DD> is the routine to make
200: the stream to do it
201: </DL>
202:
203: <PRE>
204: extern void HTSetConversion PARAMS((
2.13 timbl 205: HTList * conversions,
2.1 timbl 206: CONST char * rep_in,
207: CONST char * rep_out,
2.2 timbl 208: HTConverter * converter,
2.1 timbl 209: float quality,
210: float secs,
211: float secs_per_byte
212: ));
213:
214:
215: </PRE>
216: <H2><A
2.10 timbl 217: NAME="z3">HTStreamStack: Create a stack of
2.1 timbl 218: streams</A></H2>This is the routine which actually
219: sets up the conversion. It currently
220: checks only for direct conversions,
2.8 timbl 221: but multi-stage conversions are foreseen.
2.2 timbl 222: It takes a stream into which the
2.1 timbl 223: output should be sent in the final
224: format, builds the conversion stack,
225: and returns a stream into which the
226: data in the input format should be
227: fed. The anchor is passed because
228: hypertext objects load information
229: into the anchor object which represents
230: them.
231: <PRE>extern HTStream * HTStreamStack PARAMS((
232: HTFormat format_in,
2.13 timbl 233: HTRequest * request));
2.1 timbl 234:
235: </PRE>
236: <H2>HTStackValue: Find the cost of a
237: filter stack</H2>Must return the cost of the same
238: stack which HTStreamStack would set
239: up.
2.8 timbl 240: <H3>On entry,</H3>
2.1 timbl 241: <DL>
242: <DT>format_in
243: <DD> The format of the data to
244: be converted
245: <DT>format_out
246: <DD> The format required
247: <DT>initial_value
248: <DD> The intrinsic "value"
249: of the data before conversion on
250: a scale from 0 to 1
251: <DT>length
252: <DD> The number of bytes expected
253: in the input format
254: </DL>
255:
256: <PRE>extern float HTStackValue PARAMS((
2.13 timbl 257: HTList * conversions,
2.1 timbl 258: HTFormat format_in,
2.13 timbl 259: HTFormat format_out,
2.1 timbl 260: float initial_value,
261: long int length));
262:
263: #define NO_VALUE_FOUND -1e20 /* returned if none found */
264:
265: </PRE>
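As a rough illustration of what "the cost of a filter stack" could mean for a single direct conversion, the sketch below scans a table of registered conversions and scores the first match. Everything here is an assumption made for illustration: the Conversion struct, the stack_value function, the max_secs parameter and the scoring formula (quality-scaled value minus the estimated time as a fraction of the time the user will wait) are hypothetical and are not taken from HTFormat.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NO_VALUE_FOUND (-1e20f)   /* returned if no conversion matches */

/* Hypothetical, simplified stand-in for one registered conversion. */
typedef struct {
    const char *rep;          /* input format name */
    const char *rep_out;      /* resulting format name */
    float quality;            /* between 0 (bad) and 1 (good) */
    float secs;               /* fixed start-up time */
    float secs_per_byte;      /* time per input byte */
} Conversion;

/* Score the first direct conversion from format_in to format_out.
 * ASSUMED cost model: value * quality, less the estimated time
 * expressed as a fraction of max_secs (0 means "no time limit"). */
static float stack_value(const Conversion *table, size_t count,
                         const char *format_in, const char *format_out,
                         float initial_value, long length, float max_secs)
{
    size_t i;

    assert(table != NULL || count == 0);
    for (i = 0; i < count; i++) {
        const Conversion *c = &table[i];
        if (strcmp(c->rep, format_in) == 0 &&
            strcmp(c->rep_out, format_out) == 0) {
            float value = initial_value * c->quality;
            if (max_secs > 0.0f)
                value -= (c->secs + c->secs_per_byte * (float)length)
                         / max_secs;
            return value;
        }
    }
    return NO_VALUE_FOUND;
}
```

A caller would compare the result against NO_VALUE_FOUND before using it, since that sentinel only signals that no registered conversion applies.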
266: <H2><A
2.10 timbl 267: NAME="z1">HTCopy: Copy a socket to a stream</A></H2>This is used by the protocol engines
2.6 secret 268: to send data down a stream, typically
2.1 timbl 269: one which has been generated by HTStreamStack.
270: <PRE>extern void HTCopy PARAMS((
271: int file_number,
272: HTStream* sink));
273:
274:
2.6 secret 275: </PRE>
276: <H2><A
2.10 timbl 277: NAME="c6">HTFileCopy: Copy a file to a stream</A></H2>This is used by the protocol engines
2.6 secret 278: to send data down a stream, typically
2.7 timbl 279: one which has been generated by HTStreamStack.
280: It is currently called by <A
2.12 timbl 281: NAME="z9" HREF="#c7">HTParseFile</A>
2.6 secret 282: <PRE>extern void HTFileCopy PARAMS((
283: FILE* fp,
284: HTStream* sink));
285:
286:
2.7 timbl 287: </PRE>
288: <H2><A
2.10 timbl 289: NAME="c2">HTCopyNoCR: Copy a socket to a stream,
2.7 timbl 290: stripping CR characters.</A></H2>It is slower than <A
2.12 timbl 291: NAME="z2" HREF="#z1">HTCopy</A> .
2.1 timbl 292: <PRE>
293: extern void HTCopyNoCR PARAMS((
294: int file_number,
295: HTStream* sink));
296:
297:
298: </PRE>
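The extra cost of stripping CR characters comes from examining every byte instead of passing whole blocks through. A minimal sketch of that per-byte filter, working in place on one buffer, might look like this. The function name strip_cr is hypothetical, and the real HTCopyNoCR reads from a socket and writes to a stream rather than editing a buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Remove every carriage return ('\r') from buf[0..len-1] in place.
 * Returns the number of bytes that remain. */
static size_t strip_cr(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++)
        if (buf[in] != '\r')
            buf[out++] = buf[in];   /* keep everything except CR */
    return out;
}
```

The filtered bytes could then be handed to the sink in a single block, so only the scan itself is added work.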
2.14 timbl 299: <H2>HT<A
2.15 timbl 300: NAME="z15"> Input Socket: Buffering for network
301: in</A></H2>These routines provide simple character
2.14 timbl 302: input from sockets. These are used
303: for parsing input in arbitrary IP
304: protocols (Gopher, NNTP, FTP).
305: <PRE>#define INPUT_BUFFER_SIZE 4096 /* Tradeoff speed vs memory*/
306: typedef struct _socket_buffer {
307: char input_buffer[INPUT_BUFFER_SIZE];
308: char * input_pointer;
309: char * input_limit;
310: int input_file_number;
311: } HTInputSocket;
312:
313:
314: </PRE>
315: <H3>Create input buffer and set file
316: number</H3>
317: <PRE>extern HTInputSocket* HTInputSocket_new PARAMS((int file_number));
2.1 timbl 318:
319: </PRE>
2.14 timbl 320: <H3>Get next character from buffer</H3>
321: <PRE>extern char HTInputSocket_getCharacter PARAMS((HTInputSocket* isoc));
2.1 timbl 322:
2.14 timbl 323: </PRE>
324: <H3>Free input socket buffer</H3>
325: <PRE>extern void HTInputSocket_free PARAMS((HTInputSocket * isoc));
2.1 timbl 326:
327: </PRE>
2.16 ! luotonen 328: <PRE>
! 329: PUBLIC char * HTInputSocket_getLine PARAMS((HTInputSocket * isoc));
! 330: PUBLIC char * HTInputSocket_getUnfoldedLine PARAMS((HTInputSocket * isoc));
! 331: PUBLIC char * HTInputSocket_getStatusLine PARAMS((HTInputSocket * isoc));
! 332: PUBLIC BOOL HTInputSocket_seemsBinary PARAMS((HTInputSocket * isoc));
! 333:
! 334: </PRE>
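The buffering idea behind these routines can be sketched as follows: characters are served from the in-memory buffer, and the descriptor is read again only when the buffer is exhausted. This is an illustrative sketch, not the library code. The names InputSocket, input_socket_new and input_socket_get are hypothetical, it assumes a POSIX read(), and unlike HTInputSocket_getCharacter it returns an int so that end of input (-1) can be told apart from any byte value.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

#define INPUT_BUFFER_SIZE 4096

/* Hypothetical mirror of the HTInputSocket layout shown above. */
typedef struct {
    char input_buffer[INPUT_BUFFER_SIZE];
    char *input_pointer;      /* next unread byte */
    char *input_limit;        /* one past the last valid byte */
    int input_file_number;
} InputSocket;

/* Allocate an empty buffer bound to the given descriptor. */
static InputSocket *input_socket_new(int file_number)
{
    InputSocket *isoc = calloc(1, sizeof *isoc);
    if (isoc != NULL) {
        isoc->input_file_number = file_number;
        isoc->input_pointer = isoc->input_limit = isoc->input_buffer;
    }
    return isoc;
}

/* Return the next byte (0..255), or -1 at end of input or on error.
 * The sketch does not retry interrupted reads. */
static int input_socket_get(InputSocket *isoc)
{
    assert(isoc != NULL);
    if (isoc->input_pointer >= isoc->input_limit) {
        ssize_t n = read(isoc->input_file_number,
                         isoc->input_buffer, INPUT_BUFFER_SIZE);
        if (n <= 0)
            return -1;
        isoc->input_pointer = isoc->input_buffer;
        isoc->input_limit = isoc->input_buffer + n;
    }
    return (unsigned char)*isoc->input_pointer++;
}
```

Line-oriented helpers such as a getLine routine could be layered on top by calling the character routine until a newline is seen.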
2.1 timbl 335: <H2>HTParseSocket: Parse a socket given
336: its format</H2>This routine is called by protocol
337: modules to load an object. It uses <A
2.12 timbl 338: NAME="z4" HREF="#z3">
2.1 timbl 339: HTStreamStack</A> and the copy routines
340: above. Returns HT_LOADED if successful,
341: <0 if not.
342: <PRE>extern int HTParseSocket PARAMS((
343: HTFormat format_in,
344: int file_number,
2.13 timbl 345: HTRequest * request));
2.6 secret 346:
347: </PRE>
348: <H2><A
2.10 timbl 349: NAME="c1">HTParseFile: Parse a File through
2.7 timbl 350: a file pointer</A></H2>This routine is called by protocol
351: modules to load an object. It uses <A
2.12 timbl 352: NAME="z4" HREF="#z3"> HTStreamStack</A>
2.7 timbl 353: and <A
2.12 timbl 354: NAME="c7" HREF="#c6">HTFileCopy</A> . Returns HT_LOADED
2.7 timbl 355: if successful, <0 if not.
2.6 secret 356: <PRE>extern int HTParseFile PARAMS((
357: HTFormat format_in,
358: FILE *fp,
2.13 timbl 359: HTRequest * request));
2.8 timbl 360:
361: </PRE>
2.11 timbl 362: <H2><A
363: NAME="z11">HTNetToText: Convert Net ASCII to
364: local representation</A></H2>This is a filter stream suitable
365: for taking text from a socket and
366: passing it into a stream which expects
367: text in the local C representation.
368: It does ASCII and newline conversion.
369: As usual, pass its output stream
370: to it when creating it.
371: <PRE>extern HTStream * HTNetToText PARAMS ((HTStream * sink));
372:
373: </PRE>
2.8 timbl 374: <H2>HTFormatInit: Set up default presentations
375: and conversions</H2>These are defined in HTInit.c or
376: HTSInit.c if these have been replaced.
377: If you don't call this routine, and
378: you don't define any presentations,
379: then this routine will automatically
380: be called the first time a conversion
381: is needed. However, if you explicitly
382: add some conversions (e.g. using HTLoadRules)
383: then you may want also to explicitly
384: call this to get the defaults as
385: well.
2.13 timbl 386: <PRE>extern void HTFormatInit PARAMS((HTList * conversions));
2.1 timbl 387:
388: </PRE>
389: <H2>Epilogue</H2>
390: <PRE>extern BOOL HTOutputSource; /* Flag: shortcut parser */
391: #endif
392:
2.15 timbl 393: </PRE>end</A></BODY>
2.10 timbl 394: </HTML>