Annotation of libwww/Library/src/HTFormat.html, revision 2.22
2.10 timbl 1: <HTML>
2: <HEAD>
2.1 timbl 3: <TITLE>HTFormat: The format manager in the WWW Library</TITLE>
2.15 timbl 4: <NEXTID N="z18">
2.10 timbl 5: </HEAD>
2.1 timbl 6: <BODY>
7: <H1>Manage different document formats</H1>Here we describe the functions of
8: the HTFormat module which handles
9: conversion between different data
10: representations. (In MIME parlance,
11: a representation is known as a content-type.
2.2 timbl 12: In WWW the term "format" is often
2.1 timbl 13: used as it is shorter).<P>
14: This module is implemented by <A
2.10 timbl 15: NAME="z0" HREF="HTFormat.c">HTFormat.c</A>
2.7 timbl 16: . This hypertext document is used
17: to generate the <A
2.11 timbl 18: NAME="z8" HREF="HTFormat.h">HTFormat.h</A> include
2.9 timbl 19: file. Part of the <A
2.10 timbl 20: NAME="z10" HREF="Overview.html">WWW library</A> .
2.1 timbl 21: <H2>Preamble</H2>
22: <PRE>#ifndef HTFORMAT_H
23: #define HTFORMAT_H
24:
25: #include "HTUtils.h"
26: #include <A
2.10 timbl 27: NAME="z7" HREF="HTStream.html">"HTStream.h"</A>
2.1 timbl 28: #include "HTAtom.h"
2.2 timbl 29: #include "HTList.h"
2.1 timbl 30:
31: #ifdef SHORT_NAMES
32: #define HTOutputSource HTOuSour
33: #define HTOutputBinary HTOuBina
34: #endif
35:
2.18 luotonen 36:
37: typedef struct _HTContentDescription {
38: char * filename;
39: HTAtom * content_type;
40: HTAtom * content_language;
41: HTAtom * content_encoding;
42: int content_length;
43: float quality;
44: } HTContentDescription;
45:
46: PUBLIC void HTAcceptEncoding PARAMS((HTList * list,
47: char * enc,
48: float quality));
49:
50: PUBLIC void HTAcceptLanguage PARAMS((HTList * list,
51: char * lang,
52: float quality));
53:
54: PUBLIC BOOL HTRank PARAMS((HTList * possibilities,
55: HTList * accepted_content_types,
56: HTList * accepted_content_languages,
57: HTList * accepted_content_encodings));
58:
59:
2.1 timbl 60: </PRE>
2.17 luotonen 61: <H2>HT<A
62: NAME="z15"> Input Socket: Buffering for network
63: in</A></H2>These routines provide simple character
64: input from sockets. These are used
65: for parsing input in arbitrary IP
66: protocols (Gopher, NNTP, FTP).
67: <PRE>#define INPUT_BUFFER_SIZE 4096 /* Tradeoff speed vs memory */
68: typedef struct _socket_buffer {
69: char input_buffer[INPUT_BUFFER_SIZE];
70: char * input_pointer;
71: char * input_limit;
72: int input_file_number;
73: } HTInputSocket;
74:
75: </PRE>
76: <H3>Create input buffer and set file
77: number</H3>
78: <PRE>extern HTInputSocket* HTInputSocket_new PARAMS((int file_number));
79:
80: </PRE>
81: <H3>Get next character from buffer</H3>
82: <PRE>extern char HTInputSocket_getCharacter PARAMS((HTInputSocket* isoc));
83:
84: </PRE>
2.22 ! timbl 85: <H3>Read block from input socket</H3>Read *len characters and return a
! 86: buffer (don't free) containing *len
! 87: characters ( *len may have changed).
! 88: Buffer is not NUL-terminated.
2.17 luotonen 89: <PRE>extern char * HTInputSocket_getBlock PARAMS((HTInputSocket * isoc,
90: int * len));
91:
92: </PRE>
93: <H3>Free input socket buffer</H3>
94: <PRE>extern void HTInputSocket_free PARAMS((HTInputSocket * isoc));
95:
2.22 ! timbl 96:
2.17 luotonen 97: PUBLIC char * HTInputSocket_getLine PARAMS((HTInputSocket * isoc));
98: PUBLIC char * HTInputSocket_getUnfoldedLine PARAMS((HTInputSocket * isoc));
99: PUBLIC char * HTInputSocket_getStatusLine PARAMS((HTInputSocket * isoc));
100: PUBLIC BOOL HTInputSocket_seemsBinary PARAMS((HTInputSocket * isoc));
101:
102: </PRE>
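The buffering idea behind the input socket can be sketched in a few lines. This is a minimal illustration only, not the code in HTFormat.c: the names SketchInput, sketch_input_init and sketch_input_get are hypothetical, the descriptor is assumed to be readable with POSIX read(), and the getter returns an int rather than a char so that end of input can be told apart from data.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* read(), POSIX */

#define SKETCH_BUFFER_SIZE 4096

/* Hypothetical stand-in for HTInputSocket: same layout, different name. */
typedef struct {
    char   buffer[SKETCH_BUFFER_SIZE];
    char * pointer;       /* next unread character */
    char * limit;         /* one past the last valid character */
    int    file_number;   /* descriptor to refill from */
} SketchInput;

/* Start with an empty buffer so that the first get forces a read. */
void sketch_input_init(SketchInput * in, int file_number)
{
    assert(in != NULL);
    in->pointer = in->limit = in->buffer;
    in->file_number = file_number;
}

/* Return the next byte as 0..255, or -1 on end of input or error. */
int sketch_input_get(SketchInput * in)
{
    assert(in != NULL);
    if (in->pointer >= in->limit) {
        ssize_t n = read(in->file_number, in->buffer, sizeof in->buffer);
        if (n <= 0) return -1;
        in->pointer = in->buffer;
        in->limit = in->buffer + n;
    }
    return (unsigned char) *in->pointer++;
}
```

Most calls cost only a pointer comparison and an increment. The read() system call is made once per buffer load, which is the speed versus memory tradeoff mentioned next to INPUT_BUFFER_SIZE.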
2.1 timbl 103: <H2>The HTFormat type</H2>We use the HTAtom object for holding
104: representations. This allows faster
105: manipulation (comparison and copying)
2.14 timbl 106: than if we stayed with strings.<P>
107: The following have to be defined
108: in advance of the other include files
109: because of circular references.
2.1 timbl 110: <PRE>typedef HTAtom * HTFormat;
2.13 timbl 111:
2.14 timbl 112: #include <A
113: NAME="z14" HREF="HTAccess.html">"HTAccess.h"</A> /* Required for HTRequest definition */
114:
2.1 timbl 115: </PRE>These macros (which used to be constants)
116: define some basic internally referenced
2.13 timbl 117: representations.
118: <H3>Internal ones</H3>The www/xxx ones are of course not
119: MIME standard.<P>
2.20 frystyk 120: star/star is an output format which
2.1 timbl 121: leaves the input untouched. It is
122: useful for diagnostics, and for users
123: who want to see the original, whatever
124: it is.
2.13 timbl 125: <H3></H3>
2.20 frystyk 126: <PRE>#define WWW_SOURCE HTAtom_for("*/*") /* Whatever it was originally*/
2.1 timbl 127:
128: </PRE>www/present represents the user's
129: perception of the document. If you
130: convert to www/present, you present
131: the material to the user.
132: <PRE>#define WWW_PRESENT HTAtom_for("www/present") /* The user's perception */
133:
134: </PRE>The message/rfc822 format means a
135: MIME message or a plain text message
136: with no MIME header. This is what
137: is returned by an HTTP server.
138: <PRE>#define WWW_MIME HTAtom_for("www/mime") /* A MIME message */
2.10 timbl 139:
2.1 timbl 140: </PRE>www/print is like www/present except
141: it represents a printed copy.
142: <PRE>#define WWW_PRINT HTAtom_for("www/print") /* A printed copy */
143:
2.10 timbl 144: </PRE>www/unknown is a really unknown type.
2.11 timbl 145: Some default action is appropriate.
2.10 timbl 146: <PRE>#define WWW_UNKNOWN HTAtom_for("www/unknown")
147:
2.13 timbl 148:
149:
150: </PRE>
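Why atoms make comparison and copying cheap can be shown with a miniature atom table. This sketch is an assumption about the general technique (string interning) and is not the HTAtom implementation. SketchAtom and sketch_atom_for are hypothetical names, and a plain linked list stands in for whatever lookup structure the library really uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of an atom table. */
typedef struct SketchAtom {
    struct SketchAtom * next;
    char * name;
} SketchAtom;

static SketchAtom * sketch_atoms = NULL;   /* every atom created so far */

/* Return the unique atom for a name, creating it on first use.
   Returns NULL only if memory runs out. */
SketchAtom * sketch_atom_for(const char * name)
{
    SketchAtom * a;
    assert(name != NULL);

    /* An existing atom with the same spelling is reused. */
    for (a = sketch_atoms; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0) return a;

    a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->name = malloc(strlen(name) + 1);
    if (a->name == NULL) { free(a); return NULL; }
    strcpy(a->name, name);
    a->next = sketch_atoms;
    sketch_atoms = a;
    return a;
}
```

Once two formats have been turned into atoms, testing whether they are the same format is a pointer comparison, and copying a format is a pointer assignment. The string comparison happens only at lookup time.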
151: <H3>MIME ones (a few)</H3>These are regular MIME types. HTML
2.11 timbl 152: is assumed to be added by the W3
153: code. application/octet-stream was
154: mistakenly application/binary in
155: earlier libwww versions (pre 2.11).
2.10 timbl 156: <PRE>#define WWW_PLAINTEXT HTAtom_for("text/plain")
2.1 timbl 157: #define WWW_POSTSCRIPT HTAtom_for("application/postscript")
158: #define WWW_RICHTEXT HTAtom_for("application/rtf")
2.10 timbl 159: #define WWW_AUDIO HTAtom_for("audio/basic")
2.1 timbl 160: #define WWW_HTML HTAtom_for("text/html")
2.11 timbl 161: #define WWW_BINARY HTAtom_for("application/octet-stream")
2.7 timbl 162:
2.1 timbl 163: </PRE>We must include the following file
164: after defining HTFormat, to which
2.10 timbl 165: it makes reference.
166: <H2>The HTEncoding type</H2>
167: <PRE>typedef HTAtom* HTEncoding;
168:
169: </PRE>The following are values for the
170: MIME content-transfer-encoding:
171: <PRE>#define WWW_ENC_7BIT HTAtom_for("7bit")
172: #define WWW_ENC_8BIT HTAtom_for("8bit")
173: #define WWW_ENC_BINARY HTAtom_for("binary")
174:
175: </PRE>We also add
176: <PRE>#define WWW_ENC_COMPRESS HTAtom_for("compress")
177:
178: #include "HTAnchor.h"
2.1 timbl 179:
180: </PRE>
181: <H2>The HTPresentation and HTConverter
182: types</H2>This HTPresentation structure represents
183: a possible conversion algorithm from
184: one format to another. It includes
185: a pointer to a conversion routine.
186: The conversion routine returns a
187: stream to which data should be fed.
188: See also <A
2.12 timbl 189: NAME="z5" HREF="#z3">HTStreamStack</A> which scans
2.1 timbl 190: the list of registered converters
191: and calls one. See the <A
2.10 timbl 192: NAME="z6" HREF="HTInit.html">initialisation
2.1 timbl 193: module</A> for a list of conversion routines.
194: <PRE>typedef struct _HTPresentation HTPresentation;
195:
2.13 timbl 196: typedef HTStream * <A
197: NAME="z12">HTConverter</A> PARAMS((
198: HTRequest * request,
199: void * param,
200: HTFormat input_format,
201: HTFormat output_format,
202: HTStream * output_stream));
2.1 timbl 203:
204: struct _HTPresentation {
2.13 timbl 205: HTAtom* rep; /* representation name atomized */
2.1 timbl 206: HTAtom* rep_out; /* resulting representation */
2.2 timbl 207: HTConverter *converter; /* The routine to gen the stream stack */
2.1 timbl 208: char * command; /* MIME-format string */
209: float quality; /* Between 0 (bad) and 1 (good) */
210: float secs;
211: float secs_per_byte;
212: };
213:
2.15 timbl 214: </PRE>A global list of converters is kept
2.1 timbl 215: by this module. It is also scanned
216: by modules which want to know the
217: set of formats supported, for example.
2.22 ! timbl 218: Note there is also an additional
2.15 timbl 219: list associated with each <A
2.22 ! timbl 220: NAME="z16" HREF="HTAccess.html#z5">request</A>
! 221: .
2.15 timbl 222: <PRE>extern HTList * <A
223: NAME="z17">HTConversions</A> ;
2.1 timbl 224:
2.12 timbl 225:
2.1 timbl 226: </PRE>
227: <H2>HTSetPresentation: Register a system
228: command to present a format</H2>
2.8 timbl 229: <H3>On entry,</H3>
2.1 timbl 230: <DL>
231: <DT>rep
232: <DD> is the MIME - style format name
233: <DT>command
234: <DD> is the MAILCAP - style command
235: template
236: <DT>quality
237: <DD> A degradation factor 0..1
238: <DT>maxbytes
239: <DD> A limit on the length acceptable
240: as input (0 infinite)
241: <DT>maxsecs
242: <DD> A limit on the time user
243: will wait (0 for infinity)
244: </DL>
245:
246: <PRE>extern void HTSetPresentation PARAMS((
2.13 timbl 247: HTList * conversions,
248: CONST char * representation,
249: CONST char * command,
250: float quality,
251: float secs,
252: float secs_per_byte
2.1 timbl 253: ));
254:
255:
256: </PRE>
257: <H2>HTSetConversion: Register a conversion
258: routine</H2>
2.8 timbl 259: <H3>On entry,</H3>
2.1 timbl 260: <DL>
261: <DT>rep_in
262: <DD> is the content-type input
263: <DT>rep_out
264: <DD> is the resulting content-type
265: <DT>converter
266: <DD> is the routine to make
267: the stream to do it
268: </DL>
269:
270: <PRE>
271: extern void HTSetConversion PARAMS((
2.13 timbl 272: HTList * conversions,
2.1 timbl 273: CONST char * rep_in,
274: CONST char * rep_out,
2.2 timbl 275: HTConverter * converter,
2.1 timbl 276: float quality,
277: float secs,
278: float secs_per_byte
279: ));
280:
281:
282: </PRE>
283: <H2><A
2.10 timbl 284: NAME="z3">HTStreamStack: Create a stack of
2.1 timbl 285: streams</A></H2>This is the routine which actually
286: sets up the conversion. It currently
287: checks only for direct conversions,
2.8 timbl 288: but multi-stage conversions are foreseen.
2.2 timbl 289: It takes a stream into which the
2.1 timbl 290: output should be sent in the final
291: format, builds the conversion stack,
292: and returns a stream into which the
293: data in the input format should be
294: fed. The anchor is passed because
295: hypertext objects load information
296: into the anchor object which represents
297: them.
298: <PRE>extern HTStream * HTStreamStack PARAMS((
299: HTFormat format_in,
2.13 timbl 300: HTRequest * request));
2.1 timbl 301:
302: </PRE>
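The direct-conversion lookup described above can be pictured as a scan over the registered conversions for an exact match on input and output format. The following is a sketch under simplifying assumptions, not the library code: formats are plain strings rather than atoms, the table is a fixed array rather than an HTList, no converter routine is stored, and all names beginning with sketch_ or Sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_CONVERSIONS 16

/* Hypothetical table entry, much smaller than HTPresentation. */
typedef struct {
    const char * rep_in;
    const char * rep_out;
    float        quality;
} SketchConversion;

static SketchConversion sketch_table[SKETCH_MAX_CONVERSIONS];
static int sketch_count = 0;

/* Register one direct conversion. Returns 0, or -1 if the table is full.
   The strings are not copied and must outlive the table. */
int sketch_set_conversion(const char * rep_in, const char * rep_out,
                          float quality)
{
    assert(rep_in != NULL && rep_out != NULL);
    if (sketch_count >= SKETCH_MAX_CONVERSIONS) return -1;
    sketch_table[sketch_count].rep_in  = rep_in;
    sketch_table[sketch_count].rep_out = rep_out;
    sketch_table[sketch_count].quality = quality;
    sketch_count++;
    return 0;
}

/* Find the highest-quality direct conversion, or NULL if there is none. */
const SketchConversion * sketch_find_direct(const char * rep_in,
                                            const char * rep_out)
{
    const SketchConversion * best = NULL;
    int i;
    assert(rep_in != NULL && rep_out != NULL);
    for (i = 0; i < sketch_count; i++) {
        const SketchConversion * c = &sketch_table[i];
        if (strcmp(c->rep_in, rep_in) != 0) continue;
        if (strcmp(c->rep_out, rep_out) != 0) continue;
        if (best == NULL || c->quality > best->quality) best = c;
    }
    return best;
}
```

A multi-stage version would have to search for chains of entries whose output format matches the next entry's input format. The sketch stops at one stage, as the text says the current routine does.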
303: <H2>HTStackValue: Find the cost of a
304: filter stack</H2>Must return the cost of the same
305: stack which HTStreamStack would set
306: up.
2.8 timbl 307: <H3>On entry,</H3>
2.1 timbl 308: <DL>
309: <DT>format_in
310: <DD> The format of the data to
311: be converted
312: <DT>format_out
313: <DD> The format required
314: <DT>initial_value
315: <DD> The intrinsic "value"
316: of the data before conversion on
317: a scale from 0 to 1
318: <DT>length
319: <DD> The number of bytes expected
320: in the input format
321: </DL>
322:
323: <PRE>extern float HTStackValue PARAMS((
2.13 timbl 324: HTList * conversions,
2.1 timbl 325: HTFormat format_in,
2.13 timbl 326: HTFormat format_out,
2.1 timbl 327: float initial_value,
328: long int length));
329:
330: #define NO_VALUE_FOUND -1e20 /* returned if none found */
331:
332: </PRE>
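This page does not say how quality, secs and secs_per_byte are combined into a single value. One plausible model, assumed here purely for illustration, scales the initial value by the quality and then subtracts the expected conversion time as a fraction of the time the user is prepared to wait. The function sketch_stack_value and its max_secs parameter are hypothetical and do not appear in the header.

```c
#include <assert.h>

/* Assumed cost model for a single conversion step:
     value = initial_value * quality
             - (secs + secs_per_byte * length) / max_secs
   A max_secs of 0 means the user's patience is unlimited, so no time
   penalty is applied. */
float sketch_stack_value(float initial_value, float quality,
                         float secs, float secs_per_byte,
                         long length, float max_secs)
{
    float value;
    assert(length >= 0);
    value = initial_value * quality;
    if (max_secs > 0.0f)
        value -= (secs + secs_per_byte * (float) length) / max_secs;
    return value;
}
```

Under this model a lossless but slow converter can score lower than a lossy but fast one. That tradeoff is presumably why both a quality and two time figures are recorded for each presentation.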
333: <H2><A
2.10 timbl 334: NAME="z1">HTCopy: Copy a socket to a stream</A></H2>This is used by the protocol engines
2.6 secret 335: to send data down a stream, typically
2.22 ! timbl 336: one which has been generated by HTStreamStack.
! 337: Returns the number of bytes transferred.
2.19 luotonen 338: <PRE>extern int HTCopy PARAMS((
2.1 timbl 339: int file_number,
340: HTStream* sink));
341:
342:
2.6 secret 343: </PRE>
344: <H2><A
2.10 timbl 345: NAME="c6">HTFileCopy: Copy a file to a stream</A></H2>This is used by the protocol engines
2.6 secret 346: to send data down a stream, typically
2.7 timbl 347: one which has been generated by HTStreamStack.
348: It is currently called by <A
2.12 timbl 349: NAME="z9" HREF="#c7">HTParseFile</A>
2.6 secret 350: <PRE>extern void HTFileCopy PARAMS((
351: FILE* fp,
352: HTStream* sink));
353:
354:
2.7 timbl 355: </PRE>
356: <H2><A
2.10 timbl 357: NAME="c2">HTCopyNoCR: Copy a socket to a stream,
2.7 timbl 358: stripping CR characters.</A></H2>It is slower than <A
2.12 timbl 359: NAME="z2" HREF="#z1">HTCopy</A> .
2.1 timbl 360: <PRE>
361: extern void HTCopyNoCR PARAMS((
362: int file_number,
363: HTStream* sink));
364:
2.16 luotonen 365:
366: </PRE>
2.1 timbl 367: <H2>HTParseSocket: Parse a socket given
368: its format</H2>This routine is called by protocol
369: modules to load an object. It uses <A
2.12 timbl 370: NAME="z4" HREF="#z3">
2.1 timbl 371: HTStreamStack</A> and the copy routines
372: above. Returns HT_LOADED if successful,
373: &lt;0 if not.
374: <PRE>extern int HTParseSocket PARAMS((
375: HTFormat format_in,
376: int file_number,
2.13 timbl 377: HTRequest * request));
2.6 secret 378:
379: </PRE>
380: <H2><A
2.10 timbl 381: NAME="c1">HTParseFile: Parse a File through
2.7 timbl 382: a file pointer</A></H2>This routine is called by protocol
383: modules to load an object. It uses <A
2.12 timbl 384: NAME="z4" HREF="#z3"> HTStreamStack</A>
2.7 timbl 385: and <A
2.12 timbl 386: NAME="c7" HREF="#c6">HTFileCopy</A> . Returns HT_LOADED
2.7 timbl 387: if successful, &lt;0 if not.
2.6 secret 388: <PRE>extern int HTParseFile PARAMS((
389: HTFormat format_in,
390: FILE *fp,
2.13 timbl 391: HTRequest * request));
2.8 timbl 392:
393: </PRE>
2.11 timbl 394: <H2><A
395: NAME="z11">HTNetToText: Convert Net ASCII to
396: local representation</A></H2>This is a filter stream suitable
397: for taking text from a socket and
398: passing it into a stream which expects
399: text in the local C representation.
400: It does ASCII and newline conversion.
401: As usual, pass its output stream
402: to it when creating it.
403: <PRE>extern HTStream * HTNetToText PARAMS ((HTStream * sink));
404:
405: </PRE>
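The filter pattern used by HTNetToText, where a stream is created around its output stream and passes converted data on to it, can be sketched as follows. This is an illustration under stated assumptions, not the library code: SketchStream is a hypothetical one-method stream (the real HTStream has a fuller method table), the host is assumed to use ASCII with '\n' as its newline, and the only conversion performed is dropping CR so that CR LF arrives as a single newline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-method stream. */
typedef struct SketchStream SketchStream;
struct SketchStream {
    void (*put_character)(SketchStream * me, char c);
    SketchStream * sink;      /* downstream stream, NULL for a terminal sink */
    char   collected[64];     /* used only by the collecting sink */
    size_t used;
};

/* Terminal sink: remember what arrives, silently dropping any overflow. */
static void sketch_collect_put(SketchStream * me, char c)
{
    if (me->used < sizeof me->collected) me->collected[me->used++] = c;
}

/* Filter: forward everything except CR to the output stream. */
static void sketch_net_put(SketchStream * me, char c)
{
    if (c != '\r') me->sink->put_character(me->sink, c);
}

/* Set up a sink that collects characters in memory. */
void sketch_collector_init(SketchStream * s)
{
    assert(s != NULL);
    s->put_character = sketch_collect_put;
    s->sink = NULL;
    s->used = 0;
}

/* Set up the filter; its output stream is passed in at creation time. */
void sketch_net_to_text_init(SketchStream * s, SketchStream * sink)
{
    assert(s != NULL && sink != NULL);
    s->put_character = sketch_net_put;
    s->sink = sink;
    s->used = 0;
}
```

Because the filter only knows its sink through the put_character method, any other stream could be placed downstream without changing the filter.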
2.8 timbl 406: <H2>HTFormatInit: Set up default presentations
407: and conversions</H2>These are defined in HTInit.c or
408: HTSInit.c if these have been replaced.
409: If you don't call this routine, and
410: you don't define any presentations,
411: then this routine will automatically
412: be called the first time a conversion
413: is needed. However, if you explicitly
414: add some conversions (e.g. using HTLoadRules)
415: then you may want also to explicitly
416: call this to get the defaults as
417: well.
2.20 frystyk 418: <PRE>
419: extern void HTFormatInit PARAMS((HTList * conversions));
2.22 ! timbl 420:
2.21 frystyk 421: </PRE>
2.22 ! timbl 422: <H2>HTFormatInitNIM: Set up default presentations
! 423: and conversions</H2>This is a slightly different version
! 424: of HTFormatInit, but without any
! 425: conversions that might use third
! 426: party programs. This is intended
! 427: for Non Interactive Mode.
! 428: <PRE>extern void HTFormatInitNIM PARAMS((HTList * conversions));
2.21 frystyk 429:
2.1 timbl 430: </PRE>
2.22 ! timbl 431: <H2>HTFormatDelete: Remove presentations
! 432: and conversions</H2>Deletes the list from HTFormatInit
! 433: or HTFormatInitNIM.
! 434: <PRE>extern void HTFormatDelete PARAMS((HTList * conversions));
2.21 frystyk 435:
436: </PRE>
437:
2.1 timbl 438: <H2>Epilogue</H2>
439: <PRE>extern BOOL HTOutputSource; /* Flag: shortcut parser */
440: #endif
441:
2.22 ! timbl 442: </PRE>end</BODY>
2.10 timbl 443: </HTML>