Annotation of libwww/Library/src/HTFormat.html, revision 2.28
2.10 timbl 1: <HTML>
2: <HEAD>
2.1 timbl 3: <TITLE>HTFormat: The format manager in the WWW Library</TITLE>
2.15 timbl 4: <NEXTID N="z18">
2.10 timbl 5: </HEAD>
2.1 timbl 6: <BODY>
7: <H1>Manage different document formats</H1>Here we describe the functions of
2.27 frystyk 8: the HTFormat module which handles conversion between different data
9: representations. (In MIME parlance, a representation is known as a content-
10: type. In WWW the term "format" is often used as it is shorter).<P>
11: This module is implemented by <A NAME="z0" HREF="HTFormat.c">HTFormat.c</A>.
12:
13: The module is a part of the <A NAME="z10" HREF="Overview.html">WWW library</A>.
14:
2.1 timbl 15: <H2>Preamble</H2>
16: <PRE>#ifndef HTFORMAT_H
17: #define HTFORMAT_H
18:
19: #include "HTUtils.h"
20: #include <A
2.10 timbl 21: NAME="z7" HREF="HTStream.html">"HTStream.h"</A>
2.1 timbl 22: #include "HTAtom.h"
2.2 timbl 23: #include "HTList.h"
2.1 timbl 24:
25: #ifdef SHORT_NAMES
26: #define HTOutputSource HTOuSour
27: #define HTOutputBinary HTOuBina
28: #endif
29:
2.18 luotonen 30:
31: typedef struct _HTContentDescription {
32: char * filename;
33: HTAtom * content_type;
34: HTAtom * content_language;
35: HTAtom * content_encoding;
36: int content_length;
37: float quality;
38: } HTContentDescription;
39:
40: PUBLIC void HTAcceptEncoding PARAMS((HTList * list,
41: char * enc,
42: float quality));
43:
44: PUBLIC void HTAcceptLanguage PARAMS((HTList * list,
45: char * lang,
46: float quality));
47:
48: PUBLIC BOOL HTRank PARAMS((HTList * possibilities,
49: HTList * accepted_content_types,
50: HTList * accepted_content_languages,
51: HTList * accepted_content_encodings));
52:
53:
2.1 timbl 54: </PRE>
2.17 luotonen 55: <H2>HT<A
56: NAME="z15"> Input Socket: Buffering for network
57: in</A></H2>These routines provide simple character
58: input from sockets. These are used
59: for parsing input in arbitrary IP
60: protocols (Gopher, NNTP, FTP).
61: <PRE>#define INPUT_BUFFER_SIZE 4096 /* Tradeoff speed vs memory */
62: typedef struct _socket_buffer {
2.25 luotonen 63: char input_buffer[INPUT_BUFFER_SIZE];
2.17 luotonen 64: char * input_pointer;
65: char * input_limit;
66: int input_file_number;
2.25 luotonen 67: BOOL s_do_buffering;
68: char * s_buffer;
69: int s_buffer_size;
70: char * s_buffer_cur;
2.17 luotonen 71: } HTInputSocket;
72:
73: </PRE>
74: <H3>Create input buffer and set file
75: number</H3>
76: <PRE>extern HTInputSocket* HTInputSocket_new PARAMS((int file_number));
77:
78: </PRE>
79: <H3>Get next character from buffer</H3>
2.24 frystyk 80: <PRE>extern int HTInputSocket_getCharacter PARAMS((HTInputSocket* isoc));
2.17 luotonen 81:
82: </PRE>
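The buffered character input described above can be sketched roughly as follows. This is a minimal illustration assuming a POSIX <CODE>read()</CODE> on the file number; the names <CODE>DemoInput</CODE>, <CODE>demo_input_new</CODE> and <CODE>demo_input_getc</CODE> are hypothetical stand-ins, not the library's implementation, and the -1 end-of-input value is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>   /* read(), POSIX */

#define DEMO_BUFFER_SIZE 4096

/* Hypothetical stand-in for HTInputSocket (buffer, cursor, limit, fd). */
typedef struct {
    char  buffer[DEMO_BUFFER_SIZE];
    char *pointer;   /* next unread byte */
    char *limit;     /* one past the last valid byte */
    int   fd;        /* file number to read from */
} DemoInput;

/* Allocate an empty buffer bound to fd; returns NULL if out of memory. */
static DemoInput *demo_input_new(int fd)
{
    DemoInput *in = malloc(sizeof *in);
    if (in != NULL) {
        in->fd = fd;
        in->pointer = in->limit = in->buffer;
    }
    return in;
}

/* Return the next byte as 0..255, or -1 on end of input or read error.
   (Assumed convention; interrupted reads are not retried in this sketch.) */
static int demo_input_getc(DemoInput *in)
{
    assert(in != NULL);
    if (in->pointer >= in->limit) {
        /* Buffer exhausted: refill it with a single read() call. */
        ssize_t got = read(in->fd, in->buffer, sizeof in->buffer);
        if (got <= 0)
            return -1;
        in->pointer = in->buffer;
        in->limit = in->buffer + got;
    }
    return (unsigned char)*in->pointer++;
}
```

The point of the design is that most calls cost only a pointer comparison and an increment; the system call happens once per buffer-full.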
2.22 timbl 83: <H3>Read block from input socket</H3>Read up to *len characters and return a
84: buffer (which the caller must not free) holding them. On return, *len
85: is the number of characters in the buffer, which may differ from the value passed in.
86: The buffer is not NUL-terminated.
2.17 luotonen 87: <PRE>extern char * HTInputSocket_getBlock PARAMS((HTInputSocket * isoc,
88: int * len));
89:
90: </PRE>
91: <H3>Free input socket buffer</H3>
92: <PRE>extern void HTInputSocket_free PARAMS((HTInputSocket * isoc));
93:
2.22 timbl 94:
2.17 luotonen 95: PUBLIC char * HTInputSocket_getLine PARAMS((HTInputSocket * isoc));
96: PUBLIC char * HTInputSocket_getUnfoldedLine PARAMS((HTInputSocket * isoc));
97: PUBLIC char * HTInputSocket_getStatusLine PARAMS((HTInputSocket * isoc));
98: PUBLIC BOOL HTInputSocket_seemsBinary PARAMS((HTInputSocket * isoc));
99:
100: </PRE>
2.25 luotonen 101:
102:
103: <H3>Security Buffering</H3>
104:
105: When it's necessary to get e.g. the header section, or part of it,
106: exactly as it came from the client to calculate the message digest,
107: these functions turn buffering on and off. All the material returned
108: by <CODE>HTInputSocket_getStatusLine()</CODE>,
109: <CODE>HTInputSocket_getUnfoldedLine()</CODE> and
110: <CODE>HTInputSocket_getLine()</CODE> gets buffered after a call to
111: <CODE>HTInputSocket_startBuffering()</CODE> until either
112: <CODE>HTInputSocket_stopBuffering()</CODE> is called, or an empty line is
113: returned by any of these functions (end of the header section).
114: <CODE>HTInputSocket_getBuffer()</CODE> returns the number of
115: characters buffered, and sets the given buffer pointer to point to
116: the internal buffer. This buffer exists until the <CODE>HTInputSocket</CODE>
117: object is freed.
118: <PRE>
119:
120: PUBLIC void HTInputSocket_startBuffering PARAMS((HTInputSocket * isoc));
121: PUBLIC void HTInputSocket_stopBuffering PARAMS((HTInputSocket * isoc));
122: PUBLIC int HTInputSocket_getBuffer PARAMS((HTInputSocket * isoc,
123: char ** buffer_ptr));
124: </PRE>
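The start/stop behaviour described above can be modelled with a small self-contained sketch. Everything here is an assumption made for illustration: <CODE>DemoCapture</CODE> and the <CODE>demo_*</CODE> functions are hypothetical names, the fixed-size buffer and the re-appended CRLF are simplifications, and the real library may store the bytes differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CAPTURE_MAX 256

/* Hypothetical model of the capture state kept alongside the socket. */
typedef struct {
    int    do_buffering;             /* nonzero while capture is on */
    char   data[DEMO_CAPTURE_MAX];   /* captured bytes */
    size_t used;                     /* number of bytes captured */
} DemoCapture;

static void demo_start_buffering(DemoCapture *c)
{
    assert(c != NULL);
    c->do_buffering = 1;
    c->used = 0;
}

static void demo_stop_buffering(DemoCapture *c)
{
    assert(c != NULL);
    c->do_buffering = 0;
}

/* Called for every line handed back to the caller of a get-line routine.
   Lines that do not fit are dropped in this sketch. */
static void demo_note_line(DemoCapture *c, const char *line)
{
    size_t n;
    assert(c != NULL && line != NULL);
    if (!c->do_buffering)
        return;
    n = strlen(line);
    if (c->used + n + 2 <= sizeof c->data) {
        memcpy(c->data + c->used, line, n);
        c->used += n;
        c->data[c->used++] = '\r';   /* assumed line terminator */
        c->data[c->used++] = '\n';
    }
    if (n == 0)
        c->do_buffering = 0;         /* an empty line ends the capture */
}

/* Point *buffer_ptr at the internal buffer; return the captured length. */
static size_t demo_get_buffer(DemoCapture *c, const char **buffer_ptr)
{
    assert(c != NULL && buffer_ptr != NULL);
    *buffer_ptr = c->data;
    return c->used;
}
```

As in the description, the caller receives a pointer into storage it does not own, so the captured bytes stay valid only as long as the owning object does.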
125:
2.1 timbl 126: <H2>The HTFormat type</H2>We use the HTAtom object for holding
127: representations. This allows faster
128: manipulation (comparison and copying)
2.14 timbl 129: than if we stayed with strings.<P>
130: The following have to be defined
131: in advance of the other include files
132: because of circular references.
2.1 timbl 133: <PRE>typedef HTAtom * HTFormat;
2.13 timbl 134:
2.14 timbl 135: #include <A
136: NAME="z14" HREF="HTAccess.html">"HTAccess.h"</A> /* Required for HTRequest definition */
137:
2.28 ! frystyk 138: </PRE>
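To illustrate why atoms make comparison and copying cheap, here is a minimal sketch of string interning. It is not the HTAtom implementation: <CODE>DemoAtom</CODE> and <CODE>demo_atom_for</CODE> are hypothetical names, the linked-list table is an assumption chosen for brevity, and matching here is case-sensitive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HTAtom: one shared node per distinct string. */
typedef struct DemoAtom {
    struct DemoAtom *next;
    char            *name;
} DemoAtom;

static DemoAtom *demo_atoms = NULL;   /* singly linked intern table */

/* Return the unique atom for name, creating it on first use.
   Returns NULL only if memory runs out. */
static DemoAtom *demo_atom_for(const char *name)
{
    DemoAtom *a;
    assert(name != NULL);
    for (a = demo_atoms; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a;                 /* already interned */
    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->name = malloc(strlen(name) + 1);
    if (a->name == NULL) {
        free(a);
        return NULL;
    }
    strcpy(a->name, name);
    a->next = demo_atoms;             /* push onto the table */
    demo_atoms = a;
    return a;
}
```

Once two format names have been interned, testing whether they are the same format is a single pointer comparison, and copying a format is a pointer assignment.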
! 139:
! 140: These macros (which used to be constants) define some basic internally
! 141: referenced representations.
! 142:
! 143: <H3>Internal ones</H3>
! 144:
! 145: The www/xxx ones are of course not MIME standard.<P>
! 146:
! 147: star/star is an output format which leaves the input untouched. It is
! 148: useful for diagnostics, and for users who want to see the original,
! 149: whatever it is.
! 150:
! 151: <PRE>
! 152: #define WWW_SOURCE HTAtom_for("*/*") /* Whatever it was originally */
! 153: </PRE>
! 154:
! 155: www/present represents the user's perception of the document. If you
! 156: convert to www/present, you present the material to the user.
2.10 timbl 157:
2.28 ! frystyk 158: <PRE>
! 159: #define WWW_PRESENT HTAtom_for("www/present") /* The user's perception */
! 160: </PRE>
! 161:
! 162: The www/mime format means a MIME message or a plain text message
! 163: with no MIME header. This is what is returned by an HTTP server.
! 164:
! 165: <PRE>
! 166: #define WWW_MIME HTAtom_for("www/mime") /* A MIME message */
! 167: </PRE>
! 168:
! 169: www/print is like www/present except it represents a printed copy.
! 170:
! 171: <PRE>
! 172: #define WWW_PRINT HTAtom_for("www/print") /* A printed copy */
! 173: </PRE>
2.13 timbl 174:
2.28 ! frystyk 175: www/unknown is a really unknown type. Some default action is
! 176: appropriate.
2.13 timbl 177:
2.28 ! frystyk 178: <PRE>
! 179: #define WWW_UNKNOWN HTAtom_for("www/unknown")
2.13 timbl 180: </PRE>
2.28 ! frystyk 181:
! 182:
! 183: <H3>MIME ones (a few)</H3>
! 184:
! 185: These are regular MIME types. HTML is assumed to be added by the W3
! 186: code. application/octet-stream was mistakenly application/binary in
2.11 timbl 187: earlier libwww versions (pre 2.11).
2.28 ! frystyk 188:
! 189: <PRE>
! 190: #define WWW_PLAINTEXT HTAtom_for("text/plain")
2.1 timbl 191: #define WWW_POSTSCRIPT HTAtom_for("application/postscript")
192: #define WWW_RICHTEXT HTAtom_for("application/rtf")
2.10 timbl 193: #define WWW_AUDIO HTAtom_for("audio/basic")
2.1 timbl 194: #define WWW_HTML HTAtom_for("text/html")
2.11 timbl 195: #define WWW_BINARY HTAtom_for("application/octet-stream")
2.26 frystyk 196: #define WWW_VIDEO HTAtom_for("video/mpeg")
2.28 ! frystyk 197: </PRE>
! 198:
! 199: Extra types used in the library
! 200:
! 201: <PRE>
! 202: #define WWW_NEWSLIST HTAtom_for("text/newslist")
! 203: </PRE>
2.7 timbl 204:
2.28 ! frystyk 205: We must include the following file after defining HTFormat, to which
2.10 timbl 206: it makes reference.
2.28 ! frystyk 207:
2.10 timbl 208: <H2>The HTEncoding type</H2>
209: <PRE>typedef HTAtom* HTEncoding;
210:
211: </PRE>The following are values for the
212: MIME content transfer encodings:
213: <PRE>#define WWW_ENC_7BIT HTAtom_for("7bit")
214: #define WWW_ENC_8BIT HTAtom_for("8bit")
215: #define WWW_ENC_BINARY HTAtom_for("binary")
216:
217: </PRE>We also add
218: <PRE>#define WWW_ENC_COMPRESS HTAtom_for("compress")
219:
220: #include "HTAnchor.h"
2.1 timbl 221:
222: </PRE>
223:
2.28 ! frystyk 224: <H2>The HTPresentation and HTConverter types</H2>
! 225:
! 226: This HTPresentation structure represents a possible conversion
! 227: algorithm from one format to another. It includes a pointer to a
! 228: conversion routine. The conversion routine returns a stream to which
! 229: data should be fed. See also <A NAME="z5"
! 230: HREF="#z3">HTStreamStack</A> which scans the list of registered
! 231: converters and calls one. See the <A NAME="z6"
! 232: HREF="HTInit.html">initialisation module</A> for a list of conversion
! 233: routines.
! 234:
! 235: <PRE>
! 236: typedef struct _HTPresentation HTPresentation;
! 237:
! 238: typedef HTStream * <A NAME="z12">HTConverter</A> PARAMS((
2.13 timbl 239: HTRequest * request,
240: void * param,
241: HTFormat input_format,
242: HTFormat output_format,
243: HTStream * output_stream));
2.1 timbl 244:
245: struct _HTPresentation {
2.13 timbl 246: HTAtom* rep; /* representation name atomized */
2.1 timbl 247: HTAtom* rep_out; /* resulting representation */
2.2 timbl 248: HTConverter *converter; /* The routine to gen the stream stack */
2.1 timbl 249: char * command; /* MIME-format string */
250: float quality; /* Between 0 (bad) and 1 (good) */
251: float secs;
252: float secs_per_byte;
253: };
2.28 ! frystyk 254: </PRE>
! 255:
! 256: A global list of converters is kept by this module. It is also
! 257: scanned by modules which want to know the set of formats supported,
! 258: for example. Note there is also an additional list associated with
! 259: each <A NAME="z16" HREF="HTAccess.html#z5">request</A>.
2.1 timbl 260:
2.15 timbl 261: <PRE>extern HTList * <A
262: NAME="z17">HTConversions</A> ;
2.1 timbl 263:
2.12 timbl 264:
2.1 timbl 265: </PRE>
266: <H2>HTSetPresentation: Register a system
267: command to present a format</H2>
2.8 timbl 268: <H3>On entry,</H3>
2.1 timbl 269: <DL>
270: <DT>rep
271: <DD> is the MIME - style format name
272: <DT>command
273: <DD> is the MAILCAP - style command
274: template
275: <DT>quality
276: <DD> A degradation factor 0..1
277: <DT>maxbytes
278: <DD> A limit on the length acceptable
279: as input (0 infinite)
280: <DT>maxsecs
281: <DD> A limit on the time user
282: will wait (0 for infinity)
283: </DL>
284:
285: <PRE>extern void HTSetPresentation PARAMS((
2.13 timbl 286: HTList * conversions,
287: CONST char * representation,
288: CONST char * command,
289: float quality,
290: float secs,
291: float secs_per_byte
2.1 timbl 292: ));
293:
294:
295: </PRE>
296: <H2>HTSetConversion: Register a conversion
297: routine</H2>
2.8 timbl 298: <H3>On entry,</H3>
2.1 timbl 299: <DL>
300: <DT>rep_in
301: <DD> is the content-type input
302: <DT>rep_out
303: <DD> is the resulting content-type
304: <DT>converter
305: <DD> is the routine to make
306: the stream to do it
307: </DL>
308:
309: <PRE>
310: extern void HTSetConversion PARAMS((
2.13 timbl 311: HTList * conversions,
2.1 timbl 312: CONST char * rep_in,
313: CONST char * rep_out,
2.2 timbl 314: HTConverter * converter,
2.1 timbl 315: float quality,
316: float secs,
317: float secs_per_byte
318: ));
319:
320:
321: </PRE>
322: <H2><A
2.10 timbl 323: NAME="z3">HTStreamStack: Create a stack of
2.1 timbl 324: streams</A></H2>This is the routine which actually
325: sets up the conversion. It currently
326: checks only for direct conversions,
2.8 timbl 327: but multi-stage conversions are foreseen.
2.2 timbl 328: It takes a stream into which the
2.1 timbl 329: output should be sent in the final
330: format, builds the conversion stack,
331: and returns a stream into which the
332: data in the input format should be
333: fed. The anchor is passed because
334: hypertext objects load information
335: into the anchor object which represents
2.23 luotonen 336: them. <P>
337: If <CODE>guess</CODE> is true and the input format is
338: <CODE>www/unknown</CODE>, try to guess the format
339: by looking at the first few bytes of the stream. <P>
2.1 timbl 340: <PRE>extern HTStream * HTStreamStack PARAMS((
341: HTFormat format_in,
2.23 luotonen 342: HTRequest * request,
343: BOOL guess));
2.1 timbl 344:
345: </PRE>
346: <H2>HTStackValue: Find the cost of a
347: filter stack</H2>Must return the cost of the same
348: stack which HTStreamStack would set
349: up.
2.8 timbl 350: <H3>On entry,</H3>
2.1 timbl 351: <DL>
352: <DT>format_in
353: <DD> The format of the data to
354: be converted
355: <DT>format_out
356: <DD> The format required
357: <DT>initial_value
358: <DD> The intrinsic "value"
359: of the data before conversion on
360: a scale from 0 to 1
361: <DT>length
362: <DD> The number of bytes expected
363: in the input format
364: </DL>
365:
366: <PRE>extern float HTStackValue PARAMS((
2.13 timbl 367: HTList * conversions,
2.1 timbl 368: HTFormat format_in,
2.13 timbl 369: HTFormat format_out,
2.1 timbl 370: float initial_value,
371: long int length));
372:
373: #define NO_VALUE_FOUND (-1e20) /* returned if none found */
374:
375: </PRE>
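This document does not state how the cost is computed, so the following is only one plausible model, given as a sketch: the intrinsic value is scaled by the converter's quality, and the estimated conversion time is charged as a fraction of the longest time the user will wait. The names <CODE>DemoConversion</CODE>, <CODE>demo_stack_value</CODE> and the <CODE>max_secs</CODE> parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NO_VALUE (-1e20f)   /* mirrors the "none found" sentinel */

/* Hypothetical subset of the fields of a registered conversion. */
typedef struct {
    float quality;        /* between 0 (bad) and 1 (good) */
    float secs;           /* fixed start-up cost in seconds */
    float secs_per_byte;  /* cost per input byte in seconds */
} DemoConversion;

/* Assumed cost model: value = initial * quality - time / max_secs.
   A max_secs of 0 means time is not charged at all. */
static float demo_stack_value(const DemoConversion *conv,
                              float initial_value,
                              long length,
                              float max_secs)
{
    float value;
    assert(length >= 0);
    if (conv == NULL)
        return DEMO_NO_VALUE;     /* no converter matched */
    value = initial_value * conv->quality;
    if (max_secs > 0.0f)
        value -= (conv->secs + conv->secs_per_byte * (float)length) / max_secs;
    return value;
}
```

Under this model a lossless but slow converter can rank below a lossy but fast one when the input is large, which is the kind of trade-off the quality, secs and secs_per_byte fields appear intended to express.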
376: <H2><A
2.10 timbl 377: NAME="z1">HTCopy: Copy a socket to a stream</A></H2>This is used by the protocol engines
2.6 secret 378: to send data down a stream, typically
2.22 timbl 379: one which has been generated by HTStreamStack.
380: Returns the number of bytes transferred.
2.19 luotonen 381: <PRE>extern int HTCopy PARAMS((
2.1 timbl 382: int file_number,
383: HTStream* sink));
384:
385:
2.6 secret 386: </PRE>
387: <H2><A
2.10 timbl 388: NAME="c6">HTFileCopy: Copy a file to a stream</A></H2>This is used by the protocol engines
2.6 secret 389: to send data down a stream, typically
2.7 timbl 390: one which has been generated by HTStreamStack.
391: It is currently called by <A
2.12 timbl 392: NAME="z9" HREF="#c7">HTParseFile</A>
2.6 secret 393: <PRE>extern void HTFileCopy PARAMS((
394: FILE* fp,
395: HTStream* sink));
396:
397:
2.7 timbl 398: </PRE>
399: <H2><A
2.10 timbl 400: NAME="c2">HTCopyNoCR: Copy a socket to a stream,
2.7 timbl 401: stripping CR characters.</A></H2>It is slower than <A
2.12 timbl 402: NAME="z2" HREF="#z1">HTCopy</A> .
2.1 timbl 403: <PRE>
404: extern void HTCopyNoCR PARAMS((
405: int file_number,
406: HTStream* sink));
407:
2.16 luotonen 408:
409: </PRE>
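The CR stripping itself is simple enough to sketch on an in-memory block. This is an illustration only: <CODE>demo_strip_cr</CODE> is a hypothetical helper, whereas the library routine reads from a socket and writes to a stream.

```c
#include <assert.h>
#include <stddef.h>

/* Copy len bytes from in to out, dropping every carriage return.
   out must have room for len bytes. Returns the number of bytes written. */
static size_t demo_strip_cr(const char *in, size_t len, char *out)
{
    size_t i, n = 0;
    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++)
        if (in[i] != '\r')
            out[n++] = in[i];
    return n;
}
```

The per-byte test is what makes this path slower than a plain block copy.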
2.1 timbl 410: <H2>HTParseSocket: Parse a socket given
411: its format</H2>This routine is called by protocol
412: modules to load an object. It uses <A
2.12 timbl 413: NAME="z4" HREF="#z3">
2.1 timbl 414: HTStreamStack</A> and the copy routines
415: above. Returns HT_LOADED if successful,
416: a negative value if not.
417: <PRE>extern int HTParseSocket PARAMS((
418: HTFormat format_in,
419: int file_number,
2.13 timbl 420: HTRequest * request));
2.6 secret 421:
422: </PRE>
423: <H2><A
2.10 timbl 424: NAME="c1">HTParseFile: Parse a File through
2.7 timbl 425: a file pointer</A></H2>This routine is called by protocol
426: modules to load an object. It uses <A
2.12 timbl 427: NAME="z4" HREF="#z3"> HTStreamStack</A>
2.7 timbl 428: and <A
2.12 timbl 429: NAME="c7" HREF="#c6">HTFileCopy</A>. Returns HT_LOADED
2.7 timbl 430: if successful, a negative value if not.
2.6 secret 431: <PRE>extern int HTParseFile PARAMS((
432: HTFormat format_in,
433: FILE *fp,
2.13 timbl 434: HTRequest * request));
2.8 timbl 435:
436: </PRE>
2.11 timbl 437: <H2><A
438: NAME="z11">HTNetToText: Convert Net ASCII to
439: local representation</A></H2>This is a filter stream suitable
440: for taking text from a socket and
441: passing it into a stream which expects
442: text in the local C representation.
443: It does ASCII and newline conversion.
444: As usual, pass its output stream
445: to it when creating it.
446: <PRE>extern HTStream * HTNetToText PARAMS ((HTStream * sink));
447:
448: </PRE>
2.8 timbl 449: <H2>HTFormatInit: Set up default presentations
450: and conversions</H2>These are defined in HTInit.c or
451: HTSInit.c if these have been replaced.
452: If you don't call this routine, and
453: you don't define any presentations,
454: then this routine will automatically
455: be called the first time a conversion
456: is needed. However, if you explicitly
457: add some conversions (eg using HTLoadRules)
458: then you may want also to explicitly
459: call this to get the defaults as
460: well.
2.20 frystyk 461: <PRE>
462: extern void HTFormatInit PARAMS((HTList * conversions));
2.22 timbl 463:
2.21 frystyk 464: </PRE>
2.22 timbl 465: <H2>HTFormatInitNIM: Set up default presentations
466: and conversions</H2>This is a slightly different version
467: of HTFormatInit, but without any
468: conversions that might use third
469: party programs. This is intended
470: for Non Interactive Mode.
471: <PRE>extern void HTFormatInitNIM PARAMS((HTList * conversions));
2.21 frystyk 472:
2.1 timbl 473: </PRE>
2.22 timbl 474: <H2>HTFormatDelete: Remove presentations
475: and conversions</H2>Deletes the list set up by HTFormatInit
476: or HTFormatInitNIM.
477: <PRE>extern void HTFormatDelete PARAMS((HTList * conversions));
2.21 frystyk 478:
479: </PRE>
480:
2.1 timbl 481: <H2>Epilogue</H2>
482: <PRE>extern BOOL HTOutputSource; /* Flag: shortcut parser */
483: #endif
484:
2.22 timbl 485: </PRE>end</BODY>
2.10 timbl 486: </HTML>