Annotation of libwww/Library/src/HTFormat.html, revision 2.37.2.1
2.10 timbl 1: <HTML>
2: <HEAD>
2.31 frystyk 3: <TITLE>The Format Manager in the WWW Library</TITLE>
2.15 timbl 4: <NEXTID N="z18">
2.10 timbl 5: </HEAD>
2.1 timbl 6: <BODY>
2.27 frystyk 7:
2.31 frystyk 8: <H1>The Format Manager</H1>
2.33 frystyk 9:
10: <PRE>
11: /*
12: ** (c) COPYRIGHT CERN 1994.
13: ** Please first read the full copyright statement in the file COPYRIGH.
14: */
15: </PRE>
2.31 frystyk 16:
17: Here we describe the functions of the HTFormat module which handles
18: conversion between different data representations. (In MIME parlance,
19: a representation is known as a content-type. In <A
2.37.2.1! frystyk 20: HREF="http://www.w3.org/hypertext/WWW/TheProject.html">WWW</A> the
2.31 frystyk 21: term <EM>format</EM> is often used as it is shorter). The content of
22: this module is:
23:
24: <UL>
25: <LI><A HREF="#z15">Buffering for I/O</A>
26: <LI><A HREF="#FormatTypes">Predefined Format Types</A>
27: <LI><A HREF="#Encoding">Registration of Accepted Content Encodings</A>
28: <LI><A HREF="#Language">Registration of Accepted Content Languages</A>
29: <LI><A HREF="#Type">Registration of Accepted Converters and Presenters</A>
30: <LI><A HREF="#Rank">Ranking of Accepted Formats</A>
31: <LI><A HREF="#z3">The Stream Stack</A>
2.37.2.1! frystyk 32: <LI><A HREF="#Read">Read from a Socket</A>
2.31 frystyk 33: </UL>
34:
35: This module is implemented by <A HREF="HTFormat.c">HTFormat.c</A>, and
36: it is a part of the <A NAME="z10"
2.37.2.1! frystyk 37: HREF="http://www.w3.org/hypertext/WWW/Library/User/Guide/Guide.html">Library
2.31 frystyk 38: of Common Code</A>.
2.27 frystyk 39:
2.31 frystyk 40: <PRE>
41: #ifndef HTFORMAT_H
2.1 timbl 42: #define HTFORMAT_H
43:
2.31 frystyk 44: #include <A HREF="HTUtils.html">"HTUtils.h"</A>
45: #include <A HREF="HTStream.html">"HTStream.h"</A>
46: #include <A HREF="HTAtom.html">"HTAtom.h"</A>
47: #include <A HREF="HTList.html">"HTList.h"</A>
48: </PRE>
2.1 timbl 49:
2.31 frystyk 50: <H2><A NAME="z15">Buffering for network I/O</A></H2>
2.18 luotonen 51:
2.31 frystyk 52: These routines provide buffering for READ (and future WRITE) to the
53: network. They are used by all the protocol modules. The size of the
54: buffer, <CODE>INPUT_BUFFER_SIZE</CODE>, is a compromise between speed
55: and memory.
2.18 luotonen 56:
2.31 frystyk 57: <PRE>
58: #define INPUT_BUFFER_SIZE 8192
2.18 luotonen 59:
2.17 luotonen 60: typedef struct _socket_buffer {
2.36 frystyk 61: char input_buffer[INPUT_BUFFER_SIZE];
62: char * input_pointer;
63: char * input_limit;
64: SOCKFD input_file_number;
2.17 luotonen 65: } HTInputSocket;
2.31 frystyk 66: </PRE>
67:
68: <H3>Create an Input Buffer</H3>
2.17 luotonen 69:
2.31 frystyk 70: This function allocates an input buffer and binds it to the socket
71: descriptor given as parameter.
72:
73: <PRE>
2.36 frystyk 74: extern HTInputSocket* HTInputSocket_new PARAMS((SOCKFD file_number));
2.17 luotonen 75: </PRE>
76:
2.31 frystyk 77: <H3>Free an Input Buffer</H3>
78:
79: <PRE>
80: extern void HTInputSocket_free PARAMS((HTInputSocket * isoc));
2.17 luotonen 81: </PRE>
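
To make the roles of <CODE>input_pointer</CODE> and <CODE>input_limit</CODE> concrete, here is a minimal sketch of a buffered reader. It is not the code in HTFormat.c: every <CODE>Sketch</CODE>/<CODE>sketch_</CODE> name is hypothetical, the socket is replaced by an in-memory source so the example is self-contained, and the buffer is made tiny so that refills actually happen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUFFER_SIZE 8            /* tiny on purpose; the header uses 8192 */

/* Hypothetical data source standing in for a socket: a block consumed in chunks. */
typedef struct {
    const char *data;
    size_t      left;
} SketchSource;

/* Hypothetical buffer, modelled on the fields of HTInputSocket. */
typedef struct {
    char          input_buffer[SKETCH_BUFFER_SIZE];
    char         *input_pointer;        /* next unread character */
    char         *input_limit;          /* one past the last valid character */
    SketchSource *source;
} SketchInput;

/* Bind an empty buffer to a source (empty means pointer == limit). */
static void sketch_input_init(SketchInput *in, SketchSource *src)
{
    assert(in != NULL && src != NULL);
    in->input_pointer = in->input_buffer;
    in->input_limit   = in->input_buffer;
    in->source        = src;
}

/* Refill the buffer from the source; returns the number of bytes fetched. */
static size_t sketch_input_fill(SketchInput *in)
{
    size_t n = in->source->left < SKETCH_BUFFER_SIZE
             ? in->source->left : SKETCH_BUFFER_SIZE;
    memcpy(in->input_buffer, in->source->data, n);
    in->source->data += n;
    in->source->left -= n;
    in->input_pointer = in->input_buffer;
    in->input_limit   = in->input_buffer + n;
    return n;
}

/* Return the next character as an unsigned value, or -1 at end of input. */
static int sketch_input_getc(SketchInput *in)
{
    if (in->input_pointer >= in->input_limit && sketch_input_fill(in) == 0)
        return -1;
    return (unsigned char) *in->input_pointer++;
}
```

A caller initializes the buffer once and then calls <CODE>sketch_input_getc</CODE> in a loop. The underlying source is touched only when the buffer runs empty, which is the speed side of the speed/memory compromise mentioned above.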
82:
2.31 frystyk 83: <A NAME="FormatTypes"><H2>The HTFormat types</H2></A>
2.25 luotonen 84:
2.31 frystyk 85: We use the <A HREF="HTAtom.html">HTAtom</A> object for holding
86: representations. This allows faster manipulation (comparison and
87: copying) than if we stayed with strings. These macros (which used to
88: be constants) define some basic internally referenced
89: representations.<P>
2.25 luotonen 90:
2.31 frystyk 91: <PRE>
92: typedef HTAtom * HTFormat;
2.37.2.1! frystyk 93: typedef HTAtom * HTCharset;
! 94: typedef HTAtom * HTLevel;
2.28 frystyk 95: </PRE>
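
The reason atoms make comparison and copying cheap can be sketched as follows. This is an illustration of string interning in general, not the HTAtom implementation: the <CODE>Sketch</CODE>/<CODE>sketch_</CODE> names are hypothetical, and a linked list stands in for whatever table a real implementation would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical atom: one shared record per distinct string. */
typedef struct SketchAtom {
    char              *name;
    struct SketchAtom *next;
} SketchAtom;

static SketchAtom *sketch_atoms = NULL; /* simple list; a real table would hash */

/* Return the unique atom for a name, creating it on first use.
   Returns NULL only if memory runs out. */
static SketchAtom *sketch_atom_for(const char *name)
{
    SketchAtom *a;
    assert(name != NULL);
    for (a = sketch_atoms; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a;                   /* already interned */

    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->name = malloc(strlen(name) + 1);
    if (a->name == NULL) {
        free(a);
        return NULL;
    }
    strcpy(a->name, name);
    a->next = sketch_atoms;
    sketch_atoms = a;
    return a;
}

typedef SketchAtom *SketchFormat;

/* Comparing two formats is a pointer comparison, not a string comparison. */
static int sketch_format_equal(SketchFormat a, SketchFormat b)
{
    return a == b;
}
```

Because every lookup of the same name yields the same pointer, copying a format is a pointer assignment and testing two formats for equality never touches the characters of the name.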
96:
97: <H3>Internal ones</H3>
98:
2.31 frystyk 99: The <CODE>www/xxx</CODE> ones are of course not MIME
100: standard. <P>
2.28 frystyk 101:
2.31 frystyk 102: <CODE>star/star</CODE> is an output format which leaves the input
103: untouched. It is useful for diagnostics, and for users who want to see
104: the original, whatever it is.
2.28 frystyk 105:
106: <PRE>
107: #define WWW_SOURCE HTAtom_for("*/*") /* Whatever it was originally */
108: </PRE>
109:
2.31 frystyk 110: <CODE>www/present</CODE> represents the user's perception of the
111: document. If you convert to www/present, you present the material to
112: the user.
2.10 timbl 113:
2.28 frystyk 114: <PRE>
115: #define WWW_PRESENT HTAtom_for("www/present") /* The user's perception */
116: </PRE>
117:
118: The message/rfc822 format means a MIME message or a plain text message
119: with no MIME header. This is what is returned by an HTTP server.
120:
121: <PRE>
122: #define WWW_MIME HTAtom_for("www/mime") /* A MIME message */
123: </PRE>
124:
2.31 frystyk 125: <CODE>www/print</CODE> is like www/present except it represents a
126: printed copy.
2.28 frystyk 127:
128: <PRE>
129: #define WWW_PRINT HTAtom_for("www/print") /* A printed copy */
130: </PRE>
2.13 timbl 131:
2.31 frystyk 132: <CODE>www/unknown</CODE> is a really unknown type. Some default
133: action is appropriate.
2.13 timbl 134:
2.28 frystyk 135: <PRE>
136: #define WWW_UNKNOWN HTAtom_for("www/unknown")
2.13 timbl 137: </PRE>
2.28 frystyk 138:
139:
2.31 frystyk 140: <H3>MIME ones</H3>
2.28 frystyk 141:
2.31 frystyk 142: These are the regular MIME types that are predefined. Others can be added!
2.28 frystyk 143:
144: <PRE>
145: #define WWW_PLAINTEXT HTAtom_for("text/plain")
2.1 timbl 146: #define WWW_POSTSCRIPT HTAtom_for("application/postscript")
147: #define WWW_RICHTEXT HTAtom_for("application/rtf")
2.10 timbl 148: #define WWW_AUDIO HTAtom_for("audio/basic")
2.1 timbl 149: #define WWW_HTML HTAtom_for("text/html")
2.11 timbl 150: #define WWW_BINARY HTAtom_for("application/octet-stream")
2.26 frystyk 151: #define WWW_VIDEO HTAtom_for("video/mpeg")
2.37.2.1! frystyk 152: #define WWW_GIF HTAtom_for("image/gif")
2.28 frystyk 153: </PRE>
154:
2.31 frystyk 155: Extra types used in the library (EXPERIMENT)
2.28 frystyk 156:
157: <PRE>
158: #define WWW_NEWSLIST HTAtom_for("text/newslist")
159: </PRE>
2.7 timbl 160:
2.28 frystyk 161: We must include the following file after defining HTFormat, to which
2.10 timbl 162: it makes reference.
2.28 frystyk 163:
2.10 timbl 164: <H2>The HTEncoding type</H2>
165:
2.31 frystyk 166: <PRE>
2.37.2.1! frystyk 167: typedef HTAtom * HTEncoding;
! 168: typedef HTAtom * HTCte; /* Content transfer encoding */
2.31 frystyk 169: </PRE>
170:
171: The following are values for the MIME content transfer encodings:
172:
173: <PRE>
174: #define WWW_ENC_7BIT HTAtom_for("7bit")
2.10 timbl 175: #define WWW_ENC_8BIT HTAtom_for("8bit")
176: #define WWW_ENC_BINARY HTAtom_for("binary")
2.37.2.1! frystyk 177: #define WWW_ENC_BASE64 HTAtom_for("base64")
2.31 frystyk 178: </PRE>
179:
180: We also add
181:
182: <PRE>
183: #define WWW_ENC_COMPRESS HTAtom_for("compress")
184: </PRE>
2.10 timbl 185:
2.31 frystyk 186: <A NAME="Encoding"><H2>Registration of Accepted Content Encodings</H2></A>
2.10 timbl 187:
2.37.2.1! frystyk 188: This function is not currently used, as no encoding/decoding streams
! 189: have been made :-(
2.1 timbl 190:
2.31 frystyk 191: <PRE>
192: typedef struct _HTContentDescription {
193: char * filename;
194: HTAtom * content_type;
195: HTAtom * content_language;
196: HTAtom * content_encoding;
197: int content_length;
2.37 frystyk 198: double quality;
2.31 frystyk 199: } HTContentDescription;
200:
2.32 frystyk 201: extern void HTAcceptEncoding PARAMS((HTList * list,
2.31 frystyk 202: char * enc,
2.37 frystyk 203: double quality));
2.1 timbl 204: </PRE>
205:
2.31 frystyk 206: <A NAME="Language"><H2>Registration of Accepted Content Languages</H2></A>
2.28 frystyk 207:
2.31 frystyk 208: This function is not currently used.
2.28 frystyk 209:
210: <PRE>
2.32 frystyk 211: extern void HTAcceptLanguage PARAMS((HTList * list,
2.31 frystyk 212: char * lang,
2.37 frystyk 213: double quality));
2.31 frystyk 214: </PRE>
2.28 frystyk 215:
2.31 frystyk 216: <A NAME="Type"><H2>Registration of Accepted Converters and
217: Presenters</H2></A>
218:
219: A <CODE><A NAME="z12">converter</A></CODE> is a stream with a special
220: set of parameters and which is registered as capable of converting
221: from a MIME type to something else (maybe another MIME-type). A
222: <CODE>presenter</CODE> is a module (possibly an external program)
223: which can present a graphic object of a certain MIME type to the
224: user. That is, <CODE>presenters</CODE> are normally used to present
225: graphic objects that the <CODE>converters</CODE> are not able to
226: handle. Data is transferred to the external program using the <A
2.36 frystyk 227: HREF="HTFWrite.html">HTSaveAndExecute</A> stream which writes to a
2.31 frystyk 228: local file. This stream is actually a normal <CODE>converter</CODE>,
229: i.e., a stream having the following set of parameters:<P>
230:
231: <PRE>
232: #include "HTAccess.h" /* Required for HTRequest definition */
233:
234: typedef HTStream * HTConverter PARAMS((HTRequest * request,
235: void * param,
236: HTFormat input_format,
237: HTFormat output_format,
238: HTStream * output_stream));
239: </PRE>
240:
241: Both <CODE>converters</CODE> and <CODE>presenters</CODE> are set up in
242: a list which is used by the <A HREF="#z3">StreamStack</A> module to
243: find the best way to pass the information to the user. <P>
244:
245: The <CODE>HTPresentation</CODE> structure contains both
246: <CODE>converters</CODE> and <CODE>presenters</CODE>, and it is defined
247: as:
248:
249: <PRE>
250: typedef struct _HTPresentation {
251: HTAtom* rep; /* representation name atomized */
252: HTAtom* rep_out; /* resulting representation */
253: HTConverter *converter; /* The routine to gen the stream stack */
254: char * command; /* MIME-format string */
255: char * test_command; /* MIME-format string */
2.37 frystyk 256: double quality; /* Between 0 (bad) and 1 (good) */
257: double secs;
258: double secs_per_byte;
2.31 frystyk 259: } HTPresentation;
2.28 frystyk 260: </PRE>
261:
2.1 timbl 262:
2.31 frystyk 263: <A NAME="z17"><H3>Global List of Converters</H3></A>
2.1 timbl 264:
2.31 frystyk 265: This module keeps a global list of converters. This can be used to get
266: the set of supported formats. <P>
2.12 timbl 267:
2.31 frystyk 268: <B>NOTE:</B> There is also a conversion list associated with each
2.37.2.1! frystyk 269: request in the <A HREF="HTAccess.html#z1">HTRequest
! 270: structure</A>. This list can be used to expand the core set of
! 271: conversions in the global list. The <A HREF="#z3">StreamStack</A> uses
! 272: both lists.
2.31 frystyk 273:
274: <PRE>
275: extern HTList * HTConversions;
2.1 timbl 276: </PRE>
2.31 frystyk 277:
278: <H3>Register a Presenter</H3>
279:
2.1 timbl 280: <DL>
2.31 frystyk 281: <DT>conversions
282: <DD>The list of <CODE>converters</CODE> and <CODE>presenters</CODE>
283: <DT>representation
284: <DD>the <A HREF="#Type">MIME-style</A> format name
2.1 timbl 285: <DT>command
2.31 frystyk 286: <DD>the MAILCAP-style command template
2.1 timbl 287: <DT>quality
2.31 frystyk 288: <DD>A degradation factor [0..1]
2.1 timbl 289: <DT>maxbytes
2.31 frystyk 290: <DD>A limit on the length acceptable as input (0 infinite)
2.1 timbl 291: <DT>maxsecs
2.31 frystyk 292: <DD>A limit on the time user will wait (0 for infinity)
2.1 timbl 293: </DL>
294:
2.31 frystyk 295: <PRE>
296: extern void HTSetPresentation PARAMS((HTList * conversions,
297: CONST char * representation,
298: CONST char * command,
299: CONST char * test_command,
2.37 frystyk 300: double quality,
301: double secs,
302: double secs_per_byte));
2.31 frystyk 303: </PRE>
2.1 timbl 304:
2.31 frystyk 305: <H3>Register a Converter</H3>
2.1 timbl 306:
307: <DL>
2.31 frystyk 308: <DT>conversions
309: <DD>The list of <CODE>converters</CODE> and <CODE>presenters</CODE>
2.1 timbl 310: <DT>rep_in
2.31 frystyk 311: <DD>the <A HREF="#Type">MIME-style</A> format name
2.1 timbl 312: <DT>rep_out
2.31 frystyk 313: <DD>is the resulting content-type after the conversion
2.1 timbl 314: <DT>converter
2.31 frystyk 315: <DD>is the routine to call which actually does the conversion
316: <DT>quality
317: <DD>A degradation factor [0..1]
318: <DT>maxbytes
319: <DD>A limit on the length acceptable as input (0 infinite)
320: <DT>maxsecs
321: <DD>A limit on the time user will wait (0 for infinity)
2.1 timbl 322: </DL>
323:
324: <PRE>
2.31 frystyk 325: extern void HTSetConversion PARAMS((HTList * conversions,
326: CONST char * rep_in,
327: CONST char * rep_out,
328: HTConverter * converter,
2.37 frystyk 329: double quality,
330: double secs,
331: double secs_per_byte));
2.31 frystyk 332: </PRE>
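
The idea of registering conversions in a list and later picking one for a given pair of formats can be sketched as below. This is only an illustration under stated assumptions: the <CODE>Sketch</CODE>/<CODE>sketch_</CODE> names are hypothetical, formats are plain strings compared exactly (the library uses atoms, and its real matching rules are not described in this header), and the list is a fixed-size array rather than an HTList.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 16

/* Hypothetical entry, modelled on a subset of HTPresentation. */
typedef struct {
    const char *rep_in;         /* input format name (not copied) */
    const char *rep_out;        /* resulting format name (not copied) */
    double      quality;        /* between 0 (bad) and 1 (good) */
    double      secs;
    double      secs_per_byte;
} SketchPresentation;

typedef struct {
    SketchPresentation entries[SKETCH_MAX_ENTRIES];
    size_t             count;
} SketchConversions;

/* Register a conversion; returns 0 on success, -1 if the list is full. */
static int sketch_set_conversion(SketchConversions *list,
                                 const char *rep_in, const char *rep_out,
                                 double quality, double secs,
                                 double secs_per_byte)
{
    SketchPresentation *p;
    assert(list != NULL && rep_in != NULL && rep_out != NULL);
    if (list->count >= SKETCH_MAX_ENTRIES)
        return -1;
    p = &list->entries[list->count++];
    p->rep_in        = rep_in;
    p->rep_out       = rep_out;
    p->quality       = quality;
    p->secs          = secs;
    p->secs_per_byte = secs_per_byte;
    return 0;
}

/* Find the registered entry with the highest quality for the pair, or NULL. */
static const SketchPresentation *sketch_find(const SketchConversions *list,
                                             const char *rep_in,
                                             const char *rep_out)
{
    const SketchPresentation *best = NULL;
    size_t i;
    assert(list != NULL && rep_in != NULL && rep_out != NULL);
    for (i = 0; i < list->count; i++) {
        const SketchPresentation *p = &list->entries[i];
        if (strcmp(p->rep_in, rep_in) == 0 &&
            strcmp(p->rep_out, rep_out) == 0 &&
            (best == NULL || p->quality > best->quality))
            best = p;
    }
    return best;
}
```

Keeping the registration data separate from the lookup policy means an application can register several entries for the same pair and let the lookup decide; here the decision is simply "highest quality wins", which is an assumption of this sketch.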
333:
334: <H3>Set up Default Presenters and Converters</H3>
335:
336: A default set of <CODE>converters</CODE> and <CODE>presenters</CODE>
337: are defined in <A HREF="HTInit.c">HTInit.c</A> (or <A
338: HREF="../../Daemon/Inplementation/HTSUtils.c">HTSInit.c</A> in the
339: server). <P>
340:
341: <B>NOTE: </B> No automatic initialization is done in the Library, so
342: the application must do this itself.
343:
344: <PRE>
345: extern void HTFormatInit PARAMS((HTList * conversions));
346: </PRE>
347:
348: This function also exists in a version where no
349: <CODE>presenters</CODE> are initialized. This is intended for Non
350: Interactive Mode, e.g., in the Line Mode Browser.
351:
352: <PRE>
353: extern void HTFormatInitNIM PARAMS((HTList * conversions));
354: </PRE>
355:
356: <H3>Remove presentations and conversions</H3>
2.1 timbl 357:
2.34 frystyk 358: This function deletes the <EM>LOCAL </EM>list of
359: <CODE>converters</CODE> and <CODE>presenters</CODE> associated with
360: each <A HREF="HTAccess.html#z1">HTRequest structure</A>.
2.1 timbl 361:
2.31 frystyk 362: <PRE>
363: extern void HTFormatDelete PARAMS((HTRequest * request));
364: </PRE>
365:
2.34 frystyk 366: This function cleans up the <EM>GLOBAL</EM> list of converters. The
367: function is called from <A
368: HREF="HTAccess.html#Library">HTLibTerminate</A>.
369:
370: <PRE>
371: extern void HTDisposeConversions NOPARAMS;
372: </PRE>
2.31 frystyk 373:
374: <A NAME="Rank"><H2>Ranking of Accepted Formats</H2></A>
375:
2.36 frystyk 376: This function is used when the best match among several possible
377: documents is to be found as a function of the accept headers sent in
378: the client request.
2.31 frystyk 379:
380: <PRE>
2.32 frystyk 381: extern BOOL HTRank PARAMS((HTList * possibilities,
2.31 frystyk 382: HTList * accepted_content_types,
383: HTList * accepted_content_languages,
384: HTList * accepted_content_encodings));
2.1 timbl 385: </PRE>
2.31 frystyk 386:
387: <H2><A NAME="z3">HTStreamStack</A></H2>
388:
389: This is the routine which actually sets up the conversion. It
390: currently checks only for direct conversions, but multi-stage
391: conversions are foreseen. It takes a stream into which the output
392: should be sent in the final format, builds the conversion stack, and
393: returns a stream into which the data in the input format should be
394: fed.<P>
395:
2.23 luotonen 396: If <CODE>guess</CODE> is true and input format is
2.31 frystyk 397: <CODE>www/unknown</CODE>, try to guess the format by looking at the
398: first few bytes of the stream. <P>
2.1 timbl 399:
2.31 frystyk 400: <PRE>
2.32 frystyk 401: extern HTStream * HTStreamStack PARAMS((HTFormat rep_in,
2.31 frystyk 402: HTFormat rep_out,
403: HTStream * output_stream,
404: HTRequest * request,
405: BOOL guess));
2.1 timbl 406: </PRE>
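
The "give me the output stream, get back the input stream" pattern described above can be sketched with two toy streams. Nothing here is the HTStream interface: the <CODE>Sketch</CODE>/<CODE>sketch_</CODE> names and the single <CODE>put_character</CODE> method are assumptions made for the example, and the upper-casing converter is just a stand-in for a real format conversion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream: one method plus a link to the next stream. */
typedef struct SketchStream SketchStream;
struct SketchStream {
    void        (*put_character)(SketchStream *me, char c);
    SketchStream *target;       /* next stream in the stack, NULL for a sink */
    char         *out;          /* sink only: destination buffer */
    size_t        used, size;   /* sink only: fill level and capacity */
};

/* Sink: stores characters in a caller-supplied buffer, kept NUL-terminated. */
static void sketch_sink_put(SketchStream *me, char c)
{
    if (me->used + 1 < me->size) {
        me->out[me->used++] = c;
        me->out[me->used] = '\0';
    }
}

static void sketch_sink_init(SketchStream *me, char *buffer, size_t size)
{
    assert(me != NULL && buffer != NULL && size > 0);
    me->put_character = sketch_sink_put;
    me->target = NULL;
    me->out = buffer;
    me->used = 0;
    me->size = size;
    buffer[0] = '\0';
}

/* Converter method: upper-case each character and pass it down the stack. */
static void sketch_upper_put(SketchStream *me, char c)
{
    char up = (char) toupper((unsigned char) c);
    me->target->put_character(me->target, up);
}

/* Plays the role of a converter: takes the output stream and returns the
   stream into which the input should be fed. */
static SketchStream *sketch_upper_new(SketchStream *me, SketchStream *output_stream)
{
    assert(me != NULL && output_stream != NULL);
    me->put_character = sketch_upper_put;
    me->target = output_stream;
    me->out = NULL;
    me->used = me->size = 0;
    return me;
}

/* Feed a whole string into the top of the stack. */
static void sketch_put_string(SketchStream *s, const char *text)
{
    size_t i, n = strlen(text);
    for (i = 0; i < n; i++)
        s->put_character(s, text[i]);
}
```

The caller builds the stack from the output end: first the sink, then the converter wrapped around it, and finally writes into the stream the converter returned. A multi-stage conversion would simply wrap a further converter around that one.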
2.31 frystyk 407:
408: <H3>Find the cost of a filter stack</H3>
409:
410: Must return the cost of the same stack which HTStreamStack would set
2.1 timbl 411: up.
412:
2.31 frystyk 413: <PRE>
2.1 timbl 414: #define NO_VALUE_FOUND -1e20 /* returned if none found */
415:
2.37 frystyk 416: extern double HTStackValue PARAMS((HTList * conversions,
2.31 frystyk 417: HTFormat format_in,
418: HTFormat format_out,
2.37 frystyk 419: double initial_value,
2.31 frystyk 420: long int length));
2.1 timbl 421: </PRE>
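
As a rough illustration of how <CODE>quality</CODE>, <CODE>secs</CODE> and <CODE>secs_per_byte</CODE> could combine into a single value, consider the sketch below. The cost model is an assumption of this example and is not stated in the header: the incoming value is scaled by the quality, and the estimated conversion time is subtracted as a fraction of a hypothetical <CODE>max_secs</CODE> budget. All <CODE>Sketch</CODE>/<CODE>sketch_</CODE> names are hypothetical, and formats are compared as plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NO_VALUE_FOUND -1e20     /* same sentinel value as NO_VALUE_FOUND */

/* Hypothetical entry holding only the fields the cost model needs. */
typedef struct {
    const char *rep_in;
    const char *rep_out;
    double      quality;        /* between 0 (bad) and 1 (good) */
    double      secs;           /* fixed start-up time */
    double      secs_per_byte;  /* time per byte of input */
} SketchCostEntry;

/* Value of the first direct conversion from format_in to format_out,
   or SKETCH_NO_VALUE_FOUND if none is registered. */
static double sketch_stack_value(const SketchCostEntry *list, size_t count,
                                 const char *format_in, const char *format_out,
                                 double initial_value, long length,
                                 double max_secs)
{
    size_t i;
    assert(list != NULL || count == 0);
    assert(format_in != NULL && format_out != NULL);
    for (i = 0; i < count; i++) {
        const SketchCostEntry *p = &list[i];
        if (strcmp(p->rep_in, format_in) == 0 &&
            strcmp(p->rep_out, format_out) == 0) {
            double value = initial_value * p->quality;
            if (max_secs > 0.0)     /* assumed: no time penalty without a budget */
                value -= ((double) length * p->secs_per_byte + p->secs) / max_secs;
            return value;
        }
    }
    return SKETCH_NO_VALUE_FOUND;
}
```

With a quality of 0.5, a start-up time of one second and a four-second budget, an initial value of 1.0 becomes 0.5 - 1/4 = 0.25. The very large negative sentinel guarantees that "no conversion" always compares worse than any real value.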
2.31 frystyk 422:
2.37.2.1! frystyk 423: <A NAME="Read"><H2>Read Data from a Socket</H2></A>
! 424:
! 425: This function has replaced many other functions for reading from a
! 426: socket. It automatically converts from ASCII if we are on a NON-ASCII
! 427: machine. This assumes that we do <B>not</B> use this function to read
! 428: a local file on a NON-ASCII machine. The following type definition
! 429: simplifies writing a state machine that looks for a
! 430: <CODE>&lt;CRLF&gt;</CODE> sequence.
! 431:
! 432: <PRE>
! 433: typedef enum _HTSocketEOL {
! 434: EOL_ERR = -1,
! 435: EOL_BEGIN = 0,
! 436: EOL_FCR,
! 437: EOL_FLF,
! 438: EOL_DOT,
! 439: EOL_SCR,
! 440: EOL_SLF
! 441: } HTSocketEOL;
! 442:
! 443: extern int HTSocketRead PARAMS((HTRequest * request, HTStream * target));
! 444: </PRE>
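
A state machine of this kind can be sketched in a few lines. The example uses its own reduced, hypothetical enumeration whose names mirror <CODE>EOL_BEGIN</CODE>, <CODE>EOL_FCR</CODE> and <CODE>EOL_FLF</CODE>; how HTSocketRead uses the remaining states is not described in this header, so they are left out rather than guessed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced state set for spotting a CR LF pair. */
typedef enum {
    SKETCH_EOL_BEGIN = 0,   /* no part of CR LF seen */
    SKETCH_EOL_FCR,         /* a CR has been seen */
    SKETCH_EOL_FLF          /* CR followed by LF: line complete */
} SketchEOL;

/* Advance the state machine by one character. */
static SketchEOL sketch_eol_step(SketchEOL state, char c)
{
    if (c == '\r')
        return SKETCH_EOL_FCR;              /* a CR always starts a candidate */
    if (c == '\n' && state == SKETCH_EOL_FCR)
        return SKETCH_EOL_FLF;
    return SKETCH_EOL_BEGIN;                /* anything else resets */
}

/* Count complete CR LF sequences in a block. The state is kept by the
   caller so that a pair split across two reads is still recognized. */
static size_t sketch_count_crlf(SketchEOL *state, const char *data, size_t len)
{
    size_t i, lines = 0;
    assert(state != NULL && (data != NULL || len == 0));
    for (i = 0; i < len; i++) {
        *state = sketch_eol_step(*state, data[i]);
        if (*state == SKETCH_EOL_FLF) {
            lines++;
            *state = SKETCH_EOL_BEGIN;
        }
    }
    return lines;
}
```

Keeping the state outside the function matters for socket input: a read may end right after the CR, and the LF then arrives at the start of the next block.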
! 445:
2.31 frystyk 446: <HR>
447:
448: <B><IMG
2.37.2.1! frystyk 449: ALIGN=middle SRC="http://www.w3.org/hypertext/WWW/Icons/32x32/caution.gif"> ATTENTION <IMG ALIGN=middle SRC="http://www.w3.org/hypertext/WWW/Icons/32x32/caution.gif"> <P>
2.31 frystyk 450:
451: THE REST OF THE FUNCTIONS DEFINED IN THIS MODULE ARE GOING TO BE
452: OBSOLETE SO DO NOT USE THEM - THEY ARE NOT REENTRANT.</B>
453:
454: <HR>
455:
2.1 timbl 456: <H2><A
2.10 timbl 457: NAME="z1">HTCopy: Copy a socket to a stream</A></H2>This is used by the protocol engines
2.6 secret 458: to send data down a stream, typically
2.22 timbl 459: one which has been generated by HTStreamStack.
460: Returns the number of bytes transferred.
2.1 timbl 461:
2.36 frystyk 462: <PRE>
463: extern int HTCopy PARAMS((SOCKFD file_number,
464: HTStream * sink));
2.6 secret 465: </PRE>
2.36 frystyk 466:
2.6 secret 467: <H2><A
2.10 timbl 468: NAME="c6">HTFileCopy: Copy a file to a stream</A></H2>This is used by the protocol engines
2.6 secret 469: to send data down a stream, typically
2.7 timbl 470: one which has been generated by HTStreamStack.
471: It is currently called by <A
2.12 timbl 472: NAME="z9" HREF="#c7">HTParseFile</A>
2.6 secret 473: <PRE>extern void HTFileCopy PARAMS((
474: FILE* fp,
475: HTStream* sink));
476:
477:
2.7 timbl 478: </PRE>
479: <H2><A
2.10 timbl 480: NAME="c2">HTCopyNoCR: Copy a socket to a stream,
2.7 timbl 481: stripping CR characters.</A></H2>It is slower than <A
2.12 timbl 482: NAME="z2" HREF="#z1">HTCopy</A> .
2.1 timbl 483: <PRE>
484: extern void HTCopyNoCR PARAMS((
2.36 frystyk 485: SOCKFD file_number,
2.1 timbl 486: HTStream* sink));
487:
2.16 luotonen 488:
489: </PRE>
2.1 timbl 490: <H2>HTParseSocket: Parse a socket given
491: its format</H2>This routine is called by protocol
492: modules to load an object. It uses <A
2.12 timbl 493: NAME="z4" HREF="#z3">
2.1 timbl 494: HTStreamStack</A> and the copy routines
495: above. Returns HT_LOADED if successful,
496: &lt;0 if not.
497: <PRE>extern int HTParseSocket PARAMS((
498: HTFormat format_in,
2.36 frystyk 499: SOCKFD file_number,
2.13 timbl 500: HTRequest * request));
2.6 secret 501:
502: </PRE>
503: <H2><A
2.10 timbl 504: NAME="c1">HTParseFile: Parse a File through
2.7 timbl 505: a file pointer</A></H2>This routine is called by protocol
506: modules to load an object. It uses <A
2.12 timbl 507: NAME="z4" HREF="#z3"> HTStreamStack</A>
2.7 timbl 508: and <A
2.12 timbl 509: NAME="c7" HREF="#c6">HTFileCopy</A>. Returns HT_LOADED
2.7 timbl 510: if successful, &lt;0 if not.
2.6 secret 511: <PRE>extern int HTParseFile PARAMS((
512: HTFormat format_in,
513: FILE *fp,
2.13 timbl 514: HTRequest * request));
2.8 timbl 515:
516: </PRE>
2.21 frystyk 517:
2.31 frystyk 518:
519: <H3>Get next character from buffer</H3>
520:
521: <PRE>extern int HTInputSocket_getCharacter PARAMS((HTInputSocket* isoc));
2.21 frystyk 522: </PRE>
523:
2.31 frystyk 524: <H3>Read block from input socket</H3>Read *len characters and return a
525: buffer (don't free) containing *len
526: characters ( *len may have changed).
527: The buffer is not NUL-terminated.
528: <PRE>extern char * HTInputSocket_getBlock PARAMS((HTInputSocket * isoc,
529: int * len));
530:
2.32 frystyk 531: extern char * HTInputSocket_getLine PARAMS((HTInputSocket * isoc));
532: extern char * HTInputSocket_getUnfoldedLine PARAMS((HTInputSocket * isoc));
533: extern char * HTInputSocket_getStatusLine PARAMS((HTInputSocket * isoc));
534: extern BOOL HTInputSocket_seemsBinary PARAMS((HTInputSocket * isoc));
2.31 frystyk 535:
2.1 timbl 536: #endif
537:
2.31 frystyk 538: </PRE>
539:
540:
541: End of definition module
542:
543: </BODY>
2.10 timbl 544: </HTML>