Annotation of libwww/Library/src/HTFormat.html, revision 2.34
2.10 timbl 1: <HTML>
2: <HEAD>
2.31 frystyk 3: <TITLE>The Format Manager in the WWW Library</TITLE>
2.15 timbl 4: <NEXTID N="z18">
2.10 timbl 5: </HEAD>
2.1 timbl 6: <BODY>
2.27 frystyk 7:
2.31 frystyk 8: <H1>The Format Manager</H1>
2.33 frystyk 9:
10: <PRE>
11: /*
12: ** (c) COPYRIGHT CERN 1994.
13: ** Please first read the full copyright statement in the file COPYRIGH.
14: */
15: </PRE>
2.31 frystyk 16:
17: Here we describe the functions of the HTFormat module which handles
18: conversion between different data representations. (In MIME parlance,
19: a representation is known as a content-type. In <A
20: HREF="http://info.cern.ch/hypertext/WWW/TheProject.html">WWW</A> the
21: term <EM>format</EM> is often used as it is shorter). The content of
22: this module is:
23:
24: <UL>
25: <LI><A HREF="#z15">Buffering for I/O</A>
26: <LI><A HREF="#Read">Read from the Network</A>
27: <LI><A HREF="#FormatTypes">Predefined Format Types</A>
28: <LI><A HREF="#Encoding">Registration of Accepted Content Encodings</A>
29: <LI><A HREF="#Language">Registration of Accepted Content Languages</A>
30: <LI><A HREF="#Type">Registration of Accepted Converters and Presenters</A>
31: <LI><A HREF="#Rank">Ranking of Accepted Formats</A>
32: <LI><A HREF="#z3">The Stream Stack</A>
33: </UL>
34:
35: This module is implemented by <A HREF="HTFormat.c">HTFormat.c</A>, and
36: it is a part of the <A NAME="z10"
37: HREF="http://info.cern.ch/hypertext/WWW/Library/User/Guide/Guide.html">Library
38: of Common Code</A>.
2.27 frystyk 39:
2.31 frystyk 40: <PRE>
41: #ifndef HTFORMAT_H
2.1 timbl 42: #define HTFORMAT_H
43:
2.31 frystyk 44: #include <A HREF="HTUtils.html">"HTUtils.h"</A>
45: #include <A HREF="HTStream.html">"HTStream.h"</A>
46: #include <A HREF="HTAtom.html">"HTAtom.h"</A>
47: #include <A HREF="HTList.html">"HTList.h"</A>
2.1 timbl 48:
49: #ifdef SHORT_NAMES
50: #define HTOutputSource HTOuSour
51: #define HTOutputBinary HTOuBina
52: #endif
2.31 frystyk 53: </PRE>
2.1 timbl 54:
2.31 frystyk 55: <H2><A NAME="z15">Buffering for network I/O</A></H2>
2.18 luotonen 56:
2.31 frystyk 57: These routines provide buffering for READ (and future WRITE) to the
58: network. They are used by all the protocol modules. The size of the
59: buffer, <CODE>INPUT_BUFFER_SIZE</CODE>, is a compromise between speed
60: and memory.
2.18 luotonen 61:
2.31 frystyk 62: <PRE>
63: #define INPUT_BUFFER_SIZE 8192
2.18 luotonen 64:
2.17 luotonen 65: typedef struct _socket_buffer {
2.25 luotonen 66: char input_buffer[INPUT_BUFFER_SIZE];
2.17 luotonen 67: char * input_pointer;
68: char * input_limit;
69: int input_file_number;
70: } HTInputSocket;
2.31 frystyk 71: </PRE>
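
As an illustration of how the pointer and limit fields might cooperate, here is a minimal sketch of a refill-on-empty reader. It is not the code in HTFormat.c: the names <CODE>SketchInput</CODE>, <CODE>sketch_input_init</CODE> and <CODE>sketch_input_getc</CODE> are hypothetical, and the use of POSIX <CODE>read()</CODE> is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>                 /* read(), ssize_t (POSIX, assumed) */

#define SKETCH_BUFFER_SIZE 8192     /* mirrors INPUT_BUFFER_SIZE above */

/* Hypothetical stand-in for HTInputSocket, with the same four fields. */
typedef struct {
    char   input_buffer[SKETCH_BUFFER_SIZE];
    char * input_pointer;           /* next unread byte */
    char * input_limit;             /* one past the last valid byte */
    int    input_file_number;       /* descriptor to refill from */
} SketchInput;

/* Bind the buffer to a descriptor and mark it empty. */
static void sketch_input_init(SketchInput * in, int fd)
{
    assert(in != NULL);
    in->input_pointer = in->input_limit = in->input_buffer;
    in->input_file_number = fd;
}

/* Return the next byte as 0..255, or -1 on end of input or read error. */
static int sketch_input_getc(SketchInput * in)
{
    assert(in != NULL);
    if (in->input_pointer >= in->input_limit) {
        /* Buffer exhausted: one system call refills up to 8192 bytes. */
        ssize_t n = read(in->input_file_number,
                         in->input_buffer, sizeof in->input_buffer);
        if (n <= 0) return -1;
        in->input_pointer = in->input_buffer;
        in->input_limit   = in->input_buffer + n;
    }
    return (unsigned char) *in->input_pointer++;
}
```

A larger buffer means fewer system calls per document but more memory per open connection, which is the trade-off that <CODE>INPUT_BUFFER_SIZE</CODE> settles.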
72:
73: <H3>Create an Input Buffer</H3>
2.17 luotonen 74:
2.31 frystyk 75: This function allocates an input buffer and binds it to the socket
76: descriptor given as parameter.
77:
78: <PRE>
79: extern HTInputSocket* HTInputSocket_new PARAMS((int file_number));
2.17 luotonen 80: </PRE>
81:
2.31 frystyk 82: <H3>Free an Input Buffer</H3>
83:
84: <PRE>
85: extern void HTInputSocket_free PARAMS((HTInputSocket * isoc));
2.17 luotonen 86: </PRE>
87:
2.31 frystyk 88: <A NAME="Read"><H2>Read from the Network</H2></A>
89:
90: This function has replaced many other functions for doing read from
91: the network. It automaticly converts from ASCII if we are on a
92: NON-ASCII machine. This assumes that we do <B>not</B> use this
93: function to read a local file on a NON-ASCII machine. The following
94: type definition is to make life easier when having a state machine
95: looking for a <CODE><CRLF></CODE> sequence.
96:
97: <PRE>
98: typedef enum _HTSocketEOL {
99: EOL_ERR = -1,
100: EOL_BEGIN = 0,
101: EOL_FCR,
102: EOL_FLF,
103: EOL_DOT,
104: EOL_SCR,
105: EOL_SLF
106: } HTSocketEOL;
2.17 luotonen 107:
2.32 frystyk 108: extern int HTInputSocket_read PARAMS((HTInputSocket * isoc,
2.31 frystyk 109: HTStream * target));
2.17 luotonen 110: </PRE>
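
The header does not say what each state means. The sketch below assumes one plausible reading: the states track a CRLF, a dot, and a second CRLF, as used to end a dot-terminated text block. The enumerators are copied from <CODE>HTSocketEOL</CODE> so the sketch compiles on its own. <CODE>sketch_eol_step</CODE> and <CODE>sketch_find_terminator</CODE> are hypothetical names, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Same enumerators as HTSocketEOL, redeclared for a standalone sketch. */
typedef enum {
    EOL_ERR = -1, EOL_BEGIN = 0,
    EOL_FCR, EOL_FLF, EOL_DOT, EOL_SCR, EOL_SLF
} SketchEOL;

/* Advance by one character. Assumed meaning: FCR/FLF = first CR/LF seen,
   DOT = '.' right after them, SCR/SLF = second CR/LF seen. */
static SketchEOL sketch_eol_step(SketchEOL state, char ch)
{
    switch (state) {
    case EOL_FCR: if (ch == '\n') return EOL_FLF; break;
    case EOL_FLF: if (ch == '.')  return EOL_DOT; break;
    case EOL_DOT: if (ch == '\r') return EOL_SCR; break;
    case EOL_SCR: if (ch == '\n') return EOL_SLF; break;
    default: break;
    }
    /* Any mismatch restarts the search; a CR may begin a new match. */
    return ch == '\r' ? EOL_FCR : EOL_BEGIN;
}

/* Return the offset just past the first CRLF "." CRLF, or -1 if absent. */
static long sketch_find_terminator(const char * buf, size_t len)
{
    SketchEOL state = EOL_BEGIN;
    size_t i;
    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        state = sketch_eol_step(state, buf[i]);
        if (state == EOL_SLF) return (long) (i + 1);
    }
    return -1;
}
```

Because the state is a single value carried between calls, the same step function keeps working when the terminator is split across two network reads.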
111:
2.31 frystyk 112: <H2><A NAME="z11">Convert Net ASCII to local representation</A></H2>
2.22 timbl 113:
2.31 frystyk 114: This is a filter stream suitable for taking text from a socket and
115: passing it into a stream which expects text in the local C
116: representation. It does newline conversion. As usual, pass its
117: output stream to it when creating it.
2.17 luotonen 118:
2.31 frystyk 119: <PRE>
120: extern HTStream * HTNetToText PARAMS ((HTStream * sink));
2.17 luotonen 121: </PRE>
2.25 luotonen 122:
2.31 frystyk 123: <A NAME="FormatTypes"><H2>The HTFormat types</H2></A>
2.25 luotonen 124:
2.31 frystyk 125: We use the <A HREF="HTAtom.html">HTAtom</A> object for holding
126: representations. This allows faster manipulation (comparison and
127: copying) than if we stayed with strings. These macros (which used to
128: be constants) define some basic internally referenced
129: representations.<P>
2.25 luotonen 130:
2.31 frystyk 131: <PRE>
132: typedef HTAtom * HTFormat;
2.28 frystyk 133: </PRE>
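
To show why atoms make comparison cheap, here is a minimal sketch of string interning: equal names always yield the same pointer, so two formats can be compared with <CODE>==</CODE> instead of <CODE>strcmp</CODE>. This is not the HTAtom implementation. <CODE>SketchAtom</CODE> and <CODE>sketch_atom_for</CODE> are hypothetical, and the single linked list is chosen only for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned-string record; not the library's HTAtom layout. */
typedef struct SketchAtom {
    struct SketchAtom * next;
    char * name;
} SketchAtom;

static SketchAtom * sketch_atoms = NULL;    /* all atoms created so far */

/* Return the unique atom for name, creating it on first use.
   Returns NULL only if memory runs out. */
static SketchAtom * sketch_atom_for(const char * name)
{
    SketchAtom * a;
    size_t n;
    assert(name != NULL);
    for (a = sketch_atoms; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0) return a;   /* already interned */
    a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    n = strlen(name) + 1;
    a->name = malloc(n);
    if (a->name == NULL) { free(a); return NULL; }
    memcpy(a->name, name, n);
    a->next = sketch_atoms;                         /* push on the list */
    sketch_atoms = a;
    return a;
}
```

The string comparison cost is paid once per lookup; after that, copying a format is a pointer copy and testing two formats for equality is a pointer comparison.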
134:
135: <H3>Internal ones</H3>
136:
2.31 frystyk 137: The <CODE>www/xxx</CODE> ones are of course not MIME
138: standard. <P>
2.28 frystyk 139:
2.31 frystyk 140: <CODE>star/star</CODE> is an output format which leaves the input
141: untouched. It is useful for diagnostics, and for users who want to see
142: the original, whatever it is.
2.28 frystyk 143:
144: <PRE>
145: #define WWW_SOURCE HTAtom_for("*/*") /* Whatever it was originally */
146: </PRE>
147:
2.31 frystyk 148: <CODE>www/present</CODE> represents the user's perception of the
149: document. If you convert to www/present, you present the material to
150: the user.
2.10 timbl 151:
2.28 frystyk 152: <PRE>
153: #define WWW_PRESENT HTAtom_for("www/present") /* The user's perception */
154: </PRE>
155:
156: The message/rfc822 format means a MIME message or a plain text message
157: with no MIME header. This is what is returned by an HTTP server.
158:
159: <PRE>
160: #define WWW_MIME HTAtom_for("www/mime") /* A MIME message */
161: </PRE>
162:
2.31 frystyk 163: <CODE>www/print</CODE> is like www/present except it represents a
164: printed copy.
2.28 frystyk 165:
166: <PRE>
167: #define WWW_PRINT HTAtom_for("www/print") /* A printed copy */
168: </PRE>
2.13 timbl 169:
2.31 frystyk 170: <CODE>www/unknown</CODE> is a really unknown type. Some default
171: action is appropriate.
2.13 timbl 172:
2.28 frystyk 173: <PRE>
174: #define WWW_UNKNOWN HTAtom_for("www/unknown")
2.13 timbl 175: </PRE>
2.28 frystyk 176:
177:
2.31 frystyk 178: <H3>MIME ones</H3>
2.28 frystyk 179:
2.31 frystyk 180: These are regular MIME types. Others can be added!
2.28 frystyk 181:
182: <PRE>
183: #define WWW_PLAINTEXT HTAtom_for("text/plain")
2.1 timbl 184: #define WWW_POSTSCRIPT HTAtom_for("application/postscript")
185: #define WWW_RICHTEXT HTAtom_for("application/rtf")
2.10 timbl 186: #define WWW_AUDIO HTAtom_for("audio/basic")
2.1 timbl 187: #define WWW_HTML HTAtom_for("text/html")
2.11 timbl 188: #define WWW_BINARY HTAtom_for("application/octet-stream")
2.26 frystyk 189: #define WWW_VIDEO HTAtom_for("video/mpeg")
2.28 frystyk 190: </PRE>
191:
2.31 frystyk 192: Extra types used in the library (EXPERIMENT)
2.28 frystyk 193:
194: <PRE>
195: #define WWW_NEWSLIST HTAtom_for("text/newslist")
196: </PRE>
2.7 timbl 197:
2.28 frystyk 198: We must include the following file after defining HTFormat, to which
2.10 timbl 199: it makes reference.
2.28 frystyk 200:
2.10 timbl 201: <H2>The HTEncoding type</H2>
202:
2.31 frystyk 203: <PRE>
204: typedef HTAtom* HTEncoding;
205: </PRE>
206:
207: The following are values for the MIME types:
208:
209: <PRE>
210: #define WWW_ENC_7BIT HTAtom_for("7bit")
2.10 timbl 211: #define WWW_ENC_8BIT HTAtom_for("8bit")
212: #define WWW_ENC_BINARY HTAtom_for("binary")
2.31 frystyk 213: </PRE>
214:
215: We also add
216:
217: <PRE>
218: #define WWW_ENC_COMPRESS HTAtom_for("compress")
219: </PRE>
2.10 timbl 220:
2.31 frystyk 221: <A NAME="Encoding"><H2>Registration of Accepted Content Encodings</H2></A>
2.10 timbl 222:
2.31 frystyk 223: This function is not currently used.
2.1 timbl 224:
2.31 frystyk 225: <PRE>
226: typedef struct _HTContentDescription {
227: char * filename;
228: HTAtom * content_type;
229: HTAtom * content_language;
230: HTAtom * content_encoding;
231: int content_length;
232: float quality;
233: } HTContentDescription;
234:
2.32 frystyk 235: extern void HTAcceptEncoding PARAMS((HTList * list,
2.31 frystyk 236: char * enc,
237: float quality));
2.1 timbl 238: </PRE>
239:
2.31 frystyk 240: <A NAME="Language"><H2>Registration of Accepted Content Languages</H2></A>
2.28 frystyk 241:
2.31 frystyk 242: This function is not currently used.
2.28 frystyk 243:
244: <PRE>
2.32 frystyk 245: extern void HTAcceptLanguage PARAMS((HTList * list,
2.31 frystyk 246: char * lang,
247: float quality));
248: </PRE>
2.28 frystyk 249:
2.31 frystyk 250: <A NAME="Type"><H2>Registration of Accepted Converters and
251: Presenters</H2></A>
252:
253: A <CODE><A NAME="z12">converter</A></CODE> is a stream with a special
254: set of parameters and which is registered as capable of converting
255: from a MIME type to something else (maybe another MIME-type). A
256: <CODE>presenter</CODE> is a module (possibly an external program)
257: which can present a graphic object of a certain MIME type to the
258: user. That is, <CODE>presenters</CODE> are normally used to present
259: graphic objects that the <CODE>converters</CODE> are not able to
260: handle. Data is transferred to the external program using the <A
261: HREF="HTFWriter.html">HTSaveAndExecute</A> stream which writes to a
262: local file. This stream is actually a normal <CODE>converter</CODE>,
263: i.e., a stream having the following set of parameters:<P>
264:
265: <PRE>
266: #include "HTAccess.h" /* Required for HTRequest definition */
267:
268: typedef HTStream * HTConverter PARAMS((HTRequest * request,
269: void * param,
270: HTFormat input_format,
271: HTFormat output_format,
272: HTStream * output_stream));
273: </PRE>
274:
275: Both <CODE>converters</CODE> and <CODE>presenters</CODE> are set up in
276: a list which is used by the <A HREF="#z3">StreamStack</A> module to
277: find the best way to pass the information to the user. <P>
278:
279: The <CODE>HTPresentation</CODE> structure contains both
280: <CODE>converters</CODE> and <CODE>presenters</CODE>, and it is defined
281: as:
282:
283: <PRE>
284: typedef struct _HTPresentation {
285: HTAtom* rep; /* representation name atomized */
286: HTAtom* rep_out; /* resulting representation */
287: HTConverter *converter; /* The routine to gen the stream stack */
288: char * command; /* MIME-format string */
289: char * test_command; /* MIME-format string */
290: float quality; /* Between 0 (bad) and 1 (good) */
2.1 timbl 291: float secs;
292: float secs_per_byte;
2.31 frystyk 293: } HTPresentation;
2.28 frystyk 294: </PRE>
295:
2.1 timbl 296:
2.31 frystyk 297: <A NAME="z17"><H3>Global List of Converters</H3></A>
2.1 timbl 298:
2.31 frystyk 299: This module keeps a global list of converters. This can be used to get
300: the set of supported formats. <P>
2.12 timbl 301:
2.31 frystyk 302: <B>NOTE:</B> There is also a conversion list associated with each
303: request in the <A HREF="HTAccess.html#z1">HTRequest structure</A>.
304:
305: <PRE>
306: extern HTList * HTConversions;
2.1 timbl 307: </PRE>
2.31 frystyk 308:
309: <H3>Register a Presenter</H3>
310:
2.1 timbl 311: <DL>
2.31 frystyk 312: <DT>conversions
313: <DD>The list of <CODE>converters</CODE> and <CODE>presenters</CODE>
314: <DT>representation
315: <DD>the <A HREF="#Type">MIME-style</A> format name
2.1 timbl 316: <DT>command
2.31 frystyk 317: <DD>the MAILCAP-style command template
2.1 timbl 318: <DT>quality
2.31 frystyk 319: <DD>A degradation factor [0..1]
2.1 timbl 320: <DT>maxbytes
2.31 frystyk 321: <DD>A limit on the length acceptable as input (0 infinite)
2.1 timbl 322: <DT>maxsecs
2.31 frystyk 323: <DD>A limit on the time user will wait (0 for infinity)
2.1 timbl 324: </DL>
325:
2.31 frystyk 326: <PRE>
327: extern void HTSetPresentation PARAMS((HTList * conversions,
328: CONST char * representation,
329: CONST char * command,
330: CONST char * test_command,
331: float quality,
332: float secs,
333: float secs_per_byte));
334: </PRE>
2.1 timbl 335:
2.31 frystyk 336: <H3>Register a Converter</H3>
2.1 timbl 337:
338: <DL>
2.31 frystyk 339: <DT>conversions
340: <DD>The list of <CODE>converters</CODE> and <CODE>presenters</CODE>
2.1 timbl 341: <DT>rep_in
2.31 frystyk 342: <DD>the <A HREF="#Type">MIME-style</A> format name
2.1 timbl 343: <DT>rep_out
2.31 frystyk 344: <DD>is the resulting content-type after the conversion
2.1 timbl 345: <DT>converter
2.31 frystyk 346: <DD>is the routine to call which actually does the conversion
347: <DT>quality
348: <DD>A degradation factor [0..1]
349: <DT>maxbytes
350: <DD>A limit on the length acceptable as input (0 infinite)
351: <DT>maxsecs
352: <DD>A limit on the time user will wait (0 for infinity)
2.1 timbl 353: </DL>
354:
355: <PRE>
2.31 frystyk 356: extern void HTSetConversion PARAMS((HTList * conversions,
357: CONST char * rep_in,
358: CONST char * rep_out,
359: HTConverter * converter,
360: float quality,
361: float secs,
362: float secs_per_byte));
363: </PRE>
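
The following sketch illustrates the idea of registering conversions in a list and later picking a direct match. It is a simplified model under stated assumptions, not the library code: it uses plain strings instead of atoms, a fixed-size array instead of <CODE>HTList</CODE>, and omits the converter routine. All <CODE>Sketch</CODE>/<CODE>sketch_</CODE> names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_CONVERSIONS 16

/* Hypothetical record, loosely modeled on HTPresentation. */
typedef struct {
    const char * rep_in;
    const char * rep_out;
    float quality;              /* between 0 (bad) and 1 (good) */
    float secs;
    float secs_per_byte;
} SketchConversion;

typedef struct {
    SketchConversion items[SKETCH_MAX_CONVERSIONS];
    size_t count;
} SketchConversionList;

/* Append one conversion; returns 1 on success, 0 if the list is full. */
static int sketch_set_conversion(SketchConversionList * list,
                                 const char * rep_in, const char * rep_out,
                                 float quality, float secs,
                                 float secs_per_byte)
{
    SketchConversion * c;
    assert(list != NULL && rep_in != NULL && rep_out != NULL);
    if (list->count >= SKETCH_MAX_CONVERSIONS) return 0;
    c = &list->items[list->count++];
    c->rep_in = rep_in;
    c->rep_out = rep_out;
    c->quality = quality;
    c->secs = secs;
    c->secs_per_byte = secs_per_byte;
    return 1;
}

/* Return the highest-quality direct conversion, or NULL if none matches. */
static const SketchConversion *
sketch_find_conversion(const SketchConversionList * list,
                       const char * rep_in, const char * rep_out)
{
    const SketchConversion * best = NULL;
    size_t i;
    assert(list != NULL && rep_in != NULL && rep_out != NULL);
    for (i = 0; i < list->count; i++) {
        const SketchConversion * c = &list->items[i];
        if (strcmp(c->rep_in, rep_in) == 0 &&
            strcmp(c->rep_out, rep_out) == 0 &&
            (best == NULL || c->quality > best->quality))
            best = c;
    }
    return best;
}
```

Choosing by highest quality is one possible selection rule; the sketch does not claim it is the rule the library applies.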
364:
365: <H3>Set up Default Presenters and Converters</H3>
366:
367: A default set of <CODE>converters</CODE> and <CODE>presenters</CODE>
368: are defined in <A HREF="HTInit.c">HTInit.c</A> (or <A
369: HREF="../../Daemon/Inplementation/HTSUtils.c">HTSInit.c</A> in the
370: server). <P>
371:
372: <B>NOTE: </B> No automatic initialization is done in the Library, so
373: this is for the application to do.
374:
375: <PRE>
376: extern void HTFormatInit PARAMS((HTList * conversions));
377: </PRE>
378:
379: This function also exists in a version where no
380: <CODE>presenters</CODE> are initialized. This is intended for Non
381: Interactive Mode, e.g., in the Line Mode Browser.
382:
383: <PRE>
384: extern void HTFormatInitNIM PARAMS((HTList * conversions));
385: </PRE>
386:
387: <H3>Remove presentations and conversions</H3>
2.1 timbl 388:
2.34 ! frystyk 389: This function deletes the <EM>LOCAL </EM>list of
! 390: <CODE>converters</CODE> and <CODE>presenters</CODE> associated with
! 391: each <A HREF="HTAccess.html#z1">HTRequest structure</A>.
2.1 timbl 392:
2.31 frystyk 393: <PRE>
394: extern void HTFormatDelete PARAMS((HTRequest * request));
395: </PRE>
396:
2.34 ! frystyk 397: This function cleans up the <EM>GLOBAL</EM> list of converters. The
! 398: function is called from <A
! 399: HREF="HTAccess.html#Library">HTLibTerminate</A>.
! 400:
! 401: <PRE>
! 402: extern void HTDisposeConversions NOPARAMS;
! 403: </PRE>
2.31 frystyk 404:
405: <A NAME="Rank"><H2>Ranking of Accepted Formats</H2></A>
406:
407: This function is not currently used.
408:
409: <PRE>
2.32 frystyk 410: extern BOOL HTRank PARAMS((HTList * possibilities,
2.31 frystyk 411: HTList * accepted_content_types,
412: HTList * accepted_content_languages,
413: HTList * accepted_content_encodings));
2.1 timbl 414: </PRE>
2.31 frystyk 415:
416: <H2><A NAME="z3">HTStreamStack</A></H2>
417:
418: This is the routine which actually sets up the conversion. It
419: currently checks only for direct conversions, but multi-stage
420: conversions are foreseen. It takes a stream into which the output
421: should be sent in the final format, builds the conversion stack, and
422: returns a stream into which the data in the input format should be
423: fed.<P>
424:
2.23 luotonen 425: If <CODE>guess</CODE> is true and input format is
2.31 frystyk 426: <CODE>www/unknown</CODE>, try to guess the format by looking at the
427: first few bytes of the stream. <P>
2.1 timbl 428:
2.31 frystyk 429: <PRE>
2.32 frystyk 430: extern HTStream * HTStreamStack PARAMS((HTFormat rep_in,
2.31 frystyk 431: HTFormat rep_out,
432: HTStream * output_stream,
433: HTRequest * request,
434: BOOL guess));
2.1 timbl 435: </PRE>
2.31 frystyk 436:
437: <H3>Find the cost of a filter stack</H3>
438:
439: Must return the cost of the same stack which HTStreamStack would set
2.1 timbl 440: up.
441:
2.31 frystyk 442: <PRE>
2.1 timbl 443: #define NO_VALUE_FOUND -1e20 /* returned if none found */
444:
2.31 frystyk 445: extern float HTStackValue PARAMS((HTList * conversions,
446: HTFormat format_in,
447: HTFormat format_out,
448: float initial_value,
449: long int length));
2.1 timbl 450: </PRE>
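
The text above does not state how the cost is computed. As a hedged illustration only, the sketch below assumes a model in which the initial value is scaled by the conversion quality and then reduced by the estimated conversion time relative to a time budget. The <CODE>max_secs</CODE> parameter, the <CODE>SketchCost</CODE> type and the function name are hypothetical; only the "none found" sentinel value comes from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NO_VALUE_FOUND (-1e20f)  /* same magnitude as NO_VALUE_FOUND */

/* Hypothetical table row carrying the fields the cost model needs. */
typedef struct {
    const char * rep_in;
    const char * rep_out;
    float quality;
    float secs;
    float secs_per_byte;
} SketchCost;

/* Value of the first direct conversion from format_in to format_out.
   Assumed model: initial_value * quality, minus the estimated time as a
   fraction of max_secs. max_secs <= 0 disables the time penalty. */
static float sketch_stack_value(const SketchCost * table, size_t count,
                                const char * format_in,
                                const char * format_out,
                                float initial_value, long length,
                                float max_secs)
{
    size_t i;
    assert(table != NULL || count == 0);
    assert(format_in != NULL && format_out != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(table[i].rep_in, format_in) == 0 &&
            strcmp(table[i].rep_out, format_out) == 0) {
            float value = initial_value * table[i].quality;
            if (max_secs > 0.0f)
                value -= ((float) length * table[i].secs_per_byte
                          + table[i].secs) / max_secs;
            return value;
        }
    }
    return SKETCH_NO_VALUE_FOUND;
}
```

Whatever the real formula is, the constraint stated above still applies: the value must describe the same stack that HTStreamStack would build for the same formats.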
2.31 frystyk 451:
452: <HR>
453:
454: <B><IMG
455: ALIGN=middle SRC="http://info.cern.ch/hypertext/WWW/Icons/32x32/caution.gif"> ATTENTION <IMG ALIGN=middle SRC="http://info.cern.ch/hypertext/WWW/Icons/32x32/caution.gif"> <P>
456:
457: THE REST OF THE FUNCTIONS DEFINED IN THIS MODULE ARE GOING TO BE
458: OBSOLETE SO DO NOT USE THEM - THEY ARE NOT REENTRANT.</B>
459:
460: <HR>
461:
2.1 timbl 462: <H2><A
2.10 timbl 463: NAME="z1">HTCopy: Copy a socket to a stream</A></H2>This is used by the protocol engines
2.6 secret 464: to send data down a stream, typically
2.22 timbl 465: one which has been generated by HTStreamStack.
466: Returns the number of bytes transferred.
2.19 luotonen 467: <PRE>extern int HTCopy PARAMS((
2.1 timbl 468: int file_number,
469: HTStream* sink));
470:
471:
2.6 secret 472: </PRE>
473: <H2><A
2.10 timbl 474: NAME="c6">HTFileCopy: Copy a file to a stream</A></H2>This is used by the protocol engines
2.6 secret 475: to send data down a stream, typically
2.7 timbl 476: one which has been generated by HTStreamStack.
477: It is currently called by <A
2.12 timbl 478: NAME="z9" HREF="#c7">HTParseFile</A>
2.6 secret 479: <PRE>extern void HTFileCopy PARAMS((
480: FILE* fp,
481: HTStream* sink));
482:
483:
2.7 timbl 484: </PRE>
485: <H2><A
2.10 timbl 486: NAME="c2">HTCopyNoCR: Copy a socket to a stream,
2.7 timbl 487: stripping CR characters.</A></H2>It is slower than <A
2.12 timbl 488: NAME="z2" HREF="#z1">HTCopy</A> .
2.1 timbl 489: <PRE>
490: extern void HTCopyNoCR PARAMS((
491: int file_number,
492: HTStream* sink));
493:
2.16 luotonen 494:
495: </PRE>
2.1 timbl 496: <H2>HTParseSocket: Parse a socket given
497: its format</H2>This routine is called by protocol
498: modules to load an object. Uses <A
2.12 timbl 499: NAME="z4" HREF="#z3">
2.1 timbl 500: HTStreamStack</A> and the copy routines
501: above. Returns HT_LOADED if successful,
502: <0 if not.
503: <PRE>extern int HTParseSocket PARAMS((
504: HTFormat format_in,
505: int file_number,
2.13 timbl 506: HTRequest * request));
2.6 secret 507:
508: </PRE>
509: <H2><A
2.10 timbl 510: NAME="c1">HTParseFile: Parse a File through
2.7 timbl 511: a file pointer</A></H2>This routine is called by protocol
512: modules to load an object. Uses <A
2.12 timbl 513: NAME="z4" HREF="#z3"> HTStreamStack</A>
2.7 timbl 514: and <A
2.12 timbl 515: NAME="c7" HREF="#c6">HTFileCopy</A> . Returns HT_LOADED
2.7 timbl 516: if successful, <0 if not.
2.6 secret 517: <PRE>extern int HTParseFile PARAMS((
518: HTFormat format_in,
519: FILE *fp,
2.13 timbl 520: HTRequest * request));
2.8 timbl 521:
522: </PRE>
2.21 frystyk 523:
2.31 frystyk 524:
525: <H3>Get next character from buffer</H3>
526:
527: <PRE>extern int HTInputSocket_getCharacter PARAMS((HTInputSocket* isoc));
2.21 frystyk 528: </PRE>
529:
2.31 frystyk 530: <H3>Read block from input socket</H3>Read *len characters and return a
531: buffer (don't free) containing *len
532: characters ( *len may have changed).
533: Buffer is not NUL-terminated.
534: <PRE>extern char * HTInputSocket_getBlock PARAMS((HTInputSocket * isoc,
535: int * len));
536:
2.32 frystyk 537: extern char * HTInputSocket_getLine PARAMS((HTInputSocket * isoc));
538: extern char * HTInputSocket_getUnfoldedLine PARAMS((HTInputSocket * isoc));
539: extern char * HTInputSocket_getStatusLine PARAMS((HTInputSocket * isoc));
540: extern BOOL HTInputSocket_seemsBinary PARAMS((HTInputSocket * isoc));
2.31 frystyk 541:
2.1 timbl 542: #endif
543:
2.31 frystyk 544: </PRE>
545:
546:
547: End of definition module
548:
549: </BODY>
2.10 timbl 550: </HTML>