<!-- libwww/Library/src/HTML.html, revision 2.21 -->
<HTML>
<HEAD>
<TITLE>HTML to rich text converter for libwww</TITLE>
</HEAD>
<BODY>

<H1>The HTML to styled text object converter</H1>

<PRE>
/*
** (c) COPYRIGHT CERN 1994.
** Please first read the full copyright statement in the file COPYRIGH.
*/
</PRE>

This interprets the <A
HREF="http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html">HTML</A>
semantics and some HTMLPlus extensions.<P>

This module is implemented by <A HREF="HTML.c">HTML.c</A>, and it is
part of the <A
HREF="http://info.cern.ch/hypertext/WWW/Library/User/Guide/Guide.html">
Library of Common Code</A>.

<PRE>
#ifndef HTML_H
#define HTML_H

#include "sysdep.h"
#include "HTUtils.h"
#include "HTFormat.h"
#include "HTAnchor.h"
#include "HTMLPDTD.h"

#define DTD HTMLP_dtd

#ifdef SHORT_NAMES
#define HTMLPresentation HTMLPren
#define HTMLPresent HTMLPres
#endif

extern CONST HTStructuredClass HTMLPresentation;
</PRE>

<H2>HTML_new: A structured stream to parse HTML</H2>

When this routine is called, the request structure may contain a <A
NAME="z4" HREF="HTAccess.html#z6">childAnchor</A> value. In that case
it is the responsibility of this module to select the anchor.<P>

<PRE>extern HTStructured* HTML_new PARAMS((HTRequest * request,
                                        void * param,
                                        HTFormat input_format,
                                        HTFormat output_format,
                                        HTStream * output_stream));

</PRE>

<H3>Reopen</H3>

Reopening an existing HTML object allows it to be retained (for
example, by the styled text object) after the structured stream has
been closed. To be actually deleted, the HTML object must be closed
one more time than it has been reopened.

<PRE>
extern void HTML_reopen PARAMS((HTStructured * me));
</PRE>
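The retention rule behaves like a reference count. The following is a hypothetical, self-contained model, not the HTML.c implementation: html_object, html_reopen and html_close are invented names, and the object is only flagged as deleted where real code would free it.

```c
#include <assert.h>

/* Hypothetical model of the HTML object's lifetime; not the real struct. */
typedef struct {
    int reopen_count;  /* reopens not yet matched by a close */
    int deleted;       /* set when the object would really be freed */
} html_object;

/* Counterpart of HTML_reopen: keep the object alive across one more close. */
static void html_reopen(html_object *me)
{
    assert(!me->deleted);
    me->reopen_count++;
}

/* Close the object; returns 1 only for the close that actually deletes it. */
static int html_close(html_object *me)
{
    assert(!me->deleted);
    if (me->reopen_count > 0) {
        me->reopen_count--;   /* an earlier reopen absorbs this close */
        return 0;
    }
    me->deleted = 1;          /* a real implementation would free here */
    return 1;
}
```

Reopening once and then closing twice deletes the object only on the second close.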

<H2>Converters</H2>

These are the converters implemented in this module:

<PRE>
#ifndef NO_EXTERN_TYPEDEF_FUNC
extern HTConverter HTMLToPlain, HTMLToC, HTMLPresent, HTMLToTeX;
#endif
</PRE>

<H2>Selecting internal character set representations</H2>
<PRE>typedef enum _HTMLCharacterSet {
    HTML_ISO_LATIN1,
    HTML_NEXT_CHARS,
    HTML_PC_CP950
} HTMLCharacterSet;

extern void HTMLUseCharacterSet PARAMS((HTMLCharacterSet i));

</PRE>
<H2>Record error message as a hypertext object</H2>
The error message should be marked as an error so that it can be
reloaded later. This implementation just displays an error message
and leaves the document unloaded.
<H3>On entry,</H3>
<DL>
<DT>sink
<DD> is a stream to the output device, if any
<DT>number
<DD> is the HTTP error number
<DT>message
<DD> is the human-readable message.
</DL>

<H3>On exit,</H3>a return code such as HT_LOADED if the object
exists, otherwise a value &lt; 0.
<PRE>extern int HTLoadError PARAMS((
    HTRequest * req,
    int number,
    CONST char * message));


</PRE>
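A caller only needs the sign of the result to tell success from failure. The sketch below models the described behaviour with a hypothetical stub, load_error_stub, which writes the message to an optional FILE * sink (standing in for the output stream) and returns a negative value. The HT_LOADED value here is a placeholder for this sketch only, not necessarily the library's definition.

```c
#include <assert.h>
#include <stdio.h>

#define HT_LOADED 29999  /* placeholder value for this sketch only; the real
                            constant is defined elsewhere in the library */

/* Hypothetical stand-in for HTLoadError: report the error on an optional
   sink and leave the document unloaded by returning a negative code. */
static int load_error_stub(FILE *sink, int number, const char *message)
{
    assert(message != NULL);
    if (sink != NULL)
        fprintf(sink, "Error %d: %s\n", number, message);
    return -1;  /* any value < 0 means the object was not loaded */
}

/* Interpret a load status as described: non-negative means the object exists. */
static int object_exists(int status)
{
    return status >= 0;
}
```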
<h2>White Space Treatment</h2>
There are a small number of different ways of treating white
space in SGML when mapping from a text object to HTML.
It seems these have to be programmed explicitly.
<pre>
/*
   In text object        \n\n        \n    tab   \n\n\t  Treatment
   --------------------  ----------  ----  ----  ------  ---------
   Address, Blockquote,
   Normal                &lt;P&gt;         &lt;BR&gt;  -             NORMAL
   H1-6                  close+open  &lt;BR&gt;  -             HEADING
   Glossary              &lt;DT&gt;        &lt;DT&gt;  &lt;DD&gt;  &lt;P&gt;     GLOSSARY
   List, Menu            &lt;LI&gt;        &lt;LI&gt;  -     &lt;P&gt;     LIST
   Dir                   &lt;LI&gt;        &lt;LI&gt;  &lt;LI&gt;          DIR
   Pre etc.              \n\n        \n    \t            PRE

*/

typedef enum _white_space_treatment {
    WS_NORMAL,
    WS_HEADING,
    WS_GLOSSARY,
    WS_LIST,
    WS_DIR,
    WS_PRE
} white_space_treatment;

</pre>
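Because the mapping is a fixed table, it can be held as data rather than as code. The following is a hypothetical sketch, not code from HTML.c: the names ws_separator, ws_markup, ws_markup_for and ws_emits are invented, only the \n\n, \n and tab columns are covered, "-" (no markup) is represented by NULL, and the heading case is returned as the literal label "close+open" (close the current heading and open a new one).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum _white_space_treatment {
    WS_NORMAL,
    WS_HEADING,
    WS_GLOSSARY,
    WS_LIST,
    WS_DIR,
    WS_PRE
} white_space_treatment;

/* Hypothetical: the kind of white-space run found in the text object. */
typedef enum {
    SEP_PARAGRAPH,  /* "\n\n" */
    SEP_LINE,       /* "\n"   */
    SEP_TAB         /* "\t"   */
} ws_separator;

/* One row per treatment, one column per separator; NULL means "-" (no markup). */
static const char *const ws_markup[][3] = {
    /* WS_NORMAL   */ { "<P>",        "<BR>", NULL   },
    /* WS_HEADING  */ { "close+open", "<BR>", NULL   },
    /* WS_GLOSSARY */ { "<DT>",       "<DT>", "<DD>" },
    /* WS_LIST     */ { "<LI>",       "<LI>", NULL   },
    /* WS_DIR      */ { "<LI>",       "<LI>", "<LI>" },
    /* WS_PRE      */ { "\n\n",       "\n",   "\t"   },
};

/* Markup to emit for a separator under a treatment, or NULL for none. */
static const char *ws_markup_for(white_space_treatment wst, ws_separator sep)
{
    assert((int)wst >= WS_NORMAL && (int)wst <= WS_PRE);
    assert((int)sep >= SEP_PARAGRAPH && (int)sep <= SEP_TAB);
    return ws_markup[wst][sep];
}

/* Does wst/sep produce exactly `expected` (NULL meaning no markup)? */
static int ws_emits(white_space_treatment wst, ws_separator sep,
                    const char *expected)
{
    const char *m = ws_markup_for(wst, sep);
    if (m == NULL || expected == NULL)
        return m == expected;
    return strcmp(m, expected) == 0;
}
```

Keeping the table as data means a new treatment needs only one more row, not another branch in the converter.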
<h2>Nesting State</h2>
These elements form a tree with an item for each nesting state: that
is, each unique combination of nested elements which has a
specific style.
<pre>
typedef struct _HTNesting {
    void *                  style;            /* HTStyle *: Platform dependent */
    white_space_treatment   wst;
    struct _HTNesting *     parent;
    int                     element_number;
    int                     item_number;      /* only for ordered lists */
    int                     list_level;       /* how deep nested */
    HTList *                children;
    BOOL                    paragraph_break;
    int                     magic;
    BOOL                    object_gens_HTML; /* we don't generate HTML */
} HTNesting;


</pre>
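Keeping one node per unique combination of nested elements amounts to find-or-create on a node's children. The sketch below uses a cut-down, hypothetical node type (nest, with a fixed-size child array in place of the HTList) and assumes that descending into an element reuses an existing child when one matches; it is an illustration of the tree idea, not code taken from HTML.c.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHILDREN 16  /* hypothetical limit; HTNesting keeps an HTList */

/* Cut-down, hypothetical nesting node: only the tree links are modelled. */
typedef struct nest {
    struct nest *parent;
    int element_number;
    int n_children;
    struct nest *children[MAX_CHILDREN];
} nest;

/* Return the child of p for element ele, creating it on first use, so that
   each distinct path of nested elements maps to exactly one node.
   Returns NULL if the child table is full or allocation fails. */
static nest *nest_element(nest *p, int ele)
{
    assert(p != NULL);
    for (int i = 0; i < p->n_children; i++)
        if (p->children[i]->element_number == ele)
            return p->children[i];
    if (p->n_children == MAX_CHILDREN)
        return NULL;
    nest *c = calloc(1, sizeof *c);
    if (c == NULL)
        return NULL;
    c->parent = p;
    c->element_number = ele;
    p->children[p->n_children++] = c;
    return c;
}

/* Number of levels between n and the root (the root is level 0). */
static int nest_level(const nest *n)
{
    int level = 0;
    while (n->parent != NULL) {
        n = n->parent;
        level++;
    }
    return level;
}
```

Asking for the same element under the same parent twice returns the same node, which is what lets a style be attached once per nesting state.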
<H2>Nesting functions</H2>
These functions were new with HTML2.c. They allow the tree
of SGML nesting states to be manipulated, and SGML regenerated from the
style sequence.
<PRE>

extern void HTRegenInit NOPARAMS;

extern void HTRegenCharacter PARAMS((
    char c,
    HTNesting * nesting,
    HTStructured * target));

extern void HTNestingChange PARAMS((
    HTStructured * s,
    HTNesting * old,
    HTNesting * newnest,
    HTChildAnchor * info,
    CONST char * aName));

extern HTNesting * HTMLCommonality PARAMS((
    HTNesting * s1,
    HTNesting * s2));

extern HTNesting * HTNestElement PARAMS((HTNesting * p, int ele));
extern /* HTStyle * */ void * HTStyleForNesting PARAMS((HTNesting * n));

extern HTNesting* HTMLAncestor PARAMS((HTNesting * old, int depth));

extern HTNesting* CopyBranch PARAMS((HTNesting * old, HTNesting * newnest,
    int depth));

extern HTNesting * HTInsertLevel PARAMS((HTNesting * old,
    int element_number,
    int level));
extern HTNesting * HTDeleteLevel PARAMS((HTNesting * old,
    int level));
extern int HTMLElementNumber PARAMS((HTNesting * s));
extern int HTMLLevel PARAMS(( HTNesting * s));

#endif /* end HTML_H */

</PRE>
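Two of these operations reduce to parent-pointer tree walks. The sketch below is a hypothetical model, not the HTML.c code: only the parent link of HTNesting is modelled, ancestor_at assumes that the depth argument of HTMLAncestor counts levels from the root (0 being the root), and commonality returns the deepest nesting state shared by two states, which is what the name and signature of HTMLCommonality suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal nesting node: only the parent link is modelled. */
typedef struct node {
    struct node *parent;
} node;

/* Levels between n and the root (the root has depth 0). */
static int depth_of(const node *n)
{
    int d = 0;
    while (n->parent != NULL) {
        n = n->parent;
        d++;
    }
    return d;
}

/* Ancestor of n at the given depth from the root (assumed meaning of
   HTMLAncestor's depth); NULL if depth is negative or deeper than n. */
static node *ancestor_at(node *n, int depth)
{
    int d = depth_of(n);
    if (depth < 0 || depth > d)
        return NULL;
    while (d > depth) {
        n = n->parent;
        d--;
    }
    return n;
}

/* Deepest node that is an ancestor of both a and b (a node counts as its
   own ancestor); NULL if they are in different trees. */
static node *commonality(node *a, node *b)
{
    int da = depth_of(a), db = depth_of(b);
    int d = da < db ? da : db;
    a = ancestor_at(a, d);   /* bring both nodes to the same depth */
    b = ancestor_at(b, d);
    while (a != b) {         /* then climb in step until they meet */
        a = a->parent;
        b = b->parent;
    }
    return a;
}
```

Equalising the depths first means the final loop climbs both branches in step, so it meets at the shared ancestor in at most d iterations.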

end</BODY>
</HTML>