Annotation of libwww/Library/src/HTParse.html, revision 2.14.2.1
2.14 frystyk 1: <HTML>
2: <HEAD>
3: <TITLE>URI Parsing in the WWW Library</TITLE>
2.6 timbl 4: <NEXTID N="1">
2.14 frystyk 5: </HEAD>
2.2 timbl 6: <BODY>
2.14 frystyk 7:
2.9 frystyk 8: <H1>HTParse</H1>
9:
2.14 frystyk 10: <PRE>
11: /*
12: ** (c) COPYRIGHT CERN 1994.
13: ** Please first read the full copyright statement in the file COPYRIGH.
14: */
15: </PRE>
16:
17: This module contains code to parse URIs and various related things such as:
2.9 frystyk 18:
19: <UL>
20: <LI> Parse a URI relative to another URI
21: <LI> Get the URI relative to another URI
22: <LI> Remove redundant data from the URI (HTSimplify and HTCanon)
23: <LI> Expand a local host name into a full domain name (HTCanon)
24: <LI> Search a URI for illegal characters in order to prevent security holes
25: </UL>
26:
2.14 frystyk 27: This module is implemented by <A HREF="HTParse.c">HTParse.c</A>, and it is
28: a part of the <A
29: HREF="http://info.cern.ch/hypertext/WWW/Library/User/Guide/Guide.html">
30: Library of Common Code</A>.
2.9 frystyk 31:
32: <PRE>
33: #ifndef HTPARSE_H
2.2 timbl 34: #define HTPARSE_H
2.14.2.1! frystyk 35:
2.13 frystyk 36: #include "HTEscape.h"
2.9 frystyk 37: </PRE>
2.2 timbl 38:
2.9 frystyk 39: <H2>HTParse: Parse a URI relative to another URI</H2>
40:
41: HTParse returns the requested parts of a name, taking any part that the
42: name itself does not supply from the related name.
43:
44: <H3>Control Flags for HTParse</H3>
45:
46: The following are flag bits which may be ORed together to form a number
47: to give the 'wanted' argument to HTParse.
48:
49: <PRE>
50: #define PARSE_ACCESS 16
2.1 timbl 51: #define PARSE_HOST 8
52: #define PARSE_PATH 4
53: #define PARSE_ANCHOR 2
54: #define PARSE_PUNCTUATION 1
55: #define PARSE_ALL 31
2.9 frystyk 56: </PRE>
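As a small illustration of how the flag bits above combine, the following sketch builds and inspects a 'wanted' mask. The two helper functions are hypothetical and are not part of the library; only the flag values are taken from the definitions above.

```c
#include <assert.h>

/* Flag values copied from the definitions above. */
#define PARSE_ACCESS       16
#define PARSE_HOST          8
#define PARSE_PATH          4
#define PARSE_ANCHOR        2
#define PARSE_PUNCTUATION   1
#define PARSE_ALL          31

/* Hypothetical helper: a mask asking for the path and the anchor,
** together with their punctuation.
*/
static int wanted_path_and_anchor(void)
{
    return PARSE_PATH | PARSE_ANCHOR | PARSE_PUNCTUATION;
}

/* Hypothetical helper: does the mask request the given part? */
static int mask_wants(int wanted, int flag)
{
    assert(flag != 0);
    return (wanted & flag) != 0;
}
```

A mask built this way would be passed as the third argument of HTParse. PARSE_ALL is simply all five bits ORed together.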
2.1 timbl 57:
2.2 timbl 58: <H3>On entry</H3>
59: <DL>
60: <DT>aName
61: <DD> The name (URI) to be parsed
62: <DT>relatedName
63: <DD> A name relative to which
64: aName is to be parsed
65: <DT>wanted
66: <DD> A mask for the bits which
67: are wanted.
68: </DL>
69:
70: <H3>On exit,</H3>
71: <DL>
72: <DT>returns
73: <DD> A pointer to a malloc'd string
74: which MUST BE FREED
75: </DL>
76:
77: <PRE>
2.13 frystyk 78: extern char * HTParse PARAMS(( const char * aName,
2.9 frystyk 79: const char * relatedName,
80: int wanted));
81: </PRE>
2.2 timbl 82:
2.9 frystyk 83: <H2>HTStrip: Strip white space off a string</H2>
2.1 timbl 84:
2.2 timbl 85: <H3>On exit</H3>The return value points to the first non-white-space
86: character, or to the terminating zero if there is none.<P>
87: All trailing white space is OVERWRITTEN
88: with zero.
2.10 frystyk 89:
90: <PRE>
2.13 frystyk 91: extern char * HTStrip PARAMS((char * s));
2.9 frystyk 92: </PRE>
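The behaviour described above can be sketched as follows. This is written from the description only and is not the code in HTParse.c; the names strip_sketch and is_white are hypothetical, and the set of characters treated as white space is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Which characters count as white space is an assumption of this sketch. */
static int is_white(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical stand-in for HTStrip, written from the description only. */
static char * strip_sketch(char * s)
{
    char * end;
    assert(s != NULL);
    end = s + strlen(s);
    while (end > s && is_white(end[-1]))
        *--end = '\0';          /* overwrite trailing white space with zero */
    while (is_white(*s))
        s++;                    /* skip leading white space */
    return s;
}
```

Because the string is modified in place and the result points into it, the caller must pass writable storage and must not free the returned pointer separately.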
93:
94: <H2>HTSimplify: Simplify a URI</H2>
2.1 timbl 95:
2.9 frystyk 96: A URI is allowed to contain the sequence xxx/../ which may be
97: replaced by "", and the sequence "/./" which may be replaced by "/".
98: Simplification helps us recognize duplicate URIs. Thus, the following
99: transformations are done:
100:
101: <UL>
102: <LI> /etc/junk/../fred becomes /etc/fred
103: <LI> /etc/junk/./fred becomes /etc/junk/fred
104: </UL>
105:
106: but we should NOT change
107: <UL>
108: <LI> http://fred.xxx.edu/../.. or
109: <LI> ../../albert.html
110: </UL>
111:
112: In the same manner, the following prefixes are preserved:
113:
114: <UL>
115: <LI> ./&lt;etc&gt;
116: <LI> //&lt;etc&gt;
117: </UL>
118:
119: In order to avoid empty URIs, the following URIs are simplified only as far as shown:
120:
121: <UL>
122: <LI> /fred/.. becomes /fred/..
123: <LI> /fred/././.. becomes /fred/..
124: <LI> /fred/.././junk/.././ becomes /fred/..
125: </UL>
126:
127: If more than one set of `://' is found (several proxies in cascade) then
128: only the part after the last `://' is simplified.
129:
130: <PRE>
2.13 frystyk 131: extern char *HTSimplify PARAMS((char *filename));
2.2 timbl 132: </PRE>
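The two basic replacements, "/./" by "/" and "xxx/../" by "", can be sketched for a plain absolute path as shown here. This is a deliberately limited illustration and not the library's HTSimplify: the name simplify_path_sketch is hypothetical, the function must not be given a full URI because it knows nothing about `://' or host names, and it makes no attempt to reproduce the rules above for preserved prefixes or for avoiding empty results.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not the library's HTSimplify. Works in place on a
** plain absolute path such as "/etc/junk/../fred".
*/
static void simplify_path_sketch(char * path)
{
    char * p = path;
    assert(path != NULL);
    while (*p) {
        if (strncmp(p, "/./", 3) == 0) {
            /* "/./" becomes "/" */
            memmove(p, p + 2, strlen(p + 2) + 1);
            continue;                   /* look at the same slash again */
        }
        if (strncmp(p, "/../", 4) == 0) {
            char * seg = p;             /* start of the segment before "/../" */
            while (seg > path && seg[-1] != '/')
                seg--;
            /* Only drop a non-empty segment that follows a slash
            ** and is not itself "..".
            */
            if (seg > path && seg < p &&
                !(p - seg == 2 && seg[0] == '.' && seg[1] == '.')) {
                memmove(seg, p + 4, strlen(p + 4) + 1);
                p = seg - 1;            /* rescan from the preceding slash */
                continue;
            }
        }
        p++;
    }
}
```

A trailing ".." without a following slash, as in /fred/.., never matches either pattern, so such a path is left unchanged by this sketch.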
2.1 timbl 133:
2.6 timbl 134:
135: <H2>HTRelative: Make Relative (Partial)
2.9 frystyk 136: URI</H2>This function creates and returns
2.2 timbl 137: a string which gives an expression
138: of one address as related to another.
139: Where there is no relation, an absolute
140: address is returned.
141: <H3>On entry,</H3>Both names must be absolute, fully
142: qualified names of nodes (no anchor
143: bits)
144: <H3>On exit,</H3>The return result points to a newly
145: allocated name which, if parsed by
146: HTParse relative to relatedName,
147: will yield aName. The caller is responsible
148: for freeing the resulting name later.
2.10 frystyk 149:
150: <PRE>
2.13 frystyk 151: extern char * HTRelative PARAMS((const char * aName, const char *relatedName));
2.9 frystyk 152: </PRE>
153:
154: <H2>HTCanon: Expand a Local Host Name Into a Full Domain Name</H2>
2.2 timbl 155:
2.9 frystyk 156: This function expands the host name of the URI from a local name to a
2.11 frystyk 157: full domain name and converts the host name to lower case. The
158: advantage of doing this is that we only have one entry in the host
159: cache and one entry in the document cache.
2.6 timbl 160:
2.9 frystyk 161: <PRE>
2.13 frystyk 162: extern char *HTCanon PARAMS (( char ** filename,
2.9 frystyk 163: char * host));
2.6 timbl 164: </PRE>
2.9 frystyk 165:
2.8 luotonen 166: <H2>Prevent Security Holes</H2>
167:
168: <CODE>HTCleanTelnetString()</CODE> makes sure that the given string
169: doesn't contain characters that could cause security holes, such as
170: newlines in ftp, gopher, news or telnet URLs. More specifically, it
171: allows everything between hexadecimal ASCII 20-7E, and also A0-FE,
172: inclusive.
173: <DL>
174: <DT> <CODE>str</CODE>
175: <DD> the string that is *modified* if necessary. The string will be
176: truncated at the first illegal character that is encountered.
177: <DT>returns
178: <DD> YES, if the string was modified.
179: NO, otherwise.
180: </DL>
2.9 frystyk 181:
2.8 luotonen 182: <PRE>
2.13 frystyk 183: extern BOOL HTCleanTelnetString PARAMS((char * str));
2.8 luotonen 184: </PRE>
185:
186: <PRE>
2.6 timbl 187: #endif /* HTPARSE_H */
2.9 frystyk 188: </PRE>
2.2 timbl 189:
2.9 frystyk 190: End of HTParse Module
191: </BODY>
192: </HTML>
2.2 timbl 193: