Annotation of libwww/Library/src/HTParse.html, revision 2.13
2.2 timbl 1: <HEADER>
2.9 frystyk 2: <TITLE>HTParse: URI parsing in the WWW Library</TITLE>
2.6 timbl 3: <NEXTID N="1">
4: </HEADER>
2.2 timbl 5: <BODY>
2.9 frystyk 6: <H1>HTParse</H1>
7:
2.12 frystyk 8: This module is a part of the <A HREF="Overview.html">CERN Common WWW
9: Library</A>. It contains code to parse URIs and various related things
10: such as:
2.9 frystyk 11:
12: <UL>
13: <LI> Parse a URI relative to another URI
14: <LI> Get the URI relative to another URI
15: <LI> Remove redundant data from the URI (HTSimplify and HTCanon)
16: <LI> Expand a local host name into a full domain name (HTExpand)
17: <LI> Search a URI for illegal characters in order to prevent security holes
18: </UL>
19:
20: Implemented by <A NAME=z0 HREF="HTParse.c">HTParse.c</A>.
21:
22: <PRE>
23: #ifndef HTPARSE_H
2.2 timbl 24: #define HTPARSE_H
25: #include "HTUtils.h"
2.13 ! frystyk 26: #include "HTEscape.h"
2.9 frystyk 27: </PRE>
2.2 timbl 28:
2.9 frystyk 29: <H2>HTParse: Parse a URI relative to another URI</H2>
30:
31: This returns those parts of a name which are given (and requested) substituting
32: bits from the related name where necessary.
33:
34: <H3>Control Flags for HTParse</H3>
35:
36: The following are flag bits which may be ORed together to form a number
37: to give the 'wanted' argument to HTParse.
38:
39: <PRE>
40: #define PARSE_ACCESS 16
2.1 timbl 41: #define PARSE_HOST 8
42: #define PARSE_PATH 4
43: #define PARSE_ANCHOR 2
44: #define PARSE_PUNCTUATION 1
45: #define PARSE_ALL 31
2.9 frystyk 46: </PRE>
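As a small illustration of how the flag bits combine, the sketch below repeats the values from the block above and builds a 'wanted' mask with bitwise OR. The helper mask_has() is hypothetical and is not part of the library; it only shows how a single bit can be tested.

```c
/* Flag values repeated from the header so the sketch stands alone. */
#define PARSE_ACCESS       16
#define PARSE_HOST          8
#define PARSE_PATH          4
#define PARSE_ANCHOR        2
#define PARSE_PUNCTUATION   1
#define PARSE_ALL          31

/* Hypothetical helper (not in the library): non-zero if 'flag' is set
 * in the 'wanted' mask. */
int mask_has(int wanted, int flag)
{
    return (wanted & flag) != 0;
}

/* Example mask: host and path, with separating punctuation kept. */
int example_mask(void)
{
    return PARSE_HOST | PARSE_PATH | PARSE_PUNCTUATION;
}
```

PARSE_ALL is simply all five bits ORed together, so passing it requests every part of the name.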
2.1 timbl 47:
2.2 timbl 48: <H3>On entry</H3>
49: <DL>
50: <DT>aName
51: <DD> A filename given
52: <DT>relatedName
53: <DD> A name relative to which
54: aName is to be parsed
55: <DT>wanted
56: <DD> A mask for the bits which
57: are wanted.
58: </DL>
59:
60: <H3>On exit,</H3>
61: <DL>
62: <DT>returns
63: <DD> A pointer to a malloc'd string
64: which MUST BE FREED
65: </DL>
66:
67: <PRE>
2.13 ! frystyk 68: extern char * HTParse PARAMS(( const char * aName,
2.9 frystyk 69: const char * relatedName,
70: int wanted));
71: </PRE>
2.2 timbl 72:
2.9 frystyk 73: <H2>HTStrip: Strip white space off a string</H2>
2.1 timbl 74:
2.2 timbl 75: <H3>On exit</H3>Return value points to first non-white
76: character, or to 0 if none.<P>
77: All trailing white space is OVERWRITTEN
78: with zero.
2.10 frystyk 79:
80: <PRE>
2.13 ! frystyk 81: extern char * HTStrip PARAMS((char * s));
2.9 frystyk 82: </PRE>
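The behaviour described above can be sketched as follows. This is a minimal illustration, not the code in HTParse.c: the name strip_sketch() is hypothetical, and it assumes that "points to 0 if none" means the returned pointer addresses the terminating zero byte when the string holds only white space.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the behaviour documented for HTStrip. */
char *strip_sketch(char *s)
{
    char *end;

    if (s == NULL)
        return NULL;

    /* Overwrite trailing white space with zero bytes. */
    end = s + strlen(s);
    while (end > s && isspace((unsigned char) end[-1]))
        *--end = '\0';

    /* Skip leading white space; the buffer itself is not shifted. */
    while (isspace((unsigned char) *s))
        s++;

    return s;
}
```

Because the result points into the caller's buffer, it must not be freed separately from that buffer.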
83:
84: <H2>HTSimplify: Simplify a URI</H2>
2.1 timbl 85:
2.9 frystyk 86: A URI is allowed to contain the sequence xxx/../ which may be
87: replaced by "", and the sequence "/./" which may be replaced by "/".
88: Simplification helps us recognize duplicate URIs. Thus, the following
89: transformations are done:
90:
91: <UL>
92: <LI> /etc/junk/../fred becomes /etc/fred
93: <LI> /etc/junk/./fred becomes /etc/junk/fred
94: </UL>
95:
96: but we should NOT change
97: <UL>
98: <LI> http://fred.xxx.edu/../.. or
99: <LI> ../../albert.html
100: </UL>
101:
102: In the same manner, the following prefixes are preserved:
103:
104: <UL>
105: <LI> ./&lt;etc&gt;
106: <LI> //&lt;etc&gt;
107: </UL>
108:
109: In order to avoid empty URIs, the following URIs are reduced no further than shown:
110:
111: <UL>
112: <LI> /fred/.. becomes /fred/..
113: <LI> /fred/././.. becomes /fred/..
114: <LI> /fred/.././junk/.././ becomes /fred/..
115: </UL>
116:
117: If more than one set of `://' is found (several proxies in cascade) then
118: only the part after the last `://' is simplified.
119:
120: <PRE>
2.13 ! frystyk 121: extern char *HTSimplify PARAMS((char *filename));
2.2 timbl 122: </PRE>
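The two basic transformations can be sketched on a bare path string as below. This is only an illustration under stated assumptions: simplify_path_sketch() is a hypothetical name, it works on the path part alone (it does not look for `://', so it must not be handed a full URI), and it does not implement the rule for avoiding empty URIs.

```c
#include <string.h>

/* Hypothetical sketch: collapse "/./" and "segment/../" in place.
 * Leading ".." segments are left alone, so "../../albert.html"
 * is returned unchanged. */
void simplify_path_sketch(char *path)
{
    char *p = path;

    while (*p) {
        if (p[0] == '/' && p[1] == '.' && p[2] == '/') {
            /* "/./" becomes "/"; stay here to recheck. */
            memmove(p, p + 2, strlen(p + 2) + 1);
        } else if (p[0] == '/' && p[1] == '.' && p[2] == '.' && p[3] == '/') {
            /* Find the start of the segment before "/../". */
            char *q = p;
            while (q > path && q[-1] != '/')
                q--;
            if (q < p && !(p - q == 2 && q[0] == '.' && q[1] == '.')) {
                /* Drop "segment/../" and back up to recheck. */
                memmove(q, p + 4, strlen(p + 4) + 1);
                p = (q > path) ? q - 1 : path;
            } else {
                p++;    /* nothing to remove, e.g. "../../" */
            }
        } else {
            p++;
        }
    }
}
```

Working in place is safe here because every step only shortens the string.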
2.1 timbl 123:
2.6 timbl 124:
125: <H2>HTRelative: Make Relative (Partial)
2.9 frystyk 126: URI</H2>This function creates and returns
2.2 timbl 127: a string which gives an expression
128: of one address as related to another.
129: Where there is no relation, an absolute
130: address is returned.
131: <H3>On entry,</H3>Both names must be absolute, fully
132: qualified names of nodes (no anchor
133: bits)
134: <H3>On exit,</H3>The return result points to a newly
135: allocated name which, if parsed by
136: HTParse relative to relatedName,
137: will yield aName. The caller is responsible
138: for freeing the resulting name later.
2.10 frystyk 139:
140: <PRE>
2.13 ! frystyk 141: extern char * HTRelative PARAMS((const char * aName, const char *relatedName));
2.9 frystyk 142: </PRE>
143:
144: <H2>HTCanon: Expand a Local Host Name Into a Full Domain Name</H2>
2.2 timbl 145:
2.9 frystyk 146: This function expands the host name of the URI from a local name to a
2.11 frystyk 147: full domain name and converts the host name to lower case. The
148: advantage of doing this is that we only have one entry in the host
149: cache and one entry in the document cache.
2.6 timbl 150:
2.9 frystyk 151: <PRE>
2.13 ! frystyk 152: extern char *HTCanon PARAMS (( char ** filename,
2.9 frystyk 153: char * host));
2.6 timbl 154: </PRE>
2.9 frystyk 155:
2.8 luotonen 156: <H2>Prevent Security Holes</H2>
157:
158: <CODE>HTCleanTelnetString()</CODE> makes sure that the given string
159: doesn't contain characters that could cause security holes, such as
160: newlines in ftp, gopher, news or telnet URLs. More specifically, it
161: allows only characters in the hexadecimal ASCII ranges 20-7E and A0-FE,
162: inclusive.
163: <DL>
164: <DT> <CODE>str</CODE>
165: <DD> the string that is *modified* if necessary. The string will be
166: truncated at the first illegal character that is encountered.
167: <DT>returns
168: <DD> YES, if the string was modified.
169: NO, otherwise.
170: </DL>
2.9 frystyk 171:
2.8 luotonen 172: <PRE>
2.13 ! frystyk 173: extern BOOL HTCleanTelnetString PARAMS((char * str));
2.8 luotonen 174: </PRE>
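The check described above can be sketched as follows. The function name clean_string_sketch() is hypothetical and the return type is plain int rather than the library's BOOL; it is a reading of the description, not the code in HTParse.c.

```c
#include <stddef.h>

/* Hypothetical sketch: truncate 'str' at the first byte outside
 * 0x20-0x7E and 0xA0-0xFE. Returns 1 if the string was modified,
 * 0 otherwise. */
int clean_string_sketch(char *str)
{
    unsigned char *p;

    if (str == NULL)
        return 0;

    for (p = (unsigned char *) str; *p != '\0'; p++) {
        int allowed = (*p >= 0x20 && *p <= 0x7E) ||
                      (*p >= 0xA0 && *p <= 0xFE);
        if (!allowed) {
            *p = '\0';      /* cut the string at the illegal byte */
            return 1;
        }
    }
    return 0;
}
```

For example, a host string with an embedded carriage return and newline is cut just before the carriage return, so nothing after it can be sent as a separate command.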
175:
176: <PRE>
2.6 timbl 177: #endif /* HTPARSE_H */
2.9 frystyk 178: </PRE>
2.2 timbl 179:
2.9 frystyk 180: End of HTParse Module
181: </BODY>
182: </HTML>
2.2 timbl 183: