Annotation of libwww/Library/src/HTParse.html, revision 2.14.2.1

2.14      frystyk     1: <HTML>
                      2: <HEAD>
                      3: <TITLE>URI Parsing in the WWW Library</TITLE>
2.6       timbl       4: <NEXTID N="1">
2.14      frystyk     5: </HEAD>
2.2       timbl       6: <BODY>
2.14      frystyk     7: 
2.9       frystyk     8: <H1>HTParse</H1>
                      9: 
2.14      frystyk    10: <PRE>
                     11: /*
                     12: **     (c) COPYRIGHT CERN 1994.
                     13: **     Please first read the full copyright statement in the file COPYRIGH.
                     14: */
                     15: </PRE>
                     16: 
                     17: This module contains code to parse URIs and various related things such as:
2.9       frystyk    18: 
                     19: <UL>
                     20: <LI> Parse a URI relative to another URI
                     21: <LI> Get the URI relative to another URI 
                     22: <LI> Remove redundant data from the URI (HTSimplify and HTCanon)
                     23: <LI> Expand a local host name into a full domain name (HTExpand)
                     24: <LI> Search a URI for illegal characters in order to prevent security holes
                     25: </UL>
                     26: 
2.14      frystyk    27: This module is implemented by <A HREF="HTParse.c">HTParse.c</A>, and it is
                     28: a part of the <A
                     29: HREF="http://info.cern.ch/hypertext/WWW/Library/User/Guide/Guide.html">
                     30: Library of Common Code</A>.
2.9       frystyk    31: 
                     32: <PRE>
                     33: #ifndef HTPARSE_H
2.2       timbl      34: #define HTPARSE_H
2.14.2.1! frystyk    35: 
2.13      frystyk    36: #include "HTEscape.h"
2.9       frystyk    37: </PRE>
2.2       timbl      38: 
2.9       frystyk    39: <H2>HTParse:  Parse a URI relative to another URI</H2>
                     40: 
                     41: This returns those parts of a name which are given (and requested) substituting
                     42: bits from the related name where necessary.
                     43: 
                     44: <H3>Control Flags for HTParse</H3>
                     45: 
                     46: The following are flag bits which may be ORed together to form a number
                     47: to give the 'wanted' argument to HTParse.
                     48: 
                     49: <PRE>
                     50: #define PARSE_ACCESS           16
2.1       timbl      51: #define PARSE_HOST              8
                     52: #define PARSE_PATH              4
                     53: #define PARSE_ANCHOR            2
                     54: #define PARSE_PUNCTUATION       1
                     55: #define PARSE_ALL              31
2.9       frystyk    56: </PRE>
2.1       timbl      57: 
2.2       timbl      58: <H3>On entry</H3>
                     59: <DL>
                     60: <DT>aName
                     61: <DD> A filename given
                     62: <DT>relatedName
                     63: <DD> A name relative to which
                     64: aName is to be parsed
                     65: <DT>wanted
                     66: <DD> A mask for the bits which
                     67: are wanted.
                     68: </DL>
                     69: 
                     70: <H3>On exit,</H3>
                     71: <DL>
                     72: <DT>returns
                     73: <DD> A pointer to a malloc'd string
                     74: which MUST BE FREED
                     75: </DL>
                     76: 
                     77: <PRE>
2.13      frystyk    78: extern char * HTParse  PARAMS((        const char * aName,
2.9       frystyk    79:                                const char * relatedName,
                     80:                                int wanted));
                     81: </PRE>
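The control flags are plain bit values, so a 'wanted' mask can be built and tested with ordinary bitwise operators. The sketch below copies the flag values from this header; the helper names scheme_host_mask and parse_mask_wants are hypothetical and are not part of the library.

```c
/* Flag values copied from the control flags listed in this header. */
#define PARSE_ACCESS           16
#define PARSE_HOST              8
#define PARSE_PATH              4
#define PARSE_ANCHOR            2
#define PARSE_PUNCTUATION       1
#define PARSE_ALL              31

/* Hypothetical helper: a mask asking for the access scheme and the host,
   together with the punctuation flag. */
int scheme_host_mask(void)
{
    return PARSE_ACCESS | PARSE_HOST | PARSE_PUNCTUATION;
}

/* Hypothetical helper: nonzero if every bit of `part` is set in `wanted`. */
int parse_mask_wants(int wanted, int part)
{
    return (wanted & part) == part;
}
```

A mask built this way would be passed as the third argument of HTParse. PARSE_ALL is simply the OR of the five individual flags.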
2.2       timbl      82: 
2.9       frystyk    83: <H2>HTStrip: Strip white space off a string</H2>
2.1       timbl      84: 
2.2       timbl      85: <H3>On exit</H3>Return value points to first non-white
                     86: character, or to 0 if none.<P>
                     87: All trailing white space is OVERWRITTEN
                     88: with zero.
2.10      frystyk    89: 
                     90: <PRE>
2.13      frystyk    91: extern char * HTStrip PARAMS((char * s));
2.9       frystyk    92: </PRE>
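The behaviour described for HTStrip can be illustrated with a small stand-alone function. This is a minimal sketch and not the code in HTParse.c: the name strip_sketch is hypothetical, and it assumes that "points to 0 if none" means the returned pointer addresses the terminating zero byte of an all-white string.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the behaviour documented for HTStrip.
   Trailing white space is overwritten with zero bytes in place, and the
   returned pointer addresses the first non-white character (or the
   terminating zero if the string holds nothing but white space). */
char *strip_sketch(char *s)
{
    size_t len;

    if (s == NULL)
        return NULL;

    /* Zap trailing white space. */
    len = strlen(s);
    while (len > 0 && isspace((unsigned char) s[len - 1]))
        s[--len] = '\0';

    /* Skip leading white space. */
    while (isspace((unsigned char) *s))
        s++;

    return s;
}
```

Because the result points into the caller's buffer, it must not be freed separately, and the original pointer is still the one to release if the buffer was allocated.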
                     93: 
                     94: <H2>HTSimplify: Simplify a URI</H2>
2.1       timbl      95: 
2.9       frystyk    96: A URI is allowed to contain the sequence xxx/../ which may be
                     97: replaced by "", and the sequence "/./" which may be replaced by "/".
                     98: Simplification helps us recognize duplicate URIs. Thus, the following
                     99: transformations are done:
                    100: 
                    101: <UL>
                    102: <LI> /etc/junk/../fred         becomes /etc/fred
                    103: <LI> /etc/junk/./fred  becomes /etc/junk/fred
                    104: </UL>
                    105: 
                    106: but we should NOT change
                    107: <UL>
                    108: <LI> http://fred.xxx.edu/../.. or
                    109: <LI> ../../albert.html
                    110: </UL>
                    111: 
                    112: In the same manner, the following prefixes are preserved:
                    113: 
                    114: <UL>
                    115: <LI> ./&lt;etc&gt;
                    116: <LI> //&lt;etc&gt;
                    117: </UL>
                    118: 
                    119: In order to avoid empty URIs, the following transformations are applied:
                    120: 
                    121: <UL>
                    122: <LI> /fred/..          becomes /fred/..
                    123: <LI> /fred/././..              becomes /fred/..
                    124: <LI> /fred/.././junk/.././     becomes /fred/..
                    125: </UL>
                    126: 
                    127: If more than one set of `://' is found (several proxies in cascade) then
                    128: only the part after the last `://' is simplified.
                    129: 
                    130: <PRE>
2.13      frystyk   131: extern char *HTSimplify PARAMS((char *filename));
2.2       timbl     132: </PRE>
2.1       timbl     133: 
2.6       timbl     134: 
                    135: <H2>HTRelative:  Make Relative (Partial)
2.9       frystyk   136: URI</H2>This function creates and returns
2.2       timbl     137: a string which gives an expression
                    138: of one address as related to another.
                    139: Where there is no relation, an absolute
                    140: address is returned.
                    141: <H3>On entry,</H3>Both names must be absolute, fully
                    142: qualified names of nodes (no anchor
                    143: bits)
                    144: <H3>On exit,</H3>The return result points to a newly
                    145: allocated name which, if parsed by
                    146: HTParse relative to relatedName,
                    147: will yield aName. The caller is responsible
                    148: for freeing the resulting name later.
2.10      frystyk   149: 
                    150: <PRE>
2.13      frystyk   151: extern char * HTRelative PARAMS((const char * aName, const char *relatedName));
2.9       frystyk   152: </PRE>
                    153: 
                    154: <H2>HTCanon: Expand a Local Host Name Into a Full Domain Name</H2>
2.2       timbl     155: 
2.9       frystyk   156: This function expands the host name of the URI from a local name to a
2.11      frystyk   157: full domain name and converts the host name to lower case. The
                    158: advantage of doing this is that we only have one entry in the host
                    159: cache and one entry in the document cache.
2.6       timbl     160: 
2.9       frystyk   161: <PRE>
2.13      frystyk   162: extern char *HTCanon PARAMS (( char ** filename,
2.9       frystyk   163:                                char *  host));
2.6       timbl     164: </PRE>
2.9       frystyk   165: 
2.8       luotonen  166: <H2>Prevent Security Holes</H2>
                    167: 
                    168: <CODE>HTCleanTelnetString()</CODE> makes sure that the given string
                    169: doesn't contain characters that could cause security holes, such as
                    170: newlines in ftp, gopher, news or telnet URLs. More specifically, it
                    171: allows only characters in the hexadecimal ranges 20-7E and A0-FE,
                    172: inclusive.
                    173: <DL>
                    174: <DT> <CODE>str</CODE>
                    175: <DD> the string that is *modified* if necessary.  The string will be
                    176:      truncated at the first illegal character that is encountered.
                    177: <DT>returns
                    178: <DD> YES, if the string was modified.
                    179:      NO, otherwise.
                    180: </DL>
2.9       frystyk   181: 
2.8       luotonen  182: <PRE>
2.13      frystyk   183: extern BOOL HTCleanTelnetString PARAMS((char * str));
2.8       luotonen  184: </PRE>
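The rule stated for HTCleanTelnetString can be sketched as follows. This is an illustration of the documented ranges only, not the code in HTParse.c: the name clean_string_sketch is hypothetical, and plain int values 1 and 0 stand in for YES and NO.

```c
#include <stddef.h>

/* Hypothetical stand-in for the rule documented for HTCleanTelnetString:
   only bytes in the ranges 0x20-0x7E and 0xA0-0xFE are allowed. The string
   is truncated in place at the first byte outside those ranges.
   Returns 1 if the string was modified, 0 otherwise. */
int clean_string_sketch(char *str)
{
    unsigned char *cur;

    if (str == NULL)
        return 0;

    for (cur = (unsigned char *) str; *cur != '\0'; cur++) {
        int allowed = (*cur >= 0x20 && *cur <= 0x7E) ||
                      (*cur >= 0xA0 && *cur <= 0xFE);
        if (!allowed) {
            *cur = '\0';        /* Truncate at the first illegal byte. */
            return 1;
        }
    }
    return 0;
}
```

Truncating rather than removing single bytes means that nothing following an embedded newline survives, which is what blocks extra protocol commands hidden in a host name or path.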
                    185: 
                    186: <PRE>
2.6       timbl     187: #endif /* HTPARSE_H */
2.9       frystyk   188: </PRE>
2.2       timbl     189: 
2.9       frystyk   190: End of HTParse Module
                    191: </BODY>
                    192: </HTML>
2.2       timbl     193: 
