Annotation of libwww/Library/src/HTParse.html, revision 2.13

2.2       timbl       1: <HEADER>
2.9       frystyk     2: <TITLE>HTParse:  URI parsing in the WWW Library</TITLE>
2.6       timbl       3: <NEXTID N="1">
                      4: </HEADER>
2.2       timbl       5: <BODY>
2.9       frystyk     6: <H1>HTParse</H1>
                      7: 
2.12      frystyk     8: This module is a part of the <A HREF="Overview.html">CERN Common WWW
                      9: Library</A>. It contains code to parse URIs and various related things
                     10: such as:
2.9       frystyk    11: 
                     12: <UL>
                     13: <LI> Parse a URI relative to another URI
                     14: <LI> Get the URI relative to another URI 
                     15: <LI> Remove redundant data from the URI (HTSimplify and HTCanon)
                     16: <LI> Expand a local host name into a full domain name (HTExpand)
                     17: <LI> Search a URI for illegal characters in order to prevent security holes
                     18: </UL>
                     19: 
                     20: Implemented by <A NAME=z0 HREF="HTParse.c">HTParse.c</A>.
                     21: 
                     22: <PRE>
                     23: #ifndef HTPARSE_H
2.2       timbl      24: #define HTPARSE_H
                     25: #include "HTUtils.h"
2.13    ! frystyk    26: #include "HTEscape.h"
2.9       frystyk    27: </PRE>
2.2       timbl      28: 
2.9       frystyk    29: <H2>HTParse:  Parse a URI relative to another URI</H2>
                     30: 
                     31: This returns those parts of a name which are given (and requested) substituting
                     32: bits from the related name where necessary.
                     33: 
                     34: <H3>Control Flags for HTParse</H3>
                     35: 
                     36: The following are flag bits which may be ORed together to form a number
                     37: to give the 'wanted' argument to HTParse.
                     38: 
                     39: <PRE>
                     40: #define PARSE_ACCESS           16
2.1       timbl      41: #define PARSE_HOST              8
                     42: #define PARSE_PATH              4
                     43: #define PARSE_ANCHOR            2
                     44: #define PARSE_PUNCTUATION       1
                     45: #define PARSE_ALL              31
2.9       frystyk    46: </PRE>
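As a minimal sketch of how these flag bits combine, the fragment below repeats the values declared above and adds two helpers. The names wanted_has and wanted_without_anchor are hypothetical and are not part of the library; they only illustrate ordinary bit-mask arithmetic on the 'wanted' argument.

```c
/* Flag values copied from the declarations above. */
#define PARSE_ACCESS           16
#define PARSE_HOST              8
#define PARSE_PATH              4
#define PARSE_ANCHOR            2
#define PARSE_PUNCTUATION       1
#define PARSE_ALL              31

/* Hypothetical helper: nonzero if every bit of 'flag' is set in 'wanted'. */
int wanted_has(int wanted, int flag)
{
    return (wanted & flag) == flag;
}

/* Hypothetical helper: a mask asking for everything except the anchor. */
int wanted_without_anchor(void)
{
    return PARSE_ALL & ~PARSE_ANCHOR;
}
```

PARSE_ALL is simply the five individual bits ORed together, so a caller can start from it and clear the bits it does not want, or build a mask up from single flags such as PARSE_HOST | PARSE_PATH.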
2.1       timbl      47: 
2.2       timbl      48: <H3>On entry</H3>
                     49: <DL>
                     50: <DT>aName
                     51: <DD> A filename given
                     52: <DT>relatedName
                     53: <DD> A name relative to which
                     54: aName is to be parsed
                     55: <DT>wanted
                     56: <DD> A mask for the bits which
                     57: are wanted.
                     58: </DL>
                     59: 
                     60: <H3>On exit,</H3>
                     61: <DL>
                     62: <DT>returns
                     63: <DD> A pointer to a malloc'd string
                     64: which MUST BE FREED
                     65: </DL>
                     66: 
                     67: <PRE>
2.13    ! frystyk    68: extern char * HTParse  PARAMS((        const char * aName,
2.9       frystyk    69:                                const char * relatedName,
                     70:                                int wanted));
                     71: </PRE>
2.2       timbl      72: 
2.9       frystyk    73: <H2>HTStrip: Strip white space off a string</H2>
2.1       timbl      74: 
2.2       timbl      75: <H3>On exit</H3>Return value points to first non-white
                     76: character, or to 0 if none.<P>
                     77: All trailing white space is OVERWRITTEN
                     78: with zero.
2.10      frystyk    79: 
                     80: <PRE>
2.13    ! frystyk    81: extern char * HTStrip PARAMS((char * s));
2.9       frystyk    82: </PRE>
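The behaviour described above can be sketched as follows. This is an illustration only, not the code in HTParse.c; the name strip_ws_sketch is hypothetical, and "points to 0 if none" is read here as "points to the terminating zero byte".

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the documented behaviour: the buffer is
 * modified in place and the result points into the same buffer. */
char *strip_ws_sketch(char *s)
{
    char *end;

    /* Skip leading white space. */
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;

    /* Overwrite each trailing white space character with zero. */
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';

    return s;
}
```

Because the returned pointer may differ from the one passed in, a caller that allocated the buffer should keep the original pointer for freeing.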
                     83: 
                     84: <H2>HTSimplify: Simplify a URI</H2>
2.1       timbl      85: 
2.9       frystyk    86: A URI is allowed to contain the sequence "xxx/../", which may be
                     87: replaced by "", and the sequence "/./", which may be replaced by "/".
                     88: Simplification helps us recognize duplicate URIs. Thus, the following
                     89: transformations are done:
                     90: 
                     91: <UL>
                     92: <LI> /etc/junk/../fred         becomes /etc/fred
                     93: <LI> /etc/junk/./fred  becomes /etc/junk/fred
                     94: </UL>
                     95: 
                     96: but we should NOT change
                     97: <UL>
                     98: <LI> http://fred.xxx.edu/../.. or
                     99: <LI> ../../albert.html
                    100: </UL>
                    101: 
                    102: In the same manner, the following prefixes are preserved:
                    103: 
                    104: <UL>
                    105: <LI> ./&lt;etc&gt;
                    106: <LI> //&lt;etc&gt;
                    107: </UL>
                    108: 
                    109: In order to avoid empty URIs the following URIs become:
                    110: 
                    111: <UL>
                    112: <LI> /fred/..          becomes /fred/..
                    113: <LI> /fred/././..              becomes /fred/..
                    114: <LI> /fred/.././junk/.././     becomes /fred/..
                    115: </UL>
                    116: 
                    117: If more than one set of `://' is found (several proxies in cascade) then
                    118: only the part after the last `://' is simplified.
                    119: 
                    120: <PRE>
2.13    ! frystyk   121: extern char *HTSimplify PARAMS((char *filename));
2.2       timbl     122: </PRE>
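The two basic rewrites can be sketched as below. This is deliberately incomplete and is not the library's implementation: the hypothetical simplify_path_sketch works on a bare path only, so it does not protect a host name after `://', and it does not apply the safeguard against empty URIs described above. It does leave leading ".." segments alone.

```c
#include <string.h>

/* Hypothetical helper: applies only the "/./" and "xxx/../" rewrites
 * to a path, in place, and returns the same pointer. */
char *simplify_path_sketch(char *path)
{
    char *p;

    /* Rule 1: collapse every "/./" into "/". */
    while ((p = strstr(path, "/./")) != NULL)
        memmove(p, p + 2, strlen(p + 2) + 1);

    /* Rule 2: drop "xxx/../" when xxx is a real segment. */
    p = path;
    while ((p = strstr(p, "/../")) != NULL) {
        char *seg = p;                  /* walks back to the start of xxx */

        while (seg > path && seg[-1] != '/')
            seg--;

        if (seg == p || (p - seg == 2 && seg[0] == '.' && seg[1] == '.')) {
            p += 3;                     /* empty or ".." segment: keep it */
            continue;
        }

        memmove(seg, p + 4, strlen(p + 4) + 1);
        p = (seg > path) ? seg - 1 : path;  /* rescan from the joining slash */
    }
    return path;
}
```

With this sketch, "/etc/junk/../fred" is rewritten to "/etc/fred" and "/etc/junk/./fred" to "/etc/junk/fred", while "../../albert.html" is returned unchanged.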
2.1       timbl     123: 
2.6       timbl     124: 
                    125: <H2>HTRelative:  Make Relative (Partial)
2.9       frystyk   126: URI</H2>This function creates and returns
2.2       timbl     127: a string which gives an expression
                    128: of one address as related to another.
                    129: Where there is no relation, an absolute
                    130: address is returned.
                    131: <H3>On entry,</H3>Both names must be absolute, fully
                    132: qualified names of nodes (no anchor
                    133: bits)
                    134: <H3>On exit,</H3>The return result points to a newly
                    135: allocated name which, if parsed by
                    136: HTParse relative to relatedName,
                    137: will yield aName. The caller is responsible
                    138: for freeing the resulting name later.
2.10      frystyk   139: 
                    140: <PRE>
2.13    ! frystyk   141: extern char * HTRelative PARAMS((const char * aName, const char *relatedName));
2.9       frystyk   142: </PRE>
                    143: 
                    144: <H2>HTExpand: Expand a Local Host Name Into a Full Domain Name</H2>
2.2       timbl     145: 
2.9       frystyk   146: This function expands the host name of the URI from a local name to a
2.11      frystyk   147: full domain name and converts the host name to lower case. The
                    148: advantage of doing this is that we only have one entry in the host
                    149: cache and one entry in the document cache.
2.6       timbl     150: 
2.9       frystyk   151: <PRE>
2.13    ! frystyk   152: extern char *HTCanon PARAMS (( char ** filename,
2.9       frystyk   153:                                char *  host));
2.6       timbl     154: </PRE>
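The lower-casing half of this description can be sketched as below. The helper lowercase_host_sketch is hypothetical and much simpler than the declared function: it takes a plain string, performs no domain name expansion, and lower-cases everything between `://' and the next `/', which would also include any user name or port written there.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: lower-cases the authority part of a URI in place.
 * Strings without "://" are returned untouched. */
char *lowercase_host_sketch(char *uri)
{
    char *p = strstr(uri, "://");

    if (p == NULL)
        return uri;                     /* no host part to touch */

    for (p += 3; *p != '\0' && *p != '/'; p++)
        *p = (char)tolower((unsigned char)*p);

    return uri;
}
```

The path is left as it is because it may be case sensitive on the server, whereas host names are not; two spellings of the same host then map to a single cache entry.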
2.9       frystyk   155: 
2.8       luotonen  156: <H2>Prevent Security Holes</H2>
                    157: 
                    158: <CODE>HTCleanTelnetString()</CODE> makes sure that the given string
                    159: doesn't contain characters that could cause security holes, such as
                    160: newlines in ftp, gopher, news or telnet URLs. More specifically, it
                    161: allows every character in the hexadecimal ASCII ranges 20-7E and A0-FE,
                    162: inclusive.
                    163: <DL>
                    164: <DT> <CODE>str</CODE>
                    165: <DD> the string that is *modified* if necessary.  The string will be
                    166:      truncated at the first illegal character that is encountered.
                    167: <DT>returns
                    168: <DD> YES, if the string was modified.
                    169:      NO, otherwise.
                    170: </DL>
2.9       frystyk   171: 
2.8       luotonen  172: <PRE>
2.13    ! frystyk   173: extern BOOL HTCleanTelnetString PARAMS((char * str));
2.8       luotonen  174: </PRE>
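The check described above can be sketched as follows. The name clean_string_sketch is hypothetical, plain int stands in for BOOL (1 for YES, 0 for NO), and the handling of a null pointer is an assumption; the allowed ranges and the truncation rule are taken from the description.

```c
#include <stddef.h>

/* Hypothetical stand-in: truncates 'str' at the first byte outside
 * 0x20-0x7E and 0xA0-0xFE. Returns 1 if the string was modified. */
int clean_string_sketch(char *str)
{
    unsigned char *p;

    if (str == NULL)
        return 0;                       /* assumption: nothing to clean */

    for (p = (unsigned char *)str; *p != '\0'; p++) {
        int ok = (*p >= 0x20 && *p <= 0x7E) || (*p >= 0xA0 && *p <= 0xFE);

        if (!ok) {
            *p = '\0';                  /* cut the string here */
            return 1;
        }
    }
    return 0;
}
```

A string such as "host.example" followed by a carriage return, a newline and an extra command is cut back to "host.example", so the extra command is never sent on to the remote server.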
                    175: 
                    176: <PRE>
2.6       timbl     177: #endif /* HTPARSE_H */
2.9       frystyk   178: </PRE>
2.2       timbl     179: 
2.9       frystyk   180: End of HTParse Module
                    181: </BODY>
                    182: </HTML>
2.2       timbl     183: 
