Annotation of libwww/Library/src/HTCache.html, revision 2.26

2.1       frystyk     1: <HTML>
                      2: <HEAD>
2.16      frystyk     3:   <TITLE>W3C Sample Code Library libwww Persistent Cache Manager</TITLE>
2.1       frystyk     4: </HEAD>
                      5: <BODY>
2.10      frystyk     6: <H1>
                      7:   Persistent Cache Manager
                      8: </H1>
2.1       frystyk     9: <PRE>
                     10: /*
                     11: **     (c) COPYRIGHT MIT 1995.
                     12: **     Please first read the full copyright statement in the file COPYRIGH.
                     13: */
                     14: </PRE>
2.10      frystyk    15: <P>
2.11      frystyk    16: The cache contains details of persistent files which contain the contents
2.10      frystyk    17: of remote documents. The existing cache manager is somewhat naive - especially
2.11      frystyk    18: in its garbage collection, but it is just an example of how it can be
                     19: done. However, it is a fully HTTP/1.1 compliant cache manager. More advanced
                     20: implementations are welcome!
2.10      frystyk    21: <P>
                     22: This module is implemented by <A HREF="HTCache.c">HTCache.c</A>, and it is
2.18      frystyk    23: a part of the <A HREF="http://www.w3.org/Library/">W3C Sample Code Library</A>.
2.1       frystyk    24: <PRE>
                     25: #ifndef HTCACHE_H
                     26: #define HTCACHE_H
                     27: 
2.10      frystyk    28: #include "WWWLib.h"
2.26    ! vbancrof   29: 
        !            30: #ifdef __cplusplus
        !            31: extern "C" { 
        !            32: #endif 
2.1       frystyk    33: </PRE>
2.10      frystyk    34: <H2>
2.11      frystyk    35:   Initialize and Terminate the Persistent Cache
2.10      frystyk    36: </H2>
                     37: <P>
2.21      frystyk    38: The <CODE>cache_root</CODE> is the URI of the location of the persistent
                     39: cache. An example is "<CODE>file:/tmp/w3c-lib</CODE>". If
                     40: <CODE>cache_root</CODE> is <CODE>NULL</CODE> then determine a cache root
                     41: using the following algorithm:
                     42: <OL>
                     43:   <LI>
                     44:     Look for any environment variables (if supported) in the following order:
                     45:     <CODE>WWW_CACHE</CODE>, <CODE>TMP</CODE>, and <CODE>TEMP</CODE>. If none
                     46:     are set, then fall back on "<CODE>/tmp</CODE>".
                     47:   <LI>
                     48:     Append the folder name "<CODE>w3c-cache</CODE>" to the root identified above.
                     49: </OL>
                     50: <P>
                     51: The <CODE>cache_root</CODE> location does not have to exist; if it does not,
                     52: it is created automatically. An empty string makes '/' the cache root.
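<P>
The lookup order described above can be sketched in plain C. This is only an
illustration under stated assumptions: <CODE>resolve_cache_root()</CODE> is a
hypothetical helper, not a libwww function, and it works on plain paths rather
than URIs. It also assumes that an explicitly given root is used unchanged and
that an environment variable set to the empty string counts as unset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libwww): picks a cache root following the
   documented order. Returns a malloc'ed string that the caller must free,
   or NULL if memory runs out. */
char *resolve_cache_root(const char *cache_root)
{
    static const char *vars[] = { "WWW_CACHE", "TMP", "TEMP" };
    const char *base = NULL;
    char *result;
    size_t len, i;

    if (cache_root != NULL) {
        /* An explicit root is used as given; the empty string means "/". */
        const char *chosen = (*cache_root != '\0') ? cache_root : "/";
        result = malloc(strlen(chosen) + 1);
        if (result != NULL) strcpy(result, chosen);
        return result;
    }

    /* Step 1: take the first environment variable that is set and non-empty. */
    for (i = 0; i < sizeof vars / sizeof vars[0] && base == NULL; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && *value != '\0') base = value;
    }
    if (base == NULL) base = "/tmp";

    /* Step 2: append the folder name, avoiding a doubled slash. */
    len = strlen(base);
    result = malloc(len + sizeof "/w3c-cache");
    if (result == NULL) return NULL;
    strcpy(result, base);
    if (len > 0 && result[len - 1] == '/') result[len - 1] = '\0';
    strcat(result, "/w3c-cache");
    return result;
}
```

With none of the three variables set, <CODE>resolve_cache_root(NULL)</CODE>
yields "<CODE>/tmp/w3c-cache</CODE>".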
                     53: <P>
                     54: The size is the total size in MBytes - the default size is 20M. The cache
                     55: can not be less than 5M.
                     56: <P>
                     57: We can only enable the cache if we are not in <A HREF="HTLib.html#Secure">secure
                     58: mode</A>, because in secure mode we can not access the local file system. Secure
                     59: mode is used, for example, when an application serves as a telnet shell.
2.11      frystyk    60: <PRE>
                     61: extern BOOL HTCacheInit (const char * cache_root, int size);
2.10      frystyk    62: </PRE>
                     63: <P>
2.11      frystyk    64: After the cache has been terminated it can not be used anymore unless you
                     65: do another <CODE>HTCacheInit()</CODE> call.
                     66: <PRE>
                     67: extern BOOL HTCacheTerminate (void);
2.10      frystyk    68: </PRE>
2.11      frystyk    69: <H2>
                     70:   Cache Mode Parameters
                     71: </H2>
2.10      frystyk    72: <P>
2.22      kahan      73: The persistent cache has a set of overall parameters that you can adjust.
2.10      frystyk    74: <H3>
2.11      frystyk    75:   Enable and Disable the Cache
2.10      frystyk    76: </H3>
                     77: <P>
2.11      frystyk    78: The cache can be temporarily suspended by using the enable/disable flag.
                     79: This does not prevent the cache from being enabled or disabled at a later point
                     80: in time.
                     81: <PRE>
                     82: extern void HTCacheMode_setEnabled (BOOL mode);
                     83: extern BOOL HTCacheMode_enabled (void);
2.10      frystyk    84: </PRE>
2.22      kahan      85: <P>
                     86: The protected flag controls whether the cache stores password-protected
                     87: documents. By default this flag is turned off.
                     88: <PRE>
                     89: extern void HTCacheMode_setProtected (BOOL mode);
                     90: extern BOOL HTCacheMode_protected (void);
                     91: </PRE>
2.10      frystyk    92: <H3>
2.11      frystyk    93:   What is the current Cache Root?
2.10      frystyk    94: </H3>
                     95: <P>
2.11      frystyk    96: Return the value of the cache root. The cache root can only be set through
2.21      frystyk    97: the <CODE>HTCacheInit()</CODE> function. The string returned MUST be freed
                     98: by the caller.
2.11      frystyk    99: <PRE>
2.21      frystyk   100: extern char * HTCacheMode_getRoot (void);
2.10      frystyk   101: </PRE>
                    102: <H3>
2.11      frystyk   103:   Total Cache Size
2.10      frystyk   104: </H3>
                    105: <P>
2.11      frystyk   106: We set the default cache size to 20M. We set the minimum size to 5M in order
                    107: not to get into weird problems while writing the cache. The size is both
                    108: given and returned in MBytes.
2.14      frystyk   109: We don't count the metainformation as part of the total cache size, which
                    110: is the reason why the minimum cache size should not be less than 5M.
2.11      frystyk   111: <PRE>
                    112: extern BOOL HTCacheMode_setMaxSize (int size);
                    113: extern int  HTCacheMode_maxSize    (void);
2.10      frystyk   114: </PRE>
                    115: <H3>
2.19      frystyk   116:   Max Size of a Single Cache Entry
                    117: </H3>
                    118: <P>
2.20      frystyk   119: It is also possible to control the max size of a single cache entry so that
                    120: the cache doesn't get filled with just a few very large cached entries.
                    121: The default max size for a single cached entry is 3M. The value indicated
                    122: must be in MBytes; for example, a value of 3 means 3 MBytes.
2.19      frystyk   123: <PRE>
                    124: extern BOOL HTCacheMode_setMaxCacheEntrySize (int size);
                    125: extern int HTCacheMode_maxCacheEntrySize (void);
                    126: </PRE>
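<P>
The size limits described above can be illustrated with a small sketch. The
helper names are hypothetical and not part of libwww. Two assumptions are made
here that the text does not state: a requested total size below the minimum is
raised to the minimum, and one MByte is counted as 1024 * 1024 bytes.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    DEFAULT_TOTAL_MB = 20,  /* default total cache size from the text */
    MIN_TOTAL_MB     = 5,   /* documented minimum total cache size */
    DEFAULT_ENTRY_MB = 3    /* default maximum size of a single entry */
};

/* Hypothetical: total size a cache would use for a requested size in MBytes.
   Assumption: requests below the minimum are raised to the minimum. */
int clamp_total_size_mb(int requested_mb)
{
    return requested_mb < MIN_TOTAL_MB ? MIN_TOTAL_MB : requested_mb;
}

/* Hypothetical: does a body of body_bytes fit under the per-entry limit?
   Assumption: 1 MByte is 1024 * 1024 bytes. */
bool entry_fits(long long body_bytes, int max_entry_mb)
{
    assert(max_entry_mb > 0);
    return body_bytes >= 0 &&
           body_bytes <= (long long)max_entry_mb * 1024 * 1024;
}
```

Under these assumptions a request for a 2 MByte cache results in a 5 MByte
cache, and a 4 MByte document is not stored while the per-entry limit is 3.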
                    127: <H3>
2.23      kahan     128:  Default expiration time of cache entries
                    129: </H3>
                    130: <P>
                    131: If a response does not arrive with an expiration time and does not
                    132: explicitly forbid its being cached, use the default expiration time. The
                    133: time is given in seconds (e.g., 3,600 is one hour).
                    134: <PRE>
                    135: extern void HTCacheMode_setDefaultExpiration (const int exp_time);
                    136: extern int HTCacheMode_DefaultExpiration (void);
                    137: </PRE>
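<P>
A minimal sketch of how a default expiration time could be applied is shown
below. The <CODE>ResponseInfo</CODE> structure and
<CODE>entry_expiry()</CODE> are hypothetical stand-ins for illustration only;
libwww keeps this information in its own request, response and anchor objects.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical summary of the response fields that matter here. */
typedef struct {
    time_t received;     /* when the response arrived */
    int    has_expires;  /* non-zero if it carried an expiration time */
    time_t expires;      /* only meaningful when has_expires is set */
    int    forbids_cache;/* non-zero if it explicitly forbids caching */
} ResponseInfo;

/* Returns the time at which the cached entry expires, or (time_t)-1 if the
   response must not be cached. default_expiration is given in seconds. */
time_t entry_expiry(const ResponseInfo *info, int default_expiration)
{
    assert(info != NULL);
    if (info->forbids_cache) return (time_t)-1;
    if (info->has_expires) return info->expires;
    /* No explicit expiration: fall back on the default lifetime. */
    return info->received + default_expiration;
}
```

For a response received at time 1000 without an expiration time, a default of
3600 seconds gives an expiry time of 4600.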
                    138: <H3>
2.10      frystyk   139:   How do we handle Expiration of Cached Objects?
                    140: </H3>
                    141: <P>
                    142: There are various ways of handling an <CODE>Expires</CODE> header when it is met in
2.11      frystyk   143: a <I>history list</I>: it can be ignored altogether, the user can
                    144: be notified with a warning, or the document can be reloaded automatically.
                    145: This flag decides which action is taken. The default action is
2.10      frystyk   146: <CODE>HT_EXPIRES_IGNORE</CODE>. In <CODE>HT_EXPIRES_NOTIFY</CODE> mode,
                    147: we push a message onto the Error stack, which is presented to the user.
2.4       frystyk   148: <PRE>
                    149: typedef enum _HTExpiresMode {
                    150:     HT_EXPIRES_IGNORE = 0,
                    151:     HT_EXPIRES_NOTIFY,
                    152:     HT_EXPIRES_AUTO
                    153: } HTExpiresMode;
                    154: 
2.11      frystyk   155: extern void HTCacheMode_setExpires (HTExpiresMode mode);
                    156: extern HTExpiresMode HTCacheMode_expires (void);
                    157: </PRE>
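<P>
The three modes can be read as a simple dispatch. The sketch below repeats the
enumeration (without its tag) so that it compiles without libwww; the
<CODE>HistoryAction</CODE> type and <CODE>action_for_history()</CODE> are
hypothetical names used only for illustration.

```c
#include <assert.h>

/* Same values as the declaration above, repeated to keep the sketch
   self-contained. */
typedef enum {
    HT_EXPIRES_IGNORE = 0,
    HT_EXPIRES_NOTIFY,
    HT_EXPIRES_AUTO
} HTExpiresMode;

/* Hypothetical: what to do with a document revisited via the history list. */
typedef enum {
    ACTION_SHOW_CACHED,        /* show the cached copy as is */
    ACTION_SHOW_WITH_WARNING,  /* show it, but tell the user it has expired */
    ACTION_RELOAD              /* fetch a fresh copy automatically */
} HistoryAction;

HistoryAction action_for_history(HTExpiresMode mode, int is_expired)
{
    if (!is_expired) return ACTION_SHOW_CACHED;
    switch (mode) {
    case HT_EXPIRES_NOTIFY: return ACTION_SHOW_WITH_WARNING;
    case HT_EXPIRES_AUTO:   return ACTION_RELOAD;
    case HT_EXPIRES_IGNORE: /* fall through: ignoring is the default */
    default:                return ACTION_SHOW_CACHED;
    }
}
```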
                    158: <H3>
                    159:   Disconnected Operation
                    160: </H3>
                    161: <P>
                    162: The cache can be set to handle disconnected operation where it does not use
2.20      frystyk   163: the network to validate entries and does not attempt to load new documents.
2.11      frystyk   164: All requests that can not be fulfilled by the cache will be returned with
                    165: a <CODE>"504 Gateway Timeout"</CODE> response. There are two modes of how
2.21      frystyk   166: the cache can operate in disconnected mode:
2.20      frystyk   167: <DL>
                    168:   <DT>
                    169:     <EM>No network activity at all</EM>
                    170:   <DD>
                    171:     Here it uses its own persistent cache.
                    172:   <DT>
                    173:     <EM>Forward all disconnected requests to a proxy cache</EM>
                    174:   <DD>
                    175:     Here it uses the HTTP/1.1 cache-control to indicate that the proxy should
                    176:     operate in disconnected mode. This mode only really makes sense when you
                    177:     are using a proxy, of course.
                    178: </DL>
2.11      frystyk   179: <PRE>
                    180: typedef enum _HTDisconnectedMode {
                    181:     HT_DISCONNECT_NONE     = 0,
                    182:     HT_DISCONNECT_NORMAL   = 1,
                    183:     HT_DISCONNECT_EXTERNAL = 2
                    184: } HTDisconnectedMode;
                    185: 
                    186: extern void HTCacheMode_setDisconnected (HTDisconnectedMode mode);
                    187: extern HTDisconnectedMode HTCacheMode_disconnected (void);
                    188: extern BOOL HTCacheMode_isDisconnected (HTReload mode);
2.1       frystyk   189: </PRE>
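<P>
As a rough illustration of the two disconnected modes, the sketch below maps a
mode and the availability of a usable cached entry to an outcome. The
<CODE>Outcome</CODE> type and <CODE>outcome_for()</CODE> are hypothetical, the
enumeration is repeated (without its tag) to keep the sketch self-contained,
and validation of cached entries in connected mode is left out. It is an
assumption here that the cache-control directive sent to the proxy is the
HTTP/1.1 <CODE>only-if-cached</CODE> directive.

```c
#include <assert.h>

typedef enum {
    HT_DISCONNECT_NONE     = 0,
    HT_DISCONNECT_NORMAL   = 1,
    HT_DISCONNECT_EXTERNAL = 2
} HTDisconnectedMode;

/* Hypothetical outcomes for a single request. */
typedef enum {
    SERVE_FROM_CACHE,      /* answered from the local persistent cache */
    FETCH_FROM_NETWORK,    /* normal connected operation */
    ASK_PROXY_CACHE_ONLY,  /* forwarded to the proxy, which must not go out
                              (assumed: Cache-Control: only-if-cached) */
    FAIL_504               /* "504 Gateway Timeout" */
} Outcome;

Outcome outcome_for(HTDisconnectedMode mode, int have_usable_entry)
{
    if (have_usable_entry) return SERVE_FROM_CACHE;
    switch (mode) {
    case HT_DISCONNECT_NORMAL:   return FAIL_504;
    case HT_DISCONNECT_EXTERNAL: return ASK_PROXY_CACHE_ONLY;
    case HT_DISCONNECT_NONE:
    default:                     return FETCH_FROM_NETWORK;
    }
}
```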
2.10      frystyk   190: <H2>
2.12      frystyk   191:   The Cache Index
                    192: </H2>
                    193: <P>
                    194: The persistent cache keeps an index of its current entries so that garbage
                    195: collection and lookup become more efficient. This index is stored automatically
                    196: at regular intervals so that we don't get out of sync. Also, it is automatically
                    197: loaded at startup and saved at closedown of the cache.
                    198: <H3>
                    199:   Reading the Cache Index
                    200: </H3>
                    201: <P>
                    202: Read the saved set of cached entries from disk. We only allow the index to
                    203: be read when there are no entries in memory. That way we can ensure consistency.
                    204: <PRE>
                    205: extern BOOL HTCacheIndex_read (const char * cache_root);
                    206: </PRE>
                    207: <H3>
                    208:   Write the Cache Index
                    209: </H3>
                    210: <P>
                    211: Walk through the list of cached objects and save them to disk. We overwrite
                    212: any existing version but that is normally OK as we have already read its
                    213: contents.
                    214: <PRE>
                    215: extern BOOL HTCacheIndex_write (const char * cache_root);
                    216: </PRE>
                    217: <H2>
2.11      frystyk   218:   The HTCache Object
2.10      frystyk   219: </H2>
                    220: <P>
2.11      frystyk   221: The cache object is what we store about a cached object in memory.
                    222: <PRE>
                    223: typedef struct _HTCache HTCache;
                    224: </PRE>
                    225: <H3>
2.12      frystyk   226:   Create and Update a Cache Object
2.11      frystyk   227: </H3>
                    228: <P>
2.10      frystyk   229: Filling the cache is done like all other transport of bulk data in libwww,
                    230: using <A HREF="HTStream.html">streams</A>. The cache object creator is a
                    231: stream which in many cases sits on a <A HREF="HTTee.html">T stream</A> so
                    232: that we get the original feed and at the same time can parse the contents.
2.14      frystyk   233: <P>
                    234: In some situations, we want to append data to an already existing cache entry.
                    235: This is the case when a user has interrupted a download and we are stuck with
                    236: a subpart of the document. If the user later on wishes to download the object
                    237: again we can issue a range request and continue from where we were. This
                    238: will in many situations save a lot of bandwidth.
2.11      frystyk   239: <PRE>
2.14      frystyk   240: extern HTConverter HTCacheWriter, HTCacheAppend;
2.11      frystyk   241: </PRE>
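<P>
To continue an interrupted download, the range request asks for everything
after the bytes already stored. The sketch below formats such a value for an
HTTP <CODE>Range</CODE> header; <CODE>format_resume_range()</CODE> is a
hypothetical helper for illustration and not how libwww builds its requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: writes a Range header value that requests all bytes following
   the already_cached bytes we hold. Returns 1 on success and 0 if the buffer
   is too small. */
int format_resume_range(char *buf, size_t buflen, unsigned long already_cached)
{
    int n;
    assert(buf != NULL);
    /* Byte offsets are zero-based, so the first missing byte is at offset
       already_cached; the open-ended form "N-" means "to the end". */
    n = snprintf(buf, buflen, "bytes=%lu-", already_cached);
    return n > 0 && (size_t)n < buflen;
}
```

With 4096 bytes already in the cache, the value is
"<CODE>bytes=4096-</CODE>".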
2.12      frystyk   242: <P>
                    243: This function writes the metainformation along with the data object stored
                    244: by the HTCacheWriter stream above. If no headers are available then the meta
                    245: file is empty.
                    246: <PRE>
2.14      frystyk   247: extern BOOL HTCache_writeMeta (HTCache * cache, HTRequest * request,
                    248:                                HTResponse * response);
2.12      frystyk   249: </PRE>
                    250: <P>
                    251: If we receive a "<CODE>304 Not Modified</CODE>" response, then we do
                    252: not have to touch the body but must merge the metainformation with the previous
                    253: version. Therefore we need a special metainformation update function.
                    254: <PRE>
2.14      frystyk   255: extern BOOL HTCache_updateMeta (HTCache * cache, HTRequest * request,
                    256:                                 HTResponse * response);
2.12      frystyk   257: </PRE>
2.25      kahan     258: <P>
                    259: Clear a cache entry
                    260: <PRE>
                    261: extern BOOL HTCache_resetMeta (HTCache * cache, HTRequest * request,
                    262:                                 HTResponse * response);
                    263: </PRE>
2.11      frystyk   264: <H3>
2.18      frystyk   265:   Check Cached Entry
                    266: </H3>
                    267: <P>
                    268: After we get a response back, we should check whether we can still cache
                    269: an entry and/or we should add an entry for a resource that has just been
                    270: created so that we can remember the etag and other things. The latter allows
                    271: us to guarantee that we don't lose data due to the lost update problem.
                    272: <PRE>
                    273: extern HTCache * HTCache_touch (HTRequest * request, HTResponse * response,
                    274:                                 HTParentAnchor * anchor);
                    275: </PRE>
                    276: <P>
                    277: <H3>
2.11      frystyk   278:   Load a Cached Object
                    279: </H3>
                    280: <P>
                    281: Loading a cached object is also done as all other loads in libwww by using
                    282: a <A HREF="HTProt.html">protocol load module</A>. For the moment, this load
                    283: function handles the persistent cache as if it were on the local file system,
                    284: but in fact it could be anywhere.
                    285: <PRE>
2.15      frystyk   286: extern HTProtCallback HTLoadCache;
2.11      frystyk   287: </PRE>
                    288: <H3>
                    289:   Delete a Cache Object
                    290: </H3>
                    291: <P>
                    292: Remove an HTCache object from memory and from disk. You must explicitly remove
                    293: a lock before this operation can succeed.
                    294: <PRE>
                    295: extern BOOL HTCache_remove (HTCache * cache);
                    296: </PRE>
                    297: <H3>
2.13      frystyk   298:   Delete All Cache Objects in Memory
2.11      frystyk   299: </H3>
                    300: <P>
                    301: Destroys all cache entries in memory but does not write anything to disk.
                    302: Use the index methods above for doing that. We do not delete the disk contents.
                    303: <PRE>
                    304: extern BOOL HTCache_deleteAll (void);
2.10      frystyk   305: </PRE>
                    306: <H3>
2.13      frystyk   307:   Delete All Cache Objects and File Entries
                    308: </H3>
                    309: <P>
                    310: Destroys all cache entries in memory <B>and</B> on disk. This call basically
                    311: resets the cache to the initial state but it does not terminate the cache.
                    312: That is, you don't have to reinitialize the cache before you can use it again.
                    313: <PRE>
                    314: extern BOOL HTCache_flushAll (void);
                    315: </PRE>
                    316: <H3>
2.11      frystyk   317:   Find a Cached Object
2.10      frystyk   318: </H3>
                    319: <P>
                    320: Verifies whether a cache object exists for this URL and, if so, returns the
                    321: cache object. It does not verify whether the object is valid or not,
                    322: for example it might have expired. Use the cache validation methods for checking
                    323: this.
2.11      frystyk   324: <PRE>
2.24      kahan     325: extern HTCache * HTCache_find (HTParentAnchor * anchor, char * default_name);
2.11      frystyk   326: </PRE>
                    327: <H3>
                    328:   Verify if an Object is Fresh
                    329: </H3>
                    330: <P>
                    331: This function checks whether a document has expired or not. The check is
                    332: based on the metainformation passed in the anchor object. The function returns
                    333: the level of validation needed for getting a fresh version. We also check
                    334: the cache control directives in the request to see if they change the freshness
                    335: decision.
                    336: <PRE>
                    337: extern HTReload HTCache_isFresh (HTCache * me, HTRequest * request);
                    338: </PRE>
                    339: <H3>
                    340:   Register a Cache Hit
                    341: </H3>
                    342: <P>
                    343: As a cache hit may occur in several places, we have a public function where
                    344: we can declare a download to be a true cache hit. The number of hits a cache
                    345: object has affects its status when we are doing garbage collection.
                    346: <PRE>
                    347: extern BOOL HTCache_addHit (HTCache * cache);
                    348: </PRE>
                    349: <H3>
                    350:   Find the Location of a Cached Object
                    351: </H3>
                    352: <P>
                    353: If we have a valid entry in the cache then we also need a location where
                    354: we can get it. Hopefully, we may be able to access it through one of our
                    355: protocol modules, for example the <A HREF="WWWFile.html">local file module</A>.
                    356: The name returned is in URL syntax and must be freed by the caller.
                    357: <PRE>
                    358: extern char * HTCache_name (HTCache * cache);
                    359: </PRE>
                    360: <H3>
                    361:   Locking a Cache Object
                    362: </H3>
                    363: <P>
                    364: While we are creating a new cache object or while we are validating an existing
                    365: one, we must have a lock on the entry so that no other requests can get
                    366: to it in the meantime. A lock can be broken if the same request tries to
                    367: create the cache entry again. This means that we have tried to validate the
                    368: cache entry but we got a new shipment of bytes back from the origin server
                    369: or an intermediary proxy.
                    370: <PRE>
                    371: extern BOOL HTCache_getLock     (HTCache * cache, HTRequest * request);
                    372: extern BOOL HTCache_breakLock   (HTCache * cache, HTRequest * request);
                    373: extern BOOL HTCache_hasLock     (HTCache * cache);
                    374: extern BOOL HTCache_releaseLock (HTCache * cache);
2.1       frystyk   375: </PRE>
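<P>
The locking rules described above can be modelled with a single owner pointer
per entry. This is a simplified model under stated assumptions, not the libwww
implementation: the <CODE>EntryLock</CODE> type and the
<CODE>lock_*</CODE> functions are hypothetical, and it is assumed that a
second request is refused while another request holds the lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an entry is locked while owner is non-NULL. The owner
   is an opaque pointer identifying the request that took the lock. */
typedef struct {
    const void *owner;
} EntryLock;

/* Take the lock for request; fails if the entry is already locked. */
int lock_get(EntryLock *lock, const void *request)
{
    assert(lock != NULL);
    if (request == NULL || lock->owner != NULL) return 0;
    lock->owner = request;
    return 1;
}

/* Break the lock; only the request that holds it may do so. */
int lock_break(EntryLock *lock, const void *request)
{
    assert(lock != NULL);
    if (lock->owner == NULL || lock->owner != request) return 0;
    lock->owner = NULL;
    return 1;
}

/* Is the entry currently locked? */
int lock_has(const EntryLock *lock)
{
    assert(lock != NULL);
    return lock->owner != NULL;
}

/* Release the lock unconditionally, for example once writing is complete. */
int lock_release(EntryLock *lock)
{
    assert(lock != NULL);
    lock->owner = NULL;
    return 1;
}
```

In this model a request that holds the lock and then starts receiving a new
body can break its own lock and create the entry again, while any other request
is kept out until the lock is broken or released.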
                    376: <PRE>
2.26    ! vbancrof  377: #ifdef __cplusplus
        !           378: }
2.1       frystyk   379: #endif
2.26    ! vbancrof  380: 
        !           381: #endif  /* HTCACHE_H */
2.1       frystyk   382: </PRE>
2.10      frystyk   383: <P>
                    384:   <HR>
2.9       frystyk   385: <ADDRESS>
2.26    ! vbancrof  386:   @(#) $Id: HTCache.html,v 2.25 2001/08/30 13:41:03 kahan Exp $
2.9       frystyk   387: </ADDRESS>
2.10      frystyk   388: </BODY></HTML>
