Annotation of libwww/Library/src/HTCache.html, revision 2.17

2.1       frystyk     1: <HTML>
                      2: <HEAD>
2.16      frystyk     3:   <TITLE>W3C Sample Code Library libwww Persistent Cache Manager</TITLE>
2.1       frystyk     4: </HEAD>
                      5: <BODY>
2.10      frystyk     6: <H1>
                      7:   Persistent Cache Manager
                      8: </H1>
2.1       frystyk     9: <PRE>
                     10: /*
                     11: **     (c) COPYRIGHT MIT 1995.
                     12: **     Please first read the full copyright statement in the file COPYRIGH.
                     13: */
                     14: </PRE>
2.10      frystyk    15: <P>
2.11      frystyk    16: The cache contains details of persistent files which contain the contents
2.10      frystyk    17: of remote documents. The existing cache manager is somewhat naive - especially
2.11      frystyk    18: in its garbage collection, but it is just an example of how it can be
                     19: done. However, it is a fully HTTP/1.1 compliant cache manager. More advanced
                     20: implementations are welcome!
2.10      frystyk    21: <P>
                     22: This module is implemented by <A HREF="HTCache.c">HTCache.c</A>, and it is
2.17    ! frystyk    23: a part of the <A HREF="http://www.w3.org/Library/">W3C Sample Code
2.10      frystyk    24: Library</A>.
2.1       frystyk    25: <PRE>
                     26: #ifndef HTCACHE_H
                     27: #define HTCACHE_H
                     28: 
2.10      frystyk    29: #include "WWWLib.h"
2.1       frystyk    30: </PRE>
2.10      frystyk    31: <H2>
2.11      frystyk    32:   Initialize and Terminate the Persistent Cache
2.10      frystyk    33: </H2>
                     34: <P>
2.11      frystyk    35: If <CODE>cache_root</CODE> is <CODE>NULL</CODE> then use <CODE>HT_CACHE_ROOT</CODE>
                     36: which by default is set to "<CODE>/tmp/w3c-lib</CODE>". The
                     37: <CODE>cache_root</CODE> location does not have to exist; it will be created
                     38: automatically if not. An empty string will make '/' the cache root. The size
                     39: is the total size in MBytes - the default size is 20M. The cache can not
                     40: be less than 5M. We can only enable the cache if we are not in
                     41: <A HREF="HTLib.html#Secure">secure mode</A>, where we can not access the local
                     42: file system. Secure mode is for example the case when using an application as
                     43: a telnet shell.
                     44: <PRE>
                     45: extern BOOL HTCacheInit (const char * cache_root, int size);
2.10      frystyk    46: </PRE>
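<P>
The rules for <CODE>cache_root</CODE> described above can be sketched as a
small standalone helper. This is only an illustration: the names
<CODE>sketch_resolve_root</CODE> and <CODE>SKETCH_CACHE_ROOT</CODE> are
hypothetical and are not part of libwww, and the default path is simply the
one quoted in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Default root as quoted in the text; the library defines the real value. */
#define SKETCH_CACHE_ROOT "/tmp/w3c-lib"

/* Hypothetical helper (not a libwww function): pick the directory that a
   given cache_root argument stands for. */
const char *sketch_resolve_root(const char *cache_root)
{
    if (cache_root == NULL)
        return SKETCH_CACHE_ROOT;   /* NULL selects the default root */
    if (cache_root[0] == '\0')
        return "/";                 /* empty string selects '/' */
    return cache_root;              /* anything else is used as given */
}

/* Small self-check that doubles as a usage example. */
void sketch_resolve_root_demo(void)
{
    assert(strcmp(sketch_resolve_root(NULL), SKETCH_CACHE_ROOT) == 0);
    assert(strcmp(sketch_resolve_root(""), "/") == 0);
    assert(strcmp(sketch_resolve_root("/var/cache/www"), "/var/cache/www") == 0);
}
```

Creating the directory when it does not exist is left out of the sketch on
purpose, since that part depends on the local file system.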
                     47: <P>
2.11      frystyk    48: After the cache has been terminated it can not be used anymore unless you
                     49: do another <CODE>HTCacheInit()</CODE> call.
                     50: <PRE>
                     51: extern BOOL HTCacheTerminate (void);
2.10      frystyk    52: </PRE>
2.11      frystyk    53: <H2>
                     54:   Cache Mode Parameters
                     55: </H2>
2.10      frystyk    56: <P>
2.11      frystyk    57: The persistent cache has a set of overall parameters that you can adjust.
2.10      frystyk    58: <H3>
2.11      frystyk    59:   Enable and Disable the Cache
2.10      frystyk    60: </H3>
                     61: <P>
2.11      frystyk    62: The cache can be temporarily suspended by using the enable/disable flag.
                     63: This does not prevent the cache from being enabled or disabled at a later point
                     64: in time.
                     65: <PRE>
                     66: extern void HTCacheMode_setEnabled (BOOL mode);
                     67: extern BOOL HTCacheMode_enabled (void);
2.10      frystyk    68: </PRE>
                     69: <H3>
2.11      frystyk    70:   What is the current Cache Root?
2.10      frystyk    71: </H3>
                     72: <P>
2.11      frystyk    73: Return the value of the cache root. The cache root can only be set through
                     74: the <CODE>HTCacheInit()</CODE> function.
                     75: <PRE>
                     76: extern const char * HTCacheMode_getRoot        (void);
2.10      frystyk    77: </PRE>
                     78: <H3>
2.11      frystyk    79:   Total Cache Size
2.10      frystyk    80: </H3>
                     81: <P>
2.11      frystyk    82: We set the default cache size to 20M. We set the minimum size to 5M in order
                     83: not to get into weird problems while writing the cache. The size is given
                     84: in MBytes and is also returned in MBytes.
2.14      frystyk    85: We don't consider the metainformation as part of the total cache size, which
                     86: is the reason why the min cache size should not be less than 5M.
2.11      frystyk    87: <PRE>
                     88: extern BOOL HTCacheMode_setMaxSize (int size);
                     89: extern int  HTCacheMode_maxSize    (void);
2.10      frystyk    90: </PRE>
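<P>
As a rough illustration of the 5M minimum, the sketch below raises any
smaller request to the minimum and converts MBytes to bytes. Whether the
library clamps a too-small value or rejects it is not stated here, so the
clamping policy is an assumption, and both helper names are hypothetical.

```c
#include <assert.h>

#define SKETCH_MIN_MB 5   /* minimum size stated in the text */

/* Hypothetical helper: raise any request below the minimum to the minimum.
   Clamping (rather than rejecting) is an assumption of this sketch. */
int sketch_effective_size_mb(int requested_mb)
{
    return requested_mb < SKETCH_MIN_MB ? SKETCH_MIN_MB : requested_mb;
}

/* Convert MBytes to bytes for internal bookkeeping. */
long sketch_mb_to_bytes(int mb)
{
    assert(mb >= 0 && mb <= 2047);   /* keep the result inside a 32-bit long */
    return (long)mb * 1024L * 1024L;
}
```

For example, a request for 3 MBytes would end up as 5 MBytes under this
policy, while a request for 20 MBytes is left unchanged.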
                     91: <H3>
                     92:   How do we handle Expiration of Cached Objects?
                     93: </H3>
                     94: <P>
                     95: There are various ways of handling the <CODE>Expires</CODE> header when met in
2.11      frystyk    96: a <I>history list</I>. Either it can be ignored altogether, the user can
                     97: be notified with a warning, or the document can be reloaded automatically.
                     98: This flag decides what action is to be taken. The default action is
2.10      frystyk    99: <CODE>HT_EXPIRES_IGNORE</CODE>. In <CODE>HT_EXPIRES_NOTIFY</CODE> mode,
                    100: we push a message onto the Error stack which is presented to the user.
2.4       frystyk   101: <PRE>
                    102: typedef enum _HTExpiresMode {
                    103:     HT_EXPIRES_IGNORE = 0,
                    104:     HT_EXPIRES_NOTIFY,
                    105:     HT_EXPIRES_AUTO
                    106: } HTExpiresMode;
                    107: 
2.11      frystyk   108: extern void HTCacheMode_setExpires (HTExpiresMode mode);
                    109: extern HTExpiresMode HTCacheMode_expires (void);
                    110: </PRE>
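<P>
The three modes can be read as a simple dispatch on what to do with an expired
document found in the history list. The sketch below restates the enumeration
so that it compiles on its own; the <CODE>SketchExpiredAction</CODE> type and
the <CODE>sketch_on_expired</CODE> function are hypothetical and only mirror
the prose above.

```c
#include <assert.h>

/* Restated from the declaration above so the sketch is self-contained. */
typedef enum {
    HT_EXPIRES_IGNORE = 0,
    HT_EXPIRES_NOTIFY,
    HT_EXPIRES_AUTO
} HTExpiresMode;

/* Hypothetical outcome of meeting an expired document. */
typedef enum {
    SKETCH_SHOW_CACHED,          /* use the cached copy silently      */
    SKETCH_SHOW_CACHED_AND_WARN, /* use it, but tell the user         */
    SKETCH_RELOAD                /* fetch a new version automatically */
} SketchExpiredAction;

SketchExpiredAction sketch_on_expired(HTExpiresMode mode)
{
    assert(mode >= HT_EXPIRES_IGNORE && mode <= HT_EXPIRES_AUTO);
    switch (mode) {
    case HT_EXPIRES_NOTIFY: return SKETCH_SHOW_CACHED_AND_WARN;
    case HT_EXPIRES_AUTO:   return SKETCH_RELOAD;
    case HT_EXPIRES_IGNORE:
    default:                return SKETCH_SHOW_CACHED;
    }
}
```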
                    111: <H3>
                    112:   Disconnected Operation
                    113: </H3>
                    114: <P>
                    115: The cache can be set to handle disconnected operation where it does not use
                    116: the network to validate entries and does not attempt to load new versions.
                    117: All requests that can not be fulfilled by the cache will be returned with
                    118: a <CODE>"504 Gateway Timeout"</CODE> response. There are two modes of how
                    119: the cache can operate in disconnected mode: it can use disconnected mode on
                    120: its own persistent cache or it can forward the disconnected request to a
                    121: proxy cache, for example. The latter mode only really makes sense when you
                    122: are using a proxy, of course.
                    123: <PRE>
                    124: typedef enum _HTDisconnectedMode {
                    125:     HT_DISCONNECT_NONE     = 0,
                    126:     HT_DISCONNECT_NORMAL   = 1,
                    127:     HT_DISCONNECT_EXTERNAL = 2
                    128: } HTDisconnectedMode;
                    129: 
                    130: extern void HTCacheMode_setDisconnected (HTDisconnectedMode mode);
                    131: extern HTDisconnectedMode HTCacheMode_disconnected (void);
                    132: extern BOOL HTCacheMode_isDisconnected (HTReload mode);
2.1       frystyk   133: </PRE>
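<P>
One way to picture the two disconnected modes is as a decision on where a
request goes. The sketch below is a simplified model, not the library's
logic: the <CODE>SketchAction</CODE> type and <CODE>sketch_dispatch</CODE>
are hypothetical, and the assumption that external mode always forwards to
the proxy is ours.

```c
#include <assert.h>

/* Restated from the declaration above so the sketch is self-contained. */
typedef enum {
    HT_DISCONNECT_NONE     = 0,
    HT_DISCONNECT_NORMAL   = 1,
    HT_DISCONNECT_EXTERNAL = 2
} HTDisconnectedMode;

/* Hypothetical outcomes for a single request. */
typedef enum {
    SKETCH_NORMAL_OPERATION,   /* network may be used as usual         */
    SKETCH_SERVE_FROM_CACHE,   /* answer from the persistent cache     */
    SKETCH_FAIL_504,           /* "504 Gateway Timeout"                */
    SKETCH_FORWARD_TO_PROXY    /* let a proxy cache try to answer      */
} SketchAction;

SketchAction sketch_dispatch(HTDisconnectedMode mode, int in_local_cache)
{
    assert(mode >= HT_DISCONNECT_NONE && mode <= HT_DISCONNECT_EXTERNAL);
    switch (mode) {
    case HT_DISCONNECT_NORMAL:
        /* Only our own cache may answer; anything else fails. */
        return in_local_cache ? SKETCH_SERVE_FROM_CACHE : SKETCH_FAIL_504;
    case HT_DISCONNECT_EXTERNAL:
        /* Assumption: the disconnected request is handed to the proxy. */
        return SKETCH_FORWARD_TO_PROXY;
    case HT_DISCONNECT_NONE:
    default:
        return SKETCH_NORMAL_OPERATION;
    }
}
```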
2.10      frystyk   134: <H2>
2.12      frystyk   135:   The Cache Index
                    136: </H2>
                    137: <P>
                    138: The persistent cache keeps an index of its current entries so that garbage
                    139: collection and lookup become more efficient. This index is stored automatically
                    140: at regular intervals so that we don't get out of sync. Also, it is automatically
                    141: loaded at startup and saved at closedown of the cache.
                    142: <H3>
                    143:   Reading the Cache Index
                    144: </H3>
                    145: <P>
                    146: Read the saved set of cached entries from disk. We only allow the index to
                    147: be read when there are no entries in memory. That way we can ensure consistency.
                    148: <PRE>
                    149: extern BOOL HTCacheIndex_read (const char * cache_root);
                    150: </PRE>
                    151: <H3>
                    152:   Write the Cache Index
                    153: </H3>
                    154: <P>
                    155: Walk through the list of cached objects and save them to disk. We overwrite
                    156: any existing version but that is normally OK as we have already read its
                    157: contents.
                    158: <PRE>
                    159: extern BOOL HTCacheIndex_write (const char * cache_root);
                    160: </PRE>
                    161: <H2>
2.11      frystyk   162:   The HTCache Object
2.10      frystyk   163: </H2>
                    164: <P>
2.11      frystyk   165: The cache object is what we store about a cached object in memory.
                    166: <PRE>
                    167: typedef struct _HTCache HTCache;
                    168: </PRE>
                    169: <H3>
2.12      frystyk   170:   Create and Update a Cache Object
2.11      frystyk   171: </H3>
                    172: <P>
2.10      frystyk   173: Filling the cache is done as all other transportation of bulk data in libwww
                    174: using <A HREF="HTStream.html">streams</A>. The cache object creator is a
                    175: stream which in many cases sits on a <A HREF="HTTee.html">T stream</A> so
                    176: that we get the original feed and at the same time can parse the contents.
2.14      frystyk   177: <P>
                    178: In some situations, we want to append data to an already existing cache entry.
                    179: This is the case when a user has interrupted a download and we are stuck with
                    180: a subpart of the document. If the user later on wishes to download the object
                    181: again we can issue a range request and continue from where we were. This
                    182: will in many situations save a lot of bandwidth.
2.11      frystyk   183: <PRE>
2.14      frystyk   184: extern HTConverter HTCacheWriter, HTCacheAppend;
2.11      frystyk   185: </PRE>
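<P>
To make the resume case concrete: HTTP byte ranges are zero-based, so if N
bytes are already in the cache, the rest of the document starts at byte N and
can be requested with a <CODE>Range</CODE> value of <CODE>bytes=N-</CODE>.
The helper below is a standalone sketch with a hypothetical name; it does not
show how the library itself builds the request.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a Range header value that resumes after
   cached_bytes bytes. Returns 0 on success, -1 if the buffer is too small. */
int sketch_resume_range(long cached_bytes, char *buf, size_t buflen)
{
    int n;
    assert(cached_bytes >= 0);
    n = snprintf(buf, buflen, "bytes=%ld-", cached_bytes);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Self-check that doubles as a usage example; returns 1 when all is well. */
int sketch_resume_range_demo(void)
{
    char buf[32];
    char tiny[4];
    if (sketch_resume_range(1024, buf, sizeof buf) != 0) return 0;
    if (strcmp(buf, "bytes=1024-") != 0) return 0;
    /* A buffer that cannot hold the value is reported as an error. */
    if (sketch_resume_range(1024, tiny, sizeof tiny) != -1) return 0;
    return 1;
}
```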
2.12      frystyk   186: <P>
                    187: This function writes the metainformation along with the data object stored
                    188: by the HTCacheWriter stream above. If no headers are available then the meta
                    189: file is empty.
                    190: <PRE>
2.14      frystyk   191: extern BOOL HTCache_writeMeta (HTCache * cache, HTRequest * request,
                    192:                                HTResponse * response);
2.12      frystyk   193: </PRE>
                    194: <P>
                    195: In case we received a "<CODE>304 Not Modified</CODE>" response then we do
                    196: not have to touch the body but must merge the metainformation with the previous
                    197: version. Therefore we need a special metainformation update function.
                    198: <PRE>
2.14      frystyk   199: extern BOOL HTCache_updateMeta (HTCache * cache, HTRequest * request,
                    200:                                 HTResponse * response);
2.12      frystyk   201: </PRE>
2.11      frystyk   202: <H3>
                    203:   Load a Cached Object
                    204: </H3>
                    205: <P>
                    206: Loading a cached object is also done as all other loads in libwww by using
                    207: a <A HREF="HTProt.html">protocol load module</A>. For the moment, this load
                    208: function handles the persistent cache as if it were a local file, but in fact
                    209: it could be anywhere.
                    210: <PRE>
2.15      frystyk   211: extern HTProtCallback HTLoadCache;
2.11      frystyk   212: </PRE>
                    213: <H3>
                    214:   Delete a Cache Object
                    215: </H3>
                    216: <P>
                    217: Remove an HTCache object from memory and from disk. You must explicitly remove
                    218: a lock before this operation can succeed.
                    219: <PRE>
                    220: extern BOOL HTCache_remove (HTCache * cache);
                    221: </PRE>
                    222: <H3>
2.13      frystyk   223:   Delete All Cache Objects in Memory
2.11      frystyk   224: </H3>
                    225: <P>
                    226: Destroys all cache entries in memory but does not write anything to disk.
                    227: Use the index methods above for doing that. We do not delete the disk contents.
                    228: <PRE>
                    229: extern BOOL HTCache_deleteAll (void);
2.10      frystyk   230: </PRE>
                    231: <H3>
2.13      frystyk   232:   Delete all Cache Object and File Entries
                    233: </H3>
                    234: <P>
                    235: Destroys all cache entries in memory <B>and</B> on disk. This call basically
                    236: resets the cache to the initial state but it does not terminate the cache.
                    237: That is, you don't have to reinitialize the cache before you can use it again.
                    238: <PRE>
                    239: extern BOOL HTCache_flushAll (void);
                    240: </PRE>
                    241: <H3>
2.11      frystyk   242:   Find a Cached Object
2.10      frystyk   243: </H3>
                    244: <P>
                    245: Verifies if a cache object exists for this URL and if so returns the
                    246: cache object. It does not verify whether the object is valid or not;
                    247: for example, it might have expired. Use the cache validation methods for checking
                    248: this.
2.11      frystyk   249: <PRE>
                    250: extern HTCache * HTCache_find (HTParentAnchor * anchor);
                    251: </PRE>
                    252: <H3>
                    253:   Verify if an Object is Fresh
                    254: </H3>
                    255: <P>
                    256: This function checks whether a document has expired or not. The check is
                    257: based on the metainformation passed in the anchor object. The function returns
                    258: the level of validation needed for getting a fresh version. We also check
                    259: the cache control directives in the request to see if they change the freshness
                    260: decision.
                    261: <PRE>
                    262: extern HTReload HTCache_isFresh (HTCache * me, HTRequest * request);
                    263: </PRE>
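<P>
A much simplified model of the freshness decision is shown below. Under
HTTP/1.1 a response is fresh while its current age is less than its freshness
lifetime, and a request directive such as <CODE>no-cache</CODE> can force
validation regardless. The two-valued result and the function name are
assumptions of this sketch; the real function returns an
<CODE>HTReload</CODE> level, which is not modelled here.

```c
#include <assert.h>

/* Hypothetical two-valued result; HTReload has more levels than this. */
typedef enum {
    SKETCH_FRESH = 0,
    SKETCH_MUST_VALIDATE = 1
} SketchFreshness;

/* age_s:            current age of the cached entry, in seconds
   lifetime_s:       freshness lifetime of the entry, in seconds
   request_no_cache: non-zero if the request demands validation */
SketchFreshness sketch_is_fresh(long age_s, long lifetime_s,
                                int request_no_cache)
{
    assert(age_s >= 0);
    if (request_no_cache)
        return SKETCH_MUST_VALIDATE;   /* the request overrides freshness */
    return age_s < lifetime_s ? SKETCH_FRESH : SKETCH_MUST_VALIDATE;
}
```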
                    264: <H3>
                    265:   Register a Cache Hit
                    266: </H3>
                    267: <P>
                    268: As a cache hit may occur in several places, we have a public function where
                    269: we can declare a download to be a true cache hit. The number of hits a cache
                    270: object has affects its status when we are doing garbage collection.
                    271: <PRE>
                    272: extern BOOL HTCache_addHit (HTCache * cache);
                    273: </PRE>
                    274: <H3>
                    275:   Find the Location of a Cached Object
                    276: </H3>
                    277: <P>
                    278: If we have a valid entry in the cache then we also need a location where
                    279: we can get it. Hopefully, we may be able to access it through one of our
                    280: protocol modules, for example the <A HREF="WWWFile.html">local file module</A>.
                    281: The name returned is in URL syntax and must be freed by the caller.
                    282: <PRE>
                    283: extern char * HTCache_name (HTCache * cache);
                    284: </PRE>
                    285: <H3>
                    286:   Locking a Cache Object
                    287: </H3>
                    288: <P>
                    289: While we are creating a new cache object or while we are validating an existing
                    290: one, we must have a lock on the entry so that no other requests can get
                    291: to it in the meantime. A lock can be broken if the same request tries to
                    292: create the cache entry again. This means that we have tried to validate the
                    293: cache entry but we got a new shipment of bytes back from the origin server
                    294: or an intermediary proxy.
                    295: <PRE>
                    296: extern BOOL HTCache_getLock     (HTCache * cache, HTRequest * request);
                    297: extern BOOL HTCache_breakLock   (HTCache * cache, HTRequest * request);
                    298: extern BOOL HTCache_hasLock     (HTCache * cache);
                    299: extern BOOL HTCache_releaseLock (HTCache * cache);
2.1       frystyk   300: </PRE>
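<P>
The locking rules described above can be modelled with a single owner
pointer per entry: a lock is granted only when the entry is unlocked, and it
can be broken only by the request that holds it. The <CODE>SketchEntry</CODE>
type and the <CODE>sketch_*</CODE> functions are hypothetical stand-ins used
to illustrate the semantics, not the library's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry; owner == NULL means the entry is unlocked. */
typedef struct {
    const void *owner;
} SketchEntry;

int sketch_get_lock(SketchEntry *e, const void *request)
{
    assert(e != NULL && request != NULL);
    if (e->owner != NULL) return 0;        /* somebody already holds it */
    e->owner = request;
    return 1;
}

int sketch_break_lock(SketchEntry *e, const void *request)
{
    assert(e != NULL && request != NULL);
    if (e->owner != request) return 0;     /* only the holder may break it */
    e->owner = NULL;
    return 1;
}

int sketch_has_lock(const SketchEntry *e) { return e->owner != NULL; }

int sketch_release_lock(SketchEntry *e) { e->owner = NULL; return 1; }

/* Self-check that doubles as a usage example; returns 1 when all is well. */
int sketch_lock_demo(void)
{
    SketchEntry entry = { NULL };
    int req_a = 0, req_b = 0;              /* stand-ins for two requests */
    if (!sketch_get_lock(&entry, &req_a)) return 0;
    if (sketch_get_lock(&entry, &req_b)) return 0;    /* entry is taken  */
    if (sketch_break_lock(&entry, &req_b)) return 0;  /* not the holder  */
    if (!sketch_break_lock(&entry, &req_a)) return 0; /* holder succeeds */
    if (sketch_has_lock(&entry)) return 0;
    if (!sketch_get_lock(&entry, &req_b)) return 0;   /* free again      */
    return sketch_release_lock(&entry) && !sketch_has_lock(&entry);
}
```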
                    301: <PRE>
                    302: #endif
                    303: </PRE>
2.10      frystyk   304: <P>
                    305:   <HR>
2.9       frystyk   306: <ADDRESS>
2.17    ! frystyk   307:   @(#) $Id: HTCache.html,v 2.16 1997/02/16 18:42:03 frystyk Exp $
2.9       frystyk   308: </ADDRESS>
2.10      frystyk   309: </BODY></HTML>
