Annotation of libwww/Library/src/HTCache.html, revision 2.26
2.1 frystyk 1: <HTML>
2: <HEAD>
2.16 frystyk 3: <TITLE>W3C Sample Code Library libwww Persistent Cache Manager</TITLE>
2.1 frystyk 4: </HEAD>
5: <BODY>
2.10 frystyk 6: <H1>
7: Persistent Cache Manager
8: </H1>
2.1 frystyk 9: <PRE>
10: /*
11: ** (c) COPYRIGHT MIT 1995.
12: ** Please first read the full copyright statement in the file COPYRIGH.
13: */
14: </PRE>
2.10 frystyk 15: <P>
2.11 frystyk 16: The cache contains details of persistent files which contain the contents
2.10 frystyk 17: of remote documents. The existing cache manager is somewhat naive, especially
2.11 frystyk 18: in its garbage collection, but it is just an example of how it can be
19: done. However, it is a fully HTTP/1.1 compliant cache manager. More advanced
20: implementations are welcome!
2.10 frystyk 21: <P>
22: This module is implemented by <A HREF="HTCache.c">HTCache.c</A>, and it is
2.18 frystyk 23: a part of the <A HREF="http://www.w3.org/Library/">W3C Sample Code Library</A>.
2.1 frystyk 24: <PRE>
25: #ifndef HTCACHE_H
26: #define HTCACHE_H
27:
2.10 frystyk 28: #include "WWWLib.h"
2.26 ! vbancrof 29:
! 30: #ifdef __cplusplus
! 31: extern "C" {
! 32: #endif
2.1 frystyk 33: </PRE>
2.10 frystyk 34: <H2>
2.11 frystyk 35: Initialize and Terminate the Persistent Cache
2.10 frystyk 36: </H2>
37: <P>
2.21 frystyk 38: The <CODE>cache_root</CODE> is the URI of the location of the persistent
39: cache. An example is "<CODE>file:/tmp/w3c-lib</CODE>". If
40: <CODE>cache_root</CODE> is <CODE>NULL</CODE> then determine a cache root
41: using the following algorithm:
42: <OL>
43: <LI>
44: Look for any environment variables (if supported) in the following order:
45: <CODE>WWW_CACHE</CODE>, <CODE>TMP</CODE>, and <CODE>TEMP</CODE>. If none
46: are set then fall back on "<CODE>/tmp</CODE>".
47: <LI>
48: Append the folder name "<CODE>w3c-cache</CODE>" to the root identified above
49: </OL>
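<P>
The two steps above can be sketched as follows. This is only an illustration
of the lookup order, not the code in HTCache.c: the <CODE>sketch_</CODE>
function names are hypothetical, the result is a plain path rather than a URI,
and treating an empty environment variable as unset is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: return the first candidate that is set and non-empty, else "/tmp".
   Skipping empty values is an assumption of this sketch. */
const char *sketch_pick_base(const char *www_cache, const char *tmp,
                             const char *temp)
{
    const char *candidates[3];
    size_t i;
    candidates[0] = www_cache;
    candidates[1] = tmp;
    candidates[2] = temp;
    for (i = 0; i < 3; i++)
        if (candidates[i] != NULL && candidates[i][0] != '\0')
            return candidates[i];
    return "/tmp";
}

/* Step 2: write "<base>/w3c-cache" into buf.
   Returns 0 on success, -1 if buf is too small. */
int sketch_join_root(const char *base, char *buf, size_t len)
{
    size_t n = strlen(base);
    const char *sep = (n > 0 && base[n - 1] == '/') ? "" : "/";
    int written;
    assert(buf != NULL && len > 0);
    written = snprintf(buf, len, "%s%sw3c-cache", base, sep);
    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}

/* Both steps combined, reading the real environment. */
int sketch_default_cache_root(char *buf, size_t len)
{
    return sketch_join_root(
        sketch_pick_base(getenv("WWW_CACHE"), getenv("TMP"), getenv("TEMP")),
        buf, len);
}
```

Splitting the lookup from the environment access keeps the ordering rule easy
to check without touching the process environment.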
50: <P>
51: The <CODE>cache_root</CODE> location does not have to exist; it will be created
52: automatically if it does not. An empty string will make '/' the cache root.
53: <P>
54: The size is the total size in MBytes; the default size is 20M. The cache
55: cannot be smaller than 5M.
56: <P>
57: We can only enable the cache if we are not in <A HREF="HTLib.html#Secure">secure
58: mode</A>, because in secure mode we cannot access the local file system. This is for example
59: the case if using an application as a telnet shell.
2.11 frystyk 60: <PRE>
61: extern BOOL HTCacheInit (const char * cache_root, int size);
2.10 frystyk 62: </PRE>
63: <P>
2.11 frystyk 64: After the cache has been terminated it can not be used anymore unless you
65: do another <CODE>HTCacheInit()</CODE> call.
66: <PRE>
67: extern BOOL HTCacheTerminate (void);
2.10 frystyk 68: </PRE>
2.11 frystyk 69: <H2>
70: Cache Mode Parameters
71: </H2>
2.10 frystyk 72: <P>
2.22 kahan 73: The persistent cache has a set of overall parameters that you can adjust.
2.10 frystyk 74: <H3>
2.11 frystyk 75: Enable and Disable the Cache
2.10 frystyk 76: </H3>
77: <P>
2.11 frystyk 78: The cache can be temporarily suspended by using the enable/disable flag.
79: This does not prevent the cache from being enabled/disabled at a later point
80: in time.
81: <PRE>
82: extern void HTCacheMode_setEnabled (BOOL mode);
83: extern BOOL HTCacheMode_enabled (void);
2.10 frystyk 84: </PRE>
2.22 kahan 85: <P>
86: Whether the cache stores password-protected documents is controlled through the
87: protected flag. By default this flag is turned off.
88: <PRE>
89: extern void HTCacheMode_setProtected (BOOL mode);
90: extern BOOL HTCacheMode_protected (void);
91: </PRE>
2.10 frystyk 92: <H3>
2.11 frystyk 93: What is the current Cache Root?
2.10 frystyk 94: </H3>
95: <P>
2.11 frystyk 96: Return the value of the cache root. The cache root can only be set through
2.21 frystyk 97: the <CODE>HTCacheInit()</CODE> function. The string returned MUST be freed
98: by the caller.
2.11 frystyk 99: <PRE>
2.21 frystyk 100: extern char * HTCacheMode_getRoot (void);
2.10 frystyk 101: </PRE>
102: <H3>
2.11 frystyk 103: Total Cache Size
2.10 frystyk 104: </H3>
105: <P>
2.11 frystyk 106: We set the default cache size to 20M. We set the minimum size to 5M in order
107: not to get into weird problems while writing the cache. The size is given
108: in MBytes and is also returned in MBytes.
2.14 frystyk 109: We don't consider the metainformation as part of the total cache size, which
110: is the reason why the minimum cache size should not be less than 5M.
2.11 frystyk 111: <PRE>
112: extern BOOL HTCacheMode_setMaxSize (int size);
113: extern int HTCacheMode_maxSize (void);
2.10 frystyk 114: </PRE>
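<P>
A minimal sketch of the size rules described above. The names are hypothetical
and two points are assumptions rather than statements about HTCache.c: that a
request below the minimum is raised to 5M instead of being rejected, and that
one MByte is 1024 * 1024 bytes.

```c
#include <assert.h>

#define SKETCH_DEFAULT_CACHE_MB 20   /* default total size from the text */
#define SKETCH_MIN_CACHE_MB      5   /* minimum total size from the text */

/* Size actually used, in MBytes, for a requested size in MBytes.
   Clamping (rather than failing) is an assumption of this sketch. */
int sketch_effective_cache_mb(int requested_mb)
{
    return requested_mb < SKETCH_MIN_CACHE_MB ? SKETCH_MIN_CACHE_MB
                                              : requested_mb;
}

/* Convert MBytes to bytes, assuming 1 MByte = 1024 * 1024 bytes. */
long sketch_mb_to_bytes(int mb)
{
    assert(mb >= 0);
    return (long) mb * 1024L * 1024L;
}
```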
115: <H3>
2.19 frystyk 116: Max Size of a Single Cache Entry
117: </H3>
118: <P>
2.20 frystyk 119: It is also possible to control the max size of a single cache entry so that
120: the cache doesn't get filled with a few very large cached entries.
121: The default max size for a single cached entry is 3M. The value indicated
122: must be in MBytes; for example, a value of 3 would mean 3 MBytes.
2.19 frystyk 123: <PRE>
124: extern BOOL HTCacheMode_setMaxCacheEntrySize (int size);
125: extern int HTCacheMode_maxCacheEntrySize (void);
126: </PRE>
127: <H3>
2.23 kahan 128: Default expiration time of cache entries
129: </H3>
130: <P>
131: If a response does not arrive with an expiration time and does not
132: explicitly forbid its being cached, use the default expiration time. The
133: time is given in seconds (e.g., 3,600 is one hour).
134: <PRE>
135: extern void HTCacheMode_setDefaultExpiration (const int exp_time);
136: extern int HTCacheMode_DefaultExpiration (void);
137: </PRE>
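<P>
The rule above can be illustrated with a small self-contained function. It is
a sketch only: the function name is hypothetical, a missing expiration time is
represented as <CODE>(time_t) -1</CODE> by assumption, and responses that
forbid caching are assumed to have been filtered out before this point.

```c
#include <assert.h>
#include <time.h>

/* Pick the expiration time for a cacheable response.
   expires == (time_t) -1 means the response carried no expiration time;
   default_exp is the default expiration in seconds (3600 is one hour). */
time_t sketch_expiry(time_t response_time, time_t expires, int default_exp)
{
    assert(default_exp >= 0);
    if (expires != (time_t) -1)
        return expires;                    /* the response said when */
    return response_time + default_exp;    /* fall back on the default */
}
```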
138: <H3>
2.10 frystyk 139: How do we handle Expiration of Cached Objects?
140: </H3>
141: <P>
142: There are various ways of handling the <CODE>Expires</CODE> header when met in
2.11 frystyk 143: a <I>history list</I>. Either it can be ignored altogether, the user can
144: be notified with a warning, or the document can be reloaded automatically.
145: This flag decides which action is taken. The default action is
2.10 frystyk 146: <CODE>HT_EXPIRES_IGNORE</CODE>. In <CODE>HT_EXPIRES_NOTIFY</CODE> mode,
147: we push a message on to the Error stack which is presented to the user.
2.4 frystyk 148: <PRE>
149: typedef enum _HTExpiresMode {
150: HT_EXPIRES_IGNORE = 0,
151: HT_EXPIRES_NOTIFY,
152: HT_EXPIRES_AUTO
153: } HTExpiresMode;
154:
2.11 frystyk 155: extern void HTCacheMode_setExpires (HTExpiresMode mode);
156: extern HTExpiresMode HTCacheMode_expires (void);
157: </PRE>
158: <H3>
159: Disconnected Operation
160: </H3>
161: <P>
162: The cache can be set to handle disconnected operation where it does not use
2.20 frystyk 163: the network to validate entries and does not attempt to load new documents.
2.11 frystyk 164: All requests that can not be fulfilled by the cache will be returned with
165: a <CODE>"504 Gateway Timeout"</CODE> response. There are two modes of how
2.21 frystyk 166: the cache can operate in disconnected mode:
2.20 frystyk 167: <DL>
168: <DT>
169: <EM>No network activity at all</EM>
170: <DD>
171: Here it uses its own persistent cache.
172: <DT>
173: <EM>Forward all disconnected requests to a proxy cache</EM>
174: <DD>
175: Here it uses the HTTP/1.1 cache-control to indicate that the proxy should
176: operate in disconnected mode. This mode only really makes sense when you
177: are using a proxy, of course.
178: </DL>
2.11 frystyk 179: <PRE>
180: typedef enum _HTDisconnectedMode {
181: HT_DISCONNECT_NONE = 0,
182: HT_DISCONNECT_NORMAL = 1,
183: HT_DISCONNECT_EXTERNAL = 2
184: } HTDisconnectedMode;
185:
186: extern void HTCacheMode_setDisconnected (HTDisconnectedMode mode);
187: extern HTDisconnectedMode HTCacheMode_disconnected (void);
188: extern BOOL HTCacheMode_isDisconnected (HTReload mode);
2.1 frystyk 189: </PRE>
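<P>
The decision described above can be modelled roughly as follows. All names are
hypothetical stand-ins and the logic is a simplification: "usable entry" here
means an entry that can be served without validation. The
<CODE>only-if-cached</CODE> directive is the HTTP/1.1 cache-control value that
asks a proxy to answer from its cache or reply with 504; that libwww sends
exactly this directive in external mode is an assumption of the sketch.

```c
#include <assert.h>

typedef enum {
    SK_DISCONNECT_NONE = 0,     /* connected: the network may be used */
    SK_DISCONNECT_NORMAL,       /* no network activity at all */
    SK_DISCONNECT_EXTERNAL      /* forward to a proxy cache */
} SkDisconnected;

typedef enum {
    SK_SERVE_FROM_CACHE,        /* answer from our own persistent cache */
    SK_USE_NETWORK,             /* ordinary request */
    SK_ASK_PROXY_ONLY_IF_CACHED,/* send "Cache-Control: only-if-cached" */
    SK_GATEWAY_TIMEOUT_504      /* cannot be fulfilled while disconnected */
} SkOutcome;

/* Decide how to handle a request given the mode and the local cache state. */
SkOutcome sketch_route(SkDisconnected mode, int have_usable_entry)
{
    if (have_usable_entry)
        return SK_SERVE_FROM_CACHE;
    switch (mode) {
    case SK_DISCONNECT_NORMAL:
        return SK_GATEWAY_TIMEOUT_504;
    case SK_DISCONNECT_EXTERNAL:
        return SK_ASK_PROXY_ONLY_IF_CACHED;
    default:
        return SK_USE_NETWORK;
    }
}
```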
2.10 frystyk 190: <H2>
2.12 frystyk 191: The Cache Index
192: </H2>
193: <P>
194: The persistent cache keeps an index of its current entries so that garbage
195: collection and lookup become more efficient. This index is stored automatically
196: at regular intervals so that we don't get out of sync. Also, it is automatically
197: loaded at startup and saved at closedown of the cache.
198: <H3>
199: Reading the Cache Index
200: </H3>
201: <P>
202: Read the saved set of cached entries from disk. We only allow the index to
203: be read when there are no entries in memory. That way we can ensure consistency.
204: <PRE>
205: extern BOOL HTCacheIndex_read (const char * cache_root);
206: </PRE>
207: <H3>
208: Write the Cache Index
209: </H3>
210: <P>
211: Walk through the list of cached objects and save them to disk. We overwrite
212: any existing version but that is normally OK as we have already read its
213: contents.
214: <PRE>
215: extern BOOL HTCacheIndex_write (const char * cache_root);
216: </PRE>
217: <H2>
2.11 frystyk 218: The HTCache Object
2.10 frystyk 219: </H2>
220: <P>
2.11 frystyk 221: The cache object is what we store about a cached object in memory.
222: <PRE>
223: typedef struct _HTCache HTCache;
224: </PRE>
225: <H3>
2.12 frystyk 226: Create and Update a Cache Object
2.11 frystyk 227: </H3>
228: <P>
2.10 frystyk 229: Filling the cache is done as all other transportation of bulk data in libwww
230: using <A HREF="HTStream.html">streams</A>. The cache object creator is a
231: stream which in many cases sits on a <A HREF="HTTee.html">T stream</A> so
232: that we get the original feed and at the same time can parse the contents.
2.14 frystyk 233: <P>
234: In some situations, we want to append data to an already existing cache entry.
235: This is the case when a user has interrupted a download and we are stuck with
236: a subpart of the document. If the user later on wishes to download the object
237: again we can issue a range request and continue from where we were. This
238: will in many situations save a lot of bandwidth.
2.11 frystyk 239: <PRE>
2.14 frystyk 240: extern HTConverter HTCacheWriter, HTCacheAppend;
2.11 frystyk 241: </PRE>
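<P>
To make the append case concrete, here is a sketch of how the range request
for the missing tail could be formed. The function name is hypothetical and
this is not the code used by libwww; it only shows the standard HTTP/1.1
open-ended byte range syntax, starting at the number of bytes already stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a Range header asking for everything after the bytes we already
   hold. Returns 0 on success, -1 if buf is too small. */
int sketch_range_header(long bytes_cached, char *buf, size_t len)
{
    int n;
    assert(bytes_cached >= 0 && buf != NULL && len > 0);
    n = snprintf(buf, len, "Range: bytes=%ld-", bytes_cached);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

A real client would normally send a validator along with the range (for
example an <CODE>If-Range</CODE> header carrying the stored etag) so that a
changed document is returned in full instead of being appended to stale data.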
2.12 frystyk 242: <P>
243: This function writes the metainformation along with the data object stored
244: by the HTCacheWriter stream above. If no headers are available then the meta
245: file is empty.
246: <PRE>
2.14 frystyk 247: extern BOOL HTCache_writeMeta (HTCache * cache, HTRequest * request,
248: HTResponse * response);
2.12 frystyk 249: </PRE>
250: <P>
251: In case we received a "<CODE>304 Not Modified</CODE>" response then we do
252: not have to touch the body but must merge the metainformation with the previous
253: version. Therefore we need a special metainformation update function.
254: <PRE>
2.14 frystyk 255: extern BOOL HTCache_updateMeta (HTCache * cache, HTRequest * request,
256: HTResponse * response);
2.12 frystyk 257: </PRE>
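<P>
The merge step for a 304 response can be pictured with a small model. The
structure and field selection below are hypothetical and much simpler than the
real metainformation: a field present in the 304 response replaces the stored
value, and a field absent from it leaves the stored value untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the metainformation kept for a cache entry.
   NULL means "header not present". */
typedef struct {
    const char *etag;
    const char *expires;
    const char *last_modified;
} SkMeta;

/* Merge the headers of a "304 Not Modified" response into the stored set. */
void sketch_merge_meta(SkMeta *stored, const SkMeta *update)
{
    assert(stored != NULL && update != NULL);
    if (update->etag != NULL)          stored->etag = update->etag;
    if (update->expires != NULL)       stored->expires = update->expires;
    if (update->last_modified != NULL) stored->last_modified = update->last_modified;
}
```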
2.25 kahan 258: <P>
259: Clear a cache entry
260: <PRE>
261: extern BOOL HTCache_resetMeta (HTCache * cache, HTRequest * request,
262: HTResponse * response);
263: </PRE>
2.11 frystyk 264: <H3>
2.18 frystyk 265: Check Cached Entry
266: </H3>
267: <P>
268: After we get a response back, we should check whether we can still cache
269: an entry and/or we should add an entry for a resource that has just been
270: created so that we can remember the etag and other things. The latter allows
271: us to guarantee that we don't lose data due to the lost update problem.
272: <PRE>
273: extern HTCache * HTCache_touch (HTRequest * request, HTResponse * response,
274: HTParentAnchor * anchor);
275: </PRE>
276: <P>
277: <H3>
2.11 frystyk 278: Load a Cached Object
279: </H3>
280: <P>
281: Loading a cached object is also done as all other loads in libwww by using
282: a <A HREF="HTProt.html">protocol load module</A>. For the moment, this load
283: function handles the persistent cache as if it were a local file but in fact
284: it could be anywhere.
285: <PRE>
2.15 frystyk 286: extern HTProtCallback HTLoadCache;
2.11 frystyk 287: </PRE>
288: <H3>
289: Delete a Cache Object
290: </H3>
291: <P>
292: Remove an HTCache object from memory and from disk. You must explicitly remove
293: a lock before this operation can succeed.
294: <PRE>
295: extern BOOL HTCache_remove (HTCache * cache);
296: </PRE>
297: <H3>
2.13 frystyk 298: Delete All Cache Objects in Memory
2.11 frystyk 299: </H3>
300: <P>
301: Destroys all cache entries in memory but does not write anything to disk.
302: Use the index methods above for doing that. We do not delete the disk contents.
303: <PRE>
304: extern BOOL HTCache_deleteAll (void);
2.10 frystyk 305: </PRE>
306: <H3>
2.13 frystyk 307: Delete all Cache Object and File Entries
308: </H3>
309: <P>
310: Destroys all cache entries in memory <B>and</B> on disk. This call basically
311: resets the cache to the initial state but it does not terminate the cache.
312: That is, you don't have to reinitialize the cache before you can use it again.
313: <PRE>
314: extern BOOL HTCache_flushAll (void);
315: </PRE>
316: <H3>
2.11 frystyk 317: Find a Cached Object
2.10 frystyk 318: </H3>
319: <P>
320: Verifies if a cache object exists for this URL and if so returns a URL for
321: the cached object. It does not verify whether the object is valid or not,
322: for example it might have expired. Use the cache validation methods for checking
323: this.
2.11 frystyk 324: <PRE>
2.24 kahan 325: extern HTCache * HTCache_find (HTParentAnchor * anchor, char * default_name);
2.11 frystyk 326: </PRE>
327: <H3>
328: Verify if an Object is Fresh
329: </H3>
330: <P>
331: This function checks whether a document has expired or not. The check is
332: based on the metainformation passed in the anchor object. The function returns
333: the level of validation needed for getting a fresh version. We also check
334: the cache control directives in the request to see if they change the freshness
335: decision.
336: <PRE>
337: extern HTReload HTCache_isFresh (HTCache * me, HTRequest * request);
338: </PRE>
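<P>
As a rough illustration of the freshness decision, the sketch below combines a
stored expiration time with an optional <CODE>max-age</CODE> directive from the
request. It is deliberately simplified: the names are hypothetical, the age is
taken as the time since the entry was stored (the full HTTP/1.1 age
calculation is more involved), and only two outcomes are modelled, whereas the
real function returns an <CODE>HTReload</CODE> level.

```c
#include <assert.h>
#include <time.h>

typedef enum { SK_FRESH = 0, SK_NEEDS_VALIDATION } SkFreshness;

/* request_max_age < 0 means the request carries no max-age directive. */
SkFreshness sketch_is_fresh(time_t now, time_t stored_at, time_t expires,
                            long request_max_age)
{
    long age;
    assert(now >= stored_at);
    age = (long) (now - stored_at);
    if (now >= expires)
        return SK_NEEDS_VALIDATION;   /* the entry itself has expired */
    if (request_max_age >= 0 && age > request_max_age)
        return SK_NEEDS_VALIDATION;   /* too old for this request */
    return SK_FRESH;
}
```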
339: <H3>
340: Register a Cache Hit
341: </H3>
342: <P>
343: As a cache hit may occur in several places, we have a public function where
344: we can declare a download to be a true cache hit. The number of hits a cache
345: object has affects its status when we are doing garbage collection.
346: <PRE>
347: extern BOOL HTCache_addHit (HTCache * cache);
348: </PRE>
349: <H3>
350: Find the Location of a Cached Object
351: </H3>
352: <P>
353: If we have a valid entry in the cache then we also need a location where
354: we can get it. Hopefully, we may be able to access it through one of our
355: protocol modules, for example the <A HREF="WWWFile.html">local file module</A>.
356: The name returned is in URL syntax and must be freed by the caller.
357: <PRE>
358: extern char * HTCache_name (HTCache * cache);
359: </PRE>
360: <H3>
361: Locking a Cache Object
362: </H3>
363: <P>
364: While we are creating a new cache object or while we are validating an existing
365: one, we must have a lock on the entry so that no other requests can get
366: to it in the meantime. A lock can be broken if the same request tries to
367: create the cache entry again. This means that we have tried to validate the
368: cache entry but we got a new shipment of bytes back from the origin server
369: or an intermediary proxy.
370: <PRE>
371: extern BOOL HTCache_getLock (HTCache * cache, HTRequest * request);
372: extern BOOL HTCache_breakLock (HTCache * cache, HTRequest * request);
373: extern BOOL HTCache_hasLock (HTCache * cache);
374: extern BOOL HTCache_releaseLock (HTCache * cache);
2.1 frystyk 375: </PRE>
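<P>
The locking rules described above can be modelled with a single owner pointer.
This is a sketch with hypothetical names, not the libwww implementation: a
lock is taken only if the entry is unlocked, and it can be broken only by the
request that holds it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: owner is the request holding the lock,
   or NULL when the entry is unlocked. */
typedef struct {
    const void *owner;
} SkEntry;

/* Take the lock for a request; fails if any request already holds it. */
int sketch_get_lock(SkEntry *e, const void *request)
{
    assert(e != NULL && request != NULL);
    if (e->owner != NULL)
        return 0;
    e->owner = request;
    return 1;
}

/* Break the lock; only the request that holds it may do so. */
int sketch_break_lock(SkEntry *e, const void *request)
{
    assert(e != NULL);
    if (e->owner != NULL && e->owner == request) {
        e->owner = NULL;
        return 1;
    }
    return 0;
}

int sketch_has_lock(const SkEntry *e)
{
    assert(e != NULL);
    return e->owner != NULL;
}

/* Release the lock unconditionally. */
int sketch_release_lock(SkEntry *e)
{
    assert(e != NULL);
    e->owner = NULL;
    return 1;
}
```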
376: <PRE>
2.26 ! vbancrof 377: #ifdef __cplusplus
! 378: }
2.1 frystyk 379: #endif
2.26 ! vbancrof 380:
! 381: #endif /* HTCACHE_H */
2.1 frystyk 382: </PRE>
2.10 frystyk 383: <P>
384: <HR>
2.9 frystyk 385: <ADDRESS>
2.26 ! vbancrof 386: @(#) $Id: HTCache.html,v 2.25 2001/08/30 13:41:03 kahan Exp $
2.9 frystyk 387: </ADDRESS>
2.10 frystyk 388: </BODY></HTML>