File:  [Public] / libwww / Library / src / HTAnchor.html
Revision 2.59: download - view: text, annotated - select for diffs
Tue Dec 22 19:59:58 1998 UTC (25 years, 5 months ago) by frystyk
Branches: MAIN
CVS tags: Release-5-2-8, Release-5-2-6, HEAD, Before-New-Trace-Messages, Amaya_2_4, Amaya-3-2-1, Amaya-3-2
Added HTAnchor_clearAll() which deletes all the metainformation we know about a resource but not the anchor structure itself and how anchors are deleted

<HTML>
<HEAD>
  <!-- Changed by: Henrik Frystyk Nielsen, 16-Jul-1996 -->
  <TITLE>W3C Sample Code Library libwww Anchor Class</TITLE>
</HEAD>
<BODY>
<H1>
  The Anchor Class
</H1>
<PRE>
/*
**	(c) COPYRIGHT MIT 1995.
**	Please first read the full copyright statement in the file COPYRIGH.
*/
</PRE>
<P>
An anchor represents a region of a hypertext document which is linked to
another anchor in the same or a different document. Another name for anchors
would be URLs as an anchor represents all we know about a URL - including
where it points to and who points to it.&nbsp;Because the anchor objects
represent the part of the Web, the application has been in touch, it is often
useful to maintain the anchors throughout the lifetime of the application.
It would actually be most useful if we had persistent anchors so that an
application could build up a higher knowledge about the Web topology.
<P>
This module is implemented by <A HREF="HTAnchor.c">HTAnchor.c</A>, and it
is a part of the <A HREF="http://www.w3.org/Library/"> W3C Sample Code
Library</A>.
<PRE>
#ifndef HTANCHOR_H
#define HTANCHOR_H

</PRE>
<H2>
  Types defined and used by the Anchor Object
</H2>
<P>
This is a set of videly used type definitions used through out the Library:
<PRE>
#include "WWWUtil.h"

typedef HTAtom * HTFormat;
typedef HTAtom * HTLevel;		       /* Used to specify HTML level */
typedef HTAtom * HTEncoding;				    /* C-E and C-T-E */
typedef HTAtom * HTCharset;
typedef HTAtom * HTLanguage;

typedef struct _HTAnchor	HTAnchor;
typedef struct _HTParentAnchor	HTParentAnchor;
typedef struct _HTChildAnchor	HTChildAnchor;

#include "HTLink.h"
#include "HTMethod.h"
#include "HTResponse.h"
</PRE>
<H2>
  The Anchor Class
</H2>
<P>
We have three variants of the Anchor object - I guess some would call them
superclass and subclasses ;-) <A NAME="Generic"></A>
<H3>
  <A NAME="Generic">Anchor Base Class</A>
</H3>
<P>
This is the super class of anchors. We often use this as an argument to the
functions that both accept parent anchors and child anchors. We separate
the first link from the others to avoid too many small mallocs involved by
a list creation. Most anchors only point to one place. <A NAME="parent"></A>
<H3>
  <A NAME="parent">Anchor for a Parent Object</A>
</H3>
<P>
These anchors points to the whole contents of any resource accesible by a
URI. The parent anchor now contains all known metainformation about that
object and in some cases the parent anchor also contains the document itself.
Often we get the metainformation about a document via the entity headers
in the HTTP specification.
<H3>
  <A NAME="child">Anchor for a Child Object</A>
</H3>
<P>
A child anchor is a anchor object that points to a subpart of a hypertext
document. In HTML this is represented by the <CODE>NAME</CODE> tag of the
Anchor element.
<P>
After we have defined the data structures we must define the methods that
can be used on them. All anchors are kept in an internal hash table so that
they are easier to find again.
<H3>
  Find/Create a Parent Anchor
</H3>
<P>
This one is for a reference (link) which is found in a document, and might
not be already loaded. The parent anchor returned can either be created on
the spot or is already in the hash table.
<PRE>
extern HTAnchor * HTAnchor_findAddress		(const char * address);
</PRE>
<H3>
  Find/Create a Child Anchor
</H3>
<P>
This one is for a new child anchor being edited into an existing document.
The parent anchor must already exist but the child returned can either be
created on the spot or is already in the hash table. The <EM>tag</EM> is
the part that's after the '#' sign in a URI.
<PRE>
extern HTChildAnchor * HTAnchor_findChild	(HTParentAnchor *parent,
						 const char *	tag);
</PRE>
<H3>
  Find/Create a Child Anchor and Link to Another Parent
</H3>
<P>
Find a child anchor anchor with a given parent and possibly a <EM>tag</EM>,
and (if passed) link this child to the URI given in the <EM>href</EM>. As
we really want typed links to the caller should also indicate what the type
of the link is (see HTTP spec for more information). The link is
<EM>relative</EM> to the address of the parent anchor.
<PRE>
extern HTChildAnchor * HTAnchor_findChildAndLink (
		HTParentAnchor * parent,		/* May not be 0 */
		const char * tag,			/* May be "" or 0 */
		const char * href,			/* May be "" or 0 */
		HTLinkType ltype);			/* May be 0 */
</PRE>
<H3>
  Delete an Anchor
</H3>
<P>
All outgoing links from parent and children are deleted, and this anchor
is removed from the sources list of all its targets. We also delete the targets.
If this anchor's source list is empty, we delete it and its children.
<PRE>
extern BOOL HTAnchor_delete	(HTParentAnchor *me);
</PRE>
<H3>
  Clear all Anchors
</H3>
<P>
Deletes all the metadata associated with anchors but doesn't delete
the anchor link structure itself. This is much safer than deleting the
complete anchor structure as this represents the complete Web the
application has been in touch with. It also returns a list of all the
objects (hyperdoc) hanging of the parent anchors found while doing
it. These are not deleted by libwww.
<PRE>
extern BOOL HTAnchor_clearAll (HTList * documents);
</PRE>
<H3>
  Delete all Anchors
</H3>
<P>
Deletes <EM>all</EM> anchors and return a list of all the objects (hyperdoc)
hanging of the parent anchors found while doing it. The application may keep
its own list of <CODE>HyperDoc</CODE>s, but this function returns it anyway.
It is <EM>always</EM> for the application to delete any
<CODE>HyperDoc</CODE>s. If NULL then no hyperdocs are returned. Return YES
if OK, else NO.
<P>
<B>Note:</B> This function is different from cleaning up the history list!
<PRE>
extern BOOL HTAnchor_deleteAll	(HTList * objects);
</PRE>
<H3>
  Flatten all anchors into Array
</H3>
<P>
Flattens the anchor web structure into an array.  This is useful for
calculating statistics, sorting the parent anchors etc.<P>

The caller can indicate the size of the array (total number of anchors
if known - otherwise 0).<P>

Return an array that must be freed by the caller or NULL if no
anchors.<P>

<PRE>
extern HTArray * HTAnchor_getArray (int growby);
</PRE>

<H2>
  <A NAME="links">Links and Anchors</A>
</H2>
<P>
Anchor objects are bound together by <A HREF="HTLink.html">Link objects</A>
that carry information about what type of link and whetther we have followed
the link etc. Any anchor object can have zero, one, or many links but the
normal case is one. Therefore we treat this is a special way.
<H3>
  Handling the Main Link
</H3>
<P>
Any outgoing link can at any time be the main destination.
<PRE>
extern BOOL HTAnchor_setMainLink	(HTAnchor * anchor, HTLink * link);
extern HTLink * HTAnchor_mainLink	(HTAnchor * anchor);

extern HTAnchor * HTAnchor_followMainLink (HTAnchor * anchor);
</PRE>
<H3>
  Handling the Sub Links
</H3>
<PRE>
extern BOOL HTAnchor_setSubLinks	(HTAnchor * anchor, HTList * list);
extern HTList * HTAnchor_subLinks	(HTAnchor * anchor);
</PRE>

<H3>
  Search for a Link Type
</H3>

Links can have relations (indicated by the "rel" or "rev" HTML link
attributes).  This function can search an anchor looking for a
specific type, for example "stylesheet".

<PRE>
extern HTLink * HTAnchor_findLinkType (HTAnchor * me, HTLinkType type);
</PRE>

<H2>
  Relations Between Children and Parents
</H2>
<P>
As always, children and parents have a compliated relationship and the libwww
Anchor class is no exception.
<H3>
  Who is Parent?
</H3>
<P>
For parent anchors this returns the anchor itself
<PRE>extern HTParentAnchor * HTAnchor_parent	(HTAnchor *me);
</PRE>
<H3>
  Does it have any Anchors within it?
</H3>
<P>
Does this parent anchor have any children
<PRE>extern BOOL HTAnchor_hasChildren	(HTParentAnchor *me);
</PRE>
<H3>
  Is this anchor a Child?
</H3>
<PRE>
extern BOOL HTAnchor_isChild (HTAnchor * me);
</PRE>

<H3>
  Get the Tag/Fragment/View of this anchor
</H3>

If this is a child anchor then it has a tag (often also called a "fragment"), which
is essentially a specific <B>view</B> of a document. This is why I like to call it
a view instead of a fragment. The string returned (if non-NULL) must be freed by the
caller.

<PRE>
extern char * HTAnchor_view (HTAnchor * me);
</PRE>

<H2>
  Anchor Addresses
</H2>
<P>
There are two addresses of an anchor. The URI that was passed when the anchor
was crated and the physical address that's used when the URI is going to
be requested. The two addresses may be different if the request is going
through a proxy or a gateway or it may have been mapped through a rule file.
<H3>
  Logical Address
</H3>
<P>
Returns the full URI of the anchor, child or parent as a malloc'd string
to be freed by the caller as when the anchor was created.
<PRE>extern char * HTAnchor_address		(HTAnchor * me);
</PRE>
<H3>
  Expanded Logical Address
</H3>
<P>
When expanding URLs within a hypertext document, the base address is taken
as the following value if present (in that order):
<UL>
  <LI>
    <CODE>Content-Base</CODE> header
  <LI>
    <CODE>Content-Location</CODE> header
  <LI>
    Logical address
</UL>
<PRE>extern char * HTAnchor_expandedAddress  (HTAnchor * me);
</PRE>
<H3>
  Physical address
</H3>
<P>
Contains the physical address after we haved looked for proxies etc.
<PRE>extern char * HTAnchor_physical		(HTParentAnchor * me);
extern void HTAnchor_setPhysical	(HTParentAnchor * me, char * protocol);
extern void HTAnchor_clearPhysical	(HTParentAnchor * me);
</PRE>
<H2>
  Entity Body Information
</H2>
<P>
A parent anchor can have a data object bound to it. This data object does
can for example be a parsed version of a HTML that knows how to present itself
to the user, or it can be an unparsed data object. It's completely free for
the application to use this possibility, but a typical usage would to manage
the data object as part of a memory cache.
<PRE>
extern void HTAnchor_setDocument	(HTParentAnchor *me, void * doc);
extern void * HTAnchor_document		(HTParentAnchor *me);
</PRE>
<H2>
  Entity Header Information
</H2>
<P>
The anchor object also contains all the metainformation that we know about
the object.
<H3>
  Clear All header Information
</H3>
<PRE>
extern void HTAnchor_clearHeader	(HTParentAnchor *me);
</PRE>
<H3>
  Inherit Metainformation from the Response object
</H3>
<P>
Once we have decided to cache the object we transfer already parsed
metainformation from the <A HREF="HTResponse.html">HTResponse object </A>to
the anchor object and also the unparsed headers as we may wanna use that
information later.
<PRE>
extern BOOL HTAnchor_update (HTParentAnchor * me, HTResponse * response);
</PRE>
<H3>
  Is the Anchor searchable?
</H3>
<PRE>extern void HTAnchor_clearIndex		(HTParentAnchor * me);
extern void HTAnchor_setIndex		(HTParentAnchor * me);
extern BOOL HTAnchor_isIndex		(HTParentAnchor * me);
</PRE>
<H3>
  Anchor Title
</H3>
<P>
We keep the title in the anchor as we then can refer to it later in the history
list etc. We can also obtain the title element if it is passed as a HTTP
header in the response. Any title element found in an HTML document will
overwrite a title given in a HTTP header.
<PRE>extern const char * HTAnchor_title	(HTParentAnchor *me);
extern void HTAnchor_setTitle		(HTParentAnchor *me,
					 const char *	title);
extern void HTAnchor_appendTitle	(HTParentAnchor *me,
					 const char *	title);
</PRE>
<H3>
  Meta Tags within the Document
</H3>

<PRE>
extern HTAssocList * HTAnchor_meta (HTParentAnchor * me);
extern BOOL HTAnchor_addMeta (HTParentAnchor * me,
			      const char * name, const char * value);
</PRE>

<H4>
  The Robots Meta tag
</H4>

A special case function that looks for any robots meta tag. This tag
contains information about which links a robot can traverse and which
it shouldn't.

<PRE>
extern char * HTAnchor_robots (HTParentAnchor * me);
</PRE>

<H3>
  Content Base
</H3>
<P>
The <CODE>Content-Base</CODE> header may be used for resolving
relative URLs within the entity. If it there is no
<CODE>Content-Base</CODE> header then we use the Content-Location if
present and absolute.
<PRE>
extern char * HTAnchor_base	(HTParentAnchor * me);
extern BOOL HTAnchor_setBase 	(HTParentAnchor * me, char * base);
</PRE>
<H3>
  Content Location
</H3>
<P>
Content location can either be an absolute or a relative URL. The URL may
be either absolute or relative. If it is relative then we parse it relative
to the <CODE>Content-Base</CODE> header of the request URI if any, otherwise
we use the Request URI.
<PRE>
extern char * HTAnchor_location		(HTParentAnchor * me);
extern BOOL HTAnchor_setLocation	(HTParentAnchor * me, char * location);
</PRE>
<H3>
  Media Types (Content-Type)
</H3>
<PRE>
extern HTFormat HTAnchor_format		(HTParentAnchor *me);
extern void HTAnchor_setFormat		(HTParentAnchor *me,
					 HTFormat	form);
</PRE>
<H3>
  Content Type Parameters
</H3>
<P>
The Anchor obejct stores all content parameters in an Association list so
here you will always be able to find them. We also have a few methods for
the special cases: <CODE>charset</CODE> and <CODE>level</CODE> as they are
often needed.
<PRE>
extern HTAssocList * HTAnchor_formatParam (HTParentAnchor * me);

extern BOOL HTAnchor_addFormatParam 	(HTParentAnchor * me,
					const char * name, const char * value);
</PRE>
<H4>
  Charset parameter to Content-Type
</H4>
<PRE>
extern HTCharset HTAnchor_charset	(HTParentAnchor *me);
extern BOOL HTAnchor_setCharset		(HTParentAnchor *me,
					 HTCharset	charset);
</PRE>
<H4>
  Level parameter to Content-Type
</H4>
<PRE>
extern HTLevel HTAnchor_level		(HTParentAnchor * me);
extern BOOL HTAnchor_setLevel		(HTParentAnchor * me,
					 HTLevel	level);
</PRE>
<H3>
  Content Language
</H3>
<PRE>
extern HTList * HTAnchor_language	(HTParentAnchor * me);
extern BOOL HTAnchor_addLanguage	(HTParentAnchor *me, HTLanguage lang);
extern BOOL HTAnchor_deleteLanguageAll  (HTParentAnchor * me);
</PRE>
<H3>
  Content Encoding
</H3>
<PRE>
extern HTList * HTAnchor_encoding	(HTParentAnchor * me);
extern BOOL HTAnchor_addEncoding	(HTParentAnchor * me, HTEncoding enc);
extern BOOL HTAnchor_deleteEncoding     (HTParentAnchor * me, HTEncoding enc);
extern BOOL HTAnchor_deleteEncodingAll  (HTParentAnchor * me);

#define HTAnchor_removeEncoding(a, e)   HTAnchor_deleteEncoding((a), (e))
</PRE>
<H3>
  Content Transfer Encoding
</H3>
<PRE>
extern HTEncoding HTAnchor_contentTransferEncoding	(HTParentAnchor *me);
extern void HTAnchor_setContentTransferEncoding	        (HTParentAnchor *me,
				 	                 HTEncoding	cte);
</PRE>
<H3>
  Content Length
</H3>
<PRE>
extern long int HTAnchor_length	(HTParentAnchor * me);
extern void HTAnchor_setLength	(HTParentAnchor * me, long int length);
extern void HTAnchor_addLength	(HTParentAnchor * me, long int deltalength);
</PRE>
<H3>
  Content MD5
</H3>
<PRE>
extern char * HTAnchor_md5	(HTParentAnchor * me);
extern BOOL HTAnchor_setMd5	(HTParentAnchor * me, const char * hash);
</PRE>
<H3>
  Allowed methods (Allow)
</H3>
<PRE>
extern HTMethod HTAnchor_allow   (HTParentAnchor * me);
extern void HTAnchor_setAllow    (HTParentAnchor * me, HTMethod methodset);
extern void HTAnchor_appendAllow (HTParentAnchor * me, HTMethod methodset);
</PRE>
<H3>
  Version
</H3>
<PRE>
extern char * HTAnchor_version	(HTParentAnchor * me);
extern void HTAnchor_setVersion	(HTParentAnchor * me, const char * version);
</PRE>
<H3>
  Date
</H3>
<P>
Returns the date that was registered in the RFC822 header "Date"
<PRE>
extern time_t HTAnchor_date		(HTParentAnchor * me);
extern void HTAnchor_setDate		(HTParentAnchor	* me, const time_t date);
</PRE>
<H3>
  Last Modified Date
</H3>
<P>
Returns the date that was registered in the RFC822 header "Last-Modified"
<PRE>
extern time_t HTAnchor_lastModified	(HTParentAnchor * me);
extern void HTAnchor_setLastModified	(HTParentAnchor	* me, const time_t lm);
</PRE>
<H3>
  Entity Tag
</H3>
<P>
Entity tags are used for comparing two or more entities from the same requested
resource. It is a cache validator much in the same way <I>Date</I> can be.
The difference is that we can't always trust the date and time stamp and
hence we must have something stronger.
<PRE>extern char * HTAnchor_etag 	(HTParentAnchor * me);
extern void HTAnchor_setEtag	(HTParentAnchor * me, const char * etag);
extern BOOL HTAnchor_isEtagWeak	(HTParentAnchor * me);
</PRE>
<H3>
  Age Header
</H3>
<P>
The <CODE>Age</CODE> response-header field conveys the sender's estimate
of the amount of time since the response (or its revalidation) was generated
at the origin server. A cached response is "fresh" if its age does not exceed
its freshness lifetime.
<PRE>
extern time_t HTAnchor_age    (HTParentAnchor * me);
extern void HTAnchor_setAge   (HTParentAnchor * me, const time_t age);
</PRE>
<H3>
  Expires Date
</H3>
<PRE>
extern time_t HTAnchor_expires		(HTParentAnchor * me);
extern void HTAnchor_setExpires		(HTParentAnchor	* me, const time_t exp);
</PRE>
<H3>
  Derived from
</H3>
<PRE>
extern char * HTAnchor_derived	(HTParentAnchor *me);
extern void HTAnchor_setDerived	(HTParentAnchor *me, const char *derived_from);
</PRE>
<H2>
  Status of Header Parsing
</H2>
<P>
This is primarily for internal use. It is so that we can check whether the
header has been parsed or not.
<PRE>
extern BOOL HTAnchor_headerParsed	(HTParentAnchor *me);
extern void HTAnchor_setHeaderParsed	(HTParentAnchor *me);
</PRE>
<H3>
  Original Response Headers
</H3>
<P>
The <A HREF="HTMIME.html">MIME parser</A> may add the original response headers
as (name,value) pairs.
<PRE>
extern BOOL HTAnchor_setHeader       (HTParentAnchor * me, HTAssocList * list);
extern HTAssocList * HTAnchor_header (HTParentAnchor * me);
</PRE>
<PRE>
#endif /* HTANCHOR_H */
</PRE>
<P>
  <HR>
<ADDRESS>
  @(#) $Id: HTAnchor.html,v 2.59 1998/12/22 19:59:58 frystyk Exp $
</ADDRESS>
</BODY></HTML>

Webmaster