Presentation Modules

The HTML parser offers three levels of API to keep the implementation as flexible as possible. Depending on which API the application uses, the output is a stream, a structured stream, or a set of callback functions, as indicated in the figure below:

[Figure: HTMLParser]

The default HTML parser in libwww is very simple. You can look at the Amaya browser/editor for a complete structured parser.

SGML Stream Interface
This interface provides the most basic API: the output is a plain stream with no structure imposed on the data. Normally the internal SGML parser parses the data sequence, identifies SGML markup tags, and passes the information on to the HTML parser. However, if the application has its own SGML parser and HTML parser, the internal parsers can be disabled by removing HTMLPresent(), the internal HTML converter used to present a graphic object on the screen, from both the global and the local list of converters and presenters.
HTML Structured Stream Interface
If the application has its own HTML parser that understands the structured output from the internal SGML parser, then the second API can be used. The current HTML parser in libwww is very basic and does not understand many of the new features in HTML 2 and 3.
HText Call Back Interface
The last API can be used when the application prefers to use the internal HTML parser and only wants to provide a platform-dependent definition of the callback functions declared in the HText module.
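
The three levels above can be pictured as a chain in which an application may plug in at any point. The following is a minimal, self-contained sketch of that layering and not the libwww API: every name in it (DemoStructured, DemoCallbacks, demo_sgml_parse, demo_render and the helpers) is hypothetical, and the toy tokenizer only recognizes lower-case tag names, with no attributes, comments or entities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none of them is part of libwww. */

/* Level 2: structured stream, the events a tokenizer hands to an HTML parser. */
typedef struct {
    void (*start_tag)(void *self, const char *name);
    void (*end_tag)(void *self, const char *name);
    void (*char_data)(void *self, const char *data, size_t len);
    void *self;
} DemoStructured;

/* Level 3: platform callbacks, standing in for the HText module. */
typedef struct {
    void (*append_text)(void *ctx, const char *data, size_t len);
    void (*begin_anchor)(void *ctx);
    void (*end_anchor)(void *ctx);
    void *ctx;
} DemoCallbacks;

/* Level 1 to level 2: split a raw buffer into tags and character data.
   Attributes are skipped; comments and entities are not handled. */
void demo_sgml_parse(const char *buf, size_t len, const DemoStructured *out)
{
    size_t i = 0;
    assert(buf != NULL && out != NULL);
    while (i < len) {
        if (buf[i] == '<') {
            const char *gt = memchr(buf + i, '>', len - i);
            char name[32];
            size_t start, end, n = 0;
            int closing;
            if (gt == NULL) break;            /* unterminated tag: stop */
            end = (size_t)(gt - buf);
            start = i + 1;
            closing = (start < end && buf[start] == '/');
            if (closing) start++;
            while (start + n < end && buf[start + n] != ' ' && n < sizeof name - 1) {
                name[n] = buf[start + n];
                n++;
            }
            name[n] = '\0';
            if (closing) out->end_tag(out->self, name);
            else out->start_tag(out->self, name);
            i = end + 1;
        } else {
            const char *lt = memchr(buf + i, '<', len - i);
            size_t n = lt ? (size_t)(lt - (buf + i)) : len - i;
            out->char_data(out->self, buf + i, n);
            i += n;
        }
    }
}

/* Level 2 to level 3: a very small "HTML parser" that only knows <a>. */
static void html_start(void *self, const char *name)
{
    const DemoCallbacks *cb = self;
    if (strcmp(name, "a") == 0) cb->begin_anchor(cb->ctx);
}
static void html_end(void *self, const char *name)
{
    const DemoCallbacks *cb = self;
    if (strcmp(name, "a") == 0) cb->end_anchor(cb->ctx);
}
static void html_text(void *self, const char *data, size_t len)
{
    const DemoCallbacks *cb = self;
    cb->append_text(cb->ctx, data, len);
}

/* Application side: collect the visible text and count anchors. */
typedef struct { char text[128]; size_t len; int anchors; } DemoPage;

static void page_text(void *ctx, const char *data, size_t len)
{
    DemoPage *p = ctx;
    size_t room = sizeof p->text - 1 - p->len;
    if (len > room) len = room;               /* truncate instead of overflowing */
    memcpy(p->text + p->len, data, len);
    p->len += len;
    p->text[p->len] = '\0';
}
static void page_begin_anchor(void *ctx) { ((DemoPage *)ctx)->anchors++; }
static void page_end_anchor(void *ctx) { (void)ctx; }

/* Run the whole chain on a string and return the number of anchors seen. */
int demo_render(const char *html, DemoPage *page)
{
    DemoCallbacks cb = { page_text, page_begin_anchor, page_end_anchor, NULL };
    DemoStructured parser = { html_start, html_end, html_text, NULL };
    memset(page, 0, sizeof *page);
    cb.ctx = page;
    parser.self = &cb;
    demo_sgml_parse(html, strlen(html), &parser);
    return page->anchors;
}
```

In this sketch, an application working at the first level would replace demo_sgml_parse and everything after it, one working at the second level would supply its own DemoStructured in place of the html_* functions, and one working at the third level would only supply the page_* callbacks, as demo_render does.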

Due to the limited functionality of the internal HTML parsing module, many applications have chosen to implement their own HTML parser.


Henrik Frystyk Nielsen,
@(#) $Id: HTML.html,v 1.13 1999/08/05 12:21:12 kahan Exp $