W3C Robot

The Webbot Log Database

This is an example of a Web-enabled interface, based on www-sql, to a MySQL database. The links below are samples of how you can extract information about broken links, redirections, document titles, last-modified dates, content types, link relationships, and more from a MySQL log database generated by the Webbot.

When the interface is active, the information is extracted directly from the MySQL database as you browse. The samples below are not connected to a database, so you will see the raw www-sql documents instead:

Link Relationships
Which resources point to which resources? Information about both incoming and outgoing links, as well as different link types (images, style sheets, etc.).
Request Information
Information gathered from the HTTP requests, including broken links, redirections, etc.
Resources Information
Information about the documents, such as content length, type, language, title, whether the document was content negotiated, etc.
Broken HTML, Bad Style, etc.
The Webbot can detect certain HTML problems, for example empty or missing ALT attributes on images.
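
As an illustration of the last item, the following is a minimal sketch of how a check for empty or missing ALT attributes could look. It is not the Webbot's own code: it uses Python's standard html.parser module, and the names AltChecker and find_alt_problems are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Collects <img> elements whose ALT attribute is missing or empty.

    Hypothetical helper for illustration; not part of the Webbot.
    """

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (src, reason) tuples

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "(no src)")
        if "alt" not in attributes:
            self.problems.append((src, "missing ALT"))
        elif not (attributes["alt"] or "").strip():
            # Covers alt="" as well as a bare `alt` with no value.
            self.problems.append((src, "empty ALT"))


def find_alt_problems(html_text):
    """Return (src, reason) pairs for every problematic image."""
    checker = AltChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

Calling find_alt_problems on the text of a page returns one entry per offending image, in document order, which could then be written to a log table alongside the other per-document information.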

I use an active MySQL database to check links etc. on the W3C website. Checking 32,000 links takes about 50 minutes.
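
To give a feel for the kind of query behind such a link check, here is a rough sketch under stated assumptions. The table and column names (uris, requests, links and their fields) are hypothetical and are not taken from the Webbot's actual log schema, and an in-memory SQLite database stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema, for illustration only. The real Webbot log
# tables may be named and structured differently.
SCHEMA = """
    CREATE TABLE uris (id INTEGER PRIMARY KEY, uri TEXT NOT NULL);
    CREATE TABLE requests (uri_id INTEGER, status INTEGER);
    CREATE TABLE links (source INTEGER, destination INTEGER);
"""

# Links whose destination answered with a status in a given range.
LINKS_BY_STATUS = """
    SELECT src.uri, dst.uri, r.status
    FROM links AS l
    JOIN uris AS src ON src.id = l.source
    JOIN uris AS dst ON dst.id = l.destination
    JOIN requests AS r ON r.uri_id = l.destination
    WHERE r.status BETWEEN ? AND ?
    ORDER BY src.uri, dst.uri
"""


def build_demo_log():
    """Create a tiny in-memory log database with made-up sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO uris VALUES (?, ?)", [
        (1, "http://example.org/"),
        (2, "http://example.org/a.html"),
        (3, "http://example.org/gone.html"),
        (4, "http://example.org/old.html"),
    ])
    db.executemany("INSERT INTO requests VALUES (?, ?)",
                   [(1, 200), (2, 200), (3, 404), (4, 301)])
    db.executemany("INSERT INTO links VALUES (?, ?)",
                   [(1, 2), (1, 3), (2, 3), (2, 4)])
    return db


def links_with_status(db, low, high):
    """Return (source, destination, status) rows for the status range."""
    return db.execute(LINKS_BY_STATUS, (low, high)).fetchall()


def broken_links(db):
    """Links pointing at resources that returned a 4xx or 5xx status."""
    return links_with_status(db, 400, 599)


def redirected_links(db):
    """Links pointing at resources that returned a 3xx status."""
    return links_with_status(db, 300, 399)
```

With the sample rows above, broken_links(build_demo_log()) lists the two pages that link to gone.html, and redirected_links reports the single link to old.html. Against a real log database the same joins would apply only after adapting the names to the actual schema.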


Henrik Frystyk Nielsen,
@(#) $Id: Overview.html,v 1.5 1998/10/26 22:30:54 frystyk Exp $