Sat May 2 17:23:42 1998 UTC (26 years, 1 month ago) by veillard

Updated scripts, added Coda mirror, published the mirroring paper, Daniel.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title> Linux Packages Metadata Mirroring Proposal</title>
<meta name="GENERATOR" content="amaya V1.3">
</head>
<body bgcolor="#ffffff">
<h1 align=center>Linux Packages Metadata Mirroring Proposal</h1>
<p> What does this title mean? Simply that a huge amount of precompiled software is currently freely available for Linux, but it is hard to find the package one needs:</p>
<ol>
<li> It is difficult to locate the package that fulfills the user's needs (for example, "I want a graphics editor that supports the PNG format").
<li> Once the program has been found, getting the binary version for a given system setup (distribution version, architecture, etc.) is usually hard.
<li> Getting that binary from the nearest site, to minimize download time, again proves difficult.
<li> Even when the correct package has been found, it is sometimes impossible to install because of missing dependencies, and the quest continues!
</ol>
<p> I suggest here a mechanism to help find the packages needed. It relies on the propagation of Metadata (structured, machine-readable information) about the available binary packages.
It builds on the lessons learned from building the <a href="http://rufus.w3.org/linux/RPM/">RPM database on rufus.w3.org</a>, the development of the <a href="http://rufus.w3.org/linux/rpm2html/">rpm2html</a> program, the work done on <a href="http://www.w3.org/Metadata/">Metadata at W3C</a>, and discussions with <a href="mailto:jim@jimpick.com">Jim Pick</a> (Debian) and <a href="mailto:marc@redhat.com">Marc Ewing</a> (RedHat).</p>
<p> <img src="mirroring.gif" alt=" mirroring.gif "></p>
<p> The picture above illustrates the four steps needed: create, centralize, propagate and expose package metadata.</p>
<h3>Extracting Metadata from binary packages</h3>
<p> The basic idea is to extract useful information about a package, such as the application name, revision, author and dependencies, and to save it in a predefined format that can be parsed automatically. This is precisely <a href="http://www.w3.org/Metadata/">Metadata</a> (data about data), and I suggest using <a href="http://www.w3.org/TR/WD-rdf-syntax/">RDF</a>, the <a href="http://www.w3.org/XML/">XML</a>-based metadata encoding proposed by <a href="http://www.w3.org/">W3C</a>. Ideally the metadata description should be independent of the package format; in practice it may not be. For example, package dependencies are more sophisticated in <a href="http://www.debian.org/">Debian</a> packages than in <a href="http://www.rpm.org/">RPM</a> ones.
Here is, for example, an RDF file describing the RPM package "rpm2html-0.90-1.i386.rpm":</p>
<pre><?XML version="1.0"?>
<?namespace href ="http://www.w3.org/TR/WD-rdf-syntax#/" AS = "RDF"?>
<?namespace href ="http://www.rpm.org/" AS = "RPM"?>
<RDF:RDF>
  <RDF:Description RDF:HREF="ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm">
    <RPM:Name>rpm2html</RPM:Name>
    <RPM:Version>0.90</RPM:Version>
    <RPM:Release>1</RPM:Release>
    <RPM:Distribution>Unknown</RPM:Distribution>
    <RPM:Vendor>Daniel Veillard</RPM:Vendor>
    <RPM:Size>13244</RPM:Size>
    <RPM:URL>http://rufus.w3.org/linux/rpm2html/</RPM:URL>
    <RPM:BuildDate>Sun Mar 29 19:44:53 EST 1998</RPM:BuildDate>
    <RPM:BuildHost>rufus.w3.org</RPM:BuildHost>
    <RPM:Group>X11/Applications</RPM:Group>
    <RPM:Packager>Daniel Veillard</RPM:Packager>
    <RPM:Summary>Translates rpm database into html info</RPM:Summary>
    <RPM:Sources>ftp://ftp.redhat.com/pub/contrib/SRPM/rpm2html-0.90-1.src.rpm</RPM:Sources>
    <RPM:Description>
Rpm2html tries to solve 2 big problems one face when grabbing a RPM package
from a mirror on the net and trying to install it:
  - it gives more information than just the filename before installing
    the package.
  - it tries to solve the dependancy problem by analyzing all the Provides
    and Requires of the set of RPMs. It shows the cross references by the
    way of hypertext links.
    </RPM:Description>
    <RPM:Provides>
      <RDF:Bag>
        <RPM:Resource>rpm2html</RPM:Resource>
      </RDF:Bag>
    </RPM:Provides>
    <RPM:Requires>
      <RDF:Bag>
        <RPM:Resource>libz.so.1</RPM:Resource>
        <RPM:Resource>libdb.so.2</RPM:Resource>
        <RPM:Resource>libc.so.6</RPM:Resource>
        <RPM:Resource>ld-linux.so.2</RPM:Resource>
      </RDF:Bag>
    </RPM:Requires>
    <RPM:Files>
/etc/rpm2html.config
/usr/bin
/usr/bin/rpm2html
/usr/doc/rpm2html-0.85
/usr/doc/rpm2html-0.85/CHANGES
/usr/doc/rpm2html-0.85/COPYING
/usr/doc/rpm2html-0.85/PRINCIPLES
/usr/doc/rpm2html-0.85/README
/usr/doc/rpm2html-0.85/TODO
/usr/doc/rpm2html-0.85/config.small
/usr/man/man1/rpm2html.1
/usr/share/rpm2html/msg.de
/usr/share/rpm2html/msg.es
/usr/share/rpm2html/msg.fr
    </RPM:Files>
  </RDF:Description>
</RDF:RDF>
</pre>
<p> While this description is definitely not suitable for a human reader, it can be parsed easily (basic RDF support is already present in Mozilla, for example), and numerous tools can take advantage of the metadata to process the associated data.</p>
<p> As quickly demonstrated above, this metadata can easily be generated from the packages themselves (this has been done for both RPM and Debian packages), and it is usually much smaller than the binary packages it describes.</p>
<h3>Centralizing the Metadata</h3>
<p> Since one cannot assume that a single entity can generate the metadata for all the available Linux binary packages, the work has to be distributed. For example, each maintainer of a Linux distribution or of a set of packages can extract the metadata and make it available along with the packages.</p>
<p> The next step is to centralize this Metadata to build a database that is as complete as possible. This can be done by mirroring the metadata provided by the various maintainers. The key point is that, for each package, the metadata has to be generated only once and uploaded only once to the repository.</p>
<p> Why centralize?
Simply because isolated metadata is not very useful, whereas the cross-references one can obtain by gathering many descriptions and following the references between them are usually far more useful.</p>
<p> Moreover, the bigger the database, the higher the probability of answering a request based on the associated data. Centralizing also makes spreading the data much easier!</p>
<h3>Propagating the Metadata</h3>
<p> Once the Metadata has been gathered in a single place, setting up a mirroring scheme is easy, and it propagates the data close to the final user very efficiently. This is the basic mechanism already used for FTP mirrors; it is well known and quite effective. The goal is to offer services based on this Metadata and to install them as close as possible to the final user.</p>
<h3>Exposing the Metadata</h3>
<p> Once the Metadata is available, a large number of tools can be built to expose and use its content. A basic idea is to build directories for searching and locating binary packages. For example, the rpm2html tool is being modified to accept RDF Metadata as input instead of the binary packages. This allows the database maintainer to point to a nearby mirror of the binary packages. Many other tools can be built on the metadata, such as automatic package checkers, smart installers that follow the metadata to retrieve and install the latest packages with the correct dependencies, easier management of clusters, etc.</p>
<address>
<p> <a href="mailto:veillard@w3.org">Daniel Veillard</a> </p>
</address>
<p> $Id: mirroring.html,v 1.2 1998/05/02 17:23:42 veillard Exp $</p>
</body>
</html>