Annotation of rpm2html/mirroring.html, revision 1.4

1.1       veillard    1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
                      2: <html>
                      3: <head>
                      4: <title>
                      5: Linux Packages Metadata Mirroring Proposal</title>
                      6: <meta name="GENERATOR" content="amaya V1.3">
                      7: </head>
                      8: <body bgcolor="#ffffff">
                      9: 
                     10: <h1 align=center>Linux Packages Metadata Mirroring Proposal</h1>
                     11: <p>
                     12: What does this title mean? Simply that there is currently a huge amount of
                     13: precompiled software freely available for Linux, but it is hard to find the
                     14: package one needs:</p>
                     15: <ol>
                     16: <li>
                     17: It is difficult to locate the actual package needed to fulfill the needs of
                     18: the user (for example, "I want a graphics editor that supports the PNG format").
                     19: <li>
                     20: Once the program has been found, getting the binary version for a given system
                     21: setup (distribution version, architecture, etc.) is usually hard.
                     22: <li>
                     23: Then getting the binary from the nearest site, to minimize the download time,
                     24: again proves difficult.
                     25: <li>
                     26: Even when the correct package has been found, it is sometimes impossible to
                     27: install because of missing dependencies, and the quest continues!
                     28: </ol>
                     29: <p>
                     30: I suggest here a mechanism to help find the packages needed. It is based on
                     31: the propagation of Metadata (structured, machine-readable information) about
                     32: the available binary packages. It draws on the lessons learned by building
1.4     ! daniel     33: the <a href="http://rpmfind.net/linux/RPM/">RPM database on rpmfind.net</a>,
        !            34: the <a href="http://rpmfind.net/linux/rpm2html/">rpm2html</a> program
1.1       veillard   35: development, the work done on <a href="http://www.w3.org/Metadata/">Metadata
                     36: at W3C</a>, and discussions with <a href="mailto:jim@jimpick.com">Jim Pick</a>
                     37: (Debian) and <a href="mailto:marc@redhat.com">Marc Ewing</a> (RedHat).</p>
                     38: <p>
                     39: <img src="mirroring.gif" alt=" mirroring.gif "></p>
                     40: <p>
                     41: The picture above illustrates the four steps needed: to create, centralize,
                     42: propagate and expose package metadata.</p>
                     43: 
                     44: <h3>Extracting Metadata from binary packages</h3>
                     45: <p>
                     46: Basically the idea is to extract useful information about a package, such as
                     47: application name, revision, author, dependencies, etc., and save it in a
                     48: predefined format that can be parsed automatically to recover the
                     49: information. This is precisely <a
                     50: href="http://www.w3.org/Metadata/">Metadata</a> (data about data), and I
                     51: suggest using the <a href="http://www.w3.org/TR/WD-rdf-syntax/">RDF</a>
                     52: Metadata encoding, based on <a href="http://www.w3.org/XML/">XML</a>, which is the
                     53: metadata encoding proposed by <a href="http://www.w3.org/">W3C</a>. Ideally
                     54: the description of the metadata should be independent of the package format.
                     55: In practice, however, it may not be: for example, package dependencies are more
                     56: sophisticated in <a href="http://www.debian.org/">Debian</a> packages than in
                     57: <a href="http://www.rpm.org/">RPM</a> ones. Here, for example, is an RDF file
1.2       veillard   58: describing the RPM package "rpm2html-0.90-1.i386.rpm":</p>
1.1       veillard   59: <pre>&lt;?xml version="1.0"?>
                     60: &lt;?namespace href ="http://www.w3.org/TR/WD-rdf-syntax#/" AS = "RDF"?>
                     61: &lt;?namespace href ="http://www.rpm.org/" AS = "RPM"?>
                     62: &lt;RDF:RDF>
                     63:  &lt;RDF:Description RDF:HREF="ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm">
                     64:   &lt;RPM:Name>rpm2html&lt;/RPM:Name>
                     65:   &lt;RPM:Version>0.90&lt;/RPM:Version>
                     66:   &lt;RPM:Release>1&lt;/RPM:Release>
                     67:   &lt;RPM:Distribution>Unknown&lt;/RPM:Distribution>
                     68:   &lt;RPM:Vendor>Daniel Veillard&lt;/RPM:Vendor>
                     69:   &lt;RPM:Size>13244&lt;/RPM:Size>
1.4     ! daniel     70:   &lt;RPM:URL>http://rpmfind.net/linux/rpm2html/&lt;/RPM:URL>
1.1       veillard   71:   &lt;RPM:BuildDate>Sun Mar 29 19:44:53 EST 1998&lt;/RPM:BuildDate>
1.4     ! daniel     72:   &lt;RPM:BuildHost>rpmfind.net&lt;/RPM:BuildHost>
1.1       veillard   73:   &lt;RPM:Group>X11/Applications&lt;/RPM:Group>
                     74:   &lt;RPM:Packager>Daniel Veillard&lt;/RPM:Packager>
                     75:   &lt;RPM:Summary>Translates rpm database into html info&lt;/RPM:Summary>
                     76:   &lt;RPM:Sources>ftp://ftp.redhat.com/pub/contrib/SRPM/rpm2html-0.90-1.src.rpm&lt;/RPM:Sources>
                     77:   &lt;RPM:Description>
                     78: Rpm2html tries to solve 2 big problems one face when
                     79: grabbing a RPM package from a mirror on the net and trying to
                     80: install it:
                     81: 
                     82:    - it gives more information than just the filename before
                     83:      installing the package.
                     84:    - it tries to solve the dependancy problem by analyzing all
                     85:      the Provides and Requires of the set of RPMs. It shows the
                     86:      cross references by the way of hypertext links.
                     87:   &lt;/RPM:Description>
                     88:   &lt;RPM:Provides>
                     89:      &lt;RDF:Bag>
                     90:        &lt;RPM:Resource>rpm2html&lt;/RPM:Resource>
                     91:      &lt;/RDF:Bag>
                     92:   &lt;/RPM:Provides>
                     93:   &lt;RPM:Requires>
                     94:      &lt;RDF:Bag>
                     95:        &lt;RPM:Resource>libz.so.1&lt;/RPM:Resource>
                     96:        &lt;RPM:Resource>libdb.so.2&lt;/RPM:Resource>
                     97:        &lt;RPM:Resource>libc.so.6&lt;/RPM:Resource>
                     98:        &lt;RPM:Resource>ld-linux.so.2&lt;/RPM:Resource>
                     99:      &lt;/RDF:Bag>
                    100:   &lt;/RPM:Requires>
                    101:   &lt;RPM:Files>
                    102: /etc/rpm2html.config
                    103: /usr/bin
                    104: /usr/bin/rpm2html
                    105: /usr/doc/rpm2html-0.85
                    106: /usr/doc/rpm2html-0.85/CHANGES
1.3       veillard  107: /usr/doc/rpm2html-0.85/Copyright
1.1       veillard  108: /usr/doc/rpm2html-0.85/PRINCIPLES
                    109: /usr/doc/rpm2html-0.85/README
                    110: /usr/doc/rpm2html-0.85/TODO
                    111: /usr/doc/rpm2html-0.85/config.small
                    112: /usr/man/man1/rpm2html.1
                    113: /usr/share/rpm2html/msg.de
                    114: /usr/share/rpm2html/msg.es
                    115: /usr/share/rpm2html/msg.fr
                    116:   &lt;/RPM:Files>
                    117:  &lt;/RDF:Description>
                    118: &lt;/RDF:RDF>
                    119: </pre>
                    120: <p>
                    121: While this description is definitely not suitable for a human, it can be
                    122: parsed easily (basic RDF support is already present in Mozilla, for example)
                    123: and numerous tools can take advantage of the metadata to process the associated
                    124: data.</p>
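<p>
As a rough illustration of how easily such a description can be read by a
program, here is a minimal Python sketch. It assumes the sample has been
re-expressed with xmlns attributes, because the namespace processing
instructions of the draft syntax are not understood by current XML parsers;
the function name <code>parse_descriptions</code> and the shortened sample
are illustrative only.</p>

```python
import xml.etree.ElementTree as ET

# Namespace URIs are copied from the sample above. Binding them with xmlns
# attributes (instead of the draft namespace declarations) is an assumption
# made so that a current XML parser accepts the text.
NS = {
    "RDF": "http://www.w3.org/TR/WD-rdf-syntax#/",
    "RPM": "http://www.rpm.org/",
}

# Shortened version of the rpm2html description shown above.
SAMPLE = """\
<RDF:RDF xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax#/"
         xmlns:RPM="http://www.rpm.org/">
 <RDF:Description RDF:HREF="ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm">
  <RPM:Name>rpm2html</RPM:Name>
  <RPM:Version>0.90</RPM:Version>
  <RPM:Release>1</RPM:Release>
  <RPM:Provides>
   <RDF:Bag>
    <RPM:Resource>rpm2html</RPM:Resource>
   </RDF:Bag>
  </RPM:Provides>
  <RPM:Requires>
   <RDF:Bag>
    <RPM:Resource>libz.so.1</RPM:Resource>
    <RPM:Resource>libc.so.6</RPM:Resource>
   </RDF:Bag>
  </RPM:Requires>
 </RDF:Description>
</RDF:RDF>
"""


def parse_descriptions(text):
    """Return one dictionary per RDF:Description element found in text."""
    root = ET.fromstring(text)
    packages = []
    for desc in root.findall("RDF:Description", NS):

        def resources(tag):
            # Collect the RPM:Resource entries of the Bag under RPM:<tag>.
            path = f"RPM:{tag}/RDF:Bag/RPM:Resource"
            return [r.text.strip() for r in desc.findall(path, NS)]

        packages.append({
            "url": desc.get("{%s}HREF" % NS["RDF"]),
            "name": desc.findtext("RPM:Name", default="", namespaces=NS).strip(),
            "version": desc.findtext("RPM:Version", default="", namespaces=NS).strip(),
            "provides": resources("Provides"),
            "requires": resources("Requires"),
        })
    return packages


packages = parse_descriptions(SAMPLE)
for pkg in packages:
    print(pkg["name"], pkg["version"], "requires", ", ".join(pkg["requires"]))
```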
                    125: <p>
                    126: As discussed and quickly demonstrated, this metadata can easily be generated
                    127: from the packages themselves (this has been done for both RPM and Debian packages),
                    128: and is usually much smaller than the binary packages themselves.</p>
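<p>
The generation step could look roughly like the following Python sketch. It
does not read a real package: it assumes the header fields have already been
extracted by some packaging tool and only shows how they might be written out
in the element layout of the sample. The function name
<code>describe_package</code> and the use of xmlns attributes are assumptions
of this sketch.</p>

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in the sample description.
RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#/"
RPM_NS = "http://www.rpm.org/"
ET.register_namespace("RDF", RDF_NS)
ET.register_namespace("RPM", RPM_NS)


def describe_package(url, fields, provides=(), requires=()):
    """Serialize already-extracted package fields as an RDF description.

    'fields' maps element names (Name, Version, ...) to their values.
    Reading those values out of a package file is outside this sketch.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}HREF": url})
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{RPM_NS}}}{name}").text = str(value)
    for tag, resources in (("Provides", provides), ("Requires", requires)):
        holder = ET.SubElement(desc, f"{{{RPM_NS}}}{tag}")
        bag = ET.SubElement(holder, f"{{{RDF_NS}}}Bag")
        for resource in resources:
            ET.SubElement(bag, f"{{{RPM_NS}}}Resource").text = resource
    return ET.tostring(root, encoding="unicode")


document = describe_package(
    "ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm",
    {"Name": "rpm2html", "Version": "0.90", "Release": "1"},
    provides=["rpm2html"],
    requires=["libz.so.1", "libc.so.6"],
)
print(document)
```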
                    129: 
                    130: <h3>Centralizing the Metadata</h3>
                    131: <p>
                    132: Since one cannot assume that a single entity can generate the metadata for
                    133: all the available Linux binary packages, some sort of distributed
                    134: work is needed. For example, each maintainer of a Linux distribution or of a
                    135: set of packages can extract the metadata and make it available along with
                    136: the packages.</p>
                    137: <p>
                    138: The next step is to centralize this Metadata to build a database as complete
                    139: as possible. This can be done by mirroring the metadata provided by the various
                    140: maintainers. The key point is that for each package the metadata has to be
                    141: generated once and uploaded once to the repository.</p>
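<p>
A minimal sketch of this gathering step, assuming each maintainer publishes a
list of already-parsed descriptions and that the package URL is used as the
unique key (both are assumptions of the sketch, as are the maintainer names
and the example.org URL):</p>

```python
def merge_feeds(feeds):
    """Merge per-maintainer description lists into one catalog.

    'feeds' maps a maintainer name to a list of description dictionaries.
    Each package URL is stored once; a later duplicate is ignored.
    """
    catalog = {}
    for maintainer, descriptions in feeds.items():
        for desc in descriptions:
            catalog.setdefault(desc["url"], {**desc, "source": maintainer})
    return catalog


# Hypothetical feeds: the maintainer names and the second URL are invented.
feeds = {
    "contrib": [
        {"url": "ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm",
         "name": "rpm2html"},
    ],
    "other-site": [
        {"url": "ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm",
         "name": "rpm2html"},
        {"url": "ftp://ftp.example.org/pub/i386/zlib-1.0-1.i386.rpm",
         "name": "zlib"},
    ],
}

catalog = merge_feeds(feeds)
for url, desc in catalog.items():
    print(desc["source"], url)
```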
                    142: <p>
                    143: Why centralize? Simply because metadata alone is not very useful, but the
                    144: cross references one can obtain by gathering and following multiple references
                    145: are usually far more useful. </p>
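<p>
One way to picture these cross references is an index from each provided
resource to the packages providing it, which can then be consulted for every
requirement of a package. The sketch below assumes descriptions are plain
dictionaries with "provides" and "requires" lists; the zlib entry and its
example.org URL are hypothetical.</p>

```python
from collections import defaultdict


def build_provider_index(packages):
    """Map each provided resource to the set of package URLs providing it."""
    index = defaultdict(set)
    for pkg in packages:
        for resource in pkg["provides"]:
            index[resource].add(pkg["url"])
    return index


def cross_reference(pkg, index):
    """For each requirement of pkg, list the known providers (possibly none)."""
    return {res: sorted(index.get(res, ())) for res in pkg["requires"]}


rpm2html = {
    "url": "ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm",
    "provides": ["rpm2html"],
    "requires": ["libz.so.1", "libdb.so.2"],
}
# Hypothetical description of a package providing libz.so.1.
zlib = {
    "url": "ftp://ftp.example.org/pub/i386/zlib-1.0-1.i386.rpm",
    "provides": ["libz.so.1"],
    "requires": [],
}

index = build_provider_index([rpm2html, zlib])
links = cross_reference(rpm2html, index)
for resource, providers in links.items():
    print(resource, "->", providers or "no known provider")
```

<p>
With only one description in the database every requirement would stay
unresolved, which is why gathering many descriptions matters.</p>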
                    146: <p>
                    147: Moreover, the bigger the database, the higher the probability of answering a
                    148: request based on the associated data. Centralizing also makes spreading the
                    149: data much easier!</p>
                    150: 
                    151: <h3>Propagating the Metadata</h3>
                    152: <p>
                    153: Once the Metadata has been gathered in a single place, setting up a mirroring
                    154: scheme is easy and can propagate the data very efficiently
                    155: close to the final user. This is the basic mechanism used for FTP mirrors; it is
                    156: well known and quite effective. The goal is to offer services based on this
                    157: Metadata and install them as close as possible to the final user.</p>
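<p>
The proposal does not fix how a mirror decides what to transfer. As one
possible sketch, assume the central site publishes a manifest mapping each
metadata file to a checksum; a mirror can then compare it with its own copy
and fetch only what changed. The manifest format, the file names and the
function name <code>plan_sync</code> are all assumptions.</p>

```python
def plan_sync(master, mirror):
    """Compare two manifests (path -> checksum) and plan a mirror update.

    Returns the paths to fetch (new or changed on the master) and the
    paths to delete (no longer present on the master).
    """
    fetch = sorted(p for p, digest in master.items() if mirror.get(p) != digest)
    delete = sorted(p for p in mirror if p not in master)
    return fetch, delete


# Hypothetical manifests; the checksums are placeholders.
master = {
    "rpm2html-0.90-1.i386.rdf": "aa11",
    "zlib-1.0-1.i386.rdf": "bb22",
}
mirror = {
    "rpm2html-0.90-1.i386.rdf": "aa11",   # unchanged
    "zlib-1.0-1.i386.rdf": "0000",        # outdated copy
    "obsolete-0.1-1.i386.rdf": "cc33",    # removed on the master
}

fetch, delete = plan_sync(master, mirror)
print("fetch:", fetch)
print("delete:", delete)
```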
                    158: 
                    159: <h3>Exposing the Metadata</h3>
                    160: <p>
                    161: Once the Metadata is available, a large number of tools can be built to expose
                    162: and use its content. A basic idea is to build directories for
                    163: searching and locating binary packages. For example, the rpm2html tool is being
                    164: modified to support RDF Metadata as input instead of the binary packages. This
                    165: allows the database maintainer to point to a nearby mirror of the binary
                    166: packages. Many other tools can be built using the metadata, such as automatic
                    167: checking of packages, smart installers that follow the metadata to
                    168: retrieve and install the latest packages and the correct dependencies, easier
                    169: management of clusters, etc.</p>
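<p>
To give an idea of what a smart installer could do with the Provides and
Requires lists, here is a small sketch that computes an installation order
(dependencies first) and reports requirements nobody provides. It is only an
illustration under simplifying assumptions: no versions, no conflicts, the
first provider found wins, and the zlib and glibc entries are hypothetical.</p>

```python
def resolve(package, packages):
    """Return (install_order, missing_resources) for a package name.

    'packages' maps a package name to a dictionary with 'provides' and
    'requires' lists. Dependencies appear before the packages needing them.
    """
    # First provider found for each resource (a simplifying assumption).
    providers = {}
    for name, pkg in packages.items():
        for resource in pkg["provides"]:
            providers.setdefault(resource, name)

    order, missing, seen = [], [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for resource in packages[name]["requires"]:
            provider = providers.get(resource)
            if provider is None:
                if resource not in missing:
                    missing.append(resource)
            else:
                visit(provider)
        order.append(name)

    visit(package)
    return order, missing


# rpm2html entries follow the sample; zlib and glibc are hypothetical.
packages = {
    "rpm2html": {"provides": ["rpm2html"],
                 "requires": ["libz.so.1", "libdb.so.2", "libc.so.6"]},
    "zlib": {"provides": ["libz.so.1"], "requires": ["libc.so.6"]},
    "glibc": {"provides": ["libc.so.6"], "requires": []},
}

order, missing = resolve("rpm2html", packages)
print("install order:", order)
print("unresolved:", missing)
```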
                    170: <address>
                    171: <p>
                    172: <a href="mailto:veillard@w3.org">Daniel Veillard</a> </p>
                    173: </address>
                    174: <p>
1.4     ! daniel    175: $Id: mirroring.html,v 1.3 1998/05/25 23:58:14 veillard Exp $</p>
1.1       veillard  176: 
                    177: <h3></h3>
                    178: </body>
                    179: </html>
