Annotation of rpm2html/mirroring.html, revision 1.4
1.1 veillard 1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2: <html>
3: <head>
4: <title>
5: Linux Packages Metadata Mirroring Proposal</title>
6: <meta name="GENERATOR" content="amaya V1.3">
7: </head>
8: <body bgcolor="#ffffff">
9:
10: <h1 align=center>Linux Packages Metadata Mirroring Proposal</h1>
11: <p>
12: What does this title mean? Simply that a huge amount of precompiled software
13: is currently freely available for Linux, but it is hard to find the package
14: one needs:</p>
15: <ol>
16: <li>
17: It is difficult to locate the actual package that fulfills the needs of
18: the user (for example: I want a graphics editor that supports the PNG format).
19: <li>
20: Once the program has been found, getting the binary version for a given system
21: setup (distribution version, architecture, etc.) is usually hard.
22: <li>
23: Then getting the nearest copy of the binary, to minimize download time, again proves
24: difficult.
25: <li>
26: Even when the correct package has been found, it is sometimes impossible to install
27: because of missing dependencies, and the quest continues!
28: </ol>
29: <p>
30: I suggest here a mechanism to help find the packages needed. It relies on
31: the propagation of Metadata (structured, machine-readable information) about
32: the available binary packages. It draws on the lessons learned by building
1.4 ! daniel 33: the <a href="http://rpmfind.net/linux/RPM/">RPM database on rpmfind.net</a>,
! 34: the <a href="http://rpmfind.net/linux/rpm2html/">rpm2html</a> program
1.1 veillard 35: development, the work done on <a href="http://www.w3.org/Metadata/">Metadata
36: at W3C</a>, and discussions with <a href="mailto:jim@jimpick.com">Jim Pick</a>
37: (Debian) and <a href="mailto:marc@redhat.com">Marc Ewing</a> (RedHat).</p>
38: <p>
39: <img src="mirroring.gif" alt=" mirroring.gif "></p>
40: <p>
41: The picture above illustrates the four steps needed: create, centralize,
42: propagate, and expose package metadata.</p>
43:
44: <h3>Extracting Metadata from binary packages</h3>
45: <p>
46: Basically the idea is to extract useful information about a package, such as
47: application name, revision, author, dependencies, etc., and save it in a
48: predefined format that can be parsed automatically to recover the
49: information. This is precisely <a
50: href="http://www.w3.org/Metadata/">Metadata</a> (data about data), and I
51: suggest using the <a href="http://www.w3.org/TR/WD-rdf-syntax/">RDF</a>
52: Metadata encoding, based on <a href="http://www.w3.org/XML/">XML</a>, the
53: metadata encoding proposed by <a href="http://www.w3.org/">W3C</a>. Ideally
54: the description for the metadata should be independent of the package format;
55: in practice, however, it may depend on it: for example, package dependencies are more
56: sophisticated in <a href="http://www.debian.org/">Debian</a> packages than in
57: <a href="http://www.rpm.org/">RPM</a> ones. Here is, for example, an RDF file
1.2 veillard 58: describing the RPM package "rpm2html-0.90-1.i386.rpm":</p>
1.1 veillard 59: <pre><?xml version="1.0"?>
60: <?namespace href ="http://www.w3.org/TR/WD-rdf-syntax#/" AS = "RDF"?>
61: <?namespace href ="http://www.rpm.org/" AS = "RPM"?>
62: <RDF:RDF>
63: <RDF:Description RDF:HREF="ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm">
64: <RPM:Name>rpm2html</RPM:Name>
65: <RPM:Version>0.90</RPM:Version>
66: <RPM:Release>1</RPM:Release>
67: <RPM:Distribution>Unknown</RPM:Distribution>
68: <RPM:Vendor>Daniel Veillard</RPM:Vendor>
69: <RPM:Size>13244</RPM:Size>
1.4 ! daniel 70: <RPM:URL>http://rpmfind.net/linux/rpm2html/</RPM:URL>
1.1 veillard 71: <RPM:BuildDate>Sun Mar 29 19:44:53 EST 1998</RPM:BuildDate>
1.4 ! daniel 72: <RPM:BuildHost>rpmfind.net</RPM:BuildHost>
1.1 veillard 73: <RPM:Group>X11/Applications</RPM:Group>
74: <RPM:Packager>Daniel Veillard</RPM:Packager>
75: <RPM:Summary>Translates rpm database into html info</RPM:Summary>
76: <RPM:Sources>ftp://ftp.redhat.com/pub/contrib/SRPM/rpm2html-0.90-1.src.rpm</RPM:Sources>
77: <RPM:Description>
78: Rpm2html tries to solve 2 big problems one face when
79: grabbing a RPM package from a mirror on the net and trying to
80: install it:
81:
82: - it gives more information than just the filename before
83: installing the package.
84: - it tries to solve the dependancy problem by analyzing all
85: the Provides and Requires of the set of RPMs. It shows the
86: cross references by the way of hypertext links.
87: </RPM:Description>
88: <RPM:Provides>
89: <RDF:Bag>
90: <RPM:Resource>rpm2html</RPM:Resource>
91: </RDF:Bag>
92: </RPM:Provides>
93: <RPM:Requires>
94: <RDF:Bag>
95: <RPM:Resource>libz.so.1</RPM:Resource>
96: <RPM:Resource>libdb.so.2</RPM:Resource>
97: <RPM:Resource>libc.so.6</RPM:Resource>
98: <RPM:Resource>ld-linux.so.2</RPM:Resource>
99: </RDF:Bag>
100: </RPM:Requires>
101: <RPM:Files>
102: /etc/rpm2html.config
103: /usr/bin
104: /usr/bin/rpm2html
105: /usr/doc/rpm2html-0.85
106: /usr/doc/rpm2html-0.85/CHANGES
1.3 veillard 107: /usr/doc/rpm2html-0.85/Copyright
1.1 veillard 108: /usr/doc/rpm2html-0.85/PRINCIPLES
109: /usr/doc/rpm2html-0.85/README
110: /usr/doc/rpm2html-0.85/TODO
111: /usr/doc/rpm2html-0.85/config.small
112: /usr/man/man1/rpm2html.1
113: /usr/share/rpm2html/msg.de
114: /usr/share/rpm2html/msg.es
115: /usr/share/rpm2html/msg.fr
116: </RPM:Files>
117: </RDF:Description>
118: </RDF:RDF>
119: </pre>
120: <p>
121: While this description is definitely not suitable for a human reader, it can be
122: parsed easily (basic RDF support is already present in Mozilla, for example),
123: and numerous tools can take advantage of the metadata to process the associated
124: data.</p>
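<p>
As a rough illustration of how easily such a description can be parsed, here is
a minimal Python sketch. It makes several assumptions: the draft-style
<code>&lt;?namespace ...?&gt;</code> instructions of the example above are
replaced by <code>xmlns</code> attributes (otherwise a standard XML parser would
report unbound prefixes), the namespace URIs are modelled on those of the
example, and <code>parse_package</code> and <code>SAMPLE</code> are hypothetical
names introduced only for this sketch.</p>

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs, modelled on the example description above.
NS = {
    "RDF": "http://www.w3.org/TR/WD-rdf-syntax#",
    "RPM": "http://www.rpm.org/",
}

# Shortened copy of the example, with xmlns attributes (an assumption)
# instead of the draft-era namespace processing instructions.
SAMPLE = """
<RDF:RDF xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:RPM="http://www.rpm.org/">
  <RDF:Description RDF:HREF="ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm">
    <RPM:Name>rpm2html</RPM:Name>
    <RPM:Version>0.90</RPM:Version>
    <RPM:Release>1</RPM:Release>
    <RPM:Requires>
      <RDF:Bag>
        <RPM:Resource>libz.so.1</RPM:Resource>
        <RPM:Resource>libc.so.6</RPM:Resource>
      </RDF:Bag>
    </RPM:Requires>
  </RDF:Description>
</RDF:RDF>
"""


def parse_package(text):
    """Return the main fields of the first package description in `text`."""
    root = ET.fromstring(text)
    desc = root.find("RDF:Description", NS)
    return {
        # Namespaced attributes are keyed as "{uri}name" by ElementTree.
        "href": desc.get("{%s}HREF" % NS["RDF"]),
        "name": desc.findtext("RPM:Name", namespaces=NS),
        "version": desc.findtext("RPM:Version", namespaces=NS),
        "release": desc.findtext("RPM:Release", namespaces=NS),
        "requires": [
            res.text
            for res in desc.findall("RPM:Requires/RDF:Bag/RPM:Resource", NS)
        ],
    }


if __name__ == "__main__":
    print(parse_package(SAMPLE))
```

<p>
Calling <code>parse_package(SAMPLE)</code> yields a plain dictionary that other
tools can consume without knowing anything about the binary package format.</p>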
125: <p>
126: As discussed and briefly demonstrated, these metadata can easily be generated
127: from the packages themselves (this has been done for both RPM and Debian packages),
128: and are usually much smaller than the binary packages themselves.</p>
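<p>
The generation step could look roughly like the following Python sketch. It
assumes the package fields have already been read from the package by some
other means and are available as a dictionary; the namespace URIs, the
<code>describe</code> function, and the use of <code>xmlns</code> attributes
are assumptions made for this illustration, not part of the proposal
itself.</p>

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"  # assumed URI
RPM_NS = "http://www.rpm.org/"                  # assumed URI

# Ask ElementTree to serialize with the RDF: and RPM: prefixes.
ET.register_namespace("RDF", RDF_NS)
ET.register_namespace("RPM", RPM_NS)


def qname(namespace, tag):
    """Build the "{uri}tag" form ElementTree uses for qualified names."""
    return "{%s}%s" % (namespace, tag)


def describe(url, fields, requires):
    """Serialize one package description as an RDF/XML string.

    `fields` maps element names (Name, Version, ...) to values and
    `requires` lists the resources the package depends on.
    """
    root = ET.Element(qname(RDF_NS, "RDF"))
    desc = ET.SubElement(
        root, qname(RDF_NS, "Description"), {qname(RDF_NS, "HREF"): url}
    )
    for tag, value in fields.items():
        ET.SubElement(desc, qname(RPM_NS, tag)).text = str(value)
    requires_elem = ET.SubElement(desc, qname(RPM_NS, "Requires"))
    bag = ET.SubElement(requires_elem, qname(RDF_NS, "Bag"))
    for resource in requires:
        ET.SubElement(bag, qname(RPM_NS, "Resource")).text = resource
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(describe(
        "ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm",
        {"Name": "rpm2html", "Version": "0.90", "Release": 1},
        ["libz.so.1", "libc.so.6"],
    ))
```

<p>
The resulting text is only a few hundred bytes for a typical package, which is
why shipping it alongside the binary costs so little.</p>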
129:
130: <h3>Centralizing the Metadata</h3>
131: <p>
132: Since one cannot assume that a single entity can generate the metadata for
133: all the available Linux binary packages, some sort of distributed
134: work is needed: for example, each maintainer of a Linux distribution or of a
135: set of packages can extract the metadata and make them available along with
136: the packages.</p>
137: <p>
138: The next step is to centralize these Metadata to build a database as complete
139: as possible; this can be done by mirroring the metadata provided by the various
140: maintainers. The key point is that for each package the metadata have to be
141: generated only once and uploaded only once to the repository.</p>
142: <p>
143: Why centralize? Simply because metadata in isolation are not very useful, but the
144: cross references one can obtain by gathering and following multiple references
145: are usually far more useful.</p>
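<p>
The value of gathering the descriptions in one place can be sketched in a few
lines of Python: once every package's Provides and Requires lists are in a
single collection, each requirement can be linked to the packages that satisfy
it. The records below are illustrative only (the <code>zlib</code> and
<code>glibc</code> entries and the function names are assumptions made for
this sketch).</p>

```python
from collections import defaultdict

# Illustrative records, as they might look after parsing the descriptions.
PACKAGES = [
    {"name": "rpm2html", "provides": ["rpm2html"],
     "requires": ["libz.so.1", "libc.so.6"]},
    {"name": "zlib", "provides": ["libz.so.1"], "requires": ["libc.so.6"]},
    {"name": "glibc", "provides": ["libc.so.6", "ld-linux.so.2"],
     "requires": []},
]


def build_provides_index(packages):
    """Map each provided resource to the names of the packages providing it."""
    index = defaultdict(list)
    for pkg in packages:
        for resource in pkg["provides"]:
            index[resource].append(pkg["name"])
    return index


def cross_reference(packages):
    """For every package, list which packages satisfy each requirement.

    A requirement that nobody in the collection provides maps to an
    empty list, which makes gaps in the database easy to spot.
    """
    index = build_provides_index(packages)
    return {
        pkg["name"]: {res: index.get(res, []) for res in pkg["requires"]}
        for pkg in packages
    }


if __name__ == "__main__":
    for name, links in cross_reference(PACKAGES).items():
        print(name, links)
```

<p>
With a single package description none of these links can be computed; the more
descriptions are gathered, the fewer requirements remain unresolved.</p>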
146: <p>
147: Moreover, the bigger the database, the higher the probability of answering a
148: request based on the associated data. Centralizing also makes spreading the
149: data much easier!</p>
150:
151: <h3>Propagating the Metadata</h3>
152: <p>
153: Once the Metadata have been gathered in a single place, setting up a mirroring
154: scheme is easy and can efficiently propagate the data
155: close to the end user. This is the basic mechanism used for FTP mirrors; it is
156: well known and quite effective. The goal is to offer services based on these
157: Metadata and install them as close as possible to the end user.</p>
158:
159: <h3>Exposing the Metadata</h3>
160: <p>
161: Once the Metadata are available, a large number of tools can be built to expose
162: and use their content. A basic idea is to build directories available for
163: searching and locating binary packages; for example, the rpm2html tool is being
164: modified to support RDF Metadata as input instead of the binary packages. This
165: allows the database maintainers to point to a nearby mirror of the binary
166: packages. Many other tools can be built using the metadata, such as automatic
167: checking of packages, smart installers that follow the metadata to
168: retrieve and install the latest packages and the correct dependencies, easier
169: management of clusters, etc.</p>
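<p>
A smart installer of the kind mentioned above could, for instance, walk the
Requires and Provides information to work out which packages to fetch and in
what order. The Python sketch below is only one possible approach (a
depth-first walk that installs dependencies first); the package records and
the <code>install_order</code> function are hypothetical, and version
constraints are ignored.</p>

```python
# Illustrative records, as they might look after parsing the descriptions.
PACKAGES = [
    {"name": "rpm2html", "provides": ["rpm2html"],
     "requires": ["libz.so.1", "libc.so.6"]},
    {"name": "zlib", "provides": ["libz.so.1"], "requires": ["libc.so.6"]},
    {"name": "glibc", "provides": ["libc.so.6", "ld-linux.so.2"],
     "requires": []},
]


def install_order(target, packages):
    """Return the packages needed for `target`, dependencies first.

    Raises LookupError when a required resource is provided by no
    package in the collection.
    """
    by_name = {pkg["name"]: pkg for pkg in packages}
    # If several packages provide a resource, the last one listed wins.
    provider = {res: pkg["name"] for pkg in packages for res in pkg["provides"]}
    order = []
    seen = set()

    def visit(name):
        if name in seen:  # already handled; this also stops dependency cycles
            return
        seen.add(name)
        for resource in by_name[name]["requires"]:
            if resource not in provider:
                raise LookupError("no package provides %s" % resource)
            visit(provider[resource])
        order.append(name)  # appended only after all its dependencies

    visit(target)
    return order


if __name__ == "__main__":
    print(install_order("rpm2html", PACKAGES))
```

<p>
Combined with the location recorded in each description, such a list would give
the installer everything it needs to retrieve the packages from a nearby
mirror.</p>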
170: <address>
171: <p>
172: <a href="mailto:veillard@w3.org">Daniel Veillard</a> </p>
173: </address>
174: <p>
1.4 ! daniel 175: $Id: mirroring.html,v 1.3 1998/05/25 23:58:14 veillard Exp $</p>
1.1 veillard 176:
178: </body>
179: </html>