Package rdflibUtils
Introduction
============
This package is a utility layer on top of the excellent RDF library U{RDFLib<http://rdflib.net>}, created by Daniel Krech.
The documentation for the modules in the package is in:
- L{myTripleStore<rdflibUtils.myTripleStore>}: some simple utilities on top of the RDFLib triple store
- L{SPARQL API<rdflibUtils.sparql>}: the core of a SPARQL implementation on top of the triple store (sparql-p)
- L{Graph Patterns<rdflibUtils.graphPattern>} for the SPARQL API
- L{Operators<rdflibUtils.sparqlOperators>} for SPARQL
Utilities
=========
Some of the utilities are:
- L{getPredicateValue<rdflibUtils.myTripleStore.myTripleStore.getPredicateValue>}: get the single predicate value for a
(subject,predicate) pair. The situation where one
knows from the context that there is only one value occurs quite often, and this method saves writing
the same loop and exception handling code every time.
- L{getPredicateSubject<rdflibUtils.myTripleStore.myTripleStore.getPredicateSubject>}: the obvious counterpart of getPredicateValue
- L{getSeq<rdflibUtils.myTripleStore.myTripleStore.getSeq>}: unfortunately, RDFLib does not handle Seq; it just returns an object of type
Seq, and one can then get a bunch of _1, _2, etc., predicate values. Due to the way rdflib works, a triple search would return
these in an arbitrary order. getSeq returns a (Python) list with the rdf:li-s sorted, so that
the user can get to the resources in the order in which they were defined in the original RDF.
- L{unfoldCollection<rdflibUtils.myTripleStore.myTripleStore.unfoldCollection>}: does what it says, ie, creates a Python list from a collection.
- L{clusterForward<rdflibUtils.myTripleStore.myTripleStore.clusterForward>}: using a seed as a subject, generates a separate Triple
Store with the transitive closure of the seed.
- L{clusterBackward<rdflibUtils.myTripleStore.myTripleStore.clusterBackward>}: using a seed as an object, generates a separate Triple
Store with the transitive closure of the seed.
- L{cluster<rdflibUtils.myTripleStore.myTripleStore.cluster>}: the sum of the forward and backward clusters
- __add__, __mul__, __sub__ methods for set theoretical union, intersection and subtraction. Note that
the original implementation already has __iadd__ and __isub__, meaning that both the a = b+c and a += b types of
arithmetic are now in place for addition and subtraction (but not a *= b! This does not really make sense...).
Note also that the superclass implements __eq__, ie, triple sets can be compared.
- L{extendRdfs<rdflibUtils.myTripleStore.myTripleStore.extendRdfs>}: a subset of the full RDFS entailment rules.
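The ideas behind getSeq and clusterForward can be illustrated with a minimal, self-contained sketch. It does not use rdflibUtils or RDFLib at all: triples are plain Python tuples of strings, and the names C{seq_members} and C{cluster_forward} are hypothetical helpers invented for this illustration, not the package's actual implementation.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def seq_members(triples, seq):
    """Return the members of container `seq` ordered by rdf:_1, rdf:_2, ..."""
    numbered = []
    for s, p, o in triples:
        if s != seq:
            continue
        # Container membership properties look like rdf:_<n>.
        m = re.fullmatch(re.escape(RDF) + r"_(\d+)", p)
        if m:
            numbered.append((int(m.group(1)), o))
    # Sort numerically, so that _10 comes after _2.
    numbered.sort(key=lambda pair: pair[0])
    return [o for _, o in numbered]

def cluster_forward(triples, seed):
    """Collect every triple reachable from `seed` by following subject -> object."""
    result, seen, todo = set(), {seed}, [seed]
    while todo:
        node = todo.pop()
        for t in triples:
            if t[0] == node:
                result.add(t)
                if t[2] not in seen:
                    seen.add(t[2])
                    todo.append(t[2])
    return result

# Illustrative data only (assumed, not taken from the package).
triples = {
    ("ex:list", RDF + "_2", "ex:b"),
    ("ex:list", RDF + "_10", "ex:c"),
    ("ex:list", RDF + "_1", "ex:a"),
    ("ex:a", "ex:knows", "ex:d"),
    ("ex:z", "ex:knows", "ex:a"),
}
ordered = seq_members(triples, "ex:list")
forward = cluster_forward(triples, "ex:a")
```

A backward cluster would follow the same pattern with the roles of subject and object swapped.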
SPARQL
======
For a general description of the SPARQL API, see the separate, more complete U{description<http://dev.w3.org/cvsweb/%7Echeckout%7E/2004/PythonLib-IH/Doc/sparqlDesc.html>}. Note that the
L{SPARQL<rdflibUtils.sparql.SPARQL>} class is a superclass of
L{myTripleStore<rdflibUtils.myTripleStore.myTripleStore>},
ie, by referring to the latter, all SPARQL functionality is automatically included. The split into
separate classes is for better maintainability only.
Variables, Imports
==================
The top level (__init__.py) module of the Package imports the important classes. In other words, the
user may choose to use the following imports only::
from rdflibUtils import myTripleStore
from rdflibUtils import retrieveRDFFiles
from rdflibUtils import UniquenessError, SPARQLError
from rdflibUtils import GraphPattern
The module imports and/or creates some frequently used Namespaces, and these can then be imported by the user like::
from rdflibUtils import ns_rdf
These are:
- ns_rdf: the RDF Namespace
- ns_rdfs: the RDFS Namespace
- ns_dc: the Dublin Core namespace
- ns_owl: the OWL namespace
Finally, the package also has a set of convenience string defines for XML Schema datatypes (ie, the URI-s of the datatypes); one can use::
from rdflibUtils import type_string
from rdflibUtils import type_integer
from rdflibUtils import type_long
from rdflibUtils import type_double
from rdflibUtils import type_float
from rdflibUtils import type_decimal
from rdflibUtils import type_dateTime
from rdflibUtils import type_date
from rdflibUtils import type_time
from rdflibUtils import type_duration
These are used, for example, in the sparql-p implementation. The extra method getLiteralValue
can also be used to convert a Literal to the corresponding Python datatype; it is imported with::
from rdflibUtils import getLiteralValue
The three most important classes in RDFLib for the average user are Namespace, URIRef and Literal; these
are also imported, so the user can also use, eg::
from rdflibUtils import Namespace, URIRef, Literal
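As a rough illustration of how datatype URI strings can drive a conversion from a literal's lexical form to a Python value, here is a minimal sketch. It is independent of rdflibUtils: the constants below are assumed stand-ins for the package's C{type_*} defines (built from the XML Schema namespace URI), and C{literal_to_python} is a hypothetical function, not the package's getLiteralValue.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed stand-ins for the package's type_* string constants.
type_string = XSD + "string"
type_integer = XSD + "integer"
type_double = XSD + "double"
type_decimal = XSD + "decimal"

# Map each datatype URI to a Python constructor.
_CONVERTERS = {
    type_string: str,
    type_integer: int,
    type_double: float,
    type_decimal: Decimal,
}

def literal_to_python(lexical, datatype=None):
    """Convert a lexical form to a Python value; unknown datatypes stay strings."""
    convert = _CONVERTERS.get(datatype, str)
    return convert(lexical)
```

Falling back to C{str} for unknown or missing datatypes is a design choice of this sketch; the package itself may behave differently.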
History
=======
- Version 1.0: first released implementation, based on an earlier version of the SPARQL document
- Version 2.0: version based on the March 2005 SPARQL document, also a major change of the core code (introduction of the separate L{GraphPattern<rdflibUtils.graphPattern.GraphPattern>} class, etc).
- Version 2.01: minor changes only:
  - switch to epydoc as a documentation tool, it gives a much better overview of the classes
  - addition of the SELECT * feature to sparql-p
- Version 2.02:
  - added some methods to L{myTripleStore<rdflibUtils.myTripleStore.myTripleStore>} to handle C{Alt} and C{Bag} the same way as C{Seq}
  - added also methods to I{add} collections and containers to the triple store, not only retrieve them
@author: U{Ivan Herman<http://www.ivan-herman.net>}
@license: This software is available for use under the
U{W3C Software License<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
@contact: Ivan Herman, ivan@ivan-herman.net
@version: 2.02
Submodules
==========
- fixLiteral: Fixing the Literal implementation in RDFLib
- graphPattern: Graph pattern class used by the SPARQL implementation
- myTripleStore: Some utility functions built on the top of rdflib
- sparql: SPARQL implementation on top of RDFLib. Implementation of the U{W3C SPARQL<http://www.w3.org/TR/rdf-sparql-query/>} language (version April 2005).
- sparqlOperators
Function Summary
================
- generateCollectionConstraint(triplets, collection, item): generate a function that can then be used as a global constraint in sparql to check whether the 'item' is an element of the 'collection' (a.k.a. list). Returns a function suitable as a sparql filter.
- retrieveRDFData(rdfDirs, rdfFiles, silent, extendRdfs, tripleStore): retrieve and parse a list of RDF files (encoded in RDF/XML), collecting all files with an '.rdf' suffix from a set of directories, plus a number of explicit RDF files. Returns (myTripleStore,errmessage).
- retrieveRDFFiles(rdfFiles, silent, extendRdfs, tripleStore): retrieve a list of RDF files (encoded in RDF/XML). Returns myTripleStore.
generateCollectionConstraint(triplets, collection, item)
Generate a function that can then be used as a global constraint in
sparql to check whether the 'item' is an element of the 'collection'
(a.k.a. list). Both collection and item can be a real resource or a
query string. Furthermore, item might be a plain string, which is then
turned into a literal at run time.
The function returns an adapted filter method that can then be
plugged into a sparql request.
- Parameters:
  - triplets: the myTripleStore instance
  - collection: either a query string (that has to be bound by the query) or an RDFLib Resource representing the collection
  - item: either a query string (that has to be bound by the query) or an RDFLib Resource that must be tested to be part of the collection
- Returns: a function suitable as a sparql filter
- Raises:
  - rdflibUtils.sparql.SPARQLError: if the collection or the item parameters are illegal (not valid resources for a collection or an object)
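The general shape of such a generated filter can be sketched in plain Python. This is only an illustration under stated assumptions: triples are tuples of strings, a name starting with C{?} is treated as a query variable looked up in a bindings dictionary (an assumed convention), and C{make_collection_filter} is a hypothetical name, not the package's function. The closure walks the rdf:first/rdf:rest chain of the collection.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

def make_collection_filter(triples, collection, item):
    """Return a filter(bindings) -> bool that tests list membership."""
    def resolve(term, bindings):
        # Assumed convention: '?name' is a variable bound by the query.
        if isinstance(term, str) and term.startswith("?"):
            return bindings[term]
        return term

    def value(subject, predicate):
        # First object found for (subject, predicate), or None.
        for s, p, o in triples:
            if s == subject and p == predicate:
                return o
        return None

    def check(bindings):
        node = resolve(collection, bindings)
        wanted = resolve(item, bindings)
        visited = set()  # guards against malformed, cyclic lists
        while node is not None and node != NIL and node not in visited:
            visited.add(node)
            if value(node, FIRST) == wanted:
                return True
            node = value(node, REST)
        return False

    return check

# Illustrative two-element collection: (ex:a ex:b).
triples = [
    ("_:l1", FIRST, "ex:a"), ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "ex:b"), ("_:l2", REST, NIL),
]
is_member = make_collection_filter(triples, "_:l1", "?x")
```

Returning a closure keeps the triple store and the two parameters out of the filter's call signature, so the query engine only needs to pass the current variable bindings.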
retrieveRDFData(rdfDirs, rdfFiles, silent=False, extendRdfs=False, tripleStore=None)
Retrieve and parse a list of RDF files (encoded in RDF/XML),
collecting all files with an '.rdf' suffix from a set of
directories, plus a number of explicit RDF files.
- Parameters:
  - rdfDirs: either a string (for one directory) or a list of directory paths. Can be set to None (in which case it is ignored). Each directory is searched for RDF/XML files to be parsed and added to the triple store
  - rdfFiles: either a string (for one file) or a list of strings for the paths of additional RDF files
  - silent (type=Boolean): if True, a message is printed after having loaded an RDF file
  - extendRdfs (type=Boolean): whether the mini-RDFS entailment should be performed right after the creation of the triple store (ie, whether the extendRdfs method should be invoked or not). The user can call that method later if the triple store is built up in several steps
  - tripleStore: if not None, it is considered to be an existing triple store (for incremental parsing)
- Returns (type=(myTripleStore,errmessage)): a (store,errmessage) tuple. The store contains all triples that could be parsed in; errmessage is a list of possible sys.exc_info() messages with exceptions that might have been raised during parsing.
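The file-gathering step described above (a directory or list of directories, plus a file or list of files) can be sketched with the standard library alone. This is a hedged illustration, not the package's code: C{as_list} and C{collect_rdf_paths} are hypothetical helper names, and parsing the collected files is left out.

```python
import os

def as_list(value):
    """Normalise None, a single string, or an iterable of strings to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)

def collect_rdf_paths(rdf_dirs, rdf_files):
    """Return all '.rdf' files found in rdf_dirs, followed by the explicit rdf_files."""
    paths = []
    for directory in as_list(rdf_dirs):
        # Sorted for a reproducible order; the original order is not documented.
        for name in sorted(os.listdir(directory)):
            if name.endswith(".rdf"):
                paths.append(os.path.join(directory, name))
    paths.extend(as_list(rdf_files))
    return paths
```

Each returned path would then be handed to the parser; collecting per-file exceptions into a list, as the errmessage return value suggests, lets one bad file be reported without aborting the whole load.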
retrieveRDFFiles(rdfFiles, silent=False, extendRdfs=False, tripleStore=None)
Retrieve a list of RDF files (encoded in RDF/XML). The rdflib parser
may raise Parse exceptions (see rdflib documentation).
- Parameters:
  - rdfFiles: either a string (for one file) or a list of strings
  - silent (type=Boolean): if True, a message is printed after having loaded an RDF file
  - extendRdfs (type=Boolean): whether the mini-RDFS entailment should be performed right after the creation of the triple store (ie, whether the extendRdfs method should be invoked or not). The user can call that method later if the triple store is built up in several steps
  - tripleStore: if not None, it is considered to be an existing triple store (for incremental parsing)
- Returns: myTripleStore