Package rdflibUtils

Introduction
============
This package is a utility layer on top of the excellent RDF library U{RDFLib<http://rdflib.net>}, created by Daniel Krech.
The documentation for the modules in the package is in:
        

  - L{myTripleStore<rdflibUtils.myTripleStore>}: some simple utilities on top of the RDFLib triple store
  - L{SPARQL API<rdflibUtils.sparql>}: the core of a SPARQL implementation on top of the triple store (sparql-p)
  - L{Graph Patterns<rdflibUtils.graphPattern>} for the SPARQL API
  - L{Operators<rdflibUtils.sparqlOperators>} for SPARQL

Utilities
=========
Some of the utilities are:

  - L{getPredicateValue<rdflibUtils.myTripleStore.myTripleStore.getPredicateValue>}: just get the predicate value for a 
  (subject,predicate) pair. The situation where one
  knows from the context that there is only one value occurs quite often, and this saves writing
  the same loop and exception handling code every time.

  - L{getPredicateSubject<rdflibUtils.myTripleStore.myTripleStore.getPredicateSubject>}: the obvious counterpart of getPredicateValue

  - L{getSeq<rdflibUtils.myTripleStore.myTripleStore.getSeq>}: unfortunately, RDFLib does not handle Seq; it just returns an object of type 
  Seq, and one can then get a bunch of _1, _2, etc, predicate values. Due to the way rdflib works, a triple search would return
  these in an arbitrary order. getSeq returns a (Python) list with the rdf:li-s sorted, so that
  the user can get to the resources in the order in which they were defined in the original RDF.

  - L{unfoldCollection<rdflibUtils.myTripleStore.myTripleStore.unfoldCollection>}: does what it says, ie, creates a Python list from a collection.
  
  - L{clusterForward<rdflibUtils.myTripleStore.myTripleStore.clusterForward>}: using a seed as a subject, generates a separate Triple 
  Store with the transitive closure of the seed.
  
  - L{clusterBackward<rdflibUtils.myTripleStore.myTripleStore.clusterBackward>}: using a seed as an object, generates a separate Triple 
  Store with the transitive closure of the seed. 
  
  - L{cluster<rdflibUtils.myTripleStore.myTripleStore.cluster>}: the sum of the forward and backward clusters
                        
  - __add__, __mul__, __sub__ methods for set theoretical union, intersection and subtraction. Note that
  the original implementation already has __iadd__ and __isub__, meaning that both the a = b+c and a += b type
  arithmetics are now in place for addition and subtraction (but not a *= b! This does not really make sense...). 
  Note also that the superclass implements __eq__, ie, triple sets can be compared.
  
  - L{extendRdfs<rdflibUtils.myTripleStore.myTripleStore.extendRdfs>}: a subset of the full RDFS entailment rules.
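
The idea behind getSeq and clusterForward can be illustrated with a minimal sketch. It does not use rdflibUtils or RDFLib: triples are plain (subject, predicate, object) tuples of strings, and the names C{sort_seq_members} and C{cluster_forward} are hypothetical helpers invented for this illustration, not part of the package.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Matches container membership predicates such as rdf:_1, rdf:_2, ...
_MEMBER = re.compile(re.escape(RDF_NS) + r"_(\d+)$")


def sort_seq_members(triples, seq):
    """Return the members of the container `seq`, ordered by their rdf:_n index."""
    indexed = []
    for s, p, o in triples:
        if s != seq:
            continue
        match = _MEMBER.match(p)
        if match:
            # Sort on the numeric index, so that _10 comes after _2
            indexed.append((int(match.group(1)), o))
    indexed.sort(key=lambda pair: pair[0])
    return [o for _, o in indexed]


def cluster_forward(triples, seed):
    """Collect every triple reachable from `seed` by following subject -> object links."""
    by_subject = {}
    for t in triples:
        by_subject.setdefault(t[0], []).append(t)
    result, seen, todo = set(), {seed}, [seed]
    while todo:
        node = todo.pop()
        for t in by_subject.get(node, []):
            result.add(t)
            if t[2] not in seen:
                seen.add(t[2])
                todo.append(t[2])
    return result
```

A backward cluster would be the same traversal with the roles of subject and object swapped, and the full cluster the union of the two results.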

SPARQL
======

For a general description of the SPARQL API, see the separate, more complete U{description<http://dev.w3.org/cvsweb/%7Echeckout%7E/2004/PythonLib-IH/Doc/sparqlDesc.html>}. Note that the 
L{SPARQL<rdflibUtils.sparql.SPARQL>} class is a superclass of 
L{myTripleStore<rdflibUtils.myTripleStore.myTripleStore>},
ie, by referring to the latter, all SPARQL functionalities are automatically included. They are kept in
separate classes for better maintainability only.

Variables, Imports
==================

The top level (__init__.py) module of the Package imports the important classes. In other words, the
user may choose to use the following imports only::
        
        from rdflibUtils   import myTripleStore
        from rdflibUtils   import retrieveRDFFiles
        from rdflibUtils   import UniquenessError, SPARQLError
        from rdflibUtils   import GraphPattern

The module imports and/or creates some frequently used Namespaces, and these can then be imported by the user like::
        
        from rdflibUtils import ns_rdf
        
These are:
        

 - ns_rdf:  the RDF Namespace
 - ns_rdfs: the RDFS Namespace
 - ns_dc:   the Dublin Core namespace
 - ns_owl:  the OWL namespace

Finally, the package also has a set of convenience string defines for XML Schema datatypes (ie, the URI-s of the datatypes); ie, one can use::
        
        from rdflibUtils import type_string
        from rdflibUtils import type_integer
        from rdflibUtils import type_long
        from rdflibUtils import type_double
        from rdflibUtils import type_float
        from rdflibUtils import type_decimal
        from rdflibUtils import type_dateTime
        from rdflibUtils import type_date
        from rdflibUtils import type_time
        from rdflibUtils import type_duration

These are used, for example, in the sparql-p implementation. The extra function getLiteralValue
can also be used to convert a Literal to the corresponding Python datatype; it is imported with::

        from rdflibUtils import getLiteralValue
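
As a rough illustration of what such a datatype-driven conversion involves, here is a minimal sketch. It is not the package's getLiteralValue: the function C{literal_to_python} is hypothetical, the literal is represented as a plain lexical string plus a datatype URI, and the C{type_*} values below are assumed to be the standard XML Schema datatype URIs.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed stand-ins for the package's type_* constants (XML Schema datatype URIs)
type_string = XSD + "string"
type_integer = XSD + "integer"
type_long = XSD + "long"
type_double = XSD + "double"
type_float = XSD + "float"
type_decimal = XSD + "decimal"

# Map each datatype URI to a Python constructor for the lexical form
_CONVERTERS = {
    type_string: str,
    type_integer: int,
    type_long: int,
    type_double: float,
    type_float: float,
    type_decimal: Decimal,
}


def literal_to_python(lexical, datatype=None):
    """Convert a lexical form to a Python value, based on its datatype URI.

    Untyped literals, and literals with a datatype that is not in the
    table, are returned unchanged as strings.
    """
    convert = _CONVERTERS.get(datatype)
    if convert is None:
        return lexical
    return convert(lexical)
```

The date and time datatypes are left out of the sketch, because parsing their lexical forms needs more care than a single constructor call.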

The three most important classes in RDFLib for the average user are Namespace, URIRef and Literal; these
are also imported, so the user can also use, eg::
        
        from rdflibUtils import Namespace, URIRef, Literal

History
=======

 - Version 1.0: first released implementation, based on an earlier version of the SPARQL document
 - Version 2.0: version based on the March 2005 SPARQL document, also a major change of the core code (introduction of the separate L{GraphPattern<rdflibUtils.graphPattern.GraphPattern>} class, etc). 
 - Version 2.01: minor changes only:
         - switch to epydoc as a documentation tool, it gives a much better overview of the classes
         - addition of the SELECT * feature to sparql-p
 - Version 2.02: 
         - added some methods to L{myTripleStore<rdflibUtils.myTripleStore.myTripleStore>} to handle C{Alt} and C{Bag} the same
         way as C{Seq}
         - added also methods to I{add} collections and containers to the triple store, not only retrieve them
        
@author: U{Ivan Herman<http://www.ivan-herman.net>}
@license: This software is available for use under the 
U{W3C Software License<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
@contact: Ivan Herman, ivan@ivan-herman.net
@version: 2.02

Submodules
  • fixLiteral: Fixing the Literal implementation in RDFLib
  • graphPattern: Graph pattern class used by the SPARQL implementation
  • myTripleStore: Some utility functions built on the top of rdflib
  • sparql: SPARQL implementation on top of RDFLib. Implementation of the W3C SPARQL language (http://www.w3.org/TR/rdf-sparql-query/, version April 2005).
  • sparqlOperators

Function Summary
  • generateCollectionConstraint(triplets, collection, item): Generate a function that can then be used as a global constraint in sparql to check whether the 'item' is an element of the 'collection' (a.k.a. list). Returns a function suitable as a sparql filter.
  • retrieveRDFData(rdfDirs, rdfFiles, silent, extendRdfs, tripleStore): Retrieve and parse a list of RDF files (encoded in RDF/XML), collecting all files with an '.rdf' suffix from a set of directories, plus a number of explicit RDF files. Returns (myTripleStore,errmessage).
  • retrieveRDFFiles(rdfFiles, silent, extendRdfs, tripleStore): Retrieve a list of RDF files (encoded in RDF/XML). Returns myTripleStore.

Function Details

generateCollectionConstraint(triplets, collection, item)

Generate a function that can then be used as a global constraint in sparql to check whether the 'item' is an element of the 'collection' (a.k.a. list). Both collection and item can be a real resource or a query string. Furthermore, item might be a plain string, which is then turned into a literal at run time.

The function returns an adapted filter method that can then be plugged into a sparql request.
Parameters:
triplets - the myTripleStore instance
collection - is either a query string (that has to be bound by the query) or an RDFLib Resource representing the collection
item - is either a query string (that has to be bound by the query) or an RDFLib Resource that must be tested to be part of the collection
Returns:
a function suitable as a sparql filter
Raises:
rdflibUtils.sparql.SPARQLError - if the collection or the item parameters are illegal (not valid resources for a collection or an object)
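
The general technique (a closure that unfolds the collection and tests membership once the query variables are bound) can be sketched as follows. This is an illustration only, not the package's implementation: triples are plain tuples of strings, query variables are assumed to be strings starting with '?', the filter is assumed to receive a dictionary of bindings, and the names unfold_collection and make_collection_constraint are hypothetical. Unlike the real function, the sketch compares plain strings directly instead of turning them into literals.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF_NS + "first"
RDF_REST = RDF_NS + "rest"
RDF_NIL = RDF_NS + "nil"


def unfold_collection(triples, head):
    """Walk the rdf:first/rdf:rest chain starting at `head`; return the items as a list."""
    index = {(s, p): o for s, p, o in triples}
    items, node, seen = [], head, set()
    # `seen` guards against malformed, cyclic rdf:rest chains
    while node != RDF_NIL and node not in seen:
        seen.add(node)
        if (node, RDF_FIRST) not in index:
            break
        items.append(index[(node, RDF_FIRST)])
        node = index.get((node, RDF_REST), RDF_NIL)
    return items


def make_collection_constraint(triples, collection, item):
    """Return a filter that is true when `item` is a member of `collection`."""

    def resolve(term, bindings):
        # Assumption: terms starting with '?' are query variables
        if isinstance(term, str) and term.startswith("?"):
            return bindings[term]
        return term

    def constraint(bindings):
        members = unfold_collection(triples, resolve(collection, bindings))
        return resolve(item, bindings) in members

    return constraint
```

Resolving the variables inside the returned function, rather than when the filter is created, is what allows either argument to be bound only at query time.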

retrieveRDFData(rdfDirs, rdfFiles, silent=False, extendRdfs=False, tripleStore=None)

Retrieve and parse a list of RDF files (encoded in RDF/XML), collecting all files with an '.rdf' suffix from a set of directories, plus a number of explicit RDF files.
Parameters:
rdfDirs - either a string (for one directory) or a list of directory paths. Can be set to None (in which case it will be ignored). Each directory is searched for RDF/XML files to be parsed and added to the triple store
rdfFiles - either a string (for one file) or a list of strings for the paths of additional RDF files
silent - if True, a message is printed after having loaded an RDF file
           (type=Boolean)
extendRdfs - whether the mini-RDFS entailment should be performed right after the creation of the triple store (ie, whether the extendRdfs method should be invoked or not). The user can call that method later if the triple store is built up in several steps
           (type=Boolean)
tripleStore - if not None, it is considered to be an existing triple store (for incremental parsing)
Returns:
a (store,errmessage) tuple. The store contains all triples that could be parsed; errmessage is a list of possible sys.exc_info() messages with exceptions that might have been raised during parsing.
           (type=(myTripleStore,errmessage))
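
The argument handling and error collection described above can be sketched as follows. This is a hedged illustration rather than the package's code: as_list, collect_rdf_paths and load_all are hypothetical names, the store is a plain Python list, and the parser is passed in as a function so that the sketch does not depend on RDFLib.

```python
import os


def as_list(value):
    """Accept None, a single string, or a list of strings; always return a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def collect_rdf_paths(rdf_dirs, rdf_files):
    """Gather the '.rdf' files found in the directories, then the explicit files."""
    paths = []
    for directory in as_list(rdf_dirs):
        for name in sorted(os.listdir(directory)):
            if name.endswith(".rdf"):
                paths.append(os.path.join(directory, name))
    paths.extend(as_list(rdf_files))
    return paths


def load_all(paths, parse):
    """Parse each path with `parse`; return (store, errors) and keep going on failures."""
    store, errors = [], []
    for path in paths:
        try:
            store.extend(parse(path))
        except Exception as exc:
            # Record the failure and continue with the remaining files
            errors.append("%s: %s" % (path, exc))
    return store, errors
```

Returning the errors alongside the store, instead of raising on the first failure, lets the caller use whatever could be parsed and report the rest.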

retrieveRDFFiles(rdfFiles, silent=False, extendRdfs=False, tripleStore=None)

Retrieve a list of RDF files (encoded in RDF/XML). The rdflib parser may raise Parse exceptions (see rdflib documentation).
Parameters:
rdfFiles - either a string (for one file) or a list of strings
silent - if True, a message is printed after having loaded an RDF file
           (type=Boolean)
extendRdfs - whether the mini-RDFS entailment should be performed right after the creation of the triple store (ie, whether the extendRdfs method should be invoked or not). The user can call that method later if the triple store is built up in several steps
           (type=Boolean)
tripleStore - if not None, it is considered to be an existing triple store (for incremental parsing)
Returns:
myTripleStore

Generated by Epydoc 2.1 on Sat Aug 06 12:38:25 2005 http://epydoc.sf.net