W3C

HTML5+RDFa

A mechanism for embedding RDF in HTML

Editor's Draft 13 July 2009

Latest Published Version:
Not published
Latest Editor's Draft:
http://dev.w3.org/html5/rdfa/rdfa-module.html
Previous Versions:
None
Contributors (alphabetical order):
Ben Adida (Chair, Creative Commons)
Mark Birbeck (Editor, RDFa Core and inventor of RDFa concept, Web Backplane Ltd.)
Shane McCarron (Editor, RDFa Core, Applied Testing and Technology, Inc.)
Steven Pemberton (Chair, XHTML2, CWI)
Manu Sporny (Editor, HTML5+RDFa, Digital Bazaar, Inc.)
Ralph R. Swick (Domain Leader, Technology and Society, W3C)

Abstract

This specification defines rules and guidelines for adapting the RDF in XHTML: Syntax and Processing (RDFa) specification for use in the HTML5 and XHTML5 members of the HTML family. The rules defined in this document not only apply to HTML5 documents, but also to HTML4 documents interpreted through the HTML5 parsing rules.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the most recently formally published revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

If you wish to make comments regarding this document, please send them to public-rdf-in-xhtml-tf@w3.org (subscribe, archives).

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing list and take part in the discussions.

The publication of this document by the W3C as a W3C Working Draft does not imply that all of the participants in the W3C HTML working group endorse the contents of the specification. Indeed, for any section of the specification, one can usually find many members of the working group or of the W3C as a whole who object strongly to the current text, the existence of the section at all, or the idea that the working group should even spend time discussing the concept of that section.

The latest stable version of the editor's draft of this specification is always available on the W3C CVS server. The latest editor's working copy (which may contain unfinished text in the process of being prepared) is also available.

The W3C HTML Working Group is the W3C working group responsible for this specification's progress along the W3C Recommendation track.

Most of this text is intended to be included in the HTML5 specification as a section of the overall specification.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1 RDFa

1.1 Issues

This section outlines a number of editorial issues with the RDFa section of the HTML5 specification.

In order to provide a module that can be authored, inserted and moved easily within the HTML5 specification, the RDFa specification section is being edited separately from the main HTML5 specification source file. There are two documents that are generated from the RDFa specification source. The first is the full HTML5 specification, which includes the RDFa specification section. The second is the stand-alone HTML5+RDFa document.

The upside to having two documents generated from the same source mainly has to do with load-times for the HTML5 specification in web browsers. Loading the 4MB HTML5 specification can be very slow, even in Firefox 3.5 or Chrome. So for those that want to just look at the RDFa specification text, there is a much smaller, separate document for that purpose.

Unfortunately, there are a number of down-sides with this approach. The first is that the specification language becomes more verbose. The second is that cross-references within the HTML5 document are impossible due to a bug/feature in the Anolis specification processor.

These down-sides are not ideal and will eventually be remedied as we find a way to either fix Anolis or no longer support the smaller, standalone HTML5+RDFa document.

1.2 Introduction

This section is informative.

In early 2004, Mark Birbeck published a document named [XHTML and RDF] via the XHTML2 Working Group wherein he laid the groundwork for what would eventually become RDFa (The Resource Description Framework in Attributes).

In 2006, the work was co-sponsored by the Semantic Web Deployment Working Group, which began to formalize a technology to express semantic data in XHTML. This technology was successfully developed, reached consensus at the W3C, and was later published as an official W3C Recommendation. While HTML provides a mechanism to express the structure of a document (title, paragraphs, links), RDFa provides a mechanism to express the meaning in a document (people, places, events).

The document, titled "RDF in XHTML: Syntax and Processing" [XHTML+RDFa], defined a set of attributes and rules for processing those attributes that resulted in the output of machine-readable semantic data. While the document applied to XHTML, the attributes and rules were always intended to operate across any tree-based structure containing attributes on tree nodes (such as HTML4, SVG and ODF).

While RDFa was initially specified for use in XHTML, adoption by a number of large organizations on the Web spurred RDFa's use in non-XHTML languages. Its use in HTML4, before an official specification was developed for that language, caused concern regarding document conformance.

Over the years, the members of the RDFa Task Force [RDFaTF] had discussed applying the same attributes and processing rules outlined in the XHTML+RDFa specification to all HTML family documents. By design, a unified mechanism for expressing semantic data across all HTML and XHTML family documents was always feasible.

This section describes the modifications to the original XHTML+RDFa specification that permit the use of RDFa in all HTML family documents. By using the attributes and processing rules described in the XHTML+RDFa specification and heeding the minor changes in this section, authors can expect to generate markup that produces the same semantic data output in HTML4, HTML5 and XHTML5.

This section has been prepared by Manu Sporny (President/CEO of Digital Bazaar, Inc.) in consultation with key members of the RDFa in XHTML Task Force, the HTML WG, the WHAT WG, and other interested parties.

1.3 Parsing Model

Section 5 of the [XHTML+RDFa] specification defines a generic processing model for extracting RDF from a tree-based model. The method of transforming an input document into a model suited for the RDFa processing rules is intentionally not defined in the XHTML+RDFa specification. The method of transformation was intended to be defined by the implementation language, which in this case means this section of the HTML5 specification.

In the context of the HTML5 specification, the parsing rules for an input document in HTML4 and HTML5 are clearly defined. The processing model defined in Section 5 of the XHTML+RDFa specification should be executed on the HTML5 DOM. While the HTML5 DOM is not currently stable, a parsing mechanism built on top of the html5lib library should eventually provide a stable, tree-based model for the RDFa processing rules.

RDFa's tree-based processing rules enable an input document to be automatically corrected, cleaned-up, re-arranged, or modified in any way that is approved by the host language. For example, element nesting issues in HTML documents may be corrected before the input document is serialized into the tree-based model on which the RDFa processing rules will operate.

1.4 Conformance Requirements

This section is normative.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Note that all examples in this document are informative, and are not meant to be interpreted as normative requirements.

1.4.1 Document Conformance

In order for a document to claim that it is a conforming HTML+RDFa document, it must provide the facilities described as mandatory in this section. The document conformance criteria are listed below, of which only a subset are mandatory:

  1. There should be a DOCTYPE declaration specified prior to the root element in the document that follows the conventions outlined in "The DOCTYPE" section of the HTML5 specification.
    This item may be removed once this document is integrated with the HTML5 specification. It is currently included for the purposes of clarifying the full set of Document Conformance requirements for an HTML5+RDFa document.
  2. The root element of the document must follow the conventions outlined in "The root element" section of the HTML5 specification.
    This item may be removed once this document is integrated with the HTML5 specification. It is currently included for the purposes of clarifying the full set of Document Conformance requirements for an HTML5+RDFa document.
  3. There may be a link element contained in the head element that contains profile for the rel attribute and http://www.w3.org/1999/xhtml/vocab for the href attribute.
    This requires the HTML5 spec to add profile to the list of allowable rel values. This is used as the signalling mechanism for an RDFa document because the profile attribute is deprecated in HTML5.
    There has also been strong support from the RDFa Task Force that the profile attribute should be retained in HTML5, as it provides an "out-of-band" mechanism for signaling that the document contains RDFa. The profile attribute may also be used extensively to provide [RDFa Profiles] support. Adding profile to the list of rel values and using it to signal that the document contains RDFa places document processing instructions into the RDF graph, which is problematic.
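
As an informative illustration of item 3, the following minimal sketch checks whether a document carries the optional signalling link element in its head. It uses Python's standard html.parser module rather than a full HTML5 parser, and the names ProfileLinkFinder and signals_rdfa are hypothetical, introduced only for this example. Since the signalling mechanism is still under discussion, this should not be read as a required processing step.

```python
from html.parser import HTMLParser

# Vocabulary URI named in item 3 of the document conformance criteria.
RDFA_VOCAB = "http://www.w3.org/1999/xhtml/vocab"


class ProfileLinkFinder(HTMLParser):
    """Hypothetical helper: notes a <link rel="profile"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            # rel is a space-separated list of tokens; compare case-insensitively.
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "profile" in rel_tokens and attributes.get("href") == RDFA_VOCAB:
                self.found = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def signals_rdfa(markup: str) -> bool:
    """Return True if the markup contains the signalling link in its head."""
    finder = ProfileLinkFinder()
    finder.feed(markup)
    finder.close()
    return finder.found
```

Because the link element is optional ("may"), a result of False says nothing about whether the document actually contains RDFa.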

1.4.2 User Agent Conformance

A conforming RDFa user agent must:

1.4.3 RDFa Processor Conformance

A conforming RDFa Processor must implement all of the mandatory features specified in the XHTML+RDFa specification. It must also support any mandatory features specified in the RDFa section of the HTML5 specification.

1.5 Modifications to XHTML+RDFa

This section is normative.

The [XHTML+RDFa] Recommendation is the base document on which this section builds. That document specifies the attributes and processing rules for extracting RDF from an XHTML document. This section specifies changes to the attributes and processing rules defined in XHTML+RDFa in order to support extracting RDF from HTML documents.

1.5.1 Specifying the language for a literal

The lang attribute must be supported in the same manner as the xml:lang attribute is in the XHTML+RDFa specification. The precedence rules for selecting which value overrides the other are outlined in the section titled "The lang and xml:lang attributes" in the HTML5 specification.

If an author is unsure of the final encapsulating DOCTYPE for their markup, such as HTML5 vs. XHTML5, it is suggested that the author specify both lang and xml:lang where the value in both attributes is exactly the same.
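
The authoring suggestion above can be checked mechanically. The following informative sketch reports elements where lang and xml:lang are not both present with exactly the same value. It relies on Python's standard html.parser module, and the names LangMismatchFinder and find_lang_mismatches are hypothetical. It does not implement the HTML5 precedence rules; it only flags markup that departs from the suggestion.

```python
from html.parser import HTMLParser


class LangMismatchFinder(HTMLParser):
    """Hypothetical helper: collects elements whose lang and xml:lang differ."""

    def __init__(self):
        super().__init__()
        self.mismatches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        lang = attributes.get("lang")
        xml_lang = attributes.get("xml:lang")
        # Flag the element if either attribute is present and the two values
        # are not exactly the same (a missing attribute counts as a mismatch).
        if (lang is not None or xml_lang is not None) and lang != xml_lang:
            self.mismatches.append((tag, lang, xml_lang))


def find_lang_mismatches(markup: str):
    """Return a list of (tag, lang, xml:lang) tuples for mismatched elements."""
    finder = LangMismatchFinder()
    finder.feed(markup)
    finder.close()
    return finder.mismatches
```

An element that specifies neither attribute is not reported, since the suggestion only concerns markup that sets a language.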

1.5.2 Invalid XMLLiteral values

When generating literals of type XMLLiteral, the processor must ensure that the output XMLLiteral is well-formed XML. If the input is not well-formed XML, the processor must transform the input text in a way that generates well-formed XML.

Since the HTML5 specification already has an algorithm for serializing a DOM subtree into XHTML5, the RDFa Task Force is considering re-using that algorithm.

Transformation to well-formed XML is required because an application that consumes XMLLiteral data expects that data to be well-formed.

The transformation requirement does not apply to input data that are text-only, such as literals that contain a datatype attribute with an empty value (""), or input data that contain only text nodes.
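
To make the well-formedness requirement concrete, the following informative sketch tests whether a candidate XMLLiteral value parses as XML. It only detects the problem; it does not perform the transformation, for which no algorithm has yet been chosen. The function names and the wrapper element are hypothetical, and the sketch assumes the fragment uses no namespace prefixes.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a dummy root element."""
    try:
        # A literal may hold several sibling nodes, so wrap it in one root.
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    except ET.ParseError:
        return False
    return True


def needs_transformation(fragment: str, text_only: bool = False) -> bool:
    """Return True if the processor would have to repair the fragment.

    text_only is an assumption of this sketch: the caller states whether the
    input consisted solely of text nodes, in which case the requirement
    does not apply.
    """
    if text_only:
        return False
    return not is_well_formed_fragment(fragment)
```

For instance, the HTML-style fragment `line one<br>line two` is reported as needing transformation, while `line one<br/>line two` is not.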

1.5.3 The xmlns: attribute

There have been various objections to the usage of the xmlns: attribute across all HTML family languages. It is currently unknown whether or not the xmlns: attribute will be supported in HTML5 as it is defined in the [Namespaces in XML] specification. The RDFa Task Force is exploring alternatives to xmlns: that may be used in non-XML languages.
While HTML5 attempts to deprecate the xmlns: attribute, the attribute must still be available to ensure backwards compatibility with existing XHTML code snippets on the web. If HTML5 subsumes XHTML 1.0 documents and is going to be long-lived, and xmlns: is required to ensure backwards compatibility with XHTML documents, then there is no choice but to support xmlns: as it is defined in [Namespaces in XML].

CURIE prefix mappings specified using xmlns: must be processed using the rules specified in the [Namespaces in XML] Recommendation.

Since CURIE prefix mappings have been specified using xmlns:, and since HTML attribute names are case-insensitive, CURIE prefix names declared using the attribute pattern xmlns:<PREFIX>="<URI>" should be specified using only lower-case characters. That is, both the text "xmlns:" and the text in "<PREFIX>" should be lower-case only. This ensures that prefix mappings are interpreted in the same way in HTML (case-insensitive attribute names) and XHTML (case-sensitive attribute names) document types.
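
The effect of case-insensitive attribute names can be seen in the following informative sketch. It collects xmlns: prefix mappings with Python's standard html.parser module, which lower-cases attribute names much as an HTML parser does, and then expands a CURIE against them. The names PrefixCollector, collect_prefixes and expand_curie are hypothetical, and the sketch ignores prefix scoping by gathering every mapping into one flat table.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Hypothetical helper: gathers xmlns:<PREFIX> declarations from markup."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # html.parser has already lower-cased the attribute name here.
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value


def collect_prefixes(markup: str) -> dict:
    """Return a flat prefix-to-URI table (element scoping is ignored)."""
    collector = PrefixCollector()
    collector.feed(markup)
    collector.close()
    return collector.prefixes


def expand_curie(curie: str, prefixes: dict):
    """Return the expanded URI, or None if the prefix has no mapping."""
    prefix, _, reference = curie.partition(":")
    base = prefixes.get(prefix)
    return None if base is None else base + reference
```

With a declaration written as xmlns:FOAF, the HTML-style parse records the prefix as foaf, so a CURIE written FOAF:name finds no mapping while foaf:name expands normally. A case-sensitive XHTML parse of the same markup would keep FOAF, which is the divergence the lower-case recommendation avoids.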