This is revision 1.3627.
Status: Controversial Working Draft. ISSUE-76 (Microdata/RDFa) blocks progress to Last Call
Status: Last call for comments
This section is non-normative.
Sometimes, it is desirable to annotate content with specific machine-readable labels, e.g. to allow generic scripts to provide services that are customised to the page, or to enable content from a variety of cooperating authors to be processed by a single script in a consistent manner.
For this purpose, authors can use the microdata features described in this section. Microdata allows nested groups of name-value pairs to be added to documents, in parallel with the existing content.
Status: Last call for comments
This section is non-normative.
At a high level, microdata consists of a group of name-value pairs. The groups are called items, and each name-value pair is a property. Items and properties are represented by regular elements.
To create an item, the itemscope attribute is used.
To add a property to an item, the itemprop attribute is used on one of
the item's descendants.
Here there are two items, each of which has the property "name":
<div itemscope> <p>My name is <span itemprop="name">Elizabeth</span>.</p> </div> <div itemscope> <p>My name is <span itemprop="name">Daniel</span>.</p> </div>
Properties generally have values that are strings.
Here the item has three properties:
<div itemscope> <p>My name is <span itemprop="name">Neil</span>.</p> <p>My band is called <span itemprop="band">Four Parts Water</span>.</p> <p>I am <span itemprop="nationality">British</span>.</p> </div>
Properties can also have values that are URLs. This is achieved using the a
element and its href
attribute, the img element and its src attribute, or other elements that
link to or embed external resources.
In this example, the item has one property, "image", whose value is a URL:
<div itemscope> <img itemprop="image" src="google-logo.png" alt="Google"> </div>
Properties can also have values that are dates, times, or dates
and times. This is achieved using the time element and
its datetime attribute.
In this example, the item has one property, "birthday", whose value is a date:
<div itemscope> I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>. </div>
Properties can also themselves be groups of name-value pairs, by
putting the itemscope attribute
on the element that declares the property.
Items that are not part of others are called top-level microdata items.
In this example, the outer item represents a person, and the inner one represents a band:
<div itemscope> <p>Name: <span itemprop="name">Amanda</span></p> <p>Band: <span itemprop="band" itemscope> <span itemprop="name">Jazz Band</span> (<span itemprop="size">12</span> players)</span></p> </div>
The outer item here has two properties, "name" and "band". The "name" is "Amanda", and the "band" is an item in its own right, with two properties, "name" and "size". The "name" of the band is "Jazz Band", and the "size" is "12".
The outer item in this example is a top-level microdata item.
Properties that are not descendants of the element with the itemscope attribute can be associated
with the item using the itemref attribute. This attribute takes
a list of IDs of elements to crawl in addition to crawling the
children of the element with the itemscope attribute.
This example is the same as the previous one, but all the properties are separated from their items:
<div itemscope id="amanda" itemref="a b"></div> <p id="a">Name: <span itemprop="name">Amanda</span></p> <div id="b" itemprop="band" itemscope itemref="c"></div> <div id="c"> <p>Band: <span itemprop="name">Jazz Band</span></p> <p>Size: <span itemprop="size">12</span> players</p> </div>
This gives the same result as the previous example. The first item has two properties, "name", set to "Amanda", and "band", set to another item. That second item has two further properties, "name", set to "Jazz Band", and "size", set to "12".
An item can have multiple properties with the same name and different values.
This example describes an ice cream, with two flavors:
<div itemscope> <p>Flavors in my favorite ice cream:</p> <ul> <li itemprop="flavor">Lemon sorbet</li> <li itemprop="flavor">Apricot sorbet</li> </ul> </div>
This thus results in an item with two properties, both "flavor", having the values "Lemon sorbet" and "Apricot sorbet".
An element introducing a property can also introduce multiple properties at once, to avoid duplication when some of the properties have the same value.
Here we see an item with two properties, "favorite-color" and "favorite-fruit", both set to the value "orange":
<div itemscope> <span itemprop="favorite-color favorite-fruit">orange</span> </div>
It's important to note that there is no relationship between the microdata and the content of the document where the microdata is marked up.
There is no semantic difference, for instance, between the following two examples:
<figure> <dd><img src="castle.jpeg"> <dt><span itemscope><span itemprop="name">The Castle</span></span> (1986) </figure>
<span itemscope><meta itemprop="name" content="The Castle"></span> <figure> <dd><img src="castle.jpeg"> <dt>The Castle (1986) </figure>
Both have a figure with a caption, and both, completely unrelated to the figure, have an item with a name-value pair with the name "name" and the value "The Castle". The only difference is that if the user drags the caption out of the document, in the former case, the item will be included in the drag-and-drop data. In neither case is the image in any way associated with the item.
Status: Last call for comments
This section is non-normative.
The examples in the previous section show how information could be marked up on a page that doesn't expect its microdata to be re-used. Microdata is most useful, though, when it is used in contexts where other authors and readers are able to cooperate to make new uses of the markup.
For this purpose, it is necessary to give each item a type, such as "http://example.com/person", or "http://example.org/cat", or "http://band.example.net/". Types are identified as URLs.
The type for an item is given
as the value of an itemtype
attribute on the same element as the itemscope attribute.
Here, the item's type is "http://example.org/animals#cat":
<section itemscope itemtype="http://example.org/animals#cat"> <h1 itemprop="name">Hedral</h1> <p itemprop="desc">Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.</p> <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"> </section>
In this example the "http://example.org/animals#cat" item has three properties, a "name" ("Hedral"), a "desc" ("Hedral is..."), and an "img" ("hedral.jpeg").
An item can only have one type. The type gives the context for the properties, thus defining a vocabulary: a property named "class" given for an item with the type "http://census.example/person" might refer to the economic class of an individual, while a property named "class" given for an item with the type "http://example.com/school/teacher" might refer to the classroom a teacher has been assigned.
Status: Last call for comments
This section is non-normative.
Sometimes, an item gives information about a topic that has a global identifier. For example, books can be identified by their ISBN.
Vocabularies (as identified by the itemtype attribute) can be designed
such that items get associated
with their global identifier in an unambiguous way by expressing the
global identifiers as URLs given in an
itemid attribute.
The exact meaning of the URLs given in
itemid attributes depends on the
vocabulary used.
Here, an item is talking about a particular book:
<dl itemscope
itemtype="http://vocab.example.net/book"
itemid="urn:isbn:0-330-34032-8">
<dt>Title
<dd itemprop="title">The Reality Dysfunction
<dt>Author
<dd itemprop="author">Peter F. Hamilton
<dt>Publication date
<dd><time itemprop="pubdate" datetime="1996-01-26">26 January 1996</time>
</dl>
The "http://vocab.example.net/book"
vocabulary in this example would define that the itemid attribute takes a urn: URL pointing to the ISBN of the
book.
Status: Last call for comments
This section is non-normative.
Using microdata means using a vocabulary. For some purposes, an ad-hoc vocabulary is adequate. For others, a vocabulary will need to be designed. Where possible, authors are encouraged to re-use existing vocabularies, as this makes content re-use easier.
When designing new vocabularies, identifiers can be created either using URLs, or, for properties, as plain words (with no dots or colons). For URLs, conflicts with other vocabularies can be avoided by only using identifiers that correspond to pages that the author has control over.
For instance, if Jon and Adam both write content at example.com, at http://example.com/~jon/... and http://example.com/~adam/... respectively, then
they could select identifiers of the form
"http://example.com/~jon/name" and "http://example.com/~adam/name"
respectively.
Properties whose names are just plain words can only be used within the context of the types for which they are intended; properties named using URLs can be reused in items of any type. If an item has no type, and is not part of another item, then if its properties have names that are just plain words, they are not intended to be globally unique, and are instead only intended for limited use. Generally speaking, authors are encouraged either to use properties with globally unique names (URLs) or to ensure that their items are typed.
Here, an item is an "http://example.org/animals#cat", and most of the properties have names that are words defined in the context of that type. There are also a few additional properties whose names come from other vocabularies.
<section itemscope itemtype="http://example.org/animals#cat"> <h1 itemprop="name http://example.com/fn">Hedral</h1> <p itemprop="desc">Hedral is a male american domestic shorthair, with a fluffy <span itemprop="http://example.com/color">black</span> fur with <span itemprop="http://example.com/color">white</span> paws and belly.</p> <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"> </section>
This example has one item with the type "http://example.org/animals#cat" and the following properties:
| Property | Value |
|---|---|
| name | Hedral |
| http://example.com/fn | Hedral |
| desc | Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly. |
| http://example.com/color | black |
| http://example.com/color | white |
| img | .../hedral.jpeg |
Status: Last call for comments
This section is non-normative.
The microdata becomes even more useful when scripts can use it to expose information to the user, for example offering it in a form that can be used by other applications.
The document.getItems(typeNames) method provides access to the
top-level microdata items. It returns a
NodeList containing the items with the specified types,
or all types if no argument is specified.
Each item is represented in the
DOM by the element on which the relevant itemscope attribute is found. These
elements have their element.itemScope IDL attribute set to
true.
The type of items can be
obtained using the element.itemType IDL attribute on the
element with the itemscope
attribute.
This sample shows how the getItems() method can be used
to obtain a list of all the top-level microdata items of one type
given in the document:
var cats = document.getItems("http://example.com/feline");
Once an element representing an item has been obtained, its properties
can be extracted using the properties IDL attribute. This
attribute returns an HTMLPropertiesCollection, which can
be enumerated to go through each element that adds one or more
properties to the item. It can also be indexed by name, which will
return an object with a list of the elements that add properties
with that name.
Each element that adds a property also has an itemValue IDL attribute that returns
its value.
This sample gets the first item of type "http://example.net/user" and then pops up an alert using the "name" property from that item.
var user = document.getItems('http://example.net/user')[0];
alert('Hello ' + user.properties['name'][0].itemValue + '!');
The HTMLPropertiesCollection object, when indexed by
name in this way, actually returns a PropertyNodeList
object with all the matching properties. The
PropertyNodeList object can be used to obtain all the
values at once using its values attribute, which
returns an array of all the values.
In an earlier example, a "http://example.org/animals#cat" item had two "http://example.com/color" values. This script looks up the first such item and then lists all its values.
var cat = document.getItems('http://example.org/animals#cat')[0];
var colors = cat.properties['http://example.com/color'].values;
var result;
if (colors.length == 0) {
  result = 'Color unknown.';
} else if (colors.length == 1) {
  result = 'Color: ' + colors[0];
} else {
  result = 'Colors:';
  for (var i = 0; i < colors.length; i += 1)
    result += ' ' + colors[i];
}
It's also possible to get a list of all the property
names using the object's names IDL
attribute.
This example creates a big list with a nested list for each item on the page, each listing all the property names used in that item.
var outer = document.createElement('ul');
var items = document.getItems();
for (var item = 0; item < items.length; item += 1) {
  var itemLi = document.createElement('li');
  var inner = document.createElement('ul');
  for (var name = 0; name < items[item].properties.names.length; name += 1) {
    var propLi = document.createElement('li');
    propLi.appendChild(document.createTextNode(items[item].properties.names[name]));
    inner.appendChild(propLi);
  }
  itemLi.appendChild(inner);
  outer.appendChild(itemLi);
}
document.body.appendChild(outer);
If faced with the following from an earlier example:
<section itemscope itemtype="http://example.org/animals#cat"> <h1 itemprop="name http://example.com/fn">Hedral</h1> <p itemprop="desc">Hedral is a male american domestic shorthair, with a fluffy <span itemprop="http://example.com/color">black</span> fur with <span itemprop="http://example.com/color">white</span> paws and belly.</p> <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"> </section>
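...it would result in the following output (shown as the markup the script generates):
<ul> <li> <ul> <li>name</li> <li>http://example.com/fn</li> <li>desc</li> <li>http://example.com/color</li> <li>img</li> </ul> </li> </ul>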
(The duplicate occurrence of "http://example.com/color" is not included in the list.)
Status: Last call for comments
The microdata model consists of groups of name-value pairs known as items.
Each group is known as an item. Each item can have an item type, a global identifier (if the item type supports global identifiers for its items), and a list of name-value pairs. Each name in the name-value pair is known as a property, and each property has one or more values. Each value is either a string or itself a group of name-value pairs (an item).
An item is said to be a typed item when either it has an item type, or it is the value of a property of a typed item. The relevant type for a typed item is the item's item type, if it has one, or else is the relevant type of the item for which it is a property's value.
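To make the typed-item and relevant-type definitions concrete, here is a minimal JavaScript sketch (JavaScript because the document's own script examples use it). It assumes a hypothetical in-memory item model rather than DOM elements: each item is a plain object { type, properties }, each property maps to an array of string or item values, and each item is the value of at most one other item's property. The names relevantTypes and isTypedItem are illustrative, not part of any API.

```javascript
// Hypothetical item model (not a browser API):
//   { type: string | null, properties: { [name]: Array<string | item> } }

// Records the relevant type of `item` and of every item nested in its
// property values. An item with its own type uses it; otherwise it
// inherits the relevant type of the item it is a property value of.
function relevantTypes(item, inherited = null, out = new Map()) {
  const relevant = item.type || inherited;
  out.set(item, relevant);
  for (const values of Object.values(item.properties)) {
    for (const value of values) {
      if (typeof value === 'object' && value !== null) {
        relevantTypes(value, relevant, out);
      }
    }
  }
  return out;
}

// An item is a typed item exactly when it has a relevant type.
function isTypedItem(item, types) {
  return types.get(item) != null;
}

// The Amanda/band example, with a hypothetical type on the outer item.
const band = { type: null, properties: { name: ['Jazz Band'], size: ['12'] } };
const person = {
  type: 'http://example.com/person',
  properties: { name: ['Amanda'], band: [band] },
};
const types = relevantTypes(person);
console.log(types.get(band)); // prints http://example.com/person
console.log(isTypedItem(band, types)); // prints true
```

The untyped band item becomes a typed item only because it is the value of a property of a typed item; with person.type set to null, neither item would be typed.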
Status: Last call for comments
Every HTML element may have an
itemscope attribute
specified. The itemscope
attribute is a boolean attribute.
An element with the itemscope
attribute specified creates a new item, a group of name-value pairs.
Elements with an itemscope
attribute may have an itemtype attribute
specified, to give the item type of the item.
The itemtype attribute, if
specified, must have a value that is a valid URL that
is an absolute URL for which the string "http://www.w3.org/1999/xhtml/microdata#" is not a
prefix match.
The item type of an item is the value of its element's itemtype attribute, if it has one and
its value is not the empty string. If the itemtype attribute is missing or its
value is the empty string, the item is said to have no item
type.
The item type must be a type defined in an applicable specification.
The itemtype attribute must
not be specified on elements that do not have an itemscope attribute specified.
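The itemtype rules above can be approximated in a few lines. This sketch assumes a hypothetical element model (plain objects with an attrs map, not DOM nodes) and approximates "valid URL that is an absolute URL" with the WHATWG URL parser exposed as the global URL constructor in Node.js and browsers, which is more lenient than a strict conformance checker would be. The function names are illustrative.

```javascript
// Hypothetical element model: { attrs: { [name]: string } }, not a DOM API.
const MICRODATA_NS = 'http://www.w3.org/1999/xhtml/microdata#';

// Approximates "absolute URL" with the WHATWG URL parser, which throws
// on relative input when no base URL is given.
function isAbsoluteURL(value) {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

// Conformance check for the value of a specified itemtype attribute.
function isValidItemtypeValue(value) {
  return isAbsoluteURL(value) && !value.startsWith(MICRODATA_NS);
}

// Conformance check: itemtype is only allowed alongside itemscope.
function itemtypePlacementOK(element) {
  return !('itemtype' in element.attrs) || 'itemscope' in element.attrs;
}

// The item type: the attribute value if present and non-empty, else null
// (the item "has no item type").
function itemTypeOf(element) {
  const value = element.attrs.itemtype;
  return value === undefined || value === '' ? null : value;
}

const cat = { attrs: { itemscope: '', itemtype: 'http://example.org/animals#cat' } };
console.log(itemTypeOf(cat)); // prints http://example.org/animals#cat
console.log(isValidItemtypeValue('cat')); // prints false
```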
Elements with an itemscope
attribute and an itemtype
attribute that references a vocabulary that is defined to
support global identifiers for items may also have an
itemid attribute
specified, to give a global identifier for the item, so that it can be related to other
items on pages elsewhere on the
Web.
The itemid attribute, if
specified, must have a value that is a valid URL.
The global identifier of an item is the value of its element's itemid attribute, if it has one, resolved relative to the element on
which the attribute is specified. If the itemid attribute is missing or if
resolving it fails, the item is said to have no global
identifier.
The itemid attribute must not be
specified on elements that do not have both an itemscope attribute and an itemtype attribute specified, and must
not be specified on elements with an itemscope attribute whose itemtype attribute specifies a
vocabulary that does not support global identifiers for
items, as defined by that vocabulary's specification.
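Computing the global identifier is a matter of resolving the itemid value and falling back to "no global identifier" on failure. In this sketch the element's base URL is passed in explicitly as an assumption, resolution uses the global URL constructor, and supportsGlobalIds is a hypothetical callback standing in for each vocabulary's own definition of whether it supports global identifiers.

```javascript
// Hypothetical element model: { attrs: { [name]: string } }. baseURL is an
// assumed stand-in for "relative to the element" (the element's base URL).

// The global identifier: the itemid value resolved against the base URL,
// or null when the attribute is missing or resolution fails.
function globalIdentifierOf(element, baseURL) {
  const value = element.attrs.itemid;
  if (value === undefined) return null;
  try {
    return new URL(value, baseURL).href;
  } catch {
    return null;
  }
}

// Conformance check for where itemid may appear. supportsGlobalIds is a
// hypothetical callback: (itemtype) => boolean, defined per vocabulary.
function itemidPlacementOK(element, supportsGlobalIds) {
  const { attrs } = element;
  if (!('itemid' in attrs)) return true;
  return 'itemscope' in attrs && 'itemtype' in attrs && supportsGlobalIds(attrs.itemtype);
}

const book = {
  attrs: {
    itemscope: '',
    itemtype: 'http://vocab.example.net/book',
    itemid: 'urn:isbn:0-330-34032-8',
  },
};
console.log(globalIdentifierOf(book, 'http://example.com/books/')); // prints urn:isbn:0-330-34032-8
```

An absolute itemid such as the urn: URL above is unaffected by the base URL; a relative one such as "/books/1" would be resolved against it.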
Elements with an itemscope
attribute may have an itemref attribute specified,
to give a list of additional elements to crawl to find the
name-value pairs of the item.
The itemref attribute, if
specified, must have a value that is an unordered set of
unique space-separated tokens consisting of IDs of elements in the same document; for
each one, the element's nearest ancestor element with an itemscope attribute specified, if any,
must not be the element with the referencing itemref attribute specified.
The itemref attribute must not
be specified on elements that do not have an itemscope attribute specified.
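The itemref conformance requirements can be checked mechanically. The sketch below uses a hypothetical tree of plain objects with parent pointers (built by an illustrative el helper), splits on the HTML space characters, and reports problems as strings; it is a checker under those assumptions, not any validator's actual implementation. It is exercised on the itemref example from earlier in this section.

```javascript
// Hypothetical tree model: { tag, attrs, children, parent }, built with el().
function el(tag, attrs = {}, children = []) {
  const node = { tag, attrs, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}

// Nearest ancestor (not the element itself) with an itemscope attribute.
function nearestScope(element) {
  for (let p = element.parent; p; p = p.parent) {
    if ('itemscope' in p.attrs) return p;
  }
  return null;
}

// First element in tree order with the given ID, or null.
function findById(root, id) {
  if (root.attrs.id === id) return root;
  for (const child of root.children) {
    const found = findById(child, id);
    if (found) return found;
  }
  return null;
}

// Lists conformance problems with element's itemref (empty array if none).
function itemrefProblems(element, documentRoot) {
  const { attrs } = element;
  if (!('itemref' in attrs)) return [];
  const problems = [];
  if (!('itemscope' in attrs)) problems.push('itemref without itemscope');
  const ids = attrs.itemref.split(/[ \t\n\f\r]+/).filter(Boolean);
  if (new Set(ids).size !== ids.length) problems.push('duplicate IDs');
  for (const id of ids) {
    const target = findById(documentRoot, id);
    if (target === null) problems.push(`no element with ID "${id}"`);
    else if (nearestScope(target) === element) problems.push(`"${id}" is already inside this item`);
  }
  return problems;
}

// The itemref example from earlier in this section.
const amanda = el('div', { itemscope: '', id: 'amanda', itemref: 'a b' });
const band = el('div', { id: 'b', itemprop: 'band', itemscope: '', itemref: 'c' });
const doc = el('body', {}, [
  amanda,
  el('p', { id: 'a' }, [el('span', { itemprop: 'name' })]),
  band,
  el('div', { id: 'c' }, [el('span', { itemprop: 'name' }), el('span', { itemprop: 'size' })]),
]);
console.log(itemrefProblems(amanda, doc)); // prints []
```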
The itemprop attribute
Status: Last call for comments
Every HTML element may have an
itemprop attribute specified, if
doing so adds a
property to one or more items (as defined below).
The itemprop attribute, if
specified, must have a value that is an unordered set of
unique space-separated tokens representing the names of the
name-value pairs that it adds. The attribute's value must have at
least one token.
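Each token must be either:
a valid URL that is an absolute URL for which the string "http://www.w3.org/1999/xhtml/microdata#" is not a prefix match, or
if the item is a typed item: a defined property name allowed in this situation according to the specification that defines the relevant type for the item, or
if the item is not a typed item: a string that contains no U+002E FULL STOP characters (.) and no U+003A COLON characters (:).
When an element with an itemprop attribute adds a property to multiple items, the requirement above regarding the tokens applies for each item individually.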
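A conformance checker could test itemprop values roughly as follows. This sketch assumes the two token shapes discussed in this section: an absolute URL that does not start with the reserved microdata namespace, or a plain word with no dots or colons. It does not check whether a plain word is actually defined by a typed item's vocabulary, and absolute-URL detection is approximated with the global URL constructor. The function names are illustrative.

```javascript
const MICRODATA_NS = 'http://www.w3.org/1999/xhtml/microdata#';

// Approximates "absolute URL" with the WHATWG URL parser.
function isAbsoluteURL(value) {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

// A token containing ':' can only be valid as a URL; otherwise it must be
// a plain word without '.'. Vocabulary membership is NOT checked.
function isValidPropertyToken(token) {
  if (token.includes(':')) {
    return isAbsoluteURL(token) && !token.startsWith(MICRODATA_NS);
  }
  return !token.includes('.');
}

// Returns a list of problems with an itemprop value (empty if none).
function itempropProblems(value) {
  const tokens = value.split(/[ \t\n\f\r]+/).filter(Boolean);
  const problems = [];
  if (tokens.length === 0) problems.push('no tokens');
  if (new Set(tokens).size !== tokens.length) problems.push('duplicate tokens');
  for (const token of tokens) {
    if (!isValidPropertyToken(token)) problems.push(`invalid token "${token}"`);
  }
  return problems;
}

console.log(itempropProblems('name http://example.com/fn')); // prints []
```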
The property names of an element are the tokens that
the element's itemprop attribute
is found to contain when its value is split on spaces, with the order preserved but with
duplicates removed (leaving only the first occurrence of each
name).
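The property-names definition (split on spaces, keep the order, drop repeats after the first) translates directly into a small function. The name propertyNames is illustrative, and the separators are assumed to be the HTML space characters.

```javascript
// Property names of an itemprop value: split on HTML space characters,
// keep the original order, and drop later duplicates.
function propertyNames(itempropValue) {
  const names = [];
  for (const token of itempropValue.split(/[ \t\n\f\r]+/)) {
    if (token !== '' && !names.includes(token)) names.push(token);
  }
  return names;
}

console.log(propertyNames('favorite-color favorite-fruit favorite-color'));
// prints [ 'favorite-color', 'favorite-fruit' ]
```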
Within an item, the properties are unordered with respect to each other, except for properties with the same name, which are ordered in the order they are given by the algorithm that defines the properties of an item.
In the following example, the "a" property has the values "1" and "2", in that order, but whether the "a" property comes before the "b" property or not is not important:
<div itemscope> <p itemprop="a">1</p> <p itemprop="a">2</p> <p itemprop="b">test</p> </div>
Thus, the following is equivalent:
<div itemscope> <p itemprop="b">test</p> <p itemprop="a">1</p> <p itemprop="a">2</p> </div>
As is the following:
<div itemscope> <p itemprop="a">1</p> <p itemprop="b">test</p> <p itemprop="a">2</p> </div>
And the following:
<div itemscope itemref="x"> <p itemprop="b">test</p> <p itemprop="a">2</p> </div> <div id="x"> <p itemprop="a">1</p> </div>
Status: Last call for comments
The property value of a
name-value pair added by an element with an itemprop attribute depends on the
element, as follows:
If the element also has an itemscope attribute: The value is the item created by the element.
If the element is a meta element: The value is the value of the element's content attribute, if any, or the empty string if there is no such attribute.
If the element is an audio, embed, iframe, img, source, or video element: The value is the absolute URL that results from resolving the value of the element's src attribute relative to the element at the time the attribute is set, or the empty string if there is no such attribute or if resolving it results in an error.
If the element is an a, area, or link element: The value is the absolute URL that results from resolving the value of the element's href attribute relative to the element at the time the attribute is set, or the empty string if there is no such attribute or if resolving it results in an error.
If the element is an object element: The value is the absolute URL that results from resolving the value of the element's data attribute relative to the element at the time the attribute is set, or the empty string if there is no such attribute or if resolving it results in an error.
If the element is a time element with a datetime attribute: The value is the value of the element's datetime attribute.
Otherwise: The value is the element's textContent.
The URL property elements are the a,
area, audio, embed,
iframe, img, link,
object, source, and video
elements.
If a property's value is an absolute URL, the property must be specified using a URL property element.
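The property value rules amount to a dispatch on the element type. This sketch uses a hypothetical element model ({ tag, attrs, text }, with text standing in for textContent), passes the base URL explicitly instead of resolving "relative to the element", ignores the "at the time the attribute is set" timing, and represents a nested item by its element. The names are illustrative.

```javascript
// URL property elements and the attribute each one takes its value from.
const URL_ATTRIBUTE = new Map([
  ['a', 'href'], ['area', 'href'], ['link', 'href'],
  ['audio', 'src'], ['embed', 'src'], ['iframe', 'src'],
  ['img', 'src'], ['source', 'src'], ['video', 'src'],
  ['object', 'data'],
]);

// Resolves a URL attribute value, or returns '' if absent or unresolvable.
function resolveOrEmpty(value, baseURL) {
  if (value === undefined) return '';
  try {
    return new URL(value, baseURL).href;
  } catch {
    return '';
  }
}

// Returns the property value; a nested item is represented by its element.
function propertyValue(element, baseURL) {
  const { tag, attrs } = element;
  if ('itemscope' in attrs) return element;
  if (tag === 'meta') return attrs.content ?? '';
  if (URL_ATTRIBUTE.has(tag)) return resolveOrEmpty(attrs[URL_ATTRIBUTE.get(tag)], baseURL);
  if (tag === 'time' && 'datetime' in attrs) return attrs.datetime;
  return element.text ?? '';
}

// True for the URL property elements listed above.
const isURLPropertyElement = (element) => URL_ATTRIBUTE.has(element.tag);

const logo = { tag: 'img', attrs: { itemprop: 'image', src: 'google-logo.png' } };
console.log(propertyValue(logo, 'http://example.com/dir/page.html'));
// prints http://example.com/dir/google-logo.png
```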
Status: Last call for comments
To find the properties of an item, the user agent must run the following steps:
Let root be the element with the itemscope attribute.
Let pending be a stack of elements initially containing the child elements of root, if any. This list will be the one that holds the elements that still need to be crawled.
Let properties be an empty list of elements. This list will be the result of the algorithm: a list of elements with properties that apply to root.
If root has an itemref attribute, split the value of that itemref attribute on spaces. For
each resulting token, ID, if there is an
element in the document with the ID
ID, then push the first such element onto pending.
For each element candidate in pending, run the following substeps:
Let scope be candidate's nearest ancestor element with an itemscope attribute specified, if
any, or null otherwise.
If one of the other elements in pending is also candidate, then remove candidate from pending (i.e. remove duplicates).
Otherwise, if one of the other elements in pending is an ancestor element of candidate, and that element is scope, then remove candidate from pending.
Otherwise, if one of the other elements in pending is an ancestor element of candidate, and either scope is
null or that element also has scope as its
nearest ancestor element with an itemscope attribute specified, then
remove candidate from pending.
Sort pending in tree order.
Loop: Pop the top element from pending and let current be that element.
If current has an itemprop attribute, then append current to properties.
If current does not have an itemscope attribute, and current is an element with child elements, then:
push all the child elements of current onto
pending, in tree order (so the first
child of current will be the next element to be
popped from pending).
End of loop: If pending is not empty, return to the step marked loop.
Return properties. That is the list of properties of the item root. By definition, this list is in tree order.
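The crawl steps above can be sketched in JavaScript over a hypothetical tree of plain objects ({ tag, attrs, children, parent }) rather than DOM nodes. The pending stack is an array whose first entry is the top, so "pop" is shift() and pushing children in tree order is unshift(); findById, treeOrder, and the other helpers are illustrative names. The usage at the end runs it on the nested Amanda/band example.

```javascript
// Hypothetical tree model: { tag, attrs, children, parent }, built with el().
function el(tag, attrs = {}, children = []) {
  const node = { tag, attrs, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}
const has = (node, name) => name in node.attrs;

function nearestScope(node) {
  for (let p = node.parent; p; p = p.parent) if (has(p, 'itemscope')) return p;
  return null;
}

function isAncestor(a, b) {
  for (let p = b.parent; p; p = p.parent) if (p === a) return true;
  return false;
}

// Tree-order position of every node, from a pre-order walk.
function treeOrder(documentRoot) {
  const order = new Map();
  (function walk(node) {
    order.set(node, order.size);
    node.children.forEach(walk);
  })(documentRoot);
  return order;
}

function findById(node, id) {
  if (node.attrs.id === id) return node;
  for (const child of node.children) {
    const found = findById(child, id);
    if (found) return found;
  }
  return null;
}

// The "properties of an item" crawl; pending[0] is the top of the stack.
function itemProperties(root, documentRoot) {
  const pending = [...root.children];
  const properties = [];
  if (has(root, 'itemref')) {
    for (const id of root.attrs.itemref.split(/[ \t\n\f\r]+/).filter(Boolean)) {
      const target = findById(documentRoot, id);
      if (target) pending.push(target);
    }
  }
  // Remove duplicates and candidates already covered by another pending element.
  for (let i = 0; i < pending.length; ) {
    const candidate = pending[i];
    const scope = nearestScope(candidate);
    const others = pending.filter((_, j) => j !== i);
    const redundant =
      others.includes(candidate) ||
      others.some((o) => isAncestor(o, candidate) &&
        (o === scope || scope === null || nearestScope(o) === scope));
    if (redundant) pending.splice(i, 1);
    else i += 1;
  }
  const order = treeOrder(documentRoot);
  pending.sort((a, b) => order.get(a) - order.get(b));
  while (pending.length > 0) {
    const current = pending.shift(); // pop the top of the stack
    if (has(current, 'itemprop')) properties.push(current);
    if (!has(current, 'itemscope') && current.children.length > 0) {
      pending.unshift(...current.children); // first child is popped next
    }
  }
  return properties;
}

// The nested Amanda/band example from earlier in this section.
const personName = el('span', { itemprop: 'name' });
const band = el('span', { itemprop: 'band', itemscope: '' }, [
  el('span', { itemprop: 'name' }),
  el('span', { itemprop: 'size' }),
]);
const person = el('div', { itemscope: '' }, [el('p', {}, [personName]), el('p', {}, [band])]);
const doc = el('body', {}, [person]);
console.log(itemProperties(person, doc).length); // prints 2 (name and band)
```

The band's own "name" and "size" are not returned for the person, because crawling stops at the element that has the itemscope attribute.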
A document must not contain any elements that have an itemprop attribute that would not be
found to be a property of any of the items in that document were their properties all to be
determined.
An item is a top-level microdata item if
its element does not have an itemprop attribute.
Status: Last call for comments
Here is an example of some HTML using Microdata to express RDF statements:
<dl itemscope
itemtype="http://purl.org/vocab/frbr/core#Work"
itemid="http://purl.oreilly.com/works/45U8QJGZSQKDH8N">
<dt>Title</dt>
<dd><cite itemprop="http://purl.org/dc/terms/title">Just a Geek</cite></dd>
<dt>By</dt>
<dd><span itemprop="http://purl.org/dc/terms/creator">Wil Wheaton</span></dd>
<dt>Format</dt>
<dd itemprop="http://purl.org/vocab/frbr/core#realization"
itemscope
itemtype="http://purl.org/vocab/frbr/core#Expression"
itemid="http://purl.oreilly.com/products/9780596007683.BOOK">
<link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/BOOK">
Print
</dd>
<dd itemprop="http://purl.org/vocab/frbr/core#realization"
itemscope
itemtype="http://purl.org/vocab/frbr/core#Expression"
itemid="http://purl.oreilly.com/products/9780596802189.EBOOK">
<link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/EBOOK">
Ebook
</dd>
</dl>
This is equivalent to the following Turtle:
@prefix dc: <http://purl.org/dc/terms/> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .
<http://purl.oreilly.com/works/45U8QJGZSQKDH8N> a frbr:Work ;
dc:creator "Wil Wheaton"@en ;
dc:title "Just a Geek"@en ;
frbr:realization <http://purl.oreilly.com/products/9780596007683.BOOK>,
<http://purl.oreilly.com/products/9780596802189.EBOOK> .
<http://purl.oreilly.com/products/9780596007683.BOOK> a frbr:Expression ;
dc:type <http://purl.oreilly.com/product-types/BOOK> .
<http://purl.oreilly.com/products/9780596802189.EBOOK> a frbr:Expression ;
dc:type <http://purl.oreilly.com/product-types/EBOOK> .
Status: Last call for comments
document.getItems( [ types ] )
Returns a NodeList of the elements in the Document that create items, that are not part of other items, and that are of one of the types given in the argument, if any are listed.
The types argument is interpreted as a space-separated list of types.
element.properties
If the element has an itemscope attribute, returns an HTMLPropertiesCollection object with all the element's properties. Otherwise, returns an empty HTMLPropertiesCollection object.
element.itemValue [ = value ]
Returns the element's value.
Can be set, to change the element's value. Setting the value when the element has no itemprop attribute or when the element's value is an item throws an INVALID_ACCESS_ERR exception.
The document.getItems(typeNames) method takes an optional
string that contains an unordered set of unique
space-separated tokens representing types. When called, the
method must return a live NodeList object containing
all the elements in the document, in tree order, that
are each top-level microdata items with a type equal to one of the types specified in
that argument, having obtained the types by splitting the string on spaces. If there
are no tokens specified in the argument, or if the argument is
missing, then the method must return a NodeList
containing all the top-level microdata items in the
document.
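A non-live approximation of getItems() can be written as a tree walk that keeps top-level microdata items (itemscope without itemprop) and, when types are given, filters on the itemtype value. This sketch assumes a hypothetical { attrs, children } tree and returns a plain array instead of a live NodeList; getItems here is an ordinary function, not the DOM method.

```javascript
// Hypothetical tree model: { attrs, children }. Not a DOM API.
const el = (attrs = {}, children = []) => ({ attrs, children });

// Top-level microdata item: creates an item and is not itself a property.
const isTopLevelItem = (node) => 'itemscope' in node.attrs && !('itemprop' in node.attrs);

function getItems(documentRoot, typeNames = '') {
  const types = new Set(typeNames.split(/[ \t\n\f\r]+/).filter(Boolean));
  const result = [];
  (function walk(node) { // pre-order walk visits nodes in tree order
    if (isTopLevelItem(node) && (types.size === 0 || types.has(node.attrs.itemtype))) {
      result.push(node);
    }
    node.children.forEach(walk);
  })(documentRoot);
  return result;
}

const cat = el({ itemscope: '', itemtype: 'http://example.org/animals#cat' });
const person = el({ itemscope: '' }, [el({ itemprop: 'band', itemscope: '' })]);
const doc = el({}, [cat, person]);
console.log(getItems(doc).length); // prints 2
console.log(getItems(doc, 'http://example.org/animals#cat').length); // prints 1
```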
The itemScope IDL
attribute on HTML elements must reflect
the itemscope content attribute.
The itemType IDL
attribute on HTML elements must reflect
the itemtype content attribute.
The itemId IDL attribute
on HTML elements must reflect the itemid content attribute. The itemProp IDL attribute on
HTML elements must reflect the itemprop content attribute. The itemRef IDL attribute on
HTML elements must reflect the itemref content attribute.
The properties IDL
attribute on HTML elements must return an
HTMLPropertiesCollection rooted at the
Document node, whose filter matches only elements that
have property names and are the properties of the item created by the element
on which the attribute was invoked, while that element is an item, and matches nothing the rest of
the time.
The itemValue IDL
attribute's behavior depends on the element, as follows:
If the element has no itemprop attribute: The attribute must return null on getting and must throw an INVALID_ACCESS_ERR exception on setting.
If the element has an itemscope attribute: The attribute must return the element itself on getting and must throw an INVALID_ACCESS_ERR exception on setting.
If the element is a meta element: The attribute must act as it would if it were reflecting the element's content content attribute.
If the element is an audio, embed, iframe, img, source, or video element: The attribute must act as it would if it were reflecting the element's src content attribute.
If the element is an a, area, or link element: The attribute must act as it would if it were reflecting the element's href content attribute.
If the element is an object element: The attribute must act as it would if it were reflecting the element's data content attribute.
If the element is a time element with a datetime attribute: The attribute must act as it would if it were reflecting the element's datetime content attribute.
Otherwise: The attribute must act the same as the element's textContent attribute.
When the itemValue IDL
attribute is reflecting a content
attribute or acting like the element's textContent
attribute, the user agent must, on setting, convert the new value to
the IDL DOMString value before using it
according to the mappings described above.
Status: Last call for comments
Given a list of nodes nodes in a
Document, a user agent must run the following algorithm
to extract the microdata from those
nodes into a JSON form:
Let result be an empty object.
Let items be an empty array.
For each node in nodes, check whether the node is a top-level microdata item, and if it is, get the object for that element and add it to items.
Add an entry to result called "items" whose value is the array items.
Return the result of serializing result to JSON.
When the user agent is to get the object for an item item, it must run the following substeps:
Let result be an empty object.
If the item has an item
type, add an entry to result called
"type" whose value is the item
type of item.
If the item has a global
identifier, add an entry to result
called "id" whose value is the global
identifier of item.
Let properties be an empty object.
For each element element that has one or more property names and is one of the properties of the item item, in the order those elements are given by the algorithm that returns the properties of an item, run the following substeps:
Let value be the property value of element.
If value is an item, then get the object for value, and then replace value with the object returned from those steps.
For each name name in element's property names, run the following substeps:
If there is no entry named name in properties, then add an entry named name to properties whose value is an empty array.
Append value to the entry named name in properties.
Add an entry to result called "properties" whose value is the object properties.
Return result.
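Putting the pieces together, the JSON extraction can be sketched as below. This is a simplified, hypothetical implementation: elements are plain objects { tag, attrs, children, text }, properties are found only among an item's descendants (itemref is not followed, unlike the full properties algorithm), and a single baseURL stands in for resolving URLs relative to each element. The properties object uses a null prototype so names such as "constructor" cannot collide with inherited keys.

```javascript
// Hypothetical element model: { tag, attrs, children, text }, where text
// stands in for textContent.
const el = (tag, attrs = {}, children = [], text = '') => ({ tag, attrs, children, text });
const SPACES = /[ \t\n\f\r]+/;
const URL_ATTRIBUTE = new Map([
  ['a', 'href'], ['area', 'href'], ['link', 'href'], ['audio', 'src'], ['embed', 'src'],
  ['iframe', 'src'], ['img', 'src'], ['source', 'src'], ['video', 'src'], ['object', 'data'],
]);

// Resolves value against baseURL, or returns null on failure.
function resolve(value, baseURL) {
  try {
    return new URL(value, baseURL).href;
  } catch {
    return null;
  }
}

// Property names: split on spaces, keep order, drop later duplicates.
function propertyNames(element) {
  const names = [];
  for (const t of (element.attrs.itemprop ?? '').split(SPACES)) {
    if (t !== '' && !names.includes(t)) names.push(t);
  }
  return names;
}

// Simplification: descendants with itemprop, not descending into nested
// items; itemref is ignored.
function itemProperties(root) {
  const found = [];
  for (const child of root.children) {
    if ('itemprop' in child.attrs) found.push(child);
    if (!('itemscope' in child.attrs)) found.push(...itemProperties(child));
  }
  return found;
}

// Property value of an element that is not itself an item.
function stringValue(element, baseURL) {
  const { tag, attrs } = element;
  if (tag === 'meta') return attrs.content ?? '';
  if (URL_ATTRIBUTE.has(tag)) {
    const raw = attrs[URL_ATTRIBUTE.get(tag)];
    return raw === undefined ? '' : (resolve(raw, baseURL) ?? '');
  }
  if (tag === 'time' && 'datetime' in attrs) return attrs.datetime;
  return element.text;
}

// "Get the object for an item".
function getObject(item, baseURL) {
  const result = {};
  if (item.attrs.itemtype) result.type = item.attrs.itemtype;
  if ('itemid' in item.attrs) {
    const id = resolve(item.attrs.itemid, baseURL);
    if (id !== null) result.id = id;
  }
  const properties = Object.create(null);
  for (const element of itemProperties(item)) {
    const names = propertyNames(element);
    if (names.length === 0) continue;
    const value = 'itemscope' in element.attrs
      ? getObject(element, baseURL)
      : stringValue(element, baseURL);
    for (const name of names) {
      if (!(name in properties)) properties[name] = [];
      properties[name].push(value);
    }
  }
  result.properties = properties;
  return result;
}

// Keeps the top-level microdata items among nodes and serializes to JSON.
function extractMicrodata(nodes, baseURL) {
  const items = nodes
    .filter((n) => 'itemscope' in n.attrs && !('itemprop' in n.attrs))
    .map((n) => getObject(n, baseURL));
  return JSON.stringify({ items });
}

const iceCream = el('div', { itemscope: '' }, [
  el('ul', {}, [
    el('li', { itemprop: 'flavor' }, [], 'Lemon sorbet'),
    el('li', { itemprop: 'flavor' }, [], 'Apricot sorbet'),
  ]),
]);
console.log(extractMicrodata([iceCream], 'http://example.com/'));
// prints {"items":[{"properties":{"flavor":["Lemon sorbet","Apricot sorbet"]}}]}
```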
Status: Last call for comments
To convert a Document to RDF, a user agent must run
the following algorithm:
If the title element is not null,
then generate the following triple:
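Subject: the document's current address
Predicate: http://purl.org/dc/terms/title
Object: the concatenation of the data of all the child text nodes of the title element, in tree order, as a plain literal, with the language information set from the language of the title element, if it is not unknown.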
For each a, area, and
link element in the Document, run these
substeps:
If the element does not have a rel
attribute, then skip this element.
If the element does not have an href
attribute, then skip this element.
If resolving the
element's href attribute relative to the
element is not successful, then skip this element.
Otherwise, split
the value of the element's rel attribute on
spaces, obtaining list of tokens.
If list of tokens contains more than
one instance of the token up, then
remove all such tokens.
Coalesce duplicate tokens in list of tokens.
If list of tokens contains both the
tokens alternate and stylesheet, then remove them both
and replace them with the single (uppercase) token ALTERNATE-STYLESHEET.
For each token token in list of tokens that contains no U+003A COLON characters (:), generate the following triple:
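Subject: the document's current address
Predicate: the concatenation of the string "http://www.w3.org/1999/xhtml/vocab#" and token, with any characters in token that are not valid in the <ifragment> production of the IRI syntax being %-escaped [RFC3987]
Object: the absolute URL that results from resolving the value of the element's href attribute relative to the element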
For each token token in list of tokens that is an absolute URL, generate the following triple:
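Subject: the document's current address
Predicate: token
Object: the absolute URL that results from resolving the value of the element's href attribute relative to the element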
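The rel-token preprocessing and the choice of predicate can be sketched as follows. Assumptions: tokens are split on the HTML space characters, the merged ALTERNATE-STYLESHEET token is appended at the end (the steps do not fix its position), the %-escaping for the <ifragment> production is omitted, and absolute-URL detection uses the global URL constructor. relTokens and relPredicates are illustrative names.

```javascript
const XHV = 'http://www.w3.org/1999/xhtml/vocab#';

// Approximates "absolute URL" with the WHATWG URL parser.
function isAbsoluteURL(value) {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

function relTokens(relValue) {
  let tokens = relValue.split(/[ \t\n\f\r]+/).filter(Boolean);
  // More than one "up": remove all of them.
  if (tokens.filter((t) => t === 'up').length > 1) {
    tokens = tokens.filter((t) => t !== 'up');
  }
  // Coalesce duplicates (a Set keeps each token's first position).
  tokens = [...new Set(tokens)];
  // "alternate" plus "stylesheet" become one ALTERNATE-STYLESHEET token,
  // appended here because the steps leave its position open.
  if (tokens.includes('alternate') && tokens.includes('stylesheet')) {
    tokens = tokens.filter((t) => t !== 'alternate' && t !== 'stylesheet');
    tokens.push('ALTERNATE-STYLESHEET');
  }
  return tokens;
}

// Colon-free tokens go into the XHTML vocab namespace (escaping omitted),
// absolute URLs are used as-is, and anything else yields no triple.
function relPredicates(relValue) {
  const predicates = [];
  for (const token of relTokens(relValue)) {
    if (!token.includes(':')) predicates.push(XHV + token);
    else if (isAbsoluteURL(token)) predicates.push(token);
  }
  return predicates;
}

console.log(relTokens('up up index')); // prints [ 'index' ]
```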
For each meta element in the Document
that has a name attribute and
a content attribute, if the
value of the name attribute
contains no U+003A COLON characters (:), generate the following
triple:
Subject: the document's current address
Predicate: the concatenation of the string "http://www.w3.org/1999/xhtml/vocab#" and the value of the element's name attribute, converted to ASCII lowercase, with any characters in the value that are not valid in the <ifragment> production of the IRI syntax being %-escaped [RFC3987]
Object: the value of the element's content attribute, as a plain literal, with the language information set from the language of the element, if it is not unknown
For each meta element in the Document
that has a name attribute and
a content attribute, if the
value of the name attribute is
an absolute URL, generate the following triple:
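Subject: the document's current address
Predicate: the value of the element's name attribute
Object: the value of the element's content attribute, as a plain literal, with the language information set from the language of the element, if it is not unknown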
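The two meta element rules above can be sketched as a single Python function. This is a minimal sketch, assuming a plain literal is modelled as a `(text, language)` tuple with `None` for an unknown language; `escape_ifragment` and `is_absolute_url` are hypothetical helpers (a conservative percent-encoder and a scheme-based approximation of "absolute URL", respectively).

```python
import re
import string
from urllib.parse import quote, urlsplit

VOCAB = "http://www.w3.org/1999/xhtml/vocab#"
# ASCII lowercase only: str.lower() would also fold non-ASCII letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def escape_ifragment(s: str) -> str:
    # Hypothetical helper (stricter than RFC 3987): keep %XX triplets and
    # percent-encode everything outside unreserved / sub-delims / ":@/?".
    parts = re.split(r"(%[0-9A-Fa-f]{2})", s)
    return "".join(p if i % 2 else quote(p, safe="!$&'()*+,;=:@/?")
                   for i, p in enumerate(parts))


def is_absolute_url(s: str) -> bool:
    return bool(urlsplit(s).scheme)  # approximation of "absolute URL"


def meta_triples(document_address, name, content, lang=None):
    """Triples for one <meta name content> element."""
    literal = (content, lang)  # plain literal; lang None means unknown
    if ":" not in name:
        predicate = VOCAB + escape_ifragment(name.translate(_ASCII_LOWER))
        return [(document_address, predicate, literal)]
    if is_absolute_url(name):
        return [(document_address, name, literal)]
    return []  # names with a colon that are not absolute URLs yield nothing
```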
For each blockquote and q element in
the Document that has a cite
attribute that resolves
successfully relative to the element, generate the following
triple:
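Subject: the document's current address
Predicate: http://purl.org/dc/terms/source
Object: the absolute URL that results from resolving the value of the element's cite attribute relative to the element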
For each element that is also a top-level microdata item, run the following steps:
Generate the triples for the item. Let item be the subject returned.
Generate the following triple:
Subject: the document's current address
Predicate: http://www.w3.org/1999/xhtml/microdata#item
Object: item
When the user agent is to generate the triples for an item item, it must follow the following steps:
If item has a global identifier and that global identifier is an absolute URL, let subject be that global identifier. Otherwise, let subject be a new blank node.
If item has an item type and that item type is an absolute URL, let type be that item type. Otherwise, let type be the empty string.
If type is not the empty string, generate the following triple:
Subject: subject
Predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Object: type
For each element element that has one or more property names and is one of the properties of the item item, in the order those elements are given by the algorithm that returns the properties of an item, run the following substeps:
Let value be the property value of element.
If value is an item, then generate the triples for value, and then replace value with the subject returned from those steps.
Otherwise, if element is not one of the URL property elements, let value be a plain literal, with the language information set from the language of the element, if it is not unknown.
For each name name in element's property names, run the appropriate substeps from the following list:
If name is an absolute URL:
Generate the following triple:
Subject: subject
Predicate: name
Object: value
If name does not contain a U+003A COLON character (:):
If type is the empty string, then abort these substeps.
Let predicate have the same value as type.
If predicate does not contain a U+0023 NUMBER SIGN character (#), then append a U+0023 NUMBER SIGN character (#) to predicate.
Append a U+003A COLON character (:) to predicate.
Append the value of name to predicate, with any characters in name that are not valid in the <ifragment> production of the IRI syntax being %-escaped.
Generate the following triple:
Subject: subject
Predicate: the concatenation of the string "http://www.w3.org/1999/xhtml/microdata#" and predicate, with any characters in predicate that are not valid in the <ifragment> production of the IRI syntax being %-escaped [RFC3987]
Object: value
Return subject.
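The recursive "generate the triples for an item" algorithm can be sketched in Python over a simplified item model. This is a minimal sketch, assuming hypothetical `Item` and `Prop` dataclasses that stand in for elements (properties listed in the order the properties algorithm returns them), blank nodes labelled `_:bN` (an arbitrary choice), plain literals as `(text, language)` tuples, and URL property values kept as plain absolute-URL strings. The `escape_ifragment` and `is_absolute_url` helpers are the same hypothetical approximations used elsewhere.

```python
import itertools
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union
from urllib.parse import quote, urlsplit

MICRODATA = "http://www.w3.org/1999/xhtml/microdata#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_blank_ids = itertools.count()


@dataclass
class Prop:                         # hypothetical stand-in for a property element
    names: List[str]                # the element's property names
    value: Union[str, "Item"]       # property value: string or nested item
    lang: Optional[str] = None      # language of the element, None if unknown
    is_url: bool = False            # True for URL property elements


@dataclass
class Item:                         # hypothetical stand-in for an item
    itemid: Optional[str] = None
    itemtype: Optional[str] = None
    props: List[Prop] = field(default_factory=list)


def is_absolute_url(s: Optional[str]) -> bool:
    return bool(s) and bool(urlsplit(s).scheme)  # approximation


def escape_ifragment(s: str) -> str:
    # Keep %XX triplets, so an already-escaped name is not escaped twice.
    parts = re.split(r"(%[0-9A-Fa-f]{2})", s)
    return "".join(p if i % 2 else quote(p, safe="!$&'()*+,;=:@/?")
                   for i, p in enumerate(parts))


def generate_triples(item: Item, out: list) -> str:
    subject = item.itemid if is_absolute_url(item.itemid) else f"_:b{next(_blank_ids)}"
    item_type = item.itemtype if is_absolute_url(item.itemtype) else ""
    if item_type:
        out.append((subject, RDF_TYPE, item_type))
    for prop in item.props:
        value = prop.value
        if isinstance(value, Item):
            value = generate_triples(value, out)  # nested item: use its subject
        elif not prop.is_url:
            value = (value, prop.lang)            # plain literal
        for name in prop.names:
            if is_absolute_url(name):
                out.append((subject, name, value))
            elif ":" not in name:
                if not item_type:
                    continue                      # untyped item: skip this name
                predicate = item_type if "#" in item_type else item_type + "#"
                predicate += ":" + escape_ifragment(name)
                out.append((subject, MICRODATA + escape_ifragment(predicate), value))
    return subject
```

For a type of `http://example.com/Person` and a property named `name`, this yields the predicate `http://www.w3.org/1999/xhtml/microdata#http://example.com/Person%23:name`, because the `#` appended to the type is not valid in <ifragment> and is escaped in the final concatenation.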
Given a Document source, a user
agent may run the following algorithm to extract an Atom feed. This is not the only algorithm
that can be used for this purpose; for instance, a user agent might
instead use the hAtom algorithm. [HATOM]
If the Document source does
not contain any article elements, then return nothing
and abort these steps. This algorithm can only be used with
documents that contain distinct articles.
Let R be an empty XML Document object whose address is user-agent
defined.
Append a feed element in the
Atom namespace to R.
For each meta element with a name attribute and a content attribute and whose name attribute's value is author, run the following substeps:
Append an author element in the
Atom namespace to the root element of R.
Append a name element in the
Atom namespace to the element created in the
previous step.
Append a text node whose data is the value of the
meta element's content attribute to the element
created in the previous step.
If there is a link element whose rel attribute's value includes the
keyword icon, and that element also
has an href attribute whose
value successfully resolves
relative to the link element, then append an icon element in the Atom namespace to
the root element of R whose contents is a text
node with its data set to the absolute URL resulting
from resolving the value of the
href attribute.
Append an id element in the Atom
namespace to the root element of R
whose contents is a text node with its data set to the
document's current address.
Optionally: Let x be a link element in the Atom
namespace. Add a rel attribute whose
value is the string "self" to x. Add an href attribute whose value is the
(user-agent-defined) address of R to x. Append x to the root element
of R.
This step would be skipped when the document R has no convenient address. The presence of the rel="self" link is a "should"-level requirement in
the Atom specification.
Let x be a link
element in the Atom namespace. Add a rel attribute whose value is the string "alternate" to x. If the
document being converted is an HTML
document, add a type attribute whose
value is the string "text/html" to x. Otherwise, the document being converted is an
XML document; add a type attribute whose value is the string
"application/xhtml+xml" to x. Add an href attribute whose value is
the document's current address to x. Append x to the root element
of R.
Let subheading text be the empty string.
Let heading be the first element of heading content whose nearest ancestor of sectioning content is the body element, if any, or null if there is none.
Take the appropriate action from the following list, as determined by the type of the heading element:
If heading is null:
Let heading text be the
textContent of the title
element, if there is one, or the empty string
otherwise.
If heading is an hgroup element:
If heading contains no child
h1–h6 elements, let heading text be the empty string.
Otherwise, let headings list be a list of
all the h1–h6 element children
of heading, sorted first by descending
rank and then in tree order (so
h1s first, then h2s, etc., with each
group in the order they appear in the document). Then, let heading text be the textContent of
the first entry in headings list, and if
there are multiple entries, let subheading
text be the textContent of the second entry
in headings list.
If heading is an h1–h6 element:
Let heading text be the
textContent of heading.
Append a title element in the Atom
namespace to the root element of R
whose contents is a text node with its data set to heading text.
If subheading text is not the empty string,
append a subtitle element in the Atom
namespace to the root element of R
whose contents is a text node with its data set to subheading text.
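The feed-level heading selection can be sketched in Python with `xml.etree.ElementTree`. This is a minimal sketch, assuming the heading is available as an ElementTree element with un-namespaced tag names, and that the caller passes the title element's textContent (or the empty string) as the hypothetical `title_text` parameter. Sorting relies on Python's stable `sorted`, so elements of equal rank keep tree order.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RANK = {f"h{n}": n for n in range(1, 7)}  # h1 has the highest rank


def text_content(el: ET.Element) -> str:
    return "".join(el.itertext())


def feed_heading_texts(heading: Optional[ET.Element],
                       title_text: str = "") -> Tuple[str, str]:
    """Return (heading text, subheading text) for the feed title steps."""
    if heading is None:
        return title_text, ""          # fall back to the document's title
    if heading.tag == "hgroup":
        headings = [c for c in heading if c.tag in RANK]
        if not headings:
            return "", ""
        # Stable sort: descending rank (h1 first), then tree order.
        headings = sorted(headings, key=lambda c: RANK[c.tag])
        sub = text_content(headings[1]) if len(headings) > 1 else ""
        return text_content(headings[0]), sub
    return text_content(heading), ""   # an h1-h6 element
```

The per-article title steps use the same hgroup ordering, but a null heading yields the empty string and no subheading is taken.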
Let global update date have no value.
For each article element article that does not have an ancestor
article element, run the following steps:
Let E be an entry element in the Atom namespace,
and append E to the root element of R.
Let heading be the first element of heading content whose nearest ancestor of sectioning content is article, if any, or null if there is none.
Take the appropriate action from the following list, as determined by the type of the heading element:
If heading is null:
Let heading text be the empty string.
If heading is an hgroup element:
If heading contains no child
h1–h6 elements, let heading text be the empty string.
Otherwise, let headings list be a list
of all the h1–h6 element
children of heading, sorted first by
descending rank and then in tree
order (so h1s first, then
h2s, etc., with each group in the order they
appear in the document). Then, let heading
text be the textContent of the first entry
in headings list.
If heading is an h1–h6 element:
Let heading text be the
textContent of heading.
Append a title element in the
Atom namespace to E whose
contents is a text node with its data set to heading text.
Clone article and its descendants into an
environment that has scripting
disabled, has no plugins, and
fails any attempt to fetch any
resources. Let cloned article be the
resulting clone article element.
Remove from the subtree rooted at cloned
article any article elements other than the
cloned article itself, any
header, footer, or nav
elements whose nearest ancestor of sectioning
content is the cloned article, and
the first element of heading content whose nearest
ancestor of sectioning content is the cloned article, if any.
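The removal step can be sketched in Python on an ElementTree copy of the article. This is a minimal sketch under stated assumptions: the markup is available as ElementTree elements with un-namespaced tag names, the scripting, plugin, and fetching aspects of the cloning environment are not modelled, and the helper names are hypothetical. Targets are collected first and removed afterwards, so a first heading that sits inside a removed header still counts as the first heading; trailing text (ElementTree's `tail`) is moved so it survives removal.

```python
import copy
import xml.etree.ElementTree as ET

SECTIONING = {"article", "aside", "nav", "section"}
HEADING_CONTENT = {"h1", "h2", "h3", "h4", "h5", "h6", "hgroup"}


def _collect(parent, owned, targets, state):
    # owned: the children's nearest sectioning ancestor is the cloned article
    for child in parent:
        if child.tag == "article":
            targets.append((parent, child))   # nested article: drop whole subtree
            continue
        if owned and child.tag in HEADING_CONTENT and not state["heading_seen"]:
            state["heading_seen"] = True      # first heading of the article
            targets.append((parent, child))
        elif owned and child.tag in ("header", "footer", "nav"):
            targets.append((parent, child))
        _collect(child, owned and child.tag not in SECTIONING, targets, state)


def _remove_keep_tail(parent, child):
    # Text following an element is stored on child.tail; keep it in the tree.
    if child.tail:
        index = list(parent).index(child)
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def strip_cloned_article(article: ET.Element) -> ET.Element:
    cloned = copy.deepcopy(article)           # leave the source tree untouched
    targets = []
    _collect(cloned, True, targets, {"heading_seen": False})
    for parent, child in reversed(targets):   # innermost removals first
        _remove_keep_tail(parent, child)
    return cloned
```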
If cloned article contains any
ins or del elements with datetime attributes whose
values parse
as global date and time strings without errors, then let
update date be the value of the datetime attribute that parses
to the newest global date and
time.
Otherwise, let update date have no value.
This value is used below; it is calculated here because in certain cases the next step mutates the cloned article.
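Choosing the update date from ins and del datetime attributes can be sketched in Python. This is a minimal sketch, assuming the attribute values have already been gathered into a list; the hypothetical `parse_global_datetime` accepts only the form "date T time" followed by "Z" or a "+hh:mm"/"-hh:mm" offset, truncates fractional seconds to microseconds, and rejects years beyond what Python's `datetime` supports.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Global date and time string: date "T" time, then "Z" or +hh:mm / -hh:mm.
_GLOBAL_DT = re.compile(
    r"([0-9]{4,})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2})"
    r"(?::([0-9]{2})(?:\.([0-9]+))?)?(Z|[+-][0-9]{2}:[0-9]{2})")


def parse_global_datetime(s: str) -> Optional[datetime]:
    m = _GLOBAL_DT.fullmatch(s)
    if not m:
        return None
    year, month, day, hour, minute, sec, frac, tz = m.groups()
    try:
        if tz == "Z":
            tzinfo = timezone.utc
        else:
            off_h, off_m = int(tz[1:3]), int(tz[4:6])
            if off_h > 23 or off_m > 59:
                return None
            sign = 1 if tz[0] == "+" else -1
            tzinfo = timezone(sign * timedelta(hours=off_h, minutes=off_m))
        micro = int((frac or "0")[:6].ljust(6, "0"))  # truncated to microseconds
        return datetime(int(year), int(month), int(day), int(hour), int(minute),
                        int(sec or 0), micro, tzinfo)
    except ValueError:                                 # e.g. month 13, hour 24
        return None


def newest_update_date(datetime_values: Iterable[str]) -> Optional[str]:
    """Return the attribute value that parses to the newest instant, or None."""
    best = None
    for raw in datetime_values:
        parsed = parse_global_datetime(raw)
        if parsed is not None and (best is None or parsed > best[0]):
            best = (parsed, raw)
    return best[1] if best else None
```

Comparison happens on timezone-aware instants, so "2009-08-01T12:00+03:00" (09:00 UTC) is older than "2009-08-01T10:00Z", while the returned value is the original attribute string.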
If the document being converted is an HTML document, then: Let x
be a content element in the Atom
namespace. Add a type attribute
whose value is the string "html" to x. Append a text node with its data set to the
result of running the HTML fragment serialization
algorithm on cloned article to x. Append x to E.
Otherwise, the document being converted is an XML document: Let x be a content element in
the Atom namespace. Add a type attribute whose value is the string "xml" to x. Append a
div element to x. Move all the
child nodes of the cloned article node to
that div element, preserving their relative
order. Append x to E.
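Establish the value of id and has-alternate from the first of the following to apply:
If the article node has a descendant a or area element with an href attribute that
successfully resolves
relative to that descendant and a rel attribute whose value includes
the bookmark keyword:
Let id be the absolute URL resulting from resolving the value of the href
attribute of the first such a or area
element, relative to the element. Let has-alternate be true.
If the article node has an id attribute:
Let id be the document's current address, with the fragment identifier (if any) removed, and with a U+0023 NUMBER SIGN character (#) and the value of the article element's id attribute appended. Let has-alternate be false.
Append an id element in the Atom
namespace to E whose contents is a
text node with its data set to id.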
If has-alternate is true: Let x be a link element in the
Atom namespace. Add a rel
attribute whose value is the string "alternate" to x. Add an
href attribute whose value is id to x. Append x to E.
If article has a time
element descendant that has a pubdate attribute and whose
nearest ancestor article element is article, and the first such element's date is not unknown, then run
the following substeps, with e being the
first such element:
Let datetime be a global date and time whose date component is the date of e.
If e's time and time-zone offset are not
unknown, then let datetime's time and
time-zone offset components be the time and time-zone offset of e. Otherwise, let them be midnight and no offset
respectively ("00:00Z").
Let publication date be the best representation of the global date and time string datetime.
Otherwise, let publication date have no value.
If update date has no value but publication date does, then let update date have the value of publication date.
Otherwise, if publication date has no value but update date does, then let publication date have the value of update date.
If update date has a value, and global update date has no value or is less recent than update date, then let global update date have the value of update date.
If publication date and update date both still have no value, then let them both have a value that is a valid global date and time string representing the global date and time of the moment that this algorithm was invoked.
Append a published element in the
Atom namespace to E whose
contents is a text node with its data set to publication date.
Append an updated element in the
Atom namespace to E whose
contents is a text node with its data set to update date.
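The reconciliation of publication date and update date can be sketched in Python. This is a minimal sketch, assuming `None` represents "has no value" and that dates are already valid global date and time strings; the `now` parameter is a hypothetical addition for testability, and the fallback string is formatted in UTC with a "Z" suffix, which is one valid global date and time string among several possible representations.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def reconcile_entry_dates(publication: Optional[str], update: Optional[str],
                          now: Optional[datetime] = None) -> Tuple[str, str]:
    """Fill in whichever of publication/update date is missing."""
    if update is None and publication is not None:
        update = publication
    elif publication is None and update is not None:
        publication = update
    if publication is None and update is None:
        # Neither is known: use the moment of invocation, expressed in UTC.
        moment = now if now is not None else datetime.now(timezone.utc)
        stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        publication = update = stamp
    return publication, update
```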
If global update date has no value, then
let it have a value that is a valid global date and time
string representing the global date and time of the date
and time of the Document's source file's last
modification, if it is known, or else of the moment that this
algorithm was invoked.
Insert an updated element in the
Atom namespace, whose contents is a text node with its data set to
global update date, into the root element of R,
immediately before the first entry element in the Atom namespace.
Return the Atom document R.
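The overall shape of the resulting feed can be sketched with Python's `xml.etree.ElementTree`. This is a minimal sketch, not a complete implementation: the hypothetical `EntryData` record stands in for the per-article results computed above, the icon, subtitle, and rel="self" steps are omitted, and link targets are placed in href attributes, as RFC 4287 requires for atom:link elements. The feed-level updated element is inserted immediately before the first entry.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def atom(tag: str) -> str:
    return f"{{{ATOM}}}{tag}"    # ElementTree's {namespace}local-name form


@dataclass
class EntryData:                 # hypothetical per-article results
    id: str
    has_alternate: bool
    title: str
    published: str
    updated: str
    content_html: str            # HTML fragment serialization of the article


def build_feed(address: str, authors: List[str], title: str,
               entries: List[EntryData], global_updated: str,
               is_html: bool = True) -> ET.Element:
    feed = ET.Element(atom("feed"))
    for name in authors:                              # one per <meta name=author>
        author = ET.SubElement(feed, atom("author"))
        ET.SubElement(author, atom("name")).text = name
    ET.SubElement(feed, atom("id")).text = address
    ET.SubElement(feed, atom("link"), rel="alternate", href=address,
                  type="text/html" if is_html else "application/xhtml+xml")
    ET.SubElement(feed, atom("title")).text = title
    for data in entries:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("title")).text = data.title
        content = ET.SubElement(entry, atom("content"), type="html")
        content.text = data.content_html
        ET.SubElement(entry, atom("id")).text = data.id
        if data.has_alternate:
            ET.SubElement(entry, atom("link"), rel="alternate", href=data.id)
        ET.SubElement(entry, atom("published")).text = data.published
        ET.SubElement(entry, atom("updated")).text = data.updated
    # Feed-level <updated> goes immediately before the first <entry>.
    first_entry = next((i for i, c in enumerate(feed) if c.tag == atom("entry")), len(feed))
    updated = ET.Element(atom("updated"))
    updated.text = global_updated
    feed.insert(first_entry, updated)
    return feed
```

Calling `ET.tostring(feed, encoding="unicode")` on the result produces the serialized feed.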
The above algorithm does not guarantee that the
output will be a conforming Atom feed. In particular, if
insufficient information is provided in the document (e.g. if the
document does not have any <meta name="author"
content="..."> elements), then the output will not be
conforming.
The Atom namespace is: http://www.w3.org/2005/Atom