W3C

HTML 5

A vocabulary and associated APIs for HTML and XHTML


3 Semantics, structure, and APIs of HTML documents

3.1 Introduction

This section is non-normative.

An introduction to marking up a document.

3.2 Documents

Every XML and HTML document in an HTML UA is represented by a Document object. [DOMCORE]

The document's address is an absolute URL that is set when the Document is created. The document's current address is an absolute URL that can change during the lifetime of the Document, for example when the user navigates to a fragment identifier on the page. The document's current address must be set to the document's address when the Document is created.

Interactive user agents typically expose the document's current address in their user interface.

When a Document is created by a script using the createDocument() API, the document's address is the same as the document's address of the active document of the script's browsing context.

Document objects are assumed to be XML documents unless they are flagged as being HTML documents when they are created. Whether a document is an HTML document or an XML document affects the behavior of certain APIs, as well as a few CSS rendering rules. [CSS]

A Document object created by the createDocument() API on the DOMImplementation object is initially an XML document, but can be made into an HTML document by calling document.open() on it.

3.2.1 Documents in the DOM

All Document objects (in user agents implementing this specification) must also implement the HTMLDocument interface, available using binding-specific methods. (This is the case whether or not the document in question is an HTML document or indeed whether it contains any HTML elements at all.) Document objects must also implement the document-level interface of any other namespaces found in the document that the UA supports.

For example, if an HTML implementation also supports SVG, then the Document object implements both HTMLDocument and SVGDocument.

Because the HTMLDocument interface is now obtained using binding-specific casting methods instead of simply being the primary interface of the document object, it is no longer defined as inheriting from Document.

[OverrideBuiltins]
interface HTMLDocument {
  // resource metadata management
  [PutForwards=href] readonly attribute Location location;
  readonly attribute DOMString URL;
           attribute DOMString domain;
  readonly attribute DOMString referrer;
           attribute DOMString cookie;
  readonly attribute DOMString lastModified;
  readonly attribute DOMString compatMode;
           attribute DOMString charset;
  readonly attribute DOMString characterSet;
  readonly attribute DOMString defaultCharset;
  readonly attribute DOMString readyState;

  // DOM tree accessors
           attribute DOMString title;
           attribute DOMString dir;
           attribute HTMLElement body;
  readonly attribute HTMLCollection images;
  readonly attribute HTMLCollection embeds;
  readonly attribute HTMLCollection plugins;
  readonly attribute HTMLCollection links;
  readonly attribute HTMLCollection forms;
  readonly attribute HTMLCollection scripts;
  NodeList getElementsByName(in DOMString elementName);
  NodeList getElementsByClassName(in DOMString classNames);
  NodeList getItems(optional in DOMString typeNames);
  getter any (in DOMString name);

  // dynamic markup insertion
           attribute DOMString innerHTML;
  HTMLDocument open(optional in DOMString type, optional in DOMString replace);
  WindowProxy open(in DOMString url, in DOMString name, in DOMString features, optional in boolean replace);
  void close();
  void write(in DOMString... text);
  void writeln(in DOMString... text);

  // user interaction
  Selection getSelection();
  readonly attribute Element activeElement;
  boolean hasFocus();
           attribute DOMString designMode;
  boolean execCommand(in DOMString commandId);
  boolean execCommand(in DOMString commandId, in boolean showUI);
  boolean execCommand(in DOMString commandId, in boolean showUI, in DOMString value);
  boolean queryCommandEnabled(in DOMString commandId);
  boolean queryCommandIndeterm(in DOMString commandId);
  boolean queryCommandState(in DOMString commandId);
  boolean queryCommandSupported(in DOMString commandId);
  DOMString queryCommandValue(in DOMString commandId);
  readonly attribute HTMLCollection commands;

  // event handler DOM attributes
           attribute Function onabort;
           attribute Function onblur;
           attribute Function oncanplay;
           attribute Function oncanplaythrough;
           attribute Function onchange;
           attribute Function onclick;
           attribute Function oncontextmenu;
           attribute Function ondblclick;
           attribute Function ondrag;
           attribute Function ondragend;
           attribute Function ondragenter;
           attribute Function ondragleave;
           attribute Function ondragover;
           attribute Function ondragstart;
           attribute Function ondrop;
           attribute Function ondurationchange;
           attribute Function onemptied;
           attribute Function onended;
           attribute Function onerror;
           attribute Function onfocus;
           attribute Function onformchange;
           attribute Function onforminput;
           attribute Function oninput;
           attribute Function oninvalid;
           attribute Function onkeydown;
           attribute Function onkeypress;
           attribute Function onkeyup;
           attribute Function onload;
           attribute Function onloadeddata;
           attribute Function onloadedmetadata;
           attribute Function onloadstart;
           attribute Function onmousedown;
           attribute Function onmousemove;
           attribute Function onmouseout;
           attribute Function onmouseover;
           attribute Function onmouseup;
           attribute Function onmousewheel;
           attribute Function onpause;
           attribute Function onplay;
           attribute Function onplaying;
           attribute Function onprogress;
           attribute Function onratechange;
           attribute Function onreadystatechange;
           attribute Function onscroll;
           attribute Function onseeked;
           attribute Function onseeking;
           attribute Function onselect;
           attribute Function onshow;
           attribute Function onstalled;
           attribute Function onsubmit;
           attribute Function onsuspend;
           attribute Function ontimeupdate;
           attribute Function onvolumechange;
           attribute Function onwaiting;
};
Document implements HTMLDocument;

Since the HTMLDocument interface holds methods and attributes related to a number of disparate features, the members of this interface are described in various different sections.

3.2.2 Security

User agents must raise a SECURITY_ERR exception whenever any of the members of an HTMLDocument object are accessed by scripts whose effective script origin is not the same as the Document's effective script origin.

3.2.3 Resource metadata management

document . URL

Returns the document's address.

document . referrer

Returns the address of the Document from which the user navigated to this one, unless it was blocked or there was no such document, in which case it returns the empty string.

The noreferrer link type can be used to block the referrer.

The URL attribute must return the document's address.

The referrer attribute must return either the current address of the active document of the source browsing context at the time the navigation was started (that is, the page which navigated the browsing context to the current document), or the empty string if there is no such originating page, or if the UA has been configured not to report referrers in this case, or if the navigation was initiated for a hyperlink with a noreferrer keyword.

In the case of HTTP, the referrer DOM attribute will match the Referer (sic) header that was sent when fetching the current page.

Typically user agents are configured to not report referrers in the case where the referrer uses an encrypted protocol and the current page does not (e.g. when navigating from an https: page to an http: page).
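
The referrer rules above can be sketched as a small pure function. This is only an illustration under stated assumptions: the name computeReferrer and the fields of its argument are hypothetical, and the encrypted-to-unencrypted check models the typical configuration described above rather than a requirement.

```javascript
// Hypothetical model of the referrer rules; not a real DOM API.
// nav.sourceAddress: current address of the source document, or null if none.
// nav.targetAddress: address of the document being navigated to.
// nav.noreferrer:    true if the hyperlink carried the noreferrer keyword.
function computeReferrer(nav) {
  if (nav.sourceAddress === null) return ""; // no originating page
  if (nav.noreferrer) return "";             // blocked by the link type
  // Typical (not required) configuration: do not report an https: referrer
  // to an http: page.
  const fromEncrypted = nav.sourceAddress.startsWith("https:");
  const toEncrypted = nav.targetAddress.startsWith("https:");
  if (fromEncrypted && !toEncrypted) return "";
  return nav.sourceAddress;
}
```

A user agent configured never to report referrers would return the empty string in every case; that option is left out of the sketch.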


document . cookie [ = value ]

Returns the HTTP cookies that apply to the Document. If there are no cookies or cookies can't be applied to this resource, the empty string will be returned.

Can be set, to add a new cookie to the document's set of HTTP cookies.

If the Document has no browsing context an INVALID_STATE_ERR exception will be thrown. If the contents are sandboxed into a unique origin, a SECURITY_ERR exception will be thrown.

The cookie attribute represents the cookies of the resource.

On getting, if the document is not associated with a browsing context then the user agent must raise an INVALID_STATE_ERR exception. Otherwise, if the sandboxed origin browsing context flag was set on the browsing context of the Document when the Document was created, the user agent must raise a SECURITY_ERR exception. Otherwise, if the document's address does not use a server-based naming authority, it must return the empty string. Otherwise, it must first obtain the storage mutex and then return the same string as the value of the Cookie HTTP header it would include if fetching the resource indicated by the document's address over HTTP, as per RFC 2109 section 4.3.4 or later specifications, excluding HTTP-only cookies. [RFC2109] [COOKIES]
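
The order of the checks in the getter can be sketched as follows. This is a minimal model, not an implementation: the doc object and all of its fields are hypothetical, exceptions are represented by plain Error objects named after the exception codes, and the storage mutex is not modeled. The returned string assumes the usual Cookie header shape of name=value pairs separated by "; ".

```javascript
// Hypothetical model of the cookie getter's decision steps.
// doc.hasBrowsingContext:   false if the Document has no browsing context.
// doc.sandboxedOrigin:      true if the sandboxed origin flag was set.
// doc.serverBasedAuthority: true if the address uses a server-based
//                           naming authority.
// doc.cookies:              array of { name, value, httpOnly } records.
function getCookie(doc) {
  if (!doc.hasBrowsingContext) throw new Error("INVALID_STATE_ERR");
  if (doc.sandboxedOrigin) throw new Error("SECURITY_ERR");
  if (!doc.serverBasedAuthority) return "";
  // Build the Cookie header value, leaving out HTTP-only cookies.
  return doc.cookies
    .filter((c) => !c.httpOnly)
    .map((c) => c.name + "=" + c.value)
    .join("; ");
}
```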

On setting, if the document is not associated with a browsing context then the user agent must raise an INVALID_STATE_ERR exception. Otherwise, if the sandboxed origin browsing context flag was set on the browsing context of the Document when the Document was created, the user agent must raise a SECURITY_ERR exception. Otherwise, if the document's address does not use a server-based naming authority, it must do nothing. Otherwise, the user agent must obtain the storage mutex and then act as it would when processing cookies if it had just attempted to fetch the document's address over HTTP, and had received a response with a Set-Cookie header whose value was the specified value, as per RFC 2109 sections 4.3.1, 4.3.2, and 4.3.3 or later specifications, but without overwriting the values of HTTP-only cookies. [RFC2109] [COOKIES]

This specification does not define what makes an HTTP-only cookie, and at the time of publication the editor is not aware of any reference for HTTP-only cookies. They are a feature supported by some Web browsers wherein an "httponly" parameter added to the cookie string causes the cookie to be hidden from script.

Since the cookie attribute is accessible across frames, the path restrictions on cookies are only a tool to help manage which cookies are sent to which parts of the site, and are not in any way a security feature.


document . lastModified

Returns the date of the last modification to the document, as reported by the server, in the form "MM/DD/YYYY hh:mm:ss".

If the last modification date is not known, the current time is returned instead.

The lastModified attribute, on getting, must return the date and time of the Document's source file's last modification, in the user's local time zone, in the following format:

  1. The month component of the date.
  2. A U+002F SOLIDUS character ('/').
  3. The day component of the date.
  4. A U+002F SOLIDUS character ('/').
  5. The year component of the date.
  6. A U+0020 SPACE character.
  7. The hours component of the time.
  8. A U+003A COLON character (':').
  9. The minutes component of the time.
  10. A U+003A COLON character (':').
  11. The seconds component of the time.

All the numeric components above, other than the year, must be given as two digits in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE representing the number in base ten, zero-padded if necessary. The year must be given as four or more digits in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE representing the number in base ten, zero-padded if necessary.

The Document's source file's last modification date and time must be derived from relevant features of the networking protocols used, e.g. from the value of the HTTP Last-Modified header of the document, or from metadata in the file system for local files. If the last modification date and time are not known, the attribute must return the current date and time in the above format.
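
The format above can be sketched with the standard JavaScript Date object. The function names pad and formatLastModified are hypothetical, and passing null to mean "not known" is an assumption of this sketch.

```javascript
// Zero-pad a non-negative integer to at least `width` digits.
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// Format a Date in the local time zone as "MM/DD/YYYY hh:mm:ss".
// When the modification time is not known (null), the current time is used.
function formatLastModified(date) {
  const d = date === null ? new Date() : date;
  return (
    pad(d.getMonth() + 1, 2) + "/" +   // getMonth() is zero-based
    pad(d.getDate(), 2) + "/" +
    pad(d.getFullYear(), 4) + " " +    // four or more digits
    pad(d.getHours(), 2) + ":" +
    pad(d.getMinutes(), 2) + ":" +
    pad(d.getSeconds(), 2)
  );
}
```

For example, a Date constructed in local time as new Date(2009, 0, 5, 12, 4, 9) is formatted as "01/05/2009 12:04:09".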


document . compatMode

In a conforming document, returns the string "CSS1Compat". (In quirks mode documents, returns the string "BackCompat", but a conforming document can never trigger quirks mode.)

A Document is always set to one of three modes: no quirks mode, the default; quirks mode, used typically for legacy documents; and limited quirks mode, also known as "almost standards" mode. The mode is only ever changed from the default by the HTML parser, based on the presence, absence, or value of the DOCTYPE string.

The compatMode DOM attribute must return the literal string "CSS1Compat" unless the document has been set to quirks mode by the HTML parser, in which case it must instead return the literal string "BackCompat".
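
As a brief sketch of the mapping, assuming the three modes are labeled "no-quirks", "limited-quirks", and "quirks" (these labels and the function name are hypothetical):

```javascript
// Only quirks mode yields "BackCompat"; limited quirks mode and
// no quirks mode both yield "CSS1Compat".
function compatModeFor(mode) {
  return mode === "quirks" ? "BackCompat" : "CSS1Compat";
}
```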


document . charset [ = value ]

Returns the document's character encoding.

Can be set, to dynamically change the document's character encoding.

New values that are not IANA-registered aliases supported by the user agent are ignored.

document . characterSet

Returns the document's character encoding.

document . defaultCharset

Returns what might be the user agent's default character encoding.

Documents have an associated character encoding. When a Document object is created, the document's character encoding must be initialized to UTF-16. Various algorithms during page loading affect this value, as does the charset setter. [IANACHARSET]

The charset DOM attribute must, on getting, return the preferred MIME name of the document's character encoding. On setting, if the new value is an IANA-registered alias for a character encoding supported by the user agent, the document's character encoding must be set to that character encoding. (Otherwise, nothing happens.)

The characterSet DOM attribute must, on getting, return the preferred MIME name of the document's character encoding.

The defaultCharset DOM attribute must, on getting, return the preferred MIME name of a character encoding, possibly the user's default encoding, or an encoding associated with the user's current geographical location, or any arbitrary encoding name.


document . readyState

Returns "loading" while the Document is loading, and "complete" once it has loaded.

The readystatechange event fires on the Document object when this value changes.

Each document has a current document readiness. When a Document object is created, it must have its current document readiness set to the string "loading" if the document is associated with an HTML parser or an XML parser, or to the string "complete" otherwise. Various algorithms during page loading affect this value. When the value is set, the user agent must fire a simple event called readystatechange at the Document object.

A Document is said to have an active parser if it is associated with an HTML parser or an XML parser that has not yet been stopped or aborted.

The readyState DOM attribute must, on getting, return the current document readiness.
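
The readiness rules can be modeled with a small class. This is an illustrative sketch only: the class name, the listener mechanism, and the method names are hypothetical stand-ins for the Document object and for firing a simple event at it.

```javascript
// Hypothetical model of the current document readiness.
class DocumentReadinessModel {
  // hasParser: true if the document is associated with an HTML or XML parser.
  constructor(hasParser) {
    this.readiness = hasParser ? "loading" : "complete";
    this.listeners = [];
  }

  // Register a callback standing in for a readystatechange listener.
  onReadyStateChange(listener) {
    this.listeners.push(listener);
  }

  // Setting the value notifies every listener, mirroring the requirement
  // to fire readystatechange when the value is set.
  setReadiness(value) {
    this.readiness = value;
    for (const listener of this.listeners) listener(value);
  }

  // Stand-in for the readyState DOM attribute.
  get readyState() {
    return this.readiness;
  }
}
```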

3.2.4 DOM tree accessors

The html element of a document is the document's root element, if there is one and it's an html element, or null otherwise.

The head element of a document is the first head element that is a child of the html element, if there is one, or null otherwise.


document . title [ = value ]

Returns the document's title, as given by the title element.

Can be set, to update the document's title. If there is no head element, the new value is ignored.

In SVG documents, the SVGDocument interface's title attribute takes precedence.

The title element of a document is the first title element in the document (in tree order), if there is one, or null otherwise.

The title attribute must, on getting, run the following algorithm:

  1. If the root element is an svg element in the "http://www.w3.org/2000/svg" namespace, and the user agent supports SVG, then return the value that would have been returned by the DOM attribute of the same name on the SVGDocument interface. [SVG]

  2. Otherwise, let value be a concatenation of the data of all the child text nodes of the title element, in tree order, or the empty string if the title element is null.

  3. Replace any sequence of two or more consecutive space characters in value with a single U+0020 SPACE character.

  4. Remove any leading or trailing space characters in value.

  5. Return value.
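
Steps 2 to 5 of the getter can be sketched as a string transformation. The sketch makes two assumptions: the function name getTitle and its argument are hypothetical, and the space characters are taken to be U+0020, U+0009, U+000A, U+000C, and U+000D, a definition that comes from elsewhere in HTML5 rather than from this section. The SVG case in step 1 is not modeled.

```javascript
// Runs of two or more space characters, and leading or trailing runs.
const SPACE_RUN = /[ \t\n\f\r]{2,}/g;
const EDGE_SPACES = /^[ \t\n\f\r]+|[ \t\n\f\r]+$/g;

// titleTextNodes: the data of the title element's child text nodes in
// tree order, or null when there is no title element.
function getTitle(titleTextNodes) {
  let value = titleTextNodes === null ? "" : titleTextNodes.join(""); // step 2
  value = value.replace(SPACE_RUN, " ");                              // step 3
  value = value.replace(EDGE_SPACES, "");                             // step 4
  return value;                                                       // step 5
}
```

Note that step 3 as written only collapses runs of two or more space characters, so a single tab between two words is left as a tab.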

On setting, the following algorithm must be run. Mutation events must be fired as appropriate.

  1. If the root element is an svg element in the "http://www.w3.org/2000/svg" namespace, and the user agent supports SVG, then the setter must defer to the setter for the DOM attribute of the same name on the SVGDocument interface (if it is readonly, then this will raise an exception). Stop the algorithm here. [SVG]

  2. If the title element is null and the head element is null, then the attribute must do nothing. Stop the algorithm here.
  3. If the title element is null, then a new title element must be created and appended to the head element. Let element be that element. Otherwise, let element be the title element.
  4. The children of element (if any) must all be removed.
  5. A single Text node whose data is the new value being assigned must be appended to element.

The title attribute on the HTMLDocument interface should shadow the attribute of the same name on the SVGDocument interface when the user agent supports both HTML and SVG. [SVG]


document . body [ = value ]

Returns the body element.

Can be set, to replace the body element.

If the new value is not a body or frameset element, this will throw a HIERARCHY_REQUEST_ERR exception.

The body element of a document is the first child of the html element that is either a body element or a frameset element. If there is no such element, it is null. If the body element is null, then when the specification requires that events be fired at "the body element", they must instead be fired at the Document object.

The body attribute, on getting, must return the body element of the document (either a body element, a frameset element, or null). On setting, the following algorithm must be run:

  1. If the new value is not a body or frameset element, then raise a HIERARCHY_REQUEST_ERR exception and abort these steps.
  2. Otherwise, if the new value is the same as the body element, do nothing. Abort these steps.
  3. Otherwise, if the body element is not null, then replace that element with the new value in the DOM, as if the root element's replaceChild() method had been called with the new value and the incumbent body element as its two arguments respectively, then abort these steps.
  4. Otherwise, the body element is null. Append the new value to the root element.
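
The setter steps can be sketched against a simplified tree in which each node is a plain object. Everything here is an assumption made for illustration: the { tag, children } node shape, the function names, and the use of a plain Error named after the exception code. The sketch also assumes that a root element exists.

```javascript
// Return the first child of the html root that is a body or frameset
// element, or null if there is none.
function getBody(root) {
  if (root.tag !== "html") return null;
  const found = root.children.find(
    (c) => c.tag === "body" || c.tag === "frameset"
  );
  return found === undefined ? null : found;
}

function setBody(root, newValue) {
  // Step 1: only body and frameset elements are accepted.
  if (newValue.tag !== "body" && newValue.tag !== "frameset") {
    throw new Error("HIERARCHY_REQUEST_ERR");
  }
  const current = getBody(root);
  if (current === newValue) return; // step 2: nothing to do
  if (current !== null) {
    // Step 3: replace the incumbent body element in place.
    root.children[root.children.indexOf(current)] = newValue;
    return;
  }
  root.children.push(newValue); // step 4: append to the root element
}
```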

document . images

Returns an HTMLCollection of the img elements in the Document.

document . embeds
document . plugins

Return an HTMLCollection of the embed elements in the Document.

document . links

Returns an HTMLCollection of the a and area elements in the Document that have href attributes.

document . forms

Returns an HTMLCollection of the form elements in the Document.

document . scripts

Returns an HTMLCollection of the script elements in the Document.

The images attribute must return an HTMLCollection rooted at the Document node, whose filter matches only img elements.

The embeds attribute must return an HTMLCollection rooted at the Document node, whose filter matches only embed elements.

The plugins attribute must return the same object as that returned by the embeds attribute.

The links attribute must return an HTMLCollection rooted at the Document node, whose filter matches only a elements with href attributes and area elements with href attributes.

The forms attribute must return an HTMLCollection rooted at the Document node, whose filter matches only form elements.

The scripts attribute must return an HTMLCollection rooted at the Document node, whose filter matches only script elements.


collection = document . getElementsByName(name)

Returns a NodeList of elements in the Document that have a name attribute with the value name.

collection = document . getElementsByClassName(classes)
collection = element . getElementsByClassName(classes)

Returns a NodeList of the elements in the object on which the method was invoked (a Document or an Element) that have all the classes given by classes.

The classes argument is interpreted as a space-separated list of classes.

The getElementsByName(name) method takes a string name, and must return a live NodeList containing all the HTML elements in that document that have a name attribute whose value is equal to the name argument (in a case-sensitive manner), in tree order.

The getElementsByClassName(classNames) method takes a string that contains an unordered set of unique space-separated tokens representing classes. When called, the method must return a live NodeList object containing all the elements in the document, in tree order, that have all the classes specified in that argument, having obtained the classes by splitting a string on spaces. If there are no tokens specified in the argument, then the method must return an empty NodeList. If the document is in quirks mode, then the comparisons for the classes must be done in an ASCII case-insensitive manner; otherwise, the comparisons must be done in a case-sensitive manner.

The getElementsByClassName(classNames) method on the HTMLElement interface must return a live NodeList with the nodes that the HTMLDocument getElementsByClassName() method would return when passed the same argument(s), excluding any elements that are not descendants of the HTMLElement object on which the method was invoked.

HTML, SVG, and MathML elements define which classes they are in by having an attribute in the per-element partition with the name class containing a space-separated list of classes to which the element belongs. Other specifications may also allow elements in their namespaces to be labeled as being in specific classes.

Given the following XHTML fragment:

<div id="example">
 <p id="p1" class="aaa bbb"/>
 <p id="p2" class="aaa ccc"/>
 <p id="p3" class="bbb ccc"/>
</div>

A call to document.getElementById('example').getElementsByClassName('aaa') would return a NodeList with the two paragraphs p1 and p2 in it.

A call to getElementsByClassName('ccc bbb') would only return one node, however, namely p3. A call to document.getElementById('example').getElementsByClassName('bbb  ccc ') would return the same thing.

A call to getElementsByClassName('aaa,bbb') would return no nodes; none of the elements above are in the "aaa,bbb" class.
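
The matching rule behind these results can be sketched without a DOM by treating each element as a plain record. The record shape { id, className } and the function names are hypothetical, and the result is an ordinary array of ids rather than a live NodeList.

```javascript
// Split a string on space characters, dropping empty tokens.
function splitOnSpaces(s) {
  return s.split(/[ \t\n\f\r]+/).filter((t) => t !== "");
}

// Lowercase only the ASCII letters A to Z.
function asciiLower(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// elements: array of { id, className } records in tree order.
// Returns the ids of the elements that have every class in classNames.
function idsByClassName(elements, classNames, quirks = false) {
  const norm = (t) => (quirks ? asciiLower(t) : t);
  const wanted = splitOnSpaces(classNames).map(norm);
  if (wanted.length === 0) return []; // no tokens: empty result
  return elements
    .filter((el) => {
      const have = new Set(splitOnSpaces(el.className).map(norm));
      return wanted.every((t) => have.has(t));
    })
    .map((el) => el.id);
}

// The three paragraphs from the fragment above.
const paragraphs = [
  { id: "p1", className: "aaa bbb" },
  { id: "p2", className: "aaa ccc" },
  { id: "p3", className: "bbb ccc" },
];
```

With this data, idsByClassName(paragraphs, "aaa") gives p1 and p2, both "ccc bbb" and "bbb  ccc " give only p3, and "aaa,bbb" gives no matches, which agrees with the results described above.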


The HTMLDocument interface supports named properties. The names of the supported named properties at any moment consist of the values of the name content attributes of all the applet, embed, form, iframe, img, and fallback-free object elements in the Document that have name content attributes, and the values of the id content attributes of all the applet and fallback-free object elements in the Document that have id content attributes, and the values of the id content attributes of all the img elements in the Document that have both name content attributes and id content attributes.

When the HTMLDocument object is indexed for property retrieval using a name name, then the user agent must return the value obtained using the following steps:

  1. Let elements be the list of named elements with the name name in the Document.

    There will be at least one such element, by definition.

  2. If elements has only one element, and that element is an iframe element, then return the WindowProxy object of the nested browsing context represented by that iframe element, and abort these steps.

  3. Otherwise, if elements has only one element, return that element and abort these steps.

  4. Otherwise return an HTMLCollection rooted at the Document node, whose filter matches only named elements with the name name.

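Named elements with the name name, for the purposes of the above algorithm, are those that are either applet, embed, form, iframe, img, or fallback-free object elements that have a name content attribute whose value is name; or applet or fallback-free object elements that have an id content attribute whose value is name; or img elements that have an id content attribute whose value is name and that also have a name content attribute.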

An object element is said to be fallback-free if it has no object or embed descendants.
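
The named property lookup can be sketched over plain records, with membership following the list of supported property names given earlier in this section. The record shape { tag, name, id, fallbackFree }, the function names, and the stand-ins for the WindowProxy and HTMLCollection results are all hypothetical. Returning undefined for an unsupported name is an assumption of the sketch, since the algorithm is only invoked for supported names.

```javascript
// Elements whose name content attribute contributes a property name.
const NAME_TAGS = ["applet", "embed", "form", "iframe", "img", "object"];

// el.name and el.id are null when the content attribute is absent;
// el.fallbackFree is only consulted for object elements.
function isNamedElement(el, name) {
  if (el.tag === "object" && !el.fallbackFree) return false;
  if (NAME_TAGS.includes(el.tag) && el.name === name) return true;
  if ((el.tag === "applet" || el.tag === "object") && el.id === name) return true;
  // img elements match on id only if they also have a name attribute.
  return el.tag === "img" && el.id === name && el.name !== null;
}

function namedProperty(elements, name) {
  const matches = elements.filter((el) => isNamedElement(el, name));
  if (matches.length === 0) return undefined; // not a supported name
  if (matches.length === 1 && matches[0].tag === "iframe") {
    return { windowProxyOf: matches[0] };     // stand-in for the WindowProxy
  }
  if (matches.length === 1) return matches[0];
  return matches;                             // stand-in for an HTMLCollection
}
```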


The dir attribute on the HTMLDocument interface is defined along with the dir content attribute.

3.3 Elements

3.3.1 Semantics

Elements, attributes, and attribute values in HTML are defined (by this specification) to have certain meanings (semantics). For example, the ol element represents an ordered list, and the lang attribute represents the language of the content.

Authors must not use elements, attributes, and attribute values for purposes other than their appropriate intended semantic purpose. Authors must not use elements, attributes, and attribute values that are not permitted by this specification or other applicable specifications.

For example, the following document is non-conforming, despite being syntactically correct:

<!DOCTYPE html>
<html lang="en-GB">
 <head> <title> Demonstration </title> </head>
 <body>
  <table>
   <tr> <td> My favourite animal is the cat. </td> </tr>
   <tr>
    <td>
     —<a href="http://example.org/~ernest/"><cite>Ernest</cite></a>,
     in an essay from 1992
    </td>
   </tr>
  </table>
 </body>
</html>

...because the data placed in the cells is clearly not tabular data (and the cite element is misused). A corrected version of this document might be:

<!DOCTYPE html>
<html lang="en-GB">
 <head> <title> Demonstration </title> </head>
 <body>
  <blockquote>
   <p> My favourite animal is the cat. </p>
  </blockquote>
  <p>
   —<a href="http://example.org/~ernest/">Ernest</a>,
   in an essay from 1992
  </p>
 </body>
</html>

This next document fragment, intended to represent the heading of a corporate site, is similarly non-conforming because the second line is not intended to be a heading of a subsection, but merely a subheading or subtitle (a subordinate heading for the same section).

<body>
 <h1>ABC Company</h1>
 <h2>Leading the way in widget design since 1432</h2>
 ...

The hgroup element should be used in these kinds of situations:

<body>
 <hgroup>
  <h1>ABC Company</h1>
  <h2>Leading the way in widget design since 1432</h2>
 </hgroup>
 ...

In the next example, there is a non-conforming attribute value ("carpet") and a non-conforming attribute ("texture"), which is not permitted by this specification:

<label>Carpet: <input type="carpet" name="c" texture="deep pile"></label>

An alternative, correct way to mark this up would be:

<label>Carpet: <input type="text" class="carpet" name="c" data-texture="deep pile"></label>

Through scripting and using other mechanisms, the values of attributes, text, and indeed the entire structure of the document may change dynamically while a user agent is processing it. The semantics of a document at an instant in time are those represented by the state of the document at that instant in time, and the semantics of a document can therefore change over time. User agents must update their presentation of the document as this occurs.

HTML has a progress element that describes a progress bar. If its "value" attribute is dynamically updated by a script, the UA would update the rendering to show the progress changing.

3.3.2 Elements in the DOM

The nodes representing HTML elements in the DOM must implement, and expose to scripts, the interfaces listed for them in the relevant sections of this specification. This includes HTML elements in XML documents, even when those documents are in another context (e.g. inside an XSLT transform).

Elements in the DOM represent things; that is, they have intrinsic meaning, also known as semantics.

For example, an ol element represents an ordered list.

The basic interface, from which all the HTML elements' interfaces inherit, and which must be used by elements that have no additional requirements, is the HTMLElement interface.

interface HTMLElement : Element {
  // DOM tree accessors
  NodeList getElementsByClassName(in DOMString classNames);

  // dynamic markup insertion
           attribute DOMString innerHTML;
           attribute DOMString outerHTML;
  void insertAdjacentHTML(in DOMString position, in DOMString text);

  // metadata attributes
           attribute DOMString id;
           attribute DOMString title;
           attribute DOMString lang;
           attribute DOMString dir;
           attribute DOMString className;
  readonly attribute DOMTokenList classList;
  readonly attribute DOMStringMap dataset;

  // microdata
  [PutForwards=value] readonly attribute DOMSettableTokenList item;
  [PutForwards=value] readonly attribute DOMSettableTokenList itemprop;
  readonly attribute HTMLPropertyCollection properties;
           attribute DOMString content;
           attribute HTMLElement subject;

  // user interaction
           attribute boolean hidden;
  void click();
  void scrollIntoView();
  void scrollIntoView(in boolean top);
           attribute long tabIndex;
  void focus();
  void blur();
           attribute DOMString accessKey;
  readonly attribute DOMString accessKeyLabel;
           attribute boolean draggable;
           attribute DOMString contentEditable;
  readonly attribute boolean isContentEditable;
           attribute HTMLMenuElement contextMenu;
           attribute DOMString spellcheck;

  // command API
  readonly attribute DOMString commandType;
  readonly attribute DOMString label;
  readonly attribute DOMString icon;
  readonly attribute boolean disabled;
  readonly attribute boolean checked;

  // styling
  readonly attribute CSSStyleDeclaration style;

  // event handler DOM attributes
           attribute Function onabort;
           attribute Function onblur;
           attribute Function oncanplay;
           attribute Function oncanplaythrough;
           attribute Function onchange;
           attribute Function onclick;
           attribute Function oncontextmenu;
           attribute Function ondblclick;
           attribute Function ondrag;
           attribute Function ondragend;
           attribute Function ondragenter;
           attribute Function ondragleave;
           attribute Function ondragover;
           attribute Function ondragstart;
           attribute Function ondrop;
           attribute Function ondurationchange;
           attribute Function onemptied;
           attribute Function onended;
           attribute Function onerror;
           attribute Function onfocus;
           attribute Function onformchange;
           attribute Function onforminput;
           attribute Function oninput;
           attribute Function oninvalid;
           attribute Function onkeydown;
           attribute Function onkeypress;
           attribute Function onkeyup;
           attribute Function onload;
           attribute Function onloadeddata;
           attribute Function onloadedmetadata;
           attribute Function onloadstart;
           attribute Function onmousedown;
           attribute Function onmousemove;
           attribute Function onmouseout;
           attribute Function onmouseover;
           attribute Function onmouseup;
           attribute Function onmousewheel;
           attribute Function onpause;
           attribute Function onplay;
           attribute Function onplaying;
           attribute Function onprogress;
           attribute Function onratechange;
           attribute Function onreadystatechange;
           attribute Function onscroll;
           attribute Function onseeked;
           attribute Function onseeking;
           attribute Function onselect;
           attribute Function onshow;
           attribute Function onstalled;
           attribute Function onsubmit;
           attribute Function onsuspend;
           attribute Function ontimeupdate;
           attribute Function onvolumechange;
           attribute Function onwaiting;
};

interface HTMLUnknownElement : HTMLElement { };

The HTMLElement interface holds methods and attributes related to a number of disparate features, and the members of this interface are therefore described in various different sections of this specification.

The HTMLUnknownElement interface must be used for HTML elements that are not defined by this specification.

3.3.3 Global attributes

The following attributes are common to and may be specified on all HTML elements (even those not defined in this specification):

In addition, unless otherwise specified, the following event handler content attributes may be specified on any HTML element:

The attributes marked with an asterisk have a different meaning when specified on body elements as those elements expose event handler attributes of the Window object with the same names.


Also, custom data attributes (e.g. data-foldername or data-msgid) can be specified on any HTML element, to store custom data specific to the page.

In HTML documents, elements in the HTML namespace may have an xmlns attribute specified, if, and only if, it has the exact value "http://www.w3.org/1999/xhtml". This does not apply to XML documents.

In HTML, the xmlns attribute has absolutely no effect. It is basically a talisman. It is allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in no namespace, not the "http://www.w3.org/2000/xmlns/" namespace like namespace declaration attributes in XML do.

In XML, an xmlns attribute is part of the namespace declaration mechanism, and an element cannot actually have an xmlns attribute in no namespace specified.

3.3.3.1 The id attribute

The id attribute represents its element's unique identifier. The value must be unique in the element's home subtree and must contain at least one character. The value must not contain any space characters.
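As a rough illustration of the authoring requirements above, the following sketch checks a candidate id value. The names isConformingId and usedIds are hypothetical, and the set of space characters is assumed to be space, tab, LF, FF, and CR.

```javascript
// Space characters as assumed here: U+0020, U+0009, U+000A, U+000C, U+000D.
const ID_SPACE_CHARACTERS = /[ \t\n\f\r]/;

// Hypothetical helper: returns true if `value` could be used as an id,
// given the set of IDs already present in the same home subtree.
function isConformingId(value, usedIds) {
  if (value.length === 0) return false;              // at least one character
  if (ID_SPACE_CHARACTERS.test(value)) return false; // no space characters
  return !usedIds.has(value);                        // unique in the subtree
}

const usedIds = new Set(["header", "nav"]);
console.log(isConformingId("main", usedIds));
console.log(isConformingId("main menu", usedIds));
console.log(isConformingId("nav", usedIds));
```

This models only the conformance check for authors; it says nothing about how a user agent matches IDs.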

An element's unique identifier can be used for a variety of purposes, most notably as a way to link to specific parts of a document using fragment identifiers, as a way to target an element when scripting, and as a way to style a specific element from CSS.

If the value is not the empty string, user agents must associate the element with the given value (exactly, including any space characters) for the purposes of ID matching within the element's home subtree (e.g. for selectors in CSS or for the getElementById() method in the DOM).

Identifiers are opaque strings. Particular meanings should not be derived from the value of the id attribute.

This specification doesn't preclude an element having multiple IDs, if other mechanisms (e.g. DOM Core methods) can set an element's ID in a way that doesn't conflict with the id attribute.

The id DOM attribute must reflect the id content attribute.

3.3.3.2 The title attribute

The title attribute represents advisory information for the element, such as would be appropriate for a tooltip. On a link, this could be the title or a description of the target resource; on an image, it could be the image credit or a description of the image; on a paragraph, it could be a footnote or commentary on the text; on a citation, it could be further information about the source; and so forth. The value is text.

If this attribute is omitted from an element, then it implies that the title attribute of the nearest ancestor HTML element with a title attribute set is also relevant to this element. Setting the attribute overrides this, explicitly stating that the advisory information of any ancestors is not relevant to this element. Setting the attribute to the empty string indicates that the element has no advisory information.
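The inheritance rule above can be sketched as a walk up the ancestor chain. This is a minimal model, assuming every node is an HTML element represented by a plain object with an optional title property and an optional parent; the function name advisoryInformation is hypothetical.

```javascript
// Each element is a plain object: { title?: string, parent?: element }.
// A missing `title` property stands for an omitted attribute.
function advisoryInformation(element) {
  for (let e = element; e; e = e.parent) {
    if (typeof e.title === "string") {
      // An empty string explicitly means "no advisory information".
      return e.title === "" ? null : e.title;
    }
  }
  return null; // no element in the chain has a title attribute
}

const section = { title: "Draft chapter" };
const paragraph = { parent: section };          // inherits from the section
const note = { title: "", parent: section };    // explicitly has none
console.log(advisoryInformation(paragraph));
console.log(advisoryInformation(note));
```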

If the title attribute's value contains U+000A LINE FEED (LF) characters, the content is split into multiple lines. Each U+000A LINE FEED (LF) character represents a line break.

Caution is advised with respect to the use of newlines in title attributes.

For instance, the following snippet actually defines an abbreviation's expansion with a line break in it:

<p>My logs show that there was some interest in <abbr title="Hypertext
Transfer Protocol">HTTP</abbr> today.</p>

Some elements, such as link, abbr, and input, define additional semantics for the title attribute beyond the semantics described above.


The title DOM attribute must reflect the title content attribute.

3.3.3.3 The lang and xml:lang attributes

The lang attribute (in no namespace) specifies the primary language for the element's contents and for any of the element's attributes that contain text. Its value must be a valid BCP 47 language code, or the empty string. [BCP47]

The lang attribute in the XML namespace is defined in XML. [XML]

If these attributes are omitted from an element, then the language of this element is the same as the language of its parent element, if any. Setting the attribute to the empty string indicates that the primary language is unknown.

The lang attribute in no namespace may be used on any HTML element.

The lang attribute in the XML namespace may be used on HTML elements in XML documents, as well as elements in other namespaces if the relevant specifications allow it (in particular, MathML and SVG allow lang attributes in the XML namespace to be specified on their elements). If both the lang attribute in no namespace and the lang attribute in the XML namespace are specified on the same element, they must have exactly the same value when compared in an ASCII case-insensitive manner.

Authors must not use the lang attribute in the XML namespace in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner.
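The ASCII case-insensitive comparison required above could be sketched as follows. The helper names are hypothetical. Only the letters A to Z are folded, so characters outside ASCII that happen to lowercase to ASCII letters do not match.

```javascript
// Lowercase only U+0041 .. U+005A, leaving every other character untouched.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Hypothetical check that a lang value and an xml:lang value agree.
function langAttributesAgree(lang, xmlLang) {
  return asciiLowercase(lang) === asciiLowercase(xmlLang);
}

console.log(langAttributesAgree("en-GB", "en-gb"));
console.log(langAttributesAgree("en", "fr"));
```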


To determine the language of a node, user agents must look at the nearest ancestor element (including the element itself if the node is an element) that either has a lang attribute in the XML namespace set, or is an HTML element and has a lang attribute in no namespace set. That attribute specifies the language of the node.

If both the lang attribute in no namespace and the lang attribute in the XML namespace are set on an element, user agents must use the lang attribute in the XML namespace, and the lang attribute in no namespace must be ignored for the purposes of determining the element's language.

If no explicit language is given for the root element, but there is a document-wide default language set, then that is the language of the node.

If there is no document-wide default language, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language. In the absence of any language information, the default value is unknown (the empty string).

If the resulting value is not a recognized language code, then it must be treated as an unknown language (as if the value was the empty string).
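The lookup described in the preceding paragraphs can be sketched as follows. This is an illustrative model only: nodes are plain objects, the property names (isHTML, lang, xmlLang, parent) and the options (documentDefault, protocolLanguage, isRecognized) are assumptions made for the sketch, and the test for a recognized language code is left to a caller-supplied predicate.

```javascript
// node: { isHTML: boolean, lang?: string, xmlLang?: string, parent?: node }
// A missing property stands for an attribute that is not set.
function languageOfNode(node, options) {
  const { documentDefault, protocolLanguage, isRecognized } = options;
  let value;
  for (let n = node; n; n = n.parent) {
    // The lang attribute in the XML namespace wins when both are set.
    if (typeof n.xmlLang === "string") { value = n.xmlLang; break; }
    if (n.isHTML && typeof n.lang === "string") { value = n.lang; break; }
  }
  if (value === undefined) value = documentDefault;   // document-wide default
  if (value === undefined) value = protocolLanguage;  // e.g. from HTTP
  if (value === undefined) value = "";                // unknown
  return isRecognized(value) ? value : "";
}

// Hypothetical predicate standing in for a real language-code check.
const isRecognized = (v) => ["en", "en-GB", "fr"].includes(v);
const html = { isHTML: true, lang: "en" };
const p = { isHTML: true, parent: html };
console.log(languageOfNode(p, { isRecognized }));
```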


User agents may use the element's language to determine proper processing or rendering (e.g. in the selection of appropriate fonts or pronunciations, or for dictionary selection).


The lang DOM attribute must reflect the lang content attribute in no namespace.

3.3.3.4 The xml:base attribute (XML only)

The xml:base attribute is defined in XML Base. [XMLBASE]

The xml:base attribute may be used on elements of XML documents. Authors must not use the xml:base attribute in HTML documents.

3.3.3.5 The dir attribute

The dir attribute specifies the element's text directionality. The attribute is an enumerated attribute with the keyword ltr mapping to the state ltr, and the keyword rtl mapping to the state rtl. The attribute has no defaults.

The processing of this attribute is primarily performed by the presentation layer. For example, the rendering section in this specification defines a mapping from this attribute to the CSS 'direction' and 'unicode-bidi' properties, and CSS defines rendering in terms of those properties.

The directionality of an element, which is used in particular by the canvas element's text rendering API, is either 'ltr' or 'rtl'. If the user agent supports CSS and the 'direction' property on this element has a computed value of either 'ltr' or 'rtl', then that is the directionality of the element. Otherwise, if the element is being rendered, then the directionality of the element is the directionality used by the presentation layer, potentially determined from the value of the dir attribute on the element. Otherwise, if the element's dir attribute has the state ltr, the element's directionality is 'ltr' (left-to-right); if the attribute has the state rtl, the element's directionality is 'rtl' (right-to-left); and otherwise, the element's directionality is the same as its parent element, or 'ltr' if there is no parent element.
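The final fallback in the paragraph above (no CSS support, element not being rendered) can be sketched as a recursive lookup. The element model and the function name fallbackDirectionality are hypothetical, and the sketch assumes that the dir keywords are matched case-insensitively.

```javascript
// element: { dir?: string, parent?: element }
function fallbackDirectionality(element) {
  const keyword = typeof element.dir === "string" ? element.dir.toLowerCase() : "";
  if (keyword === "ltr" || keyword === "rtl") return keyword;
  // No usable dir state: inherit from the parent, or default to 'ltr'.
  return element.parent ? fallbackDirectionality(element.parent) : "ltr";
}

const body = { dir: "rtl" };
const paragraph = { parent: body };
console.log(fallbackDirectionality(paragraph));
```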


document . dir [ = value ]

Returns the html element's dir attribute's value, if any.

Can be set, to either "ltr" or "rtl", to replace the html element's dir attribute's value.

If there is no html element, returns the empty string and ignores new values.

The dir DOM attribute on an element must reflect the dir content attribute of that element, limited to only known values.

The dir DOM attribute on HTMLDocument objects must reflect the dir content attribute of the html element, if any, limited to only known values. If there is no such element, then the attribute must return the empty string and do nothing on setting.

Authors are strongly encouraged to use the dir attribute to indicate text direction rather than using CSS, since that way their documents will continue to render correctly even in the absence of CSS (e.g. as interpreted by search engines).

3.3.3.6 The class attribute

Every HTML element may have a class attribute specified.

The attribute, if specified, must have a value that is an unordered set of unique space-separated tokens representing the various classes that the element belongs to.

The classes that an HTML element has assigned to it consist of all the classes returned when the value of the class attribute is split on spaces.
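Splitting a class attribute value on spaces might look like the following sketch. The function name classesOf is hypothetical, and the set of space characters is assumed to be space, tab, LF, FF, and CR.

```javascript
// Split on runs of space characters and drop the empty strings that
// leading or trailing whitespace would otherwise produce.
function classesOf(classAttributeValue) {
  return classAttributeValue.split(/[ \t\n\f\r]+/).filter((token) => token !== "");
}

console.log(classesOf("  note warning\tfeatured "));
```

The sketch keeps duplicate tokens as they appear; a caller that wants set semantics could wrap the result in a Set.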

Assigning classes to an element affects class matching in selectors in CSS, the getElementsByClassName() method in the DOM, and other such features.

Authors may use any value in the class attribute, but are encouraged to use the values that describe the nature of the content, rather than values that describe the desired presentation of the content.


The className and classList DOM attributes must both reflect the class content attribute.

3.3.3.7 The style attribute

All HTML elements may have the style content attribute set. If specified, the attribute must contain only a list of zero or more semicolon-separated (;) CSS declarations. [CSS]

In user agents that support CSS, the attribute's value must be parsed when the attribute is added or has its value changed, with its value treated as the body (the part inside the curly brackets) of a declaration block in a rule whose selector matches just the element on which the attribute is set. All URLs in the value must be resolved relative to the element when the attribute is parsed. For the purposes of the CSS cascade, the attribute must be considered to be a 'style' attribute at the author level.

Documents that use style attributes on any of their elements must still be comprehensible and usable if those attributes were removed.

In particular, using the style attribute to hide and show content, or to convey meaning that is otherwise not included in the document, is non-conforming. (To hide and show content, use the hidden attribute.)


element . style

Returns a CSSStyleDeclaration object for the element's style attribute.

The style DOM attribute must return a CSSStyleDeclaration whose value represents the declarations specified in the attribute, if present. Mutating the CSSStyleDeclaration object must create a style attribute on the element (if there isn't one already) and then change its value to be a value representing the serialized form of the CSSStyleDeclaration object. [CSSOM]

In the following example, the words that refer to colors are marked up using the span element and the style attribute to make those words show up in the relevant colors in visual media.

<p>My sweat suit is <span style="color: green; background:
transparent">green</span> and my eyes are <span style="color: blue;
background: transparent">blue</span>.</p>

3.3.3.8 Embedding custom non-visible data

A custom data attribute is an attribute in no namespace whose name starts with the string "data-", has at least one character after the hyphen, is XML-compatible, and contains no characters in the range U+0041 .. U+005A (LATIN CAPITAL LETTER A .. LATIN CAPITAL LETTER Z).

All attributes in HTML documents get lowercased automatically, so the restriction on uppercase letters doesn't affect such documents.
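The naming rule above can be approximated as follows. This is only a sketch: the function name is hypothetical, and the XML-compatibility check is a simplified ASCII-only stand-in for the XML Name production, so it rejects some names that are in fact XML-compatible.

```javascript
// Simplified stand-in for "XML-compatible": ASCII letters, digits,
// '.', '-' and '_' only, and no colon. The real production allows more.
const SIMPLE_XML_NAME = /^[A-Za-z_][A-Za-z0-9._-]*$/;

function isCustomDataAttributeName(name) {
  return (
    name.startsWith("data-") &&
    name.length > "data-".length &&  // at least one character after the hyphen
    SIMPLE_XML_NAME.test(name) &&
    !/[A-Z]/.test(name)              // nothing in U+0041 .. U+005A
  );
}

console.log(isCustomDataAttributeName("data-length"));
console.log(isCustomDataAttributeName("data-Length"));
```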

Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements.

These attributes are not intended for use by software that is independent of the site that uses the attributes.

For instance, a site about music could annotate list items representing tracks in an album with custom data attributes containing the length of each track. This information could then be used by the site itself to allow the user to sort the list by track length, or to filter the list for tracks of certain lengths.

<ol>
 <li data-length="2m11s">Beyond The Sea</li>
 ...
</ol>

It would be inappropriate, however, for the user to use generic software not associated with that music site to search for tracks of a certain length by looking at this data.

This is because these attributes are intended for use by the site's own scripts, and are not a generic extension mechanism for publicly-usable metadata.

Every HTML element may have any number of custom data attributes specified, with any value.


element . dataset

Returns a DOMStringMap object for the element's data-* attributes.

The dataset DOM attribute provides convenient accessors for all the data-* attributes on an element. On getting, the dataset DOM attribute must return a DOMStringMap object, associated with the following algorithms, which expose these attributes on their element:

The algorithm for getting the list of name-value pairs
  1. Let list be an empty list of name-value pairs.
  2. For each content attribute on the element whose first five characters are the string "data-", add a name-value pair to list whose name is the attribute's name with the first five characters removed and whose value is the attribute's value.
  3. Return list.
The algorithm for setting names to certain values
  1. Let name be the concatenation of the string data- and the name passed to the algorithm.
  2. Let value be the value passed to the algorithm.
  3. Set the value of the attribute with the name name, to the value value, replacing any previous value if the attribute already existed. If setAttribute() would have raised an exception when setting an attribute with the name name, then this must raise the same exception.
The algorithm for deleting names
  1. Let name be the concatenation of the string data- and the name passed to the algorithm.
  2. Remove the attribute with the name name, if such an attribute exists. Do nothing otherwise.
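
The three algorithms above can be sketched over a plain Map that stands in for an element's content attributes. The names makeDataset, pairs, set, and delete are illustrative and are not the DOMStringMap interface itself; the exception behavior inherited from setAttribute() is not modeled.

```javascript
// `attributes` is a Map of attribute name -> value for one hypothetical element.
function makeDataset(attributes) {
  return {
    // Getting the list of name-value pairs.
    pairs() {
      const list = [];
      for (const [name, value] of attributes) {
        if (name.startsWith("data-")) list.push([name.slice(5), value]);
      }
      return list;
    },
    // Setting a name to a value, replacing any previous value.
    set(name, value) {
      attributes.set("data-" + name, String(value));
    },
    // Deleting a name; does nothing if the attribute does not exist.
    delete(name) {
      attributes.delete("data-" + name);
    },
  };
}

const attributes = new Map([["class", "spaceship"], ["data-id", "92432"]]);
const dataset = makeDataset(attributes);
dataset.set("shields", "50%");
console.log(dataset.pairs());
```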

If a Web page wanted an element to represent a space ship, e.g. as part of a game, it would have to use the class attribute along with data-* attributes:

<div class="spaceship" data-id="92432"
     data-weapons="laser 2" data-shields="50%"
     data-x="30" data-y="10" data-z="90">
 <button class="fire"
         onclick="spaceships[this.parentNode.dataset.id].fire()">
  Fire
 </button>
</div>

Authors should carefully design such extensions so that when the attributes are ignored and any associated CSS dropped, the page is still usable.

User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values.

3.3.4 Element definitions

Each element in this specification has a definition that includes the following information:

Categories
A list of categories to which the element belongs. These are used when defining the content models for each element.
Contexts in which this element may be used
A non-normative description of where the element can be used. This information is redundant with the content models of elements that allow this one as a child, and is provided only as a convenience.
Content model
A normative description of what content must be included as children and descendants of the element.
Content attributes
A normative list of attributes that may be specified on the element.
DOM interface
A normative definition of a DOM interface that such elements must implement.

This is then followed by a description of what the element represents, along with any additional normative conformance criteria that may apply to authors and implementations. Examples are sometimes also included.

3.3.5 Content models

All the elements in this specification have a defined content model, which describes what nodes are allowed inside the elements, and thus what the structure of an HTML document or fragment must look like.

As noted in the conformance and terminology sections, for the purposes of determining if an element matches its content model or not, CDATASection nodes in the DOM are treated as equivalent to Text nodes, and entity reference nodes are treated as if they were expanded in place.

The space characters are always allowed between elements. User agents represent these characters between elements in the source markup as text nodes in the DOM. Empty text nodes and text nodes consisting of just sequences of those characters are considered inter-element whitespace.

Inter-element whitespace, comment nodes, and processing instruction nodes must be ignored when establishing whether an element matches its content model or not, and must be ignored when following algorithms that define document and element semantics.
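A sketch of the filtering described above follows. Child nodes are modeled as plain objects with a hypothetical type property ("element", "text", "comment", or "pi"), and the set of space characters is assumed to be space, tab, LF, FF, and CR.

```javascript
// True for empty text and for text consisting only of space characters.
function isInterElementWhitespace(textData) {
  return /^[ \t\n\f\r]*$/.test(textData);
}

// Returns the children that count when checking a content model.
function significantChildren(children) {
  return children.filter((node) => {
    if (node.type === "comment" || node.type === "pi") return false;
    if (node.type === "text") return !isInterElementWhitespace(node.data);
    return true;
  });
}

const children = [
  { type: "text", data: "\n  " },
  { type: "element", name: "li" },
  { type: "comment", data: " separator " },
  { type: "element", name: "li" },
];
console.log(significantChildren(children).length);
```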

An element A is said to be preceded or followed by a second element B if A and B have the same parent node and there are no other element nodes or text nodes (other than inter-element whitespace) between them.

Authors must not use elements in the HTML namespace anywhere except where they are explicitly allowed, as defined for each element, or as explicitly required by other specifications. For XML compound documents, these contexts could be inside elements from other namespaces, if those elements are defined as providing the relevant contexts.

The Atom specification defines the Atom content element, when its type attribute has the value xhtml, as requiring that it contains a single HTML div element. Thus, a div element is allowed in that context, even though this is not explicitly normatively stated by this specification. [ATOM]

In addition, elements in the HTML namespace may be orphan nodes (i.e. without a parent node).

For example, creating a td element and storing it in a global variable in a script is conforming, even though td elements are otherwise only supposed to be used inside tr elements.

var data = {
  name: "Banana",
  cell: document.createElement('td'),
};

3.3.5.1 Kinds of content

Each element in HTML falls into zero or more categories that group elements with similar characteristics together. The following broad categories are used in this specification:

Some elements also fall into other categories, which are defined in other parts of this specification.

These categories are related as follows:

Sectioning content, heading content, phrasing content, and embedded content are all types of flow content. Embedded content is also a type of phrasing content.

In addition, certain elements are categorized as form-associated elements and further subcategorized to define their role in various form-related processing models.

Some elements have unique requirements and do not fit into any particular category.

3.3.5.1.1 Metadata content

Metadata content is content that sets up the presentation or behavior of the rest of the content, or that sets up the relationship of the document with other documents, or that conveys other "out of band" information.

Elements from other namespaces whose semantics are primarily metadata-related (e.g. RDF) are also metadata content.

Thus, in the XML serialization, one can use RDF, like this:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <head>
  <title>Hedral's Home Page</title>
  <r:RDF>
   <Person xmlns="http://www.w3.org/2000/10/swap/pim/contact#"
           r:about="http://hedral.example.com/#">
    <fullName>Cat Hedral</fullName>
    <mailbox r:resource="mailto:hedral@damowmow.com"/>
    <personalTitle>Sir</personalTitle>
   </Person>
  </r:RDF>
 </head>
 <body>
  <h1>My home page</h1>
  <p>I like playing with string, I guess. Sister says squirrels are fun
  too so sometimes I follow her to play with them.</p>
 </body>
</html>

This isn't possible in the HTML serialization, however.

3.3.5.1.2 Flow content

Most elements that are used in the body of documents and applications are categorized as flow content.

As a general rule, elements whose content model allows any flow content should have either at least one descendant text node that is not inter-element whitespace, or at least one descendant element node that is embedded content. For the purposes of this requirement, del elements and their descendants must not be counted as contributing to the ancestors of the del element.

This requirement is not a hard requirement, however, as there are many cases where an element can be empty legitimately, for example when it is used as a placeholder which will later be filled in by a script, or when the element is part of a template and would on most pages be filled in but on some pages is not relevant.

3.3.5.1.3 Sectioning content

Sectioning content is content that defines the scope of headings and footers.

Each sectioning content element potentially has a heading and an outline. See the section on headings and sections for further details.

There are also certain elements that are sectioning roots. These are distinct from sectioning content, but they can also have an outline.

3.3.5.1.4 Heading content

Heading content defines the header of a section (whether explicitly marked up using sectioning content elements, or implied by the heading content itself).

3.3.5.1.5 Phrasing content

Phrasing content is the text of the document, as well as elements that mark up that text at the intra-paragraph level. Runs of phrasing content form paragraphs.

As a general rule, elements whose content model allows any phrasing content should have either at least one descendant text node that is not inter-element whitespace, or at least one descendant element node that is embedded content. For the purposes of this requirement, nodes that are descendants of del elements must not be counted as contributing to the ancestors of the del element.

Most elements that are categorized as phrasing content can only contain elements that are themselves categorized as phrasing content, not any flow content.

Text, in the context of content models, means text nodes. Text is sometimes used as a content model on its own, but is also phrasing content, and can be inter-element whitespace (if the text nodes are empty or contain just space characters).

3.3.5.1.6 Embedded content

Embedded content is content that imports another resource into the document, or content from another vocabulary that is inserted into the document.

Elements that are from namespaces other than the HTML namespace and that convey content but not metadata, are embedded content for the purposes of the content models defined in this specification. (For example, MathML, or SVG.)

Some embedded content elements can have fallback content: content that is to be used when the external resource cannot be used (e.g. because it is of an unsupported format). The element definitions state what the fallback is, if any.

3.3.5.1.7 Interactive content

Interactive content is content that is specifically intended for user interaction.

Certain elements in HTML have an activation behavior, which means that the user can activate them. This triggers a sequence of events dependent on the activation mechanism, and normally culminating in a click event followed by a DOMActivate event, as described below.

The user agent should allow the user to manually trigger elements that have an activation behavior, for instance using keyboard or voice input, or through mouse clicks. When the user triggers an element with a defined activation behavior in a manner other than clicking it, the default action of the interaction event must be to run synthetic click activation steps on the element.

When a user agent is to run synthetic click activation steps on an element, the user agent must run pre-click activation steps on the element, then fire a click event at the element. The default action of this click event must be to run post-click activation steps on the element. If the event is canceled, the user agent must run canceled activation steps on the element instead.

Given an element target, the nearest activatable element is the element returned by the following algorithm:

  1. If target has a defined activation behavior, then return target and abort these steps.

  2. If target has a parent element, then set target to that parent element and return to the first step.

  3. Otherwise, there is no nearest activatable element.
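
The algorithm above amounts to a walk up the parent chain. In this sketch, elements are plain objects with a hypothetical hasActivationBehavior flag and an optional parent, and null stands for "no nearest activatable element".

```javascript
// element: { hasActivationBehavior?: boolean, parent?: element }
function nearestActivatableElement(target) {
  for (let element = target; element; element = element.parent) {
    if (element.hasActivationBehavior) return element;
  }
  return null; // there is no nearest activatable element
}

const link = { name: "a", hasActivationBehavior: true };
const em = { name: "em", parent: link };
console.log(nearestActivatableElement(em).name);
```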

When a pointing device is clicked, the user agent must run these steps:

  1. Let e be the nearest activatable element of the element designated by the user, if any.

  2. If there is an element e, run pre-click activation steps on it.

  3. Dispatch the required click event.

    If there is an element e, then the default action of the click event must be to run post-click activation steps on element e.

    If there is an element e but the event is canceled, the user agent must run canceled activation steps on element e.

The above doesn't happen for arbitrary synthetic events dispatched by author script. However, the click() method can be used to make it happen programmatically.

When a user agent is to run pre-click activation steps on an element, it must run the pre-click activation steps defined for that element, if any.

When a user agent is to run post-click activation steps on an element, the user agent must fire a simple event called DOMActivate that is cancelable at that element. The default action of this event must be to run final activation steps on that element. If the event is canceled, the user agent must run canceled activation steps on the element instead.

When a user agent is to run canceled activation steps on an element, it must run the canceled activation steps defined for that element, if any.

When a user agent is to run final activation steps on an element, it must run the activation behavior defined for that element. Activation behaviors can refer to the click and DOMActivate events that were fired by the steps above leading up to this point.
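The ordering of the steps defined above can be traced with a small model. This is a sketch under stated assumptions: elements are plain objects, event dispatch is reduced to calling a single optional listener per event type, cancellation is a flag on the event object, and every name in the code is hypothetical.

```javascript
// Fire a cancelable event; returns false if a listener canceled it.
function fire(element, type, log) {
  log.push(type);
  const event = { type, canceled: false };
  const listener = (element.listeners || {})[type];
  if (listener) listener(event);
  return !event.canceled;
}

function runCanceledActivationSteps(element, log) {
  log.push("canceled activation steps");
  if (element.canceledSteps) element.canceledSteps();
}

function runSyntheticClickActivationSteps(element, log) {
  log.push("pre-click activation steps");
  if (element.preClickSteps) element.preClickSteps();
  if (!fire(element, "click", log)) {
    runCanceledActivationSteps(element, log);
    return;
  }
  // Post-click activation steps: fire DOMActivate, then the final steps.
  if (!fire(element, "DOMActivate", log)) {
    runCanceledActivationSteps(element, log);
    return;
  }
  log.push("activation behavior");
  if (element.activationBehavior) element.activationBehavior();
}

const log = [];
runSyntheticClickActivationSteps({ listeners: {} }, log);
console.log(log.join(" > "));
```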

3.3.5.2 Transparent content models

Some elements are described as transparent; they have "transparent" in the description of their content model.

When a content model includes a part that is "transparent", that part must not contain content that would not be conformant if all transparent elements in the tree were replaced, in their parent element, by the children in the "transparent" part of their content model, retaining order.

When a transparent element has no parent, then the part of its content model that is "transparent" must instead be treated as accepting any flow content.

3.3.5.3 Paragraphs

The term paragraph as defined in this section is distinct from (though related to) the p element defined later. The paragraph concept defined here is used to describe how to interpret documents.

A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.

In the following example, there are two paragraphs in a section. There is also a heading, which contains phrasing content that is not a paragraph. Note how the comments and inter-element whitespace do not form paragraphs.

<section>
  <h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in this example.
  <p>This is the second.</p>
  <!-- This is not a paragraph. -->
</section>

Paragraphs in flow content are defined relative to what the document looks like without the a, ins, del, and map elements complicating matters, since those elements, with their hybrid content models, can straddle paragraph boundaries, as shown in the first two examples below.

Generally, having elements straddle paragraph boundaries is best avoided. Maintaining such markup can be difficult.

The following example takes the markup from the earlier example and puts ins and del elements around some of the markup to show that the text was changed (though in this case, the changes admittedly don't make much sense). Notice how this example has exactly the same paragraphs as the previous one, despite the ins and del elements — the ins element straddles the heading and the first paragraph, and the del element straddles the boundary between the two paragraphs.

<section>
  <ins><h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in</ins> this example<del>.
  <p>This is the second.</p></del>
  <!-- This is not a paragraph. -->
</section>

Let view be a view of the DOM that replaces all a, ins, del, and map elements in the document with their contents. Then, in view, for each run of sibling phrasing content nodes uninterrupted by other types of content, in an element that accepts content other than phrasing content, let first be the first node of the run, and let last be the last node of the run. For each such run that consists of at least one node that is neither embedded content nor inter-element whitespace, a paragraph exists in the original DOM from immediately before first to immediately after last. (Paragraphs can thus span across a, ins, del, and map elements.)
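The definition above can be sketched over a toy tree. Everything here is an assumption made for illustration: nodes are plain objects, the element classification tables are deliberately incomplete, comment nodes are not modeled, and the parent is assumed to accept content other than phrasing content. The sketch finds only the paragraphs implied by runs of phrasing content, not those formed explicitly by p elements.

```javascript
const TRANSPARENT = new Set(["a", "ins", "del", "map"]);
// Hypothetical, incomplete classification tables.
const PHRASING = new Set(["em", "strong", "span", "abbr"]);
const EMBEDDED = new Set(["img", "object", "video", "audio", "canvas"]);

const text = (data) => ({ text: data });
const el = (name, ...children) => ({ name, children });

const isText = (n) => n.text !== undefined;
const isEmbedded = (n) => !isText(n) && EMBEDDED.has(n.name);
const isPhrasing = (n) => isText(n) || PHRASING.has(n.name) || isEmbedded(n);
const isWhitespace = (n) => isText(n) && /^[ \t\n\f\r]*$/.test(n.text);
const textOf = (n) => (isText(n) ? n.text : n.children.map(textOf).join(""));

// The "view": replace a, ins, del and map elements with their contents.
function flatten(children) {
  return children.flatMap((n) =>
    !isText(n) && TRANSPARENT.has(n.name) ? flatten(n.children) : [n]
  );
}

// Returns the trimmed text of each implied paragraph among `children`.
function impliedParagraphs(children) {
  const paragraphs = [];
  let run = [];
  const flush = () => {
    // A run counts only if some node is neither embedded nor whitespace.
    if (run.some((n) => !isEmbedded(n) && !isWhitespace(n))) {
      paragraphs.push(run.map(textOf).join("").trim());
    }
    run = [];
  };
  for (const node of flatten(children)) {
    if (isPhrasing(node)) run.push(node);
    else flush();
  }
  flush();
  return paragraphs;
}

// The children of the section element from the ins/del example above.
const section = [
  text("\n  "),
  el("ins", el("h1", text("Example of paragraphs")),
     text("\n  This is the "), el("em", text("first")), text(" paragraph in")),
  text(" this example"),
  el("del", text(".\n  "), el("p", text("This is the second."))),
  text("\n"),
];
console.log(impliedParagraphs(section));
```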

Conformance checkers may warn authors of cases where they have paragraphs that overlap each other (this can happen with object, video, audio, and canvas elements).

A paragraph is also formed explicitly by p elements.

The p element can be used to wrap individual paragraphs when there would otherwise not be any content other than phrasing content to separate the paragraphs from each other.

In the following example, the link spans half of the first paragraph, all of the heading separating the two paragraphs, and half of the second paragraph. It straddles the paragraphs and the heading.

<aside>
 Welcome!
 <a href="about.html">
  This is home of...
  <h1>The Falcons!</h1>
  The Lockheed Martin multirole jet fighter aircraft!
 </a>
 This page discusses the F-16 Fighting Falcon's innermost secrets.
</aside>

Here is another way of marking this up, this time showing the paragraphs explicitly, and splitting the single a element into three:

<aside>
 <p>Welcome! <a href="about.html">This is home of...</a></p>
 <h1><a href="about.html">The Falcons!</a></h1>
 <p><a href="about.html">The Lockheed Martin multirole jet
 fighter aircraft!</a> This page discusses the F-16 Fighting
 Falcon's innermost secrets.</p>
</aside>

It is possible for paragraphs to overlap when using certain elements that define fallback content. For example, in the following section:

<section>
 <h1>My Cats</h1>
 You can play with my cat simulator.
 <object data="cats.sim">
  To see the cat simulator, use one of the following links:
  <ul>
   <li><a href="cats.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  Alternatively, upgrade to the Mellblom Browser.
 </object>
 I'm quite proud of it.
</section>

There are five paragraphs:

  1. The paragraph that says "You can play with my cat simulator. object I'm quite proud of it.", where object is the object element.
  2. The paragraph that says "To see the cat simulator, use one of the following links:".
  3. The paragraph that says "Download simulator file".
  4. The paragraph that says "Use online simulator".
  5. The paragraph that says "Alternatively, upgrade to the Mellblom Browser.".

The first paragraph is overlapped by the other four. A user agent that supports the "cats.sim" resource will only show the first one, but a user agent that shows the fallback will confusingly show the first sentence of the first paragraph as if it was in the same paragraph as the second one, and will show the last paragraph as if it was at the start of the second sentence of the first paragraph.

To avoid this confusion, explicit p elements can be used.

3.4 APIs in HTML documents

For HTML documents, and for HTML elements in HTML documents, certain APIs defined in DOM Core become case-insensitive or case-changing, as sometimes defined in DOM Core, and as summarized or required below. [DOMCORE]

This does not apply to XML documents or to elements that are not in the HTML namespace despite being in HTML documents.

Element.tagName and Node.nodeName

These attributes must return element names converted to ASCII uppercase, regardless of the case with which they were created.

Document.createElement()

The canonical form of HTML markup is all-lowercase; thus, this method will lowercase the argument before creating the requisite element. Also, the element created must be in the HTML namespace.

This doesn't apply to Document.createElementNS(). Thus, it is possible, by passing this last method a tag name in the wrong case, to create an element that claims to have the tag name of an element defined in this specification, but doesn't support its interfaces, because it really has another tag name not accessible from the DOM APIs.
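The difference between the two creation methods can be sketched with plain objects. This is a minimal model, not a DOM implementation: createElementModel, createElementNSModel and tagNameOf are hypothetical names introduced for the example, and namespace prefixes are ignored.

```javascript
// Namespace URI of HTML elements.
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Convert only A-Z to a-z; every other character is left untouched.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Convert only a-z to A-Z.
function asciiUppercase(s) {
  return s.replace(/[a-z]/g, (c) => c.toUpperCase());
}

// Model of Document.createElement() in an HTML document:
// the name is lowercased and the element lands in the HTML namespace.
function createElementModel(name) {
  return { namespace: HTML_NS, localName: asciiLowercase(name) };
}

// Model of Document.createElementNS(): no case folding is applied.
function createElementNSModel(namespace, name) {
  return { namespace, localName: name };
}

// Model of Element.tagName for elements in an HTML document.
function tagNameOf(element) {
  return element.namespace === HTML_NS
    ? asciiUppercase(element.localName)
    : element.localName;
}
```

In this model, createElementModel("DIV") and createElementNSModel(HTML_NS, "DIV") both report the tag name "DIV", yet their local names differ ("div" versus "DIV"), which is the situation the preceding paragraph warns about.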

Element.setAttribute()
Element.setAttributeNode()

Attribute names are converted to ASCII lowercase.

Specifically: when an attribute is set on an HTML element using Element.setAttribute(), the name argument must be converted to ASCII lowercase before the element is affected; and when an Attr node is set on an HTML element using Element.setAttributeNode(), it must have its name converted to ASCII lowercase before the element is affected.

This doesn't apply to Element.setAttributeNS() and Element.setAttributeNodeNS().

Element.getAttribute()
Element.getAttributeNode()

Attribute names are converted to ASCII lowercase.

Specifically: When the Element.getAttribute() method or the Element.getAttributeNode() method is invoked on an HTML element, the name argument must be converted to ASCII lowercase before the element's attributes are examined.

This doesn't apply to Element.getAttributeNS() and Element.getAttributeNodeNS().
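The attribute rules can be illustrated with a small model in which an element is a namespace plus a map of attributes. The names makeElement, setAttributeModel and getAttributeModel are hypothetical and exist only for this sketch; all elements are assumed to live in an HTML document.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Convert only A-Z to a-z.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Hypothetical element model: a namespace and a Map from attribute name to value.
function makeElement(namespace) {
  return { namespace, attributes: new Map() };
}

// Names are folded only for elements in the HTML namespace.
function normalizeName(element, name) {
  return element.namespace === HTML_NS ? asciiLowercase(name) : name;
}

function setAttributeModel(element, name, value) {
  element.attributes.set(normalizeName(element, name), String(value));
}

function getAttributeModel(element, name) {
  const key = normalizeName(element, name);
  return element.attributes.has(key) ? element.attributes.get(key) : null;
}
```

With this model, an attribute set as "CLASS" on an HTML element is found again under "Class", while an attribute set as "viewBox" on an SVG element is only found under exactly that spelling.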

Document.getElementsByTagName()
Element.getElementsByTagName()

HTML elements match by lowercasing the argument before comparison; elements from other namespaces are treated as in XML (case-sensitively).

Specifically, these methods (but not their namespaced counterparts) must compare the given argument in a case-sensitive manner, but when looking at HTML elements, the argument must first be converted to ASCII lowercase.

Thus, in an HTML document with nodes in multiple namespaces, these methods will effectively be both case-sensitive and case-insensitive at the same time.
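The "both at the same time" behavior can be seen in a short sketch. It assumes elements are plain objects with a namespace and a local name, ignores prefixes and the "*" wildcard, and uses the hypothetical names matchesTagName and getElementsByTagNameModel.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Convert only A-Z to a-z.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// The comparison itself is always case-sensitive; for HTML elements the
// argument is lowercased first, for all other elements it is used as given.
function matchesTagName(element, name) {
  const wanted = element.namespace === HTML_NS ? asciiLowercase(name) : name;
  return element.localName === wanted;
}

// Model of getElementsByTagName() over a flat list of elements.
function getElementsByTagNameModel(elements, name) {
  return elements.filter((element) => matchesTagName(element, name));
}

// A document mixing HTML and SVG elements.
const mixed = [
  { namespace: HTML_NS, localName: "a" },
  { namespace: SVG_NS, localName: "a" },
  { namespace: SVG_NS, localName: "textPath" },
];
```

Querying the mixed list for "A" returns only the HTML a element, whereas querying for "a" returns both a elements; "textpath" matches nothing because the SVG element is compared case-sensitively.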

3.5 Interactions with XPath and XSLT

Implementations of XPath 1.0 that operate on HTML documents parsed or created in the manners described in this specification (e.g. as part of the document.evaluate() API) are affected as follows:

In addition to the cases where a name expression would match a node per XPath 1.0, a name expression must evaluate to matching a node when all the following conditions are also met:

These requirements are a willful violation of the XPath 1.0 specification, motivated by desire to have implementations be compatible with legacy content while still supporting the changes that this specification introduces to HTML regarding which namespace is used for HTML elements. [XPATH10]


XSLT 1.0 processors outputting to a DOM when the output method is "html" (either explicitly or via the defaulting rule in XSLT 1.0) are affected as follows:

If the transformation program outputs an element in no namespace, the processor must, prior to constructing the corresponding DOM element node, change the namespace of the element to the HTML namespace, ASCII-lowercase the element's local name, and ASCII-lowercase the names of any non-namespaced attributes on the element.
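A minimal sketch of this normalization follows, assuming an output element is described by a plain record with a namespace (null for no namespace), a local name, and a list of attributes. The record shape and the name normalizeXsltOutputElement are assumptions made for illustration.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Convert only A-Z to a-z.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Returns the record to use when constructing the DOM element node.
function normalizeXsltOutputElement(element) {
  // Elements that already have a namespace are not affected.
  if (element.namespace !== null) return element;
  return {
    namespace: HTML_NS,
    localName: asciiLowercase(element.localName),
    // Only non-namespaced attributes have their names lowercased.
    attributes: element.attributes.map((attr) =>
      attr.namespace === null
        ? { ...attr, name: asciiLowercase(attr.name) }
        : attr),
  };
}
```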

This requirement is a willful violation of the XSLT 1.0 specification, required because this specification changes the namespaces and case-sensitivity rules of HTML in a manner that would otherwise be incompatible with DOM-based XSLT transformations. (Processors that serialize the output are unaffected.) [XSLT10]

3.6 Dynamic markup insertion

APIs for dynamically inserting markup into the document interact with the parser, and thus their behavior varies depending on whether they are used with HTML documents (and the HTML parser) or XHTML in XML documents (and the XML parser).

3.6.1 Controlling the input stream

The open() method comes in several variants with different numbers of arguments.

document = document . open( [ type [, replace ] ] )

Causes the Document to be replaced in-place, as if it was a new Document object, but reusing the previous object, which is then returned.

If the type argument is omitted or has the value "text/html", then the resulting Document has an HTML parser associated with it, which can be given data to parse using document.write(). Otherwise, all content passed to document.write() will be parsed as plain text.

If the replace argument is absent or false, a new entry is added to the session history to represent this entry, and the previous entries for this Document are all collapsed into one entry with a new Document object.

The method has no effect if the Document is still being parsed.

window = document . open( url, name, features [, replace ] )

Works like the window.open() method.

document . close()

Closes the input stream that was opened by the document.open() method.

When called with two or fewer arguments, the method must act as follows:

  1. Let type be the value of the first argument, if there is one, or "text/html" otherwise.

  2. Let replace be true if there is a second argument and it is an ASCII case-insensitive match for the value "replace", and false otherwise.

  3. If the document has an active parser that isn't a script-created parser, and the insertion point associated with that parser's input stream is not undefined (that is, it does point to somewhere in the input stream), then the method does nothing. Abort these steps and return the Document object on which the method was invoked.

    This basically causes document.open() to be ignored when it's called in an inline script found during the parsing of data sent over the network, while still letting it have an effect when called asynchronously or on a document that is itself being spoon-fed using these APIs.

  4. Unload the Document object, with the recycle parameter set to true. If the user refused to allow the document to be unloaded, then these steps must be aborted.

  5. If the document has an active parser, then abort that parser, and throw away any pending content in the input stream.

  6. Unregister all event listeners registered on the Document node and its descendants.

  7. Remove any tasks associated with the Document in any task source.

  8. Remove all child nodes of the document, without firing any mutation events.

  9. Replace the Document's singleton objects with new instances of those objects. (This includes in particular the Window, Location, History, ApplicationCache, UndoManager, Navigator, and Selection objects, the various BarProp objects, the two Storage objects, and the various HTMLCollection objects. It also includes all the Web IDL prototypes in the JavaScript binding, including the Document object's prototype.)

  10. Change the document's character encoding to UTF-16.

  11. Change the document's address to the first script's browsing context's active document's address.

  12. Create a new HTML parser and associate it with the document. This is a script-created parser (meaning that it can be closed by the document.open() and document.close() methods, and that the tokenizer will wait for an explicit call to document.close() before emitting an end-of-file token). The encoding confidence is irrelevant.

  13. Mark the document as being an HTML document (it might already be so-marked).

  14. If the type string contains a U+003B SEMICOLON (;) character, remove the first such character and all characters from it up to the end of the string.

    Strip all leading and trailing space characters from type.

    If type is not now an ASCII case-insensitive match for the string "text/html", then act as if the tokenizer had emitted a start tag token with the tag name "pre", then set the HTML parser's tokenization stage's content model flag to PLAINTEXT.

  15. If replace is false, then:

    1. Remove all the entries in the browsing context's session history after the current entry in its Document's History object.
    2. Remove any earlier entries that share the same Document.
    3. Add a new entry just before the last entry that is associated with the text that was parsed by the previous parser associated with the Document object, as well as the state of the document at the start of these steps. (This allows the user to step backwards in the session history to see the page before it was blown away by the document.open() call.)

  16. Finally, set the insertion point to point at just before the end of the input stream (which at this point will be empty).

  17. Return the Document on which the method was invoked.
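The argument handling in steps 1, 2 and 14 can be sketched in isolation. The function name interpretOpenArguments and its return shape are hypothetical, and the set of space characters (space, tab, line feed, form feed, carriage return) is an assumption taken from the usual HTML definition rather than from this section.

```javascript
// Space characters assumed for step 14: space, tab, LF, FF, CR.
const SPACE_CHARACTERS = " \t\n\f\r";

// Convert only A-Z to a-z.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Strip leading and trailing space characters.
function stripSpaceCharacters(s) {
  let start = 0;
  let end = s.length;
  while (start < end && SPACE_CHARACTERS.includes(s[start])) start++;
  while (end > start && SPACE_CHARACTERS.includes(s[end - 1])) end--;
  return s.slice(start, end);
}

// Models only the argument interpretation of the two-or-fewer-argument form.
function interpretOpenArguments(...args) {
  // Step 1: default the type to "text/html".
  let type = args.length >= 1 ? String(args[0]) : "text/html";
  // Step 2: replace is true only for a case-insensitive "replace".
  const replace = args.length >= 2 && asciiLowercase(String(args[1])) === "replace";
  // Step 14: drop everything from the first semicolon, then trim.
  const semicolon = type.indexOf(";");
  if (semicolon !== -1) type = type.slice(0, semicolon);
  type = stripSpaceCharacters(type);
  // Anything other than text/html is handled as plain text.
  const parseAsHtml = asciiLowercase(type) === "text/html";
  return { replace, parseAsHtml };
}
```

Under these assumptions, a type such as " Text/HTML ; charset=utf-8" is still treated as HTML, while "text/plain" selects the plain text behavior.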

When called with three or more arguments, the open() method on the HTMLDocument object must call the open() method on the Window object of the HTMLDocument object, with the same arguments as the original call to the open() method, and return whatever that method returned. If the HTMLDocument object has no Window object, then the method must raise an INVALID_ACCESS_ERR exception.

The close() method must do nothing if there is no script-created parser associated with the document. If there is such a parser, then, when the method is called, the user agent must insert an explicit "EOF" character at the end of the parser's input stream.

3.6.2 document.write()

document . write(text...)

Adds the given string(s) to the Document's input stream. If necessary, calls the open() method implicitly first.

This method throws an INVALID_ACCESS_ERR exception when invoked on XML documents.

The document.write(...) method must act as follows:

  1. If the method was invoked on an XML document, throw an INVALID_ACCESS_ERR exception and abort these steps.

  2. If the insertion point is undefined, the open() method must be called (with no arguments) on the document object. If the user refused to allow the document to be unloaded, then these steps must be aborted. Otherwise, the insertion point will point at just before the end of the (empty) input stream.

  3. The string consisting of the concatenation of all the arguments to the method must be inserted into the input stream just before the insertion point.

  4. If there is a pending external script, then the method must now return without further processing of the input stream.

  5. Otherwise, the tokenizer must process the characters that were inserted, one at a time, processing resulting tokens as they are emitted, and stopping when the tokenizer reaches the insertion point or when the processing of the tokenizer is aborted by the tree construction stage (this can happen if a script end tag token is emitted by the tokenizer).

    If the document.write() method was called from script executing inline (i.e. executing because the parser parsed a set of script tags), then this is a reentrant invocation of the parser.

  6. Finally, the method must return.
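Steps 2 and 3 can be pictured with a toy input stream. The class InputStreamModel is hypothetical; it covers only the insertion-point bookkeeping and leaves out XML documents, pending external scripts, and tokenization.

```javascript
// Hypothetical model of a parser input stream with an insertion point.
class InputStreamModel {
  constructor(initial = "") {
    this.chars = initial;
    // Index before which new text is inserted; undefined means no insertion point.
    this.insertionPoint = undefined;
  }

  // Stand-in for the implicit open() call: the stream starts out empty
  // and the insertion point sits just before its end.
  open() {
    this.chars = "";
    this.insertionPoint = 0;
  }

  write(...args) {
    // Step 2: open the document if there is no insertion point.
    if (this.insertionPoint === undefined) this.open();
    // Step 3: concatenate all arguments and insert them before the insertion point.
    const inserted = args.map(String).join("");
    this.chars =
      this.chars.slice(0, this.insertionPoint) +
      inserted +
      this.chars.slice(this.insertionPoint);
    // The text went in before the insertion point, so the point now follows it.
    this.insertionPoint += inserted.length;
  }
}
```

When the insertion point sits in the middle of the stream, as it would during an inline script, the written text lands ahead of the content that has not yet been tokenized.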

3.6.3 document.writeln()

document . writeln(text...)

Adds the given string(s) to the Document's input stream, followed by a newline character. If necessary, calls the open() method implicitly first.

This method throws an INVALID_ACCESS_ERR exception when invoked on XML documents.

The document.writeln(...) method, when invoked, must act as if the document.write() method had been invoked with the same argument(s), plus an extra argument consisting of a string containing a single line feed character (U+000A).
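The forwarding rule is small enough to show directly. The recorder object below is a hypothetical stand-in for a document; it only records what write() would receive.

```javascript
// Builds an object whose write() records the concatenated arguments.
function makeRecorder() {
  const chunks = [];
  return {
    chunks,
    write(...args) {
      chunks.push(args.map(String).join(""));
    },
    // writeln() forwards its arguments to write(), plus one extra
    // argument holding a single line feed (U+000A).
    writeln(...args) {
      this.write(...args, "\n");
    },
  };
}
```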

3.6.4 innerHTML

The innerHTML DOM attribute represents the markup of the node's contents.

document . innerHTML [ = value ]

Returns a fragment of HTML or XML that represents the Document.

Can be set, to replace the Document's contents with the result of parsing the given string.

In the case of XML documents, will throw a SYNTAX_ERR if the Document cannot be serialized to XML, or if the given string is not well-formed.

element . innerHTML [ = value ]

Returns a fragment of HTML or XML that represents the element's contents.

Can be set, to replace the contents of the element with nodes parsed from the given string.

In the case of XML documents, will throw a SYNTAX_ERR if the element cannot be serialized to XML, or if the given string is not well-formed.

On getting, if the node's document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on the node; otherwise, the node's document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on the node instead (this might raise an exception instead of returning a string).

On setting, the following steps must be run:

  1. If the node's document is an HTML document: Invoke the HTML fragment parsing algorithm.

    If the node's document is an XML document: Invoke the XML fragment parsing algorithm.

    In either case, the algorithm must be invoked with the string being assigned into the innerHTML attribute as the input. If the node is an Element node, then, in addition, that element must be passed as the context element.

    If this raises an exception, then abort these steps.

    Otherwise, let new children be the nodes returned.

  2. If the attribute is being set on a Document node, and that document has an active parser, then abort that parser.

  3. Remove the child nodes of the node whose innerHTML attribute is being set, firing appropriate mutation events.

  4. If the attribute is being set on a Document node, let target document be that Document node. Otherwise, the attribute is being set on an Element node; let target document be the ownerDocument of that Element.

  5. Set the ownerDocument of all the nodes in new children to the target document.

  6. Append all the new children nodes to the node whose innerHTML attribute is being set, preserving their order, and firing mutation events as if a DocumentFragment containing the new children had been inserted.
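Steps 3 to 6 of the setter can be modeled on plain objects. In this sketch parsing (step 1) is assumed to have happened already, aborting the parser (step 2) and mutation events are not modeled, and the node shape and the names adopt and setInnerHTMLModel are hypothetical.

```javascript
// Hypothetical node model: { isDocument, ownerDocument, parent, children }.

// Set ownerDocument on a node and all of its descendants.
function adopt(node, targetDocument) {
  node.ownerDocument = targetDocument;
  for (const child of node.children) adopt(child, targetDocument);
}

// newChildren stands for the nodes returned by the fragment parser.
function setInnerHTMLModel(node, newChildren) {
  // Step 3: remove the existing child nodes.
  for (const child of node.children) child.parent = null;
  node.children = [];
  // Step 4: a Document is its own target document; an element uses ownerDocument.
  const targetDocument = node.isDocument ? node : node.ownerDocument;
  for (const child of newChildren) {
    // Step 5: move the new nodes into the target document.
    adopt(child, targetDocument);
    // Step 6: append them, preserving their order.
    child.parent = node;
    node.children.push(child);
  }
}
```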

3.6.5 outerHTML

The outerHTML DOM attribute represents the markup of the element and its contents.

element . outerHTML [ = value ]

Returns a fragment of HTML or XML that represents the element and its contents.

Can be set, to replace the element with nodes parsed from the given string.

In the case of XML documents, will throw a SYNTAX_ERR if the element cannot be serialized to XML, or if the given string is not well-formed.

On getting, if the node's document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on a fictional node whose only child is the node on which the attribute was invoked; otherwise, the node's document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on that fictional node instead (this might raise an exception instead of returning a string).

On setting, the following steps must be run:

  1. Let target be the element whose outerHTML attribute is being set.

  2. If target has no parent node, then abort these steps. There would be no way to obtain a reference to the nodes created even if the remaining steps were run.

  3. If target's parent node is a Document object, throw a NO_MODIFICATION_ALLOWED_ERR exception and abort these steps.

  4. Let parent be target's parent node, unless that is a DocumentFragment node, in which case let parent be an arbitrary body element.

  5. If target's document is an HTML document: Invoke the HTML fragment parsing algorithm.

    If target's document is an XML document: Invoke the XML fragment parsing algorithm.

    In either case, the algorithm must be invoked with the string being assigned into the outerHTML attribute as the input, and parent as the context element.

    If this raises an exception, then abort these steps.

    Otherwise, let new children be the nodes returned.

  6. Set the ownerDocument of all the nodes in new children to target's document.

  7. Remove target from its parent node, firing mutation events as appropriate, and then insert in its place all the new children nodes, preserving their order, and again firing mutation events as if a DocumentFragment containing the new children had been inserted.
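The control flow of the setter can be sketched as follows. The node shape and the name setOuterHTMLModel are hypothetical, parseFragment is a caller-supplied stand-in for the fragment parsing algorithms, and the DocumentFragment case of step 4 and mutation events are not modeled.

```javascript
// Hypothetical node model: { isDocument, ownerDocument, parent, children }.
// Returns true if the replacement happened, false if the steps were aborted.
function setOuterHTMLModel(target, markup, parseFragment) {
  const parent = target.parent;
  // Step 2: without a parent there is nothing to replace.
  if (!parent) return false;
  // Step 3: the child of a Document cannot be replaced this way.
  if (parent.isDocument) throw new Error("NO_MODIFICATION_ALLOWED_ERR");
  // Step 5: parse with the parent as the context element.
  const newChildren = parseFragment(markup, parent);
  // Step 6: the new nodes join target's document.
  for (const child of newChildren) {
    child.ownerDocument = target.ownerDocument;
    child.parent = parent;
  }
  // Step 7: put the new nodes where target was, preserving their order.
  const index = parent.children.indexOf(target);
  parent.children.splice(index, 1, ...newChildren);
  target.parent = null;
  return true;
}
```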

3.6.6 insertAdjacentHTML()

element . insertAdjacentHTML(position, text)

Parses the given string text as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:

"beforebegin"
Before the element itself.
"afterbegin"
Just inside the element, before its first child.
"beforeend"
Just inside the element, after its last child.
"afterend"
After the element itself.

Throws a SYNTAX_ERR exception if the arguments have invalid values (e.g., in the case of XML documents, if the given string is not well-formed).

Throws a NO_MODIFICATION_ALLOWED_ERR exception if the given position isn't possible (e.g. inserting elements after the root element of a Document).

The insertAdjacentHTML(position, text) method, when invoked, must run the following algorithm:

  1. Let position and text be the method's first and second arguments, respectively.

  2. Let target be the element on which the method was invoked.

  3. Use the first matching item from this list:

    If position is an ASCII case-insensitive match for the string "beforebegin"
    If position is an ASCII case-insensitive match for the string "afterend"

    If target has no parent node, then abort these steps.

    If target's parent node is a Document object, then throw a NO_MODIFICATION_ALLOWED_ERR exception and abort these steps.

    Otherwise, let context be the parent node of target.

    If position is an ASCII case-insensitive match for the string "afterbegin"
    If position is an ASCII case-insensitive match for the string "beforeend"

    Let context be the same as target.

    Otherwise

    Throw a SYNTAX_ERR exception.

  4. If target's document is an HTML document: Invoke the HTML fragment parsing algorithm.

    If target's document is an XML document: Invoke the XML fragment parsing algorithm.

    In either case, the algorithm must be invoked with text as the input, and context (as selected by the previous step) as the context element.

    If this raises an exception, then abort these steps.

    Otherwise, let new children be the nodes returned.

  5. Set the ownerDocument of all the nodes in new children to target's document.

  6. Use the first matching item from this list:

    If position is an ASCII case-insensitive match for the string "beforebegin"

    Insert all the new children nodes immediately before target.

    If position is an ASCII case-insensitive match for the string "afterbegin"

    Insert all the new children nodes before the first child of target, if there is one. If there is no such child, append them all to target.

    If position is an ASCII case-insensitive match for the string "beforeend"

    Append all the new children nodes to target.

    If position is an ASCII case-insensitive match for the string "afterend"

    Insert all the new children nodes immediately after target.

    The new children nodes must be inserted in a manner that preserves their order and fires mutation events as if a DocumentFragment containing the new children had been inserted.
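The position handling of steps 3 and 6 can be sketched on plain objects. The node shape and the name insertAdjacentModel are hypothetical, the nodes are passed in already parsed (so steps 4 and 5 are skipped), and mutation events are not modeled.

```javascript
// Convert only A-Z to a-z.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Hypothetical node model: { isDocument, parent, children }.
function insertAdjacentModel(target, position, newChildren) {
  const where = asciiLowercase(String(position));
  if (where === "beforebegin" || where === "afterend") {
    const parent = target.parent;
    // No parent: abort silently.
    if (!parent) return;
    if (parent.isDocument) throw new Error("NO_MODIFICATION_ALLOWED_ERR");
    // Insert immediately before or immediately after target.
    const offset = where === "afterend" ? 1 : 0;
    parent.children.splice(parent.children.indexOf(target) + offset, 0, ...newChildren);
    for (const child of newChildren) child.parent = parent;
  } else if (where === "afterbegin" || where === "beforeend") {
    // Insert before the first child, or append after the last child.
    if (where === "afterbegin") target.children.unshift(...newChildren);
    else target.children.push(...newChildren);
    for (const child of newChildren) child.parent = target;
  } else {
    throw new Error("SYNTAX_ERR");
  }
}
```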