This document illustrates how to write HTML 5 documents, focussing on simplicity and practical applications for beginners while also providing in-depth information for more advanced web developers.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is an Editor's Draft of “The Web Developer’s Guide to HTML 5” produced by the HTML Working Group, part of the HTML Activity. The working group is working on a new version of HTML not yet published under TR. In the meantime, you can access the HTML 5 Editor's Draft. The appropriate forum for comments on this document is public-html-comments@w3.org (public archive) or public-html@w3.org (public archive).
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The most common format for publishing documents on the web and creating web applications is HTML. From its beginning as a relatively simple language primarily designed for describing scientific documents, it has grown and adapted to a wide variety of needs, ranging from publishing news and blogs to providing the foundation for full-blown applications for email, maps, word processing and spreadsheets.
As the uses of HTML have grown, the demands placed upon it by authors have increased and the limitations of HTML have become more pronounced. HTML 5 attempts to overcome these limitations with new features designed specifically to address the needs of authors.
However, the way the HTML5 specification is written is very much targeted towards implementers rather than web designers and developers, making it more difficult to read and understand. This document is intended to meet the needs of web developers by focussing on document conformance criteria and authoring guidelines.
Authors who are familiar with previous versions of HTML are advised to familiarise themselves with the differences from HTML 4 [HTML4DIFF].
In general, the purpose of writing and publishing a document is to convey information to the readers. This could be any kind of information, such as telling a story, reporting news and current affairs or describing available products and services. Whatever the information is, it needs to be conveyed to the reader in a way that can be easily understood.
A typical document, such as a book, news article, blog entry or letter, is often grouped into different sections containing a variety of headings, paragraphs, lists, tables, quotes and various other typographical structures. All of these structures are important for more easily conveying information to the reader, and thus authors need a way to clearly identify each of these structures in a way that can then be easily presented to the user. This is the purpose of markup.
Markup is a machine-readable language that describes aspects of a document such as its structure, semantics and/or style. Some markup languages are designed solely for the purpose of describing the presentation of the document, such as RTF (Rich Text Format). Others, such as HTML, are more generic: rather than focussing on describing the presentation, they are designed to describe the meaning or purpose of the content and leave the presentation for another layer to deal with.
HTML provides a wide variety of semantic elements that can be used to mark up various common typographical structures. There are heading elements for marking up different levels of headings, a paragraph (p) element for paragraphs, various list elements for marking up different types of lists, and table elements for marking up tables.
It's important to distinguish between the structure and semantics of content, which should be described using HTML, and its presentation. In one document, a heading may be presented visually in a large bold typeface with wide margins above and below to separate it from the surrounding content and make it stand out. In another document, a heading may be presented in a light coloured, italic, fancy script typeface. But regardless of the presentation, it's still a heading and the markup can still use the same basic elements for identifying common structures.
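As a minimal sketch of this separation, the following document marks up two headings with the same h1 element and leaves their differing appearance entirely to a style sheet. The class names plain and fancy, the heading text and the chosen fonts and colour are assumptions made purely for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<title>Structure and Presentation</title>
<style>
  /* Presentation is described here, not in the markup. */
  h1.plain { font: bold 2em sans-serif; margin: 1em 0; }
  h1.fancy { font: italic 1.5em cursive; color: #999; }
</style>
</head>
<body>
<!-- Both headings use the same element; only their styling differs. -->
<h1 class="plain">Annual Report</h1>
<h1 class="fancy">Annual Report</h1>
</body>
</html>
```

Removing the style element changes how the headings look, but not the fact that both are headings.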
To ease readability and improve understanding, this document uses a number of conventions.
Notes are used throughout this document to provide additional information. Tips are used to provide useful hints and suggestions. Warnings are used to point out common authoring errors and highlight important issues to be aware of.
[Need to provide examples of these]
Example markup is provided for both HTML and XHTML. In some cases, the markup is the same and thus only one example is needed, but in others there may be syntactic differences. Where HTML and XHTML differ, separate examples are given with each one clearly labelled.
HTML Example:
<!DOCTYPE html>
<html lang="en">
<head>
<title>HTML Example</title>
</head>
<body>
<p>This is a sample HTML document.
</body>
</html>
XHTML Example:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML Example</title>
</head>
<body>
<p>This is a sample XHTML document.</p>
</body>
</html>
Sometimes, erroneous examples are included. This is usually done to illustrate common authoring errors, bad practices and other issues to be cautious of.
Erroneous Example:
<p>This markup contains a <em><strong>mistake</em></strong></p>
Unless explicitly stated otherwise for a specific purpose, all attribute values in examples are quoted using double quotes. In HTML examples, boolean attributes are written in their minimised form and in XHTML examples, they are written in expanded form.
HTML Example:
<input type="checkbox" checked>
XHTML Example:
<input type="checkbox" checked="checked"/>
In XHTML examples, due to the XML Well-Formedness requirements, void elements are always marked up using the trailing slash.
XHTML Example:
<img src="image.png" alt="example"/>
In HTML, however, the trailing slash is optional and, unless explicitly stated otherwise, is always omitted.
HTML Example:
<img src="image.png" alt="example">
Some XHTML examples make use of XML namespaces. In such cases, the
following prefixes are assumed to be defined even if there are no
xmlns attributes in the fragment of code.
xml: http://www.w3.org/XML/1998/namespace
html: http://www.w3.org/1999/xhtml
math: http://www.w3.org/1998/Math/MathML
svg: http://www.w3.org/2000/svg
XHTML Example:
<html xml:lang="en"> ... </html>
XHTML Example:
<div> <svg:svg><svg:circle r="50" cx="50" cy="50" fill="green"/></svg:svg> </div>
There is a difference between the vocabulary of HTML—the elements and attributes that can be used, and their meanings—and the way the document is represented. Typically, documents are created by writing them in either the HTML or XHTML syntax. When a browser opens the page, it reads the file and creates a Document Object Model (DOM) as its internal representation. The DOM is commonly used by web developers through the use of scripts that modify the content on the page.
The HTML syntax is loosely based upon the older, though very widely used, syntax from HTML 4.01. Although it is inspired by its SGML origins, it shares only minor syntactic similarities with SGML. It features a range of shorthand syntaxes designed to make hand coding more convenient, such as allowing the omission of some optional tags and attribute values. Authors are free to choose whether or not to take advantage of these shorthand features based upon their own preferences.
The following example illustrates a basic HTML document, demonstrating a few of the shorthand syntaxes.
HTML Example:
<!DOCTYPE html>
<html>
<head>
<title>An HTML Document</title>
</head>
<body class=example>
<h1>Example</h1>
<p>This is an example HTML document.
</body>
</html>
The XHTML syntax uses the stricter XML syntax. While this too is inspired by SGML, it requires documents to be well-formed, which some people prefer because of its stricter error handling, forcing authors to maintain cleaner markup.
XHTML Example:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>An HTML Document</title>
</head>
<body class="example">
<h1>Example</h1>
<p>This is an example HTML document.</p>
</body>
</html>
Note: The XHTML document does not need to include the DOCTYPE
because XHTML documents that are delivered correctly using an XML MIME
type, and are processed as XML by browsers, are always rendered in
no-quirks mode. However, the DOCTYPE may optionally be included, and
should be included if the document uses the compatible subset of markup
that is conforming in both HTML and XHTML and is ever expected to be used
in text/html environments.
The HTML and XHTML syntaxes appear similar, and it is possible to mark up documents using a common subset of the syntax that is the same in both, while avoiding the syntactic sugar that is unique to each.
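One possible document written in such a common subset is sketched below. It assumes that including the DOCTYPE, the xmlns attribute and matching lang and xml:lang attributes suits the intended use; the title and paragraph text are illustrative only.

```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>A Document in the Common Subset</title>
</head>
<body class="example">
<h1>Example</h1>
<!-- All attribute values are quoted and every end tag is written out. -->
<p>This document avoids the shorthand that is unique to either syntax.</p>
</body>
</html>
```

The markup omits no tags and leaves no attribute values unquoted, so it avoids the HTML-only shorthand while remaining well-formed XML.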
The Document Object Model (DOM) is the internal representation used by web browsers for storing the document in memory. This represents the document as a tree and exposes various APIs designed for authors to work with the document using scripts.
The following diagram illustrates the DOM structure representing the document that would be created by the previous examples.
[TODO: INSERT DOM TREE IMAGE]
The DOM tree includes a title element in the head and h1 and p elements in the body.
Some of the basic APIs include the ability to add and remove elements and set attributes. Some elements feature more specialised APIs, such as form controls, canvas and the multimedia elements.
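The following sketch shows how a script might use a few of these basic APIs to add an element, set an attribute and remove an element. The paragraph text and the class name note are assumptions chosen for illustration.

```html
<!DOCTYPE html>
<html>
<head>
<title>An HTML Document</title>
</head>
<body class="example">
<h1>Example</h1>
<p>This is an example HTML document.</p>
<script>
  // Create a new p element and give it some text.
  var note = document.createElement("p");
  note.textContent = "This paragraph was added by a script.";
  // Set an attribute on the new element.
  note.setAttribute("class", "note");
  // Add the new element as the last child of the body.
  document.body.appendChild(note);
  // Find the h1 element and remove it from its parent.
  var heading = document.getElementsByTagName("h1")[0];
  heading.parentNode.removeChild(heading);
</script>
</body>
</html>
```

After the script runs, the DOM no longer contains the h1 element and contains an additional p element, even though the source file itself is unchanged.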
The Document Type Declaration needs to be present at the beginning of a document that uses the HTML syntax. It may optionally be used within the XHTML syntax, but it is not required.
<!DOCTYPE html>
The DOCTYPE originates from HTML’s SGML lineage and, in
previous levels of HTML, was used to refer to a Document Type
Definition (DTD), a formal declaration of the elements, attributes and
syntactic features that could be used within the document. Those who are
familiar with previous levels of HTML will notice that this
DOCTYPE contains no PUBLIC or SYSTEM identifier; these identifiers
were used to refer to the DTD.
As HTML5 is no longer formally based upon SGML, the DOCTYPE
no longer serves this purpose, and thus it does not refer to a DTD
anymore. However, due to legacy constraints, it has gained another very
important purpose: triggering no-quirks mode in browsers.
HTML 5 defines three modes: quirks mode, limited-quirks mode and no-quirks mode, of which only the last is considered conforming to use. The modes exist for reasons of backwards compatibility. The important thing to understand is that documents are visually rendered differently in each of the modes, so to ensure the most standards-compliant rendering, it is important to ensure that no-quirks mode is used.
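One way to check which mode a browser has selected is the document.compatMode property, which reports "BackCompat" in quirks mode and "CSS1Compat" otherwise; it does not distinguish limited-quirks mode from no-quirks mode. The following is only a sketch, and the id value mode is an assumption.

```html
<!DOCTYPE html>
<html>
<head>
<title>Rendering Mode Check</title>
</head>
<body>
<p id="mode"></p>
<script>
  // "BackCompat" indicates quirks mode; any other value indicates
  // no-quirks or limited-quirks mode.
  var mode = document.compatMode === "BackCompat"
    ? "quirks mode"
    : "no-quirks or limited-quirks mode";
  document.getElementById("mode").textContent = "Rendered in " + mode + ".";
</script>
</body>
</html>
```

Removing the DOCTYPE from this document would be expected to switch the reported mode to quirks mode.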
Elements are marked up using start tags and end tags. Tags are delimited using angle brackets with the tag name in between. The difference between start tags and end tags is that the latter includes a slash before the tag name.
Example:
This example paragraph illustrates the use of start tags and end tags.
<p>The quick brown fox jumps over the lazy dog.</p>
In both tags, whitespace is permitted between the tag name and the closing right angle bracket; however, it is usually omitted because it is redundant.
In XHTML, tag names are case sensitive and are usually defined to be written in lowercase. In HTML, however, tag names are case insensitive and may be written in all uppercase or mixed case, although the most common convention is to stick with lowercase. The case of the start and end tags does not have to be the same, but being consistent does make the code look cleaner.
HTML Example:
<DIV>...</DIV>
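As a brief illustration of mismatched case, the following HTML fragments, written here only as a sketch, are treated the same way as their all-lowercase equivalents even though the case of each start tag differs from that of its end tag. Neither would be well-formed in XHTML.

```html
<!-- Uppercase start tag, lowercase end tag -->
<DIV>...</div>
<!-- Mixed-case start tag, uppercase end tag -->
<Section>...</SECTION>
```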
An empty element is any element that does not contain any content within it. In general, an empty element is just one with a start tag immediately followed by its associated end tag. In both HTML and XHTML syntaxes, this can be represented in the same way.
Example:
<span></span>
Some elements, however, are forbidden from containing any content at
all. These are known as void elements. In HTML, the above syntax
cannot be used for void elements. For such elements, the end tag must be
omitted because the element is automatically closed by the parser. Such
elements include, among others, br, hr,
link and meta.
HTML Example:
<link rel="stylesheet" type="text/css" href="style.css">
In XHTML, the XML syntactic requirements dictate that this must be made explicit using either an explicit end tag, as above, or the empty element syntax. This is achieved by inserting a slash at the end of the start tag immediately before the right angle bracket.
Example:
<link rel="stylesheet" type="text/css" href="style.css"/>
Authors may optionally choose to use this same syntax for void elements in the HTML syntax as well. Some authors also choose to include whitespace before the slash; however, this is not necessary. (Using whitespace in that fashion is a convention inherited from the compatibility guidelines in XHTML 1.0, Appendix C.)
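The variations described above can be sketched as follows. The file name style.css is a placeholder.

```html
<!-- Trailing slash with no preceding whitespace -->
<link rel="stylesheet" href="style.css"/>
<!-- Trailing slash preceded by whitespace, in the style of XHTML 1.0 Appendix C -->
<link rel="stylesheet" href="style.css" />
<br />
```

In the HTML syntax, all of these are treated the same as the equivalent tags written without the slash.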
Elements may contain attributes that are used to set various properties of an element. Some attributes are defined globally and can be used on any element, while others are defined for specific elements only. All attributes have a name and a value and look like this.
Example:
This example illustrates how to mark up a div element with
an attribute named class using a value of
"example".
<div class="example">...</div>
Attributes may only be specified within start tags and must never be used in end tags.
Erroneous Example:
<section id="example">...</section id="example">
In XHTML, attribute names are case sensitive and most are defined to be lowercase. In HTML, attribute names are case insensitive, and so they could be written in all uppercase or mixed case, depending on your own preferences. It is conventional, however, to use the same case as would be used in XHTML, which is generally all lowercase.
HTML Example:
<div CLASS="example">
In general, the values of attributes can contain any text or character references, although depending on the syntax used, some additional restrictions apply, which are outlined below.
There are four slightly different syntaxes that may be used for attributes in HTML: empty, unquoted, single-quoted and double-quoted. All four syntaxes may be used in the HTML syntax, depending on what is needed for each specific attribute. However, in the XHTML syntax, attribute values must always be quoted using either single or double quotes.
An empty attribute is one where the value has been omitted. This is a syntactic shorthand for specifying the attribute with an empty value, and is commonly used for boolean attributes. This syntax may be used in the HTML syntax, but not in the XHTML syntax.
Note: In previous editions of HTML, which were formally based on SGML, it was technically an attribute's name that could be omitted where the value was a unique enumerated value specified in the DTD. However, due to legacy constraints, this has been changed in HTML5 to reflect the way implementations really work.
HTML Example:
<input disabled>
The previous example is equivalent to specifying the attribute with an empty string as the value.
<input disabled="">
Note: The previous example is semantically equivalent to
specifying the attribute with the value "disabled", but it is
not exactly the same.
Example:
<img src="decoration.png" alt>
The previous example is equivalent to specifying the attribute with an empty string as the value.
<img src="decoration.png" alt="">
In HTML, but not in XHTML, the quotes surrounding the value may also be
omitted in most cases. An unquoted value may contain any characters except
for space characters, single (') or double (") quotation marks, equals
signs (=), less-than signs (<), greater-than signs (>) or grave accents (`).
If you need an attribute to contain those characters, they either need to
be escaped using character references, or you need to use either the
single- or double-quoted attribute value syntax.
HTML Example:
<div class=example>
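The choice between leaving a value unquoted, quoting it and escaping it can be sketched as follows. The attribute values are assumptions chosen to include a space and an equals sign.

```html
<!-- Unquoted: the value contains none of the restricted characters. -->
<input name=query>
<!-- Quoted: the value contains spaces and an equals sign. -->
<input name="query" value="a = b">
<!-- Escaped: the equals sign is written as the character reference &#61;. -->
<input name=query value=a&#61;b>
```

Quoting is usually the more readable of the two alternatives, which is one reason the examples in this document quote all attribute values.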
In both HTML and XHTML, attribute values may be surrounded with double quotes.
By quoting attributes, the value may contain the additional characters that can't be used in unquoted attribute values, but for obvious reasons, these attributes cannot contain additional double quotation marks within the value.
Example:
<div class="example class names">...</div>
In both HTML and XHTML, attribute values may be surrounded with single quotes.
By quoting attributes, the value may contain the additional characters that can't be used in unquoted attribute values, but for obvious reasons, these attributes cannot contain additional single quotation marks within the value.
Example:
<div class='example class names'>...</div>
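When a value itself needs to contain a quotation mark, one approach is to use the other kind of quote as the delimiter, and another is to escape the mark with a character reference. The following sketch, with made-up title text, shows both.

```html
<!-- A single quotation mark inside a double-quoted value -->
<p title="The editor's note">...</p>
<!-- Double quotation marks inside a single-quoted value -->
<p title='He said "hello"'>...</p>
<!-- Double quotation marks escaped inside a double-quoted value -->
<p title="He said &quot;hello&quot;">...</p>
```

The last two elements carry the same title value; only the way it is written in the source differs.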
Each element in HTML falls into zero or more categories that group elements with similar characteristics together. The following categories are used in this guide:
Some elements have unique requirements and do not fit into any particular category.
[Create and link to some sort of index of elements that lists each element in each category.]
Metadata content includes elements for marking up document metadata; marking up or linking to resources that describe the behaviour or presentation of the document; or indicating relationships with other documents.
Metadata elements appear within the head of a document. Some common examples of
metadata elements include: title, meta,
link, script and style.
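As an illustrative sketch, a head element containing each of these metadata elements might look like the following. The file names style.css and script.js, the title and the style rule are placeholders.

```html
<head>
<meta charset="utf-8">
<title>Metadata Example</title>
<!-- Links to an external style sheet -->
<link rel="stylesheet" href="style.css">
<!-- Embeds presentation rules directly in the document -->
<style>
  body { margin: 2em; }
</style>
<!-- Loads an external script -->
<script src="script.js"></script>
</head>
```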
Most elements that are used to mark up the main content in the body of documents and applications are categorised as flow content. In general, this includes elements that are presented visually as either block level or inline level.
Some common flow content includes elements like div, p, ul, and
As a general rule, elements whose content model allows any flow content should have either at least one descendant text node that is not inter-element whitespace, or at least one descendant element node that is embedded content. For the purposes of this requirement, del elements and their descendants must not be counted as contributing to the ancestors of the del element.
This requirement is not a hard requirement, however, as there are many cases where an element can be empty legitimately, for example when it is used as a placeholder which will later be filled in by a script, or when the element is part of a template and would on most pages be filled in but on some pages is not relevant.
Heading content includes the elements for marking up headings. Headings,
in conjunction with the sectioning elements, are used to describe the
structure of the document. Heading content elements include the
header element and the h1 to h6
elements.
Phrasing content includes text and text-level markup. This is similar to the concept of inline level elements in HTML 4.01. Phrasing content is a sub-category of flow content, and thus all phrasing content is also considered to be flow content. However, most elements that are categorised as phrasing content can only contain other phrasing content.
Some common examples of phrasing content elements include
abbr, em, strong and
span.
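The following sketch shows these phrasing content elements used within a paragraph. The sentence and the class name highlight are made up for illustration.

```html
<p>The <abbr title="World Wide Web Consortium">W3C</abbr>
<em>recommends</em> validating markup, and authors are
<strong>strongly</strong> advised to check their
<span class="highlight">documents</span> before publishing.</p>
```

Each of the four elements contains only text here, which is itself phrasing content.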
Embedded content includes elements that load external resources into the
document. Such external resources include, for example, images, videos and
Flash-based content. Some embedded content elements include
img, object, embed and
video.
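A few of these embedded content elements are sketched below. The file names and the fallback text are placeholders rather than real resources.

```html
<!-- An image with alternative text -->
<img src="photo.png" alt="A lighthouse at dusk">
<!-- A video with fallback content for browsers that do not support it -->
<video src="clip.ogv" controls>
<p>Your browser does not support the video element.</p>
</video>
<!-- A generic external resource with fallback content -->
<object data="chart.svg" type="image/svg+xml">
<p>A chart of monthly sales.</p>
</object>
```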
Sectioning elements are elements that divide the page into, for lack of a better word, sections. This section describes HTML's sectioning elements and elements that support them.
Some elements are scoped to their nearest ancestor sectioning element.
For example, address elements apply just to their section.
For such elements x, the elements that apply to a
sectioning element e are all the x
elements whose nearest sectioning element is e.
html element
The html element is the document’s
root element, which contains all of the document’s content. Every
document must have an html element.
head element
The head element is the container for
the document’s metadata. Metadata is information about the document
itself, such as its title and author. Scripts and stylesheets may also be
included within the head element. Every
document must have a head element.
<head> ... </head>
As the first child of the html element. It may only occur once.
Start tag (<head>) optional, implied as the first child of the html element.
End tag (</head>) optional, implied immediately before the body element.
body element
The body element represents the main content of the document. Every
document must have a body and there can only be one per document. The
body element must occur as the second
child of the html element, after the
head element.
<body> ... </body>
As the second child of the html element. It may only occur once.
Start tag (<body>) optional, implied after the head element.
End tag (</body>) optional, implied immediately before the end of the html element.
The following examples illustrate the typical usage of the body element in HTML and XHTML.
HTML Example
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
</head>
<body>
<h1>Document</h1>
</body>
</html>
XHTML Example
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
</head>
<body>
<h1>Document</h1>
</body>
</html>
In the HTML serialisation, both the start and end tags are optional. You may choose to omit both or include either the start tag only, end tag only, or both. Some authors appreciate the convenience of omitting the start and/or end tags, whereas others prefer to explicitly include them for clarity.
Note: This does not apply to XHTML.
HTML Example
<!DOCTYPE html>
<html>
<head> ... </head>
<h1>Document</h1>
<p>The body start tag is implied immediately before the h1 element and the end tag before the end of the html element.
</html>
It is sometimes useful to apply class or id
attributes to the body element, which may be used as a hook for page or
site specific styling or scripting.
In the following example, the body element has been given the class of
"archive", to indicate that the page is an archive page.
This may be used, for example, to style archive pages
differently from other pages on the site.
Example
<body class="archive"> ... </body>
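A sketch of how such a class might be used as a hook for both styling and scripting is shown below. The style rule and the title prefix are assumptions for illustration; document.body refers to the body element of the current document.

```html
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
<style>
  /* Applies only on pages whose body has the class "archive". */
  body.archive h1 { color: gray; }
</style>
</head>
<body class="archive">
<h1>Document</h1>
<script>
  // Adjust the page title only on archive pages.
  if (document.body.className === "archive") {
    document.title = "Archive: " + document.title;
  }
</script>
</body>
</html>
```

Pages on the same site that do not set the class are unaffected by either the style rule or the script.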
[Maybe mention/refer to document.body here]
section element
Sectioning block-level element.
style elements, followed by zero or more
block-level elements.
HTMLElement.
The section element represents a
generic document or application section. A section, in this context, is a
thematic grouping of content, typically with a header, possibly with a
footer. Examples of sections would be chapters, the various tabbed pages
in a tabbed dialog box, or the numbered sections of a thesis. A Web site's
home page could be split into sections for an introduction, news items
and contact information.
Example
<body>
<h1>Top Level Heading</h1>
<section>
<h1>Second Level Heading</h1>
<section>
<h1>Third Level Heading</h1>
</section>
</section>
</body>
text/html media type.
application/xhtml+xml or application/xml
[Need to expand this as required.]
The editor would like to thank:
[TODO]