text/html
This registration is for community review and will be submitted to the IESG for review, approval, and registration with IANA.
charset
The charset parameter may be provided to definitively specify the document's character encoding, overriding any character encoding declarations in the document. The parameter's value must be the name of the character encoding used to serialize the file, must be a valid character encoding name, and must be an ASCII case-insensitive match for the preferred MIME name for that encoding. [IANACHARSET]
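The sketch below shows one way to extract the charset parameter from a Content-Type value and check it against preferred MIME names using ASCII case-insensitive matching. Two assumptions apply. First, the hypothetical PREFERRED_MIME_NAMES table is a hand-picked subset, not the full IANA registry. Second, the parameter splitting is simplified and does not handle quoted values that contain ";".

```python
from typing import Optional

# Hypothetical, hand-picked subset of IANA "preferred MIME names".
# A real checker would load the full registry.
PREFERRED_MIME_NAMES = ("UTF-8", "ISO-8859-1", "Shift_JIS")


def ascii_lower(s: str) -> str:
    """Lowercase only ASCII A-Z, as ASCII case-insensitive matching requires."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def charset_param(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    _media_type, *params = content_type.split(";")
    for param in params:
        name, sep, value = param.strip().partition("=")
        if sep and ascii_lower(name.strip()) == "charset":
            return value.strip().strip('"')
    return None


def is_preferred_mime_name(label: str) -> bool:
    """True if label ASCII-case-insensitively matches a preferred MIME name."""
    return any(ascii_lower(label) == ascii_lower(n) for n in PREFERRED_MIME_NAMES)
```

With this table, "utf-8" passes because only letter case differs from "UTF-8". Labels such as "utf8" or "latin1" fail because they are not in the table.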
Entire novels have been written about the security considerations that apply to HTML documents. Many are listed in this document, to which the reader is referred for more details. Some general concerns bear mentioning here, however:
HTML is a scripted language and has a large number of APIs (some of which are described in this document). Script can expose the user to potential risks of information leakage, credential leakage, cross-site scripting attacks, cross-site request forgeries, and a host of other problems. While the designs in this specification are intended to be safe if implemented correctly, a full implementation is a massive undertaking and, as with any software, user agents are likely to have security bugs.
Even without scripting, there are specific features in HTML that, for historical reasons, are required for broad compatibility with legacy content but that expose the user to unfortunate security problems. In particular, the img element can be used in conjunction with some other features as a way to effect a port scan from the user's location on the Internet. This can expose local network topologies that the attacker would otherwise not be able to determine.
HTML relies on a compartmentalization scheme sometimes known as the same-origin policy. An origin in most cases consists of all the pages served from the same host, on the same port, using the same protocol.
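For the common case just described, an origin can be modeled as a (scheme, host, port) triple. The sketch below compares two URLs that way. It assumes only the http and https default ports (via the hypothetical DEFAULT_PORTS table) and ignores opaque origins, internationalized domain names, and other cases that a full origin algorithm handles.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: only these schemes' default ports are modeled.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme; .hostname is lowercased
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """Two URLs are same-origin when scheme, host, and port all match."""
    return origin(a) == origin(b)
```

For example, https://example.com/a and https://example.com:443/b are same-origin. Changing the scheme to http, the host to www.example.com, or the port to 8443 produces a different origin.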
It is critical, therefore, to ensure that any untrusted content that forms part of a site be hosted on a different origin than any sensitive content on that site. Untrusted content can easily spoof any other page on the same origin, read data from that origin, cause scripts in that origin to execute, submit forms to and from that origin even if they are protected from cross-site request forgery attacks by unique tokens, and make use of any third-party resources exposed to or rights granted to that origin.
"html" and "htm" are commonly, but certainly not exclusively, used as the extension for HTML documents.
TEXT
Fragment identifiers used with text/html resources refer to the indicated part of the document.
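As a simplified sketch, resolving a fragment to "the indicated part of the document" can be reduced to finding the first element whose id equals the percent-decoded fragment. The helper names here are hypothetical. The full algorithm in this specification handles further cases, such as a elements with a matching name attribute, and this sketch omits them.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import unquote


class _IdFinder(HTMLParser):
    """Records the tag name of the first element whose id equals `wanted`."""

    def __init__(self, wanted: str) -> None:
        super().__init__()
        self.wanted = wanted
        self.found: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if self.found is None and ("id", self.wanted) in attrs:
            self.found = tag


def fragment_target(html_text: str, fragment: str) -> Optional[str]:
    """Return the tag name of the element a fragment identifies, or None."""
    finder = _IdFinder(unquote(fragment))  # fragments arrive percent-encoded
    finder.feed(html_text)
    finder.close()
    return finder.found
```

For example, the fragment "caf%C3%A9" decodes to "café" and therefore matches an element with id="café".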
text/html-sandboxed
This registration is for community review and will be submitted to the IESG for review, approval, and registration with IANA.
text/html
text/html
The purpose of the text/html-sandboxed MIME type is to provide a way for content providers to indicate that they want the file to be interpreted in a manner that does not give the file's contents access to the rest of the site. This is achieved by assigning the Document objects generated from resources labeled as text/html-sandboxed unique origins.
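A unique origin can be modeled as an object that is equal only to itself. In the sketch below, a sandboxed document can never be same-origin with any other document, including another copy of itself loaded from the same URL. The names OpaqueOrigin and document_origin are hypothetical, and the function assumes mime_type is the bare type without parameters.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # assumption: only these schemes modeled


class OpaqueOrigin:
    """A unique origin: default object equality makes it equal only to itself."""

    def __repr__(self) -> str:
        return f"OpaqueOrigin(id={id(self):#x})"


def document_origin(url: str, mime_type: str):
    """Give text/html-sandboxed documents a fresh unique origin."""
    if mime_type.strip().lower() == "text/html-sandboxed":
        return OpaqueOrigin()
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)
```

Because each call returns a new OpaqueOrigin, any same-origin check that uses == fails for sandboxed documents.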
To avoid legacy user agents treating resources labeled as text/html-sandboxed as regular text/html files, authors should avoid using the .html or .htm extensions for resources labeled as text/html-sandboxed.
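One way a server could follow this advice is to rename files destined for text/html-sandboxed away from .html or .htm and register a distinct extension in its type map. The sketch assumes "sandboxed" as that extension, following this registration's file extension field. It uses Python's standard mimetypes module, and the helper name_for_sandboxed is hypothetical.

```python
import mimetypes
import posixpath

LEGACY_HTML_EXTENSIONS = {".html", ".htm"}

# Map the (assumed) ".sandboxed" extension to the sandboxed type.
mimetypes.add_type("text/html-sandboxed", ".sandboxed")


def name_for_sandboxed(filename: str) -> str:
    """Replace a .html/.htm extension so legacy agents won't treat it as text/html."""
    root, ext = posixpath.splitext(filename)
    if ext.lower() in LEGACY_HTML_EXTENSIONS:
        return root + ".sandboxed"
    return filename
```

After renaming, mimetypes.guess_type("page.sandboxed") yields "text/html-sandboxed", while .html and .htm continue to map to text/html.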
Furthermore, since the text/html-sandboxed MIME type impacts the origin security model, authors should be careful to prevent tampering with the MIME type labeling mechanism itself when documents are labeled as text/html-sandboxed. If an attacker can cause a file to be served as text/html instead of text/html-sandboxed, then the sandboxing will not take effect and a cross-site scripting attack will become possible.
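A minimal sketch of one defensive design: the server chooses the label from where the file is stored, never from anything the uploader controls. It also normalizes the path so that tricks such as "/static/../uploads/evil.html" cannot escape the sandboxed label. UNTRUSTED_ROOT and html_label_for are hypothetical names. The function assumes an already percent-decoded request path and applies only to HTML files.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical directory holding untrusted, user-supplied HTML.
UNTRUSTED_ROOT = PurePosixPath("/uploads")


def html_label_for(path: str) -> str:
    """Choose the Content-Type for an HTML file from server-side location only."""
    # Force a single leading slash, then collapse "." and ".." segments.
    normalized = PurePosixPath(posixpath.normpath("/" + path.lstrip("/")))
    if normalized.is_relative_to(UNTRUSTED_ROOT):  # Python 3.9+
        return "text/html-sandboxed"
    return "text/html"
```

Because the decision never consults request parameters or upload metadata, an attacker cannot ask for the text/html label directly.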
Beyond this, the type is identical to text/html, and the same considerations apply.
text/html
The text/html-sandboxed type asserts that the resource is an HTML document using the HTML syntax.
text/html
Resources labeled as text/html-sandboxed are heuristically indistinguishable from those labeled as text/html.
"sandboxed" is used as the file extension for resources labeled as text/html-sandboxed.
TEXT
Fragment identifiers used with text/html-sandboxed resources refer to the indicated part of the document.
application/xhtml+xml
This registration is for community review and will be submitted to the IESG for review, approval, and registration with IANA.
application/xml [RFC3023]
application/xml [RFC3023]
application/xml [RFC3023]
application/xml [RFC3023]
application/xml [RFC3023]
The application/xhtml+xml type asserts that the resource is an XML document that likely has a root element from the HTML namespace. As such, the relevant specifications are the XML specification, the Namespaces in XML specification, and this specification. [XML] [XMLNS]
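Because the type asserts only that the root element is likely from the HTML namespace, a consumer may want to check. The sketch below uses Python's standard ElementTree to parse the document as XML and tests whether the root element's namespace is the HTML namespace, http://www.w3.org/1999/xhtml. Any well-formedness error makes the check fail. The function name has_html_root is hypothetical, and the function is a heuristic check, not a validator.

```python
import xml.etree.ElementTree as ET

HTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def has_html_root(xml_text: str) -> bool:
    """True if xml_text is well-formed XML whose root is in the HTML namespace."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed XML
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    return root.tag.startswith("{" + HTML_NAMESPACE + "}")
```

A root html element with no xmlns declaration is in no namespace, so it does not pass.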
application/xml [RFC3023]
application/xml [RFC3023]
"xhtml" and "xht" are sometimes used as extensions for XML resources that have a root element from the HTML namespace.
TEXT
Fragment identifiers used with application/xhtml+xml resources have the same semantics as with any XML MIME type. [RFC3023]
text/cache-manifest
This registration is for community review and will be submitted to the IESG for review, approval, and registration with IANA.
Cache manifests themselves pose no immediate risk unless sensitive information is included within the manifest. Implementations, however, are required to follow specific rules when populating a cache based on a cache manifest, to ensure that certain origin-based restrictions are honored. Failure to correctly implement these rules can result in information leakage, cross-site scripting attacks, and the like.
Cache manifests begin with "CACHE MANIFEST", followed by either a U+0020 SPACE character, a U+0009 CHARACTER TABULATION (tab) character, a U+000A LINE FEED (LF) character, or a U+000D CARRIAGE RETURN (CR) character.
"appcache" is used as the file extension for cache manifests.
Fragment identifiers have no meaning with text/cache-manifest resources.
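The cache manifest magic number can be checked with a simple byte comparison. The sketch below tests only the byte pattern given above: "CACHE MANIFEST" followed by a space, tab, LF, or CR. It does not implement the rest of manifest parsing, and the function name is hypothetical.

```python
MAGIC = b"CACHE MANIFEST"
ALLOWED_NEXT = (b" ", b"\t", b"\n", b"\r")


def looks_like_cache_manifest(data: bytes) -> bool:
    """True if data starts with the cache manifest magic number."""
    if not data.startswith(MAGIC):
        return False  # the comparison is case-sensitive
    # The byte after the magic string must be space, tab, LF, or CR.
    return data[len(MAGIC):len(MAGIC) + 1] in ALLOWED_NEXT
```

Under this check, "CACHE MANIFESTO" and a bare "CACHE MANIFEST" with nothing after it are both rejected.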