org.w3c.mwi.mobileok.basic
Class XhtmlContent

java.lang.Object
  extended by org.w3c.mwi.mobileok.basic.DecodedContent
      extended by org.w3c.mwi.mobileok.basic.TextContent
          extended by org.w3c.mwi.mobileok.basic.XhtmlContent

public class XhtmlContent
extends TextContent

Represents an XHTML resource.

The class extends TextContent to add XHTML markup validation.

When the resource is not valid XHTML, a tidied version of it, created by a Parser, is used instead. Tidying the resource may not always be possible.

Version:
$Revision: 1.1 $
Author:
The W3C mobileOK Checker Task Force

Nested Class Summary
private static class XhtmlContent.CommonCatalogResolver
           
private static class XhtmlContent.MinimizeHandler
           
private  class XhtmlContent.XHTMLBasicCatalogResolver
           
private static class XhtmlContent.XHTMLCatalogResolver
           
private  class XhtmlContent.XHTMLMPCatalogResolver
           
 
Field Summary
private  java.net.URI baseUri
          Base URI of the HTML document.
private static java.util.regex.Pattern commentPattern
          Regular expression to extract comments.
private static java.util.regex.Pattern doctypePattern
          Regular expression to extract a generic DOCTYPE declaration.
private  org.w3c.dom.Document dom
          Resource content represented as a DOM tree.
private  int extraneousChars
          Total number of ignorable whitespace and comment characters.
private  boolean hasXmlDeclaration
          True when the document contains an XML declaration.
private  boolean hasXmlNamespace
          True when the document defines the XHTML namespace.
private static java.util.regex.Pattern htmlDoctypePattern
          Regular expression to extract an HTML DOCTYPE declaration.
private static java.util.regex.Pattern htmlRootPattern
          Regular expression to extract the html root element.
private  java.util.List<ValidationLineAndColumnMessage> markupErrorMessageList
          List of markup validation errors when the document is validated against its declared DTD.
private  XHTMLValidationStatus markupValidationStatus
          Final validation status when the document is validated against its declared DTD.
private  java.util.List<ValidationLineAndColumnMessage> mobileErrorMessageList
          List of markup validation errors when the document is validated against the XHTML Basic 1.1 DTD (or XHTML MP 1.2 DTD).
private  XHTMLValidationStatus mobileValidationStatus
          Final validation status when the document is validated against the XHTML Basic 1.1 DTD (or XHTML MP 1.2 DTD).
private  java.lang.String publicDoctype
          Public ID of the DOCTYPE declaration.
private  int rootElementLine
          Index of the line that contains the html element declaration, 0 when the element is not defined.
private  java.lang.String systemDoctype
          System ID of the DOCTYPE declaration.
private  int totalChars
          Total number of characters.
private static java.util.regex.Pattern xmlDeclaration
          Regular expression to extract the XML declaration line.
private static java.util.regex.Pattern xmlNamespace
          Regular expression to check that the XHTML namespace is defined on the html element.
 
Constructor Summary
XhtmlContent(java.net.URI uri, java.util.List<RetrievalElement> retrieved)
          Creates a class instance bound to a URI.
 
Method Summary
static int countExtraneousChars(char[] ch, int start, int length)
          Returns the number of redundant whitespace characters in the given character array.
private static int findRootElementLine(java.lang.String body)
          Computes the index of the line that contains the html declaration in the resource's content.
 java.net.URI getBaseUri()
          Returns the base URI of the document, i.e. either the URI of the document itself or the URI defined in a base element.
 org.w3c.dom.Document getDOM()
          Resource content represented as a DOM tree.
 int getExtraneousChars()
          Returns the total number of ignorable whitespace and comment characters.
 XHTMLValidationStatus getMarkupValid()
          Returns the final result of the markup validation against the document's declared DTD.
 java.util.List<ValidationLineAndColumnMessage> getMobileErrorMessageList()
          Returns the list of markup validation errors against the XHTML Basic 1.1 or XHTML MP 1.2 DTD.
 XHTMLValidationStatus getMobileValid()
          Returns the markup validation status against the XHTML Basic 1.1 or XHTML MP 1.2 DTD.
 java.lang.String getPublicDoctype()
          Returns the public ID of the resource's DOCTYPE.
 int getRootElementLine()
          Returns the index of the line in the resource content that contains the html declaration.
 org.w3c.dom.Document getSaxonDocument()
          Returns a Saxon-compliant representation of the resource's DOM document, to take advantage of Saxon's features, in particular its ability to keep line numbers even when the document is represented as a DOM tree.
 java.lang.String getSystemDoctype()
          Returns the system ID of the resource's DOCTYPE.
 int getTotalChars()
          Returns the total number of characters in the resource's content.
 java.util.List<ValidationLineAndColumnMessage> getXHTMLErrorMessageList()
          Returns the list of markup validation errors against the document's declared DTD.
 boolean hasXmlDeclaration()
          Returns true when the resource contains an XML declaration.
 boolean hasXmlNamespace()
          Returns true when the resource references the XHTML namespace in the html declaration.
private static org.w3c.dom.Document parseDOM(java.lang.String body)
          Parses the given string as XML and returns the corresponding DOM Document.
private static org.w3c.dom.Document parseTidiedDOM(java.lang.String body)
          Parses the given string as possibly invalid XML and returns the corresponding DOM Document.
private  boolean setHasXmlDeclaration()
          Parses the beginning of the document in search of an XML declaration.
private  boolean setHasXmlNamespace()
          Parses the beginning of the document in search of an XHTML namespace declaration.
 org.w3c.dom.Node toMokiNode(org.w3c.dom.Document document, org.w3c.dom.Node parent)
          Serializes the content to its moki representation as a DOM node.
private  XHTMLValidationStatus validateMarkup()
          Validates the document against its declared DOCTYPE, when known.
private  XHTMLValidationStatus validateMobile()
          Validates the document against mobileOK recommended mobile DTDs.
 
Methods inherited from class org.w3c.mwi.mobileok.basic.TextContent
getBody, getUTF8ErrorMessageList, isValid
 
Methods inherited from class org.w3c.mwi.mobileok.basic.DecodedContent
addByteErrorMessages, addLineAndColumnMessages
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

xmlDeclaration

private static final java.util.regex.Pattern xmlDeclaration
Regular expression to extract the XML declaration line.
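The pattern itself is not shown on this page. As a hypothetical sketch, assuming the declaration must be the very first thing in the decoded content (as XML 1.0 requires) and must carry a version pseudo-attribute, such a pattern and a check built on it could look like this. XmlDeclarationSketch and hasXmlDeclaration() are illustrative names, not the checker's code:

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the checker's actual xmlDeclaration pattern.
class XmlDeclarationSketch {
    // "<?xml", whitespace, a version pseudo-attribute (1.x, either quote style),
    // then anything up to the closing "?>", anchored at the start of the content.
    private static final Pattern XML_DECLARATION =
            Pattern.compile("^<\\?xml\\s+version\\s*=\\s*([\"'])1\\.\\d+\\1[^>]*\\?>");

    static boolean hasXmlDeclaration(String body) {
        return XML_DECLARATION.matcher(body).find();
    }
}
```

A declaration preceded by whitespace is deliberately not matched, since XML 1.0 does not allow anything before it.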


xmlNamespace

private static final java.util.regex.Pattern xmlNamespace
Regular expression to check that the XHTML namespace is defined on the html element.
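A hypothetical sketch of such a check, assuming the namespace is declared as a default xmlns attribute on an unprefixed html start tag, with either quote style. XhtmlNamespaceSketch and hasXhtmlNamespace() are illustrative names, not the checker's actual pattern:

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the checker's actual xmlNamespace pattern.
class XhtmlNamespaceSketch {
    // "<html" as a whole word, any attributes, then a default namespace
    // declaration xmlns="http://www.w3.org/1999/xhtml" (either quote style).
    private static final Pattern XHTML_NAMESPACE = Pattern.compile(
            "<html\\b[^>]*\\sxmlns\\s*=\\s*([\"'])http://www\\.w3\\.org/1999/xhtml\\1");

    static boolean hasXhtmlNamespace(String body) {
        return XHTML_NAMESPACE.matcher(body).find();
    }
}
```

A prefixed declaration such as xmlns:x="http://www.w3.org/1999/xhtml" is not accepted, because it does not place the unprefixed html element itself in the XHTML namespace.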


commentPattern

private static final java.util.regex.Pattern commentPattern
Regular expression to extract comments.
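A non-greedy pattern compiled with DOTALL is one plausible way to extract comments. The sketch below also totals the characters found in comments, assuming the comment delimiters count toward that total; how the checker actually feeds comments into extraneousChars is not documented here, and CommentSketch and commentChars() are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the checker's actual commentPattern.
class CommentSketch {
    // Non-greedy so that each comment ends at the first "-->";
    // DOTALL lets a comment span several lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Total number of characters, delimiters included, found in comments.
    static int commentChars(String body) {
        int total = 0;
        Matcher m = COMMENT.matcher(body);
        while (m.find()) {
            total += m.end() - m.start();
        }
        return total;
    }
}
```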


htmlDoctypePattern

private static final java.util.regex.Pattern htmlDoctypePattern
Regular expression to extract an HTML DOCTYPE declaration.


htmlRootPattern

private static final java.util.regex.Pattern htmlRootPattern
Regular expression to extract the html root element.


doctypePattern

private static final java.util.regex.Pattern doctypePattern
Regular expression to extract a generic DOCTYPE declaration.
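A hypothetical sketch of a generic DOCTYPE pattern that captures the public and system identifiers later returned by getPublicDoctype() and getSystemDoctype(). It assumes the DOCTYPE keyword is written in upper case, as XML requires, and treats the system literal as optional after PUBLIC, as HTML 4 allows. DoctypeSketch and ids() are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the checker's actual doctypePattern.
class DoctypeSketch {
    // Groups: 1 root name, 3 public ID, 5 system ID after PUBLIC, 7 system ID after SYSTEM.
    // Groups 2, 4 and 6 capture the quote character so that it can be matched again.
    private static final Pattern DOCTYPE = Pattern.compile(
            "<!DOCTYPE\\s+(\\S+)\\s+(?:PUBLIC\\s+([\"'])(.*?)\\2(?:\\s+([\"'])(.*?)\\4)?"
                    + "|SYSTEM\\s+([\"'])(.*?)\\6)");

    // Returns {publicId, systemId}; either entry is null when absent.
    static String[] ids(String body) {
        Matcher m = DOCTYPE.matcher(body);
        if (!m.find()) {
            return new String[] {null, null};
        }
        if (m.group(3) != null) {
            return new String[] {m.group(3), m.group(5)};
        }
        return new String[] {null, m.group(7)};
    }
}
```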


dom

private org.w3c.dom.Document dom
Resource content represented as a DOM tree.

The DOM tree may represent a tidied version of the content. If null, it means the content could not be tidied.


rootElementLine

private int rootElementLine
Index of the line that contains the html element declaration, 0 when the element is not defined.


baseUri

private java.net.URI baseUri
Base URI of the HTML document.

Equals the URI of the resource unless a base element redefines it in the head section.
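Resolving relative references against that base can be done with java.net.URI. A minimal sketch, assuming the href of the base element, if any, has already been extracted from the head section; BaseUriSketch.baseUri() is a hypothetical helper, not this class's getBaseUri():

```java
import java.net.URI;

// Hypothetical sketch of how a base URI can be computed and used.
class BaseUriSketch {
    // Returns the document URI, unless a base element provides a non-empty href.
    static URI baseUri(URI documentUri, String baseHref) {
        if (baseHref == null || baseHref.trim().isEmpty()) {
            return documentUri;
        }
        // A relative base href is itself resolved against the document URI.
        return documentUri.resolve(baseHref.trim());
    }
}
```

Relative links in the document are then resolved with baseUri(...).resolve(href).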


mobileErrorMessageList

private java.util.List<ValidationLineAndColumnMessage> mobileErrorMessageList
List of markup validation errors when the document is validated against the XHTML Basic 1.1 DTD (or XHTML MP 1.2 DTD).


markupErrorMessageList

private java.util.List<ValidationLineAndColumnMessage> markupErrorMessageList
List of markup validation errors when the document is validated against its declared DTD.


markupValidationStatus

private XHTMLValidationStatus markupValidationStatus
Final validation status when the document is validated against its declared DTD.


mobileValidationStatus

private XHTMLValidationStatus mobileValidationStatus
Final validation status when the document is validated against the XHTML Basic 1.1 DTD (or XHTML MP 1.2 DTD).


hasXmlDeclaration

private boolean hasXmlDeclaration
True when the document contains an XML declaration.


hasXmlNamespace

private boolean hasXmlNamespace
True when the document defines the XHTML namespace.


publicDoctype

private java.lang.String publicDoctype
Public ID of the DOCTYPE declaration.


systemDoctype

private java.lang.String systemDoctype
System ID of the DOCTYPE declaration.


extraneousChars

private int extraneousChars
Total number of ignorable whitespace and comment characters.


totalChars

private int totalChars
Total number of characters.

Constructor Detail

XhtmlContent

public XhtmlContent(java.net.URI uri,
                    java.util.List<RetrievalElement> retrieved)
             throws TestException
Creates a class instance bound to a URI.

The content is validated while it is instantiated.

Parameters:
uri - absolute URI of the resource.
retrieved - the retrieved representation of the resource.
Throws:
TestException - an unexpected error occurred.

Method Detail

parseDOM

private static org.w3c.dom.Document parseDOM(java.lang.String body)
                                      throws TestException
Parses the given string as XML and returns the corresponding DOM Document.

Parameters:
body - XML string to parse.
Returns:
the DOM Document representation of the XML string, null when the string is not valid XML.
Throws:
TestException - an unexpected error occurred.
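A minimal sketch of such a method built on the JDK's own DOM parser, under these assumptions: a parse failure maps to null, any external DTD is replaced with an empty one so that parsing never touches the network, and unexpected errors are wrapped in unchecked exceptions instead of TestException, which is not reproduced here. ParseDomSketch is a hypothetical name:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of a parseDOM-like helper; not the checker's implementation.
class ParseDomSketch {
    static Document parseDOM(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Replace any external DTD with an empty one so parsing stays offline.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler ignores warnings and recoverable errors and rethrows
            // fatal (well-formedness) errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(body)));
        } catch (SAXException e) {
            return null; // not well-formed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the DTD is not read, a document that relies on entities defined there, such as &nbsp;, may be rejected by this sketch; a catalog resolver like those listed in the Nested Class Summary could supply local DTD copies instead.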

parseTidiedDOM

private static org.w3c.dom.Document parseTidiedDOM(java.lang.String body)
                                            throws TestException
Parses the given string as possibly invalid XML and returns the corresponding DOM Document.

Attempts are made to tidy up the string to make it valid. Tidying is not guaranteed to succeed, so parsing may not be possible.

Parameters:
body - possibly invalid XML string to parse.
Returns:
the DOM Document representation of the XML string once tidied, null when the string could not be tidied.
Throws:
TestException - an unexpected error occurred.
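The fallback described in the class introduction (parse strictly, tidy only when needed) can be sketched as follows. The tidier is a hypothetical UnaryOperator&lt;String&gt; standing in for the Parser that the real class uses, whose API is not documented on this page; it returns null when tidying is impossible. TidyFallbackSketch and its methods are illustrative names:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the strict-parse-then-tidy fallback.
class TidyFallbackSketch {
    // Strict parse: null when the string is not well-formed XML.
    static Document parseOrNull(String body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // silent; fatal errors are rethrown
            return builder.parse(new InputSource(new StringReader(body)));
        } catch (SAXException e) {
            return null;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses body as is when possible; otherwise tidies it with the hypothetical
    // tidier and parses the result. Returns null when tidying fails or still
    // does not yield well-formed XML.
    static Document parseTidied(String body, UnaryOperator<String> tidier) {
        Document strict = parseOrNull(body);
        if (strict != null) {
            return strict;
        }
        String tidied = tidier.apply(body);
        return tidied == null ? null : parseOrNull(tidied);
    }
}
```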

getDOM

public org.w3c.dom.Document getDOM()
Resource content represented as a DOM tree.

If null, it means the content could not be parsed (e.g. because it is malformed).

Returns:
a DOM tree representing the resource content.

getRootElementLine

public int getRootElementLine()
Returns the index of the line in the resource content that contains the html declaration.

Returns:
index of the html line, 0 if not found.

getXHTMLErrorMessageList

public java.util.List<ValidationLineAndColumnMessage> getXHTMLErrorMessageList()
Returns the list of markup validation errors against the document's declared DTD.

Returns:
a list of markup validation errors with their position in the content.

getMobileErrorMessageList

public java.util.List<ValidationLineAndColumnMessage> getMobileErrorMessageList()
Returns the list of markup validation errors against the XHTML Basic 1.1 or XHTML MP 1.2 DTD.

Returns:
a list of markup validation errors with their position in the content.

getMarkupValid

public XHTMLValidationStatus getMarkupValid()
Returns the final result of the markup validation against the document's declared DTD.

Returns:
the final XHTML validation status.

getMobileValid

public XHTMLValidationStatus getMobileValid()
Returns the markup validation status against the XHTML Basic 1.1 or XHTML MP 1.2 DTD.

Returns:
the final XHTML validation status.

getPublicDoctype

public java.lang.String getPublicDoctype()
Returns the public ID of the resource's DOCTYPE.

Returns:
the public DOCTYPE ID, null if not defined.

getSystemDoctype

public java.lang.String getSystemDoctype()
Returns the system ID of the resource's DOCTYPE.

Returns:
the system DOCTYPE ID, null if not defined.

hasXmlDeclaration

public boolean hasXmlDeclaration()
Returns true when the resource contains an XML declaration.

Returns:
true when the XML declaration could be found, false otherwise.

toMokiNode

public org.w3c.dom.Node toMokiNode(org.w3c.dom.Document document,
                                   org.w3c.dom.Node parent)
Serializes the content to its moki representation as a DOM node.

Overrides:
toMokiNode in class TextContent
Parameters:
document - DOM document the created node should belong to.
parent - DOM node to which the representation should be appended.
Returns:
the moki representation of the resource as a DOM node.

getTotalChars

public int getTotalChars()
Returns the total number of characters in the resource's content.

Returns:
the number of characters in the retrieved document.

getExtraneousChars

public int getExtraneousChars()
Returns the total number of ignorable whitespace and comment characters.

Returns:
the number of ignorable whitespace and comment characters in the retrieved document.

hasXmlNamespace

public boolean hasXmlNamespace()
Returns true when the resource references the XHTML namespace in the html declaration.

Returns:
true when the html element contains the namespace, false otherwise.

getBaseUri

public java.net.URI getBaseUri()
Returns the base URI of the document, i.e. either the URI of the document itself or the URI defined in a base element.

Returns:
the base URI against which relative URIs should be resolved.

validateMarkup

private XHTMLValidationStatus validateMarkup()
                                      throws TestException
Validates the document against its declared DOCTYPE, when known.

The list of well-known DTDs is managed in a catalog resolver (see ExtendedCatalogResolver). XHTMLValidationStatus.NOT_VALIDATED is returned when the DTD is unknown or not defined.

Validation errors are kept in this instance for later retrieval through a call to getXHTMLErrorMessageList().

Returns:
the result of the markup validation of the document, XHTMLValidationStatus.NOT_VALIDATED when the DTD is unknown or not defined.
Throws:
TestException - an unexpected error occurred.
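A minimal sketch of DTD validation through a catalog, using the JDK's validating DOM parser. Assumptions: the catalog is a hypothetical in-memory map from public IDs to DTD text, standing in for ExtendedCatalogResolver; errors are kept as plain "line:column message" strings instead of ValidationLineAndColumnMessage; and the VALID and INVALID status names are illustrative, since only NOT_VALIDATED appears on this page. DtdValidationSketch is a hypothetical name:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; not the checker's validateMarkup().
class DtdValidationSketch {
    enum Status { VALID, INVALID, NOT_VALIDATED }

    static Status validate(String body, String publicId,
                           Map<String, String> catalog, List<String> errors) {
        if (publicId == null || !catalog.containsKey(publicId)) {
            return Status.NOT_VALIDATED; // unknown or missing DTD
        }
        List<String> found = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve DTDs from the in-memory catalog instead of the network.
            builder.setEntityResolver((pub, sys) -> new InputSource(
                    new StringReader(pub == null ? "" : catalog.getOrDefault(pub, ""))));
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) { // validity error: record and continue
                    found.add(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(body)));
        } catch (SAXParseException e) { // well-formedness error: parsing stops
            found.add(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
        } catch (SAXException e) {
            found.add(e.getMessage());
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        errors.addAll(found);
        return found.isEmpty() ? Status.VALID : Status.INVALID;
    }
}
```

Validity errors, such as a missing required child, are reported through error() and parsing continues, so several can be collected; a well-formedness error stops the parse and is recorded as the last entry.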

validateMobile

private XHTMLValidationStatus validateMobile()
                                      throws TestException
Validates the document against mobileOK recommended mobile DTDs.

The method attempts to validate the document against the XHTML Basic 1.1 DTD. If the validation fails, it then tries to validate the document against the XHTML MP 1.2 DTD.

Validation errors are kept in this instance for later retrieval through a call to getMobileErrorMessageList(). The errors in that list are those reported by the validation against the XHTML Basic 1.1 DTD.

NB: apart from the start attribute of the ol element, XHTML Basic 1.1 is a superset of XHTML MP 1.2, so the validation against XHTML MP 1.2 is performed solely to account for that one exception.

Returns:
the result of the markup validation of the document against the XHTML Basic 1.1 and XHTML MP 1.2 DTDs.
Throws:
TestException - an unexpected error occurred.
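The two-step strategy can be sketched independently of any particular parser. DtdValidator, its validate() method and the Status values below are hypothetical; any DTD validator could be plugged in:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the XHTML Basic 1.1 then XHTML MP 1.2 fallback.
class MobileValidationSketch {
    enum Status { VALID, INVALID }

    // Validates body against one DTD and appends any errors to the given list.
    interface DtdValidator {
        Status validate(String body, List<String> errors);
    }

    // Tries XHTML Basic 1.1 first, then XHTML MP 1.2. Only the Basic 1.1 errors
    // are kept, as described for getMobileErrorMessageList().
    static Status validateMobile(String body, DtdValidator basic11, DtdValidator mp12,
                                 List<String> basicErrors) {
        if (basic11.validate(body, basicErrors) == Status.VALID) {
            return Status.VALID;
        }
        // Per this page, MP 1.2 differs only by allowing the start attribute of ol,
        // so this second pass rescues documents that use it; its errors are discarded.
        return mp12.validate(body, new ArrayList<>());
    }
}
```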

findRootElementLine

private static int findRootElementLine(java.lang.String body)
Computes the index of the line that contains the html declaration in the resource's content.

Parameters:
body - content of the resource to parse
Returns:
the index of the line with the html declaration, 0 if not found.
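A hypothetical sketch, assuming lines are numbered from 1 (so that 0 can mean "not found"), that lines end with \n or \r\n, and that the first "&lt;html" followed by whitespace, "&gt;" or "/" is the root element. RootLineSketch is an illustrative name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the checker's findRootElementLine().
class RootLineSketch {
    // Start tag of the html element: "<html" followed by whitespace, ">" or "/".
    private static final Pattern HTML_ROOT = Pattern.compile("<html[\\s>/]");

    // Returns the 1-based line number of the html start tag, 0 when absent.
    static int findRootElementLine(String body) {
        Matcher m = HTML_ROOT.matcher(body);
        if (!m.find()) {
            return 0;
        }
        int line = 1;
        for (int i = 0; i < m.start(); i++) {
            if (body.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }
}
```

A commented-out &lt;html&gt; tag placed before the real root would fool this simple scan; a more careful implementation could strip comments first.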

getSaxonDocument

public org.w3c.dom.Document getSaxonDocument()
                                      throws TestException
Returns a Saxon-compliant representation of the resource's DOM document, to take advantage of Saxon's features, in particular its ability to keep line numbers even when the document is represented as a DOM tree.

Returns:
the Saxon representation of the DOM document.
Throws:
TestException - an unexpected error occurred while building the document.
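Saxon is a third-party library and this page does not show how getSaxonDocument() calls it, so the sketch below swaps in a different, JDK-only technique with the same goal: it builds a DOM tree from SAX events and records each element's line number as DOM user data. LineNumberDomSketch, parseWithLines(), lineOf() and the "line" user-data key are hypothetical; comments and processing instructions are dropped for brevity:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a DOM tree whose elements remember their source line.
class LineNumberDomSketch {
    static final String LINE_KEY = "line"; // hypothetical user-data key

    // Returns null when the string is not well-formed XML.
    static Document parseWithLines(String body) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Deque<Node> open = new ArrayDeque<>(); // currently open nodes, innermost first
            open.push(doc);
            DefaultHandler handler = new DefaultHandler() {
                private Locator locator;

                @Override
                public void setDocumentLocator(Locator locator) {
                    this.locator = locator;
                }

                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    return new InputSource(new StringReader("")); // keep the sketch offline
                }

                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    Element element = doc.createElement(qName);
                    for (int i = 0; i < atts.getLength(); i++) {
                        element.setAttribute(atts.getQName(i), atts.getValue(i));
                    }
                    // The locator points at the end of the start tag being reported.
                    element.setUserData(LINE_KEY, locator.getLineNumber(), null);
                    open.peek().appendChild(element);
                    open.push(element);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    open.pop();
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
                }
            };
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(body)), handler);
            return doc;
        } catch (SAXException e) {
            return null; // fatal errors are rethrown by DefaultHandler
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int lineOf(Element element) {
        Object line = element.getUserData(LINE_KEY);
        return line instanceof Integer ? (Integer) line : 0;
    }
}
```

Because the SAX Locator reports where each start tag ends, an element whose start tag spans several lines is attributed to its last line.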

setHasXmlDeclaration

private boolean setHasXmlDeclaration()
                              throws java.io.IOException
Parses the beginning of the document in search of an XML declaration.

Returns:
true when the document contains an XML declaration, false otherwise.
Throws:
java.io.IOException - the document body could not be read.

setHasXmlNamespace

private boolean setHasXmlNamespace()
                            throws java.io.IOException
Parses the beginning of the document in search of an XHTML namespace declaration.

Returns:
true when the document contains an XHTML namespace declaration, false otherwise.
Throws:
java.io.IOException - the document body could not be read.

countExtraneousChars

public static int countExtraneousChars(char[] ch,
                                       int start,
                                       int length)
Returns the number of redundant whitespace characters in the given character array.

Parameters:
ch - the character array to scan.
start - index at which the scan begins (must be greater than or equal to 0).
length - number of characters to scan (start + length must be less than or equal to the length of the array).
Returns:
the number of redundant whitespace characters in the scanned range of the array.
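The signature mirrors the SAX callback characters(char[] ch, int start, int length), which suggests, though this page does not confirm it, that the method is fed text chunks by a SAX handler such as the nested MinimizeHandler. A minimal sketch, assuming that XML whitespace (space, tab, CR, LF) is meant and that in each run of whitespace every character after the first is redundant; ExtraneousCharsSketch is a hypothetical name:

```java
// Hypothetical sketch; the counting rule is an assumption, not the checker's code.
class ExtraneousCharsSketch {
    // XML 1.0 whitespace characters (production S).
    private static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Counts whitespace characters that directly follow another whitespace
    // character within ch[start .. start + length - 1].
    static int countExtraneousChars(char[] ch, int start, int length) {
        int count = 0;
        boolean previousWasWhitespace = false;
        for (int i = start; i < start + length; i++) {
            boolean whitespace = isXmlWhitespace(ch[i]);
            if (whitespace && previousWasWhitespace) {
                count++;
            }
            previousWasWhitespace = whitespace;
        }
        return count;
    }
}
```

Because a run of whitespace can be split across two SAX callbacks, a handler using this rule would need to carry the previous chunk's final state into the next call to count such runs exactly.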