org.w3c.mwi.mobileok.basic
Class TextContent

java.lang.Object
  extended by org.w3c.mwi.mobileok.basic.DecodedContent
      extended by org.w3c.mwi.mobileok.basic.TextContent
Direct Known Subclasses:
CssContent, XhtmlContent

public class TextContent
extends DecodedContent

Base class that represents UTF-8 encoded text content.

The class extends DecodedContent and adds UTF-8 decoding functionality. The isValid() method returns true when the resource is correctly encoded in UTF-8, false otherwise. The getUTF8ErrorMessageList() method can be used to retrieve the list of UTF-8 validation errors.

Attempts are made to decode HTTP response bodies that are not encoded in UTF-8. There is no guarantee that decoding will succeed: the list of supported encodings depends on the system and on the version of the Java platform on which the code runs.
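
The following is a minimal sketch of the two ideas described above: strict UTF-8 validation, and a best-effort decode with another charset that may or may not be available on the running JVM. The class Utf8Check and its methods are hypothetical names chosen for illustration; they are not part of this library, and the actual implementation of TextContent may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the mobileOK library.
final class Utf8Check {

    /** Returns true when the bytes form a well-formed UTF-8 sequence. */
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes with the named charset, or returns null when this JVM does not support it. */
    static String decodeOrNull(byte[] bytes, String charsetName) {
        try {
            if (!Charset.isSupported(charsetName)) {
                return null; // support depends on the platform and Java version
            }
        } catch (IllegalArgumentException e) {
            return null; // null or syntactically illegal charset name
        }
        return new String(bytes, Charset.forName(charsetName));
    }
}
```

Checking Charset.isSupported before decoding keeps the "no guarantee" case explicit: an unsupported encoding yields null rather than an exception.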

Version:
$Revision: 1.1 $
Author:
Francois Daoust

Field Summary
private  java.lang.String body
          Decoded body.
private static java.nio.charset.Charset UTF8_CHARSET
          The UTF-8 charset.
private  java.util.List<ValidationByteMessage> utf8ErrorMessageList
          List of UTF-8 validation errors.
private  boolean validUTF8
          True when the resource is valid UTF-8.
 
Constructor Summary
TextContent(java.net.URI uri, java.util.List<RetrievalElement> retrieved)
          Creates a class instance bound to a URI.
 
Method Summary
private  void decodeBodyAsUTF8(RetrievalElement retrieved)
          Decodes the resource content as a UTF-8 encoded stream.
 java.lang.String getBody()
          Returns the decoded resource content.
 java.util.List<ValidationByteMessage> getUTF8ErrorMessageList()
          Returns the list of UTF-8 validation errors that occurred while trying to decode the resource content.
private  java.lang.String getXMLPrologEncoding(RetrievalElement retrieved)
          Extracts the encoding from the XML declaration.
 boolean isValid()
          Returns true when the resource is correctly encoded in UTF-8, false otherwise.
 org.w3c.dom.Node toMokiNode(org.w3c.dom.Document document, org.w3c.dom.Node parent)
          Serializes the content to its moki representation as a DOM node. In practice, returns null because there is no defined moki representation for generic text content.
private  void validateUTF8(RetrievalElement retrieved)
          Validates the UTF-8 encoding of the received HTTP response.
 
Methods inherited from class org.w3c.mwi.mobileok.basic.DecodedContent
addByteErrorMessages, addLineAndColumnMessages
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UTF8_CHARSET

private static final java.nio.charset.Charset UTF8_CHARSET
The UTF-8 charset.


utf8ErrorMessageList

private java.util.List<ValidationByteMessage> utf8ErrorMessageList
List of UTF-8 validation errors.


body

private java.lang.String body
Decoded body.


validUTF8

private boolean validUTF8
True when the resource is valid UTF-8.

Constructor Detail

TextContent

public TextContent(java.net.URI uri,
                   java.util.List<RetrievalElement> retrieved)
            throws TestException
Creates a class instance bound to a URI.

The content is validated while it is instantiated.

Parameters:
uri - absolute URI of the resource.
retrieved - the retrieved representation of the resource.
Throws:
TestException - an unexpected error occurred while retrieving the resource.
Method Detail

getUTF8ErrorMessageList

public final java.util.List<ValidationByteMessage> getUTF8ErrorMessageList()
Returns the list of UTF-8 validation errors that occurred while trying to decode the resource content.

Returns:
a non-null list of UTF-8 error messages.

getBody

public final java.lang.String getBody()
Returns the decoded resource content.

Returns:
the body of the resource.

decodeBodyAsUTF8

private void decodeBodyAsUTF8(RetrievalElement retrieved)
Decodes the resource content as a UTF-8 encoded stream.

The decoder attempts to recover from encoding errors, but recovery is not always possible. When it fails, getBody() returns null.

Parameters:
retrieved - the last retrieved exchange that represents the resource.
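
As an illustration of what recovering from encoding errors can look like, the sketch below decodes bytes as UTF-8, records the byte offset of each malformed sequence, and substitutes U+FFFD before carrying on. RecoveringUtf8Decoder and errorOffsets are hypothetical names; this documentation does not show how decodeBodyAsUTF8 is actually implemented, so treat this as one possible approach rather than the library's own.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the mobileOK library.
final class RecoveringUtf8Decoder {

    /** Byte offsets at which a malformed sequence was found. */
    final List<Integer> errorOffsets = new ArrayList<>();

    /** Decodes as UTF-8, replacing each malformed sequence with U+FFFD. */
    String decode(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, replacements included.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);

        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The input position sits at the start of the offending bytes.
                errorOffsets.add(in.position());
                in.position(in.position() + result.length());
                out.put('\uFFFD');
            } else if (result.isUnderflow()) {
                break; // all input consumed
            } else {
                return null; // overflow: not expected with the buffer sized above
            }
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }
}
```

Reporting errors and skipping them manually, instead of using CodingErrorAction.REPLACE, is what makes the byte offsets available for error messages.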

getXMLPrologEncoding

private java.lang.String getXMLPrologEncoding(RetrievalElement retrieved)
                                       throws TestException
Extracts the encoding from the XML declaration.

Example: the method returns "utf-8" if the XML declaration is:
<?xml version="1.0" encoding="utf-8"?>

See http://www.w3.org/TR/REC-xml/#sec-guessing. This method supports most of this heuristic.

Parameters:
retrieved - the last retrieved exchange that represents the resource.
Returns:
the encoding defined in the XML declaration, or null when the XML declaration is not found or does not contain an encoding attribute.
Throws:
TestException
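
The extraction described above can be sketched as follows. This simplified version only handles ASCII-compatible encodings (with an optional UTF-8 byte order mark) and does not cover the UTF-16 or EBCDIC cases of the heuristic in the XML specification. XmlPrologSniffer and prologEncoding are hypothetical names, and the sketch works on raw bytes rather than on a RetrievalElement, whose API is not documented here.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the mobileOK library.
final class XmlPrologSniffer {

    // Matches an XML declaration at the very start of the text and captures
    // the value of the optional encoding pseudo-attribute (either quote style).
    private static final Pattern XML_DECL = Pattern.compile(
        "^<\\?xml\\s+version\\s*=\\s*(?:\"[^\"]*\"|'[^']*')"
        + "(?:\\s+encoding\\s*=\\s*(?:\"([A-Za-z][A-Za-z0-9._-]*)\"|'([A-Za-z][A-Za-z0-9._-]*)'))?");

    /** Returns the declared encoding, or null when there is none. */
    static String prologEncoding(byte[] bytes) {
        int start = 0;
        // Skip a UTF-8 byte order mark if present.
        if (bytes.length >= 3 && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB && (bytes[2] & 0xFF) == 0xBF) {
            start = 3;
        }
        // The declaration is ASCII-only, so a byte-per-char view of the head suffices.
        int length = Math.min(bytes.length - start, 200);
        String head = new String(bytes, start, length, StandardCharsets.ISO_8859_1);

        Matcher m = XML_DECL.matcher(head);
        if (!m.find()) {
            return null; // no XML declaration
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }
}
```

The value is returned as written in the declaration, without case normalization, which matches the "utf-8" example above.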

validateUTF8

private void validateUTF8(RetrievalElement retrieved)
                   throws TestException
Validates the UTF-8 encoding of the received HTTP response.

Parameters:
retrieved - the last retrieved exchange that represents the resource.
Throws:
TestException - an unexpected error occurred.

isValid

public boolean isValid()
Returns true when the resource is correctly encoded in UTF-8, false otherwise.

The method returns false when the resource is served with a declared encoding other than UTF-8, even if its content happens to be valid UTF-8.

Specified by:
isValid in class DecodedContent
Returns:
true when the resource is valid UTF-8.
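
The rule above combines two conditions: the encoding the resource is served with, and the well-formedness of its bytes. A minimal sketch, assuming the declared encoding is available as a plain string and that a missing declaration counts as "not UTF-8" (the documentation does not say how that case is handled); DeclaredUtf8Rule is a hypothetical name:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the mobileOK library.
final class DeclaredUtf8Rule {

    /** True only when the declared charset is UTF-8 and the bytes are well-formed UTF-8. */
    static boolean isValid(String declaredCharset, byte[] body) {
        // Assumption: an absent declaration is treated like a non-UTF-8 one.
        if (declaredCharset == null || !"utf-8".equalsIgnoreCase(declaredCharset.trim())) {
            return false;
        }
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(body));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With this rule, pure ASCII content served as ISO-8859-1 is reported as invalid even though its bytes are also valid UTF-8.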

toMokiNode

public org.w3c.dom.Node toMokiNode(org.w3c.dom.Document document,
                                   org.w3c.dom.Node parent)
Serializes the content to its moki representation as a DOM node. In practice, returns null because there is no defined moki representation for generic text content.

The method should be overridden in derived classes.

Specified by:
toMokiNode in class DecodedContent
Parameters:
document - DOM document the created node should belong to.
parent - DOM node to which the representation should be appended.
Returns:
the moki representation of the resource as a DOM node.