This is the schema for moki, the mobileOK Intermediate document format.

moki was produced as part of the mobileOK Checker project. It allows an HTML resource and all of its associated resources to be represented together, so that the HTML resource can be checked for mobileOK conformance.

moki represents aspects of the HTTP retrieval of a resource, can encode the content of the resource, and can attach specific metadata about resources, such as their size and validity.

Although moki was created specifically to be processed by XSLT into a "results" document containing the results of the mobileOK Basic Tests (hence the term "intermediate"), its possible uses are more general. However, no other use cases were taken into account in its design, and it would likely need to be extended or modified to accommodate other applications.

Note also that the mobileOK Checker itself does not currently exploit all moki features.

moki was developed and elaborated by members of the mobileOK Checker Task Force from an initial design by Jo Rabin.

For more information on the mobileOK Basic Tests, please refer to the W3C mobileOK Basic Tests 1.0 specification.

Comments about moki should be sent to the Checker Task Force mailing list, public-mobileOK-checker@w3.org, which is publicly archived.

Editor: Jo Rabin

Date: 02 November 2008

This Version: http://dev.w3.org/2007/mobileok-ref/moki/schema/moki-20081204.xsd

Latest Version: http://dev.w3.org/2007/mobileok-ref/moki/schema/moki.xsd

Previous Version: http://dev.w3.org/2007/mobileok-ref/moki/schema/moki-20071022.xsd

Status: stable; mobileOK Basic Tests 1.0 is a W3C Proposed Recommendation.

$Id: moki.xsd,v 1.3 2008-12-04 16:57:45 fd Exp $

The root element of a moki document.

A timestamp of when the retrieval activity started.

The value should be the number of milliseconds elapsed since January 1, 1970 UTC.
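As a sketch (in Python, which is not part of moki), the value can be computed from a timezone-aware datetime using integer arithmetic, which avoids floating-point rounding. The helper name `moki_timestamp` is hypothetical.

```python
import time
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def moki_timestamp(moment=None):
    """Milliseconds elapsed since January 1, 1970 UTC (hypothetical helper).

    With no argument, the current time is used. `moment` must be a
    timezone-aware datetime; subtracting a naive one raises TypeError.
    """
    if moment is None:
        # time_ns() avoids the float rounding of time.time() * 1000
        return time.time_ns() // 1_000_000
    # timedelta // timedelta yields an exact integer count of milliseconds
    return (moment - EPOCH) // timedelta(milliseconds=1)
```

For example, 2008-12-04 16:57:45 UTC corresponds to the value 1228409865000.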

Information about the process carrying out the assembly of the resources into moki format.

The name of a processing component.

The version of a processing component.

The IP address of the component generating the moki document.

Since servers may vary the representation of resources based on IP address, this field is significant. However, network address translation (NAT) may be in operation, so the IP address assigned to the component compiling the moki document may not be the address seen by the remote servers. Where possible, therefore, the address recorded here should be the one after address translation.

This element contains information about the primary resource, i.e. the resource that is the subject of the moki document. The element contains details of the document's HTTP retrieval followed by details of the document itself.

Details of the primary document, which is assumed to be an XHTML document.

This element describes validity aspects of the primary document. The "valid" attribute summarises the validity status - it is true if all aspects of validity are true.

The mobileOK Checker incorrectly produces a validity element as a child of the imageInfo element. It should instead produce an imageValidity element (whose definition it does in fact follow).

The number of characters in the document and the number of white space characters that are extraneous (as defined in the mobileOK Basic Tests MINIMIZE test).
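The sketch below shows how the two counts might be computed. It rests on a simplifying assumption that is not taken from the MINIMIZE test: a white space character is treated as extraneous when it directly follows another white space character. The function name `whitespace_counts` is hypothetical.

```python
def whitespace_counts(text):
    """Return (total characters, extraneous white space characters).

    Simplifying assumption: within each run of white space, the first
    character is significant and every following one is extraneous.
    The authoritative rule is the one in the MINIMIZE test.
    """
    whitespace = " \t\r\n"  # the XML white space characters
    extraneous = 0
    previous_was_space = False
    for ch in text:
        is_space = ch in whitespace
        if is_space and previous_was_space:
            extraneous += 1
        previous_was_space = is_space
    return len(text), extraneous
```

For instance, `whitespace_counts("<p>a  b\n\n  c</p>")` gives 16 characters, 4 of them extraneous.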

It is not clear whether the "total" here differs from the "size" of the entity body of the final HTTP retrieval.

Details of the XML declaration of the primary document. "declaration" indicates whether or not an XML declaration is present; "encoding" gives the declared character encoding and is absent if none is declared.

Missing: version? If added, its absence would indicate that no version is declared, and its presence would give the declared version.
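A minimal sketch of extracting this information from an already-decoded document with a regular expression. It also reports the version, to illustrate the "Missing: version?" note; the function name and the dictionary shape are assumptions for illustration.

```python
import re

# An XML declaration at the very start of a document, e.g.
# <?xml version="1.0" encoding="UTF-8"?>  (single or double quotes)
XML_DECL = re.compile(
    r'^<\?xml\s+version\s*=\s*(["\'])(?P<version>[^"\']+)\1'
    r'(?:\s+encoding\s*=\s*(["\'])(?P<encoding>[^"\']+)\3)?'
    r'(?:\s+standalone\s*=\s*(["\'])(?:yes|no)\5)?\s*\?>'
)

def xml_declaration_info(document):
    """Describe the XML declaration of `document` (a decoded str)."""
    if document.startswith("\ufeff"):
        document = document[1:]  # ignore a leading byte order mark
    match = XML_DECL.match(document)
    if match is None:
        return {"declaration": False}
    info = {"declaration": True, "version": match.group("version")}
    if match.group("encoding") is not None:
        # "encoding" is left out entirely when none is declared
        info["encoding"] = match.group("encoding")
    return info
```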

Details of any DOCTYPE declaration that may be present.

The HTML content of the document, tidied if necessary to make it correct. The original, untidied version is found as the entity body of the final HTTP response. The content is tidied if the "valid" attribute of the validity element is false.

We should consider adding a "tidied" attribute to indicate this. It would also be useful to record which tidy operations were carried out, since they may not correspond exactly to the validity errors.

A collection of style sheet information associated with the document.

An aspect of the style sheet information associated with the document, which might be an external style sheet, a style element ("embedded") or "inline" style. The URI element refers to where the reference to the style sheet was found, rather than to the URI of the style sheet itself.

Possibly missing here is a more precise reference to where the style was found, especially for "inline" style.

A collection of the images that were found in the primary document and in the style sheets associated with it (as background or list item images). The list contains only one reference for each image, caching considerations being equal.

Ideally each image would contain a list of the references that caused its retrieval.

Information about the retrieved image, its URI, details of its HTTP retrieval and basic image information.

Information about an image. The "type" attribute denotes the Internet Content Type (MIME type) of the image. Validity contains information about whether the document is a valid instance of that image type, transparency denotes whether the whole image is transparent, and actualDimensions gives the actual pixel dimensions of images that have an intrinsic size in pixels. Dimensions may be absent for image types that do not support that feature.

The mobileOK Checker incorrectly produces a validity child element instead of the imageValidity element it should produce.

The "transparent" attribute is true if all the pixels of an image are transparent.

This would probably be better as an attribute.
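The rule for the "transparent" attribute amounts to a check over every pixel. In this sketch the input shape (an iterable of RGBA tuples, as some image decoder might return) and the function name are assumptions.

```python
def is_fully_transparent(pixels):
    """True if every pixel is fully transparent (alpha == 0).

    `pixels` is assumed to be an iterable of (red, green, blue, alpha)
    tuples with alpha in 0..255. An image with no pixels counts as
    transparent, since all() of an empty sequence is True.
    """
    return all(alpha == 0 for (_red, _green, _blue, alpha) in pixels)
```

A single partly opaque pixel is enough to make the attribute false.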

The actual dimensions in pixels of images that have an intrinsic size.

A collection of objects found in the document, i.e. the referents of "object" elements. A resource is contained only once in the list, even if it is referred to multiple times, caching considerations being equal. An image that is referred to by an object element is placed under the images element of a moki document instead.

It would be worth considering whether the images list and the objects list should be combined.

It would also be worth considering how to represent the dependent relationships between objects.

The referent of an object element in the primary document.

Ideally, each object would reference where it was found in the referring document.

Contains information about the object.

The "loadtype" attribute specifies how browsers would handle the object:
- "RENDERED" means the object would be rendered
- "TASTED" means the object would not be rendered, but would need to be retrieved in order to render the document
- "SKIPPED" means the object would not be retrieved

The "type" attribute specifies the Internet Content Type found on retrieval of the object.
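The three "loadtype" values map naturally onto an enumeration. The sketch below also totals the bytes a browser would retrieve, on the reading that both RENDERED and TASTED objects are fetched while SKIPPED ones are not. The class, function and input shape are hypothetical.

```python
from enum import Enum

class LoadType(Enum):
    """Values of the "loadtype" attribute."""
    RENDERED = "RENDERED"  # the object would be rendered
    TASTED = "TASTED"      # not rendered, but retrieved to render the document
    SKIPPED = "SKIPPED"    # the object would not be retrieved

    @property
    def is_retrieved(self):
        # Both rendering and "tasting" require fetching the object.
        return self is not LoadType.SKIPPED

def retrieved_bytes(objects):
    """Sum sizes of objects a browser would retrieve.

    `objects` is a hypothetical list of (loadtype, size_in_bytes) pairs;
    an unknown loadtype string raises ValueError.
    """
    return sum(size for loadtype, size in objects
               if LoadType(loadtype).is_retrieved)
```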

All the targets of anchor (<a>) elements found in the primary document.

Should there be one element per target, with references to where each was found?

Information about the retrieval of a document referenced by hyperlink.

Details of the HTTP interaction involved in retrieving a document. The retrieved URI is the absolute URI retrieved (as opposed to a possibly relative URI referenced in the document). This is followed by the sequence of requests and responses involved in obtaining the resource.

An XML representation of an HTTP request. Raw headers, if present, represent the string that was actually sent as headers. timeStart can be used to time the request.

The HTTP method (GET, POST etc.) of a request.

An XML representation of an HTTP response. Raw headers, if present, represent the string that was parsed to create the headers. timeStart and timeEnd can be used to record more precise timings of responses.

Need to represent failures of the underlying DNS lookup or TCP connection.

This is the HTTP response status line with code (e.g. 200) and reason (e.g. OK).

The HTTP entity body (i.e. the raw body of the response). The content of this element can be omitted, in which case the "encoding" attribute is omitted too. The size attribute must be present and is the size in bytes of the entity body after any transfer decoding has been applied (e.g. if the transfer coding was gzip, size refers to the decompressed version). The encoding attribute must be present if element content is present, and is "escaped" for escaped text and "base64" for images and other binary content.

At present, this is not how the alpha release works.
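A sketch of the size and encoding rules above, using ElementTree. The element name "entity", the function and its parameters are assumptions; only the "size" and "encoding" attributes and their values come from the description. Only a gzip transfer coding is undone here.

```python
import base64
import gzip
import xml.etree.ElementTree as ET

def entity_element(raw, transfer_coding=None, is_text=True,
                   charset="utf-8", include_content=True):
    """Build a hypothetical moki entity body element from raw bytes."""
    # "size" refers to the body after transfer decoding, so undo gzip first.
    body = gzip.decompress(raw) if transfer_coding == "gzip" else raw
    element = ET.Element("entity")  # element name is hypothetical
    element.set("size", str(len(body)))
    if not include_content:
        return element  # content omitted, so "encoding" is omitted too
    if is_text:
        # Stored as text; ElementTree escapes <, > and & on serialization.
        element.set("encoding", "escaped")
        element.text = body.decode(charset)
    else:
        element.set("encoding", "base64")
        element.text = base64.b64encode(body).decode("ascii")
    return element
```

Serializing `entity_element(gzip.compress(b"<p>a & b</p>"), "gzip")` yields an element with size 12, whose text appears escaped as `&lt;p&gt;a &amp; b&lt;/p&gt;`.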

The raw content of the HTTP headers as sent across the TCP connection.

The HTTP protocol version, e.g. HTTP/1.1.

An HTTP header. This can take two forms:

The first form takes a "name" attribute and a "value" attribute. This form is used when no useful structural decomposition of the header can be performed, or when the HTTP header name is not recognized.

The second form takes a "name" attribute but no "value" attribute, and contains a sequence of "moki:element" elements. This form is used when the header has a structure analogous to that specified in RFC 822, where the value of the header can be a comma-separated list and elements of the list may have parameters attached to them with semicolons. Examples are "Accept" (e.g. Accept: application/xhtml+xml, text/html;q=0.1, text/plain;q=0.1) and "Cache-Control" (e.g. Cache-Control: max-age=0, no-transform). In these cases, analysis using XSLT or other XML-aware processing tools is facilitated by the "moki:element" and "moki:parameter" elements.

The mobileOK checker uses this form for the HTTP headers listed in HeaderParseMethod as PARSE_AND_NORMALIZE and PARSE_BUT_DO_NOT_NORMALIZE and additionally performs white-space and case normalization on headers shown in that file as PARSE_AND_NORMALIZE.

HTTP header names are normalized to lower case in all cases.

Elements of HTTP headers that are defined as being comma separated lists. Each element of the list is represented by a separate "moki:element" element, and those that have values (e.g. max-age=0) have the "value" attribute set appropriately.

Some elements of HTTP headers (the elements of comma-separated lists in the original header value) have parameters, which are separated from the list element by a semicolon. Each parameter is represented by a separate "moki:parameter" element, whose name and value are given by the respective attributes (e.g. q=0.1).
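The two header forms can be sketched with ElementTree as follows. For brevity the element names are unprefixed rather than in the moki namespace, the "header" element name and the "name" attribute on each element are assumptions, and the naive comma split does not handle quoted strings. The checker's additional case normalization for PARSE_AND_NORMALIZE headers is not reproduced.

```python
import xml.etree.ElementTree as ET

def header_element(name, value, structured=True):
    """Represent an HTTP header in either of the two forms (sketch)."""
    header = ET.Element("header")           # element name is hypothetical
    header.set("name", name.lower())        # header names are always lower-cased
    if not structured:
        header.set("value", value)          # first form: name and value
        return header
    for item in value.split(","):           # naive: ignores quoted strings
        item = item.strip()
        if not item:
            continue
        token, *params = [part.strip() for part in item.split(";")]
        element = ET.SubElement(header, "element")
        token_name, has_value, token_value = token.partition("=")
        element.set("name", token_name.strip())
        if has_value:                       # e.g. max-age=0
            element.set("value", token_value.strip())
        for param in params:                # e.g. q=0.1
            param_name, has_value, param_value = param.partition("=")
            parameter = ET.SubElement(element, "parameter",
                                      name=param_name.strip())
            if has_value:
                parameter.set("value", param_value.strip())
    return header
```

Applied to the Accept example above, this yields three element children, the second and third each carrying a parameter named "q" with value "0.1".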

The underlying type for moki error reporting.

Details of CSS errors and warnings, if any. Overall validity (no errors) indicated by "valid" attribute being true.

Details of image errors and warnings, if any. Overall validity (no errors) indicated by "valid" attribute being true.
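In both cases "valid" depends only on errors: warnings may be present in a valid resource. A minimal sketch, in which the child element names ("error", "warning", "info") and the report name "cssValidity" are assumptions for illustration rather than names taken from the schema:

```python
import xml.etree.ElementTree as ET

def build_report(tag, errors, warnings):
    """Build a hypothetical validity report element.

    `errors` and `warnings` are lists of message strings; each becomes
    an "error" or "warning" child holding an "info" description.
    """
    report = ET.Element(tag)
    for kind, messages in (("error", errors), ("warning", warnings)):
        for message in messages:
            detail = ET.SubElement(report, kind)
            ET.SubElement(detail, "info").text = message
    # Overall validity means "no errors"; warnings do not affect it.
    report.set("valid", "true" if not errors else "false")
    return report
```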

Records errors and warnings when evaluating UTF-8 validity.

Records errors and warnings when evaluating against the provided DOCTYPE.

Records errors and warnings when evaluating against XHTML Basic 1.1.

The underlying type for moki error and warning details.

It may be necessary to allow more than one info element, to support multilingual reporting.

Should more than one position be allowed, to cover multiple errors of the same type?

Records a validity error.

Records a validity warning.

A textual description of the error.

This probably needs to be elaborated with xml:lang, and potentially with a language-neutral code.

Specifies the location in the resource where the error occurs.

Seemingly a simple matter, but actually very difficult to populate given the information returned by validity checkers. In principle there are a number of types of position information: e.g. for character validation a character offset is preferable, for a binary format a byte offset, and so on.

The line offset into a line-oriented resource, starting from 0.

The column offset into a line-oriented resource, starting from 0.
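When a checker reports only a character offset, the zero-based line and column can be derived from it. This sketch assumes "\n" line endings (a "\r" before it would count as a column) and uses a hypothetical function name.

```python
def line_and_column(text, offset):
    """Convert a 0-based character offset into a 0-based (line, column)."""
    if not 0 <= offset <= len(text):
        raise ValueError("offset out of range")
    line = text.count("\n", 0, offset)            # newlines before the offset
    line_start = text.rfind("\n", 0, offset) + 1  # 0 if on the first line
    return line, offset - line_start
```

In `"ab\ncd\nef"`, offset 4 (the "d") is at line 1, column 1.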