Guidelines for Running the XPath and XQuery Full Text 1.0 Test Suite

This document provides information to implementers who wish to run the XPath and XQuery Full Text 1.0 Test Suite on their implementation. It includes guidelines on how test cases can be customized to run on an implementation, and describes the process of evaluating the results. The documentation of the XPath and XQuery Full Text 1.0 Test Suite, which defines the structure of the test cases and the catalog, can be found in [1]. Guidelines for submitting results to the XML Query Working Group can be found in [3].

Obtaining a Test Harness

Implementers are expected to write their own test harness that implements the following tasks:

Read test cases from the catalog, apply customization if applicable (see below)
Execute tests, using source files specified in the catalog
Use appropriate comparator to match result
Produce categorization of test result (pass, fail etc., see below)
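
The first of these tasks can be sketched in Python as follows. This is only an illustration under stated assumptions: the catalog excerpt is a hypothetical fragment modeled on the catalog entry shown later in this document, the element and attribute names are taken from that entry, and the function name read_test_cases is invented for the example. A real catalog may use an XML namespace and carry further attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog excerpt, modeled on the catalog entry shown in this
# document. The wrapping test-group element is an assumption.
CATALOG = """
<test-group>
  <test-case name="examples-222-q1" FilePath="Examples/2.2.2-Examples/" scenario="standard">
    <query name="examples-222-q1"/>
    <input-file role="principal-data" variable="input-context">ftbookexample</input-file>
    <output-file role="principal" compare="XML">ft-222-examples-results-q1.txt</output-file>
  </test-case>
</test-group>
"""

def read_test_cases(catalog_xml):
    """Return one plain dict per test-case element found in the catalog."""
    root = ET.fromstring(catalog_xml)
    cases = []
    for tc in root.iter("test-case"):
        cases.append({
            "name": tc.get("name"),
            "path": tc.get("FilePath"),
            "query": tc.find("query").get("name"),
            # (variable, source id, context attribute or None)
            "inputs": [(i.get("variable"), i.text, i.get("context"))
                       for i in tc.findall("input-file")],
            # (comparator, expected result file)
            "outputs": [(o.get("compare"), o.text)
                        for o in tc.findall("output-file")],
        })
    return cases

cases = read_test_cases(CATALOG)
```

Each dictionary carries what the later steps need: the source to bind, the variable to bind it to, and the comparator to apply to each expected result file.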

Ideally, the test harness produces an XML file containing all test results in the format shown below, that can be sent to the working group.

Tests of Optional Features

The test catalog includes a section (the Optional Features group) that provides tests for positive support of various optional features. In many cases these tests duplicate tests in the minimal conformance section, except that error codes indicating non-support of a feature are disallowed. An implementation that does not support a particular optional feature does not need to execute the tests in the corresponding subgroup of the Optional Features group, and may exclude that subgroup from its report.

Test Suite Customization

In order to run the test suite on an XPath and XQuery Full Text 1.0 implementation, implementers may customize the test suite and make a number of well-defined changes to the test cases. All changes made to the original test suite must be documented in free-text form as part of the result submission. Changes beyond the ones listed below must be highlighted.

Setting the Default Context Item

XQuery, with or without Full-Text expressions, supports a number of different ways to refer to source data as query context. Among these are implicit context, external variables, the fn:doc() function, the default collection function, implementation-defined functions, or parameter passing through host-language binding.

Test cases that do not refer to any input document (i.e., the catalog does not contain any “input-file” for the “test-case”) or test a specific context setting (i.e., a “context” attribute in the “input-file” element defines a specific context setting) may not be customized this way.

The following example is a customizable test case (because the input-file element does not define a context attribute):

(: insert-start :)
declare variable $input-context external;
(: insert-end :)

for $b in $input-context/books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author

and the corresponding catalog entry:

<test-case is-XPath2="false" name="examples-222-q1"
           FilePath="Examples/2.2.2-Examples/" scenario="standard" Creator="Full-Text Task Force">
  <description>Find the author of each book with a title containing a token
               with the same root as dog and the token cat.</description>
  <spec-citation spec="XQueryFullText" section-number="2.2.2"
                 section-title="Examples" section-pointer="section-ftcontainsexpr-examples"/>
  <query name="examples-222-q1" date="2008-06-26"/>
  <input-file role="principal-data" variable="input-context">ftbookexample</input-file>
  <output-file role="principal" compare="XML">ft-222-examples-results-q1.txt</output-file>
</test-case>

The context setting for this test case may be customized in one of the following six ways. Note that options 3 and 5 (default collection) are only applicable when there is a single source document.

1. Unchanged: Use external variables as indicated in the original query.
2. Implicit variable declaration: Remove the variable declarations between the insert-start and insert-end comments. The implementation binds the input context to the variable $input-context.

for $b in $input-context/books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author

3. Implicit context: Remove the variable declarations between the insert-start and insert-end comments, and replace the variable references with the implicit context (.).

for $b in ./books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author

4. Use the fn:doc function: Remove the variable declarations between the insert-start and insert-end comments, and replace the variable references with a call to fn:doc() that takes the input file specified in the catalog as its argument.

for $b in fn:doc("ftbookexample.xml")/books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author

5. Use the (default) collection: Remove the variable declarations between the insert-start and insert-end comments, and replace the variable references with the fn:collection() function (including an optional URI).

for $b in fn:collection()/books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author

6. Implementation-defined function: Remove the variable declarations between the insert-start and insert-end comments, and replace the variable references with an implementation-defined function that resolves to the input context.

for $b in impl-ns:impl-func("ftbookexample")/books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author
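
A test harness could apply these rewrites mechanically. The following Python sketch shows one way to do so for the implicit-context and fn:doc variants. The function names are invented for this illustration, and the approach is purely textual: it does not parse the query, so a variable name that also occurs inside a string literal or comment would be rewritten as well.

```python
import re

QUERY = '''(: insert-start :)
declare variable $input-context external;
(: insert-end :)

for $b in $input-context/books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author
'''

# Everything from the insert-start comment through the insert-end comment,
# plus the white space that follows it.
INSERT_BLOCK = re.compile(r"\(: insert-start :\).*?\(: insert-end :\)\s*", re.DOTALL)

def strip_insert_block(query):
    """Remove the declarations between the insert-start/insert-end comments."""
    return INSERT_BLOCK.sub("", query, count=1)

def replace_variable(query, variable, replacement):
    """Replace references to $variable, but not to longer names sharing its prefix."""
    pattern = re.compile(r"\$" + re.escape(variable) + r"(?![\w.-])")
    return pattern.sub(lambda m: replacement, query)

def use_implicit_context(query, variable):
    """Rewrite the query to refer to the implicit context (.)."""
    return replace_variable(strip_insert_block(query), variable, ".")

def use_fn_doc(query, variable, uri):
    """Rewrite the query to load the source with fn:doc()."""
    return replace_variable(strip_insert_block(query), variable, 'fn:doc("%s")' % uri)

print(use_fn_doc(QUERY, "input-context", "ftbookexample.xml"))
```

The catalog supplies both arguments: the variable attribute of the input-file element names the variable to replace, and the element content identifies the source document.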

Customizing Namespaces

Test cases can refer to validated documents (which requires a namespace declaration) or imported schemas. Such test cases contain schema import declarations located within the same insert-start and insert-end comments as the context item declaration described above.

(: insert-start :)
import schema namespace example="http://www.example.com";
(: insert-end :)

This declaration must refer to a source schema document defined in the test suite catalog having the same name and URI:

<schema ID="example" type="schema" uri="http://www.example.com"
        FileName="example.xsd"/>

This declaration can be customized in one of the following three ways:

Unchanged: use schema import as indicated in the original query
Remove the schema import declaration from the query, and add a namespace declaration using the same name and URI to the statically known namespaces before the query is executed.
Replace the schema import with a namespace declaration using the same name and URI:

declare namespace example="http://www.example.com";
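
The third option is a simple textual substitution. A minimal Python sketch follows, assuming the import has the form shown above (a prefix, a quoted namespace URI, and optional location hints before the semicolon). The function name is invented for this illustration.

```python
import re

# "import schema namespace PREFIX = "URI" ... ;"
# Anything between the URI and the semicolon (such as location hints) is dropped.
IMPORT_SCHEMA = re.compile(
    r'import\s+schema\s+namespace\s+([\w.-]+)\s*=\s*"([^"]*)"[^;]*;')

def import_to_namespace_declaration(query):
    """Replace each schema import with a plain namespace declaration."""
    return IMPORT_SCHEMA.sub(r'declare namespace \1="\2";', query)

original = '''(: insert-start :)
import schema namespace example="http://www.example.com";
(: insert-end :)
'''
print(import_to_namespace_declaration(original))
```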

Host Language Binding

Test cases can be embedded in a host language, for example using the xmlquery function in SQL. This may require escaping certain characters like quotes.

select xmlquery('$input-context/child'
passing xmlcol as "input-context")
from TreeCompass
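
As an illustration of the escaping step, the Python sketch below wraps a query in the statement shape shown above. It assumes the host language is SQL, where a single quote inside a string literal is escaped by doubling it. The function name and its parameters are invented for this example, and other host languages will need different escaping rules.

```python
def embed_in_xmlquery(query, variable, column, table):
    """Wrap an XQuery expression in an SQL xmlquery call."""
    # Standard SQL escapes a single quote inside a string literal by doubling it.
    escaped = query.replace("'", "''")
    return ("select xmlquery('%s'\n"
            "passing %s as \"%s\")\n"
            "from %s") % (escaped, column, variable, table)

print(embed_in_xmlquery("$input-context/child[. = 'x']",
                        "input-context", "xmlcol", "TreeCompass"))
```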

Special Sources: Stop Word List, Thesaurus, and Stemming Dictionary

The stopwords, thesaurus, and stemming-dictionary sources are not intended to be used directly in the form in which they are given, but to provide information to those running the test suite about the expectations a particular test has about various implementation-specific aspects of the execution context. Implementations are expected to provide equivalent information to the query, but in whatever form is appropriate in their context.

A stopwords source is a plain text file containing a list of stop words, one per line. When a query references this stop word list, the implementation is expected to provide that list of stop words to the query. A thesaurus source is an XML document defined against the thesaurus.xsd XML Schema. When a query references this thesaurus, the implementation is expected to provide equivalent thesaurus information to the query. The stemming-dictionary source is a plain text file containing lines of whitespace-separated tokens. Each token on a line should stem to the first token on that line. When the catalog entry for a query references a stemming dictionary, the implementation is expected to provide stemming equivalent to the rules given in the stemming dictionary.
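
The two plain text formats are simple to read. The Python sketch below loads them into in-memory structures that a harness could then hand to the implementation in whatever form it requires. The function names and the sample data are invented for this illustration and are not taken from the test suite.

```python
def parse_stop_words(text):
    """One stop word per line; blank lines are ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def parse_stemming_dictionary(text):
    """Map every token on a line to the first token on that line."""
    stems = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        for token in tokens:
            stems[token] = tokens[0]
    return stems

# Hypothetical sample data.
stop_words = parse_stop_words("the\nof\nand\n")
stems = parse_stemming_dictionary("dog dogs dogged\ncat cats\n")
```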

The special URI ##default is used to indicate that the stop word list in question is the default stop word list, and that the thesaurus in question is the default thesaurus. As with stop word lists and thesauri in general, the referenced file is used to provide guidance as to the expected contribution from those resources.

There are some cases where the test is testing that a special source is unknown. In these cases the query references a resource that does not exist. For example, the thesaurus "technical" references a file "Testsources/intentionally-missing.xml". The test harness implementation is expected to reflect this as an unknown resource in the static context.

Using Test Suite Thesauri

XQuery and XPath Full Text 1.0 leaves as implementation-defined the way in which the hierarchical relationships of thesauri are used in Full Text queries. This section describes how to use the test suite thesauri in a way consistent with the expected results of tests that use a thesaurus. In the following text, the phrase "expansion of an item T using relationship R" means the application of fts:lookupThesaurus to the query tokens of query item T.

Expansion of an item T exactly 0 times using relationship R returns the item unchanged.

For example, expansion of 'duties' exactly 0 times using relationship "UF" returns 'duties'
For example, expansion of 'cheese' exactly 0 times using relationship "UF" returns 'cheese'

Expansion of an item T exactly 1 time using relationship R returns item U if T is directly related to U by relationship R, or the empty sequence (i.e. no query items) if T is not directly related by R to any other term. The following examples use the thesaurus in 'usability.xml'.

For example, expansion of 'duties' exactly 1 time using relationship "UF" returns 'duty'
For example, expansion of 'cheese' exactly 1 time using relationship "UF" returns the empty sequence

Expansion of an item T exactly N times, where N is greater than 1, proceeds by expanding exactly 1 time the result of expanding item T exactly N - 1 times.

For example, expansion of 'infrastructure' exactly 2 times using relationship "UF" returns 'Web'
For example, expansion of 'duties' exactly 2 times using relationship "UF" returns the empty sequence

Expansion of an item T exactly N times using relationship R, where N is less than zero, always returns the empty sequence.

Expansion of an item T between N and M times, where M is greater than or equal to N, returns the union of items returned from expansion of item T exactly P times for P between N and M.

Expansion of an item T at most N times, where N is greater than or equal to 0, returns the result of expanding T between 0 and N times.

Expansion of an item T at least N times, where N is greater than or equal to 0, returns the result of expanding T between N and infinity times. That is, expansion continues until the set of items in the expansion does not change.
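
The expansion rules above can be expressed compactly in code. The Python sketch below is an illustration only: it models a relationship as a dictionary from a term to the set of directly related terms, treats items as plain strings rather than query tokens, and uses invented function names. The 'duties' entry mirrors the example above, while t1, t2 and t3 are hypothetical placeholder terms.

```python
def expand_once(items, relation):
    """Expand every item exactly 1 time; unrelated items contribute nothing."""
    result = set()
    for item in items:
        result |= relation.get(item, set())
    return result

def expand_exactly(item, relation, n):
    """Expansion exactly n times; a negative n yields the empty set."""
    if n < 0:
        return set()
    items = {item}
    for _ in range(n):
        items = expand_once(items, relation)
    return items

def expand_between(item, relation, n, m):
    """Union of the expansions exactly p times for n <= p <= m."""
    result = set()
    for p in range(n, m + 1):
        result |= expand_exactly(item, relation, p)
    return result

def expand_at_most(item, relation, n):
    return expand_between(item, relation, 0, n)

def expand_at_least(item, relation, n):
    """Keep expanding until the accumulated set no longer changes."""
    result = set()
    frontier = expand_exactly(item, relation, n)
    while not frontier <= result:
        result |= frontier
        frontier = expand_once(frontier, relation)
    return result

# "duties" follows the example in the text; t1, t2, t3 are hypothetical.
UF = {"duties": {"duty"}, "t1": {"t2"}, "t2": {"t3"}}
print(expand_exactly("duties", UF, 1))
print(expand_at_least("t1", UF, 1))
```

Stopping as soon as a step adds nothing new also guarantees termination when a thesaurus contains cycles, because each further step could then only revisit terms already collected.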

Sentences and Paragraphs

The subject of sentences and paragraphs requires some consideration for implementations planning to run the test suite.

In Section 1, Introduction, the Full Text specification defines the words "sentence" and "paragraph" as follows:

[Definition: A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.]

[Definition: A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.]

However, the next paragraph states that "Some formatting markup serves well as token boundaries, for example, paragraphs are most commonly delimited by formatting markup." and Section 3, Full-Text Selections, says "This sample tokenization uses white space, punctuation and XML tags as word-breakers and <p> for paragraph boundaries.".

Section 5.2.3, FTUnit and FTBigUnit, says "Support for the "sentences" alternative of FTUnit and the "sentence" alternative of FTBigUnit is optional. Similarly, support for the "paragraphs" alternative of FTUnit and the "paragraph" alternative of FTBigUnit is optional."

After considerable discussion, this topic has been resolved as follows: In the data used to test queries in the Full Text Test Suite, sentences are separated by a period (also known as a "full stop") followed immediately by white space, and paragraphs are encoded within a <p>...</p> element. Implementations that are configured to use a different mechanism for identifying sentence or paragraph boundaries may revise the relevant test data so that the boundaries defined in this paragraph are replaced by the corresponding boundaries required for that implementation, provided this is clearly stated in plain text in their implementation reports.
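
An implementation that needs to convert the test data to its own boundary markers first has to locate the boundaries defined here. The Python sketch below does so under the stated convention. The function names and sample document are invented for this illustration, and the code assumes the p elements are in no namespace.

```python
import re
import xml.etree.ElementTree as ET

def split_sentences(text):
    """Split after each period that is immediately followed by white space."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]

def paragraphs(xml_text):
    """Return the text content of every <p> element in document order."""
    root = ET.fromstring(xml_text)
    return ["".join(p.itertext()) for p in root.iter("p")]

# Hypothetical sample document.
doc = "<doc><p>The dog barks. Version 1.0 is out.</p><p>The cat sleeps.</p></doc>"
for paragraph in paragraphs(doc):
    print(split_sentences(paragraph))
```

The period in "1.0" is not followed by white space, so it does not end a sentence under this convention.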

Customizing XQueryX Tests

Customizing XQueryX tests must follow the same rules provided above. However, the XQueryX test cases do not include the insert-start/insert-end comments surrounding the external variable declaration and schema import. Therefore, a test harness must find the items to be customized in the XQueryX document using the information found in the catalog. The external variable declaration and variable references in the XQueryX document typically look as follows:

<xqx:varDecl>
<xqx:varName>input-context</xqx:varName>
<xqx:external>
</xqx:external>
</xqx:varDecl>

<xqx:varRef>
<xqx:name>input-context</xqx:name>
</xqx:varRef>
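
Locating these items can be sketched in Python as follows. The sample document is a simplified, hypothetical fragment rather than a complete XQueryX query, the function names are invented for this illustration, and the code assumes the xqx prefix is bound to the XQueryX namespace http://www.w3.org/2005/XQueryX.

```python
import xml.etree.ElementTree as ET

XQX = "{http://www.w3.org/2005/XQueryX}"

# Simplified, hypothetical fragment; a real XQueryX document has more structure.
SAMPLE = """<xqx:module xmlns:xqx="http://www.w3.org/2005/XQueryX">
  <xqx:varDecl>
    <xqx:varName>input-context</xqx:varName>
    <xqx:external>
    </xqx:external>
  </xqx:varDecl>
  <xqx:varRef>
    <xqx:name>input-context</xqx:name>
  </xqx:varRef>
</xqx:module>"""

def _text(element, child):
    """Stripped text of a child element, or the empty string if it is absent."""
    return (element.findtext(XQX + child) or "").strip()

def find_external_declarations(root, variable):
    """varDecl elements that declare the given variable as external."""
    return [d for d in root.iter(XQX + "varDecl")
            if _text(d, "varName") == variable
            and d.find(XQX + "external") is not None]

def find_references(root, variable):
    """varRef elements that refer to the given variable."""
    return [r for r in root.iter(XQX + "varRef")
            if _text(r, "name") == variable]

root = ET.fromstring(SAMPLE)
print(len(find_external_declarations(root, "input-context")),
      len(find_references(root, "input-context")))
```

The variable name passed in would come from the variable attribute of the input-file element in the catalog.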

Comparing Results

To check whether a test case ran correctly, the result produced by the implementation must be compared to the result provided in the test suite. The implementation's result must be serialized and compared to the expected file(s) provided in the test suite. Serialization should be performed as described in XSLT 2.0 and XQuery 1.0 Serialization [4] with method="xml". For each test case, the catalog defines which of the following six comparators has to be applied:

XML: The test harness must canonicalize both the actual result and the expected result according to the “Canonical XML” recommendation [2], which refers to a number of open-source implementations. Byte-comparison can then be applied to the resulting XML documents. If the test harness performs this process in a different manner, this must be documented.
XML fragment: For XML fragments, the same root node must be created for both the implementation result and the test suite result. The resulting XML can be compared using XML comparison.
Text: Same comparison as "XML fragment".
Ignore: No comparison needs to be applied; the result is always true if the implementation successfully executes the test case.
Inspect: A human is required to make the call about correctness of the result according to the description in the test case.
Error: The expected result of the test case is an error, identified by an eight-character error code (e.g., XPST0003). The result of a test is true if the implementation raises an error. However, raising an error because an implementation does not support the feature is not considered a correct result.

It is possible that a test case provides multiple expected results. In this case, successfully comparing the actual result to one of the provided expected results is a "pass".
The expected files provided in the test suite are serialized forms as specified by XSLT 2.0 and XQuery 1.0 Serialization, with the following parameter values:

byte-order-mark             no
cdata-section-elements      empty
doctype-public              (none)
doctype-system              (none)
encoding                    "utf-8"
escape-uri-attributes       (not applicable when method = xml)
include-content-type        (not applicable when method = xml)
indent                      no
media-type                  not applicable
method                      xml
normalization-form          implementation-defined
omit-xml-declaration        yes
standalone                  omit
undeclare-prefixes          no
use-character-maps          empty
version                     implementation-defined

For implementations using different parameters, the test harness must convert the result using the parameters above in order to perform byte-comparison with the provided expected results.
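
The XML and XML fragment comparators, together with the rule that matching any one of several expected results counts as a pass, can be sketched in Python as follows. Note the assumptions: the standard library function xml.etree.ElementTree.canonicalize (Python 3.8 and later) implements Canonical XML 2.0 rather than the Canonical XML 1.0 recommendation cited in [2], so a harness built this way would have to document the difference. The function names, the wrapper element name and the comparator keys other than "XML" are invented for this illustration.

```python
import xml.etree.ElementTree as ET

def compare_xml(actual, expected):
    """Canonicalize both serialized results, then compare them exactly."""
    # Assumption: C14N 2.0 from the standard library stands in for Canonical XML 1.0.
    return ET.canonicalize(actual) == ET.canonicalize(expected)

def compare_fragment(actual, expected, root="wrapper"):
    """Give both fragments the same root element, then compare as XML."""
    def wrap(fragment):
        return "<%s>%s</%s>" % (root, fragment, root)
    return compare_xml(wrap(actual), wrap(expected))

# Hypothetical comparator keys; only "XML" appears in the catalog entry shown earlier.
COMPARATORS = {
    "XML": compare_xml,
    "Fragment": compare_fragment,
    "Text": compare_fragment,  # same comparison as for XML fragments
}

def passes(actual, expected_results, comparator):
    """A test passes if the actual result matches any one expected result."""
    compare = COMPARATORS[comparator]
    return any(compare(actual, expected) for expected in expected_results)

print(passes('<a c="2" b="1"/>', ['<b/>', '<a b="1" c="2"></a>'], "XML"))
```

Canonicalization removes differences such as attribute order and empty-element syntax, which is why the two spellings of the a element compare as equal.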

References

[1] XQuery and XPath Full Text 1.0 Test Suite Documentation

[2] Canonical XML Version 1.0, W3C Recommendation 15 March 2001
(http://www.w3.org/TR/xml-c14n)

[3] Guidelines for Submitting XQFTTS Results

[4] XSLT 2.0 and XQuery 1.0 Serialization

Last modified: 2011-04-12 19:02:05 by jmelton

Copyright © 1994-2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.