This document provides information to implementers who wish to run the XPath and XQuery Full Text 1.0 Test Suite on their implementation. It includes guidelines on how test cases can be customized in order to run on an implementation, and describes the process of evaluating the results. The documentation of the XPath and XQuery Full Text 1.0 Test Suite, which defines the structure of the test cases and the catalog, can be found in [1]. Guidelines for submitting results to the XML Query Working Group can be found in [3].
Ideally, the test harness produces an XML file containing all test results in the format shown below, which can be sent to the working group.
The test catalog includes a section (the Optional Features group) that provides tests for positive support of various optional features. In many cases these tests duplicate tests in the minimal conformance section, except that error codes indicating non-support of a feature are disallowed. An implementation that does not support a particular optional feature does not need to execute the tests in the corresponding subgroup of the Optional Features group, and may exclude the relevant group from its report.
In order to run the test suite on an XPath and XQuery Full Text 1.0 implementation, implementers may customize the test suite and make a number of well-defined changes to the test cases. All changes made to the original test suite must be documented in free-text form as part of the result submission. Changes beyond the ones listed below must be highlighted.
XQuery, with or without Full-Text expressions, supports a number of different ways to refer to source data as query context. Among these are the implicit context, external variables, the fn:doc() function, the default collection function, implementation-defined functions, and parameter passing through a host-language binding.
Test cases that do not refer to any input document (i.e., the catalog does not contain any “input-file” for the “test-case”) or test a specific context setting (i.e., a “context” attribute in the “input-file” element defines a specific context setting) may not be customized this way.
The following example is a customizable test case (because the input-file element does not define a context attribute):
|
(: insert-start :) declare variable $input-context external; (: insert-end :) for $b in $input-context/books/book where $b/title ftcontains ("dog" with stemming) ftand "cat" return $b/author |
and the corresponding catalog entry:
|
<test-case is-XPath2="false" name="examples-222-q1" FilePath="Examples/2.2.2-Examples/" scenario="standard" Creator="Full-Text Task Force"> <description>Find the author of each book with a title containing a token with the same root as dog and the token cat.</description> <spec-citation spec="XQueryFullText" section-number="2.2.2" section-title="Examples" section-pointer="section-ftcontainsexpr-examples"/> <query name="examples-222-q1" date="2008-06-26"/> <input-file role="principal-data" variable="input-context">ftbookexample</input-file> <output-file role="principal" compare="XML">ft-222-examples-results-q1.txt</output-file> </test-case> |
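A test harness typically reads these details from the catalog rather than from the query text. The following Python sketch shows one way to extract the input sources, the expected output files, and the customizability rule described above from a single catalog entry. The function name `describe_test_case` and the returned dictionary layout are hypothetical, and the sketch assumes the entry carries no XML namespace; a real catalog file may declare a default namespace, in which case the element names would need to be qualified.

```python
import xml.etree.ElementTree as ET

# Copy of the catalog entry shown above.
CATALOG_ENTRY = """
<test-case is-XPath2="false" name="examples-222-q1"
           FilePath="Examples/2.2.2-Examples/" scenario="standard"
           Creator="Full-Text Task Force">
  <description>Find the author of each book with a title containing a token
    with the same root as dog and the token cat.</description>
  <spec-citation spec="XQueryFullText" section-number="2.2.2"
                 section-title="Examples"
                 section-pointer="section-ftcontainsexpr-examples"/>
  <query name="examples-222-q1" date="2008-06-26"/>
  <input-file role="principal-data" variable="input-context">ftbookexample</input-file>
  <output-file role="principal" compare="XML">ft-222-examples-results-q1.txt</output-file>
</test-case>
"""


def describe_test_case(xml_text):
    """Summarize one test-case element (hypothetical helper, not part of the suite)."""
    case = ET.fromstring(xml_text.strip())
    inputs = [
        {
            "source": (el.text or "").strip(),
            "variable": el.get("variable"),
            "context": el.get("context"),  # None when no context attribute is present
        }
        for el in case.findall("input-file")
    ]
    outputs = [
        {"file": (el.text or "").strip(), "compare": el.get("compare")}
        for el in case.findall("output-file")
    ]
    # Customizable only if there is at least one input file and none of them
    # fixes a specific context setting.
    customizable = bool(inputs) and all(i["context"] is None for i in inputs)
    return {
        "name": case.get("name"),
        "path": case.get("FilePath"),
        "inputs": inputs,
        "outputs": outputs,
        "customizable": customizable,
    }
```

Calling `describe_test_case(CATALOG_ENTRY)` reports the test case as customizable, because its single input-file element has no context attribute.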
The context setting for this test case may be customized in one of the following six ways. Note that options 3 and 5 (default) are only applicable for one source document.
|
for $b in $input-context/books/book where $b/title ftcontains ("dog" with stemming) ftand "cat" return $b/author |
|
for $b in ./books/book where $b/title ftcontains ("dog" with stemming) ftand "cat" return $b/author |
|
for $b in fn:doc("ftbookexample.xml")/books/book where $b/title ftcontains ("dog" with stemming) ftand "cat" return $b/author |
|
for $b in fn:collection()/books/book where $b/title ftcontains ("dog" with stemming) ftand "cat" return $b/author |
|
for $b in impl-ns:impl-func("ftbookexample")/books/book where $b/title ftcontains ("dog" with stemming) ftand "cat" return $b/author |
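These rewrites can be automated in a test harness. The following Python sketch illustrates two of them: replacing the external variable with the context item, and replacing it with a call to fn:doc(). The helper names are hypothetical, the mapping from the catalog source name to a document URI such as "ftbookexample.xml" is an assumption, and the substitution is a plain textual one that does not parse XQuery, so it would also rewrite a matching variable name inside a string literal or comment.

```python
import re

# An insert-start/insert-end comment pair with nothing left between them.
EMPTY_INSERT = re.compile(r"\(:\s*insert-start\s*:\)\s*\(:\s*insert-end\s*:\)\s*")


def _replace_variable(query, variable, replacement):
    """Drop the external declaration of `variable` and substitute its references."""
    name = re.escape(variable)
    declaration = re.compile(r"declare\s+variable\s+\$" + name + r"\s+external\s*;\s*")
    query = declaration.sub("", query, count=1)
    # Remove the comment pair only if it is now empty, so that other
    # declarations placed inside it (for example schema imports) survive.
    query = EMPTY_INSERT.sub("", query)
    # Match $variable only when it is not followed by further name characters.
    reference = re.compile(r"\$" + name + r"(?![\w.\-])")
    return reference.sub(lambda match: replacement, query)


def use_context_item(query, variable):
    """Refer to the source document through the context item."""
    return _replace_variable(query, variable, ".")


def use_doc_function(query, variable, uri):
    """Refer to the source document through fn:doc(); `uri` is harness-specific."""
    return _replace_variable(query, variable, 'fn:doc("%s")' % uri)
```

Applied to the example query above, `use_context_item(query, "input-context")` yields the form that starts with `for $b in ./books/book`, and `use_doc_function(query, "input-context", "ftbookexample.xml")` yields the fn:doc() form.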
Test cases can refer to validated documents (which requires a namespace declaration) or imported schemas. Such test cases contain schema import declarations located within the same comments as the context item declaration described above.
|
(:
input-start :) |
This declaration must refer to a source schema document defined in the test suite catalog having the same name and URI.
|
schema
ID="example" type="schema" uri="http://www.example.com"
|
This declaration can be customized in one of the following three ways:
| declare namespace example="http://www.example.com"; |
Test cases can be embedded in a host language, for example using the xmlquery function in SQL. This may require escaping certain characters like quotes.
|
select
xmlquery('$input-context/child' |
The stopwords, thesaurus, and stemming-dictionary sources are not intended to be used directly in the form in which they are given, but to provide information to those running the test suite about the expectations a particular test has about various implementation-specific aspects of the execution context. Implementations are expected to provide equivalent information to the query, but in whatever form is appropriate in their context.
A stopwords source is a plain text file containing a list of stop words, one per line. When a query references this stop word list, the implementation is expected to provide that list of stop words to the query. A thesaurus source is an XML document defined against the thesaurus.xsd XML Schema. When a query references this thesaurus, the implementation is expected to provide equivalent thesaurus information to the query. The stemming-dictionary is a plain text file containing lines of whitespace-separated tokens. Each token on a line should stem to the first token on that line. When the catalog entry for a query references a stemming dictionary, the implementation is expected to provide stemming equivalent to the rules given in the stemming dictionary.
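As an illustration of the two plain-text formats, the following Python sketch reads a stop word list and a stemming dictionary into simple in-memory structures. The function names and the sample data are hypothetical; how an implementation actually supplies this information to a query remains implementation-specific, and the sketch makes no assumption about case folding.

```python
def parse_stopwords(text):
    """Return the stop words, one per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_stemming_dictionary(text):
    """Map every token on a line to the first token on that line (its stem)."""
    stems = {}
    for line in text.splitlines():
        tokens = line.split()  # whitespace-separated tokens
        if not tokens:
            continue  # skip blank lines
        stem = tokens[0]
        for token in tokens:
            stems[token] = stem
    return stems


# Hypothetical sample data, not taken from the test suite.
SAMPLE_STOPWORDS = "a\nthe\nof\n"
SAMPLE_STEMS = "dog dogs dogged\ncat cats\n"
```

With the sample data, `parse_stemming_dictionary(SAMPLE_STEMS)` maps both "dogs" and "dogged" to "dog", and maps "dog" to itself.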
The special URI ##default is used to indicate that the stop word list in question is the default stop word list, and that the thesaurus in question is the default thesaurus. As with stop word lists and thesauri in general, the referenced file is used to provide guidance as to the expected contribution from those resources.
There are some cases where the test is testing that a special source is unknown. In these cases the query references a resource that does not exist. For example, the thesaurus "technical" references a file "Testsources/intentionally-missing.xml". The test harness implementation is expected to reflect this as an unknown resource in the static context.
XQuery and XPath Full Text 1.0 leaves as implementation-defined the way in which the hierarchical relationships of thesauri are used in Full Text queries. This section describes how to use the test suite thesauri in a way consistent with the expected results of tests that use a thesaurus. In the following text, the phrase "expansion of an item T using relationship R" means the application of fts:lookupThesaurus to the query tokens of query item T.
Expansion of an item T exactly 0 times using relationship R returns the item unchanged.
For example, expansion of 'duties' exactly 0 times using relationship "UF" returns 'duties'.
For example, expansion of 'cheese' exactly 0 times using relationship "UF" returns 'cheese'.
Expansion of an item T exactly 1 time using relationship R returns item U if T is directly related to U by relationship R, or the empty sequence (i.e., no query items) if T is not directly related by R to any other term. The following examples use the thesaurus in 'usability.xml'.
For example, expansion of 'duties' exactly 1 time using relationship "UF" returns 'duty'.
For example, expansion of 'cheese' exactly 1 time using relationship "UF" returns the empty sequence.
Expansion of an item T exactly N times, where N is greater than 1, proceeds by expanding exactly 1 time the result of expanding item T exactly N - 1 times.
For example, expansion of 'infrastructure' exactly 2 times using relationship "UF" returns 'Web'.
For example, expansion of 'duties' exactly 2 times using relationship "UF" returns the empty sequence.
Expansion of an item T exactly N times using relationship R, where N is less than zero, always returns the empty sequence.
Expansion of an item T between N and M times, where M is greater than or equal to N, returns the union of items returned from expansion of item T exactly P times for P between N and M.
Expansion of an item T at most N times, where N is greater than or equal to 0, returns the result of expanding T between 0 and N times.
Expansion of an item T at least N times, where N is greater than or equal to 0, returns the result of expanding T between N and infinity times. That is, expansion continues until the set of items in the expansion does not change.
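The expansion rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the behavior of any particular implementation: the thesaurus is modeled as a hypothetical dictionary from (term, relationship) pairs to sets of directly related terms, and all function names are invented for the example. Only the 'duties' entry is taken from the examples above; the other entries are made up.

```python
from typing import Dict, Optional, Set, Tuple

# (term, relationship) -> terms directly related to it (hypothetical model)
Thesaurus = Dict[Tuple[str, str], Set[str]]


def expand_once(thesaurus: Thesaurus, items: Set[str], relationship: str) -> Set[str]:
    """Expand every item in the set exactly 1 time."""
    return {u for t in items for u in thesaurus.get((t, relationship), set())}


def expand_exactly(thesaurus: Thesaurus, term: str, relationship: str, n: int) -> Set[str]:
    """Expansion exactly n times; a negative n gives the empty set."""
    if n < 0:
        return set()
    items = {term}  # exactly 0 times returns the item unchanged
    for _ in range(n):
        items = expand_once(thesaurus, items, relationship)
    return items


def expand_between(thesaurus: Thesaurus, term: str, relationship: str,
                   low: int, high: Optional[int] = None) -> Set[str]:
    """Union of expansions exactly P times for low <= P <= high.

    high=None means "at least low": expansion continues until the set of
    items no longer changes.
    """
    level = max(low, 0)  # negative levels contribute nothing
    result: Set[str] = set()
    frontier = expand_exactly(thesaurus, term, relationship, level)
    while frontier and (high is None or level <= high):
        size_before = len(result)
        result |= frontier
        if len(result) == size_before:
            break  # no new items, so deeper levels cannot add any either
        frontier = expand_once(thesaurus, frontier, relationship)
        level += 1
    return result


def expand_at_most(thesaurus: Thesaurus, term: str, relationship: str, n: int) -> Set[str]:
    """Expansion at most n times is expansion between 0 and n times."""
    return expand_between(thesaurus, term, relationship, 0, n)


TOY_THESAURUS: Thesaurus = {
    ("duties", "UF"): {"duty"},   # from the example in the text
    ("alpha", "UF"): {"beta"},    # hypothetical entry
    ("beta", "UF"): {"gamma"},    # hypothetical entry
}
```

Stopping as soon as a level adds no new items keeps the "at least N" case finite even when a thesaurus contains cycles, because every later level can then only repeat items that are already in the result.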
The subject of sentences and paragraphs requires some consideration for implementations planning to run the test suite.
In Section 1, Introduction, the Full Text specification defines the words "sentence" and "paragraph" as follows:
[Definition: A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.]
[Definition: A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.]
However, the next paragraph states that "Some formatting markup serves well as token boundaries, for example, paragraphs are most commonly delimited by formatting markup." and Section 3, Full-Text Selections, says "This sample tokenization uses white space, punctuation and XML tags as word-breakers and <p> for paragraph boundaries.".
Section 5.2.3, FTUnit and FTBigUnit, says "Support for the "sentences" alternative of FTUnit and the "sentence" alternative of FTBigUnit is optional. Similarly, support for the "paragraphs" alternative of FTUnit and the "paragraph" alternative of FTBigUnit is optional."
After considerable discussion, this topic has been resolved as follows: In the data used to test queries in the Full Text Test Suite, sentences are separated by a period (also known as a "full stop") followed immediately by white space, and paragraphs are encoded within a <p>...</p> element. Implementations that are configured to use a different mechanism for identifying sentence and/or paragraph boundaries may, provided this is clearly stated in plain text in their implementation reports, revise the relevant test data so that the sentence and paragraph boundaries defined in this paragraph are replaced by the corresponding boundaries required for that implementation.
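The boundary conventions used in the test data can be illustrated with a short Python sketch. It splits text into sentences at a period followed immediately by white space, and collects the text of each <p> element as a paragraph. The function names and the sample document are hypothetical, and the sketch is not a tokenizer; it only shows where the test data places its boundaries.

```python
import re
import xml.etree.ElementTree as ET

# White space that immediately follows a period marks a sentence boundary.
SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s+")


def split_sentences(text):
    """Split text at each period that is followed immediately by white space."""
    return [part for part in SENTENCE_BOUNDARY.split(text.strip()) if part]


def paragraphs(xml_text):
    """Return the text content of every <p> element, in document order."""
    root = ET.fromstring(xml_text)
    return ["".join(p.itertext()) for p in root.iter("p")]


# Hypothetical sample document, not taken from the test suite.
SAMPLE = "<doc><p>The dog ran. The cat sat.</p><p>Both slept.</p></doc>"
```

A period that is not followed by white space, as in "3.5", does not end a sentence under this convention.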
Customizing XQueryX tests must follow the same rules provided above. However, the XQueryX test cases do not include the insert-start/insert-end comments surrounding external variable declarations and schema imports. Therefore, a test harness must find the items to be customized in the XQueryX document using the information found in the catalog. The external variable declaration and variable references in the XQueryX document typically look as follows:
|
<xqx:varDecl> |
To check that a test case ran correctly, the implementation's result for the test case must be serialized and compared to the expected result file(s) provided in the test suite. Serialization should be performed as described in XSLT 2.0 and XQuery 1.0 Serialization [4] with method="xml". For each test case, the catalog defines which of the following five comparators is to be applied:
[1] XQuery and XPath Full Text 1.0 Test Suite Documentation
[3] Guidelines for Submitting XQFTTS Results
[4] XSLT 2.0 and XQuery 1.0 Serialization
Copyright © 1994-2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.