This document is an introduction to the structure transformation mechanism provided by Amaya. It describes the syntax of the transformation language and the way transformations are performed in the editor.
The file amaya/HTML.trans contains the description of available transformations. This file can be edited during an Amaya session. It is dynamically parsed when the transformation procedure is called by the editor, so new transformations can be added during an editing session.
Comments begin with ! and continue until the end of the line.
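The file consists of a list of transformation descriptions. Each transformation is described by three parts:
a name, terminated by a colon :
a source pattern, terminated by a semi-colon ;
a list of rules between braces { }, each rule terminated by a semi-colon ;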
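The three-part layout (name, pattern, rules) can be illustrated with a small Python sketch. It is not Amaya's parser: the function names are hypothetical, it handles a single transformation description, and it assumes that neither ! nor ; appears inside a pattern or an attribute value.

```python
def strip_comments(text):
    """Drop everything from '!' to the end of each line (assumes '!' only starts comments)."""
    return "\n".join(line.split("!", 1)[0] for line in text.splitlines())


def parse_transformation(text):
    """Split one transformation description into name, pattern and rules."""
    text = strip_comments(text)
    name, rest = text.split(":", 1)       # the name ends at the first colon
    pattern, rest = rest.split(";", 1)    # the pattern ends at the next semi-colon
    body = rest[rest.index("{") + 1 : rest.rindex("}")]  # rules sit between braces
    rules = [r.strip() for r in body.split(";") if r.strip()]
    return {"name": name.strip(), "pattern": pattern.strip(), "rules": rules}


description = """
! merge adjacent lists into one
Merge Lists: (ul{li+})+;
    {
    li > ul:li;
    }
"""
parsed = parse_transformation(description)
print(parsed["name"], "|", parsed["pattern"], "|", parsed["rules"])
```

Splitting on the first colon and the first semi-colon works here because the name precedes any pattern text and the pattern itself uses braces, not semi-colons, for nesting.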
The name appears in the Transform menu and identifies the transformation for the end-user.
The pattern describes a specific organization of the elements to be transformed. It acts as a filter over the HTML DTD. The purpose of the pattern is to identify a particular combination of elements to which the transformation can be applied. In a pattern it is possible to express conditions on sequences of tags, on the content of a tag, and on the existence and value of attributes.
Formally, a pattern contains HTML tags (possibly with attributes) and some composition operators:
| for choice
, for sibling
+ for sequence
? for option
( ) for grouping nodes
The braces { } define the content of a node.
The symbol * is a token that matches any element type.
It is possible to rename a tag by preceding it with a name followed by a colon (:). Rules can then refer to the matched element by that name.
The tag may have attributes. If no value is given for an attribute, an element is matched if the attribute is present. If a value is specified for the attribute, an element is matched only if the attribute is present and has the specified value.
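The attribute condition can be expressed as a short Python sketch. The function name and the use of None to mean "no value given in the pattern" are assumptions made for illustration; this is not Amaya's code.

```python
def attributes_match(required, element_attrs):
    """Check a pattern's attribute conditions against an element's attributes.

    `required` maps attribute names to the value demanded by the pattern,
    or to None when the pattern only asks for the attribute to be present.
    """
    for name, value in required.items():
        if name not in element_attrs:
            return False              # the attribute must be present
        if value is not None and element_attrs[name] != value:
            return False              # a specified value must match exactly
    return True


# Presence only: any value of "border" is accepted.
print(attributes_match({"border": None}, {"border": "1"}))
# Presence and value: "border" must equal "1".
print(attributes_match({"border": "1"}, {"border": "0"}))
```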
Examples of patterns are given at the end of the document.
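A rule expresses how some elements identified in the pattern are transformed. A rule has two parts separated by the symbol >:
a source tag or a name defined in the pattern
a target tag list
The target tag list is itself divided into two parts separated by a colon (:):
the generation location path
the list of tags to be generated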
The generation location path is searched in the leftmost branch of the document tree, starting from the parent of the element matching the highest symbol of the pattern.
In the target tag list, the dot symbol (.) is used for descending in the tree structure.
If the special token star (*) ends the list of tags to be generated, the source element tag is not changed, but it can be moved to a different place in the destination.
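To make the anatomy of a rule concrete, the following Python sketch splits a rule into its source, its generation location path, and its list of tags to be generated. The helper names are hypothetical. The sketch also assumes that a target without a colon has an empty location path, which this document does not state explicitly.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators that occur inside <...>."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_rule(rule):
    """Return the source, location path and generated tags of one rule."""
    source, target = rule.rstrip(";").split(">", 1)   # first '>' separates the two parts
    halves = split_top_level(target, ":")
    if len(halves) > 2:
        raise ValueError("more than one ':' in target tag list")
    if len(halves) == 2:
        location, generated = halves
    else:
        location, generated = "", halves[0]           # assumption: no colon, empty path
    return {
        "source": source.strip(),
        "location": [t for t in split_top_level(location, ".") if t],
        "generated": [t for t in split_top_level(generated, ".") if t],
    }


print(parse_rule("dt > <table border=1>.tbody:tr.td;"))
print(parse_rule("cell_content>:*;"))
```

Tracking the angle brackets keeps a tag written with attributes, such as <table border=1>, in one piece even if an attribute value contains a dot or a colon.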
If the source tag or the name in the left part of a rule is present more than once in the pattern, the rule transforms all the elements matching an occurrence of the tag in the pattern.
When the user chooses the Transform command from the Edit menu, Amaya parses the HTML.trans (or the MathML.trans, etc.) file. Then the selected elements are matched with the pattern of each transformation. The names of the matched transformations are proposed to the user in a pop-up menu.
If several transformations with the same name match the selected elements, the higher-level matching transformation is proposed to the user. If several transformations match at the same level, the first one declared in the HTML.trans file is proposed. As a consequence, it is recommended to specify the transformations with specific patterns before the more general ones.
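The tie-breaking between transformations that share a name can be sketched in Python as follows. The Match type and its fields are hypothetical. In particular, the sketch assumes that "level" can be represented as a depth in the document tree where a smaller number means a higher level; the actual representation used by Amaya is not described here.

```python
from typing import NamedTuple


class Match(NamedTuple):
    name: str    # transformation name shown in the menu
    depth: int   # assumption: 0 is the highest level in the document tree
    order: int   # position of the declaration in the .trans file


def proposed_transformations(matches):
    """Keep one match per name: highest level first, then earliest declaration."""
    best = {}
    for m in matches:
        current = best.get(m.name)
        if current is None or (m.depth, m.order) < (current.depth, current.order):
            best[m.name] = m
    return best


matches = [
    Match("Table", depth=2, order=0),
    Match("Table", depth=1, order=5),        # higher level wins despite later declaration
    Match("Merge Lists", depth=1, order=3),
    Match("Merge Lists", depth=1, order=1),  # same level: first declared wins
]
for name, m in proposed_transformations(matches).items():
    print(name, m.depth, m.order)
```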
Once a transformation has been chosen by the user, the destination structure is built according to the rules while selected elements are traversed.
Finally, the contents of the source elements (text and pictures, but also structured elements) are moved into the produced elements.
This transformation process for HTML documents is fully described in "Interactively Restructuring HTML Documents", a paper presented at the 5th International WWW Conference in Paris, May 1996, by Cécile Roisin and Stéphane Bonhomme.
Merge Lists: (ul{li+})+;
    {
    li > ul:li;
    }
The pattern matches a sequence of unnumbered lists (ul), each containing a sequence of items (li).
The rule expresses that each time an item is encountered when traversing the matched elements, a new li tag is created within a ul. When the rule is first applied, the resulting structure is empty, so there is no ul element in which the li can be created. Therefore a ul is created first, and then the rule can be applied.
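The incremental construction described for Merge Lists can be mimicked with a small Python sketch. It is a simplified model, not Amaya's algorithm: the Node class and function names are hypothetical, the content of the items is not moved, and the way Amaya chooses among several candidate branches when looking up the location path is not modelled (only one ul ever exists here).

```python
class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []

    def child(self, tag):
        """Return an existing child with this tag, or None."""
        return next((c for c in self.children if c.tag == tag), None)


def apply_rule(root, location, generated):
    """Follow the location path from root, creating missing nodes, then add the generated tags."""
    node = root
    for tag in location:
        found = node.child(tag)
        if found is None:          # first application: the ul does not exist yet
            found = Node(tag)
            node.children.append(found)
        node = found
    for tag in generated:          # each generated tag nests inside the previous one
        new = Node(tag)
        node.children.append(new)
        node = new


def render(node):
    inner = "".join(render(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


# Source: two lists holding three items in total; each li triggers `li > ul:li`.
result = Node("result")
for _ in range(3):
    apply_rule(result, ["ul"], ["li"])
print(render(result))
```

The first call creates the ul before adding its li; the later calls find the existing ul and only append a new li, so the three items end up in a single list.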
Table: dl{(dt|dd)+};
    {
    dt > <table border=1>.tbody:tr.td;
    dd > <table border=1>.tbody.tr:td;
    }
The pattern matches any Definition List element (dl).
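The rules explain how the table is incrementally built when the structure of the selected definition list is traversed:
each dt creates a new row (tr) in the table body, with a cell (td) inside it
each dd creates a new cell (td) in the row reached by the location path table.tbody.tr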
Remove Table: table{?caption,?(body:*{(tr{(td{(?cell_content:*)+}| th{(?cell_content:*)+} )})+})+};
    {
    caption>h3;
    cell_content>:*;
    }
The pattern matches any table and identifies the content of each cell of the table (cell_content).
The first rule turns the caption, if there is one, into an h3 heading. The second rule expresses that the contents of each cell have to be moved to the place of the original table.