Revision 1.15, Sun Aug 1 18:25:52 1999 UTC, by daniel (branch MAIN, tag HEAD)
Bunch of cleanup in the parser for better conformance to the spec.
Checked all WFCs (well-formedness constraints) and separated
getParameterEntity in SAX, since the same name can be used for both a
general and a parameter entity. Daniel.


           TODO for the XML parser:

- Add regression tests for all WFC errors

- Add regression tests with entity substitutions

- API to search the lang of an attribute

- Optimization of tag strings allocation.

- Language identification code, productions [33] to [38]
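
  => A possible starting point, as a sketch only: a checker for the
     LanguageID grammar as given in the XML 1.0 REC of February 1998
     (LanguageID ::= Langcode ('-' Subcode)*, where Langcode is a
     two-letter ISO639Code, an "i-" IanaCode or an "x-" UserCode).
     The names sketchCheckLanguageID, isAsciiLetter and skipLetters
     are hypothetical and not part of the parser.

```c
#include <stddef.h>

/* Hypothetical helper: 1 if c is an ASCII letter ([a-z] | [A-Z]). */
static int isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Skip one or more letters; return a pointer past them, or NULL if none. */
static const char *skipLetters(const char *p) {
    if (!isAsciiLetter(*p))
        return NULL;
    while (isAsciiLetter(*p))
        p++;
    return p;
}

/*
 * Sketch: returns 1 if lang matches production [33] LanguageID,
 * 0 otherwise. Assumes a NUL-terminated ASCII string.
 */
int sketchCheckLanguageID(const char *lang) {
    const char *p = lang;

    if (p == NULL)
        return 0;

    if ((p[0] == 'i' || p[0] == 'I' || p[0] == 'x' || p[0] == 'X') &&
        p[1] == '-') {
        /* [36] IanaCode or [37] UserCode: prefix, '-', then letters. */
        p = skipLetters(p + 2);
        if (p == NULL)
            return 0;
    } else if (isAsciiLetter(p[0]) && isAsciiLetter(p[1])) {
        /* [35] ISO639Code: exactly two letters. */
        p += 2;
    } else {
        return 0;
    }

    /* ('-' Subcode)*, with [38] Subcode being one or more letters. */
    while (*p == '-') {
        p = skipLetters(p + 1);
        if (p == NULL)
            return 0;
    }
    return *p == '\0';
}
```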

- Conditional sections in DTDs [61] to [65]

- External parsed entities, either XML or external subset, [78] and [79]
  (on the workbench). Parsing the xmllang DTD seems to end up in an
  infinite loop, though...

- Collect IDs at parse time and maintain a table. Problem: keeping the
  table coherent.

- Use it for XPath ID support

- Start adding validation, now that DTDs are fully parsed.

- Support for UTF-8 and UTF-16 encoding (Urgent !!!).
  => Added some conversion routines provided by Martin Durst, but I have
     not tried to glue them in yet. I plan to keep everything internally
     as UTF-8. This costs slightly more to process but is more compact,
     and the efficiency of recent processors is mostly cache related:
     the key to good performance is keeping the data set small, so I
     will.
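
  => To illustrate the kind of glue needed to keep everything as UTF-8
     internally, here is a minimal sketch of a UTF-16 to UTF-8
     conversion. This is NOT the routine provided by Martin Durst; the
     name sketchUTF16ToUTF8 and its signature are assumptions, and the
     input is assumed to be already in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: convert inlen UTF-16 code units (host byte order) to UTF-8.
 * Writes at most outlen bytes to out. Returns the number of bytes
 * written, or -1 on an unpaired surrogate or if out is too small.
 */
long sketchUTF16ToUTF8(const uint16_t *in, size_t inlen,
                       unsigned char *out, size_t outlen) {
    size_t i = 0, o = 0;

    while (i < inlen) {
        uint32_t c = in[i++];
        uint32_t lo;

        if (c >= 0xD800 && c <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i >= inlen)
                return -1;
            lo = in[i++];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1;              /* stray low surrogate */
        }

        if (c < 0x80) {             /* 1 byte: plain ASCII */
            if (o + 1 > outlen) return -1;
            out[o++] = (unsigned char) c;
        } else if (c < 0x800) {     /* 2 bytes */
            if (o + 2 > outlen) return -1;
            out[o++] = (unsigned char) (0xC0 | (c >> 6));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        } else if (c < 0x10000) {   /* 3 bytes */
            if (o + 3 > outlen) return -1;
            out[o++] = (unsigned char) (0xE0 | (c >> 12));
            out[o++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        } else {                    /* 4 bytes: beyond the BMP */
            if (o + 4 > outlen) return -1;
            out[o++] = (unsigned char) (0xF0 | (c >> 18));
            out[o++] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
            out[o++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    return (long) o;
}
```

     ASCII input stays at one byte per character, which is where the
     compactness argument above comes from.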

- Progressive parsing. The entity support is a first step toward
  abstraction of an input stream. A large part of the context is still
  located on the stack; moving to a state machine and putting everything
  in the parsing context should provide an adequate solution.
  => Rather than progressive parsing, give more power to the SAX-like
     interface. Currently the DOM-like representation is built by the
     parser itself, but it should be possible to define tree building
     purely as a set of SAX callbacks and remove the tree creation from
     the parser code.
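
  => A rough sketch of that last idea, under loud assumptions: the
     driver below is a toy that only recognizes <name>, </name> and
     character data (no attributes, entities, comments or PIs), and
     every name in it (SketchSAXHandler, sketchParse, SketchTreeCtx,
     sketchTreeHandler) is hypothetical. It only shows the shape: the
     driver knows nothing about trees, and the tree builder is just one
     possible set of callbacks.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_MAX  64
#define SKETCH_MAX_NODES 32

/* Hypothetical SAX-like handler: a table of optional callbacks. */
typedef struct {
    void (*startElement)(void *ctx, const char *name);
    void (*endElement)(void *ctx, const char *name);
    void (*characters)(void *ctx, const char *text, size_t len);
} SketchSAXHandler;

/* Toy driver: emits events only. Returns 0 on success, -1 on error. */
int sketchParse(const char *in, const SketchSAXHandler *sax, void *ctx) {
    char name[SKETCH_NAME_MAX];

    while (*in != '\0') {
        if (*in == '<') {
            int closing = (in[1] == '/');
            const char *start = in + 1 + closing;
            const char *end = strchr(start, '>');
            size_t len;

            if (end == NULL) return -1;
            len = (size_t) (end - start);
            if (len == 0 || len >= sizeof(name)) return -1;
            memcpy(name, start, len);
            name[len] = '\0';
            if (closing) {
                if (sax->endElement) sax->endElement(ctx, name);
            } else {
                if (sax->startElement) sax->startElement(ctx, name);
            }
            in = end + 1;
        } else {
            const char *end = strchr(in, '<');
            size_t len = end ? (size_t) (end - in) : strlen(in);

            if (sax->characters) sax->characters(ctx, in, len);
            in += len;
        }
    }
    return 0;
}

/* Tree building expressed purely as callbacks, outside the driver. */
typedef struct SketchNode {
    char name[SKETCH_NAME_MAX];
    struct SketchNode *parent;
    int nchildren;
} SketchNode;

typedef struct {
    SketchNode pool[SKETCH_MAX_NODES];  /* fixed pool keeps the sketch short */
    int used;
    int failed;                         /* set when the pool is exhausted */
    SketchNode *root;
    SketchNode *current;                /* open element, NULL at top level */
} SketchTreeCtx;

static void treeStart(void *ctx, const char *name) {
    SketchTreeCtx *t = (SketchTreeCtx *) ctx;
    SketchNode *n;

    if (t->failed || t->used >= SKETCH_MAX_NODES) { t->failed = 1; return; }
    n = &t->pool[t->used++];
    strncpy(n->name, name, sizeof(n->name) - 1);
    n->name[sizeof(n->name) - 1] = '\0';
    n->parent = t->current;
    n->nchildren = 0;
    if (t->current != NULL) t->current->nchildren++;
    else if (t->root == NULL) t->root = n;
    t->current = n;
}

static void treeEnd(void *ctx, const char *name) {
    SketchTreeCtx *t = (SketchTreeCtx *) ctx;

    (void) name;                        /* no name matching in this sketch */
    if (!t->failed && t->current != NULL) t->current = t->current->parent;
}

/* Character data is ignored by this builder, hence the NULL callback. */
const SketchSAXHandler sketchTreeHandler = { treeStart, treeEnd, NULL };
```

     A caller that only wants events passes its own handler table; one
     that wants a tree passes sketchTreeHandler and a zeroed
     SketchTreeCtx.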


Done:
- DOM support: instead of using a proprietary in-memory format for
  the document representation, the parser should call a DOM API to
  actually build the resulting document. The parser then becomes
  independent of the in-memory representation of the document. Even
  better, using RPCs the parser can actually build the document in
  another program.
  => Work started; the internal representation is now by default very
     close to a direct DOM implementation. The DOM glue is implemented
     as a separate module. See the gdome module.
- C++ support : John Ehresman <jehresma@dsg.harvard.edu>
- Updated code to follow more recent specs, added compatibility flag
- Better error handling, use a dedicated, overridable error
  handling function.
- Support for CDATA.
- Keep track of line numbers for better error reporting.
- Support for PI (SAX one).
- Support for comments (bad, should be in ASAP: they are parsed
  but not stored); should be configurable.
- Improve the support of entities on save (+SAX).
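
For illustration, the "dedicated, overridable error handling function"
and the line number tracking from the list above could be combined
roughly as follows. This is a sketch, not the parser's actual interface:
sketchSetErrorFunc, sketchReportError, SketchErrorLog and sketchLogError
are all hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical handler type: receives a user context and the line. */
typedef void (*SketchErrorFunc)(void *ctx, int line,
                                const char *fmt, va_list ap);

/* Default behaviour: print "line N: message" on stderr. */
static void sketchDefaultError(void *ctx, int line,
                               const char *fmt, va_list ap) {
    (void) ctx;
    fprintf(stderr, "line %d: ", line);
    vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
}

static SketchErrorFunc sketchErrorFunc = sketchDefaultError;
static void *sketchErrorCtx = NULL;

/* Install a replacement handler; passing NULL restores the default. */
void sketchSetErrorFunc(void *ctx, SketchErrorFunc f) {
    sketchErrorCtx = ctx;
    sketchErrorFunc = (f != NULL) ? f : sketchDefaultError;
}

/* What parser code would call at the point where an error is found. */
void sketchReportError(int line, const char *fmt, ...) {
    va_list ap;

    va_start(ap, fmt);
    sketchErrorFunc(sketchErrorCtx, line, fmt, ap);
    va_end(ap);
}

/* Example override: keep the last message instead of printing it. */
typedef struct {
    char msg[128];
    int line;
    int count;
} SketchErrorLog;

void sketchLogError(void *ctx, int line, const char *fmt, va_list ap) {
    SketchErrorLog *log = (SketchErrorLog *) ctx;

    vsnprintf(log->msg, sizeof(log->msg), fmt, ap);
    log->line = line;
    log->count++;
}
```

An application embedding the parser would call sketchSetErrorFunc once
with its own context, for example to route messages to a dialog or a
log file instead of stderr.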

$Id: TODO,v 1.15 1999/08/01 18:25:52 daniel Exp $
