Annotation of XML/TODO, revision 1.23

1.1       daniel      1: 
1.16      daniel      2:            TODO for the XML parser and stuff:
                      3:           ==================================
                      4: 
1.18      daniel      5: URGENT:
                      6: =======
                      7: - Support for UTF-8 and UTF-16 encoding
                      8:   => added some conversion routines provided by Martin Durst but I didn't
                      9:      try to glue them in. I plan to keep everything internally as UTF-8;
                     10:      this is slightly more costly but more compact, and recent processors'
                     11:      efficiency is cache related. The key to good performance is keeping
                     12:      the data set small, so I will.
                     13:   => the new progressive reading routines call the detection code, which
                     14:      needs to be enabled; then test the ISO->UTF-8 stuff, and add more
                     15:      charset conversion routines.
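
A minimal sketch of the kind of ISO->UTF-8 routine mentioned above, assuming
ISO-8859-1 (Latin-1) input. The name latin1_to_utf8 and its signature are
hypothetical; this is not the routine provided by Martin Durst nor a libxml
API. Every Latin-1 byte maps to the Unicode code point of the same value, so
bytes below 0x80 are copied and the rest expand to two UTF-8 bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml): convert ISO-8859-1 bytes to
 * UTF-8. Returns the number of bytes written to out, or (size_t)-1 if
 * out is too small to hold the result.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outcap)
{
    size_t o = 0;
    size_t i;

    for (i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII range: one byte, unchanged. */
            if (o + 1 > outcap)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* 0x80..0xFF: two bytes, 110000xx 10xxxxxx. */
            if (o + 2 > outcap)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

An output buffer of twice the input length is always large enough, since no
Latin-1 byte expands to more than two UTF-8 bytes.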
                     16: 
                     17: TODO:
                     18: =====
                     19: 
1.21      daniel     20: - Tools to produce man pages from the SGML docs.
1.17      daniel     21: 
1.16      daniel     22: - Finish XPath
1.18      daniel     23:   => attributes addressing troubles
1.19      daniel     24:   => defaulted attributes handling
                     25:   => namespace axis ?
1.16      daniel     26: 
                     27: - Add Xpointer recognition/API
                     28: 
                     29: - Add Xlink recognition/API
1.18      daniel     30:   => started adding an xlink.[ch] with a unified API for XML and HTML.
1.16      daniel     31: 
                     32: - Implement XSLT
1.19      daniel     33:   => seems that someone volunteered ?!?
1.16      daniel     34: 
1.18      daniel     35: - Implement XSchemas
                     36: 
1.16      daniel     37: - O2K parsing;
                     38:   => this is a somewhat ugly mix of HTML and XML, adding a specific
                     39:      routine in the comment parsing code of HTML and plugging the XML
                     40:      parsing one in there should not be too hard. Key point is to get
                     41:      XSL to transform all this to something decent ...
                     42: 
1.15      daniel     43: - Add regression tests for all WFC errors
1.16      daniel     44:   => did some in test/WFC , not added to the Makefile yet.
1.15      daniel     45: 
1.13      daniel     46: - Optimization of tag strings allocation.
                     47: 
1.12      daniel     48: - Language identification code, productions [33] to [38]
                     49: 
                     50: - Conditional sections in DTDs [61] to [65]
1.18      daniel     51:   => should this crap be really implemented ???
1.12      daniel     52: 
1.11      daniel     53: 
1.18      daniel     54: - Allow parsed entities defined in the internal subset to override
                     55:   the ones defined in the external subset (DtD customization).
                     56:   => This means that the entity content should be computed only at
                     57:      use time, i.e. keep the orig string only at parse time and expand
                     58:      only when referenced from the external subset :-(
1.20      daniel     59:      Needed for complete use of most DTD from Eve Maler
1.18      daniel     60: 
                     61: - maintain coherency of namespace when doing cut'n paste operations
                     62:   => the functions are coded, but need testing
1.11      daniel     63: 
1.18      daniel     64: - function to rebuild the ID table ?
1.10      daniel     65: 
1.18      daniel     66: - extend the shell with:
                     67:    - edit
                     68:    - load/save
                     69:    - mv (yum, yum, but it's harder because directories are ordered in
                     70:      our case, mvup and mvdown would be required)
1.16      daniel     71: 
1.18      daniel     72: - Parsing of a well balanced chunk
1.8       daniel     73: 
1.20      daniel     74: - Add HTML validation using the XHTML DTD
                     75:   - problem: do we want to keep and maintain the code for handling
                     76:     DTD/System ID cache directly in libxml ?
                     77: 
                     78: - Add a DTD cache prefilled with xhtml DTDs and entities and a program to
                     79:   manage them -> like the /usr/bin/install-catalog from SGML
                     80:   right place seems $datadir/xmldtds
                     81: 
                     82: - turn tester into a generic program xml-test installed with xml-devel
1.16      daniel     83: 
1.19      daniel     84: - Add output to XHTML in case of HTML documents.
                     85: 
1.22      daniel     86: - dynamically adapt the alloc entry point to use g_alloc()/g_free()
                     87:   if the programmer wants it
                     88: 
1.23    ! daniel     89: - I18N: http://wap.trondheim.com/vaer/index.phtml is not XML yet is accepted
        !            90:   by the XML parser; UTF-8 should be checked when there is no "encoding"
        !            91:   declared!
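
A minimal sketch of the UTF-8 check suggested above, which a parser could
apply to input that declares no "encoding". The name is_valid_utf8 is
hypothetical and is not a libxml API. It assumes the modern definition of
UTF-8 (sequences of at most 4 bytes, no overlong forms, no surrogates,
nothing above U+10FFFF), which is stricter than older definitions.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml): return 1 if buf[0..len) is
 * well-formed UTF-8, 0 otherwise.
 */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;           /* number of continuation bytes expected */
        size_t k;
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {                     /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* 110xxxxx */
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {    /* 1110xxxx */
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {    /* 11110xxx */
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                       /* stray continuation or bad lead */
        }

        if (len - i - 1 < n)
            return 0;                       /* truncated sequence */

        for (k = 1; k <= n; k++) {
            unsigned char t = buf[i + k];
            if ((t & 0xC0) != 0x80)
                return 0;                   /* not a 10xxxxxx byte */
            cp = (cp << 6) | (t & 0x3F);
        }

        if (cp < min)
            return 0;                       /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;                       /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* UTF-16 surrogate */

        i += n + 1;
    }
    return 1;
}
```

A document in ISO-8859-1 containing accented characters will usually fail
this check, which is what allows an undeclared non-UTF-8 input to be
rejected instead of silently accepted.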
        !            92: 
1.16      daniel     93: Done:
                     94: =====
                     95: 
1.20      daniel     96: - External entities loading: 
                     97:    - allow override by client code
                     98:    - make sure it is called for all external entities referenced
                     99:   Done, client code should use xmlSetExternalEntityLoader() to set
                    100:   the default loading routine. It will be called each time an external
                    101:   entity resolution is triggered.
                    102: - maintain ID coherency when removing/changing attributes
                    103:   The function used to deallocate attributes now checks for it being an
                    104:   ID and removes it from the table.
                    105: - push mode parsing i.e. non-blocking state based parser
                    106:   done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
                    107:   and xmlParseChunk() and html counterparts.
                    108:   The tester program now has a --push option to select that parser 
                    109:   front-end. Duplicated tests to use both and check that results are similar.
                    110: 
1.18      daniel    111: - Most of XPath, still see some troubles and occasional memleaks.
                    112: - an XML shell, allowing to traverse/manipulate an XML document with
                    113:   a shell-like interface, and using XPath for the naming syntax
1.20      daniel    114:   - use of readline and history added when available
                    115:   - the shell interface has been cleanly separated and moved to debugXML.c
1.18      daniel    116: - HTML parser, should be fairly stable now
                    117: - API to search the lang of an attribute
                    118: - Collect IDs at parsing and maintain a table. 
                    119:    PBM: maintain the table coherency
                    120:    PBM: how to detect ID types in absence of DtD !
                    121: - Use it for XPath ID support
                    122: - Add validity checking
                    123:   Should be finished now !
1.16      daniel    124: - Add regression tests with entity substitutions
                    125: 
                    126: - External Parsed entities, either XML or external Subset [78] and [79]
                    127:   parsing the xmllang DtD now works, so it should be sufficient for
                    128:   most cases !
                    129: 
                    130: - progressive reading. The entity support is a first step toward
1.7       daniel    131:   abstraction of an input stream. A large part of the context is still
                    132:   located on the stack, moving to a state machine and putting everything
                    133:   in the parsing context should provide an adequate solution.
1.8       daniel    134:   => Rather than progressive parsing, give more power to the SAX-like
                    135:      interface. Currently the DOM-like representation is built but
1.18      daniel    136:      => it should be possible to define that only as a set of SAX callbacks
                    137:        and remove the tree creation from the parser code.
                    138:        DONE
1.8       daniel    139: 
1.1       daniel    140: - DOM support, instead of using a proprietary in memory
                    141:   format for the document representation, the parser should
                    142:   call a DOM API to actually build the resulting document.
                    143:   Then the parser becomes independent of the in-memory
                    144:   representation of the document. Even better using RPC's
                    145:   the parser can actually build the document in another
                    146:   program.
1.8       daniel    147:   => Work started, now the internal representation is by default
                    148:      very near a direct DOM implementation. The DOM glue is implemented
1.16      daniel    149:      as a separate module. See the GNOME gdome module.
                    150: 
1.2       daniel    151: - C++ support : John Ehresman <jehresma@dsg.harvard.edu>
1.6       daniel    152: - Updated code to follow more recent specs, added compatibility flag
1.7       daniel    153: - Better error handling, use a dedicated, overridable error
                    154:   handling function.
                    155: - Support for CDATA.
                    156: - Keep track of line numbers for better error reporting.
                    157: - Support for PI (SAX one).
1.8       daniel    158: - Support for Comments (bad, should be in ASAP, they are parsed
                    159:   but not stored), should be configurable.
                    160: - Improve the support of entities on save (+SAX).
1.2       daniel    161: 
