Annotation of charlint/Overview.html, revision 1.50

1.1       duerst      1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
1.34      duerst      2:                       "http://www.w3.org/TR/REC-html40/loose.dtd">
                      3: <html>
                      4: <head>
                      5:   <meta http-equiv="Content-Type" content="text/html">
                      6:   <meta http-equiv="Content-Style-Type" content="text/css">
1.1       duerst      7:   <!--BASE href="http://www.w3.org/Consortium/Translation/"-->
                      8:   <!--LINK rel="stylesheet" href="../i18n.css"-->
1.34      duerst      9:   <style type="text/css">  <!--
                     10: H1.title {text-align: center }
                     11: P.toolbar { text-align: center }
                     12: DIV.deliverable { margin-left: 2em;
                     13: margin-right: 2em }
                     14: P.note { margin-left: 10%;
                     15: margin-right: 10%;
                     16: color: green }
                     17: TH { text-align: left }
                     18: TH, TD { padding: 2px }
                     19: .external { font-style: italic }
                     20: -->
                     21: 
1.45      duerst     22: 
1.47      duerst     23: 
1.34      duerst     24:   </style>
                     25:   <title>Charlint - A Character Normalization Tool</title>
                     26:   <link rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
                     27: </head>
                     28: 
                     29: <body bgcolor="#FFFFFF" text="#000000">
                     30: <p><a href="/"><img border="0" src="/Icons/WWW/w3c_home" alt="W3C" width="72"
                     31: height="48"></a> <a href="/International"><img src="/Icons/WWW/i18n-alt"
                     32: alt="International" width="72" height="48" border="0"></a></p>
                     33: 
                     34: <h1>Charlint - A Character Normalization Tool</h1>
                     35: 
                     36: <p><a href="#Perl">Perl source</a> | <a href="#Recommended">Recommended Data
                     37: Files</a> | <a href="#How">How to use</a> | <a href="#Future">Future Plans</a>
                     38: | <a href="#Background">Background </a>| <a href="#Version">Version
                     39: History</a></p>
                     40: 
1.50    ! duerst     41: <p><strong>IMPORTANT</strong>: The newest version, 0.40, implements NFC (Canonical
        !            42: Composition) and NFD (Canonical Decomposition), including Hangul.</p>
1.36      duerst     43: 
1.34      duerst     44: <p>Charlint is a character normalization/checking tool written in Perl. Among
                     45: other things, it implements Normalization Forms C and D of <a
                     46: href="http://www.unicode.org/unicode/reports/tr15/">Unicode TR 15</a>.</p>
                     47: 
                     48: <h3><a name="Perl">Perl Source</a> and Installation</h3>
                     49: 
                     50: <p>Charlint, aka 'Charlie', is written in <a
                     51: href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>. You can
                     52: get the source from <a
                     53: href="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</a>.
                     54: Charlint is covered by the <a
                     55: href="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C software
                     56: licence</a>. To install charlint, please make sure you have installed <a
                     57: href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>, you have
                     58: downloaded an appropriate character data file, and you have downloaded the <a
                     59: href="http://www.w3.org/International/charlint/charlint.pl">Perl source</a>.
                     60: Please send error reports or comments to <a
                     61: href="mailto:duerst@w3.org">duerst@w3.org</a>; for announcements and public
                     62: discussion please see the Winter mailing list (www-international@w3.org).</p>
                     63: 
                     64: <h3><a name="Recommended">Recommended Character Data Files</a></h3>
                     65: 
                     66: <p>Charlint needs information on characters in order to work correctly. To
                     67: indicate the file you want to use, please use the -f option. The currently
                     68: recommended character data file is available from <a
1.40      duerst     69: href="ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt">ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt</a>.
1.34      duerst     70: Composition exclusions are currently hard-coded and are based on <a
1.41      duerst     71: href="ftp://ftp.unicode.org/Public/UNIDATA/CompositionExclusions.txt">ftp://ftp.unicode.org/Public/UNIDATA/CompositionExclusions.txt</a>
1.42      duerst     72: [Final 3.0.0 version of 10 Sept 1999 for both files; identical with the
                     73: versions available on the CD-ROM provided with <a
                     74: href="http://www.unicode.org/unicode/uni2book/u2.html">The Unicode Standard,
                     75: Version 3.0</a>]. Additional information on these and other files can be found
                     76: at <a
                     77: href="http://www.unicode.org/unicode/onlinedat/online.html">http://www.unicode.org/unicode/onlinedat/online.html</a>.</p>
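<p>As a rough illustration of the kind of information such a data file carries, the following Python sketch (not part of charlint, which is written in Perl) reads a few sample lines in the semicolon-separated UnicodeData format and extracts combining classes and canonical decompositions. The function name <code>parse_unicode_data</code> and the three sample lines are assumptions chosen for illustration. Mappings that start with a tag in angle brackets are compatibility mappings and are skipped here.</p>

```python
# Illustrative sketch only; charlint's own Perl code may read the file differently.
# Sample lines in UnicodeData format: fields are separated by ';'.
SAMPLE = """\
00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
"""


def parse_unicode_data(lines):
    """Return (combining classes, canonical decompositions) keyed by code point."""
    comb_class = {}
    canon_decomp = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code = int(fields[0], 16)   # field 0: code point in hexadecimal
        ccc = int(fields[3])        # field 3: canonical combining class
        if ccc != 0:
            comb_class[code] = ccc
        mapping = fields[5]         # field 5: decomposition mapping
        # A leading tag such as <noBreak> marks a compatibility mapping.
        if mapping and not mapping.startswith("<"):
            canon_decomp[code] = [int(cp, 16) for cp in mapping.split()]
    return comb_class, canon_decomp


comb_class, canon_decomp = parse_unicode_data(SAMPLE.splitlines())
print(comb_class)
print(canon_decomp)
```

<p>Only the non-zero combining classes are stored in this sketch, so a lookup that finds nothing can be treated as class 0 (a starter).</p>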
1.34      duerst     78: 
                     79: <h3><a name="How">How to use charlint</a></h3>
                     80: 
                     81: <p>Charlint is a perl script that works as a simple filter. It uses UTF-8 both
                     82: for input and for output. Behaviour can be fine-tuned with various options. A
                     83: list of options such as the one below can be obtained by using <kbd>charlint
                     84: -h</kbd>.</p>
                     85: <pre>(options prefixed by # are currently not available)
1.14      duerst     86: -b: Remove initial 'Byte Order Mark'
                     87: -B: Supress warning about initial 'Byte Order Mark'
1.48      duerst     88: -C: Do not normalize
1.14      duerst     89: -d: Debug: Thoroughly check character data table input
                     90: -D: Leave after reading in character data
                     91: -e: # remove undefined codepoints
                     92: -E: Do not warn about undefined codepoints
                     93: -f file: Read data from file
                     94:          (please use newest V3.0 beta datafiles)
                     95: -h: Prints out this short description
1.48      duerst     96: -k: # Warn about compatibility codepoints
                     97: -K: # Normalize out compatibility codepoints
1.14      duerst     98: -n: Accept &amp;#ddddd; and &amp;#xhhhh; on input
1.34      duerst     99:         (beware of &lt;![CDATA[, &lt;SCRIPT>, &lt;STYLE>)
1.48      duerst    100: -N: Produce &amp;#xhhhh; on output
                    101: -o: Print out 'unprintable' bytes as \octal
1.14      duerst    102: -p: # Remove stuff in private zone
                    103: -P: Supress checking private zone
                    104: -u: # Fix UTF-8 (convert or remove)
                    105: -U: Supress checking correctness of UTF-8
1.48      duerst    106: -v: Print version
                    107: -x: Do decomposition only
                    108: -X: Don't do decomposition (assume input is decomposed)</pre>
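<p>To show what the two normalization forms do to a string, here is a small Python sketch. It does not call charlint; it uses Python's standard <code>unicodedata</code> module as a stand-in, on the assumption that it follows the same Unicode normalization rules. The helper names <code>to_nfc</code>, <code>to_nfd</code> and <code>codepoints</code> are made up for this example. NFC corresponds roughly to charlint's default behaviour, and NFD to what the -x option (decomposition only) is described as doing.</p>

```python
import unicodedata


def to_nfc(text):
    """Canonical decomposition followed by canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text):
    """Canonical decomposition only (NFD)."""
    return unicodedata.normalize("NFD", text)


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


# 'A' followed by COMBINING RING ABOVE composes to a single code point.
print(codepoints(to_nfc("A\u030a")))   # U+00C5

# A precomposed Hangul syllable decomposes into its conjoining jamo.
print(codepoints(to_nfd("\ud55c")))    # U+1112 U+1161 U+11AB
```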
1.34      duerst    109: 
                    110: <h3><a name="Version">Version History</a></h3>
1.47      duerst    111: <pre># 2000/08/03: 0.40, added Hangul support and did quite some testing  MJD
                    112: # 2000/08/02: 0.37, added -x and -X for decomposition                MJD
                    113: # 2000/07/27: 0.36, fixed a bug for non-starter decompositions       MJD
1.39      duerst    114: # 2000/07/24: 0.35, adapted exclusions to 3.0.0 final (+Tibetan)     MJD
                    115: # 2000/07/24: 0.34, $chClass = $CombClass{ch}; should read $chClass = $CombClass{$ch};
                    116: #                   implemented -C                                   MJD
                    117: # 1999/08/16: 0.33, updated for second version of 3.0.0.beta         MJD
                    118: # 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta      MJD
1.23      duerst    119: # 1999/06/25: 0.31, fixed reordering bug, going public               MJD
1.39      duerst    120: # 1999/06/23: 0.30, preparation for W3C member test, without Hangul  MJD</pre>
1.34      duerst    121: 
                    122: <h3><a name="Background">Background</a></h3>
                    123: <ul>
                    124:   <li><a href="http://www.w3.org/TR/WD-charmod">Character Model for the World
                    125:     Wide Web</a> (W3C Working Draft)</li>
                    126:   <li><a href="http://www.unicode.org/unicode/reports/tr15/">Unicode Technical
1.43      duerst    127:     Report #15</a> (Version 18 part of Unicode V 3.0)</li>
1.34      duerst    128:   <li><a href="http://www.w3.org/Status">W3C Open Source Releases</a></li>
                    129: </ul>
                    130: 
                    131: <h3><a name="Future">Future Plans</a></h3>
                    132: 
                    133: <p>We have just released the first version of charlint. There are many things
                    134: we plan to add in the future:</p>
                    135: <ul>
                    137:   <li>Removal of undefined codepoints and codepoints in the private zone</li>
                    138:   <li>Removal/fix of incorrect UTF-8</li>
1.44      duerst    139:   <li>Compatibility character detection or removal</li>
                    140:   <li>Detection or removal of characters not suitable for markup</li>
1.34      duerst    141: </ul>
                    142: 
                    143: <p>Your help (bug reports, patches, ideas, test cases) is welcome.</p>
                    144: <hr>
                    145: 
                    146: <address>
                    147:   <a href="mailto:duerst@w3.org">Martin Dürst</a> <br>
                    148:   <a href="/Help/Webmaster.html">Webmaster</a> <br>
1.47      duerst    149:   last revised $Date: 2000/07/27 09:43:49 $ by $Author: duerst $ 
1.34      duerst    150: </address>
                    151: 
                    152: <p class="policyfooter"><small><a
                    153: href="/Consortium/Legal/ipr-notice.html#Copyright">Copyright</a>  ©  1997 <a
                    154: href="http://www.w3.org">W3C</a> (<a href="http://www.lcs.mit.edu">MIT</a>, <a
                    155: href="http://www.inria.fr/">INRIA</a>, <a
                    156: href="http://www.keio.ac.jp/">Keio</a> ), All Rights Reserved. W3C <a
                    157: href="/Consortium/Legal/ipr-notice.html#Legal Disclaimer">liability,</a> <a
                    158: href="/Consortium/Legal/ipr-notice.html#W3C Trademarks">trademark</a>, <a
                    159: href="/Consortium/Legal/copyright-documents.html">document use </a>and <a
                    160: href="/Consortium/Legal/copyright-software.html">software licensing </a>rules
                    161: apply. Your interactions with this site are in accordance with our <a
                    162: href="/Consortium/Legal/privacy-statement.html#Public">public</a> and <a
                    163: href="/Consortium/Legal/privacy-statement.html#Members">Member</a> privacy
                    164: statements.</small></p>
                    165: </body>
                    166: </html>
