Annotation of charlint/Overview.html, revision 1.34

1.1       duerst      1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
1.34    ! duerst      2:                       "http://www.w3.org/TR/REC-html40/loose.dtd">
        !             3: <html>
        !             4: <head>
        !             5:   <meta http-equiv="Content-Type" content="text/html">
        !             6:   <meta http-equiv="Content-Style-Type" content="text/css">
1.1       duerst      7:   <!--BASE href="http://www.w3.org/Consortium/Translation/"-->
                      8:   <!--LINK rel="stylesheet" href="../i18n.css"-->
1.34    ! duerst      9:   <style type="text/css">  <!--
        !            10: H1.title {text-align: center }
        !            11: P.toolbar { text-align: center }
        !            12: DIV.deliverable { margin-left: 2em;
        !            13: margin-right: 2em }
        !            14: P.note { margin-left: 10%;
        !            15: margin-right: 10%;
        !            16: color: green }
        !            17: TH { text-align: left }
        !            18: TH, TD { padding: 2px }
        !            19: .external { font-style: italic }
        !            20: -->
        !            21: 
        !            22:   </style>
        !            23:   <title>Charlint - A Character Normalization Tool</title>
        !            24:   <link rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
        !            25: </head>
        !            26: 
        !            27: <body bgcolor="#FFFFFF" text="#000000">
        !            28: <p><a href="/"><img border="0" src="/Icons/WWW/w3c_home" alt="W3C" width="72"
        !            29: height="48"></a> <a href="/International"><img src="/Icons/WWW/i18n-alt"
        !            30: alt="International" width="72" height="48" border="0"></a></p>
        !            31: 
        !            32: <h1>Charlint - A Character Normalization Tool</h1>
        !            33: 
        !            34: <p><a href="#Perl">Perl source</a> | <a href="#Recommended">Recommended Data
        !            35: Files</a> | <a href="#How">How to use</a> | <a href="#Future">Future Plans</a>
        !            36: | <a href="#Background">Background</a> | <a href="#Version">Version
        !            37: History</a></p>
        !            38: 
        !            39: <p>Charlint is a character normalization/checking tool written in Perl. Among
        !            40: other things, it implements Normalization Form C of <a
        !            41: href="http://www.unicode.org/unicode/reports/tr15/">Unicode TR 15</a>.</p>
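<p>As an illustration of what Normalization Form C does, here is a minimal sketch that uses Python's standard <code>unicodedata</code> module rather than charlint itself. The function name <code>to_nfc</code> is an assumption made for this example, and Python ships newer Unicode data than the 3.0 beta files discussed on this page, so results for individual characters may differ from charlint's.</p>

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C.

    NFC first decomposes canonically, reorders combining marks by
    combining class, and then recomposes wherever a precomposed
    character exists.
    """
    return unicodedata.normalize("NFC", text)


# 'u' followed by COMBINING DIAERESIS composes to a single character.
decomposed = "u\u0308"
composed = to_nfc(decomposed)

# Combining marks typed in different orders normalize to the same string.
circumflex_first = to_nfc("a\u0302\u0323")
dot_below_first = to_nfc("a\u0323\u0302")
```

<p>Comparing <code>circumflex_first</code> with <code>dot_below_first</code> shows why a normalizer has to reorder combining marks before composing them.</p>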
        !            42: 
        !            43: <h3><a name="Perl">Perl Source</a> and Installation</h3>
        !            44: 
        !            45: <p>Charlint, aka 'Charlie', is written in <a
        !            46: href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>. You can
        !            47: get the source from <a
        !            48: href="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</a>.
        !            49: Charlint is covered by the <a
        !            50: href="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C software
        !            51: licence</a>. To install charlint, please make sure you have installed <a
        !            52: href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>, you have
        !            53: downloaded an appropriate character data file, and you have downloaded the <a
        !            54: href="http://www.w3.org/International/charlint/charlint.pl">Perl source</a>.
        !            55: Please send error reports or comments to <a
        !            56: href="mailto:duerst@w3.org">duerst@w3.org</a>; for announcements and public
        !            57: discussion please see the Winter mailing list (www-international@w3.org).</p>
        !            58: 
        !            59: <h3><a name="Recommended">Recommended Character Data Files</a></h3>
        !            60: 
        !            61: <p>Charlint needs information on characters in order to work correctly. To
        !            62: indicate the file you want to use, please use the -f option. The currently
        !            63: recommended character data file is available from <a
        !            64: href="ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt</a>.
        !            65: Composition exclusions are currently hard-coded and are based on <a
        !            66: href="ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt</a>
        !            67: [version of 04 Aug 1999 for both files]. Additional information on these and
        !            68: other files can be found at <a
        !            69: href="http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html">http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html</a>.
        !            70: Please note that this data file is a beta version; the beta test will last up
        !            71: to 15 August 1999. Using charlint is one way to test this data file. Please
        !            72: send any comments on <em>the data file</em> to <a
        !            73: href="mailto:errata@unicode.org">errata@unicode.org</a>.</p>
        !            74: 
        !            75: <h3><a name="How">How to use charlint</a></h3>
        !            76: 
        !            77: <p>Charlint is a Perl script that works as a simple filter. It uses UTF-8 both
        !            78: for input and for output. Behaviour can be fine-tuned with various options. A
        !            79: list of options such as the one below can be obtained by running <kbd>charlint
        !            80: -h</kbd>.</p>
        !            81: <pre>(options prefixed by # are currently not available)
1.14      duerst     82: -b: Remove initial 'Byte Order Mark'
                     83: -B: Suppress warning about initial 'Byte Order Mark'
                     84: -d: Debug: Thoroughly check character data table input
                     85: -D: Leave after reading in character data
                     86: -e: # remove undefined codepoints
                     87: -E: Do not warn about undefined codepoints
                     88: -f file: Read data from file
                     89:          (please use newest V3.0 beta datafiles)
                     90: -C: Do not normalize
                     91: -h: Prints out this short description
                     92: -n: Accept &amp;#ddddd; and &amp;#xhhhh; on input
1.34    ! duerst     93:         (beware of &lt;![CDATA[, &lt;SCRIPT>, &lt;STYLE>)
1.14      duerst     94: -N: Produce &amp;#xhh; on output
                     95: -o: Print out 'unprintable' bytes as octal
                     96: -p: # Remove stuff in private zone
                     97: -P: Suppress checking private zone
                     98: -u: # Fix UTF-8 (convert or remove)
                     99: -U: Suppress checking correctness of UTF-8
1.34    ! duerst    100: -v: Print version</pre>
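<p>The interplay of the <kbd>-n</kbd> and <kbd>-N</kbd> options can be sketched as follows. This is not charlint's code: the function names are hypothetical, and two behaviours are assumptions made for the example, namely that only references to non-ASCII characters are expanded (so markup-significant characters such as <code>&amp;#38;</code> stay escaped) and that output references use uppercase hexadecimal digits.</p>

```python
import re
import unicodedata

# Matches decimal (&#ddddd;) and hexadecimal (&#xhhhh;) character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_ncrs(text: str) -> str:
    """Replace numeric character references by the characters they denote.

    Assumption for this sketch: references to ASCII characters and
    out-of-range values are left untouched.
    """
    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0x80 <= code <= 0x10FFFF:
            return chr(code)
        return match.group(0)

    return NCR.sub(repl, text)


def escape_non_ascii(text: str) -> str:
    """Write every non-ASCII character as a hexadecimal reference."""
    return "".join(c if ord(c) < 0x80 else f"&#x{ord(c):X};" for c in text)


def filter_text(text: str) -> str:
    """Expand references, normalize to NFC, then re-escape non-ASCII."""
    expanded = expand_ncrs(text)
    normalized = unicodedata.normalize("NFC", expanded)
    return escape_non_ascii(normalized)
```

<p>Expanding references before normalizing matters because a base letter followed by a reference to a combining mark is only recognized as a composable pair once the reference has been turned into a character. A purely textual pass like this one cannot tell markup from content, which is why the option list warns about CDATA sections, SCRIPT and STYLE.</p>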
        !           101: 
        !           102: <h3><a name="Version">Version History</a></h3>
        !           103: <pre># 1999/06/23: 0.30, preparation for W3C member test, without Hangul  MJD
1.23      duerst    104: # 1999/06/25: 0.31, fixed reordering bug, going public               MJD
1.34    ! duerst    105: # 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta      MJD
        !           106: # 1999/08/16: 0.33, updated for second version of 3.0.0.beta         MJD
        !           107: # 2000/07/24: 0.34, fixed bug: $chClass = $CombClass{ch}; should read $chClass = $CombClass{$ch};
        !           108: #                   implemented -C                                   MJD</pre>
        !           109: 
        !           110: <h3><a name="Background">Background</a></h3>
        !           111: <ul>
        !           112:   <li><a href="http://www.w3.org/TR/WD-charmod">Character Model for the World
        !           113:     Wide Web</a> (W3C Working Draft)</li>
        !           114:   <li><a href="http://www.unicode.org/unicode/reports/tr15/">Unicode Technical
        !           115:     Report #15</a> (Version 14 approved subject to various changes)</li>
        !           116:   <li><a href="http://www.w3.org/Status">W3C Open Source Releases</a></li>
        !           117: </ul>
        !           118: 
        !           119: <h3><a name="Future">Future Plans</a></h3>
        !           120: 
        !           121: <p>We have just released the first version of charlint. There are many things
        !           122: we plan to add in the future:</p>
        !           123: <ul>
        !           124:   <li>Hangul syllable normalization</li>
        !           125:   <li>Removal of undefined codepoints and codepoints in the private zone</li>
        !           126:   <li>Removal/fix of incorrect UTF-8</li>
        !           127:   <li>Incorporate knowledge of V3.0 decompositions to automatically detect
        !           128:     future precomposites.</li>
        !           129:   <li>Compatibility character detection or removal.</li>
        !           130:   <li>Detection or removal of characters not suitable for markup.</li>
        !           131: </ul>
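<p>For the first item, Hangul syllable composition is defined arithmetically in the Unicode Standard, so it needs no table lookup. The following is a minimal sketch of that arithmetic, not a description of how charlint will implement it. The constants are those given by the Unicode Standard; the function name <code>compose_hangul</code> is hypothetical.</p>

```python
# Constants from the Unicode Standard's Hangul composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def compose_hangul(text: str) -> str:
    """Combine conjoining jamo sequences into precomposed syllables."""
    out = []
    for ch in text:
        cp = ord(ch)
        if out:
            last = ord(out[-1])

            # Leading consonant + vowel -> LV syllable.
            l_index = last - L_BASE
            v_index = cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue

            # LV syllable + trailing consonant -> LVT syllable.
            s_index = last - S_BASE
            t_index = cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue

        out.append(ch)
    return "".join(out)
```

<p>For example, the jamo sequence U+1112 U+1161 U+11AB has indices 18, 0 and 4, giving 0xAC00 + (18 &times; 21 + 0) &times; 28 + 4 = U+D55C.</p>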
        !           132: 
        !           133: <p>Your help (bug reports, patches, ideas, test cases) is welcome.</p>
        !           134: 
        !           135: <p></p>
        !           136: 
        !           137: <p></p>
        !           138: <hr>
        !           139: 
        !           140: <address>
        !           141:   <a href="mailto:duerst@w3.org">Martin Dürst</a> <br>
        !           142:   <a href="/Help/Webmaster.html">Webmaster</a> <br>
        !           143:   last revised $Date: 1999/09/29 09:03:18 $ by $Author: duerst $ 
        !           144: </address>
        !           145: 
        !           146: <p class="policyfooter"><small><a
        !           147: href="/Consortium/Legal/ipr-notice.html#Copyright">Copyright</a>  ©  1997 <a
        !           148: href="http://www.w3.org">W3C</a> (<a href="http://www.lcs.mit.edu">MIT</a>, <a
        !           149: href="http://www.inria.fr/">INRIA</a>, <a
        !           150: href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
        !           151: href="/Consortium/Legal/ipr-notice.html#Legal Disclaimer">liability,</a> <a
        !           152: href="/Consortium/Legal/ipr-notice.html#W3C Trademarks">trademark</a>, <a
        !           153: href="/Consortium/Legal/copyright-documents.html">document use</a> and <a
        !           154: href="/Consortium/Legal/copyright-software.html">software licensing</a> rules
        !           155: apply. Your interactions with this site are in accordance with our <a
        !           156: href="/Consortium/Legal/privacy-statement.html#Public">public</a> and <a
        !           157: href="/Consortium/Legal/privacy-statement.html#Members">Member</a> privacy
        !           158: statements.</small></p>
        !           159: </body>
        !           160: </html>
