Diff for /charlint/Overview.html between versions 1.55 and 1.56

version 1.55, 2000/08/03 11:09:39 version 1.56, 2000/11/08 09:22:20
Line 6 Line 6
   <meta http-equiv="Content-Style-Type" content="text/css">    <meta http-equiv="Content-Style-Type" content="text/css">
   <!--BASE href="http://www.w3.org/Consortium/Translation/"-->    <!--BASE href="http://www.w3.org/Consortium/Translation/"-->
   <!--LINK rel="stylesheet" href="../i18n.css"-->    <!--LINK rel="stylesheet" href="../i18n.css"-->
   <style type="text/css">  <!--    <style type="text/css">
     <!--
 H1.title {text-align: center }  H1.title {text-align: center }
 P.toolbar { text-align: center }  P.toolbar { text-align: center }
 DIV.deliverable { margin-left: 2em;  DIV.deliverable { margin-left: 2em;
Line 21  TH, TD { padding: 2px } Line 22  TH, TD { padding: 2px }
   
   
   
   
   </style>    </style>
   <title>Charlint - A Character Normalization Tool</title>    <title>Charlint - A Character Normalization Tool</title>
   <link rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">    <link rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
 </head>  </head>
   
 <body bgcolor="#FFFFFF" text="#000000">  <body bgcolor="#FFFFFF" text="#000000">
 <p><a href="/"><img border="0" src="/Icons/WWW/w3c_home" alt="W3C" width="72"  
 height="48"></a> <a href="/International"><img src="/Icons/WWW/i18n-alt"  
 alt="International" width="72" height="48" border="0"></a></p>  
   
 <h1>Charlint - A Character Normalization Tool</h1>  <h1>Charlint - A Character Normalization Tool</h1>
   
 <p><a href="#Perl">Perl source</a> | <a href="#Recommended">Recommended Data  <p>For more information on Charlint, please see the <a
 Files</a> | <a href="#How">How to use</a> | <a href="#Future">Future Plans</a>  href="http://www.w3.org/International/Charlint/">Charlint home page</a>.</p>
 | <a href="#Background">Background </a>| <a href="#Version">Version  
 History</a></p>  
   
 <p><strong><span style="background-color:  
 #FFE500">IMPORTANT</span></strong><span style="background-color: #FFE500">:  
 Newest version 0.40, implements Normalization Form C (NFC, Canonical  
 Composition) and NFD (Canonical Decomposition), including Hangul.</span></p>  
   
 <p>Charlint is a character normalization/checking tool written in Perl. Among
 other things, it implements Normalization Form C of <a
 href="http://www.unicode.org/unicode/reports/tr15/">Unicode TR 15</a>.</p>  
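 <p>To illustrate what Normalization Form C (NFC) and NFD mean in practice, the
 following is a minimal sketch using Python's standard <code>unicodedata</code>
 module. It is a stand-in for illustration only and is not charlint's own Perl
 code; the helper names <code>to_nfc</code> and <code>to_nfd</code> are
 hypothetical.</p>

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Canonical decomposition followed by canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Canonical decomposition only (NFD)."""
    return unicodedata.normalize("NFD", text)


# 'e' followed by COMBINING ACUTE ACCENT versus the precomposed letter.
DECOMPOSED_E_ACUTE = "e\u0301"
COMPOSED_E_ACUTE = "\u00e9"

# A Hangul syllable (U+AC00) versus its two conjoining jamo.
HANGUL_SYLLABLE = "\uac00"
HANGUL_JAMO = "\u1100\u1161"

if __name__ == "__main__":
    print(to_nfc(DECOMPOSED_E_ACUTE) == COMPOSED_E_ACUTE)
    print(to_nfd(HANGUL_SYLLABLE) == HANGUL_JAMO)
```

 <p>Both forms represent the same text; NFC prefers precomposed characters where
 they exist, while NFD spells every character out as a base character plus
 combining marks.</p>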
   
 <h3><a name="Perl">Perl Source</a> and Installation</h3>  
   
 <p>Charlint, aka 'Charlie', is written in <a
 href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>. You can  
 get the source from <a  
 href="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</a>.  
 Charlint is covered by the <a  
 href="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C software  
 licence</a>. To install charlint, please make sure you have installed <a  
 href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>, you have  
 downloaded an appropriate character data file, and you have downloaded the <a  
 href="http://www.w3.org/International/charlint/charlint.pl">Perl source</a>.  
 Please send error reports or comments to <a
 href="mailto:duerst@w3.org">duerst@w3.org</a>; for announcements and public
 discussion please see the Winter mailing list (www-international@w3.org).</p>  
   
 <h3><a name="Recommended">Recommended Character Data Files</a></h3>  
   
 <p>Charlint needs information on characters in order to work correctly. To  
 indicate the file you want to use, please use the -f option. The currently  
 recommended character data file is available from <a  
 href="ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt">ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt</a>.  
 Composition exclusions are currently hard-coded and are based on <a  
 href="ftp://ftp.unicode.org/Public/UNIDATA/CompositionExclusions.txt">ftp://ftp.unicode.org/Public/UNIDATA/CompositionExclusions.txt</a>  
 [Final 3.0.0 version of 10 Sept 1999 for both files; identical with the  
 versions available on the CD-ROM provided with <a  
 href="http://www.unicode.org/unicode/uni2book/u2.html">The Unicode Standard,  
 Version 3.0</a>]. Additional information on these and other files can be found  
 at <a  
 href="http://www.unicode.org/unicode/onlinedat/online.html">http://www.unicode.org/unicode/onlinedat/online.html</a>.</p>  
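 <p>As a rough illustration of the information such a character data file
 carries, the sketch below parses single lines in the semicolon-separated
 UnicodeData format and extracts the fields that matter for normalization
 (combining class and decomposition mapping). It is written in Python for
 illustration and does not reflect how charlint itself reads the file; the names
 <code>CharRecord</code> and <code>parse_unicode_data_line</code> and the sample
 constants are hypothetical.</p>

```python
from typing import NamedTuple, Optional, Tuple


class CharRecord(NamedTuple):
    codepoint: int
    name: str
    combining_class: int
    decomposition: Tuple[int, ...]  # empty if the character does not decompose
    compat_tag: Optional[str]       # e.g. "<noBreak>"; None for canonical mappings


def parse_unicode_data_line(line: str) -> CharRecord:
    """Parse one semicolon-separated line in UnicodeData format."""
    fields = line.rstrip("\n").split(";")
    parts = fields[5].split()
    tag = None
    # A leading <tag> marks a compatibility (not canonical) decomposition.
    if parts and parts[0].startswith("<"):
        tag = parts.pop(0)
    return CharRecord(
        codepoint=int(fields[0], 16),
        name=fields[1],
        combining_class=int(fields[3]),
        decomposition=tuple(int(p, 16) for p in parts),
        compat_tag=tag,
    )


# Sample lines in UnicodeData format, used here only as test input.
SAMPLE_CANONICAL = ("00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;"
                    ";;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9")
SAMPLE_COMPAT = "00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;"
SAMPLE_MARK = "0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;"

if __name__ == "__main__":
    for sample in (SAMPLE_CANONICAL, SAMPLE_COMPAT, SAMPLE_MARK):
        print(parse_unicode_data_line(sample))
```

 <p>Only canonical mappings (those without a tag) take part in NFC and NFD; the
 combining class is what drives the reordering of combining marks.</p>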
   
 <h3><a name="How">How to use charlint</a></h3>  
   
 <p>Charlint is a Perl script that works as a simple filter. It uses UTF-8 both
 for input and for output. Behaviour can be fine-tuned with various options. A
 list of options such as the one below can be obtained by using <kbd>charlint
 -h</kbd>.</p>
 <pre>(options prefixed by # are currently not available)  
 -b: Remove initial 'Byte Order Mark'  
 -B: Supress warning about initial 'Byte Order Mark'  
 -C: Do not normalize  
 -d: Debug: Thoroughly check character data table input  
 -D: Leave after reading in character data  
 -e: # remove undefined codepoints  
 -E: Do not warn about undefined codepoints  
 -f file: Read data from file  
          (please use newest V3.0 beta datafiles)  
 -h: Prints out this short description  
 -k: # Warn about compatibility codepoints  
 -K: # Normalize out compatibility codepoints  
 -n: Accept &amp;#ddddd; and &amp;#xhhhh; on input  
         (beware of &lt;![CDATA[, &lt;SCRIPT>, &lt;STYLE>)  
 -N: Produce &amp;#xhhhh; on output  
 -o: Print out 'unprintable' bytes as \octal  
 -p: # Remove stuff in private zone  
 -P: Supress checking private zone  
 -u: # Fix UTF-8 (convert or remove)  
 -U: Supress checking correctness of UTF-8  
 -v: Print version  
 -x: Do decomposition only  
 -X: Don't do decomposition (assume input is decomposed)</pre>  
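 <p>The <kbd>-n</kbd> and <kbd>-N</kbd> options deal with numeric character
 references. The sketch below shows, in Python, one plausible reading of that
 behaviour: expanding <code>&amp;#ddddd;</code> and <code>&amp;#xhhhh;</code> on
 input, and writing characters as <code>&amp;#xhhhh;</code> on output. It is an
 illustration, not charlint's code. The function names are hypothetical, and the
 choice to escape only non-ASCII characters on output is an assumption.</p>

```python
import re

# Matches &#ddddd; (decimal) and &#xhhhh; (hexadecimal) references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_ncrs(text: str) -> str:
    """Replace numeric character references with the characters they denote."""
    def repl(match: "re.Match[str]") -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave references outside the Unicode range untouched.
        if cp > 0x10FFFF:
            return match.group(0)
        return chr(cp)
    return _NCR.sub(repl, text)


def to_ncrs(text: str) -> str:
    """Write non-ASCII characters as &#xhhhh; (assumption: ASCII stays as-is)."""
    return "".join(c if ord(c) < 0x80 else f"&#x{ord(c):04X};" for c in text)


if __name__ == "__main__":
    print(expand_ncrs("caf&#233; and caf&#xE9;"))
    print(to_ncrs("caf\u00e9"))
```

 <p>A plain textual substitution like this cannot tell markup from content, which
 is why the option list warns about CDATA sections, SCRIPT and STYLE.</p>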
   
 <h3><a name="Version">Version History</a></h3>  
 <pre># 2000/08/03: 0.40, added Hangul support and did quite some testing  MJD  
 # 2000/08/02: 0.37, added -x and -X for decomposition                MJD  
 # 2000/07/27: 0.36, fixed a bug for non-starter decompositions       MJD  
 # 2000/07/24: 0.35, adapted exclusions to 3.0.0 final (+Tibetan)     MJD  
 # 2000/07/24: 0.34, $chClass = $CombClass{ch}; should read $chClass = $CombClass{$ch};  
 #                   implemented -C                                   MJD  
 # 1999/08/16: 0.33, updated for second version of 3.0.0.beta         MJD  
 # 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta      MJD  
 # 1999/06/25: 0.31, fixed reordering bug, going public               MJD  
 # 1999/06/23: 0.30, preparation for W3C member test, without Hangul  MJD</pre>  
   
 <h3><a name="Background">Background</a></h3>  
 <ul>  
   <li><a href="http://www.w3.org/TR/WD-charmod">Character Model for the World  
     Wide Web</a> (W3C Working Draft)</li>  
   <li><a href="http://www.unicode.org/unicode/reports/tr15/">Unicode Technical  
     Report #15</a> (Version 18 part of Unicode V 3.0)</li>  
   <li><a href="http://www.w3.org/Status">W3C Open Source Releases</a></li>  
 </ul>  
   
 <h3><a name="Future">Future Plans</a></h3>  
   
 <p>We have just released the first version of charlint. There are many things  
 we plan to add in the future:</p>  
 <ul>  
   <li>Hangul syllable normalization (Done in version 0.40)</li>  
   <li>Removal of undefined codepoints and codepoints in the private zone</li>  
   <li>Removal/fix of incorrect UTF-8</li>  
   <li>Compatibility character detection or removal</li>  
   <li>Detection or removal of characters not suitable for markup</li>  
 </ul>  
   
 <p>Your help (bug reports, patches, ideas, test cases) is welcome.</p>  
 <hr>  <hr>
   
 <address>  <address>
   <a href="mailto:duerst@w3.org">Martin Dürst</a> <br>    <a href="mailto:duerst@w3.org">Martin Dürst</a>
   <a href="/Help/Webmaster.html">Webmaster</a> <br>  
   last revised $Date$ by $Author$   
 </address>  </address>
   
 <p class="policyfooter"><small><a  
 href="/Consortium/Legal/ipr-notice-20000612#Copyright">Copyright</a>  ©  1997 <a  
 href="http://www.w3.org">W3C</a> (<a href="http://www.lcs.mit.edu">MIT</a>, <a  
 href="http://www.inria.fr/">INRIA</a>, <a  
 href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a  
 href="/Consortium/Legal/ipr-notice-20000612#Legal_Disclaimer">liability,</a> <a  
 href="/Consortium/Legal/ipr-notice-20000612#W3C_Trademarks">trademark</a>, <a  
 href="/Consortium/Legal/copyright-documents-19990405">document use </a>and <a  
 href="/Consortium/Legal/copyright-software-19980720">software licensing</a>  
 rules apply. Your interactions with this site are in accordance with our <a  
 href="/Consortium/Legal/privacy-statement-20000612#Public">public</a> and <a  
 href="/Consortium/Legal/privacy-statement-20000612#Members">Member</a> privacy  
 statements.</small></p>  
 </body>  </body>
 </html>  </html>


