version 1.55, 2000/08/03 11:09:39
|
version 1.56, 2000/11/08 09:22:20
|
Line 6
|
Line 6
|
<meta http-equiv="Content-Style-Type" content="text/css"> |
<meta http-equiv="Content-Style-Type" content="text/css"> |
<!--BASE href="http://www.w3.org/Consortium/Translation/"--> |
<!--BASE href="http://www.w3.org/Consortium/Translation/"--> |
<!--LINK rel="stylesheet" href="../i18n.css"--> |
<!--LINK rel="stylesheet" href="../i18n.css"--> |
<style type="text/css"> <!-- |
<style type="text/css"> |
|
<!-- |
H1.title {text-align: center } |
H1.title {text-align: center } |
P.toolbar { text-align: center } |
P.toolbar { text-align: center } |
DIV.deliverable { margin-left: 2em; |
DIV.deliverable { margin-left: 2em; |
Line 21 TH, TD { padding: 2px }
|
Line 22 TH, TD { padding: 2px }
|
|
|
|
|
|
|
|
|
</style> |
</style> |
<title>Charlint - A Character Normalization Tool</title> |
<title>Charlint - A Character Normalization Tool</title> |
<link rel="stylesheet" type="text/css" href="../../StyleSheets/base.css"> |
<link rel="stylesheet" type="text/css" href="../../StyleSheets/base.css"> |
</head> |
</head> |
|
|
<body bgcolor="#FFFFFF" text="#000000"> |
<body bgcolor="#FFFFFF" text="#000000"> |
<p><a href="/"><img border="0" src="/Icons/WWW/w3c_home" alt="W3C" width="72" |
|
height="48"></a> <a href="/International"><img src="/Icons/WWW/i18n-alt" |
|
alt="International" width="72" height="48" border="0"></a></p> |
|
|
|
<h1>Charlint - A Character Normalization Tool</h1> |
<h1>Charlint - A Character Normalization Tool</h1> |
|
|
<p><a href="#Perl">Perl source</a> | <a href="#Recommended">Recommended Data |
<p>For more information on Charlint, please see the <a |
Files</a> | <a href="#How">How to use</a> | <a href="#Future">Future Plans</a> |
href="http://www.w3.org/International/Charlint/">Charlint home page</a>.</p> |
| <a href="#Background">Background </a>| <a href="#Version">Version |
|
History</a></p> |
|
|
|
<p><strong><span style="background-color: |
|
#FFE500">IMPORTANT</span></strong><span style="background-color: #FFE500">: |
|
The newest version, 0.40, implements Normalization Form C (NFC, Canonical |
|
Composition) and NFD (Canonical Decomposition), including Hangul.</span></p> |
|
|
|
<p>Charlint is a character normalization/checking tool written in Perl. Among |
|
other things, it implements Normalization Form C of <a |
|
href="http://www.unicode.org/unicode/reports/tr15/">Unicode TR 15</a>.</p> |
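<p>To illustrate what the two normalization forms do, here is a minimal sketch in Python. It is not Charlint's Perl code: it relies on Python's standard <code>unicodedata</code> module, which follows a newer Unicode version than the 3.0 data Charlint is built on, and the helper names <code>to_nfc</code> and <code>to_nfd</code> are made up for this example.</p>

```python
import unicodedata


def to_nfc(text):
    """Canonical decomposition followed by canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text):
    """Canonical decomposition only (NFD)."""
    return unicodedata.normalize("NFD", text)


# 'e' followed by COMBINING ACUTE ACCENT composes to a single code point.
decomposed = "e\u0301"
composed = to_nfc(decomposed)

# A precomposed Hangul syllable decomposes into its conjoining jamo.
hangul = "\uAC00"
jamo = to_nfd(hangul)

print([hex(ord(c)) for c in composed])
print([hex(ord(c)) for c in jamo])
```

<p>Both directions round-trip: applying NFC to the jamo sequence gives back the original syllable.</p>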
|
|
|
<h3><a name="Perl">Perl Source</a> and Installation</h3> |
|
|
|
<p>Charlint, aka 'Charlie', is written in <a |
|
href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>. You can |
|
get the source from <a |
|
href="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</a>. |
|
Charlint is covered by the <a |
|
href="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C software |
|
licence</a>. To install charlint, please make sure you have installed <a |
|
href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>, you have |
|
downloaded an appropriate character data file, and you have downloaded the <a |
|
href="http://www.w3.org/International/charlint/charlint.pl">Perl source</a>. |
|
Please send error reports or comments to <a |
|
href="mailto:duerst@w3.org">duerst@w3.org</a>; for announcements and public |
|
discussion please see the Winter mailing list (www-international@w3.org).</p> |
|
|
|
<h3><a name="Recommended">Recommended Character Data Files</a></h3> |
|
|
|
<p>Charlint needs information on characters in order to work correctly. To |
|
indicate the file you want to use, please use the -f option. The currently |
|
recommended character data file is available from <a |
|
href="ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt">ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt</a>. |
|
Composition exclusions are currently hard-coded and are based on <a |
|
href="ftp://ftp.unicode.org/Public/UNIDATA/CompositionExclusions.txt">ftp://ftp.unicode.org/Public/UNIDATA/CompositionExclusions.txt</a> |
|
[Final 3.0.0 version of 10 Sept 1999 for both files; identical with the |
|
versions available on the CD-ROM provided with <a |
|
href="http://www.unicode.org/unicode/uni2book/u2.html">The Unicode Standard, |
|
Version 3.0</a>]. Additional information on these and other files can be found |
|
at <a |
|
href="http://www.unicode.org/unicode/onlinedat/online.html">http://www.unicode.org/unicode/onlinedat/online.html</a>.</p> |
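<p>As a rough illustration of the information such a data file carries, the Python sketch below reads one semicolon-separated record in the UnicodeData format and extracts the canonical combining class and the canonical decomposition. This is an assumption about what a normalizer needs from the file, not a description of how Charlint parses it; the function name <code>parse_unicode_data_line</code> is hypothetical. The last lines show the effect of a composition exclusion, using Python's own <code>unicodedata</code> module rather than Charlint.</p>

```python
import unicodedata


def parse_unicode_data_line(line):
    """Return (code point, combining class, canonical decomposition).

    Field 0 is the code point in hex, field 3 the canonical combining
    class, field 5 the decomposition mapping. Mappings that start with
    a <tag> are compatibility decompositions and are skipped here.
    """
    fields = line.rstrip("\n").split(";")
    code = int(fields[0], 16)
    combining_class = int(fields[3])
    mapping = fields[5]
    canonical = []
    if mapping and not mapping.startswith("<"):
        canonical = [int(h, 16) for h in mapping.split()]
    return code, combining_class, canonical


# Two records as they appear in UnicodeData.txt.
A_GRAVE = ("00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;"
           "LATIN CAPITAL LETTER A GRAVE;;;00E0;")
NBSP = "00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;"

print(parse_unicode_data_line(A_GRAVE))
print(parse_unicode_data_line(NBSP))

# U+0958 DEVANAGARI LETTER QA is listed in CompositionExclusions.txt,
# so NFC leaves it decomposed instead of recomposing it.
excluded_nfc = unicodedata.normalize("NFC", "\u0958")
```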
|
|
|
<h3><a name="How">How to use charlint</a></h3> |
|
|
|
<p>Charlint is a Perl script that works as a simple filter. It uses UTF-8 both |
|
for input and for output. Behaviour can be fine-tuned with various options. A |
|
list of options such as the one below can be obtained by using <kbd>charlint |
|
-h</kbd>.</p> |
|
<pre>(options prefixed by # are currently not available) |
|
-b: Remove initial 'Byte Order Mark' |
|
-B: Supress warning about initial 'Byte Order Mark' |
|
-C: Do not normalize |
|
-d: Debug: Thoroughly check character data table input |
|
-D: Leave after reading in character data |
|
-e: # remove undefined codepoints |
|
-E: Do not warn about undefined codepoints |
|
-f file: Read data from file |
|
(please use newest V3.0 beta datafiles) |
|
-h: Prints out this short description |
|
-k: # Warn about compatibility codepoints |
|
-K: # Normalize out compatibility codepoints |
|
-n: Accept &#ddddd; and &#xhhhh; on input |
|
(beware of <![CDATA[, <SCRIPT>, <STYLE>) |
|
-N: Produce &#xhhhh; on output |
|
-o: Print out 'unprintable' bytes as \octal |
|
-p: # Remove stuff in private zone |
|
-P: Supress checking private zone |
|
-u: # Fix UTF-8 (convert or remove) |
|
-U: Supress checking correctness of UTF-8 |
|
-v: Print version |
|
-x: Do decomposition only |
|
-X: Don't do decomposition (assume input is decomposed)</pre> |
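<p>The following Python sketch mimics the filter behaviour described above: UTF-8 in, UTF-8 out, with optional Byte Order Mark removal, numeric character reference handling, and a decomposition-only mode. It is only an approximation under stated assumptions, not Charlint's implementation. The function and parameter names are invented, the mapping of parameters to the -b, -n, -N and -x options is a loose analogy, and the choice to escape every non-ASCII character on output is a guess at what "produce" means.</p>

```python
import re
import unicodedata

BOM = "\uFEFF"
# Matches decimal (&#ddddd;) and hexadecimal (&#xhhhh;) references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_ncrs(text):
    """Replace numeric character references with the characters they name."""
    def repl(match):
        code = int(match.group(1), 16) if match.group(1) else int(match.group(2))
        # Leave surrogates and out-of-range values untouched.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)
    return _NCR.sub(repl, text)


def encode_ncrs(text):
    """Write every non-ASCII character as a hexadecimal reference."""
    return "".join(c if ord(c) < 0x80 else "&#x%04X;" % ord(c) for c in text)


def filter_text(data, strip_bom=False, accept_ncrs=False,
                produce_ncrs=False, decompose_only=False):
    """Normalize UTF-8 bytes and return UTF-8 bytes."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on bad UTF-8
    if strip_bom and text.startswith(BOM):
        text = text[1:]
    if accept_ncrs:
        text = decode_ncrs(text)
    text = unicodedata.normalize("NFD" if decompose_only else "NFC", text)
    if produce_ncrs:
        text = encode_ncrs(text)
    return text.encode("utf-8")


print(filter_text(b"e&#x301;", accept_ncrs=True, produce_ncrs=True))
```

<p>Note that this sketch decodes references everywhere in the input, including inside CDATA sections, SCRIPT and STYLE elements, which is exactly the situation the -n option warns about.</p>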
|
|
|
<h3><a name="Version">Version History</a></h3> |
|
<pre># 2000/08/03: 0.40, added Hangul support and did quite some testing MJD |
|
# 2000/08/02: 0.37, added -x and -X for decomposition MJD |
|
# 2000/07/27: 0.36, fixed a bug for non-starter decompositions MJD |
|
# 2000/07/24: 0.35, adapted exclusions to 3.0.0 final (+Tibetan) MJD |
|
# 2000/07/24: 0.34, $chClass = $CombClass{ch}; should read $chClass = $CombClass{$ch}; |
|
# implemented -C MJD |
|
# 1999/08/16: 0.33, updated for second version of 3.0.0.beta MJD |
|
# 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta MJD |
|
# 1999/06/25: 0.31, fixed reordering bug, going public MJD |
|
# 1999/06/23: 0.30, preparation for W3C member test, without Hangul MJD</pre> |
|
|
|
<h3><a name="Background">Background</a></h3> |
|
<ul> |
|
<li><a href="http://www.w3.org/TR/WD-charmod">Character Model for the World |
|
Wide Web</a> (W3C Working Draft)</li> |
|
<li><a href="http://www.unicode.org/unicode/reports/tr15/">Unicode Technical |
|
Report #15</a> (Version 18, part of Unicode V 3.0)</li> |
|
<li><a href="http://www.w3.org/Status">W3C Open Source Releases</a></li> |
|
</ul> |
|
|
|
<h3><a name="Future">Future Plans</a></h3> |
|
|
|
<p>We have just released the first version of charlint. There are many things |
|
we plan to add in the future:</p> |
|
<ul> |
|
<li>Hangul syllable normalization (Done in version 0.40)</li> |
|
<li>Removal of undefined codepoints and codepoints in the private zone</li> |
|
<li>Removal/fix of incorrect UTF-8</li> |
|
<li>Compatibility character detection or removal</li> |
|
<li>Detection or removal of characters not suitable for markup</li> |
|
</ul> |
|
|
|
<p>Your help (bug reports, patches, ideas, test cases) is welcome.</p> |
|
<hr> |
<hr> |
|
|
<address> |
<address> |
<a href="mailto:duerst@w3.org">Martin Dürst</a> <br> |
<a href="mailto:duerst@w3.org">Martin Dürst</a> |
<a href="/Help/Webmaster.html">Webmaster</a> <br> |
|
last revised $Date$ by $Author$ |
|
</address> |
</address> |
|
|
<p class="policyfooter"><small><a |
|
href="/Consortium/Legal/ipr-notice-20000612#Copyright">Copyright</a> © 1997 <a |
|
href="http://www.w3.org">W3C</a> (<a href="http://www.lcs.mit.edu">MIT</a>, <a |
|
href="http://www.inria.fr/">INRIA</a>, <a |
|
href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a |
|
href="/Consortium/Legal/ipr-notice-20000612#Legal_Disclaimer">liability,</a> <a |
|
href="/Consortium/Legal/ipr-notice-20000612#W3C_Trademarks">trademark</a>, <a |
|
href="/Consortium/Legal/copyright-documents-19990405">document use </a>and <a |
|
href="/Consortium/Legal/copyright-software-19980720">software licensing</a> |
|
rules apply. Your interactions with this site are in accordance with our <a |
|
href="/Consortium/Legal/privacy-statement-20000612#Public">public</a> and <a |
|
href="/Consortium/Legal/privacy-statement-20000612#Members">Member</a> privacy |
|
statements.</small></p> |
|
</body> |
</body> |
</html> |
</html> |