Annotation of charlint/Overview.html, revision 1.46
1.1 duerst 1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
1.34 duerst 2: "http://www.w3.org/TR/REC-html40/loose.dtd">
3: <html>
4: <head>
5: <meta http-equiv="Content-Type" content="text/html">
6: <meta http-equiv="Content-Style-Type" content="text/css">
1.1 duerst 7: <!--BASE href="http://www.w3.org/Consortium/Translation/"-->
8: <!--LINK rel="stylesheet" href="../i18n.css"-->
1.34 duerst 9: <style type="text/css"> <!--
10: H1.title {text-align: center }
11: P.toolbar { text-align: center }
12: DIV.deliverable { margin-left: 2em;
13: margin-right: 2em }
14: P.note { margin-left: 10%;
15: margin-right: 10%;
16: color: green }
17: TH { text-align: left }
18: TH, TD { padding: 2px }
19: .external { font-style: italic }
20: -->
21:
1.45 duerst 22:
1.34 duerst 23: </style>
24: <title>Charlint - A Character Normalization Tool</title>
25: <link rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
26: </head>
27:
28: <body bgcolor="#FFFFFF" text="#000000">
29: <p><a href="/"><img border="0" src="/Icons/WWW/w3c_home" alt="W3C" width="72"
30: height="48"></a> <a href="/International"><img src="/Icons/WWW/i18n-alt"
31: alt="International" width="72" height="48" border="0"></a></p>
32:
33: <h1>Charlint - A Character Normalization Tool</h1>
34:
35: <p><a href="#Perl">Perl source</a> | <a href="#Recommended">Recommended Data
36: Files</a> | <a href="#How">How to use</a> | <a href="#Future">Future Plans</a>
37: | <a href="#Background">Background </a>| <a href="#Version">Version
38: History</a></p>
39:
1.46 ! duerst 40: <p><strong>IMPORTANT</strong>: Newest version 0.36, fixes serious bugs (24
! 41: July 2000)</p>
1.36 duerst 42:
1.34 duerst 43: <p>Charlint is a character normalization/checking tool written in Perl. Among
44: other things, it implements Normalization Form C of <a
45: href="http://www.unicode.org/unicode/reports/tr15/">Unicode TR 15</a>.</p>
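
<p>To illustrate what Normalization Form C does, here is a minimal sketch in
Python (not charlint's own Perl code). It relies on Python's standard
<code>unicodedata</code> module, whose character data is newer than Unicode
3.0, so results for characters added later may differ from charlint's. The
helper names <code>to_nfc</code> and <code>is_nfc</code> are hypothetical.</p>

```python
import unicodedata


def to_nfc(text):
    """Return text in Normalization Form C (canonical decomposition,
    canonical reordering, then canonical composition)."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text):
    """Check whether text is already in Normalization Form C."""
    return text == to_nfc(text)


# "A" followed by COMBINING GRAVE ACCENT composes to the single
# code point U+00C0 (LATIN CAPITAL LETTER A WITH GRAVE).
composed = to_nfc("A\u0300")

# There is no precomposed "q with dot", so nothing is composed here,
# but the combining marks are put into canonical order:
# dot below (combining class 220) sorts before dot above (class 230).
reordered = to_nfc("q\u0307\u0323")

print([hex(ord(c)) for c in composed])
print([hex(ord(c)) for c in reordered])
```

<p>A checking tool in the spirit of charlint would report input for which
<code>is_nfc</code> is false, while a normalizing filter would write out the
result of <code>to_nfc</code>.</p>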
46:
47: <h3><a name="Perl">Perl Source</a> and Installation</h3>
48:
49: <p>Charlint, aka 'Charlie', is written in <a
50: href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>. You can
51: get the source from <a
52: href="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</a>.
53: Charlint is covered by the <a
54: href="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C software
55: licence</a>. To install charlint, please make sure you have installed <a
56: href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>, you have
57: downloaded an appropriate character data file, and you have downloaded the <a
58: href="http://www.w3.org/International/charlint/charlint.pl">Perl source</a>.
59: Please send error reports or comments to <a
60: href="mailto:duerst@w3.org">duerst@w3.org</a>; for announcements and public
61: discussion please see the Winter mailing list (www-international@w3.org).</p>
62:
63: <h3><a name="Recommended">Recommended Character Data Files</a></h3>
64:
65: <p>Charlint needs information on characters in order to work correctly. To
66: indicate the file you want to use, please use the -f option. The currently
67: recommended character data file is available from <a
1.40 duerst 68: href="ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt">ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt</a>.
1.34 duerst 69: Composition exclusions are currently hard-coded and are based on <a
1.41 duerst 70: href="ftp://ftp.unicode.org/Public/UNIDATA/CompositionExclusions.txt">ftp://ftp.unicode.org/Public/UNIDATA/CompositionExclusions.txt</a>
1.42 duerst 71: [Final 3.0.0 version of 10 Sept 1999 for both files; identical with the
72: versions available on the CD-ROM provided with <a
73: href="http://www.unicode.org/unicode/uni2book/u2.html">The Unicode Standard,
74: Version 3.0</a>]. Additional information on these and other files can be found
75: at <a
76: href="http://www.unicode.org/unicode/onlinedat/online.html">http://www.unicode.org/unicode/onlinedat/online.html</a>.</p>
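
<p>The character data file is a plain-text table with semicolon-separated
fields. As a rough sketch of the kind of information a normalizer needs from
it, the following Python code (not charlint's Perl, and not necessarily how
charlint reads the file) extracts the canonical combining class (field 3) and
the canonical decomposition (field 5). The function name
<code>parse_unicode_data</code> is hypothetical, and the sample lines are
illustrative entries in the UnicodeData.txt format.</p>

```python
# A few illustrative lines in the UnicodeData.txt format.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;
0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;
"""


def parse_unicode_data(lines):
    """Collect non-zero combining classes and canonical decompositions.

    Returns two dicts keyed by code point (int):
    combining class, and decomposition as a list of code points.
    """
    comb_class = {}
    canon_decomp = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        ccc = int(fields[3])
        if ccc != 0:
            comb_class[code] = ccc
        decomp = fields[5]
        # A leading <tag> marks a compatibility mapping; Normalization
        # Form C only uses canonical (untagged) mappings.
        if decomp and not decomp.startswith("<"):
            canon_decomp[code] = [int(h, 16) for h in decomp.split()]
    return comb_class, canon_decomp


comb_class, canon_decomp = parse_unicode_data(SAMPLE.splitlines())
print(comb_class)
print(canon_decomp)
```

<p>With the real file, the same function could be called on an open file
handle instead of <code>SAMPLE.splitlines()</code>.</p>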
1.34 duerst 77:
78: <h3><a name="How">How to use charlint</a></h3>
79:
80: <p>Charlint is a perl script that works as a simple filter. It uses UTF-8 both
81: for input and for output. Behaviour can be fine-tuned with various options. A
82: list of options such as the one below can be obtained by using <kbd>charlint
83: -h</kbd>.</p>
84: <pre>(options prefixed by # are currently not available)
1.14 duerst 85: -b: Remove initial 'Byte Order Mark'
86: -B: Suppress warning about initial 'Byte Order Mark'
87: -d: Debug: Thoroughly check character data table input
88: -D: Leave after reading in character data
89: -e: # remove undefined codepoints
90: -E: Do not warn about undefined codepoints
91: -f file: Read data from file
92: (please use newest V3.0 beta datafiles)
1.38 duerst 93: -C: Do not normalize
1.14 duerst 94: -h: Prints out this short description
95: -n: Accept &#ddddd; and &#xhhhh; on input
1.34 duerst 96: (beware of <![CDATA[, <SCRIPT>, <STYLE>)
1.14 duerst 97: -N: Produce &#xhh; on output
98: -o: Print out 'unprintable' bytes as octal
99: -p: # Remove stuff in private zone
100: -P: Suppress checking private zone
101: -u: # Fix UTF-8 (convert or remove)
102: -U: Suppress checking correctness of UTF-8
1.34 duerst 103: -v: Print version</pre>
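
<p>Some of the checks behind these options can be sketched in a few lines of
Python (again, not charlint's Perl code). The sketch below shows removing an
initial UTF-8 byte order mark (compare <kbd>-b</kbd>), checking that input is
well-formed UTF-8 (the check that <kbd>-U</kbd> suppresses), and expanding
decimal and hexadecimal numeric character references (compare <kbd>-n</kbd>).
All function names are hypothetical, and details such as leaving out-of-range
references untouched are assumptions rather than charlint's documented
behaviour.</p>

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded in UTF-8

# Matches decimal (&#ddddd;) and hexadecimal (&#xhhhh;) references.
_NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def strip_bom(data):
    """Remove an initial UTF-8 byte order mark from a bytes object."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def is_valid_utf8(data):
    """Return True if the bytes object is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def expand_ncrs(text):
    """Replace numeric character references with the characters they name.

    Assumption: references to zero, surrogates, or values beyond
    U+10FFFF are left as they are.
    """
    def repl(match):
        if match.group(1) is not None:
            value = int(match.group(1), 16)
        else:
            value = int(match.group(2))
        if value == 0 or value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return match.group(0)
        return chr(value)

    return _NCR.sub(repl, text)
```

<p>As the option list warns, blindly expanding references is not safe inside
CDATA sections or SCRIPT and STYLE elements; this sketch makes no attempt to
detect those contexts.</p>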
104:
105: <h3><a name="Version">Version History</a></h3>
1.45 duerst 106: <pre># 2000/07/27: 0.36, fixed a bug for non-starter decompositions MJD
1.39 duerst 107: # 2000/07/24: 0.35, adapted exclusions to 3.0.0 final (+Tibetan) MJD
108: # 2000/07/24: 0.34, $chClass = $CombClass{ch}; should read $chClass = $CombClass{$ch};
109: # implemented -C MJD
110: # 1999/08/16: 0.33, updated for second version of 3.0.0.beta MJD
111: # 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta MJD
1.23 duerst 112: # 1999/06/25: 0.31, fixed reordering bug, going public MJD
1.39 duerst 113: # 1999/06/23: 0.30, preparation for W3C member test, without Hangul MJD</pre>
1.34 duerst 114:
115: <h3><a name="Background">Background</a></h3>
116: <ul>
117: <li><a href="http://www.w3.org/TR/WD-charmod">Character Model for the World
118: Wide Web</a> (W3C Working Draft)</li>
119: <li><a href="http://www.unicode.org/unicode/reports/tr15/">Unicode Technical
1.43 duerst 120: Report #15</a> (Version 18, part of Unicode Version 3.0)</li>
1.34 duerst 121: <li><a href="http://www.w3.org/Status">W3C Open Source Releases</a></li>
122: </ul>
123:
124: <h3><a name="Future">Future Plans</a></h3>
125:
126: <p>We have just released the first version of charlint. There are many things
127: we plan to add in the future:</p>
128: <ul>
129: <li>Hangul syllable normalization</li>
130: <li>Removal of undefined codepoints and codepoints in the private zone</li>
131: <li>Removal/fix of incorrect UTF-8</li>
1.44 duerst 132: <li>Compatibility character detection or removal</li>
133: <li>Detection or removal of characters not suitable for markup</li>
1.34 duerst 134: </ul>
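
<p>For reference, Hangul syllable normalization is purely arithmetic and needs
no table data. The following Python sketch follows the conjoining jamo
arithmetic given in the Unicode Standard; it is not part of charlint, and the
function names are hypothetical.</p>

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_hangul(ch):
    """Decompose one precomposed syllable into its conjoining jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    result = chr(lead) + chr(vowel)
    if trail != T_BASE:  # T_BASE itself means "no trailing consonant"
        result += chr(trail)
    return result


def compose_hangul(text):
    """Compose conjoining jamo sequences into precomposed syllables."""
    out = []
    for ch in text:
        cp = ord(ch)
        if out:
            last = ord(out[-1])
            # Leading consonant + vowel gives an LV syllable.
            l_index = last - L_BASE
            v_index = cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # LV syllable + trailing consonant gives an LVT syllable.
            s_index = last - S_BASE
            t_index = cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)
```

<p>For example, the jamo sequence U+1112 U+1161 U+11AB composes to the single
syllable U+D55C, and decomposing U+D55C yields the same three jamo again.</p>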
135:
136: <p>Your help (bug reports, patches, ideas, test cases) is welcome.</p>
137: <hr>
138:
139: <address>
140: <a href="mailto:duerst@w3.org">Martin Dürst</a> <br>
141: <a href="/Help/Webmaster.html">Webmaster</a> <br>
1.45 duerst 142: last revised $Date: 2000/07/25 05:29:14 $ by $Author: duerst $
1.34 duerst 143: </address>
144:
145: <p class="policyfooter"><small><a
146: href="/Consortium/Legal/ipr-notice.html#Copyright">Copyright</a> © 1997 <a
147: href="http://www.w3.org">W3C</a> (<a href="http://www.lcs.mit.edu">MIT</a>, <a
148: href="http://www.inria.fr/">INRIA</a>, <a
149: href="http://www.keio.ac.jp/">Keio</a> ), All Rights Reserved. W3C <a
150: href="/Consortium/Legal/ipr-notice.html#Legal Disclaimer">liability,</a> <a
151: href="/Consortium/Legal/ipr-notice.html#W3C Trademarks">trademark</a>, <a
152: href="/Consortium/Legal/copyright-documents.html">document use </a>and <a
153: href="/Consortium/Legal/copyright-software.html">software licensing </a>rules
154: apply. Your interactions with this site are in accordance with our <a
155: href="/Consortium/Legal/privacy-statement.html#Public">public</a> and <a
156: href="/Consortium/Legal/privacy-statement.html#Members">Member</a> privacy
157: statements.</small></p>
158: </body>
159: </html>