<!-- Annotation of charlint/Overview.html, revision 1.21 -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<META http-equiv="Content-Style-Type" content="text/css">
<!--BASE href="http://www.w3.org/Consortium/Translation/"-->
<!--LINK rel="stylesheet" href="../i18n.css"-->
<STYLE type="text/css">
<!--
H1.title {text-align: center }
P.toolbar { text-align: center }
DIV.deliverable { margin-left: 2em;
margin-right: 2em }
P.note { margin-left: 10%;
margin-right: 10%;
color: green }
TH { text-align: left }
TH, TD { padding: 2px }
.external { font-style: italic }
-->
</STYLE>
<TITLE>Charlint - A Character Normalization Tool</TITLE>
<LINK rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
</HEAD>
<BODY bgcolor="#FFFFFF" text="#000000">
<P>
<A HREF="/"><IMG BORDER="0" ALT="W3C" WIDTH="72" HEIGHT="48" SRC="w3c_home"></A>
<H1>
Charlint - A Character Normalization Tool
</H1>
<P>
<A href="#Perl">Perl source</A> | <A HREF="#Recommended">Recommended Data
Files</A> | <A HREF="#How">How to use</A> | <A HREF="#Future">Future Plans</A>
| Background | <A HREF="#Version">Version History</A>
<H3>
<A NAME="Perl">Perl Source</A> and Installation
</H3>
<P>
Charlint is written in
<A HREF="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</A>. You
can get the source from
<A HREF="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</A>.
Charlint is covered by the
<A HREF="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C
software licence</A>. To install charlint, make sure that you have installed
<A HREF="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</A>, that you
have downloaded an appropriate character data file, and that you have downloaded
the <A HREF="http://www.w3.org/International/charlint/charlint.pl">Perl
source</A>. Please send error reports or comments to
<A HREF="mailto:duerst@w3.org">duerst@w3.org</A>; for announcements and public
discussion, please see the www-international@w3.org mailing list.
<H3>
<A NAME="Recommended">Recommended Character Data Files</A>
</H3>
<P>
Charlint needs information on characters in order to work correctly. To specify
the file you want to use, use the <KBD>-f</KBD> option. The currently recommended
character data file is available from
<A HREF="ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt</A>.
Composition exclusions are currently hard-coded and are based on
<A HREF="ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt</A>.
Additional information on these and other files can be found at
<A HREF="http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html">http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html</A>.
Please note that this data file is a beta version; the beta test runs
until 15 August 1999. Using charlint is one way to test this data file. Please
send any comments on <EM>the data file</EM> to
<A HREF="mailto:errata@unicode.org">errata@unicode.org</A>.
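<P>
To illustrate what charlint needs from the character data file, the following
sketch reads the fields of each line that matter for canonical normalization:
the code point (field 0), the canonical combining class (field 3) and the
decomposition mapping (field 5). It is written in Python and is not taken from
<KBD>charlint.pl</KBD>; the function names are hypothetical, it assumes the
semicolon-separated line format of <KBD>UnicodeData-3.0.0.beta.txt</KBD>, and it
does not expand the entries that describe a range by its first and last code
point.
<PRE>
# Hypothetical helpers, not taken from charlint.pl.
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def is_hex(token):
    """True if token is a non-empty run of hexadecimal digits."""
    return bool(token) and all(c in HEX_DIGITS for c in token)


def parse_unicode_data_line(line):
    """Return (code_point, name, combining_class, tag, decomposition).

    A leading non-hex token in field 5 is a formatting tag that marks a
    compatibility decomposition; tag is None for canonical mappings.
    """
    fields = line.rstrip("\n").split(";")
    code_point = int(fields[0], 16)
    name = fields[1]
    combining_class = int(fields[3])
    tokens = fields[5].split()
    tag = None
    if tokens and not is_hex(tokens[0]):
        tag = tokens.pop(0)
    decomposition = [int(t, 16) for t in tokens]
    return code_point, name, combining_class, tag, decomposition


def load_canonical_data(lines):
    """Build combining-class and canonical-decomposition tables.

    Range entries (first/last code point pairs) are not expanded here.
    """
    combining = {}
    canonical = {}
    for line in lines:
        if not line.strip():
            continue
        cp, _name, ccc, tag, decomp = parse_unicode_data_line(line)
        if ccc:
            combining[cp] = ccc
        if decomp and tag is None:
            canonical[cp] = decomp
    return combining, canonical


if __name__ == "__main__":
    sample = [
        "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;"
        "LATIN CAPITAL LETTER A GRAVE;;;00E0;",
        "0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;",
    ]
    combining, canonical = load_canonical_data(sample)
    # prints: ['0041', '0300'] 230
    print(["%04X" % c for c in canonical[0xC0]], combining[0x300])
</PRE>
<P>
Only canonical mappings are kept, since compatibility mappings (those with a
tag) are not applied by canonical normalization.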
<H3>
<A NAME="How">How to use charlint</A>
</H3>
<P>
Charlint is a Perl script that works as a simple filter. It uses UTF-8 both
for input and for output. Its behaviour can be fine-tuned with various options.
A list of options such as the one below can be obtained by running <KBD>charlint
-h</KBD>.
<PRE>(options prefixed by # are currently not available)
-b: Remove initial 'Byte Order Mark'
-B: Suppress warning about initial 'Byte Order Mark'
-d: Debug: Thoroughly check character data table input
-D: Leave after reading in character data
-e: # remove undefined codepoints
-E: Do not warn about undefined codepoints
-f file: Read data from file
         (please use newest V3.0 beta datafiles)
-C: # Do not normalize
-h: Prints out this short description
-n: Accept &amp;#ddddd; and &amp;#xhhhh; on input
    (beware of &lt;![CDATA[, &lt;SCRIPT&gt;, &lt;STYLE&gt;)
-N: Produce &amp;#xhh; on output
-o: Print out 'unprintable' bytes as octal
-p: # Remove stuff in private zone
-P: Suppress checking private zone
-u: # Fix UTF-8 (convert or remove)
-U: Suppress checking correctness of UTF-8
-v: Print version
</PRE>
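<P>
Charlint's own normalization code is not reproduced here. The following Python
sketch only illustrates the filter behaviour described above, assuming that the
target is Unicode Normalization Form C (the composed form suggested by the use
of composition exclusions): UTF-8 in, UTF-8 out, optional removal of an initial
Byte Order Mark in the spirit of <KBD>-b</KBD>, and rejection of malformed
UTF-8. The function names are hypothetical. It relies on Python's
<KBD>unicodedata</KBD> module, whose tables follow a much later Unicode version
than the 3.0 beta data, so results can differ for some characters.
<PRE>
# Hypothetical illustration, not charlint's implementation.
import sys
import unicodedata

BOM = "\ufeff"


def normalize_text(text, remove_bom=False):
    """Return text in NFC, optionally without a leading BOM."""
    if remove_bom and text.startswith(BOM):
        text = text[1:]
    return unicodedata.normalize("NFC", text)


def normalize_bytes(data, remove_bom=False):
    """Decode UTF-8, normalize to NFC, and re-encode as UTF-8.

    Strict decoding raises UnicodeDecodeError on malformed UTF-8
    instead of silently replacing it.
    """
    text = data.decode("utf-8")
    return normalize_text(text, remove_bom).encode("utf-8")


def main(argv):
    """Filter standard input to standard output; '-b' drops a BOM."""
    data = sys.stdin.buffer.read()
    try:
        out = normalize_bytes(data, remove_bom="-b" in argv)
    except UnicodeDecodeError as err:
        sys.stderr.write("invalid UTF-8 at byte %d\n" % err.start)
        return 1
    sys.stdout.buffer.write(out)
    return 0
</PRE>
<P>
Saved as a script ending with <KBD>sys.exit(main(sys.argv[1:]))</KBD>, it could
be run as <KBD>python normalize.py -b &lt; in.txt &gt; out.txt</KBD> (the file
names are only examples). The sequence U+0041 U+0300 (A followed by a combining
grave accent) comes out as the precomposed U+00C0, while U+0915 U+093C stays
decomposed because U+0958 is a composition exclusion.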
<H3>
<A NAME="Version">Version History</A>
</H3>
<UL>
<LI>
1999/06/23: 0.30, preparation for W3C member test, without Hangul (MJD)
<LI>
1999/06/25: 0.31, fixed reordering bug, going public (MJD)
<LI>
1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta (MJD)
</UL>
<H3>
<A NAME="Future">Future Plans</A>
</H3>
<P>
We have just released the first version of charlint. There are many things
we plan to add in the future:
<UL>
<LI>
Hangul syllable normalization
<LI>
Removal of undefined codepoints and codepoints in the private zone
<LI>
Removal/fix of incorrect UTF-8
</UL>
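<P>
Hangul syllable normalization does not need table data: the character data file
lists the precomposed Hangul syllables only as a range, and the Unicode Standard
defines their canonical decomposition and composition arithmetically. The
following Python sketch of that arithmetic is not charlint code and its function
names are hypothetical; it uses the constants of the Unicode Hangul algorithm
(syllables start at U+AC00, leading consonants at U+1100, vowels at U+1161, and
trailing consonants after U+11A7, with 19 leading consonants, 21 vowels and 28
trailing positions including "none").
<PRE>
# Sketch of the Unicode Hangul syllable arithmetic; not charlint code.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(cp):
    """Return the jamo for a precomposed syllable, else [cp]."""
    s_index = cp - S_BASE
    if s_index not in range(S_COUNT):
        return [cp]
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    if t_index == 0:  # LV syllable without trailing consonant
        return [lead, vowel]
    return [lead, vowel, T_BASE + t_index]


def compose_hangul(cps):
    """Combine L+V jamo, and LV syllable+T jamo, into syllables."""
    result = []
    for cp in cps:
        if result:
            last = result[-1]
            l_index = last - L_BASE
            s_index = last - S_BASE
            v_index = cp - V_BASE
            t_index = cp - T_BASE
            # Leading consonant followed by vowel gives an LV syllable.
            if l_index in range(L_COUNT) and v_index in range(V_COUNT):
                result[-1] = S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
                continue
            # LV syllable followed by a trailing consonant gives LVT.
            if (s_index in range(S_COUNT) and s_index % T_COUNT == 0
                    and t_index in range(1, T_COUNT)):
                result[-1] = last + t_index
                continue
        result.append(cp)
    return result
</PRE>
<P>
For example, U+D4DB decomposes to U+1111 U+1171 U+11B6, and composing that
sequence gives U+D4DB back.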
<P>
<HR>
<ADDRESS>
<A HREF="mailto:duerst@w3.org">Martin Dürst</A> <BR>
<A HREF="../Help/Webmaster.html">Webmaster</A> <BR>
last revised $Date: 1999/06/23 12:51:49 $ by $Author: connolly $
</ADDRESS>
<P class=policyfooter>
<SMALL><A href="/Consortium/Legal/ipr-notice.html#Copyright">Copyright</A>
© 1997 <A href="http://www.w3.org">W3C</A>
(<A href="http://www.lcs.mit.edu">MIT</A>,
<A href="http://www.inria.fr/">INRIA</A>,
<A href="http://www.keio.ac.jp/">Keio</A> ), All Rights Reserved. W3C
<A href="/Consortium/Legal/ipr-notice.html#Legal Disclaimer">liability,</A>
<A href="/Consortium/Legal/ipr-notice.html#W3C Trademarks">trademark</A>,
<A href="/Consortium/Legal/copyright-documents.html">document use </A>and
<A href="/Consortium/Legal/copyright-software.html">software licensing
</A>rules apply. Your interactions with this site are in accordance with
our <A href="/Consortium/Legal/privacy-statement.html#Public">public</A>
and <A href="/Consortium/Legal/privacy-statement.html#Members">Member</A>
privacy statements.</SMALL>
</BODY></HTML>