Annotation of charlint/Overview.html, revision 1.29
1.1 duerst 1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
2: "http://www.w3.org/TR/REC-html40/loose.dtd">
3: <HTML>
4: <HEAD>
5: <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
6: <META http-equiv="Content-Style-Type" content="text/css">
7: <!--BASE href="http://www.w3.org/Consortium/Translation/"-->
8: <!--LINK rel="stylesheet" href="../i18n.css"-->
9: <STYLE type="text/css">
10: <!--
11: H1.title {text-align: center }
12: P.toolbar { text-align: center }
13: DIV.deliverable { margin-left: 2em;
14: margin-right: 2em }
15: P.note { margin-left: 10%;
16: margin-right: 10%;
17: color: green }
18: TH { text-align: left }
19: TH, TD { padding: 2px }
20: .external { font-style: italic }
21: -->
22: </STYLE>
1.17 duerst 23: <TITLE>Charlint - A Character Normalization Tool</TITLE>
1.1 duerst 24: <LINK rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
25: </HEAD>
26: <BODY bgcolor="#FFFFFF" text="#000000">
27: <P>
1.27 duerst 28: <A HREF="/"><IMG BORDER="0" SRC="/Icons/WWW/w3c_home" ALT="W3C" WIDTH="72"
29: HEIGHT="48"></A>
30: <A HREF="/International"><IMG SRC="/Icons/WWW/i18n-alt" ALT="International"
31: WIDTH="72" HEIGHT="48" BORDER="0"></A>
1.1 duerst 32: <H1>
1.3 duerst 33: Charlint - A Character Normalization Tool
1.1 duerst 34: </H1>
1.3 duerst 35: <P>
1.20 duerst 36: <A href="#Perl">Perl source</A> | <A HREF="#Recommended">Recommended Data
37: Files</A> | <A HREF="#How">How to use</A> | <A HREF="#Future">Future Plans</A>
1.24 duerst 38: | <A HREF="#Background">Background </A>| <A HREF="#Version">Version History</A>
1.4 duerst 39: <H3>
1.12 duerst 40: <A NAME="Perl">Perl Source</A> and Installation
1.4 duerst 41: </H3>
1.3 duerst 42: <P>
1.28 duerst 43: Charlint, aka 'Charlie', is written in
1.19 duerst 44: <A HREF="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</A>. You
45: can get the source from
1.4 duerst 46: <A HREF="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</A>.
1.5 duerst 47: Charlint is covered by the
48: <A HREF="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C
1.12 duerst 49: software licence</A>. To install charlint, please make sure you have installed
1.19 duerst 50: <A HREF="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</A>, you
51: have downloaded an appropriate character data file, and you have downloaded
52: the <A HREF="http://www.w3.org/International/charlint/charlint.pl">Perl
1.21 duerst 53: source</A>. Please send error reports or comments to
54: <A HREF="mailto:duerst@w3.org">duerst@w3.org</A>; for announcements and public
55: discussion please see the Winter mailing list (www-international@w3.org).
1.6 duerst 56: <H3>
1.20 duerst 57: <A NAME="Recommended">Recommended Character Data Files</A>
1.6 duerst 58: </H3>
1.1 duerst 59: <P>
1.7 duerst 60: Charlint needs information on characters in order to work correctly. To indicate
61: the file you want to use, please use the -f option. The currently recommended
1.8 duerst 62: character data file is available from
63: <A HREF="ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt</A>.
1.16 duerst 64: Composition exclusions are currently hard-coded and are based on
65: <A HREF="ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt</A>.
1.18 duerst 66: Additional information on these and other files can be found at
1.9 duerst 67: <A HREF="http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html">http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html</A>.
1.10 duerst 68: Please note that this data file is a beta version; the beta test will last
69: up to 15 August 1999. Using charlint is one way to test this data file. Please
70: send any comments on <EM>the data file</EM> to
71: <A HREF="mailto:errata@unicode.org">errata@unicode.org</A>.
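<P>
To illustrate the kind of information such a character data file carries,
here is a minimal Python sketch (not charlint's own Perl code) that reads one
semicolon-separated line in the UnicodeData format and extracts the canonical
combining class and the decomposition mapping. The names
<CODE>CharRecord</CODE> and <CODE>parse_unicode_data_line</CODE> are
hypothetical, and the sample line is only an assumed example of the format.

```python
from typing import NamedTuple, Optional, Tuple


class CharRecord(NamedTuple):
    """Subset of one UnicodeData line that matters for normalization."""
    codepoint: int
    name: str
    combining_class: int
    decomposition_tag: Optional[str]  # e.g. "<compat>"; None means canonical
    decomposition: Tuple[int, ...]    # empty if the character does not decompose


def parse_unicode_data_line(line: str) -> CharRecord:
    """Parse one semicolon-separated line (hypothetical helper, not from charlint)."""
    fields = line.rstrip("\n").split(";")
    parts = fields[5].split()          # field 5: optional tag plus hex code points
    tag = None
    if parts and parts[0].startswith("<"):
        tag = parts[0]                 # compatibility decompositions carry a tag
        parts = parts[1:]
    return CharRecord(
        codepoint=int(fields[0], 16),  # field 0: code point in hex
        name=fields[1],                # field 1: character name
        combining_class=int(fields[3]),  # field 3: canonical combining class
        decomposition_tag=tag,
        decomposition=tuple(int(p, 16) for p in parts),
    )


# Assumed sample line in the UnicodeData format.
SAMPLE = ("00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;"
          "LATIN CAPITAL LETTER A GRAVE;;;00E0;")
record = parse_unicode_data_line(SAMPLE)
print(hex(record.codepoint), [hex(c) for c in record.decomposition])
```

<P>
A table built from such records (decompositions plus combining classes) is
what a normalizer needs in order to decompose, reorder and recompose text.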
1.1 duerst 72: <H3>
1.12 duerst 73: <A NAME="How">How to use charlint</A>
1.11 duerst 74: </H3>
75: <P>
1.15 duerst 76: Charlint is a Perl script that works as a simple filter. It uses UTF-8 both
77: for input and for output. Behaviour can be fine-tuned with various options.
1.23 duerst 78: A list of options such as the one below can be obtained by using <KBD>charlint
1.15 duerst 79: -h</KBD>.
80: <PRE>(options prefixed by # are currently not available)
1.14 duerst 81: -b: Remove initial 'Byte Order Mark'
82: -B: Suppress warning about initial 'Byte Order Mark'
83: -d: Debug: Thoroughly check character data table input
84: -D: Leave after reading in character data
85: -e: # remove undefined codepoints
86: -E: Do not warn about undefined codepoints
87: -f file: Read data from file
88: (please use newest V3.0 beta datafiles)
89: -C: # Do not normalize
90: -h: Prints out this short description
91: -n: Accept &#ddddd; and &#xhhhh; on input
92: (beware of <![CDATA[, <SCRIPT>, <STYLE>)
93: -N: Produce &#xhh; on output
94: -o: Print out 'unprintable' bytes as octal
95: -p: # Remove stuff in private zone
96: -P: Suppress checking private zone
97: -u: # Fix UTF-8 (convert or remove)
98: -U: Suppress checking correctness of UTF-8
99: -v: Print version
100: </PRE>
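<P>
The filter idea (UTF-8 in, normalized UTF-8 out) can be sketched in a few
lines of Python. This is not charlint itself: it assumes the target is
Normalization Form C as described in Unicode Technical Report #15, it relies
on Python's built-in <CODE>unicodedata</CODE> tables rather than a file given
with -f, and its handling of numeric character references is only one
possible reading of the -n option. The names <CODE>expand_ncrs</CODE>,
<CODE>normalize_text</CODE> and <CODE>filter_stream</CODE> are hypothetical.

```python
import io
import re
import unicodedata

# Matches &#ddddd; (decimal) and &#xhhhh; (hexadecimal) character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_ncrs(text: str) -> str:
    """Replace numeric character references by the characters they denote."""
    def repl(match):
        value = int(match.group(1), 16) if match.group(1) else int(match.group(2))
        # Leave out-of-range values and surrogates untouched.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return match.group(0)
        return chr(value)
    return _NCR.sub(repl, text)


def normalize_text(text: str, accept_ncrs: bool = False) -> str:
    """Normalize to NFC, optionally expanding character references first."""
    if accept_ncrs:
        text = expand_ncrs(text)
    return unicodedata.normalize("NFC", text)


def filter_stream(inp, out, accept_ncrs: bool = False) -> None:
    """Read UTF-8 bytes from inp, write normalized UTF-8 bytes to out."""
    text = inp.read().decode("utf-8")
    out.write(normalize_text(text, accept_ncrs).encode("utf-8"))


# Demo with in-memory streams standing in for standard input and output.
source = io.BytesIO("A\u0300 la carte".encode("utf-8"))
sink = io.BytesIO()
filter_stream(source, sink)
print(sink.getvalue())
```

<P>
In a real filter, <CODE>sys.stdin.buffer</CODE> and
<CODE>sys.stdout.buffer</CODE> would take the place of the in-memory streams.
As the option list warns for -n, expanding references blindly is unsafe
inside CDATA sections, SCRIPT and STYLE; this sketch makes no attempt to
detect those.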
1.19 duerst 101: <H3>
102: <A NAME="Version">Version History</A>
103: </H3>
1.23 duerst 104: <PRE># 1999/06/23: 0.30, preparation for W3C member test, without Hangul MJD
105: # 1999/06/25: 0.31, fixed reordering bug, going public MJD
106: # 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta MJD
107: </PRE>
1.24 duerst 108: <H3>
109: <A NAME="Background">Background</A>
110: </H3>
1.25 duerst 111: <UL>
112: <LI>
113: <A HREF="http://www.w3.org/TR/WD-charmod">Character Model for the World Wide
114: Web</A> (W3C Working Draft)
115: <LI>
116: <A HREF="http://www.unicode.org/unicode/reports/tr15/">Unicode Technical
117: Report #15</A> (Version 14 approved subject to various changes)
1.29 ! duerst 118: <LI>
! 119: <A HREF="http://www.w3.org/Status">W3C Open Source Releases</A>
1.25 duerst 120: </UL>
1.11 duerst 121: <H3>
1.15 duerst 122: <A NAME="Future">Future Plans</A>
1.1 duerst 123: </H3>
1.15 duerst 124: <P>
125: We have just released the first version of charlint. There are many things
126: we plan to add in the future:
127: <UL>
128: <LI>
129: Hangul syllable normalization
130: <LI>
131: Removal of undefined codepoints and codepoints in the private zone
132: <LI>
133: Removal/fix of incorrect UTF-8
1.22 duerst 134: <LI>
135: Incorporate knowledge of V3.0 decompositions to automatically detect future
136: precomposites.
137: <LI>
138: Compatibility character detection or removal.
139: <LI>
140: Detection or removal of characters not suitable for markup.
1.15 duerst 141: </UL>
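<P>
As an illustration of the first item, Hangul syllables can be composed
arithmetically instead of through table lookup. The Python sketch below
follows the standard Unicode arithmetic for conjoining jamo; it is not taken
from charlint, and the function name <CODE>compose_hangul</CODE> is
hypothetical.

```python
# Constants of the standard Unicode Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def compose_hangul(text: str) -> str:
    """Combine leading consonant + vowel (+ trailing consonant) into syllables."""
    out = []
    for ch in text:
        cp = ord(ch)
        if out:
            last = ord(out[-1])
            # Case 1: leading consonant followed by a vowel gives an LV syllable.
            l_index = last - L_BASE
            v_index = cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # Case 2: LV syllable followed by a trailing consonant gives LVT.
            s_index = last - S_BASE
            t_index = cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)


# U+1112, U+1161, U+11AB compose to the single syllable U+D55C.
print(hex(ord(compose_hangul("\u1112\u1161\u11ab"))))
```

<P>
Decomposition is the inverse calculation, so no per-syllable entries are
needed in the data table.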
1.1 duerst 142: <P>
1.29 ! duerst 143: Your help (bug reports, patches, ideas, test cases) is welcome.
! 144: <P>
1.1 duerst 145: <HR>
146: <ADDRESS>
147: <A HREF="mailto:duerst@w3.org">Martin Dürst</A> <BR>
148: <A HREF="../Help/Webmaster.html">Webmaster</A> <BR>
1.29 ! duerst 149: last revised $Date: 1999/07/01 09:25:39 $ by $Author: duerst $
1.1 duerst 150: </ADDRESS>
151: <P class=policyfooter>
152: <SMALL><A href="/Consortium/Legal/ipr-notice.html#Copyright">Copyright</A>
153: © 1997 <A href="http://www.w3.org">W3C</A>
154: (<A href="http://www.lcs.mit.edu">MIT</A>,
155: <A href="http://www.inria.fr/">INRIA</A>,
156: <A href="http://www.keio.ac.jp/">Keio</A> ), All Rights Reserved. W3C
157: <A href="/Consortium/Legal/ipr-notice.html#Legal Disclaimer">liability,</A>
158: <A href="/Consortium/Legal/ipr-notice.html#W3C Trademarks">trademark</A>,
159: <A href="/Consortium/Legal/copyright-documents.html">document use </A>and
160: <A href="/Consortium/Legal/copyright-software.html">software licensing
161: </A>rules apply. Your interactions with this site are in accordance with
162: our <A href="/Consortium/Legal/privacy-statement.html#Public">public</A>
163: and <A href="/Consortium/Legal/privacy-statement.html#Members">Member</A>
164: privacy statements.</SMALL>
165: </BODY></HTML>