Annotation of charlint/Overview.html, revision 1.30
1.1 duerst 1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
1.30 ! duerst 2: "http://www.w3.org/TR/REC-html40/loose.dtd">
! 3: <html>
! 4: <head>
! 5: <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
! 6: <meta http-equiv="Content-Style-Type" content="text/css">
1.1 duerst 7: <!--BASE href="http://www.w3.org/Consortium/Translation/"-->
8: <!--LINK rel="stylesheet" href="../i18n.css"-->
1.30 ! duerst 9: <style type="text/css">
1.1 duerst 10: <!--
11: H1.title {text-align: center }
12: P.toolbar { text-align: center }
13: DIV.deliverable { margin-left: 2em;
14: margin-right: 2em }
15: P.note { margin-left: 10%;
16: margin-right: 10%;
17: color: green }
18: TH { text-align: left }
19: TH, TD { padding: 2px }
20: .external { font-style: italic }
21: -->
1.30 ! duerst 22: </style>
! 23: <title>Charlint - A Character Normalization Tool</title>
! 24: <link rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
! 25: </head>
! 26:
! 27: <body bgcolor="#FFFFFF" text="#000000">
! 28: <p><a href="/"><img border="0" src="/Icons/WWW/w3c_home" alt="W3C" width="72"
! 29: height="48"></a> <a href="/International"><img src="/Icons/WWW/i18n-alt"
! 30: alt="International" width="72" height="48" border="0"></a></p>
! 31:
! 32: <h1>Charlint - A Character Normalization Tool</h1>
! 33:
! 34: <p><a href="#Perl">Perl source</a> | <a href="#Recommended">Recommended Data
! 35: Files</a> | <a href="#How">How to use</a> | <a href="#Future">Future Plans</a>
! 36: | <a href="#Background">Background </a>| <a href="#Version">Version
! 37: History</a></p>
! 38:
! 39: <p>Charlint is a character normalization/checking tool written in Perl. Among
! 40: other things, it implements Normalization Form C of <a
! 41: href="http://www.unicode.org/unicode/reports/tr15/">Unicode TR 15</a>.</p>
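<p>To illustrate what Normalization Form C does, here is a minimal sketch in
Python. It is not charlint's code: it relies on Python's standard
<code>unicodedata</code> module, and the helper name <code>to_nfc</code> is
made up for this example. The module uses whatever Unicode version ships with
the Python interpreter, not the 3.0 beta data discussed on this page.</p>

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (canonical decomposition,
    canonical reordering, then canonical composition)."""
    return unicodedata.normalize("NFC", text)


# A base letter followed by a combining mark is composed into one character.
decomposed = "A\u030A"        # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
composed = to_nfc(decomposed)  # LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5)

# Text that is already in Form C is left unchanged.
already_normalized = to_nfc(composed) == composed
```

<p>A filter such as charlint applies the same kind of transformation to a
whole UTF-8 stream rather than to a single string.</p>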
! 42:
! 43: <h3><a name="Perl">Perl Source</a> and Installation</h3>
! 44:
! 45: <p>Charlint, aka 'Charlie', is written in <a
! 46: href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>. You can
! 47: get the source from <a
! 48: href="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</a>.
! 49: Charlint is covered by the <a
! 50: href="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C software
! 51: licence</a>. To install charlint, please make sure you have installed <a
! 52: href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</a>, you have
! 53: downloaded an appropriate character data file, and you have downloaded the <a
! 54: href="http://www.w3.org/International/charlint/charlint.pl">Perl source</a>.
! 55: Please send error reports or comments to <a
! 56: href="mailto:duerst@w3.org">duerst@w3.org</a>; for announcements and public
! 57: discussion please see the Winter mailing list (www-international@w3.org).</p>
! 58:
! 59: <h3><a name="Recommended">Recommended Character Data Files</a></h3>
! 60:
! 61: <p>Charlint needs information on characters in order to work correctly. To
! 62: indicate the file you want to use, please use the -f option. The currently
! 63: recommended character data file is available from <a
! 64: href="ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt</a>.
! 65: Composition exclusions are currently hard-coded and are based on <a
! 66: href="ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt</a>.
! 67: Additional information on these and other files can be found at <a
! 68: href="http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html">http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html</a>.
! 69: Please note that this data file is a beta version; the beta test will last
! 70: until 15 August 1999. Using charlint is one way to test this data file. Please
! 71: send any comments on <em>the data file</em> to <a
! 72: href="mailto:errata@unicode.org">errata@unicode.org</a>.</p>
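<p>The character data file is a plain text file with one character per line
and semicolon-separated fields. The following Python sketch shows how the
fields relevant to normalization could be read. It is an illustration only,
not the way charlint itself reads the file; the names <code>CharRecord</code>
and <code>parse_unicode_data_line</code> are hypothetical. The field positions
assumed here are 0 (code point), 1 (name), 3 (canonical combining class) and
5 (decomposition mapping, optionally preceded by a tag in angle brackets for
compatibility decompositions).</p>

```python
from typing import NamedTuple, Optional, Tuple


class CharRecord(NamedTuple):
    code_point: int
    name: str
    combining_class: int
    decomposition_tag: Optional[str]   # e.g. "<noBreak>"; None if canonical
    decomposition: Tuple[int, ...]     # empty if the character has none


def parse_unicode_data_line(line: str) -> CharRecord:
    """Parse one line of a UnicodeData file (assumed field layout)."""
    fields = line.rstrip("\n").split(";")
    parts = fields[5].split()
    tag = None
    if parts and parts[0].startswith("<"):
        # A leading tag marks a compatibility decomposition.
        tag = parts.pop(0)
    return CharRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        combining_class=int(fields[3]),
        decomposition_tag=tag,
        decomposition=tuple(int(p, 16) for p in parts),
    )


sample = ("00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;"
          "LATIN CAPITAL LETTER A GRAVE;;;00E0;")
record = parse_unicode_data_line(sample)
```

<p>Only canonical decompositions (those without a tag) matter for
Normalization Form C, which is why the sketch keeps the tag separate.</p>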
! 73:
! 74: <h3><a name="How">How to use charlint</a></h3>
! 75:
! 76: <p>Charlint is a Perl script that works as a simple filter. It uses UTF-8 both
! 77: for input and for output. Behaviour can be fine-tuned with various options. A
! 78: list of options such as the one below can be obtained by using <kbd>charlint
! 79: -h</kbd>.</p>
! 80: <pre>(options prefixed by # are currently not available)
1.14 duerst 81: -b: Remove initial 'Byte Order Mark'
82: -B: Suppress warning about initial 'Byte Order Mark'
83: -d: Debug: Thoroughly check character data table input
84: -D: Leave after reading in character data
85: -e: # remove undefined codepoints
86: -E: Do not warn about undefined codepoints
87: -f file: Read data from file
88: (please use newest V3.0 beta datafiles)
89: -C: # Do not normalize
90: -h: Prints out this short description
91: -n: Accept &#ddddd; and &#xhhhh; on input
1.30 ! duerst 92: (beware of <![CDATA[, <SCRIPT>, <STYLE>)
1.14 duerst 93: -N: Produce &#xhh; on output
94: -o: Print out 'unprintable' bytes as octal
95: -p: # Remove stuff in private zone
96: -P: Suppress checking private zone
97: # Fix UTF-8 (convert or remove)
98: -U: Suppress checking correctness of UTF-8
1.30 ! duerst 99: -v: Print version</pre>
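<p>To make the effect of the -b, -n and -N options more concrete, here is a
rough Python sketch of the kind of processing they describe. It does not
reproduce charlint's implementation. The function names are hypothetical, and
two details are assumptions: that numeric character references for surrogates
or out-of-range values are left untouched, and that output references are
produced for all non-ASCII characters using uppercase hexadecimal digits.</p>

```python
import re

BOM = "\ufeff"
# Matches decimal (&#233;) and hexadecimal (&#xE9;) character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def strip_bom(text: str) -> str:
    """Remove an initial Byte Order Mark, if present (compare option -b)."""
    return text[1:] if text.startswith(BOM) else text


def decode_ncrs(text: str) -> str:
    """Replace numeric character references by characters (compare -n)."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        value = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Assumption: leave surrogates and out-of-range values as they are.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return match.group(0)
        return chr(value)
    return NCR_PATTERN.sub(replace, text)


def encode_ncrs(text: str) -> str:
    """Write non-ASCII characters as hexadecimal references (compare -N)."""
    return "".join(ch if ord(ch) < 0x80 else "&#x%X;" % ord(ch) for ch in text)
```

<p>As the option list warns, decoding references blindly is not safe inside
CDATA sections, SCRIPT or STYLE elements, where the same character sequence
is not a character reference. The sketch makes no attempt to detect these
contexts.</p>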
! 100:
! 101: <h3><a name="Version">Version History</a></h3>
! 102: <pre># 1999/06/23: 0.30, preparation for W3C member test, without Hangul MJD
1.23 duerst 103: # 1999/06/25: 0.31, fixed reordering bug, going public MJD
1.30 ! duerst 104: # 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta MJD</pre>
! 105:
! 106: <h3><a name="Background">Background</a></h3>
! 107: <ul>
! 108: <li><a href="http://www.w3.org/TR/WD-charmod">Character Model for the World
! 109: Wide Web</a> (W3C Working Draft)</li>
! 110: <li><a href="http://www.unicode.org/unicode/reports/tr15/">Unicode Technical
! 111: Report #15</a> (Version 14 approved subject to various changes)</li>
! 112: <li><a href="http://www.w3.org/Status">W3C Open Source Releases</a></li>
! 113: </ul>
! 114:
! 115: <h3><a name="Future">Future Plans</a></h3>
! 116:
! 117: <p>We have just released the first version of charlint. There are many things
! 118: we plan to add in the future:</p>
! 119: <ul>
! 120: <li>Hangul syllable normalization</li>
! 121: <li>Removal of undefined codepoints and codepoints in the private zone</li>
! 122: <li>Removal/fix of incorrect UTF-8</li>
! 123: <li>Incorporate knowledge of V3.0 decompositions to automatically detect
! 124: future precomposites.</li>
! 125: <li>Compatibility character detection or removal.</li>
! 126: <li>Detection or removal of characters not suitable for markup.</li>
! 127: </ul>
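<p>As an illustration of the first item, Hangul syllables can be composed
arithmetically from conjoining jamo, without table lookups. The Python sketch
below shows this composition step using the constants defined by the Unicode
Standard for the Hangul syllable block. It is not part of charlint, the
function name <code>compose_hangul</code> is hypothetical, and the sketch
covers only the composition step, not decomposition or reordering.</p>

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT   # 11172 precomposed syllables


def compose_hangul(text: str) -> str:
    """Compose leading consonant + vowel (+ trailing consonant) jamo
    sequences into precomposed Hangul syllables."""
    result = []
    for ch in text:
        cp = ord(ch)
        if result:
            last = ord(result[-1])
            # Case 1: leading consonant followed by a vowel gives an LV syllable.
            l_index = last - L_BASE
            v_index = cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                result[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # Case 2: LV syllable followed by a trailing consonant gives LVT.
            s_index = last - S_BASE
            t_index = cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                result[-1] = chr(last + t_index)
                continue
        result.append(ch)
    return "".join(result)
```

<p>For example, the jamo sequence U+1112, U+1161, U+11AB is composed first
into U+D558 and then into the single syllable U+D55C.</p>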
! 128:
! 129: <p>Your help (bug reports, patches, ideas, test cases) is welcome.</p>
! 130:
! 131: <p></p>
! 132: <hr>
! 133:
! 134: <address>
! 135: <a href="mailto:duerst@w3.org">Martin Dürst</a> <br>
! 136: <a href="../Help/Webmaster.html">Webmaster</a> <br>
! 137: last revised $Date: 1999/07/02 07:55:33 $ by $Author: duerst $
! 138: </address>
! 139:
! 140: <p class="policyfooter"><small><a
! 141: href="/Consortium/Legal/ipr-notice.html#Copyright">Copyright</a>
! 142: © 1997 <a href="http://www.w3.org">W3C</a> (<a
! 143: href="http://www.lcs.mit.edu">MIT</a>, <a
! 144: href="http://www.inria.fr/">INRIA</a>, <a
! 145: href="http://www.keio.ac.jp/">Keio</a> ), All Rights Reserved. W3C <a
! 146: href="/Consortium/Legal/ipr-notice.html#Legal Disclaimer">liability,</a> <a
! 147: href="/Consortium/Legal/ipr-notice.html#W3C Trademarks">trademark</a>, <a
! 148: href="/Consortium/Legal/copyright-documents.html">document use </a>and <a
! 149: href="/Consortium/Legal/copyright-software.html">software licensing </a>rules
! 150: apply. Your interactions with this site are in accordance with our <a
! 151: href="/Consortium/Legal/privacy-statement.html#Public">public</a> and <a
! 152: href="/Consortium/Legal/privacy-statement.html#Members">Member</a> privacy
! 153: statements.</small></p>
! 154: </body>
! 155: </html>