Annotation of charlint/Overview.html, revision 1.33

1.1       duerst      1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
1.31      duerst      2: "http://www.w3.org/TR/REC-html40/loose.dtd">
                      3: <HTML>
                      4: <HEAD>
                      5:   <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
                      6:   <META http-equiv="Content-Style-Type" content="text/css">
1.1       duerst      7:   <!--BASE href="http://www.w3.org/Consortium/Translation/"-->
                      8:   <!--LINK rel="stylesheet" href="../i18n.css"-->
1.31      duerst      9:   <STYLE type="text/css">
1.1       duerst     10:   <!--
                     11:     H1.title {text-align: center }
                     12:     P.toolbar { text-align: center }
                     13:     DIV.deliverable { margin-left: 2em;
                     14:                 margin-right: 2em }
                     15:     P.note { margin-left: 10%;
                     16:                 margin-right: 10%;
                     17:                 color: green }
                     18:     TH { text-align: left }
                     19:     TH, TD { padding: 2px }
                     20:     .external { font-style: italic }
                     21:   -->
1.31      duerst     22:   </STYLE>
                     23:   <TITLE>Charlint - A Character Normalization Tool</TITLE>
                     24:   <LINK rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
                     25: </HEAD>
                     26: <BODY bgcolor="#FFFFFF" text="#000000">
                     27: <P>
                     28: <A href="/"><IMG border="0" src="/Icons/WWW/w3c_home" alt="W3C" width="72"
                     29:     height="48"></A>
                     30: <A href="/International"><IMG src="/Icons/WWW/i18n-alt" alt="International"
                     31:     width="72" height="48" border="0"></A>
                     32: <H1>
                     33:   Charlint - A Character Normalization Tool
                     34: </H1>
                     35: <P>
                     36: <A href="#Perl">Perl source</A> | <A href="#Recommended">Recommended Data
                     37: Files</A> | <A href="#How">How to use</A> | <A href="#Future">Future Plans</A>
                     38: | <A href="#Background">Background </A>| <A href="#Version">Version History</A>
                     39: <P>
                     40: Charlint is a character normalization/checking tool written in Perl. Among
                     41: other things, it implements Normalization Form C of
                     42: <A href="http://www.unicode.org/unicode/reports/tr15/">Unicode TR 15</A>.
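<P>
As a rough illustration of what Normalization Form C does, the sketch below
composes a base letter and a following combining mark into the precomposed
character. It uses the <CODE>unicodedata</CODE> module from Python's standard
library, not charlint itself; the helper name <CODE>to_nfc</CODE> is made up
for this example, and Python's character data is newer than the 3.0 beta files
discussed on this page, so results may differ from charlint's in edge cases.

```python
import unicodedata


def to_nfc(text):
    # Hypothetical helper: NFC via Python's standard library,
    # not charlint's own implementation.
    return unicodedata.normalize("NFC", text)


decomposed = "u\u0308"  # 'u' followed by U+0308 COMBINING DIAERESIS
composed = to_nfc(decomposed)

# The two code points are composed into the single code point U+00FC.
print(len(decomposed), len(composed))  # prints "2 1"
```

<P>
Text that is already in Normalization Form C is returned unchanged, so
applying the function twice gives the same result as applying it once.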
                     43: <H3>
                     44:   <A name="Perl">Perl Source</A> and Installation
                     45: </H3>
                     46: <P>
                     47: Charlint, aka 'Charlie', is written in
                     48: <A href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</A>. You
                     49: can get the source from
                     50: <A href="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</A>.
                     51: Charlint is covered by the
                     52: <A href="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C
                     53: software licence</A>. To install charlint, please make sure you have installed
                     54: <A href="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</A>, you
                     55: have downloaded an appropriate character data file, and you have downloaded
                     56: the <A href="http://www.w3.org/International/charlint/charlint.pl">Perl
                     57: source</A>. Please send error reports or comments to
                     58: <A href="mailto:duerst@w3.org">duerst@w3.org</A>; for announcements and public
                     59: discussion please see the Winter mailing list (www-international@w3.org).
                     60: <H3>
                     61:   <A name="Recommended">Recommended Character Data Files</A>
                     62: </H3>
                     63: <P>
                     64: Charlint needs information on characters in order to work correctly. To indicate
                     65: the file you want to use, please use the -f option. The currently recommended
                     66: character data file is available from
                     67: <A href="ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt</A>.
                     68: Composition exclusions are currently hard-coded and are based on
1.32      duerst     69: <A href="ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt</A>
                     70: [version of 04 Aug 1999 for both files]. Additional information on these
                     71: and other files can be found at
1.31      duerst     72: <A href="http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html">http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html</A>.
                     73: Please note that this data file is a beta version; the beta test will last
                     74: up to 15 August 1999. Using charlint is one way to test this data file. Please
                     75: send any comments on <EM>the data file</EM> to
                     76: <A href="mailto:errata@unicode.org">errata@unicode.org</A>.
                     77: <H3>
                     78:   <A name="How">How to use charlint</A>
                     79: </H3>
                     80: <P>
                     81: Charlint is a Perl script that works as a simple filter. It uses UTF-8 both
                     82: for input and for output. Behaviour can be fine-tuned with various options.
                     83: A list of options such as the one below can be obtained by using <KBD>charlint
                     84: -h</KBD>.
                     85: <PRE>(options prefixed by # are currently not available)
1.14      duerst     86: -b: Remove initial 'Byte Order Mark'
                     87: -B: Suppress warning about initial 'Byte Order Mark'
                     88: -d: Debug: Thoroughly check character data table input
                     89: -D: Leave after reading in character data
                     90: -e: # remove undefined codepoints
                     91: -E: Do not warn about undefined codepoints
                     92: -f file: Read data from file
                     93:          (please use newest V3.0 beta datafiles)
                     94: -C: # Do not normalize
                     95: -h: Prints out this short description
                     96: -n: Accept &amp;#ddddd; and &amp;#xhhhh; on input
1.31      duerst     97:         (beware of &lt;![CDATA[, &lt;SCRIPT&gt;, &lt;STYLE&gt;)
1.14      duerst     98: -N: Produce &amp;#xhh; on output
                     99: -o: Print out 'unprintable' bytes as octal
                    100: -p: # Remove stuff in private zone
                    101: -P: Suppress checking private zone
                    102: -u: # Fix UTF-8 (convert or remove)
                    103: -U: Suppress checking correctness of UTF-8
1.31      duerst    104: -v: Print version
                    105: </PRE>
                    106: <H3>
                    107:   <A name="Version">Version History</A>
                    108: </H3>
                    109: <PRE># 1999/06/23: 0.30, preparation for W3C member test, without Hangul  MJD
1.23      duerst    110: # 1999/06/25: 0.31, fixed reordering bug, going public               MJD
1.31      duerst    111: # 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta      MJD<BR># 1999/08/16: 0.33, updated for second version of 3.0.0.beta         MJD
                    112: </PRE>
                    113: <H3>
                    114:   <A name="Background">Background</A>
                    115: </H3>
                    116: <UL>
                    117:   <LI>
                    118:     <A href="http://www.w3.org/TR/WD-charmod">Character Model for the World Wide
                    119:     Web</A> (W3C Working Draft)
                    120:   <LI>
                    121:     <A href="http://www.unicode.org/unicode/reports/tr15/">Unicode Technical
                    122:     Report #15</A> (Version 14 approved subject to various changes)
                    123:   <LI>
                    124:     <A href="http://www.w3.org/Status">W3C Open Source Releases</A>
                    125: </UL>
                    126: <H3>
                    127:   <A name="Future">Future Plans</A>
                    128: </H3>
                    129: <P>
                    130: We have just released the first version of charlint. There are many things
                    131: we plan to add in the future:
                    132: <UL>
                    133:   <LI>
                    134:     Hangul syllable normalization
                    135:   <LI>
                    136:     Removal of undefined codepoints and codepoints in the private zone
                    137:   <LI>
                    138:     Removal/fix of incorrect UTF-8
                    139:   <LI>
                    140:     Incorporate knowledge of V3.0 decompositions to automatically detect future
                    141:     precomposites.
                    142:   <LI>
                    143:     Compatibility character detection or removal.
                    144:   <LI>
                    145:     Detection or removal of characters not suitable for markup.
                    146: </UL>
                    147: <P>
                    148: Your help (bug reports, patches, ideas, test cases) is welcome.
                    149: <P>
                    150: <P>
                    151:   <HR>
                    152: <ADDRESS>
                    153:   <A href="mailto:duerst@w3.org">Martin D&uuml;rst</A> <BR>
1.33    ! duerst    154:   <A href="/Help/Webmaster.html">Webmaster</A> <BR>
        !           155:   last revised $Date: 1999/08/16 07:28:42 $ by $Author: duerst $
1.31      duerst    156: </ADDRESS>
                    157: <P class="policyfooter">
                    158: <SMALL><A href="/Consortium/Legal/ipr-notice.html#Copyright">Copyright</A>
                    159: &nbsp;&copy;&nbsp; 1997 <A href="http://www.w3.org">W3C</A>
                    160: (<A href="http://www.lcs.mit.edu">MIT</A>,
                    161: <A href="http://www.inria.fr/">INRIA</A>,
                    162: <A href="http://www.keio.ac.jp/">Keio</A> ), All Rights Reserved. W3C
                    163: <A href="/Consortium/Legal/ipr-notice.html#Legal Disclaimer">liability,</A>
                    164: <A href="/Consortium/Legal/ipr-notice.html#W3C Trademarks">trademark</A>,
                    165: <A href="/Consortium/Legal/copyright-documents.html">document use </A>and
                    166: <A href="/Consortium/Legal/copyright-software.html">software licensing
                    167: </A>rules apply. Your interactions with this site are in accordance with
                    168: our <A href="/Consortium/Legal/privacy-statement.html#Public">public</A>
                    169: and <A href="/Consortium/Legal/privacy-statement.html#Members">Member</A>
                    170: privacy statements.</SMALL>
                    171: </BODY></HTML>
