Annotation of charlint/Overview.html, revision 1.21

1.1       duerst      1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
                      2: "http://www.w3.org/TR/REC-html40/loose.dtd">
                      3: <HTML>
                      4: <HEAD>
                      5:   <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
                      6:   <META http-equiv="Content-Style-Type" content="text/css">
                      7:   <!--BASE href="http://www.w3.org/Consortium/Translation/"-->
                      8:   <!--LINK rel="stylesheet" href="../i18n.css"-->
                      9:   <STYLE type="text/css">
                     10:   <!--
                     11:     H1.title {text-align: center }
                     12:     P.toolbar { text-align: center }
                     13:     DIV.deliverable { margin-left: 2em;
                     14:                 margin-right: 2em }
                     15:     P.note { margin-left: 10%;
                     16:                 margin-right: 10%;
                     17:                 color: green }
                     18:     TH { text-align: left }
                     19:     TH, TD { padding: 2px }
                     20:     .external { font-style: italic }
                     21:   -->
                     22:   </STYLE>
1.17      duerst     23:   <TITLE>Charlint - A Character Normalization Tool</TITLE>
1.1       duerst     24:   <LINK rel="stylesheet" type="text/css" href="../../StyleSheets/base.css">
                     25: </HEAD>
                     26: <BODY bgcolor="#FFFFFF" text="#000000">
                     27: <P>
                     28: <A HREF="/"><IMG BORDER="0" ALT="W3C" WIDTH="72" HEIGHT="48" SRC="w3c_home"></A>
                     29: <H1>
1.3       duerst     30:   Charlint - A Character Normalization Tool
1.1       duerst     31: </H1>
1.3       duerst     32: <P>
1.20      duerst     33: <A href="#Perl">Perl source</A> | <A HREF="#Recommended">Recommended Data
                     34: Files</A> | <A HREF="#How">How to use</A> | <A HREF="#Future">Future Plans</A>
                     35: | Background | <A HREF="#Version">Version History</A>
1.4       duerst     36: <H3>
1.12      duerst     37:   <A NAME="Perl">Perl Source</A> and Installation
1.4       duerst     38: </H3>
1.3       duerst     39: <P>
1.19      duerst     40: Charlint is written in
                     41: <A HREF="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</A>. You
                     42: can get the source from
1.4       duerst     43: <A HREF="http://www.w3.org/International/charlint/charlint.pl">http://www.w3.org/International/charlint/charlint.pl</A>.
1.5       duerst     44: Charlint is covered by the
                     45: <A HREF="http://www.w3.org/Consortium/Legal/copyright-software.html">W3C
1.12      duerst     46: software licence</A>. To install charlint, please make sure you have installed
1.19      duerst     47: <A HREF="http://www.perl.com/pace/pub/perldocs/latest.html">Perl 5</A>, you
                     48: have downloaded an appropriate character data file, and you have downloaded
                     49: the <A HREF="http://www.w3.org/International/charlint/charlint.pl">Perl
1.21    ! duerst     50: source</A>. Please send error reports or comments to
        !            51: <A HREF="mailto:duerst@w3.org">duerst@w3.org</A>; for announcements and public
        !            52: discussion please see the Winter mailing list (www-international@w3.org).
1.6       duerst     53: <H3>
1.20      duerst     54:   <A NAME="Recommended">Recommended Character Data Files</A>
1.6       duerst     55: </H3>
1.1       duerst     56: <P>
1.7       duerst     57: Charlint needs information on characters in order to work correctly. To indicate
                     58: the file you want to use, please use the -f option. The currently recommended
1.8       duerst     59: character data file is available from
                     60: <A HREF="ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt</A>.
1.16      duerst     61: Composition exclusions are currently hard-coded and are based on
                     62: <A HREF="ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt">ftp://ftp.unicode.org/Public/3.0-Update/CompositionExclusions-1.beta.txt</A>.
1.18      duerst     63: Additional information on these and other files can be found at
1.9       duerst     64: <A HREF="http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html">http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html</A>.
1.10      duerst     65: Please note that this data file is a beta version; the beta test will last
                     66: up to 15 August 1999. Using charlint is one way to test this data file. Please
                     67: send any comments on <EM>the data file</EM> to
                     68: <A HREF="mailto:errata@unicode.org">errata@unicode.org</A>.
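<P>
To give a feel for what a character data file contains, the following is a
minimal Python sketch (not charlint's own Perl code) that reads one record of
the semicolon-separated UnicodeData format. The function name
<CODE>parse_unicode_data_line</CODE>, the dictionary keys, and the sample
record are illustrative assumptions; the field positions used (code point,
name, canonical combining class, decomposition mapping) follow the published
UnicodeData layout.

```python
def parse_unicode_data_line(line):
    """Extract the fields a normalizer typically needs from one record.

    Hypothetical helper for illustration; it is not part of charlint.
    """
    fields = line.rstrip("\n").split(";")
    decomposition = fields[5]
    return {
        "code_point": int(fields[0], 16),        # field 0: hexadecimal code point
        "name": fields[1],                       # field 1: character name
        "combining_class": int(fields[3]),       # field 3: canonical combining class
        # A leading <tag> marks a compatibility (non-canonical) mapping.
        "compatibility": decomposition.startswith("<"),
        # Remaining tokens are hexadecimal code points of the mapping.
        "decomposition": [
            int(token, 16)
            for token in decomposition.split()
            if not token.startswith("<")
        ],
    }


# Sample record for U+00C5, written out here for illustration.
SAMPLE = (
    "00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;"
    ";;;N;LATIN CAPITAL LETTER A RING;;;00E5;"
)

record = parse_unicode_data_line(SAMPLE)
```

A normalizer would collect such records into tables keyed by code point, one
for combining classes and one for canonical decompositions.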
1.1       duerst     69: <H3>
1.12      duerst     70:   <A NAME="How">How to use charlint</A>
1.11      duerst     71: </H3>
                     72: <P>
1.15      duerst     73: Charlint is a Perl script that works as a simple filter. It uses UTF-8 both
                     74: for input and for output. Behaviour can be fine-tuned with various options.
                     75: A list of options like the one below can be obtained by running <KBD>charlint
                     76: -h</KBD>.
                     77: <PRE>(options prefixed by # are currently not available)
1.14      duerst     78: -b: Remove initial 'Byte Order Mark'
                     79: -B: Supress warning about initial 'Byte Order Mark'
                     80: -d: Debug: Thoroughly check character data table input
                     81: -D: Leave after reading in character data
                     82: -e: # remove undefined codepoints
                     83: -E: Do not warn about undefined codepoints
                     84: -f file: Read data from file
                     85:          (please use newest V3.0 beta datafiles)
                     86: -C: # Do not normalize
                     87: -h: Prints out this short description
                     88: -n: Accept &amp;#ddddd; and &amp;#xhhhh; on input
                     89:         (beware of &lt;![CDATA[, &lt;SCRIPT&gt;, &lt;STYLE&gt;)
                     90: -N: Produce &amp;#xhh; on output
                     91: -o: Print out 'unprintable' bytes as octal
                     92: -p: # Remove stuff in private zone
                     93: -P: Supress checking private zone
                     94: -u: # Fix UTF-8 (convert or remove)
                     95: -U: Supress checking correctness of UTF-8
                     96: -v: Print version
                     97: </PRE>
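<P>
The effect of a few of these options can be approximated in a short Python
sketch. This is not charlint itself: it assumes that the normalization target
is canonical composition (NFC) as provided by Python's
<CODE>unicodedata</CODE> module, that -N applies to all non-ASCII characters,
and the function and parameter names (<CODE>normalize_like_filter</CODE>,
<CODE>strip_bom</CODE>, <CODE>accept_ncrs</CODE>, <CODE>emit_ncrs</CODE>) are
hypothetical stand-ins for -b, -n and -N. Unlike the charlint version
described here, Python's normalizer also handles Hangul.

```python
import re
import unicodedata

# Matches numeric character references such as &#197; and &#xC5;
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_ncrs(text):
    """Replace numeric character references by the characters they denote."""
    def replace(match):
        code_point = int(match.group(1), 16) if match.group(1) else int(match.group(2))
        # Leave out-of-range and surrogate references untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)
    return NCR.sub(replace, text)


def encode_ncrs(text):
    """Write every non-ASCII character as a hexadecimal reference (assumed -N behaviour)."""
    return "".join(ch if ord(ch) < 0x80 else "&#x%X;" % ord(ch) for ch in text)


def normalize_like_filter(data, strip_bom=False, accept_ncrs=False, emit_ncrs=False):
    """Take UTF-8 bytes in and return normalized UTF-8 bytes, as a filter would."""
    text = data.decode("utf-8")
    if strip_bom and text.startswith("\ufeff"):
        text = text[1:]                      # roughly what -b does
    if accept_ncrs:
        text = decode_ncrs(text)             # roughly what -n does
    text = unicodedata.normalize("NFC", text)
    if emit_ncrs:
        text = encode_ncrs(text)             # roughly what -N does
    return text.encode("utf-8")
```

For example, passing the UTF-8 bytes for "A" followed by U+030A (combining
ring above) through <CODE>normalize_like_filter</CODE> yields the bytes of the
single precomposed character U+00C5.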
1.19      duerst     98: <H3>
                     99:   <A NAME="Version">Version History</A>
                    100: </H3>
                    101: <P>
                    102: # History:
                    103: <P>
                    104: # 1999/06/23: 0.30, preparation for W3C member test, without Hangul MJD
                    105: <P>
                    106: # 1999/06/25: 0.31, fixed reordering bug, going public MJD
                    107: <P>
                    108: # 1999/07/01: 0.32, adapted surrogates/exclusions to 3.0.0.beta MJD
1.11      duerst    109: <H3>
1.15      duerst    110:   <A NAME="Future">Future Plans</A>
1.1       duerst    111: </H3>
1.15      duerst    112: <P>
                    113: We have just released the first version of charlint. There are many things
                    114: we plan to add in the future:
                    115: <UL>
                    116:   <LI>
                    117:     Hangul syllable normalization
                    118:   <LI>
                    119:     Removal of undefined codepoints and codepoints in the private zone
                    120:   <LI>
                    121:     Removal/fix of incorrect UTF-8
                    122: </UL>
1.1       duerst    123: <P>
                    124:   <HR>
                    125: <ADDRESS>
                    126:   <A HREF="mailto:duerst@w3.org">Martin D&uuml;rst</A> <BR>
                    127:   <A HREF="../Help/Webmaster.html">Webmaster</A> <BR>
                    128:   last revised $Date: 1999/06/23 12:51:49 $ by $Author: connolly $
                    129: </ADDRESS>
                    130: <P class=policyfooter>
                    131: <SMALL><A href="/Consortium/Legal/ipr-notice.html#Copyright">Copyright</A>
                    132: &nbsp;&copy;&nbsp; 1997 <A href="http://www.w3.org">W3C</A>
                    133: (<A href="http://www.lcs.mit.edu">MIT</A>,
                    134: <A href="http://www.inria.fr/">INRIA</A>,
                    135: <A href="http://www.keio.ac.jp/">Keio</A> ), All Rights Reserved. W3C
                    136: <A href="/Consortium/Legal/ipr-notice.html#Legal Disclaimer">liability,</A>
                    137: <A href="/Consortium/Legal/ipr-notice.html#W3C Trademarks">trademark</A>,
                    138: <A href="/Consortium/Legal/copyright-documents.html">document use </A>and
                    139: <A href="/Consortium/Legal/copyright-software.html">software licensing
                    140: </A>rules apply. Your interactions with this site are in accordance with
                    141: our <A href="/Consortium/Legal/privacy-statement.html#Public">public</A>
                    142: and <A href="/Consortium/Legal/privacy-statement.html#Members">Member</A>
                    143: privacy statements.</SMALL>
                    144: </BODY></HTML>