use strict;
use Getopt::Std;
my @orig_ARGV = @ARGV;
-our $VERSION = do { my @r = (q$Revision: 1.23 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r };
+our $VERSION = do { my @r = (q$Revision: 1.24 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r };
# These may get re-ordered.
# RAW is a do_now as inserted by &enter
=back
-Needless to say, if you are manually creating a UCM file, you should
-copy ascii.ucm or existing encoding which is close to yours than write
-your own from scratch.
+When you are manually creating a UCM file, you should copy ascii.ucm
+or an existing encoding which is close to yours, rather than write
+your own from scratch.
When you do so, make sure you leave at least B<U0000> to B<U0020> as
is, unless your environment is on EBCDIC.
ISO-2022 series. Such modules include L<Encode::JP::2022_JP>,
L<Encode::KR::2022_KR>, and L<Encode::TW::HZ>.
+=head2 Coping with duplicate mappings
+
+When you create a map, you SHOULD make your mappings round-trip safe.
+That is, C<encode('your-encoding', decode('your-encoding', $data)) eq
+$data> holds for all characters that are marked as C<|0>. Here is
+how to make sure:
+
+=over 2
+
+=item *
+
+Sort your map in Unicode order.
+
+=item *
+
+When you have a duplicate entry, mark one of them with '|1' or '|3'.
+
+=item *
+
+Make sure the '|1' or '|3' entry FOLLOWS the '|0' entry.
+
+=back
+
+Here is an example from big5-eten.
+
+ <U2550> \xF9\xF9 |0
+ <U2550> \xA2\xA4 |3
+
+Internally, the Encoding -> Unicode and Unicode -> Encoding maps look
+like this:
+
+ E to U               U to E
+ --------------------------------------
+ \xF9\xF9 => U2550    U2550 => \xF9\xF9
+ \xA2\xA4 => U2550
+
+So it is round-trip safe for \xF9\xF9. But if the two UCM entries
+above are in the reverse order, here is what happens:
+
+ E to U               U to E
+ --------------------------------------
+ \xA2\xA4 => U2550    U2550 => \xF9\xF9
+ (\xF9\xF9 => U2550 is now overwritten!)
+
+The Encode package comes with F<ucmlint>, a crude but sufficient
+utility to check the integrity of a UCM file. Look under the
+Encode/bin directory for it.
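+
+The ordering rule in this section can also be checked by hand with a
+few lines of Perl. The following is only a minimal sketch and is not
+how F<ucmlint> works: the sub name C<check_ucm_order> is hypothetical,
+and it assumes one mapping per line in the C<< <Uxxxx> \xXX |n >> form
+shown above. It reports a C<|0> entry only when that entry appears
+after a '|1' or '|3' entry for the same Unicode character.
+
+ use strict;
+ use warnings;
+
+ # Hypothetical helper, not part of Encode: returns one message per
+ # |0 entry that comes after a |1 or |3 entry for the same character.
+ sub check_ucm_order {
+     my @lines = @_;
+     my (%fallback_seen, @problems);
+     for my $line (@lines) {
+         # Matches e.g. "<U2550> \xF9\xF9 |0"; other lines are skipped.
+         my ($uni, $bytes, $flag) =
+             $line =~ /^<(U[0-9A-Fa-f]+)>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])/
+             or next;
+         if ($flag == 0) {
+             push @problems, "$uni: |0 entry ($bytes) comes after a fallback"
+                 if $fallback_seen{$uni};
+         }
+         elsif ($flag == 1 || $flag == 3) {
+             $fallback_seen{$uni} = 1;
+         }
+     }
+     return @problems;
+ }
+
+ # The two big5-eten entries from above, deliberately in the wrong order.
+ my @report = check_ucm_order(
+     '<U2550> \xA2\xA4 |3',
+     '<U2550> \xF9\xF9 |0',
+ );
+ print "$_\n" for @report;
+
+With the entries in the correct order (C<|0> first), the returned list
+is empty.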
+
+
=head1 Bookmarks
ICU Home Page