Encode::Supported -- Encodings supported by Encode
Encoding names are case insensitive. White space in names
is ignored. In addition, an encoding may have aliases.
Each encoding has one "canonical" name. The "canonical"
name is chosen from the names of the encoding by picking
the first in the following sequence:
o The MIME name as defined in IETF RFCs.
o The name in the IANA registry.
o The name used by the organization that defined it.
Because of all the alias issues, and because in the general
case encodings have state, "Encode" uses the encoding
object internally once an operation is in progress.
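
The naming rules above can be sketched as follows. This is a minimal illustration, not part of the original text: it assumes the C<find_encoding> and C<resolve_alias> functions of Encode behave as in current releases, and the alias "latin1" is chosen only as an example.

```perl
use strict;
use warnings;
use Encode qw(find_encoding resolve_alias);

# Several spellings of one encoding; "latin1" is an alias (illustrative choice).
my @spellings = ('iso-8859-1', 'ISO-8859-1', 'latin1', 'Latin1');

# Each spelling should resolve to the same canonical name.
my @canonical = map { resolve_alias($_) } @spellings;
print "$spellings[$_] => $canonical[$_]\n" for 0 .. $#spellings;

# Look the encoding up once, then reuse the object for the actual work.
my $enc = find_encoding('latin1') or die "latin1 is not available";

my $string = "caf\x{e9}";               # U+00E9 is e with acute accent
my $bytes  = $enc->encode($string);     # characters to octets
my $text   = $enc->decode($bytes);      # octets back to characters

printf "%s: %d bytes\n", $enc->name, length $bytes;
```

Holding on to the object returned by C<find_encoding> avoids repeating the alias lookup for every call.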
=head2 Supported Encodings
As of Perl 5.8.0, at least the following encodings are recognized.
Note that unless otherwise specified, they are all case insensitive
(via alias) and all occurrences of spaces are replaced with '-'. In
other words, "ISO 8859 1" and "iso-8859-1" are identical.
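
To see which encodings a particular installation recognizes, something like the following may help. It is a sketch that assumes the C<< Encode->encodings >> class method and its ":all" argument as found in current Encode releases; the output depends on the installed version.

```perl
use strict;
use warnings;
use Encode;

# Canonical names of the encodings that are available right now.
my @loaded = Encode->encodings();

# ":all" also lists encodings whose modules are loaded on demand.
my @all = Encode->encodings(':all');

printf "%d loaded, %d known in total\n", scalar @loaded, scalar @all;
print "  $_\n" for @all;
```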
  -----------------------
  ucs2        UCS-2, iso-10646-1
=head3 The ISO 8859, KOI, and other 1-byte encodings
The following encodings are based on ASCII. In most cases they use
\x80-\xff (the upper half) to map non-ASCII characters.
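
A short sketch of what "upper half" means in practice. The sample characters and the choice of iso-8859-1 and koi8-r are assumptions made for illustration; only the C<encode> function of Encode is used.

```perl
use strict;
use warnings;
use Encode qw(encode);

# ASCII characters keep their 7-bit value; other characters land in \x80-\xff.
my $ascii   = encode('iso-8859-1', 'A');         # U+0041 LATIN CAPITAL LETTER A
my $e_acute = encode('iso-8859-1', "\x{e9}");    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $cyr_a   = encode('koi8-r',     "\x{410}");   # U+0410 CYRILLIC CAPITAL LETTER A

# Every result is exactly one byte long.
printf "%-22s 0x%02X\n", 'A in iso-8859-1',       ord $ascii;
printf "%-22s 0x%02X\n", 'e-acute in iso-8859-1', ord $e_acute;
printf "%-22s 0x%02X\n", 'Cyrillic A in koi8-r',  ord $cyr_a;
```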
  (iso-8859-12 is nonexistent)
  viscii                  # ASCII + Vietnamese
  # all cp* are also available as ibm-* and ms-*
=head3 The CJK: Chinese, Japanese, Korean (Multibyte)
Note that Vietnamese is listed above. Also read "Encoding vs. Charset"
below. Also note that these are implemented in distinct modules by
language, due to size concerns. See these perldocs also.
  cp936        gbk                # Encode::CN
  iso-ir-165                      # Encode::CN
  7bit-jis     jis                # Encode::JP
  euc-jp       ujis               # Encode::JP
  iso-2022-jp                     # Encode::JP
  macjapan                        # Encode::JP
  shiftjis     Shift_JIS, sjis    # Encode::JP
  big5-hkscs                      # Encode::TW
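
As an illustration of the table above, the following sketch encodes one Japanese character with two of the listed encodings and with the alias "sjis". The sample character is an arbitrary choice, and the code assumes Encode::JP is installed so that it can be loaded on demand.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $hiragana_a = "\x{3042}";    # U+3042 HIRAGANA LETTER A

# The same character has a different two-byte form in each encoding.
# "sjis" is listed above as an alias of the canonical name "shiftjis".
my %bytes_for;
for my $name (qw(euc-jp shiftjis sjis)) {
    $bytes_for{$name} = encode($name, $hiragana_a);
    printf "%-9s %s\n", $name, uc unpack('H*', $bytes_for{$name});
}

# Decoding the octets gives back the original character.
my $round_trip = decode('euc-jp', $bytes_for{'euc-jp'});
```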
Due to size concerns, additional Chinese encodings including "GB
18030", "EUC-TW" and "BIG5PLUS" are distributed separately on CPAN,
under the name Encode::HanExtra.
See perlebcdic for details.
=head3 Symbols and dingbats
=head1 Encoding vs. Charset
Character encoding (or just "encoding") and Character Set (or just
"charset") are often used interchangeably but they are different
A charset determines which characters can be included in a given text.
An encoding maps the characters of one or more charsets to a stream of bits.
Note that a given encoding may contain multiple charsets. For instance,
euc-jp contains ASCII, JIS X 0201 (Hankaku Kana), JIS X 0208 (Zenkaku
Kana and Kanji) and JIS X 0212 (Extended Kanji) in a single encoding.
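
The euc-jp example can be made concrete by encoding one character from each charset and comparing the byte sequences. This is only a sketch: the sample characters are illustrative choices, and U+4E02 is assumed here to be a kanji found in JIS X 0212 but not in JIS X 0208.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One sample character per charset (illustrative choices).
my @samples = (
    [ 'ASCII',      "A"        ],    # U+0041
    [ 'JIS X 0201', "\x{ff76}" ],    # U+FF76 HALFWIDTH KATAKANA LETTER KA
    [ 'JIS X 0208', "\x{6f22}" ],    # U+6F22, a common kanji
    [ 'JIS X 0212', "\x{4e02}" ],    # U+4E02, assumed to be JIS X 0212 only
);

# A single encoding, euc-jp, covers all four charsets,
# using a different number of bytes for each.
my %length_of;
for my $sample (@samples) {
    my ($charset, $char) = @$sample;
    my $bytes = encode('euc-jp', $char);
    $length_of{$charset} = length $bytes;
    printf "%-10s U+%04X -> %s\n", $charset, ord $char, uc unpack('H*', $bytes);
}
```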
As the name suggests, the Encode module supports encodings, not
=head1 Encoding Classification (by Anton Tagunov)
  US-ASCII    UTF-8    KOI8-R    ISO-8859-*
  ISO-2022-CN    ISO-2022-JP    Big5
are <http://www.iana.org/assignments/character-sets>-registered as
preferred MIME names and can probably be used over the Internet. So is
but despite its wide use it long bore the label of being
Microsoft proprietary. Shift JIS is now official as of
are IANA-registered preferred MIME names but should probably
be avoided as encodings for web pages due to the lack of
  ISO-2022      (http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM)
  ISO-2022-JP-1 (http://www.faqs.org/rfcs/rfc2237.html)
  ISO-IR-165    (http://www.faqs.org/rfcs/rfc1345.html)
  GB 12345 (only planes 1 and 2 available)
are totally valid encodings but not registered at IANA.
  EUC-JP-0212 (Encode::lib::Encode::Tcl::Extended)
are a bit proprietary
You can probably find some information on CJK encodings at
brief description for most of the mentioned CJK encodings
F<http://www.debian.org.ru/doc/manuals/intro-i18n/ch-codes.html>
several years old, but still useful
F<http://www.oreilly.com/people/authors/lunde/cjk_inf.html>
and some in-depth reading for the heroes :-)
F<http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM> (eq ISO-2022)
L<Encode>, L<Encode::CN>, L<Encode::JP>, L<Encode::KR>, L<Encode::TW>