3 Encode::Supported -- Encodings supported by Encode
9 Encoding names are case insensitive. White space in names
10 is ignored. In addition an encoding may have aliases.
11 Each encoding has one "canonical" name. The "canonical"
12 name is chosen from the names of the encoding by picking
13 the first in the following sequence:
15 o The MIME name as defined in IETF RFCs.
16 o The name in the IANA registry.
17 o The name used by the organization that defined it.
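As a minimal sketch of the naming rules above, the following looks up several spellings with C<find_encoding> and prints the canonical name each one resolves to. The helper name C<canonical_name> is hypothetical and exists only for this illustration. The exact canonical names reported may vary between Encode versions.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: return the canonical name for any spelling or alias,
# or undef when Encode does not recognize the name.
sub canonical_name {
    my ($name) = @_;
    my $enc = find_encoding($name);    # returns an encoding object or undef
    return defined $enc ? $enc->name : undef;
}

# Different cases and aliases of the same encoding.
for my $name (qw(latin1 Latin1 ISO-8859-1 iso-8859-1)) {
    my $canon = canonical_name($name);
    printf "%-12s => %s\n", $name, defined $canon ? $canon : '(unknown)';
}
```

All four spellings are expected to resolve to the same encoding object, so comparing canonical names is a simple way to test whether two names are aliases of each other.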
19 Because of all the alias issues, and because in the general case
20 encodings have state, "Encode" uses the encoding object internally
21 once an operation is in progress.
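The sketch below illustrates why working with an encoding object is convenient for a stateful encoding. It resolves C<iso-2022-jp> once, then encodes and decodes through the same object. The sample character and the C<< <ESC> >> display convention are choices made for this illustration only.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Resolve the name once; the returned object is reused for every call.
my $jis = find_encoding("iso-2022-jp")
    or die "iso-2022-jp is not available";

# HIRAGANA LETTER A. ISO-2022-JP has to switch character sets to encode it.
my $text  = "\x{3042}";
my $bytes = $jis->encode($text);

# Make the escape sequences that carry the encoding's state visible.
(my $visible = $bytes) =~ s/\e/<ESC>/g;
print $jis->name, ": $visible\n";

# Decoding through the same object recovers the original character.
my $round_trip = $jis->decode($bytes);
print "round trip ok\n" if $round_trip eq $text;
```

The escape sequences in the output mark the switch into and out of the Japanese character set, which is the kind of state the text refers to.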
23 =head1 Supported Encodings
25 As of Perl 5.8.0, at least the following encodings are recognized.
26 Note that unless otherwise specified, they are all case insensitive
27 (via alias) and all occurrences of spaces are replaced with '-'. In
28 other words, "ISO 8859 1" and "iso-8859-1" are identical.
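To check which encodings a given installation recognizes, something like the following can be used. It is a sketch rather than a definitive recipe: it relies on C<< Encode->encodings(":all") >> to list every available encoding, and it guards the spaced lookup in case a particular Encode version resolves that spelling differently.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# ":all" also lists encodings whose modules have not been loaded yet.
my @available = Encode->encodings(":all");
printf "%d encodings available\n", scalar @available;

# Per the text, spaces and case should not matter. Guard against undef
# in case this Encode version handles the spaced form differently.
my $spaced = find_encoding("ISO 8859 1");
my $dashed = find_encoding("iso-8859-1");
if ($spaced && $dashed) {
    print "same encoding\n" if $spaced->name eq $dashed->name;
}
```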
30 Encodings are categorized and implemented in several different modules,
31 but in most cases you don't have to C<use Encode::XX> to make them
32 available. Encode.pm will automatically load those modules on demand.
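The on-demand loading can be observed with a short sketch like the one below, which never says C<use Encode::JP> yet still encodes to C<euc-jp>. Inspecting C<%INC> is just one way to see whether the module was loaded and is used here for illustration only.

```perl
use strict;
use warnings;
use Encode qw(encode);    # note: no "use Encode::JP" anywhere

my $loaded_before = exists $INC{'Encode/JP.pm'} ? 1 : 0;

# HIRAGANA LETTER A encoded as EUC-JP.
my $bytes = encode("euc-jp", "\x{3042}");

my $loaded_after = exists $INC{'Encode/JP.pm'} ? 1 : 0;

# Print the resulting bytes as hex, then the load status.
printf "bytes: %s\n",
    join " ", map { sprintf("%02X", ord($_)) } split //, $bytes;
print "Encode::JP loaded before: $loaded_before, after: $loaded_after\n";
```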
34 =head2 Built-in Encodings
36 The following encodings are always available.
39 -----------------------
42 UCS-2 ucs2, iso-10646-1
45 -----------------------
49 The following encodings are based on single-byte encodings implemented as
50 extended ASCII. In most cases they use \x80-\xff (the upper half) to map
53 -----------------------
65 (iso-8859-12 is nonexistent)
75 viscii # ASCII + Vietnamese
86 # all cp* are also available as ibm-* and ms-*
100 -----------------------
102 =head2 The CJK: Chinese, Japanese, Korean (Multibyte)
104 Note that Vietnamese is listed above. Also read "Encoding vs. Charset"
105 below. Also note that these are implemented in a distinct module per
106 language, due to size concerns. See these perldocs also.
110 =item Encode::CN -- Continental China
112 -----------------------
119 -----------------------
121 =item Encode::JP -- Japan
123 -----------------------
129 shiftjis Shift_JIS, sjis
130 -----------------------
132 =item Encode::KR -- Korea
134 -----------------------
138 -----------------------
140 =item Encode::TW -- Taiwan
142 -----------------------
146 -----------------------
148 =item Encode::HanExtra -- More Chinese via CPAN
150 Due to size concerns, additional Chinese encodings below are
151 distributed separately on CPAN, under the name Encode::HanExtra.
153 -----------------------
157 -----------------------
161 =head2 Miscellaneous encodings
167 See perlebcdic for details.
169 -----------------------
173 -----------------------
175 =item Encode::Symbol
177 For symbols and dingbats.
179 -----------------------
182 -----------------------
186 =head1 Encoding vs. Charset
188 Character encoding (or just "encoding") and Character Set (or just
189 "charset") are often used interchangeably but they are different
192 A charset determines which characters may appear in a given text.
194 An encoding maps the characters of its charset(s) to a stream of bits.
196 Note that a given encoding may contain multiple charsets. For instance,
197 euc-jp contains ASCII, JIS X 0201 (Hankaku Kana), JIS X 0208 (Zenkaku
198 Kana and Kanji) and JIS X 0212 (Extended Kanji) in a single encoding.
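The following sketch makes the point concrete by encoding one sample character from each of the four charsets into C<euc-jp> and printing the resulting bytes. The helper C<euc_jp_hex> and the choice of sample characters are assumptions made for this illustration. In particular, U+4E02 is assumed to be covered only by JIS X 0212.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One sample character per charset named in the text.
my @samples = (
    [ "ASCII",      "A"        ],    # LATIN CAPITAL LETTER A
    [ "JIS X 0201", "\x{FF71}" ],    # HALFWIDTH KATAKANA LETTER A
    [ "JIS X 0208", "\x{3042}" ],    # HIRAGANA LETTER A
    [ "JIS X 0212", "\x{4E02}" ],    # kanji assumed to be in JIS X 0212 only
);

# Hypothetical helper: EUC-JP bytes of a string as space-separated hex.
sub euc_jp_hex {
    my ($char) = @_;
    my $bytes = encode("euc-jp", $char);
    return join " ", map { sprintf("%02X", ord($_)) } split //, $bytes;
}

printf "%-10s  %s\n", $_->[0], euc_jp_hex($_->[1]) for @samples;
```

Each charset occupies its own byte pattern within the single encoding, which is why one C<euc-jp> stream can mix all four.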
200 As the name suggests, the Encode module supports encodings, not
203 =head1 Encoding Classification (by Anton Tagunov)
207 US-ASCII UTF-8 KOI8-R ISO-8859-*
208 ISO-2022-CN ISO-2022-JP Big5
211 are <http://www.iana.org/assignments/character-sets>-registered as
212 preferred MIME names and may probably be used over the Internet. So is
216 but despite its widespread use it bore the label of being
217 Microsoft proprietary. Now Shift JIS is official as of
222 are IANA-registered preferred MIME names but probably
223 should be avoided as encodings for web pages due to lack of
226 ISO-2022 (http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM)
227 ISO-2022-JP-1 (http://www.faqs.org/rfcs/rfc2237.html)
228 ISO-IR-165 (http://www.faqs.org/rfcs/rfc1345.html)
231 GB 12345 (only planes 1 and 2 available)
235 are totally valid encodings but not registered at IANA.
238 EUC-JP-0212 (Encode::lib::Encode::Tcl::Extended)
240 are a bit proprietary
242 You can probably find some information on CJK encodings at
244 brief description for most of the mentioned CJK encodings
246 F<http://www.debian.org.ru/doc/manuals/intro-i18n/ch-codes.html>
248 several years old, but still useful
250 F<http://www.oreilly.com/people/authors/lunde/cjk_inf.html>
252 and some in-depth reading for the heroes :-)
253 F<http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM> (eq ISO-2022)
259 L<Encode::CN>, L<Encode::JP>, L<Encode::KR>, L<Encode::TW>
260 L<Encode::EBCDIC>, L<Encode::Symbol>