=head1 NAME

Encode::Supported -- Encodings supported by Encode

=head1 DESCRIPTION

=head2 Encoding Names

Encoding names are case insensitive. White space in names
is ignored. In addition, an encoding may have aliases.
Each encoding has one "canonical" name. The "canonical"
name is chosen from the names of the encoding by picking
the first in the following sequence:

  o The MIME name as defined in IETF RFCs.
  o The name in the IANA registry.
  o The name used by the organization that defined it.

Because of all the alias issues, and because in the general case
encodings have state, "Encode" uses the encoding object internally
once an operation is in progress.
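
As a minimal sketch of how several spellings collapse onto one encoding
object, the following asks C<find_encoding> for an object and prints the
name that object reports. The helper C<canonical_name> is hypothetical
and exists only for this illustration. The exact names reported may
differ between Encode versions.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not part of Encode): resolve a user-supplied
# name to an encoding object and return the name that object reports.
sub canonical_name {
    my ($name) = @_;
    my $enc = find_encoding($name)
        or return;    # unknown names give undef in scalar context
    return $enc->name;
}

for my $name (qw(latin1 Latin1 ISO-8859-1)) {
    my $canon = canonical_name($name);
    printf "%-12s => %s\n", $name, defined $canon ? $canon : '(unknown)';
}
```

All three spellings should resolve to the same object, which is why the
canonical name is the one to compare against in application code.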

=head1 Supported Encodings

As of Perl 5.8.0, at least the following encodings are recognized.
Note that unless otherwise specified, they are all case insensitive
(via alias) and all occurrences of spaces are replaced with '-'. In
other words, "ISO 8859 1" and "iso-8859-1" are identical.

Encodings are categorized and implemented in several different modules,
but in most cases you don't have to C<use Encode::XX> to make them
available. Encode.pm automatically loads those modules on demand.
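
The on-demand loading can be sketched as follows. The script loads only
Encode itself and then uses a Japanese encoding, assuming Encode::JP is
installed alongside Encode (as it is in a standard Perl distribution).
The sample character is chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);    # note: no explicit "use Encode::JP"

my $text  = "\x{3042}";                   # HIRAGANA LETTER A
my $bytes = encode('shiftjis', $text);    # Encode locates the module itself
my $back  = decode('shiftjis', $bytes);

printf "bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "round trip ok\n" if $back eq $text;
```

An explicit C<use Encode::JP> is harmless, but it is only needed in
unusual setups, for instance when the module must be loaded at compile
time.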

=head2 Built-in Encodings

The following encodings are always available.

  Canonical      Aliases
  --------------------------------
  iso-8859-1     latin1
  US-ascii       ascii
  UCS-2          ucs2, iso-10646-1
  UCS-2le
  UTF-8          utf8
  --------------------------------

=head2 Encode::Byte

The following encodings are single-byte encodings implemented as
extended ASCII. In most cases they use \x80-\xff (the upper half) to
map non-ASCII characters.

  -----------------------
  iso-8859-1    latin1
  iso-8859-2    latin2
  iso-8859-3    latin3
  iso-8859-4    latin4
  iso-8859-5    cyrillic
  iso-8859-6    arabic
  iso-8859-7
  iso-8859-8
  iso-8859-9    latin5
  iso-8859-10   latin6
  iso-8859-11
  (iso-8859-12 is nonexistent)
  iso-8859-13   latin7
  iso-8859-14   latin8
  iso-8859-15   latin9
  iso-8859-16   latin10

  koi8-f
  koi8-r
  koi8-u

  viscii        # ASCII + Vietnamese

  cp1250        WinLatin2
  cp1251        WinCyrillic
  cp1252        WinLatin1
  cp1253        WinGreek
  cp1254        WinTurkish
  cp1255        WinHebrew
  cp1256        WinArabic
  cp1257        WinBaltic
  cp1258        WinVietnamese
  # all cp* are also available as ibm-* and ms-*

  maccentraleuropean
  maccroatian
  macroman
  maccyrillic
  macromanian
  macdingbats
  macsami
  macgreek
  macthai
  macicelandic
  macturkish
  macukraine
  -----------------------
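
The upper-half mapping described at the start of this section can be
illustrated with a small sketch: the same octet from the \x80-\xff
range is decoded under two of the encodings listed above and yields two
different characters. The octet value is an arbitrary choice made for
this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $octet = "\xE9";    # one byte from the upper half
my %code_point;

for my $enc (qw(iso-8859-1 iso-8859-5)) {
    my $char = decode($enc, $octet);
    $code_point{$enc} = ord $char;
    printf "%-10s 0x%02X => U+%04X\n", $enc, ord $octet, $code_point{$enc};
}
```

Bytes in the ASCII range decode to the same character in both
encodings. Only the interpretation of the upper half differs.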

=head2 The CJK: Chinese, Japanese, Korean (Multibyte)

Note that Vietnamese is listed above. Also read "Encoding vs. Charset"
below. Also note that these are implemented in a distinct module for
each language, due to size concerns. See the perldoc of each module
as well.

=over 4

=item Encode::CN -- Mainland China

  -----------------------
  cp936         gbk
  euc-cn
  gb12345
  gb2312
  hz
  iso-ir-165
  -----------------------

=item Encode::JP -- Japan

  -----------------------
  7bit-jis      jis
  cp932
  euc-jp        ujis
  iso-2022-jp
  macjapan
  shiftjis      Shift_JIS, sjis
  -----------------------

=item Encode::KR -- Korea

  -----------------------
  euc-kr
  ksc5601
  cp949
  -----------------------

=item Encode::TW -- Taiwan

  -----------------------
  big5
  big5-hkscs
  cp950
  -----------------------

=item Encode::HanExtra -- More Chinese via CPAN

Due to size concerns, the additional Chinese encodings below are
distributed separately on CPAN, under the name Encode::HanExtra.

  -----------------------
  gb18030
  euc-tw
  big5plus
  -----------------------

=back

=head2 Miscellaneous encodings

=over 4

=item Encode::EBCDIC

See L<perlebcdic> for details.

  -----------------------
  cp1047
  cp37
  posix-bc
  -----------------------

=item Encode::Symbol

For symbols and dingbats.

  -----------------------
  symbol
  dingbats
  -----------------------

=back

=head1 Encoding vs. Charset

Character encoding (or just "encoding") and character set (or just
"charset") are often used interchangeably, but they are different
concepts.

A charset determines which characters may appear in a given text.

An encoding maps the characters of one or more charsets to a stream
of bits.

Note that a given encoding may contain multiple charsets. For instance,
euc-jp contains ASCII, JIS X 0201 (Hankaku Kana), JIS X 0208 (Zenkaku
Kana and Kanji) and JIS X 0212 (Extended Kanji) in a single encoding.

As the name suggests, the Encode module supports encodings, not
individual charsets.
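
To make the distinction concrete, here is a small sketch that encodes
one character from each of three charsets (ASCII, JIS X 0201 and
JIS X 0208) into a single euc-jp byte stream. The three sample
characters are chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = join '',
    "A",           # U+0041, from ASCII
    "\x{FF71}",    # HALFWIDTH KATAKANA LETTER A, from JIS X 0201
    "\x{3042}";    # HIRAGANA LETTER A, from JIS X 0208

my $bytes = encode('euc-jp', $text);
my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "$hex\n";
```

The ASCII character takes one byte, while the other two take two bytes
each. The encoding, not the charset, decides how each character is laid
out in the byte stream.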

=head1 Encoding Classification (by Anton Tagunov)

The encodings

  US-ASCII    UTF-8       KOI8-R      ISO-8859-*
  ISO-2022-CN ISO-2022-JP Big5
  EUC-CN      EUC-JP      EUC-KR

are registered at IANA (L<http://www.iana.org/assignments/character-sets>)
as preferred MIME names and can probably be used over the Internet. So is

  Shift_JIS

which, despite its wide use, used to bear the label of being
Microsoft proprietary. Shift_JIS is now official as of
JIS X 0208-1997.

  UTF-16 KOI8-U

are IANA-registered preferred MIME names but should probably
be avoided as encodings for web pages due to lack of
browser support.

  ISO-2022      (http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM)
  ISO-2022-JP-1 (http://www.faqs.org/rfcs/rfc2237.html)
  ISO-IR-165    (http://www.faqs.org/rfcs/rfc1345.html)
  GBK
  VISCII
  GB 12345      (only planes 1 and 2 available)
  GB 18030
  CNS 11643

are totally valid encodings but are not registered at IANA.

  BIG5PLUS
  EUC-JP-0212 (Encode::lib::Encode::Tcl::Extended)

are a bit proprietary.

You can probably find some information on CJK encodings at the
following locations:

=over 4

=item F<http://www.debian.org.ru/doc/manuals/intro-i18n/ch-codes.html>

A brief description of most of the CJK encodings mentioned above.

=item F<http://www.oreilly.com/people/authors/lunde/cjk_inf.html>

Several years old, but still useful.

=item F<http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM>

Some in-depth reading for the heroes :-) (equivalent to ISO-2022).

=back

=head1 See Also

L<Encode>,
L<Encode::Byte>,
L<Encode::CN>, L<Encode::JP>, L<Encode::KR>, L<Encode::TW>,
L<Encode::EBCDIC>, L<Encode::Symbol>

=cut