=head1 NAME

Encode::Supported -- Supported encodings by Encode

=head1 DESCRIPTION

=head2 Encoding Names

Encoding names are case insensitive. White space in names
is ignored. In addition, an encoding may have aliases.
Each encoding has one "canonical" name. The "canonical"
name is chosen from the names of the encoding by picking
the first in the following sequence:

  o The MIME name as defined in IETF RFCs.
  o The name in the IANA registry.
  o The name used by the organization that defined it.

Because of all the alias issues, and because in the general
case encodings have state, "Encode" uses the encoding
object internally once an operation is in progress.
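
As a minimal sketch of the alias rules above, the following resolves a
few spellings of one encoding to its canonical name with
C<find_encoding> from Encode. The spellings tried and the C<%canonical>
hash are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Resolve several spellings of one encoding. Each lookup returns an
# encoding object whose name() is the canonical name.
my @spellings = ('iso-8859-1', 'ISO-8859-1', 'latin1', 'Latin1');
my %canonical;
for my $name (@spellings) {
    my $enc = find_encoding($name)
        or die "Unknown encoding: $name";
    $canonical{$name} = $enc->name;
    print "$name => ", $enc->name, "\n";
}
```

C<find_encoding> returns the encoding object itself, which is what
Encode works with once the name has been resolved.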

=head2 Supported Encodings

As of Perl 5.8.0, at least the following encodings are recognized.
Note that unless otherwise specified, they are all case insensitive
(via alias) and every occurrence of a space is replaced with '-'. In
other words, "ISO 8859 1" and "iso-8859-1" are identical.
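
To check which encodings a particular installation recognizes, the
list can be queried at run time. This is a small sketch using
C<< Encode->encodings >>. The variable names are illustrative, and the
output depends on the installed version of Encode.

```perl
use strict;
use warnings;
use Encode;

# Without arguments only the already loaded encodings are returned.
# ":all" adds those provided by modules that load on demand, such as
# Encode::JP.
my @loaded = Encode->encodings();
my @all    = Encode->encodings(":all");

print scalar(@loaded), " loaded, ", scalar(@all), " available\n";
print "$_\n" for @all;
```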

=head3 ASCII

  Canonical         Aliases
  -----------------------------
  ascii             us-ascii

=head3 The Unicode

  utf8              UTF-8
  utf16             UTF-16
  ucs2              UCS-2, iso-10646-1

=head3 The ISO 8859, KOI, and other 1-byte encodings

The following encodings are based upon ASCII. In most cases they use
\x80-\xff (the upper half) to map non-ASCII characters.

  iso-8859-1        latin1
  iso-8859-2        latin2
  iso-8859-3        latin3
  iso-8859-4        latin4
  iso-8859-5        latin
  iso-8859-6        latin
  iso-8859-7
  iso-8859-8
  iso-8859-9        latin5
  iso-8859-10       latin6
  iso-8859-11
  (iso-8859-12 is nonexistent)
  iso-8859-13       latin7
  iso-8859-14       latin8
  iso-8859-15       latin9
  iso-8859-16       latin10

  koi8-f
  koi8-r
  koi8-u

  viscii            # ASCII + Vietnamese

  cp1250            WinLatin2
  cp1251            WinCyrillic
  cp1252            WinLatin1
  cp1253            WinGreek
  cp1254            WinTurkish
  cp1255            WinHebrew
  cp1256            WinArabic
  cp1257            WinBaltic
  cp1258            WinVietnamese
  # all cp* are also available as ibm-* and ms-*

  maccentraleuropean
  maccroatian
  macroman
  maccyrillic
  macromanian
  macdingbats
  macsami
  macgreek
  macthai
  macicelandic
  macturkish
  macukraine

=head3 The CJK: Chinese, Japanese, Korean (Multibyte)

Note that Vietnamese is listed above. Also read "Encoding vs. Charset"
below. Also note that these encodings are implemented in separate
modules, one per language, due to size concerns. See the documentation
of those modules as well.

  cp936             gbk                 # Encode::CN
  euc-cn                                # Encode::CN
  gb12345                               # Encode::CN
  gb2312                                # Encode::CN
  hz                                    # Encode::CN
  iso-ir-165                            # Encode::CN

  7bit-jis          jis                 # Encode::JP
  cp932                                 # Encode::JP
  euc-jp            ujis                # Encode::JP
  iso-2022-jp                           # Encode::JP
  macjapan                              # Encode::JP
  shiftjis          Shift_JIS, sjis     # Encode::JP

  euc-kr                                # Encode::KR
  ksc5601                               # Encode::KR
  cp949                                 # Encode::KR

  big5                                  # Encode::TW
  big5-hkscs                            # Encode::TW
  cp950                                 # Encode::TW

Due to size concerns, additional Chinese encodings including
"GB 18030", "EUC-TW" and "BIG5PLUS" are distributed separately on CPAN,
under the name Encode::HanExtra.

=head3 EBCDIC

See L<perlebcdic> for details.

  cp1047
  cp37
  posix-bc

=head3 Symbols and dingbats

  symbol
  dingbats

=head1 Encoding vs. Charset

Character encoding (or just "encoding") and character set (or just
"charset") are often used interchangeably, but they are different
concepts.

A charset determines which characters may be included in a given text.

An encoding maps the characters of one or more charsets to a stream of
bits.

Note that a given encoding may contain multiple charsets. For instance,
euc-jp contains ASCII, JIS X 0201 (Hankaku Kana), JIS X 0208 (Zenkaku
Kana and Kanji) and JIS X 0212 (Extended Kanji) in a single encoding.

As the name suggests, the Encode module supports encodings, not
individual charsets.
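
The euc-jp case can be illustrated with a short sketch that encodes one
character from each of three of the charsets named above and shows the
resulting bytes. The sample characters and the C<%sample> and C<%bytes>
hashes are chosen here for illustration only, and JIS X 0212 is left
out for brevity.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character from each of three charsets carried by euc-jp.
my %sample = (
    'ASCII'      => "A",           # LATIN CAPITAL LETTER A
    'JIS X 0201' => "\x{FF71}",    # HALFWIDTH KATAKANA LETTER A
    'JIS X 0208' => "\x{3042}",    # HIRAGANA LETTER A
);

# Encode each sample separately and print its bytes in hex.
my %bytes;
for my $charset (sort keys %sample) {
    my $octets = encode('euc-jp', $sample{$charset});
    $bytes{$charset} = $octets;
    printf "%-10s => %s\n", $charset,
        join ' ', map { sprintf '%02X', ord } split //, $octets;
}

# The concatenated bytes decode back through the same single encoding.
my $text = decode('euc-jp', join '', map { $bytes{$_} } sort keys %sample);
```

The ASCII character takes one byte, while the JIS X 0201 and JIS X 0208
characters take two bytes each, all within one byte stream.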

=head1 Encoding Classification (by Anton Tagunov)

The encodings

  US-ASCII    UTF-8       KOI8-R      ISO-8859-*
  ISO-2022-CN ISO-2022-JP Big5
  EUC-CN      EUC-JP      EUC-KR

are registered at <http://www.iana.org/assignments/character-sets> as
preferred MIME names and can probably be used over the Internet. So is

  Shift_JIS

which, despite being widespread, used to bear the label of being
Microsoft proprietary. Shift JIS is now official as of
JIS X 0208-1997.

  UTF-16 KOI8-U

are IANA-registered preferred MIME names but should probably
be avoided as encodings for web pages due to lack of
browser support.

  ISO-2022      (http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM)
  ISO-2022-JP-1 (http://www.faqs.org/rfcs/rfc2237.html)
  ISO-IR-165    (http://www.faqs.org/rfcs/rfc1345.html)
  GBK
  VISCII
  GB 12345      (only planes 1 and 2 available)
  GB 18030
  CNS 11643

are perfectly valid encodings but are not registered at IANA.

  BIG5PLUS
  EUC-JP-0212 (Encode::lib::Encode::Tcl::Extended)

are a bit proprietary.

You can probably find some information on CJK encodings at the
following locations.

A brief description of most of the CJK encodings mentioned:

F<http://www.debian.org.ru/doc/manuals/intro-i18n/ch-codes.html>

Several years old, but still useful:

F<http://www.oreilly.com/people/authors/lunde/cjk_inf.html>

And some in-depth reading for the heroes :-)

F<http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM> (equivalent to ISO-2022)

=head1 See Also

L<Encode>, L<Encode::CN>, L<Encode::JP>, L<Encode::KR>, L<Encode::TW>

=cut