=head1 NAME

Encode::Supported -- Encodings supported by Encode

=head1 DESCRIPTION

=head2 Encoding Names

Encoding names are case-insensitive, and white space in names
is ignored.  In addition, an encoding may have aliases.
Each encoding has one "canonical" name, chosen from the names
of the encoding by picking the first in the following sequence:

       o The MIME name as defined in IETF RFCs.
       o The name in the IANA registry.
       o The name used by the organization that defined it.

Because of these alias issues, and because encodings may in general
carry state, "Encode" uses the encoding object internally once an
operation is in progress.
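
A minimal sketch of alias resolution, assuming a reasonably recent
Encode that exports C<find_encoding> and C<resolve_alias>: each
spelling below should map to the same encoding object, and that
object's C<name> method returns the canonical name.  C<canonical> is a
hypothetical helper written only for this illustration.

```perl

  use strict;
  use warnings;
  use Encode qw(find_encoding resolve_alias);

  # Hypothetical helper: return Encode's canonical name for any alias,
  # or undef if Encode does not recognize the name at all.
  sub canonical {
      my ($label) = @_;
      my $enc = find_encoding($label) or return undef;
      return $enc->name;
  }

  # Different capitalizations and aliases of the same encoding.
  for my $label ('latin1', 'LATIN1', 'ISO-8859-1', 'iso-8859-1') {
      printf "%-12s => %s\n", $label, canonical($label);
  }

  # resolve_alias() returns the canonical name directly (false if unknown).
  print resolve_alias('latin1'), "\n";

```

All four labels should print C<iso-8859-1>.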

=head1 Supported Encodings

As of Perl 5.8.0, at least the following encodings are recognized.
Note that unless otherwise specified, they are all case-insensitive
(via aliases) and all occurrences of spaces are replaced with '-'.  In
other words, "ISO 8859 1" and "iso-8859-1" are identical.

Encodings are categorized and implemented in several different modules,
but in most cases you don't have to C<use Encode::XX> to make them
available: Encode.pm automatically loads those modules when needed.
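
For instance, the sketch below decodes Shift_JIS bytes after loading
only C<Encode>; the Japanese encodings in C<Encode::JP> are loaded on
demand.  It assumes a Perl whose Encode supports
C<< Encode->encodings(':all') >>, which lists every encoding Encode
knows how to load, not only those already loaded.  The bytes used are
the Shift_JIS form of HIRAGANA LETTER A (U+3042).

```perl

  use strict;
  use warnings;
  use Encode qw(decode);    # note: no "use Encode::JP" here

  # Shift_JIS bytes for HIRAGANA LETTER A (U+3042).
  my $sjis_bytes = "\x82\xA0";

  # Encode loads Encode::JP behind the scenes the first time
  # a Japanese encoding is requested.
  my $text = decode('shiftjis', $sjis_bytes);
  printf "U+%04X\n", ord $text;    # U+3042

  # ':all' also includes encodings whose modules are not loaded yet.
  my %available = map { $_ => 1 } Encode->encodings(':all');
  print "euc-jp is available\n" if $available{'euc-jp'};

```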

=head2 Built-in Encodings

The following encodings are always available.

  Canonical     Aliases               Comments & References
  ----------------------------------------------------------------
  iso-8859-1    latin1                [ISO]
  US-ascii      ascii                 [ECMA]
  UCS-2         ucs2, iso-10646-1     [IANA, et al]
  UCS-2l
  UTF-8         utf8                  [RFC2279]
  ----------------------------------------------------------------
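
The sketch below applies the built-in encodings to one character,
LATIN SMALL LETTER E WITH ACUTE (U+00E9).  It relies on Encode's
default behavior of substituting C<?> for a character the target
encoding cannot represent, which is what happens with US-ascii here.

```perl

  use strict;
  use warnings;
  use Encode qw(encode decode);

  my $e_acute = "\x{E9}";    # LATIN SMALL LETTER E WITH ACUTE

  my $latin1 = encode('iso-8859-1', $e_acute);   # one byte:  E9
  my $utf8   = encode('UTF-8',      $e_acute);   # two bytes: C3 A9
  my $ascii  = encode('ascii',      $e_acute);   # not in ASCII: "?"

  for ([latin1 => $latin1], [utf8 => $utf8], [ascii => $ascii]) {
      my ($label, $bytes) = @$_;
      printf "%-6s %s\n", $label,
          join ' ', map { sprintf '%02X', ord } split //, $bytes;
  }

  # Decoding reverses the process.
  print "round trip ok\n" if decode('UTF-8', $utf8) eq $e_acute;

```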

=head2 Encode::Byte

The following encodings are single-byte encodings implemented as
extensions of ASCII.  In most cases they use the upper half,
\x80-\xff, to map non-ASCII characters.

  ----------------------------------------------------------------
  # ISO 8859 series
  (iso-8859-1 is built-in)
  iso-8859-2    latin2        [ISO]
  iso-8859-3    latin3        [ISO]
  iso-8859-4    latin4        [ISO]
  iso-8859-5                  [ISO]
  iso-8859-6                  [ISO]
  iso-8859-7                  [ISO]
  iso-8859-8                  [ISO]
  iso-8859-9    latin5        [ISO]
  iso-8859-10   latin6        [ISO]
  iso-8859-11
  (iso-8859-12 is nonexistent)
  iso-8859-13   latin7        [ISO]
  iso-8859-14   latin8        [ISO]
  iso-8859-15   latin9        [ISO]
  iso-8859-16   latin10       [ISO]

  # Cyrillic
  koi8-f
  koi8-r                      [RFC1489]
  koi8-u                      [RFC2319]

  # Vietnamese
  viscii

  # all cp* are also available as ibm-*, ms-*, and windows-*
  # also see L<http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset4.asp>
  cp1250        WinLatin2
  cp1251        WinCyrillic
  cp1252        WinLatin1
  cp1253        WinGreek
  cp1254        WinTurkish
  cp1255        WinHebrew
  cp1256        WinArabic
  cp1257        WinBaltic
  cp1258        WinVietnamese

  # Macintosh
  # Also see L<http://developer.apple.com/technotes/tn/tn1150.html>
  MacCentralEurRoman
  MacCroatian
  MacRoman
  MacCyrillic
  MacRomanian
  MacSami
  MacGreek
  MacThai
  MacIceland
  MacTurkish
  MacUkrainian

  # More vendor encodings
  nextstep
  gsm0338       # used in GSM handsets
  hp-roman8
  ----------------------------------------------------------------
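
To illustrate the "upper half" point, this sketch decodes a single
high byte under several of the encodings above; the same byte value
stands for different characters depending on the encoding.  It also
looks up C<windows-1252> to show the C<windows-*> form of a C<cp*>
name, assuming that alias resolves to C<cp1252> as the table comment
says.

```perl

  use strict;
  use warnings;
  use Encode qw(decode find_encoding);

  # The same upper-half byte means different characters in different
  # single-byte encodings.
  my @cases = (
      [ 'iso-8859-1',  "\xA4" ],   # CURRENCY SIGN            U+00A4
      [ 'iso-8859-15', "\xA4" ],   # EURO SIGN                U+20AC
      [ 'cp1252',      "\x80" ],   # EURO SIGN                U+20AC
      [ 'koi8-r',      "\xC1" ],   # CYRILLIC SMALL LETTER A  U+0430
  );

  my %decoded;
  for my $case (@cases) {
      my ($name, $byte) = @$case;
      $decoded{$name} = decode($name, $byte);
      printf "%-12s %02X -> U+%04X\n", $name, ord $byte, ord $decoded{$name};
  }

  # cp* names are also reachable under windows-* (and ibm-*, ms-*) aliases.
  print find_encoding('windows-1252')->name, "\n";    # cp1252

```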

=head2 The CJK: Chinese, Japanese, Korean (Multibyte)

Note that Vietnamese is listed above.  Also read "Encoding vs. Charset"
below.  Because of their size, these encodings are implemented in
separate modules, one per language.  Please also refer to their
respective documentation pages.

=over 4

=item Encode::CN -- Continental China

  ----------------------------------------------------------------
  cp936         gbk
  euc-cn        gb2312
  gb12345-raw
  gb2312-raw
  hz
  iso-ir-165
  ----------------------------------------------------------------

=item Encode::JP -- Japan

  ----------------------------------------------------------------
  7bit-jis      jis
  cp932         ms_Kanji
  euc-jp        ujis
  iso-2022-jp                 [RFC1468]
  iso-2022-jp-1               [RFC2237]
  macJapan
  shiftjis      Shift_JIS, sjis
  ----------------------------------------------------------------

=item Encode::KR -- Korea

  ----------------------------------------------------------------
  euc-kr
  cp949         ks_c_5601-1987, x-windows-949, uhc
  iso-2022-kr                 [RFC1557]
  johab
  ksc5601-raw
  ----------------------------------------------------------------

=item Encode::TW -- Taiwan

  ----------------------------------------------------------------
  big5
  big5-hkscs
  cp950
  ----------------------------------------------------------------

=item Encode::HanExtra -- More Chinese via CPAN

Due to size concerns, the additional Chinese encodings below are
distributed separately on CPAN, under the name Encode::HanExtra.

  ----------------------------------------------------------------
  gb18030
  euc-tw
  big5plus
  ----------------------------------------------------------------

=back
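
A sketch of the CJK modules in use, assuming they are installed as
part of Encode: one Japanese kana and one Chinese ideograph encoded
through several of the encodings listed above.  C<hexdump> is a small
helper defined here only to print the bytes.

```perl

  use strict;
  use warnings;
  use Encode qw(encode decode);

  # Helper for the printed output: bytes as space-separated hex.
  sub hexdump { join ' ', map { sprintf '%02X', ord } split //, shift }

  my $hiragana_a = "\x{3042}";   # HIRAGANA LETTER A
  my $zhong      = "\x{4E2D}";   # CJK ideograph "middle"

  my %bytes = (
      'shiftjis'    => encode('shiftjis',    $hiragana_a), # Encode::JP
      'euc-jp'      => encode('euc-jp',      $hiragana_a), # Encode::JP
      'iso-2022-jp' => encode('iso-2022-jp', $hiragana_a), # Encode::JP, escape sequences
      'euc-cn'      => encode('euc-cn',      $zhong),      # Encode::CN
      'big5'        => encode('big5',        $zhong),      # Encode::TW
  );

  printf "%-12s %s\n", $_, hexdump($bytes{$_}) for sort keys %bytes;

```

The C<iso-2022-jp> output is longer because it switches into JIS X 0208
with an escape sequence and switches back to ASCII at the end.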

=head2 Miscellaneous encodings

=over 4

=item Encode::EBCDIC

See L<perlebcdic> for details.

  ----------------------------------------------------------------
  cp1047
  cp37
  posix-bc
  ----------------------------------------------------------------

=item Encode::Symbol

For symbols and dingbats.

  ----------------------------------------------------------------
  symbol
  dingbats
  macDingbats
  ----------------------------------------------------------------

=back
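
A short sketch of both modules, assuming the C<cp37> and C<symbol>
names listed above: EBCDIC places even plain letters and digits at
different byte values than ASCII, and the Symbol encoding reinterprets
ASCII byte values as other characters.

```perl

  use strict;
  use warnings;
  use Encode qw(encode decode);

  # EBCDIC (Encode::EBCDIC): "A" and "1" are not at their ASCII positions.
  my $ebcdic = encode('cp37', 'A1');
  printf "cp37:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $ebcdic;

  # Symbol (Encode::Symbol): in the Symbol encoding, the byte for "a"
  # stands for GREEK SMALL LETTER ALPHA.
  my $alpha = decode('symbol', 'a');
  printf "symbol: U+%04X\n", ord $alpha;

```

Expected output is C<C1 F1> for C<cp37> and C<U+03B1> for C<symbol>.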

=head1 Unsupported encodings

The following are not supported as yet, some because they are rarely
used and some because of technical difficulty.  They may, however, be
supported by external modules via CPAN in the future.

=over 4

=item ISO-2022-JP-2 [RFC1554]

Not very popular yet.  Implementing encode() needs the Unicode Database
or an equivalent, because this encoding includes JIS X 0208/0212,
KSC5601, and GB2312 simultaneously, and their code points overlap in
Unicode.  You need to look up the database to determine which character
set a given Unicode character should belong to.

=item ISO-2022-CN [RFC1922]

Not very popular.  It needs CNS 11643-1 and CNS 11643-2, which are not
available in this module.  CNS 11643 is supported (via euc-tw) in
Encode::HanExtra, and Autrijus may add support for this encoding to his
module in the future.

=item various HP-UX encodings

The following are unsupported due to the lack of mapping data.

  '8'  - arabic8, greek8, hebrew8, kana8, thai8, and turkish8
  '15' - japanese15, korean15, and roi15

=item Cyrillic encoding ISO-IR-111

Anton doubts its usefulness.

=item ISO-8859-8-1 [Hebrew]

None of the Encode team knows Hebrew well enough.  Contributions are welcome.

=item Vietnamese encoding TCVN

Ditto.

=item Vietnamese encoding VPS

Ditto.

=item various Mac encodings

The following are unsupported due to the lack of mapping data.  The
"Mac" prefix of each encoding name is omitted.

 Arabic, Armenian, Bengali, Burmese,
 ChineseSimp, ChineseTrad, Devanagari, Ethiopic, ExtArabic,
 Farsi, Georgian, Gujarati, Gurmukhi, Hebrew,
 Kannada, Khmer, Korean, Laotian, Malayalam, Mongolian,
 Oriya, Sinhalese, Symbol, Tamil, Telugu, Tibetan, Vietnamese

The Mac encodings that are already available are based upon the vendor
mappings available at L<http://www.unicode.org/>.

=back
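
Because this list reflects Encode at the time of writing, and optional
CPAN modules such as C<Encode::HanExtra> can add encodings, a program
that needs one of these encodings may prefer to check at run time.  A
minimal sketch, with C<have_encoding> as a hypothetical helper; whether
C<iso-2022-jp-2> or C<gb18030> is reported as available depends on the
modules installed.

```perl

  use strict;
  use warnings;
  use Encode qw(find_encoding);

  # Hypothetical helper: 1 if Encode (plus any optional modules that
  # happen to be installed) recognizes $name, 0 otherwise.
  sub have_encoding {
      my ($name) = @_;
      return defined find_encoding($name) ? 1 : 0;
  }

  for my $name ('iso-2022-jp', 'iso-2022-jp-2', 'gb18030') {
      printf "%-14s %s\n", $name,
          have_encoding($name) ? 'supported' : 'not available';
  }

```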

=head1 Encoding vs. Charset

Character encoding (or just "encoding") and Character Set (or just
"charset") are often used interchangeably but they are different
concepts.

=over 2

=item Character I<Set> (I<charset> for short)

A collection of characters in which each character is distinguished
by a unique ID (in most cases, the ID is a number).

=item Character I<Encoding>

A way to represent one or more character sets as a stream of bits.

=back

A character encoding may contain a single character set
(e.g. US-ascii) or multiple character sets (e.g. EUC-JP, which
contains US-ascii, JIS X 0201 Kana, JIS X 0208, and JIS X 0212).

A character encoding may also encode a character set as-is (a I<raw>
encoding, e.g. US-ascii) or in processed form.  In EUC-JP, for example,
US-ascii is encoded as-is, JIS X 0201 is prefixed with \x8E, 0x8080 is
added to JIS X 0208 codes, and 0x8080 is added to JIS X 0212 codes,
which are then prefixed with \x8F.
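
The EUC-JP rules described in the preceding paragraph can be checked by
hand.  This sketch, assuming Encode::JP's C<euc-jp> follows them,
computes the EUC-JP bytes of HIRAGANA LETTER A from its JIS X 0208 code
0x2422 (row 4, cell 2) and of HALFWIDTH KATAKANA LETTER A from its
JIS X 0201 code 0xB1, then compares both with Encode's output.

```perl

  use strict;
  use warnings;
  use Encode qw(encode);

  # JIS X 0208 code of HIRAGANA LETTER A (U+3042): row 4, cell 2.
  my $jis0208 = 0x2422;

  # EUC-JP rule for JIS X 0208: add 0x8080 (sets the high bit of both bytes).
  my $euc     = $jis0208 + 0x8080;     # 0xA4A2
  my $by_hand = pack 'n', $euc;        # as two bytes, big-endian

  # EUC-JP rule for JIS X 0201 kana: prefix the single byte with \x8E.
  # HALFWIDTH KATAKANA LETTER A (U+FF71) is 0xB1 in JIS X 0201.
  my $kana_by_hand = "\x8E" . "\xB1";

  printf "by hand: %04X\n", $euc;
  print "matches Encode\n"
      if $by_hand eq encode('euc-jp', "\x{3042}")
      && $kana_by_hand eq encode('euc-jp', "\x{FF71}");

```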

As the name suggests, the Encode module supports encodings, not
individual charsets.

However, the word I<charset> is casually used to mean I<encoding>,
even by the Internet Assigned Numbers Authority (IANA).  Encode tries
to smooth over this misconception with aliases.  For instance,
C<gb2312> is aliased to C<euc-cn>, while the "raw" version is
available as C<gb2312-raw>.
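
The difference between the alias and the raw encoding shows up
directly in the bytes.  In this sketch, assuming Encode::CN is
available, the ideograph U+4E2D (GB2312 row 54, cell 48) comes out of
C<gb2312> with the high bit set on both bytes and out of C<gb2312-raw>
without it.

```perl

  use strict;
  use warnings;
  use Encode qw(encode find_encoding);

  my $zhong = "\x{4E2D}";    # CJK ideograph "middle"

  # "gb2312" is an alias, so it resolves to the EUC-coded form.
  my $alias_name = find_encoding('gb2312')->name;   # euc-cn
  my $euc = encode('gb2312',     $zhong);           # D6 D0: high bit set
  my $raw = encode('gb2312-raw', $zhong);           # 56 50: the 94x94 code itself

  printf "%s: %vX\n", $alias_name, $euc;
  printf "gb2312-raw: %vX\n", $raw;

```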

=head1 Encoding Classification (by Anton Tagunov and Dan Kogai)

This section tries to classify the supported encodings by their 
applicability for information exchange over the Internet and to 
choose the most suitable aliases to name them in the context of 
such communication.

=over 2

=item * 

To encode or decode the encodings marked with C<(*)>, you need
C<Encode::HanExtra>, available from CPAN.

=back

Encoding names

  US-ASCII    UTF-8     ISO-8859-*  KOI8-R
  Shift_JIS   EUC-JP    ISO-2022-JP ISO-2022-JP-1
  EUC-KR      Big5

are registered with IANA as preferred MIME names and can probably be
used over the Internet.
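
Newer releases of Encode than the one this page was written against can
report an encoding's preferred MIME name through the C<mime_name>
method, which helps when writing a C<Content-Type> header.  A minimal
sketch, assuming such a release (and Perl 5.10 or later for the C<//>
operator):

```perl

  use strict;
  use warnings;
  use Encode qw(find_encoding);

  # Map Encode's canonical names back to IANA preferred MIME names.
  # mime_name() returns undef for encodings without a registered MIME name.
  my %mime;
  for my $name ('shiftjis', 'euc-jp', 'iso-2022-jp', 'iso-8859-1', 'euc-kr') {
      my $enc = find_encoding($name) or next;
      $mime{$name} = $enc->mime_name;
      printf "%-12s => %s\n", $name, $mime{$name} // '(none)';
  }

```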

C<Shift_JIS> is no longer Microsoft-proprietary, since it has been
standardized in JIS X 0208-1997.

  EUC-CN

has not been registered with IANA (as of March 2002) but
seems to be supported by major web browsers.  In Encode, GB2312
is aliased to EUC-CN, with the "uncooked" version of GB2312 canonicalized
as gb2312-raw.  See L<Encode::CN> for details.

  KS_C_5601-1987

has been registered with IANA, but text labeled with it is in fact
EUC-coded.  The Internet community in Korea is not happy with this,
so C<KS_C_5601-1987> is aliased to C<cp949>, an enhanced version
of C<euc-kr>, with C<ksc5601-raw> for the "uncooked" version.
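
A sketch of the alias in practice, assuming Encode::KR is available:
the IANA label resolves to C<cp949>, and for a character inside
KS C 5601 (KS X 1001) the C<cp949> and C<euc-kr> bytes are identical.

```perl

  use strict;
  use warnings;
  use Encode qw(encode find_encoding);

  # The IANA label as it appears in mail and web headers...
  my $label = 'KS_C_5601-1987';

  # ...is treated as cp949, the superset of euc-kr.
  my $enc = find_encoding($label);
  print $enc->name, "\n";    # cp949

  # For characters in KS C 5601, cp949 and euc-kr agree.
  my $han = "\x{D55C}";      # HANGUL SYLLABLE HAN
  print "same bytes\n" if encode('cp949', $han) eq encode('euc-kr', $han);

```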

  UTF-16 
  KOI8-U        (http://www.faqs.org/rfcs/rfc2319.html)

are IANA-registered (C<UTF-16> even as a preferred MIME name)
but should probably be avoided as encodings for web pages due to
the lack of browser support.

  ISO-IR-165    (http://www.faqs.org/rfcs/rfc1345.html)
  GBK
  VISCII
  GB 12345
  GB 18030 (*)  (see links below)
  EUC-TW   (*)

are valid encodings but are not registered with IANA.
The names under which they are listed here are probably the
most widely known names for these encodings and are the recommended
names.

  BIG5PLUS (*)

is a somewhat proprietary name.

=head1 Bookmarks

=over 2

=item Assigned Charset Names by IANA

L<http://www.iana.org/assignments/character-sets>

Most of the C<canonical names> in Encode derive from this list,
so you can directly apply the string you have extracted from the MIME
headers of mail and web pages.

=item CJK.inf

L<http://www.oreilly.com/people/authors/lunde/cjk_inf.html>

Somewhat obsolete (last update in 1996), but still useful.  Also try

L<ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf>

You will find brief information on C<EUC-CN> and C<GBK>, and mostly on C<GB 18030>.

=item ECMA-035 (eq C<ISO-2022>)

L<http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM>

The specification of ISO-2022 itself is available from the link above.

=back
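
Because the canonical names and aliases follow the IANA list above, a
charset label taken from a MIME header can usually be handed straight
to Encode.  A minimal sketch, where C<decode_body> is a hypothetical
helper and the C<charset> parsing is deliberately simplified (no
RFC 2231 parameters or header comments):

```perl

  use strict;
  use warnings;
  use Encode qw(decode find_encoding);

  # Hypothetical helper: take the charset parameter from a Content-Type
  # value (charset=xxx or charset="xxx") and decode the body with it.
  # Falls back to US-ascii when no charset is given.
  sub decode_body {
      my ($content_type, $body) = @_;
      my ($charset) = $content_type =~ /\bcharset\s*=\s*"?([^";\s]+)"?/i;
      my $name = defined $charset ? $charset : 'US-ascii';
      find_encoding($name) or die "unknown charset: $name\n";
      return decode($name, $body);
  }

  my $text = decode_body('text/plain; charset="Shift_JIS"', "\x82\xA0");
  printf "U+%04X\n", ord $text;    # U+3042

```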

=head1 See Also

L<Encode>, 
L<Encode::Byte>, 
L<Encode::CN>, L<Encode::JP>, L<Encode::KR>, L<Encode::TW>,
L<Encode::EBCDIC>, L<Encode::Symbol>

=cut

I could not find this page because the hostname doesn't resolve!

 Brief description for most of the mentioned CJK encodings
L<http://www.debian.org.ru/doc/manuals/intro-i18n/ch-codes.html>