3 package I18N::LangTags::List;
4 # Time-stamp: "2004-10-06 23:26:21 ADT"
6 use vars qw(%Name %Is_Disrec $Debug $VERSION);
10 #----------------------------------------------------------------------
12 # read the table out of our own POD!
15 my($disrec,$tag,$name);
17 while(<I18N::LangTags::List::DATA>) {
19 $seeking = 0 if m/=for woohah/;
20 } elsif( ($disrec, $tag, $name) =
21 m/(\[?)\{([-0-9a-zA-Z]+)\}(?:\s*:)?\s*([^\[\]]+)/
23 $name =~ s/\s*[;\.]*\s*$//g;
26 print "<$tag> <$name>\n" if $Debug;
27 $last_name = $Name{$tag} = $name;
28 $Is_Disrec{$tag} = 1 if $disrec;
29 } elsif (m/[Ff]ormerly \"([-a-z0-9]+)\"/) {
30 $Name{$1} = "$last_name (old tag)" if $last_name;
34 die "No tags read??" unless $count;
36 #----------------------------------------------------------------------
39 my $tag = lc($_[0] || return);
44 if($tag =~ m/^x-(.+)/) {
46 } elsif($tag =~ m/^i-(.+)/) {
54 print "Input: {$tag}\n" if $Debug;
56 last if $name = $Name{$tag};
57 last if $name = $Name{$alt};
58 if($tag =~ s/(-[a-z0-9]+)$//s) {
59 print "Shaving off: $1 leaving $tag\n" if $Debug;
60 $subform = "$1$subform";
61 # and loop around again
63 $alt =~ s/(-[a-z0-9]+)$//s && $Debug && print " alt -> $alt\n";
65 # we're trying to pull a subform off a primary tag. TILT!
66 print "Aborting on: {$name}{$subform}\n" if $Debug;
70 print "Output: {$name}{$subform}\n" if $Debug;
72 return unless $name; # Failure
73 return $name unless $subform; # Exact match
76 return "$name (Subform \"$subform\")";
79 #--------------------------------------------------------------------------
82 my $tag = lc($_[0] || return 0);
83 #require I18N::LangTags;
90 (?: # Subtags thereafter
92 [a-z0-9]{1,8} # subtag
97 foreach my $bit (split('-', $tag)) {
99 scalar(@supers) ? ($supers[-1] . '-' . $bit) : $bit;
101 return 0 unless @supers;
102 shift @supers if $supers[0] =~ m<^(i|x|sgn)$>s;
103 return 0 unless @supers;
105 foreach my $f ($tag, @supers) {
106 return 0 if $Is_Disrec{$f};
107 return 2 if $Name{$f};
108 # so that decent subforms of indecent tags are decent
110 return 2 if $Name{$tag}; # not only is it decent, it's known!
114 #--------------------------------------------------------------------------
121 I18N::LangTags::List -- tags and names for human languages
125 use I18N::LangTags::List;
126 print "Parlez-vous... ", join(', ',
127 I18N::LangTags::List::name('elx') || 'unknown_language',
128 I18N::LangTags::List::name('ar-Kw') || 'unknown_language',
129 I18N::LangTags::List::name('en') || 'unknown_language',
130 I18N::LangTags::List::name('en-CA') || 'unknown_language',
135 Parlez-vous... Elamite, Kuwait Arabic, English, Canadian English?
This module provides a function,
C<I18N::LangTags::List::name( I<langtag> )>, that takes
a language tag (see L<I18N::LangTags|I18N::LangTags>)
and returns its best attempt at an English name for it, or
undef if it can't make sense of the tag.

The function C<I18N::LangTags::List::name(...)> is not exported.
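A minimal sketch of calling C<name>, based on what the source of this module shows: the argument is lowercased, and when there is no exact match, trailing subtags are shaved off and reported as a subform. The tags C<fr-zz> and C<zzz> are made-up inputs, assumed not to be in the list.

```perl
use strict;
use warnings;
use I18N::LangTags::List;

# Exact match: the tag is listed as-is (case does not matter).
my $exact = I18N::LangTags::List::name('EN');

# No exact match is expected for the made-up tag "fr-zz", so name()
# should fall back to "fr" and mention the leftover subtag as a subform.
my $subform = I18N::LangTags::List::name('fr-zz');

# A tag whose primary subtag is not listed (assumed here) yields undef.
my $unknown = I18N::LangTags::List::name('zzz');

print defined($_) ? "$_\n" : "(undef)\n" for $exact, $subform, $unknown;
```

Because the function is not exported, it is called by its fully qualified name, as in the synopsis.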
This module also provides a function,
C<I18N::LangTags::List::is_decent( I<langtag> )>, that returns true if
and only if the language tag is syntactically valid and is for general
use (like "fr" or "fr-ca", below). That is, it returns false for tags
that are syntactically invalid and for tags, like "aus", that are listed
in brackets below. This function is not exported.
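As a short illustration, the following sketch runs C<is_decent> over the tags mentioned above plus one string that is not a syntactically valid tag. It only relies on the result being true or false, as described here.

```perl
use strict;
use warnings;
use I18N::LangTags::List;

# "fr" and "fr-ca" are listed without brackets; "aus" is bracketed;
# "not a tag!" is not a syntactically valid language tag.
for my $tag ('fr', 'fr-ca', 'aus', 'not a tag!') {
    my $ok = I18N::LangTags::List::is_decent($tag);
    printf "%-12s %s\n", $tag, $ok ? 'decent' : 'not decent';
}
```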
The map of tags to names that it uses is accessible as
C<%I18N::LangTags::List::Name>, and it's the same as the list
that follows in this documentation, which should be useful
to you even if you don't use this module.
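One way to read that map directly is sketched below. The copy into C<%name> and the choice of the "ar" prefix are only for illustration; the sketch assumes the keys are stored as they appear in the list (for example "ar-kw").

```perl
use strict;
use warnings;
use I18N::LangTags::List;

# %Name is a package variable, not a function, so it is addressed by
# its fully qualified name.  Copy it so the original is left untouched.
my %name = %I18N::LangTags::List::Name;

printf "%d tags listed\n", scalar keys %name;

# Collect the listed tags that are "ar" or start with "ar-".
my @arabic = grep { /^ar(?:-|$)/ } keys %name;

for my $tag (sort @arabic) {
    print "$tag\t$name{$tag}\n";
}
```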
=head1 ABOUT LANGUAGE TAGS

Internet language tags, as defined in RFC 3066, are a formalism
for denoting human languages. The two-letter ISO 639-1 language
codes are well known (such as "en" for English), as are their forms
when qualified by a country code ("en-US"). Less well known are the
arbitrary-length non-ISO codes (like "i-mingo"), and the
recently (in 2001) introduced three-letter ISO 639-2 codes.
Remember these important facts:

Language tags are not locale IDs. A locale ID is written with a "_"
instead of a "-", (almost?) always matches C<m/^\w\w_\w\w\b/>, and
I<means> something different from a language tag. A language tag
denotes a language. A locale ID denotes a language I<as used in>
a particular place, in combination with non-linguistic
location-specific information such as what currency is used
there. Locales I<also> often denote character set information,
as in "en_US.ISO8859-1".
Language tags are not for computer languages.

"Dialect" is not a useful term, since there is no objective
criterion for establishing when two language-forms are
dialects of each other, or are separate languages.

Language tags are not case-sensitive. en-US, en-us, En-Us, etc.,
are all the same tag, and denote the same language.
Not every language tag really refers to a single language. Some
language tags refer to conditions: i-default (system-message text
in English plus maybe other languages), und (undetermined
language). Others (notably lots of the three-letter codes) are
bibliographic tags that classify whole groups of languages, as
with cus for "Cushitic (Other)" (i.e., a
language that has been classed as Cushitic, but which has no more
specific code) or the even less linguistically coherent
sai for "South American Indian (Other)". Though useful in
bibliography, B<SUCH TAGS ARE NOT
FOR GENERAL USE>. For further guidance, email me.
Language tags are not country codes. In fact, they are often
distinct codes, as with language tag ja for Japanese, and
ISO 3166 country code C<.jp> for Japan.
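Two of the facts above, that locale IDs are not language tags and that tags are not case-sensitive, can be sketched in a few lines. The helper names C<looks_like_locale_id>, C<locale_to_tag>, and C<same_tag> are hypothetical and are not part of this module; the locale pattern is the approximate one quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: does the string look like a locale ID?
# Uses the pattern quoted above, which the text itself calls approximate.
sub looks_like_locale_id {
    my ($id) = @_;
    return $id =~ m/^\w\w_\w\w\b/ ? 1 : 0;
}

# Hypothetical helper: derive a language tag from a locale ID by dropping
# any charset or modifier suffix and swapping "_" for "-".  This discards
# the locale's non-linguistic meaning; it is a convenience, not an
# equivalence.
sub locale_to_tag {
    my ($id) = @_;
    (my $tag = $id) =~ s/[.\@].*$//s;   # strip ".ISO8859-1", "@euro", ...
    $tag =~ tr/_/-/;
    return lc $tag;
}

# Language tags are not case-sensitive, so compare them case-folded.
sub same_tag {
    my ($x, $y) = @_;
    return lc($x) eq lc($y) ? 1 : 0;
}

print locale_to_tag('en_US.ISO8859-1'), "\n";                 # en-us
print same_tag('en-US', 'En-Us') ? "same\n" : "different\n";  # same
```

For real use, L<I18N::LangTags|I18N::LangTags> offers functions such as C<locale2language_tag> and C<same_language_tag> that cover the same ground more carefully.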
=head1 LIST OF LANGUAGES

The first part of each item is the language tag, between
{...}. It is followed by an English name for the language or
language-group. Language tags that I judge to be not for general
use are bracketed.

This list is in alphabetical order by English name of the language.
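Since the list is useful even without the module, here is a minimal sketch of reading one line of it. The pattern and the trailing-punctuation cleanup are taken from the module source; the wrapper C<parse_item_line> and its return convention are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one list line into (tag, name, bracketed).
# The pattern is the one the module source applies to each line.
sub parse_item_line {
    my ($line) = @_;
    my ($bracket, $tag, $name) =
        $line =~ m/(\[?)\{([-0-9a-zA-Z]+)\}(?:\s*:)?\s*([^\[\]]+)/
        or return;
    $name =~ s/\s*[;.]*\s*$//;    # trim trailing ";", "." and whitespace
    return ($tag, $name, $bracket ? 1 : 0);
}

for my $line (
    '=item {ab} : Abkhazian',
    '=item [{afa} : Afro-Asiatic (Other)]',
    '{ar-kw} Kuwait Arabic;',
) {
    my ($tag, $name, $bracketed) = parse_item_line($line) or next;
    print "$tag => $name", ($bracketed ? ' (bracketed)' : ''), "\n";
}
```

The third sample line shows the "notable forms" style used inside an item, where the tag is followed directly by the name and a semicolon.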
230 The name in the =item line MUST NOT have E<...>'s in it!!
236 =item {ab} : Abkhazian
240 =item {ace} : Achinese
244 =item {ada} : Adangme
252 =item {afh} : Afrihili
256 =item {af} : Afrikaans
258 =item [{afa} : Afro-Asiatic (Other)]
264 =item {akk} : Akkadian
268 =item {sq} : Albanian
272 =item [{alg} : Algonquian languages]
276 =item [{tut} : Altaic (Other)]
284 eq Amis. eq 'Amis. eq Pangca.
286 =item [{apa} : Apache languages]
290 Many forms are mutually un-intelligible in spoken media.
293 {ar-bh} Bahrain Arabic;
294 {ar-dz} Algerian Arabic;
295 {ar-eg} Egyptian Arabic;
296 {ar-iq} Iraqi Arabic;
297 {ar-jo} Jordanian Arabic;
298 {ar-kw} Kuwait Arabic;
299 {ar-lb} Lebanese Arabic;
300 {ar-ly} Libyan Arabic;
301 {ar-ma} Moroccan Arabic;
302 {ar-om} Omani Arabic;
303 {ar-qa} Qatari Arabic;
{ar-sa} Saudi Arabic;
305 {ar-sy} Syrian Arabic;
306 {ar-tn} Tunisian Arabic;
307 {ar-ye} Yemen Arabic.
309 =item {arc} : Aramaic
311 NOT Amharic! NOT Samaritan Aramaic!
313 =item {arp} : Arapaho
315 =item {arn} : Araucanian
319 =item {hy} : Armenian
321 =item {an} : Aragonese
323 =item [{art} : Artificial (Other)]
325 =item {ast} : Asturian
329 =item {as} : Assamese
331 =item [{ath} : Athapascan languages]
333 eq Athabaskan. eq Athapaskan. eq Athabascan.
335 =item [{aus} : Australian languages]
337 =item [{map} : Austronesian (Other)]
351 =item {az} : Azerbaijani
356 {az-Arab} Azerbaijani in Arabic script;
357 {az-Cyrl} Azerbaijani in Cyrillic script;
358 {az-Latn} Azerbaijani in Latin script.
360 =item {ban} : Balinese
362 =item [{bat} : Baltic (Other)]
364 =item {bal} : Baluchi
370 =item [{bai} : Bamileke languages]
374 =item [{bnt} : Bantu (Other)]
382 =item {btk} : Batak (Indonesia)
386 =item {be} : Belarusian
388 eq Belarussian. eq Byelarussian.
389 eq Belorussian. eq Byelorussian.
390 eq White Russian. eq White Ruthenian.
399 =item [{ber} : Berber (Other)]
401 =item {bho} : Bhojpuri
419 =item {bug} : Buginese
421 =item {bg} : Bulgarian
423 =item {i-bnn} : Bunun
435 eq CatalE<aacute>n. eq Catalonian.
437 =item [{cau} : Caucasian (Other)]
439 =item {ceb} : Cebuano
441 =item [{cel} : Celtic (Other)]
444 {cel-gaulish} Gaulish (Historical)
446 =item [{cai} : Central American Indian (Other)]
448 =item {chg} : Chagatai
452 =item [{cmc} : Chamic languages]
454 =item {ch} : Chamorro
458 =item {chr} : Cherokee
462 =item {chy} : Cheyenne
464 =item {chb} : Chibcha
466 (Historical) NOT Chibchan (which is a language family).
468 =item {ny} : Chichewa
470 eq Nyanja. eq Chinyanja.
474 Many forms are mutually un-intelligible in spoken media.
476 {zh-Hans} Chinese, in simplified script;
477 {zh-Hant} Chinese, in traditional script;
478 {zh-tw} Taiwan Chinese;
480 {zh-sg} Singapore Chinese;
481 {zh-mo} Macau Chinese;
482 {zh-hk} Hong Kong Chinese;
483 {zh-guoyu} Mandarin [Putonghua/Guoyu];
484 {zh-hakka} Hakka [formerly "i-hakka"];
486 {zh-min-nan} Southern Hokkien;
{zh-wuu} Shanghainese;
493 {i-hakka} Hakka (old tag)
495 =item {chn} : Chinook Jargon
499 =item {chp} : Chipewyan
501 =item {cho} : Choctaw
503 =item {cu} : Church Slavic
505 eq Old Church Slavonic.
507 =item {chk} : Chuukese
509 eq Trukese. eq Chuuk. eq Truk. eq Ruk.
517 =item {co} : Corsican
523 NOT Creek! (Formerly "cre".)
529 =item [{cpe} : English-based Creoles and pidgins (Other)]
531 =item [{cpf} : French-based Creoles and pidgins (Other)]
533 =item [{cpp} : Portuguese-based Creoles and pidgins (Other)]
535 =item [{crp} : Creoles and pidgins (Other)]
537 =item {hr} : Croatian
541 =item [{cus} : Cushitic (Other)]
eq Nakota. eq Lakota.
555 =item {i-default} : Default (Fallthru) Language
557 Defined in RFC 2277, this is for tagging text
558 (which must include English text, and might/should include text
559 in other appropriate languages) that is emitted in a context
560 where language-negotiation wasn't possible -- in SMTP mail failure
561 messages, for example.
563 =item {del} : Delaware
569 eq Maldivian. (Formerly "div".)
579 =item [{dra} : Dravidian (Other)]
585 eq Netherlander. Notable forms:
586 {nl-nl} Netherlands Dutch;
587 {nl-be} Belgian Dutch.
589 =item {dum} : Middle Dutch (ca.1050-1350)
595 =item {dz} : Dzongkha
599 =item {egy} : Ancient Egyptian
605 =item {elx} : Elamite
612 {en-au} Australian English;
613 {en-bz} Belize English;
614 {en-ca} Canadian English;
616 {en-ie} Irish English;
617 {en-jm} Jamaican English;
618 {en-nz} New Zealand English;
619 {en-ph} Philippine English;
620 {en-tt} Trinidad English;
622 {en-za} South African English;
623 {en-zw} Zimbabwe English.
=item {enm} : Middle English (1100-1500)
629 =item {ang} : Old English (ca.450-1100)
631 eq Anglo-Saxon. (Historical)
633 =item {i-enochian} : Enochian (Artificial)
637 =item {eo} : Esperanto
641 =item {et} : Estonian
659 =item [{fiu} : Finno-Ugrian (Other)]
661 eq Finno-Ugric. NOT Ugaritic!
668 {fr-fr} France French;
669 {fr-be} Belgian French;
670 {fr-ca} Canadian French;
671 {fr-ch} Swiss French;
672 {fr-lu} Luxembourg French;
673 {fr-mc} Monaco French.
675 =item {frm} : Middle French (ca.1400-1600)
679 =item {fro} : Old French (842-ca.1400)
685 =item {fur} : Friulian
693 =item {gd} : Scots Gaelic
697 =item {gl} : Gallegan
713 =item {ka} : Georgian
718 {de-at} Austrian German;
719 {de-be} Belgian German;
720 {de-ch} Swiss German;
721 {de-de} Germany German;
722 {de-li} Liechtenstein German;
723 {de-lu} Luxembourg German.
725 =item {gmh} : Middle High German (ca.1050-1500)
729 =item {goh} : Old High German (ca.750-1050)
733 =item [{gem} : Germanic (Other)]
735 =item {gil} : Gilbertese
739 =item {gor} : Gorontalo
747 =item {grc} : Ancient Greek
749 (Historical) (Until 15th century or so.)
751 =item {el} : Modern Greek
753 (Since 15th century or so.)
759 =item {gu} : Gujarati
761 =item {gwi} : Gwich'in
773 =item {haw} : Hawaiian
782 {iw} Hebrew (old tag)
786 =item {hil} : Hiligaynon
788 =item {him} : Himachali
792 =item {ho} : Hiri Motu
794 =item {hit} : Hittite
800 =item {hu} : Hungarian
806 =item {is} : Icelandic
820 =item [{inc} : Indic (Other)]
822 =item [{ine} : Indo-European (Other)]
824 =item {id} : Indonesian
829 {in} Indonesian (old tag)
833 =item {ia} : Interlingua (International Auxiliary Language Association)
835 (Artificial) NOT Interlingue!
837 =item {ie} : Interlingue
839 (Artificial) NOT Interlingua!
841 =item {iu} : Inuktitut
843 A subform of "Eskimo".
847 A subform of "Eskimo".
849 =item [{ira} : Iranian (Other)]
853 =item {mga} : Middle Irish (900-1200)
857 =item {sga} : Old Irish (to 900)
861 =item [{iro} : Iroquoian languages]
866 {it-it} Italy Italian;
867 {it-ch} Swiss Italian.
869 =item {ja} : Japanese
873 =item {jv} : Javanese
875 (Formerly "jw" because of a typo.)
877 =item {jrb} : Judeo-Arabic
879 =item {jpr} : Judeo-Persian
881 =item {kbd} : Kabardian
887 =item {kl} : Kalaallisut
889 eq Greenlandic "Eskimo"
897 eq Kanarese. NOT Canadian!
903 =item {krc} : Karachay-Balkar
905 =item {kaa} : Kara-Kalpak
909 =item {ks} : Kashmiri
911 =item {csb} : Kashubian
923 eq Cambodian. eq Kampuchean.
925 =item [{khi} : Khoisan (Other)]
927 =item {kho} : Khotanese
933 =item {kmb} : Kimbundu
935 =item {rw} : Kinyarwanda
939 =item {i-klingon} : Klingon
947 =item {kok} : Konkani
951 =item {kos} : Kosraean
957 =item {kj} : Kuanyama
965 =item {kut} : Kutenai
969 eq Judeo-Spanish. NOT Ladin (a minority language in Italy).
985 (Historical) NOT Ladin! NOT Ladino!
991 =item {lb} : Letzeburgesch
993 eq Luxemburgian, eq Luxemburger. (Formerly "i-lux".)
996 {i-lux} Letzeburgesch (old tag)
998 =item {lez} : Lezghian
1000 =item {li} : Limburgish
1002 eq Limburger, eq Limburgan. NOT Letzeburgesch!
1004 =item {ln} : Lingala
1006 =item {lt} : Lithuanian
1008 =item {nds} : Low German
eq Low Saxon.
1012 =item {art-lojban} : Lojban (Artificial)
1016 =item {lu} : Luba-Katanga
1020 =item {lua} : Luba-Lulua
1022 =item {lui} : Luiseno
1028 =item {luo} : Luo (Kenya and Tanzania)
1030 =item {lus} : Lushai
1032 =item {mk} : Macedonian
1034 eq the modern Slavic language spoken in what was Yugoslavia.
1035 NOT the form of Greek spoken in Greek Macedonia!
1037 =item {mad} : Madurese
1039 =item {mag} : Magahi
1041 =item {mai} : Maithili
1043 =item {mak} : Makasar
1045 =item {mg} : Malagasy
1051 =item {ml} : Malayalam
1055 =item {mt} : Maltese
1057 =item {mnc} : Manchu
1059 =item {mdr} : Mandar
1063 =item {man} : Mandingo
1065 =item {mni} : Manipuri
1069 =item [{mno} : Manobo languages]
1077 =item {mr} : Marathi
1083 =item {mh} : Marshall
1087 =item {mwr} : Marwari
1091 =item [{myn} : Mayan languages]
1095 =item {mic} : Micmac
1097 =item {min} : Minangkabau
1099 =item {i-mingo} : Mingo
eq the Iroquoian language West Virginia Seneca. NOT New York Seneca!
1103 =item [{mis} : Miscellaneous languages]
1107 =item {moh} : Mohawk
1109 =item {mdf} : Moksha
1111 =item {mo} : Moldavian
1115 =item [{mkh} : Mon-Khmer (Other)]
1119 =item {mn} : Mongolian
1125 =item [{mul} : Multiple languages]
1129 =item [{mun} : Munda languages]
1131 =item {nah} : Nahuatl
1133 =item {nap} : Neapolitan
1139 eq Navaho. (Formerly "i-navajo".)
1142 {i-navajo} Navajo (old tag)
1144 =item {nd} : North Ndebele
1146 =item {nr} : South Ndebele
1152 eq Nepalese. Notable forms:
1153 {ne-np} Nepal Nepali;
1154 {ne-in} India Nepali.
1156 =item {new} : Newari
1160 =item [{nic} : Niger-Kordofanian (Other)]
1162 =item [{ssa} : Nilo-Saharan (Other)]
1164 =item {niu} : Niuean
1168 =item {non} : Old Norse
1172 =item [{nai} : North American Indian]
1176 =item {no} : Norwegian
1178 Note the two following forms:
1180 =item {nb} : Norwegian Bokmal
eq BokmE<aring>l. (A form of Norwegian.) (Formerly "no-bok".)
1185 {no-bok} Norwegian Bokmal (old tag)
1187 =item {nn} : Norwegian Nynorsk
1189 (A form of Norwegian.) (Formerly "no-nyn".)
1192 {no-nyn} Norwegian Nynorsk (old tag)
1194 =item [{nub} : Nubian languages]
1196 =item {nym} : Nyamwezi
1198 =item {nyn} : Nyankole
1204 =item {oc} : Occitan (post 1500)
1206 eq ProvenE<ccedil>al, eq Provencal
1210 eq Ojibwe. (Formerly "oji".)
1218 =item {os} : Ossetian; Ossetic
1220 =item [{oto} : Otomian languages]
1222 Group of languages collectively called "OtomE<iacute>".
1224 =item {pal} : Pahlavi
1228 =item {i-pwn} : Paiwan
1232 =item {pau} : Palauan
1238 =item {pam} : Pampanga
1240 =item {pag} : Pangasinan
1242 =item {pa} : Panjabi
1246 =item {pap} : Papiamento
1250 =item [{paa} : Papuan (Other)]
1252 =item {fa} : Persian
1254 eq Farsi. eq Iranian.
1256 =item {peo} : Old Persian (ca.600-400 B.C.)
1258 =item [{phi} : Philippine (Other)]
1260 =item {phn} : Phoenician
1264 =item {pon} : Pohnpeian
1270 =item {pt} : Portuguese
1272 eq Portugese. Notable forms:
1273 {pt-pt} Portugal Portuguese;
1274 {pt-br} Brazilian Portuguese.
1276 =item [{pra} : Prakrit languages]
1278 =item {pro} : Old Provencal (to 1500)
1280 eq Old ProvenE<ccedil>al. (Historical.)
1284 eq Pashto. eq Pushtu.
1286 =item {qu} : Quechua
1290 =item {rm} : Raeto-Romance
1294 =item {raj} : Rajasthani
1296 =item {rap} : Rapanui
1298 =item {rar} : Rarotongan
1300 =item [{qaa - qtz} : Reserved for local use.]
1302 =item [{roa} : Romance (Other)]
1304 NOT Romanian! NOT Romany! NOT Romansh!
1306 =item {ro} : Romanian
1308 eq Rumanian. NOT Romany!
1310 =item {rom} : Romany
1312 eq Rom. NOT Romanian!
1316 =item {ru} : Russian
1318 NOT White Russian! NOT Rusyn!
1320 =item [{sal} : Salishan languages]
1322 Large language group.
1324 =item {sam} : Samaritan Aramaic
1328 =item {se} : Northern Sami
1330 eq Lappish. eq Lapp. eq (Northern) Saami.
1332 =item {sma} : Southern Sami
1334 =item {smn} : Inari Sami
1336 =item {smj} : Lule Sami
1338 =item {sms} : Skolt Sami
1340 =item [{smi} : Sami languages (Other)]
1344 =item {sad} : Sandawe
1348 =item {sa} : Sanskrit
1352 =item {sat} : Santali
1354 =item {sc} : Sardinian
1364 =item {sel} : Selkup
1366 =item [{sem} : Semitic (Other)]
1368 =item {sr} : Serbian
1370 eq Serb. NOT Sorbian.
1373 {sr-Cyrl} : Serbian in Cyrillic script;
1374 {sr-Latn} : Serbian in Latin script.
1382 =item {sid} : Sidamo
1384 =item {sgn-...} : Sign Languages
1386 Always use with a subtag. Notable forms:
1387 {sgn-gb} British Sign Language (BSL);
{sgn-ie} Irish Sign Language (ISL);
1389 {sgn-ni} Nicaraguan Sign Language (ISN);
1390 {sgn-us} American Sign Language (ASL).
1392 (And so on with other country codes as the subtag.)
1394 =item {bla} : Siksika
1396 eq Blackfoot. eq Pikanii.
1400 =item {si} : Sinhalese
1404 =item [{sit} : Sino-Tibetan (Other)]
1406 =item [{sio} : Siouan languages]
1408 =item {den} : Slave (Athapascan)
1410 ("Slavey" is a subform.)
1412 =item [{sla} : Slavic (Other)]
1418 =item {sl} : Slovenian
1422 =item {sog} : Sogdian
1426 =item {son} : Songhai
1428 =item {snk} : Soninke
1430 =item {wen} : Sorbian languages
1432 eq Wendish. eq Sorb. eq Lusatian. eq Wend. NOT Venda! NOT Serbian!
1434 =item {nso} : Northern Sotho
1436 =item {st} : Southern Sotho
1438 eq Sutu. eq Sesotho.
1440 =item [{sai} : South American Indian (Other)]
1442 =item {es} : Spanish
1445 {es-ar} Argentine Spanish;
1446 {es-bo} Bolivian Spanish;
1447 {es-cl} Chilean Spanish;
1448 {es-co} Colombian Spanish;
1449 {es-do} Dominican Spanish;
1450 {es-ec} Ecuadorian Spanish;
1451 {es-es} Spain Spanish;
1452 {es-gt} Guatemalan Spanish;
1453 {es-hn} Honduran Spanish;
1454 {es-mx} Mexican Spanish;
1455 {es-pa} Panamanian Spanish;
1456 {es-pe} Peruvian Spanish;
1457 {es-pr} Puerto Rican Spanish;
1458 {es-py} Paraguay Spanish;
1459 {es-sv} Salvadoran Spanish;
1461 {es-uy} Uruguayan Spanish;
1462 {es-ve} Venezuelan Spanish.
1464 =item {suk} : Sukuma
1466 =item {sux} : Sumerian
1470 =item {su} : Sundanese
1474 =item {sw} : Swahili
1480 =item {sv} : Swedish
1483 {sv-se} Sweden Swedish;
1484 {sv-fi} Finland Swedish.
1486 =item {syr} : Syriac
1488 =item {tl} : Tagalog
1490 =item {ty} : Tahitian
1492 =item [{tai} : Tai (Other)]
1498 =item {tmh} : Tamashek
1508 =item {i-tay} : Tayal
1510 eq Atayal. eq Atayan.
1514 =item {ter} : Tereno
1522 =item {bo} : Tibetan
1526 =item {ti} : Tigrinya
1530 eq Themne. eq Timene.
1534 =item {tli} : Tlingit
1536 =item {tpi} : Tok Pisin
1538 =item {tkl} : Tokelau
1540 =item {tog} : Tonga (Nyasa)
1544 =item {to} : Tonga (Tonga Islands)
1546 (Pronounced "Tong-a", not "Tong-ga")
1550 =item {tsi} : Tsimshian
1558 =item {i-tsu} : Tsou
1564 =item {tum} : Tumbuka
1566 =item [{tup} : Tupi languages]
1568 =item {tr} : Turkish
1570 (Typically in Roman script)
1572 =item {ota} : Ottoman Turkish (1500-1928)
1574 (Typically in Arabic script) (Historical)
1576 =item {crh} : Crimean Turkish
1580 =item {tk} : Turkmen
1584 =item {tvl} : Tuvalu
1586 =item {tyv} : Tuvinian
1592 =item {udm} : Udmurt
1594 =item {uga} : Ugaritic
1600 =item {uk} : Ukrainian
1602 =item {umb} : Umbundu
1604 =item {und} : Undetermined
1606 Not a tag for normal use.
1615 {uz-Cyrl} Uzbek in Cyrillic script;
1616 {uz-Latn} Uzbek in Latin script.
1622 NOT Wendish! NOT Wend! NOT Avestan! (Formerly "ven".)
1624 =item {vi} : Vietnamese
1628 =item {vo} : Volapuk
1630 eq VolapE<uuml>k. (Artificial)
1636 =item [{wak} : Wakashan languages]
1638 =item {wa} : Walloon
1640 =item {wal} : Walamo
1646 Presumably the Philippine language Waray-Waray (SamareE<ntilde>o),
1647 not the smaller Philippine language Waray Sorsogon, nor the extinct
1648 Australian language Waray.
1658 =item {x-...} : Unregistered (Semi-Private Use)
"x-" is a prefix for language tags that are not registered with ISO
or IANA. Example: x-double-dutch
1669 (The Yao in Malawi?)
1671 =item {yap} : Yapese
1675 =item {ii} : Sichuan Yi
1677 =item {yi} : Yiddish
1679 Formerly "ji". Usually in Hebrew script.
1682 {yi-latn} Yiddish in Latin script
1686 =item [{ypk} : Yupik languages]
1688 Several "Eskimo" languages.
1692 =item [{zap} : Zapotec]
1694 (A group of languages.)
1696 =item {zen} : Zenaga
1714 L<I18N::LangTags|I18N::LangTags> and its "See Also" section.
1716 =head1 COPYRIGHT AND DISCLAIMER
1718 Copyright (c) 2001+ Sean M. Burke. All rights reserved.
1720 You can redistribute and/or
1721 modify this document under the same terms as Perl itself.
1723 This document is provided in the hope that it will be
1724 useful, but without any warranty;
1725 without even the implied warranty of accuracy, authoritativeness,
1726 completeness, merchantability, or fitness for a particular purpose.
1728 Email any corrections or questions to me.
1732 Sean M. Burke, sburkeE<64>cpan.org
1737 # To generate a list of just the two and three-letter codes:
1739 #!/usr/local/bin/perl -w
1741 require 5; # Time-stamp: "2001-03-13 21:53:39 MST"
1742 # Sean M. Burke, sburke@cpan.org
1743 # This program is for generating the language_codes.txt file
1746 use HTML::TreeBuilder 3.10;
1747 my $root = HTML::TreeBuilder->new();
1748 my $url = 'http://lcweb.loc.gov/standards/iso639-2/bibcodes.html';
1749 $root->parse(get($url) || die "Can't get $url");
1754 foreach my $tr ($root->find_by_tag_name('tr')) {
1755 my @f = map $_->as_text(), $tr->content_list();
1756 #print map("<$_> ", @f), "\n";
1757 next unless @f == 5;
1758 pop @f; # nix the French name
1759 next if $f[-1] eq 'Language Name (English)'; # it's a header line
1760 my $xx = splice(@f, 2,1); # pull out the two-letter code
1763 if($xx =~ m/[a-zA-Z]/) { # there's a two-letter code for it
1764 push @codes, [ lc($f[-1]), "$xx\t$f[-1]\n" ];
1765 } else { # print the three-letter codes.
1766 if($f[0] eq $f[1]) {
1767 push @codes, [ lc($f[-1]), "$f[1]\t$f[2]\n" ];
1768 } else { # shouldn't happen
1769 push @codes, [ lc($f[-1]), "@f !!!!!!!!!!\n" ];
1774 print map $_->[1], sort {; $a->[0] cmp $b->[0] } @codes;
1775 print "[ based on $url\n at ", scalar(localtime), "]\n",
1776 "[Note: doesn't include IANA-registered codes.]\n";