From: Jarkko Hietaniemi Date: Mon, 18 Feb 2002 15:26:32 +0000 (+0000) Subject: Upgrade to Locale::Codes 2.00. X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=6b6e008c33e4c468fb6c1356921c16cdf5c73b26;p=p5sagit%2Fp5-mst-13.2.git Upgrade to Locale::Codes 2.00. p4raw-id: //depot/perl@14746 --- diff --git a/MANIFEST b/MANIFEST index 1392103..36326fd 100644 --- a/MANIFEST +++ b/MANIFEST @@ -1105,11 +1105,14 @@ lib/lib.t For "use lib" testing lib/lib_pm.PL For "use lib", produces lib/lib.pm lib/locale.pm For "use locale" lib/locale.t See if locale support works +lib/Locale/Codes/ChangeLog Locale::Codes +lib/Locale/Codes/README Locale::Codes lib/Locale/Codes/t/all.t See if Locale::Codes work lib/Locale/Codes/t/constants.t See if Locale::Codes work lib/Locale/Codes/t/country.t See if Locale::Codes work lib/Locale/Codes/t/currency.t See if Locale::Codes work lib/Locale/Codes/t/languages.t See if Locale::Codes work +lib/Locale/Codes/t/script.t See if Locale::Codes work lib/Locale/Codes/t/uk.t See if Locale::Codes work lib/Locale/Constants.pm Locale::Codes lib/Locale/Country.pm Locale::Codes @@ -1121,6 +1124,7 @@ lib/Locale/Maketext/ChangeLog Locale::Maketext lib/Locale/Maketext/README Locale::Maketext lib/Locale/Maketext/test.pl See if Locale::Maketext works lib/Locale/Maketext/TPJ13.pod Locale::Maketext documentation article +lib/Locale/Script.pm Locale::Codes lib/look.pl A "look" equivalent lib/Math/BigFloat.pm An arbitrary precision floating-point arithmetic package lib/Math/BigInt.pm An arbitrary precision integer arithmetic package diff --git a/lib/Locale/Codes/ChangeLog b/lib/Locale/Codes/ChangeLog new file mode 100644 index 0000000..d8a1837 --- /dev/null +++ b/lib/Locale/Codes/ChangeLog @@ -0,0 +1,88 @@ + + ChangeLog for Locale-Codes Distribution + +2.00 2002-02-17 neilb + + * Created Locale::Script which provides an interface to the + ISO codes for identification of scripts (writing scripts, + rather than perl style scripts). 
The codes are defined + by ISO 15924, which is currently in final draft. + Thanks to Jarkko for pointing out this new standard. + All three code sets are supported, and a test-suite added. + + * Added support for country name variants to Locale::Country, + so that + country2code('USA') + country2code('United States') + country2code('United States of America') + will all return 'us'. + This had been in the LIMITATIONS section since the first version. + Patch from TJ Mather with additional + variants from me. Added test-cases for these. + + * Added VERSION to Locale::Constants. Thanks to Jarkko for + pointing that it was missing. + + * Should really have bumped major version with previous release, + since there was a change to the API. + +1.06 2001-03-04 neilb + + Added Locale::Constants, which defines three symbols + for identifying which codeset is being used: + + LOCALE_CODE_ALPHA_2 + LOCALE_CODE_ALPHA_3 + LOCALE_CODE_NUMERIC + + Updated Locale::Country to support all three code sets + defined by ISO 3166. This was requested by Keith Wall. + I haven't added multiple codeset support to the other + modules yet - I'll wait until someone asks for them. + +1.05 Feb 2001 + + Added Locale::Currency, contribution from Michael Hennecke. + Added testsuite for it (t/currency.t) and added testcases + to t/all.t for the all_* functions. + +1.04 Dec 2000 + + Fixed very minor typos from 1.03! + +1.03 Dec 2000 + + Updated Locale::Country: + - fixed spelling of a few countries + - added link to a relevant page from CIA world factbook + + Updated Locale::Language: + - fixed typo in the documentation (ISO 939 should be 639) + +1.02 May 2000 + + Updated Locale::Country and Locale::Language to reflect changes + in the relevant ISO standards. These mainly reflect languages + which are new to the relevant standard, and changes in the + spelling of some country names. + + Added official URLs for the standards to the SEE ALSO sections + of the doc for each module. 
+ + Thanks to Jarkko Hietaniemi for pointing me at the pages + with latest versions of ISO 3166 and 639. + +1.00 March 1998 + + Added Locale::Country::_alias_code() so that 'uk' can be added + as the code for "United Kingdom", if you want it. + This was prompted by Ed Jordan + + Added a new testsuite for handling this case, and extended the + existing test-suite to include testing of the case where + 'uk' hasn't been defined as a valid code. + +0.003 May 1997 + + First public release to CPAN + diff --git a/lib/Locale/Codes/README b/lib/Locale/Codes/README new file mode 100644 index 0000000..20d174f --- /dev/null +++ b/lib/Locale/Codes/README @@ -0,0 +1,47 @@ + + Locale-Codes Distribution + v2.00 + +This distribution contains four Perl modules which can be used to process +ISO codes for identifying languages, countries, scripts, +and currencies & funds. + + Locale::Language + Two letter codes for language identification (ISO 639). + For example, 'en' is the code for 'English'. + + Locale::Country + Codes for country identification (ISO 3166). This module + supports the three different code sets defined by the + standard: alpha-2, alpha-3, and numeric codes. + For example, 'bo' is the code for 'Bolivia'. + + Locale::Currency + Three letter codes for currency and fund identification (ISO 4217). + For example, 'sek' is the code for 'Swedish Krona'. + + Locale::Script + Codes for script identification (ISO 15924). This module supports + the three different code sets defined by the standard: + alpha-2, alpha-3, and numeric codes. + +To install these modules, you should just have to run the following: + + % perl Makefile.PL + % make + % make test + % make install + +The modules are documented using pod. When you "make install", you +will get three man-pages: Locale::Language, Locale::Country, +Locale::Currency, Locale::Script. + +The first version of Locale::Currency was written by Michael Hennecke, +with modifications for inclusion by me. Kudos to Michael. 
+ +Please let me know if you experience any problems with these modules, +or have any ideas for additions. + + +Neil Bowers + diff --git a/lib/Locale/Codes/t/all.t b/lib/Locale/Codes/t/all.t index ed93c5a..6c751aa 100644 --- a/lib/Locale/Codes/t/all.t +++ b/lib/Locale/Codes/t/all.t @@ -4,6 +4,7 @@ # Locale::Country # Locale::Language # Locale::Currency +# Locale::Script # # There are four tests. We get a list of all codes, convert to # language/country/currency, # convert back to code, @@ -19,8 +20,9 @@ BEGIN { use Locale::Country; use Locale::Language; use Locale::Currency; +use Locale::Script; -print "1..12\n"; +print "1..20\n"; my $code; my $language; @@ -28,6 +30,7 @@ my $country; my $ok; my $reverse; my $currency; +my $script; #----------------------------------------------------------------------- @@ -364,3 +367,219 @@ foreach $currency (all_currency_names()) } } print ($ok ? "ok 12\n" : "not ok 12\n"); + +#======================================================================= +# +# Locale::Script tests +# +#======================================================================= + +#----------------------------------------------------------------------- +# Old API - without codeset specified, default to ALPHA_2 +#----------------------------------------------------------------------- +$ok = 1; +foreach $code (all_script_codes()) +{ + $script = code2script($code); + if (!defined $script) + { + $ok = 0; + last; + } + $reverse = script2code($script); + if (!defined $reverse) + { + $ok = 0; + last; + } + if ($reverse ne $code) + { + $ok = 0; + last; + } +} +print ($ok ? 
"ok 13\n" : "not ok 13\n"); + +#----------------------------------------------------------------------- +# code to script, back to code, for ALPHA2 +#----------------------------------------------------------------------- +$ok = 1; +foreach $code (all_script_codes(LOCALE_CODE_ALPHA_2)) +{ + $script = code2script($code, LOCALE_CODE_ALPHA_2); + if (!defined $script) + { + $ok = 0; + last; + } + $reverse = script2code($script, LOCALE_CODE_ALPHA_2); + if (!defined $reverse) + { + $ok = 0; + last; + } + if ($reverse ne $code) + { + $ok = 0; + last; + } +} +print ($ok ? "ok 14\n" : "not ok 14\n"); + +#----------------------------------------------------------------------- +# code to script, back to code, for ALPHA3 +#----------------------------------------------------------------------- +$ok = 1; +foreach $code (all_script_codes(LOCALE_CODE_ALPHA_3)) +{ + $script = code2script($code, LOCALE_CODE_ALPHA_3); + if (!defined $script) + { + $ok = 0; + last; + } + $reverse = script2code($script, LOCALE_CODE_ALPHA_3); + if (!defined $reverse) + { + $ok = 0; + last; + } + if ($reverse ne $code) + { + $ok = 0; + last; + } +} +print ($ok ? "ok 15\n" : "not ok 15\n"); + +#----------------------------------------------------------------------- +# code to script, back to code, for NUMERIC +#----------------------------------------------------------------------- +$ok = 1; +foreach $code (all_script_codes(LOCALE_CODE_NUMERIC)) +{ + $script = code2script($code, LOCALE_CODE_NUMERIC); + if (!defined $script) + { + $ok = 0; + last; + } + $reverse = script2code($script, LOCALE_CODE_NUMERIC); + if (!defined $reverse) + { + $ok = 0; + last; + } + if ($reverse ne $code) + { + $ok = 0; + last; + } +} +print ($ok ? 
"ok 16\n" : "not ok 16\n"); + + +#----------------------------------------------------------------------- +# Old API - script to code, back to script, using default of ALPHA_2 +#----------------------------------------------------------------------- +$ok = 1; +foreach $script (all_script_names()) +{ + $code = script2code($script); + if (!defined $code) + { + $ok = 0; + last; + } + $reverse = code2script($code); + if (!defined $reverse) + { + $ok = 0; + last; + } + if ($reverse ne $script) + { + $ok = 0; + last; + } +} +print ($ok ? "ok 17\n" : "not ok 17\n"); + +#----------------------------------------------------------------------- +# script to code, back to script, using LOCALE_CODE_ALPHA_2 +#----------------------------------------------------------------------- +$ok = 1; +foreach $script (all_script_names()) +{ + $code = script2code($script, LOCALE_CODE_ALPHA_2); + if (!defined $code) + { + $ok = 0; + last; + } + $reverse = code2script($code, LOCALE_CODE_ALPHA_2); + if (!defined $reverse) + { + $ok = 0; + last; + } + if ($reverse ne $script) + { + $ok = 0; + last; + } +} +print ($ok ? "ok 18\n" : "not ok 18\n"); + +#----------------------------------------------------------------------- +# script to code, back to script, using LOCALE_CODE_ALPHA_3 +#----------------------------------------------------------------------- +$ok = 1; +foreach $script (all_script_names()) +{ + $code = script2code($script, LOCALE_CODE_ALPHA_3); + if (!defined $code) + { + $ok = 0; + last; + } + $reverse = code2script($code, LOCALE_CODE_ALPHA_3); + if (!defined $reverse) + { + $ok = 0; + last; + } + if ($reverse ne $script) + { + $ok = 0; + last; + } +} +print ($ok ? 
"ok 19\n" : "not ok 19\n"); + +#----------------------------------------------------------------------- +# script to code, back to script, using LOCALE_CODE_NUMERIC +#----------------------------------------------------------------------- +$ok = 1; +foreach $script (all_script_names()) +{ + $code = script2code($script, LOCALE_CODE_NUMERIC); + if (!defined $code) + { + $ok = 0; + last; + } + $reverse = code2script($code, LOCALE_CODE_NUMERIC); + if (!defined $reverse) + { + $ok = 0; + last; + } + if ($reverse ne $script) + { + $ok = 0; + last; + } +} +print ($ok ? "ok 20\n" : "not ok 20\n"); + diff --git a/lib/Locale/Codes/t/country.t b/lib/Locale/Codes/t/country.t index 4234d1e..0953da1 100644 --- a/lib/Locale/Codes/t/country.t +++ b/lib/Locale/Codes/t/country.t @@ -60,13 +60,23 @@ use Locale::Country; ['!defined country2code("Banana")', 0], # illegal country name #---- some successful examples ----------------------------------------- - ['country2code("japan") eq "jp"', 0], - ['country2code("japan") ne "ja"', 0], - ['country2code("Japan") eq "jp"', 0], - ['country2code("United States") eq "us"', 0], - ['country2code("United Kingdom") eq "gb"', 0], - ['country2code("Andorra") eq "ad"', 0], # first in DATA segment - ['country2code("Zimbabwe") eq "zw"', 0], # last in DATA segment + ['country2code("japan") eq "jp"', 0], + ['country2code("japan") ne "ja"', 0], + ['country2code("Japan") eq "jp"', 0], + ['country2code("United States") eq "us"', 0], + ['country2code("United Kingdom") eq "gb"', 0], + ['country2code("Andorra") eq "ad"', 0], # first in DATA + ['country2code("Zimbabwe") eq "zw"', 0], # last in DATA + ['country2code("Iran") eq "ir"', 0], # alias + ['country2code("North Korea") eq "kp"', 0], # alias + ['country2code("South Korea") eq "kr"', 0], # alias + ['country2code("Libya") eq "ly"', 0], # alias + ['country2code("Syria") eq "sy"', 0], # alias + ['country2code("Svalbard") eq "sj"', 0], # alias + ['country2code("Jan Mayen") eq "sj"', 0], # alias + 
['country2code("USA") eq "us"', 0], # alias + ['country2code("United States of America") eq "us"', 0], # alias + ['country2code("Great Britain") eq "gb"', 0], # alias #================================================ # TESTS FOR country_code2code diff --git a/lib/Locale/Codes/t/script.t b/lib/Locale/Codes/t/script.t new file mode 100644 index 0000000..989b778 --- /dev/null +++ b/lib/Locale/Codes/t/script.t @@ -0,0 +1,106 @@ +#!./perl +# +# script.t - tests for Locale::Script +# + +use Locale::Script; + +#----------------------------------------------------------------------- +# This is an array of tests specs. Each spec is [TEST, OK_TO_DIE] +# Each TEST is eval'd as an expression. +# If it evaluates to FALSE, then "not ok N" is printed for the test, +# otherwise "ok N". If the eval dies, then the OK_TO_DIE flag is checked. +# If it is true (1), the test is treated as passing, otherwise it failed. +#----------------------------------------------------------------------- +@TESTS = +( + #================================================ + # TESTS FOR code2script + #================================================ + + #---- selection of examples which should all result in undef ----------- + ['!defined code2script()', 0], # no argument + ['!defined code2script(undef)', 0], # undef argument + ['!defined code2script("aa")', 0], # illegal code + ['!defined code2script("aa", LOCALE_CODE_ALPHA_2)', 0], # illegal code + ['!defined code2script("aa", LOCALE_CODE_ALPHA_3)', 0], # illegal code + ['!defined code2script("aa", LOCALE_CODE_NUMERIC)', 0], # illegal code + + #---- some successful examples ----------------------------------------- + ['code2script("BO") eq "Tibetan"', 0], + ['code2script("Bo") eq "Tibetan"', 0], + ['code2script("bo") eq "Tibetan"', 0], + ['code2script("bo", LOCALE_CODE_ALPHA_2) eq "Tibetan"', 0], + ['code2script("bod", LOCALE_CODE_ALPHA_3) eq "Tibetan"', 0], + ['code2script("330", LOCALE_CODE_NUMERIC) eq "Tibetan"', 0], + + ['code2script("yi", 
LOCALE_CODE_ALPHA_2) eq "Yi"', 0], # last in DATA + ['code2script("Yii", LOCALE_CODE_ALPHA_3) eq "Yi"', 0], + ['code2script("460", LOCALE_CODE_NUMERIC) eq "Yi"', 0], + + ['code2script("am") eq "Aramaic"', 0], # first in DATA segment + + + #================================================ + # TESTS FOR script2code + #================================================ + + #---- selection of examples which should all result in undef ----------- + ['!defined code2script("BO", LOCALE_CODE_ALPHA_3)', 0], + ['!defined code2script("BO", LOCALE_CODE_NUMERIC)', 0], + ['!defined script2code()', 0], # no argument + ['!defined script2code(undef)', 0], # undef argument + ['!defined script2code("Banana")', 0], # illegal script name + + #---- some successful examples ----------------------------------------- + ['script2code("meroitic") eq "me"', 0], + ['script2code("burmese") eq "my"', 0], + ['script2code("Pahlavi") eq "ph"', 0], + ['script2code("Vai", LOCALE_CODE_ALPHA_3) eq "vai"', 0], + ['script2code("Tamil", LOCALE_CODE_NUMERIC) eq "346"', 0], + ['script2code("Latin") eq "la"', 0], + ['script2code("Latin", LOCALE_CODE_ALPHA_3) eq "lat"', 0], + + #================================================ + # TESTS FOR script_code2code + #================================================ + + #---- selection of examples which should all result in undef ----------- + ['!defined script_code2code("bo", LOCALE_CODE_ALPHA_3, LOCALE_CODE_ALPHA_3)', 0], + ['!defined script_code2code("aa", LOCALE_CODE_ALPHA_2, LOCALE_CODE_ALPHA_3)', 0], + ['!defined script_code2code("aa", LOCALE_CODE_ALPHA_3, LOCALE_CODE_ALPHA_3)', 0], + ['!defined script_code2code("aa", LOCALE_CODE_ALPHA_2)', 1], + ['!defined script_code2code()', 1], # no argument + ['!defined script_code2code(undef)', 1], # undef argument + + #---- some successful examples ----------------------------------------- + ['script_code2code("BO", LOCALE_CODE_ALPHA_2, LOCALE_CODE_ALPHA_3) eq "bod"', 0], + ['script_code2code("bod", LOCALE_CODE_ALPHA_3, 
LOCALE_CODE_ALPHA_2) eq "bo"', 0], + ['script_code2code("Phx", LOCALE_CODE_ALPHA_3, LOCALE_CODE_ALPHA_2) eq "ph"', 0], + ['script_code2code("295", LOCALE_CODE_NUMERIC, LOCALE_CODE_ALPHA_3) eq "pqd"', 0], + ['script_code2code(170, LOCALE_CODE_NUMERIC, LOCALE_CODE_ALPHA_3) eq "tna"', 0], + ['script_code2code("rr", LOCALE_CODE_ALPHA_2, LOCALE_CODE_NUMERIC) eq "620"', 0], + +); + +print "1..", int(@TESTS), "\n"; + +$testid = 1; +foreach $test (@TESTS) +{ + eval "print (($test->[0]) ? \"ok $testid\\n\" : \"not ok $testid\\n\" )"; + if ($@) + { + if (!$test->[1]) + { + print "not ok $testid\n"; + } + else + { + print "ok $testid\n"; + } + } + ++$testid; +} + +exit 0; diff --git a/lib/Locale/Constants.pm b/lib/Locale/Constants.pm index 9dfe1db..88651fd 100644 --- a/lib/Locale/Constants.pm +++ b/lib/Locale/Constants.pm @@ -2,7 +2,7 @@ package Locale::Constants; # # Locale::Constants - defined constants for identifying codesets # -# $Id: Constants.pm,v 1.1 2001/03/04 17:58:15 neilb Exp $ +# $Id: Constants.pm,v 2.0 2002/02/05 22:37:58 neilb Exp $ # use strict; @@ -10,6 +10,7 @@ use strict; require Exporter; use vars qw($VERSION @ISA @EXPORT); +$VERSION = sprintf("%d.%02d", q$Revision: 2.0 $ =~ /(\d+)\.(\d+)/); @ISA = qw(Exporter); @EXPORT = qw(LOCALE_CODE_ALPHA_2 LOCALE_CODE_ALPHA_3 LOCALE_CODE_NUMERIC LOCALE_CODE_DEFAULT); @@ -18,8 +19,6 @@ use constant LOCALE_CODE_ALPHA_2 => 1; use constant LOCALE_CODE_ALPHA_3 => 2; use constant LOCALE_CODE_NUMERIC => 3; -$VERSION = '1.00'; - use constant LOCALE_CODE_DEFAULT => LOCALE_CODE_ALPHA_2; 1; @@ -39,14 +38,15 @@ Locale::Constants - constants for Locale codes =head1 DESCRIPTION B<Locale::Constants> defines symbols which are used in -the three modules from the Locale-Codes distribution: +the four modules from the Locale-Codes distribution: Locale::Language Locale::Country Locale::Currency + Locale::Script -B at the moment only Locale::Country supports -more than one code set. 
+B at the moment only Locale::Country and Locale::Script +support more than one code set. The symbols defined are used to specify which codes you want to be used: @@ -75,6 +75,10 @@ Codes for identification of languages. Codes for identification of countries. +=item Locale::Script + +Codes for identification of scripts. + =item Locale::Currency Codes for identification of currencies and funds. @@ -83,10 +87,12 @@ Codes for identification of currencies and funds. =head1 AUTHOR -Neil Bowers E<lt>neilb@cre.canon.co.ukE<gt> +Neil Bowers E<lt>neil@bowers.comE<gt> =head1 COPYRIGHT +Copyright (C) 2002, Neil Bowers. + Copyright (C) 2001, Canon Research Centre Europe (CRE). This module is free software; you can redistribute it and/or diff --git a/lib/Locale/Country.pm b/lib/Locale/Country.pm index f60b135..2f43ae7 100644 --- a/lib/Locale/Country.pm +++ b/lib/Locale/Country.pm @@ -69,6 +69,14 @@ so you can use 'BO', 'Bo', 'bO' or 'bo' for Bolivia. When a code is returned by one of the functions in this module, it will always be lower-case. +As of version 2.00, Locale::Country supports variant +names for countries. So, for example, the country code for "United States" +is "us", so country2code('United States') returns 'us'. +Now the following will also return 'us': + + country2code('United States of America') + country2code('USA') + =cut #----------------------------------------------------------------------- @@ -82,7 +90,7 @@ use Locale::Constants; # Public Global Variables #----------------------------------------------------------------------- use vars qw($VERSION @ISA @EXPORT @EXPORT_OK); -$VERSION = sprintf("%d.%02d", q$Revision: 1.7 $ =~ /(\d+)\.(\d+)/); +$VERSION = sprintf("%d.%02d", q$Revision: 2.0 $ =~ /(\d+)\.(\d+)/); @ISA = qw(Exporter); @EXPORT = qw(code2country country2code all_country_codes all_country_names @@ -355,18 +363,11 @@ B would map to B. Any others? =item * When using C<country2code()>, the country name must currently appear -exactly as it does in the source of the module. 
For example, - - country2code('United States') - -will return B<us>, as expected. But the following will all return C<undef>: +exactly as it does in the source of the module. The module now supports +a small number of variants. - country2code('United States of America') - country2code('Great Britain') - country2code('U.S.A.') - -If there's need for it, a future version could have variants -for country names. +Possible extensions to this are: an interface for getting at the +list of variant names, and regular expression matches. =item * @@ -374,6 +375,10 @@ In the current implementation, all data is read in when the module is loaded, and then held in memory. A lazy implementation would be more memory friendly. +=item * + +Support for country names in different languages. + =back =head1 SEE ALSO @@ -384,6 +389,10 @@ A lazy implementation would be more memory friendly. ISO two letter codes for identification of language (ISO 639). +=item Locale::Script + +ISO codes for identification of scripts (ISO 15924). + =item Locale::Currency ISO three letter codes for identification of currencies @@ -411,10 +420,12 @@ as defined by ISO 3166, FIPS 10-4, and internet domain names. =head1 AUTHOR -Neil Bowers E<lt>neilb@cre.canon.co.ukE<gt> +Neil Bowers E<lt>neil@bowers.comE<gt> =head1 COPYRIGHT +Copyright (C) 2002, Neil Bowers. + Copyright (c) 1997-2001 Canon Research Centre Europe (CRE). This module is free software; you can redistribute it and/or @@ -429,28 +440,37 @@ modify it under the same terms as Perl itself. 
#======================================================================= { my ($alpha2, $alpha3, $numeric); - my $country; + my ($country, @countries); while (<DATA>) { next unless /\S/; chop; - ($alpha2, $alpha3, $numeric, $country) = split(/:/, $_, 4); + ($alpha2, $alpha3, $numeric, @countries) = split(/:/, $_); - $CODES->[LOCALE_CODE_ALPHA_2]->{$alpha2} = $country; - $COUNTRIES->[LOCALE_CODE_ALPHA_2]->{"\L$country"} = $alpha2; + $CODES->[LOCALE_CODE_ALPHA_2]->{$alpha2} = $countries[0]; + foreach $country (@countries) + { + $COUNTRIES->[LOCALE_CODE_ALPHA_2]->{"\L$country"} = $alpha2; + } if ($alpha3) { - $CODES->[LOCALE_CODE_ALPHA_3]->{$alpha3} = $country; - $COUNTRIES->[LOCALE_CODE_ALPHA_3]->{"\L$country"} = $alpha3; + $CODES->[LOCALE_CODE_ALPHA_3]->{$alpha3} = $countries[0]; + foreach $country (@countries) + { + $COUNTRIES->[LOCALE_CODE_ALPHA_3]->{"\L$country"} = $alpha3; + } } if ($numeric) { - $CODES->[LOCALE_CODE_NUMERIC]->{$numeric} = $country; - $COUNTRIES->[LOCALE_CODE_NUMERIC]->{"\L$country"} = $numeric; + $CODES->[LOCALE_CODE_NUMERIC]->{$numeric} = $countries[0]; + foreach $country (@countries) + { + $COUNTRIES->[LOCALE_CODE_NUMERIC]->{"\L$country"} = $numeric; + } } } @@ -496,7 +516,7 @@ by:blr:112:Belarus bz:blz:084:Belize ca:can:124:Canada cc:::Cocos (Keeling) Islands -cd:cod:180:Congo, The Democratic Republic of the +cd:cod:180:Congo, The Democratic Republic of the:Congo, Democratic Republic of the cf:caf:140:Central African Republic cg:cog:178:Congo ch:che:756:Switzerland @@ -527,13 +547,13 @@ es:esp:724:Spain et:eth:231:Ethiopia fi:fin:246:Finland fj:fji:242:Fiji -fk:flk:238:Falkland Islands (Malvinas) +fk:flk:238:Falkland Islands (Malvinas):Falkland Islands (Islas Malvinas) fm:fsm:583:Micronesia, Federated States of fo:fro:234:Faroe Islands fr:fra:250:France fx:::France, Metropolitan ga:gab:266:Gabon -gb:gbr:826:United Kingdom +gb:gbr:826:United Kingdom:Great Britain gd:grd:308:Grenada ge:geo:268:Georgia gf:guf:254:French Guiana @@ -562,7 +582,7 @@ 
il:isr:376:Israel in:ind:356:India io:::British Indian Ocean Territory iq:irq:368:Iraq -ir:irn:364:Iran, Islamic Republic of +ir:irn:364:Iran, Islamic Republic of:Iran is:isl:352:Iceland it:ita:380:Italy jm:jam:388:Jamaica @@ -574,8 +594,8 @@ kh:khm:116:Cambodia ki:kir:296:Kiribati km:com:174:Comoros kn:kna:659:Saint Kitts and Nevis -kp:prk:408:Korea, Democratic People's Republic of -kr:kor:410:Korea, Republic of +kp:prk:408:Korea, Democratic People's Republic of:Korea, North:North Korea +kr:kor:410:Korea, Republic of:Korea, South:South Korea kw:kwt:414:Kuwait ky:cym:136:Cayman Islands kz:kaz:398:Kazakstan @@ -589,13 +609,13 @@ ls:lso:426:Lesotho lt:ltu:440:Lithuania lu:lux:442:Luxembourg lv:lva:428:Latvia -ly:lby:434:Libyan Arab Jamahiriya +ly:lby:434:Libyan Arab Jamahiriya:Libya ma:mar:504:Morocco mc:mco:492:Monaco md:mda:498:Moldova, Republic of mg:mdg:450:Madagascar mh:mhl:584:Marshall Islands -mk:mkd:807:Macedonia, the Former Yugoslav Republic of +mk:mkd:807:Macedonia, the Former Yugoslav Republic of:Macedonia, Former Yugoslav Republic of:Macedonia ml:mli:466:Mali mm:mmr:104:Myanmar mn:mng:496:Mongolia @@ -632,7 +652,7 @@ ph:phl:608:Philippines pk:pak:586:Pakistan pl:pol:616:Poland pm:spm:666:Saint Pierre and Miquelon -pn:pcn:612:Pitcairn +pn:pcn:612:Pitcairn:Pitcairn Island pr:pri:630:Puerto Rico ps:pse:275:Palestinian Territory, Occupied pt:prt:620:Portugal @@ -641,7 +661,7 @@ py:pry:600:Paraguay qa:qat:634:Qatar re:reu:638:Reunion ro:rom:642:Romania -ru:rus:643:Russian Federation +ru:rus:643:Russian Federation:Russia rw:rwa:646:Rwanda sa:sau:682:Saudi Arabia sb:slb:090:Solomon Islands @@ -651,7 +671,7 @@ se:swe:752:Sweden sg:sgp:702:Singapore sh:shn:654:Saint Helena si:svn:705:Slovenia -sj:sjm:744:Svalbard and Jan Mayen +sj:sjm:744:Svalbard and Jan Mayen:Jan Mayen:Svalbard sk:svk:703:Slovakia sl:sle:694:Sierra Leone sm:smr:674:San Marino @@ -660,7 +680,7 @@ so:som:706:Somalia sr:sur:740:Suriname st:stp:678:Sao Tome and Principe sv:slv:222:El Salvador 
-sy:syr:760:Syrian Arab Republic +sy:syr:760:Syrian Arab Republic:Syria sz:swz:748:Swaziland tc:tca:796:Turks and Caicos Islands td:tcd:148:Chad @@ -676,18 +696,18 @@ tp:tmp:626:East Timor tr:tur:792:Turkey tt:tto:780:Trinidad and Tobago tv:tuv:798:Tuvalu -tw:twn:158:Taiwan, Province of China -tz:tza:834:Tanzania, United Republic of +tw:twn:158:Taiwan, Province of China:Taiwan +tz:tza:834:Tanzania, United Republic of:Tanzania ua:ukr:804:Ukraine ug:uga:800:Uganda um:::United States Minor Outlying Islands -us:usa:840:United States +us:usa:840:United States:USA:United States of America uy:ury:858:Uruguay uz:uzb:860:Uzbekistan -va:vat:336:Holy See (Vatican City State) +va:vat:336:Holy See (Vatican City State):Hole See (Vatican City) vc:vct:670:Saint Vincent and the Grenadines ve:ven:862:Venezuela -vg:vgb:092:Virgin Islands, British +vg:vgb:092:Virgin Islands, British:British Virgin Islands vi:vir:850:Virgin Islands, U.S. vn:vnm:704:Vietnam vu:vut:548:Vanuatu diff --git a/lib/Locale/Currency.pm b/lib/Locale/Currency.pm index 054ac1b..d8732e7 100644 --- a/lib/Locale/Currency.pm +++ b/lib/Locale/Currency.pm @@ -58,7 +58,7 @@ require Exporter; # Public Global Variables #----------------------------------------------------------------------- use vars qw($VERSION @ISA @EXPORT); -$VERSION = sprintf("%d.%02d", q$Revision: 1.3 $ =~ /(\d+)\.(\d+)/); +$VERSION = sprintf("%d.%02d", q$Revision: 2.0 $ =~ /(\d+)\.(\d+)/); @ISA = qw(Exporter); @EXPORT = qw(&code2currency &currency2code &all_currency_codes &all_currency_names ); @@ -224,7 +224,8 @@ do something about it. =item * ISO 4217 also defines a numeric code for each currency. -Currency codes are not currently supported by this module. +Currency codes are not currently supported by this module, +in the same way Locale::Country supports multiple codesets. =item * @@ -245,8 +246,10 @@ you might not get back the code you expected. =item Locale::Country ISO codes for identification of country (ISO 3166). 
-Supports alpha-2, alpha-3, and numeric codes. -The currency codes use the alpha-2 codeset. + +=item Locale::Script + +ISO codes for identification of written scripts (ISO 15924). =item ISO 4217:1995 @@ -263,7 +266,7 @@ This has the latest list of codes, in MS Word format. Boo. Michael Hennecke E<lt>hennecke@rz.uni-karlsruhe.deE<gt> and -Neil Bowers E<lt>neilb@cre.canon.co.ukE<gt> +Neil Bowers E<lt>neil@bowers.comE<gt> =head1 COPYRIGHT diff --git a/lib/Locale/Language.pm b/lib/Locale/Language.pm index 9e0cf19..6479c28 100644 --- a/lib/Locale/Language.pm +++ b/lib/Locale/Language.pm @@ -29,7 +29,7 @@ require 5.002; The C<Locale::Language> module provides access to the ISO two-letter codes for identifying languages, as defined in ISO 639. You can either access the codes via the L<conversion routines> (described below), -or with the two functions which return lists of all language codes or +or via the two functions which return lists of all language codes or all language names. =cut @@ -42,7 +42,7 @@ require Exporter; # Public Global Variables #----------------------------------------------------------------------- use vars qw($VERSION @ISA @EXPORT); -$VERSION = sprintf("%d.%02d", q$Revision: 1.6 $ =~ /(\d+)\.(\d+)/); +$VERSION = sprintf("%d.%02d", q$Revision: 2.0 $ =~ /(\d+)\.(\d+)/); @ISA = qw(Exporter); @EXPORT = qw(&code2language &language2code &all_language_codes &all_language_names ); @@ -213,6 +213,10 @@ Would these be of any use to anyone? ISO codes for identification of country (ISO 3166). Supports 2-letter, 3-letter, and numeric country codes. +=item Locale::Script + +ISO codes for identification of written scripts (ISO 15924). + =item Locale::Currency ISO three letter codes for identification of currencies and funds (ISO 4217). @@ -223,17 +227,19 @@ Code for the representation of names of languages. =item http://lcweb.loc.gov/standards/iso639-2/langhome.html -Home page for ISO 639-2 +Home page for ISO 639-2. 
=back =head1 AUTHOR -Neil Bowers E<lt>neilb@cre.canon.co.ukE<gt> +Neil Bowers E<lt>neil@bowers.comE<gt> =head1 COPYRIGHT +Copyright (C) 2002, Neil Bowers. + Copyright (c) 1997-2001 Canon Research Centre Europe (CRE). This module is free software; you can redistribute it and/or @@ -247,15 +253,13 @@ modify it under the same terms as Perl itself. # initialisation code - stuff the DATA into the CODES hash #======================================================================= { - no utf8; # __DATA__ contains Latin-1 - my $code; my $language; while (<DATA>) { - next unless /\S/; + next unless /\S/; chop; ($code, $language) = split(/:/, $_, 2); $CODES{$code} = $language; diff --git a/lib/Locale/Script.pm b/lib/Locale/Script.pm new file mode 100644 index 0000000..a7168fe --- /dev/null +++ b/lib/Locale/Script.pm @@ -0,0 +1,528 @@ +#----------------------------------------------------------------------- + +=head1 NAME + +Locale::Script - ISO codes for script identification (ISO 15924) + +=head1 SYNOPSIS + + use Locale::Script; + use Locale::Constants; + + $script = code2script('ph'); # 'Phoenician' + $code = script2code('Tibetan'); # 'bo' + $code3 = script2code('Tibetan', + LOCALE_CODE_ALPHA_3); # 'bod' + $codeN = script2code('Tibetan', + LOCALE_CODE_ALPHA_NUMERIC); # 330 + + @codes = all_script_codes(); + @scripts = all_script_names(); + +=cut + +#----------------------------------------------------------------------- + +package Locale::Script; +use strict; +require 5.002; + +#----------------------------------------------------------------------- + +=head1 DESCRIPTION + +The C<Locale::Script> module provides access to the ISO +codes for identifying scripts, as defined in ISO 15924. +For example, Egyptian hieroglyphs are denoted by the two-letter +code 'eg', the three-letter code 'egy', and the numeric code 050. + +You can either access the codes via the conversion routines +(described below), or with the two functions which return lists +of all script codes or all script names. 
+
+There are three different code sets you can use for identifying
+scripts:
+
+=over 4
+
+=item B<alpha-2>
+
+Two letter codes, such as 'bo' for Tibetan.
+This code set is identified with the symbol C<LOCALE_CODE_ALPHA_2>.
+
+=item B<alpha-3>
+
+Three letter codes, such as 'ell' for Greek.
+This code set is identified with the symbol C<LOCALE_CODE_ALPHA_3>.
+
+=item B<numeric>
+
+Numeric codes, such as 410 for Hiragana.
+This code set is identified with the symbol C<LOCALE_CODE_NUMERIC>.
+
+=back
+
+All of the routines take an optional additional argument
+which specifies the code set to use.
+If not specified, it defaults to the two-letter codes.
+This is partly for backwards compatibility (previous versions
+of Locale modules only supported the alpha-2 codes), and
+partly because they are the most widely used codes.
+
+The alpha-2 and alpha-3 codes are not case-dependent,
+so you can use 'BO', 'Bo', 'bO' or 'bo' for Tibetan.
+When a code is returned by one of the functions in
+this module, it will always be lower-case.
+
+=head2 SPECIAL CODES
+
+The standard defines various special codes.
+
+=over 4
+
+=item *
+
+The standard reserves codes in the ranges B<qa> - B<qt>,
+B<qaa> - B<qat>, and B<900> - B<919>, for private use.
+
+=item *
+
+B<zx>, B<zxx>, and B<997>, are the codes for unwritten languages.
+
+=item *
+
+B<zy>, B<zyy>, and B<998>, are the codes for an undetermined script.
+
+=item *
+
+B<zz>, B<zzz>, and B<999>, are the codes for an uncoded script.
+
+=back
+
+The private codes are not recognised by Locale::Script,
+but the others are.
+
+=cut
+
+#-----------------------------------------------------------------------
+
+require Exporter;
+use Carp;
+use Locale::Constants;
+
+
+#-----------------------------------------------------------------------
+# Public Global Variables
+#-----------------------------------------------------------------------
+use vars qw($VERSION @ISA @EXPORT @EXPORT_OK);
+$VERSION   = sprintf("%d.%02d", q$Revision: 2.0 $ =~ /(\d+)\.(\d+)/);
+@ISA       = qw(Exporter);
+@EXPORT    = qw(code2script script2code
+                all_script_codes all_script_names
+                script_code2code
+                LOCALE_CODE_ALPHA_2 LOCALE_CODE_ALPHA_3 LOCALE_CODE_NUMERIC);
+
+#-----------------------------------------------------------------------
+# Private Global Variables
+#-----------------------------------------------------------------------
+my $CODES     = [];
+my $COUNTRIES = [];
+
+
+#=======================================================================
+
+=head1 CONVERSION ROUTINES
+
+There are three conversion routines: C<code2script()>, C<script2code()>,
+and C<script_code2code()>.
+
+=over 8
+
+=item code2script( CODE, [ CODESET ] )
+
+This function takes a script code and returns a string
+which contains the name of the script identified.
+If the code is not a valid script code, as defined by ISO 15924,
+then C<undef> will be returned:
+
+    $script = code2script('cy');   # Cyrillic
+
+=item script2code( STRING, [ CODESET ] )
+
+This function takes a script name and returns the corresponding
+script code, if such exists.
+If the argument could not be identified as a script name,
+then C<undef> will be returned:
+
+    $code = script2code('Gothic', LOCALE_CODE_ALPHA_3);
+    # $code will now be 'gth'
+
+The case of the script name is not important.
+See the section L<KNOWN BUGS AND LIMITATIONS> below.
+
+=item script_code2code( CODE, CODESET, CODESET )
+
+This function takes a script code from one code set,
+and returns the corresponding code from another code set.
+
+    $alpha2 = script_code2code('jwi',
+                 LOCALE_CODE_ALPHA_3 => LOCALE_CODE_ALPHA_2);
+    # $alpha2 will now be 'jw' (Javanese)
+
+If the code passed is not a valid script code in
+the first code set, or if there isn't a code for the
+corresponding script in the second code set,
+then C<undef> will be returned.
+
+=back
+
+=cut
+
+#=======================================================================
+sub code2script
+{
+    my $code = shift;
+    my $codeset = @_ > 0 ? shift : LOCALE_CODE_DEFAULT;
+
+
+    return undef unless defined $code;
+
+    #-------------------------------------------------------------------
+    # Make sure the code is in the right form before we use it
+    # to look up the corresponding script.
+    # We have to sprintf because the codes are given as 3-digits,
+    # with leading 0's. Eg 070 for Egyptian demotic.
+    #-------------------------------------------------------------------
+    if ($codeset == LOCALE_CODE_NUMERIC)
+    {
+        return undef if ($code =~ /\D/);
+        $code = sprintf("%.3d", $code);
+    }
+    else
+    {
+        $code = lc($code);
+    }
+
+    if (exists $CODES->[$codeset]->{$code})
+    {
+        return $CODES->[$codeset]->{$code};
+    }
+    else
+    {
+        #---------------------------------------------------------------
+        # no such script code!
+        #---------------------------------------------------------------
+        return undef;
+    }
+}
+
+sub script2code
+{
+    my $script = shift;
+    my $codeset = @_ > 0 ? shift : LOCALE_CODE_DEFAULT;
+
+
+    return undef unless defined $script;
+    $script = lc($script);
+    if (exists $COUNTRIES->[$codeset]->{$script})
+    {
+        return $COUNTRIES->[$codeset]->{$script};
+    }
+    else
+    {
+        #---------------------------------------------------------------
+        # no such script!
+        #---------------------------------------------------------------
+        return undef;
+    }
+}
+
+sub script_code2code
+{
+    (@_ == 3) or croak "script_code2code() takes 3 arguments!";
+
+    my $code = shift;
+    my $inset = shift;
+    my $outset = shift;
+    my $outcode;
+    my $script;
+
+
+    return undef if $inset == $outset;
+    $script = code2script($code, $inset);
+    return undef if not defined $script;
+    $outcode = script2code($script, $outset);
+    return $outcode;
+}
+
+#=======================================================================
+
+=head1 QUERY ROUTINES
+
+There are two functions which can be used to obtain a list of all codes,
+or all script names:
+
+=over 8
+
+=item C<all_script_codes( [ CODESET ] )>
+
+Returns a list of all script codes in the specified code set.
+The codes are guaranteed to be all lower-case,
+and not in any particular order.
+
+=item C<all_script_names( [ CODESET ] )>
+
+Returns a list of all script names for which there is a corresponding
+script code in the specified code set.
+The names are capitalised, and not returned in any particular order.
+
+=back
+
+=cut
+
+#=======================================================================
+sub all_script_codes
+{
+    my $codeset = @_ > 0 ? shift : LOCALE_CODE_DEFAULT;
+
+    return keys %{ $CODES->[$codeset] };
+}
+
+sub all_script_names
+{
+    my $codeset = @_ > 0 ? shift : LOCALE_CODE_DEFAULT;
+
+    return values %{ $CODES->[$codeset] };
+}
+
+
+#-----------------------------------------------------------------------
+
+=head1 EXAMPLES
+
+The following example illustrates use of the C<code2script()> function.
+The user is prompted for a script code, and then told the corresponding
+script name:
+
+    $| = 1;   # turn off buffering
+
+    print "Enter script code: ";
+    chop($code = <STDIN>);
+    $script = code2script($code, LOCALE_CODE_ALPHA_2);
+    if (defined $script)
+    {
+        print "$code = $script\n";
+    }
+    else
+    {
+        print "'$code' is not a valid script code!\n";
+    }
+
+
+=head1 KNOWN BUGS AND LIMITATIONS
+
+=over 4
+
+=item *
+
+When using C<script2code()>, the script name must currently appear
+exactly as it does in the source of the module. For example,
+
+    script2code('Egyptian hieroglyphs')
+
+will return B<eg>, as expected. But the following will both return C<undef>:
+
+    script2code('hieroglyphs')
+    script2code('Egyptian Hieroglypics')
+
+If there's need for it, a future version could have variants
+for script names.
+
+=item *
+
+In the current implementation, all data is read in when the
+module is loaded, and then held in memory.
+A lazy implementation would be more memory friendly.
+
+=back
+
+=head1 SEE ALSO
+
+=over 4
+
+=item Locale::Language
+
+ISO two letter codes for identification of language (ISO 639).
+
+=item Locale::Currency
+
+ISO three letter codes for identification of currencies
+and funds (ISO 4217).
+
+=item Locale::Country
+
+ISO codes for identification of countries (ISO 3166).
+
+=item ISO 15924
+
+The ISO standard which defines these codes.
+
+=item http://www.evertype.com/standards/iso15924/
+
+Home page for ISO 15924.
+
+
+=back
+
+
+=head1 AUTHOR
+
+Neil Bowers E<lt>neil@bowers.comE<gt>
+
+=head1 COPYRIGHT
+
+Copyright (c) 2002 Neil Bowers.
+
+This module is free software; you can redistribute it and/or
+modify it under the same terms as Perl itself.
+
+=cut
+
+#-----------------------------------------------------------------------
+
+#=======================================================================
+# initialisation code - stuff the DATA into the per-codeset lookup tables
+#=======================================================================
+{
+    my ($alpha2, $alpha3, $numeric);
+    my $script;
+
+
+    while (<DATA>)
+    {
+        next unless /\S/;
+        chop;
+        ($alpha2, $alpha3, $numeric, $script) = split(/:/, $_, 4);
+
+        $CODES->[LOCALE_CODE_ALPHA_2]->{$alpha2} = $script;
+        $COUNTRIES->[LOCALE_CODE_ALPHA_2]->{"\L$script"} = $alpha2;
+
+        if ($alpha3)
+        {
+            $CODES->[LOCALE_CODE_ALPHA_3]->{$alpha3} = $script;
+            $COUNTRIES->[LOCALE_CODE_ALPHA_3]->{"\L$script"} = $alpha3;
+        }
+
+        if ($numeric)
+        {
+            $CODES->[LOCALE_CODE_NUMERIC]->{$numeric} = $script;
+            $COUNTRIES->[LOCALE_CODE_NUMERIC]->{"\L$script"} = $numeric;
+        }
+
+    }
+}
+
+1;
+
+__DATA__
+am:ama:130:Aramaic
+ar:ara:160:Arabic
+av:ave:151:Avestan
+bh:bhm:300:Brahmi (Ashoka)
+bi:bid:372:Buhid
+bn:ben:325:Bengali
+bo:bod:330:Tibetan
+bp:bpm:285:Bopomofo
+br:brl:570:Braille
+bt:btk:365:Batak
+bu:bug:367:Buginese (Makassar)
+by:bys:550:Blissymbols
+ca:cam:358:Cham
+ch:chu:221:Old Church Slavonic
+ci:cir:291:Cirth
+cm:cmn:402:Cypro-Minoan
+co:cop:205:Coptic
+cp:cpr:403:Cypriote syllabary
+cy:cyr:220:Cyrillic
+ds:dsr:250:Deseret (Mormon)
+dv:dvn:315:Devanagari (Nagari)
+ed:egd:070:Egyptian demotic
+eg:egy:050:Egyptian hieroglyphs
+eh:egh:060:Egyptian hieratic
+el:ell:200:Greek
+eo:eos:210:Etruscan and Oscan
+et:eth:430:Ethiopic
+gl:glg:225:Glagolitic
+gm:gmu:310:Gurmukhi
+gt:gth:206:Gothic
+gu:guj:320:Gujarati
+ha:han:500:Han ideographs
+he:heb:125:Hebrew
+hg:hgl:420:Hangul
+hm:hmo:450:Pahawh Hmong
+ho:hoo:371:Hanunoo
+hr:hrg:410:Hiragana
+hu:hun:176:Old Hungarian runic
+hv:hvn:175:Kok Turki runic
+hy:hye:230:Armenian
+iv:ivl:610:Indus Valley
+ja:jap:930:(alias for Han + Hiragana + Katakana)
+jl:jlg:445:Cherokee syllabary
+jw:jwi:360:Javanese
+ka:kam:241:Georgian (Mxedruli)
+kh:khn:931:(alias for Hangul + Han)
+kk:kkn:411:Katakana
+km:khm:354:Khmer
+kn:kan:345:Kannada
+kr:krn:357:Karenni (Kayah Li)
+ks:kst:305:Kharoshthi
+kx:kax:240:Georgian (Xucuri)
+la:lat:217:Latin
+lf:laf:215:Latin (Fraktur variant)
+lg:lag:216:Latin (Gaelic variant)
+lo:lao:356:Lao
+lp:lpc:335:Lepcha (Rong)
+md:mda:140:Mandaean
+me:mer:100:Meroitic
+mh:may:090:Mayan hieroglyphs
+ml:mlm:347:Malayalam
+mn:mon:145:Mongolian
+my:mya:350:Burmese
+na:naa:400:Linear A
+nb:nbb:401:Linear B
+og:ogm:212:Ogham
+or:ory:327:Oriya
+os:osm:260:Osmanya
+ph:phx:115:Phoenician
+ph:pah:150:Pahlavi
+pl:pld:282:Pollard Phonetic
+pq:pqd:295:Klingon plQaD
+pr:prm:227:Old Permic
+ps:pst:600:Phaistos Disk
+rn:rnr:211:Runic (Germanic)
+rr:rro:620:Rongo-rongo
+sa:sar:110:South Arabian
+si:sin:348:Sinhala
+sj:syj:137:Syriac (Jacobite variant)
+sl:slb:440:Unified Canadian Aboriginal Syllabics
+sn:syn:136:Syriac (Nestorian variant)
+sw:sww:281:Shavian (Shaw)
+sy:syr:135:Syriac (Estrangelo)
+ta:tam:346:Tamil
+tb:tbw:373:Tagbanwa
+te:tel:340:Telugu
+tf:tfn:120:Tifnagh
+tg:tag:370:Tagalog
+th:tha:352:Thai
+tn:tna:170:Thaana
+tw:twr:290:Tengwar
+va:vai:470:Vai
+vs:vsp:280:Visible Speech
+xa:xas:000:Cuneiform, Sumero-Akkadian
+xf:xfa:105:Cuneiform, Old Persian
+xk:xkn:412:(alias for Hiragana + Katakana)
+xu:xug:106:Cuneiform, Ugaritic
+yi:yii:460:Yi
+zx:zxx:997:Unwritten language
+zy:zyy:998:Undetermined script
+zz:zzz:999:Uncoded script
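Review note on the Locale/Script.pm table above: the initialisation block can be tried in isolation to see how the colon-separated records turn into per-codeset lookup tables. The following is a minimal sketch and not the module's own code. The hash names %name_for and %code_for, the string keys used for the code sets, and the three-record sample are illustrative assumptions; the records themselves are copied from the table. It shows that two records sharing an alpha-2 code (both Phoenician and Pahlavi are listed as 'ph') collide, so the later record replaces the earlier one in the alpha-2 table, while the alpha-3 and numeric tables keep both.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's __DATA__ section: three records
# copied from the table, in the same alpha2:alpha3:numeric:name layout.
my @records = (
    'eg:egy:050:Egyptian hieroglyphs',
    'ph:phx:115:Phoenician',
    'ph:pah:150:Pahlavi',
);

# Illustrative names (not the module's own): one code-to-name table and
# one name-to-code table per code set.
my (%name_for, %code_for);

for my $record (@records) {
    # Limit of 4 keeps any ':' inside the script name intact.
    my ($alpha2, $alpha3, $numeric, $name) = split /:/, $record, 4;
    my %codes = (alpha2 => $alpha2, alpha3 => $alpha3, numeric => $numeric);

    for my $set (keys %codes) {
        $name_for{$set}{ $codes{$set} } = $name;        # code -> name
        $code_for{$set}{ lc $name }     = $codes{$set}; # name -> code
    }
}

print "$name_for{numeric}{'050'}\n";   # Egyptian hieroglyphs
print "$name_for{alpha3}{phx}\n";      # Phoenician
print "$name_for{alpha2}{ph}\n";       # Pahlavi (the later 'ph' record wins)
```

With the table as committed, this suggests that code2script('ph') would yield 'Pahlavi' rather than the 'Phoenician' shown in the SYNOPSIS, whereas lookups by alpha-3 or numeric code are unaffected.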