
=head1 NAME

Locale::Script - ISO codes for script identification (ISO 15924)

=head1 SYNOPSIS

    use Locale::Script;
    use Locale::Constants;

    $script  = code2script('ph');                 # 'Phoenician'
    $code    = script2code('Tibetan');            # 'bo'
    $code3   = script2code('Tibetan',
                           LOCALE_CODE_ALPHA_3);  # 'bod'
    $codeN   = script2code('Tibetan',
                           LOCALE_CODE_NUMERIC);  # 330

    @codes   = all_script_codes();
    @scripts = all_script_names();


=head1 DESCRIPTION

The C<Locale::Script> module provides access to the ISO
codes for identifying scripts, as defined in ISO 15924.
For example, Egyptian hieroglyphs are denoted by the two-letter
code 'eg', the three-letter code 'egy', and the numeric code 050.

You can access the codes either through the conversion routines
(described below) or through the two functions that return lists
of all script codes or all script names.

There are three code sets you can use for identifying
scripts:

=over 4

=item B<alpha-2>

Two-letter codes, such as 'bo' for Tibetan.
This code set is identified with the symbol C<LOCALE_CODE_ALPHA_2>.

=item B<alpha-3>

Three-letter codes, such as 'ell' for Greek.
This code set is identified with the symbol C<LOCALE_CODE_ALPHA_3>.

=item B<numeric>

Numeric codes, such as 410 for Hiragana.
This code set is identified with the symbol C<LOCALE_CODE_NUMERIC>.

=back

All of the routines take an optional additional argument
that specifies the code set to use.
If it is not specified, the two-letter code set is used.
This is partly for backwards compatibility (previous versions
of the Locale modules supported only the alpha-2 codes), and
partly because they are the most widely used codes.

The alpha-2 and alpha-3 codes are case-insensitive,
so you can use 'BO', 'Bo', 'bO' or 'bo' for Tibetan.
When a code is returned by one of the functions in
this module, it will always be lower-case.

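As a rough illustration of the case handling described above, a lookup
can fold the incoming code to lower case before consulting its table.
The sketch below is not the module's implementation: the C<%script_for>
hash and the C<lookup_script()> helper are hypothetical, and the table
holds only codes quoted in this document.

    use strict;
    use warnings;

    # Hypothetical alpha-2 table, limited to codes quoted in this document.
    my %script_for = (
        bo => 'Tibetan',
        ph => 'Phoenician',
        eg => 'Egyptian hieroglyphs',
    );

    # Fold the caller's code to lower case before the lookup, so that
    # 'BO', 'Bo', 'bO' and 'bo' all reach the same entry.
    sub lookup_script {
        my ($code) = @_;
        return undef unless defined $code;
        return $script_for{ lc $code };
    }

    print lookup_script('BO'), "\n";    # prints "Tibetan"

Because the key is folded with C<lc>, every spelling of the Tibetan
code listed above reaches the same entry, and an unknown code yields
C<undef>.
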
=head2 SPECIAL CODES

The standard defines various special codes.

=over 4

=item *

The standard reserves codes in the ranges B<qa> - B<qt>,
B<qaa> - B<qat>, and B<900> - B<919> for private use.

=item *

B<zx>, B<zxx>, and B<997> are the codes for unwritten languages.

=item *

B<zy>, B<zyy>, and B<998> are the codes for an undetermined script.

=item *

B<zz>, B<zzz>, and B<999> are the codes for an uncoded script.

=back

The private-use codes are not recognised by Locale::Script,
but the others are.

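Since private-use codes are not recognised, a caller may want to tell
them apart from codes that are simply invalid. The following is a
minimal sketch of such a check, based only on the ranges listed above.
The C<is_private_use_code()> helper is hypothetical and is not part of
Locale::Script.

    use strict;
    use warnings;

    # Returns 1 if $code lies in one of the private-use ranges listed
    # above (qa-qt, qaa-qat, 900-919), and 0 otherwise.
    sub is_private_use_code {
        my ($code) = @_;
        return 0 unless defined $code;
        return 1 if $code =~ /\Aq[a-t]\z/i;       # alpha-2: qa .. qt
        return 1 if $code =~ /\Aqa[a-t]\z/i;      # alpha-3: qaa .. qat
        return 1 if $code =~ /\A9[01][0-9]\z/;    # numeric: 900 .. 919
        return 0;
    }

    print is_private_use_code('qt') ? "private\n" : "not private\n";    # prints "private"

The alphabetic patterns use the C</i> flag because the alpha-2 and
alpha-3 codes are not case-dependent.
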

=head1 CONVERSION ROUTINES

There are three conversion routines: C<code2script()>, C<script2code()>,
and C<script_code2code()>.

=over 4

=item code2script( CODE, [ CODESET ] )

This function takes a script code and returns a string
containing the name of the script identified.
If the code is not a valid script code, as defined by ISO 15924,
then C<undef> will be returned. For example:

    $script = code2script('cy');   # Cyrillic

=item script2code( STRING, [ CODESET ] )

This function takes a script name and returns the corresponding
script code, if one exists.
If the argument cannot be identified as a script name,
then C<undef> will be returned. For example:

    $code = script2code('Gothic', LOCALE_CODE_ALPHA_3);
    # $code will now be 'gth'

The case of the script name is not important.
See the section L</KNOWN BUGS AND LIMITATIONS> below.

=item script_code2code( CODE, CODESET, CODESET )

This function takes a script code from one code set,
and returns the corresponding code from another code set.

    $alpha2 = script_code2code('jwi',
                 LOCALE_CODE_ALPHA_3 => LOCALE_CODE_ALPHA_2);
    # $alpha2 will now be 'jw' (Javanese)

If the code passed is not a valid script code in
the first code set, or if there isn't a code for the
corresponding script in the second code set,
then C<undef> will be returned.

=back

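The two C<undef> cases described for C<script_code2code()> can be made
concrete with a small stand-alone sketch. It is an illustration under
stated assumptions, not the module's code: the C<@scripts> table, the
string keys used for code sets, and the C<convert_code()> helper are
all hypothetical, and the data is limited to codes quoted in this
document.

    use strict;
    use warnings;

    # Hypothetical data: one record per script. The Javanese record
    # omits a numeric entry only because this document does not quote
    # one, which also lets the sketch show the "no code in the target
    # set" case.
    my @scripts = (
        { 'alpha-2' => 'bo', 'alpha-3' => 'bod', 'numeric' => '330' },
        { 'alpha-2' => 'eg', 'alpha-3' => 'egy', 'numeric' => '050' },
        { 'alpha-2' => 'jw', 'alpha-3' => 'jwi' },
    );

    sub convert_code {
        my ($code, $from, $to) = @_;
        for my $script (@scripts) {
            next unless defined $script->{$from};
            next unless lc($script->{$from}) eq lc($code);
            return $script->{$to};    # undef if the target set has no entry
        }
        return undef;                 # code not found in the source set
    }

    print convert_code('jwi', 'alpha-3', 'alpha-2'), "\n";    # prints "jw"

In this sketch, converting C<'jwi'> to the numeric set and converting
an unknown code such as C<'xxx'> both yield C<undef>, mirroring the two
failure cases described above.
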

=head1 QUERY ROUTINES

There are two functions that can be used to obtain a list of all codes,
or all script names:

=over 4

=item C<all_script_codes ( [ CODESET ] )>

Returns a list of all script codes in the specified code set
(the two-letter codes if no code set is given).
Alphabetic codes are guaranteed to be all lower-case,
and the list is not in any particular order.

=item C<all_script_names ( [ CODESET ] )>

Returns a list of all script names for which there is a corresponding
script code in the specified code set.
The names are capitalised, and not returned in any particular order.

=back

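Because neither list comes back in a defined order, code that displays
the results will usually want to sort them first. The sketch below
shows one way to do that. To stay self-contained it uses hypothetical
stand-ins (C<stub_all_script_codes()> and C<stub_code2script()>) that
hold only codes quoted in this document; with the module loaded, the
real functions would take their place.

    use strict;
    use warnings;

    # Hypothetical stand-ins for all_script_codes() and code2script().
    my %name_for = (
        ph => 'Phoenician',
        bo => 'Tibetan',
        eg => 'Egyptian hieroglyphs',
    );
    sub stub_all_script_codes { return keys %name_for }
    sub stub_code2script      { return $name_for{ lc $_[0] } }

    # Copy the unordered list into an array, then sort it.
    my @codes  = stub_all_script_codes();
    my @sorted = sort @codes;

    # Build one "code = name" line per script, in code order.
    my @report = map { "$_ = " . stub_code2script($_) } @sorted;
    print "$_\n" for @report;

Copying the list into an array first also sidesteps Perl's
C<sort SUBNAME LIST> syntax, under which a function call written
directly after C<sort> can be taken for a comparison routine.
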

=head1 EXAMPLES

The following example illustrates use of the C<code2script()> function.
The user is prompted for a script code, and then told the corresponding
script name:

    use Locale::Script;
    use Locale::Constants;

    $| = 1;   # turn off output buffering

    print "Enter script code: ";
    chomp($code = <STDIN>);   # remove the trailing newline
    $script = code2script($code, LOCALE_CODE_ALPHA_2);
    if (defined $script)
    {
        print "$code = $script\n";
    }
    else
    {
        print "'$code' is not a valid script code!\n";
    }


=head1 KNOWN BUGS AND LIMITATIONS

=over 4

=item *

When using C<script2code()>, the script name must currently appear
exactly as it does in the source of the module, apart from
differences in case. For example,

    script2code('Egyptian hieroglyphs')

will return B<eg>, as expected. But the following will both return C<undef>:

    script2code('hieroglyphs')
    script2code('Egyptian Hieroglypics')

If there is a need for it, a future version could support variant
script names.

=item *

In the current implementation, all data is read in when the
module is loaded, and then held in memory.
A lazy implementation would be more memory-friendly.

=back

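One possible shape for the lazy approach mentioned above is to defer
building the lookup table until the first call that needs it. This is
only a sketch of the idea, not a planned design for the module:
C<_load_table()>, C<lazy_code2script()>, and the two-entry table are
hypothetical.

    use strict;
    use warnings;

    my %name_for;       # stays empty until the first lookup
    my $loaded = 0;

    # Hypothetical loader. A real one would read the module's full
    # data set; this one fills in two codes quoted in this document.
    sub _load_table {
        %name_for = (bo => 'Tibetan', ph => 'Phoenician');
        $loaded   = 1;
    }

    sub lazy_code2script {
        my ($code) = @_;
        return undef unless defined $code;
        _load_table() unless $loaded;    # pay the loading cost once, on demand
        return $name_for{ lc $code };
    }

    print lazy_code2script('BO'), "\n";    # prints "Tibetan"

A program that loads the module but never looks up a code would then
never build the table.
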
=head1 SEE ALSO

=over 4

=item Locale::Language

ISO two-letter codes for identification of language (ISO 639).

=item Locale::Currency

ISO three-letter codes for identification of currencies
and funds (ISO 4217).

=item Locale::Country

ISO codes for identification of countries (ISO 3166).

=item ISO 15924

The ISO standard which defines these codes.

=item http://www.evertype.com/standards/iso15924/

Home page for ISO 15924.

=back


=head1 AUTHOR

Neil Bowers E<lt>neil@bowers.comE<gt>

=head1 COPYRIGHT

Copyright (c) 2002 Neil Bowers.

This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=cut
