# Time-stamp: "2004-01-11 18:35:34 AST"

=head1 NAME

Locale::Maketext - framework for localization

=head1 SYNOPSIS

  package MyProgram;
  use strict;
  use MyProgram::L10N;
   # ...which inherits from Locale::Maketext
  my $lh = MyProgram::L10N->get_handle() || die "What language?";
  ...
  # And then any messages your program emits, like:
  warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! );
  ...

=head1 DESCRIPTION

It is a common feature of applications (whether run directly,
or via the Web) for them to be "localized" -- i.e., for them
to present an English interface to an English-speaker, a German
interface to a German-speaker, and so on for all the languages they
are programmed with. Locale::Maketext
is a framework for software localization; it provides you with the
tools for organizing and accessing the bits of text and text-processing
code that you need for producing localized applications.

In order to make sense of Maketext and how all its
components fit together, you should probably
go read L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>, and
I<then> read the following documentation.

You may also want to read over the source for C<File::Findgrep>
and its constituent modules -- they are a complete (if small)
example application that uses Maketext.

=head1 QUICK OVERVIEW

The basic design of Locale::Maketext is object-oriented, and
Locale::Maketext is an abstract base class, from which you
derive a "project class".
The project class (with a name like "TkBocciBall::Localize",
which you then use in your module) is in turn the base class
for all the "language classes" for your project
(with names "TkBocciBall::Localize::it",
"TkBocciBall::Localize::en",
"TkBocciBall::Localize::fr", etc.).

A language class is
a class containing a lexicon of phrases as class data,
and possibly also some methods that are of use in interpreting
phrases in the lexicon, or otherwise dealing with text in that
language.

An object belonging to a language class is called a "language
handle"; it's typically a flyweight object.

The normal course of action is to call:

  use TkBocciBall::Localize;  # the localization project class
  $lh = TkBocciBall::Localize->get_handle();
   # Depending on the user's locale, etc., this will
   # make a language handle from among the classes available,
   # and any defaults that you declare.
  die "Couldn't make a language handle??" unless $lh;

From then on, you use the C<maketext> function to access
entries in whatever lexicon(s) belong to the language handle
you got. So, this:

  print $lh->maketext("You won!"), "\n";

...emits the right text for this language. If the object
in C<$lh> belongs to class "TkBocciBall::Localize::fr" and
%TkBocciBall::Localize::fr::Lexicon contains C<("You won!"
=E<gt> "Tu as gagnE<eacute>!")>, then the above
code happily tells the user "Tu as gagnE<eacute>!".

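As a minimal sketch of the pieces described above, the following script
declares a project class and two language classes inline and then asks for
a French handle. The C<MyGame::L10N> names are hypothetical, and the classes
share one file only to keep the sketch self-contained; in a real project each
class would normally live in its own module file.

```perl
use strict;
use warnings;

# Project class: the base for every language class in this project.
{
    package MyGame::L10N;            # hypothetical name
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}

# Language classes: each one carries a %Lexicon as class data.
{
    package MyGame::L10N::en;
    our @ISA = ('MyGame::L10N');
    our %Lexicon = ( 'You won!' => 'You won!' );
}
{
    package MyGame::L10N::fr;
    our @ISA = ('MyGame::L10N');
    our %Lexicon = ( 'You won!' => "Tu as gagn\xE9!" );
}

# Ask for French explicitly rather than relying on the environment.
my $lh = MyGame::L10N->get_handle('fr')
    or die "Couldn't make a language handle";

my $message = $lh->maketext('You won!');
print ref($lh), "\n";    # the language class the handle belongs to
print "$message\n";
```

Passing the language tag explicitly keeps the result independent of the
environment the script happens to run in.
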
=head1 METHODS

Locale::Maketext offers a variety of methods, which fall
into three categories:

=over

=item *

Methods to do with constructing language handles.

=item *

C<maketext> and other methods to do with accessing %Lexicon data
for a given language handle.

=item *

Methods that you may find it handy to use, from routines of
yours that you put in %Lexicon entries.

=back

These are covered in the following section.

=head2 Construction Methods

These are to do with constructing a language handle:

=over

=item *

$lh = YourProjClass->get_handle( ...langtags... ) || die "lg-handle?";

This tries loading classes based on the language-tags you give (like
C<("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")>), and for the first class
that succeeds, returns YourProjClass::I<language>->new().

If it runs thru the entire given list of language-tags, and finds no classes
for those exact terms, it then tries "superordinate" language classes.
So if no "en-US" class (i.e., YourProjClass::en_us)
was found, nor classes for anything else in that list, we then try
its superordinate, "en" (i.e., YourProjClass::en), and so on thru
the other language-tags in the given list: "es".
(The other language-tags in our example list
happen to have no superordinates.)

If none of those language-tags leads to loadable classes, we then
try classes derived from YourProjClass->fallback_languages() and
then if nothing comes of that, we use classes named by
YourProjClass->fallback_language_classes(). Then in the (probably
quite unlikely) event that that fails, we just return undef.

=item *

$lh = YourProjClass->get_handleB<()> || die "lg-handle?";

When C<get_handle> is called with an empty parameter list, magic happens:

If C<get_handle> senses that it's running in a program that was
invoked as a CGI, then it tries to get language-tags out of the
environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that
those were the languages passed as parameters to C<get_handle>.

Otherwise (i.e., if not a CGI), this tries various OS-specific ways
to get the language-tags for the current locale/language, and then
pretends that those were the value(s) passed to C<get_handle>.

Currently this OS-specific stuff consists of looking in the environment
variables "LANG" and "LANGUAGE"; and on MSWin machines (where those
variables are typically unused), this also tries using
the module Win32::Locale to get a language-tag for whatever language/locale
is currently selected in the "Regional Settings" (or "International"?)
Control Panel. I welcome further
suggestions for making this do the Right Thing under other operating
systems that support localization.

If you're using localization in an application that keeps a configuration
file, you might consider something like this in your project class:

  sub get_handle_via_config {
    my $class = $_[0];
    my $chosen_language = $Config_settings{'language'};
    my $lh;
    if($chosen_language) {
      $lh = $class->get_handle($chosen_language)
       || die "No language handle for \"$chosen_language\" or the like";
    } else {
      # Config file missing, maybe?
      $lh = $class->get_handle()
       || die "Can't get a language handle";
    }
    return $lh;
  }

=item *

$lh = YourProjClass::langname->new();

This constructs a language handle. You usually B<don't> call this
directly, but instead let C<get_handle> find a language class to C<use>
and to then call ->new on.

=item *

$lh->init();

This is called by ->new to initialize newly-constructed language handles.
If you define an init method in your class, remember that it's usually
considered a good idea to call $lh->SUPER::init in it (presumably at the
beginning), so that all classes get a chance to initialize a new object
however they see fit.

=item *

YourProjClass->fallback_languages()

C<get_handle> appends the return value of this to the end of
whatever list of languages you pass C<get_handle>. Unless
you override this method, your project class
will inherit Locale::Maketext's C<fallback_languages>, which
currently returns C<('i-default', 'en', 'en-US')>.
("i-default" is defined in RFC 2277).

This method (by having it return the name
of a language-tag that has an existing language class)
can be used for making sure that
C<get_handle> will always manage to construct a language
handle (assuming your language classes are in an appropriate
@INC directory). Or you can use the next method:

=item *

YourProjClass->fallback_language_classes()

C<get_handle> appends the return value of this to the end
of the list of classes it will try using. Unless
you override this method, your project class
will inherit Locale::Maketext's C<fallback_language_classes>,
which currently returns an empty list, C<()>.
By setting this to some value (namely, the name of a loadable
language class), you can be sure that
C<get_handle> will always manage to construct a language
handle.

=back

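The way C<get_handle> maps language-tags onto classes can be sketched as
follows. The C<MyApp::L10N> hierarchy is hypothetical and deliberately has no
C<en_us> class, so a request for "en-US" should end up at the superordinate
"en" class; the second call shows that tags are tried in the order given.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;             # hypothetical project class
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;         # note: there is no en_us class
    our @ISA = ('MyApp::L10N');
    our %Lexicon = ( 'Hello' => 'Hello' );
}
{
    package MyApp::L10N::fr;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = ( 'Hello' => 'Bonjour' );
}

# "en-US" has no class of its own, so its superordinate "en" is used.
my $lh_us = MyApp::L10N->get_handle('en-US') or die "lg-handle?";

# "sk" has no class in this project, so the next tag, "fr", wins.
my $lh_fr = MyApp::L10N->get_handle('sk', 'fr') or die "lg-handle?";

print ref($lh_us), "\n";
print ref($lh_fr), ': ', $lh_fr->maketext('Hello'), "\n";
```
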
=head2 The "maketext" Method

This is the most important method in Locale::Maketext:

  $text = $lh->maketext(I<key>, ...parameters for this phrase...);

This looks in the %Lexicon of the language handle
$lh and all its superclasses, looking
for an entry whose key is the string I<key>. Assuming such
an entry is found, various things then happen, depending on the
value found:

If the value is a scalarref, the scalar is dereferenced and returned
(and any parameters are ignored).

If the value is a coderef, we return &$value($lh, ...parameters...).

If the value is a string that I<doesn't> look like it's in Bracket Notation,
we return it (after replacing it with a scalarref, in its %Lexicon).

If the value I<does> look like it's in Bracket Notation, then we compile
it into a sub, replace the string in the %Lexicon with the new coderef,
and then we return &$new_sub($lh, ...parameters...).

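The cases above can be illustrated with one small lexicon that holds a
scalarref, a coderef and a Bracket Notation string. This is only a sketch:
the C<MyApp::L10N> classes and the keys are made up for the example.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;             # hypothetical project class
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = (
        # scalarref: returned as is, parameters ignored
        'literal'  => \ 'Brackets [like these] are left alone',
        # coderef: called with the handle and the parameters
        'shout'    => sub { my ($lh, @words) = @_; return uc join ' ', @words },
        # Bracket Notation string: compiled into a sub on first use
        'greeting' => 'Hello, [_1]!',
    );
}

my $lh = MyApp::L10N->get_handle('en') or die "lg-handle?";

my $literal  = $lh->maketext('literal', 'ignored');
my $shout    = $lh->maketext('shout', 'hey', 'you');
my $greeting = $lh->maketext('greeting', 'World');

# After the first lookup the string has been replaced by its compiled form.
my $compiled_type = ref $MyApp::L10N::en::Lexicon{'greeting'};

print "$literal\n$shout\n$greeting\n$compiled_type\n";
```
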
Bracket Notation is discussed in a later section. Note
that trying to compile a Bracket Notation string into a sub can throw
an exception if the string is not syntactically valid (say, by not
balancing brackets right).

Also, calling &$coderef($lh, ...parameters...) can throw any sort of
exception (if, say, code in that sub tries to divide by zero). But
a very common exception occurs when you have Bracket
Notation text that says to call a method "foo", but there is no such
method. (E.g., "You have [quaB<tn>,_1,ball]." will throw an exception
on trying to call $lh->quaB<tn>($_[1],'ball') -- you presumably meant
"quant".) C<maketext> catches these exceptions, but only to make the
error message more readable, at which point it rethrows the exception.

An exception I<may> be thrown if I<key> is not found in any
of $lh's %Lexicon hashes. What happens if a key is not found
is discussed in a later section, "Controlling Lookup Failure".

Note that you might find it useful in some cases to override
the C<maketext> method with an "after method", if you want to
translate encodings, or even scripts:

  package YrProj::zh_cn; # Chinese with PRC-style glyphs
  use base ('YrProj::zh_tw');  # Taiwan-style
  sub maketext {
    my $self = shift(@_);
    # Call the inherited method; calling $self->maketext here
    # would just recurse forever.
    my $value = $self->SUPER::maketext(@_);
    return Chineeze::taiwan2mainland($value);
  }

Or you may want to override it with something that traps
any exceptions, if that's critical to your program:

  sub maketext {
    my($lh, @stuff) = @_;
    my $out;
    eval { $out = $lh->SUPER::maketext(@stuff) };
    return $out unless $@;
    ...otherwise deal with the exception...
  }

Other than those two situations, I don't imagine that
it's useful to override the C<maketext> method. (If
you run into a situation where it is useful, I'd be
interested in hearing about it.)

=over

=item $lh->fail_with I<or> $lh->fail_with(I<PARAM>)

=item $lh->failure_handler_auto

These two methods are discussed in the section "Controlling
Lookup Failure".

=back

=head2 Utility Methods

These are methods that you may find it handy to use, generally
from %Lexicon routines of yours (whether expressed as
Bracket Notation or not).

=over

=item $language->quant($number, $singular)

=item $language->quant($number, $singular, $plural)

=item $language->quant($number, $singular, $plural, $negative)

This is generally meant to be called from inside Bracket Notation
(which is discussed later), as in

  "Your search matched [quant,_1,document]!"

It's for I<quantifying> a noun (i.e., saying how much of it there is,
while giving the correct form of it). The behavior of this method is
handy for English and a few other Western European languages, and you
should override it for languages where it's not suitable. You can feel
free to read the source, but the current implementation is basically
as this pseudocode describes:

  if $number is 0 and there's a $negative,
    return $negative;
  elsif $number is 1,
    return "1 $singular";
  elsif there's a $plural,
    return "$number $plural";
  else
    return "$number " . $singular . "s";
  #
  # ...except that we actually call numf to
  #  stringify $number before returning it.

So for English (with Bracket Notation)
C<"...[quant,_1,file]..."> is fine (for 0 it returns "0 files",
for 1 it returns "1 file", and for more it returns "2 files", etc.)

But for "directory", you'd want C<"[quant,_1,directory,directories]">
so that our elementary C<quant> method doesn't think that the
plural of "directory" is "directorys". And you might find that the
output may sound better if you specify a negative form, as in:

  "[quant,_1,file,files,No files] matched your query.\n"

Remember to keep in mind verb agreement (or adjectives too, in
other languages), as in:

  "[quant,_1,document] were matched.\n"

Because if _1 is one, you get "1 document B<were> matched".
An acceptable hack here is to do something like this:

  "[quant,_1,document was,documents were] matched.\n"

=item $language->numf($number)

This returns the given number formatted nicely according to
this language's conventions. Maketext's default method is
mostly to just take the normal string form of the number
(applying sprintf "%G" for only very large numbers), and then
to add commas as necessary. (Except that
we apply C<tr/,./.,/> if $language->{'numf_comma'} is true;
that's a bit of a hack that's useful for languages that express
two million as "2.000.000" and not as "2,000,000").

If you want anything fancier, consider overriding this with something
that uses L<Number::Format|Number::Format>, or does something else
entirely.

Note that numf is called by quant for stringifying all quantifying
numbers.

=item $language->sprintf($format, @items)

This is just a wrapper around Perl's normal C<sprintf> function.
It's provided so that you can use "sprintf" in Bracket Notation:

  "Couldn't access datanode [sprintf,%10s=~[%s~],_1,_2]!\n"

returning (if _1 is "Stuff" and _2 is "thangamabob")...

  Couldn't access datanode      Stuff=[thangamabob]!

=item $language->language_tag()

Currently this just takes the last bit of C<ref($language)>, turns
underscores to dashes, and returns it. So if $language is
an object of class Hee::HOO::Haw::en_us, $language->language_tag()
returns "en-us". (Yes, the usual representation for that language
tag is "en-US", but case is I<never> considered meaningful in
language-tag comparison.)

You may override this as you like; Maketext doesn't use it for
anything.

=item $language->encoding()

Currently this isn't used for anything, but it's provided
(with default value of
C<(ref($language) && $language-E<gt>{'encoding'}) or "iso-8859-1">)
as a sort of suggestion that it may be useful/necessary to
associate encodings with your language handles (whether on a
per-class or even per-handle basis).

=back

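To see the utility methods side by side, here is a small sketch that calls
them on one handle. The C<MyApp::L10N::en_us> class is hypothetical, and its
lexicon uses C<_AUTO> (described later in this document) so that each key
doubles as its own value.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;             # hypothetical project class
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en_us;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = ( _AUTO => 1 );   # keys serve as their own values
}

my $lh = MyApp::L10N->get_handle('en-US') or die "lg-handle?";

# quant with singular, plural and "negative" forms
my @matched = map {
    $lh->maketext('[quant,_1,file,files,No files] matched.', $_)
} (0, 1, 1234);
print "$_\n" for @matched;
# No files matched.
# 1 file matched.
# 1,234 files matched.

my $plain = $lh->numf(2000000);      # "2,000,000"
$lh->{'numf_comma'} = 1;             # swap "," and "." in numf output
my $swapped = $lh->numf(2000000);    # "2.000.000"

my $tag      = $lh->language_tag();  # "en-us"
my $encoding = $lh->encoding();      # "iso-8859-1" unless set on the handle
print "$plain $swapped $tag $encoding\n";
```
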
=head2 Language Handle Attributes and Internals

A language handle is a flyweight object -- i.e., it doesn't (necessarily)
carry any data of interest, other than just being a member of
whatever class it belongs to.

A language handle is implemented as a blessed hash. Subclasses of yours
can store whatever data you want in the hash. Currently the only hash
entry used by any crucial Maketext method is "fail", so feel free to
use anything else as you like.

B<Remember: Don't be afraid to read the Maketext source if there's
any point on which this documentation is unclear.> This documentation
is vastly longer than the module source itself.

=head1 LANGUAGE CLASS HIERARCHIES

These are Locale::Maketext's assumptions about the class
hierarchy formed by all your language classes:

=over

=item *

You must have a project base class, which you load, and
which you then use as the first argument in
the call to YourProjClass->get_handle(...). It should derive
(whether directly or indirectly) from Locale::Maketext.
It B<doesn't matter> how you name this class, although assuming this
is the localization component of your Super Mega Program,
good names for your project class might be
SuperMegaProgram::Localization, SuperMegaProgram::L10N,
SuperMegaProgram::I18N, SuperMegaProgram::International,
or even SuperMegaProgram::Languages or SuperMegaProgram::Messages.

=item *

Language classes are what YourProjClass->get_handle will try to load.
It will look for them by taking each language-tag (B<skipping> it
if it doesn't look like a language-tag or locale-tag!), turning it to
all lowercase, turning dashes to underscores, and appending it
to YourProjClass . "::". So this:

  $lh = YourProjClass->get_handle(
    'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized'
  );

will try loading the classes
YourProjClass::en_us (note lowercase!), YourProjClass::fr,
YourProjClass::kon,
YourProjClass::i_klingon
and YourProjClass::i_klingon_romanized. (And it'll stop at the
first one that actually loads.)

=item *

I assume that each language class derives (directly or indirectly)
from your project class, and also defines its @ISA, its %Lexicon,
or both. But I anticipate no dire consequences if these assumptions
do not hold.

=item *

Language classes may derive from other language classes (although they
should have "use I<Thatclassname>" or "use base qw(I<...classes...>)").
They may derive from the project
class. They may derive from some other class altogether. Or via
multiple inheritance, they may derive from any mixture of these.

=item *

I foresee no problems with having multiple inheritance in
your hierarchy of language classes. (As usual, however, Perl will
complain bitterly if you have a cycle in the hierarchy: i.e., if
any class is its own ancestor.)

=back

=head1 ENTRIES IN EACH LEXICON

A typical %Lexicon entry is meant to signify a phrase,
taking some number (0 or more) of parameters. An entry
is meant to be accessed via
a string I<key> in $lh->maketext(I<key>, ...parameters...),
which should return a string that is generally meant to
be used for "output" to the user -- regardless of whether
this actually means printing to STDOUT, writing to a file,
or putting into a GUI widget.

While the key must be a string value (since that's a basic
restriction that Perl places on hash keys), the value in
the lexicon can currently be of several types:
a defined scalar, scalarref, or coderef. The use of these is
explained above, in the section 'The "maketext" Method', and
Bracket Notation for strings is discussed in the next section.

While you can use arbitrary unique IDs for lexicon keys
(like "_min_larger_max_error"), it is often
useful if an entry's key is itself a valid value, like
this example error message:

  "Minimum ([_1]) is larger than maximum ([_2])!\n",

Compare this code that uses an arbitrary ID...

  die $lh->maketext( "_min_larger_max_error", $min, $max )
   if $min > $max;

...to this code that uses a key-as-value:

  die $lh->maketext(
   "Minimum ([_1]) is larger than maximum ([_2])!\n",
   $min, $max
  ) if $min > $max;

The second is, in short, more readable. In particular, it's obvious
that the number of parameters you're feeding to that phrase (two) is
the number of parameters that it I<wants> to be fed. (Since you see
_1 and a _2 being used in the key there.)

Also, once a project is otherwise
complete and you start to localize it, you can scrape together
all the various keys you use, and pass them to a translator; and then
the translator's work will go faster if what they are presented with is this:

  "Minimum ([_1]) is larger than maximum ([_2])!\n",
   => "", # fill in something here, Jacques!

rather than this more cryptic mess:

  "_min_larger_max_error"
   => "", # fill in something here, Jacques

I think that using keys that are themselves valid lexicon values
also makes the completed lexicon entries more readable:

  "Minimum ([_1]) is larger than maximum ([_2])!\n",
   => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n",

Also, having valid values as keys becomes very useful if you set
up an _AUTO lexicon. _AUTO lexicons are discussed in a later
section.

I almost always use keys that are themselves
valid lexicon values. One notable exception is when the value is
quite long. For example, to get the screenful of data that
a command-line program might return when given an unknown switch,
I often just use a brief, self-explanatory key such as "_USAGE_MESSAGE".
At that point I then immediately define that lexicon entry in the
ProjectClass::L10N::en lexicon (since English is always my "project
language"):

  '_USAGE_MESSAGE' => <<'EOSTUFF',
  ...long long message...
  EOSTUFF

and then I can use it as:

  getopt('oDI', \%opts) or die $lh->maketext('_USAGE_MESSAGE');

Incidentally,
note that each class's C<%Lexicon> inherits-and-extends
the lexicons in its superclasses. This is not because these are
special hashes I<per se>, but because you access them via the
C<maketext> method, which looks for entries across all the
C<%Lexicon> hashes in a language class I<and> all its ancestor classes.
(This is because the idea of "class data" isn't directly implemented
in Perl, but is instead left to individual class-systems to implement
as they see fit.)

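The inherit-and-extend behaviour just described can be sketched with two
language classes, one deriving from the other. The C<MyApp::L10N> names and
the lexicon entries are hypothetical.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;             # hypothetical project class
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = (
        'Color'   => 'Color',
        'Goodbye' => 'Goodbye',
    );
}
{
    package MyApp::L10N::en_gb;
    our @ISA = ('MyApp::L10N::en');  # a language class deriving from another
    our %Lexicon = ( 'Color' => 'Colour' );   # overrides just one entry
}

my $lh = MyApp::L10N->get_handle('en-GB') or die "lg-handle?";

my $own       = $lh->maketext('Color');    # found in en_gb's own %Lexicon
my $inherited = $lh->maketext('Goodbye');  # found in the ancestor's %Lexicon
print "$own, $inherited\n";
```

Only the entries that differ need to appear in the more specific class; the
rest are found by C<maketext> in the lexicons of its ancestors.
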
Note that you may have things stored in a lexicon
besides just phrases for output: for example, if your program
takes input from the keyboard, asking a "(Y/N)" question,
you probably need to know what the equivalent of "Y[es]/N[o]" is
in whatever language. You probably also need to know what
the equivalents of the answers "y" and "n" are. You can
store that information in the lexicon (say, under the keys
"~answer_y" and "~answer_n", and the long forms as
"~answer_yes" and "~answer_no", where "~" is just an ad-hoc
character meant to indicate to programmers/translators that
these are not phrases for output).

Or instead of storing this in the language class's lexicon,
you can (and, in some cases, really should) represent the same bit
of knowledge as code in a method in the language class. (That
leaves a tidy distinction between the lexicon as the things we
know how to I<say>, and the rest of the things in the language class
as things that we know how to I<do>.) Consider
this example of a processor for responses to French "oui/non"
questions:

  sub y_or_n {
    return undef unless defined $_[1] and length $_[1];
    my $answer = lc $_[1];  # smash case
    return 1 if $answer eq 'o' or $answer eq 'oui';
    return 0 if $answer eq 'n' or $answer eq 'non';
    return undef;
  }

...which you'd then call in a construct like this:

  my $response;
  until(defined $response) {
    print $lh->maketext("Open the pod bay door (y/n)? ");
    $response = $lh->y_or_n( get_input_from_keyboard_somehow() );
  }
  if($response) { $pod_bay_door->open()         }
  else          { $pod_bay_door->leave_closed() }

Other data worth storing in a lexicon might be things like
filenames for language-targeted resources:

  ...
  "_main_splash_png"
    => "/styles/en_us/main_splash.png",
  "_main_splash_imagemap"
    => "/styles/en_us/main_splash.incl",
  "_general_graphics_path"
    => "/styles/en_us/",
  "_alert_sound"
    => "/styles/en_us/hey_there.wav",
  "_forward_icon"
   => "left_arrow.png",
  "_backward_icon"
   => "right_arrow.png",
  # In some other languages, left equals
  #  BACKwards, and right is FOREwards.
  ...

You might want to do the same thing for expressing key bindings
or the like (since hardwiring "q" as the binding for the function
that quits a screen/menu/program is useful only if your language
happens to associate "q" with "quit"!)

=head1 BRACKET NOTATION

Bracket Notation is a crucial feature of Locale::Maketext. I mean
Bracket Notation to provide a replacement for the use of sprintf formatting.
Everything you do with Bracket Notation could be done with a sub block,
but bracket notation is meant to be much more concise.

Bracket Notation is like a miniature "template" system (in the sense
of L<Text::Template|Text::Template>, not in the sense of C++ templates),
where normal text is passed thru basically as is, but text in special
regions is specially interpreted. In Bracket Notation, you use square brackets ("[...]"),
not curly braces ("{...}") to note sections that are specially interpreted.

For example, here all the areas that are taken literally are underlined with
a "^", and all the in-bracket special regions are underlined with an X:

  "Minimum ([_1]) is larger than maximum ([_2])!\n",
   ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^

When that string is compiled from bracket notation into a real Perl sub,
it's basically turned into:

  sub {
    my $lh = $_[0];
    my @params = @_;
    return join '',
      "Minimum (",
      ...some code here...
      ") is larger than maximum (",
      ...some code here...
      ")!\n",
  }
  # to be called by $lh->maketext(KEY, params...)

In other words, text outside bracket groups is turned into string
literals. Text in brackets is rather more complex, and currently follows
these rules:

=over

=item *

Bracket groups that are empty, or which consist only of whitespace,
are ignored. (Examples: "[]", "[ ]", or a [ and a ] with returns
and/or tabs and/or spaces between them.)

Otherwise, each group is taken to be a comma-separated group of items,
and each item is interpreted as follows:

=item *

An item that is "_I<digits>" or "_-I<digits>" is interpreted as
$_[I<value>]. I.e., "_1" becomes $_[1], and "_-3" is interpreted
as $_[-3] (in which case @_ should have at least three elements in it).
Note that $_[0] is the language handle, and is typically not named
directly.

=item *

An item "_*" is interpreted to mean "all of @_ except $_[0]".
I.e., C<@_[1..$#_]>. Note that this is an empty list in the case
of calls like $lh->maketext(I<key>) where there are no
parameters (except $_[0], the language handle).

=item *

Otherwise, each item is interpreted as a string literal.

=back

The group as a whole is interpreted as follows:

=over

=item *

If the first item in a bracket group looks like a method name,
then that group is interpreted like this:

  $lh->that_method_name(
    ...rest of items in this group...
  ),

=item *

If the first item in a bracket group is "*", it's taken as shorthand
for the very commonly called "quant" method. Similarly, if the first
item in a bracket group is "#", it's taken to be shorthand for
"numf".

=item *

If the first item in a bracket group is the empty-string, or "_*"
or "_I<digits>" or "_-I<digits>", then that group is interpreted
as just the interpolation of all its items:

  join('',
    ...rest of items in this group...
  ),

Examples: "[_1]" and "[,_1]", which are synonymous; and
"C<[,ID-(,_4,-,_2,)]>", which compiles as
C<join "", "ID-(", $_[4], "-", $_[2], ")">.

=item *

Otherwise this bracket group is invalid. For example, in the group
"[!@#,whatever]", the first item C<"!@#"> is neither the empty-string,
"_I<number>", "_-I<number>", "_*", nor a valid method name; and so
Locale::Maketext will throw an exception if you try compiling an
expression containing this bracket group.

=back

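The rules above can be exercised directly. The following sketch feeds a few
bracket groups to C<maketext> thru a hypothetical C<MyApp::L10N::en> class
whose lexicon uses C<_AUTO> (described later in this document), so that each
key is compiled as its own value.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;             # hypothetical project class
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = ( _AUTO => 1 );   # keys serve as their own values
}

my $lh = MyApp::L10N->get_handle('en') or die "lg-handle?";

# Empty first item: the remaining items are simply joined together.
my $id    = $lh->maketext('[,ID-(,_4,-,_2,)]', 'a', 'b', 'c', 'd');
# "*" is shorthand for quant, "#" for numf.
my $files = $lh->maketext('[*,_1,file]', 3);
my $big   = $lh->maketext('[#,_1]', 1234567);
# "_*" is every parameter; "_-1" counts from the end of @_.
my $all   = $lh->maketext('All: [_*]', 'a', 'b');
my $last  = $lh->maketext('Last: [_-1]', 'a', 'b', 'c');

# A first item that is none of the above makes the group invalid.
my $invalid_ok = eval { $lh->maketext('[!@#,whatever]'); 1 };

print "$id\n$files\n$big\n$all\n$last\n";
print "invalid group rejected\n" unless $invalid_ok;
```
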
Note, incidentally, that items in each group are comma-separated,
not C</\s*,\s*/>-separated. That is, you might expect that this
bracket group:

  "Hoohah [foo, _1 , bar ,baz]!"

would compile to this:

  sub {
    my $lh = $_[0];
    return join '',
      "Hoohah ",
      $lh->foo( $_[1], "bar", "baz"),
      "!",
  }

But it actually compiles as this:

  sub {
    my $lh = $_[0];
    return join '',
      "Hoohah ",
      $lh->foo(" _1 ", " bar ", "baz"),  # note the <space> in " bar "
      "!",
  }

In the notation discussed so far, the characters "[" and "]" are given
special meaning, for opening and closing bracket groups, and "," has
a special meaning inside bracket groups, where it separates items in the
group. This raises the question of how you'd express a literal "[" or
"]" in a Bracket Notation string, and how you'd express a literal
comma inside a bracket group. For this purpose I've adopted "~" (tilde)
as an escape character: "~[" means a literal '[' character anywhere
in Bracket Notation (i.e., regardless of whether you're in a bracket
group or not), and ditto for "~]" meaning a literal ']', and "~," meaning
a literal comma. (Although "," means a literal comma outside of
bracket groups -- it's only inside bracket groups that commas are special.)

And on the off chance you need a literal tilde in a bracket expression,
you get it with "~~".

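The escapes, and the fact that whitespace around items is kept, can be
checked with a short sketch like this one. The C<MyApp::L10N::en> class is
hypothetical and uses an C<_AUTO> lexicon (described later in this document)
so that each key is compiled as its own value.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;             # hypothetical project class
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyApp::L10N::en;
    our @ISA = ('MyApp::L10N');
    our %Lexicon = ( _AUTO => 1 );   # keys serve as their own values
}

my $lh = MyApp::L10N->get_handle('en') or die "lg-handle?";

# "~[" and "~]" give literal brackets, "~~" gives a literal tilde.
my $brackets = $lh->maketext('Press ~[Enter~] to continue');
my $tilde    = $lh->maketext('cd ~~/src');

# Inside a group, "~," is a literal comma instead of an item separator.
my $comma    = $lh->maketext('[_1,~, ,_2]', 'a', 'b');

# Items are not trimmed: the literal " x " keeps both of its spaces.
my $spaces   = $lh->maketext('[sprintf,<%s>, x ]');

print "$brackets\n$tilde\n$comma\n$spaces\n";
```
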
815Currently, an unescaped "~" before a character
816other than a bracket or a comma is taken to mean just a "~" and that
f918d677 817character. I.e., "~X" means the same as "~~X" -- i.e., one literal tilde,
9378c581 818and then one literal "X". However, by using "~X", you are assuming that
819no future version of Maketext will use "~X" as a magic escape sequence.
820In practice this is not a great problem, since first off you can just
821write "~~X" and not worry about it; second off, I doubt I'll add lots
822of new magic characters to bracket notation; and third off, you
823aren't likely to want literal "~" characters in your messages anyway,
824since it's not a character with wide use in natural language text.
825
826Brackets must be balanced -- every openbracket must have
827one matching closebracket, and vice versa. So these are all B<invalid>:
828
829 "I ate [quant,_1,rhubarb pie."
830 "I ate [quant,_1,rhubarb pie[."
831 "I ate quant,_1,rhubarb pie]."
832 "I ate quant,_1,rhubarb pie[."

Currently, bracket groups do not nest. That is, you B<cannot> say:

  "Foo [bar,baz,[quux,quuux]]\n";

If you need a notation that's that powerful, use normal Perl:

  %Lexicon = (
    ...
    "some_key" => sub {
      my $lh = $_[0];
      join '',
        "Foo ",
        $lh->bar('baz', $lh->quux('quuux')),
        "\n",
    },
    ...
  );

Or write the "bar" method so you don't need to pass it the
output from calling quux.
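
Here is a runnable variation on that idea, offered only as a sketch. The
C<Demo::L10N> classes, the C<shouted_greeting> key, and the C<trim> and
C<emphasize> methods are all hypothetical; what it relies on is that a
coderef lexicon entry is called with the language handle followed by the
parameters given to C<maketext>.

```perl
use strict;
use warnings;

# Hypothetical project class.
package Demo::L10N;
use base 'Locale::Maketext';

# Hypothetical English language class with two helper methods.
package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';

our %Lexicon = (
    # One method's output feeds another, which Bracket Notation
    # cannot express because groups do not nest.
    'shouted_greeting' => sub {
        my ($lh, $name) = @_;
        return join '',
            'Hello, ',
            $lh->emphasize( $lh->trim($name) ),
            "!\n";
    },
);

sub trim      { my ($lh, $text) = @_; $text =~ s/^\s+|\s+$//g; return $text }
sub emphasize { my ($lh, $text) = @_; return uc $text }

package main;

my $lh  = Demo::L10N->get_handle('en') or die "No language handle";
my $out = $lh->maketext('shouted_greeting', '  world ');
print $out;
```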

I do not anticipate that you will need (or particularly want)
to nest bracket groups, but you are welcome to email me with
convincing (real-life) arguments to the contrary.

=head1 AUTO LEXICONS

If maketext goes to look in an individual %Lexicon for an entry
for I<key> (where I<key> does not start with an underscore), and
sees none, B<but does see> an entry of "_AUTO" => I<some_true_value>,
then we actually define $Lexicon{I<key>} = I<key> right then and there,
and then use that value as if it had been there all
along. This happens before we even look in any superclass %Lexicons!

(This is meant to be somewhat like the AUTOLOAD mechanism in
Perl's function call system -- or, looked at another way,
like the L<AutoLoader|AutoLoader> module.)
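
The behavior can be observed directly. In this sketch (the C<Demo::L10N>
classes and the message text are hypothetical), the key is absent from the
lexicon before the first call and present afterwards, while a key with a
leading underscore is not auto-made and so fails.

```perl
use strict;
use warnings;

# Hypothetical project class and English language class.
package Demo::L10N;
use base 'Locale::Maketext';

package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh  = Demo::L10N->get_handle('en') or die "No language handle";
my $key = "File [_1] saved.\n";

# The entry does not exist until maketext looks for it.
my $before = exists $Demo::L10N::en::Lexicon{$key} ? 1 : 0;
my $text   = $lh->maketext($key, 'notes.txt');
my $after  = exists $Demo::L10N::en::Lexicon{$key} ? 1 : 0;

# Keys that start with "_" are immune to _AUTO, so this lookup throws.
my $underscore_ok = eval { $lh->maketext('_private_key'); 1 } ? 1 : 0;

print $text;
print "in lexicon before: $before, after: $after\n";
print "underscore key auto-made: $underscore_ok\n";
```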

I can picture all sorts of circumstances where you just
do not want lookup to be able to fail (since failing
normally means that maketext throws a C<die>, although
see the next section for greater control over that). But
here's one circumstance where _AUTO lexicons are meant to
be I<especially> useful:

As you're writing an application, you decide as you go what messages
you need to emit. Normally you'd go to write this:

  if(-e $filename) {
    go_process_file($filename)
  } else {
    print qq{Couldn't find file "$filename"!\n};
  }

but since you anticipate localizing this, you write:

  use ThisProject::I18N;
  my $lh = ThisProject::I18N->get_handle();
   # For the moment, assume that things are set up so
   # that we load class ThisProject::I18N::en
   # and that that's the class that $lh belongs to.
  ...
  if(-e $filename) {
    go_process_file($filename)
  } else {
    print $lh->maketext(
      qq{Couldn't find file "[_1]"!\n}, $filename
    );
  }

Now, right after you've just written the above lines, you'd
normally have to go open the file
ThisProject/I18N/en.pm, and immediately add an entry:

  "Couldn't find file \"[_1]\"!\n"
  => "Couldn't find file \"[_1]\"!\n",

But I consider that somewhat of a distraction from the work
of getting the main code working -- to say nothing of the fact
that I often have to play with the program a few times before
I can decide exactly what wording I want in the messages (which
in this case would require me to go changing three lines of code:
the call to maketext with that key, and then the two lines in
ThisProject/I18N/en.pm).

However, if you set "_AUTO => 1" in the %Lexicon in
ThisProject/I18N/en.pm (assuming that English (en) is
the language that all your programmers will be using for this
project's internal message keys), then you don't ever have to
go adding lines like this

  "Couldn't find file \"[_1]\"!\n"
  => "Couldn't find file \"[_1]\"!\n",

to ThisProject/I18N/en.pm, because if _AUTO is true there,
then just looking for an entry with the key "Couldn't find
file \"[_1]\"!\n" in that lexicon will cause it to be added,
with that value!

Note that the reason that keys that start with "_"
are immune to _AUTO isn't anything generally magical about
the underscore character -- I just wanted a way to have most
lexicon keys be autoable, except for possibly a few, and I
arbitrarily decided to use a leading underscore as a signal
to distinguish those few.

=head1 CONTROLLING LOOKUP FAILURE

If you call $lh->maketext(I<key>, ...parameters...),
and there's no entry I<key> in $lh's class's %Lexicon, nor
in the superclass %Lexicon hash, I<and> if we can't auto-make
I<key> (because either it starts with a "_", or because none
of its lexicons have C<_AUTO =E<gt> 1>), then we have
failed to find a normal way to maketext I<key>. What then
happens in these failure conditions depends on the $lh object's
"fail" attribute.

If the language handle has no "fail" attribute, maketext
will simply throw an exception (i.e., it calls C<die>, mentioning
the I<key> whose lookup failed, and naming the line number where
the calling $lh->maketext(I<key>,...) was).

If the language handle has a "fail" attribute whose value is a
coderef, then $lh->maketext(I<key>,...params...) gives up and calls:

  return $that_subref->($lh, $key, @params);

Otherwise, the "fail" attribute's value should be a string denoting
a method name, so that $lh->maketext(I<key>,...params...) can
give up with:

  return $lh->$that_method_name($key, @params);

The "fail" attribute can be accessed with the C<fail_with> method:

  # Set to a coderef:
  $lh->fail_with( \&failure_handler );

  # Set to a method name:
  $lh->fail_with( 'failure_method' );

  # Set to nothing (i.e., so failure throws a plain exception)
  $lh->fail_with( undef );

  # Get the current value
  $handler = $lh->fail_with();

Now, as to what you may want to do with these handlers: Maybe you'd
want to log what key failed for what class, and then die. Maybe
you don't like C<die> and instead you want to send the error message
to STDOUT (or wherever) and then merely C<exit()>.

Or maybe you don't want to C<die> at all! Maybe you could use a
handler like this:

  # Make all lookups fall back onto an English value,
  # but only after we log it for later fingerpointing.
  my $lh_backup = ThisProject->get_handle('en');
  open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!";
  sub lex_fail {
    my($failing_lh, $key, @params) = @_;
    print LEX_FAIL_LOG scalar(localtime), "\t",
       ref($failing_lh), "\t", $key, "\n";
    return $lh_backup->maketext($key,@params);
  }
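
A self-contained sketch of the same fallback idea follows. It makes several
assumptions for the sake of being runnable: the C<Demo::L10N> classes and
their lexicon entries are hypothetical, the handler is installed with
C<fail_with> as a coderef, and failures are recorded in an in-memory array
rather than a log file.

```perl
use strict;
use warnings;

# Hypothetical project class.
package Demo::L10N;
use base 'Locale::Maketext';

# Hypothetical English class: has both messages.
package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = (
    'Hello'          => 'Hello',
    'Goodbye, [_1].' => 'Goodbye, [_1].',
);

# Hypothetical French class: one message is still untranslated.
package Demo::L10N::fr;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Hello' => 'Bonjour' );

package main;

my $lh        = Demo::L10N->get_handle('fr') or die "No French handle";
my $lh_backup = Demo::L10N->get_handle('en') or die "No English handle";

# Record each failure, then fall back onto the English value.
my @missed;
$lh->fail_with( sub {
    my ($failing_lh, $key, @params) = @_;
    push @missed, join "\t", ref($failing_lh), $key;
    return $lh_backup->maketext($key, @params);
} );

my $hello   = $lh->maketext('Hello');
my $goodbye = $lh->maketext('Goodbye, [_1].', 'Ann');

print "$hello\n$goodbye\n";
print "missed: $_\n" for @missed;
```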

Some users have said that this whole mechanism of
having a "fail" attribute at all seems a rather pointless complication.
But I want Locale::Maketext to be usable for software projects of I<any>
scale and type; and different software projects have different ideas
of what the right thing is to do in failure conditions. I could simply
say that failure always throws an exception, and that if you want to be
careful, you'll just have to wrap every call to $lh->maketext in an
S<eval { }>. However, I want programmers to reserve the right (via
the "fail" attribute) to treat lookup failure as something other than
an exception of the same level of severity as a config file being
unreadable, or some essential resource being inaccessible.

One possibly useful value for the "fail" attribute is the method name
"failure_handler_auto". This is a method defined in the class
Locale::Maketext itself. You set it with:

  $lh->fail_with('failure_handler_auto');

Then when you call $lh->maketext(I<key>, ...parameters...) and
there's no I<key> in any of those lexicons, maketext gives up with

  return $lh->failure_handler_auto($key, @params);

But failure_handler_auto, instead of dying or anything, compiles
$key, caching it in

  $lh->{'failure_lex'}{$key} = $compiled

and then calls the compiled value, and returns that. (I.e., if
$key looks like bracket notation, $compiled is a sub, and we return
&{$compiled}(@params); but if $key is just a plain string, we just
return that.)

The effect of using "failure_handler_auto"
is like an AUTO lexicon, except that it 1) compiles $key even if
it starts with "_", and 2) you have a record in the new hashref
$lh->{'failure_lex'} of all the keys that have failed for
this object. This should avoid your program dying -- as long
as your keys aren't actually invalid as bracket code, and as
long as they don't try calling methods that don't exist.
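
A short sketch of that behavior, assuming hypothetical C<Demo::L10N>
classes whose lexicon deliberately lacks the keys being looked up:

```perl
use strict;
use warnings;

# Hypothetical project class and English language class (no _AUTO).
package Demo::L10N;
use base 'Locale::Maketext';

package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Hello' => 'Hello' );

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle";
$lh->fail_with('failure_handler_auto');

# Neither key is in the lexicon; the second even starts with "_".
my $text  = $lh->maketext('You have [quant,_1,message].', 2);
my $plain = $lh->maketext('_just a plain string');

# Every key that failed is recorded on the handle itself.
my @failed = sort keys %{ $lh->{'failure_lex'} || {} };

print "$text\n$plain\n";
print "failed key: $_\n" for @failed;
```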

"failure_handler_auto" may not be exactly what you want, but I
hope it at least shows you that maketext failure can be mitigated
in any number of very flexible ways. If you can formalize exactly
what you want, you should be able to express that as a failure
handler. You can even make it default for every object of a given
class, by setting it in that class's init:

  sub init {
    my $lh = $_[0];  # a newborn handle
    $lh->SUPER::init();
    $lh->fail_with('my_clever_failure_handler');
    return;
  }
  sub my_clever_failure_handler {
    ...your clever things here...
  }
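
Filled in as a runnable sketch, that might look like the following. The
class names, the C<echo_key_handler> method, and the C<demo_missing> slot
on the handle are all hypothetical; the handler here simply counts the
failure and returns the key unchanged.

```perl
use strict;
use warnings;

# Hypothetical project class: every new handle gets the same handler.
package Demo::L10N;
use base 'Locale::Maketext';

sub init {
    my $lh = $_[0];    # a newborn handle
    $lh->SUPER::init();
    $lh->fail_with('echo_key_handler');
    return;
}

# Called as a method, with the failed key and the original parameters.
sub echo_key_handler {
    my ($lh, $key, @params) = @_;
    $lh->{'demo_missing'}{$key}++;    # hypothetical bookkeeping slot
    return $key;
}

# Hypothetical English language class.
package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Hello' => 'Hello' );

package main;

my $lh      = Demo::L10N->get_handle('en') or die "No language handle";
my $handler = $lh->fail_with();
my $text    = $lh->maketext('Unknown phrase');

print "handler: $handler\n";
print "text: $text\n";
```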

=head1 HOW TO USE MAKETEXT

Here is a brief checklist on how to use Maketext to localize
applications:

=over

=item *

Decide what system you'll use for lexicon keys. If you insist,
you can use opaque IDs (if you're nostalgic for C<catgets>),
but I have better suggestions in the
section "Entries in Each Lexicon", above. Assuming you opt for
meaningful keys that double as values (like "Minimum ([_1]) is
larger than maximum ([_2])!\n"), you'll have to settle on what
language those should be in. For the sake of argument, I'll
call this English, specifically American English, "en-US".

=item *

Create a class for your localization project. This is
the name of the class that you'll use in the idiom:

  use Projname::L10N;
  my $lh = Projname::L10N->get_handle(...) || die "Language?";

Assuming you call your class Projname::L10N, create a class
consisting minimally of:

  package Projname::L10N;
  use base qw(Locale::Maketext);
  ...any methods you might want all your languages to share...

  # And, assuming you want the base class to be an _AUTO lexicon,
  # as is discussed a few sections up:
  %Lexicon = ( '_AUTO' => 1 );

  1;

=item *

Create a class for the language your internal keys are in. Name
the class after the language-tag for that language, in lowercase,
with dashes changed to underscores. Assuming your project's first
language is US English, you should call this Projname::L10N::en_us.
It should consist minimally of:

  package Projname::L10N::en_us;
  use base qw(Projname::L10N);
  %Lexicon = (
    '_AUTO' => 1,
  );
  1;

(For the rest of this section, I'll assume that this "first
language class" of Projname::L10N::en_us is an
_AUTO lexicon.)

=item *

Go and write your program. Everywhere in your program where
you would say:

  print "Foobar $thing stuff\n";

instead do it through maketext, using no variable interpolation in
the key:

  print $lh->maketext("Foobar [_1] stuff\n", $thing);

If you get tired of constantly saying C<print $lh-E<gt>maketext>,
consider making a functional wrapper for it, like so:

  use Projname::L10N;
  use vars qw($lh);
  $lh = Projname::L10N->get_handle(...) || die "Language?";
  sub pmt (@) { print( $lh->maketext(@_)) }
   # "pmt" is short for "Print MakeText"
  $Carp::Verbose = 1;
   # so if maketext fails, we see what made the call to pmt

Besides whole phrases meant for output, anything language-dependent
should be put into the class Projname::L10N::en_us,
whether as methods, or as lexicon entries -- this is discussed
in the section "Entries in Each Lexicon", above.

=item *

Once the program is otherwise done, and once its localization for
the first language works right (via the data and methods in
Projname::L10N::en_us), you can get together the data for translation.
If your first language lexicon isn't an _AUTO lexicon, then you already
have all the messages explicitly in the lexicon (or else you'd be
getting exceptions thrown when you call $lh->maketext to get
messages that aren't in there). But if you were (advisedly) lazy and are
using an _AUTO lexicon, then you've got to make a list of all the phrases
that you've so far been letting _AUTO generate for you. There are very
many ways to assemble such a list. The most straightforward is to simply
grep the source for every occurrence of "maketext" (or calls
to wrappers around it, like the above C<pmt> function), and to log the
following phrase.

=item *

You may at this point want to consider whether your base class
(Projname::L10N), from which all lexicons inherit (Projname::L10N::en,
Projname::L10N::es, etc.), should be an _AUTO lexicon. It may be true
that in theory, all needed messages will be in each language class;
but in the presumably unlikely or "impossible" case of lookup failure,
you should consider whether your program should throw an exception,
emit text in English (or whatever your project's first language is),
or some more complex solution as described in the section
"Controlling Lookup Failure", above.

=item *

Submit all messages/phrases/etc. to translators.

(You may, in fact, want to start with localizing to I<one> other language
at first, if you're not sure that you've properly abstracted the
language-dependent parts of your code.)

Translators may request clarification of the situation in which a
particular phrase is found. For example, in English we are entirely happy
saying "I<n> files found", regardless of whether we mean "I looked for files,
and found I<n> of them" or the rather distinct situation of "I looked for
something else (like lines in files), and along the way I saw I<n>
files." This may involve rethinking things that you thought quite clear:
should "Edit" on a toolbar be a noun ("editing") or a verb ("to edit")? Is
there already a conventionalized way to express that menu option, separate
from the target language's normal word for "to edit"?

In all cases where the very common phenomenon of quantification
(saying "I<N> files", for B<any> value of N)
is involved, each translator should make clear what dependencies the
number causes in the sentence. In many cases, dependency is
limited to words adjacent to the number, in places where you might
expect them ("I found the-?PLURAL I<N>
empty-?PLURAL directory-?PLURAL"), but in some cases there are
unexpected dependencies ("I found-?PLURAL ..."!) as well as long-distance
dependencies ("The I<N> directory-?PLURAL could not be deleted-?PLURAL"!).

Remind the translators to consider the case where N is 0:
"0 files found" isn't exactly natural-sounding in any language, but it
may be unacceptable in many -- or it may condition special
kinds of agreement (similar to English "I didN'T find ANY files").

Remember to ask your translators about numeral formatting in their
language, so that you can override the C<numf> method as
appropriate. Typical variables in number formatting are: what to
use as a decimal point (comma? period?); what to use as a thousands
separator (space? nonbreaking space? comma? period? small
middot? prime? apostrophe?); and even whether the so-called "thousands
separator" is actually for every third digit -- I've heard reports of
two hundred thousand being expressible as "2,00,000" for some Indian
(Subcontinental) languages, besides the less surprising "S<200 000>",
"200.000", "200,000", and "200'000". Also, using a set of numeral
glyphs other than the usual ASCII "0"-"9" might be appreciated, as via
C<tr/0-9/\x{0966}-\x{096F}/> for getting digits in Devanagari script
(for Hindi, Konkani, others).

The basic C<quant> method that Locale::Maketext provides should be
good for many languages. For some languages, it might be useful
to modify it (or its constituent C<numerate> method)
to take a plural form in the two-argument call to C<quant>
(as in "[quant,_1,files]") if
it's all-around easier to infer the singular form from the plural, than
to infer the plural form from the singular.

But for other languages (as is discussed at length
in L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>), simple
C<quant>/C<numerate> is not enough. For the particularly problematic
Slavic languages, what you may need is a method which you provide
with the number, the citation form of the noun to quantify, and
the case and gender that the sentence's syntax projects onto that
noun slot. The method would then be responsible for determining
what grammatical number that numeral projects onto its noun phrase,
and what case and gender it may override the normal case and gender
with; and then it would look up the noun in a lexicon providing
all needed inflected forms.

=item *

You may also wish to discuss with the translators the question of
how to relate different subforms of the same language tag,
considering how this reacts with C<get_handle>'s treatment of
these. For example, if a user accepts interfaces in "en, fr", and
you have interfaces available in "en-US" and "fr", what should
they get? You may wish to resolve this by establishing that "en"
and "en-US" are effectively synonymous, by having one class
zero-derive from the other.

For some languages this issue may never come up (Danish is rarely
expressed as "da-DK", but instead is just "da"). And for other
languages, the whole concept of a "generic" form may verge on
being uselessly vague, particularly for interfaces involving voice
media in forms of Arabic or Chinese.

=item *

Once you've localized your program/site/etc. for all desired
languages, be sure to show the result (whether live, or via
screenshots) to the translators. Once they approve, make every
effort to have it then checked by at least one other speaker of
that language. This holds true even when (or especially when) the
translation is done by one of your own programmers. Some
kinds of systems may be harder to find testers for than others,
depending on the amount of domain-specific jargon and concepts
involved -- it's easier to find people who can tell you whether
they approve of your translation for "delete this message" in an
email-via-Web interface, than to find people who can give you
an informed opinion on your translation for "attribute value"
in an XML query tool's interface.

=back
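
The advice in the checklist about overriding C<numf> can be sketched as
follows. This is only an illustration under stated assumptions: the
C<Demo::L10N> classes are hypothetical, the "en-IN" language class is a
made-up example, and the grouping rule (the last three digits, then pairs)
is a simple rendering of the "2,00,000" style mentioned above, not a
complete treatment of any language's conventions.

```perl
use strict;
use warnings;

# Hypothetical project class.
package Demo::L10N;
use base 'Locale::Maketext';

# Hypothetical language class for the tag "en-IN".
package Demo::L10N::en_in;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( _AUTO => 1 );

# Override numf: group the last three digits, then pairs of digits.
sub numf {
    my ($lh, $num) = @_;
    my ($sign, $int, $frac) = $num =~ /^([-+]?)(\d+)(\.\d+)?$/
        or return $num;    # leave anything unexpected untouched
    if (length $int > 3) {
        my $tail = substr($int, -3);
        my $head = substr($int, 0, -3);
        1 while $head =~ s/^(\d+)(\d{2})/$1,$2/;
        $int = "$head,$tail";
    }
    return $sign . $int . (defined $frac ? $frac : '');
}

package main;

my $lh = Demo::L10N->get_handle('en-IN') or die "No language handle";

# quant formats its number through numf, so the override applies here.
my $text = $lh->maketext("[quant,_1,rupee] transferred", 200000);
print "$text\n";
```

Because C<quant> formats its number by calling C<numf> on the handle,
overriding C<numf> in the language class is enough to change how every
quantified phrase in that language is rendered.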

=head1 SEE ALSO

I recommend reading all of these:

L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13> -- my I<The Perl
Journal> article about Maketext. It explains many important concepts
underlying Locale::Maketext's design, and gives some insight into why
Maketext is better than the plain old approach of having
message catalogs that are just databases of sprintf formats.

L<File::Findgrep|File::Findgrep> is a sample application/module
that uses Locale::Maketext to localize its messages. For a larger
internationalized system, see also L<Apache::MP3>.

L<I18N::LangTags|I18N::LangTags>.

L<Win32::Locale|Win32::Locale>.

RFC 3066, I<Tags for the Identification of Languages>,
is at http://sunsite.dk/RFC/rfc/rfc3066.html

RFC 2277, I<IETF Policy on Character Sets and Languages>
is at http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is
just things of interest to protocol designers, but it explains
some basic concepts, like the distinction between locales and
language-tags.

The manual for GNU C<gettext>. The gettext dist is available in
C<ftp://prep.ai.mit.edu/pub/gnu/> -- get
a recent gettext tarball and look in its "doc/" directory; there's
an easily browsable HTML version in there. The
gettext documentation asks lots of questions worth thinking
about, even if some of their answers are sometimes wonky,
particularly where they start talking about pluralization.

The Locale/Maketext.pm source. Observe that the module is much
shorter than its documentation!

=head1 COPYRIGHT AND DISCLAIMER

Copyright (c) 1999-2004 Sean M. Burke. All rights reserved.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.

=head1 AUTHOR

Sean M. Burke C<sburke@cpan.org>

=cut