5 I18N::Collate - compare 8-bit scalar data according to the current locale
10 setlocale(LC_COLLATE, 'locale-of-your-choice');
11 $s1 = new I18N::Collate "scalar_data_1";
12 $s2 = new I18N::Collate "scalar_data_2";
16 This module provides you with objects that will collate
17 according to your national character set, provided that the
18 POSIX setlocale() function is supported on your system.
20 You can compare $s1 and $s2 above with
24 to extract the data itself, you'll need a dereference: $$s1
26 This uses POSIX::setlocale(). The basic collation conversion is done by
27 strxfrm() which terminates at NUL characters being a decent C routine.
28 collate_xfrm() handles embedded NUL characters gracefully. Due to C<cmp>
29 and overload magic, C<lt>, C<le>, C<eq>, C<ge>, and C<gt> work also. The
30 available locales depend on your operating system; try whether C<locale
31 -a> shows them or man pages for "locale" or "nlsinfo" or
32 the direct approach C<ls /usr/lib/nls/loc> or C<ls
33 /usr/lib/nls>. Not all the locales that your vendor supports
34 are necessarily installed: please consult your operating system's
35 documentation and possibly your local system administration.
37 The locale names are probably something like
38 C<"xx_XX.(ISO)?8859-N"> or C<"xx_XX.(ISO)?8859N">, for example
39 C<"fr_CH.ISO8859-1"> is the Swiss (CH) variant of French (fr),
40 ISO Latin (8859) 1 (-1) which is the Western European character set.
46 # Author: Jarkko Hietaniemi <Jarkko.Hietaniemi@hut.fi>
47 # Helsinki University of Technology, Finland
49 # Acks: Guy Decoux <decoux@moulon.inra.fr> understood
50 # overloading magic much deeper than I and told
51 # how to cut the size of this code by more than half.
52 # (my first version did overload all of lt gt eq le ge cmp)
54 # Purpose: compare 8-bit scalar data according to the current locale
56 # Requirements: Perl5 POSIX::setlocale() and POSIX::strxfrm()
58 # Exports: setlocale 1)
63 # Usage: use I18N::Collate;
64 # setlocale(LC_COLLATE, 'locale-of-your-choice'); # 4)
65 # $s1 = new I18N::Collate "scalar_data_1";
66 # $s2 = new I18N::Collate "scalar_data_2";
68 # now you can compare $s1 and $s2: $s1 le $s2
69 # to extract the data itself, you need to deref: $$s1
72 # 1) this uses POSIX::setlocale
73 # 2) the basic collation conversion is done by strxfrm() which
74 # terminates at NUL characters being a decent C routine.
75 # collate_xfrm handles embedded NUL characters gracefully.
76 # 3) due to cmp and overload magic, lt le eq ge gt work also
77 # 4) the available locales depend on your operating system;
78 # try whether "locale -a" shows them or man pages for
79 # "locale" or "nlsinfo" work or the more direct
80 # approach "ls /usr/lib/nls/loc" or "ls /usr/lib/nls".
81 # Not all the locales that your vendor supports
82 # are necessarily installed: please consult your
83 # operating system's documentation.
84 # The locale names are probably something like
85 # 'xx_XX.(ISO)?8859-N' or 'xx_XX.(ISO)?8859N',
86 # for example 'fr_CH.ISO8859-1' is the Swiss (CH)
87 # variant of French (fr), ISO Latin (8859) 1 (-1)
88 # which is the Western European character set.
90 # Updated: 19960104 1946 GMT
94 use POSIX qw(strxfrm LC_COLLATE);
99 @EXPORT = qw(collate_xfrm setlocale LC_COLLATE);
107 sub new { my $new = $_[1]; bless \$new }
110 my ($category, $locale) = @_[0,1];
112 POSIX::setlocale($category, $locale) if (defined $category);
113 # the current $LOCALE
114 $LOCALE = $locale || $ENV{'LC_COLLATE'} || $ENV{'LC_ALL'} || '';
120 $C->{$LOCALE}->{$s} = collate_xfrm($s)
121 unless (defined $C->{$LOCALE}->{$s}); # cache when met
130 for (split(/(\000+)/, $s)) {
131 $x .= (/^\000/) ? $_ : strxfrm("$_\000");
138 &C($_[0]) cmp &C($_[1]);
143 &I18N::Collate::setlocale();
145 1; # keep require happy