From: karl williamson
Date: Wed, 3 Dec 2008 19:51:54 +0000 (-0700)
Subject: PATCH [perl #58430] Unicode::UCD::casefold() does not work as documented,
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=a452d459f5ff77fd7605dd9d2f58e9d44bc227cf;p=p5sagit%2Fp5-mst-13.2.git

PATCH [perl #58430] Unicode::UCD::casefold() does not work as documented,

Message-ID: <493745CA.6070300@khwilliamson.com>

And bump version to 0.27

p4raw-id: //depot/perl@35036
---

diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index a760d3c..dc3fede 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -3,7 +3,7 @@ package Unicode::UCD;
 use strict;
 use warnings;
 
-our $VERSION = '0.26';
+our $VERSION = '0.27';
 
 use Storable qw(dclone);
 
@@ -61,9 +61,20 @@ Unicode::UCD - Unicode character database
 
 =head1 DESCRIPTION
 
-The Unicode::UCD module offers a simple interface to the Unicode
+The Unicode::UCD module offers a series of functions that
+provide a simple interface to the Unicode
 Character Database.
 
+=head2 code point argument
+
+Some of the functions are called with a I<code point argument>, which is either
+a decimal or a hexadecimal scalar designating a Unicode code point, or C<U+>
+followed by hexadecimals designating a Unicode code point. In other words, if
+you want a code point to be interpreted as a hexadecimal number, you must
+prefix it with either C<0x> or C<U+>, because a string like e.g. C<123> will be
+interpreted as a decimal code point. Also note that Unicode is B<not> limited
+to 16 bits (the number of Unicode code points is open-ended, in theory
+unlimited): you may have more than 4 hexdigits.
 =cut
 
 my $UNICODEFH;
@@ -92,46 +103,136 @@ sub openunicode {
     return $f;
 }
 
-=head2 charinfo
+=head2 B<charinfo()>
 
     use Unicode::UCD 'charinfo';
 
     my $charinfo = charinfo(0x41);
 
-charinfo() returns a reference to a hash that has the following fields
-as defined by the Unicode standard:
-
-    key
-
-    code             code point with at least four hexdigits
-    name             name of the character IN UPPER CASE
-    category         general category of the character
-    combining        classes used in the Canonical Ordering Algorithm
-    bidi             bidirectional type
-    decomposition    character decomposition mapping
-    decimal          if decimal digit this is the integer numeric value
-    digit            if digit this is the numeric value
-    numeric          if numeric is the integer or rational numeric value
-    mirrored         if mirrored in bidirectional text
-    unicode10        Unicode 1.0 name if existed and different
-    comment          ISO 10646 comment field
-    upper            uppercase equivalent mapping
-    lower            lowercase equivalent mapping
-    title            titlecase equivalent mapping
-
-    block            block the character belongs to (used in \p{In...})
-    script           script the character belongs to
-
-If no match is found, a reference to an empty hash is returned.
-
-The C<block> property is the same as returned by charinfo(). It is
-not defined in the Unicode Character Database proper (Chapter 4 of the
-Unicode 3.0 Standard, aka TUS3) but instead in an auxiliary database
-(Chapter 14 of TUS3). Similarly for the C<script>