From: Jarkko Hietaniemi
Date: Sun, 11 May 2003 06:25:08 +0000 (+0000)
Subject: Clarify the doc (and the code) for Unicode code points.
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=dc0a4417816de4de16d412283906a2a3c2bbce9b;p=p5sagit%2Fp5-mst-13.2.git

Clarify the doc (and the code) for Unicode code points.

p4raw-id: //depot/perl@19481
---

diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index dcb478a..ec1c998 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -126,9 +126,9 @@ you will need also the compexcl(), casefold(), and casespec() functions.
 sub _getcode {
     my $arg = shift;
 
-    if ($arg =~ /^\d+$/) {
+    if ($arg =~ /^[1-9]\d*$/) {
         return $arg;
-    } elsif ($arg =~ /^(?:U\+|0x)?([[:xdigit:]]+)$/) {
+    } elsif ($arg =~ /^(?:[Uu]\+|0[xX])?([[:xdigit:]]+)$/) {
         return hex($1);
     }
 
@@ -457,9 +457,12 @@ any of the 256 code points in the Tibetan block).
 
 A I<code point argument> is either a decimal or a hexadecimal scalar
 designating a Unicode character, or C<U+> followed by hexadecimals
-designating a Unicode character. Note that Unicode is B<not> limited
-to 16 bits (the number of Unicode characters is open-ended, in theory
-unlimited): you may have more than 4 hexdigits.
+designating a Unicode character. In other words, if you want a code
+point to be interpreted as a hexadecimal number, you must prefix it
+with either C<0x> or C<U+>, because a string like e.g. C<123> will
+be interpreted as a decimal code point. Also note that Unicode is
+B<not> limited to 16 bits (the number of Unicode characters is
+open-ended, in theory unlimited): you may have more than 4 hexdigits.
 
 =head2 charinrange
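
For illustration only (not part of the commit): a minimal standalone sketch of how the patched matching behaves, assuming the two regular expressions on the "+" lines of the first hunk. The name parse_code_point is hypothetical, and the bare return for unrecognized input is an assumption, since the diff context does not show what follows the if/elsif.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the patched _getcode(); both patterns are
# copied from the "+" lines of the diff.
sub parse_code_point {
    my $arg = shift;

    if ($arg =~ /^[1-9]\d*$/) {
        # Decimal: digits only, no leading zero.
        return $arg;
    } elsif ($arg =~ /^(?:[Uu]\+|0[xX])?([[:xdigit:]]+)$/) {
        # Hexadecimal: optional U+/u+ or 0x/0X prefix, then hex digits.
        return hex($1);
    }

    return;    # assumption: anything else is rejected
}

# "123" is taken as decimal; the prefixed forms are taken as hexadecimal.
for my $arg (qw(123 0x123 U+123 u+263A 0X1F600)) {
    printf "%-8s => %d\n", $arg, parse_code_point($arg);
}
```

With these patterns, an unprefixed string that fails the decimal test still reaches the hexadecimal branch, so "FF" and "0123" would both be read as hex in this sketch. Only all-digit strings without a leading zero are read as decimal.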