X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlunicode.pod;h=b05edab7b587459961b60fdd3d14086b0ecfb10c;hb=56da5a46eac515b5a165aaf05cb06f7bcdfd8e67;hp=d47e7dff6296cc893eefd125482aadefe02bb41d;hpb=1e8e823624ada1d9231e47a66cb2b9e3ab42701a;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod index d47e7df..b05edab 100644 --- a/pod/perlunicode.pod +++ b/pod/perlunicode.pod @@ -166,6 +166,10 @@ bytes and match against the character properties specified in the Unicode properties database. C<\w> can be used to match a Japanese ideograph, for instance. +(However, and as a limitation of the current implementation, using +C<\w> or C<\W> I a C<[...]> character class will still match +with byte semantics.) + =item * Named Unicode properties, scripts, and block ranges may be used like @@ -1087,8 +1091,9 @@ as Unicode (UTF-8), there still are many places where Unicode (in some encoding or another) could be given as arguments or received as results, or both, but it is not. -The following are such interfaces. For all of these Perl currently -(as of 5.8.1) simply assumes byte strings both as arguments and results. +The following are such interfaces. For all of these interfaces Perl +currently (as of 5.8.3) simply assumes byte strings both as arguments +and results, or UTF-8 strings if the C pragma has been used. One reason why Perl does not attempt to resolve the role of Unicode in this cases is that the answers are highly dependent on the operating