[ 7] include Letters in word characters
[ 8] see UTR#21 Case Mappings: Perl implements 1:1 mappings
[ 9] see UTR#13 Unicode Newline Guidelines
- [10] should do ^ and $ also on \x{2028} and \x{2029}
-
+ [10] should do ^ and $ also on \x{85}, \x{2028} and \x{2029}
+ (should also affect <>, $., and script line numbers)
+
(*) Instead of [\u0370-\u03FF-[{UNASSIGNED}]] as suggested by the TR
18 you can use negated lookahead: to match currently assigned modern
Greek characters use for example
is C<off> (positive or negative) Unicode characters displaced from the
UTF-8 buffer C<s>.
+=item *
+
+pv_uni_display(dsv, spv, len, pvlim, flags) and sv_uni_display(dsv,
+ssv, pvlim, flags) are useful for debugging output of Unicode strings
+and scalars. Use them only for debugging: they display B<all>
+characters as hexadecimal code points.
+
+=item *
+
+ibcmp_utf8(s1, u1, len1, s2, u2, len2) can be used to compare two
+strings case-insensitively in Unicode. (For case-sensitive
+comparisons you can just use memEQ() and memNE() as usual.)
+
=back
For more information, see L<perlapi>, and F<utf8.c> and F<utf8.h>