From: Jarkko Hietaniemi
Date: Sun, 16 Dec 2001 14:39:34 +0000 (+0000)
Subject: More documentation for the encoding pragma.
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=4ef28c72ed49aa6b9d3f54cb581962eceee8c546;p=p5sagit%2Fp5-mst-13.2.git

More documentation for the encoding pragma.

p4raw-id: //depot/perl@13719
---

diff --git a/lib/encoding.pm b/lib/encoding.pm
index 4938bfd..642726d 100644
--- a/lib/encoding.pm
+++ b/lib/encoding.pm
@@ -77,6 +77,13 @@ since the C<\xDF> on the left will B<not> be upgraded to C<\x{3af}>
 because of the C<\x{100}> on the left. You should not be mixing your
 legacy data and Unicode in the same string.
 
+This pragma also affects encoding of the 0x80..0xFF code point range:
+normally characters in that range are left as eight-bit bytes (unless
+they are combined with characters with code points 0x100 or larger,
+in which case all characters need to become UTF-8 encoded), but if
+the C<encoding> pragma is present, even the 0x80..0xFF range always
+gets UTF-8 encoded.
+
 If no encoding is specified, the environment variable L<PERL_ENCODING>
 is consulted. If that fails, "latin1" (ISO 8859-1) is assumed. If no
 encoding can be found, C error will be thrown.
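
The eight-bit versus UTF-8 distinction described in the added paragraph can be illustrated with a small sketch. It is not part of the patch and deliberately does not load the C<encoding> pragma itself; it only assumes the core Encode module and the built-in utf8::is_utf8 function, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00DF lies in the 0x80..0xFF range discussed in the patch.
my $char = "\xDF";

# Encoded as ISO 8859-1 (latin1), the character is a single eight-bit byte.
my $latin1 = encode('iso-8859-1', $char);

# Encoded as UTF-8, the same code point needs two bytes.
my $utf8 = encode('UTF-8', $char);

printf "latin1: %d byte(s), hex %s\n", length($latin1), unpack('H*', $latin1);
printf "utf8:   %d byte(s), hex %s\n", length($utf8),   unpack('H*', $utf8);

# On its own the string stays in eight-bit form; concatenating a code
# point of 0x100 or larger makes Perl store the whole string as UTF-8.
my $mixed = $char . "\x{100}";
printf "alone is UTF-8 flagged: %s\n", utf8::is_utf8($char)  ? 'yes' : 'no';
printf "mixed is UTF-8 flagged: %s\n", utf8::is_utf8($mixed) ? 'yes' : 'no';
```

Run as written, this should report one byte (df) for the latin1 form and two bytes (c39f) for the UTF-8 form, which is the size difference the pragma introduces for the 0x80..0xFF range.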