From: Anton Tagunov
Date: Mon, 4 Mar 2002 05:41:41 +0000 (+0300)
Subject: [ID 20020303.006] [Doc][utf8::up/down grade][use encoding] application for clarification
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=f1e62f77e429d3d8456955e82037ca65bbe65d82;p=p5sagit%2Fp5-mst-13.2.git

[ID 20020303.006] [Doc][utf8::up/down grade][use encoding] application for clarification

Date: Mon, 4 Mar 2002 05:41:41 +0300
Message-Id: <7916563907.20020304054141@motor.ru>

Subject: [ID 20020303.005] Patch perlinicode C API description
From: Anton Tagunov
Date: Mon, 4 Mar 2002 06:08:23 +0300
Message-Id: <2018165510.20020304060823@motor.ru>

p4raw-id: //depot/perl@14981
---

diff --git a/lib/utf8.pm b/lib/utf8.pm
index 86456d5..7b1ef0d 100644
--- a/lib/utf8.pm
+++ b/lib/utf8.pm
@@ -97,6 +97,13 @@ into logical characters.
 
 =back
 
+C<utf8::encode> is like C<utf8::upgrade> but the UTF8 flag does not
+get turned on.  See L<perlunicode> for more on the UTF8 flag and the C
+API functions C<sv_utf8_upgrade>, C<sv_utf8_downgrade>,
+C<sv_utf8_encode>, C<sv_utf8_decode> that are wrapped by the Perl
+functions C<utf8::upgrade>, C<utf8::downgrade>, C<utf8::encode> and
+C<utf8::decode>.
+
 =head1 SEE ALSO
 
 L<perlunicode>, L<bytes>
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 7ea8714..c170d2c 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -848,16 +848,16 @@ the following C APIs useful (see perlapi for details):
 
 =item *
 
-DO_UTF8(sv) returns true if the UTF8 flag is on and the bytes
-pragma is not in effect.  SvUTF8(sv) returns true is the UTF8
-flag is on, the bytes pragma is ignored.  Remember that UTF8
-flag being on does not mean that there would be any characters
-of code points greater than 255 or 127 in the scalar, or that
-there even are any characters in the scalar.  The UTF8 flag
-means that any characters added to the string will be encoded
-in UTF8 if the code points of the characters are greater than
-255.  Not "if greater than 127", since Perl's Unicode model
-is not to use UTF-8 until it's really necessary.
+DO_UTF8(sv) returns true if the UTF8 flag is on and the bytes pragma
+is not in effect.  SvUTF8(sv) returns true is the UTF8 flag is on, the
+bytes pragma is ignored.  The UTF8 flag being on does B<not> mean that
+there are any characters of code points greater than 255 (or 127) in the
+scalar, or that there even are any characters in the scalar.  What the
+UTF8 flag means is that the sequence of octets in the representation
+of the scalar should be treated as UTF-8 encoding of a string.
+The UTF8 flag being off means that each octet in this representation
+encodes a single character with codepoint 0..255 within the string.
+Perl's Unicode model is not to use UTF-8 until it's really necessary.
 
 =item *
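
Illustration (not part of the patch): the distinction the new text draws between utf8::upgrade and utf8::encode, and what the UTF8 flag does and does not say about a scalar, can be checked from Perl. This is a minimal sketch, assuming a Perl recent enough to provide utf8::is_utf8 (5.8.1 or later). The variable names $str, $up and $enc are made up for this example.

```perl
use strict;
use warnings;

# One character, code point 233. Perl stores it as a single octet
# and leaves the UTF8 flag off.
my $str = "\xE9";

# utf8::upgrade re-encodes the internal buffer as UTF-8 and turns the
# flag on. The string value is unchanged: still one character.
my $up = $str;
utf8::upgrade($up);

# utf8::encode replaces the characters with their UTF-8 octets and
# leaves the flag off, so the value becomes two characters, 0xC3 0xA9.
my $enc = $str;
utf8::encode($enc);

# Byte length of the upgraded copy's internal representation, as seen
# when the bytes pragma is in effect.
my $up_bytes = do { use bytes; length $up };

printf "%-8s flag=%d chars=%d\n", 'original', utf8::is_utf8($str) ? 1 : 0, length $str;
printf "%-8s flag=%d chars=%d bytes=%d\n", 'upgraded', utf8::is_utf8($up) ? 1 : 0, length $up, $up_bytes;
printf "%-8s flag=%d chars=%d\n", 'encoded', utf8::is_utf8($enc) ? 1 : 0, length $enc;
print "upgraded eq original: ", ($up eq $str ? "yes" : "no"), "\n";
```

The upgraded copy compares equal to the original even though its flag is on and it holds no code point above 255. The encoded copy is a different, two-character string whose flag stays off.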