[ID 20020303.006] [Doc][utf8::up/down grade][use encoding] application for clarification
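
The difference between C<utf8::upgrade> and C<utf8::encode> that this patch
documents can be illustrated with a short script. This is only a sketch, not
part of the patch: the variable names are hypothetical, and it assumes a Perl
new enough to provide C<utf8::is_utf8> (5.8.1 or later, to the best of my
knowledge), which the patch itself does not mention.

```perl
use strict;
use warnings;

# "\xE9" is a single character with code point 233 (below 256),
# so Perl initially stores it as one octet with the UTF8 flag off.
my $upgraded = "\xE9";
my $encoded  = "\xE9";

# upgrade: same logical string, internal octets become UTF-8, flag on.
utf8::upgrade($upgraded);

# encode: the characters are replaced by their UTF-8 octets, flag stays off,
# so each octet now counts as its own character.
utf8::encode($encoded);

printf "upgraded: length=%d flag=%s\n",
    length($upgraded), utf8::is_utf8($upgraded) ? "on" : "off";
printf "encoded:  length=%d flag=%s\n",
    length($encoded), utf8::is_utf8($encoded) ? "on" : "off";

# prints:
# upgraded: length=1 flag=on
# encoded:  length=2 flag=off
```

Both scalars end up holding the same two octets internally; only the flag
decides whether they are read as one character or as two.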

diff --git a/lib/utf8.pm b/lib/utf8.pm
index 86456d5..7b1ef0d 100644
--- a/lib/utf8.pm
+++ b/lib/utf8.pm
@@ -97,6 +97,13 @@ into logical characters.
 
 =back
 
+C<utf8::encode> is like C<utf8::upgrade> but the UTF8 flag does not
+get turned on. See L<perlunicode> for more on the UTF8 flag and the C
+API functions C<sv_utf8_upgrade>, C<sv_utf8_downgrade>,
+C<sv_utf8_encode>, C<sv_utf8_decode> that are wrapped by the Perl
+functions C<utf8::upgrade>, C<utf8::downgrade>, C<utf8::encode> and
+C<utf8::decode>.
+
 =head1 SEE ALSO
 
 L<perlunicode>, L<bytes>
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 7ea8714..c170d2c 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -848,16 +848,16 @@ the following C APIs useful (see perlapi for details):
 
 =item *
 
-DO_UTF8(sv) returns true if the UTF8 flag is on and the bytes
-pragma is not in effect.  SvUTF8(sv) returns true is the UTF8
-flag is on, the bytes pragma is ignored.  Remember that UTF8
-flag being on does not mean that there would be any characters
-of code points greater than 255 or 127 in the scalar, or that
-there even are any characters in the scalar.  The UTF8 flag
-means that any characters added to the string will be encoded
-in UTF8 if the code points of the characters are greater than
-255.  Not "if greater than 127", since Perl's Unicode model
-is not to use UTF-8 until it's really necessary.
+DO_UTF8(sv) returns true if the UTF8 flag is on and the bytes pragma
+is not in effect.  SvUTF8(sv) returns true if the UTF8 flag is on; the
+bytes pragma is ignored.  The UTF8 flag being on does B<not> mean that
+there are any characters of code points greater than 255 (or 127) in the
+scalar, or that there even are any characters in the scalar.  What the
+UTF8 flag means is that the sequence of octets in the representation
+of the scalar should be treated as the UTF-8 encoding of a string.
+The UTF8 flag being off means that each octet in this representation
+encodes a single character with a code point in 0..255 within the string.
+Perl's Unicode model is not to use UTF-8 until it's really necessary.
 
 =item *