From: Anton Tagunov
Date: Wed, 6 Mar 2002 02:10:21 +0000 (+0300)
Subject: Re[2]: [ID 20020303.005] Patch ... C API description
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=3c1c801782be26f18d6483e5f7f5316152bb11aa;p=p5sagit%2Fp5-mst-13.2.git

Re[2]: [ID 20020303.005] Patch ... C API description

Message-ID: <11152782757.20020306021021@motor.ru>

(reworded)

p4raw-id: //depot/perl@15056
---

diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index c94f3d2..8a7a055 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -596,9 +596,21 @@ string are necessary UTF-8 encoded, or that any of the characters have
 code points greater than 0xFF (255) or even 0x80 (128), or that the
 string has any characters at all.  All the C<is_utf8()> does is to
 return the value of the internal "utf8ness" flag attached to the
-$string.  If the flag is on, characters added to that string will be
-automatically upgraded to UTF-8 (and even then only if they really
-need to be upgraded, that is, if their code point is greater than 0xFF).
+$string.  If the flag is off, the bytes in the scalar are interpreted
+as a single byte encoding.  If the flag is on, the bytes in the scalar
+are interpreted as the (multibyte, variable-length) UTF-8 encoded code
+points of the characters.  Bytes added to an UTF-8 encoded string are
+automatically upgraded to UTF-8.  If mixed non-UTF8 and UTF-8 scalars
+are merged (doublequoted interpolation, explicit concatenation, and
+printf/sprintf parameter substitution), the result will be UTF-8 encoded
+as if copies of the byte strings were upgraded to UTF-8: for example,
+
+    $a = "ab\x80c";
+    $b = "\x{100}";
+    print "$a = $b\n";
+
+the output string will be UTF-8-encoded "ab\x80c\x{100}\n", but note
+that C<$a> will stay single byte encoded.
 
 Sometimes you might really need to know the byte length of a string
 instead of the character length.  For that use the C<bytes> pragma
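
Note (not part of the patch): the behaviour described by the added paragraph can be checked with a short script. This is only a minimal sketch, assuming Perl 5.8.1 or later, where utf8::is_utf8() is available without loading the utf8 pragma and bytes::length() is provided by the bytes module. The variable names $byte_str, $wide_str and $joined are illustrative and do not appear in the patch.

```perl
use strict;
use warnings;
use bytes ();    # makes bytes::length() callable without enabling byte semantics

my $byte_str = "ab\x80c";                   # every code point <= 0xFF: flag stays off
my $wide_str = "\x{100}";                   # code point > 0xFF: flag is on
my $joined   = "$byte_str = $wide_str\n";   # interpolation merges both scalars

# Report the internal flag, the character length and the byte length of each scalar.
for my $pair ( [ byte_str => $byte_str ],
               [ wide_str => $wide_str ],
               [ joined   => $joined   ] ) {
    my ( $name, $value ) = @$pair;
    printf "%-8s flag=%d chars=%d bytes=%d\n",
        $name,
        utf8::is_utf8($value) ? 1 : 0,
        length($value),
        bytes::length($value);
}
```

With these inputs $joined should report the flag as on, with 9 characters stored in 11 bytes, because the copy of "\x80" is upgraded to a two-byte UTF-8 sequence. $byte_str itself should still report the flag as off, with 4 characters in 4 bytes.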