X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlunicode.pod;h=f1308beb1fba791e935ca8b448eae5e0b5463e77;hb=38cabd2daffcdb193c3134c3eb36b7ceaa4ca051;hp=46ea68216c60e62af7825f085b6d4251f2c49c5c;hpb=12ac2576dfc10fd43d91903e7602870c10b4f00f;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 46ea682..f1308be 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -42,6 +42,14 @@ is needed.> See L.
 You can also use the C pragma to change the default encoding
 of the data in your script; see L.
 
+=item BOM-marked scripts and UTF-16 scripts autodetected
+
+If a Perl script begins marked with the Unicode BOM (UTF-16LE, UTF16-BE,
+or UTF-8), or if the script looks like non-BOM-marked UTF-16 of either
+endianness, Perl will correctly read in the script as Unicode.
+(BOMless UTF-8 cannot be effectively recognized or differentiated from
+ISO 8859-1 or other eight-bit encodings.)
+
 =item C needed to upgrade non-Latin-1 byte strings
 
 By default, there is a fundamental asymmetry in Perl's unicode model:
@@ -563,25 +571,24 @@ that make the distinction.
 Most operators that deal with positions or lengths in a string will
 automatically switch to using character positions, including
 C, C, C, C, C, C,
-C, C, and C. Operators that
-specifically do not switch include C, C, and
-C. Operators that really don't care include
-operators that treats strings as a bucket of bits such as C,
-and operators dealing with filenames.
+C, C, and C. An operator that
+specifically does not switch is C. Operators that really don't
+care include operators that treat strings as a bucket of bits such as
+C, and operators dealing with filenames.
 
 =item *
 
-The C/C letters C and C do I change,
-since they are often used for byte-oriented formats. Again, think
-C in the C language.
+The C/C letter C does I change, since it is often
+used for byte-oriented formats. Again, think C in the C language.
 
 There is a new C specifier that converts between Unicode characters
-and code points.
+and code points. There is also a C specifier that is the equivalent of
+C/C and properly handles character values even if they are above 255.
 
 =item *
 
 The C and C functions work on characters, similar to
-C and C, I C and
+C and C, I C and
 C. C and C are methods for
 emulating byte-oriented C and C on Unicode strings.
 While these methods reveal the internal encoding of Unicode strings,