X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlhack.pod;h=c815177fa19a1896d1fe1e7d798ae4ff9ff4e9d4;hb=cc7ef057bab1579c0576d0a578186a6e5ae298e2;hp=2d05fc3841a07f6a1634a7042ea88e0cb8ade5ff;hpb=37c0adebee4df35675070b1f30e9578094f823d7;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlhack.pod b/pod/perlhack.pod
index 2d05fc3..c815177 100644
--- a/pod/perlhack.pod
+++ b/pod/perlhack.pod
@@ -875,7 +875,7 @@ C<"\0">.
 
 Line 13 manipulates the flags; since we've changed the PV, any IV or NV
 values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
-want to use the old IV of 10. C is a special UTF8-aware
+want to use the old IV of 10. C is a special UTF-8-aware
 version of C, a macro which turns off the IOK and NOK flags
 and turns on POK. The final C is a macro which launders tainted
 data if taint mode is turned on.
@@ -1439,7 +1439,7 @@ some things you'll need to know when fiddling with them.
 Let's now get on and create a simple patch. Here's something Larry
 suggested: if a C is the first active format during a C,
 (for example, C) then the resulting string should be treated as
-UTF8 encoded.
+UTF-8 encoded.
 
 How do we prepare to fix this up? First we locate the code in question -
 the C happens at runtime, so it's going to be in one of the F
@@ -1488,7 +1488,7 @@ of C:
      while (pat < patend) {
 
 Now if we see a C which was at the start of the string, we turn on
-the UTF8 flag for the output SV, C:
+the C flag for the output SV, C:
 
  +    if (datumtype == 'U' && pat==patcopy+1)
  +      SvUTF8_on(cat);
@@ -1574,10 +1574,10 @@ this text in the description of C:
 =item *
 
 If the pattern begins with a C, the resulting string will be treated
-as Unicode-encoded. You can force UTF8 encoding on in a string with an
-initial C, and the bytes that follow will be interpreted as Unicode
-characters. If you don't want this to happen, you can begin your pattern
-with C (or anything else) to force Perl not to UTF8 encode your
+as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
+with an initial C, and the bytes that follow will be interpreted as
+Unicode characters. If you don't want this to happen, you can begin your
+pattern with C (or anything else) to force Perl not to UTF-8 encode your
 string, and then follow this with a C somewhere in your pattern.
 
 All done. Now let's create the patch. F tells us