From: Rafael Garcia-Suarez
Date: Sun, 13 Mar 2005 21:14:36 +0000 (+0000)
Subject: Document pack changes in perldelta
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=a8cf0b1d6622369ce66f477c7ed494f6ff1fd35c;p=p5sagit%2Fp5-mst-13.2.git

Document pack changes in perldelta

p4raw-id: //depot/perl@24035
---

diff --git a/pod/perl592delta.pod b/pod/perl592delta.pod
index 2002132..fc251a8 100644
--- a/pod/perl592delta.pod
+++ b/pod/perl592delta.pod
@@ -10,6 +10,34 @@ differences between 5.8.0 and 5.9.1.
 
 =head1 Incompatible Changes
 
+=head2 Packing and UTF-8 strings
+
+The semantics of pack() and unpack() regarding UTF-8-encoded data has been
+clarified. Notably, code that
+uses C<pack("a*", $string)> to see through the encoding of a string will now
+simply return $string.
+
+To be consistent with pack(), the C<C0> in unpack() templates indicates
+that the data is to be processed in character mode, i.e. character by
+character; on the contrary, C<U0> in unpack() indicates UTF-8 mode, where
+the packed string is processed in its UTF-8-encoded Unicode form on a byte
+by byte basis. This is reversed with regard to perl 5.8.X.
+
+Moreover, C<C0> and C<U0> can also be used in pack() templates to specify
+respectively character and byte modes.
+
+C<C0> and C<U0> in the middle of a pack format now switch to the specified
+encoding mode, honoring parens grouping. Previously, parens were ignored.
+
+Also, there is a new pack() character format, C<W>, which is intended to
+replace the old C<C>. C<C> is kept for unsigned chars coded on eight bits.
+C<W> represents unsigned character values, which can be greater than 255.
+It is therefore more robust when dealing with potentially UTF-8-encoded
+data (as C<C> will wrap values outside the range 0..255).
+
+In practice, that means that pack formats are now encoding-neutral, except
+C<C>.
+
 =head1 Core Enhancements
 
 =head1 Modules and Pragmata