=head1 NAME

perldelta - what is new for perl v5.9.2

=head1 DESCRIPTION

This document describes differences between the 5.9.1 and the 5.9.2
development releases. See L<perl590delta> and L<perl591delta> for the
differences between 5.8.0 and 5.9.1.

=head1 Incompatible Changes

=head2 Packing and UTF-8 strings

The semantics of pack() and unpack() regarding UTF-8-encoded data have been
changed. Processing is now by default character per character instead of
byte per byte on the underlying encoding. Notably, code that used things
like C<pack("a*", $string)> to see through the encoding of a string will now
simply get back the original $string. Packed strings can also get upgraded
during processing when you store upgraded characters. You can get the old
behaviour by using C<use bytes>.
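
As a minimal sketch of the new default (the variable names are purely
illustrative), packing a string that holds a character above 255 with
C<a*> is expected to return the same characters rather than their
encoded bytes:

    use strict;
    use warnings;

    # One character above 255 (U+263A), so the string is stored as UTF-8.
    my $string = "\x{263A}";

    # In the default character mode, "a*" copies characters, not bytes.
    my $packed = pack("a*", $string);

    print "round trip ok\n" if $packed eq $string;
    print length($packed), "\n";    # 1 (one character, not three bytes)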

To be consistent with pack(), the C<C0> in unpack() templates indicates
that the data is to be processed in character mode, i.e. character by
character; by contrast, C<U0> in unpack() indicates UTF-8 mode, where
the packed string is processed in its UTF-8-encoded Unicode form on a byte
by byte basis. This is reversed with regard to perl 5.8.X.
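
The two modes can be compared with a short sketch. The sample string and
variable names are assumptions made for illustration; C<%vX> prints the
ordinal of each element of the result, separated by dots:

    use strict;
    use warnings;

    # GREEK SMALL LETTER ALPHA followed by GREEK SMALL LETTER OMEGA.
    my $string = "\x{3B1}\x{3C9}";

    # C0: character mode, the two characters come back unchanged.
    my ($chars) = unpack("C0 a*", $string);

    # U0: UTF-8 mode, each byte of the encoded form becomes one element.
    my ($bytes) = unpack("U0 a*", $string);

    printf "%vX\n", $chars;    # 3B1.3C9
    printf "%vX\n", $bytes;    # CE.B1.CF.89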

Moreover, C<C0> and C<U0> can also be used in pack() templates to specify
character and byte modes, respectively.

C<C0> and C<U0> in the middle of a pack or unpack format now switch to the
specified encoding mode, honoring parenthesized groups. Previously, parens
were ignored.

Also, there is a new pack() character format, C<W>, which is intended to
replace the old C<C>. C<C> is kept for unsigned chars coded as bytes in
the string's internal representation. C<W> represents unsigned (logical)
character values, which can be greater than 255. It is therefore more
robust when dealing with potentially UTF-8-encoded data (as C<C> will wrap
values outside the range 0..255, and not respect the string encoding).
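
The difference between C<W> and C<C> can be sketched as follows. The
values are arbitrary, and the C<no warnings 'pack'> block only silences
the warning that C<C> emits when a value wraps:

    use strict;
    use warnings;

    # W stores the logical character value, even above 255.
    my $wide = pack("W", 0x263A);
    printf "%X\n", ord($wide);          # 263A

    # unpack with W returns logical character values as well.
    my @values = unpack("W*", "A\x{263A}");
    printf "%X %X\n", @values;          # 41 263A

    # C is limited to 0..255; larger values wrap (300 & 0xFF == 44).
    my $wrapped = do {
        no warnings 'pack';
        pack("C", 300);
    };
    print ord($wrapped), "\n";          # 44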

In practice, that means that pack formats are now encoding-neutral, except
C<C>.

For consistency, C<A> in unpack() formats now trims all Unicode whitespace
from the end of the string. Before perl 5.9.2, it used to strip only the
classical ASCII space characters.
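
A small sketch of the new trimming, assuming a string that ends in
non-ASCII whitespace (the sample data is illustrative only):

    use strict;
    use warnings;

    # "abc" followed by SPACE, EM SPACE and IDEOGRAPHIC SPACE.
    my $padded = "abc \x{2003}\x{3000}";

    # A* is expected to strip all three trailing whitespace characters.
    my ($trimmed) = unpack("A*", $padded);

    print length($trimmed), "\n";    # 3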

=head1 Core Enhancements

=head2 Regexp debug flags

A new variable, ${^RE_DEBUG_FLAGS}, controls what debug flags are in
effect for the regular expression engine when running under C<use re
"debug">. See L<re> for details.

=head1 Modules and Pragmata

=head1 Utility Changes

=head1 Documentation

=head1 Performance Enhancements

=head2 Trie optimization for regexp engine

The regexp engine is now able to factorize common prefixes and suffixes in
regular expressions. A new special variable, ${^RE_TRIE_MAXBUFF}, has been
added to fine tune this optimization.
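
The kind of pattern this targets is an alternation whose branches share a
prefix. The sketch below builds such a pattern at run time so that the
variable is already set when the pattern is compiled. The word list and
the buffer value are hypothetical, chosen only for illustration; this
document does not say which values are meaningful:

    use strict;
    use warnings;

    # Three alternatives sharing the prefix "foo".
    my @words = qw(foobar foobaz fooqux);

    my $re = do {
        # Arbitrary value; it applies only while this block compiles $re.
        local ${^RE_TRIE_MAXBUFF} = 1 << 20;
        my $alternation = join '|', map { quotemeta } @words;
        qr/\A(?:$alternation)\z/;
    };

    print "matched\n" if "foobaz" =~ $re;

The variable affects only how the pattern is compiled, so the pattern
should match the same strings whatever value is chosen.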

=head1 Installation and Configuration Improvements

=head1 Selected Bug Fixes

=head1 New or Changed Diagnostics

=head1 Changed Internals

=head1 Known Problems

=head2 Platform Specific Problems

=head1 Reporting Bugs

If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org/ . There may also be
information at http://www.perl.org/ , the Perl Home Page.

If you believe you have an unreported bug, please run the B<perlbug>
program included with your release. Be sure to trim your bug down
to a tiny but sufficient test case. Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.

=head1 SEE ALSO

The F<Changes> file for exhaustive details on what changed.

The F<INSTALL> file for how to build Perl.

The F<README> file for general stuff.

The F<Artistic> and F<Copying> files for copyright information.

=cut