as horribly unfriendly.
Or maybe we could use regular expressions:
-
+
while (<>) {
my($date, $desc, $income, $expend) =
m|(\d\d/\d\d/\d{4}) (.{27}) (.{7})(.*)|;
Hence, putting it all together:
- my($date,$description,$income,$expend) = unpack("A10xA27xA7A*", $_);
+ my($date,$description,$income,$expend) = unpack("A10xA27xA7xA*", $_);
Now, that's our data parsed. I suppose what we might want to do now is
total up our income and expenditure, and add another line to the end of
s! S! sizeof(short) $Config{shortsize}
i! I! sizeof(int) $Config{intsize}
l! L! sizeof(long) $Config{longsize}
- q! Q! sizeof(longlong) $Config{longlongsize}
+ q! Q! sizeof(long long) $Config{longlongsize}
The C<i!> and C<I!> codes aren't different from C<i> and C<I>; they are
tolerated for completeness' sake.
First, we note that this time-honored 16-bit CPU uses little-endian order,
and that's why the low order byte is stored at the lower address. To
-unpack such a (signed) short we'll have to use code C<v>. A repeat
+unpack such an (unsigned) short we'll have to use code C<v>. A repeat
count unpacks all 12 shorts:
my( $ip, $cs, $flags, $ax, $bx, $cd, $dx, $si, $di, $bp, $ds, $es ) =
$si, $di, $bp, $ds, $es ) =
unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );
+(The clumsy construction of the template can be avoided - just read on!)
+
We've taken some pains to construct the template so that it matches
the contents of our frame buffer. Otherwise we'd either get undefined values,
or C<unpack> could not unpack all. If C<pack> runs out of items, it will
you really code this!)
+=head2 Byte-order modifiers
+
+In the previous sections we've learned how to use C<n>, C<N>, C<v> and
+C<V> to pack and unpack integers with big- or little-endian byte-order.
+While this is nice, it's still rather limited because it leaves out all
+kinds of signed integers as well as 64-bit integers. For example, if you
+wanted to unpack a sequence of signed big-endian 16-bit integers in a
+platform-independent way, you would have to write:
+
+ my @data = unpack 's*', pack 'S*', unpack 'n*', $buf;
+
+This is ugly. As of Perl 5.9.2, there's a much nicer way to express your
+desire for a certain byte-order: the C<E<gt>> and C<E<lt>> modifiers.
+C<E<gt>> is the big-endian modifier, while C<E<lt>> is the little-endian
+modifier. Using them, we could rewrite the above code as:
+
+ my @data = unpack 's>*', $buf;
+
+As you can see, the "big end" of the arrow touches the C<s>, which is a
+nice way to remember that C<E<gt>> is the big-endian modifier. The same
+obviously works for C<E<lt>>, where the "little end" touches the code.
+
+You will probably find these modifiers even more useful if you have
+to deal with big- or little-endian C structures. Be sure to read
+L<"Packing and Unpacking C Structures"> for more on that.
+
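As a quick check of the equivalence described above, here is a minimal sketch that packs three sample values (chosen purely for illustration) as big-endian 16-bit integers and decodes them both ways. It assumes a Perl recent enough to support the C<E<gt>> modifier; the variable names are not from the original text.

```perl
use strict;
use warnings;

# Sample data (an assumption for this sketch): 0x0001, 0xFFFF, 0x8000
# stored as big-endian 16-bit values.
my $buf = pack 'n*', 1, 65535, 32768;

# The older three-step idiom: read unsigned big-endian shorts, repack them
# in native order, then reinterpret the native shorts as signed.
my @old = unpack 's*', pack 'S*', unpack 'n*', $buf;

# The same result with the big-endian modifier.
my @new = unpack 's>*', $buf;

print "@old\n";   # 1 -1 -32768
print "@new\n";   # 1 -1 -32768
```

Both lists come out identical because C<s> is always exactly 16 bits wide, so the only thing the modifier changes is the byte order used to read each value.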
=head2 Floating point Numbers
For packing floating point numbers you have the choice between the
-pack codes C<f> and C<d> which pack into (or unpack from) single-precision or
-double-precision representation as it is provided by your system. (There
+pack codes C<f>, C<d>, C<F> and C<D>. C<f> and C<d> pack into (or unpack
+from) single-precision or double-precision representation as it is provided
+by your system. If your system supports it, C<D> can be used to pack and
+unpack extended-precision floating point values (C<long double>), which
+can offer even more resolution than C<f> or C<d>. C<F> packs an C<NV>,
+which is the floating point type used by Perl internally. (There
is no such thing as a network representation for reals, so if you want
to send your real numbers across computer boundaries, you'd better stick
to ASCII representation, unless you're absolutely sure what's on the other
-end of the line.)
+end of the line. If you're feeling even more adventurous, you can also use
+the byte-order modifiers from the previous section on floating point codes.)
simply assigned to C<undef>, a convenient notation for "I don't care where
this goes".
- ($carry, undef, $parity, undef, $auxcarry, undef, $sign,
+ ($carry, undef, $parity, undef, $auxcarry, undef, $zero, $sign,
$trace, $interrupt, $direction, $overflow) =
split( //, unpack( 'b16', $status ) );
128, C<unpack> knows where to stop.
+=head1 Template Grouping
+
+Prior to Perl 5.8, repetitions of templates had to be made by
+C<x>-multiplication of template strings. Now there is a better way as
+we may use the pack codes C<(> and C<)> combined with a repeat count.
+The C<unpack> template from the Stack Frame example can simply
+be written like this:
+
+ unpack( 'v2 (vXXCC)5 v5', $frame )
+
+Let's explore this feature a little more. We'll begin with the equivalent of
+
+ join( '', map( substr( $_, 0, 1 ), @str ) )
+
+which returns a string consisting of the first character from each string.
+Using pack, we can write
+
+ pack( '(A)'.@str, @str )
+
+or, because a repeat count C<*> means "repeat as often as required",
+simply
+
+ pack( '(A)*', @str )
+
+(Note that the template C<A*> would only have packed C<$str[0]> in full
+length.)
+
+To pack dates stored as triplets ( day, month, year ) in an array C<@dates>
+into a sequence of byte, byte, short integer we can write
+
+ $pd = pack( '(CCS)*', map( @$_, @dates ) );
+
+To swap pairs of characters in a string (with even length) one could use
+several techniques. First, let's use C<x> and C<X> to skip forward and back:
+
+ $s = pack( '(A)*', unpack( '(xAXXAx)*', $s ) );
+
+We can also use C<@> to jump to an offset, with 0 being the position where
+we were when the last C<(> was encountered:
+
+ $s = pack( '(A)*', unpack( '(@1A @0A @2)*', $s ) );
+
+Finally, there is also an entirely different approach: unpacking
+big-endian shorts and packing them in the reverse byte order:
+
+ $s = pack( '(v)*', unpack( '(n)*', $s ) );
+
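The three pair-swapping techniques can be compared side by side. The following is a minimal sketch, assuming a Perl with C<()>-group support; the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $s = 'abcdef';   # sample input with even length (an assumption)

# 1. Skip forward with x and back with X inside each two-character group.
my $by_skip = pack( '(A)*', unpack( '(xAXXAx)*', $s ) );

# 2. Jump to offsets with @, counted from the start of the current group.
my $by_offset = pack( '(A)*', unpack( '(@1A @0A @2)*', $s ) );

# 3. Read big-endian shorts and write them back little-endian.
my $by_endian = pack( '(v)*', unpack( '(n)*', $s ) );

print "$by_skip $by_offset $by_endian\n";   # badcfe badcfe badcfe
```

The third variant treats each character pair as a 16-bit number, so it only makes sense for strings of single-byte characters.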
+
=head1 Lengths and Widths
=head2 String Lengths
# pack a message
my $msg = pack( 'Z*Z*CA*C', $src, $dst, length( $sm ), $sm, $prio );
-
+
# unpack fails - $prio remains undefined!
( $src, $dst, $len, $sm, $prio ) = unpack( 'Z*Z*CA*C', $msg );
So far, we've seen literals used as templates. If the list of pack
items doesn't have fixed length, an expression constructing the
-template has to be used. Here's an example:
-To store named string values in a way that can be conveniently parsed
-by a C program, we create a sequence of names and null terminated ASCII
-strings, with C<=> between the name and the value, followed by an
-additional delimiting null byte. Here's how:
+template is required (whenever, for some reason, C<()*> cannot be used).
+Here's an example: To store named string values in a way that can be
+conveniently parsed by a C program, we create a sequence of names and
+null terminated ASCII strings, with C<=> between the name and the value,
+followed by an additional delimiting null byte. Here's how:
- my $env = pack( 'A*A*Z*' x keys( %Env ) . 'C',
+ my $env = pack( '(A*A*Z*)' . keys( %Env ) . 'C',
map( { ( $_, '=', $Env{$_} ) } keys( %Env ) ), 0 );
Let's examine the cogs of this byte mill, one by one. There's the C<map>
call, creating the items we intend to stuff into the C<$env> buffer:
to each key (in C<$_>) it adds the C<=> separator and the hash entry value.
Each triplet is packed with the template code sequence C<A*A*Z*> that
-is multiplied with the number of keys. (Yes, that's what the C<keys>
+is repeated according to the number of keys. (Yes, that's what the C<keys>
function returns in scalar context.) To get the very last null byte,
we add a C<0> at the end of the C<pack> list, to be packed with C<C>.
(Attentive readers may have noticed that we could have omitted the 0.)
in the buffer before we can let C<unpack> rip it apart:
my $n = $env =~ tr/\0// - 1;
- my %env = map( split( /=/, $_ ), unpack( 'Z*' x $n, $env ) );
+ my %env = map( split( /=/, $_ ), unpack( "(Z*)$n", $env ) );
The C<tr> counts the null bytes. The C<unpack> call returns a list of
name-value pairs each of which is taken apart in the C<map> block.
+=head2 Counting Repetitions
+
+Rather than storing a sentinel at the end of a data item (or a list of items),
+we could precede the data with a count. Again, we pack keys and values of
+a hash, preceding each with an unsigned short length count, and up front
+we store the number of pairs:
+
+ my $env = pack( 'S(S/A* S/A*)*', scalar keys( %Env ), %Env );
+
+This simplifies the reverse operation as the number of repetitions can be
+unpacked with the C</> code:
+
+ my %env = unpack( 'S/(S/A* S/A*)', $env );
+
+Note that this is one of the rare cases where you cannot use the same
+template for C<pack> and C<unpack> because C<pack> can't determine
+a repeat count for a C<()>-group.
+
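A round trip makes the asymmetry between the two templates easier to see. This is only a sketch: the contents of C<%Env> are invented sample data, and the values are chosen without trailing spaces because C<A> strips those on unpacking.

```perl
use strict;
use warnings;

# Hypothetical sample data for this sketch.
my %Env = ( HOME => '/home/me', SHELL => '/bin/sh' );

# pack: the pair count is supplied explicitly, then the group repeats
# "as often as required" to consume the flattened hash.
my $env = pack( 'S(S/A* S/A*)*', scalar keys( %Env ), %Env );

# unpack: the leading S is read as the repeat count for the group.
my %env = unpack( 'S/(S/A* S/A*)', $env );

my $count = unpack( 'S', $env );
print "$count pairs\n";                       # 2 pairs
print "$_=$env{$_}\n" for sort keys %env;     # HOME=/home/me, then SHELL=/bin/sh
```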
+
=head1 Packing and Unpacking C Structures
In previous sections we have seen how to pack numbers and character
contain anything else, and therefore you already know all there is to it.
Sorry, no: read on, please.
+If you have to deal with a lot of C structures, and don't want to
+hack all your template strings manually, you'll probably want to have
+a look at the CPAN module C<Convert::Binary::C>. Not only can it parse
+your C source directly, but it also has built-in support for all the
+odds and ends described further on in this section.
+
=head2 The Alignment Pit
In the consideration of speed against memory requirements the balance
#define Pt(struct,field,tchar) \
printf( "@%d%s ", offsetof(struct,field), # tchar );
- int main(){
+ int main() {
Pt( gappy_t, fc1, c );
Pt( gappy_t, fs, s! );
Pt( gappy_t, fc2, c );
given a C<struct> type and one of its field names ("member-designator" in
C standardese).
+Neither using offsets nor adding C<x>'s to bridge the gaps is satisfactory.
+(Just imagine what happens if the structure changes.) What we really need
+is a way of saying "skip as many bytes as required to the next multiple of N".
+In fluent Templatese, you say this with C<x!N> where N is replaced by the
+appropriate value. Here's the next version of our struct packaging:
+
+ my $gappy = pack( 'c x!2 s c x!4 l!', $c1, $s, $c2, $l );
+
+That's certainly better, but we still have to know how long all the
+integers are, and portability is far away. Rather than C<2>,
+for instance, we want to say "however long a short is". This can be
+done by enclosing the appropriate pack code in brackets: C<[s]>. So, here's
+the very best we can do:
+
+ my $gappy = pack( 'c x![s] s c x![l!] l!', $c1, $s, $c2, $l );
+
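To see what C<x!N> actually contributes, one can compare buffer lengths with and without the alignment codes. The sketch below deliberately uses the fixed-size codes C<s> (16 bits) and C<l> (32 bits) instead of C<l!>, so the byte counts do not depend on the platform; the field values are arbitrary samples.

```perl
use strict;
use warnings;

my ( $c1, $s, $c2, $l ) = ( 1, 2, 3, 4 );   # sample values (assumption)

# No padding: 1 + 2 + 1 + 4 bytes.
my $packed = pack( 'c s c l', $c1, $s, $c2, $l );

# One pad byte before the short, three before the long.
my $aligned = pack( 'c x!2 s c x!4 l', $c1, $s, $c2, $l );

printf "%d %d\n", length $packed, length $aligned;   # 8 12

# The same template skips the padding again when unpacking.
my @fields = unpack( 'c x!2 s c x!4 l', $aligned );
print "@fields\n";                                   # 1 2 3 4
```

Whether 2 and 4 are the right alignment values for a given C compiler is a separate question, which is why the bracketed forms such as C<x![s]> are preferable in real templates.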
+
+=head2 Dealing with Endian-ness
+
+Now, imagine that we want to pack the data for a machine with a
+different byte-order. First, we'll have to figure out how big the data
+types on the target machine really are. Let's assume that the longs are
+32 bits wide and the shorts are 16 bits wide. You can then rewrite the
+template as:
+
+ my $gappy = pack( 'c x![s] s c x![l] l', $c1, $s, $c2, $l );
+
+If the target machine is little-endian, we could write:
+
+ my $gappy = pack( 'c x![s] s< c x![l] l<', $c1, $s, $c2, $l );
+
+This forces the short and the long members to be little-endian, and is
+just fine if you don't have too many struct members. But we could also
+use the byte-order modifier on a group and write the following:
+
+ my $gappy = pack( '( c x![s] s c x![l] l )<', $c1, $s, $c2, $l );
+
+This is not as short as before, but it makes it more obvious that we
+intend to have little-endian byte-order for a whole group, not only
+for individual template codes. It can also be more readable and easier
+to maintain.
+
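The per-code and per-group spellings should produce the same bytes. The following sketch checks that with arbitrary sample values and prints the buffer in hex; it assumes a Perl version that accepts byte-order modifiers on groups.

```perl
use strict;
use warnings;

my @vals = ( 1, 2, 3, 4 );   # sample values for c, s, c, l (assumption)

# Modifier on each multi-byte code.
my $each = pack( 'c x![s] s< c x![l] l<', @vals );

# Modifier on the whole group; single-byte codes are unaffected.
my $group = pack( '( c x![s] s c x![l] l )<', @vals );

my $hex = unpack( 'H*', $group );
print "$hex\n";                                     # 010002000300000004000000
print $each eq $group ? "same\n" : "different\n";   # same
```

In the hex dump the low-order byte of each short and long comes first, and the zero bytes after the two C<c> fields are the alignment padding.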
=head2 Alignment, Take 2
spacing - 16 bytes to a line:
my $i;
- print map { ++$i % 16 ? "$_ " : "$_\n" }
- unpack( 'H2' x length( $mem ), $mem ),
+ print map( ++$i % 16 ? "$_ " : "$_\n",
+ unpack( 'H2' x length( $mem ), $mem ) ),
length( $mem ) % 16 ? "\n" : '';