From: Gurusamy Sarathy Date: Mon, 11 Oct 1999 20:28:32 +0000 (+0000) Subject: more pack/unpack documentation (from Ilya Zakharevich); changed X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=2b6c5635bc48c0253928a53128e6cda1b8ab68c0;p=p5sagit%2Fp5-mst-13.2.git more pack/unpack documentation (from Ilya Zakharevich); changed the behavior of 'Z*' and 'Z3' to always pack a trailing null byte; changed documentation to suit; added test p4raw-id: //depot/perl@4346 --- diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index 450dd4b..2dde78d 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -2724,8 +2724,13 @@ it differs from "use vars", which is package scoped.) =item pack TEMPLATE,LIST -Takes a list of values and packs it into a binary structure, -returning the string containing the structure. The TEMPLATE is a +Takes a LIST of values and converts it into a string using the rules +given by the TEMPLATE. The resulting string is the concatenation of +the converted values. Typically, each converted value looks +like its machine-level representation. For example, on 32-bit machines +a converted integer may be represented by a sequence of 4 bytes. + +The TEMPLATE is a sequence of characters that give the order and type of values, as follows: @@ -2733,8 +2738,8 @@ follows: A An ascii string, will be space padded. Z A null terminated (asciz) string, will be null padded. - b A bit string (ascending bit order, like vec()). - B A bit string (descending bit order). + b A bit string (ascending bit order inside each byte, like vec()). + B A bit string (descending bit order inside each byte). h A hex string (low nybble first). H A hex string (high nybble first). @@ -2802,18 +2807,46 @@ Each letter may optionally be followed by a number giving a repeat count. With all types except C<"a">, C<"A">, C<"Z">, C<"b">, C<"B">, C<"h">, C<"H">, and C<"P"> the pack function will gobble up that many values from the LIST. A C<*> for the repeat count means to use however many items are -left. +left, except for C<"@">, C<"x">, C<"X">, where it is equivalent +to C<"0">, and C<"u">, where it is equivalent to 1 (or 45, what is the +same). + +When used with C<"Z">, C<*> results in the addition of a trailing null +byte (so the packed result will be one longer than the byte C +of the item). + +The repeat count for C<"u"> is interpreted as the maximal number of bytes +to encode per line of output, with 0 and 1 replaced by 45. =item * The C<"a">, C<"A">, and C<"Z"> types gobble just one value, but pack it as a string of length count, padding with nulls or spaces as necessary. When unpacking, C<"A"> strips trailing spaces and nulls, C<"Z"> strips everything -after the first null, and C<"a"> returns data verbatim. +after the first null, and C<"a"> returns data verbatim. When packing, +C<"a">, and C<"Z"> are equivalent. + +If the value-to-pack is too long, it is truncated. If too long and an +explicit count is provided, C<"Z"> packs only C<$count-1> bytes, followed +by a null byte. Thus C<"Z"> always packs a trailing null byte under +all circumstances. =item * Likewise, the C<"b"> and C<"B"> fields pack a string that many bits long. +Each byte of the input field generates 1 bit of the result basing on +the least-signifant bit of each input byte, i.e., on C. +In particular, bytes C<"0"> and C<"1"> generate bits 0 and 1. + +Starting from the beginning of the input string, each 8-tuple of bytes +is converted to 1 byte of output. If the length of the input string +is not divisible by 8, the remainder is packed as if padded by 0s. +Similarly, during unpack()ing the "extra" bits are ignored. + +If the input string is longer than needed, extra bytes are ignored. +A C<*> for the repeat count of pack() means to use all the bytes of +the input field. On unpack()ing the bits are converted to a string +of C<"0">s and C<"1">s. =item * @@ -2827,7 +2860,7 @@ responsible for ensuring the string is not a temporary value (which can potentially get deallocated before you get around to using the packed result). The C<"P"> type packs a pointer to a structure of the size indicated by the length. A NULL pointer is created if the corresponding value for C<"p"> or -C<"P"> is C. +C<"P"> is C, similarly for unpack(). =item * @@ -2957,6 +2990,12 @@ sequences of bytes. A comment in a TEMPLATE starts with C<#> and goes to the end of line. +=item * + +If TEMPLATE requires more arguments to pack() than actually given, pack() +assumes additional C<""> arguments. If TEMPLATE requires less arguments +to pack() than actually given, extra arguments are ignored. + =back Examples: @@ -4869,8 +4908,14 @@ If LIST is omitted, uses C<$_>. =item unpack TEMPLATE,EXPR C does the reverse of C: it takes a string -representing a structure and expands it out into a list of values. +and expands it out into a list of values. (In scalar context, it returns merely the first value produced.) + +The string is broken into chunks described by the TEMPLATE. Each chunk +is converted separately to a value. Typically, either the string is a result +of C, or the bytes of the string represent a C structure of some +kind. + The TEMPLATE has the same format as in the C function. Here's a subroutine that does substring: @@ -4883,9 +4928,14 @@ and then there's sub ordinal { unpack("c",$_[0]); } # same as ord() -In addition, you may prefix a field with a %EnumberE to indicate that +In addition to fields allowed in pack(), you may prefix a field with +a %EnumberE to indicate that you want a EnumberE-bit checksum of the items instead of the items -themselves. Default is a 16-bit checksum. For example, the following +themselves. Default is a 16-bit checksum. Checksum is calculated by +summing numeric values of expanded values (for string fields the sum of +C is taken, for bit fields the sum of zeroes and ones). + +For example, the following computes the same number as the System V sum program: $checksum = do { @@ -4902,6 +4952,10 @@ has no way of checking whether the value passed to C corresponds to a valid memory location, passing a pointer value that's not known to be valid is likely to have disastrous consequences. +If the repeat count of a field is larger than what the remainder of +the input string allows, repeat count is decreased. If the input string +is longer than one described by the TEMPLATE, the rest is ignored. + See L for more examples and notes. =item untie VARIABLE diff --git a/pp.c b/pp.c index 41aeeeb..7168be0 100644 --- a/pp.c +++ b/pp.c @@ -4427,10 +4427,16 @@ PP(pp_pack) case 'a': fromstr = NEXTFROM; aptr = SvPV(fromstr, fromlen); - if (pat[-1] == '*') + if (pat[-1] == '*') { len = fromlen; - if (fromlen > len) + if (datumtype == 'Z') + ++len; + } + if (fromlen >= len) { sv_catpvn(cat, aptr, len); + if (datumtype == 'Z') + *(SvEND(cat)-1) = '\0'; + } else { sv_catpvn(cat, aptr, fromlen); len -= fromlen; diff --git a/t/op/pack.t b/t/op/pack.t index 092e810..9b96289 100755 --- a/t/op/pack.t +++ b/t/op/pack.t @@ -6,7 +6,7 @@ BEGIN { require Config; import Config; } -print "1..152\n"; +print "1..153\n"; $format = "c2 x5 C C x s d i l a6"; # Need the expression in here to force ary[5] to be numeric. This avoids @@ -208,7 +208,7 @@ EOUU print "not " unless unpack('u', $uu) eq $in; print "ok ", $test++, "\n"; -# 61..72: test the ascii template types (A, a, Z) +# 61..73: test the ascii template types (A, a, Z) print "not " unless pack('A*', "foo\0bar\0 ") eq "foo\0bar\0 "; print "ok ", $test++, "\n"; @@ -234,19 +234,22 @@ print "ok ", $test++, "\n"; print "not " unless unpack('a8', "foo\0bar \0") eq "foo\0bar "; print "ok ", $test++, "\n"; -print "not " unless pack('Z*', "foo\0bar\0 ") eq "foo\0bar\0 "; +print "not " unless pack('Z*', "foo\0bar\0 ") eq "foo\0bar\0 \0"; print "ok ", $test++, "\n"; print "not " unless pack('Z11', "foo\0bar\0 ") eq "foo\0bar\0 \0\0"; print "ok ", $test++, "\n"; +print "not " unless pack('Z3', "foo") eq "fo\0"; +print "ok ", $test++, "\n"; + print "not " unless unpack('Z*', "foo\0bar \0") eq "foo"; print "ok ", $test++, "\n"; print "not " unless unpack('Z8', "foo\0bar \0") eq "foo"; print "ok ", $test++, "\n"; -# 73..78: packing native shorts/ints/longs +# 74..79: packing native shorts/ints/longs print "not " unless length(pack("s!", 0)) == $Config{shortsize}; print "ok ", $test++, "\n"; @@ -266,81 +269,81 @@ print "ok ", $test++, "\n"; print "not " unless length(pack("i!", 0)) == length(pack("i", 0)); print "ok ", $test++, "\n"; -# 79..138: pack <-> unpack bijectionism +# 80..139: pack <-> unpack bijectionism -# 79.. 83 c +# 80.. 84 c foreach my $c (-128, -1, 0, 1, 127) { print "not " unless unpack("c", pack("c", $c)) == $c; print "ok ", $test++, "\n"; } -# 84.. 88: C +# 85.. 89: C foreach my $C (0, 1, 127, 128, 255) { print "not " unless unpack("C", pack("C", $C)) == $C; print "ok ", $test++, "\n"; } -# 89.. 93: s +# 90.. 94: s foreach my $s (-32768, -1, 0, 1, 32767) { print "not " unless unpack("s", pack("s", $s)) == $s; print "ok ", $test++, "\n"; } -# 94.. 98: S +# 95.. 99: S foreach my $S (0, 1, 32767, 32768, 65535) { print "not " unless unpack("S", pack("S", $S)) == $S; print "ok ", $test++, "\n"; } -# 99..103: i +# 100..104: i foreach my $i (-2147483648, -1, 0, 1, 2147483647) { print "not " unless unpack("i", pack("i", $i)) == $i; print "ok ", $test++, "\n"; } -# 104..108: I +# 105..109: I foreach my $I (0, 1, 2147483647, 2147483648, 4294967295) { print "not " unless unpack("I", pack("I", $I)) == $I; print "ok ", $test++, "\n"; } -# 109..113: l +# 110..114: l foreach my $l (-2147483648, -1, 0, 1, 2147483647) { print "not " unless unpack("l", pack("l", $l)) == $l; print "ok ", $test++, "\n"; } -# 114..118: L +# 115..119: L foreach my $L (0, 1, 2147483647, 2147483648, 4294967295) { print "not " unless unpack("L", pack("L", $L)) == $L; print "ok ", $test++, "\n"; } -# 119..123: n +# 120..124: n foreach my $n (0, 1, 32767, 32768, 65535) { print "not " unless unpack("n", pack("n", $n)) == $n; print "ok ", $test++, "\n"; } -# 124..128: v +# 125..129: v foreach my $v (0, 1, 32767, 32768, 65535) { print "not " unless unpack("v", pack("v", $v)) == $v; print "ok ", $test++, "\n"; } -# 129..133: N +# 130..134: N foreach my $N (0, 1, 2147483647, 2147483648, 4294967295) { print "not " unless unpack("N", pack("N", $N)) == $N; print "ok ", $test++, "\n"; } -# 134..138: V +# 135..139: V foreach my $V (0, 1, 2147483647, 2147483648, 4294967295) { print "not " unless unpack("V", pack("V", $V)) == $V; print "ok ", $test++, "\n"; } -# 139..142: pack nvNV byteorders +# 140..143: pack nvNV byteorders print "not " unless pack("n", 0xdead) eq "\xde\xad"; print "ok ", $test++, "\n"; @@ -354,7 +357,7 @@ print "ok ", $test++, "\n"; print "not " unless pack("V", 0xdeadbeef) eq "\xef\xbe\xad\xde"; print "ok ", $test++, "\n"; -# 143..148: / +# 144..149: / my $z; eval { ($x) = unpack '/a*','hello' }; @@ -369,7 +372,7 @@ print 'not ' unless $@; print "ok $test\n"; $test++; $z = pack 'n/a* w/A*','string','etc'; print 'not ' unless $z eq "\000\006string\003etc"; print "ok $test\n"; $test++; -# 149..152: / with # +# 150..153: / with # eval { ($z,$x,$y) = unpack <