Default to allocating the correct size for the array in the HV.

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod

index c15185e..a428b5f 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -2265,7 +2265,7 @@ See also L</map> for a list composed of the results of the BLOCK or EXPR.
 =item hex
 
 Interprets EXPR as a hex string and returns the corresponding value.
-(To convert strings that might start with either 0, 0x, or 0b, see
+(To convert strings that might start with either C<0>, C<0x>, or C<0b>, see
 L</oct>.)  If EXPR is omitted, uses C<$_>.
 
     print hex '0xAf'; # prints '175'
@@ -2273,7 +2273,8 @@ L</oct>.)  If EXPR is omitted, uses C<$_>.
 
 Hex strings may only represent integers.  Strings that would cause
 integer overflow trigger a warning.  Leading whitespace is not stripped,
-unlike oct().
+unlike oct(). To present something as hex, look into L</printf>,
+L</sprintf>, or L</unpack>.
 
 =item import
 
@@ -2686,10 +2687,13 @@ and you get list of anonymous hashes each with only 1 entry.
 
 =item mkdir FILENAME
 
+=item mkdir
+
 Creates the directory specified by FILENAME, with permissions
 specified by MASK (as modified by C<umask>).  If it succeeds it
 returns true, otherwise it returns false and sets C<$!> (errno).
-If omitted, MASK defaults to 0777.
+If omitted, MASK defaults to 0777. If omitted, FILENAME defaults
+to C<$_>.
 
 In general, it is better to create directories with permissive MASK,
 and let the user modify that with their C<umask>, than it is to supply
@@ -3007,7 +3011,7 @@ Examples:
        }
     }
 
-See L<perliol/> for detailed info on PerlIO.
+See L<perliol> for detailed info on PerlIO.
 
 You may also, in the Bourne shell tradition, specify an EXPR beginning
 with C<< '>&' >>, in which case the rest of the string is interpreted
@@ -3357,15 +3361,16 @@ of values, as follows:
     U  A Unicode character number.  Encodes to UTF-8 internally
        (or UTF-EBCDIC in EBCDIC platforms).
 
-    w  A BER compressed integer.  Its bytes represent an unsigned
-       integer in base 128, most significant digit first, with as
-        few digits as possible.  Bit eight (the high bit) is set
-        on each byte except the last.
+    w  A BER compressed integer (not an ASN.1 BER; see perlpacktut for
+       details).  Its bytes represent an unsigned integer in base 128,
+       most significant digit first, with as few digits as possible.  Bit
+       eight (the high bit) is set on each byte except the last.
 
     x  A null byte.
     X  Back up a byte.
-    @  Null fill to absolute position, counted from the start of
-        the innermost ()-group.
+    @  Null fill or truncate to absolute position, counted from the
+        start of the innermost ()-group.
+    .   Null fill or truncate to absolute position specified by value.
     (  Start of a ()-group.
 
 Some letters in the TEMPLATE may optionally be followed by one or
@@ -3379,6 +3384,10 @@ which the modifier is valid):
 
         nNvV       Treat integers as signed instead of unsigned.
 
+        @.         Specify position as byte offset in the internal
+                   representation of the packed string. Efficient but
+                   dangerous.
+
     >   sSiIlLqQ   Force big-endian byte-order on the type.
         jJfFdDpP   (The "big end" touches the construct.)
 
@@ -3397,12 +3406,13 @@ The following rules apply:
 
 Each letter may optionally be followed by a number giving a repeat
 count.  With all types except C<a>, C<A>, C<Z>, C<b>, C<B>, C<h>,
-C<H>, C<@>, C<x>, C<X> and C<P> the pack function will gobble up that
-many values from the LIST.  A C<*> for the repeat count means to use
-however many items are left, except for C<@>, C<x>, C<X>, where it is
-equivalent to C<0>, and C<u>, where it is equivalent to 1 (or 45, what
-is the same).  A numeric repeat count may optionally be enclosed in
-brackets, as in C<pack 'C[80]', @arr>.
+C<H>, C<@>, C<.>, C<x>, C<X> and C<P> the pack function will gobble up
+that many values from the LIST.  A C<*> for the repeat count means to
+use however many items are left, except for C<@>, C<x>, C<X>, where it
+is equivalent to C<0>, for C<.> where it means relative to string start
+and C<u>, where it is equivalent to 1 (or 45, which is the same).
+A numeric repeat count may optionally be enclosed in brackets, as in
+C<pack 'C[80]', @arr>.
 
 One can replace the numeric repeat count by a template enclosed in brackets;
 then the packed length of this template in bytes is used as a count.
@@ -3416,6 +3426,17 @@ When used with C<Z>, C<*> results in the addition of a trailing null
 byte (so the packed result will be one longer than the byte C<length>
 of the item).
 
+When used with C<@>, the repeat count represents an offset from the start
+of the innermost () group.
+
+When used with C<.>, the repeat count is used to determine the starting
+position from where the value offset is calculated. If the repeat count
+is 0, it's relative to the current position. If the repeat count is C<*>,
+the offset is relative to the start of the packed string. And if it's an
+integer C<n>, the offset is relative to the start of the n-th innermost
+() group (or the start of the string if C<n> is bigger than the group
+level).
+
 The repeat count for C<u> is interpreted as the maximal number of bytes
 to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat 
 count should not be more than 65.
@@ -3424,7 +3445,7 @@ count should not be more than 65.
 
 The C<a>, C<A>, and C<Z> types gobble just one value, but pack it as a
 string of length count, padding with nulls or spaces as necessary.  When
-unpacking, C<A> strips trailing spaces and nulls, C<Z> strips everything
+unpacking, C<A> strips trailing whitespace and nulls, C<Z> strips everything
 after the first null, and C<a> returns data verbatim.
 
 If the value-to-pack is too long, it is truncated.  If too long and an
@@ -3501,24 +3522,32 @@ so will result in a fatal error.
 
 =item *
 
-The C</> template character allows packing and unpacking of strings where
-the packed structure contains a byte count followed by the string itself.
-You write I<length-item>C</>I<string-item>.
+The C</> template character allows packing and unpacking of a sequence of
+items where the packed structure contains a packed item count followed by 
+the packed items themselves.
+You write I<length-item>C</>I<sequence-item>.
 
 The I<length-item> can be any C<pack> template letter, and describes
 how the length value is packed.  The ones likely to be of most use are
 integer-packing ones like C<n> (for Java strings), C<w> (for ASN.1 or
 SNMP) and C<N> (for Sun XDR).
 
-For C<pack>, the I<string-item> must, at present, be C<"A*">, C<"a*"> or
-C<"Z*">. For C<unpack> the length of the string is obtained from the
-I<length-item>, but if you put in the '*' it will be ignored. For all other
-codes, C<unpack> applies the length value to the next item, which must not
-have a repeat count.
+For C<pack>, the I<sequence-item> may have a repeat count, in which case
+the minimum of that and the number of available items is used as argument
+for the I<length-item>. If it has no repeat count or uses a '*', the number
+of available items is used. For C<unpack> the repeat count is always obtained
+by decoding the packed item count, and the I<sequence-item> must not have a
+repeat count.
 
-    unpack 'W/a', "\04Gurusamy";        gives 'Guru'
-    unpack 'a3/A* A*', '007 Bond  J ';  gives (' Bond','J')
-    pack 'n/a* w/a*','hello,','world';  gives "\000\006hello,\005world"
+If the I<sequence-item> refers to a string type (C<"A">, C<"a"> or C<"Z">),
+the I<length-item> is a string length, not a number of strings. If there is
+an explicit repeat count for pack, the packed string will be adjusted to that
+given length.
+
+    unpack 'W/a', "\04Gurusamy";        gives ('Guru')
+    unpack 'a3/A* A*', '007 Bond  J ';  gives (' Bond', 'J')
+    pack 'n/a* w/a','hello,','world';   gives "\000\006hello,\005world"
+    pack 'a/W2', ord('a') .. ord('z');  gives '2ab'
 
 The I<length-item> is not returned explicitly from C<unpack>.
 
@@ -3680,7 +3709,6 @@ C<@> starts again at 0. Therefore, the result of
 
 is the string "\0a\0\0bc".
 
-
 =item *
 
 C<x> and C<X> accept C<!> modifier.  In this case they act as
@@ -3771,6 +3799,8 @@ Examples:
     $bar = pack('s@4l', 12, 34);
     # short 12, zero fill to position 4, long 34
     # $foo eq $bar
+    $baz = pack('s.l', 12, 4, 34);
+    # short 12, zero fill to position 4, long 34
 
     $foo = pack('nN', 42, 4711);
     # pack big-endian 16- and 32-bit unsigned integers
@@ -4273,7 +4303,7 @@ a bareword argument, there is a little extra functionality going on
 behind the scenes.  Before C<require> looks for a "F<.pm>" extension,
 it will first look for a filename with a "F<.pmc>" extension.  A file
 with this extension is assumed to be Perl bytecode generated by
-L<B::Bytecode|B::Bytecode>.  If this file is found, and it's modification
+L<B::Bytecode|B::Bytecode>.  If this file is found, and its modification
 time is newer than a coinciding "F<.pm>" non-compiled file, it will be
 loaded in place of that non-compiled file ending in a "F<.pm>" extension.
 
@@ -4576,6 +4606,14 @@ Note that whether C<select> gets restarted after signals (say, SIGALRM)
 is implementation-dependent.  See also L<perlport> for notes on the
 portability of C<select>.
 
+On error, C<select> returns C<undef> and sets C<$!>.
+
+Note: on some Unixes, the select(2) system call may report a socket file
+descriptor as "ready for reading" when no data is actually available,
+so a subsequent read blocks. This can be avoided by always using the
+O_NONBLOCK flag on the socket. See select(2) and fcntl(2) for further
+details.
+
 B<WARNING>: One should not attempt to mix buffered I/O (like C<read>
 or <FH>) with C<select>, except as permitted by POSIX, and even
 then only on POSIX systems.  You have to use C<sysread> instead.
@@ -5039,14 +5077,19 @@ characters at each point it matches that way.  For example:
 
 produces the output 'h:i:t:h:e:r:e'.
 
-Using the empty pattern C<//> specifically matches the null string, and is
-not be confused with the use of C<//> to mean "the last successful pattern
-match".
+As a special case for C<split>, using the empty pattern C<//> specifically
+matches only the null string, and is not to be confused with the regular use
+of C<//> to mean "the last successful pattern match".  So, for C<split>,
+the following:
 
-Empty leading (or trailing) fields are produced when there are positive width
-matches at the beginning (or end) of the string; a zero-width match at the
-beginning (or end) of the string does not produce an empty field.  For
-example:
+    print join(':', split(//, 'hi there'));
+
+produces the output 'h:i: :t:h:e:r:e'.
+
+Empty leading (or trailing) fields are produced when there are positive
+width matches at the beginning (or end) of the string; a zero-width match
+at the beginning (or end) of the string does not produce an empty field.
+For example:
 
    print join(':', split(/(?=\w)/, 'hi there!'));