X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfunc.pod;h=c21ca0dde01021a3e50f94faf86e3caa1c382b9e;hb=28a5cf3b760f8b6322a0839e3b3e060e7a6f23ea;hp=dc23b210faec446ccf2ad7225c6aeb3bc3d90d39;hpb=ea63ef19ca3f585d76378df52ef140c169642026;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index dc23b21..c21ca0d 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -226,7 +226,7 @@ C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, -C, C, C, C, +C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, @@ -603,6 +603,12 @@ previous time C was called. =item chdir EXPR +=item chdir FILEHANDLE + +=item chdir DIRHANDLE + +=item chdir + Changes the working directory to EXPR, if possible. If EXPR is omitted, changes to the directory specified by C<$ENV{HOME}>, if set; if not, changes to the directory specified by C<$ENV{LOGDIR}>. (Under VMS, the @@ -610,6 +616,10 @@ variable C<$ENV{SYS$LOGIN}> is also checked, and used if it is set.) If neither is set, C does nothing. It returns true upon success, false otherwise. See the example under C. +On systems that support fchdir, you might pass a file handle or +directory handle as argument. On systems that don't support fchdir, +passing handles produces a fatal error at run time. + =item chmod LIST Changes the permissions of a list of files. The first element of the @@ -625,6 +635,14 @@ successfully changed. See also L, if all you have is a string. $mode = '0644'; chmod oct($mode), 'foo'; # this is better $mode = 0644; chmod $mode, 'foo'; # this is best +On systems that support fchmod, you might pass file handles among the +files. On systems that don't support fchmod, passing file handles +produces a fatal error at run time. + + open(my $fh, "<", "foo"); + my $perm = (stat $fh)[2] & 07777; + chmod($perm | 0600, $fh); + You can also import the symbolic C constants from the Fcntl module: @@ -710,6 +728,10 @@ successfully changed. 
$cnt = chown $uid, $gid, 'foo', 'bar'; chown $uid, $gid, @filenames; +On systems that support fchown, you might pass file handles among the +files. On systems that don't support fchown, passing file handles +produces a fatal error at run time. + Here's an example that looks up nonnumeric uids in the passwd file: print "User: "; @@ -742,6 +764,10 @@ chr(0x263a) is a Unicode smiley face. Note that characters from 128 to 255 (inclusive) are by default not encoded in UTF-8 Unicode for backward compatibility reasons (but see L). +Negative values give the Unicode replacement character (chr(0xfffd)), +except under the L pragma, where low eight bits of the value +(truncated to an integer) are used. + If NUMBER is omitted, uses C<$_>. For the reverse, use L. @@ -782,7 +808,8 @@ program exits with non-zero status. (If the only problem was that the program exited non-zero, C<$!> will be set to C<0>.) Closing a pipe also waits for the process executing on the pipe to complete, in case you want to look at the output of the pipe afterwards, and -implicitly puts the exit status value of that command into C<$?>. +implicitly puts the exit status value of that command into C<$?> and +C<${^CHILD_ERROR_NATIVE}>. Prematurely closing the read end of a pipe (i.e. before the process writing to it at the other end has closed it) will result in a @@ -858,31 +885,42 @@ function, or use this relation: =item crypt PLAINTEXT,SALT -Encrypts a string exactly like the crypt(3) function in the C library -(assuming that you actually have a version there that has not been -extirpated as a potential munition). This can prove useful for checking -the password file for lousy passwords, amongst other things. Only the -guys wearing white hats should do this. - -Note that L is intended to be a one-way function, much like -breaking eggs to make an omelette. There is no (known) corresponding -decrypt function (in other words, the crypt() is a one-way hash -function). 
As a result, this function isn't all that useful for -cryptography. (For that, see your nearby CPAN mirror.) - -When verifying an existing encrypted string you should use the -encrypted text as the salt (like C). This allows your code to work with the standard L -and with more exotic implementations. In other words, do not assume -anything about the returned string itself, or how many bytes in -the encrypted string matter. +Creates a digest string exactly like the crypt(3) function in the C +library (assuming that you actually have a version there that has not +been extirpated as a potential munition). + +crypt() is a one-way hash function. The PLAINTEXT and SALT are turned +into a short string, called a digest, which is returned. The same +PLAINTEXT and SALT will always return the same string, but there is no +(known) way to get the original PLAINTEXT from the hash. Small +changes in the PLAINTEXT or SALT will result in large changes in the +digest. + +There is no decrypt function. This function isn't all that useful for +cryptography (for that, look for F modules on your nearby CPAN +mirror) and the name "crypt" is a bit of a misnomer. Instead it is +primarily used to check if two pieces of text are the same without +having to transmit or store the text itself. An example is checking +if a correct password is given. The digest of the password is stored, +not the password itself. The user types in a password which is +crypt()'d with the same salt as the stored digest. If the two digests +match, the password is correct. + +When verifying an existing digest string you should use the digest as +the salt (like C). The SALT used +to create the digest is visible as part of the digest so this ensures +crypt() will hash the new string with the same salt as the digest. +This allows your code to work with the standard L and +with more exotic implementations. In other words, do not assume +anything about the returned string itself, or how many bytes in the +digest matter. 
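The verification flow described in the added crypt() text (store the digest, then hash the typed password with the stored digest as the salt) might be sketched as follows. This is only an illustration: `password_matches` is a hypothetical helper name, and the two-character salt assumes a traditional crypt(3); other platforms may expect a different salt format or may not implement crypt() at all.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perlfunc): true when $guess hashes to
# the stored digest. The stored digest itself is passed as the salt.
sub password_matches {
    my ($guess, $stored_digest) = @_;
    return 0 unless defined $stored_digest;
    my $candidate = crypt($guess, $stored_digest);
    return (defined $candidate && $candidate eq $stored_digest) ? 1 : 0;
}

# Assumption: a traditional two-character salt drawn from [./0-9A-Za-z].
my @salt_chars = ('.', '/', 0 .. 9, 'A' .. 'Z', 'a' .. 'z');
my $salt       = join '', @salt_chars[rand 64, rand 64];
my $digest     = crypt('secret', $salt);

print password_matches('secret', $digest) ? "match\n" : "no match\n";
print password_matches('wrong',  $digest) ? "match\n" : "no match\n";
```

Note that the helper never inspects the digest's length or layout, in line with the advice not to assume anything about the returned string.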
Traditionally the result is a string of 13 bytes: two first bytes of the salt, followed by 11 bytes from the set C<[./0-9A-Za-z]>, and only -the first eight bytes of the encrypted string mattered, but -alternative hashing schemes (like MD5), higher level security schemes -(like C2), and implementations on non-UNIX platforms may produce -different strings. +the first eight bytes of the digest string mattered, but alternative +hashing schemes (like MD5), higher level security schemes (like C2), +and implementations on non-UNIX platforms may produce different +strings. When choosing a new salt create a random two character string whose characters come from the set C<[./0-9A-Za-z]> (like C function is unsuitable for encrypting large quantities +The L function is unsuitable for hashing large quantities of data, not least of all because you can't get the information -back. Look at the F and F directories -on your favorite CPAN mirror for a slew of potentially useful -modules. +back. Look at the L module for more robust algorithms. If using crypt() on a Unicode string (which I has characters with codepoints above 255), Perl tries to make sense @@ -1145,7 +1181,7 @@ This is useful for propagating exceptions: If LIST is empty and C<$@> contains an object reference that has a C method, that method will be called with additional file and line number parameters. The return value replaces the value in -C<$@>. ie. as if C<< $@ = eval { $@->PROPAGATE(__FILE__, __LINE__) }; >> +C<$@>. i.e. as if C<< $@ = eval { $@->PROPAGATE(__FILE__, __LINE__) }; >> were called. If C<$@> is empty then the string C<"Died"> is used. @@ -1157,9 +1193,11 @@ maintain arbitrary state about the nature of the exception. Such a scheme is sometimes preferable to matching particular string values of $@ using regular expressions. Here's an example: + use Scalar::Util 'blessed'; + eval { ... 
; die Some::Module::Exception->new( FOO => "bar" ) }; if ($@) { - if (ref($@) && UNIVERSAL::isa($@,"Some::Module::Exception")) { + if (blessed($@) && $@->isa("Some::Module::Exception")) { # handle Some::Module::Exception } else { @@ -1371,6 +1409,8 @@ there was an error. =item eval BLOCK +=item eval + In the first form, the return value of EXPR is parsed and executed as if it were a little Perl program. The value of the expression (which is itself determined within scalar context) is first parsed, and if there weren't any @@ -1619,6 +1659,8 @@ to exists() is an error. =item exit EXPR +=item exit + Evaluates EXPR and exits immediately with that value. Example: $ans = ; @@ -2054,7 +2096,7 @@ addresses returned by the corresponding system library call. In the Internet domain, each address is four bytes long and you can unpack it by saying something like: - ($a,$b,$c,$d) = unpack('C4',$addr[0]); + ($a,$b,$c,$d) = unpack('W4',$addr[0]); The Socket library makes this slightly easier: @@ -2142,6 +2184,8 @@ C extension. See L for details. =item gmtime EXPR +=item gmtime + Converts a time as returned by the time function to an 8-element list with the time localized for the standard Greenwich time zone. Typically used as follows: @@ -2185,6 +2229,8 @@ This scalar value is B locale dependent (see L), but is instead a Perl builtin. To get somewhat similar but locale dependent date strings, see the example in L. +See L for portability concerns. + =item goto LABEL =item goto EXPR @@ -2265,7 +2311,7 @@ See also L for a list composed of the results of the BLOCK or EXPR. =item hex Interprets EXPR as a hex string and returns the corresponding value. -(To convert strings that might start with either 0, 0x, or 0b, see +(To convert strings that might start with either C<0>, C<0x>, or C<0b>, see L.) If EXPR is omitted, uses C<$_>. print hex '0xAf'; # prints '175' @@ -2273,9 +2319,10 @@ L.) If EXPR is omitted, uses C<$_>. Hex strings may only represent integers. 
Strings that would cause integer overflow trigger a warning. Leading whitespace is not stripped, -unlike oct(). +unlike oct(). To present something as hex, look into L, +L, or L. -=item import +=item import LIST There is no builtin C function. It is just an ordinary method (subroutine) defined (or inherited) by modules that wish to export @@ -2311,9 +2358,9 @@ functions will serve you better than will int(). Implements the ioctl(2) function. You'll probably first have to say - require "ioctl.ph"; # probably in /usr/local/lib/perl/ioctl.ph + require "sys/ioctl.ph"; # probably in $Config{archlib}/ioctl.ph -to get the correct function definitions. If F doesn't +to get the correct function definitions. If F doesn't exist or doesn't have the correct definitions you'll have to roll your own, based on your C header files such as F<< >>. (There is a Perl script called B that comes with the Perl kit that @@ -2581,6 +2628,8 @@ try for example: Note that the C<%a> and C<%b>, the short forms of the day of the week and the month of the year, may not necessarily be three characters wide. +See L for portability concerns. + =item lock THING This function places an advisory lock on a shared variable, or referenced @@ -2686,10 +2735,13 @@ and you get list of anonymous hashes each with only 1 entry. =item mkdir FILENAME +=item mkdir + Creates the directory specified by FILENAME, with permissions specified by MASK (as modified by C). If it succeeds it returns true, otherwise it returns false and sets C<$!> (errno). -If omitted, MASK defaults to 0777. +If omitted, MASK defaults to 0777. If omitted, FILENAME defaults +to C<$_>. In general, it is better to create directories with permissive MASK, and let the user modify that with their C, than it is to supply @@ -2943,7 +2995,7 @@ to the temporary file first. You will need to seek() to do the reading. Since v5.8.0, perl has built using PerlIO by default. 
Unless you've -changed this (ie Configure -Uuseperlio), you can open file handles to +changed this (i.e. Configure -Uuseperlio), you can open file handles to "in memory" files held in Perl scalars via: open($fh, '>', \$variable) || .. @@ -3007,7 +3059,7 @@ Examples: } } -See L for detailed info on PerlIO. +See L for detailed info on PerlIO. You may also, in the Bourne shell tradition, specify an EXPR beginning with C<< '>&' >>, in which case the rest of the string is interpreted @@ -3122,7 +3174,8 @@ be set for the newly opened file descriptor as determined by the value of $^F. See L. Closing any piped filehandle causes the parent process to wait for the -child to finish, and returns the status value in C<$?>. +child to finish, and returns the status value in C<$?> and +C<${^CHILD_ERROR_NATIVE}>. The filename passed to 2-argument (or 1-argument) form of open() will have leading and trailing whitespace deleted, and the normal @@ -3218,15 +3271,24 @@ See L and L for more about Unicode. =item our TYPE EXPR : ATTRS -An C declares the listed variables to be valid globals within -the enclosing block, file, or C. That is, it has the same -scoping rules as a "my" declaration, but does not create a local -variable. If more than one value is listed, the list must be placed -in parentheses. The C declaration has no semantic effect unless -"use strict vars" is in effect, in which case it lets you use the -declared global variable without qualifying it with a package name. -(But only within the lexical scope of the C declaration. In this -it differs from "use vars", which is package scoped.) +C associates a simple name with a package variable in the current +package for use within the current scope. When C is in +effect, C lets you use declared global variables without qualifying +them with package names, within the lexical scope of the C declaration. +In this way C differs from C, which is package scoped. 
+ +Unlike C, which both allocates storage for a variable and associates +a simple name with that storage for use within the current scope, C +associates a simple name with a package variable in the current package, +for use within the current scope. In other words, C has the same +scoping rules as C, but does not necessarily create a +variable. + +If more than one value is listed, the list must be placed +in parentheses. + + our $foo; + our($bar, $baz); An C declaration declares a global variable that will be visible across its entire lexical scope, even across package boundaries. The @@ -3239,11 +3301,15 @@ behavior holds: $bar = 20; package Bar; - print $bar; # prints 20 + print $bar; # prints 20, as it refers to $Foo::bar -Multiple C declarations in the same lexical scope are allowed -if they are in different packages. If they happened to be in the same -package, Perl will emit warnings if you have asked for them. +Multiple C declarations with the same name in the same lexical +scope are allowed if they are in different packages. If they happen +to be in the same package, Perl will emit warnings if you have asked +for them, just like multiple C declarations. Unlike a second +C declaration, which will bind the name to a fresh variable, a +second C declaration in the same package, in the same scope, is +merely redundant. use warnings; package Foo; @@ -3254,7 +3320,8 @@ package, Perl will emit warnings if you have asked for them. our $bar = 30; # declares $Bar::bar for rest of lexical scope print $bar; # prints 30 - our $bar; # emits warning + our $bar; # emits warning but has no other effect + print $bar; # still prints 30 An C declaration may also have a list of attributes associated with it. @@ -3296,7 +3363,8 @@ Takes a LIST of values and converts it into a string using the rules given by the TEMPLATE. The resulting string is the concatenation of the converted values. Typically, each converted value looks like its machine-level representation. 
For example, on 32-bit machines -a converted integer may be represented by a sequence of 4 bytes. +an integer may be represented by a sequence of 4 bytes which will be +converted to a sequence of 4 characters. The TEMPLATE is a sequence of characters that give the order and type of values, as follows: @@ -3311,7 +3379,9 @@ of values, as follows: H A hex string (high nybble first). c A signed char (8-bit) value. - C An unsigned char value. Only does bytes. See U for Unicode. + C An unsigned C char (octet) even under Unicode. Should normally not + be used. See U and W instead. + W An unsigned char value (can be greater than 255). s A signed short (16-bit) value. S An unsigned short value. @@ -3354,15 +3424,16 @@ of values, as follows: U A Unicode character number. Encodes to UTF-8 internally (or UTF-EBCDIC in EBCDIC platforms). - w A BER compressed integer. Its bytes represent an unsigned - integer in base 128, most significant digit first, with as - few digits as possible. Bit eight (the high bit) is set - on each byte except the last. + w A BER compressed integer (not an ASN.1 BER, see perlpacktut for + details). Its bytes represent an unsigned integer in base 128, + most significant digit first, with as few digits as possible. Bit + eight (the high bit) is set on each byte except the last. x A null byte. X Back up a byte. - @ Null fill to absolute position, counted from the start of - the innermost ()-group. + @ Null fill or truncate to absolute position, counted from the + start of the innermost ()-group. + . Null fill or truncate to absolute position specified by value. ( Start of a ()-group. Some letters in the TEMPLATE may optionally be followed by one or @@ -3376,6 +3447,10 @@ which the modifier is valid): nNvV Treat integers as signed instead of unsigned. + @. Specify position as byte offset in the internal + representation of the packed string. Efficient but + dangerous. + > sSiIlLqQ Force big-endian byte-order on the type. 
jJfFdDpP (The "big end" touches the construct.) @@ -3394,12 +3469,13 @@ The following rules apply: Each letter may optionally be followed by a number giving a repeat count. With all types except C, C, C, C, C, C, -C, C<@>, C, C and C

the pack function will gobble up that -many values from the LIST. A C<*> for the repeat count means to use -however many items are left, except for C<@>, C, C, where it is -equivalent to C<0>, and C, where it is equivalent to 1 (or 45, what -is the same). A numeric repeat count may optionally be enclosed in -brackets, as in C. +C, C<@>, C<.>, C, C and C

the pack function will gobble up +that many values from the LIST. A C<*> for the repeat count means to +use however many items are left, except for C<@>, C, C, where it +is equivalent to C<0>, for C<.> where it means relative to string start +and C, where it is equivalent to 1 (or 45, which is the same). +A numeric repeat count may optionally be enclosed in brackets, as in +C. One can replace the numeric repeat count by a template enclosed in brackets; then the packed length of this template in bytes is used as a count. @@ -3413,72 +3489,84 @@ When used with C, C<*> results in the addition of a trailing null byte (so the packed result will be one longer than the byte C of the item). +When used with C<@>, the repeat count represents an offset from the start +of the innermost () group. + +When used with C<.>, the repeat count is used to determine the starting +position from where the value offset is calculated. If the repeat count +is 0, it's relative to the current position. If the repeat count is C<*>, +the offset is relative to the start of the packed string. And if it's an +integer C the offset is relative to the start of the n-th innermost +() group (or the start of the string if C is bigger than the group +level). + The repeat count for C is interpreted as the maximal number of bytes -to encode per line of output, with 0 and 1 replaced by 45. +to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat +count should not be more than 65. =item * The C, C, and C types gobble just one value, but pack it as a string of length count, padding with nulls or spaces as necessary. When -unpacking, C strips trailing spaces and nulls, C strips everything -after the first null, and C returns data verbatim. When packing, -C, and C are equivalent. +unpacking, C strips trailing whitespace and nulls, C strips everything +after the first null, and C returns data verbatim. If the value-to-pack is too long, it is truncated. 
If too long and an explicit count is provided, C packs only C<$count-1> bytes, followed -by a null byte. Thus C always packs a trailing null byte under -all circumstances. +by a null byte. Thus C always packs a trailing null (except when the +count is 0). =item * Likewise, the C and C fields pack a string that many bits long. -Each byte of the input field of pack() generates 1 bit of the result. +Each character of the input field of pack() generates 1 bit of the result. Each result bit is based on the least-significant bit of the corresponding -input byte, i.e., on C. In particular, bytes C<"0"> and -C<"1"> generate bits 0 and 1, as do bytes C<"\0"> and C<"\1">. +input character, i.e., on C. In particular, characters C<"0"> +and C<"1"> generate bits 0 and 1, as do characters C<"\0"> and C<"\1">. Starting from the beginning of the input string of pack(), each 8-tuple -of bytes is converted to 1 byte of output. With format C -the first byte of the 8-tuple determines the least-significant bit of a -byte, and with format C it determines the most-significant bit of -a byte. +of characters is converted to 1 character of output. With format C +the first character of the 8-tuple determines the least-significant bit of a +character, and with format C it determines the most-significant bit of +a character. If the length of the input string is not exactly divisible by 8, the -remainder is packed as if the input string were padded by null bytes +remainder is packed as if the input string were padded by null characters at the end. Similarly, during unpack()ing the "extra" bits are ignored. -If the input string of pack() is longer than needed, extra bytes are ignored. -A C<*> for the repeat count of pack() means to use all the bytes of -the input field. On unpack()ing the bits are converted to a string -of C<"0">s and C<"1">s. +If the input string of pack() is longer than needed, extra characters are +ignored. 
A C<*> for the repeat count of pack() means to use all the +characters of the input field. On unpack()ing the bits are converted to a +string of C<"0">s and C<"1">s. =item * The C and C fields pack a string that many nybbles (4-bit groups, representable as hexadecimal digits, 0-9a-f) long. -Each byte of the input field of pack() generates 4 bits of the result. -For non-alphabetical bytes the result is based on the 4 least-significant -bits of the input byte, i.e., on C. In particular, -bytes C<"0"> and C<"1"> generate nybbles 0 and 1, as do bytes -C<"\0"> and C<"\1">. For bytes C<"a".."f"> and C<"A".."F"> the result +Each character of the input field of pack() generates 4 bits of the result. +For non-alphabetical characters the result is based on the 4 least-significant +bits of the input character, i.e., on C. In particular, +characters C<"0"> and C<"1"> generate nybbles 0 and 1, as do bytes +C<"\0"> and C<"\1">. For characters C<"a".."f"> and C<"A".."F"> the result is compatible with the usual hexadecimal digits, so that C<"a"> and -C<"A"> both generate the nybble C<0xa==10>. The result for bytes +C<"A"> both generate the nybble C<0xa==10>. The result for characters C<"g".."z"> and C<"G".."Z"> is not well-defined. Starting from the beginning of the input string of pack(), each pair -of bytes is converted to 1 byte of output. With format C the -first byte of the pair determines the least-significant nybble of the -output byte, and with format C it determines the most-significant +of characters is converted to 1 character of output. With format C the +first character of the pair determines the least-significant nybble of the +output character, and with format C it determines the most-significant nybble. If the length of the input string is not even, it behaves as if padded -by a null byte at the end. Similarly, during unpack()ing the "extra" +by a null character at the end. Similarly, during unpack()ing the "extra" nybbles are ignored. 
-If the input string of pack() is longer than needed, extra bytes are ignored. -A C<*> for the repeat count of pack() means to use all the bytes of -the input field. On unpack()ing the bits are converted to a string +If the input string of pack() is longer than needed, extra characters are +ignored. +A C<*> for the repeat count of pack() means to use all the characters of +the input field. On unpack()ing the nybbles are converted to a string of hexadecimal digits. =item * @@ -3497,24 +3585,32 @@ so will result in a fatal error. =item * -The C template character allows packing and unpacking of strings where -the packed structure contains a byte count followed by the string itself. -You write ICI. +The C template character allows packing and unpacking of a sequence of +items where the packed structure contains a packed item count followed by +the packed items themselves. +You write ICI. The I can be any C template letter, and describes how the length value is packed. The ones likely to be of most use are integer-packing ones like C (for Java strings), C (for ASN.1 or SNMP) and C (for Sun XDR). -For C, the I must, at present, be C<"A*">, C<"a*"> or -C<"Z*">. For C the length of the string is obtained from the -I, but if you put in the '*' it will be ignored. For all other -codes, C applies the length value to the next item, which must not -have a repeat count. +For C, the I may have a repeat count, in which case +the minimum of that and the number of available items is used as argument +for the I. If it has no repeat count or uses a '*', the number +of available items is used. For C the repeat count is always obtained +by decoding the packed item count, and the I must not have a +repeat count. + +If the I refers to a string type (C<"A">, C<"a"> or C<"Z">), +the I is a string length, not a number of strings. If there is +an explicit repeat count for pack, the packed string will be adjusted to that +given length. 
- unpack 'C/a', "\04Gurusamy"; gives 'Guru' - unpack 'a3/A* A*', '007 Bond J '; gives (' Bond','J') - pack 'n/a* w/a*','hello,','world'; gives "\000\006hello,\005world" + unpack 'W/a', "\04Gurusamy"; gives ('Guru') + unpack 'a3/A* A*', '007 Bond J '; gives (' Bond', 'J') + pack 'n/a* w/a','hello,','world'; gives "\000\006hello,\005world" + pack 'a/W2', ord('a') .. ord('z'); gives '2ab' The I is not returned explicitly from C. @@ -3581,7 +3677,7 @@ Some systems may have even weirder byte orders such as You can see your system's preference with print join(" ", map { sprintf "%#02x", $_ } - unpack("C*",pack("L",0x12345678))), "\n"; + unpack("W*",pack("L",0x12345678))), "\n"; The byteorder on the platform where Perl was built is also available via L: @@ -3620,7 +3716,7 @@ binary representation (e.g. IEEE floating point format). Even if all platforms are using IEEE, there may be subtle differences. Being able to use C> or C> on floating point values can be very useful, but also very dangerous if you don't know exactly what you're doing. -It is definetely not a general way to portably store floating point +It is definitely not a general way to portably store floating point values. When using C> or C> on an C<()>-group, this will affect @@ -3649,21 +3745,21 @@ will not in general equal $foo). =item * -If the pattern begins with a C, the resulting string will be -treated as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a -string with an initial C, and the bytes that follow will be -interpreted as Unicode characters. If you don't want this to happen, -you can begin your pattern with C (or anything else) to force Perl -not to UTF-8 encode your string, and then follow this with a C -somewhere in your pattern. +Pack and unpack can operate in two modes, character mode (C mode) where +the packed string is processed per character and UTF-8 mode (C mode) +where the packed string is processed in its UTF-8-encoded Unicode form on +a byte by byte basis. 
Character mode is the default unless the format string +starts with an C. You can switch mode at any moment with an explicit +C or C in the format. A mode is in effect until the next mode switch +or until the end of the ()-group in which it was entered. =item * You must yourself do any alignment or padding by inserting for example enough C<'x'>es while packing. There is no way to pack() and unpack() -could know where the bytes are going to or coming from. Therefore +could know where the characters are going to or coming from. Therefore C (and C) handle their output and input as flat -sequences of bytes. +sequences of characters. =item * @@ -3676,14 +3772,13 @@ C<@> starts again at 0. Therefore, the result of is the string "\0a\0\0bc". - =item * C and C accept C modifier. In this case they act as alignment commands: they jump forward/back to the closest position -aligned at a multiple of C bytes. For example, to pack() or +aligned at a multiple of C characters. For example, to pack() or unpack() C's C one may need to -use the template C; this assumes that doubles must be +use the template C; this assumes that doubles must be aligned on the double's size. For alignment commands C of 0 is equivalent to C of 1; @@ -3713,20 +3808,27 @@ to pack() than actually given, extra arguments are ignored. Examples: - $foo = pack("CCCC",65,66,67,68); + $foo = pack("WWWW",65,66,67,68); # foo eq "ABCD" - $foo = pack("C4",65,66,67,68); + $foo = pack("W4",65,66,67,68); # same thing + $foo = pack("W4",0x24b6,0x24b7,0x24b8,0x24b9); + # same thing with Unicode circled letters. $foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9); - # same thing with Unicode circled letters + # same thing with Unicode circled letters. 
You don't get the UTF-8 + # bytes because the U at the start of the format caused a switch to + # U0-mode, so the UTF-8 bytes get joined into characters + $foo = pack("C0U4",0x24b6,0x24b7,0x24b8,0x24b9); + # foo eq "\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9" + # This is the UTF-8 encoding of the string in the previous example $foo = pack("ccxxcc",65,66,67,68); # foo eq "AB\0\0CD" - # note: the above examples featuring "C" and "c" are true + # note: the above examples featuring "W" and "c" are true # only on ASCII and ASCII-derived systems such as ISO Latin 1 # and UTF-8. In EBCDIC the first example would be - # $foo = pack("CCCC",193,194,195,196); + # $foo = pack("WWWW",193,194,195,196); $foo = pack("s2",1,2); # "\1\0\2\0" on little-endian @@ -3760,6 +3862,8 @@ Examples: $bar = pack('s@4l', 12, 34); # short 12, zero fill to position 4, long 34 # $foo eq $bar + $baz = pack('s.l', 12, 4, 34); + # short 12, zero fill to position 4, long 34 $foo = pack('nN', 42, 4711); # pack big-endian 16- and 32-bit unsigned integers @@ -3872,8 +3976,9 @@ the corresponding right parenthesis to terminate the arguments to the print--interpose a C<+> or put parentheses around all the arguments. -Note that if you're storing FILEHANDLES in an array or other expression, -you will have to use a block returning its value instead: +Note that if you're storing FILEHANDLEs in an array, or if you're using +any other expression more complex than a scalar variable to retrieve it, +you will have to use a block returning the filehandle value instead: print { $files[$i] } "stuff\n"; print { $OK ? STDOUT : STDERR } "stuff\n"; @@ -4148,9 +4253,6 @@ name is returned instead. You can think of C as a C operator. unless (ref($r)) { print "r is not a reference at all.\n"; } - if (UNIVERSAL::isa($r, "HASH")) { # for subclassing - print "r is a reference to something that isa hash.\n"; - } See also L. 
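The ref() hunk above drops the `UNIVERSAL::isa($r, "HASH")` example. One way to ask the same kind of question without calling UNIVERSAL::isa as a function is sketched below, using `blessed` (which the die example in this diff already adopts) together with `reftype` from Scalar::Util. The helper name `describe` and the class `My::Hash` are made up for the illustration and are not part of perlfunc.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical helper: report what $r is without UNIVERSAL::isa.
sub describe {
    my ($r) = @_;
    return 'not a reference' unless ref $r;
    my $kind = reftype($r);    # underlying type, ignoring any class
    return blessed($r)
        ? 'object of class ' . blessed($r) . " built on $kind"
        : "plain $kind reference";
}

{
    package My::Hash;          # made-up class name for the demo
    sub new { return bless {}, shift }
}

my $plain = { a => 1 };
print describe($plain),        "\n";
print describe(My::Hash->new), "\n";
print describe('text'),        "\n";
```

Here `reftype` sees through the blessing, so a hash-based object is still reported as built on HASH, which is the information the removed "for subclassing" comment was after.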
 @@ -4262,7 +4364,7 @@ a bareword argument, there is a little extra functionality going on behind the scenes. Before C looks for a "F<.pm>" extension, it will first look for a filename with a "F<.pmc>" extension. A file with this extension is assumed to be Perl bytecode generated by -L. If this file is found, and it's modification +L. If this file is found, and its modification time is newer than a coinciding "F<.pm>" non-compiled file, it will be loaded in place of that non-compiled file ending in a "F<.pm>" extension. @@ -4565,6 +4667,14 @@ Note that whether C. +On error, C, except as permitted by POSIX, and even then only on POSIX systems. You have to use C instead. @@ -4657,9 +4767,9 @@ Shifts the first value of the array off and returns it, shortening the array by 1 and moving everything down. If there are no elements in the array, returns the undefined value. If ARRAY is omitted, shifts the C<@_> array within the lexical scope of subroutines and formats, and the -C<@ARGV> array at file scopes or within the lexical scopes established by -the C, C, C, C, and C -constructs. +C<@ARGV> array outside of a subroutine and also within the lexical scopes +established by the C, C, C, C +and C constructs. See also C, C, and C. C and C do the same thing to the left end of an array that C and C do to the @@ -5028,14 +5138,19 @@ characters at each point it matches that way. For example: produces the output 'h:i:t:h:e:r:e'. -Using the empty pattern C specifically matches the null string, and is -not be confused with the use of C to mean "the last successful pattern -match". +As a special case for C, using the empty pattern C specifically +matches only the null string, and is not to be confused with the regular use +of C to mean "the last successful pattern match". 
So, for C, +the following: -Empty leading (or trailing) fields are produced when there are positive width -matches at the beginning (or end) of the string; a zero-width match at the -beginning (or end) of the string does not produce an empty field. For -example: + print join(':', split(//, 'hi there')); + +produces the output 'h:i: :t:h:e:r:e'. + +Empty leading (or trailing) fields are produced when there are positive +width matches at the beginning (or end) of the string; a zero-width match +at the beginning (or end) of the string does not produce an empty field. +For example: print join(':', split(/(?=\w)/, 'hi there!')); @@ -5921,8 +6036,8 @@ C<$?> like this: printf "child exited with value %d\n", $? >> 8; } -or more portably by using the W*() calls of the POSIX extension; -see L for more information. +Alternatively you might inspect the value of C<${^CHILD_ERROR_NATIVE}> +with the W*() calls of the POSIX extension. When the arguments get executed via the system shell, results and return codes will be subject to its quirks and capabilities. @@ -6242,7 +6357,7 @@ If EXPR is omitted, unpacks the C<$_> string. The string is broken into chunks described by the TEMPLATE. Each chunk is converted separately to a value. Typically, either the string is a result -of C, or the bytes of the string represent a C structure of some +of C, or the characters of the string represent a C structure of some kind. The TEMPLATE has the same format as in the C function. @@ -6255,7 +6370,7 @@ Here's a subroutine that does substring: and then there's - sub ordinal { unpack("c",$_[0]); } # same as ord() + sub ordinal { unpack("W",$_[0]); } # same as ord() In addition to fields allowed in pack(), you may prefix a field with a % to indicate that @@ -6269,7 +6384,7 @@ computes the same number as the System V sum program: $checksum = do { local $/; # slurp! 
- unpack("%32C*",<>) % 65535; + unpack("%32W*",<>) % 65535; }; The following efficiently counts the number of set bits in a bit vector: @@ -6707,7 +6822,8 @@ example should print the following table: Behaves like the wait(2) system call on your system: it waits for a child process to terminate and returns the pid of the deceased process, or -C<-1> if there are no child processes. The status is returned in C<$?>. +C<-1> if there are no child processes. The status is returned in C<$?> +and C<${^CHILD_ERROR_NATIVE}>. Note that a return value of C<-1> could mean that child processes are being automatically reaped, as described in L. @@ -6716,7 +6832,7 @@ being automatically reaped, as described in L. Waits for a particular child process to terminate and returns the pid of the deceased process, or C<-1> if there is no such child process. On some systems, a value of 0 indicates that there are processes still running. -The status is returned in C<$?>. If you say +The status is returned in C<$?> and C<${^CHILD_ERROR_NATIVE}>. If you say use POSIX ":sys_wait_h"; #...