X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfunc.pod;h=a428b5f655e5d5da0004274fc6764cfa3afb7cff;hb=36768cf4ecea77bfd5cba13cb714b94a91cfd528;hp=211bdc8199b59095356359f6e18c5e565ddd1309;hpb=a4142048d093908dfadd5b3e7ed8f633af464ea8;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index 211bdc8..a428b5f 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -614,7 +614,7 @@ false otherwise. See the example under C. Changes the permissions of a list of files. The first element of the list must be the numerical mode, which should probably be an octal -number, and which definitely should I a string of octal digits: +number, and which definitely should I be a string of octal digits: C<0644> is okay, C<'0644'> is not. Returns the number of files successfully changed. See also L, if all you have is a string. @@ -1206,8 +1206,7 @@ A deprecated form of subroutine call. See L. =item do EXPR Uses the value of EXPR as a filename and executes the contents of the -file as a Perl script. Its primary use is to include subroutines -from a Perl subroutine library. +file as a Perl script. do 'stat.pl'; @@ -1216,7 +1215,7 @@ is just like eval `cat stat.pl`; except that it's more efficient and concise, keeps track of the current -filename for error messages, searches the @INC libraries, and updates +filename for error messages, searches the @INC directories, and updates C<%INC> if the file is found. See L for these variables. It also differs in that code evaluated with C cannot see lexicals in the enclosing scope; C does. It's the @@ -2055,7 +2054,7 @@ addresses returned by the corresponding system library call. 
In the Internet domain, each address is four bytes long and you can unpack it by saying something like: - ($a,$b,$c,$d) = unpack('C4',$addr[0]); + ($a,$b,$c,$d) = unpack('W4',$addr[0]); The Socket library makes this slightly easier: @@ -2115,13 +2114,13 @@ integer which you can decode using unpack with the C (or C) format. An example testing if Nagle's algorithm is turned on on a socket: - use Socket; + use Socket qw(:all); defined(my $tcp = getprotobyname("tcp")) or die "Could not determine the protocol number for tcp"; - # my $tcp = Socket::IPPROTO_TCP; # Alternative - my $packed = getsockopt($socket, $tcp, Socket::TCP_NODELAY) - or die "Could not query TCP_NODELAY SOCKEt option: $!"; + # my $tcp = IPPROTO_TCP; # Alternative + my $packed = getsockopt($socket, $tcp, TCP_NODELAY) + or die "Could not query TCP_NODELAY socket option: $!"; my $nodelay = unpack("I", $packed); print "Nagle's algorithm is turned ", $nodelay ? "off\n" : "on\n"; @@ -2266,7 +2265,7 @@ See also L for a list composed of the results of the BLOCK or EXPR. =item hex Interprets EXPR as a hex string and returns the corresponding value. -(To convert strings that might start with either 0, 0x, or 0b, see +(To convert strings that might start with either C<0>, C<0x>, or C<0b>, see L.) If EXPR is omitted, uses C<$_>. print hex '0xAf'; # prints '175' @@ -2274,7 +2273,8 @@ L.) If EXPR is omitted, uses C<$_>. Hex strings may only represent integers. Strings that would cause integer overflow trigger a warning. Leading whitespace is not stripped, -unlike oct(). +unlike oct(). To present something as hex, look into L, +L, or L. =item import @@ -2519,36 +2519,44 @@ for details, including issues with tied arrays and hashes. =item localtime EXPR +=item localtime + Converts a time as returned by the time function to a 9-element list with the time analyzed for the local time zone. 
Typically used as follows: # 0 1 2 3 4 5 6 7 8 ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = - localtime(time); + localtime(time); All list elements are numeric, and come straight out of the C `struct -tm'. $sec, $min, and $hour are the seconds, minutes, and hours of the -specified time. $mday is the day of the month, and $mon is the month -itself, in the range C<0..11> with 0 indicating January and 11 -indicating December. $year is the number of years since 1900. That -is, $year is C<123> in year 2023. $wday is the day of the week, with -0 indicating Sunday and 3 indicating Wednesday. $yday is the day of -the year, in the range C<0..364> (or C<0..365> in leap years.) $isdst -is true if the specified time occurs during daylight savings time, -false otherwise. +tm'. C<$sec>, C<$min>, and C<$hour> are the seconds, minutes, and hours +of the specified time. -Note that the $year element is I simply the last two digits of -the year. If you assume it is, then you create non-Y2K-compliant -programs--and you wouldn't want to do that, would you? +C<$mday> is the day of the month, and C<$mon> is the month itself, in +the range C<0..11> with 0 indicating January and 11 indicating December. +This makes it easy to get a month name from a list: -The proper way to get a complete 4-digit year is simply: + my @abbr = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ); + print "$abbr[$mon] $mday"; + # $mon=9, $mday=18 gives "Oct 18" - $year += 1900; +C<$year> is the number of years since 1900, not just the last two digits +of the year. That is, C<$year> is C<123> in year 2023. The proper way +to get a complete 4-digit year is simply: -And to get the last two digits of the year (e.g., '01' in 2001) do: + $year += 1900; - $year = sprintf("%02d", $year % 100); +To get the last two digits of the year (e.g., '01' in 2001) do: + + $year = sprintf("%02d", $year % 100); + +C<$wday> is the day of the week, with 0 indicating Sunday and 3 indicating +Wednesday. 
C<$yday> is the day of the year, in the range C<0..364> +(or C<0..365> in leap years.) + +C<$isdst> is true if the specified time occurs during Daylight Saving +Time, false otherwise. If EXPR is omitted, C uses the current time (C). @@ -2679,10 +2687,13 @@ and you get list of anonymous hashes each with only 1 entry. =item mkdir FILENAME +=item mkdir + Creates the directory specified by FILENAME, with permissions specified by MASK (as modified by C). If it succeeds it returns true, otherwise it returns false and sets C<$!> (errno). -If omitted, MASK defaults to 0777. +If omitted, MASK defaults to 0777. If omitted, FILENAME defaults +to C<$_>. In general, it is better to create directories with permissive MASK, and let the user modify that with their C, than it is to supply @@ -3000,7 +3011,7 @@ Examples: } } -See L for detailed info on PerlIO. +See L for detailed info on PerlIO. You may also, in the Bourne shell tradition, specify an EXPR beginning with C<< '>&' >>, in which case the rest of the string is interpreted @@ -3289,7 +3300,8 @@ Takes a LIST of values and converts it into a string using the rules given by the TEMPLATE. The resulting string is the concatenation of the converted values. Typically, each converted value looks like its machine-level representation. For example, on 32-bit machines -a converted integer may be represented by a sequence of 4 bytes. +an integer may be represented by a sequence of 4 bytes which will be +converted to a sequence of 4 characters. The TEMPLATE is a sequence of characters that give the order and type of values, as follows: @@ -3304,7 +3316,9 @@ of values, as follows: H A hex string (high nybble first). c A signed char (8-bit) value. - C An unsigned char value. Only does bytes. See U for Unicode. + C An unsigned C char (octet) even under Unicode. Should normally not + be used. See U and W instead. + W An unsigned char value (can be greater than 255). s A signed short (16-bit) value. S An unsigned short value. 
@@ -3347,15 +3361,16 @@ of values, as follows: U A Unicode character number. Encodes to UTF-8 internally (or UTF-EBCDIC in EBCDIC platforms). - w A BER compressed integer. Its bytes represent an unsigned - integer in base 128, most significant digit first, with as - few digits as possible. Bit eight (the high bit) is set - on each byte except the last. + w A BER compressed integer (not an ASN.1 BER, see perlpacktut for + details). Its bytes represent an unsigned integer in base 128, + most significant digit first, with as few digits as possible. Bit + eight (the high bit) is set on each byte except the last. x A null byte. X Back up a byte. - @ Null fill to absolute position, counted from the start of - the innermost ()-group. + @ Null fill or truncate to absolute position, counted from the + start of the innermost ()-group. + . Null fill or truncate to absolute position specified by value. ( Start of a ()-group. Some letters in the TEMPLATE may optionally be followed by one or @@ -3369,6 +3384,10 @@ which the modifier is valid): nNvV Treat integers as signed instead of unsigned. + @. Specify position as byte offset in the internal + representation of the packed string. Efficient but + dangerous. + > sSiIlLqQ Force big-endian byte-order on the type. jJfFdDpP (The "big end" touches the construct.) @@ -3387,12 +3406,13 @@ The following rules apply: Each letter may optionally be followed by a number giving a repeat count. With all types except C, C, C, C, C, C, -C, C<@>, C, C and C

the pack function will gobble up that -many values from the LIST. A C<*> for the repeat count means to use -however many items are left, except for C<@>, C, C, where it is -equivalent to C<0>, and C, where it is equivalent to 1 (or 45, what -is the same). A numeric repeat count may optionally be enclosed in -brackets, as in C. +C, C<@>, C<.>, C, C and C

the pack function will gobble up +that many values from the LIST. A C<*> for the repeat count means to +use however many items are left, except for C<@>, C, C, where it +is equivalent to C<0>, for C<.> where it means relative to string start +and C, where it is equivalent to 1 (or 45, which is the same). +A numeric repeat count may optionally be enclosed in brackets, as in +C. One can replace the numeric repeat count by a template enclosed in brackets; then the packed length of this template in bytes is used as a count. @@ -3406,72 +3426,84 @@ When used with C, C<*> results in the addition of a trailing null byte (so the packed result will be one longer than the byte C of the item). +When used with C<@>, the repeat count represents an offset from the start +of the innermost () group. + +When used with C<.>, the repeat count is used to determine the starting +position from where the value offset is calculated. If the repeat count +is 0, it's relative to the current position. If the repeat count is C<*>, +the offset is relative to the start of the packed string. And if it's an +integer C the offset is relative to the start of the n-th innermost +() group (or the start of the string if C is bigger than the group +level). + The repeat count for C is interpreted as the maximal number of bytes -to encode per line of output, with 0 and 1 replaced by 45. +to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat +count should not be more than 65. =item * The C, C, and C types gobble just one value, but pack it as a string of length count, padding with nulls or spaces as necessary. When -unpacking, C strips trailing spaces and nulls, C strips everything -after the first null, and C returns data verbatim. When packing, -C, and C are equivalent. +unpacking, C strips trailing whitespace and nulls, C strips everything +after the first null, and C returns data verbatim. If the value-to-pack is too long, it is truncated.
If too long and an explicit count is provided, C packs only C<$count-1> bytes, followed -by a null byte. Thus C always packs a trailing null byte under -all circumstances. +by a null byte. Thus C always packs a trailing null (except when the +count is 0). =item * Likewise, the C and C fields pack a string that many bits long. -Each byte of the input field of pack() generates 1 bit of the result. +Each character of the input field of pack() generates 1 bit of the result. Each result bit is based on the least-significant bit of the corresponding -input byte, i.e., on C. In particular, bytes C<"0"> and -C<"1"> generate bits 0 and 1, as do bytes C<"\0"> and C<"\1">. +input character, i.e., on C. In particular, characters C<"0"> +and C<"1"> generate bits 0 and 1, as do characters C<"\0"> and C<"\1">. Starting from the beginning of the input string of pack(), each 8-tuple -of bytes is converted to 1 byte of output. With format C -the first byte of the 8-tuple determines the least-significant bit of a -byte, and with format C it determines the most-significant bit of -a byte. +of characters is converted to 1 character of output. With format C +the first character of the 8-tuple determines the least-significant bit of a +character, and with format C it determines the most-significant bit of +a character. If the length of the input string is not exactly divisible by 8, the -remainder is packed as if the input string were padded by null bytes +remainder is packed as if the input string were padded by null characters at the end. Similarly, during unpack()ing the "extra" bits are ignored. -If the input string of pack() is longer than needed, extra bytes are ignored. -A C<*> for the repeat count of pack() means to use all the bytes of -the input field. On unpack()ing the bits are converted to a string -of C<"0">s and C<"1">s. +If the input string of pack() is longer than needed, extra characters are +ignored. 
A C<*> for the repeat count of pack() means to use all the +characters of the input field. On unpack()ing the bits are converted to a +string of C<"0">s and C<"1">s. =item * The C and C fields pack a string that many nybbles (4-bit groups, representable as hexadecimal digits, 0-9a-f) long. -Each byte of the input field of pack() generates 4 bits of the result. -For non-alphabetical bytes the result is based on the 4 least-significant -bits of the input byte, i.e., on C. In particular, -bytes C<"0"> and C<"1"> generate nybbles 0 and 1, as do bytes -C<"\0"> and C<"\1">. For bytes C<"a".."f"> and C<"A".."F"> the result +Each character of the input field of pack() generates 4 bits of the result. +For non-alphabetical characters the result is based on the 4 least-significant +bits of the input character, i.e., on C. In particular, +characters C<"0"> and C<"1"> generate nybbles 0 and 1, as do characters +C<"\0"> and C<"\1">. For characters C<"a".."f"> and C<"A".."F"> the result is compatible with the usual hexadecimal digits, so that C<"a"> and -C<"A"> both generate the nybble C<0xa==10>. The result for bytes +C<"A"> both generate the nybble C<0xa==10>. The result for characters C<"g".."z"> and C<"G".."Z"> is not well-defined. Starting from the beginning of the input string of pack(), each pair -of bytes is converted to 1 byte of output. With format C the -first byte of the pair determines the least-significant nybble of the -output byte, and with format C it determines the most-significant +of characters is converted to 1 character of output. With format C the +first character of the pair determines the least-significant nybble of the +output character, and with format C it determines the most-significant nybble. If the length of the input string is not even, it behaves as if padded -by a null byte at the end. Similarly, during unpack()ing the "extra" +by a null character at the end. Similarly, during unpack()ing the "extra" nybbles are ignored.
-If the input string of pack() is longer than needed, extra bytes are ignored. -A C<*> for the repeat count of pack() means to use all the bytes of -the input field. On unpack()ing the bits are converted to a string +If the input string of pack() is longer than needed, extra characters are +ignored. +A C<*> for the repeat count of pack() means to use all the characters of +the input field. On unpack()ing the nybbles are converted to a string of hexadecimal digits. =item * @@ -3490,24 +3522,32 @@ so will result in a fatal error. =item * -The C template character allows packing and unpacking of strings where -the packed structure contains a byte count followed by the string itself. -You write ICI. +The C template character allows packing and unpacking of a sequence of +items where the packed structure contains a packed item count followed by +the packed items themselves. +You write ICI. The I can be any C template letter, and describes how the length value is packed. The ones likely to be of most use are integer-packing ones like C (for Java strings), C (for ASN.1 or SNMP) and C (for Sun XDR). -For C, the I must, at present, be C<"A*">, C<"a*"> or -C<"Z*">. For C the length of the string is obtained from the -I, but if you put in the '*' it will be ignored. For all other -codes, C applies the length value to the next item, which must not -have a repeat count. +For C, the I may have a repeat count, in which case +the minimum of that and the number of available items is used as argument +for the I. If it has no repeat count or uses a '*', the number +of available items is used. For C the repeat count is always obtained +by decoding the packed item count, and the I must not have a +repeat count. 
- unpack 'C/a', "\04Gurusamy"; gives 'Guru' - unpack 'a3/A* A*', '007 Bond J '; gives (' Bond','J') - pack 'n/a* w/a*','hello,','world'; gives "\000\006hello,\005world" +If the I refers to a string type (C<"A">, C<"a"> or C<"Z">), +the I is a string length, not a number of strings. If there is +an explicit repeat count for pack, the packed string will be adjusted to that +given length. + + unpack 'W/a', "\04Gurusamy"; gives ('Guru') + unpack 'a3/A* A*', '007 Bond J '; gives (' Bond', 'J') + pack 'n/a* w/a','hello,','world'; gives "\000\006hello,\005world" + pack 'a/W2', ord('a') .. ord('z'); gives '2ab' The I is not returned explicitly from C. @@ -3574,7 +3614,7 @@ Some systems may have even weirder byte orders such as You can see your system's preference with print join(" ", map { sprintf "%#02x", $_ } - unpack("C*",pack("L",0x12345678))), "\n"; + unpack("W*",pack("L",0x12345678))), "\n"; The byteorder on the platform where Perl was built is also available via L: @@ -3642,21 +3682,21 @@ will not in general equal $foo). =item * -If the pattern begins with a C, the resulting string will be -treated as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a -string with an initial C, and the bytes that follow will be -interpreted as Unicode characters. If you don't want this to happen, -you can begin your pattern with C (or anything else) to force Perl -not to UTF-8 encode your string, and then follow this with a C -somewhere in your pattern. +Pack and unpack can operate in two modes, character mode (C mode) where +the packed string is processed per character and UTF-8 mode (C mode) +where the packed string is processed in its UTF-8-encoded Unicode form on +a byte by byte basis. Character mode is the default unless the format string +starts with an C. You can switch mode at any moment with an explicit +C or C in the format. A mode is in effect until the next mode switch +or until the end of the ()-group in which it was entered. 
=item * You must yourself do any alignment or padding by inserting for example enough C<'x'>es while packing. There is no way to pack() and unpack() -could know where the bytes are going to or coming from. Therefore +could know where the characters are going to or coming from. Therefore C (and C) handle their output and input as flat -sequences of bytes. +sequences of characters. =item * @@ -3669,14 +3709,13 @@ C<@> starts again at 0. Therefore, the result of is the string "\0a\0\0bc". - =item * C and C accept C modifier. In this case they act as alignment commands: they jump forward/back to the closest position -aligned at a multiple of C bytes. For example, to pack() or +aligned at a multiple of C characters. For example, to pack() or unpack() C's C one may need to -use the template C; this assumes that doubles must be +use the template C; this assumes that doubles must be aligned on the double's size. For alignment commands C of 0 is equivalent to C of 1; @@ -3706,20 +3745,27 @@ to pack() than actually given, extra arguments are ignored. Examples: - $foo = pack("CCCC",65,66,67,68); + $foo = pack("WWWW",65,66,67,68); # foo eq "ABCD" - $foo = pack("C4",65,66,67,68); + $foo = pack("W4",65,66,67,68); # same thing + $foo = pack("W4",0x24b6,0x24b7,0x24b8,0x24b9); + # same thing with Unicode circled letters. $foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9); - # same thing with Unicode circled letters + # same thing with Unicode circled letters. 
You don't get the UTF-8 + # bytes because the U at the start of the format caused a switch to + # U0-mode, so the UTF-8 bytes get joined into characters + $foo = pack("C0U4",0x24b6,0x24b7,0x24b8,0x24b9); + # foo eq "\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9" + # This is the UTF-8 encoding of the string in the previous example $foo = pack("ccxxcc",65,66,67,68); # foo eq "AB\0\0CD" - # note: the above examples featuring "C" and "c" are true + # note: the above examples featuring "W" and "c" are true # only on ASCII and ASCII-derived systems such as ISO Latin 1 # and UTF-8. In EBCDIC the first example would be - # $foo = pack("CCCC",193,194,195,196); + # $foo = pack("WWWW",193,194,195,196); $foo = pack("s2",1,2); # "\1\0\2\0" on little-endian @@ -3753,6 +3799,8 @@ Examples: $bar = pack('s@4l', 12, 34); # short 12, zero fill to position 4, long 34 # $foo eq $bar + $baz = pack('s.l', 12, 4, 34); + # short 12, zero fill to position 4, long 34 $foo = pack('nN', 42, 4711); # pack big-endian 16- and 32-bit unsigned integers @@ -3828,9 +3876,14 @@ array in subroutines, just like C. =item pos Returns the offset of where the last C search left off for the variable -in question (C<$_> is used when the variable is not specified). May be -modified to change that offset. Such modification will also influence -the C<\G> zero-width assertion in regular expressions. See L and +in question (C<$_> is used when the variable is not specified). Note that +0 is a valid match offset, while C indicates that the search position +is reset (usually due to match failure, but can also be because no match has +yet been performed on the scalar). C directly accesses the location used +by the regexp engine to store the offset, so assigning to C will change +that offset, and so will also influence the C<\G> zero-width assertion in +regular expressions. Because a failed C match doesn't reset the offset, +the return from C won't change either in this case. See L and L. 
=item print FILEHANDLE LIST @@ -4250,7 +4303,7 @@ a bareword argument, there is a little extra functionality going on behind the scenes. Before C looks for a "F<.pm>" extension, it will first look for a filename with a "F<.pmc>" extension. A file with this extension is assumed to be Perl bytecode generated by -L. If this file is found, and it's modification +L. If this file is found, and its modification time is newer than a coinciding "F<.pm>" non-compiled file, it will be loaded in place of that non-compiled file ending in a "F<.pm>" extension. @@ -4553,6 +4606,14 @@ Note that whether C. +On error, C, except as permitted by POSIX, and even then only on POSIX systems. You have to use C instead. @@ -5016,14 +5077,19 @@ characters at each point it matches that way. For example: produces the output 'h:i:t:h:e:r:e'. -Using the empty pattern C specifically matches the null string, and is -not be confused with the use of C to mean "the last successful pattern -match". +As a special case for C, using the empty pattern C specifically +matches only the null string, and is not to be confused with the regular use +of C to mean "the last successful pattern match". So, for C, +the following: -Empty leading (or trailing) fields are produced when there are positive width -matches at the beginning (or end) of the string; a zero-width match at the -beginning (or end) of the string does not produce an empty field. For -example: + print join(':', split(//, 'hi there')); + +produces the output 'h:i: :t:h:e:r:e'. + +Empty leading (or trailing) fields are produced when there are positive +width matches at the beginning (or end) of the string; a zero-width match +at the beginning (or end) of the string does not produce an empty field. +For example: print join(':', split(/(?=\w)/, 'hi there!')); @@ -5464,9 +5530,9 @@ meanings of the fields: (The epoch was at 00:00 January 1, 1970 GMT.) -(*) The ctime field is non-portable.
In particular, you cannot expect -it to be a "creation time", see L -for details. +(*) Not all fields are supported on all filesystem types. Notably, the +ctime field is non-portable. In particular, you cannot expect it to be a +"creation time", see L for details. If C is passed the special filehandle consisting of an underline, no stat is done, but the current contents of the stat structure from the @@ -5836,7 +5902,7 @@ will return byte offsets, not character offsets (because implementing that would render sysseek() very slow). sysseek() bypasses normal buffered IO, so mixing this with reads (other -than C, for example >< or read()) C, C, +than C, for example C<< <> >> or read()) C, C, C, C, or C may cause confusion. For WHENCE, you may also use the constants C, C, @@ -6230,7 +6296,7 @@ If EXPR is omitted, unpacks the C<$_> string. The string is broken into chunks described by the TEMPLATE. Each chunk is converted separately to a value. Typically, either the string is a result -of C, or the bytes of the string represent a C structure of some +of C, or the characters of the string represent a C structure of some kind. The TEMPLATE has the same format as in the C function. @@ -6243,7 +6309,7 @@ Here's a subroutine that does substring: and then there's - sub ordinal { unpack("c",$_[0]); } # same as ord() + sub ordinal { unpack("W",$_[0]); } # same as ord() In addition to fields allowed in pack(), you may prefix a field with a % to indicate that @@ -6257,7 +6323,7 @@ computes the same number as the System V sum program: $checksum = do { local $/; # slurp! - unpack("%32C*",<>) % 65535; + unpack("%32W*",<>) % 65535; }; The following efficiently counts the number of set bits in a bit vector: @@ -6726,7 +6792,7 @@ and for other examples. =item wantarray Returns true if the context of the currently executing subroutine or -eval() block is looking for a list value. Returns false if the context is +C is looking for a list value. 
Returns false if the context is looking for a scalar. Returns the undefined value if the context is looking for no value (void context). @@ -6734,6 +6800,10 @@ looking for no value (void context). my @a = complex_calculation(); return wantarray ? @a : "@a"; +C's result is unspecified in the top level of a file, +in a C, C, C or C block, or in a C +method. + This function should have been named wantlist() instead. =item warn LIST