=item do EXPR
Uses the value of EXPR as a filename and executes the contents of the
-file as a Perl script. Its primary use is to include subroutines
-from a Perl subroutine library.
+file as a Perl script.
do 'stat.pl';
eval `cat stat.pl`;
except that it's more efficient and concise, keeps track of the current
-filename for error messages, searches the @INC libraries, and updates
+filename for error messages, searches the @INC directories, and updates
C<%INC> if the file is found. See L<perlvar/Predefined Names> for these
variables. It also differs in that code evaluated with C<do FILENAME>
cannot see lexicals in the enclosing scope; C<eval STRING> does. It's the
Internet domain, each address is four bytes long and you can unpack it
by saying something like:
- ($a,$b,$c,$d) = unpack('C4',$addr[0]);
+ ($a,$b,$c,$d) = unpack('W4',$addr[0]);
The Socket library makes this slightly easier:
An example testing whether Nagle's algorithm is turned on for a socket:
- use Socket;
+ use Socket qw(:all);
defined(my $tcp = getprotobyname("tcp"))
or die "Could not determine the protocol number for tcp";
- # my $tcp = Socket::IPPROTO_TCP; # Alternative
- my $packed = getsockopt($socket, $tcp, Socket::TCP_NODELAY)
- or die "Could not query TCP_NODELAY SOCKEt option: $!";
+ # my $tcp = IPPROTO_TCP; # Alternative
+ my $packed = getsockopt($socket, $tcp, TCP_NODELAY)
+ or die "Could not query TCP_NODELAY socket option: $!";
my $nodelay = unpack("I", $packed);
print "Nagle's algorithm is turned ", $nodelay ? "off\n" : "on\n";
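A self-contained variant of the check above, offered as a sketch: it creates an unconnected TCP socket (so no peer is needed), sets C<TCP_NODELAY>, and reads the option back. It uses the C<IPPROTO_TCP> constant instead of C<getprotobyname>, and assumes a Socket version that exports both constants through C<:all>. The variable names are illustrative.

```perl
use strict;
use warnings;
use Socket qw(:all);

# Create an unconnected TCP socket; no peer is needed to query options.
socket(my $socket, PF_INET, SOCK_STREAM, IPPROTO_TCP)
    or die "Could not create socket: $!";

# Disable Nagle's algorithm, then read the option back.
setsockopt($socket, IPPROTO_TCP, TCP_NODELAY, 1)
    or die "Could not set TCP_NODELAY: $!";
my $packed = getsockopt($socket, IPPROTO_TCP, TCP_NODELAY)
    or die "Could not query TCP_NODELAY socket option: $!";

# A non-zero value means TCP_NODELAY is set, i.e. Nagle is off.
my $nodelay = unpack("I", $packed);
print "Nagle's algorithm is turned ", $nodelay ? "off\n" : "on\n";
```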
=item hex
Interprets EXPR as a hex string and returns the corresponding value.
-(To convert strings that might start with either 0, 0x, or 0b, see
+(To convert strings that might start with either C<0>, C<0x>, or C<0b>, see
L</oct>.) If EXPR is omitted, uses C<$_>.
print hex '0xAf'; # prints '175'
Hex strings may only represent integers. Strings that would cause
integer overflow trigger a warning. Leading whitespace is not stripped,
-unlike oct().
+unlike oct(). To present something as hex, look into L</printf>,
+L</sprintf>, or L</unpack>.
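As a brief illustration of the points above, the following sketch contrasts C<hex> with C<oct> for prefixed strings and shows two ways of going back from a number to hex text. The variable names are illustrative only.

```perl
use strict;
use warnings;

# hex() accepts an optional "0x" prefix.
my $from_hex = hex('0xAf');        # 175

# oct() looks at the prefix and handles hex, binary and octal strings.
my $via_oct  = oct('0xAf');        # 175
my $binary   = oct('0b101');       # 5
my $octal    = oct('0755');        # 493

# Presenting a number as hex text.
my $plain    = sprintf('%x', 175);           # "af"
my $padded   = sprintf('%#06x', 175);        # "0x00af"
my $unpacked = unpack('H*', pack('n', 175)); # "00af"

print "$from_hex $via_oct $binary $octal $plain $padded $unpacked\n";
```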
=item import
=item localtime EXPR
+=item localtime
+
Converts a time as returned by the time function to a 9-element list
with the time analyzed for the local time zone. Typically used as
follows:
# 0 1 2 3 4 5 6 7 8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
- localtime(time);
+ localtime(time);
All list elements are numeric, and come straight out of the C `struct
-tm'. $sec, $min, and $hour are the seconds, minutes, and hours of the
-specified time. $mday is the day of the month, and $mon is the month
-itself, in the range C<0..11> with 0 indicating January and 11
-indicating December. $year is the number of years since 1900. That
-is, $year is C<123> in year 2023. $wday is the day of the week, with
-0 indicating Sunday and 3 indicating Wednesday. $yday is the day of
-the year, in the range C<0..364> (or C<0..365> in leap years.) $isdst
-is true if the specified time occurs during daylight savings time,
-false otherwise.
+tm'. C<$sec>, C<$min>, and C<$hour> are the seconds, minutes, and hours
+of the specified time.
-Note that the $year element is I<not> simply the last two digits of
-the year. If you assume it is, then you create non-Y2K-compliant
-programs--and you wouldn't want to do that, would you?
+C<$mday> is the day of the month, and C<$mon> is the month itself, in
+the range C<0..11> with 0 indicating January and 11 indicating December.
+This makes it easy to get a month name from a list:
-The proper way to get a complete 4-digit year is simply:
+ my @abbr = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
+ print "$abbr[$mon] $mday";
+ # $mon=9, $mday=18 gives "Oct 18"
- $year += 1900;
+C<$year> is the number of years since 1900, not just the last two digits
+of the year. That is, C<$year> is C<123> in year 2023. The proper way
+to get a complete 4-digit year is simply:
-And to get the last two digits of the year (e.g., '01' in 2001) do:
+ $year += 1900;
- $year = sprintf("%02d", $year % 100);
+To get the last two digits of the year (e.g., '01' in 2001) do:
+
+ $year = sprintf("%02d", $year % 100);
+
+C<$wday> is the day of the week, with 0 indicating Sunday and 3 indicating
+Wednesday. C<$yday> is the day of the year, in the range C<0..364>
+(or C<0..365> in leap years).
+
+C<$isdst> is true if the specified time occurs during Daylight Saving
+Time, false otherwise.
If EXPR is omitted, C<localtime()> uses the current time (C<localtime(time)>).
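The adjustments described above can be combined into one short sketch. The epoch value is an arbitrary choice for illustration, and the printed date depends on the local time zone, so the result is cross-checked against C<POSIX::strftime> rather than stated.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my $when = 1_000_000_000;    # arbitrary epoch value, chosen for illustration

my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst)
    = localtime($when);

# $mon is 0-based and $year counts from 1900, so adjust both.
my $iso = sprintf("%04d-%02d-%02d", $year + 1900, $mon + 1, $mday);

# Cross-check against strftime, which takes the same 9-element list.
my $check = strftime("%Y-%m-%d", localtime($when));

print "$iso\n";
print "matches strftime\n" if $iso eq $check;
```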
given by the TEMPLATE. The resulting string is the concatenation of
the converted values. Typically, each converted value looks
like its machine-level representation. For example, on 32-bit machines
-a converted integer may be represented by a sequence of 4 bytes.
+an integer may be represented by a sequence of 4 bytes which will be
+converted to a sequence of 4 characters.
The TEMPLATE is a sequence of characters that give the order and type
of values, as follows:
H A hex string (high nybble first).
c A signed char (8-bit) value.
- C An unsigned char value. Only does bytes. See U for Unicode.
+ C An unsigned C char (octet) even under Unicode. Should normally not
+ be used. See U and W instead.
+ W An unsigned char value (can be greater than 255).
s A signed short (16-bit) value.
S An unsigned short value.
of the item).
The repeat count for C<u> is interpreted as the maximal number of bytes
-to encode per line of output, with 0 and 1 replaced by 45.
+to encode per line of output, with 0, 1 and 2 replaced by 45. The repeat
+count should not be more than 65.
=item *
The C<a>, C<A>, and C<Z> types gobble just one value, but pack it as a
string of length count, padding with nulls or spaces as necessary. When
unpacking, C<A> strips trailing spaces and nulls, C<Z> strips everything
-after the first null, and C<a> returns data verbatim. When packing,
-C<a>, and C<Z> are equivalent.
+after the first null, and C<a> returns data verbatim.
If the value-to-pack is too long, it is truncated. If too long and an
explicit count is provided, C<Z> packs only C<$count-1> bytes, followed
-by a null byte. Thus C<Z> always packs a trailing null byte under
-all circumstances.
+by a null byte. Thus C<Z> always packs a trailing null (except when the
+count is 0).
=item *
Likewise, the C<b> and C<B> fields pack a string that many bits long.
-Each byte of the input field of pack() generates 1 bit of the result.
+Each character of the input field of pack() generates 1 bit of the result.
Each result bit is based on the least-significant bit of the corresponding
-input byte, i.e., on C<ord($byte)%2>. In particular, bytes C<"0"> and
-C<"1"> generate bits 0 and 1, as do bytes C<"\0"> and C<"\1">.
+input character, i.e., on C<ord($char)%2>. In particular, characters C<"0">
+and C<"1"> generate bits 0 and 1, as do characters C<"\0"> and C<"\1">.
Starting from the beginning of the input string of pack(), each 8-tuple
-of bytes is converted to 1 byte of output. With format C<b>
-the first byte of the 8-tuple determines the least-significant bit of a
-byte, and with format C<B> it determines the most-significant bit of
-a byte.
+of characters is converted to 1 character of output. With format C<b>
+the first character of the 8-tuple determines the least-significant bit of a
+character, and with format C<B> it determines the most-significant bit of
+a character.
If the length of the input string is not exactly divisible by 8, the
-remainder is packed as if the input string were padded by null bytes
+remainder is packed as if the input string were padded by null characters
at the end. Similarly, during unpack()ing the "extra" bits are ignored.
-If the input string of pack() is longer than needed, extra bytes are ignored.
-A C<*> for the repeat count of pack() means to use all the bytes of
-the input field. On unpack()ing the bits are converted to a string
-of C<"0">s and C<"1">s.
+If the input string of pack() is longer than needed, extra characters are
+ignored. A C<*> for the repeat count of pack() means to use all the
+characters of the input field. On unpack()ing the bits are converted to a
+string of C<"0">s and C<"1">s.
=item *
The C<h> and C<H> fields pack a string that many nybbles (4-bit groups,
representable as hexadecimal digits, 0-9a-f) long.
-Each byte of the input field of pack() generates 4 bits of the result.
-For non-alphabetical bytes the result is based on the 4 least-significant
-bits of the input byte, i.e., on C<ord($byte)%16>. In particular,
-bytes C<"0"> and C<"1"> generate nybbles 0 and 1, as do bytes
-C<"\0"> and C<"\1">. For bytes C<"a".."f"> and C<"A".."F"> the result
+Each character of the input field of pack() generates 4 bits of the result.
+For non-alphabetical characters the result is based on the 4 least-significant
+bits of the input character, i.e., on C<ord($char)%16>. In particular,
+characters C<"0"> and C<"1"> generate nybbles 0 and 1, as do characters
+C<"\0"> and C<"\1">. For characters C<"a".."f"> and C<"A".."F"> the result
is compatible with the usual hexadecimal digits, so that C<"a"> and
-C<"A"> both generate the nybble C<0xa==10>. The result for bytes
+C<"A"> both generate the nybble C<0xa==10>. The result for characters
C<"g".."z"> and C<"G".."Z"> is not well-defined.
Starting from the beginning of the input string of pack(), each pair
-of bytes is converted to 1 byte of output. With format C<h> the
-first byte of the pair determines the least-significant nybble of the
-output byte, and with format C<H> it determines the most-significant
+of characters is converted to 1 character of output. With format C<h> the
+first character of the pair determines the least-significant nybble of the
+output character, and with format C<H> it determines the most-significant
nybble.
If the length of the input string is not even, it behaves as if padded
-by a null byte at the end. Similarly, during unpack()ing the "extra"
+by a null character at the end. Similarly, during unpack()ing the "extra"
nybbles are ignored.
-If the input string of pack() is longer than needed, extra bytes are ignored.
-A C<*> for the repeat count of pack() means to use all the bytes of
-the input field. On unpack()ing the bits are converted to a string
+If the input string of pack() is longer than needed, extra characters are
+ignored.
+A C<*> for the repeat count of pack() means to use all the characters of
+the input field. On unpack()ing the nybbles are converted to a string
of hexadecimal digits.
=item *
codes, C<unpack> applies the length value to the next item, which must not
have a repeat count.
- unpack 'C/a', "\04Gurusamy"; gives 'Guru'
+ unpack 'W/a', "\04Gurusamy"; gives 'Guru'
unpack 'a3/A* A*', '007 Bond J '; gives (' Bond','J')
pack 'n/a* w/a*','hello,','world'; gives "\000\006hello,\005world"
You can see your system's preference with
print join(" ", map { sprintf "%#02x", $_ }
- unpack("C*",pack("L",0x12345678))), "\n";
+ unpack("W*",pack("L",0x12345678))), "\n";
The byteorder on the platform where Perl was built is also available
via L<Config>:
=item *
-If the pattern begins with a C<U>, the resulting string will be
-treated as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a
-string with an initial C<U0>, and the bytes that follow will be
-interpreted as Unicode characters. If you don't want this to happen,
-you can begin your pattern with C<C0> (or anything else) to force Perl
-not to UTF-8 encode your string, and then follow this with a C<U*>
-somewhere in your pattern.
+Pack and unpack can operate in two modes: character mode (C<C0> mode), where
+the packed string is processed per character, and UTF-8 mode (C<U0> mode),
+where the packed string is processed in its UTF-8-encoded Unicode form on
+a byte-by-byte basis. Character mode is the default unless the format string
+starts with a C<U>. You can switch mode at any moment with an explicit
+C<C0> or C<U0> in the format. A mode is in effect until the next mode switch
+or until the end of the ()-group in which it was entered.
=item *
You must yourself do any alignment or padding by inserting for example
enough C<'x'>es while packing. There is no way pack() and unpack()
-could know where the bytes are going to or coming from. Therefore
+could know where the characters are going to or coming from. Therefore
C<pack> (and C<unpack>) handle their output and input as flat
-sequences of bytes.
+sequences of characters.
=item *
C<x> and C<X> accept the C<!> modifier. In this case they act as
alignment commands: they jump forward/back to the closest position
-aligned at a multiple of C<count> bytes. For example, to pack() or
+aligned at a multiple of C<count> characters. For example, to pack() or
unpack() C's C<struct {char c; double d; char cc[2]}> one may need to
-use the template C<C x![d] d C[2]>; this assumes that doubles must be
+use the template C<W x![d] d W[2]>; this assumes that doubles must be
aligned on the double's size.
For alignment commands C<count> of 0 is equivalent to C<count> of 1;
Examples:
- $foo = pack("CCCC",65,66,67,68);
+ $foo = pack("WWWW",65,66,67,68);
# foo eq "ABCD"
- $foo = pack("C4",65,66,67,68);
+ $foo = pack("W4",65,66,67,68);
# same thing
+ $foo = pack("W4",0x24b6,0x24b7,0x24b8,0x24b9);
+ # same thing with Unicode circled letters.
$foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
- # same thing with Unicode circled letters
+ # same thing with Unicode circled letters. You don't get the UTF-8
+ # bytes because the U at the start of the format caused a switch to
+ # U0-mode, so the UTF-8 bytes get joined into characters
+ $foo = pack("C0U4",0x24b6,0x24b7,0x24b8,0x24b9);
+ # foo eq "\xe2\x92\xb6\xe2\x92\xb7\xe2\x92\xb8\xe2\x92\xb9"
+ # This is the UTF-8 encoding of the string in the previous example
$foo = pack("ccxxcc",65,66,67,68);
# foo eq "AB\0\0CD"
- # note: the above examples featuring "C" and "c" are true
+ # note: the above examples featuring "W" and "c" are true
# only on ASCII and ASCII-derived systems such as ISO Latin 1
# and UTF-8. In EBCDIC the first example would be
- # $foo = pack("CCCC",193,194,195,196);
+ # $foo = pack("WWWW",193,194,195,196);
$foo = pack("s2",1,2);
# "\1\0\2\0" on little-endian
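The following sketch pulls together two behaviours described in this section: the difference between character mode and UTF-8 mode, and the bit and nybble ordering of the C<b>/C<B> and C<h>/C<H> formats. It assumes a Perl recent enough to support the C<W> format; the variable names are illustrative.

```perl
use strict;
use warnings;

my @circled = (0x24b6, 0x24b7, 0x24b8, 0x24b9);

my $chars = pack("W4",   @circled);   # character mode: 4 characters
my $also  = pack("U4",   @circled);   # leading U: still 4 characters
my $bytes = pack("C0U4", @circled);   # C0 first: the 12 UTF-8 bytes

printf "%d %d %d\n", length($chars), length($also), length($bytes);

# Decoding the UTF-8 bytes recovers the character string.
my $decoded = $bytes;
utf8::decode($decoded);
print "round trip ok\n" if $decoded eq $chars;

# Bit and nybble order: b and h put the first input character in the
# low position of the output, B and H put it in the high position.
my $low_bit  = pack("b8", "10000000");   # "\x01"
my $high_bit = pack("B8", "10000000");   # "\x80"
my $low_nyb  = pack("h2", "1f");         # "\xf1"
my $high_nyb = pack("H2", "1f");         # "\x1f"

printf "%02x %02x %02x %02x\n",
    ord($low_bit), ord($high_bit), ord($low_nyb), ord($high_nyb);
```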
that would render sysseek() very slow).
sysseek() bypasses normal buffered IO, so mixing this with reads (other
-than C<sysread>, for example >< or read()) C<print>, C<write>,
+than C<sysread>, for example C<< <> >> or read()) C<print>, C<write>,
C<seek>, C<tell>, or C<eof> may cause confusion.
For WHENCE, you may also use the constants C<SEEK_SET>, C<SEEK_CUR>,
The string is broken into chunks described by the TEMPLATE. Each chunk
is converted separately to a value. Typically, either the string is a result
-of C<pack>, or the bytes of the string represent a C structure of some
+of C<pack>, or the characters of the string represent a C structure of some
kind.
The TEMPLATE has the same format as in the C<pack> function.
and then there's
- sub ordinal { unpack("c",$_[0]); } # same as ord()
+ sub ordinal { unpack("W",$_[0]); } # same as ord()
In addition to fields allowed in pack(), you may prefix a field with
a %<number> to indicate that
$checksum = do {
local $/; # slurp!
- unpack("%32C*",<>) % 65535;
+ unpack("%32W*",<>) % 65535;
};
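To see what the checksum prefix computes, here is a small sketch that applies the same template to an in-memory string instead of slurped input and compares it with a sum computed by hand. The sample string is an arbitrary choice, and the C<W> format is assumed to be available.

```perl
use strict;
use warnings;

my $data = "abc";                       # ord values 97, 98, 99

# %32W* sums the character values as a 32-bit checksum.
my $checksum = unpack("%32W*", $data) % 65535;

# The same sum computed by hand, for comparison.
my $by_hand = 0;
$by_hand += ord($_) for split //, $data;
$by_hand %= 65535;

print "$checksum $by_hand\n";           # both are 294
```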
The following efficiently counts the number of set bits in a bit vector:
=item wantarray
Returns true if the context of the currently executing subroutine or
-eval() block is looking for a list value. Returns false if the context is
+C<eval> is looking for a list value. Returns false if the context is
looking for a scalar. Returns the undefined value if the context is
looking for no value (void context).
my @a = complex_calculation();
return wantarray ? @a : "@a";
+C<wantarray()>'s result is unspecified at the top level of a file,
+in a C<BEGIN>, C<CHECK>, C<INIT> or C<END> block, or in a C<DESTROY>
+method.
+
This function should have been named wantlist() instead.
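A minimal sketch of the three possible results, using a hypothetical helper named C<context_name> that records the context it was called in:

```perl
use strict;
use warnings;

my $last_seen;    # records the context of the most recent call

sub context_name {
    # true => list, defined but false => scalar, undef => void
    $last_seen = wantarray         ? 'list'
               : defined wantarray ? 'scalar'
               :                     'void';
    return $last_seen;
}

my @in_list   = context_name();    # called in list context
my $in_scalar = context_name();    # called in scalar context
context_name();                    # called in void context

print "$in_list[0] $in_scalar $last_seen\n";   # list scalar void
```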
=item warn LIST