=head2 Terms and List Operators (Leftward)
-A TERM has the highest precedence in Perl. They includes variables,
+A TERM has the highest precedence in Perl. They include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
"" qq{} Literal yes
`` qx{} Command yes (unless '' is delimiter)
qw{} Word list no
- // m{} Pattern match yes
- qr{} Pattern yes
- s{}{} Substitution yes
+ // m{} Pattern match yes (unless '' is delimiter)
+ qr{} Pattern yes (unless '' is delimiter)
+ s{}{} Substitution yes (unless '' is delimiter)
tr{}{} Transliteration no (but see below)
Note that there can be whitespace between the operator and the quoting
For constructs that do interpolation, variables beginning with "C<$>"
or "C<@>" are interpolated, as are the following sequences. Within
-a transliteration, the first ten of these sequences may be used.
+a transliteration, the first eleven of these sequences may be used.
\t tab (HT, TAB)
\n newline (NL)
\b backspace (BS)
\a alarm (bell) (BEL)
\e escape (ESC)
- \033 octal char
- \x1b hex char
+ \033 octal char (ESC)
+ \x1b hex char (ESC)
+ \x{263a} wide hex char (SMILEY)
\c[ control char
\l lowercase next char
If "/" is the delimiter then the initial C<m> is optional. With the C<m>
you can use any pair of non-alphanumeric, non-whitespace characters
-as delimiters (if single quotes are used, no interpretation is done
-on the replacement string. Unlike Perl 4, Perl 5 treats backticks as normal
-delimiters; the replacement text is not evaluated as a command).
-This is particularly useful for matching Unix path names
-that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
+as delimiters. This is particularly useful for matching Unix path names
+that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
+If "'" is the delimiter, no variable interpolation is performed on the
+PATTERN.
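A single-quote delimiter is handy when the pattern contains a literal C<@> or C<$> that should not be treated as a variable. This is a minimal sketch with an illustrative address string. Under C<use strict>, writing C<m/user@example/> would instead try to interpolate an array named C<@example> and fail to compile.

```perl
use strict;
use warnings;

my $addr = 'user@example.com';

# With "'" as the delimiter, "@example" is not interpolated as an array,
# so the pattern matches the literal text "user@example.com".
my $literal = $addr =~ m'user@example\.com' ? 1 : 0;

print "literal match: $literal\n";
```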
PATTERN may contain variables, which will be interpolated (and the
-pattern recompiled) every time the pattern search is evaluated. (Note
-that C<$)> and C<$|> might not be interpolated because they look like
-end-of-string tests.) If you want such a pattern to be compiled only
-once, add a C</o> after the trailing delimiter. This avoids expensive
-run-time recompilations, and is useful when the value you are
-interpolating won't change over the life of the script. However, mentioning
-C</o> constitutes a promise that you won't change the variables in the pattern.
-If you change them, Perl won't even notice.
+pattern recompiled) every time the pattern search is evaluated, except
+when the delimiter is a single quote. (Note that C<$)> and C<$|>
+might not be interpolated because they look like end-of-string tests.)
+If you want such a pattern to be compiled only once, add a C</o> after
+the trailing delimiter. This avoids expensive run-time recompilations,
+and is useful when the value you are interpolating won't change over
+the life of the script. However, mentioning C</o> constitutes a promise
+that you won't change the variables in the pattern. If you change them,
+Perl won't even notice.
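The effect of C</o> shows up when the interpolated variable changes between evaluations. In this minimal sketch (variable names are illustrative), the C</o> match keeps using the pattern compiled from the first value of C<$x>, while the plain match is recompiled when C<$x> changes.

```perl
use strict;
use warnings;

# Record whether "b" matches the pattern held in $x, with and without /o.
my (@with_o, @without_o);
for my $x ('a', 'b') {
    push @with_o,    ("b" =~ /$x/o ? 1 : 0);  # compiled once, while $x eq 'a'
    push @without_o, ("b" =~ /$x/  ? 1 : 0);  # recompiled when $x changes
}
print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

Because of this, C</o> is only safe when the interpolated value really is fixed for the life of the program.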
If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead.
If the C</g> option is not used, C<m//> in a list context returns a
list consisting of the subexpressions matched by the parentheses in the
-pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here
-C<$1> etc. are also set, and
-that this differs from Perl 4's behavior.) If there are no parentheses,
-the return value is the list C<(1)> for success or C<('')> upon failure.
-With parentheses, C<()> is returned upon failure.
+pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are
+also set, and that this differs from Perl 4's behavior.) When there are
+no parentheses in the pattern, the return value is the list C<(1)> for
+success. With or without parentheses, an empty list is returned upon
+failure.
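The three list-context cases can be sketched as follows, using an illustrative date string:

```perl
use strict;
use warnings;

my $date = "2024-05-17";

# With capturing parentheses: the captured substrings are returned.
my ($y, $m, $d) = $date =~ /^(\d+)-(\d+)-(\d+)$/;

# Without parentheses: the list (1) on success.
my @ok = $date =~ /\d/;

# With or without parentheses: an empty list on failure.
my @fail_caps  = $date =~ /^(x)(y)$/;
my @fail_plain = $date =~ /x/;

print "$y/$m/$d; ok=@ok; failures: ", scalar(@fail_caps), " ", scalar(@fail_plain), "\n";
```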
Examples:
=item qr/STRING/imosx
-A string which is (possibly) interpolated and then compiled as a
-regular expression. The result may be used as a pattern in a match
+Quote-as-a-regular-expression operator. I<STRING> is interpolated the
+same way as I<PATTERN> in C<m/PATTERN/>. If "'" is used as the
+delimiter, no variable interpolation is done. Returns a Perl value
+which may be used instead of the corresponding C</STRING/imosx> expression.
+
+For example,
+
+ $rex = qr/my.STRING/is;
+ s/$rex/foo/;
+
+is equivalent to
+
+ s/my.STRING/foo/is;
+
+The result may be used as a subpattern in a match:
$re = qr/$pattern/;
$string =~ /foo${re}bar/; # can be interpolated in other patterns
$string =~ $re; # or used standalone
+ $string =~ /$re/; # or this way
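The value returned by qr() remembers the modifiers it was created with, so a pattern built with C</i> still matches case-insensitively when it is interpolated into a larger pattern that has no C</i> of its own. A minimal sketch:

```perl
use strict;
use warnings;

my $re = qr/foo/i;                          # the /i flag travels with $re

my $embedded   = "xFOOy" =~ /x${re}y/;      # interpolated into a larger pattern
my $standalone = "FOO"   =~ $re;            # used directly as the pattern
my $slashes    = "FOO"   =~ /$re/;          # interpolated on its own

print "$embedded $standalone $slashes\n";
```

Only the C<foo> part is case-insensitive here; the surrounding C<x> and C<y> in the outer pattern are still matched case-sensitively.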
+
+Since Perl may compile the pattern at the moment the qr() operator is
+executed, using qr() may have speed advantages in I<some> situations,
+notably if the result of qr() is used standalone:
+
+ sub match {
+     my $patterns = shift;
+     my @compiled = map qr/$_/i, @$patterns;
+     grep {
+         my $success = 0;
+         foreach my $pat (@compiled) {
+             $success = 1, last if /$pat/;
+         }
+         $success;
+     } @_;
+ }
+
+Precompiling the pattern into an internal representation at the moment
+qr() is executed avoids the need to recompile it every time the match
+C</$pat/> is attempted. (Note that Perl has many other internal
+optimizations, but none would be triggered in the above example if we
+did not use the qr() operator.)
Options are:
s Treat string as single line.
x Use extended regular expressions.
-The benefit from this is that the pattern is precompiled into an internal
-representation, and does not need to be recompiled every time a match
-is attempted. This makes it very efficient to do something like:
-
- foreach $pattern (@pattern_list) {
- my $re = qr/$pattern/;
- foreach $line (@lines) {
- if($line =~ /$re/) {
- do_something($line);
- }
- }
- }
-
See L<perlre> for additional information on valid syntax for STRING, and
for a detailed look at the semantics of regular expressions.
=item qw/STRING/
-Returns a list of the words extracted out of STRING, using embedded
-whitespace as the word delimiters. It is exactly equivalent to
+Evaluates to a list of the words extracted out of STRING, using embedded
+whitespace as the word delimiters. It can be understood as being roughly
+equivalent to:
split(' ', q/STRING/);
-This equivalency means that if used in scalar context, you'll get split's
-(unfortunate) scalar context behavior, complete with mysterious warnings.
+the difference being that it generates a real list at compile time. So
+this expression:
+
+ qw(foo bar baz)
+
+is exactly equivalent to the list:
+
+ ('foo', 'bar', 'baz')
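The equivalence with both the literal list and the split() form can be checked directly; this is a small sketch with illustrative words:

```perl
use strict;
use warnings;

my @words = qw(foo bar baz);
my @list  = ('foo', 'bar', 'baz');
my @split = split(' ', q/foo  bar   baz/);   # runs of whitespace count as one delimiter

print "@words | @list | @split\n";
```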
Some frequently seen examples:
switch produce warnings if the STRING contains the "," or the "#"
character.
+Note that under C<use locale> (see L<locale>), the list returned by
+qw() is tainted, because the definition of whitespace is tainted. See
+L<perlsec> for more information about tainting and L<perllocale> for
+more information about locales.
+
=item s/PATTERN/REPLACEMENT/egimosx
Searches a string for a pattern, and if found, replaces that pattern
be scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)
-If the delimiter chosen is single quote, no variable interpolation is
+If the delimiter chosen is a single quote, no variable interpolation is
done on either the PATTERN or the REPLACEMENT. Otherwise, if the
PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
-=item tr/SEARCHLIST/REPLACEMENTLIST/cds
+=item tr/SEARCHLIST/REPLACEMENTLIST/cdsUC
-=item y/SEARCHLIST/REPLACEMENTLIST/cds
+=item y/SEARCHLIST/REPLACEMENTLIST/cdsUC
Transliterates all occurrences of the characters found in the search list
with the corresponding character in the replacement list. It returns
specified via the =~ or !~ operator, the $_ string is transliterated. (The
string specified with =~ must be a scalar variable, an array element, a
hash element, or an assignment to one of those, i.e., an lvalue.)
+
A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
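Applying both forms to the same string confirms that they agree. A minimal sketch, transliterating copies so the original string is left alone:

```perl
use strict;
use warnings;

my $s = "ABCDEFGHIJ";
(my $via_range = $s) =~ tr/A-J/0-9/;               # range form
(my $spelled   = $s) =~ tr/ACEGIBDFHJ/0246813579/; # spelled-out form

print "$via_range $spelled\n";
```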
For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
its own pair of quotes, which may or may not be bracketing quotes,
e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
+Note also that the whole range idea is rather unportable between
+character sets, and even within a single character set ranges may
+produce results you probably didn't expect. A sound principle is to use
+only ranges that begin and end with letters of the same case (a-e, A-E)
+or with digits (0-4). Anything else is unsafe. If in doubt, spell out
+the character sets in full.
+
Options:
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
+ U Translate to/from UTF-8.
+ C Translate to/from 8-bit char (octet).
If the C</c> modifier is specified, the SEARCHLIST character set is
complemented. If the C</d> modifier is specified, any characters specified
This latter is useful for counting characters in a class or for
squashing character sequences in a class.
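The C</c>, C</d> and C</s> modifiers, and the counting idiom, can be sketched on an illustrative string. Each transliteration below works on a copy, except the counting one, which leaves its string unchanged because the replacement list is empty and no modifier is given.

```perl
use strict;
use warnings;

my $text = "hello, world";

my $vowels = ($text =~ tr/aeiou//);          # count only; $text is unchanged
(my $no_l     = $text) =~ tr/l//d;           # /d: delete every 'l'
(my $letters  = $text) =~ tr/a-z//cd;        # /c + /d: delete everything not a-z
(my $squashed = "aabbccdd") =~ tr/a-z//s;    # /s: squash runs of the same char

print "$vowels | $no_l | $letters | $squashed\n";
```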
+The first C</U> or C</C> modifier applies to the left side of the translation.
+The second one applies to the right side. If present, these modifiers override
+the current utf8 state.
+
Examples:
$ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
tr [\200-\377]
[\000-\177]; # delete 8th bit
+ tr/\0-\xFF//CU; # translate Latin-1 to Unicode
+ tr/\0-\x{FF}//UC; # translate Unicode to Latin-1
+
If multiple transliterations are given for a character, only the first one is used:
tr/AAA/XYZ/
=head2 I/O Operators
There are several I/O operators you should know about.
+
A string enclosed by backticks (grave accents) first undergoes
variable substitution just like a double quoted string. It is then
interpreted as a command, and the output of that command is the value
always undergo shell expansion as well, see L<perlsec> for
security concerns.)
-Evaluating a filehandle in angle brackets yields the next line from
-that file (newline, if any, included), or C<undef> at end of file.
-Ordinarily you must assign that value to a variable, but there is one
+In a scalar context, evaluating a filehandle in angle brackets yields the
+next line from that file (newline, if any, included), or C<undef> at
+end-of-file. When C<$/> is set to C<undef> (i.e. file slurp mode),
+the first read returns the rest of the file as a single string (C<''>
+if the file is empty), and subsequent reads return C<undef>.
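Slurp mode can be sketched with in-memory filehandles standing in for real files. That is an assumption of this example: opening a reference to a string requires Perl 5.8 or later, and the file contents are illustrative.

```perl
use strict;
use warnings;

# In-memory filehandles (Perl 5.8+) stand in for a real file and an empty file.
open(my $fh,    '<', \"line 1\nline 2\n") or die "open: $!";
open(my $empty, '<', \"")                 or die "open: $!";

my ($whole, $after, $e1, $e2);
{
    local $/;            # undef $/ => slurp mode, only inside this block
    $whole = <$fh>;      # the rest of the file as one string
    $after = <$fh>;      # undef: nothing left
    $e1    = <$empty>;   # '' on the first read of an empty file
    $e2    = <$empty>;   # undef on subsequent reads
}
```

Using C<local> keeps the change to C<$/> confined to the block, so later line-by-line reads elsewhere in the program are unaffected.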
+
+Ordinarily you must assign the returned value to a variable, but there is one
situation where an automatic assignment happens. I<If and ONLY if> the
input symbol is the only thing inside the conditional of a C<while> or
C<for(;;)> loop, the value is automatically assigned to the variable
filehandles C<stdin>, C<stdout>, and C<stderr> will also work except in
packages, where they would be interpreted as local identifiers rather
than global.) Additional filehandles may be created with the open()
-function. See L<perlfunc/open()> for details on this.
+function. See L<perlfunc/open> for details on this.
If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for a list, a
list consisting of all the input lines is returned, one line per list
element. It's easy to make a I<LARGE> data space this way, so use with
care.
+E<lt>FILEHANDLEE<gt> may also be spelt readline(FILEHANDLE). See
+L<perlfunc/readline>.
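The scalar-context, list-context and readline() forms can be combined in one short sketch. As above, an in-memory filehandle (Perl 5.8+) is assumed in place of a real file.

```perl
use strict;
use warnings;

open(my $fh, '<', \"alpha\nbeta\ngamma\n") or die "open: $!";

my $first = readline($fh);   # same as <$fh> in scalar context
my @rest  = <$fh>;           # list context: all remaining lines, one per element
close($fh);

print "first: $first";
print "rest: ", scalar(@rest), " lines\n";
```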
+
The null filehandle E<lt>E<gt> is special and can be used to emulate the
behavior of B<sed> and B<awk>. Input from E<lt>E<gt> comes either from
standard input, or from each file listed on the command line. Here's
(C<~ | & ^>).
If the operands to a binary bitwise op are strings of different sizes,
-B<or> and B<xor> ops will act as if the shorter operand had additional
-zero bits on the right, while the B<and> op will act as if the longer
-operand were truncated to the length of the shorter.
+B<|> and B<^> ops will act as if the shorter operand had additional
+zero bits on the right, while the B<&> op will act as if the longer
+operand were truncated to the length of the shorter. Note that the
+granularity for such extension or truncation is one or more I<bytes>.
# ASCII-based examples
print "j p \n" ^ " a h"; # prints "JAPH\n"
$baz = 0+$foo & 0+$bar; # both ops explicitly numeric
$biz = "$foo" ^ "$bar"; # both ops explicitly stringy
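The padding and truncation rules for operands of different lengths can be checked directly. Like the examples above, this sketch assumes an ASCII-based platform, where the space character (0x20) is the bit that separates upper- and lowercase letters.

```perl
use strict;
use warnings;

my $or  = "JAPH" | "  ";         # shorter operand padded with zero bits
my $xor = "JAPH" ^ "  ";         # same padding applies to ^
my $and = "JAPH" & "\xFF\xFF";   # longer operand truncated to 2 bytes

printf "%s %s %s (lengths %d %d %d)\n",
    $or, $xor, $and, length $or, length $xor, length $and;
```

Only the first two bytes change case under C<|> and C<^>; the last two are combined with zero bits and pass through unchanged, while C<&> drops them entirely.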
+See L<perlfunc/vec> for information on how to manipulate individual bits
+in a bit vector.
+
=head2 Integer Arithmetic
By default Perl assumes that it must do most of its arithmetic in