=head1 DESCRIPTION
A Perl script consists of a sequence of declarations and statements.
-The only things that need to be declared in Perl are report formats
-and subroutines. See the sections below for more information on those
-declarations. All uninitialized user-created objects are assumed to
-start with a null or 0 value until they are defined by some explicit
-operation such as assignment. (Though you can get warnings about the
-use of undefined values if you like.) The sequence of statements is
-executed just once, unlike in B<sed> and B<awk> scripts, where the
-sequence of statements is executed for each input line. While this means
-that you must explicitly loop over the lines of your input file (or
-files), it also means you have much more control over which files and
-which lines you look at. (Actually, I'm lying--it is possible to do an
-implicit loop with either the B<-n> or B<-p> switch. It's just not the
-mandatory default like it is in B<sed> and B<awk>.)
+The sequence of statements is executed just once, unlike in B<sed>
+and B<awk> scripts, where the sequence of statements is executed
+for each input line. While this means that you must explicitly
+loop over the lines of your input file (or files), it also means
+you have much more control over which files and which lines you look at.
+(Actually, I'm lying--it is possible to do an implicit loop with
+either the B<-n> or B<-p> switch. It's just not the mandatory
+default like it is in B<sed> and B<awk>.)
+
+Perl is, for the most part, a free-form language. (The only exception
+to this is format declarations, for obvious reasons.) Text from a
+C<"#"> character until the end of the line is a comment, and is
+ignored. If you attempt to use C</* */> C-style comments, it will be
+interpreted either as division or pattern matching, depending on the
+context, and C++ C<//> comments just look like a null regular
+expression, so don't do that.
=head2 Declarations
-Perl is, for the most part, a free-form language. (The only
-exception to this is format declarations, for obvious reasons.) Comments
-are indicated by the "#" character, and extend to the end of the line. If
-you attempt to use C</* */> C-style comments, it will be interpreted
-either as division or pattern matching, depending on the context, and C++
-C<//> comments just look like a null regular expression, so don't do
-that.
+The only things you need to declare in Perl are report formats
+and subroutines--and even undefined subroutines can be handled
+through AUTOLOAD. A variable holds the undefined value (C<undef>)
+until it has been assigned a defined value, which is anything
+other than C<undef>. When used as a number, C<undef> is treated
+as C<0>; when used as a string, it is treated as the empty string,
+C<"">; and when used as a reference that isn't being assigned
+to, it is treated as an error. If you enable warnings, you'll
+be notified of an uninitialized value whenever you treat C<undef>
+as a string or a number. Well, usually. Boolean ("don't-care")
+contexts and operators such as C<++>, C<-->, C<+=>, C<-=>, and
+C<.=> are always exempt from such warnings.
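+
+A small sketch of those rules (the variable names are only for
+illustration):
+
+    my ($n, $s);                    # both hold undef
+    print "not defined yet\n" unless defined $n;
+    $n++;                           # exempt from the warning; $n is now 1
+    $s .= "abc";                    # likewise exempt; $s is now "abc"
+    print "$n $s\n";                # prints "1 abc"
+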
A declaration can be put anywhere a statement can, but has no effect on
the execution of the primary sequence of statements--declarations all
take effect at compile time. Typically all the declarations are put at
-the beginning or the end of the script. However, if you're using
-lexically-scoped private variables created with my(), you'll have to make sure
+the beginning or the end of the script. However, if you're using
+lexically-scoped private variables created with C<my()>, you'll
+have to make sure
your format or subroutine definition is within the same block scope
-as the my if you expect to to be able to access those private variables.
+as the C<my> if you expect to be able to access those private variables.
Declaring a subroutine allows a subroutine name to be used as if it were a
list operator from that point forward in the program. You can declare a
-subroutine without defining it by saying just
+subroutine without defining it by saying C<sub name>, thus:
sub myname;
$me = myname $0 or die "can't get myname";
-Note that it functions as a list operator though, not as a unary
-operator, so be careful to use C<or> instead of C<||> there.
+Note that myname() functions as a list operator, not as a unary operator; so
+be careful to use C<or> instead of C<||> in this case. However, if
+you were to declare the subroutine as C<sub myname ($)>, then
+C<myname> would function as a unary operator, so either C<or> or
+C<||> would work.
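+
+As a minimal sketch of that difference (the body given to C<myname>
+here is only a placeholder for illustration, not part of the example
+above):
+
+    sub myname ($) { return $_[0] }     # hypothetical one-argument sub
+    $me = myname $0 || die "can't get myname";
+    # parsed as: myname($0) || die ...
+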
Subroutines declarations can also be loaded up with the C<require> statement
or both loaded and imported into your namespace with a C<use> statement.
side effects. Every simple statement must be terminated with a
semicolon, unless it is the final statement in a block, in which case
the semicolon is optional. (A semicolon is still encouraged there if the
-block takes up more than one line, since you may eventually add another line.)
+block takes up more than one line, because you may eventually add another line.)
Note that there are some operators like C<eval {}> and C<do {}> that look
-like compound statements, but aren't (they're just TERMs in an expression),
+like compound statements, but aren't (they're just TERMs in an expression),
and thus need an explicit termination if used as the last item in a statement.
Any simple statement may optionally be followed by a I<SINGLE> modifier,
unless EXPR
while EXPR
until EXPR
+ foreach EXPR
The C<if> and C<unless> modifiers have the expected semantics,
-presuming you're a speaker of English. The C<while> and C<until>
-modifiers also have the usual "while loop" semantics (conditional
-evaluated first), except when applied to a do-BLOCK (or to the
-now-deprecated do-SUBROUTINE statement), in which case the block
-executes once before the conditional is evaluated. This is so that you
-can write loops like:
+presuming you're a speaker of English. The C<foreach> modifier is an
+iterator: For each value in EXPR, it aliases C<$_> to the value and
+executes the statement. The C<while> and C<until> modifiers have the
+usual "C<while> loop" semantics (conditional evaluated first), except
+when applied to a C<do>-BLOCK (or to the deprecated C<do>-SUBROUTINE
+statement), in which case the block executes once before the
+conditional is evaluated. This is so that you can write loops like:
do {
$line = <STDIN>;
...
} until $line eq ".\n";
-See L<perlfunc/do>. Note also that the loop control
-statements described later will I<NOT> work in this construct, since
-modifiers don't take loop labels. Sorry. You can always wrap
-another block around it to do that sort of thing.
+See L<perlfunc/do>. Note also that the loop control statements described
+later will I<NOT> work in this construct, because modifiers don't take
+loop labels. Sorry. You can always put another block inside of it
+(for C<next>) or around it (for C<last>) to do that sort of thing.
+For C<next>, just double the braces:
+
+ do {{
+ next if $x == $y;
+ # do something here
+ }} until $x++ > $z;
+
+For C<last>, you have to be more elaborate:
+
+ LOOP: {
+ do {
+ last if $x == $y**2;
+ # do something here
+ } while $x++ <= $z;
+ }
=head2 Compound statements
LABEL while (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (LIST) BLOCK
+ LABEL foreach VAR (LIST) BLOCK continue BLOCK
LABEL BLOCK continue BLOCK
Note that, unlike C and Pascal, these are defined in terms of BLOCKs,
open(FOO) ? 'hi mom' : die "Can't open $FOO: $!";
# a bit exotic, that last one
-The C<if> statement is straightforward. Since BLOCKs are always
+The C<if> statement is straightforward. Because BLOCKs are always
bounded by curly brackets, there is never any ambiguity about which
C<if> an C<else> goes with. If you use C<unless> in place of C<if>,
the sense of the test is reversed.
The C<while> statement executes the block as long as the expression is
-true (does not evaluate to the null string or 0 or "0"). The LABEL is
-optional, and if present, consists of an identifier followed by a colon.
-The LABEL identifies the loop for the loop control statements C<next>,
-C<last>, and C<redo>. If the LABEL is omitted, the loop control statement
+true (does not evaluate to the null string C<""> or C<0> or C<"0">).
+The LABEL is optional, and if present, consists of an identifier followed
+by a colon. The LABEL identifies the loop for the loop control
+statements C<next>, C<last>, and C<redo>.
+If the LABEL is omitted, the loop control statement
refers to the innermost enclosing loop. This may include dynamically
looking back your call-stack at run time to find the LABEL. Such
-desperate behavior triggers a warning if you use the B<-w> flag.
+desperate behavior triggers a warning if you use the C<use warnings>
+pragma or the B<-w> flag.
+Unlike a C<foreach> statement, a C<while> statement never implicitly
+localises any variables.
If there is a C<continue> BLOCK, it is always executed just before the
conditional is about to be evaluated again, just like the third part of a
while (<>) {
chomp;
- if (s/\\$//) {
- $_ .= <>;
+ if (s/\\$//) {
+ $_ .= <>;
redo unless eof();
}
# now process $_
- }
+ }
which is Perl short-hand for the more explicitly written version:
- LINE: while ($line = <ARGV>) {
+ LINE: while (defined($line = <ARGV>)) {
chomp($line);
- if ($line =~ s/\\$//) {
- $line .= <ARGV>;
+ if ($line =~ s/\\$//) {
+ $line .= <ARGV>;
redo LINE unless eof(); # not eof(ARGV)!
}
# now process $line
- }
-
-Or here's a a simpleminded Pascal comment stripper (warning: assumes no { or } in strings)
-
- LINE: while (<STDIN>) {
- while (s|({.*}.*){.*}|$1 |) {}
- s|{.*}| |;
- if (s|{.*| |) {
- $front = $_;
- while (<STDIN>) {
- if (/}/) { # end of comment?
- s|^|$front{|;
- redo LINE;
- }
- }
- }
- print;
}
Note that if there were a C<continue> block on the above code, it would get
-executed even on discarded lines.
+executed even on discarded lines. This is often used to reset line counters
+or C<?pat?> one-time matches.
+
+ # inspired by :1,$g/fred/s//WILMA/
+ while (<>) {
+ ?(fred)? && s//WILMA $1 WILMA/;
+ ?(barney)? && s//BETTY $1 BETTY/;
+ ?(homer)? && s//MARGE $1 MARGE/;
+ } continue {
+ print "$ARGV $.: $_";
+ close ARGV if eof(); # reset $.
+ reset if eof(); # reset ?pat?
+ }
If the word C<while> is replaced by the word C<until>, the sense of the
test is reversed, but the conditional is still tested before the first
iteration.
-In either the C<if> or the C<while> statement, you may replace "(EXPR)"
-with a BLOCK, and the conditional is true if the value of the last
-statement in that block is true. While this "feature" continues to work in
-version 5, it has been deprecated, so please change any occurrences of "if BLOCK" to
-"if (do BLOCK)".
+The loop control statements don't work in an C<if> or C<unless>, since
+they aren't loops. You can double the braces to make them such, though.
+
+ if (/pattern/) {{
+ next if /fred/;
+ next if /barney/;
+ # do something here
+ }}
+
+The form C<while/if BLOCK BLOCK>, available in Perl 4, is no longer
+available. Replace any occurrence of C<if BLOCK> by C<if (do BLOCK)>.
=head2 For Loops
$i++;
}
+(There is one minor difference: The first form implies a lexical scope
+for variables declared with C<my> in the initialization expression.)
+
Besides the normal array index looping, C<for> can lend itself
to many other interesting applications. Here's one that avoids the
-problem you get into if you explicitly test for end-of-file on
-an interactive file descriptor causing your program to appear to
+problem you get into if you explicitly test for end-of-file on
+an interactive file descriptor causing your program to appear to
hang.
$on_a_tty = -t STDIN && -t STDOUT;
sub prompt { print "yes? " if $on_a_tty }
for ( prompt(); <STDIN>; prompt() ) {
# do something
- }
+ }
=head2 Foreach Loops
The C<foreach> loop iterates over a normal list value and sets the
-variable VAR to be each element of the list in turn. The variable is
-implicitly local to the loop and regains its former value upon exiting the
-loop. If the variable was previously declared with C<my>, it uses that
-variable instead of the global one, but it's still localized to the loop.
-This can cause problems if you have subroutine or format declarations
-within that block's scope.
+variable VAR to be each element of the list in turn. If the variable
+is preceded with the keyword C<my>, then it is lexically scoped, and
+is therefore visible only within the loop. Otherwise, the variable is
+implicitly local to the loop and regains its former value upon exiting
+the loop. If the variable was previously declared with C<my>, it uses
+that variable instead of the global one, but it's still localized to
+the loop.
The C<foreach> keyword is actually a synonym for the C<for> keyword, so
-you can use C<foreach> for readability or C<for> for brevity. If VAR is
-omitted, $_ is set to each value. If LIST is an actual array (as opposed
-to an expression returning a list value), you can modify each element of
-the array by modifying VAR inside the loop. That's because the C<foreach>
-loop index variable is an implicit alias for each item in the list that
-you're looping over.
+you can use C<foreach> for readability or C<for> for brevity. (Or because
+the Bourne shell is more familiar to you than I<csh>, so writing C<for>
+comes more naturally.) If VAR is omitted, C<$_> is set to each value.
+If any element of LIST is an lvalue, you can modify it by modifying VAR
+inside the loop. That's because the C<foreach> loop index variable is
+an implicit alias for each item in the list that you're looping over.
+
+If any part of LIST is an array, C<foreach> will get very confused if
+you add or remove elements within the loop body, for example with
+C<splice>. So don't do that.
+
+C<foreach> probably won't do what you expect if VAR is a tied or other
+special variable. Don't do that either.
Examples:
for (@ary) { s/foo/bar/ }
- foreach $elem (@elements) {
+ foreach my $elem (@elements) {
$elem *= 2;
}
Here's how a C programmer might code up a particular algorithm in Perl:
- for ($i = 0; $i < @ary1; $i++) {
- for ($j = 0; $j < @ary2; $j++) {
+ for (my $i = 0; $i < @ary1; $i++) {
+ for (my $j = 0; $j < @ary2; $j++) {
if ($ary1[$i] > $ary2[$j]) {
last; # can't go to outer :-(
}
# this is where that last takes me
}
-Whereas here's how a Perl programmer more confortable with the idiom might
+Whereas here's how a Perl programmer more comfortable with the idiom might
do it:
- OUTER: foreach $wid (@ary1) {
- INNER: foreach $jet (@ary2) {
+ OUTER: foreach my $wid (@ary1) {
+ INNER: foreach my $jet (@ary2) {
next OUTER if $wid > $jet;
$wid += $jet;
- }
- }
+ }
+ }
See how much easier this is? It's cleaner, safer, and faster. It's
cleaner because it's less noisy. It's safer because if code gets added
-between the inner and outer loops later, you won't accidentally excecute
-it because you've explicitly asked to iterate the other loop rather than
-merely terminating the inner one. And it's faster because Perl executes a
-C<foreach> statement more rapidly than it would the equivalent C<for>
-loop.
+between the inner and outer loops later on, the new code won't be
+accidentally executed. The C<next> explicitly iterates the other loop
+rather than merely terminating the inner one. And it's faster because
+Perl executes a C<foreach> statement more rapidly than it would the
+equivalent C<for> loop.
=head2 Basic BLOCKs and Switch Statements
-A BLOCK by itself (labeled or not) is semantically equivalent to a loop
-that executes once. Thus you can use any of the loop control
-statements in it to leave or restart the block. The C<continue> block
-is optional.
+A BLOCK by itself (labeled or not) is semantically equivalent to a
+loop that executes once. Thus you can use any of the loop control
+statements in it to leave or restart the block. (Note that this is
+I<NOT> true in C<eval{}>, C<sub{}>, or, contrary to popular belief,
+C<do{}> blocks, which do I<NOT> count as loops.) The C<continue>
+block is optional.
The BLOCK construct is particularly nice for doing case
structures.
$nothing = 1;
}
-There is no official switch statement in Perl, because there are
+There is no official C<switch> statement in Perl, because there are
already several ways to write the equivalent. In addition to the
above, you could write
$nothing = 1;
}
-or formatted so it stands out more as a "proper" switch statement:
+or formatted so it stands out more as a "proper" C<switch> statement:
SWITCH: {
- /^abc/ && do {
- $abc = 1;
- last SWITCH;
+ /^abc/ && do {
+ $abc = 1;
+ last SWITCH;
};
- /^def/ && do {
- $def = 1;
- last SWITCH;
+ /^def/ && do {
+ $def = 1;
+ last SWITCH;
};
- /^xyz/ && do {
- $xyz = 1;
- last SWITCH;
+ /^xyz/ && do {
+ $xyz = 1;
+ last SWITCH;
};
$nothing = 1;
}
else
{ $nothing = 1 }
-
-A common idiom for a switch statement is to use C<foreach>'s aliasing to make
-a temporary assignment to $_ for convenient matching:
+A common idiom for a C<switch> statement is to use C<foreach>'s aliasing to make
+a temporary assignment to C<$_> for convenient matching:
SWITCH: for ($where) {
/In Card Names/ && do { push @flags, '-e'; last; };
/Anywhere/ && do { push @flags, '-h'; last; };
/In Rulings/ && do { last; };
die "unknown value for form variable where: `$where'";
- }
+ }
Another interesting approach to a switch statement is arrange
for a C<do> block to return the proper value:
$amode = do {
- if ($flag & O_RDONLY) { "r" }
- elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ? "w" : "a" }
+ if ($flag & O_RDONLY) { "r" } # XXX: isn't this 0?
+ elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
elsif ($flag & O_RDWR) {
if ($flag & O_CREAT) { "w+" }
- else { ($flag & O_APPEND) ? "r+" : "a+" }
+ else { ($flag & O_APPEND) ? "a+" : "r+" }
}
};
+Or
+
+ print do {
+ ($flags & O_WRONLY) ? "write-only" :
+ ($flags & O_RDWR) ? "read-write" :
+ "read-only";
+ };
+
+Or if you are certain that all the C<&&> clauses are true, you can use
+something like this, which "switches" on the value of the
+C<HTTP_USER_AGENT> environment variable.
+
+ #!/usr/bin/perl
+ # pick out jargon file page based on browser
+ $dir = 'http://www.wins.uva.nl/~mes/jargon';
+ for ($ENV{HTTP_USER_AGENT}) {
+ $page = /Mac/ && 'm/Macintrash.html'
+ || /Win(dows )?NT/ && 'e/evilandrude.html'
+ || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'
+ || /Linux/ && 'l/Linux.html'
+ || /HP-UX/ && 'h/HP-SUX.html'
+ || /SunOS/ && 's/ScumOS.html'
+ || 'a/AppendixB.html';
+ }
+ print "Location: $dir/$page\015\012\015\012";
+
+That kind of switch statement only works when you know the C<&&> clauses
+will be true. If you don't, the previous C<?:> example should be used.
+
+You might also consider writing a hash of subroutine references
+instead of synthesizing a C<switch> statement.
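+
+A minimal sketch of that approach, assuming a made-up C<%action> table
+and command names invented purely for illustration:
+
+    my %action = (
+        start => sub { return "starting" },
+        stop  => sub { return "stopping" },
+    );
+    my $cmd     = "stop";
+    my $handler = $action{$cmd} || sub { return "unknown command: $cmd" };
+    print $handler->(), "\n";       # prints "stopping"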
+
=head2 Goto
-Although not for the faint of heart, Perl does support a C<goto> statement.
-A loop's LABEL is not actually a valid target for a C<goto>;
-it's just the name of the loop. There are three forms: goto-LABEL,
-goto-EXPR, and goto-&NAME.
+Although not for the faint of heart, Perl does support a C<goto>
+statement. There are three forms: C<goto>-LABEL, C<goto>-EXPR, and
+C<goto>-&NAME. A loop's LABEL is not actually a valid target for
+a C<goto>; it's just the name of the loop.
-The goto-LABEL form finds the statement labeled with LABEL and resumes
+The C<goto>-LABEL form finds the statement labeled with LABEL and resumes
execution there. It may not be used to go into any construct that
-requires initialization, such as a subroutine or a foreach loop. It
+requires initialization, such as a subroutine or a C<foreach> loop. It
also can't be used to go into a construct that is optimized away. It
can be used to go almost anywhere else within the dynamic scope,
including out of subroutines, but it's usually better to use some other
-construct such as last or die. The author of Perl has never felt the
-need to use this form of goto (in Perl, that is--C is another matter).
+construct such as C<last> or C<die>. The author of Perl has never felt the
+need to use this form of C<goto> (in Perl, that is--C is another matter).
-The goto-EXPR form expects a label name, whose scope will be resolved
-dynamically. This allows for computed gotos per FORTRAN, but isn't
+The C<goto>-EXPR form expects a label name, whose scope will be resolved
+dynamically. This allows for computed C<goto>s per FORTRAN, but isn't
necessarily recommended if you're optimizing for maintainability:
goto ("FOO", "BAR", "GLARCH")[$i];
-The goto-&NAME form is highly magical, and substitutes a call to the
+The C<goto>-&NAME form is highly magical, and substitutes a call to the
named subroutine for the currently running subroutine. This is used by
-AUTOLOAD() subroutines that wish to load another subroutine and then
+C<AUTOLOAD()> subroutines that wish to load another subroutine and then
pretend that the other subroutine had been called in the first place
-(except that any modifications to @_ in the current subroutine are
-propagated to the other subroutine.) After the C<goto>, not even caller()
+(except that any modifications to C<@_> in the current subroutine are
+propagated to the other subroutine.) After the C<goto>, not even C<caller()>
will be able to tell that this routine was called first.
-In almost cases like this, it's usually a far, far better idea to use the
-structured control flow mechanisms of C<next>, C<last>, or C<redo> insetad
+In almost all cases like this, it's usually a far, far better idea to use the
+structured control flow mechanisms of C<next>, C<last>, or C<redo> instead of
resorting to a C<goto>. For certain applications, the catch and throw pair of
C<eval{}> and die() for exception processing can also be a prudent approach.
=head2 PODs: Embedded Documentation
Perl has a mechanism for intermixing documentation with source code.
-If while expecting the beginning of a new statement, the compiler
+While it's expecting the beginning of a new statement, if the compiler
encounters a line that begins with an equal sign and a word, like this
=head1 Here There Be Pods!
Then that text and all remaining text up through and including a line
beginning with C<=cut> will be ignored. The format of the intervening
-text is described in L<perlpod>.
+text is described in L<perlpod>.
This allows you to intermix your source code
and your documentation text freely, as in
=item snazzle($)
- The snazzle() function will behave in the most spectacular
+ The snazzle() function will behave in the most spectacular
form that you can possibly imagine, not even excepting
cybernetic pyrotechnics.
sub snazzle($) {
my $thingie = shift;
.........
- }
+ }
-Note that pod translators should only look at paragraphs beginning
-with a pod diretive (it makes parsing easier), whereas the compiler
-actually knows to look for pod escapes even in the middle of a
+Note that pod translators should look at only paragraphs beginning
+with a pod directive (it makes parsing easier), whereas the compiler
+actually knows to look for pod escapes even in the middle of a
paragraph. This means that the following secret stuff will be
ignored by both the compiler and the translators.
=cut back
print "got $a\n";
-You probably shouldn't rely upon the warn() being podded out forever.
+You probably shouldn't rely upon the C<warn()> being podded out forever.
Not all pod translators are well-behaved in this regard, and perhaps
the compiler will become pickier.
+
+One may also use pod directives to quickly comment out a section
+of code.
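+
+For instance, in this sketch (the C<comment> format name is an
+arbitrary choice), the compiler skips everything from the first
+directive through C<=cut>, provided each directive starts in the
+first column:
+
+    print "before\n";
+
+    =begin comment
+
+    print "never compiled\n";
+
+    =end comment
+
+    =cut
+
+    print "after\n";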
+
+=head2 Plain Old Comments (Not!)
+
+Much like the C preprocessor, Perl can process line directives. Using
+this, one can control Perl's idea of filenames and line numbers in
+error or warning messages (especially for strings that are processed
+with C<eval()>). The syntax for this mechanism is the same as for most
+C preprocessors: it matches the regular expression
+C</^#\s*line\s+(\d+)\s*(?:\s"([^"]+)")?\s*$/> with C<$1> being the line
+number for the next line, and C<$2> being the optional filename
+(specified within quotes).
+
+Here are some examples that you should be able to type into your command
+shell:
+
+ % perl
+ # line 200 "bzzzt"
+ # the `#' on the previous line must be the first char on line
+ die 'foo';
+ __END__
+ foo at bzzzt line 201.
+
+ % perl
+ # line 200 "bzzzt"
+ eval qq[\n#line 2001 ""\ndie 'foo']; print $@;
+ __END__
+ foo at - line 2001.
+
+ % perl
+ eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;
+ __END__
+ foo at foo bar line 200.
+
+ % perl
+ # line 345 "goop"
+ eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";
+ print $@;
+ __END__
+ foo at goop line 345.
+
+=cut