X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlsyn.pod;h=c27933015c3effbff40f675b1af8ea4065426685;hb=0111df86b68202837d8ca044a27bbc00d7895fb1;hp=3ddb493c8bd1d4e1d81da5dbfd96819931eb3ddb;hpb=a0d0e21ea6ea90a22318550944fe6cb09ae10cda;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlsyn.pod b/pod/perlsyn.pod index 3ddb493..c279330 100644 --- a/pod/perlsyn.pod +++ b/pod/perlsyn.pod @@ -5,49 +5,79 @@ perlsyn - Perl syntax =head1 DESCRIPTION A Perl script consists of a sequence of declarations and statements. -The only things that need to be declared in Perl are report formats -and subroutines. See the sections below for more information on those -declarations. All uninitialized user-created objects are assumed to -start with a null or 0 value until they are defined by some explicit -operation such as assignment. (Though you can get warnings about the -use of undefined values if you like.) The sequence of statements is -executed just once, unlike in B and B scripts, where the -sequence of statements is executed for each input line. While this means -that you must explicitly loop over the lines of your input file (or -files), it also means you have much more control over which files and -which lines you look at. (Actually, I'm lying--it is possible to do an -implicit loop with either the B<-n> or B<-p> switch. It's just not the -mandatory default like it is in B and B.) - -Perl is, for the most part, a free-form language. (The only -exception to this is format declarations, for obvious reasons.) Comments -are indicated by the "#" character, and extend to the end of the line. If -you attempt to use C C-style comments, it will be interpreted -either as division or pattern matching, depending on the context, and C++ -C comments just look like a null regular expression, So don't do -that. +The sequence of statements is executed just once, unlike in B +and B scripts, where the sequence of statements is executed +for each input line. 
While this means that you must explicitly +loop over the lines of your input file (or files), it also means +you have much more control over which files and which lines you look at. +(Actually, I'm lying--it is possible to do an implicit loop with +either the B<-n> or B<-p> switch. It's just not the mandatory +default like it is in B<sed> and B<awk>.) + +Perl is, for the most part, a free-form language. (The only exception +to this is format declarations, for obvious reasons.) Text from a +C<"#"> character until the end of the line is a comment, and is +ignored. If you attempt to use C</* */> C-style comments, it will be +interpreted either as division or pattern matching, depending on the +context, and C++ C<//> comments just look like a null regular +expression, so don't do that. + +=head2 Declarations + +The only things you need to declare in Perl are report formats +and subroutines--and even undefined subroutines can be handled +through AUTOLOAD. A variable holds the undefined value (C<undef>) +until it has been assigned a defined value, which is anything +other than C<undef>. When used as a number, C<undef> is treated +as C<0>; when used as a string, it is treated as the empty string, +C<"">; and when used as a reference that isn't being assigned +to, it is treated as an error. If you enable warnings, you'll +be notified of an uninitialized value whenever you treat C<undef> +as a string or a number. Well, usually. Boolean contexts, such as: + + my $a; + if ($a) {} + +are exempt from warnings (because they care about truth rather than +definedness). Operators such as C<++>, C<-->, C<+=>, +C<-=>, and C<.=>, that operate on undefined left values such as: + + my $a; + $a++; + +are also always exempt from such warnings. A declaration can be put anywhere a statement can, but has no effect on the execution of the primary sequence of statements--declarations all take effect at compile time. Typically all the declarations are put at -the beginning or the end of the script. +the beginning or the end of the script.
However, if you're using +lexically-scoped private variables created with C<my()>, you'll +have to make sure +your format or subroutine definition is within the same block scope +as the my if you expect to be able to access those private variables. -As of Perl 5, declaring a subroutine allows a subroutine name to be used -as if it were a list operator from that point forward in the program. You -can declare a subroutine without defining it by saying just +Declaring a subroutine allows a subroutine name to be used as if it were a +list operator from that point forward in the program. You can declare a +subroutine without defining it by saying C<sub name>, thus: sub myname; $me = myname $0 or die "can't get myname"; -Note that it functions as a list operator though, not a unary -operator, so be careful to use C<or> instead of C<||> there. +Note that myname() functions as a list operator, not as a unary operator; +so be careful to use C<or> instead of C<||> in this case. However, if +you were to declare the subroutine as C<sub myname ($)>, then +C<myname> would function as a unary operator, so either C<or> or +C<||> would work. -Subroutines declarations can also be imported by a C<use> statement. +Subroutine declarations can also be loaded up with the C<require> statement +or both loaded and imported into your namespace with a C<use> statement. +See L<perlmod> for details on this. -Also as of Perl 5, a statement sequence may contain declarations of -lexically scoped variables, but apart from declaring a variable name, -the declaration acts like an ordinary statement, and is elaborated within -the sequence of statements as if it were an ordinary statement. +A statement sequence may contain declarations of lexically-scoped +variables, but apart from declaring a variable name, the declaration acts +like an ordinary statement, and is elaborated within the sequence of +statements as if it were an ordinary statement. That means it actually +has both compile-time and run-time effects.
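As a hedged illustration of the declaration rules in this section (a sketch only, not part of the patch; the variable names and the body given to myname() are assumptions made for the example), the following shows an undefined variable being incremented without a warning, and a predeclared subroutine being parsed as a list operator so that C<or> is the safe choice:

```perl
use strict;
use warnings;

sub myname;    # forward declaration: myname now parses as a list operator

my $count;     # holds undef until it is assigned a defined value
$count++;      # ++ on undef is exempt from the "uninitialized" warning
my $label = defined $count ? "count=$count" : "unset";

# 'or' binds more loosely than the list operator, so this reads as
# (my $me = myname($0)) or die ...
my $me = myname $0 or die "can't get myname";

print "$label, running as $me\n";

# Hypothetical definition: return the last path component of the argument.
sub myname {
    my ($path) = @_;
    $path =~ s{.*/}{};
    return $path;
}
```

Had C<||> been used instead of C<or>, it would have been swallowed into the argument list of myname, which is the trap the text warns about.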
=head2 Simple statements @@ -55,11 +85,10 @@ The only kind of simple statement is an expression evaluated for its side effects. Every simple statement must be terminated with a semicolon, unless it is the final statement in a block, in which case the semicolon is optional. (A semicolon is still encouraged there if the -block takes up more than one line, since you may add another line.) +block takes up more than one line, because you may eventually add another line.) Note that there are some operators like C<eval {}> and C<do {}> that look -like compound statements, but aren't (they're just TERMs in an expression), -and thus need an explicit termination -if used as the last item in a statement. +like compound statements, but aren't (they're just TERMs in an expression), +and thus need an explicit termination if used as the last item in a statement. Any simple statement may optionally be followed by a I<SINGLE> modifier, just before the terminating semicolon (or block ending). The possible @@ -69,24 +98,41 @@ modifiers are: unless EXPR while EXPR until EXPR + foreach EXPR The C<if> and C<unless> modifiers have the expected semantics, -presuming you're a speaker of English. The C<while> and C<until> -modifiers also have the usual "while loop" semantics (conditional -evaluated first), except when applied to a do-BLOCK (or to the -now-deprecated do-SUBROUTINE statement), in which case the block -executes once before the conditional is evaluated. This is so that you -can write loops like: +presuming you're a speaker of English. The C<foreach> modifier is an +iterator: For each value in EXPR, it aliases C<$_> to the value and +executes the statement. The C<while> and C<until> modifiers have the +usual "C<while> loop" semantics (conditional evaluated first), except +when applied to a C<do>-BLOCK (or to the deprecated C<do>-SUBROUTINE +statement), in which case the block executes once before the +conditional is evaluated. This is so that you can write loops like: do { - $_ = <STDIN>; + $line = <STDIN>; ... - } until $_ eq ".\n"; - -See L<perlfunc/do>.
Note also that the loop control -statements described later will I<NOT> work in this construct, since -modifiers don't take loop labels. Sorry. You can always wrap -another block around it to do that sort of thing.) + } until $line eq ".\n"; + +See L<perlfunc/do>. Note also that the loop control statements described +later will I<NOT> work in this construct, because modifiers don't take +loop labels. Sorry. You can always put another block inside of it +(for C<next>) or around it (for C<last>) to do that sort of thing. +For C<next>, just double the braces: + + do {{ + next if $x == $y; + # do something here + }} until $x++ > $z; + +For C<last>, you have to be more elaborate: + + LOOP: { + do { + last if $x = $y**2; + # do something here + } while $x++ <= $z; + } =head2 Compound statements @@ -106,7 +152,8 @@ The following compound statements may be used to control flow: LABEL while (EXPR) BLOCK LABEL while (EXPR) BLOCK continue BLOCK LABEL for (EXPR; EXPR; EXPR) BLOCK - LABEL foreach VAR (ARRAY) BLOCK + LABEL foreach VAR (LIST) BLOCK + LABEL foreach VAR (LIST) BLOCK continue BLOCK LABEL BLOCK continue BLOCK Note that, unlike C and Pascal, these are defined in terms of BLOCKs, @@ -121,39 +168,123 @@ all do the same thing: open(FOO) ? 'hi mom' : die "Can't open $FOO: $!"; # a bit exotic, that last one -The C<if> statement is straightforward. Since BLOCKs are always +The C<if> statement is straightforward. Because BLOCKs are always bounded by curly brackets, there is never any ambiguity about which C<if> an C<else> goes with. If you use C<unless> in place of C<if>, the sense of the test is reversed. The C<while> statement executes the block as long as the expression is -true (does not evaluate to the null string or 0 or "0"). The LABEL is -optional, and if present, consists of an identifier followed by a -colon. The LABEL identifies the loop for the loop control statements -C<next>, C<last>, and C<redo> (see below). If there is a C<continue> -BLOCK, it is always executed just before the conditional is about to be -evaluated again, just like the third part of a C<for> loop in C.
-Thus it can be used to increment a loop variable, even when the loop -has been continued via the C<next> statement (which is similar to the C -C<continue> statement). +true (does not evaluate to the null string C<""> or C<0> or C<"0">). +The LABEL is optional, and if present, consists of an identifier followed +by a colon. The LABEL identifies the loop for the loop control +statements C<next>, C<last>, and C<redo>. +If the LABEL is omitted, the loop control statement +refers to the innermost enclosing loop. This may include dynamically +looking back your call-stack at run time to find the LABEL. Such +desperate behavior triggers a warning if you use the C<use warnings> +pragma or the B<-w> flag. +Unlike a C<foreach> statement, a C<while> statement never implicitly +localises any variables. + +If there is a C<continue> BLOCK, it is always executed just before the +conditional is about to be evaluated again, just like the third part of a +C<for> loop in C. Thus it can be used to increment a loop variable, even +when the loop has been continued via the C<next> statement (which is +similar to the C C<continue> statement). + +=head2 Loop Control + +The C<next> command is like the C<continue> statement in C; it starts +the next iteration of the loop: + + LINE: while (<STDIN>) { + next LINE if /^#/; # discard comments + ... + } + +The C<last> command is like the C<break> statement in C (as used in +loops); it immediately exits the loop in question. The +C<continue> block, if any, is not executed: + + LINE: while (<STDIN>) { + last LINE if /^$/; # exit when done with header + ... + } + +The C<redo> command restarts the loop block without evaluating the +conditional again. The C<continue> block, if any, is I<not> executed. +This command is normally used by programs that want to lie to themselves +about what was just input. + +For example, when processing a file like F</etc/termcap>. +If your input lines might end in backslashes to indicate continuation, you +want to skip ahead and get the next record.
+ + while (<>) { + chomp; + if (s/\\$//) { + $_ .= <>; + redo unless eof(); + } + # now process $_ + } + +which is Perl short-hand for the more explicitly written version: + + LINE: while (defined($line = )) { + chomp($line); + if ($line =~ s/\\$//) { + $line .= ; + redo LINE unless eof(); # not eof(ARGV)! + } + # now process $line + } + +Note that if there were a C block on the above code, it would +get executed only on lines discarded by the regex (since redo skips the +continue block). A continue block is often used to reset line counters +or C one-time matches: + + # inspired by :1,$g/fred/s//WILMA/ + while (<>) { + ?(fred)? && s//WILMA $1 WILMA/; + ?(barney)? && s//BETTY $1 BETTY/; + ?(homer)? && s//MARGE $1 MARGE/; + } continue { + print "$ARGV $.: $_"; + close ARGV if eof(); # reset $. + reset if eof(); # reset ?pat? + } If the word C is replaced by the word C, the sense of the test is reversed, but the conditional is still tested before the first iteration. -In either the C or the C statement, you may replace "(EXPR)" -with a BLOCK, and the conditional is true if the value of the last -statement in that block is true. (This feature continues to work in Perl -5 but is deprecated. Please change any occurrences of "if BLOCK" to -"if (do BLOCK)".) +The loop control statements don't work in an C or C, since +they aren't loops. You can double the braces to make them such, though. + + if (/pattern/) {{ + last if /fred/; + next if /barney/; # same effect as "last", but doesn't document as well + # do something here + }} + +This is caused by the fact that a block by itself acts as a loop that +executes once, see L<"Basic BLOCKs and Switch Statements">. -The C-style C loop works exactly like the corresponding C loop: +The form C, available in Perl 4, is no longer +available. Replace any occurrence of C by C. + +=head2 For Loops + +Perl's C-style C loop works like the corresponding C loop; +that means that this: for ($i = 1; $i < 10; $i++) { ... 
} -is the same as +is the same as this: $i = 1; while ($i < 10) { @@ -162,36 +293,110 @@ is the same as $i++; } -The foreach loop iterates over a normal list value and sets the -variable VAR to be each element of the list in turn. The variable is -implicitly local to the loop (unless declared previously with C<my>), -and regains its former value upon exiting the loop. The C<foreach> -keyword is actually a synonym for the C<for> keyword, so you can use -C<foreach> for readability or C<for> for brevity. If VAR is omitted, $_ -is set to each value. If ARRAY is an actual array (as opposed to an -expression returning a list value), you can modify each element of the -array by modifying VAR inside the loop. Examples: +There is one minor difference: if variables are declared with C<my> +in the initialization section of the C<for>, the lexical scope of +those variables is exactly the C<for> loop (the body of the loop +and the control sections). + +Besides the normal array index looping, C<for> can lend itself +to many other interesting applications. Here's one that avoids the +problem you get into if you explicitly test for end-of-file on +an interactive file descriptor causing your program to appear to +hang. + + $on_a_tty = -t STDIN && -t STDOUT; + sub prompt { print "yes? " if $on_a_tty } + for ( prompt(); <STDIN>; prompt() ) { + # do something + } + +=head2 Foreach Loops + +The C<foreach> loop iterates over a normal list value and sets the +variable VAR to be each element of the list in turn. If the variable +is preceded with the keyword C<my>, then it is lexically scoped, and +is therefore visible only within the loop. Otherwise, the variable is +implicitly local to the loop and regains its former value upon exiting +the loop. If the variable was previously declared with C<my>, it uses +that variable instead of the global one, but it's still localized to +the loop. + +The C<foreach> keyword is actually a synonym for the C<for> keyword, so +you can use C<foreach> for readability or C<for> for brevity.
(Or because +the Bourne shell is more familiar to you than I<csh>, so writing C<for> +comes more naturally.) If VAR is omitted, C<$_> is set to each value. + +If any element of LIST is an lvalue, you can modify it by modifying +VAR inside the loop. Conversely, if any element of LIST is NOT an +lvalue, any attempt to modify that element will fail. In other words, +the C<foreach> loop index variable is an implicit alias for each item +in the list that you're looping over. + +If any part of LIST is an array, C<foreach> will get very confused if +you add or remove elements within the loop body, for example with +C<splice>. So don't do that. + +C<foreach> probably won't do what you expect if VAR is a tied or other +special variable. Don't do that either. + +Examples: - for (@ary) { s/foo/bar/; } + for (@ary) { s/foo/bar/ } - foreach $elem (@elements) { + for my $elem (@elements) { $elem *= 2; } - for ((10,9,8,7,6,5,4,3,2,1,'BOOM')) { - print $_, "\n"; sleep(1); + for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') { + print $count, "\n"; sleep(1); } for (1..15) { print "Merry Christmas\n"; } - foreach $item (split(/:[\\\n:]*/, $ENV{'TERMCAP'})) { + foreach $item (split(/:[\\\n:]*/, $ENV{TERMCAP})) { print "Item: $item\n"; } -A BLOCK by itself (labeled or not) is semantically equivalent to a loop -that executes once. Thus you can use any of the loop control -statements in it to leave or restart the block. The C<continue> block -is optional. This construct is particularly nice for doing case +Here's how a C programmer might code up a particular algorithm in Perl: + + for (my $i = 0; $i < @ary1; $i++) { + for (my $j = 0; $j < @ary2; $j++) { + if ($ary1[$i] > $ary2[$j]) { + last; # can't go to outer :-( + } + $ary1[$i] += $ary2[$j]; + } + # this is where that last takes me + } + +Whereas here's how a Perl programmer more comfortable with the idiom might +do it: + + OUTER: for my $wid (@ary1) { + INNER: for my $jet (@ary2) { + next OUTER if $wid > $jet; + $wid += $jet; + } + } + +See how much easier this is?
It's cleaner, safer, and faster. It's +cleaner because it's less noisy. It's safer because if code gets added +between the inner and outer loops later on, the new code won't be +accidentally executed. The C<next> explicitly iterates the other loop +rather than merely terminating the inner one. And it's faster because +Perl executes a C<foreach> statement more rapidly than it would the +equivalent C<for> loop. + +=head2 Basic BLOCKs and Switch Statements + +A BLOCK by itself (labeled or not) is semantically equivalent to a +loop that executes once. Thus you can use any of the loop control +statements in it to leave or restart the block. (Note that this is +I<NOT> true in C<eval{}>, C<sub{}>, or contrary to popular belief +C<do{}> blocks, which do I<NOT> count as loops.) The C<continue> +block is optional. + +The BLOCK construct is particularly nice for doing case structures. SWITCH: { @@ -201,9 +406,19 @@ structures. $nothing = 1; } -There is no official switch statement in Perl, because there are -already several ways to write the equivalent. In addition to the -above, you could write +There is no official C<switch> statement in Perl, because there are +already several ways to write the equivalent. + +However, starting from Perl 5.8 to get switch and case one can use +the Switch extension and say: + + use Switch; + +after which one has switch and case. It is not as fast as it could be +because it's not really part of the language (it's done using source +filters) but it is available, and it's very flexible. + +In addition to the above BLOCK construct, you could write SWITCH: { $abc = 1, last SWITCH if /^abc/; @@ -212,7 +427,7 @@ above, you could write $nothing = 1; } -(That's actually not as strange as it looks one you realize that you can +(That's actually not as strange as it looks once you realize that you can use loop control "operators" within an expression. That's just the normal C comma operator.)
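The bare-block switch with the comma operator can be tried out as a small self-contained program. This is only a sketch under stated assumptions: the classify() wrapper and the $kind variable are hypothetical names introduced for the example, and the single result variable replaces the separate $abc, $def, $xyz, and $nothing flags used in the text.

```perl
use strict;
use warnings;

# Hypothetical helper: report which prefix the argument starts with.
sub classify {
    local $_ = shift;    # match against $_ as the examples in the text do
    my $kind;
    SWITCH: {
        # "assignment, last SWITCH" is one comma expression, and the
        # trailing "if" modifier guards the whole of it.
        $kind = 'abc', last SWITCH if /^abc/;
        $kind = 'def', last SWITCH if /^def/;
        $kind = 'xyz', last SWITCH if /^xyz/;
        $kind = 'nothing';    # default case, reached only if nothing matched
    }
    return $kind;
}

print classify('abcdef'), "\n";    # prints "abc"
print classify('plugh'),  "\n";    # prints "nothing"
```

Because the bare block counts as a loop that executes once, C<last SWITCH> leaves it immediately and the default assignment is skipped.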
@@ -225,22 +440,22 @@ or $nothing = 1; } -or formatted so it stands out more as a "proper" switch statement: +or formatted so it stands out more as a "proper" C statement: SWITCH: { - /^abc/ && do { - $abc = 1; - last SWITCH; + /^abc/ && do { + $abc = 1; + last SWITCH; }; - /^def/ && do { - $def = 1; - last SWITCH; + /^def/ && do { + $def = 1; + last SWITCH; }; - /^xyz/ && do { - $xyz = 1; - last SWITCH; + /^xyz/ && do { + $xyz = 1; + last SWITCH; }; $nothing = 1; } @@ -265,3 +480,184 @@ or even, horrors, else { $nothing = 1 } +A common idiom for a C statement is to use C's aliasing to make +a temporary assignment to C<$_> for convenient matching: + + SWITCH: for ($where) { + /In Card Names/ && do { push @flags, '-e'; last; }; + /Anywhere/ && do { push @flags, '-h'; last; }; + /In Rulings/ && do { last; }; + die "unknown value for form variable where: `$where'"; + } + +Another interesting approach to a switch statement is arrange +for a C block to return the proper value: + + $amode = do { + if ($flag & O_RDONLY) { "r" } # XXX: isn't this 0? + elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" } + elsif ($flag & O_RDWR) { + if ($flag & O_CREAT) { "w+" } + else { ($flag & O_APPEND) ? "a+" : "r+" } + } + }; + +Or + + print do { + ($flags & O_WRONLY) ? "write-only" : + ($flags & O_RDWR) ? "read-write" : + "read-only"; + }; + +Or if you are certain that all the C<&&> clauses are true, you can use +something like this, which "switches" on the value of the +C environment variable. 
+ + #!/usr/bin/perl + # pick out jargon file page based on browser + $dir = 'http://www.wins.uva.nl/~mes/jargon'; + for ($ENV{HTTP_USER_AGENT}) { + $page = /Mac/ && 'm/Macintrash.html' + || /Win(dows )?NT/ && 'e/evilandrude.html' + || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html' + || /Linux/ && 'l/Linux.html' + || /HP-UX/ && 'h/HP-SUX.html' + || /SunOS/ && 's/ScumOS.html' + || 'a/AppendixB.html'; + } + print "Location: $dir/$page\015\012\015\012"; + +That kind of switch statement only works when you know the C<&&> clauses +will be true. If you don't, the previous C<?:> example should be used. + +You might also consider writing a hash of subroutine references +instead of synthesizing a C<switch> statement. + +=head2 Goto + +Although not for the faint of heart, Perl does support a C<goto> +statement. There are three forms: C<goto>-LABEL, C<goto>-EXPR, and +C<goto>-&NAME. A loop's LABEL is not actually a valid target for +a C<goto>; it's just the name of the loop. + +The C<goto>-LABEL form finds the statement labeled with LABEL and resumes +execution there. It may not be used to go into any construct that +requires initialization, such as a subroutine or a C<foreach> loop. It +also can't be used to go into a construct that is optimized away. It +can be used to go almost anywhere else within the dynamic scope, +including out of subroutines, but it's usually better to use some other +construct such as C<last> or C<die()>. The author of Perl has never felt the +need to use this form of C<goto> (in Perl, that is--C is another matter). + +The C<goto>-EXPR form expects a label name, whose scope will be resolved +dynamically. This allows for computed C<goto>s per FORTRAN, but isn't +necessarily recommended if you're optimizing for maintainability: + + goto(("FOO", "BAR", "GLARCH")[$i]); + +The C<goto>-&NAME form is highly magical, and substitutes a call to the +named subroutine for the currently running subroutine.
This is used by +C<AUTOLOAD> subroutines that wish to load another subroutine and then +pretend that the other subroutine had been called in the first place +(except that any modifications to C<@_> in the current subroutine are +propagated to the other subroutine.) After the C<goto>, not even C<caller()> +will be able to tell that this routine was called first. + +In almost all cases like this, it's usually a far, far better idea to use the +structured control flow mechanisms of C<next>, C<last>, or C<redo> instead of +resorting to a C<goto>. For certain applications, the catch and throw pair of +C<eval{}> and die() for exception processing can also be a prudent approach. + +=head2 PODs: Embedded Documentation + +Perl has a mechanism for intermixing documentation with source code. +While it's expecting the beginning of a new statement, if the compiler +encounters a line that begins with an equal sign and a word, like this + + =head1 Here There Be Pods! + +Then that text and all remaining text up through and including a line +beginning with C<=cut> will be ignored. The format of the intervening +text is described in L<perlpod>. + +This allows you to intermix your source code +and your documentation text freely, as in + + =item snazzle($) + + The snazzle() function will behave in the most spectacular + form that you can possibly imagine, not even excepting + cybernetic pyrotechnics. + + =cut back to the compiler, nuff of this pod stuff! + + sub snazzle($) { + my $thingie = shift; + ......... + } + +Note that pod translators should look at only paragraphs beginning +with a pod directive (it makes parsing easier), whereas the compiler +actually knows to look for pod escapes even in the middle of a +paragraph. This means that the following secret stuff will be +ignored by both the compiler and the translators. + + $a=3; + =secret stuff + warn "Neither POD nor CODE!?" + =cut back + print "got $a\n"; + +You probably shouldn't rely upon the C<warn()> being podded out forever.
+Not all pod translators are well-behaved in this regard, and perhaps +the compiler will become pickier. + +One may also use pod directives to quickly comment out a section +of code. + +=head2 Plain Old Comments (Not!) + +Much like the C preprocessor, Perl can process line directives. Using +this, one can control Perl's idea of filenames and line numbers in +error or warning messages (especially for strings that are processed +with C). The syntax for this mechanism is the same as for most +C preprocessors: it matches the regular expression +C with C<$1> being the line +number for the next line, and C<$2> being the optional filename +(specified within quotes). + +There is a fairly obvious gotcha included with the line directive: +Debuggers and profilers will only show the last source line to appear +at a particular line number in a given file. Care should be taken not +to cause line number collisions in code you'd like to debug later. + +Here are some examples that you should be able to type into your command +shell: + + % perl + # line 200 "bzzzt" + # the `#' on the previous line must be the first char on line + die 'foo'; + __END__ + foo at bzzzt line 201. + + % perl + # line 200 "bzzzt" + eval qq[\n#line 2001 ""\ndie 'foo']; print $@; + __END__ + foo at - line 2001. + + % perl + eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@; + __END__ + foo at foo bar line 200. + + % perl + # line 345 "goop" + eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'"; + print $@; + __END__ + foo at goop line 345. + +=cut
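The line-directive examples above are written as shell sessions. The same effect can be checked from inside a script, which may be handy when generating code from templates. This is a minimal sketch; the file name F<generated.tmpl>, the line number, and the error text are assumptions chosen only for the illustration.

```perl
use strict;
use warnings;

# The directive sits on the second line of the string, so the line that
# follows it (the die) is reported as line 200 of "generated.tmpl".
my $code = qq[\n# line 200 "generated.tmpl"\ndie 'template error'];

eval $code;          # string eval: the directive is honored at compile time
print "caught: $@";  # $@ carries the rewritten file name and line number
```

The directive only changes what Perl reports; it does not change which code runs, so the caveat about debuggers and colliding line numbers still applies.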