X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlsyn.pod;h=6d820b6882e8e9f30a0bf3833aed65789b9ed40a;hb=25fbdfc0879f30cf6944c322d4607eea9bcc7d15;hp=4b1d607e7e83bf14bec927bdca0e6b23e51e6675;hpb=4633a7c4bad06b471d9310620b7fe8ddd158cccd;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlsyn.pod b/pod/perlsyn.pod
index 4b1d607..6d820b6 100644
--- a/pod/perlsyn.pod
+++ b/pod/perlsyn.pod
@@ -5,47 +5,59 @@ perlsyn - Perl syntax
 =head1 DESCRIPTION

 A Perl script consists of a sequence of declarations and statements.
-The only things that need to be declared in Perl are report formats
-and subroutines. See the sections below for more information on those
-declarations. All uninitialized user-created objects are assumed to
-start with a null or 0 value until they are defined by some explicit
-operation such as assignment. (Though you can get warnings about the
-use of undefined values if you like.) The sequence of statements is
-executed just once, unlike in B<sed> and B<awk> scripts, where the
-sequence of statements is executed for each input line. While this means
-that you must explicitly loop over the lines of your input file (or
-files), it also means you have much more control over which files and
-which lines you look at. (Actually, I'm lying--it is possible to do an
-implicit loop with either the B<-n> or B<-p> switch. It's just not the
-mandatory default like it is in B<sed> and B<awk>.)
+The sequence of statements is executed just once, unlike in B<sed>
+and B<awk> scripts, where the sequence of statements is executed
+for each input line. While this means that you must explicitly
+loop over the lines of your input file (or files), it also means
+you have much more control over which files and which lines you look at.
+(Actually, I'm lying--it is possible to do an implicit loop with
+either the B<-n> or B<-p> switch. It's just not the mandatory
+default like it is in B<sed> and B<awk>.)
+
+Perl is, for the most part, a free-form language.
+(The only exception
+to this is format declarations, for obvious reasons.) Text from a
+C<"#"> character until the end of the line is a comment, and is
+ignored. If you attempt to use C</* */> C-style comments, it will be
+interpreted either as division or pattern matching, depending on the
+context, and C++ C<//> comments just look like a null regular
+expression, so don't do that.

 =head2 Declarations

-Perl is, for the most part, a free-form language. (The only
-exception to this is format declarations, for obvious reasons.) Comments
-are indicated by the "#" character, and extend to the end of the line. If
-you attempt to use C</* */> C-style comments, it will be interpreted
-either as division or pattern matching, depending on the context, and C++
-C<//> comments just look like a null regular expression, so don't do
-that.
+The only things you need to declare in Perl are report formats
+and subroutines--and even undefined subroutines can be handled
+through AUTOLOAD. A variable holds the undefined value (C<undef>)
+until it has been assigned a defined value, which is anything
+other than C<undef>. When used as a number, C<undef> is treated
+as C<0>; when used as a string, it is treated as the empty string,
+C<"">; and when used as a reference that isn't being assigned
+to, it is treated as an error. If you enable warnings, you'll
+be notified of an uninitialized value whenever you treat C<undef>
+as a string or a number. Well, usually. Boolean ("don't-care")
+contexts and operators such as C<++>, C<-->, C<+=>, C<-=>, and
+C<.=> are always exempt from such warnings.

 A declaration can be put anywhere a statement can, but has no effect on
 the execution of the primary sequence of statements--declarations all
 take effect at compile time. Typically all the declarations are put at
-the beginning or the end of the script. However, if you're using
-lexically-scoped private variables created with my(), you'll have to make sure
+the beginning or the end of the script.
+However, if you're using
+lexically-scoped private variables created with C<my()>, you'll
+have to make sure
 your format or subroutine definition is within the same block scope
-as the my if you expect to to be able to access those private variables.
+as the my if you expect to be able to access those private variables.

 Declaring a subroutine allows a subroutine name to be used as if it were a
 list operator from that point forward in the program. You can declare a
-subroutine without defining it by saying just
+subroutine without defining it by saying C<sub name>, thus:

     sub myname;
     $me = myname $0 or die "can't get myname";

-Note that it functions as a list operator though, not as a unary
-operator, so be careful to use C<or> instead of C<||> there.
+Note that myname() functions as a list operator, not as a unary operator;
+so be careful to use C<or> instead of C<||> in this case. However, if
+you were to declare the subroutine as C<sub myname ($)>, then
+C<myname> would function as a unary operator, so either C<or> or
+C<||> would work.

 Subroutine declarations can also be loaded up with the C<require> statement
 or both loaded and imported into your namespace with a C<use> statement.
@@ -63,9 +75,9 @@ The only kind of simple statement is an expression evaluated for its
 side effects. Every simple statement must be terminated with a
 semicolon, unless it is the final statement in a block, in which case
 the semicolon is optional. (A semicolon is still encouraged there if the
-block takes up more than one line, since you may eventually add another line.)
+block takes up more than one line, because you may eventually add another line.)
 Note that there are some operators like C<eval {}> and C<do {}> that look
-like compound statements, but aren't (they're just TERMs in an expression),
+like compound statements, but aren't (they're just TERMs in an expression),
 and thus need an explicit termination if used as the last item in a statement.
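
The point about explicit termination can be sketched as follows. This is a minimal illustration rather than text from the patch; the variable names C<$sum>, C<$ok>, and C<$err> are made up for the example. Because C<do {}> and C<eval {}> are terms inside an expression, the statement that contains them still needs its own semicolon after the closing brace.

```perl
use strict;
use warnings;

# do {} is a TERM: the value of the block's last statement becomes the
# value of the expression, and the enclosing assignment statement still
# needs its terminating semicolon.
my $sum = do { 1 + 2 };

# eval {} is likewise a TERM. If the block dies, eval yields undef in
# scalar context and leaves the error message in $@.
my $ok  = eval { die "boom\n"; 1 };
my $err = $@;

print "sum=$sum err=$err";
```

Leaving off the semicolon after either closing brace would only be legal if that statement were the last one in its enclosing block.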
 Any simple statement may optionally be followed by a I<SINGLE> modifier,
@@ -76,24 +88,41 @@ modifiers are:
     unless EXPR
     while EXPR
     until EXPR
+    foreach EXPR

 The C<if> and C<unless> modifiers have the expected semantics,
-presuming you're a speaker of English. The C<while> and C<until>
-modifiers also have the usual "while loop" semantics (conditional
-evaluated first), except when applied to a do-BLOCK (or to the
-now-deprecated do-SUBROUTINE statement), in which case the block
-executes once before the conditional is evaluated. This is so that you
-can write loops like:
+presuming you're a speaker of English. The C<foreach> modifier is an
+iterator: For each value in EXPR, it aliases C<$_> to the value and
+executes the statement. The C<while> and C<until> modifiers have the
+usual "C<while> loop" semantics (conditional evaluated first), except
+when applied to a C<do>-BLOCK (or to the deprecated C<do>-SUBROUTINE
+statement), in which case the block executes once before the
+conditional is evaluated. This is so that you can write loops like:

     do {
         $line = <STDIN>;
         ...
     } until $line eq ".\n";

-See L<perlfunc/do>. Note also that the loop control
-statements described later will I<NOT> work in this construct, since
-modifiers don't take loop labels. Sorry. You can always wrap
-another block around it to do that sort of thing.
+See L<perlfunc/do>. Note also that the loop control statements described
+later will I<NOT> work in this construct, because modifiers don't take
+loop labels. Sorry. You can always put another block inside of it
+(for C<next>) or around it (for C<last>) to do that sort of thing.
+For C<next>, just double the braces:
+
+    do {{
+        next if $x == $y;
+        # do something here
+    }} until $x++ > $z;
+
+For C<last>, you have to be more elaborate:
+
+    LOOP: {
+            do {
+                last if $x = $y**2;
+                # do something here
+            } while $x++ <= $z;
+    }

 =head2 Compound statements

@@ -114,6 +143,7 @@ The following compound statements may be used to control flow:
     LABEL while (EXPR) BLOCK continue BLOCK
     LABEL for (EXPR; EXPR; EXPR) BLOCK
     LABEL foreach VAR (LIST) BLOCK
+    LABEL foreach VAR (LIST) BLOCK continue BLOCK
     LABEL BLOCK continue BLOCK

 Note that, unlike C and Pascal, these are defined in terms of BLOCKs,
@@ -128,19 +158,23 @@ all do the same thing:
     open(FOO) ? 'hi mom' : die "Can't open $FOO: $!";
                         # a bit exotic, that last one

-The C<if> statement is straightforward. Since BLOCKs are always
+The C<if> statement is straightforward. Because BLOCKs are always
 bounded by curly brackets, there is never any ambiguity about which
 C<if> an C<else> goes with. If you use C<unless> in place of C<if>,
 the sense of the test is reversed.

 The C<while> statement executes the block as long as the expression is
-true (does not evaluate to the null string or 0 or "0"). The LABEL is
-optional, and if present, consists of an identifier followed by a colon.
-The LABEL identifies the loop for the loop control statements C<next>,
-C<last>, and C<redo>. If the LABEL is omitted, the loop control statement
+true (does not evaluate to the null string C<""> or C<0> or C<"0">).
+The LABEL is optional, and if present, consists of an identifier followed
+by a colon. The LABEL identifies the loop for the loop control
+statements C<next>, C<last>, and C<redo>.
+If the LABEL is omitted, the loop control statement
 refers to the innermost enclosing loop. This may include dynamically
 looking back your call-stack at run time to find the LABEL. Such
-desperate behavior triggers a warning if you use the C<use warnings>
+pragma or the B<-w> flag.
+Unlike a C<foreach> statement, a C<while> statement never implicitly
+localises any variables.
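
The difference in localisation between C<foreach> and C<while> can be seen in a short sketch. This example is not part of the patch; it assumes a Perl new enough to open an in-memory filehandle (5.8 or later), and the names C<$data>, C<$after_for>, and C<$after_while> are invented for illustration.

```perl
use strict;
use warnings;

# An in-memory file stands in for real input (assumes Perl 5.8+).
my $data = "a\nb\n";
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";

$_ = "precious";

for (1 .. 2) { }          # foreach aliases $_ and restores it afterwards
my $after_for = $_;       # still "precious"

while (<$fh>) { }         # plain assignment to the global $_
my $after_while = $_;     # undef, left over from the read at end-of-file

close($fh);

print "after for:   $after_for\n";
print "after while: ", (defined $after_while ? $after_while : "undef"), "\n";
```

If a caller's C<$_> matters, one option is to read into a named variable, as in C<while (defined(my $line = E<lt>$fhE<gt>))>, rather than relying on the implicit assignment.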
 If there is a C<continue> BLOCK, it is always executed just before the
 conditional is about to be evaluated again, just like the third part of a
@@ -178,63 +212,65 @@ want to skip ahead and get the next record.

     while (<>) {
         chomp;
-        if (s/\\$//) {
-            $_ .= <>;
+        if (s/\\$//) {
+            $_ .= <>;
             redo unless eof();
         }
         # now process $_
-    }
+    }

 which is Perl short-hand for the more explicitly written version:

-    LINE: while ($line = <ARGV>) {
+    LINE: while (defined($line = <ARGV>)) {
         chomp($line);
-        if ($line =~ s/\\$//) {
-            $line .= <ARGV>;
+        if ($line =~ s/\\$//) {
+            $line .= <ARGV>;
             redo LINE unless eof(); # not eof(ARGV)!
         }
         # now process $line
-    }
-
-Or here's a a simpleminded Pascal comment stripper (warning: assumes no { or } in strings)
-
-    LINE: while (<STDIN>) {
-        while (s|({.*}.*){.*}|$1 |) {}
-        s|{.*}| |;
-        if (s|{.*| |) {
-            $front = $_;
-            while (<STDIN>) {
-                if (/}/) { # end of comment?
-                    s|^|$front{|;
-                    redo LINE;
-                }
-            }
-        }
-        print;
     }

 Note that if there were a C<continue> block on the above code, it would get
-executed even on discarded lines.
+executed even on discarded lines. This is often used to reset line counters
+or C<?pat?> one-time matches.
+
+    # inspired by :1,$g/fred/s//WILMA/
+    while (<>) {
+        ?(fred)? && s//WILMA $1 WILMA/;
+        ?(barney)? && s//BETTY $1 BETTY/;
+        ?(homer)? && s//MARGE $1 MARGE/;
+    } continue {
+        print "$ARGV $.: $_";
+        close ARGV if eof(); # reset $.
+        reset if eof(); # reset ?pat?
+    }

 If the word C<while> is replaced by the word C<until>, the sense of the
 test is reversed, but the conditional is still tested before the first
 iteration.

-In either the C<if> or the C<while> statement, you may replace "(EXPR)"
-with a BLOCK, and the conditional is true if the value of the last
-statement in that block is true. While this "feature" continues to work in
-version 5, it has been deprecated, so please change any occurrences of "if BLOCK" to
-"if (do BLOCK)".
+The loop control statements don't work in an C<if> or C<unless>, since
+they aren't loops. You can double the braces to make them such, though.
-=head2 For and Foreach
+    if (/pattern/) {{
+        next if /fred/;
+        next if /barney/;
+        # do something here
+    }}

-Perl's C-style C<for> loop works exactly like the corresponding C<while> loop:
+The form C<while/if BLOCK BLOCK>, available in Perl 4, is no longer
+available. Replace any occurrence of C<if BLOCK> by C<if (do BLOCK)>.
+
+=head2 For Loops
+
+Perl's C-style C<for> loop works exactly like the corresponding C<while> loop;
+that means that this:

     for ($i = 1; $i < 10; $i++) {
         ...
     }

-is the same as
+is the same as this:

     $i = 1;
     while ($i < 10) {
@@ -243,27 +279,52 @@ is the same as
         $i++;
     }

+(There is one minor difference: The first form implies a lexical scope
+for variables declared with C<my> in the initialization expression.)
+
+Besides the normal array index looping, C<for> can lend itself
+to many other interesting applications. Here's one that avoids the
+problem you get into if you explicitly test for end-of-file on
+an interactive file descriptor causing your program to appear to
+hang.
+
+    $on_a_tty = -t STDIN && -t STDOUT;
+    sub prompt { print "yes? " if $on_a_tty }
+    for ( prompt(); ; prompt() ) {
+        # do something
+    }
+
+=head2 Foreach Loops
+
 The C<foreach> loop iterates over a normal list value and sets the
-variable VAR to be each element of the list in turn. The variable is
-implicitly local to the loop and regains its former value upon exiting the
-loop. If the variable was previously declared with C<my>, it uses that
-variable instead of the global one, but it's still localized to the loop.
-This can cause problems if you have subroutine or format declarations
-within that block's scope.
+variable VAR to be each element of the list in turn. If the variable
+is preceded with the keyword C<my>, then it is lexically scoped, and
+is therefore visible only within the loop. Otherwise, the variable is
+implicitly local to the loop and regains its former value upon exiting
+the loop. If the variable was previously declared with C<my>, it uses
+that variable instead of the global one, but it's still localized to
+the loop.
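
The three scoping cases described in that paragraph can be sketched side by side. This is an illustration under assumed names (C<$g>, C<$i>, C<$elem>, C<peek_g>, and C<@seen> do not come from the patch), not a definitive account of the implementation.

```perl
use strict;
use warnings;

our $g = "global before";     # package variable
my  $i = "lexical before";    # lexical declared ahead of the loop

sub peek_g { return $g }      # reads the package variable $g

my @seen;
for $g (1 .. 2) {             # no "my": $g is localized for the loop,
    push @seen, peek_g();     # so a called sub sees the current element
}

for $i (1 .. 2) { }           # reuses the earlier "my $i", still loop-local

for my $elem (10, 20) { }     # $elem exists only inside this loop

print "seen: @seen\n";
print "g: $g\n";
print "i: $i\n";
```

After the loops both C<$g> and C<$i> hold the values they had beforehand, which is the "regains its former value" behaviour the text describes.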
 The C<foreach> keyword is actually a synonym for the C<for> keyword, so
-you can use C<foreach> for readability or C<for> for brevity. If VAR is
-omitted, $_ is set to each value. If LIST is an actual array (as opposed
-to an expression returning a list value), you can modify each element of
-the array by modifying VAR inside the loop. That's because the C<foreach>
-loop index variable is an implicit alias for each item in the list that
-you're looping over.
+you can use C<foreach> for readability or C<for> for brevity. (Or because
+the Bourne shell is more familiar to you than I<csh>, so writing C<for>
+comes more naturally.) If VAR is omitted, C<$_> is set to each value.
+If any element of LIST is an lvalue, you can modify it by modifying VAR
+inside the loop. That's because the C<foreach> loop index variable is
+an implicit alias for each item in the list that you're looping over.
+
+If any part of LIST is an array, C<foreach> will get very confused if
+you add or remove elements within the loop body, for example with
+C<splice>. So don't do that.
+
+C<foreach> probably won't do what you expect if VAR is a tied or other
+special variable. Don't do that either.

 Examples:

     for (@ary) { s/foo/bar/ }

-    foreach $elem (@elements) {
+    for my $elem (@elements) {
         $elem *= 2;
     }

@@ -279,40 +340,42 @@ Examples:

 Here's how a C programmer might code up a particular algorithm in Perl:

-    for ($i = 0; $i < @ary1; $i++) {
-        for ($j = 0; $j < @ary2; $j++) {
+    for (my $i = 0; $i < @ary1; $i++) {
+        for (my $j = 0; $j < @ary2; $j++) {
             if ($ary1[$i] > $ary2[$j]) {
                 last; # can't go to outer :-(
             }
             $ary1[$i] += $ary2[$j];
         }
+        # this is where that last takes me
     }

-Whereas here's how a Perl programmer more confortable with the idiom might
-do it this way:
-
-    OUTER: foreach $i (@ary1) {
-        INNER: foreach $j (@ary2) {
-            next OUTER if $i > $j;
-            $i += $j;
-        }
-    }
-
-See how much easier this is? It's cleaner, safer, and faster.
-It's cleaner because it's less noisy.
-It's safer because if code gets added
-between the inner and outer loops later, you won't accidentally excecute
-it because you've explicitly asked to iterate the other loop rather than
-merely terminating the inner one.
-And it's faster because Perl exececute C<foreach> statement more
-rapidly than it would the equivalent C<for> loop.
+Whereas here's how a Perl programmer more comfortable with the idiom might
+do it:
+
+    OUTER: for my $wid (@ary1) {
+        INNER: for my $jet (@ary2) {
+            next OUTER if $wid > $jet;
+            $wid += $jet;
+        }
+    }
+
+See how much easier this is? It's cleaner, safer, and faster. It's
+cleaner because it's less noisy. It's safer because if code gets added
+between the inner and outer loops later on, the new code won't be
+accidentally executed. The C<next> explicitly iterates the other loop
+rather than merely terminating the inner one. And it's faster because
+Perl executes a C<foreach> statement more rapidly than it would the
+equivalent C<for> loop.

 =head2 Basic BLOCKs and Switch Statements

-A BLOCK by itself (labeled or not) is semantically equivalent to a loop
-that executes once. Thus you can use any of the loop control
-statements in it to leave or restart the block. The C<continue> block
-is optional.
+A BLOCK by itself (labeled or not) is semantically equivalent to a
+loop that executes once. Thus you can use any of the loop control
+statements in it to leave or restart the block. (Note that this is
+I<NOT> true in C<eval{}>, C<sub{}>, or contrary to popular belief
+C<do{}> blocks, which do I<NOT> count as loops.) The C<continue>
+block is optional.

 The BLOCK construct is particularly nice for doing case
 structures.
@@ -324,7 +387,7 @@ structures.
         $nothing = 1;
     }

-There is no official switch statement in Perl, because there are
+There is no official C<switch> statement in Perl, because there are
 already several ways to write the equivalent.
 In addition to the
 above, you could write
@@ -335,7 +398,7 @@ above, you could write
         $nothing = 1;
     }

-(That's actually not as strange as it looks one you realize that you can
+(That's actually not as strange as it looks once you realize that you can
 use loop control "operators" within an expression. That's just the normal
 C comma operator.)

@@ -348,22 +411,22 @@ or
         $nothing = 1;
     }

-or formatted so it stands out more as a "proper" switch statement:
+or formatted so it stands out more as a "proper" C<switch> statement:

     SWITCH: {
-        /^abc/ && do {
-            $abc = 1;
-            last SWITCH;
+        /^abc/ && do {
+            $abc = 1;
+            last SWITCH;
         };

-        /^def/ && do {
-            $def = 1;
-            last SWITCH;
+        /^def/ && do {
+            $def = 1;
+            last SWITCH;
         };

-        /^xyz/ && do {
-            $xyz = 1;
-            last SWITCH;
+        /^xyz/ && do {
+            $xyz = 1;
+            last SWITCH;
         };
         $nothing = 1;
     }
@@ -388,48 +451,184 @@ or even, horrors,
     else
         { $nothing = 1 }

-
-A common idiom for a switch statement is to use C<foreach>'s aliasing to make
-a temporary assignment to $_ for convenient matching:
+A common idiom for a C<switch> statement is to use C<foreach>'s aliasing to make
+a temporary assignment to C<$_> for convenient matching:

     SWITCH: for ($where) {
         /In Card Names/ && do { push @flags, '-e'; last; };
         /Anywhere/ && do { push @flags, '-h'; last; };
         /In Rulings/ && do { last; };
         die "unknown value for form variable where: `$where'";
-    }
+    }
+
+Another interesting approach to a switch statement is to arrange
+for a C<do> block to return the proper value:
+
+    $amode = do {
+        if ($flag & O_RDONLY) { "r" } # XXX: isn't this 0?
+        elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
+        elsif ($flag & O_RDWR) {
+            if ($flag & O_CREAT) { "w+" }
+            else { ($flag & O_APPEND) ? "a+" : "r+" }
+        }
+    };
+
+Or
+
+    print do {
+        ($flags & O_WRONLY) ? "write-only" :
+        ($flags & O_RDWR) ? "read-write" :
+        "read-only";
+    };
+
+Or if you are certain that all the C<&&> clauses are true, you can use
+something like this, which "switches" on the value of the
+C<HTTP_USER_AGENT> environment variable.
+
+    #!/usr/bin/perl
+    # pick out jargon file page based on browser
+    $dir = 'http://www.wins.uva.nl/~mes/jargon';
+    for ($ENV{HTTP_USER_AGENT}) {
+        $page = /Mac/ && 'm/Macintrash.html'
+             || /Win(dows )?NT/ && 'e/evilandrude.html'
+             || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'
+             || /Linux/ && 'l/Linux.html'
+             || /HP-UX/ && 'h/HP-SUX.html'
+             || /SunOS/ && 's/ScumOS.html'
+             || 'a/AppendixB.html';
+    }
+    print "Location: $dir/$page\015\012\015\012";
+
+That kind of switch statement only works when you know the C<&&> clauses
+will be true. If you don't, the previous C<?:> example should be used.
+
+You might also consider writing a hash of subroutine references
+instead of synthesizing a C<switch> statement.

 =head2 Goto

-Although not for the faint of heart, Perl does support a C<goto> statement.
-A loop's LABEL is not actually a valid target for a C<goto>;
-it's just the name of the loop. There are three forms: goto-LABEL,
-goto-EXPR, and goto-&NAME.
+Although not for the faint of heart, Perl does support a C<goto>
+statement. There are three forms: C<goto>-LABEL, C<goto>-EXPR, and
+C<goto>-&NAME. A loop's LABEL is not actually a valid target for
+a C<goto>; it's just the name of the loop.

-The goto-LABEL form finds the statement labeled with LABEL and resumes
+The C<goto>-LABEL form finds the statement labeled with LABEL and resumes
 execution there. It may not be used to go into any construct that
-requires initialization, such as a subroutine or a foreach loop. It
+requires initialization, such as a subroutine or a C<foreach> loop. It
 also can't be used to go into a construct that is optimized away. It
 can be used to go almost anywhere else within the dynamic scope,
 including out of subroutines, but it's usually better to use some other
-construct such as last or die. The author of Perl has never felt the
-need to use this form of goto (in Perl, that is--C is another matter).
+construct such as C<last> or C<die>. The author of Perl has never felt the
+need to use this form of C<goto> (in Perl, that is--C is another matter).
-The goto-EXPR form expects a label name, whose scope will be resolved
-dynamically. This allows for computed gotos per FORTRAN, but isn't
+The C<goto>-EXPR form expects a label name, whose scope will be resolved
+dynamically. This allows for computed C<goto>s per FORTRAN, but isn't
 necessarily recommended if you're optimizing for maintainability:

-    goto ("FOO", "BAR", "GLARCH")[$i];
+    goto(("FOO", "BAR", "GLARCH")[$i]);

-The goto-&NAME form is highly magical, and substitutes a call to the
+The C<goto>-&NAME form is highly magical, and substitutes a call to the
 named subroutine for the currently running subroutine. This is used by
-AUTOLOAD() subroutines that wish to load another subroutine and then
+C<AUTOLOAD()> subroutines that wish to load another subroutine and then
 pretend that the other subroutine had been called in the first place
-(except that any modifications to @_ in the current subroutine are
-propagated to the other subroutine.) After the C<goto>, not even caller()
+(except that any modifications to C<@_> in the current subroutine are
+propagated to the other subroutine.) After the C<goto>, not even C<caller()>
 will be able to tell that this routine was called first.

-In almost cases like this, it's usually a far, far better idea to use the
-structured control flow mechanisms of C<next>, C<last>, or C<redo> insetad
+In almost all cases like this, it's usually a far, far better idea to use the
+structured control flow mechanisms of C<next>, C<last>, or C<redo> instead
 of resorting to a C<goto>. For certain applications, the catch and throw pair of
 C<eval{}> and die() for exception processing can also be a prudent approach.
+
+=head2 PODs: Embedded Documentation
+
+Perl has a mechanism for intermixing documentation with source code.
+While it's expecting the beginning of a new statement, if the compiler
+encounters a line that begins with an equal sign and a word, like this
+
+    =head1 Here There Be Pods!
+
+Then that text and all remaining text up through and including a line
+beginning with C<=cut> will be ignored.
+The format of the intervening
+text is described in L<perlpod>.
+
+This allows you to intermix your source code
+and your documentation text freely, as in
+
+    =item snazzle($)
+
+    The snazzle() function will behave in the most spectacular
+    form that you can possibly imagine, not even excepting
+    cybernetic pyrotechnics.
+
+    =cut back to the compiler, nuff of this pod stuff!
+
+    sub snazzle($) {
+        my $thingie = shift;
+        .........
+    }
+
+Note that pod translators should look at only paragraphs beginning
+with a pod directive (it makes parsing easier), whereas the compiler
+actually knows to look for pod escapes even in the middle of a
+paragraph. This means that the following secret stuff will be
+ignored by both the compiler and the translators.
+
+    $a=3;
+    =secret stuff
+     warn "Neither POD nor CODE!?"
+    =cut back
+    print "got $a\n";
+
+You probably shouldn't rely upon the C<warn()> being podded out forever.
+Not all pod translators are well-behaved in this regard, and perhaps
+the compiler will become pickier.
+
+One may also use pod directives to quickly comment out a section
+of code.
+
+=head2 Plain Old Comments (Not!)
+
+Much like the C preprocessor, Perl can process line directives. Using
+this, one can control Perl's idea of filenames and line numbers in
+error or warning messages (especially for strings that are processed
+with C<eval()>). The syntax for this mechanism is the same as for most
+C preprocessors: it matches the regular expression
+C</^#\s*line\s+(\d+)\s*(?:\s"([^"]*)")?/> with C<$1> being the line
+number for the next line, and C<$2> being the optional filename
+(specified within quotes).
+
+There is a fairly obvious gotcha included with the line directive:
+Debuggers and profilers will only show the last source line to appear
+at a particular line number in a given file. Care should be taken not
+to cause line number collisions in code you'd like to debug later.
+
+Here are some examples that you should be able to type into your command
+shell:
+
+    % perl
+    # line 200 "bzzzt"
+    # the `#' on the previous line must be the first char on line
+    die 'foo';
+    __END__
+    foo at bzzzt line 201.
+
+    % perl
+    # line 200 "bzzzt"
+    eval qq[\n#line 2001 ""\ndie 'foo']; print $@;
+    __END__
+    foo at - line 2001.
+
+    % perl
+    eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;
+    __END__
+    foo at foo bar line 200.
+
+    % perl
+    # line 345 "goop"
+    eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";
+    print $@;
+    __END__
+    foo at goop line 345.
+
+=cut