2 ''' $Header: perl_man.1,v 3.0.1.7 90/08/09 04:24:03 lwall Locked $
4 ''' $Log: perl.man.1,v $
5 ''' Revision 3.0.1.7 90/08/09 04:24:03 lwall
6 ''' patch19: added -x switch to extract script from input trash
7 ''' patch19: Added -c switch to do compilation only
8 ''' patch19: bare identifiers are now strings if no other interpretation possible
9 ''' patch19: -s now returns size of file
10 ''' patch19: Added __LINE__ and __FILE__ tokens
11 ''' patch19: Added __END__ token
13 ''' Revision 3.0.1.6 90/08/03 11:14:44 lwall
14 ''' patch19: Intermediate diffs for Randal
16 ''' Revision 3.0.1.5 90/03/27 16:14:37 lwall
17 ''' patch16: .. now works using magical string increment
19 ''' Revision 3.0.1.4 90/03/12 16:44:33 lwall
20 ''' patch13: (LIST,) now legal
21 ''' patch13: improved LIST documentation
22 ''' patch13: example of if-elsif switch was wrong
24 ''' Revision 3.0.1.3 90/02/28 17:54:32 lwall
25 ''' patch9: @array in scalar context now returns length of array
26 ''' patch9: in manual, example of open and ?: was backwards
28 ''' Revision 3.0.1.2 89/11/17 15:30:03 lwall
29 ''' patch5: fixed some manual typos and indent problems
31 ''' Revision 3.0.1.1 89/11/11 04:41:22 lwall
32 ''' patch2: explained about sh and ${1+"$@"}
33 ''' patch2: documented that space must separate word and '' string
35 ''' Revision 3.0 89/10/18 15:21:29 lwall
57 ''' Set up \*(-- to give an unbreakable dash;
58 ''' string Tr holds user defined translation string.
59 ''' Bell System Logo is used as a dummy character.
64 .if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
65 .if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
82 perl \- Practical Extraction and Report Language
\fBperl\fR [options] filename args
\fIPerl\fR is an interpreted language optimized for scanning arbitrary text files,
extracting information from those text files, and printing reports based
on that information.
91 It's also a good language for many system management tasks.
92 The language is intended to be practical (easy to use, efficient, complete)
93 rather than beautiful (tiny, elegant, minimal).
94 It combines (in the author's opinion, anyway) some of the best features of C,
95 \fIsed\fR, \fIawk\fR, and \fIsh\fR,
96 so people familiar with those languages should have little difficulty with it.
(Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
even BASIC-PLUS.)
99 Expression syntax corresponds quite closely to C expression syntax.
Unlike most Unix utilities,
\fIperl\fR does not arbitrarily limit the size of your data\*(--if you've got
the memory, \fIperl\fR can slurp in your whole file as a single string.
106 Recursion is of unlimited depth.
107 And the hash tables used by associative arrays grow as necessary to prevent
108 degraded performance.
\fIPerl\fR uses sophisticated pattern matching techniques to scan large amounts of
data very quickly.
Although optimized for scanning text,
\fIperl\fR can also deal with binary data, and can make dbm files look like associative
115 arrays (where dbm is available).
Setuid \fIperl\fR scripts are safer than C programs
119 through a dataflow tracing mechanism which prevents many stupid security holes.
120 If you have a problem that would ordinarily use \fIsed\fR
121 or \fIawk\fR or \fIsh\fR, but it
122 exceeds their capabilities or must run a little faster,
and you don't want to write the silly thing in C, then
\fIperl\fR may be for you.
There are also translators to turn your
\fIsed\fR and \fIawk\fR scripts into \fIperl\fR scripts.
Upon startup, \fIperl\fR looks for your script in one of the following places:
Specified line by line via
\fB\-e\fR switches on the command line.
143 Contained in the file specified by the first filename on the command line.
144 (Note that systems supporting the #! notation invoke interpreters this way.)
146 Passed in implicitly via standard input.
This only works if there are no filename arguments\*(--to pass
arguments to a \fIstdin\fR script you must explicitly specify a \- for the script name.
After locating your script,
\fIperl\fR compiles it to an internal form.
155 If the script is syntactically correct, it is executed.
157 Note: on first reading this section may not make much sense to you. It's here
158 at the front for easy reference.
160 A single-character option may be combined with the following option, if any.
161 This is particularly useful when invoking a script using the #! construct which
162 only allows one argument. Example:
166 #!/usr/bin/perl \-spi.bak # same as \-s \-p \-i.bak
\fB\-a\fR turns on autosplit mode when used with a
\fB\-n\fR or \fB\-p\fR.
An implicit split command to the @F array
is done as the first thing inside the implicit while loop produced by
the \fB\-n\fR or \fB\-p\fR.
	perl \-ane \'print pop(@F), "\en";\'

is equivalent to

	while (<>) {
		@F = split(\' \');
		print pop(@F), "\en";
	}
\fB\-c\fR causes \fIperl\fR to check the syntax of the script and then exit without executing it.
\fB\-d\fR runs the script under the perl debugger.
203 See the section on Debugging.
\fB\-D\fR<number> sets debugging flags.
To watch how it executes your script, use
\fB\-D14\fR.
(This only works if debugging is compiled into your
\fIperl\fR.)
211 Another nice value is \-D1024, which lists your compiled syntax tree.
212 And \-D512 displays compiled regular expressions.
214 .BI \-e " commandline"
215 may be used to enter one line of script.
Multiple \fB\-e\fR commands may be given to build up a multi-line script.
If \fB\-e\fR is given, \fIperl\fR will not look for a script filename in the argument list.
\fB\-i\fR<extension> specifies that files processed by the <> construct are to be edited
in-place.
It does this by renaming the input file, opening the output file by the
229 same name, and selecting that output file as the default for print statements.
230 The extension, if supplied, is added to the name of the
231 old file to make a backup copy.
232 If no extension is supplied, no backup is made.
Saying \*(L"perl \-p \-i.bak \-e "s/foo/bar/;" .\|.\|. \*(R" is the same as using
the script:

	#!/usr/bin/perl \-pi.bak
	s/foo/bar/;

which is equivalent to

	#!/usr/bin/perl
	while (<>) {
		if ($ARGV ne $oldargv) {
			rename($ARGV, $ARGV . \'.bak\');
			open(ARGVOUT, ">$ARGV");
			select(ARGVOUT);
			$oldargv = $ARGV;
		}
		s/foo/bar/;
	}
	continue {
		print;	# this prints to original filename
	}
	select(STDOUT);

except that the \fB\-i\fR form doesn't need to compare $ARGV to $oldargv to know when
the filename has changed.
It does, however, use ARGVOUT for the selected filehandle.
Note that \fISTDOUT\fR is restored as the default output filehandle after the loop.
269 You can use eof to locate the end of each input file, in case you want
270 to append to each file, or reset line numbering (see example under eof).
\fB\-I\fR<directory> may be used in conjunction with
\fB\-P\fR to tell the C preprocessor where to look for include files.
By default /usr/include and /usr/lib/perl are searched.
\fB\-n\fR causes \fIperl\fR to assume the following loop around your script, which makes it iterate
282 over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR:

	while (<>) {
		.\|.\|.		# your script goes here
	}

Note that the lines are not printed by default.
See \fB\-p\fR to have lines printed.
295 Here is an efficient way to delete all files older than a week:
298 find . \-mtime +7 \-print | perl \-ne \'chop;unlink;\'
301 This is faster than using the \-exec switch of find because you don't have to
302 start a process on every filename found.
\fB\-p\fR causes \fIperl\fR to assume the following loop around your script, which makes it iterate
308 over filename arguments somewhat like \fIsed\fR:

	while (<>) {
		.\|.\|.		# your script goes here
	} continue {
		print;
	}

Note that the lines are printed automatically.
To suppress printing use the
\fB\-n\fR switch.
\fB\-P\fR causes your script to be run through the C preprocessor before
compilation by \fIperl\fR.
333 (Since both comments and cpp directives begin with the # character,
334 you should avoid starting comments with any words recognized
335 by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
\fB\-s\fR enables some rudimentary switch parsing for switches on the command line
after the script name but before any filename arguments (or before a \-\|\-).
Any switch found there is removed from @ARGV and sets the corresponding variable in the
\fIperl\fR script.
343 The following script prints \*(L"true\*(R" if and only if the script is
344 invoked with a \-xyz switch.

	#!/usr/bin/perl \-s
	if ($xyz) { print "true\en"; }

\fB\-S\fR makes \fIperl\fR use the PATH environment variable to search for the script
357 (unless the name of the script starts with a slash).
358 Typically this is used to emulate #! startup on machines that don't
359 support #!, in the following manner:

	#!/usr/bin/perl
	eval "exec /usr/bin/perl \-S $0 $*"
		if $running_under_some_shell;

367 The system ignores the first line and feeds the script to /bin/sh,
which proceeds to try to execute the
\fIperl\fR script as a shell script.
The shell executes the second line as a normal shell command, and thus
starts up the \fIperl\fR interpreter.
On some systems $0 doesn't always contain the full pathname,
so the \fB\-S\fR tells \fIperl\fR to search for the script if necessary.
After \fIperl\fR locates the script, it parses the lines and ignores them because
384 the variable $running_under_some_shell is never true.
A better construct than $* would be ${1+"$@"}, which handles embedded spaces
and such in the filenames, but doesn't work if the script is being interpreted
by \fIcsh\fR.
388 In order to start up sh rather than csh, some systems may have to replace the
389 #! line with a line containing just
390 a colon, which will be politely ignored by perl.
391 Other systems can't control that, and need a totally devious construct that
392 will work under any of csh, sh or perl, such as the following:

	eval '(exit $?0)' && eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
	& eval 'exec /usr/bin/perl -S $0 $argv:q'
		if 0;

\fB\-u\fR causes \fIperl\fR to dump core after compiling your script.
406 You can then take this core dump and turn it into an executable file
407 by using the undump program (not supplied).
408 This speeds startup at the expense of some disk space (which you can
409 minimize by stripping the executable).
410 (Still, a "hello world" executable comes out to about 200K on my machine.)
411 If you are going to run your executable as a set-id program then you
412 should probably compile it using taintperl rather than normal perl.
413 If you want to execute a portion of your script before dumping, use the
414 dump operator instead.
Note: undump is platform specific and may not be available
for a particular port of perl.
\fB\-U\fR allows \fIperl\fR to do unsafe operations.
422 Currently the only \*(L"unsafe\*(R" operation is the unlinking of directories while
423 running as superuser.
\fB\-v\fR prints the version and patchlevel of your
\fIperl\fR executable.
\fB\-w\fR prints warnings about identifiers that are mentioned only once, and scalar
432 variables that are used before being set.
433 Also warns about redefined subroutines, and references to undefined
filehandles or filehandles opened readonly that you are attempting to
write on.
436 Also warns you if you use == on values that don't look like numbers, and if
437 your subroutines recurse more than 100 deep.
\fB\-x\fR<directory> tells \fIperl\fR that the script is embedded in a message.
443 Leading garbage will be discarded until the first line that starts
444 with #! and contains the string "perl".
445 Any meaningful switches on that line will be applied (but only one
446 group of switches, as with normal #! processing).
447 If a directory name is specified, Perl will switch to that directory
448 before running the script.
The \fB\-x\fR switch only controls the disposal of leading garbage.
452 The script must be terminated with __END__ if there is trailing garbage
453 to be ignored (the script can process any or all of the trailing garbage
454 via standard input if desired).
455 .Sh "Data Types and Objects"
\fIPerl\fR has three data types: scalars, arrays of scalars, and
459 associative arrays of scalars.
460 Normal arrays are indexed by number, and associative arrays by string.
462 The interpretation of operations and values in perl sometimes
463 depends on the requirements
464 of the context around the operation or value.
465 There are three major contexts: string, numeric and array.
466 Certain operations return array values
467 in contexts wanting an array, and scalar values otherwise.
(If this is true of an operation it will be mentioned in the documentation
for that operation.)
470 Operations which return scalars don't care whether the context is looking
471 for a string or a number, but
472 scalar variables and values are interpreted as strings or numbers
473 as appropriate to the context.
A scalar is interpreted as TRUE in the boolean sense if it is not the null
string or 0.
Booleans returned by operators are 1 for true and 0 or \'\' (the null
string) for false.
479 There are actually two varieties of null string: defined and undefined.
480 Undefined null strings are returned when there is no real value for something,
481 such as when there was an error, or at end of file, or when you refer
482 to an uninitialized variable or element of an array.
483 An undefined null string may become defined the first time you access it, but
484 prior to that you can use the defined() operator to determine whether the
485 value is defined or not.
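
The truth and definedness rules can be tried out with a short sketch.
The names @samples and $never_assigned are invented for illustration,
and the code is written so that it should also run under a current perl.

```perl
# Which scalars count as true: everything except the null string and 0.
@samples = ('', 0, '0', 'abc', 1);
@truth = ();
foreach $value (@samples) {
    push(@truth, $value ? 'true' : 'false');
}

# A defined null string versus an undefined one.
$empty = '';                                    # assigned, so defined
$empty_is_defined = defined($empty) ? 1 : 0;
$unset_is_defined = defined($never_assigned) ? 1 : 0;   # never assigned
```

Both $empty and $never_assigned test false, but only $empty is defined.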
487 References to scalar variables always begin with \*(L'$\*(R', even when referring
488 to a scalar that is part of an array.
493 $days \h'|2i'# a simple scalar variable
494 $days[28] \h'|2i'# 29th element of array @days
495 $days{\'Feb\'}\h'|2i'# one value from an associative array
496 $#days \h'|2i'# last index of array @days
498 but entire arrays or array slices are denoted by \*(L'@\*(R':
500 @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n])
501 @days[3,4,5]\h'|2i'# same as @days[3.\|.5]
502 @days{'a','c'}\h'|2i'# same as ($days{'a'},$days{'c'})
504 and entire associative arrays are denoted by \*(L'%\*(R':
506 %days \h'|2i'# (key1, val1, key2, val2 .\|.\|.)
509 Any of these eight constructs may serve as an lvalue,
510 that is, may be assigned to.
511 (It also turns out that an assignment is itself an lvalue in
512 certain contexts\*(--see examples under s, tr and chop.)
513 Assignment to a scalar evaluates the righthand side in a scalar context,
while assignment to an array or array slice evaluates the righthand side
in an array context.
You may find the length of array @days by evaluating
\*(L"$#days\*(R", as in
\fIcsh\fR.
520 (Actually, it's not the length of the array, it's the subscript of the last element, since there is (ordinarily) a 0th element.)
521 Assigning to $#days changes the length of the array.
522 Shortening an array by this method does not actually destroy any values.
523 Lengthening an array that was previously shortened recovers the values that
524 were in those elements.
You can also gain some measure of efficiency by preextending an array that
is going to get big.
(You can also extend an array by assigning to an element that is off the
end of the array.
529 This differs from assigning to $#whatever in that intervening values
530 are set to null rather than recovered.)
You can truncate an array down to nothing by assigning the null list () to
it.
The following are exactly equivalent

	@whatever = ();
	$#whatever = $[ \- 1;

If you evaluate an array in a scalar context, it returns the length of
the array.
543 The following is always true:
546 @whatever == $#whatever \- $[ + 1;
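
A minimal sketch of the length rules, assuming $[ is left at its default of 0.
The variable names other than @whatever are invented for illustration.

```perl
@whatever = ('a', 'b', 'c');
$count = @whatever;             # scalar context: the length of the array
$last  = $#whatever;            # subscript of the last element
$same  = (@whatever == $#whatever + 1) ? 1 : 0;   # holds while $[ is 0

@whatever = ();                 # truncate down to nothing
$emptied = $#whatever;          # last subscript of an empty array
```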
550 Multi-dimensional arrays are not directly supported, but see the discussion
551 of the $; variable later for a means of emulating multiple subscripts with
552 an associative array.
You could also write a subroutine to turn multiple subscripts into a single
subscript.
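
Both approaches can be sketched briefly.
The names %grid, @cells, $cols and the subroutine flat are hypothetical,
and the row-major formula is just one possible mapping.

```perl
# Emulated two-dimensional subscript: the key is the subscripts joined with $;
$grid{1,2} = 'x';
$key = join($;, 1, 2);
$via_join = $grid{$key};        # same element as $grid{1,2}

# A subroutine that folds two subscripts into one (row-major order).
$cols = 10;
sub flat {
    local($row, $col) = @_;
    $row * $cols + $col;
}
$cells[&flat(2, 3)] = 'y';      # stored at subscript 23
```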
556 Every data type has its own namespace.
557 You can, without fear of conflict, use the same name for a scalar variable,
an array, an associative array, a filehandle, a subroutine name, and/or
a label.
560 Since variable and array references always start with \*(L'$\*(R', \*(L'@\*(R',
561 or \*(L'%\*(R', the \*(L"reserved\*(R" words aren't in fact reserved
562 with respect to variable names.
563 (They ARE reserved with respect to labels and filehandles, however, which
564 don't have an initial special character.
565 Hint: you could say open(LOG,\'logfile\') rather than open(log,\'logfile\').
566 Using uppercase filehandles also improves readability and protects you
567 from conflict with future reserved words.)
Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all
different names.
570 Names which start with a letter may also contain digits and underscores.
571 Names which do not start with a letter are limited to one character,
572 e.g. \*(L"$%\*(R" or \*(L"$$\*(R".
(Most of the one character names have a predefined significance to
\fIperl\fR.)
Numeric literals are specified in any of the usual floating point or
integer formats.
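
A few numeric literals, as a sketch; the variable names are arbitrary.

```perl
$integer  = 12;
$float    = 12.34;
$exponent = 1E-10;      # scientific notation
$hex      = 0xff;       # hexadecimal
$octal    = 0377;       # a leading 0 means octal
```

Here $hex and $octal hold the same value, 255.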
589 String literals are delimited by either single or double quotes.
590 They work much like shell quotes:
591 double-quoted string literals are subject to backslash and variable
592 substitution; single-quoted strings are not (except for \e\' and \e\e).
593 The usual backslash rules apply for making characters such as newline, tab, etc.
594 You can also embed newlines directly in your strings, i.e. they can end on
595 a different line than they begin.
This is nice, but if you forget your trailing quote, the error will not be
reported until \fIperl\fR finds another line containing the quote character, which
600 may be much further on in the script.
601 Variable substitution inside strings is limited to scalar variables, normal
602 array values, and array slices.
603 (In other words, identifiers beginning with $ or @, followed by an optional
604 bracketed expression as a subscript.)
605 The following code segment prints out \*(L"The price is $100.\*(R"
609 $Price = \'$100\';\h'|3.5i'# not interpreted
610 print "The price is $Price.\e\|n";\h'|3.5i'# interpreted
613 Note that you can put curly brackets around the identifier to delimit it
614 from following alphanumerics.
Also note that a single quoted string must be separated from a preceding
word by a space, since single quote is a valid character in an identifier
(see Packages).
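
The quoting rules can be sketched as follows.
The variable $unit and the word \*(L"kilometer\*(R" are invented for illustration.

```perl
$Price = '$100';                            # single quotes: not interpreted
$unit  = 'kilo';

$interpolated = "The price is $Price.";     # double quotes: $Price is substituted
$literal      = 'The price is $Price.';     # single quotes: left alone
$delimited    = "${unit}meter";             # curly brackets end the identifier
```

Without the curly brackets, "$unitmeter" would refer to a different variable.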
619 Two special literals are __LINE__ and __FILE__, which represent the current
620 line number and filename at that point in your program.
They may only be used as separate tokens; they will not be interpolated
into strings.
623 In addition, the token __END__ may be used to indicate the logical end of the
624 script before the actual end of file.
625 Any following text is ignored (but if the script is being read from
626 the standard input, then the rest of the input is available by reading
627 from filehandle STDIN).
628 The two control characters ^D and ^Z are synonyms for __END__.
630 A word that doesn't have any other interpretation in the grammar will be
631 treated as if it had single quotes around it.
632 For this purpose, a word consists only of alphanumeric characters and underline,
633 and must start with an alphabetic character.
634 As with filehandles and labels, a bare word that consists entirely of
lowercase letters risks conflict with future reserved words, and if you
use the \fB\-w\fR switch, Perl will warn you about any such words.
640 Array values are interpolated into double-quoted strings by joining all the
elements of the array with the delimiter specified in the $" variable,
space by default.
643 (Since in versions of perl prior to 3.0 the @ character was not a metacharacter
644 in double-quoted strings, the interpolation of @array, $array[EXPR],
645 @array[LIST], $array{EXPR}, or @array{LIST} only happens if array is
646 referenced elsewhere in the program or is predefined.)
647 The following are equivalent:

	$temp = join($",@ARGV);
	system "echo $temp";

	system "echo @ARGV";

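
The same equivalence can be checked without running a command.
This is a sketch; @list and the other names are invented.

```perl
@list = ('a', 'b', 'c');

$spaced = "@list";              # elements joined with $", a space by default
$joined = join($", @list);      # the explicit form of the same thing

{
    local($") = '-';            # change the delimiter inside this block only
    $dashed = "@list";
}
$restored = "@list";            # back to the default delimiter
```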
657 Within search patterns (which also undergo double-quotish substitution)
658 there is a bad ambiguity: Is /$foo[bar]/ to be
659 interpreted as /${foo}[bar]/ (where [bar] is a character class for the
regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to
array @foo)?
662 If @foo doesn't otherwise exist, then it's obviously a character class.
663 If @foo exists, perl takes a good guess about [bar], and is almost always right.
664 If it does guess wrong, or if you're just plain paranoid,
665 you can force the correct interpretation with curly brackets as above.
667 A line-oriented form of quoting is based on the shell here-is syntax.
668 Following a << you specify a string to terminate the quoted material, and all lines
following the current line down to the terminating string are the value
of the item.
The terminating string may be either an identifier (a word), or some
quoted text.
673 If quoted, the type of quotes you use determines the treatment of the text,
674 just as in regular quoting.
675 An unquoted identifier works like double quotes.
676 There must be no space between the << and the identifier.
677 (If you put a space it will be treated as a null identifier, which is
678 valid, and matches the first blank line\*(--see Merry Christmas example below.)
679 The terminating string must appear by itself (unquoted and with no surrounding
680 whitespace) on the terminating line.
683 print <<EOF; # same as above
687 print <<"EOF"; # same as above
691 print << x 10; # null identifier is delimiter
694 print <<`EOC`; # execute commands
699 print <<foo, <<bar; # you can stack them
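
A self-contained sketch of the here-is forms, assigning to variables
rather than printing so the results can be inspected.
The variable names are invented, and the null-identifier and backtick
forms are left out.

```perl
$Price = '$100';

$double = <<EOF;            # unquoted identifier: works like double quotes
The price is $Price.
EOF

$quoted = <<"EOF";          # explicit double quotes: same result
The price is $Price.
EOF

$single = <<'EOF';          # single quotes: no substitution
The price is $Price.
EOF

$stacked = join('', <<foo, <<bar);    # two here-is strings on one line
I said foo.
foo
I said bar.
bar
```

Each value includes the newline that ends its last line of text.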
706 Array literals are denoted by separating individual values by commas, and
707 enclosing the list in parentheses:
713 In a context not requiring an array value, the value of the array literal
714 is the value of the final element, as in the C comma operator.
719 @foo = (\'cc\', \'\-E\', $bar);
721 assigns the entire array value to array foo, but
723 $foo = (\'cc\', \'\-E\', $bar);
726 assigns the value of variable bar to variable foo.
727 Note that the value of an actual array in a scalar context is the length
728 of the array; the following assigns to $foo the value 3:
732 @foo = (\'cc\', \'\-E\', $bar);
733 $foo = @foo; # $foo gets 3
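
The difference between the two contexts, as a runnable sketch.
The value 'prog.c' and the names $last_value and $length are invented.

```perl
$bar = 'prog.c';

@foo = ('cc', '-E', $bar);          # array context: all three values
$last_value = ('cc', '-E', $bar);   # scalar context: the final element only
$length = @foo;                     # an actual array yields its length
```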
You may have an optional comma before the closing parenthesis of an
array literal, so that you can say:

	@foo = (
		1,
		2,
		3,
	);

747 When a LIST is evaluated, each element of the list is evaluated in
748 an array context, and the resulting array value is interpolated into LIST
749 just as if each individual element were a member of LIST. Thus arrays
lose their identity in a LIST\*(--the list

	(@foo,@bar,&SomeSub)

754 contains all the elements of @foo followed by all the elements of @bar,
755 followed by all the elements returned by the subroutine named SomeSub.
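
A sketch of that flattening.
The body of SomeSub and the array contents are invented for illustration.

```perl
sub SomeSub { ('x', 'y'); }     # returns a two-element array value

@foo = (1, 2);
@bar = (3);
@all = (@foo, @bar, &SomeSub);  # one flat list, no nested arrays
$count = @all;
```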
757 A list value may also be subscripted like a normal array.
761 $time = (stat($file))[8]; # stat returns array value
762 $digit = ('a','b','c','d','e','f')[$digit-10];
763 return (pop(@foo),pop(@foo))[0];
Array lists may be assigned to if and only if each element of the list
is an lvalue:
771 ($a, $b, $c) = (1, 2, 3);
773 ($map{\'red\'}, $map{\'blue\'}, $map{\'green\'}) = (0x00f, 0x0f0, 0xf00);
775 The final element may be an array or an associative array:
777 ($a, $b, @rest) = split;
778 local($a, $b, %rest) = @_;
781 You can actually put an array anywhere in the list, but the first array
in the list will soak up all the values, and anything after it will get
a null value.
784 This may be useful in a local().
786 An associative array literal contains pairs of values to be interpreted
787 as a key and a value:
791 # same as map assignment above
792 %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);
795 Array assignment in a scalar context returns the number of elements
796 produced by the expression on the right side of the assignment:
799 $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2
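
The count comes from the right side, not from the number of variables
on the left, as this sketch shows. The name $third is invented; it adds
a list subscript for comparison.

```perl
$x = (($foo, $bar) = (3, 2, 1));    # three values produced, two stored
$third = ('a', 'b', 'c', 'd')[2];   # a list value subscripted directly
```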
803 There are several other pseudo-literals that you should know about.
804 If a string is enclosed by backticks (grave accents), it first undergoes
805 variable substitution just like a double quoted string.
806 It is then interpreted as a command, and the output of that command
807 is the value of the pseudo-literal, like in a shell.
In a scalar context, a single string consisting of all the output is
returned.
In an array context, an array of values is returned, one for each line
of output.
812 (You can set $/ to use a different line terminator.)
813 The command is executed each time the pseudo-literal is evaluated.
814 The status value of the command is returned in $? (see Predefined Names
815 for the interpretation of $?).
Unlike in \fIcsh\fR, no translation is done on the return
817 data\*(--newlines remain newlines.
818 Unlike in any of the shells, single quotes do not hide variable names
819 in the command from interpretation.
820 To pass a $ through to the shell you need to hide it with a backslash.
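
A sketch of backticks in both contexts.
It assumes a Bourne-style shell with an echo command is available;
the variable names and the shell variable v are invented.

```perl
$name = 'world';

$whole  = `echo hello $name; echo goodbye`;   # scalar context: one string
@lines  = `echo hello $name; echo goodbye`;   # array context: one element per line
$status = $?;                                 # status of the last command run

# $name above was substituted by perl; \$v is passed through to the shell.
$shell = `v=shellside; echo \$v`;
```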
822 Evaluating a filehandle in angle brackets yields the next line
823 from that file (newline included, so it's never false until EOF, at
824 which time an undefined value is returned).
825 Ordinarily you must assign that value to a variable,
826 but there is one situation where an automatic assignment happens.
If (and only if) the input symbol is the only thing inside the conditional of a
\fIwhile\fR loop, the value is
automatically assigned to the variable \*(L"$_\*(R".
(This may seem like an odd thing to you, but you'll use the construct
in almost every \fIperl\fR script you write.)
835 Anyway, the following lines are equivalent to each other:
839 while ($_ = <STDIN>) { print; }
840 while (<STDIN>) { print; }
841 for (\|;\|<STDIN>;\|) { print; }
842 print while $_ = <STDIN>;
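The filehandles \fISTDIN\fR, \fISTDOUT\fR and \fISTDERR\fR are predefined.
(The filehandles \fIstdin\fR, \fIstdout\fR and \fIstderr\fR
will also work except in packages, where they would be interpreted as
local identifiers rather than global.)
Additional filehandles may be created with the
\fIopen\fR function.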
863 If a <FILEHANDLE> is used in a context that is looking for an array, an array
864 consisting of all the input lines is returned, one line per array element.
865 It's easy to make a LARGE data space this way, so use with care.
867 The null filehandle <> is special and can be used to emulate the behavior of
868 \fIsed\fR and \fIawk\fR.
Input from <> comes either from standard input, or from each file listed on
the command line.
Here's how it works: the first time <> is evaluated, the ARGV array is checked,
and if it is null, $ARGV[0] is set to \'-\', which when opened gives you standard
input.
874 The ARGV array is then processed as a list of filenames.
The loop

	while (<>) {
		.\|.\|.			# code for each line
	}

is equivalent to

	unshift(@ARGV, \'\-\') \|if \|$#ARGV < $[;
	while ($ARGV = shift) {
		open(ARGV, $ARGV);
		while (<ARGV>) {
			.\|.\|.		# code for each line
		}
	}

895 except that it isn't as cumbersome to say.
It really does shift array ARGV and put the current filename into
variable ARGV.
898 It also uses filehandle ARGV internally.
899 You can modify @ARGV before the first <> as long as you leave the first
900 filename at the beginning of the array.
901 Line numbers ($.) continue as if the input was one big happy file.
902 (But see example under eof for how to reset line numbers on each file.)
905 If you want to set @ARGV to your own list of files, go right ahead.
906 If you want to pass switches into your script, you can
907 put a loop on the front like this:

	while ($_ = $ARGV[0], /\|^\-/\|) {
		shift;
		last if /\|^\-\|\-$\|/\|;
		/\|^\-D\|(.*\|)/ \|&& \|($debug = $1);
		/\|^\-v\|/ \|&& \|$verbose++;
		.\|.\|.		# other switches
	}
	while (<>) {
		.\|.\|.		# code for each line
	}

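
The switch loop can be exercised on its own by filling in @ARGV by hand.
This is a sketch; the argument list is a stand-in for a real command line,
and shift is given @ARGV explicitly so the code behaves the same inside a subroutine.

```perl
@ARGV = ('-v', '-Dtrace', '--', 'input.txt');   # invented command line
$verbose = 0;

while ($_ = $ARGV[0], /^-/) {
    shift(@ARGV);
    last if /^--$/;                 # a bare -- ends the switches
    /^-D(.*)/ && ($debug = $1);
    /^-v/     && $verbose++;
}
# Whatever remains in @ARGV is the list of filenames for <>.
```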
923 The <> symbol will return FALSE only once.
924 If you call it again after this it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will input from
STDIN.
If the string inside the angle brackets is a reference to a scalar variable
(e.g. <$foo>),
930 then that variable contains the name of the filehandle to input from.
932 If the string inside angle brackets is not a filehandle, it is interpreted
933 as a filename pattern to be globbed, and either an array of filenames or the
934 next filename in the list is returned, depending on context.
935 One level of $ interpretation is done first, but you can't say <$foo>
because that's an indirect filehandle as explained in the previous
paragraph.
938 You could insert curly brackets to force interpretation as a
939 filename glob: <${foo}>.
Example:

	while (<*.c>) {
		chmod 0644, $_;
	}

is equivalent to

	open(foo, "echo *.c | tr \-s \' \et\er\ef\' \'\e\e012\e\e012\e\e012\e\e012\'|");
	while (<foo>) {
		chop;
		chmod 0644, $_;
	}

In fact, it's currently implemented that way.
(Which means it will not work on filenames with spaces in them unless
you have /bin/csh on your machine.)
Of course, the shortest way to do the above is:

	chmod 0644, <*.c>;

A \fIperl\fR script consists of a sequence of declarations and commands.
The only things that need to be declared in
\fIperl\fR are report formats and subroutines.
975 See the sections below for more information on those declarations.
976 All uninitialized user-created objects are assumed to
977 start with a null or 0 value until they
978 are defined by some explicit operation such as assignment.
The sequence of commands is executed just once, unlike in
\fIsed\fR and \fIawk\fR
scripts, where the sequence of commands is executed for each input line.
984 While this means that you must explicitly loop over the lines of your input file
(or files), it also means you have much more control over which files and which
lines you look at.
(Actually, I'm lying\*(--it is possible to do an implicit loop with either the
\fB\-n\fR or \fB\-p\fR switch.)
993 A declaration can be put anywhere a command can, but has no effect on the
execution of the primary sequence of commands\*(--declarations all take effect
at compile time.
996 Typically all the declarations are put at the beginning or the end of the script.
\fIPerl\fR is, for the most part, a free-form language.
1000 (The only exception to this is format declarations, for fairly obvious reasons.)
1001 Comments are indicated by the # character, and extend to the end of the line.
1002 If you attempt to use /* */ C comments, it will be interpreted either as
1003 division or pattern matching, depending on the context.
1005 .Sh "Compound statements"
In \fIperl\fR, a sequence of commands may be treated as one command by enclosing it
in curly brackets.
We will call this a BLOCK.
1012 The following compound commands may be used to control flow:
if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
1018 if (EXPR) BLOCK elsif (EXPR) BLOCK .\|.\|. else BLOCK
1019 LABEL while (EXPR) BLOCK
1020 LABEL while (EXPR) BLOCK continue BLOCK
1021 LABEL for (EXPR; EXPR; EXPR) BLOCK
1022 LABEL foreach VAR (ARRAY) BLOCK
1023 LABEL BLOCK continue BLOCK
Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not
statements.
1028 This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed.
1029 If you want to write conditionals without curly brackets there are several
1030 other ways to do it.
1031 The following all do the same thing:
1035 if (!open(foo)) { die "Can't open $foo: $!"; }
1036 die "Can't open $foo: $!" unless open(foo);
1037 open(foo) || die "Can't open $foo: $!"; # foo or bust!
1038 open(foo) ? \'hi mom\' : die "Can't open $foo: $!";
1039 # a bit exotic, that last one
The \fIif\fR statement is straightforward.
Since BLOCKs are always bounded by curly brackets, there is never any
ambiguity about which \fIif\fR an \fIelse\fR goes with.
If you use \fIunless\fR in place of \fIif\fR,
the sense of the test is reversed.
The \fIwhile\fR statement executes the block as long as the expression is true
(does not evaluate to the null string or 0).
The LABEL is optional, and if present, consists of an identifier followed by
a colon.
The LABEL identifies the loop for the loop control statements
\fInext\fR, \fIlast\fR, and \fIredo\fR (see below).
If there is a \fIcontinue\fR BLOCK, it is always executed just before
the conditional is about to be evaluated again, similarly to the third part
of a \fIfor\fR loop in C.
Thus it can be used to increment a loop variable, even when the loop has
been continued via the
\fInext\fR statement (similar to the C \*(L"continue\*(R" statement).
If the word \fIwhile\fR is replaced by the word \fIuntil\fR,
the sense of the test is reversed, but the conditional is still tested before
the first iteration.
In either the \fIif\fR or the \fIwhile\fR
statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional
is true if the value of the last command in that block is true.
1094 is true if the value of the last command in that block is true.
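The \fIfor\fR loop works exactly like the corresponding \fIwhile\fR loop:

	for ($i = 1; $i < 10; $i++) {
		.\|.\|.
	}

is the same as

	$i = 1;
	while ($i < 10) {
		.\|.\|.
	} continue {
		$i++;
	}
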
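
The for and while correspondence can be checked with a small sketch.
It also shows that a continue BLOCK still runs when an iteration is cut
short by next. The array names are invented.

```perl
@via_for = ();
for ($i = 1; $i < 10; $i++) {
    next if $i % 2 == 0;        # skip even numbers
    push(@via_for, $i);
}

@via_while = ();
$i = 1;
while ($i < 10) {
    next if $i % 2 == 0;        # next still runs the continue block
    push(@via_while, $i);
}
continue {
    $i++;
}
```

If the increment were inside the while BLOCK instead, the next would skip it
and the loop would never end.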
1118 The foreach loop iterates over a normal array value and sets the variable
1119 VAR to be each element of the array in turn.
1120 The variable is implicitly local to the loop, and regains its former value
1121 upon exiting the loop.
1122 The \*(L"foreach\*(R" keyword is actually identical to the \*(L"for\*(R" keyword,
1123 so you can use \*(L"foreach\*(R" for readability or \*(L"for\*(R" for brevity.
1124 If VAR is omitted, $_ is set to each value.
1125 If ARRAY is an actual array (as opposed to an expression returning an array
1126 value), you can modify each element of the array
1127 by modifying VAR inside the loop.

	for (@ary) { s/foo/bar/; }

	foreach $elem (@elements) {
		$elem *= 2;
	}

	for ((10,9,8,7,6,5,4,3,2,1,\'BOOM\')) {
		print $_, "\en"; sleep(1);
	}

	for (1..15) { print "Merry Christmas\en"; }

	foreach $item (split(/:[\e\e\en:]*/, $ENV{\'TERMCAP\'})) {
		print "Item: $item\en";
	}

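
Two of the foreach properties can be observed directly: the loop variable
is restored afterwards, and changing it changes the array.
This is a sketch; the value 'outer' and the names $after and @copy are invented.

```perl
@elements = (1, 2, 3);
$elem = 'outer';                    # value held before the loop

foreach $elem (@elements) {
    $elem *= 2;                     # modifies the array element itself
}
$after = $elem;                     # the former value is back

@copy = ();
for (@elements) {                   # VAR omitted, so $_ is set to each value
    push(@copy, $_);
}
```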
The BLOCK by itself (labeled or not) is equivalent to a loop that executes
once.
Thus you can use any of the loop control statements in it to leave or
restart the block.
This construct is particularly nice for doing case structures.
1164 if (/^abc/) { $abc = 1; last foo; }
1165 if (/^def/) { $def = 1; last foo; }
1166 if (/^xyz/) { $xyz = 1; last foo; }
1171 There is no official switch statement in perl, because there
1172 are already several ways to write the equivalent.
1173 In addition to the above, you could write
1178 $abc = 1, last foo if /^abc/;
1179 $def = 1, last foo if /^def/;
1180 $xyz = 1, last foo if /^xyz/;
1188 /^abc/ && do { $abc = 1; last foo; };
1189 /^def/ && do { $def = 1; last foo; };
1190 /^xyz/ && do { $xyz = 1; last foo; };
1198 /^abc/ && ($abc = 1, last foo);
1199 /^def/ && ($def = 1, last foo);
1200 /^xyz/ && ($xyz = 1, last foo);
1217 As it happens, these are all optimized internally to a switch structure,
1218 so perl jumps directly to the desired statement, and you needn't worry
1219 about perl executing a lot of unnecessary statements when you have a string
1220 of 50 elsifs, as long as you are testing the same simple scalar variable
1221 using ==, eq, or pattern matching as above.
1222 (If you're curious as to whether the optimizer has done this for a particular
1223 case statement, you can use the \-D1024 switch to list the syntax tree
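For reference, here is one way a complete case structure built from a labeled BLOCK might look when wrapped in a subroutine.
This is only a sketch: the subroutine name classify, the label SWITCH and the variable $kind are made up for the example.

```perl
sub classify {
    local($_) = @_;             # patterns below match against $_
    local($kind) = 'other';     # default when no case matches
    SWITCH: {
        /^abc/ && do { $kind = 'abc'; last SWITCH; };
        /^def/ && do { $kind = 'def'; last SWITCH; };
        /^xyz/ && do { $kind = 'xyz'; last SWITCH; };
    }
    $kind;                      # value of the last expression is returned
}

print &classify('abcdef'), "\n";
print &classify('nothing'), "\n";
```

Because the bare BLOCK behaves like a loop that executes once, "last SWITCH" leaves the whole case structure as soon as one alternative has matched.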
1225 .Sh "Simple statements"
1226 The only kind of simple statement is an expression evaluated for its side
1228 Every expression (simple statement) must be terminated with a semicolon.
1229 Note that this is like C, but unlike Pascal (and
1232 Any simple statement may optionally be followed by a
1233 single modifier, just before the terminating semicolon.
1234 The possible modifiers are:
1248 modifiers have the expected semantics.
1253 modifiers also have the expected semantics (conditional evaluated first),
1254 except when applied to a do-BLOCK command,
1255 in which case the block executes once before the conditional is evaluated.
1256 This is so that you can write loops like:
1263 } until $_ \|eq \|".\|\e\|n";
1268 operator below. Note also that the loop control commands described later will
1269 NOT work in this construct, since modifiers don't take loop labels.
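The difference between a modifier on an ordinary statement and a modifier on a do-BLOCK can be sketched as follows.
All variable names here are hypothetical.

```perl
$verbose = 1;
@log = ();
push(@log, 'starting') if $verbose;        # runs: the conditional is true
push(@log, 'quiet')    unless $verbose;    # skipped

# Modifier on a do-BLOCK: the block executes once before the test.
$n = 10;
$passes = 0;
do {
    $passes++;
    $n++;
} while $n < 5;             # already false, but the body has run once

# Modifier on an ordinary statement: the conditional is evaluated first.
$m = 10;
$m++ while $m < 5;          # never executes

print "passes=$passes n=$n m=$m\n";
```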
1274 expressions work almost exactly like C expressions, only the differences
1275 will be mentioned here.
1281 The exponentiation operator.
1283 The exponentiation assignment operator.
1285 The null list, used to initialize an array to null.
1287 Concatenation of two strings.
1289 The concatenation assignment operator.
1291 String equality (== is numeric equality).
1292 For a mnemonic just think of \*(L"eq\*(R" as a string.
1293 (If you are used to the
1295 behavior of using == for either string or numeric equality
1296 based on the current form of the comparands, beware!
1297 You must be explicit here.)
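A short sketch of why the choice of operator matters; the variables are invented for the example.
The same pair of values compares equal numerically but not as strings, and the ordering of '10' and '9' depends on which family of operators is used.

```perl
$x = '1.0';
$y = '1';
$same_number = ($x == $y) ? 'yes' : 'no';    # compared as numbers
$same_string = ($x eq $y) ? 'yes' : 'no';    # compared as strings

$num_order = (10 < 9)      ? 'yes' : 'no';   # numeric comparison
$str_order = ('10' lt '9') ? 'yes' : 'no';   # string comparison: '1' sorts before '9'

print "$same_number $same_string $num_order $str_order\n";
```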
1299 String inequality (!= is numeric inequality).
1303 String greater than.
1305 String less than or equal.
1307 String greater than or equal.
1309 Certain operations search or modify the string \*(L"$_\*(R" by default.
1310 This operator makes that kind of operation work on some other string.
1311 The right argument is a search pattern, substitution, or translation.
1312 The left argument is what is supposed to be searched, substituted, or
1313 translated instead of the default \*(L"$_\*(R".
1314 The return value indicates the success of the operation.
1315 (If the right argument is an expression other than a search pattern,
1316 substitution, or translation, it is interpreted as a search pattern
1318 This is less efficient than an explicit search, since the pattern must
1319 be compiled every time the expression is evaluated.)
1320 The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else.
1322 Just like =~ except the return value is negated.
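The following sketch applies a search, a substitution and a translation to strings other than $_.
The variable names are assumptions made for the illustration.
The last case passes an ordinary expression as the right argument, so it is interpreted as a search pattern at run time.

```perl
$line = 'colour and color';

($copy = $line) =~ s/colou?r/hue/g;     # substitute in $copy; $line is untouched
($upper = $line) =~ tr/a-z/A-Z/;        # translate a copy to upper case

$found  = ($line =~ /color/)  ? 1 : 0;  # search $line instead of $_
$absent = ($line !~ /purple/) ? 1 : 0;  # negated return value

$pat = '^col';
$dynamic = ($line =~ $pat) ? 1 : 0;     # expression used as a run-time pattern

print "$copy\n$upper\n$found $absent $dynamic\n";
```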
1324 The repetition operator.
1325 Returns a string consisting of the left operand repeated the
1326 number of times specified by the right operand.
1329 print \'\-\' x 80; # print row of dashes
1330 print \'\-\' x80; # illegal, x80 is identifier
1332 print "\et" x ($tab/8), \' \' x ($tab%8); # tab over
1336 The repetition assignment operator.
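A minimal sketch of both forms, using made-up variable names.
The explicit int() and the extra parentheses are not required by the operator; they are used here only to make the intent obvious.

```perl
$rule = '-' x 20;                   # a row of 20 dashes

$indent = ' ';
$indent x= 4;                       # same as $indent = $indent x 4

$tab = 19;                          # target column
$lead = ("\t" x int($tab / 8)) . (' ' x ($tab % 8));   # 2 tabs, then 3 spaces

print $rule, "\n";
print length($indent), ' ', length($lead), "\n";
```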
1338 The range operator, which is really two different operators depending
1340 In an array context, returns an array of values counting (by ones)
1341 from the left value to the right value.
1342 This is useful for writing \*(L"for (1..10)\*(R" loops and for doing
1343 slice operations on arrays.
1345 In a scalar context, .\|. returns a boolean value.
1346 The operator is bistable, like a flip-flop.
1347 Each .\|. operator maintains its own boolean state.
1348 It is false as long as its left operand is false.
1349 Once the left operand is true, the range operator stays true
1350 until the right operand is true,
1351 AFTER which the range operator becomes false again.
1352 (It doesn't become false till the next time the range operator is evaluated.
1353 It can become false on the same evaluation it became true, but it still returns
1355 The right operand is not evaluated while the operator is in the \*(L"false\*(R" state,
1356 and the left operand is not evaluated while the operator is in the \*(L"true\*(R" state.
1357 The scalar .\|. operator is primarily intended for doing line number ranges
1359 the fashion of \fIsed\fR or \fIawk\fR.
1360 The precedence is a little lower than || and &&.
1361 The value returned is either the null string for false, or a sequence number
1362 (beginning with 1) for true.
1363 The sequence number is reset for each range encountered.
1364 The final sequence number in a range has the string \'E0\' appended to it, which
1365 doesn't affect its numeric value, but gives you something to search for if you
1366 want to exclude the endpoint.
1367 You can exclude the beginning point by waiting for the sequence number to be
1369 If either operand of scalar .\|. is static, that operand is implicitly compared
1370 to the $. variable, the current line number.
1375 As a scalar operator:
1376 if (101 .\|. 200) { print; } # print 2nd hundred lines
1378 next line if (1 .\|. /^$/); # skip header lines
1380 s/^/> / if (/^$/ .\|. eof()); # quote body
1383 As an array operator:
1384 for (101 .\|. 200) { print; } # print $_ 100 times
1386 @foo = @foo[$[ .\|. $#foo]; # an expensive no-op
1387 @foo = @foo[$#foo-4 .\|. $#foo]; # slice last 5 items
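The sequence numbers returned by the scalar form can be used to drop both endpoints of a range.
The sketch below works on an in-memory list rather than on input lines; the array and variable names are hypothetical, and the operands are patterns rather than constants so that $. is not involved.

```perl
@lines = ("intro\n", "BEGIN\n", "alpha\n", "beta\n", "END\n", "outro\n");
@seq = ();
@inside = ();

foreach $line (@lines) {
    $state = ($line =~ /^BEGIN/) .. ($line =~ /^END/);
    next unless $state;             # null string: outside the range
    push(@seq, $state);             # record the sequence number
    # Skip the first line (sequence number 1) and the last (marked with E0).
    push(@inside, $line) unless $state == 1 || $state =~ /E0$/;
}

print join(',', @seq), "\n";
print @inside;
```

The E0 suffix leaves the numeric value of the final sequence number unchanged, so it can still be compared with == if desired.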
1392 This unary operator takes one argument, either a filename or a filehandle,
1393 and tests the associated file to see if something is true about it.
1394 If the argument is omitted, tests $_, except for \-t, which tests
1396 It returns 1 for true and \'\' for false, or the undefined value if the
1398 Precedence is higher than logical and relational operators, but lower than
1399 arithmetic operators.
1400 The operator may be any of:
1402 \-r File is readable by effective uid.
1403 \-w File is writable by effective uid.
1404 \-x File is executable by effective uid.
1405 \-o File is owned by effective uid.
1406 \-R File is readable by real uid.
1407 \-W File is writable by real uid.
1408 \-X File is executable by real uid.
1409 \-O File is owned by real uid.
1411 \-z File has zero size.
1412 \-s File has non-zero size (returns size).
1413 \-f File is a plain file.
1414 \-d File is a directory.
1415 \-l File is a symbolic link.
1416 \-p File is a named pipe (FIFO).
1417 \-S File is a socket.
1418 \-b File is a block special file.
1419 \-c File is a character special file.
1420 \-u File has setuid bit set.
1421 \-g File has setgid bit set.
1422 \-k File has sticky bit set.
1423 \-t Filehandle is opened to a tty.
1424 \-T File is a text file.
1425 \-B File is a binary file (opposite of \-T).
1428 The interpretation of the file permission operators \-r, \-R, \-w, \-W, \-x and \-X
1429 is based solely on the mode of the file and the uids and gids of the user.
1430 There may be other reasons you can't actually read, write or execute the file.
1431 Also note that, for the superuser, \-r, \-R, \-w and \-W always return 1, and
1432 \-x and \-X return 1 if any execute bit is set in the mode.
1433 Scripts run by the superuser may thus need to do a stat() in order to determine
1434 the actual mode of the file, or temporarily set the uid to something else.
1442 next unless \-f $_; # ignore specials
1447 Note that \-s/a/b/ does not do a negated substitution.
1448 Saying \-exp($foo) still works as expected, however\*(--only single letters
1449 following a minus are interpreted as file tests.
1451 The \-T and \-B switches work as follows.
1452 The first block or so of the file is examined for odd characters such as
1453 strange control codes or metacharacters.
1454 If too many odd characters (>10%) are found, it's a \-B file, otherwise it's a \-T file.
1455 Also, any file containing a null character in the first block is considered a binary file.
1456 If \-T or \-B is used on a filehandle, the current stdio buffer is examined
1457 rather than the first block.
1458 Both \-T and \-B return TRUE on a null file, or a file at EOF when testing
1461 If any of the file tests (or either stat operator) are given the special
1462 filehandle consisting of a solitary underline, then the stat structure
1463 of the previous file test (or stat operator) is used, saving a system
1465 (This doesn't work with \-t, and you need to remember that lstat and \-l
1466 will leave values in the stat structure for the symbolic link, not the
1471 print "Can do.\en" if -r $a || -w _ || -x _;
1475 print "Readable\en" if -r _;
1476 print "Writable\en" if -w _;
1477 print "Executable\en" if -x _;
1478 print "Setuid\en" if -u _;
1479 print "Setgid\en" if -g _;
1480 print "Sticky\en" if -k _;
1481 print "Text\en" if -T _;
1482 print "Binary\en" if -B _;
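A self-contained sketch of reusing the stat structure through the solitary underline follows.
It assumes a Unix-like system with a writable /tmp; the scratch file name and the variable names are invented for the example.

```perl
$file = "/tmp/filetest.$$";         # hypothetical scratch file
open(OUT, ">$file") || die "Can't create $file: $!\n";
print OUT "hello\n";
close(OUT);

$plain = (-f $file) ? 1 : 0;        # this test does the stat
$size  = -s _;                      # reuses the stat structure; returns the size
$empty = (-z _) ? 1 : 0;
$isdir = (-d _) ? 1 : 0;

unlink($file);
print "plain=$plain size=$size empty=$empty dir=$isdir\n";
```

Only the first test names the file; the remaining three are answered from the saved stat structure without another system call.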
1486 Here is what C has that
1490 Address-of operator.
1492 Dereference-address operator.
1494 Type casting operator.
1498 does a certain amount of expression evaluation at compile time, whenever
1499 it determines that all of the arguments to an operator are static and have
1501 In particular, string concatenation happens at compile time between literals that don't do variable substitution.
1502 Backslash interpretation also happens at compile time.
1507 \'Now is the time for all\' . "\|\e\|n" .
1508 \'good men to come to.\'
1511 and this all reduces to one string internally.
1513 The autoincrement operator has a little extra built-in magic to it.
1514 If you increment a variable that is numeric, or that has ever been used in
1515 a numeric context, you get a normal increment.
1516 If, however, the variable has only been used in string contexts since it
1517 was set, and has a value that is not null and matches the
1518 pattern /^[a\-zA\-Z]*[0\-9]*$/, the increment is done
1519 as a string, preserving each character within its range, with carry:
1522 print ++($foo = \'99\'); # prints \*(L'100\*(R'
1523 print ++($foo = \'a0\'); # prints \*(L'a1\*(R'
1524 print ++($foo = \'Az\'); # prints \*(L'Ba\*(R'
1525 print ++($foo = \'zz\'); # prints \*(L'aaa\*(R'
1528 The autodecrement is not magical.
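Two further cases of the carry, and the contrast with autodecrement, in a short sketch with invented variable names.
The decrement treats the string as a number, and a string of letters has the numeric value 0.

```perl
$id = 'aa9';
$id++;                  # digit wraps to 0 and carries into the letters

$wrap = 'Zz';
$wrap++;                # both letters wrap, so the string grows by one character

$dec = 'ab';
$dec--;                 # not magical: numeric value 0, minus 1

print "$id $wrap $dec\n";
```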
1530 The range operator (in an array context) makes use of the magical
1531 autoincrement algorithm if the minimum and maximum are strings.
1534 @alphabet = (\'A\' .. \'Z\');
1536 to get all the letters of the alphabet, or
1538 $hexdigit = (0 .. 9, \'a\' .. \'f\')[$num & 15];
1540 to get a hexadecimal digit, or
1542 @z2 = (\'01\' .. \'31\'); print @z2[$mday];
1544 to get dates with leading zeros.
1545 (If the final value specified is not in the sequence that the magical increment
1546 would produce, the sequence goes until the next value would be longer than
1547 the final value specified.)
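The following sketch collects a few such ranges so the results can be inspected.
The array names and the sample value of $num are assumptions made for the illustration.

```perl
@alphabet = ('A' .. 'Z');               # 26 letters
@days     = ('01' .. '05');             # leading zeros are preserved
@cols     = ('x' .. 'ab');              # carries past z into two letters

$num = 26;
$hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];    # low four bits of $num

print scalar(@alphabet), "\n";
print join(',', @days), "\n";
print join(',', @cols), "\n";
print "$hexdigit\n";
```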
1549 The || and && operators differ from C's in that, rather than returning 0 or 1,
1550 they return the last value evaluated.
1551 Thus, a portable way to find out the home directory might be:
1554 $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
1555 (getpwuid($<))[7] || die "You're homeless!\en";
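The same behavior can be observed without touching the environment.
In this sketch the variable names are hypothetical; each assignment receives the last value evaluated rather than 0 or 1.

```perl
$first = '' || 0 || 'default';      # first two are false, so the third is returned
$both  = 'left' && 'right';         # left is true, so the right value is returned
$short = 0 && 'never';              # left is false; the right side is not evaluated

$given = '';
$name  = $given || 'anonymous';     # common idiom for supplying a default

print "$first $both $short $name\n";
```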