2 ''' $Header: perl_man.1,v 3.0.1.9 90/10/20 02:14:24 lwall Locked $
4 ''' $Log: perl.man.1,v $
5 ''' Revision 3.0.1.9 90/10/20 02:14:24 lwall
6 ''' patch37: fixed various typos in man page
8 ''' Revision 3.0.1.8 90/10/15 18:16:19 lwall
9 ''' patch29: added DATA filehandle to read stuff after __END__
10 ''' patch29: added cmp and <=>
11 ''' patch29: added -M, -A and -C
13 ''' Revision 3.0.1.7 90/08/09 04:24:03 lwall
14 ''' patch19: added -x switch to extract script from input trash
15 ''' patch19: Added -c switch to do compilation only
16 ''' patch19: bare identifiers are now strings if no other interpretation possible
17 ''' patch19: -s now returns size of file
18 ''' patch19: Added __LINE__ and __FILE__ tokens
19 ''' patch19: Added __END__ token
21 ''' Revision 3.0.1.6 90/08/03 11:14:44 lwall
22 ''' patch19: Intermediate diffs for Randal
24 ''' Revision 3.0.1.5 90/03/27 16:14:37 lwall
25 ''' patch16: .. now works using magical string increment
27 ''' Revision 3.0.1.4 90/03/12 16:44:33 lwall
28 ''' patch13: (LIST,) now legal
29 ''' patch13: improved LIST documentation
30 ''' patch13: example of if-elsif switch was wrong
32 ''' Revision 3.0.1.3 90/02/28 17:54:32 lwall
33 ''' patch9: @array in scalar context now returns length of array
34 ''' patch9: in manual, example of open and ?: was backwards
36 ''' Revision 3.0.1.2 89/11/17 15:30:03 lwall
37 ''' patch5: fixed some manual typos and indent problems
39 ''' Revision 3.0.1.1 89/11/11 04:41:22 lwall
40 ''' patch2: explained about sh and ${1+"$@"}
41 ''' patch2: documented that space must separate word and '' string
43 ''' Revision 3.0 89/10/18 15:21:29 lwall
60 .ie \\n(.$>=3 .ne \\$3
65 ''' Set up \*(-- to give an unbreakable dash;
66 ''' string Tr holds user defined translation string.
67 ''' Bell System Logo is used as a dummy character.
72 .if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
73 .if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
90 perl \- Practical Extraction and Report Language
93 [options] filename args
96 \fIPerl\fR is an interpreted language optimized for scanning arbitrary text files,
97 extracting information from those text files, and printing reports based on that information.
99 It's also a good language for many system management tasks.
100 The language is intended to be practical (easy to use, efficient, complete)
101 rather than beautiful (tiny, elegant, minimal).
102 It combines (in the author's opinion, anyway) some of the best features of C,
103 \fIsed\fR, \fIawk\fR, and \fIsh\fR,
104 so people familiar with those languages should have little difficulty with it.
105 (Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
107 Expression syntax corresponds quite closely to C expression syntax.
108 Unlike most Unix utilities,
110 \fIperl\fR does not arbitrarily limit the size of your data\*(--if you've got the memory,
113 \fIperl\fR can slurp in your whole file as a single string.
114 Recursion is of unlimited depth.
115 And the hash tables used by associative arrays grow as necessary to prevent
116 degraded performance.
118 \fIPerl\fR uses sophisticated pattern matching techniques to scan large amounts of data.
120 Although optimized for scanning text,
122 \fIperl\fR can also deal with binary data, and can make dbm files look like associative
123 arrays (where dbm is available).
126 scripts are safer than C programs
127 through a dataflow tracing mechanism which prevents many stupid security holes.
128 If you have a problem that would ordinarily use \fIsed\fR
129 or \fIawk\fR or \fIsh\fR, but it
130 exceeds their capabilities or must run a little faster,
131 and you don't want to write the silly thing in C, then
134 There are also translators to turn your
145 \fIPerl\fR looks for your script in one of the following places:
147 Specified line by line via
149 \fB\-e\fR switches on the command line.
151 Contained in the file specified by the first filename on the command line.
152 (Note that systems supporting the #! notation invoke interpreters this way.)
154 Passed in implicitly via standard input.
155 This only works if there are no filename arguments\*(--to pass arguments to a
158 script read from standard input you must explicitly specify a \- for the script name.
160 After locating your script,
162 \fIperl\fR compiles it to an internal form.
163 If the script is syntactically correct, it is executed.
165 Note: on first reading this section may not make much sense to you. It's here
166 at the front for easy reference.
168 A single-character option may be combined with the following option, if any.
169 This is particularly useful when invoking a script using the #! construct which
170 only allows one argument. Example:
174 #!/usr/bin/perl \-spi.bak # same as \-s \-p \-i.bak
181 \fB\-a\fR turns on autosplit mode when used with a \fB\-n\fR or \fB\-p\fR.
185 An implicit split command to the @F array
186 is done as the first thing inside the implicit while loop produced by the \fB\-n\fR or \fB\-p\fR.
193 perl \-ane \'print pop(@F), "\en";\'
199 print pop(@F), "\en";
207 \fB\-c\fR causes \fIperl\fR to check the syntax of the script and then exit without executing it.
210 runs the script under the perl debugger.
211 See the section on Debugging.
214 sets debugging flags.
215 To watch how it executes your script, use
217 (This only works if debugging is compiled into your
219 Another nice value is \-D1024, which lists your compiled syntax tree.
220 And \-D512 displays compiled regular expressions.
222 .BI \-e " commandline"
223 may be used to enter one line of script.
226 commands may be given to build up a multi-line script.
231 will not look for a script filename in the argument list.
234 specifies that files processed by the <> construct are to be edited
236 It does this by renaming the input file, opening the output file by the
237 same name, and selecting that output file as the default for print statements.
238 The extension, if supplied, is added to the name of the
239 old file to make a backup copy.
240 If no extension is supplied, no backup is made.
241 Saying \*(L"perl \-p \-i.bak \-e "s/foo/bar/;" .\|.\|. \*(R" is the same as using
246 #!/usr/bin/perl \-pi.bak
249 which is equivalent to
254 if ($ARGV ne $oldargv) {
255 rename($ARGV, $ARGV . \'.bak\');
256 open(ARGVOUT, ">$ARGV");
263 print; # this prints to original filename
270 form doesn't need to compare $ARGV to $oldargv to know when
271 the filename has changed.
272 It does, however, use ARGVOUT for the selected filehandle.
275 is restored as the default output filehandle after the loop.
277 You can use eof to locate the end of each input file, in case you want
278 to append to each file, or reset line numbering (see example under eof).
281 may be used in conjunction with
283 to tell the C preprocessor where to look for include files.
284 By default /usr/include and /usr/lib/perl are searched.
289 to assume the following loop around your script, which makes it iterate
290 over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR:
295 .\|.\|. # your script goes here
299 Note that the lines are not printed by default.
302 See \fB\-p\fR to have lines printed.
303 Here is an efficient way to delete all files older than a week:
306 find . \-mtime +7 \-print | perl \-ne \'chop;unlink;\'
309 This is faster than using the \-exec switch of find because you don't have to
310 start a process on every filename found.
315 to assume the following loop around your script, which makes it iterate
316 over filename arguments somewhat like \fIsed\fR:
321 .\|.\|. # your script goes here
327 Note that the lines are printed automatically.
328 To suppress printing use the \fB\-n\fR switch.
338 causes your script to be run through the C preprocessor before
341 (Since both comments and cpp directives begin with the # character,
342 you should avoid starting comments with any words recognized
343 by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
346 enables some rudimentary switch parsing for switches on the command line
347 after the script name but before any filename arguments (or before a \-\|\-).
348 Any switch found there is removed from @ARGV and sets the corresponding variable in the
351 The following script prints \*(L"true\*(R" if and only if the script is
352 invoked with a \-xyz switch.
357 if ($xyz) { print "true\en"; }
364 use the PATH environment variable to search for the script
365 (unless the name of the script starts with a slash).
366 Typically this is used to emulate #! startup on machines that don't
367 support #!, in the following manner:
371 eval "exec /usr/bin/perl \-S $0 $*"
372 if $running_under_some_shell;
375 The system ignores the first line and feeds the script to /bin/sh,
376 which proceeds to try to execute the
378 script as a shell script.
379 The shell executes the second line as a normal shell command, and thus
383 On some systems $0 doesn't always contain the full pathname,
388 to search for the script if necessary.
391 locates the script, it parses the lines and ignores them because
392 the variable $running_under_some_shell is never true.
393 A better construct than $* would be ${1+"$@"}, which handles embedded spaces
394 and such in the filenames, but doesn't work if the script is being interpreted
396 In order to start up sh rather than csh, some systems may have to replace the
397 #! line with a line containing just
398 a colon, which will be politely ignored by perl.
399 Other systems can't control that, and need a totally devious construct that
400 will work under any of csh, sh or perl, such as the following:
404 eval '(exit $?0)' && eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
405 & eval 'exec /usr/bin/perl -S $0 $argv:q'
413 to dump core after compiling your script.
414 You can then take this core dump and turn it into an executable file
415 by using the undump program (not supplied).
416 This speeds startup at the expense of some disk space (which you can
417 minimize by stripping the executable).
418 (Still, a "hello world" executable comes out to about 200K on my machine.)
419 If you are going to run your executable as a set-id program then you
420 should probably compile it using taintperl rather than normal perl.
421 If you want to execute a portion of your script before dumping, use the
422 dump operator instead.
423 Note: undump is platform specific and may not be available
424 for a specific port of perl.
429 to do unsafe operations.
430 Currently the only \*(L"unsafe\*(R" operation is the unlinking of directories while
431 running as superuser.
434 prints the version and patchlevel of your
439 prints warnings about identifiers that are mentioned only once, and scalar
440 variables that are used before being set.
441 Also warns about redefined subroutines, and references to undefined
442 filehandles or filehandles opened readonly that you are attempting to
444 Also warns you if you use == on values that don't look like numbers, and if
445 your subroutines recurse more than 100 deep.
450 that the script is embedded in a message.
451 Leading garbage will be discarded until the first line that starts
452 with #! and contains the string "perl".
453 Any meaningful switches on that line will be applied (but only one
454 group of switches, as with normal #! processing).
455 If a directory name is specified, Perl will switch to that directory
456 before running the script.
459 switch only controls the disposal of leading garbage.
460 The script must be terminated with __END__ if there is trailing garbage
461 to be ignored (the script can process any or all of the trailing garbage
462 via the DATA filehandle if desired).
463 .Sh "Data Types and Objects"
466 has three data types: scalars, arrays of scalars, and
467 associative arrays of scalars.
468 Normal arrays are indexed by number, and associative arrays by string.
470 The interpretation of operations and values in perl sometimes
471 depends on the requirements
472 of the context around the operation or value.
473 There are three major contexts: string, numeric and array.
474 Certain operations return array values
475 in contexts wanting an array, and scalar values otherwise.
476 (If this is true of an operation it will be mentioned in the documentation
478 Operations which return scalars don't care whether the context is looking
479 for a string or a number, but
480 scalar variables and values are interpreted as strings or numbers
481 as appropriate to the context.
482 A scalar is interpreted as TRUE in the boolean sense if it is not the null
484 Booleans returned by operators are 1 for true and 0 or \'\' (the null
487 There are actually two varieties of null string: defined and undefined.
488 Undefined null strings are returned when there is no real value for something,
489 such as when there was an error, or at end of file, or when you refer
490 to an uninitialized variable or element of an array.
491 An undefined null string may become defined the first time you access it, but
492 prior to that you can use the defined() operator to determine whether the
493 value is defined or not.
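As a minimal sketch of the two varieties of null string, the following fragment uses defined() to tell them apart. The variable names are invented for illustration, and $never is deliberately left unassigned.

```perl
# $never is never assigned, so its value is the undefined null string.
# $empty holds a null string that is nevertheless defined.
$empty = '';

$never_state = defined($never) ? 'defined' : 'undefined';
$empty_state = defined($empty) ? 'defined' : 'undefined';

# Both values are false in the boolean sense.
$both_false = (!$never && !$empty) ? 1 : 0;

print "never: $never_state\n";      # never: undefined
print "empty: $empty_state\n";      # empty: defined
print "both false: $both_false\n";  # both false: 1
```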
495 References to scalar variables always begin with \*(L'$\*(R', even when referring
496 to a scalar that is part of an array.
501 $days \h'|2i'# a simple scalar variable
502 $days[28] \h'|2i'# 29th element of array @days
503 $days{\'Feb\'}\h'|2i'# one value from an associative array
504 $#days \h'|2i'# last index of array @days
506 but entire arrays or array slices are denoted by \*(L'@\*(R':
508 @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n])
509 @days[3,4,5]\h'|2i'# same as @days[3.\|.5]
510 @days{'a','c'}\h'|2i'# same as ($days{'a'},$days{'c'})
512 and entire associative arrays are denoted by \*(L'%\*(R':
514 %days \h'|2i'# (key1, val1, key2, val2 .\|.\|.)
517 Any of these eight constructs may serve as an lvalue,
518 that is, may be assigned to.
519 (It also turns out that an assignment is itself an lvalue in
520 certain contexts\*(--see examples under s, tr and chop.)
521 Assignment to a scalar evaluates the righthand side in a scalar context,
522 while assignment to an array or array slice evaluates the righthand side
525 You may find the length of array @days by evaluating
526 \*(L"$#days\*(R", as in \fIcsh\fR.
528 (Actually, it's not the length of the array, it's the subscript of the last element, since there is (ordinarily) a 0th element.)
529 Assigning to $#days changes the length of the array.
530 Shortening an array by this method does not actually destroy any values.
531 Lengthening an array that was previously shortened recovers the values that
532 were in those elements.
533 You can also gain some measure of efficiency by preextending an array that
535 (You can also extend an array by assigning to an element that is off the end of the array.
537 This differs from assigning to $#whatever in that intervening values
538 are set to null rather than recovered.)
539 You can truncate an array down to nothing by assigning the null list () to it.
541 The following are exactly equivalent:
545 $#whatever = $[ \- 1;
549 If you evaluate an array in a scalar context, it returns the length of the array.
551 The following is always true:
554 @whatever == $#whatever \- $[ + 1;
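The relationships above can be sketched as follows. This is only an illustration: it assumes $[ has been left at its default of 0, and the array contents are made up.

```perl
@days = ('Mon', 'Tue', 'Wed');

$last   = $#days;             # subscript of the last element: 2
$length = @days;              # array in scalar context: 3

$days[5] = 'Sat';             # assigning off the end extends the array
$extended = @days;            # now 6

@days = ();                   # assigning the null list truncates to nothing
$emptied = $#days;            # -1 when the first subscript is 0

print "$last $length $extended $emptied\n";   # 2 3 6 -1
```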
558 Multi-dimensional arrays are not directly supported, but see the discussion
559 of the $; variable later for a means of emulating multiple subscripts with
560 an associative array.
561 You could also write a subroutine to turn multiple subscripts into a single subscript.
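A small sketch of the associative-array approach, assuming the behavior described under the $; variable: a list of subscripts inside curly brackets is joined into one key string, with the value of $; as the separator. The array name %grid and its contents are hypothetical.

```perl
# Emulate a two-dimensional table with one associative array.
$grid{0,0} = 'a';
$grid{0,1} = 'b';
$grid{1,0} = 'c';

$row = 1; $col = 0;
$cell = $grid{$row,$col};

# The same element reached with an explicitly joined key.
$same = $grid{join($;, 1, 0)};

print "cell: $cell, same: $same\n";   # cell: c, same: c
```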
564 Every data type has its own namespace.
565 You can, without fear of conflict, use the same name for a scalar variable,
566 an array, an associative array, a filehandle, a subroutine name, and/or
568 Since variable and array references always start with \*(L'$\*(R', \*(L'@\*(R',
569 or \*(L'%\*(R', the \*(L"reserved\*(R" words aren't in fact reserved
570 with respect to variable names.
571 (They ARE reserved with respect to labels and filehandles, however, which
572 don't have an initial special character.
573 Hint: you could say open(LOG,\'logfile\') rather than open(log,\'logfile\').
574 Using uppercase filehandles also improves readability and protects you
575 from conflict with future reserved words.)
576 Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all different names.
578 Names which start with a letter may also contain digits and underscores.
579 Names which do not start with a letter are limited to one character,
580 e.g. \*(L"$%\*(R" or \*(L"$$\*(R".
581 (Most of the one character names have a predefined significance to \fIperl\fR.)
585 Numeric literals are specified in any of the usual floating point or
597 String literals are delimited by either single or double quotes.
598 They work much like shell quotes:
599 double-quoted string literals are subject to backslash and variable
600 substitution; single-quoted strings are not (except for \e\' and \e\e).
601 The usual backslash rules apply for making characters such as newline, tab, etc.
602 You can also embed newlines directly in your strings, i.e. they can end on
603 a different line than they begin.
604 This is nice, but if you forget your trailing quote, the error will not be
607 reported until \fIperl\fR finds another line containing the quote character, which
608 may be much further on in the script.
609 Variable substitution inside strings is limited to scalar variables, normal
610 array values, and array slices.
611 (In other words, identifiers beginning with $ or @, followed by an optional
612 bracketed expression as a subscript.)
613 The following code segment prints out \*(L"The price is $100.\*(R"
617 $Price = \'$100\';\h'|3.5i'# not interpreted
618 print "The price is $Price.\e\|n";\h'|3.5i'# interpreted
621 Note that you can put curly brackets around the identifier to delimit it
622 from following alphanumerics.
623 Also note that a single quoted string must be separated from a preceding
624 word by a space, since single quote is a valid character in an identifier
627 Two special literals are __LINE__ and __FILE__, which represent the current
628 line number and filename at that point in your program.
629 They may only be used as separate tokens; they will not be interpolated into strings.
631 In addition, the token __END__ may be used to indicate the logical end of the
632 script before the actual end of file.
633 Any following text is ignored (but may be read via the DATA filehandle).
634 The two control characters ^D and ^Z are synonyms for __END__.
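To illustrate the point about __LINE__ and __FILE__ being separate tokens, here is a short sketch. The variable names are invented, and the actual values depend on where the fragment is placed in a script.

```perl
$where  = __FILE__ . ', line ' . __LINE__;   # tokens standing on their own
$quoted = "__LINE__ and __FILE__";           # inside a string they are plain text

print "$where\n";
print "$quoted\n";
```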
636 A word that doesn't have any other interpretation in the grammar will be
637 treated as if it had single quotes around it.
638 For this purpose, a word consists only of alphanumeric characters and underscores,
639 and must start with an alphabetic character.
640 As with filehandles and labels, a bare word that consists entirely of
641 lowercase letters risks conflict with future reserved words, and if you
644 use the \fB\-w\fR switch, Perl will warn you about any such words.
646 Array values are interpolated into double-quoted strings by joining all the
647 elements of the array with the delimiter specified in the $" variable,
649 (Since in versions of perl prior to 3.0 the @ character was not a metacharacter
650 in double-quoted strings, the interpolation of @array, $array[EXPR],
651 @array[LIST], $array{EXPR}, or @array{LIST} only happens if array is
652 referenced elsewhere in the program or is predefined.)
653 The following are equivalent:
657 $temp = join($",@ARGV);
663 Within search patterns (which also undergo double-quotish substitution)
664 there is a bad ambiguity: Is /$foo[bar]/ to be
665 interpreted as /${foo}[bar]/ (where [bar] is a character class for the
666 regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)?
668 If @foo doesn't otherwise exist, then it's obviously a character class.
669 If @foo exists, perl takes a good guess about [bar], and is almost always right.
670 If it does guess wrong, or if you're just plain paranoid,
671 you can force the correct interpretation with curly brackets as above.
673 A line-oriented form of quoting is based on the shell here-is syntax.
674 Following a << you specify a string to terminate the quoted material, and all lines
675 following the current line down to the terminating string are the value
677 The terminating string may be either an identifier (a word), or some
679 If quoted, the type of quotes you use determines the treatment of the text,
680 just as in regular quoting.
681 An unquoted identifier works like double quotes.
682 There must be no space between the << and the identifier.
683 (If you put a space it will be treated as a null identifier, which is
684 valid, and matches the first blank line\*(--see Merry Christmas example below.)
685 The terminating string must appear by itself (unquoted and with no surrounding
686 whitespace) on the terminating line.
689 print <<EOF; # same as above
693 print <<"EOF"; # same as above
697 print << x 10; # null identifier is delimiter
700 print <<`EOC`; # execute commands
705 print <<foo, <<bar; # you can stack them
712 Array literals are denoted by separating individual values by commas, and
713 enclosing the list in parentheses:
719 In a context not requiring an array value, the value of the array literal
720 is the value of the final element, as in the C comma operator.
725 @foo = (\'cc\', \'\-E\', $bar);
727 assigns the entire array value to array foo, but
729 $foo = (\'cc\', \'\-E\', $bar);
732 assigns the value of variable bar to variable foo.
733 Note that the value of an actual array in a scalar context is the length
734 of the array; the following assigns to $foo the value 3:
738 @foo = (\'cc\', \'\-E\', $bar);
739 $foo = @foo; # $foo gets 3
742 You may have an optional comma before the closing parenthesis of an
743 array literal, so that you can say:
753 When a LIST is evaluated, each element of the list is evaluated in
754 an array context, and the resulting array value is interpolated into LIST
755 just as if each individual element were a member of LIST. Thus arrays
756 lose their identity in a LIST\*(--the list (@foo,@bar,&SomeSub)
760 contains all the elements of @foo followed by all the elements of @bar,
761 followed by all the elements returned by the subroutine named SomeSub.
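A minimal sketch of this flattening. The body of SomeSub and the array contents are invented for the example.

```perl
sub SomeSub { ('x', 'y'); }   # returns a two-element array value

@foo = (1, 2);
@bar = (3);

# The three parts lose their identity and become one flat list.
@all = (@foo, @bar, &SomeSub);

$count = @all;                # array in scalar context gives its length
print "$count elements: @all\n";   # 5 elements: 1 2 3 x y
```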
763 A list value may also be subscripted like a normal array.
767 $time = (stat($file))[8]; # stat returns array value
768 $digit = ('a','b','c','d','e','f')[$digit-10];
769 return (pop(@foo),pop(@foo))[0];
773 Array lists may be assigned to if and only if each element of the list is an lvalue:
777 ($a, $b, $c) = (1, 2, 3);
779 ($map{\'red\'}, $map{\'blue\'}, $map{\'green\'}) = (0x00f, 0x0f0, 0xf00);
781 The final element may be an array or an associative array:
783 ($a, $b, @rest) = split;
784 local($a, $b, %rest) = @_;
787 You can actually put an array anywhere in the list, but the first array
788 in the list will soak up all the values, and anything after it will get a null value.
790 This may be useful in a local().
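The soaking-up behavior can be sketched like this. All variable names are hypothetical, and the fragment uses plain global variables rather than local() to stay self-contained.

```perl
# A trailing array collects whatever the scalars did not take.
($first, $second, @rest) = ('a', 'b', 'c', 'd', 'e');
$rest_count = @rest;          # 3

# An array in the middle soaks up everything that is left,
# so the scalar after it gets no value.
($head, @middle, $tail) = (1, 2, 3);
$middle_count = @middle;      # 2
$tail_state = defined($tail) ? 'defined' : 'undefined';

print "$first $second (@rest)\n";                  # a b (c d e)
print "$head (@middle) tail is $tail_state\n";     # 1 (2 3) tail is undefined
```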
792 An associative array literal contains pairs of values to be interpreted
793 as a key and a value:
797 # same as map assignment above
798 %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);
801 Array assignment in a scalar context returns the number of elements
802 produced by the expression on the right side of the assignment:
805 $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2
809 There are several other pseudo-literals that you should know about.
810 If a string is enclosed by backticks (grave accents), it first undergoes
811 variable substitution just like a double quoted string.
812 It is then interpreted as a command, and the output of that command
813 is the value of the pseudo-literal, as in a shell.
814 In a scalar context, a single string consisting of all the output is returned.
816 In an array context, an array of values is returned, one for each line of output.
818 (You can set $/ to use a different line terminator.)
819 The command is executed each time the pseudo-literal is evaluated.
820 The status value of the command is returned in $? (see Predefined Names
821 for the interpretation of $?).
822 Unlike in \fIcsh\fR, no translation is done on the return
823 data\*(--newlines remain newlines.
824 Unlike in any of the shells, single quotes do not hide variable names
825 in the command from interpretation.
826 To pass a $ through to the shell you need to hide it with a backslash.
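The following sketch shows backticks in both contexts. It assumes a Unix-like system where the shell provides an echo command; the variable names are made up.

```perl
# Scalar context: all of the output comes back as one string.
$all = `echo one; echo two`;
$status = $?;                         # status of the command just run

# Array context: one element per line of output.
@lines = `echo one; echo two`;
$line_count = @lines;

# Shell single quotes do not hide $word from perl.
$word = 'hello';
$quoted = `echo '$word'`;

# A backslash hides the $ from perl, so the shell sees it literally.
$literal = `echo '\$word'`;

print "lines: $line_count\n";
print $quoted, $literal;
```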
828 Evaluating a filehandle in angle brackets yields the next line
829 from that file (newline included, so it's never false until EOF, at
830 which time an undefined value is returned).
831 Ordinarily you must assign that value to a variable,
832 but there is one situation where an automatic assignment happens.
833 If (and only if) the input symbol is the only thing inside the conditional of a \fIwhile\fR loop, the value is
836 automatically assigned to the variable \*(L"$_\*(R".
837 (This may seem like an odd thing to you, but you'll use the construct
841 Anyway, the following lines are equivalent to each other:
845 while ($_ = <STDIN>) { print; }
846 while (<STDIN>) { print; }
847 for (\|;\|<STDIN>;\|) { print; }
848 print while $_ = <STDIN>;
863 will also work except in packages, where they would be interpreted as
864 local identifiers rather than global.)
865 Additional filehandles may be created with the \fIopen\fR function.
869 If a <FILEHANDLE> is used in a context that is looking for an array, an array
870 consisting of all the input lines is returned, one line per array element.
871 It's easy to make a LARGE data space this way, so use with care.
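A rough sketch of the difference between the two contexts. The scratch file name under /tmp is an assumption made so the example can create its own input and clean up afterwards.

```perl
$file = "/tmp/perl_demo.$$";          # hypothetical scratch file name

open(OUT, ">$file") || die "Can't create $file: $!";
print OUT "alpha\n", "beta\n", "gamma\n";
close(OUT);

open(IN, $file) || die "Can't open $file: $!";
$first = <IN>;                        # scalar context: the next line only
@others = <IN>;                       # array context: all remaining lines
close(IN);
unlink($file);

$remaining = @others;
print "first: $first";
print "remaining: $remaining\n";
```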
873 The null filehandle <> is special and can be used to emulate the behavior of
874 \fIsed\fR and \fIawk\fR.
875 Input from <> comes either from standard input, or from each file listed on the command line.
877 Here's how it works: the first time <> is evaluated, the ARGV array is checked,
878 and if it is null, $ARGV[0] is set to \'\-\', which when opened gives you standard input.
880 The ARGV array is then processed as a list of filenames.
886 .\|.\|. # code for each line
892 unshift(@ARGV, \'\-\') \|if \|$#ARGV < $[;
893 while ($ARGV = shift) {
896 .\|.\|. # code for each line
901 except that it isn't as cumbersome to say.
902 It really does shift array ARGV and put the current filename into $ARGV.
904 It also uses filehandle ARGV internally.
905 You can modify @ARGV before the first <> as long as you leave the first
906 filename at the beginning of the array.
907 Line numbers ($.) continue as if the input was one big happy file.
908 (But see example under eof for how to reset line numbers on each file.)
911 If you want to set @ARGV to your own list of files, go right ahead.
912 If you want to pass switches into your script, you can
913 put a loop on the front like this:
917 while ($_ = $ARGV[0], /\|^\-/\|) {
919 last if /\|^\-\|\-$\|/\|;
920 /\|^\-D\|(.*\|)/ \|&& \|($debug = $1);
921 /\|^\-v\|/ \|&& \|$verbose++;
922 .\|.\|. # other switches
925 .\|.\|. # code for each line
929 The <> symbol will return FALSE only once.
930 If you call it again after this it will assume you are processing another
931 @ARGV list, and if you haven't set @ARGV, will input from standard input.
934 If the string inside the angle brackets is a reference to a scalar variable
936 then that variable contains the name of the filehandle to input from.
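As a sketch of an indirect filehandle: the handle name LOG, the variable $handle and the scratch file under /tmp are all invented for this example.

```perl
$file = "/tmp/perl_indirect.$$";      # hypothetical scratch file name

open(OUT, ">$file") || die "Can't create $file: $!";
print OUT "line one\n", "line two\n";
close(OUT);

open(LOG, $file) || die "Can't open $file: $!";
$handle = 'LOG';                      # the variable holds the handle's name
$line = <$handle>;                    # reads the next line from LOG
chop($line);                          # remove the trailing newline
close(LOG);
unlink($file);

print "read: $line\n";
```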
938 If the string inside angle brackets is not a filehandle, it is interpreted
939 as a filename pattern to be globbed, and either an array of filenames or the
940 next filename in the list is returned, depending on context.
941 One level of $ interpretation is done first, but you can't say <$foo>
942 because that's an indirect filehandle as explained in the previous paragraph.
944 You could insert curly brackets to force interpretation as a
945 filename glob: <${foo}>.
957 open(foo, "echo *.c | tr \-s \' \et\er\ef\' \'\e\e012\e\e012\e\e012\e\e012\'|");
964 In fact, it's currently implemented that way.
965 (Which means it will not work on filenames with spaces in them unless
966 you have /bin/csh on your machine.)
967 Of course, the shortest way to do the above is:
977 script consists of a sequence of declarations and commands.
978 The only things that need to be declared in
980 are report formats and subroutines.
981 See the sections below for more information on those declarations.
982 All uninitialized user-created objects are assumed to
983 start with a null or 0 value until they
984 are defined by some explicit operation such as assignment.
985 The sequence of commands is executed just once, unlike in
989 scripts, where the sequence of commands is executed for each input line.
990 While this means that you must explicitly loop over the lines of your input file
991 (or files), it also means you have much more control over which files and which
993 (Actually, I'm lying\*(--it is possible to do an implicit loop with either the \fB\-n\fR or \fB\-p\fR switch.)
999 A declaration can be put anywhere a command can, but has no effect on the
1000 execution of the primary sequence of commands\*(--declarations all take effect at compile time.
1002 Typically all the declarations are put at the beginning or the end of the script.
1005 is, for the most part, a free-form language.
1006 (The only exception to this is format declarations, for fairly obvious reasons.)
1007 Comments are indicated by the # character, and extend to the end of the line.
1008 If you attempt to use /* */ C comments, they will be interpreted either as
1009 division or pattern matching, depending on the context.
1011 .Sh "Compound statements"
1014 In \fIperl\fR, a sequence of commands may be treated as one command by enclosing it in curly brackets.
1016 We will call this a BLOCK.
1018 The following compound commands may be used to control flow:
1023 if (EXPR) BLOCK else BLOCK
1024 if (EXPR) BLOCK elsif (EXPR) BLOCK .\|.\|. else BLOCK
1025 LABEL while (EXPR) BLOCK
1026 LABEL while (EXPR) BLOCK continue BLOCK
1027 LABEL for (EXPR; EXPR; EXPR) BLOCK
1028 LABEL foreach VAR (ARRAY) BLOCK
1029 LABEL BLOCK continue BLOCK
1032 Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not statements.
1034 This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed.
1035 If you want to write conditionals without curly brackets there are several
1036 other ways to do it.
1037 The following all do the same thing:
1041 if (!open(foo)) { die "Can't open $foo: $!"; }
1042 die "Can't open $foo: $!" unless open(foo);
1043 open(foo) || die "Can't open $foo: $!"; # foo or bust!
1044 open(foo) ? \'hi mom\' : die "Can't open $foo: $!";
1045 # a bit exotic, that last one
1051 The \fIif\fR statement is straightforward.
1052 Since BLOCKs are always bounded by curly brackets, there is never any
1053 ambiguity about which
1062 the sense of the test is reversed.
1066 The \fIwhile\fR statement executes the block as long as the expression is true
1067 (does not evaluate to the null string or 0).
1068 The LABEL is optional, and if present, consists of an identifier followed by a colon.
1070 The LABEL identifies the loop for the loop control statements
1078 BLOCK, it is always executed just before
1079 the conditional is about to be evaluated again, similarly to the third part
1083 Thus it can be used to increment a loop variable, even when the loop has
1084 been continued via the
1086 statement (similar to the C \*(L"continue\*(R" statement).
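A small sketch of a continue BLOCK doing the increment. The names $i and @even are hypothetical. Without the continue BLOCK, the skipped iterations would never advance $i.

```perl
$i = 0;
while ($i < 6) {
    next if $i % 2;           # skip odd numbers
    push(@even, $i);
} continue {
    $i++;                     # runs even when the body was left via next
}

print "@even\n";              # 0 2 4
```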
1090 is replaced by the word
1092 the sense of the test is reversed, but the conditional is still tested before
1093 the first iteration.
1099 statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional
1100 is true if the value of the last command in that block is true.
1104 The \fIfor\fR loop works exactly like the corresponding \fIwhile\fR loop:
1110 for ($i = 1; $i < 10; $i++) {
1124 The foreach loop iterates over a normal array value and sets the variable
1125 VAR to be each element of the array in turn.
1126 The variable is implicitly local to the loop, and regains its former value
1127 upon exiting the loop.
1128 The \*(L"foreach\*(R" keyword is actually identical to the \*(L"for\*(R" keyword,
1129 so you can use \*(L"foreach\*(R" for readability or \*(L"for\*(R" for brevity.
1130 If VAR is omitted, $_ is set to each value.
1131 If ARRAY is an actual array (as opposed to an expression returning an array
1132 value), you can modify each element of the array
1133 by modifying VAR inside the loop.
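The two properties just described, sketched with invented names: modifying VAR changes the array itself, and VAR regains its former value once the loop is done.

```perl
$elem = 'before';
@prices = (1, 2, 3);

foreach $elem (@prices) {
    $elem *= 2;               # modifies the array element itself
}

print "@prices\n";            # 2 4 6
print "$elem\n";              # before
```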
1138 for (@ary) { s/foo/bar/; }
1140 foreach $elem (@elements) {
1145 for ((10,9,8,7,6,5,4,3,2,1,\'BOOM\')) {
1146 print $_, "\en"; sleep(1);
1149 for (1..15) { print "Merry Christmas\en"; }
1152 foreach $item (split(/:[\e\e\en:]*/, $ENV{\'TERMCAP\'})) {
1153 print "Item: $item\en";
1158 The BLOCK by itself (labeled or not) is equivalent to a loop that executes
1160 Thus you can use any of the loop control statements in it to leave or
1165 This construct is particularly nice for doing case structures.
1170 if (/^abc/) { $abc = 1; last foo; }
1171 if (/^def/) { $def = 1; last foo; }
1172 if (/^xyz/) { $xyz = 1; last foo; }
1177 There is no official switch statement in perl, because there
1178 are already several ways to write the equivalent.
1179 In addition to the above, you could write
1184 $abc = 1, last foo if /^abc/;
1185 $def = 1, last foo if /^def/;
1186 $xyz = 1, last foo if /^xyz/;
1194 /^abc/ && do { $abc = 1; last foo; };
1195 /^def/ && do { $def = 1; last foo; };
1196 /^xyz/ && do { $xyz = 1; last foo; };
1204 /^abc/ && ($abc = 1, last foo);
1205 /^def/ && ($def = 1, last foo);
1206 /^xyz/ && ($xyz = 1, last foo);
1223 As it happens, these are all optimized internally to a switch structure,
1224 so perl jumps directly to the desired statement, and you needn't worry
1225 about perl executing a lot of unnecessary statements when you have a string
1226 of 50 elsifs, as long as you are testing the same simple scalar variable
1227 using ==, eq, or pattern matching as above.
1228 (If you're curious as to whether the optimizer has done this for a particular
1229 case statement, you can use the \-D1024 switch to list the syntax tree
1231 .Sh "Simple statements"
1232 The only kind of simple statement is an expression evaluated for its side
1234 Every expression (simple statement) must be terminated with a semicolon.
1235 Note that this is like C, but unlike Pascal (and
1238 Any simple statement may optionally be followed by a
1239 single modifier, just before the terminating semicolon.
1240 The possible modifiers are:
1254 modifiers have the expected semantics.
1259 modifiers also have the expected semantics (conditional evaluated first),
1260 except when applied to a do-BLOCK command,
1261 in which case the block executes once before the conditional is evaluated.
1262 This is so that you can write loops like:
1269 } until $_ \|eq \|".\|\e\|n";
1274 operator below. Note also that the loop control commands described later will
1275 NOT work in this construct, since modifiers don't take loop labels.
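.PP
A self-contained sketch of the do-BLOCK case, assuming a newer perl and
reading from a made-up array (@input) rather than a filehandle: the block
runs once before the "until" conditional is ever tested.

```perl
use strict;
use warnings;

my @input = ("first\n", "second\n", ".\n", "never read\n");
my @seen;

do {
    $_ = shift @input;        # the block executes before the test below
    push @seen, $_;
} until $_ eq ".\n";          # stop once the terminating line has been seen
```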
1280 expressions work almost exactly like C expressions, only the differences
1281 will be mentioned here.
1287 The exponentiation operator.
1289 The exponentiation assignment operator.
1291 The null list, used to initialize an array to null.
1293 Concatenation of two strings.
1295 The concatenation assignment operator.
1297 String equality (== is numeric equality).
1298 For a mnemonic just think of \*(L"eq\*(R" as a string.
1299 (If you are used to the
1301 behavior of using == for either string or numeric equality
1302 based on the current form of the comparands, beware!
1303 You must be explicit here.)
1305 String inequality (!= is numeric inequality).
1309 String greater than.
1311 String less than or equal.
1313 String greater than or equal.
1315 String comparison, returning -1, 0, or 1.
1317 Numeric comparison, returning -1, 0, or 1.
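.sp
As a hedged illustration of the difference between the two comparison
operators, the sketch below sorts the same values both ways. It assumes the
block form of sort with $a and $b found in newer perls; the array names are
invented for the example.

```perl
use strict;
use warnings;

my @values = (10, 9, 100);

my @by_string = sort { $a cmp $b } @values;   # compares as strings
my @by_number = sort { $a <=> $b } @values;   # compares as numbers

my $str_result = ('a' cmp 'b');               # -1: 'a' sorts before 'b'
my $num_result = (2 <=> 1);                   #  1: 2 is greater than 1
```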
1319 Certain operations search or modify the string \*(L"$_\*(R" by default.
1320 This operator makes that kind of operation work on some other string.
1321 The right argument is a search pattern, substitution, or translation.
1322 The left argument is what is supposed to be searched, substituted, or
1323 translated instead of the default \*(L"$_\*(R".
1324 The return value indicates the success of the operation.
1325 (If the right argument is an expression other than a search pattern,
1326 substitution, or translation, it is interpreted as a search pattern
1328 This is less efficient than an explicit search, since the pattern must
1329 be compiled every time the expression is evaluated.)
1330 The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else.
1332 Just like =~ except the return value is negated.
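.sp
A minimal sketch of the binding operators applied to a string other than $_.
All variable names are hypothetical, and "my" assumes a newer perl.

```perl
use strict;
use warnings;

my $line = 'foo=bar';

my $has_foo = ($line =~ /^foo/) ? 1 : 0;      # search $line instead of $_

my $copy    = $line;
my $changed = ($copy =~ s/foo/baz/);          # substitute in $copy; true on success

my $pattern = 'b.r$';
my $dynamic = ($line =~ $pattern) ? 1 : 0;    # expression used as a search pattern

my $no_qux  = ($line !~ /qux/) ? 1 : 0;       # negated match
```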
1334 The repetition operator.
1335 Returns a string consisting of the left operand repeated the
1336 number of times specified by the right operand.
1339 print \'\-\' x 80; # print row of dashes
1340 print \'\-\' x80; # illegal, x80 is identifier
1342 print "\et" x ($tab/8), \' \' x ($tab%8); # tab over
1346 The repetition assignment operator.
1348 The range operator, which is really two different operators depending
1350 In an array context, returns an array of values counting (by ones)
1351 from the left value to the right value.
1352 This is useful for writing \*(L"for (1..10)\*(R" loops and for doing
1353 slice operations on arrays.
1355 In a scalar context, .\|. returns a boolean value.
1356 The operator is bistable, like a flip-flop.
1357 Each .\|. operator maintains its own boolean state.
1358 It is false as long as its left operand is false.
1359 Once the left operand is true, the range operator stays true
1360 until the right operand is true,
1361 AFTER which the range operator becomes false again.
1362 (It doesn't become false till the next time the range operator is evaluated.
1363 It can become false on the same evaluation it became true, but it still returns
1365 The right operand is not evaluated while the operator is in the \*(L"false\*(R" state,
1366 and the left operand is not evaluated while the operator is in the \*(L"true\*(R" state.
1367 The scalar .\|. operator is primarily intended for doing line number ranges
1369 the fashion of \fIsed\fR or \fIawk\fR.
1370 The precedence is a little lower than || and &&.
1371 The value returned is either the null string for false, or a sequence number
1372 (beginning with 1) for true.
1373 The sequence number is reset for each range encountered.
1374 The final sequence number in a range has the string \'E0\' appended to it, which
1375 doesn't affect its numeric value, but gives you something to search for if you
1376 want to exclude the endpoint.
1377 You can exclude the beginning point by waiting for the sequence number to be
1379 If either operand of scalar .\|. is static, that operand is implicitly compared
1380 to the $. variable, the current line number.
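.sp
The sequence number and the E0 suffix described above can be used to drop
both endpoints of a range. The following is only a sketch: it assumes a newer
perl, uses pattern operands rather than line numbers, and the BEGIN and END
markers and variable names are invented for the example.

```perl
use strict;
use warnings;

my @lines = ('a', 'BEGIN', 'x', 'y', 'END', 'b');
my @body;

for (@lines) {
    my $seq = /^BEGIN/ .. /^END/;   # scalar context: flip-flop behaviour
    next unless $seq;               # null string: outside the range
    next if $seq == 1;              # sequence number 1: the beginning point
    next if $seq =~ /E0$/;          # E0 suffix: the final line of the range
    push @body, $_;
}
```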
1385 As a scalar operator:
1386 if (101 .\|. 200) { print; } # print 2nd hundred lines
1388 next line if (1 .\|. /^$/); # skip header lines
1390 s/^/> / if (/^$/ .\|. eof()); # quote body
1393 As an array operator:
1394 for (101 .\|. 200) { print; } # print $_ 100 times
1396 @foo = @foo[$[ .\|. $#foo]; # an expensive no-op
1397 @foo = @foo[$#foo-4 .\|. $#foo]; # slice last 5 items
1402 This unary operator takes one argument, either a filename or a filehandle,
1403 and tests the associated file to see if something is true about it.
1404 If the argument is omitted, tests $_, except for \-t, which tests
1406 It returns 1 for true and \'\' for false, or the undefined value if the
1408 Precedence is higher than logical and relational operators, but lower than
1409 arithmetic operators.
1410 The operator may be any of:
1412 \-r File is readable by effective uid.
1413 \-w File is writable by effective uid.
1414 \-x File is executable by effective uid.
1415 \-o File is owned by effective uid.
1416 \-R File is readable by real uid.
1417 \-W File is writable by real uid.
1418 \-X File is executable by real uid.
1419 \-O File is owned by real uid.
1421 \-z File has zero size.
1422 \-s File has non-zero size (returns size).
1423 \-f File is a plain file.
1424 \-d File is a directory.
1425 \-l File is a symbolic link.
1426 \-p File is a named pipe (FIFO).
1427 \-S File is a socket.
1428 \-b File is a block special file.
1429 \-c File is a character special file.
1430 \-u File has setuid bit set.
1431 \-g File has setgid bit set.
1432 \-k File has sticky bit set.
1433 \-t Filehandle is opened to a tty.
1434 \-T File is a text file.
1435 \-B File is a binary file (opposite of \-T).
1436 \-M Age of file in days when script started.
1437 \-A Same for access time.
1438 \-C Same for inode change time.
1441 The interpretation of the file permission operators \-r, \-R, \-w, \-W, \-x and \-X
1442 is based solely on the mode of the file and the uids and gids of the user.
1443 There may be other reasons you can't actually read, write or execute the file.
1444 Also note that, for the superuser, \-r, \-R, \-w and \-W always return 1, and
1445 \-x and \-X return 1 if any execute bit is set in the mode.
1446 Scripts run by the superuser may thus need to do a stat() in order to determine
1447 the actual mode of the file, or temporarily set the uid to something else.
1455 next unless \-f $_; # ignore specials
1460 Note that \-s/a/b/ does not do a negated substitution.
1461 Saying \-exp($foo) still works as expected, however\*(--only single letters
1462 following a minus are interpreted as file tests.
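.sp
A small sketch of a few of the file tests, run against a scratch file. It
relies on the File::Temp module shipped with newer perls, which is an
assumption and not part of the perl described here; the variable names are
likewise made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file containing six bytes.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);
print {$fh} "hello\n";
close($fh) or die "close failed: $!";

my $is_plain = (-f $path) ? 1 : 0;   # plain file?
my $is_dir   = (-d $path) ? 1 : 0;   # directory?
my $is_empty = (-z $path) ? 1 : 0;   # zero size?
my $size     = -s $path;             # non-zero size, returned as a number
```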
1464 The \-T and \-B switches work as follows.
1465 The first block or so of the file is examined for odd characters such as
1466 strange control codes or metacharacters.
1467 If too many odd characters (>10%) are found, it's a \-B file, otherwise it's a \-T file.
1468 Also, any file containing null in the first block is considered a binary file.
1469 If \-T or \-B is used on a filehandle, the current stdio buffer is examined
1470 rather than the first block.
1471 Both \-T and \-B return TRUE on a null file, or a file at EOF when testing
1474 If any of the file tests (or either stat operator) are given the special
1475 filehandle consisting of a solitary underline, then the stat structure
1476 of the previous file test (or stat operator) is used, saving a system
1478 (This doesn't work with \-t, and you need to remember that lstat and \-l
1479 will leave values in the stat structure for the symbolic link, not the
1484 print "Can do.\en" if -r $a || -w _ || -x _;
1488 print "Readable\en" if -r _;
1489 print "Writable\en" if -w _;
1490 print "Executable\en" if -x _;
1491 print "Setuid\en" if -u _;
1492 print "Setgid\en" if -g _;
1493 print "Sticky\en" if -k _;
1494 print "Text\en" if -T _;
1495 print "Binary\en" if -B _;
1499 Here is what C has that
1503 Address-of operator.
1505 Dereference-address operator.
1507 Type casting operator.
1511 does a certain amount of expression evaluation at compile time, whenever
1512 it determines that all of the arguments to an operator are static and have
1514 In particular, string concatenation happens at compile time between literals that don't do variable substitution.
1515 Backslash interpretation also happens at compile time.
1520 \'Now is the time for all\' . "\|\e\|n" .
1521 \'good men to come to.\'
1524 and this all reduces to one string internally.
1526 The autoincrement operator has a little extra built-in magic to it.
1527 If you increment a variable that is numeric, or that has ever been used in
1528 a numeric context, you get a normal increment.
1529 If, however, the variable has only been used in string contexts since it
1530 was set, and has a value that is not null and matches the
1531 pattern /^[a\-zA\-Z]*[0\-9]*$/, the increment is done
1532 as a string, preserving each character within its range, with carry:
1535 print ++($foo = \'99\'); # prints \*(L'100\*(R'
1536 print ++($foo = \'a0\'); # prints \*(L'a1\*(R'
1537 print ++($foo = \'Az\'); # prints \*(L'Ba\*(R'
1538 print ++($foo = \'zz\'); # prints \*(L'aaa\*(R'
1541 The autodecrement is not magical.
1543 The range operator (in an array context) makes use of the magical
1544 autoincrement algorithm if the minimum and maximum are strings.
1547 @alphabet = (\'A\' .. \'Z\');
1549 to get all the letters of the alphabet, or
1551 $hexdigit = (0 .. 9, \'a\' .. \'f\')[$num & 15];
1553 to get a hexadecimal digit, or
1555 @z2 = (\'01\' .. \'31\'); print $z2[$mday];
1557 to get dates with leading zeros.
1558 (If the final value specified is not in the sequence that the magical increment
1559 would produce, the sequence goes until the next value would be longer than
1560 the final value specified.)
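.PP
The stopping rule in the parenthetical above can be illustrated with a short
sketch. It assumes a newer perl, and the array names are hypothetical. In the
last range the final value is never produced by the magical increment, so the
list ends when the next value would be longer than the final value.

```perl
use strict;
use warnings;

my @letters = ('a' .. 'e');     # a b c d e
my @padded  = ('08' .. '11');   # 08 09 10 11, leading zero preserved

# 'B' is not in the sequence starting at 'a'; after 'z' the next value
# would be 'aa', which is longer than 'B', so the list ends at 'z'.
my @capped  = ('a' .. 'B');
```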
1562 The || and && operators differ from C's in that, rather than returning 0 or 1,
1563 they return the last value evaluated.
1564 Thus, a portable way to find out the home directory might be:
1567 $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
1568 (getpwuid($<))[7] || die "You're homeless!\en";