2 ''' $Header: perl.man.1,v 1.0.1.2 88/01/30 17:04:07 root Exp $
4 ''' $Log: perl.man.1,v $
5 ''' Revision 1.0.1.2 88/01/30 17:04:07 root
6 ''' patch 11: random cleanup
8 ''' Revision 1.0.1.1 88/01/28 10:24:44 root
9 ''' patch8: added eval operator.
11 ''' Revision 1.0 87/12/18 16:18:16 root
33 ''' Set up \*(-- to give an unbreakable dash;
34 ''' string Tr holds user defined translation string.
35 ''' Bell System Logo is used as a dummy character.
40 .if (\n(.H=4u)&(1m=24u) .ds -- \(bs\h'-12u'\(bs\h'-12u'-\" diablo 10 pitch
41 .if (\n(.H=4u)&(1m=20u) .ds -- \(bs\h'-12u'\(bs\h'-8u'-\" diablo 12 pitch
57 perl - Practical Extraction and Report Language
59 .B perl [options] filename args
62 is an interpreted language optimized for scanning arbitrary text files,
63 extracting information from those text files, and printing reports based
65 It's also a good language for many system management tasks.
66 The language is intended to be practical (easy to use, efficient, complete)
67 rather than beautiful (tiny, elegant, minimal).
68 It combines (in the author's opinion, anyway) some of the best features of C,
69 \fIsed\fR, \fIawk\fR, and \fIsh\fR,
70 so people familiar with those languages should have little difficulty with it.
71 (Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
73 Expression syntax corresponds quite closely to C expression syntax.
74 If you have a problem that would ordinarily use \fIsed\fR
75 or \fIawk\fR or \fIsh\fR, but it
76 exceeds their capabilities or must run a little faster,
77 and you don't want to write the silly thing in C, then
80 There are also translators to turn your sed and awk scripts into perl scripts.
85 looks for your script in one of the following places:
87 Specified line by line via
89 switches on the command line.
91 Contained in the file specified by the first filename on the command line.
92 (Note that systems supporting the #! notation invoke interpreters this way.)
94 Passed in via standard input.
96 After locating your script,
98 compiles it to an internal form.
99 If the script is syntactically correct, it is executed.
101 Note: on first reading, this section may not make much sense to you. It's here
102 at the front for easy reference.
104 A single-character option may be combined with the following option, if any.
105 This is particularly useful when invoking a script using the #! construct, which
106 only allows one argument. Example:
110 #!/bin/perl -spi.bak # same as -s -p -i.bak
117 sets debugging flags.
118 To watch how it executes your script, use
120 (This only works if debugging is compiled into your
124 may be used to enter one line of script.
127 commands may be given to build up a multi-line script.
132 will not look for a script filename in the argument list.
135 specifies that files processed by the <> construct are to be edited
137 It does this by renaming the input file, opening the output file by the
138 same name, and selecting that output file as the default for print statements.
139 The extension, if supplied, is added to the name of the
140 old file to make a backup copy.
141 If no extension is supplied, no backup is made.
142 Saying \*(L"perl -p -i.bak -e "s/foo/bar/;" ... \*(R" is the same as using
150 which is equivalent to
155 if ($ARGV ne $oldargv) {
156 rename($ARGV,$ARGV . '.bak');
157 open(ARGVOUT,">$ARGV");
164 print; # this prints to original filename
169 except that the \-i form doesn't need to compare $ARGV to $oldargv to know when
170 the filename has changed.
171 It does, however, use ARGVOUT for the selected filehandle.
172 Note that stdout is restored as the default output filehandle after the loop.
175 may be used in conjunction with
177 to tell the C preprocessor where to look for include files.
178 By default /usr/include and /usr/lib/perl are searched.
183 to assume the following loop around your script, which makes it iterate
184 over filename arguments somewhat like \*(L"sed -n\*(R" or \fIawk\fR:
189 ... # your script goes here
193 Note that the lines are not printed by default.
196 to have lines printed.
201 to assume the following loop around your script, which makes it iterate
202 over filename arguments somewhat like \fIsed\fR:
207 ... # your script goes here
213 Note that the lines are printed automatically.
214 To suppress printing use the
224 causes your script to be run through the C preprocessor before
227 (Since both comments and cpp directives begin with the # character,
228 you should avoid starting comments with any words recognized
229 by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
232 enables some rudimentary switch parsing for switches on the command line
233 after the script name but before any filename arguments (or before a --).
234 Any switch found there is removed from @ARGV and sets the corresponding variable in the
237 The following script prints \*(L"true\*(R" if and only if the script is
238 invoked with a -xyz switch.
243 if ($xyz) { print "true\en"; }
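The -p and -i switches described above can be exercised end to end. The following is a minimal sketch, assuming a Unix-like system, a writable /tmp, and a current perl 5 (it relies on $^X, the path of the running interpreter, and on three-argument open, neither of which this manual describes); the scratch file name is made up:

```perl
# Sketch: what "perl -p -i.bak -e 's/foo/bar/;' file" does to a file.
# Assumes a Unix-like system with a writable /tmp; the file name is hypothetical.
$file = "/tmp/inplace_sketch.$$";          # $$ is this process's id
open(OUT, '>', $file) || die "Can't create $file: $!";
print OUT "foo one\n", "two foo foo\n";
close(OUT);

# $^X holds the path of the perl running this sketch (a perl 5 variable).
system($^X, '-p', '-i.bak', '-e', 's/foo/bar/;', $file) == 0
    || die "in-place edit failed, status $?";

open(IN, $file) || die "Can't reopen $file: $!";
@edited = <IN>;                            # the rewritten lines
close(IN);
open(IN, "$file.bak") || die "Can't open $file.bak: $!";
@backup = <IN>;                            # the untouched original
close(IN);
unlink($file, "$file.bak");

print @edited;    # s/// without /g changes only the first foo on each line
```

The backup keeps the original contents, while the edited file has "bar one" and "two bar foo".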
246 .Sh "Data Types and Objects"
248 Perl has about two and a half data types: strings, arrays of strings, and
250 Strings and arrays of strings are first class objects, for the most part,
251 in the sense that they can be used as a whole as values in an expression.
252 Associative arrays can only be accessed on an association by association basis;
253 they don't have a value as a whole (at least not yet).
255 Strings are interpreted numerically as appropriate.
256 A string is interpreted as TRUE in the boolean sense if it is not the null
258 Booleans returned by operators are 1 for true and '0' or '' (the null
261 References to string variables always begin with \*(L'$\*(R', even when referring
262 to a string that is part of an array.
267 $days \h'|2i'# a simple string variable
268 $days[28] \h'|2i'# 29th element of array @days
269 $days{'Feb'}\h'|2i'# one value from an associative array
271 but entire arrays are denoted by \*(L'@\*(R':
273 @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n])
277 Any of these four constructs may be assigned to (in compiler lingo, may serve
279 (Additionally, you may find the length of array @days by evaluating
280 \*(L"$#days\*(R", as in
282 [Actually, it's not the length of the array; it's the subscript of the last element, since there is (ordinarily) a 0th element.])
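A small sketch of the naming rules above, assuming the default base subscript of 0 and a current perl 5 (the variable contents are made up). It uses one name, days, for an array, a string variable, and an associative array at once:

```perl
# Sketch: one name, several independent namespaces.
@days = ('Sun', 'Mon', 'Tue');       # an array
$days = 'a plain string';            # a string variable with the same name
$days{'Feb'} = 28;                   # an associative array element, same name again

$second = $days[1];                  # 'Mon': element access uses $
$last   = $#days;                    # 2: the subscript of the last element
$count  = $#days + 1;                # 3: the length, since subscripts start at 0

print join(' / ', $days, $second, $last, $count, $days{'Feb'}), "\n";
```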
284 Every data type has its own namespace.
285 You can, without fear of conflict, use the same name for a string variable,
286 an array, an associative array, a filehandle, a subroutine name, and/or
288 Since variable and array references always start with \*(L'$\*(R'
289 or \*(L'@\*(R', the \*(L"reserved\*(R" words aren't in fact reserved
290 with respect to variable names.
291 (They ARE reserved with respect to labels and filehandles, however, which
292 don't have an initial special character.)
293 Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all
295 Names which start with a letter may also contain digits and underscores.
296 Names which do not start with a letter are limited to one character,
297 e.g. \*(L"$%\*(R" or \*(L"$$\*(R".
298 (Many one character names have a predefined significance to
302 String literals are delimited by either single or double quotes.
303 They work much like shell quotes:
304 double-quoted string literals are subject to backslash and variable
305 substitution; single-quoted strings are not.
306 The usual backslash rules apply for making characters such as newline, tab, etc.
307 You can also embed newlines directly in your strings, i.e. they can end on
308 a different line than they begin.
309 This is nice, but if you forget your trailing quote, the error will not be
310 reported until perl finds another line containing the quote character, which
311 may be much further on in the script.
312 Variable substitution inside strings is limited (currently) to simple string variables.
313 The following code segment prints out \*(L"The price is $100.\*(R"
317 $Price = '$100';\h'|3.5i'# not interpreted
318 print "The price is $Price.\e\|n";\h'|3.5i'# interpreted
321 Note that you can put curly brackets around the identifier to delimit it
322 from following alphanumerics.
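A minimal sketch of the quoting rules above, assuming a current perl 5; the variable names and values are illustrative only:

```perl
# Sketch of single versus double quotes (example values are made up).
$Price = '$100';                  # single quotes: the $ is kept literally
$line = "The price is $Price.";   # double quotes: $Price is substituted
$fruit = 'apple';
$plural = "two ${fruit}s";        # braces end the name before the 's'
$asis = 'two ${fruit}s';          # single quotes: no substitution at all
print $line, "\n", $plural, "\n", $asis, "\n";
```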
324 Array literals are denoted by separating individual values by commas, and
325 enclosing the list in parentheses.
326 In a context not requiring an array value, the value of the array literal
327 is the value of the final element, as in the C comma operator.
332 @foo = ('cc', '\-E', $bar);
334 assigns the entire array value to array foo, but
336 $foo = ('cc', '\-E', $bar);
339 assigns the value of variable bar to variable foo.
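The two assignments above can be checked side by side. This is a sketch assuming a current perl 5 (which accepts the scalar form but may warn about the discarded constants under -w); the file name in $bar is hypothetical:

```perl
# Sketch: the same list literal in array and in string (scalar) context.
$bar = 'hello.c';                 # hypothetical source file name
@foo = ('cc', '-E', $bar);        # array context: all three elements
$foo = ('cc', '-E', $bar);        # scalar context: the final element only
print join(' ', @foo), "\n";      # cc -E hello.c
print $foo, "\n";                 # hello.c
```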
340 Array lists may be assigned to if and only if each element of the list
344 ($a, $b, $c) = (1, 2, 3);
346 ($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);
350 Numeric literals are specified in any of the usual floating point or
353 There are several other pseudo-literals that you should know about.
354 If a string is enclosed by backticks (grave accents), it is interpreted as
355 a command, and the output of that command is the value of the pseudo-literal,
356 just like in any of the standard shells.
357 The command is executed each time the pseudo-literal is evaluated.
358 Unlike in \f2csh\f1, no interpretation is done on the
359 data\*(--newlines remain newlines.
360 The status value of the command is returned in $?.
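A minimal sketch of backtick commands, assuming a Unix shell and a current perl 5. Note one assumption beyond this manual: in perl 5, $? holds the full wait status, so the command's exit code is $? >> 8.

```perl
# Sketch of command substitution with backticks (assumes a Unix shell).
$out = `echo hello`;            # the command's output, trailing newline included
chop($out);                     # remove that newline
$none = `sh -c 'exit 3'`;       # a command with no output that exits with code 3
$code = $? >> 8;                # perl 5 stores the wait status; exit code is the high byte
print "$out $code\n";           # hello 3
```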
362 Evaluating a filehandle in angle brackets yields the next line
363 from that file (newline included, so it's never false until EOF).
364 Ordinarily you must assign that value to a variable,
365 but there is one situation in which an automatic assignment happens.
366 If (and only if) the input symbol is the only thing inside the conditional of a
369 automatically assigned to the variable \*(L"$_\*(R".
370 (This may seem like an odd thing to you, but you'll use the construct
374 Anyway, the following lines are equivalent to each other:
378 while ($_ = <stdin>) {
380 for (\|;\|<stdin>;\|) {
389 Additional filehandles may be created with the
393 The null filehandle <> is special and can be used to emulate the behavior of
394 \fIsed\fR and \fIawk\fR.
395 Input from <> comes either from standard input, or from each file listed on
397 Here's how it works: the first time <> is evaluated, the ARGV array is checked,
398 and if it is null, $ARGV[0] is set to '-', which when opened gives you standard
400 The ARGV array is then processed as a list of filenames.
406 .\|.\|. # code for each line
412 unshift(@ARGV, '\-') \|if \|$#ARGV < $[;
413 while ($ARGV = shift) {
416 .\|.\|. # code for each line
421 except that it isn't as cumbersome to say.
422 It really does shift array ARGV and put the current filename into
424 It also uses filehandle ARGV internally.
425 You can modify @ARGV before the first <> as long as you leave the first
426 filename at the beginning of the array.
427 Line numbers ($.) continue as if the input was one big happy file.
430 If you want to set @ARGV to your own list of files, go right ahead.
431 If you want to pass switches into your script, you can
432 put a loop on the front like this:
436 while ($_ = $ARGV[0], /\|^\-/\|) {
438 last if /\|^\-\|\-$\|/\|;
439 /\|^\-D\|(.*\|)/ \|&& \|($debug = $1);
440 /\|^\-v\|/ \|&& \|$verbose++;
441 .\|.\|. # other switches
444 .\|.\|. # code for each line
448 The <> symbol will return FALSE only once.
449 If you call it again after this, it will assume you are processing another
450 @ARGV list, and, if you haven't set @ARGV, will read from stdin.
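A minimal sketch of the <> behavior described above, assuming a current perl 5 and a writable /tmp (the scratch file names and the three-argument open are assumptions, not part of this manual). The script sets @ARGV itself, relies on the automatic assignment to $_, and shows that $. keeps counting across files:

```perl
# Sketch: read two scratch files through <> with @ARGV set by the script.
# Assumes a writable /tmp; the file names are hypothetical.
@files = ("/tmp/argv_sketch_a.$$", "/tmp/argv_sketch_b.$$");
open(OUT, '>', $files[0]) || die "Can't create $files[0]: $!";
print OUT "alpha\n", "beta\n";
close(OUT);
open(OUT, '>', $files[1]) || die "Can't create $files[1]: $!";
print OUT "gamma\n";
close(OUT);

@ARGV = @files;                  # our own list of files
@seen = ();
while (<>) {                     # each line lands in $_ automatically
    chop;                        # drop the newline
    $which = ($ARGV eq $files[0]) ? 'a' : 'b';   # $ARGV names the current file
    push(@seen, join(':', $which, $., $_));      # $. keeps counting across files
}
unlink(@files);
print join(' ', @seen), "\n";    # a:1:alpha a:2:beta b:3:gamma
```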
455 script consists of a sequence of declarations and commands.
456 The only things that need to be declared in
458 are report formats and subroutines.
459 See the sections below for more information on those declarations.
460 All objects are assumed to start with a null or 0 value.
461 The sequence of commands is executed just once, unlike in
465 scripts, where the sequence of commands is executed for each input line.
466 While this means that you must explicitly loop over the lines of your input file
467 (or files), it also means you have much more control over which files and which
469 (Actually, I'm lying\*(--it is possible to do an implicit loop with either the
475 A declaration can be put anywhere a command can, but has no effect on the
476 execution of the primary sequence of commands.
477 Typically all the declarations are put at the beginning or the end of the script.
480 is, for the most part, a free-form language.
481 (The only exception to this is format declarations, for fairly obvious reasons.)
482 Comments are indicated by the # character, and extend to the end of the line.
483 If you attempt to use /* */ C comments, they will be interpreted either as
484 division or pattern matching, depending on the context.
486 .Sh "Compound statements"
489 a sequence of commands may be treated as one command by enclosing it
491 We will call this a BLOCK.
493 The following compound commands may be used to control flow:
498 if (EXPR) BLOCK else BLOCK
499 if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
500 LABEL while (EXPR) BLOCK
501 LABEL while (EXPR) BLOCK continue BLOCK
502 LABEL for (EXPR; EXPR; EXPR) BLOCK
503 LABEL BLOCK continue BLOCK
506 Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not
508 This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed.
509 If you want to write conditionals without curly brackets, there are several
511 The following all do the same thing:
515 if (!open(foo)) { die "Can't open $foo"; }
516 die "Can't open $foo" unless open(foo);
517 open(foo) || die "Can't open $foo"; # foo or bust!
518 open(foo) ? die "Can't open $foo" : 'hi mom';
519 # a bit exotic, that last one
525 statement is straightforward.
526 Since BLOCKs are always bounded by curly brackets, there is never any
527 ambiguity about which
536 the sense of the test is reversed.
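A short sketch of if/elsif/else and unless, assuming a current perl 5. The subroutine name sign_word is made up, and it is called with &, which later perls accept; this release spells the call do sign_word(...).

```perl
# Sketch: if/elsif/else and unless; the curly brackets are required.
sub sign_word {
    local($n) = @_;              # local() gives the sub its own $n
    if ($n < 0) {
        $word = 'negative';
    }
    elsif ($n == 0) {
        $word = 'zero';
    }
    else {
        $word = 'positive';
    }
    $word;                       # value of the last expression is returned
}

@words = (&sign_word(-2), &sign_word(0), &sign_word(5));

$empty = '';
$note = 'unset';
unless ($empty) {                # the null string is false, so this block runs
    $note = 'null string is false';
}
print join(' ', @words), "; $note\n";
```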
540 statement executes the block as long as the expression is true
541 (does not evaluate to the null string or 0).
542 The LABEL is optional, and if present, consists of an identifier followed by
544 The LABEL identifies the loop for the loop control statements
552 BLOCK, it is always executed just before
553 the conditional is about to be evaluated again, similarly to the third part
557 Thus it can be used to increment a loop variable, even when the loop has
558 been continued via the
560 statement (similar to the C \*(L"continue\*(R" statement).
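As a sketch of why the continue BLOCK is useful, the loop below sums the odd numbers below 10; it assumes a current perl 5, and the numbers are arbitrary. Even when next skips the rest of the body, the continue block still increments $i:

```perl
# Sketch: a while loop whose continue block always advances the counter.
$i = 0;
$sum = 0;
while ($i < 10) {
    next if $i % 2 == 0;         # skip even numbers; continue still runs
    $sum += $i;
}
continue {
    $i++;                        # runs before the conditional is tested again
}
print "$sum\n";                  # 1 + 3 + 5 + 7 + 9 = 25
```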
564 is replaced by the word
566 the sense of the test is reversed, but the conditional is still tested before
573 statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional
574 is true if the value of the last command in that block is true.
578 loop works exactly like the corresponding
584 for ($i = 1; $i < 10; $i++) {
598 The BLOCK by itself (labeled or not) is equivalent to a loop that executes
600 Thus you can use any of the loop control statements in it to leave or
605 This construct is particularly nice for doing case structures.
610 if (/abc/) { $abc = 1; last foo; }
611 if (/def/) { $def = 1; last foo; }
612 if (/xyz/) { $xyz = 1; last foo; }
617 .Sh "Simple statements"
618 The only kind of simple statement is an expression evaluated for its side
620 Every expression (simple statement) must be terminated with a semicolon.
621 Note that this is like C, but unlike Pascal (and
624 Any simple statement may optionally be followed by a
625 single modifier, just before the terminating semicolon.
626 The possible modifiers are:
640 modifiers have the expected semantics.
645 modifiers also have the expected semantics (conditional evaluated first),
646 except when applied to a do-BLOCK command,
647 in which case the block executes once before the conditional is evaluated.
648 This is so that you can write loops like:
655 } until $_ \|eq \|".\|\e\|n";
660 operator below. Note also that the loop control commands described later will
661 NOT work in this construct, since modifiers don't take loop labels.
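A minimal sketch contrasting the modifiers, assuming a current perl 5; the counters are illustrative. A plain statement with while tests first, but a do-BLOCK runs once before its condition is checked:

```perl
# Sketch: statement modifiers test first, except on a do-BLOCK.
$count = 0;
$count++ while $count < 3;       # condition checked before each increment

$plain = 0;
$plain++ while 0;                # condition false at once: never runs

$block = 0;
do {
    $block++;                    # a do-BLOCK runs once before the test
} while 0;

$msg = 'none';
$msg = 'small' if $count < 5;    # if and unless work as expected
$msg = 'big' unless $count < 5;
print "$count $plain $block $msg\n";   # 3 0 1 small
```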
666 expressions work almost exactly like C expressions, only the differences
667 will be mentioned here.
673 The null list, used to initialize an array to null.
675 Concatenation of two strings.
677 The corresponding assignment operator.
679 String equality (== is numeric equality).
680 For a mnemonic just think of \*(L"eq\*(R" as a string.
681 (If you are used to the
683 behavior of using == for either string or numeric equality
684 based on the current form of the comparands, beware!
685 You must be explicit here.)
687 String inequality (!= is numeric inequality).
693 String less than or equal.
695 String greater than or equal.
697 Certain operations search or modify the string \*(L"$_\*(R" by default.
698 This operator makes that kind of operation work on some other string.
699 The right argument is a search pattern, substitution, or translation.
700 The left argument is what is supposed to be searched, substituted, or
701 translated instead of the default \*(L"$_\*(R".
702 The return value indicates the success of the operation.
703 (If the right argument is an expression other than a search pattern,
704 substitution, or translation, it is interpreted as a search pattern
706 This is less efficient than an explicit search, since the pattern must
707 be compiled every time the expression is evaluated.)
708 The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else.
710 Just like =~ except the return value is negated.
712 The repetition operator.
713 Returns a string consisting of the left operand repeated the
714 number of times specified by the right operand.
717 print '-' x 80; # print row of dashes
718 print '-' x80; # illegal, x80 is identifier
720 print "\et" x ($tab/8), ' ' x ($tab%8); # tab over
724 The corresponding assignment operator.
726 The range operator, which is bistable.
727 It is false as long as its left argument is false.
728 Once the left argument is true, it stays true until the right argument is true,
729 AFTER which it becomes false again.
730 (It doesn't become false till the next time it's evaluated.
731 It can become false on the same evaluation it became true, but it still returns
733 The .. operator is primarily intended for doing line number ranges after
734 the fashion of \fIsed\fR or \fIawk\fR.
735 The precedence is a little lower than || and &&.
736 The value returned is either the null string for false, or a sequence number
737 (beginning with 1) for true.
738 The sequence number is reset for each range encountered.
739 The final sequence number in a range has the string 'E0' appended to it, which
740 doesn't affect its numeric value, but gives you something to search for if you
741 want to exclude the endpoint.
742 You can exclude the beginning point by waiting for the sequence number to be
744 If either argument to .. is static, that argument is implicitly compared to
745 the $. variable, the current line number.
750 if (101 .. 200) { print; } # print 2nd hundred lines
752 next line if (1 .. /^$/); # skip header lines
754 s/^/> / if (/^$/ .. eof()); # quote body
758 Here is what C has that
764 Dereference-address operator.
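The operators that differ from C can be tried together in a small sketch, assuming a current perl 5; all values, including the lines fed to the range operator, are made up. The range operator here uses patterns rather than line numbers, so it does not depend on $.:

```perl
# Sketch of the operators that differ from C (values are made up).
$x = '10';
$y = '10.0';
$numeric_same = ($x == $y) ? 1 : 0;     # 1: equal as numbers
$string_same  = ($x eq $y) ? 1 : 0;     # 0: different strings

$s = 'pre' . 'fix';                     # concatenation
$s .= 'es';                             # 'prefixes'
$rule = '-' x 10;                       # ten dashes

$hit    = ('perl man page' =~ /man/) ? 1 : 0;   # 1: pattern found
$no_awk = ('perl' !~ /awk/) ? 1 : 0;            # 1: !~ negates the match

# Range operator: true from a line matching BEGIN through one matching END.
@lines = ('head', 'BEGIN', 'one', 'two', 'END', 'tail');
@inside = ();
@seq = ();
foreach (@lines) {
    if ($n = (/^BEGIN/ .. /^END/)) {
        push(@inside, $_);
        push(@seq, $n);                 # 1, 2, 3, then '4E0' for the endpoint
    }
}
print join(' ', @inside), " | ", join(' ', @seq), "\n";
```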
768 does a certain amount of expression evaluation at compile time, whenever
769 it determines that all of the arguments to an operator are static and have
771 In particular, string concatenation happens at compile time between literals that don't do variable substitution.
772 Backslash interpretation also happens at compile time.
777 'Now is the time for all' . "\|\e\|n" .
778 'good men to come to.'
781 and this all reduces to one string internally.
783 Along with the literals and variables mentioned earlier,
784 the following operations can serve as terms in an expression:
786 Searches a string for a pattern, and returns true (1) or false ('').
787 If no string is specified via the =~ or !~ operator,
788 the $_ string is searched.
789 (The string specified with =~ need not be an lvalue\*(--it may be the result of an expression evaluation, but remember the =~ binds rather tightly.)
790 See also the section on regular expressions.
792 If you prepend an `m' you can use any pair of characters as delimiters.
793 This is particularly useful for matching Unix path names that contain `/'.
799 open(tty, '/dev/tty');
800 <tty> \|=~ \|/\|^[Yy]\|/ \|&& \|do foo(\|); # do foo if desired
802 if (/Version: \|*\|([0-9.]*\|)\|/\|) { $version = $1; }
804 next if m#^/usr/spool/uucp#;
808 This is just like the /pattern/ search, except that it matches only once between
812 This is a useful optimization when you only want to see the first occurrence of
813 something in each of a set of files, for instance.
815 Changes the working directory to EXPR, if possible.
816 Returns 1 upon success, 0 otherwise.
817 See example under die().
819 Changes the permissions of a list of files.
820 The first element of the list must be the numerical mode.
821 LIST may be an array, in which case you may wish to use the unshift()
822 command to put the mode on the front of the array.
823 Returns the number of files successfully changed.
824 Note: in order to use the value you must put the whole thing in parentheses.
827 $cnt = (chmod 0755,'foo','bar');
830 .Ip "chop(VARIABLE)" 8 5
832 Chops off the last character of a string and returns it.
833 It's used primarily to remove the newline from the end of an input record,
834 but is much more efficient than s/\en// because it neither scans nor copies
836 If VARIABLE is omitted, chops $_.
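A quick sketch of chop with and without an argument, assuming a current perl 5 (the strings are arbitrary). Note that chop removes the last character whatever it is, not only a newline:

```perl
# Sketch of chop on a named variable and on $_ (example strings are made up).
$line = "hello\n";
$gone = chop($line);          # returns the character removed
$_ = 'abc';
chop;                         # no argument: chops $_
print "[$line] [$_] ", length($gone), "\n";   # [hello] [ab] 1
```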
842 chop; # avoid \en on last field
849 Changes the owner (and group) of a list of files.
850 LIST may be an array.
851 The first two elements of the list must be the NUMERICAL uid and gid, in that order.
852 Returns the number of files successfully changed.
853 Note: in order to use the value you must put the whole thing in parentheses.
856 $cnt = (chown $uid,$gid,'foo');
860 Here's an example of looking up non-numeric uids:
865 open(pass,'/etc/passwd') || die "Can't open passwd";
867 ($login,$pass,$uid,$gid) = split(/:/);
871 @ary = ('foo','bar','bie','doll');
872 if ($uid{$user} eq '') {
873 die "$user not in passwd file";
876 unshift(@ary,$uid{$user},$gid{$user});
881 .Ip "close(FILEHANDLE)" 8 5
882 .Ip "close FILEHANDLE" 8
883 Closes the file or pipe associated with the file handle.
884 You don't have to close FILEHANDLE if you are immediately going to
885 do another open on it, since open will close it for you.
888 However, an explicit close on an input file resets the line counter ($.), while
889 the implicit close done by
892 Also, closing a pipe will wait for the process executing on the pipe to complete,
893 in case you want to look at the output of the pipe afterwards.
898 open(output,'|sort >foo'); # pipe to sort
899 ... # print stuff to output
900 close(output); # wait for sort to finish
901 open(input,'foo'); # get sort's results
904 .Ip "crypt(PLAINTEXT,SALT)" 8 6
905 Encrypts a string exactly like the crypt() function in the C library.
906 Useful for checking the password file for lousy passwords.
907 Only the guys wearing white hats should do this.
909 Prints the value of EXPR to stderr and exits with a non-zero status.
914 die "Can't cd to spool." unless chdir '/usr/spool/news';
916 (chdir '/usr/spool/news') || die "Can't cd to spool."
919 Note that the parens are necessary above due to precedence.
923 Returns the value of the last command in the sequence of commands indicated
925 When modified by a loop modifier, executes the BLOCK once before testing the
927 (On other statements the loop modifiers test the conditional first.)
928 .Ip "do SUBROUTINE (LIST)" 8 3
929 Executes a SUBROUTINE declared by a
931 declaration, and returns the value
932 of the last expression evaluated in SUBROUTINE.
933 (See the section on subroutines later on.)
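A minimal sketch of a subroutine whose value is its last expression. It assumes a current perl 5, where the do SUBROUTINE (LIST) form described here has been withdrawn (recent releases reject it), so the call uses & instead; the name max is made up:

```perl
# Sketch: a subroutine returns the value of the last expression it evaluates.
sub max {
    local($best) = shift(@_);       # first argument
    foreach $x (@_) {               # remaining arguments
        $best = $x if $x > $best;
    }
    $best;                          # last expression: the return value
}

$biggest = &max(3, 9, 4);           # this release writes: do max(3, 9, 4)
print "$biggest\n";                 # 9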
934 .Ip "each(ASSOC_ARRAY)" 8 6
935 Returns a 2-element array consisting of the key and value for the next
936 value of an associative array, so that you can iterate over it.
937 Entries are returned in an apparently random order.
938 When the array is entirely read, a null array is returned (which when
939 assigned produces a FALSE (0) value).
940 The next call to each() after that will start iterating again.
941 The iterator can be reset only by reading all the elements from the array.
942 You should not modify the array while iterating over it.
943 The following prints out your environment like the printenv program, only
944 in a different order:
948 while (($key,$value) = each(ENV)) {
949 print "$key=$value\en";
953 See also keys() and values().
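A small sketch of iterating with each(), assuming a current perl 5, where the associative array is written with a leading % (each(%age) rather than each(age)); the data is made up. Because the order is unpredictable, the pairs are sorted before use:

```perl
# Sketch: walk an associative array with each() (hypothetical data).
$age{'alice'} = 30;
$age{'bob'} = 25;
@pairs = ();
while (($key, $value) = each(%age)) {   # perl 5 names the array %age
    push(@pairs, "$key=$value");
}
@pairs = sort(@pairs);                  # each() order is unpredictable
($again) = each(%age);                  # once exhausted, iteration starts over
print join(' ', @pairs), "\n";          # alice=30 bob=25
```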
954 .Ip "eof(FILEHANDLE)" 8 8
956 Returns 1 if the next read on FILEHANDLE will return end of file, or if
957 FILEHANDLE is not open.
958 If (FILEHANDLE) is omitted, the eof status is returned for the last file read.
959 The null filehandle may be used to indicate the pseudo file formed of the
960 files listed on the command line, i.e. eof() is reasonable to use inside
966 # insert dashes just before last line
969 print "--------------\en";
976 EXPR is parsed and executed as if it were a little perl program.
977 It is executed in the context of the current perl program, so that
978 any variable settings, subroutine or format definitions remain afterwards.
979 The value returned is the value of the last expression evaluated, just
981 If there is a syntax error or runtime error, a null string is returned by
982 eval, and $@ is set to the error message.
983 If there was no error, $@ is null.
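A minimal sketch of eval on strings, assuming a current perl 5; the expressions are arbitrary. One difference from the text above: in current perl 5 a failed eval returns the undefined value rather than a null string, though both are false:

```perl
# Sketch of eval on strings (expressions here are made up).
$result = eval '2 + 3 * 4';          # 14; $@ is now empty
$ok_error = $@;

eval '$made_here = 42;';             # settings made by eval remain afterwards

$bad = eval '1 +';                   # syntax error: eval fails
$syntax_error = $@;                  # ...and the message is in $@

$crash = eval '$zero = 0; 1 / $zero';    # runtime error, also caught
$runtime_error = $@;

print "$result $made_here\n";        # 14 42
print "syntax: $syntax_error";
print "runtime: $runtime_error";
```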
985 If there is more than one argument in LIST,
986 calls execvp() with the arguments in LIST.
987 If there is only one argument, the argument is checked for shell metacharacters.
988 If there are any, the entire argument is passed to /bin/sh -c for parsing.
989 If there are none, the argument is split into words and passed directly to
990 execvp(), which is more efficient.
991 Note: exec (and system) do not flush your output buffer, so you may need to
992 set $| to avoid lost output.
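The argument rules above can be observed without replacing the running script by using system(), which in current perl 5 follows the same rules as exec() but returns afterwards. This sketch assumes a Unix /bin/sh and that the exit code is $? >> 8, as in perl 5:

```perl
# Sketch of the one-argument versus list forms (assumes a Unix /bin/sh).
$| = 1;                                  # flush before running other programs

system('sh', '-c', 'exit 3');            # list form: passed straight to execvp()
$list_status = $? >> 8;                  # 3

system('true && exit 5');                # '&' is a metacharacter: run via /bin/sh -c
$shell_status = $? >> 8;                 # 5

system('echo', 'a;b');                   # no shell involved, so ';' is just text
print "$list_status $shell_status\n";    # 3 5
```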
994 Evaluates EXPR and exits immediately with that value.
1000 exit 0 \|if \|$ans \|=~ \|/\|^[Xx]\|/\|;
1006 Returns e to the power of EXPR.
1009 Returns the child pid to the parent process and 0 to the child process.
1010 Note: unflushed buffers remain unflushed in both processes, which means
1011 you may need to set $| to avoid duplicate output.
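A minimal sketch combining fork, exec, and wait, assuming a Unix-like system and a current perl 5 (where fork returns undef on failure and the exit code is $? >> 8). Setting $| first follows the advice above about unflushed buffers:

```perl
# Sketch: fork a child, have it exec a command, and collect its status.
$| = 1;                             # flush first so nothing is printed twice
$pid = fork;
die "Can't fork: $!" unless defined $pid;   # perl 5 returns undef on failure
if ($pid == 0) {
    exec('sh', '-c', 'exit 7');     # child: become a shell that exits with 7
    die "exec failed: $!";          # reached only if exec itself fails
}
$reaped = wait;                     # parent: wait for the child to finish
$child_status = $? >> 8;            # 7
print "child $reaped exited with $child_status\n";
```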
1012 .Ip "gmtime(EXPR)" 8 4
1013 Converts a time as returned by the time function to a 9-element array with
1014 the time analyzed for the Greenwich timezone.
1015 Typically used as follows:
1019 ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)
1023 All array elements are numeric.
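A short sketch using time 0, the start of the epoch, assuming a current perl 5. As with the C library's struct tm, the month counts from 0, the year counts from 1900, and the weekday counts from Sunday as 0:

```perl
# Sketch: break the start of the epoch (time 0) into its nine fields.
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime(0);
# $mon counts from 0 and $year from 1900, so adjust before printing a date.
printf("%04d-%02d-%02d %02d:%02d:%02d weekday %d\n",
    $year + 1900, $mon + 1, $mday, $hour, $min, $sec, $wday);
# 1970-01-01 00:00:00 weekday 4 (Thursday)
```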