2 ''' $Header: perl.man.1,v 1.0 87/12/18 16:18:16 root Exp $
4 ''' $Log: perl.man.1,v $
5 ''' Revision 1.0 87/12/18 16:18:16 root
27 ''' Set up \*(-- to give an unbreakable dash;
28 ''' string Tr holds user defined translation string.
29 ''' Bell System Logo is used as a dummy character.
34 .if (\n(.H=4u)&(1m=24u) .ds -- \(bs\h'-12u'\(bs\h'-12u'-\" diablo 10 pitch
35 .if (\n(.H=4u)&(1m=20u) .ds -- \(bs\h'-12u'\(bs\h'-8u'-\" diablo 12 pitch
51 perl - Practical Extraction and Report Language
53 .B perl [options] filename args
56 is an interpreted language optimized for scanning arbitrary text files,
57 extracting information from those text files, and printing reports based
59 It's also a good language for many system management tasks.
60 The language is intended to be practical (easy to use, efficient, complete)
61 rather than beautiful (tiny, elegant, minimal).
62 It combines (in the author's opinion, anyway) some of the best features of C,
63 \fIsed\fR, \fIawk\fR, and \fIsh\fR,
64 so people familiar with those languages should have little difficulty with it.
65 (Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
67 Expression syntax corresponds quite closely to C expression syntax.
68 If you have a problem that would ordinarily use \fIsed\fR
69 or \fIawk\fR or \fIsh\fR, but it
70 exceeds their capabilities or must run a little faster,
71 and you don't want to write the silly thing in C, then
74 There are also translators to turn your sed and awk scripts into perl scripts.
79 looks for your script in one of the following places:
81 Specified line by line via
83 switches on the command line.
85 Contained in the file specified by the first filename on the command line.
86 (Note that systems supporting the #! notation invoke interpreters this way.)
88 Passed in via standard input.
90 After locating your script,
92 compiles it to an internal form.
93 If the script is syntactically correct, it is executed.
95 Note: on first reading, this section won't make much sense to you. It's here
96 at the front for easy reference.
98 A single-character option may be combined with the following option, if any.
99 This is particularly useful when invoking a script using the #! construct which
100 only allows one argument. Example:
104 #!/bin/perl -spi.bak # same as -s -p -i.bak
111 sets debugging flags.
112 To watch how it executes your script, use
114 (This only works if debugging is compiled into your
118 may be used to enter one line of script.
121 commands may be given to build up a multi-line script.
126 will not look for a script filename in the argument list.
129 specifies that files processed by the <> construct are to be edited
131 It does this by renaming the input file, opening the output file by the
132 same name, and selecting that output file as the default for print statements.
133 The extension, if supplied, is added to the name of the
134 old file to make a backup copy.
135 If no extension is supplied, no backup is made.
136 Saying \*(L"perl -p -i.bak -e "s/foo/bar/;" ... \*(R" is the same as using
144 which is equivalent to
149 if ($ARGV ne $oldargv) {
150 rename($ARGV,$ARGV . '.bak');
151 open(ARGVOUT,">$ARGV");
158 print; # this prints to original filename
163 except that the \-i form doesn't need to compare $ARGV to $oldargv to know when
164 the filename has changed.
165 It does, however, use ARGVOUT for the selected filehandle.
166 Note that stdout is restored as the default output filehandle after the loop.
169 may be used in conjunction with
171 to tell the C preprocessor where to look for include files.
172 By default /usr/include and /usr/lib/perl are searched.
177 to assume the following loop around your script, which makes it iterate
178 over filename arguments somewhat like \*(L"sed -n\*(R" or \fIawk\fR:
183 ... # your script goes here
187 Note that the lines are not printed by default.
190 to have lines printed.
195 to assume the following loop around your script, which makes it iterate
196 over filename arguments somewhat like \fIsed\fR:
201 ... # your script goes here
207 Note that the lines are printed automatically.
208 To suppress printing use the
213 causes your script to be run through the C preprocessor before
216 (Since both comments and cpp directives begin with the # character,
217 you should avoid starting comments with any words recognized
218 by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
221 enables some rudimentary switch parsing for switches on the command line
222 after the script name but before any filename arguments.
223 Any switch found there will set the corresponding variable in the
226 The following script prints \*(L"true\*(R" if and only if the script is
227 invoked with a -x switch.
232 if ($x) { print "true\en"; }
235 .Sh "Data Types and Objects"
237 Perl has about two and a half data types: strings, arrays of strings, and
239 Strings and arrays of strings are first class objects, for the most part,
240 in the sense that they can be used as a whole as values in an expression.
241 Associative arrays can only be accessed on an association by association basis;
242 they don't have a value as a whole (at least not yet).
244 Strings are interpreted numerically as appropriate.
245 A string is interpreted as TRUE in the boolean sense if it is not the null
247 Booleans returned by operators are 1 for true and '0' or '' (the null
250 References to string variables always begin with \*(L'$\*(R', even when referring
251 to a string that is part of an array.
256 $days \h'|2i'# a simple string variable
257 $days[28] \h'|2i'# 29th element of array @days
258 $days{'Feb'}\h'|2i'# one value from an associative array
260 but entire arrays are denoted by \*(L'@\*(R':
262 @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n])
266 Any of these four constructs may be assigned to (in compiler lingo, may serve
268 (Additionally, you may find the length of array @days by evaluating
269 \*(L"$#days\*(R", as in
271 [Actually, it's not the length of the array, it's the subscript of the last element, since there is (ordinarily) a 0th element.])
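To make the distinction between the last subscript and the element count concrete, here is a minimal sketch. It assumes a current perl interpreter (the "use" lines and "my" declarations belong to later releases and are not described on this page), and the contents of @days are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical three-element array; subscripts run 0, 1, 2.
my @days = ('Mon', 'Tue', 'Wed');

my $last  = $#days;        # subscript of the last element
my $count = $#days + 1;    # number of elements, given a 0th element

print "last subscript: $last\n";
print "element count:  $count\n";
print "last element:   $days[$#days]\n";
```

Adding one to $#days gives the count only while the array starts at subscript 0, which is the ordinary case noted above.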
273 Every data type has its own namespace.
274 You can, without fear of conflict, use the same name for a string variable,
275 an array, an associative array, a filehandle, a subroutine name, and/or
277 Since variable and array references always start with \*(L'$\*(R'
278 or \*(L'@\*(R', the \*(L"reserved\*(R" words aren't in fact reserved
279 with respect to variable names.
280 (They ARE reserved with respect to labels and filehandles, however, which
281 don't have an initial special character.)
282 Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all
284 Names which start with a letter may also contain digits and underscores.
285 Names which do not start with a letter are limited to one character,
286 e.g. \*(L"$%\*(R" or \*(L"$$\*(R".
287 (Many one character names have a predefined significance to
291 String literals are delimited by either single or double quotes.
292 They work much like shell quotes:
293 double-quoted string literals are subject to backslash and variable
294 substitution; single-quoted strings are not.
295 The usual backslash rules apply for making characters such as newline, tab, etc.
296 You can also embed newlines directly in your strings, i.e. they can end on
297 a different line than they begin.
298 This is nice, but if you forget your trailing quote, the error will not be
299 reported until perl finds another line containing the quote character, which
300 may be much further on in the script.
301 Variable substitution inside strings is limited (currently) to simple string variables.
302 The following code segment prints out \*(L"The price is $100.\*(R"
306 $Price = '$100';\h'|3.5i'# not interpreted
307 print "The price is $Price.\e\|n";\h'|3.5i'# interpreted
311 Array literals are denoted by separating individual values by commas, and
312 enclosing the list in parentheses.
313 In a context not requiring an array value, the value of the array literal
314 is the value of the final element, as in the C comma operator.
318 @foo = ('cc', '\-E', $bar);
320 assigns the entire array value to array foo, but
322 $foo = ('cc', '\-E', $bar);
325 assigns the value of variable bar to variable foo.
326 Array lists may be assigned to if and only if each element of the list
330 ($a, $b, $c) = (1, 2, 3);
332 ($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);
336 Numeric literals are specified in any of the usual floating point or
339 There are several other pseudo-literals that you should know about.
340 If a string is enclosed by backticks (grave accents), it is interpreted as
341 a command, and the output of that command is the value of the pseudo-literal,
342 just like in any of the standard shells.
343 The command is executed each time the pseudo-literal is evaluated.
344 Unlike in \fIcsh\fR, no interpretation is done on the
345 data\*(--newlines remain newlines.
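A short sketch of the backtick pseudo-literal follows. It assumes a Unix-like system with an echo command on the path and a current perl interpreter; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Assumes a Unix-like system with an echo command on the path.
my $out = `echo hello`;

# The trailing newline written by the command is kept in the value.
my $has_newline = ($out =~ /\n$/) ? 1 : 0;
print "ends with newline: $has_newline\n";

chop($out);                 # remove that final newline
print "after chop: $out\n";
```

Because the command runs each time the pseudo-literal is evaluated, it is usually worth saving the result in a variable, as done here.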
347 Evaluating a filehandle in angle brackets yields the next line
348 from that file (newline included, so it's never false until EOF).
349 Ordinarily you must assign that value to a variable,
350 but there is one situation in which an automatic assignment happens.
351 If (and only if) the input symbol is the only thing inside the conditional of a
354 automatically assigned to the variable \*(L"$_\*(R".
355 (This may seem like an odd thing to you, but you'll use the construct
359 Anyway, the following lines are equivalent to each other:
363 while ($_ = <stdin>) {
365 for (\|;\|<stdin>;\|) {
374 Additional filehandles may be created with the
378 The null filehandle <> is special and can be used to emulate the behavior of
379 \fIsed\fR and \fIawk\fR.
380 Input from <> comes either from standard input, or from each file listed on
382 Here's how it works: the first time <> is evaluated, the ARGV array is checked,
383 and if it is null, $ARGV[0] is set to '-', which when opened gives you standard
385 The ARGV array is then processed as a list of filenames.
391 .\|.\|. # code for each line
397 unshift(@ARGV, '\-') \|if \|$#ARGV < $[;
398 while ($ARGV = shift) {
401 .\|.\|. # code for each line
406 except that it isn't as cumbersome to say.
407 It really does shift array ARGV and put the current filename into
409 It also uses filehandle ARGV internally.
410 You can modify @ARGV before the first <> as long as you leave the first
411 filename at the beginning of the array.
413 If you want to set @ARGV to your own list of files, go right ahead.
414 If you want to pass switches into your script, you can
415 put a loop on the front like this:
419 while ($_ = $ARGV[0], /\|^\-/\|) {
421 last if /\|^\-\|\-$\|/\|;
422 /\|^\-D\|(.*\|)/ \|&& \|($debug = $1);
423 /\|^\-v\|/ \|&& \|$verbose++;
424 .\|.\|. # other switches
427 .\|.\|. # code for each line
431 The <> symbol will return FALSE only once.
432 If you call it again after this it will assume you are processing another
433 @ARGV list, and if you haven't set @ARGV, will input from stdin.
438 script consists of a sequence of declarations and commands.
439 The only things that need to be declared in
441 are report formats and subroutines.
442 See the sections below for more information on those declarations.
443 All objects are assumed to start with a null or 0 value.
444 The sequence of commands is executed just once, unlike in
448 scripts, where the sequence of commands is executed for each input line.
449 While this means that you must explicitly loop over the lines of your input file
450 (or files), it also means you have much more control over which files and which
452 (Actually, I'm lying\*(--it is possible to do an implicit loop with either the
458 A declaration can be put anywhere a command can, but has no effect on the
459 execution of the primary sequence of commands.
460 Typically all the declarations are put at the beginning or the end of the script.
463 is, for the most part, a free-form language.
464 (The only exception to this is format declarations, for fairly obvious reasons.)
465 Comments are indicated by the # character, and extend to the end of the line.
466 If you attempt to use /* */ C comments, they will be interpreted either as
467 division or pattern matching, depending on the context.
469 .Sh "Compound statements"
472 a sequence of commands may be treated as one command by enclosing it
474 We will call this a BLOCK.
476 The following compound commands may be used to control flow:
481 if (EXPR) BLOCK else BLOCK
482 if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
483 LABEL while (EXPR) BLOCK
484 LABEL while (EXPR) BLOCK continue BLOCK
485 LABEL for (EXPR; EXPR; EXPR) BLOCK
486 LABEL BLOCK continue BLOCK
489 (Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not
491 This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed.
492 If you want to write conditionals without curly brackets there are several
494 The following all do the same thing:
498 if (!open(foo)) { die "Can't open $foo"; }
499 die "Can't open $foo" unless open(foo);
500 open(foo) || die "Can't open $foo"; # foo or bust!
501 open(foo) ? 'hi mom' : die "Can't open $foo";
504 though the last one is a bit exotic.)
508 statement is straightforward.
509 Since BLOCKs are always bounded by curly brackets, there is never any
510 ambiguity about which
519 the sense of the test is reversed.
523 statement executes the block as long as the expression is true
524 (does not evaluate to the null string or 0).
525 The LABEL is optional, and if present, consists of an identifier followed by
527 The LABEL identifies the loop for the loop control statements
535 BLOCK, it is always executed just before
536 the conditional is about to be evaluated again, similarly to the third part
540 Thus it can be used to increment a loop variable, even when the loop has
541 been continued via the
543 statement (similar to the C \*(L"continue\*(R" statement).
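The interaction between the continue BLOCK and the next statement can be sketched as follows. This is an illustration only, assuming a current perl interpreter; the counter $i and the array @kept are hypothetical names.

```perl
use strict;
use warnings;

my $i = 0;
my @kept;
while ($i < 6) {
    next if $i % 2;        # skip odd values; the continue BLOCK still runs
    push(@kept, $i);
}
continue {
    $i++;                   # executed before each re-test of the condition
}
print "kept: @kept\n";
```

If the increment were placed at the bottom of the main BLOCK instead, the next statement would skip it and the loop would never get past the first odd value.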
547 is replaced by the word
549 the sense of the test is reversed, but the conditional is still tested before
556 statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional
557 is true if the value of the last command in that block is true.
561 loop works exactly like the corresponding
567 for ($i = 1; $i < 10; $i++) {
581 The BLOCK by itself (labeled or not) is equivalent to a loop that executes
583 Thus you can use any of the loop control statements in it to leave or
588 This construct is particularly nice for doing case structures.
593 if (/abc/) { $abc = 1; last foo; }
594 if (/def/) { $def = 1; last foo; }
595 if (/xyz/) { $xyz = 1; last foo; }
600 .Sh "Simple statements"
601 The only kind of simple statement is an expression evaluated for its side
603 Every expression (simple statement) must be terminated with a semicolon.
604 Note that this is like C, but unlike Pascal (and
607 Any simple statement may optionally be followed by a
608 single modifier, just before the terminating semicolon.
609 The possible modifiers are:
623 modifiers have the expected semantics.
628 modifiers also have the expected semantics (conditional evaluated first),
629 except when applied to a do-BLOCK command,
630 in which case the block executes once before the conditional is evaluated.
631 This is so that you can write loops like:
638 } until $_ \|eq \|".\|\e\|n";
643 operator below. Note also that the loop control commands described later will
644 NOT work in this construct, since loop modifiers don't take loop labels.
649 expressions work almost exactly like C expressions, only the differences
650 will be mentioned here.
656 The null list, used to initialize an array to null.
658 Concatenation of two strings.
660 The corresponding assignment operator.
662 String equality (== is numeric equality).
663 For a mnemonic just think of \*(L"eq\*(R" as a string.
664 (If you are used to the
666 behavior of using == for either string or numeric equality
667 based on the current form of the comparands, beware!
668 You must be explicit here.)
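A small sketch of why the choice of operator matters, assuming a current perl interpreter. The two strings are made-up values that are equal as numbers but differ as strings.

```perl
use strict;
use warnings;

my $a_str = '1.0';
my $b_str = '1';

my $numeric_same = ($a_str == $b_str) ? 1 : 0;   # compared as numbers
my $string_same  = ($a_str eq $b_str) ? 1 : 0;   # compared as strings

print "== says: $numeric_same\n";
print "eq says: $string_same\n";
```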
670 String inequality (!= is numeric inequality).
676 String less than or equal.
678 String greater than or equal.
680 Certain operations search or modify the string \*(L"$_\*(R" by default.
681 This operator makes that kind of operation work on some other string.
682 The right argument is a search pattern, substitution, or translation.
683 The left argument is what is supposed to be searched, substituted, or
684 translated instead of the default \*(L"$_\*(R".
685 The return value indicates the success of the operation.
686 (If the right argument is an expression other than a search pattern,
687 substitution, or translation, it is interpreted as a search pattern
689 This is less efficient than an explicit search, since the pattern must
690 be compiled every time the expression is evaluated.)
691 The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else.
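The following sketch shows the binding operator directing a search and a substitution at a string other than $_. It assumes a current perl interpreter, and the variable names and contents are hypothetical.

```perl
use strict;
use warnings;

$_ = 'default text';
my $line = 'foo and more foo';

my $matched  = ($line =~ /foo/) ? 1 : 0;       # search $line, not $_
my $replaced = ($line =~ s/foo/bar/) ? 1 : 0;  # substitute in $line
my $missing  = ($line !~ /xyzzy/) ? 1 : 0;     # negated result

print "matched=$matched replaced=$replaced missing=$missing\n";
print "line is now: $line\n";
print "\$_ is still: $_\n";
```

Only the first occurrence is replaced here, and $_ is left alone because every operation was bound to $line.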
693 Just like =~ except the return value is negated.
695 The repetition operator.
696 Returns a string consisting of the left operand repeated the
697 number of times specified by the right operand.
700 print '-' x 80; # print row of dashes
701 print '-' x80; # illegal, x80 is identifier
703 print "\et" x ($tab/8), ' ' x ($tab%8); # tab over
707 The corresponding assignment operator.
709 The range operator, which is bistable.
710 It is false as long as its left argument is false.
711 Once the left argument is true, it stays true until the right argument is true,
712 AFTER which it becomes false again.
713 (It doesn't become false till the next time it's evaluated.
714 It can become false on the same evaluation it became true, but it still returns
716 The .. operator is primarily intended for doing line number ranges after
717 the fashion of \fIsed\fR or \fIawk\fR.
718 The precedence is a little lower than || and &&.
719 The value returned is either the null string for false, or a sequence number
720 (beginning with 1) for true.
721 The sequence number is reset for each range encountered.
722 The final sequence number in a range has the string 'E0' appended to it, which
723 doesn't affect its numeric value, but gives you something to search for if you
724 want to exclude the endpoint.
725 You can exclude the beginning point by waiting for the sequence number to be
727 If either argument to .. is static, that argument is implicitly compared to
728 the $. variable, the current line number.
733 if (101 .. 200) { print; } # print 2nd hundred lines
735 next line if (1 .. /^$/); # skip header lines
737 s/^/> / if (/^$/ .. eof()); # quote body
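The sequence numbers and the 'E0' suffix can be observed with a sketch like the one below. It assumes a current perl interpreter, which can read from an in-memory string through a three-argument open (a facility not described on this page); the five input lines are made up.

```perl
use strict;
use warnings;

# Five lines of made-up input, read from an in-memory file.
my $text = join('', map { "line $_\n" } 1 .. 5);
open(my $in, '<', \$text) || die "Can't open in-memory file";

my @seq;
while (<$in>) {
    my $n = (2 .. 4);      # static arguments are compared to $.
    push(@seq, $n) if $n;  # record only the true values
}
close($in);

print "sequence numbers: @seq\n";
```

To leave out the endpoint, a script could skip any value that matches /E0$/, and to leave out the starting point it could skip the value 1.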
741 Here is what C has that
747 Dereference-address operator.
751 does a certain amount of expression evaluation at compile time, whenever
752 it determines that all of the arguments to an operator are static and have
754 In particular, string concatenation happens at compile time between literals that don't do variable substitution.
755 Backslash interpretation also happens at compile time.
760 'Now is the time for all' . "\|\e\|n" .
761 'good men to come to.'
764 and this all reduces to one string internally.
766 Along with the literals and variables mentioned earlier,
767 the following operations can serve as terms in an expression:
769 Searches a string for a pattern, and returns true (1) or false ('').
770 If no string is specified via the =~ or !~ operator,
771 the $_ string is searched.
772 (The string specified with =~ need not be an lvalue\*(--it may be the result of an expression evaluation, but remember the =~ binds rather tightly.)
773 See also the section on regular expressions.
775 If you prepend an `m' you can use any pair of characters as delimiters.
776 This is particularly useful for matching Unix path names that contain `/'.
782 open(tty, '/dev/tty');
783 <tty> \|=~ \|/\|^[Yy]\|/ \|&& \|do foo(\|); # do foo if desired
785 if (/Version: \|*\|([0-9.]*\|)\|/\|) { $version = $1; }
787 next if m#^/usr/spool/uucp#;
791 This is just like the /pattern/ search, except that it matches only once between
795 This is a useful optimization when you only want to see the first occurrence of
796 something in each of a set of files, for instance.
798 Changes the working directory to EXPR, if possible.
799 Returns 1 upon success, 0 otherwise.
800 See example under die().
802 Changes the permissions of a list of files.
803 The first element of the list must be the numerical mode.
804 LIST may be an array, in which case you may wish to use the unshift()
805 command to put the mode on the front of the array.
806 Returns the number of files successfully changed.
807 Note: in order to use the value you must put the whole thing in parentheses.
810 $cnt = (chmod 0755,'foo','bar');
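The unshift() approach mentioned above might look like the following sketch. It assumes a Unix-like system and a current perl interpreter; the File::Temp module and its tempdir function come from later perl distributions and are used here only to create scratch files, and the file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two scratch files in a temporary directory (removed on exit).
my $dir   = tempdir(CLEANUP => 1);
my @files = ("$dir/foo", "$dir/bar");
for my $f (@files) {
    open(my $fh, '>', $f) || die "Can't create $f";
    close($fh);
}

# Put the numerical mode on the front of the array, then pass the array.
my @args = @files;
unshift(@args, 0755);
my $cnt = (chmod @args);

print "changed $cnt files\n";
```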
813 .Ip "chop(VARIABLE)" 8 5
815 Chops off the last character of a string and returns it.
816 It's used primarily to remove the newline from the end of an input record,
817 but is much more efficient than s/\en// because it neither scans nor copies
819 If VARIABLE is omitted, chops $_.
825 chop; # avoid \en on last field
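A self-contained sketch of chop on an input record, assuming a current perl interpreter. The record and variable names are made up; the point is that the removed character is returned and the variable is shortened in place.

```perl
use strict;
use warnings;

my $record  = "alpha:beta:gamma\n";
my $removed = chop($record);          # takes off the trailing newline

my @fields = split(/:/, $record);     # last field no longer ends in newline

print "fields: @fields\n";
print "removed a newline\n" if $removed eq "\n";
```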
832 Changes the owner (and group) of a list of files.
833 LIST may be an array.
834 The first two elements of the list must be the NUMERICAL uid and gid, in that order.
835 Returns the number of files successfully changed.
836 Note: in order to use the value you must put the whole thing in parentheses.
839 $cnt = (chown $uid,$gid,'foo');
842 Here's an example of looking up non-numeric uids:
848 open(pass,'/etc/passwd') || die "Can't open passwd";
850 ($login,$pass,$uid,$gid) = split(/:/);
854 @ary = ('foo','bar','bie','doll');
855 if ($uid{$user} eq '') {
856 die "$user not in passwd file";
859 unshift(@ary,$uid{$user},$gid{$user});
864 .Ip "close(FILEHANDLE)" 8 5
865 .Ip "close FILEHANDLE" 8
866 Closes the file or pipe associated with the file handle.
867 You don't have to close FILEHANDLE if you are immediately going to
868 do another open on it, since open will close it for you.
871 However, an explicit close on an input file resets the line counter ($.), while
872 the implicit close done by
875 Also, closing a pipe will wait for the process executing on the pipe to complete,
876 in case you want to look at the output of the pipe afterwards.
881 open(output,'|sort >foo'); # pipe to sort
882 ... # print stuff to output
883 close(output); # wait for sort to finish
884 open(input,'foo'); # get sort's results
887 .Ip "crypt(PLAINTEXT,SALT)" 8 6
888 Encrypts a string exactly like the crypt() function in the C library.
889 Useful for checking the password file for lousy passwords.
890 Only the guys wearing white hats should do this.
892 Prints the value of EXPR to stderr and exits with a non-zero status.
897 die "Can't cd to spool." unless chdir '/usr/spool/news';
899 (chdir '/usr/spool/news') || die "Can't cd to spool.";
902 Note that the parens are necessary above due to precedence.
906 Returns the value of the last command in the sequence of commands indicated
908 When modified by a loop modifier, executes the BLOCK once before testing the
910 (On other statements the loop modifiers test the conditional first.)
911 .Ip "do SUBROUTINE (LIST)" 8 3
912 Executes a SUBROUTINE declared by a
914 declaration, and returns the value
915 of the last expression evaluated in SUBROUTINE.
916 (See the section on subroutines later on.)
917 .Ip "each(ASSOC_ARRAY)" 8 6
918 Returns a 2 element array consisting of the key and value for the next
919 value of an associative array, so that you can iterate over it.
920 Entries are returned in an apparently random order.
921 When the array is entirely read, a null array is returned (which when
922 assigned produces a FALSE (0) value).
923 The next call to each() after that will start iterating again.
924 The iterator can be reset only by reading all the elements from the array.
925 The following prints out your environment like the printenv program, only
926 in a different order:
930 while (($key,$value) = each(ENV)) {
931 print "$key=$value\en";
935 See also keys() and values().
936 .Ip "eof(FILEHANDLE)" 8 8
938 Returns 1 if the next read on FILEHANDLE will return end of file, or if
939 FILEHANDLE is not open.
940 If (FILEHANDLE) is omitted, the eof status is returned for the last file read.
941 The null filehandle may be used to indicate the pseudo file formed of the
942 files listed on the command line, i.e. eof() is reasonable to use inside
948 # insert dashes just before last line
951 print "--------------\en";
958 If there is more than one argument in LIST,
959 calls execvp() with the arguments in LIST.
960 If there is only one argument, the argument is checked for shell metacharacters.
961 If there are any, the entire argument is passed to /bin/sh -c for parsing.
962 If there are none, the argument is split into words and passed directly to
963 execvp(), which is more efficient.
964 Note: exec (and system) do not flush your output buffer, so you may need to
965 set $| to avoid lost output.
967 Evaluates EXPR and exits immediately with that value.
973 exit 0 \|if \|$ans \|=~ \|/\|^[Xx]\|/\|;
979 Returns e to the power of EXPR.
982 Returns the child pid to the parent process and 0 to the child process.
983 Note: unflushed buffers remain unflushed in both processes, which means
984 you may need to set $| to avoid duplicate output.
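The two return values and the buffering note can be sketched as follows. This assumes a Unix-like system and a current perl interpreter; the defined test and the wait call are taken from later perl documentation rather than from this page, and the messages are illustrative.

```perl
use strict;
use warnings;

$| = 1;                        # flush after every print, before forking

print "before fork\n";
my $pid = fork();
die "Can't fork" unless defined $pid;

if ($pid == 0) {
    # child process: fork returned 0
    exit 0;
}

# parent process: fork returned the child's pid
my $reaped = wait();
my $status = $?;
print "child $pid finished\n";
```

Without the assignment to $|, a line still sitting in the output buffer at the time of the fork could be written once by each process.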
985 .Ip "gmtime(EXPR)" 8 4
986 Converts a time as returned by the time function to a 9-element array with
987 the time analyzed for the Greenwich timezone.
988 Typically used as follows:
992 ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)
996 All array elements are numeric.
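As an illustration, the sketch below converts time value 0, the start of the Unix epoch, which is midnight GMT on Thursday, January 1, 1970. It assumes a current perl interpreter and that the elements follow the C library's struct tm conventions (months counted from 0, years counted from 1900, weekdays counted from Sunday as 0); those conventions are stated here as an assumption, not taken from the text above.

```perl
use strict;
use warnings;

my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime(0);

# Assumed struct tm conventions: $mon is 0..11, $year is years since 1900.
printf("%04d-%02d-%02d %02d:%02d:%02d\n",
       $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
print "weekday $wday, day of year $yday, dst flag $isdst\n";
```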