perl 3.0 patch #5 (combined patch)
.rn '' }`
''' $Header: perl.man.1,v 3.0.1.1 89/11/11 04:41:22 lwall Locked $
'''
''' $Log: perl.man.1,v $
''' Revision 3.0.1.1 89/11/11 04:41:22 lwall
''' patch2: explained about sh and ${1+"$@"}
''' patch2: documented that space must separate word and '' string
'''
''' Revision 3.0 89/10/18 15:21:29 lwall
''' 3.0 baseline
'''
'''
.de Sh
.br
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp
.if t .sp .5v
.if n .sp
..
.de Ip
.br
.ie \\n.$>=3 .ne \\$3
.el .ne 3
.IP "\\$1" \\$2
..
'''
''' Set up \*(-- to give an unbreakable dash;
''' string Tr holds user defined translation string.
''' Bell System Logo is used as a dummy character.
'''
.tr \(*W-|\(bv\*(Tr
.ie n \{\
.ds -- \(*W-
.if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.ds L" ""
.ds R" ""
.ds L' '
.ds R' '
'br\}
.el\{\
.ds -- \(em\|
.tr \*(Tr
.ds L" ``
.ds R" ''
.ds L' `
.ds R' '
'br\}
.TH PERL 1 "\*(RP"
.UC
.SH NAME
perl \- Practical Extraction and Report Language
.SH SYNOPSIS
.B perl
[options] filename args
.SH DESCRIPTION
.I Perl
is an interpreted language optimized for scanning arbitrary text files,
extracting information from those text files, and printing reports based
on that information.
It's also a good language for many system management tasks.
The language is intended to be practical (easy to use, efficient, complete)
rather than beautiful (tiny, elegant, minimal).
It combines (in the author's opinion, anyway) some of the best features of C,
\fIsed\fR, \fIawk\fR, and \fIsh\fR,
so people familiar with those languages should have little difficulty with it.
(Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
even BASIC-PLUS.)
Expression syntax corresponds quite closely to C expression syntax.
Unlike most Unix utilities,
.I perl
does not arbitrarily limit the size of your data\*(--if you've got
the memory,
.I perl
can slurp in your whole file as a single string.
Recursion is of unlimited depth.
And the hash tables used by associative arrays grow as necessary to prevent
degraded performance.
.I Perl
uses sophisticated pattern matching techniques to scan large amounts of
data very quickly.
Although optimized for scanning text,
.I perl
can also deal with binary data, and can make dbm files look like associative
arrays (where dbm is available).
Setuid
.I perl
scripts are safer than C programs
through a dataflow tracing mechanism which prevents many stupid security holes.
If you have a problem that would ordinarily use \fIsed\fR
or \fIawk\fR or \fIsh\fR, but it
exceeds their capabilities or must run a little faster,
and you don't want to write the silly thing in C, then
.I perl
may be for you.
There are also translators to turn your
.I sed
and
.I awk
scripts into
.I perl
scripts.
OK, enough hype.
.PP
Upon startup,
.I perl
looks for your script in one of the following places:
.Ip 1. 4 2
Specified line by line via
.B \-e
switches on the command line.
.Ip 2. 4 2
Contained in the file specified by the first filename on the command line.
(Note that systems supporting the #! notation invoke interpreters this way.)
.Ip 3. 4 2
Passed in implicitly via standard input.
This only works if there are no filename arguments\*(--to pass
arguments to a
.I stdin
script you must explicitly specify a \- for the script name.
.PP
After locating your script,
.I perl
compiles it to an internal form.
If the script is syntactically correct, it is executed.
.Sh "Options"
Note: on first reading this section may not make much sense to you. It's here
at the front for easy reference.
.PP
A single-character option may be combined with the following option, if any.
This is particularly useful when invoking a script using the #! construct which
only allows one argument. Example:
.nf

.ne 2
    #!/usr/bin/perl \-spi.bak    # same as \-s \-p \-i.bak
    .\|.\|.

.fi
Options include:
.TP 5
.B \-a
turns on autosplit mode when used with a
.B \-n
or
.BR \-p .
An implicit split command to the @F array
is done as the first thing inside the implicit while loop produced by
the
.B \-n
or
.BR \-p .
.nf

    perl \-ane \'print pop(@F), "\en";\'

is equivalent to

    while (<>) {
        @F = split(\' \');
        print pop(@F), "\en";
    }

.fi
.TP 5
.BI \-d
runs the script under the perl debugger.
See the section on Debugging.
.TP 5
.BI \-D number
sets debugging flags.
To watch how it executes your script, use
.BR \-D14 .
(This only works if debugging is compiled into your
.IR perl .)
Another nice value is \-D1024, which lists your compiled syntax tree.
And \-D512 displays compiled regular expressions.
.TP 5
.BI \-e " commandline"
may be used to enter one line of script.
Multiple
.B \-e
commands may be given to build up a multi-line script.
If
.B \-e
is given,
.I perl
will not look for a script filename in the argument list.
.TP 5
.BI \-i extension
specifies that files processed by the <> construct are to be edited
in-place.
It does this by renaming the input file, opening the output file by the
same name, and selecting that output file as the default for print statements.
The extension, if supplied, is added to the name of the
old file to make a backup copy.
If no extension is supplied, no backup is made.
Saying \*(L"perl \-p \-i.bak \-e "s/foo/bar/;" .\|.\|. \*(R" is the same as using
the script:
.nf

.ne 2
    #!/usr/bin/perl \-pi.bak
    s/foo/bar/;

which is equivalent to

.ne 14
    #!/usr/bin/perl
    while (<>) {
        if ($ARGV ne $oldargv) {
            rename($ARGV, $ARGV . \'.bak\');
            open(ARGVOUT, ">$ARGV");
            select(ARGVOUT);
            $oldargv = $ARGV;
        }
        s/foo/bar/;
    }
    continue {
        print;    # this prints to original filename
    }
    select(STDOUT);

.fi
except that the
.B \-i
form doesn't need to compare $ARGV to $oldargv to know when
the filename has changed.
It does, however, use ARGVOUT for the selected filehandle.
Note that
.I STDOUT
is restored as the default output filehandle after the loop.
.Sp
You can use eof to locate the end of each input file, in case you want
to append to each file, or reset line numbering (see example under eof).
.TP 5
.BI \-I directory
may be used in conjunction with
.B \-P
to tell the C preprocessor where to look for include files.
By default /usr/include and /usr/lib/perl are searched.
.TP 5
.B \-n
causes
.I perl
to assume the following loop around your script, which makes it iterate
over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR:
.nf

.ne 3
    while (<>) {
        .\|.\|.    # your script goes here
    }

.fi
Note that the lines are not printed by default.
See
.B \-p
to have lines printed.
Here is an efficient way to delete all files older than a week:
.nf

    find . \-mtime +7 \-print | perl \-ne \'chop;unlink;\'

.fi
This is faster than using the \-exec switch of find because you don't have to
start a process on every filename found.
.TP 5
.B \-p
causes
.I perl
to assume the following loop around your script, which makes it iterate
over filename arguments somewhat like \fIsed\fR:
.nf

.ne 5
    while (<>) {
        .\|.\|.    # your script goes here
    } continue {
        print;
    }

.fi
Note that the lines are printed automatically.
To suppress printing use the
.B \-n
switch.
A
.B \-p
overrides a
.B \-n
switch.
.TP 5
.B \-P
causes your script to be run through the C preprocessor before
compilation by
.IR perl .
(Since both comments and cpp directives begin with the # character,
you should avoid starting comments with any words recognized
by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
.TP 5
.B \-s
enables some rudimentary switch parsing for switches on the command line
after the script name but before any filename arguments (or before a \-\|\-).
Any switch found there is removed from @ARGV and sets the corresponding variable in the
.I perl
script.
The following script prints \*(L"true\*(R" if and only if the script is
invoked with a \-xyz switch.
.nf

.ne 2
    #!/usr/bin/perl \-s
    if ($xyz) { print "true\en"; }

.fi
.TP 5
.B \-S
makes
.I perl
use the PATH environment variable to search for the script
(unless the name of the script starts with a slash).
Typically this is used to emulate #! startup on machines that don't
support #!, in the following manner:
.nf

    #!/usr/bin/perl
    eval "exec /usr/bin/perl \-S $0 $*"
        if $running_under_some_shell;

.fi
The system ignores the first line and feeds the script to /bin/sh,
which proceeds to try to execute the
.I perl
script as a shell script.
The shell executes the second line as a normal shell command, and thus
starts up the
.I perl
interpreter.
On some systems $0 doesn't always contain the full pathname,
so the
.B \-S
tells
.I perl
to search for the script if necessary.
After
.I perl
locates the script, it parses the lines and ignores them because
the variable $running_under_some_shell is never true.
A better construct than $* would be ${1+"$@"}, which handles embedded spaces
and such in the filenames, but doesn't work if the script is being interpreted
by csh.
In order to start up sh rather than csh, some systems may have to replace the
#! line with a line containing just
a colon, which will be politely ignored by perl.
.TP 5
.B \-u
causes
.I perl
to dump core after compiling your script.
You can then take this core dump and turn it into an executable file
by using the undump program (not supplied).
This speeds startup at the expense of some disk space (which you can
minimize by stripping the executable).
(Still, a "hello world" executable comes out to about 200K on my machine.)
If you are going to run your executable as a set-id program then you
should probably compile it using taintperl rather than normal perl.
If you want to execute a portion of your script before dumping, use the
dump operator instead.
.TP 5
.B \-U
allows
.I perl
to do unsafe operations.
Currently the only \*(L"unsafe\*(R" operation is the unlinking of directories while
running as superuser.
.TP 5
.B \-v
prints the version and patchlevel of your
.I perl
executable.
.TP 5
.B \-w
prints warnings about identifiers that are mentioned only once, and scalar
variables that are used before being set.
Also warns about redefined subroutines, and references to undefined
filehandles or filehandles opened readonly that you are attempting to
write on.
Also warns you if you use == on values that don't look like numbers, and if
your subroutines recurse more than 100 deep.
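.PP
As a rough sketch of what the implicit loops amount to, the following emulates
\-a combined with \-n over an in-memory list rather than <>, so it can be tried
without any input files.
The names @lines, @last and $result are invented for the illustration, and the
code has only been traced by hand, not run on this version of perl.
.nf

```perl
# Hand-written stand-in for:  perl -ane 'print pop(@F), ...'
# @lines and @last are made-up names; real -n input would come from <>.
@lines = ('alpha beta gamma', '  one two');
@last = ();
foreach $line (@lines) {
    $_ = $line;
    @F = split(' ');        # what -a does first: split $_ on whitespace into @F
    push(@last, pop(@F));   # the script body: take the last field
}
$result = join(',', @last); # gamma,two
```

.fi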
.Sh "Data Types and Objects"
.PP
.I Perl
has three data types: scalars, arrays of scalars, and
associative arrays of scalars.
Normal arrays are indexed by number, and associative arrays by string.
.PP
The interpretation of operations and values in perl sometimes
depends on the requirements
of the context around the operation or value.
There are three major contexts: string, numeric and array.
Certain operations return array values
in contexts wanting an array, and scalar values otherwise.
(If this is true of an operation it will be mentioned in the documentation
for that operation.)
Operations which return scalars don't care whether the context is looking
for a string or a number, but
scalar variables and values are interpreted as strings or numbers
as appropriate to the context.
A scalar is interpreted as TRUE in the boolean sense if it is not the null
string or 0.
Booleans returned by operators are 1 for true and \'0\' or \'\' (the null
string) for false.
.PP
There are actually two varieties of null string: defined and undefined.
Undefined null strings are returned when there is no real value for something,
such as when there was an error, or at end of file, or when you refer
to an uninitialized variable or element of an array.
An undefined null string may become defined the first time you access it, but
prior to that you can use the defined() operator to determine whether the
value is defined or not.
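.PP
The difference between a false value and an undefined one can be seen in a
small sketch.
The table %age and its keys are made up for the illustration; the point is only
that defined() and a boolean test answer different questions.
.nf

```perl
# %age is a made-up table for illustration.
%age = ('tom', 30, 'ann', 0);
$tom = defined($age{'tom'}) ? 'defined' : 'undefined';   # defined
$ann = defined($age{'ann'}) ? 'defined' : 'undefined';   # defined, though 0 is false
$bob = defined($age{'bob'}) ? 'defined' : 'undefined';   # undefined: no such key
$ann_true = $age{'ann'} ? 'true' : 'false';              # false
```

.fi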
.PP
References to scalar variables always begin with \*(L'$\*(R', even when referring
to a scalar that is part of an array.
Thus:
.nf

.ne 3
    $days \h'|2i'# a simple scalar variable
    $days[28] \h'|2i'# 29th element of array @days
    $days{\'Feb\'}\h'|2i'# one value from an associative array
    $#days \h'|2i'# last index of array @days

but entire arrays or array slices are denoted by \*(L'@\*(R':

    @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n])
    @days[3,4,5]\h'|2i'# same as @days[3.\|.5]
    @days{'a','c'}\h'|2i'# same as ($days{'a'},$days{'c'})

and entire associative arrays are denoted by \*(L'%\*(R':

    %days \h'|2i'# (key1, val1, key2, val2 .\|.\|.)
.fi
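.PP
To make the notation concrete, here is a short sketch that uses the one name
days for a scalar, an array and an associative array at once.
The data values are invented, and the expected results in the comments assume
the default array base of 0.
.nf

```perl
# The same name, days, used for a scalar, an array and an associative array.
$days = 'scalar';
@days = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun');
%days = ('Feb', 28, 'Mar', 31);

$third = $days[2];              # Wed: one element, so it takes $
$feb   = $days{'Feb'};          # 28: one value from %days
$last  = $#days;                # 6: last index of @days
@mid   = @days[3,4,5];          # (Thu, Fri, Sat): a slice, so it takes @
@both  = @days{'Feb','Mar'};    # (28, 31): a slice of %days
$mid   = join(' ', @mid);
$both  = join(' ', @both);
```

.fi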
.PP
Any of these eight constructs may serve as an lvalue,
that is, may be assigned to.
(It also turns out that an assignment is itself an lvalue in
certain contexts\*(--see examples under s, tr and chop.)
Assignment to a scalar evaluates the righthand side in a scalar context,
while assignment to an array or array slice evaluates the righthand side
in an array context.
.PP
You may find the length of array @days by evaluating
\*(L"$#days\*(R", as in
.IR csh .
(Actually, it's not the length of the array, it's the subscript of the last element, since there is (ordinarily) a 0th element.)
Assigning to $#days changes the length of the array.
Shortening an array by this method does not actually destroy any values.
Lengthening an array that was previously shortened recovers the values that
were in those elements.
You can also gain some measure of efficiency by preextending an array that
is going to get big.
(You can also extend an array by assigning to an element that is off the
end of the array.
This differs from assigning to $#whatever in that intervening values
are set to null rather than recovered.)
You can truncate an array down to nothing by assigning the null list () to
it.
The following are exactly equivalent:
.nf

    @whatever = ();
    $#whatever = $[ \- 1;

.fi
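.PP
A short sketch of the relationship between $#array and the number of elements
follows.
The array @list is invented, and the counts in the comments assume the default
base of 0.
.nf

```perl
@list = (10, 20, 30);
$last  = $#list;          # 2: subscript of the last element
$count = $#list + 1;      # 3: element count, assuming the default base of 0
$list[5] = 60;            # assigning past the end extends the array
$grown = $#list;          # 5
@list = ();               # truncate to nothing
$empty = $#list;          # -1
```

.fi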
.PP
Multi-dimensional arrays are not directly supported, but see the discussion
of the $; variable later for a means of emulating multiple subscripts with
an associative array.
.PP
Every data type has its own namespace.
You can, without fear of conflict, use the same name for a scalar variable,
an array, an associative array, a filehandle, a subroutine name, and/or
a label.
Since variable and array references always start with \*(L'$\*(R', \*(L'@\*(R',
or \*(L'%\*(R', the \*(L"reserved\*(R" words aren't in fact reserved
with respect to variable names.
(They ARE reserved with respect to labels and filehandles, however, which
don't have an initial special character.
Hint: you could say open(LOG,\'logfile\') rather than open(log,\'logfile\').
Using uppercase filehandles also improves readability and protects you
from conflict with future reserved words.)
Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all
different names.
Names which start with a letter may also contain digits and underscores.
Names which do not start with a letter are limited to one character,
e.g. \*(L"$%\*(R" or \*(L"$$\*(R".
(Most of the one character names have a predefined significance to
.IR perl .
More later.)
.PP
Numeric literals are specified in any of the usual floating point or
integer formats:
.nf

.ne 5
    12345
    12345.67
    .23E-10
    0xffff    # hex
    0377      # octal

.fi
String literals are delimited by either single or double quotes.
They work much like shell quotes:
double-quoted string literals are subject to backslash and variable
substitution; single-quoted strings are not (except for \e\' and \e\e).
The usual backslash rules apply for making characters such as newline, tab, etc.
You can also embed newlines directly in your strings, i.e. they can end on
a different line than they begin.
This is nice, but if you forget your trailing quote, the error will not be
reported until
.I perl
finds another line containing the quote character, which
may be much further on in the script.
Variable substitution inside strings is limited to scalar variables, normal
array values, and array slices.
(In other words, identifiers beginning with $ or @, followed by an optional
bracketed expression as a subscript.)
The following code segment prints out \*(L"The price is $100.\*(R"
.nf

.ne 2
    $Price = \'$100\';\h'|3.5i'# not interpreted
    print "The price is $Price.\e\|n";\h'|3.5i'# interpreted

.fi
Note that you can put curly brackets around the identifier to delimit it
from following alphanumerics.
Also note that a single quoted string must be separated from a preceding
word by a space, since single quote is a valid character in an identifier
(see Packages).
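.PP
The quoting rules can be compared side by side in a small sketch.
The variable $fruit is invented; $Price is the one from the example above.
Without the curly brackets the third string would refer to a different
variable, $fruits.
.nf

```perl
$Price = '$100';                    # single quotes: taken literally
$fruit = 'apple';
$plain  = "The price is $Price.";   # The price is $100.
$plural = "Two ${fruit}s";          # Two apples
$spaced = "Two $fruit s";           # Two apple s
```

.fi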
.PP
Array values are interpolated into double-quoted strings by joining all the
elements of the array with the delimiter specified in the $" variable,
space by default.
(Since in versions of perl prior to 3.0 the @ character was not a metacharacter
in double-quoted strings, the interpolation of @array, $array[EXPR],
@array[LIST], $array{EXPR}, or @array{LIST} only happens if array is
referenced elsewhere in the program or is predefined.)
The following are equivalent:
.nf

.ne 4
    $temp = join($",@ARGV);
    system "echo $temp";

    system "echo @ARGV";

.fi
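.PP
The effect of changing $" can be sketched without running a command.
The array @parts is invented for the illustration, and the delimiter is put
back to a space afterwards so that later interpolations are unaffected.
.nf

```perl
@parts = ('usr', 'local', 'bin');
$spaced = "@parts";        # usr local bin
$" = '/';
$path = "/@parts";         # /usr/local/bin
$" = ' ';                  # put the default back
```

.fi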
Within search patterns (which also undergo double-quotish substitution)
there is a bad ambiguity: Is /$foo[bar]/ to be
interpreted as /${foo}[bar]/ (where [bar] is a character class for the
regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to
array @foo)?
If @foo doesn't otherwise exist, then it's obviously a character class.
If @foo exists, perl takes a good guess about [bar], and is almost always right.
If it does guess wrong, or if you're just plain paranoid,
you can force the correct interpretation with curly brackets as above.
.PP
A line-oriented form of quoting is based on the shell here-is syntax.
Following a << you specify a string to terminate the quoted material, and all lines
following the current line down to the terminating string are the value
of the item.
The terminating string may be either an identifier (a word), or some
quoted text.
If quoted, the type of quotes you use determines the treatment of the text,
just as in regular quoting.
An unquoted identifier works like double quotes.
There must be no space between the << and the identifier.
(If you put a space it will be treated as a null identifier, which is
valid, and matches the first blank line\*(--see Merry Christmas example below.)
The terminating string must appear by itself (unquoted and with no surrounding
whitespace) on the terminating line.
.nf

    print <<EOF;    # same as above
The price is $Price.
EOF

    print <<"EOF";    # same as above
The price is $Price.
EOF

    print << x 10;    # null identifier is delimiter
Merry Christmas!

    print <<`EOC`;    # execute commands
echo hi there
echo lo there
EOC

    print <<foo, <<bar;    # you can stack them
I said foo.
foo
I said bar.
bar

.fi
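.PP
How the quoting of the terminator changes the treatment of the text can be
sketched by capturing each form in a variable instead of printing it.
The variable names $double, $bare and $single are invented, and the single-quoted
form is inferred from the rule that the quotes work as in regular quoting.
.nf

```perl
$Price = '$100';
$double = <<"EOF";
The price is $Price.
EOF
$bare = <<EOF;
The price is $Price.
EOF
$single = <<'EOF';
The price is $Price.
EOF
chop($double); chop($bare); chop($single);          # drop the trailing newlines
$same = ($double eq $bare) ? 'same' : 'different';  # same
```

.fi
Here $double and $bare both hold the substituted text, while $single keeps the
characters $Price as written.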
Array literals are denoted by separating individual values by commas, and
enclosing the list in parentheses.
In a context not requiring an array value, the value of the array literal
is the value of the final element, as in the C comma operator.
For example,
.nf

.ne 4
    @foo = (\'cc\', \'\-E\', $bar);

assigns the entire array value to array foo, but

    $foo = (\'cc\', \'\-E\', $bar);

.fi
assigns the value of variable bar to variable foo.
Array lists may be assigned to if and only if each element of the list
is an lvalue:
.nf

    ($a, $b, $c) = (1, 2, 3);

    ($map{\'red\'}, $map{\'blue\'}, $map{\'green\'}) = (0x00f, 0x0f0, 0xf00);

The final element may be an array or an associative array:

    ($a, $b, @rest) = split;
    local($a, $b, %rest) = @_;

.fi
You can actually put an array anywhere in the list, but the first array
in the list will soak up all the values, and anything after it will get
a null value.
This may be useful in a local().
.PP
An associative array literal contains pairs of values to be interpreted
as a key and a value:
.nf

.ne 2
    # same as map assignment above
    %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);

.fi
Array assignment in a scalar context returns the number of elements
produced by the expression on the right side of the assignment:
.nf

    $x = (($foo,$bar) = (3,2,1));    # set $x to 3, not 2

.fi
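.PP
The array literal rules above can be gathered into one sketch.
The value given to $bar and the names $first, $second, $count and $blue are
invented for the illustration.
.nf

```perl
$bar = 'prog.c';
@foo = ('cc', '-E', $bar);                    # all three values
$foo = ('cc', '-E', $bar);                    # just the last one: prog.c
($first, $second, @rest) = (1, 2, 3, 4, 5);   # @rest soaks up 3, 4, 5
$count = (($x, $y) = (3, 2, 1));              # 3: elements on the right side
%map = ('red', 0x00f, 'blue', 0x0f0, 'green', 0xf00);
$blue = $map{'blue'};                         # 240
```

.fi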
.PP
There are several other pseudo-literals that you should know about.
If a string is enclosed by backticks (grave accents), it first undergoes
variable substitution just like a double quoted string.
It is then interpreted as a command, and the output of that command
is the value of the pseudo-literal, like in a shell.
The command is executed each time the pseudo-literal is evaluated.
The status value of the command is returned in $? (see Predefined Names
for the interpretation of $?).
Unlike in \f2csh\f1, no translation is done on the return
data\*(--newlines remain newlines.
Unlike in any of the shells, single quotes do not hide variable names
in the command from interpretation.
To pass a $ through to the shell you need to hide it with a backslash.
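.PP
A minimal sketch of the backtick rules follows.
It assumes an echo command is available to the shell; the variable names are
invented.
.nf

```perl
$who = 'world';
$plain  = `echo hello $who`;     # perl substitutes $who before the shell runs
$quoted = `echo 'hello $who'`;   # still substituted, despite the single quotes
$status = $?;                    # status of the last command run
chop($plain); chop($quoted);     # the output arrives with its newline intact
```

.fi
Both $plain and $quoted should end up holding the same text, since the shell
never sees the variable name in either case.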
.PP
Evaluating a filehandle in angle brackets yields the next line
from that file (newline included, so it's never false until EOF, at
which time an undefined value is returned).
Ordinarily you must assign that value to a variable,
but there is one situation in which an automatic assignment happens.
If (and only if) the input symbol is the only thing inside the conditional of a
.I while
loop, the value is
automatically assigned to the variable \*(L"$_\*(R".
(This may seem like an odd thing to you, but you'll use the construct
in almost every
.I perl
script you write.)
Anyway, the following lines are equivalent to each other:
.nf

.ne 5
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (\|;\|<STDIN>;\|) { print; }
    print while $_ = <STDIN>;
    print while <STDIN>;

.fi
The filehandles
.IR STDIN ,
.I STDOUT
and
.I STDERR
are predefined.
(The filehandles
.IR stdin ,
.I stdout
and
.I stderr
will also work except in packages, where they would be interpreted as
local identifiers rather than global.)
Additional filehandles may be created with the
.I open
function.
.PP
If a <FILEHANDLE> is used in a context that is looking for an array, an array
consisting of all the input lines is returned, one line per array element.
It's easy to make a LARGE data space this way, so use with care.
.PP
The null filehandle <> is special and can be used to emulate the behavior of
\fIsed\fR and \fIawk\fR.
Input from <> comes either from standard input, or from each file listed on
the command line.
Here's how it works: the first time <> is evaluated, the ARGV array is checked,
and if it is null, $ARGV[0] is set to \'\-\', which when opened gives you standard
input.
The ARGV array is then processed as a list of filenames.
The loop
.nf

.ne 3
    while (<>) {
        .\|.\|.    # code for each line
    }

.ne 10
is equivalent to

    unshift(@ARGV, \'\-\') \|if \|$#ARGV < $[;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            .\|.\|.    # code for each line
        }
    }

.fi
except that it isn't as cumbersome to say.
It really does shift array ARGV and put the current filename into
variable ARGV.
It also uses filehandle ARGV internally.
You can modify @ARGV before the first <> as long as you leave the first
filename at the beginning of the array.
Line numbers ($.) continue as if the input was one big happy file.
(But see example under eof for how to reset line numbers on each file.)
.PP
.ne 5
If you want to set @ARGV to your own list of files, go right ahead.
If you want to pass switches into your script, you can
put a loop on the front like this:
.nf

.ne 10
    while ($_ = $ARGV[0], /\|^\-/\|) {
        shift;
        last if /\|^\-\|\-$\|/\|;
        /\|^\-D\|(.*\|)/ \|&& \|($debug = $1);
        /\|^\-v\|/ \|&& \|$verbose++;
        .\|.\|.    # other switches
    }
    while (<>) {
        .\|.\|.    # code for each line
    }

.fi
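.PP
The switch loop can be tried in isolation by assigning a list to @ARGV first.
The command line used here is made up, the shift is written out as
shift(@ARGV) so that the sketch does not depend on where it is run, and the
second loop over <> is left out so that no files are needed.
.nf

```perl
# A made-up command line, assigned directly so the sketch needs no real arguments.
@ARGV = ('-v', '-Dtrace', '--', 'file1', 'file2');
while ($_ = $ARGV[0], /^-/) {
    shift(@ARGV);                 # discard the switch just examined
    last if /^--$/;               # a bare -- ends switch processing
    /^-D(.*)/ && ($debug = $1);
    /^-v/ && $verbose++;
}
$files = join(' ', @ARGV);        # file1 file2
```

.fi
Afterwards $verbose is 1, $debug is trace, and only the filenames remain in
@ARGV for <> to read.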
The <> symbol will return FALSE only once.
If you call it again after this it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will input from
.IR STDIN .
.PP
If the string inside the angle brackets is a reference to a scalar variable
(e.g. <$foo>),
then that variable contains the name of the filehandle to input from.
.PP
If the string inside angle brackets is not a filehandle, it is interpreted
as a filename pattern to be globbed, and either an array of filenames or the
next filename in the list is returned, depending on context.
One level of $ interpretation is done first, but you can't say <$foo>
because that's an indirect filehandle as explained in the previous
paragraph.
You could insert curly brackets to force interpretation as a
filename glob: <${foo}>.
Example:
.nf

.ne 3
    while (<*.c>) {
        chmod 0644, $_;
    }

is equivalent to

.ne 5
    open(foo, "echo *.c | tr \-s \' \et\er\ef\' \'\e\e012\e\e012\e\e012\e\e012\'|");
    while (<foo>) {
        chop;
        chmod 0644, $_;
    }

.fi
In fact, it's currently implemented that way.
(Which means it will not work on filenames with spaces in them unless
you have /bin/csh on your machine.)
Of course, the shortest way to do the above is:
.nf

    chmod 0644, <*.c>;

.fi
.Sh "Syntax"
.PP
A
.I perl
script consists of a sequence of declarations and commands.
The only things that need to be declared in
.I perl
are report formats and subroutines.
See the sections below for more information on those declarations.
All uninitialized user-created objects are assumed to
start with a null or 0 value until they
are defined by some explicit operation such as assignment.
The sequence of commands is executed just once, unlike in
.I sed
and
.I awk
scripts, where the sequence of commands is executed for each input line.
While this means that you must explicitly loop over the lines of your input file
(or files), it also means you have much more control over which files and which
lines you look at.
(Actually, I'm lying\*(--it is possible to do an implicit loop with either the
.B \-n
or
.B \-p
switch.)
.PP
A declaration can be put anywhere a command can, but has no effect on the
execution of the primary sequence of commands\*(--declarations all take effect
at compile time.
Typically all the declarations are put at the beginning or the end of the script.
.PP
.I Perl
is, for the most part, a free-form language.
(The only exception to this is format declarations, for fairly obvious reasons.)
Comments are indicated by the # character, and extend to the end of the line.
If you attempt to use /* */ C comments, they will be interpreted either as
division or pattern matching, depending on the context.
So don't do that.
.Sh "Compound statements"
In
.IR perl ,
a sequence of commands may be treated as one command by enclosing it
in curly brackets.
We will call this a BLOCK.
.PP
The following compound commands may be used to control flow:
.nf

.ne 4
    if (EXPR) BLOCK
    if (EXPR) BLOCK else BLOCK
    if (EXPR) BLOCK elsif (EXPR) BLOCK .\|.\|. else BLOCK
    LABEL while (EXPR) BLOCK
    LABEL while (EXPR) BLOCK continue BLOCK
    LABEL for (EXPR; EXPR; EXPR) BLOCK
    LABEL foreach VAR (ARRAY) BLOCK
    LABEL BLOCK continue BLOCK

.fi
Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not
statements.
This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed.
If you want to write conditionals without curly brackets there are several
other ways to do it.
The following all do the same thing:
.nf

.ne 5
    if (!open(foo)) { die "Can't open $foo: $!"; }
    die "Can't open $foo: $!" unless open(foo);
    open(foo) || die "Can't open $foo: $!";    # foo or bust!
    open(foo) ? \'hi mom\' : die "Can't open $foo: $!";
                # a bit exotic, that last one

.fi
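.PP
The same equivalences can be checked without opening a file.
In this sketch each of the four forms fires on the same condition and records
its name; $limit, @log and $trace are invented for the illustration.
.nf

```perl
$limit = 3;
@log = ();
if ($limit > 2) { push(@log, 'if'); }
push(@log, 'unless') unless $limit <= 2;
$limit > 2 && push(@log, 'and');
$limit <= 2 || push(@log, 'or');
$trace = join(' ', @log);   # if unless and or
```

.fi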
8d063cd8 900.PP
901The
902.I if
903statement is straightforward.
904Since BLOCKs are always bounded by curly brackets, there is never any
905ambiguity about which
906.I if
907an
908.I else
909goes with.
910If you use
911.I unless
912in place of
913.IR if ,
914the sense of the test is reversed.
915.PP
916The
917.I while
918statement executes the block as long as the expression is true
919(does not evaluate to the null string or 0).
920The LABEL is optional, and if present, consists of an identifier followed by
921a colon.
922The LABEL identifies the loop for the loop control statements
923.IR next ,
a687059c 924.IR last ,
8d063cd8 925and
926.I redo
927(see below).
928If there is a
929.I continue
930BLOCK, it is always executed just before
931the conditional is about to be evaluated again, similarly to the third part
932of a
933.I for
934loop in C.
935Thus it can be used to increment a loop variable, even when the loop has
936been continued via the
937.I next
938statement (similar to the C \*(L"continue\*(R" statement).
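.PP
As a minimal sketch of the
.I continue
behavior just described, the following counts loop passes while skipping
even values.
It assumes a current perl 5 interpreter (the strict and warnings pragmas,
my, and named subroutine arguments go beyond what this manual describes),
and odds_below is a hypothetical name chosen for illustration.
.nf

```perl
use strict;
use warnings;

# Collect the odd numbers below $limit and count every pass of the loop.
# odds_below() is a hypothetical helper used only for this illustration.
sub odds_below {
    my ($limit) = @_;
    my @odds;
    my $passes = 0;
    my $i = 0;
    while ($i < $limit) {
        next if $i % 2 == 0;    # skip even values; the continue block still runs
        push @odds, $i;
    } continue {
        $i++;                   # executed after every pass, including after next
        $passes++;
    }
    return ($passes, @odds);
}

my ($passes, @odds) = odds_below(6);
# $passes is 6 and @odds holds 1, 3, 5
```

.fi
If the increment were placed at the bottom of the main block instead,
the next statement would skip it and the loop would never terminate.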
939.PP
940If the word
941.I while
942is replaced by the word
943.IR until ,
944the sense of the test is reversed, but the conditional is still tested before
945the first iteration.
946.PP
947In either the
948.I if
949or the
950.I while
951statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional
952is true if the value of the last command in that block is true.
953.PP
954The
955.I for
956loop works exactly like the corresponding
957.I while
958loop:
959.nf
960
961.ne 12
962 for ($i = 1; $i < 10; $i++) {
963 .\|.\|.
964 }
965
966is the same as
967
968 $i = 1;
969 while ($i < 10) {
970 .\|.\|.
971 } continue {
972 $i++;
973 }
974.fi
975.PP
378cc40b 976The foreach loop iterates over a normal array value and sets the variable
977VAR to be each element of the array in turn.
13281fa4 978The \*(L"foreach\*(R" keyword is actually identical to the \*(L"for\*(R" keyword,
979so you can use \*(L"foreach\*(R" for readability or \*(L"for\*(R" for brevity.
378cc40b 980If VAR is omitted, $_ is set to each value.
981If ARRAY is an actual array (as opposed to an expression returning an array
982value), you can modify each element of the array
983by modifying VAR inside the loop.
984Examples:
985.nf
986
987.ne 5
988 for (@ary) { s/foo/bar/; }
989
990 foreach $elem (@elements) {
991 $elem *= 2;
992 }
993
a687059c 994.ne 3
995 for ((10,9,8,7,6,5,4,3,2,1,\'BOOM\')) {
996 print $_, "\en"; sleep(1);
378cc40b 997 }
998
a687059c 999 for (1..15) { print "Merry Christmas\en"; }
1000
378cc40b 1001.ne 3
 foreach $item (split(/:[\e\e\en:]*/, $ENV{\'TERMCAP\'})) {
378cc40b 1003 print "Item: $item\en";
1004 }
a687059c 1005
378cc40b 1006.fi
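.PP
The in-place modification described above can be sketched as follows.
This assumes a current perl 5 interpreter, and the array names are
hypothetical.
The loop variable is an alias for each element of a real array, so
assigning to it changes the array, whereas looping over an explicit copy
leaves the original untouched.
.nf

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);

# Looping over a real array: $p is an alias, so @prices itself changes.
foreach my $p (@prices) {
    $p *= 2;
}
# @prices is now (20, 40, 60)

# To leave the original alone, loop over an explicit copy instead.
my @original = (1, 2, 3);
my @doubled  = @original;
$_ *= 2 for @doubled;       # with no VAR, $_ is the alias
# @original is still (1, 2, 3); @doubled is (2, 4, 6)
```

.fi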
1007.PP
8d063cd8 1008The BLOCK by itself (labeled or not) is equivalent to a loop that executes
1009once.
1010Thus you can use any of the loop control statements in it to leave or
1011restart the block.
1012The
1013.I continue
1014block is optional.
1015This construct is particularly nice for doing case structures.
1016.nf
1017
1018.ne 6
1019 foo: {
a687059c 1020 if (/^abc/) { $abc = 1; last foo; }
1021 if (/^def/) { $def = 1; last foo; }
1022 if (/^xyz/) { $xyz = 1; last foo; }
8d063cd8 1023 $nothing = 1;
1024 }
1025
1026.fi
a687059c 1027There is no official switch statement in perl, because there
1028are already several ways to write the equivalent.
1029In addition to the above, you could write
378cc40b 1030.nf
1031
a687059c 1032.ne 6
1033 foo: {
1034 $abc = 1, last foo if /^abc/;
1035 $def = 1, last foo if /^def/;
1036 $xyz = 1, last foo if /^xyz/;
1037 $nothing = 1;
1038 }
1039
1040or
1041
1042.ne 6
1043 foo: {
 /^abc/ && do { $abc = 1; last foo; };
 /^def/ && do { $def = 1; last foo; };
 /^xyz/ && do { $xyz = 1; last foo; };
1047 $nothing = 1;
1048 }
1049
1050or
1051
1052.ne 6
1053 foo: {
1054 /^abc/ && ($abc = 1, last foo);
1055 /^def/ && ($def = 1, last foo);
1056 /^xyz/ && ($xyz = 1, last foo);
1057 $nothing = 1;
1058 }
1059
1060or even
1061
378cc40b 1062.ne 8
 if (/^abc/)
 { $abc = 1; }
 elsif (/^def/)
 { $def = 1; }
 elsif (/^xyz/)
 { $xyz = 1; }
 else
 { $nothing = 1; }
378cc40b 1071
1072.fi
a687059c 1073As it happens, these are all optimized internally to a switch structure,
1074so perl jumps directly to the desired statement, and you needn't worry
1075about perl executing a lot of unnecessary statements when you have a string
1076of 50 elsifs, as long as you are testing the same simple scalar variable
1077using ==, eq, or pattern matching as above.
1078(If you're curious as to whether the optimizer has done this for a particular
1079case statement, you can use the \-D1024 switch to list the syntax tree
1080before execution.)
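.PP
The labeled-block form of a case structure can be wrapped in a subroutine,
as in this minimal sketch.
It assumes a current perl 5 interpreter.
The subroutine name classify, its return values, and the upper-case label
SWITCH are hypothetical choices for illustration.
.nf

```perl
use strict;
use warnings;

# Classify a line by its prefix using a bare block as a one-pass loop.
# classify() and the strings it returns are hypothetical.
sub classify {
    my ($line) = @_;
    my $kind;
    SWITCH: {
        if ($line =~ /^abc/) { $kind = 'abc'; last SWITCH; }
        if ($line =~ /^def/) { $kind = 'def'; last SWITCH; }
        if ($line =~ /^xyz/) { $kind = 'xyz'; last SWITCH; }
        $kind = 'nothing';      # default case: reached only if nothing matched
    }
    return $kind;
}

my $first  = classify('defrag');    # 'def'
my $second = classify('hello');     # 'nothing'
```

.fi
Each last statement leaves the block as soon as a case matches, so the
default assignment at the bottom runs only when every test fails.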
8d063cd8 1081.Sh "Simple statements"
1082The only kind of simple statement is an expression evaluated for its side
1083effects.
1084Every expression (simple statement) must be terminated with a semicolon.
1085Note that this is like C, but unlike Pascal (and
1086.IR awk ).
1087.PP
1088Any simple statement may optionally be followed by a
1089single modifier, just before the terminating semicolon.
1090The possible modifiers are:
1091.nf
1092
1093.ne 4
1094 if EXPR
1095 unless EXPR
1096 while EXPR
1097 until EXPR
1098
1099.fi
1100The
1101.I if
1102and
1103.I unless
1104modifiers have the expected semantics.
1105The
1106.I while
1107and
378cc40b 1108.I until
8d063cd8 1109modifiers also have the expected semantics (conditional evaluated first),
1110except when applied to a do-BLOCK command,
1111in which case the block executes once before the conditional is evaluated.
1112This is so that you can write loops like:
1113.nf
1114
1115.ne 4
1116 do {
a687059c 1117 $_ = <STDIN>;
8d063cd8 1118 .\|.\|.
1119 } until $_ \|eq \|".\|\e\|n";
1120
1121.fi
1122(See the
1123.I do
1124operator below. Note also that the loop control commands described later will
83b4785a 1125NOT work in this construct, since modifiers don't take loop labels.
8d063cd8 1126Sorry.)
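.PP
The difference between the two evaluation orders can be seen in a minimal
sketch, assuming a current perl 5 interpreter; the variable names are
hypothetical.
In both statements the condition is already true before the loop starts.
.nf

```perl
use strict;
use warnings;

# Modifier on a plain statement: the condition is tested first,
# so the statement never executes.
my $count = 10;
$count++ until $count >= 10;
# $count is still 10

# Modifier on a do-BLOCK: the block executes once before the test.
my $runs = 0;
do { $runs++ } until $runs >= 0;
# $runs is 1
```

.fi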
1127.Sh "Expressions"
1128Since
1129.I perl
1130expressions work almost exactly like C expressions, only the differences
1131will be mentioned here.
1132.PP
1133Here's what
1134.I perl
1135has that C doesn't:
a687059c 1136.Ip ** 8 2
1137The exponentiation operator.
1138.Ip **= 8
1139The exponentiation assignment operator.
8d063cd8 1140.Ip (\|) 8 3
1141The null list, used to initialize an array to null.
1142.Ip . 8
1143Concatenation of two strings.
1144.Ip .= 8
a687059c 1145The concatenation assignment operator.
8d063cd8 1146.Ip eq 8
1147String equality (== is numeric equality).
1148For a mnemonic just think of \*(L"eq\*(R" as a string.
1149(If you are used to the
1150.I awk
1151behavior of using == for either string or numeric equality
1152based on the current form of the comparands, beware!
1153You must be explicit here.)
1154.Ip ne 8
1155String inequality (!= is numeric inequality).
1156.Ip lt 8
1157String less than.
1158.Ip gt 8
1159String greater than.
1160.Ip le 8
1161String less than or equal.
1162.Ip ge 8
1163String greater than or equal.
1164.Ip =~ 8 2
1165Certain operations search or modify the string \*(L"$_\*(R" by default.
1166This operator makes that kind of operation work on some other string.
1167The right argument is a search pattern, substitution, or translation.
1168The left argument is what is supposed to be searched, substituted, or
1169translated instead of the default \*(L"$_\*(R".
1170The return value indicates the success of the operation.
1171(If the right argument is an expression other than a search pattern,
1172substitution, or translation, it is interpreted as a search pattern
1173at run time.
1174This is less efficient than an explicit search, since the pattern must
1175be compiled every time the expression is evaluated.)
1176The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else.
1177.Ip !~ 8
1178Just like =~ except the return value is negated.
1179.Ip x 8
1180The repetition operator.
1181Returns a string consisting of the left operand repeated the
1182number of times specified by the right operand.
1183.nf
1184
a687059c 1185 print \'\-\' x 80; # print row of dashes
1186 print \'\-\' x80; # illegal, x80 is identifier
8d063cd8 1187
a687059c 1188 print "\et" x ($tab/8), \' \' x ($tab%8); # tab over
8d063cd8 1189
1190.fi
1191.Ip x= 8
a687059c 1192The repetition assignment operator.
1193.Ip .\|. 8
1194The range operator, which is really two different operators depending
1195on the context.
1196In an array context, returns an array of values counting (by ones)
1197from the left value to the right value.
1198This is useful for writing \*(L"for (1..10)\*(R" loops and for doing
1199slice operations on arrays.
1200.Sp
1201In a scalar context, .\|. returns a boolean value.
The operator is bistable, like a flip-flop.
1203Each .\|. operator maintains its own boolean state.
378cc40b 1204It is false as long as its left operand is false.
1205Once the left operand is true, the range operator stays true
1206until the right operand is true,
1207AFTER which the range operator becomes false again.
a687059c 1208(It doesn't become false till the next time the range operator is evaluated.
8d063cd8 1209It can become false on the same evaluation it became true, but it still returns
1210true once.)
13281fa4 1211The right operand is not evaluated while the operator is in the \*(L"false\*(R" state,
1212and the left operand is not evaluated while the operator is in the \*(L"true\*(R" state.
a687059c 1213The scalar .\|. operator is primarily intended for doing line number ranges
1214after
8d063cd8 1215the fashion of \fIsed\fR or \fIawk\fR.
1216The precedence is a little lower than || and &&.
1217The value returned is either the null string for false, or a sequence number
1218(beginning with 1) for true.
1219The sequence number is reset for each range encountered.
a687059c 1220The final sequence number in a range has the string \'E0\' appended to it, which
8d063cd8 1221doesn't affect its numeric value, but gives you something to search for if you
1222want to exclude the endpoint.
1223You can exclude the beginning point by waiting for the sequence number to be
1224greater than 1.
a687059c 1225If either operand of scalar .\|. is static, that operand is implicitly compared
1226to the $. variable, the current line number.
8d063cd8 1227Examples:
1228.nf
1229
a687059c 1230.ne 6
1231As a scalar operator:
1232 if (101 .\|. 200) { print; } # print 2nd hundred lines
8d063cd8 1233
a687059c 1234 next line if (1 .\|. /^$/); # skip header lines
8d063cd8 1235
a687059c 1236 s/^/> / if (/^$/ .\|. eof()); # quote body
1237
1238.ne 4
1239As an array operator:
1240 for (101 .\|. 200) { print; } # print $_ 100 times
1241
1242 @foo = @foo[$[ .\|. $#foo]; # an expensive no-op
1243 @foo = @foo[$#foo-4 .\|. $#foo]; # slice last 5 items
8d063cd8 1244
1245.fi
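.Sp
The sequence numbers returned by the scalar form can be observed with a
minimal sketch like the following, which assumes a current perl 5
interpreter.
The subroutine between_markers and the BEGIN and END marker strings are
hypothetical.
Both operands are pattern matches rather than constants, so no comparison
with the current line number takes place.
.nf

```perl
use strict;
use warnings;

# Return the lines from the first BEGIN marker through the next END marker,
# each prefixed with the sequence number the range operator returned.
# between_markers() is a hypothetical helper.
sub between_markers {
    my @lines = @_;
    my @picked;
    for my $line (@lines) {
        my $seq = ($line =~ /BEGIN/) .. ($line =~ /END/);
        next unless $seq;               # null string while outside the range
        push @picked, "$seq:$line";
    }
    return @picked;
}

my @picked = between_markers('x', 'BEGIN', 'a', 'END', 'y');
# @picked holds '1:BEGIN', '2:a', '3E0:END'
```

.fi
To drop the endpoints, skip the line when $seq equals 1 or when it
matches /E0$/.
Because each range operator keeps its own state, a range left open at the
end of one call would still be open at the start of the next.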
378cc40b 1246.Ip \-x 8
1247A file test.
1248This unary operator takes one argument, either a filename or a filehandle,
1249and tests the associated file to see if something is true about it.
a687059c 1250If the argument is omitted, tests $_, except for \-t, which tests
1251.IR STDIN .
1252It returns 1 for true and \'\' for false, or the undefined value if the
1253file doesn't exist.
378cc40b 1254Precedence is higher than logical and relational operators, but lower than
1255arithmetic operators.
1256The operator may be any of:
1257.nf
1258 \-r File is readable by effective uid.
a687059c 1259 \-w File is writable by effective uid.
378cc40b 1260 \-x File is executable by effective uid.
1261 \-o File is owned by effective uid.
1262 \-R File is readable by real uid.
a687059c 1263 \-W File is writable by real uid.
378cc40b 1264 \-X File is executable by real uid.
1265 \-O File is owned by real uid.
1266 \-e File exists.
1267 \-z File has zero size.
1268 \-s File has non-zero size.
1269 \-f File is a plain file.
1270 \-d File is a directory.
1271 \-l File is a symbolic link.
1272 \-p File is a named pipe (FIFO).
1273 \-S File is a socket.
1274 \-b File is a block special file.
1275 \-c File is a character special file.
1276 \-u File has setuid bit set.
1277 \-g File has setgid bit set.
1278 \-k File has sticky bit set.
1279 \-t Filehandle is opened to a tty.
1280 \-T File is a text file.
1281 \-B File is a binary file (opposite of \-T).
1282
1283.fi
1284The interpretation of the file permission operators \-r, \-R, \-w, \-W, \-x and \-X
1285is based solely on the mode of the file and the uids and gids of the user.
1286There may be other reasons you can't actually read, write or execute the file.
1287Also note that, for the superuser, \-r, \-R, \-w and \-W always return 1, and
1288\-x and \-X return 1 if any execute bit is set in the mode.
1289Scripts run by the superuser may thus need to do a stat() in order to determine
1290the actual mode of the file, or temporarily set the uid to something else.
1291.Sp
1292Example:
1293.nf
1294.ne 7
1295
1296 while (<>) {
1297 chop;
1298 next unless \-f $_; # ignore specials
1299 .\|.\|.
1300 }
1301
1302.fi
a687059c 1303Note that \-s/a/b/ does not do a negated substitution.
1304Saying \-exp($foo) still works as expected, however\*(--only single letters
378cc40b 1305following a minus are interpreted as file tests.
1306.Sp
1307The \-T and \-B switches work as follows.
1308The first block or so of the file is examined for odd characters such as
1309strange control codes or metacharacters.
1310If too many odd characters (>10%) are found, it's a \-B file, otherwise it's a \-T file.
1311Also, any file containing null in the first block is considered a binary file.
1312If \-T or \-B is used on a filehandle, the current stdio buffer is examined
1313rather than the first block.
378cc40b 1314Both \-T and \-B return TRUE on a null file, or a file at EOF when testing
1315a filehandle.
8d063cd8 1316.PP
a687059c 1317If any of the file tests (or either stat operator) are given the special
1318filehandle consisting of a solitary underline, then the stat structure
1319of the previous file test (or stat operator) is used, saving a system
1320call.
1321(This doesn't work with \-t, and you need to remember that lstat and -l
1322will leave values in the stat structure for the symbolic link, not the
1323real file.)
1324Example:
1325.nf
1326
1327 print "Can do.\en" if -r $a || -w _ || -x _;
1328
1329.ne 9
1330 stat($filename);
1331 print "Readable\en" if -r _;
1332 print "Writable\en" if -w _;
1333 print "Executable\en" if -x _;
1334 print "Setuid\en" if -u _;
1335 print "Setgid\en" if -g _;
1336 print "Sticky\en" if -k _;
1337 print "Text\en" if -T _;
1338 print "Binary\en" if -B _;
1339
1340.fi
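.PP
The same technique can be packaged as a subroutine, as in this minimal
sketch, which assumes a current perl 5 interpreter.
The name file_properties and the strings it returns are hypothetical.
One stat call fills the saved stat structure, and each later test reads it
through the underline filehandle.
.nf

```perl
use strict;
use warnings;

# List a few properties of $path using a single stat() system call.
# file_properties() is a hypothetical helper, not part of perl.
sub file_properties {
    my ($path) = @_;
    my @props;
    return @props unless stat($path);   # fills the saved stat structure
    push @props, 'plain file' if -f _;
    push @props, 'directory'  if -d _;
    push @props, 'readable'   if -r _;
    push @props, 'empty'      if -z _;
    return @props;
}

my @here = file_properties('.');
# @here includes 'directory' when run from an existing directory
```

.fi
A path that cannot be examined yields an empty list, since the tests are
skipped when the stat call fails.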
1341.PP
8d063cd8 1342Here is what C has that
1343.I perl
1344doesn't:
1345.Ip "unary &" 12
1346Address-of operator.
1347.Ip "unary *" 12
1348Dereference-address operator.
378cc40b 1349.Ip "(TYPE)" 12
1350Type casting operator.
8d063cd8 1351.PP
1352Like C,
1353.I perl
1354does a certain amount of expression evaluation at compile time, whenever
1355it determines that all of the arguments to an operator are static and have
1356no side effects.
1357In particular, string concatenation happens at compile time between literals that don't do variable substitution.
1358Backslash interpretation also happens at compile time.
1359You can say
1360.nf
1361
1362.ne 2
a687059c 1363 \'Now is the time for all\' . "\|\e\|n" .
1364 \'good men to come to.\'
8d063cd8 1365
1366.fi
1367and this all reduces to one string internally.
1368.PP
378cc40b 1369The autoincrement operator has a little extra built-in magic to it.
1370If you increment a variable that is numeric, or that has ever been used in
1371a numeric context, you get a normal increment.
1372If, however, the variable has only been used in string contexts since it
1373was set, and has a value that is not null and matches the
a687059c 1374pattern /^[a\-zA\-Z]*[0\-9]*$/, the increment is done
378cc40b 1375as a string, preserving each character within its range, with carry:
1376.nf
1377
a687059c 1378 print ++($foo = \'99\'); # prints \*(L'100\*(R'
1379 print ++($foo = \'a0\'); # prints \*(L'a1\*(R'
1380 print ++($foo = \'Az\'); # prints \*(L'Ba\*(R'
1381 print ++($foo = \'zz\'); # prints \*(L'aaa\*(R'
378cc40b 1382
1383.fi
1384The autodecrement is not magical.
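.PP
A minimal sketch of both behaviors, assuming a current perl 5 interpreter;
the variable names are hypothetical.
Each starting value is copied into a fresh variable that has only ever
been used as a string, so the increment is done as a string.
.nf

```perl
use strict;
use warnings;

# Increment a fresh string copy of each value and collect the results.
my @start = ('99', 'a0', 'Az', 'zz', 'a9');
my @next  = map { my $s = $_; ++$s } @start;
# @next holds '100', 'a1', 'Ba', 'aaa', 'b0'

# Autodecrement has no string form: the value is treated as a number.
my $t = 'aa';
{
    no warnings 'numeric';  # 'aa' is not numeric; silence the warning
    $t--;
}
# $t is -1
```

.fi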