perl 3.0 patch #6 patch 5 continued
8d063cd8 1.rn '' }`
ffed7fef 2''' $Header: perl.man.1,v 3.0.1.2 89/11/17 15:30:03 lwall Locked $
8d063cd8 3'''
4''' $Log: perl.man.1,v $
ffed7fef 5''' Revision 3.0.1.2 89/11/17 15:30:03 lwall
6''' patch5: fixed some manual typos and indent problems
7'''
ae986130 8''' Revision 3.0.1.1 89/11/11 04:41:22 lwall
9''' patch2: explained about sh and ${1+"$@"}
10''' patch2: documented that space must separate word and '' string
11'''
a687059c 12''' Revision 3.0 89/10/18 15:21:29 lwall
13''' 3.0 baseline
8d063cd8 14'''
15'''
16.de Sh
17.br
18.ne 5
19.PP
20\fB\\$1\fR
21.PP
22..
23.de Sp
24.if t .sp .5v
25.if n .sp
26..
27.de Ip
28.br
29.ie \\n.$>=3 .ne \\$3
30.el .ne 3
31.IP "\\$1" \\$2
32..
33'''
34''' Set up \*(-- to give an unbreakable dash;
35''' string Tr holds user defined translation string.
36''' Bell System Logo is used as a dummy character.
37'''
378cc40b 38.tr \(*W-|\(bv\*(Tr
8d063cd8 39.ie n \{\
378cc40b 40.ds -- \(*W-
41.if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
42.if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
8d063cd8 43.ds L" ""
44.ds R" ""
45.ds L' '
46.ds R' '
47'br\}
48.el\{\
49.ds -- \(em\|
50.tr \*(Tr
51.ds L" ``
52.ds R" ''
53.ds L' `
54.ds R' '
55'br\}
a687059c 56.TH PERL 1 "\*(RP"
57.UC
8d063cd8 58.SH NAME
a687059c 59perl \- Practical Extraction and Report Language
8d063cd8 60.SH SYNOPSIS
a687059c 61.B perl
62[options] filename args
8d063cd8 63.SH DESCRIPTION
64.I Perl
a687059c 65is an interpreted language optimized for scanning arbitrary text files,
8d063cd8 66extracting information from those text files, and printing reports based
67on that information.
68It's also a good language for many system management tasks.
69The language is intended to be practical (easy to use, efficient, complete)
70rather than beautiful (tiny, elegant, minimal).
71It combines (in the author's opinion, anyway) some of the best features of C,
72\fIsed\fR, \fIawk\fR, and \fIsh\fR,
73so people familiar with those languages should have little difficulty with it.
74(Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
75even BASIC-PLUS.)
76Expression syntax corresponds quite closely to C expression syntax.
a687059c 77Unlike most Unix utilities,
78.I perl
79does not arbitrarily limit the size of your data\*(--if you've got
80the memory,
81.I perl
82can slurp in your whole file as a single string.
83Recursion is of unlimited depth.
84And the hash tables used by associative arrays grow as necessary to prevent
85degraded performance.
86.I Perl
87uses sophisticated pattern matching techniques to scan large amounts of
88data very quickly.
89Although optimized for scanning text,
90.I perl
91can also deal with binary data, and can make dbm files look like associative
92arrays (where dbm is available).
93Setuid
94.I perl
95scripts are safer than C programs
96through a dataflow tracing mechanism which prevents many stupid security holes.
8d063cd8 97If you have a problem that would ordinarily use \fIsed\fR
98or \fIawk\fR or \fIsh\fR, but it
99exceeds their capabilities or must run a little faster,
100and you don't want to write the silly thing in C, then
101.I perl
102may be for you.
a687059c 103There are also translators to turn your
104.I sed
105and
106.I awk
107scripts into
108.I perl
109scripts.
8d063cd8 110OK, enough hype.
111.PP
112Upon startup,
113.I perl
114looks for your script in one of the following places:
115.Ip 1. 4 2
116Specified line by line via
117.B \-e
118switches on the command line.
119.Ip 2. 4 2
120Contained in the file specified by the first filename on the command line.
121(Note that systems supporting the #! notation invoke interpreters this way.)
122.Ip 3. 4 2
a687059c 123Passed in implicitly via standard input.
378cc40b 124This only works if there are no filename arguments\*(--to pass
a687059c 125arguments to a
126.I stdin
127script you must explicitly specify a \- for the script name.
8d063cd8 128.PP
129After locating your script,
130.I perl
131compiles it to an internal form.
132If the script is syntactically correct, it is executed.
133.Sh "Options"
83b4785a 134Note: on first reading this section may not make much sense to you. It's here
8d063cd8 135at the front for easy reference.
136.PP
137A single-character option may be combined with the following option, if any.
138This is particularly useful when invoking a script using the #! construct which
139only allows one argument. Example:
140.nf
141
142.ne 2
a687059c 143 #!/usr/bin/perl \-spi.bak # same as \-s \-p \-i.bak
8d063cd8 144 .\|.\|.
145
146.fi
147Options include:
148.TP 5
378cc40b 149.B \-a
a687059c 150turns on autosplit mode when used with a
151.B \-n
152or
153.BR \-p .
378cc40b 154An implicit split command to the @F array
155is done as the first thing inside the implicit while loop produced by
a687059c 156the
157.B \-n
158or
159.BR \-p .
378cc40b 160.nf
161
a687059c 162 perl \-ane \'print pop(@F), "\en";\'
378cc40b 163
164is equivalent to
165
166 while (<>) {
a687059c 167 @F = split(\' \');
168 print pop(@F), "\en";
378cc40b 169 }
170
171.fi
172.TP 5
a687059c 173.BI \-d
174runs the script under the perl debugger.
175See the section on Debugging.
176.TP 5
177.BI \-D number
8d063cd8 178sets debugging flags.
179To watch how it executes your script, use
a687059c 180.BR \-D14 .
8d063cd8 181(This only works if debugging is compiled into your
182.IR perl .)
a687059c 183Another nice value is \-D1024, which lists your compiled syntax tree.
184And \-D512 displays compiled regular expressions.
8d063cd8 185.TP 5
a687059c 186.BI \-e " commandline"
8d063cd8 187may be used to enter one line of script.
188Multiple
189.B \-e
190commands may be given to build up a multi-line script.
191If
192.B \-e
193is given,
194.I perl
195will not look for a script filename in the argument list.
196.TP 5
a687059c 197.BI \-i extension
8d063cd8 198specifies that files processed by the <> construct are to be edited
199in-place.
200It does this by renaming the input file, opening the output file by the
201same name, and selecting that output file as the default for print statements.
202The extension, if supplied, is added to the name of the
203old file to make a backup copy.
204If no extension is supplied, no backup is made.
a687059c 205Saying \*(L"perl \-p \-i.bak \-e "s/foo/bar/;" .\|.\|. \*(R" is the same as using
8d063cd8 206the script:
207.nf
208
209.ne 2
a687059c 210 #!/usr/bin/perl \-pi.bak
8d063cd8 211 s/foo/bar/;
212
213which is equivalent to
214
215.ne 14
378cc40b 216 #!/usr/bin/perl
8d063cd8 217 while (<>) {
218 if ($ARGV ne $oldargv) {
a687059c 219 rename($ARGV, $ARGV . \'.bak\');
220 open(ARGVOUT, ">$ARGV");
8d063cd8 221 select(ARGVOUT);
222 $oldargv = $ARGV;
223 }
224 s/foo/bar/;
225 }
226 continue {
227 print; # this prints to original filename
228 }
a687059c 229 select(STDOUT);
8d063cd8 230
231.fi
a687059c 232except that the
233.B \-i
234form doesn't need to compare $ARGV to $oldargv to know when
8d063cd8 235the filename has changed.
236It does, however, use ARGVOUT for the selected filehandle.
a687059c 237Note that
238.I STDOUT
239is restored as the default output filehandle after the loop.
378cc40b 240.Sp
241You can use eof to locate the end of each input file, in case you want
242to append to each file, or reset line numbering (see example under eof).
8d063cd8 243.TP 5
a687059c 244.BI \-I directory
8d063cd8 245may be used in conjunction with
246.B \-P
247to tell the C preprocessor where to look for include files.
248By default /usr/include and /usr/lib/perl are searched.
249.TP 5
250.B \-n
251causes
252.I perl
253to assume the following loop around your script, which makes it iterate
a687059c 254over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR:
8d063cd8 255.nf
256
257.ne 3
258 while (<>) {
378cc40b 259 .\|.\|. # your script goes here
8d063cd8 260 }
261
262.fi
263Note that the lines are not printed by default.
264See
265.B \-p
266to have lines printed.
378cc40b 267Here is an efficient way to delete all files older than a week:
268.nf
269
a687059c 270 find . \-mtime +7 \-print | perl \-ne \'chop;unlink;\'
378cc40b 271
272.fi
a687059c 273This is faster than using the \-exec switch of find because you don't have to
378cc40b 274start a process on every filename found.
8d063cd8 275.TP 5
276.B \-p
277causes
278.I perl
279to assume the following loop around your script, which makes it iterate
280over filename arguments somewhat like \fIsed\fR:
281.nf
282
283.ne 5
284 while (<>) {
378cc40b 285 .\|.\|. # your script goes here
8d063cd8 286 } continue {
287 print;
288 }
289
290.fi
291Note that the lines are printed automatically.
292To suppress printing use the
293.B \-n
294switch.
83b4785a 295A
296.B \-p
297overrides a
298.B \-n
299switch.
8d063cd8 300.TP 5
301.B \-P
302causes your script to be run through the C preprocessor before
303compilation by
a687059c 304.IR perl .
8d063cd8 305(Since both comments and cpp directives begin with the # character,
306you should avoid starting comments with any words recognized
307by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
308.TP 5
309.B \-s
310enables some rudimentary switch parsing for switches on the command line
a687059c 311after the script name but before any filename arguments (or before a \-\|\-).
83b4785a 312Any switch found there is removed from @ARGV and sets the corresponding variable in the
8d063cd8 313.I perl
314script.
315The following script prints \*(L"true\*(R" if and only if the script is
a687059c 316invoked with a \-xyz switch.
8d063cd8 317.nf
318
319.ne 2
a687059c 320 #!/usr/bin/perl \-s
83b4785a 321 if ($xyz) { print "true\en"; }
8d063cd8 322
323.fi
378cc40b 324.TP 5
325.B \-S
a687059c 326makes
327.I perl
328use the PATH environment variable to search for the script
378cc40b 329(unless the name of the script starts with a slash).
330Typically this is used to emulate #! startup on machines that don't
331support #!, in the following manner:
332.nf
333
334 #!/usr/bin/perl
a687059c 335 eval "exec /usr/bin/perl \-S $0 $*"
378cc40b 336 if $running_under_some_shell;
337
338.fi
339The system ignores the first line and feeds the script to /bin/sh,
a687059c 340which proceeds to try to execute the
341.I perl
342script as a shell script.
378cc40b 343The shell executes the second line as a normal shell command, and thus
a687059c 344starts up the
345.I perl
346interpreter.
378cc40b 347On some systems $0 doesn't always contain the full pathname,
a687059c 348so the
349.B \-S
350tells
351.I perl
352to search for the script if necessary.
353After
354.I perl
355locates the script, it parses the lines and ignores them because
378cc40b 356the variable $running_under_some_shell is never true.
ae986130 357A better construct than $* would be ${1+"$@"}, which handles embedded spaces
358and such in the filenames, but doesn't work if the script is being interpreted
359by csh.
360In order to start up sh rather than csh, some systems may have to replace the
361#! line with a line containing just
362a colon, which will be politely ignored by perl.
378cc40b 363.TP 5
a687059c 364.B \-u
365causes
366.I perl
367to dump core after compiling your script.
368You can then take this core dump and turn it into an executable file
369by using the undump program (not supplied).
370This speeds startup at the expense of some disk space (which you can
371minimize by stripping the executable).
372(Still, a "hello world" executable comes out to about 200K on my machine.)
373If you are going to run your executable as a set-id program then you
374should probably compile it using taintperl rather than normal perl.
375If you want to execute a portion of your script before dumping, use the
376dump operator instead.
377.TP 5
378cc40b 378.B \-U
a687059c 379allows
380.I perl
381to do unsafe operations.
13281fa4 382Currently the only \*(L"unsafe\*(R" operation is the unlinking of directories while
378cc40b 383running as superuser.
384.TP 5
385.B \-v
a687059c 386prints the version and patchlevel of your
387.I perl
388executable.
378cc40b 389.TP 5
390.B \-w
391prints warnings about identifiers that are mentioned only once, and scalar
392variables that are used before being set.
393Also warns about redefined subroutines, and references to undefined
a687059c 394filehandles or filehandles opened readonly that you are attempting to
395write on.
396Also warns you if you use == on values that don't look like numbers, and if
397your subroutines recurse more than 100 deep.
8d063cd8 398.Sh "Data Types and Objects"
399.PP
a687059c 400.I Perl
401has three data types: scalars, arrays of scalars, and
402associative arrays of scalars.
403Normal arrays are indexed by number, and associative arrays by string.
8d063cd8 404.PP
a687059c 405The interpretation of operations and values in perl sometimes
406depends on the requirements
407of the context around the operation or value.
408There are three major contexts: string, numeric and array.
409Certain operations return array values
410in contexts wanting an array, and scalar values otherwise.
411(If this is true of an operation it will be mentioned in the documentation
412for that operation.)
413Operations which return scalars don't care whether the context is looking
414for a string or a number, but
415scalar variables and values are interpreted as strings or numbers
416as appropriate to the context.
378cc40b 417A scalar is interpreted as TRUE in the boolean sense if it is not the null
8d063cd8 418string or 0.
ffed7fef 419Booleans returned by operators are 1 for true and 0 or \'\' (the null
8d063cd8 420string) for false.
421.PP
a687059c 422There are actually two varieties of null string: defined and undefined.
423Undefined null strings are returned when there is no real value for something,
424such as when there was an error, or at end of file, or when you refer
425to an uninitialized variable or element of an array.
426An undefined null string may become defined the first time you access it, but
427prior to that you can use the defined() operator to determine whether the
428value is defined or not.
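As a small illustration of the difference between the two kinds of null string, the sketch below applies defined() to a variable that was never assigned, to a variable holding an explicit null string, and to a missing element of an associative array. All variable names are hypothetical.

```perl
# $never_set is deliberately left unassigned.
$before = defined($never_set) ? 'defined' : 'undefined';

$empty = '';                          # a defined null string
$after = defined($empty) ? 'defined' : 'undefined';

%age = ('tom', 30);
$missing = defined($age{'ann'}) ? 'defined' : 'undefined';

print "$before $after $missing\n";
```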
429.PP
378cc40b 430References to scalar variables always begin with \*(L'$\*(R', even when referring
431to a scalar that is part of an array.
8d063cd8 432Thus:
433.nf
434
435.ne 3
378cc40b 436 $days \h'|2i'# a simple scalar variable
8d063cd8 437 $days[28] \h'|2i'# 29th element of array @days
a687059c 438 $days{\'Feb\'}\h'|2i'# one value from an associative array
378cc40b 439 $#days \h'|2i'# last index of array @days
8d063cd8 440
a687059c 441but entire arrays or array slices are denoted by \*(L'@\*(R':
8d063cd8 442
443 @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n])
a687059c 444 @days[3,4,5]\h'|2i'# same as @days[3.\|.5]
445 @days{'a','c'}\h'|2i'# same as ($days{'a'},$days{'c'})
446
447and entire associative arrays are denoted by \*(L'%\*(R':
8d063cd8 448
a687059c 449 %days \h'|2i'# (key1, val1, key2, val2 .\|.\|.)
8d063cd8 450.fi
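A minimal sketch of the constructs listed above, using made-up data (the %len array and its values are assumptions for illustration):

```perl
@days = ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat');
%len = ('Jan', 31, 'Feb', 28, 'Mar', 31);

$third = $days[2];            # one element of a normal array
@mid = @days[3,4,5];          # an array slice
@same = @days[3..5];          # the same slice written as a range
$feb = $len{'Feb'};           # one value from an associative array
@two = @len{'Jan','Mar'};     # a slice of an associative array
$last = $#days;               # subscript of the last element

print "$third @mid $feb @two $last\n";
```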
451.PP
a687059c 452Any of these eight constructs may serve as an lvalue,
378cc40b 453that is, may be assigned to.
a687059c 454(It also turns out that an assignment is itself an lvalue in
455certain contexts\*(--see examples under s, tr and chop.)
456Assignment to a scalar evaluates the righthand side in a scalar context,
457while assignment to an array or array slice evaluates the righthand side
458in an array context.
459.PP
378cc40b 460You may find the length of array @days by evaluating
8d063cd8 461\*(L"$#days\*(R", as in
462.IR csh .
378cc40b 463(Actually, it's not the length of the array, it's the subscript of the last element, since there is (ordinarily) a 0th element.)
464Assigning to $#days changes the length of the array.
465Shortening an array by this method does not actually destroy any values.
466Lengthening an array that was previously shortened recovers the values that
467were in those elements.
468You can also gain some measure of efficiency by preextending an array that
469is going to get big.
470(You can also extend an array by assigning to an element that is off the
471end of the array.
472This differs from assigning to $#whatever in that intervening values
473are set to null rather than recovered.)
474You can truncate an array down to nothing by assigning the null list () to
475it.
476The following are exactly equivalent:
477.nf
478
479 @whatever = ();
480 $#whatever = $[ \- 1;
481
482.fi
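The relationship between $#days and the size of the array can be sketched as follows. The example assumes the array base $[ has been left at its default of 0, and it only exercises behaviour that a current perl shares with the description above (it does not rely on recovering values after shortening).

```perl
@days = ('Mon', 'Tue', 'Wed');
$last = $#days;               # subscript of the last element
$count = $last + 1;           # number of elements when the base is 0

$days[5] = 'Sat';             # assigning past the end extends the array
$new_last = $#days;
$gap = defined($days[4]) ? 'set' : 'null';   # the intervening element

@days = ();                   # truncate the array to nothing
$empty_last = $#days;

print "$last $count $new_last $gap $empty_last\n";
```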
8d063cd8 483.PP
a687059c 484Multi-dimensional arrays are not directly supported, but see the discussion
485of the $; variable later for a means of emulating multiple subscripts with
486an associative array.
487.PP
8d063cd8 488Every data type has its own namespace.
378cc40b 489You can, without fear of conflict, use the same name for a scalar variable,
8d063cd8 490an array, an associative array, a filehandle, a subroutine name, and/or
491a label.
a687059c 492Since variable and array references always start with \*(L'$\*(R', \*(L'@\*(R',
493or \*(L'%\*(R', the \*(L"reserved\*(R" words aren't in fact reserved
8d063cd8 494with respect to variable names.
495(They ARE reserved with respect to labels and filehandles, however, which
378cc40b 496don't have an initial special character.
a687059c 497Hint: you could say open(LOG,\'logfile\') rather than open(log,\'logfile\').
498Using uppercase filehandles also improves readability and protects you
499from conflict with future reserved words.)
8d063cd8 500Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all
501different names.
502Names which start with a letter may also contain digits and underscores.
503Names which do not start with a letter are limited to one character,
504e.g. \*(L"$%\*(R" or \*(L"$$\*(R".
a687059c 505(Most of the one character names have a predefined significance to
506.IR perl .
8d063cd8 507More later.)
508.PP
a687059c 509Numeric literals are specified in any of the usual floating point or
510integer formats:
511.nf
512
513.ne 5
514 12345
515 12345.67
516 .23E-10
517 0xffff # hex
518 0377 # octal
519
520.fi
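As a quick check of the hex and octal forms, the following sketch stores the literals shown above and prints their decimal values. The variable names are illustrative.

```perl
$hex = 0xffff;        # hexadecimal literal
$octal = 0377;        # octal literal (note the leading 0)
$small = .23E-10;     # floating point with an exponent

print "$hex $octal\n";
print "tiny but positive\n" if $small > 0 && $small < 1e-10;
```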
8d063cd8 521String literals are delimited by either single or double quotes.
522They work much like shell quotes:
523double-quoted string literals are subject to backslash and variable
a687059c 524substitution; single-quoted strings are not (except for \e\' and \e\e).
8d063cd8 525The usual backslash rules apply for making characters such as newline, tab, etc.
526You can also embed newlines directly in your strings, i.e. they can end on
527a different line than they begin.
528This is nice, but if you forget your trailing quote, the error will not be
a687059c 529reported until
530.I perl
531finds another line containing the quote character, which
8d063cd8 532may be much further on in the script.
a687059c 533Variable substitution inside strings is limited to scalar variables, normal
534array values, and array slices.
535(In other words, identifiers beginning with $ or @, followed by an optional
536bracketed expression as a subscript.)
8d063cd8 537The following code segment prints out \*(L"The price is $100.\*(R"
538.nf
539
540.ne 2
a687059c 541 $Price = \'$100\';\h'|3.5i'# not interpreted
8d063cd8 542 print "The price is $Price.\e\|n";\h'|3.5i'# interpreted
543
544.fi
83b4785a 545Note that you can put curly brackets around the identifier to delimit it
546from following alphanumerics.
ae986130 547Also note that a single quoted string must be separated from a preceding
548word by a space, since single quote is a valid character in an identifier
549(see Packages).
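The effect of the curly brackets can be seen in a short sketch. The variable $fruit is hypothetical; $fruits is deliberately never set, to show what happens when the brackets are left out.

```perl
$Price = '$100';
$fruit = 'apple';

$plain = "The price is $Price.";   # "." cannot be part of a name
$plural = "Two ${fruit}s";         # brackets end the name before the "s"
$wrong = "Two $fruits";            # refers to the unset variable $fruits

print "$plain\n$plural\n$wrong\n";
```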
8d063cd8 550.PP
a687059c 551Array values are interpolated into double-quoted strings by joining all the
552elements of the array with the delimiter specified in the $" variable,
553space by default.
554(Since in versions of perl prior to 3.0 the @ character was not a metacharacter
555in double-quoted strings, the interpolation of @array, $array[EXPR],
556@array[LIST], $array{EXPR}, or @array{LIST} only happens if array is
557referenced elsewhere in the program or is predefined.)
558The following are equivalent:
559.nf
560
561.ne 4
562 $temp = join($",@ARGV);
563 system "echo $temp";
564
565 system "echo @ARGV";
566
567.fi
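The role of the $" variable can be illustrated without running a command. In this sketch @words is a made-up array, and the delimiter is restored afterwards so that later interpolation is unaffected.

```perl
@words = ('a', 'b', 'c');

$default = "@words";           # joined with the default delimiter, a space
$" = ',';
$comma = "@words";             # joined with a comma instead
$same = join($", @words);      # the explicit equivalent
$" = ' ';                      # put the default delimiter back

print "$default $comma $same\n";
```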
ae986130 568Within search patterns (which also undergo double-quotish substitution)
a687059c 569there is a bad ambiguity: Is /$foo[bar]/ to be
570interpreted as /${foo}[bar]/ (where [bar] is a character class for the
571regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to
572array @foo)?
573If @foo doesn't otherwise exist, then it's obviously a character class.
574If @foo exists, perl takes a good guess about [bar], and is almost always right.
575If it does guess wrong, or if you're just plain paranoid,
576you can force the correct interpretation with curly brackets as above.
577.PP
578A line-oriented form of quoting is based on the shell here-is syntax.
579Following a << you specify a string to terminate the quoted material, and all lines
580following the current line down to the terminating string are the value
581of the item.
582The terminating string may be either an identifier (a word), or some
583quoted text.
584If quoted, the type of quotes you use determines the treatment of the text,
585just as in regular quoting.
586An unquoted identifier works like double quotes.
587There must be no space between the << and the identifier.
588(If you put a space it will be treated as a null identifier, which is
589valid, and matches the first blank line\*(--see Merry Christmas example below.)
590The terminating string must appear by itself (unquoted and with no surrounding
591whitespace) on the terminating line.
592.nf
593
594 print <<EOF; # same as above
595The price is $Price.
596EOF
597
598 print <<"EOF"; # same as above
599The price is $Price.
600EOF
601
602 print << x 10; # null identifier is delimiter
603Merry Christmas!
604
605 print <<`EOC`; # execute commands
606echo hi there
607echo lo there
608EOC
609
610 print <<foo, <<bar; # you can stack them
611I said foo.
612foo
613I said bar.
614bar
615
616.fi
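The examples above show an unquoted and a double-quoted terminator. A single-quoted terminator, by the rule that the quotes determine the treatment of the text, suppresses substitution. The sketch below contrasts the two; the variable names are illustrative.

```perl
$Price = '$100';

$interpolated = <<EOF;
The price is $Price.
EOF

$literal = <<'EOF';
The price is $Price.
EOF

print $interpolated, $literal;
```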
8d063cd8 617Array literals are denoted by separating individual values by commas, and
618enclosing the list in parentheses.
619In a context not requiring an array value, the value of the array literal
620is the value of the final element, as in the C comma operator.
621For example,
622.nf
623
83b4785a 624.ne 4
a687059c 625 @foo = (\'cc\', \'\-E\', $bar);
8d063cd8 626
627assigns the entire array value to array foo, but
628
a687059c 629 $foo = (\'cc\', \'\-E\', $bar);
8d063cd8 630
631.fi
632assigns the value of variable bar to variable foo.
633Array lists may be assigned to if and only if each element of the list
634is an lvalue:
635.nf
636
637 ($a, $b, $c) = (1, 2, 3);
638
a687059c 639 ($map{\'red\'}, $map{\'blue\'}, $map{\'green\'}) = (0x00f, 0x0f0, 0xf00);
640
641The final element may be an array or an associative array:
642
643 ($a, $b, @rest) = split;
644 local($a, $b, %rest) = @_;
8d063cd8 645
646.fi
a687059c 647You can actually put an array anywhere in the list, but the first array
648in the list will soak up all the values, and anything after it will get
649a null value.
650This may be useful in a local().
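A small sketch of the two behaviours described above: an array literal in a scalar context yields its final element, and the first array in an assigned list soaks up every remaining value. The file name and the variable names are assumptions made for the example.

```perl
$bar = 'prog.c';
$foo = ('cc', '-E', $bar);     # scalar context: value of the final element
@foo = ('cc', '-E', $bar);     # array context: all three values

($first, @middle, $last) = (1, 2, 3, 4);
$middle_last = $#middle;       # @middle took 2, 3 and 4
$last_state = defined($last) ? 'set' : 'null';

print "$foo $#foo $first @middle $last_state\n";
```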
8d063cd8 651.PP
a687059c 652An associative array literal contains pairs of values to be interpreted
653as a key and a value:
654.nf
655
656.ne 2
657 # same as map assignment above
658 %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);
659
660.fi
661Array assignment in a scalar context returns the number of elements
662produced by the expression on the right side of the assignment:
663.nf
664
665 $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2
666
667.fi
8d063cd8 668.PP
669There are several other pseudo-literals that you should know about.
378cc40b 670If a string is enclosed by backticks (grave accents), it first undergoes
671variable substitution just like a double quoted string.
672It is then interpreted as a command, and the output of that command
673is the value of the pseudo-literal, like in a shell.
8d063cd8 674The command is executed each time the pseudo-literal is evaluated.
378cc40b 675The status value of the command is returned in $? (see Predefined Names
676for the interpretation of $?).
677Unlike in \fIcsh\fR, no translation is done on the return
8d063cd8 678data\*(--newlines remain newlines.
378cc40b 679Unlike in any of the shells, single quotes do not hide variable names
680in the command from interpretation.
681To pass a $ through to the shell you need to hide it with a backslash.
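The following sketch exercises the backtick rules, assuming a Unix-style shell with an echo command is available. It shows that the trailing newline is kept, that the status lands in $?, and that single quotes inside the command do not stop perl from substituting a variable.

```perl
$word = 'hello';

$output = `echo $word`;        # perl substitutes $word before the shell runs
$status = $?;                  # status of the command just run
chop($output);                 # the newline from echo is still there

$quoted = `echo '$word'`;      # single quotes do not hide $word from perl
chop($quoted);

print "$output $quoted $status\n";
```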
8d063cd8 682.PP
683Evaluating a filehandle in angle brackets yields the next line
a687059c 684from that file (newline included, so it's never false until EOF, at
685which time an undefined value is returned).
8d063cd8 686Ordinarily you must assign that value to a variable,
687but there is one situation in which an automatic assignment happens.
688If (and only if) the input symbol is the only thing inside the conditional of a
689.I while
690loop, the value is
691automatically assigned to the variable \*(L"$_\*(R".
692(This may seem like an odd thing to you, but you'll use the construct
693in almost every
694.I perl
695script you write.)
696Anyway, the following lines are equivalent to each other:
697.nf
698
a687059c 699.ne 5
700 while ($_ = <STDIN>) { print; }
701 while (<STDIN>) { print; }
702 for (\|;\|<STDIN>;\|) { print; }
703 print while $_ = <STDIN>;
704 print while <STDIN>;
8d063cd8 705
706.fi
707The filehandles
a687059c 708.IR STDIN ,
709.I STDOUT
710and
711.I STDERR
712are predefined.
713(The filehandles
8d063cd8 714.IR stdin ,
715.I stdout
716and
717.I stderr
a687059c 718will also work except in packages, where they would be interpreted as
719local identifiers rather than global.)
8d063cd8 720Additional filehandles may be created with the
721.I open
722function.
723.PP
378cc40b 724If a <FILEHANDLE> is used in a context that is looking for an array, an array
725consisting of all the input lines is returned, one line per array element.
726It's easy to make a LARGE data space this way, so use with care.
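To compare the two contexts, the sketch below writes a three-line scratch file, reads one line in a scalar context and the remainder in an array context. The file name under /tmp is an assumption; any writable location would do.

```perl
$file = "/tmp/perlman.$$";     # hypothetical scratch file name
open(OUT, ">$file") || die "Can't create $file: $!";
print OUT "one\ntwo\nthree\n";
close(OUT);

open(IN, $file) || die "Can't open $file: $!";
$first = <IN>;                 # scalar context: one line, newline included
@rest = <IN>;                  # array context: all remaining lines
close(IN);
unlink($file);

print "first: $first";
print "rest:  @rest";
```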
727.PP
8d063cd8 728The null filehandle <> is special and can be used to emulate the behavior of
729\fIsed\fR and \fIawk\fR.
730Input from <> comes either from standard input, or from each file listed on
731the command line.
732Here's how it works: the first time <> is evaluated, the ARGV array is checked,
a687059c 733and if it is null, $ARGV[0] is set to \'\-\', which when opened gives you standard
8d063cd8 734input.
735The ARGV array is then processed as a list of filenames.
736The loop
737.nf
738
739.ne 3
740 while (<>) {
741 .\|.\|. # code for each line
742 }
743
744.ne 10
745is equivalent to
746
a687059c 747 unshift(@ARGV, \'\-\') \|if \|$#ARGV < $[;
8d063cd8 748 while ($ARGV = shift) {
749 open(ARGV, $ARGV);
750 while (<ARGV>) {
751 .\|.\|. # code for each line
752 }
753 }
754
755.fi
756except that it isn't as cumbersome to say.
757It really does shift array ARGV and put the current filename into
758variable ARGV.
759It also uses filehandle ARGV internally.
760You can modify @ARGV before the first <> as long as you leave the first
761filename at the beginning of the array.
83b4785a 762Line numbers ($.) continue as if the input was one big happy file.
378cc40b 763(But see example under eof for how to reset line numbers on each file.)
8d063cd8 764.PP
83b4785a 765.ne 5
378cc40b 766If you want to set @ARGV to your own list of files, go right ahead.
8d063cd8 767If you want to pass switches into your script, you can
768put a loop on the front like this:
769.nf
770
771.ne 10
772 while ($_ = $ARGV[0], /\|^\-/\|) {
773 shift;
774 last if /\|^\-\|\-$\|/\|;
775 /\|^\-D\|(.*\|)/ \|&& \|($debug = $1);
776 /\|^\-v\|/ \|&& \|$verbose++;
777 .\|.\|. # other switches
778 }
779 while (<>) {
780 .\|.\|. # code for each line
781 }
782
783.fi
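To see what the switch loop leaves behind, this sketch runs it against a made-up argument list instead of a real command line. The file names and switch values are assumptions, and shift is given @ARGV explicitly so the fragment behaves the same wherever it is placed.

```perl
@ARGV = ('-D3', '-v', '--', 'main.c', 'util.c');   # stand-in command line
$debug = 0;
$verbose = 0;

while ($_ = $ARGV[0], /^-/) {
    shift(@ARGV);
    last if /^--$/;                # a bare -- ends the switches
    /^-D(.*)/ && ($debug = $1);
    /^-v/ && $verbose++;
}

print "debug=$debug verbose=$verbose files=@ARGV\n";
```

After the loop only the filenames remain in @ARGV, ready to be read through <>.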
784The <> symbol will return FALSE only once.
785If you call it again after this it will assume you are processing another
a687059c 786@ARGV list, and if you haven't set @ARGV, will input from
787.IR STDIN .
378cc40b 788.PP
789If the string inside the angle brackets is a reference to a scalar variable
790(e.g. <$foo>),
791then that variable contains the name of the filehandle to input from.
792.PP
793If the string inside angle brackets is not a filehandle, it is interpreted
794as a filename pattern to be globbed, and either an array of filenames or the
795next filename in the list is returned, depending on context.
796One level of $ interpretation is done first, but you can't say <$foo>
797because that's an indirect filehandle as explained in the previous
798paragraph.
799You could insert curly brackets to force interpretation as a
800filename glob: <${foo}>.
801Example:
802.nf
803
804.ne 3
805 while (<*.c>) {
a687059c 806 chmod 0644, $_;
378cc40b 807 }
808
809is equivalent to
810
811.ne 5
a687059c 812 open(foo, "echo *.c | tr \-s \' \et\er\ef\' \'\e\e012\e\e012\e\e012\e\e012\'|");
378cc40b 813 while (<foo>) {
814 chop;
a687059c 815 chmod 0644, $_;
378cc40b 816 }
817
818.fi
819In fact, it's currently implemented that way.
a687059c 820(Which means it will not work on filenames with spaces in them unless
821you have /bin/csh on your machine.)
378cc40b 822Of course, the shortest way to do the above is:
823.nf
824
a687059c 825 chmod 0644, <*.c>;
378cc40b 826
827.fi
8d063cd8 828.Sh "Syntax"
829.PP
830A
831.I perl
832script consists of a sequence of declarations and commands.
833The only things that need to be declared in
834.I perl
835are report formats and subroutines.
836See the sections below for more information on those declarations.
ffed7fef 837All uninitialized user-created objects are assumed to
a687059c 838start with a null or 0 value until they
839are defined by some explicit operation such as assignment.
8d063cd8 840The sequence of commands is executed just once, unlike in
841.I sed
842and
843.I awk
844scripts, where the sequence of commands is executed for each input line.
845While this means that you must explicitly loop over the lines of your input file
846(or files), it also means you have much more control over which files and which
847lines you look at.
848(Actually, I'm lying\*(--it is possible to do an implicit loop with either the
849.B \-n
850or
851.B \-p
852switch.)
853.PP
854A declaration can be put anywhere a command can, but has no effect on the
a687059c 855execution of the primary sequence of commands\*(--declarations all take effect
856at compile time.
8d063cd8 857Typically all the declarations are put at the beginning or the end of the script.
858.PP
859.I Perl
860is, for the most part, a free-form language.
861(The only exception to this is format declarations, for fairly obvious reasons.)
862Comments are indicated by the # character, and extend to the end of the line.
863If you attempt to use /* */ C comments, it will be interpreted either as
864division or pattern matching, depending on the context.
865So don't do that.
866.Sh "Compound statements"
867In
868.IR perl ,
869a sequence of commands may be treated as one command by enclosing it
870in curly brackets.
871We will call this a BLOCK.
872.PP
873The following compound commands may be used to control flow:
874.nf
875
876.ne 4
877 if (EXPR) BLOCK
878 if (EXPR) BLOCK else BLOCK
378cc40b 879 if (EXPR) BLOCK elsif (EXPR) BLOCK .\|.\|. else BLOCK
8d063cd8 880 LABEL while (EXPR) BLOCK
881 LABEL while (EXPR) BLOCK continue BLOCK
882 LABEL for (EXPR; EXPR; EXPR) BLOCK
378cc40b 883 LABEL foreach VAR (ARRAY) BLOCK
8d063cd8 884 LABEL BLOCK continue BLOCK
885
886.fi
83b4785a 887Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not
8d063cd8 888statements.
889This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed.
890If you want to write conditionals without curly brackets there are several
891other ways to do it.
892The following all do the same thing:
893.nf
894
895.ne 5
	if (!open(foo)) { die "Can't open $foo: $!"; }
	die "Can't open $foo: $!" unless open(foo);
	open(foo) || die "Can't open $foo: $!";	# foo or bust!
	open(foo) ? \'hi mom\' : die "Can't open $foo: $!";
				# a bit exotic, that last one
8d063cd8 901
902.fi
8d063cd8 903.PP
904The
905.I if
906statement is straightforward.
907Since BLOCKs are always bounded by curly brackets, there is never any
908ambiguity about which
909.I if
910an
911.I else
912goes with.
913If you use
914.I unless
915in place of
916.IR if ,
917the sense of the test is reversed.
918.PP
919The
920.I while
921statement executes the block as long as the expression is true
922(does not evaluate to the null string or 0).
923The LABEL is optional, and if present, consists of an identifier followed by
924a colon.
925The LABEL identifies the loop for the loop control statements
926.IR next ,
a687059c 927.IR last ,
8d063cd8 928and
929.I redo
930(see below).
931If there is a
932.I continue
933BLOCK, it is always executed just before
934the conditional is about to be evaluated again, similarly to the third part
935of a
936.I for
937loop in C.
938Thus it can be used to increment a loop variable, even when the loop has
939been continued via the
940.I next
941statement (similar to the C \*(L"continue\*(R" statement).
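.PP
As a minimal sketch of that behavior (the variable names are made up for
illustration, and it was tried only in the head against a present-day perl,
not checked on 3.0), the loop below skips even numbers with
.I next
yet still terminates, because the
.I continue
BLOCK runs on every pass:
.nf

```perl
# Sum the odd numbers below 10.
# $i, $sum and @seen are illustrative names, not part of perl.
$i = 0;
$sum = 0;
@seen = ();
while ($i < 10) {
    next if $i % 2 == 0;    # even: jump straight to the continue BLOCK
    $sum += $i;
    push(@seen, $i);
} continue {
    $i++;                   # runs on every pass, including after "next"
}
print "sum=$sum seen=@seen\n";    # prints: sum=25 seen=1 3 5 7 9
```

.fi
Had the increment been written in the loop body after the
.I next
instead, the loop would spin forever on 0.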
942.PP
943If the word
944.I while
945is replaced by the word
946.IR until ,
947the sense of the test is reversed, but the conditional is still tested before
948the first iteration.
949.PP
950In either the
951.I if
952or the
953.I while
954statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional
955is true if the value of the last command in that block is true.
956.PP
957The
958.I for
959loop works exactly like the corresponding
960.I while
961loop:
962.nf
963
964.ne 12
965 for ($i = 1; $i < 10; $i++) {
966 .\|.\|.
967 }
968
969is the same as
970
971 $i = 1;
972 while ($i < 10) {
973 .\|.\|.
974 } continue {
975 $i++;
976 }
977.fi
978.PP
378cc40b 979The foreach loop iterates over a normal array value and sets the variable
980VAR to be each element of the array in turn.
13281fa4 981The \*(L"foreach\*(R" keyword is actually identical to the \*(L"for\*(R" keyword,
982so you can use \*(L"foreach\*(R" for readability or \*(L"for\*(R" for brevity.
378cc40b 983If VAR is omitted, $_ is set to each value.
984If ARRAY is an actual array (as opposed to an expression returning an array
985value), you can modify each element of the array
986by modifying VAR inside the loop.
987Examples:
988.nf
989
990.ne 5
991 for (@ary) { s/foo/bar/; }
992
993 foreach $elem (@elements) {
994 $elem *= 2;
995 }
996
a687059c 997.ne 3
998 for ((10,9,8,7,6,5,4,3,2,1,\'BOOM\')) {
999 print $_, "\en"; sleep(1);
378cc40b 1000 }
1001
a687059c 1002 for (1..15) { print "Merry Christmas\en"; }
1003
378cc40b 1004.ne 3
	foreach $item (split(/:[\e\e\en:]*/, $ENV{\'TERMCAP\'})) {
		print "Item: $item\en";
	}
a687059c 1008
378cc40b 1009.fi
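.PP
To see the effect of modifying VAR on the array itself, here is a small
sketch (the array name is hypothetical, and a present-day perl is assumed)
that alters an actual array first through a named loop variable and then
through $_:
.nf

```perl
# @prices is an illustrative array.
@prices = (1, 2, 3);

foreach $p (@prices) {      # $p stands for each element in turn
    $p *= 2;                # so this doubles the element itself
}

for (@prices) {             # VAR omitted: $_ is used instead
    $_ += 1;
}

print "@prices\n";          # prints: 3 5 7
```

.fi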
1010.PP
8d063cd8 1011The BLOCK by itself (labeled or not) is equivalent to a loop that executes
1012once.
1013Thus you can use any of the loop control statements in it to leave or
1014restart the block.
1015The
1016.I continue
1017block is optional.
1018This construct is particularly nice for doing case structures.
1019.nf
1020
1021.ne 6
1022 foo: {
a687059c 1023 if (/^abc/) { $abc = 1; last foo; }
1024 if (/^def/) { $def = 1; last foo; }
1025 if (/^xyz/) { $xyz = 1; last foo; }
8d063cd8 1026 $nothing = 1;
1027 }
1028
1029.fi
a687059c 1030There is no official switch statement in perl, because there
1031are already several ways to write the equivalent.
1032In addition to the above, you could write
378cc40b 1033.nf
1034
a687059c 1035.ne 6
1036 foo: {
ffed7fef 1037 $abc = 1, last foo if /^abc/;
1038 $def = 1, last foo if /^def/;
1039 $xyz = 1, last foo if /^xyz/;
a687059c 1040 $nothing = 1;
1041 }
1042
1043or
1044
1045.ne 6
	foo: {
	    /^abc/ && do { $abc = 1; last foo; };
	    /^def/ && do { $def = 1; last foo; };
	    /^xyz/ && do { $xyz = 1; last foo; };
	    $nothing = 1;
	}
1052
1053or
1054
1055.ne 6
1056 foo: {
1057 /^abc/ && ($abc = 1, last foo);
1058 /^def/ && ($def = 1, last foo);
1059 /^xyz/ && ($xyz = 1, last foo);
1060 $nothing = 1;
1061 }
1062
1063or even
1064
378cc40b 1065.ne 8
	if (/^abc/)
	    { $abc = 1; }
	elsif (/^def/)
	    { $def = 1; }
	elsif (/^xyz/)
	    { $xyz = 1; }
	else
	    { $nothing = 1; }
378cc40b 1074
1075.fi
a687059c 1076As it happens, these are all optimized internally to a switch structure,
1077so perl jumps directly to the desired statement, and you needn't worry
1078about perl executing a lot of unnecessary statements when you have a string
1079of 50 elsifs, as long as you are testing the same simple scalar variable
1080using ==, eq, or pattern matching as above.
1081(If you're curious as to whether the optimizer has done this for a particular
1082case statement, you can use the \-D1024 switch to list the syntax tree
1083before execution.)
8d063cd8 1084.Sh "Simple statements"
1085The only kind of simple statement is an expression evaluated for its side
1086effects.
1087Every expression (simple statement) must be terminated with a semicolon.
1088Note that this is like C, but unlike Pascal (and
1089.IR awk ).
1090.PP
1091Any simple statement may optionally be followed by a
1092single modifier, just before the terminating semicolon.
1093The possible modifiers are:
1094.nf
1095
1096.ne 4
1097 if EXPR
1098 unless EXPR
1099 while EXPR
1100 until EXPR
1101
1102.fi
1103The
1104.I if
1105and
1106.I unless
1107modifiers have the expected semantics.
1108The
1109.I while
1110and
378cc40b 1111.I until
8d063cd8 1112modifiers also have the expected semantics (conditional evaluated first),
1113except when applied to a do-BLOCK command,
1114in which case the block executes once before the conditional is evaluated.
1115This is so that you can write loops like:
1116.nf
1117
1118.ne 4
1119 do {
a687059c 1120 $_ = <STDIN>;
8d063cd8 1121 .\|.\|.
1122 } until $_ \|eq \|".\|\e\|n";
1123
1124.fi
1125(See the
1126.I do
1127operator below. Note also that the loop control commands described later will
83b4785a 1128NOT work in this construct, since modifiers don't take loop labels.
8d063cd8 1129Sorry.)
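.PP
The difference between the two cases can be sketched as follows (the
counter names are illustrative, and the behavior shown is that of a
present-day perl, assumed rather than verified for 3.0).
Both conditions are false from the start, so only the do-BLOCK gets to run:
.nf

```perl
# $plain and $block are illustrative counters.
$plain = 0;
$plain++ while $plain < 0;            # tested first, so the statement never runs

$block = 0;
do { $block++; } while $block < 0;    # the block runs once before the test

print "plain=$plain block=$block\n";  # prints: plain=0 block=1
```

.fi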
1130.Sh "Expressions"
1131Since
1132.I perl
1133expressions work almost exactly like C expressions, only the differences
1134will be mentioned here.
1135.PP
1136Here's what
1137.I perl
1138has that C doesn't:
a687059c 1139.Ip ** 8 2
1140The exponentiation operator.
1141.Ip **= 8
1142The exponentiation assignment operator.
8d063cd8 1143.Ip (\|) 8 3
1144The null list, used to initialize an array to null.
1145.Ip . 8
1146Concatenation of two strings.
1147.Ip .= 8
a687059c 1148The concatenation assignment operator.
8d063cd8 1149.Ip eq 8
1150String equality (== is numeric equality).
1151For a mnemonic just think of \*(L"eq\*(R" as a string.
1152(If you are used to the
1153.I awk
1154behavior of using == for either string or numeric equality
1155based on the current form of the comparands, beware!
1156You must be explicit here.)
1157.Ip ne 8
1158String inequality (!= is numeric inequality).
1159.Ip lt 8
1160String less than.
1161.Ip gt 8
1162String greater than.
1163.Ip le 8
1164String less than or equal.
1165.Ip ge 8
1166String greater than or equal.
1167.Ip =~ 8 2
1168Certain operations search or modify the string \*(L"$_\*(R" by default.
1169This operator makes that kind of operation work on some other string.
1170The right argument is a search pattern, substitution, or translation.
1171The left argument is what is supposed to be searched, substituted, or
1172translated instead of the default \*(L"$_\*(R".
1173The return value indicates the success of the operation.
1174(If the right argument is an expression other than a search pattern,
1175substitution, or translation, it is interpreted as a search pattern
1176at run time.
1177This is less efficient than an explicit search, since the pattern must
1178be compiled every time the expression is evaluated.)
1179The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else.
1180.Ip !~ 8
1181Just like =~ except the return value is negated.
1182.Ip x 8
1183The repetition operator.
1184Returns a string consisting of the left operand repeated the
1185number of times specified by the right operand.
1186.nf
1187
a687059c 1188 print \'\-\' x 80; # print row of dashes
1189 print \'\-\' x80; # illegal, x80 is identifier
8d063cd8 1190
a687059c 1191 print "\et" x ($tab/8), \' \' x ($tab%8); # tab over
8d063cd8 1192
1193.fi
1194.Ip x= 8
a687059c 1195The repetition assignment operator.
1196.Ip .\|. 8
1197The range operator, which is really two different operators depending
1198on the context.
1199In an array context, returns an array of values counting (by ones)
1200from the left value to the right value.
1201This is useful for writing \*(L"for (1..10)\*(R" loops and for doing
1202slice operations on arrays.
1203.Sp
1204In a scalar context, .\|. returns a boolean value.
The operator is bistable, like a flip-flop.
1206Each .\|. operator maintains its own boolean state.
378cc40b 1207It is false as long as its left operand is false.
1208Once the left operand is true, the range operator stays true
1209until the right operand is true,
1210AFTER which the range operator becomes false again.
a687059c 1211(It doesn't become false till the next time the range operator is evaluated.
8d063cd8 1212It can become false on the same evaluation it became true, but it still returns
1213true once.)
13281fa4 1214The right operand is not evaluated while the operator is in the \*(L"false\*(R" state,
1215and the left operand is not evaluated while the operator is in the \*(L"true\*(R" state.
a687059c 1216The scalar .\|. operator is primarily intended for doing line number ranges
1217after
8d063cd8 1218the fashion of \fIsed\fR or \fIawk\fR.
1219The precedence is a little lower than || and &&.
1220The value returned is either the null string for false, or a sequence number
1221(beginning with 1) for true.
1222The sequence number is reset for each range encountered.
a687059c 1223The final sequence number in a range has the string \'E0\' appended to it, which
8d063cd8 1224doesn't affect its numeric value, but gives you something to search for if you
1225want to exclude the endpoint.
1226You can exclude the beginning point by waiting for the sequence number to be
1227greater than 1.
a687059c 1228If either operand of scalar .\|. is static, that operand is implicitly compared
1229to the $. variable, the current line number.
8d063cd8 1230Examples:
1231.nf
1232
a687059c 1233.ne 6
1234As a scalar operator:
1235 if (101 .\|. 200) { print; } # print 2nd hundred lines
8d063cd8 1236
a687059c 1237 next line if (1 .\|. /^$/); # skip header lines
8d063cd8 1238
a687059c 1239 s/^/> / if (/^$/ .\|. eof()); # quote body
1240
1241.ne 4
1242As an array operator:
1243 for (101 .\|. 200) { print; } # print $_ 100 times
1244
1245 @foo = @foo[$[ .\|. $#foo]; # an expensive no-op
1246 @foo = @foo[$#foo-4 .\|. $#foo]; # slice last 5 items
8d063cd8 1247
1248.fi
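.Sp
The sequence numbers and the E0 marker described above are easier to see
when printed.
The following is only a sketch: the data and variable names are invented,
the operands are pattern matches rather than static line numbers, and it
assumes a present-day perl.
.nf

```perl
# Pick out the lines from BEGIN through END, then only the lines between.
# @lines, @marked and @inner are illustrative names.
@lines = ("head\n", "BEGIN\n", "a\n", "b\n", "END\n", "tail\n");
@marked = ();
@inner = ();
foreach $line (@lines) {
    # Scalar context: the operator remembers its state between passes.
    $seq = ($line =~ /^BEGIN/) .. ($line =~ /^END/);
    next unless $seq;                   # null string outside the range
    push(@marked, "$seq $line");
    # Drop the starting line (sequence 1) and the ending one (E0 suffix).
    push(@inner, $line) if $seq > 1 && $seq !~ /E0$/;
}
print @marked;    # 1 BEGIN, 2 a, 3 b, 4E0 END, one per line
print @inner;     # a, b
```

.fi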
378cc40b 1249.Ip \-x 8
1250A file test.
1251This unary operator takes one argument, either a filename or a filehandle,
1252and tests the associated file to see if something is true about it.
a687059c 1253If the argument is omitted, tests $_, except for \-t, which tests
1254.IR STDIN .
1255It returns 1 for true and \'\' for false, or the undefined value if the
1256file doesn't exist.
378cc40b 1257Precedence is higher than logical and relational operators, but lower than
1258arithmetic operators.
1259The operator may be any of:
1260.nf
1261 \-r File is readable by effective uid.
a687059c 1262 \-w File is writable by effective uid.
378cc40b 1263 \-x File is executable by effective uid.
1264 \-o File is owned by effective uid.
1265 \-R File is readable by real uid.
a687059c 1266 \-W File is writable by real uid.
378cc40b 1267 \-X File is executable by real uid.
1268 \-O File is owned by real uid.
1269 \-e File exists.
1270 \-z File has zero size.
1271 \-s File has non-zero size.
1272 \-f File is a plain file.
1273 \-d File is a directory.
1274 \-l File is a symbolic link.
1275 \-p File is a named pipe (FIFO).
1276 \-S File is a socket.
1277 \-b File is a block special file.
1278 \-c File is a character special file.
1279 \-u File has setuid bit set.
1280 \-g File has setgid bit set.
1281 \-k File has sticky bit set.
1282 \-t Filehandle is opened to a tty.
1283 \-T File is a text file.
1284 \-B File is a binary file (opposite of \-T).
1285
1286.fi
1287The interpretation of the file permission operators \-r, \-R, \-w, \-W, \-x and \-X
1288is based solely on the mode of the file and the uids and gids of the user.
1289There may be other reasons you can't actually read, write or execute the file.
1290Also note that, for the superuser, \-r, \-R, \-w and \-W always return 1, and
1291\-x and \-X return 1 if any execute bit is set in the mode.
1292Scripts run by the superuser may thus need to do a stat() in order to determine
1293the actual mode of the file, or temporarily set the uid to something else.
1294.Sp
1295Example:
1296.nf
1297.ne 7
1298
1299 while (<>) {
1300 chop;
1301 next unless \-f $_; # ignore specials
1302 .\|.\|.
1303 }
1304
1305.fi
a687059c 1306Note that \-s/a/b/ does not do a negated substitution.
1307Saying \-exp($foo) still works as expected, however\*(--only single letters
378cc40b 1308following a minus are interpreted as file tests.
1309.Sp
1310The \-T and \-B switches work as follows.
1311The first block or so of the file is examined for odd characters such as
1312strange control codes or metacharacters.
1313If too many odd characters (>10%) are found, it's a \-B file, otherwise it's a \-T file.
1314Also, any file containing null in the first block is considered a binary file.
1315If \-T or \-B is used on a filehandle, the current stdio buffer is examined
1316rather than the first block.
378cc40b 1317Both \-T and \-B return TRUE on a null file, or a file at EOF when testing
1318a filehandle.
8d063cd8 1319.PP
a687059c 1320If any of the file tests (or either stat operator) are given the special
1321filehandle consisting of a solitary underline, then the stat structure
1322of the previous file test (or stat operator) is used, saving a system
1323call.
(This doesn't work with \-t, and you need to remember that lstat and \-l
1325will leave values in the stat structure for the symbolic link, not the
1326real file.)
1327Example:
1328.nf
1329
1330 print "Can do.\en" if -r $a || -w _ || -x _;
1331
1332.ne 9
1333 stat($filename);
1334 print "Readable\en" if -r _;
1335 print "Writable\en" if -w _;
1336 print "Executable\en" if -x _;
1337 print "Setuid\en" if -u _;
1338 print "Setgid\en" if -g _;
1339 print "Sticky\en" if -k _;
1340 print "Text\en" if -T _;
1341 print "Binary\en" if -B _;
1342
1343.fi
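.PP
A self-contained sketch of the same idea follows.
It is an illustration, not a recipe: the scratch file name is hypothetical,
/tmp is assumed to exist and be writable, and a present-day perl is assumed.
The first test does the stat, and the tests on the solitary underline reuse it:
.nf

```perl
# $file is a hypothetical scratch path; /tmp is assumed to be writable.
$file = "/tmp/perl_underline_demo.$$";
open(TMP, ">$file") || die "Can't create $file: $!";
print TMP "hello\n";
close(TMP);

@facts = ();
if (-e $file) {                         # this test does the stat
    push(@facts, "plain")     if -f _;  # these reuse its result
    push(@facts, "non-empty") if -s _;
    push(@facts, "directory") if -d _;
}
unlink($file);
print "@facts\n";                       # prints: plain non-empty
```

.fi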
1344.PP
8d063cd8 1345Here is what C has that
1346.I perl
1347doesn't:
1348.Ip "unary &" 12
1349Address-of operator.
1350.Ip "unary *" 12
1351Dereference-address operator.
378cc40b 1352.Ip "(TYPE)" 12
1353Type casting operator.
8d063cd8 1354.PP
1355Like C,
1356.I perl
1357does a certain amount of expression evaluation at compile time, whenever
1358it determines that all of the arguments to an operator are static and have
1359no side effects.
1360In particular, string concatenation happens at compile time between literals that don't do variable substitution.
1361Backslash interpretation also happens at compile time.
1362You can say
1363.nf
1364
1365.ne 2
a687059c 1366 \'Now is the time for all\' . "\|\e\|n" .
1367 \'good men to come to.\'
8d063cd8 1368
1369.fi
1370and this all reduces to one string internally.
1371.PP
378cc40b 1372The autoincrement operator has a little extra built-in magic to it.
1373If you increment a variable that is numeric, or that has ever been used in
1374a numeric context, you get a normal increment.
1375If, however, the variable has only been used in string contexts since it
1376was set, and has a value that is not null and matches the
a687059c 1377pattern /^[a\-zA\-Z]*[0\-9]*$/, the increment is done
378cc40b 1378as a string, preserving each character within its range, with carry:
1379.nf
1380
a687059c 1381 print ++($foo = \'99\'); # prints \*(L'100\*(R'
1382 print ++($foo = \'a0\'); # prints \*(L'a1\*(R'
1383 print ++($foo = \'Az\'); # prints \*(L'Ba\*(R'
1384 print ++($foo = \'zz\'); # prints \*(L'aaa\*(R'
378cc40b 1385
1386.fi
1387The autodecrement is not magical.
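.PP
The contrast can be sketched like this (all names are illustrative, and the
results shown are those of a present-day perl, assumed to match the rules
given above):
.nf

```perl
# @steps collects "before becomes after" strings; all names are illustrative.
@steps = ();
foreach $start ('a9', 'Az', 'Zz') {
    $copy = $start;         # only ever used as a string
    $copy++;                # magical string increment, with carry
    push(@steps, "$start becomes $copy");
}
print join(", ", @steps), "\n";
# prints: a9 becomes b0, Az becomes Ba, Zz becomes AAa

$down = 'ab';
$down--;                    # no magic: 'ab' is treated as the number 0
print "$down\n";            # prints: -1
```

.fi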