.rn '' }`
''' $Header: perl.man.1,v 3.0 89/10/18 15:21:29 lwall Locked $
'''
''' $Log: perl.man.1,v $
''' Revision 3.0 89/10/18 15:21:29 lwall
''' 3.0 baseline
'''
'''
.de Sh
.br
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp
.if t .sp .5v
.if n .sp
..
.de Ip
.br
.ie \\n.$>=3 .ne \\$3
.el .ne 3
.IP "\\$1" \\$2
..
'''
''' Set up \*(-- to give an unbreakable dash;
''' string Tr holds user defined translation string.
''' Bell System Logo is used as a dummy character.
'''
.tr \(*W-|\(bv\*(Tr
.ie n \{\
.ds -- \(*W-
.if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.ds L" ""
.ds R" ""
.ds L' '
.ds R' '
'br\}
.el\{\
.ds -- \(em\|
.tr \*(Tr
.ds L" ``
.ds R" ''
.ds L' `
.ds R' '
'br\}
.TH PERL 1 "\*(RP"
.UC
.SH NAME
perl \- Practical Extraction and Report Language
.SH SYNOPSIS
.B perl
[options] filename args
.SH DESCRIPTION
.I Perl
is an interpreted language optimized for scanning arbitrary text files,
extracting information from those text files, and printing reports based
on that information.
It's also a good language for many system management tasks.
The language is intended to be practical (easy to use, efficient, complete)
rather than beautiful (tiny, elegant, minimal).
It combines (in the author's opinion, anyway) some of the best features of C,
\fIsed\fR, \fIawk\fR, and \fIsh\fR,
so people familiar with those languages should have little difficulty with it.
(Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
even BASIC-PLUS.)
Expression syntax corresponds quite closely to C expression syntax.
Unlike most Unix utilities,
.I perl
does not arbitrarily limit the size of your data\*(--if you've got
the memory,
.I perl
can slurp in your whole file as a single string.
Recursion is of unlimited depth.
And the hash tables used by associative arrays grow as necessary to prevent
degraded performance.
.I Perl
uses sophisticated pattern matching techniques to scan large amounts of
data very quickly.
Although optimized for scanning text,
.I perl
can also deal with binary data, and can make dbm files look like associative
arrays (where dbm is available).
Setuid
.I perl
scripts are safer than C programs
through a dataflow tracing mechanism which prevents many stupid security holes.
If you have a problem that would ordinarily use \fIsed\fR
or \fIawk\fR or \fIsh\fR, but it
exceeds their capabilities or must run a little faster,
and you don't want to write the silly thing in C, then
.I perl
may be for you.
There are also translators to turn your
.I sed
and
.I awk
scripts into
.I perl
scripts.
OK, enough hype.
.PP
Upon startup,
.I perl
looks for your script in one of the following places:
.Ip 1. 4 2
Specified line by line via
.B \-e
switches on the command line.
.Ip 2. 4 2
Contained in the file specified by the first filename on the command line.
(Note that systems supporting the #! notation invoke interpreters this way.)
.Ip 3. 4 2
Passed in implicitly via standard input.
This only works if there are no filename arguments\*(--to pass
arguments to a
.I stdin
script you must explicitly specify a \- for the script name.
.PP
After locating your script,
.I perl
compiles it to an internal form.
If the script is syntactically correct, it is executed.
.Sh "Options"
Note: on first reading this section may not make much sense to you.
It's here at the front for easy reference.
.PP
A single-character option may be combined with the following option, if any.
This is particularly useful when invoking a script using the #! construct, which
only allows one argument.
Example:
.nf

.ne 2
    #!/usr/bin/perl \-spi.bak    # same as \-s \-p \-i.bak
    .\|.\|.

.fi
Options include:
.TP 5
.B \-a
turns on autosplit mode when used with a
.B \-n
or
.BR \-p .
An implicit split command to the @F array
is done as the first thing inside the implicit while loop produced by
the
.B \-n
or
.B \-p
switch.
.nf

    perl \-ane \'print pop(@F), "\en";\'

is equivalent to

    while (<>) {
        @F = split(\' \');
        print pop(@F), "\en";
    }

.fi
.TP 5
.BI \-d
runs the script under the perl debugger.
See the section on Debugging.
.TP 5
.BI \-D number
sets debugging flags.
To watch how it executes your script, use
.BR \-D14 .
(This only works if debugging is compiled into your
.IR perl .)
Another nice value is \-D1024, which lists your compiled syntax tree.
And \-D512 displays compiled regular expressions.
.TP 5
.BI \-e " commandline"
may be used to enter one line of script.
Multiple
.B \-e
commands may be given to build up a multi-line script.
If
.B \-e
is given,
.I perl
will not look for a script filename in the argument list.
.TP 5
.BI \-i extension
specifies that files processed by the <> construct are to be edited
in-place.
It does this by renaming the input file, opening the output file by the
same name, and selecting that output file as the default for print statements.
The extension, if supplied, is added to the name of the
old file to make a backup copy.
If no extension is supplied, no backup is made.
Saying \*(L"perl \-p \-i.bak \-e \'s/foo/bar/;\' .\|.\|.\|\*(R" is the same as using
the script:
.nf

.ne 2
    #!/usr/bin/perl \-pi.bak
    s/foo/bar/;

which is equivalent to

.ne 14
    #!/usr/bin/perl
    while (<>) {
        if ($ARGV ne $oldargv) {
            rename($ARGV, $ARGV . \'.bak\');
            open(ARGVOUT, ">$ARGV");
            select(ARGVOUT);
            $oldargv = $ARGV;
        }
        s/foo/bar/;
    }
    continue {
        print;    # this prints to original filename
    }
    select(STDOUT);

.fi
except that the
.B \-i
form doesn't need to compare $ARGV to $oldargv to know when
the filename has changed.
It does, however, use ARGVOUT for the selected filehandle.
Note that
.I STDOUT
is restored as the default output filehandle after the loop.
.Sp
You can use eof to locate the end of each input file, in case you want
to append to each file, or reset line numbering (see example under eof).
.TP 5
.BI \-I directory
may be used in conjunction with
.B \-P
to tell the C preprocessor where to look for include files.
By default /usr/include and /usr/lib/perl are searched.
.TP 5
.B \-n
causes
.I perl
to assume the following loop around your script, which makes it iterate
over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR:
.nf

.ne 3
    while (<>) {
        .\|.\|.        # your script goes here
    }

.fi
Note that the lines are not printed by default.
See
.B \-p
to have lines printed.
Here is an efficient way to delete all files older than a week:
.nf

    find . \-mtime +7 \-print | perl \-ne \'chop;unlink;\'

.fi
This is faster than using the \-exec switch of find because you don't have to
start a process on every filename found.
.TP 5
.B \-p
causes
.I perl
to assume the following loop around your script, which makes it iterate
over filename arguments somewhat like \fIsed\fR:
.nf

.ne 5
    while (<>) {
        .\|.\|.        # your script goes here
    } continue {
        print;
    }

.fi
Note that the lines are printed automatically.
To suppress printing use the
.B \-n
switch.
A
.B \-p
overrides a
.B \-n
switch.
.TP 5
.B \-P
causes your script to be run through the C preprocessor before
compilation by
.IR perl .
(Since both comments and cpp directives begin with the # character,
you should avoid starting comments with any words recognized
by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
.TP 5
.B \-s
enables some rudimentary switch parsing for switches on the command line
after the script name but before any filename arguments (or before a \-\|\-).
Any switch found there is removed from @ARGV and sets the corresponding variable in the
.I perl
script.
The following script prints \*(L"true\*(R" if and only if the script is
invoked with a \-xyz switch.
.nf

.ne 2
    #!/usr/bin/perl \-s
    if ($xyz) { print "true\en"; }

.fi
.TP 5
.B \-S
makes
.I perl
use the PATH environment variable to search for the script
(unless the name of the script starts with a slash).
Typically this is used to emulate #! startup on machines that don't
support #!, in the following manner:
.nf

    #!/usr/bin/perl
    eval "exec /usr/bin/perl \-S $0 $*"
        if $running_under_some_shell;

.fi
The system ignores the first line and feeds the script to /bin/sh,
which proceeds to try to execute the
.I perl
script as a shell script.
The shell executes the second line as a normal shell command, and thus
starts up the
.I perl
interpreter.
On some systems $0 doesn't always contain the full pathname,
so the
.B \-S
tells
.I perl
to search for the script if necessary.
After
.I perl
locates the script, it parses the lines and ignores them because
the variable $running_under_some_shell is never true.
.TP 5
.B \-u
causes
.I perl
to dump core after compiling your script.
You can then take this core dump and turn it into an executable file
by using the undump program (not supplied).
This speeds startup at the expense of some disk space (which you can
minimize by stripping the executable).
(Still, a \*(L"hello world\*(R" executable comes out to about 200K on my machine.)
If you are going to run your executable as a set-id program then you
should probably compile it using taintperl rather than normal perl.
If you want to execute a portion of your script before dumping, use the
dump operator instead.
.TP 5
.B \-U
allows
.I perl
to do unsafe operations.
Currently the only \*(L"unsafe\*(R" operation is the unlinking of directories while
running as superuser.
.TP 5
.B \-v
prints the version and patchlevel of your
.I perl
executable.
.TP 5
.B \-w
prints warnings about identifiers that are mentioned only once, and scalar
variables that are used before being set.
Also warns about redefined subroutines, and references to undefined
filehandles or filehandles opened readonly that you are attempting to
write on.
Also warns you if you use == on values that don't look like numbers, and if
your subroutines recurse more than 100 deep.
.Sh "Data Types and Objects"
.PP
.I Perl
has three data types: scalars, arrays of scalars, and
associative arrays of scalars.
Normal arrays are indexed by number, and associative arrays by string.
.PP
The interpretation of operations and values in perl sometimes
depends on the requirements
of the context around the operation or value.
There are three major contexts: string, numeric and array.
Certain operations return array values
in contexts wanting an array, and scalar values otherwise.
(If this is true of an operation it will be mentioned in the documentation
for that operation.)
Operations which return scalars don't care whether the context is looking
for a string or a number, but
scalar variables and values are interpreted as strings or numbers
as appropriate to the context.
A scalar is interpreted as TRUE in the boolean sense if it is not the null
string or 0.
Booleans returned by operators are 1 for true and \'0\' or \'\' (the null
string) for false.
.PP
There are actually two varieties of null string: defined and undefined.
Undefined null strings are returned when there is no real value for something,
such as when there was an error, or at end of file, or when you refer
to an uninitialized variable or element of an array.
An undefined null string may become defined the first time you access it, but
prior to that you can use the defined() operator to determine whether the
value is defined or not.
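.PP
A minimal sketch of the distinction, using made-up variable names: an array
element past the end is undefined, while a null string that has been assigned
is defined, even though both are false.
.nf

    @list = (1, 2, 3);
    $missing = $list[10];       # no such element: undefined null string
    $empty = '';                # assigned null string: defined
    $missing_defined = defined($missing) ? 1 : 0;   # 0
    $empty_defined = defined($empty) ? 1 : 0;       # 1

.fi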
.PP
References to scalar variables always begin with \*(L'$\*(R', even when referring
to a scalar that is part of an array.
Thus:
.nf

.ne 3
    $days \h'|2i'# a simple scalar variable
    $days[28] \h'|2i'# 29th element of array @days
    $days{\'Feb\'}\h'|2i'# one value from an associative array
    $#days \h'|2i'# last index of array @days

but entire arrays or array slices are denoted by \*(L'@\*(R':

    @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n])
    @days[3,4,5]\h'|2i'# same as @days[3.\|.5]
    @days{'a','c'}\h'|2i'# same as ($days{'a'},$days{'c'})

and entire associative arrays are denoted by \*(L'%\*(R':

    %days \h'|2i'# (key1, val1, key2, val2 .\|.\|.)
.fi
.PP
Any of these eight constructs may serve as an lvalue,
that is, may be assigned to.
(It also turns out that an assignment is itself an lvalue in
certain contexts\*(--see examples under s, tr and chop.)
Assignment to a scalar evaluates the righthand side in a scalar context,
while assignment to an array or array slice evaluates the righthand side
in an array context.
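.PP
For instance, with the hypothetical arrays below, an array slice and an
associative array slice can each be assigned to in one statement, with the
righthand side evaluated as a list:
.nf

    @days = ('Sun', 'Mon', 'Tue', 'Wed');
    @days[1,2] = ('MON', 'TUE');        # assign to an array slice
    %hours = ('open', 9, 'close', 17);
    @hours{'open','close'} = (8, 18);   # assign to an associative array slice
    # @days is now ('Sun', 'MON', 'TUE', 'Wed')

.fi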
.PP
You may find the length of array @days by evaluating
\*(L"$#days\*(R", as in
.IR csh .
(Actually, it's not the length of the array; it's the subscript of the last element,
since there is (ordinarily) a 0th element.)
Assigning to $#days changes the length of the array.
Shortening an array by this method does not actually destroy any values.
Lengthening an array that was previously shortened recovers the values that
were in those elements.
You can also gain some measure of efficiency by pre-extending an array that
is going to get big.
(You can also extend an array by assigning to an element that is off the
end of the array.
This differs from assigning to $#whatever in that intervening values
are set to null rather than recovered.)
You can truncate an array down to nothing by assigning the null list () to
it.
The following are exactly equivalent:
.nf

    @whatever = ();
    $#whatever = $[ \- 1;

.fi
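.PP
The following sketch, using a made-up array, shows $# giving the last
subscript and an assignment to $# shortening the array.
It assumes $[ has its default value of 0, and it does not depend on shortened
values being recovered later.
.nf

    @days = ('Sun', 'Mon', 'Tue', 'Wed');
    $last = $#days;             # 3, the subscript of the last element
    $count = $#days + 1;        # 4 elements
    $#days = 1;                 # shorten to ('Sun', 'Mon')
    $kept = join(',', @days);   # 'Sun,Mon'
    @days = ();                 # truncate to nothing
    $empty_last = $#days;       # -1

.fi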
.PP
Multi-dimensional arrays are not directly supported, but see the discussion
of the $; variable later for a means of emulating multiple subscripts with
an associative array.
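.PP
A minimal sketch of that emulation, assuming $; keeps its default value
(the variable itself is described later): a comma-separated list inside an
associative array subscript is joined with $; into a single key.
The names are made up for the example.
.nf

    $grid{1, 2} = 'x';              # stored under the key join($;, 1, 2)
    $key = join($;, 1, 2);
    $found = $grid{$key};           # 'x'
    @keys = keys(%grid);            # exactly one key

.fi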
.PP
Every data type has its own namespace.
You can, without fear of conflict, use the same name for a scalar variable,
an array, an associative array, a filehandle, a subroutine name, and/or
a label.
Since variable and array references always start with \*(L'$\*(R', \*(L'@\*(R',
or \*(L'%\*(R', the \*(L"reserved\*(R" words aren't in fact reserved
with respect to variable names.
(They ARE reserved with respect to labels and filehandles, however, which
don't have an initial special character.
Hint: you could say open(LOG,\'logfile\') rather than open(log,\'logfile\').
Using uppercase filehandles also improves readability and protects you
from conflict with future reserved words.)
Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all
different names.
Names which start with a letter may also contain digits and underscores.
Names which do not start with a letter are limited to one character,
e.g. \*(L"$%\*(R" or \*(L"$$\*(R".
(Most of the one character names have a predefined significance to
.IR perl .
More later.)
.PP
Numeric literals are specified in any of the usual floating point or
integer formats:
.nf

.ne 5
    12345
    12345.67
    .23E-10
    0xffff    # hex
    0377      # octal

.fi
String literals are delimited by either single or double quotes.
They work much like shell quotes:
double-quoted string literals are subject to backslash and variable
substitution; single-quoted strings are not (except for \e\' and \e\e).
The usual backslash rules apply for making characters such as newline, tab, etc.
You can also embed newlines directly in your strings, i.e. they can end on
a different line than they begin.
This is nice, but if you forget your trailing quote, the error will not be
reported until
.I perl
finds another line containing the quote character, which
may be much further on in the script.
Variable substitution inside strings is limited to scalar variables, normal
array values, and array slices.
(In other words, identifiers beginning with $ or @, followed by an optional
bracketed expression as a subscript.)
The following code segment prints out \*(L"The price is $100.\*(R"
.nf

.ne 2
    $Price = \'$100\';\h'|3.5i'# not interpreted
    print "The price is $Price.\en";\h'|3.5i'# interpreted

.fi
Note that you can put curly brackets around the identifier to delimit it
from following alphanumerics.
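.PP
For example, with a made-up variable $fruit, the braces keep a trailing
letter from being read as part of the name:
.nf

    $fruit = 'apple';
    $plural = "${fruit}s";      # 'apples'
    $other = "$fruits";         # interpolates $fruits, which was never set

.fi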
.PP
Array values are interpolated into double-quoted strings by joining all the
elements of the array with the delimiter specified in the $" variable,
space by default.
(Since in versions of perl prior to 3.0 the @ character was not a metacharacter
in double-quoted strings, the interpolation of @array, $array[EXPR],
@array[LIST], $array{EXPR}, or @array{LIST} only happens if the array is
referenced elsewhere in the program or is predefined.)
The following are equivalent:
.nf

.ne 4
    $temp = join($",@ARGV);
    system "echo $temp";

    system "echo @ARGV";

.fi
Within search patterns (which also undergo double-quoteish substitution)
there is a bad ambiguity: Is /$foo[bar]/ to be
interpreted as /${foo}[bar]/ (where [bar] is a character class for the
regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to
array @foo)?
If @foo doesn't otherwise exist, then it's obviously a character class.
If @foo exists, perl takes a good guess about [bar], and is almost always right.
If it does guess wrong, or if you're just plain paranoid,
you can force the correct interpretation with curly brackets as above.
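.PP
A minimal sketch of the $" variable at work, using a hypothetical array
@words; the default is put back afterward:
.nf

    @words = ('a', 'b', 'c');
    $spaced = "@words";         # 'a b c', joined with the default space
    $" = ',';
    $commas = "@words";         # 'a,b,c'
    $" = ' ';                   # restore the default
    $sliced = "@words[0,2]";    # 'a c', slices interpolate too

.fi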
.PP
A line-oriented form of quoting is based on the shell here-is syntax.
Following a << you specify a string to terminate the quoted material, and all lines
following the current line down to the terminating string are the value
of the item.
The terminating string may be either an identifier (a word), or some
quoted text.
If quoted, the type of quotes you use determines the treatment of the text,
just as in regular quoting.
An unquoted identifier works like double quotes.
There must be no space between the << and the identifier.
(If you put a space it will be treated as a null identifier, which is
valid, and matches the first blank line\*(--see Merry Christmas example below.)
The terminating string must appear by itself (unquoted and with no surrounding
whitespace) on the terminating line.
.nf

    print <<EOF;        # same as above
The price is $Price.
EOF

    print <<"EOF";      # same as above
The price is $Price.
EOF

    print << x 10;      # null identifier is delimiter
Merry Christmas!

    print <<`EOC`;      # execute commands
echo hi there
echo lo there
EOC

    print <<foo, <<bar; # you can stack them
I said foo.
foo
I said bar.
bar

.fi
Array literals are denoted by separating individual values by commas, and
enclosing the list in parentheses.
In a context not requiring an array value, the value of the array literal
is the value of the final element, as in the C comma operator.
For example,
.nf

.ne 4
    @foo = (\'cc\', \'\-E\', $bar);

assigns the entire array value to array foo, but

    $foo = (\'cc\', \'\-E\', $bar);

.fi
assigns the value of variable bar to variable foo.
Array lists may be assigned to if and only if each element of the list
is an lvalue:
.nf

    ($a, $b, $c) = (1, 2, 3);

    ($map{\'red\'}, $map{\'blue\'}, $map{\'green\'}) = (0x00f, 0x0f0, 0xf00);

The final element may be an array or an associative array:

    ($a, $b, @rest) = split;
    local($a, $b, %rest) = @_;

.fi
You can actually put an array anywhere in the list, but the first array
in the list will soak up all the values, and anything after it will get
a null value.
This may be useful in a local().
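.PP
A short sketch of that rule, with made-up variables: the array in the middle
takes every remaining value, so the scalar after it is left undefined.
.nf

    ($first, @rest, $after) = (1, 2, 3, 4);
    # $first is 1, @rest is (2, 3, 4), and $after is undefined

.fi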
.PP
An associative array literal contains pairs of values to be interpreted
as a key and a value:
.nf

.ne 2
    # same as map assignment above
    %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);

.fi
Array assignment in a scalar context returns the number of elements
produced by the expression on the right side of the assignment:
.nf

    $x = (($foo,$bar) = (3,2,1));    # set $x to 3, not 2

.fi
.PP
There are several other pseudo-literals that you should know about.
If a string is enclosed by backticks (grave accents), it first undergoes
variable substitution just like a double quoted string.
It is then interpreted as a command, and the output of that command
is the value of the pseudo-literal, as in a shell.
The command is executed each time the pseudo-literal is evaluated.
The status value of the command is returned in $? (see Predefined Names
for the interpretation of $?).
Unlike in \fIcsh\fR, no translation is done on the return
data\*(--newlines remain newlines.
Unlike in any of the shells, single quotes do not hide variable names
in the command from interpretation.
To pass a $ through to the shell you need to hide it with a backslash.
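.PP
A minimal sketch, assuming a Unix-like system where the shell provides
\fIecho\fR; the variable names are made up.
The output keeps its trailing newline, and the exit status lands in $?.
.nf

    $word = 'there';
    $out = `echo hi $word`;     # $word is substituted before the shell runs
    $status = $?;               # 0 when the command succeeds
    chop($out);                 # remove the newline, which is kept as-is

.fi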
.PP
Evaluating a filehandle in angle brackets yields the next line
from that file (newline included, so it's never false until EOF, at
which time an undefined value is returned).
Ordinarily you must assign that value to a variable,
but there is one situation in which an automatic assignment happens.
If (and only if) the input symbol is the only thing inside the conditional of a
.I while
loop, the value is
automatically assigned to the variable \*(L"$_\*(R".
(This may seem like an odd thing to you, but you'll use the construct
in almost every
.I perl
script you write.)
Anyway, the following lines are equivalent to each other:
.nf

.ne 5
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (\|;\|<STDIN>;\|) { print; }
    print while $_ = <STDIN>;
    print while <STDIN>;

.fi
The filehandles
.IR STDIN ,
.I STDOUT
and
.I STDERR
are predefined.
(The filehandles
.IR stdin ,
.I stdout
and
.I stderr
will also work except in packages, where they would be interpreted as
local identifiers rather than global.)
Additional filehandles may be created with the
.I open
function.
.PP
If a <FILEHANDLE> is used in a context that is looking for an array, an array
consisting of all the input lines is returned, one line per array element.
It's easy to make a LARGE data space this way, so use with care.
.PP
The null filehandle <> is special and can be used to emulate the behavior of
\fIsed\fR and \fIawk\fR.
Input from <> comes either from standard input, or from each file listed on
the command line.
Here's how it works: the first time <> is evaluated, the ARGV array is checked,
and if it is null, $ARGV[0] is set to \'\-\', which when opened gives you standard
input.
The ARGV array is then processed as a list of filenames.
The loop
.nf

.ne 3
    while (<>) {
        .\|.\|.        # code for each line
    }

.ne 10
is equivalent to

    unshift(@ARGV, \'\-\') \|if \|$#ARGV < $[;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            .\|.\|.    # code for each line
        }
    }

.fi
except that it isn't as cumbersome to say.
It really does shift array ARGV and put the current filename into
variable ARGV.
It also uses filehandle ARGV internally.
You can modify @ARGV before the first <> as long as you leave the first
filename at the beginning of the array.
Line numbers ($.) continue as if the input was one big happy file.
(But see example under eof for how to reset line numbers on each file.)
.PP
.ne 5
If you want to set @ARGV to your own list of files, go right ahead.
If you want to pass switches into your script, you can
put a loop on the front like this:
.nf

.ne 10
    while ($_ = $ARGV[0], /\|^\-/\|) {
        shift;
        last if /\|^\-\|\-$\|/\|;
        /\|^\-D\|(.*\|)/ \|&& \|($debug = $1);
        /\|^\-v\|/ \|&& \|$verbose++;
        .\|.\|.        # other switches
    }
    while (<>) {
        .\|.\|.        # code for each line
    }

.fi
The <> symbol will return FALSE only once.
If you call it again after this it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will input from
.IR STDIN .
.PP
If the string inside the angle brackets is a reference to a scalar variable
(e.g. <$foo>),
then that variable contains the name of the filehandle to input from.
.PP
If the string inside angle brackets is not a filehandle, it is interpreted
as a filename pattern to be globbed, and either an array of filenames or the
next filename in the list is returned, depending on context.
One level of $ interpretation is done first, but you can't say <$foo>
because that's an indirect filehandle as explained in the previous
paragraph.
You could insert curly brackets to force interpretation as a
filename glob: <${foo}>.
Example:
.nf

.ne 3
    while (<*.c>) {
        chmod 0644, $_;
    }

is equivalent to

.ne 5
    open(foo, "echo *.c | tr \-s \' \et\er\ef\' \'\e\e012\e\e012\e\e012\e\e012\'|");
    while (<foo>) {
        chop;
        chmod 0644, $_;
    }

.fi
In fact, it's currently implemented that way.
(Which means it will not work on filenames with spaces in them unless
you have /bin/csh on your machine.)
Of course, the shortest way to do the above is:
.nf

    chmod 0644, <*.c>;

.fi
8d063cd8 812.Sh "Syntax"
813.PP
814A
815.I perl
816script consists of a sequence of declarations and commands.
817The only things that need to be declared in
818.I perl
819are report formats and subroutines.
820See the sections below for more information on those declarations.
a687059c 821All uninitialized objects user-created objects are assumed to
822start with a null or 0 value until they
823are defined by some explicit operation such as assignment.
8d063cd8 824The sequence of commands is executed just once, unlike in
825.I sed
826and
827.I awk
828scripts, where the sequence of commands is executed for each input line.
829While this means that you must explicitly loop over the lines of your input file
830(or files), it also means you have much more control over which files and which
831lines you look at.
832(Actually, I'm lying\*(--it is possible to do an implicit loop with either the
833.B \-n
834or
835.B \-p
836switch.)
837.PP
838A declaration can be put anywhere a command can, but has no effect on the
a687059c 839execution of the primary sequence of commands--declarations all take effect
840at compile time.
8d063cd8 841Typically all the declarations are put at the beginning or the end of the script.
842.PP
843.I Perl
844is, for the most part, a free-form language.
845(The only exception to this is format declarations, for fairly obvious reasons.)
846Comments are indicated by the # character, and extend to the end of the line.
847If you attempt to use /* */ C comments, it will be interpreted either as
848division or pattern matching, depending on the context.
849So don't do that.
850.Sh "Compound statements"
851In
852.IR perl ,
853a sequence of commands may be treated as one command by enclosing it
854in curly brackets.
855We will call this a BLOCK.
856.PP
857The following compound commands may be used to control flow:
858.nf
859
860.ne 4
861 if (EXPR) BLOCK
862 if (EXPR) BLOCK else BLOCK
378cc40b 863 if (EXPR) BLOCK elsif (EXPR) BLOCK .\|.\|. else BLOCK
8d063cd8 864 LABEL while (EXPR) BLOCK
865 LABEL while (EXPR) BLOCK continue BLOCK
866 LABEL for (EXPR; EXPR; EXPR) BLOCK
378cc40b 867 LABEL foreach VAR (ARRAY) BLOCK
8d063cd8 868 LABEL BLOCK continue BLOCK
869
870.fi
83b4785a 871Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not
8d063cd8 872statements.
873This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed.
874If you want to write conditionals without curly brackets there are several
875other ways to do it.
876The following all do the same thing:
877.nf
878
879.ne 5
a687059c 880 if (!open(foo)) { die "Can't open $foo: $!"; }
881 die "Can't open $foo: $!" unless open(foo);
882 open(foo) || die "Can't open $foo: $!"; # foo or bust!
883 open(foo) ? die "Can't open $foo: $!" : \'hi mom\';
884 # a bit exotic, that last one
8d063cd8 885
886.fi
8d063cd8 887.PP
888The
889.I if
890statement is straightforward.
891Since BLOCKs are always bounded by curly brackets, there is never any
892ambiguity about which
893.I if
894an
895.I else
896goes with.
897If you use
898.I unless
899in place of
900.IR if ,
901the sense of the test is reversed.
902.PP
The
.I while
statement executes the block as long as the expression is true
(does not evaluate to the null string or 0).
The LABEL is optional and, if present, consists of an identifier followed by
a colon.
The LABEL identifies the loop for the loop control statements
.IR next ,
.IR last ,
and
.I redo
(see below).
If there is a
.I continue
BLOCK, it is always executed just before
the conditional is evaluated again, much like the third part
of a
.I for
loop in C.
Thus it can be used to increment a loop variable, even when the loop has
been continued via the
.I next
statement (similar to the C \*(L"continue\*(R" statement).
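.PP
A minimal sketch of that use, with arbitrary illustrative variable names:
the loop below skips even values with
.IR next ,
yet the
.I continue
block still advances $i on every pass, so the loop terminates.
.nf

.ne 10
	$i = 0;			# illustrative variables
	$sum = 0;
	while ($i < 10) {
		next if $i % 2 == 0;	# skip even values of $i
		$sum += $i;
	} continue {
		$i++;		# still runs after next
	}
	# $sum is now 25 (1+3+5+7+9) and $i is 10

.fi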
.PP
If the word
.I while
is replaced by the word
.IR until ,
the sense of the test is reversed, but the conditional is still tested before
the first iteration.
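.PP
As a small sketch with arbitrary illustrative names, the first loop below
counts $n down to 0.
The body of the second loop never runs at all, because its condition
is already true when it is first tested:
.nf

.ne 10
	$n = 5;			# illustrative variables
	$count = 0;
	until ($n <= 0) {
		$count++;
		$n--;
	}
	until ($count > 0) {
		$count = 99;	# never reached: $count is already 5
	}

.fi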
.PP
In either the
.I if
or the
.I while
statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional
is true if the value of the last command in that block is true.
.PP
The
.I for
loop works exactly like the corresponding
.I while
loop:
.nf

.ne 12
	for ($i = 1; $i < 10; $i++) {
		.\|.\|.
	}

is the same as

	$i = 1;
	while ($i < 10) {
		.\|.\|.
	} continue {
		$i++;
	}
.fi
.PP
The foreach loop iterates over a normal array value and sets the variable
VAR to be each element of the array in turn.
The \*(L"foreach\*(R" keyword is actually identical to the \*(L"for\*(R" keyword,
so you can use \*(L"foreach\*(R" for readability or \*(L"for\*(R" for brevity.
If VAR is omitted, $_ is set to each value.
If ARRAY is an actual array (as opposed to an expression returning an array
value), you can modify each element of the array
by modifying VAR inside the loop.
Examples:
.nf

.ne 5
	for (@ary) { s/foo/bar/; }

	foreach $elem (@elements) {
		$elem *= 2;
	}

.ne 3
	for ((10,9,8,7,6,5,4,3,2,1,\'BOOM\')) {
		print $_, "\en"; sleep(1);
	}

	for (1..15) { print "Merry Christmas\en"; }

.ne 3
	foreach $item (split(/:[\e\e\en:]*/, $ENV{\'TERMCAP\'})) {
		print "Item: $item\en";
	}

.fi
.PP
The BLOCK by itself (labeled or not) is equivalent to a loop that executes
once.
Thus you can use any of the loop control statements in it to leave or
restart the block.
The
.I continue
block is optional.
This construct is particularly nice for doing case structures.
.nf

.ne 6
	foo: {
		if (/^abc/) { $abc = 1; last foo; }
		if (/^def/) { $def = 1; last foo; }
		if (/^xyz/) { $xyz = 1; last foo; }
		$nothing = 1;
	}

.fi
There is no official switch statement in perl, because there
are already several ways to write the equivalent.
In addition to the above, you could write
.nf

.ne 6
	foo: {
		$abc = 1, last foo if /^abc/;
		$def = 1, last foo if /^def/;
		$xyz = 1, last foo if /^xyz/;
		$nothing = 1;
	}

or

.ne 6
	foo: {
		/^abc/ && do { $abc = 1; last foo; };
		/^def/ && do { $def = 1; last foo; };
		/^xyz/ && do { $xyz = 1; last foo; };
		$nothing = 1;
	}

or

.ne 6
	foo: {
		/^abc/ && ($abc = 1, last foo);
		/^def/ && ($def = 1, last foo);
		/^xyz/ && ($xyz = 1, last foo);
		$nothing = 1;
	}

or even

.ne 8
	if (/^abc/)
		{ $abc = 1; }
	elsif (/^def/)
		{ $def = 1; }
	elsif (/^xyz/)
		{ $xyz = 1; }
	else
		{$nothing = 1;}

.fi
As it happens, these are all optimized internally to a switch structure,
so perl jumps directly to the desired statement, and you needn't worry
about perl executing a lot of unnecessary statements when you have a string
of 50 elsifs, as long as you are testing the same simple scalar variable
using ==, eq, or pattern matching as above.
(If you're curious as to whether the optimizer has done this for a particular
case statement, you can use the \-D1024 switch to list the syntax tree
before execution.)
.Sh "Simple statements"
The only kind of simple statement is an expression evaluated for its side
effects.
Every expression (simple statement) must be terminated with a semicolon.
Note that this is like C, but unlike Pascal (and
.IR awk ).
.PP
Any simple statement may optionally be followed by a
single modifier, just before the terminating semicolon.
The possible modifiers are:
.nf

.ne 4
	if EXPR
	unless EXPR
	while EXPR
	until EXPR

.fi
The
.I if
and
.I unless
modifiers have the expected semantics.
The
.I while
and
.I until
modifiers also have the expected semantics (conditional evaluated first),
except when applied to a do-BLOCK command,
in which case the block executes once before the conditional is evaluated.
This is so that you can write loops like:
.nf

.ne 4
	do {
		$_ = <STDIN>;
		.\|.\|.
	} until $_ \|eq \|".\|\e\|n";

.fi
(See the
.I do
operator below. Note also that the loop control commands described later will
NOT work in this construct, since modifiers don't take loop labels.
Sorry.)
.Sh "Expressions"
Since
.I perl
expressions work almost exactly like C expressions, only the differences
will be mentioned here.
.PP
Here's what
.I perl
has that C doesn't:
.Ip ** 8 2
The exponentiation operator.
.Ip **= 8
The exponentiation assignment operator.
.Ip (\|) 8 3
The null list, used to initialize an array to null.
.Ip . 8
Concatenation of two strings.
.Ip .= 8
The concatenation assignment operator.
.Ip eq 8
String equality (== is numeric equality).
As a mnemonic, just think of \*(L"eq\*(R" as a string.
(If you are used to the
.I awk
behavior of using == for either string or numeric equality
based on the current form of the comparands, beware!
You must be explicit here.)
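.Sp
As a small sketch with arbitrary illustrative names, the same pair of values
can be equal as numbers but different as strings:
.nf

.ne 5
	$x = "10";			# illustrative values
	$y = "10.0";
	$numeq = ($x == $y) ? 1 : 0;	# 1: both are numerically 10
	$streq = ($x eq $y) ? 1 : 0;	# 0: the strings differ

.fi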
.Ip ne 8
String inequality (!= is numeric inequality).
.Ip lt 8
String less than.
.Ip gt 8
String greater than.
.Ip le 8
String less than or equal.
.Ip ge 8
String greater than or equal.
.Ip =~ 8 2
Certain operations search or modify the string \*(L"$_\*(R" by default.
This operator makes that kind of operation work on some other string.
The right argument is a search pattern, substitution, or translation.
The left argument is what is supposed to be searched, substituted, or
translated instead of the default \*(L"$_\*(R".
The return value indicates the success of the operation.
(If the right argument is an expression other than a search pattern,
substitution, or translation, it is interpreted as a search pattern
at run time.
This is less efficient than an explicit search, since the pattern must
be compiled every time the expression is evaluated.)
The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else.
.Ip !~ 8
Just like =~ except the return value is negated.
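.Sp
A minimal sketch with arbitrary illustrative names.
It applies a substitution and two matches to a string other than $_.
One match uses a pattern held in a variable, which is therefore compiled at run time:
.nf

.ne 6
	$path = "/usr/local/bin";		# illustrative string
	$path =~ s/local/share/;		# $path is now "/usr/share/bin"
	$pat = "sh.re";
	$hit = ($path =~ $pat) ? 1 : 0;	# 1: run-time pattern matches "share"
	$miss = ($path !~ /local/) ? 1 : 0;	# 1: "local" is gone

.fi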
.Ip x 8
The repetition operator.
Returns a string consisting of the left operand repeated the
number of times specified by the right operand.
.nf

	print \'\-\' x 80;		# print row of dashes
	print \'\-\' x80;		# illegal, x80 is identifier

	print "\et" x ($tab/8), \' \' x ($tab%8);	# tab over

.fi
.Ip x= 8
The repetition assignment operator.
.Ip .\|. 8
The range operator, which is really two different operators depending
on the context.
In an array context, returns an array of values counting (by ones)
from the left value to the right value.
This is useful for writing \*(L"for (1..10)\*(R" loops and for doing
slice operations on arrays.
.Sp
In a scalar context, .\|. returns a boolean value.
The operator is bistable, like a flip-flop.
Each .\|. operator maintains its own boolean state.
It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true
until the right operand is true,
AFTER which the range operator becomes false again.
(It doesn't become false till the next time the range operator is evaluated.
It can become false on the same evaluation it became true, but it still returns
true once.)
The right operand is not evaluated while the operator is in the \*(L"false\*(R" state,
and the left operand is not evaluated while the operator is in the \*(L"true\*(R" state.
The scalar .\|. operator is primarily intended for doing line number ranges
after the fashion of \fIsed\fR or \fIawk\fR.
The precedence is a little lower than || and &&.
The value returned is either the null string for false, or a sequence number
(beginning with 1) for true.
The sequence number is reset for each range encountered.
The final sequence number in a range has the string \'E0\' appended to it, which
doesn't affect its numeric value, but gives you something to search for if you
want to exclude the endpoint.
You can exclude the beginning point by waiting for the sequence number to be
greater than 1.
If either operand of scalar .\|. is static, that operand is implicitly compared
to the $. variable, the current line number.
Examples:
.nf

.ne 6
As a scalar operator:
	if (101 .\|. 200) { print; }	# print 2nd hundred lines

	next line if (1 .\|. /^$/);	# skip header lines

	s/^/> / if (/^$/ .\|. eof());	# quote body

.ne 4
As an array operator:
	for (101 .\|. 200) { print; }	# print $_ 100 times

	@foo = @foo[$[ .\|. $#foo];	# an expensive no-op
	@foo = @foo[$#foo-4 .\|. $#foo];	# slice last 5 items

.fi
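.Sp
The sequence numbers make it easy to drop either endpoint.
The sketch below keeps only the lines strictly between a BEGIN line and an
END line.
The array contents and variable names are arbitrary and illustrative:
.nf

.ne 10
	@lines = ("a", "BEGIN", "b", "c", "END", "d");	# illustrative data
	@inside = ();
	foreach (@lines) {
		$seq = /BEGIN/ .. /END/;	# "", 1, 2, 3, "4E0", "" in turn
		next if $seq eq "";		# outside the range
		next if $seq == 1 || $seq =~ /E0$/;	# drop both endpoints
		push(@inside, $_);
	}
	# @inside is now ("b", "c")

.fi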
.Ip \-x 8
A file test.
This unary operator takes one argument, either a filename or a filehandle,
and tests the associated file to see if something is true about it.
If the argument is omitted, tests $_, except for \-t, which tests
.IR STDIN .
It returns 1 for true and \'\' for false, or the undefined value if the
file doesn't exist.
Precedence is higher than logical and relational operators, but lower than
arithmetic operators.
The operator may be any of:
.nf
	\-r	File is readable by effective uid.
	\-w	File is writable by effective uid.
	\-x	File is executable by effective uid.
	\-o	File is owned by effective uid.
	\-R	File is readable by real uid.
	\-W	File is writable by real uid.
	\-X	File is executable by real uid.
	\-O	File is owned by real uid.
	\-e	File exists.
	\-z	File has zero size.
	\-s	File has non-zero size.
	\-f	File is a plain file.
	\-d	File is a directory.
	\-l	File is a symbolic link.
	\-p	File is a named pipe (FIFO).
	\-S	File is a socket.
	\-b	File is a block special file.
	\-c	File is a character special file.
	\-u	File has setuid bit set.
	\-g	File has setgid bit set.
	\-k	File has sticky bit set.
	\-t	Filehandle is opened to a tty.
	\-T	File is a text file.
	\-B	File is a binary file (opposite of \-T).

.fi
The interpretation of the file permission operators \-r, \-R, \-w, \-W, \-x and \-X
is based solely on the mode of the file and the uids and gids of the user.
There may be other reasons you can't actually read, write or execute the file.
Also note that, for the superuser, \-r, \-R, \-w and \-W always return 1, and
\-x and \-X return 1 if any execute bit is set in the mode.
Scripts run by the superuser may thus need to do a stat() in order to determine
the actual mode of the file, or temporarily set the uid to something else.
.Sp
Example:
.nf
.ne 7

	while (<>) {
		chop;
		next unless \-f $_;	# ignore specials
		.\|.\|.
	}

.fi
Note that \-s/a/b/ does not do a negated substitution.
Saying \-exp($foo) still works as expected, however\*(--only single letters
following a minus are interpreted as file tests.
.Sp
The \-T and \-B file tests work as follows.
The first block or so of the file is examined for odd characters such as
strange control codes or metacharacters.
If too many odd characters (>10%) are found, it's a \-B file; otherwise it's a \-T file.
Also, any file containing a null character in the first block is considered a binary file.
If \-T or \-B is used on a filehandle, the current stdio buffer is examined
rather than the first block.
Both \-T and \-B return TRUE on a null file, or a file at EOF when testing
a filehandle.
.PP
If any of the file tests (or either stat operator) are given the special
filehandle consisting of a solitary underline, then the stat structure
of the previous file test (or stat operator) is used, saving a system
call.
(This doesn't work with \-t, and you need to remember that lstat and \-l
will leave values in the stat structure for the symbolic link, not the
real file.)
Example:
.nf

	print "Can do.\en" if \-r $a || \-w _ || \-x _;

.ne 9
	stat($filename);
	print "Readable\en" if \-r _;
	print "Writable\en" if \-w _;
	print "Executable\en" if \-x _;
	print "Setuid\en" if \-u _;
	print "Setgid\en" if \-g _;
	print "Sticky\en" if \-k _;
	print "Text\en" if \-T _;
	print "Binary\en" if \-B _;

.fi
.PP
Here is what C has that
.I perl
doesn't:
.Ip "unary &" 12
Address-of operator.
.Ip "unary *" 12
Dereference-address operator.
.Ip "(TYPE)" 12
Type casting operator.
.PP
Like C,
.I perl
does a certain amount of expression evaluation at compile time, whenever
it determines that all of the arguments to an operator are static and have
no side effects.
In particular, string concatenation happens at compile time between literals that don't do variable substitution.
Backslash interpretation also happens at compile time.
You can say
.nf

.ne 2
	\'Now is the time for all\' . "\|\e\|n" .
		\'good men to come to.\'

.fi
and this all reduces to one string internally.
.PP
The autoincrement operator has a little extra built-in magic to it.
If you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment.
If, however, the variable has only been used in string contexts since it
was set, and has a value that is not null and matches the
pattern /^[a\-zA\-Z]*[0\-9]*$/, the increment is done
as a string, preserving each character within its range, with carry:
.nf

	print ++($foo = \'99\');	# prints \*(L'100\*(R'
	print ++($foo = \'a0\');	# prints \*(L'a1\*(R'
	print ++($foo = \'Az\');	# prints \*(L'Ba\*(R'
	print ++($foo = \'zz\');	# prints \*(L'aaa\*(R'

.fi
The autodecrement is not magical.
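.PP
A short sketch of the difference, with arbitrary illustrative names.
A string that fits the pattern above increments with carry.
Decrementing a string simply treats it as a number:
.nf

.ne 5
	$id = "a9";		# illustrative values
	$id++;		# now "b0": the 9 wraps to 0 and carries into the a
	$down = "Ba";
	$down--;		# now -1, since "Ba" is numerically 0

.fi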