1 ''' Beginning of part 4
2 ''' $Header: perl_man.4,v 3.0.1.8 90/03/27 16:19:31 lwall Locked $
4 ''' $Log: perl.man.4,v $
5 ''' Revision 3.0.1.8 90/03/27 16:19:31 lwall
6 ''' patch16: MSDOS support
8 ''' Revision 3.0.1.7 90/03/14 12:29:50 lwall
9 ''' patch15: man page falsely states that you can't subscript array values
11 ''' Revision 3.0.1.6 90/03/12 16:54:04 lwall
12 ''' patch13: improved documentation of *name
14 ''' Revision 3.0.1.5 90/02/28 18:01:52 lwall
15 ''' patch9: $0 is now always the command name
17 ''' Revision 3.0.1.4 89/12/21 20:12:39 lwall
18 ''' patch7: documented that package'filehandle works as well as $package'variable
19 ''' patch7: documented which identifiers are always in package main
21 ''' Revision 3.0.1.3 89/11/17 15:32:25 lwall
22 ''' patch5: fixed some manual typos and indent problems
23 ''' patch5: clarified difference between $! and $@
25 ''' Revision 3.0.1.2 89/11/11 04:46:40 lwall
26 ''' patch2: made some line breaks depend on troff vs. nroff
27 ''' patch2: clarified operation of ^ and $ when $* is false
29 ''' Revision 3.0.1.1 89/10/26 23:18:43 lwall
30 ''' patch1: documented the desirability of unnecessary parentheses
32 ''' Revision 3.0 89/10/18 15:21:55 lwall
.I Perl
operators have the following associativity and precedence:
40 nonassoc\h'|1i'print printf exec system sort reverse
41 \h'1.5i'chmod chown kill unlink utime die return
43 right\h'|1i'= += \-= *= etc.
50 nonassoc\h'|1i'== != eq ne
51 nonassoc\h'|1i'< > <= >= lt gt le ge
52 nonassoc\h'|1i'chdir exit eval reset sleep rand umask
53 nonassoc\h'|1i'\-r \-w \-x etc.
58 right\h'|1i'! ~ and unary minus
60 nonassoc\h'|1i'++ \-\|\-
61 left\h'|1i'\*(L'(\*(R'
64 As mentioned earlier, if any list operator (print, etc.) or
65 any unary operator (chdir, etc.)
66 is followed by a left parenthesis as the next token on the same line,
67 the operator and arguments within parentheses are taken to
68 be of highest precedence, just like a normal function call.
72 chdir $foo || die;\h'|3i'# (chdir $foo) || die
73 chdir($foo) || die;\h'|3i'# (chdir $foo) || die
74 chdir ($foo) || die;\h'|3i'# (chdir $foo) || die
75 chdir +($foo) || die;\h'|3i'# (chdir $foo) || die
77 but, because * is higher precedence than ||:
79 chdir $foo * 20;\h'|3i'# chdir ($foo * 20)
80 chdir($foo) * 20;\h'|3i'# (chdir $foo) * 20
81 chdir ($foo) * 20;\h'|3i'# (chdir $foo) * 20
82 chdir +($foo) * 20;\h'|3i'# chdir ($foo * 20)
84 rand 10 * 20;\h'|3i'# rand (10 * 20)
85 rand(10) * 20;\h'|3i'# (rand 10) * 20
86 rand (10) * 20;\h'|3i'# (rand 10) * 20
87 rand +(10) * 20;\h'|3i'# rand (10 * 20)
90 In the absence of parentheses,
91 the precedence of list operators such as print, sort or chmod is
92 either very high or very low depending on whether you look at the left
side of the operator or the right side of it.
97 @ary = (1, 3, sort 4, 2);
98 print @ary; # prints 1324
101 the commas on the right of the sort are evaluated before the sort, but
102 the commas on the left are evaluated after.
103 In other words, list operators tend to gobble up all the arguments that
follow them, and then act like a simple term with regard to the preceding
expression.
106 Note that you have to be careful with parens:
110 # These evaluate exit before doing the print:
111 print($foo, exit); # Obviously not what you want.
112 print $foo, exit; # Nor is this.
115 # These do the print before evaluating exit:
116 (print $foo), exit; # This is what you want.
117 print($foo), exit; # Or this.
118 print ($foo), exit; # Or even this.
122 print ($foo & 255) + 1, "\en";
125 probably doesn't do what you expect at first glance.
127 A subroutine may be declared as follows:
134 Any arguments passed to the routine come in as array @_,
135 that is ($_[0], $_[1], .\|.\|.).
136 The array @_ is a local array, but its values are references to the
137 actual scalar parameters.
138 The return value of the subroutine is the value of the last expression
139 evaluated, and can be either an array value or a scalar value.
Alternatively, a return statement may be used to specify the returned value and
exit the subroutine.
To create local variables see the
.I local
operator.
A subroutine is called using the
.I do
operator or the & operator.
155 local($max) = pop(@_);
157 $max = $foo \|if \|$max < $foo;
163 $bestday = &MAX($mon,$tue,$wed,$thu,$fri);
168 # get a line, combining continuation lines
169 # that start with whitespace
171 $thisline = $lookahead;
172 line: while ($lookahead = <STDIN>) {
173 if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
174 $thisline \|.= \|$lookahead;
183 $lookahead = <STDIN>; # get first line
184 while ($_ = do get_line(\|)) {
191 Use array assignment to a local list to name your formal arguments:
194 local($key, $value) = @_;
195 $foo{$key} = $value unless $foo{$key};
199 This also has the effect of turning call-by-reference into call-by-value,
200 since the assignment copies the values.
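.PP
As a rough sketch of the difference (the subroutine names are hypothetical,
and the code has not been run against this release), the first routine below
changes the caller's variable through $_[0], while the second works on a
private copy made by the array assignment:
.nf

.ne 14
	sub double_in_place {
	    $_[0] *= 2;		# @_ refers to the caller's scalar
	}

	sub double_copy {
	    local($n) = @_;	# the assignment copies the value
	    $n *= 2;		# only the local copy changes
	    $n;			# value of the last expression is returned
	}

	$x = 5;
	&double_in_place($x);	# $x is now 10
	$y = &double_copy($x);	# $y is 20, $x is still 10
	print "$x $y\en";	# prints 10 20

.fi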
202 Subroutines may be called recursively.
203 If a subroutine is called using the & form, the argument list is optional.
204 If omitted, no @_ array is set up for the subroutine; the @_ array at the
time of the call is visible to the subroutine instead.
208 do foo(1,2,3); # pass three arguments
209 &foo(1,2,3); # the same
211 do foo(); # pass a null list
213 &foo; # pass no arguments--more efficient
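.PP
A small sketch of the last form, assuming two hypothetical subroutines
named total and report: because report calls total with the & form and
no argument list, total simply sees the @_ that was passed to report.
.nf

.ne 11
	sub total {
	    local($sum) = 0;
	    foreach $n (@_) { $sum += $n; }
	    $sum;
	}

	sub report {
	    &total;		# no argument list, so total shares report's @_
	}

	print &report(1,2,3), "\en";	# prints 6

.fi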
216 .Sh "Passing By Reference"
217 Sometimes you don't want to pass the value of an array to a subroutine but
218 rather the name of it, so that the subroutine can modify the global copy
219 of it rather than working with a local copy.
220 In perl you can refer to all the objects of a particular name by prefixing
221 the name with a star: *foo.
222 When evaluated, it produces a scalar value that represents all the objects
223 of that name, including any filehandle, format or subroutine.
224 When assigned to within a local() operation, it causes the name mentioned
225 to refer to whatever * value was assigned to it.
230 local(*someary) = @_;
231 foreach $elem (@someary) {
239 Assignment to *name is currently recommended only inside a local().
240 You can actually assign to *name anywhere, but the previous referent of
241 *name may be stranded forever.
242 This may or may not bother you.
244 Note that scalars are already passed by reference, so you can modify scalar
arguments without using this mechanism by referring explicitly to the $_[nnn]
in question.
247 You can modify all the elements of an array by passing all the elements
as scalars, but you have to use the * mechanism to push, pop or change the
size of an array.
250 The * mechanism will probably be more efficient in any case.
252 Since a *name value contains unprintable binary data, if it is used as
253 an argument in a print, or as a %s argument in a printf or sprintf, it
254 then has the value '*name', just so it prints out pretty.
256 Even if you don't want to modify an array, this mechanism is useful for
257 passing multiple arrays in a single LIST, since normally the LIST mechanism
will merge all the array values so that you can't extract out the
individual arrays.
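.PP
For instance, a routine that needs to see two arrays separately might be
sketched like this (longer, first and second are made-up names for
illustration):
.nf

.ne 9
	sub longer {
	    local(*first, *second) = @_;
	    $#first >= $#second ? 'first' : 'second';
	}

	@a = (1, 2, 3);
	@b = (4, 5);
	print &longer(*a, *b), "\en";	# prints first

.fi
Had the call been &longer(@a, @b), the routine would have received one
merged list of five elements.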
260 .Sh "Regular Expressions"
261 The patterns used in pattern matching are regular expressions such as
262 those supplied in the Version 8 regexp routines.
263 (In fact, the routines are derived from Henry Spencer's freely redistributable
264 reimplementation of the V8 routines.)
265 In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
266 Word boundaries may be matched by \eb, and non-boundaries by \eB.
267 A whitespace character is matched by \es, non-whitespace by \eS.
268 A numeric character is matched by \ed, non-numeric by \eD.
269 You may use \ew, \es and \ed within character classes.
270 Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
271 Within character classes \eb represents backspace rather than a word boundary.
272 Alternatives may be separated by |.
273 The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
274 matches the digit'th substring, where digit can range from 1 to 9.
275 (Outside of the pattern, always use $ instead of \e in front of the digit.
276 The scope of $<digit> (and $\`, $& and $\')
277 extends to the end of the enclosing BLOCK or eval string, or to
278 the next pattern match with subexpressions.
The \e<digit> notation sometimes works outside the current pattern, but should
not be relied upon.)
281 $+ returns whatever the last bracket match matched.
282 $& returns the entire matched string.
283 ($0 used to return the same thing, but not any more.)
284 $\` returns everything before the matched string.
285 $\' returns everything after the matched string.
289 s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/; # swap first two words
292 if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
By default, the ^ character is only guaranteed to match at the beginning
of the string,
the $ character only at the end (or before the newline at the end)
and
.I perl
does certain optimizations with the assumption that the string contains
only one line.
The behavior of ^ and $ on embedded newlines will be inconsistent.
You may, however, wish to treat a string as a multi-line buffer, such that
the ^ will match after any newline within the string, and $ will match
before any newline.
At the cost of a little more overhead, you can do this by setting the variable
$* to 1.
Setting it back to 0 makes
.I perl
revert to its old behavior.
316 To facilitate multi-line substitutions, the . character never matches a newline
318 In particular, the following leaves a newline on the $_ string:
322 s/.*(some_string).*/$1/;
324 If the newline is unwanted, try one of
326 s/.*(some_string).*\en/$1/;
327 s/.*(some_string)[^\e000]*/$1/;
328 s/.*(some_string)(.|\en)*/$1/;
329 chop; s/.*(some_string).*/$1/;
330 /(some_string)/ && ($_ = $1);
333 Any item of a regular expression may be followed with digits in curly brackets
334 of the form {n,m}, where n gives the minimum number of times to match the item
335 and m gives the maximum.
336 The form {n} is equivalent to {n,n} and matches exactly n times.
337 The form {n,} matches n or more times.
(If a curly bracket occurs in any other context, it is treated as a regular
character.)
The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
to {0,1}.
There is no limit to the size of n or m, but large numbers will chew up
more memory.
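.PP
A brief sketch of the counted forms, using made-up sample strings:
.nf

.ne 6
	$time = '12:34';
	print "well formed\en" if $time =~ /^\ed{2}:\ed{2}$/;

	($run) = 'aaaab' =~ /^(a{2,3})/;
	print "$run\en";	# prints aaa, since {2,3} takes up to three

.fi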
345 You will note that all backslashed metacharacters in
348 such as \eb, \ew, \en.
349 Unlike some other regular expression languages, there are no backslashed
350 symbols that aren't alphanumeric.
351 So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
352 interpreted as a literal character, not a metacharacter.
353 This makes it simple to quote a string that you want to use for a pattern
354 but that you are afraid might contain metacharacters.
355 Simply quote all the non-alphanumeric characters:
358 $pattern =~ s/(\eW)/\e\e$1/g;
Output record formats for use with the
.I write
operator may be declared as follows:
373 If name is omitted, format \*(L"STDOUT\*(R" is defined.
374 FORMLIST consists of a sequence of lines, each of which may be of one of three
379 A \*(L"picture\*(R" line giving the format for one output line.
381 An argument line supplying values to plug into a picture line.
383 Picture lines are printed exactly as they look, except for certain fields
384 that substitute values into the line.
385 Each picture field starts with either @ or ^.
386 The @ field (not to be confused with the array marker @) is the normal
387 case; ^ fields are used
388 to do rudimentary multi-line text block filling.
389 The length of the field is supplied by padding out the field
390 with multiple <, >, or | characters to specify, respectively, left justification,
391 right justification, or centering.
392 If any of the values supplied for these fields contains a newline, only
393 the text up to the newline is printed.
394 The special field @* can be used for printing multi-line values.
395 It should appear by itself on a line.
397 The values are specified on the following line, in the same order as
399 The values should be separated by commas.
401 Picture fields that begin with ^ rather than @ are treated specially.
402 The value supplied must be a scalar variable name which contains a text
405 puts as much text as it can into the field, and then chops off the front
406 of the string so that the next time the variable is referenced,
407 more of the text can be printed.
408 Normally you would use a sequence of fields in a vertical stack to print
410 If you like, you can end the final field with .\|.\|., which will appear in the
411 output if the text was too long to appear in its entirety.
412 You can change which characters are legal to break on by changing the
413 variable $: to a list of the desired characters.
415 Since use of ^ fields can produce variable length records if the text to be
416 formatted is short, you can suppress blank lines by putting the tilde (~)
417 character anywhere in the line.
418 (Normally you should put it in the front if possible, for visibility.)
419 The tilde will be translated to a space upon output.
420 If you put a second tilde contiguous to the first, the line will be repeated
421 until all the fields on the line are exhausted.
422 (If you use a field of the @ variety, the expression you supply had better
423 not give the same value every time forever!)
432 # a report on the /etc/passwd file
435 Name Login Office Uid Gid Home
436 ------------------------------------------------------------------
439 @<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
440 $name, $login, $office,$uid,$gid, $home
444 # a report from a bug report form
447 @<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
449 ------------------------------------------------------------------
452 Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
454 Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
455 \& $index, $description
456 Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
457 \& $priority, $date, $description
458 From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
459 \& $from, $description
460 Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
461 \& $programmer, $description
462 \&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
464 \&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
466 \&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
468 \&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
470 \&~ ^<<<<<<<<<<<<<<<<<<<<<<<...
478 It is possible to intermix prints with writes on the same output channel,
479 but you'll have to handle $\- (lines left on the page) yourself.
481 If you are printing lots of fields that are usually blank, you should consider
482 using the reset operator between records.
483 Not only is it more efficient, but it can prevent the bug of adding another
484 field and forgetting to zero it.
485 .Sh "Interprocess Communication"
486 The IPC facilities of perl are built on the Berkeley socket mechanism.
487 If you don't have sockets, you can ignore this section.
488 The calls have the same names as the corresponding system calls,
489 but the arguments tend to differ, for two reasons.
490 First, perl file handles work differently than C file descriptors.
491 Second, perl already knows the length of its strings, so you don't need
492 to pass that information.
493 Here is a sample client (untested):
496 ($them,$port) = @ARGV;
497 $port = 2345 unless $port;
498 $them = 'localhost' unless $them;
500 $SIG{'INT'} = 'dokill';
501 sub dokill { kill 9,$child if $child; }
503 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
505 $sockaddr = 'S n a4 x8';
506 chop($hostname = `hostname`);
508 ($name, $aliases, $proto) = getprotobyname('tcp');
509 ($name, $aliases, $port) = getservbyname($port, 'tcp')
510 unless $port =~ /^\ed+$/;
512 ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
518 ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);
520 $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
521 $that = pack($sockaddr, &AF_INET, $port, $thataddr);
523 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
524 bind(S, $this) || die "bind: $!";
525 connect(S, $that) || die "connect: $!";
527 select(S); $| = 1; select(stdout);
547 $port = 2345 unless $port;
549 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
551 $sockaddr = 'S n a4 x8';
553 ($name, $aliases, $proto) = getprotobyname('tcp');
554 ($name, $aliases, $port) = getservbyname($port, 'tcp')
555 unless $port =~ /^\ed+$/;
557 $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");
559 select(NS); $| = 1; select(stdout);
561 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
562 bind(S, $this) || die "bind: $!";
listen(S, 5) || die "listen: $!";
565 select(S); $| = 1; select(stdout);
568 print "Listening again\en";
569 ($addr = accept(NS,S)) || die $!;
570 print "accept ok\en";
572 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
573 @inetaddr = unpack('C4',$inetaddr);
574 print "$af $port @inetaddr\en";
583 .Sh "Predefined Names"
The following names have special meaning to
.IR perl .
I could have used alphabetic symbols for some of these, but I didn't want
to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
out.
You'll just have to suffer along with these silly symbols.
590 Most of them have reasonable mnemonics, or analogues in one of the shells.
592 The default input and pattern-searching space.
593 The following pairs are equivalent:
597 while (<>) {\|.\|.\|. # only equivalent in while!
598 while ($_ = <>) {\|.\|.\|.
602 $_ \|=~ \|/\|^Subject:/
613 (Mnemonic: underline is understood in certain operations.)
615 The current input line number of the last filehandle that was read.
617 Remember that only an explicit close on the filehandle resets the line number.
618 Since <> never does an explicit close, line numbers increase across ARGV files
619 (but see examples under eof).
620 (Mnemonic: many programs use . to mean the current line number.)
622 The input record separator, newline by default.
625 RS variable, including treating blank lines as delimiters
626 if set to the null string.
627 If set to a value longer than one character, only the first character is used.
628 (Mnemonic: / is used to delimit line boundaries when quoting poetry.)
630 The output field separator for the print operator.
631 Ordinarily the print operator simply prints out the comma separated fields
633 In order to get behavior more like
635 set this variable as you would set
637 OFS variable to specify what is printed between fields.
638 (Mnemonic: what is printed when there is a , in your print statement.)
640 This is like $, except that it applies to array values interpolated into
641 a double-quoted string (or similar interpreted string).
643 (Mnemonic: obvious, I think.)
645 The output record separator for the print operator.
646 Ordinarily the print operator simply prints out the comma separated fields
647 you specify, with no trailing newline or record separator assumed.
648 In order to get behavior more like
650 set this variable as you would set
652 ORS variable to specify what is printed at the end of the print.
653 (Mnemonic: you set $\e instead of adding \en at the end of the print.
Also, it's just like /, but it's what you get \*(L"back\*(R" from
.IR perl .)
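.Sp
A short sketch combining $, and $\e (the values are arbitrary):
.nf

.ne 4
	$, = ':';
	$\e = "\en";
	print 'root', 0, '/bin/sh';	# prints root:0:/bin/sh and a newline

.fi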
657 The output format for printed numbers.
658 This variable is a half-hearted attempt to emulate
661 There are times, however, when
665 have differing notions of what
667 Also, the initial value is %.20g rather than %.6g, so you need to set $#
671 (Mnemonic: # is the number sign.)
673 The current page number of the currently selected output channel.
674 (Mnemonic: % is page number in nroff.)
676 The current page length (printable lines) of the currently selected output
679 (Mnemonic: = has horizontal lines.)
681 The number of lines left on the page of the currently selected output channel.
682 (Mnemonic: lines_on_page \- lines_printed.)
684 The name of the current report format for the currently selected output
686 (Mnemonic: brother to $^.)
688 The name of the current top-of-page format for the currently selected output
690 (Mnemonic: points to top of page.)
692 If set to nonzero, forces a flush after every write or print on the currently
693 selected output channel.
697 will typically be line buffered if output is to the
698 terminal and block buffered otherwise.
699 Setting this variable is useful primarily when you are outputting to a pipe,
700 such as when you are running a
702 script under rsh and want to see the
703 output as it's happening.
704 (Mnemonic: when you want your pipes to be piping hot.)
706 The process number of the
709 (Mnemonic: same as shells.)
711 The status returned by the last pipe close, backtick (\`\`) command or
714 Note that this is the status word returned by the wait() system
715 call, so the exit value of the subprocess is actually ($? >> 8).
716 $? & 255 gives which signal, if any, the process died from, and whether
717 there was a core dump.
718 (Mnemonic: similar to sh and ksh.)
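.Sp
A sketch of picking the status word apart, assuming a Unix-like system
where sh is available:
.nf

.ne 5
	system('sh', '-c', 'exit 3');
	$exit_value = $? >> 8;		# 3, the subprocess exit value
	$low_byte = $? & 255;		# 0, no signal and no core dump
	print "exit $exit_value, low byte $low_byte\en";

.fi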
720 The string matched by the last pattern match (not counting any matches hidden
721 within a BLOCK or eval enclosed by the current BLOCK).
722 (Mnemonic: like & in some editors.)
724 The string preceding whatever was matched by the last pattern match
725 (not counting any matches hidden within a BLOCK or eval enclosed by the current
727 (Mnemonic: \` often precedes a quoted string.)
729 The string following whatever was matched by the last pattern match
730 (not counting any matches hidden within a BLOCK or eval enclosed by the current
732 (Mnemonic: \' often follows a quoted string.)
739 print "$\`:$&:$\'\en"; # prints abc:def:ghi
743 The last bracket matched by the last search pattern.
744 This is useful if you don't know which of a set of alternative patterns
749 /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
752 (Mnemonic: be positive and forward looking.)
754 Set to 1 to do multiline matching within a string, 0 to tell
756 that it can assume that strings contain a single line, for the purpose
757 of optimizing pattern matches.
758 Pattern matches on strings containing multiple newlines can produce confusing
759 results when $* is 0.
761 (Mnemonic: * matches multiple things.)
763 Contains the name of the file containing the
765 script being executed.
766 (Mnemonic: same as sh and ksh.)
768 Contains the subpattern from the corresponding set of parentheses in the last
769 pattern matched, not counting patterns matched in nested blocks that have
771 (Mnemonic: like \edigit.)
773 The index of the first element in an array, and of the first character in
775 Default is 0, but you could set it to 1 to make
780 when subscripting and when evaluating the index() and substr() functions.
781 (Mnemonic: [ begins subscripts.)
783 The string printed out when you say \*(L"perl -v\*(R".
784 It can be used to determine at the beginning of a script whether the perl
785 interpreter executing the script is in the right range of versions.
790 # see if getc is available
791 ($version,$patchlevel) =
792 $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
793 print STDERR "(No filename completion available.)\en"
794 if $version * 1000 + $patchlevel < 2016;
797 (Mnemonic: Is this version of perl in the right bracket?)
799 The subscript separator for multi-dimensional array emulation.
800 If you refer to an associative array element as
806 $foo{join($;, $a, $b, $c)}
810 @foo{$a,$b,$c} # a slice--note the @
814 ($foo{$a},$foo{$b},$foo{$c})
817 Default is "\e034", the same as SUBSEP in
819 Note that if your keys contain binary data there might not be any safe
821 (Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
Yeah, I know, it's pretty lame, but $, is already taken for something more
important.)
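.Sp
A sketch showing that the two spellings reach the same element (the
array name and subscripts are arbitrary):
.nf

.ne 4
	$foo{3,4} = 'x';
	$key = join($;, 3, 4);
	print "same element\en" if $foo{$key} eq 'x';

.fi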
825 If used in a numeric context, yields the current value of errno, with all the
827 (This means that you shouldn't depend on the value of $! to be anything
828 in particular unless you've gotten a specific error return indicating a
830 If used in a string context, yields the corresponding system error string.
831 You can assign to $! in order to set errno
832 if, for instance, you want $! to return the string for error n, or you want
833 to set the exit value for the die operator.
834 (Mnemonic: What just went bang?)
836 The perl syntax error message from the last eval command.
837 If null, the last eval parsed and executed correctly (although the operations
838 you invoked may have failed in the normal fashion).
839 (Mnemonic: Where was the syntax error \*(L"at\*(R"?)
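.Sp
A minimal sketch of checking $@ after an eval; the first string is
deliberately malformed:
.nf

.ne 6
	eval '$x = ;';
	print "eval failed: $@" if $@;

	eval '$x = 1 + 2;';
	print "eval ok, x is $x\en" unless $@;	# prints eval ok, x is 3

.fi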
841 The real uid of this process.
842 (Mnemonic: it's the uid you came FROM, if you're running setuid.)
844 The effective uid of this process.
849 $< = $>; # set real uid to the effective uid
850 ($<,$>) = ($>,$<); # swap real and effective uid
853 (Mnemonic: it's the uid you went TO, if you're running setuid.)
854 Note: $< and $> can only be swapped on machines supporting setreuid().
856 The real gid of this process.
857 If you are on a machine that supports membership in multiple groups
858 simultaneously, gives a space separated list of groups you are in.
859 The first number is the one returned by getgid(), and the subsequent ones
860 by getgroups(), one of which may be the same as the first number.
861 (Mnemonic: parentheses are used to GROUP things.
862 The real gid is the group you LEFT, if you're running setgid.)
864 The effective gid of this process.
865 If you are on a machine that supports membership in multiple groups
866 simultaneously, gives a space separated list of groups you are in.
867 The first number is the one returned by getegid(), and the subsequent ones
868 by getgroups(), one of which may be the same as the first number.
869 (Mnemonic: parentheses are used to GROUP things.
870 The effective gid is the group that's RIGHT for you, if you're running setgid.)
872 Note: $<, $>, $( and $) can only be set on machines that support the
873 corresponding set[re][ug]id() routine.
874 $( and $) can only be swapped on machines supporting setregid().
876 The current set of characters after which a string may be broken to
877 fill continuation fields (starting with ^) in a format.
878 Default is "\ \en-", to break on whitespace or hyphens.
879 (Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
881 The array ARGV contains the command line arguments intended for the script.
Note that $#ARGV is generally the number of arguments minus one, since
883 $ARGV[0] is the first argument, NOT the command name.
884 See $0 for the command name.
886 The array INC contains the list of places to look for
889 evaluated by the \*(L"do EXPR\*(R" command.
890 It initially consists of the arguments to any
892 command line switches, followed
895 library, probably \*(L"/usr/local/lib/perl\*(R".
897 The associative array ENV contains your current environment.
898 Setting a value in ENV changes the environment for child processes.
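.Sp
For example, assuming a Unix-like shell and a made-up variable name:
.nf

.ne 3
	$ENV{'GREETING'} = 'hello';
	system('echo $GREETING');	# the child shell prints hello

.fi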
900 The associative array SIG is used to set signal handlers for various signals.
905 sub handler { # 1st argument is signal name
907 print "Caught a SIG$sig\-\|\-shutting down\en";
912 $SIG{\'INT\'} = \'handler\';
913 $SIG{\'QUIT\'} = \'handler\';
915 $SIG{\'INT\'} = \'DEFAULT\'; # restore default action
916 $SIG{\'QUIT\'} = \'IGNORE\'; # ignore SIGQUIT
919 The SIG array only contains values for the signals actually set within
922 Perl provides a mechanism for alternate namespaces to protect packages from
stomping on each other's variables.
924 By default, a perl script starts compiling into the package known as \*(L"main\*(R".
927 declaration, you can switch namespaces.
928 The scope of the package declaration is from the declaration itself to the end
929 of the enclosing block (the same scope as the local() operator).
930 Typically it would be the first declaration in a file to be included by
931 the \*(L"do FILE\*(R" operator.
932 You can switch into a package in more than one place; it merely influences
933 which symbol table is used by the compiler for the rest of that block.
934 You can refer to variables and filehandles in other packages by prefixing
935 the identifier with the package name and a single quote.
If the package name is null, the \*(L"main\*(R" package is assumed.
Only identifiers starting with letters are stored in the package's symbol
table.
940 All other symbols are kept in package \*(L"main\*(R".
941 In addition, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC
942 and SIG are forced to be in package \*(L"main\*(R", even when used for
943 other purposes than their built-in one.
944 Note also that, if you have a package called \*(L"m\*(R", \*(L"s\*(R"
or \*(L"y\*(R", then you can't use the qualified form of an identifier since it
will be interpreted instead as a pattern match, a substitution
or a translation.
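.PP
A small sketch of qualified names, using a hypothetical package called
counter (untested against this release):
.nf

.ne 9
	package counter;
	$count = 0;
	sub main'bump { ++$count; }	# compiled in counter, named in main

	package main;
	&bump;
	&bump;
	print 'bumped ', $counter'count, " times\en";	# prints bumped 2 times

.fi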
949 Eval'ed strings are compiled in the package in which the eval was compiled
951 (Assignments to $SIG{}, however, assume the signal handler specified is in the
953 Qualify the signal handler name if you wish to have a signal handler in
955 For an example, examine perldb.pl in the perl library.
956 It initially switches to the DB package so that the debugger doesn't interfere
957 with variables in the script you are trying to debug.
958 At various points, however, it temporarily switches back to the main package
959 to evaluate various expressions in the context of the main package.
961 The symbol table for a package happens to be stored in the associative array
962 of that name prepended with an underscore.
963 The value in each entry of the associative array is
964 what you are referring to when you use the *name notation.
965 In fact, the following have the same effect (in package main, anyway),
966 though the first is more
967 efficient because it does the symbol table lookups at compile time:
972 local($_main{'foo'}) = $_main{'bar'};
975 You can use this to print out all the variables in a package, for instance.
976 Here is dumpvar.pl from the perl library:
983 \& local(*stab) = eval("*_$package");
984 \& while (($key,$val) = each(%stab)) {
986 \& local(*entry) = $val;
987 \& if (defined $entry) {
988 \& print "\e$$key = '$entry'\en";
991 \& if (defined @entry) {
992 \& print "\e@$key = (\en";
993 \& foreach $num ($[ .. $#entry) {
994 \& print " $num\et'",$entry[$num],"'\en";
999 \& if ($key ne "_$package" && defined %entry) {
1000 \& print "\e%$key = (\en";
1001 \& foreach $key (sort keys(%entry)) {
1002 \& print " $key\et'",$entry{$key},"'\en";
1011 Note that, even though the subroutine is compiled in package dumpvar, the
1012 name of the subroutine is qualified so that its name is inserted into package
Each programmer will, of course, have his or her own preferences in regard
1016 to formatting, but there are some general guidelines that will make your
1017 programs easier to read.
1019 Just because you CAN do something a particular way doesn't mean that
1020 you SHOULD do it that way.
.I Perl
is designed to give you several ways to do anything, so consider picking
1023 the most readable one.
1026 open(FOO,$foo) || die "Can't open $foo: $!";
1030 die "Can't open $foo: $!" unless open(FOO,$foo);
1032 because the second way hides the main point of the statement in a
1036 print "Starting analysis\en" if $verbose;
1040 $verbose && print "Starting analysis\en";
1042 since the main point isn't whether the user typed -v or not.
1044 Similarly, just because an operator lets you assume default arguments
1045 doesn't mean that you have to make use of the defaults.
1046 The defaults are there for lazy systems programmers writing one-shot
1048 If you want your program to be readable, consider supplying the argument.
Along the same lines, just because you
.I can
omit parentheses in many places doesn't mean that you ought to:
1055 return print reverse sort num values array;
1056 return print(reverse(sort num (values(%array))));
1059 When in doubt, parenthesize.
1060 At the very least it will let some poor schmuck bounce on the % key in vi.
1062 Don't go through silly contortions to exit a loop at the top or the
1065 provides the "last" operator so you can exit in the middle.
1066 Just outdent it a little to make it more visible:
1080 Don't be afraid to use loop labels\*(--they're there to enhance readability as
1081 well as to allow multi-level loop breaks.
1084 For portability, when using features that may not be implemented on every
1085 machine, test the construct in an eval to see if it fails.
1086 If you know the version or patchlevel in which a particular feature was implemented,
1087 you can test $] to see if it will be there.
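As a rough sketch of the eval test just described, the fragment below probes
for symlink, which is not implemented on every machine.
The variable name $symlink_exists is only illustrative.
.nf

```perl
# Probe for symlink() without letting a missing implementation kill the script.
# If the call is unimplemented, the eval fails and leaves a message in $@.
eval 'symlink("", "");';
$symlink_exists = ($@ eq '') ? 1 : 0;
```

.fi
The script can then choose some other approach when $symlink_exists is 0.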
1089 Choose mnemonic identifiers.
1097 switch, your script will be run under a debugging monitor.
1098 It will halt before the first executable statement and ask you for a
1101 Prints out a help message.
1104 Executes until it reaches the beginning of another statement.
1107 Executes until the next breakpoint is reached.
1111 Single step around subroutine call.
1112 .Ip "l min+incr" 12 4
1113 List incr+1 lines starting at min.
1114 If min is omitted, starts where last listing left off.
1115 If incr is omitted, previous value of incr is used.
1116 .Ip "l min-max" 12 4
1117 List lines in the indicated range.
1119 List just the indicated line.
1121 List incr+1 more lines after last printed line.
1122 .Ip "l subname" 12 4
1124 If it's a long subroutine it just lists the beginning.
1125 Use \*(L"l\*(R" to list more.
1127 List lines that have breakpoints or actions.
1129 Toggle trace mode on or off.
1132 If line is omitted, sets a breakpoint on the current line
1133 that is about to be executed.
1134 Breakpoints may only be set on lines that begin an executable statement.
1135 .Ip "b subname" 12 4
1136 Set breakpoint at first executable line of subroutine.
1138 Lists the names of all subroutines.
1141 If line is omitted, deletes the breakpoint on the current line
1142 that is about to be executed.
1144 Delete all breakpoints.
1146 Delete all line actions.
1147 .Ip "V package" 12 4
1148 List all variables in package.
1149 Default is main package.
1150 .Ip "a line command" 12 4
1151 Set an action for line.
1152 A multi-line command may be entered by backslashing the newlines.
1153 .Ip "< command" 12 4
1154 Set an action to happen before every debugger prompt.
1155 A multi-line command may be entered by backslashing the newlines.
1156 .Ip "> command" 12 4
1157 Set an action to happen after the prompt when you've just given a command
1158 to return to executing the script.
1159 A multi-line command may be entered by backslashing the newlines.
1161 Redo a debugging command.
1162 If number is omitted, redoes the previous command.
1163 .Ip "! -number" 12 4
1164 Redo the command that was that many commands ago.
1165 .Ip "H -number" 12 4
1166 Display the last number commands.
1167 Only commands longer than one character are listed.
1168 If number is omitted, lists them all.
1172 Execute command as a perl statement.
1173 A missing semicolon will be supplied.
1175 Same as \*(L"print DB'OUT expr\*(R".
1176 The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
1177 may be redirected to.
1179 If you want to modify the debugger, copy perldb.pl from the perl library
1180 to your current directory and modify it as necessary.
1181 You can do some customization by setting up a .perldb file which contains
1182 initialization code.
1183 For instance, you could make aliases like these:
1186 $DB'alias{'len'} = 's/^len(.*)/p length($1)/';
1187 $DB'alias{'stop'} = 's/^stop (at|in)/b/';
1189 's/^\e./p "\e$DB\e'sub(\e$DB\e'line):\et",\e$DB\e'line[\e$DB\e'line]/';
1192 .Sh "Setuid Scripts"
1194 is designed to make it easy to write secure setuid and setgid scripts.
1195 Unlike shells, which are based on multiple substitution passes on each line
1198 uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
1199 Additionally, since the language has more built-in functionality, it
1200 has to rely less upon external (and possibly untrustworthy) programs to
1201 accomplish its purposes.
1203 In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
1204 insecure, but this kernel feature can be disabled.
1207 can emulate the setuid and setgid mechanism when it notices the otherwise
1208 useless setuid/gid bits on perl scripts.
1209 If the kernel feature isn't disabled,
1211 will complain loudly that your setuid script is insecure.
1212 You'll need to either disable the kernel setuid script feature, or put
1213 a C wrapper around the script.
1215 When perl is executing a setuid script, it takes special precautions to
1216 prevent you from falling into any obvious traps.
1217 (In some ways, a perl script is more secure than the corresponding
1219 Any command line argument, environment variable, or input is marked as
1220 \*(L"tainted\*(R", and may not be used, directly or indirectly, in any
1221 command that invokes a subshell, or in any command that modifies files,
1222 directories or processes.
1223 Any variable that is set within an expression that has previously referenced
1224 a tainted value also becomes tainted (even if it is logically impossible
1225 for the tainted value to influence the variable).
1230 $foo = shift; # $foo is tainted
1231 $bar = $foo,\'bar\'; # $bar is also tainted
1232 $xxx = <>; # Tainted
1233 $path = $ENV{\'PATH\'}; # Tainted, but see below
1234 $abc = \'abc\'; # Not tainted
1237 system "echo $foo"; # Insecure
1238 system "/bin/echo", $foo; # Secure (doesn't use sh)
1239 system "echo $bar"; # Insecure
1240 system "echo $abc"; # Insecure until PATH set
1243 $ENV{\'PATH\'} = \'/bin:/usr/bin\';
1244 $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
1246 $path = $ENV{\'PATH\'}; # Not tainted
1247 system "echo $abc"; # Is secure now!
1250 open(FOO,"$foo"); # OK
1251 open(FOO,">$foo"); # Not OK
1253 open(FOO,"echo $foo|"); # Not OK, but...
1254 open(FOO,"-|") || exec \'echo\', $foo; # OK
1256 $zzz = `echo $foo`; # Insecure, zzz tainted
1258 unlink $abc,$foo; # Insecure
1259 umask $foo; # Insecure
1262 exec "echo $foo"; # Insecure
1263 exec "echo", $foo; # Secure (doesn't use sh)
1264 exec "sh", \'-c\', $foo; # Considered secure, alas
1267 The taintedness is associated with each scalar value, so some elements
1268 of an array can be tainted, and others not.
1270 If you try to do something insecure, you will get a fatal error saying
1271 something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
1272 Note that you can still write an insecure system call or exec,
1273 but only by explicitly doing something like the last example above.
1274 You can also bypass the tainting mechanism by referencing
1277 presumes that if you reference a substring using $1, $2, etc., you knew
1278 what you were doing when you wrote the pattern:
1281 $ARGV[0] =~ /^\-P(\ew+)$/;
1282 $printer = $1; # Not tainted
1285 This is fairly secure since \ew+ doesn't match shell metacharacters.
1286 Use of .+ would have been insecure, but
1288 doesn't check for that, so you must be careful with your patterns.
1289 This is the ONLY mechanism for untainting user-supplied filenames if you
1290 want to do file operations on them (unless you make $> equal to $<).
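A minimal sketch of the pattern-match idiom above, wrapped in a subroutine.
The name printer_from_arg is hypothetical, and the explicit character class
is an assumption standing in for the word-character class of the earlier example.
.nf

```perl
# Return the printer name from a -Pname argument, or undef if the
# argument contains anything other than letters, digits and underscores.
sub printer_from_arg {
    local($arg) = @_;
    return $1 if $arg =~ /^-P([A-Za-z0-9_]+)$/;
    return undef;
}

$printer = &printer_from_arg('-Plaser');        # 'laser'
$rejected = &printer_from_arg('-Plaser;date');  # undef: ';' is not allowed
```

.fi
Because the value comes from $1, it should count as untainted under the rule
described above; whether that is safe still depends on how strict the pattern is.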
1292 It's also possible to get into trouble with other operations that don't care
1293 whether they use tainted values.
1294 Make judicious use of the file tests in dealing with any user-supplied
1296 When possible, do opens and such after setting $> = $<.
1298 doesn't prevent you from opening tainted filenames for reading, so be
1299 careful what you print out.
1300 The tainting mechanism is intended to prevent stupid mistakes, not to remove
1301 the need for thought.
1304 uses PATH in executing subprocesses, and in finding the script if \-S
1306 HOME or LOGDIR are used if chdir has no argument.
1310 uses no environment variables, except to make them available
1311 to the script being executed, and to child processes.
1312 However, scripts running setuid would do well to execute the following lines
1313 before doing anything else, just to keep people honest:
1317 $ENV{\'PATH\'} = \'/bin:/usr/bin\'; # or whatever you need
1318 $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
1319 $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
1323 Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
1325 MS-DOS port by Diomidis Spinellis <dds@cc.ic.ac.uk>
1327 /tmp/perl\-eXXXXXX temporary file for
1331 a2p awk to perl translator
1333 s2p sed to perl translator
1335 Compilation errors will tell you the line number of the error, with an
1336 indication of the next token or token type that was to be examined.
1337 (In the case of a script passed to
1343 is counted as one line.)
1345 Setuid scripts have additional constraints that can produce error messages
1346 such as \*(L"Insecure dependency\*(R".
1347 See the section on setuid scripts.
1351 users should take special note of the following:
1353 Semicolons are required after all simple statements in
1356 is not a statement delimiter.
1358 Curly brackets are required on ifs and whiles.
1360 Variables begin with $ or @ in
1363 Arrays index from 0 unless you set $[.
1364 Likewise string positions in substr() and index().
1366 You have to decide whether your array has numeric or string indices.
1368 Associative array values do not spring into existence upon mere reference.
1370 You have to decide whether you want to use string or numeric comparisons.
1372 Reading an input line does not split it for you. You get to split it yourself
1376 operator has different arguments.
1378 The current input line is normally in $_, not $0.
1379 It generally does not have the newline stripped.
1380 ($0 is the name of the program executed.)
1382 $<digit> does not refer to fields\*(--it refers to substrings matched by the last
1387 statement does not add field and record separators unless you set
1390 You must open your files before you print to them.
1392 The range operator is \*(L".\|.\*(R", not comma.
1393 (The comma operator works as in C.)
1395 The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
1396 (\*(L"~\*(R" is the one's complement operator, as in C.)
1398 The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
1399 (\*(L"^\*(R" is the XOR operator, as in C.)
1401 The concatenation operator is \*(L".\*(R", not the null string.
1402 (Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
1403 since the third slash would be interpreted as a division operator\*(--the
1404 tokener is in fact slightly context sensitive for operators like /, ?, and <.
1405 And in fact, . itself can be the beginning of a number.)
1413 The following variables work differently
1417 ARGC \h'|2.5i'$#ARGV
1419 FILENAME\h'|2.5i'$ARGV
1420 FNR \h'|2.5i'$. \- something
1421 FS \h'|2.5i'(whatever you like)
1422 NF \h'|2.5i'$#Fld, or some such
1427 RLENGTH \h'|2.5i'length($&)
1429 RSTART \h'|2.5i'length($\`)
1434 When in doubt, run the
1436 construct through a2p and see what it gives you.
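Three of the points above (explicit splitting, indexing from 0, and the
separate string and numeric comparison operators) can be sketched as follows.
The sample record and the variable names are made up for illustration, and $[
is assumed to be left at its default of 0.
.nf

```perl
# An input line is not split automatically; split it yourself.
$line = 'root:0:0:Operator';
@fld = split(/:/, $line);

$first = $fld[0];                 # arrays index from 0
$tail = substr($line, 5, 3);      # so do string positions: '0:0'

# Choose the comparison operator to suit the data.
$same_number = ('10' == '10.0');  # true: compared as numbers
$same_string = ('10' eq '10.0');  # false: compared as strings
```

.fi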
1438 Cerebral C programmers should take note of the following:
1440 Curly brackets are required on ifs and whiles.
1442 You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R"
1453 There's no switch statement.
1455 Variables begin with $ or @ in
1458 Printf does not implement *.
1460 Comments begin with #, not /*.
1462 You can't take the address of anything.
1464 ARGV must be capitalized.
1466 The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
1468 Signal handlers deal with signal names, not numbers.
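Since these calls return nonzero on success, they combine naturally with ||
and die.
A small sketch follows; the scratch file name under /tmp is purely an
assumption of this example.
.nf

```perl
$file = "/tmp/perlman$$";          # scratch name built from the process ID
open(TMP, ">$file") || die "Can't create $file: $!";
close(TMP);

$renamed = rename($file, "$file.bak");   # nonzero: rename succeeded
$removed = unlink("$file.bak");          # 1: one file deleted
$missing = unlink("$file.bak");          # 0: nothing left to delete
```

.fi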
1472 programmers should take note of the following:
1474 Backreferences in substitutions use $ rather than \e.
1476 The pattern matching metacharacters (, ), and | do not have backslashes in front.
1478 The range operator is .\|. rather than comma.
1480 Sharp shell programmers should take note of the following:
1482 The backtick operator does variable interpretation without regard to the
1483 presence of single quotes in the command.
1485 The backtick operator does no translation of the return value, unlike csh.
1487 Shells (especially csh) do several levels of substitution on each command line.
1489 does substitution only in certain constructs such as double quotes,
1490 backticks, angle brackets and search patterns.
1492 Shells interpret scripts a little bit at a time.
1494 compiles the whole program before executing it.
1496 The arguments are available via @ARGV, not $1, $2, etc.
1498 The environment is not automatically made available as variables.
1502 is at the mercy of your machine's definitions of various operations
1503 such as type casting, atof() and sprintf().
1505 If your stdio requires a seek or eof between reads and writes on a particular
1509 While none of the built-in data types have any arbitrary size limits (apart
1510 from memory size), there are still a few arbitrary limits:
1511 a given identifier may not be longer than 255 characters;
1512 sprintf is limited on many machines to 128 characters per field (unless the format
1513 specifier is exactly %s);
1514 and no component of your PATH may be longer than 255 characters if you use \-S.
1517 actually stands for Pathologically Eclectic Rubbish Lister, but don't tell