perl 3.0 patch #38 (combined patch)
a687059c 1''' Beginning of part 4
e5d73d77 2''' $Header: perl_man.4,v 3.0.1.12 90/10/20 02:15:43 lwall Locked $
a687059c 3'''
4''' $Log: perl.man.4,v $
e5d73d77 5''' Revision 3.0.1.12 90/10/20 02:15:43 lwall
6''' patch37: fixed various typos in man page
7'''
76854fea 8''' Revision 3.0.1.11 90/10/16 10:04:28 lwall
9''' patch29: added @###.## fields to format
10'''
33b78306 11''' Revision 3.0.1.10 90/08/09 04:47:35 lwall
12''' patch19: added require operator
13''' patch19: added numeric interpretation of $]
14'''
15''' Revision 3.0.1.9 90/08/03 11:15:58 lwall
16''' patch19: Intermediate diffs for Randal
17'''
0f85fab0 18''' Revision 3.0.1.8 90/03/27 16:19:31 lwall
19''' patch16: MSDOS support
20'''
63f2c1e1 21''' Revision 3.0.1.7 90/03/14 12:29:50 lwall
22''' patch15: man page falsely states that you can't subscript array values
23'''
79a0689e 24''' Revision 3.0.1.6 90/03/12 16:54:04 lwall
25''' patch13: improved documentation of *name
26'''
ac58e20f 27''' Revision 3.0.1.5 90/02/28 18:01:52 lwall
28''' patch9: $0 is now always the command name
29'''
663a0e37 30''' Revision 3.0.1.4 89/12/21 20:12:39 lwall
31''' patch7: documented that package'filehandle works as well as $package'variable
32''' patch7: documented which identifiers are always in package main
33'''
ffed7fef 34''' Revision 3.0.1.3 89/11/17 15:32:25 lwall
35''' patch5: fixed some manual typos and indent problems
36''' patch5: clarified difference between $! and $@
37'''
ae986130 38''' Revision 3.0.1.2 89/11/11 04:46:40 lwall
39''' patch2: made some line breaks depend on troff vs. nroff
40''' patch2: clarified operation of ^ and $ when $* is false
41'''
03a14243 42''' Revision 3.0.1.1 89/10/26 23:18:43 lwall
43''' patch1: documented the desirability of unnecessary parentheses
44'''
a687059c 45''' Revision 3.0 89/10/18 15:21:55 lwall
46''' 3.0 baseline
47'''
48.Sh "Precedence"
49.I Perl
50operators have the following associativity and precedence:
51.nf
52
53nonassoc\h'|1i'print printf exec system sort reverse
54\h'1.5i'chmod chown kill unlink utime die return
55left\h'|1i',
56right\h'|1i'= += \-= *= etc.
57right\h'|1i'?:
58nonassoc\h'|1i'.\|.
59left\h'|1i'||
60left\h'|1i'&&
61left\h'|1i'| ^
62left\h'|1i'&
63nonassoc\h'|1i'== != eq ne
64nonassoc\h'|1i'< > <= >= lt gt le ge
65nonassoc\h'|1i'chdir exit eval reset sleep rand umask
66nonassoc\h'|1i'\-r \-w \-x etc.
67left\h'|1i'<< >>
68left\h'|1i'+ \- .
69left\h'|1i'* / % x
70left\h'|1i'=~ !~
71right\h'|1i'! ~ and unary minus
72right\h'|1i'**
73nonassoc\h'|1i'++ \-\|\-
74left\h'|1i'\*(L'(\*(R'
75
76.fi
77As mentioned earlier, if any list operator (print, etc.) or
78any unary operator (chdir, etc.)
79is followed by a left parenthesis as the next token on the same line,
80the operator and arguments within parentheses are taken to
81be of highest precedence, just like a normal function call.
82Examples:
83.nf
84
ffed7fef 85 chdir $foo || die;\h'|3i'# (chdir $foo) || die
86 chdir($foo) || die;\h'|3i'# (chdir $foo) || die
87 chdir ($foo) || die;\h'|3i'# (chdir $foo) || die
88 chdir +($foo) || die;\h'|3i'# (chdir $foo) || die
a687059c 89
90but, because * is higher precedence than ||:
91
ffed7fef 92 chdir $foo * 20;\h'|3i'# chdir ($foo * 20)
93 chdir($foo) * 20;\h'|3i'# (chdir $foo) * 20
94 chdir ($foo) * 20;\h'|3i'# (chdir $foo) * 20
95 chdir +($foo) * 20;\h'|3i'# chdir ($foo * 20)
a687059c 96
ffed7fef 97 rand 10 * 20;\h'|3i'# rand (10 * 20)
98 rand(10) * 20;\h'|3i'# (rand 10) * 20
99 rand (10) * 20;\h'|3i'# (rand 10) * 20
100 rand +(10) * 20;\h'|3i'# rand (10 * 20)
a687059c 101
102.fi
103In the absence of parentheses,
104the precedence of list operators such as print, sort or chmod is
105either very high or very low depending on whether you look at the left
106side of the operator or the right side of it.
107For example, in
108.nf
109
110 @ary = (1, 3, sort 4, 2);
111 print @ary; # prints 1324
112
113.fi
114the commas on the right of the sort are evaluated before the sort, but
115the commas on the left are evaluated after.
116In other words, list operators tend to gobble up all the arguments that
117follow them, and then act like a simple term with regard to the preceding
118expression.
119Note that you have to be careful with parens:
120.nf
121
122.ne 3
123 # These evaluate exit before doing the print:
124 print($foo, exit); # Obviously not what you want.
125 print $foo, exit; # Nor is this.
126
127.ne 4
128 # These do the print before evaluating exit:
129 (print $foo), exit; # This is what you want.
130 print($foo), exit; # Or this.
131 print ($foo), exit; # Or even this.
132
133Also note that
134
135 print ($foo & 255) + 1, "\en";
136
137.fi
138probably doesn't do what you expect at first glance.
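To make the last point concrete, here is a small sketch of how that statement is parsed and two ways to get the intended result. The value assigned to $foo and the variable $low are illustrative assumptions, not part of the original text, and the sketch has not been checked against perl 3.0 itself.

```perl
$foo = 513;                     # low byte of 513 is 1

# Because "(" is the next token after print, this is parsed as
#   (print($foo & 255)) + 1, "\n";
# so only the low byte is printed; the "+ 1" and the newline are discarded.
print ($foo & 255) + 1, "\n";

# Parenthesize the whole argument list to print the sum and a newline.
print(($foo & 255) + 1, "\n");

# Or compute the value first and print it separately.
$low = ($foo & 255) + 1;
print $low, "\n";
```

The same reasoning applies to any list operator or named unary operator that is immediately followed by a left parenthesis.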
139.Sh "Subroutines"
140A subroutine may be declared as follows:
141.nf
142
143 sub NAME BLOCK
144
145.fi
146.PP
147Any arguments passed to the routine come in as array @_,
148that is ($_[0], $_[1], .\|.\|.).
149The array @_ is a local array, but its values are references to the
150actual scalar parameters.
151The return value of the subroutine is the value of the last expression
152evaluated, and can be either an array value or a scalar value.
153Alternatively, a return statement may be used to specify the returned value and
154exit the subroutine.
155To create local variables see the
156.I local
157operator.
158.PP
159A subroutine is called using the
160.I do
161operator or the & operator.
162.nf
163
164.ne 12
165Example:
166
167 sub MAX {
168 local($max) = pop(@_);
169 foreach $foo (@_) {
170 $max = $foo \|if \|$max < $foo;
171 }
172 $max;
173 }
174
175 .\|.\|.
176 $bestday = &MAX($mon,$tue,$wed,$thu,$fri);
177
178.ne 21
179Example:
180
181 # get a line, combining continuation lines
182 # that start with whitespace
183 sub get_line {
184 $thisline = $lookahead;
185 line: while ($lookahead = <STDIN>) {
186 if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
187 $thisline \|.= \|$lookahead;
188 }
189 else {
190 last line;
191 }
192 }
193 $thisline;
194 }
195
196 $lookahead = <STDIN>; # get first line
197 while ($_ = do get_line(\|)) {
198 .\|.\|.
199 }
200
201.fi
202.nf
203.ne 6
204Use array assignment to a local list to name your formal arguments:
205
206 sub maybeset {
207 local($key, $value) = @_;
208 $foo{$key} = $value unless $foo{$key};
209 }
210
211.fi
212This also has the effect of turning call-by-reference into call-by-value,
213since the assignment copies the values.
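The difference between working on @_ directly and working on a local copy can be sketched as follows. The subroutine names bump_in_place and bump_copy and the variables $x, $y and $r are hypothetical, chosen only for illustration.

```perl
# Modifies the caller's variable, since $_[0] refers to the actual argument.
sub bump_in_place {
    $_[0]++;
}

# Works on a copy, since the list assignment copies the values out of @_.
sub bump_copy {
    local($n) = @_;
    $n++;
    $n;                         # return value is the last expression
}

$x = 5;
&bump_in_place($x);             # $x is now 6

$y = 5;
$r = &bump_copy($y);            # $y is still 5, $r is 6
```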
214.Sp
215Subroutines may be called recursively.
216If a subroutine is called using the & form, the argument list is optional.
217If omitted, no @_ array is set up for the subroutine; the @_ array at the
218time of the call is visible to the subroutine instead.
219.nf
220
221 do foo(1,2,3); # pass three arguments
222 &foo(1,2,3); # the same
223
224 do foo(); # pass a null list
225 &foo(); # the same
226 &foo; # pass no arguments--more efficient
227
228.fi
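A minimal sketch of the difference between &foo; and &foo() follows. The subroutine names count_args, share and fresh are assumptions made for the example.

```perl
# Returns the number of elements currently in @_.
sub count_args {
    $#_ + 1;
}

# No argument list: count_args sees the @_ that share was called with.
sub share {
    &count_args;
}

# Empty argument list: count_args gets a new, empty @_.
sub fresh {
    &count_args();
}

$shared = &share(1, 2, 3);      # 3
$empty  = &fresh(1, 2, 3);      # 0
```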
229.Sh "Passing By Reference"
230Sometimes you don't want to pass the value of an array to a subroutine but
231rather the name of it, so that the subroutine can modify the global copy
232of it rather than working with a local copy.
233In perl you can refer to all the objects of a particular name by prefixing
234the name with a star: *foo.
235When evaluated, it produces a scalar value that represents all the objects
79a0689e 236of that name, including any filehandle, format or subroutine.
a687059c 237When assigned to within a local() operation, it causes the name mentioned
238to refer to whatever * value was assigned to it.
239Example:
240.nf
241
242 sub doubleary {
243 local(*someary) = @_;
244 foreach $elem (@someary) {
245 $elem *= 2;
246 }
247 }
248 do doubleary(*foo);
249 do doubleary(*bar);
250
251.fi
252Assignment to *name is currently recommended only inside a local().
253You can actually assign to *name anywhere, but the previous referent of
254*name may be stranded forever.
255This may or may not bother you.
256.Sp
257Note that scalars are already passed by reference, so you can modify scalar
ae986130 258arguments without using this mechanism by referring explicitly to the $_[nnn]
a687059c 259in question.
260You can modify all the elements of an array by passing all the elements
261as scalars, but you have to use the * mechanism to push, pop or change the
262size of an array.
263The * mechanism will probably be more efficient in any case.
264.Sp
265Since a *name value contains unprintable binary data, if it is used as
266an argument in a print, or as a %s argument in a printf or sprintf, it
267then has the value '*name', just so it prints out pretty.
79a0689e 268.Sp
269Even if you don't want to modify an array, this mechanism is useful for
270passing multiple arrays in a single LIST, since normally the LIST mechanism
271will merge all the array values so that you can't extract out the
272individual arrays.
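As a sketch of both points (passing more than one array, and changing the size of an array from inside a subroutine), consider the following. The names append_to, head and tail are hypothetical and do not appear elsewhere in this manual.

```perl
# Appends the second array to the first. Both are passed by name,
# so the arrays stay separate instead of merging into one flat LIST.
sub append_to {
    local(*dst, *src) = @_;
    push(@dst, @src);           # changes the size of the caller's array
}

@head = (1, 2);
@tail = (3, 4);
&append_to(*head, *tail);       # @head is now (1, 2, 3, 4)
```

Had the call been written &append_to(@head, @tail), the subroutine would have received a single list of four scalars and could not have told where one array ended and the other began.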
a687059c 273.Sh "Regular Expressions"
274The patterns used in pattern matching are regular expressions such as
275those supplied in the Version 8 regexp routines.
276(In fact, the routines are derived from Henry Spencer's freely redistributable
277reimplementation of the V8 routines.)
278In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
279Word boundaries may be matched by \eb, and non-boundaries by \eB.
280A whitespace character is matched by \es, non-whitespace by \eS.
281A numeric character is matched by \ed, non-numeric by \eD.
282You may use \ew, \es and \ed within character classes.
283Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
284Within character classes \eb represents backspace rather than a word boundary.
285Alternatives may be separated by |.
286The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
287matches the digit'th substring, where digit can range from 1 to 9.
288(Outside of the pattern, always use $ instead of \e in front of the digit.
289The scope of $<digit> (and $\`, $& and $\')
290extends to the end of the enclosing BLOCK or eval string, or to
291the next pattern match with subexpressions.
292The \e<digit> notation sometimes works outside the current pattern, but should
293not be relied upon.)
294$+ returns whatever the last bracket match matched.
295$& returns the entire matched string.
ac58e20f 296($0 used to return the same thing, but not any more.)
a687059c 297$\` returns everything before the matched string.
298$\' returns everything after the matched string.
299Examples:
300.nf
301
302 s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/; # swap first two words
303
304.ne 5
305 if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
306 $hours = $1;
307 $minutes = $2;
308 $seconds = $3;
309 }
310
311.fi
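The following sketch shows a backreference used inside a pattern, together with the variables that describe the match afterwards. The sample string and the variable names on the left-hand sides are illustrative assumptions.

```perl
$_ = 'bookkeeper';

# (.)\1 matches any character followed by the same character again.
if (/(.)\1/) {
    $doubled = $1;              # 'o'   (the bracketed substring)
    $matched = $&;              # 'oo'  (the entire matched string)
    $before  = $`;              # 'b'   (everything before the match)
    $after   = $';              # 'kkeeper' (everything after the match)
}
```

Note that \1 is used inside the pattern, while $1 is used outside of it.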
ae986130 312By default, the ^ character is only guaranteed to match at the beginning
313of the string,
314the $ character only at the end (or before the newline at the end)
a687059c 315and
316.I perl
317does certain optimizations with the assumption that the string contains
318only one line.
ae986130 319The behavior of ^ and $ on embedded newlines will be inconsistent.
a687059c 320You may, however, wish to treat a string as a multi-line buffer, such that
321the ^ will match after any newline within the string, and $ will match
322before any newline.
323At the cost of a little more overhead, you can do this by setting the variable
324$* to 1.
325Setting it back to 0 makes
326.I perl
327revert to its old behavior.
328.PP
329To facilitate multi-line substitutions, the . character never matches a newline
330(even when $* is 0).
331In particular, the following leaves a newline on the $_ string:
332.nf
333
334 $_ = <STDIN>;
335 s/.*(some_string).*/$1/;
336
337If the newline is unwanted, try one of
338
339 s/.*(some_string).*\en/$1/;
340 s/.*(some_string)[^\e000]*/$1/;
341 s/.*(some_string)(.|\en)*/$1/;
342 chop; s/.*(some_string).*/$1/;
343 /(some_string)/ && ($_ = $1);
344
345.fi
346Any item of a regular expression may be followed with digits in curly brackets
347of the form {n,m}, where n gives the minimum number of times to match the item
348and m gives the maximum.
349The form {n} is equivalent to {n,n} and matches exactly n times.
350The form {n,} matches n or more times.
351(If a curly bracket occurs in any other context, it is treated as a regular
352character.)
353The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
354to {0,1}.
355There is no limit to the size of n or m, but large numbers will chew up
356more memory.
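A brief sketch of the three curly bracket forms, using made-up test strings and variable names:

```perl
# {n} matches exactly n times.
$exact = ('aaa'  =~ /^a{3}$/)   ? 1 : 0;    # 1

# {n,m} matches at least n and at most m times; four a's are too many.
$range = ('aaaa' =~ /^a{2,3}$/) ? 1 : 0;    # 0

# {n,} matches n or more times.
$open  = ('aaaa' =~ /^a{2,}$/)  ? 1 : 0;    # 1
```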
357.Sp
358You will note that all backslashed metacharacters in
359.I perl
360are alphanumeric,
361such as \eb, \ew, \en.
362Unlike some other regular expression languages, there are no backslashed
363symbols that aren't alphanumeric.
364So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
365interpreted as a literal character, not a metacharacter.
366This makes it simple to quote a string that you want to use for a pattern
367but that you are afraid might contain metacharacters.
368Simply quote all the non-alphanumeric characters:
369.nf
370
371 $pattern =~ s/(\eW)/\e\e$1/g;
372
373.fi
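Here is a sketch of that quoting step in use. The sample pattern and the strings it is matched against are assumptions made for the example.

```perl
$pattern = 'a.b*c';             # contains the metacharacters . and *

# Put a backslash in front of every non-alphanumeric character.
$pattern =~ s/(\W)/\\$1/g;      # $pattern is now a\.b\*c

# The quoted pattern matches only the literal text.
$hit  = ('xa.b*cx' =~ /$pattern/) ? 1 : 0;  # 1
$miss = ('xaXbbcx' =~ /$pattern/) ? 1 : 0;  # 0
```

Without the quoting step the second string would also match, since . would match X and b* would match bb.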
374.Sh "Formats"
375Output record formats for use with the
376.I write
377operator may be declared as follows:
378.nf
379
380.ne 3
381 format NAME =
382 FORMLIST
383 .
384
385.fi
386If NAME is omitted, format \*(L"STDOUT\*(R" is defined.
387FORMLIST consists of a sequence of lines, each of which may be of one of three
388types:
389.Ip 1. 4
390A comment.
391.Ip 2. 4
392A \*(L"picture\*(R" line giving the format for one output line.
393.Ip 3. 4
394An argument line supplying values to plug into a picture line.
395.PP
396Picture lines are printed exactly as they look, except for certain fields
397that substitute values into the line.
398Each picture field starts with either @ or ^.
399The @ field (not to be confused with the array marker @) is the normal
400case; ^ fields are used
401to do rudimentary multi-line text block filling.
402The length of the field is supplied by padding out the field
403with multiple <, >, or | characters to specify, respectively, left justification,
404right justification, or centering.
76854fea 405As an alternate form of right justification,
406you may also use # characters (with an optional .) to specify a numeric field.
a687059c 407If any of the values supplied for these fields contains a newline, only
408the text up to the newline is printed.
409The special field @* can be used for printing multi-line values.
410It should appear by itself on a line.
411.PP
412The values are specified on the following line, in the same order as
413the picture fields.
414The values should be separated by commas.
415.PP
416Picture fields that begin with ^ rather than @ are treated specially.
417The value supplied must be a scalar variable name which contains a text
418string.
419.I Perl
420puts as much text as it can into the field, and then chops off the front
421of the string so that the next time the variable is referenced,
422more of the text can be printed.
423Normally you would use a sequence of fields in a vertical stack to print
424out a block of text.
425If you like, you can end the final field with .\|.\|., which will appear in the
426output if the text was too long to appear in its entirety.
427You can change which characters are legal to break on by changing the
428variable $: to a list of the desired characters.
429.PP
430Since use of ^ fields can produce variable length records if the text to be
431formatted is short, you can suppress blank lines by putting the tilde (~)
432character anywhere in the line.
433(Normally you should put it in the front if possible, for visibility.)
434The tilde will be translated to a space upon output.
435If you put a second tilde contiguous to the first, the line will be repeated
436until all the fields on the line are exhausted.
437(If you use a field of the @ variety, the expression you supply had better
438not give the same value every time forever!)
439.PP
440Examples:
441.nf
442.lg 0
443.cs R 25
444.ft C
445
446.ne 10
447# a report on the /etc/passwd file
448format top =
449\& Passwd File
450Name Login Office Uid Gid Home
451------------------------------------------------------------------
452\&.
453format STDOUT =
454@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
455$name, $login, $office,$uid,$gid, $home
456\&.
457
458.ne 29
459# a report from a bug report form
460format top =
461\& Bug Reports
462@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
463$system, $%, $date
464------------------------------------------------------------------
465\&.
466format STDOUT =
467Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
468\& $subject
469Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
470\& $index, $description
471Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
472\& $priority, $date, $description
473From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
474\& $from, $description
475Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
476\& $programmer, $description
477\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
478\& $description
479\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
480\& $description
481\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
482\& $description
483\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
484\& $description
485\&~ ^<<<<<<<<<<<<<<<<<<<<<<<...
486\& $description
487\&.
488
489.ft R
490.cs R
491.lg
492.fi
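For comparison with the larger reports above, here is a much smaller sketch that combines a numeric field, a ^ field and a repeated ~~ line. The variables $cost and $text and their contents are assumptions made for the example; the layout is only one of many possible.

```perl
$cost = 3.14159;
$text = "This note is long enough to wrap onto several lines.";

format STDOUT =
Cost: @#.##
      $cost
Note: ^<<<<<<<<<<<<<<<
      $text
~~    ^<<<<<<<<<<<<<<<
      $text
.

# Each ^ field chops what it printed off the front of $text, and the
# ~~ line repeats until nothing is left to print.
write;
```

Because the ^ fields consume the variable as they go, $text should be saved beforehand if it is needed again after the write.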
493It is possible to intermix prints with writes on the same output channel,
494but you'll have to handle $\- (lines left on the page) yourself.
495.PP
496If you are printing lots of fields that are usually blank, you should consider
497using the reset operator between records.
498Not only is it more efficient, but it can prevent the bug of adding another
499field and forgetting to zero it.
500.Sh "Interprocess Communication"
501The IPC facilities of perl are built on the Berkeley socket mechanism.
502If you don't have sockets, you can ignore this section.
503The calls have the same names as the corresponding system calls,
504but the arguments tend to differ, for two reasons.
505First, perl file handles work differently than C file descriptors.
506Second, perl already knows the length of its strings, so you don't need
507to pass that information.
508Here is a sample client (untested):
509.nf
510
511 ($them,$port) = @ARGV;
512 $port = 2345 unless $port;
513 $them = 'localhost' unless $them;
514
515 $SIG{'INT'} = 'dokill';
516 sub dokill { kill 9,$child if $child; }
517
33b78306 518 require 'sys/socket.ph';
a687059c 519
520 $sockaddr = 'S n a4 x8';
521 chop($hostname = `hostname`);
522
523 ($name, $aliases, $proto) = getprotobyname('tcp');
524 ($name, $aliases, $port) = getservbyname($port, 'tcp')
0f85fab0 525 unless $port =~ /^\ed+$/;
ae986130 526.ie t \{\
a687059c 527 ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
ae986130 528'br\}
529.el \{\
530 ($name, $aliases, $type, $len, $thisaddr) =
531 gethostbyname($hostname);
532'br\}
a687059c 533 ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);
534
535 $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
536 $that = pack($sockaddr, &AF_INET, $port, $thataddr);
537
538 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
539 bind(S, $this) || die "bind: $!";
540 connect(S, $that) || die "connect: $!";
541
542 select(S); $| = 1; select(stdout);
543
544 if ($child = fork) {
545 while (<>) {
546 print S;
547 }
548 sleep 3;
549 do dokill();
550 }
551 else {
552 while (<S>) {
553 print;
554 }
555 }
556
557.fi
558And here's a server:
559.nf
560
561 ($port) = @ARGV;
562 $port = 2345 unless $port;
563
33b78306 564 require 'sys/socket.ph';
a687059c 565
566 $sockaddr = 'S n a4 x8';
567
568 ($name, $aliases, $proto) = getprotobyname('tcp');
569 ($name, $aliases, $port) = getservbyname($port, 'tcp')
0f85fab0 570 unless $port =~ /^\ed+$/;
a687059c 571
572 $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");
573
574 select(NS); $| = 1; select(stdout);
575
576 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
577 bind(S, $this) || die "bind: $!";
578 listen(S, 5) || die "listen: $!";
579
580 select(S); $| = 1; select(stdout);
581
582 for (;;) {
583 print "Listening again\en";
584 ($addr = accept(NS,S)) || die $!;
585 print "accept ok\en";
586
ae986130 587 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
a687059c 588 @inetaddr = unpack('C4',$inetaddr);
589 print "$af $port @inetaddr\en";
590
591 while (<NS>) {
592 print;
593 print NS;
594 }
595 }
596
597.fi
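The $sockaddr template used by both programs can be tried out without opening a socket. In this sketch the address family is hard-coded as 2, which is the usual value of AF_INET but is an assumption here (the programs above take it from sys/socket.ph), and the port and address are made up.

```perl
$sockaddr = 'S n a4 x8';        # family, port, 4-byte address, 8 pad bytes

$af_inet = 2;                   # assumed value of &AF_INET
$addr    = pack('C4', 127, 0, 0, 1);

$packed  = pack($sockaddr, $af_inet, 2345, $addr);

# Take the structure apart again, as the server does after accept.
($af, $port, $inetaddr) = unpack($sockaddr, $packed);
@inetaddr = unpack('C4', $inetaddr);

print "$af $port @inetaddr\n";
```

The n in the template stores the port in network byte order, which is why the same template works for both packing and unpacking regardless of the byte order of the machine.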
598.Sh "Predefined Names"
599The following names have special meaning to
600.IR perl .
601I could have used alphabetic symbols for some of these, but I didn't want
602to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
603out.
604You'll just have to suffer along with these silly symbols.
605Most of them have reasonable mnemonics, or analogues in one of the shells.
606.Ip $_ 8
607The default input and pattern-searching space.
608The following pairs are equivalent:
609.nf
610
611.ne 2
612 while (<>) {\|.\|.\|. # only equivalent in while!
613 while ($_ = <>) {\|.\|.\|.
614
615.ne 2
616 /\|^Subject:/
617 $_ \|=~ \|/\|^Subject:/
618
619.ne 2
620 y/a\-z/A\-Z/
621 $_ =~ y/a\-z/A\-Z/
622
623.ne 2
624 chop
625 chop($_)
626
627.fi
628(Mnemonic: underline is understood in certain operations.)
629.Ip $. 8
630The current input line number of the last filehandle that was read.
631Readonly.
632Remember that only an explicit close on the filehandle resets the line number.
633Since <> never does an explicit close, line numbers increase across ARGV files
634(but see examples under eof).
635(Mnemonic: many programs use . to mean the current line number.)
636.Ip $/ 8
637The input record separator, newline by default.
638Works like
639.IR awk 's
640RS variable, including treating blank lines as delimiters
641if set to the null string.
642If set to a value longer than one character, only the first character is used.
643(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
644.Ip $, 8
645The output field separator for the print operator.
646Ordinarily the print operator simply prints out the comma separated fields
647you specify.
648In order to get behavior more like
649.IR awk ,
650set this variable as you would set
651.IR awk 's
652OFS variable to specify what is printed between fields.
653(Mnemonic: what is printed when there is a , in your print statement.)
654.Ip $"" 8
655This is like $, except that it applies to array values interpolated into
656a double-quoted string (or similar interpreted string).
657Default is a space.
658(Mnemonic: obvious, I think.)
659.Ip $\e 8
660The output record separator for the print operator.
661Ordinarily the print operator simply prints out the comma separated fields
662you specify, with no trailing newline or record separator assumed.
663In order to get behavior more like
664.IR awk ,
665set this variable as you would set
666.IR awk 's
667ORS variable to specify what is printed at the end of the print.
668(Mnemonic: you set $\e instead of adding \en at the end of the print.
669Also, it's just like /, but it's what you get \*(L"back\*(R" from
670.IR perl .)
671.Ip $# 8
672The output format for printed numbers.
673This variable is a half-hearted attempt to emulate
674.IR awk 's
675OFMT variable.
676There are times, however, when
677.I awk
678and
679.I perl
680have differing notions of what
681is in fact numeric.
682Also, the initial value is %.20g rather than %.6g, so you need to set $#
683explicitly to get
684.IR awk 's
685value.
686(Mnemonic: # is the number sign.)
687.Ip $% 8
688The current page number of the currently selected output channel.
689(Mnemonic: % is page number in nroff.)
690.Ip $= 8
691The current page length (printable lines) of the currently selected output
692channel.
693Default is 60.
694(Mnemonic: = has horizontal lines.)
695.Ip $\- 8
696The number of lines left on the page of the currently selected output channel.
697(Mnemonic: lines_on_page \- lines_printed.)
698.Ip $~ 8
699The name of the current report format for the currently selected output
700channel.
701(Mnemonic: brother to $^.)
702.Ip $^ 8
703The name of the current top-of-page format for the currently selected output
704channel.
705(Mnemonic: points to top of page.)
706.Ip $| 8
707If set to nonzero, forces a flush after every write or print on the currently
708selected output channel.
709Default is 0.
710Note that
711.I STDOUT
712will typically be line buffered if output is to the
713terminal and block buffered otherwise.
714Setting this variable is useful primarily when you are outputting to a pipe,
715such as when you are running a
716.I perl
717script under rsh and want to see the
718output as it's happening.
719(Mnemonic: when you want your pipes to be piping hot.)
720.Ip $$ 8
721The process number of the
722.I perl
723running this script.
724(Mnemonic: same as shells.)
725.Ip $? 8
726The status returned by the last pipe close, backtick (\`\`) command or
727.I system
728operator.
729Note that this is the status word returned by the wait() system
730call, so the exit value of the subprocess is actually ($? >> 8).
731$? & 255 gives which signal, if any, the process died from, and whether
732there was a core dump.
733(Mnemonic: similar to sh and ksh.)
734.Ip $& 8 4
735The string matched by the last pattern match (not counting any matches hidden
736within a BLOCK or eval enclosed by the current BLOCK).
737(Mnemonic: like & in some editors.)
738.Ip $\` 8 4
739The string preceding whatever was matched by the last pattern match
740(not counting any matches hidden within a BLOCK or eval enclosed by the current
741BLOCK).
742(Mnemonic: \` often precedes a quoted string.)
743.Ip $\' 8 4
744The string following whatever was matched by the last pattern match
745(not counting any matches hidden within a BLOCK or eval enclosed by the current
746BLOCK).
747(Mnemonic: \' often follows a quoted string.)
748Example:
749.nf
750
751.ne 3
752 $_ = \'abcdefghi\';
753 /def/;
754 print "$\`:$&:$\'\en"; # prints abc:def:ghi
755
756.fi
757.Ip $+ 8 4
758The last bracket matched by the last search pattern.
759This is useful if you don't know which of a set of alternative patterns
760matched.
761For example:
762.nf
763
764 /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
765
766.fi
767(Mnemonic: be positive and forward looking.)
768.Ip $* 8 2
769Set to 1 to do multiline matching within a string, 0 to tell
770.I perl
771that it can assume that strings contain a single line, for the purpose
772of optimizing pattern matches.
773Pattern matches on strings containing multiple newlines can produce confusing
774results when $* is 0.
775Default is 0.
776(Mnemonic: * matches multiple things.)
777.Ip $0 8
778Contains the name of the file containing the
779.I perl
780script being executed.
a687059c 781(Mnemonic: same as sh and ksh.)
782.Ip $<digit> 8
783Contains the subpattern from the corresponding set of parentheses in the last
784pattern matched, not counting patterns matched in nested blocks that have
785been exited already.
786(Mnemonic: like \edigit.)
787.Ip $[ 8 2
788The index of the first element in an array, and of the first character in
789a substring.
790Default is 0, but you could set it to 1 to make
791.I perl
792behave more like
793.I awk
794(or Fortran)
795when subscripting and when evaluating the index() and substr() functions.
796(Mnemonic: [ begins subscripts.)
797.Ip $] 8 2
798The string printed out when you say \*(L"perl -v\*(R".
799It can be used to determine at the beginning of a script whether the perl
800interpreter executing the script is in the right range of versions.
33b78306 801If used in a numeric context, returns the version + patchlevel / 1000.
a687059c 802Example:
803.nf
804
33b78306 805.ne 8
a687059c 806 # see if getc is available
807 ($version,$patchlevel) =
808 $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
809 print STDERR "(No filename completion available.)\en"
810 if $version * 1000 + $patchlevel < 2016;
811
33b78306 812or, used numerically,
813
e5d73d77 814 warn "No checksumming!\en" if $] < 3.019;
33b78306 815
a687059c 816.fi
817(Mnemonic: Is this version of perl in the right bracket?)
818.Ip $; 8 2
819The subscript separator for multi-dimensional array emulation.
820If you refer to an associative array element as
821.nf
822 $foo{$a,$b,$c}
823
824it really means
825
826 $foo{join($;, $a, $b, $c)}
827
828But don't put
829
830 @foo{$a,$b,$c} # a slice--note the @
831
832which means
833
834 ($foo{$a},$foo{$b},$foo{$c})
835
836.fi
837Default is "\e034", the same as SUBSEP in
838.IR awk .
839Note that if your keys contain binary data there might not be any safe
840value for $;.
841(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
842Yeah, I know, it's pretty lame, but $, is already taken for something more
843important.)
844.Ip $! 8 2
845If used in a numeric context, yields the current value of errno, with all the
846usual caveats.
ffed7fef 847(This means that you shouldn't depend on the value of $! to be anything
848in particular unless you've gotten a specific error return indicating a
849system error.)
a687059c 850If used in a string context, yields the corresponding system error string.
851You can assign to $! in order to set errno
852if, for instance, you want $! to return the string for error n, or you want
853to set the exit value for the die operator.
854(Mnemonic: What just went bang?)
855.Ip $@ 8 2
ffed7fef 856The perl syntax error message from the last eval command.
857If null, the last eval parsed and executed correctly (although the operations
858you invoked may have failed in the normal fashion).
a687059c 859(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
860.Ip $< 8 2
861The real uid of this process.
862(Mnemonic: it's the uid you came FROM, if you're running setuid.)
863.Ip $> 8 2
864The effective uid of this process.
865Example:
866.nf
867
868.ne 2
869 $< = $>; # set real uid to the effective uid
870 ($<,$>) = ($>,$<); # swap real and effective uid
871
872.fi
873(Mnemonic: it's the uid you went TO, if you're running setuid.)
874Note: $< and $> can only be swapped on machines supporting setreuid().
875.Ip $( 8 2
876The real gid of this process.
877If you are on a machine that supports membership in multiple groups
878simultaneously, gives a space separated list of groups you are in.
879The first number is the one returned by getgid(), and the subsequent ones
880by getgroups(), one of which may be the same as the first number.
881(Mnemonic: parentheses are used to GROUP things.
882The real gid is the group you LEFT, if you're running setgid.)
883.Ip $) 8 2
884The effective gid of this process.
885If you are on a machine that supports membership in multiple groups
886simultaneously, gives a space separated list of groups you are in.
887The first number is the one returned by getegid(), and the subsequent ones
888by getgroups(), one of which may be the same as the first number.
889(Mnemonic: parentheses are used to GROUP things.
890The effective gid is the group that's RIGHT for you, if you're running setgid.)
891.Sp
892Note: $<, $>, $( and $) can only be set on machines that support the
893corresponding set[re][ug]id() routine.
894$( and $) can only be swapped on machines supporting setregid().
895.Ip $: 8 2
896The current set of characters after which a string may be broken to
897fill continuation fields (starting with ^) in a format.
898Default is "\ \en-", to break on whitespace or hyphens.
899(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
33b78306 900.Ip $ARGV 8 3
901Contains the name of the current file when reading from <>.
a687059c 902.Ip @ARGV 8 3
903The array ARGV contains the command line arguments intended for the script.
904Note that $#ARGV is generally the number of arguments minus one, since
905$ARGV[0] is the first argument, NOT the command name.
906See $0 for the command name.
907.Ip @INC 8 3
908The array INC contains the list of places to look for
909.I perl
910scripts to be
e5d73d77 911evaluated by the \*(L"do EXPR\*(R" command or the \*(L"require\*(R" command.
a687059c 912It initially consists of the arguments to any
913.B \-I
914command line switches, followed
915by the default
916.I perl
33b78306 917library, probably \*(L"/usr/local/lib/perl\*(R",
918followed by \*(L".\*(R", to represent the current directory.
919.Ip %INC 8 3
920The associative array INC contains entries for each filename that has
921been included via \*(L"do\*(R" or \*(L"require\*(R".
922The key is the filename you specified, and the value is the location of
923the file actually found.
924The \*(L"require\*(R" command uses this array to determine whether
925a given file has already been included.
a687059c 926.Ip $ENV{expr} 8 2
927The associative array ENV contains your current environment.
928Setting a value in ENV changes the environment for child processes.
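.Sp
A minimal sketch of that behavior, assuming a Unix-like system where backticks run the command through a shell that provides echo.
GREETING is a hypothetical variable name chosen only for this illustration.

```perl
# GREETING is a hypothetical variable name used only for this illustration.
$ENV{'GREETING'} = 'hello';

# The backslash keeps perl from interpolating $GREETING, so the shell
# started by the backticks expands it from its inherited environment.
$out = `echo \$GREETING`;
print $out;
```

The child process sees the new value because the environment is handed to it when it is started; nothing about the parent's own environment table needs to be exported explicitly.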
929.Ip $SIG{expr} 8 2
930The associative array SIG is used to set signal handlers for various signals.
931Example:
932.nf
933
934.ne 12
935 sub handler { # 1st argument is signal name
936 local($sig) = @_;
937 print "Caught a SIG$sig\-\|\-shutting down\en";
938 close(LOG);
939 exit(0);
940 }
941
942 $SIG{\'INT\'} = \'handler\';
943 $SIG{\'QUIT\'} = \'handler\';
944 .\|.\|.
945 $SIG{\'INT\'} = \'DEFAULT\'; # restore default action
946 $SIG{\'QUIT\'} = \'IGNORE\'; # ignore SIGQUIT
947
948.fi
949The SIG array only contains values for the signals actually set within
950the perl script.
951.Sh "Packages"
952Perl provides a mechanism for alternate namespaces to protect packages from
953stomping on each other's variables.
954By default, a perl script starts compiling into the package known as \*(L"main\*(R".
955By use of the
956.I package
957declaration, you can switch namespaces.
958The scope of the package declaration is from the declaration itself to the end
959of the enclosing block (the same scope as the local() operator).
960Typically it would be the first declaration in a file to be included by
33b78306 961the \*(L"require\*(R" operator.
a687059c 962You can switch into a package in more than one place; it merely influences
963which symbol table is used by the compiler for the rest of that block.
663a0e37 964You can refer to variables and filehandles in other packages by prefixing
965the identifier with the package name and a single quote.
a687059c 966If the package name is null, the \*(L"main\*(R" package is assumed.
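.PP
As a rough illustration of switching packages and of the qualified form, the sketch below keeps two separate variables that are both called $count.
The package name counter and the variable name are assumptions made up for the example.

```perl
package counter;
$count = 3;             # stored in package counter's symbol table

package main;
$count = 10;            # a different variable, stored in package main

# Qualify the identifier with the package name and a single quote
# to reach a variable from outside its own package.
print "counter: ", $counter'count, " main: ", $main'count, "\n";
```

Neither assignment disturbs the other, since each unqualified $count is looked up in whichever package is current at that point in the file.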
663a0e37 967.PP
968Only identifiers starting with letters are stored in the package's symbol
969table.
970All other symbols are kept in package \*(L"main\*(R".
971In addition, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC
972and SIG are forced to be in package \*(L"main\*(R", even when used for
973other purposes than their built-in one.
974Note also that, if you have a package called \*(L"m\*(R", \*(L"s\*(R"
975or \*(L"y\*(R", then you can't use the qualified form of an identifier since it
976will be interpreted instead as a pattern match, a substitution
977or a translation.
978.PP
a687059c 979Eval'ed strings are compiled in the package in which the eval was
980compiled.
981(Assignments to $SIG{}, however, assume the signal handler specified is in the
982main package.
983Qualify the signal handler name if you wish to have a signal handler in
984a package.)
985For an example, examine perldb.pl in the perl library.
986It initially switches to the DB package so that the debugger doesn't interfere
987with variables in the script you are trying to debug.
988At various points, however, it temporarily switches back to the main package
989to evaluate various expressions in the context of the main package.
990.PP
991The symbol table for a package happens to be stored in the associative array
992of that name prepended with an underscore.
993The value in each entry of the associative array is
994what you are referring to when you use the *name notation.
995In fact, the following have the same effect (in package main, anyway),
996though the first is more
997efficient because it does the symbol table lookups at compile time:
998.nf
999
1000.ne 2
1001 local(*foo) = *bar;
1002 local($_main{'foo'}) = $_main{'bar'};
1003
1004.fi
1005You can use this to print out all the variables in a package, for instance.
1006Here is dumpvar.pl from the perl library:
1007.nf
1008.ne 11
1009 package dumpvar;
1010
1011 sub main'dumpvar {
1012 \& ($package) = @_;
1013 \& local(*stab) = eval("*_$package");
1014 \& while (($key,$val) = each(%stab)) {
1015 \& {
1016 \& local(*entry) = $val;
1017 \& if (defined $entry) {
1018 \& print "\e$$key = '$entry'\en";
1019 \& }
1020.ne 7
1021 \& if (defined @entry) {
1022 \& print "\e@$key = (\en";
1023 \& foreach $num ($[ .. $#entry) {
1024 \& print " $num\et'",$entry[$num],"'\en";
1025 \& }
1026 \& print ")\en";
1027 \& }
1028.ne 10
1029 \& if ($key ne "_$package" && defined %entry) {
1030 \& print "\e%$key = (\en";
1031 \& foreach $key (sort keys(%entry)) {
1032 \& print " $key\et'",$entry{$key},"'\en";
1033 \& }
1034 \& print ")\en";
1035 \& }
1036 \& }
1037 \& }
1038 }
1039
1040.fi
1041Note that, even though the subroutine is compiled in package dumpvar, the
663a0e37 1042name of the subroutine is qualified so that its name is inserted into package
a687059c 1043\*(L"main\*(R".
1044.Sh "Style"
1045Each programmer will, of course, have his or her own preferences in regard
1046to formatting, but there are some general guidelines that will make your
1047programs easier to read.
1048.Ip 1. 4 4
1049Just because you CAN do something a particular way doesn't mean that
1050you SHOULD do it that way.
1051.I Perl
1052is designed to give you several ways to do anything, so consider picking
1053the most readable one.
1054For instance
1055
1056 open(FOO,$foo) || die "Can't open $foo: $!";
1057
1058is better than
1059
1060 die "Can't open $foo: $!" unless open(FOO,$foo);
1061
1062because the second way hides the main point of the statement in a
1063modifier.
1064On the other hand
1065
1066 print "Starting analysis\en" if $verbose;
1067
1068is better than
1069
1070 $verbose && print "Starting analysis\en";
1071
1072since the main point isn't whether the user typed -v or not.
1073.Sp
1074Similarly, just because an operator lets you assume default arguments
1075doesn't mean that you have to make use of the defaults.
1076The defaults are there for lazy systems programmers writing one-shot
1077programs.
1078If you want your program to be readable, consider supplying the argument.
03a14243 1079.Sp
1080Along the same lines, just because you
1081.I can
1082omit parentheses in many places doesn't mean that you ought to:
1083.nf
1084
1085 return print reverse sort num values array;
1086 return print(reverse(sort num (values(%array))));
1087
1088.fi
1089When in doubt, parenthesize.
1090At the very least it will let some poor schmuck bounce on the % key in vi.
a687059c 1091.Ip 2. 4 4
1092Don't go through silly contortions to exit a loop at the top or the
1093bottom, when
1094.I perl
1095provides the "last" operator so you can exit in the middle.
1096Just outdent it a little to make it more visible:
1097.nf
1098
1099.ne 7
1100 line:
1101 for (;;) {
1102 statements;
1103 last line if $foo;
1104 next line if /^#/;
1105 statements;
1106 }
1107
1108.fi
1109.Ip 3. 4 4
1110Don't be afraid to use loop labels\*(--they're there to enhance readability as
1111well as to allow multi-level loop breaks.
1112See last example.
ffed7fef 1113.Ip 4. 4 4
a687059c 1114For portability, when using features that may not be implemented on every
1115machine, test the construct in an eval to see if it fails.
03a14243 1116If you know what version or patchlevel a particular feature was implemented in,
1117you can test $] to see if it will be there.
a687059c 1118.Ip 5. 4 4
ffed7fef 1119Choose mnemonic identifiers.
1120.Ip 6. 4 4
a687059c 1121Be consistent.
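.PP
The portability advice in guideline 4 might be sketched as follows.
The choice of symlink() as the feature being probed is only an example, and the threshold 3.019 (version 3.0 at patchlevel 19) is an assumption based on the revision log's note about when the numeric interpretation of $] was added.

```perl
# Probe for symlink() by trying it inside an eval.  On a machine that lacks
# symlink(), the call is a fatal error, which eval traps and reports in $@.
eval 'symlink("", "");';
$have_symlink = ($@ eq '') ? 1 : 0;
print "symlink() is ", ($have_symlink ? "available" : "missing"), "\n";

# If you know the version that introduced a feature, compare against $].
# The threshold below is an illustrative assumption, not a documented cutoff.
if ($] >= 3.019) {
    print "running perl $]\n";
}
```

The empty arguments make the symlink() call fail harmlessly where it is supported, so the probe creates nothing on disk; only the unsupported case leaves a message in $@.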
1122.Sh "Debugging"
1123If you invoke
1124.I perl
1125with a
1126.B \-d
1127switch, your script will be run under a debugging monitor.
1128It will halt before the first executable statement and ask you for a
1129command, such as:
1130.Ip "h" 12 4
1131Prints out a help message.
33b78306 1132.Ip "T" 12 4
1133Stack trace.
a687059c 1134.Ip "s" 12 4
1135Single step.
1136Executes until it reaches the beginning of another statement.
33b78306 1137.Ip "n" 12 4
1138Next.
1139Executes over subroutine calls, until it reaches the beginning of the
1140next statement.
1141.Ip "f" 12 4
1142Finish.
1143Executes statements until it has finished the current subroutine.
a687059c 1144.Ip "c" 12 4
1145Continue.
1146Executes until the next breakpoint is reached.
33b78306 1147.Ip "c line" 12 4
1148Continue to the specified line.
1149Inserts a one-time-only breakpoint at the specified line.
a687059c 1150.Ip "<CR>" 12 4
33b78306 1151Repeat last n or s.
a687059c 1152.Ip "l min+incr" 12 4
1153List incr+1 lines starting at min.
1154If min is omitted, starts where last listing left off.
1155If incr is omitted, previous value of incr is used.
1156.Ip "l min-max" 12 4
1157List lines in the indicated range.
1158.Ip "l line" 12 4
1159List just the indicated line.
1160.Ip "l" 12 4
33b78306 1161List next window.
1162.Ip "-" 12 4
1163List previous window.
1164.Ip "w line" 12 4
1165List window around line.
a687059c 1166.Ip "l subname" 12 4
1167List subroutine.
1168If it's a long subroutine it just lists the beginning.
1169Use \*(L"l\*(R" to list more.
33b78306 1170.Ip "/pattern/" 12 4
1171Regular expression search forward for pattern; the final / is optional.
1172.Ip "?pattern?" 12 4
1173Regular expression search backward for pattern; the final ? is optional.
a687059c 1174.Ip "L" 12 4
1175List lines that have breakpoints or actions.
33b78306 1176.Ip "S" 12 4
1177Lists the names of all subroutines.
a687059c 1178.Ip "t" 12 4
1179Toggle trace mode on or off.
33b78306 1180.Ip "b line condition" 12 4
a687059c 1181Set a breakpoint.
33b78306 1182If line is omitted, sets a breakpoint on the
a687059c 1183line that is about to be executed.
33b78306 1184If a condition is specified, it is evaluated each time the statement is
1185reached and a breakpoint is taken only if the condition is true.
a687059c 1186Breakpoints may only be set on lines that begin an executable statement.
33b78306 1187.Ip "b subname condition" 12 4
a687059c 1188Set breakpoint at first executable line of subroutine.
a687059c 1189.Ip "d line" 12 4
1190Delete breakpoint.
33b78306 1191If line is omitted, deletes the breakpoint on the
a687059c 1192line that is about to be executed.
1193.Ip "D" 12 4
1194Delete all breakpoints.
a687059c 1195.Ip "a line command" 12 4
1196Set an action for line.
1197A multi-line command may be entered by backslashing the newlines.
33b78306 1198.Ip "A" 12 4
1199Delete all line actions.
a687059c 1200.Ip "< command" 12 4
1201Set an action to happen before every debugger prompt.
1202A multi-line command may be entered by backslashing the newlines.
1203.Ip "> command" 12 4
1204Set an action to happen after the prompt when you've just given a command
1205to return to executing the script.
1206A multi-line command may be entered by backslashing the newlines.
33b78306 1207.Ip "V package" 12 4
1208List all variables in package.
1209Default is main package.
a687059c 1210.Ip "! number" 12 4
1211Redo a debugging command.
1212If number is omitted, redoes the previous command.
1213.Ip "! -number" 12 4
1214Redo the command that was that many commands ago.
1215.Ip "H -number" 12 4
1216Display last n commands.
1217Only commands longer than one character are listed.
1218If number is omitted, lists them all.
1219.Ip "q or ^D" 12 4
1220Quit.
1221.Ip "command" 12 4
1222Execute command as a perl statement.
1223A missing semicolon will be supplied.
1224.Ip "p expr" 12 4
1225Same as \*(L"print DB'OUT expr\*(R".
1226The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
1227may be redirected to.
1228.PP
1229If you want to modify the debugger, copy perldb.pl from the perl library
1230to your current directory and modify it as necessary.
76854fea 1231(You'll also have to put -I. on your command line.)
a687059c 1232You can do some customization by setting up a .perldb file which contains
1233initialization code.
1234For instance, you could make aliases like these:
1235.nf
1236
ac58e20f 1237 $DB'alias{'len'} = 's/^len(.*)/p length($1)/';
1238 $DB'alias{'stop'} = 's/^stop (at|in)/b/';
1239 $DB'alias{'.'} =
1240 's/^\e./p "\e$DB\e'sub(\e$DB\e'line):\et",\e$DB\e'line[\e$DB\e'line]/';
a687059c 1241
1242.fi
1243.Sh "Setuid Scripts"
1244.I Perl
1245is designed to make it easy to write secure setuid and setgid scripts.
1246Unlike shells, which are based on multiple substitution passes on each line
1247of the script,
1248.I perl
1249uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
1250Additionally, since the language has more built-in functionality, it
1251has to rely less upon external (and possibly untrustworthy) programs to
1252accomplish its purposes.
1253.PP
1254In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
1255insecure, but this kernel feature can be disabled.
1256If it is,
1257.I perl
1258can emulate the setuid and setgid mechanism when it notices the otherwise
1259useless setuid/gid bits on perl scripts.
1260If the kernel feature isn't disabled,
1261.I perl
1262will complain loudly that your setuid script is insecure.
1263You'll need to either disable the kernel setuid script feature, or put
1264a C wrapper around the script.
1265.PP
1266When perl is executing a setuid script, it takes special precautions to
1267prevent you from falling into any obvious traps.
1268(In some ways, a perl script is more secure than the corresponding
1269C program.)
1270Any command line argument, environment variable, or input is marked as
1271\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
1272command that invokes a subshell, or in any command that modifies files,
1273directories or processes.
1274Any variable that is set within an expression that has previously referenced
1275a tainted value also becomes tainted (even if it is logically impossible
1276for the tainted value to influence the variable).
1277For example:
1278.nf
1279
1280.ne 5
1281 $foo = shift; # $foo is tainted
1282 $bar = $foo,\'bar\'; # $bar is also tainted
1283 $xxx = <>; # Tainted
1284 $path = $ENV{\'PATH\'}; # Tainted, but see below
1285 $abc = \'abc\'; # Not tainted
1286
1287.ne 4
1288 system "echo $foo"; # Insecure
79a0689e 1289 system "/bin/echo", $foo; # Secure (doesn't use sh)
a687059c 1290 system "echo $bar"; # Insecure
1291 system "echo $abc"; # Insecure until PATH set
1292
1293.ne 5
1294 $ENV{\'PATH\'} = \'/bin:/usr/bin\';
1295 $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
1296
1297 $path = $ENV{\'PATH\'}; # Not tainted
1298 system "echo $abc"; # Is secure now!
1299
1300.ne 5
1301 open(FOO,"$foo"); # OK
1302 open(FOO,">$foo"); # Not OK
1303
1304 open(FOO,"echo $foo|"); # Not OK, but...
1305 open(FOO,"-|") || exec \'echo\', $foo; # OK
1306
1307 $zzz = `echo $foo`; # Insecure, zzz tainted
1308
1309 unlink $abc,$foo; # Insecure
1310 umask $foo; # Insecure
1311
1312.ne 3
1313 exec "echo $foo"; # Insecure
1314 exec "echo", $foo; # Secure (doesn't use sh)
1315 exec "sh", \'-c\', $foo; # Considered secure, alas
1316
1317.fi
1318The taintedness is associated with each scalar value, so some elements
1319of an array can be tainted, and others not.
1320.PP
1321If you try to do something insecure, you will get a fatal error saying
1322something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
1323Note that you can still write an insecure system call or exec,
ae986130 1324but only by explicitly doing something like the last example above.
a687059c 1325You can also bypass the tainting mechanism by referencing
1326subpatterns\*(--\c
1327.I perl
1328presumes that if you reference a substring using $1, $2, etc., you knew
1329what you were doing when you wrote the pattern:
1330.nf
1331
1332 $ARGV[0] =~ /^\-P(\ew+)$/;
1333 $printer = $1; # Not tainted
1334
1335.fi
1336This is fairly secure since \ew+ doesn't match shell metacharacters.
1337Use of .+ would have been insecure, but
1338.I perl
1339doesn't check for that, so you must be careful with your patterns.
1340This is the ONLY mechanism for untainting user-supplied filenames if you
1341want to do file operations on them (unless you make $> equal to $<).
1342.PP
1343It's also possible to get into trouble with other operations that don't care
1344whether they use tainted values.
1345Make judicious use of the file tests in dealing with any user-supplied
1346filenames.
1347When possible, do opens and such after setting $> = $<.
1348.I Perl
1349doesn't prevent you from opening tainted filenames for reading, so be
1350careful what you print out.
1351The tainting mechanism is intended to prevent stupid mistakes, not to remove
1352the need for thought.
1353.SH ENVIRONMENT
1354.I Perl
1355uses PATH in executing subprocesses, and in finding the script if \-S
1356is used.
1357HOME or LOGDIR are used if chdir has no argument.
1358.PP
1359Apart from these,
1360.I perl
1361uses no environment variables, except to make them available
1362to the script being executed, and to child processes.
1363However, scripts running setuid would do well to execute the following lines
1364before doing anything else, just to keep people honest:
1365.nf
1366
1367.ne 3
1368 $ENV{\'PATH\'} = \'/bin:/usr/bin\'; # or whatever you need
1369 $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
1370 $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
1371
1372.fi
1373.SH AUTHOR
1374Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
0f85fab0 1375.br
1376MS-DOS port by Diomidis Spinellis <dds@cc.ic.ac.uk>
a687059c 1377.SH FILES
1378/tmp/perl\-eXXXXXX temporary file for
1379.B \-e
1380commands.
1381.SH SEE ALSO
1382a2p awk to perl translator
1383.br
1384s2p sed to perl translator
1385.SH DIAGNOSTICS
1386Compilation errors will tell you the line number of the error, with an
1387indication of the next token or token type that was to be examined.
1388(In the case of a script passed to
1389.I perl
1390via
1391.B \-e
1392switches, each
1393.B \-e
1394is counted as one line.)
1395.PP
1396Setuid scripts have additional constraints that can produce error messages
1397such as \*(L"Insecure dependency\*(R".
1398See the section on setuid scripts.
1399.SH TRAPS
1400Accustomed
1401.IR awk
1402users should take special note of the following:
1403.Ip * 4 2
1404Semicolons are required after all simple statements in
1405.IR perl .
1406Newline
1407is not a statement delimiter.
1408.Ip * 4 2
1409Curly brackets are required on ifs and whiles.
1410.Ip * 4 2
1411Variables begin with $ or @ in
1412.IR perl .
1413.Ip * 4 2
1414Arrays index from 0 unless you set $[.
1415Likewise string positions in substr() and index().
1416.Ip * 4 2
1417You have to decide whether your array has numeric or string indices.
1418.Ip * 4 2
1419Associative array values do not spring into existence upon mere reference.
1420.Ip * 4 2
1421You have to decide whether you want to use string or numeric comparisons.
1422.Ip * 4 2
1423Reading an input line does not split it for you. You get to split it yourself
1424to an array.
1425And the
1426.I split
1427operator has different arguments.
1428.Ip * 4 2
1429The current input line is normally in $_, not $0.
1430It generally does not have the newline stripped.
ac58e20f 1431($0 is the name of the program executed.)
a687059c 1432.Ip * 4 2
1433$<digit> does not refer to fields\*(--it refers to substrings matched by the last
1434match pattern.
1435.Ip * 4 2
1436The
1437.I print
1438statement does not add field and record separators unless you set
1439$, and $\e.
1440.Ip * 4 2
1441You must open your files before you print to them.
1442.Ip * 4 2
1443The range operator is \*(L".\|.\*(R", not comma.
1444(The comma operator works as in C.)
1445.Ip * 4 2
1446The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
1447(\*(L"~\*(R" is the one's complement operator, as in C.)
1448.Ip * 4 2
1449The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
1450(\*(L"^\*(R" is the XOR operator, as in C.)
1451.Ip * 4 2
1452The concatenation operator is \*(L".\*(R", not the null string.
1453(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
1454since the third slash would be interpreted as a division operator\*(--the
1455tokener is in fact slightly context sensitive for operators like /, ?, and <.
1456And in fact, . itself can be the beginning of a number.)
1457.Ip * 4 2
1458.IR Next ,
1459.I exit
1460and
1461.I continue
1462work differently.
1463.Ip * 4 2
1464The following variables work differently:
1465.nf
1466
1467 Awk \h'|2.5i'Perl
1468 ARGC \h'|2.5i'$#ARGV
1469 ARGV[0] \h'|2.5i'$0
1470 FILENAME\h'|2.5i'$ARGV
1471 FNR \h'|2.5i'$. \- something
1472 FS \h'|2.5i'(whatever you like)
1473 NF \h'|2.5i'$#Fld, or some such
1474 NR \h'|2.5i'$.
1475 OFMT \h'|2.5i'$#
1476 OFS \h'|2.5i'$,
1477 ORS \h'|2.5i'$\e
1478 RLENGTH \h'|2.5i'length($&)
ac58e20f 1479 RS \h'|2.5i'$/
a687059c 1480 RSTART \h'|2.5i'length($\`)
1481 SUBSEP \h'|2.5i'$;
1482
1483.fi
1484.Ip * 4 2
1485When in doubt, run the
1486.I awk
1487construct through a2p and see what it gives you.
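.PP
For the points above about input lines and fields, a small sketch of doing by hand what awk does automatically may help.
It assumes whitespace-separated fields; the array name Fld follows the table above, while $nf and the literal input line are made up for the illustration.

```perl
# A literal line stands in for one read with <STDIN>.
$_ = "alpha beta gamma\n";

chop;                       # drop the trailing newline, which perl keeps
@Fld = split(' ', $_);      # split on whitespace, as awk does by default
$nf = $#Fld + 1;            # awk's NF, assuming $[ is still 0

print "$nf fields, first is $Fld[0]\n";
```

Note that the first field is $Fld[0], not $Fld[1], unless $[ has been changed.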
1488.PP
1489Cerebral C programmers should take note of the following:
1490.Ip * 4 2
1491Curly brackets are required on ifs and whiles.
1492.Ip * 4 2
1493You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
1494.Ip * 4 2
1495.I Break
1496and
1497.I continue
1498become
1499.I last
1500and
1501.IR next ,
1502respectively.
1503.Ip * 4 2
1504There's no switch statement.
1505.Ip * 4 2
1506Variables begin with $ or @ in
1507.IR perl .
1508.Ip * 4 2
1509Printf does not implement *.
1510.Ip * 4 2
1511Comments begin with #, not /*.
1512.Ip * 4 2
1513You can't take the address of anything.
1514.Ip * 4 2
1515ARGV must be capitalized.
1516.Ip * 4 2
1517The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
1518.Ip * 4 2
1519Signal handlers deal with signal names, not numbers.
a687059c 1520.PP
1521Seasoned
1522.I sed
1523programmers should take note of the following:
1524.Ip * 4 2
1525Backreferences in substitutions use $ rather than \e.
1526.Ip * 4 2
1527The pattern matching metacharacters (, ), and | do not have backslashes in front.
1528.Ip * 4 2
1529The range operator is .\|. rather than comma.
1530.PP
1531Sharp shell programmers should take note of the following:
1532.Ip * 4 2
1533The backtick operator does variable interpretation without regard to the
1534presence of single quotes in the command.
1535.Ip * 4 2
1536The backtick operator does no translation of the return value, unlike csh.
1537.Ip * 4 2
1538Shells (especially csh) do several levels of substitution on each command line.
1539.I Perl
1540does substitution only in certain constructs such as double quotes,
1541backticks, angle brackets and search patterns.
1542.Ip * 4 2
1543Shells interpret scripts a little bit at a time.
1544.I Perl
1545compiles the whole program before executing it.
1546.Ip * 4 2
1547The arguments are available via @ARGV, not $1, $2, etc.
1548.Ip * 4 2
1549The environment is not automatically made available as variables.
1550.SH BUGS
1551.PP
1552.I Perl
1553is at the mercy of your machine's definitions of various operations
1554such as type casting, atof() and sprintf().
1555.PP
1556If your stdio requires a seek or eof between reads and writes on a particular
1557stream, so does
1558.IR perl .
1559.PP
1560While none of the built-in data types have any arbitrary size limits (apart
1561from memory size), there are still a few arbitrary limits:
1562a given identifier may not be longer than 255 characters;
1563sprintf is limited on many machines to 128 characters per field (unless the format
1564specifier is exactly %s);
1565and no component of your PATH may be longer than 255 characters if you use \-S.
1566.PP
1567.I Perl
1568actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
1569anyone I said that.
1570.rn }` ''