perl 3.0 patch #7 (combined patch)
a687059c 1''' Beginning of part 4
ffed7fef 2''' $Header: perl.man.4,v 3.0.1.3 89/11/17 15:32:25 lwall Locked $
a687059c 3'''
4''' $Log: perl.man.4,v $
ffed7fef 5''' Revision 3.0.1.3 89/11/17 15:32:25 lwall
6''' patch5: fixed some manual typos and indent problems
7''' patch5: clarified difference between $! and $@
8'''
ae986130 9''' Revision 3.0.1.2 89/11/11 04:46:40 lwall
10''' patch2: made some line breaks depend on troff vs. nroff
11''' patch2: clarified operation of ^ and $ when $* is false
12'''
03a14243 13''' Revision 3.0.1.1 89/10/26 23:18:43 lwall
14''' patch1: documented the desirability of unnecessary parentheses
15'''
a687059c 16''' Revision 3.0 89/10/18 15:21:55 lwall
17''' 3.0 baseline
18'''
19.Sh "Precedence"
20.I Perl
21operators have the following associativity and precedence:
22.nf
23
24nonassoc\h'|1i'print printf exec system sort reverse
25\h'1.5i'chmod chown kill unlink utime die return
26left\h'|1i',
27right\h'|1i'= += \-= *= etc.
28right\h'|1i'?:
29nonassoc\h'|1i'.\|.
30left\h'|1i'||
31left\h'|1i'&&
32left\h'|1i'| ^
33left\h'|1i'&
34nonassoc\h'|1i'== != eq ne
35nonassoc\h'|1i'< > <= >= lt gt le ge
36nonassoc\h'|1i'chdir exit eval reset sleep rand umask
37nonassoc\h'|1i'\-r \-w \-x etc.
38left\h'|1i'<< >>
39left\h'|1i'+ \- .
40left\h'|1i'* / % x
41left\h'|1i'=~ !~
42right\h'|1i'! ~ and unary minus
43right\h'|1i'**
44nonassoc\h'|1i'++ \-\|\-
45left\h'|1i'\*(L'(\*(R'
46
47.fi
48As mentioned earlier, if any list operator (print, etc.) or
49any unary operator (chdir, etc.)
50is followed by a left parenthesis as the next token on the same line,
51the operator and arguments within parentheses are taken to
52be of highest precedence, just like a normal function call.
53Examples:
54.nf
55
ffed7fef 56 chdir $foo || die;\h'|3i'# (chdir $foo) || die
57 chdir($foo) || die;\h'|3i'# (chdir $foo) || die
58 chdir ($foo) || die;\h'|3i'# (chdir $foo) || die
59 chdir +($foo) || die;\h'|3i'# (chdir $foo) || die
a687059c 60
61but, because * is higher precedence than ||:
62
ffed7fef 63 chdir $foo * 20;\h'|3i'# chdir ($foo * 20)
64 chdir($foo) * 20;\h'|3i'# (chdir $foo) * 20
65 chdir ($foo) * 20;\h'|3i'# (chdir $foo) * 20
66 chdir +($foo) * 20;\h'|3i'# chdir ($foo * 20)
a687059c 67
ffed7fef 68 rand 10 * 20;\h'|3i'# rand (10 * 20)
69 rand(10) * 20;\h'|3i'# (rand 10) * 20
70 rand (10) * 20;\h'|3i'# (rand 10) * 20
71 rand +(10) * 20;\h'|3i'# rand (10 * 20)
a687059c 72
73.fi
74In the absence of parentheses,
75the precedence of list operators such as print, sort or chmod is
76either very high or very low depending on whether you look at the left
77side of the operator or the right side of it.
78For example, in
79.nf
80
81 @ary = (1, 3, sort 4, 2);
82 print @ary; # prints 1324
83
84.fi
85the commas on the right of the sort are evaluated before the sort, but
86the commas on the left are evaluated after.
87In other words, list operators tend to gobble up all the arguments that
88follow them, and then act like a simple term with regard to the preceding
89expression.
90Note that you have to be careful with parens:
91.nf
92
93.ne 3
94 # These evaluate exit before doing the print:
95 print($foo, exit); # Obviously not what you want.
96 print $foo, exit; # Nor is this.
97
98.ne 4
99 # These do the print before evaluating exit:
100 (print $foo), exit; # This is what you want.
101 print($foo), exit; # Or this.
102 print ($foo), exit; # Or even this.
103
104Also note that
105
106 print ($foo & 255) + 1, "\en";
107
108.fi
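109probably doesn't do what you expect at first glance: since the parenthesis follows print directly, only $foo & 255 is printed; the + 1 is added to the value print returns, and the "\en" is discarded.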
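.PP
As a small sketch of the rules above, the lines below contrast a list operator that takes everything to its right with one whose argument list is closed by a left parenthesis.
The variable names are made up for illustration, and the results in the comments follow from the rules in this section as a current perl applies them; they have not been checked against a 3.0 interpreter.
.nf

```perl
# Without parentheses, sort takes every argument to its right.
@ary = (1, 3, sort 4, 2);
$gobbled = join('', @ary);               # 1324

# A left parenthesis right after the operator closes its argument list.
@paren = (reverse ('a', 'b'), 'c');
$closed = join('', @paren);              # bac

# With no parenthesis, reverse takes all three values.
@bare = (reverse 'a', 'b', 'c');
$open = join('', @bare);                 # cba
```

.fi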
110.Sh "Subroutines"
111A subroutine may be declared as follows:
112.nf
113
114 sub NAME BLOCK
115
116.fi
117.PP
118Any arguments passed to the routine come in as array @_,
119that is ($_[0], $_[1], .\|.\|.).
120The array @_ is a local array, but its values are references to the
121actual scalar parameters.
122The return value of the subroutine is the value of the last expression
123evaluated, and can be either an array value or a scalar value.
124Alternatively, a return statement may be used to specify the returned value and
125exit the subroutine.
126To create local variables see the
127.I local
128operator.
129.PP
130A subroutine is called using the
131.I do
132operator or the & operator.
133.nf
134
135.ne 12
136Example:
137
138 sub MAX {
139     local($max) = pop(@_);
140     foreach $foo (@_) {
141         $max = $foo \|if \|$max < $foo;
142     }
143     $max;
144 }
145
146 .\|.\|.
147 $bestday = &MAX($mon,$tue,$wed,$thu,$fri);
148
149.ne 21
150Example:
151
152 # get a line, combining continuation lines
153 # that start with whitespace
154 sub get_line {
155     $thisline = $lookahead;
156     line: while ($lookahead = <STDIN>) {
157         if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
158             $thisline \|.= \|$lookahead;
159         }
160         else {
161             last line;
162         }
163     }
164     $thisline;
165 }
166
167 $lookahead = <STDIN>; # get first line
168 while ($_ = do get_line(\|)) {
169 .\|.\|.
170 }
171
172.fi
173.nf
174.ne 6
175Use array assignment to a local list to name your formal arguments:
176
177 sub maybeset {
178 local($key, $value) = @_;
179 $foo{$key} = $value unless $foo{$key};
180 }
181
182.fi
183This also has the effect of turning call-by-reference into call-by-value,
184since the assignment copies the values.
185.Sp
186Subroutines may be called recursively.
187If a subroutine is called using the & form, the argument list is optional.
188If omitted, no @_ array is set up for the subroutine; the @_ array at the
189time of the call is visible to the subroutine instead.
190.nf
191
192 do foo(1,2,3); # pass three arguments
193 &foo(1,2,3); # the same
194
195 do foo(); # pass a null list
196 &foo(); # the same
197 &foo; # pass no arguments--more efficient
198
199.fi
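.PP
The following sketch pulls together the points of this section: modifying a caller's scalar through @_, copying arguments with local(), calling with & and no argument list, and recursion.
All subroutine and variable names here are invented for the example, and it is written so that it should also run on a current perl.
.nf

```perl
sub bump {                    # $_[0] is the caller's own scalar
    $_[0]++;
}

sub safe_bump {               # local() copies the value first
    local($n) = @_;
    $n++;
    $n;
}

sub count_args {              # number of elements in @_
    $#_ + 1;
}

sub relay {                   # no argument list: count_args sees relay's @_
    &count_args;
}

sub fact {                    # a recursive call
    local($n) = @_;
    $n <= 1 ? 1 : $n * &fact($n - 1);
}

$x = 1;  &bump($x);               # $x is now 2
$y = 1;  $z = &safe_bump($y);     # $y is still 1, $z is 2
$seen = &relay('a', 'b', 'c');    # 3
$f = &fact(5);                    # 120
```

.fi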
200.Sh "Passing By Reference"
201Sometimes you don't want to pass the value of an array to a subroutine but
202rather the name of it, so that the subroutine can modify the global copy
203of it rather than working with a local copy.
204In perl you can refer to all the objects of a particular name by prefixing
205the name with a star: *foo.
206When evaluated, it produces a scalar value that represents all the objects
207of that name.
208When assigned to within a local() operation, it causes the name mentioned
209to refer to whatever * value was assigned to it.
210Example:
211.nf
212
213 sub doubleary {
214     local(*someary) = @_;
215     foreach $elem (@someary) {
216         $elem *= 2;
217     }
218 }
219 do doubleary(*foo);
220 do doubleary(*bar);
221
222.fi
223Assignment to *name is currently recommended only inside a local().
224You can actually assign to *name anywhere, but the previous referent of
225*name may be stranded forever.
226This may or may not bother you.
227.Sp
228Note that scalars are already passed by reference, so you can modify scalar
ae986130 229arguments without using this mechanism by referring explicitly to the $_[nnn]
a687059c 230in question.
231You can modify all the elements of an array by passing all the elements
232as scalars, but you have to use the * mechanism to push, pop or change the
233size of an array.
234The * mechanism will probably be more efficient in any case.
235.Sp
236Since a *name value contains unprintable binary data, if it is used as
237an argument in a print, or as a %s argument in a printf or sprintf, it
238then has the value '*name', just so it prints out pretty.
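.PP
As an illustration of why the * mechanism is needed to change the size of an array, here is a minimal sketch in which a subroutine pushes onto the caller's array.
The names append_one, ary and list are hypothetical.
.nf

```perl
sub append_one {
    local(*ary) = @_;         # @ary is now another name for the caller's array
    push(@ary, 1);            # so this changes the size of that array
}

@list = (5, 6);
&append_one(*list);
$result = join(' ', @list);   # 5 6 1
```

.fi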
239.Sh "Regular Expressions"
240The patterns used in pattern matching are regular expressions such as
241those supplied in the Version 8 regexp routines.
242(In fact, the routines are derived from Henry Spencer's freely redistributable
243reimplementation of the V8 routines.)
244In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
245Word boundaries may be matched by \eb, and non-boundaries by \eB.
246A whitespace character is matched by \es, non-whitespace by \eS.
247A numeric character is matched by \ed, non-numeric by \eD.
248You may use \ew, \es and \ed within character classes.
249Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
250Within character classes \eb represents backspace rather than a word boundary.
251Alternatives may be separated by |.
252The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
253matches the digit'th substring, where digit can range from 1 to 9.
254(Outside of the pattern, always use $ instead of \e in front of the digit.
255The scope of $<digit> (and $\`, $& and $\')
256extends to the end of the enclosing BLOCK or eval string, or to
257the next pattern match with subexpressions.
258The \e<digit> notation sometimes works outside the current pattern, but should
259not be relied upon.)
260$+ returns whatever the last bracket match matched.
261$& returns the entire matched string.
262($0 normally returns the same thing, but don't depend on it.)
263$\` returns everything before the matched string.
264$\' returns everything after the matched string.
265Examples:
266.nf
267
268 s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/; # swap first two words
269
270.ne 5
271 if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
272 $hours = $1;
273 $minutes = $2;
274 $seconds = $3;
275 }
276
277.fi
ae986130 278By default, the ^ character is only guaranteed to match at the beginning
279of the string,
280the $ character only at the end (or before the newline at the end)
a687059c 281and
282.I perl
283does certain optimizations with the assumption that the string contains
284only one line.
ae986130 285The behavior of ^ and $ on embedded newlines will be inconsistent.
a687059c 286You may, however, wish to treat a string as a multi-line buffer, such that
287the ^ will match after any newline within the string, and $ will match
288before any newline.
289At the cost of a little more overhead, you can do this by setting the variable
290$* to 1.
291Setting it back to 0 makes
292.I perl
293revert to its old behavior.
294.PP
295To facilitate multi-line substitutions, the . character never matches a newline
296(even when $* is 0).
297In particular, the following leaves a newline on the $_ string:
298.nf
299
300 $_ = <STDIN>;
301 s/.*(some_string).*/$1/;
302
303If the newline is unwanted, try one of
304
305 s/.*(some_string).*\en/$1/;
306 s/.*(some_string)[^\e000]*/$1/;
307 s/.*(some_string)(.|\en)*/$1/;
308 chop; s/.*(some_string).*/$1/;
309 /(some_string)/ && ($_ = $1);
310
311.fi
312Any item of a regular expression may be followed with digits in curly brackets
313of the form {n,m}, where n gives the minimum number of times to match the item
314and m gives the maximum.
315The form {n} is equivalent to {n,n} and matches exactly n times.
316The form {n,} matches n or more times.
317(If a curly bracket occurs in any other context, it is treated as a regular
318character.)
319The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
320to {0,1}.
321There is no limit to the size of n or m, but large numbers will chew up
322more memory.
323.Sp
324You will note that all backslashed metacharacters in
325.I perl
326are alphanumeric,
327such as \eb, \ew, \en.
328Unlike some other regular expression languages, there are no backslashed
329symbols that aren't alphanumeric.
330So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
331interpreted as a literal character, not a metacharacter.
332This makes it simple to quote a string that you want to use for a pattern
333but that you are afraid might contain metacharacters.
334Simply quote all the non-alphanumeric characters:
335.nf
336
337 $pattern =~ s/(\eW)/\e\e$1/g;
338
339.fi
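.PP
A short sketch of the quoting idiom and of the {n,m} form described above.
The sample strings are invented for the example; the results in the comments are what the rules in this section predict.
.nf

```perl
$literal = 'a.b*c';

# Quote every non-alphanumeric character so the string matches only itself.
($pattern = $literal) =~ s/(\W)/\\$1/g;           # now a\.b\*c

$raw    = ('xaXbbbcx' =~ /$literal/) ? 1 : 0;     # 1: . and * are metacharacters
$quoted = ('xaXbbbcx' =~ /$pattern/) ? 1 : 0;     # 0: quoted form is literal
$exact  = ('xa.b*cx'  =~ /$pattern/) ? 1 : 0;     # 1

# {n,m} bounds the number of repetitions.
$three = ('aaa'  =~ /^a{2,3}$/) ? 1 : 0;          # 1
$four  = ('aaaa' =~ /^a{2,3}$/) ? 1 : 0;          # 0
```

.fi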
340.Sh "Formats"
341Output record formats for use with the
342.I write
343operator may be declared as follows:
344.nf
345
346.ne 3
347 format NAME =
348 FORMLIST
349 .
350
351.fi
352If name is omitted, format \*(L"STDOUT\*(R" is defined.
353FORMLIST consists of a sequence of lines, each of which may be of one of three
354types:
355.Ip 1. 4
356A comment.
357.Ip 2. 4
358A \*(L"picture\*(R" line giving the format for one output line.
359.Ip 3. 4
360An argument line supplying values to plug into a picture line.
361.PP
362Picture lines are printed exactly as they look, except for certain fields
363that substitute values into the line.
364Each picture field starts with either @ or ^.
365The @ field (not to be confused with the array marker @) is the normal
366case; ^ fields are used
367to do rudimentary multi-line text block filling.
368The length of the field is supplied by padding out the field
369with multiple <, >, or | characters to specify, respectively, left justification,
370right justification, or centering.
371If any of the values supplied for these fields contains a newline, only
372the text up to the newline is printed.
373The special field @* can be used for printing multi-line values.
374It should appear by itself on a line.
375.PP
376The values are specified on the following line, in the same order as
377the picture fields.
378The values should be separated by commas.
379.PP
380Picture fields that begin with ^ rather than @ are treated specially.
381The value supplied must be a scalar variable name which contains a text
382string.
383.I Perl
384puts as much text as it can into the field, and then chops off the front
385of the string so that the next time the variable is referenced,
386more of the text can be printed.
387Normally you would use a sequence of fields in a vertical stack to print
388out a block of text.
389If you like, you can end the final field with .\|.\|., which will appear in the
390output if the text was too long to appear in its entirety.
391You can change which characters are legal to break on by changing the
392variable $: to a list of the desired characters.
393.PP
394Since use of ^ fields can produce variable length records if the text to be
395formatted is short, you can suppress blank lines by putting the tilde (~)
396character anywhere in the line.
397(Normally you should put it in the front if possible, for visibility.)
398The tilde will be translated to a space upon output.
399If you put a second tilde contiguous to the first, the line will be repeated
400until all the fields on the line are exhausted.
401(If you use a field of the @ variety, the expression you supply had better
402not give the same value every time forever!)
403.PP
404Examples:
405.nf
406.lg 0
407.cs R 25
408.ft C
409
410.ne 10
411# a report on the /etc/passwd file
412format top =
413\& Passwd File
414Name Login Office Uid Gid Home
415------------------------------------------------------------------
416\&.
417format STDOUT =
418@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
419$name, $login, $office,$uid,$gid, $home
420\&.
421
422.ne 29
423# a report from a bug report form
424format top =
425\& Bug Reports
426@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
427$system, $%, $date
428------------------------------------------------------------------
429\&.
430format STDOUT =
431Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
432\& $subject
433Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
434\& $index, $description
435Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
436\& $priority, $date, $description
437From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
438\& $from, $description
439Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
440\& $programmer, $description
441\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
442\& $description
443\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
444\& $description
445\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
446\& $description
447\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
448\& $description
449\&~ ^<<<<<<<<<<<<<<<<<<<<<<<...
450\& $description
451\&.
452
453.ft R
454.cs R
455.lg
456.fi
457It is possible to intermix prints with writes on the same output channel,
458but you'll have to handle $\- (lines left on the page) yourself.
459.PP
460If you are printing lots of fields that are usually blank, you should consider
461using the reset operator between records.
462Not only is it more efficient, but it can prevent the bug of adding another
463field and forgetting to zero it.
464.Sh "Interprocess Communication"
465The IPC facilities of perl are built on the Berkeley socket mechanism.
466If you don't have sockets, you can ignore this section.
467The calls have the same names as the corresponding system calls,
468but the arguments tend to differ, for two reasons.
469First, perl file handles work differently than C file descriptors.
470Second, perl already knows the length of its strings, so you don't need
471to pass that information.
472Here is a sample client (untested):
473.nf
474
475 ($them,$port) = @ARGV;
476 $port = 2345 unless $port;
477 $them = 'localhost' unless $them;
478
479 $SIG{'INT'} = 'dokill';
480 sub dokill { kill 9,$child if $child; }
481
482 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
483
484 $sockaddr = 'S n a4 x8';
485 chop($hostname = `hostname`);
486
487 ($name, $aliases, $proto) = getprotobyname('tcp');
488 ($name, $aliases, $port) = getservbyname($port, 'tcp')
489 unless $port =~ /^\ed+$/;
ae986130 490.ie t \{\
a687059c 491 ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
ae986130 492'br\}
493.el \{\
494 ($name, $aliases, $type, $len, $thisaddr) =
495 gethostbyname($hostname);
496'br\}
a687059c 497 ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);
498
499 $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
500 $that = pack($sockaddr, &AF_INET, $port, $thataddr);
501
502 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
503 bind(S, $this) || die "bind: $!";
504 connect(S, $that) || die "connect: $!";
505
506 select(S); $| = 1; select(stdout);
507
508 if ($child = fork) {
509     while (<>) {
510         print S;
511     }
512     sleep 3;
513     do dokill();
514 }
515 else {
516     while (<S>) {
517         print;
518     }
519 }
520
521.fi
522And here's a server:
523.nf
524
525 ($port) = @ARGV;
526 $port = 2345 unless $port;
527
528 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
529
530 $sockaddr = 'S n a4 x8';
531
532 ($name, $aliases, $proto) = getprotobyname('tcp');
533 ($name, $aliases, $port) = getservbyname($port, 'tcp')
534 unless $port =~ /^\ed+$/;
535
536 $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");
537
538 select(NS); $| = 1; select(stdout);
539
540 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
541 bind(S, $this) || die "bind: $!";
542 listen(S, 5) || die "listen: $!";
543
544 select(S); $| = 1; select(stdout);
545
546 for (;;) {
547 print "Listening again\en";
548 ($addr = accept(NS,S)) || die $!;
549 print "accept ok\en";
550
ae986130 551 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
a687059c 552 @inetaddr = unpack('C4',$inetaddr);
553 print "$af $port @inetaddr\en";
554
555 while (<NS>) {
556 print;
557 print NS;
558 }
559 }
560
561.fi
562.Sh "Predefined Names"
563The following names have special meaning to
564.IR perl .
565I could have used alphabetic symbols for some of these, but I didn't want
566to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
567out.
568You'll just have to suffer along with these silly symbols.
569Most of them have reasonable mnemonics, or analogues in one of the shells.
570.Ip $_ 8
571The default input and pattern-searching space.
572The following pairs are equivalent:
573.nf
574
575.ne 2
576 while (<>) {\|.\|.\|. # only equivalent in while!
577 while ($_ = <>) {\|.\|.\|.
578
579.ne 2
580 /\|^Subject:/
581 $_ \|=~ \|/\|^Subject:/
582
583.ne 2
584 y/a\-z/A\-Z/
585 $_ =~ y/a\-z/A\-Z/
586
587.ne 2
588 chop
589 chop($_)
590
591.fi
592(Mnemonic: underline is understood in certain operations.)
593.Ip $. 8
594The current input line number of the last filehandle that was read.
595Readonly.
596Remember that only an explicit close on the filehandle resets the line number.
597Since <> never does an explicit close, line numbers increase across ARGV files
598(but see examples under eof).
599(Mnemonic: many programs use . to mean the current line number.)
600.Ip $/ 8
601The input record separator, newline by default.
602Works like
603.IR awk 's
604RS variable, including treating blank lines as delimiters
605if set to the null string.
606If set to a value longer than one character, only the first character is used.
607(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
608.Ip $, 8
609The output field separator for the print operator.
610Ordinarily the print operator simply prints out the comma separated fields
611you specify.
612In order to get behavior more like
613.IR awk ,
614set this variable as you would set
615.IR awk 's
616OFS variable to specify what is printed between fields.
617(Mnemonic: what is printed when there is a , in your print statement.)
618.Ip $"" 8
619This is like $, except that it applies to array values interpolated into
620a double-quoted string (or similar interpreted string).
621Default is a space.
622(Mnemonic: obvious, I think.)
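.Sp
A minimal sketch of the effect on interpolated arrays; the array name is made up for the example.
.nf

```perl
@fields = ('a', 'b', 'c');

$default = "@fields";     # a b c   (separator is a space by default)

$" = '-';
$dashed = "@fields";      # a-b-c

$" = ' ';                 # put the default back
```

.fi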
623.Ip $\e 8
624The output record separator for the print operator.
625Ordinarily the print operator simply prints out the comma separated fields
626you specify, with no trailing newline or record separator assumed.
627In order to get behavior more like
628.IR awk ,
629set this variable as you would set
630.IR awk 's
631ORS variable to specify what is printed at the end of the print.
632(Mnemonic: you set $\e instead of adding \en at the end of the print.
633Also, it's just like /, but it's what you get \*(L"back\*(R" from
634.IR perl .)
635.Ip $# 8
636The output format for printed numbers.
637This variable is a half-hearted attempt to emulate
638.IR awk 's
639OFMT variable.
640There are times, however, when
641.I awk
642and
643.I perl
644have differing notions of what
645is in fact numeric.
646Also, the initial value is %.20g rather than %.6g, so you need to set $#
647explicitly to get
648.IR awk 's
649value.
650(Mnemonic: # is the number sign.)
651.Ip $% 8
652The current page number of the currently selected output channel.
653(Mnemonic: % is page number in nroff.)
654.Ip $= 8
655The current page length (printable lines) of the currently selected output
656channel.
657Default is 60.
658(Mnemonic: = has horizontal lines.)
659.Ip $\- 8
660The number of lines left on the page of the currently selected output channel.
661(Mnemonic: lines_on_page \- lines_printed.)
662.Ip $~ 8
663The name of the current report format for the currently selected output
664channel.
665(Mnemonic: brother to $^.)
666.Ip $^ 8
667The name of the current top-of-page format for the currently selected output
668channel.
669(Mnemonic: points to top of page.)
670.Ip $| 8
671If set to nonzero, forces a flush after every write or print on the currently
672selected output channel.
673Default is 0.
674Note that
675.I STDOUT
676will typically be line buffered if output is to the
677terminal and block buffered otherwise.
678Setting this variable is useful primarily when you are outputting to a pipe,
679such as when you are running a
680.I perl
681script under rsh and want to see the
682output as it's happening.
683(Mnemonic: when you want your pipes to be piping hot.)
684.Ip $$ 8
685The process number of the
686.I perl
687running this script.
688(Mnemonic: same as shells.)
689.Ip $? 8
690The status returned by the last pipe close, backtick (\`\`) command or
691.I system
692operator.
693Note that this is the status word returned by the wait() system
694call, so the exit value of the subprocess is actually ($? >> 8).
695$? & 255 gives which signal, if any, the process died from, and whether
696there was a core dump.
697(Mnemonic: similar to sh and ksh.)
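.Sp
The sketch below takes the status word apart.
It assumes a Unix-like system with sh on the search path, and it assumes the conventional layout of the low byte (signal number in the low seven bits, core-dump flag in bit 128), which the text above only describes as $? & 255.
.nf

```perl
system('sh', '-c', 'exit 3');

$exit   = $? >> 8;        # exit value of the subprocess: 3
$signal = $? & 127;       # signal that killed it, or 0
$core   = $? & 128;       # nonzero if there was a core dump
```

.fi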
698.Ip $& 8 4
699The string matched by the last pattern match (not counting any matches hidden
700within a BLOCK or eval enclosed by the current BLOCK).
701(Mnemonic: like & in some editors.)
702.Ip $\` 8 4
703The string preceding whatever was matched by the last pattern match
704(not counting any matches hidden within a BLOCK or eval enclosed by the current
705BLOCK).
706(Mnemonic: \` often precedes a quoted string.)
707.Ip $\' 8 4
708The string following whatever was matched by the last pattern match
709(not counting any matches hidden within a BLOCK or eval enclosed by the current
710BLOCK).
711(Mnemonic: \' often follows a quoted string.)
712Example:
713.nf
714
715.ne 3
716 $_ = \'abcdefghi\';
717 /def/;
718 print "$\`:$&:$\'\en"; # prints abc:def:ghi
719
720.fi
721.Ip $+ 8 4
722The last bracket matched by the last search pattern.
723This is useful if you don't know which of a set of alternative patterns
724matched.
725For example:
726.nf
727
728 /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
729
730.fi
731(Mnemonic: be positive and forward looking.)
732.Ip $* 8 2
733Set to 1 to do multiline matching within a string, 0 to tell
734.I perl
735that it can assume that strings contain a single line, for the purpose
736of optimizing pattern matches.
737Pattern matches on strings containing multiple newlines can produce confusing
738results when $* is 0.
739Default is 0.
740(Mnemonic: * matches multiple things.)
741.Ip $0 8
742Contains the name of the file containing the
743.I perl
744script being executed.
745The value should be copied elsewhere before any pattern matching happens, which
746clobbers $0.
747(Mnemonic: same as sh and ksh.)
748.Ip $<digit> 8
749Contains the subpattern from the corresponding set of parentheses in the last
750pattern matched, not counting patterns matched in nested blocks that have
751been exited already.
752(Mnemonic: like \edigit.)
753.Ip $[ 8 2
754The index of the first element in an array, and of the first character in
755a substring.
756Default is 0, but you could set it to 1 to make
757.I perl
758behave more like
759.I awk
760(or Fortran)
761when subscripting and when evaluating the index() and substr() functions.
762(Mnemonic: [ begins subscripts.)
763.Ip $] 8 2
764The string printed out when you say \*(L"perl -v\*(R".
765It can be used to determine at the beginning of a script whether the perl
766interpreter executing the script is in the right range of versions.
767Example:
768.nf
769
770.ne 5
771 # see if getc is available
772 ($version,$patchlevel) =
773 $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
774 print STDERR "(No filename completion available.)\en"
775 if $version * 1000 + $patchlevel < 2016;
776
777.fi
778(Mnemonic: Is this version of perl in the right bracket?)
779.Ip $; 8 2
780The subscript separator for multi-dimensional array emulation.
781If you refer to an associative array element as
782.nf
783 $foo{$a,$b,$c}
784
785it really means
786
787 $foo{join($;, $a, $b, $c)}
788
789But don't put
790
791 @foo{$a,$b,$c} # a slice--note the @
792
793which means
794
795 ($foo{$a},$foo{$b},$foo{$c})
796
797.fi
798Default is "\e034", the same as SUBSEP in
799.IR awk .
800Note that if your keys contain binary data there might not be any safe
801value for $;.
802(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
803Yeah, I know, it's pretty lame, but $, is already taken for something more
804important.)
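.Sp
A small sketch of the emulation, using a hypothetical array named cell.
The separator is copied into $sep first only to keep the split pattern easy to read.
.nf

```perl
$cell{1,2} = 'x';                     # stored under join($;, 1, 2)

$key  = join($;, 1, 2);
$same = $cell{$key};                  # x

$sep = $;;
($row, $col) = split(/$sep/, $key);   # 1 and 2

# A slice, by contrast, names two separate elements.
$cell{1} = 'p';
$cell{2} = 'q';
@pair = @cell{1,2};                   # (p, q)
```

.fi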
805.Ip $! 8 2
806If used in a numeric context, yields the current value of errno, with all the
807usual caveats.
ffed7fef 808(This means that you shouldn't depend on the value of $! to be anything
809in particular unless you've gotten a specific error return indicating a
810system error.)
a687059c 811If used in a string context, yields the corresponding system error string.
812You can assign to $! in order to set errno
813if, for instance, you want $! to return the string for error n, or you want
814to set the exit value for the die operator.
815(Mnemonic: What just went bang?)
816.Ip $@ 8 2
ffed7fef 817The perl syntax error message from the last eval command.
818If null, the last eval parsed and executed correctly (although the operations
819you invoked may have failed in the normal fashion).
a687059c 820(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
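.Sp
To illustrate the difference between $@ and $!, here is a minimal sketch.
The path passed to open is hypothetical and is assumed not to exist; the exact error number and message depend on the system, so no particular values are shown.
.nf

```perl
eval 'if (';                  # does not parse
$bad = $@;                    # a non-empty syntax error message

eval '$sum = 1 + 2;';         # parses and runs
$ok = $@;                     # null

if (!open(MISSING, '/no/such/dir/no-such-file')) {
    $errno  = $! + 0;         # numeric context: the value of errno
    $errmsg = "$!";           # string context: the system error string
}
```

.fi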
821.Ip $< 8 2
822The real uid of this process.
823(Mnemonic: it's the uid you came FROM, if you're running setuid.)
824.Ip $> 8 2
825The effective uid of this process.
826Example:
827.nf
828
829.ne 2
830 $< = $>; # set real uid to the effective uid
831 ($<,$>) = ($>,$<); # swap real and effective uid
832
833.fi
834(Mnemonic: it's the uid you went TO, if you're running setuid.)
835Note: $< and $> can only be swapped on machines supporting setreuid().
836.Ip $( 8 2
837The real gid of this process.
838If you are on a machine that supports membership in multiple groups
839simultaneously, gives a space separated list of groups you are in.
840The first number is the one returned by getgid(), and the subsequent ones
841by getgroups(), one of which may be the same as the first number.
842(Mnemonic: parentheses are used to GROUP things.
843The real gid is the group you LEFT, if you're running setgid.)
844.Ip $) 8 2
845The effective gid of this process.
846If you are on a machine that supports membership in multiple groups
847simultaneously, gives a space separated list of groups you are in.
848The first number is the one returned by getegid(), and the subsequent ones
849by getgroups(), one of which may be the same as the first number.
850(Mnemonic: parentheses are used to GROUP things.
851The effective gid is the group that's RIGHT for you, if you're running setgid.)
852.Sp
853Note: $<, $>, $( and $) can only be set on machines that support the
854corresponding set[re][ug]id() routine.
855$( and $) can only be swapped on machines supporting setregid().
856.Ip $: 8 2
857The current set of characters after which a string may be broken to
858fill continuation fields (starting with ^) in a format.
859Default is "\ \en-", to break on whitespace or hyphens.
860(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
861.Ip @ARGV 8 3
862The array ARGV contains the command line arguments intended for the script.
863Note that $#ARGV is generally the number of arguments minus one, since
864$ARGV[0] is the first argument, NOT the command name.
865See $0 for the command name.
866.Ip @INC 8 3
867The array INC contains the list of places to look for
868.I perl
869scripts to be
870evaluated by the \*(L"do EXPR\*(R" command.
871It initially consists of the arguments to any
872.B \-I
873command line switches, followed
874by the default
875.I perl
876library, probably \*(L"/usr/local/lib/perl\*(R".
877.Ip $ENV{expr} 8 2
878The associative array ENV contains your current environment.
879Setting a value in ENV changes the environment for child processes.
880.Ip $SIG{expr} 8 2
881The associative array SIG is used to set signal handlers for various signals.
882Example:
883.nf
884
885.ne 12
886 sub handler { # 1st argument is signal name
887 local($sig) = @_;
888 print "Caught a SIG$sig\-\|\-shutting down\en";
889 close(LOG);
890 exit(0);
891 }
892
893 $SIG{\'INT\'} = \'handler\';
894 $SIG{\'QUIT\'} = \'handler\';
895 .\|.\|.
896 $SIG{\'INT\'} = \'DEFAULT\'; # restore default action
897 $SIG{\'QUIT\'} = \'IGNORE\'; # ignore SIGQUIT
898
899.fi
900The SIG array only contains values for the signals actually set within
901the perl script.
902.Sh "Packages"
903Perl provides a mechanism for alternate namespaces to protect packages from
904stomping on each other's variables.
905By default, a perl script starts compiling into the package known as \*(L"main\*(R".
906By use of the
907.I package
908declaration, you can switch namespaces.
909The scope of the package declaration is from the declaration itself to the end
910of the enclosing block (the same scope as the local() operator).
911Typically it would be the first declaration in a file to be included by
912the \*(L"do FILE\*(R" operator.
913You can switch into a package in more than one place; it merely influences
914which symbol table is used by the compiler for the rest of that block.
915You can refer to variables in other packages by prefixing the name with
916the package name and a single quote.
917If the package name is null, the \*(L"main\*(R" package is assumed.
918Eval'ed strings are compiled in the package in which the eval was compiled.
920(Assignments to $SIG{}, however, assume the signal handler specified is in the
921main package.
922Qualify the signal handler name if you wish to have a signal handler in
923a package.)
924For an example, examine perldb.pl in the perl library.
925It initially switches to the DB package so that the debugger doesn't interfere
926with variables in the script you are trying to debug.
927At various points, however, it temporarily switches back to the main package
928to evaluate various expressions in the context of the main package.
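.PP
As a small illustration of the quoting rule above (a sketch only; the
package name \*(L"counter\*(R" and the variable names are invented for
this example):
.nf

.ne 6
    package counter;
    $total = 5;        # this is $counter'total

    package main;
    $total = 1;        # this is $main'total
    print $counter'total + $total, "\en";    # prints 6

.fi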
.PP
The symbol table for a package happens to be stored in the associative array
of that name prepended with an underscore.
The value in each entry of the associative array is
what you are referring to when you use the *name notation.
In fact, the following have the same effect (in package main, anyway),
though the first is more
efficient because it does the symbol table lookups at compile time:
.nf

.ne 2
    local(*foo) = *bar;
    local($_main{'foo'}) = $_main{'bar'};

.fi
You can use this to print out all the variables in a package, for instance.
Here is dumpvar.pl from the perl library:
.nf
.ne 11
    package dumpvar;

    sub main'dumpvar {
    \&    ($package) = @_;
    \&    local(*stab) = eval("*_$package");
    \&    while (($key,$val) = each(%stab)) {
    \&        {
    \&            local(*entry) = $val;
    \&            if (defined $entry) {
    \&                print "\e$$key = '$entry'\en";
    \&            }
.ne 7
    \&            if (defined @entry) {
    \&                print "\e@$key = (\en";
    \&                foreach $num ($[ .. $#entry) {
    \&                    print " $num\et'",$entry[$num],"'\en";
    \&                }
    \&                print ")\en";
    \&            }
.ne 10
    \&            if ($key ne "_$package" && defined %entry) {
    \&                print "\e%$key = (\en";
    \&                foreach $key (sort keys(%entry)) {
    \&                    print " $key\et'",$entry{$key},"'\en";
    \&                }
    \&                print ")\en";
    \&            }
    \&        }
    \&    }
    }

.fi
Note that, even though the subroutine is compiled in package dumpvar, the
name of the subroutine is qualified so that its name is inserted into package
\*(L"main\*(R".
.Sh "Style"
Each programmer will, of course, have his or her own preferences in regard
to formatting, but there are some general guidelines that will make your
programs easier to read.
.Ip 1. 4 4
Just because you CAN do something a particular way doesn't mean that
you SHOULD do it that way.
.I Perl
is designed to give you several ways to do anything, so consider picking
the most readable one.
For instance

    open(FOO,$foo) || die "Can't open $foo: $!";

is better than

    die "Can't open $foo: $!" unless open(FOO,$foo);

because the second way hides the main point of the statement in a
modifier.
On the other hand

    print "Starting analysis\en" if $verbose;

is better than

    $verbose && print "Starting analysis\en";

since the main point isn't whether the user typed -v or not.
.Sp
Similarly, just because an operator lets you assume default arguments
doesn't mean that you have to make use of the defaults.
The defaults are there for lazy systems programmers writing one-shot
programs.
If you want your program to be readable, consider supplying the argument.
.Sp
Along the same lines, just because you
.I can
omit parentheses in many places doesn't mean that you ought to:
.nf

    return print reverse sort num values array;
    return print(reverse(sort num (values(%array))));

.fi
When in doubt, parenthesize.
At the very least it will let some poor schmuck bounce on the % key in vi.
.Ip 2. 4 4
Don't go through silly contortions to exit a loop at the top or the
bottom, when
.I perl
provides the "last" operator so you can exit in the middle.
Just outdent it a little to make it more visible:
.nf

.ne 7
    line:
        for (;;) {
            statements;
        last line if $foo;
            next line if /^#/;
            statements;
        }

.fi
.Ip 3. 4 4
Don't be afraid to use loop labels\*(--they're there to enhance readability as
well as to allow multi-level loop breaks.
See last example.
.Ip 4. 4 4
For portability, when using features that may not be implemented on every
machine, test the construct in an eval to see if it fails.
If you know what version or patchlevel a particular feature was implemented in,
you can test $] to see if it will be there.
.Ip 5. 4 4
Choose mnemonic identifiers.
.Ip 6. 4 4
Be consistent.
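.PP
Guideline 4 might be applied along the following lines.
This is only a sketch: symlink() merely stands in for whatever feature
may be missing on a given machine.
.nf

.ne 7
    eval \'symlink("", "");\';    # fails inside the eval if unimplemented
    if ($@ eq \'\') {
        print "symlink() is implemented here\en";
    }
    else {
        print "no symlink(): $@";
    }

.fi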
.Sh "Debugging"
If you invoke
.I perl
with a
.B \-d
switch, your script will be run under a debugging monitor.
It will halt before the first executable statement and ask you for a
command, such as:
.Ip "h" 12 4
Prints out a help message.
.Ip "s" 12 4
Single step.
Executes until it reaches the beginning of another statement.
.Ip "c" 12 4
Continue.
Executes until the next breakpoint is reached.
.Ip "<CR>" 12 4
Repeat last s or c.
.Ip "n" 12 4
Single step around subroutine call.
.Ip "l min+incr" 12 4
List incr+1 lines starting at min.
If min is omitted, starts where last listing left off.
If incr is omitted, previous value of incr is used.
.Ip "l min-max" 12 4
List lines in the indicated range.
.Ip "l line" 12 4
List just the indicated line.
.Ip "l" 12 4
List incr+1 more lines after last printed line.
.Ip "l subname" 12 4
List subroutine.
If it's a long subroutine it just lists the beginning.
Use \*(L"l\*(R" to list more.
.Ip "L" 12 4
List lines that have breakpoints or actions.
.Ip "t" 12 4
Toggle trace mode on or off.
.Ip "b line" 12 4
Set a breakpoint.
If line is omitted, sets a breakpoint on the current
line that is about to be executed.
Breakpoints may only be set on lines that begin an executable statement.
.Ip "b subname" 12 4
Set breakpoint at first executable line of subroutine.
.Ip "S" 12 4
Lists the names of all subroutines.
.Ip "d line" 12 4
Delete breakpoint.
If line is omitted, deletes the breakpoint on the current
line that is about to be executed.
.Ip "D" 12 4
Delete all breakpoints.
.Ip "A" 12 4
Delete all line actions.
.Ip "V package" 12 4
List all variables in package.
Default is main package.
.Ip "a line command" 12 4
Set an action for line.
A multi-line command may be entered by backslashing the newlines.
.Ip "< command" 12 4
Set an action to happen before every debugger prompt.
A multi-line command may be entered by backslashing the newlines.
.Ip "> command" 12 4
Set an action to happen after the prompt when you've just given a command
to return to executing the script.
A multi-line command may be entered by backslashing the newlines.
.Ip "! number" 12 4
Redo a debugging command.
If number is omitted, redoes the previous command.
.Ip "! -number" 12 4
Redo the command that was that many commands ago.
.Ip "H -number" 12 4
Display the last n commands, where n is the number given.
Only commands longer than one character are listed.
If number is omitted, lists them all.
.Ip "q or ^D" 12 4
Quit.
.Ip "command" 12 4
Execute command as a perl statement.
A missing semicolon will be supplied.
.Ip "p expr" 12 4
Same as \*(L"print DB'OUT expr\*(R".
The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
may be redirected to.
.PP
If you want to modify the debugger, copy perldb.pl from the perl library
to your current directory and modify it as necessary.
You can do some customization by setting up a .perldb file which contains
initialization code.
For instance, you could make aliases like these:
.nf

    $DBalias{'len'} = 's/^len(.*)/p length(\e$1)/';
    $DBalias{'stop'} = 's/^stop (at|in)/b/';
    $DBalias{'.'} =
      's/^./p "\e$DBsub(\e$DBline):\et\e$DBline[\e$DBline]"/';

.fi
.Sh "Setuid Scripts"
.I Perl
is designed to make it easy to write secure setuid and setgid scripts.
Unlike shells, which are based on multiple substitution passes on each line
of the script,
.I perl
uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
Additionally, since the language has more built-in functionality, it
has to rely less upon external (and possibly untrustworthy) programs to
accomplish its purposes.
.PP
In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
insecure, but this kernel feature can be disabled.
If it is,
.I perl
can emulate the setuid and setgid mechanism when it notices the otherwise
useless setuid/gid bits on perl scripts.
If the kernel feature isn't disabled,
.I perl
will complain loudly that your setuid script is insecure.
You'll need to either disable the kernel setuid script feature, or put
a C wrapper around the script.
.PP
When perl is executing a setuid script, it takes special precautions to
prevent you from falling into any obvious traps.
(In some ways, a perl script is more secure than the corresponding
C program.)
Any command line argument, environment variable, or input is marked as
\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
command that invokes a subshell, or in any command that modifies files,
directories or processes.
Any variable that is set within an expression that has previously referenced
a tainted value also becomes tainted (even if it is logically impossible
for the tainted value to influence the variable).
For example:
.nf

.ne 5
    $foo = shift;    # $foo is tainted
    $bar = $foo,\'bar\';    # $bar is also tainted
    $xxx = <>;    # Tainted
    $path = $ENV{\'PATH\'};    # Tainted, but see below
    $abc = \'abc\';    # Not tainted

.ne 4
    system "echo $foo";    # Insecure
    system "echo", $foo;    # Secure (doesn't use sh)
    system "echo $bar";    # Insecure
    system "echo $abc";    # Insecure until PATH set

.ne 5
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

    $path = $ENV{\'PATH\'};    # Not tainted
    system "echo $abc";    # Is secure now!

.ne 5
    open(FOO,"$foo");    # OK
    open(FOO,">$foo");    # Not OK

    open(FOO,"echo $foo|");    # Not OK, but...
    open(FOO,"-|") || exec \'echo\', $foo;    # OK

    $zzz = `echo $foo`;    # Insecure, zzz tainted

    unlink $abc,$foo;    # Insecure
    umask $foo;    # Insecure

.ne 3
    exec "echo $foo";    # Insecure
    exec "echo", $foo;    # Secure (doesn't use sh)
    exec "sh", \'-c\', $foo;    # Considered secure, alas

.fi
The taintedness is associated with each scalar value, so some elements
of an array can be tainted, and others not.
.PP
If you try to do something insecure, you will get a fatal error saying
something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
Note that you can still write an insecure system call or exec,
but only by explicitly doing something like the last example above.
You can also bypass the tainting mechanism by referencing
subpatterns\*(--\c
.I perl
presumes that if you reference a substring using $1, $2, etc, you knew
what you were doing when you wrote the pattern:
.nf

    $ARGV[0] =~ /^\-P(\ew+)$/;
    $printer = $1;    # Not tainted

.fi
This is fairly secure since \ew+ doesn't match shell metacharacters.
Use of .+ would have been insecure, but
.I perl
doesn't check for that, so you must be careful with your patterns.
This is the ONLY mechanism for untainting user supplied filenames if you
want to do file operations on them (unless you make $> equal to $<).
.PP
It's also possible to get into trouble with other operations that don't care
whether they use tainted values.
Make judicious use of the file tests in dealing with any user-supplied
filenames.
When possible, do opens and such after setting $> = $<.
.I Perl
doesn't prevent you from opening tainted filenames for reading, so be
careful what you print out.
The tainting mechanism is intended to prevent stupid mistakes, not to remove
the need for thought.
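.PP
For instance, a script that only needs the invoking user's own access to a
file might give up its extra privilege before opening it.
This is a sketch under that assumption; the filehandle name IN is arbitrary.
.nf

.ne 2
    $> = $<;    # effective uid becomes the real uid
    open(IN, $ARGV[0]) || die "Can't open $ARGV[0]: $!\en";

.fi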
.SH ENVIRONMENT
.I Perl
uses PATH in executing subprocesses, and in finding the script if \-S
is used.
HOME or LOGDIR are used if chdir has no argument.
.PP
Apart from these,
.I perl
uses no environment variables, except to make them available
to the script being executed, and to child processes.
However, scripts running setuid would do well to execute the following lines
before doing anything else, just to keep people honest:
.nf

.ne 3
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';    # or whatever you need
    $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

.fi
.SH AUTHOR
Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
.SH FILES
/tmp/perl\-eXXXXXX temporary file for
.B \-e
commands.
.SH SEE ALSO
a2p awk to perl translator
.br
s2p sed to perl translator
.SH DIAGNOSTICS
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
(In the case of a script passed to
.I perl
via
.B \-e
switches, each
.B \-e
is counted as one line.)
.PP
Setuid scripts have additional constraints that can produce error messages
such as \*(L"Insecure dependency\*(R".
See the section on setuid scripts.
.SH TRAPS
Accustomed
.IR awk
users should take special note of the following:
.Ip * 4 2
Semicolons are required after all simple statements in
.IR perl .
Newline
is not a statement delimiter.
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Arrays index from 0 unless you set $[.
Likewise string positions in substr() and index().
.Ip * 4 2
You have to decide whether your array has numeric or string indices.
.Ip * 4 2
Associative array values do not spring into existence upon mere reference.
.Ip * 4 2
You have to decide whether you want to use string or numeric comparisons.
.Ip * 4 2
Reading an input line does not split it for you. You get to split it yourself
to an array.
And the
.I split
operator has different arguments.
.Ip * 4 2
The current input line is normally in $_, not $0.
It generally does not have the newline stripped.
($0 is initially the name of the program executed, then the last matched
string.)
.Ip * 4 2
$<digit> does not refer to fields\*(--it refers to substrings matched by the last
match pattern.
.Ip * 4 2
The
.I print
statement does not add field and record separators unless you set
$, and $\e.
.Ip * 4 2
You must open your files before you print to them.
.Ip * 4 2
The range operator is \*(L".\|.\*(R", not comma.
(The comma operator works as in C.)
.Ip * 4 2
The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
(\*(L"~\*(R" is the one's complement operator, as in C.)
.Ip * 4 2
The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
(\*(L"^\*(R" is the XOR operator, as in C.)
.Ip * 4 2
The concatenation operator is \*(L".\*(R", not the null string.
(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
since the third slash would be interpreted as a division operator\*(--the
tokener is in fact slightly context sensitive for operators like /, ?, and <.
And in fact, . itself can be the beginning of a number.)
.Ip * 4 2
.IR Next ,
.I exit
and
.I continue
work differently.
.Ip * 4 2
The following variables work differently:
.nf

    Awk \h'|2.5i'Perl
    ARGC \h'|2.5i'$#ARGV
    ARGV[0] \h'|2.5i'$0
    FILENAME\h'|2.5i'$ARGV
    FNR \h'|2.5i'$. \- something
    FS \h'|2.5i'(whatever you like)
    NF \h'|2.5i'$#Fld, or some such
    NR \h'|2.5i'$.
    OFMT \h'|2.5i'$#
    OFS \h'|2.5i'$,
    ORS \h'|2.5i'$\e
    RLENGTH \h'|2.5i'length($&)
    RS \h'|2.5i'$\/
    RSTART \h'|2.5i'length($\`)
    SUBSEP \h'|2.5i'$;

.fi
.Ip * 4 2
When in doubt, run the
.I awk
construct through a2p and see what it gives you.
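.PP
To make the points about input lines concrete, here is a rough equivalent
of the awk program \*(L"{ print $2 }\*(R".
It is a sketch only, and the array name Fld is arbitrary.
.nf

.ne 5
    while (<>) {
        chop;    # the newline is not stripped for you
        @Fld = split(\' \');    # split $_ on whitespace, as awk does
        print $Fld[1], "\en";    # arrays index from 0, so this is awk's $2
    }

.fi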
.PP
Cerebral C programmers should take note of the following:
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
.Ip * 4 2
.I Break
and
.I continue
become
.I last
and
.IR next ,
respectively.
.Ip * 4 2
There's no switch statement.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Printf does not implement *.
.Ip * 4 2
Comments begin with #, not /*.
.Ip * 4 2
You can't take the address of anything.
.Ip * 4 2
ARGV must be capitalized.
.Ip * 4 2
The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
.Ip * 4 2
Signal handlers deal with signal names, not numbers.
.Ip * 4 2
You can't subscript array values, only arrays (no $x = (1,2,3)[2];).
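.PP
Several of these points show up together in a short loop.
The following is a sketch only; the label and variable names are
invented for the example.
.nf

.ne 12
    $acount = $bcount = 0;
    line: while (<STDIN>) {
        next line if /^#/;    # where C would use continue
        last line if /^quit/;    # where C would use break
        if (/^a/) {    # braces are required
            $acount++;
        }
        elsif (/^b/) {    # elsif, not else if
            $bcount++;
        }
    }
    print "a: $acount, b: $bcount\en";

.fi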
.PP
Seasoned
.I sed
programmers should take note of the following:
.Ip * 4 2
Backreferences in substitutions use $ rather than \e.
.Ip * 4 2
The pattern matching metacharacters (, ), and | do not have backslashes in front.
.Ip * 4 2
The range operator is .\|. rather than comma.
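.PP
For example, swapping two words looks like this in each language
(a sketch; the sample string is made up):
.nf

.ne 4
    # sed:  s/\e(hello\e) \e(world\e)/\e2 \e1/
    $_ = "hello world\en";
    s/(\ew+) (\ew+)/$2 $1/;    # bare ( and ), and $2 rather than \e2
    print;    # prints "world hello"

.fi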
.PP
Sharp shell programmers should take note of the following:
.Ip * 4 2
The backtick operator does variable interpretation without regard to the
presence of single quotes in the command.
.Ip * 4 2
The backtick operator does no translation of the return value, unlike csh.
.Ip * 4 2
Shells (especially csh) do several levels of substitution on each command line.
.I Perl
does substitution only in certain constructs such as double quotes,
backticks, angle brackets and search patterns.
.Ip * 4 2
Shells interpret scripts a little bit at a time.
.I Perl
compiles the whole program before executing it.
.Ip * 4 2
The arguments are available via @ARGV, not $1, $2, etc.
.Ip * 4 2
The environment is not automatically made available as variables.
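.PP
A shell script's $1 and $HOME therefore have to be fetched explicitly.
A minimal sketch:
.nf

.ne 2
    print "first argument: ", $ARGV[0], "\en";    # sh's $1
    print "home directory: ", $ENV{\'HOME\'}, "\en";    # sh's $HOME

.fi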
.SH BUGS
.PP
.I Perl
is at the mercy of your machine's definitions of various operations
such as type casting, atof() and sprintf().
.PP
If your stdio requires a seek or eof between reads and writes on a particular
stream, so does
.IR perl .
.PP
While none of the built-in data types have any arbitrary size limits (apart
from memory size), there are still a few arbitrary limits:
a given identifier may not be longer than 255 characters;
sprintf is limited on many machines to 128 characters per field (unless the format
specifier is exactly %s);
and no component of your PATH may be longer than 255 if you use \-S.
.PP
.I Perl
actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
anyone I said that.
.rn }` ''