perl 3.0 patch #10 patch #9, continued
a687059c 1''' Beginning of part 4
663a0e37 2''' $Header: perl.man.4,v 3.0.1.4 89/12/21 20:12:39 lwall Locked $
a687059c 3'''
4''' $Log: perl.man.4,v $
663a0e37 5''' Revision 3.0.1.4 89/12/21 20:12:39 lwall
6''' patch7: documented that package'filehandle works as well as $package'variable
7''' patch7: documented which identifiers are always in package main
8'''
ffed7fef 9''' Revision 3.0.1.3 89/11/17 15:32:25 lwall
10''' patch5: fixed some manual typos and indent problems
11''' patch5: clarified difference between $! and $@
12'''
ae986130 13''' Revision 3.0.1.2 89/11/11 04:46:40 lwall
14''' patch2: made some line breaks depend on troff vs. nroff
15''' patch2: clarified operation of ^ and $ when $* is false
16'''
03a14243 17''' Revision 3.0.1.1 89/10/26 23:18:43 lwall
18''' patch1: documented the desirability of unnecessary parentheses
19'''
a687059c 20''' Revision 3.0 89/10/18 15:21:55 lwall
21''' 3.0 baseline
22'''
23.Sh "Precedence"
24.I Perl
25operators have the following associativity and precedence:
26.nf
27
28nonassoc\h'|1i'print printf exec system sort reverse
29\h'1.5i'chmod chown kill unlink utime die return
30left\h'|1i',
31right\h'|1i'= += \-= *= etc.
32right\h'|1i'?:
33nonassoc\h'|1i'.\|.
34left\h'|1i'||
35left\h'|1i'&&
36left\h'|1i'| ^
37left\h'|1i'&
38nonassoc\h'|1i'== != eq ne
39nonassoc\h'|1i'< > <= >= lt gt le ge
40nonassoc\h'|1i'chdir exit eval reset sleep rand umask
41nonassoc\h'|1i'\-r \-w \-x etc.
42left\h'|1i'<< >>
43left\h'|1i'+ \- .
44left\h'|1i'* / % x
45left\h'|1i'=~ !~
46right\h'|1i'! ~ and unary minus
47right\h'|1i'**
48nonassoc\h'|1i'++ \-\|\-
49left\h'|1i'\*(L'(\*(R'
50
51.fi
52As mentioned earlier, if any list operator (print, etc.) or
53any unary operator (chdir, etc.)
54is followed by a left parenthesis as the next token on the same line,
55the operator and arguments within parentheses are taken to
56be of highest precedence, just like a normal function call.
57Examples:
58.nf
59
ffed7fef 60 chdir $foo || die;\h'|3i'# (chdir $foo) || die
61 chdir($foo) || die;\h'|3i'# (chdir $foo) || die
62 chdir ($foo) || die;\h'|3i'# (chdir $foo) || die
63 chdir +($foo) || die;\h'|3i'# (chdir $foo) || die
a687059c 64
65but, because * is higher precedence than ||:
66
ffed7fef 67 chdir $foo * 20;\h'|3i'# chdir ($foo * 20)
68 chdir($foo) * 20;\h'|3i'# (chdir $foo) * 20
69 chdir ($foo) * 20;\h'|3i'# (chdir $foo) * 20
70 chdir +($foo) * 20;\h'|3i'# chdir ($foo * 20)
a687059c 71
ffed7fef 72 rand 10 * 20;\h'|3i'# rand (10 * 20)
73 rand(10) * 20;\h'|3i'# (rand 10) * 20
74 rand (10) * 20;\h'|3i'# (rand 10) * 20
75 rand +(10) * 20;\h'|3i'# rand (10 * 20)
a687059c 76
77.fi
78In the absence of parentheses,
79the precedence of list operators such as print, sort or chmod is
80either very high or very low depending on whether you look at the left
81side of the operator or the right side of it.
82For example, in
83.nf
84
85 @ary = (1, 3, sort 4, 2);
86 print @ary; # prints 1324
87
88.fi
89the commas on the right of the sort are evaluated before the sort, but
90the commas on the left are evaluated after.
91In other words, list operators tend to gobble up all the arguments that
92follow them, and then act like a simple term with regard to the preceding
93expression.
94Note that you have to be careful with parens:
95.nf
96
97.ne 3
98 # These evaluate exit before doing the print:
99 print($foo, exit); # Obviously not what you want.
100 print $foo, exit; # Nor is this.
101
102.ne 4
103 # These do the print before evaluating exit:
104 (print $foo), exit; # This is what you want.
105 print($foo), exit; # Or this.
106 print ($foo), exit; # Or even this.
107
108Also note that
109
110 print ($foo & 255) + 1, "\en";
111
112.fi
113probably doesn't do what you expect at first glance.
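The parsing rules above can be checked with a short script.
This is a minimal sketch, assuming a current Perl 5 interpreter rather than perl 3.0.
It substitutes length for rand so that the results are deterministic, and the variable names are illustrative only.

```perl
# List operator: sort takes everything to its right, then acts as a term.
@ary = (1, 3, sort 4, 2);
$joined = join('', @ary);        # the same elements that "print @ary" shows

# Named unary operator: binds more loosely than *, unless a left
# parenthesis follows it as the next token.
$bare   = length 10 * 20;        # length(10 * 20)
$called = length(10) * 20;       # (length 10) * 20
$spaced = length (10) * 20;      # (length 10) * 20
$plus   = length +(10) * 20;     # length(10 * 20)

print "$joined $bare $called $spaced $plus\n";   # prints 1324 3 40 40 3
```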
114.Sh "Subroutines"
115A subroutine may be declared as follows:
116.nf
117
118 sub NAME BLOCK
119
120.fi
121.PP
122Any arguments passed to the routine come in as array @_,
123that is ($_[0], $_[1], .\|.\|.).
124The array @_ is a local array, but its values are references to the
125actual scalar parameters.
126The return value of the subroutine is the value of the last expression
127evaluated, and can be either an array value or a scalar value.
128Alternatively, a return statement may be used to specify the returned value and
129exit the subroutine.
130To create local variables see the
131.I local
132operator.
133.PP
134A subroutine is called using the
135.I do
136operator or the & operator.
137.nf
138
139.ne 12
140Example:
141
142 sub MAX {
143 local($max) = pop(@_);
144 foreach $foo (@_) {
145 $max = $foo \|if \|$max < $foo;
146 }
147 $max;
148 }
149
150 .\|.\|.
151 $bestday = &MAX($mon,$tue,$wed,$thu,$fri);
152
153.ne 21
154Example:
155
156 # get a line, combining continuation lines
157 # that start with whitespace
158 sub get_line {
159 $thisline = $lookahead;
160 line: while ($lookahead = <STDIN>) {
161 if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
162 $thisline \|.= \|$lookahead;
163 }
164 else {
165 last line;
166 }
167 }
168 $thisline;
169 }
170
171 $lookahead = <STDIN>; # get first line
172 while ($_ = do get_line(\|)) {
173 .\|.\|.
174 }
175
176.fi
177.nf
178.ne 6
179Use array assignment to a local list to name your formal arguments:
180
181 sub maybeset {
182 local($key, $value) = @_;
183 $foo{$key} = $value unless $foo{$key};
184 }
185
186.fi
187This also has the effect of turning call-by-reference into call-by-value,
188since the assignment copies the values.
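The difference between working through @_ and working on a copy can be seen in a small case.
This is a sketch only, assuming a current Perl 5 interpreter; the subroutine names double_in_place and doubled are hypothetical.

```perl
# Modifies the caller's variable through the @_ alias.
sub double_in_place {
    $_[0] *= 2;
}

# Copies the argument first, so the caller's variable is untouched.
sub doubled {
    local($n) = @_;
    $n * 2;
}

$x = 5;
&double_in_place($x);       # $x is now 10

$y = 5;
$z = &doubled($y);          # $y is still 5, $z is 10

print "$x $y $z\n";         # prints 10 5 10
```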
189.Sp
190Subroutines may be called recursively.
191If a subroutine is called using the & form, the argument list is optional.
192If omitted, no @_ array is set up for the subroutine; the @_ array at the
193time of the call is visible to the subroutine instead.
194.nf
195
196 do foo(1,2,3); # pass three arguments
197 &foo(1,2,3); # the same
198
199 do foo(); # pass a null list
200 &foo(); # the same
201 &foo; # pass no arguments--more efficient
202
203.fi
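The effect of omitting the argument list can be observed by counting what the called routine sees.
This is a sketch, assuming a current Perl 5 interpreter; count_args, pass_through and pass_empty are hypothetical names.

```perl
sub count_args { scalar(@_); }        # number of arguments visible here

sub pass_through { &count_args; }     # no list: callee sees the caller's @_
sub pass_empty   { &count_args(); }   # null list: callee gets an empty @_

$shared = &pass_through(1, 2, 3);
$empty  = &pass_empty(1, 2, 3);

print "$shared $empty\n";             # prints 3 0
```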
204.Sh "Passing By Reference"
205Sometimes you don't want to pass the value of an array to a subroutine but
206rather the name of it, so that the subroutine can modify the global copy
207of it rather than working with a local copy.
208In perl you can refer to all the objects of a particular name by prefixing
209the name with a star: *foo.
210When evaluated, it produces a scalar value that represents all the objects
211of that name.
212When assigned to within a local() operation, it causes the name mentioned
213to refer to whatever * value was assigned to it.
214Example:
215.nf
216
217 sub doubleary {
218 local(*someary) = @_;
219 foreach $elem (@someary) {
220 $elem *= 2;
221 }
222 }
223 do doubleary(*foo);
224 do doubleary(*bar);
225
226.fi
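The same mechanism lets a subroutine change the size of the caller's array.
This is a sketch, assuming a current Perl 5 interpreter; append_items and @queue are hypothetical names.

```perl
sub append_items {
    local(*list) = shift(@_);   # *list now names the caller's array
    push(@list, @_);            # the remaining arguments are the new elements
}

@queue = (1, 2);
&append_items(*queue, 3, 4);

print "@queue\n";               # prints 1 2 3 4
```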
227Assignment to *name is currently recommended only inside a local().
228You can actually assign to *name anywhere, but the previous referent of
229*name may be stranded forever.
230This may or may not bother you.
231.Sp
232Note that scalars are already passed by reference, so you can modify scalar
ae986130 233arguments without using this mechanism by referring explicitly to the $_[nnn]
a687059c 234in question.
235You can modify all the elements of an array by passing all the elements
236as scalars, but you have to use the * mechanism to push, pop or change the
237size of an array.
238The * mechanism will probably be more efficient in any case.
239.Sp
240Since a *name value contains unprintable binary data, if it is used as
241an argument in a print, or as a %s argument in a printf or sprintf, it
242then has the value '*name', just so it prints out pretty.
243.Sh "Regular Expressions"
244The patterns used in pattern matching are regular expressions such as
245those supplied in the Version 8 regexp routines.
246(In fact, the routines are derived from Henry Spencer's freely redistributable
247reimplementation of the V8 routines.)
248In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
249Word boundaries may be matched by \eb, and non-boundaries by \eB.
250A whitespace character is matched by \es, non-whitespace by \eS.
251A numeric character is matched by \ed, non-numeric by \eD.
252You may use \ew, \es and \ed within character classes.
253Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
254Within character classes \eb represents backspace rather than a word boundary.
255Alternatives may be separated by |.
256The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
257matches the digit'th substring, where digit can range from 1 to 9.
258(Outside of the pattern, always use $ instead of \e in front of the digit.
259The scope of $<digit> (and $\`, $& and $\')
260extends to the end of the enclosing BLOCK or eval string, or to
261the next pattern match with subexpressions.
262The \e<digit> notation sometimes works outside the current pattern, but should
263not be relied upon.)
264$+ returns whatever the last bracket match matched.
265$& returns the entire matched string.
266($0 normally returns the same thing, but don't depend on it.)
267$\` returns everything before the matched string.
268$\' returns everything after the matched string.
269Examples:
270.nf
271
272 s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/; # swap first two words
273
274.ne 5
275 if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
276 $hours = $1;
277 $minutes = $2;
278 $seconds = $3;
279 }
280
281.fi
ae986130 282By default, the ^ character is only guaranteed to match at the beginning
283of the string,
284the $ character only at the end (or before the newline at the end)
a687059c 285and
286.I perl
287does certain optimizations with the assumption that the string contains
288only one line.
ae986130 289The behavior of ^ and $ on embedded newlines will be inconsistent.
a687059c 290You may, however, wish to treat a string as a multi-line buffer, such that
291the ^ will match after any newline within the string, and $ will match
292before any newline.
293At the cost of a little more overhead, you can do this by setting the variable
294$* to 1.
295Setting it back to 0 makes
296.I perl
297revert to its old behavior.
298.PP
299To facilitate multi-line substitutions, the . character never matches a newline
300(even when $* is 0).
301In particular, the following leaves a newline on the $_ string:
302.nf
303
304 $_ = <STDIN>;
305 s/.*(some_string).*/$1/;
306
307If the newline is unwanted, try one of
308
309 s/.*(some_string).*\en/$1/;
310 s/.*(some_string)[^\e000]*/$1/;
311 s/.*(some_string)(.|\en)*/$1/;
312 chop; s/.*(some_string).*/$1/;
313 /(some_string)/ && ($_ = $1);
314
315.fi
316Any item of a regular expression may be followed with digits in curly brackets
317of the form {n,m}, where n gives the minimum number of times to match the item
318and m gives the maximum.
319The form {n} is equivalent to {n,n} and matches exactly n times.
320The form {n,} matches n or more times.
321(If a curly bracket occurs in any other context, it is treated as a regular
322character.)
323The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
324to {0,1}.
325There is no limit to the size of n or m, but large numbers will chew up
326more memory.
327.Sp
328You will note that all backslashed metacharacters in
329.I perl
330are alphanumeric,
331such as \eb, \ew, \en.
332Unlike some other regular expression languages, there are no backslashed
333symbols that aren't alphanumeric.
334So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
335interpreted as a literal character, not a metacharacter.
336This makes it simple to quote a string that you want to use for a pattern
337but that you are afraid might contain metacharacters.
338Simply quote all the non-alphanumeric characters:
339.nf
340
341 $pattern =~ s/(\eW)/\e\e$1/g;
342
343.fi
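To see what the quoting buys you, compare a match with and without it.
This is a sketch, assuming a current Perl 5 interpreter; the strings and variable names are illustrative only.

```perl
$literal = 'a.b*c';

# Unquoted, "." and "*" act as metacharacters.
$loose = ('aXbbc' =~ /$literal/) ? 1 : 0;

# Quoted, only the exact text matches.
($pattern = $literal) =~ s/(\W)/\\$1/g;           # $pattern is a\.b\*c
$strict_other = ('aXbbc' =~ /$pattern/) ? 1 : 0;
$strict_same  = ('a.b*c' =~ /$pattern/) ? 1 : 0;

print "$loose $strict_other $strict_same\n";      # prints 1 0 1
```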
344.Sh "Formats"
345Output record formats for use with the
346.I write
347operator may be declared as follows:
348.nf
349
350.ne 3
351 format NAME =
352 FORMLIST
353 .
354
355.fi
356If NAME is omitted, format \*(L"STDOUT\*(R" is defined.
357FORMLIST consists of a sequence of lines, each of which may be of one of three
358types:
359.Ip 1. 4
360A comment.
361.Ip 2. 4
362A \*(L"picture\*(R" line giving the format for one output line.
363.Ip 3. 4
364An argument line supplying values to plug into a picture line.
365.PP
366Picture lines are printed exactly as they look, except for certain fields
367that substitute values into the line.
368Each picture field starts with either @ or ^.
369The @ field (not to be confused with the array marker @) is the normal
370case; ^ fields are used
371to do rudimentary multi-line text block filling.
372The length of the field is supplied by padding out the field
373with multiple <, >, or | characters to specify, respectively, left justification,
374right justification, or centering.
375If any of the values supplied for these fields contains a newline, only
376the text up to the newline is printed.
377The special field @* can be used for printing multi-line values.
378It should appear by itself on a line.
379.PP
380The values are specified on the following line, in the same order as
381the picture fields.
382The values should be separated by commas.
383.PP
384Picture fields that begin with ^ rather than @ are treated specially.
385The value supplied must be a scalar variable name which contains a text
386string.
387.I Perl
388puts as much text as it can into the field, and then chops off the front
389of the string so that the next time the variable is referenced,
390more of the text can be printed.
391Normally you would use a sequence of fields in a vertical stack to print
392out a block of text.
393If you like, you can end the final field with .\|.\|., which will appear in the
394output if the text was too long to appear in its entirety.
395You can change which characters are legal to break on by changing the
396variable $: to a list of the desired characters.
397.PP
398Since use of ^ fields can produce variable length records if the text to be
399formatted is short, you can suppress blank lines by putting the tilde (~)
400character anywhere in the line.
401(Normally you should put it in the front if possible, for visibility.)
402The tilde will be translated to a space upon output.
403If you put a second tilde contiguous to the first, the line will be repeated
404until all the fields on the line are exhausted.
405(If you use a field of the @ variety, the expression you supply had better
406not give the same value every time forever!)
407.PP
408Examples:
409.nf
410.lg 0
411.cs R 25
412.ft C
413
414.ne 10
415# a report on the /etc/passwd file
416format top =
417\& Passwd File
418Name Login Office Uid Gid Home
419------------------------------------------------------------------
420\&.
421format STDOUT =
422@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
423$name, $login, $office,$uid,$gid, $home
424\&.
425
426.ne 29
427# a report from a bug report form
428format top =
429\& Bug Reports
430@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
431$system, $%, $date
432------------------------------------------------------------------
433\&.
434format STDOUT =
435Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
436\& $subject
437Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
438\& $index, $description
439Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
440\& $priority, $date, $description
441From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
442\& $from, $description
443Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
444\& $programmer, $description
445\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
446\& $description
447\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
448\& $description
449\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
450\& $description
451\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
452\& $description
453\&~ ^<<<<<<<<<<<<<<<<<<<<<<<...
454\& $description
455\&.
456
457.ft R
458.cs R
459.lg
460.fi
461It is possible to intermix prints with writes on the same output channel,
462but you'll have to handle $\- (lines left on the page) yourself.
463.PP
464If you are printing lots of fields that are usually blank, you should consider
465using the reset operator between records.
466Not only is it more efficient, but it can prevent the bug of adding another
467field and forgetting to zero it.
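A small format can be exercised without a terminal by writing to an in-memory file.
This is a sketch, assuming a current Perl 5 interpreter (the three-argument open on a scalar reference is not part of perl 3.0); the REPORT handle and the field names are hypothetical.

```perl
# Picture line: an 8-column left-justified field and a 5-column
# right-justified field.  The argument line follows it.
format REPORT =
@<<<<<<< @>>>>
$name,   $count
.

$report = '';
open(REPORT, '>', \$report) || die "open: $!";   # in-memory file

($name, $count) = ('apple', 7);
write(REPORT);
($name, $count) = ('kiwi', 12);
write(REPORT);

close(REPORT);
print $report;
```

A filehandle uses the format of the same name by default, so no assignment to $~ is needed here.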
468.Sh "Interprocess Communication"
469The IPC facilities of perl are built on the Berkeley socket mechanism.
470If you don't have sockets, you can ignore this section.
471The calls have the same names as the corresponding system calls,
472but the arguments tend to differ, for two reasons.
473First, perl file handles work differently than C file descriptors.
474Second, perl already knows the length of its strings, so you don't need
475to pass that information.
476Here is a sample client (untested):
477.nf
478
479 ($them,$port) = @ARGV;
480 $port = 2345 unless $port;
481 $them = 'localhost' unless $them;
482
483 $SIG{'INT'} = 'dokill';
484 sub dokill { kill 9,$child if $child; }
485
486 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
487
488 $sockaddr = 'S n a4 x8';
489 chop($hostname = `hostname`);
490
491 ($name, $aliases, $proto) = getprotobyname('tcp');
492 ($name, $aliases, $port) = getservbyname($port, 'tcp')
493 unless $port =~ /^\ed+$/;
ae986130 494.ie t \{\
a687059c 495 ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
ae986130 496'br\}
497.el \{\
498 ($name, $aliases, $type, $len, $thisaddr) =
499 gethostbyname($hostname);
500'br\}
a687059c 501 ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);
502
503 $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
504 $that = pack($sockaddr, &AF_INET, $port, $thataddr);
505
506 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
507 bind(S, $this) || die "bind: $!";
508 connect(S, $that) || die "connect: $!";
509
510 select(S); $| = 1; select(stdout);
511
512 if ($child = fork) {
513 while (<>) {
514 print S;
515 }
516 sleep 3;
517 do dokill();
518 }
519 else {
520 while (<S>) {
521 print;
522 }
523 }
524
525.fi
526And here's a server:
527.nf
528
529 ($port) = @ARGV;
530 $port = 2345 unless $port;
531
532 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
533
534 $sockaddr = 'S n a4 x8';
535
536 ($name, $aliases, $proto) = getprotobyname('tcp');
537 ($name, $aliases, $port) = getservbyname($port, 'tcp')
538 unless $port =~ /^\ed+$/;
539
540 $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");
541
542 select(NS); $| = 1; select(stdout);
543
544 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
545 bind(S, $this) || die "bind: $!";
546 listen(S, 5) || die "listen: $!";
547
548 select(S); $| = 1; select(stdout);
549
550 for (;;) {
551 print "Listening again\en";
552 ($addr = accept(NS,S)) || die $!;
553 print "accept ok\en";
554
ae986130 555 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
a687059c 556 @inetaddr = unpack('C4',$inetaddr);
557 print "$af $port @inetaddr\en";
558
559 while (<NS>) {
560 print;
561 print NS;
562 }
563 }
564
565.fi
566.Sh "Predefined Names"
567The following names have special meaning to
568.IR perl .
569I could have used alphabetic symbols for some of these, but I didn't want
570to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
571out.
572You'll just have to suffer along with these silly symbols.
573Most of them have reasonable mnemonics, or analogues in one of the shells.
574.Ip $_ 8
575The default input and pattern-searching space.
576The following pairs are equivalent:
577.nf
578
579.ne 2
580 while (<>) {\|.\|.\|. # only equivalent in while!
581 while ($_ = <>) {\|.\|.\|.
582
583.ne 2
584 /\|^Subject:/
585 $_ \|=~ \|/\|^Subject:/
586
587.ne 2
588 y/a\-z/A\-Z/
589 $_ =~ y/a\-z/A\-Z/
590
591.ne 2
592 chop
593 chop($_)
594
595.fi
596(Mnemonic: underline is understood in certain operations.)
597.Ip $. 8
598The current input line number of the last filehandle that was read.
599Readonly.
600Remember that only an explicit close on the filehandle resets the line number.
601Since <> never does an explicit close, line numbers increase across ARGV files
602(but see examples under eof).
603(Mnemonic: many programs use . to mean the current line number.)
604.Ip $/ 8
605The input record separator, newline by default.
606Works like
607.IR awk 's
608RS variable, including treating blank lines as delimiters
609if set to the null string.
610If set to a value longer than one character, only the first character is used.
611(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
612.Ip $, 8
613The output field separator for the print operator.
614Ordinarily the print operator simply prints out the comma separated fields
615you specify.
616In order to get behavior more like
617.IR awk ,
618set this variable as you would set
619.IR awk 's
620OFS variable to specify what is printed between fields.
621(Mnemonic: what is printed when there is a , in your print statement.)
622.Ip $"" 8
623This is like $, except that it applies to array values interpolated into
624a double-quoted string (or similar interpreted string).
625Default is a space.
626(Mnemonic: obvious, I think.)
627.Ip $\e 8
628The output record separator for the print operator.
629Ordinarily the print operator simply prints out the comma separated fields
630you specify, with no trailing newline or record separator assumed.
631In order to get behavior more like
632.IR awk ,
633set this variable as you would set
634.IR awk 's
635ORS variable to specify what is printed at the end of the print.
636(Mnemonic: you set $\e instead of adding \en at the end of the print.
637Also, it's just like /, but it's what you get \*(L"back\*(R" from
638.IR perl .)
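The three separators described so far can be compared side by side.
This is a sketch, assuming a current Perl 5 interpreter; it prints to an in-memory file (not available in perl 3.0) so the result can be inspected, and the names are illustrative only.

```perl
@fields = ('a', 'b', 'c');

$out = '';
open(OUT, '>', \$out) || die "open: $!";   # in-memory file
{
    local($,) = ':';       # printed between the arguments of print
    local($\) = "\n";      # printed at the end of each print
    print OUT @fields;
}
close(OUT);

{
    local($") = '-';       # used when an array is interpolated in a string
    $joined = "@fields";
}

print $out;                # prints a:b:c
print "$joined\n";         # prints a-b-c
```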
639.Ip $# 8
640The output format for printed numbers.
641This variable is a half-hearted attempt to emulate
642.IR awk 's
643OFMT variable.
644There are times, however, when
645.I awk
646and
647.I perl
648have differing notions of what
649is in fact numeric.
650Also, the initial value is %.20g rather than %.6g, so you need to set $#
651explicitly to get
652.IR awk 's
653value.
654(Mnemonic: # is the number sign.)
655.Ip $% 8
656The current page number of the currently selected output channel.
657(Mnemonic: % is page number in nroff.)
658.Ip $= 8
659The current page length (printable lines) of the currently selected output
660channel.
661Default is 60.
662(Mnemonic: = has horizontal lines.)
663.Ip $\- 8
664The number of lines left on the page of the currently selected output channel.
665(Mnemonic: lines_on_page \- lines_printed.)
666.Ip $~ 8
667The name of the current report format for the currently selected output
668channel.
669(Mnemonic: brother to $^.)
670.Ip $^ 8
671The name of the current top-of-page format for the currently selected output
672channel.
673(Mnemonic: points to top of page.)
674.Ip $| 8
675If set to nonzero, forces a flush after every write or print on the currently
676selected output channel.
677Default is 0.
678Note that
679.I STDOUT
680will typically be line buffered if output is to the
681terminal and block buffered otherwise.
682Setting this variable is useful primarily when you are outputting to a pipe,
683such as when you are running a
684.I perl
685script under rsh and want to see the
686output as it's happening.
687(Mnemonic: when you want your pipes to be piping hot.)
688.Ip $$ 8
689The process number of the
690.I perl
691running this script.
692(Mnemonic: same as shells.)
693.Ip $? 8
694The status returned by the last pipe close, backtick (\`\`) command or
695.I system
696operator.
697Note that this is the status word returned by the wait() system
698call, so the exit value of the subprocess is actually ($? >> 8).
699$? & 255 gives which signal, if any, the process died from, and whether
700there was a core dump.
701(Mnemonic: similar to sh and ksh.)
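The status word can be taken apart with shifts and masks.
This is a sketch, assuming a current Perl 5 interpreter and the conventional Unix layout in which the low seven bits hold the signal number and the next bit flags a core dump; decode_status is a hypothetical helper.

```perl
# Hypothetical helper: split a wait() status word into its parts.
sub decode_status {
    local($status) = @_;
    local($exit)   = ($status >> 8) & 255;      # exit value of the subprocess
    local($signal) = $status & 127;             # signal number, if any
    local($core)   = ($status & 128) ? 1 : 0;   # core-dump flag
    ($exit, $signal, $core);
}

@normal = &decode_status(3 << 8);    # exited with value 3
@killed = &decode_status(9 | 128);   # killed by signal 9, core dumped

print "@normal / @killed\n";         # prints 3 0 0 / 0 9 1
```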
702.Ip $& 8 4
703The string matched by the last pattern match (not counting any matches hidden
704within a BLOCK or eval enclosed by the current BLOCK).
705(Mnemonic: like & in some editors.)
706.Ip $\` 8 4
707The string preceding whatever was matched by the last pattern match
708(not counting any matches hidden within a BLOCK or eval enclosed by the current
709BLOCK).
710(Mnemonic: \` often precedes a quoted string.)
711.Ip $\' 8 4
712The string following whatever was matched by the last pattern match
713(not counting any matches hidden within a BLOCK or eval enclosed by the current
714BLOCK).
715(Mnemonic: \' often follows a quoted string.)
716Example:
717.nf
718
719.ne 3
720 $_ = \'abcdefghi\';
721 /def/;
722 print "$\`:$&:$\'\en"; # prints abc:def:ghi
723
724.fi
725.Ip $+ 8 4
726The last bracket matched by the last search pattern.
727This is useful if you don't know which of a set of alternative patterns
728matched.
729For example:
730.nf
731
732 /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
733
734.fi
735(Mnemonic: be positive and forward looking.)
736.Ip $* 8 2
737Set to 1 to do multiline matching within a string, 0 to tell
738.I perl
739that it can assume that strings contain a single line, for the purpose
740of optimizing pattern matches.
741Pattern matches on strings containing multiple newlines can produce confusing
742results when $* is 0.
743Default is 0.
744(Mnemonic: * matches multiple things.)
745.Ip $0 8
746Contains the name of the file containing the
747.I perl
748script being executed.
749The value should be copied elsewhere before any pattern matching happens, which
750clobbers $0.
751(Mnemonic: same as sh and ksh.)
752.Ip $<digit> 8
753Contains the subpattern from the corresponding set of parentheses in the last
754pattern matched, not counting patterns matched in nested blocks that have
755been exited already.
756(Mnemonic: like \edigit.)
757.Ip $[ 8 2
758The index of the first element in an array, and of the first character in
759a substring.
760Default is 0, but you could set it to 1 to make
761.I perl
762behave more like
763.I awk
764(or Fortran)
765when subscripting and when evaluating the index() and substr() functions.
766(Mnemonic: [ begins subscripts.)
767.Ip $] 8 2
768The string printed out when you say \*(L"perl -v\*(R".
769It can be used to determine at the beginning of a script whether the perl
770interpreter executing the script is in the right range of versions.
771Example:
772.nf
773
774.ne 5
775 # see if getc is available
776 ($version,$patchlevel) =
777 $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
778 print STDERR "(No filename completion available.)\en"
779 if $version * 1000 + $patchlevel < 2016;
780
781.fi
782(Mnemonic: Is this version of perl in the right bracket?)
783.Ip $; 8 2
784The subscript separator for multi-dimensional array emulation.
785If you refer to an associative array element as
786.nf
787 $foo{$a,$b,$c}
788
789it really means
790
791 $foo{join($;, $a, $b, $c)}
792
793But don't put
794
795 @foo{$a,$b,$c} # a slice--note the @
796
797which means
798
799 ($foo{$a},$foo{$b},$foo{$c})
800
801.fi
802Default is "\e034", the same as SUBSEP in
803.IR awk .
804Note that if your keys contain binary data there might not be any safe
805value for $;.
806(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
807Yeah, I know, it's pretty lame, but $, is already taken for something more
808important.)
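The difference between a compound key and a slice shows up in the number of elements created.
This is a sketch, assuming a current Perl 5 interpreter and the default value of $;. The array and variable names are illustrative only.

```perl
%foo = ();
$foo{1, 2, 3} = 'cell';                   # one element with a compound key
$key  = join($;, 1, 2, 3);
$same = ($foo{$key} eq 'cell') ? 1 : 0;

@parts = split($;, $key);                 # recovers the three subscripts

@foo{'a', 'b'} = (10, 20);                # a slice: two separate elements
$count = scalar(keys(%foo));

print "$same @parts $count\n";            # prints 1 1 2 3 3
```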
809.Ip $! 8 2
810If used in a numeric context, yields the current value of errno, with all the
811usual caveats.
ffed7fef 812(This means that you shouldn't depend on the value of $! to be anything
813in particular unless you've gotten a specific error return indicating a
814system error.)
a687059c 815If used in a string context, yields the corresponding system error string.
816You can assign to $! in order to set errno
817if, for instance, you want $! to return the string for error n, or you want
818to set the exit value for the die operator.
819(Mnemonic: What just went bang?)
820.Ip $@ 8 2
ffed7fef 821The perl syntax error message from the last eval command.
822If null, the last eval parsed and executed correctly (although the operations
823you invoked may have failed in the normal fashion).
a687059c 824(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
825.Ip $< 8 2
826The real uid of this process.
827(Mnemonic: it's the uid you came FROM, if you're running setuid.)
828.Ip $> 8 2
829The effective uid of this process.
830Example:
831.nf
832
833.ne 2
834 $< = $>; # set real uid to the effective uid
835 ($<,$>) = ($>,$<); # swap real and effective uid
836
837.fi
838(Mnemonic: it's the uid you went TO, if you're running setuid.)
839Note: $< and $> can only be swapped on machines supporting setreuid().
840.Ip $( 8 2
841The real gid of this process.
842If you are on a machine that supports membership in multiple groups
843simultaneously, gives a space separated list of groups you are in.
844The first number is the one returned by getgid(), and the subsequent ones
845by getgroups(), one of which may be the same as the first number.
846(Mnemonic: parentheses are used to GROUP things.
847The real gid is the group you LEFT, if you're running setgid.)
848.Ip $) 8 2
849The effective gid of this process.
850If you are on a machine that supports membership in multiple groups
851simultaneously, gives a space separated list of groups you are in.
852The first number is the one returned by getegid(), and the subsequent ones
853by getgroups(), one of which may be the same as the first number.
854(Mnemonic: parentheses are used to GROUP things.
855The effective gid is the group that's RIGHT for you, if you're running setgid.)
856.Sp
857Note: $<, $>, $( and $) can only be set on machines that support the
858corresponding set[re][ug]id() routine.
859$( and $) can only be swapped on machines supporting setregid().
860.Ip $: 8 2
861The current set of characters after which a string may be broken to
862fill continuation fields (starting with ^) in a format.
863Default is "\ \en-", to break on whitespace or hyphens.
864(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
865.Ip @ARGV 8 3
866The array ARGV contains the command line arguments intended for the script.
867Note that $#ARGV is generally the number of arguments minus one, since
868$ARGV[0] is the first argument, NOT the command name.
869See $0 for the command name.
870.Ip @INC 8 3
871The array INC contains the list of places to look for
872.I perl
873scripts to be
874evaluated by the \*(L"do EXPR\*(R" command.
875It initially consists of the arguments to any
876.B \-I
877command line switches, followed
878by the default
879.I perl
880library, probably \*(L"/usr/local/lib/perl\*(R".
881.Ip $ENV{expr} 8 2
882The associative array ENV contains your current environment.
883Setting a value in ENV changes the environment for child processes.
884.Ip $SIG{expr} 8 2
885The associative array SIG is used to set signal handlers for various signals.
886Example:
887.nf
888
889.ne 12
890 sub handler { # 1st argument is signal name
891 local($sig) = @_;
892 print "Caught a SIG$sig\-\|\-shutting down\en";
893 close(LOG);
894 exit(0);
895 }
896
897 $SIG{\'INT\'} = \'handler\';
898 $SIG{\'QUIT\'} = \'handler\';
899 .\|.\|.
900 $SIG{\'INT\'} = \'DEFAULT\'; # restore default action
901 $SIG{\'QUIT\'} = \'IGNORE\'; # ignore SIGQUIT
902
903.fi
904The SIG array only contains values for the signals actually set within
905the perl script.
906.Sh "Packages"
907Perl provides a mechanism for alternate namespaces to protect packages from
908stomping on each other's variables.
909By default, a perl script starts compiling into the package known as \*(L"main\*(R".
910By use of the
911.I package
912declaration, you can switch namespaces.
913The scope of the package declaration is from the declaration itself to the end
914of the enclosing block (the same scope as the local() operator).
915Typically it would be the first declaration in a file to be included by
916the \*(L"do FILE\*(R" operator.
917You can switch into a package in more than one place; it merely influences
918which symbol table is used by the compiler for the rest of that block.
You can refer to variables and filehandles in other packages by prefixing
the identifier with the package name and a single quote.
If the package name is null, the \*(L"main\*(R" package is assumed.
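.PP
A minimal sketch of qualified names, assuming a made-up package called counter with a variable $total and a subroutine bump (none of these names come from the perl library):

```perl
package counter;

$total = 0;                     # this is $counter'total

sub bump {
    $total += $_[0];            # unqualified names resolve in package counter
}

package main;

&counter'bump(5);               # reach into the other package by name
&counter'bump(2);
print "total is ", $counter'total, "\n";
```

This should print a total of 7, since both calls update the same $counter'total.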
.PP
Only identifiers starting with letters are stored in the package's symbol
table.
925All other symbols are kept in package \*(L"main\*(R".
926In addition, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC
927and SIG are forced to be in package \*(L"main\*(R", even when used for
928other purposes than their built-in one.
Note also that, if you have a package called \*(L"m\*(R", \*(L"s\*(R"
or \*(L"y\*(R", then you can't use the qualified form of an identifier since it
will be interpreted instead as a pattern match, a substitution
or a translation.
933.PP
Eval'ed strings are compiled in the package in which the eval was
compiled.
936(Assignments to $SIG{}, however, assume the signal handler specified is in the
937main package.
938Qualify the signal handler name if you wish to have a signal handler in
939a package.)
940For an example, examine perldb.pl in the perl library.
941It initially switches to the DB package so that the debugger doesn't interfere
942with variables in the script you are trying to debug.
943At various points, however, it temporarily switches back to the main package
944to evaluate various expressions in the context of the main package.
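.PP
The parenthetical note about $SIG{} can be sketched as follows.
The package name logger and the subroutine name cleanup are hypothetical.

```perl
package logger;

sub cleanup {                   # 1st argument is signal name
    local($sig) = @_;
    print "logger caught SIG$sig\n";
    exit(0);
}

package main;

$SIG{'INT'} = "logger'cleanup"; # qualified, so the handler is found in logger
# $SIG{'INT'} = 'cleanup';      # would be looked up in package main instead
```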
945.PP
946The symbol table for a package happens to be stored in the associative array
947of that name prepended with an underscore.
948The value in each entry of the associative array is
949what you are referring to when you use the *name notation.
950In fact, the following have the same effect (in package main, anyway),
951though the first is more
952efficient because it does the symbol table lookups at compile time:
953.nf
954
955.ne 2
956 local(*foo) = *bar;
957 local($_main{'foo'}) = $_main{'bar'};
958
959.fi
960You can use this to print out all the variables in a package, for instance.
961Here is dumpvar.pl from the perl library:
962.nf
963.ne 11
964 package dumpvar;
965
966 sub main'dumpvar {
967 \& ($package) = @_;
968 \& local(*stab) = eval("*_$package");
969 \& while (($key,$val) = each(%stab)) {
970 \& {
971 \& local(*entry) = $val;
972 \& if (defined $entry) {
973 \& print "\e$$key = '$entry'\en";
974 \& }
975.ne 7
976 \& if (defined @entry) {
977 \& print "\e@$key = (\en";
978 \& foreach $num ($[ .. $#entry) {
979 \& print " $num\et'",$entry[$num],"'\en";
980 \& }
981 \& print ")\en";
982 \& }
983.ne 10
984 \& if ($key ne "_$package" && defined %entry) {
985 \& print "\e%$key = (\en";
986 \& foreach $key (sort keys(%entry)) {
987 \& print " $key\et'",$entry{$key},"'\en";
988 \& }
989 \& print ")\en";
990 \& }
991 \& }
992 \& }
993 }
994
995.fi
996Note that, even though the subroutine is compiled in package dumpvar, the
name of the subroutine is qualified so that its name is inserted into package
\*(L"main\*(R".
999.Sh "Style"
Each programmer will, of course, have his or her own preferences in regard
to formatting, but there are some general guidelines that will make your
programs easier to read.
1003.Ip 1. 4 4
1004Just because you CAN do something a particular way doesn't mean that
1005you SHOULD do it that way.
1006.I Perl
1007is designed to give you several ways to do anything, so consider picking
1008the most readable one.
1009For instance
1010
1011 open(FOO,$foo) || die "Can't open $foo: $!";
1012
1013is better than
1014
1015 die "Can't open $foo: $!" unless open(FOO,$foo);
1016
1017because the second way hides the main point of the statement in a
1018modifier.
1019On the other hand
1020
1021 print "Starting analysis\en" if $verbose;
1022
1023is better than
1024
1025 $verbose && print "Starting analysis\en";
1026
1027since the main point isn't whether the user typed -v or not.
1028.Sp
1029Similarly, just because an operator lets you assume default arguments
1030doesn't mean that you have to make use of the defaults.
1031The defaults are there for lazy systems programmers writing one-shot
1032programs.
1033If you want your program to be readable, consider supplying the argument.
.Sp
1035Along the same lines, just because you
1036.I can
1037omit parentheses in many places doesn't mean that you ought to:
1038.nf
1039
1040 return print reverse sort num values array;
1041 return print(reverse(sort num (values(%array))));
1042
1043.fi
1044When in doubt, parenthesize.
1045At the very least it will let some poor schmuck bounce on the % key in vi.
.Ip 2. 4 4
1047Don't go through silly contortions to exit a loop at the top or the
1048bottom, when
1049.I perl
1050provides the "last" operator so you can exit in the middle.
1051Just outdent it a little to make it more visible:
1052.nf
1053
1054.ne 7
1055 line:
1056 for (;;) {
1057 statements;
1058 last line if $foo;
1059 next line if /^#/;
1060 statements;
1061 }
1062
1063.fi
1064.Ip 3. 4 4
1065Don't be afraid to use loop labels\*(--they're there to enhance readability as
1066well as to allow multi-level loop breaks.
See the previous example.
.Ip 4. 4 4
For portability, when using features that may not be implemented on every
machine, test the construct in an eval to see if it fails.
If you know what version or patchlevel a particular feature was implemented in,
you can test $] to see if it will be there.
.Ip 5. 4 4
Choose mnemonic identifiers.
.Ip 6. 4 4
Be consistent.
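.PP
Guideline 4 might look like this in practice.
This is only a sketch: symlink is used as the feature under test, on the assumption that perl raises a fatal error for it on systems that lack symbolic links, and the variable name $symlink_exists is chosen for illustration.

```perl
# Try the construct inside an eval so that a fatal "not implemented"
# error is trapped in $@ instead of killing the script.
eval 'symlink("", "");';
$symlink_exists = ($@ eq '');

if ($symlink_exists) {
    print "symlink is implemented here\n";
}
else {
    print "symlink is not implemented here\n";
}
```

The call itself fails either way because of the empty filenames; only the presence or absence of a fatal error matters.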
1077.Sh "Debugging"
1078If you invoke
1079.I perl
1080with a
1081.B \-d
1082switch, your script will be run under a debugging monitor.
1083It will halt before the first executable statement and ask you for a
1084command, such as:
1085.Ip "h" 12 4
1086Prints out a help message.
1087.Ip "s" 12 4
1088Single step.
1089Executes until it reaches the beginning of another statement.
1090.Ip "c" 12 4
1091Continue.
1092Executes until the next breakpoint is reached.
1093.Ip "<CR>" 12 4
1094Repeat last s or c.
1095.Ip "n" 12 4
1096Single step around subroutine call.
1097.Ip "l min+incr" 12 4
1098List incr+1 lines starting at min.
1099If min is omitted, starts where last listing left off.
1100If incr is omitted, previous value of incr is used.
1101.Ip "l min-max" 12 4
1102List lines in the indicated range.
1103.Ip "l line" 12 4
1104List just the indicated line.
1105.Ip "l" 12 4
1106List incr+1 more lines after last printed line.
1107.Ip "l subname" 12 4
1108List subroutine.
1109If it's a long subroutine it just lists the beginning.
1110Use \*(L"l\*(R" to list more.
1111.Ip "L" 12 4
1112List lines that have breakpoints or actions.
1113.Ip "t" 12 4
1114Toggle trace mode on or off.
1115.Ip "b line" 12 4
1116Set a breakpoint.
If line is omitted, sets a breakpoint on the
line that is about to be executed.
1119Breakpoints may only be set on lines that begin an executable statement.
1120.Ip "b subname" 12 4
1121Set breakpoint at first executable line of subroutine.
1122.Ip "S" 12 4
1123Lists the names of all subroutines.
1124.Ip "d line" 12 4
1125Delete breakpoint.
If line is omitted, deletes the breakpoint on the
line that is about to be executed.
1128.Ip "D" 12 4
1129Delete all breakpoints.
1130.Ip "A" 12 4
1131Delete all line actions.
1132.Ip "V package" 12 4
1133List all variables in package.
1134Default is main package.
1135.Ip "a line command" 12 4
1136Set an action for line.
1137A multi-line command may be entered by backslashing the newlines.
1138.Ip "< command" 12 4
1139Set an action to happen before every debugger prompt.
1140A multi-line command may be entered by backslashing the newlines.
1141.Ip "> command" 12 4
1142Set an action to happen after the prompt when you've just given a command
1143to return to executing the script.
1144A multi-line command may be entered by backslashing the newlines.
1145.Ip "! number" 12 4
1146Redo a debugging command.
1147If number is omitted, redoes the previous command.
1148.Ip "! -number" 12 4
1149Redo the command that was that many commands ago.
1150.Ip "H -number" 12 4
Display the last number commands.
1152Only commands longer than one character are listed.
1153If number is omitted, lists them all.
1154.Ip "q or ^D" 12 4
1155Quit.
1156.Ip "command" 12 4
1157Execute command as a perl statement.
1158A missing semicolon will be supplied.
1159.Ip "p expr" 12 4
1160Same as \*(L"print DB'OUT expr\*(R".
1161The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
1162may be redirected to.
1163.PP
1164If you want to modify the debugger, copy perldb.pl from the perl library
1165to your current directory and modify it as necessary.
1166You can do some customization by setting up a .perldb file which contains
1167initialization code.
1168For instance, you could make aliases like these:
1169.nf
1170
1171 $DBalias{'len'} = 's/^len(.*)/p length(\e$1)/';
1172 $DBalias{'stop'} = 's/^stop (at|in)/b/';
1173 $DBalias{'.'} =
1174 's/^./p "\e$DBsub(\e$DBline):\et\e$DBline[\e$DBline]"/';
1175
1176.fi
1177.Sh "Setuid Scripts"
1178.I Perl
1179is designed to make it easy to write secure setuid and setgid scripts.
1180Unlike shells, which are based on multiple substitution passes on each line
1181of the script,
1182.I perl
1183uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
1184Additionally, since the language has more built-in functionality, it
1185has to rely less upon external (and possibly untrustworthy) programs to
1186accomplish its purposes.
1187.PP
1188In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
1189insecure, but this kernel feature can be disabled.
1190If it is,
1191.I perl
1192can emulate the setuid and setgid mechanism when it notices the otherwise
1193useless setuid/gid bits on perl scripts.
1194If the kernel feature isn't disabled,
1195.I perl
1196will complain loudly that your setuid script is insecure.
1197You'll need to either disable the kernel setuid script feature, or put
1198a C wrapper around the script.
1199.PP
1200When perl is executing a setuid script, it takes special precautions to
1201prevent you from falling into any obvious traps.
1202(In some ways, a perl script is more secure than the corresponding
1203C program.)
1204Any command line argument, environment variable, or input is marked as
1205\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
1206command that invokes a subshell, or in any command that modifies files,
1207directories or processes.
1208Any variable that is set within an expression that has previously referenced
1209a tainted value also becomes tainted (even if it is logically impossible
1210for the tainted value to influence the variable).
1211For example:
1212.nf
1213
1214.ne 5
1215 $foo = shift; # $foo is tainted
1216 $bar = $foo,\'bar\'; # $bar is also tainted
1217 $xxx = <>; # Tainted
1218 $path = $ENV{\'PATH\'}; # Tainted, but see below
1219 $abc = \'abc\'; # Not tainted
1220
1221.ne 4
1222 system "echo $foo"; # Insecure
1223 system "echo", $foo; # Secure (doesn't use sh)
1224 system "echo $bar"; # Insecure
1225 system "echo $abc"; # Insecure until PATH set
1226
1227.ne 5
1228 $ENV{\'PATH\'} = \'/bin:/usr/bin\';
1229 $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
1230
1231 $path = $ENV{\'PATH\'}; # Not tainted
1232 system "echo $abc"; # Is secure now!
1233
1234.ne 5
1235 open(FOO,"$foo"); # OK
1236 open(FOO,">$foo"); # Not OK
1237
1238 open(FOO,"echo $foo|"); # Not OK, but...
1239 open(FOO,"-|") || exec \'echo\', $foo; # OK
1240
1241 $zzz = `echo $foo`; # Insecure, zzz tainted
1242
1243 unlink $abc,$foo; # Insecure
1244 umask $foo; # Insecure
1245
1246.ne 3
1247 exec "echo $foo"; # Insecure
1248 exec "echo", $foo; # Secure (doesn't use sh)
1249 exec "sh", \'-c\', $foo; # Considered secure, alas
1250
1251.fi
1252The taintedness is associated with each scalar value, so some elements
1253of an array can be tainted, and others not.
1254.PP
1255If you try to do something insecure, you will get a fatal error saying
1256something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
1257Note that you can still write an insecure system call or exec,
but only by explicitly doing something like the last example above.
You can also bypass the tainting mechanism by referencing
1260subpatterns\*(--\c
1261.I perl
1262presumes that if you reference a substring using $1, $2, etc, you knew
1263what you were doing when you wrote the pattern:
1264.nf
1265
1266 $ARGV[0] =~ /^\-P(\ew+)$/;
1267 $printer = $1; # Not tainted
1268
1269.fi
1270This is fairly secure since \ew+ doesn't match shell metacharacters.
1271Use of .+ would have been insecure, but
1272.I perl
1273doesn't check for that, so you must be careful with your patterns.
1274This is the ONLY mechanism for untainting user supplied filenames if you
1275want to do file operations on them (unless you make $> equal to $<).
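.PP
Since perl trusts whatever pattern you write, it seems prudent to confirm that the match succeeded before using $1, which otherwise still holds whatever an earlier match left there.
A minimal sketch, in which the subroutine name printer_from is hypothetical:

```perl
# Return the printer name from a -Pname argument, or '' if it is malformed.
sub printer_from {
    local($arg) = @_;
    local($name) = '';
    $name = $1 if $arg =~ /^-P(\w+)$/;  # only word characters get through
    $name;
}

$printer = &printer_from($ARGV[0]);
if ($printer eq '') {
    print "usage: $0 -Pprinter\n";
}
else {
    print "printer is $printer\n";
}
```

An argument containing shell metacharacters is rejected outright, because nothing but word characters can appear between the switch and the end of the string.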
1276.PP
1277It's also possible to get into trouble with other operations that don't care
1278whether they use tainted values.
1279Make judicious use of the file tests in dealing with any user-supplied
1280filenames.
1281When possible, do opens and such after setting $> = $<.
1282.I Perl
1283doesn't prevent you from opening tainted filenames for reading, so be
1284careful what you print out.
1285The tainting mechanism is intended to prevent stupid mistakes, not to remove
1286the need for thought.
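.PP
One way to follow the advice about $> and $< is sketched below.
It assumes the filename arrives as the first argument and that the script can do without its setuid privilege from that point on; the filehandle name IN is arbitrary.

```perl
$file = $ARGV[0];               # user-supplied, so treat it with suspicion

$> = $<;                        # effective uid becomes the real uid

if ($file ne '' && (-f $file) && (-r $file)) {
    open(IN, $file) || die "Can't open $file: $!\n";
    $first = <IN>;              # read only what you intend to reveal
    close(IN);
    print "first line: $first";
}
else {
    print "usage: $0 file\n";
}
```

With the privilege given up first, the file tests and the open are made with the invoking user's own permissions.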
1287.SH ENVIRONMENT
1288.I Perl
1289uses PATH in executing subprocesses, and in finding the script if \-S
1290is used.
1291HOME or LOGDIR are used if chdir has no argument.
1292.PP
1293Apart from these,
1294.I perl
1295uses no environment variables, except to make them available
1296to the script being executed, and to child processes.
1297However, scripts running setuid would do well to execute the following lines
1298before doing anything else, just to keep people honest:
1299.nf
1300
1301.ne 3
1302 $ENV{\'PATH\'} = \'/bin:/usr/bin\'; # or whatever you need
1303 $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
1304 $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
1305
1306.fi
1307.SH AUTHOR
1308Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
1309.SH FILES
1310/tmp/perl\-eXXXXXX temporary file for
1311.B \-e
1312commands.
1313.SH SEE ALSO
1314a2p awk to perl translator
1315.br
1316s2p sed to perl translator
1317.SH DIAGNOSTICS
1318Compilation errors will tell you the line number of the error, with an
1319indication of the next token or token type that was to be examined.
1320(In the case of a script passed to
1321.I perl
1322via
1323.B \-e
1324switches, each
1325.B \-e
1326is counted as one line.)
1327.PP
1328Setuid scripts have additional constraints that can produce error messages
1329such as \*(L"Insecure dependency\*(R".
1330See the section on setuid scripts.
1331.SH TRAPS
1332Accustomed
1333.IR awk
1334users should take special note of the following:
1335.Ip * 4 2
1336Semicolons are required after all simple statements in
1337.IR perl .
1338Newline
1339is not a statement delimiter.
1340.Ip * 4 2
1341Curly brackets are required on ifs and whiles.
1342.Ip * 4 2
1343Variables begin with $ or @ in
1344.IR perl .
1345.Ip * 4 2
1346Arrays index from 0 unless you set $[.
1347Likewise string positions in substr() and index().
1348.Ip * 4 2
1349You have to decide whether your array has numeric or string indices.
1350.Ip * 4 2
1351Associative array values do not spring into existence upon mere reference.
1352.Ip * 4 2
1353You have to decide whether you want to use string or numeric comparisons.
1354.Ip * 4 2
Reading an input line does not split it for you. You get to split it yourself
into an array.
1357And the
1358.I split
1359operator has different arguments.
1360.Ip * 4 2
1361The current input line is normally in $_, not $0.
1362It generally does not have the newline stripped.
1363($0 is initially the name of the program executed, then the last matched
1364string.)
1365.Ip * 4 2
1366$<digit> does not refer to fields\*(--it refers to substrings matched by the last
1367match pattern.
1368.Ip * 4 2
1369The
1370.I print
1371statement does not add field and record separators unless you set
1372$, and $\e.
1373.Ip * 4 2
1374You must open your files before you print to them.
1375.Ip * 4 2
1376The range operator is \*(L".\|.\*(R", not comma.
1377(The comma operator works as in C.)
1378.Ip * 4 2
1379The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
1380(\*(L"~\*(R" is the one's complement operator, as in C.)
1381.Ip * 4 2
1382The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
1383(\*(L"^\*(R" is the XOR operator, as in C.)
1384.Ip * 4 2
1385The concatenation operator is \*(L".\*(R", not the null string.
1386(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
1387since the third slash would be interpreted as a division operator\*(--the
1388tokener is in fact slightly context sensitive for operators like /, ?, and <.
1389And in fact, . itself can be the beginning of a number.)
1390.Ip * 4 2
1391.IR Next ,
1392.I exit
1393and
1394.I continue
1395work differently.
1396.Ip * 4 2
1397The following variables work differently
1398.nf
1399
	Awk \h'|2.5i'Perl
	ARGC \h'|2.5i'$#ARGV
	ARGV[0] \h'|2.5i'$0
	FILENAME\h'|2.5i'$ARGV
	FNR \h'|2.5i'$. \- something
	FS \h'|2.5i'(whatever you like)
	NF \h'|2.5i'$#Fld, or some such
	NR \h'|2.5i'$.
	OFMT \h'|2.5i'$#
	OFS \h'|2.5i'$,
	ORS \h'|2.5i'$\e
	RLENGTH \h'|2.5i'length($&)
	RS \h'|2.5i'$\/
	RSTART \h'|2.5i'length($\`)
	SUBSEP \h'|2.5i'$;
1415
1416.fi
1417.Ip * 4 2
1418When in doubt, run the
1419.I awk
1420construct through a2p and see what it gives you.
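.PP
Several of the points above show up when translating even a one-line awk program.
The sketch below mimics awk's handling of a single record; the sample line and the names @Fld and $NF are illustrative only, and a real script would read its lines from an input filehandle instead.

```perl
# awk:  { print NF, $2 }
$_ = "alpha beta gamma\n";      # stands in for a line read from input
chop;                           # the newline is not stripped for you
@Fld = split(' ', $_);          # nor is the line split into fields
$NF = $#Fld + 1;                # $#Fld is the last index, counting from 0
print $NF, ' ', $Fld[1], "\n";  # awk's $2 is $Fld[1]
```

This prints 3 beta.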
1421.PP
1422Cerebral C programmers should take note of the following:
1423.Ip * 4 2
1424Curly brackets are required on ifs and whiles.
1425.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
1427.Ip * 4 2
1428.I Break
1429and
1430.I continue
1431become
1432.I last
1433and
1434.IR next ,
1435respectively.
1436.Ip * 4 2
1437There's no switch statement.
1438.Ip * 4 2
1439Variables begin with $ or @ in
1440.IR perl .
1441.Ip * 4 2
1442Printf does not implement *.
1443.Ip * 4 2
1444Comments begin with #, not /*.
1445.Ip * 4 2
1446You can't take the address of anything.
1447.Ip * 4 2
1448ARGV must be capitalized.
1449.Ip * 4 2
1450The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
1451.Ip * 4 2
1452Signal handlers deal with signal names, not numbers.
1453.Ip * 4 2
1454You can't subscript array values, only arrays (no $x = (1,2,3)[2];).
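.PP
A short sketch of how a few of these points look in practice; the array @primes and the other variable names are made up for the example.

```perl
@primes = (2, 3, 5, 7, 11);
$third = $primes[2];            # subscript a named array, not a list value

$sum = 0;
foreach $p (@primes) {
    next if $p == 2;            # C's continue
    last if $p > 7;             # C's break
    if ($p < 5) {               # curly brackets are required
        $sum += 1;
    }
    elsif ($p < 7) {            # elsif, not else if
        $sum += 10;
    }
    else {
        $sum += 100;
    }
}
print "third is $third, sum is $sum\n";
```

This prints third is 5, sum is 111.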
1455.PP
1456Seasoned
1457.I sed
1458programmers should take note of the following:
1459.Ip * 4 2
1460Backreferences in substitutions use $ rather than \e.
1461.Ip * 4 2
1462The pattern matching metacharacters (, ), and | do not have backslashes in front.
1463.Ip * 4 2
1464The range operator is .\|. rather than comma.
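.PP
The first two points can be seen in a small sketch; the sample string is arbitrary.

```perl
$_ = "hello world";
s/(\w+) (\w+)/$2 $1/;           # sed would write \( \) and \2 \1
print "$_\n";

# Bare ( ) and | group and alternate; no backslashes are needed.
print "matched\n" if /^(world|moon) hello$/;
```

This prints world hello followed by matched.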
1465.PP
1466Sharp shell programmers should take note of the following:
1467.Ip * 4 2
1468The backtick operator does variable interpretation without regard to the
1469presence of single quotes in the command.
1470.Ip * 4 2
1471The backtick operator does no translation of the return value, unlike csh.
1472.Ip * 4 2
1473Shells (especially csh) do several levels of substitution on each command line.
1474.I Perl
1475does substitution only in certain constructs such as double quotes,
1476backticks, angle brackets and search patterns.
1477.Ip * 4 2
1478Shells interpret scripts a little bit at a time.
1479.I Perl
1480compiles the whole program before executing it.
1481.Ip * 4 2
1482The arguments are available via @ARGV, not $1, $2, etc.
1483.Ip * 4 2
1484The environment is not automatically made available as variables.
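.PP
A brief sketch of the backtick, argument and environment points above, assuming a shell and an echo command are available; all variable names are illustrative.

```perl
$word = 'perl';
$out = `echo '$word'`;          # perl interpolates $word despite the single quotes
$len = length($out);            # 5: the trailing newline is kept
chop($out);                     # remove it yourself

$home = $ENV{'HOME'};           # sh: $HOME
$first = $ARGV[0];              # sh: $1

print "echo said $out\n";
```

This prints echo said perl.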
1485.SH BUGS
1486.PP
1487.I Perl
1488is at the mercy of your machine's definitions of various operations
1489such as type casting, atof() and sprintf().
1490.PP
If your stdio requires a seek or eof between reads and writes on a particular
1492stream, so does
1493.IR perl .
1494.PP
1495While none of the built-in data types have any arbitrary size limits (apart
1496from memory size), there are still a few arbitrary limits:
1497a given identifier may not be longer than 255 characters;
1498sprintf is limited on many machines to 128 characters per field (unless the format
1499specifier is exactly %s);
and no component of your PATH may be longer than 255 characters if you use \-S.
1501.PP
1502.I Perl
1503actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
1504anyone I said that.
1505.rn }` ''