''' Beginning of part 4
''' $Header: perl.man.4,v 3.0.1.7 90/03/14 12:29:50 lwall Locked $
'''
''' $Log: perl.man.4,v $
''' Revision 3.0.1.7 90/03/14 12:29:50 lwall
''' patch15: man page falsely states that you can't subscript array values
'''
''' Revision 3.0.1.6 90/03/12 16:54:04 lwall
''' patch13: improved documentation of *name
'''
''' Revision 3.0.1.5 90/02/28 18:01:52 lwall
''' patch9: $0 is now always the command name
'''
''' Revision 3.0.1.4 89/12/21 20:12:39 lwall
''' patch7: documented that package'filehandle works as well as $package'variable
''' patch7: documented which identifiers are always in package main
'''
''' Revision 3.0.1.3 89/11/17 15:32:25 lwall
''' patch5: fixed some manual typos and indent problems
''' patch5: clarified difference between $! and $@
'''
''' Revision 3.0.1.2 89/11/11 04:46:40 lwall
''' patch2: made some line breaks depend on troff vs. nroff
''' patch2: clarified operation of ^ and $ when $* is false
'''
''' Revision 3.0.1.1 89/10/26 23:18:43 lwall
''' patch1: documented the desirability of unnecessary parentheses
'''
''' Revision 3.0 89/10/18 15:21:55 lwall
''' 3.0 baseline
'''
32.Sh "Precedence"
33.I Perl
34operators have the following associativity and precedence:
35.nf
36
37nonassoc\h'|1i'print printf exec system sort reverse
38\h'1.5i'chmod chown kill unlink utime die return
39left\h'|1i',
40right\h'|1i'= += \-= *= etc.
41right\h'|1i'?:
42nonassoc\h'|1i'.\|.
43left\h'|1i'||
44left\h'|1i'&&
45left\h'|1i'| ^
46left\h'|1i'&
47nonassoc\h'|1i'== != eq ne
48nonassoc\h'|1i'< > <= >= lt gt le ge
49nonassoc\h'|1i'chdir exit eval reset sleep rand umask
50nonassoc\h'|1i'\-r \-w \-x etc.
51left\h'|1i'<< >>
52left\h'|1i'+ \- .
53left\h'|1i'* / % x
54left\h'|1i'=~ !~
55right\h'|1i'! ~ and unary minus
56right\h'|1i'**
57nonassoc\h'|1i'++ \-\|\-
58left\h'|1i'\*(L'(\*(R'
59
60.fi
61As mentioned earlier, if any list operator (print, etc.) or
62any unary operator (chdir, etc.)
63is followed by a left parenthesis as the next token on the same line,
64the operator and arguments within parentheses are taken to
65be of highest precedence, just like a normal function call.
66Examples:
67.nf
68
ffed7fef 69 chdir $foo || die;\h'|3i'# (chdir $foo) || die
70 chdir($foo) || die;\h'|3i'# (chdir $foo) || die
71 chdir ($foo) || die;\h'|3i'# (chdir $foo) || die
72 chdir +($foo) || die;\h'|3i'# (chdir $foo) || die
a687059c 73
74but, because * is higher precedence than ||:
75
ffed7fef 76 chdir $foo * 20;\h'|3i'# chdir ($foo * 20)
77 chdir($foo) * 20;\h'|3i'# (chdir $foo) * 20
78 chdir ($foo) * 20;\h'|3i'# (chdir $foo) * 20
79 chdir +($foo) * 20;\h'|3i'# chdir ($foo * 20)
a687059c 80
ffed7fef 81 rand 10 * 20;\h'|3i'# rand (10 * 20)
82 rand(10) * 20;\h'|3i'# (rand 10) * 20
83 rand (10) * 20;\h'|3i'# (rand 10) * 20
84 rand +(10) * 20;\h'|3i'# rand (10 * 20)
a687059c 85
86.fi
In the absence of parentheses,
the precedence of list operators such as print, sort or chmod is
either very high or very low depending on whether you look at the left
side of the operator or the right side of it.
91For example, in
92.nf
93
94 @ary = (1, 3, sort 4, 2);
95 print @ary; # prints 1324
96
97.fi
98the commas on the right of the sort are evaluated before the sort, but
99the commas on the left are evaluated after.
100In other words, list operators tend to gobble up all the arguments that
101follow them, and then act like a simple term with regard to the preceding
102expression.
103Note that you have to be careful with parens:
104.nf
105
106.ne 3
107 # These evaluate exit before doing the print:
108 print($foo, exit); # Obviously not what you want.
109 print $foo, exit; # Nor is this.
110
111.ne 4
112 # These do the print before evaluating exit:
113 (print $foo), exit; # This is what you want.
114 print($foo), exit; # Or this.
115 print ($foo), exit; # Or even this.
116
117Also note that
118
119 print ($foo & 255) + 1, "\en";
120
121.fi
122probably doesn't do what you expect at first glance.
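.PP
As a sketch of why (an illustration, not part of the original example):
the left parenthesis is the next token after print, so the parenthesized
expression is taken as the complete argument list, and the statement
is parsed like the first line below.
The second line is one way to print the sum instead.
.nf

	(print($foo & 255)) + 1, "\en";	# how the statement above is parsed
	print(($foo & 255) + 1, "\en");	# prints the sum and a newline

.fi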
123.Sh "Subroutines"
124A subroutine may be declared as follows:
125.nf
126
127 sub NAME BLOCK
128
129.fi
130.PP
131Any arguments passed to the routine come in as array @_,
132that is ($_[0], $_[1], .\|.\|.).
133The array @_ is a local array, but its values are references to the
134actual scalar parameters.
135The return value of the subroutine is the value of the last expression
136evaluated, and can be either an array value or a scalar value.
137Alternately, a return statement may be used to specify the returned value and
138exit the subroutine.
139To create local variables see the
140.I local
141operator.
142.PP
143A subroutine is called using the
144.I do
145operator or the & operator.
146.nf
147
148.ne 12
149Example:
150
151 sub MAX {
152 local($max) = pop(@_);
153 foreach $foo (@_) {
154 $max = $foo \|if \|$max < $foo;
155 }
156 $max;
157 }
158
159 .\|.\|.
160 $bestday = &MAX($mon,$tue,$wed,$thu,$fri);
161
162.ne 21
163Example:
164
165 # get a line, combining continuation lines
166 # that start with whitespace
167 sub get_line {
168 $thisline = $lookahead;
169 line: while ($lookahead = <STDIN>) {
170 if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
171 $thisline \|.= \|$lookahead;
172 }
173 else {
174 last line;
175 }
176 }
177 $thisline;
178 }
179
180 $lookahead = <STDIN>; # get first line
181 while ($_ = do get_line(\|)) {
182 .\|.\|.
183 }
184
185.fi
186.nf
187.ne 6
188Use array assignment to a local list to name your formal arguments:
189
190 sub maybeset {
191 local($key, $value) = @_;
192 $foo{$key} = $value unless $foo{$key};
193 }
194
195.fi
196This also has the effect of turning call-by-reference into call-by-value,
197since the assignment copies the values.
198.Sp
199Subroutines may be called recursively.
200If a subroutine is called using the & form, the argument list is optional.
If omitted, no @_ array is set up for the subroutine; the @_ array at the
time of the call is visible to the subroutine instead.
203.nf
204
205 do foo(1,2,3); # pass three arguments
206 &foo(1,2,3); # the same
207
208 do foo(); # pass a null list
209 &foo(); # the same
210 &foo; # pass no arguments--more efficient
211
212.fi
213.Sh "Passing By Reference"
214Sometimes you don't want to pass the value of an array to a subroutine but
215rather the name of it, so that the subroutine can modify the global copy
216of it rather than working with a local copy.
217In perl you can refer to all the objects of a particular name by prefixing
218the name with a star: *foo.
219When evaluated, it produces a scalar value that represents all the objects
79a0689e 220of that name, including any filehandle, format or subroutine.
a687059c 221When assigned to within a local() operation, it causes the name mentioned
222to refer to whatever * value was assigned to it.
223Example:
224.nf
225
226 sub doubleary {
227 local(*someary) = @_;
228 foreach $elem (@someary) {
229 $elem *= 2;
230 }
231 }
232 do doubleary(*foo);
233 do doubleary(*bar);
234
235.fi
236Assignment to *name is currently recommended only inside a local().
237You can actually assign to *name anywhere, but the previous referent of
238*name may be stranded forever.
239This may or may not bother you.
240.Sp
241Note that scalars are already passed by reference, so you can modify scalar
ae986130 242arguments without using this mechanism by referring explicitly to the $_[nnn]
a687059c 243in question.
244You can modify all the elements of an array by passing all the elements
245as scalars, but you have to use the * mechanism to push, pop or change the
246size of an array.
247The * mechanism will probably be more efficient in any case.
248.Sp
249Since a *name value contains unprintable binary data, if it is used as
250an argument in a print, or as a %s argument in a printf or sprintf, it
251then has the value '*name', just so it prints out pretty.
79a0689e 252.Sp
253Even if you don't want to modify an array, this mechanism is useful for
254passing multiple arrays in a single LIST, since normally the LIST mechanism
255will merge all the array values so that you can't extract out the
256individual arrays.
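.Sp
Here is a sketch of passing two arrays this way (untested; the subroutine
name sumboth and the arrays @odd and @even are made up for illustration,
and it assumes that a list of *names may be assigned in a single local()):
.nf

	sub sumboth {
		local(*first, *second) = @_;
		local($total, $elem);
		foreach $elem (@first, @second) {
			$total += $elem;
		}
		$total;
	}

	@odd = (1, 3, 5);
	@even = (2, 4);
	$sum = &sumboth(*odd, *even);	# @first is @odd, @second is @even

.fi
Inside the subroutine the two arrays stay distinct, which they would not
if @odd and @even had been passed directly.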
a687059c 257.Sh "Regular Expressions"
258The patterns used in pattern matching are regular expressions such as
259those supplied in the Version 8 regexp routines.
260(In fact, the routines are derived from Henry Spencer's freely redistributable
261reimplementation of the V8 routines.)
262In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
263Word boundaries may be matched by \eb, and non-boundaries by \eB.
264A whitespace character is matched by \es, non-whitespace by \eS.
265A numeric character is matched by \ed, non-numeric by \eD.
266You may use \ew, \es and \ed within character classes.
267Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
268Within character classes \eb represents backspace rather than a word boundary.
269Alternatives may be separated by |.
270The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
271matches the digit'th substring, where digit can range from 1 to 9.
272(Outside of the pattern, always use $ instead of \e in front of the digit.
273The scope of $<digit> (and $\`, $& and $\')
274extends to the end of the enclosing BLOCK or eval string, or to
275the next pattern match with subexpressions.
276The \e<digit> notation sometimes works outside the current pattern, but should
277not be relied upon.)
278$+ returns whatever the last bracket match matched.
279$& returns the entire matched string.
ac58e20f 280($0 used to return the same thing, but not any more.)
a687059c 281$\` returns everything before the matched string.
282$\' returns everything after the matched string.
283Examples:
284.nf
285
286 s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/; # swap first two words
287
288.ne 5
289 if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
290 $hours = $1;
291 $minutes = $2;
292 $seconds = $3;
293 }
294
295.fi
ae986130 296By default, the ^ character is only guaranteed to match at the beginning
297of the string,
298the $ character only at the end (or before the newline at the end)
a687059c 299and
300.I perl
301does certain optimizations with the assumption that the string contains
302only one line.
ae986130 303The behavior of ^ and $ on embedded newlines will be inconsistent.
a687059c 304You may, however, wish to treat a string as a multi-line buffer, such that
305the ^ will match after any newline within the string, and $ will match
306before any newline.
307At the cost of a little more overhead, you can do this by setting the variable
308$* to 1.
309Setting it back to 0 makes
310.I perl
311revert to its old behavior.
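.PP
For instance, a sketch with a made-up two-line buffer:
.nf

	$buffer = "From: lwall\enSubject: patch15\en";
	$* = 1;		# treat $buffer as a multi-line buffer
	print "has a subject line\en" if $buffer =~ /^Subject:/;
	$* = 0;		# back to the single-line assumption

.fi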
312.PP
313To facilitate multi-line substitutions, the . character never matches a newline
314(even when $* is 0).
315In particular, the following leaves a newline on the $_ string:
316.nf
317
318 $_ = <STDIN>;
319 s/.*(some_string).*/$1/;
320
321If the newline is unwanted, try one of
322
323 s/.*(some_string).*\en/$1/;
324 s/.*(some_string)[^\e000]*/$1/;
325 s/.*(some_string)(.|\en)*/$1/;
326 chop; s/.*(some_string).*/$1/;
327 /(some_string)/ && ($_ = $1);
328
329.fi
330Any item of a regular expression may be followed with digits in curly brackets
331of the form {n,m}, where n gives the minimum number of times to match the item
332and m gives the maximum.
333The form {n} is equivalent to {n,n} and matches exactly n times.
334The form {n,} matches n or more times.
335(If a curly bracket occurs in any other context, it is treated as a regular
336character.)
337The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
338to {0,1}.
339There is no limit to the size of n or m, but large numbers will chew up
340more memory.
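.PP
For instance (a sketch; the variable names and values are arbitrary):
.nf

	$zip = "97210";
	print "five digits\en" if $zip =~ /^\ed{5}$/;
	$word = "aaab";
	print "two to four a's, then b\en" if $word =~ /^a{2,4}b$/;

.fi
Both patterns match the strings shown.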
341.Sp
342You will note that all backslashed metacharacters in
343.I perl
344are alphanumeric,
345such as \eb, \ew, \en.
346Unlike some other regular expression languages, there are no backslashed
347symbols that aren't alphanumeric.
348So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
349interpreted as a literal character, not a metacharacter.
350This makes it simple to quote a string that you want to use for a pattern
351but that you are afraid might contain metacharacters.
352Simply quote all the non-alphanumeric characters:
353.nf
354
355 $pattern =~ s/(\eW)/\e\e$1/g;
356
357.fi
358.Sh "Formats"
359Output record formats for use with the
360.I write
operator may be declared as follows:
362.nf
363
364.ne 3
365 format NAME =
366 FORMLIST
367 .
368
369.fi
370If name is omitted, format \*(L"STDOUT\*(R" is defined.
371FORMLIST consists of a sequence of lines, each of which may be of one of three
372types:
373.Ip 1. 4
374A comment.
375.Ip 2. 4
376A \*(L"picture\*(R" line giving the format for one output line.
377.Ip 3. 4
378An argument line supplying values to plug into a picture line.
379.PP
380Picture lines are printed exactly as they look, except for certain fields
381that substitute values into the line.
382Each picture field starts with either @ or ^.
383The @ field (not to be confused with the array marker @) is the normal
384case; ^ fields are used
385to do rudimentary multi-line text block filling.
386The length of the field is supplied by padding out the field
387with multiple <, >, or | characters to specify, respectively, left justification,
388right justification, or centering.
389If any of the values supplied for these fields contains a newline, only
390the text up to the newline is printed.
391The special field @* can be used for printing multi-line values.
392It should appear by itself on a line.
393.PP
394The values are specified on the following line, in the same order as
395the picture fields.
396The values should be separated by commas.
397.PP
398Picture fields that begin with ^ rather than @ are treated specially.
399The value supplied must be a scalar variable name which contains a text
400string.
401.I Perl
402puts as much text as it can into the field, and then chops off the front
403of the string so that the next time the variable is referenced,
404more of the text can be printed.
405Normally you would use a sequence of fields in a vertical stack to print
406out a block of text.
407If you like, you can end the final field with .\|.\|., which will appear in the
408output if the text was too long to appear in its entirety.
409You can change which characters are legal to break on by changing the
410variable $: to a list of the desired characters.
411.PP
412Since use of ^ fields can produce variable length records if the text to be
413formatted is short, you can suppress blank lines by putting the tilde (~)
414character anywhere in the line.
415(Normally you should put it in the front if possible, for visibility.)
416The tilde will be translated to a space upon output.
417If you put a second tilde contiguous to the first, the line will be repeated
418until all the fields on the line are exhausted.
419(If you use a field of the @ variety, the expression you supply had better
420not give the same value every time forever!)
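.PP
As a sketch of the double tilde (assuming $description holds a long text
string), the following format prints as many lines as it takes to use up
the text:
.nf

format STDOUT =
~~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$description
\&.

.fi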
421.PP
422Examples:
423.nf
424.lg 0
425.cs R 25
426.ft C
427
428.ne 10
429# a report on the /etc/passwd file
430format top =
431\& Passwd File
432Name Login Office Uid Gid Home
433------------------------------------------------------------------
434\&.
435format STDOUT =
436@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
437$name, $login, $office,$uid,$gid, $home
438\&.
439
440.ne 29
441# a report from a bug report form
442format top =
443\& Bug Reports
444@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
445$system, $%, $date
446------------------------------------------------------------------
447\&.
448format STDOUT =
449Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
450\& $subject
451Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
452\& $index, $description
453Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
454\& $priority, $date, $description
455From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
456\& $from, $description
457Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
458\& $programmer, $description
459\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
460\& $description
461\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
462\& $description
463\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
464\& $description
465\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
466\& $description
467\&~ ^<<<<<<<<<<<<<<<<<<<<<<<...
468\& $description
469\&.
470
471.ft R
472.cs R
473.lg
474.fi
475It is possible to intermix prints with writes on the same output channel,
476but you'll have to handle $\- (lines left on the page) yourself.
477.PP
478If you are printing lots of fields that are usually blank, you should consider
479using the reset operator between records.
480Not only is it more efficient, but it can prevent the bug of adding another
481field and forgetting to zero it.
482.Sh "Interprocess Communication"
483The IPC facilities of perl are built on the Berkeley socket mechanism.
484If you don't have sockets, you can ignore this section.
485The calls have the same names as the corresponding system calls,
486but the arguments tend to differ, for two reasons.
487First, perl file handles work differently than C file descriptors.
488Second, perl already knows the length of its strings, so you don't need
489to pass that information.
490Here is a sample client (untested):
491.nf
492
493 ($them,$port) = @ARGV;
494 $port = 2345 unless $port;
495 $them = 'localhost' unless $them;
496
497 $SIG{'INT'} = 'dokill';
498 sub dokill { kill 9,$child if $child; }
499
500 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
501
502 $sockaddr = 'S n a4 x8';
503 chop($hostname = `hostname`);
504
505 ($name, $aliases, $proto) = getprotobyname('tcp');
	($name, $aliases, $port) = getservbyname($port, 'tcp')
		unless $port =~ /^\ed+$/;
ae986130 508.ie t \{\
a687059c 509 ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
ae986130 510'br\}
511.el \{\
512 ($name, $aliases, $type, $len, $thisaddr) =
513 gethostbyname($hostname);
514'br\}
a687059c 515 ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);
516
517 $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
518 $that = pack($sockaddr, &AF_INET, $port, $thataddr);
519
520 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
521 bind(S, $this) || die "bind: $!";
522 connect(S, $that) || die "connect: $!";
523
524 select(S); $| = 1; select(stdout);
525
526 if ($child = fork) {
527 while (<>) {
528 print S;
529 }
530 sleep 3;
531 do dokill();
532 }
533 else {
534 while (<S>) {
535 print;
536 }
537 }
538
539.fi
540And here's a server:
541.nf
542
543 ($port) = @ARGV;
544 $port = 2345 unless $port;
545
546 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
547
548 $sockaddr = 'S n a4 x8';
549
550 ($name, $aliases, $proto) = getprotobyname('tcp');
	($name, $aliases, $port) = getservbyname($port, 'tcp')
		unless $port =~ /^\ed+$/;
553
554 $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");
555
556 select(NS); $| = 1; select(stdout);
557
558 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
559 bind(S, $this) || die "bind: $!";
	listen(S, 5) || die "listen: $!";
561
562 select(S); $| = 1; select(stdout);
563
564 for (;;) {
565 print "Listening again\en";
566 ($addr = accept(NS,S)) || die $!;
567 print "accept ok\en";
568
ae986130 569 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
a687059c 570 @inetaddr = unpack('C4',$inetaddr);
571 print "$af $port @inetaddr\en";
572
573 while (<NS>) {
574 print;
575 print NS;
576 }
577 }
578
579.fi
580.Sh "Predefined Names"
581The following names have special meaning to
582.IR perl .
583I could have used alphabetic symbols for some of these, but I didn't want
584to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
585out.
586You'll just have to suffer along with these silly symbols.
587Most of them have reasonable mnemonics, or analogues in one of the shells.
588.Ip $_ 8
589The default input and pattern-searching space.
590The following pairs are equivalent:
591.nf
592
593.ne 2
594 while (<>) {\|.\|.\|. # only equivalent in while!
595 while ($_ = <>) {\|.\|.\|.
596
597.ne 2
598 /\|^Subject:/
599 $_ \|=~ \|/\|^Subject:/
600
601.ne 2
602 y/a\-z/A\-Z/
603 $_ =~ y/a\-z/A\-Z/
604
605.ne 2
606 chop
607 chop($_)
608
609.fi
610(Mnemonic: underline is understood in certain operations.)
611.Ip $. 8
612The current input line number of the last filehandle that was read.
613Readonly.
614Remember that only an explicit close on the filehandle resets the line number.
615Since <> never does an explicit close, line numbers increase across ARGV files
616(but see examples under eof).
617(Mnemonic: many programs use . to mean the current line number.)
618.Ip $/ 8
619The input record separator, newline by default.
620Works like
621.IR awk 's
622RS variable, including treating blank lines as delimiters
623if set to the null string.
624If set to a value longer than one character, only the first character is used.
625(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
626.Ip $, 8
627The output field separator for the print operator.
628Ordinarily the print operator simply prints out the comma separated fields
629you specify.
630In order to get behavior more like
631.IR awk ,
632set this variable as you would set
633.IR awk 's
634OFS variable to specify what is printed between fields.
635(Mnemonic: what is printed when there is a , in your print statement.)
636.Ip $"" 8
637This is like $, except that it applies to array values interpolated into
638a double-quoted string (or similar interpreted string).
639Default is a space.
640(Mnemonic: obvious, I think.)
641.Ip $\e 8
642The output record separator for the print operator.
643Ordinarily the print operator simply prints out the comma separated fields
644you specify, with no trailing newline or record separator assumed.
645In order to get behavior more like
646.IR awk ,
647set this variable as you would set
648.IR awk 's
649ORS variable to specify what is printed at the end of the print.
650(Mnemonic: you set $\e instead of adding \en at the end of the print.
651Also, it's just like /, but it's what you get \*(L"back\*(R" from
652.IR perl .)
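.Sp
A sketch combining $, and $\e (the values are arbitrary):
.nf

	$, = ":";
	$\e = "\en";
	print "root", 0, 0;	# prints root:0:0 and a newline

.fi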
653.Ip $# 8
654The output format for printed numbers.
655This variable is a half-hearted attempt to emulate
656.IR awk 's
657OFMT variable.
658There are times, however, when
659.I awk
660and
661.I perl
662have differing notions of what
663is in fact numeric.
664Also, the initial value is %.20g rather than %.6g, so you need to set $#
665explicitly to get
666.IR awk 's
667value.
668(Mnemonic: # is the number sign.)
669.Ip $% 8
670The current page number of the currently selected output channel.
671(Mnemonic: % is page number in nroff.)
672.Ip $= 8
673The current page length (printable lines) of the currently selected output
674channel.
675Default is 60.
676(Mnemonic: = has horizontal lines.)
677.Ip $\- 8
678The number of lines left on the page of the currently selected output channel.
679(Mnemonic: lines_on_page \- lines_printed.)
680.Ip $~ 8
681The name of the current report format for the currently selected output
682channel.
683(Mnemonic: brother to $^.)
684.Ip $^ 8
685The name of the current top-of-page format for the currently selected output
686channel.
687(Mnemonic: points to top of page.)
688.Ip $| 8
689If set to nonzero, forces a flush after every write or print on the currently
690selected output channel.
691Default is 0.
692Note that
693.I STDOUT
694will typically be line buffered if output is to the
695terminal and block buffered otherwise.
696Setting this variable is useful primarily when you are outputting to a pipe,
697such as when you are running a
698.I perl
699script under rsh and want to see the
700output as it's happening.
701(Mnemonic: when you want your pipes to be piping hot.)
702.Ip $$ 8
703The process number of the
704.I perl
705running this script.
706(Mnemonic: same as shells.)
707.Ip $? 8
708The status returned by the last pipe close, backtick (\`\`) command or
709.I system
710operator.
711Note that this is the status word returned by the wait() system
712call, so the exit value of the subprocess is actually ($? >> 8).
713$? & 255 gives which signal, if any, the process died from, and whether
714there was a core dump.
715(Mnemonic: similar to sh and ksh.)
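.Sp
A sketch of picking the status word apart.
It assumes the usual Unix layout, in which the low seven bits hold the
signal number and the next bit flags a core dump; the command run is
arbitrary.
.nf

	system("sh", "\-c", "exit 3");
	$exit = $? >> 8;	# exit value of the subprocess, 3 here
	$sig = $? & 127;	# signal number, 0 if none
	$core = $? & 128;	# nonzero if there was a core dump
	print "exit $exit, signal $sig\en";

.fi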
716.Ip $& 8 4
717The string matched by the last pattern match (not counting any matches hidden
718within a BLOCK or eval enclosed by the current BLOCK).
719(Mnemonic: like & in some editors.)
720.Ip $\` 8 4
721The string preceding whatever was matched by the last pattern match
722(not counting any matches hidden within a BLOCK or eval enclosed by the current
723BLOCK).
724(Mnemonic: \` often precedes a quoted string.)
725.Ip $\' 8 4
726The string following whatever was matched by the last pattern match
727(not counting any matches hidden within a BLOCK or eval enclosed by the current
728BLOCK).
729(Mnemonic: \' often follows a quoted string.)
730Example:
731.nf
732
733.ne 3
734 $_ = \'abcdefghi\';
735 /def/;
736 print "$\`:$&:$\'\en"; # prints abc:def:ghi
737
738.fi
739.Ip $+ 8 4
740The last bracket matched by the last search pattern.
741This is useful if you don't know which of a set of alternative patterns
742matched.
743For example:
744.nf
745
746 /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
747
748.fi
749(Mnemonic: be positive and forward looking.)
750.Ip $* 8 2
751Set to 1 to do multiline matching within a string, 0 to tell
752.I perl
753that it can assume that strings contain a single line, for the purpose
754of optimizing pattern matches.
755Pattern matches on strings containing multiple newlines can produce confusing
756results when $* is 0.
757Default is 0.
758(Mnemonic: * matches multiple things.)
759.Ip $0 8
760Contains the name of the file containing the
761.I perl
762script being executed.
a687059c 763(Mnemonic: same as sh and ksh.)
764.Ip $<digit> 8
765Contains the subpattern from the corresponding set of parentheses in the last
766pattern matched, not counting patterns matched in nested blocks that have
767been exited already.
768(Mnemonic: like \edigit.)
769.Ip $[ 8 2
770The index of the first element in an array, and of the first character in
771a substring.
772Default is 0, but you could set it to 1 to make
773.I perl
774behave more like
775.I awk
776(or Fortran)
777when subscripting and when evaluating the index() and substr() functions.
778(Mnemonic: [ begins subscripts.)
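.Sp
For instance (a sketch; the array and string are made up):
.nf

	@days = ('Mon', 'Tue', 'Wed');
	$[ = 1;
	print $days[1], "\en";	# prints Mon
	print substr("abcdef", 1, 3), "\en";	# prints abc
	$[ = 0;	# restore the default

.fi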
779.Ip $] 8 2
780The string printed out when you say \*(L"perl -v\*(R".
781It can be used to determine at the beginning of a script whether the perl
782interpreter executing the script is in the right range of versions.
783Example:
784.nf
785
786.ne 5
787 # see if getc is available
788 ($version,$patchlevel) =
789 $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
790 print STDERR "(No filename completion available.)\en"
791 if $version * 1000 + $patchlevel < 2016;
792
793.fi
794(Mnemonic: Is this version of perl in the right bracket?)
795.Ip $; 8 2
796The subscript separator for multi-dimensional array emulation.
797If you refer to an associative array element as
798.nf
799 $foo{$a,$b,$c}
800
801it really means
802
803 $foo{join($;, $a, $b, $c)}
804
805But don't put
806
807 @foo{$a,$b,$c} # a slice--note the @
808
809which means
810
811 ($foo{$a},$foo{$b},$foo{$c})
812
813.fi
814Default is "\e034", the same as SUBSEP in
815.IR awk .
816Note that if your keys contain binary data there might not be any safe
817value for $;.
818(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
819Yeah, I know, it's pretty lame, but $, is already taken for something more
820important.)
821.Ip $! 8 2
822If used in a numeric context, yields the current value of errno, with all the
823usual caveats.
ffed7fef 824(This means that you shouldn't depend on the value of $! to be anything
825in particular unless you've gotten a specific error return indicating a
826system error.)
a687059c 827If used in a string context, yields the corresponding system error string.
828You can assign to $! in order to set errno
829if, for instance, you want $! to return the string for error n, or you want
830to set the exit value for the die operator.
831(Mnemonic: What just went bang?)
832.Ip $@ 8 2
ffed7fef 833The perl syntax error message from the last eval command.
834If null, the last eval parsed and executed correctly (although the operations
835you invoked may have failed in the normal fashion).
a687059c 836(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
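.Sp
A sketch of checking $@ after an eval (the string being evaluated contains
a deliberate syntax error):
.nf

	eval '$answer = 1 +;';
	print "eval failed: $@" if $@;

.fi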
837.Ip $< 8 2
838The real uid of this process.
839(Mnemonic: it's the uid you came FROM, if you're running setuid.)
840.Ip $> 8 2
841The effective uid of this process.
842Example:
843.nf
844
845.ne 2
846 $< = $>; # set real uid to the effective uid
847 ($<,$>) = ($>,$<); # swap real and effective uid
848
849.fi
850(Mnemonic: it's the uid you went TO, if you're running setuid.)
851Note: $< and $> can only be swapped on machines supporting setreuid().
852.Ip $( 8 2
853The real gid of this process.
854If you are on a machine that supports membership in multiple groups
855simultaneously, gives a space separated list of groups you are in.
856The first number is the one returned by getgid(), and the subsequent ones
857by getgroups(), one of which may be the same as the first number.
858(Mnemonic: parentheses are used to GROUP things.
859The real gid is the group you LEFT, if you're running setgid.)
860.Ip $) 8 2
861The effective gid of this process.
862If you are on a machine that supports membership in multiple groups
863simultaneously, gives a space separated list of groups you are in.
864The first number is the one returned by getegid(), and the subsequent ones
865by getgroups(), one of which may be the same as the first number.
866(Mnemonic: parentheses are used to GROUP things.
867The effective gid is the group that's RIGHT for you, if you're running setgid.)
868.Sp
869Note: $<, $>, $( and $) can only be set on machines that support the
870corresponding set[re][ug]id() routine.
871$( and $) can only be swapped on machines supporting setregid().
872.Ip $: 8 2
873The current set of characters after which a string may be broken to
874fill continuation fields (starting with ^) in a format.
875Default is "\ \en-", to break on whitespace or hyphens.
876(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
877.Ip @ARGV 8 3
878The array ARGV contains the command line arguments intended for the script.
Note that $#ARGV is generally the number of arguments minus one, since
880$ARGV[0] is the first argument, NOT the command name.
881See $0 for the command name.
882.Ip @INC 8 3
883The array INC contains the list of places to look for
884.I perl
885scripts to be
886evaluated by the \*(L"do EXPR\*(R" command.
887It initially consists of the arguments to any
888.B \-I
889command line switches, followed
890by the default
891.I perl
892library, probably \*(L"/usr/local/lib/perl\*(R".
893.Ip $ENV{expr} 8 2
894The associative array ENV contains your current environment.
895Setting a value in ENV changes the environment for child processes.
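.Sp
For instance (a sketch; the path is arbitrary):
.nf

	$ENV{'PATH'} = '/bin:/usr/bin';
	print `echo \e$PATH`;	# the child shell sees the new PATH

.fi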
896.Ip $SIG{expr} 8 2
897The associative array SIG is used to set signal handlers for various signals.
898Example:
899.nf
900
901.ne 12
902 sub handler { # 1st argument is signal name
903 local($sig) = @_;
904 print "Caught a SIG$sig\-\|\-shutting down\en";
905 close(LOG);
906 exit(0);
907 }
908
909 $SIG{\'INT\'} = \'handler\';
910 $SIG{\'QUIT\'} = \'handler\';
911 .\|.\|.
912 $SIG{\'INT\'} = \'DEFAULT\'; # restore default action
913 $SIG{\'QUIT\'} = \'IGNORE\'; # ignore SIGQUIT
914
915.fi
916The SIG array only contains values for the signals actually set within
917the perl script.
918.Sh "Packages"
919Perl provides a mechanism for alternate namespaces to protect packages from
stomping on each other's variables.
921By default, a perl script starts compiling into the package known as \*(L"main\*(R".
922By use of the
923.I package
924declaration, you can switch namespaces.
925The scope of the package declaration is from the declaration itself to the end
926of the enclosing block (the same scope as the local() operator).
927Typically it would be the first declaration in a file to be included by
928the \*(L"do FILE\*(R" operator.
929You can switch into a package in more than one place; it merely influences
930which symbol table is used by the compiler for the rest of that block.
663a0e37 931You can refer to variables and filehandles in other packages by prefixing
932the identifier with the package name and a single quote.
If the package name is null, the \*(L"main\*(R" package is assumed.
663a0e37 934.PP
Only identifiers starting with letters are stored in the package's symbol
936table.
937All other symbols are kept in package \*(L"main\*(R".
938In addition, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC
939and SIG are forced to be in package \*(L"main\*(R", even when used for
940other purposes than their built-in one.
941Note also that, if you have a package called \*(L"m\*(R", \*(L"s\*(R"
or \*(L"y\*(R", then you can't use the qualified form of an identifier since it
943will be interpreted instead as a pattern match, a substitution
944or a translation.
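.PP
A minimal sketch of the scoping and qualification rules above.
The package name counter, the variable count and the subroutine bump are
hypothetical names chosen for the example.

```perl
$count = 10;                # compiled in package main: this is $main'count

{
    package counter;        # lasts until the closing brace of this block
    $count = 1;             # a different variable: $counter'count
    sub bump { $count++; }  # the subroutine is counter'bump
}

&counter'bump();            # qualified call from package main
$count++;                   # back in main, so this touches $main'count

print 'main: ', $main'count, ' counter: ', $counter'count, "\n";
```

Each package keeps its own $count, so neither increment disturbs the other.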
945.PP
Eval'ed strings are compiled in the package in which the eval was compiled.
948(Assignments to $SIG{}, however, assume the signal handler specified is in the
949main package.
950Qualify the signal handler name if you wish to have a signal handler in
951a package.)
952For an example, examine perldb.pl in the perl library.
953It initially switches to the DB package so that the debugger doesn't interfere
954with variables in the script you are trying to debug.
955At various points, however, it temporarily switches back to the main package
956to evaluate various expressions in the context of the main package.
957.PP
958The symbol table for a package happens to be stored in the associative array
959of that name prepended with an underscore.
960The value in each entry of the associative array is
961what you are referring to when you use the *name notation.
962In fact, the following have the same effect (in package main, anyway),
963though the first is more
964efficient because it does the symbol table lookups at compile time:
965.nf
966
967.ne 2
968 local(*foo) = *bar;
969 local($_main{'foo'}) = $_main{'bar'};
970
971.fi
972You can use this to print out all the variables in a package, for instance.
973Here is dumpvar.pl from the perl library:
974.nf
975.ne 11
    package dumpvar;

    sub main'dumpvar {
    \&    ($package) = @_;
    \&    local(*stab) = eval("*_$package");
    \&    while (($key,$val) = each(%stab)) {
    \&        {
    \&            local(*entry) = $val;
    \&            if (defined $entry) {
    \&                print "\e$$key = '$entry'\en";
    \&            }
.ne 7
    \&            if (defined @entry) {
    \&                print "\e@$key = (\en";
    \&                foreach $num ($[ .. $#entry) {
    \&                    print "  $num\et'",$entry[$num],"'\en";
    \&                }
    \&                print ")\en";
    \&            }
.ne 10
    \&            if ($key ne "_$package" && defined %entry) {
    \&                print "\e%$key = (\en";
    \&                foreach $key (sort keys(%entry)) {
    \&                    print "  $key\et'",$entry{$key},"'\en";
    \&                }
    \&                print ")\en";
    \&            }
    \&        }
    \&    }
    }
1006
1007.fi
1008Note that, even though the subroutine is compiled in package dumpvar, the
663a0e37 1009name of the subroutine is qualified so that its name is inserted into package
a687059c 1010\*(L"main\*(R".
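.PP
The *name notation can also be used to hand an array to a subroutine by
name, as in the sketch below.
The subroutine double_all and the array nums are invented for illustration.

```perl
sub double_all {
    local(*list) = @_;      # @list now names the caller's array
    foreach $item (@list) {
        $item *= 2;         # changes the element in place
    }
}

@nums = (1, 2, 3);
&double_all(*nums);         # pass the symbol table entry, not a copy
print join(' ', @nums), "\n";
```

Because @list is an alias rather than a copy, the caller's array is modified.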
1011.Sh "Style"
Each programmer will, of course, have his or her own preferences with regard
to formatting, but there are some general guidelines that will make your
1014programs easier to read.
1015.Ip 1. 4 4
1016Just because you CAN do something a particular way doesn't mean that
1017you SHOULD do it that way.
1018.I Perl
1019is designed to give you several ways to do anything, so consider picking
1020the most readable one.
1021For instance
1022
1023 open(FOO,$foo) || die "Can't open $foo: $!";
1024
1025is better than
1026
1027 die "Can't open $foo: $!" unless open(FOO,$foo);
1028
1029because the second way hides the main point of the statement in a
1030modifier.
1031On the other hand
1032
1033 print "Starting analysis\en" if $verbose;
1034
1035is better than
1036
1037 $verbose && print "Starting analysis\en";
1038
1039since the main point isn't whether the user typed -v or not.
1040.Sp
1041Similarly, just because an operator lets you assume default arguments
1042doesn't mean that you have to make use of the defaults.
1043The defaults are there for lazy systems programmers writing one-shot
1044programs.
1045If you want your program to be readable, consider supplying the argument.
03a14243 1046.Sp
1047Along the same lines, just because you
1048.I can
1049omit parentheses in many places doesn't mean that you ought to:
1050.nf
1051
1052 return print reverse sort num values array;
1053 return print(reverse(sort num (values(%array))));
1054
1055.fi
1056When in doubt, parenthesize.
1057At the very least it will let some poor schmuck bounce on the % key in vi.
a687059c 1058.Ip 2. 4 4
1059Don't go through silly contortions to exit a loop at the top or the
1060bottom, when
1061.I perl
1062provides the "last" operator so you can exit in the middle.
1063Just outdent it a little to make it more visible:
1064.nf
1065
1066.ne 7
    line:
        for (;;) {
            statements;
        last line if $foo;
            next line if /^#/;
            statements;
        }
1074
1075.fi
1076.Ip 3. 4 4
1077Don't be afraid to use loop labels\*(--they're there to enhance readability as
1078well as to allow multi-level loop breaks.
1079See last example.
ffed7fef 1080.Ip 4. 4 4
a687059c 1081For portability, when using features that may not be implemented on every
1082machine, test the construct in an eval to see if it fails.
If you know in which version or patchlevel a particular feature was implemented,
1084you can test $] to see if it will be there.
a687059c 1085.Ip 5. 4 4
ffed7fef 1086Choose mnemonic identifiers.
1087.Ip 6. 4 4
a687059c 1088Be consistent.
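.PP
Guideline 4 above might look like the following in practice.
The choice of symlink as the feature under test, and the variable names,
are assumptions made for the example.

```perl
# Probe for a feature by trying it inside an eval.  If symlink() is not
# implemented on this machine the eval fails, leaving the reason in $@.
$has_symlink = eval 'symlink("", ""); 1';

if ($has_symlink) {
    $method = 'symlink';
}
else {
    $method = 'link';       # fall back to hard links
    print "symlink unavailable: $@";
}
print "using $method\n";
```

The probe only asks whether the call can be made at all, so the empty
arguments are harmless.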
1089.Sh "Debugging"
1090If you invoke
1091.I perl
1092with a
1093.B \-d
1094switch, your script will be run under a debugging monitor.
1095It will halt before the first executable statement and ask you for a
1096command, such as:
1097.Ip "h" 12 4
1098Prints out a help message.
1099.Ip "s" 12 4
1100Single step.
1101Executes until it reaches the beginning of another statement.
1102.Ip "c" 12 4
1103Continue.
1104Executes until the next breakpoint is reached.
1105.Ip "<CR>" 12 4
1106Repeat last s or c.
1107.Ip "n" 12 4
1108Single step around subroutine call.
1109.Ip "l min+incr" 12 4
1110List incr+1 lines starting at min.
1111If min is omitted, starts where last listing left off.
1112If incr is omitted, previous value of incr is used.
1113.Ip "l min-max" 12 4
1114List lines in the indicated range.
1115.Ip "l line" 12 4
1116List just the indicated line.
1117.Ip "l" 12 4
1118List incr+1 more lines after last printed line.
1119.Ip "l subname" 12 4
1120List subroutine.
1121If it's a long subroutine it just lists the beginning.
1122Use \*(L"l\*(R" to list more.
1123.Ip "L" 12 4
1124List lines that have breakpoints or actions.
1125.Ip "t" 12 4
1126Toggle trace mode on or off.
1127.Ip "b line" 12 4
1128Set a breakpoint.
If line is omitted, sets a breakpoint on the
line that is about to be executed.
1131Breakpoints may only be set on lines that begin an executable statement.
1132.Ip "b subname" 12 4
1133Set breakpoint at first executable line of subroutine.
1134.Ip "S" 12 4
1135Lists the names of all subroutines.
1136.Ip "d line" 12 4
1137Delete breakpoint.
If line is omitted, deletes the breakpoint on the
line that is about to be executed.
1140.Ip "D" 12 4
1141Delete all breakpoints.
1142.Ip "A" 12 4
1143Delete all line actions.
1144.Ip "V package" 12 4
1145List all variables in package.
1146Default is main package.
1147.Ip "a line command" 12 4
1148Set an action for line.
1149A multi-line command may be entered by backslashing the newlines.
1150.Ip "< command" 12 4
1151Set an action to happen before every debugger prompt.
1152A multi-line command may be entered by backslashing the newlines.
1153.Ip "> command" 12 4
1154Set an action to happen after the prompt when you've just given a command
1155to return to executing the script.
1156A multi-line command may be entered by backslashing the newlines.
1157.Ip "! number" 12 4
1158Redo a debugging command.
1159If number is omitted, redoes the previous command.
1160.Ip "! -number" 12 4
1161Redo the command that was that many commands ago.
1162.Ip "H -number" 12 4
1163Display last n commands.
1164Only commands longer than one character are listed.
1165If number is omitted, lists them all.
1166.Ip "q or ^D" 12 4
1167Quit.
1168.Ip "command" 12 4
1169Execute command as a perl statement.
1170A missing semicolon will be supplied.
1171.Ip "p expr" 12 4
1172Same as \*(L"print DB'OUT expr\*(R".
1173The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
1174may be redirected to.
1175.PP
1176If you want to modify the debugger, copy perldb.pl from the perl library
1177to your current directory and modify it as necessary.
1178You can do some customization by setting up a .perldb file which contains
1179initialization code.
1180For instance, you could make aliases like these:
1181.nf
1182
ac58e20f 1183 $DB'alias{'len'} = 's/^len(.*)/p length($1)/';
1184 $DB'alias{'stop'} = 's/^stop (at|in)/b/';
1185 $DB'alias{'.'} =
1186 's/^\e./p "\e$DB\e'sub(\e$DB\e'line):\et",\e$DB\e'line[\e$DB\e'line]/';
a687059c 1187
1188.fi
1189.Sh "Setuid Scripts"
1190.I Perl
1191is designed to make it easy to write secure setuid and setgid scripts.
1192Unlike shells, which are based on multiple substitution passes on each line
1193of the script,
1194.I perl
1195uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
1196Additionally, since the language has more built-in functionality, it
1197has to rely less upon external (and possibly untrustworthy) programs to
1198accomplish its purposes.
1199.PP
1200In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
1201insecure, but this kernel feature can be disabled.
1202If it is,
1203.I perl
1204can emulate the setuid and setgid mechanism when it notices the otherwise
1205useless setuid/gid bits on perl scripts.
1206If the kernel feature isn't disabled,
1207.I perl
1208will complain loudly that your setuid script is insecure.
1209You'll need to either disable the kernel setuid script feature, or put
1210a C wrapper around the script.
1211.PP
1212When perl is executing a setuid script, it takes special precautions to
1213prevent you from falling into any obvious traps.
1214(In some ways, a perl script is more secure than the corresponding
1215C program.)
1216Any command line argument, environment variable, or input is marked as
1217\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
1218command that invokes a subshell, or in any command that modifies files,
1219directories or processes.
1220Any variable that is set within an expression that has previously referenced
1221a tainted value also becomes tainted (even if it is logically impossible
1222for the tainted value to influence the variable).
1223For example:
1224.nf
1225
1226.ne 5
    $foo = shift;                   # $foo is tainted
    $bar = $foo,\'bar\';              # $bar is also tainted
    $xxx = <>;                      # Tainted
    $path = $ENV{\'PATH\'};           # Tainted, but see below
    $abc = \'abc\';                   # Not tainted

.ne 4
    system "echo $foo";             # Insecure
    system "/bin/echo", $foo;       # Secure (doesn't use sh)
    system "echo $bar";             # Insecure
    system "echo $abc";             # Insecure until PATH set

.ne 5
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

    $path = $ENV{\'PATH\'};           # Not tainted
    system "echo $abc";             # Is secure now!

.ne 5
    open(FOO,"$foo");               # OK
    open(FOO,">$foo");              # Not OK

    open(FOO,"echo $foo|");         # Not OK, but...
    open(FOO,"-|") || exec \'echo\', $foo;    # OK

    $zzz = `echo $foo`;             # Insecure, zzz tainted

    unlink $abc,$foo;               # Insecure
    umask $foo;                     # Insecure

.ne 3
    exec "echo $foo";               # Insecure
    exec "echo", $foo;              # Secure (doesn't use sh)
    exec "sh", \'-c\', $foo;          # Considered secure, alas
1262
1263.fi
1264The taintedness is associated with each scalar value, so some elements
1265of an array can be tainted, and others not.
1266.PP
1267If you try to do something insecure, you will get a fatal error saying
1268something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
1269Note that you can still write an insecure system call or exec,
ae986130 1270but only by explicitly doing something like the last example above.
a687059c 1271You can also bypass the tainting mechanism by referencing
1272subpatterns\*(--\c
1273.I perl
1274presumes that if you reference a substring using $1, $2, etc, you knew
1275what you were doing when you wrote the pattern:
1276.nf
1277
1278 $ARGV[0] =~ /^\-P(\ew+)$/;
1279 $printer = $1; # Not tainted
1280
1281.fi
1282This is fairly secure since \ew+ doesn't match shell metacharacters.
1283Use of .+ would have been insecure, but
1284.I perl
1285doesn't check for that, so you must be careful with your patterns.
1286This is the ONLY mechanism for untainting user supplied filenames if you
1287want to do file operations on them (unless you make $> equal to $<).
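.PP
The pattern discipline described above can be wrapped in a small subroutine,
sketched here.
The name printer_arg is hypothetical, and the sketch shows only the
extraction step; it does not by itself demonstrate tainting, which applies
when running setuid.

```perl
# Return the printer name from a -Pname argument, or the null string if
# the argument contains anything other than word characters.
sub printer_arg {
    local($arg) = @_;
    if ($arg =~ /^-P(\w+)$/) {
        return $1;          # only the matched substring is passed on
    }
    return '';
}

$printer = &printer_arg('-Plaser');
$bogus   = &printer_arg('-Plp;rm');     # rejected: ; is not matched by \w
print "printer is $printer\n";
```

Returning the null string on a failed match makes it hard to use a rejected
argument by accident.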
1288.PP
1289It's also possible to get into trouble with other operations that don't care
1290whether they use tainted values.
1291Make judicious use of the file tests in dealing with any user-supplied
1292filenames.
1293When possible, do opens and such after setting $> = $<.
1294.I Perl
1295doesn't prevent you from opening tainted filenames for reading, so be
1296careful what you print out.
1297The tainting mechanism is intended to prevent stupid mistakes, not to remove
1298the need for thought.
1299.SH ENVIRONMENT
1300.I Perl
1301uses PATH in executing subprocesses, and in finding the script if \-S
1302is used.
1303HOME or LOGDIR are used if chdir has no argument.
1304.PP
1305Apart from these,
1306.I perl
1307uses no environment variables, except to make them available
1308to the script being executed, and to child processes.
1309However, scripts running setuid would do well to execute the following lines
1310before doing anything else, just to keep people honest:
1311.nf
1312
1313.ne 3
1314 $ENV{\'PATH\'} = \'/bin:/usr/bin\'; # or whatever you need
1315 $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
1316 $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
1317
1318.fi
1319.SH AUTHOR
1320Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
1321.SH FILES
1322/tmp/perl\-eXXXXXX temporary file for
1323.B \-e
1324commands.
1325.SH SEE ALSO
1326a2p awk to perl translator
1327.br
1328s2p sed to perl translator
1329.SH DIAGNOSTICS
1330Compilation errors will tell you the line number of the error, with an
1331indication of the next token or token type that was to be examined.
1332(In the case of a script passed to
1333.I perl
1334via
1335.B \-e
1336switches, each
1337.B \-e
1338is counted as one line.)
1339.PP
1340Setuid scripts have additional constraints that can produce error messages
1341such as \*(L"Insecure dependency\*(R".
1342See the section on setuid scripts.
1343.SH TRAPS
1344Accustomed
1345.IR awk
1346users should take special note of the following:
1347.Ip * 4 2
1348Semicolons are required after all simple statements in
1349.IR perl .
1350Newline
1351is not a statement delimiter.
1352.Ip * 4 2
1353Curly brackets are required on ifs and whiles.
1354.Ip * 4 2
1355Variables begin with $ or @ in
1356.IR perl .
1357.Ip * 4 2
1358Arrays index from 0 unless you set $[.
1359Likewise string positions in substr() and index().
1360.Ip * 4 2
1361You have to decide whether your array has numeric or string indices.
1362.Ip * 4 2
1363Associative array values do not spring into existence upon mere reference.
1364.Ip * 4 2
1365You have to decide whether you want to use string or numeric comparisons.
1366.Ip * 4 2
1367Reading an input line does not split it for you. You get to split it yourself
1368to an array.
1369And the
1370.I split
1371operator has different arguments.
1372.Ip * 4 2
1373The current input line is normally in $_, not $0.
1374It generally does not have the newline stripped.
ac58e20f 1375($0 is the name of the program executed.)
a687059c 1376.Ip * 4 2
1377$<digit> does not refer to fields\*(--it refers to substrings matched by the last
1378match pattern.
1379.Ip * 4 2
1380The
1381.I print
1382statement does not add field and record separators unless you set
1383$, and $\e.
1384.Ip * 4 2
1385You must open your files before you print to them.
1386.Ip * 4 2
1387The range operator is \*(L".\|.\*(R", not comma.
1388(The comma operator works as in C.)
1389.Ip * 4 2
1390The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
1391(\*(L"~\*(R" is the one's complement operator, as in C.)
1392.Ip * 4 2
1393The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
1394(\*(L"^\*(R" is the XOR operator, as in C.)
1395.Ip * 4 2
1396The concatenation operator is \*(L".\*(R", not the null string.
1397(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
1398since the third slash would be interpreted as a division operator\*(--the
1399tokener is in fact slightly context sensitive for operators like /, ?, and <.
1400And in fact, . itself can be the beginning of a number.)
1401.Ip * 4 2
1402.IR Next ,
1403.I exit
1404and
1405.I continue
1406work differently.
1407.Ip * 4 2
The following variables work differently:
1409.nf
1410
    Awk     \h'|2.5i'Perl
    ARGC    \h'|2.5i'$#ARGV
    ARGV[0] \h'|2.5i'$0
    FILENAME\h'|2.5i'$ARGV
    FNR     \h'|2.5i'$. \- something
    FS      \h'|2.5i'(whatever you like)
    NF      \h'|2.5i'$#Fld, or some such
    NR      \h'|2.5i'$.
    OFMT    \h'|2.5i'$#
    OFS     \h'|2.5i'$,
    ORS     \h'|2.5i'$\e
    RLENGTH \h'|2.5i'length($&)
    RS      \h'|2.5i'$/
    RSTART  \h'|2.5i'length($\`)
    SUBSEP  \h'|2.5i'$;
1426
1427.fi
1428.Ip * 4 2
1429When in doubt, run the
1430.I awk
1431construct through a2p and see what it gives you.
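.PP
Several of the awk traps above show up together in even a tiny script.
The following sketch uses a made-up input line in place of real input, and
the array name Fld simply follows the table above.

```perl
$_ = "alpha beta gamma\n";      # stand-in for a line read from input
chop;                           # the newline is not stripped for you
@Fld = split(' ');              # whitespace split, as awk would do
$nf = $#Fld + 1;                # awk's NF, while $[ is 0
$second = $Fld[1];              # awk's $2

# Comparisons must be chosen explicitly.
$same_number = ('10' == '10.0') ? 1 : 0;    # numeric comparison
$same_string = ('10' eq '10.0') ? 1 : 0;    # string comparison

print "$nf fields, second is $second\n";
```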
1432.PP
1433Cerebral C programmers should take note of the following:
1434.Ip * 4 2
1435Curly brackets are required on ifs and whiles.
1436.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
1438.Ip * 4 2
1439.I Break
1440and
1441.I continue
1442become
1443.I last
1444and
1445.IR next ,
1446respectively.
1447.Ip * 4 2
1448There's no switch statement.
1449.Ip * 4 2
1450Variables begin with $ or @ in
1451.IR perl .
1452.Ip * 4 2
1453Printf does not implement *.
1454.Ip * 4 2
1455Comments begin with #, not /*.
1456.Ip * 4 2
1457You can't take the address of anything.
1458.Ip * 4 2
1459ARGV must be capitalized.
1460.Ip * 4 2
1461The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
1462.Ip * 4 2
1463Signal handlers deal with signal names, not numbers.
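.PP
A short sketch of how a few of the C traps above look in practice.
The subroutine classify and the numbers used are invented for the example.

```perl
sub classify {
    local($n) = @_;
    if ($n < 0) {           # curly brackets are required, even here
        return 'negative';
    }
    elsif ($n == 0) {       # elsif, not else if
        return 'zero';
    }
    else {
        return 'positive';
    }
}

$sum = 0;
foreach $n (1, 2, 3, 4, 5, 6) {
    next if $n % 2;         # C's continue: skip odd numbers
    last if $n > 4;         # C's break: stop at the first even number over 4
    $sum += $n;
}

print &classify($sum), " sum of $sum\n";
```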
a687059c 1464.PP
1465Seasoned
1466.I sed
1467programmers should take note of the following:
1468.Ip * 4 2
1469Backreferences in substitutions use $ rather than \e.
1470.Ip * 4 2
1471The pattern matching metacharacters (, ), and | do not have backslashes in front.
1472.Ip * 4 2
1473The range operator is .\|. rather than comma.
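.PP
The first two sed traps above can be seen in this sketch, which uses
made-up strings.

```perl
$_ = 'hello world';
s/(\w+) (\w+)/$2 $1/;       # sed would write \( \) in the pattern and \2 \1
$swapped = $_;

$pet = 'dog';
$known = ($pet =~ /^(cat|dog)$/) ? 1 : 0;   # no backslashes on ( ) or |

print "$swapped\n";
```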
1474.PP
1475Sharp shell programmers should take note of the following:
1476.Ip * 4 2
1477The backtick operator does variable interpretation without regard to the
1478presence of single quotes in the command.
1479.Ip * 4 2
1480The backtick operator does no translation of the return value, unlike csh.
1481.Ip * 4 2
1482Shells (especially csh) do several levels of substitution on each command line.
1483.I Perl
1484does substitution only in certain constructs such as double quotes,
1485backticks, angle brackets and search patterns.
1486.Ip * 4 2
1487Shells interpret scripts a little bit at a time.
1488.I Perl
1489compiles the whole program before executing it.
1490.Ip * 4 2
1491The arguments are available via @ARGV, not $1, $2, etc.
1492.Ip * 4 2
1493The environment is not automatically made available as variables.
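.PP
The backtick and environment traps above are sketched below.
The variable names are hypothetical, and the example assumes a Unix-style sh
with an echo command.

```perl
$word = 'perl';
$out = `echo '$word'`;      # $word is interpolated despite the single quotes
$raw = $out;                # the trailing newline is still there
chop($out);

$home = $ENV{'HOME'};       # the environment lives in ENV, not in $HOME

print "got $out\n";
```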
1494.SH BUGS
1495.PP
1496.I Perl
1497is at the mercy of your machine's definitions of various operations
1498such as type casting, atof() and sprintf().
1499.PP
If your stdio requires a seek or eof between reads and writes on a particular
1501stream, so does
1502.IR perl .
1503.PP
1504While none of the built-in data types have any arbitrary size limits (apart
1505from memory size), there are still a few arbitrary limits:
1506a given identifier may not be longer than 255 characters;
1507sprintf is limited on many machines to 128 characters per field (unless the format
1508specifier is exactly %s);
1509and no component of your PATH may be longer than 255 if you use \-S.
1510.PP
1511.I Perl
1512actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
1513anyone I said that.
1514.rn }` ''