perl 3.0 patch #14 patch #13, continued
a687059c 1''' Beginning of part 4
79a0689e 2''' $Header: perl.man.4,v 3.0.1.6 90/03/12 16:54:04 lwall Locked $
a687059c 3'''
4''' $Log: perl.man.4,v $
79a0689e 5''' Revision 3.0.1.6 90/03/12 16:54:04 lwall
6''' patch13: improved documentation of *name
7'''
ac58e20f 8''' Revision 3.0.1.5 90/02/28 18:01:52 lwall
9''' patch9: $0 is now always the command name
10'''
663a0e37 11''' Revision 3.0.1.4 89/12/21 20:12:39 lwall
12''' patch7: documented that package'filehandle works as well as $package'variable
13''' patch7: documented which identifiers are always in package main
14'''
ffed7fef 15''' Revision 3.0.1.3 89/11/17 15:32:25 lwall
16''' patch5: fixed some manual typos and indent problems
17''' patch5: clarified difference between $! and $@
18'''
ae986130 19''' Revision 3.0.1.2 89/11/11 04:46:40 lwall
20''' patch2: made some line breaks depend on troff vs. nroff
21''' patch2: clarified operation of ^ and $ when $* is false
22'''
03a14243 23''' Revision 3.0.1.1 89/10/26 23:18:43 lwall
24''' patch1: documented the desirability of unnecessary parentheses
25'''
a687059c 26''' Revision 3.0 89/10/18 15:21:55 lwall
27''' 3.0 baseline
28'''
29.Sh "Precedence"
30.I Perl
31operators have the following associativity and precedence:
32.nf
33
34nonassoc\h'|1i'print printf exec system sort reverse
35\h'1.5i'chmod chown kill unlink utime die return
36left\h'|1i',
37right\h'|1i'= += \-= *= etc.
38right\h'|1i'?:
39nonassoc\h'|1i'.\|.
40left\h'|1i'||
41left\h'|1i'&&
42left\h'|1i'| ^
43left\h'|1i'&
44nonassoc\h'|1i'== != eq ne
45nonassoc\h'|1i'< > <= >= lt gt le ge
46nonassoc\h'|1i'chdir exit eval reset sleep rand umask
47nonassoc\h'|1i'\-r \-w \-x etc.
48left\h'|1i'<< >>
49left\h'|1i'+ \- .
50left\h'|1i'* / % x
51left\h'|1i'=~ !~
52right\h'|1i'! ~ and unary minus
53right\h'|1i'**
54nonassoc\h'|1i'++ \-\|\-
55left\h'|1i'\*(L'(\*(R'
56
57.fi
58As mentioned earlier, if any list operator (print, etc.) or
59any unary operator (chdir, etc.)
60is followed by a left parenthesis as the next token on the same line,
61the operator and arguments within parentheses are taken to
62be of highest precedence, just like a normal function call.
63Examples:
64.nf
65
ffed7fef 66 chdir $foo || die;\h'|3i'# (chdir $foo) || die
67 chdir($foo) || die;\h'|3i'# (chdir $foo) || die
68 chdir ($foo) || die;\h'|3i'# (chdir $foo) || die
69 chdir +($foo) || die;\h'|3i'# (chdir $foo) || die
a687059c 70
71but, because * is higher precedence than ||:
72
ffed7fef 73 chdir $foo * 20;\h'|3i'# chdir ($foo * 20)
74 chdir($foo) * 20;\h'|3i'# (chdir $foo) * 20
75 chdir ($foo) * 20;\h'|3i'# (chdir $foo) * 20
76 chdir +($foo) * 20;\h'|3i'# chdir ($foo * 20)
a687059c 77
ffed7fef 78 rand 10 * 20;\h'|3i'# rand (10 * 20)
79 rand(10) * 20;\h'|3i'# (rand 10) * 20
80 rand (10) * 20;\h'|3i'# (rand 10) * 20
81 rand +(10) * 20;\h'|3i'# rand (10 * 20)
a687059c 82
83.fi
84In the absence of parentheses,
85the precedence of list operators such as print, sort or chmod is
86either very high or very low depending on whether you look at the left
side of the operator or the right side of it.
88For example, in
89.nf
90
91 @ary = (1, 3, sort 4, 2);
92 print @ary; # prints 1324
93
94.fi
95the commas on the right of the sort are evaluated before the sort, but
96the commas on the left are evaluated after.
97In other words, list operators tend to gobble up all the arguments that
98follow them, and then act like a simple term with regard to the preceding
99expression.
100Note that you have to be careful with parens:
101.nf
102
103.ne 3
104 # These evaluate exit before doing the print:
105 print($foo, exit); # Obviously not what you want.
106 print $foo, exit; # Nor is this.
107
108.ne 4
109 # These do the print before evaluating exit:
110 (print $foo), exit; # This is what you want.
111 print($foo), exit; # Or this.
112 print ($foo), exit; # Or even this.
113
114Also note that
115
116 print ($foo & 255) + 1, "\en";
117
118.fi
119probably doesn't do what you expect at first glance.
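.PP
The parsing rules in this section can be checked with a short script.
This is only a sketch: sqrt stands in for rand so that the results are
repeatable (an assumption of this example, not something the manual uses),
and the variable names are illustrative.

```perl
# A list operator swallows everything to its right, then behaves
# like a single term for whatever stands to its left.
@ary = (1, 3, sort 4, 2);
$joined = join('', @ary);        # the same elements "print @ary" would emit

# A named unary operator binds looser than * but tighter than ||.
$a1 = sqrt 4 * 4;                # parsed as sqrt(4 * 4)
$a2 = sqrt(4) * 4;               # parsed as (sqrt 4) * 4
$a3 = sqrt (4) * 4;              # a space before the paren changes nothing
$a4 = sqrt +(4) * 4;             # the + hides the paren: sqrt(4 * 4)

print "$joined $a1 $a2 $a3 $a4\n";
```

The two forms with a parenthesis directly after the operator name give the
same answer, and the two forms without one give the other answer.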
120.Sh "Subroutines"
121A subroutine may be declared as follows:
122.nf
123
124 sub NAME BLOCK
125
126.fi
127.PP
128Any arguments passed to the routine come in as array @_,
129that is ($_[0], $_[1], .\|.\|.).
130The array @_ is a local array, but its values are references to the
131actual scalar parameters.
132The return value of the subroutine is the value of the last expression
133evaluated, and can be either an array value or a scalar value.
Alternatively, a return statement may be used to specify the returned value and
135exit the subroutine.
136To create local variables see the
137.I local
138operator.
139.PP
140A subroutine is called using the
141.I do
142operator or the & operator.
143.nf
144
145.ne 12
146Example:
147
148 sub MAX {
149 local($max) = pop(@_);
150 foreach $foo (@_) {
151 $max = $foo \|if \|$max < $foo;
152 }
153 $max;
154 }
155
156 .\|.\|.
157 $bestday = &MAX($mon,$tue,$wed,$thu,$fri);
158
159.ne 21
160Example:
161
162 # get a line, combining continuation lines
163 # that start with whitespace
164 sub get_line {
165 $thisline = $lookahead;
166 line: while ($lookahead = <STDIN>) {
167 if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
168 $thisline \|.= \|$lookahead;
169 }
170 else {
171 last line;
172 }
173 }
174 $thisline;
175 }
176
177 $lookahead = <STDIN>; # get first line
178 while ($_ = do get_line(\|)) {
179 .\|.\|.
180 }
181
182.fi
183.nf
184.ne 6
185Use array assignment to a local list to name your formal arguments:
186
187 sub maybeset {
188 local($key, $value) = @_;
189 $foo{$key} = $value unless $foo{$key};
190 }
191
192.fi
193This also has the effect of turning call-by-reference into call-by-value,
194since the assignment copies the values.
195.Sp
196Subroutines may be called recursively.
197If a subroutine is called using the & form, the argument list is optional.
198If omitted, no @_ array is set up for the subroutine; the @_ array at the
time of the call is visible to the subroutine instead.
200.nf
201
202 do foo(1,2,3); # pass three arguments
203 &foo(1,2,3); # the same
204
205 do foo(); # pass a null list
206 &foo(); # the same
207 &foo; # pass no arguments--more efficient
208
209.fi
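.PP
The following sketch pulls the points of this section together: modifying
a caller's scalar through @_, copying @_ to get call-by-value, and calling
with & and no argument list so that the caller's @_ stays visible.
The subroutine and variable names are made up for the illustration.

```perl
# @_ holds references to the caller's scalars, so changing
# $_[0] changes the caller's variable.
sub bump {
    $_[0]++;
}

# Copying @_ into local() variables gives call-by-value.
sub bumped {
    local($n) = @_;
    $n++;
    $n;                          # last expression is the return value
}

# Called as "&inner;" with no argument list, inner sees outer's @_.
sub inner {
    $#_ + 1;                     # number of elements in @_
}
sub outer {
    &inner;
}

$count = 5;
&bump($count);                   # $count is now 6
$copy = &bumped($count);         # returns 7, $count stays 6
$seen = &outer(1, 2, 3);         # inner counts outer's three arguments

print "$count $copy $seen\n";
```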
210.Sh "Passing By Reference"
211Sometimes you don't want to pass the value of an array to a subroutine but
212rather the name of it, so that the subroutine can modify the global copy
213of it rather than working with a local copy.
214In perl you can refer to all the objects of a particular name by prefixing
215the name with a star: *foo.
216When evaluated, it produces a scalar value that represents all the objects
79a0689e 217of that name, including any filehandle, format or subroutine.
a687059c 218When assigned to within a local() operation, it causes the name mentioned
219to refer to whatever * value was assigned to it.
220Example:
221.nf
222
223 sub doubleary {
224 local(*someary) = @_;
225 foreach $elem (@someary) {
226 $elem *= 2;
227 }
228 }
229 do doubleary(*foo);
230 do doubleary(*bar);
231
232.fi
233Assignment to *name is currently recommended only inside a local().
234You can actually assign to *name anywhere, but the previous referent of
235*name may be stranded forever.
236This may or may not bother you.
237.Sp
238Note that scalars are already passed by reference, so you can modify scalar
ae986130 239arguments without using this mechanism by referring explicitly to the $_[nnn]
a687059c 240in question.
241You can modify all the elements of an array by passing all the elements
242as scalars, but you have to use the * mechanism to push, pop or change the
243size of an array.
244The * mechanism will probably be more efficient in any case.
245.Sp
246Since a *name value contains unprintable binary data, if it is used as
247an argument in a print, or as a %s argument in a printf or sprintf, it
248then has the value '*name', just so it prints out pretty.
79a0689e 249.Sp
250Even if you don't want to modify an array, this mechanism is useful for
251passing multiple arrays in a single LIST, since normally the LIST mechanism
252will merge all the array values so that you can't extract out the
253individual arrays.
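.PP
As a sketch of both uses described above, the hypothetical subroutine
below receives two arrays as *name values, which keeps them separate in
the argument list, and changes the size of the first one with push.

```perl
# Append the second array to the first. Both arrive as *name values,
# so push() acts on the caller's array, not on a copy.
sub append_to {
    local(*dest, *src) = @_;
    push(@dest, @src);
}

@foo = (1, 2);
@bar = (3, 4, 5);
&append_to(*foo, *bar);

print join(',', @foo), "\n";
```

Passing @foo and @bar directly would have merged them into one five-element
list, with no way to tell where the first array ended.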
a687059c 254.Sh "Regular Expressions"
255The patterns used in pattern matching are regular expressions such as
256those supplied in the Version 8 regexp routines.
257(In fact, the routines are derived from Henry Spencer's freely redistributable
258reimplementation of the V8 routines.)
259In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
260Word boundaries may be matched by \eb, and non-boundaries by \eB.
261A whitespace character is matched by \es, non-whitespace by \eS.
262A numeric character is matched by \ed, non-numeric by \eD.
263You may use \ew, \es and \ed within character classes.
264Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
265Within character classes \eb represents backspace rather than a word boundary.
266Alternatives may be separated by |.
267The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
268matches the digit'th substring, where digit can range from 1 to 9.
269(Outside of the pattern, always use $ instead of \e in front of the digit.
270The scope of $<digit> (and $\`, $& and $\')
271extends to the end of the enclosing BLOCK or eval string, or to
272the next pattern match with subexpressions.
273The \e<digit> notation sometimes works outside the current pattern, but should
274not be relied upon.)
275$+ returns whatever the last bracket match matched.
276$& returns the entire matched string.
ac58e20f 277($0 used to return the same thing, but not any more.)
a687059c 278$\` returns everything before the matched string.
279$\' returns everything after the matched string.
280Examples:
281.nf
282
283 s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/; # swap first two words
284
285.ne 5
286 if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
287 $hours = $1;
288 $minutes = $2;
289 $seconds = $3;
290 }
291
292.fi
ae986130 293By default, the ^ character is only guaranteed to match at the beginning
294of the string,
295the $ character only at the end (or before the newline at the end)
a687059c 296and
297.I perl
298does certain optimizations with the assumption that the string contains
299only one line.
ae986130 300The behavior of ^ and $ on embedded newlines will be inconsistent.
a687059c 301You may, however, wish to treat a string as a multi-line buffer, such that
302the ^ will match after any newline within the string, and $ will match
303before any newline.
304At the cost of a little more overhead, you can do this by setting the variable
305$* to 1.
306Setting it back to 0 makes
307.I perl
308revert to its old behavior.
309.PP
310To facilitate multi-line substitutions, the . character never matches a newline
311(even when $* is 0).
312In particular, the following leaves a newline on the $_ string:
313.nf
314
315 $_ = <STDIN>;
316 s/.*(some_string).*/$1/;
317
318If the newline is unwanted, try one of
319
320 s/.*(some_string).*\en/$1/;
321 s/.*(some_string)[^\e000]*/$1/;
322 s/.*(some_string)(.|\en)*/$1/;
323 chop; s/.*(some_string).*/$1/;
324 /(some_string)/ && ($_ = $1);
325
326.fi
327Any item of a regular expression may be followed with digits in curly brackets
328of the form {n,m}, where n gives the minimum number of times to match the item
329and m gives the maximum.
330The form {n} is equivalent to {n,n} and matches exactly n times.
331The form {n,} matches n or more times.
332(If a curly bracket occurs in any other context, it is treated as a regular
333character.)
334The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
335to {0,1}.
336There is no limit to the size of n or m, but large numbers will chew up
337more memory.
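.PP
A few matches make the curly bracket forms concrete.
The strings and variable names are chosen only for illustration.

```perl
# {n} means exactly n, {n,m} between n and m, {n,} at least n.
$exact   = ('a1b2'   =~ /^[0-9a-f]{4}$/) ? 1 : 0;   # four hex digits
$toomany = ('aaaaa'  =~ /^a{2,3}$/)      ? 1 : 0;   # five a's exceed the maximum
$run     = ('xaaaay' =~ /a{2,}/)         ? $& : ''; # takes the whole run of a's

# The * modifier is shorthand for {0,}.
$same = (('abbb' =~ /^ab{0,}$/) && ('abbb' =~ /^ab*$/)) ? 1 : 0;

print "$exact $toomany $run $same\n";
```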
338.Sp
339You will note that all backslashed metacharacters in
340.I perl
341are alphanumeric,
342such as \eb, \ew, \en.
343Unlike some other regular expression languages, there are no backslashed
344symbols that aren't alphanumeric.
345So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
346interpreted as a literal character, not a metacharacter.
347This makes it simple to quote a string that you want to use for a pattern
348but that you are afraid might contain metacharacters.
349Simply quote all the non-alphanumeric characters:
350.nf
351
352 $pattern =~ s/(\eW)/\e\e$1/g;
353
354.fi
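.PP
The effect of the quoting substitution can be seen by matching with the
pattern before and after it is applied.
This is a small sketch; the sample strings are arbitrary.

```perl
$pattern = 'a.b*c';
($quoted = $pattern) =~ s/(\W)/\\$1/g;           # every non-alphanumeric gets a backslash

$raw_hit    = ('aXbbc' =~ /$pattern/) ? 1 : 0;   # . and * act as metacharacters
$quoted_hit = ('aXbbc' =~ /$quoted/)  ? 1 : 0;   # no longer matches
$literal    = ('a.b*c' =~ /$quoted/)  ? 1 : 0;   # only the literal text matches

print "$quoted $raw_hit $quoted_hit $literal\n";
```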
355.Sh "Formats"
356Output record formats for use with the
357.I write
operator may be declared as follows:
359.nf
360
361.ne 3
362 format NAME =
363 FORMLIST
364 .
365
366.fi
367If name is omitted, format \*(L"STDOUT\*(R" is defined.
368FORMLIST consists of a sequence of lines, each of which may be of one of three
369types:
370.Ip 1. 4
371A comment.
372.Ip 2. 4
373A \*(L"picture\*(R" line giving the format for one output line.
374.Ip 3. 4
375An argument line supplying values to plug into a picture line.
376.PP
377Picture lines are printed exactly as they look, except for certain fields
378that substitute values into the line.
379Each picture field starts with either @ or ^.
380The @ field (not to be confused with the array marker @) is the normal
381case; ^ fields are used
382to do rudimentary multi-line text block filling.
383The length of the field is supplied by padding out the field
384with multiple <, >, or | characters to specify, respectively, left justification,
385right justification, or centering.
386If any of the values supplied for these fields contains a newline, only
387the text up to the newline is printed.
388The special field @* can be used for printing multi-line values.
389It should appear by itself on a line.
390.PP
391The values are specified on the following line, in the same order as
392the picture fields.
393The values should be separated by commas.
394.PP
395Picture fields that begin with ^ rather than @ are treated specially.
396The value supplied must be a scalar variable name which contains a text
397string.
398.I Perl
399puts as much text as it can into the field, and then chops off the front
400of the string so that the next time the variable is referenced,
401more of the text can be printed.
402Normally you would use a sequence of fields in a vertical stack to print
403out a block of text.
404If you like, you can end the final field with .\|.\|., which will appear in the
405output if the text was too long to appear in its entirety.
406You can change which characters are legal to break on by changing the
407variable $: to a list of the desired characters.
408.PP
409Since use of ^ fields can produce variable length records if the text to be
410formatted is short, you can suppress blank lines by putting the tilde (~)
411character anywhere in the line.
412(Normally you should put it in the front if possible, for visibility.)
413The tilde will be translated to a space upon output.
414If you put a second tilde contiguous to the first, the line will be repeated
415until all the fields on the line are exhausted.
416(If you use a field of the @ variety, the expression you supply had better
417not give the same value every time forever!)
418.PP
419Examples:
420.nf
421.lg 0
422.cs R 25
423.ft C
424
425.ne 10
426# a report on the /etc/passwd file
427format top =
428\& Passwd File
429Name Login Office Uid Gid Home
430------------------------------------------------------------------
431\&.
432format STDOUT =
433@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
434$name, $login, $office,$uid,$gid, $home
435\&.
436
437.ne 29
438# a report from a bug report form
439format top =
440\& Bug Reports
441@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
442$system, $%, $date
443------------------------------------------------------------------
444\&.
445format STDOUT =
446Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
447\& $subject
448Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
449\& $index, $description
450Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
451\& $priority, $date, $description
452From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
453\& $from, $description
454Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
455\& $programmer, $description
456\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
457\& $description
458\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
459\& $description
460\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
461\& $description
462\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
463\& $description
464\&~ ^<<<<<<<<<<<<<<<<<<<<<<<...
465\& $description
466\&.
467
468.ft R
469.cs R
470.lg
471.fi
472It is possible to intermix prints with writes on the same output channel,
473but you'll have to handle $\- (lines left on the page) yourself.
474.PP
475If you are printing lots of fields that are usually blank, you should consider
476using the reset operator between records.
477Not only is it more efficient, but it can prevent the bug of adding another
478field and forgetting to zero it.
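.PP
The sketch below shows a ^ field stacked over a repeating ~~ line, so that
a long string is printed as a block and chopped away as it goes.
It writes to a string through an in-memory filehandle so that the result
can be inspected; that is an assumption of this example which needs a
current perl, and any open output filehandle would serve as well.
The names REPORT, $buffer and $text are illustrative.

```perl
# In-memory filehandle: an assumption of this sketch, not part of
# the format mechanism itself.
open(REPORT, '>', \$buffer) || die "open: $!";

format REPORT =
Note: ^<<<<<<<<<
      $text
~~    ^<<<<<<<<<
      $text
.

$text = 'the quick brown fox jumps over the lazy dog';
write(REPORT);                   # uses format REPORT by default
close(REPORT);

print $buffer;
```

Each line takes as many whole words as fit in the ten-character field, and
the ~~ line repeats until $text has been used up.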
479.Sh "Interprocess Communication"
480The IPC facilities of perl are built on the Berkeley socket mechanism.
481If you don't have sockets, you can ignore this section.
482The calls have the same names as the corresponding system calls,
483but the arguments tend to differ, for two reasons.
484First, perl file handles work differently than C file descriptors.
485Second, perl already knows the length of its strings, so you don't need
486to pass that information.
487Here is a sample client (untested):
488.nf
489
490 ($them,$port) = @ARGV;
491 $port = 2345 unless $port;
492 $them = 'localhost' unless $them;
493
494 $SIG{'INT'} = 'dokill';
495 sub dokill { kill 9,$child if $child; }
496
497 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
498
499 $sockaddr = 'S n a4 x8';
500 chop($hostname = `hostname`);
501
502 ($name, $aliases, $proto) = getprotobyname('tcp');
503 ($name, $aliases, $port) = getservbyname($port, 'tcp')
        unless $port =~ /^\ed+$/;
ae986130 505.ie t \{\
a687059c 506 ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
ae986130 507'br\}
508.el \{\
509 ($name, $aliases, $type, $len, $thisaddr) =
510 gethostbyname($hostname);
511'br\}
a687059c 512 ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);
513
514 $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
515 $that = pack($sockaddr, &AF_INET, $port, $thataddr);
516
517 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
518 bind(S, $this) || die "bind: $!";
519 connect(S, $that) || die "connect: $!";
520
521 select(S); $| = 1; select(stdout);
522
523 if ($child = fork) {
524 while (<>) {
525 print S;
526 }
527 sleep 3;
528 do dokill();
529 }
530 else {
531 while (<S>) {
532 print;
533 }
534 }
535
536.fi
537And here's a server:
538.nf
539
540 ($port) = @ARGV;
541 $port = 2345 unless $port;
542
543 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
544
545 $sockaddr = 'S n a4 x8';
546
547 ($name, $aliases, $proto) = getprotobyname('tcp');
548 ($name, $aliases, $port) = getservbyname($port, 'tcp')
        unless $port =~ /^\ed+$/;
550
551 $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");
552
553 select(NS); $| = 1; select(stdout);
554
555 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
556 bind(S, $this) || die "bind: $!";
    listen(S, 5) || die "listen: $!";
558
559 select(S); $| = 1; select(stdout);
560
561 for (;;) {
562 print "Listening again\en";
563 ($addr = accept(NS,S)) || die $!;
564 print "accept ok\en";
565
ae986130 566 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
a687059c 567 @inetaddr = unpack('C4',$inetaddr);
568 print "$af $port @inetaddr\en";
569
570 while (<NS>) {
571 print;
572 print NS;
573 }
574 }
575
576.fi
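.PP
Both scripts build and take apart socket addresses with the template
in $sockaddr.
The round trip can be tried without a network, as in this sketch.
The value 2 for AF_INET is an assumption made here for illustration; the
scripts above take it from sys/socket.h.

```perl
$sockaddr = 'S n a4 x8';         # family, port, 4 address bytes, 8 pad bytes
$af_inet  = 2;                   # assumed value of AF_INET

$packed = pack($sockaddr, $af_inet, 2345, "\177\0\0\1");
($af, $port, $inetaddr) = unpack($sockaddr, $packed);
@inetaddr = unpack('C4', $inetaddr);

print length($packed), ": $af $port @inetaddr\n";
```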
577.Sh "Predefined Names"
578The following names have special meaning to
579.IR perl .
580I could have used alphabetic symbols for some of these, but I didn't want
581to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
582out.
583You'll just have to suffer along with these silly symbols.
584Most of them have reasonable mnemonics, or analogues in one of the shells.
585.Ip $_ 8
586The default input and pattern-searching space.
587The following pairs are equivalent:
588.nf
589
590.ne 2
591 while (<>) {\|.\|.\|. # only equivalent in while!
592 while ($_ = <>) {\|.\|.\|.
593
594.ne 2
595 /\|^Subject:/
596 $_ \|=~ \|/\|^Subject:/
597
598.ne 2
599 y/a\-z/A\-Z/
600 $_ =~ y/a\-z/A\-Z/
601
602.ne 2
603 chop
604 chop($_)
605
606.fi
607(Mnemonic: underline is understood in certain operations.)
608.Ip $. 8
609The current input line number of the last filehandle that was read.
610Readonly.
611Remember that only an explicit close on the filehandle resets the line number.
612Since <> never does an explicit close, line numbers increase across ARGV files
613(but see examples under eof).
614(Mnemonic: many programs use . to mean the current line number.)
615.Ip $/ 8
616The input record separator, newline by default.
617Works like
618.IR awk 's
619RS variable, including treating blank lines as delimiters
620if set to the null string.
621If set to a value longer than one character, only the first character is used.
622(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
623.Ip $, 8
624The output field separator for the print operator.
625Ordinarily the print operator simply prints out the comma separated fields
626you specify.
627In order to get behavior more like
628.IR awk ,
629set this variable as you would set
630.IR awk 's
631OFS variable to specify what is printed between fields.
632(Mnemonic: what is printed when there is a , in your print statement.)
633.Ip $"" 8
634This is like $, except that it applies to array values interpolated into
635a double-quoted string (or similar interpreted string).
636Default is a space.
637(Mnemonic: obvious, I think.)
638.Ip $\e 8
639The output record separator for the print operator.
640Ordinarily the print operator simply prints out the comma separated fields
641you specify, with no trailing newline or record separator assumed.
642In order to get behavior more like
643.IR awk ,
644set this variable as you would set
645.IR awk 's
646ORS variable to specify what is printed at the end of the print.
647(Mnemonic: you set $\e instead of adding \en at the end of the print.
648Also, it's just like /, but it's what you get \*(L"back\*(R" from
649.IR perl .)
650.Ip $# 8
651The output format for printed numbers.
652This variable is a half-hearted attempt to emulate
653.IR awk 's
654OFMT variable.
655There are times, however, when
656.I awk
657and
658.I perl
659have differing notions of what
660is in fact numeric.
661Also, the initial value is %.20g rather than %.6g, so you need to set $#
662explicitly to get
663.IR awk 's
664value.
665(Mnemonic: # is the number sign.)
666.Ip $% 8
667The current page number of the currently selected output channel.
668(Mnemonic: % is page number in nroff.)
669.Ip $= 8
670The current page length (printable lines) of the currently selected output
671channel.
672Default is 60.
673(Mnemonic: = has horizontal lines.)
674.Ip $\- 8
675The number of lines left on the page of the currently selected output channel.
676(Mnemonic: lines_on_page \- lines_printed.)
677.Ip $~ 8
678The name of the current report format for the currently selected output
679channel.
680(Mnemonic: brother to $^.)
681.Ip $^ 8
682The name of the current top-of-page format for the currently selected output
683channel.
684(Mnemonic: points to top of page.)
685.Ip $| 8
686If set to nonzero, forces a flush after every write or print on the currently
687selected output channel.
688Default is 0.
689Note that
690.I STDOUT
691will typically be line buffered if output is to the
692terminal and block buffered otherwise.
693Setting this variable is useful primarily when you are outputting to a pipe,
694such as when you are running a
695.I perl
696script under rsh and want to see the
697output as it's happening.
698(Mnemonic: when you want your pipes to be piping hot.)
699.Ip $$ 8
700The process number of the
701.I perl
702running this script.
703(Mnemonic: same as shells.)
704.Ip $? 8
705The status returned by the last pipe close, backtick (\`\`) command or
706.I system
707operator.
708Note that this is the status word returned by the wait() system
709call, so the exit value of the subprocess is actually ($? >> 8).
710$? & 255 gives which signal, if any, the process died from, and whether
711there was a core dump.
712(Mnemonic: similar to sh and ksh.)
713.Ip $& 8 4
714The string matched by the last pattern match (not counting any matches hidden
715within a BLOCK or eval enclosed by the current BLOCK).
716(Mnemonic: like & in some editors.)
717.Ip $\` 8 4
718The string preceding whatever was matched by the last pattern match
719(not counting any matches hidden within a BLOCK or eval enclosed by the current
720BLOCK).
721(Mnemonic: \` often precedes a quoted string.)
722.Ip $\' 8 4
723The string following whatever was matched by the last pattern match
724(not counting any matches hidden within a BLOCK or eval enclosed by the current
725BLOCK).
726(Mnemonic: \' often follows a quoted string.)
727Example:
728.nf
729
730.ne 3
731 $_ = \'abcdefghi\';
732 /def/;
733 print "$\`:$&:$\'\en"; # prints abc:def:ghi
734
735.fi
736.Ip $+ 8 4
737The last bracket matched by the last search pattern.
738This is useful if you don't know which of a set of alternative patterns
739matched.
740For example:
741.nf
742
743 /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
744
745.fi
746(Mnemonic: be positive and forward looking.)
747.Ip $* 8 2
748Set to 1 to do multiline matching within a string, 0 to tell
749.I perl
750that it can assume that strings contain a single line, for the purpose
751of optimizing pattern matches.
752Pattern matches on strings containing multiple newlines can produce confusing
753results when $* is 0.
754Default is 0.
755(Mnemonic: * matches multiple things.)
756.Ip $0 8
757Contains the name of the file containing the
758.I perl
759script being executed.
a687059c 760(Mnemonic: same as sh and ksh.)
761.Ip $<digit> 8
762Contains the subpattern from the corresponding set of parentheses in the last
763pattern matched, not counting patterns matched in nested blocks that have
764been exited already.
765(Mnemonic: like \edigit.)
766.Ip $[ 8 2
767The index of the first element in an array, and of the first character in
768a substring.
769Default is 0, but you could set it to 1 to make
770.I perl
771behave more like
772.I awk
773(or Fortran)
774when subscripting and when evaluating the index() and substr() functions.
775(Mnemonic: [ begins subscripts.)
776.Ip $] 8 2
777The string printed out when you say \*(L"perl -v\*(R".
778It can be used to determine at the beginning of a script whether the perl
779interpreter executing the script is in the right range of versions.
780Example:
781.nf
782
783.ne 5
784 # see if getc is available
785 ($version,$patchlevel) =
786 $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
787 print STDERR "(No filename completion available.)\en"
788 if $version * 1000 + $patchlevel < 2016;
789
790.fi
791(Mnemonic: Is this version of perl in the right bracket?)
792.Ip $; 8 2
793The subscript separator for multi-dimensional array emulation.
794If you refer to an associative array element as
795.nf
796 $foo{$a,$b,$c}
797
798it really means
799
800 $foo{join($;, $a, $b, $c)}
801
802But don't put
803
804 @foo{$a,$b,$c} # a slice--note the @
805
806which means
807
808 ($foo{$a},$foo{$b},$foo{$c})
809
810.fi
811Default is "\e034", the same as SUBSEP in
812.IR awk .
813Note that if your keys contain binary data there might not be any safe
814value for $;.
815(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
816Yeah, I know, it's pretty lame, but $, is already taken for something more
817important.)
818.Ip $! 8 2
819If used in a numeric context, yields the current value of errno, with all the
820usual caveats.
ffed7fef 821(This means that you shouldn't depend on the value of $! to be anything
822in particular unless you've gotten a specific error return indicating a
823system error.)
a687059c 824If used in a string context, yields the corresponding system error string.
825You can assign to $! in order to set errno
826if, for instance, you want $! to return the string for error n, or you want
827to set the exit value for the die operator.
828(Mnemonic: What just went bang?)
829.Ip $@ 8 2
ffed7fef 830The perl syntax error message from the last eval command.
831If null, the last eval parsed and executed correctly (although the operations
832you invoked may have failed in the normal fashion).
a687059c 833(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
834.Ip $< 8 2
835The real uid of this process.
836(Mnemonic: it's the uid you came FROM, if you're running setuid.)
837.Ip $> 8 2
838The effective uid of this process.
839Example:
840.nf
841
842.ne 2
843 $< = $>; # set real uid to the effective uid
844 ($<,$>) = ($>,$<); # swap real and effective uid
845
846.fi
847(Mnemonic: it's the uid you went TO, if you're running setuid.)
848Note: $< and $> can only be swapped on machines supporting setreuid().
849.Ip $( 8 2
850The real gid of this process.
851If you are on a machine that supports membership in multiple groups
852simultaneously, gives a space separated list of groups you are in.
853The first number is the one returned by getgid(), and the subsequent ones
854by getgroups(), one of which may be the same as the first number.
855(Mnemonic: parentheses are used to GROUP things.
856The real gid is the group you LEFT, if you're running setgid.)
857.Ip $) 8 2
858The effective gid of this process.
859If you are on a machine that supports membership in multiple groups
860simultaneously, gives a space separated list of groups you are in.
861The first number is the one returned by getegid(), and the subsequent ones
862by getgroups(), one of which may be the same as the first number.
863(Mnemonic: parentheses are used to GROUP things.
864The effective gid is the group that's RIGHT for you, if you're running setgid.)
865.Sp
866Note: $<, $>, $( and $) can only be set on machines that support the
867corresponding set[re][ug]id() routine.
868$( and $) can only be swapped on machines supporting setregid().
869.Ip $: 8 2
870The current set of characters after which a string may be broken to
871fill continuation fields (starting with ^) in a format.
872Default is "\ \en-", to break on whitespace or hyphens.
873(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
874.Ip @ARGV 8 3
875The array ARGV contains the command line arguments intended for the script.
Note that $#ARGV is generally the number of arguments minus one, since
877$ARGV[0] is the first argument, NOT the command name.
878See $0 for the command name.
879.Ip @INC 8 3
880The array INC contains the list of places to look for
881.I perl
882scripts to be
883evaluated by the \*(L"do EXPR\*(R" command.
884It initially consists of the arguments to any
885.B \-I
886command line switches, followed
887by the default
888.I perl
889library, probably \*(L"/usr/local/lib/perl\*(R".
890.Ip $ENV{expr} 8 2
891The associative array ENV contains your current environment.
892Setting a value in ENV changes the environment for child processes.
893.Ip $SIG{expr} 8 2
894The associative array SIG is used to set signal handlers for various signals.
895Example:
896.nf
897
898.ne 12
899 sub handler { # 1st argument is signal name
900 local($sig) = @_;
901 print "Caught a SIG$sig\-\|\-shutting down\en";
902 close(LOG);
903 exit(0);
904 }
905
906 $SIG{\'INT\'} = \'handler\';
907 $SIG{\'QUIT\'} = \'handler\';
908 .\|.\|.
909 $SIG{\'INT\'} = \'DEFAULT\'; # restore default action
910 $SIG{\'QUIT\'} = \'IGNORE\'; # ignore SIGQUIT
911
912.fi
913The SIG array only contains values for the signals actually set within
914the perl script.
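.PP
Several of the names above can be exercised in a few lines.
This sketch touches only the list separator, the subscript separator,
the error variable and the eval message; the variable names are
illustrative, and the text of the error message depends on the system.

```perl
# The list separator joins array elements interpolated into a string.
@list = (1, 2, 3);
$" = '-';
$joined = "@list";
$" = ' ';                        # back to the default

# $; joins the subscripts of an emulated multi-dimensional array.
$grid{2, 3} = 'x';
$key  = join($;, 2, 3);
$cell = $grid{$key};             # the same element as $grid{2,3}

# $! is a number in numeric context and a message in string context.
$! = 2;
$errno   = $! + 0;
$message = "$!";

# $@ is null after a good eval and holds the message after a bad one.
$sum     = eval '1 + 2';
$ok_err  = $@;
eval 'if (';
$bad_err = $@;

print "$joined $cell $errno $sum\n";
```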
915.Sh "Packages"
916Perl provides a mechanism for alternate namespaces to protect packages from
stomping on each other's variables.
918By default, a perl script starts compiling into the package known as \*(L"main\*(R".
919By use of the
920.I package
921declaration, you can switch namespaces.
922The scope of the package declaration is from the declaration itself to the end
923of the enclosing block (the same scope as the local() operator).
924Typically it would be the first declaration in a file to be included by
925the \*(L"do FILE\*(R" operator.
926You can switch into a package in more than one place; it merely influences
927which symbol table is used by the compiler for the rest of that block.
663a0e37 928You can refer to variables and filehandles in other packages by prefixing
929the identifier with the package name and a single quote.
a687059c 930If the package name is null, the \*(L"main\*(R" package as assumed.
663a0e37 931.PP
932Only identifiers starting with letters are stored in the packages symbol
933table.
934All other symbols are kept in package \*(L"main\*(R".
935In addition, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC
936and SIG are forced to be in package \*(L"main\*(R", even when used for
937other purposes than their built-in one.
938Note also that, if you have a package called \*(L"m\*(R", \*(L"s\*(R"
939or \*(L"y\*(R", the you can't use the qualified form of an identifier since it
940will be interpreted instead as a pattern match, a substitution
941or a translation.
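.PP
The separation between namespaces can be sketched as follows.
The package name counter and the subroutine bump are hypothetical, and the sketch is
written for a current perl, so it spells the qualifier as ::, the form current perls prefer
over the single quote described above:
.nf

```perl
package counter;            # hypothetical package for this sketch
$total = 0;
sub bump { $total += 1; }   # refers to the $total of package counter

package main;
$total = 100;               # a different variable, the $total of package main
counter::bump();
counter::bump();

print "counter total: $counter::total\n";
print "main total:    $total\n";
```

.fi
The two variables named total never touch each other; only the qualified name reaches across packages.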
.PP
Eval'ed strings are compiled in the package in which the eval was compiled.
(Assignments to $SIG{}, however, assume the signal handler specified is in the
main package.
Qualify the signal handler name if you wish to have a signal handler in
a package.)
For an example, examine perldb.pl in the perl library.
It initially switches to the DB package so that the debugger doesn't interfere
with variables in the script you are trying to debug.
At various points, however, it temporarily switches back to the main package
to evaluate various expressions in the context of the main package.
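.PP
A small sketch of the eval rule, under the assumption of a current perl (hence the :: qualifier);
the package shelf and the subroutine peek are hypothetical names:
.nf

```perl
package shelf;                          # hypothetical package
$item = 'book';
sub peek { return eval('$item'); }      # the eval is compiled here, in package shelf

package main;
$item = 'lamp';
$seen = shelf::peek();                  # called from main, yet the string sees shelf's $item

print "eval saw: $seen\n";
```

.fi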
.PP
The symbol table for a package happens to be stored in the associative array
of that name prepended with an underscore.
The value in each entry of the associative array is
what you are referring to when you use the *name notation.
In fact, the following have the same effect (in package main, anyway),
though the first is more
efficient because it does the symbol table lookups at compile time:
.nf

.ne 2
    local(*foo) = *bar;
    local($_main{'foo'}) = $_main{'bar'};

.fi
You can use this to print out all the variables in a package, for instance.
Here is dumpvar.pl from the perl library:
.nf
.ne 11
    package dumpvar;

    sub main'dumpvar {
    \&    ($package) = @_;
    \&    local(*stab) = eval("*_$package");
    \&    while (($key,$val) = each(%stab)) {
    \&        {
    \&            local(*entry) = $val;
    \&            if (defined $entry) {
    \&                print "\e$$key = '$entry'\en";
    \&            }
.ne 7
    \&            if (defined @entry) {
    \&                print "\e@$key = (\en";
    \&                foreach $num ($[ .. $#entry) {
    \&                    print " $num\et'",$entry[$num],"'\en";
    \&                }
    \&                print ")\en";
    \&            }
.ne 10
    \&            if ($key ne "_$package" && defined %entry) {
    \&                print "\e%$key = (\en";
    \&                foreach $key (sort keys(%entry)) {
    \&                    print " $key\et'",$entry{$key},"'\en";
    \&                }
    \&                print ")\en";
    \&            }
    \&        }
    \&    }
    }

.fi
Note that, even though the subroutine is compiled in package dumpvar, the
name of the subroutine is qualified so that its name is inserted into package
\*(L"main\*(R".
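.PP
The *name aliasing used in this section can also be tried in isolation.
The sketch below assumes a current perl; the names foo, bar and show are placeholders:
.nf

```perl
@bar = (1, 2, 3);
$bar = 'scalar bar';

sub show {
    local(*foo) = *bar;     # inside this sub, foo names everything bar names
    push(@foo, 4);          # so this appends to @bar
    return "$foo: @foo";
}

$result = show();
$size = @bar;               # number of elements now in @bar

print "$result\n";
print "bar has $size elements\n";
```

.fi
Because the alias is made with local(), it lasts only until show returns.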
.Sh "Style"
Each programmer will, of course, have his or her own preferences in regard
to formatting, but there are some general guidelines that will make your
programs easier to read.
.Ip 1. 4 4
Just because you CAN do something a particular way doesn't mean that
you SHOULD do it that way.
.I Perl
is designed to give you several ways to do anything, so consider picking
the most readable one.
For instance

    open(FOO,$foo) || die "Can't open $foo: $!";

is better than

    die "Can't open $foo: $!" unless open(FOO,$foo);

because the second way hides the main point of the statement in a
modifier.
On the other hand

    print "Starting analysis\en" if $verbose;

is better than

    $verbose && print "Starting analysis\en";

since the main point isn't whether the user typed -v or not.
.Sp
Similarly, just because an operator lets you assume default arguments
doesn't mean that you have to make use of the defaults.
The defaults are there for lazy systems programmers writing one-shot
programs.
If you want your program to be readable, consider supplying the argument.
.Sp
Along the same lines, just because you
.I can
omit parentheses in many places doesn't mean that you ought to:
.nf

    return print reverse sort num values array;
    return print(reverse(sort num (values(%array))));

.fi
When in doubt, parenthesize.
At the very least it will let some poor schmuck bounce on the % key in vi.
.Ip 2. 4 4
Don't go through silly contortions to exit a loop at the top or the
bottom, when
.I perl
provides the "last" operator so you can exit in the middle.
Just outdent it a little to make it more visible:
.nf

.ne 7
    line:
      for (;;) {
          statements;
      last line if $foo;
          next line if /^#/;
          statements;
      }

.fi
.Ip 3. 4 4
Don't be afraid to use loop labels\*(--they're there to enhance readability as
well as to allow multi-level loop breaks.
See the previous example.
.Ip 4. 4 4
For portability, when using features that may not be implemented on every
machine, test the construct in an eval to see if it fails.
If you know what version or patchlevel a particular feature was implemented in,
you can test $] to see if it will be there.
.Ip 5. 4 4
Choose mnemonic identifiers.
.Ip 6. 4 4
Be consistent.
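.PP
Guideline 4 can be sketched as below, assuming a current perl.
The choice of symlink() as the feature to probe is only an illustration,
and the variable name is hypothetical:
.nf

```perl
# On a machine without symlink() the call is fatal, and the eval traps that
# error; where it exists, the call merely fails and the eval still yields 1.
$have_symlink = eval('symlink("", ""); 1') ? 1 : 0;

print "perl version string: $]\n";
print "symlink is ", ($have_symlink ? "available" : "not available"), "\n";
```

.fi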
.Sh "Debugging"
If you invoke
.I perl
with a
.B \-d
switch, your script will be run under a debugging monitor.
It will halt before the first executable statement and ask you for a
command, such as:
.Ip "h" 12 4
Prints out a help message.
.Ip "s" 12 4
Single step.
Executes until it reaches the beginning of another statement.
.Ip "c" 12 4
Continue.
Executes until the next breakpoint is reached.
.Ip "<CR>" 12 4
Repeat last s or c.
.Ip "n" 12 4
Single step around subroutine call.
.Ip "l min+incr" 12 4
List incr+1 lines starting at min.
If min is omitted, starts where last listing left off.
If incr is omitted, previous value of incr is used.
.Ip "l min-max" 12 4
List lines in the indicated range.
.Ip "l line" 12 4
List just the indicated line.
.Ip "l" 12 4
List incr+1 more lines after last printed line.
.Ip "l subname" 12 4
List subroutine.
If it's a long subroutine it just lists the beginning.
Use \*(L"l\*(R" to list more.
.Ip "L" 12 4
List lines that have breakpoints or actions.
.Ip "t" 12 4
Toggle trace mode on or off.
.Ip "b line" 12 4
Set a breakpoint.
If line is omitted, sets a breakpoint on the
line that is about to be executed.
Breakpoints may only be set on lines that begin an executable statement.
.Ip "b subname" 12 4
Set breakpoint at first executable line of subroutine.
.Ip "S" 12 4
Lists the names of all subroutines.
.Ip "d line" 12 4
Delete breakpoint.
If line is omitted, deletes the breakpoint on the
line that is about to be executed.
.Ip "D" 12 4
Delete all breakpoints.
.Ip "A" 12 4
Delete all line actions.
.Ip "V package" 12 4
List all variables in package.
Default is main package.
.Ip "a line command" 12 4
Set an action for line.
A multi-line command may be entered by backslashing the newlines.
.Ip "< command" 12 4
Set an action to happen before every debugger prompt.
A multi-line command may be entered by backslashing the newlines.
.Ip "> command" 12 4
Set an action to happen after the prompt when you've just given a command
to return to executing the script.
A multi-line command may be entered by backslashing the newlines.
.Ip "! number" 12 4
Redo a debugging command.
If number is omitted, redoes the previous command.
.Ip "! -number" 12 4
Redo the command that was that many commands ago.
.Ip "H -number" 12 4
Display last number commands.
Only commands longer than one character are listed.
If number is omitted, lists them all.
.Ip "q or ^D" 12 4
Quit.
.Ip "command" 12 4
Execute command as a perl statement.
A missing semicolon will be supplied.
.Ip "p expr" 12 4
Same as \*(L"print DB'OUT expr\*(R".
The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
may be redirected to.
.PP
If you want to modify the debugger, copy perldb.pl from the perl library
to your current directory and modify it as necessary.
You can do some customization by setting up a .perldb file which contains
initialization code.
For instance, you could make aliases like these:
.nf

    $DB'alias{'len'} = 's/^len(.*)/p length($1)/';
    $DB'alias{'stop'} = 's/^stop (at|in)/b/';
    $DB'alias{'.'} =
      's/^\e./p "\e$DB\e'sub(\e$DB\e'line):\et",\e$DB\e'line[\e$DB\e'line]/';

.fi
.Sh "Setuid Scripts"
.I Perl
is designed to make it easy to write secure setuid and setgid scripts.
Unlike shells, which are based on multiple substitution passes on each line
of the script,
.I perl
uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
Additionally, since the language has more built-in functionality, it
has to rely less upon external (and possibly untrustworthy) programs to
accomplish its purposes.
.PP
In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
insecure, but this kernel feature can be disabled.
If it is,
.I perl
can emulate the setuid and setgid mechanism when it notices the otherwise
useless setuid/gid bits on perl scripts.
If the kernel feature isn't disabled,
.I perl
will complain loudly that your setuid script is insecure.
You'll need to either disable the kernel setuid script feature, or put
a C wrapper around the script.
.PP
When perl is executing a setuid script, it takes special precautions to
prevent you from falling into any obvious traps.
(In some ways, a perl script is more secure than the corresponding
C program.)
Any command line argument, environment variable, or input is marked as
\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
command that invokes a subshell, or in any command that modifies files,
directories or processes.
Any variable that is set within an expression that has previously referenced
a tainted value also becomes tainted (even if it is logically impossible
for the tainted value to influence the variable).
For example:
.nf

.ne 5
    $foo = shift;                   # $foo is tainted
    $bar = $foo,\'bar\';            # $bar is also tainted
    $xxx = <>;                      # Tainted
    $path = $ENV{\'PATH\'};         # Tainted, but see below
    $abc = \'abc\';                 # Not tainted

.ne 4
    system "echo $foo";             # Insecure
    system "/bin/echo", $foo;       # Secure (doesn't use sh)
    system "echo $bar";             # Insecure
    system "echo $abc";             # Insecure until PATH set

.ne 5
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

    $path = $ENV{\'PATH\'};         # Not tainted
    system "echo $abc";             # Is secure now!

.ne 5
    open(FOO,"$foo");               # OK
    open(FOO,">$foo");              # Not OK

    open(FOO,"echo $foo|");         # Not OK, but...
    open(FOO,"-|") || exec \'echo\', $foo;    # OK

    $zzz = `echo $foo`;             # Insecure, zzz tainted

    unlink $abc,$foo;               # Insecure
    umask $foo;                     # Insecure

.ne 3
    exec "echo $foo";               # Insecure
    exec "echo", $foo;              # Secure (doesn't use sh)
    exec "sh", \'-c\', $foo;        # Considered secure, alas

.fi
The taintedness is associated with each scalar value, so some elements
of an array can be tainted, and others not.
.PP
If you try to do something insecure, you will get a fatal error saying
something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
Note that you can still write an insecure system call or exec,
but only by explicitly doing something like the last example above.
You can also bypass the tainting mechanism by referencing
subpatterns\*(--\c
.I perl
presumes that if you reference a substring using $1, $2, etc., you knew
what you were doing when you wrote the pattern:
.nf

    $ARGV[0] =~ /^\-P(\ew+)$/;
    $printer = $1;      # Not tainted

.fi
This is fairly secure since \ew+ doesn't match shell metacharacters.
Use of .+ would have been insecure, but
.I perl
doesn't check for that, so you must be careful with your patterns.
This is the ONLY mechanism for untainting user supplied filenames if you
want to do file operations on them (unless you make $> equal to $<).
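.PP
A slightly more defensive variant of the subpattern idiom is sketched here.
It checks that the match succeeded before trusting $1, a precaution the example above leaves out.
The argument value is made up, and running without setuid means nothing is actually tainted in this sketch:
.nf

```perl
$arg = '-Plaser1';                  # hypothetical stand-in for $ARGV[0]

if ($arg =~ /^-P(\w+)$/) {
    $printer = $1;                  # only letters, digits and underscore get through
} else {
    die "unacceptable printer argument\n";
}

print "printer: $printer\n";
```

.fi
Without the check, a failed match would leave $1 holding whatever an earlier successful match had set.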
.PP
It's also possible to get into trouble with other operations that don't care
whether they use tainted values.
Make judicious use of the file tests in dealing with any user-supplied
filenames.
When possible, do opens and such after setting $> = $<.
.I Perl
doesn't prevent you from opening tainted filenames for reading, so be
careful what you print out.
The tainting mechanism is intended to prevent stupid mistakes, not to remove
the need for thought.
.SH ENVIRONMENT
.I Perl
uses PATH in executing subprocesses, and in finding the script if \-S
is used.
HOME or LOGDIR are used if chdir has no argument.
.PP
Apart from these,
.I perl
uses no environment variables, except to make them available
to the script being executed, and to child processes.
However, scripts running setuid would do well to execute the following lines
before doing anything else, just to keep people honest:
.nf

.ne 3
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';    # or whatever you need
    $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

.fi
.SH AUTHOR
Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
.SH FILES
/tmp/perl\-eXXXXXX temporary file for
.B \-e
commands.
.SH SEE ALSO
a2p awk to perl translator
.br
s2p sed to perl translator
.SH DIAGNOSTICS
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
(In the case of a script passed to
.I perl
via
.B \-e
switches, each
.B \-e
is counted as one line.)
.PP
Setuid scripts have additional constraints that can produce error messages
such as \*(L"Insecure dependency\*(R".
See the section on setuid scripts.
.SH TRAPS
Accustomed
.IR awk
users should take special note of the following:
.Ip * 4 2
Semicolons are required after all simple statements in
.IR perl .
Newline
is not a statement delimiter.
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Arrays index from 0 unless you set $[.
Likewise string positions in substr() and index().
.Ip * 4 2
You have to decide whether your array has numeric or string indices.
.Ip * 4 2
Associative array values do not spring into existence upon mere reference.
.Ip * 4 2
You have to decide whether you want to use string or numeric comparisons.
.Ip * 4 2
Reading an input line does not split it for you. You get to split it yourself
to an array.
And the
.I split
operator has different arguments.
.Ip * 4 2
The current input line is normally in $_, not $0.
It generally does not have the newline stripped.
($0 is the name of the program executed.)
.Ip * 4 2
$<digit> does not refer to fields\*(--it refers to substrings matched by the last
match pattern.
.Ip * 4 2
The
.I print
statement does not add field and record separators unless you set
$, and $\e.
.Ip * 4 2
You must open your files before you print to them.
.Ip * 4 2
The range operator is \*(L".\|.\*(R", not comma.
(The comma operator works as in C.)
.Ip * 4 2
The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
(\*(L"~\*(R" is the one's complement operator, as in C.)
.Ip * 4 2
The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
(\*(L"^\*(R" is the XOR operator, as in C.)
.Ip * 4 2
The concatenation operator is \*(L".\*(R", not the null string.
(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
since the third slash would be interpreted as a division operator\*(--the
tokener is in fact slightly context sensitive for operators like /, ?, and <.
And in fact, . itself can be the beginning of a number.)
.Ip * 4 2
.IR Next ,
.I exit
and
.I continue
work differently.
.Ip * 4 2
The following variables work differently:
.nf

    Awk \h'|2.5i'Perl
    ARGC \h'|2.5i'$#ARGV
    ARGV[0] \h'|2.5i'$0
    FILENAME\h'|2.5i'$ARGV
    FNR \h'|2.5i'$. \- something
    FS \h'|2.5i'(whatever you like)
    NF \h'|2.5i'$#Fld, or some such
    NR \h'|2.5i'$.
    OFMT \h'|2.5i'$#
    OFS \h'|2.5i'$,
    ORS \h'|2.5i'$\e
    RLENGTH \h'|2.5i'length($&)
    RS \h'|2.5i'$/
    RSTART \h'|2.5i'length($\`)
    SUBSEP \h'|2.5i'$;

.fi
.Ip * 4 2
When in doubt, run the
.I awk
construct through a2p and see what it gives you.
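.PP
Several of the awk differences listed above show up together in the following sketch, written for a current perl.
The input line is made up and stands in for one read from a file;
the array name Fld is borrowed from the variable table above:
.nf

```perl
$line = "alice 30 admin\n";     # hypothetical input line, newline still attached
chop($line);                    # perl does not strip the newline for you

@Fld = split(' ', $line);       # awk would have split the line already
$nf = $#Fld + 1;                # field count, with arrays indexed from 0

$, = '-';                       # plays the part of awk's OFS
$\ = "\n";                      # plays the part of awk's ORS
print $Fld[0], $Fld[2], $nf;    # separators appear only because they were set
```

.fi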
.PP
Cerebral C programmers should take note of the following:
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
.Ip * 4 2
.I Break
and
.I continue
become
.I last
and
.IR next ,
respectively.
.Ip * 4 2
There's no switch statement.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Printf does not implement *.
.Ip * 4 2
Comments begin with #, not /*.
.Ip * 4 2
You can't take the address of anything.
.Ip * 4 2
ARGV must be capitalized.
.Ip * 4 2
The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
.Ip * 4 2
Signal handlers deal with signal names, not numbers.
.Ip * 4 2
You can't subscript array values, only arrays (no $x = (1,2,3)[2];).
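.PP
The control flow points in the C list above can be sketched like this, assuming a current perl.
The numbers and the scoring are arbitrary, chosen only to exercise each branch:
.nf

```perl
@nums = (1, 2, 3, 4, 5, 6, 7, 8);
$sum = 0;

foreach $n (@nums) {
    next if $n % 2;             # where C would say continue
    last if $n > 6;             # where C would say break
    if ($n == 2) {              # curly brackets are required even here
        $sum += 10;
    } elsif ($n == 4) {         # elsif, not else if
        $sum += 20;
    } else {
        $sum += $n;
    }
}

print "sum: $sum\n";
```

.fi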
.PP
Seasoned
.I sed
programmers should take note of the following:
.Ip * 4 2
Backreferences in substitutions use $ rather than \e.
.Ip * 4 2
The pattern matching metacharacters (, ), and | do not have backslashes in front.
.Ip * 4 2
The range operator is .\|. rather than comma.
.PP
Sharp shell programmers should take note of the following:
.Ip * 4 2
The backtick operator does variable interpretation without regard to the
presence of single quotes in the command.
.Ip * 4 2
The backtick operator does no translation of the return value, unlike csh.
.Ip * 4 2
Shells (especially csh) do several levels of substitution on each command line.
.I Perl
does substitution only in certain constructs such as double quotes,
backticks, angle brackets and search patterns.
.Ip * 4 2
Shells interpret scripts a little bit at a time.
.I Perl
compiles the whole program before executing it.
.Ip * 4 2
The arguments are available via @ARGV, not $1, $2, etc.
.Ip * 4 2
The environment is not automatically made available as variables.
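.PP
The two backtick points in the shell list can be observed with a sketch like the following.
It assumes a current perl on a Unix-like system where sh and echo are available;
the variable names are placeholders:
.nf

```perl
$word = 'perl';

# perl interpolates $word before the shell ever sees the single quotes.
$quoted = `echo '$word'`;

# The output comes back untranslated, with every newline still in place.
$out = `echo one; echo two`;
@lines = split(/\n/, $out);
$count = @lines;

print "quoted gave: $quoted";
print "line count:  $count\n";
```

.fi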
.SH BUGS
.PP
.I Perl
is at the mercy of your machine's definitions of various operations
such as type casting, atof() and sprintf().
.PP
If your stdio requires a seek or eof between reads and writes on a particular
stream, so does
.IR perl .
.PP
While none of the built-in data types have any arbitrary size limits (apart
from memory size), there are still a few arbitrary limits:
a given identifier may not be longer than 255 characters;
sprintf is limited on many machines to 128 characters per field (unless the format
specifier is exactly %s);
and no component of your PATH may be longer than 255 if you use \-S.
.PP
.I Perl
actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
anyone I said that.
.rn }` ''
1511actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
1512anyone I said that.
1513.rn }` ''