perl 3.0 patch #13 (combined patch)
a687059c 1''' Beginning of part 4
ac58e20f 2''' $Header: perl.man.4,v 3.0.1.5 90/02/28 18:01:52 lwall Locked $
a687059c 3'''
4''' $Log: perl.man.4,v $
ac58e20f 5''' Revision 3.0.1.5 90/02/28 18:01:52 lwall
6''' patch9: $0 is now always the command name
7'''
663a0e37 8''' Revision 3.0.1.4 89/12/21 20:12:39 lwall
9''' patch7: documented that package'filehandle works as well as $package'variable
10''' patch7: documented which identifiers are always in package main
11'''
ffed7fef 12''' Revision 3.0.1.3 89/11/17 15:32:25 lwall
13''' patch5: fixed some manual typos and indent problems
14''' patch5: clarified difference between $! and $@
15'''
ae986130 16''' Revision 3.0.1.2 89/11/11 04:46:40 lwall
17''' patch2: made some line breaks depend on troff vs. nroff
18''' patch2: clarified operation of ^ and $ when $* is false
19'''
03a14243 20''' Revision 3.0.1.1 89/10/26 23:18:43 lwall
21''' patch1: documented the desirability of unnecessary parentheses
22'''
a687059c 23''' Revision 3.0 89/10/18 15:21:55 lwall
24''' 3.0 baseline
25'''
26.Sh "Precedence"
27.I Perl
28operators have the following associativity and precedence:
29.nf
30
31nonassoc\h'|1i'print printf exec system sort reverse
32\h'1.5i'chmod chown kill unlink utime die return
33left\h'|1i',
34right\h'|1i'= += \-= *= etc.
35right\h'|1i'?:
36nonassoc\h'|1i'.\|.
37left\h'|1i'||
38left\h'|1i'&&
39left\h'|1i'| ^
40left\h'|1i'&
41nonassoc\h'|1i'== != eq ne
42nonassoc\h'|1i'< > <= >= lt gt le ge
43nonassoc\h'|1i'chdir exit eval reset sleep rand umask
44nonassoc\h'|1i'\-r \-w \-x etc.
45left\h'|1i'<< >>
46left\h'|1i'+ \- .
47left\h'|1i'* / % x
48left\h'|1i'=~ !~
49right\h'|1i'! ~ and unary minus
50right\h'|1i'**
51nonassoc\h'|1i'++ \-\|\-
52left\h'|1i'\*(L'(\*(R'
53
54.fi
55As mentioned earlier, if any list operator (print, etc.) or
56any unary operator (chdir, etc.)
57is followed by a left parenthesis as the next token on the same line,
58the operator and arguments within parentheses are taken to
59be of highest precedence, just like a normal function call.
60Examples:
61.nf
62
ffed7fef 63 chdir $foo || die;\h'|3i'# (chdir $foo) || die
64 chdir($foo) || die;\h'|3i'# (chdir $foo) || die
65 chdir ($foo) || die;\h'|3i'# (chdir $foo) || die
66 chdir +($foo) || die;\h'|3i'# (chdir $foo) || die
a687059c 67
68but, because * is higher precedence than ||:
69
ffed7fef 70 chdir $foo * 20;\h'|3i'# chdir ($foo * 20)
71 chdir($foo) * 20;\h'|3i'# (chdir $foo) * 20
72 chdir ($foo) * 20;\h'|3i'# (chdir $foo) * 20
73 chdir +($foo) * 20;\h'|3i'# chdir ($foo * 20)
a687059c 74
ffed7fef 75 rand 10 * 20;\h'|3i'# rand (10 * 20)
76 rand(10) * 20;\h'|3i'# (rand 10) * 20
77 rand (10) * 20;\h'|3i'# (rand 10) * 20
78 rand +(10) * 20;\h'|3i'# rand (10 * 20)
a687059c 79
80.fi
81In the absence of parentheses,
82the precedence of list operators such as print, sort or chmod is
83either very high or very low depending on whether you look at the left
84side of the operator or the right side of it.
85For example, in
86.nf
87
88 @ary = (1, 3, sort 4, 2);
89 print @ary; # prints 1324
90
91.fi
92the commas on the right of the sort are evaluated before the sort, but
93the commas on the left are evaluated after.
94In other words, list operators tend to gobble up all the arguments that
95follow them, and then act like a simple term with regard to the preceding
96expression.
97Note that you have to be careful with parens:
98.nf
99
100.ne 3
101 # These evaluate exit before doing the print:
102 print($foo, exit); # Obviously not what you want.
103 print $foo, exit; # Nor is this.
104
105.ne 4
106 # These do the print before evaluating exit:
107 (print $foo), exit; # This is what you want.
108 print($foo), exit; # Or this.
109 print ($foo), exit; # Or even this.
110
111Also note that
112
113 print ($foo & 255) + 1, "\en";
114
115.fi
116probably doesn't do what you expect at first glance.
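The parsing rules above can be checked with a short sketch. It sticks to constructs described in this manual and is also expected to run under later perls; the variable names $joined and $masked are made up for illustration.

```perl
# List operators gobble everything to their right:
@ary = (1, 3, sort 4, 2);        # same as (1, 3, sort(4, 2))
$joined = join('', @ary);        # "1324"

# print ($foo & 255) + 1, "\n" is parsed as (print($foo & 255)) + 1, "\n",
# so only the masked value would be printed.  Parenthesize the whole
# argument list to get the intended result:
$foo = 258;
$masked = ($foo & 255) + 1;      # 3
print(($foo & 255) + 1, "\n");   # prints 3
```

When in doubt, the extra pair of parentheses costs nothing and removes the ambiguity.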
117.Sh "Subroutines"
118A subroutine may be declared as follows:
119.nf
120
121 sub NAME BLOCK
122
123.fi
124.PP
125Any arguments passed to the routine come in as array @_,
126that is ($_[0], $_[1], .\|.\|.).
127The array @_ is a local array, but its values are references to the
128actual scalar parameters.
129The return value of the subroutine is the value of the last expression
130evaluated, and can be either an array value or a scalar value.
131Alternatively, a return statement may be used to specify the returned value and
132exit the subroutine.
133To create local variables see the
134.I local
135operator.
136.PP
137A subroutine is called using the
138.I do
139operator or the & operator.
140.nf
141
142.ne 12
143Example:
144
145 sub MAX {
146 local($max) = pop(@_);
147 foreach $foo (@_) {
148 $max = $foo \|if \|$max < $foo;
149 }
150 $max;
151 }
152
153 .\|.\|.
154 $bestday = &MAX($mon,$tue,$wed,$thu,$fri);
155
156.ne 21
157Example:
158
159 # get a line, combining continuation lines
160 # that start with whitespace
161 sub get_line {
162 $thisline = $lookahead;
163 line: while ($lookahead = <STDIN>) {
164 if ($lookahead \|=~ \|/\|^[ \^\e\|t]\|/\|) {
165 $thisline \|.= \|$lookahead;
166 }
167 else {
168 last line;
169 }
170 }
171 $thisline;
172 }
173
174 $lookahead = <STDIN>; # get first line
175 while ($_ = do get_line(\|)) {
176 .\|.\|.
177 }
178
179.fi
180.nf
181.ne 6
182Use array assignment to a local list to name your formal arguments:
183
184 sub maybeset {
185 local($key, $value) = @_;
186 $foo{$key} = $value unless $foo{$key};
187 }
188
189.fi
190This also has the effect of turning call-by-reference into call-by-value,
191since the assignment copies the values.
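A minimal sketch of the difference, assuming two hypothetical subroutines named bump and bump_copy: the first changes its caller's variable through $_[0], the second only changes the copy made by the local() assignment.

```perl
# Modifies the caller's variable through $_[0] (call-by-reference).
sub bump { $_[0] += 1; }

# Works on a copy made by the local() assignment (call-by-value).
sub bump_copy {
    local($n) = @_;
    $n += 1;
    $n;
}

$count = 5;
&bump($count);                   # $count is now 6
$result = &bump_copy($count);    # $result is 7, $count is still 6
```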
192.Sp
193Subroutines may be called recursively.
194If a subroutine is called using the & form, the argument list is optional.
195If omitted, no @_ array is set up for the subroutine; the @_ array at the
196time of the call is visible to the subroutine instead.
197.nf
198
199 do foo(1,2,3); # pass three arguments
200 &foo(1,2,3); # the same
201
202 do foo(); # pass a null list
203 &foo(); # the same
204 &foo; # pass no arguments--more efficient
205
206.fi
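The difference between &foo; and &foo() can be seen in a small sketch. The subroutine names here (count_args, pass_along, pass_none) are hypothetical, and the argument count is computed as $#_ + 1.

```perl
sub count_args { $#_ + 1; }          # number of arguments this call can see

sub pass_along { &count_args; }      # no list: count_args sees pass_along's @_
sub pass_none  { &count_args(); }    # null list: count_args gets an empty @_

$shared = &pass_along(1, 2, 3);      # 3
$empty  = &pass_none(1, 2, 3);       # 0
```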
207.Sh "Passing By Reference"
208Sometimes you don't want to pass the value of an array to a subroutine but
209rather the name of it, so that the subroutine can modify the global copy
210of it rather than working with a local copy.
211In perl you can refer to all the objects of a particular name by prefixing
212the name with a star: *foo.
213When evaluated, it produces a scalar value that represents all the objects
214of that name.
215When assigned to within a local() operation, it causes the name mentioned
216to refer to whatever * value was assigned to it.
217Example:
218.nf
219
220 sub doubleary {
221 local(*someary) = @_;
222 foreach $elem (@someary) {
223 $elem *= 2;
224 }
225 }
226 do doubleary(*foo);
227 do doubleary(*bar);
228
229.fi
230Assignment to *name is currently recommended only inside a local().
231You can actually assign to *name anywhere, but the previous referent of
232*name may be stranded forever.
233This may or may not bother you.
234.Sp
235Note that scalars are already passed by reference, so you can modify scalar
ae986130 236arguments without using this mechanism by referring explicitly to the $_[nnn]
a687059c 237in question.
238You can modify all the elements of an array by passing all the elements
239as scalars, but you have to use the * mechanism to push, pop or change the
240size of an array.
241The * mechanism will probably be more efficient in any case.
242.Sp
243Since a *name value contains unprintable binary data, if it is used as
244an argument in a print, or as a %s argument in a printf or sprintf, it
245then has the value '*name', just so it prints out pretty.
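As a sketch of why the * mechanism is needed to change an array's size, the hypothetical subroutine push_zero below appends an element to whichever array name it is handed. The names push_zero, list and queue are illustrative only.

```perl
sub push_zero {
    local(*list) = @_;     # @list now refers to the caller's array
    push(@list, 0);        # changes the array's size, which $_[n] cannot do
}

@queue = (1, 2);
&push_zero(*queue);
$size = $#queue + 1;       # 3
```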
246.Sh "Regular Expressions"
247The patterns used in pattern matching are regular expressions such as
248those supplied in the Version 8 regexp routines.
249(In fact, the routines are derived from Henry Spencer's freely redistributable
250reimplementation of the V8 routines.)
251In addition, \ew matches an alphanumeric character (including \*(L"_\*(R") and \eW a nonalphanumeric.
252Word boundaries may be matched by \eb, and non-boundaries by \eB.
253A whitespace character is matched by \es, non-whitespace by \eS.
254A numeric character is matched by \ed, non-numeric by \eD.
255You may use \ew, \es and \ed within character classes.
256Also, \en, \er, \ef, \et and \eNNN have their normal interpretations.
257Within character classes \eb represents backspace rather than a word boundary.
258Alternatives may be separated by |.
259The bracketing construct \|(\ .\|.\|.\ \|) may also be used, in which case \e<digit>
260matches the digit'th substring, where digit can range from 1 to 9.
261(Outside of the pattern, always use $ instead of \e in front of the digit.
262The scope of $<digit> (and $\`, $& and $\')
263extends to the end of the enclosing BLOCK or eval string, or to
264the next pattern match with subexpressions.
265The \e<digit> notation sometimes works outside the current pattern, but should
266not be relied upon.)
267$+ returns whatever the last bracket match matched.
268$& returns the entire matched string.
ac58e20f 269($0 used to return the same thing, but not any more.)
a687059c 270$\` returns everything before the matched string.
271$\' returns everything after the matched string.
272Examples:
273.nf
274
275 s/\|^\|([^ \|]*\|) \|*([^ \|]*\|)\|/\|$2 $1\|/; # swap first two words
276
277.ne 5
278 if (/\|Time: \|(.\|.\|):\|(.\|.\|):\|(.\|.\|)\|/\|) {
279 $hours = $1;
280 $minutes = $2;
281 $seconds = $3;
282 }
283
284.fi
ae986130 285By default, the ^ character is only guaranteed to match at the beginning
286of the string,
287the $ character only at the end (or before the newline at the end)
a687059c 288and
289.I perl
290does certain optimizations with the assumption that the string contains
291only one line.
ae986130 292The behavior of ^ and $ on embedded newlines will be inconsistent.
a687059c 293You may, however, wish to treat a string as a multi-line buffer, such that
294the ^ will match after any newline within the string, and $ will match
295before any newline.
296At the cost of a little more overhead, you can do this by setting the variable
297$* to 1.
298Setting it back to 0 makes
299.I perl
300revert to its old behavior.
301.PP
302To facilitate multi-line substitutions, the . character never matches a newline
303(even when $* is 0).
304In particular, the following leaves a newline on the $_ string:
305.nf
306
307 $_ = <STDIN>;
308 s/.*(some_string).*/$1/;
309
310If the newline is unwanted, try one of
311
312 s/.*(some_string).*\en/$1/;
313 s/.*(some_string)[^\e000]*/$1/;
314 s/.*(some_string)(.|\en)*/$1/;
315 chop; s/.*(some_string).*/$1/;
316 /(some_string)/ && ($_ = $1);
317
318.fi
319Any item of a regular expression may be followed with digits in curly brackets
320of the form {n,m}, where n gives the minimum number of times to match the item
321and m gives the maximum.
322The form {n} is equivalent to {n,n} and matches exactly n times.
323The form {n,} matches n or more times.
324(If a curly bracket occurs in any other context, it is treated as a regular
325character.)
326The * modifier is equivalent to {0,}, the + modifier to {1,} and the ? modifier
327to {0,1}.
328There is no limit to the size of n or m, but large numbers will chew up
329more memory.
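A few illustrative matches, using made-up variable names, show how the curly bracket forms relate to one another:

```perl
$two_to_four  = ('xaaay' =~ /^xa{2,4}y$/) ? 1 : 0;   # 1: three a's lie within 2..4
$exactly_two  = ('xaaay' =~ /^xa{2}y$/)   ? 1 : 0;   # 0: {2} is the same as {2,2}
$at_least_two = ('xaaay' =~ /^xa{2,}y$/)  ? 1 : 0;   # 1: {2,} has no upper bound
$star_equiv   = ('xy'    =~ /^xa{0,}y$/)  ? 1 : 0;   # 1: {0,} behaves like *
```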
330.Sp
331You will note that all backslashed metacharacters in
332.I perl
333are alphanumeric,
334such as \eb, \ew, \en.
335Unlike some other regular expression languages, there are no backslashed
336symbols that aren't alphanumeric.
337So anything that looks like \e\e, \e(, \e), \e<, \e>, \e{, or \e} is always
338interpreted as a literal character, not a metacharacter.
339This makes it simple to quote a string that you want to use for a pattern
340but that you are afraid might contain metacharacters.
341Simply quote all the non-alphanumeric characters:
342.nf
343
344 $pattern =~ s/(\eW)/\e\e$1/g;
345
346.fi
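To see the effect of the quoting substitution, here is a small sketch with a hypothetical pattern string containing two metacharacters. After quoting, the string only matches itself literally.

```perl
$pattern = 'a.b*c';
$pattern =~ s/(\W)/\\$1/g;                       # now 'a\.b\*c'

$literal  = ('xa.b*cy' =~ /$pattern/) ? 1 : 0;   # 1: matches the literal text
$wildcard = ('xaXbbcy' =~ /$pattern/) ? 1 : 0;   # 0: . and * are no longer special
```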
347.Sh "Formats"
348Output record formats for use with the
349.I write
350operator may be declared as follows:
351.nf
352
353.ne 3
354 format NAME =
355 FORMLIST
356 .
357
358.fi
359If name is omitted, format \*(L"STDOUT\*(R" is defined.
360FORMLIST consists of a sequence of lines, each of which may be of one of three
361types:
362.Ip 1. 4
363A comment.
364.Ip 2. 4
365A \*(L"picture\*(R" line giving the format for one output line.
366.Ip 3. 4
367An argument line supplying values to plug into a picture line.
368.PP
369Picture lines are printed exactly as they look, except for certain fields
370that substitute values into the line.
371Each picture field starts with either @ or ^.
372The @ field (not to be confused with the array marker @) is the normal
373case; ^ fields are used
374to do rudimentary multi-line text block filling.
375The length of the field is supplied by padding out the field
376with multiple <, >, or | characters to specify, respectively, left justification,
377right justification, or centering.
378If any of the values supplied for these fields contains a newline, only
379the text up to the newline is printed.
380The special field @* can be used for printing multi-line values.
381It should appear by itself on a line.
382.PP
383The values are specified on the following line, in the same order as
384the picture fields.
385The values should be separated by commas.
386.PP
387Picture fields that begin with ^ rather than @ are treated specially.
388The value supplied must be a scalar variable name which contains a text
389string.
390.I Perl
391puts as much text as it can into the field, and then chops off the front
392of the string so that the next time the variable is referenced,
393more of the text can be printed.
394Normally you would use a sequence of fields in a vertical stack to print
395out a block of text.
396If you like, you can end the final field with .\|.\|., which will appear in the
397output if the text was too long to appear in its entirety.
398You can change which characters are legal to break on by changing the
399variable $: to a list of the desired characters.
400.PP
401Since use of ^ fields can produce variable length records if the text to be
402formatted is short, you can suppress blank lines by putting the tilde (~)
403character anywhere in the line.
404(Normally you should put it in the front if possible, for visibility.)
405The tilde will be translated to a space upon output.
406If you put a second tilde contiguous to the first, the line will be repeated
407until all the fields on the line are exhausted.
408(If you use a field of the @ variety, the expression you supply had better
409not give the same value every time forever!)
410.PP
411Examples:
412.nf
413.lg 0
414.cs R 25
415.ft C
416
417.ne 10
418# a report on the /etc/passwd file
419format top =
420\& Passwd File
421Name Login Office Uid Gid Home
422------------------------------------------------------------------
423\&.
424format STDOUT =
425@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
426$name, $login, $office,$uid,$gid, $home
427\&.
428
429.ne 29
430# a report from a bug report form
431format top =
432\& Bug Reports
433@<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
434$system, $%, $date
435------------------------------------------------------------------
436\&.
437format STDOUT =
438Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
439\& $subject
440Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
441\& $index, $description
442Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
443\& $priority, $date, $description
444From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
445\& $from, $description
446Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
447\& $programmer, $description
448\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
449\& $description
450\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
451\& $description
452\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
453\& $description
454\&~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
455\& $description
456\&~ ^<<<<<<<<<<<<<<<<<<<<<<<...
457\& $description
458\&.
459
460.ft R
461.cs R
462.lg
463.fi
464It is possible to intermix prints with writes on the same output channel,
465but you'll have to handle $\- (lines left on the page) yourself.
466.PP
467If you are printing lots of fields that are usually blank, you should consider
468using the reset operator between records.
469Not only is it more efficient, but it can prevent the bug of adding another
470field and forgetting to zero it.
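The way a ^ field consumes its variable can be sketched with a single repeated line. The format name WRAP and the variable $text are made up for this example, and the exact layout of the output is not shown here. The point is only that the variable has been used up once the write returns.

```perl
$text = 'the quick brown fox jumps over the lazy dog';

# One picture line, repeated (~~) until $text is exhausted.
format WRAP =
~~ ^<<<<<<<<<
$text
.

$~ = 'WRAP';    # use WRAP as the format for the selected output channel
write;          # prints the text as a narrow column
# $text is now empty, because each ^ field chopped off what it printed
```

Copy the value into a scratch variable first if it is needed again after the write.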
471.Sh "Interprocess Communication"
472The IPC facilities of perl are built on the Berkeley socket mechanism.
473If you don't have sockets, you can ignore this section.
474The calls have the same names as the corresponding system calls,
475but the arguments tend to differ, for two reasons.
476First, perl file handles work differently than C file descriptors.
477Second, perl already knows the length of its strings, so you don't need
478to pass that information.
479Here is a sample client (untested):
480.nf
481
482 ($them,$port) = @ARGV;
483 $port = 2345 unless $port;
484 $them = 'localhost' unless $them;
485
486 $SIG{'INT'} = 'dokill';
487 sub dokill { kill 9,$child if $child; }
488
489 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
490
491 $sockaddr = 'S n a4 x8';
492 chop($hostname = `hostname`);
493
494 ($name, $aliases, $proto) = getprotobyname('tcp');
495 ($name, $aliases, $port) = getservbyname($port, 'tcp')
496 unless $port =~ /^\ed+$/;
ae986130 497.ie t \{\
a687059c 498 ($name, $aliases, $type, $len, $thisaddr) = gethostbyname($hostname);
ae986130 499'br\}
500.el \{\
501 ($name, $aliases, $type, $len, $thisaddr) =
502 gethostbyname($hostname);
503'br\}
a687059c 504 ($name, $aliases, $type, $len, $thataddr) = gethostbyname($them);
505
506 $this = pack($sockaddr, &AF_INET, 0, $thisaddr);
507 $that = pack($sockaddr, &AF_INET, $port, $thataddr);
508
509 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
510 bind(S, $this) || die "bind: $!";
511 connect(S, $that) || die "connect: $!";
512
513 select(S); $| = 1; select(stdout);
514
515 if ($child = fork) {
516 while (<>) {
517 print S;
518 }
519 sleep 3;
520 do dokill();
521 }
522 else {
523 while (<S>) {
524 print;
525 }
526 }
527
528.fi
529And here's a server:
530.nf
531
532 ($port) = @ARGV;
533 $port = 2345 unless $port;
534
535 do 'sys/socket.h' || die "Can't do sys/socket.h: $@";
536
537 $sockaddr = 'S n a4 x8';
538
539 ($name, $aliases, $proto) = getprotobyname('tcp');
540 ($name, $aliases, $port) = getservbyname($port, 'tcp')
541 unless $port =~ /^\ed+$/;
542
543 $this = pack($sockaddr, &AF_INET, $port, "\e0\e0\e0\e0");
544
545 select(NS); $| = 1; select(stdout);
546
547 socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
548 bind(S, $this) || die "bind: $!";
549 listen(S, 5) || die "listen: $!";
550
551 select(S); $| = 1; select(stdout);
552
553 for (;;) {
554 print "Listening again\en";
555 ($addr = accept(NS,S)) || die $!;
556 print "accept ok\en";
557
ae986130 558 ($af,$port,$inetaddr) = unpack($sockaddr,$addr);
a687059c 559 @inetaddr = unpack('C4',$inetaddr);
560 print "$af $port @inetaddr\en";
561
562 while (<NS>) {
563 print;
564 print NS;
565 }
566 }
567
568.fi
569.Sh "Predefined Names"
570The following names have special meaning to
571.IR perl .
572I could have used alphabetic symbols for some of these, but I didn't want
573to take the chance that someone would say reset \*(L"a\-zA\-Z\*(R" and wipe them all
574out.
575You'll just have to suffer along with these silly symbols.
576Most of them have reasonable mnemonics, or analogues in one of the shells.
577.Ip $_ 8
578The default input and pattern-searching space.
579The following pairs are equivalent:
580.nf
581
582.ne 2
583 while (<>) {\|.\|.\|. # only equivalent in while!
584 while ($_ = <>) {\|.\|.\|.
585
586.ne 2
587 /\|^Subject:/
588 $_ \|=~ \|/\|^Subject:/
589
590.ne 2
591 y/a\-z/A\-Z/
592 $_ =~ y/a\-z/A\-Z/
593
594.ne 2
595 chop
596 chop($_)
597
598.fi
599(Mnemonic: underline is understood in certain operations.)
600.Ip $. 8
601The current input line number of the last filehandle that was read.
602Readonly.
603Remember that only an explicit close on the filehandle resets the line number.
604Since <> never does an explicit close, line numbers increase across ARGV files
605(but see examples under eof).
606(Mnemonic: many programs use . to mean the current line number.)
607.Ip $/ 8
608The input record separator, newline by default.
609Works like
610.IR awk 's
611RS variable, including treating blank lines as delimiters
612if set to the null string.
613If set to a value longer than one character, only the first character is used.
614(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
615.Ip $, 8
616The output field separator for the print operator.
617Ordinarily the print operator simply prints out the comma separated fields
618you specify.
619In order to get behavior more like
620.IR awk ,
621set this variable as you would set
622.IR awk 's
623OFS variable to specify what is printed between fields.
624(Mnemonic: what is printed when there is a , in your print statement.)
625.Ip $"" 8
626This is like $, except that it applies to array values interpolated into
627a double-quoted string (or similar interpreted string).
628Default is a space.
629(Mnemonic: obvious, I think.)
630.Ip $\e 8
631The output record separator for the print operator.
632Ordinarily the print operator simply prints out the comma separated fields
633you specify, with no trailing newline or record separator assumed.
634In order to get behavior more like
635.IR awk ,
636set this variable as you would set
637.IR awk 's
638ORS variable to specify what is printed at the end of the print.
639(Mnemonic: you set $\e instead of adding \en at the end of the print.
640Also, it's just like /, but it's what you get \*(L"back\*(R" from
641.IR perl .)
642.Ip $# 8
643The output format for printed numbers.
644This variable is a half-hearted attempt to emulate
645.IR awk 's
646OFMT variable.
647There are times, however, when
648.I awk
649and
650.I perl
651have differing notions of what
652is in fact numeric.
653Also, the initial value is %.20g rather than %.6g, so you need to set $#
654explicitly to get
655.IR awk 's
656value.
657(Mnemonic: # is the number sign.)
658.Ip $% 8
659The current page number of the currently selected output channel.
660(Mnemonic: % is page number in nroff.)
661.Ip $= 8
662The current page length (printable lines) of the currently selected output
663channel.
664Default is 60.
665(Mnemonic: = has horizontal lines.)
666.Ip $\- 8
667The number of lines left on the page of the currently selected output channel.
668(Mnemonic: lines_on_page \- lines_printed.)
669.Ip $~ 8
670The name of the current report format for the currently selected output
671channel.
672(Mnemonic: brother to $^.)
673.Ip $^ 8
674The name of the current top-of-page format for the currently selected output
675channel.
676(Mnemonic: points to top of page.)
677.Ip $| 8
678If set to nonzero, forces a flush after every write or print on the currently
679selected output channel.
680Default is 0.
681Note that
682.I STDOUT
683will typically be line buffered if output is to the
684terminal and block buffered otherwise.
685Setting this variable is useful primarily when you are outputting to a pipe,
686such as when you are running a
687.I perl
688script under rsh and want to see the
689output as it's happening.
690(Mnemonic: when you want your pipes to be piping hot.)
691.Ip $$ 8
692The process number of the
693.I perl
694running this script.
695(Mnemonic: same as shells.)
696.Ip $? 8
697The status returned by the last pipe close, backtick (\`\`) command or
698.I system
699operator.
700Note that this is the status word returned by the wait() system
701call, so the exit value of the subprocess is actually ($? >> 8).
702$? & 255 gives which signal, if any, the process died from, and whether
703there was a core dump.
704(Mnemonic: similar to sh and ksh.)
705.Ip $& 8 4
706The string matched by the last pattern match (not counting any matches hidden
707within a BLOCK or eval enclosed by the current BLOCK).
708(Mnemonic: like & in some editors.)
709.Ip $\` 8 4
710The string preceding whatever was matched by the last pattern match
711(not counting any matches hidden within a BLOCK or eval enclosed by the current
712BLOCK).
713(Mnemonic: \` often precedes a quoted string.)
714.Ip $\' 8 4
715The string following whatever was matched by the last pattern match
716(not counting any matches hidden within a BLOCK or eval enclosed by the current
717BLOCK).
718(Mnemonic: \' often follows a quoted string.)
719Example:
720.nf
721
722.ne 3
723 $_ = \'abcdefghi\';
724 /def/;
725 print "$\`:$&:$\'\en"; # prints abc:def:ghi
726
727.fi
728.Ip $+ 8 4
729The last bracket matched by the last search pattern.
730This is useful if you don't know which of a set of alternative patterns
731matched.
732For example:
733.nf
734
735 /Version: \|(.*\|)|Revision: \|(.*\|)\|/ \|&& \|($rev = $+);
736
737.fi
738(Mnemonic: be positive and forward looking.)
739.Ip $* 8 2
740Set to 1 to do multiline matching within a string, 0 to tell
741.I perl
742that it can assume that strings contain a single line, for the purpose
743of optimizing pattern matches.
744Pattern matches on strings containing multiple newlines can produce confusing
745results when $* is 0.
746Default is 0.
747(Mnemonic: * matches multiple things.)
748.Ip $0 8
749Contains the name of the file containing the
750.I perl
751script being executed.
a687059c 752(Mnemonic: same as sh and ksh.)
753.Ip $<digit> 8
754Contains the subpattern from the corresponding set of parentheses in the last
755pattern matched, not counting patterns matched in nested blocks that have
756been exited already.
757(Mnemonic: like \edigit.)
758.Ip $[ 8 2
759The index of the first element in an array, and of the first character in
760a substring.
761Default is 0, but you could set it to 1 to make
762.I perl
763behave more like
764.I awk
765(or Fortran)
766when subscripting and when evaluating the index() and substr() functions.
767(Mnemonic: [ begins subscripts.)
768.Ip $] 8 2
769The string printed out when you say \*(L"perl -v\*(R".
770It can be used to determine at the beginning of a script whether the perl
771interpreter executing the script is in the right range of versions.
772Example:
773.nf
774
775.ne 5
776 # see if getc is available
777 ($version,$patchlevel) =
778 $] =~ /(\ed+\e.\ed+).*\enPatch level: (\ed+)/;
779 print STDERR "(No filename completion available.)\en"
780 if $version * 1000 + $patchlevel < 2016;
781
782.fi
783(Mnemonic: Is this version of perl in the right bracket?)
784.Ip $; 8 2
785The subscript separator for multi-dimensional array emulation.
786If you refer to an associative array element as
787.nf
788 $foo{$a,$b,$c}
789
790it really means
791
792 $foo{join($;, $a, $b, $c)}
793
794But don't put
795
796 @foo{$a,$b,$c} # a slice--note the @
797
798which means
799
800 ($foo{$a},$foo{$b},$foo{$c})
801
802.fi
803Default is "\e034", the same as SUBSEP in
804.IR awk .
805Note that if your keys contain binary data there might not be any safe
806value for $;.
807(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
808Yeah, I know, it's pretty lame, but $, is already taken for something more
809important.)
810.Ip $! 8 2
811If used in a numeric context, yields the current value of errno, with all the
812usual caveats.
ffed7fef 813(This means that you shouldn't depend on the value of $! to be anything
814in particular unless you've gotten a specific error return indicating a
815system error.)
a687059c 816If used in a string context, yields the corresponding system error string.
817You can assign to $! in order to set errno
818if, for instance, you want $! to return the string for error n, or you want
819to set the exit value for the die operator.
820(Mnemonic: What just went bang?)
821.Ip $@ 8 2
ffed7fef 822The perl syntax error message from the last eval command.
823If null, the last eval parsed and executed correctly (although the operations
824you invoked may have failed in the normal fashion).
a687059c 825(Mnemonic: Where was the syntax error \*(L"at\*(R"?)
826.Ip $< 8 2
827The real uid of this process.
828(Mnemonic: it's the uid you came FROM, if you're running setuid.)
829.Ip $> 8 2
830The effective uid of this process.
831Example:
832.nf
833
834.ne 2
835 $< = $>; # set real uid to the effective uid
836 ($<,$>) = ($>,$<); # swap real and effective uid
837
838.fi
839(Mnemonic: it's the uid you went TO, if you're running setuid.)
840Note: $< and $> can only be swapped on machines supporting setreuid().
841.Ip $( 8 2
842The real gid of this process.
843If you are on a machine that supports membership in multiple groups
844simultaneously, gives a space separated list of groups you are in.
845The first number is the one returned by getgid(), and the subsequent ones
846by getgroups(), one of which may be the same as the first number.
847(Mnemonic: parentheses are used to GROUP things.
848The real gid is the group you LEFT, if you're running setgid.)
849.Ip $) 8 2
850The effective gid of this process.
851If you are on a machine that supports membership in multiple groups
852simultaneously, gives a space separated list of groups you are in.
853The first number is the one returned by getegid(), and the subsequent ones
854by getgroups(), one of which may be the same as the first number.
855(Mnemonic: parentheses are used to GROUP things.
856The effective gid is the group that's RIGHT for you, if you're running setgid.)
857.Sp
858Note: $<, $>, $( and $) can only be set on machines that support the
859corresponding set[re][ug]id() routine.
860$( and $) can only be swapped on machines supporting setregid().
861.Ip $: 8 2
862The current set of characters after which a string may be broken to
863fill continuation fields (starting with ^) in a format.
864Default is "\ \en-", to break on whitespace or hyphens.
865(Mnemonic: a \*(L"colon\*(R" in poetry is a part of a line.)
866.Ip @ARGV 8 3
867The array ARGV contains the command line arguments intended for the script.
868Note that $#ARGV is generally the number of arguments minus one, since
869$ARGV[0] is the first argument, NOT the command name.
870See $0 for the command name.
871.Ip @INC 8 3
872The array INC contains the list of places to look for
873.I perl
874scripts to be
875evaluated by the \*(L"do EXPR\*(R" command.
876It initially consists of the arguments to any
877.B \-I
878command line switches, followed
879by the default
880.I perl
881library, probably \*(L"/usr/local/lib/perl\*(R".
882.Ip $ENV{expr} 8 2
883The associative array ENV contains your current environment.
884Setting a value in ENV changes the environment for child processes.
885.Ip $SIG{expr} 8 2
886The associative array SIG is used to set signal handlers for various signals.
887Example:
888.nf
889
890.ne 12
891 sub handler { # 1st argument is signal name
892 local($sig) = @_;
893 print "Caught a SIG$sig\-\|\-shutting down\en";
894 close(LOG);
895 exit(0);
896 }
897
898 $SIG{\'INT\'} = \'handler\';
899 $SIG{\'QUIT\'} = \'handler\';
900 .\|.\|.
901 $SIG{\'INT\'} = \'DEFAULT\'; # restore default action
902 $SIG{\'QUIT\'} = \'IGNORE\'; # ignore SIGQUIT
903
904.fi
905The SIG array only contains values for the signals actually set within
906the perl script.
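A few of the variables described above can be exercised in a short sketch. The array and variable names are hypothetical, and the $? part assumes a Unix-like system where sh is on the search path.

```perl
# $; joins the subscripts of an emulated multi-dimensional array.
$row = 1; $col = 2;
$cell{$row,$col} = 'x';
$same = $cell{join($;, $row, $col)};   # 'x'

# $" separates array elements interpolated into a double-quoted string.
@list = ('a', 'b', 'c');
$" = '-';
$joined = "@list";                     # 'a-b-c'
$" = ' ';                              # restore the default

# $? is the wait() status word of the last system, backtick or pipe close.
system('sh', '-c', 'exit 3');          # assumption: sh is available
$exit   = $? >> 8;                     # 3, the exit value of the subprocess
$signal = $? & 255;                    # 0, no signal and no core dump
```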
907.Sh "Packages"
908Perl provides a mechanism for alternate namespaces to protect packages from
909stomping on each other's variables.
910By default, a perl script starts compiling into the package known as \*(L"main\*(R".
911By use of the
912.I package
913declaration, you can switch namespaces.
914The scope of the package declaration is from the declaration itself to the end
915of the enclosing block (the same scope as the local() operator).
916Typically it would be the first declaration in a file to be included by
917the \*(L"do FILE\*(R" operator.
918You can switch into a package in more than one place; it merely influences
919which symbol table is used by the compiler for the rest of that block.
663a0e37 920You can refer to variables and filehandles in other packages by prefixing
921the identifier with the package name and a single quote.
a687059c 922If the package name is null, the \*(L"main\*(R" package as assumed.
663a0e37 923.PP
924Only identifiers starting with letters are stored in the packages symbol
925table.
926All other symbols are kept in package \*(L"main\*(R".
927In addition, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC
928and SIG are forced to be in package \*(L"main\*(R", even when used for
929other purposes than their built-in one.
930Note also that, if you have a package called \*(L"m\*(R", \*(L"s\*(R"
931or \*(L"y\*(R", the you can't use the qualified form of an identifier since it
932will be interpreted instead as a pattern match, a substitution
933or a translation.
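.PP
A brief sketch of the qualified form, using a hypothetical package
called \*(L"counter\*(R":
.nf

.ne 6
    package counter;
    $total = 10;                    # this is $counter\'total

    package main;
    $total = 1;                     # this is $main\'total
    print $counter\'total + $total, "\en";

.fi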
.PP
Eval'ed strings are compiled in the package in which the eval was compiled.
(Assignments to $SIG{}, however, assume the signal handler specified is in the
main package.
Qualify the signal handler name if you wish to have a signal handler in
a package.)
For an example, examine perldb.pl in the perl library.
It initially switches to the DB package so that the debugger doesn't interfere
with variables in the script you are trying to debug.
At various points, however, it temporarily switches back to the main package
to evaluate various expressions in the context of the main package.
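.PP
To sketch the point about signal handlers, a handler living in a
hypothetical package \*(L"watch\*(R" would be installed by its qualified name:
.nf

.ne 8
    package watch;

    sub handler {
        local($sig) = @_;
        print "watch caught SIG$sig\en";
    }

    $SIG{\'INT\'} = "watch\'handler";

.fi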
.PP
The symbol table for a package happens to be stored in the associative array
of that name prepended with an underscore.
The value in each entry of the associative array is
what you are referring to when you use the *name notation.
In fact, the following have the same effect (in package main, anyway),
though the first is more
efficient because it does the symbol table lookups at compile time:
.nf

.ne 2
    local(*foo) = *bar;
    local($_main{'foo'}) = $_main{'bar'};

.fi
You can use this to print out all the variables in a package, for instance.
Here is dumpvar.pl from the perl library:
.nf
.ne 11
    package dumpvar;

    sub main'dumpvar {
    \&    ($package) = @_;
    \&    local(*stab) = eval("*_$package");
    \&    while (($key,$val) = each(%stab)) {
    \&        {
    \&            local(*entry) = $val;
    \&            if (defined $entry) {
    \&                print "\e$$key = '$entry'\en";
    \&            }
.ne 7
    \&            if (defined @entry) {
    \&                print "\e@$key = (\en";
    \&                foreach $num ($[ .. $#entry) {
    \&                    print " $num\et'",$entry[$num],"'\en";
    \&                }
    \&                print ")\en";
    \&            }
.ne 10
    \&            if ($key ne "_$package" && defined %entry) {
    \&                print "\e%$key = (\en";
    \&                foreach $key (sort keys(%entry)) {
    \&                    print " $key\et'",$entry{$key},"'\en";
    \&                }
    \&                print ")\en";
    \&            }
    \&        }
    \&    }
    }

.fi
Note that, even though the subroutine is compiled in package dumpvar, the
name of the subroutine is qualified so that its name is inserted into package
\*(L"main\*(R".
.Sh "Style"
Each programmer will, of course, have his or her own preferences in regard
to formatting, but there are some general guidelines that will make your
programs easier to read.
.Ip 1. 4 4
Just because you CAN do something a particular way doesn't mean that
you SHOULD do it that way.
.I Perl
is designed to give you several ways to do anything, so consider picking
the most readable one.
For instance

    open(FOO,$foo) || die "Can't open $foo: $!";

is better than

    die "Can't open $foo: $!" unless open(FOO,$foo);

because the second way hides the main point of the statement in a
modifier.
On the other hand

    print "Starting analysis\en" if $verbose;

is better than

    $verbose && print "Starting analysis\en";

since the main point isn't whether the user typed -v or not.
.Sp
Similarly, just because an operator lets you assume default arguments
doesn't mean that you have to make use of the defaults.
The defaults are there for lazy systems programmers writing one-shot
programs.
If you want your program to be readable, consider supplying the argument.
.Sp
Along the same lines, just because you
.I can
omit parentheses in many places doesn't mean that you ought to:
.nf

    return print reverse sort num values array;
    return print(reverse(sort num (values(%array))));

.fi
When in doubt, parenthesize.
At the very least it will let some poor schmuck bounce on the % key in vi.
.Ip 2. 4 4
Don't go through silly contortions to exit a loop at the top or the
bottom, when
.I perl
provides the "last" operator so you can exit in the middle.
Just outdent it a little to make it more visible:
.nf

.ne 7
    line:
        for (;;) {
            statements;
        last line if $foo;
            next line if /^#/;
            statements;
        }

.fi
.Ip 3. 4 4
Don't be afraid to use loop labels\*(--they're there to enhance readability as
well as to allow multi-level loop breaks.
See the previous example.
.Ip 4. 4 4
For portability, when using features that may not be implemented on every
machine, test the construct in an eval to see if it fails.
If you know what version or patchlevel a particular feature was implemented in,
you can test $] to see if it will be there.
.Ip 5. 4 4
Choose mnemonic identifiers.
.Ip 6. 4 4
Be consistent.
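.PP
As a sketch of guideline 4, one way to probe for an optional feature
(symlink is used here only as an illustration) is to try it inside an
eval and then see whether $@ was set:
.nf

.ne 2
    $symlink_exists = (eval \'symlink("","");\', $@ eq \'\');
    print "No symbolic links here\en" unless $symlink_exists;

.fi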
.Sh "Debugging"
If you invoke
.I perl
with a
.B \-d
switch, your script will be run under a debugging monitor.
It will halt before the first executable statement and ask you for a
command, such as:
.Ip "h" 12 4
Prints out a help message.
.Ip "s" 12 4
Single step.
Executes until it reaches the beginning of another statement.
.Ip "c" 12 4
Continue.
Executes until the next breakpoint is reached.
.Ip "<CR>" 12 4
Repeat last s or c.
.Ip "n" 12 4
Single step around subroutine call.
.Ip "l min+incr" 12 4
List incr+1 lines starting at min.
If min is omitted, starts where last listing left off.
If incr is omitted, previous value of incr is used.
.Ip "l min-max" 12 4
List lines in the indicated range.
.Ip "l line" 12 4
List just the indicated line.
.Ip "l" 12 4
List incr+1 more lines after last printed line.
.Ip "l subname" 12 4
List subroutine.
If it's a long subroutine it just lists the beginning.
Use \*(L"l\*(R" to list more.
.Ip "L" 12 4
List lines that have breakpoints or actions.
.Ip "t" 12 4
Toggle trace mode on or off.
.Ip "b line" 12 4
Set a breakpoint.
If line is omitted, sets a breakpoint on the
line that is about to be executed.
Breakpoints may only be set on lines that begin an executable statement.
.Ip "b subname" 12 4
Set breakpoint at first executable line of subroutine.
.Ip "S" 12 4
Lists the names of all subroutines.
.Ip "d line" 12 4
Delete breakpoint.
If line is omitted, deletes the breakpoint on the
line that is about to be executed.
.Ip "D" 12 4
Delete all breakpoints.
.Ip "A" 12 4
Delete all line actions.
.Ip "V package" 12 4
List all variables in package.
Default is main package.
.Ip "a line command" 12 4
Set an action for line.
A multi-line command may be entered by backslashing the newlines.
.Ip "< command" 12 4
Set an action to happen before every debugger prompt.
A multi-line command may be entered by backslashing the newlines.
.Ip "> command" 12 4
Set an action to happen after the prompt when you've just given a command
to return to executing the script.
A multi-line command may be entered by backslashing the newlines.
.Ip "! number" 12 4
Redo a debugging command.
If number is omitted, redoes the previous command.
.Ip "! -number" 12 4
Redo the command that was that many commands ago.
.Ip "H -number" 12 4
Display the last number commands.
Only commands longer than one character are listed.
If number is omitted, lists them all.
.Ip "q or ^D" 12 4
Quit.
.Ip "command" 12 4
Execute command as a perl statement.
A missing semicolon will be supplied.
.Ip "p expr" 12 4
Same as \*(L"print DB'OUT expr\*(R".
The DB'OUT filehandle is opened to /dev/tty, regardless of where STDOUT
may be redirected to.
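.PP
As an illustration only (the line number 53 and the variable $count are
hypothetical), a few of the commands above typed at the debugger prompt
might look like this.
They set a breakpoint and an action on the same line, continue, and then
print the variable:
.nf

.ne 4
    b 53
    a 53 print DB\'OUT "count is $count\en"
    c
    p $count

.fi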
.PP
If you want to modify the debugger, copy perldb.pl from the perl library
to your current directory and modify it as necessary.
You can do some customization by setting up a .perldb file which contains
initialization code.
For instance, you could make aliases like these:
.nf

    $DB'alias{'len'} = 's/^len(.*)/p length($1)/';
    $DB'alias{'stop'} = 's/^stop (at|in)/b/';
    $DB'alias{'.'} =
      's/^\e./p "\e$DB\e'sub(\e$DB\e'line):\et",\e$DB\e'line[\e$DB\e'line]/';

.fi
.Sh "Setuid Scripts"
.I Perl
is designed to make it easy to write secure setuid and setgid scripts.
Unlike shells, which are based on multiple substitution passes on each line
of the script,
.I perl
uses a more conventional evaluation scheme with fewer hidden \*(L"gotchas\*(R".
Additionally, since the language has more built-in functionality, it
has to rely less upon external (and possibly untrustworthy) programs to
accomplish its purposes.
.PP
In an unpatched 4.2 or 4.3bsd kernel, setuid scripts are intrinsically
insecure, but this kernel feature can be disabled.
If it is,
.I perl
can emulate the setuid and setgid mechanism when it notices the otherwise
useless setuid/gid bits on perl scripts.
If the kernel feature isn't disabled,
.I perl
will complain loudly that your setuid script is insecure.
You'll need to either disable the kernel setuid script feature, or put
a C wrapper around the script.
.PP
When perl is executing a setuid script, it takes special precautions to
prevent you from falling into any obvious traps.
(In some ways, a perl script is more secure than the corresponding
C program.)
Any command line argument, environment variable, or input is marked as
\*(L"tainted\*(R", and may not be used, directly or indirectly, in any
command that invokes a subshell, or in any command that modifies files,
directories or processes.
Any variable that is set within an expression that has previously referenced
a tainted value also becomes tainted (even if it is logically impossible
for the tainted value to influence the variable).
For example:
.nf

.ne 5
    $foo = shift;                   # $foo is tainted
    $bar = $foo,\'bar\';              # $bar is also tainted
    $xxx = <>;                      # Tainted
    $path = $ENV{\'PATH\'};           # Tainted, but see below
    $abc = \'abc\';                   # Not tainted

.ne 4
    system "echo $foo";             # Insecure
    system "echo", $foo;            # Secure (doesn't use sh)
    system "echo $bar";             # Insecure
    system "echo $abc";             # Insecure until PATH set

.ne 5
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

    $path = $ENV{\'PATH\'};           # Not tainted
    system "echo $abc";             # Is secure now!

.ne 5
    open(FOO,"$foo");               # OK
    open(FOO,">$foo");              # Not OK

    open(FOO,"echo $foo|");         # Not OK, but...
    open(FOO,"-|") || exec \'echo\', $foo;    # OK

    $zzz = `echo $foo`;             # Insecure, zzz tainted

    unlink $abc,$foo;               # Insecure
    umask $foo;                     # Insecure

.ne 3
    exec "echo $foo";               # Insecure
    exec "echo", $foo;              # Secure (doesn't use sh)
    exec "sh", \'-c\', $foo;          # Considered secure, alas

.fi
The taintedness is associated with each scalar value, so some elements
of an array can be tainted, and others not.
.PP
If you try to do something insecure, you will get a fatal error saying
something like \*(L"Insecure dependency\*(R" or \*(L"Insecure PATH\*(R".
Note that you can still write an insecure system call or exec,
but only by explicitly doing something like the last example above.
You can also bypass the tainting mechanism by referencing
subpatterns\*(--\c
.I perl
presumes that if you reference a substring using $1, $2, etc., you knew
what you were doing when you wrote the pattern:
.nf

    $ARGV[0] =~ /^\-P(\ew+)$/;
    $printer = $1;                  # Not tainted

.fi
This is fairly secure since \ew+ doesn't match shell metacharacters.
Use of .+ would have been insecure, but
.I perl
doesn't check for that, so you must be careful with your patterns.
This is the ONLY mechanism for untainting user supplied filenames if you
want to do file operations on them (unless you make $> equal to $<).
.PP
It's also possible to get into trouble with other operations that don't care
whether they use tainted values.
Make judicious use of the file tests in dealing with any user-supplied
filenames.
When possible, do opens and such after setting $> = $<.
.I Perl
doesn't prevent you from opening tainted filenames for reading, so be
careful what you print out.
The tainting mechanism is intended to prevent stupid mistakes, not to remove
the need for thought.
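.PP
A minimal sketch of that advice, assuming the script needs no special
privileges once it has started ($file is a hypothetical user-supplied name):
.nf

.ne 4
    $file = shift;                  # tainted
    $> = $<;                        # effective uid becomes the real uid
    die "$file is not a plain file\en" unless \-f $file;
    open(IN, $file) || die "Can\'t open $file: $!";

.fi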
.SH ENVIRONMENT
.I Perl
uses PATH in executing subprocesses, and in finding the script if \-S
is used.
HOME or LOGDIR is used if chdir has no argument.
.PP
Apart from these,
.I perl
uses no environment variables, except to make them available
to the script being executed, and to child processes.
However, scripts running setuid would do well to execute the following lines
before doing anything else, just to keep people honest:
.nf

.ne 3
    $ENV{\'PATH\'} = \'/bin:/usr/bin\';    # or whatever you need
    $ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
    $ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';

.fi
.SH AUTHOR
Larry Wall <lwall@jpl-devvax.Jpl.Nasa.Gov>
.SH FILES
/tmp/perl\-eXXXXXX	temporary file for
.B \-e
commands.
.SH SEE ALSO
a2p	awk to perl translator
.br
s2p	sed to perl translator
.SH DIAGNOSTICS
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
(In the case of a script passed to
.I perl
via
.B \-e
switches, each
.B \-e
is counted as one line.)
.PP
Setuid scripts have additional constraints that can produce error messages
such as \*(L"Insecure dependency\*(R".
See the section on setuid scripts.
.SH TRAPS
Accustomed
.IR awk
users should take special note of the following:
.Ip * 4 2
Semicolons are required after all simple statements in
.IR perl .
Newline
is not a statement delimiter.
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Arrays index from 0 unless you set $[.
Likewise string positions in substr() and index().
.Ip * 4 2
You have to decide whether your array has numeric or string indices.
.Ip * 4 2
Associative array values do not spring into existence upon mere reference.
.Ip * 4 2
You have to decide whether you want to use string or numeric comparisons.
.Ip * 4 2
Reading an input line does not split it for you. You get to split it yourself
to an array.
And the
.I split
operator has different arguments.
.Ip * 4 2
The current input line is normally in $_, not $0.
It generally does not have the newline stripped.
($0 is the name of the program executed.)
.Ip * 4 2
$<digit> does not refer to fields\*(--it refers to substrings matched by the last
match pattern.
.Ip * 4 2
The
.I print
statement does not add field and record separators unless you set
$, and $\e.
.Ip * 4 2
You must open your files before you print to them.
.Ip * 4 2
The range operator is \*(L".\|.\*(R", not comma.
(The comma operator works as in C.)
.Ip * 4 2
The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
(\*(L"~\*(R" is the one's complement operator, as in C.)
.Ip * 4 2
The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
(\*(L"^\*(R" is the XOR operator, as in C.)
.Ip * 4 2
The concatenation operator is \*(L".\*(R", not the null string.
(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
since the third slash would be interpreted as a division operator\*(--the
tokener is in fact slightly context sensitive for operators like /, ?, and <.
And in fact, . itself can be the beginning of a number.)
.Ip * 4 2
.IR Next ,
.I exit
and
.I continue
work differently.
.Ip * 4 2
The following variables work differently:
.nf

    Awk     \h'|2.5i'Perl
    ARGC    \h'|2.5i'$#ARGV
    ARGV[0] \h'|2.5i'$0
    FILENAME\h'|2.5i'$ARGV
    FNR     \h'|2.5i'$. \- something
    FS      \h'|2.5i'(whatever you like)
    NF      \h'|2.5i'$#Fld, or some such
    NR      \h'|2.5i'$.
    OFMT    \h'|2.5i'$#
    OFS     \h'|2.5i'$,
    ORS     \h'|2.5i'$\e
    RLENGTH \h'|2.5i'length($&)
    RS      \h'|2.5i'$/
    RSTART  \h'|2.5i'length($\`)
    SUBSEP  \h'|2.5i'$;

.fi
.Ip * 4 2
When in doubt, run the
.I awk
construct through a2p and see what it gives you.
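.PP
To tie several of these points together, here is a rough equivalent of
the awk program \*(L"{ print $2 }\*(R".
The array name Fld merely follows the table above:
.nf

.ne 5
    while (<>) {
        chop;                       # awk strips the newline for you
        @Fld = split(\' \', $_);      # awk splits the line for you
        print $Fld[1], "\en";        # awk's $2, since arrays index from 0
    }

.fi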
.PP
Cerebral C programmers should take note of the following:
.Ip * 4 2
Curly brackets are required on ifs and whiles.
.Ip * 4 2
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R".
.Ip * 4 2
.I Break
and
.I continue
become
.I last
and
.IR next ,
respectively.
.Ip * 4 2
There's no switch statement.
.Ip * 4 2
Variables begin with $ or @ in
.IR perl .
.Ip * 4 2
Printf does not implement *.
.Ip * 4 2
Comments begin with #, not /*.
.Ip * 4 2
You can't take the address of anything.
.Ip * 4 2
ARGV must be capitalized.
.Ip * 4 2
The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
.Ip * 4 2
Signal handlers deal with signal names, not numbers.
.Ip * 4 2
You can't subscript array values, only arrays (no $x = (1,2,3)[2];).
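.PP
A short sketch showing a few of these differences side by side (the
label and the variable names are arbitrary):
.nf

.ne 13
    line: while (<STDIN>) {
        next line if /^#/;          # where C would say continue
        last line if /^quit/;       # where C would say break
        if (/^[0-9]/) {             # braces are required here
            $numeric++;
        }
        elsif (/^[a-z]/) {          # not "else if"
            $alpha++;
        }
        else {
            $other++;
        }
    }

.fi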
.PP
Seasoned
.I sed
programmers should take note of the following:
.Ip * 4 2
Backreferences in substitutions use $ rather than \e.
.Ip * 4 2
The pattern matching metacharacters (, ), and | do not have backslashes in front.
.Ip * 4 2
The range operator is .\|. rather than comma.
.PP
Sharp shell programmers should take note of the following:
.Ip * 4 2
The backtick operator does variable interpretation without regard to the
presence of single quotes in the command.
.Ip * 4 2
The backtick operator does no translation of the return value, unlike csh.
.Ip * 4 2
Shells (especially csh) do several levels of substitution on each command line.
.I Perl
does substitution only in certain constructs such as double quotes,
backticks, angle brackets and search patterns.
.Ip * 4 2
Shells interpret scripts a little bit at a time.
.I Perl
compiles the whole program before executing it.
.Ip * 4 2
The arguments are available via @ARGV, not $1, $2, etc.
.Ip * 4 2
The environment is not automatically made available as variables.
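.PP
For example, here is a sketch of how a few shell habits carry over
(the use of pwd is just an illustration):
.nf

.ne 4
    print "first argument: $ARGV[0]\en";     # sh would say $1
    print "home: ", $ENV{\'HOME\'}, "\en";     # sh would say $HOME
    $dir = `pwd`;                   # the trailing newline is kept
    chop($dir);                     # so remove it yourself

.fi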
.SH BUGS
.PP
.I Perl
is at the mercy of your machine's definitions of various operations
such as type casting, atof() and sprintf().
.PP
If your stdio requires a seek or eof between reads and writes on a particular
stream, so does
.IR perl .
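.PP
As a sketch of how a script might cope (this assumes your perl accepts
the \*(L"+<\*(R" form of open for read/write access; $file is hypothetical),
a seek that moves nowhere can separate a read from a write:
.nf

.ne 4
    open(FH, "+<$file") || die "Can\'t update $file: $!";
    $line = <FH>;                   # a read ...
    seek(FH, 0, 1);                 # ... a seek of 0 bytes from the current position ...
    print FH "more\en";              # ... and then a write

.fi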
.PP
While none of the built-in data types have any arbitrary size limits (apart
from memory size), there are still a few arbitrary limits:
a given identifier may not be longer than 255 characters;
sprintf is limited on many machines to 128 characters per field (unless the format
specifier is exactly %s);
and no component of your PATH may be longer than 255 characters if you use \-S.
.PP
.I Perl
actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
anyone I said that.
.rn }` ''