Commit | Line | Data |
8d063cd8 |
1 | .rn '' }` |
450a55e4 |
2 | ''' $Header: perl_man.1,v 3.0.1.7 90/08/09 04:24:03 lwall Locked $ |
8d063cd8 |
3 | ''' |
4 | ''' $Log: perl.man.1,v $ |
450a55e4 |
5 | ''' Revision 3.0.1.7 90/08/09 04:24:03 lwall |
6 | ''' patch19: added -x switch to extract script from input trash |
7 | ''' patch19: Added -c switch to do compilation only |
8 | ''' patch19: bare identifiers are now strings if no other interpretation possible |
9 | ''' patch19: -s now returns size of file |
10 | ''' patch19: Added __LINE__ and __FILE__ tokens |
11 | ''' patch19: Added __END__ token |
12 | ''' |
13 | ''' Revision 3.0.1.6 90/08/03 11:14:44 lwall |
14 | ''' patch19: Intermediate diffs for Randal |
15 | ''' |
0f85fab0 |
16 | ''' Revision 3.0.1.5 90/03/27 16:14:37 lwall |
17 | ''' patch16: .. now works using magical string increment |
18 | ''' |
79a0689e |
19 | ''' Revision 3.0.1.4 90/03/12 16:44:33 lwall |
20 | ''' patch13: (LIST,) now legal |
21 | ''' patch13: improved LIST documentation |
22 | ''' patch13: example of if-elsif switch was wrong |
23 | ''' |
ac58e20f |
24 | ''' Revision 3.0.1.3 90/02/28 17:54:32 lwall |
25 | ''' patch9: @array in scalar context now returns length of array |
26 | ''' patch9: in manual, example of open and ?: was backwards |
27 | ''' |
ffed7fef |
28 | ''' Revision 3.0.1.2 89/11/17 15:30:03 lwall |
29 | ''' patch5: fixed some manual typos and indent problems |
30 | ''' |
ae986130 |
31 | ''' Revision 3.0.1.1 89/11/11 04:41:22 lwall |
32 | ''' patch2: explained about sh and ${1+"$@"} |
33 | ''' patch2: documented that space must separate word and '' string |
34 | ''' |
a687059c |
35 | ''' Revision 3.0 89/10/18 15:21:29 lwall |
36 | ''' 3.0 baseline |
8d063cd8 |
37 | ''' |
38 | ''' |
39 | .de Sh |
40 | .br |
41 | .ne 5 |
42 | .PP |
43 | \fB\\$1\fR |
44 | .PP |
45 | .. |
46 | .de Sp |
47 | .if t .sp .5v |
48 | .if n .sp |
49 | .. |
50 | .de Ip |
51 | .br |
52 | .ie \\n.$>=3 .ne \\$3 |
53 | .el .ne 3 |
54 | .IP "\\$1" \\$2 |
55 | .. |
56 | ''' |
57 | ''' Set up \*(-- to give an unbreakable dash; |
58 | ''' string Tr holds user defined translation string. |
59 | ''' Bell System Logo is used as a dummy character. |
60 | ''' |
378cc40b |
61 | .tr \(*W-|\(bv\*(Tr |
8d063cd8 |
62 | .ie n \{\ |
378cc40b |
63 | .ds -- \(*W- |
64 | .if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch |
65 | .if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch |
8d063cd8 |
66 | .ds L" "" |
67 | .ds R" "" |
68 | .ds L' ' |
69 | .ds R' ' |
70 | 'br\} |
71 | .el\{\ |
72 | .ds -- \(em\| |
73 | .tr \*(Tr |
74 | .ds L" `` |
75 | .ds R" '' |
76 | .ds L' ` |
77 | .ds R' ' |
78 | 'br\} |
a687059c |
79 | .TH PERL 1 "\*(RP" |
80 | .UC |
8d063cd8 |
81 | .SH NAME |
a687059c |
82 | perl \- Practical Extraction and Report Language |
8d063cd8 |
83 | .SH SYNOPSIS |
a687059c |
84 | .B perl |
85 | [options] filename args |
8d063cd8 |
86 | .SH DESCRIPTION |
87 | .I Perl |
a687059c |
88 | is an interpreted language optimized for scanning arbitrary text files, |
8d063cd8 |
89 | extracting information from those text files, and printing reports based |
90 | on that information. |
91 | It's also a good language for many system management tasks. |
92 | The language is intended to be practical (easy to use, efficient, complete) |
93 | rather than beautiful (tiny, elegant, minimal). |
94 | It combines (in the author's opinion, anyway) some of the best features of C, |
95 | \fIsed\fR, \fIawk\fR, and \fIsh\fR, |
96 | so people familiar with those languages should have little difficulty with it. |
97 | (Language historians will also note some vestiges of \fIcsh\fR, Pascal, and |
98 | even BASIC-PLUS.) |
99 | Expression syntax corresponds quite closely to C expression syntax. |
a687059c |
100 | Unlike most Unix utilities, |
101 | .I perl |
102 | does not arbitrarily limit the size of your data\*(--if you've got |
103 | the memory, |
104 | .I perl |
105 | can slurp in your whole file as a single string. |
106 | Recursion is of unlimited depth. |
107 | And the hash tables used by associative arrays grow as necessary to prevent |
108 | degraded performance. |
109 | .I Perl |
110 | uses sophisticated pattern matching techniques to scan large amounts of |
111 | data very quickly. |
112 | Although optimized for scanning text, |
113 | .I perl |
114 | can also deal with binary data, and can make dbm files look like associative |
115 | arrays (where dbm is available). |
116 | Setuid |
117 | .I perl |
118 | scripts are safer than C programs |
119 | through a dataflow tracing mechanism which prevents many stupid security holes. |
8d063cd8 |
120 | If you have a problem that would ordinarily use \fIsed\fR |
121 | or \fIawk\fR or \fIsh\fR, but it |
122 | exceeds their capabilities or must run a little faster, |
123 | and you don't want to write the silly thing in C, then |
124 | .I perl |
125 | may be for you. |
a687059c |
126 | There are also translators to turn your |
127 | .I sed |
128 | and |
129 | .I awk |
130 | scripts into |
131 | .I perl |
132 | scripts. |
8d063cd8 |
133 | OK, enough hype. |
134 | .PP |
135 | Upon startup, |
136 | .I perl |
137 | looks for your script in one of the following places: |
138 | .Ip 1. 4 2 |
139 | Specified line by line via |
140 | .B \-e |
141 | switches on the command line. |
142 | .Ip 2. 4 2 |
143 | Contained in the file specified by the first filename on the command line. |
144 | (Note that systems supporting the #! notation invoke interpreters this way.) |
145 | .Ip 3. 4 2 |
a687059c |
146 | Passed in implicitly via standard input. |
378cc40b |
147 | This only works if there are no filename arguments\*(--to pass |
a687059c |
148 | arguments to a |
149 | .I stdin |
150 | script you must explicitly specify a \- for the script name. |
8d063cd8 |
151 | .PP |
152 | After locating your script, |
153 | .I perl |
154 | compiles it to an internal form. |
155 | If the script is syntactically correct, it is executed. |
156 | .Sh "Options" |
83b4785a |
157 | Note: on first reading this section may not make much sense to you. It's here |
8d063cd8 |
158 | at the front for easy reference. |
159 | .PP |
160 | A single-character option may be combined with the following option, if any. |
161 | This is particularly useful when invoking a script using the #! construct which |
162 | only allows one argument. Example: |
163 | .nf |
164 | |
165 | .ne 2 |
a687059c |
166 | #!/usr/bin/perl \-spi.bak # same as \-s \-p \-i.bak |
8d063cd8 |
167 | .\|.\|. |
168 | |
169 | .fi |
170 | Options include: |
171 | .TP 5 |
378cc40b |
172 | .B \-a |
a687059c |
173 | turns on autosplit mode when used with a |
174 | .B \-n |
175 | or |
176 | .BR \-p . |
378cc40b |
177 | An implicit split command to the @F array |
178 | is done as the first thing inside the implicit while loop produced by |
a687059c |
179 | the |
180 | .B \-n |
181 | or |
182 | .BR \-p . |
378cc40b |
183 | .nf |
184 | |
a687059c |
185 | perl \-ane \'print pop(@F), "\en";\' |
378cc40b |
186 | |
187 | is equivalent to |
188 | |
189 | while (<>) { |
a687059c |
190 | @F = split(\' \'); |
191 | print pop(@F), "\en"; |
378cc40b |
192 | } |
193 | |
194 | .fi |
195 | .TP 5 |
450a55e4 |
196 | .B \-c |
197 | causes |
198 | .I perl |
199 | to check the syntax of the script and then exit without executing it. |
200 | .TP 5 |
a687059c |
201 | .BI \-d |
202 | runs the script under the perl debugger. |
203 | See the section on Debugging. |
204 | .TP 5 |
205 | .BI \-D number |
8d063cd8 |
206 | sets debugging flags. |
207 | To watch how it executes your script, use |
a687059c |
208 | .BR \-D14 . |
8d063cd8 |
209 | (This only works if debugging is compiled into your |
210 | .IR perl .) |
a687059c |
211 | Another nice value is \-D1024, which lists your compiled syntax tree. |
212 | And \-D512 displays compiled regular expressions. |
8d063cd8 |
213 | .TP 5 |
a687059c |
214 | .BI \-e " commandline" |
8d063cd8 |
215 | may be used to enter one line of script. |
216 | Multiple |
217 | .B \-e |
218 | commands may be given to build up a multi-line script. |
219 | If |
220 | .B \-e |
221 | is given, |
222 | .I perl |
223 | will not look for a script filename in the argument list. |
224 | .TP 5 |
a687059c |
225 | .BI \-i extension |
8d063cd8 |
226 | specifies that files processed by the <> construct are to be edited |
227 | in-place. |
228 | It does this by renaming the input file, opening the output file by the |
229 | same name, and selecting that output file as the default for print statements. |
230 | The extension, if supplied, is added to the name of the |
231 | old file to make a backup copy. |
232 | If no extension is supplied, no backup is made. |
a687059c |
233 | Saying \*(L"perl \-p \-i.bak \-e "s/foo/bar/;" .\|.\|. \*(R" is the same as using |
8d063cd8 |
234 | the script: |
235 | .nf |
236 | |
237 | .ne 2 |
a687059c |
238 | #!/usr/bin/perl \-pi.bak |
8d063cd8 |
239 | s/foo/bar/; |
240 | |
241 | which is equivalent to |
242 | |
243 | .ne 14 |
378cc40b |
244 | #!/usr/bin/perl |
8d063cd8 |
245 | while (<>) { |
246 | if ($ARGV ne $oldargv) { |
a687059c |
247 | rename($ARGV, $ARGV . \'.bak\'); |
248 | open(ARGVOUT, ">$ARGV"); |
8d063cd8 |
249 | select(ARGVOUT); |
250 | $oldargv = $ARGV; |
251 | } |
252 | s/foo/bar/; |
253 | } |
254 | continue { |
255 | print; # this prints to original filename |
256 | } |
a687059c |
257 | select(STDOUT); |
8d063cd8 |
258 | |
259 | .fi |
a687059c |
260 | except that the |
261 | .B \-i |
262 | form doesn't need to compare $ARGV to $oldargv to know when |
8d063cd8 |
263 | the filename has changed. |
264 | It does, however, use ARGVOUT for the selected filehandle. |
a687059c |
265 | Note that |
266 | .I STDOUT |
267 | is restored as the default output filehandle after the loop. |
378cc40b |
268 | .Sp |
269 | You can use eof to locate the end of each input file, in case you want |
270 | to append to each file, or reset line numbering (see example under eof). |
8d063cd8 |
271 | .TP 5 |
a687059c |
272 | .BI \-I directory |
8d063cd8 |
273 | may be used in conjunction with |
274 | .B \-P |
275 | to tell the C preprocessor where to look for include files. |
276 | By default /usr/include and /usr/lib/perl are searched. |
277 | .TP 5 |
278 | .B \-n |
279 | causes |
280 | .I perl |
281 | to assume the following loop around your script, which makes it iterate |
a687059c |
282 | over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR: |
8d063cd8 |
283 | .nf |
284 | |
285 | .ne 3 |
286 | while (<>) { |
378cc40b |
287 | .\|.\|. # your script goes here |
8d063cd8 |
288 | } |
289 | |
290 | .fi |
291 | Note that the lines are not printed by default. |
292 | See |
293 | .B \-p |
294 | to have lines printed. |
378cc40b |
295 | Here is an efficient way to delete all files older than a week: |
296 | .nf |
297 | |
a687059c |
298 | find . \-mtime +7 \-print | perl \-ne \'chop;unlink;\' |
378cc40b |
299 | |
300 | .fi |
a687059c |
301 | This is faster than using the \-exec switch of find because you don't have to |
378cc40b |
302 | start a process on every filename found. |
8d063cd8 |
303 | .TP 5 |
304 | .B \-p |
305 | causes |
306 | .I perl |
307 | to assume the following loop around your script, which makes it iterate |
308 | over filename arguments somewhat like \fIsed\fR: |
309 | .nf |
310 | |
311 | .ne 5 |
312 | while (<>) { |
378cc40b |
313 | .\|.\|. # your script goes here |
8d063cd8 |
314 | } continue { |
315 | print; |
316 | } |
317 | |
318 | .fi |
319 | Note that the lines are printed automatically. |
320 | To suppress printing use the |
321 | .B \-n |
322 | switch. |
83b4785a |
323 | A |
324 | .B \-p |
325 | overrides a |
326 | .B \-n |
327 | switch. |
8d063cd8 |
328 | .TP 5 |
329 | .B \-P |
330 | causes your script to be run through the C preprocessor before |
331 | compilation by |
a687059c |
332 | .IR perl . |
8d063cd8 |
333 | (Since both comments and cpp directives begin with the # character, |
334 | you should avoid starting comments with any words recognized |
335 | by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".) |
336 | .TP 5 |
337 | .B \-s |
338 | enables some rudimentary switch parsing for switches on the command line |
a687059c |
339 | after the script name but before any filename arguments (or before a \-\|\-). |
83b4785a |
340 | Any switch found there is removed from @ARGV and sets the corresponding variable in the |
8d063cd8 |
341 | .I perl |
342 | script. |
343 | The following script prints \*(L"true\*(R" if and only if the script is |
a687059c |
344 | invoked with a \-xyz switch. |
8d063cd8 |
345 | .nf |
346 | |
347 | .ne 2 |
a687059c |
348 | #!/usr/bin/perl \-s |
83b4785a |
349 | if ($xyz) { print "true\en"; } |
8d063cd8 |
350 | |
351 | .fi |
378cc40b |
352 | .TP 5 |
353 | .B \-S |
a687059c |
354 | makes |
355 | .I perl |
356 | use the PATH environment variable to search for the script |
378cc40b |
357 | (unless the name of the script starts with a slash). |
358 | Typically this is used to emulate #! startup on machines that don't |
359 | support #!, in the following manner: |
360 | .nf |
361 | |
362 | #!/usr/bin/perl |
a687059c |
363 | eval "exec /usr/bin/perl \-S $0 $*" |
378cc40b |
364 | if $running_under_some_shell; |
365 | |
366 | .fi |
367 | The system ignores the first line and feeds the script to /bin/sh, |
a687059c |
368 | which proceeds to try to execute the |
369 | .I perl |
370 | script as a shell script. |
378cc40b |
371 | The shell executes the second line as a normal shell command, and thus |
a687059c |
372 | starts up the |
373 | .I perl |
374 | interpreter. |
378cc40b |
375 | On some systems $0 doesn't always contain the full pathname, |
a687059c |
376 | so the |
377 | .B \-S |
378 | tells |
379 | .I perl |
380 | to search for the script if necessary. |
381 | After |
382 | .I perl |
383 | locates the script, it parses the lines and ignores them because |
378cc40b |
384 | the variable $running_under_some_shell is never true. |
ae986130 |
385 | A better construct than $* would be ${1+"$@"}, which handles embedded spaces |
386 | and such in the filenames, but doesn't work if the script is being interpreted |
387 | by csh. |
388 | In order to start up sh rather than csh, some systems may have to replace the |
389 | #! line with a line containing just |
390 | a colon, which will be politely ignored by perl. |
450a55e4 |
391 | Other systems can't control that, and need a totally devious construct that |
392 | will work under any of csh, sh or perl, such as the following: |
393 | .nf |
394 | |
395 | .ne 3 |
396 | eval '(exit $?0)' && eval 'exec /usr/bin/perl -S $0 ${1+"$@"}' |
397 | & eval 'exec /usr/bin/perl -S $0 $argv:q' |
398 | if 0; |
399 | |
400 | .fi |
378cc40b |
401 | .TP 5 |
a687059c |
402 | .B \-u |
403 | causes |
404 | .I perl |
405 | to dump core after compiling your script. |
406 | You can then take this core dump and turn it into an executable file |
407 | by using the undump program (not supplied). |
408 | This speeds startup at the expense of some disk space (which you can |
409 | minimize by stripping the executable). |
410 | (Still, a "hello world" executable comes out to about 200K on my machine.) |
411 | If you are going to run your executable as a set-id program then you |
412 | should probably compile it using taintperl rather than normal perl. |
413 | If you want to execute a portion of your script before dumping, use the |
414 | dump operator instead. |
450a55e4 |
415 | Note: undump is platform specific and may not be available |
416 | for a given port of perl. |
a687059c |
417 | .TP 5 |
378cc40b |
418 | .B \-U |
a687059c |
419 | allows |
420 | .I perl |
421 | to do unsafe operations. |
13281fa4 |
422 | Currently the only \*(L"unsafe\*(R" operation is the unlinking of directories while |
378cc40b |
423 | running as superuser. |
424 | .TP 5 |
425 | .B \-v |
a687059c |
426 | prints the version and patchlevel of your |
427 | .I perl |
428 | executable. |
378cc40b |
429 | .TP 5 |
430 | .B \-w |
431 | prints warnings about identifiers that are mentioned only once, and scalar |
432 | variables that are used before being set. |
433 | Also warns about redefined subroutines, and references to undefined |
a687059c |
434 | filehandles or filehandles opened readonly that you are attempting to |
435 | write on. |
436 | Also warns you if you use == on values that don't look like numbers, and if |
437 | your subroutines recurse more than 100 deep. |
450a55e4 |
438 | .TP 5 |
439 | .BI \-x directory |
440 | tells |
441 | .I perl |
442 | that the script is embedded in a message. |
443 | Leading garbage will be discarded until the first line that starts |
444 | with #! and contains the string "perl". |
445 | Any meaningful switches on that line will be applied (but only one |
446 | group of switches, as with normal #! processing). |
447 | If a directory name is specified, Perl will switch to that directory |
448 | before running the script. |
449 | The |
450 | .B \-x |
451 | switch only controls the disposal of leading garbage. |
452 | The script must be terminated with __END__ if there is trailing garbage |
453 | to be ignored (the script can process any or all of the trailing garbage |
454 | via standard input if desired). |
8d063cd8 |
455 | .Sh "Data Types and Objects" |
456 | .PP |
a687059c |
457 | .I Perl |
458 | has three data types: scalars, arrays of scalars, and |
459 | associative arrays of scalars. |
460 | Normal arrays are indexed by number, and associative arrays by string. |
8d063cd8 |
461 | .PP |
a687059c |
462 | The interpretation of operations and values in perl sometimes |
463 | depends on the requirements |
464 | of the context around the operation or value. |
465 | There are three major contexts: string, numeric and array. |
466 | Certain operations return array values |
467 | in contexts wanting an array, and scalar values otherwise. |
468 | (If this is true of an operation it will be mentioned in the documentation |
469 | for that operation.) |
470 | Operations which return scalars don't care whether the context is looking |
471 | for a string or a number, but |
472 | scalar variables and values are interpreted as strings or numbers |
473 | as appropriate to the context. |
378cc40b |
474 | A scalar is interpreted as TRUE in the boolean sense if it is not the null |
8d063cd8 |
475 | string or 0. |
ffed7fef |
476 | Booleans returned by operators are 1 for true and 0 or \'\' (the null |
8d063cd8 |
477 | string) for false. |
478 | .PP |
a687059c |
479 | There are actually two varieties of null string: defined and undefined. |
480 | Undefined null strings are returned when there is no real value for something, |
481 | such as when there was an error, or at end of file, or when you refer |
482 | to an uninitialized variable or element of an array. |
483 | An undefined null string may become defined the first time you access it, but |
484 | prior to that you can use the defined() operator to determine whether the |
485 | value is defined or not. |
486 | .PP |
378cc40b |
487 | References to scalar variables always begin with \*(L'$\*(R', even when referring |
488 | to a scalar that is part of an array. |
8d063cd8 |
489 | Thus: |
490 | .nf |
491 | |
492 | .ne 3 |
378cc40b |
493 | $days \h'|2i'# a simple scalar variable |
8d063cd8 |
494 | $days[28] \h'|2i'# 29th element of array @days |
a687059c |
495 | $days{\'Feb\'}\h'|2i'# one value from an associative array |
378cc40b |
496 | $#days \h'|2i'# last index of array @days |
8d063cd8 |
497 | |
a687059c |
498 | but entire arrays or array slices are denoted by \*(L'@\*(R': |
8d063cd8 |
499 | |
500 | @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n]) |
a687059c |
501 | @days[3,4,5]\h'|2i'# same as @days[3.\|.5] |
502 | @days{'a','c'}\h'|2i'# same as ($days{'a'},$days{'c'}) |
503 | |
504 | and entire associative arrays are denoted by \*(L'%\*(R': |
8d063cd8 |
505 | |
a687059c |
506 | %days \h'|2i'# (key1, val1, key2, val2 .\|.\|.) |
8d063cd8 |
507 | .fi |
508 | .PP |
a687059c |
509 | Any of these eight constructs may serve as an lvalue, |
378cc40b |
510 | that is, may be assigned to. |
a687059c |
511 | (It also turns out that an assignment is itself an lvalue in |
512 | certain contexts\*(--see examples under s, tr and chop.) |
513 | Assignment to a scalar evaluates the righthand side in a scalar context, |
514 | while assignment to an array or array slice evaluates the righthand side |
515 | in an array context. |
516 | .PP |
378cc40b |
517 | You may find the length of array @days by evaluating |
8d063cd8 |
518 | \*(L"$#days\*(R", as in |
519 | .IR csh . |
378cc40b |
520 | (Actually, it's not the length of the array, it's the subscript of the last element, since there is (ordinarily) a 0th element.) |
521 | Assigning to $#days changes the length of the array. |
522 | Shortening an array by this method does not actually destroy any values. |
523 | Lengthening an array that was previously shortened recovers the values that |
524 | were in those elements. |
525 | You can also gain some measure of efficiency by preextending an array that |
526 | is going to get big. |
527 | (You can also extend an array by assigning to an element that is off the |
528 | end of the array. |
529 | This differs from assigning to $#whatever in that intervening values |
530 | are set to null rather than recovered.) |
531 | You can truncate an array down to nothing by assigning the null list () to |
532 | it. |
533 | The following are exactly equivalent: |
534 | .nf |
535 | |
536 | @whatever = (); |
537 | $#whatever = $[ \- 1; |
538 | |
539 | .fi |
8d063cd8 |
540 | .PP |
ac58e20f |
541 | If you evaluate an array in a scalar context, it returns the length of |
542 | the array. |
543 | The following is always true: |
544 | .nf |
545 | |
546 | @whatever == $#whatever \- $[ + 1; |
547 | |
548 | .fi |
549 | .PP |
a687059c |
550 | Multi-dimensional arrays are not directly supported, but see the discussion |
551 | of the $; variable later for a means of emulating multiple subscripts with |
552 | an associative array. |
ac58e20f |
553 | You could also write a subroutine to turn multiple subscripts into a single |
554 | subscript. |
a687059c |
555 | .PP |
8d063cd8 |
556 | Every data type has its own namespace. |
378cc40b |
557 | You can, without fear of conflict, use the same name for a scalar variable, |
8d063cd8 |
558 | an array, an associative array, a filehandle, a subroutine name, and/or |
559 | a label. |
a687059c |
560 | Since variable and array references always start with \*(L'$\*(R', \*(L'@\*(R', |
561 | or \*(L'%\*(R', the \*(L"reserved\*(R" words aren't in fact reserved |
8d063cd8 |
562 | with respect to variable names. |
563 | (They ARE reserved with respect to labels and filehandles, however, which |
378cc40b |
564 | don't have an initial special character. |
a687059c |
565 | Hint: you could say open(LOG,\'logfile\') rather than open(log,\'logfile\'). |
566 | Using uppercase filehandles also improves readability and protects you |
567 | from conflict with future reserved words.) |
8d063cd8 |
568 | Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all |
569 | different names. |
570 | Names which start with a letter may also contain digits and underscores. |
571 | Names which do not start with a letter are limited to one character, |
572 | e.g. \*(L"$%\*(R" or \*(L"$$\*(R". |
a687059c |
573 | (Most of the one character names have a predefined significance to |
574 | .IR perl . |
8d063cd8 |
575 | More later.) |
576 | .PP |
a687059c |
577 | Numeric literals are specified in any of the usual floating point or |
578 | integer formats: |
579 | .nf |
580 | |
581 | .ne 5 |
582 | 12345 |
583 | 12345.67 |
584 | .23E-10 |
585 | 0xffff # hex |
586 | 0377 # octal |
587 | |
588 | .fi |
8d063cd8 |
589 | String literals are delimited by either single or double quotes. |
590 | They work much like shell quotes: |
591 | double-quoted string literals are subject to backslash and variable |
a687059c |
592 | substitution; single-quoted strings are not (except for \e\' and \e\e). |
8d063cd8 |
593 | The usual backslash rules apply for making characters such as newline, tab, etc. |
594 | You can also embed newlines directly in your strings, i.e. they can end on |
595 | a different line than they begin. |
596 | This is nice, but if you forget your trailing quote, the error will not be |
a687059c |
597 | reported until |
598 | .I perl |
599 | finds another line containing the quote character, which |
8d063cd8 |
600 | may be much further on in the script. |
a687059c |
601 | Variable substitution inside strings is limited to scalar variables, normal |
602 | array values, and array slices. |
603 | (In other words, identifiers beginning with $ or @, followed by an optional |
604 | bracketed expression as a subscript.) |
8d063cd8 |
605 | The following code segment prints out \*(L"The price is $100.\*(R" |
606 | .nf |
607 | |
608 | .ne 2 |
a687059c |
609 | $Price = \'$100\';\h'|3.5i'# not interpreted |
8d063cd8 |
610 | print "The price is $Price.\e\|n";\h'|3.5i'# interpreted |
611 | |
612 | .fi |
83b4785a |
613 | Note that you can put curly brackets around the identifier to delimit it |
614 | from following alphanumerics. |
ae986130 |
615 | Also note that a single quoted string must be separated from a preceding |
616 | word by a space, since single quote is a valid character in an identifier |
617 | (see Packages). |
8d063cd8 |
618 | .PP |
450a55e4 |
619 | Two special literals are __LINE__ and __FILE__, which represent the current |
620 | line number and filename at that point in your program. |
621 | They may only be used as separate tokens; they will not be interpolated |
622 | into strings. |
623 | In addition, the token __END__ may be used to indicate the logical end of the |
624 | script before the actual end of file. |
625 | Any following text is ignored (but if the script is being read from |
626 | the standard input, then the rest of the input is available by reading |
627 | from filehandle STDIN). |
628 | The two control characters ^D and ^Z are synonyms for __END__. |
629 | .PP |
630 | A word that doesn't have any other interpretation in the grammar will be |
631 | treated as if it had single quotes around it. |
632 | For this purpose, a word consists only of alphanumeric characters and underline, |
633 | and must start with an alphabetic character. |
634 | As with filehandles and labels, a bare word that consists entirely of |
635 | lowercase letters risks conflict with future reserved words, and if you |
636 | use the |
637 | .B \-w |
638 | switch, Perl will warn you about any such words. |
639 | .PP |
a687059c |
640 | Array values are interpolated into double-quoted strings by joining all the |
641 | elements of the array with the delimiter specified in the $" variable, |
642 | space by default. |
643 | (Since in versions of perl prior to 3.0 the @ character was not a metacharacter |
644 | in double-quoted strings, the interpolation of @array, $array[EXPR], |
645 | @array[LIST], $array{EXPR}, or @array{LIST} only happens if array is |
646 | referenced elsewhere in the program or is predefined.) |
647 | The following are equivalent: |
648 | .nf |
649 | |
650 | .ne 4 |
651 | $temp = join($",@ARGV); |
652 | system "echo $temp"; |
653 | |
654 | system "echo @ARGV"; |
655 | |
656 | .fi |
ae986130 |
657 | Within search patterns (which also undergo double-quotish substitution) |
a687059c |
658 | there is a bad ambiguity: Is /$foo[bar]/ to be |
659 | interpreted as /${foo}[bar]/ (where [bar] is a character class for the |
660 | regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to |
661 | array @foo)? |
662 | If @foo doesn't otherwise exist, then it's obviously a character class. |
663 | If @foo exists, perl takes a good guess about [bar], and is almost always right. |
664 | If it does guess wrong, or if you're just plain paranoid, |
665 | you can force the correct interpretation with curly brackets as above. |
666 | .PP |
667 | A line-oriented form of quoting is based on the shell here-is syntax. |
668 | Following a << you specify a string to terminate the quoted material, and all lines |
669 | following the current line down to the terminating string are the value |
670 | of the item. |
671 | The terminating string may be either an identifier (a word), or some |
672 | quoted text. |
673 | If quoted, the type of quotes you use determines the treatment of the text, |
674 | just as in regular quoting. |
675 | An unquoted identifier works like double quotes. |
676 | There must be no space between the << and the identifier. |
677 | (If you put a space it will be treated as a null identifier, which is |
678 | valid, and matches the first blank line\*(--see Merry Christmas example below.) |
679 | The terminating string must appear by itself (unquoted and with no surrounding |
680 | whitespace) on the terminating line. |
681 | .nf |
682 | |
683 | print <<EOF; # same as above |
684 | The price is $Price. |
685 | EOF |
686 | |
687 | print <<"EOF"; # same as above |
688 | The price is $Price. |
689 | EOF |
690 | |
691 | print << x 10; # null identifier is delimiter |
692 | Merry Christmas! |
693 | |
694 | print <<`EOC`; # execute commands |
695 | echo hi there |
696 | echo lo there |
697 | EOC |
698 | |
699 | print <<foo, <<bar; # you can stack them |
700 | I said foo. |
701 | foo |
702 | I said bar. |
703 | bar |
704 | |
705 | .fi |
8d063cd8 |
706 | Array literals are denoted by separating individual values by commas, and |
79a0689e |
707 | enclosing the list in parentheses: |
708 | .nf |
709 | |
710 | (LIST) |
711 | |
712 | .fi |
8d063cd8 |
713 | In a context not requiring an array value, the value of the array literal |
714 | is the value of the final element, as in the C comma operator. |
715 | For example, |
716 | .nf |
717 | |
83b4785a |
718 | .ne 4 |
a687059c |
719 | @foo = (\'cc\', \'\-E\', $bar); |
8d063cd8 |
720 | |
721 | assigns the entire array value to array foo, but |
722 | |
a687059c |
723 | $foo = (\'cc\', \'\-E\', $bar); |
8d063cd8 |
724 | |
725 | .fi |
726 | assigns the value of variable bar to variable foo. |
79a0689e |
727 | Note that the value of an actual array in a scalar context is the length |
728 | of the array; the following assigns to $foo the value 3: |
729 | .nf |
730 | |
731 | .ne 2 |
732 | @foo = (\'cc\', \'\-E\', $bar); |
733 | $foo = @foo; # $foo gets 3 |
734 | |
735 | .fi |
736 | You may have an optional comma before the closing parenthesis of an |
737 | array literal, so that you can say: |
738 | .nf |
739 | |
740 | @foo = ( |
741 | 1, |
742 | 2, |
743 | 3, |
744 | ); |
745 | |
746 | .fi |
747 | When a LIST is evaluated, each element of the list is evaluated in |
748 | an array context, and the resulting array value is interpolated into LIST |
749 | just as if each individual element were a member of LIST. Thus arrays |
750 | lose their identity in a LIST\*(--the list |
751 | |
752 | (@foo,@bar,&SomeSub) |
753 | |
754 | contains all the elements of @foo followed by all the elements of @bar, |
755 | followed by all the elements returned by the subroutine named SomeSub. |
756 | .PP |
757 | A list value may also be subscripted like a normal array. |
758 | Examples: |
759 | .nf |
760 | |
761 | $time = (stat($file))[8]; # stat returns array value |
762 | $digit = ('a','b','c','d','e','f')[$digit-10]; |
763 | return (pop(@foo),pop(@foo))[0]; |
764 | |
765 | .fi |
766 | .PP |
8d063cd8 |
767 | Array lists may be assigned to if and only if each element of the list |
768 | is an lvalue: |
769 | .nf |
770 | |
771 | ($a, $b, $c) = (1, 2, 3); |
772 | |
a687059c |
773 | ($map{\'red\'}, $map{\'blue\'}, $map{\'green\'}) = (0x00f, 0x0f0, 0xf00); |
774 | |
775 | The final element may be an array or an associative array: |
776 | |
777 | ($a, $b, @rest) = split; |
778 | local($a, $b, %rest) = @_; |
8d063cd8 |
779 | |
780 | .fi |
a687059c |
781 | You can actually put an array anywhere in the list, but the first array |
782 | in the list will soak up all the values, and anything after it will get |
783 | a null value. |
784 | This may be useful in a local(). |
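A small sketch of the soak-up rule, assuming a Perl 5 interpreter (where the leftover scalar ends up undefined rather than merely null); all variable names here are hypothetical:

```perl
use strict;
use warnings;

# The array in final position collects whatever is left over.
my ($first, @rest) = (1, 2, 3);
print "first=$first rest=@rest\n";

# An array in the middle soaks up everything after $x,
# so $y receives no value at all.
my ($x, @middle, $y) = (1, 2, 3);
print "x=$x middle=@middle y=", (defined $y ? $y : 'undef'), "\n";
```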
8d063cd8 |
785 | .PP |
a687059c |
786 | An associative array literal contains pairs of values to be interpreted |
787 | as a key and a value: |
788 | .nf |
789 | |
790 | .ne 2 |
791 | # same as map assignment above |
792 | %map = ('red',0x00f,'blue',0x0f0,'green',0xf00); |
793 | |
794 | .fi |
795 | Array assignment in a scalar context returns the number of elements |
796 | produced by the expression on the right side of the assignment: |
797 | .nf |
798 | |
799 | $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2 |
800 | |
801 | .fi |
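One use of this return value is to test whether the right side produced anything at all. The following sketch assumes a Perl 5 interpreter, and its variable names are only illustrative:

```perl
use strict;
use warnings;

my ($foo, $bar);

# Counts the three values on the right, not the two variables on the left.
my $x = (($foo, $bar) = (3, 2, 1));
print "x=$x foo=$foo bar=$bar\n";

# An empty right side yields 0, which is false in a conditional.
my $got = (my ($k, $v) = ());
print "got=$got\n";
```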
8d063cd8 |
802 | .PP |
803 | There are several other pseudo-literals that you should know about. |
378cc40b |
804 | If a string is enclosed by backticks (grave accents), it first undergoes |
805 | variable substitution just like a double quoted string. |
806 | It is then interpreted as a command, and the output of that command |
807 | is the value of the pseudo-literal, as in a shell. |
450a55e4 |
808 | In a scalar context, a single string consisting of all the output is |
809 | returned. |
810 | In an array context, an array of values is returned, one for each line |
811 | of output. |
812 | (You can set $/ to use a different line terminator.) |
8d063cd8 |
813 | The command is executed each time the pseudo-literal is evaluated. |
378cc40b |
814 | The status value of the command is returned in $? (see Predefined Names |
815 | for the interpretation of $?). |
816 | Unlike in \f2csh\f1, no translation is done on the return |
8d063cd8 |
817 | data\*(--newlines remain newlines. |
378cc40b |
818 | Unlike in any of the shells, single quotes do not hide variable names |
819 | in the command from interpretation. |
820 | To pass a $ through to the shell you need to hide it with a backslash. |
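The points above can be sketched as follows, assuming a Unix-like system with /bin/sh and echo available, and a Perl 5 interpreter (where the exit code sits in the high byte of $?):

```perl
use strict;
use warnings;

# Scalar context: all of the output in a single string.
my $all = `echo one; echo two`;

# Array context: one element per line of output.
my @lines = `echo one; echo two`;
print "scalar: $all";
print "lines: ", scalar(@lines), "\n";

# The command's status lands in $?.
my $ignored = `sh -c 'exit 3'`;
my $status  = $? >> 8;
print "exit status: $status\n";

# A backslash hides the $ from perl, so the shell expands HOME itself.
my $home = `echo \$HOME`;
print "shell says HOME is $home";
```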
8d063cd8 |
821 | .PP |
822 | Evaluating a filehandle in angle brackets yields the next line |
a687059c |
823 | from that file (newline included, so it's never false until EOF, at |
824 | which time an undefined value is returned). |
8d063cd8 |
825 | Ordinarily you must assign that value to a variable, |
ac58e20f |
826 | but there is one situation where an automatic assignment happens. |
8d063cd8 |
827 | If (and only if) the input symbol is the only thing inside the conditional of a |
828 | .I while |
829 | loop, the value is |
830 | automatically assigned to the variable \*(L"$_\*(R". |
831 | (This may seem like an odd thing to you, but you'll use the construct |
832 | in almost every |
833 | .I perl |
834 | script you write.) |
835 | Anyway, the following lines are equivalent to each other: |
836 | .nf |
837 | |
a687059c |
838 | .ne 5 |
839 | while ($_ = <STDIN>) { print; } |
840 | while (<STDIN>) { print; } |
841 | for (\|;\|<STDIN>;\|) { print; } |
842 | print while $_ = <STDIN>; |
843 | print while <STDIN>; |
8d063cd8 |
844 | |
845 | .fi |
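To try the implicit assignment without typing at a terminal, the sketch below reads from an in-memory filehandle. This is a Perl 5.8 or later feature, not something this manual describes, and $text and @seen are made-up names:

```perl
use strict;
use warnings;

# Hypothetical input held in a string rather than a real file.
my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";

my @seen;
while (<$fh>) {        # each line is assigned to $_ automatically
    chomp;             # strip the trailing newline from $_
    push @seen, $_;
}
close($fh);

print join(',', @seen), "\n";
```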
846 | The filehandles |
a687059c |
847 | .IR STDIN , |
848 | .I STDOUT |
849 | and |
850 | .I STDERR |
851 | are predefined. |
852 | (The filehandles |
8d063cd8 |
853 | .IR stdin , |
854 | .I stdout |
855 | and |
856 | .I stderr |
a687059c |
857 | will also work except in packages, where they would be interpreted as |
858 | local identifiers rather than global.) |
8d063cd8 |
859 | Additional filehandles may be created with the |
860 | .I open |
861 | function. |
862 | .PP |
378cc40b |
863 | If a <FILEHANDLE> is used in a context that is looking for an array, an array |
864 | consisting of all the input lines is returned, one line per array element. |
865 | It's easy to make a LARGE data space this way, so use with care. |
866 | .PP |
8d063cd8 |
867 | The null filehandle <> is special and can be used to emulate the behavior of |
868 | \fIsed\fR and \fIawk\fR. |
869 | Input from <> comes either from standard input, or from each file listed on |
870 | the command line. |
871 | Here's how it works: the first time <> is evaluated, the ARGV array is checked, |
a687059c |
872 | and if it is null, $ARGV[0] is set to \'-\', which when opened gives you standard |
8d063cd8 |
873 | input. |
874 | The ARGV array is then processed as a list of filenames. |
875 | The loop |
876 | .nf |
877 | |
878 | .ne 3 |
879 | while (<>) { |
880 | .\|.\|. # code for each line |
881 | } |
882 | |
883 | .ne 10 |
884 | is equivalent to |
885 | |
a687059c |
886 | unshift(@ARGV, \'\-\') \|if \|$#ARGV < $[; |
8d063cd8 |
887 | while ($ARGV = shift) { |
888 | open(ARGV, $ARGV); |
889 | while (<ARGV>) { |
890 | .\|.\|. # code for each line |
891 | } |
892 | } |
893 | |
894 | .fi |
895 | except that it isn't as cumbersome to say. |
896 | It really does shift array ARGV and put the current filename into |
897 | variable ARGV. |
898 | It also uses filehandle ARGV internally. |
899 | You can modify @ARGV before the first <> as long as you leave the first |
900 | filename at the beginning of the array. |
83b4785a |
901 | Line numbers ($.) continue as if the input was one big happy file. |
378cc40b |
902 | (But see example under eof for how to reset line numbers on each file.) |
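Here is a sketch of <> walking through @ARGV, assuming a Perl 5 interpreter with the File::Temp module; the two temporary files and their contents are invented for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small scratch files to stand in for command-line arguments.
my @names;
for my $body ("a1\na2\n", "b1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $body;
    close($fh) or die "Can't close $name: $!";
    push @names, $name;
}

# Set @ARGV before the first <>, then read every file in turn.
@ARGV = @names;
my @report;
while (<>) {
    chomp;
    push @report, "$.:$_";   # $. keeps counting across files
}
print "@report\n";
print "left in \@ARGV: ", scalar(@ARGV), "\n";
```

The line numbers run 1, 2, 3 across both files, and @ARGV is empty afterwards because each name was shifted off as it was opened.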
8d063cd8 |
903 | .PP |
83b4785a |
904 | .ne 5 |
378cc40b |
905 | If you want to set @ARGV to your own list of files, go right ahead. |
8d063cd8 |
906 | If you want to pass switches into your script, you can |
907 | put a loop on the front like this: |
908 | .nf |
909 | |
910 | .ne 10 |
911 | while ($_ = $ARGV[0], /\|^\-/\|) { |
912 | shift; |
913 | last if /\|^\-\|\-$\|/\|; |
914 | /\|^\-D\|(.*\|)/ \|&& \|($debug = $1); |
915 | /\|^\-v\|/ \|&& \|$verbose++; |
916 | .\|.\|. # other switches |
917 | } |
918 | while (<>) { |
919 | .\|.\|. # code for each line |
920 | } |
921 | |
922 | .fi |
923 | The <> symbol will return FALSE only once. |
924 | If you call it again after this it will assume you are processing another |
a687059c |
925 | @ARGV list, and if you haven't set @ARGV, will input from |
926 | .IR STDIN . |
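The switch loop above can be exercised on its own. This variant is a sketch for a Perl 5 interpreter running under "use strict": it tests @ARGV before peeking at its first element so that an empty list is harmless, and the argument list itself is made up:

```perl
use strict;
use warnings;

my ($debug, $verbose) = ('', 0);

# Hypothetical command line, assigned here so the sketch is self-contained.
@ARGV = ('-v', '-Dtrace', '--', '-not-a-switch', 'file.txt');

while (@ARGV && $ARGV[0] =~ /^-/) {
    $_ = shift @ARGV;
    last if /^--$/;                  # "--" ends switch processing
    /^-D(.*)/ && ($debug = $1);      # -Dvalue
    /^-v/     && $verbose++;         # -v may be repeated
}

print "debug=$debug verbose=$verbose rest=@ARGV\n";
```

Whatever remains in @ARGV after the loop is what a following while (<>) would read.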
378cc40b |
927 | .PP |
928 | If the string inside the angle brackets is a reference to a scalar variable |
929 | (e.g. <$foo>), |
930 | then that variable contains the name of the filehandle to input from. |
931 | .PP |
932 | If the string inside angle brackets is not a filehandle, it is interpreted |
933 | as a filename pattern to be globbed, and either an array of filenames or the |
934 | next filename in the list is returned, depending on context. |
935 | One level of $ interpretation is done first, but you can't say <$foo> |
936 | because that's an indirect filehandle as explained in the previous |
937 | paragraph. |
938 | You could insert curly brackets to force interpretation as a |
939 | filename glob: <${foo}>. |
940 | Example: |
941 | .nf |
942 | |
943 | .ne 3 |
944 | while (<*.c>) { |
a687059c |
945 | chmod 0644, $_; |
378cc40b |
946 | } |
947 | |
948 | is equivalent to |
949 | |
950 | .ne 5 |
a687059c |
951 | open(foo, "echo *.c | tr \-s \' \et\er\ef\' \'\e\e012\e\e012\e\e012\e\e012\'|"); |
378cc40b |
952 | while (<foo>) { |
953 | chop; |
a687059c |
954 | chmod 0644, $_; |
378cc40b |
955 | } |
956 | |
957 | .fi |
958 | In fact, it's currently implemented that way. |
a687059c |
959 | (Which means it will not work on filenames with spaces in them unless |
960 | you have /bin/csh on your machine.) |
378cc40b |
961 | Of course, the shortest way to do the above is: |
962 | .nf |
963 | |
a687059c |
964 | chmod 0644, <*.c>; |
378cc40b |
965 | |
966 | .fi |
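The same idea can be tried safely in a scratch directory. This sketch assumes a Perl 5 interpreter, where the glob() function is the spelled-out form of the angle-bracket glob, and a temporary directory whose path contains no spaces; the filenames are invented:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.c', 'b.c', 'notes.txt') {
    open(my $fh, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
    close($fh);
}

# Array context: all matching filenames at once.
my @sources = glob("$dir/*.c");

# chmod returns the number of files it changed.
my $changed = chmod 0644, @sources;
print "matched ", scalar(@sources), ", chmod changed $changed\n";
```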
8d063cd8 |
967 | .Sh "Syntax" |
968 | .PP |
969 | A |
970 | .I perl |
971 | script consists of a sequence of declarations and commands. |
972 | The only things that need to be declared in |
973 | .I perl |
974 | are report formats and subroutines. |
975 | See the sections below for more information on those declarations. |
ffed7fef |
976 | All uninitialized user-created objects are assumed to |
a687059c |
977 | start with a null or 0 value until they |
978 | are defined by some explicit operation such as assignment. |
8d063cd8 |
979 | The sequence of commands is executed just once, unlike in |
980 | .I sed |
981 | and |
982 | .I awk |
983 | scripts, where the sequence of commands is executed for each input line. |
984 | While this means that you must explicitly loop over the lines of your input file |
985 | (or files), it also means you have much more control over which files and which |
986 | lines you look at. |
987 | (Actually, I'm lying\*(--it is possible to do an implicit loop with either the |
988 | .B \-n |
989 | or |
990 | .B \-p |
991 | switch.) |
992 | .PP |
993 | A declaration can be put anywhere a command can, but has no effect on the |
a687059c |
994 | execution of the primary sequence of commands\*(--declarations all take effect |
995 | at compile time. |
8d063cd8 |
996 | Typically all the declarations are put at the beginning or the end of the script. |
997 | .PP |
998 | .I Perl |
999 | is, for the most part, a free-form language. |
1000 | (The only exception to this is format declarations, for fairly obvious reasons.) |
1001 | Comments are indicated by the # character, and extend to the end of the line. |
1002 | If you attempt to use /* */ C comments, they will be interpreted either as |
1003 | division or pattern matching, depending on the context. |
1004 | So don't do that. |
1005 | .Sh "Compound statements" |
1006 | In |
1007 | .IR perl , |
1008 | a sequence of commands may be treated as one command by enclosing it |
1009 | in curly brackets. |
1010 | We will call this a BLOCK. |
1011 | .PP |
1012 | The following compound commands may be used to control flow: |
1013 | .nf |
1014 | |
1015 | .ne 4 |
1016 | if (EXPR) BLOCK |
1017 | if (EXPR) BLOCK else BLOCK |
378cc40b |
1018 | if (EXPR) BLOCK elsif (EXPR) BLOCK .\|.\|. else BLOCK |
8d063cd8 |
1019 | LABEL while (EXPR) BLOCK |
1020 | LABEL while (EXPR) BLOCK continue BLOCK |
1021 | LABEL for (EXPR; EXPR; EXPR) BLOCK |
378cc40b |
1022 | LABEL foreach VAR (ARRAY) BLOCK |
8d063cd8 |
1023 | LABEL BLOCK continue BLOCK |
1024 | |
1025 | .fi |
83b4785a |
1026 | Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not |
8d063cd8 |
1027 | statements. |
1028 | This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed. |
1029 | If you want to write conditionals without curly brackets there are several |
1030 | other ways to do it. |
1031 | The following all do the same thing: |
1032 | .nf |
1033 | |
1034 | .ne 5 |
a687059c |
1035 | if (!open(foo)) { die "Can't open $foo: $!"; } |
1036 | die "Can't open $foo: $!" unless open(foo); |
1037 | open(foo) || die "Can't open $foo: $!"; # foo or bust! |
ac58e20f |
1038 | open(foo) ? \'hi mom\' : die "Can't open $foo: $!"; |
a687059c |
1039 | # a bit exotic, that last one |
8d063cd8 |
1040 | |
1041 | .fi |
8d063cd8 |
1042 | .PP |
1043 | The |
1044 | .I if |
1045 | statement is straightforward. |
1046 | Since BLOCKs are always bounded by curly brackets, there is never any |
1047 | ambiguity about which |
1048 | .I if |
1049 | an |
1050 | .I else |
1051 | goes with. |
1052 | If you use |
1053 | .I unless |
1054 | in place of |
1055 | .IR if , |
1056 | the sense of the test is reversed. |
1057 | .PP |
1058 | The |
1059 | .I while |
1060 | statement executes the block as long as the expression is true |
1061 | (does not evaluate to the null string or 0). |
1062 | The LABEL is optional, and if present, consists of an identifier followed by |
1063 | a colon. |
1064 | The LABEL identifies the loop for the loop control statements |
1065 | .IR next , |
a687059c |
1066 | .IR last , |
8d063cd8 |
1067 | and |
1068 | .I redo |
1069 | (see below). |
1070 | If there is a |
1071 | .I continue |
1072 | BLOCK, it is always executed just before |
1073 | the conditional is about to be evaluated again, similarly to the third part |
1074 | of a |
1075 | .I for |
1076 | loop in C. |
1077 | Thus it can be used to increment a loop variable, even when the loop has |
1078 | been continued via the |
1079 | .I next |
1080 | statement (similar to the C \*(L"continue\*(R" statement). |
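A short sketch of why the continue BLOCK matters (Perl 5 syntax assumed; $i and @odd are illustrative names): the increment still happens when next skips the rest of the body, so the loop cannot get stuck on an even number.

```perl
use strict;
use warnings;

my $i = 0;
my @odd;
while ($i < 10) {
    next if $i % 2 == 0;   # skip even numbers
    push @odd, $i;
} continue {
    $i++;                  # runs after every iteration, including skipped ones
}

print "@odd\n";
```

Had the increment been written at the bottom of the main BLOCK instead, the first next would have bypassed it and the loop would never end.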
1081 | .PP |
1082 | If the word |
1083 | .I while |
1084 | is replaced by the word |
1085 | .IR until , |
1086 | the sense of the test is reversed, but the conditional is still tested before |
1087 | the first iteration. |
1088 | .PP |
1089 | In either the |
1090 | .I if |
1091 | or the |
1092 | .I while |
1093 | statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional |
1094 | is true if the value of the last command in that block is true. |
1095 | .PP |
1096 | The |
1097 | .I for |
1098 | loop works exactly like the corresponding |
1099 | .I while |
1100 | loop: |
1101 | .nf |
1102 | |
1103 | .ne 12 |
1104 | for ($i = 1; $i < 10; $i++) { |
1105 | .\|.\|. |
1106 | } |
1107 | |
1108 | is the same as |
1109 | |
1110 | $i = 1; |
1111 | while ($i < 10) { |
1112 | .\|.\|. |
1113 | } continue { |
1114 | $i++; |
1115 | } |
1116 | .fi |
1117 | .PP |
378cc40b |
1118 | The foreach loop iterates over a normal array value and sets the variable |
1119 | VAR to be each element of the array in turn. |
450a55e4 |
1120 | The variable is implicitly local to the loop, and regains its former value |
1121 | upon exiting the loop. |
13281fa4 |
1122 | The \*(L"foreach\*(R" keyword is actually identical to the \*(L"for\*(R" keyword, |
1123 | so you can use \*(L"foreach\*(R" for readability or \*(L"for\*(R" for brevity. |
378cc40b |
1124 | If VAR is omitted, $_ is set to each value. |
1125 | If ARRAY is an actual array (as opposed to an expression returning an array |
1126 | value), you can modify each element of the array |
1127 | by modifying VAR inside the loop. |
1128 | Examples: |
1129 | .nf |
1130 | |
1131 | .ne 5 |
1132 | for (@ary) { s/foo/bar/; } |
1133 | |
1134 | foreach $elem (@elements) { |
1135 | $elem *= 2; |
1136 | } |
1137 | |
a687059c |
1138 | .ne 3 |
1139 | for ((10,9,8,7,6,5,4,3,2,1,\'BOOM\')) { |
1140 | print $_, "\en"; sleep(1); |
378cc40b |
1141 | } |
1142 | |
a687059c |
1143 | for (1..15) { print "Merry Christmas\en"; } |
1144 | |
378cc40b |
1145 | .ne 3 |
450a55e4 |
1146 | foreach $item (split(/:[\e\e\en:]*/, $ENV{\'TERMCAP\'})) { |
378cc40b |
1147 | print "Item: $item\en"; |
1148 | } |
a687059c |
1149 | |
378cc40b |
1150 | .fi |
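Two of the properties described above can be checked with the sketch below, which assumes a Perl 5 interpreter and uses made-up names: modifying VAR changes the underlying array, and a loop variable gets its former value back when the loop exits.

```perl
use strict;
use warnings;

# Modifying the loop variable modifies the array element itself.
my @prices = (1, 2, 3);
foreach my $p (@prices) {
    $p *= 2;
}
print "prices: @prices\n";

# The loop variable regains its former value after the loop.
my $i = 'before';
my $sum = 0;
for $i (1 .. 3) {
    $sum += $i;
}
print "sum=$sum i=$i\n";
```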
1151 | .PP |
8d063cd8 |
1152 | The BLOCK by itself (labeled or not) is equivalent to a loop that executes |
1153 | once. |
1154 | Thus you can use any of the loop control statements in it to leave or |
1155 | restart the block. |
1156 | The |
1157 | .I continue |
1158 | block is optional. |
1159 | This construct is particularly nice for doing case structures. |
1160 | .nf |
1161 | |
1162 | .ne 6 |
1163 | foo: { |
a687059c |
1164 | if (/^abc/) { $abc = 1; last foo; } |
1165 | if (/^def/) { $def = 1; last foo; } |
1166 | if (/^xyz/) { $xyz = 1; last foo; } |
8d063cd8 |
1167 | $nothing = 1; |
1168 | } |
1169 | |
1170 | .fi |
a687059c |
1171 | There is no official switch statement in perl, because there |
1172 | are already several ways to write the equivalent. |
1173 | In addition to the above, you could write |
378cc40b |
1174 | .nf |
1175 | |
a687059c |
1176 | .ne 6 |
1177 | foo: { |
ffed7fef |
1178 | $abc = 1, last foo if /^abc/; |
1179 | $def = 1, last foo if /^def/; |
1180 | $xyz = 1, last foo if /^xyz/; |
a687059c |
1181 | $nothing = 1; |
1182 | } |
1183 | |
1184 | or |
1185 | |
1186 | .ne 6 |
1187 | foo: { |
450a55e4 |
1188 | /^abc/ && do { $abc = 1; last foo; }; |
1189 | /^def/ && do { $def = 1; last foo; }; |
1190 | /^xyz/ && do { $xyz = 1; last foo; }; |
a687059c |
1191 | $nothing = 1; |
1192 | } |
1193 | |
1194 | or |
1195 | |
1196 | .ne 6 |
1197 | foo: { |
1198 | /^abc/ && ($abc = 1, last foo); |
1199 | /^def/ && ($def = 1, last foo); |
1200 | /^xyz/ && ($xyz = 1, last foo); |
1201 | $nothing = 1; |
1202 | } |
1203 | |
1204 | or even |
1205 | |
378cc40b |
1206 | .ne 8 |
a687059c |
1207 | if (/^abc/) |
79a0689e |
1208 | { $abc = 1; } |
a687059c |
1209 | elsif (/^def/) |
79a0689e |
1210 | { $def = 1; } |
a687059c |
1211 | elsif (/^xyz/) |
79a0689e |
1212 | { $xyz = 1; } |
a687059c |
1213 | else |
1214 | {$nothing = 1;} |
378cc40b |
1215 | |
1216 | .fi |
a687059c |
1217 | As it happens, these are all optimized internally to a switch structure, |
1218 | so perl jumps directly to the desired statement, and you needn't worry |
1219 | about perl executing a lot of unnecessary statements when you have a string |
1220 | of 50 elsifs, as long as you are testing the same simple scalar variable |
1221 | using ==, eq, or pattern matching as above. |
1222 | (If you're curious as to whether the optimizer has done this for a particular |
1223 | case statement, you can use the \-D1024 switch to list the syntax tree |
1224 | before execution.) |
8d063cd8 |
1225 | .Sh "Simple statements" |
1226 | The only kind of simple statement is an expression evaluated for its side |
1227 | effects. |
1228 | Every expression (simple statement) must be terminated with a semicolon. |
1229 | Note that this is like C, but unlike Pascal (and |
1230 | .IR awk ). |
1231 | .PP |
1232 | Any simple statement may optionally be followed by a |
1233 | single modifier, just before the terminating semicolon. |
1234 | The possible modifiers are: |
1235 | .nf |
1236 | |
1237 | .ne 4 |
1238 | if EXPR |
1239 | unless EXPR |
1240 | while EXPR |
1241 | until EXPR |
1242 | |
1243 | .fi |
1244 | The |
1245 | .I if |
1246 | and |
1247 | .I unless |
1248 | modifiers have the expected semantics. |
1249 | The |
1250 | .I while |
1251 | and |
378cc40b |
1252 | .I until |
8d063cd8 |
1253 | modifiers also have the expected semantics (conditional evaluated first), |
1254 | except when applied to a do-BLOCK command, |
1255 | in which case the block executes once before the conditional is evaluated. |
1256 | This is so that you can write loops like: |
1257 | .nf |
1258 | |
1259 | .ne 4 |
1260 | do { |
a687059c |
1261 | $_ = <STDIN>; |
8d063cd8 |
1262 | .\|.\|. |
1263 | } until $_ \|eq \|".\|\e\|n"; |
1264 | |
1265 | .fi |
1266 | (See the |
1267 | .I do |
1268 | operator below. Note also that the loop control commands described later will |
83b4785a |
1269 | NOT work in this construct, since modifiers don't take loop labels. |
8d063cd8 |
1270 | Sorry.) |
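The difference between a plain modified statement and a modified do-BLOCK can be seen in this sketch (Perl 5 assumed; the variables and the fake input list are invented for illustration):

```perl
use strict;
use warnings;

# A do-BLOCK runs once before the condition is ever looked at.
my $runs = 0;
do { $runs++; } until 1;

# A plain statement tests the condition first, so this never runs.
my $skipped = 0;
$skipped++ until 1;

print "runs=$runs skipped=$skipped\n";

# Reading until a sentinel line, with a list standing in for STDIN.
my @input = ("first\n", "second\n", ".\n", "ignored\n");
my @kept;
my $line;
do {
    $line = shift @input;
    push @kept, $line unless $line eq ".\n";
} until $line eq ".\n";

print "kept ", scalar(@kept), " lines\n";
```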
1271 | .Sh "Expressions" |
1272 | Since |
1273 | .I perl |
1274 | expressions work almost exactly like C expressions, only the differences |
1275 | will be mentioned here. |
1276 | .PP |
1277 | Here's what |
1278 | .I perl |
1279 | has that C doesn't: |
a687059c |
1280 | .Ip ** 8 2 |
1281 | The exponentiation operator. |
1282 | .Ip **= 8 |
1283 | The exponentiation assignment operator. |
8d063cd8 |
1284 | .Ip (\|) 8 3 |
1285 | The null list, used to initialize an array to null. |
1286 | .Ip . 8 |
1287 | Concatenation of two strings. |
1288 | .Ip .= 8 |
a687059c |
1289 | The concatenation assignment operator. |
8d063cd8 |
1290 | .Ip eq 8 |
1291 | String equality (== is numeric equality). |
1292 | For a mnemonic just think of \*(L"eq\*(R" as a string. |
1293 | (If you are used to the |
1294 | .I awk |
1295 | behavior of using == for either string or numeric equality |
1296 | based on the current form of the comparands, beware! |
1297 | You must be explicit here.) |
1298 | .Ip ne 8 |
1299 | String inequality (!= is numeric inequality). |
1300 | .Ip lt 8 |
1301 | String less than. |
1302 | .Ip gt 8 |
1303 | String greater than. |
1304 | .Ip le 8 |
1305 | String less than or equal. |
1306 | .Ip ge 8 |
1307 | String greater than or equal. |
1308 | .Ip =~ 8 2 |
1309 | Certain operations search or modify the string \*(L"$_\*(R" by default. |
1310 | This operator makes that kind of operation work on some other string. |
1311 | The right argument is a search pattern, substitution, or translation. |
1312 | The left argument is what is supposed to be searched, substituted, or |
1313 | translated instead of the default \*(L"$_\*(R". |
1314 | The return value indicates the success of the operation. |
1315 | (If the right argument is an expression other than a search pattern, |
1316 | substitution, or translation, it is interpreted as a search pattern |
1317 | at run time. |
1318 | This is less efficient than an explicit search, since the pattern must |
1319 | be compiled every time the expression is evaluated.) |
1320 | The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else. |
1321 | .Ip !~ 8 |
1322 | Just like =~ except the return value is negated. |
1323 | .Ip x 8 |
1324 | The repetition operator. |
1325 | Returns a string consisting of the left operand repeated the |
1326 | number of times specified by the right operand. |
1327 | .nf |
1328 | |
a687059c |
1329 | print \'\-\' x 80; # print row of dashes |
1330 | print \'\-\' x80; # illegal, x80 is identifier |
8d063cd8 |
1331 | |
a687059c |
1332 | print "\et" x ($tab/8), \' \' x ($tab%8); # tab over |
8d063cd8 |
1333 | |
1334 | .fi |
1335 | .Ip x= 8 |
a687059c |
1336 | The repetition assignment operator. |
1337 | .Ip .\|. 8 |
1338 | The range operator, which is really two different operators depending |
1339 | on the context. |
1340 | In an array context, returns an array of values counting (by ones) |
1341 | from the left value to the right value. |
1342 | This is useful for writing \*(L"for (1..10)\*(R" loops and for doing |
1343 | slice operations on arrays. |
1344 | .Sp |
1345 | In a scalar context, .\|. returns a boolean value. |
1346 | The operator is bistable, like a flip-flop. |
1347 | Each .\|. operator maintains its own boolean state. |
378cc40b |
1348 | It is false as long as its left operand is false. |
1349 | Once the left operand is true, the range operator stays true |
1350 | until the right operand is true, |
1351 | AFTER which the range operator becomes false again. |
a687059c |
1352 | (It doesn't become false till the next time the range operator is evaluated. |
8d063cd8 |
1353 | It can become false on the same evaluation it became true, but it still returns |
1354 | true once.) |
13281fa4 |
1355 | The right operand is not evaluated while the operator is in the \*(L"false\*(R" state, |
1356 | and the left operand is not evaluated while the operator is in the \*(L"true\*(R" state. |
a687059c |
1357 | The scalar .\|. operator is primarily intended for doing line number ranges |
1358 | after |
8d063cd8 |
1359 | the fashion of \fIsed\fR or \fIawk\fR. |
1360 | The precedence is a little lower than || and &&. |
1361 | The value returned is either the null string for false, or a sequence number |
1362 | (beginning with 1) for true. |
1363 | The sequence number is reset for each range encountered. |
a687059c |
1364 | The final sequence number in a range has the string \'E0\' appended to it, which |
8d063cd8 |
1365 | doesn't affect its numeric value, but gives you something to search for if you |
1366 | want to exclude the endpoint. |
1367 | You can exclude the beginning point by waiting for the sequence number to be |
1368 | greater than 1. |
a687059c |
1369 | If either operand of scalar .\|. is static, that operand is implicitly compared |
1370 | to the $. variable, the current line number. |
8d063cd8 |
1371 | Examples: |
1372 | .nf |
1373 | |
a687059c |
1374 | .ne 6 |
1375 | As a scalar operator: |
1376 | if (101 .\|. 200) { print; } # print 2nd hundred lines |
8d063cd8 |
1377 | |
a687059c |
1378 | next line if (1 .\|. /^$/); # skip header lines |
8d063cd8 |
1379 | |
a687059c |
1380 | s/^/> / if (/^$/ .\|. eof()); # quote body |
1381 | |
1382 | .ne 4 |
1383 | As an array operator: |
1384 | for (101 .\|. 200) { print; } # print $_ 100 times |
1385 | |
1386 | @foo = @foo[$[ .\|. $#foo]; # an expensive no-op |
1387 | @foo = @foo[$#foo-4 .\|. $#foo]; # slice last 5 items |
8d063cd8 |
1388 | |
1389 | .fi |
378cc40b |
1390 | .Ip \-x 8 |
1391 | A file test. |
1392 | This unary operator takes one argument, either a filename or a filehandle, |
1393 | and tests the associated file to see if something is true about it. |
a687059c |
1394 | If the argument is omitted, tests $_, except for \-t, which tests |
1395 | .IR STDIN . |
1396 | It returns 1 for true and \'\' for false, or the undefined value if the |
1397 | file doesn't exist. |
378cc40b |
1398 | Precedence is higher than logical and relational operators, but lower than |
1399 | arithmetic operators. |
1400 | The operator may be any of: |
1401 | .nf |
1402 | \-r File is readable by effective uid. |
a687059c |
1403 | \-w File is writable by effective uid. |
378cc40b |
1404 | \-x File is executable by effective uid. |
1405 | \-o File is owned by effective uid. |
1406 | \-R File is readable by real uid. |
a687059c |
1407 | \-W File is writable by real uid. |
378cc40b |
1408 | \-X File is executable by real uid. |
1409 | \-O File is owned by real uid. |
1410 | \-e File exists. |
1411 | \-z File has zero size. |
450a55e4 |
1412 | \-s File has non-zero size (returns size). |
378cc40b |
1413 | \-f File is a plain file. |
1414 | \-d File is a directory. |
1415 | \-l File is a symbolic link. |
1416 | \-p File is a named pipe (FIFO). |
1417 | \-S File is a socket. |
1418 | \-b File is a block special file. |
1419 | \-c File is a character special file. |
1420 | \-u File has setuid bit set. |
1421 | \-g File has setgid bit set. |
1422 | \-k File has sticky bit set. |
1423 | \-t Filehandle is opened to a tty. |
1424 | \-T File is a text file. |
1425 | \-B File is a binary file (opposite of \-T). |
1426 | |
1427 | .fi |
1428 | The interpretation of the file permission operators \-r, \-R, \-w, \-W, \-x and \-X |
1429 | is based solely on the mode of the file and the uids and gids of the user. |
1430 | There may be other reasons you can't actually read, write or execute the file. |
1431 | Also note that, for the superuser, \-r, \-R, \-w and \-W always return 1, and |
1432 | \-x and \-X return 1 if any execute bit is set in the mode. |
1433 | Scripts run by the superuser may thus need to do a stat() in order to determine |
1434 | the actual mode of the file, or temporarily set the uid to something else. |
1435 | .Sp |
1436 | Example: |
1437 | .nf |
1438 | .ne 7 |
1439 | |
1440 | while (<>) { |
1441 | chop; |
1442 | next unless \-f $_; # ignore specials |
1443 | .\|.\|. |
1444 | } |
1445 | |
1446 | .fi |
a687059c |
1447 | Note that \-s/a/b/ does not do a negated substitution. |
1448 | Saying \-exp($foo) still works as expected, however\*(--only single letters |
378cc40b |
1449 | following a minus are interpreted as file tests. |
1450 | .Sp |
1451 | The \-T and \-B switches work as follows. |
1452 | The first block or so of the file is examined for odd characters such as |
1453 | strange control codes or metacharacters. |
1454 | If too many odd characters (>10%) are found, it's a \-B file, otherwise it's a \-T file. |
1455 | Also, any file containing null in the first block is considered a binary file. |
1456 | If \-T or \-B is used on a filehandle, the current stdio buffer is examined |
1457 | rather than the first block. |
378cc40b |
1458 | Both \-T and \-B return TRUE on a null file, or a file at EOF when testing |
1459 | a filehandle. |
8d063cd8 |
1460 | .PP |
a687059c |
1461 | If any of the file tests (or either stat operator) are given the special |
1462 | filehandle consisting of a solitary underline, then the stat structure |
1463 | of the previous file test (or stat operator) is used, saving a system |
1464 | call. |
1465 | (This doesn't work with \-t, and you need to remember that lstat and \-l |
1466 | will leave values in the stat structure for the symbolic link, not the |
1467 | real file.) |
1468 | Example: |
1469 | .nf |
1470 | |
1471 | print "Can do.\en" if -r $a || -w _ || -x _; |
1472 | |
1473 | .ne 9 |
1474 | stat($filename); |
1475 | print "Readable\en" if -r _; |
1476 | print "Writable\en" if -w _; |
1477 | print "Executable\en" if -x _; |
1478 | print "Setuid\en" if -u _; |
1479 | print "Setgid\en" if -g _; |
1480 | print "Sticky\en" if -k _; |
1481 | print "Text\en" if -T _; |
1482 | print "Binary\en" if -B _; |
1483 | |
1484 | .fi |
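A self-contained sketch of reusing the stat structure through the underline filehandle, assuming a Perl 5 interpreter with File::Temp; the scratch file and its six-byte contents are made up:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small scratch file of known size.
my ($fh, $name) = tempfile(UNLINK => 1);
binmode($fh);                      # no newline translation, so 6 bytes
print $fh "hello\n";
close($fh) or die "Can't close $name: $!";

# The first test performs the stat; the rest reuse it through "_".
my $size  = -s $name;
my $plain = -f _;
my $isdir = -d _;

print "size=$size plain=", ($plain ? 'yes' : 'no'),
      " dir=", ($isdir ? 'yes' : 'no'), "\n";
```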
1485 | .PP |
8d063cd8 |
1486 | Here is what C has that |
1487 | .I perl |
1488 | doesn't: |
1489 | .Ip "unary &" 12 |
1490 | Address-of operator. |
1491 | .Ip "unary *" 12 |
1492 | Dereference-address operator. |
378cc40b |
1493 | .Ip "(TYPE)" 12 |
1494 | Type casting operator. |
8d063cd8 |
1495 | .PP |
1496 | Like C, |
1497 | .I perl |
1498 | does a certain amount of expression evaluation at compile time, whenever |
1499 | it determines that all of the arguments to an operator are static and have |
1500 | no side effects. |
1501 | In particular, string concatenation happens at compile time between literals that don't do variable substitution. |
1502 | Backslash interpretation also happens at compile time. |
1503 | You can say |
1504 | .nf |
1505 | |
1506 | .ne 2 |
a687059c |
1507 | \'Now is the time for all\' . "\|\e\|n" . |
1508 | \'good men to come to.\' |
8d063cd8 |
1509 | |
1510 | .fi |
1511 | and this all reduces to one string internally. |
1512 | .PP |
378cc40b |
1513 | The autoincrement operator has a little extra built-in magic to it. |
1514 | If you increment a variable that is numeric, or that has ever been used in |
1515 | a numeric context, you get a normal increment. |
1516 | If, however, the variable has only been used in string contexts since it |
1517 | was set, and has a value that is not null and matches the |
a687059c |
1518 | pattern /^[a\-zA\-Z]*[0\-9]*$/, the increment is done |
378cc40b |
1519 | as a string, preserving each character within its range, with carry: |
1520 | .nf |
1521 | |
a687059c |
1522 | print ++($foo = \'99\'); # prints \*(L'100\*(R' |
1523 | print ++($foo = \'a0\'); # prints \*(L'a1\*(R' |
1524 | print ++($foo = \'Az\'); # prints \*(L'Ba\*(R' |
1525 | print ++($foo = \'zz\'); # prints \*(L'aaa\*(R' |
378cc40b |
1526 | |
1527 | .fi |
1528 | The autodecrement is not magical. |
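The carry behaviour is easiest to see side by side. This sketch assumes a Perl 5 interpreter, which still implements the same magic; the numeric warning is silenced only for the decrement, which treats the string as the number 0:

```perl
use strict;
use warnings;

# String increment: each character stays within its range, with carry.
my @results;
for my $start ('a0', 'a9', 'Az', 'zz') {
    my $s = $start;
    $s++;
    push @results, $start . '=' . $s;
}
print "@results\n";

# Autodecrement is plain arithmetic: the string counts as 0.
my $d = 'zz';
{
    no warnings 'numeric';
    $d--;
}
print "decremented: $d\n";
```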
0f85fab0 |
1529 | .PP |
1530 | The range operator (in an array context) makes use of the magical |
1531 | autoincrement algorithm if the minimum and maximum are strings. |
1532 | You can say |
1533 | |
1534 | @alphabet = (\'A\' .. \'Z\'); |
1535 | |
1536 | to get all the letters of the alphabet, or |
1537 | |
1538 | $hexdigit = (0 .. 9, \'a\' .. \'f\')[$num & 15]; |
1539 | |
1540 | to get a hexadecimal digit, or |
1541 | |
1542 | @z2 = (\'01\' .. \'31\'); print @z2[$mday]; |
1543 | |
1544 | to get dates with leading zeros. |
1545 | (If the final value specified is not in the sequence that the magical increment |
1546 | would produce, the sequence goes until the next value would be longer than |
1547 | the final value specified.) |
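The three ranges above can be checked with a sketch like this one (Perl 5 assumed; @hex and @days are illustrative names, and 255 is an arbitrary sample number):

```perl
use strict;
use warnings;

my @alphabet = ('A' .. 'Z');
my @hex      = (0 .. 9, 'a' .. 'f');
my @days     = ('01' .. '31');

print scalar(@alphabet), " letters, from $alphabet[0] to $alphabet[-1]\n";
print "low hex digit of 255: $hex[255 & 15]\n";
print scalar(@days), " days, from $days[0] to $days[-1]\n";
```

The leading zeros survive because the endpoints are strings, so the elements are produced by string increment rather than by counting.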
450a55e4 |
1548 | .PP |
1549 | The || and && operators differ from C's in that, rather than returning 0 or 1, |
1550 | they return the last value evaluated. |
1551 | Thus, a portable way to find out the home directory might be: |
1552 | .nf |
1553 | |
1554 | $home = $ENV{'HOME'} || $ENV{'LOGDIR'} || |
1555 | (getpwuid($<))[7] || die "You're homeless!\en"; |
1556 | |
1557 | .fi |
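A sketch of the value-returning behaviour, assuming a Perl 5 interpreter; the %env hash is a hypothetical stand-in for %ENV so the result does not depend on your real environment:

```perl
use strict;
use warnings;

# Hypothetical environment: HOME is missing, LOGDIR is set.
my %env = (LOGDIR => '/home/guest');

# || yields the first true operand itself, not 1.
my $home = $env{HOME} || $env{LOGDIR} || '/';
print "home=$home\n";

# && yields the last operand evaluated: the right one if the left is true,
# otherwise the false left operand.
my ($left, $right, $zero) = ('left', 'right', 0);
my $both  = $left && $right;
my $short = $zero && $right;
print "both=$both short=$short\n";
```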