Commit | Line | Data |
8d063cd8 |
1 | .rn '' }` |
0f85fab0 |
2 | ''' $Header: perl_man.1,v 3.0.1.5 90/03/27 16:14:37 lwall Locked $ |
8d063cd8 |
3 | ''' |
4 | ''' $Log: perl.man.1,v $ |
0f85fab0 |
5 | ''' Revision 3.0.1.5 90/03/27 16:14:37 lwall |
6 | ''' patch16: .. now works using magical string increment |
7 | ''' |
79a0689e |
8 | ''' Revision 3.0.1.4 90/03/12 16:44:33 lwall |
9 | ''' patch13: (LIST,) now legal |
10 | ''' patch13: improved LIST documentation |
11 | ''' patch13: example of if-elsif switch was wrong |
12 | ''' |
ac58e20f |
13 | ''' Revision 3.0.1.3 90/02/28 17:54:32 lwall |
14 | ''' patch9: @array in scalar context now returns length of array |
15 | ''' patch9: in manual, example of open and ?: was backwards |
16 | ''' |
ffed7fef |
17 | ''' Revision 3.0.1.2 89/11/17 15:30:03 lwall |
18 | ''' patch5: fixed some manual typos and indent problems |
19 | ''' |
ae986130 |
20 | ''' Revision 3.0.1.1 89/11/11 04:41:22 lwall |
21 | ''' patch2: explained about sh and ${1+"$@"} |
22 | ''' patch2: documented that space must separate word and '' string |
23 | ''' |
a687059c |
24 | ''' Revision 3.0 89/10/18 15:21:29 lwall |
25 | ''' 3.0 baseline |
8d063cd8 |
26 | ''' |
27 | ''' |
28 | .de Sh |
29 | .br |
30 | .ne 5 |
31 | .PP |
32 | \fB\\$1\fR |
33 | .PP |
34 | .. |
35 | .de Sp |
36 | .if t .sp .5v |
37 | .if n .sp |
38 | .. |
39 | .de Ip |
40 | .br |
41 | .ie \\n.$>=3 .ne \\$3 |
42 | .el .ne 3 |
43 | .IP "\\$1" \\$2 |
44 | .. |
45 | ''' |
46 | ''' Set up \*(-- to give an unbreakable dash; |
47 | ''' string Tr holds user defined translation string. |
48 | ''' Bell System Logo is used as a dummy character. |
49 | ''' |
378cc40b |
50 | .tr \(*W-|\(bv\*(Tr |
8d063cd8 |
51 | .ie n \{\ |
378cc40b |
52 | .ds -- \(*W- |
53 | .if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch |
54 | .if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch |
8d063cd8 |
55 | .ds L" "" |
56 | .ds R" "" |
57 | .ds L' ' |
58 | .ds R' ' |
59 | 'br\} |
60 | .el\{\ |
61 | .ds -- \(em\| |
62 | .tr \*(Tr |
63 | .ds L" `` |
64 | .ds R" '' |
65 | .ds L' ` |
66 | .ds R' ' |
67 | 'br\} |
a687059c |
68 | .TH PERL 1 "\*(RP" |
69 | .UC |
8d063cd8 |
70 | .SH NAME |
a687059c |
71 | perl \- Practical Extraction and Report Language |
8d063cd8 |
72 | .SH SYNOPSIS |
a687059c |
73 | .B perl |
74 | [options] filename args |
8d063cd8 |
75 | .SH DESCRIPTION |
76 | .I Perl |
a687059c |
77 | is an interpreted language optimized for scanning arbitrary text files, |
8d063cd8 |
78 | extracting information from those text files, and printing reports based |
79 | on that information. |
80 | It's also a good language for many system management tasks. |
81 | The language is intended to be practical (easy to use, efficient, complete) |
82 | rather than beautiful (tiny, elegant, minimal). |
83 | It combines (in the author's opinion, anyway) some of the best features of C, |
84 | \fIsed\fR, \fIawk\fR, and \fIsh\fR, |
85 | so people familiar with those languages should have little difficulty with it. |
86 | (Language historians will also note some vestiges of \fIcsh\fR, Pascal, and |
87 | even BASIC-PLUS.) |
88 | Expression syntax corresponds quite closely to C expression syntax. |
a687059c |
89 | Unlike most Unix utilities, |
90 | .I perl |
91 | does not arbitrarily limit the size of your data\*(--if you've got |
92 | the memory, |
93 | .I perl |
94 | can slurp in your whole file as a single string. |
95 | Recursion is of unlimited depth. |
96 | And the hash tables used by associative arrays grow as necessary to prevent |
97 | degraded performance. |
98 | .I Perl |
99 | uses sophisticated pattern matching techniques to scan large amounts of |
100 | data very quickly. |
101 | Although optimized for scanning text, |
102 | .I perl |
103 | can also deal with binary data, and can make dbm files look like associative |
104 | arrays (where dbm is available). |
105 | Setuid |
106 | .I perl |
107 | scripts are safer than C programs |
108 | through a dataflow tracing mechanism which prevents many stupid security holes. |
8d063cd8 |
109 | If you have a problem that would ordinarily use \fIsed\fR |
110 | or \fIawk\fR or \fIsh\fR, but it |
111 | exceeds their capabilities or must run a little faster, |
112 | and you don't want to write the silly thing in C, then |
113 | .I perl |
114 | may be for you. |
a687059c |
115 | There are also translators to turn your |
116 | .I sed |
117 | and |
118 | .I awk |
119 | scripts into |
120 | .I perl |
121 | scripts. |
8d063cd8 |
122 | OK, enough hype. |
123 | .PP |
124 | Upon startup, |
125 | .I perl |
126 | looks for your script in one of the following places: |
127 | .Ip 1. 4 2 |
128 | Specified line by line via |
129 | .B \-e |
130 | switches on the command line. |
131 | .Ip 2. 4 2 |
132 | Contained in the file specified by the first filename on the command line. |
133 | (Note that systems supporting the #! notation invoke interpreters this way.) |
134 | .Ip 3. 4 2 |
a687059c |
135 | Passed in implicitly via standard input. |
378cc40b |
136 | This only works if there are no filename arguments\*(--to pass |
a687059c |
137 | arguments to a |
138 | .I stdin |
139 | script you must explicitly specify a \- for the script name. |
8d063cd8 |
140 | .PP |
141 | After locating your script, |
142 | .I perl |
143 | compiles it to an internal form. |
144 | If the script is syntactically correct, it is executed. |
145 | .Sh "Options" |
83b4785a |
146 | Note: on first reading, this section may not make much sense to you. It's here |
8d063cd8 |
147 | at the front for easy reference. |
148 | .PP |
149 | A single-character option may be combined with the following option, if any. |
150 | This is particularly useful when invoking a script using the #! construct, which |
151 | allows only one argument. Example: |
152 | .nf |
153 | |
154 | .ne 2 |
a687059c |
155 | #!/usr/bin/perl \-spi.bak # same as \-s \-p \-i.bak |
8d063cd8 |
156 | .\|.\|. |
157 | |
158 | .fi |
159 | Options include: |
160 | .TP 5 |
378cc40b |
161 | .B \-a |
a687059c |
162 | turns on autosplit mode when used with a |
163 | .B \-n |
164 | or |
165 | .BR \-p . |
378cc40b |
166 | An implicit split command to the @F array |
167 | is done as the first thing inside the implicit while loop produced by |
a687059c |
168 | the |
169 | .B \-n |
170 | or |
171 | .BR \-p . |
378cc40b |
172 | .nf |
173 | |
a687059c |
174 | perl \-ane \'print pop(@F), "\en";\' |
378cc40b |
175 | |
176 | is equivalent to |
177 | |
178 | while (<>) { |
a687059c |
179 | @F = split(\' \'); |
180 | print pop(@F), "\en"; |
378cc40b |
181 | } |
182 | |
183 | .fi |
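The loop below is a minimal sketch of what autosplit does on each iteration. It assumes a current Perl 5 interpreter rather than the 3.0 release this page describes, and it reads from a hypothetical @lines array instead of <>. Each line is split on whitespace into @F, and the script body then works on @F.

```perl
use strict;
use warnings;

# Assumes a current Perl 5 interpreter, not the 3.0 release documented here.
# @lines is a hypothetical stand-in for what <> would read.
my @lines = ("  alpha beta gamma", "one two");
my @last_fields;
for (@lines) {
    my @F = split(' ');          # what -a does at the top of each iteration
    push @last_fields, pop(@F);  # what the -e script above does with @F
}
```

The special ' ' pattern also discards leading whitespace, so the first line yields three fields and not four.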
184 | .TP 5 |
a687059c |
185 | .BI \-d |
186 | runs the script under the perl debugger. |
187 | See the section on Debugging. |
188 | .TP 5 |
189 | .BI \-D number |
8d063cd8 |
190 | sets debugging flags. |
191 | To watch how it executes your script, use |
a687059c |
192 | .BR \-D14 . |
8d063cd8 |
193 | (This only works if debugging is compiled into your |
194 | .IR perl .) |
a687059c |
195 | Another nice value is \-D1024, which lists your compiled syntax tree. |
196 | And \-D512 displays compiled regular expressions. |
8d063cd8 |
197 | .TP 5 |
a687059c |
198 | .BI \-e " commandline" |
8d063cd8 |
199 | may be used to enter one line of script. |
200 | Multiple |
201 | .B \-e |
202 | commands may be given to build up a multi-line script. |
203 | If |
204 | .B \-e |
205 | is given, |
206 | .I perl |
207 | will not look for a script filename in the argument list. |
208 | .TP 5 |
a687059c |
209 | .BI \-i extension |
8d063cd8 |
210 | specifies that files processed by the <> construct are to be edited |
211 | in-place. |
212 | It does this by renaming the input file, opening the output file by the |
213 | same name, and selecting that output file as the default for print statements. |
214 | The extension, if supplied, is added to the name of the |
215 | old file to make a backup copy. |
216 | If no extension is supplied, no backup is made. |
a687059c |
217 | Saying \*(L"perl \-p \-i.bak \-e "s/foo/bar/;" .\|.\|. \*(R" is the same as using |
8d063cd8 |
218 | the script: |
219 | .nf |
220 | |
221 | .ne 2 |
a687059c |
222 | #!/usr/bin/perl \-pi.bak |
8d063cd8 |
223 | s/foo/bar/; |
224 | |
225 | which is equivalent to |
226 | |
227 | .ne 14 |
378cc40b |
228 | #!/usr/bin/perl |
8d063cd8 |
229 | while (<>) { |
230 | if ($ARGV ne $oldargv) { |
a687059c |
231 | rename($ARGV, $ARGV . \'.bak\'); |
232 | open(ARGVOUT, ">$ARGV"); |
8d063cd8 |
233 | select(ARGVOUT); |
234 | $oldargv = $ARGV; |
235 | } |
236 | s/foo/bar/; |
237 | } |
238 | continue { |
239 | print; # this prints to original filename |
240 | } |
a687059c |
241 | select(STDOUT); |
8d063cd8 |
242 | |
243 | .fi |
a687059c |
244 | except that the |
245 | .B \-i |
246 | form doesn't need to compare $ARGV to $oldargv to know when |
8d063cd8 |
247 | the filename has changed. |
248 | It does, however, use ARGVOUT for the selected filehandle. |
a687059c |
249 | Note that |
250 | .I STDOUT |
251 | is restored as the default output filehandle after the loop. |
378cc40b |
252 | .Sp |
253 | You can use eof to locate the end of each input file, in case you want |
254 | to append to each file, or reset line numbering (see example under eof). |
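The program below is a sketch of the -p -i.bak example done from inside a script. It sets $^I (the variable that holds the -i extension) and a local @ARGV. It assumes a current Perl 5 interpreter and a hypothetical scratch file named demo.txt in a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumes a current Perl 5 interpreter; "demo.txt" is a hypothetical file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.txt";
open(my $out, '>', $file) or die "create $file: $!";
print $out "foo one\n", "two foo\n";
close($out);

{
    # Same effect as: perl -p -i.bak -e 's/foo/bar/;' demo.txt
    local $^I   = '.bak';   # backup extension, like -i.bak
    local @ARGV = ($file);  # the files <> will edit in place
    while (<>) {
        s/foo/bar/;
        print;              # goes to the rewritten file, not the terminal
    }
}

open(my $in, '<', $file) or die "read $file: $!";
my $edited = do { local $/; <$in> };
close($in);
```

After the loop, the original contents survive in demo.txt.bak and the default output handle is back to STDOUT.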
8d063cd8 |
255 | .TP 5 |
a687059c |
256 | .BI \-I directory |
8d063cd8 |
257 | may be used in conjunction with |
258 | .B \-P |
259 | to tell the C preprocessor where to look for include files. |
260 | By default /usr/include and /usr/lib/perl are searched. |
261 | .TP 5 |
262 | .B \-n |
263 | causes |
264 | .I perl |
265 | to assume the following loop around your script, which makes it iterate |
a687059c |
266 | over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR: |
8d063cd8 |
267 | .nf |
268 | |
269 | .ne 3 |
270 | while (<>) { |
378cc40b |
271 | .\|.\|. # your script goes here |
8d063cd8 |
272 | } |
273 | |
274 | .fi |
275 | Note that the lines are not printed by default. |
276 | See |
277 | .B \-p |
278 | to have lines printed. |
378cc40b |
279 | Here is an efficient way to delete all files older than a week: |
280 | .nf |
281 | |
a687059c |
282 | find . \-mtime +7 \-print | perl \-ne \'chop;unlink;\' |
378cc40b |
283 | |
284 | .fi |
a687059c |
285 | This is faster than using the \-exec switch of find because you don't have to |
378cc40b |
286 | start a process on every filename found. |
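The script below is a self-contained sketch of the find-and-unlink pipeline. It assumes a current Perl 5 interpreter. A few hypothetical scratch files stand in for the old files, and an in-memory filehandle plays the part of find's output on the pipe.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumes a current Perl 5 interpreter. The scratch files are hypothetical
# stand-ins for what "find . -mtime +7 -print" would list.
my $dir   = tempdir(CLEANUP => 1);
my @names = map { "$dir/old$_" } 1 .. 3;
for my $name (@names) {
    open(my $fh, '>', $name) or die "create $name: $!";
    close($fh);
}

# An in-memory handle plays the role of find's output on the pipe.
my $listing = join('', map { "$_\n" } @names);
open(my $find_out, '<', \$listing) or die "open: $!";

# Body of: perl -ne 'chop;unlink;'  (chomp removes only a trailing newline)
my $removed = 0;
while (<$find_out>) {
    chomp;
    $removed += unlink;     # with no argument, unlink removes the file named in $_
}
close($find_out);
```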
8d063cd8 |
287 | .TP 5 |
288 | .B \-p |
289 | causes |
290 | .I perl |
291 | to assume the following loop around your script, which makes it iterate |
292 | over filename arguments somewhat like \fIsed\fR: |
293 | .nf |
294 | |
295 | .ne 5 |
296 | while (<>) { |
378cc40b |
297 | .\|.\|. # your script goes here |
8d063cd8 |
298 | } continue { |
299 | print; |
300 | } |
301 | |
302 | .fi |
303 | Note that the lines are printed automatically. |
304 | To suppress printing use the |
305 | .B \-n |
306 | switch. |
83b4785a |
307 | A |
308 | .B \-p |
309 | overrides a |
310 | .B \-n |
311 | switch. |
8d063cd8 |
312 | .TP 5 |
313 | .B \-P |
314 | causes your script to be run through the C preprocessor before |
315 | compilation by |
a687059c |
316 | .IR perl . |
8d063cd8 |
317 | (Since both comments and cpp directives begin with the # character, |
318 | you should avoid starting comments with any words recognized |
319 | by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".) |
320 | .TP 5 |
321 | .B \-s |
322 | enables some rudimentary switch parsing for switches on the command line |
a687059c |
323 | after the script name but before any filename arguments (or before a \-\|\-). |
83b4785a |
324 | Any switch found there is removed from @ARGV and sets the corresponding variable in the |
8d063cd8 |
325 | .I perl |
326 | script. |
327 | The following script prints \*(L"true\*(R" if and only if the script is |
a687059c |
328 | invoked with a \-xyz switch. |
8d063cd8 |
329 | .nf |
330 | |
331 | .ne 2 |
a687059c |
332 | #!/usr/bin/perl \-s |
83b4785a |
333 | if ($xyz) { print "true\en"; } |
8d063cd8 |
334 | |
335 | .fi |
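The helper below roughly shows how -s consumes leading switches. It is a sketch under stated assumptions: parse_switches is a hypothetical helper, and it records each switch in a hash, whereas real -s sets package variables such as $xyz. It also assumes a current Perl 5 interpreter.

```perl
use strict;
use warnings;

# Assumes a current Perl 5 interpreter. parse_switches is a hypothetical
# helper: real -s sets package variables such as $xyz, while this sketch
# records each switch in a hash instead.
sub parse_switches {
    my ($argv) = @_;              # reference to the argument list
    my %switch;
    while (@$argv && $argv->[0] =~ /^-(.+)/) {
        my $name = $1;
        shift @$argv;             # the switch is removed from the list
        last if $name eq '-';     # "--" ends switch processing
        $switch{$name} = 1;
    }
    return %switch;
}

my @args   = ('-xyz', '--', '-notaswitch', 'file1');
my %switch = parse_switches(\@args);
```

Anything after the -- stays in the argument list, even if it starts with a dash.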
378cc40b |
336 | .TP 5 |
337 | .B \-S |
a687059c |
338 | makes |
339 | .I perl |
340 | use the PATH environment variable to search for the script |
378cc40b |
341 | (unless the name of the script starts with a slash). |
342 | Typically this is used to emulate #! startup on machines that don't |
343 | support #!, in the following manner: |
344 | .nf |
345 | |
346 | #!/usr/bin/perl |
a687059c |
347 | eval "exec /usr/bin/perl \-S $0 $*" |
378cc40b |
348 | if $running_under_some_shell; |
349 | |
350 | .fi |
351 | The system ignores the first line and feeds the script to /bin/sh, |
a687059c |
352 | which proceeds to try to execute the |
353 | .I perl |
354 | script as a shell script. |
378cc40b |
355 | The shell executes the second line as a normal shell command, and thus |
a687059c |
356 | starts up the |
357 | .I perl |
358 | interpreter. |
378cc40b |
359 | On some systems $0 doesn't always contain the full pathname, |
a687059c |
360 | so the |
361 | .B \-S |
362 | tells |
363 | .I perl |
364 | to search for the script if necessary. |
365 | After |
366 | .I perl |
367 | locates the script, it parses the lines and ignores them because |
378cc40b |
368 | the variable $running_under_some_shell is never true. |
ae986130 |
369 | A better construct than $* would be ${1+"$@"}, which handles embedded spaces |
370 | and such in the filenames, but doesn't work if the script is being interpreted |
371 | by csh. |
372 | In order to start up sh rather than csh, some systems may have to replace the |
373 | #! line with a line containing just |
374 | a colon, which will be politely ignored by perl. |
378cc40b |
375 | .TP 5 |
a687059c |
376 | .B \-u |
377 | causes |
378 | .I perl |
379 | to dump core after compiling your script. |
380 | You can then take this core dump and turn it into an executable file |
381 | by using the undump program (not supplied). |
382 | This speeds startup at the expense of some disk space (which you can |
383 | minimize by stripping the executable). |
384 | (Still, a \*(L"hello world\*(R" executable comes out to about 200K on my machine.) |
385 | If you are going to run your executable as a set-id program, then you |
386 | should probably compile it using taintperl rather than normal perl. |
387 | If you want to execute a portion of your script before dumping, use the |
388 | dump operator instead. |
389 | .TP 5 |
378cc40b |
390 | .B \-U |
a687059c |
391 | allows |
392 | .I perl |
393 | to do unsafe operations. |
13281fa4 |
394 | Currently the only \*(L"unsafe\*(R" operation is the unlinking of directories while |
378cc40b |
395 | running as superuser. |
396 | .TP 5 |
397 | .B \-v |
a687059c |
398 | prints the version and patchlevel of your |
399 | .I perl |
400 | executable. |
378cc40b |
401 | .TP 5 |
402 | .B \-w |
403 | prints warnings about identifiers that are mentioned only once, and scalar |
404 | variables that are used before being set. |
405 | Also warns about redefined subroutines, and references to undefined |
a687059c |
406 | filehandles or filehandles opened read-only that you are attempting to |
407 | write on. |
408 | Also warns you if you use == on values that don't look like numbers, and if |
409 | your subroutines recurse more than 100 levels deep. |
8d063cd8 |
410 | .Sh "Data Types and Objects" |
411 | .PP |
a687059c |
412 | .I Perl |
413 | has three data types: scalars, arrays of scalars, and |
414 | associative arrays of scalars. |
415 | Normal arrays are indexed by number, and associative arrays by string. |
8d063cd8 |
416 | .PP |
a687059c |
417 | The interpretation of operations and values in perl sometimes |
418 | depends on the requirements |
419 | of the context around the operation or value. |
420 | There are three major contexts: string, numeric and array. |
421 | Certain operations return array values |
422 | in contexts wanting an array, and scalar values otherwise. |
423 | (If this is true of an operation it will be mentioned in the documentation |
424 | for that operation.) |
425 | Operations which return scalars don't care whether the context is looking |
426 | for a string or a number, but |
427 | scalar variables and values are interpreted as strings or numbers |
428 | as appropriate to the context. |
378cc40b |
429 | A scalar is interpreted as TRUE in the boolean sense if it is not the null |
8d063cd8 |
430 | string or 0. |
ffed7fef |
431 | Booleans returned by operators are 1 for true and 0 or \'\' (the null |
8d063cd8 |
432 | string) for false. |
433 | .PP |
a687059c |
434 | There are actually two varieties of null string: defined and undefined. |
435 | Undefined null strings are returned when there is no real value for something, |
436 | such as when there was an error, or at end of file, or when you refer |
437 | to an uninitialized variable or element of an array. |
438 | An undefined null string may become defined the first time you access it, but |
439 | prior to that you can use the defined() operator to determine whether the |
440 | value is defined or not. |
441 | .PP |
378cc40b |
442 | References to scalar variables always begin with \*(L'$\*(R', even when referring |
443 | to a scalar that is part of an array. |
8d063cd8 |
444 | Thus: |
445 | .nf |
446 | |
447 | .ne 3 |
378cc40b |
448 | $days \h'|2i'# a simple scalar variable |
8d063cd8 |
449 | $days[28] \h'|2i'# 29th element of array @days |
a687059c |
450 | $days{\'Feb\'}\h'|2i'# one value from an associative array |
378cc40b |
451 | $#days \h'|2i'# last index of array @days |
8d063cd8 |
452 | |
a687059c |
453 | but entire arrays or array slices are denoted by \*(L'@\*(R': |
8d063cd8 |
454 | |
455 | @days \h'|2i'# ($days[0], $days[1],\|.\|.\|. $days[n]) |
a687059c |
456 | @days[3,4,5]\h'|2i'# same as @days[3.\|.5] |
457 | @days{'a','c'}\h'|2i'# same as ($days{'a'},$days{'c'}) |
458 | |
459 | and entire associative arrays are denoted by \*(L'%\*(R': |
8d063cd8 |
460 | |
a687059c |
461 | %days \h'|2i'# (key1, val1, key2, val2 .\|.\|.) |
8d063cd8 |
462 | .fi |
463 | .PP |
a687059c |
464 | Any of these eight constructs may serve as an lvalue, |
378cc40b |
465 | that is, may be assigned to. |
a687059c |
466 | (It also turns out that an assignment is itself an lvalue in |
467 | certain contexts\*(--see examples under s, tr and chop.) |
468 | Assignment to a scalar evaluates the righthand side in a scalar context, |
469 | while assignment to an array or array slice evaluates the righthand side |
470 | in an array context. |
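The code below shows the prefix characters and the two assignment contexts side by side. It assumes a current Perl 5 interpreter, and the contents of @days and %days are hypothetical. Note that @days and %days coexist because each data type has its own namespace.

```perl
use strict;
use warnings;

# Assumes a current Perl 5 interpreter; the contents of @days and %days
# are hypothetical.
my @days = map { "day$_" } 0 .. 30;
my %days = (Feb => 28, a => 'A', c => 'C');

my $elem  = $days[28];         # 29th element of @days
my $last  = $#days;            # last index of @days
my @slice = @days[3 .. 5];     # array slice
my @pair  = @days{'a', 'c'};   # associative array slice
my $feb   = $days{'Feb'};      # one value from %days

my ($first) = @days;           # array assignment: right side in array context
my $count   = @days;           # scalar assignment: array yields its length
```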
471 | .PP |
378cc40b |
472 | You may find the length of array @days by evaluating |
8d063cd8 |
473 | \*(L"$#days\*(R", as in |
474 | .IR csh . |
378cc40b |
475 | (Actually, it's not the length of the array; it's the subscript of the last element, since there is (ordinarily) a 0th element.) |
476 | Assigning to $#days changes the length of the array. |
477 | Shortening an array by this method does not actually destroy any values. |
478 | Lengthening an array that was previously shortened recovers the values that |
479 | were in those elements. |
480 | You can also gain some measure of efficiency by pre-extending an array that |
481 | is going to get big. |
482 | (You can also extend an array by assigning to an element that is off the |
483 | end of the array. |
484 | This differs from assigning to $#whatever in that intervening values |
485 | are set to null rather than recovered.) |
486 | You can truncate an array down to nothing by assigning the null list () to |
487 | it. |
488 | The following are exactly equivalent: |
489 | .nf |
490 | |
491 | @whatever = (); |
492 | $#whatever = $[ \- 1; |
493 | |
494 | .fi |
8d063cd8 |
495 | .PP |
ac58e20f |
496 | If you evaluate an array in a scalar context, it returns the length of |
497 | the array. |
498 | The following is always true: |
499 | .nf |
500 | |
501 | @whatever == $#whatever \- $[ + 1; |
502 | |
503 | .fi |
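The code below is a small sketch of the relationship between $#whatever and the array's length. It assumes a recent Perl 5 interpreter, in which $[ is effectively always 0, so the identity above reduces to scalar(@whatever) == $#whatever + 1. The sketch does not rely on the value-recovery behavior described earlier.

```perl
use strict;
use warnings;

# Assumes a recent Perl 5 interpreter, where $[ is effectively always 0.
my @whatever   = (10, 20, 30, 40);
my $last_index = $#whatever;          # 3
my $length     = scalar(@whatever);   # 4

$#whatever = 1;                       # shorten to two elements
my $after_shrink = scalar(@whatever);

@whatever = ();                       # truncate to nothing
my $after_clear = scalar(@whatever);
```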
504 | .PP |
a687059c |
505 | Multi-dimensional arrays are not directly supported, but see the discussion |
506 | of the $; variable later for a means of emulating multiple subscripts with |
507 | an associative array. |
ac58e20f |
508 | You could also write a subroutine to turn multiple subscripts into a single |
509 | subscript. |
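The code below sketches both approaches. It assumes a current Perl 5 interpreter. The first half uses the $; joining of multiple associative array subscripts. The second half uses a hypothetical cell_index subroutine that folds two subscripts into one, assuming rows of exactly 10 columns.

```perl
use strict;
use warnings;

# Assumes a current Perl 5 interpreter. A comma-separated subscript list in
# an associative array subscript is joined with $; to form a single key.
my %grid;
$grid{2, 3} = 'x';
my $same = $grid{join($;, 2, 3)};    # the identical key, built by hand

# Hypothetical helper folding two subscripts into one array index,
# assuming rows of exactly 10 columns.
sub cell_index {
    my ($row, $col) = @_;
    return $row * 10 + $col;
}
my @flat;
$flat[cell_index(2, 3)] = 'y';
```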
a687059c |
510 | .PP |
8d063cd8 |
511 | Every data type has its own namespace. |
378cc40b |
512 | You can, without fear of conflict, use the same name for a scalar variable, |
8d063cd8 |
513 | an array, an associative array, a filehandle, a subroutine name, and/or |
514 | a label. |
a687059c |
515 | Since variable and array references always start with \*(L'$\*(R', \*(L'@\*(R', |
516 | or \*(L'%\*(R', the \*(L"reserved\*(R" words aren't in fact reserved |
8d063cd8 |
517 | with respect to variable names. |
518 | (They ARE reserved with respect to labels and filehandles, however, which |
378cc40b |
519 | don't have an initial special character. |
a687059c |
520 | Hint: you could say open(LOG,\'logfile\') rather than open(log,\'logfile\'). |
521 | Using uppercase filehandles also improves readability and protects you |
522 | from conflict with future reserved words.) |
8d063cd8 |
523 | Case IS significant\*(--\*(L"FOO\*(R", \*(L"Foo\*(R" and \*(L"foo\*(R" are all |
524 | different names. |
525 | Names which start with a letter may also contain digits and underscores. |
526 | Names which do not start with a letter are limited to one character, |
527 | e.g. \*(L"$%\*(R" or \*(L"$$\*(R". |
a687059c |
528 | (Most of the one character names have a predefined significance to |
529 | .IR perl . |
8d063cd8 |
530 | More later.) |
531 | .PP |
a687059c |
532 | Numeric literals are specified in any of the usual floating point or |
533 | integer formats: |
534 | .nf |
535 | |
536 | .ne 5 |
537 | 12345 |
538 | 12345.67 |
539 | .23E-10 |
540 | 0xffff # hex |
541 | 0377 # octal |
542 | |
543 | .fi |
8d063cd8 |
544 | String literals are delimited by either single or double quotes. |
545 | They work much like shell quotes: |
546 | double-quoted string literals are subject to backslash and variable |
a687059c |
547 | substitution; single-quoted strings are not (except for \e\' and \e\e). |
8d063cd8 |
548 | The usual backslash rules apply for making characters such as newline, tab, etc. |
549 | You can also embed newlines directly in your strings, i.e. they can end on |
550 | a different line than they begin. |
551 | This is nice, but if you forget your trailing quote, the error will not be |
a687059c |
552 | reported until |
553 | .I perl |
554 | finds another line containing the quote character, which |
8d063cd8 |
555 | may be much further on in the script. |
a687059c |
556 | Variable substitution inside strings is limited to scalar variables, normal |
557 | array values, and array slices. |
558 | (In other words, identifiers beginning with $ or @, followed by an optional |
559 | bracketed expression as a subscript.) |
8d063cd8 |
560 | The following code segment prints out \*(L"The price is $100.\*(R" |
561 | .nf |
562 | |
563 | .ne 2 |
a687059c |
564 | $Price = \'$100\';\h'|3.5i'# not interpreted |
8d063cd8 |
565 | print "The price is $Price.\e\|n";\h'|3.5i'# interpreted |
566 | |
567 | .fi |
83b4785a |
568 | Note that you can put curly brackets around the identifier to delimit it |
569 | from following alphanumerics. |
ae986130 |
570 | Also note that a single quoted string must be separated from a preceding |
571 | word by a space, since single quote is a valid character in an identifier |
572 | (see Packages). |
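The snippet below restates the price example and adds brace delimiting. It assumes a current Perl 5 interpreter.

```perl
use strict;
use warnings;

# Assumes a current Perl 5 interpreter.
my $Price  = '$100';                   # single quotes: not interpreted
my $line   = "The price is $Price.";   # double quotes: $Price interpolated
my $fruit  = 'apple';
my $plural = "${fruit}s";              # braces separate the name from "s"
```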
8d063cd8 |
573 | .PP |
a687059c |
574 | Array values are interpolated into double-quoted strings by joining all the |
575 | elements of the array with the delimiter specified in the $" variable, |
576 | space by default. |
577 | (Since in versions of perl prior to 3.0 the @ character was not a metacharacter |
578 | in double-quoted strings, the interpolation of @array, $array[EXPR], |
579 | @array[LIST], $array{EXPR}, or @array{LIST} only happens if array is |
580 | referenced elsewhere in the program or is predefined.) |
581 | The following are equivalent: |
582 | .nf |
583 | |
584 | .ne 4 |
585 | $temp = join($",@ARGV); |
586 | system "echo $temp"; |
587 | |
588 | system "echo @ARGV"; |
589 | |
590 | .fi |
ae986130 |
591 | Within search patterns (which also undergo double-quotish substitution) |
a687059c |
592 | there is a bad ambiguity: Is /$foo[bar]/ to be |
593 | interpreted as /${foo}[bar]/ (where [bar] is a character class for the |
594 | regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to |
595 | array @foo)? |
596 | If @foo doesn't otherwise exist, then it's obviously a character class. |
597 | If @foo exists, perl takes a good guess about [bar], and is almost always right. |
598 | If it does guess wrong, or if you're just plain paranoid, |
599 | you can force the correct interpretation with curly brackets as above. |
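The snippet below illustrates the $" interpolation rule described above. It assumes a current Perl 5 interpreter, and the array contents are hypothetical.

```perl
use strict;
use warnings;

# Assumes a current Perl 5 interpreter; @words is hypothetical data.
my @words  = ('a', 'b', 'c');
my $spaced = "@words";                         # joined with $", a space by default
my $joined = join($", @words);                 # the explicit equivalent
my $dashed = do { local $" = '-'; "@words" };  # temporarily change the separator
```

Using local keeps the changed separator from leaking into the rest of the program.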
600 | .PP |
601 | A line-oriented form of quoting is based on the shell here-is syntax. |
602 | Following a << you specify a string to terminate the quoted material, and all lines |
603 | following the current line down to the terminating string are the value |
604 | of the item. |
605 | The terminating string may be either an identifier (a word), or some |
606 | quoted text. |
607 | If quoted, the type of quotes you use determines the treatment of the text, |
608 | just as in regular quoting. |
609 | An unquoted identifier works like double quotes. |
610 | There must be no space between the << and the identifier. |
611 | (If you put a space it will be treated as a null identifier, which is |
612 | valid, and matches the first blank line\*(--see Merry Christmas example below.) |
613 | The terminating string must appear by itself (unquoted and with no surrounding |
614 | whitespace) on the terminating line. |
615 | .nf |
616 | |
617 | print <<EOF; # same as above |
618 | The price is $Price. |
619 | EOF |
620 | |
621 | print <<"EOF"; # same as above |
622 | The price is $Price. |
623 | EOF |
624 | |
625 | print << x 10; # null identifier is delimiter |
626 | Merry Christmas! |
627 | |
628 | print <<`EOC`; # execute commands |
629 | echo hi there |
630 | echo lo there |
631 | EOC |
632 | |
633 | print <<foo, <<bar; # you can stack them |
634 | I said foo. |
635 | foo |
636 | I said bar. |
637 | bar |
638 | |
639 | .fi |
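The snippet below captures here-documents in variables instead of printing them, so that the three quoting behaviors can be compared. It assumes a current Perl 5 interpreter. The null-identifier form is left out because current Perl no longer accepts it.

```perl
use strict;
use warnings;

# Assumes a current Perl 5 interpreter.
my $Price = '$100';

my $interpolated = <<EOF;     # bare identifier: treated like double quotes
The price is $Price.
EOF

my $literal = <<'EOF';        # single-quoted terminator: no interpolation
The price is $Price.
EOF

my ($first, $second) = (<<FOO, <<BAR);   # two here-documents on one line
I said foo.
FOO
I said bar.
BAR
```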
8d063cd8 |
640 | Array literals are denoted by separating individual values by commas, and |
79a0689e |
641 | enclosing the list in parentheses: |
642 | .nf |
643 | |
644 | (LIST) |
645 | |
646 | .fi |
8d063cd8 |
647 | In a context not requiring an array value, the value of the array literal |
648 | is the value of the final element, as in the C comma operator. |
649 | For example, |
650 | .nf |
651 | |
83b4785a |
652 | .ne 4 |
a687059c |
653 | @foo = (\'cc\', \'\-E\', $bar); |
8d063cd8 |
654 | |
655 | assigns the entire array value to array foo, but |
656 | |
a687059c |
657 | $foo = (\'cc\', \'\-E\', $bar); |
8d063cd8 |
658 | |
659 | .fi |
660 | assigns the value of variable bar to variable foo. |
79a0689e |
661 | Note that the value of an actual array in a scalar context is the length |
662 | of the array; the following assigns to $foo the value 3: |
663 | .nf |
664 | |
665 | .ne 2 |
666 | @foo = (\'cc\', \'\-E\', $bar); |
667 | $foo = @foo; # $foo gets 3 |
668 | |
669 | .fi |
670 | You may have an optional comma before the closing parenthesis of an |
671 | array literal, so that you can say: |
672 | .nf |
673 | |
674 | @foo = ( |
675 | 1, |
676 | 2, |
677 | 3, |
678 | ); |
679 | |
680 | .fi |
681 | When a LIST is evaluated, each element of the list is evaluated in |
682 | an array context, and the resulting array value is interpolated into LIST |
683 | just as if each individual element were a member of LIST. Thus arrays |
684 | lose their identity in a LIST\*(--the list |
685 | |
686 | (@foo,@bar,&SomeSub) |
687 | |
688 | contains all the elements of @foo followed by all the elements of @bar, |
689 | followed by all the elements returned by the subroutine named SomeSub. |
690 | .PP |
691 | A list value may also be subscripted like a normal array. |
692 | Examples: |
693 | .nf |
694 | |
695 | $time = (stat($file))[8]; # stat returns array value |
696 | $digit = ('a','b','c','d','e','f')[$digit-10]; |
697 | return (pop(@foo),pop(@foo))[0]; |
698 | |
699 | .fi |
700 | .PP |
8d063cd8 |
701 | Array lists may be assigned to if and only if each element of the list |
702 | is an lvalue: |
703 | .nf |
704 | |
705 | ($a, $b, $c) = (1, 2, 3); |
706 | |
a687059c |
707 | ($map{\'red\'}, $map{\'blue\'}, $map{\'green\'}) = (0x00f, 0x0f0, 0xf00); |
708 | |
709 | The final element may be an array or an associative array: |
710 | |
711 | ($a, $b, @rest) = split; |
712 | local($a, $b, %rest) = @_; |
8d063cd8 |
713 | |
714 | .fi |
a687059c |
715 | You can actually put an array anywhere in the list, but the first array |
716 | in the list will soak up all the values, and anything after it will get |
717 | a null value. |
718 | This may be useful in a local(). |
8d063cd8 |
719 | .PP |
a687059c |
720 | An associative array literal contains pairs of values to be interpreted |
721 | as a key and a value: |
722 | .nf |
723 | |
724 | .ne 2 |
725 | # same as map assignment above |
726 | %map = ('red',0x00f,'blue',0x0f0,'green',0xf00); |
727 | |
728 | .fi |
729 | Array assignment in a scalar context returns the number of elements |
730 | produced by the expression on the right side of the assignment: |
731 | .nf |
732 | |
733 | $x = (($foo,$bar) = (3,2,1)); # set $x to 3, not 2 |
734 | |
735 | .fi |
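The snippet below collects the list rules above in one place. It assumes a current Perl 5 interpreter, and $bar and the split input are hypothetical values. It covers an array in scalar context, a list assignment used as a scalar, a subscripted list value, and a trailing array that soaks up the rest.

```perl
use strict;
use warnings;

# Assumes a current Perl 5 interpreter; $bar and the split input are hypothetical.
my $bar = 'prog.c';
my @foo = ('cc', '-E', $bar);
my $count = @foo;                                   # array in scalar context: 3

my ($first, $second);
my $produced = (($first, $second) = (3, 2, 1));     # 3 elements produced, not 2

my $digit = 12;
my $hexdigit = ('a', 'b', 'c', 'd', 'e', 'f')[$digit - 10];   # list subscript

my ($x, $y, @rest) = split(' ', 'one two three four');       # @rest soaks up the tail
```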
8d063cd8 |
736 | .PP |
737 | There are several other pseudo-literals that you should know about. |
378cc40b |
738 | If a string is enclosed by backticks (grave accents), it first undergoes |
739 | variable substitution just like a double quoted string. |
740 | It is then interpreted as a command, and the output of that command |
741 | is the value of the pseudo-literal, like in a shell. |
8d063cd8 |
742 | The command is executed each time the pseudo-literal is evaluated. |
378cc40b |
743 | The status value of the command is returned in $? (see Predefined Names |
744 | for the interpretation of $?). |
745 | Unlike in \f2csh\f1, no translation is done on the return |
8d063cd8 |
746 | data\*(--newlines remain newlines. |
378cc40b |
747 | Unlike in any of the shells, single quotes do not hide variable names |
748 | in the command from interpretation. |
749 | To pass a $ through to the shell you need to hide it with a backslash. |
8d063cd8 |
750 | .PP |
751 | Evaluating a filehandle in angle brackets yields the next line |
a687059c |
752 | from that file (newline included, so it's never false until EOF, at |
753 | which time an undefined value is returned). |
8d063cd8 |
754 | Ordinarily you must assign that value to a variable, |
ac58e20f |
755 | but there is one situation where an automatic assignment happens. |
8d063cd8 |
756 | If (and only if) the input symbol is the only thing inside the conditional of a |
757 | .I while |
758 | loop, the value is |
759 | automatically assigned to the variable \*(L"$_\*(R". |
760 | (This may seem like an odd thing to you, but you'll use the construct |
761 | in almost every |
762 | .I perl |
763 | script you write.) |
764 | Anyway, the following lines are equivalent to each other: |
765 | .nf |
766 | |
a687059c |
767 | .ne 5 |
768 | while ($_ = <STDIN>) { print; } |
769 | while (<STDIN>) { print; } |
770 | for (\|;\|<STDIN>;\|) { print; } |
771 | print while $_ = <STDIN>; |
772 | print while <STDIN>; |
8d063cd8 |
773 | |
774 | .fi |
775 | The filehandles |
a687059c |
776 | .IR STDIN , |
777 | .I STDOUT |
778 | and |
779 | .I STDERR |
780 | are predefined. |
781 | (The filehandles |
8d063cd8 |
782 | .IR stdin , |
783 | .I stdout |
784 | and |
785 | .I stderr |
a687059c |
786 | will also work except in packages, where they would be interpreted as |
787 | local identifiers rather than global.) |
8d063cd8 |
788 | Additional filehandles may be created with the |
789 | .I open |
790 | function. |
791 | .PP |
378cc40b |
792 | If a <FILEHANDLE> is used in a context that is looking for an array, an array |
793 | consisting of all the input lines is returned, one line per array element. |
794 | It's easy to make a LARGE data space this way, so use with care. |
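The two reading styles described above can be sketched as follows. This is modern Perl 5 (lexical filehandles, three-argument open, chomp), not the Perl 3 forms shown in the text. An in-memory string stands in for a real file so that the example runs unattended.

```perl
use strict;
use warnings;

# Assumption: an in-memory string stands in for a real input file.
my $text = "alpha\nbeta\ngamma\n";

# Scalar context: while (<$fh>) assigns each line to $_ and stops at EOF.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my @seen;
while (<$fh>) {
    chomp;                  # drop the newline that <$fh> keeps
    push @seen, $_;
}
close($fh);

# Array (list) context: all remaining lines come back at once.
open($fh, '<', \$text) or die "Can't open in-memory file: $!";
my @lines = <$fh>;          # one line per element, newlines kept
close($fh);

print "seen: @seen\n";
print scalar(@lines), " lines slurped\n";
```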
795 | .PP |
8d063cd8 |
796 | The null filehandle <> is special and can be used to emulate the behavior of |
797 | \fIsed\fR and \fIawk\fR. |
798 | Input from <> comes either from standard input, or from each file listed on |
799 | the command line. |
800 | Here's how it works: the first time <> is evaluated, the ARGV array is checked, |
a687059c |
801 | and if it is null, $ARGV[0] is set to \'-\', which when opened gives you standard |
8d063cd8 |
802 | input. |
803 | The ARGV array is then processed as a list of filenames. |
804 | The loop |
805 | .nf |
806 | |
807 | .ne 3 |
808 | while (<>) { |
809 | .\|.\|. # code for each line |
810 | } |
811 | |
812 | .ne 10 |
813 | is equivalent to |
814 | |
a687059c |
815 | unshift(@ARGV, \'\-\') \|if \|$#ARGV < $[; |
8d063cd8 |
816 | while ($ARGV = shift) { |
817 | open(ARGV, $ARGV); |
818 | while (<ARGV>) { |
819 | .\|.\|. # code for each line |
820 | } |
821 | } |
822 | |
823 | .fi |
824 | except that it isn't as cumbersome to say. |
825 | It really does shift array ARGV and put the current filename into |
826 | variable ARGV. |
827 | It also uses filehandle ARGV internally. |
828 | You can modify @ARGV before the first <> as long as you leave the first |
829 | filename at the beginning of the array. |
83b4785a |
830 | Line numbers ($.) continue as if the input were one big happy file. |
378cc40b |
831 | (But see example under eof for how to reset line numbers on each file.) |
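A sketch of the line-numbering behavior of <>, in modern Perl 5 and assuming a writable temporary directory (File::Temp creates two scratch files that stand in for command-line arguments). The first loop uses the common idiom of closing ARGV at the end of each file, which resets $.; the second shows the default, where $. keeps counting across files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two scratch files stand in for names given on the command line.
my @files;
for my $body ("one\ntwo\n", "three\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $body;
    close($fh);
    push @files, $name;
}

# Resetting $. per file: close ARGV explicitly when each file hits EOF.
my @per_file;
{
    local @ARGV = @files;              # <> shifts names off @ARGV
    while (<>) {
        chomp;
        push @per_file, "$.:$_";
    } continue {
        close(ARGV) if eof;            # eof without () tests the current file
    }
}

# Default behavior: $. keeps counting as if the files were one stream.
my @numbered;
{
    local @ARGV = @files;
    while (<>) {
        chomp;
        push @numbered, "$.:$_";
    }
}

print "@per_file\n";     # 1:one 2:two 1:three
print "@numbered\n";     # 1:one 2:two 3:three
```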
8d063cd8 |
832 | .PP |
83b4785a |
833 | .ne 5 |
378cc40b |
834 | If you want to set @ARGV to your own list of files, go right ahead. |
8d063cd8 |
835 | If you want to pass switches into your script, you can |
836 | put a loop on the front like this: |
837 | .nf |
838 | |
839 | .ne 10 |
840 | while ($_ = $ARGV[0], /\|^\-/\|) { |
841 | shift; |
842 | last if /\|^\-\|\-$\|/\|; |
843 | /\|^\-D\|(.*\|)/ \|&& \|($debug = $1); |
844 | /\|^\-v\|/ \|&& \|$verbose++; |
845 | .\|.\|. # other switches |
846 | } |
847 | while (<>) { |
848 | .\|.\|. # code for each line |
849 | } |
850 | |
851 | .fi |
852 | The <> symbol will return FALSE only once. |
853 | If you call it again after this it will assume you are processing another |
a687059c |
854 | @ARGV list, and if you haven't set @ARGV, will input from |
855 | .IR STDIN . |
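The switch-parsing loop above might look like this in modern Perl 5. The command line is hypothetical and assigned directly to @ARGV so the sketch runs on its own; the extra `@ARGV &&` test is an addition that keeps the loop from examining an undefined $ARGV[0] once the arguments run out.

```perl
use strict;
use warnings;

# Hypothetical command line, used instead of the real one so the sketch runs.
@ARGV = ('-v', '-Dparse', '-v', '--', '-notaswitch', 'file.txt');

my ($debug, $verbose) = ('', 0);
while (@ARGV && $ARGV[0] =~ /^-/) {
    $_ = shift @ARGV;
    last if /^--$/;                  # "--" ends switch processing
    /^-D(.*)/ && ($debug = $1);      # -Dvalue sets $debug
    /^-v/     && $verbose++;         # each -v raises verbosity
    # ... other switches
}
# Whatever remains in @ARGV would now be read by while (<>) { ... }.
print "debug=$debug verbose=$verbose rest=@ARGV\n";
```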
378cc40b |
856 | .PP |
857 | If the string inside the angle brackets is a reference to a scalar variable |
858 | (e.g. <$foo>), |
859 | then that variable contains the name of the filehandle to input from. |
860 | .PP |
861 | If the string inside angle brackets is not a filehandle, it is interpreted |
862 | as a filename pattern to be globbed, and either an array of filenames or the |
863 | next filename in the list is returned, depending on context. |
864 | One level of $ interpretation is done first, but you can't say <$foo> |
865 | because that's an indirect filehandle as explained in the previous |
866 | paragraph. |
867 | You could insert curly brackets to force interpretation as a |
868 | filename glob: <${foo}>. |
869 | Example: |
870 | .nf |
871 | |
872 | .ne 3 |
873 | while (<*.c>) { |
a687059c |
874 | chmod 0644, $_; |
378cc40b |
875 | } |
876 | |
877 | is equivalent to |
878 | |
879 | .ne 5 |
a687059c |
880 | open(foo, "echo *.c | tr \-s \' \et\er\ef\' \'\e\e012\e\e012\e\e012\e\e012\'|"); |
378cc40b |
881 | while (<foo>) { |
882 | chop; |
a687059c |
883 | chmod 0644, $_; |
378cc40b |
884 | } |
885 | |
886 | .fi |
887 | In fact, it's currently implemented that way. |
a687059c |
888 | (Which means it will not work on filenames with spaces in them unless |
889 | you have /bin/csh on your machine.) |
378cc40b |
890 | Of course, the shortest way to do the above is: |
891 | .nf |
892 | |
a687059c |
893 | chmod 0644, <*.c>; |
378cc40b |
894 | |
895 | .fi |
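A modern Perl 5 sketch of filename globbing, calling glob() directly (the function the angle-bracket pattern form invokes in Perl 5). It assumes a writable temporary directory whose path contains no spaces, since glob() splits its pattern on whitespace; the file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory holding a few hypothetical source files.
# Assumption: the temp path contains no spaces, since glob() splits on whitespace.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(main.c util.c util.h)) {
    open(my $fh, '>', "$dir/$name") or die "Can't create $name: $!";
    close($fh);
}

# List context: the pattern expands to every matching name.
my @c_files = sort glob("$dir/*.c");
chmod 0644, @c_files;                     # like: chmod 0644, <*.c>;

# Scalar context: each evaluation yields the next name, then undef.
my $count = 0;
while (my $file = glob("$dir/*.c")) {
    $count++;
}

# A pattern held in a variable: call glob() directly rather than <$pattern>,
# which would be read as an indirect filehandle.
my $pattern = "$dir/*.h";
my @headers = glob($pattern);

print scalar(@c_files), " .c files, ", scalar(@headers), " header\n";
```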
8d063cd8 |
896 | .Sh "Syntax" |
897 | .PP |
898 | A |
899 | .I perl |
900 | script consists of a sequence of declarations and commands. |
901 | The only things that need to be declared in |
902 | .I perl |
903 | are report formats and subroutines. |
904 | See the sections below for more information on those declarations. |
ffed7fef |
905 | All uninitialized user-created objects are assumed to |
a687059c |
906 | start with a null or 0 value until they |
907 | are defined by some explicit operation such as assignment. |
8d063cd8 |
908 | The sequence of commands is executed just once, unlike in |
909 | .I sed |
910 | and |
911 | .I awk |
912 | scripts, where the sequence of commands is executed for each input line. |
913 | While this means that you must explicitly loop over the lines of your input file |
914 | (or files), it also means you have much more control over which files and which |
915 | lines you look at. |
916 | (Actually, I'm lying\*(--it is possible to do an implicit loop with either the |
917 | .B \-n |
918 | or |
919 | .B \-p |
920 | switch.) |
921 | .PP |
922 | A declaration can be put anywhere a command can, but has no effect on the |
a687059c |
923 | execution of the primary sequence of commands\*(--declarations all take effect |
924 | at compile time. |
8d063cd8 |
925 | Typically all the declarations are put at the beginning or the end of the script. |
926 | .PP |
927 | .I Perl |
928 | is, for the most part, a free-form language. |
929 | (The only exception to this is format declarations, for fairly obvious reasons.) |
930 | Comments are indicated by the # character, and extend to the end of the line. |
931 | If you attempt to use /* */ C comments, they will be interpreted either as |
932 | division or pattern matching, depending on the context. |
933 | So don't do that. |
934 | .Sh "Compound statements" |
935 | In |
936 | .IR perl , |
937 | a sequence of commands may be treated as one command by enclosing it |
938 | in curly brackets. |
939 | We will call this a BLOCK. |
940 | .PP |
941 | The following compound commands may be used to control flow: |
942 | .nf |
943 | |
944 | .ne 4 |
945 | if (EXPR) BLOCK |
946 | if (EXPR) BLOCK else BLOCK |
378cc40b |
947 | if (EXPR) BLOCK elsif (EXPR) BLOCK .\|.\|. else BLOCK |
8d063cd8 |
948 | LABEL while (EXPR) BLOCK |
949 | LABEL while (EXPR) BLOCK continue BLOCK |
950 | LABEL for (EXPR; EXPR; EXPR) BLOCK |
378cc40b |
951 | LABEL foreach VAR (ARRAY) BLOCK |
8d063cd8 |
952 | LABEL BLOCK continue BLOCK |
953 | |
954 | .fi |
83b4785a |
955 | Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not |
8d063cd8 |
956 | statements. |
957 | This means that the curly brackets are \fIrequired\fR\*(--no dangling statements allowed. |
958 | If you want to write conditionals without curly brackets there are several |
959 | other ways to do it. |
960 | The following all do the same thing: |
961 | .nf |
962 | |
963 | .ne 5 |
a687059c |
964 | if (!open(foo)) { die "Can't open $foo: $!"; } |
965 | die "Can't open $foo: $!" unless open(foo); |
966 | open(foo) || die "Can't open $foo: $!"; # foo or bust! |
ac58e20f |
967 | open(foo) ? \'hi mom\' : die "Can't open $foo: $!"; |
a687059c |
968 | # a bit exotic, that last one |
8d063cd8 |
969 | |
970 | .fi |
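To check that the four forms above behave alike, here is a modern Perl 5 sketch using three-argument open. It assumes the path does not exist, so every form reaches its die branch; eval traps each die so that all four can be run in turn. With parentheses around open's arguments, as here, || is safe; without them, the lower-precedence `or` would be needed.

```perl
use strict;
use warnings;

# Assumption: this path does not exist, so every form takes its die branch.
my $file = '/nonexistent/dir/input.txt';
my @messages;

for my $form (1 .. 4) {
    my $fh;
    my $ok = eval {
        if ($form == 1) {
            if (!open($fh, '<', $file)) { die "Can't open $file: $!\n"; }
        }
        elsif ($form == 2) {
            die "Can't open $file: $!\n" unless open($fh, '<', $file);
        }
        elsif ($form == 3) {
            open($fh, '<', $file) || die "Can't open $file: $!\n";
        }
        else {
            my $r = open($fh, '<', $file) ? 'hi mom' : die "Can't open $file: $!\n";
        }
        1;                                  # reached only if open succeeded
    };
    push @messages, $@ unless $ok;
}
print scalar(@messages), " of 4 forms died\n";
```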
8d063cd8 |
971 | .PP |
972 | The |
973 | .I if |
974 | statement is straightforward. |
975 | Since BLOCKs are always bounded by curly brackets, there is never any |
976 | ambiguity about which |
977 | .I if |
978 | an |
979 | .I else |
980 | goes with. |
981 | If you use |
982 | .I unless |
983 | in place of |
984 | .IR if , |
985 | the sense of the test is reversed. |
986 | .PP |
987 | The |
988 | .I while |
989 | statement executes the block as long as the expression is true |
990 | (does not evaluate to the null string or 0). |
991 | The LABEL is optional, and if present, consists of an identifier followed by |
992 | a colon. |
993 | The LABEL identifies the loop for the loop control statements |
994 | .IR next , |
a687059c |
995 | .IR last , |
8d063cd8 |
996 | and |
997 | .I redo |
998 | (see below). |
999 | If there is a |
1000 | .I continue |
1001 | BLOCK, it is always executed just before |
1002 | the conditional is about to be evaluated again, similarly to the third part |
1003 | of a |
1004 | .I for |
1005 | loop in C. |
1006 | Thus it can be used to increment a loop variable, even when the loop has |
1007 | been continued via the |
1008 | .I next |
1009 | statement (similar to the C \*(L"continue\*(R" statement). |
1010 | .PP |
1011 | If the word |
1012 | .I while |
1013 | is replaced by the word |
1014 | .IR until , |
1015 | the sense of the test is reversed, but the conditional is still tested before |
1016 | the first iteration. |
1017 | .PP |
1018 | In either the |
1019 | .I if |
1020 | or the |
1021 | .I while |
1022 | statement, you may replace \*(L"(EXPR)\*(R" with a BLOCK, and the conditional |
1023 | is true if the value of the last command in that block is true. |
1024 | .PP |
1025 | The |
1026 | .I for |
1027 | loop works exactly like the corresponding |
1028 | .I while |
1029 | loop: |
1030 | .nf |
1031 | |
1032 | .ne 12 |
1033 | for ($i = 1; $i < 10; $i++) { |
1034 | .\|.\|. |
1035 | } |
1036 | |
1037 | is the same as |
1038 | |
1039 | $i = 1; |
1040 | while ($i < 10) { |
1041 | .\|.\|. |
1042 | } continue { |
1043 | $i++; |
1044 | } |
1045 | .fi |
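The for/while equivalence, including the point that `next` still runs the continue block, can be checked with a small modern Perl 5 sketch. The skip-multiples-of-3 rule is just an illustrative choice; the last lines show that `until` tests before the first pass.

```perl
use strict;
use warnings;

# C-style for loop that skips multiples of 3 with next.
my @from_for;
for (my $i = 1; $i < 10; $i++) {
    next if $i % 3 == 0;
    push @from_for, $i;
}

# The same loop rewritten as while + continue.  next still runs the
# continue block, so $i advances exactly as the for loop's third clause does.
my @from_while;
my $i = 1;
while ($i < 10) {
    next if $i % 3 == 0;
    push @from_while, $i;
} continue {
    $i++;
}

# until reverses the test, but still checks it before the first pass.
my $n = 5;
my $passes = 0;
until ($n >= 5) { $passes++; $n++; }     # 5 >= 5 already holds: body never runs

print "@from_for | @from_while | passes=$passes\n";
```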
1046 | .PP |
378cc40b |
1047 | The foreach loop iterates over a normal array value and sets the variable |
1048 | VAR to be each element of the array in turn. |
13281fa4 |
1049 | The \*(L"foreach\*(R" keyword is actually identical to the \*(L"for\*(R" keyword, |
1050 | so you can use \*(L"foreach\*(R" for readability or \*(L"for\*(R" for brevity. |
378cc40b |
1051 | If VAR is omitted, $_ is set to each value. |
1052 | If ARRAY is an actual array (as opposed to an expression returning an array |
1053 | value), you can modify each element of the array |
1054 | by modifying VAR inside the loop. |
1055 | Examples: |
1056 | .nf |
1057 | |
1058 | .ne 5 |
1059 | for (@ary) { s/foo/bar/; } |
1060 | |
1061 | foreach $elem (@elements) { |
1062 | $elem *= 2; |
1063 | } |
1064 | |
a687059c |
1065 | .ne 3 |
1066 | for ((10,9,8,7,6,5,4,3,2,1,\'BOOM\')) { |
1067 | print $_, "\en"; sleep(1); |
378cc40b |
1068 | } |
1069 | |
a687059c |
1070 | for (1..15) { print "Merry Christmas\en"; } |
1071 | |
378cc40b |
1072 | .ne 3 |
a687059c |
1073 | foreach $item (split(/:[\e\e\en:]*/, $ENV{\'TERMCAP\'})) { |
378cc40b |
1074 | print "Item: $item\en"; |
1075 | } |
a687059c |
1076 | |
378cc40b |
1077 | .fi |
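A modern Perl 5 sketch of the aliasing rule: the loop variable is the array element itself, so assigning to it changes the array, while a loop over a copied list leaves the original alone. The TERMCAP-like string is hypothetical, used in place of $ENV{'TERMCAP'} so the result is predictable.

```perl
use strict;
use warnings;

# The loop variable is an alias, so changing it changes the array element.
my @ary = ('foo1', 'xfoo', 'bar');
for (@ary) { s/foo/bar/; }              # $_ aliases each element in turn

my @elements = (1, 2, 3);
foreach my $elem (@elements) {
    $elem *= 2;                          # doubles the element in place
}

# Iterating over a copy: @{[ ... ]} builds a new anonymous array,
# so assignments to $copy never reach @source.
my @source = (1, 2, 3);
foreach my $copy (@{[ @source ]}) {
    $copy *= 10;
}

# Hypothetical TERMCAP-style entry split on colons (plus backslash/newline runs).
my $termcap = 'vt100:co#80:li#24:';
my @items;
foreach my $item (split(/:[\\\n:]*/, $termcap)) {
    push @items, $item;
}
print "@ary | @elements | @source | @items\n";
```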
1078 | .PP |
8d063cd8 |
1079 | The BLOCK by itself (labeled or not) is equivalent to a loop that executes |
1080 | once. |
1081 | Thus you can use any of the loop control statements in it to leave or |
1082 | restart the block. |
1083 | The |
1084 | .I continue |
1085 | block is optional. |
1086 | This construct is particularly nice for doing case structures. |
1087 | .nf |
1088 | |
1089 | .ne 6 |
1090 | foo: { |
a687059c |
1091 | if (/^abc/) { $abc = 1; last foo; } |
1092 | if (/^def/) { $def = 1; last foo; } |
1093 | if (/^xyz/) { $xyz = 1; last foo; } |
8d063cd8 |
1094 | $nothing = 1; |
1095 | } |
1096 | |
1097 | .fi |
a687059c |
1098 | There is no official switch statement in perl, because there |
1099 | are already several ways to write the equivalent. |
1100 | In addition to the above, you could write |
378cc40b |
1101 | .nf |
1102 | |
a687059c |
1103 | .ne 6 |
1104 | foo: { |
ffed7fef |
1105 | $abc = 1, last foo if /^abc/; |
1106 | $def = 1, last foo if /^def/; |
1107 | $xyz = 1, last foo if /^xyz/; |
a687059c |
1108 | $nothing = 1; |
1109 | } |
1110 | |
1111 | or |
1112 | |
1113 | .ne 6 |
1114 | foo: { |
1115 | /^abc/ && do { $abc = 1; last foo; }; |
1116 | /^def/ && do { $def = 1; last foo; }; |
1117 | /^xyz/ && do { $xyz = 1; last foo; }; |
1118 | $nothing = 1; |
1119 | } |
1120 | |
1121 | or |
1122 | |
1123 | .ne 6 |
1124 | foo: { |
1125 | /^abc/ && ($abc = 1, last foo); |
1126 | /^def/ && ($def = 1, last foo); |
1127 | /^xyz/ && ($xyz = 1, last foo); |
1128 | $nothing = 1; |
1129 | } |
1130 | |
1131 | or even |
1132 | |
378cc40b |
1133 | .ne 8 |
a687059c |
1134 | if (/^abc/) |
79a0689e |
1135 | { $abc = 1; } |
a687059c |
1136 | elsif (/^def/) |
79a0689e |
1137 | { $def = 1; } |
a687059c |
1138 | elsif (/^xyz/) |
79a0689e |
1139 | { $xyz = 1; } |
a687059c |
1140 | else |
1141 | {$nothing = 1;} |
378cc40b |
1142 | |
1143 | .fi |
a687059c |
1144 | As it happens, these are all optimized internally to a switch structure, |
1145 | so perl jumps directly to the desired statement, and you needn't worry |
1146 | about perl executing a lot of unnecessary statements when you have a string |
1147 | of 50 elsifs, as long as you are testing the same simple scalar variable |
1148 | using ==, eq, or pattern matching as above. |
1149 | (If you're curious as to whether the optimizer has done this for a particular |
1150 | case statement, you can use the \-D1024 switch to list the syntax tree |
1151 | before execution.) |
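A runnable modern Perl 5 version of the do-block case structure, wrapped in a hypothetical classify() subroutine so several inputs can be tried. The semicolon after each `do { ... }` matters: without it, the next line's leading / would be parsed as a division operator.

```perl
use strict;
use warnings;

# A labeled bare block acts as a loop that runs once, so "last SWITCH"
# leaves it as soon as one case has matched.  SWITCH is an arbitrary label.
sub classify {
    my ($s) = @_;
    my $kind;
    SWITCH: {
        $s =~ /^abc/ && do { $kind = 'abc'; last SWITCH; };
        $s =~ /^def/ && do { $kind = 'def'; last SWITCH; };
        $s =~ /^xyz/ && do { $kind = 'xyz'; last SWITCH; };
        $kind = 'nothing';
    }
    return $kind;
}

my @kinds = map { classify($_) } qw(abcd define xyzzy other);
print "@kinds\n";
```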
8d063cd8 |
1152 | .Sh "Simple statements" |
1153 | The only kind of simple statement is an expression evaluated for its side |
1154 | effects. |
1155 | Every expression (simple statement) must be terminated with a semicolon. |
1156 | Note that this is like C, but unlike Pascal (and |
1157 | .IR awk ). |
1158 | .PP |
1159 | Any simple statement may optionally be followed by a |
1160 | single modifier, just before the terminating semicolon. |
1161 | The possible modifiers are: |
1162 | .nf |
1163 | |
1164 | .ne 4 |
1165 | if EXPR |
1166 | unless EXPR |
1167 | while EXPR |
1168 | until EXPR |
1169 | |
1170 | .fi |
1171 | The |
1172 | .I if |
1173 | and |
1174 | .I unless |
1175 | modifiers have the expected semantics. |
1176 | The |
1177 | .I while |
1178 | and |
378cc40b |
1179 | .I until |
8d063cd8 |
1180 | modifiers also have the expected semantics (conditional evaluated first), |
1181 | except when applied to a do-BLOCK command, |
1182 | in which case the block executes once before the conditional is evaluated. |
1183 | This is so that you can write loops like: |
1184 | .nf |
1185 | |
1186 | .ne 4 |
1187 | do { |
a687059c |
1188 | $_ = <STDIN>; |
8d063cd8 |
1189 | .\|.\|. |
1190 | } until $_ \|eq \|".\|\e\|n"; |
1191 | |
1192 | .fi |
1193 | (See the |
1194 | .I do |
1195 | operator below. Note also that the loop control commands described later will |
83b4785a |
1196 | NOT work in this construct, since modifiers don't take loop labels. |
8d063cd8 |
1197 | Sorry.) |
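The difference between a do-BLOCK with a modifier and an ordinary statement with a modifier can be seen in this modern Perl 5 sketch. An in-memory handle replaces STDIN so it runs unattended, and the loop also stops at end of input so it cannot spin on undefined lines.

```perl
use strict;
use warnings;

# Assumption: an in-memory handle replaces STDIN so the sketch runs unattended.
my $input = "first\nsecond\n.\nignored\n";
open(my $in, '<', \$input) or die "Can't open in-memory input: $!";

my @got;
do {
    $_ = <$in>;
    push @got, $_ if defined($_) && $_ ne ".\n";
} until !defined($_) || $_ eq ".\n";     # tested after each pass

# The body of a do-BLOCK runs once even when the condition starts out true.
my $do_runs = 0;
do { $do_runs++ } until 1;

# A modifier on an ordinary statement tests first, so this never runs.
my $plain_runs = 0;
$plain_runs++ until 1;

print scalar(@got), " lines, do ran $do_runs, plain ran $plain_runs\n";
```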
1198 | .Sh "Expressions" |
1199 | Since |
1200 | .I perl |
1201 | expressions work almost exactly like C expressions, only the differences |
1202 | will be mentioned here. |
1203 | .PP |
1204 | Here's what |
1205 | .I perl |
1206 | has that C doesn't: |
a687059c |
1207 | .Ip ** 8 2 |
1208 | The exponentiation operator. |
1209 | .Ip **= 8 |
1210 | The exponentiation assignment operator. |
8d063cd8 |
1211 | .Ip (\|) 8 3 |
1212 | The null list, used to initialize an array to null. |
1213 | .Ip . 8 |
1214 | Concatenation of two strings. |
1215 | .Ip .= 8 |
a687059c |
1216 | The concatenation assignment operator. |
8d063cd8 |
1217 | .Ip eq 8 |
1218 | String equality (== is numeric equality). |
1219 | For a mnemonic just think of \*(L"eq\*(R" as a string. |
1220 | (If you are used to the |
1221 | .I awk |
1222 | behavior of using == for either string or numeric equality |
1223 | based on the current form of the comparands, beware! |
1224 | You must be explicit here.) |
1225 | .Ip ne 8 |
1226 | String inequality (!= is numeric inequality). |
1227 | .Ip lt 8 |
1228 | String less than. |
1229 | .Ip gt 8 |
1230 | String greater than. |
1231 | .Ip le 8 |
1232 | String less than or equal. |
1233 | .Ip ge 8 |
1234 | String greater than or equal. |
1235 | .Ip =~ 8 2 |
1236 | Certain operations search or modify the string \*(L"$_\*(R" by default. |
1237 | This operator makes that kind of operation work on some other string. |
1238 | The right argument is a search pattern, substitution, or translation. |
1239 | The left argument is what is supposed to be searched, substituted, or |
1240 | translated instead of the default \*(L"$_\*(R". |
1241 | The return value indicates the success of the operation. |
1242 | (If the right argument is an expression other than a search pattern, |
1243 | substitution, or translation, it is interpreted as a search pattern |
1244 | at run time. |
1245 | This is less efficient than an explicit search, since the pattern must |
1246 | be compiled every time the expression is evaluated.) |
1247 | The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else. |
1248 | .Ip !~ 8 |
1249 | Just like =~ except the return value is negated. |
1250 | .Ip x 8 |
1251 | The repetition operator. |
1252 | Returns a string consisting of the left operand repeated the |
1253 | number of times specified by the right operand. |
1254 | .nf |
1255 | |
a687059c |
1256 | print \'\-\' x 80; # print row of dashes |
1257 | print \'\-\' x80; # illegal, x80 is identifier |
8d063cd8 |
1258 | |
a687059c |
1259 | print "\et" x ($tab/8), \' \' x ($tab%8); # tab over |
8d063cd8 |
1260 | |
1261 | .fi |
1262 | .Ip x= 8 |
a687059c |
1263 | The repetition assignment operator. |
1264 | .Ip .\|. 8 |
1265 | The range operator, which is really two different operators depending |
1266 | on the context. |
1267 | In an array context, returns an array of values counting (by ones) |
1268 | from the left value to the right value. |
1269 | This is useful for writing \*(L"for (1..10)\*(R" loops and for doing |
1270 | slice operations on arrays. |
1271 | .Sp |
1272 | In a scalar context, .\|. returns a boolean value. |
1273 | The operator is bistable, like a flip-flop. |
1274 | Each .\|. operator maintains its own boolean state. |
378cc40b |
1275 | It is false as long as its left operand is false. |
1276 | Once the left operand is true, the range operator stays true |
1277 | until the right operand is true, |
1278 | AFTER which the range operator becomes false again. |
a687059c |
1279 | (It doesn't become false till the next time the range operator is evaluated. |
8d063cd8 |
1280 | It can become false on the same evaluation it became true, but it still returns |
1281 | true once.) |
13281fa4 |
1282 | The right operand is not evaluated while the operator is in the \*(L"false\*(R" state, |
1283 | and the left operand is not evaluated while the operator is in the \*(L"true\*(R" state. |
a687059c |
1284 | The scalar .\|. operator is primarily intended for doing line number ranges |
1285 | after |
8d063cd8 |
1286 | the fashion of \fIsed\fR or \fIawk\fR. |
1287 | The precedence is a little lower than || and &&. |
1288 | The value returned is either the null string for false, or a sequence number |
1289 | (beginning with 1) for true. |
1290 | The sequence number is reset for each range encountered. |
a687059c |
1291 | The final sequence number in a range has the string \'E0\' appended to it, which |
8d063cd8 |
1292 | doesn't affect its numeric value, but gives you something to search for if you |
1293 | want to exclude the endpoint. |
1294 | You can exclude the beginning point by waiting for the sequence number to be |
1295 | greater than 1. |
a687059c |
1296 | If either operand of scalar .\|. is static, that operand is implicitly compared |
1297 | to the $. variable, the current line number. |
8d063cd8 |
1298 | Examples: |
1299 | .nf |
1300 | |
a687059c |
1301 | .ne 6 |
1302 | As a scalar operator: |
1303 | if (101 .\|. 200) { print; } # print 2nd hundred lines |
8d063cd8 |
1304 | |
a687059c |
1305 | next line if (1 .\|. /^$/); # skip header lines |
8d063cd8 |
1306 | |
a687059c |
1307 | s/^/> / if (/^$/ .\|. eof()); # quote body |
1308 | |
1309 | .ne 4 |
1310 | As an array operator: |
1311 | for (101 .\|. 200) { print; } # print $_ 100 times |
1312 | |
1313 | @foo = @foo[$[ .\|. $#foo]; # an expensive no-op |
1314 | @foo = @foo[$#foo-4 .\|. $#foo]; # slice last 5 items |
8d063cd8 |
1315 | |
1316 | .fi |
378cc40b |
1317 | .Ip \-x 8 |
1318 | A file test. |
1319 | This unary operator takes one argument, either a filename or a filehandle, |
1320 | and tests the associated file to see if something is true about it. |
a687059c |
1321 | If the argument is omitted, tests $_, except for \-t, which tests |
1322 | .IR STDIN . |
1323 | It returns 1 for true and \'\' for false, or the undefined value if the |
1324 | file doesn't exist. |
378cc40b |
1325 | Precedence is higher than logical and relational operators, but lower than |
1326 | arithmetic operators. |
1327 | The operator may be any of: |
1328 | .nf |
1329 | \-r File is readable by effective uid. |
a687059c |
1330 | \-w File is writable by effective uid. |
378cc40b |
1331 | \-x File is executable by effective uid. |
1332 | \-o File is owned by effective uid. |
1333 | \-R File is readable by real uid. |
a687059c |
1334 | \-W File is writable by real uid. |
378cc40b |
1335 | \-X File is executable by real uid. |
1336 | \-O File is owned by real uid. |
1337 | \-e File exists. |
1338 | \-z File has zero size. |
1339 | \-s File has non-zero size. |
1340 | \-f File is a plain file. |
1341 | \-d File is a directory. |
1342 | \-l File is a symbolic link. |
1343 | \-p File is a named pipe (FIFO). |
1344 | \-S File is a socket. |
1345 | \-b File is a block special file. |
1346 | \-c File is a character special file. |
1347 | \-u File has setuid bit set. |
1348 | \-g File has setgid bit set. |
1349 | \-k File has sticky bit set. |
1350 | \-t Filehandle is opened to a tty. |
1351 | \-T File is a text file. |
1352 | \-B File is a binary file (opposite of \-T). |
1353 | |
1354 | .fi |
1355 | The interpretation of the file permission operators \-r, \-R, \-w, \-W, \-x and \-X |
1356 | is based solely on the mode of the file and the uids and gids of the user. |
1357 | There may be other reasons you can't actually read, write or execute the file. |
1358 | Also note that, for the superuser, \-r, \-R, \-w and \-W always return 1, and |
1359 | \-x and \-X return 1 if any execute bit is set in the mode. |
1360 | Scripts run by the superuser may thus need to do a stat() in order to determine |
1361 | the actual mode of the file, or temporarily set the uid to something else. |
1362 | .Sp |
1363 | Example: |
1364 | .nf |
1365 | .ne 7 |
1366 | |
1367 | while (<>) { |
1368 | chop; |
1369 | next unless \-f $_; # ignore specials |
1370 | .\|.\|. |
1371 | } |
1372 | |
1373 | .fi |
a687059c |
1374 | Note that \-s/a/b/ does not do a negated substitution. |
1375 | Saying \-exp($foo) still works as expected, however\*(--only single letters |
378cc40b |
1376 | following a minus are interpreted as file tests. |
1377 | .Sp |
1378 | The \-T and \-B operators work as follows. |
1379 | The first block or so of the file is examined for odd characters such as |
1380 | strange control codes or metacharacters. |
1381 | If too many odd characters (>10%) are found, it's a \-B file, otherwise it's a \-T file. |
1382 | Also, any file containing null in the first block is considered a binary file. |
1383 | If \-T or \-B is used on a filehandle, the current stdio buffer is examined |
1384 | rather than the first block. |
378cc40b |
1385 | Both \-T and \-B return TRUE on a null file, or a file at EOF when testing |
1386 | a filehandle. |
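The repetition operator and both faces of the range operator can be tried in one modern Perl 5 sketch. The tab column and the mail-like message are hypothetical inputs. In the flip-flop loop, the constant 1 is compared against $., so the range opens on line 1 and closes on the first blank line, whose sequence number carries the E0 suffix.

```perl
use strict;
use warnings;

# Repetition: left operand repeated right-operand times.
my $dashes = '-' x 10;
my $tab    = 19;                                  # hypothetical column
my $indent = "\t" x int($tab / 8) . ' ' x ($tab % 8);

# Array-context range: counting lists and slices.
my @items = (1 .. 8);
my @last5 = @items[$#items - 4 .. $#items];       # (4, 5, 6, 7, 8)

# Scalar-context range (flip-flop) over numbered lines.
my $mail = "From: a\nSubject: hi\n\nbody 1\nbody 2\n";
open(my $fh, '<', \$mail) or die "Can't open in-memory message: $!";
my (@seq, @header);
while (<$fh>) {
    if (my $n = (1 .. /^$/)) {                    # true from line 1 to the blank line
        push @seq, $n;                            # "1", "2", then "3E0"
        push @header, $_ unless /^$/;
    }
}
close($fh);
print "@seq\n";
```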
8d063cd8 |
1387 | .PP |
a687059c |
1388 | If any of the file tests (or either stat operator) are given the special |
1389 | filehandle consisting of a solitary underline, then the stat structure |
1390 | of the previous file test (or stat operator) is used, saving a system |
1391 | call. |
1392 | (This doesn't work with \-t, and you need to remember that lstat and \-l |
1393 | will leave values in the stat structure for the symbolic link, not the |
1394 | real file.) |
1395 | Example: |
1396 | .nf |
1397 | |
1398 | print "Can do.\en" if -r $a || -w _ || -x _; |
1399 | |
1400 | .ne 9 |
1401 | stat($filename); |
1402 | print "Readable\en" if -r _; |
1403 | print "Writable\en" if -w _; |
1404 | print "Executable\en" if -x _; |
1405 | print "Setuid\en" if -u _; |
1406 | print "Setgid\en" if -g _; |
1407 | print "Sticky\en" if -k _; |
1408 | print "Text\en" if -T _; |
1409 | print "Binary\en" if -B _; |
1410 | |
1411 | .fi |
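A modern Perl 5 sketch of the file tests and the solitary-underline filehandle. It assumes a writable temporary directory; File::Temp picks the file name. After `-e $file`, the `-f _` and `-s _` tests reuse the stat buffer instead of making another system call.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# Scratch directory and file; the names are whatever File::Temp picks.
my $dir = tempdir(CLEANUP => 1);
my ($fh, $file) = tempfile(DIR => $dir);
print $fh "hello\n";
close($fh);

my @facts;
push @facts, 'exists'  if -e $file;     # stats $file
push @facts, 'plain'   if -f _;         # reuses that stat buffer: no new call
push @facts, 'nonzero' if -s _;
push @facts, 'text'    if -T $file;
push @facts, 'dir'     if -d $dir;

my $size = -s $file;                     # -s returns the size in bytes when non-zero
print "@facts ($size bytes)\n";
```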
1412 | .PP |
8d063cd8 |
1413 | Here is what C has that |
1414 | .I perl |
1415 | doesn't: |
1416 | .Ip "unary &" 12 |
1417 | Address-of operator. |
1418 | .Ip "unary *" 12 |
1419 | Dereference-address operator. |
378cc40b |
1420 | .Ip "(TYPE)" 12 |
1421 | Type casting operator. |
8d063cd8 |
1422 | .PP |
1423 | Like C, |
1424 | .I perl |
1425 | does a certain amount of expression evaluation at compile time, whenever |
1426 | it determines that all of the arguments to an operator are static and have |
1427 | no side effects. |
1428 | In particular, string concatenation happens at compile time between literals that don't do variable substitution. |
1429 | Backslash interpretation also happens at compile time. |
1430 | You can say |
1431 | .nf |
1432 | |
1433 | .ne 2 |
a687059c |
1434 | \'Now is the time for all\' . "\|\e\|n" . |
1435 | \'good men to come to.\' |
8d063cd8 |
1436 | |
1437 | .fi |
1438 | and this all reduces to one string internally. |
1439 | .PP |
378cc40b |
1440 | The autoincrement operator has a little extra built-in magic to it. |
1441 | If you increment a variable that is numeric, or that has ever been used in |
1442 | a numeric context, you get a normal increment. |
1443 | If, however, the variable has only been used in string contexts since it |
1444 | was set, and has a value that is not null and matches the |
a687059c |
1445 | pattern /^[a\-zA\-Z]*[0\-9]*$/, the increment is done |
378cc40b |
1446 | as a string, preserving each character within its range, with carry: |
1447 | .nf |
1448 | |
a687059c |
1449 | print ++($foo = \'99\'); # prints \*(L'100\*(R' |
1450 | print ++($foo = \'a0\'); # prints \*(L'a1\*(R' |
1451 | print ++($foo = \'Az\'); # prints \*(L'Ba\*(R' |
1452 | print ++($foo = \'zz\'); # prints \*(L'aaa\*(R' |
378cc40b |
1453 | |
1454 | .fi |
1455 | The autodecrement is not magical. |
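The magical string increment can be checked with a short modern Perl 5 sketch. Each start value is a fresh string never used as a number, so ++ carries within each character's range (a-z, A-Z, 0-9) and grows the string when the leftmost character overflows.

```perl
use strict;
use warnings;

# String increment carries within each character class: a-z, A-Z, 0-9.
my @results;
for my $start ('99', 'a0', 'Az', 'zz', 'a9', 'Zz') {
    my $v = $start;      # a fresh string, never used numerically
    $v++;
    push @results, $v;
}
print "@results\n";      # 100 a1 Ba aaa b0 AAa
```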
0f85fab0 |
1456 | .PP |
1457 | The range operator (in an array context) makes use of the magical |
1458 | autoincrement algorithm if the minimum and maximum are strings. |
1459 | You can say |
1460 | |
1461 | @alphabet = (\'A\' .. \'Z\'); |
1462 | |
1463 | to get all the letters of the alphabet, or |
1464 | |
1465 | $hexdigit = (0 .. 9, \'a\' .. \'f\')[$num & 15]; |
1466 | |
1467 | to get a hexadecimal digit, or |
1468 | |
1469 | @z2 = (\'01\' .. \'31\'); print @z2[$mday]; |
1470 | |
1471 | to get dates with leading zeros. |
1472 | (If the final value specified is not in the sequence that the magical increment |
1473 | would produce, the sequence goes until the next value would be longer than |
1474 | the final value specified.) |
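The string-range examples above, plus the case in the parenthetical, in a modern Perl 5 sketch ($num is a hypothetical input). Because 'b0' never appears in the sequence that starts at 'a', that range runs through every one- and two-letter string and stops before the first three-character value.

```perl
use strict;
use warnings;

my @alphabet = ('A' .. 'Z');                       # 26 letters
my $num      = 27;                                  # hypothetical input
my $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];    # 27 & 15 == 11, so 'b'
my @z2       = ('01' .. '31');                      # '01', '02', ..., '31'

# 'b0' never appears in the sequence starting at 'a', so the range runs
# until the next value would be longer than 'b0' (two characters).
my @runs_out = ('a' .. 'b0');                       # 'a' .. 'z', 'aa' .. 'zz'

print scalar(@alphabet), " $hexdigit $z2[0]..$z2[-1] ", scalar(@runs_out), "\n";
```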