=head1 NAME

perltrap - Perl traps for the unwary

=head1 DESCRIPTION

The biggest trap of all is forgetting to use the B<-w> switch; see
L<perlrun>. The second biggest trap is not making your entire program
runnable under C<use strict>.
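
This is a minimal sketch of what these two guards catch. It assumes Perl 5.6 or later, where C<use warnings> is the lexically scoped counterpart of B<-w>. The snippet compiles a fragment that assigns to an undeclared global under C<use strict>. It then provokes an "uninitialized value" warning, capturing each diagnostic instead of letting it reach STDERR. The variable names are illustrative.

```perl
use strict;
use warnings;

# Compile a fragment that assigns to an undeclared global. Under
# "use strict" this is a compile-time error, so the string eval
# returns undef and leaves the message in $@.
my $ok = eval 'use strict; $undeclared = 42; 1';
print( $ok ? "compiled without complaint\n" : "strict says: $@" );

# Warnings catch a different class of mistake, such as using an
# undefined value in a string. Collect them rather than printing them.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $name;                          # deliberately left undefined
    my $greeting = "hello, $name";     # triggers an "uninitialized" warning
}
print "warnings caught: ", scalar(@warnings), "\n";
```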

=head2 Awk Traps

Accustomed B<awk> users should take special note of the following:

=over 4

=item *

The English module, loaded via

    use English;

allows you to refer to special variables (like $RS) as
though they were in B<awk>; see L<perlvar> for details.

=item *

Semicolons are required after all simple statements in Perl (except
at the end of a block). Newline is not a statement delimiter.

=item *

Curly brackets are required on C<if>s and C<while>s.

=item *

Variables begin with "$", "@", or "%" in Perl.

=item *

Arrays index from 0. Likewise string positions in substr() and
index().

=item *

You have to decide whether your array has numeric or string indices.

=item *

Associative array values do not spring into existence upon mere
reference.

=item *

You have to decide whether you want to use string or numeric
comparisons.

=item *

Reading an input line does not split it for you. You get to split it
to an array yourself. And the split() operator has different
arguments than B<awk>'s.

=item *

The current input line is normally in $_, not $0. It generally does
not have the newline stripped. ($0 is the name of the program
executed.) See L<perlvar>.

=item *

C<$>I<digit> does not refer to fields; it refers to substrings matched by
the last match pattern.

=item *

The print() statement does not add field and record separators unless
you set C<$,> and C<$\>. You can set $OFS and $ORS if you're using
the English module.

=item *

You must open your files before you print to them.

=item *

The range operator is "..", not comma. The comma operator works as in
C.

=item *

The match operator is "=~", not "~". ("~" is the one's complement
operator, as in C.)

=item *

The exponentiation operator is "**", not "^". "^" is the XOR
operator, as in C. (You know, one could get the feeling that B<awk> is
basically incompatible with C.)

=item *

The concatenation operator is ".", not the null string. (Using the
null string would render C</pat/ /pat/> unparsable, since the third slash
would be interpreted as a division operator; the tokenizer is in fact
slightly context sensitive for operators like "/", "?", and ">".
And in fact, "." itself can be the beginning of a number.)

=item *

The C<next>, C<exit>, and C<continue> keywords work differently.

=item *

The following variables work differently:

    Awk       Perl
    ARGC      scalar @ARGV (compare with $#ARGV)
    ARGV[0]   $0
    FILENAME  $ARGV
    FNR       $. - something
    FS        (whatever you like)
    NF        $#Fld, or some such
    NR        $.
    OFMT      $#
    OFS       $,
    ORS       $\
    RLENGTH   length($&)
    RS        $/
    RSTART    length($`)
    SUBSEP    $;

=item *

You cannot set $RS to a pattern, only a string.

=item *

When in doubt, run the B<awk> construct through B<a2p> and see what it
gives you.

=back
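
Several of the B<awk> items above fit together in one small sketch. It is a rough, hypothetical equivalent of the B<awk> program C<{ print $2, $1 }> run over three hard-coded lines. Perl leaves the newline on C<$_> and does not split the line for you. The resulting array indexes from 0. print() adds separators only when C<$,> and C<$\> are set. The sketch also contrasts C<**>, C<^>, and C<.> with their B<awk> counterparts. The data and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input; a real script would read these lines from <>.
my @input = ("alpha 1\n", "beta 2\n", "gamma 3\n");

# Rough equivalent of the awk program { print $2, $1 }.
my @output;
for (@input) {                  # each line is aliased to $_, newline and all
    chomp;                      # awk drops the newline; Perl makes you ask
    my @fld = split ' ';        # awk-style whitespace splitting, done by hand
    push @output, join(' ', $fld[1], $fld[0]);   # awk's $2, $1: index from 0
}

{
    # print() adds separators only when $, (OFS) and $\ (ORS) are set.
    local $, = ' ';
    local $\ = "\n";
    print 'records:', scalar(@output);   # prints "records: 3" and a newline
}
print "$_\n" for @output;

# Operators that differ from awk.
my $power  = 2 ** 3;            # exponentiation: 8
my $xor    = 2 ^ 3;             # "^" is bitwise XOR, as in C: 1
my $joined = 'foo' . 'bar';     # concatenation needs an explicit "."
```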

=head2 C Traps

Cerebral C programmers should take note of the following:

=over 4

=item *

Curly brackets are required on C<if>s and C<while>s.

=item *

You must use C<elsif> rather than C<else if>.

=item *

The C<break> and C<continue> keywords from C become C<last> and C<next>
in Perl, respectively.
Unlike in C, these do I<NOT> work within a C<do { } while> construct.

=item *

There's no switch statement. (But it's easy to build one on the fly.)

=item *

Variables begin with "$", "@", or "%" in Perl.

=item *

printf() does not implement the "*" format for interpolating
field widths, but it's trivial to use interpolation of double-quoted
strings to achieve the same effect.

=item *

Comments begin with "#", not "/*".

=item *

You can't take the address of anything, although a similar operator
in Perl 5 is the backslash, which creates a reference.

=item *

C<ARGV> must be capitalized. C<$ARGV[0]> is C's C<argv[1]>, and C<argv[0]>
ends up in C<$0>.

=item *

System calls such as link(), unlink(), rename(), etc. return nonzero for
success, not 0.

=item *

Signal handlers deal with signal names, not numbers. Use C<kill -l>
to find their names on your system.

=back
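
Three of the C items above can be sketched in code. The first is a C<switch>-like dispatch built from a labeled bare block and C<last>; this is one common idiom, not the only one. The second is a run-time field width, obtained by interpolating it into the printf() format instead of using C<*>. The third is the success-is-true convention of Perl's system-call wrappers. The function C<classify> and the file name are hypothetical.

```perl
use strict;
use warnings;

# A switch built on the fly: a bare block is a loop that runs once,
# so "last" leaves it, much as "break" leaves a C switch.
sub classify {
    my ($ch) = @_;
    my $kind;
    SWITCH: {
        if ($ch =~ /^[0-9]$/)  { $kind = 'digit';  last SWITCH; }
        if ($ch =~ /^[a-z]$/i) { $kind = 'letter'; last SWITCH; }
        $kind = 'other';                     # the "default" case
    }
    return $kind;
}

# A run-time field width: interpolate it into the format string
# instead of using C's "*".
my $width = 6;
my $cell  = sprintf "%${width}s|", 'ab';     # "    ab|"

# Perl built-ins return true on success and false on failure, the
# reverse of C system calls. unlink() returns the number of files it
# removed, so a (hypothetical) missing file yields 0 and sets $!.
my $removed = unlink 'perltrap-no-such-file';
print "unlink failed: $!\n" unless $removed;
```

The label on the bare block keeps each C<last SWITCH> unambiguous even if the block is later nested inside a real loop.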

=head2 Sed Traps

Seasoned B<sed> programmers should take note of the following:

=over 4

=item *

Backreferences in substitutions use "$" rather than "\".

=item *

The pattern matching metacharacters "(", ")", and "|" do not have backslashes
in front.

=item *

The range operator is C<...>, rather than comma.

=back
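
Here are the three B<sed> items above, applied to a few hypothetical lines. The B<sed> command C<s/\(foo\)\(bar\)/\2\1/> becomes C<s/(foo)(bar)/$2$1/>. Alternation needs no backslash. B<sed>'s C</START/,/END/> address range becomes the C<...> flip-flop operator.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @lines = ('foobar', 'START', 'cat or dog', 'END', 'after');

# sed:  s/\(foo\)\(bar\)/\2\1/
# Perl: bare parentheses group, and the replacement refers to the
# captures as $1 and $2 instead of \1 and \2.
(my $swapped = $lines[0]) =~ s/(foo)(bar)/$2$1/;     # 'barfoo'

# Alternation is likewise a bare "|", with no backslash.
my @pets = grep { /\b(?:cat|dog)\b/ } @lines;         # ('cat or dog')

# sed:  /START/,/END/p
# Perl: the "..." flip-flop is true from a START line through the next
# END line, and like sed it does not test END on the START line itself.
my @block = grep { /START/ ... /END/ } @lines;        # ('START', 'cat or dog', 'END')
```

Using C<..> instead would also test C</END/> on the START line. That differs from B<sed> only when a single line matches both patterns.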

=head2 Shell Traps

Sharp shell programmers should take note of the following:

=over 4

=item *

The backtick operator does variable interpolation without regard to
the presence of single quotes in the command.

=item *

The backtick operator does no translation of the return value, unlike B<csh>.

=item *

Shells (especially B<csh>) do several levels of substitution on each
command line. Perl does substitution only in certain constructs
such as double quotes, backticks, angle brackets, and search patterns.

=item *

Shells interpret scripts a little bit at a time. Perl compiles the
entire program before executing it (except for C<BEGIN> blocks, which
execute at compile time).

=item *

The arguments are available via @ARGV, not $1, $2, etc.

=item *

The environment is not automatically made available as separate scalar
variables.

=back
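
The shell items above translate into Perl as follows. Arguments come from C<@ARGV>, the environment is the C<%ENV> hash, and backticks interpolate Perl variables even inside single quotes. This sketch assumes a Unix-like system where backticks run C</bin/sh> with an C<echo> command. C<PERLTRAP_DEMO> is a made-up environment variable.

```perl
use strict;
use warnings;

# Arguments arrive in @ARGV, not in $1, $2, ... (faked here for the demo).
local @ARGV = ('first', 'second');
my ($arg1, $arg2) = @ARGV;

# The environment is the %ENV hash; there is no $HOME or $PATH scalar
# unless you create one. PERLTRAP_DEMO is a made-up name.
$ENV{PERLTRAP_DEMO} = 'from-env';
my $from_env = $ENV{PERLTRAP_DEMO};

# Backticks interpolate Perl variables before the shell runs, even
# inside single quotes, so the shell only ever sees the value.
my $word   = 'interpolated';
my $output = `echo '$word'`;
chomp $output;                      # the trailing newline is kept

# To have the shell expand a variable instead, hide the "$" from Perl.
my $shell_sees = `echo "\$PERLTRAP_DEMO"`;
chomp $shell_sees;
```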

=head2 Perl Traps

Practicing Perl Programmers should take note of the following:

=over 4

=item *

Remember that many operations behave differently in a list
context than they do in a scalar one. See L<perldata> for details.

=item *

Avoid barewords if you can, especially all lower-case ones.
You can't tell just by looking at a bareword whether it is
a function or a string. By using quotes on strings and
parentheses on function calls, you won't ever get them confused.

=item *

You cannot discern from mere inspection which built-ins
are unary operators (like chop() and chdir())
and which are list operators (like print() and unlink()).
(User-defined subroutines can B<only> be list operators, never
unary ones.) See L<perlop>.

=item *

People have a hard time remembering that some functions
default to $_, or @ARGV, or whatever, but that others which
you might expect to do not.

=item *

The <FH> construct is not the name of the filehandle; it is a readline
operation on that handle. The data read is assigned to $_ only if the
file read is the sole condition in a while loop:

    while (<FH>) { }        # each line is assigned to $_
    while ($_ = <FH>) { }   # the same assignment, spelled out
    <FH>;                   # data discarded!

=item *

Remember not to use "C<=>" when you need "C<=~>";
these two constructs are quite different:

    $x = /foo/;     # matches $_ against /foo/, stores the result in $x
    $x =~ /foo/;    # matches $x against /foo/

=item *

The C<do {}> construct isn't a real loop that you can use
loop control on.

=item *

Use my() for local variables whenever you can get away with
it (but see L<perlform> for where you can't).
Using local() actually gives a local value to a global
variable, which leaves you open to unforeseen side-effects
of dynamic scoping.

=back
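
A few of the Perl traps above are shown here in runnable form. They are one array in list and scalar context, C<$x = /foo/> versus C<$x =~ /foo/>, the implicit C<$_> assignment in a C<while> loop that reads a filehandle, and my() versus local(). An in-memory filehandle stands in for a real file; this assumes Perl 5.8 or later. The names are illustrative.

```perl
use strict;
use warnings;

# Context: an array gives its elements in list context and its
# length in scalar context.
my @colors  = qw(red green blue);
my ($first) = @colors;               # list assignment: 'red'
my $count   = @colors;               # scalar assignment: 3

# "=" versus "=~".
$_ = 'nothing to see';
my $x     = 'foobar';
my $bound = ($x =~ /foo/) ? 1 : 0;   # matches $x: 1
$x = /foo/;                          # matches $_ instead, so $x becomes ''

# Only a bare readline as the whole while condition assigns to $_.
# The in-memory "file" needs Perl 5.8 or later.
open my $fh, '<', \"one\ntwo\n" or die "cannot open in-memory file: $!";
my @seen;
while (<$fh>) {
    chomp;
    push @seen, $_;
}
close $fh;

# my() versus local(): local() gives a global a temporary value that
# every sub called in the meantime can see; my() creates a new lexical.
our $level = 'global';
sub report     { return $level }
sub with_local { local $level = 'dynamic'; return report() }
sub with_my    { my $level    = 'lexical'; return report() }
```

Because with_my() only creates a new lexical, report() still sees the package variable. With local(), the temporary value leaks into every sub called before the enclosing block exits.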

=head2 Perl4 Traps

Penitent Perl 4 Programmers should take note of the following
incompatible changes that occurred between release 4 and release 5:

=over 4

=item *

C<@> now always interpolates an array in double-quotish strings. Some programs
may now need to use backslash to protect any C<@> that shouldn't interpolate.

=item *

Barewords that used to look like strings to Perl will now look like subroutine
calls if a subroutine by that name is defined before the compiler sees them.
For example:

    sub SeeYa { die "Hasta la vista, baby!" }
    $SIG{'QUIT'} = SeeYa;

In Perl 4, that set the signal handler; in Perl 5, it actually calls the
function! You may use the B<-w> switch to find such places.

=item *

Symbols starting with C<_> are no longer forced into package C<main>, except
for $_ itself (and @_, etc.).

=item *

Double-colon is now a valid package separator in an identifier. Thus these
behave differently in perl4 vs. perl5:

    print "$a::$b::$c\n";
    print "$var::abc::xyz\n";

=item *

C<s'$lhs'$rhs'> now does no interpolation on either side. It used to
interpolate C<$lhs> but not C<$rhs>.

=item *

The second and third arguments of splice() are now evaluated in scalar
context (as the book says) rather than list context.

=item *

These are now semantic errors because of precedence:

    shift @list + 20;
    $n = keys %map + 20;

Because if that were to work, then this couldn't:

    sleep $dormancy + 20;

=item *

The precedence of assignment operators is now the same as the precedence
of assignment. Perl 4 mistakenly gave them the precedence of the associated
operator. So you now must parenthesize them in expressions like

    /foo/ ? ($a += 2) : ($a -= 2);

Otherwise

    /foo/ ? $a += 2 : $a -= 2;

would be erroneously parsed as

    (/foo/ ? $a += 2 : $a) -= 2;

On the other hand,

    $a += /foo/ ? 1 : 2;

now works as a C programmer would expect.

=item *

C<open FOO || die> is now incorrect. You need parentheses around the
filehandle, as in C<open(FOO) || die>. While temporarily supported, using
such a construct will generate a non-fatal (but non-suppressible) warning.

=item *

The elements of argument lists for formats are now evaluated in list
context. This means you can interpolate list values now.

=item *

You can't do a C<goto> into a block that is optimized away. Darn.

=item *

It is no longer syntactically legal to use whitespace as the name
of a variable, or as a delimiter for any kind of quote construct.
Double darn.

=item *

The caller() function now returns a false value in a scalar context if there
is no caller. This lets library files determine if they're being required.

=item *

C<m//g> now attaches its state to the searched string rather than the
regular expression.

=item *

C<reverse> is no longer allowed as the name of a sort subroutine.

=item *

B<taintperl> is no longer a separate executable. There is now a B<-T>
switch to turn on tainting when it isn't turned on automatically.

=item *

Double-quoted strings may no longer end with an unescaped C<$> or C<@>.

=item *

The archaic C<while/if> BLOCK BLOCK syntax is no longer supported.

=item *

Negative array subscripts now count from the end of the array.

=item *

The comma operator in a scalar context is now guaranteed to give a
scalar context to its arguments.

=item *

The C<**> operator now binds more tightly than unary minus.
It was documented to work this way before, but didn't.

=item *

Setting C<$#array> lower now discards array elements.

=item *

delete() is not guaranteed to return the old value for tie()d arrays,
since this capability may be onerous for some modules to implement.

=item *

The construct "this is $$x" used to interpolate the pid at that
point, but now tries to dereference $x. C<$$> by itself still
works fine, however.

=item *

Some error messages will be different.

=item *

Some bugs may have been inadvertently removed.

=back
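
Several of the Perl 5 behaviors listed above can be checked directly under a modern Perl 5. The sketch below covers negative subscripts and C<**> binding tighter than unary minus. It also covers truncation through C<$#array> and parenthesized assignment operators inside C<?:>. Finally, it shows C<"$$x"> as a dereference and escaping a literal C<@> in a double-quoted string. The variable names are illustrative.

```perl
use strict;
use warnings;

my @list = (10, 20, 30, 40);

my $last  = $list[-1];      # negative subscripts count from the end: 40
my $power = -2 ** 2;        # "**" binds tighter than unary minus: -4

$#list = 1;                 # lowering $#list discards elements: (10, 20)

# Assignment operators inside ?: need parentheses in Perl 5.
my $n = 0;
$_ = 'foo';
/foo/ ? ($n += 2) : ($n -= 2);     # $n is now 2

# "$$x" now dereferences $x rather than interpolating the pid.
my $target = 'world';
my $x      = \$target;
my $msg    = "hello, $$x";         # 'hello, world'

# "@" followed by a name interpolates an array, so escape a literal one.
my $email = "user\@example.com";   # 'user@example.com'
```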