=head1 NAME

perlsub - Perl subroutines

=head1 SYNOPSIS

To declare subroutines:

    sub NAME;               # A "forward" declaration.
    sub NAME(PROTO);        #  ditto, but with prototypes

    sub NAME BLOCK          # A declaration and a definition.
    sub NAME(PROTO) BLOCK   #  ditto, but with prototypes

To define an anonymous subroutine at runtime:

    $subref = sub BLOCK;

To import subroutines:

    use PACKAGE qw(NAME1 NAME2 NAME3);

To call subroutines:

    NAME(LIST);    # & is optional with parentheses.
    NAME LIST;     # Parentheses optional if predeclared/imported.
    &NAME;         # Passes current @_ to subroutine.

=head1 DESCRIPTION

Like many languages, Perl provides for user-defined subroutines. These
may be located anywhere in the main program, loaded in from other files
via the C<do>, C<require>, or C<use> keywords, or even generated on the
fly using C<eval> or anonymous subroutines (closures). You can even call
a function indirectly using a variable containing its name or a CODE reference
to it, as in C<$var = \&function>.
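
The sketch below shows both indirect styles. It uses a hypothetical C<greet> function and assumes Perl 5.004 or later for the C<< $ref->() >> call syntax. Calling through a variable that holds the I<name> of a sub is a symbolic reference, so that call is wrapped in C<no strict 'refs'>.

```perl
use strict;
use warnings;

# Hypothetical example function.
sub greet { return "hello, $_[0]" }

# Call through a CODE reference, in either the arrow or the & style.
my $code_ref  = \&greet;
my $via_arrow = $code_ref->('world');
my $via_amp   = &$code_ref('world');

# Call through a variable holding the sub's name: a symbolic reference,
# so strict 'refs' is switched off for just this one call.
my $name     = 'greet';
my $via_name = do { no strict 'refs'; &$name('world') };

print "$via_arrow\n$via_amp\n$via_name\n";
```

In new code the CODE reference form is usually preferable, since it keeps working under C<use strict>.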

The Perl model for function call and return values is simple: all
functions are passed as parameters one single flat list of scalars, and
all functions likewise return to their caller one single flat list of
scalars. Any arrays or hashes in these call and return lists will
collapse, losing their identities--but you may always use
pass-by-reference instead to avoid this. Both call and return lists may
contain as many or as few scalar elements as you'd like. (Often a
function without an explicit return statement is called a subroutine, but
there's really no difference from the language's perspective.)

Any arguments passed to the routine come in as the array @_. Thus if you
called a function with two arguments, those would be stored in C<$_[0]>
and C<$_[1]>. The array @_ is a local array, but its elements are
aliases for the actual scalar parameters. In particular, if an element
C<$_[0]> is updated, the corresponding argument is updated (or an error
occurs if it is not updatable). If an argument is an array or hash
element which did not exist when the function was called, that element is
created only when (and if) it is modified or if a reference to it is
taken. (Some earlier versions of Perl created the element whether or not
it was assigned to.) Note that assigning to the whole array @_ removes
the aliasing, and does not update any arguments.
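
The sketch below demonstrates the aliasing rules just described. The subroutine names are hypothetical. The hash-element behavior shown is what current Perls do; the text notes that some earlier versions differed.

```perl
use strict;
use warnings;

# $_[0] is an alias for the caller's variable, so this changes it.
sub double_in_place { $_[0] *= 2; return }

# Copying @_ into lexicals first leaves the caller's variable alone.
sub double_copy { my ($n) = @_; $n *= 2; return $n }

# Assigning to the whole of @_ breaks the aliasing before the update.
sub reassign_args { @_ = (99); $_[0] = 100; return }

# Neither modifies $_[0] nor takes a reference to it.
sub ignore_arg { return }

# Modifies $_[0], which brings a missing hash element into existence.
sub set_arg { $_[0] = 1; return }

my $x = 5;  double_in_place($x);         # $x becomes 10
my $y = 5;  my $z = double_copy($y);     # $y stays 5, $z is 10
my $w = 5;  reassign_args($w);           # $w stays 5

my %h;
ignore_arg($h{untouched});               # element is not created
set_arg($h{assigned});                   # element is created
```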

The return value of the subroutine is the value of the last expression
evaluated. Alternatively, a return statement may be used to exit the
subroutine, optionally specifying the returned value, which will be
evaluated in the appropriate context (list, scalar, or void) depending
on the context of the subroutine call. If you specify no return value,
the subroutine will return an empty list in a list context, an undefined
value in a scalar context, or nothing in a void context. If you return
one or more arrays and/or hashes, these will be flattened together into
one large indistinguishable list.
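
The sketch below illustrates these context rules using hypothetical subroutines. C<wantarray> is true when the caller wants a list and false when it wants a scalar.

```perl
use strict;
use warnings;

# A bare return: () in list context, undef in scalar context.
sub nothing { return }

# Two arrays returned together come back as one flat list.
sub two_arrays {
    my @p = (1, 2);
    my @q = (3, 4);
    return (@p, @q);
}

# Report which context the caller supplied.
sub which_context { return wantarray ? 'list' : 'scalar' }

my @list   = nothing();         # empty list
my $scalar = nothing();         # undef
my @flat   = two_arrays();      # (1, 2, 3, 4): no boundary between @p and @q
my ($ctx)  = which_context();   # list assignment gives list context
my $ctx2   = which_context();   # scalar assignment gives scalar context
```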

Perl does not have named formal parameters, but in practice all you do is
assign @_ to a my() list of variables. Any variables you use in the function
that aren't declared private are global variables. For the gory details
on creating private variables, see
L<"Private Variables via my()"> and L<"Temporary Values via local()">.
To create protected environments for a set of functions in a separate
package (and probably a separate file), see L<perlmod/"Packages">.

Example:

    sub max {
        my $max = shift(@_);
        foreach my $foo (@_) {
            $max = $foo if $max < $foo;
        }
        return $max;
    }
    $bestday = max($mon,$tue,$wed,$thu,$fri);

Example:

    # get a line, combining continuation lines
    #  that start with whitespace

    sub get_line {
        $thisline = $lookahead;  # GLOBAL VARIABLES!!
        LINE: while (defined($lookahead = <STDIN>)) {
            if ($lookahead =~ /^[ \t]/) {
                $thisline .= $lookahead;
            }
            else {
                last LINE;
            }
        }
        $thisline;
    }

    $lookahead = <STDIN>;       # get first line
    while ($_ = get_line()) {
        ...
    }

Use array assignment to a my() list to name your formal arguments:

    sub maybeset {
        my($key, $value) = @_;
        $Foo{$key} = $value unless $Foo{$key};
    }

This also has the effect of turning call-by-reference into call-by-value,
because the assignment copies the values. Otherwise a function is free to
do in-place modifications of @_ and change its caller's values.

    upcase_in($v1, $v2);  # this changes $v1 and $v2
    sub upcase_in {
        for (@_) { tr/a-z/A-Z/ }
    }

You aren't allowed to modify constants in this way, of course. If an
argument were actually literal and you tried to change it, you'd take a
(presumably fatal) exception. For example, this won't work:

    upcase_in("frederick");

It would be much safer if the upcase_in() function
were written to return a copy of its parameters instead
of changing them in place:

    ($v3, $v4) = upcase($v1, $v2);  # this doesn't change $v1 and $v2
    sub upcase {
        return unless defined wantarray;  # void context, do nothing
        my @parms = @_;
        for (@parms) { tr/a-z/A-Z/ }
        return wantarray ? @parms : $parms[0];
    }

Notice how this (unprototyped) function doesn't care whether it was passed
real scalars or arrays. Perl will see everything as one big long flat @_
parameter list. This is one of the ways in which Perl's simple
argument-passing style shines. The upcase() function would work perfectly
well without changing the upcase() definition even if we fed it things
like this:

    @newlist   = upcase(@list1, @list2);
    @newlist   = upcase( split /:/, $var );

Do not, however, be tempted to do this:

    (@a, @b)   = upcase(@list1, @list2);

Like its flat incoming parameter list, the return list is also
flat. So all you have managed to do here is store everything in @a and
leave @b empty. See L<"Pass by Reference"> for alternatives.
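
The sketch below shows the pitfall and one way around it. Here C<upcase> uses C<uc> in place of the C<tr/a-z/A-Z/> loop. C<upcase_refs> is a hypothetical helper that takes and returns array references, so each list keeps its identity.

```perl
use strict;
use warnings;

sub upcase { return map { uc($_) } @_ }

my @list1 = ('a', 'b');
my @list2 = ('c');

# The pitfall: @a swallows the whole flat return list.
my (@a, @b);
(@a, @b) = upcase(@list1, @list2);   # @a is (A, B, C), @b is ()

# Hypothetical fix: pass and return array references instead.
sub upcase_refs {
    my @out;
    for my $aref (@_) {
        push @out, [ map { uc($_) } @$aref ];
    }
    return @out;
}
my ($ra, $rb) = upcase_refs(\@list1, \@list2);   # [A, B] and [C]
```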
cb1a09d0 |
163 | |
5f05dabc |
164 | A subroutine may be called using the "&" prefix. The "&" is optional |
165 | in modern Perls, and so are the parentheses if the subroutine has been |
54310121 |
166 | predeclared. (Note, however, that the "&" is I<NOT> optional when |
5f05dabc |
167 | you're just naming the subroutine, such as when it's used as an |
168 | argument to defined() or undef(). Nor is it optional when you want to |
169 | do an indirect subroutine call with a subroutine name or reference |
170 | using the C<&$subref()> or C<&{$subref}()> constructs. See L<perlref> |
171 | for more on that.) |
a0d0e21e |
172 | |
173 | Subroutines may be called recursively. If a subroutine is called using |
cb1a09d0 |
174 | the "&" form, the argument list is optional, and if omitted, no @_ array is |
175 | set up for the subroutine: the @_ array at the time of the call is |
176 | visible to subroutine instead. This is an efficiency mechanism that |
177 | new users may wish to avoid. |
a0d0e21e |
178 | |
179 | &foo(1,2,3); # pass three arguments |
180 | foo(1,2,3); # the same |
181 | |
182 | foo(); # pass a null list |
183 | &foo(); # the same |
a0d0e21e |
184 | |
cb1a09d0 |
185 | &foo; # foo() get current args, like foo(@_) !! |
54310121 |
186 | foo; # like foo() IFF sub foo predeclared, else "foo" |
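
The sketch below shows how the argument-less C<&> form differs from an explicit empty list. C<show_args> and C<wrapper> are hypothetical subs.

```perl
use strict;
use warnings;

sub show_args { return join(',', @_) }

sub wrapper {
    my $empty  = show_args();   # explicit empty argument list
    my $shared = &show_args;    # & without parens: reuses wrapper's @_
    return ($empty, $shared);
}

my ($empty, $shared) = wrapper(1, 2, 3);
print "'$empty' vs '$shared'\n";   # prints '' vs '1,2,3'
```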

Not only does the "&" form make the argument list optional, but it also
disables any prototype checking on the arguments you do provide. This
is partly for historical reasons, and partly for having a convenient way
to cheat if you know what you're doing. See the section on Prototypes below.
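
The prototype bypass can be illustrated with a hypothetical C<nargs> sub. The sub is defined before the calls are compiled, because prototypes only take effect when the declaration is visible at compile time.

```perl
use strict;
use warnings;

# The ($) prototype imposes scalar context on the single argument.
sub nargs ($) { return scalar @_ }

my @list = (10, 20, 30);
my $checked  = nargs(@list);    # prototype applies: one argument, scalar(@list)
my $bypassed = &nargs(@list);   # & form ignores the prototype: three arguments
```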

=head2 Private Variables via my()

Synopsis:

    my $foo;            # declare $foo lexically local
    my (@wid, %get);    # declare list of variables local
    my $foo = "flurp";  # declare $foo lexical, and init it
    my @oof = @bar;     # declare @oof lexical, and init it

A "my" declares the listed variables to be confined (lexically) to the
enclosing block, conditional (C<if/unless/elsif/else>), loop
(C<for/foreach/while/until/continue>), subroutine, C<eval>, or
C<do/require/use>'d file. If more than one value is listed, the list
must be placed in parentheses. All listed elements must be legal lvalues.
Only alphanumeric identifiers may be lexically scoped--magical
builtins like $/ must currently be localized with "local" instead.

Unlike dynamic variables created by the "local" statement, lexical
variables declared with "my" are totally hidden from the outside world,
including any called subroutines (even if it's the same subroutine called
from itself or elsewhere--every call gets its own copy).

(An eval(), however, can see the lexical variables of the scope it is
being evaluated in so long as the names aren't hidden by declarations within
the eval() itself. See L<perlref>.)

The parameter list to my() may be assigned to if desired, which allows you
to initialize your variables. (If no initializer is given for a
particular variable, it is created with the undefined value.) Commonly
this is used to name the parameters to a subroutine. Examples:

    $arg = "fred";      # "global" variable
    $n = cube_root(27);
    print "$arg thinks the root is $n\n";
 fred thinks the root is 3

    sub cube_root {
        my $arg = shift;  # name doesn't matter
        $arg **= 1/3;
        return $arg;
    }

The "my" is simply a modifier on something you might assign to. So when
you do assign to the variables in its argument list, the "my" doesn't
change whether those variables are viewed as a scalar or an array. So

    my ($foo) = <STDIN>;
    my @FOO = <STDIN>;

both supply a list context to the right-hand side, while

    my $foo = <STDIN>;

supplies a scalar context. But the following declares only one variable:

    my $foo, $bar = 1;

That has the same effect as

    my $foo;
    $bar = 1;

The declared variable is not introduced (is not visible) until after
the current statement. Thus,

    my $x = $x;

can be used to initialize the new $x with the value of the old $x, and
the expression

    my $x = 123 and $x == 123

is false unless the old $x happened to have the value 123.
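
The sketch below shows that a C<my> variable becomes visible only after the statement that declares it. It assumes Perl 5.6 or later for C<our>, which is used here so the outer C<$x> passes C<use strict>.

```perl
use strict;
use warnings;

our $x = 5;            # package variable; our() needs Perl 5.6 or later
my $inner;
{
    my $x = $x + 1;    # the new lexical is not visible yet: right side reads 5
    $inner = $x;       # from the next statement on, $x is the lexical: 6
}
my $outer = $x;        # outside the block, $x is the package variable again
```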

Lexical scopes of control structures are not bounded precisely by the
braces that delimit their controlled blocks; control expressions are
part of the scope, too. Thus in the loop

    while (defined(my $line = <>)) {
        $line = lc $line;
    } continue {
        print $line;
    }

the scope of $line extends from its declaration throughout the rest of
the loop construct (including the C<continue> clause), but not beyond
it. Similarly, in the conditional

    if ((my $answer = <STDIN>) =~ /^yes$/i) {
        user_agrees();
    } elsif ($answer =~ /^no$/i) {
        user_disagrees();
    } else {
        chomp $answer;
        die "'$answer' is neither 'yes' nor 'no'";
    }

the scope of $answer extends from its declaration throughout the rest
of the conditional (including C<elsif> and C<else> clauses, if any),
but not beyond it.

(None of the foregoing applies to C<if/unless> or C<while/until>
modifiers appended to simple statements. Such modifiers are not
control structures and have no effect on scoping.)

The C<foreach> loop defaults to scoping its index variable dynamically
(in the manner of C<local>; see below). However, if the index
variable is prefixed with the keyword "my", then it is lexically
scoped instead. Thus in the loop

    for my $i (1, 2, 3) {
        some_function();
    }

the scope of $i extends to the end of the loop, but not beyond it, and
so the value of $i is unavailable in some_function().
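
The sketch below contrasts the two C<foreach> forms. It assumes Perl 5.6 or later for C<our>. C<peek> is a hypothetical helper that records whatever the package variable C<$i> holds when it is called.

```perl
use strict;
use warnings;

our $i = 'before';
our @seen;

# Records the current value of the package variable $i.
sub peek { push @seen, $i; return }

for $i (1, 2)    { peek() }    # package $i is localized: peek sees 1 and 2
for my $i (3, 4) { peek() }    # lexical $i: peek still sees 'before'
```

After the first loop ends, the package C<$i> has its old value back, just as if it had been localized with C<local>.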

Some users may wish to encourage the use of lexically scoped variables.
As an aid to catching implicit references to package variables,
if you say

    use strict 'vars';

then any variable reference from there to the end of the enclosing
block must either refer to a lexical variable, or must be fully
qualified with the package name. A compilation error results
otherwise. An inner block may countermand this with S<"no strict 'vars'">.
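
To see the compile-time check without aborting a script, the sketch below compiles small snippets with a string C<eval> and inspects C<$@>. The exact wording of the error message may vary between Perl versions, so only its leading "Global symbol" phrase is worth relying on.

```perl
use strict;
use warnings;

# Compile snippets under strict 'vars' at run time, so a compilation
# error is caught by eval instead of stopping the whole script.
my $bare_ok = eval q{
    use strict 'vars';
    $undeclared = 1;          # neither lexical nor package-qualified
    1;
};
my $bare_error = $@;

my $qualified_ok = eval q{
    use strict 'vars';
    $main::qualified = 1;     # fully qualified: allowed
    my $lexical = 2;          # lexical: allowed
    1;
};
```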

A my() has both a compile-time and a run-time effect. At compile time,
the compiler takes notice of it; the principal usefulness of this is to
quiet C<use strict 'vars'>. The actual initialization is delayed until
run time, so it gets executed appropriately; every time through a loop,
for example.

Variables declared with "my" are not part of any package and are therefore
never fully qualified with the package name. In particular, you're not
allowed to try to make a package variable (or other global) lexical:

    my $pack::var;      # ERROR!  Illegal syntax
    my $_;              # also illegal (currently)

In fact, dynamic variables (also known as package or global variables)
are still accessible using the fully qualified :: notation even while a
lexical of the same name is also visible:

    package main;
    local $x = 10;
    my    $x = 20;
    print "$x and $::x\n";

That will print out 20 and 10.

You may declare "my" variables at the outermost scope of a file to
hide any such identifiers totally from the outside world. This is similar
to C's static variables at the file level. To do this with a subroutine
requires the use of a closure (anonymous function). If a block (such as
an eval(), function, or C<package>) wants to create a private subroutine
that cannot be called from outside that block, it can declare a lexical
variable containing an anonymous sub reference:

    my $secret_version = '1.001-beta';
    my $secret_sub = sub { print $secret_version };
    &$secret_sub();

As long as the reference is never returned by any function within the
module, no outside module can see the subroutine, because its name is not in
any package's symbol table. Remember that it's not I<REALLY> called
$some_pack::secret_version or anything; it's just $secret_version,
unqualified and unqualifiable.

This does not work with object methods, however; all object methods have
to be in the symbol table of some package to be found.

Just because the lexical variable is lexically (also called statically)
scoped doesn't mean that within a function it works like a C static. It
normally works more like a C auto. But here's a mechanism for giving a
function private variables with both lexical scoping and a static
lifetime. If you do want to create something like C's static variables,
just enclose the whole function in an extra block, and put the
static variable outside the function but in the block.

    {
        my $secret_val = 0;
        sub gimme_another {
            return ++$secret_val;
        }
    }
    # $secret_val now becomes unreachable by the outside
    # world, but retains its value between calls to gimme_another

If this function is being sourced in from a separate file
via C<require> or C<use>, then this is probably just fine. If it's
all in the main program, you'll need to arrange for the my()
to be executed early, either by putting the whole block above
your main program, or more likely, placing merely a BEGIN
sub around it to make sure it gets executed before your program
starts to run:

    sub BEGIN {
        my $secret_val = 0;
        sub gimme_another {
            return ++$secret_val;
        }
    }

See L<perlrun> about the BEGIN function.
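
The same lexical-lifetime idea works with anonymous subroutines. In the sketch below, each call to the hypothetical C<make_counter> creates a fresh lexical C<$count>, and the returned closure keeps that particular variable alive. It assumes Perl 5.004 or later for the C<< $code->() >> syntax.

```perl
use strict;
use warnings;

# Each call creates a new $count; the returned anonymous sub closes over it.
sub make_counter {
    my $count = shift || 0;
    return sub { return ++$count };
}

my $first  = make_counter();     # starts at 0
my $second = make_counter(10);   # independent counter starting at 10

$first->() for 1 .. 2;           # advance the first counter twice
my $r1 = $first->();             # 3
my $r2 = $second->();            # 11: unaffected by $first
```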

=head2 Temporary Values via local()

B<NOTE>: In general, you should be using "my" instead of "local", because
it's faster and safer. Exceptions to this include the global punctuation
variables, filehandles and formats, and direct manipulation of the Perl
symbol table itself. Format variables often use "local" though, as do
other variables whose current value must be visible to called
subroutines.

Synopsis:

    local $foo;                 # declare $foo dynamically local
    local (@wid, %get);         # declare list of variables local
    local $foo = "flurp";       # declare $foo dynamic, and init it
    local @oof = @bar;          # declare @oof dynamic, and init it

    local *FH;                  # localize $FH, @FH, %FH, &FH  ...
    local *merlyn = *randal;    # now $merlyn is really $randal, plus
                                #     @merlyn is really @randal, etc
    local *merlyn = 'randal';   # SAME THING: promote 'randal' to *randal
    local *merlyn = \$randal;   # just alias $merlyn, not @merlyn etc

A local() modifies its listed variables to be local to the enclosing
block (or subroutine, C<eval{}>, or C<do>) I<and to any subroutine called
from within that block>. A local() just gives temporary values to global
(meaning package) variables. This is known as dynamic scoping. Lexical
scoping is done with "my", which works more like C's auto declarations.

If more than one variable is given to local(), they must be placed in
parentheses. All listed elements must be legal lvalues. This operator works
by saving the current values of those variables in its argument list on a
hidden stack and restoring them upon exiting the block, subroutine, or
eval. This means that called subroutines can also reference the local
variable, but not the global one. The argument list may be assigned to if
desired, which allows you to initialize your local variables. (If no
initializer is given for a particular variable, it is created with an
undefined value.) Commonly this is used to name the parameters to a
subroutine. Examples:

    for $i ( 0 .. 9 ) {
        $digits{$i} = $i;
    }
    # assume this function uses global %digits hash
    parse_num();

    # now temporarily add to %digits hash
    if ($base12) {
        # (NOTE: not claiming this is efficient!)
        local %digits  = (%digits, 't' => 10, 'e' => 11);
        parse_num();  # parse_num gets this new %digits!
    }
    # old %digits restored here
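
The sketch below contrasts dynamic and lexical scoping. It assumes Perl 5.6 or later for C<our>. C<report>, C<with_local>, and C<with_my> are hypothetical.

```perl
use strict;
use warnings;

our $level = 'global';

# Reads the package variable $level at call time.
sub report { return $level }

sub with_local {
    local $level = 'temporary';   # visible to report() until this sub exits
    return report();
}

sub with_my {
    my $level = 'lexical';        # a new, unrelated variable; report() can't see it
    return report();
}

my $seen_local = with_local();    # 'temporary'
my $seen_my    = with_my();       # 'global'
my $after      = report();        # 'global': the saved value was restored
```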

Because local() is a run-time command, it gets executed every time
through a loop. In releases of Perl previous to 5.0, this used more stack
storage each time until the loop was exited. Perl now reclaims the space
each time through, but it's still more efficient to declare your variables
outside the loop.

A local is simply a modifier on an lvalue expression. When you assign to
a localized variable, the local doesn't change whether its list is viewed
as a scalar or an array. So

    local($foo) = <STDIN>;
    local @FOO = <STDIN>;

both supply a list context to the right-hand side, while

    local $foo = <STDIN>;

supplies a scalar context.

A note about C<local()> and composite types is in order. Something
like C<local(%foo)> works by temporarily placing a brand new hash in
the symbol table. The old hash is left alone, but is hidden "behind"
the new one.

This means the old variable is completely invisible via the symbol
table (i.e. the hash entry in the C<*foo> typeglob) for the duration
of the dynamic scope within which the C<local()> was seen. This
has the effect of allowing one to temporarily occlude any magic on
composite types. For instance, this will briefly alter a tied
hash to some other implementation:

    tie %ahash, 'APackage';
    [...]
    {
       local %ahash;
       tie %ahash, 'BPackage';
       [..called code will see %ahash tied to 'BPackage'..]
       {
          local %ahash;
          [..%ahash is a normal (untied) hash here..]
       }
    }
    [..%ahash back to its initial tied self again..]

As another example, a custom implementation of C<%ENV> might look
like this:

    {
        local %ENV;
        tie %ENV, 'MyOwnEnv';
        [..do your own fancy %ENV manipulation here..]
    }
    [..normal %ENV behavior here..]

It's also worth taking a moment to explain what happens when you
localize a member of a composite type (i.e. an array or hash element).
In this case, the element is localized I<by name>. This means that
when the scope of the C<local()> ends, the saved value will be
restored to the hash element whose key was named in the C<local()>, or
the array element whose index was named in the C<local()>. If that
element was deleted while the C<local()> was in effect (e.g. by a
C<delete()> from a hash or a C<shift()> of an array), it will spring
back into existence, possibly extending an array and filling in the
skipped elements with C<undef>. For instance, if you say

    %hash = ( 'This' => 'is', 'a' => 'test' );
    @ary  = ( 0..5 );
    {
         local($ary[5]) = 6;
         local($hash{'a'}) = 'drill';
         while (my $e = pop(@ary)) {
             print "$e . . .\n";
             last unless $e > 3;
         }
         if (@ary) {
             $hash{'only a'} = 'test';
             delete $hash{'a'};
         }
    }
    print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n";
    print "The array has ",scalar(@ary)," elements: ",
          join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n";

Perl will print

    6 . . .
    4 . . .
    3 . . .
    This is a test only a test.
    The array has 6 elements: 0, 1, 2, undef, undef, 5

In short, be careful when manipulating the containers for composite types
whose elements have been localized.

=head2 Passing Symbol Table Entries (typeglobs)

[Note: The mechanism described in this section was originally the only
way to simulate pass-by-reference in older versions of Perl. While it
still works fine in modern versions, the new reference mechanism is
generally easier to work with. See below.]

Sometimes you don't want to pass the value of an array to a subroutine
but rather the name of it, so that the subroutine can modify the global
copy of it rather than working with a local copy. In perl you can
refer to all objects of a particular name by prefixing the name
with a star: C<*foo>. This is often known as a "typeglob", because the
star on the front can be thought of as a wildcard match for all the
funny prefix characters on variables and subroutines and such.

When evaluated, the typeglob produces a scalar value that represents
all the objects of that name, including any filehandle, format, or
subroutine. When assigned to, it causes the name mentioned to refer to
whatever "*" value was assigned to it. Example:

    sub doubleary {
        local(*someary) = @_;
        foreach $elem (@someary) {
            $elem *= 2;
        }
    }
    doubleary(*foo);
    doubleary(*bar);

Note that scalars are already passed by reference, so you can modify
scalar arguments without using this mechanism by referring explicitly
to C<$_[0]> etc. You can modify all the elements of an array by passing
all the elements as scalars, but you have to use the * mechanism (or
the equivalent reference mechanism) to push, pop, or change the size of
an array. It will certainly be faster to pass the typeglob (or reference).

Even if you don't want to modify an array, this mechanism is useful for
passing multiple arrays in a single LIST, because normally the LIST
mechanism will merge all the array values so that you can't extract out
the individual arrays. For more on typeglobs, see
L<perldata/"Typeglobs and Filehandles">.

=head2 Pass by Reference

If you want to pass more than one array or hash into a function--or
return them from it--and have them maintain their integrity, then
you're going to have to use an explicit pass-by-reference. Before you
do that, you need to understand references as detailed in L<perlref>.
This section may not make much sense to you otherwise.

Here are a few simple examples. First, let's pass in several
arrays to a function and have it pop all of them, returning a new
list of all their former last elements:

    @tailings = popmany ( \@a, \@b, \@c, \@d );

    sub popmany {
        my $aref;
        my @retlist = ();
        foreach $aref ( @_ ) {
            push @retlist, pop @$aref;
        }
        return @retlist;
    }

Here's how you might write a function that returns a
list of keys occurring in all the hashes passed to it:

    @common = inter( \%foo, \%bar, \%joe );
    sub inter {
        my ($k, $href, %seen); # locals
        foreach $href (@_) {
            while ( defined($k = each %$href) ) {
                $seen{$k}++;
            }
        }
        return grep { $seen{$_} == @_ } keys %seen;
    }

So far, we're using just the normal list return mechanism.
What happens if you want to pass or return a hash? Well,
if you're using only one of them, or you don't mind them
concatenating, then the normal calling convention is ok, although
a little expensive.

Where people get into trouble is here:

    (@a, @b) = func(@c, @d);
or
    (%a, %b) = func(%c, %d);

That syntax simply won't work. It sets just @a or %a and clears the @b or
%b. Plus the function wasn't passed two separate arrays or
hashes: it got one long list in @_, as always.

If you can arrange for everyone to deal with this through references, it's
cleaner code, although not so nice to look at. Here's a function that
takes two array references as arguments, returning the two array references
in order of how many elements they have in them:

    ($aref, $bref) = func(\@c, \@d);
    print "@$aref has more than @$bref\n";
    sub func {
        my ($cref, $dref) = @_;
        if (@$cref > @$dref) {
            return ($cref, $dref);
        } else {
            return ($dref, $cref);
        }
    }

It turns out that you can actually do this also:

    (*a, *b) = func(\@c, \@d);
    print "@a has more than @b\n";
    sub func {
        local (*c, *d) = @_;
        if (@c > @d) {
            return (\@c, \@d);
        } else {
            return (\@d, \@c);
        }
    }

Here we're using the typeglobs to do symbol table aliasing. It's
a tad subtle, though, and also won't work if you're using my()
variables, because only globals (well, and local()s) are in the symbol table.

If you're passing around filehandles, you could usually just use the bare
typeglob, like *STDOUT, but typeglob references would be better because
they'll still work properly under C<use strict 'refs'>. For example:

    splutter(\*STDOUT);
    sub splutter {
        my $fh = shift;
        print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(\*STDIN);
    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;
    }
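
C<splutter> and C<get_rec> only need something they can print to or read from. The sketch below therefore hands them lexical handles opened on in-memory strings instead of C<\*STDOUT> and C<\*STDIN>. It assumes Perl 5.8 or later, where C<open> accepts a reference to a scalar in place of a file name.

```perl
use strict;
use warnings;

sub splutter {
    my $fh = shift;
    print $fh "her um well a hmmm\n";
}

sub get_rec {
    my $fh = shift;
    return scalar <$fh>;
}

# Write handle onto the string $buffer (in-memory file, Perl 5.8+).
my $buffer = '';
open(my $out, '>', \$buffer) or die "cannot open in-memory output: $!";
splutter($out);      # a lexical handle works just like \*STDOUT
close($out);

# Read handle onto a string; get_rec returns one line from it.
my $text = "first line\nsecond line\n";
open(my $in, '<', \$text) or die "cannot open in-memory input: $!";
my $rec = get_rec($in);
close($in);
```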

Another way to do this is using *HANDLE{IO}, see L<perlref> for usage
and caveats.

If you're planning on generating new filehandles, you could do this:

    sub openit {
        my $path = shift;
        local *FH;
        return open (FH, $path) ? *FH : undef;
    }

That will actually produce a small memory leak, though. See the bottom
of L<perlfunc/open()> for a somewhat cleaner way using the IO::Handle
package.
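
On Perl 5.6 or later, a lexical filehandle avoids the localized glob entirely. The hypothetical C<openit_lexical> below is a sketch of that alternative; it is not the C<IO::Handle> approach referred to above. The handle is closed automatically when the last reference to it goes away. The core C<File::Temp> module is used only to create a file to read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return a lexical read handle, or undef if the open fails.
sub openit_lexical {
    my $path = shift;
    open(my $fh, '<', $path) or return undef;
    return $fh;
}

# Create a throwaway file to read back (removed automatically at exit).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "line one\n";
close($tmp_fh);

my $fh         = openit_lexical($tmp_name);
my $first_line = defined $fh ? <$fh> : undef;     # scalar context: one line
my $missing    = openit_lexical("$tmp_name.does-not-exist");
```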
cb1a09d0 |
705 | |
cb1a09d0 |
706 | =head2 Prototypes |
707 | |
708 | As of the 5.002 release of perl, if you declare |
709 | |
710 | sub mypush (\@@) |
711 | |
then mypush() takes arguments exactly like push() does. The declaration
of the function to be called must be visible at compile time. The prototype
affects only the interpretation of new-style calls to the function, where
new-style is defined as not using the C<&> character. In other words,
if you call it like a builtin function, then it behaves like a builtin
function. If you call it like an old-fashioned subroutine, then it
behaves like an old-fashioned subroutine. It naturally falls out from
this rule that prototypes have no influence on subroutine references
like C<\&foo> or on indirect subroutine calls like C<&{$subref}>.

Method calls are not influenced by prototypes either, because the
function to be called is indeterminate at compile time: it depends
on inheritance.

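Here is a minimal sketch of mypush() itself, assuming perl 5.6 or later. The C<\@> in the prototype makes perl pass a reference to the array. A call through C<&> ignores the prototype, so in that case the caller has to supply the reference explicitly.

```perl
use strict;
use warnings;

# The \@ means "the caller writes an array; I receive a reference to it".
sub mypush (\@@) {
    my $aref = shift;
    push @$aref, @_;
    return scalar @$aref;   # new length, like the builtin push
}

my @stack = (1, 2);
mypush @stack, 3, 4;        # builtin-style call: perl passes \@stack for us
&mypush(\@stack, 5);        # & call: prototype ignored, pass the reference ourselves
print "@stack\n";           # 1 2 3 4 5
```
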
Because the intent is primarily to let you define subroutines that work
like builtin commands, here are the prototypes for some other functions
that parse almost exactly like the corresponding builtins.

    Declared as             Called as

    sub mylink ($$)         mylink $old, $new
    sub myvec ($$$)         myvec $var, $offset, 1
    sub myindex ($$;$)      myindex &getstring, "substr"
    sub mysyswrite ($$$;$)  mysyswrite $buf, 0, length($buf) - $off, $off
    sub myreverse (@)       myreverse $a,$b,$c
    sub myjoin ($@)         myjoin ":",$a,$b,$c
    sub mypop (\@)          mypop @array
    sub mysplice (\@$$@)    mysplice @array,@array,0,@pushme
    sub mykeys (\%)         mykeys %{$hashref}
    sub myopen (*;$)        myopen HANDLE, $name
    sub mypipe (**)         mypipe READHANDLE, WRITEHANDLE
    sub mygrep (&@)         mygrep { /foo/ } $a,$b,$c
    sub myrand ($)          myrand 42
    sub mytime ()           mytime

Any backslashed prototype character represents an actual argument
that absolutely must start with that character. The value passed
to the subroutine (as part of C<@_>) will be a reference to the
actual argument given in the subroutine call, obtained by applying
C<\> to that argument.

Unbackslashed prototype characters have special meanings. Any
unbackslashed C<@> or C<%> eats all the rest of the arguments, and forces
list context. An argument represented by C<$> forces scalar context. An
C<&> requires an anonymous subroutine, which, if passed as the first
argument, does not require the C<sub> keyword or a subsequent comma. A
C<*> does whatever it has to do to turn the argument into a reference to a
symbol table entry.

A semicolon separates mandatory arguments from optional arguments.
(It is redundant before C<@> or C<%>.)

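A few throwaway subroutines make these context rules visible. The names count_args(), first_ctx(), and slurp() are made up for this sketch, which assumes perl 5.6 or later.

```perl
use strict;
use warnings;

sub count_args ($$;$) { return scalar @_ }   # two mandatory, one optional scalar
sub first_ctx ($)     { return $_[0] }       # $ imposes scalar context
sub slurp (@)         { return scalar @_ }   # @ takes everything left, in list context

my @list = (10, 20, 30);
print count_args(1, 2), "\n";      # 2
print count_args(1, 2, 3), "\n";   # 3
print first_ctx(@list), "\n";      # 3: the array is counted, not flattened
print slurp(@list, 40), "\n";      # 4: the array is flattened
```

A call such as C<count_args(1)> would be rejected at compile time with a "Not enough arguments" error, because the prototype demands at least two scalars.
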
Note how the last three examples above are treated specially by the parser.
mygrep() is parsed as a true list operator, myrand() is parsed as a
true unary operator with unary precedence the same as rand(), and
mytime() is truly without arguments, just like time(). That is, if you
say

    mytime +2;

you'll get mytime() + 2, not mytime(2), which is how it would be parsed
without the prototype.

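You can check the parsing difference directly. In this sketch, which assumes perl 5.6 or later, myrand() doubles its argument so that its result stands out. noproto() is a hypothetical unprototyped subroutine that just counts its arguments.

```perl
use strict;
use warnings;

sub mytime ()  { return 1000 }        # empty prototype: takes no arguments
sub myrand ($) { return $_[0] * 2 }   # one scalar argument: a named unary operator
sub noproto    { return scalar @_ }   # hypothetical, unprototyped: a list operator

my $t = mytime +2;       # mytime() + 2      => 1002
my @r = (myrand 5, 7);   # (myrand(5), 7)    => (10, 7)
my $n = noproto +2;      # noproto(+2)       => 1 argument
print "$t @r $n\n";      # 1002 10 7 1
```
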
The interesting thing about C<&> is that you can generate new syntax with it:

    sub try (&@) {
        my($try,$catch) = @_;
        eval { &$try };
        if ($@) {
            local $_ = $@;
            &$catch;
        }
    }
    sub catch (&) { $_[0] }

    try {
        die "phooey";
    } catch {
        /phooey/ and print "unphooey\n";
    };

That prints "unphooey". (Yes, there are still unresolved
issues having to do with the visibility of @_. I'm ignoring that
question for the moment. (But note that if we make @_ lexically
scoped, those anonymous subroutines can act like closures... (Gee,
is this sounding a little Lispish? (Never mind.))))

And here's a reimplementation of grep:

    sub mygrep (&@) {
        my $code = shift;
        my @result;
        foreach $_ (@_) {
            push(@result, $_) if &$code;
        }
        @result;
    }

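mygrep() is called the same way as the builtin, with a block and no comma. This sketch assumes perl 5.6 or later, and the sample words are arbitrary.

```perl
use strict;
use warnings;

sub mygrep (&@) {
    my $code = shift;
    my @result;
    foreach $_ (@_) {
        push(@result, $_) if &$code;   # the block sees each element in $_
    }
    @result;
}

my @hits  = mygrep { /o/ } qw(foo bar boo);    # block first, then the list
my @evens = mygrep { $_ % 2 == 0 } 1 .. 6;
print "@hits\n";    # foo boo
print "@evens\n";   # 2 4 6
```
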
Some folks would prefer full alphanumeric prototypes. Alphanumerics have
been intentionally left out of prototypes for the express purpose of
someday in the future adding named, formal parameters. The current
mechanism's main goal is to let module writers provide better diagnostics
for module users. Larry feels the notation is quite understandable to Perl
programmers, and that it will not intrude greatly upon the meat of the
module, nor make it harder to read. The line noise is visually
encapsulated into a small pill that's easy to swallow.

It's probably best to prototype new functions, not retrofit prototyping
into older ones. That's because you must be especially careful about
silent impositions of differing list versus scalar contexts. For example,
if you decide that a function should take just one parameter, like this:

    sub func ($) {
        my $n = shift;
        print "you gave me $n\n";
    }

and someone has been calling it with an array or expression
returning a list:

    func(@foo);
    func( split /:/ );

then you've just supplied an automatic scalar() in front of their
argument, which can be more than a bit surprising. The old @foo
which used to hold one thing doesn't get passed in. Instead,
the func() now gets passed in 1, that is, the number of elements
in @foo. And the split() gets called in a scalar context and
starts scribbling on your @_ parameter list.

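The surprise is easy to reproduce. This sketch assumes perl 5.6 or later. Its version of func() returns the message instead of printing it, so the result can be inspected.

```perl
use strict;
use warnings;

sub func ($) {
    my $n = shift;
    return "you gave me $n";
}

my @foo = ('widget');
print func(@foo), "\n";      # you gave me 1  (scalar(@foo), not 'widget')
print func($foo[0]), "\n";   # you gave me widget
```
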
This is all very powerful, of course, and should be used only in moderation
to make the world a better place.

=head2 Constant Functions

Functions with a prototype of C<()> are potential candidates for
inlining. If the result after optimization and constant folding is
either a constant or a lexically-scoped scalar which has no other
references, then it will be used in place of function calls made
without C<&> or C<do>. Calls made using C<&> or C<do> are never
inlined. (See L<constant> for an easy way to declare most
constants.)

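For comparison, here is a sketch using the C<constant> pragma, which builds subroutines with an empty C<()> prototype. Those subroutines are inlining candidates in the same way. The constant names are only examples, and the sketch assumes perl 5.6 or later.

```perl
use strict;
use warnings;
use constant PI    => 4 * atan2(1, 1);   # computed once, at compile time
use constant DEBUG => 0;

# Both names are subs with an empty () prototype, so calls like these are
# inlining candidates, just like sub pi () { ... }.
my $area = PI * 2 ** 2;
print "debugging\n" if DEBUG;   # DEBUG is a constant false, so this never prints
printf "%.5f\n", $area;         # 12.56637
```
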
All of the following functions would be inlined.

    sub pi ()           { 3.14159 }             # Not exact, but close.
    sub PI ()           { 4 * atan2 1, 1 }      # As good as it gets,
                                                # and it's inlined, too!
    sub ST_DEV ()       { 0 }
    sub ST_INO ()       { 1 }

    sub FLAG_FOO ()     { 1 << 8 }
    sub FLAG_BAR ()     { 1 << 9 }
    sub FLAG_MASK ()    { FLAG_FOO | FLAG_BAR }

    sub OPT_BAZ ()      { not (0x1B58 & FLAG_MASK) }
    sub BAZ_VAL () {
        if (OPT_BAZ) {
            return 23;
        }
        else {
            return 42;
        }
    }

    sub N () { int(BAZ_VAL) / 3 }
    BEGIN {
        my $prod = 1;
        for (1..N) { $prod *= $_ }
        sub N_FACTORIAL () { $prod }
    }

If you redefine a subroutine which was eligible for inlining, you'll get
a mandatory warning. (You can use this warning to tell whether or not a
particular subroutine is considered constant.) The warning is
considered severe enough not to be optional because previously compiled
invocations of the function will still be using the old value of the
function. If you need to be able to redefine the subroutine, you need to
ensure that it isn't inlined, either by dropping the C<()> prototype
(which changes the calling semantics, so beware) or by thwarting the
inlining mechanism in some other way, such as

    sub not_inlined () {
        23 if $];
    }

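You can observe both the warning and the stale inlined value. This sketch assumes perl 5.6 or later. PI() and show_pi() are illustrative names, and C<$SIG{__WARN__}> is used only to capture the warning text for inspection.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect warnings for inspection

sub PI () { 3.14159 }        # eligible for inlining
sub show_pi { return PI }    # compiled with the constant 3.14159 folded in

eval 'sub PI () { 3 } 1' or die $@;   # redefine PI at run time

print "show_pi: ", show_pi(), "\n";   # 3.14159: the old value was inlined
print "&PI:     ", &PI(), "\n";       # 3: a real call reaches the new body
print "warning: $_" for @warnings;    # Constant subroutine ... redefined ...
```
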
=head2 Overriding Builtin Functions

Many builtin functions may be overridden, though this should be tried
only occasionally and for good reason. Typically this might be
done by a package attempting to emulate missing builtin functionality
on a non-Unix system.

Overriding may be done only by importing the name from a
module--ordinary predeclaration isn't good enough. However, the
C<subs> pragma (compiler directive) lets you, in effect, predeclare subs
via the import syntax, and these names may then override the builtin ones:

    use subs 'chdir', 'chroot', 'chmod', 'chown';
    chdir $somewhere;
    sub chdir { ... }

To unambiguously refer to the builtin form, one may precede the
builtin name with the special package qualifier C<CORE::>. For example,
saying C<CORE::open()> will always refer to the builtin C<open()>, even
if the current package has imported some other subroutine called
C<&open()> from elsewhere.

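Here is a sketch of an override that delegates to the real builtin through C<CORE::>. It assumes perl 5.6 or later. C<hex> is chosen only because it has no side effects, and accepting a leading C<#> is a hypothetical extension.

```perl
use strict;
use warnings;
use subs 'hex';   # predeclare main::hex so it overrides the builtin in this package

# Hypothetical override: also accept a leading "#", as in CSS colors.
sub hex {
    my $s = shift;
    $s =~ s/^#//;
    return CORE::hex($s);   # CORE:: always reaches the real builtin
}

print hex('#ff'), "\n";        # 255, through the override
print CORE::hex('10'), "\n";   # 16, the builtin called directly
```
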
Library modules should not in general export builtin names like "open"
or "chdir" as part of their default @EXPORT list, because these may
sneak into someone else's namespace and change the semantics unexpectedly.
Instead, if the module adds the name to the @EXPORT_OK list, then it's
possible for a user to import the name explicitly, but not implicitly.
That is, they could say

    use Module 'open';

and it would import the open override, but if they said

    use Module;

they would get the default imports without the overrides.

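Here is a single-file sketch of the @EXPORT_OK arrangement. My::Compat is a hypothetical module. It is defined inside a C<BEGIN> block and registered in C<%INC>, so C<use> finds it without a separate file. The example assumes a perl whose Exporter supports C<use Exporter 'import'> (5.8.3 or later).

```perl
use strict;
use warnings;

# A hypothetical module, inlined so the example is one file.
BEGIN {
    package My::Compat;
    use Exporter 'import';
    our @EXPORT    = qw(greet);   # exported by a plain "use My::Compat;"
    our @EXPORT_OK = qw(hex);     # exported only when asked for by name
    sub greet { return "hello" }
    sub hex {
        my $s = shift;
        $s =~ s/^#//;
        return CORE::hex($s);
    }
    $INC{'My/Compat.pm'} = __FILE__;   # mark as loaded so "use" won't search @INC
}

package main;
use My::Compat 'hex';     # explicit request: main::hex now overrides the builtin
print hex('#ff'), "\n";   # 255

package Plain;
use My::Compat;           # default list: greet only, builtin hex untouched
my $g = greet();
print "$g\n";             # hello
print defined &Plain::hex ? "overridden\n" : "builtin hex\n";   # builtin hex
```
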
Note that such overriding is restricted to the package that requests
the import. Some means of "globally" overriding builtins may become
available in the future.

=head2 Autoloading

If you call a subroutine that is undefined, you would ordinarily get an
immediate fatal error complaining that the subroutine doesn't exist.
(Likewise for subroutines being used as methods, when the method
doesn't exist in any of the base classes of the class package.) If,
however, there is an C<AUTOLOAD> subroutine defined in the package or
packages that were searched for the original subroutine, then that
C<AUTOLOAD> subroutine is called with the arguments that would have been
passed to the original subroutine. The fully qualified name of the
original subroutine magically appears in the $AUTOLOAD variable in the
same package as the C<AUTOLOAD> routine. The name is not passed as an
ordinary argument because, er, well, just because, that's why...

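Here is a minimal sketch of an C<AUTOLOAD> that emulates the call rather than defining it. It assumes perl 5.6 or later. Under C<use strict> the variable must be declared, here with C<our $AUTOLOAD>. Greeter is a made-up package name. The C<DESTROY> check is a common precaution, so that object destruction is not mistaken for a call to a missing method.

```perl
use strict;
use warnings;

package Greeter;
our $AUTOLOAD;   # perl stores the fully qualified name of the missing sub here

sub AUTOLOAD {
    my $name = $AUTOLOAD;          # e.g. "Greeter::hello"
    $name =~ s/.*:://;             # keep just "hello"
    return if $name eq 'DESTROY';  # don't treat object destruction as a call
    return "$name(@_)";            # emulate the call instead of defining it
}

package main;
my $out = Greeter::hello('big', 'world');
print "$out\n";   # hello(big world)
```
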
Most C<AUTOLOAD> routines will load in a definition for the subroutine in
question using eval, and then execute that subroutine using a special
form of "goto" that erases the stack frame of the C<AUTOLOAD> routine
without a trace. (See the standard C<AutoLoader> module, for example.)
But an C<AUTOLOAD> routine can also just emulate the routine and never
define it. For example, let's pretend that a function that wasn't defined
should just call system() with those arguments. All you'd do is this:

    sub AUTOLOAD {
        my $program = $AUTOLOAD;
        $program =~ s/.*:://;
        system($program, @_);
    }
    date();
    who('am', 'i');
    ls('-l');

In fact, if you predeclare the functions you want to call that way, you don't
even need the parentheses:

    use subs qw(date who ls);
    date;
    who "am", "i";
    ls '-l';

A more complete example of this is the standard Shell module, which
can treat undefined subroutine calls as calls to Unix programs.

Mechanisms are available for module writers to help split the modules
up into autoloadable files. See the standard AutoLoader module
described in L<AutoLoader> and in L<AutoSplit>, the standard
SelfLoader module in L<SelfLoader>, and the document on adding C
functions to perl code in L<perlxs>.

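The more common pattern described earlier in this section has C<AUTOLOAD> install a real definition and then jump to it with C<goto &NAME>. It can be sketched as follows. Lazy is a hypothetical package, and the installed subroutine is a closure rather than code loaded with eval. The example assumes perl 5.6 or later. The counter shows that C<AUTOLOAD> runs only for the first call.

```perl
use strict;
use warnings;

package Lazy;
our $AUTOLOAD;
our $autoload_calls = 0;   # counts how often AUTOLOAD itself runs

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    $autoload_calls++;
    no strict 'refs';
    *$AUTOLOAD = sub { return "$name(@_)" };   # install a real definition
    goto &$AUTOLOAD;   # replace AUTOLOAD's frame with a call to it, same @_
}

package main;
my $first  = Lazy::hello('a');   # goes through AUTOLOAD, which defines hello
my $second = Lazy::hello('b');   # hello now exists: AUTOLOAD is skipped
print "$first $second $Lazy::autoload_calls\n";   # hello(a) hello(b) 1
```
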
=head1 SEE ALSO

See L<perlref> for more on references. See L<perlxs> if you'd
like to learn about calling C subroutines from perl. See
L<perlmod> to learn about bundling up your functions in
separate files.