Commit | Line | Data |
a0d0e21e |
1 | =head1 NAME |
2 | |
3 | perlsub - Perl subroutines |
4 | |
5 | =head1 SYNOPSIS |
6 | |
7 | To declare subroutines: |
8 | |
09bef843 |
9 | sub NAME; # A "forward" declaration. |
10 | sub NAME(PROTO); # ditto, but with prototypes |
11 | sub NAME : ATTRS; # with attributes |
12 | sub NAME(PROTO) : ATTRS; # with attributes and prototypes |
cb1a09d0 |
13 | |
09bef843 |
14 | sub NAME BLOCK # A declaration and a definition. |
15 | sub NAME(PROTO) BLOCK # ditto, but with prototypes |
16 | sub NAME : ATTRS BLOCK # with attributes |
17 | sub NAME(PROTO) : ATTRS BLOCK # with prototypes and attributes |
a0d0e21e |
18 | |
748a9306 |
19 | To define an anonymous subroutine at runtime: |
20 | |
09bef843 |
21 | $subref = sub BLOCK; # no proto |
22 | $subref = sub (PROTO) BLOCK; # with proto |
23 | $subref = sub : ATTRS BLOCK; # with attributes |
24 | $subref = sub (PROTO) : ATTRS BLOCK; # with proto and attributes |
748a9306 |
25 | |
a0d0e21e |
26 | To import subroutines: |
27 | |
19799a22 |
28 | use MODULE qw(NAME1 NAME2 NAME3); |
a0d0e21e |
29 | |
30 | To call subroutines: |
31 | |
5f05dabc |
32 | NAME(LIST); # & is optional with parentheses. |
54310121 |
33 | NAME LIST; # Parentheses optional if predeclared/imported. |
19799a22 |
34 | &NAME(LIST); # Circumvent prototypes. |
5a964f20 |
35 | &NAME; # Makes current @_ visible to called subroutine. |
a0d0e21e |
36 | |
37 | =head1 DESCRIPTION |
38 | |
19799a22 |
39 | Like many languages, Perl provides for user-defined subroutines. |
40 | These may be located anywhere in the main program, loaded in from |
41 | other files via the C<do>, C<require>, or C<use> keywords, or |
be3174d2 |
42 | generated on the fly using C<eval> or anonymous subroutines. |
19799a22 |
43 | You can even call a function indirectly using a variable containing |
44 | its name or a CODE reference. |
cb1a09d0 |
45 | |
46 | The Perl model for function call and return values is simple: all |
47 | functions are passed as parameters one single flat list of scalars, and |
48 | all functions likewise return to their caller one single flat list of |
49 | scalars. Any arrays or hashes in these call and return lists will |
50 | collapse, losing their identities--but you may always use |
51 | pass-by-reference instead to avoid this. Both call and return lists may |
52 | contain as many or as few scalar elements as you'd like. (Often a |
53 | function without an explicit return statement is called a subroutine, but |
19799a22 |
54 | there's really no difference from Perl's perspective.) |
55 | |
56 | Any arguments passed in show up in the array C<@_>. Therefore, if |
57 | you called a function with two arguments, those would be stored in |
58 | C<$_[0]> and C<$_[1]>. The array C<@_> is a local array, but its |
59 | elements are aliases for the actual scalar parameters. In particular, |
60 | if an element C<$_[0]> is updated, the corresponding argument is |
61 | updated (or an error occurs if it is not updatable). If an argument |
62 | is an array or hash element which did not exist when the function |
63 | was called, that element is created only when (and if) it is modified |
64 | or a reference to it is taken. (Some earlier versions of Perl |
65 | created the element whether or not the element was assigned to.) |
66 | Assigning to the whole array C<@_> removes that aliasing, and does |
67 | not update any arguments. |
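
The aliasing described above can be sketched as follows. The helper names C<double_in_place> and C<no_effect> are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: changes its caller's variable through the alias $_[0].
sub double_in_place {
    $_[0] *= 2;
}

# Hypothetical helper: assigning to the whole of @_ removes the aliasing,
# so the later change to $_[0] never reaches the caller.
sub no_effect {
    @_ = (42);
    $_[0] *= 2;
}

my $n = 5;
double_in_place($n);    # $n is now 10

my $m = 5;
no_effect($m);          # $m is still 5

print "n=$n m=$m\n";    # prints "n=10 m=5"
```

Copying the arguments into C<my> variables first is the usual way to avoid such surprises.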
68 | |
69 | The return value of a subroutine is the value of the last expression |
4c885f75 |
70 | evaluated by that sub, or the empty list in the case of an empty sub. |
71 | More explicitly, a C<return> statement may be used to exit the |
54310121 |
72 | subroutine, optionally specifying the returned value, which will be |
73 | evaluated in the appropriate context (list, scalar, or void) depending |
74 | on the context of the subroutine call. If you specify no return value, |
19799a22 |
75 | the subroutine returns an empty list in list context, the undefined |
76 | value in scalar context, or nothing in void context. If you return |
77 | one or more aggregates (arrays and hashes), these will be flattened |
78 | together into one large indistinguishable list. |
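
A small sketch of these return rules, assuming two hypothetical subroutines, C<nothing> and C<both>:

```perl
use strict;
use warnings;

# A bare return: empty list in list context, undef in scalar context.
sub nothing { return }

# Returning two aggregates: they are flattened into one list.
sub both {
    my @pair = (1, 2);
    my %map  = (k => 'v');
    return (@pair, %map);
}

my @list   = nothing();    # ()
my $scalar = nothing();    # undef
my @flat   = both();       # (1, 2, 'k', 'v')

print scalar(@list), " ", (defined $scalar ? "defined" : "undef"),
      " ", scalar(@flat), "\n";    # prints "0 undef 4"
```

The caller of C<both> cannot tell where the array ended and the hash began.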
79 | |
80 | Perl does not have named formal parameters. In practice all you |
81 | do is assign to a C<my()> list of these. Variables that aren't |
82 | declared to be private are global variables. For gory details |
83 | on creating private variables, see L<"Private Variables via my()"> |
84 | and L<"Temporary Values via local()">. To create protected |
85 | environments for a set of functions in a separate package (and |
86 | probably a separate file), see L<perlmod/"Packages">. |
a0d0e21e |
87 | |
88 | Example: |
89 | |
cb1a09d0 |
90 | sub max { |
91 | my $max = shift(@_); |
a0d0e21e |
92 | foreach $foo (@_) { |
93 | $max = $foo if $max < $foo; |
94 | } |
cb1a09d0 |
95 | return $max; |
a0d0e21e |
96 | } |
cb1a09d0 |
97 | $bestday = max($mon,$tue,$wed,$thu,$fri); |
a0d0e21e |
98 | |
99 | Example: |
100 | |
101 | # get a line, combining continuation lines |
102 | # that start with whitespace |
103 | |
104 | sub get_line { |
19799a22 |
105 | $thisline = $lookahead; # global variables! |
54310121 |
106 | LINE: while (defined($lookahead = <STDIN>)) { |
a0d0e21e |
107 | if ($lookahead =~ /^[ \t]/) { |
108 | $thisline .= $lookahead; |
109 | } |
110 | else { |
111 | last LINE; |
112 | } |
113 | } |
19799a22 |
114 | return $thisline; |
a0d0e21e |
115 | } |
116 | |
117 | $lookahead = <STDIN>; # get first line |
19799a22 |
118 | while (defined($line = get_line())) { |
a0d0e21e |
119 | ... |
120 | } |
121 | |
09bef843 |
122 | Assigning to a list of private variables to name your arguments: |
a0d0e21e |
123 | |
124 | sub maybeset { |
125 | my($key, $value) = @_; |
cb1a09d0 |
126 | $Foo{$key} = $value unless $Foo{$key}; |
a0d0e21e |
127 | } |
128 | |
19799a22 |
129 | Because the assignment copies the values, this also has the effect |
130 | of turning call-by-reference into call-by-value. Otherwise a |
131 | function is free to do in-place modifications of C<@_> and change |
132 | its caller's values. |
cb1a09d0 |
133 | |
134 | upcase_in($v1, $v2); # this changes $v1 and $v2 |
135 | sub upcase_in { |
54310121 |
136 | for (@_) { tr/a-z/A-Z/ } |
137 | } |
cb1a09d0 |
138 | |
139 | You aren't allowed to modify constants in this way, of course. If an |
140 | argument were actually literal and you tried to change it, you'd take a |
141 | (presumably fatal) exception. For example, this won't work: |
142 | |
143 | upcase_in("frederick"); |
144 | |
f86cebdf |
145 | It would be much safer if the C<upcase_in()> function |
cb1a09d0 |
146 | were written to return a copy of its parameters instead |
147 | of changing them in place: |
148 | |
19799a22 |
149 | ($v3, $v4) = upcase($v1, $v2); # this doesn't change $v1 and $v2 |
cb1a09d0 |
150 | sub upcase { |
54310121 |
151 | return unless defined wantarray; # void context, do nothing |
cb1a09d0 |
152 | my @parms = @_; |
54310121 |
153 | for (@parms) { tr/a-z/A-Z/ } |
c07a80fd |
154 | return wantarray ? @parms : $parms[0]; |
54310121 |
155 | } |
cb1a09d0 |
156 | |
19799a22 |
157 | Notice how this (unprototyped) function doesn't care whether it was |
a2293a43 |
158 | passed real scalars or arrays. Perl sees all arguments as one big, |
19799a22 |
159 | long, flat parameter list in C<@_>. This is one area where |
160 | Perl's simple argument-passing style shines. The C<upcase()> |
161 | function would work perfectly well without changing the C<upcase()> |
162 | definition even if we fed it things like this: |
cb1a09d0 |
163 | |
164 | @newlist = upcase(@list1, @list2); |
165 | @newlist = upcase( split /:/, $var ); |
166 | |
167 | Do not, however, be tempted to do this: |
168 | |
169 | (@a, @b) = upcase(@list1, @list2); |
170 | |
19799a22 |
171 | Like the flattened incoming parameter list, the return list is also |
172 | flattened on return. So all you have managed to do here is store |
17b63f68 |
173 | everything in C<@a> and make C<@b> empty. See |
13a2d996 |
174 | L<Pass by Reference> for alternatives. |
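
As a minimal sketch of one such alternative, the lists can be passed and returned as array references so that each keeps its identity. The name C<upcase_lists> is hypothetical and is not part of the original examples.

```perl
use strict;
use warnings;

# Hypothetical variant of upcase() that takes and returns array references.
sub upcase_lists {
    my ($first, $second) = @_;
    return ( [ map { uc $_ } @$first ], [ map { uc $_ } @$second ] );
}

my @list1 = qw(a b);
my @list2 = qw(c d e);

my ($a_ref, $b_ref) = upcase_lists(\@list1, \@list2);
print "@$a_ref | @$b_ref\n";    # prints "A B | C D E"
```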
19799a22 |
175 | |
176 | A subroutine may be called using an explicit C<&> prefix. The |
177 | C<&> is optional in modern Perl, as are parentheses if the |
178 | subroutine has been predeclared. The C<&> is I<not> optional |
179 | when just naming the subroutine, such as when it's used as |
180 | an argument to defined() or undef(). Nor is it optional when you |
181 | want to do an indirect subroutine call with a subroutine name or |
182 | reference using the C<&$subref()> or C<&{$subref}()> constructs, |
c47ff5f1 |
183 | although the C<< $subref->() >> notation solves that problem. |
19799a22 |
184 | See L<perlref> for more about all that. |
185 | |
186 | Subroutines may be called recursively. If a subroutine is called |
187 | using the C<&> form, the argument list is optional, and if omitted, |
188 | no C<@_> array is set up for the subroutine: the C<@_> array at the |
189 | time of the call is visible to the subroutine instead. This is an |
190 | efficiency mechanism that new users may wish to avoid. |
a0d0e21e |
191 | |
192 | &foo(1,2,3); # pass three arguments |
193 | foo(1,2,3); # the same |
194 | |
195 | foo(); # pass a null list |
196 | &foo(); # the same |
a0d0e21e |
197 | |
cb1a09d0 |
198 | &foo; # foo() gets current args, like foo(@_) !! |
54310121 |
199 | foo; # like foo() IFF sub foo predeclared, else "foo" |
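
The sharing of C<@_> by the bare C<&NAME;> form can be sketched like this. C<take_first> and C<outer> are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes and returns the first element of whatever @_ it sees.
sub take_first { return shift @_ }

sub outer {
    my $first = &take_first;       # no parentheses: take_first shares outer's @_
    return ($first, scalar @_);    # so outer's @_ is now one element shorter
}

my ($first, $remaining) = outer('a', 'b', 'c');
print "$first $remaining\n";       # prints "a 2"
```

Had C<outer> called C<take_first()> with parentheses, the helper would have received its own empty C<@_> instead.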
cb1a09d0 |
200 | |
19799a22 |
201 | Not only does the C<&> form make the argument list optional, it also |
202 | disables any prototype checking on arguments you do provide. This |
c07a80fd |
203 | is partly for historical reasons, and partly for having a convenient way |
19799a22 |
204 | to cheat if you know what you're doing. See L<Prototypes> below. |
c07a80fd |
205 | |
ac90fb77 |
206 | Subroutines whose names are in all upper case are reserved to the Perl |
207 | core, as are modules whose names are in all lower case. A subroutine in |
208 | all capitals is a loosely-held convention meaning it will be called |
209 | indirectly by the run-time system itself, usually due to a triggered event. |
210 | Subroutines that do special, pre-defined things include C<AUTOLOAD>, C<CLONE>, |
211 | C<DESTROY> plus all functions mentioned in L<perltie> and L<PerlIO::via>. |
212 | |
213 | The C<BEGIN>, C<CHECK>, C<INIT> and C<END> subroutines are not so much |
214 | subroutines as named special code blocks, of which you can have more |
fa11829f |
215 | than one in a package, and which you can B<not> call explicitly. See |
ac90fb77 |
216 | L<perlmod/"BEGIN, CHECK, INIT and END">. |
5a964f20 |
217 | |
b687b08b |
218 | =head2 Private Variables via my() |
cb1a09d0 |
219 | |
220 | Synopsis: |
221 | |
222 | my $foo; # declare $foo lexically local |
223 | my (@wid, %get); # declare list of variables local |
224 | my $foo = "flurp"; # declare $foo lexical, and init it |
225 | my @oof = @bar; # declare @oof lexical, and init it |
09bef843 |
226 | my $x : Foo = $y; # similar, with an attribute applied |
227 | |
a0ae32d3 |
228 | B<WARNING>: The use of attribute lists on C<my> declarations is still |
229 | evolving. The current semantics and interface are subject to change. |
230 | See L<attributes> and L<Attribute::Handlers>. |
cb1a09d0 |
231 | |
19799a22 |
232 | The C<my> operator declares the listed variables to be lexically |
233 | confined to the enclosing block, conditional (C<if/unless/elsif/else>), |
234 | loop (C<for/foreach/while/until/continue>), subroutine, C<eval>, |
235 | or C<do/require/use>'d file. If more than one value is listed, the |
236 | list must be placed in parentheses. All listed elements must be |
237 | legal lvalues. Only alphanumeric identifiers may be lexically |
325192b1 |
238 | scoped--magical built-ins like C<$/> must currently be localized |
19799a22 |
239 | with C<local> instead. |
240 | |
241 | Unlike dynamic variables created by the C<local> operator, lexical |
242 | variables declared with C<my> are totally hidden from the outside |
243 | world, including any called subroutines. This is true if it's the |
244 | same subroutine called from itself or elsewhere--every call gets |
245 | its own copy. |
246 | |
247 | This doesn't mean that a C<my> variable declared in a statically |
248 | enclosing lexical scope would be invisible. Only dynamic scopes |
249 | are cut off. For example, the C<bumpx()> function below has access |
250 | to the lexical $x variable because both the C<my> and the C<sub> |
251 | occurred at the same scope, presumably file scope. |
5a964f20 |
252 | |
253 | my $x = 10; |
254 | sub bumpx { $x++ } |
255 | |
19799a22 |
256 | An C<eval()>, however, can see lexical variables of the scope it is |
257 | being evaluated in, so long as the names aren't hidden by declarations within |
258 | the C<eval()> itself. See L<perlref>. |
cb1a09d0 |
259 | |
19799a22 |
260 | The parameter list to my() may be assigned to if desired, which allows you |
cb1a09d0 |
261 | to initialize your variables. (If no initializer is given for a |
262 | particular variable, it is created with the undefined value.) Commonly |
19799a22 |
263 | this is used to name input parameters to a subroutine. Examples: |
cb1a09d0 |
264 | |
265 | $arg = "fred"; # "global" variable |
266 | $n = cube_root(27); |
267 | print "$arg thinks the root is $n\n"; |
268 | fred thinks the root is 3 |
269 | |
270 | sub cube_root { |
271 | my $arg = shift; # name doesn't matter |
272 | $arg **= 1/3; |
273 | return $arg; |
54310121 |
274 | } |
cb1a09d0 |
275 | |
19799a22 |
276 | The C<my> is simply a modifier on something you might assign to. So when |
277 | you do assign to variables in its argument list, C<my> doesn't |
6cc33c6d |
278 | change whether those variables are viewed as a scalar or an array. So |
cb1a09d0 |
279 | |
5a964f20 |
280 | my ($foo) = <STDIN>; # WRONG? |
cb1a09d0 |
281 | my @FOO = <STDIN>; |
282 | |
5f05dabc |
283 | both supply a list context to the right-hand side, while |
cb1a09d0 |
284 | |
285 | my $foo = <STDIN>; |
286 | |
5f05dabc |
287 | supplies a scalar context. But the following declares only one variable: |
748a9306 |
288 | |
5a964f20 |
289 | my $foo, $bar = 1; # WRONG |
748a9306 |
290 | |
cb1a09d0 |
291 | That has the same effect as |
748a9306 |
292 | |
cb1a09d0 |
293 | my $foo; |
294 | $bar = 1; |
a0d0e21e |
295 | |
cb1a09d0 |
296 | The declared variable is not introduced (is not visible) until after |
297 | the current statement. Thus, |
298 | |
299 | my $x = $x; |
300 | |
19799a22 |
301 | can be used to initialize a new $x with the value of the old $x, and |
cb1a09d0 |
302 | the expression |
303 | |
304 | my $x = 123 and $x == 123 |
305 | |
19799a22 |
306 | is false unless the old $x happened to have the value C<123>. |
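
A brief sketch of the first case, where the new variable is initialized from the old one of the same name:

```perl
use strict;
use warnings;

my $x = 10;
my $inner;
{
    my $x = $x + 1;    # the right-hand $x is still the outer variable
    $inner = $x;       # the new, inner $x
}

print "$inner $x\n";   # prints "11 10"
```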
cb1a09d0 |
307 | |
55497cff |
308 | Lexical scopes of control structures are not bounded precisely by the |
309 | braces that delimit their controlled blocks; control expressions are |
19799a22 |
310 | part of that scope, too. Thus in the loop |
55497cff |
311 | |
19799a22 |
312 | while (my $line = <>) { |
55497cff |
313 | $line = lc $line; |
314 | } continue { |
315 | print $line; |
316 | } |
317 | |
19799a22 |
318 | the scope of $line extends from its declaration throughout the rest of |
55497cff |
319 | the loop construct (including the C<continue> clause), but not beyond |
320 | it. Similarly, in the conditional |
321 | |
322 | if ((my $answer = <STDIN>) =~ /^yes$/i) { |
323 | user_agrees(); |
324 | } elsif ($answer =~ /^no$/i) { |
325 | user_disagrees(); |
326 | } else { |
327 | chomp $answer; |
328 | die "'$answer' is neither 'yes' nor 'no'"; |
329 | } |
330 | |
19799a22 |
331 | the scope of $answer extends from its declaration through the rest |
332 | of that conditional, including any C<elsif> and C<else> clauses, |
457b36cb |
333 | but not beyond it. See L<perlsyn/"Simple statements"> for information |
334 | on the scope of variables in statements with modifiers. |
55497cff |
335 | |
5f05dabc |
336 | The C<foreach> loop defaults to scoping its index variable dynamically |
19799a22 |
337 | in the manner of C<local>. However, if the index variable is |
338 | prefixed with the keyword C<my>, or if there is already a lexical |
339 | by that name in scope, then a new lexical is created instead. Thus |
340 | in the loop |
55497cff |
341 | |
342 | for my $i (1, 2, 3) { |
343 | some_function(); |
344 | } |
345 | |
19799a22 |
346 | the scope of $i extends to the end of the loop, but not beyond it, |
347 | rendering the value of $i inaccessible within C<some_function()>. |
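
The difference between the two kinds of index variable can be sketched as follows, assuming a package variable C<$i> and a hypothetical function C<peek> that reads it:

```perl
use strict;
use warnings;

our $i = 'global';
my (@lexical_seen, @dynamic_seen);

# Hypothetical helper: reports the package variable $i as it sees it.
sub peek { return $i }

# Lexical index variable: invisible to peek(), which still sees the global.
for my $i (1, 2, 3) {
    push @lexical_seen, peek();
}

# Package index variable: scoped dynamically, so peek() sees each value.
for $i (1, 2, 3) {
    push @dynamic_seen, peek();
}

print "@lexical_seen | @dynamic_seen | $i\n";
# prints "global global global | 1 2 3 | global"
```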
55497cff |
348 | |
cb1a09d0 |
349 | Some users may wish to encourage the use of lexically scoped variables. |
19799a22 |
350 | As an aid to catching implicit uses of package variables, |
351 | which are always global, if you say |
cb1a09d0 |
352 | |
353 | use strict 'vars'; |
354 | |
19799a22 |
355 | then any variable mentioned from there to the end of the enclosing |
356 | block must either refer to a lexical variable, be predeclared via |
77ca0c92 |
357 | C<our> or C<use vars>, or else must be fully qualified with the package name. |
19799a22 |
358 | A compilation error results otherwise. An inner block may countermand |
359 | this with C<no strict 'vars'>. |
360 | |
361 | A C<my> has both a compile-time and a run-time effect. At compile |
8593bda5 |
362 | time, the compiler takes notice of it. The principal usefulness |
19799a22 |
363 | of this is to quiet C<use strict 'vars'>, but it is also essential |
364 | for generation of closures as detailed in L<perlref>. Actual |
365 | initialization is delayed until run time, though, so it gets executed |
366 | at the appropriate time, such as each time through a loop, for |
367 | example. |
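
Because a C<my> takes effect anew each time it is executed, every pass through a loop gets a fresh variable. A minimal sketch with closures (the variable names are illustrative only):

```perl
use strict;
use warnings;

my @subs;
for my $n (1 .. 3) {
    my $square = $n * $n;                  # a fresh $square on every iteration
    push @subs, sub { return $square };    # each closure captures its own copy
}

my @results = map { $_->() } @subs;
print "@results\n";                        # prints "1 4 9"
```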
368 | |
369 | Variables declared with C<my> are not part of any package and are therefore |
cb1a09d0 |
370 | never fully qualified with the package name. In particular, you're not |
371 | allowed to try to make a package variable (or other global) lexical: |
372 | |
373 | my $pack::var; # ERROR! Illegal syntax |
374 | my $_; # also illegal (currently) |
375 | |
376 | In fact, dynamic variables (also known as package or global variables) |
f86cebdf |
377 | are still accessible using the fully qualified C<::> notation even while a |
cb1a09d0 |
378 | lexical of the same name is also visible: |
379 | |
380 | package main; |
381 | local $x = 10; |
382 | my $x = 20; |
383 | print "$x and $::x\n"; |
384 | |
f86cebdf |
385 | That will print out C<20> and C<10>. |
cb1a09d0 |
386 | |
19799a22 |
387 | You may declare C<my> variables at the outermost scope of a file |
388 | to hide any such identifiers from the world outside that file. This |
389 | is similar in spirit to C's static variables when they are used at |
390 | the file level. To do this with a subroutine requires the use of |
391 | a closure (an anonymous function that accesses enclosing lexicals). |
392 | If you want to create a private subroutine that cannot be called |
393 | from outside that block, you can declare a lexical variable containing |
394 | an anonymous sub reference: |
cb1a09d0 |
395 | |
396 | my $secret_version = '1.001-beta'; |
397 | my $secret_sub = sub { print $secret_version }; |
398 | &$secret_sub(); |
399 | |
400 | As long as the reference is never returned by any function within the |
5f05dabc |
401 | module, no outside module can see the subroutine, because its name is not in |
cb1a09d0 |
402 | any package's symbol table. Remember that it's not I<REALLY> called |
19799a22 |
403 | C<$some_pack::secret_version> or anything; it's just $secret_version, |
cb1a09d0 |
404 | unqualified and unqualifiable. |
405 | |
19799a22 |
406 | This does not work with object methods, however; all object methods |
407 | have to be in the symbol table of some package to be found. See |
408 | L<perlref/"Function Templates"> for something of a work-around to |
409 | this. |
cb1a09d0 |
410 | |
c2611fb3 |
411 | =head2 Persistent Private Variables |
5a964f20 |
412 | |
413 | Just because a lexical variable is lexically (also called statically) |
f86cebdf |
414 | scoped to its enclosing block, C<eval>, or C<do> FILE, this doesn't mean that |
5a964f20 |
415 | within a function it works like a C static. It normally works more |
416 | like a C auto, but with implicit garbage collection. |
417 | |
418 | Unlike local variables in C or C++, Perl's lexical variables don't |
419 | necessarily get recycled just because their scope has exited. |
420 | If something more permanent is still aware of the lexical, it will |
421 | stick around. So long as something else references a lexical, that |
422 | lexical won't be freed--which is as it should be. You wouldn't want |
423 | memory being freed until you were done using it, or kept around once you |
424 | were done. Automatic garbage collection takes care of this for you. |
425 | |
426 | This means that you can pass back or save away references to lexical |
427 | variables, whereas to return a pointer to a C auto is a grave error. |
428 | It also gives us a way to simulate C's function statics. Here's a |
429 | mechanism for giving a function private variables with both lexical |
430 | scoping and a static lifetime. If you do want to create something like |
431 | C's static variables, just enclose the whole function in an extra block, |
432 | and put the static variable outside the function but in the block. |
cb1a09d0 |
433 | |
434 | { |
54310121 |
435 | my $secret_val = 0; |
cb1a09d0 |
436 | sub gimme_another { |
437 | return ++$secret_val; |
54310121 |
438 | } |
439 | } |
cb1a09d0 |
440 | # $secret_val now becomes unreachable by the outside |
441 | # world, but retains its value between calls to gimme_another |
442 | |
54310121 |
443 | If this function is being sourced in from a separate file |
cb1a09d0 |
444 | via C<require> or C<use>, then this is probably just fine. If it's |
19799a22 |
445 | all in the main program, you'll need to arrange for the C<my> |
cb1a09d0 |
446 | to be executed early, either by putting the whole block above |
f86cebdf |
447 | your main program, or more likely, placing merely a C<BEGIN> |
ac90fb77 |
448 | code block around it to make sure it gets executed before your program |
cb1a09d0 |
449 | starts to run: |
450 | |
ac90fb77 |
451 | BEGIN { |
54310121 |
452 | my $secret_val = 0; |
cb1a09d0 |
453 | sub gimme_another { |
454 | return ++$secret_val; |
54310121 |
455 | } |
456 | } |
cb1a09d0 |
457 | |
ac90fb77 |
458 | See L<perlmod/"BEGIN, CHECK, INIT and END"> about the |
459 | special triggered code blocks, C<BEGIN>, C<CHECK>, C<INIT> and C<END>. |
cb1a09d0 |
460 | |
19799a22 |
461 | If declared at the outermost scope (the file scope), then lexicals |
462 | work somewhat like C's file statics. They are available to all |
463 | functions in that same file declared below them, but are inaccessible |
464 | from outside that file. This strategy is sometimes used in modules |
465 | to create private variables that the whole module can see. |
5a964f20 |
466 | |
cb1a09d0 |
467 | =head2 Temporary Values via local() |
468 | |
19799a22 |
469 | B<WARNING>: In general, you should be using C<my> instead of C<local>, because |
6d28dffb |
470 | it's faster and safer. Exceptions to this include the global punctuation |
325192b1 |
471 | variables, global filehandles and formats, and direct manipulation of the |
472 | Perl symbol table itself. C<local> is mostly used when the current value |
473 | of a variable must be visible to called subroutines. |
cb1a09d0 |
474 | |
475 | Synopsis: |
476 | |
325192b1 |
477 | # localization of values |
478 | |
479 | local $foo; # make $foo dynamically local |
480 | local (@wid, %get); # make list of variables local |
481 | local $foo = "flurp"; # make $foo dynamic, and init it |
482 | local @oof = @bar; # make @oof dynamic, and init it |
483 | |
484 | local $hash{key} = "val"; # sets a local value for this hash entry |
485 | local ($cond ? $v1 : $v2); # several types of lvalues support |
486 | # localization |
487 | |
488 | # localization of symbols |
cb1a09d0 |
489 | |
490 | local *FH; # localize $FH, @FH, %FH, &FH ... |
491 | local *merlyn = *randal; # now $merlyn is really $randal, plus |
492 | # @merlyn is really @randal, etc |
493 | local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal |
54310121 |
494 | local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc |
cb1a09d0 |
495 | |
19799a22 |
496 | A C<local> modifies its listed variables to be "local" to the |
497 | enclosing block, C<eval>, or C<do FILE>--and to I<any subroutine |
498 | called from within that block>. A C<local> just gives temporary |
499 | values to global (meaning package) variables. It does I<not> create |
500 | a local variable. This is known as dynamic scoping. Lexical scoping |
501 | is done with C<my>, which works more like C's auto declarations. |
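
The contrast between dynamic and lexical scoping can be sketched like this. C<$level>, C<show>, C<with_local> and C<with_my> are hypothetical names.

```perl
use strict;
use warnings;

our $level = 'outer';

# Hypothetical helper: reports the package variable $level.
sub show { return $level }

sub with_local {
    local $level = 'inner';    # temporary value of the package variable
    return show();             # the called subroutine sees 'inner'
}

sub with_my {
    my $level = 'inner';       # a new lexical, invisible to show()
    return show();             # the called subroutine still sees 'outer'
}

my $dynamic = with_local();
my $lexical = with_my();
print "$dynamic $lexical $level\n";    # prints "inner outer outer"
```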
cb1a09d0 |
502 | |
325192b1 |
503 | Some types of lvalues can be localized as well: hash and array elements |
504 | and slices, conditionals (provided that their result is always |
505 | localizable), and symbolic references. As for simple variables, this |
506 | creates new, dynamically scoped values. |
507 | |
508 | If more than one variable or expression is given to C<local>, they must be |
509 | placed in parentheses. This operator works |
cb1a09d0 |
510 | by saving the current values of those variables in its argument list on a |
5f05dabc |
511 | hidden stack and restoring them upon exiting the block, subroutine, or |
cb1a09d0 |
512 | eval. This means that called subroutines can also reference the local |
513 | variable, but not the global one. The argument list may be assigned to if |
514 | desired, which allows you to initialize your local variables. (If no |
515 | initializer is given for a particular variable, it is created with an |
325192b1 |
516 | undefined value.) |
cb1a09d0 |
517 | |
19799a22 |
518 | Because C<local> is a run-time operator, it gets executed each time |
325192b1 |
519 | through a loop. Consequently, it's more efficient to localize your |
520 | variables outside the loop. |
521 | |
522 | =head3 Grammatical note on local() |
cb1a09d0 |
523 | |
f86cebdf |
524 | A C<local> is simply a modifier on an lvalue expression. When you assign to |
525 | a C<local>ized variable, the C<local> doesn't change whether its list is viewed |
cb1a09d0 |
526 | as a scalar or an array. So |
527 | |
528 | local($foo) = <STDIN>; |
529 | local @FOO = <STDIN>; |
530 | |
5f05dabc |
531 | both supply a list context to the right-hand side, while |
cb1a09d0 |
532 | |
533 | local $foo = <STDIN>; |
534 | |
535 | supplies a scalar context. |
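
The effect of the parentheses can be observed without reading from C<STDIN>. This is a sketch that assumes a hypothetical C<context> function reporting how it was called:

```perl
use strict;
use warnings;

# Hypothetical helper: reports the context of its call.
sub context { return wantarray ? 'list' : 'scalar' }

our ($foo, $bar);
my ($with_parens, $without_parens);
{
    local ($foo) = context();    # list assignment: list context
    local $bar   = context();    # scalar assignment: scalar context
    ($with_parens, $without_parens) = ($foo, $bar);
}

print "$with_parens $without_parens\n";    # prints "list scalar"
```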
536 | |
325192b1 |
537 | =head3 Localization of special variables |
3e3baf6d |
538 | |
325192b1 |
539 | If you localize a special variable, you'll be giving a new value to it, |
540 | but its magic won't go away. That means that all side-effects related |
541 | to this magic still work with the localized value. |
3e3baf6d |
542 | |
325192b1 |
543 | This feature allows code like this to work: |
544 | |
545 | # Read the whole contents of FILE in $slurp |
546 | { local $/ = undef; $slurp = <FILE>; } |
547 | |
548 | Note, however, that this restricts localization of some values; for |
549 | example, the following statement dies, as of perl 5.9.0, with an error |
550 | I<Modification of a read-only value attempted>, because the $1 variable is |
551 | magical and read-only: |
552 | |
553 | local $1 = 2; |
554 | |
555 | Similarly, but in a way more difficult to spot, the following snippet will |
556 | die in perl 5.9.0: |
557 | |
558 | sub f { local $_ = "foo"; print } |
559 | for ($1) { |
560 | # now $_ is aliased to $1, thus is magic and readonly |
561 | f(); |
3e3baf6d |
562 | } |
3e3baf6d |
563 | |
325192b1 |
564 | See the next section for an alternative to this situation. |
565 | |
566 | B<WARNING>: Localization of tied arrays and hashes does not currently |
567 | work as described. |
fd5a896a |
568 | This will be fixed in a future release of Perl; in the meantime, avoid |
569 | code that relies on any particular behaviour of localising tied arrays |
570 | or hashes (localising individual elements is still okay). |
325192b1 |
571 | See L<perl58delta/"Localising Tied Arrays and Hashes Is Broken"> for more |
fd5a896a |
572 | details. |
573 | |
325192b1 |
574 | =head3 Localization of globs |
3e3baf6d |
575 | |
325192b1 |
576 | The construct |
577 | |
578 | local *name; |
579 | |
580 | creates a whole new symbol table entry for the glob C<name> in the |
581 | current package. That means that all variables in its glob slot ($name, |
582 | @name, %name, &name, and the C<name> filehandle) are dynamically reset. |
583 | |
584 | This implies, among other things, that any magic eventually carried by |
585 | those variables is locally lost. In other words, saying C<local */> |
586 | will not have any effect on the internal value of the input record |
587 | separator. |
588 | |
589 | Notably, if you want to work with a brand new value of the default scalar |
590 | $_, and avoid the potential problem listed above about $_ previously |
591 | carrying a magic value, you should use C<local *_> instead of C<local $_>. |
a4fb8298 |
592 | As of perl 5.9.1, you can also use the lexical form of C<$_> (declaring it |
593 | with C<my $_>), which avoids this problem completely. |
325192b1 |
594 | |
595 | =head3 Localization of elements of composite types |
3e3baf6d |
596 | |
6ee623d5 |
597 | It's also worth taking a moment to explain what happens when you |
f86cebdf |
598 | C<local>ize a member of a composite type (i.e. an array or hash element). |
599 | In this case, the element is C<local>ized I<by name>. This means that |
6ee623d5 |
600 | when the scope of the C<local()> ends, the saved value will be |
601 | restored to the hash element whose key was named in the C<local()>, or |
602 | the array element whose index was named in the C<local()>. If that |
603 | element was deleted while the C<local()> was in effect (e.g. by a |
604 | C<delete()> from a hash or a C<shift()> of an array), it will spring |
605 | back into existence, possibly extending an array and filling in the |
606 | skipped elements with C<undef>. For instance, if you say |
607 | |
608 | %hash = ( 'This' => 'is', 'a' => 'test' ); |
609 | @ary = ( 0..5 ); |
610 | { |
611 | local($ary[5]) = 6; |
612 | local($hash{'a'}) = 'drill'; |
613 | while (my $e = pop(@ary)) { |
614 | print "$e . . .\n"; |
615 | last unless $e > 3; |
616 | } |
617 | if (@ary) { |
618 | $hash{'only a'} = 'test'; |
619 | delete $hash{'a'}; |
620 | } |
621 | } |
622 | print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n"; |
623 | print "The array has ",scalar(@ary)," elements: ", |
624 | join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n"; |
625 | |
626 | Perl will print |
627 | |
628 | 6 . . . |
629 | 4 . . . |
630 | 3 . . . |
631 | This is a test only a test. |
632 | The array has 6 elements: 0, 1, 2, undef, undef, 5 |
633 | |
19799a22 |
634 | The behavior of local() on non-existent members of composite |
7185e5cc |
635 | types is subject to change in the future. |
636 | |
cd06dffe |
637 | =head2 Lvalue subroutines |
638 | |
e6a32221 |
639 | B<WARNING>: Lvalue subroutines are still experimental and the |
640 | implementation may change in future versions of Perl. |
cd06dffe |
641 | |
642 | It is possible to return a modifiable value from a subroutine. |
643 | To do this, you have to declare the subroutine to return an lvalue. |
644 | |
645 | my $val; |
646 | sub canmod : lvalue { |
e6a32221 |
647 | # return $val; this doesn't work, don't say "return" |
cd06dffe |
648 | $val; |
649 | } |
650 | sub nomod { |
651 | $val; |
652 | } |
653 | |
654 | canmod() = 5; # assigns to $val |
655 | nomod() = 5; # ERROR |
656 | |
657 | The scalar/list context for the subroutine and for the right-hand |
658 | side of assignment is determined as if the subroutine call is replaced |
659 | by a scalar. For example, consider: |
660 | |
661 | data(2,3) = get_data(3,4); |
662 | |
663 | Both subroutines here are called in a scalar context, while in: |
664 | |
665 | (data(2,3)) = get_data(3,4); |
666 | |
667 | and in: |
668 | |
669 | (data(2),data(3)) = get_data(3,4); |
670 | |
671 | all the subroutines are called in a list context. |
672 | |
e6a32221 |
673 | =over 4 |
674 | |
675 | =item Lvalue subroutines are EXPERIMENTAL |
676 | |
677 | They appear to be convenient, but there are several reasons to be |
678 | circumspect. |
679 | |
680 | You can't use the C<return> keyword; you must pass out the value before |
681 | falling out of subroutine scope (see the comment in the example above). This |
682 | is usually not a problem, but it disallows an explicit return out of a |
683 | deeply nested loop, which is sometimes a nice way out. |
684 | |
685 | They violate encapsulation. A normal mutator can check the supplied |
686 | argument before setting the attribute it is protecting; an lvalue |
687 | subroutine never gets that chance. Consider: |
688 | |
689 | my $some_array_ref = []; # protected by mutators ?? |
690 | |
691 | sub set_arr { # normal mutator |
692 | my $val = shift; |
693 | die("expected array, you supplied ", ref $val) |
694 | unless ref $val eq 'ARRAY'; |
695 | $some_array_ref = $val; |
696 | } |
697 | sub set_arr_lv : lvalue { # lvalue mutator |
698 | $some_array_ref; |
699 | } |
700 | |
701 | # set_arr_lv cannot stop this ! |
702 | set_arr_lv() = { a => 1 }; |
818c4caa |
703 | |
e6a32221 |
704 | =back |
705 | |
cb1a09d0 |
706 | =head2 Passing Symbol Table Entries (typeglobs) |
707 | |
19799a22 |
708 | B<WARNING>: The mechanism described in this section was originally |
709 | the only way to simulate pass-by-reference in older versions of |
710 | Perl. While it still works fine in modern versions, the new reference |
711 | mechanism is generally easier to work with. See below. |
a0d0e21e |
712 | |
713 | Sometimes you don't want to pass the value of an array to a subroutine |
714 | but rather the name of it, so that the subroutine can modify the global |
715 | copy of it rather than working with a local copy. In Perl you can |
cb1a09d0 |
716 | refer to all objects of a particular name by prefixing the name |
5f05dabc |
717 | with a star: C<*foo>. This is often known as a "typeglob", because the |
a0d0e21e |
718 | star on the front can be thought of as a wildcard match for all the |
719 | funny prefix characters on variables and subroutines and such. |
720 | |
55497cff |
721 | When evaluated, the typeglob produces a scalar value that represents |
5f05dabc |
722 | all the objects of that name, including any filehandle, format, or |
a0d0e21e |
723 | subroutine. When assigned to, it causes the name mentioned to refer to |
19799a22 |
724 | whatever C<*> value was assigned to it. Example: |
a0d0e21e |
725 | |
726 | sub doubleary { |
727 | local(*someary) = @_; |
728 | foreach $elem (@someary) { |
729 | $elem *= 2; |
730 | } |
731 | } |
732 | doubleary(*foo); |
733 | doubleary(*bar); |
734 | |
19799a22 |
735 | Scalars are already passed by reference, so you can modify |
a0d0e21e |
736 | scalar arguments without using this mechanism by referring explicitly |
1fef88e7 |
737 | to C<$_[0]> etc. You can modify all the elements of an array by passing |
f86cebdf |
738 | all the elements as scalars, but you have to use the C<*> mechanism (or |
739 | the equivalent reference mechanism) to C<push>, C<pop>, or change the size of |
a0d0e21e |
740 | an array. It will certainly be faster to pass the typeglob (or reference). |
741 | |
742 | Even if you don't want to modify an array, this mechanism is useful for |
5f05dabc |
743 | passing multiple arrays in a single LIST, because normally the LIST |
a0d0e21e |
744 | mechanism will merge all the array values so that you can't extract out |
55497cff |
745 | the individual arrays. For more on typeglobs, see |
2ae324a7 |
746 | L<perldata/"Typeglobs and Filehandles">. |
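The difference between modifying elements and changing an array's size can be sketched as follows. The subroutine names C<double_in_place> and C<append_items> are made up for this illustration, and the second one uses the reference mechanism rather than a typeglob:

```perl
# Elements of @_ are aliases to the caller's scalars, so this
# changes the caller's array elements in place.
sub double_in_place {
    $_ *= 2 for @_;
}

# Changing the size of an array needs the array itself, here
# passed as a reference (the "equivalent reference mechanism").
sub append_items {
    my ($aref, @items) = @_;
    push @$aref, @items;
}

my @nums = (1, 2, 3);
double_in_place(@nums);         # @nums is now (2, 4, 6)
append_items(\@nums, 8, 10);    # @nums is now (2, 4, 6, 8, 10)
```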
cb1a09d0 |
747 | |
5a964f20 |
748 | =head2 When to Still Use local() |
749 | |
19799a22 |
750 | Despite the existence of C<my>, there are three places where the |
751 | C<local> operator still shines. In fact, in these three places, you |
5a964f20 |
752 | I<must> use C<local> instead of C<my>. |
753 | |
13a2d996 |
754 | =over 4 |
5a964f20 |
755 | |
551e1d92 |
756 | =item 1. |
757 | |
758 | You need to give a global variable a temporary value, especially $_. |
5a964f20 |
759 | |
f86cebdf |
760 | The global variables, like C<@ARGV> or the punctuation variables, must be |
761 | C<local>ized with C<local()>. This block reads in F</etc/motd>, and splits |
5a964f20 |
762 | it up into chunks separated by lines of equal signs, which are placed |
f86cebdf |
763 | in C<@Fields>. |
5a964f20 |
764 | |
765 | { |
766 | local @ARGV = ("/etc/motd"); |
767 | local $/ = undef; |
768 | local $_ = <>; |
769 | @Fields = split /^\s*=+\s*$/; |
770 | } |
771 | |
19799a22 |
772 | In particular, it's important to C<local>ize $_ in any routine that assigns |
5a964f20 |
773 | to it. Look out for implicit assignments in C<while> conditionals. |
774 | |
551e1d92 |
775 | =item 2. |
776 | |
777 | You need to create a local file or directory handle or a local function. |
5a964f20 |
778 | |
09bef843 |
779 | A function that needs a filehandle of its own must use |
780 | C<local()> on a complete typeglob. This can be used to create new symbol |
5a964f20 |
781 | table entries: |
782 | |
783 | sub ioqueue { |
784 | local (*READER, *WRITER); # not my! |
17b63f68 |
785 | pipe (READER, WRITER) or die "pipe: $!"; |
5a964f20 |
786 | return (*READER, *WRITER); |
787 | } |
788 | ($head, $tail) = ioqueue(); |
789 | |
790 | See the Symbol module for a way to create anonymous symbol table |
791 | entries. |
792 | |
793 | Because assignment of a reference to a typeglob creates an alias, this |
794 | can be used to create what is effectively a local function, or at least, |
795 | a local alias. |
796 | |
797 | { |
f86cebdf |
798 | local *grow = \&shrink; # only until this block exits |
799 | grow(); # really calls shrink() |
800 | move(); # if move() grow()s, it shrink()s too |
5a964f20 |
801 | } |
f86cebdf |
802 | grow(); # get the real grow() again |
5a964f20 |
803 | |
804 | See L<perlref/"Function Templates"> for more about manipulating |
805 | functions by name in this way. |
806 | |
551e1d92 |
807 | =item 3. |
808 | |
809 | You want to temporarily change just one element of an array or hash. |
5a964f20 |
810 | |
f86cebdf |
811 | You can C<local>ize just one element of an aggregate. Usually this |
5a964f20 |
812 | is done on dynamics: |
813 | |
814 | { |
815 | local $SIG{INT} = 'IGNORE'; |
816 | funct(); # uninterruptible |
817 | } |
818 | # interruptibility automatically restored here |
819 | |
820 | But it also works on lexically declared aggregates. Prior to 5.005, |
821 | this operation could on occasion misbehave. |
822 | |
823 | =back |
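Two of the cases above can be tried out with a short sketch. The names C<%Options>, C<report> and C<count_lines> are hypothetical, and the in-memory filehandle is only there so the example needs no external file:

```perl
our %Options = (verbose => 0);

sub report { return $Options{verbose} ? "loud" : "quiet" }

# Hypothetical helper that counts lines in a string.  The while
# modifier assigns each line to $_, so $_ is localized first.
sub count_lines {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "open: $!";
    local $_;
    my $n = 0;
    $n++ while <$fh>;
    return $n;
}

$_ = "caller's value";
my $lines = count_lines("a\nb\nc\n");   # $_ is still "caller's value"

my $inside;
{
    local $Options{verbose} = 1;   # temporary value for one element
    $inside = report();            # "loud"
}
my $outside = report();            # "quiet"
```

Without the C<local $_> line, the implicit assignment in the C<while> modifier would leave the caller's C<$_> undefined once the input is exhausted.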
824 | |
cb1a09d0 |
825 | =head2 Pass by Reference |
826 | |
55497cff |
827 | If you want to pass more than one array or hash into a function--or |
828 | return them from it--and have them maintain their integrity, then |
829 | you're going to have to use an explicit pass-by-reference. Before you |
830 | do that, you need to understand references as detailed in L<perlref>. |
c07a80fd |
831 | This section may not make much sense to you otherwise. |
cb1a09d0 |
832 | |
19799a22 |
833 | Here are a few simple examples. First, let's pass in several arrays |
834 | to a function and have it C<pop> all of them, returning a new list |
835 | of all their former last elements: |
cb1a09d0 |
836 | |
837 | @tailings = popmany ( \@a, \@b, \@c, \@d ); |
838 | |
839 | sub popmany { |
840 | my $aref; |
841 | my @retlist = (); |
842 | foreach $aref ( @_ ) { |
843 | push @retlist, pop @$aref; |
54310121 |
844 | } |
cb1a09d0 |
845 | return @retlist; |
54310121 |
846 | } |
cb1a09d0 |
847 | |
54310121 |
848 | Here's how you might write a function that returns a |
cb1a09d0 |
849 | list of keys occurring in all the hashes passed to it: |
850 | |
54310121 |
851 | @common = inter( \%foo, \%bar, \%joe ); |
cb1a09d0 |
852 | sub inter { |
853 | my ($k, $href, %seen); # locals |
854 | foreach $href (@_) { |
855 | while ( $k = each %$href ) { |
856 | $seen{$k}++; |
54310121 |
857 | } |
858 | } |
cb1a09d0 |
859 | return grep { $seen{$_} == @_ } keys %seen; |
54310121 |
860 | } |
cb1a09d0 |
861 | |
5f05dabc |
862 | So far, we're using just the normal list return mechanism. |
54310121 |
863 | What happens if you want to pass or return a hash? Well, |
864 | if you're using only one of them, or you don't mind them |
cb1a09d0 |
865 | concatenating, then the normal calling convention is ok, although |
54310121 |
866 | a little expensive. |
cb1a09d0 |
867 | |
868 | Where people get into trouble is here: |
869 | |
870 | (@a, @b) = func(@c, @d); |
871 | or |
872 | (%a, %b) = func(%c, %d); |
873 | |
19799a22 |
874 | That syntax simply won't work. It sets just C<@a> or C<%a> and |
875 | clears C<@b> or C<%b>. Plus, the function didn't receive |
876 | two separate arrays or hashes: it got one long list in C<@_>, |
877 | as always. |
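The flattening is easy to observe. In this sketch, C<pass_through> is a made-up function that simply returns its arguments:

```perl
sub pass_through { return @_ }   # hypothetical: returns its arguments

my @c = (1, 2);
my @d = (3, 4, 5);

my (@a, @b) = pass_through(@c, @d);
# @a receives all five values; @b is left empty.

my $count = () = pass_through(@c, @d);   # 5: one flat list arrived in @_
```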
cb1a09d0 |
878 | |
879 | If you can arrange for everyone to deal with this through references, it's |
880 | cleaner code, although not so nice to look at. Here's a function that |
881 | takes two array references as arguments, returning the two references |
882 | in order of how many elements they have in them: |
883 | |
884 | ($aref, $bref) = func(\@c, \@d); |
885 | print "@$aref has more than @$bref\n"; |
886 | sub func { |
887 | my ($cref, $dref) = @_; |
888 | if (@$cref > @$dref) { |
889 | return ($cref, $dref); |
890 | } else { |
c07a80fd |
891 | return ($dref, $cref); |
54310121 |
892 | } |
893 | } |
cb1a09d0 |
894 | |
895 | It turns out that you can actually do this also: |
896 | |
897 | (*a, *b) = func(\@c, \@d); |
898 | print "@a has more than @b\n"; |
899 | sub func { |
900 | local (*c, *d) = @_; |
901 | if (@c > @d) { |
902 | return (\@c, \@d); |
903 | } else { |
904 | return (\@d, \@c); |
54310121 |
905 | } |
906 | } |
cb1a09d0 |
907 | |
908 | Here we're using the typeglobs to do symbol table aliasing. It's |
19799a22 |
909 | a tad subtle, though, and also won't work if you're using C<my> |
09bef843 |
910 | variables, because only globals (even in disguise as C<local>s) |
19799a22 |
911 | are in the symbol table. |
5f05dabc |
912 | |
913 | If you're passing around filehandles, you could usually just use the bare |
19799a22 |
914 | typeglob, like C<*STDOUT>, but typeglob references work, too. |
915 | For example: |
5f05dabc |
916 | |
917 | splutter(\*STDOUT); |
918 | sub splutter { |
919 | my $fh = shift; |
920 | print $fh "her um well a hmmm\n"; |
921 | } |
922 | |
923 | $rec = get_rec(\*STDIN); |
924 | sub get_rec { |
925 | my $fh = shift; |
926 | return scalar <$fh>; |
927 | } |
928 | |
19799a22 |
929 | If you're planning on generating new filehandles, you could do this. |
930 | Note that it passes back just the bare C<*FH>, not a reference to it. |
5f05dabc |
931 | |
932 | sub openit { |
19799a22 |
933 | my $path = shift; |
5f05dabc |
934 | local *FH; |
e05a3a1e |
935 | return open (FH, $path) ? *FH : undef; |
54310121 |
936 | } |
5f05dabc |
937 | |
cb1a09d0 |
938 | =head2 Prototypes |
939 | |
19799a22 |
940 | Perl supports a very limited kind of compile-time argument checking |
941 | using function prototyping. If you declare |
cb1a09d0 |
942 | |
943 | sub mypush (\@@) |
944 | |
19799a22 |
945 | then C<mypush()> takes arguments exactly like C<push()> does. The |
946 | function declaration must be visible at compile time. The prototype |
947 | affects only interpretation of new-style calls to the function, |
948 | where new-style is defined as not using the C<&> character. In |
949 | other words, if you call it like a built-in function, then it behaves |
950 | like a built-in function. If you call it like an old-fashioned |
951 | subroutine, then it behaves like an old-fashioned subroutine. It |
952 | naturally falls out from this rule that prototypes have no influence |
953 | on subroutine references like C<\&foo> or on indirect subroutine |
c47ff5f1 |
954 | calls like C<&{$subref}> or C<< $subref->() >>. |
c07a80fd |
955 | |
956 | Method calls are not influenced by prototypes either, because the |
19799a22 |
957 | function to be called is indeterminate at compile time, since |
958 | the exact code called depends on inheritance. |
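As a rough sketch of how such a C<mypush> might be written and called (the body here is an assumption for illustration, not a definitive implementation):

```perl
# Declared before use so the prototype is visible at compile time.
sub mypush (\@@) {
    my ($aref, @items) = @_;
    push @$aref, @items;
    return scalar @$aref;
}

my @stack = (1);
mypush @stack, 2, 3;       # new-style call: @stack is passed as \@stack
&mypush(\@stack, 4);       # old-style call: prototype ignored, so pass
                           # the reference explicitly
```

Both calls end up pushing onto the same array; only the new-style call has the backslash applied automatically.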
cb1a09d0 |
959 | |
19799a22 |
960 | Because the intent of this feature is primarily to let you define |
961 | subroutines that work like built-in functions, here are prototypes |
962 | for some other functions that parse almost exactly like the |
963 | corresponding built-in. |
cb1a09d0 |
964 | |
965 | Declared as Called as |
966 | |
f86cebdf |
967 | sub mylink ($$) mylink $old, $new |
968 | sub myvec ($$$) myvec $var, $offset, 1 |
969 | sub myindex ($$;$) myindex &getstring, "substr" |
970 | sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off |
971 | sub myreverse (@) myreverse $a, $b, $c |
972 | sub myjoin ($@) myjoin ":", $a, $b, $c |
973 | sub mypop (\@) mypop @array |
974 | sub mysplice (\@$$@) mysplice @array, @array, 0, @pushme |
975 | sub mykeys (\%) mykeys %{$hashref} |
976 | sub myopen (*;$) myopen HANDLE, $name |
977 | sub mypipe (**) mypipe READHANDLE, WRITEHANDLE |
978 | sub mygrep (&@) mygrep { /foo/ } $a, $b, $c |
979 | sub myrand ($) myrand 42 |
980 | sub mytime () mytime |
cb1a09d0 |
981 | |
c07a80fd |
982 | Any backslashed prototype character represents an actual argument |
6e47f808 |
983 | that absolutely must start with that character. The value passed |
19799a22 |
984 | as part of C<@_> will be a reference to the actual argument given |
985 | in the subroutine call, obtained by applying C<\> to that argument. |
c07a80fd |
986 | |
5b794e05 |
987 | You can also backslash several argument types simultaneously by using |
988 | the C<\[]> notation: |
989 | |
990 | sub myref (\[$@%&*]) |
991 | |
992 | will allow calling myref() as |
993 | |
994 | myref $var |
995 | myref @array |
996 | myref %hash |
997 | myref &sub |
998 | myref *glob |
999 | |
1000 | and the first argument of myref() will be a reference to |
1001 | a scalar, an array, a hash, a code, or a glob. |
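A small sketch makes the reference visible. The function C<kind_of> is hypothetical and accepts only three of the five types, to keep the example short:

```perl
sub kind_of (\[$@%]) {
    return ref $_[0];    # $_[0] is a reference to the actual argument
}

my ($scalar, @array, %hash);
my @kinds = (kind_of($scalar), kind_of(@array), kind_of(%hash));
# @kinds is ('SCALAR', 'ARRAY', 'HASH')
```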
1002 | |
c07a80fd |
1003 | Unbackslashed prototype characters have special meanings. Any |
19799a22 |
1004 | unbackslashed C<@> or C<%> eats all remaining arguments, and forces |
f86cebdf |
1005 | list context. An argument represented by C<$> forces scalar context. An |
1006 | C<&> requires an anonymous subroutine, which, if passed as the first |
0df79f0c |
1007 | argument, does not require the C<sub> keyword or a subsequent comma. |
1008 | |
1009 | A C<*> allows the subroutine to accept a bareword, constant, scalar expression, |
648ca4f7 |
1010 | typeglob, or a reference to a typeglob in that slot. The value will be |
1011 | available to the subroutine either as a simple scalar, or (in the latter |
0df79f0c |
1012 | two cases) as a reference to the typeglob. If you wish to always convert |
1013 | such arguments to a typeglob reference, use Symbol::qualify_to_ref() as |
1014 | follows: |
1015 | |
1016 | use Symbol 'qualify_to_ref'; |
1017 | |
1018 | sub foo (*) { |
1019 | my $fh = qualify_to_ref(shift, caller); |
1020 | ... |
1021 | } |
c07a80fd |
1022 | |
1023 | A semicolon separates mandatory arguments from optional arguments. |
19799a22 |
1024 | It is redundant before C<@> or C<%>, which gobble up everything else. |
cb1a09d0 |
1025 | |
19799a22 |
1026 | Note how the last three examples in the table above are treated |
1027 | specially by the parser. C<mygrep()> is parsed as a true list |
1028 | operator, C<myrand()> is parsed as a true unary operator with unary |
1029 | precedence the same as C<rand()>, and C<mytime()> is truly without |
1030 | arguments, just like C<time()>. That is, if you say |
cb1a09d0 |
1031 | |
1032 | mytime +2; |
1033 | |
f86cebdf |
1034 | you'll get C<mytime() + 2>, not C<mytime(2)>, which is how it would be parsed |
19799a22 |
1035 | without a prototype. |
cb1a09d0 |
1036 | |
19799a22 |
1037 | The interesting thing about C<&> is that you can generate new syntax with it, |
1038 | provided it's in the initial position: |
cb1a09d0 |
1039 | |
6d28dffb |
1040 | sub try (&@) { |
cb1a09d0 |
1041 | my($try,$catch) = @_; |
1042 | eval { &$try }; |
1043 | if ($@) { |
1044 | local $_ = $@; |
1045 | &$catch; |
1046 | } |
1047 | } |
55497cff |
1048 | sub catch (&) { $_[0] } |
cb1a09d0 |
1049 | |
1050 | try { |
1051 | die "phooey"; |
1052 | } catch { |
1053 | /phooey/ and print "unphooey\n"; |
1054 | }; |
1055 | |
f86cebdf |
1056 | That prints C<"unphooey">. (Yes, there are still unresolved |
19799a22 |
1057 | issues having to do with visibility of C<@_>. I'm ignoring that |
f86cebdf |
1058 | question for the moment. (But note that if we make C<@_> lexically |
cb1a09d0 |
1059 | scoped, those anonymous subroutines can act like closures... (Gee, |
5f05dabc |
1060 | is this sounding a little Lispish? (Never mind.)))) |
cb1a09d0 |
1061 | |
19799a22 |
1062 | And here's a reimplementation of the Perl C<grep> operator: |
cb1a09d0 |
1063 | |
1064 | sub mygrep (&@) { |
1065 | my $code = shift; |
1066 | my @result; |
1067 | foreach $_ (@_) { |
6e47f808 |
1068 | push(@result, $_) if &$code; |
cb1a09d0 |
1069 | } |
1070 | @result; |
1071 | } |
a0d0e21e |
1072 | |
cb1a09d0 |
1073 | Some folks would prefer full alphanumeric prototypes. Alphanumerics have |
1074 | been intentionally left out of prototypes for the express purpose of |
1075 | someday in the future adding named, formal parameters. The current |
1076 | mechanism's main goal is to let module writers provide better diagnostics |
1077 | for module users. Larry feels the notation quite understandable to Perl |
1078 | programmers, and that it will not intrude greatly upon the meat of the |
1079 | module, nor make it harder to read. The line noise is visually |
1080 | encapsulated into a small pill that's easy to swallow. |
1081 | |
420cdfc1 |
1082 | If you try to use an alphanumeric sequence in a prototype you will |
1083 | generate an optional warning - "Illegal character in prototype...". |
1084 | Unfortunately earlier versions of Perl allowed the prototype to be |
1085 | used as long as its prefix was a valid prototype. The warning may be |
1086 | upgraded to a fatal error in a future version of Perl once the |
1087 | majority of offending code is fixed. |
1088 | |
cb1a09d0 |
1089 | It's probably best to prototype new functions, not retrofit prototyping |
1090 | into older ones. That's because you must be especially careful about |
1091 | silent impositions of differing list versus scalar contexts. For example, |
1092 | if you decide that a function should take just one parameter, like this: |
1093 | |
1094 | sub func ($) { |
1095 | my $n = shift; |
1096 | print "you gave me $n\n"; |
54310121 |
1097 | } |
cb1a09d0 |
1098 | |
1099 | and someone has been calling it with an array or expression |
1100 | returning a list: |
1101 | |
1102 | func(@foo); |
1103 | func( split /:/ ); |
1104 | |
19799a22 |
1105 | then you've just supplied an automatic C<scalar> in front of their |
f86cebdf |
1106 | argument, which can be more than a bit surprising. The old C<@foo> |
cb1a09d0 |
1107 | which used to hold one thing doesn't get passed in. Instead, |
19799a22 |
1108 | C<func()> now gets passed in a C<1>; that is, the number of elements |
1109 | in C<@foo>. And the C<split> gets called in scalar context so it |
1110 | starts scribbling on your C<@_> parameter list. Ouch! |
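The effect on an array argument can be reproduced in a few lines. This sketch uses a hypothetical C<show> function that returns its message instead of printing it:

```perl
sub show ($) { return "you gave me $_[0]" }

my @foo = ('only');
my $new_style = show(@foo);     # "you gave me 1": @foo in scalar context
my $old_style = &show(@foo);    # "you gave me only": prototype bypassed
```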
cb1a09d0 |
1111 | |
5f05dabc |
1112 | This is all very powerful, of course, and should be used only in moderation |
54310121 |
1113 | to make the world a better place. |
44a8e56a |
1114 | |
1115 | =head2 Constant Functions |
1116 | |
1117 | Functions with a prototype of C<()> are potential candidates for |
19799a22 |
1118 | inlining. If the result after optimization and constant folding |
1119 | is either a constant or a lexically-scoped scalar which has no other |
54310121 |
1120 | references, then it will be used in place of function calls made |
19799a22 |
1121 | without C<&>. Calls made using C<&> are never inlined. (See |
1122 | F<constant.pm> for an easy way to declare most constants.) |
44a8e56a |
1123 | |
5a964f20 |
1124 | The following functions would all be inlined: |
44a8e56a |
1125 | |
699e6cd4 |
1126 | sub pi () { 3.14159 } # Not exact, but close. |
1127 | sub PI () { 4 * atan2 1, 1 } # As good as it gets, |
1128 | # and it's inlined, too! |
44a8e56a |
1129 | sub ST_DEV () { 0 } |
1130 | sub ST_INO () { 1 } |
1131 | |
1132 | sub FLAG_FOO () { 1 << 8 } |
1133 | sub FLAG_BAR () { 1 << 9 } |
1134 | sub FLAG_MASK () { FLAG_FOO | FLAG_BAR } |
54310121 |
1135 | |
1136 | sub OPT_BAZ () { not (0x1B58 & FLAG_MASK) } |
88267271 |
1137 | |
1138 | sub N () { int(OPT_BAZ) / 3 } |
1139 | |
1140 | sub FOO_SET () { 1 if FLAG_MASK & FLAG_FOO } |
1141 | |
1142 | Be aware that these will not be inlined; as they contain inner scopes, |
1143 | the constant folding doesn't reduce them to a single constant: |
1144 | |
1145 | sub foo_set () { if (FLAG_MASK & FLAG_FOO) { 1 } } |
1146 | |
1147 | sub baz_val () { |
44a8e56a |
1148 | if (OPT_BAZ) { |
1149 | return 23; |
1150 | } |
1151 | else { |
1152 | return 42; |
1153 | } |
1154 | } |
cb1a09d0 |
1155 | |
5a964f20 |
1156 | If you redefine a subroutine that was eligible for inlining, you'll get |
4cee8e80 |
1157 | a mandatory warning. (You can use this warning to tell whether or not a |
1158 | particular subroutine is considered constant.) The warning is |
1159 | considered severe enough not to be optional because previously compiled |
1160 | invocations of the function will still be using the old value of the |
19799a22 |
1161 | function. If you need to be able to redefine the subroutine, you need to |
4cee8e80 |
1162 | ensure that it isn't inlined, either by dropping the C<()> prototype |
19799a22 |
1163 | (which changes calling semantics, so beware) or by thwarting the |
4cee8e80 |
1164 | inlining mechanism in some other way, such as |
1165 | |
4cee8e80 |
1166 | sub not_inlined () { |
54310121 |
1167 | 23 if $]; |
4cee8e80 |
1168 | } |
1169 | |
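For ordinary constants the C<constant> pragma mentioned above is usually the simplest route. The following sketch declares a few of the earlier values that way; the names are reused purely for illustration:

```perl
use constant PI        => 4 * atan2(1, 1);
use constant FLAG_FOO  => 1 << 8;
use constant FLAG_BAR  => 1 << 9;
use constant FLAG_MASK => FLAG_FOO | FLAG_BAR;

my $mask  = FLAG_MASK;            # 768
my $proto = prototype(\&PI);      # '': the empty prototype
```

Each C<use constant> takes effect at compile time, which is why C<FLAG_MASK> can be built from the two flags declared before it.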
19799a22 |
1170 | =head2 Overriding Built-in Functions |
a0d0e21e |
1171 | |
19799a22 |
1172 | Many built-in functions may be overridden, though this should be tried |
5f05dabc |
1173 | only occasionally and for good reason. Typically this might be |
19799a22 |
1174 | done by a package attempting to emulate missing built-in functionality |
a0d0e21e |
1175 | on a non-Unix system. |
1176 | |
163e3a99 |
1177 | Overriding may be done only by importing the name from a module at |
1178 | compile time--ordinary predeclaration isn't good enough. However, the |
19799a22 |
1179 | C<use subs> pragma lets you, in effect, predeclare subs |
1180 | via the import syntax, and these names may then override built-in ones: |
a0d0e21e |
1181 | |
1182 | use subs 'chdir', 'chroot', 'chmod', 'chown'; |
1183 | chdir $somewhere; |
1184 | sub chdir { ... } |
1185 | |
19799a22 |
1186 | To unambiguously refer to the built-in form, precede the |
1187 | built-in name with the special package qualifier C<CORE::>. For example, |
1188 | saying C<CORE::open()> always refers to the built-in C<open()>, even |
fb73857a |
1189 | if the current package has imported some other subroutine called |
19799a22 |
1190 | C<&open()> from elsewhere. Even though it looks like a regular |
09bef843 |
1191 | function call, it isn't: you can't take a reference to it, such as |
19799a22 |
1192 | the incorrect C<\&CORE::open> might appear to produce. |
fb73857a |
1193 | |
19799a22 |
1194 | Library modules should not in general export built-in names like C<open> |
1195 | or C<chdir> as part of their default C<@EXPORT> list, because these may |
a0d0e21e |
1196 | sneak into someone else's namespace and change the semantics unexpectedly. |
19799a22 |
1197 | Instead, if the module adds that name to C<@EXPORT_OK>, then it's |
a0d0e21e |
1198 | possible for a user to import the name explicitly, but not implicitly. |
1199 | That is, they could say |
1200 | |
1201 | use Module 'open'; |
1202 | |
19799a22 |
1203 | and it would import the C<open> override. But if they said |
a0d0e21e |
1204 | |
1205 | use Module; |
1206 | |
19799a22 |
1207 | they would get the default imports without overrides. |
a0d0e21e |
1208 | |
19799a22 |
1209 | The foregoing mechanism for overriding built-ins is restricted, quite |
95d94a4f |
1210 | deliberately, to the package that requests the import. There is a second |
19799a22 |
1211 | method that is sometimes applicable when you wish to override a built-in |
95d94a4f |
1212 | everywhere, without regard to namespace boundaries. This is achieved by |
1213 | importing a sub into the special namespace C<CORE::GLOBAL::>. Here is an |
1214 | example that quite brazenly replaces the C<glob> operator with something |
1215 | that understands regular expressions. |
1216 | |
1217 | package REGlob; |
1218 | require Exporter; |
1219 | @ISA = 'Exporter'; |
1220 | @EXPORT_OK = 'glob'; |
1221 | |
1222 | sub import { |
1223 | my $pkg = shift; |
1224 | return unless @_; |
1225 | my $sym = shift; |
1226 | my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0)); |
1227 | $pkg->export($where, $sym, @_); |
1228 | } |
1229 | |
1230 | sub glob { |
1231 | my $pat = shift; |
1232 | my @got; |
19799a22 |
1233 | local *D; |
1234 | if (opendir D, '.') { |
1235 | @got = grep /$pat/, readdir D; |
1236 | closedir D; |
1237 | } |
1238 | return @got; |
95d94a4f |
1239 | } |
1240 | 1; |
1241 | |
1242 | And here's how it could be (ab)used: |
1243 | |
1244 | #use REGlob 'GLOBAL_glob'; # override glob() in ALL namespaces |
1245 | package Foo; |
1246 | use REGlob 'glob'; # override glob() in Foo:: only |
1247 | print for <^[a-z_]+\.pm\$>; # show all pragmatic modules |
1248 | |
19799a22 |
1249 | The initial comment shows a contrived, even dangerous example. |
95d94a4f |
1250 | By overriding C<glob> globally, you would be forcing the new (and |
19799a22 |
1251 | subversive) behavior for the C<glob> operator for I<every> namespace, |
95d94a4f |
1252 | without the complete cognizance or cooperation of the modules that own |
1253 | those namespaces. Naturally, this should be done with extreme caution--if |
1254 | it must be done at all. |
1255 | |
1256 | The C<REGlob> example above does not implement all the support needed to |
19799a22 |
1257 | cleanly override Perl's C<glob> operator. The built-in C<glob> has |
95d94a4f |
1258 | different behaviors depending on whether it appears in a scalar or list |
19799a22 |
1259 | context, but our C<REGlob> doesn't. Indeed, many Perl built-ins have such |
95d94a4f |
1260 | context-sensitive behaviors, and these must be adequately supported by |
1261 | a properly written override. For a fully functional example of overriding |
1262 | C<glob>, study the implementation of C<File::DosGlob> in the standard |
1263 | library. |
1264 | |
77bc9082 |
1265 | When you override a built-in, your replacement should be consistent (if |
1266 | possible) with the built-in native syntax. You can achieve this by using |
1267 | a suitable prototype. To get the prototype of an overridable built-in, |
1268 | use the C<prototype> function with an argument of C<"CORE::builtin_name"> |
1269 | (see L<perlfunc/prototype>). |
1270 | |
1271 | Note however that some built-ins can't have their syntax expressed by a |
1272 | prototype (such as C<system> or C<chomp>). If you override them you won't |
1273 | be able to fully mimic their original syntax. |
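A quick way to check what you are dealing with is to query the prototypes directly. In this sketch the two built-ins are chosen only as examples:

```perl
my $rename_proto = prototype("CORE::rename");   # '$$'
my $system_proto = prototype("CORE::system");   # undef: not expressible
```

An undefined result, as for C<system>, means the built-in's syntax cannot be expressed as a prototype.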
1274 | |
fe854a6f |
1275 | The built-ins C<do>, C<require> and C<glob> can also be overridden, but due |
77bc9082 |
1276 | to special magic, their original syntax is preserved, and you don't have |
1277 | to define a prototype for their replacements. (You can't override the |
1278 | C<do BLOCK> syntax, though). |
1279 | |
1280 | C<require> has special additional dark magic: if you invoke your |
1281 | C<require> replacement as C<require Foo::Bar>, it will actually receive |
1282 | the argument C<"Foo/Bar.pm"> in C<@_>. See L<perlfunc/require>. |
1283 | |
1284 | And, as you'll have noticed from the previous example, if you override |
593b9c14 |
1285 | C<glob>, the C<< <*> >> glob operator is overridden as well. |
77bc9082 |
1286 | |
9b3023bc |
1287 | In a similar fashion, overriding the C<readline> function also overrides |
1288 | the equivalent I/O operator C<< <FILEHANDLE> >>. |
1289 | |
fe854a6f |
1290 | Finally, some built-ins (e.g. C<exists> or C<grep>) can't be overridden. |
77bc9082 |
1291 | |
a0d0e21e |
1292 | =head2 Autoloading |
1293 | |
19799a22 |
1294 | If you call a subroutine that is undefined, you would ordinarily |
1295 | get an immediate, fatal error complaining that the subroutine doesn't |
1296 | exist. (Likewise for subroutines being used as methods, when the |
1297 | method doesn't exist in any base class of the class's package.) |
1298 | However, if an C<AUTOLOAD> subroutine is defined in the package or |
1299 | packages used to locate the original subroutine, then that |
1300 | C<AUTOLOAD> subroutine is called with the arguments that would have |
1301 | been passed to the original subroutine. The fully qualified name |
1302 | of the original subroutine magically appears in the global $AUTOLOAD |
1303 | variable of the same package as the C<AUTOLOAD> routine. The name |
1304 | is not passed as an ordinary argument because, er, well, just |
593b9c14 |
1305 | because, that's why. (As an exception, a method call to a nonexistent |
1306 | C<import> or C<unimport> method is just skipped instead.) |
19799a22 |
1307 | |
1308 | Many C<AUTOLOAD> routines load in a definition for the requested |
1309 | subroutine using eval(), then execute that subroutine using a special |
1310 | form of goto() that erases the stack frame of the C<AUTOLOAD> routine |
1311 | without a trace. (See the source to the standard module documented |
1312 | in L<AutoLoader>, for example.) But an C<AUTOLOAD> routine can |
1313 | also just emulate the routine and never define it. For example, |
1314 | let's pretend that a function that wasn't defined should just invoke |
1315 | C<system> with those arguments. All you'd do is: |
cb1a09d0 |
1316 | |
1317 | sub AUTOLOAD { |
1318 | my $program = $AUTOLOAD; |
1319 | $program =~ s/.*:://; |
1320 | system($program, @_); |
54310121 |
1321 | } |
cb1a09d0 |
1322 | date(); |
6d28dffb |
1323 | who('am', 'i'); |
cb1a09d0 |
1324 | ls('-l'); |
1325 | |
19799a22 |
1326 | In fact, if you predeclare functions you want to call that way, you don't |
1327 | even need parentheses: |
cb1a09d0 |
1328 | |
1329 | use subs qw(date who ls); |
1330 | date; |
1331 | who "am", "i"; |
593b9c14 |
1332 | ls '-l'; |
cb1a09d0 |
1333 | |
1334 | A more complete example of this is the standard Shell module, which |
19799a22 |
1335 | can treat undefined subroutine calls as calls to external programs. |
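The other style, where C<AUTOLOAD> defines the missing routine and then jumps to it, might look roughly like this. The C<Counter> package and its C<%fields> table are hypothetical, and the definition is installed by typeglob assignment rather than by C<eval>:

```perl
{
    package Counter;            # hypothetical package for illustration
    our $AUTOLOAD;
    my %fields = (count => 0, step => 1);

    sub AUTOLOAD {
        my $name = $AUTOLOAD;
        $name =~ s/.*:://;
        return if $name eq 'DESTROY';
        die "No such function: $name\n" unless exists $fields{$name};

        # Install a real accessor so AUTOLOAD is not needed next time.
        no strict 'refs';
        *{$AUTOLOAD} = sub {
            $fields{$name} = shift if @_;
            return $fields{$name};
        };
        goto &{$AUTOLOAD};      # re-dispatch with the original @_
    }
}

Counter::count(5);                  # first call goes through AUTOLOAD
my $count = Counter::count();       # now calls the installed sub
```

The C<goto> form replaces the C<AUTOLOAD> frame with the newly installed subroutine, so the first call behaves like any later one.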
a0d0e21e |
1336 | |
19799a22 |
1337 | Mechanisms are available to help module writers split their modules |
1338 | into autoloadable files. See the standard AutoLoader module |
6d28dffb |
1339 | described in L<AutoLoader> and in L<AutoSplit>, the standard |
1340 | SelfLoader module in L<SelfLoader>, and the document on adding C |
19799a22 |
1341 | functions to Perl code in L<perlxs>. |
cb1a09d0 |
1342 | |
09bef843 |
1343 | =head2 Subroutine Attributes |
1344 | |
1345 | A subroutine declaration or definition may have a list of attributes |
1346 | associated with it. If such an attribute list is present, it is |
0120eecf |
1347 | broken up at space or colon boundaries and treated as though a |
09bef843 |
1348 | C<use attributes> had been seen. See L<attributes> for details |
1349 | about what attributes are currently supported. |
1350 | Unlike the limitation with the obsolescent C<use attrs>, the |
1351 | C<sub : ATTRLIST> syntax works to associate the attributes with |
1352 | a pre-declaration, and not just with a subroutine definition. |
1353 | |
1354 | The attributes must be valid as simple identifier names (without any |
1355 | punctuation other than the '_' character). They may have a parameter |
1356 | list appended, which is only checked for whether its parentheses ('(',')') |
1357 | nest properly. |
1358 | |
1359 | Examples of valid syntax (even though the attributes are unknown): |
1360 | |
0120eecf |
1361 | sub fnord (&\%) : switch(10,foo(7,3)) : expensive ; |
1362 | sub plugh () : Ugly('\(") :Bad ; |
09bef843 |
1363 | sub xyzzy : _5x5 { ... } |
1364 | |
1365 | Examples of invalid syntax: |
1366 | |
1367 | sub fnord : switch(10,foo() ; # ()-string not balanced |
1368 | sub snoid : Ugly('(') ; # ()-string not balanced |
1369 | sub xyzzy : 5x5 ; # "5x5" not a valid identifier |
1370 | sub plugh : Y2::north ; # "Y2::north" not a simple identifier |
0120eecf |
1371 | sub snurt : foo + bar ; # "+" not a colon or space |
09bef843 |
1372 | |
1373 | The attribute list is passed as a list of constant strings to the code |
1374 | which associates them with the subroutine. In particular, the second example |
1375 | of valid syntax above currently looks like this in terms of how it's |
1376 | parsed and invoked: |
1377 | |
1378 | use attributes __PACKAGE__, \&plugh, q[Ugly('\(")], 'Bad'; |
1379 | |
1380 | For further details on attribute lists and their manipulation, |
a0ae32d3 |
1381 | see L<attributes> and L<Attribute::Handlers>. |
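To inspect what has been attached to a subroutine, the C<attributes> module offers a C<get> function. The sketch below applies it to a made-up lvalue subroutine named C<total>:

```perl
use attributes ();

my $total = 0;
sub total : lvalue { $total }

my @attrs = attributes::get(\&total);   # includes 'lvalue'
```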
09bef843 |
1382 | |
cb1a09d0 |
1383 | =head1 SEE ALSO |
a0d0e21e |
1384 | |
19799a22 |
1385 | See L<perlref/"Function Templates"> for more about references and closures. |
1386 | See L<perlxs> if you'd like to learn about calling C subroutines from Perl. |
a2293a43 |
1387 | See L<perlembed> if you'd like to learn about calling Perl subroutines from C. |
19799a22 |
1388 | See L<perlmod> to learn about bundling up your functions in separate files. |
1389 | See L<perlmodlib> to learn what library modules come standard on your system. |
1390 | See L<perltoot> to learn how to make object method calls. |