Commit | Line | Data |
a0d0e21e |
1 | =head1 NAME |
2 | |
3 | perlsub - Perl subroutines |
4 | |
5 | =head1 SYNOPSIS |
6 | |
7 | To declare subroutines: |
8 | |
cb1a09d0 |
9 | sub NAME; # A "forward" declaration. |
10 | sub NAME(PROTO); # ditto, but with prototypes |
11 | |
12 | sub NAME BLOCK # A declaration and a definition. |
13 | sub NAME(PROTO) BLOCK # ditto, but with prototypes |
a0d0e21e |
14 | |
748a9306 |
15 | To define an anonymous subroutine at runtime: |
16 | |
17 | $subref = sub BLOCK; |
18 | |
a0d0e21e |
19 | To import subroutines: |
20 | |
21 | use PACKAGE qw(NAME1 NAME2 NAME3); |
22 | |
23 | To call subroutines: |
24 | |
a0d0e21e |
25 | NAME(LIST); # & is optional with parens. |
26 | NAME LIST; # Parens optional if predeclared/imported. |
cb1a09d0 |
27 | &NAME; # Passes current @_ to subroutine. |
a0d0e21e |
28 | |
29 | =head1 DESCRIPTION |
30 | |
cb1a09d0 |
31 | Like many languages, Perl provides for user-defined subroutines. These |
32 | may be located anywhere in the main program, loaded in from other files |
33 | via the C<do>, C<require>, or C<use> keywords, or even generated on the |
34 | fly using C<eval> or anonymous subroutines (closures). You can even call |
35 | a function indirectly using a variable containing its name or a CODE reference. |
36 | |
37 | The Perl model for function call and return values is simple: all |
38 | functions are passed as parameters one single flat list of scalars, and |
39 | all functions likewise return to their caller one single flat list of |
40 | scalars. Any arrays or hashes in these call and return lists will |
41 | collapse, losing their identities--but you may always use |
42 | pass-by-reference instead to avoid this. Both call and return lists may |
43 | contain as many or as few scalar elements as you'd like. (Often a |
44 | function without an explicit return statement is called a subroutine, but |
45 | there's really no difference from the language's perspective.) |
46 | |
47 | Any arguments passed to the routine come in as the array @_. Thus if you |
48 | called a function with two arguments, those would be stored in C<$_[0]> |
49 | and C<$_[1]>. The array @_ is a local array, but its values are implicit |
50 | references (predating L<perlref>) to the actual scalar parameters. The |
51 | return value of the subroutine is the value of the last expression |
52 | evaluated. Alternatively, a return statement may be used to specify the |
53 | returned value and exit the subroutine. If you return one or more arrays |
54 | and/or hashes, these will be flattened together into one large |
55 | indistinguishable list. |
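The flattening described above is easy to see for yourself. Here is a minimal sketch; the count_args() helper and the array names are made up for illustration and are not part of Perl itself:

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars actually arrived in @_.
sub count_args {
    return scalar @_;
}

my @first  = (1, 2, 3);
my @second = (4, 5);

# Both arrays collapse into one flat list of five scalars.
my $flat = count_args(@first, @second);

# Passing references keeps the arrays distinct: only two scalars arrive.
my $refs = count_args(\@first, \@second);

print "flat: $flat, by reference: $refs\n";
```

The function itself never changes; only the caller decides whether the arrays keep their identities.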
56 | |
57 | Perl does not have named formal parameters, but in practice all you do is |
58 | assign to a my() list of these. Any variables you use in the function |
59 | that aren't declared private are global variables. For the gory details |
60 | on creating private variables, see the sections below on L<"Private |
61 | Variables via my()"> and L</"Temporary Values via local()">. To create |
62 | protected environments for a set of functions in a separate package (and |
63 | probably a separate file), see L<perlmod/"Packages">. |
a0d0e21e |
64 | |
65 | Example: |
66 | |
cb1a09d0 |
67 | sub max { |
68 | my $max = shift(@_); |
a0d0e21e |
69 | foreach $foo (@_) { |
70 | $max = $foo if $max < $foo; |
71 | } |
cb1a09d0 |
72 | return $max; |
a0d0e21e |
73 | } |
cb1a09d0 |
74 | $bestday = max($mon,$tue,$wed,$thu,$fri); |
a0d0e21e |
75 | |
76 | Example: |
77 | |
78 | # get a line, combining continuation lines |
79 | # that start with whitespace |
80 | |
81 | sub get_line { |
cb1a09d0 |
82 | $thisline = $lookahead; # GLOBAL VARIABLES!! |
a0d0e21e |
83 | LINE: while ($lookahead = <STDIN>) { |
84 | if ($lookahead =~ /^[ \t]/) { |
85 | $thisline .= $lookahead; |
86 | } |
87 | else { |
88 | last LINE; |
89 | } |
90 | } |
91 | $thisline; |
92 | } |
93 | |
94 | $lookahead = <STDIN>; # get first line |
95 | while ($_ = get_line()) { |
96 | ... |
97 | } |
98 | |
99 | Use array assignment to a local list to name your formal arguments: |
100 | |
101 | sub maybeset { |
102 | my($key, $value) = @_; |
cb1a09d0 |
103 | $Foo{$key} = $value unless $Foo{$key}; |
a0d0e21e |
104 | } |
105 | |
cb1a09d0 |
106 | This also has the effect of turning call-by-reference into call-by-value, |
107 | since the assignment copies the values. Otherwise a function is free to |
108 | do in-place modifications of @_ and change its caller's values. |
109 | |
110 | upcase_in($v1, $v2); # this changes $v1 and $v2 |
111 | sub upcase_in { |
112 | for (@_) { tr/a-z/A-Z/ } |
113 | } |
114 | |
115 | You aren't allowed to modify constants in this way, of course. If an |
116 | argument were actually literal and you tried to change it, you'd take a |
117 | (presumably fatal) exception. For example, this won't work: |
118 | |
119 | upcase_in("frederick"); |
120 | |
121 | It would be much safer if the upcase_in() function |
122 | were written to return a copy of its parameters instead |
123 | of changing them in place: |
124 | |
125 | ($v3, $v4) = upcase($v1, $v2); # this doesn't |
126 | sub upcase { |
127 | my @parms = @_; |
128 | for (@parms) { tr/a-z/A-Z/ } |
129 | return @parms; |
130 | } |
131 | |
132 | Notice how this (unprototyped) function doesn't care whether it was passed |
133 | real scalars or arrays. Perl will see everything as one big long flat @_ |
134 | parameter list. This is one of the ways where Perl's simple |
135 | argument-passing style shines. The upcase() function would work perfectly |
136 | well without changing the upcase() definition even if we fed it things |
137 | like this: |
138 | |
139 | @newlist = upcase(@list1, @list2); |
140 | @newlist = upcase( split /:/, $var ); |
141 | |
142 | Do not, however, be tempted to do this: |
143 | |
144 | (@a, @b) = upcase(@list1, @list2); |
145 | |
146 | Like the flat incoming parameter list, the return list is also |
147 | flat. So all you have managed to do here is store everything in @a and |
148 | make @b an empty list. See L</"Pass by Reference"> for alternatives. |
149 | |
150 | A subroutine may be called using the "&" prefix. The "&" is optional in |
151 | Perl 5, and so are the parens if the subroutine has been predeclared. |
152 | (Note, however, that the "&" is I<NOT> optional when you're just naming |
153 | the subroutine, such as when it's used as an argument to defined() or |
154 | undef(). Nor is it optional when you want to do an indirect subroutine |
155 | call with a subroutine name or reference using the C<&$subref()> or |
156 | C<&{$subref}()> constructs. See L<perlref> for more on that.) |
a0d0e21e |
157 | |
158 | Subroutines may be called recursively. If a subroutine is called using |
cb1a09d0 |
159 | the "&" form, the argument list is optional, and if omitted, no @_ array is |
160 | set up for the subroutine: the @_ array at the time of the call is |
161 | visible to the subroutine instead. This is an efficiency mechanism that |
162 | new users may wish to avoid. |
a0d0e21e |
163 | |
164 | &foo(1,2,3); # pass three arguments |
165 | foo(1,2,3); # the same |
166 | |
167 | foo(); # pass a null list |
168 | &foo(); # the same |
a0d0e21e |
169 | |
cb1a09d0 |
170 | &foo; # foo() gets current args, like foo(@_) !! |
171 | foo; # like foo() IFF sub foo pre-declared, else "foo" |
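To see why the bare C<&foo;> form deserves its warning, here is a small sketch. The inner() and outer() functions and the @log array are hypothetical names used only for this illustration:

```perl
use strict;
use warnings;

my @log;   # hypothetical record of how many arguments inner() saw

sub inner {
    push @log, scalar @_;   # how many arguments are visible here
    shift @_ if @_;         # modify whatever @_ we were handed
}

sub outer {
    &inner;                     # no new @_: inner() works on outer()'s @_
    my $left_after_amp = @_;    # so outer() has lost an argument
    inner();                    # explicit empty list: a fresh, empty @_
    return $left_after_amp;
}

my $left = outer('a', 'b', 'c');
print "inner saw: @log; outer had $left args left\n";
```

The first call sees all three of outer()'s arguments and removes one of them behind outer()'s back; the second call sees none.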
172 | |
173 | =head2 Private Variables via my() |
174 | |
175 | Synopsis: |
176 | |
177 | my $foo; # declare $foo lexically local |
178 | my (@wid, %get); # declare list of variables local |
179 | my $foo = "flurp"; # declare $foo lexical, and init it |
180 | my @oof = @bar; # declare @oof lexical, and init it |
181 | |
182 | A "my" declares the listed variables to be confined (lexically) to the |
183 | enclosing block, subroutine, C<eval>, or C<do/require/use>'d file. If |
184 | more than one value is listed, the list must be placed in parens. All |
185 | listed elements must be legal lvalues. Only alphanumeric identifiers may |
186 | be lexically scoped--magical builtins like $/ must currently be localized with |
187 | "local" instead. |
188 | |
189 | Unlike dynamic variables created by the "local" statement, lexical |
190 | variables declared with "my" are totally hidden from the outside world, |
191 | including any called subroutines (even if it's the same subroutine called |
192 | from itself or elsewhere--every call gets its own copy). |
193 | |
194 | (An eval(), however, can see the lexical variables of the scope it is |
195 | being evaluated in so long as the names aren't hidden by declarations within |
196 | the eval() itself. See L<perlref>.) |
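A short sketch of that exception, assuming nothing beyond a string eval() in the same scope as the lexical (the variable names are arbitrary):

```perl
use strict;
use warnings;

my $x = 10;

# The string is compiled in the current scope, so it can see our $x.
my $seen = eval '$x + 1';

# A declaration inside the eval hides the outer $x for the rest of the eval.
my $hidden = eval 'my $x = 1; $x + 1';

print "seen: $seen, hidden: $hidden, outer x still: $x\n";
```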
197 | |
198 | The parameter list to my() may be assigned to if desired, which allows you |
199 | to initialize your variables. (If no initializer is given for a |
200 | particular variable, it is created with the undefined value.) Commonly |
201 | this is used to name the parameters to a subroutine. Examples: |
202 | |
203 | $arg = "fred"; # "global" variable |
204 | $n = cube_root(27); |
205 | print "$arg thinks the root is $n\n"; |
206 | fred thinks the root is 3 |
207 | |
208 | sub cube_root { |
209 | my $arg = shift; # name doesn't matter |
210 | $arg **= 1/3; |
211 | return $arg; |
212 | } |
213 | |
214 | The "my" is simply a modifier on something you might assign to. So when |
215 | you do assign to the variables in its argument list, the "my" doesn't |
216 | change whether those variables are viewed as a scalar or an array. So |
217 | |
218 | my ($foo) = <STDIN>; |
219 | my @FOO = <STDIN>; |
220 | |
221 | both supply a list context to the righthand side, while |
222 | |
223 | my $foo = <STDIN>; |
224 | |
225 | supplies a scalar context. But the following only declares one variable: |
748a9306 |
226 | |
cb1a09d0 |
227 | my $foo, $bar = 1; |
748a9306 |
228 | |
cb1a09d0 |
229 | That has the same effect as |
748a9306 |
230 | |
cb1a09d0 |
231 | my $foo; |
232 | $bar = 1; |
a0d0e21e |
233 | |
cb1a09d0 |
234 | The declared variable is not introduced (is not visible) until after |
235 | the current statement. Thus, |
236 | |
237 | my $x = $x; |
238 | |
239 | can be used to initialize the new $x with the value of the old $x, and |
240 | the expression |
241 | |
242 | my $x = 123 and $x == 123 |
243 | |
244 | is false unless the old $x happened to have the value 123. |
245 | |
246 | Some users may wish to encourage the use of lexically scoped variables. |
247 | As an aid to catching implicit references to package variables, |
248 | if you say |
249 | |
250 | use strict 'vars'; |
251 | |
252 | then any variable reference from there to the end of the enclosing |
253 | block must either refer to a lexical variable, or must be fully |
254 | qualified with the package name. A compilation error results |
255 | otherwise. An inner block may countermand this with S<"no strict 'vars'">. |
256 | |
257 | A my() has both a compile-time and a run-time effect. At compile time, |
258 | the compiler takes notice of it; the principal usefulness of this is to |
259 | quiet C<use strict 'vars'>. The actual initialization doesn't happen |
260 | until run time, so gets executed every time through a loop. |
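The run-time half of that is worth a quick illustration. In this sketch (variable names are arbitrary), the my() is executed afresh on every pass, so the counter never gets past one:

```perl
use strict;
use warnings;

my @seen;
for (1 .. 3) {
    my $count;          # re-created, undefined, on each pass
    $count++;           # so this always goes from undef to 1
    push @seen, $count;
}

print "@seen\n";
```

If you wanted the count to accumulate, you would declare $count once, outside the loop.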
261 | |
262 | Variables declared with "my" are not part of any package and are therefore |
263 | never fully qualified with the package name. In particular, you're not |
264 | allowed to try to make a package variable (or other global) lexical: |
265 | |
266 | my $pack::var; # ERROR! Illegal syntax |
267 | my $_; # also illegal (currently) |
268 | |
269 | In fact, dynamic variables (also known as package or global variables) |
270 | are still accessible using the fully qualified :: notation even while a |
271 | lexical of the same name is also visible: |
272 | |
273 | package main; |
274 | local $x = 10; |
275 | my $x = 20; |
276 | print "$x and $::x\n"; |
277 | |
278 | That will print out 20 and 10. |
279 | |
280 | You may declare "my" variables at the outermost scope of a file to |
281 | totally hide any such identifiers from the outside world. This is similar |
282 | to C's static variables at the file level. To do this with a subroutine |
283 | requires the use of a closure (anonymous function). If a block (such as |
284 | an eval(), function, or C<package>) wants to create a private subroutine |
285 | that cannot be called from outside that block, it can declare a lexical |
286 | variable containing an anonymous sub reference: |
287 | |
288 | my $secret_version = '1.001-beta'; |
289 | my $secret_sub = sub { print $secret_version }; |
290 | &$secret_sub(); |
291 | |
292 | As long as the reference is never returned by any function within the |
293 | module, no outside module can see the subroutine, since its name is not in |
294 | any package's symbol table. Remember that it's not I<REALLY> called |
295 | $some_pack::secret_version or anything; it's just $secret_version, |
296 | unqualified and unqualifiable. |
297 | |
298 | This does not work with object methods, however; all object methods have |
299 | to be in the symbol table of some package to be found. |
300 | |
301 | Just because the lexical variable is lexically (also called statically) |
302 | scoped doesn't mean that within a function it works like a C static. It |
303 | normally works more like a C auto. But here's a mechanism for giving a |
304 | function private variables with both lexical scoping and a static |
305 | lifetime. If you do want to create something like C's static variables, |
306 | just enclose the whole function in an extra block, and put the |
307 | static variable outside the function but in the block. |
308 | |
309 | { |
310 | my $secret_val = 0; |
311 | sub gimme_another { |
312 | return ++$secret_val; |
313 | } |
314 | } |
315 | # $secret_val now becomes unreachable by the outside |
316 | # world, but retains its value between calls to gimme_another |
317 | |
318 | If this function is being sourced in from a separate file |
319 | via C<require> or C<use>, then this is probably just fine. If it's |
320 | all in the main program, you'll need to arrange for the my() |
321 | to be executed early, either by putting the whole block above |
322 | your main program, or more likely, merely placing a BEGIN |
323 | sub around it to make sure it gets executed before your program |
324 | starts to run: |
325 | |
326 | sub BEGIN { |
327 | my $secret_val = 0; |
328 | sub gimme_another { |
329 | return ++$secret_val; |
330 | } |
331 | } |
332 | |
333 | See L<perlmod> about the BEGIN function. |
334 | |
335 | =head2 Temporary Values via local() |
336 | |
337 | B<NOTE>: In general, you should be using "my" instead of "local", because |
338 | it's faster and safer. Exceptions to this include the global punctuation |
339 | variables, filehandles and formats, and direct manipulation of the Perl |
340 | symbol table itself. Format variables often use "local" though, as do |
341 | other variables whose current value must be visible to called |
342 | subroutines. |
343 | |
344 | Synopsis: |
345 | |
346 | local $foo; # declare $foo dynamically local |
347 | local (@wid, %get); # declare list of variables local |
348 | local $foo = "flurp"; # declare $foo dynamic, and init it |
349 | local @oof = @bar; # declare @oof dynamic, and init it |
350 | |
351 | local *FH; # localize $FH, @FH, %FH, &FH ... |
352 | local *merlyn = *randal; # now $merlyn is really $randal, plus |
353 | # @merlyn is really @randal, etc |
354 | local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal |
355 | local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc |
356 | |
357 | A local() modifies its listed variables to be local to the enclosing |
358 | block (or subroutine, C<eval{}>, or C<do>) and I<to any subroutine called from |
359 | within that block>. A local() just gives temporary values to global |
360 | (meaning package) variables. This is known as dynamic scoping. Lexical |
361 | scoping is done with "my", which works more like C's auto declarations. |
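The difference between the two kinds of scoping shows up as soon as another subroutine gets called. Here is a minimal sketch; the function names and the $main::level variable are invented for the example, and the variable is written fully qualified so that it passes C<use strict 'vars'>:

```perl
use strict;
use warnings;

$main::level = 'global';          # a package variable

sub show { return $main::level }  # always reads the package variable

sub with_local {
    local $main::level = 'temporary';   # visible to show() until we return
    return show();
}

sub with_my {
    my $level = 'lexical';              # private to this block; show() can't see it
    return show();
}

my @results = (with_local(), with_my(), show());
print "@results\n";
```

Only the local() version changes what show() reports, and the old value is back as soon as with_local() returns.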
362 | |
363 | If more than one variable is given to local(), they must be placed in |
364 | parens. All listed elements must be legal lvalues. This operator works |
365 | by saving the current values of those variables in its argument list on a |
366 | hidden stack and restoring them upon exiting the block, subroutine or |
367 | eval. This means that called subroutines can also reference the local |
368 | variable, but not the global one. The argument list may be assigned to if |
369 | desired, which allows you to initialize your local variables. (If no |
370 | initializer is given for a particular variable, it is created with an |
371 | undefined value.) Commonly this is used to name the parameters to a |
372 | subroutine. Examples: |
373 | |
374 | for $i ( 0 .. 9 ) { |
375 | $digits{$i} = $i; |
376 | } |
377 | # assume this function uses global %digits hash |
378 | parse_num(); |
379 | |
380 | # now temporarily add to %digits hash |
381 | if ($base12) { |
382 | # (NOTE: not claiming this is efficient!) |
383 | local %digits = (%digits, 't' => 10, 'e' => 11); |
384 | parse_num(); # parse_num gets this new %digits! |
385 | } |
386 | # old %digits restored here |
387 | |
388 | Because local() is a run-time command, it gets executed every time |
389 | through a loop. In releases of Perl previous to 5.0, this used more stack |
390 | storage each time until the loop was exited. Perl now reclaims the space |
391 | each time through, but it's still more efficient to declare your variables |
392 | outside the loop. |
393 | |
394 | A local is simply a modifier on an lvalue expression. When you assign to |
395 | a localized variable, the local doesn't change whether its list is viewed |
396 | as a scalar or an array. So |
397 | |
398 | local($foo) = <STDIN>; |
399 | local @FOO = <STDIN>; |
400 | |
401 | both supply a list context to the righthand side, while |
402 | |
403 | local $foo = <STDIN>; |
404 | |
405 | supplies a scalar context. |
406 | |
407 | =head2 Passing Symbol Table Entries (typeglobs) |
408 | |
409 | [Note: The mechanism described in this section was originally the only |
410 | way to simulate pass-by-reference in older versions of Perl. While it |
411 | still works fine in modern versions, the new reference mechanism is |
412 | generally easier to work with. See below.] |
a0d0e21e |
413 | |
414 | Sometimes you don't want to pass the value of an array to a subroutine |
415 | but rather the name of it, so that the subroutine can modify the global |
416 | copy of it rather than working with a local copy. In perl you can |
cb1a09d0 |
417 | refer to all objects of a particular name by prefixing the name |
a0d0e21e |
418 | with a star: C<*foo>. This is often known as a "type glob", since the |
419 | star on the front can be thought of as a wildcard match for all the |
420 | funny prefix characters on variables and subroutines and such. |
421 | |
422 | When evaluated, the type glob produces a scalar value that represents |
423 | all the objects of that name, including any filehandle, format or |
424 | subroutine. When assigned to, it causes the name mentioned to refer to |
425 | whatever "*" value was assigned to it. Example: |
426 | |
427 | sub doubleary { |
428 | local(*someary) = @_; |
429 | foreach $elem (@someary) { |
430 | $elem *= 2; |
431 | } |
432 | } |
433 | doubleary(*foo); |
434 | doubleary(*bar); |
435 | |
436 | Note that scalars are already passed by reference, so you can modify |
437 | scalar arguments without using this mechanism by referring explicitly |
438 | to $_[0] etc. You can modify all the elements of an array by passing |
439 | all the elements as scalars, but you have to use the * mechanism (or |
440 | the equivalent reference mechanism) to push, pop or change the size of |
441 | an array. It will certainly be faster to pass the typeglob (or reference). |
442 | |
443 | Even if you don't want to modify an array, this mechanism is useful for |
444 | passing multiple arrays in a single LIST, since normally the LIST |
445 | mechanism will merge all the array values so that you can't extract out |
cb1a09d0 |
446 | the individual arrays. For more on typeglobs, see L<perldata/"Typeglobs">. |
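As a sketch of the size-changing case mentioned above, here is a function that pushes onto its caller's array through a typeglob. The names append_two(), @queue and @someary are hypothetical, and the C<our> declarations assume a Perl recent enough to have them (5.6 or later); typeglobs can only alias package variables, never my() variables:

```perl
use strict;
use warnings;

our (@queue, @someary);   # package variables, as typeglob aliasing requires

sub append_two {
    local (*someary) = @_;      # @someary now aliases the caller's array
    push @someary, 'x', 'y';    # grows the original, not a copy
}

@queue = ('a');
append_two(*queue);
print "@queue\n";
```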
447 | |
448 | =head2 Pass by Reference |
449 | |
450 | If you want to pass more than one array or hash into a function--or |
451 | return them from it--and have them maintain their integrity, |
452 | then you're going to have to use an explicit pass-by-reference. |
453 | Before you do that, you need to understand references; see L<perlref>. |
454 | |
455 | Here are a few simple examples. First, let's pass in several |
456 | arrays to a function and have it pop all of them, returning a new |
457 | list of all their former last elements: |
458 | |
459 | @tailings = popmany ( \@a, \@b, \@c, \@d ); |
460 | |
461 | sub popmany { |
462 | my $aref; |
463 | my @retlist = (); |
464 | foreach $aref ( @_ ) { |
465 | push @retlist, pop @$aref; |
466 | } |
467 | return @retlist; |
468 | } |
469 | |
470 | Here's how you might write a function that returns a |
471 | list of keys occurring in all the hashes passed to it: |
472 | |
473 | @common = inter( \%foo, \%bar, \%joe ); |
474 | sub inter { |
475 | my ($k, $href, %seen); # locals |
476 | foreach $href (@_) { |
477 | while ( $k = each %$href ) { |
478 | $seen{$k}++; |
479 | } |
480 | } |
481 | return grep { $seen{$_} == @_ } keys %seen; |
482 | } |
483 | |
484 | So far, we're just using the normal list return mechanism. |
485 | What happens if you want to pass or return a hash? Well, |
486 | if you're only using one of them, or you don't mind them |
487 | concatenating, then the normal calling convention is ok, although |
488 | a little expensive. |
489 | |
490 | Where people get into trouble is here: |
491 | |
492 | (@a, @b) = func(@c, @d); |
493 | or |
494 | (%a, %b) = func(%c, %d); |
495 | |
496 | That syntax simply won't work. It just sets @a or %a and clears @b or |
497 | %b. Plus, the function wasn't passed two separate arrays or |
498 | hashes: it got one long list in @_, as always. |
499 | |
500 | If you can arrange for everyone to deal with this through references, it's |
501 | cleaner code, although not so nice to look at. Here's a function that |
502 | takes two array references as arguments, returning the two array references |
503 | in order of how many elements the arrays have in them: |
504 | |
505 | ($aref, $bref) = func(\@c, \@d); |
506 | print "@$aref has more than @$bref\n"; |
507 | sub func { |
508 | my ($cref, $dref) = @_; |
509 | if (@$cref > @$dref) { |
510 | return ($cref, $dref); |
511 | } else { |
512 | return ($dref, $cref); |
513 | } |
514 | } |
515 | |
516 | It turns out that you can actually do this also: |
517 | |
518 | (*a, *b) = func(\@c, \@d); |
519 | print "@a has more than @b\n"; |
520 | sub func { |
521 | local (*c, *d) = @_; |
522 | if (@c > @d) { |
523 | return (\@c, \@d); |
524 | } else { |
525 | return (\@d, \@c); |
526 | } |
527 | } |
528 | |
529 | Here we're using the typeglobs to do symbol table aliasing. It's |
530 | a tad subtle, though, and also won't work if you're using my() |
531 | variables, since only globals (well, and local()s) are in the symbol table. |
532 | |
533 | If you're passing around filehandles, you could usually just use the bare |
534 | typeglob, like *STDOUT, but typeglob references would be better because |
535 | they'll still work properly under C<use strict 'refs'>. For example: |
536 | |
537 | splutter(\*STDOUT); |
538 | sub splutter { |
539 | my $fh = shift; |
540 | print $fh "her um well a hmmm\n"; |
541 | } |
542 | |
543 | $rec = get_rec(\*STDIN); |
544 | sub get_rec { |
545 | my $fh = shift; |
546 | return scalar <$fh>; |
547 | } |
548 | |
549 | If you're planning on generating new filehandles, you could do this: |
550 | |
551 | sub openit { |
552 | my $path = shift; |
553 | local *FH; |
554 | return open (FH, $path) ? \*FH : undef; |
555 | } |
556 | |
557 | Although that will actually produce a small memory leak. See the bottom |
558 | of L<perlfunc/open()> for a somewhat cleaner way using the FileHandle |
559 | functions supplied with the POSIX package. |
560 | |
561 | =head2 Prototypes |
562 | |
563 | As of the 5.002 release of perl, if you declare |
564 | |
565 | sub mypush (\@@) |
566 | |
567 | then mypush() takes arguments exactly like push() does. (This only works |
568 | for function calls that are visible at compile time, not indirect function |
569 | calls through a C<&$func> reference nor for method calls as described in |
570 | L<perlobj>.) |
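Only the declaration is shown above, so here is one way the body might look. This is a sketch under the assumption that mypush() should simply mimic push(); the variable names are arbitrary:

```perl
use strict;
use warnings;

# The \@ in the prototype hands us a reference to the first argument,
# so the caller writes a plain @array, just as with the builtin push().
sub mypush (\@@) {
    my ($aref, @new) = @_;
    push @$aref, @new;
    return scalar @$aref;    # like push(), return the new length
}

my @stack = (1, 2);
my $size  = mypush @stack, 3, 4;   # no backslash needed at the call site

print "@stack ($size elements)\n";
```

Note that the sub is defined before the call is compiled, which is what lets the prototype take effect.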
571 | |
572 | Here are the prototypes for some other functions that parse almost exactly |
573 | like the corresponding builtins. |
574 | |
575 | Declared as Called as |
576 | |
577 | sub mylink ($$) mylink $old, $new |
578 | sub myvec ($$$) myvec $var, $offset, 1 |
579 | sub myindex ($$;$) myindex &getstring, "substr" |
580 | sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off |
581 | sub myreverse (@) myreverse $a,$b,$c |
582 | sub myjoin ($@) myjoin ":",$a,$b,$c |
583 | sub mypop (\@) mypop @array |
584 | sub mysplice (\@$$@) mysplice @array,@array,0,@pushme |
585 | sub mykeys (\%) mykeys %{$hashref} |
586 | sub myopen (*;$) myopen HANDLE, $name |
587 | sub mypipe (**) mypipe READHANDLE, WRITEHANDLE |
588 | sub mygrep (&@) mygrep { /foo/ } $a,$b,$c |
589 | sub myrand ($) myrand 42 |
590 | sub mytime () mytime |
591 | |
592 | Any backslashed prototype character must be passed something starting |
593 | with that character. Any unbackslashed @ or % eats all the rest of the |
594 | arguments, and forces list context. An argument represented by $ |
595 | forces scalar context. An & requires an anonymous subroutine, and * |
596 | does whatever it has to do to turn the argument into a reference to a |
597 | symbol table entry. A semicolon separates mandatory arguments from |
598 | optional arguments. |
599 | |
600 | Note that the last three are syntactically distinguished by the lexer. |
601 | mygrep() is parsed as a true list operator, myrand() is parsed as a |
602 | true unary operator with unary precedence the same as rand(), and |
603 | mytime() is truly argumentless, just like time(). That is, if you |
604 | say |
605 | |
606 | mytime +2; |
607 | |
608 | you'll get mytime() + 2, not mytime(2), which is how it would be parsed |
609 | without the prototype. |
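The parsing difference can be checked directly. In this sketch mytime() is a made-up stand-in that returns a fixed number rather than the real time, and mytime_loose() is the same idea without a prototype:

```perl
use strict;
use warnings;

sub mytime ()    { return 1_000 }                        # takes no arguments, ever
sub mytime_loose { return @_ ? "got $_[0]" : 1_000 }     # ordinary list operator

my $sum = mytime +2;         # parsed as mytime() + 2
my $arg = mytime_loose +2;   # parsed as mytime_loose(+2)

print "$sum / $arg\n";
```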
610 | |
611 | The interesting thing about & is that you can generate new syntax with it: |
612 | |
613 | sub try (&$) { |
614 | my($try,$catch) = @_; |
615 | eval { &$try }; |
616 | if ($@) { |
617 | local $_ = $@; |
618 | &$catch; |
619 | } |
620 | } |
621 | sub catch (&) { @_ } |
622 | |
623 | try { |
624 | die "phooey"; |
625 | } catch { |
626 | /phooey/ and print "unphooey\n"; |
627 | }; |
628 | |
629 | That prints "unphooey". (Yes, there are still unresolved |
630 | issues having to do with the visibility of @_. I'm ignoring that |
631 | question for the moment. (But note that if we make @_ lexically |
632 | scoped, those anonymous subroutines can act like closures... (Gee, |
633 | is this sounding a little Lispish? (Nevermind.)))) |
634 | |
635 | And here's a reimplementation of grep: |
636 | |
637 | sub mygrep (&@) { |
638 | my $code = shift; |
639 | my @result; |
640 | foreach $_ (@_) { |
641 | push(@result, $_) if &$code; |
642 | } |
643 | @result; |
644 | } |
a0d0e21e |
645 | |
cb1a09d0 |
646 | Some folks would prefer full alphanumeric prototypes. Alphanumerics have |
647 | been intentionally left out of prototypes for the express purpose of |
648 | someday in the future adding named, formal parameters. The current |
649 | mechanism's main goal is to let module writers provide better diagnostics |
650 | for module users. Larry feels the notation quite understandable to Perl |
651 | programmers, and that it will not intrude greatly upon the meat of the |
652 | module, nor make it harder to read. The line noise is visually |
653 | encapsulated into a small pill that's easy to swallow. |
654 | |
655 | It's probably best to prototype new functions, not retrofit prototyping |
656 | into older ones. That's because you must be especially careful about |
657 | silent impositions of differing list versus scalar contexts. For example, |
658 | if you decide that a function should take just one parameter, like this: |
659 | |
660 | sub func ($) { |
661 | my $n = shift; |
662 | print "you gave me $n\n"; |
663 | } |
664 | |
665 | and someone has been calling it with an array or expression |
666 | returning a list: |
667 | |
668 | func(@foo); |
669 | func( split /:/ ); |
670 | |
671 | Then you've just supplied an automatic scalar() in front of their |
672 | argument, which can be more than a bit surprising. The old @foo |
673 | which used to hold one thing doesn't get passed in. Instead, |
674 | the func() now gets passed in 1, that is, the number of elements |
675 | in @foo. And the split() gets called in a scalar context and |
676 | starts scribbling on your @_ parameter list. |
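Here is a small sketch of that surprise, using two hypothetical functions that differ only in whether they carry the ($) prototype:

```perl
use strict;
use warnings;

sub one_arg ($) { return $_[0] }   # prototype imposes scalar context
sub any_args    { return $_[0] }   # no prototype: plain flat list

my @foo = ('apple');

my $with_proto = one_arg(@foo);    # really one_arg(scalar(@foo))
my $without    = any_args(@foo);   # first element of the list

print "with prototype: $with_proto, without: $without\n";
```

The call sites look identical, which is exactly why retrofitting a prototype onto an existing function is risky.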
677 | |
678 | This is all very powerful, of course, and should only be used in moderation |
679 | to make the world a better place. |
680 | |
681 | =head2 Overriding Builtin Functions |
a0d0e21e |
682 | |
683 | Many builtin functions may be overridden, though this should only be |
684 | tried occasionally and for good reason. Typically this might be |
685 | done by a package attempting to emulate missing builtin functionality |
686 | on a non-Unix system. |
687 | |
688 | Overriding may only be done by importing the name from a |
689 | module--ordinary predeclaration isn't good enough. However, the |
690 | C<subs> pragma (compiler directive) lets you, in effect, predeclare subs |
691 | via the import syntax, and these names may then override the builtin ones: |
692 | |
693 | use subs 'chdir', 'chroot', 'chmod', 'chown'; |
694 | chdir $somewhere; |
695 | sub chdir { ... } |
696 | |
697 | Library modules should not in general export builtin names like "open" |
698 | or "chdir" as part of their default @EXPORT list, since these may |
699 | sneak into someone else's namespace and change the semantics unexpectedly. |
700 | Instead, if the module adds the name to the @EXPORT_OK list, then it's |
701 | possible for a user to import the name explicitly, but not implicitly. |
702 | That is, they could say |
703 | |
704 | use Module 'open'; |
705 | |
706 | and it would import the open override, but if they said |
707 | |
708 | use Module; |
709 | |
710 | they would get the default imports without the overrides. |
711 | |
712 | =head2 Autoloading |
713 | |
714 | If you call a subroutine that is undefined, you would ordinarily get an |
715 | immediate fatal error complaining that the subroutine doesn't exist. |
716 | (Likewise for subroutines being used as methods, when the method |
717 | doesn't exist in any of the base classes of the class package.) If, |
718 | however, there is an C<AUTOLOAD> subroutine defined in the package or |
719 | packages that were searched for the original subroutine, then that |
720 | C<AUTOLOAD> subroutine is called with the arguments that would have been |
721 | passed to the original subroutine. The fully qualified name of the |
722 | original subroutine magically appears in the $AUTOLOAD variable in the |
723 | same package as the C<AUTOLOAD> routine. The name is not passed as an |
724 | ordinary argument because, er, well, just because, that's why... |
725 | |
726 | Most C<AUTOLOAD> routines will load in a definition for the subroutine in |
727 | question using eval, and then execute that subroutine using a special |
728 | form of "goto" that erases the stack frame of the C<AUTOLOAD> routine |
729 | without a trace. (See the standard C<AutoLoader> module, for example.) |
730 | But an C<AUTOLOAD> routine can also just emulate the routine and never |
cb1a09d0 |
731 | define it. For example, let's pretend that a function that wasn't defined |
732 | should just call system() with those arguments. All you'd do is this: |
733 | |
734 | sub AUTOLOAD { |
735 | my $program = $AUTOLOAD; |
736 | $program =~ s/.*:://; |
737 | system($program, @_); |
738 | } |
739 | date(); |
740 | who('am', 'i'); |
741 | ls('-l'); |
742 | |
743 | In fact, if you predeclare the functions you want to call that way, you don't |
744 | even need the parentheses: |
745 | |
746 | use subs qw(date who ls); |
747 | date; |
748 | who "am", "i"; |
749 | ls -l; |
750 | |
751 | A more complete example of this is the standard Shell module, which |
a0d0e21e |
752 | can treat undefined subroutine calls as calls to Unix programs. |
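The other style mentioned earlier, where AUTOLOAD defines the missing routine with eval and then jumps to it with the special form of goto, might be sketched like this. The Counter package, the $generated counter, and the body of the generated subroutine are all invented for the illustration; a real autoloader would fetch the definition from somewhere more useful:

```perl
use strict;
use warnings;

package Counter;          # hypothetical package

our $AUTOLOAD;
our $generated = 0;       # how many times AUTOLOAD had to build a sub

sub AUTOLOAD {
    my $full = $AUTOLOAD;                 # e.g. "Counter::hello"
    (my $name = $full) =~ s/.*:://;
    return if $name eq 'DESTROY';

    # Build a real definition on demand ...
    $generated++;
    eval "sub $full { return join('-', '$name', \@_) }";
    die $@ if $@;

    # ... then replace this stack frame with a call to it, keeping @_.
    no strict 'refs';
    goto &$full;
}

package main;

my $first  = Counter::hello(1, 2);   # triggers AUTOLOAD, then runs the new sub
my $second = Counter::hello(3);      # already defined: AUTOLOAD is not called

print "$first, $second (generated $Counter::generated time)\n";
```

After the first call the subroutine really exists, so later calls pay no autoloading cost.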
753 | |
cb1a09d0 |
754 | Mechanisms are available for module writers to help split the modules |
755 | up into autoloadable files. See the standard AutoLoader module described |
756 | in L<AutoLoader>, the standard SelfLoader module in L<SelfLoader>, and |
757 | the document on adding C functions to perl code in L<perlxs>. |
758 | |
759 | =head1 SEE ALSO |
a0d0e21e |
760 | |
cb1a09d0 |
761 | See L<perlref> for more on references. See L<perlxs> if you'd |
762 | like to learn about calling C subroutines from perl. See |
763 | L<perlmod> to learn about bundling up your functions in |
764 | separate files. |