3 perlsub - Perl subroutines
7 To declare subroutines:
9 sub NAME; # A "forward" declaration.
10 sub NAME(PROTO); # ditto, but with prototypes
12 sub NAME BLOCK # A declaration and a definition.
13 sub NAME(PROTO) BLOCK # ditto, but with prototypes
15 To define an anonymous subroutine at runtime:
19 To import subroutines:
21 use PACKAGE qw(NAME1 NAME2 NAME3);
25 NAME(LIST); # & is optional with parens.
26 NAME LIST; # Parens optional if predeclared/imported.
27 &NAME; # Passes current @_ to subroutine.
31 Like many languages, Perl provides for user-defined subroutines. These
32 may be located anywhere in the main program, loaded in from other files
33 via the C<do>, C<require>, or C<use> keywords, or even generated on the
34 fly using C<eval> or anonymous subroutines (closures). You can even call
35 a function indirectly using a variable containing its name or a CODE reference
36 to it, as in C<$var = \&function>.
38 The Perl model for function call and return values is simple: all
39 functions are passed as parameters one single flat list of scalars, and
40 all functions likewise return to their caller one single flat list of
41 scalars. Any arrays or hashes in these call and return lists will
42 collapse, losing their identities--but you may always use
43 pass-by-reference instead to avoid this. Both call and return lists may
44 contain as many or as few scalar elements as you'd like. (Often a
45 function without an explicit return statement is called a subroutine, but
46 there's really no difference from the language's perspective.)
48 Any arguments passed to the routine come in as the array @_. Thus if you
49 called a function with two arguments, those would be stored in C<$_[0]>
50 and C<$_[1]>. The array @_ is a local array, but its values are implicit
51 references (predating L<perlref>) to the actual scalar parameters. The
52 return value of the subroutine is the value of the last expression
53 evaluated. Alternatively, a return statement may be used to specify the
54 returned value and exit the subroutine. If you return one or more arrays
55 and/or hashes, these will be flattened together into one large
56 indistinguishable list.
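As a rough illustration of the two points above (flattening and the aliasing done by @_), here is a minimal sketch. The names count_args() and double_first() are made up for this example and are not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars arrived in @_.
sub count_args { return scalar @_ }

my @pets  = ('cat', 'dog');
my %ages  = (cat => 3);
my $count = count_args(@pets, %ages, 'extra');   # 2 + 2 + 1 scalars, flattened

# Hypothetical helper: changes its caller's variable through $_[0].
sub double_first { $_[0] *= 2; return }

my $n = 21;
double_first($n);                                # $n itself is modified

print "count=$count n=$n\n";
```

The array and the hash lose their identities on the way in, which is why count_args() cannot tell where one ends and the other begins.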
58 Perl does not have named formal parameters, but in practice all you do is
59 assign to a my() list of these. Any variables you use in the function
60 that aren't declared private are global variables. For the gory details
61 on creating private variables, see the sections below on L<"Private
62 Variables via my()"> and L</"Temporary Values via local()">. To create
63 protected environments for a set of functions in a separate package (and
64 probably a separate file), see L<perlmod/"Packages">.
71 $max = $foo if $max < $foo;
75 $bestday = max($mon,$tue,$wed,$thu,$fri);
79 # get a line, combining continuation lines
80 # that start with whitespace
83 $thisline = $lookahead; # GLOBAL VARIABLES!!
84 LINE: while ($lookahead = <STDIN>) {
85 if ($lookahead =~ /^[ \t]/) {
86 $thisline .= $lookahead;
95 $lookahead = <STDIN>; # get first line
96 while ($_ = get_line()) {
100 Use array assignment to a local list to name your formal arguments:
103 my($key, $value) = @_;
104 $Foo{$key} = $value unless $Foo{$key};
107 This also has the effect of turning call-by-reference into call-by-value,
108 since the assignment copies the values. Otherwise a function is free to
do in-place modifications of @_ and change its caller's values.
111 upcase_in($v1, $v2); # this changes $v1 and $v2
113 for (@_) { tr/a-z/A-Z/ }
116 You aren't allowed to modify constants in this way, of course. If an
117 argument were actually literal and you tried to change it, you'd take a
118 (presumably fatal) exception. For example, this won't work:
120 upcase_in("frederick");
122 It would be much safer if the upcase_in() function
123 were written to return a copy of its parameters instead
124 of changing them in place:
126 ($v3, $v4) = upcase($v1, $v2); # this doesn't
129 for (@parms) { tr/a-z/A-Z/ }
130 # wantarray checks if we were called in list context
131 return wantarray ? @parms : $parms[0];
134 Notice how this (unprototyped) function doesn't care whether it was passed
135 real scalars or arrays. Perl will see everything as one big long flat @_
136 parameter list. This is one of the ways where Perl's simple
137 argument-passing style shines. The upcase() function would work perfectly
138 well without changing the upcase() definition even if we fed it things
141 @newlist = upcase(@list1, @list2);
142 @newlist = upcase( split /:/, $var );
144 Do not, however, be tempted to do this:
146 (@a, @b) = upcase(@list1, @list2);
Like its flat incoming parameter list, the return list is also flat, so
all you have managed to do here is store everything in @a and make @b an
empty list. See L</"Pass by Reference"> for alternatives.
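To make the flat return list concrete, here is a small sketch contrasting it with returning references. Both function names are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: one returns two lists, the other two array references.
sub both_flat { return (@_[0, 1], @_[2, 3]) }
sub both_refs { return ([ @_[0, 1] ], [ @_[2, 3] ]) }

my (@first, @second) = both_flat(1, 2, 3, 4);   # @first takes everything
my ($left, $right)   = both_refs(1, 2, 3, 4);   # two separate arrays survive

print scalar(@first), " and ", scalar(@second), "\n";
print "@$left | @$right\n";
```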
152 A subroutine may be called using the "&" prefix. The "&" is optional in
153 Perl 5, and so are the parens if the subroutine has been predeclared.
154 (Note, however, that the "&" is I<NOT> optional when you're just naming
155 the subroutine, such as when it's used as an argument to defined() or
156 undef(). Nor is it optional when you want to do an indirect subroutine
157 call with a subroutine name or reference using the C<&$subref()> or
158 C<&{$subref}()> constructs. See L<perlref> for more on that.)
160 Subroutines may be called recursively. If a subroutine is called using
161 the "&" form, the argument list is optional, and if omitted, no @_ array is
162 set up for the subroutine: the @_ array at the time of the call is
visible to the subroutine instead. This is an efficiency mechanism that
164 new users may wish to avoid.
166 &foo(1,2,3); # pass three arguments
167 foo(1,2,3); # the same
169 foo(); # pass a null list
    &foo;               # foo() gets current args, like foo(@_) !!
173 foo; # like foo() IFF sub foo pre-declared, else "foo"
175 Not only does the "&" form make the argument list optional, but it also
176 disables any prototype checking on the arguments you do provide. This
177 is partly for historical reasons, and partly for having a convenient way
178 to cheat if you know what you're doing. See the section on Prototypes below.
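The difference between the call forms is easiest to see from inside another subroutine. This is only a sketch, and show_args() and wrapper() are hypothetical names.

```perl
use strict;
use warnings;

sub show_args { return join ',', @_ }

sub wrapper {
    my $shared = &show_args;      # no parens: callee sees wrapper's own @_
    my $empty  = &show_args();    # explicit empty argument list
    my $copied = show_args(@_);   # new @_ holding the same values
    return ($shared, $empty, $copied);
}

my @seen = wrapper('x', 'y');
print join(' / ', @seen), "\n";
```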
=head2 Private Variables via my()
184 my $foo; # declare $foo lexically local
185 my (@wid, %get); # declare list of variables local
186 my $foo = "flurp"; # declare $foo lexical, and init it
187 my @oof = @bar; # declare @oof lexical, and init it
189 A "my" declares the listed variables to be confined (lexically) to the
190 enclosing block, subroutine, C<eval>, or C<do/require/use>'d file. If
191 more than one value is listed, the list must be placed in parens. All
192 listed elements must be legal lvalues. Only alphanumeric identifiers may
193 be lexically scoped--magical builtins like $/ must currently be localized with
196 Unlike dynamic variables created by the "local" statement, lexical
197 variables declared with "my" are totally hidden from the outside world,
198 including any called subroutines (even if it's the same subroutine called
199 from itself or elsewhere--every call gets its own copy).
201 (An eval(), however, can see the lexical variables of the scope it is
202 being evaluated in so long as the names aren't hidden by declarations within
203 the eval() itself. See L<perlref>.)
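A short sketch of that visibility rule, using string evals so that the code is compiled inside the scope of the lexical. The variable names are arbitrary.

```perl
use strict;
use warnings;

my $greeting = 'hello';

# The eval'd code can see the enclosing lexical ...
my $seen   = eval q{ $greeting . ' world' };

# ... unless a declaration inside the eval hides it.
my $hidden = eval q{ my $greeting = 'bye'; $greeting };

print "$seen, $hidden\n";
```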
205 The parameter list to my() may be assigned to if desired, which allows you
206 to initialize your variables. (If no initializer is given for a
207 particular variable, it is created with the undefined value.) Commonly
208 this is used to name the parameters to a subroutine. Examples:
210 $arg = "fred"; # "global" variable
212 print "$arg thinks the root is $n\n";
213 fred thinks the root is 3
216 my $arg = shift; # name doesn't matter
221 The "my" is simply a modifier on something you might assign to. So when
222 you do assign to the variables in its argument list, the "my" doesn't
change whether those variables are viewed as a scalar or an array. So
228 both supply a list context to the righthand side, while
232 supplies a scalar context. But the following only declares one variable:
236 That has the same effect as
241 The declared variable is not introduced (is not visible) until after
242 the current statement. Thus,
246 can be used to initialize the new $x with the value of the old $x, and
249 my $x = 123 and $x == 123
251 is false unless the old $x happened to have the value 123.
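Here is a small sketch of that rule with two lexicals of the same name, one in an outer scope and one in an inner block. The @log array exists only to record what each scope saw.

```perl
use strict;
use warnings;

my $x = 10;
my @log;
{
    my $x = $x + 1;    # the right-hand $x is still the outer one
    push @log, $x;     # the new, inner $x
}
push @log, $x;         # the outer $x was never touched

print "@log\n";
```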
253 Some users may wish to encourage the use of lexically scoped variables.
254 As an aid to catching implicit references to package variables,
259 then any variable reference from there to the end of the enclosing
260 block must either refer to a lexical variable, or must be fully
261 qualified with the package name. A compilation error results
262 otherwise. An inner block may countermand this with S<"no strict 'vars'">.
264 A my() has both a compile-time and a run-time effect. At compile time,
the compiler takes notice of it; the principal usefulness of this is to
266 quiet C<use strict 'vars'>. The actual initialization doesn't happen
267 until run time, so gets executed every time through a loop.
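The run-time half is what gives you a fresh variable on each pass through a loop. A minimal sketch, with invented variable names:

```perl
use strict;
use warnings;

my @sizes;
for my $round (1 .. 3) {
    my @batch;                    # created empty again on every iteration
    push @batch, $round;
    push @sizes, scalar @batch;   # never grows past one element
}

print "@sizes\n";
```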
269 Variables declared with "my" are not part of any package and are therefore
270 never fully qualified with the package name. In particular, you're not
271 allowed to try to make a package variable (or other global) lexical:
273 my $pack::var; # ERROR! Illegal syntax
274 my $_; # also illegal (currently)
In fact, a dynamic variable (also known as a package or global variable)
is still accessible using the fully qualified :: notation even while a
278 lexical of the same name is also visible:
283 print "$x and $::x\n";
285 That will print out 20 and 10.
You may declare "my" variables at the outermost scope of a file to
totally hide any such identifiers from the outside world. This is similar
to C's static variables at the file level. To do this with a subroutine
290 requires the use of a closure (anonymous function). If a block (such as
291 an eval(), function, or C<package>) wants to create a private subroutine
292 that cannot be called from outside that block, it can declare a lexical
293 variable containing an anonymous sub reference:
295 my $secret_version = '1.001-beta';
296 my $secret_sub = sub { print $secret_version };
299 As long as the reference is never returned by any function within the
300 module, no outside module can see the subroutine, since its name is not in
301 any package's symbol table. Remember that it's not I<REALLY> called
302 $some_pack::secret_version or anything; it's just $secret_version,
303 unqualified and unqualifiable.
305 This does not work with object methods, however; all object methods have
306 to be in the symbol table of some package to be found.
308 Just because the lexical variable is lexically (also called statically)
309 scoped doesn't mean that within a function it works like a C static. It
310 normally works more like a C auto. But here's a mechanism for giving a
311 function private variables with both lexical scoping and a static
312 lifetime. If you do want to create something like C's static variables,
313 just enclose the whole function in an extra block, and put the
314 static variable outside the function but in the block.
319 return ++$secret_val;
322 # $secret_val now becomes unreachable by the outside
323 # world, but retains its value between calls to gimme_another
325 If this function is being sourced in from a separate file
326 via C<require> or C<use>, then this is probably just fine. If it's
327 all in the main program, you'll need to arrange for the my()
328 to be executed early, either by putting the whole block above
your main program, or more likely, merely placing a BEGIN
330 sub around it to make sure it gets executed before your program
336 return ++$secret_val;
340 See L<perlrun> about the BEGIN function.
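As a sketch of why the BEGIN wrapper matters, the following calls the function before its definition appears in the file. The names next_id() and $last_id are made up for the illustration.

```perl
use strict;
use warnings;

# The calls come first in the file, yet the counter is already initialized.
my @got = (next_id(), next_id(), next_id());

BEGIN {
    my $last_id = 100;                   # private, set up at compile time
    sub next_id { return ++$last_id }    # the only code that can reach it
}

print "@got\n";
```

Without the BEGIN, the assignment to $last_id would not run until execution reached the block, which here is after the calls.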
=head2 Temporary Values via local()
344 B<NOTE>: In general, you should be using "my" instead of "local", because
it's faster and safer. Exceptions to this include the global punctuation
346 variables, filehandles and formats, and direct manipulation of the Perl
347 symbol table itself. Format variables often use "local" though, as do
348 other variables whose current value must be visible to called
353 local $foo; # declare $foo dynamically local
354 local (@wid, %get); # declare list of variables local
355 local $foo = "flurp"; # declare $foo dynamic, and init it
356 local @oof = @bar; # declare @oof dynamic, and init it
358 local *FH; # localize $FH, @FH, %FH, &FH ...
359 local *merlyn = *randal; # now $merlyn is really $randal, plus
360 # @merlyn is really @randal, etc
361 local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal
362 local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc
A local() modifies its listed variables to be local to the enclosing
block (or subroutine, C<eval{}>, or C<do>) and to I<any subroutine called
from within that block>. A local() just gives temporary values to global
367 (meaning package) variables. This is known as dynamic scoping. Lexical
368 scoping is done with "my", which works more like C's auto declarations.
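The contrast between the two kinds of scoping shows up as soon as another subroutine is called. This sketch assumes a perl recent enough to have C<our>; all the names in it are hypothetical.

```perl
use strict;
use warnings;

our $level = 'global';
sub report { return $level }     # always reads the package variable

# local() changes what called subroutines see; my() does not.
sub with_local { local $level = 'temporary'; return report() }
sub with_my    { my    $level = 'private';   return report() }

my @results = (with_local(), with_my(), report());
print "@results\n";
```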
370 If more than one variable is given to local(), they must be placed in
371 parens. All listed elements must be legal lvalues. This operator works
372 by saving the current values of those variables in its argument list on a
373 hidden stack and restoring them upon exiting the block, subroutine or
374 eval. This means that called subroutines can also reference the local
375 variable, but not the global one. The argument list may be assigned to if
376 desired, which allows you to initialize your local variables. (If no
377 initializer is given for a particular variable, it is created with an
378 undefined value.) Commonly this is used to name the parameters to a
379 subroutine. Examples:
384 # assume this function uses global %digits hash
387 # now temporarily add to %digits hash
389 # (NOTE: not claiming this is efficient!)
390 local %digits = (%digits, 't' => 10, 'e' => 11);
391 parse_num(); # parse_num gets this new %digits!
393 # old %digits restored here
Because local() is a run-time command, it gets executed every time
396 through a loop. In releases of Perl previous to 5.0, this used more stack
397 storage each time until the loop was exited. Perl now reclaims the space
398 each time through, but it's still more efficient to declare your variables
401 A local is simply a modifier on an lvalue expression. When you assign to
402 a localized variable, the local doesn't change whether its list is viewed
403 as a scalar or an array. So
405 local($foo) = <STDIN>;
406 local @FOO = <STDIN>;
408 both supply a list context to the righthand side, while
410 local $foo = <STDIN>;
412 supplies a scalar context.
=head2 Passing Symbol Table Entries (typeglobs)
416 [Note: The mechanism described in this section was originally the only
417 way to simulate pass-by-reference in older versions of Perl. While it
418 still works fine in modern versions, the new reference mechanism is
419 generally easier to work with. See below.]
421 Sometimes you don't want to pass the value of an array to a subroutine
422 but rather the name of it, so that the subroutine can modify the global
423 copy of it rather than working with a local copy. In perl you can
424 refer to all objects of a particular name by prefixing the name
425 with a star: C<*foo>. This is often known as a "type glob", since the
426 star on the front can be thought of as a wildcard match for all the
427 funny prefix characters on variables and subroutines and such.
429 When evaluated, the type glob produces a scalar value that represents
430 all the objects of that name, including any filehandle, format or
431 subroutine. When assigned to, it causes the name mentioned to refer to
432 whatever "*" value was assigned to it. Example:
435 local(*someary) = @_;
436 foreach $elem (@someary) {
443 Note that scalars are already passed by reference, so you can modify
444 scalar arguments without using this mechanism by referring explicitly
445 to $_[0] etc. You can modify all the elements of an array by passing
446 all the elements as scalars, but you have to use the * mechanism (or
447 the equivalent reference mechanism) to push, pop or change the size of
448 an array. It will certainly be faster to pass the typeglob (or reference).
450 Even if you don't want to modify an array, this mechanism is useful for
451 passing multiple arrays in a single LIST, since normally the LIST
452 mechanism will merge all the array values so that you can't extract out
453 the individual arrays. For more on typeglobs, see L<perldata/"Typeglobs">.
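Here is a minimal sketch of changing the size of a caller's array through a typeglob. The names are invented, and the arrays are declared with C<our> (an assumption about your perl version) because typeglobs only know about package variables.

```perl
use strict;
use warnings;

our (@queue, @ary);        # package variables, visible through typeglobs

sub append_twice {
    local *ary = shift;    # @ary is now an alias for the caller's array
    push @ary, 'x', 'x';   # so this changes the caller's array
    return;
}

@queue = (1, 2);
append_twice(*queue);
print "@queue\n";
```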
=head2 Pass by Reference
457 If you want to pass more than one array or hash into a function--or
458 return them from it--and have them maintain their integrity,
459 then you're going to have to use an explicit pass-by-reference.
460 Before you do that, you need to understand references as detailed in L<perlref>.
461 This section may not make much sense to you otherwise.
463 Here are a few simple examples. First, let's pass in several
arrays to a function and have it pop all of them, returning a new
465 list of all their former last elements:
467 @tailings = popmany ( \@a, \@b, \@c, \@d );
472 foreach $aref ( @_ ) {
473 push @retlist, pop @$aref;
478 Here's how you might write a function that returns a
479 list of keys occurring in all the hashes passed to it:
481 @common = inter( \%foo, \%bar, \%joe );
483 my ($k, $href, %seen); # locals
    while ( defined($k = each %$href) ) {   # defined(): a key may be "0" or ""
489 return grep { $seen{$_} == @_ } keys %seen;
492 So far, we're just using the normal list return mechanism.
493 What happens if you want to pass or return a hash? Well,
494 if you're only using one of them, or you don't mind them
495 concatenating, then the normal calling convention is ok, although
498 Where people get into trouble is here:
500 (@a, @b) = func(@c, @d);
502 (%a, %b) = func(%c, %d);
That syntax simply won't work. It just sets @a or %a and clears @b or
%b. Plus, the function wasn't passed two separate arrays or hashes:
it got one long list in @_, as always.
508 If you can arrange for everyone to deal with this through references, it's
509 cleaner code, although not so nice to look at. Here's a function that
takes two array references as arguments, returning the two references
ordered by how many elements their arrays hold, larger first:
513 ($aref, $bref) = func(\@c, \@d);
514 print "@$aref has more than @$bref\n";
516 my ($cref, $dref) = @_;
517 if (@$cref > @$dref) {
518 return ($cref, $dref);
520 return ($dref, $cref);
524 It turns out that you can actually do this also:
526 (*a, *b) = func(\@c, \@d);
527 print "@a has more than @b\n";
537 Here we're using the typeglobs to do symbol table aliasing. It's
538 a tad subtle, though, and also won't work if you're using my()
539 variables, since only globals (well, and local()s) are in the symbol table.
541 If you're passing around filehandles, you could usually just use the bare
typeglob, like *STDOUT, but typeglob references would be better because
543 they'll still work properly under C<use strict 'refs'>. For example:
548 print $fh "her um well a hmmm\n";
551 $rec = get_rec(\*STDIN);
557 If you're planning on generating new filehandles, you could do this:
562 return open (FH, $path) ? \*FH : undef;
565 Although that will actually produce a small memory leak. See the bottom
566 of L<perlfunc/open()> for a somewhat cleaner way using the FileHandle
567 functions supplied with the POSIX package.
571 As of the 5.002 release of perl, if you declare
575 then mypush() takes arguments exactly like push() does. The declaration
576 of the function to be called must be visible at compile time. The prototype
577 only affects the interpretation of new-style calls to the function, where
578 new-style is defined as not using the C<&> character. In other words,
579 if you call it like a builtin function, then it behaves like a builtin
580 function. If you call it like an old-fashioned subroutine, then it
581 behaves like an old-fashioned subroutine. It naturally falls out from
582 this rule that prototypes have no influence on subroutine references
583 like C<\&foo> or on indirect subroutine calls like C<&{$subref}>.
585 Method calls are not influenced by prototypes either, because the
586 function to be called is indeterminate at compile time, since it depends
589 Since the intent is primarily to let you define subroutines that work
590 like builtin commands, here are the prototypes for some other functions
591 that parse almost exactly like the corresponding builtins.
593 Declared as Called as
595 sub mylink ($$) mylink $old, $new
596 sub myvec ($$$) myvec $var, $offset, 1
597 sub myindex ($$;$) myindex &getstring, "substr"
598 sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off
599 sub myreverse (@) myreverse $a,$b,$c
600 sub myjoin ($@) myjoin ":",$a,$b,$c
601 sub mypop (\@) mypop @array
602 sub mysplice (\@$$@) mysplice @array,@array,0,@pushme
603 sub mykeys (\%) mykeys %{$hashref}
604 sub myopen (*;$) myopen HANDLE, $name
605 sub mypipe (**) mypipe READHANDLE, WRITEHANDLE
606 sub mygrep (&@) mygrep { /foo/ } $a,$b,$c
607 sub myrand ($) myrand 42
610 Any backslashed prototype character represents an actual argument
611 that absolutely must start with that character.
613 Unbackslashed prototype characters have special meanings. Any
614 unbackslashed @ or % eats all the rest of the arguments, and forces
615 list context. An argument represented by $ forces scalar context. An
616 & requires an anonymous subroutine, which, if passed as the first
617 argument, does not require the "sub" keyword or a subsequent comma. A
618 * does whatever it has to do to turn the argument into a reference to a
621 A semicolon separates mandatory arguments from optional arguments.
622 (It is redundant before @ or %.)
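To see a backslashed character and the semicolon at work, consider this sketch. The functions push_all() and pad() are hypothetical, and both are defined before the calls so that the prototypes are visible at compile time.

```perl
use strict;
use warnings;

sub push_all (\@@) {
    my ($aref, @items) = @_;      # the array arrives as a reference
    push @$aref, @items;
    return scalar @$aref;
}

sub pad ($;$) {
    my ($text, $width) = @_;      # $width is the optional argument
    $width = 6 unless defined $width;
    return sprintf '%-*s|', $width, $text;
}

my @stack = (1);
my $size  = push_all @stack, 2, 3;   # no backslash needed at the call site
my $short = pad 'ab';
my $wide  = pad 'ab', 4;
print "$size [$short] [$wide]\n";
```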
624 Note how the last three examples above are treated specially by the parser.
625 mygrep() is parsed as a true list operator, myrand() is parsed as a
626 true unary operator with unary precedence the same as rand(), and
627 mytime() is truly argumentless, just like time(). That is, if you
632 you'll get mytime() + 2, not mytime(2), which is how it would be parsed
633 without the prototype.
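A sketch of the same effect with two made-up functions, one with an empty prototype and one with none. The second adds the number of arguments it received so that the difference in parsing is visible in the result.

```perl
use strict;
use warnings;

sub fixed_time () { return 40 }        # empty prototype: takes no arguments
sub loose_time    { return 40 + @_ }   # no prototype: a plain list operator

my $with_proto    = fixed_time +2;     # parsed as fixed_time() + 2
my $without_proto = loose_time +2;     # parsed as loose_time(+2)

print "$with_proto $without_proto\n";
```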
635 The interesting thing about & is that you can generate new syntax with it:
638 my($try,$catch) = @_;
650 /phooey/ and print "unphooey\n";
653 That prints "unphooey". (Yes, there are still unresolved
654 issues having to do with the visibility of @_. I'm ignoring that
655 question for the moment. (But note that if we make @_ lexically
656 scoped, those anonymous subroutines can act like closures... (Gee,
657 is this sounding a little Lispish? (Nevermind.))))
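For readers who want something to run, here is a self-contained sketch in the same spirit. It uses the invented names attempt() and recover() rather than try and catch, since newer perls can treat those two words as keywords when the corresponding feature is enabled.

```perl
use strict;
use warnings;

sub attempt (&@) {
    my ($body, $handler) = @_;
    my $ok = eval { $body->(); 1 };   # true only if the body did not die
    return 'ok' if $ok;
    local $_ = $@;                    # hand the error to the handler in $_
    return $handler->();
}

sub recover (&) { return $_[0] }      # just passes its block along

my $outcome = attempt {
    die "phooey\n";
} recover {
    /phooey/ ? 'unphooey' : 'unknown';
};

print "$outcome\n";
```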
659 And here's a reimplementation of grep:
665 push(@result, $_) if &$ref;
670 Some folks would prefer full alphanumeric prototypes. Alphanumerics have
671 been intentionally left out of prototypes for the express purpose of
672 someday in the future adding named, formal parameters. The current
673 mechanism's main goal is to let module writers provide better diagnostics
674 for module users. Larry feels the notation quite understandable to Perl
675 programmers, and that it will not intrude greatly upon the meat of the
676 module, nor make it harder to read. The line noise is visually
677 encapsulated into a small pill that's easy to swallow.
679 It's probably best to prototype new functions, not retrofit prototyping
680 into older ones. That's because you must be especially careful about
681 silent impositions of differing list versus scalar contexts. For example,
682 if you decide that a function should take just one parameter, like this:
686 print "you gave me $n\n";
689 and someone has been calling it with an array or expression
695 Then you've just supplied an automatic scalar() in front of their
696 argument, which can be more than a bit surprising. The old @foo
697 which used to hold one thing doesn't get passed in. Instead,
the func() now gets passed in 1, that is, the number of elements
699 in @foo. And the split() gets called in a scalar context and
700 starts scribbling on your @_ parameter list.
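The surprise is easy to reproduce. In this sketch plain() and strict_one() are hypothetical stand-ins for the function before and after a prototype was retrofitted.

```perl
use strict;
use warnings;

sub plain          { return $_[0] }   # no prototype: the array is flattened
sub strict_one ($) { return $_[0] }   # ($) puts its argument in scalar context

my @foo    = ('only');
my $before = plain(@foo);             # the single element
my $after  = strict_one(@foo);        # the element count instead

print "$before $after\n";
```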
702 This is all very powerful, of course, and should only be used in moderation
703 to make the world a better place.
=head2 Overriding Builtin Functions
707 Many builtin functions may be overridden, though this should only be
708 tried occasionally and for good reason. Typically this might be
709 done by a package attempting to emulate missing builtin functionality
710 on a non-Unix system.
712 Overriding may only be done by importing the name from a
713 module--ordinary predeclaration isn't good enough. However, the
714 C<subs> pragma (compiler directive) lets you, in effect, predeclare subs
715 via the import syntax, and these names may then override the builtin ones:
717 use subs 'chdir', 'chroot', 'chmod', 'chown';
721 Library modules should not in general export builtin names like "open"
722 or "chdir" as part of their default @EXPORT list, since these may
723 sneak into someone else's namespace and change the semantics unexpectedly.
724 Instead, if the module adds the name to the @EXPORT_OK list, then it's
725 possible for a user to import the name explicitly, but not implicitly.
726 That is, they could say
730 and it would import the open override, but if they said
734 they would get the default imports without the overrides.
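Within a single file, the C<subs> pragma route described earlier might look like the following sketch. The logging array is invented for the illustration, and C<CORE::chdir> is used to reach the real builtin from inside the override.

```perl
use strict;
use warnings;
use subs 'chdir';          # predeclare via import so the override takes effect

my @visited;               # hypothetical log of requested directories

sub chdir {
    my $dir = shift;
    push @visited, $dir;
    return CORE::chdir($dir);    # defer to the real builtin
}

my $ok = chdir '.';        # calls the override, not the builtin
print "visited: @visited\n";
```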
738 If you call a subroutine that is undefined, you would ordinarily get an
739 immediate fatal error complaining that the subroutine doesn't exist.
740 (Likewise for subroutines being used as methods, when the method
741 doesn't exist in any of the base classes of the class package.) If,
742 however, there is an C<AUTOLOAD> subroutine defined in the package or
743 packages that were searched for the original subroutine, then that
744 C<AUTOLOAD> subroutine is called with the arguments that would have been
745 passed to the original subroutine. The fully qualified name of the
746 original subroutine magically appears in the $AUTOLOAD variable in the
747 same package as the C<AUTOLOAD> routine. The name is not passed as an
748 ordinary argument because, er, well, just because, that's why...
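A minimal sketch of the mechanism follows. The package name Echo is hypothetical, C<our> is assumed to be available in your perl, and this AUTOLOAD merely reports what was asked of it rather than loading anything.

```perl
use strict;
use warnings;

package Echo;
our $AUTOLOAD;                      # set by perl before AUTOLOAD is called

sub AUTOLOAD {
    my $name = $AUTOLOAD;           # fully qualified, e.g. "Echo::shout"
    $name =~ s/.*:://;              # strip the package qualifier
    return if $name eq 'DESTROY';   # don't answer for object destruction
    return "$name(" . join(',', @_) . ")";
}

package main;

my $called = Echo::shout('a', 'b');   # no such sub, so AUTOLOAD answers
print "$called\n";
```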
750 Most C<AUTOLOAD> routines will load in a definition for the subroutine in
751 question using eval, and then execute that subroutine using a special
752 form of "goto" that erases the stack frame of the C<AUTOLOAD> routine
753 without a trace. (See the standard C<AutoLoader> module, for example.)
754 But an C<AUTOLOAD> routine can also just emulate the routine and never
755 define it. For example, let's pretend that a function that wasn't defined
756 should just call system() with those arguments. All you'd do is this:
759 my $program = $AUTOLOAD;
760 $program =~ s/.*:://;
761 system($program, @_);
In fact, if you predeclare the functions you want to call that way, you don't
768 even need the parentheses:
770 use subs qw(date who ls);
775 A more complete example of this is the standard Shell module, which
776 can treat undefined subroutine calls as calls to Unix programs.
Mechanisms are available for module writers to help split the modules
up into autoloadable files. See the standard AutoLoader module described
in L<AutoLoader>, the standard SelfLoader module in L<SelfLoader>, and
781 the document on adding C functions to perl code in L<perlxs>.
785 See L<perlref> for more on references. See L<perlxs> if you'd
786 like to learn about calling C subroutines from perl. See
787 L<perlmod> to learn about bundling up your functions in