pod/perlsub.pod

   1 =head1 NAME
   2
   3 perlsub - Perl subroutines
   4
   5 =head1 SYNOPSIS
   6
   7 To declare subroutines:
   8
   9     sub NAME;             # A "forward" declaration.
  10     sub NAME(PROTO);      #  ditto, but with prototypes
  11
  12     sub NAME BLOCK        # A declaration and a definition.
  13     sub NAME(PROTO) BLOCK #  ditto, but with prototypes
  14
  15 To define an anonymous subroutine at runtime:
  16
  17     $subref = sub BLOCK;
  18
  19 To import subroutines:
  20
  21     use PACKAGE qw(NAME1 NAME2 NAME3);
  22
  23 To call subroutines:
  24
  25     NAME(LIST);    # & is optional with parens.
  26     NAME LIST;     # Parens optional if predeclared/imported.
  27     &NAME;         # Passes current @_ to subroutine.
  28
  29 =head1 DESCRIPTION
  30
  31 Like many languages, Perl provides for user-defined subroutines.  These
  32 may be located anywhere in the main program, loaded in from other files
  33 via the C<do>, C<require>, or C<use> keywords, or even generated on the
  34 fly using C<eval> or anonymous subroutines (closures).  You can even call
  35 a function indirectly using a variable containing its name or a CODE reference
  36 to it, as in C<$var = \&function>.
  37
  38 The Perl model for function call and return values is simple: all
  39 functions are passed as parameters one single flat list of scalars, and
  40 all functions likewise return to their caller one single flat list of
  41 scalars.  Any arrays or hashes in these call and return lists will
  42 collapse, losing their identities--but you may always use
  43 pass-by-reference instead to avoid this.  Both call and return lists may
  44 contain as many or as few scalar elements as you'd like.  (Often a
  45 function without an explicit return statement is called a subroutine, but
  46 there's really no difference from the language's perspective.)
  47
  48 Any arguments passed to the routine come in as the array @_.  Thus if you
  49 called a function with two arguments, those would be stored in C<$_[0]>
  50 and C<$_[1]>.  The array @_ is a local array, but its values are implicit
  51 references (predating L<perlref>) to the actual scalar parameters.  The
  52 return value of the subroutine is the value of the last expression
  53 evaluated.  Alternatively, a return statement may be used to specify the
  54 returned value and exit the subroutine.  If you return one or more arrays
  55 and/or hashes, these will be flattened together into one large
  56 indistinguishable list.
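
As a small illustration of the flattening described above, the sketch
below passes two arrays to one function and returns two arrays from
another.  The names count_args() and both() are made up for this example
and are not part of Perl.

```perl
use strict;
use warnings;

my @x = (1, 2, 3);
my @y = (4, 5);

# Hypothetical helper: reports how many scalars arrived in @_.
sub count_args { return scalar @_ }

# Hypothetical helper: returns two arrays; the caller still
# receives one flat list.
sub both { return (@x, @y) }

my $n   = count_args(@x, @y);   # @x and @y arrive as five scalars
my @all = both();               # five values, array boundaries lost

print "$n arguments in, ", scalar(@all), " values out\n";
```

Neither function can tell where one array ended and the next began.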
  57
  58 Perl does not have named formal parameters, but in practice all you do is
  59 assign to a my() list of these.  Any variables you use in the function
  60 that aren't declared private are global variables.  For the gory details
  61 on creating private variables, see
  62 L<"Private Variables via my()"> and L<"Temporary Values via local()">.
  63 To create protected environments for a set of functions in a separate
  64 package (and probably a separate file), see L<perlmod/"Packages">.
  65
  66 Example:
  67
  68     sub max {
  69         my $max = shift(@_);
  70         foreach $foo (@_) {
  71             $max = $foo if $max < $foo;
  72         }
  73         return $max;
  74     }
  75     $bestday = max($mon,$tue,$wed,$thu,$fri);
  76
  77 Example:
  78
  79     # get a line, combining continuation lines
  80     #  that start with whitespace
  81
  82     sub get_line {
  83         $thisline = $lookahead;  # GLOBAL VARIABLES!!
  84         LINE: while ($lookahead = <STDIN>) {
  85             if ($lookahead =~ /^[ \t]/) {
  86                 $thisline .= $lookahead;
  87             }
  88             else {
  89                 last LINE;
  90             }
  91         }
  92         $thisline;
  93     }
  94
  95     $lookahead = <STDIN>;       # get first line
  96     while ($_ = get_line()) {
  97         ...
  98     }
  99
 100 Use array assignment to a local list to name your formal arguments:
 101
 102     sub maybeset {
 103         my($key, $value) = @_;
 104         $Foo{$key} = $value unless $Foo{$key};
 105     }
 106
 107 This also has the effect of turning call-by-reference into call-by-value,
 108 since the assignment copies the values.  Otherwise a function is free to
 109 do in-place modifications of @_ and change its caller's values.
 110
 111     upcase_in($v1, $v2);  # this changes $v1 and $v2
 112     sub upcase_in {
 113         for (@_) { tr/a-z/A-Z/ }
 114     }
 115
 116 You aren't allowed to modify constants in this way, of course.  If an
 117 argument were actually literal and you tried to change it, you'd take a
 118 (presumably fatal) exception.   For example, this won't work:
 119
 120     upcase_in("frederick");
 121
 122 It would be much safer if the upcase_in() function
 123 were written to return a copy of its parameters instead
 124 of changing them in place:
 125
 126     ($v3, $v4) = upcase($v1, $v2);  # this doesn't
 127     sub upcase {
 128         my @parms = @_;
 129         for (@parms) { tr/a-z/A-Z/ }
 130         # wantarray checks if we were called in list context
 131         return wantarray ? @parms : $parms[0];
 132     }
 133
 134 Notice how this (unprototyped) function doesn't care whether it was passed
 135 real scalars or arrays.  Perl will see everything as one big long flat @_
 136 parameter list.  This is one of the ways where Perl's simple
 137 argument-passing style shines.  The upcase() function would work perfectly
 138 well without changing the upcase() definition even if we fed it things
 139 like this:
 140
 141     @newlist   = upcase(@list1, @list2);
 142     @newlist   = upcase( split /:/, $var );
 143
 144 Do not, however, be tempted to do this:
 145
 146     (@a, @b)   = upcase(@list1, @list2);
 147
 148 Because like its flat incoming parameter list, the return list is also
 149 flat.  So all you have managed to do here is store everything in @a and
 150 make @b an empty list.  See L</"Pass by Reference"> for alternatives.
 151
 152 A subroutine may be called using the "&" prefix.  The "&" is optional in
 153 Perl 5, and so are the parens if the subroutine has been predeclared.
 154 (Note, however, that the "&" is I<NOT> optional when you're just naming
 155 the subroutine, such as when it's used as an argument to defined() or
 156 undef().  Nor is it optional when you want to do an indirect subroutine
 157 call with a subroutine name or reference using the C<&$subref()> or
 158 C<&{$subref}()> constructs.  See L<perlref> for more on that.)
 159
 160 Subroutines may be called recursively.  If a subroutine is called using
 161 the "&" form, the argument list is optional, and if omitted, no @_ array is
 162 set up for the subroutine: the @_ array at the time of the call is
 163 visible to the subroutine instead.  This is an efficiency mechanism that
 164 new users may wish to avoid.
 165
 166     &foo(1,2,3);        # pass three arguments
 167     foo(1,2,3);         # the same
 168
 169     foo();              # pass a null list
 170     &foo();             # the same
 171
 172     &foo;               # foo() gets current args, like foo(@_) !!
 173     foo;                # like foo() IFF sub foo pre-declared, else "foo"
 174
 175 Not only does the "&" form make the argument list optional, but it also
 176 disables any prototype checking on the arguments you do provide.  This
 177 is partly for historical reasons, and partly for having a convenient way
 178 to cheat if you know what you're doing.  See the section on Prototypes below.
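
To make the sharing of @_ by the bare "&" form concrete, here is a
minimal sketch.  The helper names show_args(), via_ampersand() and
via_parens() are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: joins whatever it finds in @_.
sub show_args { return join ",", @_; }

sub via_ampersand { return &show_args; }    # no list: callee sees our @_
sub via_parens    { return show_args(); }   # explicit empty list

my $shared = via_ampersand(1, 2, 3);
my $fresh  = via_parens(1, 2, 3);

print "shared: '$shared'\n";    # shared: '1,2,3'
print "fresh:  '$fresh'\n";     # fresh:  ''
```

The first call leaks the caller's arguments into show_args(), which is
rarely what a newcomer expects.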
 179
 180 =head2 Private Variables via my()
 181
 182 Synopsis:
 183
 184     my $foo;            # declare $foo lexically local
 185     my (@wid, %get);    # declare list of variables local
 186     my $foo = "flurp";  # declare $foo lexical, and init it
 187     my @oof = @bar;     # declare @oof lexical, and init it
 188
 189 A "my" declares the listed variables to be confined (lexically) to the
 190 enclosing block, conditional (C<if/unless/elsif/else>), loop
 191 (C<for/foreach/while/until/continue>), subroutine, C<eval>, or
 192 C<do/require/use>'d file.  If more than one value is listed, the list
 193 must be placed in parens.  All listed elements must be legal lvalues.
 194 Only alphanumeric identifiers may be lexically scoped--magical
 195 builtins like $/ must currently be localized with "local" instead.
 196
 197 Unlike dynamic variables created by the "local" statement, lexical
 198 variables declared with "my" are totally hidden from the outside world,
 199 including any called subroutines (even if it's the same subroutine called
 200 from itself or elsewhere--every call gets its own copy).
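
The difference in visibility can be seen in a short sketch.  The
subroutine names are hypothetical, and C<our> is used only to declare
the package variable under C<use strict>.

```perl
use strict;
use warnings;

our $who = "package";            # a package (global) variable

# Compiled here, so $who means the package variable $main::who.
sub report { return $who }

sub try_my {
    my $who = "lexical";         # invisible to report()
    return report();
}

sub try_local {
    local $who = "dynamic";      # temporarily changes the package variable
    return report();
}

print try_my(), "\n";            # package
print try_local(), "\n";         # dynamic
print report(), "\n";            # package (the local() value is gone)
```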
 201
 202 (An eval(), however, can see the lexical variables of the scope it is
 203 being evaluated in so long as the names aren't hidden by declarations within
 204 the eval() itself.  See L<perlref>.)
 205
 206 The parameter list to my() may be assigned to if desired, which allows you
 207 to initialize your variables.  (If no initializer is given for a
 208 particular variable, it is created with the undefined value.)  Commonly
 209 this is used to name the parameters to a subroutine.  Examples:
 210
 211     $arg = "fred";        # "global" variable
 212     $n = cube_root(27);
 213     print "$arg thinks the root is $n\n";
 214  fred thinks the root is 3
 215
 216     sub cube_root {
 217         my $arg = shift;  # name doesn't matter
 218         $arg **= 1/3;
 219         return $arg;
 220     }
 221
 222 The "my" is simply a modifier on something you might assign to.  So when
 223 you do assign to the variables in its argument list, the "my" doesn't
 224 change whether those variables are viewed as a scalar or an array.  So
 225
 226     my ($foo) = <STDIN>;
 227     my @FOO = <STDIN>;
 228
 229 both supply a list context to the righthand side, while
 230
 231     my $foo = <STDIN>;
 232
 233 supplies a scalar context.  But the following only declares one variable:
 234
 235     my $foo, $bar = 1;
 236
 237 That has the same effect as
 238
 239     my $foo;
 240     $bar = 1;
 241
 242 The declared variable is not introduced (is not visible) until after
 243 the current statement.  Thus,
 244
 245     my $x = $x;
 246
 247 can be used to initialize the new $x with the value of the old $x, and
 248 the expression
 249
 250     my $x = 123 and $x == 123
 251
 252 is false unless the old $x happened to have the value 123.
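
A short sketch of the first case, using made-up variable names: the
inner declaration is initialized from the outer variable of the same
name, and the outer variable is left untouched.

```perl
use strict;
use warnings;

my $x = 10;
my $inner;
{
    # The right-hand $x is still the outer one (10); the new $x
    # only becomes visible starting with the next statement.
    my $x = $x + 1;
    $inner = $x;
}
print "inner=$inner outer=$x\n";   # inner=11 outer=10
```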
 253
 254 Lexical scopes of control structures are not bounded precisely by the
 255 braces that delimit their controlled blocks; control expressions are
 256 part of the scope, too.  Thus in the loop
 257
 258     while (my $line = <>) {
 259         $line = lc $line;
 260     } continue {
 261         print $line;
 262     }
 263
 264 the scope of $line extends from its declaration throughout the rest of
 265 the loop construct (including the C<continue> clause), but not beyond
 266 it.  Similarly, in the conditional
 267
 268     if ((my $answer = <STDIN>) =~ /^yes$/i) {
 269         user_agrees();
 270     } elsif ($answer =~ /^no$/i) {
 271         user_disagrees();
 272     } else {
 273         chomp $answer;
 274         die "'$answer' is neither 'yes' nor 'no'";
 275     }
 276
 277 the scope of $answer extends from its declaration throughout the rest
 278 of the conditional (including C<elsif> and C<else> clauses, if any),
 279 but not beyond it.
 280
 281 (None of the foregoing applies to C<if/unless> or C<while/until>
 282 modifiers appended to simple statements.  Such modifiers are not
 283 control structures and have no effect on scoping.)
 284
 285 The C<foreach> loop defaults to dynamically scoping its index variable
 286 (in the manner of C<local>; see below).  However, if the index
 287 variable is prefixed with the keyword "my", then it is lexically
 288 scoped instead.  Thus in the loop
 289
 290     for my $i (1, 2, 3) {
 291         some_function();
 292     }
 293
 294 the scope of $i extends to the end of the loop, but not beyond it, and
 295 so the value of $i is unavailable in some_function().
 296
 297 Some users may wish to encourage the use of lexically scoped variables.
 298 As an aid to catching implicit references to package variables,
 299 if you say
 300
 301     use strict 'vars';
 302
 303 then any variable reference from there to the end of the enclosing
 304 block must either refer to a lexical variable, or must be fully
 305 qualified with the package name.  A compilation error results
 306 otherwise.  An inner block may countermand this with S<"no strict 'vars'">.
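
One way to watch this check in action without aborting a program is to
compile a fragment with a string C<eval>.  This is only a sketch, and
the variable names in it are invented.

```perl
use strict;
use warnings;

# String eval lets us observe a compile-time failure without
# stopping the surrounding program.
my $ok = eval q{
    use strict 'vars';
    $undeclared_total = 1;      # neither lexical nor package-qualified
    1;
};
my $err = $@;                   # holds the compilation error text

my $ok2 = eval q{
    use strict 'vars';
    my $lexical_total = 2;                   # lexical: allowed
    $main::declared_total = $lexical_total;  # fully qualified: allowed
    $main::declared_total == 2;
};

print "first fragment rejected\n"  if !$ok;
print "second fragment accepted\n" if $ok2;
```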
 307
 308 A my() has both a compile-time and a run-time effect.  At compile time,
 309 the compiler takes notice of it; the principal usefulness of this is to
 310 quiet C<use strict 'vars'>.  The actual initialization doesn't happen
 311 until run time, so gets executed every time through a loop.
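
The run-time half is easy to observe in a loop.  In this sketch (the
names are arbitrary), @batch starts out empty on every pass because its
my() is executed again each time.

```perl
use strict;
use warnings;

my @sizes;
for my $pass (1 .. 3) {
    my @batch;                 # re-executed: fresh and empty on each pass
    push @batch, $pass;
    push @sizes, scalar @batch;
}
print "@sizes\n";              # 1 1 1
```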
 312
 313 Variables declared with "my" are not part of any package and are therefore
 314 never fully qualified with the package name.  In particular, you're not
 315 allowed to try to make a package variable (or other global) lexical:
 316
 317     my $pack::var;      # ERROR!  Illegal syntax
 318     my $_;              # also illegal (currently)
 319
 320 In fact, dynamic variables (also known as package or global variables)
 321 are still accessible using the fully qualified :: notation even while a
 322 lexical of the same name is also visible:
 323
 324     package main;
 325     local $x = 10;
 326     my    $x = 20;
 327     print "$x and $::x\n";
 328
 329 That will print out 20 and 10.
 330
 331 You may declare "my" variables at the outermost scope of a file to
 332 totally hide any such identifiers from the outside world.  This is similar
 333 to C's static variables at the file level.  To do this with a subroutine
 334 requires the use of a closure (anonymous function).  If a block (such as
 335 an eval(), function, or C<package>) wants to create a private subroutine
 336 that cannot be called from outside that block, it can declare a lexical
 337 variable containing an anonymous sub reference:
 338
 339     my $secret_version = '1.001-beta';
 340     my $secret_sub = sub { print $secret_version };
 341     &$secret_sub();
 342
 343 As long as the reference is never returned by any function within the
 344 module, no outside module can see the subroutine, since its name is not in
 345 any package's symbol table.  Remember that it's not I<REALLY> called
 346 $some_pack::secret_version or anything; it's just $secret_version,
 347 unqualified and unqualifiable.
 348
 349 This does not work with object methods, however; all object methods have
 350 to be in the symbol table of some package to be found.
 351
 352 Just because the lexical variable is lexically (also called statically)
 353 scoped doesn't mean that within a function it works like a C static.  It
 354 normally works more like a C auto.  But here's a mechanism for giving a
 355 function private variables with both lexical scoping and a static
 356 lifetime.  If you do want to create something like C's static variables,
 357 just enclose the whole function in an extra block, and put the
 358 static variable outside the function but in the block.
 359
 360     {
 361         my $secret_val = 0;
 362         sub gimme_another {
 363             return ++$secret_val;
 364         }
 365     }
 366     # $secret_val now becomes unreachable by the outside
 367     # world, but retains its value between calls to gimme_another
 368
 369 If this function is being sourced in from a separate file
 370 via C<require> or C<use>, then this is probably just fine.  If it's
 371 all in the main program, you'll need to arrange for the my()
 372 to be executed early, either by putting the whole block above
 373 your main program, or more likely, merely placing a BEGIN
 374 sub around it to make sure it gets executed before your program
 375 starts to run:
 376
 377     sub BEGIN {
 378         my $secret_val = 0;
 379         sub gimme_another {
 380             return ++$secret_val;
 381         }
 382     }
 383
 384 See L<perlmod/"Package Constructors and Destructors"> about the BEGIN function.
 385
 386 =head2 Temporary Values via local()
 387
 388 B<NOTE>: In general, you should be using "my" instead of "local", because
 389 it's faster and safer.  Exceptions to this include the global punctuation
 390 variables, filehandles and formats, and direct manipulation of the Perl
 391 symbol table itself.  Format variables often use "local" though, as do
 392 other variables whose current value must be visible to called
 393 subroutines.
 394
 395 Synopsis:
 396
 397     local $foo;                 # declare $foo dynamically local
 398     local (@wid, %get);         # declare list of variables local
 399     local $foo = "flurp";       # declare $foo dynamic, and init it
 400     local @oof = @bar;          # declare @oof dynamic, and init it
 401
 402     local *FH;                  # localize $FH, @FH, %FH, &FH  ...
 403     local *merlyn = *randal;    # now $merlyn is really $randal, plus
 404                                 #     @merlyn is really @randal, etc
 405     local *merlyn = 'randal';   # SAME THING: promote 'randal' to *randal
 406     local *merlyn = \$randal;   # just alias $merlyn, not @merlyn etc
 407
 408 A local() modifies its listed variables to be local to the enclosing
 409 block (or subroutine, C<eval{}>, or C<do>) and to I<any subroutine called
 410 from within that block>.  A local() just gives temporary values to global
 411 (meaning package) variables.  This is known as dynamic scoping.  Lexical
 412 scoping is done with "my", which works more like C's auto declarations.
 413
 414 If more than one variable is given to local(), they must be placed in
 415 parens.  All listed elements must be legal lvalues.  This operator works
 416 by saving the current values of those variables in its argument list on a
 417 hidden stack and restoring them upon exiting the block, subroutine or
 418 eval.  This means that called subroutines can also reference the local
 419 variable, but not the global one.  The argument list may be assigned to if
 420 desired, which allows you to initialize your local variables.  (If no
 421 initializer is given for a particular variable, it is created with an
 422 undefined value.)  Commonly this is used to name the parameters to a
 423 subroutine.  Examples:
 424
 425     for $i ( 0 .. 9 ) {
 426         $digits{$i} = $i;
 427     }
 428     # assume this function uses global %digits hash
 429     parse_num();
 430
 431     # now temporarily add to %digits hash
 432     if ($base12) {
 433         # (NOTE: not claiming this is efficient!)
 434         local %digits  = (%digits, 't' => 10, 'e' => 11);
 435         parse_num();  # parse_num gets this new %digits!
 436     }
 437     # old %digits restored here
 438
 439 Because local() is a run-time command, it gets executed every time
 440 through a loop.  In releases of Perl previous to 5.0, this used more stack
 441 storage each time until the loop was exited.  Perl now reclaims the space
 442 each time through, but it's still more efficient to declare your variables
 443 outside the loop.
 444
 445 A local is simply a modifier on an lvalue expression.  When you assign to
 446 a localized variable, the local doesn't change whether its list is viewed
 447 as a scalar or an array.  So
 448
 449     local($foo) = <STDIN>;
 450     local @FOO = <STDIN>;
 451
 452 both supply a list context to the righthand side, while
 453
 454     local $foo = <STDIN>;
 455
 456 supplies a scalar context.
 457
 458 =head2 Passing Symbol Table Entries (typeglobs)
 459
 460 [Note:  The mechanism described in this section was originally the only
 461 way to simulate pass-by-reference in older versions of Perl.  While it
 462 still works fine in modern versions, the new reference mechanism is
 463 generally easier to work with.  See below.]
 464
 465 Sometimes you don't want to pass the value of an array to a subroutine
 466 but rather the name of it, so that the subroutine can modify the global
 467 copy of it rather than working with a local copy.  In perl you can
 468 refer to all objects of a particular name by prefixing the name
 469 with a star: C<*foo>.  This is often known as a "typeglob", since the
 470 star on the front can be thought of as a wildcard match for all the
 471 funny prefix characters on variables and subroutines and such.
 472
 473 When evaluated, the typeglob produces a scalar value that represents
 474 all the objects of that name, including any filehandle, format or
 475 subroutine.  When assigned to, it causes the name mentioned to refer to
 476 whatever "*" value was assigned to it.  Example:
 477
 478     sub doubleary {
 479         local(*someary) = @_;
 480         foreach $elem (@someary) {
 481             $elem *= 2;
 482         }
 483     }
 484     doubleary(*foo);
 485     doubleary(*bar);
 486
 487 Note that scalars are already passed by reference, so you can modify
 488 scalar arguments without using this mechanism by referring explicitly
 489 to C<$_[0]> etc.  You can modify all the elements of an array by passing
 490 all the elements as scalars, but you have to use the * mechanism (or
 491 the equivalent reference mechanism) to push, pop or change the size of
 492 an array.  It will certainly be faster to pass the typeglob (or reference).
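
Here is a minimal sketch of the reference alternative, assuming two
hypothetical helpers.  Element values can be changed through the
aliasing of @_, but changing an array's length needs the array itself,
reached here through a reference (see L<perlref>).

```perl
use strict;
use warnings;

# Doubles each element in place; works because @_ aliases the elements.
sub double_elems { $_ *= 2 for @_ }

# Grows the caller's array; needs a reference to the array itself.
sub append_two {
    my ($aref) = @_;
    push @$aref, 'x', 'y';
    return scalar @$aref;
}

my @nums = (1, 2, 3);
double_elems(@nums);           # @nums is now (2, 4, 6)

my @list = ('a');
my $len  = append_two(\@list); # @list is now ('a', 'x', 'y')

print "@nums | @list | $len\n";
```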
 493
 494 Even if you don't want to modify an array, this mechanism is useful for
 495 passing multiple arrays in a single LIST, since normally the LIST
 496 mechanism will merge all the array values so that you can't extract out
 497 the individual arrays.  For more on typeglobs, see
 498 L<perldata/"Typeglobs and FileHandles">.
 499
 500 =head2 Pass by Reference
 501
 502 If you want to pass more than one array or hash into a function--or
 503 return them from it--and have them maintain their integrity, then
 504 you're going to have to use an explicit pass-by-reference.  Before you
 505 do that, you need to understand references as detailed in L<perlref>.
 506 This section may not make much sense to you otherwise.
 507
 508 Here are a few simple examples.  First, let's pass in several
 509 arrays to a function and have it pop all of them, returning a new
 510 list of all their former last elements:
 511
 512     @tailings = popmany ( \@a, \@b, \@c, \@d );
 513
 514     sub popmany {
 515         my $aref;
 516         my @retlist = ();
 517         foreach $aref ( @_ ) {
 518             push @retlist, pop @$aref;
 519         }
 520         return @retlist;
 521     }
 522
 523 Here's how you might write a function that returns a
 524 list of keys occurring in all the hashes passed to it:
 525
 526     @common = inter( \%foo, \%bar, \%joe );
 527     sub inter {
 528         my ($k, $href, %seen); # locals
 529         foreach $href (@_) {
 530             while ( $k = each %$href ) {
 531                 $seen{$k}++;
 532             }
 533         }
 534         return grep { $seen{$_} == @_ } keys %seen;
 535     }
 536
 537 So far, we're just using the normal list return mechanism.
 538 What happens if you want to pass or return a hash?  Well,
 539 if you're only using one of them, or you don't mind them
 540 concatenating, then the normal calling convention is ok, although
 541 a little expensive.
 542
 543 Where people get into trouble is here:
 544
 545     (@a, @b) = func(@c, @d);
 546 or
 547     (%a, %b) = func(%c, %d);
 548
 549 That syntax simply won't work.  It just sets @a or %a and clears the @b or
 550 %b.  Plus the function didn't get passed two separate arrays or
 551 hashes: it got one long list in @_, as always.
 552
 553 If you can arrange for everyone to deal with this through references, it's
 554 cleaner code, although not so nice to look at.  Here's a function that
 555 takes two array references as arguments, returning the two references
 556 in order of how many elements their arrays hold, larger first:
 557
 558     ($aref, $bref) = func(\@c, \@d);
 559     print "@$aref has more than @$bref\n";
 560     sub func {
 561         my ($cref, $dref) = @_;
 562         if (@$cref > @$dref) {
 563             return ($cref, $dref);
 564         } else {
 565             return ($dref, $cref);
 566         }
 567     }
 568
 569 It turns out that you can actually do this also:
 570
 571     (*a, *b) = func(\@c, \@d);
 572     print "@a has more than @b\n";
 573     sub func {
 574         local (*c, *d) = @_;
 575         if (@c > @d) {
 576             return (\@c, \@d);
 577         } else {
 578             return (\@d, \@c);
 579         }
 580     }
 581
 582 Here we're using the typeglobs to do symbol table aliasing.  It's
 583 a tad subtle, though, and also won't work if you're using my()
 584 variables, since only globals (well, and local()s) are in the symbol table.
 585
 586 =head2 Prototypes
 587
 588 As of the 5.002 release of perl, if you declare
 589
 590     sub mypush (\@@)
 591
 592 then mypush() takes arguments exactly like push() does.  The declaration
 593 of the function to be called must be visible at compile time.  The prototype
 594 only affects the interpretation of new-style calls to the function, where
 595 new-style is defined as not using the C<&> character.  In other words,
 596 if you call it like a builtin function, then it behaves like a builtin
 597 function.  If you call it like an old-fashioned subroutine, then it
 598 behaves like an old-fashioned subroutine.  It naturally falls out from
 599 this rule that prototypes have no influence on subroutine references
 600 like C<\&foo> or on indirect subroutine calls like C<&{$subref}>.
 601
 602 Method calls are not influenced by prototypes either, because the
 603 function to be called is indeterminate at compile time, since it depends
 604 on inheritance.
 605
 606 Since the intent is primarily to let you define subroutines that work
 607 like builtin commands, here are the prototypes for some other functions
 608 that parse almost exactly like the corresponding builtins.
 609
 610     Declared as                 Called as
 611
 612     sub mylink ($$)             mylink $old, $new
 613     sub myvec ($$$)             myvec $var, $offset, 1
 614     sub myindex ($$;$)          myindex &getstring, "substr"
 615     sub mysyswrite ($$$;$)      mysyswrite $buf, 0, length($buf) - $off, $off
 616     sub myreverse (@)           myreverse $a,$b,$c
 617     sub myjoin ($@)             myjoin ":",$a,$b,$c
 618     sub mypop (\@)              mypop @array
 619     sub mysplice (\@$$@)        mysplice @array,@array,0,@pushme
 620     sub mykeys (\%)             mykeys %{$hashref}
 621     sub myopen (*;$)            myopen HANDLE, $name
 622     sub mypipe (**)             mypipe READHANDLE, WRITEHANDLE
 623     sub mygrep (&@)             mygrep { /foo/ } $a,$b,$c
 624     sub myrand ($)              myrand 42
 625     sub mytime ()               mytime
 626
 627 Any backslashed prototype character represents an actual argument
 628 that absolutely must start with that character.  The value passed
 629 to the subroutine (as part of C<@_>) will be a reference to the
 630 actual argument given in the subroutine call, obtained by applying
 631 C<\> to that argument.
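
A sketch of how a body for the mypush() declared earlier might look.
The implementation is an assumption made for illustration; only the
prototype comes from the text above.

```perl
use strict;
use warnings;

# The \@ in the prototype makes Perl pass a reference to the array
# named in the call; the trailing @ collects the remaining arguments.
sub mypush (\@@) {
    my ($aref, @items) = @_;
    push @$aref, @items;
    return scalar @$aref;
}

my @stack = ('a');
my $count = mypush @stack, 'b', 'c';   # called like the builtin push

print "$count: @stack\n";              # 3: a b c
```

Because the definition is seen before the call is compiled, the call
needs neither parentheses nor an explicit backslash.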
 632
 633 Unbackslashed prototype characters have special meanings.  Any
 634 unbackslashed @ or % eats all the rest of the arguments, and forces
 635 list context.  An argument represented by $ forces scalar context.  An
 636 & requires an anonymous subroutine, which, if passed as the first
 637 argument, does not require the "sub" keyword or a subsequent comma.  A
 638 * does whatever it has to do to turn the argument into a reference to a
 639 symbol table entry.
 640
 641 A semicolon separates mandatory arguments from optional arguments.
 642 (It is redundant before @ or %.)
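
For instance, a function with one mandatory and one optional scalar
could be sketched as follows.  The function greet() and its default
greeting are invented for this example.

```perl
use strict;
use warnings;

# One mandatory scalar, then one optional scalar.
sub greet ($;$) {
    my ($name, $greeting) = @_;
    $greeting = "Hello" unless defined $greeting;
    return "$greeting, $name";
}

my $plain  = greet("World");           # optional argument omitted
my $custom = greet("World", "Howdy");  # optional argument supplied

print "$plain\n$custom\n";
```

Calls written as greet() or greet("a", "b", "c") would be rejected when
the program is compiled, since they do not match the prototype.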
 643
 644 Note how the last three examples above are treated specially by the parser.
 645 mygrep() is parsed as a true list operator, myrand() is parsed as a
 646 true unary operator with unary precedence the same as rand(), and
 647 mytime() is truly argumentless, just like time().  That is, if you
 648 say
 649
 650     mytime +2;
 651
 652 you'll get mytime() + 2, not mytime(2), which is how it would be parsed
 653 without the prototype.
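
The same parsing difference can be reproduced with two small
hypothetical functions, one with an empty prototype and one with none:

```perl
use strict;
use warnings;

sub answer ()  { return 40 }      # empty prototype: takes no arguments
sub echo_first { return $_[0] }   # no prototype: ordinary list operator

my $sum   = answer + 2;           # parsed as answer() + 2
my $first = echo_first +2;        # parsed as echo_first(+2)

print "$sum $first\n";            # 42 2
```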
 654
 655 The interesting thing about & is that you can generate new syntax with it:
 656
 657     sub try (&@) {
 658         my($try,$catch) = @_;
 659         eval { &$try };
 660         if ($@) {
 661             local $_ = $@;
 662             &$catch;
 663         }
 664     }
 665     sub catch (&) { $_[0] }
 666
 667     try {
 668         die "phooey";
 669     } catch {
 670         /phooey/ and print "unphooey\n";
 671     };
 672
 673 That prints "unphooey".  (Yes, there are still unresolved
 674 issues having to do with the visibility of @_.  I'm ignoring that
 675 question for the moment.  (But note that if we make @_ lexically
 676 scoped, those anonymous subroutines can act like closures... (Gee,
 677 is this sounding a little Lispish?  (Nevermind.))))
 678
 679 And here's a reimplementation of grep:
 680
 681     sub mygrep (&@) {
 682         my $code = shift;
 683         my @result;
 684         foreach $_ (@_) {
 685             push(@result, $_) if &$code;
 686         }
 687         @result;
 688     }
 689
 690 Some folks would prefer full alphanumeric prototypes.  Alphanumerics have
 691 been intentionally left out of prototypes for the express purpose of
 692 someday in the future adding named, formal parameters.  The current
 693 mechanism's main goal is to let module writers provide better diagnostics
 694 for module users.  Larry feels the notation quite understandable to Perl
 695 programmers, and that it will not intrude greatly upon the meat of the
 696 module, nor make it harder to read.  The line noise is visually
 697 encapsulated into a small pill that's easy to swallow.
 698
 699 It's probably best to prototype new functions, not retrofit prototyping
 700 into older ones.  That's because you must be especially careful about
 701 silent impositions of differing list versus scalar contexts.  For example,
 702 if you decide that a function should take just one parameter, like this:
 703
 704     sub func ($) {
 705         my $n = shift;
 706         print "you gave me $n\n";
 707     }
 708
 709 and someone has been calling it with an array or expression
 710 returning a list:
 711
 712     func(@foo);
 713     func( split /:/ );
 714
 715 Then you've just supplied an automatic scalar() in front of their
 716 argument, which can be more than a bit surprising.  The old @foo
 717 which used to hold one thing doesn't get passed in.  Instead,
 718 the func() now gets passed in 1, that is, the number of elements
 719 in @foo.  And the split() gets called in a scalar context and
 720 starts scribbling on your @_ parameter list.
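
The effect is easy to reproduce.  In this sketch first_arg() is a
hypothetical function, and the call with "&" shows what callers got
before the prototype was added.

```perl
use strict;
use warnings;

sub first_arg ($) { return $_[0] }

my @foo = ('a', 'b', 'c');

my $with_proto = first_arg(@foo);    # ($) imposes scalar context: 3
my $bypassed   = &first_arg(@foo);   # & skips the prototype: 'a'

print "$with_proto $bypassed\n";     # 3 a
```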
 721
 722 This is all very powerful, of course, and should only be used in moderation
 723 to make the world a better place.
 724
 725 =head2 Overriding Builtin Functions
 726
 727 Many builtin functions may be overridden, though this should only be
 728 tried occasionally and for good reason.  Typically this might be
 729 done by a package attempting to emulate missing builtin functionality
 730 on a non-Unix system.
 731
 732 Overriding may only be done by importing the name from a
 733 module--ordinary predeclaration isn't good enough.  However, the
 734 C<subs> pragma (compiler directive) lets you, in effect, predeclare subs
 735 via the import syntax, and these names may then override the builtin ones:
 736
 737     use subs 'chdir', 'chroot', 'chmod', 'chown';
 738     chdir $somewhere;
 739     sub chdir { ... }
 740
 741 Library modules should not in general export builtin names like "open"
 742 or "chdir" as part of their default @EXPORT list, since these may
 743 sneak into someone else's namespace and change the semantics unexpectedly.
 744 Instead, if the module adds the name to the @EXPORT_OK list, then it's
 745 possible for a user to import the name explicitly, but not implicitly.
 746 That is, they could say
 747
 748     use Module 'open';
 749
 750 and it would import the open override, but if they said
 751
 752     use Module;
 753
 754 they would get the default imports without the overrides.
 755
 756 =head2 Autoloading
 757
 758 If you call a subroutine that is undefined, you would ordinarily get an
 759 immediate fatal error complaining that the subroutine doesn't exist.
 760 (Likewise for subroutines being used as methods, when the method
 761 doesn't exist in any of the base classes of the class package.) If,
 762 however, there is an C<AUTOLOAD> subroutine defined in the package or
 763 packages that were searched for the original subroutine, then that
 764 C<AUTOLOAD> subroutine is called with the arguments that would have been
 765 passed to the original subroutine.  The fully qualified name of the
 766 original subroutine magically appears in the $AUTOLOAD variable in the
 767 same package as the C<AUTOLOAD> routine.  The name is not passed as an
 768 ordinary argument because, er, well, just because, that's why...
 769
 770 Most C<AUTOLOAD> routines will load in a definition for the subroutine in
 771 question using eval, and then execute that subroutine using a special
 772 form of "goto" that erases the stack frame of the C<AUTOLOAD> routine
 773 without a trace.  (See the standard C<AutoLoader> module, for example.)
 774 But an C<AUTOLOAD> routine can also just emulate the routine and never
 775 define it.   For example, let's pretend that a function that wasn't defined
 776 should just call system() with those arguments.  All you'd do is this:
 777
 778     sub AUTOLOAD {
 779         my $program = $AUTOLOAD;
 780         $program =~ s/.*:://;
 781         system($program, @_);
 782     }
 783     date();
 784     who('am', 'i');
 785     ls('-l');
 786
 787 In fact, if you predeclare the functions you want to call that way, you don't
 788 even need the parentheses:
 789
 790     use subs qw(date who ls);
 791     date;
 792     who "am", "i";
 793     ls '-l';
 794
 795 A more complete example of this is the standard Shell module, which
 796 can treat undefined subroutine calls as calls to Unix programs.
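
The load-then-goto style of C<AUTOLOAD> mentioned earlier can be
sketched as follows.  This is not how AutoLoader itself is written: the
package Demo and its %body table are assumptions that stand in for code
that would normally be read from a file.

```perl
use strict;
use warnings;

{
    package Demo;                      # hypothetical package
    our $AUTOLOAD;

    # Stand-in for definitions that would normally be loaded from disk.
    my %body = (
        double => sub { return 2 * $_[0] },
        triple => sub { return 3 * $_[0] },
    );

    sub AUTOLOAD {
        my $name = $AUTOLOAD;
        $name =~ s/.*:://;             # strip the package qualifier
        return if $name eq 'DESTROY';
        my $code = $body{$name}
            or die "No such function: $AUTOLOAD\n";
        {
            no strict 'refs';
            *{$AUTOLOAD} = $code;      # install, so later calls skip AUTOLOAD
        }
        goto &$code;                   # replace this frame; @_ is passed along
    }
}

my $before = defined(&Demo::double) ? 1 : 0;
my $result = Demo::double(21);
my $after  = defined(&Demo::double) ? 1 : 0;

print "$before $result $after\n";      # 0 42 1
```

After the first call the subroutine really exists in the package, so
the cost of going through C<AUTOLOAD> is paid only once per name.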
 797
 798 Mechanisms are available for module writers to help split their modules
 799 up into autoloadable files.  See the standard AutoLoader module
 800 described in L<AutoLoader> and in L<AutoSplit>, the standard
 801 SelfLoader module in L<SelfLoader>, and the document on adding C
 802 functions to perl code in L<perlxs>.
 803
 804 =head1 SEE ALSO
 805
 806 See L<perlref> for more on references.  See L<perlxs> if you'd
 807 like to learn about calling C subroutines from perl.  See
 808 L<perlmod> to learn about bundling up your functions in
 809 separate files.