=head1 NAME

perlref - Perl references and nested data structures

=head1 DESCRIPTION

Before release 5 of Perl it was difficult to represent complex data
structures, because all references had to be symbolic, and even that was
difficult to do when you wanted to refer to a variable rather than a
symbol table entry. Perl 5 not only makes it easier to use symbolic
references to variables, but lets you have "hard" references to any piece
of data. Any scalar may hold a hard reference. Because arrays and hashes
contain scalars, you can now easily build arrays of arrays, arrays of
hashes, hashes of arrays, arrays of hashes of functions, and so on.

Hard references are smart--they keep track of reference counts for you,
automatically freeing the thing referred to when its reference count
goes to zero. (Note: The reference counts for values in self-referential
or cyclic data structures may not go to zero without a little help; see
L<perlobj/"Two-Phased Garbage Collection"> for a detailed explanation.)
If that thing happens to be an object, the object is
destructed. See L<perlobj> for more about objects. (In a sense,
everything in Perl is an object, but we usually reserve the word for
references to objects that have been officially "blessed" into a class package.)

A symbolic reference contains the name of a variable, just as a
symbolic link in the filesystem contains merely the name of a file.
The C<*glob> notation is a kind of symbolic reference. Hard references
are more like hard links in the file system: merely another way
of getting at the same underlying object, irrespective of its name.

"Hard" references are easy to use in Perl. There is just one
overriding principle: Perl does no implicit referencing or
dereferencing. When a scalar is holding a reference, it always behaves
as a scalar. It doesn't magically start being an array or a hash
unless you tell it so explicitly by dereferencing it.
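The principle of no implicit dereferencing can be seen in a minimal
sketch. The variable names here are hypothetical, chosen only for
illustration:

```perl
use strict;
use warnings;

my @colors = ('red', 'green', 'blue');
my $ref    = \@colors;          # $ref is a scalar holding a hard reference

# Used as a plain scalar, the reference is never dereferenced for you.
my $as_string = "$ref";         # stringifies to something like ARRAY(0x...)

# Only an explicit dereference reaches the array behind it.
my $count = scalar @$ref;

print "scalar view: $as_string\n";
print "array has $count elements\n";
```

The exact address in the stringified form varies from run to run, so
only its general shape should be relied on.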

References can be constructed in several ways.

=over 4

=item 1.

By using the backslash operator on a variable, subroutine, or value.
(This works much like the & (address-of) operator in C.) Note
that this typically creates I<ANOTHER> reference to a variable, because
there's already a reference to the variable in the symbol table. But
the symbol table reference might go away, and you'll still have the
reference that the backslash returned. Here are some examples:

    $scalarref = \$foo;
    $arrayref  = \@ARGV;
    $hashref   = \%ENV;
    $coderef   = \&handler;
    $globref   = \*foo;

It isn't possible to create a true reference to an IO handle (filehandle or
dirhandle) using the backslash operator. See the explanation of the
*foo{THING} syntax below. (However, you're apt to find Perl code
out there using globrefs as though they were IO handles, which is
grandfathered into continued functioning.)

=item 2.

A reference to an anonymous array can be constructed using square
brackets:

    $arrayref = [1, 2, ['a', 'b', 'c']];

Here we've constructed a reference to an anonymous array of three elements
whose final element is itself a reference to another anonymous array of three
elements. (The multidimensional syntax described later can be used to
access this. For example, after the above, C<$arrayref-E<gt>[2][1]> would have
the value "b".)

Note that taking a reference to an enumerated list is not the same
as using square brackets--instead it's the same as creating
a list of references!

    @list = (\$a, \@b, \%c);
    @list = \($a, @b, %c);      # same thing!

As a special case, C<\(@foo)> returns a list of references to the contents
of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>.

=item 3.

A reference to an anonymous hash can be constructed using curly
brackets:

    $hashref = {
        'Adam'  => 'Eve',
        'Clyde' => 'Bonnie',
    };

Anonymous hash and array constructors can be intermixed freely to
produce as complicated a structure as you want. The multidimensional
syntax described below works for these too. The values above are
literals, but variables and expressions would work just as well, because
assignment operators in Perl (even within local() or my()) are executable
statements, not compile-time declarations.

Because curly brackets (braces) are used for several other things
including BLOCKs, you may occasionally have to disambiguate braces at the
beginning of a statement by putting a C<+> or a C<return> in front so
that Perl realizes the opening brace isn't starting a BLOCK. The economy and
mnemonic value of using curlies is deemed worth this occasional extra
hassle.

For example, if you wanted a function to make a new hash and return a
reference to it, you have these options:

    sub hashem {        { @_ } }   # silently wrong
    sub hashem {       +{ @_ } }   # ok
    sub hashem { return { @_ } }   # ok

=item 4.

A reference to an anonymous subroutine can be constructed by using
C<sub> without a subname:

    $coderef = sub { print "Boink!\n" };

Note the presence of the semicolon. Except for the fact that the code
inside isn't executed immediately, a C<sub {}> is not so much a
declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no
matter how many times you execute that line (unless you're in an
C<eval("...")>), C<$coderef> will still have a reference to the I<SAME>
anonymous subroutine.)

Anonymous subroutines act as closures with respect to my() variables,
that is, variables visible lexically within the current scope. Closure
is a notion out of the Lisp world that says if you define an anonymous
function in a particular lexical context, it pretends to run in that
context even when it's called outside of the context.

In human terms, it's a funny way of passing arguments to a subroutine when
you define it as well as when you call it. It's useful for setting up
little bits of code to run later, such as callbacks. You can even
do object-oriented stuff with it, though Perl already provides a different
mechanism to do that--see L<perlobj>.

You can also think of closure as a way to write a subroutine template without
using eval. (In fact, in version 5.000, eval was the I<only> way to get
closures. You may wish to use "require 5.001" if you use closures.)

Here's a small example of how closures work:

    sub newprint {
        my $x = shift;
        return sub { my $y = shift; print "$x, $y!\n"; };
    }
    $h = newprint("Howdy");
    $g = newprint("Greetings");

    # Time passes...

    &$h("world");
    &$g("earthlings");

This prints

    Howdy, world!
    Greetings, earthlings!

Note particularly that $x continues to refer to the value passed into
newprint() I<despite> the fact that the "my $x" has seemingly gone out of
scope by the time the anonymous subroutine runs. That's what closure
is all about.

This applies only to lexical variables, by the way. Dynamic variables
continue to work as they have always worked. Closure is not something
that most Perl programmers need trouble themselves about to begin with.

=item 5.

References are often returned by special subroutines called constructors.
Perl objects are just references to a special kind of object that happens to know
which package it's associated with. Constructors are just special
subroutines that know how to create that association. They do so by
starting with an ordinary reference, and it remains an ordinary reference
even while it's also being an object. Constructors are customarily
named new(), but don't have to be:

    $objref = new Doggie (Tail => 'short', Ears => 'long');

=item 6.

References of the appropriate type can spring into existence if you
dereference them in a context that assumes they exist. Because we haven't
talked about dereferencing yet, we can't show you any examples yet.

=item 7.

A reference can be created by using a special syntax, lovingly known as
the *foo{THING} syntax. *foo{THING} returns a reference to the THING
slot in *foo (which is the symbol table entry which holds everything
known as foo).

    $scalarref = *foo{SCALAR};
    $arrayref  = *ARGV{ARRAY};
    $hashref   = *ENV{HASH};
    $coderef   = *handler{CODE};
    $ioref     = *STDIN{IO};
    $globref   = *foo{GLOB};

All of these are self-explanatory except for *foo{IO}. It returns the
IO handle, used for file handles (L<perlfunc/open>), sockets
(L<perlfunc/socket> and L<perlfunc/socketpair>), and directory handles
(L<perlfunc/opendir>). For compatibility with previous versions of
Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}.

*foo{THING} returns undef if that particular THING hasn't been used yet,
except in the case of scalars. *foo{SCALAR} returns a reference to an
anonymous scalar if $foo hasn't been used yet. This might change in a
future release.

The use of *foo{IO} is the best way to pass bareword filehandles into or
out of subroutines, or to store them in larger data structures.

    splutter(*STDOUT{IO});
    sub splutter {
        my $fh = shift;
        print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(*STDIN{IO});
    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;
    }

Beware, though, that you can't do this with a routine which is going to
open the filehandle for you, because *HANDLE{IO} will be undef if HANDLE
hasn't been used yet. Use \*HANDLE for that sort of thing instead.

Using \*HANDLE (or *HANDLE) is another way to use and store non-bareword
filehandles (before perl version 5.002 it was the only way). The two
methods are largely interchangeable; you can do

    splutter(\*STDOUT);
    $rec = get_rec(\*STDIN);

with the above subroutine definitions.

=back
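Several of the constructors listed above can be combined in one short
sketch. Everything named here (C<$record>, make_counter(), C<$c1>,
C<$c2>) is a hypothetical illustration, not part of any library. It
nests the anonymous array and hash constructors, and shows that each
closure keeps its own private copy of a my() variable:

```perl
use strict;
use warnings;

# Anonymous hash and array constructors nest freely.
my $record = {
    name   => 'Clyde',
    scores => [ 10, 20, 30 ],
};

# Each call to make_counter() creates a fresh lexical $count, and the
# anonymous subroutine it returns keeps that variable alive.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $c1 = make_counter(5);
my $c2 = make_counter(100);

my @seen;
push @seen, &$c1() for 1 .. 2;   # first counter advances twice
push @seen, &$c2();              # second counter is independent
push @seen, &$c1();              # first counter resumes where it left off

print "second score: $record->{scores}[1]\n";
print "counter values: @seen\n";
```

The two counters never interfere with each other, because the closure
captures the variable itself rather than a snapshot of its value.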

That's it for creating references. By now you're probably dying to
know how to use references to get back to your long-lost data. There
are several basic methods.

=over 4

=item 1.

Anywhere you'd put an identifier (or chain of identifiers) as part
of a variable or subroutine name, you can replace the identifier with
a simple scalar variable containing a reference of the correct type:

    $bar = $$scalarref;
    push(@$arrayref, $filename);
    $$arrayref[0] = "January";
    $$hashref{"KEY"} = "VALUE";
    &$coderef(1,2,3);
    print $globref "output\n";

It's important to understand that we are specifically I<NOT> dereferencing
C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the
scalar variable happens I<BEFORE> it does any key lookups. Anything more
complicated than a simple scalar variable must use methods 2 or 3 below.
However, a "simple scalar" includes an identifier that itself uses method
1 recursively. Therefore, the following prints "howdy".

    $refrefref = \\\"howdy";
    print $$$$refrefref;

=item 2.

Anywhere you'd put an identifier (or chain of identifiers) as part of a
variable or subroutine name, you can replace the identifier with a
BLOCK returning a reference of the correct type. In other words, the
previous examples could be written like this:

    $bar = ${$scalarref};
    push(@{$arrayref}, $filename);
    ${$arrayref}[0] = "January";
    ${$hashref}{"KEY"} = "VALUE";
    &{$coderef}(1,2,3);
    $globref->print("output\n");  # iff IO::Handle is loaded

Admittedly, it's a little silly to use the curlies in this case, but
the BLOCK can contain any arbitrary expression, in particular,
subscripted expressions:

    &{ $dispatch{$index} }(1,2,3);      # call correct routine

Because of being able to omit the curlies for the simple case of C<$$x>,
people often make the mistake of viewing the dereferencing symbols as
proper operators, and wonder about their precedence. If they were,
though, you could use parentheses instead of braces. That's not the case.
Consider the difference below; case 0 is a short-hand version of case 1,
I<NOT> case 2:

    $$hashref{"KEY"}   = "VALUE";       # CASE 0
    ${$hashref}{"KEY"} = "VALUE";       # CASE 1
    ${$hashref{"KEY"}} = "VALUE";       # CASE 2
    ${$hashref->{"KEY"}} = "VALUE";     # CASE 3

Case 2 is also deceptive in that you're accessing a variable
called %hashref, not dereferencing through $hashref to the hash
it's presumably referencing. That would be case 3.

=item 3.

The case of individual array elements arises often enough that it gets
cumbersome to use method 2. As a form of syntactic sugar, the
array-element and hash-element lines from method 2 can be written:

    $arrayref->[0] = "January";
    $hashref->{"KEY"} = "VALUE";

The left side of the arrow can be any expression returning a reference,
including a previous dereference. Note that C<$array[$x]> is I<NOT> the
same thing as C<$array-E<gt>[$x]> here:

    $array[$x]->{"foo"}->[0] = "January";

This is one of the cases we mentioned earlier in which references could
spring into existence when in an lvalue context. Before this
statement, C<$array[$x]> may have been undefined. If so, it's
automatically defined with a hash reference so that we can look up
C<{"foo"}> in it. Likewise C<$array[$x]-E<gt>{"foo"}> will automatically get
defined with an array reference so that we can look up C<[0]> in it.

One more thing here. The arrow is optional I<BETWEEN> bracket
subscripts, so you can shrink the above down to

    $array[$x]{"foo"}[0] = "January";

Which, in the degenerate case of using only ordinary arrays, gives you
multidimensional arrays just like C's:

    $score[$x][$y][$z] += 42;

Well, okay, not entirely like C's arrays, actually. C doesn't know how
to grow its arrays on demand. Perl does.

=item 4.

If a reference happens to be a reference to an object, then there are
probably methods to access the things referred to, and you should probably
stick to those methods unless you're in the class package that defines the
object's methods. In other words, be nice, and don't violate the object's
encapsulation without a very good reason. Perl does not enforce
encapsulation. We are not totalitarians here. We do expect some basic
civility though.

=back
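The BLOCK form of method 2 and the arrow form of method 3 can be tried
side by side in a short sketch. The dispatch table and the C<@table>
array are hypothetical names used only for illustration:

```perl
use strict;
use warnings;

# A dispatch table: a hash whose values are code references.
my %dispatch = (
    double => sub { return 2 * $_[0] },
    square => sub { return $_[0] ** 2 },
);

# Method 2: a BLOCK containing a subscripted expression.
my $doubled = &{ $dispatch{double} }(21);

# Method 3: arrows are optional between subscripts, and the
# intermediate hash and array spring into existence on assignment.
my @table;
$table[1]{"foo"}[0] = "January";

my $outer_kind = ref($table[1]);           # what $table[1] now holds
my $inner_kind = ref($table[1]{"foo"});    # what the "foo" slot now holds

print "doubled: $doubled\n";
print "outer is a $outer_kind reference, inner is a $inner_kind reference\n";
```

Nothing was declared for C<$table[1]> beforehand. The assignment alone
is enough to create the hash and the array along the way.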

The ref() operator may be used to determine what type of thing the
reference is pointing to. See L<perlfunc>.
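As a rough sketch of what ref() reports, the following collects its
return value for several kinds of reference. The C<%kind> hash and its
labels are hypothetical and exist only to organize the output:

```perl
use strict;
use warnings;

my %kind = (
    scalar => ref(\"text"),
    array  => ref(\@ARGV),
    hash   => ref(\%ENV),
    code   => ref(sub { }),
    glob   => ref(\*STDOUT),
    refref => ref(\\"text"),            # a reference to a reference
    plain  => ref("not a reference"),   # empty string for non-references
);

for my $label (sort keys %kind) {
    my $shown = length $kind{$label} ? $kind{$label} : '(empty string)';
    print "$label: $shown\n";
}
```

Because ref() returns an empty string for anything that is not a
reference, it also serves as a simple truth test.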

The bless() operator may be used to associate a reference with a package
functioning as an object class. See L<perlobj>.

A typeglob may be dereferenced the same way a reference can, because
the dereference syntax always indicates the kind of reference desired.
So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable.
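A small sketch of typeglob dereferencing, assuming package variables
named C<$foo> and C<@foo> declared here purely for illustration:

```perl
use strict;
use warnings;

our $foo = "scalar slot";
our @foo = (1, 2, 3);

# The leading sigil selects the slot, so a glob works where a
# reference would.
my $via_glob = ${*foo};
my $via_ref  = ${\$foo};
my $length   = scalar @{*foo};

# Assigning through the glob changes the package variable itself.
${*foo} = "changed";

print "glob: $via_glob, ref: $via_ref, length: $length\n";
print "after assignment: $foo\n";
```

This runs under C<use strict 'refs'> because a typeglob is not a
symbolic reference: no variable name is looked up from a string.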

Here's a trick for interpolating a subroutine call into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

The way it works is that when the C<@{...}> is seen in the double-quoted
string, it's evaluated as a block. The block creates a reference to an
anonymous array containing the results of the call to C<mysub(1,2,3)>. So
the whole block returns a reference to an array, which is then
dereferenced by C<@{...}> and stuck into the double-quoted string. This
chicanery is also useful for arbitrary expressions:

    print "That yields @{[$n + 5]} widgets\n";

=head2 Symbolic references

We said that references spring into existence as necessary if they are
undefined, but we didn't say what happens if a value used as a
reference is already defined, but I<ISN'T> a hard reference. If you
use it as a reference in this case, it'll be treated as a symbolic
reference. That is, the value of the scalar is taken to be the I<NAME>
of a variable, rather than a direct link to a (possibly) anonymous
value.

People frequently expect it to work like this. So it does.

    $name = "foo";
    $$name = 1;                 # Sets $foo
    ${$name} = 2;               # Sets $foo
    ${$name x 2} = 3;           # Sets $foofoo
    $name->[0] = 4;             # Sets $foo[0]
    @$name = ();                # Clears @foo
    &$name();                   # Calls &foo() (as in Perl 4)
    $pack = "THAT";
    ${"${pack}::$name"} = 5;    # Sets $THAT::foo without eval

This is very powerful, and slightly dangerous, in that it's possible
to intend (with the utmost sincerity) to use a hard reference, and
accidentally use a symbolic reference instead. To protect against
that, you can say

    use strict 'refs';

and then only hard references will be allowed for the rest of the enclosing
block. An inner block may countermand that with

    no strict 'refs';
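The effect of the pragma can be observed at run time. In this sketch
the variable names are hypothetical, and eval {} is used only to trap
the error so that the script keeps running:

```perl
use strict;
use warnings;

our $foo = "original";
my $name = "foo";

# Under strict refs, using a string as a reference is a run-time
# error, which eval {} traps here.
my $ok    = eval { my $copy = $$name; 1 };
my $error = $@;

# An inner block can switch the check off again.
my $value;
{
    no strict 'refs';
    $value = $$name;    # symbolic reference to the package variable $foo
}

print "strict refs allowed it: ", ($ok ? "yes" : "no"), "\n";
print "error was: $error";
print "without strict refs: $value\n";
```

Limiting C<no strict 'refs'> to the smallest block that needs it keeps
the protection in force everywhere else.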

Only package variables are visible to symbolic references. Lexical
variables (declared with my()) aren't in a symbol table, and thus are
invisible to this mechanism. For example:

    local($value) = 10;
    $ref = \$value;
    {
        my $value = 20;
        print $$ref;
    }

This will still print 10, not 20. Remember that local() affects package
variables, which are all "global" to the package.

=head2 Not-so-symbolic references

A new feature contributing to readability in perl version 5.001 is that the
brackets around a symbolic reference behave more like quotes, just as they
always have within a string. That is,

    $push = "pop on ";
    print "${push}over";

has always meant to print "pop on over", despite the fact that push is
a reserved word. This has been generalized to work the same outside
of quotes, so that

    print ${push} . "over";

and even

    print ${ push } . "over";

will have the same effect. (This would have been a syntax error in
Perl 5.000, though Perl 4 allowed it in the spaceless form.) Note that this
construct is I<not> considered to be a symbolic reference when you're
using strict refs:

    use strict 'refs';
    ${ bareword };      # Okay, means $bareword.
    ${ "bareword" };    # Error, symbolic reference.

Similarly, because of all the subscripting that is done using single
words, we've applied the same rule to any bareword that is used for
subscripting a hash. So now, instead of writing

    $array{ "aaa" }{ "bbb" }{ "ccc" }

you can write just

    $array{ aaa }{ bbb }{ ccc }

and not worry about whether the subscripts are reserved words. In the
rare event that you do wish to do something like

    $array{ shift }

you can force interpretation as a reserved word by adding anything that
makes it more than a bareword:

    $array{ shift() }
    $array{ +shift }
    $array{ shift @_ }

The B<-w> switch will warn you if it interprets a reserved word as a string.
But it will no longer warn you about using lowercase words, because the
string is effectively quoted.
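The difference between a bareword subscript and a function call in a
subscript is easy to miss, so here is a minimal sketch. The hash
C<%h> and the two lookup subroutines are hypothetical names:

```perl
use strict;
use warnings;

my %h = (
    shift => 'bareword key',
    first => 'function result',
);

# The bare word is quoted, so this looks up the key "shift".
sub lookup_quoted { return $h{ shift } }

# The parentheses make it a call, so this looks up the first argument.
sub lookup_called { return $h{ shift() } }

my $quoted = lookup_quoted('first');
my $called = lookup_called('first');

print "bareword subscript: $quoted\n";
print "shift() subscript:  $called\n";
```

Both subroutines receive the same argument, but only the second one
actually consumes it.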

=head1 WARNING

You may not (usefully) use a reference as the key to a hash. It will be
converted into a string:

    $x{ \$a } = $a;

If you try to dereference the key, it won't do a hard dereference, and
you won't accomplish what you're attempting. You might want to do something
more like

    $r = \@a;
    $x{ $r } = $r;

And then at least you can use the values(), which will be
real refs, instead of the keys(), which won't.
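The stringification can be checked directly. This sketch uses
hypothetical variable names and stores a single entry so that keys()
and values() each return exactly one item:

```perl
use strict;
use warnings;

my @a = (1, 2, 3);
my $r = \@a;

my %x;
$x{$r} = $r;    # the key is stringified, the value stays a reference

my ($key)   = keys %x;
my ($value) = values %x;

my $key_kind   = ref($key);      # empty: the key is only a string
my $value_kind = ref($value);    # the value is still a real reference

print "key looks like: $key\n";
print "key is a reference: ", ($key_kind ? "yes" : "no"), "\n";
print "value is a $value_kind reference, last element $value->[2]\n";
```

The key still looks like a reference when printed, which is what makes
the mistake easy to overlook.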

=head1 SEE ALSO

Besides the obvious documents, source code can be instructive.
Some rather pathological examples of the use of references can be found
in the F<t/op/ref.t> regression test in the Perl source directory.

See also L<perldsc> and L<perllol> for how to use references to create
complex data structures, and L<perlobj> for how to use them to create
objects.