Commit | Line | Data |
a0d0e21e |
1 | =head1 NAME |
2 | |
3 | perlref - Perl references and nested data structures |
4 | |
5 | =head1 DESCRIPTION |
6 | |
cb1a09d0 |
7 | Before release 5 of Perl it was difficult to represent complex data |
8 | structures, because all references had to be symbolic, and even that was |
9 | difficult to do when you wanted to refer to a variable rather than a |
5f05dabc |
10 | symbol table entry. Perl not only makes it easier to use symbolic |
cb1a09d0 |
11 | references to variables, but lets you have "hard" references to any piece |
5f05dabc |
12 | of data. Any scalar may hold a hard reference. Because arrays and hashes |
cb1a09d0 |
13 | contain scalars, you can now easily build arrays of arrays, arrays of |
14 | hashes, hashes of arrays, arrays of hashes of functions, and so on. |
a0d0e21e |
15 | |
16 | Hard references are smart--they keep track of reference counts for you, |
17 | automatically freeing the thing referred to when its reference count |
6309d9d9 |
18 | goes to zero. (Note: The reference counts for values in self-referential |
19 | or cyclic data structures may not go to zero without a little help; see |
20 | L<perlobj/"Two-Phased Garbage Collection"> for a detailed explanation.) |
21 | If that thing happens to be an object, the object is |
a0d0e21e |
22 | destructed. See L<perlobj> for more about objects. (In a sense, |
23 | everything in Perl is an object, but we usually reserve the word for |
24 | references to objects that have been officially "blessed" into a class package.) |
25 | |
26 | A symbolic reference contains the name of a variable, just as a |
5f05dabc |
27 | symbolic link in the filesystem contains merely the name of a file. |
a0d0e21e |
28 | The C<*glob> notation is a kind of symbolic reference. Hard references |
29 | are more like hard links in the file system: merely another way |
30 | of getting at the same underlying object, irrespective of its name. |
31 | |
32 | "Hard" references are easy to use in Perl. There is just one |
33 | overriding principle: Perl does no implicit referencing or |
34 | dereferencing. When a scalar is holding a reference, it always behaves |
35 | as a scalar. It doesn't magically start being an array or a hash |
36 | unless you tell it so explicitly by dereferencing it. |
37 | |
5695b28e |
38 | References can be constructed in several ways. |
a0d0e21e |
39 | |
40 | =over 4 |
41 | |
42 | =item 1. |
43 | |
44 | By using the backslash operator on a variable, subroutine, or value. |
5695b28e |
45 | (This works much like the & (address-of) operator in C.) Note |
5f05dabc |
46 | that this typically creates I<ANOTHER> reference to a variable, because |
a0d0e21e |
47 | there's already a reference to the variable in the symbol table. But |
48 | the symbol table reference might go away, and you'll still have the |
49 | reference that the backslash returned. Here are some examples: |
50 | |
51 | $scalarref = \$foo; |
52 | $arrayref = \@ARGV; |
53 | $hashref = \%ENV; |
54 | $coderef = \&handler; |
55497cff |
55 | $globref = \*foo; |
cb1a09d0 |
56 | |
5f05dabc |
57 | It isn't possible to create a true reference to an IO handle (filehandle or |
36477c24 |
58 | dirhandle) using the backslash operator. See the explanation of the |
5f05dabc |
59 | *foo{THING} syntax below. (However, you're apt to find Perl code |
60 | out there using globrefs as though they were IO handles, which is |
61 | grandfathered into continued functioning.) |
a0d0e21e |
62 | |
63 | =item 2. |
64 | |
65 | A reference to an anonymous array can be constructed using square |
66 | brackets: |
67 | |
68 | $arrayref = [1, 2, ['a', 'b', 'c']]; |
69 | |
70 | Here we've constructed a reference to an anonymous array of three elements |
5695b28e |
71 | whose final element is itself a reference to another anonymous array of three |
a0d0e21e |
72 | elements. (The multidimensional syntax described later can be used to |
184e9718 |
73 | access this. For example, after the above, C<$arrayref-E<gt>[2][1]> would have |
a0d0e21e |
74 | the value "b".) |
75 | |
cb1a09d0 |
76 | Note that taking a reference to an enumerated list is not the same |
77 | as using square brackets--instead it's the same as creating |
78 | a list of references! |
79 | |
58e0a6ae |
80 | @list = (\$a, \@b, \%c); |
81 | @list = \($a, @b, %c); # same thing! |
82 | |
83 | As a special case, C<\(@foo)> returns a list of references to the contents |
84 | of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>. |
cb1a09d0 |
85 | |
a0d0e21e |
86 | =item 3. |
87 | |
88 | A reference to an anonymous hash can be constructed using curly |
89 | brackets: |
90 | |
91 | $hashref = { |
92 | 'Adam' => 'Eve', |
93 | 'Clyde' => 'Bonnie', |
94 | }; |
95 | |
96 | Anonymous hash and array constructors can be intermixed freely to |
97 | produce as complicated a structure as you want. The multidimensional |
98 | syntax described below works for these too. The values above are |
99 | literals, but variables and expressions would work just as well, because |
100 | assignment operators in Perl (even within local() or my()) are executable |
101 | statements, not compile-time declarations. |
102 | |
103 | Because curly brackets (braces) are used for several other things |
104 | including BLOCKs, you may occasionally have to disambiguate braces at the |
105 | beginning of a statement by putting a C<+> or a C<return> in front so |
106 | that Perl realizes the opening brace isn't starting a BLOCK. The economy and |
107 | mnemonic value of using curlies is deemed worth this occasional extra |
108 | hassle. |
109 | |
110 | For example, if you wanted a function to make a new hash and return a |
111 | reference to it, you have these options: |
112 | |
113 | sub hashem { { @_ } } # silently wrong |
114 | sub hashem { +{ @_ } } # ok |
115 | sub hashem { return { @_ } } # ok |
116 | |
117 | =item 4. |
118 | |
119 | A reference to an anonymous subroutine can be constructed by using |
120 | C<sub> without a subname: |
121 | |
122 | $coderef = sub { print "Boink!\n" }; |
123 | |
124 | Note the presence of the semicolon. Except for the fact that the code |
125 | inside isn't executed immediately, a C<sub {}> is not so much a |
126 | declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no |
127 | matter how many times you execute that line (unless you're in an |
128 | C<eval("...")>), C<$coderef> will still have a reference to the I<SAME> |
129 | anonymous subroutine.) |
130 | |
748a9306 |
131 | Anonymous subroutines act as closures with respect to my() variables, |
132 | that is, variables visible lexically within the current scope. Closure |
133 | is a notion out of the Lisp world that says if you define an anonymous |
134 | function in a particular lexical context, it pretends to run in that |
135 | context even when it's called outside of the context. |
136 | |
137 | In human terms, it's a funny way of passing arguments to a subroutine when |
138 | you define it as well as when you call it. It's useful for setting up |
139 | little bits of code to run later, such as callbacks. You can even |
5695b28e |
140 | do object-oriented stuff with it, though Perl already provides a different |
141 | mechanism to do that--see L<perlobj>. |
748a9306 |
142 | |
143 | You can also think of closure as a way to write a subroutine template without |
144 | using eval. (In fact, in version 5.000, eval was the I<only> way to get |
145 | closures. You may wish to use "require 5.001" if you use closures.) |
146 | |
147 | Here's a small example of how closures work: |
148 | |
149 | sub newprint { |
150 | my $x = shift; |
151 | return sub { my $y = shift; print "$x, $y!\n"; }; |
a0d0e21e |
152 | } |
748a9306 |
153 | $h = newprint("Howdy"); |
154 | $g = newprint("Greetings"); |
155 | |
156 | # Time passes... |
157 | |
158 | &$h("world"); |
159 | &$g("earthlings"); |
a0d0e21e |
160 | |
748a9306 |
161 | This prints |
162 | |
163 | Howdy, world! |
164 | Greetings, earthlings! |
165 | |
166 | Note particularly that $x continues to refer to the value passed into |
cb1a09d0 |
167 | newprint() I<despite> the fact that the "my $x" has seemingly gone out of |
748a9306 |
168 | scope by the time the anonymous subroutine runs. That's what closure |
169 | is all about. |
170 | |
5f05dabc |
171 | This applies only to lexical variables, by the way. Dynamic variables |
748a9306 |
172 | continue to work as they have always worked. Closure is not something |
173 | that most Perl programmers need trouble themselves about to begin with. |
a0d0e21e |
174 | |
175 | =item 5. |
176 | |
177 | References are often returned by special subroutines called constructors. |
748a9306 |
178 | Perl objects are just references to a special kind of object that happens to know |
a0d0e21e |
179 | which package it's associated with. Constructors are just special |
180 | subroutines that know how to create that association. They do so by |
181 | starting with an ordinary reference, and it remains an ordinary reference |
182 | even while it's also being an object. Constructors are customarily |
183 | named new(), but don't have to be: |
184 | |
185 | $objref = new Doggie (Tail => 'short', Ears => 'long'); |
186 | |
187 | =item 6. |
188 | |
189 | References of the appropriate type can spring into existence if you |
5f05dabc |
190 | dereference them in a context that assumes they exist. Because we haven't |
a0d0e21e |
191 | talked about dereferencing yet, we can't show you any examples yet. |
192 | |
cb1a09d0 |
193 | =item 7. |
194 | |
55497cff |
195 | A reference can be created by using a special syntax, lovingly known as |
196 | the *foo{THING} syntax. *foo{THING} returns a reference to the THING |
197 | slot in *foo (which is the symbol table entry which holds everything |
198 | known as foo). |
cb1a09d0 |
199 | |
55497cff |
200 | $scalarref = *foo{SCALAR}; |
201 | $arrayref = *ARGV{ARRAY}; |
202 | $hashref = *ENV{HASH}; |
203 | $coderef = *handler{CODE}; |
36477c24 |
204 | $ioref = *STDIN{IO}; |
55497cff |
205 | $globref = *foo{GLOB}; |
206 | |
36477c24 |
207 | All of these are self-explanatory except for *foo{IO}. It returns the |
208 | IO handle, used for file handles (L<perlfunc/open>), sockets |
209 | (L<perlfunc/socket> and L<perlfunc/socketpair>), and directory handles |
210 | (L<perlfunc/opendir>). For compatibility with previous versions of |
211 | Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}. |
55497cff |
212 | |
5f05dabc |
213 | *foo{THING} returns undef if that particular THING hasn't been used yet, |
214 | except in the case of scalars. *foo{SCALAR} returns a reference to an |
215 | anonymous scalar if $foo hasn't been used yet. This might change in a |
216 | future release. |
217 | |
218 | The use of *foo{IO} is the best way to pass bareword filehandles into or |
219 | out of subroutines, or to store them in larger data structures. |
36477c24 |
220 | |
221 | splutter(*STDOUT{IO}); |
cb1a09d0 |
222 | sub splutter { |
223 | my $fh = shift; |
224 | print $fh "her um well a hmmm\n"; |
225 | } |
226 | |
36477c24 |
227 | $rec = get_rec(*STDIN{IO}); |
cb1a09d0 |
228 | sub get_rec { |
229 | my $fh = shift; |
230 | return scalar <$fh>; |
231 | } |
232 | |
5f05dabc |
233 | Beware, though, that you can't do this with a routine which is going to |
234 | open the filehandle for you, because *HANDLE{IO} will be undef if HANDLE |
235 | hasn't been used yet. Use \*HANDLE for that sort of thing instead. |
236 | |
237 | Using \*HANDLE (or *HANDLE) is another way to use and store non-bareword |
a6006777 |
238 | filehandles (before perl version 5.002 it was the only way). The two |
239 | methods are largely interchangeable; you can do |
5f05dabc |
240 | |
241 | splutter(\*STDOUT); |
242 | $rec = get_rec(\*STDIN); |
243 | |
244 | with the above subroutine definitions. |
55497cff |
245 | |
a0d0e21e |
246 | =back |
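
The distinction drawn in method 2 between C<\@foo>, C<[@foo]>, and C<\(@foo)> is easy to blur, so here is a minimal sketch of it. The variable names are made up for illustration, and the printed values are what we would expect from a reasonably modern Perl 5:

```perl
use strict;
use warnings;

my @foo = (1, 2, 3);

my $aref = \@foo;     # one reference, to @foo itself
my $copy = [@foo];    # one reference, to a new anonymous array of copies
my @refs = \(@foo);   # a list of references, one per element of @foo

${ $refs[0] } = 10;   # writes through to $foo[0]
$copy->[1]    = 20;   # changes only the anonymous copy

print "@foo\n";                               # 10 2 3
print scalar(@refs), "\n";                    # 3
print ref($aref), " ", ref($refs[0]), "\n";   # ARRAY SCALAR
```

Only the element references and C<$aref> reach the original array; the square-bracket constructor always builds a fresh one.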
247 | |
248 | That's it for creating references. By now you're probably dying to |
249 | know how to use references to get back to your long-lost data. There |
250 | are several basic methods. |
251 | |
252 | =over 4 |
253 | |
254 | =item 1. |
255 | |
6309d9d9 |
256 | Anywhere you'd put an identifier (or chain of identifiers) as part |
257 | of a variable or subroutine name, you can replace the identifier with |
258 | a simple scalar variable containing a reference of the correct type: |
a0d0e21e |
259 | |
260 | $bar = $$scalarref; |
261 | push(@$arrayref, $filename); |
262 | $$arrayref[0] = "January"; |
263 | $$hashref{"KEY"} = "VALUE"; |
264 | &$coderef(1,2,3); |
cb1a09d0 |
265 | print $globref "output\n"; |
a0d0e21e |
266 | |
267 | It's important to understand that we are specifically I<NOT> dereferencing |
268 | C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the |
269 | scalar variable happens I<BEFORE> it does any key lookups. Anything more |
270 | complicated than a simple scalar variable must use methods 2 or 3 below. |
271 | However, a "simple scalar" includes an identifier that itself uses method |
272 | 1 recursively. Therefore, the following prints "howdy". |
273 | |
274 | $refrefref = \\\"howdy"; |
275 | print $$$$refrefref; |
276 | |
277 | =item 2. |
278 | |
6309d9d9 |
279 | Anywhere you'd put an identifier (or chain of identifiers) as part of a |
280 | variable or subroutine name, you can replace the identifier with a |
281 | BLOCK returning a reference of the correct type. In other words, the |
282 | previous examples could be written like this: |
a0d0e21e |
283 | |
284 | $bar = ${$scalarref}; |
285 | push(@{$arrayref}, $filename); |
286 | ${$arrayref}[0] = "January"; |
287 | ${$hashref}{"KEY"} = "VALUE"; |
288 | &{$coderef}(1,2,3); |
36477c24 |
289 | $globref->print("output\n"); # iff IO::Handle is loaded |
a0d0e21e |
290 | |
291 | Admittedly, it's a little silly to use the curlies in this case, but |
292 | the BLOCK can contain any arbitrary expression, in particular, |
293 | subscripted expressions: |
294 | |
295 | &{ $dispatch{$index} }(1,2,3); # call correct routine |
296 | |
297 | Because of being able to omit the curlies for the simple case of C<$$x>, |
298 | people often make the mistake of viewing the dereferencing symbols as |
299 | proper operators, and wonder about their precedence. If they were, |
5f05dabc |
300 | though, you could use parentheses instead of braces. That's not the case. |
a0d0e21e |
301 | Consider the difference below; case 0 is a short-hand version of case 1, |
302 | I<NOT> case 2: |
303 | |
304 | $$hashref{"KEY"} = "VALUE"; # CASE 0 |
305 | ${$hashref}{"KEY"} = "VALUE"; # CASE 1 |
306 | ${$hashref{"KEY"}} = "VALUE"; # CASE 2 |
307 | ${$hashref->{"KEY"}} = "VALUE"; # CASE 3 |
308 | |
309 | Case 2 is also deceptive in that you're accessing a variable |
310 | called %hashref, not dereferencing through $hashref to the hash |
311 | it's presumably referencing. That would be case 3. |
312 | |
313 | =item 3. |
314 | |
315 | The case of individual array elements arises often enough that it gets |
316 | cumbersome to use method 2. As a form of syntactic sugar, the array and |
317 | hash element examples from method 2 can be written: |
318 | |
319 | $arrayref->[0] = "January"; |
748a9306 |
320 | $hashref->{"KEY"} = "VALUE"; |
a0d0e21e |
321 | |
322 | The left side of the arrow can be any expression returning a reference, |
323 | including a previous dereference. Note that C<$array[$x]> is I<NOT> the |
324 | same thing as C<$array-E<gt>[$x]> here: |
325 | |
326 | $array[$x]->{"foo"}->[0] = "January"; |
327 | |
328 | This is one of the cases we mentioned earlier in which references could |
329 | spring into existence when in an lvalue context. Before this |
330 | statement, C<$array[$x]> may have been undefined. If so, it's |
331 | automatically defined with a hash reference so that we can look up |
332 | C<{"foo"}> in it. Likewise C<$array[$x]-E<gt>{"foo"}> will automatically get |
333 | defined with an array reference so that we can look up C<[0]> in it. |
334 | |
335 | One more thing here. The arrow is optional I<BETWEEN> bracket |
336 | subscripts, so you can shrink the above down to |
337 | |
338 | $array[$x]{"foo"}[0] = "January"; |
339 | |
340 | Which, in the degenerate case of using only ordinary arrays, gives you |
341 | multidimensional arrays just like C's: |
342 | |
343 | $score[$x][$y][$z] += 42; |
344 | |
345 | Well, okay, not entirely like C's arrays, actually. C doesn't know how |
346 | to grow its arrays on demand. Perl does. |
347 | |
348 | =item 4. |
349 | |
350 | If a reference happens to be a reference to an object, then there are |
351 | probably methods to access the things referred to, and you should probably |
352 | stick to those methods unless you're in the class package that defines the |
353 | object's methods. In other words, be nice, and don't violate the object's |
354 | encapsulation without a very good reason. Perl does not enforce |
355 | encapsulation. We are not totalitarians here. We do expect some basic |
356 | civility though. |
357 | |
358 | =back |
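
The way references spring into existence in an lvalue context, as described under method 3, can be watched directly. This is only a sketch, with C<@array> and C<$x> chosen arbitrarily:

```perl
use strict;
use warnings;

my @array;
my $x = 2;

# Nothing exists at $array[2] yet. The assignment creates the hash
# reference and the array reference beneath it on demand.
$array[$x]{"foo"}[0] = "January";

print ref($array[$x]), "\n";            # HASH
print ref($array[$x]{"foo"}), "\n";     # ARRAY
print $array[$x]->{"foo"}->[0], "\n";   # January
print scalar(@array), "\n";             # 3
```

The last line shows that the outer array was grown as well: elements 0 and 1 exist but are undefined.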
359 | |
360 | The ref() operator may be used to determine what type of thing the |
361 | reference is pointing to. See L<perlfunc>. |
362 | |
363 | The bless() operator may be used to associate a reference with a package |
364 | functioning as an object class. See L<perlobj>. |
365 | |
5f05dabc |
366 | A typeglob may be dereferenced the same way a reference can, because |
a0d0e21e |
367 | the dereference syntax always indicates the kind of reference desired. |
368 | So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable. |
369 | |
370 | Here's a trick for interpolating a subroutine call into a string: |
371 | |
cb1a09d0 |
372 | print "My sub returned @{[mysub(1,2,3)]} that time.\n"; |
373 | |
374 | The way it works is that when the C<@{...}> is seen in the double-quoted |
375 | string, it's evaluated as a block. The block creates a reference to an |
376 | anonymous array containing the results of the call to C<mysub(1,2,3)>. So |
377 | the whole block returns a reference to an array, which is then |
378 | dereferenced by C<@{...}> and stuck into the double-quoted string. This |
379 | chicanery is also useful for arbitrary expressions: |
a0d0e21e |
380 | |
184e9718 |
381 | print "That yields @{[$n + 5]} widgets\n"; |
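
A slightly fuller sketch of the same trick follows. The C<mysub> defined here is a hypothetical stand-in that doubles its arguments. Because the result is interpolated as an array, its elements are joined with the current value of C<$">, which is a space by default:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: doubles each argument.
sub mysub { return map { $_ * 2 } @_ }

my $n = 7;
my $s = "My sub returned @{[ mysub(1,2,3) ]} that time.";
my $t = "That yields @{[ $n + 5 ]} widgets";

my $u;
{
    local $" = ", ";                 # list separator used in interpolation
    $u = "@{[ mysub(1,2,3) ]}";
}

print "$s\n";   # My sub returned 2 4 6 that time.
print "$t\n";   # That yields 12 widgets
print "$u\n";   # 2, 4, 6
```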
a0d0e21e |
382 | |
383 | =head2 Symbolic references |
384 | |
385 | We said that references spring into existence as necessary if they are |
386 | undefined, but we didn't say what happens if a value used as a |
387 | reference is already defined, but I<ISN'T> a hard reference. If you |
388 | use it as a reference in this case, it'll be treated as a symbolic |
389 | reference. That is, the value of the scalar is taken to be the I<NAME> |
390 | of a variable, rather than a direct link to a (possibly) anonymous |
391 | value. |
392 | |
393 | People frequently expect it to work like this. So it does. |
394 | |
395 | $name = "foo"; |
396 | $$name = 1; # Sets $foo |
397 | ${$name} = 2; # Sets $foo |
398 | ${$name x 2} = 3; # Sets $foofoo |
399 | $name->[0] = 4; # Sets $foo[0] |
400 | @$name = (); # Clears @foo |
401 | &$name(); # Calls &foo() (as in Perl 4) |
402 | $pack = "THAT"; |
403 | ${"${pack}::$name"} = 5; # Sets $THAT::foo without eval |
404 | |
405 | This is very powerful, and slightly dangerous, in that it's possible |
406 | to intend (with the utmost sincerity) to use a hard reference, and |
407 | accidentally use a symbolic reference instead. To protect against |
408 | that, you can say |
409 | |
410 | use strict 'refs'; |
411 | |
412 | and then only hard references will be allowed for the rest of the enclosing |
413 | block. An inner block may countermand that with |
414 | |
415 | no strict 'refs'; |
416 | |
417 | Only package variables are visible to symbolic references. Lexical |
418 | variables (declared with my()) aren't in a symbol table, and thus are |
419 | invisible to this mechanism. For example: |
420 | |
421 | local($value) = 10; |
422 | $ref = "value"; |
423 | { |
424 | my $value = 20; |
425 | print $$ref; |
426 | } |
427 | |
428 | This will still print 10, not 20. Remember that local() affects package |
429 | variables, which are all "global" to the package. |
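
To see C<use strict 'refs'> refuse a symbolic reference, and an inner block countermand it, something like the following sketch should do. The names C<$foo>, C<$name>, and C<$seen> are illustrative only, and the exact wording of the error message may vary between Perl versions:

```perl
use strict;
use warnings;

our $foo = "package variable";
my $name = "foo";

# Under strict refs the symbolic dereference dies at run time,
# so we trap it with eval and keep the error text.
my $ok  = eval { my $v = ${$name}; 1 };
my $err = $@;
print "refused\n" unless $ok;

my $seen;
{
    no strict 'refs';       # allowed again, but only within this block
    $seen = ${$name};       # looks up $main::foo by name
}
print "$seen\n";
```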
430 | |
748a9306 |
431 | =head2 Not-so-symbolic references |
432 | |
a6006777 |
433 | A new feature contributing to readability in perl version 5.001 is that the |
434 | brackets around a symbolic reference behave more like quotes, just as they |
748a9306 |
435 | always have within a string. That is, |
436 | |
437 | $push = "pop on "; |
438 | print "${push}over"; |
439 | |
440 | has always meant to print "pop on over", despite the fact that push is |
441 | a reserved word. This has been generalized to work the same outside |
442 | of quotes, so that |
443 | |
444 | print ${push} . "over"; |
445 | |
446 | and even |
447 | |
448 | print ${ push } . "over"; |
449 | |
450 | will have the same effect. (This would have been a syntax error in |
a6006777 |
451 | Perl 5.000, though Perl 4 allowed it in the spaceless form.) Note that this |
748a9306 |
452 | construct is I<not> considered to be a symbolic reference when you're |
453 | using strict refs: |
454 | |
455 | use strict 'refs'; |
456 | ${ bareword }; # Okay, means $bareword. |
457 | ${ "bareword" }; # Error, symbolic reference. |
458 | |
459 | Similarly, because of all the subscripting that is done using single |
460 | words, we've applied the same rule to any bareword that is used for |
461 | subscripting a hash. So now, instead of writing |
462 | |
463 | $array{ "aaa" }{ "bbb" }{ "ccc" } |
464 | |
5f05dabc |
465 | you can write just |
748a9306 |
466 | |
467 | $array{ aaa }{ bbb }{ ccc } |
468 | |
469 | and not worry about whether the subscripts are reserved words. In the |
470 | rare event that you do wish to do something like |
471 | |
472 | $array{ shift } |
473 | |
474 | you can force interpretation as a reserved word by adding anything that |
475 | makes it more than a bareword: |
476 | |
477 | $array{ shift() } |
478 | $array{ +shift } |
479 | $array{ shift @_ } |
480 | |
481 | The B<-w> switch will warn you if it interprets a reserved word as a string. |
5f05dabc |
482 | But it will no longer warn you about using lowercase words, because the |
748a9306 |
483 | string is effectively quoted. |
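
The difference between a bareword subscript and a real call to C<shift> can be checked with a small sketch. The hash C<%count> and the sub C<lookup> are invented for the purpose:

```perl
use strict;
use warnings;

my %count = (shift => "literal key", first => "from args");

sub lookup {
    # {shift} is the quoted string "shift"; { shift() } calls the
    # function and uses the first element of @_ as the key.
    return ($count{shift}, $count{ shift() });
}

my ($by_word, $by_call) = lookup("first");
print "$by_word\n";   # literal key
print "$by_call\n";   # from args
```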
484 | |
cb1a09d0 |
485 | =head1 WARNING |
748a9306 |
486 | |
487 | You may not (usefully) use a reference as the key to a hash. It will be |
488 | converted into a string: |
489 | |
490 | $x{ \$a } = $a; |
491 | |
492 | If you try to dereference the key, it won't do a hard dereference, and |
184e9718 |
493 | you won't accomplish what you're attempting. You might want to do something |
cb1a09d0 |
494 | more like |
748a9306 |
495 | |
cb1a09d0 |
496 | $r = \@a; |
497 | $x{ $r } = $r; |
498 | |
499 | And then at least you can use the values(), which will be |
500 | real refs, instead of the keys(), which won't. |
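
A short sketch makes the difference between the key and the value visible. The names are arbitrary:

```perl
use strict;
use warnings;

my @a = (1, 2, 3);
my $r = \@a;

my %x;
$x{$r} = $r;    # the key is stringified; the value stays a reference

my ($key) = keys %x;
my $key_kind = ref($key) ? "reference" : "plain string";

print "$key_kind\n";          # plain string
print ref($x{$key}), "\n";    # ARRAY
print "@{ $x{$key} }\n";      # 1 2 3
```

The key is merely text of the form C<ARRAY(0x...)>, so there is nothing left in it to dereference.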
501 | |
502 | =head1 SEE ALSO |
a0d0e21e |
503 | |
504 | Besides the obvious documents, source code can be instructive. |
505 | Some rather pathological examples of the use of references can be found |
506 | in the F<t/op/ref.t> regression test in the Perl source directory. |
cb1a09d0 |
507 | |
508 | See also L<perldsc> and L<perllol> for how to use references to create |
509 | complex data structures, and L<perlobj> for how to use them to create |
510 | objects. |