Commit | Line | Data |
a1e2a320 |
1 | =head1 NAME |
2 | |
3 | perlreftut - Mark's very short tutorial about references |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | One of the most important new features in Perl 5 was the capability to |
8 | manage complicated data structures like multidimensional arrays and |
9 | nested hashes. To enable these, Perl 5 introduced a feature called |
10 | `references', and using references is the key to managing complicated, |
11 | structured data in Perl. Unfortunately, there's a lot of funny syntax |
12 | to learn, and the main manual page can be hard to follow. The manual |
1da6492a |
13 | is quite complete, and sometimes people find that a problem, because |
14 | it can be hard to tell what is important and what isn't. |
a1e2a320 |
15 | |
16 | Fortunately, you only need to know 10% of what's in the main page to get |
17 | 90% of the benefit. This page will show you that 10%. |
18 | |
19 | =head1 Who Needs Complicated Data Structures? |
20 | |
21 | One problem that came up all the time in Perl 4 was how to represent a |
22 | hash whose values were lists. Perl 4 had hashes, of course, but the |
23 | values had to be scalars; they couldn't be lists. |
24 | |
25 | Why would you want a hash of lists? Let's take a simple example: You |
1da6492a |
26 | have a file of city and country names, like this: |
a1e2a320 |
27 | |
1da6492a |
28 | Chicago, USA |
29 | Frankfurt, Germany |
30 | Berlin, Germany |
31 | Washington, USA |
32 | Helsinki, Finland |
33 | New York, USA |
a1e2a320 |
34 | |
1da6492a |
35 | and you want to produce an output like this, with each country mentioned |
36 | once, and then an alphabetical list of the cities in that country: |
a1e2a320 |
37 | |
1da6492a |
38 | Finland: Helsinki. |
39 | Germany: Berlin, Frankfurt. |
40 | USA: Chicago, New York, Washington. |
a1e2a320 |
41 | |
1da6492a |
42 | The natural way to do this is to have a hash whose keys are country |
43 | names. Associated with each country name key is a list of the cities in |
44 | that country. Each time you read a line of input, split it into a country |
a1e2a320 |
45 | and a city, look up the list of cities already known to be in that |
1da6492a |
46 | country, and append the new city to the list. When you're done reading |
a1e2a320 |
47 | the input, iterate over the hash as usual, sorting each list of cities |
48 | before you print it out. |
49 | |
50 | If hash values can't be lists, you lose. In Perl 4, hash values can't |
51 | be lists; they can only be strings. You lose. You'd probably have to |
52 | combine all the cities into a single string somehow, and then when |
53 | time came to write the output, you'd have to break the string into a |
54 | list, sort the list, and turn it back into a string. This is messy |
55 | and error-prone. And it's frustrating, because Perl already has |
56 | perfectly good lists that would solve the problem if only you could |
57 | use them. |
58 | |
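To make the awkwardness concrete, here is a rough sketch of the string-packing workaround described above. It is written in present-day Perl for readability. The C<|> separator, the sample input, and the variable names are all assumptions made for illustration, and the approach breaks as soon as a city name contains the separator.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real program would read these lines from a file.
my @lines = ("Chicago, USA", "Frankfurt, Germany", "Berlin, Germany");

my %table;
for my $line (@lines) {
    my ($city, $country) = split /, /, $line;
    # Pack every city for a country into one string, separated by '|'.
    $table{$country} = defined $table{$country}
                     ? "$table{$country}|$city"
                     : $city;
}

my @report;
for my $country (sort keys %table) {
    # Unpack the string into a list, sort it, and join it again for output.
    my @cities = sort split /\|/, $table{$country};
    push @report, "$country: " . join(', ', @cities) . ".";
}
print "$_\n" for @report;
```

Every read and every write has to agree on the packing format, which is exactly the bookkeeping that a real list would make unnecessary.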
59 | =head1 The Solution |
60 | |
1da6492a |
61 | By the time Perl 5 rolled around, we were already stuck with this |
62 | design: Hash values must be scalars. The solution to this is |
a1e2a320 |
63 | references. |
64 | |
65 | A reference is a scalar value that I<refers to> an entire array or an |
1da6492a |
66 | entire hash (or to just about anything else). Names are one kind of |
e937c8c3 |
67 | reference that you're already familiar with. Think of the President |
68 | of the United States: a messy, inconvenient bag of blood and bones. |
69 | But to talk about him, or to represent him in a computer program, all |
70 | you need is the easy, convenient scalar string "George Bush". |
a1e2a320 |
71 | |
72 | References in Perl are like names for arrays and hashes. They're |
73 | Perl's private, internal names, so you can be sure they're |
e937c8c3 |
74 | unambiguous. Unlike "George Bush", a reference only refers to one |
a1e2a320 |
75 | thing, and you always know what it refers to. If you have a reference |
76 | to an array, you can recover the entire array from it. If you have a |
77 | reference to a hash, you can recover the entire hash. But the |
78 | reference is still an easy, compact scalar value. |
79 | |
80 | You can't have a hash whose values are arrays; hash values can only be |
81 | scalars. We're stuck with that. But a single reference can refer to |
82 | an entire array, and references are scalars, so you can have a hash of |
83 | references to arrays, and it'll act a lot like a hash of arrays, and |
84 | it'll be just as useful as a hash of arrays. |
85 | |
1da6492a |
86 | We'll come back to this city-country problem later, after we've seen |
a1e2a320 |
87 | some syntax for managing references. |
88 | |
89 | |
90 | =head1 Syntax |
91 | |
92 | There are just two ways to make a reference, and just two ways to use |
93 | it once you have it. |
94 | |
95 | =head2 Making References |
96 | |
a29d1a25 |
97 | =head3 B<Make Rule 1> |
a1e2a320 |
98 | |
99 | If you put a C<\> in front of a variable, you get a |
100 | reference to that variable. |
101 | |
102 | $aref = \@array; # $aref now holds a reference to @array |
103 | $href = \%hash; # $href now holds a reference to %hash |
104 | |
105 | Once the reference is stored in a variable like $aref or $href, you |
106 | can copy it or store it just the same as any other scalar value: |
107 | |
108 | $xy = $aref; # $xy now holds a reference to @array |
109 | $p[3] = $href; # $p[3] now holds a reference to %hash |
110 | $z = $p[3]; # $z now holds a reference to %hash |
111 | |
112 | |
113 | These examples show how to make references to variables with names. |
114 | Sometimes you want to make an array or a hash that doesn't have a |
115 | name. This is analogous to the way you like to be able to use the |
116 | string C<"\n"> or the number 80 without having to store it in a named |
117 | variable first. |
118 | |
119 | =head3 B<Make Rule 2> |
120 | |
121 | C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to |
0c76616b |
122 | that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a |
a1e2a320 |
123 | reference to that hash. |
124 | |
125 | $aref = [ 1, "foo", undef, 13 ]; |
126 | # $aref now holds a reference to an array |
127 | |
128 | $href = { APR => 4, AUG => 8 }; |
129 | # $href now holds a reference to a hash |
130 | |
131 | |
132 | The references you get from rule 2 are the same kind of |
133 | references that you get from rule 1: |
134 | |
135 | # This: |
136 | $aref = [ 1, 2, 3 ]; |
137 | |
138 | # Does the same as this: |
139 | @array = (1, 2, 3); |
140 | $aref = \@array; |
141 | |
142 | |
143 | The first line is an abbreviation for the following two lines, except |
144 | that it doesn't create the superfluous array variable C<@array>. |
145 | |
a29d1a25 |
146 | If you write just C<[]>, you get a new, empty anonymous array. |
147 | If you write just C<{}>, you get a new, empty anonymous hash. |
148 | |
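As a small sketch of why the empty forms are useful, you can start with nothing and fill the structure in later through the reference. The variable names here are made up for illustration, and the C<@{...}> and C<${...}> forms used to reach the data are the ones explained under Using References.

```perl
use strict;
use warnings;

my $aref = [];    # reference to a new, empty anonymous array
my $href = {};    # reference to a new, empty anonymous hash

push @{$aref}, "first", "second";   # the array now has two elements
${$href}{APR} = 4;                  # the hash now has one key

my $count = scalar @{$aref};
my @keys  = keys %{$href};
print "$count elements; keys: @keys\n";
```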
a1e2a320 |
149 | |
150 | =head2 Using References |
151 | |
152 | What can you do with a reference once you have it? It's a scalar |
153 | value, and we've seen that you can store it as a scalar and get it back |
154 | again just like any scalar. There are just two more ways to use it: |
155 | |
a29d1a25 |
156 | =head3 B<Use Rule 1> |
a1e2a320 |
157 | |
a29d1a25 |
158 | You can always use an array reference, in curly braces, in place of |
159 | the name of an array. For example, C<@{$aref}> instead of C<@array>. |
a1e2a320 |
160 | |
161 | Here are some examples of that: |
162 | |
163 | Arrays: |
164 | |
165 | |
166 | @a @{$aref} An array |
167 | reverse @a reverse @{$aref} Reverse the array |
168 | $a[3] ${$aref}[3] An element of the array |
169 | $a[3] = 17; ${$aref}[3] = 17 Assigning an element |
170 | |
171 | |
172 | On each line are two expressions that do the same thing. The |
0c76616b |
173 | left-hand versions operate on the array C<@a>. The right-hand |
174 | versions operate on the array that is referred to by C<$aref>. Once |
175 | they find the array they're operating on, both versions do the same |
176 | things to the arrays. |
a1e2a320 |
177 | |
178 | Using a hash reference is I<exactly> the same: |
179 | |
180 | %h %{$href} A hash |
181 | keys %h keys %{$href} Get the keys from the hash |
182 | $h{'red'} ${$href}{'red'} An element of the hash |
183 | $h{'red'} = 17 ${$href}{'red'} = 17 Assigning an element |
184 | |
a29d1a25 |
185 | Whatever you want to do with a reference, B<Use Rule 1> tells you how |
186 | to do it. You just write the Perl code that you would have written |
187 | for doing the same thing to a regular array or hash, and then replace |
188 | the array or hash name with C<{$reference}>. "How do I loop over an |
189 | array when all I have is a reference?" Well, to loop over an array, you |
190 | would write |
191 | |
192 | for my $element (@array) { |
193 | ... |
194 | } |
195 | |
196 | so replace the array name, C<@array>, with the reference: |
197 | |
198 | for my $element (@{$aref}) { |
199 | ... |
200 | } |
201 | |
202 | "How do I print out the contents of a hash when all I have is a |
203 | reference?" First write the code for printing out a hash: |
204 | |
205 | for my $key (keys %hash) { |
206 | print "$key => $hash{$key}\n"; |
207 | } |
208 | |
209 | And then replace the hash name with the reference: |
210 | |
211 | for my $key (keys %{$href}) { |
212 | print "$key => ${$href}{$key}\n"; |
213 | } |
214 | |
215 | =head3 B<Use Rule 2> |
a1e2a320 |
216 | |
a0981a78 |
217 | B<Use Rule 1> is all you really need, because it tells you how to do |
a29d1a25 |
218 | absolutely everything you ever need to do with references. But the |
219 | most common thing to do with an array or a hash is to extract a single |
220 | element, and the B<Use Rule 1> notation is cumbersome. So there is an |
221 | abbreviation. |
a1e2a320 |
222 | |
c47ff5f1 |
223 | C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >> |
a1e2a320 |
224 | instead. |
225 | |
226 | C<${$href}{red}> is too hard to read, so you can write |
c47ff5f1 |
227 | C<< $href->{red} >> instead. |
a1e2a320 |
228 | |
c47ff5f1 |
229 | If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is |
a1e2a320 |
230 | the fourth element of the array. Don't confuse this with C<$aref[3]>, |
231 | which is the fourth element of a totally different array, one |
232 | deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the |
233 | same way that C<$item> and C<@item> are. |
234 | |
c47ff5f1 |
235 | Similarly, C<< $href->{'red'} >> is part of the hash referred to by |
a1e2a320 |
236 | the scalar variable C<$href>, perhaps even one with no name. |
237 | C<$href{'red'}> is part of the deceptively named C<%href> hash. It's |
c47ff5f1 |
238 | easy to leave out the C<< -> >> by accident, and if you do, you'll get |
a1e2a320 |
239 | bizarre results when your program gets array and hash elements out of |
240 | totally unexpected hashes and arrays that weren't the ones you wanted |
241 | to use. |
242 | |
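The following sketch shows the mix-up in action. It deliberately declares both a scalar C<$aref> and an unrelated array C<@aref>; the contents of both are invented for the demonstration.

```perl
use strict;
use warnings;

my @aref = ("zero", "one", "two", "WRONG ARRAY");   # an array named @aref
my $aref = [ 10, 20, 30, 40 ];                      # a scalar holding a reference

my $with_arrow    = $aref->[3];   # fourth element of the anonymous array
my $without_arrow = $aref[3];     # fourth element of @aref

print "$with_arrow\n";      # prints 40
print "$without_arrow\n";   # prints WRONG ARRAY
```

If no C<@aref> had been declared, C<use strict> would refuse to compile C<$aref[3]>, which is one good reason to enable it.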
243 | |
a29d1a25 |
244 | =head2 An Example |
a1e2a320 |
245 | |
246 | Let's see a quick example of how all this is useful. |
247 | |
248 | First, remember that C<[1, 2, 3]> makes an anonymous array containing |
249 | C<(1, 2, 3)>, and gives you a reference to that array. |
250 | |
251 | Now think about |
252 | |
253 | @a = ( [1, 2, 3], |
254 | [4, 5, 6], |
255 | [7, 8, 9] |
256 | ); |
257 | |
258 | C<@a> is an array with three elements, and each one is a reference to |
259 | another array. |
260 | |
261 | C<$a[1]> is one of these references. It refers to an array, the array |
262 | containing C<(4, 5, 6)>, and because it is a reference to an array, |
a29d1a25 |
263 | B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the |
c47ff5f1 |
264 | third element from that array. C<< $a[1]->[2] >> is the 6. |
265 | Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a |
266 | two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get |
a1e2a320 |
267 | or set the element in any row and any column of the array. |
268 | |
269 | The notation still looks a little cumbersome, so there's one more |
270 | abbreviation: |
271 | |
a29d1a25 |
272 | =head2 Arrow Rule |
a1e2a320 |
273 | |
274 | In between two B<subscripts>, the arrow is optional. |
275 | |
c47ff5f1 |
276 | Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the |
a29d1a25 |
277 | same thing. Instead of C<< $a[0]->[1] = 23 >>, we can write |
278 | C<$a[0][1] = 23>; it means the same thing. |
a1e2a320 |
279 | |
280 | Now it really looks like two-dimensional arrays! |
281 | |
282 | You can see why the arrows are important. Without them, we would have |
283 | had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For |
284 | three-dimensional arrays, they let us write C<$x[2][3][5]> instead of |
285 | the unreadable C<${${$x[2]}[3]}[5]>. |
286 | |
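Here is a short sketch that exercises the two-subscript form on the three-by-three example from above. The loop variable names are arbitrary, and C<$#{ $a[$row] }> is just B<Use Rule 1> applied to C<$#array>, the index of the last element.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9],
        );

$a[0][1] = 23;                        # same as $a[0]->[1] = 23

my $same = $a[1][2] == ${$a[1]}[2];   # both forms name the element 6

# Walk every row and column using the two-subscript form.
my @rows;
for my $row (0 .. $#a) {
    my @cells;
    for my $col (0 .. $#{ $a[$row] }) {
        push @cells, $a[$row][$col];
    }
    push @rows, join ' ', @cells;
}
print "$_\n" for @rows;
```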
a1e2a320 |
287 | =head1 Solution |
288 | |
1da6492a |
289 | Here's the answer to the problem I posed earlier, of reformatting a |
290 | file of city and country names. |
a1e2a320 |
291 | |
a29d1a25 |
292 | 1 my %table; |
293 | |
294 | 2 while (<>) { |
295 | 3 chomp; |
296 | 4 my ($city, $country) = split /, /; |
297 | 5 $table{$country} = [] unless exists $table{$country}; |
298 | 6 push @{$table{$country}}, $city; |
299 | 7 } |
300 | |
301 | 8 foreach my $country (sort keys %table) { |
302 | 9 print "$country: "; |
303 | 10 my @cities = @{$table{$country}}; |
304 | 11 print join ', ', sort @cities; |
305 | 12 print ".\n"; |
306 | 13 } |
307 | |
308 | |
309 | The program has two pieces: Lines 2-7 read the input and build a data |
310 | structure, and lines 8-13 analyze the data and print out the report. |
311 | We're going to have a hash, C<%table>, whose keys are country names, |
312 | and whose values are references to arrays of city names. The data |
313 | structure will look like this: |
314 | |
315 | |
316 |            %table |
317 |         +-------+---+ |
318 |         |       |   |   +-----------+--------+ |
319 |         |Germany| *---->| Frankfurt | Berlin | |
320 |         |       |   |   +-----------+--------+ |
321 |         +-------+---+ |
322 |         |       |   |   +----------+ |
323 |         |Finland| *---->| Helsinki | |
324 |         |       |   |   +----------+ |
325 |         +-------+---+ |
326 |         |       |   |   +---------+------------+----------+ |
327 |         |  USA  | *---->| Chicago | Washington | New York | |
328 |         |       |   |   +---------+------------+----------+ |
329 |         +-------+---+ |
330 | |
331 | We'll look at output first. Supposing we already have this structure, |
332 | how do we print it out? |
333 | |
0c76616b |
334 | 8 foreach my $country (sort keys %table) { |
335 | 9 print "$country: "; |
336 | 10 my @cities = @{$table{$country}}; |
337 | 11 print join ', ', sort @cities; |
338 | 12 print ".\n"; |
339 | 13 } |
340 | |
a29d1a25 |
341 | C<%table> is an |
342 | ordinary hash, and we get a list of keys from it, sort the keys, and |
343 | loop over the keys as usual. The only use of references is in line 10. |
344 | C<$table{$country}> looks up the key C<$country> in the hash |
345 | and gets the value, which is a reference to an array of cities in that country. |
346 | B<Use Rule 1> says that |
347 | we can recover the array by saying |
348 | C<@{$table{$country}}>. Line 10 is just like |
a1e2a320 |
349 | |
a29d1a25 |
350 | @cities = @array; |
a1e2a320 |
351 | |
352 | except that the name C<array> has been replaced by the reference |
a29d1a25 |
353 | C<{$table{$country}}>. The C<@> tells Perl to get the entire array. |
354 | Having gotten the list of cities, we sort it, join it, and print it |
355 | out as usual. |
a1e2a320 |
356 | |
a29d1a25 |
357 | Lines 2-7 are responsible for building the structure in the first |
0c76616b |
358 | place. Here they are again: |
a1e2a320 |
359 | |
a29d1a25 |
360 | 2 while (<>) { |
361 | 3 chomp; |
362 | 4 my ($city, $country) = split /, /; |
363 | 5 $table{$country} = [] unless exists $table{$country}; |
364 | 6 push @{$table{$country}}, $city; |
365 | 7 } |
a1e2a320 |
366 | |
a29d1a25 |
367 | Lines 2-4 acquire a city and country name. Line 5 looks to see if the |
368 | country is already present as a key in the hash. If it's not, the |
369 | program uses the C<[]> notation (B<Make Rule 2>) to manufacture a new, |
370 | empty anonymous array of cities, and installs a reference to it into |
371 | the hash under the appropriate key. |
a1e2a320 |
372 | |
a29d1a25 |
373 | Line 6 installs the city name into the appropriate array. |
374 | C<$table{$country}> now holds a reference to the array of cities seen |
375 | in that country so far. Line 6 is exactly like |
a1e2a320 |
376 | |
a29d1a25 |
377 | push @array, $city; |
a1e2a320 |
378 | |
a29d1a25 |
379 | except that the name C<array> has been replaced by the reference |
380 | C<{$table{$country}}>. The C<push> adds a city name to the end of the |
381 | referred-to array. |
a1e2a320 |
382 | |
a29d1a25 |
383 | There's one fine point I skipped. Line 5 is unnecessary, and we can |
384 | get rid of it. |
385 | |
386 | 2 while (<>) { |
387 | 3 chomp; |
388 | 4 my ($city, $country) = split /, /; |
389 | 5 #### $table{$country} = [] unless exists $table{$country}; |
390 | 6 push @{$table{$country}}, $city; |
391 | 7 } |
392 | |
393 | If there's already an entry in C<%table> for the current C<$country>, |
394 | then nothing is different. Line 6 will locate the value in |
395 | C<$table{$country}>, which is a reference to an array, and push |
396 | C<$city> into the array. But |
397 | what does it do when |
398 | C<$country> holds a key, say C<Greece>, that is not yet in C<%table>? |
a1e2a320 |
399 | |
400 | This is Perl, so it does the exact right thing. It sees that you want |
1da6492a |
401 | to push C<Athens> onto an array that doesn't exist, so it helpfully |
a29d1a25 |
402 | makes a new, empty, anonymous array for you, installs it into |
403 | C<%table>, and then pushes C<Athens> onto it. This is called |
404 | `autovivification'--bringing things to life automatically. Perl saw |
405 | that the key wasn't in the hash, so it created a new hash entry |
406 | automatically. Perl saw that you wanted to use the hash value as an |
407 | array, so it created a new empty array and installed a reference to it |
408 | in the hash automatically. And as usual, Perl made the array one |
409 | element longer to hold the new city name. |
a1e2a320 |
410 | |
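You can watch autovivification happen with a small sketch like the one below. It uses the C<Greece> and C<Athens> values from the discussion above; the C<$before>, C<$after>, and C<$kind> variables are only there to record what Perl did.

```perl
use strict;
use warnings;

my %table;

my $before = exists $table{Greece} ? 1 : 0;   # 0: no such key yet

push @{ $table{Greece} }, "Athens";           # Perl creates the array for us

my $after = exists $table{Greece} ? 1 : 0;    # 1: the key now exists
my $kind  = ref $table{Greece};               # the value is an array reference

print "before=$before after=$after kind=$kind cities=@{ $table{Greece} }\n";
```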
411 | =head1 The Rest |
412 | |
413 | I promised to give you 90% of the benefit with 10% of the details, and |
414 | that means I left out 90% of the details. Now that you have an |
415 | overview of the important parts, it should be easier to read the |
416 | L<perlref> manual page, which discusses 100% of the details. |
417 | |
418 | Some of the highlights of L<perlref>: |
419 | |
420 | =over 4 |
421 | |
422 | =item * |
423 | |
424 | You can make references to anything, including scalars, functions, and |
425 | other references. |
426 | |
427 | =item * |
428 | |
0c76616b |
429 | In B<Use Rule 1>, you can omit the curly brackets whenever the thing |
1da6492a |
430 | inside them is an atomic scalar variable like C<$aref>. For example, |
a1e2a320 |
431 | C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as |
1da6492a |
432 | C<${$aref}[1]>. If you're just starting out, you may want to adopt |
d98d5fff |
433 | the habit of always including the curly brackets. |
a1e2a320 |
434 | |
a29d1a25 |
435 | =item * |
436 | |
437 | This doesn't copy the underlying array: |
438 | |
439 | $aref2 = $aref1; |
440 | |
441 | You get two references to the same array. If you modify |
442 | C<< $aref1->[23] >> and then look at |
443 | C<< $aref2->[23] >> you'll see the change. |
444 | |
445 | To copy the array, use |
446 | |
447 | $aref2 = [@{$aref1}]; |
448 | |
449 | This uses C<[...]> notation to create a new anonymous array, and |
450 | C<$aref2> is assigned a reference to the new array. The new array is |
451 | initialized with the contents of the array referred to by C<$aref1>. |
452 | |
453 | Similarly, to copy an anonymous hash, you can use |
454 | |
0c76616b |
455 | $href2 = {%{$href1}}; |
a29d1a25 |
456 | |
a1e2a320 |
457 | =item * |
458 | |
0c76616b |
459 | To see if a variable contains a reference, use the C<ref> function. It |
a29d1a25 |
460 | returns true if its argument is a reference. Actually it's a little |
461 | better than that: It returns C<HASH> for hash references and C<ARRAY> |
462 | for array references. |
a1e2a320 |
463 | |
464 | =item * |
465 | |
466 | If you try to use a reference like a string, you get strings like |
467 | |
468 | ARRAY(0x80f5dec) or HASH(0x826afc0) |
469 | |
470 | If you ever see a string that looks like this, you'll know you |
471 | printed out a reference by mistake. |
472 | |
473 | A side effect of this representation is that you can use C<eq> to see |
474 | if two references refer to the same thing. (But you should usually use |
475 | C<==> instead because it's much faster.) |
476 | |
477 | =item * |
478 | |
479 | You can use a string as if it were a reference. If you use the string |
480 | C<"foo"> as an array reference, it's taken to be a reference to the |
0c76616b |
481 | array C<@foo>. This is called a I<soft reference> or I<symbolic |
482 | reference>. The declaration C<use strict 'refs'> disables this |
483 | feature, which can cause all sorts of trouble if you use it by accident. |
a1e2a320 |
484 | |
485 | =back |
486 | |
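Several of these highlights can be tried out in one short sketch. It is only an illustration: the variable names and sample values are invented, and the anonymous function is written with C<sub { ... }>, which L<perlref> describes.

```perl
use strict;
use warnings;

my $aref1 = [ 1, 2, 3 ];
my $alias = $aref1;           # second reference to the same array
my $copy  = [ @{$aref1} ];    # reference to a new array with the same contents

$aref1->[0] = 99;

print "alias sees $alias->[0]\n";   # the change is visible through the alias
print "copy sees $copy->[0]\n";     # the copy still holds the old value

# == compares references numerically, so it is true only for the same array.
my $same_array = ($aref1 == $alias) ? "yes" : "no";
my $same_copy  = ($aref1 == $copy)  ? "yes" : "no";
print "aref1 and alias same array? $same_array\n";
print "aref1 and copy same array? $same_copy\n";

my $href = {};                        # reference to an anonymous hash
my $sref = \"hello";                  # reference to a scalar
my $cref = sub { return "called" };   # reference to an anonymous function
my @kinds = (ref($aref1), ref($href), ref($sref), ref($cref));
print "@kinds\n";

my $plain = ref("string");            # empty string: not a reference
```

Note that C<[ @{$aref1} ]> copies only the top level. If the array itself holds references, the copy shares the inner structures with the original.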
487 | You might prefer to go on to L<perllol> instead of L<perlref>; it |
488 | discusses lists of lists and multidimensional arrays in detail. After |
489 | that, you should move on to L<perldsc>; it's a Data Structure Cookbook |
490 | that shows recipes for using and printing out arrays of hashes, hashes |
491 | of arrays, and other kinds of data. |
492 | |
493 | =head1 Summary |
494 | |
495 | Everyone needs compound data structures, and in Perl the way you get |
496 | them is with references. There are four important rules for managing |
497 | references: Two for making references and two for using them. Once |
498 | you know these rules you can do most of the important things you need |
499 | to do with references. |
500 | |
501 | =head1 Credits |
502 | |
0c76616b |
503 | Author: Mark Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>) |
a1e2a320 |
504 | |
1da6492a |
505 | This article originally appeared in I<The Perl Journal> |
f224927c |
506 | ( http://www.tpj.com/ ) volume 3, #2. Reprinted with permission. |
a1e2a320 |
507 | |
508 | The original title was I<Understand References Today>. |
509 | |
1da6492a |
510 | =head2 Distribution Conditions |
511 | |
512 | Copyright 1998 The Perl Journal. |
513 | |
49d5cbad |
514 | This documentation is free; you can redistribute it and/or modify it |
515 | under the same terms as Perl itself. |
1da6492a |
516 | |
517 | Irrespective of its distribution, all code examples in these files are |
518 | hereby placed into the public domain. You are permitted and |
519 | encouraged to use this code in your own programs for fun or for profit |
520 | as you see fit. A simple comment in the code giving credit would be |
521 | courteous but is not required. |
a1e2a320 |
522 | |
a1e2a320 |
523 | |
1da6492a |
524 | |
525 | |
526 | =cut |