Commit | Line | Data |
a1e2a320 |
1 | =head1 NAME |
2 | |
3 | perlreftut - Mark's very short tutorial about references |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | One of the most important new features in Perl 5 was the capability to |
8 | manage complicated data structures like multidimensional arrays and |
9 | nested hashes. To enable these, Perl 5 introduced a feature called |
10 | `references', and using references is the key to managing complicated, |
11 | structured data in Perl. Unfortunately, there's a lot of funny syntax |
12 | to learn, and the main manual page can be hard to follow. The manual |
1da6492a |
13 | is quite complete, and sometimes people find that a problem, because |
14 | it can be hard to tell what is important and what isn't. |
a1e2a320 |
15 | |
16 | Fortunately, you only need to know 10% of what's in the main page to get |
17 | 90% of the benefit. This page will show you that 10%. |
18 | |
19 | =head1 Who Needs Complicated Data Structures? |
20 | |
21 | One problem that came up all the time in Perl 4 was how to represent a |
22 | hash whose values were lists. Perl 4 had hashes, of course, but the |
91ee9109 |
23 | values had to be scalars; they couldn't be lists. |
a1e2a320 |
24 | |
25 | Why would you want a hash of lists? Let's take a simple example: You |
1da6492a |
26 | have a file of city and country names, like this: |
a1e2a320 |
27 | |
1da6492a |
28 | Chicago, USA |
29 | Frankfurt, Germany |
30 | Berlin, Germany |
31 | Washington, USA |
32 | Helsinki, Finland |
33 | New York, USA |
a1e2a320 |
34 | |
1da6492a |
35 | and you want to produce an output like this, with each country mentioned |
36 | once, and then an alphabetical list of the cities in that country: |
a1e2a320 |
37 | |
1da6492a |
38 | Finland: Helsinki. |
39 | Germany: Berlin, Frankfurt. |
40 | USA: Chicago, New York, Washington. |
a1e2a320 |
41 | |
1da6492a |
42 | The natural way to do this is to have a hash whose keys are country |
43 | names. Associated with each country name key is a list of the cities in |
44 | that country. Each time you read a line of input, split it into a country |
a1e2a320 |
45 | and a city, look up the list of cities already known to be in that |
1da6492a |
46 | country, and append the new city to the list. When you're done reading |
a1e2a320 |
47 | the input, iterate over the hash as usual, sorting each list of cities |
48 | before you print it out. |
49 | |
50 | If hash values can't be lists, you lose. In Perl 4, hash values can't |
51 | be lists; they can only be strings. You lose. You'd probably have to |
52 | combine all the cities into a single string somehow, and then when |
53 | time came to write the output, you'd have to break the string into a |
54 | list, sort the list, and turn it back into a string. This is messy |
55 | and error-prone. And it's frustrating, because Perl already has |
56 | perfectly good lists that would solve the problem if only you could |
57 | use them. |
58 | |
59 | =head1 The Solution |
60 | |
1da6492a |
61 | By the time Perl 5 rolled around, we were already stuck with this |
62 | design: Hash values must be scalars. The solution to this is |
a1e2a320 |
63 | references. |
64 | |
65 | A reference is a scalar value that I<refers to> an entire array or an |
1da6492a |
66 | entire hash (or to just about anything else). Names are one kind of |
e937c8c3 |
67 | reference that you're already familiar with. Think of the President |
68 | of the United States: a messy, inconvenient bag of blood and bones. |
69 | But to talk about him, or to represent him in a computer program, all |
70 | you need is the easy, convenient scalar string "George Bush". |
a1e2a320 |
71 | |
72 | References in Perl are like names for arrays and hashes. They're |
73 | Perl's private, internal names, so you can be sure they're |
e937c8c3 |
74 | unambiguous. Unlike "George Bush", a reference only refers to one |
a1e2a320 |
75 | thing, and you always know what it refers to. If you have a reference |
76 | to an array, you can recover the entire array from it. If you have a |
77 | reference to a hash, you can recover the entire hash. But the |
78 | reference is still an easy, compact scalar value. |
79 | |
80 | You can't have a hash whose values are arrays; hash values can only be |
81 | scalars. We're stuck with that. But a single reference can refer to |
82 | an entire array, and references are scalars, so you can have a hash of |
83 | references to arrays, and it'll act a lot like a hash of arrays, and |
84 | it'll be just as useful as a hash of arrays. |
85 | |
1da6492a |
86 | We'll come back to this city-country problem later, after we've seen |
a1e2a320 |
87 | some syntax for managing references. |
88 | |
89 | |
90 | =head1 Syntax |
91 | |
92 | There are just two ways to make a reference, and just two ways to use |
93 | it once you have it. |
94 | |
95 | =head2 Making References |
96 | |
a29d1a25 |
97 | =head3 B<Make Rule 1> |
a1e2a320 |
98 | |
99 | If you put a C<\> in front of a variable, you get a |
100 | reference to that variable. |
101 | |
102 | $aref = \@array; # $aref now holds a reference to @array |
103 | $href = \%hash; # $href now holds a reference to %hash |
91ee9109 |
104 | $sref = \$scalar; # $sref now holds a reference to $scalar |
a1e2a320 |
105 | |
106 | Once the reference is stored in a variable like $aref or $href, you |
107 | can copy it or store it just the same as any other scalar value: |
108 | |
109 | $xy = $aref; # $xy now holds a reference to @array |
110 | $p[3] = $href; # $p[3] now holds a reference to %hash |
111 | $z = $p[3]; # $z now holds a reference to %hash |
112 | |
113 | |
114 | These examples show how to make references to variables with names. |
115 | Sometimes you want to make an array or a hash that doesn't have a |
116 | name. This is analogous to the way you like to be able to use the |
117 | string C<"\n"> or the number 80 without having to store it in a named |
118 | variable first. |
119 | |
120 | =head3 B<Make Rule 2> |
121 | |
122 | C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to |
0c76616b |
123 | that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a |
a1e2a320 |
124 | reference to that hash. |
125 | |
91ee9109 |
126 | $aref = [ 1, "foo", undef, 13 ]; |
a1e2a320 |
127 | # $aref now holds a reference to an array |
128 | |
91ee9109 |
129 | $href = { APR => 4, AUG => 8 }; |
a1e2a320 |
130 | # $href now holds a reference to a hash |
131 | |
132 | |
133 | The references you get from rule 2 are the same kind of |
134 | references that you get from rule 1: |
135 | |
136 | # This: |
137 | $aref = [ 1, 2, 3 ]; |
138 | |
139 | # Does the same as this: |
140 | @array = (1, 2, 3); |
141 | $aref = \@array; |
142 | |
143 | |
144 | The first line is an abbreviation for the following two lines, except |
145 | that it doesn't create the superfluous array variable C<@array>. |
146 | |
a29d1a25 |
147 | If you write just C<[]>, you get a new, empty anonymous array. |
148 | If you write just C<{}>, you get a new, empty anonymous hash. |
149 | |
a1e2a320 |
150 | |
151 | =head2 Using References |
152 | |
153 | What can you do with a reference once you have it? It's a scalar |
154 | value, and we've seen that you can store it as a scalar and get it back |
155 | again just like any scalar. There are just two more ways to use it: |
156 | |
a29d1a25 |
157 | =head3 B<Use Rule 1> |
a1e2a320 |
158 | |
a29d1a25 |
159 | You can always use an array reference, in curly braces, in place of |
160 | the name of an array. For example, C<@{$aref}> instead of C<@array>. |
a1e2a320 |
161 | |
162 | Here are some examples of that: |
163 | |
164 | Arrays: |
165 | |
166 | |
167 | @a              @{$aref}            An array |
168 | reverse @a      reverse @{$aref}    Reverse the array |
169 | $a[3]           ${$aref}[3]         An element of the array |
170 | $a[3] = 17      ${$aref}[3] = 17    Assigning an element |
171 | |
172 | |
173 | On each line are two expressions that do the same thing. The |
0c76616b |
174 | left-hand versions operate on the array C<@a>. The right-hand |
175 | versions operate on the array that is referred to by C<$aref>. Once |
176 | they find the array they're operating on, both versions do the same |
177 | things to the arrays. |
a1e2a320 |
178 | |
179 | Using a hash reference is I<exactly> the same: |
180 | |
181 | %h                %{$href}                A hash |
182 | keys %h           keys %{$href}           Get the keys from the hash |
183 | $h{'red'}         ${$href}{'red'}         An element of the hash |
184 | $h{'red'} = 17    ${$href}{'red'} = 17    Assigning an element |
185 | |
a29d1a25 |
186 | Whatever you want to do with a reference, B<Use Rule 1> tells you how |
187 | to do it. You just write the Perl code that you would have written |
188 | for doing the same thing to a regular array or hash, and then replace |
189 | the array or hash name with C<{$reference}>. "How do I loop over an |
190 | array when all I have is a reference?" Well, to loop over an array, you |
191 | would write |
192 | |
193 | for my $element (@array) { |
194 | ... |
195 | } |
196 | |
197 | so replace the array name, C<@array>, with the reference: |
198 | |
199 | for my $element (@{$aref}) { |
200 | ... |
201 | } |
202 | |
203 | "How do I print out the contents of a hash when all I have is a |
204 | reference?" First write the code for printing out a hash: |
205 | |
206 | for my $key (keys %hash) { |
207 | print "$key => $hash{$key}\n"; |
208 | } |
209 | |
210 | And then replace the hash name with the reference: |
211 | |
212 | for my $key (keys %{$href}) { |
213 | print "$key => ${$href}{$key}\n"; |
214 | } |
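The substitution described by B<Use Rule 1> can be tried end to end with a short sketch. The variable names and sample values below are made up for illustration; each comment shows the ordinary named-variable form that the line was derived from.

```perl
use strict;
use warnings;

my $aref = [ 'red', 'green', 'blue' ];   # anonymous array (Make Rule 2)
my %hash = ( APR => 4, AUG => 8 );
my $href = \%hash;                       # reference to a named hash (Make Rule 1)

# Use Rule 1: write the ordinary code, then put {$reference}
# where the array or hash name would go.
my $count    = scalar @{$aref};          # like: scalar @array
my @reversed = reverse @{$aref};         # like: reverse @array
${$aref}[1]  = 'yellow';                 # like: $array[1] = 'yellow'

${$href}{DEC} = 12;                      # like: $hash{DEC} = 12
for my $key (sort keys %{$href}) {       # like: sort keys %hash
    print "$key => ${$href}{$key}\n";
}

print "count: $count\n";
print "reversed: @reversed\n";
print "now: @{$aref}\n";
print "DEC seen through the named hash: $hash{DEC}\n";
```

Because C<$href> refers to C<%hash> itself, the assignment made through the reference shows up when the hash is read by name.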
215 | |
216 | =head3 B<Use Rule 2> |
a1e2a320 |
217 | |
a0981a78 |
218 | B<Use Rule 1> is all you really need, because it tells you how to do |
a29d1a25 |
219 | absolutely everything you ever need to do with references. But the |
220 | most common thing to do with an array or a hash is to extract a single |
221 | element, and the B<Use Rule 1> notation is cumbersome. So there is an |
222 | abbreviation. |
a1e2a320 |
223 | |
c47ff5f1 |
224 | C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >> |
a1e2a320 |
225 | instead. |
226 | |
227 | C<${$href}{red}> is too hard to read, so you can write |
c47ff5f1 |
228 | C<< $href->{red} >> instead. |
a1e2a320 |
229 | |
c47ff5f1 |
230 | If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is |
a1e2a320 |
231 | the fourth element of the array. Don't confuse this with C<$aref[3]>, |
232 | which is the fourth element of a totally different array, one |
233 | deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the |
234 | same way that C<$item> and C<@item> are. |
235 | |
c47ff5f1 |
236 | Similarly, C<< $href->{'red'} >> is part of the hash referred to by |
a1e2a320 |
237 | the scalar variable C<$href>, perhaps even one with no name. |
238 | C<$href{'red'}> is part of the deceptively named C<%href> hash. It's |
c47ff5f1 |
239 | easy to leave out the C<< -> >>, and if you do, you'll get |
a1e2a320 |
240 | bizarre results when your program gets array and hash elements out of |
241 | totally unexpected hashes and arrays that weren't the ones you wanted |
242 | to use. |
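The difference between C<< $aref->[0] >> and C<$aref[0]> is easiest to see when both an array C<@aref> and a scalar C<$aref> exist at once. This is only a sketch with made-up contents; the two variables are deliberately given the same name to reproduce the trap.

```perl
use strict;
use warnings;

my @aref = ( 'wrong', 'array' );    # deceptively named array
my $aref = [ 'right', 'array' ];    # unrelated scalar holding a reference

my $via_arrow = $aref->[0];         # element of the referred-to array (Use Rule 2)
my $via_rule1 = ${$aref}[0];        # same element, spelled with Use Rule 1
my $no_arrow  = $aref[0];           # element of @aref, a different array

print "$via_arrow $via_rule1 $no_arrow\n";
```

Under C<use strict>, leaving out the arrow when no array named C<@aref> has been declared is reported at compile time instead of silently reading the wrong variable.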
243 | |
244 | |
a29d1a25 |
245 | =head2 An Example |
a1e2a320 |
246 | |
247 | Let's see a quick example of how all this is useful. |
248 | |
249 | First, remember that C<[1, 2, 3]> makes an anonymous array containing |
250 | C<(1, 2, 3)>, and gives you a reference to that array. |
251 | |
252 | Now think about |
253 | |
254 | @a = ( [1, 2, 3], |
255 | [4, 5, 6], |
256 | [7, 8, 9] |
257 | ); |
258 | |
259 | C<@a> is an array with three elements, and each one is a reference to |
260 | another array. |
261 | |
262 | C<$a[1]> is one of these references. It refers to an array, the array |
263 | containing C<(4, 5, 6)>, and because it is a reference to an array, |
a29d1a25 |
264 | B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the |
c47ff5f1 |
265 | third element from that array. C<< $a[1]->[2] >> is the 6. |
266 | Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a |
267 | two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get |
a1e2a320 |
268 | or set the element in any row and any column of the array. |
269 | |
270 | The notation still looks a little cumbersome, so there's one more |
91ee9109 |
271 | abbreviation: |
a1e2a320 |
272 | |
a29d1a25 |
273 | =head2 Arrow Rule |
a1e2a320 |
274 | |
275 | In between two B<subscripts>, the arrow is optional. |
276 | |
c47ff5f1 |
277 | Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the |
a29d1a25 |
278 | same thing. Instead of C<< $a[0]->[1] = 23 >>, we can write |
279 | C<$a[0][1] = 23>; it means the same thing. |
a1e2a320 |
280 | |
281 | Now it really looks like two-dimensional arrays! |
282 | |
283 | You can see why the arrows are important. Without them, we would have |
284 | had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For |
285 | three-dimensional arrays, they let us write C<$x[2][3][5]> instead of |
286 | the unreadable C<${${$x[2]}[3]}[5]>. |
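The three spellings can be compared directly. The sketch below uses a hypothetical array named C<@grid> holding the same numbers as the C<@a> example, reads one element in each notation, and then assigns through the shortest form.

```perl
use strict;
use warnings;

my @grid = ( [1, 2, 3],
             [4, 5, 6],
             [7, 8, 9] );

# All three expressions name the same element: row 1, column 2.
my $long  = ${$grid[1]}[2];     # Use Rule 1
my $arrow = $grid[1]->[2];      # Use Rule 2
my $short = $grid[1][2];        # Arrow Rule: arrow between subscripts omitted

$grid[0][1] = 23;               # assignment works through the short form too

for my $row (@grid) {           # each $row is a reference to one inner array
    print join(' ', @{$row}), "\n";
}
```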
287 | |
a1e2a320 |
288 | =head1 Solution |
289 | |
1da6492a |
290 | Here's the answer to the problem I posed earlier, of reformatting a |
291 | file of city and country names. |
a1e2a320 |
292 | |
a29d1a25 |
293 | 1 my %table; |
294 | |
295 | 2 while (<>) { |
296 | 3 chomp; |
297 | 4 my ($city, $country) = split /, /; |
298 | 5 $table{$country} = [] unless exists $table{$country}; |
299 | 6 push @{$table{$country}}, $city; |
300 | 7 } |
301 | |
302 | 8 foreach my $country (sort keys %table) { |
303 | 9 print "$country: "; |
304 | 10 my @cities = @{$table{$country}}; |
305 | 11 print join ', ', sort @cities; |
306 | 12 print ".\n"; |
307 | 13 } |
308 | |
309 | |
310 | The program has two pieces: lines 2-7 read the input and build a data |
311 | structure, and lines 8-13 analyze the data and print out the report. |
312 | We're going to have a hash, C<%table>, whose keys are country names, |
313 | and whose values are references to arrays of city names. The data |
314 | structure will look like this: |
315 | |
316 | |
317 |    %table |
91ee9109 |
318 | +-------+---+ |
a29d1a25 |
319 | |       |   |   +-----------+--------+ |
320 | |Germany| *---->| Frankfurt | Berlin | |
321 | |       |   |   +-----------+--------+ |
322 | +-------+---+ |
323 | |       |   |   +----------+ |
324 | |Finland| *---->| Helsinki | |
325 | |       |   |   +----------+ |
326 | +-------+---+ |
327 | |       |   |   +---------+------------+----------+ |
328 | |  USA  | *---->| Chicago | Washington | New York | |
329 | |       |   |   +---------+------------+----------+ |
330 | +-------+---+ |
331 | |
332 | We'll look at output first. Supposing we already have this structure, |
333 | how do we print it out? |
334 | |
0c76616b |
335 | 8 foreach my $country (sort keys %table) { |
336 | 9 print "$country: "; |
337 | 10 my @cities = @{$table{$country}}; |
338 | 11 print join ', ', sort @cities; |
339 | 12 print ".\n"; |
340 | 13 } |
341 | |
a29d1a25 |
342 | C<%table> is an |
343 | ordinary hash, and we get a list of keys from it, sort the keys, and |
344 | loop over the keys as usual. The only use of references is in line 10. |
345 | C<$table{$country}> looks up the key C<$country> in the hash |
346 | and gets the value, which is a reference to an array of cities in that country. |
347 | B<Use Rule 1> says that |
348 | we can recover the array by saying |
349 | C<@{$table{$country}}>. Line 10 is just like |
a1e2a320 |
350 | |
a29d1a25 |
351 | @cities = @array; |
a1e2a320 |
352 | |
353 | except that the name C<array> has been replaced by the reference |
a29d1a25 |
354 | C<{$table{$country}}>. The C<@> tells Perl to get the entire array. |
355 | Having gotten the list of cities, we sort it, join it, and print it |
356 | out as usual. |
a1e2a320 |
357 | |
a29d1a25 |
358 | Lines 2-7 are responsible for building the structure in the first |
0c76616b |
359 | place. Here they are again: |
a1e2a320 |
360 | |
a29d1a25 |
361 | 2 while (<>) { |
362 | 3 chomp; |
363 | 4 my ($city, $country) = split /, /; |
364 | 5 $table{$country} = [] unless exists $table{$country}; |
365 | 6 push @{$table{$country}}, $city; |
366 | 7 } |
a1e2a320 |
367 | |
a29d1a25 |
368 | Lines 2-4 acquire a city and country name. Line 5 looks to see if the |
369 | country is already present as a key in the hash. If it's not, the |
370 | program uses the C<[]> notation (B<Make Rule 2>) to manufacture a new, |
371 | empty anonymous array of cities, and installs a reference to it into |
372 | the hash under the appropriate key. |
a1e2a320 |
373 | |
a29d1a25 |
374 | Line 6 installs the city name into the appropriate array. |
375 | C<$table{$country}> now holds a reference to the array of cities seen |
376 | in that country so far. Line 6 is exactly like |
a1e2a320 |
377 | |
a29d1a25 |
378 | push @array, $city; |
a1e2a320 |
379 | |
a29d1a25 |
380 | except that the name C<array> has been replaced by the reference |
381 | C<{$table{$country}}>. The C<push> adds a city name to the end of the |
382 | referred-to array. |
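For readers who want to run the two halves together, here is a minimal sketch of the same program without the listing's line numbers. It makes two assumptions that are not in the original: the sample input is held in an array called C<@lines> instead of being read with C<< <> >>, and the finished lines are collected in an array called C<@report> before printing.

```perl
use strict;
use warnings;

# Sample input from the start of the article, kept in memory so the
# sketch runs without an input file.
my @lines = (
    "Chicago, USA",
    "Frankfurt, Germany",
    "Berlin, Germany",
    "Washington, USA",
    "Helsinki, Finland",
    "New York, USA",
);

# Build the structure: country name => reference to an array of cities.
my %table;
for my $line (@lines) {
    my ($city, $country) = split /, /, $line;
    $table{$country} = [] unless exists $table{$country};
    push @{$table{$country}}, $city;
}

# Produce the report: countries in order, cities sorted within each.
my @report;
for my $country (sort keys %table) {
    my @cities = @{$table{$country}};
    push @report, "$country: " . join(', ', sort @cities) . ".";
}

print "$_\n" for @report;
```

With the sample data this prints the three lines shown at the start of the article, one per country.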
a1e2a320 |
383 | |
a29d1a25 |
384 | There's one fine point I skipped. Line 5 is unnecessary, and we can |
91ee9109 |
385 | get rid of it. |
a29d1a25 |
386 | |
387 | 2 while (<>) { |
388 | 3 chomp; |
389 | 4 my ($city, $country) = split /, /; |
390 | 5 #### $table{$country} = [] unless exists $table{$country}; |
391 | 6 push @{$table{$country}}, $city; |
392 | 7 } |
393 | |
394 | If there's already an entry in C<%table> for the current C<$country>, |
395 | then nothing is different. Line 6 will locate the value in |
396 | C<$table{$country}>, which is a reference to an array, and push |
397 | C<$city> into the array. But |
398 | what does it do when |
399 | C<$country> holds a key, say C<Greece>, that is not yet in C<%table>? |
a1e2a320 |
400 | |
401 | This is Perl, so it does the exact right thing. It sees that you want |
1da6492a |
402 | to push C<Athens> onto an array that doesn't exist, so it helpfully |
a29d1a25 |
403 | makes a new, empty, anonymous array for you, installs it into |
404 | C<%table>, and then pushes C<Athens> onto it. This is called |
405 | `autovivification'--bringing things to life automatically. Perl saw |
406 | that the key wasn't in the hash, so it created a new hash entry |
407 | automatically. Perl saw that you wanted to use the hash value as an |
408 | array, so it created a new empty array and installed a reference to it |
409 | in the hash automatically. And as usual, Perl made the array one |
410 | element longer to hold the new city name. |
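Autovivification is easy to observe in isolation. The sketch below starts from an empty C<%table>, checks that the key is absent, and then pushes onto an array that does not exist yet. The second city and the variable names C<$before> and C<$kind> are made up for illustration, and the C<ref> function is used only to show what ended up in the hash.

```perl
use strict;
use warnings;

my %table;

# Nothing is stored under Greece yet.
my $before = exists $table{Greece} ? 1 : 0;

# The first push creates the hash entry and an empty anonymous array,
# then appends to it; the second push finds the array already there.
push @{$table{Greece}}, 'Athens';
push @{$table{Greece}}, 'Patras';

my $kind = ref $table{Greece};      # 'ARRAY' for an array reference

print "existed before: $before\n";
print "value is a reference to: $kind\n";
print "cities: @{$table{Greece}}\n";
```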
a1e2a320 |
411 | |
412 | =head1 The Rest |
413 | |
414 | I promised to give you 90% of the benefit with 10% of the details, and |
415 | that means I left out 90% of the details. Now that you have an |
416 | overview of the important parts, it should be easier to read the |
417 | L<perlref> manual page, which discusses 100% of the details. |
418 | |
419 | Some of the highlights of L<perlref>: |
420 | |
421 | =over 4 |
422 | |
423 | =item * |
424 | |
425 | You can make references to anything, including scalars, functions, and |
426 | other references. |
427 | |
428 | =item * |
429 | |
0c76616b |
430 | In B<Use Rule 1>, you can omit the curly brackets whenever the thing |
1da6492a |
431 | inside them is an atomic scalar variable like C<$aref>. For example, |
a1e2a320 |
432 | C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as |
1da6492a |
433 | C<${$aref}[1]>. If you're just starting out, you may want to adopt |
d98d5fff |
434 | the habit of always including the curly brackets. |
a1e2a320 |
435 | |
a29d1a25 |
436 | =item * |
437 | |
438 | This doesn't copy the underlying array: |
439 | |
91ee9109 |
440 | $aref2 = $aref1; |
a29d1a25 |
441 | |
91ee9109 |
442 | You get two references to the same array. If you modify |
a29d1a25 |
443 | C<< $aref1->[23] >> and then look at |
91ee9109 |
444 | C<< $aref2->[23] >> you'll see the change. |
a29d1a25 |
445 | |
446 | To copy the array, use |
447 | |
448 | $aref2 = [@{$aref1}]; |
449 | |
450 | This uses C<[...]> notation to create a new anonymous array, and |
451 | C<$aref2> is assigned a reference to the new array. The new array is |
452 | initialized with the contents of the array referred to by C<$aref1>. |
453 | |
454 | Similarly, to copy an anonymous hash, you can use |
455 | |
0c76616b |
456 | $href2 = {%{$href1}}; |
a29d1a25 |
457 | |
91ee9109 |
458 | =item * |
a1e2a320 |
459 | |
0c76616b |
460 | To see if a variable contains a reference, use the C<ref> function. It |
a29d1a25 |
461 | returns true if its argument is a reference. Actually it's a little |
462 | better than that: It returns C<HASH> for hash references and C<ARRAY> |
463 | for array references. |
a1e2a320 |
464 | |
91ee9109 |
465 | =item * |
a1e2a320 |
466 | |
467 | If you try to use a reference like a string, you get strings like |
468 | |
469 | ARRAY(0x80f5dec) or HASH(0x826afc0) |
470 | |
471 | If you ever see a string that looks like this, you'll know you |
472 | printed out a reference by mistake. |
473 | |
474 | A side effect of this representation is that you can use C<eq> to see |
475 | if two references refer to the same thing. (But you should usually use |
476 | C<==> instead because it's much faster.) |
477 | |
478 | =item * |
479 | |
480 | You can use a string as if it were a reference. If you use the string |
481 | C<"foo"> as an array reference, it's taken to be a reference to the |
0c76616b |
482 | array C<@foo>. This is called a I<soft reference> or I<symbolic |
483 | reference>. The declaration C<use strict 'refs'> disables this |
484 | feature, which can cause all sorts of trouble if you use it by accident. |
a1e2a320 |
485 | |
486 | =back |
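Several of these points can be checked in one short sketch: copying a reference versus copying the array, comparing references with C<==>, and asking C<ref> what a value refers to. All names and values below are made up for illustration.

```perl
use strict;
use warnings;

my $aref1 = [ 1, 2, 3 ];
my $alias = $aref1;              # second reference to the same array
my $copy  = [ @{$aref1} ];       # new anonymous array with the same contents

$aref1->[0] = 'changed';

print "alias sees: $alias->[0]\n";   # the change is visible through the alias
print "copy sees:  $copy->[0]\n";    # the copy still holds the old value

# == compares what the references point at, not the contents.
my $same_as_alias = ( $alias == $aref1 ) ? 1 : 0;
my $same_as_copy  = ( $copy  == $aref1 ) ? 1 : 0;
print "alias is the same array: $same_as_alias\n";
print "copy is the same array:  $same_as_copy\n";

# ref returns ARRAY or HASH for references and a false value otherwise.
my $href  = {};
my $kinds = join ' ', ref($aref1), ref($href);
print "kinds: $kinds\n";
print "a plain string is not a reference\n" unless ref('plain string');
```

The copy made with C<[ @{$aref1} ]> is one level deep: if the elements are themselves references, the new array holds the same references as the old one.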
487 | |
488 | You might prefer to go on to L<perllol> instead of L<perlref>; it |
489 | discusses lists of lists and multidimensional arrays in detail. After |
490 | that, you should move on to L<perldsc>; it's a Data Structure Cookbook |
491 | that shows recipes for using and printing out arrays of hashes, hashes |
492 | of arrays, and other kinds of data. |
493 | |
494 | =head1 Summary |
495 | |
496 | Everyone needs compound data structures, and in Perl the way you get |
497 | them is with references. There are four important rules for managing |
498 | references: Two for making references and two for using them. Once |
499 | you know these rules you can do most of the important things you need |
500 | to do with references. |
501 | |
502 | =head1 Credits |
503 | |
0c76616b |
504 | Author: Mark Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>) |
a1e2a320 |
505 | |
1da6492a |
506 | This article originally appeared in I<The Perl Journal> |
91ee9109 |
507 | ( http://www.tpj.com/ ) volume 3, #2. Reprinted with permission. |
a1e2a320 |
508 | |
509 | The original title was I<Understand References Today>. |
510 | |
1da6492a |
511 | =head2 Distribution Conditions |
512 | |
513 | Copyright 1998 The Perl Journal. |
514 | |
49d5cbad |
515 | This documentation is free; you can redistribute it and/or modify it |
516 | under the same terms as Perl itself. |
1da6492a |
517 | |
518 | Irrespective of its distribution, all code examples in these files are |
519 | hereby placed into the public domain. You are permitted and |
520 | encouraged to use this code in your own programs for fun or for profit |
521 | as you see fit. A simple comment in the code giving credit would be |
522 | courteous but is not required. |
a1e2a320 |
523 | |
a1e2a320 |
524 | |
1da6492a |
525 | |
526 | |
527 | =cut |