Commit | Line | Data |
a1e2a320 |
1 | |
2 | =head1 NAME |
3 | |
4 | perlreftut - Mark's very short tutorial about references |
5 | |
6 | =head1 DESCRIPTION |
7 | |
8 | One of the most important new features in Perl 5 was the capability to |
9 | manage complicated data structures like multidimensional arrays and |
10 | nested hashes. To enable these, Perl 5 introduced a feature called |
11 | `references', and using references is the key to managing complicated, |
12 | structured data in Perl. Unfortunately, there's a lot of funny syntax |
13 | to learn, and the main manual page can be hard to follow. The manual |
1da6492a |
14 | is quite complete, and sometimes people find that a problem, because |
15 | it can be hard to tell what is important and what isn't. |
a1e2a320 |
16 | |
17 | Fortunately, you only need to know 10% of what's in the main page to get |
18 | 90% of the benefit. This page will show you that 10%. |
19 | |
20 | =head1 Who Needs Complicated Data Structures? |
21 | |
22 | One problem that came up all the time in Perl 4 was how to represent a |
23 | hash whose values were lists. Perl 4 had hashes, of course, but the |
24 | values had to be scalars; they couldn't be lists. |
25 | |
26 | Why would you want a hash of lists? Let's take a simple example: You |
1da6492a |
27 | have a file of city and country names, like this: |
a1e2a320 |
28 | |
1da6492a |
29 | Chicago, USA |
30 | Frankfurt, Germany |
31 | Berlin, Germany |
32 | Washington, USA |
33 | Helsinki, Finland |
34 | New York, USA |
a1e2a320 |
35 | |
1da6492a |
36 | and you want to produce an output like this, with each country mentioned |
37 | once, and then an alphabetical list of the cities in that country: |
a1e2a320 |
38 | |
1da6492a |
39 | Finland: Helsinki. |
40 | Germany: Berlin, Frankfurt. |
41 | USA: Chicago, New York, Washington. |
a1e2a320 |
42 | |
1da6492a |
43 | The natural way to do this is to have a hash whose keys are country |
44 | names. Associated with each country name key is a list of the cities in |
45 | that country. Each time you read a line of input, split it into a country |
a1e2a320 |
46 | and a city, look up the list of cities already known to be in that |
1da6492a |
47 | country, and append the new city to the list. When you're done reading |
a1e2a320 |
48 | the input, iterate over the hash as usual, sorting each list of cities |
49 | before you print it out. |
50 | |
51 | If hash values can't be lists, you lose. In Perl 4, hash values can't |
52 | be lists; they can only be strings. You lose. You'd probably have to |
53 | combine all the cities into a single string somehow, and then when |
54 | time came to write the output, you'd have to break the string into a |
55 | list, sort the list, and turn it back into a string. This is messy |
56 | and error-prone. And it's frustrating, because Perl already has |
57 | perfectly good lists that would solve the problem if only you could |
58 | use them. |
59 | |
60 | =head1 The Solution |
61 | |
1da6492a |
62 | By the time Perl 5 rolled around, we were already stuck with this |
63 | design: Hash values must be scalars. The solution to this is |
a1e2a320 |
64 | references. |
65 | |
66 | A reference is a scalar value that I<refers to> an entire array or an |
1da6492a |
67 | entire hash (or to just about anything else). Names are one kind of |
a1e2a320 |
68 | reference that you're already familiar with. Think of the President: |
69 | a messy, inconvenient bag of blood and bones. But to talk about him, |
70 | or to represent him in a computer program, all you need is the easy, |
71 | convenient scalar string "Bill Clinton". |
72 | |
73 | References in Perl are like names for arrays and hashes. They're |
74 | Perl's private, internal names, so you can be sure they're |
75 | unambiguous. Unlike "Bill Clinton", a reference only refers to one |
76 | thing, and you always know what it refers to. If you have a reference |
77 | to an array, you can recover the entire array from it. If you have a |
78 | reference to a hash, you can recover the entire hash. But the |
79 | reference is still an easy, compact scalar value. |
80 | |
81 | You can't have a hash whose values are arrays; hash values can only be |
82 | scalars. We're stuck with that. But a single reference can refer to |
83 | an entire array, and references are scalars, so you can have a hash of |
84 | references to arrays, and it'll act a lot like a hash of arrays, and |
85 | it'll be just as useful as a hash of arrays. |
86 | |
1da6492a |
87 | We'll come back to this city-country problem later, after we've seen |
a1e2a320 |
88 | some syntax for managing references. |
89 | |
90 | |
91 | =head1 Syntax |
92 | |
93 | There are just two ways to make a reference, and just two ways to use |
94 | it once you have it. |
95 | |
96 | =head2 Making References |
97 | |
98 | B<Make Rule 1> |
99 | |
100 | If you put a C<\> in front of a variable, you get a |
101 | reference to that variable. |
102 | |
103 | $aref = \@array; # $aref now holds a reference to @array |
104 | $href = \%hash; # $href now holds a reference to %hash |
105 | |
106 | Once the reference is stored in a variable like $aref or $href, you |
107 | can copy it or store it just the same as any other scalar value: |
108 | |
109 | $xy = $aref; # $xy now holds a reference to @array |
110 | $p[3] = $href; # $p[3] now holds a reference to %hash |
111 | $z = $p[3]; # $z now holds a reference to %hash |
112 | |
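As a minimal sketch of what Make Rule 1 implies: copying a reference copies only the scalar, never the array it refers to. The variable contents below are made up for illustration, and the C<@{$aref}> notation is borrowed from Use Rule 1, which is explained later on this page.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my $aref  = \@array;   # $aref now holds a reference to @array
my $xy    = $aref;     # copies the reference, not the array

push @array, 40;       # change the array through its own name

# Both references still lead to the one and only @array,
# so both see the element that was just added.
print scalar(@{$aref}), "\n";   # prints 4
print scalar(@{$xy}),   "\n";   # prints 4
```

However many copies of the reference you make, there is still only one array.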
113 | |
114 | These examples show how to make references to variables with names. |
115 | Sometimes you want to make an array or a hash that doesn't have a |
116 | name. This is analogous to the way you like to be able to use the |
117 | string C<"\n"> or the number 80 without having to store it in a named |
118 | variable first. |
119 | |
120 | B<Make Rule 2> |
121 | |
122 | C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to |
123 | that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a |
124 | reference to that hash. |
125 | |
126 | $aref = [ 1, "foo", undef, 13 ]; |
127 | # $aref now holds a reference to an array |
128 | |
129 | $href = { APR => 4, AUG => 8 }; |
130 | # $href now holds a reference to a hash |
131 | |
132 | |
133 | The references you get from rule 2 are the same kind of |
134 | references that you get from rule 1: |
135 | |
136 | # This: |
137 | $aref = [ 1, 2, 3 ]; |
138 | |
139 | # Does the same as this: |
140 | @array = (1, 2, 3); |
141 | $aref = \@array; |
142 | |
143 | |
144 | The first line is an abbreviation for the following two lines, except |
145 | that it doesn't create the superfluous array variable C<@array>. |
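One consequence of Make Rule 2 is worth seeing in a small, hedged example: each C<[ ITEMS ]> builds a brand-new array, so two references made this way never share storage even when the contents match. The names C<$first> and C<$second> are invented for this sketch.

```perl
use strict;
use warnings;

my $first  = [ 1, 2, 3 ];   # a new anonymous array
my $second = [ 1, 2, 3 ];   # another new array with the same contents

# Changing one array leaves the other alone.
push @{$first}, 4;

print scalar(@{$first}),  "\n";   # prints 4
print scalar(@{$second}), "\n";   # prints 3

my $href = { APR => 4, AUG => 8 };           # a new anonymous hash
print join(',', sort keys %{$href}), "\n";   # prints APR,AUG
```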
146 | |
147 | |
148 | =head2 Using References |
149 | |
150 | What can you do with a reference once you have it? It's a scalar |
151 | value, and we've seen that you can store it as a scalar and get it back |
152 | again just like any scalar. There are just two more ways to use it: |
153 | |
154 | B<Use Rule 1> |
155 | |
156 | If C<$aref> contains a reference to an array, then you |
157 | can put C<{$aref}> anywhere you would normally put the name of an |
158 | array. For example, C<@{$aref}> instead of C<@array>. |
159 | |
160 | Here are some examples of that: |
161 | |
162 | Arrays: |
163 | |
164 | |
165 |     @a              @{$aref}            An array |
166 |     reverse @a      reverse @{$aref}    Reverse the array |
167 |     $a[3]           ${$aref}[3]         An element of the array |
168 |     $a[3] = 17      ${$aref}[3] = 17    Assigning an element |
169 | |
170 | |
171 | On each line are two expressions that do the same thing. The |
172 | left-hand versions operate on the array C<@a>, and the right-hand |
173 | versions operate on the array that is referred to by C<$aref>, but |
174 | once they find the array they're operating on, they do the same things |
175 | to the arrays. |
176 | |
177 | Using a hash reference is I<exactly> the same: |
178 | |
179 |     %h                %{$href}              A hash |
180 |     keys %h           keys %{$href}         Get the keys from the hash |
181 |     $h{'red'}         ${$href}{'red'}       An element of the hash |
182 |     $h{'red'} = 17    ${$href}{'red'} = 17  Assigning an element |
183 | |
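The right-hand columns of the two tables above can be tried out directly. This is only a sketch with made-up data; C<$aref> and C<$href> are built with Make Rule 2 so that no named array or hash is involved at all.

```perl
use strict;
use warnings;

my $aref = [ 'a', 'b', 'c', 'd' ];
my $href = { red => 1, blue => 2 };

my @copy     = @{$aref};           # the whole array
my @reversed = reverse @{$aref};   # reverse the array
my $fourth   = ${$aref}[3];        # an element of the array
${$aref}[3]  = 17;                 # assigning an element

my @names = sort keys %{$href};    # get the keys from the hash
${$href}{'red'} = 17;              # assigning an element

print "@reversed\n";               # prints d c b a
print $fourth, "\n";               # prints d
print ${$aref}[3], "\n";           # prints 17
print "@names\n";                  # prints blue red
print ${$href}{'red'}, "\n";       # prints 17
```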
184 | |
185 | B<Use Rule 2> |
186 | |
187 | C<${$aref}[3]> is too hard to read, so you can write C<$aref-E<gt>[3]> |
188 | instead. |
189 | |
190 | C<${$href}{red}> is too hard to read, so you can write |
191 | C<$href-E<gt>{red}> instead. |
192 | |
193 | Most often, when you have an array or a hash, you want to get or set a |
194 | single element from it. C<${$aref}[3]> and C<${$href}{'red'}> have |
195 | too much punctuation, and Perl lets you abbreviate. |
196 | |
197 | If C<$aref> holds a reference to an array, then C<$aref-E<gt>[3]> is |
198 | the fourth element of the array. Don't confuse this with C<$aref[3]>, |
199 | which is the fourth element of a totally different array, one |
200 | deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the |
201 | same way that C<$item> and C<@item> are. |
202 | |
203 | Similarly, C<$href-E<gt>{'red'}> is part of the hash referred to by |
204 | the scalar variable C<$href>, perhaps even one with no name. |
205 | C<$href{'red'}> is part of the deceptively named C<%href> hash. It's |
206 | easy to leave out the C<-E<gt>> by accident, and if you do, you'll get |
207 | bizarre results when your program gets array and hash elements out of |
208 | totally unexpected hashes and arrays that weren't the ones you wanted |
209 | to use. |
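Here is a small sketch of Use Rule 2 and of the mix-up just described. The array C<@aref> is a deliberately confusing decoy invented for this example; it has nothing to do with the scalar C<$aref>.

```perl
use strict;
use warnings;

my @aref = ('not', 'the', 'right', 'array');   # decoy array named @aref
my $aref = [ 'w', 'x', 'y', 'z' ];             # reference to an anonymous array

print ${$aref}[3], "\n";   # prints z      (Use Rule 1)
print $aref->[3],  "\n";   # prints z      (Use Rule 2, same element)
print $aref[3],    "\n";   # prints array  (element of @aref, a different array)
```

With C<use strict> in effect and no C<@aref> declared, Perl refuses to compile C<$aref[3]> at all, which catches this slip before the program runs.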
210 | |
211 | |
212 | =head1 An Example |
213 | |
214 | Let's see a quick example of how all this is useful. |
215 | |
216 | First, remember that C<[1, 2, 3]> makes an anonymous array containing |
217 | C<(1, 2, 3)>, and gives you a reference to that array. |
218 | |
219 | Now think about |
220 | |
221 | @a = ( [1, 2, 3], |
222 | [4, 5, 6], |
223 | [7, 8, 9] |
224 | ); |
225 | |
226 | @a is an array with three elements, and each one is a reference to |
227 | another array. |
228 | |
229 | C<$a[1]> is one of these references. It refers to an array, the array |
230 | containing C<(4, 5, 6)>, and because it is a reference to an array, |
231 | B<USE RULE 2> says that we can write C<$a[1]-E<gt>[2]> to get the |
232 | third element from that array. C<$a[1]-E<gt>[2]> is the 6. |
233 | Similarly, C<$a[0]-E<gt>[1]> is the 2. What we have here is like a |
234 | two-dimensional array; you can write C<$a[ROW]-E<gt>[COLUMN]> to get |
235 | or set the element in any row and any column of the array. |
236 | |
237 | The notation still looks a little cumbersome, so there's one more |
238 | abbreviation: |
239 | |
240 | =head1 Arrow Rule |
241 | |
242 | In between two B<subscripts>, the arrow is optional. |
243 | |
244 | Instead of C<$a[1]-E<gt>[2]>, we can write C<$a[1][2]>; it means the |
245 | same thing. Instead of C<$a[0]-E<gt>[1]>, we can write C<$a[0][1]>; |
246 | it means the same thing. |
247 | |
248 | Now it really looks like two-dimensional arrays! |
249 | |
250 | You can see why the arrows are important. Without them, we would have |
251 | had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For |
252 | three-dimensional arrays, they let us write C<$x[2][3][5]> instead of |
253 | the unreadable C<${${$x[2]}[3]}[5]>. |
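The three spellings can be checked side by side in a short sketch. The names C<@grid> and C<$gref> are assumptions made for this example. Note that the arrow may be dropped only I<between> subscripts: when you start from a reference such as C<$gref>, the first arrow is still required.

```perl
use strict;
use warnings;

my @grid = ( [1, 2, 3],
             [4, 5, 6],
             [7, 8, 9],
           );

print ${$grid[1]}[2], "\n";   # prints 6  (Use Rule 1)
print $grid[1]->[2],  "\n";   # prints 6  (Use Rule 2)
print $grid[1][2],    "\n";   # prints 6  (Arrow Rule)

$grid[0][1] = 20;             # set row 0, column 1

my $gref = \@grid;            # a reference to the whole grid
print $gref->[2][0], "\n";    # prints 7; the first arrow cannot be omitted

# Print the grid one row at a time.
for my $row (@grid) {
    print join(' ', @{$row}), "\n";
}
```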
254 | |
255 | |
256 | =head1 Solution |
257 | |
1da6492a |
258 | Here's the answer to the problem I posed earlier, of reformatting a |
259 | file of city and country names. |
a1e2a320 |
260 | |
261 | 1 while (<>) { |
262 | 2 chomp; |
1da6492a |
263 | 3 my ($city, $country) = split /, /; |
264 | 4 push @{$table{$country}}, $city; |
a1e2a320 |
265 | 5 } |
266 | 6 |
1da6492a |
267 | 7 foreach $country (sort keys %table) { |
268 | 8 print "$country: "; |
269 | 9 my @cities = @{$table{$country}}; |
a1e2a320 |
270 | 10 print join ', ', sort @cities; |
271 | 11 print ".\n"; |
272 | 12 } |
273 | |
274 | |
275 | The program has two pieces: Lines 1--5 read the input and build a |
276 | data structure, and lines 7--12 analyze the data and print out the |
277 | report. |
278 | |
279 | In the first part, line 4 is the important one. We're going to have a |
1da6492a |
280 | hash, C<%table>, whose keys are country names, and whose values are |
a1e2a320 |
281 | (references to) arrays of city names. After acquiring a city and |
1da6492a |
282 | country name, the program looks up C<$table{$country}>, which holds (a |
283 | reference to) the list of cities seen in that country so far. Line 4 is |
a1e2a320 |
284 | totally analogous to |
285 | |
286 | push @array, $city; |
287 | |
288 | except that the name C<array> has been replaced by the reference |
1da6492a |
289 | C<{$table{$country}}>. The C<push> adds a city name to the end of the |
a1e2a320 |
290 | referred-to array. |
291 | |
292 | In the second part, line 9 is the important one. Again, |
1da6492a |
293 | C<$table{$country}> is (a reference to) the list of cities in the country, so |
a1e2a320 |
294 | we can recover the original list, and copy it into the array C<@cities>, |
1da6492a |
295 | by using C<@{$table{$country}}>. Line 9 is totally analogous to |
a1e2a320 |
296 | |
297 | @cities = @array; |
298 | |
299 | except that the name C<array> has been replaced by the reference |
1da6492a |
300 | C<{$table{$country}}>. The C<@> tells Perl to get the entire array. |
a1e2a320 |
301 | |
302 | The rest of the program is just familiar uses of C<chomp>, C<split>, C<sort>, |
303 | C<join>, and C<print>, and doesn't involve references at all. |
304 | |
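If you want to experiment without preparing an input file, the same logic can be wrapped in a subroutine that takes the lines as arguments. This is a sketch under stated assumptions, not the program above: C<build_report> is a hypothetical name, and the variables are declared with C<my> so that it runs under C<use strict>.

```perl
use strict;
use warnings;

# build_report is a hypothetical helper: it takes "City, Country" lines
# and returns the finished report as one string.
sub build_report {
    my @lines = @_;
    my %table;   # country name => reference to an array of city names

    for my $line (@lines) {
        chomp $line;
        my ($city, $country) = split /, /, $line;
        push @{$table{$country}}, $city;       # same idea as line 4
    }

    my $report = '';
    for my $country (sort keys %table) {
        my @cities = @{$table{$country}};      # same idea as line 9
        $report .= "$country: " . join(', ', sort @cities) . ".\n";
    }
    return $report;
}

print build_report(
    "Chicago, USA",    "Frankfurt, Germany", "Berlin, Germany",
    "Washington, USA", "Helsinki, Finland",  "New York, USA",
);
```

Given the six sample lines from the beginning of this page, it prints the three-line report shown there.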
305 | There's one fine point I skipped. Suppose the program has just read |
1da6492a |
306 | the first line in its input that happens to mention Greece. |
307 | Control is at line 4, C<$country> is C<'Greece'>, and C<$city> is |
308 | C<'Athens'>. Since this is the first city in Greece, |
309 | C<$table{$country}> is undefined---in fact there isn't a C<'Greece'> key |
a1e2a320 |
310 | in C<%table> at all. What does line 4 do here? |
311 | |
1da6492a |
312 | 4 push @{$table{$country}}, $city; |
a1e2a320 |
313 | |
314 | |
315 | This is Perl, so it does the exact right thing. It sees that you want |
1da6492a |
316 | to push C<Athens> onto an array that doesn't exist, so it helpfully |
a1e2a320 |
317 | makes a new, empty, anonymous array for you, installs it in the table, |
1da6492a |
318 | and then pushes C<Athens> onto it. This is called `autovivification'. |
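You can watch this happen in a few lines. The sketch below starts with an empty C<%table> and checks it before and after the C<push>; the layout of the checks is just one way to show the effect.

```perl
use strict;
use warnings;

my %table;   # empty: there is no 'Greece' key yet

print exists $table{Greece} ? "present\n" : "absent\n";   # prints absent

# No array exists yet, so Perl creates one, stores a reference
# to it in $table{Greece}, and then pushes onto it.
push @{$table{Greece}}, 'Athens';

print ref($table{Greece}), "\n";   # prints ARRAY
print "@{$table{Greece}}\n";       # prints Athens
```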
a1e2a320 |
319 | |
320 | |
321 | =head1 The Rest |
322 | |
323 | I promised to give you 90% of the benefit with 10% of the details, and |
324 | that means I left out 90% of the details. Now that you have an |
325 | overview of the important parts, it should be easier to read the |
326 | L<perlref> manual page, which discusses 100% of the details. |
327 | |
328 | Some of the highlights of L<perlref>: |
329 | |
330 | =over 4 |
331 | |
332 | =item * |
333 | |
334 | You can make references to anything, including scalars, functions, and |
335 | other references. |
336 | |
337 | =item * |
338 | |
d98d5fff |
339 | In B<USE RULE 1>, you can omit the curly brackets whenever the thing |
1da6492a |
340 | inside them is an atomic scalar variable like C<$aref>. For example, |
a1e2a320 |
341 | C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as |
1da6492a |
342 | C<${$aref}[1]>. If you're just starting out, you may want to adopt |
d98d5fff |
343 | the habit of always including the curly brackets. |
a1e2a320 |
344 | |
345 | =item * |
346 | |
347 | To see if a variable contains a reference, use the `ref' function. |
348 | It returns true if its argument is a reference. Actually it's a |
349 | little better than that: It returns HASH for hash references and |
1da6492a |
350 | ARRAY for array references. |
a1e2a320 |
351 | |
352 | =item * |
353 | |
354 | If you try to use a reference like a string, you get strings like |
355 | |
356 | ARRAY(0x80f5dec) or HASH(0x826afc0) |
357 | |
358 | If you ever see a string that looks like this, you'll know you |
359 | printed out a reference by mistake. |
360 | |
361 | A side effect of this representation is that you can use C<eq> to see |
362 | if two references refer to the same thing. (But you should usually use |
363 | C<==> instead because it's much faster.) |
364 | |
365 | =item * |
366 | |
367 | You can use a string as if it were a reference. If you use the string |
368 | C<"foo"> as an array reference, it's taken to be a reference to the |
369 | array C<@foo>. This is called a I<soft reference> or I<symbolic reference>. |
370 | |
371 | =back |
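A few of these points are easy to try for yourself. The following sketch uses invented variable names and covers C<ref>, the string form of a reference, and comparing references with C<==>. The exact hexadecimal address will differ from run to run, so the code only checks its general shape.

```perl
use strict;
use warnings;

my $aref  = [ 1, 2, 3 ];
my $href  = { APR => 4 };
my $alias = $aref;          # a second reference to the same array
my $other = [ 1, 2, 3 ];    # a different array with the same contents

print ref($aref), "\n";     # prints ARRAY
print ref($href), "\n";     # prints HASH
print "not a reference\n" unless ref("plain string");

# Used as a string, a reference looks like ARRAY(0x...).
my $text = "$aref";
print "looks like a printed reference\n"
    if $text =~ /^ARRAY\(0x[0-9a-f]+\)$/;

# == compares what the references refer to, not the contents.
print "same array\n"      if $aref == $alias;
print "different array\n" if $aref != $other;
```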
372 | |
373 | You might prefer to go on to L<perllol> instead of L<perlref>; it |
374 | discusses lists of lists and multidimensional arrays in detail. After |
375 | that, you should move on to L<perldsc>; it's a Data Structure Cookbook |
376 | that shows recipes for using and printing out arrays of hashes, hashes |
377 | of arrays, and other kinds of data. |
378 | |
379 | =head1 Summary |
380 | |
381 | Everyone needs compound data structures, and in Perl the way you get |
382 | them is with references. There are four important rules for managing |
383 | references: Two for making references and two for using them. Once |
384 | you know these rules you can do most of the important things you need |
385 | to do with references. |
386 | |
387 | =head1 Credits |
388 | |
389 | Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref@plover.com>) |
390 | |
1da6492a |
391 | This article originally appeared in I<The Perl Journal> |
392 | (http://tpj.com) volume 3, #2. Reprinted with permission. |
a1e2a320 |
393 | |
394 | The original title was I<Understand References Today>. |
395 | |
1da6492a |
396 | =head2 Distribution Conditions |
397 | |
398 | Copyright 1998 The Perl Journal. |
399 | |
400 | When included as part of the Standard Version of Perl, or as part of |
401 | its complete documentation whether printed or otherwise, this work may |
402 | be distributed only under the terms of Perl's Artistic License. Any |
403 | distribution of this file or derivatives thereof outside of that |
404 | package require that special arrangements be made with copyright |
405 | holder. |
406 | |
407 | Irrespective of its distribution, all code examples in these files are |
408 | hereby placed into the public domain. You are permitted and |
409 | encouraged to use this code in your own programs for fun or for profit |
410 | as you see fit. A simple comment in the code giving credit would be |
411 | courteous but is not required. |
a1e2a320 |
412 | |
a1e2a320 |
413 | |
1da6492a |
414 | |
415 | |
416 | =cut |