3 perldsc - Perl Data Structures Cookbook
7 The single feature most sorely lacking in the Perl programming language
8 prior to its 5.0 release was complex data structures. Even without direct
9 language support, some valiant programmers did manage to emulate them, but
10 it was hard work and not for the faint of heart. You could occasionally
11 get away with the C<$m{$AoA,$b}> notation borrowed from B<awk> in which the
12 keys are actually more like a single concatenated string C<"$AoA$b">, but
13 traversal and sorting were difficult. More desperate programmers even
14 hacked Perl's internal symbol table directly, a strategy that proved hard
15 to develop and maintain--to put it mildly.
17 The 5.0 release of Perl let us have complex data structures. You
18 may now write something like this and all of a sudden, you'd have a array
19 with three dimensions!
25 $AoA[$x][$y][$z] = $x ** $y + $z;
30 Alas, however simple this may appear, underneath it's a much more
31 elaborate construct than meets the eye!
33 How do you print it out? Why can't you say just C<print @AoA>? How do
34 you sort it? How can you pass it to a function or get one of these back
35 from a function? Is is an object? Can you save it to disk to read
36 back later? How do you access whole rows or columns of that matrix? Do
37 all the values have to be numeric?
39 As you see, it's quite easy to become confused. While some small portion
40 of the blame for this can be attributed to the reference-based
41 implementation, it's really more due to a lack of existing documentation with
42 examples designed for the beginner.
44 This document is meant to be a detailed but understandable treatment of the
45 many different sorts of data structures you might want to develop. It
46 should also serve as a cookbook of examples. That way, when you need to
47 create one of these complex data structures, you can just pinch, pilfer, or
48 purloin a drop-in example from here.
50 Let's look at each of these possible constructs in detail. There are separate
51 sections on each of the following:
55 =item * arrays of arrays
57 =item * hashes of arrays
59 =item * arrays of hashes
61 =item * hashes of hashes
63 =item * more elaborate constructs
67 But for now, let's look at general issues common to all
68 these types of data structures.
72 The most important thing to understand about all data structures in Perl
73 -- including multidimensional arrays--is that even though they might
74 appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
75 one-dimensional. They can hold only scalar values (meaning a string,
76 number, or a reference). They cannot directly contain other arrays or
77 hashes, but instead contain I<references> to other arrays or hashes.
79 You can't use a reference to a array or hash in quite the same way that you
80 would a real array or hash. For C or C++ programmers unused to
81 distinguishing between arrays and pointers to the same, this can be
82 confusing. If so, just think of it as the difference between a structure
83 and a pointer to a structure.
85 You can (and should) read more about references in the perlref(1) man
86 page. Briefly, references are rather like pointers that know what they
87 point to. (Objects are also a kind of reference, but we won't be needing
88 them right away--if ever.) This means that when you have something which
89 looks to you like an access to a two-or-more-dimensional array and/or hash,
90 what's really going on is that the base type is
91 merely a one-dimensional entity that contains references to the next
92 level. It's just that you can I<use> it as though it were a
93 two-dimensional one. This is actually the way almost all C
94 multidimensional arrays work as well.
96 $array[7][12] # array of arrays
97 $array[7]{string} # array of hashes
98 $hash{string}[7] # hash of arrays
99 $hash{string}{'another string'} # hash of hashes
101 Now, because the top level contains only references, if you try to print
102 out your array in with a simple print() function, you'll get something
103 that doesn't look very nice, like this:
113 ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
116 That's because Perl doesn't (ever) implicitly dereference your variables.
117 If you want to get at the thing a reference is referring to, then you have
118 to do this yourself using either prefix typing indicators, like
119 C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
120 like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.
122 =head1 COMMON MISTAKES
124 The two most common mistakes made in constructing something like
125 an array of arrays is either accidentally counting the number of
126 elements or else taking a reference to the same memory location
127 repeatedly. Here's the case where you just get the count instead
132 my @array = somefunc($i);
133 $AoA[$i] = @array; # WRONG!
136 That's just the simple case of assigning an array to a scalar and getting
137 its element count. If that's what you really and truly want, then you
138 might do well to consider being a tad more explicit about it, like this:
142 my @array = somefunc($i);
143 $counts[$i] = scalar @array;
146 Here's the right way to do the reference C<@array>:
150 my @array = somefunc($i);
151 $AoA[$i] = [ @array ];
154 The square brackets make a reference to a new array with a I<copy>
155 of what's in C<@array>.
157 Note that this will produce something similar, but it's
162 my @array = somefunc($i);
163 @{ $AoA[$i] } = @array;
166 Is it the same? Well, maybe so--and maybe not. The subtle difference
167 is that when you assign something in square brackets, you know for sure
168 it's always a brand new reference with a new I<copy> of the data.
169 Something else could be going on in this new case with the C<@{ $AoA[$i]} }>
170 dereference on the left-hand-side of the assignment. It all depends on
171 whether C<$AoA[$i]> had been undefined to start with, or whether it
172 already contained a reference. If you had already populated @AoA with
175 $AoA[3] = \@another_array;
177 Then the assignment with the indirection on the left-hand-side would
178 use the existing reference that was already there:
180 @{ $AoA[3] } = @array;
182 Of course, this I<would> have the "interesting" effect of clobbering
183 @another_array. (Have you ever noticed how when a programmer says
184 something is "interesting", that rather than meaning "intriguing",
185 they're disturbingly more apt to mean that it's "annoying",
186 "difficult", or both? :-)
188 So just remember always to use the array or hash constructors with C<[]>
189 or C<{}>, and you'll be fine, although it's not always optimally
192 Surprisingly, the following dangerous-looking construct will
193 actually work out fine:
197 my @array = somefunc($i);
201 That's because my() is more of a run-time statement than it is a
202 compile-time declaration I<per se>. This means that the my() variable is
203 remade afresh each time through the loop. So even though it I<looks> as
204 though you stored the same variable reference each time, you actually did
205 not! This is a subtle distinction that can produce more efficient code at
206 the risk of misleading all but the most experienced of programmers. So I
207 usually advise against teaching it to beginners. In fact, except for
208 passing arguments to functions, I seldom like to see the gimme-a-reference
209 operator (backslash) used much at all in code. Instead, I advise
210 beginners that they (and most of the rest of us) should try to use the
211 much more easily understood constructors C<[]> and C<{}> instead of
212 relying upon lexical (or dynamic) scoping and hidden reference-counting to
213 do the right thing behind the scenes.
217 $AoA[$i] = [ @array ]; # usually best
218 $AoA[$i] = \@array; # perilous; just how my() is that array?
219 @{ $AoA[$i] } = @array; # way too tricky for most programmers
222 =head1 CAVEAT ON PRECEDENCE
224 Speaking of things like C<@{ $AoA[$i] }>, the following are actually the
227 $aref->[2][2] # clear
228 $$aref[2][2] # confusing
230 That's because Perl's precedence rules on its five prefix dereferencers
231 (which look like someone swearing: C<$ @ * % &>) make them bind more
232 tightly than the postfix subscripting brackets or braces! This will no
233 doubt come as a great shock to the C or C++ programmer, who is quite
234 accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
235 element of C<a>. That is, they first take the subscript, and only then
236 dereference the thing at that subscript. That's fine in C, but this isn't C.
238 The seemingly equivalent construct in Perl, C<$$aref[$i]> first does
239 the deref of $aref, making it take $aref as a reference to an
240 array, and then dereference that, and finally tell you the I<i'th> value
241 of the array pointed to by $AoA. If you wanted the C notion, you'd have to
242 write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first
243 before the leading C<$> dereferencer.
245 =head1 WHY YOU SHOULD ALWAYS C<use strict>
247 If this is starting to sound scarier than it's worth, relax. Perl has
248 some features to help you avoid its most common pitfalls. The best
249 way to avoid getting confused is to start every program like this:
254 This way, you'll be forced to declare all your variables with my() and
255 also disallow accidental "symbolic dereferencing". Therefore if you'd done
259 [ 'fred', 'barney', 'pebbles', 'bambam', 'dino', ],
260 [ 'homer', 'bart', 'marge', 'maggie', ],
261 [ 'george', 'jane', 'elroy', 'judy', ],
266 The compiler would immediately flag that as an error I<at compile time>,
267 because you were accidentally accessing C<@aref>, an undeclared
268 variable, and it would thereby remind you to write instead:
274 Before version 5.002, the standard Perl debugger didn't do a very nice job of
275 printing out complex data structures. With 5.002 or above, the
276 debugger includes several new features, including command line editing as
277 well as the C<x> command to dump out complex data structures. For
278 example, given the assignment to $AoA above, here's the debugger output:
281 $AoA = ARRAY(0x13b5a0)
301 Presented with little comment (these will get their own manpages someday)
302 here are short code examples illustrating access of various
303 types of data structures.
305 =head1 ARRAYS OF ARRAYS
307 =head2 Declaration of a ARRAY OF ARRAYS
310 [ 'fred', 'barney' ],
311 [ 'george', 'jane', 'elroy' ],
312 [ 'homer', 'marge', 'bart' ],
315 =head2 Generation of a ARRAY OF ARRAYS
320 push @AoA, [ split ];
325 foreach my $i ( 1 .. 10 ) {
326 $AoA[$i] = [ somefunc($i) ];
331 foreach my $i ( 1 .. 10 ) {
332 my @tmp = somefunc($i);
336 # add to an existing row
337 push @{ $AoA[0] }, 'wilma', 'betty';
339 =head2 Access and Printing of a ARRAY OF ARRAYS
347 $AoA[1][1] =~ s/(\w)/\u$1/;
349 # print the whole thing with refs
350 foreach my $aref ( @AoA ) {
351 print "\t [ @$aref ],\n";
354 # print the whole thing with indices
355 foreach my $i ( 0 .. $#AoA ) {
356 print "\t [ @{ $AoA[$i] } ],\n";
359 # print the whole thing one at a time
360 foreach my $i ( 0 .. $#AoA ) {
361 foreach my $j ( 0 .. $#{ $AoA[$i] } ) {
362 print "element $i $j is $AoA[$i][$j]\n";
366 =head1 HASHES OF ARRAYS
368 =head2 Declaration of a HASH OF ARRAYS
371 flintstones => [ 'fred', 'barney' ],
372 jetsons => [ 'george', 'jane', 'elroy' ],
373 simpsons => [ 'homer', 'marge', 'bart' ],
376 =head2 Generation of a HASH OF ARRAYS
379 # flintstones: fred barney wilma dino
382 next unless s/^([^:]*):\s*//;
383 $HoA{$1} = [ split ];
386 # reading from file; more temps
387 # flintstones: fred barney wilma dino
389 while ( my $line = <> ) {
390 my ($who, $rest) = split /:\s*/, $line, 2;
391 my @fields = split ' ', $rest;
392 $HoA{$who} = [ @fields ];
395 # calling a function that returns a list
397 foreach my $group ( 'simpsons', 'jetsons', 'flintstones' ) {
398 $HoA{$group} = [ get_family($group) ];
401 # likewise, but using temps
403 foreach my $group ( 'simpsons', 'jetsons', 'flintstones' ) {
404 my @members = get_family($group);
405 $HoA{$group} = [ @members ];
408 # append new members to an existing family
409 push @{ $HoA{flintstones} }, 'wilma', 'betty';
411 =head2 Access and Printing of a HASH OF ARRAYS
416 $HoA{flintstones}[0] = 'Fred';
419 $HoA{simpsons}[1] =~ s/(\w)/\u$1/;
421 # print the whole thing
422 foreach my $family ( keys %HoA ) {
423 print "$family: @{ $HoA{$family} }\n";
426 # print the whole thing with indices
427 foreach my $family ( keys %HoA ) {
429 foreach my $i ( 0 .. $#{ $HoA{$family} } ) {
430 print " $i = $HoA{$family}[$i]";
435 # print the whole thing sorted by number of members
437 @{ $HoA{$b} } <=> @{ $HoA{$a} }
439 foreach my $family ( sort num_members keys %HoA ) {
440 print "$family: @{ $HoA{$family} }\n"
443 # print the whole thing sorted by number of members and name
444 sub members_and_name {
445 @{ $HoA{$b} } <=> @{ $HoA{$a} }
449 foreach my $family ( sort members_and_name keys %HoA ) {
450 print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
453 =head1 ARRAYS OF HASHES
455 =head2 Declaration of a ARRAY OF HASHES
474 =head2 Generation of a ARRAY OF HASHES
477 # format: LEAD=fred FRIEND=barney
481 foreach my $field ( split ) {
482 my($key, $value) = split /=/, $field;
483 $rec->{$key} = $value;
490 # format: LEAD=fred FRIEND=barney
494 push @AoH, { split /[\s+=]/ };
497 # calling a function that returns a key/value pair list, like
498 # lead => 'fred', daughter => 'pebbles'
500 while ( my %fields = getnextpairset() ) {
501 push @AoH, { %fields };
504 # likewise, but using no temp vars
507 push @AoH, { parsepairs($_) };
510 # add key/value to an element
511 $AoH[0]{pet} = 'dino';
512 $AoH[2]{pet} = "santa's little helper";
514 =head2 Access and Printing of a ARRAY OF HASHES
519 $AoH[0]{lead} = 'fred';
522 $AoH[1]{lead} =~ s/(\w)/\u$1/;
524 # print the whole thing with refs
525 foreach my $href ( @AoH ) {
527 foreach my $role ( keys %$href ) {
528 print "$role = $href->{$role} ";
533 # print the whole thing with indices
534 foreach my $i ( 0 .. $#AoH ) {
536 foreach my $role ( keys %{ $AoH[$i] } ) {
537 print "$role = $AoH[$i]{$role} ";
542 # print the whole thing one at a time
543 foreach my $i ( 0 .. $#AoH ) {
544 foreach my $role ( keys %{ $AoH[$i] } ) {
545 print "element $i $role is $AoH[$i]{$role}\n";
549 =head1 HASHES OF HASHES
551 =head2 Declaration of a HASH OF HASHES
561 'his boy' => 'elroy',
570 =head2 Generation of a HASH OF HASHES
573 # flintstones: lead=fred pal=barney wife=wilma pet=dino
576 next unless s/^([^:]*):\s*//;
578 for my $field ( split ) {
579 my($key, $value) = split /=/, $field;
580 $HoH{$who}{$key} = $value;
584 # reading from file; more temps
587 next unless s/^([^:]*):\s*//;
591 foreach my $field ( split ) {
592 my($key, $value) = split /=/, $field;
593 $rec->{$key} = $value;
597 # calling a function that returns a key,value hash
599 foreach my $group ( 'simpsons', 'jetsons', 'flintstones' ) {
600 $HoH{$group} = { get_family($group) };
603 # likewise, but using temps
605 foreach my $group ( 'simpsons', 'jetsons', 'flintstones' ) {
606 my %members = get_family($group);
607 $HoH{$group} = { %members };
610 # append new members to an existing family
617 foreach my $what (keys %new_folks) {
618 $HoH{flintstones}{$what} = $new_folks{$what};
621 =head2 Access and Printing of a HASH OF HASHES
626 $HoH{flintstones}{wife} = 'wilma';
629 $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;
631 # print the whole thing
632 foreach my $family ( keys %HoH ) {
634 foreach my $role ( keys %{ $HoH{$family} } ) {
635 print "$role = $HoH{$family}{$role} ";
640 # print the whole thing somewhat sorted
641 foreach my $family ( sort keys %HoH ) {
643 foreach my $role ( sort keys %{ $HoH{$family} } ) {
644 print "$role = $HoH{$family}{$role} ";
649 # print the whole thing sorted by number of members
651 keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} }
653 foreach my $family ( sort num_members keys %HoH ) {
655 foreach my $role ( sort keys %{ $HoH{$family} } ) {
656 print "$role = $HoH{$family}{$role} ";
661 # establish a sort order (rank) for each role
664 foreach ( qw(lead wife son daughter pal pet) ) {
668 # now print the whole thing sorted by number of members
670 keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} }
673 $rank{$a} <=> $rank{$b}
676 foreach my $family ( sort num_members keys %HoH ) {
678 # and print these according to rank order
679 foreach my $role ( sort rank keys %{ $HoH{$family} } ) {
680 print "$role = $HoH{$family}{$role} ";
686 =head1 MORE ELABORATE RECORDS
688 =head2 Declaration of MORE ELABORATE RECORDS
690 Here's a sample showing how to create and use a record whose fields are of
691 many different sorts:
695 SEQUENCE => [ @old_values ],
696 LOOKUP => { %some_table },
697 THATCODE => \&some_function,
698 THISCODE => sub { $_[0] ** $_[1] },
704 print $rec->{SEQUENCE}->[0];
705 my $last = pop @{ $rec->{SEQUENCE} };
707 print $rec->{LOOKUP}->{key};
708 my($first_k, $first_v) = each %{ $rec->{LOOKUP} };
710 my $answer = $rec->{THATCODE}->($arg);
711 my $result = $rec->{THISCODE}->($arg1, $arg2);
713 # careful of extra block braces on fh ref
714 print { $rec->{HANDLE} } "a string\n";
717 $rec->{HANDLE}->autoflush(1);
718 $rec->{HANDLE}->print(" a string\n");
720 =head2 Declaration of a HASH OF COMPLEX RECORDS
724 series => 'flintstones',
725 nights => [ qw(monday thursday friday) ],
727 { name => 'fred', role => 'lead', age => 36, },
728 { name => 'wilma', role => 'wife', age => 31, },
729 { name => 'pebbles', role => 'kid', age => 4, },
735 nights => [ qw(wednesday saturday) ],
737 { name => 'george", role => 'lead', age => 41, },
738 { name => 'jane", role => 'wife', age => 39, },
739 { name => 'elroy", role => 'kid', age => 9, },
744 series => 'simpsons',
745 nights => [ qw(monday) ],
747 { name => 'homer', role => 'lead', age => 34, },
748 { name => 'marge', role => 'wife', age => 37, },
749 { name => 'bart', role => 'kid', age => 11, },
754 =head2 Generation of a HASH OF COMPLEX RECORDS
756 Here's a piece by piece build up of a hash of complex records. We'll
757 read in a file that has our data in it.
761 $rec->{series} = 'flintstones';
762 $rec->{nights} = [ find_days() ];
765 # assume this file in field=value syntax
767 my %fields = split /[\s=]+/, $_;
768 push @members, { %fields };
770 $rec->{members} = [ @members ];
772 # now remember the whole thing
773 $TV{ $rec->{series} } = $rec;
775 Now, you might want to make interesting extra fields that
776 include pointers back into the same data structure so if
777 change one piece, it changes everywhere, like for example
778 if you wanted a 'kids' field that was a reference
779 to an array of the kids' records without having duplicate
780 records and thus update problems.
782 foreach my $family ( keys %TV ) {
783 my $rec = $TV{$family}; # $rec points to $TV{$family}
785 foreach my $person ( @{ $rec->{members} } ) {
786 if ( $person->{role} =~ /kid|son|daughter/ ) {
790 # REMEMBER: $rec and $TV{$family} point to same data!!
791 $rec->{kids} = [ @kids ];
794 You copied the array, but the array itself contains pointers
795 to uncopied objects. This means that if you make bart get
798 $TV{simpsons}{kids}[0]{age}++;
800 Then this would also change in C<$TV{simpsons}{members}[2]{age}>
801 because C<$TV{simpsons}{kids}[0]> and C<$TV{simpsons}{members}[2]>
802 both point to the same underlying anonymous hash table.
804 # print the whole thing
805 foreach my $family ( keys %TV ) {
806 print "the $family is on during @{ $TV{$family}{nights} }\n",
807 "its members are:\n";
809 foraech my $who ( @{ $TV{$family}{members} } ) {
810 print " $who->{name} ($who->{role}), age $who->{age}\n";
813 print "it turns out that $TV{$family}{lead} has ",
814 scalar ( @{ $TV{$family}{kids} } ),
818 map { $_->{name} } @{ $TV{$family}{kids} }
825 You cannot easily tie a multilevel data structure (such as a hash of
826 hashes) to a dbm file. The first problem is that all but GDBM and
827 Berkeley DB have size limitations, but beyond that, you also have problems
828 with how references are to be represented on disk. One experimental
829 module that does partially attempt to address this need is the MLDBM
830 module. Check your nearest CPAN site as described in L<perlmodlib> for
831 source code to MLDBM.
835 perlref(1), perllol(1), perldata(1), perlobj(1)
839 Tom Christiansen <F<tchrist@perl.com>>
841 Last update (by Tom):
842 Wed Oct 23 04:57:50 MET DST 1996
844 Last update (by Casey West, <F<casey@geeknest.com>>
845 Mon Sep 17 13:33:41 EDT 2001