2 X<data structure> X<complex data structure> X<struct>
4 perldsc - Perl Data Structures Cookbook
8 The single feature most sorely lacking in the Perl programming language
9 prior to its 5.0 release was complex data structures. Even without direct
10 language support, some valiant programmers did manage to emulate them, but
11 it was hard work and not for the faint of heart. You could occasionally
12 get away with the C<$m{$AoA,$b}> notation borrowed from B<awk> in which the
13 keys are actually more like a single concatenated string C<"$AoA$b">, but
14 traversal and sorting were difficult. More desperate programmers even
15 hacked Perl's internal symbol table directly, a strategy that proved hard
16 to develop and maintain--to put it mildly.
18 The 5.0 release of Perl let us have complex data structures. You
19 may now write something like this and all of a sudden, you'd have an array
20 with three dimensions!
31 Alas, however simple this may appear, underneath it's a much more
32 elaborate construct than meets the eye!
34 How do you print it out? Why can't you say just C<print @AoA>? How do
35 you sort it? How can you pass it to a function or get one of these back
36 from a function? Is it an object? Can you save it to disk to read
37 back later? How do you access whole rows or columns of that matrix? Do
38 all the values have to be numeric?
40 As you see, it's quite easy to become confused. While some small portion
41 of the blame for this can be attributed to the reference-based
42 implementation, it's really more due to a lack of existing documentation with
43 examples designed for the beginner.
45 This document is meant to be a detailed but understandable treatment of the
46 many different sorts of data structures you might want to develop. It
47 should also serve as a cookbook of examples. That way, when you need to
48 create one of these complex data structures, you can just pinch, pilfer, or
49 purloin a drop-in example from here.
51 Let's look at each of these possible constructs in detail. There are separate
52 sections on each of the following:
56 =item * arrays of arrays
58 =item * hashes of arrays
60 =item * arrays of hashes
62 =item * hashes of hashes
64 =item * more elaborate constructs
68 But for now, let's look at general issues common to all
69 these types of data structures.
72 X<reference> X<dereference> X<dereferencing> X<pointer>
74 The most important thing to understand about all data structures in
75 Perl--including multidimensional arrays--is that even though they might
76 appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
77 one-dimensional. They can hold only scalar values (meaning a string,
78 number, or a reference). They cannot directly contain other arrays or
79 hashes, but instead contain I<references> to other arrays or hashes.
80 X<multidimensional array> X<array, multidimensional>
82 You can't use a reference to an array or hash in quite the same way that you
83 would a real array or hash. For C or C++ programmers unused to
84 distinguishing between arrays and pointers to the same, this can be
85 confusing. If so, just think of it as the difference between a structure
86 and a pointer to a structure.
88 You can (and should) read more about references in the perlref(1) man
89 page. Briefly, references are rather like pointers that know what they
90 point to. (Objects are also a kind of reference, but we won't be needing
91 them right away--if ever.) This means that when you have something which
92 looks to you like an access to a two-or-more-dimensional array and/or hash,
93 what's really going on is that the base type is
94 merely a one-dimensional entity that contains references to the next
95 level. It's just that you can I<use> it as though it were a
96 two-dimensional one. This is actually the way almost all C
97 multidimensional arrays work as well.
99 $array[7][12] # array of arrays
100 $array[7]{string} # array of hashes
101 $hash{string}[7] # hash of arrays
102 $hash{string}{'another string'} # hash of hashes
104 Now, because the top level contains only references, if you try to print
105 out your array in with a simple print() function, you'll get something
106 that doesn't look very nice, like this:
108 @AoA = ( [2, 3], [4, 5, 7], [0] );
112 ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
115 That's because Perl doesn't (ever) implicitly dereference your variables.
116 If you want to get at the thing a reference is referring to, then you have
117 to do this yourself using either prefix typing indicators, like
118 C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
119 like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.
121 =head1 COMMON MISTAKES
123 The two most common mistakes made in constructing something like
124 an array of arrays is either accidentally counting the number of
125 elements or else taking a reference to the same memory location
126 repeatedly. Here's the case where you just get the count instead
130 @array = somefunc($i);
131 $AoA[$i] = @array; # WRONG!
134 That's just the simple case of assigning an array to a scalar and getting
135 its element count. If that's what you really and truly want, then you
136 might do well to consider being a tad more explicit about it, like this:
139 @array = somefunc($i);
140 $counts[$i] = scalar @array;
143 Here's the case of taking a reference to the same memory location
147 @array = somefunc($i);
148 $AoA[$i] = \@array; # WRONG!
151 So, what's the big problem with that? It looks right, doesn't it?
152 After all, I just told you that you need an array of references, so by
153 golly, you've made me one!
155 Unfortunately, while this is true, it's still broken. All the references
156 in @AoA refer to the I<very same place>, and they will therefore all hold
157 whatever was last in @array! It's similar to the problem demonstrated in
158 the following C program:
162 struct passwd *getpwnam(), *rp, *dp;
163 rp = getpwnam("root");
164 dp = getpwnam("daemon");
166 printf("daemon name is %s\nroot name is %s\n",
167 dp->pw_name, rp->pw_name);
172 daemon name is daemon
175 The problem is that both C<rp> and C<dp> are pointers to the same location
176 in memory! In C, you'd have to remember to malloc() yourself some new
177 memory. In Perl, you'll want to use the array constructor C<[]> or the
178 hash constructor C<{}> instead. Here's the right way to do the preceding
179 broken code fragments:
183 @array = somefunc($i);
184 $AoA[$i] = [ @array ];
187 The square brackets make a reference to a new array with a I<copy>
188 of what's in @array at the time of the assignment. This is what
191 Note that this will produce something similar, but it's
196 @{$AoA[$i]} = @array;
199 Is it the same? Well, maybe so--and maybe not. The subtle difference
200 is that when you assign something in square brackets, you know for sure
201 it's always a brand new reference with a new I<copy> of the data.
202 Something else could be going on in this new case with the C<@{$AoA[$i]}>
203 dereference on the left-hand-side of the assignment. It all depends on
204 whether C<$AoA[$i]> had been undefined to start with, or whether it
205 already contained a reference. If you had already populated @AoA with
208 $AoA[3] = \@another_array;
210 Then the assignment with the indirection on the left-hand-side would
211 use the existing reference that was already there:
215 Of course, this I<would> have the "interesting" effect of clobbering
216 @another_array. (Have you ever noticed how when a programmer says
217 something is "interesting", that rather than meaning "intriguing",
218 they're disturbingly more apt to mean that it's "annoying",
219 "difficult", or both? :-)
221 So just remember always to use the array or hash constructors with C<[]>
222 or C<{}>, and you'll be fine, although it's not always optimally
225 Surprisingly, the following dangerous-looking construct will
226 actually work out fine:
229 my @array = somefunc($i);
233 That's because my() is more of a run-time statement than it is a
234 compile-time declaration I<per se>. This means that the my() variable is
235 remade afresh each time through the loop. So even though it I<looks> as
236 though you stored the same variable reference each time, you actually did
237 not! This is a subtle distinction that can produce more efficient code at
238 the risk of misleading all but the most experienced of programmers. So I
239 usually advise against teaching it to beginners. In fact, except for
240 passing arguments to functions, I seldom like to see the gimme-a-reference
241 operator (backslash) used much at all in code. Instead, I advise
242 beginners that they (and most of the rest of us) should try to use the
243 much more easily understood constructors C<[]> and C<{}> instead of
244 relying upon lexical (or dynamic) scoping and hidden reference-counting to
245 do the right thing behind the scenes.
249 $AoA[$i] = [ @array ]; # usually best
250 $AoA[$i] = \@array; # perilous; just how my() was that array?
251 @{ $AoA[$i] } = @array; # way too tricky for most programmers
254 =head1 CAVEAT ON PRECEDENCE
255 X<dereference, precedence> X<dereferencing, precedence>
257 Speaking of things like C<@{$AoA[$i]}>, the following are actually the
261 $aref->[2][2] # clear
262 $$aref[2][2] # confusing
264 That's because Perl's precedence rules on its five prefix dereferencers
265 (which look like someone swearing: C<$ @ * % &>) make them bind more
266 tightly than the postfix subscripting brackets or braces! This will no
267 doubt come as a great shock to the C or C++ programmer, who is quite
268 accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
269 element of C<a>. That is, they first take the subscript, and only then
270 dereference the thing at that subscript. That's fine in C, but this isn't C.
272 The seemingly equivalent construct in Perl, C<$$aref[$i]> first does
273 the deref of $aref, making it take $aref as a reference to an
274 array, and then dereference that, and finally tell you the I<i'th> value
275 of the array pointed to by $AoA. If you wanted the C notion, you'd have to
276 write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first
277 before the leading C<$> dereferencer.
279 =head1 WHY YOU SHOULD ALWAYS C<use strict>
281 If this is starting to sound scarier than it's worth, relax. Perl has
282 some features to help you avoid its most common pitfalls. The best
283 way to avoid getting confused is to start every program like this:
288 This way, you'll be forced to declare all your variables with my() and
289 also disallow accidental "symbolic dereferencing". Therefore if you'd done
293 [ "fred", "barney", "pebbles", "bambam", "dino", ],
294 [ "homer", "bart", "marge", "maggie", ],
295 [ "george", "jane", "elroy", "judy", ],
300 The compiler would immediately flag that as an error I<at compile time>,
301 because you were accidentally accessing C<@aref>, an undeclared
302 variable, and it would thereby remind you to write instead:
307 X<data structure, debugging> X<complex data structure, debugging>
308 X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging>
309 X<array of arrays, debugging> X<hash of arrays, debugging>
310 X<array of hashes, debugging> X<hash of hashes, debugging>
312 Before version 5.002, the standard Perl debugger didn't do a very nice job of
313 printing out complex data structures. With 5.002 or above, the
314 debugger includes several new features, including command line editing as
315 well as the C<x> command to dump out complex data structures. For
316 example, given the assignment to $AoA above, here's the debugger output:
319 $AoA = ARRAY(0x13b5a0)
339 Presented with little comment (these will get their own manpages someday)
340 here are short code examples illustrating access of various
341 types of data structures.
343 =head1 ARRAYS OF ARRAYS
344 X<array of arrays> X<AoA>
346 =head2 Declaration of an ARRAY OF ARRAYS
349 [ "fred", "barney" ],
350 [ "george", "jane", "elroy" ],
351 [ "homer", "marge", "bart" ],
354 =head2 Generation of an ARRAY OF ARRAYS
358 push @AoA, [ split ];
363 $AoA[$i] = [ somefunc($i) ];
372 # add to an existing row
373 push @{ $AoA[0] }, "wilma", "betty";
375 =head2 Access and Printing of an ARRAY OF ARRAYS
381 $AoA[1][1] =~ s/(\w)/\u$1/;
383 # print the whole thing with refs
385 print "\t [ @$aref ],\n";
388 # print the whole thing with indices
389 for $i ( 0 .. $#AoA ) {
390 print "\t [ @{$AoA[$i]} ],\n";
393 # print the whole thing one at a time
394 for $i ( 0 .. $#AoA ) {
395 for $j ( 0 .. $#{ $AoA[$i] } ) {
396 print "elt $i $j is $AoA[$i][$j]\n";
400 =head1 HASHES OF ARRAYS
401 X<hash of arrays> X<HoA>
403 =head2 Declaration of a HASH OF ARRAYS
406 flintstones => [ "fred", "barney" ],
407 jetsons => [ "george", "jane", "elroy" ],
408 simpsons => [ "homer", "marge", "bart" ],
411 =head2 Generation of a HASH OF ARRAYS
414 # flintstones: fred barney wilma dino
416 next unless s/^(.*?):\s*//;
417 $HoA{$1} = [ split ];
420 # reading from file; more temps
421 # flintstones: fred barney wilma dino
422 while ( $line = <> ) {
423 ($who, $rest) = split /:\s*/, $line, 2;
424 @fields = split ' ', $rest;
425 $HoA{$who} = [ @fields ];
428 # calling a function that returns a list
429 for $group ( "simpsons", "jetsons", "flintstones" ) {
430 $HoA{$group} = [ get_family($group) ];
433 # likewise, but using temps
434 for $group ( "simpsons", "jetsons", "flintstones" ) {
435 @members = get_family($group);
436 $HoA{$group} = [ @members ];
439 # append new members to an existing family
440 push @{ $HoA{"flintstones"} }, "wilma", "betty";
442 =head2 Access and Printing of a HASH OF ARRAYS
445 $HoA{flintstones}[0] = "Fred";
448 $HoA{simpsons}[1] =~ s/(\w)/\u$1/;
450 # print the whole thing
451 foreach $family ( keys %HoA ) {
452 print "$family: @{ $HoA{$family} }\n"
455 # print the whole thing with indices
456 foreach $family ( keys %HoA ) {
458 foreach $i ( 0 .. $#{ $HoA{$family} } ) {
459 print " $i = $HoA{$family}[$i]";
464 # print the whole thing sorted by number of members
465 foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
466 print "$family: @{ $HoA{$family} }\n"
469 # print the whole thing sorted by number of members and name
470 foreach $family ( sort {
471 @{$HoA{$b}} <=> @{$HoA{$a}}
476 print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
479 =head1 ARRAYS OF HASHES
480 X<array of hashes> X<AoH>
482 =head2 Declaration of an ARRAY OF HASHES
501 =head2 Generation of an ARRAY OF HASHES
504 # format: LEAD=fred FRIEND=barney
507 for $field ( split ) {
508 ($key, $value) = split /=/, $field;
509 $rec->{$key} = $value;
516 # format: LEAD=fred FRIEND=barney
519 push @AoH, { split /[\s+=]/ };
522 # calling a function that returns a key/value pair list, like
523 # "lead","fred","daughter","pebbles"
524 while ( %fields = getnextpairset() ) {
525 push @AoH, { %fields };
528 # likewise, but using no temp vars
530 push @AoH, { parsepairs($_) };
533 # add key/value to an element
534 $AoH[0]{pet} = "dino";
535 $AoH[2]{pet} = "santa's little helper";
537 =head2 Access and Printing of an ARRAY OF HASHES
540 $AoH[0]{lead} = "fred";
543 $AoH[1]{lead} =~ s/(\w)/\u$1/;
545 # print the whole thing with refs
548 for $role ( keys %$href ) {
549 print "$role=$href->{$role} ";
554 # print the whole thing with indices
555 for $i ( 0 .. $#AoH ) {
557 for $role ( keys %{ $AoH[$i] } ) {
558 print "$role=$AoH[$i]{$role} ";
563 # print the whole thing one at a time
564 for $i ( 0 .. $#AoH ) {
565 for $role ( keys %{ $AoH[$i] } ) {
566 print "elt $i $role is $AoH[$i]{$role}\n";
570 =head1 HASHES OF HASHES
571 X<hash of hashes> X<HoH>
573 =head2 Declaration of a HASH OF HASHES
583 "his boy" => "elroy",
592 =head2 Generation of a HASH OF HASHES
595 # flintstones: lead=fred pal=barney wife=wilma pet=dino
597 next unless s/^(.*?):\s*//;
599 for $field ( split ) {
600 ($key, $value) = split /=/, $field;
601 $HoH{$who}{$key} = $value;
605 # reading from file; more temps
607 next unless s/^(.*?):\s*//;
611 for $field ( split ) {
612 ($key, $value) = split /=/, $field;
613 $rec->{$key} = $value;
617 # calling a function that returns a key,value hash
618 for $group ( "simpsons", "jetsons", "flintstones" ) {
619 $HoH{$group} = { get_family($group) };
622 # likewise, but using temps
623 for $group ( "simpsons", "jetsons", "flintstones" ) {
624 %members = get_family($group);
625 $HoH{$group} = { %members };
628 # append new members to an existing family
634 for $what (keys %new_folks) {
635 $HoH{flintstones}{$what} = $new_folks{$what};
638 =head2 Access and Printing of a HASH OF HASHES
641 $HoH{flintstones}{wife} = "wilma";
644 $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;
646 # print the whole thing
647 foreach $family ( keys %HoH ) {
649 for $role ( keys %{ $HoH{$family} } ) {
650 print "$role=$HoH{$family}{$role} ";
655 # print the whole thing somewhat sorted
656 foreach $family ( sort keys %HoH ) {
658 for $role ( sort keys %{ $HoH{$family} } ) {
659 print "$role=$HoH{$family}{$role} ";
665 # print the whole thing sorted by number of members
666 foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
668 for $role ( sort keys %{ $HoH{$family} } ) {
669 print "$role=$HoH{$family}{$role} ";
674 # establish a sort order (rank) for each role
676 for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
678 # now print the whole thing sorted by number of members
679 foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
681 # and print these according to rank order
682 for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
683 print "$role=$HoH{$family}{$role} ";
689 =head1 MORE ELABORATE RECORDS
690 X<record> X<structure> X<struct>
692 =head2 Declaration of MORE ELABORATE RECORDS
694 Here's a sample showing how to create and use a record whose fields are of
695 many different sorts:
699 SEQUENCE => [ @old_values ],
700 LOOKUP => { %some_table },
701 THATCODE => \&some_function,
702 THISCODE => sub { $_[0] ** $_[1] },
708 print $rec->{SEQUENCE}[0];
709 $last = pop @ { $rec->{SEQUENCE} };
711 print $rec->{LOOKUP}{"key"};
712 ($first_k, $first_v) = each %{ $rec->{LOOKUP} };
714 $answer = $rec->{THATCODE}->($arg);
715 $answer = $rec->{THISCODE}->($arg1, $arg2);
717 # careful of extra block braces on fh ref
718 print { $rec->{HANDLE} } "a string\n";
721 $rec->{HANDLE}->autoflush(1);
722 $rec->{HANDLE}->print(" a string\n");
724 =head2 Declaration of a HASH OF COMPLEX RECORDS
728 series => "flintstones",
729 nights => [ qw(monday thursday friday) ],
731 { name => "fred", role => "lead", age => 36, },
732 { name => "wilma", role => "wife", age => 31, },
733 { name => "pebbles", role => "kid", age => 4, },
739 nights => [ qw(wednesday saturday) ],
741 { name => "george", role => "lead", age => 41, },
742 { name => "jane", role => "wife", age => 39, },
743 { name => "elroy", role => "kid", age => 9, },
748 series => "simpsons",
749 nights => [ qw(monday) ],
751 { name => "homer", role => "lead", age => 34, },
752 { name => "marge", role => "wife", age => 37, },
753 { name => "bart", role => "kid", age => 11, },
758 =head2 Generation of a HASH OF COMPLEX RECORDS
761 # this is most easily done by having the file itself be
762 # in the raw data format as shown above. perl is happy
763 # to parse complex data structures if declared as data, so
764 # sometimes it's easiest to do that
766 # here's a piece by piece build up
768 $rec->{series} = "flintstones";
769 $rec->{nights} = [ find_days() ];
772 # assume this file in field=value syntax
774 %fields = split /[\s=]+/;
775 push @members, { %fields };
777 $rec->{members} = [ @members ];
779 # now remember the whole thing
780 $TV{ $rec->{series} } = $rec;
782 ###########################################################
783 # now, you might want to make interesting extra fields that
784 # include pointers back into the same data structure so if
785 # change one piece, it changes everywhere, like for example
786 # if you wanted a {kids} field that was a reference
787 # to an array of the kids' records without having duplicate
788 # records and thus update problems.
789 ###########################################################
790 foreach $family (keys %TV) {
791 $rec = $TV{$family}; # temp pointer
793 for $person ( @{ $rec->{members} } ) {
794 if ($person->{role} =~ /kid|son|daughter/) {
798 # REMEMBER: $rec and $TV{$family} point to same data!!
799 $rec->{kids} = [ @kids ];
802 # you copied the array, but the array itself contains pointers
803 # to uncopied objects. this means that if you make bart get
806 $TV{simpsons}{kids}[0]{age}++;
808 # then this would also change in
809 print $TV{simpsons}{members}[2]{age};
811 # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
812 # both point to the same underlying anonymous hash table
814 # print the whole thing
815 foreach $family ( keys %TV ) {
817 print " is on during @{ $TV{$family}{nights} }\n";
818 print "its members are:\n";
819 for $who ( @{ $TV{$family}{members} } ) {
820 print " $who->{name} ($who->{role}), age $who->{age}\n";
822 print "it turns out that $TV{$family}{lead} has ";
823 print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
824 print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
830 You cannot easily tie a multilevel data structure (such as a hash of
831 hashes) to a dbm file. The first problem is that all but GDBM and
832 Berkeley DB have size limitations, but beyond that, you also have problems
833 with how references are to be represented on disk. One experimental
834 module that does partially attempt to address this need is the MLDBM
835 module. Check your nearest CPAN site as described in L<perlmodlib> for
836 source code to MLDBM.
840 perlref(1), perllol(1), perldata(1), perlobj(1)
844 Tom Christiansen <F<tchrist@perl.com>>
847 Wed Oct 23 04:57:50 MET DST 1996