=head1 NAME

perldsc - Perl Data Structures Cookbook

=head1 DESCRIPTION

The single feature most sorely lacking in the Perl programming language
prior to its 5.0 release was complex data structures. Even without direct
language support, some valiant programmers did manage to emulate them, but
it was hard work and not for the faint of heart. You could occasionally
get away with the C<$m{$LoL,$b}> notation borrowed from I<awk>, in which the
key is actually a single concatenated string, C<"$LoL$;$b"> (joined with the
C<$;> subscript separator), but traversal and sorting were difficult. More
desperate programmers even hacked Perl's internal symbol table directly, a
strategy that proved hard to develop and maintain, to put it mildly.

The 5.0 release of Perl let us have complex data structures. You
may now write something like this, and all of a sudden you'd have an array
with three dimensions!

    for $x (1 .. 10) {
        for $y (1 .. 10) {
            for $z (1 .. 10) {
                $LoL[$x][$y][$z] =
                    $x ** $y + $z;
            }
        }
    }

Alas, however simple this may appear, underneath it's a much more
elaborate construct than meets the eye!

How do you print it out? Why can't you just say C<print @LoL>? How do
you sort it? How can you pass it to a function or get one of these back
from a function? Is it an object? Can you save it to disk to read
back later? How do you access whole rows or columns of that matrix? Do
all the values have to be numeric?

As you see, it's quite easy to become confused. While some small portion
of the blame for this can be attributed to the reference-based
implementation, it's really due more to a lack of existing documentation
with examples designed for the beginner.

This document is meant to be a detailed but understandable treatment of the
many different sorts of data structures you might want to develop. It
should also serve as a cookbook of examples. That way, when you need to
create one of these complex data structures, you can just pinch, pilfer, or
purloin a drop-in example from here.

Let's look at each of these possible constructs in detail. There are separate
sections on each of the following:

=over 4

=item * arrays of arrays

=item * hashes of arrays

=item * arrays of hashes

=item * hashes of hashes

=item * more elaborate constructs

=back

But for now, let's look at some of the general issues common to all
of these types of data structures.

=head1 REFERENCES

The most important thing to understand about all data structures in Perl,
including multidimensional arrays, is that even though they might
appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
one-dimensional. They can hold only scalar values (meaning a string,
number, or a reference). They cannot directly contain other arrays or
hashes, but instead contain I<references> to other arrays or hashes.

You can't use a reference to an array or hash in quite the same way that you
would a real array or hash. For C or C++ programmers unused to
distinguishing between arrays and pointers to the same, this can be
confusing. If so, just think of it as the difference between a structure
and a pointer to a structure.

You can (and should) read more about references in the perlref(1) man
page. Briefly, references are rather like pointers that know what they
point to. (Objects are also a kind of reference, but we won't be needing
them right away, if ever.) This means that when you have something which
looks to you like an access to a two-or-more-dimensional array and/or hash,
what's really going on is that the base type is merely a one-dimensional
entity that contains references to the next level. It's just that you can
I<use> it as though it were a two-dimensional one. This is much like the
way C programs build ragged multidimensional arrays out of arrays of
pointers.

    $list[7][12]                        # array of arrays
    $list[7]{string}                    # array of hashes
    $hash{string}[7]                    # hash of arrays
    $hash{string}{'another string'}     # hash of hashes

Now, because the top level contains only references, if you try to print
out your array with a simple print() function, you'll get something
that doesn't look very nice, like this:

    @LoL = ( [2, 3], [4, 5, 7], [0] );
    print @LoL;
  ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)

That's because Perl doesn't (ever) implicitly dereference your variables.
If you want to get at the thing a reference is referring to, then you have
to do this yourself using either prefix typing indicators, like
C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.

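Here is a minimal sketch of both dereferencing styles, reusing the small @LoL from above. The names $row, $arrow, $implicit, $prefix and @whole_row are only illustrative. Every expression reaches the same element, the 7 in the middle row.

    use strict;
    use warnings;

    my @LoL = ( [2, 3], [4, 5, 7], [0] );

    # $LoL[1] holds a reference to [4, 5, 7], not the array itself
    my $row = $LoL[1];

    my $arrow    = $LoL[1]->[2];      # postfix arrow
    my $implicit = $LoL[1][2];        # arrow is optional between subscripts
    my $prefix   = ${ $LoL[1] }[2];   # prefix dereference, then subscript

    my @whole_row = @{ $row };        # dereference the whole row at once
    print "$arrow $implicit $prefix (@whole_row)\n";    # 7 7 7 (4 5 7)
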
=head1 COMMON MISTAKES

The two most common mistakes made in constructing something like
an array of arrays are accidentally counting the number of
elements and taking a reference to the same memory location
repeatedly. Here's the case where you just get the count instead
of a nested array:

    for $i (1..10) {
        @list = somefunc($i);
        $LoL[$i] = @list;       # WRONG!
    }

That's just the simple case of assigning an array to a scalar and getting
its element count. If that's what you really and truly want, then you
might do well to consider being a tad more explicit about it, like this:

    for $i (1..10) {
        @list = somefunc($i);
        $counts[$i] = scalar @list;
    }

Here's the case of taking a reference to the same memory location
again and again:

    for $i (1..10) {
        @list = somefunc($i);
        $LoL[$i] = \@list;      # WRONG!
    }

So, what's the big problem with that? It looks right, doesn't it?
After all, I just told you that you need an array of references, so by
golly, you've made me one!

Unfortunately, while this is true, it's still broken. All the references
in @LoL refer to the I<very same place>, and they will therefore all hold
whatever was last in @list! It's similar to the problem demonstrated in
the following C program:

    #include <stdio.h>
    #include <pwd.h>

    int main(void) {
        struct passwd *rp, *dp;
        rp = getpwnam("root");
        dp = getpwnam("daemon");

        printf("daemon name is %s\nroot name is %s\n",
                dp->pw_name, rp->pw_name);
        return 0;
    }

On systems where getpwnam() returns a pointer to a single static buffer
(common, and permitted by POSIX), this will print

    daemon name is daemon
    root name is daemon

The problem is that both C<rp> and C<dp> are pointers to the same location
in memory! In C, you'd have to remember to malloc() yourself some new
memory. In Perl, you'll want to use the array constructor C<[]> or the
hash constructor C<{}> instead. Here's the right way to do the preceding
broken code fragments:

    for $i (1..10) {
        @list = somefunc($i);
        $LoL[$i] = [ @list ];
    }

The square brackets make a reference to a new array with a I<copy>
of what's in @list at the time of the assignment. This is what
you want.

Note that this will produce something similar, but it's
much harder to read:

    for $i (1..10) {
        @list = somefunc($i);
        @{$LoL[$i]} = @list;
    }

Is it the same? Well, maybe so, and maybe not. The subtle difference
is that when you assign something in square brackets, you know for sure
it's always a brand new reference with a new I<copy> of the data.
Something else could be going on in this new case with the C<@{$LoL[$i]}>
dereference on the left-hand side of the assignment. It all depends on
whether C<$LoL[$i]> had been undefined to start with (in which case Perl
quietly creates a new anonymous array for it), or whether it already
contained a reference. If you had already populated @LoL with
references, as in

    $LoL[3] = \@another_list;

then the assignment with the indirection on the left-hand side would
use the existing reference that was already there:

    @{$LoL[3]} = @list;

Of course, this I<would> have the "interesting" effect of clobbering
@another_list. (Have you ever noticed how when a programmer says
something is "interesting", that rather than meaning "intriguing",
they're disturbingly more apt to mean that it's "annoying",
"difficult", or both? :-)

So just remember always to use the array or hash constructors with C<[]>
or C<{}>, and you'll be fine, although it's not always optimally
efficient.

Surprisingly, the following dangerous-looking construct will
actually work out fine:

    for $i (1..10) {
        my @list = somefunc($i);
        $LoL[$i] = \@list;
    }

That's because my() is more of a run-time statement than it is a
compile-time declaration I<per se>. This means that the my() variable is
remade afresh each time through the loop. So even though it I<looks> as
though you stored the same variable reference each time, you actually did
not! This is a subtle distinction that can produce more efficient code (no
copy of the list is made) at the risk of misleading all but the most
experienced of programmers. So I usually advise against teaching it to
beginners. In fact, except for passing arguments to functions, I seldom
like to see the gimme-a-reference operator (backslash) used much at all in
code. Instead, I advise beginners that they (and most of the rest of us)
should try to use the much more easily understood constructors C<[]> and
C<{}> instead of relying upon lexical (or dynamic) scoping and hidden
reference-counting to do the right thing behind the scenes.

In summary:

    $LoL[$i] = [ @list ];       # usually best
    $LoL[$i] = \@list;          # perilous; just how my() was that list?
    @{ $LoL[$i] } = @list;      # way too tricky for most programmers

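The following self-contained sketch compares these assignments side by side. Here somefunc() is a hypothetical stand-in that just returns C<($i, $i * 10)>, and @shared, @copied and @fresh are illustrative names. The first loop reuses one package array, as the broken examples above do. The second loop relies on my().

    use strict;
    use warnings;

    # hypothetical stand-in for somefunc(): returns ($i, $i * 10)
    sub somefunc { my $i = shift; return ( $i, $i * 10 ) }

    our @list;                        # one package array reused every pass
    my ( @shared, @copied, @fresh );

    for my $i ( 1 .. 3 ) {
        @list       = somefunc($i);
        $shared[$i] = \@list;         # WRONG: every slot refers to @list
        $copied[$i] = [ @list ];      # right: a fresh anonymous copy
    }

    for my $i ( 1 .. 3 ) {
        my @row    = somefunc($i);    # my() makes a new array each pass
        $fresh[$i] = \@row;           # safe, but only because of my()
    }

    print "shared: $shared[1][0] $shared[2][0] $shared[3][0]\n";   # 3 3 3
    print "copied: $copied[1][0] $copied[2][0] $copied[3][0]\n";   # 1 2 3
    print "fresh:  $fresh[1][0] $fresh[2][0] $fresh[3][0]\n";      # 1 2 3

All three @shared slots show whatever was assigned last. The copied and my() versions keep each pass's data separate.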
=head1 CAVEAT ON PRECEDENCE

Speaking of things like C<@{$LoL[$i]}>, the following are actually the
same thing:

    $listref->[2][2]    # clear
    $$listref[2][2]     # confusing

That's because Perl's precedence rules on its five prefix dereferencers
(which look like someone swearing: C<$ @ * % &>) make them bind more
tightly than the postfix subscripting brackets or braces! This will no
doubt come as a great shock to the C or C++ programmer, who is quite
accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
element of C<a>. That is, they first take the subscript, and only then
dereference the thing at that subscript. That's fine in C, but this isn't C.

The seemingly equivalent construct in Perl, C<$$listref[$i]>, first
dereferences C<$listref>, treating it as a reference to an array, and only
then subscripts that array, giving you the I<i'th> element of the array
that $listref points to. If you wanted the C notion, you'd have to
write C<${$LoL[$i]}> to force C<$LoL[$i]> to be evaluated first,
before the leading C<$> dereferencer is applied.

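You can check the precedence rule with this small sketch on made-up data. C<$$listref[2][2]> and C<$listref-E<gt>[2][2]> name the same element. C<${ $LoL[2] }[2]> needs its braces so that C<$LoL[2]> is evaluated before the leading C<$> dereferencer.

    use strict;
    use warnings;

    # made-up data: three rows of letters
    my @LoL     = ( [ "a", "b", "c" ], [ "d", "e", "f" ], [ "g", "h", "i" ] );
    my $listref = \@LoL;

    my $clear     = $listref->[2][2];    # arrows spell out the order
    my $confusing = $$listref[2][2];     # parsed as ${$listref}[2][2]
    my $c_style   = ${ $LoL[2] }[2];     # braces force $LoL[2] first

    # $$listref[2] alone is the whole third row, still a reference
    my $row_ref = $$listref[2];
    print "$clear $confusing $c_style (@$row_ref)\n";   # i i i (g h i)
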
=head1 WHY YOU SHOULD ALWAYS C<use strict>

If this is starting to sound scarier than it's worth, relax. Perl has
some features to help you avoid its most common pitfalls. The best
way to avoid getting confused is to start every program like this:

    #!/usr/bin/perl -w

    use strict;

This way, you'll be forced to declare all your variables with my(), and
accidental "symbolic dereferencing" will be disallowed. Therefore if you'd
done this:

    my $listref = [
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "homer", "bart", "marge", "maggie", ],
        [ "george", "jane", "elroy", "judy", ],
    ];

    print $listref[2][2];

the compiler would immediately flag that as an error I<at compile time>,
because you were accidentally accessing C<@listref>, an undeclared
variable, and it would thereby remind you to write instead:

    print $listref->[2][2];

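To see strict catch this mistake without writing a separate script, compile the broken line inside a string eval() and inspect C<$@>. This is only a demonstration harness. In a real program you would let the compiler stop, and the exact wording of the message varies between Perl versions.

    use strict;
    use warnings;

    # compile the mistaken code at run time so we can capture the error
    my $ok = eval q{
        use strict;
        my $listref = [ [ "fred", "barney" ], [ "homer", "bart" ] ];
        my $x = $listref[1][1];   # oops: refers to the undeclared @listref
        1;
    };
    my $error = $@;               # the compile error, if any

    print "strict said: $error" unless $ok;
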
=head1 DEBUGGING

Before version 5.002, the standard Perl debugger didn't do a very nice job of
printing out complex data structures. With 5.002 or above, the
debugger includes several new features, including command line editing as
well as the C<X> command to dump out complex data structures. For
example, if $LoL held a reference to an array of arrays like the one above,
here's what the debugger would show:

    DB<1> X $LoL
    $LoL = ARRAY(0x13b5a0)

There's also a lower-case B<x> command which is nearly the same.

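Outside the debugger, the core Data::Dumper module produces a similar nested listing. This minimal sketch reuses the rows from $listref above under the name $LoL. Setting C<$Data::Dumper::Indent> to 1 is simply a formatting choice for compact, fixed-width nesting.

    use strict;
    use warnings;
    use Data::Dumper;

    my $LoL = [
        [ "fred", "barney", "pebbles", "bambam", "dino" ],
        [ "homer", "bart", "marge", "maggie" ],
        [ "george", "jane", "elroy", "judy" ],
    ];

    $Data::Dumper::Indent = 1;    # fixed-width nesting, not aligned columns
    my $dump = Data::Dumper->Dump( [ $LoL ], [ 'LoL' ] );
    print $dump;
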
=head1 CODE EXAMPLES

Presented with little comment (these will get their own man pages someday),
here are short code examples illustrating access of various
types of data structures.

=head1 LISTS OF LISTS

=head2 Declaration of a LIST OF LISTS

    @LoL = (
           [ "fred", "barney" ],
           [ "george", "jane", "elroy" ],
           [ "homer", "marge", "bart" ],
         );

=head2 Generation of a LIST OF LISTS

    # reading from file
    while ( <> ) {
        push @LoL, [ split ];
    }

    # calling a function
    for $i ( 1 .. 10 ) {
        $LoL[$i] = [ somefunc($i) ];
    }

    # add to an existing row
    push @{ $LoL[0] }, "wilma", "betty";

=head2 Access and Printing of a LIST OF LISTS

    # one element
    $LoL[0][0] = "Fred";

    # another element
    $LoL[1][1] =~ s/(\w)/\u$1/;

    # print the whole thing with refs
    for $aref ( @LoL ) {
        print "\t [ @$aref ],\n";
    }

    # print the whole thing with indices
    for $i ( 0 .. $#LoL ) {
        print "\t [ @{$LoL[$i]} ],\n";
    }

    # print the whole thing one at a time
    for $i ( 0 .. $#LoL ) {
        for $j ( 0 .. $#{ $LoL[$i] } ) {
            print "elt $i $j is $LoL[$i][$j]\n";
        }
    }

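This self-contained sketch combines the LIST OF LISTS recipes. An in-memory string opened as a filehandle stands in for C<E<lt>E<gt>>, and the three input lines are made up for illustration.

    use strict;
    use warnings;

    my $input = "fred barney\ngeorge jane elroy\nhomer marge bart\n";
    open my $fh, '<', \$input or die "can't open in-memory file: $!";

    my @LoL;
    while ( my $line = <$fh> ) {
        push @LoL, [ split ' ', $line ];   # a new anonymous array per line
    }
    close $fh;

    push @{ $LoL[0] }, "wilma", "betty";   # grow an existing row
    $LoL[1][1] =~ s/(\w)/\u$1/;            # "jane" becomes "Jane"

    for my $i ( 0 .. $#LoL ) {
        for my $j ( 0 .. $#{ $LoL[$i] } ) {
            print "elt $i $j is $LoL[$i][$j]\n";
        }
    }
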
=head1 HASHES OF LISTS

=head2 Declaration of a HASH OF LISTS

    %HoL = (
           flintstones        => [ "fred", "barney" ],
           jetsons            => [ "george", "jane", "elroy" ],
           simpsons           => [ "homer", "marge", "bart" ],
         );

=head2 Generation of a HASH OF LISTS

    # reading from file
    # flintstones: fred barney wilma dino
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $HoL{$1} = [ split ];
    }

    # reading from file; more temps
    # flintstones: fred barney wilma dino
    while ( $line = <> ) {
        ($who, $rest) = split /:\s*/, $line, 2;
        @fields = split ' ', $rest;
        $HoL{$who} = [ @fields ];
    }

    # calling a function that returns a list
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        $HoL{$group} = [ get_family($group) ];
    }

    # likewise, but using temps
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        @members = get_family($group);
        $HoL{$group} = [ @members ];
    }

    # append new members to an existing family
    push @{ $HoL{"flintstones"} }, "wilma", "betty";

=head2 Access and Printing of a HASH OF LISTS

    # one element
    $HoL{flintstones}[0] = "Fred";

    # another element
    $HoL{simpsons}[1] =~ s/(\w)/\u$1/;

    # print the whole thing
    foreach $family ( keys %HoL ) {
        print "$family: @{ $HoL{$family} }\n";
    }

    # print the whole thing with indices
    foreach $family ( keys %HoL ) {
        print "$family: ";
        foreach $i ( 0 .. $#{ $HoL{$family} } ) {
            print " $i = $HoL{$family}[$i]";
        }
        print "\n";
    }

    # print the whole thing sorted by number of members
    foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} } keys %HoL ) {
        print "$family: @{ $HoL{$family} }\n";
    }

    # print the whole thing sorted by number of members and name
    foreach $family ( sort {
                               @{$HoL{$b}} <=> @{$HoL{$a}}
                                           ||
                                       $a cmp $b
               } keys %HoL )
    {
        print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n";
    }

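Here is a runnable sketch of the last HASH OF LISTS recipe. It uses the data from the declaration above plus the appended members. Sorting on member count first and family name second gives a deterministic order, which plain C<keys %HoL> does not.

    use strict;
    use warnings;

    my %HoL = (
        flintstones => [ "fred", "barney" ],
        jetsons     => [ "george", "jane", "elroy" ],
        simpsons    => [ "homer", "marge", "bart" ],
    );
    push @{ $HoL{flintstones} }, "wilma", "betty";

    my @report;
    for my $family ( sort {
                         @{ $HoL{$b} } <=> @{ $HoL{$a} }   # biggest first
                                       ||
                                   $a cmp $b               # then by name
                     } keys %HoL )
    {
        push @report, "$family: " . join( ", ", sort @{ $HoL{$family} } );
    }
    print "$_\n" for @report;

It should print flintstones first (four members), then jetsons and simpsons, whose three-member tie is broken alphabetically.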
=head1 LISTS OF HASHES

=head2 Declaration of a LIST OF HASHES

    @LoH = (
           {
              lead      => "fred",
              friend    => "barney",
           },
           {
               lead     => "george",
               wife     => "jane",
               son      => "elroy",
           },
           {
               lead     => "homer",
               wife     => "marge",
               son      => "bart",
           }
     );

=head2 Generation of a LIST OF HASHES

    # reading from file
    # format: lead=fred friend=barney
    while ( <> ) {
        $rec = {};
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
        push @LoH, $rec;
    }

    # reading from file
    # format: lead=fred friend=barney
    # no temp
    while ( <> ) {
        push @LoH, { split /[\s=]+/ };
    }

    # calling a function that returns a key,value list, like
    # "lead","fred","daughter","pebbles"
    while ( %fields = getnextpairset() ) {
        push @LoH, { %fields };
    }

    # likewise, but using no temp vars
    while (<>) {
        push @LoH, { parsepairs($_) };
    }

    # add key/value to an element
    $LoH[0]{pet} = "dino";
    $LoH[2]{pet} = "santa's little helper";

=head2 Access and Printing of a LIST OF HASHES

    # one element
    $LoH[0]{lead} = "fred";

    # another element
    $LoH[1]{lead} =~ s/(\w)/\u$1/;

    # print the whole thing with refs
    for $href ( @LoH ) {
        print "{ ";
        for $role ( keys %$href ) {
            print "$role=$href->{$role} ";
        }
        print "}\n";
    }

    # print the whole thing with indices
    for $i ( 0 .. $#LoH ) {
        print "$i is { ";
        for $role ( keys %{ $LoH[$i] } ) {
            print "$role=$LoH[$i]{$role} ";
        }
        print "}\n";
    }

    # print the whole thing one at a time
    for $i ( 0 .. $#LoH ) {
        for $role ( keys %{ $LoH[$i] } ) {
            print "elt $i $role is $LoH[$i]{$role}\n";
        }
    }

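This small sketch shows the no-temp generation recipe, reading from a made-up list of strings instead of a file. Splitting on C</[\s=]+/> turns each C<key=value key=value> line into a flat key/value list, and the C<{ }> constructor copies that list into a new anonymous hash. The keys are sorted only to make the output repeatable.

    use strict;
    use warnings;

    # made-up input in the "lead=fred friend=barney" format
    my @lines = (
        "lead=fred friend=barney",
        "lead=george wife=jane son=elroy",
        "lead=homer wife=marge son=bart",
    );

    my @LoH;
    for my $line (@lines) {
        # split on runs of whitespace or '=' to get key, value, key, value...
        push @LoH, { split /[\s=]+/, $line };
    }
    $LoH[2]{pet} = "santa's little helper";

    for my $i ( 0 .. $#LoH ) {
        print "$i is { ";
        for my $role ( sort keys %{ $LoH[$i] } ) {
            print "$role=$LoH[$i]{$role} ";
        }
        print "}\n";
    }
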
=head1 HASHES OF HASHES

=head2 Declaration of a HASH OF HASHES

    %HoH = (
           flintstones => {
                   lead      => "fred",
                   pal       => "barney",
           },
           jetsons     => {
                   lead      => "george",
                   wife      => "jane",
                   "his boy" => "elroy",
           },
           simpsons    => {
                   lead      => "homer",
                   wife      => "marge",
                   kid       => "bart",
           },
    );

=head2 Generation of a HASH OF HASHES

    # reading from file
    # flintstones: lead=fred pal=barney wife=wilma pet=dino
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $HoH{$who}{$key} = $value;
        }
    }

    # reading from file; more temps
    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        $rec = {};
        $HoH{$who} = $rec;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
    }

    # calling a function that returns a key,value list
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        $HoH{$group} = { get_family($group) };
    }

    # likewise, but using temps
    for $group ( "simpsons", "jetsons", "flintstones" ) {
        %members = get_family($group);
        $HoH{$group} = { %members };
    }

    # append new members to an existing family
    %new_folks = (
        wife => "wilma",
        pet  => "dino",
    );

    for $what (keys %new_folks) {
        $HoH{flintstones}{$what} = $new_folks{$what};
    }

=head2 Access and Printing of a HASH OF HASHES

    # one element
    $HoH{flintstones}{wife} = "wilma";

    # another element
    $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;

    # print the whole thing
    foreach $family ( keys %HoH ) {
        print "$family: { ";
        for $role ( keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # print the whole thing somewhat sorted
    foreach $family ( sort keys %HoH ) {
        print "$family: { ";
        for $role ( sort keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # print the whole thing sorted by number of members
    foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
        print "$family: { ";
        for $role ( sort keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

    # establish a sort order (rank) for each role
    $i = 0;
    for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }

    # now print the whole thing sorted by number of members
    foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
        print "$family: { ";
        # and print these according to rank order
        for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "}\n";
    }

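Here is a self-contained sketch of the rank-ordered printout, with the flintstones merged with %new_folks as above. The data is illustrative, and the jetsons and simpsons use C<son> so that every role has a rank. The name comparison is an extra tie-breaker, not part of the recipe above. It keeps families of equal size in a stable order.

    use strict;
    use warnings;

    # illustrative data, not the exact %HoH declared above
    my %HoH = (
        flintstones => { lead => "fred",   pal  => "barney" },
        jetsons     => { lead => "george", wife => "jane",  son => "elroy" },
        simpsons    => { lead => "homer",  wife => "marge", son => "bart" },
    );

    # append new members by copying in another hash
    my %new_folks = ( wife => "wilma", pet => "dino" );
    $HoH{flintstones}{$_} = $new_folks{$_} for keys %new_folks;

    # establish a sort order (rank) for each role
    my %rank;
    my $i = 0;
    $rank{$_} = ++$i for qw(lead wife son daughter pal pet);

    my @report;
    for my $family ( sort {
                         keys( %{ $HoH{$b} } ) <=> keys( %{ $HoH{$a} } )
                             || $a cmp $b        # tie-break on family name
                     } keys %HoH )
    {
        # list this family's roles in rank order
        my @pairs = map  { "$_=$HoH{$family}{$_}" }
                    sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} };
        push @report, "$family: { @pairs }";
    }
    print "$_\n" for @report;
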
=head1 MORE ELABORATE RECORDS

=head2 Declaration of MORE ELABORATE RECORDS

Here's a sample showing how to create and use a record whose fields are of
many different sorts:

    $rec = {
        SEQUENCE  => [ @old_values ],
        LOOKUP    => { %some_table },
        THATCODE  => \&some_function,
        THISCODE  => sub { $_[0] ** $_[1] },
        HANDLE    => \*STDOUT,
    };

    print $rec->{SEQUENCE}[0];
    $last = pop @{ $rec->{SEQUENCE} };

    print $rec->{LOOKUP}{"key"};
    ($first_k, $first_v) = each %{ $rec->{LOOKUP} };

    $answer = &{ $rec->{THATCODE} }($arg);
    $answer = &{ $rec->{THISCODE} }($arg1, $arg2);

    # careful of extra block braces on fh ref
    print { $rec->{HANDLE} } "a string\n";

    use IO::Handle;
    $rec->{HANDLE}->autoflush(1);
    $rec->{HANDLE}->print(" a string\n");

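A runnable version of the record above might look like this sketch. The helper square() stands in for the hypothetical some_function. An in-memory filehandle (opened on a scalar) takes the place of STDOUT so the output can be inspected afterwards. IO::Handle supplies the autoflush() and print() methods.

    use strict;
    use warnings;
    use IO::Handle;                  # for the autoflush() and print() methods

    # an in-memory "file" stands in for a real handle
    open my $out, '>', \my $buffer or die "can't open: $!";

    # hypothetical stand-in for some_function
    sub square { return $_[0] * $_[0] }

    my $rec = {
        SEQUENCE => [ 10, 20, 30 ],
        LOOKUP   => { key => "value" },
        THATCODE => \&square,
        THISCODE => sub { $_[0] ** $_[1] },
        HANDLE   => $out,
    };

    my $last = pop @{ $rec->{SEQUENCE} };        # 30
    my $sq   = $rec->{THATCODE}->(7);            # 49
    my $pow  = &{ $rec->{THISCODE} }(2, 10);     # 1024

    # the block braces tell print() which expression is the handle
    print { $rec->{HANDLE} } "last=$last sq=$sq pow=$pow\n";
    $rec->{HANDLE}->autoflush(1);
    $rec->{HANDLE}->print("lookup=$rec->{LOOKUP}{key}\n");
    close $out;

    print $buffer;
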
=head2 Declaration of a HASH OF COMPLEX RECORDS

    %TV = (
        flintstones => {
            series   => "flintstones",
            nights   => [ qw(monday thursday friday) ],
            members  => [
                { name => "fred",    role => "lead", age  => 36, },
                { name => "wilma",   role => "wife", age  => 31, },
                { name => "pebbles", role => "kid",  age  =>  4, },
            ],
        },

        jetsons     => {
            series   => "jetsons",
            nights   => [ qw(wednesday saturday) ],
            members  => [
                { name => "george",  role => "lead", age  => 41, },
                { name => "jane",    role => "wife", age  => 39, },
                { name => "elroy",   role => "kid",  age  =>  9, },
            ],
        },

        simpsons    => {
            series   => "simpsons",
            nights   => [ qw(monday) ],
            members  => [
                { name => "homer",   role => "lead", age  => 34, },
                { name => "marge",   role => "wife", age  => 37, },
                { name => "bart",    role => "kid",  age  => 11, },
            ],
        },
    );

=head2 Generation of a HASH OF COMPLEX RECORDS

    # reading from file
    # this is most easily done by having the file itself be
    # in the raw data format as shown above. perl is happy
    # to parse complex data structures if declared as data, so
    # sometimes it's easiest to do that

    # here's a piece by piece build up
    $rec = {};
    $rec->{series} = "flintstones";
    $rec->{nights} = [ find_days() ];

    @members = ();
    # assume this file in field=value syntax
    while (<>) {
        %fields = split /[\s=]+/;
        push @members, { %fields };
    }
    $rec->{members} = [ @members ];

    # now remember the whole thing
    $TV{ $rec->{series} } = $rec;

    ###########################################################
    # now, you might want to make interesting extra fields that
    # include pointers back into the same data structure so if
    # you change one piece, it changes everywhere, for example
    # if you wanted a {kids} field that was an array reference
    # to a list of the kids' records without having duplicate
    # records and thus update problems.
    ###########################################################
    foreach $family (keys %TV) {
        $rec = $TV{$family}; # temp pointer
        @kids = ();
        for $person ( @{ $rec->{members} } ) {
            if ($person->{role} =~ /kid|son|daughter/) {
                push @kids, $person;
            }
        }
        # REMEMBER: $rec and $TV{$family} point to same data!!
        $rec->{kids} = [ @kids ];
    }

    # you copied the list, but the list itself contains pointers
    # to uncopied objects. this means that if you make bart get
    # older via

    $TV{simpsons}{kids}[0]{age}++;

    # then the change also shows up in
    print $TV{simpsons}{members}[2]{age};

    # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
    # both point to the same underlying anonymous hash table

    # print the whole thing
    foreach $family ( keys %TV ) {
        print "the $family";
        print " is on during @{ $TV{$family}{nights} }\n";
        print "its members are:\n";
        for $who ( @{ $TV{$family}{members} } ) {
            print " $who->{name} ($who->{role}), age $who->{age}\n";
        }
        # records have no {lead} field, so find the lead member
        ($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} };
        print "it turns out that $lead->{name} has ";
        print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
        print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
        print "\n";
    }

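The sharing described in those comments can be checked with a cut-down sketch that keeps only the simpsons. Here grep() is a shorthand for the explicit kids loop above. Like that loop, it copies references, so each kid's hash is shared with the members list.

    use strict;
    use warnings;

    my %TV = (
        simpsons => {
            series  => "simpsons",
            nights  => [ qw(monday) ],
            members => [
                { name => "homer", role => "lead", age => 34 },
                { name => "marge", role => "wife", age => 37 },
                { name => "bart",  role => "kid",  age => 11 },
            ],
        },
    );

    for my $rec ( values %TV ) {
        # copy the references, not the hashes they point to
        $rec->{kids} = [ grep { $_->{role} =~ /kid|son|daughter/ }
                              @{ $rec->{members} } ];
    }

    $TV{simpsons}{kids}[0]{age}++;                  # bart has a birthday
    my $age  = $TV{simpsons}{members}[2]{age};      # seen here too: 12
    my $same = $TV{simpsons}{kids}[0] == $TV{simpsons}{members}[2];
    print "bart is now $age; same hash? ", ( $same ? "yes" : "no" ), "\n";
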
=head1 Database Ties

You cannot easily tie a multilevel data structure (such as a hash of
hashes) to a dbm file. The first problem is that all but GDBM and
Berkeley DB have size limitations, but beyond that, you also have problems
with how references are to be represented on disk. One experimental
module that partially addresses this need is the MLDBM module. Check your
nearest CPAN site as described in L<perlmod> for source code to MLDBM.

=head1 SEE ALSO

perlref(1), perllol(1), perldata(1), perlobj(1)

=head1 AUTHOR

Tom Christiansen E<lt>F<tchrist@perl.com>E<gt>

Wed Oct 23 04:57:50 MET DST 1996