1 # DB_File.pm -- Perl 5 interface to Berkeley DB
3 # written by Paul Marquess (pmarquess@bfsec.bt.co.uk)
4 # last modified 28th June 1996
7 package DB_File::HASHINFO ;
12 @DB_File::HASHINFO::ISA = qw(Tie::Hash);
26 bless { 'bsize' => undef,
40 return $self->{$key} if exists $self->{$key} ;
43 croak "${pkg}::FETCH - Unknown element '$key'" ;
53 if ( exists $self->{$key} )
55 $self->{$key} = $value ;
60 croak "${pkg}::STORE - Unknown element '$key'" ;
68 if ( exists $self->{$key} )
70 delete $self->{$key} ;
75 croak "DB_File::HASHINFO::DELETE - Unknown element '$key'" ;
83 exists $self->{$key} ;
91 croak "${pkg} does not define the method ${method}" ;
94 sub DESTROY { undef %{$_[0]} }
95 sub FIRSTKEY { my $self = shift ; $self->NotHere(ref $self, "FIRSTKEY") }
96 sub NEXTKEY { my $self = shift ; $self->NotHere(ref $self, "NEXTKEY") }
97 sub CLEAR { my $self = shift ; $self->NotHere(ref $self, "CLEAR") }
99 package DB_File::RECNOINFO ;
103 @DB_File::RECNOINFO::ISA = qw(DB_File::HASHINFO) ;
109 bless { 'bval' => undef,
110 'cachesize' => undef,
119 package DB_File::BTREEINFO ;
123 @DB_File::BTREEINFO::ISA = qw(DB_File::HASHINFO) ;
129 bless { 'flags' => undef,
130 'cachesize' => undef,
131 'maxkeypage' => undef,
132 'minkeypage' => undef,
144 use vars qw($VERSION @ISA @EXPORT $AUTOLOAD $DB_BTREE $DB_HASH $DB_RECNO) ;
150 #typedef enum { DB_BTREE, DB_HASH, DB_RECNO } DBTYPE;
151 #$DB_BTREE = TIEHASH DB_File::BTREEINFO ;
152 #$DB_HASH = TIEHASH DB_File::HASHINFO ;
153 #$DB_RECNO = TIEHASH DB_File::RECNOINFO ;
155 $DB_BTREE = new DB_File::BTREEINFO ;
156 $DB_HASH = new DB_File::HASHINFO ;
157 $DB_RECNO = new DB_File::RECNOINFO ;
163 @ISA = qw(Tie::Hash Exporter DynaLoader);
165 $DB_BTREE $DB_HASH $DB_RECNO
200 ($constname = $AUTOLOAD) =~ s/.*:://;
201 my $val = constant($constname, @_ ? $_[0] : 0);
203 if ($! =~ /Invalid/) {
204 $AutoLoader::AUTOLOAD = $AUTOLOAD;
205 goto &AutoLoader::AUTOLOAD;
208 my($pack,$file,$line) = caller;
209 croak "Your vendor has not defined DB macro $constname, used at $file line $line.
213 eval "sub $AUTOLOAD { $val }";
217 bootstrap DB_File $VERSION;
219 # Preloaded methods go here. Autoload methods go after __END__, and are
220 # processed by the autosplit program.
225 croak "Usage: \$db->get_dup(key [,flag])\n"
226 unless @_ == 2 or @_ == 3 ;
233 my $wantarray = wantarray ;
237 # get the first value associated with the key, $key
238 $db->seq($key, $value, R_CURSOR()) ;
240 if ( $key eq $origkey) {
243 # save the value or count matches
245 { push (@values, $value) ; push(@values, 1) if $flag }
249 # iterate through the database until either EOF
250 # or a different key is encountered.
251 last if $db->seq($key, $value, R_NEXT()) != 0 or $key ne $origkey ;
255 $wantarray ? @values : $counter ;
266 DB_File - Perl5 access to Berkeley DB
273 [$X =] tie %hash, 'DB_File', [$filename, $flags, $mode, $DB_HASH] ;
274 [$X =] tie %hash, 'DB_File', $filename, $flags, $mode, $DB_BTREE ;
275 [$X =] tie @array, 'DB_File', $filename, $flags, $mode, $DB_RECNO ;
277 $status = $X->del($key [, $flags]) ;
278 $status = $X->put($key, $value [, $flags]) ;
279 $status = $X->get($key, $value [, $flags]) ;
280 $status = $X->seq($key, $value, $flags) ;
281 $status = $X->sync([$flags]) ;
284 $count = $X->get_dup($key) ;
285 @list = $X->get_dup($key) ;
286 %list = $X->get_dup($key, 1) ;
293 B<DB_File> is a module which allows Perl programs to make use of the
294 facilities provided by Berkeley DB. If you intend to use this
295 module you should really have a copy of the Berkeley DB manual page at
296 hand. The interface defined here mirrors the Berkeley DB interface
299 Berkeley DB is a C library which provides a consistent interface to a
300 number of database formats. B<DB_File> provides an interface to all
301 three of the database types currently supported by Berkeley DB.
309 This database type allows arbitrary key/value pairs to be stored in data
310 files. This is equivalent to the functionality provided by other
311 hashing packages like DBM, NDBM, ODBM, GDBM, and SDBM. Remember though,
312 the files created using DB_HASH are not compatible with any of the
313 other packages mentioned.
315 A default hashing algorithm, which will be adequate for most
316 applications, is built into Berkeley DB. If you do need to use your own
317 hashing algorithm it is possible to write your own in Perl and have
318 B<DB_File> use it instead.
The btree format allows arbitrary key/value pairs to be stored in a
sorted, balanced tree.
325 As with the DB_HASH format, it is possible to provide a user defined
326 Perl routine to perform the comparison of keys. By default, though, the
327 keys are stored in lexical order.
331 DB_RECNO allows both fixed-length and variable-length flat text files
332 to be manipulated using the same key/value pair interface as in DB_HASH
333 and DB_BTREE. In this case the key will consist of a record (line)
338 =head2 How does DB_File interface to Berkeley DB?
340 B<DB_File> allows access to Berkeley DB files using the tie() mechanism
341 in Perl 5 (for full details, see L<perlfunc/tie()>). This facility
342 allows B<DB_File> to access Berkeley DB files using either an
343 associative array (for DB_HASH & DB_BTREE file types) or an ordinary
344 array (for the DB_RECNO file type).
346 In addition to the tie() interface, it is also possible to access most
347 of the functions provided in the Berkeley DB API directly.
348 See L<"Using the Berkeley DB API Directly">.
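As a rough illustration of the tied interface, the sketch below stores one
pair through a tied hash, unties it, then ties the same file again and reads
the pair back. The file name F<fruit.db>, the temporary directory and the
variable names are assumptions made for the example; none of them is part of
B<DB_File>.

```perl
use strict ;
use warnings ;
use DB_File ;
use Fcntl ;
use File::Temp qw(tempdir) ;

# Hypothetical scratch location, removed when the script exits.
my $dir  = tempdir(CLEANUP => 1) ;
my $file = "$dir/fruit.db" ;

# Create the database and store one pair through the tied hash.
my %h ;
tie %h, 'DB_File', $file, O_RDWR|O_CREAT, 0640, $DB_HASH
    or die "Cannot open $file: $!\n" ;
$h{'apple'} = 'red' ;
untie %h ;

# Tie the same file again, read-only this time, and fetch the value.
my %again ;
tie %again, 'DB_File', $file, O_RDONLY, 0640, $DB_HASH
    or die "Cannot reopen $file: $!\n" ;
my $colour = $again{'apple'} ;
untie %again ;

print "apple -> $colour\n" ;
```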
350 =head2 Opening a Berkeley DB Database File
352 Berkeley DB uses the function dbopen() to open or create a database.
353 Below is the C prototype for dbopen().
356 dbopen (const char * file, int flags, int mode,
357 DBTYPE type, const void * openinfo)
The parameter C<type> is an enumeration which specifies which of the 3
interface methods (DB_HASH, DB_BTREE or DB_RECNO) is to be used.
Depending on which of these is actually chosen, the final parameter,
I<openinfo>, points to a data structure which allows tailoring of the
specific interface method.
365 This interface is handled slightly differently in B<DB_File>. Here is
366 an equivalent call using B<DB_File>:
368 tie %array, 'DB_File', $filename, $flags, $mode, $DB_HASH ;
370 The C<filename>, C<flags> and C<mode> parameters are the direct
371 equivalent of their dbopen() counterparts. The final parameter $DB_HASH
372 performs the function of both the C<type> and C<openinfo> parameters in
375 In the example above $DB_HASH is actually a pre-defined reference to a
376 hash object. B<DB_File> has three of these pre-defined references.
377 Apart from $DB_HASH, there is also $DB_BTREE and $DB_RECNO.
The keys allowed in each of these pre-defined references are limited to
the names used in the equivalent C structure. So, for example, the
$DB_HASH reference will only allow keys called C<bsize>, C<cachesize>,
C<ffactor>, C<hash>, C<lorder> and C<nelem>.
384 To change one of these elements, just assign to it like this:
386 $DB_HASH->{'cachesize'} = 10000 ;
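The restriction on key names can be seen with a short sketch. It works on a
fresh C<DB_File::HASHINFO> object so that the shared $DB_HASH is left alone.
The key C<pagesize> is deliberately invalid for DB_HASH and is used here only
to provoke the error; the variable names are likewise just for illustration.

```perl
use strict ;
use warnings ;
use DB_File ;

# A private info object, so the predefined $DB_HASH is not modified.
my $info = DB_File::HASHINFO->new ;

# 'cachesize' is one of the documented DB_HASH keys, so this is accepted.
$info->{'cachesize'} = 10000 ;

# 'pagesize' is not a DB_HASH key (an assumption chosen for the demo),
# so the assignment croaks; eval traps the exception.
my $ok  = eval { $info->{'pagesize'} = 4096 ; 1 } ;
my $err = $ok ? '' : $@ ;

print "cachesize is $info->{'cachesize'}\n" ;
print "rejected: $err" unless $ok ;
```

Trapping the error with C<eval> is only needed for the demonstration. In
real code a misspelt key is a programming error and is best left to die.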
388 The three predefined variables $DB_HASH, $DB_BTREE and $DB_RECNO are
389 usually adequate for most applications. If you do need to create extra
390 instances of these objects, constructors are available for each file
393 Here are examples of the constructors and the valid options available
394 for DB_HASH, DB_BTREE and DB_RECNO respectively.
396 $a = new DB_File::HASHINFO ;
404 $b = new DB_File::BTREEINFO ;
414 $c = new DB_File::RECNOINFO ;
The values stored in the hashes above are mostly the direct equivalents
of their C counterparts. Like their C counterparts, all are set to
default values, which means you don't have to set I<all> of the
values when you only want to change one. Here is an example:
428 $a = new DB_File::HASHINFO ;
429 $a->{'cachesize'} = 12345 ;
430 tie %y, 'DB_File', "filename", $flags, 0777, $a ;
432 A few of the values need extra discussion here. When used, the C
433 equivalent of the keys C<hash>, C<compare> and C<prefix> store pointers
434 to C functions. In B<DB_File> these keys are used to store references
435 to Perl subs. Below are templates for each of the subs:
441 # return the hash value for $data
    my ($key1, $key2) = @_ ;
    # return  0 if $key1 eq $key2
    #        -1 if $key1 lt $key2
    #         1 if $key1 gt $key2
    return $key1 cmp $key2 ;    # yields -1, 0 or 1
    my ($key1, $key2) = @_ ;
    # return number of bytes of $key2 which are
    # necessary to determine that it is greater than $key1
464 See L<"Using BTREE"> for an example of using the C<compare>
466 =head2 Default Parameters
468 It is possible to omit some or all of the final 4 parameters in the
469 call to C<tie> and let them take default values. As DB_HASH is the most
470 common file format used, the call:
472 tie %A, "DB_File", "filename" ;
476 tie %A, "DB_File", "filename", O_CREAT|O_RDWR, 0640, $DB_HASH ;
It is also possible to omit the filename parameter, so the
485 tie %A, "DB_File", undef, O_CREAT|O_RDWR, 0640, $DB_HASH ;
487 See L<"In Memory Databases"> for a discussion on the use of C<undef>
488 in place of a filename.
490 =head2 Handling duplicate keys in BTREE databases
492 The BTREE file type in Berkeley DB optionally allows a single key to be
493 associated with an arbitrary number of values. This option is enabled by
494 setting the flags element of C<$DB_BTREE> to R_DUP when creating the
497 There are some difficulties in using the tied hash interface if you
498 want to manipulate a BTREE database with duplicate keys. Consider this
507 # Enable duplicate records
508 $DB_BTREE->{'flags'} = R_DUP ;
510 tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0640, $DB_BTREE
511 or die "Cannot open $filename: $!\n";
513 # Add some key/value pairs to the file
514 $h{'Wall'} = 'Larry' ;
515 $h{'Wall'} = 'Brick' ; # Note the duplicate key
516 $h{'Smith'} = 'John' ;
517 $h{'mouse'} = 'mickey' ;
519 # iterate through the associative array
520 # and print each key/value pair.
522 { print "$_ -> $h{$_}\n" }
As you can see, two records have been successfully created with key
C<Wall>. However, when they are retrieved from the database they
both I<seem> to have the same value, namely C<Larry>. The problem is
caused by the way that the associative array interface works.
Basically, when the associative array interface is used to fetch the
value associated with a given key, it will only ever retrieve the first
539 Although it may not be immediately obvious from the code above, the
540 associative array interface can be used to write values with duplicate
541 keys, but it cannot be used to read them back from the database.
543 The way to get around this problem is to use the Berkeley DB API method
544 called C<seq>. This method allows sequential access to key/value
545 pairs. See L<"Using the Berkeley DB API Directly"> for details of both
546 the C<seq> method and the API in general.
548 Here is the script above rewritten using the C<seq> API method.
556 # Enable duplicate records
557 $DB_BTREE->{'flags'} = R_DUP ;
559 $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0640, $DB_BTREE
560 or die "Cannot open $filename: $!\n";
562 # Add some key/value pairs to the file
563 $h{'Wall'} = 'Larry' ;
564 $h{'Wall'} = 'Brick' ; # Note the duplicate key
565 $h{'Smith'} = 'John' ;
566 $h{'mouse'} = 'mickey' ;
    # iterate through the btree using seq
    # and print each key/value pair.
    $key = $value = 0 ;
    for ($status = $x->seq($key, $value, R_FIRST) ;
         $status == 0 ;
         $status = $x->seq($key, $value, R_NEXT) )
      { print "$key -> $value\n" }
587 This time we have got all the key/value pairs, including both the
588 values associated with the key C<Wall>.
590 C<DB_File> comes with a utility method, called C<get_dup>, to assist in
591 reading duplicate values from BTREE databases. The method can take the
594 $count = $x->get_dup($key) ;
595 @list = $x->get_dup($key) ;
596 %list = $x->get_dup($key, 1) ;
598 In a scalar context the method returns the number of values associated
599 with the key, C<$key>.
601 In list context, it returns all the values which match C<$key>. Note
602 that the values returned will be in an apparently random order.
If the second parameter is present and evaluates TRUE, the method
returns an associative array whose keys correspond to the values
from the BTREE and whose values are all C<1>.
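The three calling conventions can be tried out on a throwaway database. The
sketch below assumes an in-memory BTREE (C<undef> as the filename) and a
private C<DB_File::BTREEINFO> object, so the shared $DB_BTREE is untouched.
The variable names are illustrative only.

```perl
use strict ;
use warnings ;
use DB_File ;
use Fcntl ;

# Private info object with duplicate keys enabled.
my $info = DB_File::BTREEINFO->new ;
$info->{'flags'} = R_DUP ;

my %h ;
my $db = tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0640, $info
    or die "Cannot create in-memory BTREE: $!\n" ;

$h{'Wall'}  = 'Larry' ;
$h{'Wall'}  = 'Brick' ;     # duplicate key
$h{'Smith'} = 'John' ;

my $count = $db->get_dup('Wall') ;      # scalar context: number of values
my @list  = $db->get_dup('Wall') ;      # list context: the values themselves
my %seen  = $db->get_dup('Wall', 1) ;   # flag set: values become hash keys

print "Wall occurred $count times\n" ;
print "Wall => [@list]\n" ;
print "Larry is there\n" if $seen{'Larry'} ;

undef $db ;
untie %h ;
```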
So assuming the database created above, we can use C<get_dup> like

    $cnt  = $x->get_dup("Wall") ;
    print "Wall occurred $cnt times\n" ;

    %hash = $x->get_dup("Wall", 1) ;
    print "Larry is there\n" if $hash{'Larry'} ;

    @list = $x->get_dup("Wall") ;
    print "Wall => [@list]\n" ;

    @list = $x->get_dup("Smith") ;
    print "Smith => [@list]\n" ;

    @list = $x->get_dup("Dog") ;
    print "Dog => [@list]\n" ;
629 Wall occurred 2 times
631 Wall => [Brick Larry]
637 In order to make RECNO more compatible with Perl the array offset for
638 all RECNO arrays begins at 0 rather than 1 as in Berkeley DB.
640 As with normal Perl arrays, a RECNO array can be accessed using
641 negative indexes. The index -1 refers to the last element of the array,
642 -2 the second last, and so on. Attempting to access an element before
643 the start of the array will raise a fatal run-time error.
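A small sketch of the zero-based and negative indexing described above. The
file F<lines.txt> and the temporary directory are assumptions made for the
example, as are the variable names.

```perl
use strict ;
use warnings ;
use DB_File ;
use Fcntl ;
use File::Temp qw(tempdir) ;

# Hypothetical scratch file, removed when the script exits.
my $dir  = tempdir(CLEANUP => 1) ;
my $file = "$dir/lines.txt" ;

my @lines ;
tie @lines, 'DB_File', $file, O_RDWR|O_CREAT, 0640, $DB_RECNO
    or die "Cannot open $file: $!\n" ;

# Offsets start at 0, as with an ordinary Perl array.
$lines[0] = 'first' ;
$lines[1] = 'second' ;
$lines[2] = 'third' ;

my $last        = $lines[-1] ;  # last record
my $second_last = $lines[-2] ;  # record before the last

print "last: $last, second last: $second_last\n" ;

untie @lines ;
```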
645 =head2 In Memory Databases
647 Berkeley DB allows the creation of in-memory databases by using NULL
648 (that is, a C<(char *)0> in C) in place of the filename. B<DB_File>
649 uses C<undef> instead of NULL to provide this functionality.
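For instance, a scratch hash that never touches the disk might be set up as
in the sketch below. The key names and values are made up for the example.

```perl
use strict ;
use warnings ;
use DB_File ;
use Fcntl ;

# undef in place of the filename asks for an in-memory database.
my %cache ;
tie %cache, 'DB_File', undef, O_RDWR|O_CREAT, 0640, $DB_HASH
    or die "Cannot create in-memory database: $!\n" ;

$cache{'colour'} = 'blue' ;
$cache{'shape'}  = 'round' ;

my $entries = keys %cache ;     # number of keys currently stored
my $colour  = $cache{'colour'} ;

print "$entries entries, colour is $colour\n" ;

# The contents are discarded once the hash is untied.
untie %cache ;
```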
652 =head2 Using the Berkeley DB API Directly
654 As well as accessing Berkeley DB using a tied hash or array, it is also
655 possible to make direct use of most of the API functions defined in the
656 Berkeley DB documentation.
658 To do this you need to store a copy of the object returned from the tie.
660 $db = tie %hash, "DB_File", "filename" ;
662 Once you have done that, you can access the Berkeley DB API functions
663 as B<DB_File> methods directly like this:
665 $db->put($key, $value, R_NOOVERWRITE) ;
667 B<Important:> If you have saved a copy of the object returned from
668 C<tie>, the underlying database file will I<not> be closed until both
669 the tied variable is untied and all copies of the saved object are
673 $db = tie %hash, "DB_File", "filename"
674 or die "Cannot tie filename: $!" ;
All the functions defined in L<dbopen> are available except for
close() and dbopen() itself. The B<DB_File> method interface to the
supported functions has been implemented to mirror the way Berkeley DB
works whenever possible. In particular note that:
688 The methods return a status value. All return 0 on success.
689 All return -1 to signify an error and set C<$!> to the exact
690 error code. The return code 1 generally (but not always) means that the
691 key specified did not exist in the database.
693 Other return codes are defined. See below and in the Berkeley DB
694 documentation for details. The Berkeley DB documentation should be used
695 as the definitive source.
699 Whenever a Berkeley DB function returns data via one of its parameters,
700 the equivalent B<DB_File> method does exactly the same.
704 If you are careful, it is possible to mix API calls with the tied
705 hash/array interface in the same piece of code. Although only a few of
706 the methods used to implement the tied interface currently make use of
707 the cursor, you should always assume that the cursor has been changed
708 any time the tied hash/array interface is used. As an example, this
709 code will probably not do what you expect:
711 $X = tie %x, 'DB_File', $filename, O_RDWR|O_CREAT, 0777, $DB_BTREE
712 or die "Cannot tie $filename: $!" ;
714 # Get the first key/value pair and set the cursor
715 $X->seq($key, $value, R_FIRST) ;
717 # this line will modify the cursor
718 $count = scalar keys %x ;
720 # Get the second key/value pair.
721 # oops, it didn't, it got the last key/value pair!
722 $X->seq($key, $value, R_NEXT) ;
724 The code above can be rearranged to get around the problem, like this:
726 $X = tie %x, 'DB_File', $filename, O_RDWR|O_CREAT, 0777, $DB_BTREE
727 or die "Cannot tie $filename: $!" ;
729 # this line will modify the cursor
730 $count = scalar keys %x ;
732 # Get the first key/value pair and set the cursor
733 $X->seq($key, $value, R_FIRST) ;
735 # Get the second key/value pair.
737 $X->seq($key, $value, R_NEXT) ;
741 All the constants defined in L<dbopen> for use in the flags parameters
742 in the methods defined below are also available. Refer to the Berkeley
743 DB documentation for the precise meaning of the flags values.
745 Below is a list of the methods available.
749 =item C<$status = $X-E<gt>get($key, $value [, $flags]) ;>
751 Given a key (C<$key>) this method reads the value associated with it
752 from the database. The value read from the database is returned in the
755 If the key does not exist the method returns 1.
757 No flags are currently defined for this method.
759 =item C<$status = $X-E<gt>put($key, $value [, $flags]) ;>
761 Stores the key/value pair in the database.
If you use either the R_IAFTER or R_IBEFORE flags, the C<$key> parameter
will be set to the record number of the inserted key/value pair.
766 Valid flags are R_CURSOR, R_IAFTER, R_IBEFORE, R_NOOVERWRITE and
769 =item C<$status = $X-E<gt>del($key [, $flags]) ;>
771 Removes all key/value pairs with key C<$key> from the database.
773 A return code of 1 means that the requested key was not in the
776 R_CURSOR is the only valid flag at present.
=item C<$fd = $X-E<gt>fd ;>
780 Returns the file descriptor for the underlying database.
782 See L<"Locking Databases"> for an example of how to make use of the
783 C<fd> method to lock your database.
785 =item C<$status = $X-E<gt>seq($key, $value, $flags) ;>
787 This interface allows sequential retrieval from the database. See
788 L<dbopen> for full details.
790 Both the C<$key> and C<$value> parameters will be set to the key/value
791 pair read from the database.
793 The flags parameter is mandatory. The valid flag values are R_CURSOR,
794 R_FIRST, R_LAST, R_NEXT and R_PREV.
796 =item C<$status = $X-E<gt>sync([$flags]) ;>
798 Flushes any cached buffers to disk.
800 R_RECNOSYNC is the only valid flag at present.
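To tie the methods together, here is a sketch that exercises C<put>, C<get>,
C<seq> and C<del> and keeps the status each one returns. It assumes an
in-memory BTREE so that nothing is written to disk. The keys, values and
variable names are invented for the example.

```perl
use strict ;
use warnings ;
use DB_File ;
use Fcntl ;

my %h ;
my $db = tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0640, $DB_BTREE
    or die "Cannot create in-memory BTREE: $!\n" ;

# R_NOOVERWRITE refuses to replace a key that is already present.
my $first  = $db->put('apple', 'red',   R_NOOVERWRITE) ;
my $second = $db->put('apple', 'green', R_NOOVERWRITE) ;
$db->put('banana', 'yellow') ;

# get() returns the value through its second parameter.
my ($value, $unused) ;
my $found   = $db->get('apple', $value) ;
my $missing = $db->get('pear',  $unused) ;

# Walk the whole database in key order with seq().
my @pairs ;
my ($k, $v) = ('', '') ;
for (my $status = $db->seq($k, $v, R_FIRST) ;
     $status == 0 ;
     $status = $db->seq($k, $v, R_NEXT))
  { push @pairs, "$k=$v" }

# Deleting the same key twice shows both status values for del().
my $deleted = $db->del('apple') ;
my $again   = $db->del('apple') ;

print "put: $first $second  get: $found $missing  del: $deleted $again\n" ;
print "pairs: @pairs\n" ;

undef $db ;
untie %h ;
```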
806 It is always a lot easier to understand something when you see a real
807 example. So here are a few.
814 tie %h, "DB_File", "hashed", O_RDWR|O_CREAT, 0640, $DB_HASH
815 or die "Cannot open file 'hashed': $!\n";
817 # Add a key/value pair to the file
818 $h{"apple"} = "orange" ;
    # Check whether a key has a value stored
    print "Exists\n" if defined $h{"banana"} ;
830 Here is a sample of code which uses BTREE. Just to make life more
831 interesting the default comparison function will not be used. Instead
832 a Perl sub, C<Compare()>, will be used to do a case insensitive
840 my ($key1, $key2) = @_ ;
842 "\L$key1" cmp "\L$key2" ;
    $DB_BTREE->{'compare'} = \&Compare ;
847 tie %h, "DB_File", "tree", O_RDWR|O_CREAT, 0640, $DB_BTREE
848 or die "Cannot open file 'tree': $!\n" ;
    # Add some key/value pairs to the file
851 $h{'Wall'} = 'Larry' ;
852 $h{'Smith'} = 'John' ;
853 $h{'mouse'} = 'mickey' ;
854 $h{'duck'} = 'donald' ;
859 # Cycle through the keys printing them in order.
860 # Note it is not necessary to sort the keys as
861 # the btree will have kept them in order automatically.
867 Here is the output from the code above.
876 Here is a simple example that uses RECNO.
881 $DB_RECNO->{'psize'} = 3000 ;
883 tie @h, "DB_File", "text", O_RDWR|O_CREAT, 0640, $DB_RECNO
884 or die "Cannot open file 'text': $!\n" ;
886 # Add a key/value pair to the file
    # Check whether a record has a value stored
    print "Exists\n" if defined $h[1] ;
894 =head2 Locking Databases
896 Concurrent access of a read-write database by several parties requires
897 them all to use some kind of locking. Here's an example of Tom's that
898 uses the I<fd> method to get the file descriptor, and then a careful
899 open() to give something Perl will flock() for you. Run this repeatedly
900 in the background to watch the locks granted in proper order.
912 my($oldval, $fd, $db, %db, $value, $key);
914 $key = shift || 'default';
915 $value = shift || 'magic';
919 $db = tie(%db, 'DB_File', '/tmp/foo.db', O_CREAT|O_RDWR, 0644)
920 || die "dbcreat /tmp/foo.db $!";
922 print "$$: db fd is $fd\n";
923 open(DB_FH, "+<&=$fd") || die "dup $!";
926 unless (flock (DB_FH, LOCK_SH | LOCK_NB)) {
927 print "$$: CONTENTION; can't read during write update!
928 Waiting for read lock ($!) ....";
929 unless (flock (DB_FH, LOCK_SH)) { die "flock: $!" }
931 print "$$: Read lock granted\n";
934 print "$$: Old value was $oldval\n";
935 flock(DB_FH, LOCK_UN);
937 unless (flock (DB_FH, LOCK_EX | LOCK_NB)) {
938 print "$$: CONTENTION; must have exclusive lock!
939 Waiting for write lock ($!) ....";
940 unless (flock (DB_FH, LOCK_EX)) { die "flock: $!" }
943 print "$$: Write lock granted\n";
948 flock(DB_FH, LOCK_UN);
952 print "$$: Updated db to $key=$value\n";
964 When B<DB_File> is opening a database file it no longer terminates the
965 process if I<dbopen> returned an error. This allows file protection
966 errors to be caught at run time. Thanks to Judith Grass
967 E<lt>grass@cybercash.comE<gt> for spotting the bug.
971 Added prototype support for multiple btree compare callbacks.
975 B<DB_File> has been in use for over a year. To reflect that, the
976 version number has been incremented to 1.0.
978 Added complete support for multiple concurrent callbacks.
980 Using the I<push> method on an empty list didn't work properly. This
985 Fixed a core dump problem with SunOS.
987 The return value from TIEHASH wasn't set to NULL when dbopen returned
992 Merged OS2 specific code into DB_File.xs
994 Removed some redundant code in DB_File.xs.
996 Documentation update.
998 Allow negative subscripts with RECNO interface.
1000 Changed the default flags from O_RDWR to O_CREAT|O_RDWR.
1002 The example code which showed how to lock a database needed a call to
1003 C<sync> added. Without it the resultant database file was empty.
Added get_dup method.
If you happen to find any other functions defined in the source for
this module that have not been mentioned in this document -- beware. I
may drop them at a moment's notice.
1013 If you cannot find any, then either you didn't look very hard or the
1014 moment has passed and I have dropped them.
1018 Some older versions of Berkeley DB had problems with fixed length
1019 records using the RECNO file format. The newest version at the time of
1020 writing was 1.85 - this seems to have fixed the problems with RECNO.
1022 I am sure there are bugs in the code. If you do find any, or can
1023 suggest any enhancements, I would welcome your comments.
1027 Berkeley DB is available at your nearest CPAN archive (see
1028 L<perlmod/"CPAN"> for a list) in F<src/misc/db.1.85.tar.gz>, or via the
1029 host F<ftp.cs.berkeley.edu> in F</ucb/4bsd/db.tar.gz>. It is I<not> under
1032 If you are running IRIX, then get Berkeley DB from
1033 F<http://reality.sgi.com/ariel>. It has the patches necessary to
1034 compile properly on IRIX 5.3.
1038 L<perl(1)>, L<dbopen(3)>, L<hash(3)>, L<recno(3)>, L<btree(3)>
1040 Berkeley DB is available from F<ftp.cs.berkeley.edu> in the directory
1045 The DB_File interface was written by Paul Marquess
1046 E<lt>pmarquess@bfsec.bt.co.ukE<gt>.
1047 Questions about the DB system itself may be addressed to Keith Bostic
1048 E<lt>bostic@cs.berkeley.eduE<gt>.