README

   1 NAME
   2     DBM::Deep - A pure perl multi-level hash/array DBM
   3
   4 SYNOPSIS
   5       use DBM::Deep;
   6       my $db = new DBM::Deep "foo.db";
   7
   8       $db->{key} = 'value'; # tie() style
   9       print $db->{key};
  10
  11       $db->put('key', 'value'); # OO style
  12       print $db->get('key');
  13
  14       # true multi-level support
  15       $db->{my_complex} = [
  16             'hello', { perl => 'rules' },
  17             42, 99 ];
  18
  19 DESCRIPTION
  20     A unique flat-file database module, written in pure perl. True
  21     multi-level hash/array support (unlike MLDBM, which is faked), hybrid OO
  22     / tie() interface, cross-platform FTPable files, and quite fast. Can
  23     handle millions of keys and unlimited hash levels without significant
  24     slow-down. Written from the ground-up in pure perl -- this is NOT a
  25     wrapper around a C-based DBM. Out-of-the-box compatibility with Unix,
  26     Mac OS X and Windows.
  27
  28 INSTALLATION
  29     Hopefully you are using Perl's excellent CPAN module, which will
  30     download and install the module for you. If not, get the tarball, and
  31     run these commands:
  32
  33             tar zxf DBM-Deep-*
  34             cd DBM-Deep-*
  35             perl Makefile.PL
  36             make
  37             make test
  38             make install
  39
  40 SETUP
  41     Construction can be done OO-style (which is the recommended way), or
  42     using Perl's tie() function. Both are examined here.
  43
  44   OO CONSTRUCTION
  45     The recommended way to construct a DBM::Deep object is to use the new()
  46     method, which gets you a blessed, tied hash or array reference.
  47
  48             my $db = new DBM::Deep "foo.db";
  49
  50     This opens a new database handle, mapped to the file "foo.db". If this
  51     file does not exist, it will automatically be created. DB files are
  52     opened in "r+" (read/write) mode, and the type of object returned is a
  53     hash, unless otherwise specified (see OPTIONS below).
  54
  55     You can pass a number of options to the constructor to specify things
  56     like locking, autoflush, etc. This is done by passing an inline hash:
  57
  58             my $db = new DBM::Deep(
  59                     file => "foo.db",
  60                     locking => 1,
  61                     autoflush => 1
  62             );
  63
  64     Notice that the filename is now specified *inside* the hash with the
  65     "file" parameter, as opposed to being the sole argument to the
  66     constructor. This is required if any options are specified. See OPTIONS
  67     below for the complete list.
  68
  69     You can also start with an array instead of a hash. For this, you must
  70     specify the "type" parameter:
  71
  72             my $db = new DBM::Deep(
  73                     file => "foo.db",
  74                     type => DBM::Deep::TYPE_ARRAY
  75             );
  76
  77     Note: Specifying the "type" parameter only takes effect when beginning a
  78     new DB file. If you create a DBM::Deep object with an existing file, the
  79     "type" will be loaded from the file header, and ignored if it is passed
  80     to the constructor.
  81
  82   TIE CONSTRUCTION
  83     Alternatively, you can create a DBM::Deep handle by using Perl's
  84     built-in tie() function. This is not ideal, because you get only a
  85     basic, tied hash (or array) which is not blessed, so you can't call any
  86     methods on it.
  87
  88             my %hash;
  89             tie %hash, "DBM::Deep", "foo.db";
  90
  91             my @array;
  92             tie @array, "DBM::Deep", "bar.db";
  93
  94     As with the OO constructor, you can replace the DB filename parameter
  95     with a hash containing one or more options (see OPTIONS just below for
  96     the complete list).
  97
  98             tie %hash, "DBM::Deep", {
  99                     file => "foo.db",
 100                     locking => 1,
 101                     autoflush => 1
 102             };
 103
 104   OPTIONS
 105     There are a number of options that can be passed in when constructing
 106     your DBM::Deep objects. These apply to both the OO- and tie- based
 107     approaches.
 108
 109     * file
 110         Filename of the DB file to link the handle to. You can pass a full
 111         absolute filesystem path, partial path, or a plain filename if the
 112         file is in the current working directory. This is a required
 113         parameter.
 114
 115     * mode
 116         File open mode (read-only, read-write, etc.) string passed to Perl's
 117         FileHandle module. This is an optional parameter, and defaults to
 118         "r+" (read/write). Note: If the default (r+) mode is selected, the
 119         file will also be auto-created if it doesn't exist.
 120
 121     * type
 122         This parameter specifies what type of object to create, a hash or
 123         array. Use one of these two constants: "DBM::Deep::TYPE_HASH" or
 124         "DBM::Deep::TYPE_ARRAY". This only takes effect when beginning a new
 125         file. This is an optional parameter, and defaults to hash.
 126
 127     * locking
 128         Specifies whether locking is to be enabled. DBM::Deep uses Perl's
 129         Fcntl flock() function to lock the database in exclusive mode for
 130         writes, and shared mode for reads. Pass any true value to enable.
 131         This affects the base DB handle *and any child hashes or arrays*
 132         that use the same DB file. This is an optional parameter, and
 133         defaults to 0 (disabled). See LOCKING below for more.
 134
 135     * autoflush
 136         Specifies whether autoflush is to be enabled on the underlying
 137         FileHandle. This obviously slows down write operations, but is
 138         required if you may have multiple processes accessing the same DB
 139         file (also consider enabling *locking* or at least *volatile*). Pass
 140         any true value to enable. This is an optional parameter, and
 141         defaults to 0 (disabled).
 142
 143     * volatile
 144         If *volatile* mode is enabled, DBM::Deep will stat() the DB file
 145         before each STORE() operation. This is required if an outside force
 146         may change the size of the file between transactions. Locking also
 147         implicitly enables volatile. This is useful if you want to use a
 148         different locking system or write your own. Pass any true value to
 149         enable. This is an optional parameter, and defaults to 0 (disabled).
 150
 151     * autobless
 152         If *autobless* mode is enabled, DBM::Deep will preserve blessed
 153         hashes, and restore them when fetched. This is an experimental
 154         feature, and does have side-effects. Basically, when hashes are
 155         re-blessed into their original classes, they are no longer blessed
 156         into the DBM::Deep class! So you won't be able to call any DBM::Deep
 157         methods on them. You have been warned. This is an optional
 158         parameter, and defaults to 0 (disabled).
 159
 160     * filter_*
 161         See FILTERS below.
 162
 163     * debug
 164         Setting *debug* mode will make all errors non-fatal, dump them out
 165         to STDERR, and continue on. This is for debugging purposes only, and
 166         probably not what you want. This is an optional parameter, and
 167         defaults to 0 (disabled).
 168
 169 TIE INTERFACE
 170     With DBM::Deep you can access your databases using Perl's standard
 171     hash/array syntax. Because all Deep objects are *tied* to hashes or
 172     arrays, you can treat them as such. Deep will intercept all reads/writes
 173     and direct them to the right place -- the DB file. This has nothing to
 174     do with the "TIE CONSTRUCTION" section above. This simply tells you how
 175     to use DBM::Deep using regular hashes and arrays, rather than calling
 176     functions like "get()" and "put()" (although those work too). It is
 177     entirely up to you how you want to access your databases.
 178
 179   HASHES
 180     You can treat any DBM::Deep object like a normal Perl hash reference.
 181     Add keys, or even nested hashes (or arrays) using standard Perl syntax:
 182
 183             my $db = new DBM::Deep "foo.db";
 184
 185             $db->{mykey} = "myvalue";
 186             $db->{myhash} = {};
 187             $db->{myhash}->{subkey} = "subvalue";
 188
 189             print $db->{myhash}->{subkey} . "\n";
 190
 191     You can even step through hash keys using the normal Perl "keys()"
 192     function:
 193
 194             foreach my $key (keys %$db) {
 195                     print "$key: " . $db->{$key} . "\n";
 196             }
 197
 198     Remember that Perl's "keys()" function extracts *every* key from the
 199     hash and pushes them onto an array, all before the loop even begins. If
 200     you have an extra large hash, this may exhaust Perl's memory. Instead,
 201     consider using Perl's "each()" function, which pulls keys/values one at
 202     a time, using very little memory:
 203
 204             while (my ($key, $value) = each %$db) {
 205                     print "$key: $value\n";
 206             }
 207
 208     Please note that when using "each()", you should always pass a direct
 209     hash reference, not a lookup. Meaning, you should never do this:
 210
 211             # NEVER DO THIS
 212             while (my ($key, $value) = each %{$db->{foo}}) { # BAD
 213
 214     This causes an infinite loop, because for each iteration, Perl is
 215     calling FETCH() on the $db handle, resulting in a "new" hash for foo
 216     every time, so it effectively keeps returning the first key over and
 217     over again. Instead, assign a temporary variable to "$db->{foo}", then
 218     pass that to each().
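
         The temporary-variable approach described above can be sketched
         as follows. The file name and the keys "a" and "b" are
         placeholders chosen for illustration only:

```perl
use strict;
use warnings;
use DBM::Deep;

my $db = new DBM::Deep "foo.db";
$db->{foo} = {};
$db->{foo}->{a} = 1;
$db->{foo}->{b} = 2;

# Fetch the nested hash once, so each() keeps iterating over the
# same handle instead of a fresh one on every pass
my $foo = $db->{foo};
my %seen;
while (my ($key, $value) = each %$foo) {
        print "$key: $value\n";
        $seen{$key} = $value;
}
```

         Because $foo is fetched only once, each() can advance through
         the keys and the loop terminates normally.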
 219
 220   ARRAYS
 221     As with hashes, you can treat any DBM::Deep object like a normal Perl
 222     array reference. This includes inserting, removing and manipulating
 223     elements, and the "push()", "pop()", "shift()", "unshift()" and
 224     "splice()" functions. The object must have first been created using type
 225     "DBM::Deep::TYPE_ARRAY", or simply be a nested array reference inside a
 226     hash. Example:
 227
 228             my $db = new DBM::Deep(
 229                     file => "foo-array.db",
 230                     type => DBM::Deep::TYPE_ARRAY
 231             );
 232
 233             $db->[0] = "foo";
 234             push @$db, "bar", "baz";
 235             unshift @$db, "bah";
 236
 237             my $last_elem = pop @$db; # baz
 238             my $first_elem = shift @$db; # bah
 239             my $second_elem = $db->[1]; # bar
 240
 241             my $num_elements = scalar @$db;
 242
 243 OO INTERFACE
 244     In addition to the *tie()* interface, you can also use a standard OO
 245     interface to manipulate all aspects of DBM::Deep databases. Each type of
 246     object (hash or array) has its own methods, but both types share the
 247     following common methods: "put()", "get()", "exists()", "delete()" and
 248     "clear()".
 249
 250     * put()
 251         Stores a new hash key/value pair, or sets an array element value.
 252         Takes two arguments, the hash key or array index, and the new value.
 253         The value can be a scalar, hash ref or array ref. Returns true on
 254         success, false on failure.
 255
 256                 $db->put("foo", "bar"); # for hashes
 257                 $db->put(1, "bar"); # for arrays
 258
 259     * get()
 260         Fetches the value of a hash key or array element. Takes one
 261         argument: the hash key or array index. Returns a scalar, hash ref or
 262         array ref, depending on the data type stored.
 263
 264                 my $value = $db->get("foo"); # for hashes
 265                 my $value = $db->get(1); # for arrays
 266
 267     * exists()
 268         Checks if a hash key or array index exists. Takes one argument: the
 269         hash key or array index. Returns true if it exists, false if not.
 270
 271                 if ($db->exists("foo")) { print "yay!\n"; } # for hashes
 272                 if ($db->exists(1)) { print "yay!\n"; } # for arrays
 273
 274     * delete()
 275         Deletes one hash key/value pair or array element. Takes one
 276         argument: the hash key or array index. Returns true on success,
 277         false if not found. For arrays, the remaining elements located after
 278         the deleted element are NOT moved over. The deleted element is
 279         essentially just undefined, which is exactly how Perl's internal
 280         arrays work. Please note that the space occupied by the deleted
 281         key/value or element is not reused again -- see "UNUSED SPACE
 282         RECOVERY" below for details and workarounds.
 283
 284                 $db->delete("foo"); # for hashes
 285                 $db->delete(1); # for arrays
 286
 287     * clear()
 288         Deletes all hash keys or array elements. Takes no arguments. No
 289         return value. Please note that the space occupied by the deleted
 290         keys/values or elements is not reused again -- see "UNUSED SPACE
 291         RECOVERY" below for details and workarounds.
 292
 293                 $db->clear(); # hashes or arrays
 294
 295   HASHES
 296     For hashes, DBM::Deep supports all the common methods described above,
 297     and the following additional methods: "first_key()" and "next_key()".
 298
 299     * first_key()
 300         Returns the "first" key in the hash. As with built-in Perl hashes,
 301         keys are fetched in an undefined order (which appears random). Takes
 302         no arguments, returns the key as a scalar value.
 303
 304                 my $key = $db->first_key();
 305
 306     * next_key()
 307         Returns the "next" key in the hash, given the previous one as the
 308         sole argument. Returns undef if there are no more keys to be
 309         fetched.
 310
 311                 $key = $db->next_key($key);
 312
 313     Here are some examples of using hashes:
 314
 315             my $db = new DBM::Deep "foo.db";
 316
 317             $db->put("foo", "bar");
 318             print "foo: " . $db->get("foo") . "\n";
 319
 320             $db->put("baz", {}); # new child hash ref
 321             $db->get("baz")->put("buz", "biz");
 322             print "buz: " . $db->get("baz")->get("buz") . "\n";
 323
 324             my $key = $db->first_key();
 325             while (defined $key) {
 326                     print "$key: " . $db->get($key) . "\n";
 327                     $key = $db->next_key($key);
 328             }
 329
 330             if ($db->exists("foo")) { $db->delete("foo"); }
 331
 332   ARRAYS
 333     For arrays, DBM::Deep supports all the common methods described above,
 334     and the following additional methods: "length()", "push()", "pop()",
 335     "shift()", "unshift()" and "splice()".
 336
 337     * length()
 338         Returns the number of elements in the array. Takes no arguments.
 339
 340                 my $len = $db->length();
 341
 342     * push()
 343         Adds one or more elements onto the end of the array. Accepts
 344         scalars, hash refs or array refs. No return value.
 345
 346                 $db->push("foo", "bar", {});
 347
 348     * pop()
 349         Fetches the last element in the array, and deletes it. Takes no
 350         arguments. Returns the element value, or undef if the array is
 351         empty.
 352
 353                 my $elem = $db->pop();
 354
 355     * shift()
 356         Fetches the first element in the array, deletes it, then shifts all
 357         the remaining elements over to take up the space. Returns the
 358         element value. This method is not recommended with large arrays --
 359         see "LARGE ARRAYS" below for details.
 360
 361                 my $elem = $db->shift();
 362
 363     * unshift()
 364         Inserts one or more elements onto the beginning of the array,
 365         shifting all existing elements over to make room. Accepts scalars,
 366         hash refs or array refs. No return value. This method is not
 367         recommended with large arrays -- see "LARGE ARRAYS" below for
 368         details.
 369
 370                 $db->unshift("foo", "bar", {});
 371
 372     * splice()
 373         Performs exactly like Perl's built-in function of the same name. See
 374         "perldoc -f splice" for usage -- it is too complicated to document
 375         here. This method is not recommended with large arrays -- see "LARGE
 376         ARRAYS" below for details.
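
         Since the method mirrors the built-in, a short refresher on a
         plain Perl array may help. This sketch uses an ordinary
         in-memory array, not a DBM::Deep object; the assumption, based
         on the description above, is that the method form takes the
         same arguments minus the array itself:

```perl
use strict;
use warnings;

my @list = ("foo", "bar", "baz", "buz");

# splice(ARRAY, OFFSET, LENGTH, LIST): remove LENGTH elements starting
# at OFFSET, insert LIST in their place, and return what was removed
my @removed = splice(@list, 1, 2, "biz", "baf");

print "removed: @removed\n"; # removed: bar baz
print "list: @list\n";       # list: foo biz baf buz
```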
 377
 378     Here are some examples of using arrays:
 379
 380             my $db = new DBM::Deep(
 381                     file => "foo.db",
 382                     type => DBM::Deep::TYPE_ARRAY
 383             );
 384
 385             $db->push("bar", "baz");
 386             $db->unshift("foo");
 387             $db->put(3, "buz");
 388
 389             my $len = $db->length();
 390             print "length: $len\n"; # 4
 391
 392             for (my $k=0; $k<$len; $k++) {
 393                     print "$k: " . $db->get($k) . "\n";
 394             }
 395
 396             $db->splice(1, 2, "biz", "baf");
 397
 398             while (defined(my $elem = shift @$db)) {
 399                     print "shifted: $elem\n";
 400             }
 401
 402 LOCKING
 403     Enable automatic file locking by passing a true value to the "locking"
 404     parameter when constructing your DBM::Deep object (see SETUP above).
 405
 406             my $db = new DBM::Deep(
 407                     file => "foo.db",
 408                     locking => 1
 409             );
 410
 411     This causes Deep to "flock()" the underlying FileHandle object with
 412     exclusive mode for writes, and shared mode for reads. This is required
 413     if you have multiple processes accessing the same database file, to
 414     avoid file corruption. Please note that "flock()" does NOT work for
 415     files over NFS. See "DB OVER NFS" below for more.
 416
 417   EXPLICIT LOCKING
 418     You can explicitly lock a database, so it remains locked for multiple
 419     transactions. This is done by calling the "lock()" method, and passing
 420     an optional lock mode argument (defaults to exclusive mode). This is
 421     particularly useful for things like counters, where the current value
 422     needs to be fetched, then incremented, then stored again.
 423
 424             $db->lock();
 425             my $counter = $db->get("counter");
 426             $counter++;
 427             $db->put("counter", $counter);
 428             $db->unlock();
 429
 430             # or...
 431
 432             $db->lock();
 433             $db->{counter}++;
 434             $db->unlock();
 435
 436     You can pass "lock()" an optional argument, which specifies which mode
 437     to use (exclusive or shared). Use one of these two constants:
 438     "DBM::Deep::LOCK_EX" or "DBM::Deep::LOCK_SH". These are passed directly
 439     to "flock()", and are the same as the constants defined in Perl's
 440     "Fcntl" module.
 441
 442             $db->lock( DBM::Deep::LOCK_SH );
 443             # something here
 444             $db->unlock();
 445
 446     If you want to implement your own file locking scheme, be sure to create
 447     your DBM::Deep objects setting the "volatile" option to true. This hints
 448     to Deep that the DB file may change between transactions. See "LOW-LEVEL
 449     ACCESS" below for more.
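
         Returning to the counter example above: if the code between
         "lock()" and "unlock()" can die, it may be worth making sure the
         lock is always released. A minimal sketch, using only the
         methods described in this section (wrapping the update in eval
         is a suggestion here, not something DBM::Deep requires):

```perl
use strict;
use warnings;
use DBM::Deep;

my $db = new DBM::Deep(
        file => "foo.db",
        locking => 1
);

$db->lock();
# Run the update inside eval so a die() cannot skip unlock()
my $ok = eval {
        $db->{counter}++;
        1;
};
my $err = $@;
$db->unlock();

die $err unless $ok;
print "counter: " . $db->{counter} . "\n";
```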
 450
 451 IMPORTING/EXPORTING
 452     You can import existing complex structures by calling the "import()"
 453     method, and export an entire database into an in-memory structure using
 454     the "export()" method. Both are examined here.
 455
 456   IMPORTING
 457     Say you have an existing hash with nested hashes/arrays inside it.
 458     Instead of walking the structure and adding keys/elements to the
 459     database as you go, simply pass a reference to the "import()" method.
 460     This recursively adds everything to an existing DBM::Deep object for
 461     you. Here is an example:
 462
 463             my $struct = {
 464                     key1 => "value1",
 465                     key2 => "value2",
 466                     array1 => [ "elem0", "elem1", "elem2" ],
 467                     hash1 => {
 468                             subkey1 => "subvalue1",
 469                             subkey2 => "subvalue2"
 470                     }
 471             };
 472
 473             my $db = new DBM::Deep "foo.db";
 474             $db->import( $struct );
 475
 476             print $db->{key1} . "\n"; # prints "value1"
 477
 478     This recursively imports the entire $struct object into $db, including
 479     all nested hashes and arrays. If the DBM::Deep object contains existing
 480     data, keys are merged with the existing ones, replacing if they already
 481     exist. The "import()" method can be called on any database level (not
 482     just the base level), and works with both hash and array DB types.
 483
 484     Note: Make sure your existing structure has no circular references in
 485     it. These will cause an infinite loop when importing.
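
         One way to guard against this is to check the structure before
         handing it to "import()". The sketch below is plain Perl;
         has_cycle() is a hypothetical helper written for this example
         and is not part of DBM::Deep:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# has_cycle() is a hypothetical helper, not part of DBM::Deep.
# Returns 1 if any hash/array in the structure contains itself
# somewhere beneath it, 0 otherwise.
sub has_cycle {
        my ($node, $path) = @_;
        $path ||= {};
        my $type = reftype($node) || '';
        return 0 unless $type eq 'HASH' || $type eq 'ARRAY';

        my $addr = refaddr($node);
        return 1 if $path->{$addr};   # already on the current path

        $path->{$addr} = 1;
        my @children = $type eq 'HASH' ? values %$node : @$node;
        for my $child (@children) {
                if (has_cycle($child, $path)) {
                        delete $path->{$addr};
                        return 1;
                }
        }
        delete $path->{$addr};        # leaving this branch
        return 0;
}

my $struct = { key1 => "value1", array1 => [ "elem0", "elem1" ] };
print has_cycle($struct) ? "circular\n" : "safe to import\n";

$struct->{self} = $struct;            # introduce a cycle
print has_cycle($struct) ? "circular\n" : "safe to import\n";
```

         Only references on the current path are tracked, so a
         structure that merely shares one sub-hash in two places is not
         reported as circular.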
 486
 487   EXPORTING
 488     Calling the "export()" method on an existing DBM::Deep object will
 489     return a reference to a new in-memory copy of the database. The export
 490     is done recursively, so all nested hashes/arrays are exported to
 491     standard Perl objects. Here is an example:
 492
 493             my $db = new DBM::Deep "foo.db";
 494
 495             $db->{key1} = "value1";
 496             $db->{key2} = "value2";
 497             $db->{hash1} = {};
 498             $db->{hash1}->{subkey1} = "subvalue1";
 499             $db->{hash1}->{subkey2} = "subvalue2";
 500
 501             my $struct = $db->export();
 502
 503             print $struct->{key1} . "\n"; # prints "value1"
 504
 505     This makes a complete copy of the database in memory, and returns a
 506     reference to it. The "export()" method can be called on any database
 507     level (not just the base level), and works with both hash and array DB
 508     types. Be careful of large databases -- you can store a lot more data in
 509     a DBM::Deep object than an in-memory Perl structure.
 510
 511     Note: Make sure your database has no circular references in it. These
 512     will cause an infinite loop when exporting.
 513
 514 FILTERS
 515     DBM::Deep has a number of hooks where you can specify your own Perl
 516     function to perform filtering on incoming or outgoing data. This is a
 517     perfect way to extend the engine, and implement things like real-time
 518     compression or encryption. Filtering applies to the base DB level, and
 519     all child hashes / arrays. Filter hooks can be specified when your
 520     DBM::Deep object is first constructed, or by calling the "set_filter()"
 521     method at any time. There are four available filter hooks, described
 522     below:
 523
 524     * filter_store_key
 525         This filter is called whenever a hash key is stored. It is passed
 526         the incoming key, and expected to return a transformed key.
 527
 528     * filter_store_value
 529         This filter is called whenever a hash key or array element is
 530         stored. It is passed the incoming value, and expected to return a
 531         transformed value.
 532
 533     * filter_fetch_key
 534         This filter is called whenever a hash key is fetched (i.e. via
 535         "first_key()" or "next_key()"). It is passed the transformed key,
 536         and expected to return the plain key.
 537
 538     * filter_fetch_value
 539         This filter is called whenever a hash key or array element is
 540         fetched. It is passed the transformed value, and expected to return
 541         the plain value.
 542
 543     Here are the two ways to set up a filter hook:
 544
 545             my $db = new DBM::Deep(
 546                     file => "foo.db",
 547                     filter_store_value => \&my_filter_store,
 548                     filter_fetch_value => \&my_filter_fetch
 549             );
 550
 551             # or...
 552
 553             $db->set_filter( "filter_store_value", \&my_filter_store );
 554             $db->set_filter( "filter_fetch_value", \&my_filter_fetch );
 555
 556     Your filter function will be called only when dealing with SCALAR keys
 557     or values. When nested hashes and arrays are being stored/fetched,
 558     filtering is bypassed. Filters are called as static functions, passed a
 559     single SCALAR argument, and expected to return a single SCALAR value. If
 560     you want to remove a filter, set the function reference to "undef":
 561
 562             $db->set_filter( "filter_store_value", undef );
 563
 564   REAL-TIME ENCRYPTION EXAMPLE
 565     Here is a working example that uses the *Crypt::Blowfish* module to do
 566     real-time encryption / decryption of keys & values with DBM::Deep
 567     Filters. Please visit
 568     <http://search.cpan.org/search?module=Crypt::Blowfish> for more on
 569     *Crypt::Blowfish*. You'll also need the *Crypt::CBC* module.
 570
 571             use DBM::Deep;
 572             use Crypt::Blowfish;
 573             use Crypt::CBC;
 574
 575             my $cipher = new Crypt::CBC({
 576                     'key'             => 'my secret key',
 577                     'cipher'          => 'Blowfish',
 578                     'iv'              => '$KJh#(}q',
 579                     'regenerate_key'  => 0,
 580                     'padding'         => 'space',
 581                     'prepend_iv'      => 0
 582             });
 583
 584             my $db = new DBM::Deep(
 585                     file => "foo-encrypt.db",
 586                     filter_store_key => \&my_encrypt,
 587                     filter_store_value => \&my_encrypt,
 588                     filter_fetch_key => \&my_decrypt,
 589                     filter_fetch_value => \&my_decrypt,
 590             );
 591
 592             $db->{key1} = "value1";
 593             $db->{key2} = "value2";
 594             print "key1: " . $db->{key1} . "\n";
 595             print "key2: " . $db->{key2} . "\n";
 596
 597             undef $db;
 598             exit;
 599
 600             sub my_encrypt {
 601                     return $cipher->encrypt( $_[0] );
 602             }
 603             sub my_decrypt {
 604                     return $cipher->decrypt( $_[0] );
 605             }
 606
 607   REAL-TIME COMPRESSION EXAMPLE
 608     Here is a working example that uses the *Compress::Zlib* module to do
 609     real-time compression / decompression of keys & values with DBM::Deep
 610     Filters. Please visit
 611     <http://search.cpan.org/search?module=Compress::Zlib> for more on
 612     *Compress::Zlib*.
 613
 614             use DBM::Deep;
 615             use Compress::Zlib;
 616
 617             my $db = new DBM::Deep(
 618                     file => "foo-compress.db",
 619                     filter_store_key => \&my_compress,
 620                     filter_store_value => \&my_compress,
 621                     filter_fetch_key => \&my_decompress,
 622                     filter_fetch_value => \&my_decompress,
 623             );
 624
 625             $db->{key1} = "value1";
 626             $db->{key2} = "value2";
 627             print "key1: " . $db->{key1} . "\n";
 628             print "key2: " . $db->{key2} . "\n";
 629
 630             undef $db;
 631             exit;
 632
 633             sub my_compress {
 634                     return Compress::Zlib::memGzip( $_[0] ) ;
 635             }
 636             sub my_decompress {
 637                     return Compress::Zlib::memGunzip( $_[0] ) ;
 638             }
 639
 640     Note: Filtering of keys only applies to hashes. Array "keys" are
 641     actually numerical index numbers, and are not filtered.
 642
 643 ERROR HANDLING
 644     Most DBM::Deep methods return a true value for success, and call die()
 645     on failure. You can wrap calls in an eval block to catch the die. Also,
 646     the actual error message is stored in an internal scalar, which can be
 647     fetched by calling the "error()" method.
 648
 649             my $db = new DBM::Deep "foo.db"; # create hash
 650             eval { $db->push("foo"); }; # ILLEGAL -- push is array-only call
 651
 652             print $db->error(); # prints error message
 653
 654     You can then call "clear_error()" to clear the current error state.
 655
 656             $db->clear_error();
 657
 658     If you set the "debug" option to true when creating your DBM::Deep
 659     object, all errors are considered NON-FATAL, and dumped to STDERR. This
 660     is only for debugging purposes.
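
         The eval approach mentioned above only helps if the result is
         checked afterwards. A minimal sketch of that check, reusing the
         illegal push() call from the example (the variable names $ok
         and $err are illustrative):

```perl
use strict;
use warnings;
use DBM::Deep;

my $db = new DBM::Deep "foo.db"; # create hash

# The trailing "1" makes eval return true only if push() did not die
my $ok = eval { $db->push("foo"); 1 };
my $err = $ok ? undef : $@;

if (defined $err) {
        print "caught: $err\n";
}
```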
 661
 662 LARGEFILE SUPPORT
 663     If you have a 64-bit system, and your Perl is compiled with both
 664     LARGEFILE and 64-bit support, you *may* be able to create databases
 665     larger than 2 GB. DBM::Deep by default uses 32-bit file offset tags, but
 666     these can be changed by calling the static "set_pack()" method before
 667     you do anything else.
 668
 669             DBM::Deep::set_pack(8, 'Q');
 670
 671     This tells DBM::Deep to pack all file offsets with 8-byte (64-bit) quad
 672     words instead of 32-bit longs. After setting these values your DB files
 673     have a theoretical maximum size of 16 EB (exabytes).
 674
 675     Note: Changing these values will NOT work for existing database files.
 676     Only change this for new files, and make sure it stays set consistently
 677     throughout the file's life. If you do set these values, you can no
 678     longer access 32-bit DB files. You can, however, call "set_pack(4, 'N')"
 679     to change back to 32-bit mode.
 680
 681     Note: I have not personally tested files > 2 GB -- all my systems have
 682     only a 32-bit Perl. However, I have received user reports that this does
 683     indeed work!
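
         Before switching to 64-bit offsets, it may be worth checking
         what your Perl supports. The sketch below is plain Perl and
         does not touch DBM::Deep; it assumes that a working pack('Q')
         and the "uselargefiles" build setting are reasonable indicators
         of the support described above:

```perl
use strict;
use warnings;
use Config;

# pack('Q', ...) dies on Perls built without 64-bit integer support
my $has_quad = eval { my $packed = pack('Q', 1); 1 } ? 1 : 0;
my $has_largefile = $Config{uselargefiles} ? 1 : 0;

if ($has_quad && $has_largefile) {
        print "64-bit offsets look usable on this Perl\n";
        # DBM::Deep::set_pack(8, 'Q');  # as described above
}
else {
        print "stay with the default 32-bit offsets\n";
}
```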
 684
 685 LOW-LEVEL ACCESS
 686     If you require low-level access to the underlying FileHandle that Deep
 687     uses, you can call the "fh()" method, which returns the handle:
 688
 689             my $fh = $db->fh();
 690
 691     This method can be called on the root level of the database, or any child
 692     hashes or arrays. All levels share a *root* structure, which contains
 693     things like the FileHandle, a reference counter, and all your options
 694     you specified when you created the object. You can get access to this
 695     root structure by calling the "root()" method.
 696
 697             my $root = $db->root();
 698
 699     This is useful for changing options after the object has already been
 700     created, such as enabling/disabling locking, volatile or debug modes.
 701     You can also store your own temporary user data in this structure (be
 702     wary of name collision), which is then accessible from any child hash or
 703     array.
 704
 705 CUSTOM DIGEST ALGORITHM
 706     DBM::Deep by default uses the *Message Digest 5* (MD5) algorithm for
 707     hashing keys. However you can override this, and use another algorithm
 708     (such as SHA-256) or even write your own. But please note that Deep
 709     currently expects zero collisions, so your algorithm has to be
 710     *perfect*, so to speak. Collision detection may be introduced in a later
 711     version.
 712
 713     You can specify a custom digest algorithm by calling the static
 714     "set_digest()" function, passing a reference to a subroutine, and the
 715     length of the algorithm's hashes (in bytes). This is a global static
 716     function, which affects ALL Deep objects. Here is a working example that
 717     uses a 256-bit hash from the *Digest::SHA256* module. Please see
 718     <http://search.cpan.org/search?module=Digest::SHA256> for more.
 719
 720             use DBM::Deep;
 721             use Digest::SHA256;
 722
 723             my $context = Digest::SHA256::new(256);
 724
 725             DBM::Deep::set_digest( \&my_digest, 32 );
 726
 727             my $db = new DBM::Deep "foo-sha.db";
 728
 729             $db->{key1} = "value1";
 730             $db->{key2} = "value2";
 731             print "key1: " . $db->{key1} . "\n";
 732             print "key2: " . $db->{key2} . "\n";
 733
 734             undef $db;
 735             exit;
 736
 737             sub my_digest {
 738                     return substr( $context->hash($_[0]), 0, 32 );
 739             }
 740
 741     Note: Your returned digest strings must be EXACTLY the number of bytes
 742     you specify in the "set_digest()" function (in this case 32).
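
         The exact-length requirement in the note above can be checked before a
         digest is registered. The following is a minimal sketch, not part of
         DBM::Deep: it assumes the core *Digest::SHA* module as a stand-in for
         Digest::SHA256, and the name "sha256_digest" is hypothetical. The
         "set_digest()" call is left commented out so the snippet runs on its
         own.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256);    # assumption: core Digest::SHA, not Digest::SHA256

# Hypothetical digest callback: returns the raw 32-byte SHA-256 of the key.
sub sha256_digest {
    return sha256( $_[0] );
}

# Check the length promise before handing the sub to Deep.
my $len = length( sha256_digest("key1") );
die "digest must be exactly 32 bytes, got $len\n" unless $len == 32;

# As described above (not executed here):
# DBM::Deep::set_digest( \&sha256_digest, 32 );
print "digest length: $len bytes\n";
```

         Returning the raw binary digest, rather than a hex string, is what
         keeps the length at 32 bytes; a hex-encoded SHA-256 would be 64.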
 743
 744 CIRCULAR REFERENCES
 745     DBM::Deep has experimental support for circular references. This means
 746     you can have a nested hash key or array element that points to a parent
 747     object. This relationship is stored in the DB file, and is preserved
 748     between sessions. Here is an example:
 749
 750             my $db = new DBM::Deep "foo.db";
 751
 752             $db->{foo} = "bar";
 753             $db->{circle} = $db; # ref to self
 754
 755             print $db->{foo} . "\n"; # prints "bar"
 756             print $db->{circle}->{foo} . "\n"; # prints "bar" again
 757
 758     One catch is, passing the object to a function that recursively walks
 759     the object tree (such as *Data::Dumper* or even the built-in
 760     "optimize()" or "export()" methods) will result in an infinite loop. The
 761     other catch is, if you fetch the *key* of a circular reference (i.e.
 762     using the "first_key()" or "next_key()" methods), you will get the
 763     *target object's key*, not the ref's key. This gets even more
 764     interesting with the above example, where the *circle* key points to the
 765     base DB object, which technically doesn't have a key. So I made
 766     DBM::Deep return "[base]" as the key name in that special case.
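
         The infinite-loop catch is the usual hazard of walking any cyclic
         structure. As a rough illustration, here is a sketch of a recursive
         walk that remembers which containers it has already visited. It uses
         plain in-memory hashes rather than a DB file, and the name
         "count_values" is hypothetical; whether the same bookkeeping works on
         DBM::Deep handles depends on how the module hands back child objects,
         which is not covered here.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Count non-reference values, visiting each container only once.
sub count_values {
    my ( $node, $seen ) = @_;
    $seen ||= {};
    return 1 unless ref $node;                  # plain scalar
    return 0 if $seen->{ refaddr($node) }++;    # already visited: stop here
    my $type = reftype($node);
    my @kids = $type eq 'HASH'  ? values %$node
             : $type eq 'ARRAY' ? @$node
             :                    ();
    my $total = 0;
    $total += count_values( $_, $seen ) for @kids;
    return $total;
}

# Plain hash standing in for the example above.
my %data = ( foo => 'bar' );
$data{circle} = \%data;                         # ref to self

my $count = count_values( \%data );
print "$count\n";                               # prints 1
```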
 767
 768 CAVEATS / ISSUES / BUGS
 769     This section describes all the known issues with DBM::Deep. If you have
 770     found something that is not listed here, please send e-mail to
 771     jhuckaby@cpan.org.
 772
 773   UNUSED SPACE RECOVERY
 774     One major caveat with Deep is that space occupied by existing keys and
 775     values is not recovered when they are deleted. This means that if you
 776     keep deleting and adding keys, your file will grow continuously. I am
 777     working on this, but in the meantime you can call the built-in
 778     "optimize()" method from time to time (perhaps in a crontab or
 779     something) to recover all your unused space.
 780
 781             $db->optimize(); # returns true on success
 782
 783     This rebuilds the ENTIRE database into a new file, then moves it on top
 784     of the original. The new file will have no unused space, thus it will
 785     take up as little disk space as possible. Please note that this
 786     operation can take a long time for large files, and you need enough disk
 787     space to temporarily hold 2 copies of your DB file. The temporary file
 788     is created in the same directory as the original, named with a ".tmp"
 789     extension, and is deleted when the operation completes. Oh, and if
 790     locking is enabled, the DB is automatically locked for the entire
 791     duration of the copy.
 792
 793     WARNING: Only call optimize() on the top-level node of the database, and
 794     make sure there are no child references lying around. Deep keeps a
 795     reference counter, and if it is greater than 1, optimize() will abort
 796     and return undef.
 797
 798   AUTOVIVIFICATION
 799     Unfortunately, autovivification doesn't work with tied hashes. This
 800     appears to be a bug in Perl's tie() system, as *Jakob Schmidt*
 801     encountered the very same issue with his *DWH_File* module (see
 802     <http://search.cpan.org/search?module=DWH_File>), and it is also
 803     mentioned in the BUGS section for the *MLDBM* module (see
 804     <http://search.cpan.org/search?module=MLDBM>). Basically, on a new db
 805     file, this does not work:
 806
 807             $db->{foo}->{bar} = "hello";
 808
 809     Since "foo" doesn't exist, you cannot add "bar" to it. You end up with
 810     "foo" being an empty hash. Try this instead, which works fine:
 811
 812             $db->{foo} = { bar => "hello" };
 813
 814     As of Perl 5.8.7, this bug still exists. I have walked very carefully
 815     through the execution path, and Perl indeed passes an empty hash to the
 816     STORE() method. Probably a bug in Perl.
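
         When the nesting is deeper than one level, the workaround above
         amounts to building the whole structure in memory and storing it with
         a single assignment. A small sketch of a helper that does this
         follows; "nested" is a hypothetical name, and the DBM::Deep line is
         commented out so the snippet runs without a DB file.

```perl
use strict;
use warnings;

# Wrap $value in one hash per path element, outermost key first in @path.
sub nested {
    my ( $value, @path ) = @_;
    $value = { $_ => $value } for reverse @path;
    return $value;
}

my $tree = nested( 'hello', 'bar' );    # { bar => 'hello' }
print $tree->{bar}, "\n";               # prints "hello"

# With a DBM::Deep handle this would be stored in one assignment:
# $db->{foo} = nested( 'hello', 'bar' );
```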
 817
 818   FILE CORRUPTION
 819     The current level of error handling in Deep is minimal. Files *are*
 820     checked for a 32-bit signature on open(), but other corruption in files
 821     can cause segmentation faults. Deep may try to seek() past the end of a
 822     file, or get stuck in an infinite loop depending on the level of
 823     corruption. File write operations are not checked for failure (for
 824     speed), so if you happen to run out of disk space, Deep will probably
 825     fail in a bad way. These things will be addressed in a later version of
 826     DBM::Deep.
 827
 828   DB OVER NFS
 829     Beware of using DB files over NFS. Deep uses flock(), which works well
 830     on local filesystems, but will NOT protect you from file corruption over
 831     NFS. I've heard about setting up your NFS server with a locking daemon,
 832     then using lockf() to lock your files, but your mileage may vary there as
 833     well. From what I understand, there is no real way to do it. However, if
 834     you need access to the underlying FileHandle in Deep for using some
 835     other kind of locking scheme like lockf(), see the "LOW-LEVEL ACCESS"
 836     section above.
 837
 838   COPYING OBJECTS
 839     Beware of copying tied objects in Perl. Very strange things can happen.
 840     Instead, use Deep's "clone()" method which safely copies the object and
 841     returns a new, blessed, tied hash or array to the same level in the DB.
 842
 843             my $copy = $db->clone();
 844
 845   LARGE ARRAYS
 846     Beware of using "shift()", "unshift()" or "splice()" with large arrays.
 847     These functions cause every element in the array to move, which can be
 848     murder on DBM::Deep, as every element has to be fetched from disk, then
 849     stored again in a different location. This may be addressed in a later
 850     version.
 851
 852 PERFORMANCE
 853     This section discusses DBM::Deep's speed and memory usage.
 854
 855   SPEED
 856     Obviously, DBM::Deep isn't going to be as fast as some C-based DBMs,
 857     such as the almighty *BerkeleyDB*. But it makes up for it in features
 858     like true multi-level hash/array support, and cross-platform FTPable
 859     files. Even so, DBM::Deep is still pretty fast, and the speed stays
 860     fairly consistent, even with huge databases. Here is some test data:
 861
 862             Adding 1,000,000 keys to new DB file...
 863
 864             At 100 keys, avg. speed is 2,703 keys/sec
 865             At 200 keys, avg. speed is 2,642 keys/sec
 866             At 300 keys, avg. speed is 2,598 keys/sec
 867             At 400 keys, avg. speed is 2,578 keys/sec
 868             At 500 keys, avg. speed is 2,722 keys/sec
 869             At 600 keys, avg. speed is 2,628 keys/sec
 870             At 700 keys, avg. speed is 2,700 keys/sec
 871             At 800 keys, avg. speed is 2,607 keys/sec
 872             At 900 keys, avg. speed is 2,190 keys/sec
 873             At 1,000 keys, avg. speed is 2,570 keys/sec
 874             At 2,000 keys, avg. speed is 2,417 keys/sec
 875             At 3,000 keys, avg. speed is 1,982 keys/sec
 876             At 4,000 keys, avg. speed is 1,568 keys/sec
 877             At 5,000 keys, avg. speed is 1,533 keys/sec
 878             At 6,000 keys, avg. speed is 1,787 keys/sec
 879             At 7,000 keys, avg. speed is 1,977 keys/sec
 880             At 8,000 keys, avg. speed is 2,028 keys/sec
 881             At 9,000 keys, avg. speed is 2,077 keys/sec
 882             At 10,000 keys, avg. speed is 2,031 keys/sec
 883             At 20,000 keys, avg. speed is 1,970 keys/sec
 884             At 30,000 keys, avg. speed is 2,050 keys/sec
 885             At 40,000 keys, avg. speed is 2,073 keys/sec
 886             At 50,000 keys, avg. speed is 1,973 keys/sec
 887             At 60,000 keys, avg. speed is 1,914 keys/sec
 888             At 70,000 keys, avg. speed is 2,091 keys/sec
 889             At 80,000 keys, avg. speed is 2,103 keys/sec
 890             At 90,000 keys, avg. speed is 1,886 keys/sec
 891             At 100,000 keys, avg. speed is 1,970 keys/sec
 892             At 200,000 keys, avg. speed is 2,053 keys/sec
 893             At 300,000 keys, avg. speed is 1,697 keys/sec
 894             At 400,000 keys, avg. speed is 1,838 keys/sec
 895             At 500,000 keys, avg. speed is 1,941 keys/sec
 896             At 600,000 keys, avg. speed is 1,930 keys/sec
 897             At 700,000 keys, avg. speed is 1,735 keys/sec
 898             At 800,000 keys, avg. speed is 1,795 keys/sec
 899             At 900,000 keys, avg. speed is 1,221 keys/sec
 900             At 1,000,000 keys, avg. speed is 1,077 keys/sec
 901
 902     This test was performed on a PowerMac G4 1GHz running Mac OS X 10.3.2 &
 903     Perl 5.8.1, with an 80GB Ultra ATA/100 HD spinning at 7200RPM. The hash
 904     keys and values were between 6 and 12 chars in length. The DB file ended
 905     up at 210MB. Run time was 12 min 3 sec.
 906
 907   MEMORY USAGE
 908     One of the great things about DBM::Deep is that it uses very little
 909     memory. Even with huge databases (1,000,000+ keys) you will not see much
 910     of an increase in your process's memory use. Deep relies solely on the
 911     filesystem for storing and fetching data. Here is output from
 912     */usr/bin/top* before even opening a database handle:
 913
 914               PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 915             22831 root      11   0  2716 2716  1296 R     0.0  0.2   0:07 perl
 916
 917     Basically the process is taking 2,716K of memory. And here is the same
 918     process after storing and fetching 1,000,000 keys:
 919
 920               PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 921             22831 root      14   0  2772 2772  1328 R     0.0  0.2  13:32 perl
 922
 923     Notice the memory usage increased by only 56K. The test was performed
 924     on a 700MHz x86 box running Red Hat Linux 7.2 & Perl 5.6.1.
 925
 926 DB FILE FORMAT
 927     In case you are interested in the underlying DB file format, it is
 928     documented in this section. You don't need to know this to use the
 929     module; it's just included for reference.
 930
 931   SIGNATURE
 932     DBM::Deep files always start with a 32-bit signature to identify the
 933     file type. This is at offset 0. The signature is "DPDB" in network byte
 934     order. This is checked upon each file open().
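
         For illustration, the signature test can be reproduced outside the
         module by reading the first four bytes of a file. This is only a
         sketch of the check described above, not the module's own code;
         "has_deep_signature" is a hypothetical name, and the demo writes a
         throwaway file instead of opening a real DB.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Returns 1 if the file begins with the 4-byte signature "DPDB", else 0.
sub has_deep_signature {
    my ($path) = @_;
    open( my $fh, '<', $path ) or return 0;
    binmode $fh;
    my $got = read( $fh, my $sig, 4 );
    close $fh;
    return ( defined($got) && $got == 4 && $sig eq 'DPDB' ) ? 1 : 0;
}

# Demo on a throwaway file (not a real DB), so no DBM::Deep is needed.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} 'DPDB', "\0" x 8;
close $tmp;

print has_deep_signature($path) ? "signature ok\n" : "not a Deep file\n";

# The same four bytes read as one 32-bit big-endian integer:
printf "0x%08x\n", unpack( 'N', 'DPDB' );    # prints 0x44504442
```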
 935
 936   TAG
 937     The DBM::Deep file is in a *tagged format*, meaning each section of the
 938     file has a standard header containing the type of data, the length of
 939     data, and then the data itself. The type is a single character (1 byte),
 940     the length is a 32-bit unsigned long in network byte order, and the data
 941     is, well, the data. Here is how it unfolds:
 942
 943   MASTER INDEX
 944     Immediately after the 32-bit file signature is the *Master Index*
 945     record. This is a standard tag header followed by 1024 bytes (in 32-bit
 946     mode) or 2048 bytes (in 64-bit mode) of data. The type is *H* for hash
 947     or *A* for array, depending on how the DBM::Deep object was constructed.
 948
 949     The index works by looking at an *MD5 Hash* of the hash key (or array
 950     index number). The first 8-bit char of the MD5 signature is the offset
 951     into the index, multiplied by 4 in 32-bit mode, or 8 in 64-bit mode. The
 952     value of the index element is a file offset of the next tag for the
 953     key/element in question, which is usually a *Bucket List* tag (see
 954     below).
 955
 956     The next tag *could* be another index, depending on how many
 957     keys/elements exist. See RE-INDEXING below for details.
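
         The slot arithmetic above can be sketched in a few lines. This is an
         illustration under the stated layout, not code taken from the module:
         "index_slot_offset" is a hypothetical name, and the result is the
         offset within the index data, not an absolute file position.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

# Offset of a key's slot inside an index: first MD5 byte times the slot width
# (4 bytes in 32-bit mode, 8 bytes in 64-bit mode).
sub index_slot_offset {
    my ( $key, $long_size ) = @_;
    $long_size ||= 4;
    my $first = ord( substr( md5($key), 0, 1 ) );
    return $first * $long_size;
}

printf "md5('a') = %s\n", md5_hex('a');    # starts with 0c, i.e. byte value 12
printf "32-bit slot offset: %d\n", index_slot_offset( 'a', 4 );    # 48
printf "64-bit slot offset: %d\n", index_slot_offset( 'a', 8 );    # 96
```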
 958
 959   BUCKET LIST
 960     A *Bucket List* is a collection of 16 MD5 hashes for keys/elements, plus
 961     file offsets to where the actual data is stored. It starts with a
 962     standard tag header, with type *B*, and a data size of 320 bytes in
 963     32-bit mode, or 384 bytes in 64-bit mode. Each MD5 hash is stored in
 964     full (16 bytes), plus the 32-bit or 64-bit file offset for the *Bucket*
 965     containing the actual data. When the list fills up, a *Re-Index*
 966     operation is performed (See RE-INDEXING below).
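
         The sizes quoted here and under MASTER INDEX follow directly from the
         record layout. As a quick check, the arithmetic can be written out as
         below; the constant and sub names are hypothetical.

```perl
use strict;
use warnings;

use constant {
    INDEX_SLOTS  => 256,    # one slot per possible 8-bit char
    LIST_ENTRIES => 16,     # MD5 hashes per Bucket List
    MD5_BYTES    => 16,     # a full MD5 is 128 bits
};

# $long is the file-offset width in bytes: 4 (32-bit) or 8 (64-bit).
sub index_size       { my ($long) = @_; return INDEX_SLOTS * $long }
sub bucket_list_size { my ($long) = @_; return LIST_ENTRIES * ( MD5_BYTES + $long ) }

for my $long ( 4, 8 ) {
    printf "%d-bit mode: index %d bytes, bucket list %d bytes\n",
        $long * 8, index_size($long), bucket_list_size($long);
}
```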
 967
 968   BUCKET
 969     A *Bucket* is a tag containing a key/value pair (in hash mode), or an
 970     index/value pair (in array mode). It starts with a standard tag header
 971     with type *D* for scalar data (string, binary, etc.), or it could be a
 972     nested hash (type *H*) or array (type *A*). The value comes just after
 973     the tag header. The size reported in the tag header is only for the
 974     value, but then, just after the value is another size (32-bit unsigned
 975     long) and then the plain key itself. Since the value is likely to be
 976     fetched more often than the plain key, I figured it would be *slightly*
 977     faster to store the value first.
 978
 979     If the type is *H* (hash) or *A* (array), the value is another *Master
 980     Index* record for the nested structure, where the process begins all
 981     over again.
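
         To make the layout concrete, here is a sketch that packs and unpacks
         a scalar (type *D*) Bucket in 32-bit mode, following the order given
         above: tag header, value, key size, key. It is written from this
         description only and is not the module's own code; "encode_bucket"
         and "decode_bucket" are hypothetical names, and nested *H* / *A*
         values are not handled.

```perl
use strict;
use warnings;

# Tag header (1-byte type + 32-bit big-endian value length), then the value,
# then a 32-bit key length and the plain key.
sub encode_bucket {
    my ( $type, $key, $value ) = @_;
    return pack( 'a N', $type, length($value) ) . $value
         . pack( 'N', length($key) ) . $key;
}

sub decode_bucket {
    my ($buf) = @_;
    my ( $type, $vlen ) = unpack( 'a N', $buf );
    my $value = substr( $buf, 5, $vlen );            # header is 5 bytes
    my $klen  = unpack( 'N', substr( $buf, 5 + $vlen, 4 ) );
    my $key   = substr( $buf, 9 + $vlen, $klen );
    return ( $type, $key, $value );
}

my $raw = encode_bucket( 'D', 'perl', 'rules' );
my ( $type, $key, $value ) = decode_bucket($raw);
printf "%s: %s => %s (%d bytes)\n", $type, $key, $value, length($raw);
```

         Because the value sits right after the header, a reader that only
         wants the value can stop after the first 5 + length bytes and never
         touch the key.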
 982
 983   RE-INDEXING
 984     After a *Bucket List* grows to 16 records, its allocated space in the
 985     file is exhausted. Then, when another key/element comes in, the list is
 986     converted to a new index record. However, this index will look at the
 987     next char in the MD5 hash, and arrange new Bucket List pointers
 988     accordingly. This process is called *Re-Indexing*. Basically, a new
 989     index tag is created at the file EOF, and all 17 (16 + new one)
 990     keys/elements are removed from the old Bucket List and inserted into the
 991     new index. Several new Bucket Lists are created in the process, as a new
 992     MD5 char from the key is being examined (it is unlikely that the keys
 993     will all share the same next char of their MD5s).
 994
 995     Because of the way the *MD5* algorithm works, it is impossible to tell
 996     exactly when the Bucket Lists will turn into indexes, but the first
 997     round tends to happen right around 4,000 keys. You will see a *slight*
 998     decrease in performance here, but it picks back up pretty quick (see
 999     SPEED above). Then it takes a lot more keys to exhaust the next level of
1000     Bucket Lists. It's right around 900,000 keys. This process can continue
1001     nearly indefinitely -- right up until the point the *MD5* signatures
1002     start colliding with each other, and this is EXTREMELY rare -- like
1003     winning the lottery 5 times in a row AND getting struck by lightning
1004     while you are walking to cash in your tickets. Theoretically, since
1005     *MD5* hashes are 128-bit values, you *could* have up to
1006     340,282,366,921,000,000,000,000,000,000,000,000,000 keys/elements (I
1007     believe this is 340 undecillion, but don't quote me).
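
         The round numbers above can be related to the layout with a little
         arithmetic. The sketch below computes the ideal capacity of each
         index level, assuming every Bucket List fills evenly, which real MD5
         distributions only approximate; "ideal_capacity" is a hypothetical
         name. It also prints the exact size of the 128-bit MD5 space using
         the core *Math::BigInt* module.

```perl
use strict;
use warnings;
use Math::BigInt;

# Keys that fit when every Bucket List under $levels nested indexes is full:
# 256 slots per index level, 16 entries per list.
sub ideal_capacity {
    my ($levels) = @_;
    return ( 256**$levels ) * 16;
}

printf "1 index level:  %d keys\n", ideal_capacity(1);    # 4096
printf "2 index levels: %d keys\n", ideal_capacity(2);    # 1048576

# Number of distinct 128-bit values (2 to the power 128), computed exactly.
my $md5_space = Math::BigInt->new(2)->bpow(128);
print "distinct MD5 values: $md5_space\n";
```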
1008
1009   STORING
1010     When a new key/element is stored, the key (or index number) is first run
1011     through *Digest::MD5* to get a 128-bit signature (example, in hex:
1012     b05783b0773d894396d475ced9d2f4f6). Then, the *Master Index* record is
1013     checked for the first 8-bit char of the signature (in this case hex
1014     *b0*). If it does not exist, a new *Bucket List* is created for our key
1015     (and the next 15 keys that happen to also have *b0* as their first MD5
1016     char). The entire MD5 is written to the *Bucket List* along with the
1017     offset of the new *Bucket* record (EOF at this point, unless we are
1018     replacing an existing *Bucket*), where the actual data will be stored.
1019
1020   FETCHING
1021     Fetching an existing key/element involves getting a *Digest::MD5* of the
1022     key (or index number), then walking along the indexes. If there are
1023     enough keys/elements in this DB level, there might be nested indexes,
1024     each linked to a particular char of the MD5. Finally, a *Bucket List* is
1025     pointed to, which contains up to 16 full MD5 hashes. Each is checked for
1026     equality to the key in question. If a match is found, the *Bucket* tag
1027     is loaded, where the value and plain key are stored.
1028
1029     Fetching the plain key occurs when calling the *first_key()* and
1030     *next_key()* methods. In this process the indexes are walked
1031     systematically, and each key fetched in increasing MD5 order (which is
1032     why it appears random). Once the *Bucket* is found, the value is skipped
1033     and the plain key is returned instead. Note: Do not count on keys being
1034     fetched as if the MD5 hashes were alphabetically sorted. This only
1035     happens on an index-level -- as soon as the *Bucket Lists* are hit, the
1036     keys will come out in the order they went in -- so it's pretty much
1037     undefined how the keys will come out -- just like Perl's built-in hashes.
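
         The STORING, RE-INDEXING and FETCHING steps can be tried out with a
         toy model that keeps everything in memory. This is a sketch of the
         scheme as described in this section, not the module's on-disk code:
         nodes are plain Perl hashes instead of tagged file records, and the
         names "new_index", "new_list", "store" and "fetch" are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

use constant MAX_ENTRIES => 16;    # Bucket List capacity, per the text

# Toy nodes: an index has up to 256 slots; a list holds [md5, key, value] rows.
sub new_index { return { type => 'I', slots   => [] } }
sub new_list  { return { type => 'B', entries => [] } }

sub store {
    my ( $root, $key, $value ) = @_;
    my $md5 = md5($key);
    my ( $node, $depth ) = ( $root, 0 );
    while (1) {
        my $ch   = ord( substr( $md5, $depth, 1 ) );
        my $next = $node->{slots}[$ch] ||= new_list();
        if ( $next->{type} eq 'I' ) {    # nested index: use the next MD5 char
            ( $node, $depth ) = ( $next, $depth + 1 );
            next;
        }
        for my $row ( @{ $next->{entries} } ) {    # replace an existing key
            if ( $row->[0] eq $md5 ) { $row->[2] = $value; return; }
        }
        if ( @{ $next->{entries} } < MAX_ENTRIES ) {
            push @{ $next->{entries} }, [ $md5, $key, $value ];
            return;
        }
        # List is full: re-index it on the next MD5 char, then retry the store.
        my $index = new_index();
        for my $row ( @{ $next->{entries} } ) {
            my $c    = ord( substr( $row->[0], $depth + 1, 1 ) );
            my $list = $index->{slots}[$c] ||= new_list();
            push @{ $list->{entries} }, $row;
        }
        $node->{slots}[$ch] = $index;
    }
}

sub fetch {
    my ( $root, $key ) = @_;
    my $md5 = md5($key);
    my ( $node, $depth ) = ( $root, 0 );
    while (1) {
        my $next = $node->{slots}[ ord( substr( $md5, $depth, 1 ) ) ];
        return undef unless $next;       # nothing stored under this char
        if ( $next->{type} eq 'I' ) {
            ( $node, $depth ) = ( $next, $depth + 1 );
            next;
        }
        for my $row ( @{ $next->{entries} } ) {
            return $row->[2] if $row->[0] eq $md5;    # compare full MD5s
        }
        return undef;
    }
}

my $root = new_index();
store( $root, "key$_", "value$_" ) for 1 .. 5000;
my $hit  = fetch( $root, 'key42' );
my $miss = fetch( $root, 'nope' );
print "$hit\n";                                    # prints "value42"
print defined($miss) ? "found\n" : "missing\n";    # prints "missing"
```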
1038
1039 AUTHOR
1040     Joseph Huckaby, jhuckaby@cpan.org
1041
1042     Special thanks to Adam Sah and Rich Gaushell! You know why :-)
1043
1044 SEE ALSO
1045     perltie(1), Tie::Hash(3), Digest::MD5(3), Fcntl(3), flock(2), lockf(3),
1046     nfs(5), Digest::SHA256(3), Crypt::Blowfish(3), Compress::Zlib(3)
1047
1048 LICENSE
1049     Copyright (c) 2002-2005 Joseph Huckaby. All Rights Reserved. This is
1050     free software, you may use it and distribute it under the same terms as
1051     Perl itself.
1052