r13305@Rob-Kinyons-PowerBook: rob | 2006-05-18 15:18:06 -0400
1NAME
2 DBM::Deep - A pure perl multi-level hash/array DBM
3
4SYNOPSIS
5 use DBM::Deep;
6 my $db = new DBM::Deep "foo.db";
7
8 $db->{key} = 'value'; # tie() style
9 print $db->{key};
10
11 $db->put('key', 'value'); # OO style
12 print $db->get('key');
13
14 # true multi-level support
15 $db->{my_complex} = [
16 'hello', { perl => 'rules' },
17 42, 99 ];
18
19DESCRIPTION
20 A unique flat-file database module, written in pure perl. True
21 multi-level hash/array support (unlike MLDBM, which is faked), hybrid OO
22 / tie() interface, cross-platform FTPable files, and quite fast. Can
23 handle millions of keys and unlimited hash levels without significant
24 slow-down. Written from the ground-up in pure perl -- this is NOT a
25 wrapper around a C-based DBM. Out-of-the-box compatibility with Unix,
26 Mac OS X and Windows.
27
28INSTALLATION
29 Hopefully you are using CPAN's excellent Perl module, which will
30 download and install the module for you. If not, get the tarball, and
31 run these commands:
32
33 tar zxf DBM-Deep-*
34 cd DBM-Deep-*
35 perl Makefile.PL
36 make
37 make test
38 make install
39
40SETUP
41 Construction can be done OO-style (which is the recommended way), or
42 using Perl's tie() function. Both are examined here.
43
44 OO CONSTRUCTION
45 The recommended way to construct a DBM::Deep object is to use the new()
46 method, which gets you a blessed, tied hash or array reference.
47
48 my $db = new DBM::Deep "foo.db";
49
50 This opens a new database handle, mapped to the file "foo.db". If this
51 file does not exist, it will automatically be created. DB files are
52 opened in "r+" (read/write) mode, and the type of object returned is a
53 hash, unless otherwise specified (see OPTIONS below).
54
55 You can pass a number of options to the constructor to specify things
56 like locking, autoflush, etc. This is done by passing an inline hash:
57
58 my $db = new DBM::Deep(
59 file => "foo.db",
60 locking => 1,
61 autoflush => 1
62 );
63
64 Notice that the filename is now specified *inside* the hash with the
65 "file" parameter, as opposed to being the sole argument to the
66 constructor. This is required if any options are specified. See OPTIONS
67 below for the complete list.
68
69 You can also start with an array instead of a hash. For this, you must
70 specify the "type" parameter:
71
72 my $db = new DBM::Deep(
73 file => "foo.db",
74 type => DBM::Deep::TYPE_ARRAY
75 );
76
 77     Note: Specifying the "type" parameter only takes effect when beginning a
78 new DB file. If you create a DBM::Deep object with an existing file, the
79 "type" will be loaded from the file header, and ignored if it is passed
80 to the constructor.
81
82 TIE CONSTRUCTION
83 Alternatively, you can create a DBM::Deep handle by using Perl's
84 built-in tie() function. This is not ideal, because you get only a
85 basic, tied hash (or array) which is not blessed, so you can't call any
86 functions on it.
87
88 my %hash;
89 tie %hash, "DBM::Deep", "foo.db";
90
91 my @array;
92 tie @array, "DBM::Deep", "bar.db";
93
94 As with the OO constructor, you can replace the DB filename parameter
95 with a hash containing one or more options (see OPTIONS just below for
96 the complete list).
97
98 tie %hash, "DBM::Deep", {
99 file => "foo.db",
100 locking => 1,
101 autoflush => 1
102 };
103
104 OPTIONS
105 There are a number of options that can be passed in when constructing
106 your DBM::Deep objects. These apply to both the OO- and tie- based
107 approaches.
108
109 * file
110 Filename of the DB file to link the handle to. You can pass a full
111 absolute filesystem path, partial path, or a plain filename if the
112 file is in the current working directory. This is a required
113 parameter.
114
115 * mode
116 File open mode (read-only, read-write, etc.) string passed to Perl's
117 FileHandle module. This is an optional parameter, and defaults to
118 "r+" (read/write). Note: If the default (r+) mode is selected, the
119         file will also be auto-created if it doesn't exist.
120
121 * type
122 This parameter specifies what type of object to create, a hash or
123 array. Use one of these two constants: "DBM::Deep::TYPE_HASH" or
124 "DBM::Deep::TYPE_ARRAY". This only takes effect when beginning a new
125 file. This is an optional parameter, and defaults to hash.
126
127 * locking
128 Specifies whether locking is to be enabled. DBM::Deep uses Perl's
129         Fcntl flock() function to lock the database in exclusive mode for
130 writes, and shared mode for reads. Pass any true value to enable.
131 This affects the base DB handle *and any child hashes or arrays*
132 that use the same DB file. This is an optional parameter, and
133 defaults to 0 (disabled). See LOCKING below for more.
134
135 * autoflush
136 Specifies whether autoflush is to be enabled on the underlying
137 FileHandle. This obviously slows down write operations, but is
138 required if you may have multiple processes accessing the same DB
139         file (also consider enabling *locking* or at least *volatile*). Pass
140 any true value to enable. This is an optional parameter, and
141 defaults to 0 (disabled).
142
143 * volatile
144 If *volatile* mode is enabled, DBM::Deep will stat() the DB file
145 before each STORE() operation. This is required if an outside force
146 may change the size of the file between transactions. Locking also
147 implicitly enables volatile. This is useful if you want to use a
148 different locking system or write your own. Pass any true value to
149 enable. This is an optional parameter, and defaults to 0 (disabled).
150
151 * autobless
152 If *autobless* mode is enabled, DBM::Deep will preserve blessed
153 hashes, and restore them when fetched. This is an experimental
154 feature, and does have side-effects. Basically, when hashes are
155 re-blessed into their original classes, they are no longer blessed
156 into the DBM::Deep class! So you won't be able to call any DBM::Deep
157 methods on them. You have been warned. This is an optional
158 parameter, and defaults to 0 (disabled).
159
160 * filter_*
161 See FILTERS below.
162
163 * debug
164 Setting *debug* mode will make all errors non-fatal, dump them out
165 to STDERR, and continue on. This is for debugging purposes only, and
166 probably not what you want. This is an optional parameter, and
167 defaults to 0 (disabled).
168
169TIE INTERFACE
170 With DBM::Deep you can access your databases using Perl's standard
171 hash/array syntax. Because all Deep objects are *tied* to hashes or
172 arrays, you can treat them as such. Deep will intercept all reads/writes
173 and direct them to the right place -- the DB file. This has nothing to
174 do with the "TIE CONSTRUCTION" section above. This simply tells you how
175 to use DBM::Deep using regular hashes and arrays, rather than calling
176 functions like "get()" and "put()" (although those work too). It is
177     entirely up to you how you want to access your databases.
178
179 HASHES
180 You can treat any DBM::Deep object like a normal Perl hash reference.
181 Add keys, or even nested hashes (or arrays) using standard Perl syntax:
182
183 my $db = new DBM::Deep "foo.db";
184
185 $db->{mykey} = "myvalue";
186 $db->{myhash} = {};
187 $db->{myhash}->{subkey} = "subvalue";
188
189 print $db->{myhash}->{subkey} . "\n";
190
191 You can even step through hash keys using the normal Perl "keys()"
192 function:
193
194 foreach my $key (keys %$db) {
195 print "$key: " . $db->{$key} . "\n";
196 }
197
198 Remember that Perl's "keys()" function extracts *every* key from the
199 hash and pushes them onto an array, all before the loop even begins. If
200 you have an extra large hash, this may exhaust Perl's memory. Instead,
201 consider using Perl's "each()" function, which pulls keys/values one at
202 a time, using very little memory:
203
204 while (my ($key, $value) = each %$db) {
205 print "$key: $value\n";
206 }
207
208 Please note that when using "each()", you should always pass a direct
209 hash reference, not a lookup. Meaning, you should never do this:
210
211 # NEVER DO THIS
212 while (my ($key, $value) = each %{$db->{foo}}) { # BAD
213
214 This causes an infinite loop, because for each iteration, Perl is
215 calling FETCH() on the $db handle, resulting in a "new" hash for foo
216 every time, so it effectively keeps returning the first key over and
217     over again. Instead, assign "$db->{foo}" to a temporary variable, then
218 pass that to each().
219
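    The safe pattern described above can be sketched with an ordinary nested
    hashref standing in for the DBM::Deep handle (the fix is the same either
    way: grab the inner reference once, then iterate over that):

```perl
use strict;
use warnings;

my $db = { foo => { a => 1, b => 2 } };  # stands in for the tied DBM::Deep handle

my $foo = $db->{foo};   # one fetch up front -- not one per loop iteration
my $count = 0;
while ( my ($key, $value) = each %$foo ) {
    print "$key: $value\n";
    $count++;
}
print "$count pairs\n";
```

    With a real DBM::Deep object, $foo would be the tied child hash returned
    by the single FETCH, so each() advances through the same iterator instead
    of restarting on a fresh object every pass.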
220 ARRAYS
221 As with hashes, you can treat any DBM::Deep object like a normal Perl
222 array reference. This includes inserting, removing and manipulating
223 elements, and the "push()", "pop()", "shift()", "unshift()" and
224 "splice()" functions. The object must have first been created using type
225 "DBM::Deep::TYPE_ARRAY", or simply be a nested array reference inside a
226 hash. Example:
227
228 my $db = new DBM::Deep(
229 file => "foo-array.db",
230 type => DBM::Deep::TYPE_ARRAY
231 );
232
233 $db->[0] = "foo";
234 push @$db, "bar", "baz";
235 unshift @$db, "bah";
236
237 my $last_elem = pop @$db; # baz
238 my $first_elem = shift @$db; # bah
239 my $second_elem = $db->[1]; # bar
240
241 my $num_elements = scalar @$db;
242
243OO INTERFACE
244 In addition to the *tie()* interface, you can also use a standard OO
245 interface to manipulate all aspects of DBM::Deep databases. Each type of
246 object (hash or array) has its own methods, but both types share the
247 following common methods: "put()", "get()", "exists()", "delete()" and
248 "clear()".
249
250 * put()
251 Stores a new hash key/value pair, or sets an array element value.
252 Takes two arguments, the hash key or array index, and the new value.
253 The value can be a scalar, hash ref or array ref. Returns true on
254 success, false on failure.
255
256 $db->put("foo", "bar"); # for hashes
257 $db->put(1, "bar"); # for arrays
258
259 * get()
260 Fetches the value of a hash key or array element. Takes one
261 argument: the hash key or array index. Returns a scalar, hash ref or
262 array ref, depending on the data type stored.
263
264 my $value = $db->get("foo"); # for hashes
265 my $value = $db->get(1); # for arrays
266
267 * exists()
268 Checks if a hash key or array index exists. Takes one argument: the
269 hash key or array index. Returns true if it exists, false if not.
270
271 if ($db->exists("foo")) { print "yay!\n"; } # for hashes
272 if ($db->exists(1)) { print "yay!\n"; } # for arrays
273
274 * delete()
275 Deletes one hash key/value pair or array element. Takes one
276 argument: the hash key or array index. Returns true on success,
277 false if not found. For arrays, the remaining elements located after
278 the deleted element are NOT moved over. The deleted element is
279 essentially just undefined, which is exactly how Perl's internal
280 arrays work. Please note that the space occupied by the deleted
281 key/value or element is not reused again -- see "UNUSED SPACE
282 RECOVERY" below for details and workarounds.
283
284 $db->delete("foo"); # for hashes
285 $db->delete(1); # for arrays
286
287 * clear()
288 Deletes all hash keys or array elements. Takes no arguments. No
289 return value. Please note that the space occupied by the deleted
290 keys/values or elements is not reused again -- see "UNUSED SPACE
291 RECOVERY" below for details and workarounds.
292
293 $db->clear(); # hashes or arrays
294
295 HASHES
296 For hashes, DBM::Deep supports all the common methods described above,
297 and the following additional methods: "first_key()" and "next_key()".
298
299 * first_key()
300 Returns the "first" key in the hash. As with built-in Perl hashes,
301 keys are fetched in an undefined order (which appears random). Takes
302 no arguments, returns the key as a scalar value.
303
304 my $key = $db->first_key();
305
306 * next_key()
307 Returns the "next" key in the hash, given the previous one as the
308 sole argument. Returns undef if there are no more keys to be
309 fetched.
310
311 $key = $db->next_key($key);
312
313 Here are some examples of using hashes:
314
315 my $db = new DBM::Deep "foo.db";
316
317 $db->put("foo", "bar");
318 print "foo: " . $db->get("foo") . "\n";
319
320 $db->put("baz", {}); # new child hash ref
321 $db->get("baz")->put("buz", "biz");
322 print "buz: " . $db->get("baz")->get("buz") . "\n";
323
324 my $key = $db->first_key();
325 while ($key) {
326 print "$key: " . $db->get($key) . "\n";
327 $key = $db->next_key($key);
328 }
329
330 if ($db->exists("foo")) { $db->delete("foo"); }
331
332 ARRAYS
333 For arrays, DBM::Deep supports all the common methods described above,
334 and the following additional methods: "length()", "push()", "pop()",
335 "shift()", "unshift()" and "splice()".
336
337 * length()
338 Returns the number of elements in the array. Takes no arguments.
339
340 my $len = $db->length();
341
342 * push()
343 Adds one or more elements onto the end of the array. Accepts
344 scalars, hash refs or array refs. No return value.
345
346 $db->push("foo", "bar", {});
347
348 * pop()
349 Fetches the last element in the array, and deletes it. Takes no
350         arguments. Returns the element value, or undef if the array is
351         empty.
352
353 my $elem = $db->pop();
354
355 * shift()
356 Fetches the first element in the array, deletes it, then shifts all
357 the remaining elements over to take up the space. Returns the
358 element value. This method is not recommended with large arrays --
359 see "LARGE ARRAYS" below for details.
360
361 my $elem = $db->shift();
362
363 * unshift()
364 Inserts one or more elements onto the beginning of the array,
365 shifting all existing elements over to make room. Accepts scalars,
366 hash refs or array refs. No return value. This method is not
367         recommended with large arrays -- see "LARGE ARRAYS" below for
368 details.
369
370 $db->unshift("foo", "bar", {});
371
372 * splice()
373 Performs exactly like Perl's built-in function of the same name. See
374 "perldoc -f splice" for usage -- it is too complicated to document
375 here. This method is not recommended with large arrays -- see "LARGE
376 ARRAYS" below for details.
377
378 Here are some examples of using arrays:
379
380 my $db = new DBM::Deep(
381 file => "foo.db",
382 type => DBM::Deep::TYPE_ARRAY
383 );
384
385 $db->push("bar", "baz");
386 $db->unshift("foo");
387 $db->put(3, "buz");
388
389 my $len = $db->length();
390 print "length: $len\n"; # 4
391
392 for (my $k=0; $k<$len; $k++) {
393 print "$k: " . $db->get($k) . "\n";
394 }
395
396 $db->splice(1, 2, "biz", "baf");
397
398 while (my $elem = shift @$db) {
399 print "shifted: $elem\n";
400 }
401
402LOCKING
403 Enable automatic file locking by passing a true value to the "locking"
404 parameter when constructing your DBM::Deep object (see SETUP above).
405
406 my $db = new DBM::Deep(
407 file => "foo.db",
408 locking => 1
409 );
410
411 This causes Deep to "flock()" the underlying FileHandle object with
412 exclusive mode for writes, and shared mode for reads. This is required
413 if you have multiple processes accessing the same database file, to
414 avoid file corruption. Please note that "flock()" does NOT work for
415 files over NFS. See "DB OVER NFS" below for more.
416
417 EXPLICIT LOCKING
418 You can explicitly lock a database, so it remains locked for multiple
419 transactions. This is done by calling the "lock()" method, and passing
420 an optional lock mode argument (defaults to exclusive mode). This is
421 particularly useful for things like counters, where the current value
422 needs to be fetched, then incremented, then stored again.
423
424 $db->lock();
425 my $counter = $db->get("counter");
426 $counter++;
427 $db->put("counter", $counter);
428 $db->unlock();
429
430 # or...
431
432 $db->lock();
433 $db->{counter}++;
434 $db->unlock();
435
436 You can pass "lock()" an optional argument, which specifies which mode
437 to use (exclusive or shared). Use one of these two constants:
438 "DBM::Deep::LOCK_EX" or "DBM::Deep::LOCK_SH". These are passed directly
439 to "flock()", and are the same as the constants defined in Perl's
440 "Fcntl" module.
441
442 $db->lock( DBM::Deep::LOCK_SH );
443 # something here
444 $db->unlock();
445
446 If you want to implement your own file locking scheme, be sure to create
447 your DBM::Deep objects setting the "volatile" option to true. This hints
448 to Deep that the DB file may change between transactions. See "LOW-LEVEL
449 ACCESS" below for more.
450
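    Since DBM::Deep's lock modes are just Fcntl's flock() constants, a
    roll-your-own scheme can apply them directly to any filehandle (for
    instance, the one returned by "fh()" -- see "LOW-LEVEL ACCESS"). A
    minimal sketch using only core modules and a scratch file:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

open( my $fh, '>>', 'scratch.lock' ) or die "open: $!";
flock( $fh, LOCK_EX ) or die "flock: $!";  # exclusive, as for writes
# ... critical section: fetch, modify, store ...
flock( $fh, LOCK_UN );                     # release the lock
close $fh;
unlink 'scratch.lock';
print "locked and released\n";
```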
451IMPORTING/EXPORTING
452 You can import existing complex structures by calling the "import()"
453 method, and export an entire database into an in-memory structure using
454 the "export()" method. Both are examined here.
455
456 IMPORTING
457 Say you have an existing hash with nested hashes/arrays inside it.
458 Instead of walking the structure and adding keys/elements to the
459 database as you go, simply pass a reference to the "import()" method.
460 This recursively adds everything to an existing DBM::Deep object for
461 you. Here is an example:
462
463 my $struct = {
464 key1 => "value1",
465 key2 => "value2",
466 array1 => [ "elem0", "elem1", "elem2" ],
467 hash1 => {
468 subkey1 => "subvalue1",
469 subkey2 => "subvalue2"
470 }
471 };
472
473 my $db = new DBM::Deep "foo.db";
474 $db->import( $struct );
475
476 print $db->{key1} . "\n"; # prints "value1"
477
478 This recursively imports the entire $struct object into $db, including
479     all nested hashes and arrays. If the DBM::Deep object contains existing
480 data, keys are merged with the existing ones, replacing if they already
481 exist. The "import()" method can be called on any database level (not
482 just the base level), and works with both hash and array DB types.
483
484 Note: Make sure your existing structure has no circular references in
485 it. These will cause an infinite loop when importing.
486
487 EXPORTING
488 Calling the "export()" method on an existing DBM::Deep object will
489 return a reference to a new in-memory copy of the database. The export
490 is done recursively, so all nested hashes/arrays are all exported to
491 standard Perl objects. Here is an example:
492
493 my $db = new DBM::Deep "foo.db";
494
495 $db->{key1} = "value1";
496 $db->{key2} = "value2";
497 $db->{hash1} = {};
498 $db->{hash1}->{subkey1} = "subvalue1";
499 $db->{hash1}->{subkey2} = "subvalue2";
500
501 my $struct = $db->export();
502
503 print $struct->{key1} . "\n"; # prints "value1"
504
505 This makes a complete copy of the database in memory, and returns a
506 reference to it. The "export()" method can be called on any database
507 level (not just the base level), and works with both hash and array DB
508 types. Be careful of large databases -- you can store a lot more data in
509 a DBM::Deep object than an in-memory Perl structure.
510
511 Note: Make sure your database has no circular references in it. These
512 will cause an infinite loop when exporting.
513
514FILTERS
515 DBM::Deep has a number of hooks where you can specify your own Perl
516 function to perform filtering on incoming or outgoing data. This is a
517 perfect way to extend the engine, and implement things like real-time
518 compression or encryption. Filtering applies to the base DB level, and
519 all child hashes / arrays. Filter hooks can be specified when your
520 DBM::Deep object is first constructed, or by calling the "set_filter()"
521 method at any time. There are four available filter hooks, described
522 below:
523
524 * filter_store_key
525 This filter is called whenever a hash key is stored. It is passed
526 the incoming key, and expected to return a transformed key.
527
528 * filter_store_value
529 This filter is called whenever a hash key or array element is
530 stored. It is passed the incoming value, and expected to return a
531 transformed value.
532
533 * filter_fetch_key
534 This filter is called whenever a hash key is fetched (i.e. via
535 "first_key()" or "next_key()"). It is passed the transformed key,
536 and expected to return the plain key.
537
538 * filter_fetch_value
539 This filter is called whenever a hash key or array element is
540 fetched. It is passed the transformed value, and expected to return
541 the plain value.
542
543 Here are the two ways to setup a filter hook:
544
545 my $db = new DBM::Deep(
546 file => "foo.db",
547 filter_store_value => \&my_filter_store,
548 filter_fetch_value => \&my_filter_fetch
549 );
550
551 # or...
552
553 $db->set_filter( "filter_store_value", \&my_filter_store );
554 $db->set_filter( "filter_fetch_value", \&my_filter_fetch );
555
556 Your filter function will be called only when dealing with SCALAR keys
557 or values. When nested hashes and arrays are being stored/fetched,
558 filtering is bypassed. Filters are called as static functions, passed a
559 single SCALAR argument, and expected to return a single SCALAR value. If
560 you want to remove a filter, set the function reference to "undef":
561
562 $db->set_filter( "filter_store_value", undef );
563
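    Whatever transformation the store filter applies, the matching fetch
    filter must undo exactly. That contract can be checked in isolation,
    before wiring the subs into DBM::Deep -- here with a trivially
    reversible Base64 pair from the core *MIME::Base64* module:

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Candidate filter pair: Base64-encode on store, decode on fetch.
sub my_store_filter { return encode_base64( $_[0], '' ) }
sub my_fetch_filter { return decode_base64( $_[0] ) }

for my $value ( 'hello', 'with spaces and $ymbols!' ) {
    my $stored  = my_store_filter($value);
    my $fetched = my_fetch_filter($stored);
    die "round-trip failed" unless $fetched eq $value;
}
print "filters round-trip cleanly\n";
```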
564 REAL-TIME ENCRYPTION EXAMPLE
565 Here is a working example that uses the *Crypt::Blowfish* module to do
566 real-time encryption / decryption of keys & values with DBM::Deep
567 Filters. Please visit
568 <http://search.cpan.org/search?module=Crypt::Blowfish> for more on
569 *Crypt::Blowfish*. You'll also need the *Crypt::CBC* module.
570
571 use DBM::Deep;
572 use Crypt::Blowfish;
573 use Crypt::CBC;
574
575 my $cipher = new Crypt::CBC({
576 'key' => 'my secret key',
577 'cipher' => 'Blowfish',
578 'iv' => '$KJh#(}q',
579 'regenerate_key' => 0,
580 'padding' => 'space',
581 'prepend_iv' => 0
582 });
583
584 my $db = new DBM::Deep(
585 file => "foo-encrypt.db",
586 filter_store_key => \&my_encrypt,
587 filter_store_value => \&my_encrypt,
588 filter_fetch_key => \&my_decrypt,
589 filter_fetch_value => \&my_decrypt,
590 );
591
592 $db->{key1} = "value1";
593 $db->{key2} = "value2";
594 print "key1: " . $db->{key1} . "\n";
595 print "key2: " . $db->{key2} . "\n";
596
597 undef $db;
598 exit;
599
600 sub my_encrypt {
601 return $cipher->encrypt( $_[0] );
602 }
603 sub my_decrypt {
604 return $cipher->decrypt( $_[0] );
605 }
606
607 REAL-TIME COMPRESSION EXAMPLE
608 Here is a working example that uses the *Compress::Zlib* module to do
609 real-time compression / decompression of keys & values with DBM::Deep
610 Filters. Please visit
611 <http://search.cpan.org/search?module=Compress::Zlib> for more on
612 *Compress::Zlib*.
613
614 use DBM::Deep;
615 use Compress::Zlib;
616
617 my $db = new DBM::Deep(
618 file => "foo-compress.db",
619 filter_store_key => \&my_compress,
620 filter_store_value => \&my_compress,
621 filter_fetch_key => \&my_decompress,
622 filter_fetch_value => \&my_decompress,
623 );
624
625 $db->{key1} = "value1";
626 $db->{key2} = "value2";
627 print "key1: " . $db->{key1} . "\n";
628 print "key2: " . $db->{key2} . "\n";
629
630 undef $db;
631 exit;
632
633 sub my_compress {
634 return Compress::Zlib::memGzip( $_[0] ) ;
635 }
636 sub my_decompress {
637 return Compress::Zlib::memGunzip( $_[0] ) ;
638 }
639
640 Note: Filtering of keys only applies to hashes. Array "keys" are
641 actually numerical index numbers, and are not filtered.
642
643ERROR HANDLING
644 Most DBM::Deep methods return a true value for success, and call die()
645 on failure. You can wrap calls in an eval block to catch the die. Also,
646 the actual error message is stored in an internal scalar, which can be
647 fetched by calling the "error()" method.
648
649 my $db = new DBM::Deep "foo.db"; # create hash
650 eval { $db->push("foo"); }; # ILLEGAL -- push is array-only call
651
652 print $db->error(); # prints error message
653
654 You can then call "clear_error()" to clear the current error state.
655
656 $db->clear_error();
657
658 If you set the "debug" option to true when creating your DBM::Deep
659 object, all errors are considered NON-FATAL, and dumped to STDERR. This
660 is only for debugging purposes.
661
662LARGEFILE SUPPORT
663 If you have a 64-bit system, and your Perl is compiled with both
664 LARGEFILE and 64-bit support, you *may* be able to create databases
665 larger than 2 GB. DBM::Deep by default uses 32-bit file offset tags, but
666 these can be changed by calling the static "set_pack()" method before
667 you do anything else.
668
669 DBM::Deep::set_pack(8, 'Q');
670
671 This tells DBM::Deep to pack all file offsets with 8-byte (64-bit) quad
672 words instead of 32-bit longs. After setting these values your DB files
673     have a theoretical maximum size of 16 EB (exabytes).
674
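    The difference between the two pack templates can be verified with core
    Perl alone: 'N' produces 4-byte offsets and 'Q' 8-byte ones (note that
    'Q' is only available on a Perl built with 64-bit integer support):

```perl
use strict;
use warnings;

my $off32 = pack( 'N', 12345 );            # 32-bit big-endian offset
printf "32-bit offsets: %d bytes each\n", length $off32;

if ( eval { pack( 'Q', 1 ); 1 } ) {        # pack 'Q' dies without 64-bit support
    my $off64 = pack( 'Q', 12345 );
    printf "64-bit offsets: %d bytes each\n", length $off64;
}
```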
675 Note: Changing these values will NOT work for existing database files.
676 Only change this for new files, and make sure it stays set consistently
677 throughout the file's life. If you do set these values, you can no
678 longer access 32-bit DB files. You can, however, call "set_pack(4, 'N')"
679 to change back to 32-bit mode.
680
681 Note: I have not personally tested files > 2 GB -- all my systems have
682 only a 32-bit Perl. However, I have received user reports that this does
683 indeed work!
684
685LOW-LEVEL ACCESS
686 If you require low-level access to the underlying FileHandle that Deep
687 uses, you can call the "fh()" method, which returns the handle:
688
689 my $fh = $db->fh();
690
691     This method can be called on the root level of the database, or any child
692 hashes or arrays. All levels share a *root* structure, which contains
693 things like the FileHandle, a reference counter, and all your options
694 you specified when you created the object. You can get access to this
695 root structure by calling the "root()" method.
696
697 my $root = $db->root();
698
699 This is useful for changing options after the object has already been
700 created, such as enabling/disabling locking, volatile or debug modes.
701 You can also store your own temporary user data in this structure (be
702 wary of name collision), which is then accessible from any child hash or
703 array.
704
705CUSTOM DIGEST ALGORITHM
706 DBM::Deep by default uses the *Message Digest 5* (MD5) algorithm for
707 hashing keys. However you can override this, and use another algorithm
708 (such as SHA-256) or even write your own. But please note that Deep
709 currently expects zero collisions, so your algorithm has to be
710 *perfect*, so to speak. Collision detection may be introduced in a later
711 version.
712
713 You can specify a custom digest algorithm by calling the static
714 "set_digest()" function, passing a reference to a subroutine, and the
715 length of the algorithm's hashes (in bytes). This is a global static
716 function, which affects ALL Deep objects. Here is a working example that
717 uses a 256-bit hash from the *Digest::SHA256* module. Please see
718 <http://search.cpan.org/search?module=Digest::SHA256> for more.
719
720 use DBM::Deep;
721 use Digest::SHA256;
722
723 my $context = Digest::SHA256::new(256);
724
725 DBM::Deep::set_digest( \&my_digest, 32 );
726
727 my $db = new DBM::Deep "foo-sha.db";
728
729 $db->{key1} = "value1";
730 $db->{key2} = "value2";
731 print "key1: " . $db->{key1} . "\n";
732 print "key2: " . $db->{key2} . "\n";
733
734 undef $db;
735 exit;
736
737 sub my_digest {
738 return substr( $context->hash($_[0]), 0, 32 );
739 }
740
741 Note: Your returned digest strings must be EXACTLY the number of bytes
742 you specify in the "set_digest()" function (in this case 32).
743
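    The fixed-length requirement can be sanity-checked against the default
    algorithm: core *Digest::MD5* always yields a 16-byte binary digest, no
    matter how long the key is:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

for my $key ( '', 'a', 'a much longer key than usual' ) {
    my $digest = md5($key);   # binary digest, the form DBM::Deep stores
    die "unexpected digest length" unless length($digest) == 16;
}
print "all digests are 16 bytes\n";
```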
744CIRCULAR REFERENCES
745 DBM::Deep has experimental support for circular references. Meaning you
746 can have a nested hash key or array element that points to a parent
747 object. This relationship is stored in the DB file, and is preserved
748 between sessions. Here is an example:
749
750 my $db = new DBM::Deep "foo.db";
751
752 $db->{foo} = "bar";
753 $db->{circle} = $db; # ref to self
754
755         print $db->{foo} . "\n"; # prints "bar"
756         print $db->{circle}->{foo} . "\n"; # prints "bar" again
757
758 One catch is, passing the object to a function that recursively walks
759 the object tree (such as *Data::Dumper* or even the built-in
760 "optimize()" or "export()" methods) will result in an infinite loop. The
761 other catch is, if you fetch the *key* of a circular reference (i.e.
762 using the "first_key()" or "next_key()" methods), you will get the
763 *target object's key*, not the ref's key. This gets even more
764 interesting with the above example, where the *circle* key points to the
765 base DB object, which technically doesn't have a key. So I made
766 DBM::Deep return "[base]" as the key name in that special case.
767
768CAVEATS / ISSUES / BUGS
769     This section describes all the known issues with DBM::Deep. If you have
770 found something that is not listed here, please send e-mail to
771 jhuckaby@cpan.org.
772
773 UNUSED SPACE RECOVERY
774 One major caveat with Deep is that space occupied by existing keys and
775 values is not recovered when they are deleted. Meaning if you keep
776 deleting and adding new keys, your file will continuously grow. I am
777 working on this, but in the meantime you can call the built-in
778 "optimize()" method from time to time (perhaps in a crontab or
779 something) to recover all your unused space.
780
781 $db->optimize(); # returns true on success
782
783 This rebuilds the ENTIRE database into a new file, then moves it on top
784 of the original. The new file will have no unused space, thus it will
785 take up as little disk space as possible. Please note that this
786 operation can take a long time for large files, and you need enough disk
787 space to temporarily hold 2 copies of your DB file. The temporary file
788 is created in the same directory as the original, named with a ".tmp"
789 extension, and is deleted when the operation completes. Oh, and if
790 locking is enabled, the DB is automatically locked for the entire
791 duration of the copy.
792
793 WARNING: Only call optimize() on the top-level node of the database, and
794 make sure there are no child references lying around. Deep keeps a
795 reference counter, and if it is greater than 1, optimize() will abort
796 and return undef.
797
798 AUTOVIVIFICATION
799 Unfortunately, autovivification doesn't work with tied hashes. This
800 appears to be a bug in Perl's tie() system, as *Jakob Schmidt*
801     encountered the very same issue with his *DWH_File* module (see
802     <http://search.cpan.org/search?module=DWH_File>), and it is also
803     mentioned in the BUGS section for the *MLDBM* module (see
804     <http://search.cpan.org/search?module=MLDBM>). Basically, on a new db
805 file, this does not work:
806
807 $db->{foo}->{bar} = "hello";
808
809 Since "foo" doesn't exist, you cannot add "bar" to it. You end up with
810 "foo" being an empty hash. Try this instead, which works fine:
811
812 $db->{foo} = { bar => "hello" };
813
814 As of Perl 5.8.7, this bug still exists. I have walked very carefully
815 through the execution path, and Perl indeed passes an empty hash to the
816 STORE() method. Probably a bug in Perl.
817
818 FILE CORRUPTION
819 The current level of error handling in Deep is minimal. Files *are*
820 checked for a 32-bit signature on open(), but other corruption in files
821 can cause segmentation faults. Deep may try to seek() past the end of a
822 file, or get stuck in an infinite loop depending on the level of
823 corruption. File write operations are not checked for failure (for
824 speed), so if you happen to run out of disk space, Deep will probably
825 fail in a bad way. These things will be addressed in a later version of
826 DBM::Deep.
827
828 DB OVER NFS
829 Beware of using DB files over NFS. Deep uses flock(), which works well
830 on local filesystems, but will NOT protect you from file corruption over
831 NFS. I've heard about setting up your NFS server with a locking daemon,
832     then using lockf() to lock your files, but your mileage may vary there as
833 well. From what I understand, there is no real way to do it. However, if
834 you need access to the underlying FileHandle in Deep for using some
835 other kind of locking scheme like lockf(), see the "LOW-LEVEL ACCESS"
836 section above.
837
  COPYING OBJECTS
    Beware of copying tied objects in Perl. Very strange things can
    happen. Instead, use Deep's "clone()" method which safely copies the
    object and returns a new, blessed, tied hash or array to the same
    level in the DB.

        my $copy = $db->clone();

  LARGE ARRAYS
    Beware of using "shift()", "unshift()" or "splice()" with large
    arrays. These functions cause every element in the array to move,
    which can be murder on DBM::Deep, as every element has to be fetched
    from disk, then stored again in a different location. This may be
    addressed in a later version.

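    If you need queue-like behavior over a large structure, one
    workaround is to avoid shift() entirely and keep head/tail counters
    in a hash, so a dequeue touches one element instead of moving all of
    them. The sketch below uses a plain hashref and hypothetical helper
    names (they are not part of DBM::Deep's API); the same code should
    work unchanged on a DBM::Deep hash handle, at the cost of storing the
    counters alongside your data.

```perl
use strict;
use warnings;

# Queue with head/tail counters instead of shift()/unshift(). These
# helpers are a hypothetical sketch, not DBM::Deep internals.
sub queue_push {
    my ($h, $value) = @_;
    $h->{head} = 0 unless defined $h->{head};
    $h->{tail} = 0 unless defined $h->{tail};
    $h->{ 'item_' . $h->{tail}++ } = $value;   # append at the tail
}

sub queue_shift {
    my ($h) = @_;
    return undef if !defined $h->{head} || $h->{head} >= $h->{tail};
    my $slot  = 'item_' . $h->{head}++;        # consume from the head
    my $value = $h->{$slot};
    delete $h->{$slot};                        # only one element moves
    return $value;
}
```
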
PERFORMANCE
    This section discusses DBM::Deep's speed and memory usage.

  SPEED
    Obviously, DBM::Deep isn't going to be as fast as some C-based DBMs,
    such as the almighty *BerkeleyDB*. But it makes up for it in features
    like true multi-level hash/array support, and cross-platform FTPable
    files. Even so, DBM::Deep is still pretty fast, and the speed stays
    fairly consistent, even with huge databases. Here is some test data:

        Adding 1,000,000 keys to new DB file...

        At 100 keys, avg. speed is 2,703 keys/sec
        At 200 keys, avg. speed is 2,642 keys/sec
        At 300 keys, avg. speed is 2,598 keys/sec
        At 400 keys, avg. speed is 2,578 keys/sec
        At 500 keys, avg. speed is 2,722 keys/sec
        At 600 keys, avg. speed is 2,628 keys/sec
        At 700 keys, avg. speed is 2,700 keys/sec
        At 800 keys, avg. speed is 2,607 keys/sec
        At 900 keys, avg. speed is 2,190 keys/sec
        At 1,000 keys, avg. speed is 2,570 keys/sec
        At 2,000 keys, avg. speed is 2,417 keys/sec
        At 3,000 keys, avg. speed is 1,982 keys/sec
        At 4,000 keys, avg. speed is 1,568 keys/sec
        At 5,000 keys, avg. speed is 1,533 keys/sec
        At 6,000 keys, avg. speed is 1,787 keys/sec
        At 7,000 keys, avg. speed is 1,977 keys/sec
        At 8,000 keys, avg. speed is 2,028 keys/sec
        At 9,000 keys, avg. speed is 2,077 keys/sec
        At 10,000 keys, avg. speed is 2,031 keys/sec
        At 20,000 keys, avg. speed is 1,970 keys/sec
        At 30,000 keys, avg. speed is 2,050 keys/sec
        At 40,000 keys, avg. speed is 2,073 keys/sec
        At 50,000 keys, avg. speed is 1,973 keys/sec
        At 60,000 keys, avg. speed is 1,914 keys/sec
        At 70,000 keys, avg. speed is 2,091 keys/sec
        At 80,000 keys, avg. speed is 2,103 keys/sec
        At 90,000 keys, avg. speed is 1,886 keys/sec
        At 100,000 keys, avg. speed is 1,970 keys/sec
        At 200,000 keys, avg. speed is 2,053 keys/sec
        At 300,000 keys, avg. speed is 1,697 keys/sec
        At 400,000 keys, avg. speed is 1,838 keys/sec
        At 500,000 keys, avg. speed is 1,941 keys/sec
        At 600,000 keys, avg. speed is 1,930 keys/sec
        At 700,000 keys, avg. speed is 1,735 keys/sec
        At 800,000 keys, avg. speed is 1,795 keys/sec
        At 900,000 keys, avg. speed is 1,221 keys/sec
        At 1,000,000 keys, avg. speed is 1,077 keys/sec

    This test was performed on a PowerMac G4 1GHz running Mac OS X 10.3.2
    and Perl 5.8.1, with an 80GB Ultra ATA/100 HD spinning at 7200 RPM.
    The hash keys and values were between 6 and 12 characters in length.
    The DB file ended up at 210MB. Run time was 12 min 3 sec.

  MEMORY USAGE
    One of the great things about DBM::Deep is that it uses very little
    memory. Even with huge databases (1,000,000+ keys) you will not see
    much increased memory on your process. Deep relies solely on the
    filesystem for storing and fetching data. Here is output from
    */usr/bin/top* before even opening a database handle:

          PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
        22831 root      11   0  2716 2716  1296 R     0.0  0.2   0:07 perl

    Basically the process is taking 2,716K of memory. And here is the
    same process after storing and fetching 1,000,000 keys:

          PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
        22831 root      14   0  2772 2772  1328 R     0.0  0.2  13:32 perl

    Notice the memory usage increased by only 56K. The test was performed
    on a 700MHz x86 box running Linux RedHat 7.2 and Perl 5.6.1.

DB FILE FORMAT
    In case you were interested in the underlying DB file format, it is
    documented here in this section. You don't need to know this to use
    the module; it's just included for reference.

  SIGNATURE
    DBM::Deep files always start with a 32-bit signature to identify the
    file type. This is at offset 0. The signature is "DPDB" in network
    byte order. This is checked upon each file open().

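    As a sketch of that check (not DBM::Deep's actual internals), the
    signature test amounts to comparing the first four bytes of the file;
    the helper name here is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a file begins with the 4-byte
# "DPDB" signature described above.
sub has_dpdb_signature {
    my ($file) = @_;
    open my $fh, '<', $file or return 0;
    binmode $fh;
    my $read = read $fh, my $sig, 4;   # pull exactly four bytes
    close $fh;
    return defined $read && $read == 4 && $sig eq 'DPDB';
}
```
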
  TAG
    The DBM::Deep file is in a *tagged format*, meaning each section of
    the file has a standard header containing the type of data, the
    length of data, and then the data itself. The type is a single
    character (1 byte), the length is a 32-bit unsigned long in network
    byte order, and the data is, well, the data. Here is how it unfolds:

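    The header layout can be sketched with pack()/unpack(); the helper
    names below are illustrative, not DBM::Deep's own code:

```perl
use strict;
use warnings;

# Build and parse the tag layout described above: a 1-byte type
# character, a 32-bit big-endian ("N") length, then the data itself.
sub pack_tag {
    my ($type, $data) = @_;
    return $type . pack('N', length $data) . $data;
}

sub unpack_tag {
    my ($raw) = @_;
    my ($type, $len) = unpack 'a1 N', $raw;   # header is 5 bytes total
    return ($type, $len, substr($raw, 5, $len));
}
```
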
  MASTER INDEX
    Immediately after the 32-bit file signature is the *Master Index*
    record. This is a standard tag header followed by 1024 bytes (in
    32-bit mode) or 2048 bytes (in 64-bit mode) of data. The type is *H*
    for hash or *A* for array, depending on how the DBM::Deep object was
    constructed.

    The index works by looking at an *MD5 Hash* of the hash key (or
    array index number). The first 8-bit char of the MD5 signature is the
    offset into the index, multiplied by 4 in 32-bit mode, or 8 in 64-bit
    mode. The value of the index element is a file offset of the next tag
    for the key/element in question, which is usually a *Bucket List* tag
    (see below).

    The next tag *could* be another index, depending on how many
    keys/elements exist. See RE-INDEXING below for details.

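    The slot computation just described can be sketched like this (an
    illustration, not DBM::Deep's actual code): the first byte of the
    key's MD5 digest picks one of 256 slots, and multiplying by the
    pointer size gives the byte offset into the 1024- or 2048-byte index
    data.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Offset of a key's slot within the index data. $ptr_size is 4 in
# 32-bit mode or 8 in 64-bit mode.
sub index_offset {
    my ($key, $ptr_size) = @_;
    my $first_byte = ord md5($key);   # first byte of the digest: 0..255
    return $first_byte * $ptr_size;
}
```
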
  BUCKET LIST
    A *Bucket List* is a collection of 16 MD5 hashes for keys/elements,
    plus file offsets to where the actual data is stored. It starts with
    a standard tag header, with type *B*, and a data size of 320 bytes in
    32-bit mode, or 384 bytes in 64-bit mode. Each MD5 hash is stored in
    full (16 bytes), plus the 32-bit or 64-bit file offset for the
    *Bucket* containing the actual data. When the list fills up, a
    *Re-Index* operation is performed (see RE-INDEXING below).

  BUCKET
    A *Bucket* is a tag containing a key/value pair (in hash mode), or an
    index/value pair (in array mode). It starts with a standard tag
    header with type *D* for scalar data (string, binary, etc.), or it
    could be a nested hash (type *H*) or array (type *A*). The value
    comes just after the tag header. The size reported in the tag header
    is only for the value, but then, just after the value is another size
    (32-bit unsigned long) and then the plain key itself. Since the value
    is likely to be fetched more often than the plain key, I figured it
    would be *slightly* faster to store the value first.

    If the type is *H* (hash) or *A* (array), the value is another
    *Master Index* record for the nested structure, where the process
    begins all over again.

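    A simplified sketch of that value-first layout (illustrative helper
    names, and real Buckets carry details omitted here): the tag header's
    size covers only the value, which is followed by a second 32-bit
    length and the plain key.

```perl
use strict;
use warnings;

# Serialize and parse the value-first Bucket payload described above.
sub pack_bucket {
    my ($key, $value) = @_;
    return 'D' . pack('N', length $value) . $value    # value first
               . pack('N', length $key)   . $key;     # then the key
}

sub unpack_bucket {
    my ($raw) = @_;
    my ($type, $vlen) = unpack 'a1 N', $raw;
    my $value = substr $raw, 5, $vlen;
    my $klen  = unpack 'N', substr($raw, 5 + $vlen, 4);
    my $key   = substr $raw, 9 + $vlen, $klen;
    return ($key, $value);
}
```
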
  RE-INDEXING
    After a *Bucket List* grows to 16 records, its allocated space in the
    file is exhausted. Then, when another key/element comes in, the list
    is converted to a new index record. However, this index will look at
    the next char in the MD5 hash, and arrange new Bucket List pointers
    accordingly. This process is called *Re-Indexing*. Basically, a new
    index tag is created at the file EOF, and all 17 (16 + the new one)
    keys/elements are removed from the old Bucket List and inserted into
    the new index. Several new Bucket Lists are created in the process,
    as a new MD5 char from the key is being examined (it is unlikely that
    the keys will all share the same next char of their MD5s).

    Because of the way the *MD5* algorithm works, it is impossible to
    tell exactly when the Bucket Lists will turn into indexes, but the
    first round tends to happen right around 4,000 keys. You will see a
    *slight* decrease in performance here, but it picks back up pretty
    quickly (see SPEED above). Then it takes a lot more keys to exhaust
    the next level of Bucket Lists -- right around 900,000 keys. This
    process can continue nearly indefinitely -- right up until the point
    the *MD5* signatures start colliding with each other, and this is
    EXTREMELY rare -- like winning the lottery 5 times in a row AND
    getting struck by lightning while you are walking to cash in your
    tickets. Theoretically, since *MD5* hashes are 128-bit values, you
    *could* have up to
    340,282,366,921,000,000,000,000,000,000,000,000,000 keys/elements (I
    believe this is 340 undecillion, but don't quote me).

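    In rough numbers, those thresholds fall out of the geometry described
    above: 256 index slots, each leading to a Bucket List of 16 entries.
    Real keys distribute unevenly, which is why the observed figures
    (about 4,000 and 900,000) sit a little below these ideals:

```perl
use strict;
use warnings;

# Fill points under a perfectly even key distribution.
my $slots   = 256;   # one slot per possible first MD5 byte
my $entries = 16;    # entries per Bucket List
my $first_level  = $slots * $entries;       # 4,096 keys
my $second_level = $slots**2 * $entries;    # 1,048,576 keys
```
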
  STORING
    When a new key/element is stored, the key (or index number) is first
    run through *Digest::MD5* to get a 128-bit signature (example, in
    hex: b05783b0773d894396d475ced9d2f4f6). Then, the *Master Index*
    record is checked for the first char of the signature (in this case
    *b*). If it does not exist, a new *Bucket List* is created for our
    key (and the next 15 future keys that happen to also have *b* as
    their first MD5 char). The entire MD5 is written to the *Bucket List*
    along with the offset of the new *Bucket* record (EOF at this point,
    unless we are replacing an existing *Bucket*), where the actual data
    will be stored.

  FETCHING
    Fetching an existing key/element involves getting a *Digest::MD5* of
    the key (or index number), then walking along the indexes. If there
    are enough keys/elements in this DB level, there might be nested
    indexes, each linked to a particular char of the MD5. Finally, a
    *Bucket List* is pointed to, which contains up to 16 full MD5 hashes.
    Each is checked for equality to the key in question. If a match is
    found, the *Bucket* tag is loaded, where the value and plain key are
    stored.

    Fetching the plain key occurs when calling the *first_key()* and
    *next_key()* methods. In this process the indexes are walked
    systematically, and each key is fetched in increasing MD5 order
    (which is why it appears random). Once the *Bucket* is found, the
    value is skipped and the plain key is returned instead. Note: Do not
    count on keys being fetched as if the MD5 hashes were alphabetically
    sorted. This only happens at the index level -- as soon as the
    *Bucket Lists* are hit, the keys will come out in the order they went
    in. So it's pretty much undefined how the keys will come out -- just
    like with Perl's built-in hashes.

AUTHOR
    Joseph Huckaby, jhuckaby@cpan.org

    Special thanks to Adam Sah and Rich Gaushell! You know why :-)

SEE ALSO
    perltie(1), Tie::Hash(3), Digest::MD5(3), Fcntl(3), flock(2),
    lockf(3), nfs(5), Digest::SHA256(3), Crypt::Blowfish(3),
    Compress::Zlib(3)

LICENSE
    Copyright (c) 2002-2005 Joseph Huckaby. All Rights Reserved. This is
    free software; you may use it and distribute it under the same terms
    as Perl itself.