NAME
    DBM::Deep - A pure perl multi-level hash/array DBM

SYNOPSIS
        use DBM::Deep;
        my $db = new DBM::Deep "foo.db";

        $db->{key} = 'value'; # tie() style
        print $db->{key};

        $db->put('key', 'value'); # OO style
        print $db->get('key');

        # true multi-level support
        $db->{my_complex} = [
            'hello', { perl => 'rules' },
            42, 99 ];

DESCRIPTION
    A unique flat-file database module, written in pure perl. True
    multi-level hash/array support (unlike MLDBM, which is faked), hybrid
    OO / tie() interface, cross-platform FTPable files, and quite fast.
    Can handle millions of keys and unlimited hash levels without
    significant slow-down. Written from the ground-up in pure perl -- this
    is NOT a wrapper around a C-based DBM. Out-of-the-box compatibility
    with Unix, Mac OS X and Windows.

INSTALLATION
    Hopefully you are using Perl's excellent CPAN module, which will
    download and install DBM::Deep for you. If not, get the tarball, and
    run these commands:

        tar zxf DBM-Deep-*
        cd DBM-Deep-*
        perl Makefile.PL
        make
        make test
        make install

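    If you do have the CPAN module available, the tarball steps can be
    replaced with a single command:

        perl -MCPAN -e 'install DBM::Deep'
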
SETUP
    Construction can be done OO-style (which is the recommended way), or
    using Perl's tie() function. Both are examined here.

  OO CONSTRUCTION
    The recommended way to construct a DBM::Deep object is to use the
    new() method, which gets you a blessed, tied hash or array reference.

        my $db = new DBM::Deep "foo.db";

    This opens a new database handle, mapped to the file "foo.db". If
    this file does not exist, it will automatically be created. DB files
    are opened in "r+" (read/write) mode, and the type of object returned
    is a hash, unless otherwise specified (see OPTIONS below).

    You can pass a number of options to the constructor to specify things
    like locking, autoflush, etc. This is done by passing an inline hash:

        my $db = new DBM::Deep(
            file => "foo.db",
            locking => 1,
            autoflush => 1
        );

    Notice that the filename is now specified *inside* the hash with the
    "file" parameter, as opposed to being the sole argument to the
    constructor. This is required if any options are specified. See
    OPTIONS below for the complete list.

    You can also start with an array instead of a hash. For this, you
    must specify the "type" parameter:

        my $db = new DBM::Deep(
            file => "foo.db",
            type => DBM::Deep::TYPE_ARRAY
        );

    Note: Specifying the "type" parameter only takes effect when
    beginning a new DB file. If you create a DBM::Deep object with an
    existing file, the "type" will be loaded from the file header, and
    ignored if it is passed to the constructor.

  TIE CONSTRUCTION
    Alternatively, you can create a DBM::Deep handle by using Perl's
    built-in tie() function. This is not ideal, because you get only a
    basic, tied hash (or array) which is not blessed, so you can't call
    any functions on it.

        my %hash;
        tie %hash, "DBM::Deep", "foo.db";

        my @array;
        tie @array, "DBM::Deep", "bar.db";

    As with the OO constructor, you can replace the DB filename parameter
    with a hash containing one or more options (see OPTIONS just below
    for the complete list).

        tie %hash, "DBM::Deep", {
            file => "foo.db",
            locking => 1,
            autoflush => 1
        };

  OPTIONS
    There are a number of options that can be passed in when constructing
    your DBM::Deep objects. These apply to both the OO- and tie-based
    approaches.

    * file
      Filename of the DB file to link the handle to. You can pass a full
      absolute filesystem path, partial path, or a plain filename if the
      file is in the current working directory. This is a required
      parameter.

    * mode
      File open mode (read-only, read-write, etc.) string passed to
      Perl's FileHandle module. This is an optional parameter, and
      defaults to "r+" (read/write). Note: If the default (r+) mode is
      selected, the file will also be auto-created if it doesn't exist.

    * type
      This parameter specifies what type of object to create, a hash or
      array. Use one of these two constants: "DBM::Deep::TYPE_HASH" or
      "DBM::Deep::TYPE_ARRAY". This only takes effect when beginning a
      new file. This is an optional parameter, and defaults to hash.

    * locking
      Specifies whether locking is to be enabled. DBM::Deep uses Perl's
      Fcntl flock() function to lock the database in exclusive mode for
      writes, and shared mode for reads. Pass any true value to enable.
      This affects the base DB handle *and any child hashes or arrays*
      that use the same DB file. This is an optional parameter, and
      defaults to 0 (disabled). See LOCKING below for more.

    * autoflush
      Specifies whether autoflush is to be enabled on the underlying
      FileHandle. This obviously slows down write operations, but is
      required if you may have multiple processes accessing the same DB
      file (also consider enabling *locking*, or at least *volatile*).
      Pass any true value to enable. This is an optional parameter, and
      defaults to 0 (disabled).

    * volatile
      If *volatile* mode is enabled, DBM::Deep will stat() the DB file
      before each STORE() operation. This is required if an outside
      force may change the size of the file between transactions.
      Locking also implicitly enables volatile. This is useful if you
      want to use a different locking system or write your own. Pass any
      true value to enable. This is an optional parameter, and defaults
      to 0 (disabled).

    * autobless
      If *autobless* mode is enabled, DBM::Deep will preserve blessed
      hashes, and restore them when fetched. This is an experimental
      feature, and does have side-effects. Basically, when hashes are
      re-blessed into their original classes, they are no longer blessed
      into the DBM::Deep class! So you won't be able to call any
      DBM::Deep methods on them. You have been warned. This is an
      optional parameter, and defaults to 0 (disabled). See the sketch
      following this list.

    * filter_*
      See FILTERS below.

    * debug
      Setting *debug* mode will make all errors non-fatal, dump them out
      to STDERR, and continue on. This is for debugging purposes only,
      and probably not what you want. This is an optional parameter, and
      defaults to 0 (disabled).

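    As an illustration of the *autobless* option, here is a minimal
    sketch (the My::Point class name is made up for this example):

        my $db = new DBM::Deep(
            file => "foo.db",
            autobless => 1
        );

        $db->{point} = bless { x => 1, y => 2 }, "My::Point";

        # a later fetch returns a hash blessed into My::Point, not
        # DBM::Deep, so no DBM::Deep methods can be called on it
        my $point = $db->{point};
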
TIE INTERFACE
    With DBM::Deep you can access your databases using Perl's standard
    hash/array syntax. Because all Deep objects are *tied* to hashes or
    arrays, you can treat them as such. Deep will intercept all
    reads/writes and direct them to the right place -- the DB file. This
    has nothing to do with the "TIE CONSTRUCTION" section above. This
    simply tells you how to use DBM::Deep using regular hashes and
    arrays, rather than calling functions like "get()" and "put()"
    (although those work too). It is entirely up to you how you want to
    access your databases.

  HASHES
    You can treat any DBM::Deep object like a normal Perl hash reference.
    Add keys, or even nested hashes (or arrays) using standard Perl
    syntax:

        my $db = new DBM::Deep "foo.db";

        $db->{mykey} = "myvalue";
        $db->{myhash} = {};
        $db->{myhash}->{subkey} = "subvalue";

        print $db->{myhash}->{subkey} . "\n";

    You can even step through hash keys using the normal Perl "keys()"
    function:

        foreach my $key (keys %$db) {
            print "$key: " . $db->{$key} . "\n";
        }

    Remember that Perl's "keys()" function extracts *every* key from the
    hash and pushes them onto an array, all before the loop even begins.
    If you have an extra large hash, this may exhaust Perl's memory.
    Instead, consider using Perl's "each()" function, which pulls
    keys/values one at a time, using very little memory:

        while (my ($key, $value) = each %$db) {
            print "$key: $value\n";
        }

    Please note that when using "each()", you should always pass a direct
    hash reference, not a lookup. Meaning, you should never do this:

        # NEVER DO THIS
        while (my ($key, $value) = each %{$db->{foo}}) { # BAD

    This causes an infinite loop, because for each iteration, Perl is
    calling FETCH() on the $db handle, resulting in a "new" hash for foo
    every time, so it effectively keeps returning the first key over and
    over again. Instead, assign a temporary variable to $db->{foo}, then
    pass that to each().

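    For example, fetch the nested reference once into a plain variable,
    then iterate over that:

        my $foo = $db->{foo};
        while (my ($key, $value) = each %$foo) {
            print "$key: $value\n";
        }
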
  ARRAYS
    As with hashes, you can treat any DBM::Deep object like a normal Perl
    array reference. This includes inserting, removing and manipulating
    elements, and the "push()", "pop()", "shift()", "unshift()" and
    "splice()" functions. The object must have first been created using
    type "DBM::Deep::TYPE_ARRAY", or simply be a nested array reference
    inside a hash. Example:

        my $db = new DBM::Deep(
            file => "foo-array.db",
            type => DBM::Deep::TYPE_ARRAY
        );

        $db->[0] = "foo";
        push @$db, "bar", "baz";
        unshift @$db, "bah";

        my $last_elem = pop @$db; # baz
        my $first_elem = shift @$db; # bah
        my $second_elem = $db->[1]; # bar

        my $num_elements = scalar @$db;

OO INTERFACE
    In addition to the *tie()* interface, you can also use a standard OO
    interface to manipulate all aspects of DBM::Deep databases. Each type
    of object (hash or array) has its own methods, but both types share
    the following common methods: "put()", "get()", "exists()",
    "delete()" and "clear()".

    * put()
      Stores a new hash key/value pair, or sets an array element value.
      Takes two arguments, the hash key or array index, and the new
      value. The value can be a scalar, hash ref or array ref. Returns
      true on success, false on failure.

          $db->put("foo", "bar"); # for hashes
          $db->put(1, "bar"); # for arrays

    * get()
      Fetches the value of a hash key or array element. Takes one
      argument: the hash key or array index. Returns a scalar, hash ref
      or array ref, depending on the data type stored.

          my $value = $db->get("foo"); # for hashes
          my $value = $db->get(1); # for arrays

    * exists()
      Checks if a hash key or array index exists. Takes one argument:
      the hash key or array index. Returns true if it exists, false if
      not.

          if ($db->exists("foo")) { print "yay!\n"; } # for hashes
          if ($db->exists(1)) { print "yay!\n"; } # for arrays

    * delete()
      Deletes one hash key/value pair or array element. Takes one
      argument: the hash key or array index. Returns true on success,
      false if not found. For arrays, the remaining elements located
      after the deleted element are NOT moved over. The deleted element
      is essentially just undefined, which is exactly how Perl's
      internal arrays work. Please note that the space occupied by the
      deleted key/value or element is not reused again -- see "UNUSED
      SPACE RECOVERY" below for details and workarounds.

          $db->delete("foo"); # for hashes
          $db->delete(1); # for arrays

    * clear()
      Deletes all hash keys or array elements. Takes no arguments. No
      return value. Please note that the space occupied by the deleted
      keys/values or elements is not reused again -- see "UNUSED SPACE
      RECOVERY" below for details and workarounds.

          $db->clear(); # hashes or arrays

  HASHES
    For hashes, DBM::Deep supports all the common methods described
    above, and the following additional methods: "first_key()" and
    "next_key()".

    * first_key()
      Returns the "first" key in the hash. As with built-in Perl hashes,
      keys are fetched in an undefined order (which appears random).
      Takes no arguments, returns the key as a scalar value.

          my $key = $db->first_key();

    * next_key()
      Returns the "next" key in the hash, given the previous one as the
      sole argument. Returns undef if there are no more keys to be
      fetched.

          $key = $db->next_key($key);

    Here are some examples of using hashes:

        my $db = new DBM::Deep "foo.db";

        $db->put("foo", "bar");
        print "foo: " . $db->get("foo") . "\n";

        $db->put("baz", {}); # new child hash ref
        $db->get("baz")->put("buz", "biz");
        print "buz: " . $db->get("baz")->get("buz") . "\n";

        my $key = $db->first_key();
        while ($key) {
            print "$key: " . $db->get($key) . "\n";
            $key = $db->next_key($key);
        }

        if ($db->exists("foo")) { $db->delete("foo"); }

  ARRAYS
    For arrays, DBM::Deep supports all the common methods described
    above, and the following additional methods: "length()", "push()",
    "pop()", "shift()", "unshift()" and "splice()".

    * length()
      Returns the number of elements in the array. Takes no arguments.

          my $len = $db->length();

    * push()
      Adds one or more elements onto the end of the array. Accepts
      scalars, hash refs or array refs. No return value.

          $db->push("foo", "bar", {});

    * pop()
      Fetches the last element in the array, and deletes it. Takes no
      arguments. Returns undef if the array is empty. Returns the
      element value.

          my $elem = $db->pop();

    * shift()
      Fetches the first element in the array, deletes it, then shifts
      all the remaining elements over to take up the space. Returns the
      element value. This method is not recommended with large arrays --
      see "LARGE ARRAYS" below for details.

          my $elem = $db->shift();

    * unshift()
      Inserts one or more elements onto the beginning of the array,
      shifting all existing elements over to make room. Accepts scalars,
      hash refs or array refs. No return value. This method is not
      recommended with large arrays -- see "LARGE ARRAYS" below for
      details.

          $db->unshift("foo", "bar", {});

    * splice()
      Performs exactly like Perl's built-in function of the same name.
      See "perldoc -f splice" for usage -- it is too complicated to
      document here. This method is not recommended with large arrays --
      see "LARGE ARRAYS" below for details.

    Here are some examples of using arrays:

        my $db = new DBM::Deep(
            file => "foo.db",
            type => DBM::Deep::TYPE_ARRAY
        );

        $db->push("bar", "baz");
        $db->unshift("foo");
        $db->put(3, "buz");

        my $len = $db->length();
        print "length: $len\n"; # 4

        for (my $k=0; $k<$len; $k++) {
            print "$k: " . $db->get($k) . "\n";
        }

        $db->splice(1, 2, "biz", "baf");

        while (my $elem = shift @$db) {
            print "shifted: $elem\n";
        }

LOCKING
    Enable automatic file locking by passing a true value to the
    "locking" parameter when constructing your DBM::Deep object (see
    SETUP above).

        my $db = new DBM::Deep(
            file => "foo.db",
            locking => 1
        );

    This causes Deep to "flock()" the underlying FileHandle object with
    exclusive mode for writes, and shared mode for reads. This is
    required if you have multiple processes accessing the same database
    file, to avoid file corruption. Please note that "flock()" does NOT
    work for files over NFS. See "DB OVER NFS" below for more.

  EXPLICIT LOCKING
    You can explicitly lock a database, so it remains locked for
    multiple transactions. This is done by calling the "lock()" method,
    and passing an optional lock mode argument (defaults to exclusive
    mode). This is particularly useful for things like counters, where
    the current value needs to be fetched, then incremented, then stored
    again.

        $db->lock();
        my $counter = $db->get("counter");
        $counter++;
        $db->put("counter", $counter);
        $db->unlock();

        # or...

        $db->lock();
        $db->{counter}++;
        $db->unlock();

    You can pass "lock()" an optional argument, which specifies which
    mode to use (exclusive or shared). Use one of these two constants:
    "DBM::Deep::LOCK_EX" or "DBM::Deep::LOCK_SH". These are passed
    directly to "flock()", and are the same as the constants defined in
    Perl's "Fcntl" module.

        $db->lock( DBM::Deep::LOCK_SH );
        # something here
        $db->unlock();

    If you want to implement your own file locking scheme, be sure to
    create your DBM::Deep objects setting the "volatile" option to true.
    This hints to Deep that the DB file may change between transactions.
    See "LOW-LEVEL ACCESS" below for more.

IMPORTING/EXPORTING
    You can import existing complex structures by calling the "import()"
    method, and export an entire database into an in-memory structure
    using the "export()" method. Both are examined here.

  IMPORTING
    Say you have an existing hash with nested hashes/arrays inside it.
    Instead of walking the structure and adding keys/elements to the
    database as you go, simply pass a reference to the "import()"
    method. This recursively adds everything to an existing DBM::Deep
    object for you. Here is an example:

        my $struct = {
            key1 => "value1",
            key2 => "value2",
            array1 => [ "elem0", "elem1", "elem2" ],
            hash1 => {
                subkey1 => "subvalue1",
                subkey2 => "subvalue2"
            }
        };

        my $db = new DBM::Deep "foo.db";
        $db->import( $struct );

        print $db->{key1} . "\n"; # prints "value1"

    This recursively imports the entire $struct object into $db,
    including all nested hashes and arrays. If the DBM::Deep object
    contains existing data, keys are merged with the existing ones,
    replacing if they already exist. The "import()" method can be called
    on any database level (not just the base level), and works with both
    hash and array DB types.

    Note: Make sure your existing structure has no circular references
    in it. These will cause an infinite loop when importing.

  EXPORTING
    Calling the "export()" method on an existing DBM::Deep object will
    return a reference to a new in-memory copy of the database. The
    export is done recursively, so all nested hashes/arrays are all
    exported to standard Perl objects. Here is an example:

        my $db = new DBM::Deep "foo.db";

        $db->{key1} = "value1";
        $db->{key2} = "value2";
        $db->{hash1} = {};
        $db->{hash1}->{subkey1} = "subvalue1";
        $db->{hash1}->{subkey2} = "subvalue2";

        my $struct = $db->export();

        print $struct->{key1} . "\n"; # prints "value1"

    This makes a complete copy of the database in memory, and returns a
    reference to it. The "export()" method can be called on any database
    level (not just the base level), and works with both hash and array
    DB types. Be careful of large databases -- you can store a lot more
    data in a DBM::Deep object than an in-memory Perl structure.

    Note: Make sure your database has no circular references in it.
    These will cause an infinite loop when exporting.

FILTERS
    DBM::Deep has a number of hooks where you can specify your own Perl
    function to perform filtering on incoming or outgoing data. This is
    a perfect way to extend the engine, and implement things like
    real-time compression or encryption. Filtering applies to the base
    DB level, and all child hashes / arrays. Filter hooks can be
    specified when your DBM::Deep object is first constructed, or by
    calling the "set_filter()" method at any time. There are four
    available filter hooks, described below:

    * filter_store_key
      This filter is called whenever a hash key is stored. It is passed
      the incoming key, and expected to return a transformed key.

    * filter_store_value
      This filter is called whenever a hash key or array element is
      stored. It is passed the incoming value, and expected to return a
      transformed value.

    * filter_fetch_key
      This filter is called whenever a hash key is fetched (i.e. via
      "first_key()" or "next_key()"). It is passed the transformed key,
      and expected to return the plain key.

    * filter_fetch_value
      This filter is called whenever a hash key or array element is
      fetched. It is passed the transformed value, and expected to
      return the plain value.

    Here are the two ways to set up a filter hook:

        my $db = new DBM::Deep(
            file => "foo.db",
            filter_store_value => \&my_filter_store,
            filter_fetch_value => \&my_filter_fetch
        );

        # or...

        $db->set_filter( "filter_store_value", \&my_filter_store );
        $db->set_filter( "filter_fetch_value", \&my_filter_fetch );

    Your filter function will be called only when dealing with SCALAR
    keys or values. When nested hashes and arrays are being
    stored/fetched, filtering is bypassed. Filters are called as static
    functions, passed a single SCALAR argument, and expected to return a
    single SCALAR value. If you want to remove a filter, set the
    function reference to undef:

        $db->set_filter( "filter_store_value", undef );

  REAL-TIME ENCRYPTION EXAMPLE
    Here is a working example that uses the *Crypt::Blowfish* module to
    do real-time encryption / decryption of keys & values with DBM::Deep
    Filters. Please visit
    <http://search.cpan.org/search?module=Crypt::Blowfish> for more on
    *Crypt::Blowfish*. You'll also need the *Crypt::CBC* module.

        use DBM::Deep;
        use Crypt::Blowfish;
        use Crypt::CBC;

        my $cipher = new Crypt::CBC({
            'key' => 'my secret key',
            'cipher' => 'Blowfish',
            'iv' => '$KJh#(}q',
            'regenerate_key' => 0,
            'padding' => 'space',
            'prepend_iv' => 0
        });

        my $db = new DBM::Deep(
            file => "foo-encrypt.db",
            filter_store_key => \&my_encrypt,
            filter_store_value => \&my_encrypt,
            filter_fetch_key => \&my_decrypt,
            filter_fetch_value => \&my_decrypt,
        );

        $db->{key1} = "value1";
        $db->{key2} = "value2";
        print "key1: " . $db->{key1} . "\n";
        print "key2: " . $db->{key2} . "\n";

        undef $db;
        exit;

        sub my_encrypt {
            return $cipher->encrypt( $_[0] );
        }
        sub my_decrypt {
            return $cipher->decrypt( $_[0] );
        }

  REAL-TIME COMPRESSION EXAMPLE
    Here is a working example that uses the *Compress::Zlib* module to
    do real-time compression / decompression of keys & values with
    DBM::Deep Filters. Please visit
    <http://search.cpan.org/search?module=Compress::Zlib> for more on
    *Compress::Zlib*.

        use DBM::Deep;
        use Compress::Zlib;

        my $db = new DBM::Deep(
            file => "foo-compress.db",
            filter_store_key => \&my_compress,
            filter_store_value => \&my_compress,
            filter_fetch_key => \&my_decompress,
            filter_fetch_value => \&my_decompress,
        );

        $db->{key1} = "value1";
        $db->{key2} = "value2";
        print "key1: " . $db->{key1} . "\n";
        print "key2: " . $db->{key2} . "\n";

        undef $db;
        exit;

        sub my_compress {
            return Compress::Zlib::memGzip( $_[0] );
        }
        sub my_decompress {
            return Compress::Zlib::memGunzip( $_[0] );
        }

    Note: Filtering of keys only applies to hashes. Array "keys" are
    actually numerical index numbers, and are not filtered.

ERROR HANDLING
    Most DBM::Deep methods return a true value for success, and call
    die() on failure. You can wrap calls in an eval block to catch the
    die. Also, the actual error message is stored in an internal scalar,
    which can be fetched by calling the "error()" method.

        my $db = new DBM::Deep "foo.db"; # create hash
        eval { $db->push("foo"); }; # ILLEGAL -- push is an array-only call

        print $db->error(); # prints error message

    You can then call "clear_error()" to clear the current error state.

        $db->clear_error();

    If you set the "debug" option to true when creating your DBM::Deep
    object, all errors are considered NON-FATAL, and dumped to STDERR.
    This is only for debugging purposes.

LARGEFILE SUPPORT
    If you have a 64-bit system, and your Perl is compiled with both
    LARGEFILE and 64-bit support, you *may* be able to create databases
    larger than 2 GB. DBM::Deep by default uses 32-bit file offset tags,
    but these can be changed by calling the static "set_pack()" method
    before you do anything else.

        DBM::Deep::set_pack(8, 'Q');

    This tells DBM::Deep to pack all file offsets with 8-byte (64-bit)
    quad words instead of 32-bit longs. After setting these values your
    DB files have a theoretical maximum size of 16 EB (exabytes).

    Note: Changing these values will NOT work for existing database
    files. Only change this for new files, and make sure it stays set
    consistently throughout the file's life. If you do set these values,
    you can no longer access 32-bit DB files. You can, however, call
    "set_pack(4, 'N')" to change back to 32-bit mode.

    Note: I have not personally tested files > 2 GB -- all my systems
    have only a 32-bit Perl. However, I have received user reports that
    this does indeed work!

LOW-LEVEL ACCESS
    If you require low-level access to the underlying FileHandle that
    Deep uses, you can call the "fh()" method, which returns the handle:

        my $fh = $db->fh();

    This method can be called on the root level of the database, or any
    child hashes or arrays. All levels share a *root* structure, which
    contains things like the FileHandle, a reference counter, and all
    the options you specified when you created the object. You can get
    access to this root structure by calling the "root()" method.

        my $root = $db->root();

    This is useful for changing options after the object has already
    been created, such as enabling/disabling locking, volatile or debug
    modes. You can also store your own temporary user data in this
    structure (be wary of name collision), which is then accessible from
    any child hash or array.

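    As a sketch of that last point, assuming the root structure behaves
    like a plain hash reference (the key name here is made up):

        my $root = $db->root();
        $root->{my_app_data} = "anything"; # beware of name collisions

        # the same root, and thus the same data, is reachable from any
        # child hash or array via its own root() call
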
CUSTOM DIGEST ALGORITHM
    DBM::Deep by default uses the *Message Digest 5* (MD5) algorithm for
    hashing keys. However you can override this, and use another
    algorithm (such as SHA-256) or even write your own. But please note
    that Deep currently expects zero collisions, so your algorithm has
    to be *perfect*, so to speak. Collision detection may be introduced
    in a later version.

    You can specify a custom digest algorithm by calling the static
    "set_digest()" function, passing a reference to a subroutine, and
    the length of the algorithm's hashes (in bytes). This is a global
    static function, which affects ALL Deep objects. Here is a working
    example that uses a 256-bit hash from the *Digest::SHA256* module.
    Please see <http://search.cpan.org/search?module=Digest::SHA256> for
    more.

        use DBM::Deep;
        use Digest::SHA256;

        my $context = Digest::SHA256::new(256);

        DBM::Deep::set_digest( \&my_digest, 32 );

        my $db = new DBM::Deep "foo-sha.db";

        $db->{key1} = "value1";
        $db->{key2} = "value2";
        print "key1: " . $db->{key1} . "\n";
        print "key2: " . $db->{key2} . "\n";

        undef $db;
        exit;

        sub my_digest {
            return substr( $context->hash($_[0]), 0, 32 );
        }

    Note: Your returned digest strings must be EXACTLY the number of
    bytes you specify in the "set_digest()" function (in this case 32).

CIRCULAR REFERENCES
    DBM::Deep has experimental support for circular references. Meaning
    you can have a nested hash key or array element that points to a
    parent object. This relationship is stored in the DB file, and is
    preserved between sessions. Here is an example:

        my $db = new DBM::Deep "foo.db";

        $db->{foo} = "bar";
        $db->{circle} = $db; # ref to self

        print $db->{foo} . "\n"; # prints "bar"
        print $db->{circle}->{foo} . "\n"; # prints "bar" again

    One catch is, passing the object to a function that recursively
    walks the object tree (such as *Data::Dumper* or even the built-in
    "optimize()" or "export()" methods) will result in an infinite loop.
    The other catch is, if you fetch the *key* of a circular reference
    (i.e. using the "first_key()" or "next_key()" methods), you will get
    the *target object's key*, not the ref's key. This gets even more
    interesting with the above example, where the *circle* key points to
    the base DB object, which technically doesn't have a key. So I made
    DBM::Deep return "[base]" as the key name in that special case.

CAVEATS / ISSUES / BUGS
    This section describes all the known issues with DBM::Deep. If you
    have found something that is not listed here, please send e-mail to
    jhuckaby@cpan.org.

  UNUSED SPACE RECOVERY
    One major caveat with Deep is that space occupied by existing keys
    and values is not recovered when they are deleted. Meaning if you
    keep deleting and adding new keys, your file will continuously grow.
    I am working on this, but in the meantime you can call the built-in
    "optimize()" method from time to time (perhaps in a crontab or
    something) to recover all your unused space.

        $db->optimize(); # returns true on success

    This rebuilds the ENTIRE database into a new file, then moves it on
    top of the original. The new file will have no unused space, thus it
    will take up as little disk space as possible. Please note that this
    operation can take a long time for large files, and you need enough
    disk space to temporarily hold 2 copies of your DB file. The
    temporary file is created in the same directory as the original,
    named with a ".tmp" extension, and is deleted when the operation
    completes. Oh, and if locking is enabled, the DB is automatically
    locked for the entire duration of the copy.

    WARNING: Only call optimize() on the top-level node of the database,
    and make sure there are no child references lying around. Deep keeps
    a reference counter, and if it is greater than 1, optimize() will
    abort and return undef.

  AUTOVIVIFICATION
    Unfortunately, autovivification doesn't work with tied hashes. This
    appears to be a bug in Perl's tie() system, as *Jakob Schmidt*
    encountered the very same issue with his *DWH_File* module (see
    <http://search.cpan.org/search?module=DWH_File>), and it is also
    mentioned in the BUGS section for the *MLDBM* module (see
    <http://search.cpan.org/search?module=MLDBM>). Basically, on a new
    db file, this does not work:

        $db->{foo}->{bar} = "hello";

    Since "foo" doesn't exist, you cannot add "bar" to it. You end up
    with "foo" being an empty hash. Try this instead, which works fine:

        $db->{foo} = { bar => "hello" };

    As of Perl 5.8.7, this bug still exists. I have walked very
    carefully through the execution path, and Perl indeed passes an
    empty hash to the STORE() method. Probably a bug in Perl.

  FILE CORRUPTION
    The current level of error handling in Deep is minimal. Files *are*
    checked for a 32-bit signature on open(), but other corruption in
    files can cause segmentation faults. Deep may try to seek() past the
    end of a file, or get stuck in an infinite loop depending on the
    level of corruption. File write operations are not checked for
    failure (for speed), so if you happen to run out of disk space, Deep
    will probably fail in a bad way. These things will be addressed in a
    later version of DBM::Deep.

  DB OVER NFS
    Beware of using DB files over NFS. Deep uses flock(), which works
    well on local filesystems, but will NOT protect you from file
    corruption over NFS. I've heard about setting up your NFS server
    with a locking daemon, then using lockf() to lock your files, but
    your mileage may vary there as well. From what I understand, there
    is no real way to do it. However, if you need access to the
    underlying FileHandle in Deep for using some other kind of locking
    scheme like lockf(), see the "LOW-LEVEL ACCESS" section above.

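    If you do roll your own locking, a sketch of the setup might look
    like this (the lock_file() and unlock_file() helpers are
    hypothetical stand-ins for your own scheme):

        my $db = new DBM::Deep(
            file => "/mnt/nfs/foo.db",
            volatile => 1 # hint that the file may change externally
        );

        my $fh = $db->fh();
        lock_file($fh); # your own lockf()-style locking here
        $db->{counter}++;
        unlock_file($fh);
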
  COPYING OBJECTS
    Beware of copying tied objects in Perl. Very strange things can
    happen. Instead, use Deep's "clone()" method which safely copies the
    object and returns a new, blessed, tied hash or array to the same
    level in the DB.

        my $copy = $db->clone();

  LARGE ARRAYS
    Beware of using "shift()", "unshift()" or "splice()" with large
    arrays. These functions cause every element in the array to move,
    which can be murder on DBM::Deep, as every element has to be fetched
    from disk, then stored again in a different location. This may be
    addressed in a later version.

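    If you only need to drain a large array and don't care about order,
    one workaround is to use "pop()", which removes the last element and
    therefore moves nothing:

        while ($db->length()) {
            my $elem = $db->pop();
            # process $elem here
        }
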
PERFORMANCE
    This section discusses DBM::Deep's speed and memory usage.

  SPEED
    Obviously, DBM::Deep isn't going to be as fast as some C-based DBMs,
    such as the almighty *BerkeleyDB*. But it makes up for it in
    features like true multi-level hash/array support, and
    cross-platform FTPable files. Even so, DBM::Deep is still pretty
    fast, and the speed stays fairly consistent, even with huge
    databases. Here is some test data:

        Adding 1,000,000 keys to new DB file...

        At 100 keys, avg. speed is 2,703 keys/sec
        At 200 keys, avg. speed is 2,642 keys/sec
        At 300 keys, avg. speed is 2,598 keys/sec
        At 400 keys, avg. speed is 2,578 keys/sec
        At 500 keys, avg. speed is 2,722 keys/sec
        At 600 keys, avg. speed is 2,628 keys/sec
        At 700 keys, avg. speed is 2,700 keys/sec
        At 800 keys, avg. speed is 2,607 keys/sec
        At 900 keys, avg. speed is 2,190 keys/sec
        At 1,000 keys, avg. speed is 2,570 keys/sec
        At 2,000 keys, avg. speed is 2,417 keys/sec
        At 3,000 keys, avg. speed is 1,982 keys/sec
        At 4,000 keys, avg. speed is 1,568 keys/sec
        At 5,000 keys, avg. speed is 1,533 keys/sec
        At 6,000 keys, avg. speed is 1,787 keys/sec
        At 7,000 keys, avg. speed is 1,977 keys/sec
        At 8,000 keys, avg. speed is 2,028 keys/sec
        At 9,000 keys, avg. speed is 2,077 keys/sec
        At 10,000 keys, avg. speed is 2,031 keys/sec
        At 20,000 keys, avg. speed is 1,970 keys/sec
        At 30,000 keys, avg. speed is 2,050 keys/sec
        At 40,000 keys, avg. speed is 2,073 keys/sec
        At 50,000 keys, avg. speed is 1,973 keys/sec
        At 60,000 keys, avg. speed is 1,914 keys/sec
        At 70,000 keys, avg. speed is 2,091 keys/sec
        At 80,000 keys, avg. speed is 2,103 keys/sec
        At 90,000 keys, avg. speed is 1,886 keys/sec
        At 100,000 keys, avg. speed is 1,970 keys/sec
        At 200,000 keys, avg. speed is 2,053 keys/sec
        At 300,000 keys, avg. speed is 1,697 keys/sec
        At 400,000 keys, avg. speed is 1,838 keys/sec
        At 500,000 keys, avg. speed is 1,941 keys/sec
        At 600,000 keys, avg. speed is 1,930 keys/sec
        At 700,000 keys, avg. speed is 1,735 keys/sec
        At 800,000 keys, avg. speed is 1,795 keys/sec
        At 900,000 keys, avg. speed is 1,221 keys/sec
        At 1,000,000 keys, avg. speed is 1,077 keys/sec

    This test was performed on a PowerMac G4 1 GHz running Mac OS X
    10.3.2 & Perl 5.8.1, with an 80 GB Ultra ATA/100 HD spinning at
    7200 RPM. The hash keys and values were between 6 - 12 chars in
    length. The DB file ended up at 210 MB. Run time was 12 min 3 sec.

  MEMORY USAGE
    One of the great things about DBM::Deep is that it uses very little
    memory. Even with huge databases (1,000,000+ keys) you will not see
    much increased memory on your process. Deep relies solely on the
    filesystem for storing and fetching data. Here is output from
    */usr/bin/top* before even opening a database handle:

        PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
        22831 root 11 0 2716 2716 1296 R 0.0 0.2 0:07 perl

    Basically the process is taking 2,716K of memory. And here is the
    same process after storing and fetching 1,000,000 keys:

        PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
        22831 root 14 0 2772 2772 1328 R 0.0 0.2 13:32 perl

    Notice the memory usage increased by only 56K. The test was
    performed on a 700 MHz x86 box running Linux RedHat 7.2 & Perl
    5.6.1.

DB FILE FORMAT
    In case you were interested in the underlying DB file format, it is
    documented here in this section. You don't need to know this to use
    the module, it's just included for reference.

  SIGNATURE
    DBM::Deep files always start with a 32-bit signature to identify the
    file type. This is at offset 0. The signature is "DPDB" in network
    byte order. This is checked upon each file open().

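    For illustration, the signature can be verified with a few lines of
    plain Perl:

        open(my $fh, "<", "foo.db") or die $!;
        binmode($fh);
        read($fh, my $sig, 4) == 4 or die "file too short";
        die "not a DBM::Deep file" unless $sig eq "DPDB";
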
  TAG
    The DBM::Deep file is in a *tagged format*, meaning each section of
    the file has a standard header containing the type of data, the
    length of data, and then the data itself. The type is a single
    character (1 byte), the length is a 32-bit unsigned long in network
    byte order, and the data is, well, the data.

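    As an illustration, a single tag can be read with plain Perl,
    assuming $fh is already positioned at the tag's first byte:

        read($fh, my $header, 5); # 1-byte type + 32-bit length
        my ($type, $len) = unpack("a1 N", $header);
        read($fh, my $data, $len); # the tag's payload

    Here is how the file unfolds, tag by tag:
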
  MASTER INDEX
    Immediately after the 32-bit file signature is the *Master Index*
    record. This is a standard tag header followed by 1024 bytes (in
    32-bit mode) or 2048 bytes (in 64-bit mode) of data. The type is *H*
    for hash or *A* for array, depending on how the DBM::Deep object was
    constructed.

    The index works by looking at a *MD5 Hash* of the hash key (or array
    index number). The first 8-bit char of the MD5 signature is the
    offset into the index, multiplied by 4 in 32-bit mode, or 8 in
    64-bit mode. The value of the index element is a file offset of the
    next tag for the key/element in question, which is usually a *Bucket
    List* tag (see below).

    The next tag *could* be another index, depending on how many
    keys/elements exist. See RE-INDEXING below for details.

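    The lookup math from the paragraph above, sketched in Perl:

        use Digest::MD5 qw(md5);

        my $md5  = md5("mykey");            # 16 raw bytes
        my $slot = ord(substr($md5, 0, 1)); # first char, 0..255
        my $index_offset = $slot * 4;       # 4 bytes/entry in 32-bit mode
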
  BUCKET LIST
    A *Bucket List* is a collection of 16 MD5 hashes for keys/elements,
    plus file offsets to where the actual data is stored. It starts with
    a standard tag header, with type *B*, and a data size of 320 bytes
    in 32-bit mode, or 384 bytes in 64-bit mode. Each MD5 hash is stored
    in full (16 bytes), plus the 32-bit or 64-bit file offset for the
    *Bucket* containing the actual data. When the list fills up, a
    *Re-Index* operation is performed (see RE-INDEXING below).

  BUCKET
    A *Bucket* is a tag containing a key/value pair (in hash mode), or
    an index/value pair (in array mode). It starts with a standard tag
    header with type *D* for scalar data (string, binary, etc.), or it
    could be a nested hash (type *H*) or array (type *A*). The value
    comes just after the tag header. The size reported in the tag header
    is only for the value, but then, just after the value is another
    size (32-bit unsigned long) and then the plain key itself. Since the
    value is likely to be fetched more often than the plain key, I
    figured it would be *slightly* faster to store the value first.

    If the type is *H* (hash) or *A* (array), the value is another
    *Master Index* record for the nested structure, where the process
    begins all over again.

  RE-INDEXING
    After a *Bucket List* grows to 16 records, its allocated space in
    the file is exhausted. Then, when another key/element comes in, the
    list is converted to a new index record. However, this index will
    look at the next char in the MD5 hash, and arrange new Bucket List
    pointers accordingly. This process is called *Re-Indexing*.
    Basically, a new index tag is created at the file EOF, and all 17
    (16 + new one) keys/elements are removed from the old Bucket List
    and inserted into the new index. Several new Bucket Lists are
    created in the process, as a new MD5 char from the key is being
    examined (it is unlikely that the keys will all share the same next
    char of their MD5s).

    Because of the way the *MD5* algorithm works, it is impossible to
    tell exactly when the Bucket Lists will turn into indexes, but the
    first round tends to happen right around 4,000 keys (256 index
    slots times 16 bucket entries is 4,096). You will see a *slight*
    decrease in performance here, but it picks back up pretty quickly
    (see SPEED above). Then it takes a lot more keys to exhaust the next
    level of Bucket Lists. It's right around 900,000 keys. This process
    can continue nearly indefinitely -- right up until the point the
    *MD5* signatures start colliding with each other, and this is
    EXTREMELY rare -- like winning the lottery 5 times in a row AND
    getting struck by lightning while you are walking to cash in your
    tickets. Theoretically, since *MD5* hashes are 128-bit values, you
    *could* have up to 340,282,366,921,000,000,000,000,000,000,000,000,000
    keys/elements (I believe this is 340 undecillion, but don't quote
    me).

  STORING
    When a new key/element is stored, the key (or index number) is first
    run through *Digest::MD5* to get a 128-bit signature (example, in
    hex: b05783b0773d894396d475ced9d2f4f6). Then, the *Master Index*
    record is checked for the first char of the signature (in this case
    *b*). If it does not exist, a new *Bucket List* is created for our
    key (and the next 15 future keys that happen to also have *b* as
    their first MD5 char). The entire MD5 is written to the *Bucket
    List* along with the offset of the new *Bucket* record (EOF at this
    point, unless we are replacing an existing *Bucket*), where the
    actual data will be stored.

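    The signature itself comes straight from *Digest::MD5*:

        use Digest::MD5 qw(md5_hex);

        my $key = "mykey";
        my $sig = md5_hex($key);             # 32 hex chars (128 bits)
        my $first_char = substr($sig, 0, 1); # selects the index slot
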
  FETCHING
    Fetching an existing key/element involves getting a *Digest::MD5* of
    the key (or index number), then walking along the indexes. If there
    are enough keys/elements in this DB level, there might be nested
    indexes, each linked to a particular char of the MD5. Finally, a
    *Bucket List* is pointed to, which contains up to 16 full MD5
    hashes. Each is checked for equality to the key in question. If we
    find a match, the *Bucket* tag is loaded, where the value and plain
    key are stored.

    Fetching the plain key occurs when calling the *first_key()* and
    *next_key()* methods. In this process the indexes are walked
    systematically, and each key fetched in increasing MD5 order (which
    is why it appears random). Once the *Bucket* is found, the value is
    skipped and the plain key returned instead. Note: Do not count on
    keys being fetched as if the MD5 hashes were alphabetically sorted.
    This only happens on an index level -- as soon as the *Bucket Lists*
    are hit, the keys will come out in the order they went in -- so it's
    pretty much undefined how the keys will come out -- just like Perl's
    built-in hashes.

AUTHOR
    Joseph Huckaby, jhuckaby@cpan.org

    Special thanks to Adam Sah and Rich Gaushell! You know why :-)

SEE ALSO
    perltie(1), Tie::Hash(3), Digest::MD5(3), Fcntl(3), flock(2),
    lockf(3), nfs(5), Digest::SHA256(3), Crypt::Blowfish(3),
    Compress::Zlib(3)

LICENSE
    Copyright (c) 2002-2005 Joseph Huckaby. All Rights Reserved. This is
    free software, you may use it and distribute it under the same terms
    as Perl itself.