=head1 NAME

DBM::Deep - A pure perl multi-level hash/array DBM that supports transactions

=head1 SYNOPSIS

    use DBM::Deep;
    my $db = DBM::Deep->new( "foo.db" );

    $db->{key} = 'value';
    print $db->{key};

    $db->put('key' => 'value');
    print $db->get('key');

    # true multi-level support
    $db->{my_complex} = [
        'hello', { perl => 'rules' },
        42, 99,
    ];

    $db->begin_work;

    # Do stuff here

    $db->rollback;
    $db->commit;

    tie my %db, 'DBM::Deep', 'foo.db';
    $db{key} = 'value';
    print $db{key};

    tied(%db)->put('key' => 'value');
    print tied(%db)->get('key');

=head1 DESCRIPTION

A unique flat-file database module, written in pure perl. True multi-level
hash/array support (unlike MLDBM, which is faked), a hybrid OO / tie()
interface, cross-platform FTPable files, ACID transactions, and good
performance. It can handle millions of keys and unlimited levels without
significant slow-down. Written from the ground up in pure perl -- this is NOT
a wrapper around a C-based DBM. Out-of-the-box compatibility with Unix,
Mac OS X and Windows.

=head1 VERSION DIFFERENCES

B<NOTE>: 0.99_03 has significant file format differences from prior versions.
There will be a backwards-compatibility layer in 1.00, but that is slated for
a later 0.99_x release. This version is B<NOT> backwards compatible with any
other release of DBM::Deep.

B<NOTE>: 0.99_01 and above have significant file format differences from 0.983
and before. There will be a backwards-compatibility layer in 1.00, but that is
slated for a later 0.99_x release. This version is B<NOT> backwards compatible
with 0.983 and before.

=head1 SETUP

Construction can be done OO-style (which is the recommended way), or using
Perl's tie() function. Both are examined here.

=head2 OO CONSTRUCTION

The recommended way to construct a DBM::Deep object is to use the new()
method, which gets you a blessed I<and> tied hash (or array) reference.

    my $db = DBM::Deep->new( "foo.db" );

This opens a new database handle, mapped to the file "foo.db". If this
file does not exist, it will automatically be created. DB files are
opened in "r+" (read/write) mode, and the type of object returned is a
hash, unless otherwise specified (see L<OPTIONS> below).

You can pass a number of options to the constructor to specify things like
locking, autoflush, etc. This is done by passing an inline hash (or hashref):

    my $db = DBM::Deep->new(
        file      => "foo.db",
        locking   => 1,
        autoflush => 1
    );

Notice that the filename is now specified I<inside> the hash with
the "file" parameter, as opposed to being the sole argument to the
constructor. This is required if any options are specified.
See L<OPTIONS> below for the complete list.

You can also start with an array instead of a hash. For this, you must
specify the C<type> parameter:

    my $db = DBM::Deep->new(
        file => "foo.db",
        type => DBM::Deep->TYPE_ARRAY
    );

B<Note:> Specifying the C<type> parameter only takes effect when beginning
a new DB file. If you create a DBM::Deep object with an existing file, the
C<type> will be loaded from the file header, and an error will be thrown if
the wrong type is passed in.

=head2 TIE CONSTRUCTION

Alternately, you can create a DBM::Deep handle by using Perl's built-in
tie() function. The object returned from tie() can be used to call methods,
such as lock() and unlock(). (That object can be retrieved from the tied
variable at any time using tied() - please see L<perltie> for more info.)

    my %hash;
    my $db = tie %hash, "DBM::Deep", "foo.db";

    my @array;
    my $db = tie @array, "DBM::Deep", "bar.db";

As with the OO constructor, you can replace the DB filename parameter with
a hash containing one or more options (see L<OPTIONS> just below for the
complete list).

    tie %hash, "DBM::Deep", {
        file      => "foo.db",
        locking   => 1,
        autoflush => 1
    };

=head2 OPTIONS

There are a number of options that can be passed in when constructing your
DBM::Deep objects. These apply to both the OO- and tie-based approaches.

=over

=item * file

Filename of the DB file to link the handle to. You can pass a full absolute
filesystem path, partial path, or a plain filename if the file is in the
current working directory. This is a required parameter (unless you pass
C<fh> instead; see below).

=item * fh

If you want, you can pass in a filehandle instead of a filename. This is most
useful for doing something like:

    my $db = DBM::Deep->new( { fh => \*DATA } );

You are responsible for making sure that the filehandle has been opened
appropriately for your needs. If you open it read-only and attempt to write,
an exception will be thrown. If you open it write-only or append-only, an
exception will be thrown immediately, as DBM::Deep needs to read from the
filehandle.

=item * file_offset

This is the offset within the file at which the DBM::Deep database starts.
Most of the time you will not need to set this, but it's there if you want it.

If you pass in C<fh> and do not set this, it will be set appropriately.

=item * type

This parameter specifies what type of object to create, a hash or array. Use
one of these two constants:

=over 4

=item * C<DBM::Deep-E<gt>TYPE_HASH>

=item * C<DBM::Deep-E<gt>TYPE_ARRAY>

=back

This only takes effect when beginning a new file. This is an optional
parameter, and defaults to C<DBM::Deep-E<gt>TYPE_HASH>.

=item * locking

Specifies whether locking is to be enabled. DBM::Deep uses Perl's flock()
function to lock the database in exclusive mode for writes, and shared mode
for reads. Pass any true value to enable. This affects the base DB handle
I<and any child hashes or arrays> that use the same DB file. This is an
optional parameter, and defaults to 0 (disabled). See L<LOCKING> below for
more.

=item * autoflush

Specifies whether autoflush is to be enabled on the underlying filehandle.
This obviously slows down write operations, but is required if you may have
multiple processes accessing the same DB file (also consider enabling
I<locking>). Pass any true value to enable. This is an optional parameter,
and defaults to 0 (disabled).

=item * filter_*

See L</FILTERS> below.

=back

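As a concrete illustration of C<fh> and C<file_offset>, here is a minimal
sketch of reopening an existing database through a filehandle you opened
yourself. The filename is an assumption for the example; the offset of 0
simply means the database starts at the beginning of the file.

```perl
use strict;
use warnings;
use DBM::Deep;

# Create a database the usual way first (hypothetical filename).
my $db = DBM::Deep->new( file => "foo.db" );
$db->{key} = 'value';
undef $db;

# Reopen it through our own filehandle. The handle must be opened
# read/write ("+<"), since DBM::Deep reads the header on open.
open my $fh, '+<', 'foo.db' or die "open: $!";
my $db2 = DBM::Deep->new( fh => $fh, file_offset => 0 );
print $db2->{key}, "\n";
```
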
The following parameters may be specified in the constructor the first time the
datafile is created. However, they will be stored in the header of the file and
cannot be overridden by subsequent openings of the file - the values will be set
from the values stored in the datafile's header.

=over 4

=item * num_txns

This is the maximum number of transactions that can be running at one time. The
default is two - the HEAD and one for imports. The minimum is two and the
maximum is 255. The more transactions, the larger and more quickly the datafile
grows.

See L</TRANSACTIONS> below.

=item * max_buckets

This is the number of entries that can be added before a reindexing occurs. The
larger this number is made, the larger the file gets, but the better the
performance. The default and minimum value is 16. There is no maximum, but more
than 32 isn't recommended.

=item * pack_size

This is the size of the file pointer used throughout the file. The valid values
are:

=over 4

=item * small

This uses 2-byte offsets, allowing for a maximum file size of 65 KB.

=item * medium (default)

This uses 4-byte offsets, allowing for a maximum file size of 2 GB.

=item * large

This uses 8-byte offsets, allowing for a maximum file size of 16 EB (exabytes).

=back

See L</LARGEFILE SUPPORT> for more information.

=back

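These creation-time parameters go through the same constructor as the options
above; a minimal sketch (the filename and values shown are illustrative, not
recommendations):

```perl
use strict;
use warnings;
use DBM::Deep;

# All three values below are written into the file header when
# "header.db" is first created; reopening the file later with
# different values has no effect.
my $db = DBM::Deep->new(
    file        => "header.db",
    num_txns    => 4,         # HEAD plus up to 3 more transactions
    max_buckets => 16,        # entries allowed before a reindexing
    pack_size   => 'medium',  # 4-byte offsets, 2 GB maximum file size
);

$db->{key} = 'value';
```
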
=head1 TIE INTERFACE

With DBM::Deep you can access your databases using Perl's standard hash/array
syntax. Because all DBM::Deep objects are I<tied> to hashes or arrays, you can
treat them as such. DBM::Deep will intercept all reads/writes and direct them
to the right place -- the DB file. This has nothing to do with the
L<TIE CONSTRUCTION> section above. This simply tells you how to use DBM::Deep
using regular hashes and arrays, rather than calling functions like C<get()>
and C<put()> (although those work too). It is entirely up to you how you want
to access your databases.

=head2 HASHES

You can treat any DBM::Deep object like a normal Perl hash reference. Add keys,
or even nested hashes (or arrays) using standard Perl syntax:

    my $db = DBM::Deep->new( "foo.db" );

    $db->{mykey} = "myvalue";
    $db->{myhash} = {};
    $db->{myhash}->{subkey} = "subvalue";

    print $db->{myhash}->{subkey} . "\n";

You can even step through hash keys using the normal Perl C<keys()> function:

    foreach my $key (keys %$db) {
        print "$key: " . $db->{$key} . "\n";
    }

Remember that Perl's C<keys()> function extracts I<every> key from the hash and
pushes them onto an array, all before the loop even begins. If you have an
extremely large hash, this may exhaust Perl's memory. Instead, consider using
Perl's C<each()> function, which pulls keys/values one at a time, using very
little memory:

    while (my ($key, $value) = each %$db) {
        print "$key: $value\n";
    }

Please note that when using C<each()>, you should always pass a direct
hash reference, not a lookup. Meaning, you should B<never> do this:

    # NEVER DO THIS
    while (my ($key, $value) = each %{$db->{foo}}) { # BAD

This causes an infinite loop, because for each iteration, Perl is calling
FETCH() on the $db handle, resulting in a "new" hash for foo every time, so
it effectively keeps returning the first key over and over again. Instead,
assign C<< $db->{foo} >> to a temporary variable, then pass that to C<each()>.

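The safe version of that loop, then, pulls the nested reference out once
before iterating (the key names here are made up for the example):

```perl
use strict;
use warnings;
use DBM::Deep;

my $db = DBM::Deep->new( "foo.db" );
$db->{foo} = { a => 1, b => 2 };

# Fetch the nested hash once; each() then iterates a single object
# instead of a fresh FETCH() result on every pass.
my $foo = $db->{foo};
while ( my ($key, $value) = each %$foo ) {
    print "$key: $value\n";
}
```
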
=head2 ARRAYS

As with hashes, you can treat any DBM::Deep object like a normal Perl array
reference. This includes inserting, removing and manipulating elements,
and the C<push()>, C<pop()>, C<shift()>, C<unshift()> and C<splice()> functions.
The object must have first been created using type C<DBM::Deep-E<gt>TYPE_ARRAY>,
or simply be a nested array reference inside a hash. Example:

    my $db = DBM::Deep->new(
        file => "foo-array.db",
        type => DBM::Deep->TYPE_ARRAY
    );

    $db->[0] = "foo";
    push @$db, "bar", "baz";
    unshift @$db, "bah";

    my $last_elem   = pop @$db;   # baz
    my $first_elem  = shift @$db; # bah
    my $second_elem = $db->[1];   # bar

    my $num_elements = scalar @$db;

=head1 OO INTERFACE

In addition to the I<tie()> interface, you can also use a standard OO interface
to manipulate all aspects of DBM::Deep databases. Each type of object (hash or
array) has its own methods, but both types share the following common methods:
C<put()>, C<get()>, C<exists()>, C<delete()> and C<clear()>. C<store()> and
C<fetch()> are aliases to C<put()> and C<get()>, respectively.

=over

=item * new() / clone()

These are the constructor and copy-functions.

=item * put() / store()

Stores a new hash key/value pair, or sets an array element value. Takes two
arguments, the hash key or array index, and the new value. The value can be
a scalar, hash ref or array ref. Returns true on success, false on failure.

    $db->put("foo", "bar"); # for hashes
    $db->put(1, "bar");     # for arrays

=item * get() / fetch()

Fetches the value of a hash key or array element. Takes one argument: the hash
key or array index. Returns a scalar, hash ref or array ref, depending on the
data type stored.

    my $value = $db->get("foo"); # for hashes
    my $value = $db->get(1);     # for arrays

=item * exists()

Checks if a hash key or array index exists. Takes one argument: the hash key
or array index. Returns true if it exists, false if not.

    if ($db->exists("foo")) { print "yay!\n"; } # for hashes
    if ($db->exists(1)) { print "yay!\n"; }     # for arrays

=item * delete()

Deletes one hash key/value pair or array element. Takes one argument: the hash
key or array index. Returns true on success, false if not found. For arrays,
the remaining elements located after the deleted element are NOT moved over.
The deleted element is essentially just undefined, which is exactly how Perl's
internal arrays work. Please note that the space occupied by the deleted
key/value or element is B<not> reused again -- see L<UNUSED SPACE RECOVERY>
below for details and workarounds.

    $db->delete("foo"); # for hashes
    $db->delete(1);     # for arrays

=item * clear()

Deletes B<all> hash keys or array elements. Takes no arguments. No return
value. Please note that the space occupied by the deleted keys/values or
elements is B<not> reused again -- see L<UNUSED SPACE RECOVERY> below for
details and workarounds.

    $db->clear(); # hashes or arrays

=item * lock() / unlock()

See L<LOCKING> below.

=item * optimize()

Recovers lost disk space. This is important to do, especially if you use
transactions.

=item * import() / export()

Move data into and out of the database. See L<IMPORTING/EXPORTING> below.

=back

=head2 HASHES

For hashes, DBM::Deep supports all the common methods described above, and the
following additional methods: C<first_key()> and C<next_key()>.

=over

=item * first_key()

Returns the "first" key in the hash. As with built-in Perl hashes, keys are
fetched in an undefined order (which appears random). Takes no arguments,
returns the key as a scalar value.

    my $key = $db->first_key();

=item * next_key()

Returns the "next" key in the hash, given the previous one as the sole argument.
Returns undef if there are no more keys to be fetched.

    $key = $db->next_key($key);

=back

Here are some examples of using hashes:

    my $db = DBM::Deep->new( "foo.db" );

    $db->put("foo", "bar");
    print "foo: " . $db->get("foo") . "\n";

    $db->put("baz", {}); # new child hash ref
    $db->get("baz")->put("buz", "biz");
    print "buz: " . $db->get("baz")->get("buz") . "\n";

    my $key = $db->first_key();
    while ( defined $key ) {
        print "$key: " . $db->get($key) . "\n";
        $key = $db->next_key($key);
    }

    if ($db->exists("foo")) { $db->delete("foo"); }

=head2 ARRAYS

For arrays, DBM::Deep supports all the common methods described above, and the
following additional methods: C<length()>, C<push()>, C<pop()>, C<shift()>,
C<unshift()> and C<splice()>.

=over

=item * length()

Returns the number of elements in the array. Takes no arguments.

    my $len = $db->length();

=item * push()

Adds one or more elements onto the end of the array. Accepts scalars, hash
refs or array refs. No return value.

    $db->push("foo", "bar", {});

=item * pop()

Fetches the last element in the array, and deletes it. Takes no arguments.
Returns the element value, or undef if the array is empty.

    my $elem = $db->pop();

=item * shift()

Fetches the first element in the array, deletes it, then shifts all the
remaining elements over to take up the space. Returns the element value. This
method is not recommended with large arrays -- see L<LARGE ARRAYS> below for
details.

    my $elem = $db->shift();

=item * unshift()

Inserts one or more elements onto the beginning of the array, shifting all
existing elements over to make room. Accepts scalars, hash refs or array refs.
No return value. This method is not recommended with large arrays -- see
L<LARGE ARRAYS> below for details.

    $db->unshift("foo", "bar", {});

=item * splice()

Performs exactly like Perl's built-in function of the same name. See
C<perldoc -f splice> for usage -- it is too complicated to document here. This
method is not recommended with large arrays -- see L<LARGE ARRAYS> below for
details.

=back

Here are some examples of using arrays:

    my $db = DBM::Deep->new(
        file => "foo.db",
        type => DBM::Deep->TYPE_ARRAY
    );

    $db->push("bar", "baz");
    $db->unshift("foo");
    $db->put(3, "buz");

    my $len = $db->length();
    print "length: $len\n"; # 4

    for (my $k = 0; $k < $len; $k++) {
        print "$k: " . $db->get($k) . "\n";
    }

    $db->splice(1, 2, "biz", "baf");

    while (my $elem = shift @$db) {
        print "shifted: $elem\n";
    }

=head1 LOCKING

Enable automatic file locking by passing a true value to the C<locking>
parameter when constructing your DBM::Deep object (see L<SETUP> above).

    my $db = DBM::Deep->new(
        file    => "foo.db",
        locking => 1
    );

This causes DBM::Deep to C<flock()> the underlying filehandle with exclusive
mode for writes, and shared mode for reads. This is required if you have
multiple processes accessing the same database file, to avoid file corruption.
Please note that C<flock()> does NOT work for files over NFS. See L<DB OVER
NFS> below for more.

=head2 EXPLICIT LOCKING

You can explicitly lock a database, so it remains locked for multiple
actions. This is done by calling the C<lock()> method, and passing an
optional lock mode argument (defaults to exclusive mode). This is particularly
useful for things like counters, where the current value needs to be fetched,
then incremented, then stored again.

    $db->lock();
    my $counter = $db->get("counter");
    $counter++;
    $db->put("counter", $counter);
    $db->unlock();

    # or...

    $db->lock();
    $db->{counter}++;
    $db->unlock();

You can pass C<lock()> an optional argument, which specifies which mode to use
(exclusive or shared). Use one of these two constants:
C<DBM::Deep-E<gt>LOCK_EX> or C<DBM::Deep-E<gt>LOCK_SH>. These are passed
directly to C<flock()>, and are the same as the constants defined in Perl's
L<Fcntl> module.

    $db->lock( $db->LOCK_SH );
    # something here
    $db->unlock();

=head1 IMPORTING/EXPORTING

You can import existing complex structures by calling the C<import()> method,
and export an entire database into an in-memory structure using the C<export()>
method. Both are examined here.

=head2 IMPORTING

Say you have an existing hash with nested hashes/arrays inside it. Instead of
walking the structure and adding keys/elements to the database as you go,
simply pass a reference to the C<import()> method. This recursively adds
everything to an existing DBM::Deep object for you. Here is an example:

    my $struct = {
        key1   => "value1",
        key2   => "value2",
        array1 => [ "elem0", "elem1", "elem2" ],
        hash1  => {
            subkey1 => "subvalue1",
            subkey2 => "subvalue2"
        }
    };

    my $db = DBM::Deep->new( "foo.db" );
    $db->import( $struct );

    print $db->{key1} . "\n"; # prints "value1"

This recursively imports the entire C<$struct> object into C<$db>, including
all nested hashes and arrays. If the DBM::Deep object contains existing data,
keys are merged with the existing ones, replacing if they already exist.
The C<import()> method can be called on any database level (not just the base
level), and works with both hash and array DB types.

B<Note:> Make sure your existing structure has no circular references in it.
These will cause an infinite loop when importing. There are plans to fix this
in a later release.

=head2 EXPORTING

Calling the C<export()> method on an existing DBM::Deep object will return
a reference to a new in-memory copy of the database. The export is done
recursively, so all nested hashes/arrays are exported to standard Perl
objects. Here is an example:

    my $db = DBM::Deep->new( "foo.db" );

    $db->{key1} = "value1";
    $db->{key2} = "value2";
    $db->{hash1} = {};
    $db->{hash1}->{subkey1} = "subvalue1";
    $db->{hash1}->{subkey2} = "subvalue2";

    my $struct = $db->export();

    print $struct->{key1} . "\n"; # prints "value1"

This makes a complete copy of the database in memory, and returns a reference
to it. The C<export()> method can be called on any database level (not just
the base level), and works with both hash and array DB types. Be careful with
large databases -- you can store a lot more data in a DBM::Deep object than
will fit in an in-memory Perl structure.

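Since C<export()> works at any level, you can also copy just one subtree out
of a large database instead of the whole thing. A sketch (the key names are
made up for the example):

```perl
use strict;
use warnings;
use DBM::Deep;

my $db = DBM::Deep->new( "foo.db" );
$db->{hash1} = { subkey1 => "subvalue1", subkey2 => "subvalue2" };

# Copy only the "hash1" subtree into memory; the result is a plain
# Perl hashref, fully detached from the database file.
my $hash1_copy = $db->{hash1}->export();
print $hash1_copy->{subkey1}, "\n";
```
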
B<Note:> Make sure your database has no circular references in it.
These will cause an infinite loop when exporting. There are plans to fix this
in a later release.

=head1 FILTERS

DBM::Deep has a number of hooks where you can specify your own Perl function
to perform filtering on incoming or outgoing data. This is a perfect
way to extend the engine, and implement things like real-time compression or
encryption. Filtering applies to the base DB level, and all child hashes /
arrays. Filter hooks can be specified when your DBM::Deep object is first
constructed, or by calling the C<set_filter()> method at any time. There are
four available filter hooks, described below:

=over

=item * filter_store_key

This filter is called whenever a hash key is stored. It
is passed the incoming key, and expected to return a transformed key.

=item * filter_store_value

This filter is called whenever a hash key or array element is stored. It
is passed the incoming value, and expected to return a transformed value.

=item * filter_fetch_key

This filter is called whenever a hash key is fetched (i.e. via
C<first_key()> or C<next_key()>). It is passed the transformed key,
and expected to return the plain key.

=item * filter_fetch_value

This filter is called whenever a hash key or array element is fetched.
It is passed the transformed value, and expected to return the plain value.

=back

Here are the two ways to set up a filter hook:

    my $db = DBM::Deep->new(
        file => "foo.db",
        filter_store_value => \&my_filter_store,
        filter_fetch_value => \&my_filter_fetch
    );

    # or...

    $db->set_filter( "filter_store_value", \&my_filter_store );
    $db->set_filter( "filter_fetch_value", \&my_filter_fetch );

Your filter function will be called only when dealing with SCALAR keys or
values. When nested hashes and arrays are being stored/fetched, filtering
is bypassed. Filters are called as static functions, passed a single SCALAR
argument, and expected to return a single SCALAR value. If you want to
remove a filter, set the function reference to C<undef>:

    $db->set_filter( "filter_store_value", undef );

=head2 REAL-TIME ENCRYPTION EXAMPLE

Here is a working example that uses the I<Crypt::Blowfish> module to
do real-time encryption / decryption of keys & values with DBM::Deep Filters.
Please visit L<http://search.cpan.org/search?module=Crypt::Blowfish> for more
on I<Crypt::Blowfish>. You'll also need the I<Crypt::CBC> module.

    use DBM::Deep;
    use Crypt::Blowfish;
    use Crypt::CBC;

    my $cipher = Crypt::CBC->new({
        'key'            => 'my secret key',
        'cipher'         => 'Blowfish',
        'iv'             => '$KJh#(}q',
        'regenerate_key' => 0,
        'padding'        => 'space',
        'prepend_iv'     => 0
    });

    my $db = DBM::Deep->new(
        file => "foo-encrypt.db",
        filter_store_key   => \&my_encrypt,
        filter_store_value => \&my_encrypt,
        filter_fetch_key   => \&my_decrypt,
        filter_fetch_value => \&my_decrypt,
    );

    $db->{key1} = "value1";
    $db->{key2} = "value2";
    print "key1: " . $db->{key1} . "\n";
    print "key2: " . $db->{key2} . "\n";

    undef $db;
    exit;

    sub my_encrypt {
        return $cipher->encrypt( $_[0] );
    }
    sub my_decrypt {
        return $cipher->decrypt( $_[0] );
    }

=head2 REAL-TIME COMPRESSION EXAMPLE

Here is a working example that uses the I<Compress::Zlib> module to do real-time
compression / decompression of keys & values with DBM::Deep Filters.
Please visit L<http://search.cpan.org/search?module=Compress::Zlib> for
more on I<Compress::Zlib>.

    use DBM::Deep;
    use Compress::Zlib;

    my $db = DBM::Deep->new(
        file => "foo-compress.db",
        filter_store_key   => \&my_compress,
        filter_store_value => \&my_compress,
        filter_fetch_key   => \&my_decompress,
        filter_fetch_value => \&my_decompress,
    );

    $db->{key1} = "value1";
    $db->{key2} = "value2";
    print "key1: " . $db->{key1} . "\n";
    print "key2: " . $db->{key2} . "\n";

    undef $db;
    exit;

    sub my_compress {
        return Compress::Zlib::memGzip( $_[0] );
    }
    sub my_decompress {
        return Compress::Zlib::memGunzip( $_[0] );
    }

B<Note:> Filtering of keys only applies to hashes. Array "keys" are
actually numerical index numbers, and are not filtered.

=head1 ERROR HANDLING

Most DBM::Deep methods return a true value for success, and call die() on
failure. You can wrap calls in an eval block to catch the die.

    my $db = DBM::Deep->new( "foo.db" ); # create hash
    eval { $db->push("foo"); };          # ILLEGAL -- push is an array-only call

    print $@; # prints error message

=head1 LARGEFILE SUPPORT

If you have a 64-bit system, and your Perl is compiled with both LARGEFILE
and 64-bit support, you I<may> be able to create databases larger than 2 GB.
DBM::Deep by default uses 32-bit file offset tags, but these can be changed
by specifying the C<pack_size> parameter when constructing the file.

    DBM::Deep->new(
        file      => $filename,
        pack_size => 'large',
    );

This tells DBM::Deep to pack all file offsets with 8-byte (64-bit) quad words
instead of 32-bit longs. After setting these values your DB files have a
theoretical maximum size of 16 EB (exabytes).

You can also use C<pack_size =E<gt> 'small'> in order to use 16-bit file
offsets.

B<Note:> Changing these values will B<NOT> work for existing database files.
Only change this for new files. Once the value has been set, it is stored in
the file's header and cannot be changed for the life of the file. These
parameters are per-file, meaning you can access 32-bit and 64-bit files, as
you choose.

B<Note:> We have not personally tested files larger than 2 GB -- all our
systems have only a 32-bit Perl. However, we have received user reports that
this does indeed work.

=head1 LOW-LEVEL ACCESS

If you require low-level access to the underlying filehandle that DBM::Deep uses,
you can call the C<_fh()> method, which returns the handle:

    my $fh = $db->_fh();

This method can be called on the root level of the database, or any child
hashes or arrays. All levels share a I<root> structure, which contains things
like the filehandle, a reference counter, and all the options specified
when you created the object. You can get access to this file object by
calling the C<_storage()> method.

    my $file_obj = $db->_storage();

This is useful for changing options after the object has already been created,
such as enabling/disabling locking. You can also store your own temporary user
data in this structure (be wary of name collisions), which is then accessible
from any child hash or array.

=head1 CUSTOM DIGEST ALGORITHM

DBM::Deep by default uses the I<Message Digest 5> (MD5) algorithm for hashing
keys. However, you can override this and use another algorithm (such as
SHA-256), or even write your own. But please note that DBM::Deep currently
expects zero collisions, so your algorithm has to be I<perfect>, so to speak.
Collision detection may be introduced in a later version.

You can specify a custom digest algorithm by passing it into the parameter
list for new(): pass a reference to a subroutine as the 'digest' parameter,
and the length of the algorithm's hashes (in bytes) as the 'hash_size'
parameter. Here is a working example that uses a 256-bit hash from the
I<Digest::SHA256> module. Please see
L<http://search.cpan.org/search?module=Digest::SHA256> for more information.

    use DBM::Deep;
    use Digest::SHA256;

    my $context = Digest::SHA256::new(256);

    my $db = DBM::Deep->new(
        filename  => "foo-sha.db",
        digest    => \&my_digest,
        hash_size => 32,
    );

    $db->{key1} = "value1";
    $db->{key2} = "value2";
    print "key1: " . $db->{key1} . "\n";
    print "key2: " . $db->{key2} . "\n";

    undef $db;
    exit;

    sub my_digest {
        return substr( $context->hash($_[0]), 0, 32 );
    }

B<Note:> Your returned digest strings must be B<EXACTLY> the number
of bytes you specify in the hash_size parameter (in this case 32).

B<Note:> If you do choose to use a custom digest algorithm, you must set it
every time you access this file. Otherwise, the default (MD5) will be used.

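The exact-length requirement above is easy to check against any digest
function before you wire it into new(). For instance, a raw MD5 digest
(from the core C<Digest::MD5> module) is always 16 bytes, so it would match
a hash_size of 16. This sketch only demonstrates the length check and does
not touch DBM::Deep:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);   # core module; md5() returns raw bytes

my $hash_size = 16;                 # what you would pass to new()
my $digest    = md5("some key");    # raw 16-byte MD5 digest

# A custom digest sub must always return exactly $hash_size bytes.
die "digest length mismatch" unless length($digest) == $hash_size;
print length($digest), "\n";        # prints 16
```
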
=head1 CIRCULAR REFERENCES

B<NOTE>: DBM::Deep 0.99_03 has turned off circular references pending
evaluation of some edge cases. We hope to be able to re-enable circular
references in a future version prior to 1.00.

DBM::Deep has B<experimental> support for circular references. This means you
can have a nested hash key or array element that points to a parent object.
This relationship is stored in the DB file, and is preserved between sessions.
Here is an example:

    my $db = DBM::Deep->new( "foo.db" );

    $db->{foo} = "bar";
    $db->{circle} = $db; # ref to self

    print $db->{foo} . "\n";           # prints "bar"
    print $db->{circle}->{foo} . "\n"; # prints "bar" again

B<Note>: Passing the object to a function that recursively walks the
object tree (such as I<Data::Dumper> or even the built-in C<optimize()> or
C<export()> methods) will result in an infinite loop. This will be fixed in
a future release.

=head1 TRANSACTIONS

New in 0.99_01 are ACID transactions. Every DBM::Deep object is completely
transaction-ready - it is not an option you have to turn on. You do have to
specify how many transactions may run simultaneously (q.v. L</num_txns>).

Three new methods have been added to support them. They are:

=over 4

=item * begin_work()

This starts a transaction.

=item * commit()

This applies the changes done within the transaction to the mainline and ends
the transaction.

=item * rollback()

This discards the changes done within the transaction to the mainline and ends
the transaction.

=back

Transactions in DBM::Deep are done using the MVCC method, the same method used
by the InnoDB MySQL engine.

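A minimal sketch of the three methods above in use, with C<rollback()> on
failure and C<commit()> on success (the C<num_txns> value here is an
illustrative assumption; see L</num_txns> for what your application needs):

```perl
use strict;
use warnings;
use DBM::Deep;

my $db = DBM::Deep->new(
    filename => "txn.db",
    num_txns => 2,          # allow one concurrent transaction
);

$db->begin_work;            # start the transaction

eval {
    $db->{balance} = ( $db->{balance} || 0 ) - 100;
    die "overdrawn\n" if $db->{balance} < 0;
    $db->commit;            # apply changes to the mainline
};
if ($@) {
    $db->rollback;          # discard changes on any error
}
```
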
=head1 PERFORMANCE

Because DBM::Deep is a concurrent datastore, every change is flushed to disk
immediately and every read goes to disk. This means that DBM::Deep functions
at the speed of disk (generally 10-20ms) rather than the speed of RAM
(generally 50-70ns) - several orders of magnitude slower than the comparable
in-memory data structure in Perl.

There are several techniques you can use to speed up how DBM::Deep functions.

=over 4

=item * Put it on a ramdisk

The easiest and quickest way to make DBM::Deep run faster is to create
a ramdisk and locate the DBM::Deep file there. Doing this as an option may
become a feature of DBM::Deep, assuming there is a good ramdisk wrapper on CPAN.

=item * Work at the tightest level possible

It is much faster to assign the level of your db that you are working with to
an intermediate variable than to re-look it up every time. Thus

    # BAD
    while ( my ($k, $v) = each %{$db->{foo}{bar}{baz}} ) {
        ...
    }

    # GOOD
    my $x = $db->{foo}{bar}{baz};
    while ( my ($k, $v) = each %$x ) {
        ...
    }

=item * Make your file as tight as possible

If you know that you are not going to use more than 65K in your database,
consider using the C<pack_size =E<gt> 'small'> option. This will instruct
DBM::Deep to use 16-bit addresses, meaning that the seek times will be less.
The same goes for the number of transactions. C<num_txns> defaults to 16. If
you can set that to 1 or 2, that will reduce the file size considerably, thus
reducing seek times.

=back

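The last technique above might be combined in a constructor like this (a
sketch; recall from earlier in this document that both parameters are fixed
for the life of the file once it is created):

```perl
use strict;
use warnings;
use DBM::Deep;

# A deliberately small, single-writer file: 16-bit addressing and
# only one transaction slot.
my $db = DBM::Deep->new(
    filename  => "small.db",
    pack_size => 'small',   # 16-bit offsets; keeps seeks short
    num_txns  => 1,         # smallest possible transaction overhead
);

$db->{greeting} = "hello";
```
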
=head1 CAVEATS / ISSUES / BUGS

This section describes all the known issues with DBM::Deep. If you have found
something that is not listed here, please send e-mail to L<jhuckaby@cpan.org>.

=head2 REFERENCES

(The reasons given assume a high level of Perl understanding, specifically of
references. You can safely skip this section.)

Currently, the only references supported are HASH and ARRAY. The other
reference types (SCALAR, CODE, GLOB, and REF) cannot be supported for various
reasons.

=over 4

=item * GLOB

These are things like filehandles and other sockets. They can't be supported
because it's completely unclear how DBM::Deep should serialize them.

=item * SCALAR / REF

The discussion here refers to the following type of example:

    my $x = 25;
    $db->{key1} = \$x;

    $x = 50;

    # In some other process ...

    my $val = ${ $db->{key1} };

    is( $val, 50, "What actually gets stored in the DB file?" );

The problem is one of synchronization. When the variable being referred to
changes value, the reference isn't notified. This means that the new value
won't be stored in the datafile for other processes to read. There is no
TIEREF.

It is theoretically possible to store references to values already within a
DBM::Deep object because everything already is synchronized, but the change to
the internals would be quite large. Specifically, DBM::Deep would have to tie
every single value that is stored. This would bloat the RAM footprint of
DBM::Deep at least twofold (if not more) and be a significant performance
drain, all to support a feature that has never been requested.

=item * CODE

L<Data::Dump::Streamer> provides a mechanism for serializing coderefs,
including saving off all closure state. However, just as for SCALAR and REF,
that closure state may change without notifying the DBM::Deep object storing
the reference.

=back

=head2 FILE CORRUPTION

The current level of error handling in DBM::Deep is minimal. Files I<are>
checked for a 32-bit signature when opened, but other corruption in files can
cause segmentation faults. DBM::Deep may try to seek() past the end of a file,
or get stuck in an infinite loop, depending on the level of corruption. File
write operations are not checked for failure (for speed), so if you happen to
run out of disk space, DBM::Deep will probably fail in a bad way. These things
will be addressed in a later version of DBM::Deep.

=head2 DB OVER NFS

Beware of using DBM::Deep files over NFS. DBM::Deep uses flock(), which works
well on local filesystems, but will NOT protect you from file corruption over
NFS. We've heard about setting up your NFS server with a locking daemon, then
using lockf() to lock your files, but your mileage may vary there as well.
From what we understand, there is no real way to do it. However, if you need
access to the underlying filehandle in DBM::Deep for using some other kind of
locking scheme like lockf(), see the L</LOW-LEVEL ACCESS> section above.

=head2 COPYING OBJECTS

Beware of copying tied objects in Perl. Very strange things can happen.
Instead, use DBM::Deep's C<clone()> method, which safely copies the object and
returns a new, blessed, tied hash or array to the same level in the DB.

    my $copy = $db->clone();

B<Note>: Since clone() here is cloning the object, not the database location,
any modifications to either $db or $copy will be visible to both.

=head2 LARGE ARRAYS

Beware of using C<shift()>, C<unshift()> or C<splice()> with large arrays.
These functions cause every element in the array to move, which can be murder
on DBM::Deep, as every element has to be fetched from disk, then stored again
in a different location. This will be addressed in a future version.

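Where your access pattern allows it, prefer operations that touch only the
end of the array, since those do not move any existing elements. A hedged
sketch of the difference:

```perl
use strict;
use warnings;
use DBM::Deep;

my $db = DBM::Deep->new( "queue.db" );
$db->{queue} = [];

# BAD: unshift() moves every existing element one slot down,
# costing a disk read and write per element.
# unshift @{ $db->{queue} }, "urgent-job";

# GOOD: push()/pop() only touch the end of the array.
push @{ $db->{queue} }, "job-$_" for 1 .. 3;
my $last = pop @{ $db->{queue} };
```
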
=head2 WRITEONLY FILES

If you pass in a filehandle to new(), you may have opened it in either a
read-only or write-only mode. STORE will verify that the filehandle is
writable. However, there doesn't seem to be a good way to determine if a
filehandle is readable. And if the filehandle isn't readable, it's not clear
what will happen. So, don't do that.

=head1 CODE COVERAGE

B<Devel::Cover> is used to test the code coverage of the tests. Below is the
B<Devel::Cover> report on this distribution's test suite.

  ---------------------------- ------ ------ ------ ------ ------ ------ ------
  File                           stmt   bran   cond    sub    pod   time  total
  ---------------------------- ------ ------ ------ ------ ------ ------ ------
  blib/lib/DBM/Deep.pm           96.8   87.9   90.5  100.0   89.5    4.5   95.2
  blib/lib/DBM/Deep/Array.pm    100.0   94.3  100.0  100.0  100.0    4.9   98.7
  blib/lib/DBM/Deep/Engine.pm    96.9   85.2   79.7  100.0    0.0   58.2   90.3
  blib/lib/DBM/Deep/File.pm      99.0   88.9   77.8  100.0    0.0   30.0   90.3
  blib/lib/DBM/Deep/Hash.pm     100.0  100.0  100.0  100.0  100.0    2.4  100.0
  Total                          97.6   87.9   84.0  100.0   32.1  100.0   92.8
  ---------------------------- ------ ------ ------ ------ ------ ------ ------

=head1 MORE INFORMATION

Check out the DBM::Deep Google Group at L<http://groups.google.com/group/DBM-Deep>
or send email to L<DBM-Deep@googlegroups.com>. You can also visit #dbm-deep on
irc.perl.org.

The source code repository is at L<http://svn.perl.org/modules/DBM-Deep>.

=head1 MAINTAINER(S)

Rob Kinyon, L<rkinyon@cpan.org>

Originally written by Joseph Huckaby, L<jhuckaby@cpan.org>

Special thanks to Adam Sah and Rich Gaushell! You know why :-)

Additional thanks go out to Stonehenge, who sponsored the 1.00 release.

=head1 SEE ALSO

perltie(1), Tie::Hash(3), Digest::MD5(3), Fcntl(3), flock(2), lockf(3), nfs(5),
Digest::SHA256(3), Crypt::Blowfish(3), Compress::Zlib(3)

=head1 LICENSE

Copyright (c) 2007 Rob Kinyon. All Rights Reserved.
This is free software, you may use it and distribute it under the
same terms as Perl itself.

=cut