Commit | Line | Data |
2120a181 |
1 | =head1 NAME |
2 | |
3 | DBM::Deep - A pure perl multi-level hash/array DBM that supports transactions |
4 | |
5 | =head1 SYNOPSIS |
6 | |
7 | use DBM::Deep; |
8 | my $db = DBM::Deep->new( "foo.db" ); |
9 | |
10 | $db->{key} = 'value'; |
11 | print $db->{key}; |
12 | |
13 | $db->put('key' => 'value'); |
14 | print $db->get('key'); |
15 | |
16 | # true multi-level support |
17 | $db->{my_complex} = [ |
18 | 'hello', { perl => 'rules' }, |
19 | 42, 99, |
20 | ]; |
21 | |
22 | $db->begin_work; |
23 | |
24 | # Do stuff here |
25 | |
26 | $db->rollback;   # discard the changes made since begin_work ... |
27 | $db->commit;     # ... or apply them instead |
28 | |
29 | tie my %db, 'DBM::Deep', 'foo.db'; |
30 | $db{key} = 'value'; |
31 | print $db{key}; |
32 | |
33 | tied(%db)->put('key' => 'value'); |
34 | print tied(%db)->get('key'); |
35 | |
36 | =head1 DESCRIPTION |
37 | |
38 | A unique flat-file database module, written in pure perl. True multi-level |
39 | hash/array support (unlike MLDBM, which is faked), hybrid OO / tie() |
40 | interface, cross-platform FTPable files, ACID transactions, and is quite fast. |
41 | Can handle millions of keys and unlimited levels without significant |
42 | slow-down. Written from the ground-up in pure perl -- this is NOT a wrapper |
43 | around a C-based DBM. Out-of-the-box compatibility with Unix, Mac OS X and |
44 | Windows. |
45 | |
46 | =head1 VERSION DIFFERENCES |
47 | |
807f63a7 |
48 | B<NOTE>: 1.0000 has significant file format differences from prior versions. |
49 | There is a backwards-compatibility layer at C<utils/upgrade_db.pl>. Files |
50 | created by 1.0000 or higher are B<NOT> compatible with scripts using prior |
51 | versions. |
2120a181 |
52 | |
53 | =head1 SETUP |
54 | |
55 | Construction can be done OO-style (which is the recommended way), or using |
56 | Perl's tie() function. Both are examined here. |
57 | |
58 | =head2 OO Construction |
59 | |
60 | The recommended way to construct a DBM::Deep object is to use the new() |
61 | method, which gets you a blessed I<and> tied hash (or array) reference. |
62 | |
63 | my $db = DBM::Deep->new( "foo.db" ); |
64 | |
65 | This opens a new database handle, mapped to the file "foo.db". If this |
66 | file does not exist, it will automatically be created. DB files are |
67 | opened in "r+" (read/write) mode, and the type of object returned is a |
68 | hash, unless otherwise specified (see L<OPTIONS> below). |
69 | |
70 | You can pass a number of options to the constructor to specify things like |
71 | locking, autoflush, etc. This is done by passing an inline hash (or hashref): |
72 | |
73 | my $db = DBM::Deep->new( |
74 | file => "foo.db", |
75 | locking => 1, |
76 | autoflush => 1 |
77 | ); |
78 | |
79 | Notice that the filename is now specified I<inside> the hash with |
80 | the "file" parameter, as opposed to being the sole argument to the |
81 | constructor. This is required if any options are specified. |
82 | See L<OPTIONS> below for the complete list. |
83 | |
84 | You can also start with an array instead of a hash. For this, you must |
85 | specify the C<type> parameter: |
86 | |
87 | my $db = DBM::Deep->new( |
88 | file => "foo.db", |
89 | type => DBM::Deep->TYPE_ARRAY |
90 | ); |
91 | |
92 | B<Note:> Specifying the C<type> parameter only takes effect when beginning |
93 | a new DB file. If you create a DBM::Deep object with an existing file, the |
94 | C<type> will be loaded from the file header, and an error will be thrown if |
95 | the wrong type is passed in. |
96 | |
97 | =head2 Tie Construction |
98 | |
99 | Alternately, you can create a DBM::Deep handle by using Perl's built-in |
100 | tie() function. The object returned from tie() can be used to call methods, |
101 | such as lock() and unlock(). (That object can be retrieved from the tied |
b8370759 |
102 | variable at any time using tied() - please see L<perltie> for more info.) |
2120a181 |
103 | |
104 | my %hash; |
105 | my $db = tie %hash, "DBM::Deep", "foo.db"; |
106 | |
107 | my @array; |
108 | my $db = tie @array, "DBM::Deep", "bar.db"; |
109 | |
110 | As with the OO constructor, you can replace the DB filename parameter with |
111 | a hash containing one or more options (see L<OPTIONS> just below for the |
112 | complete list). |
113 | |
114 | tie %hash, "DBM::Deep", { |
115 | file => "foo.db", |
116 | locking => 1, |
117 | autoflush => 1 |
118 | }; |
119 | |
120 | =head2 Options |
121 | |
122 | There are a number of options that can be passed in when constructing your |
123 | DBM::Deep objects. These apply to both the OO- and tie- based approaches. |
124 | |
125 | =over |
126 | |
127 | =item * file |
128 | |
129 | Filename of the DB file to link the handle to. You can pass a full absolute |
130 | filesystem path, partial path, or a plain filename if the file is in the |
131 | current working directory. This is a required parameter (though q.v. fh). |
132 | |
133 | =item * fh |
134 | |
135 | If you want, you can pass in the fh instead of the file. This is most useful for doing |
136 | something like: |
137 | |
138 | my $db = DBM::Deep->new( { fh => \*DATA } ); |
139 | |
140 | You are responsible for making sure that the fh has been opened appropriately for your |
141 | needs. If you open it read-only and attempt to write, an exception will be thrown. If you |
142 | open it write-only or append-only, an exception will be thrown immediately as DBM::Deep |
143 | needs to read from the fh. |
144 | |
145 | =item * file_offset |
146 | |
147 | This is the offset within the file that the DBM::Deep db starts. Most of the time, you will |
148 | not need to set this. However, it's there if you want it. |
149 | |
150 | If you pass in fh and do not set this, it will be set appropriately. |
151 | |
152 | =item * type |
153 | |
154 | This parameter specifies what type of object to create, a hash or array. Use |
155 | one of these two constants: |
156 | |
157 | =over 4 |
158 | |
159 | =item * C<DBM::Deep-E<gt>TYPE_HASH> |
160 | |
161 | =item * C<DBM::Deep-E<gt>TYPE_ARRAY> |
162 | |
163 | =back |
164 | |
165 | This only takes effect when beginning a new file. This is an optional |
166 | parameter, and defaults to C<DBM::Deep-E<gt>TYPE_HASH>. |
167 | |
168 | =item * locking |
169 | |
170 | Specifies whether locking is to be enabled. DBM::Deep uses Perl's flock() |
171 | function to lock the database in exclusive mode for writes, and shared mode |
172 | for reads. Pass any true value to enable. This affects the base DB handle |
173 | I<and any child hashes or arrays> that use the same DB file. This is an |
174 | optional parameter, and defaults to 1 (enabled). See L<LOCKING> below for |
175 | more. |
176 | |
177 | =item * autoflush |
178 | |
179 | Specifies whether autoflush is to be enabled on the underlying filehandle. |
180 | This obviously slows down write operations, but is required if you may have |
181 | multiple processes accessing the same DB file (also consider enabling I<locking>). |
182 | Pass any true value to enable. This is an optional parameter, and defaults to 1 |
183 | (enabled). |
184 | |
185 | =item * filter_* |
186 | |
187 | See L</FILTERS> below. |
188 | |
189 | =back |
190 | |
191 | The following parameters may be specified in the constructor the first time the |
192 | datafile is created. However, they will be stored in the header of the file and |
193 | cannot be overridden by subsequent openings of the file - the values will be set |
194 | from the values stored in the datafile's header. |
195 | |
196 | =over 4 |
197 | |
198 | =item * num_txns |
199 | |
e9b0b5f0 |
200 | This is the number of transactions that can be running at one time. The |
201 | default is one - the HEAD. The minimum is one and the maximum is 255. The more |
202 | transactions, the larger and quicker the datafile grows. |
2120a181 |
203 | |
204 | See L</TRANSACTIONS> below. |
205 | |
206 | =item * max_buckets |
207 | |
208 | This is the number of entries that can be added before a reindexing. The larger |
209 | this number is made, the larger a file gets, but the better performance you will |
e9b0b5f0 |
210 | have. The default and minimum number this can be is 16. The maximum is 256, but |
211 | more than 64 isn't recommended. |
212 | |
213 | =item * data_sector_size |
214 | |
215 | This is the size in bytes of a given data sector. Data sectors will chain, so |
216 | a value of any size can be stored. However, chaining is expensive in terms of |
217 | time. Setting this value to something close to the expected common length of |
218 | your scalars will improve your performance. If it is too small, your file will |
219 | have a lot of chaining. If it is too large, your file will have a lot of dead |
220 | space in it. |
221 | |
222 | The default for this is 64 bytes. The minimum value is 32 and the maximum is |
223 | 256 bytes. |
224 | |
225 | B<Note:> There are between 6 and 10 bytes taken up in each data sector for |
226 | bookkeeping. (It's 4 + the number of bytes in your L</pack_size>.) This is |
227 | included within the data_sector_size, thus the effective value is 6-10 bytes |
228 | less than what you specified. |
2120a181 |
229 | |
230 | =item * pack_size |
231 | |
232 | This is the size of the file pointer used throughout the file. The valid values |
233 | are: |
234 | |
235 | =over 4 |
236 | |
237 | =item * small |
238 | |
e9b0b5f0 |
239 | This uses 2-byte offsets, allowing for a maximum file size of 65 KB. |
2120a181 |
240 | |
241 | =item * medium (default) |
242 | |
e9b0b5f0 |
243 | This uses 4-byte offsets, allowing for a maximum file size of 4 GB. |
2120a181 |
244 | |
245 | =item * large |
246 | |
e9b0b5f0 |
247 | This uses 8-byte offsets, allowing for a maximum file size of 16 EB |
248 | (exabytes). This can only be enabled if your Perl is compiled for 64-bit. |
2120a181 |
249 | |
250 | =back |
251 | |
252 | See L</LARGEFILE SUPPORT> for more information. |
253 | |
254 | =back |
255 | |
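The maximum file sizes quoted for each C<pack_size> follow from the offset
width: an offset of I<n> bytes can address 2**(8*n) bytes. The following is a
minimal sketch of that arithmetic only. It does not touch DBM::Deep itself, and
the C<%offset_bytes> table and C<max_file_size()> helper are illustrative names,
not part of the module.

```perl
use strict;
use warnings;
use Math::BigInt;

# Offset widths in bytes for each pack_size name, as listed above.
# (Illustrative table; not a DBM::Deep data structure.)
my %offset_bytes = ( small => 2, medium => 4, large => 8 );

# Number of bytes addressable with an offset of the given width.
# Math::BigInt keeps 2**64 exact instead of falling back to a float.
sub max_file_size {
    my ($pack_size) = @_;
    my $bytes = $offset_bytes{$pack_size}
        or die "unknown pack_size '$pack_size'";
    return Math::BigInt->new(2)->bpow( 8 * $bytes );
}

for my $name (qw( small medium large )) {
    printf "%-6s %s bytes\n", $name, max_file_size($name)->bstr;
}
```

This is only a way to sanity-check which C<pack_size> a dataset needs before the
file is created, since the value cannot be changed afterwards.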
256 | =head1 TIE INTERFACE |
257 | |
258 | With DBM::Deep you can access your databases using Perl's standard hash/array |
259 | syntax. Because all DBM::Deep objects are I<tied> to hashes or arrays, you can |
260 | treat them as such. DBM::Deep will intercept all reads/writes and direct them |
261 | to the right place -- the DB file. This has nothing to do with the |
262 | L<TIE CONSTRUCTION> section above. This simply tells you how to use DBM::Deep |
263 | using regular hashes and arrays, rather than calling functions like C<get()> |
264 | and C<put()> (although those work too). It is entirely up to you how you want |
265 | to access your databases. |
266 | |
267 | =head2 Hashes |
268 | |
269 | You can treat any DBM::Deep object like a normal Perl hash reference. Add keys, |
270 | or even nested hashes (or arrays) using standard Perl syntax: |
271 | |
272 | my $db = DBM::Deep->new( "foo.db" ); |
273 | |
274 | $db->{mykey} = "myvalue"; |
275 | $db->{myhash} = {}; |
276 | $db->{myhash}->{subkey} = "subvalue"; |
277 | |
278 | print $db->{myhash}->{subkey} . "\n"; |
279 | |
280 | You can even step through hash keys using the normal Perl C<keys()> function: |
281 | |
282 | foreach my $key (keys %$db) { |
283 | print "$key: " . $db->{$key} . "\n"; |
284 | } |
285 | |
286 | Remember that Perl's C<keys()> function extracts I<every> key from the hash and |
287 | pushes them onto an array, all before the loop even begins. If you have an |
288 | extremely large hash, this may exhaust Perl's memory. Instead, consider using |
289 | Perl's C<each()> function, which pulls keys/values one at a time, using very |
290 | little memory: |
291 | |
292 | while (my ($key, $value) = each %$db) { |
293 | print "$key: $value\n"; |
294 | } |
295 | |
296 | Please note that when using C<each()>, you should always pass a direct |
297 | hash reference, not a lookup. Meaning, you should B<never> do this: |
298 | |
299 | # NEVER DO THIS |
300 | while (my ($key, $value) = each %{$db->{foo}}) { # BAD |
301 | |
302 | This causes an infinite loop, because for each iteration, Perl is calling |
303 | FETCH() on the $db handle, resulting in a "new" hash for foo every time, so |
304 | it effectively keeps returning the first key over and over again. Instead, |
305 | assign a temporary variable to C<< $db->{foo} >>, then pass that to each(). |
306 | |
307 | =head2 Arrays |
308 | |
309 | As with hashes, you can treat any DBM::Deep object like a normal Perl array |
310 | reference. This includes inserting, removing and manipulating elements, |
311 | and the C<push()>, C<pop()>, C<shift()>, C<unshift()> and C<splice()> functions. |
312 | The object must have first been created using type C<DBM::Deep-E<gt>TYPE_ARRAY>, |
313 | or simply be a nested array reference inside a hash. Example: |
314 | |
315 | my $db = DBM::Deep->new( |
316 | file => "foo-array.db", |
317 | type => DBM::Deep->TYPE_ARRAY |
318 | ); |
319 | |
320 | $db->[0] = "foo"; |
321 | push @$db, "bar", "baz"; |
322 | unshift @$db, "bah"; |
323 | |
324 | my $last_elem = pop @$db; # baz |
325 | my $first_elem = shift @$db; # bah |
326 | my $second_elem = $db->[1]; # bar |
327 | |
328 | my $num_elements = scalar @$db; |
329 | |
330 | =head1 OO INTERFACE |
331 | |
332 | In addition to the I<tie()> interface, you can also use a standard OO interface |
333 | to manipulate all aspects of DBM::Deep databases. Each type of object (hash or |
334 | array) has its own methods, but both types share the following common methods: |
335 | C<put()>, C<get()>, C<exists()>, C<delete()> and C<clear()>. C<store()> and |
336 | C<fetch()> are aliases to C<put()> and C<get()>, respectively. |
337 | |
338 | =over |
339 | |
340 | =item * new() / clone() |
341 | |
342 | These are the constructor and copy-functions. |
343 | |
344 | =item * put() / store() |
345 | |
346 | Stores a new hash key/value pair, or sets an array element value. Takes two |
347 | arguments, the hash key or array index, and the new value. The value can be |
348 | a scalar, hash ref or array ref. Returns true on success, false on failure. |
349 | |
350 | $db->put("foo", "bar"); # for hashes |
351 | $db->put(1, "bar"); # for arrays |
352 | |
353 | =item * get() / fetch() |
354 | |
355 | Fetches the value of a hash key or array element. Takes one argument: the hash |
356 | key or array index. Returns a scalar, hash ref or array ref, depending on the |
357 | data type stored. |
358 | |
359 | my $value = $db->get("foo"); # for hashes |
360 | my $value = $db->get(1); # for arrays |
361 | |
362 | =item * exists() |
363 | |
364 | Checks if a hash key or array index exists. Takes one argument: the hash key |
365 | or array index. Returns true if it exists, false if not. |
366 | |
367 | if ($db->exists("foo")) { print "yay!\n"; } # for hashes |
368 | if ($db->exists(1)) { print "yay!\n"; } # for arrays |
369 | |
370 | =item * delete() |
371 | |
372 | Deletes one hash key/value pair or array element. Takes one argument: the hash |
373 | key or array index. Returns true on success, false if not found. For arrays, |
374 | the remaining elements located after the deleted element are NOT moved over. |
375 | The deleted element is essentially just undefined, which is exactly how Perl's |
376 | internal arrays work. |
377 | |
378 | $db->delete("foo"); # for hashes |
379 | $db->delete(1); # for arrays |
380 | |
381 | =item * clear() |
382 | |
383 | Deletes B<all> hash keys or array elements. Takes no arguments. No return |
384 | value. |
385 | |
386 | $db->clear(); # hashes or arrays |
387 | |
5c0756fc |
388 | =item * lock_exclusive() / lock_shared() / lock() / unlock() |
2120a181 |
389 | |
e00d0eb3 |
390 | q.v. L</LOCKING> for more info. |
2120a181 |
391 | |
392 | =item * optimize() |
393 | |
e00d0eb3 |
394 | This will compress the datafile so that it takes up as little space as possible. |
395 | There is a freespace manager so that when space is freed up, it is used before |
396 | extending the size of the datafile. But, that freespace just sits in the datafile |
397 | unless C<optimize()> is called. |
2120a181 |
398 | |
e00d0eb3 |
399 | =item * import() |
2120a181 |
400 | |
e00d0eb3 |
401 | Unlike simple assignment, C<import()> does not tie the right-hand side. Instead, |
402 | a copy of your data is put into the DB. C<import()> takes either an arrayref (if |
403 | your DB is an array) or a hashref (if your DB is a hash). C<import()> will die |
404 | if anything else is passed in. |
405 | |
406 | =item * export() |
407 | |
408 | This returns a complete copy of the data structure at the point you do the export. |
409 | This copy is in RAM, not on disk like the DB is. |
2120a181 |
410 | |
411 | =item * begin_work() / commit() / rollback() |
412 | |
413 | These are the transactional functions. See L</TRANSACTIONS> for more information. |
414 | |
415 | =back |
416 | |
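The C<delete()> entry above compares array deletion to the way Perl's own
arrays behave. As a small illustration of that native behaviour, here is a
sketch using a plain in-memory array rather than a DBM::Deep object (the
variable names are arbitrary, and nothing here exercises the module):

```perl
use strict;
use warnings;

my @list = ( "foo", "bar", "baz" );

# Deleting a middle element leaves a hole; later elements keep their index.
delete $list[1];

my $length      = scalar @list;              # the array does not shrink
my $still_there = exists $list[1] ? 1 : 0;   # the slot no longer exists
my $last        = $list[2];                  # "baz" has not moved down

print "length: $length\n";
print "index 1 exists: $still_there\n";
print "index 2: $last\n";
```

If the remaining elements should move down to close the gap, C<splice()> is
the operation to use instead.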
417 | =head2 Hashes |
418 | |
419 | For hashes, DBM::Deep supports all the common methods described above, and the |
420 | following additional methods: C<first_key()> and C<next_key()>. |
421 | |
422 | =over |
423 | |
424 | =item * first_key() |
425 | |
426 | Returns the "first" key in the hash. As with built-in Perl hashes, keys are |
427 | fetched in an undefined order (which appears random). Takes no arguments, |
428 | returns the key as a scalar value. |
429 | |
430 | my $key = $db->first_key(); |
431 | |
432 | =item * next_key() |
433 | |
434 | Returns the "next" key in the hash, given the previous one as the sole argument. |
435 | Returns undef if there are no more keys to be fetched. |
436 | |
437 | $key = $db->next_key($key); |
438 | |
439 | =back |
440 | |
441 | Here are some examples of using hashes: |
442 | |
443 | my $db = DBM::Deep->new( "foo.db" ); |
444 | |
445 | $db->put("foo", "bar"); |
446 | print "foo: " . $db->get("foo") . "\n"; |
447 | |
448 | $db->put("baz", {}); # new child hash ref |
449 | $db->get("baz")->put("buz", "biz"); |
450 | print "buz: " . $db->get("baz")->get("buz") . "\n"; |
451 | |
452 | my $key = $db->first_key(); |
453 | while (defined $key) { |
454 | print "$key: " . $db->get($key) . "\n"; |
455 | $key = $db->next_key($key); |
456 | } |
457 | |
458 | if ($db->exists("foo")) { $db->delete("foo"); } |
459 | |
460 | =head2 Arrays |
461 | |
462 | For arrays, DBM::Deep supports all the common methods described above, and the |
463 | following additional methods: C<length()>, C<push()>, C<pop()>, C<shift()>, |
464 | C<unshift()> and C<splice()>. |
465 | |
466 | =over |
467 | |
468 | =item * length() |
469 | |
470 | Returns the number of elements in the array. Takes no arguments. |
471 | |
472 | my $len = $db->length(); |
473 | |
474 | =item * push() |
475 | |
476 | Adds one or more elements onto the end of the array. Accepts scalars, hash |
477 | refs or array refs. No return value. |
478 | |
479 | $db->push("foo", "bar", {}); |
480 | |
481 | =item * pop() |
482 | |
483 | Fetches the last element in the array, and deletes it. Takes no arguments. |
484 | Returns the element value, or undef if the array is empty. |
485 | |
486 | my $elem = $db->pop(); |
487 | |
488 | =item * shift() |
489 | |
490 | Fetches the first element in the array, deletes it, then shifts all the |
491 | remaining elements over to take up the space. Returns the element value. This |
492 | method is not recommended with large arrays -- see L<LARGE ARRAYS> below for |
493 | details. |
494 | |
495 | my $elem = $db->shift(); |
496 | |
497 | =item * unshift() |
498 | |
499 | Inserts one or more elements onto the beginning of the array, shifting all |
500 | existing elements over to make room. Accepts scalars, hash refs or array refs. |
501 | No return value. This method is not recommended with large arrays -- see |
502 | L<LARGE ARRAYS> below for details. |
503 | |
504 | $db->unshift("foo", "bar", {}); |
505 | |
506 | =item * splice() |
507 | |
508 | Performs exactly like Perl's built-in function of the same name. See |
509 | L<perlfunc/splice> for usage -- it is too complicated to document here. This method is |
510 | not recommended with large arrays -- see L<LARGE ARRAYS> below for details. |
511 | |
512 | =back |
513 | |
514 | Here are some examples of using arrays: |
515 | |
516 | my $db = DBM::Deep->new( |
517 | file => "foo.db", |
518 | type => DBM::Deep->TYPE_ARRAY |
519 | ); |
520 | |
521 | $db->push("bar", "baz"); |
522 | $db->unshift("foo"); |
523 | $db->put(3, "buz"); |
524 | |
525 | my $len = $db->length(); |
526 | print "length: $len\n"; # 4 |
527 | |
528 | for (my $k=0; $k<$len; $k++) { |
529 | print "$k: " . $db->get($k) . "\n"; |
530 | } |
531 | |
532 | $db->splice(1, 2, "biz", "baf"); |
533 | |
534 | while (my $elem = shift @$db) { |
535 | print "shifted: $elem\n"; |
536 | } |
537 | |
538 | =head1 LOCKING |
539 | |
540 | Enable or disable automatic file locking by passing a boolean value to the |
541 | C<locking> parameter when constructing your DBM::Deep object (see L<SETUP> |
1cff45d7 |
542 | above). |
2120a181 |
543 | |
544 | my $db = DBM::Deep->new( |
545 | file => "foo.db", |
546 | locking => 1 |
547 | ); |
548 | |
549 | This causes DBM::Deep to C<flock()> the underlying filehandle with exclusive |
550 | mode for writes, and shared mode for reads. This is required if you have |
551 | multiple processes accessing the same database file, to avoid file corruption. |
552 | Please note that C<flock()> does NOT work for files over NFS. See L<DB OVER |
553 | NFS> below for more. |
554 | |
555 | =head2 Explicit Locking |
556 | |
557 | You can explicitly lock a database, so it remains locked for multiple |
558 | actions. This is done by calling the C<lock()> method, and passing an |
559 | optional lock mode argument (defaults to exclusive mode). This is particularly |
560 | useful for things like counters, where the current value needs to be fetched, |
561 | then incremented, then stored again. |
562 | |
563 | $db->lock(); |
564 | my $counter = $db->get("counter"); |
565 | $counter++; |
566 | $db->put("counter", $counter); |
567 | $db->unlock(); |
568 | |
569 | # or... |
570 | |
571 | $db->lock(); |
572 | $db->{counter}++; |
573 | $db->unlock(); |
574 | |
5c0756fc |
575 | If you want a shared lock, you will need to call C<lock_shared()>. C<lock()> is |
576 | an alias to C<lock_exclusive()>. |
2120a181 |
577 | |
45f047f8 |
578 | =head2 Win32/Cygwin |
579 | |
580 | Due to Win32 actually enforcing the read-only status of a shared lock, all |
581 | locks on Win32 and cygwin are exclusive. This is because of how autovivification |
582 | currently works. Hopefully, this will go away in a future release. |
583 | |
2120a181 |
584 | =head1 IMPORTING/EXPORTING |
585 | |
586 | You can import existing complex structures by calling the C<import()> method, |
587 | and export an entire database into an in-memory structure using the C<export()> |
588 | method. Both are examined here. |
589 | |
590 | =head2 Importing |
591 | |
592 | Say you have an existing hash with nested hashes/arrays inside it. Instead of |
593 | walking the structure and adding keys/elements to the database as you go, |
594 | simply pass a reference to the C<import()> method. This recursively adds |
595 | everything to an existing DBM::Deep object for you. Here is an example: |
596 | |
597 | my $struct = { |
598 | key1 => "value1", |
599 | key2 => "value2", |
600 | array1 => [ "elem0", "elem1", "elem2" ], |
601 | hash1 => { |
602 | subkey1 => "subvalue1", |
603 | subkey2 => "subvalue2" |
604 | } |
605 | }; |
606 | |
607 | my $db = DBM::Deep->new( "foo.db" ); |
608 | $db->import( $struct ); |
609 | |
610 | print $db->{key1} . "\n"; # prints "value1" |
611 | |
612 | This recursively imports the entire C<$struct> object into C<$db>, including |
613 | all nested hashes and arrays. If the DBM::Deep object contains existing data, |
614 | keys are merged with the existing ones, replacing if they already exist. |
615 | The C<import()> method can be called on any database level (not just the base |
616 | level), and works with both hash and array DB types. |
617 | |
618 | B<Note:> Make sure your existing structure has no circular references in it. |
619 | These will cause an infinite loop when importing. There are plans to fix this |
620 | in a later release. |
621 | |
2120a181 |
622 | =head2 Exporting |
623 | |
624 | Calling the C<export()> method on an existing DBM::Deep object will return |
625 | a reference to a new in-memory copy of the database. The export is done |
626 | recursively, so all nested hashes/arrays are all exported to standard Perl |
627 | objects. Here is an example: |
628 | |
629 | my $db = DBM::Deep->new( "foo.db" ); |
630 | |
631 | $db->{key1} = "value1"; |
632 | $db->{key2} = "value2"; |
633 | $db->{hash1} = {}; |
634 | $db->{hash1}->{subkey1} = "subvalue1"; |
635 | $db->{hash1}->{subkey2} = "subvalue2"; |
636 | |
637 | my $struct = $db->export(); |
638 | |
639 | print $struct->{key1} . "\n"; # prints "value1" |
640 | |
641 | This makes a complete copy of the database in memory, and returns a reference |
642 | to it. The C<export()> method can be called on any database level (not just |
643 | the base level), and works with both hash and array DB types. Be careful of |
644 | large databases -- you can store a lot more data in a DBM::Deep object than an |
645 | in-memory Perl structure. |
646 | |
647 | B<Note:> Make sure your database has no circular references in it. |
648 | These will cause an infinite loop when exporting. There are plans to fix this |
649 | in a later release. |
650 | |
651 | =head1 FILTERS |
652 | |
653 | DBM::Deep has a number of hooks where you can specify your own Perl function |
654 | to perform filtering on incoming or outgoing data. This is a perfect |
655 | way to extend the engine, and implement things like real-time compression or |
656 | encryption. Filtering applies to the base DB level, and all child hashes / |
657 | arrays. Filter hooks can be specified when your DBM::Deep object is first |
658 | constructed, or by calling the C<set_filter()> method at any time. There are |
1cff45d7 |
659 | four available filter hooks. |
660 | |
661 | =head2 set_filter() |
662 | |
663 | This method takes two parameters - the filter type and the filter subreference. |
664 | The four types are: |
2120a181 |
665 | |
666 | =over |
667 | |
668 | =item * filter_store_key |
669 | |
670 | This filter is called whenever a hash key is stored. It |
671 | is passed the incoming key, and expected to return a transformed key. |
672 | |
673 | =item * filter_store_value |
674 | |
675 | This filter is called whenever a hash key or array element is stored. It |
676 | is passed the incoming value, and expected to return a transformed value. |
677 | |
678 | =item * filter_fetch_key |
679 | |
680 | This filter is called whenever a hash key is fetched (i.e. via |
681 | C<first_key()> or C<next_key()>). It is passed the transformed key, |
682 | and expected to return the plain key. |
683 | |
684 | =item * filter_fetch_value |
685 | |
686 | This filter is called whenever a hash key or array element is fetched. |
687 | It is passed the transformed value, and expected to return the plain value. |
688 | |
689 | =back |
690 | |
691 | Here are the two ways to set up a filter hook: |
692 | |
693 | my $db = DBM::Deep->new( |
694 | file => "foo.db", |
695 | filter_store_value => \&my_filter_store, |
696 | filter_fetch_value => \&my_filter_fetch |
697 | ); |
698 | |
699 | # or... |
700 | |
701 | $db->set_filter( "filter_store_value", \&my_filter_store ); |
702 | $db->set_filter( "filter_fetch_value", \&my_filter_fetch ); |
703 | |
704 | Your filter function will be called only when dealing with SCALAR keys or |
705 | values. When nested hashes and arrays are being stored/fetched, filtering |
706 | is bypassed. Filters are called as static functions, passed a single SCALAR |
707 | argument, and expected to return a single SCALAR value. If you want to |
708 | remove a filter, set the function reference to C<undef>: |
709 | |
710 | $db->set_filter( "filter_store_value", undef ); |
711 | |
1cff45d7 |
712 | =head2 Examples |
2120a181 |
713 | |
b8370759 |
714 | Please read L<DBM::Deep::Manual> for examples of filters. |
2120a181 |
715 | |
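As a small self-contained illustration of the store/fetch contract described
above, here is a hedged sketch of a matching filter pair. Base64 encoding (via
the core L<MIME::Base64> module) stands in for a real compression or encryption
step; that choice is an assumption made for the example. The sub names
C<my_filter_store> and C<my_filter_fetch> are the placeholder names used
earlier in this section. The code exercises the two subs directly and does not
open a database.

```perl
use strict;
use warnings;
use MIME::Base64 qw( encode_base64 decode_base64 );

# Store filter: receives the plain scalar and returns the transformed
# form that would be written to the DB file.
sub my_filter_store {
    my ($value) = @_;
    return encode_base64( $value, '' );   # '' suppresses the trailing newline
}

# Fetch filter: receives the transformed scalar and returns the plain value.
sub my_filter_fetch {
    my ($stored) = @_;
    return decode_base64($stored);
}

# Round trip: whatever the store filter produces, the fetch filter must undo.
my $plain  = "hello world";
my $stored = my_filter_store($plain);
my $back   = my_filter_fetch($stored);

print "stored:  $stored\n";
print "fetched: $back\n";
```

The two subs could then be passed as C<filter_store_value> and
C<filter_fetch_value>, either in the constructor or through C<set_filter()>.
Whatever transformation is chosen, the fetch filter has to be the exact inverse
of the store filter.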
716 | =head1 ERROR HANDLING |
717 | |
718 | Most DBM::Deep methods return a true value for success, and call die() on |
719 | failure. You can wrap calls in an eval block to catch the die. |
720 | |
721 | my $db = DBM::Deep->new( "foo.db" ); # create hash |
722 | eval { $db->push("foo"); }; # ILLEGAL -- push is array-only call |
723 | |
724 | print $@; # prints error message |
725 | |
726 | =head1 LARGEFILE SUPPORT |
727 | |
728 | If you have a 64-bit system, and your Perl is compiled with both LARGEFILE |
e9b0b5f0 |
729 | and 64-bit support, you I<may> be able to create databases larger than 4 GB. |
2120a181 |
730 | DBM::Deep by default uses 32-bit file offset tags, but these can be changed |
731 | by specifying the 'pack_size' parameter when constructing the file. |
732 | |
733 | DBM::Deep->new( |
734 | file => $filename, |
735 | pack_size => 'large', |
736 | ); |
737 | |
738 | This tells DBM::Deep to pack all file offsets with 8-byte (64-bit) quad words |
739 | instead of 32-bit longs. After setting these values your DB files have a |
740 | theoretical maximum size of 16 EB (exabytes). |
741 | |
742 | You can also use C<pack_size =E<gt> 'small'> in order to use 16-bit file |
743 | offsets. |
744 | |
745 | B<Note:> Changing these values will B<NOT> work for existing database files. |
746 | Only change this for new files. Once the value has been set, it is stored in |
747 | the file's header and cannot be changed for the life of the file. These |
748 | parameters are per-file, meaning you can access 32-bit and 64-bit files, as |
749 | you choose. |
750 | |
1cff45d7 |
751 | B<Note:> We have not personally tested files larger than 4 GB -- all our |
752 | systems have only a 32-bit Perl. However, we have received user reports that |
e9b0b5f0 |
753 | this does indeed work. |
2120a181 |
754 | |
755 | =head1 LOW-LEVEL ACCESS |
756 | |
757 | If you require low-level access to the underlying filehandle that DBM::Deep uses, |
758 | you can call the C<_fh()> method, which returns the handle: |
759 | |
760 | my $fh = $db->_fh(); |
761 | |
762 | This method can be called on the root level of the database, or any child |
763 | hashes or arrays. All levels share a I<root> structure, which contains things |
764 | like the filehandle, a reference counter, and all the options specified |
765 | when you created the object. You can get access to this file object by |
766 | calling the C<_storage()> method. |
767 | |
768 | my $file_obj = $db->_storage(); |
769 | |
770 | This is useful for changing options after the object has already been created, |
771 | such as enabling/disabling locking. You can also store your own temporary user |
772 | data in this structure (be wary of name collision), which is then accessible from |
773 | any child hash or array. |
774 | |
2120a181 |
775 | =head1 CIRCULAR REFERENCES |
776 | |
1cff45d7 |
777 | DBM::Deep has full support for circular references, meaning you |
2120a181 |
778 | can have a nested hash key or array element that points to a parent object. |
779 | This relationship is stored in the DB file, and is preserved between sessions. |
780 | Here is an example: |
781 | |
782 | my $db = DBM::Deep->new( "foo.db" ); |
783 | |
784 | $db->{foo} = "bar"; |
785 | $db->{circle} = $db; # ref to self |
786 | |
787 | print $db->{foo} . "\n"; # prints "bar" |
788 | print $db->{circle}->{foo} . "\n"; # prints "bar" again |
789 | |
1cff45d7 |
790 | This also works with array and hash references. So, the following |
791 | works as expected: |
792 | |
793 | $db->{foo} = [ 1 .. 3 ]; |
794 | $db->{bar} = $db->{foo}; |
795 | |
796 | push @{$db->{foo}}, 42; |
797 | is( $db->{bar}[-1], 42 ); # Passes |
798 | |
799 | This, however, does I<not> extend to assignments from one DB file to another. |
800 | So, the following will throw an error: |
801 | |
802 | my $db1 = DBM::Deep->new( "foo.db" ); |
803 | my $db2 = DBM::Deep->new( "bar.db" ); |
804 | |
805 | $db1->{foo} = []; |
806 | $db2->{foo} = $db1->{foo}; # dies |
807 | |
2120a181 |
808 | B<Note>: Passing the object to a function that recursively walks the |
809 | object tree (such as I<Data::Dumper> or even the built-in C<optimize()> or |
810 | C<export()> methods) will result in an infinite loop. This will be fixed in |
1cff45d7 |
811 | a future release by adding singleton support. |
2120a181 |
812 | |
813 | =head1 TRANSACTIONS |
814 | |
1cff45d7 |
815 | As of 1.0000, DBM::Deep has ACID transactions. Every DBM::Deep object is completely |
2120a181 |
816 | transaction-ready - it is not an option you have to turn on. You do have to |
817 | specify how many transactions may run simultaneously (q.v. L</num_txns>). |
818 | |
819 | Three new methods have been added to support them. They are: |
820 | |
821 | =over 4 |
822 | |
823 | =item * begin_work() |
824 | |
825 | This starts a transaction. |
826 | |
827 | =item * commit() |
828 | |
829 | This applies the changes done within the transaction to the mainline and ends |
830 | the transaction. |
831 | |
832 | =item * rollback() |
833 | |
834 | This discards the changes done within the transaction, leaving the mainline |
835 | untouched, and ends the transaction. |
836 | |
837 | =back |
838 | |
839 | Transactions in DBM::Deep are done using a variant of the MVCC method, the |
840 | same method used by the InnoDB MySQL engine. |
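
The three methods above can be sketched in use as follows. This is a minimal illustration rather than a canonical recipe: the scratch file name C<txn_example.db> and the C<balance> key are made up for the example, and it assumes that C<num_txns> counts the mainline itself, so a value of 2 leaves room for one running transaction.

```perl
use strict;
use warnings;
use DBM::Deep;
use File::Temp qw( tempdir );

# Hypothetical scratch location; any fresh file path would do.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/txn_example.db";

# Assumption: num_txns includes the mainline, so 2 permits one transaction.
my $db = DBM::Deep->new( file => $file, num_txns => 2 );

$db->{balance} = 100;

# Changes made inside a rolled-back transaction are discarded.
$db->begin_work;
$db->{balance} = 50;
$db->rollback;
my $after_rollback = $db->{balance};

# Changes made inside a committed transaction reach the mainline.
$db->begin_work;
$db->{balance} = 75;
$db->commit;
my $after_commit = $db->{balance};

print "after rollback: $after_rollback\n";
print "after commit:   $after_commit\n";
```

A fresh file is used here on the assumption that the number of transaction slots is fixed when the file is first created.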
841 | |
e9b0b5f0 |
842 | =head1 MIGRATION |
843 | |
844 | As of 1.0000, the file format has changed. Furthermore, DBM::Deep is now |
845 | designed to potentially change file format between point-releases, if needed to |
846 | support a requested feature. To aid in this, a migration script is provided |
847 | within the CPAN distribution called C<utils/upgrade_db.pl>. |
848 | |
849 | B<NOTE:> This script is not installed onto your system because it carries a copy |
850 | of every version prior to the current version. |
851 | |
2120a181 |
852 | =head1 TODO |
853 | |
854 | The following are items that are planned to be added in future releases. These |
855 | are separate from the L<CAVEATS, ISSUES & BUGS> below. |
856 | |
857 | =head2 Sub-Transactions |
858 | |
859 | Right now, you cannot run a transaction within a transaction. Removing this |
860 | restriction is technically straightforward, but the combinatorial explosion of |
861 | possible use cases hurts my head. If this is something you want to see |
862 | immediately, please submit many test cases. |
863 | |
864 | =head2 Caching |
865 | |
08164b50 |
866 | If a client is willing to assert upon opening the file that this process will be |
2120a181 |
867 | the only consumer of that datafile, then there are a number of caching |
868 | possibilities that can be taken advantage of. This does, however, mean that |
869 | DBM::Deep is more vulnerable to losing data due to unflushed changes. It also |
870 | means a much larger in-memory footprint. As such, it's not clear exactly how |
871 | this should be done. Suggestions are welcome. |
872 | |
873 | =head2 Ram-only |
874 | |
875 | The techniques used in DBM::Deep simply require a seekable contiguous |
876 | datastore. This could just as easily be a large string as a file. By using |
877 | substr, the STM capabilities of DBM::Deep could be used within a |
878 | single-process. I have no idea how I'd specify this, though. Suggestions are |
879 | welcome. |
880 | |
2120a181 |
881 | =head2 Different contention resolution mechanisms |
882 | |
883 | Currently, the only contention resolution mechanism is last-write-wins. This |
884 | is the mechanism used by most RDBMSes and should be good enough for most uses. |
885 | For advanced uses of STM, other contention mechanisms will be needed. If you |
886 | have an idea of how you'd like to see contention resolution in DBM::Deep, |
887 | please let me know. |
888 | |
889 | =head1 CAVEATS, ISSUES & BUGS |
890 | |
891 | This section describes all the known issues with DBM::Deep. These are issues |
892 | that are either intractable or depend on some feature within Perl working |
893 | exactly right. If you have found something that is not listed below, please |
894 | send an e-mail to L<rkinyon@cpan.org>. Likewise, if you think you know of a |
895 | way around one of these issues, please let me know. |
896 | |
897 | =head2 References |
898 | |
899 | (The following assumes a high level of Perl understanding, specifically of |
900 | references. Most users can safely skip this section.) |
901 | |
902 | Currently, the only references supported are HASH and ARRAY. The other reference |
903 | types (SCALAR, CODE, GLOB, and REF) cannot be supported for various reasons. |
904 | |
905 | =over 4 |
906 | |
907 | =item * GLOB |
908 | |
909 | These are things like filehandles and sockets. They can't be supported |
910 | because it's completely unclear how DBM::Deep should serialize them. |
911 | |
912 | =item * SCALAR / REF |
913 | |
914 | The discussion here refers to the following type of example: |
915 | |
916 | my $x = 25; |
917 | $db->{key1} = \$x; |
918 | |
919 | $x = 50; |
920 | |
921 | # In some other process ... |
922 | |
923 | my $val = ${ $db->{key1} }; |
924 | |
925 | is( $val, 50, "What actually gets stored in the DB file?" ); |
926 | |
927 | The problem is one of synchronization. When the variable being referred to |
928 | changes value, the reference isn't notified, which is kind of the point of |
929 | references. This means that the new value won't be stored in the datafile for |
930 | other processes to read. There is no TIEREF. |
931 | |
932 | It is theoretically possible to store references to values already within a |
933 | DBM::Deep object because everything already is synchronized, but the change to |
934 | the internals would be quite large. Specifically, DBM::Deep would have to tie |
935 | every single value that is stored. This would bloat the RAM footprint of |
936 | DBM::Deep at least twofold (if not more) and be a significant performance drain, |
937 | all to support a feature that has never been requested. |
938 | |
939 | =item * CODE |
940 | |
b8370759 |
941 | L<Data::Dump::Streamer> provides a mechanism for serializing coderefs, |
2120a181 |
942 | including saving off all closure state. This would allow for DBM::Deep to |
943 | store the code for a subroutine. Then, whenever the subroutine is read, the |
944 | code could be C<eval()>'ed into being. However, just as for SCALAR and REF, |
945 | that closure state may change without notifying the DBM::Deep object storing |
946 | the reference. Again, this would generally be considered a feature. |
947 | |
948 | =back |
949 | |
c57b19c6 |
950 | =head2 External references and transactions |
1cff45d7 |
951 | |
c57b19c6 |
952 | If you do C<my $x = $db-E<gt>{foo};>, then start a transaction, $x will be |
953 | referencing the database from outside the transaction. A fix for this (and other |
954 | issues with how external references into the database are handled) is being |
955 | looked into. This is the skipped set of tests in t/39_singletons.t, and a related |
956 | issue is the focus of t/37_delete_edge_cases.t. |
1cff45d7 |
957 | |
2120a181 |
958 | =head2 File corruption |
959 | |
960 | The current level of error handling in DBM::Deep is minimal. Files I<are> checked |
961 | for a 32-bit signature when opened, but any other form of corruption in the |
962 | datafile can cause segmentation faults. DBM::Deep may try to C<seek()> past |
963 | the end of a file, or get stuck in an infinite loop depending on the level and |
964 | type of corruption. File write operations are not checked for failure (for |
965 | speed), so if you happen to run out of disk space, DBM::Deep will probably fail in |
966 | a bad way. These things will be addressed in a later version of DBM::Deep. |
967 | |
968 | =head2 DB over NFS |
969 | |
970 | Beware of using DBM::Deep files over NFS. DBM::Deep uses flock(), which works |
971 | well on local filesystems, but will NOT protect you from file corruption over |
972 | NFS. I've heard about setting up your NFS server with a locking daemon, then |
973 | using C<lockf()> to lock your files, but your mileage may vary there as well. |
974 | From what I understand, there is no real way to do it. However, if you need |
975 | access to the underlying filehandle in DBM::Deep for using some other kind of |
976 | locking scheme like C<lockf()>, see the L<LOW-LEVEL ACCESS> section above. |
977 | |
978 | =head2 Copying Objects |
979 | |
980 | Beware of copying tied objects in Perl. Very strange things can happen. |
981 | Instead, use DBM::Deep's C<clone()> method which safely copies the object and |
982 | returns a new, blessed and tied hash or array to the same level in the DB. |
983 | |
984 | my $copy = $db->clone(); |
985 | |
986 | B<Note>: Since clone() here is cloning the object, not the database location, any |
987 | modifications to either $db or $copy will be visible to both. |
988 | |
989 | =head2 Large Arrays |
990 | |
991 | Beware of using C<shift()>, C<unshift()> or C<splice()> with large arrays. |
992 | These functions cause every element in the array to move, which can be murder |
993 | on DBM::Deep, as every element has to be fetched from disk, then stored again in |
994 | a different location. This will be addressed in a future version. |
995 | |
08164b50 |
996 | This has been somewhat addressed so that the cost is constant, regardless of |
997 | what is stored at those locations. So, small arrays with huge data structures in |
998 | them are faster. But, large arrays are still large. |
999 | |
2120a181 |
1000 | =head2 Writeonly Files |
1001 | |
08164b50 |
1002 | If you pass in a filehandle to new(), you may have opened it in either a |
1003 | readonly or writeonly mode. STORE will verify that the filehandle is writable. |
1004 | However, there doesn't seem to be a good way to determine if a filehandle is |
1005 | readable. And, if the filehandle isn't readable, it's not clear what will |
1006 | happen. So, don't do that. |
2120a181 |
1007 | |
1008 | =head2 Assignments Within Transactions |
1009 | |
1010 | The following will I<not> work as one might expect: |
1011 | |
1012 | my $x = { a => 1 }; |
1013 | |
1014 | $db->begin_work; |
1015 | $db->{foo} = $x; |
1016 | $db->rollback; |
1017 | |
1018 | is( $x->{a}, 1 ); # This will fail! |
1019 | |
1020 | The problem is that the moment a reference is used as the rvalue to a DBM::Deep |
1021 | object's lvalue, it becomes tied itself. This is so that future changes to |
1022 | C<$x> can be tracked within the DBM::Deep file and is considered to be a |
1023 | feature. By the time the rollback occurs, there is no knowledge that there had |
1024 | been an C<$x> or what memory location to assign an C<export()> to. |
1025 | |
1026 | B<NOTE:> This does not affect importing because imports do a walk over the |
1027 | reference to be imported in order to explicitly leave it untied. |
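
One way to sidestep the problem, sketched here under assumptions rather than offered as the module's recommended approach, is to hand DBM::Deep a deep copy so that the original reference is never tied. The example uses C<dclone()> from the core L<Storable> module; the scratch file name and the C<num_txns> value of 2 are illustrative choices.

```perl
use strict;
use warnings;
use DBM::Deep;
use File::Temp qw( tempdir );
use Storable qw( dclone );

# Hypothetical scratch location for the example database.
my $dir = tempdir( CLEANUP => 1 );
my $db  = DBM::Deep->new( file => "$dir/copy_example.db", num_txns => 2 );

my $x = { a => 1 };

$db->begin_work;
$db->{foo} = dclone($x);    # only the copy is handed to DBM::Deep
$db->rollback;

# $x was never passed to DBM::Deep, so it is still an ordinary hash.
my $is_tied = tied(%$x) ? 1 : 0;
my $value   = $x->{a};

print "tied: $is_tied, a: $value\n";
```

The trade-off is that later changes to C<$x> are no longer tracked in the DBM::Deep file, which is exactly the behaviour the automatic tying exists to provide.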
1028 | |
1029 | =head1 CODE COVERAGE |
1030 | |
b8370759 |
1031 | L<Devel::Cover> is used to test the code coverage of the tests. Below is the |
1032 | L<Devel::Cover> report on this distribution's test suite. |
2120a181 |
1033 | |
888453b9 |
1034 | ------------------------------------------ ------ ------ ------ ------ ------ |
1035 | File stmt bran cond sub total |
1036 | ------------------------------------------ ------ ------ ------ ------ ------ |
e00d0eb3 |
1037 | blib/lib/DBM/Deep.pm 97.2 90.9 83.3 100.0 95.4 |
c57b19c6 |
1038 | blib/lib/DBM/Deep/Array.pm 100.0 95.7 100.0 100.0 99.0 |
e00d0eb3 |
1039 | blib/lib/DBM/Deep/Engine.pm 95.6 84.7 81.6 98.4 92.5 |
888453b9 |
1040 | blib/lib/DBM/Deep/File.pm 97.2 81.6 66.7 100.0 91.9 |
1041 | blib/lib/DBM/Deep/Hash.pm 100.0 100.0 100.0 100.0 100.0 |
e00d0eb3 |
1042 | Total 96.7 87.5 82.2 99.2 94.1 |
888453b9 |
1043 | ------------------------------------------ ------ ------ ------ ------ ------ |
2120a181 |
1044 | |
1045 | =head1 MORE INFORMATION |
1046 | |
1047 | Check out the DBM::Deep Google Group at L<http://groups.google.com/group/DBM-Deep> |
1048 | or send email to L<DBM-Deep@googlegroups.com>. You can also visit #dbm-deep on |
1049 | irc.perl.org |
1050 | |
1051 | The source code repository is at L<http://svn.perl.org/modules/DBM-Deep> |
1052 | |
e9b0b5f0 |
1053 | =head1 MAINTAINERS |
2120a181 |
1054 | |
1055 | Rob Kinyon, L<rkinyon@cpan.org> |
1056 | |
1057 | Originally written by Joseph Huckaby, L<jhuckaby@cpan.org> |
1058 | |
e9b0b5f0 |
1059 | =head1 SPONSORS |
1060 | |
1061 | Stonehenge Consulting (L<http://www.stonehenge.com/>) sponsored the |
1062 | development of transactions and freespace management, leading to the 1.0000 |
1063 | release. A great debt of gratitude goes out to them for their continuing |
1064 | leadership in and support of the Perl community. |
1065 | |
2120a181 |
1066 | =head1 CONTRIBUTORS |
1067 | |
1068 | The following have contributed greatly to make DBM::Deep what it is today: |
1069 | |
1070 | =over 4 |
1071 | |
e9b0b5f0 |
1072 | =item * Adam Sah and Rich Gaushell for innumerable contributions early on. |
2120a181 |
1073 | |
1074 | =item * Dan Golden and others at YAPC::NA 2006 for helping me design through transactions. |
1075 | |
1076 | =back |
1077 | |
1078 | =head1 SEE ALSO |
1079 | |
1080 | perltie(1), Tie::Hash(3), Digest::MD5(3), Fcntl(3), flock(2), lockf(3), nfs(5), |
1081 | Digest::SHA256(3), Crypt::Blowfish(3), Compress::Zlib(3) |
1082 | |
1083 | =head1 LICENSE |
1084 | |
1085 | Copyright (c) 2007 Rob Kinyon. All Rights Reserved. |
e9b0b5f0 |
1086 | This is free software, you may use it and distribute it under the same terms |
1087 | as Perl itself. |
2120a181 |
1088 | |
1089 | =cut |