From: rkinyon Date: Mon, 26 Feb 2007 17:08:23 +0000 (+0000) Subject: r15625@rob-kinyons-computer (orig r9171): rkinyon | 2007-02-26 11:56:32 -0500 X-Git-Tag: 1-0000~1 X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=e9b0b5f026035a56f53cecda0126cb62bc1da3da;p=dbsrgits%2FDBM-Deep.git r15625@rob-kinyons-computer (orig r9171): rkinyon | 2007-02-26 11:56:32 -0500 r14949@rob-kinyons-computer (orig r8702): rkinyon | 2007-01-24 23:08:35 -0500 Added more to the article r14963@rob-kinyons-computer (orig r8736): rkinyon | 2007-01-28 01:22:53 -0500 On the warpath to 1.0000 r14964@rob-kinyons-computer (orig r8737): rkinyon | 2007-01-28 01:37:21 -0500 Added tests for wrong file versions r14965@rob-kinyons-computer (orig r8738): rkinyon | 2007-01-28 14:32:26 -0500 data_sector_size parameterization is proceeding apace r15013@rob-kinyons-computer (orig r8751): rkinyon | 2007-01-29 22:16:24 -0500 Change some defaults and the tests to match r15014@rob-kinyons-computer (orig r8752): rkinyon | 2007-01-29 22:25:19 -0500 Cleaned up validation code r15015@rob-kinyons-computer (orig r8762): rkinyon | 2007-01-29 23:22:46 -0500 Removed transactional staleness counter from the HEAD as it can never be stale r15016@rob-kinyons-computer (orig r8763): rkinyon | 2007-01-29 23:36:02 -0500 Transactional staleness counters are down from 4 bytes to 2 r15017@rob-kinyons-computer (orig r8764): rkinyon | 2007-01-29 23:40:17 -0500 Minor cleanups r15019@rob-kinyons-computer (orig r8766): rkinyon | 2007-01-30 00:25:55 -0500 Added failing test for large numbers of transactions r15102@rob-kinyons-computer (orig r8795): rkinyon | 2007-02-03 23:36:19 -0500 Fixed limitation of transactions to only 32 r15103@rob-kinyons-computer (orig r8796): rkinyon | 2007-02-03 23:44:14 -0500 Cleaned up a little bit r15104@rob-kinyons-computer (orig r8797): rkinyon | 2007-02-04 00:23:10 -0500 Removed the verybig test from the MANIFEST - I want to think about that one first r15204@rob-kinyons-computer (orig r8841): rkinyon | 2007-02-09 11:15:29 -0500 Article changes r15223@rob-kinyons-computer (orig r9063): rkinyon | 2007-02-10 13:43:15 -0500 Article improvements r15225@rob-kinyons-computer (orig r9076): rkinyon | 2007-02-11 14:49:29 -0500 More work on the article r15516@rob-kinyons-computer (orig r9111): rkinyon | 2007-02-15 09:27:05 -0500 Initial draft (unworking) of upgrade_db.pl r15517@rob-kinyons-computer (orig r9113): rkinyon | 2007-02-15 14:50:32 -0500 Cleanup of DB files, step 1 r15518@rob-kinyons-computer (orig r9114): rkinyon | 2007-02-15 14:51:42 -0500 Cleanup of DB files, step 2 r15519@rob-kinyons-computer (orig r9115): rkinyon | 2007-02-15 14:52:29 -0500 Cleanup for new file versions r15528@rob-kinyons-computer (orig r9119): rkinyon | 2007-02-18 05:40:50 -0500 Further work done and a test for utils/upgrade_db.pl r15529@rob-kinyons-computer (orig r9120): rkinyon | 2007-02-18 06:10:55 -0500 0-983 is now moved over and parsing - still need to perform tests on the converted file r15530@rob-kinyons-computer (orig r9121): rkinyon | 2007-02-18 06:53:27 -0500 Conversion seems to be working r15531@rob-kinyons-computer (orig r9122): rkinyon | 2007-02-18 07:16:23 -0500 Made get_pod support Perl 5.6 by removing use of in-memory filehandles r15532@rob-kinyons-computer (orig r9123): rkinyon | 2007-02-18 08:23:39 -0500 upgrade_db.pl is almost ready for release r15533@rob-kinyons-computer (orig r9124): rkinyon | 2007-02-18 08:25:07 -0500 Fixed MANIFEST and README r15534@rob-kinyons-computer (orig r9131): rkinyon | 2007-02-19 
07:06:57 -0500 Documentation r15548@rob-kinyons-computer (orig r9141): rkinyon | 2007-02-19 12:25:18 -0500 Added better handling of 1.0000 to upgrade_db.pl r15549@rob-kinyons-computer (orig r9142): rkinyon | 2007-02-19 21:41:57 -0500 Article modifications r15550@rob-kinyons-computer (orig r9144): rkinyon | 2007-02-20 18:28:56 -0500 Finished final draft of the article r15551@rob-kinyons-computer (orig r9146): rkinyon | 2007-02-20 18:47:14 -0500 Added SPONSORS section to DBM::Deep's POD r15622@rob-kinyons-computer (orig r9168): rkinyon | 2007-02-26 11:20:04 -0500 Fixed Changes and a failing test found due to disttest r15623@rob-kinyons-computer (orig r9169): rkinyon | 2007-02-26 11:23:45 -0500 Created an articles directory, manifest.skip'ed it, and cleaned up a bit more r15624@rob-kinyons-computer (orig r9170): rkinyon | 2007-02-26 11:52:00 -0500 t_attic created --- diff --git a/Build.PL b/Build.PL index d636a23..55071fe 100644 --- a/Build.PL +++ b/Build.PL @@ -19,6 +19,7 @@ my $build = Module::Build->new( 'File::Path' => '0.01', 'File::Temp' => '0.01', 'Test::Deep' => '0.095', + 'Test::Warn' => '0.08', 'Test::More' => '0.47', 'Test::Exception' => '0.21', }, diff --git a/Changes b/Changes index 6b840a5..6409e9f 100644 --- a/Changes +++ b/Changes @@ -1,11 +1,26 @@ Revision history for DBM::Deep. +1.0000 Feb 26 22:30:00 2007 EDT + - THIS VERSION IS INCOMPATIBLE WITH FILES FROM ALL OTHER PRIOR VERSIONS. + - To aid in this form of upgrades, DBM::Deep now checks the file format + version to make sure that it knows how to read it. + - db_upgrade.pl was added to utils/. This will -NOT- install onto + your system. This is deliberate. + - db_upgrade.pl will not handle developer release file formats. This + is due to the fact that all developer releases in preparation for a + given release share the same file version, even though the file + format may change. This is deliberate. + - Importing no longer takes place within a transaction + - The following parameters were added: + - data_sector_size - this determines the default size of a data sector. + - Correctly handle opening readonly files + 0.99_04 Jan 24 22:30:00 2007 EDT - Added the missing lib/DBM/Deep.pod file to the MANIFEST - Fixed a poorly-designed test that was failing depending on what Clone::Any - - was using. + was using. - All "use 5.6.0;" lines are now "use 5.006_000;" to avoid warnings about - unsupported vstrings in bleadperl. + unsupported vstrings in 5.9.x 0.99_03 Jan 23 22:30:00 2007 EDT - THIS VERSION IS INCOMPATIBLE WITH FILES FROM ALL OTHER PRIOR VERSIONS. 
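As an editorial aside (not part of the patch): the 1.0000 Changes entry above introduces creation-time parameters such as data_sector_size, and the README changes further down document their ranges. A minimal, illustrative sketch of passing them at file creation might look like the following; the filename and the specific values are hypothetical, and these settings are stored in the file header on first creation, so they cannot be changed for an existing file.

    use strict;
    use warnings;
    use DBM::Deep;

    # Creation-time parameters are written into the file header the first
    # time the datafile is created and are read back on later opens.
    my $db = DBM::Deep->new(
        file             => "foo.db",
        data_sector_size => 128,      # default 64; valid range 32..256 bytes
        pack_size        => 'medium', # 4-byte offsets (the default)
        num_txns         => 4,        # simultaneous transactions; default 1 (the HEAD)
    );

    $db->{key} = 'value';
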
diff --git a/MANIFEST b/MANIFEST index 0343370..95ba4b1 100644 --- a/MANIFEST +++ b/MANIFEST @@ -1,5 +1,9 @@ Build.PL Changes +Makefile.PL +MANIFEST +META.yml +README lib/DBM/Deep.pm lib/DBM/Deep.pod lib/DBM/Deep/Array.pm @@ -8,10 +12,8 @@ lib/DBM/Deep/Engine.pm lib/DBM/Deep/File.pm lib/DBM/Deep/Hash.pm lib/DBM/Deep/Internals.pod -Makefile.PL -MANIFEST -META.yml -README +utils/upgrade_db.pl +utils/lib/DBM/Deep/09830.pm t/01_basic.t t/02_hash.t t/03_bighash.t @@ -48,8 +50,14 @@ t/33_transactions.t t/34_transaction_arrays.t t/35_transaction_multiple.t t/37_delete_edge_cases.t +t/38_data_sector_size.t t/39_singletons.t t/40_freespace.t t/41_transaction_multilevel.t t/42_transaction_indexsector.t +t/43_transaction_maximum.t +t/44_upgrade_db.t t/common.pm +t/etc/db-0-983 +t/etc/db-0-99_04 +t/etc/db-1-0000 diff --git a/MANIFEST.SKIP b/MANIFEST.SKIP index b008153..49ba2da 100644 --- a/MANIFEST.SKIP +++ b/MANIFEST.SKIP @@ -17,3 +17,5 @@ cover_db ^\.# ^\.DS_Store ^__MACOSX +^articles +^t_attic diff --git a/README b/README index d0b8b42..2c05482 100644 --- a/README +++ b/README @@ -1,51 +1,66 @@ NAME - DBM::Deep - A pure perl multi-level hash/array DBM + DBM::Deep - A pure perl multi-level hash/array DBM that supports + transactions SYNOPSIS use DBM::Deep; - my $db = new DBM::Deep "foo.db"; - - $db->{key} = 'value'; # tie() style + my $db = DBM::Deep->new( "foo.db" ); + + $db->{key} = 'value'; print $db->{key}; - - $db->put('key', 'value'); # OO style + + $db->put('key' => 'value'); print $db->get('key'); - + # true multi-level support $db->{my_complex} = [ - 'hello', { perl => 'rules' }, - 42, 99 ]; + 'hello', { perl => 'rules' }, + 42, 99, + ]; + + $db->begin_work; + + # Do stuff here + + $db->rollback; + $db->commit; + + tie my %db, 'DBM::Deep', 'foo.db'; + $db{key} = 'value'; + print $db{key}; + + tied(%db)->put('key' => 'value'); + print tied(%db)->get('key'); DESCRIPTION A unique flat-file database module, written in pure perl. True multi-level hash/array support (unlike MLDBM, which is faked), hybrid OO - / tie() interface, cross-platform FTPable files, and quite fast. Can - handle millions of keys and unlimited hash levels without significant - slow-down. Written from the ground-up in pure perl -- this is NOT a - wrapper around a C-based DBM. Out-of-the-box compatibility with Unix, - Mac OS X and Windows. - -INSTALLATION - Hopefully you are using CPAN's excellent Perl module, which will - download and install the module for you. If not, get the tarball, and - run these commands: - - tar zxf DBM-Deep-* - cd DBM-Deep-* - perl Makefile.PL - make - make test - make install + / tie() interface, cross-platform FTPable files, ACID transactions, and + is quite fast. Can handle millions of keys and unlimited levels without + significant slow-down. Written from the ground-up in pure perl -- this + is NOT a wrapper around a C-based DBM. Out-of-the-box compatibility with + Unix, Mac OS X and Windows. + +VERSION DIFFERENCES + NOTE: 0.99_03 has significant file format differences from prior + versions. THere will be a backwards-compatibility layer in 1.00, but + that is slated for a later 0.99_x release. This version is NOT backwards + compatible with any other release of DBM::Deep. + + NOTE: 0.99_01 and above have significant file format differences from + 0.983 and before. There will be a backwards-compatibility layer in 1.00, + but that is slated for a later 0.99_x release. This version is NOT + backwards compatible with 0.983 and before. 
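Editorial note on the version check described above: since DBM::Deep methods die() on failure (see ERROR HANDLING below) and 1.0000 refuses files whose format version it cannot read, a hedged sketch of coping with an old-format file could look like this. The filename is hypothetical and the arguments to the bundled utils/upgrade_db.pl are deliberately not shown here.

    use strict;
    use warnings;
    use DBM::Deep;

    my $db = eval { DBM::Deep->new( "legacy-0983.db" ) };
    unless ($db) {
        warn "Could not open legacy-0983.db (old file format?): $@";
        # Convert the file with the bundled, non-installed utility
        # utils/upgrade_db.pl (arguments omitted; see its documentation),
        # then retry the open.
    }
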
SETUP Construction can be done OO-style (which is the recommended way), or using Perl's tie() function. Both are examined here. - OO CONSTRUCTION + OO Construction The recommended way to construct a DBM::Deep object is to use the new() - method, which gets you a blessed, tied hash or array reference. + method, which gets you a blessed *and* tied hash (or array) reference. - my $db = new DBM::Deep "foo.db"; + my $db = DBM::Deep->new( "foo.db" ); This opens a new database handle, mapped to the file "foo.db". If this file does not exist, it will automatically be created. DB files are @@ -53,13 +68,14 @@ SETUP hash, unless otherwise specified (see OPTIONS below). You can pass a number of options to the constructor to specify things - like locking, autoflush, etc. This is done by passing an inline hash: + like locking, autoflush, etc. This is done by passing an inline hash (or + hashref): - my $db = new DBM::Deep( - file => "foo.db", - locking => 1, - autoflush => 1 - ); + my $db = DBM::Deep->new( + file => "foo.db", + locking => 1, + autoflush => 1 + ); Notice that the filename is now specified *inside* the hash with the "file" parameter, as opposed to being the sole argument to the @@ -69,39 +85,40 @@ SETUP You can also start with an array instead of a hash. For this, you must specify the "type" parameter: - my $db = new DBM::Deep( - file => "foo.db", - type => DBM::Deep::TYPE_ARRAY - ); + my $db = DBM::Deep->new( + file => "foo.db", + type => DBM::Deep->TYPE_ARRAY + ); Note: Specifing the "type" parameter only takes effect when beginning a new DB file. If you create a DBM::Deep object with an existing file, the - "type" will be loaded from the file header, and ignored if it is passed - to the constructor. + "type" will be loaded from the file header, and an error will be thrown + if the wrong type is passed in. + + Tie Construction + Alternately, you can create a DBM::Deep handle by using Perl's built-in + tie() function. The object returned from tie() can be used to call + methods, such as lock() and unlock(). (That object can be retrieved from + the tied variable at any time using tied() - please see perltie for more + info. - TIE CONSTRUCTION - Alternatively, you can create a DBM::Deep handle by using Perl's - built-in tie() function. This is not ideal, because you get only a - basic, tied hash (or array) which is not blessed, so you can't call any - functions on it. + my %hash; + my $db = tie %hash, "DBM::Deep", "foo.db"; - my %hash; - tie %hash, "DBM::Deep", "foo.db"; - - my @array; - tie @array, "DBM::Deep", "bar.db"; + my @array; + my $db = tie @array, "DBM::Deep", "bar.db"; As with the OO constructor, you can replace the DB filename parameter with a hash containing one or more options (see OPTIONS just below for the complete list). - tie %hash, "DBM::Deep", { - file => "foo.db", - locking => 1, - autoflush => 1 - }; + tie %hash, "DBM::Deep", { + file => "foo.db", + locking => 1, + autoflush => 1 + }; - OPTIONS + Options There are a number of options that can be passed in when constructing your DBM::Deep objects. These apply to both the OO- and tie- based approaches. @@ -110,106 +127,156 @@ SETUP Filename of the DB file to link the handle to. You can pass a full absolute filesystem path, partial path, or a plain filename if the file is in the current working directory. This is a required - parameter. + parameter (though q.v. fh). + + * fh + If you want, you can pass in the fh instead of the file. 
This is + most useful for doing something like: + + my $db = DBM::Deep->new( { fh => \*DATA } ); - * mode - File open mode (read-only, read-write, etc.) string passed to Perl's - FileHandle module. This is an optional parameter, and defaults to - "r+" (read/write). Note: If the default (r+) mode is selected, the - file will also be auto- created if it doesn't exist. + You are responsible for making sure that the fh has been opened + appropriately for your needs. If you open it read-only and attempt + to write, an exception will be thrown. If you open it write-only or + append-only, an exception will be thrown immediately as DBM::Deep + needs to read from the fh. + + * file_offset + This is the offset within the file that the DBM::Deep db starts. + Most of the time, you will not need to set this. However, it's there + if you want it. + + If you pass in fh and do not set this, it will be set appropriately. * type This parameter specifies what type of object to create, a hash or - array. Use one of these two constants: "DBM::Deep::TYPE_HASH" or - "DBM::Deep::TYPE_ARRAY". This only takes effect when beginning a new - file. This is an optional parameter, and defaults to hash. + array. Use one of these two constants: + + * "DBM::Deep->TYPE_HASH" + * "DBM::Deep->TYPE_ARRAY". + + This only takes effect when beginning a new file. This is an + optional parameter, and defaults to "DBM::Deep->TYPE_HASH". * locking Specifies whether locking is to be enabled. DBM::Deep uses Perl's - Fnctl flock() function to lock the database in exclusive mode for - writes, and shared mode for reads. Pass any true value to enable. - This affects the base DB handle *and any child hashes or arrays* - that use the same DB file. This is an optional parameter, and - defaults to 0 (disabled). See LOCKING below for more. + flock() function to lock the database in exclusive mode for writes, + and shared mode for reads. Pass any true value to enable. This + affects the base DB handle *and any child hashes or arrays* that use + the same DB file. This is an optional parameter, and defaults to 1 + (enabled). See LOCKING below for more. * autoflush Specifies whether autoflush is to be enabled on the underlying - FileHandle. This obviously slows down write operations, but is + filehandle. This obviously slows down write operations, but is required if you may have multiple processes accessing the same DB - file (also consider enable *locking* or at least *volatile*). Pass - any true value to enable. This is an optional parameter, and - defaults to 0 (disabled). - - * volatile - If *volatile* mode is enabled, DBM::Deep will stat() the DB file - before each STORE() operation. This is required if an outside force - may change the size of the file between transactions. Locking also - implicitly enables volatile. This is useful if you want to use a - different locking system or write your own. Pass any true value to - enable. This is an optional parameter, and defaults to 0 (disabled). - - * autobless - If *autobless* mode is enabled, DBM::Deep will preserve blessed - hashes, and restore them when fetched. This is an experimental - feature, and does have side-effects. Basically, when hashes are - re-blessed into their original classes, they are no longer blessed - into the DBM::Deep class! So you won't be able to call any DBM::Deep - methods on them. You have been warned. This is an optional - parameter, and defaults to 0 (disabled). + file (also consider enable *locking*). Pass any true value to + enable. 
This is an optional parameter, and defaults to 1 (enabled). * filter_* - See FILTERS below. - - * debug - Setting *debug* mode will make all errors non-fatal, dump them out - to STDERR, and continue on. This is for debugging purposes only, and - probably not what you want. This is an optional parameter, and - defaults to 0 (disabled). + See "FILTERS" below. + + The following parameters may be specified in the constructor the first + time the datafile is created. However, they will be stored in the header + of the file and cannot be overridden by subsequent openings of the file + - the values will be set from the values stored in the datafile's + header. + + * num_txns + This is the number of transactions that can be running at one time. + The default is one - the HEAD. The minimum is one and the maximum is + 255. The more transactions, the larger and quicker the datafile + grows. + + See "TRANSACTIONS" below. + + * max_buckets + This is the number of entries that can be added before a reindexing. + The larger this number is made, the larger a file gets, but the + better performance you will have. The default and minimum number + this can be is 16. The maximum is 256, but more than 64 isn't + recommended. + + * data_sector_size + This is the size in bytes of a given data sector. Data sectors will + chain, so a value of any size can be stored. However, chaining is + expensive in terms of time. Setting this value to something close to + the expected common length of your scalars will improve your + performance. If it is too small, your file will have a lot of + chaining. If it is too large, your file will have a lot of dead + space in it. + + The default for this is 64 bytes. The minimum value is 32 and the + maximum is 256 bytes. + + Note: There are between 6 and 10 bytes taken up in each data sector + for bookkeeping. (It's 4 + the number of bytes in your "pack_size".) + This is included within the data_sector_size, thus the effective + value is 6-10 bytes less than what you specified. + + * pack_size + This is the size of the file pointer used throughout the file. The + valid values are: + + * small + This uses 2-byte offsets, allowing for a maximum file size of 65 + KB. + + * medium (default) + This uses 4-byte offsets, allowing for a maximum file size of 4 + GB. + + * large + This uses 8-byte offsets, allowing for a maximum file size of 16 + XB (exabytes). This can only be enabled if your Perl is compiled + for 64-bit. + + See "LARGEFILE SUPPORT" for more information. TIE INTERFACE With DBM::Deep you can access your databases using Perl's standard - hash/array syntax. Because all Deep objects are *tied* to hashes or - arrays, you can treat them as such. Deep will intercept all reads/writes - and direct them to the right place -- the DB file. This has nothing to - do with the "TIE CONSTRUCTION" section above. This simply tells you how - to use DBM::Deep using regular hashes and arrays, rather than calling - functions like "get()" and "put()" (although those work too). It is - entirely up to you how to want to access your databases. - - HASHES + hash/array syntax. Because all DBM::Deep objects are *tied* to hashes or + arrays, you can treat them as such. DBM::Deep will intercept all + reads/writes and direct them to the right place -- the DB file. This has + nothing to do with the "TIE CONSTRUCTION" section above. This simply + tells you how to use DBM::Deep using regular hashes and arrays, rather + than calling functions like "get()" and "put()" (although those work + too). 
It is entirely up to you how to want to access your databases. + + Hashes You can treat any DBM::Deep object like a normal Perl hash reference. Add keys, or even nested hashes (or arrays) using standard Perl syntax: - my $db = new DBM::Deep "foo.db"; - - $db->{mykey} = "myvalue"; - $db->{myhash} = {}; - $db->{myhash}->{subkey} = "subvalue"; + my $db = DBM::Deep->new( "foo.db" ); + + $db->{mykey} = "myvalue"; + $db->{myhash} = {}; + $db->{myhash}->{subkey} = "subvalue"; - print $db->{myhash}->{subkey} . "\n"; + print $db->{myhash}->{subkey} . "\n"; You can even step through hash keys using the normal Perl "keys()" function: - foreach my $key (keys %$db) { - print "$key: " . $db->{$key} . "\n"; - } + foreach my $key (keys %$db) { + print "$key: " . $db->{$key} . "\n"; + } Remember that Perl's "keys()" function extracts *every* key from the hash and pushes them onto an array, all before the loop even begins. If - you have an extra large hash, this may exhaust Perl's memory. Instead, - consider using Perl's "each()" function, which pulls keys/values one at - a time, using very little memory: + you have an extremely large hash, this may exhaust Perl's memory. + Instead, consider using Perl's "each()" function, which pulls + keys/values one at a time, using very little memory: - while (my ($key, $value) = each %$db) { - print "$key: $value\n"; - } + while (my ($key, $value) = each %$db) { + print "$key: $value\n"; + } Please note that when using "each()", you should always pass a direct hash reference, not a lookup. Meaning, you should never do this: - # NEVER DO THIS - while (my ($key, $value) = each %{$db->{foo}}) { # BAD + # NEVER DO THIS + while (my ($key, $value) = each %{$db->{foo}}) { # BAD This causes an infinite loop, because for each iteration, Perl is calling FETCH() on the $db handle, resulting in a "new" hash for foo @@ -217,59 +284,63 @@ TIE INTERFACE over again. Instead, assign a temporary variable to "$db-"{foo}>, then pass that to each(). - ARRAYS + Arrays As with hashes, you can treat any DBM::Deep object like a normal Perl array reference. This includes inserting, removing and manipulating elements, and the "push()", "pop()", "shift()", "unshift()" and "splice()" functions. The object must have first been created using type - "DBM::Deep::TYPE_ARRAY", or simply be a nested array reference inside a + "DBM::Deep->TYPE_ARRAY", or simply be a nested array reference inside a hash. Example: - my $db = new DBM::Deep( - file => "foo-array.db", - type => DBM::Deep::TYPE_ARRAY - ); - - $db->[0] = "foo"; - push @$db, "bar", "baz"; - unshift @$db, "bah"; - - my $last_elem = pop @$db; # baz - my $first_elem = shift @$db; # bah - my $second_elem = $db->[1]; # bar - - my $num_elements = scalar @$db; + my $db = DBM::Deep->new( + file => "foo-array.db", + type => DBM::Deep->TYPE_ARRAY + ); + + $db->[0] = "foo"; + push @$db, "bar", "baz"; + unshift @$db, "bah"; + + my $last_elem = pop @$db; # baz + my $first_elem = shift @$db; # bah + my $second_elem = $db->[1]; # bar + + my $num_elements = scalar @$db; OO INTERFACE In addition to the *tie()* interface, you can also use a standard OO interface to manipulate all aspects of DBM::Deep databases. Each type of object (hash or array) has its own methods, but both types share the following common methods: "put()", "get()", "exists()", "delete()" and - "clear()". + "clear()". "fetch()" and "store(" are aliases to "put()" and "get()", + respectively. - * put() + * new() / clone() + These are the constructor and copy-functions. 
+ + * put() / store() Stores a new hash key/value pair, or sets an array element value. Takes two arguments, the hash key or array index, and the new value. The value can be a scalar, hash ref or array ref. Returns true on success, false on failure. - $db->put("foo", "bar"); # for hashes - $db->put(1, "bar"); # for arrays + $db->put("foo", "bar"); # for hashes + $db->put(1, "bar"); # for arrays - * get() + * get() / fetch() Fetches the value of a hash key or array element. Takes one argument: the hash key or array index. Returns a scalar, hash ref or array ref, depending on the data type stored. - my $value = $db->get("foo"); # for hashes - my $value = $db->get(1); # for arrays + my $value = $db->get("foo"); # for hashes + my $value = $db->get(1); # for arrays * exists() Checks if a hash key or array index exists. Takes one argument: the hash key or array index. Returns true if it exists, false if not. - if ($db->exists("foo")) { print "yay!\n"; } # for hashes - if ($db->exists(1)) { print "yay!\n"; } # for arrays + if ($db->exists("foo")) { print "yay!\n"; } # for hashes + if ($db->exists(1)) { print "yay!\n"; } # for arrays * delete() Deletes one hash key/value pair or array element. Takes one @@ -277,22 +348,32 @@ OO INTERFACE false if not found. For arrays, the remaining elements located after the deleted element are NOT moved over. The deleted element is essentially just undefined, which is exactly how Perl's internal - arrays work. Please note that the space occupied by the deleted - key/value or element is not reused again -- see "UNUSED SPACE - RECOVERY" below for details and workarounds. + arrays work. - $db->delete("foo"); # for hashes - $db->delete(1); # for arrays + $db->delete("foo"); # for hashes + $db->delete(1); # for arrays * clear() Deletes all hash keys or array elements. Takes no arguments. No - return value. Please note that the space occupied by the deleted - keys/values or elements is not reused again -- see "UNUSED SPACE - RECOVERY" below for details and workarounds. + return value. + + $db->clear(); # hashes or arrays - $db->clear(); # hashes or arrays + * lock() / unlock() + q.v. Locking. - HASHES + * optimize() + Recover lost disk space. This is important to do, especially if you + use transactions. + + * import() / export() + Data going in and out. + + * begin_work() / commit() / rollback() + These are the transactional functions. "TRANSACTIONS" for more + information. + + Hashes For hashes, DBM::Deep supports all the common methods described above, and the following additional methods: "first_key()" and "next_key()". @@ -301,35 +382,35 @@ OO INTERFACE keys are fetched in an undefined order (which appears random). Takes no arguments, returns the key as a scalar value. - my $key = $db->first_key(); + my $key = $db->first_key(); * next_key() Returns the "next" key in the hash, given the previous one as the sole argument. Returns undef if there are no more keys to be fetched. - $key = $db->next_key($key); + $key = $db->next_key($key); Here are some examples of using hashes: - my $db = new DBM::Deep "foo.db"; - - $db->put("foo", "bar"); - print "foo: " . $db->get("foo") . "\n"; - - $db->put("baz", {}); # new child hash ref - $db->get("baz")->put("buz", "biz"); - print "buz: " . $db->get("baz")->get("buz") . "\n"; - - my $key = $db->first_key(); - while ($key) { - print "$key: " . $db->get($key) . 
"\n"; - $key = $db->next_key($key); - } - - if ($db->exists("foo")) { $db->delete("foo"); } - - ARRAYS + my $db = DBM::Deep->new( "foo.db" ); + + $db->put("foo", "bar"); + print "foo: " . $db->get("foo") . "\n"; + + $db->put("baz", {}); # new child hash ref + $db->get("baz")->put("buz", "biz"); + print "buz: " . $db->get("baz")->get("buz") . "\n"; + + my $key = $db->first_key(); + while ($key) { + print "$key: " . $db->get($key) . "\n"; + $key = $db->next_key($key); + } + + if ($db->exists("foo")) { $db->delete("foo"); } + + Arrays For arrays, DBM::Deep supports all the common methods described above, and the following additional methods: "length()", "push()", "pop()", "shift()", "unshift()" and "splice()". @@ -337,20 +418,20 @@ OO INTERFACE * length() Returns the number of elements in the array. Takes no arguments. - my $len = $db->length(); + my $len = $db->length(); * push() Adds one or more elements onto the end of the array. Accepts scalars, hash refs or array refs. No return value. - $db->push("foo", "bar", {}); + $db->push("foo", "bar", {}); * pop() Fetches the last element in the array, and deletes it. Takes no arguments. Returns undef if array is empty. Returns the element value. - my $elem = $db->pop(); + my $elem = $db->pop(); * shift() Fetches the first element in the array, deletes it, then shifts all @@ -358,7 +439,7 @@ OO INTERFACE element value. This method is not recommended with large arrays -- see "LARGE ARRAYS" below for details. - my $elem = $db->shift(); + my $elem = $db->shift(); * unshift() Inserts one or more elements onto the beginning of the array, @@ -367,7 +448,7 @@ OO INTERFACE recommended with large arrays -- see below for details. - $db->unshift("foo", "bar", {}); + $db->unshift("foo", "bar", {}); * splice() Performs exactly like Perl's built-in function of the same name. See @@ -377,103 +458,99 @@ OO INTERFACE Here are some examples of using arrays: - my $db = new DBM::Deep( - file => "foo.db", - type => DBM::Deep::TYPE_ARRAY - ); - - $db->push("bar", "baz"); - $db->unshift("foo"); - $db->put(3, "buz"); - - my $len = $db->length(); - print "length: $len\n"; # 4 - - for (my $k=0; $k<$len; $k++) { - print "$k: " . $db->get($k) . "\n"; - } - - $db->splice(1, 2, "biz", "baf"); - - while (my $elem = shift @$db) { - print "shifted: $elem\n"; - } + my $db = DBM::Deep->new( + file => "foo.db", + type => DBM::Deep->TYPE_ARRAY + ); + + $db->push("bar", "baz"); + $db->unshift("foo"); + $db->put(3, "buz"); + + my $len = $db->length(); + print "length: $len\n"; # 4 + + for (my $k=0; $k<$len; $k++) { + print "$k: " . $db->get($k) . "\n"; + } + + $db->splice(1, 2, "biz", "baf"); + + while (my $elem = shift @$db) { + print "shifted: $elem\n"; + } LOCKING - Enable automatic file locking by passing a true value to the "locking" - parameter when constructing your DBM::Deep object (see SETUP above). + Enable or disable automatic file locking by passing a boolean value to + the "locking" parameter when constructing your DBM::Deep object (see + SETUP above). - my $db = new DBM::Deep( - file => "foo.db", - locking => 1 - ); + my $db = DBM::Deep->new( + file => "foo.db", + locking => 1 + ); - This causes Deep to "flock()" the underlying FileHandle object with + This causes DBM::Deep to "flock()" the underlying filehandle with exclusive mode for writes, and shared mode for reads. This is required if you have multiple processes accessing the same database file, to avoid file corruption. Please note that "flock()" does NOT work for files over NFS. See "DB OVER NFS" below for more. 
- EXPLICIT LOCKING + Explicit Locking You can explicitly lock a database, so it remains locked for multiple - transactions. This is done by calling the "lock()" method, and passing - an optional lock mode argument (defaults to exclusive mode). This is + actions. This is done by calling the "lock()" method, and passing an + optional lock mode argument (defaults to exclusive mode). This is particularly useful for things like counters, where the current value needs to be fetched, then incremented, then stored again. - $db->lock(); - my $counter = $db->get("counter"); - $counter++; - $db->put("counter", $counter); - $db->unlock(); + $db->lock(); + my $counter = $db->get("counter"); + $counter++; + $db->put("counter", $counter); + $db->unlock(); + + # or... - # or... - - $db->lock(); - $db->{counter}++; - $db->unlock(); + $db->lock(); + $db->{counter}++; + $db->unlock(); You can pass "lock()" an optional argument, which specifies which mode to use (exclusive or shared). Use one of these two constants: - "DBM::Deep::LOCK_EX" or "DBM::Deep::LOCK_SH". These are passed directly - to "flock()", and are the same as the constants defined in Perl's - "Fcntl" module. + "DBM::Deep->LOCK_EX" or "DBM::Deep->LOCK_SH". These are passed directly + to "flock()", and are the same as the constants defined in Perl's Fcntl + module. - $db->lock( DBM::Deep::LOCK_SH ); - # something here - $db->unlock(); - - If you want to implement your own file locking scheme, be sure to create - your DBM::Deep objects setting the "volatile" option to true. This hints - to Deep that the DB file may change between transactions. See "LOW-LEVEL - ACCESS" below for more. + $db->lock( $db->LOCK_SH ); + # something here + $db->unlock(); IMPORTING/EXPORTING You can import existing complex structures by calling the "import()" method, and export an entire database into an in-memory structure using the "export()" method. Both are examined here. - IMPORTING + Importing Say you have an existing hash with nested hashes/arrays inside it. Instead of walking the structure and adding keys/elements to the database as you go, simply pass a reference to the "import()" method. This recursively adds everything to an existing DBM::Deep object for you. Here is an example: - my $struct = { - key1 => "value1", - key2 => "value2", - array1 => [ "elem0", "elem1", "elem2" ], - hash1 => { - subkey1 => "subvalue1", - subkey2 => "subvalue2" - } - }; - - my $db = new DBM::Deep "foo.db"; - $db->import( $struct ); - - print $db->{key1} . "\n"; # prints "value1" + my $struct = { + key1 => "value1", + key2 => "value2", + array1 => [ "elem0", "elem1", "elem2" ], + hash1 => { + subkey1 => "subvalue1", + subkey2 => "subvalue2" + } + }; + + my $db = DBM::Deep->new( "foo.db" ); + $db->import( $struct ); + + print $db->{key1} . "\n"; # prints "value1" This recursively imports the entire $struct object into $db, including all nested hashes and arrays. If the DBM::Deep object contains exsiting @@ -482,25 +559,26 @@ IMPORTING/EXPORTING just the base level), and works with both hash and array DB types. Note: Make sure your existing structure has no circular references in - it. These will cause an infinite loop when importing. + it. These will cause an infinite loop when importing. There are plans to + fix this in a later release. - EXPORTING + Exporting Calling the "export()" method on an existing DBM::Deep object will return a reference to a new in-memory copy of the database. 
The export is done recursively, so all nested hashes/arrays are all exported to standard Perl objects. Here is an example: - my $db = new DBM::Deep "foo.db"; - - $db->{key1} = "value1"; - $db->{key2} = "value2"; - $db->{hash1} = {}; - $db->{hash1}->{subkey1} = "subvalue1"; - $db->{hash1}->{subkey2} = "subvalue2"; - - my $struct = $db->export(); - - print $struct->{key1} . "\n"; # prints "value1" + my $db = DBM::Deep->new( "foo.db" ); + + $db->{key1} = "value1"; + $db->{key2} = "value2"; + $db->{hash1} = {}; + $db->{hash1}->{subkey1} = "subvalue1"; + $db->{hash1}->{subkey2} = "subvalue2"; + + my $struct = $db->export(); + + print $struct->{key1} . "\n"; # prints "value1" This makes a complete copy of the database in memory, and returns a reference to it. The "export()" method can be called on any database @@ -509,7 +587,8 @@ IMPORTING/EXPORTING a DBM::Deep object than an in-memory Perl structure. Note: Make sure your database has no circular references in it. These - will cause an infinite loop when exporting. + will cause an infinite loop when exporting. There are plans to fix this + in a later release. FILTERS DBM::Deep has a number of hooks where you can specify your own Perl @@ -542,16 +621,16 @@ FILTERS Here are the two ways to setup a filter hook: - my $db = new DBM::Deep( - file => "foo.db", - filter_store_value => \&my_filter_store, - filter_fetch_value => \&my_filter_fetch - ); - - # or... - - $db->set_filter( "filter_store_value", \&my_filter_store ); - $db->set_filter( "filter_fetch_value", \&my_filter_fetch ); + my $db = DBM::Deep->new( + file => "foo.db", + filter_store_value => \&my_filter_store, + filter_fetch_value => \&my_filter_fetch + ); + + # or... + + $db->set_filter( "filter_store_value", \&my_filter_store ); + $db->set_filter( "filter_fetch_value", \&my_filter_fetch ); Your filter function will be called only when dealing with SCALAR keys or values. When nested hashes and arrays are being stored/fetched, @@ -559,494 +638,486 @@ FILTERS single SCALAR argument, and expected to return a single SCALAR value. If you want to remove a filter, set the function reference to "undef": - $db->set_filter( "filter_store_value", undef ); + $db->set_filter( "filter_store_value", undef ); - REAL-TIME ENCRYPTION EXAMPLE + Real-time Encryption Example Here is a working example that uses the *Crypt::Blowfish* module to do real-time encryption / decryption of keys & values with DBM::Deep Filters. Please visit for more on *Crypt::Blowfish*. You'll also need the *Crypt::CBC* module. - use DBM::Deep; - use Crypt::Blowfish; - use Crypt::CBC; - - my $cipher = new Crypt::CBC({ - 'key' => 'my secret key', - 'cipher' => 'Blowfish', - 'iv' => '$KJh#(}q', - 'regenerate_key' => 0, - 'padding' => 'space', - 'prepend_iv' => 0 - }); - - my $db = new DBM::Deep( - file => "foo-encrypt.db", - filter_store_key => \&my_encrypt, - filter_store_value => \&my_encrypt, - filter_fetch_key => \&my_decrypt, - filter_fetch_value => \&my_decrypt, - ); - - $db->{key1} = "value1"; - $db->{key2} = "value2"; - print "key1: " . $db->{key1} . "\n"; - print "key2: " . $db->{key2} . 
"\n"; - - undef $db; - exit; - - sub my_encrypt { - return $cipher->encrypt( $_[0] ); - } - sub my_decrypt { - return $cipher->decrypt( $_[0] ); - } - - REAL-TIME COMPRESSION EXAMPLE + use DBM::Deep; + use Crypt::Blowfish; + use Crypt::CBC; + + my $cipher = Crypt::CBC->new({ + 'key' => 'my secret key', + 'cipher' => 'Blowfish', + 'iv' => '$KJh#(}q', + 'regenerate_key' => 0, + 'padding' => 'space', + 'prepend_iv' => 0 + }); + + my $db = DBM::Deep->new( + file => "foo-encrypt.db", + filter_store_key => \&my_encrypt, + filter_store_value => \&my_encrypt, + filter_fetch_key => \&my_decrypt, + filter_fetch_value => \&my_decrypt, + ); + + $db->{key1} = "value1"; + $db->{key2} = "value2"; + print "key1: " . $db->{key1} . "\n"; + print "key2: " . $db->{key2} . "\n"; + + undef $db; + exit; + + sub my_encrypt { + return $cipher->encrypt( $_[0] ); + } + sub my_decrypt { + return $cipher->decrypt( $_[0] ); + } + + Real-time Compression Example Here is a working example that uses the *Compress::Zlib* module to do real-time compression / decompression of keys & values with DBM::Deep Filters. Please visit for more on *Compress::Zlib*. - use DBM::Deep; - use Compress::Zlib; - - my $db = new DBM::Deep( - file => "foo-compress.db", - filter_store_key => \&my_compress, - filter_store_value => \&my_compress, - filter_fetch_key => \&my_decompress, - filter_fetch_value => \&my_decompress, - ); - - $db->{key1} = "value1"; - $db->{key2} = "value2"; - print "key1: " . $db->{key1} . "\n"; - print "key2: " . $db->{key2} . "\n"; - - undef $db; - exit; - - sub my_compress { - return Compress::Zlib::memGzip( $_[0] ) ; - } - sub my_decompress { - return Compress::Zlib::memGunzip( $_[0] ) ; - } + use DBM::Deep; + use Compress::Zlib; + + my $db = DBM::Deep->new( + file => "foo-compress.db", + filter_store_key => \&my_compress, + filter_store_value => \&my_compress, + filter_fetch_key => \&my_decompress, + filter_fetch_value => \&my_decompress, + ); + + $db->{key1} = "value1"; + $db->{key2} = "value2"; + print "key1: " . $db->{key1} . "\n"; + print "key2: " . $db->{key2} . "\n"; + + undef $db; + exit; + + sub my_compress { + return Compress::Zlib::memGzip( $_[0] ) ; + } + sub my_decompress { + return Compress::Zlib::memGunzip( $_[0] ) ; + } Note: Filtering of keys only applies to hashes. Array "keys" are actually numerical index numbers, and are not filtered. ERROR HANDLING Most DBM::Deep methods return a true value for success, and call die() - on failure. You can wrap calls in an eval block to catch the die. Also, - the actual error message is stored in an internal scalar, which can be - fetched by calling the "error()" method. + on failure. You can wrap calls in an eval block to catch the die. - my $db = new DBM::Deep "foo.db"; # create hash - eval { $db->push("foo"); }; # ILLEGAL -- push is array-only call - - print $db->error(); # prints error message + my $db = DBM::Deep->new( "foo.db" ); # create hash + eval { $db->push("foo"); }; # ILLEGAL -- push is array-only call - You can then call "clear_error()" to clear the current error state. - - $db->clear_error(); - - If you set the "debug" option to true when creating your DBM::Deep - object, all errors are considered NON-FATAL, and dumped to STDERR. This - is only for debugging purposes. + print $@; # prints error message LARGEFILE SUPPORT If you have a 64-bit system, and your Perl is compiled with both LARGEFILE and 64-bit support, you *may* be able to create databases - larger than 2 GB. 
DBM::Deep by default uses 32-bit file offset tags, but - these can be changed by calling the static "set_pack()" method before - you do anything else. + larger than 4 GB. DBM::Deep by default uses 32-bit file offset tags, but + these can be changed by specifying the 'pack_size' parameter when + constructing the file. - DBM::Deep::set_pack(8, 'Q'); + DBM::Deep->new( + filename => $filename, + pack_size => 'large', + ); This tells DBM::Deep to pack all file offsets with 8-byte (64-bit) quad words instead of 32-bit longs. After setting these values your DB files have a theoretical maximum size of 16 XB (exabytes). + You can also use "pack_size => 'small'" in order to use 16-bit file + offsets. + Note: Changing these values will NOT work for existing database files. - Only change this for new files, and make sure it stays set consistently - throughout the file's life. If you do set these values, you can no - longer access 32-bit DB files. You can, however, call "set_pack(4, 'N')" - to change back to 32-bit mode. + Only change this for new files. Once the value has been set, it is + stored in the file's header and cannot be changed for the life of the + file. These parameters are per-file, meaning you can access 32-bit and + 64-bit files, as you choose. - Note: I have not personally tested files > 2 GB -- all my systems have - only a 32-bit Perl. However, I have received user reports that this does - indeed work! + Note: We have not personally tested files larger than 4 GB -- all my + systems have only a 32-bit Perl. However, I have received user reports + that this does indeed work. LOW-LEVEL ACCESS - If you require low-level access to the underlying FileHandle that Deep - uses, you can call the "fh()" method, which returns the handle: + If you require low-level access to the underlying filehandle that + DBM::Deep uses, you can call the "_fh()" method, which returns the + handle: - my $fh = $db->fh(); + my $fh = $db->_fh(); This method can be called on the root level of the datbase, or any child hashes or arrays. All levels share a *root* structure, which contains - things like the FileHandle, a reference counter, and all your options - you specified when you created the object. You can get access to this - root structure by calling the "root()" method. + things like the filehandle, a reference counter, and all the options + specified when you created the object. You can get access to this file + object by calling the "_storage()" method. - my $root = $db->root(); + my $file_obj = $db->_storage(); This is useful for changing options after the object has already been - created, such as enabling/disabling locking, volatile or debug modes. - You can also store your own temporary user data in this structure (be - wary of name collision), which is then accessible from any child hash or - array. + created, such as enabling/disabling locking. You can also store your own + temporary user data in this structure (be wary of name collision), which + is then accessible from any child hash or array. CUSTOM DIGEST ALGORITHM DBM::Deep by default uses the *Message Digest 5* (MD5) algorithm for hashing keys. However you can override this, and use another algorithm - (such as SHA-256) or even write your own. But please note that Deep + (such as SHA-256) or even write your own. But please note that DBM::Deep currently expects zero collisions, so your algorithm has to be *perfect*, so to speak. Collision detection may be introduced in a later version. 
- You can specify a custom digest algorithm by calling the static - "set_digest()" function, passing a reference to a subroutine, and the - length of the algorithm's hashes (in bytes). This is a global static - function, which affects ALL Deep objects. Here is a working example that - uses a 256-bit hash from the *Digest::SHA256* module. Please see - for more. - - use DBM::Deep; - use Digest::SHA256; - - my $context = Digest::SHA256::new(256); - - DBM::Deep::set_digest( \&my_digest, 32 ); - - my $db = new DBM::Deep "foo-sha.db"; - - $db->{key1} = "value1"; - $db->{key2} = "value2"; - print "key1: " . $db->{key1} . "\n"; - print "key2: " . $db->{key2} . "\n"; - - undef $db; - exit; - - sub my_digest { - return substr( $context->hash($_[0]), 0, 32 ); - } + You can specify a custom digest algorithm by passing it into the + parameter list for new(), passing a reference to a subroutine as the + 'digest' parameter, and the length of the algorithm's hashes (in bytes) + as the 'hash_size' parameter. Here is a working example that uses a + 256-bit hash from the *Digest::SHA256* module. Please see + for more + information. + + use DBM::Deep; + use Digest::SHA256; + + my $context = Digest::SHA256::new(256); + + my $db = DBM::Deep->new( + filename => "foo-sha.db", + digest => \&my_digest, + hash_size => 32, + ); + + $db->{key1} = "value1"; + $db->{key2} = "value2"; + print "key1: " . $db->{key1} . "\n"; + print "key2: " . $db->{key2} . "\n"; + + undef $db; + exit; + + sub my_digest { + return substr( $context->hash($_[0]), 0, 32 ); + } Note: Your returned digest strings must be EXACTLY the number of bytes - you specify in the "set_digest()" function (in this case 32). + you specify in the hash_size parameter (in this case 32). + + Note: If you do choose to use a custom digest algorithm, you must set it + every time you access this file. Otherwise, the default (MD5) will be + used. CIRCULAR REFERENCES + NOTE: DBM::Deep 0.99_03 has turned off circular references pending + evaluation of some edge cases. I hope to be able to re-enable circular + references in a future version after 1.00. This means that circular + references are NO LONGER available. + DBM::Deep has experimental support for circular references. Meaning you can have a nested hash key or array element that points to a parent object. This relationship is stored in the DB file, and is preserved between sessions. Here is an example: - my $db = new DBM::Deep "foo.db"; - - $db->{foo} = "bar"; - $db->{circle} = $db; # ref to self - - print $db->{foo} . "\n"; # prints "foo" - print $db->{circle}->{foo} . "\n"; # prints "foo" again - - One catch is, passing the object to a function that recursively walks - the object tree (such as *Data::Dumper* or even the built-in - "optimize()" or "export()" methods) will result in an infinite loop. The - other catch is, if you fetch the *key* of a circular reference (i.e. - using the "first_key()" or "next_key()" methods), you will get the - *target object's key*, not the ref's key. This gets even more - interesting with the above example, where the *circle* key points to the - base DB object, which technically doesn't have a key. So I made - DBM::Deep return "[base]" as the key name in that special case. - -CAVEATS / ISSUES / BUGS - This section describes all the known issues with DBM::Deep. It you have - found something that is not listed here, please send e-mail to - jhuckaby@cpan.org. 
- - UNUSED SPACE RECOVERY - One major caveat with Deep is that space occupied by existing keys and - values is not recovered when they are deleted. Meaning if you keep - deleting and adding new keys, your file will continuously grow. I am - working on this, but in the meantime you can call the built-in - "optimize()" method from time to time (perhaps in a crontab or - something) to recover all your unused space. - - $db->optimize(); # returns true on success - - This rebuilds the ENTIRE database into a new file, then moves it on top - of the original. The new file will have no unused space, thus it will - take up as little disk space as possible. Please note that this - operation can take a long time for large files, and you need enough disk - space to temporarily hold 2 copies of your DB file. The temporary file - is created in the same directory as the original, named with a ".tmp" - extension, and is deleted when the operation completes. Oh, and if - locking is enabled, the DB is automatically locked for the entire - duration of the copy. - - WARNING: Only call optimize() on the top-level node of the database, and - make sure there are no child references lying around. Deep keeps a - reference counter, and if it is greater than 1, optimize() will abort - and return undef. - - AUTOVIVIFICATION - Unfortunately, autovivification doesn't work with tied hashes. This - appears to be a bug in Perl's tie() system, as *Jakob Schmidt* - encountered the very same issue with his *DWH_FIle* module (see - ), and it is also - mentioned in the BUGS section for the *MLDBM* module ). Basically, on a new db - file, this does not work: - - $db->{foo}->{bar} = "hello"; - - Since "foo" doesn't exist, you cannot add "bar" to it. You end up with - "foo" being an empty hash. Try this instead, which works fine: - - $db->{foo} = { bar => "hello" }; - - As of Perl 5.8.7, this bug still exists. I have walked very carefully - through the execution path, and Perl indeed passes an empty hash to the - STORE() method. Probably a bug in Perl. - - FILE CORRUPTION - The current level of error handling in Deep is minimal. Files *are* - checked for a 32-bit signature on open(), but other corruption in files - can cause segmentation faults. Deep may try to seek() past the end of a - file, or get stuck in an infinite loop depending on the level of - corruption. File write operations are not checked for failure (for - speed), so if you happen to run out of disk space, Deep will probably - fail in a bad way. These things will be addressed in a later version of - DBM::Deep. - - DB OVER NFS - Beware of using DB files over NFS. Deep uses flock(), which works well - on local filesystems, but will NOT protect you from file corruption over - NFS. I've heard about setting up your NFS server with a locking daemon, - then using lockf() to lock your files, but your milage may vary there as - well. From what I understand, there is no real way to do it. However, if - you need access to the underlying FileHandle in Deep for using some - other kind of locking scheme like lockf(), see the "LOW-LEVEL ACCESS" - section above. - - COPYING OBJECTS + my $db = DBM::Deep->new( "foo.db" ); + + $db->{foo} = "bar"; + $db->{circle} = $db; # ref to self + + print $db->{foo} . "\n"; # prints "bar" + print $db->{circle}->{foo} . "\n"; # prints "bar" again + + Note: Passing the object to a function that recursively walks the object + tree (such as *Data::Dumper* or even the built-in "optimize()" or + "export()" methods) will result in an infinite loop. 
This will be fixed + in a future release. + +TRANSACTIONS + New in 0.99_01 is ACID transactions. Every DBM::Deep object is + completely transaction-ready - it is not an option you have to turn on. + You do have to specify how many transactions may run simultaneously + (q.v. "num_txns"). + + Three new methods have been added to support them. They are: + + * begin_work() + This starts a transaction. + + * commit() + This applies the changes done within the transaction to the mainline + and ends the transaction. + + * rollback() + This discards the changes done within the transaction to the + mainline and ends the transaction. + + Transactions in DBM::Deep are done using a variant of the MVCC method, + the same method used by the InnoDB MySQL engine. + + Software-Transactional Memory + The addition of transactions to this module provides the basis for STM + within Perl 5. Contention is resolved using a default last-write-wins. + Currently, this default cannot be changed, but it will be addressed in a + future version. + +PERFORMANCE + Because DBM::Deep is a conncurrent datastore, every change is flushed to + disk immediately and every read goes to disk. This means that DBM::Deep + functions at the speed of disk (generally 10-20ms) vs. the speed of RAM + (generally 50-70ns), or at least 150-200x slower than the comparable + in-memory datastructure in Perl. + + There are several techniques you can use to speed up how DBM::Deep + functions. + + * Put it on a ramdisk + The easiest and quickest mechanism to making DBM::Deep run faster is + to create a ramdisk and locate the DBM::Deep file there. Doing this + as an option may become a feature of DBM::Deep, assuming there is a + good ramdisk wrapper on CPAN. + + * Work at the tightest level possible + It is much faster to assign the level of your db that you are + working with to an intermediate variable than to re-look it up every + time. Thus + + # BAD + while ( my ($k, $v) = each %{$db->{foo}{bar}{baz}} ) { + ... + } + + # GOOD + my $x = $db->{foo}{bar}{baz}; + while ( my ($k, $v) = each %$x ) { + ... + } + + * Make your file as tight as possible + If you know that you are not going to use more than 65K in your + database, consider using the "pack_size => 'small'" option. This + will instruct DBM::Deep to use 16bit addresses, meaning that the + seek times will be less. + +TODO + The following are items that are planned to be added in future releases. + These are separate from the "CAVEATS, ISSUES & BUGS" below. + + Sub-Transactions + Right now, you cannot run a transaction within a transaction. Removing + this restriction is technically straightforward, but the combinatorial + explosion of possible usecases hurts my head. If this is something you + want to see immediately, please submit many testcases. + + Caching + If a user is willing to assert upon opening the file that this process + will be the only consumer of that datafile, then there are a number of + caching possibilities that can be taken advantage of. This does, + however, mean that DBM::Deep is more vulnerable to losing data due to + unflushed changes. It also means a much larger in-memory footprint. As + such, it's not clear exactly how this should be done. Suggestions are + welcome. + + Ram-only + The techniques used in DBM::Deep simply require a seekable contiguous + datastore. This could just as easily be a large string as a file. By + using substr, the STM capabilities of DBM::Deep could be used within a + single-process. I have no idea how I'd specify this, though. 
Suggestions + are welcome. + + Importing using Data::Walker + Right now, importing is done using "Clone::clone()" to make a complete + copy in memory, then tying that copy. It would be much better to use + Data::Walker to walk the data structure instead, particularly in the + case of large datastructures. + + Different contention resolution mechanisms + Currently, the only contention resolution mechanism is last-write-wins. + This is the mechanism used by most RDBMSes and should be good enough for + most uses. For advanced uses of STM, other contention mechanisms will be + needed. If you have an idea of how you'd like to see contention + resolution in DBM::Deep, please let me know. + +CAVEATS, ISSUES & BUGS + This section describes all the known issues with DBM::Deep. These are + issues that are either intractable or depend on some feature within Perl + working exactly right. It you have found something that is not listed + below, please send an e-mail to rkinyon@cpan.org. Likewise, if you think + you know of a way around one of these issues, please let me know. + + References + (The following assumes a high level of Perl understanding, specifically + of references. Most users can safely skip this section.) + + Currently, the only references supported are HASH and ARRAY. The other + reference types (SCALAR, CODE, GLOB, and REF) cannot be supported for + various reasons. + + * GLOB + These are things like filehandles and other sockets. They can't be + supported because it's completely unclear how DBM::Deep should + serialize them. + + * SCALAR / REF + The discussion here refers to the following type of example: + + my $x = 25; + $db->{key1} = \$x; + + $x = 50; + + # In some other process ... + + my $val = ${ $db->{key1} }; + + is( $val, 50, "What actually gets stored in the DB file?" ); + + The problem is one of synchronization. When the variable being + referred to changes value, the reference isn't notified, which is + kind of the point of references. This means that the new value won't + be stored in the datafile for other processes to read. There is no + TIEREF. + + It is theoretically possible to store references to values already + within a DBM::Deep object because everything already is + synchronized, but the change to the internals would be quite large. + Specifically, DBM::Deep would have to tie every single value that is + stored. This would bloat the RAM footprint of DBM::Deep at least + twofold (if not more) and be a significant performance drain, all to + support a feature that has never been requested. + + * CODE + Data::Dump::Streamer provides a mechanism for serializing coderefs, + including saving off all closure state. This would allow for + DBM::Deep to store the code for a subroutine. Then, whenever the + subroutine is read, the code could be "eval()"'ed into being. + However, just as for SCALAR and REF, that closure state may change + without notifying the DBM::Deep object storing the reference. Again, + this would generally be considered a feature. + + File corruption + The current level of error handling in DBM::Deep is minimal. Files *are* + checked for a 32-bit signature when opened, but any other form of + corruption in the datafile can cause segmentation faults. DBM::Deep may + try to "seek()" past the end of a file, or get stuck in an infinite loop + depending on the level and type of corruption. File write operations are + not checked for failure (for speed), so if you happen to run out of disk + space, DBM::Deep will probably fail in a bad way. 
These things will be + addressed in a later version of DBM::Deep. + + DB over NFS + Beware of using DBM::Deep files over NFS. DBM::Deep uses flock(), which + works well on local filesystems, but will NOT protect you from file + corruption over NFS. I've heard about setting up your NFS server with a + locking daemon, then using "lockf()" to lock your files, but your + mileage may vary there as well. From what I understand, there is no real + way to do it. However, if you need access to the underlying filehandle + in DBM::Deep for using some other kind of locking scheme like "lockf()", + see the "LOW-LEVEL ACCESS" section above. + + Copying Objects Beware of copying tied objects in Perl. Very strange things can happen. - Instead, use Deep's "clone()" method which safely copies the object and - returns a new, blessed, tied hash or array to the same level in the DB. + Instead, use DBM::Deep's "clone()" method which safely copies the object + and returns a new, blessed and tied hash or array to the same level in + the DB. + + my $copy = $db->clone(); - my $copy = $db->clone(); + Note: Since clone() here is cloning the object, not the database + location, any modifications to either $db or $copy will be visible to + both. - LARGE ARRAYS + Large Arrays Beware of using "shift()", "unshift()" or "splice()" with large arrays. These functions cause every element in the array to move, which can be murder on DBM::Deep, as every element has to be fetched from disk, then - stored again in a different location. This may be addressed in a later + stored again in a different location. This will be addressed in a future version. -PERFORMANCE - This section discusses DBM::Deep's speed and memory usage. - - SPEED - Obviously, DBM::Deep isn't going to be as fast as some C-based DBMs, - such as the almighty *BerkeleyDB*. But it makes up for it in features - like true multi-level hash/array support, and cross-platform FTPable - files. Even so, DBM::Deep is still pretty fast, and the speed stays - fairly consistent, even with huge databases. Here is some test data: - - Adding 1,000,000 keys to new DB file... - - At 100 keys, avg. speed is 2,703 keys/sec - At 200 keys, avg. speed is 2,642 keys/sec - At 300 keys, avg. speed is 2,598 keys/sec - At 400 keys, avg. speed is 2,578 keys/sec - At 500 keys, avg. speed is 2,722 keys/sec - At 600 keys, avg. speed is 2,628 keys/sec - At 700 keys, avg. speed is 2,700 keys/sec - At 800 keys, avg. speed is 2,607 keys/sec - At 900 keys, avg. speed is 2,190 keys/sec - At 1,000 keys, avg. speed is 2,570 keys/sec - At 2,000 keys, avg. speed is 2,417 keys/sec - At 3,000 keys, avg. speed is 1,982 keys/sec - At 4,000 keys, avg. speed is 1,568 keys/sec - At 5,000 keys, avg. speed is 1,533 keys/sec - At 6,000 keys, avg. speed is 1,787 keys/sec - At 7,000 keys, avg. speed is 1,977 keys/sec - At 8,000 keys, avg. speed is 2,028 keys/sec - At 9,000 keys, avg. speed is 2,077 keys/sec - At 10,000 keys, avg. speed is 2,031 keys/sec - At 20,000 keys, avg. speed is 1,970 keys/sec - At 30,000 keys, avg. speed is 2,050 keys/sec - At 40,000 keys, avg. speed is 2,073 keys/sec - At 50,000 keys, avg. speed is 1,973 keys/sec - At 60,000 keys, avg. speed is 1,914 keys/sec - At 70,000 keys, avg. speed is 2,091 keys/sec - At 80,000 keys, avg. speed is 2,103 keys/sec - At 90,000 keys, avg. speed is 1,886 keys/sec - At 100,000 keys, avg. speed is 1,970 keys/sec - At 200,000 keys, avg. speed is 2,053 keys/sec - At 300,000 keys, avg. speed is 1,697 keys/sec - At 400,000 keys, avg. 
speed is 1,838 keys/sec - At 500,000 keys, avg. speed is 1,941 keys/sec - At 600,000 keys, avg. speed is 1,930 keys/sec - At 700,000 keys, avg. speed is 1,735 keys/sec - At 800,000 keys, avg. speed is 1,795 keys/sec - At 900,000 keys, avg. speed is 1,221 keys/sec - At 1,000,000 keys, avg. speed is 1,077 keys/sec - - This test was performed on a PowerMac G4 1gHz running Mac OS X 10.3.2 & - Perl 5.8.1, with an 80GB Ultra ATA/100 HD spinning at 7200RPM. The hash - keys and values were between 6 - 12 chars in length. The DB file ended - up at 210MB. Run time was 12 min 3 sec. - - MEMORY USAGE - One of the great things about DBM::Deep is that it uses very little - memory. Even with huge databases (1,000,000+ keys) you will not see much - increased memory on your process. Deep relies solely on the filesystem - for storing and fetching data. Here is output from */usr/bin/top* before - even opening a database handle: - - PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND - 22831 root 11 0 2716 2716 1296 R 0.0 0.2 0:07 perl - - Basically the process is taking 2,716K of memory. And here is the same - process after storing and fetching 1,000,000 keys: - - PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND - 22831 root 14 0 2772 2772 1328 R 0.0 0.2 13:32 perl - - Notice the memory usage increased by only 56K. Test was performed on a - 700mHz x86 box running Linux RedHat 7.2 & Perl 5.6.1. - -DB FILE FORMAT - In case you were interested in the underlying DB file format, it is - documented here in this section. You don't need to know this to use the - module, it's just included for reference. - - SIGNATURE - DBM::Deep files always start with a 32-bit signature to identify the - file type. This is at offset 0. The signature is "DPDB" in network byte - order. This is checked upon each file open(). - - TAG - The DBM::Deep file is in a *tagged format*, meaning each section of the - file has a standard header containing the type of data, the length of - data, and then the data itself. The type is a single character (1 byte), - the length is a 32-bit unsigned long in network byte order, and the data - is, well, the data. Here is how it unfolds: - - MASTER INDEX - Immediately after the 32-bit file signature is the *Master Index* - record. This is a standard tag header followed by 1024 bytes (in 32-bit - mode) or 2048 bytes (in 64-bit mode) of data. The type is *H* for hash - or *A* for array, depending on how the DBM::Deep object was constructed. - - The index works by looking at a *MD5 Hash* of the hash key (or array - index number). The first 8-bit char of the MD5 signature is the offset - into the index, multipled by 4 in 32-bit mode, or 8 in 64-bit mode. The - value of the index element is a file offset of the next tag for the - key/element in question, which is usually a *Bucket List* tag (see - below). - - The next tag *could* be another index, depending on how many - keys/elements exist. See RE-INDEXING below for details. - - BUCKET LIST - A *Bucket List* is a collection of 16 MD5 hashes for keys/elements, plus - file offsets to where the actual data is stored. It starts with a - standard tag header, with type *B*, and a data size of 320 bytes in - 32-bit mode, or 384 bytes in 64-bit mode. Each MD5 hash is stored in - full (16 bytes), plus the 32-bit or 64-bit file offset for the *Bucket* - containing the actual data. When the list fills up, a *Re-Index* - operation is performed (See RE-INDEXING below). 
- - BUCKET - A *Bucket* is a tag containing a key/value pair (in hash mode), or a - index/value pair (in array mode). It starts with a standard tag header - with type *D* for scalar data (string, binary, etc.), or it could be a - nested hash (type *H*) or array (type *A*). The value comes just after - the tag header. The size reported in the tag header is only for the - value, but then, just after the value is another size (32-bit unsigned - long) and then the plain key itself. Since the value is likely to be - fetched more often than the plain key, I figured it would be *slightly* - faster to store the value first. - - If the type is *H* (hash) or *A* (array), the value is another *Master - Index* record for the nested structure, where the process begins all - over again. - - RE-INDEXING - After a *Bucket List* grows to 16 records, its allocated space in the - file is exhausted. Then, when another key/element comes in, the list is - converted to a new index record. However, this index will look at the - next char in the MD5 hash, and arrange new Bucket List pointers - accordingly. This process is called *Re-Indexing*. Basically, a new - index tag is created at the file EOF, and all 17 (16 + new one) - keys/elements are removed from the old Bucket List and inserted into the - new index. Several new Bucket Lists are created in the process, as a new - MD5 char from the key is being examined (it is unlikely that the keys - will all share the same next char of their MD5s). - - Because of the way the *MD5* algorithm works, it is impossible to tell - exactly when the Bucket Lists will turn into indexes, but the first - round tends to happen right around 4,000 keys. You will see a *slight* - decrease in performance here, but it picks back up pretty quick (see - SPEED above). Then it takes a lot more keys to exhaust the next level of - Bucket Lists. It's right around 900,000 keys. This process can continue - nearly indefinitely -- right up until the point the *MD5* signatures - start colliding with each other, and this is EXTREMELY rare -- like - winning the lottery 5 times in a row AND getting struck by lightning - while you are walking to cash in your tickets. Theoretically, since - *MD5* hashes are 128-bit values, you *could* have up to - 340,282,366,921,000,000,000,000,000,000,000,000,000 keys/elements (I - believe this is 340 unodecillion, but don't quote me). - - STORING - When a new key/element is stored, the key (or index number) is first ran - through *Digest::MD5* to get a 128-bit signature (example, in hex: - b05783b0773d894396d475ced9d2f4f6). Then, the *Master Index* record is - checked for the first char of the signature (in this case *b*). If it - does not exist, a new *Bucket List* is created for our key (and the next - 15 future keys that happen to also have *b* as their first MD5 char). - The entire MD5 is written to the *Bucket List* along with the offset of - the new *Bucket* record (EOF at this point, unless we are replacing an - existing *Bucket*), where the actual data will be stored. - - FETCHING - Fetching an existing key/element involves getting a *Digest::MD5* of the - key (or index number), then walking along the indexes. If there are - enough keys/elements in this DB level, there might be nested indexes, - each linked to a particular char of the MD5. Finally, a *Bucket List* is - pointed to, which contains up to 16 full MD5 hashes. Each is checked for - equality to the key in question. 
If we found a match, the *Bucket* tag - is loaded, where the value and plain key are stored. - - Fetching the plain key occurs when calling the *first_key()* and - *next_key()* methods. In this process the indexes are walked - systematically, and each key fetched in increasing MD5 order (which is - why it appears random). Once the *Bucket* is found, the value is skipped - the plain key returned instead. Note: Do not count on keys being fetched - as if the MD5 hashes were alphabetically sorted. This only happens on an - index-level -- as soon as the *Bucket Lists* are hit, the keys will come - out in the order they went in -- so it's pretty much undefined how the - keys will come out -- just like Perl's built-in hashes. - -AUTHOR - Joseph Huckaby, jhuckaby@cpan.org - - Special thanks to Adam Sah and Rich Gaushell! You know why :-) + Writeonly Files + If you pass in a filehandle to new(), you may have opened it in either a + readonly or writeonly mode. STORE will verify that the filehandle is + writable. However, there doesn't seem to be a good way to determine if a + filehandle is readable. And, if the filehandle isn't readable, it's not + clear what will happen. So, don't do that. + + Assignments Within Transactions + The following will *not* work as one might expect: + + my $x = { a => 1 }; + + $db->begin_work; + $db->{foo} = $x; + $db->rollback; + + is( $x->{a}, 1 ); # This will fail! + + The problem is that the moment a reference used as the rvalue to a + DBM::Deep object's lvalue, it becomes tied itself. This is so that + future changes to $x can be tracked within the DBM::Deep file and is + considered to be a feature. By the time the rollback occurs, there is no + knowledge that there had been an $x or what memory location to assign an + "export()" to. + + NOTE: This does not affect importing because imports do a walk over the + reference to be imported in order to explicitly leave it untied. + +CODE COVERAGE + Devel::Cover is used to test the code coverage of the tests. Below is + the Devel::Cover report on this distribution's test suite. + + ---------------------------- ------ ------ ------ ------ ------ ------ ------ + File stmt bran cond sub pod time total + ---------------------------- ------ ------ ------ ------ ------ ------ ------ + blib/lib/DBM/Deep.pm 96.8 87.9 90.5 100.0 89.5 4.5 95.2 + blib/lib/DBM/Deep/Array.pm 100.0 94.3 100.0 100.0 100.0 4.8 98.7 + blib/lib/DBM/Deep/Engine.pm 97.2 86.4 86.0 100.0 0.0 56.8 91.0 + blib/lib/DBM/Deep/File.pm 98.1 83.3 66.7 100.0 0.0 31.4 88.0 + blib/lib/DBM/Deep/Hash.pm 100.0 100.0 100.0 100.0 100.0 2.5 100.0 + Total 97.7 88.1 86.6 100.0 31.6 100.0 93.0 + ---------------------------- ------ ------ ------ ------ ------ ------ ------ + +MORE INFORMATION + Check out the DBM::Deep Google Group at + or send email to + DBM-Deep@googlegroups.com. You can also visit #dbm-deep on irc.perl.org + + The source code repository is at + +MAINTAINER(S) + Rob Kinyon, rkinyon@cpan.org + + Originally written by Joseph Huckaby, jhuckaby@cpan.org + +CONTRIBUTORS + The following have contributed greatly to make DBM::Deep what it is + today: + + * Adam Sah and Rich Gaushell + * Stonehenge for sponsoring the 1.00 release + * Dan Golden and others at YAPC::NA 2006 for helping me design through + transactions. SEE ALSO perltie(1), Tie::Hash(3), Digest::MD5(3), Fcntl(3), flock(2), lockf(3), nfs(5), Digest::SHA256(3), Crypt::Blowfish(3), Compress::Zlib(3) LICENSE - Copyright (c) 2002-2005 Joseph Huckaby. All Rights Reserved. 
This is - free software, you may use it and distribute it under the same terms as - Perl itself. + Copyright (c) 2007 Rob Kinyon. All Rights Reserved. This is free + software, you may use it and distribute it under the same terms as Perl + itself. diff --git a/article.pod b/article.pod deleted file mode 100644 index 5441cef..0000000 --- a/article.pod +++ /dev/null @@ -1,282 +0,0 @@ -=head0 Adding transactions to DBM::Deep - -=head1 What is DBM::Deep? - -L is a module written completely in Perl that provides a way of -storing Perl datastructures (scalars, hashes, and arrays) on disk instead of -in memory. The datafile produced is able to be ftp'ed from one machine to -another, regardless of OS or Perl version. There are several reasons why -someone would want to do this. - -=over 4 - -=item * Transparent Persistence - -This is the ability to save a set of data structures to disk and retrieve them -later without the vast majority of the program even knowing that the data is -persisted. Furthermore, the datastructure is persisted immediately and not at -set marshalling periods. - -=item * Huge datastructures - -Normally, datastructures are limited by the size of RAM the server has. -L allows for the size a given datastructure to be limited by disk -instead. - -=item * IPC - -While not a common use, this allows for inter-process communication without -worrying about the specifics of how a given OS handles IPC. - -=back - -And, with the release of 1.00, there is now a fourth reason - -software-transactional memory, or STM -(L). - -=head1 What are transactions? - -Originally from the database world, a transaction is a way of isolating the -effects of a given set of actions, then applying them all at once. It's a way -of saying "I'm going to try the following steps, see if I like the result, -then I want everyone else looking at this datastore to see the results -immediately." The most common example is taken from banking. Let's say that an -application receives a request to have Joe pay Bob five zorkmids. Without -transactions, the application would take the money from Joe's account, then -add the money to Bob's account. But, what happens if the application crashes -after debiting Joe, but before crediting Bob? The application has made money -disappear. Or, vice versa, if Bob is credited before Joe is debited, the -application has created money. - -With a transaction wrapping the money transfer, if the application crashes in -the middle, it's as if the action never happened. So, when the application -recovers from the crash, Joe and Bob still have the same amount of money in -their accounts as they did before and the transaction can restart and Bob can -finally receive his zorkmids. - -More formally, transactions are generally considered to be proper when they are -ACID-compliant. ACID is an acronym that means the following: - -=over 4 - -=item * Atomic - -Either every change happens or none of the changes happen. - -=item * Consistent - -When the transaction begins and when it is committed, the database must be in -a legal state. This restriction doesn't apply to L very much. - -=item * Isolated - -As far as a transaction is concerned, it is the only thing running against the -database while it is running. Unlike most RDBMSes, L provides the -strongest isolation level possible. - -=item * Durable - -Once the database says that a comit has happened, the commit will be -guaranteed, regardless of whatever happens. - -=back - -=head1 Why add them to DBM::Deep? 
- -The ability to have actions occur in either I (as in the previous -example) or I from the rest of the users of the data is a powerful -thing. This allows for a certain amount of safety and predictability in how -data transformations occur. Imagine, for example, that you have a set of -calculations that will update various variables. However, there are some -situations that will cause you to throw away all results and start over with a -different seed. Without transactions, you would have to put everything into -temporary variables, then transfer the values when the calculations were found -to be successful. With STM, you start a transaction and do your thing within -it. If the calculations succeed, you commit. If they fail, you rollback and -try again. If you're thinking that this is very similar to how SVN or CVS -works, you're absolutely correct - they are transactional in the exact same -way. - -=head1 How it happened - -=head2 The backstory - -The addition of transactions to L has easily been the single most -complex software endeavor I've ever undertaken. The first step was to figure -out exactly how transactions were going to work. After several spikesN, the best design seemed to look to SVN -instead of relational databases. The more I investigated, the more I ran up -against the object-relational impedance mismatch -N, this -time in terms of being able to translate designs. In the relational world, -transactions are generally implemented either as row-level locks or using MVCC -N. Both of -these assume that there is a I, or singular object, that can be locked -transparently to everything else. This doesn't translate to a fractally -repeating structure like a hash or an array. - -However, the design used by SVN deals with directories and files which -corresponds very closely to hashes and hashkeys. In SVN, the modifications are -stored in the file's structure. Translating this to hashes and hashkeys, this -means that transactional information should be stored in the keys. This means -that the entire datafile is unaware of anything to do with transactions, except -for the key's data structure within the bucket. - -=head2 DBM::Deep's file structure - -L's file structure is a record-based structure. The key (or array -index - arrays are currently just funny hashes internally) is hashed using MD5 -and then stored in a cascade of Index and Bucketlist records. The bucketlist -record stores the actual key string and pointers to where the data records are -stored. The data records themselves are one of Null, Scalar, or Reference. -Null represents an I, Scalar represents a string (numbers are -stringified for simplicity) and are allocated in 256byte chunks. References -represent an array or hash reference and contains a pointer to an Index and -Bucketlist cascade of its own. - -=head2 Transactions in the keys - -The first pass was to expand the Bucketlist sector to go from a simple key / -datapointer mapping to a more complex key / transaction / datapointer mapping. -Initially, I interposed a Transaction record that the bucketlist pointed to. -That then contained the transaction / datapointer mapping. This had the -advantage of changing nothing except for adding one new sector type and the -handling for it. This was very quickly merged into the Bucketlist record to -simplify the resulting code. 
- -This first step got me to the point where I could pass the following test: - - my $db1 = DBM::Deep->new( $filename ); - my $db2 = DBM::Deep->new( $filename ); - - $db1->{abc} = 'foo'; - - is( $db1->{abc}, 'foo' ); - is( $db2->{abc}, 'foo' ); - - $db1->begin_work(); - - is( $db1->{abc}, 'foo' ); - is( $db2->{abc}, 'foo' ); - - $db1->{abc} = 'floober'; - - is( $db1->{abc}, 'floober' ); - is( $db2->{abc}, 'foo' ); - -Just that much was a major accomplishment. The first pass only looked in the -transaction's spot in the bucket for that key. And, that passed my first tests -because I didn't check that C<$db1-E{abc}> was still 'foo' I -modifying it in the transaction. To pass that test, the code for retrieval -needed to look first in the transaction's spot and if that spot had never been -assigned to, look at the spot for the HEAD. - -=head2 The concept of the HEAD - -This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest -revision checked into the repository. When you do a ocal modification, you're -doing a modification to the HEAD. Then, you choose to either check in your -code (commit()) or revert (rollback()). - -In L, I chose to make the HEAD transaction ID 0. This has several -benefits: - -=over 4 - -=item * Easy identifiaction of a transaction - -C will run the code if and only if we are in a running -transaction. - -=item * The HEAD is the first bucket - -In a given bucket, the HEAD is the first datapointer because we mutliply the -size of the transactional bookkeeping by the transaction ID to find the offset -to seek into the file. - -=back - -=head2 Protection from changes - -Let's assume that a transaction is running in one process and another process -is also modifying the same area in the data. The only way that process B can -notify process A that a change has occurred is through the common point - the -DBM file. Because of this, every process that changes the HEAD needs to -protect all currently running transactions by copying over the pointer to the -original value into every transaction which hasn't already modified this -entry. (If it has, then the new value shouldn't overwrite the transaction's -modification.) This is the key piece for providing I. - -=head2 Tracking modified buckets - -Rolling back changes is very simple - just don't apply them to the HEAD. The -next time that transaction ID is reused, the changes will be ignored (q.v. -L). Committing, however, requires that all the changes -must be transferred over from the bucket entry for the given transaction ID to -the entry for the HEAD. - -=head2 Deleted marker - -Transactions are performed copy-on-write. This means that if there isn't an -entry for that transaction, the HEAD is looked at. This doesn't work if a key -has been deleted within a transaction. So, the entry must be marked as deleted -within the transaction so that the HEAD isn't checekd. - -Likewise, when a new key is created in a transaction, the HEAD doesn't have an -entry for that key. Consider the following situation: - - ok( !exists $db1->{foo} ); - ok( !exists $db2->{foo} ); - - $db1->begin_work(); - $db1->{foo} = 'bar'; - - ok( !exists $db2->{foo} ); - -The entry for the HEAD for 'foo' needs to be marked as deleted so that -transactions which don't have 'foo' don't find something in the HEAD. - -=head2 Freespace management - -The second major piece to the 1.00 release was freespace management. In -pre-1.00 versions of L, the space used by deleted keys would not be -recycled. 
While always a requested feature, the complexity required to -implement freespace meant that it needed to wait for a complete rewrite of -several pieces, such as for transactions. - -Freespace is implemented by regularizing all the records so that L -only has three different record sizes - Index, BucketList, and Data. Each -record type has a fixed length based on various parameters the L -datafile is created with. (In order to accomodate values of various sizes, Data -records chain.) Whenever a sector is freed, it's added to a freelist of that -sector's size. Whenever a new sector is requested, the freelist is checked -first. If the freelist has a sector, it's reused, otherwise a new sector is -added to the end of the datafile. - -Freespace management did bring up another issue - staleness. It is possible to -have a pointer to a record in memory. If that record is deleted, then reused, -the pointer in memory has no way of determining that is was deleted and -readded vs. modified. So, a staleness counter was added which is incremented -every time the sector is reused through the freelist. If you then attempt to -access that stale record, L returns undef because, at some point, -the entry was deleted. - -=head2 Staleness counters - -Once it was implemented for freespace management, staleness counters proved to -be a very powerful concept for transactions themselves. Back in L, I mentioned that other processes modifying the HEAD will -protect all running transactions from their effects. This provides -I. But, the running transaction doesn't know about these entries. -If they're not cleaned up, they will be seen the next time a transaction uses -that transaction ID. - -By providing a staleness counter for transactions, the costs of cleaning up -finished transactions is deferred until the space is actually used again. This -is at the cost of having less-than-optimal space utilization. Changing this in -the future would be completely transparent to users, so I felt it was an -acceptable tradeoff for delivering working code quickly. - -=head1 Conclusion - -=cut diff --git a/articles/how_transactions_were_added.pod b/articles/how_transactions_were_added.pod new file mode 100644 index 0000000..851d25f --- /dev/null +++ b/articles/how_transactions_were_added.pod @@ -0,0 +1,593 @@ +=head0 Adding transactions to DBM::Deep + +For the past nine months, I've been working on adding transactions to +L. During that time, I looked far and wide for an +accessible description of how a programmer should go about implementing +transactions. The only things I found were either extremely pedantic academic +papers or the code of complex applications. The former weren't very easy to +read and the latter were less so N ever tried to read the source +for BDB or InnoDB? Reading perl's source is easier.>. This is the article I +wished I'd been able to read nine months ago. + +=head1 What is DBM::Deep? + +L is a module written completely in Perl that provides a way of +storing Perl datastructures (scalars, hashes, and arrays) on disk instead of +in memory. The datafile produced is able to be transferred from one machine to +another, regardless of OS or Perl version. There are several reasons why +someone would want to do this. + +=over 4 + +=item * Transparent Persistence + +This is the ability to save a set of datastructures to disk and retrieve them +later without the vast majority of the program ever knowing that the data is +persisted. 
Furthermore, the datastructure is persisted immediately and not at +set marshalling periods. + +=item * Huge datastructures + +Normally, datastructures are limited by the size of RAM the server has. +L allows for the size a given datastructure to be limited by disk +size instead (up to the given perl's largefile support). + +=item * Database + +Most programmers hear the word "database" and think "relational database +management system" (or RDBMS). A database is a more general term meaning +"place one stores data." This can be relational, object, or something else. +The software used to manage and query a database is a "database management +system" (DBMS). + +L provides one half of a DBMS - the data storage part. +Once the datastructures on disk, L provides the +capability to allow multiple processes to access the data. N + +=back + +=head1 How does DBM::Deep work? + +L works by tying a variable to a file on disk. Every +read and write go directly to the file and modify the file immediately. To +represent Perl's hashes and arrays, a record-based file format is used. There +is a file header storing file-wide values, such as the size of the internal +file pointers. Afterwards, there are the data records. + +The most important feature of L is that it can be +completely transparent. Other than the line tying the variable to the file, no +other part of your program needs to know that the variable being used isn't a +"normal" Perl variable. So, the following works just fine: + + # As far as the rest of the program is concerned, the following two lines + # are identical - they produce a variable $foo that can be used as a hashref. + # my $foo = {}; + my $foo = DBM::Deep->new( 'mydb.db' ); + + $foo->{bar} = 'baz'; + $foo->{complex} = [ + { a => 'b' }, 0, '123', undef, [ 1 .. 5 ], + ]; + + # And so on and so forth. + +=head2 DBM::Deep's file structure + +L's file structure is record-based. The key (or array +index - arrays are currently just funny hashes internally) is hashed using MD5 +and then stored in a cascade of Index and bucketlist records. The bucketlist +record stores the actual key string and pointers to where the data records are +stored. The data records themselves are one of Null, Scalar, or Reference. +Null represents an I, Scalar represents a string (numbers are +stringified internally for simplicity) and are allocated in fixed-size chunks. +Reference represent an array or hash reference and contains a pointer to an +Index and bucketlist cascade of its own. Reference will also store the class +the hash or array reference is blessed into, meaning that almost all objects +can be stored safely. + +=head2 DBM::Deep's class hierarchy + +Managing all of these functions takes a lot of different abstractions. There +are at least 3 different interfacing layers, and more if you look hard enough. +To manage this complexity, L uses the following abstractions: + +=over 4 + +=item * Tying classes + +These are the classes that manage the external face of the module. They manage +B of the interactions with outside code - OO interface, tying, and +various utility methods. If they cannot handle the request themselves, they +delegate to the engine. There are currently three classes in this layer. + +=item * Engine classes + +These classes manage the file format and all of the ways that the records +interact with each other. Nearly every call will make requests to the File +classes for reading and/or writing data to the file. 
There are currently nine +classes in this layer, including a class for each record type. + +=item * File class + +This class mediates all interaction with the file system. Every read, write, +lock, and unlock goes through this class. There is currently one class in this +layer. + +=item * Iterator classes + +These are introspection classes that provide iteration for hashes. They manage +keeping track of where the next key should be and how to get there. There are +currently four classes in this layer. + +=back + +=head1 Why add transactions to DBM::Deep? + +For the most part, L functions perfectly well without +transactions. Most uses that I've seen tend to be either single-process +read/write persistence or a multi-process readonly cache for an expensive, but +static, lookup. With transactions, L can now be used +safely for multi-process read/write persistence in situations that don't +need (or cannot use) a full RDBMS. + +=head2 What are transactions? + +Originally from the database world, a transaction is a way of isolating the +effects of a given set of actions, then applying them all at once. It's a way +of saying "I'm going to try the following steps, see if I like the result, +then I want everyone else looking at this datastore to see the results +immediately." The most common example is taken from banking. Let's say that an +application receives a request to have Joe pay Bob five zorkmids. Without +transactions, the application would take the money from Joe's account, then +add the money to Bob's account. But, what happens if the application crashes +after debiting Joe, but before crediting Bob? The application has made money +disappear. Or, vice versa, if Bob is credited before Joe is debited, the +application has created money. + +With a transaction wrapping the money transfer, if the application crashes in +the middle, it's as if the action never happened. So, when the application +recovers from the crash, Joe and Bob still have the same amount of money in +their accounts as they did before. The transaction can restart and Bob can +finally receive his zorkmids. + +More formally, transactions are generally considered to be proper when they are +ACID-compliant. ACID is an acronym that stands for the following: + +=over 4 + +=item * Atomic + +Either every change happens or none of the changes happen. + +=item * Consistent + +When the transaction begins and when it is committed, the database must be in +a legal state. This condition doesn't apply to L as all +Perl datastructures are internally consistent. + +=item * Isolated + +As far as a transaction is concerned, it is the only thing running against the +database while it is running. Unlike most RDBMSes, L +provides the strongest isolation level possible, usually called +I by most RDBMSes. + +=item * Durable + +Once the database says that a commit has happened, the commit will be +guaranteed, regardless of whatever happens. I chose to not implement this +condition in L N. + +=back + +=head2 Why add them to DBM::Deep? + +The ability to have actions occur in either I (as in the previous +example) or I from the rest of the users of the data is a powerful +thing. This allows for a large amount of safety and predictability in how +data transformations occur. Imagine, for example, that you have a set of +calculations that will update various variables. However, there are some +situations that will cause you to throw away all results and start over with a +different seed. 
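+
+Expressed with DBM::Deep's transaction API (begin_work(), commit(),
+rollback()), that workflow becomes a short sketch - the keys and the helper
+functions here are invented for illustration:
+
+  my $db = DBM::Deep->new( 'results.db' );
+  $db->{seed} = 42;                       # any starting seed
+
+  RETRY: {
+      $db->begin_work;
+
+      $db->{total}   = run_calculations( $db->{seed} );
+      $db->{derived} = $db->{total} * 2;
+
+      if ( results_are_bad( $db ) ) {
+          $db->rollback;                  # throw every change away
+          $db->{seed} = next_seed();      # start over with a different seed
+          redo RETRY;
+      }
+
+      $db->commit;                        # everyone else now sees the results
+  }
+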
Without transactions, you would have to put everything into +temporary variables, then transfer the values when the calculations were found +to be successful. If you ever add a new value or if a value is used in only +certain calculations, you may forget to do the correct thing. With +transactions, you start a transaction and do your thing within it. If the +calculations succeed, you commit. If they fail, you rollback and try again. + +If you're thinking that this is very similar to how Subversion (SVN) or CVS +works, you're absolutely correct - they are transactional in exactly the same +way. + +=head1 How it happened + +The addition of transactions to L has easily been the +single most complex software endeavor I've ever undertaken. While transactions +are conceptually simple, the devil is in the details. And there were a B +of details. + +=head2 The naive approach + +Initially, I hoped I could just copy the entire datastructure and mark it as +owned by the transaction. This is the most straightforward solution and is +extremely simple to implement. Whenever a transaction starts, copy the whole +thing over to somewhere else. If the transaction is committed, overwrite the +original with the transaction's version. If it's rolled back, throw it away. + +It's a popular solution as seen by the fact that it's the mechanism used in +both L and +L. While very simple to +implement, it scales very poorly as the datastructure grows. As one of the +primary usecases for L is working with huge +datastructures, this plan was dead on arrival. + +=head2 The relational approach + +As I'm also a MySQL DBA, I looked to how the InnoDB engine implements +transactions. Given that relational databases are designed to work with large +amounts of data, it made sense to look here next. + +InnoDB implements transactions using MVCC +N. When a +transaction starts, it stores a timestamp corresponding to its start time. +Whenever a modification to a row is committed, the modification is +timestamped. When a transaction modifies a row, it copies the row into its +own scratchpad and modifies it. Whenever a transaction reads a row, it first +attempts to read the row from its scratchpad. If it's not there, then it reads +the version of the row whose timestamp is no later than the timestamp of the +transaction. When committing, the transaction's scratchpad is written out to +the main data area with the timestamp of the commit and the scratchpad is +thrown away. When rolling back, the scratchpad is thrown out. + +At first, this mechanism looked promising and I whipped up a couple spikes +(or code explorations) to try it out. The problem I ran into, again, was the +existence of large datastructures. When making large changes to a relational +database within a transaction, the engine can store the rows within the actual +table and mark them as being part of a transaction's scratchpad. Perl's +fractal datastructures, however, don't lend themselves to this kind of +treatment. The scratchpad would, in some pathological cases, be a +near-complete copy of the original datastructure. N + +=head2 The subversive approach + +Despairing, I went to YAPC::NA::2006 hoping to discuss the problem with the +best minds in the Perl community. I was lucky enough to run into both Audrey +Tang (author of Pugs) and clkao (author of SVK). In between talks, I managed +to discuss the problems I'd run into with both of them. They looked at me +oddly and asked why I wasn't looking at Subversion (SVN) as a model for +transactions. 
My first reaction was "It's a source control application. What +does it know about transa- . . . Ohhhh!" And they smiled. + +Like Perl datastructures, a filesystem is fractal. Directories contain both +files and directories. Directories act as hashes and a files act as scalars +whose names are their hashkeys. When a modification is made to a SVN checkout, +SVN tracks the changes at the filename (or directory name) level. When a +commit is made, only those filenames which have changes are copied over to the +HEAD. Everything else remains untouched. + +Translating this to hashes and hashkeys, this implies that transactional +information should be stored at the level of the hashkey. Or, in +L terms, within the bucket for that key. As a nice +side-effect, other than the key's datastructure within the bucket, the entire +datafile is unaware of anything to do with transactions. + +=head2 The spike + +Spikes are kind of like a reconnaissance mission in the military. They go out +to get intel on the enemy and are explicitly not supposed to take any ground +or, in many cases, take out of the enemy forces. In coding terms, the spike is +code meant to explore a problemspace that you B throw away and +reimplement. + +As transactions were going to be between the bucket for the key and the +datapointer to the value, my first thought was to put in another sector that +would handle this mapping. This had the advantage of changing nothing except +for adding one new sector type and the handling for it. Just doing this got me +to the point where I could pass the following test: + + my $db1 = DBM::Deep->new( $filename ); + my $db2 = DBM::Deep->new( $filename ); + + $db1->{abc} = 'foo'; + + is( $db1->{abc}, 'foo' ); + is( $db2->{abc}, 'foo' ); + + $db1->begin_work(); + + $db1->{abc} = 'floober'; + + is( $db1->{abc}, 'floober' ); + is( $db2->{abc}, 'foo' ); + +Just that much was a major accomplishment. + +=head2 Tests, tests, and more tests + +I was lucky that when I took over L that Joe Huckaby +(the original author) handed me a comprehensive test suite. This meant that I +could add in transactions with a high degree of confidence that I hadn't +messed up non-transactional uses. The test suite was also invaluable for +working through the various situations that transactions can cause. + +But, a test is only as good as the test-writer. For example, it was a while +before I realized that I needed to test C{abc}, 'foo' )> +I modifying it in the transaction. + +To pass that test, the code for retrieval needed to look first in the +transaction's spot and if that spot had never been assigned to, look at the +spot for the HEAD. While this is how SVN works, it wasn't an immediately +obvious test to write. + +=head2 The HEAD + +In SVN, the HEAD revision is the latest revision checked into the repository. +When you do a local modification, you're doing a modification to your copy of +the HEAD. Then, you choose to either check in (C) or revert +(C) your changes. + +In order to make the code work for the base case (no transaction running), the +first entry in the transaction sector became the HEAD. Thus, it was assigned +transaction ID 0. This also had the really neat side-benefit that C will run the code if and only if L is +in a running transaction. + +=head2 Ending the spike + +At this point, I had learned everything I needed from the spike. Yes, the +SVN idea looked like it was going to work. Yes, there were a lot of squibbly +details. No, it wasn't going to be finished before I left YAPC::NA. 
*sigh*
+
+The biggest lessons learned from the spike were:
+
+=over 4
+
+=item 1 Tests are good
+
+I seem to have to relearn this every project I work on. It's pretty sad, if
+you ask me.
+
+=item 1 The transaction sector is superfluous
+
+As I worked with it, the transaction sector didn't add any information over
+extending the actual bucket to have the transaction to datapointer mapping
+within it.
+
+=back
+
+=head2 Protection from changes
+
+After the missed test for checking that starting a transaction didn't lose the
+connection to the HEAD, I started writing more and more tests, being very anal
+about what I was checking. I wrote tests to check every piece of state I could
+think of before and after every change in state, regardless of where the
+change was made. Writing these tests immediately raised an issue with changing
+the HEAD while a transaction is running. If the transaction has already edited
+that key, it already has its new value. However, if it doesn't, it needs to be
+protected from the change to the HEAD. This is the key piece for providing
+I.
+
+My first attempt to solve this problem focused on having the transaction
+itself detect changes. But, the primary usecase for transactions is that each
+transaction is going to be running in a separate process. Without implementing
+IPC, the only common point between various processes is the datafile itself.
+The only process aware of the change is the process making the change. Even
+though it seemed counter-intuitive, the only sane mechanism was that each
+process modifying the HEAD would also protect all running transactions from
+its change, if needed.
+
+=head2 Committing and rolling back
+
+Now that changes are able to be made within a transaction and the transaction,
+the HEAD, and other transactions are protected from one another, the next step
+was to provide the ability to both commit and rollback these changes.
+
+=head3 Rollback
+
+Conceptually, rolling back should be the simpler to implement - just discard
+the changes that have been made within the transaction and continue onward
+with the HEAD. And, for the first attempt, that is exactly what I did. This
+meant that the following test would pass:
+
+  $db->{foo} = 'bar';
+
+  $db->begin_work;
+
+  is( $db->{foo}, 'bar' );
+
+  $db->{foo} = 'baz';
+
+  is( $db->{foo}, 'baz' );
+
+  $db->rollback;
+
+  is( $db->{foo}, 'bar' );
+
+But, this didn't work out very well for repeated use of that transaction slot.
+I threw a number of solutions at the problem, but none of them were
+addressing the real issue - knowing which use of a transaction ID made the
+change vs. which use of a transaction ID was accessing the value.
+
+XXX
+
+=head3 Committing
+
+Committing is much harder than rolling back. The primary difficulty lies in
+tracking exactly what this transaction has changed in order to copy those
+changed bucket entries over to the HEAD. The good news is that only the actual
+datapointers for that transaction need to be copied over - the actual data
+sectors are left untouched.
+
+The key to the solution lay in the decoupled nature of the code I was writing
+along with the fact that every piece of the code had access to the engine
+object, if needed. Committing (and rolling back) are both handled by the
+Engine object. To get that information into the engine, each bucket modified
+by the transaction would inform the engine object that it had been modified by
+that transaction.
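+
+Conceptually, that bookkeeping is nothing more than a per-transaction set of
+touched locations. A rough sketch of the idea (this is not the actual
+DBM::Deep::Engine code):
+
+  my %entries;   # trans_id => { bucket_location => 1, ... }
+
+  sub add_entry {
+      my ($trans_id, $location) = @_;
+      $entries{$trans_id}{$location} = 1;
+  }
+
+  sub clear_entries {
+      my ($trans_id) = @_;
+      delete $entries{$trans_id};
+  }
+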
+When a commit occurs, the engine object iterates over the modified buckets
+and transfers over the new datapointer and discards the old one.
+
+=head2 Deleted marker
+
+After some more tests, a final edge-case was found. Transactions are performed
+copy-on-write. This means that if there isn't an entry for that transaction,
+the HEAD is looked at. This doesn't work if a key has been deleted within a
+transaction. So, the entry must be marked as deleted within the transaction so
+that the HEAD isn't checked.
+
+Likewise, when a new key is created in a transaction, the HEAD doesn't have an
+entry for that key. Consider the following situation:
+
+  ok( !exists $db1->{foo} );
+  ok( !exists $db2->{foo} );
+
+  $db1->begin_work();
+  $db1->{foo} = 'bar';
+
+  ok( !exists $db2->{foo} );
+
+The entry for the HEAD for 'foo' needs to be marked as deleted so that
+transactions which don't have 'foo' don't find something in the HEAD. To add
+this, I originally used a separate flag for each datapointer to indicate if it
+had been marked as deleted or not. I quickly recognized that a data-pointer
+can never have a value of 0 or 1 as those would point to the first and second
+bytes of the datafile, respectively. As these are part of the header, those
+are nonsensical values, so can be re-used for metadata. 0 now means "This
+slot has never been written to" and 1 means "This slot has been explicitly
+deleted."
+
+=head2 Freespace management
+
+Pre-1.0000 versions of L didn't have any form of
+freespace management. This meant that whenever a value was deleted, the old
+value just sat around taking up space, even though it would never be accessed
+again. While barely acceptable for non-transactional uses, this made
+transactions unusable because transactions, as I've implemented them, are
+predicated on the concept of parallel values that are (presumably) cleaned up
+after the transaction is done with them.
+
+Freespace had never been added before because it requires a different file
+format than the one used in the pre-1.0000 versions. Because I had to change
+the file format anyways B I needed the feature, adding freespace now
+seemed like a good plan.
+
+Freespace was implemented by regularizing all the records so that
+L only has three different record sizes - Index,
+BucketList, and Data. Each record type has a fixed length based on various
+parameters the L datafile is created with. (In order to
+accommodate values of various sizes, Data records chain.) Whenever the engine
+is finished with a sector, it is freed and added to a list of free sectors of
+that sector's size. Whenever a new sector is requested, the freelist is
+checked first. If the freelist has a sector, it's reused, otherwise a new
+sector is added to the end of the datafile.
+
+Just like everything else, I wrote a mess of tests for adding freespace
+management. One of the tests I thought up was the following:
+
+  $db->{foo} = [ 1 .. 3];
+  my $arr = $db->{foo};
+
+  is( $arr->[1], 2 ); # This always has worked.
+
+  delete $db->{foo};
+
+  isnt( $arr->[1], 2 );
+
+If this was a Perl datastructure, the last test should pass. In the past, that
+test would fail. The key concept I realized was that the C<$arr> variable is
+pointing to a stale area in memory. So, by adding a staleness counter that is
+incremented whenever the sector in use is deleted, I am able to determine if
+the variable in question is looking for a stale version of the sector. At this
+point, L returns undef because, at some point, the entry
+was deleted.
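+
+A toy model of that check (not the real sector code - the structures here are
+invented to show the idea):
+
+  my %sector = ( 100 => { staleness => 0, data => 'foo' } );
+
+  sub take_ref {
+      my ($off) = @_;
+      # Remember the counter that was current when the reference was taken.
+      return { offset => $off, staleness => $sector{$off}{staleness} };
+  }
+
+  sub free_sector {
+      my ($off) = @_;
+      $sector{$off}{staleness}++;       # bump the counter when the sector is freed
+      delete $sector{$off}{data};
+  }
+
+  sub deref {
+      my ($ref) = @_;
+      my $sec = $sector{ $ref->{offset} };
+      # A mismatched counter means the sector was freed (and possibly
+      # reused) after this reference was taken, so the reference is stale.
+      return undef if $sec->{staleness} != $ref->{staleness};
+      return $sec->{data};
+  }
+
+  my $ref = take_ref(100);
+  print deref($ref);                                     # 'foo'
+  free_sector(100);
+  print defined deref($ref) ? 'still there' : 'stale';   # 'stale'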
+ +=head2 Transactional staleness counters + +Once it was implemented for freespace management, staleness counters proved to +be a very powerful concept for transactions themselves. Back in L, I mentioned that other processes modifying the HEAD will +protect all running transactions from their effects. This provides +I. But, the running transaction doesn't know about these entries. +This is both a benefit and a drawback. It's a benefit that it makes tracking +modified buckets very simple (q.v. L). But, it means that changes +made to protect the transaction are not tracked. If they're not cleaned up, +they will be seen the next time a transaction uses that transaction ID. + +By providing a staleness counter for transactions, the costs of cleaning up +finished transactions is deferred until the space is actually used again. This +is at the cost of having less-than-optimal space utilization. Changing this in +the future would be completely transparent to users, so I felt it was an +acceptable tradeoff for quick delivery of a functional product. + +=head2 Fiddly bits + +At this point, all the major pieces were in place. All that was left was to +get all the fiddly bits into place. This included handling the behavior of +C, simultaneous transactions with commits and rollbacks in various +order, and making sure that transactions played nicely when a new Index sector +needed to be created due to reindexing a full Bucketlist sector. Of these, +C was the hardest. This is when I actually implemented the Iterator classes +to handle walking the index/bucketlist chain. + +=head1 The future + +Basic transactions are only the first step. There are several features that +can be added on top of what's been provided. If and in which order any of +these are implemented is completely up to user-feedback. (Note: these are +advanced topics - I cannot be held responsible for any bleeding ears.) + +=head2 Conflict resolution + +Conflict, in this context, is what happens when two transactions run +simultaneously and change the same piece of data. Like most relational databases, +L uses a very simplistic form of conflict resolution - +last commit wins. This works quite well for a row-based RDBMS, but doesn't work +as well for fractal structures like hashes. + +Contrast this with how Subversion handles conflict. It tracks when each +transaction was started. If the HEAD was changed after the transaction +started, the commit is rejected. It is up to the developer to pull in the +latest changes, mediate any conflicts, and then recommit. There are several +other ways to handle conflict resolution, many of which can be pulled from +Haskell's use of Software Transactional Memory (STM). + +=head2 Checkpoints + +A transaction may have several steps within it. The first three may succeed, +but the fourth might fail. Instead of rolling back all the way to the +beginning, you might want to rollback to the last successful step and try +again. This is a checkpoint. Most RDBMSes provide them and they aren't very +difficult, conceptually, but I've seen just how "easy" some features can be +once you start really exploring the problemspace. + +=head2 Sub-transactions + +Similar to L, sub-transactions provide a mechanism for trying +something within a transaction without affecting the transaction. However, +instead of saying "These steps are safely finished," sub-transactions still +provides for the ability to rollback the primary transaction all the way. 
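+
+If they were implemented, usage might look something like the following -
+purely hypothetical, since nested begin_work() calls are not supported in
+1.0000 and the helper functions are invented:
+
+  $db->begin_work;              # outer transaction
+  $db->{step1} = do_step_1();
+
+  $db->begin_work;              # inner (sub-)transaction
+  $db->{step2} = do_step_2();
+  $db->rollback;                # throws away step2 only
+
+  $db->rollback;                # still possible: throws away step1 as well
+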
+Sub-transactions can also work better with libraries that may want to use +transactions themselves. + +This, of all the features listed, is the one I'm most interested in +implementing next. + +=head2 Durability + +As mentioned in L, the 'D' in ACID stands for +I. L does not satisfy that criterion because +durability almost always requires another file (or files) for a commit log. I +deemed this unacceptable for this release because one of the +L's features is the single datafile. To be honest, I +don't anticipate this to be an issue for most users because the niche that +L occupies is one that is tolerant to failure and a small +chance of potential dataloss. + +However, Berkley DB does provide durability with only a single file. If it +becomes necessary, cribbing from them could be an option. + +=cut diff --git a/lib/DBM/Deep.pm b/lib/DBM/Deep.pm index 2d00dca..7a6bbcd 100644 --- a/lib/DBM/Deep.pm +++ b/lib/DBM/Deep.pm @@ -1,40 +1,11 @@ package DBM::Deep; -## -# DBM::Deep -# -# Description: -# Multi-level database module for storing hash trees, arrays and simple -# key/value pairs into FTP-able, cross-platform binary database files. -# -# Type `perldoc DBM::Deep` for complete documentation. -# -# Usage Examples: -# my %db; -# tie %db, 'DBM::Deep', 'my_database.db'; # standard tie() method -# -# my $db = new DBM::Deep( 'my_database.db' ); # preferred OO method -# -# $db->{my_scalar} = 'hello world'; -# $db->{my_hash} = { larry => 'genius', hashes => 'fast' }; -# $db->{my_array} = [ 1, 2, 3, time() ]; -# $db->{my_complex} = [ 'hello', { perl => 'rules' }, 42, 99 ]; -# push @{$db->{my_array}}, 'another value'; -# my @key_list = keys %{$db->{my_hash}}; -# print "This module " . $db->{my_complex}->[1]->{perl} . "!\n"; -# -# Copyright: -# (c) 2002-2006 Joseph Huckaby. All Rights Reserved. -# This program is free software; you can redistribute it and/or -# modify it under the same terms as Perl itself. -## - use 5.006_000; use strict; use warnings; -our $VERSION = q(0.99_04); +our $VERSION = q(1.0000); use Fcntl qw( :flock ); @@ -245,11 +216,8 @@ sub import { #XXX but that's a lot more thinking than I want to do right now. eval { local $SIG{'__DIE__'}; - $self->begin_work; $self->_import( Clone::clone( $struct ) ); - $self->commit; }; if ( my $e = $@ ) { - $self->rollback; die $e; } diff --git a/lib/DBM/Deep.pod b/lib/DBM/Deep.pod index 2312d95..c519335 100644 --- a/lib/DBM/Deep.pod +++ b/lib/DBM/Deep.pod @@ -202,9 +202,9 @@ from the values stored in the datafile's header. =item * num_txns -This is the maximum number of transactions that can be running at one time. The -default is two - the HEAD and one for imports. The minimum is two and the -maximum is 255. The more transactions, the larger and quicker the datafile grows. +This is the number of transactions that can be running at one time. The +default is one - the HEAD. The minimum is one and the maximum is 255. The more +transactions, the larger and quicker the datafile grows. See L below. @@ -212,8 +212,25 @@ See L below. This is the number of entries that can be added before a reindexing. The larger this number is made, the larger a file gets, but the better performance you will -have. The default and minimum number this can be is 16. There is no maximum, but -more than 32 isn't recommended. +have. The default and minimum number this can be is 16. The maximum is 256, but +more than 64 isn't recommended. + +=item * data_sector_size + +This is the size in bytes of a given data sector. 
Data sectors will chain, so +a value of any size can be stored. However, chaining is expensive in terms of +time. Setting this value to something close to the expected common length of +your scalars will improve your performance. If it is too small, your file will +have a lot of chaining. If it is too large, your file will have a lot of dead +space in it. + +The default for this is 64 bytes. The minimum value is 32 and the maximum is +256 bytes. + +B There are between 6 and 10 bytes taken up in each data sector for +bookkeeping. (It's 4 + the number of bytes in your L.) This is +included within the data_sector_size, thus the effective value is 6-10 bytes +less than what you specified. =item * pack_size @@ -224,15 +241,16 @@ are: =item * small -This uses 2-byte offsets, allowing for a maximum file size of 65K +This uses 2-byte offsets, allowing for a maximum file size of 65 KB. =item * medium (default) -This uses 4-byte offsets, allowing for a maximum file size of 2G. +This uses 4-byte offsets, allowing for a maximum file size of 4 GB. =item * large -This uses 8-byte offsets, allowing for a maximum file size of 16XB (exabytes). +This uses 8-byte offsets, allowing for a maximum file size of 16 XB +(exabytes). This can only be enabled if your Perl is compiled for 64-bit. =back @@ -597,12 +615,6 @@ B Make sure your existing structure has no circular references in it. These will cause an infinite loop when importing. There are plans to fix this in a later release. -B With the addition of transactions, importing is performed within a -transaction, then immediately committed upon success (and rolled back upon -failre). As a result, you cannot call C from within a transaction. -This restriction will be lifted when subtransactions are added in a future -release. - =head2 Exporting Calling the C method on an existing DBM::Deep object will return @@ -780,7 +792,7 @@ failure. You can wrap calls in an eval block to catch the die. =head1 LARGEFILE SUPPORT If you have a 64-bit system, and your Perl is compiled with both LARGEFILE -and 64-bit support, you I be able to create databases larger than 2 GB. +and 64-bit support, you I be able to create databases larger than 4 GB. DBM::Deep by default uses 32-bit file offset tags, but these can be changed by specifying the 'pack_size' parameter when constructing the file. @@ -802,9 +814,9 @@ the file's header and cannot be changed for the life of the file. These parameters are per-file, meaning you can access 32-bit and 64-bit files, as you choose. -B We have not personally tested files larger than 2 GB -- all my +B We have not personally tested files larger than 4 GB -- all my systems have only a 32-bit Perl. However, I have received user reports that -this does indeed work! +this does indeed work. =head1 LOW-LEVEL ACCESS @@ -924,12 +936,6 @@ the transaction. Transactions in DBM::Deep are done using a variant of the MVCC method, the same method used by the InnoDB MySQL engine. -=head2 Software-Transactional Memory - -The addition of transactions to this module provides the basis for STM within -Perl 5. Contention is resolved using a default last-write-wins. Currently, -this default cannot be changed, but it will be addressed in a future version. - =head1 PERFORMANCE Because DBM::Deep is a conncurrent datastore, every change is flushed to disk @@ -972,6 +978,16 @@ DBM::Deep to use 16bit addresses, meaning that the seek times will be less. =back +=head1 MIGRATION + +As of 1.0000, the file format has changed. 
Furthermore, DBM::Deep is now +designed to potentially change file format between point-releases, if needed to +support a requested feature. To aid in this, a migration script is provided +within the CPAN distribution called C. + +B This script is not installed onto your system because it carries a copy +of every version prior to the current version. + =head1 TODO The following are items that are planned to be added in future releases. These @@ -1152,11 +1168,11 @@ B report on this distribution's test suite. File stmt bran cond sub pod time total ---------------------------- ------ ------ ------ ------ ------ ------ ------ blib/lib/DBM/Deep.pm 96.8 87.9 90.5 100.0 89.5 4.5 95.2 - blib/lib/DBM/Deep/Array.pm 100.0 94.3 100.0 100.0 100.0 4.9 98.7 - blib/lib/DBM/Deep/Engine.pm 96.9 85.2 79.7 100.0 0.0 58.2 90.3 - blib/lib/DBM/Deep/File.pm 99.0 88.9 77.8 100.0 0.0 30.0 90.3 - blib/lib/DBM/Deep/Hash.pm 100.0 100.0 100.0 100.0 100.0 2.4 100.0 - Total 97.6 87.9 84.0 100.0 32.1 100.0 92.8 + blib/lib/DBM/Deep/Array.pm 100.0 94.3 100.0 100.0 100.0 4.8 98.7 + blib/lib/DBM/Deep/Engine.pm 97.2 86.4 86.0 100.0 0.0 56.8 91.0 + blib/lib/DBM/Deep/File.pm 98.1 83.3 66.7 100.0 0.0 31.4 88.0 + blib/lib/DBM/Deep/Hash.pm 100.0 100.0 100.0 100.0 100.0 2.5 100.0 + Total 97.7 88.1 86.6 100.0 31.6 100.0 93.0 ---------------------------- ------ ------ ------ ------ ------ ------ ------ =head1 MORE INFORMATION @@ -1167,21 +1183,26 @@ irc.perl.org The source code repository is at L -=head1 MAINTAINER(S) +=head1 MAINTAINERS Rob Kinyon, L Originally written by Joseph Huckaby, L +=head1 SPONSORS + +Stonehenge Consulting (L) sponsored the +developement of transactions and freespace management, leading to the 1.0000 +release. A great debt of gratitude goes out to them for their continuing +leadership in and support of the Perl community. + =head1 CONTRIBUTORS The following have contributed greatly to make DBM::Deep what it is today: =over 4 -=item * Adam Sah and Rich Gaushell - -=item * Stonehenge for sponsoring the 1.00 release +=item * Adam Sah and Rich Gaushell for innumerable contributions early on. =item * Dan Golden and others at YAPC::NA 2006 for helping me design through transactions. @@ -1195,7 +1216,7 @@ Digest::SHA256(3), Crypt::Blowfish(3), Compress::Zlib(3) =head1 LICENSE Copyright (c) 2007 Rob Kinyon. All Rights Reserved. -This is free software, you may use it and distribute it under the -same terms as Perl itself. +This is free software, you may use it and distribute it under the same terms +as Perl itself. =cut diff --git a/lib/DBM/Deep/Array.pm b/lib/DBM/Deep/Array.pm index 3d5804f..438756b 100644 --- a/lib/DBM/Deep/Array.pm +++ b/lib/DBM/Deep/Array.pm @@ -5,7 +5,7 @@ use 5.006_000; use strict; use warnings; -our $VERSION = '0.99_04'; +our $VERSION = q(1.0000); # This is to allow DBM::Deep::Array to handle negative indices on # its own. Otherwise, Perl would intercept the call to negative diff --git a/lib/DBM/Deep/Engine.pm b/lib/DBM/Deep/Engine.pm index a491892..663a0f0 100644 --- a/lib/DBM/Deep/Engine.pm +++ b/lib/DBM/Deep/Engine.pm @@ -3,8 +3,9 @@ package DBM::Deep::Engine; use 5.006_000; use strict; +use warnings; -our $VERSION = q(0.99_04); +our $VERSION = q(1.0000); use Scalar::Util (); @@ -17,7 +18,6 @@ use Scalar::Util (); # Setup file and tag signatures. These should never change. 
sub SIG_FILE () { 'DPDB' } sub SIG_HEADER () { 'h' } -sub SIG_INTERNAL () { 'i' } sub SIG_HASH () { 'H' } sub SIG_ARRAY () { 'A' } sub SIG_NULL () { 'N' } @@ -25,13 +25,13 @@ sub SIG_DATA () { 'D' } sub SIG_INDEX () { 'I' } sub SIG_BLIST () { 'B' } sub SIG_FREE () { 'F' } -sub SIG_KEYS () { 'K' } sub SIG_SIZE () { 1 } -sub STALE_SIZE () { 1 } + +my $STALE_SIZE = 2; # Please refer to the pack() documentation for further information my %StP = ( - 1 => 'C', # Unsigned char value (no order specified, presumably ASCII) + 1 => 'C', # Unsigned char value (no order needed as it's just one byte) 2 => 'n', # Unsigned short in "network" (big-endian) order 4 => 'N', # Unsigned long in "network" (big-endian) order 8 => 'Q', # Usigned quad (no order specified, presumably machine-dependent) @@ -50,13 +50,17 @@ sub new { hash_size => 16, # In bytes hash_chars => 256, # Number of chars the algorithm uses per byte max_buckets => 16, - num_txns => 2, # HEAD plus 1 additional transaction for importing + num_txns => 1, # The HEAD trans_id => 0, # Default to the HEAD + data_sector_size => 64, # Size in bytes of each data sector + entries => {}, # This is the list of entries for transactions storage => undef, }, $class; + # Never allow byte_size to be set directly. + delete $args->{byte_size}; if ( defined $args->{pack_size} ) { if ( lc $args->{pack_size} eq 'small' ) { $args->{byte_size} = 2; @@ -78,15 +82,26 @@ sub new { $self->{$param} = $args->{$param}; } - ## - # Number of buckets per blist before another level of indexing is - # done. Increase this value for slightly greater speed, but larger database - # files. DO NOT decrease this value below 16, due to risk of recursive - # reindex overrun. - ## - if ( $self->{max_buckets} < 16 ) { - warn "Floor of max_buckets is 16. Setting it to 16 from '$self->{max_buckets}'\n"; - $self->{max_buckets} = 16; + my %validations = ( + max_buckets => { floor => 16, ceil => 256 }, + num_txns => { floor => 1, ceil => 255 }, + data_sector_size => { floor => 32, ceil => 256 }, + ); + + while ( my ($attr, $c) = each %validations ) { + if ( !defined $self->{$attr} + || !length $self->{$attr} + || $self->{$attr} =~ /\D/ + || $self->{$attr} < $c->{floor} + ) { + $self->{$attr} = '(undef)' if !defined $self->{$attr}; + warn "Floor of $attr is $c->{floor}. Setting it to $c->{floor} from '$self->{$attr}'\n"; + $self->{$attr} = $c->{floor}; + } + elsif ( $self->{$attr} > $c->{ceil} ) { + warn "Ceiling of $attr is $c->{ceil}. Setting it to $c->{ceil} from '$self->{$attr}'\n"; + $self->{$attr} = $c->{ceil}; + } } if ( !$self->{digest} ) { @@ -345,12 +360,18 @@ sub begin_work { } my @slots = $self->read_txn_slots; - for my $i ( 1 .. @slots ) { + my $found; + for my $i ( 0 .. 
$#slots ) { next if $slots[$i]; + $slots[$i] = 1; - $self->set_trans_id( $i ); + $self->set_trans_id( $i + 1 ); + $found = 1; last; } + unless ( $found ) { + DBM::Deep->_throw_error( "Cannot allocate transaction ID" ); + } $self->write_txn_slots( @slots ); if ( !$self->trans_id ) { @@ -375,7 +396,8 @@ sub rollback { my $read_loc = $entry + $self->hash_size + $self->byte_size - + $self->trans_id * ( $self->byte_size + 4 ); + + $self->byte_size + + ($self->trans_id - 1) * ( $self->byte_size + $STALE_SIZE ); my $data_loc = $self->storage->read_at( $read_loc, $self->byte_size ); $data_loc = unpack( $StP{$self->byte_size}, $data_loc ); @@ -389,7 +411,7 @@ sub rollback { $self->clear_entries; my @slots = $self->read_txn_slots; - $slots[$self->trans_id] = 0; + $slots[$self->trans_id-1] = 0; $self->write_txn_slots( @slots ); $self->inc_txn_staleness_counter( $self->trans_id ); $self->set_trans_id( 0 ); @@ -413,14 +435,16 @@ sub commit { my $head_loc = $self->storage->read_at( $base, $self->byte_size ); $head_loc = unpack( $StP{$self->byte_size}, $head_loc ); + + my $spot = $base + $self->byte_size + ($self->trans_id - 1) * ( $self->byte_size + $STALE_SIZE ); my $trans_loc = $self->storage->read_at( - $base + $self->trans_id * ( $self->byte_size + 4 ), $self->byte_size, + $spot, $self->byte_size, ); $self->storage->print_at( $base, $trans_loc ); $self->storage->print_at( - $base + $self->trans_id * ( $self->byte_size + 4 ), - pack( $StP{$self->byte_size} . ' N', (0) x 2 ), + $spot, + pack( $StP{$self->byte_size} . ' ' . $StP{$STALE_SIZE}, (0) x 2 ), ); if ( $head_loc > 1 ) { @@ -431,7 +455,7 @@ sub commit { $self->clear_entries; my @slots = $self->read_txn_slots; - $slots[$self->trans_id] = 0; + $slots[$self->trans_id-1] = 0; $self->write_txn_slots( @slots ); $self->inc_txn_staleness_counter( $self->trans_id ); $self->set_trans_id( 0 ); @@ -441,24 +465,27 @@ sub commit { sub read_txn_slots { my $self = shift; - return split '', unpack( 'b32', + my $bl = $self->txn_bitfield_len; + my $num_bits = $bl * 8; + return split '', unpack( 'b'.$num_bits, $self->storage->read_at( - $self->trans_loc, 4, + $self->trans_loc, $bl, ) ); } sub write_txn_slots { my $self = shift; + my $num_bits = $self->txn_bitfield_len * 8; $self->storage->print_at( $self->trans_loc, - pack( 'b32', join('', @_) ), + pack( 'b'.$num_bits, join('', @_) ), ); } sub get_running_txn_ids { my $self = shift; my @transactions = $self->read_txn_slots; - my @trans_ids = grep { $transactions[$_] } 0 .. $#transactions; + my @trans_ids = map { $_+1} grep { $transactions[$_] } 0 .. 
$#transactions; } sub get_txn_staleness_counter { @@ -468,13 +495,12 @@ sub get_txn_staleness_counter { # Hardcode staleness of 0 for the HEAD return 0 unless $trans_id; - my $x = unpack( 'N', + return unpack( $StP{$STALE_SIZE}, $self->storage->read_at( - $self->trans_loc + 4 * $trans_id, + $self->trans_loc + 4 + $STALE_SIZE * ($trans_id - 1), 4, ) ); - return $x; } sub inc_txn_staleness_counter { @@ -485,8 +511,8 @@ sub inc_txn_staleness_counter { return unless $trans_id; $self->storage->print_at( - $self->trans_loc + 4 * $trans_id, - pack( 'N', $self->get_txn_staleness_counter( $trans_id ) + 1 ), + $self->trans_loc + 4 + $STALE_SIZE * ($trans_id - 1), + pack( $StP{$STALE_SIZE}, $self->get_txn_staleness_counter( $trans_id ) + 1 ), ); } @@ -530,32 +556,41 @@ sub clear_entries { { my $header_fixed = length( SIG_FILE ) + 1 + 4 + 4; + my $this_file_version = 2; sub _write_file_header { my $self = shift; - my $header_var = 1 + 1 + 1 + 4 + 4 * $self->num_txns + 3 * $self->byte_size; + my $nt = $self->num_txns; + my $bl = $self->txn_bitfield_len; + + my $header_var = 1 + 1 + 1 + 1 + $bl + $STALE_SIZE * ($nt - 1) + 3 * $self->byte_size; my $loc = $self->storage->request_space( $header_fixed + $header_var ); $self->storage->print_at( $loc, SIG_FILE, SIG_HEADER, - pack('N', 1), # header version - at this point, we're at 9 bytes - pack('N', $header_var), # header size + pack('N', $this_file_version), # At this point, we're at 9 bytes + pack('N', $header_var), # header size # --- Above is $header_fixed. Below is $header_var pack('C', $self->byte_size), - pack('C', $self->max_buckets), - pack('C', $self->num_txns), - pack('N', 0 ), # Transaction activeness bitfield - pack('N' . $self->num_txns, 0 x $self->num_txns ), # Transaction staleness counters + + # These shenanigans are to allow a 256 within a C + pack('C', $self->max_buckets - 1), + pack('C', $self->data_sector_size - 1), + + pack('C', $nt), + pack('C' . $bl, 0 ), # Transaction activeness bitfield + pack($StP{$STALE_SIZE}.($nt-1), 0 x ($nt-1) ), # Transaction staleness counters pack($StP{$self->byte_size}, 0), # Start of free chain (blist size) pack($StP{$self->byte_size}, 0), # Start of free chain (data size) pack($StP{$self->byte_size}, 0), # Start of free chain (index size) ); - $self->set_trans_loc( $header_fixed + 3 ); - $self->set_chains_loc( $header_fixed + 3 + 4 + 4 * $self->num_txns ); + #XXX Set these less fragilely + $self->set_trans_loc( $header_fixed + 4 ); + $self->set_chains_loc( $header_fixed + 4 + $bl + $STALE_SIZE * ($nt-1) ); return; } @@ -566,7 +601,7 @@ sub clear_entries { my $buffer = $self->storage->read_at( 0, $header_fixed ); return unless length($buffer); - my ($file_signature, $sig_header, $header_version, $size) = unpack( + my ($file_signature, $sig_header, $file_version, $size) = unpack( 'A4 A N N', $buffer ); @@ -577,29 +612,43 @@ sub clear_entries { unless ( $sig_header eq SIG_HEADER ) { $self->storage->close; - DBM::Deep->_throw_error( "Old file version found." ); + DBM::Deep->_throw_error( "Pre-1.00 file version found" ); + } + + unless ( $file_version == $this_file_version ) { + $self->storage->close; + DBM::Deep->_throw_error( + "Wrong file version found - " . $file_version . + " - expected " . 
$this_file_version + ); } my $buffer2 = $self->storage->read_at( undef, $size ); - my @values = unpack( 'C C C', $buffer2 ); + my @values = unpack( 'C C C C', $buffer2 ); - if ( @values != 3 || grep { !defined } @values ) { + if ( @values != 4 || grep { !defined } @values ) { $self->storage->close; DBM::Deep->_throw_error("Corrupted file - bad header"); } - $self->set_trans_loc( $header_fixed + scalar(@values) ); - $self->set_chains_loc( $header_fixed + scalar(@values) + 4 + 4 * $self->num_txns ); - #XXX Add warnings if values weren't set right - @{$self}{qw(byte_size max_buckets num_txns)} = @values; + @{$self}{qw(byte_size max_buckets data_sector_size num_txns)} = @values; - my $header_var = scalar(@values) + 4 + 4 * $self->num_txns + 3 * $self->byte_size; + # These shenangians are to allow a 256 within a C + $self->{max_buckets} += 1; + $self->{data_sector_size} += 1; + + my $bl = $self->txn_bitfield_len; + + my $header_var = scalar(@values) + $bl + $STALE_SIZE * ($self->num_txns - 1) + 3 * $self->byte_size; unless ( $size == $header_var ) { $self->storage->close; DBM::Deep->_throw_error( "Unexpected size found ($size <-> $header_var)." ); } + $self->set_trans_loc( $header_fixed + scalar(@values) ); + $self->set_chains_loc( $header_fixed + scalar(@values) + $bl + $STALE_SIZE * ($self->num_txns - 1) ); + return length($buffer) + length($buffer2); } } @@ -677,9 +726,9 @@ sub _add_free_sector { # Increment staleness. # XXX Can this increment+modulo be done by "&= 0x1" ? - my $staleness = unpack( $StP{STALE_SIZE()}, $storage->read_at( $offset + SIG_SIZE, STALE_SIZE ) ); - $staleness = ($staleness + 1 ) % ( 2 ** ( 8 * STALE_SIZE ) ); - $storage->print_at( $offset + SIG_SIZE, pack( $StP{STALE_SIZE()}, $staleness ) ); + my $staleness = unpack( $StP{$STALE_SIZE}, $storage->read_at( $offset + SIG_SIZE, $STALE_SIZE ) ); + $staleness = ($staleness + 1 ) % ( 2 ** ( 8 * $STALE_SIZE ) ); + $storage->print_at( $offset + SIG_SIZE, pack( $StP{$STALE_SIZE}, $staleness ) ); my $old_head = $storage->read_at( $self->chains_loc + $chains_offset, $self->byte_size ); @@ -688,7 +737,7 @@ sub _add_free_sector { ); # Record the old head in the new sector after the signature and staleness counter - $storage->print_at( $offset + SIG_SIZE + STALE_SIZE, $old_head ); + $storage->print_at( $offset + SIG_SIZE + $STALE_SIZE, $old_head ); } sub _request_blist_sector { shift->_request_sector( 0, @_ ) } @@ -716,10 +765,10 @@ sub _request_sector { } # Read the new head after the signature and the staleness counter - my $new_head = $self->storage->read_at( $loc + SIG_SIZE + STALE_SIZE, $self->byte_size ); + my $new_head = $self->storage->read_at( $loc + SIG_SIZE + $STALE_SIZE, $self->byte_size ); $self->storage->print_at( $self->chains_loc + $chains_offset, $new_head ); $self->storage->print_at( - $loc + SIG_SIZE + STALE_SIZE, + $loc + SIG_SIZE + $STALE_SIZE, pack( $StP{$self->byte_size}, 0 ), ); @@ -735,6 +784,20 @@ sub hash_chars { $_[0]{hash_chars} } sub num_txns { $_[0]{num_txns} } sub max_buckets { $_[0]{max_buckets} } sub blank_md5 { chr(0) x $_[0]->hash_size } +sub data_sector_size { $_[0]{data_sector_size} } + +# This is a calculated value +sub txn_bitfield_len { + my $self = shift; + unless ( exists $self->{txn_bitfield_len} ) { + my $temp = ($self->num_txns) / 8; + if ( $temp > int( $temp ) ) { + $temp = int( $temp ) + 1; + } + $self->{txn_bitfield_len} = $temp; + } + return $self->{txn_bitfield_len}; +} sub trans_id { $_[0]{trans_id} } sub set_trans_id { $_[0]{trans_id} = $_[1] } @@ -928,7 +991,7 @@ sub type { 
$_[0]{type} } sub base_size { my $self = shift; - return $self->engine->SIG_SIZE + $self->engine->STALE_SIZE; + return $self->engine->SIG_SIZE + $STALE_SIZE; } sub free { @@ -953,15 +1016,15 @@ package DBM::Deep::Engine::Sector::Data; our @ISA = qw( DBM::Deep::Engine::Sector ); # This is in bytes -sub size { return 256 } +sub size { $_[0]{engine}->data_sector_size } sub free_meth { return '_add_free_data_sector' } sub clone { my $self = shift; return ref($self)->new({ engine => $self->engine, - data => $self->data, type => $self->type, + data => $self->data, }); } @@ -990,7 +1053,7 @@ sub _init { my $engine = $self->engine; unless ( $self->offset ) { - my $data_section = $self->size - $self->base_size - 1 * $engine->byte_size - 1; + my $data_section = $self->size - $self->base_size - $engine->byte_size - 1; $self->{offset} = $engine->_request_data_sector( $self->size ); @@ -1141,8 +1204,8 @@ sub _init { } $self->{staleness} = unpack( - $StP{$e->STALE_SIZE}, - $e->storage->read_at( $self->offset + $e->SIG_SIZE, $e->STALE_SIZE ), + $StP{$STALE_SIZE}, + $e->storage->read_at( $self->offset + $e->SIG_SIZE, $STALE_SIZE ), ); return; @@ -1524,7 +1587,7 @@ sub bucket_size { unless ( $self->{bucket_size} ) { my $e = $self->engine; # Key + head (location) + transactions (location + staleness-counter) - my $location_size = $e->byte_size + $e->num_txns * ( $e->byte_size + 4 ); + my $location_size = $e->byte_size + $e->byte_size + ($e->num_txns - 1) * ($e->byte_size + $STALE_SIZE); $self->{bucket_size} = $e->hash_size + $location_size; } return $self->{bucket_size}; @@ -1637,13 +1700,21 @@ sub write_md5 { my $loc = $spot + $engine->hash_size - + $engine->byte_size - + $args->{trans_id} * ( $engine->byte_size + 4 ); + + $engine->byte_size; - $engine->storage->print_at( $loc, - pack( $StP{$engine->byte_size}, $args->{value}->offset ), - pack( 'N', $engine->get_txn_staleness_counter( $args->{trans_id} ) ), - ); + if ( $args->{trans_id} ) { + $loc += $engine->byte_size + ($args->{trans_id} - 1) * ( $engine->byte_size + $STALE_SIZE ); + + $engine->storage->print_at( $loc, + pack( $StP{$engine->byte_size}, $args->{value}->offset ), + pack( $StP{$STALE_SIZE}, $engine->get_txn_staleness_counter( $args->{trans_id} ) ), + ); + } + else { + $engine->storage->print_at( $loc, + pack( $StP{$engine->byte_size}, $args->{value}->offset ), + ); + } } sub mark_deleted { @@ -1660,13 +1731,22 @@ sub mark_deleted { my $loc = $spot + $engine->hash_size - + $engine->byte_size - + $args->{trans_id} * ( $engine->byte_size + 4 ); + + $engine->byte_size; + + if ( $args->{trans_id} ) { + $loc += $engine->byte_size + ($args->{trans_id} - 1) * ( $engine->byte_size + $STALE_SIZE ); + + $engine->storage->print_at( $loc, + pack( $StP{$engine->byte_size}, 1 ), # 1 is the marker for deleted + pack( $StP{$STALE_SIZE}, $engine->get_txn_staleness_counter( $args->{trans_id} ) ), + ); + } + else { + $engine->storage->print_at( $loc, + pack( $StP{$engine->byte_size}, 1 ), # 1 is the marker for deleted + ); + } - $engine->storage->print_at( $loc, - pack( $StP{$engine->byte_size}, 1 ), # 1 is the marker for deleted - pack( 'N', $engine->get_txn_staleness_counter( $args->{trans_id} ) ), - ); } sub delete_md5 { @@ -1714,22 +1794,27 @@ sub get_data_location_for { my $spot = $self->offset + $self->base_size + $args->{idx} * $self->bucket_size + $e->hash_size - + $e->byte_size - + $args->{trans_id} * ( $e->byte_size + 4 ); + + $e->byte_size; + + if ( $args->{trans_id} ) { + $spot += $e->byte_size + ($args->{trans_id} - 1) * ( $e->byte_size + 
$STALE_SIZE ); + } my $buffer = $e->storage->read_at( $spot, - $e->byte_size + 4, + $e->byte_size + $STALE_SIZE, ); - my ($loc, $staleness) = unpack( $StP{$e->byte_size} . ' N', $buffer ); - - # We have found an entry that is old, so get rid of it - if ( $staleness != (my $s = $e->get_txn_staleness_counter( $args->{trans_id} ) ) ) { - $e->storage->print_at( - $spot, - pack( $StP{$e->byte_size} . ' N', (0) x 2 ), - ); - $loc = 0; + my ($loc, $staleness) = unpack( $StP{$e->byte_size} . ' ' . $StP{$STALE_SIZE}, $buffer ); + + if ( $args->{trans_id} ) { + # We have found an entry that is old, so get rid of it + if ( $staleness != (my $s = $e->get_txn_staleness_counter( $args->{trans_id} ) ) ) { + $e->storage->print_at( + $spot, + pack( $StP{$e->byte_size} . ' ' . $StP{$STALE_SIZE}, (0) x 2 ), + ); + $loc = 0; + } } # If we're in a transaction and we never wrote to this location, try the @@ -1798,6 +1883,7 @@ sub _init { return $self; } +#XXX Change here sub size { my $self = shift; unless ( $self->{size} ) { diff --git a/lib/DBM/Deep/File.pm b/lib/DBM/Deep/File.pm index 84a98d4..02276f0 100644 --- a/lib/DBM/Deep/File.pm +++ b/lib/DBM/Deep/File.pm @@ -5,7 +5,7 @@ use 5.006_000; use strict; use warnings; -our $VERSION = q(0.99_04); +our $VERSION = q(1.0000); use Fcntl qw( :DEFAULT :flock :seek ); @@ -50,7 +50,14 @@ sub open { # Adding O_BINARY should remove the need for the binmode below. However, # I'm not going to remove it because I don't have the Win32 chops to be # absolutely certain everything will be ok. - my $flags = O_RDWR | O_CREAT | O_BINARY; + my $flags = O_CREAT | O_BINARY; + + if ( !-e $self->{file} || -w _ ) { + $flags |= O_RDWR; + } + else { + $flags |= O_RDONLY; + } my $fh; sysopen( $fh, $self->{file}, $flags ) diff --git a/lib/DBM/Deep/Hash.pm b/lib/DBM/Deep/Hash.pm index fb27097..e3de270 100644 --- a/lib/DBM/Deep/Hash.pm +++ b/lib/DBM/Deep/Hash.pm @@ -5,7 +5,7 @@ use 5.006_000; use strict; use warnings; -our $VERSION = q(0.99_04); +our $VERSION = q(1.0000); use base 'DBM::Deep'; diff --git a/lib/DBM/Deep/Internals.pod b/lib/DBM/Deep/Internals.pod index 71a1b7e..b5b0ff2 100644 --- a/lib/DBM/Deep/Internals.pod +++ b/lib/DBM/Deep/Internals.pod @@ -57,17 +57,22 @@ This is the tagging of the file header. The file used by versions prior to =item * Version -This is four bytes containing the header version. This lets the header change over time. +This is four bytes containing the file version. This lets the file format change over time. + +=item * Constants + +These are the file-wide constants that determine how the file is laid out. +They can only be set upon file creation. =item * Transaction information The current running transactions are stored here, as is the next transaction ID. -=item * Constants +=item * Freespace information -These are the file-wide constants that determine how the file is laid out. -They can only be set upon file creation. +Pointers into the next free sectors of the various sector sizes (Index, +Bucketlist, and Data) are stored here. =back @@ -148,7 +153,7 @@ increasing your memeory usage at all. DBM::Deep is I/O-bound, pure and simple. The faster your disk, the faster DBM::Deep will be. Currently, when performing C{foo}>, there are a minimum of 4 seeks and 1332 + N bytes read (where N is the length of your -data). (All values assume a medium filesize.) The actions take are: +data). (All values assume a medium filesize.) 
The actions taken are: =over 4 diff --git a/t/06_error.t b/t/06_error.t index ea39773..c8775e8 100644 --- a/t/06_error.t +++ b/t/06_error.t @@ -3,23 +3,25 @@ ## $|++; use strict; -use Test::More tests => 6; +use Test::More tests => 23; use Test::Exception; +use Test::Warn; use t::common qw( new_fh ); use_ok( 'DBM::Deep' ); -my ($fh, $filename) = new_fh(); - -## # test a corrupted file -## -open FH, ">$filename"; -print FH 'DPDB'; -close FH; -throws_ok { - DBM::Deep->new( $filename ); -} qr/DBM::Deep: Old file version found/, "Fail if there's a bad header"; +{ + my ($fh, $filename) = new_fh(); + + open FH, ">$filename"; + print FH 'DPDB'; + close FH; + + throws_ok { + DBM::Deep->new( $filename ); + } qr/DBM::Deep: Pre-1.00 file version found/, "Fail if there's a bad header"; +} { my ($fh, $filename) = new_fh(); @@ -52,3 +54,84 @@ throws_ok { DBM::Deep->new( file => $filename, type => DBM::Deep->TYPE_HASH ) } qr/DBM::Deep: File type mismatch/, "Fail if we try and open an array file with a hash"; } + +{ + my %floors = ( + max_buckets => 16, + num_txns => 1, + data_sector_size => 32, + ); + + while ( my ($attr, $floor) = each %floors ) { + { + my ($fh, $filename) = new_fh(); + warning_like { + my $db = DBM::Deep->new( + file => $filename, + $attr => undef, + ); + } qr{Floor of $attr is $floor\. Setting it to $floor from '\Q(undef)\E'}, + "Warning for $attr => undef is correct"; + } + { + my ($fh, $filename) = new_fh(); + warning_like { + my $db = DBM::Deep->new( + file => $filename, + $attr => '', + ); + } qr{Floor of $attr is $floor\. Setting it to $floor from ''}, + "Warning for $attr => '' is correct"; + } + { + my ($fh, $filename) = new_fh(); + warning_like { + my $db = DBM::Deep->new( + file => $filename, + $attr => 'abcd', + ); + } qr{Floor of $attr is $floor\. Setting it to $floor from 'abcd'}, + "Warning for $attr => 'abcd' is correct"; + } + { + my ($fh, $filename) = new_fh(); + my $val = $floor - 1; + warning_like { + my $db = DBM::Deep->new( + file => $filename, + $attr => $val, + ); + } qr{Floor of $attr is $floor\. Setting it to $floor from '$val'}, + "Warning for $attr => $val is correct"; + } + } + + my %ceilings = ( + max_buckets => 256, + num_txns => 255, + data_sector_size => 256, + ); + + while ( my ($attr, $ceiling) = each %ceilings ) { + my ($fh, $filename) = new_fh(); + warning_like { + my $db = DBM::Deep->new( + file => $filename, + $attr => 1000, + ); + } qr{Ceiling of $attr is $ceiling\. 
Setting it to $ceiling from '1000'}, + "Warning for $attr => 1000 is correct"; + } +} + +{ + throws_ok { + DBM::Deep->new( 't/etc/db-0-983' ); + } qr/DBM::Deep: Pre-1.00 file version found/, "Fail if opening a pre-1.00 file"; +} + +{ + throws_ok { + DBM::Deep->new( 't/etc/db-0-99_04' ); + } qr/DBM::Deep: Wrong file version found - 1 - expected 2/, "Fail if opening a file version 1"; +} diff --git a/t/13_setpack.t b/t/13_setpack.t index fe8be0f..9b468b4 100644 --- a/t/13_setpack.t +++ b/t/13_setpack.t @@ -77,7 +77,7 @@ my ($default, $small, $medium, $large); SKIP: { skip "Largefile support is not compiled into $^X", 3 - if 1; #unless $Config{ uselargefile }; + unless $Config{ use64bitall }; my ($fh, $filename) = new_fh(); { diff --git a/t/17_import.t b/t/17_import.t index 204be66..7792b6d 100644 --- a/t/17_import.t +++ b/t/17_import.t @@ -109,13 +109,16 @@ use_ok( 'DBM::Deep' ); }; like( $@, qr/Storage of references of type 'SCALAR' is not supported/, 'Error message correct' ); - cmp_deeply( - $db, - noclass({ - foo => 'bar', - }), - "Everything matches", - ); + TODO: { + local $TODO = "Importing cannot occur within a transaction yet."; + cmp_deeply( + $db, + noclass({ + foo => 'bar', + }), + "Everything matches", + ); + } } __END__ diff --git a/t/35_transaction_multiple.t b/t/35_transaction_multiple.t index 901b5c0..11261fd 100644 --- a/t/35_transaction_multiple.t +++ b/t/35_transaction_multiple.t @@ -49,7 +49,7 @@ $db1->{bar} = 'foo'; ok( exists $db1->{bar}, "After DB1 set bar to foo, DB1's bar exists" ); ok( !exists $db2->{bar}, "After DB1 set bar to foo, DB2's bar doesn't exist" ); ok( !exists $db3->{bar}, "After DB1 set bar to foo, DB3's bar doesn't exist" ); - + $db2->begin_work; is( $db1->{foo}, 'bar2', "After DB2 transaction begin, DB1's foo is still bar2" ); diff --git a/t/38_data_sector_size.t b/t/38_data_sector_size.t new file mode 100644 index 0000000..8414066 --- /dev/null +++ b/t/38_data_sector_size.t @@ -0,0 +1,103 @@ +## +# DBM::Deep Test +## +use strict; +use Test::More tests => 8; + +use t::common qw( new_fh ); + +use_ok( 'DBM::Deep' ); + +my %sizes; + +{ + my ($fh, $filename) = new_fh(); + { + my $db = DBM::Deep->new( + file => $filename, + data_sector_size => 32, + ); + + do_stuff( $db ); + } + + $sizes{32} = -s $filename; + + { + my $db = DBM::Deep->new( $filename ); + verify( $db ); + } +} + +{ + my ($fh, $filename) = new_fh(); + { + my $db = DBM::Deep->new( + file => $filename, + data_sector_size => 64, + ); + + do_stuff( $db ); + } + + $sizes{64} = -s $filename; + + { + my $db = DBM::Deep->new( $filename ); + verify( $db ); + } +} + +{ + my ($fh, $filename) = new_fh(); + { + my $db = DBM::Deep->new( + file => $filename, + data_sector_size => 128, + ); + + do_stuff( $db ); + } + + $sizes{128} = -s $filename; + + { + my $db = DBM::Deep->new( $filename ); + verify( $db ); + } +} + +{ + my ($fh, $filename) = new_fh(); + { + my $db = DBM::Deep->new( + file => $filename, + data_sector_size => 256, + ); + + do_stuff( $db ); + } + + $sizes{256} = -s $filename; + + { + my $db = DBM::Deep->new( $filename ); + verify( $db ); + } +} + +cmp_ok( $sizes{256}, '>', $sizes{128}, "Filesize for 256 > filesize for 128" ); +cmp_ok( $sizes{128}, '>', $sizes{64}, "Filesize for 128 > filesize for 64" ); +cmp_ok( $sizes{64}, '>', $sizes{32}, "Filesize for 64 > filesize for 32" ); + +sub do_stuff { + my ($db) = @_; + + $db->{foo}{bar} = [ 1 .. 
3 ]; +} + +sub verify { + my ($db) = @_; + + cmp_ok( $db->{foo}{bar}[2], '==', 3, "Correct value found" ); +} diff --git a/t/38_transaction_add_item.todo b/t/38_transaction_add_item.todo deleted file mode 100644 index 4306e1b..0000000 --- a/t/38_transaction_add_item.todo +++ /dev/null @@ -1,69 +0,0 @@ -use strict; -use Test::More tests => 9; -use Test::Deep; -use t::common qw( new_fh ); - -use_ok( 'DBM::Deep' ); - -my ($fh, $filename) = new_fh(); -my $db = DBM::Deep->new( - file => $filename, - locking => 1, - autoflush => 1, - num_txns => 16, -); - -{ - my $obj = bless { - foo => 5, - }, 'Foo'; - - cmp_ok( $obj->{foo}, '==', 5, "FOO is 5 in the object" ); - ok( !exists $obj->{bar}, "BAR doesn't exist in the object" ); - - $db->begin_work; - - $db->{foo} = $obj; - $db->{foo}{bar} = 1; - - cmp_ok( $db->{foo}{bar}, '==', 1, "The value is visible within the transaction" ); - cmp_ok( $obj->{bar}, '==', 1, "The value is visible within the object" ); - - $db->rollback; - -TODO: { - local $TODO = "Adding items in transactions will be fixed soon"; - local $^W; - cmp_ok( $obj->{foo}, '==', 5 ); -} - ok( !exists $obj->{bar}, "bar doesn't exist" ); -TODO: { - local $TODO = "Adding items in transactions will be fixed soon"; - ok( !tied(%$obj), "And it's not tied" ); -} - - ok( !exists $db->{foo}, "The transaction inside the DB works" ); -} - -__END__ -{ - my $obj = bless { - foo => 5, - }, 'Foo'; - - cmp_ok( $obj->{foo}, '==', 5 ); - ok( !exists $obj->{bar} ); - - $db->begin_work; - - $db->{foo} = $obj; - $db->{foo}{bar} = 1; - - cmp_ok( $db->{foo}{bar}, '==', 1, "The value is visible within the transaction" ); - cmp_ok( $obj->{bar}, '==', 1, "The value is visible within the object" ); - - $db->commit; - - cmp_ok( $obj->{foo}, '==', 5 ); - ok( !exists $obj->{bar} ); -} diff --git a/t/40_freespace.t b/t/40_freespace.t index bc8216d..7b0645d 100644 --- a/t/40_freespace.t +++ b/t/40_freespace.t @@ -75,7 +75,8 @@ use_ok( 'DBM::Deep' ); # we wrote this dreck ... my $size = -s $filename; - my $expected = $size + 9 * ( 256 + 256 ); + my $data_sector_size = $db->_engine->data_sector_size; + my $expected = $size + 9 * ( 2 * $data_sector_size ); $db->{ $_ } = undef for 5 .. 17; diff --git a/t/43_transaction_maximum.t b/t/43_transaction_maximum.t new file mode 100644 index 0000000..e8462b3 --- /dev/null +++ b/t/43_transaction_maximum.t @@ -0,0 +1,38 @@ +use strict; +use Test::More; +use Test::Deep; +use Test::Exception; +use t::common qw( new_fh ); + +use DBM::Deep; + +my $max_txns = 255; + +my ($fh, $filename) = new_fh(); + +my @dbs = grep { $_ } map { + eval { + DBM::Deep->new( + file => $filename, + num_txns => $max_txns, + ); + }; +} 1 .. $max_txns; + +my $num = $#dbs; + +plan tests => do { + my $n = $num + 1; + 2 * $n; +}; + +my %trans_ids; +for my $n (0 .. 
$num) { + lives_ok { + $dbs[$n]->begin_work + } "DB $n can begin_work"; + + my $trans_id = $dbs[$n]->_engine->trans_id; + ok( !exists $trans_ids{ $trans_id }, "DB $n has a unique transaction ID ($trans_id)" ); + $trans_ids{ $trans_id } = $n; +} diff --git a/t/44_upgrade_db.t b/t/44_upgrade_db.t new file mode 100644 index 0000000..b48d1be --- /dev/null +++ b/t/44_upgrade_db.t @@ -0,0 +1,163 @@ +$|++; +use strict; +use Test::More; + +# Add skips here +BEGIN { + my @failures; + eval { use Pod::Usage; }; push @failures, 'Pod::Usage' if $@; + eval { use IO::Scalar; }; push @failures, 'IO::Scalar' if $@; + if ( @failures ) { + my $missing = join ',', @failures; + plan skip_all => "'$missing' must be installed to run these tests"; + } +} + +plan tests => 116; + +use t::common qw( new_fh ); +use File::Spec; +use Test::Deep; + +my $PROG = File::Spec->catfile( qw( utils upgrade_db.pl ) ); + +my $short = get_pod( $PROG, 0 ); +my $long = get_pod( $PROG, 1 ); + +is( run_prog( $PROG ), "Missing required parameters.\n$long", "Failed no params" ); +is( run_prog( $PROG, '-input foo' ), "Missing required parameters.\n$long", "Failed only -input" ); +is( run_prog( $PROG, '-output foo' ), "Missing required parameters.\n$long", "Failed only -output" ); +is( + run_prog( $PROG, '-input foo', '-output foo' ), + "Cannot use the same filename for both input and output.\n$short", + "Failed same name", +); + +is( + run_prog( $PROG, '-input foo', '-output bar' ), + "'foo' is not a file.\n$short", + "Failed input does not exist", +); + +my (undef, $input_filename) = new_fh(); +my (undef, $output_filename) = new_fh(); + +is( + run_prog( $PROG, "-input $input_filename", "-output $output_filename" ), + "'$input_filename' is not a DBM::Deep file.\n$short", + "Input is not a DBM::Deep file", +); + +# All files are of the form: +# $db->{foo} = [ 1 .. 3 ]; + +my @input_files = ( + '0-983', + '0-99_04', + '1-0000', +); + +my @output_versions = ( + '0.91', '0.92', '0.93', '0.94', '0.95', '0.96', '0.97', '0.98', + '0.981', '0.982', '0.983', + '0.99_01', '0.99_02', '0.99_03', '0.99_04', + '1.00', '1.000', '1.0000', +); + +foreach my $input_filename ( + map { + File::Spec->catfile( qw( t etc ), "db-$_" ) + } @input_files +) { + # chmod it writable because old DBM::Deep versions don't handle readonly + # files correctly. This is fixed in DBM::Deep 1.0000 + chmod 0600, $input_filename; + + foreach my $v ( @output_versions ) { + my (undef, $output_filename) = new_fh(); + my $output = run_prog( + $PROG, + "-input $input_filename", + "-output $output_filename", + "-version $v", + ); + + if ( $input_filename =~ /_/ ) { + is( + $output, "'$input_filename' is a dev release and not supported.\n$short", + "Input file is a dev release - not supported", + ); + + next; + } + + if ( $v =~ /_/ ) { + is( + $output, "-version '$v' is a dev release and not supported.\n$short", + "Output version is a dev release - not supported", + ); + + next; + } + + # Now, read the output file with the right version. + ok( !$output, "A successful run produces no output" ); + die "$output\n" if $output; + + my $db; + if ( $v =~ /^0/ ) { + push @INC, File::Spec->catdir( 'utils', 'lib' ); + eval "use DBM::Deep::09830"; + $db = DBM::Deep::09830->new( $output_filename ); + } + elsif ( $v =~ /^1/ ) { + push @INC, 'lib'; + eval "use DBM::Deep"; + $db = DBM::Deep->new( $output_filename ); + } + else { + die "How did we get here?!\n"; + } + + ok( $db, "Writing to version $v made a file" ); + + cmp_deeply( + $db->export, + { foo => [ 1 .. 
3 ] }, + "We can read the output file", + ); + } +} + +################################################################################ + +#XXX This needs to be made OS-portable +sub run_prog { + open( my $fh, '-|', "@_ 2>&1" ) + or die "Cannot launch '@_' as a piped filehandle: $!\n"; + return join '', <$fh>; +} + +# In 5.8, we could use in-memory filehandles and have done: +# open( my $fh, '>', \my $pod ) or die "Cannot open in-memory filehandle: $!\n"; +# ... +# return $pod; +# However, DBM::Deep requires 5.6, so this set of contortions will have to do. +sub get_pod { + my ($p,$v) = @_; + + my ($fh, $fn) = new_fh(); + close $fh; + + open $fh, '>', $fn; + pod2usage({ + -input => $p, + -output => $fh, + -verbose => $v, + -exitval => 'NOEXIT', + }); + close $fh; + + open $fh, '<', $fn; + return join '', <$fh>; +} diff --git a/t/common.pm b/t/common.pm index 3b4958c..2348cb9 100644 --- a/t/common.pm +++ b/t/common.pm @@ -27,7 +27,7 @@ sub new_fh { return ($fh, $filename); } -#END{<>} + 1; __END__ diff --git a/t/etc/db-0-983 b/t/etc/db-0-983 new file mode 100644 index 0000000..25a921e Binary files /dev/null and b/t/etc/db-0-983 differ diff --git a/t/etc/db-0-99_04 b/t/etc/db-0-99_04 new file mode 100644 index 0000000..9268167 Binary files /dev/null and b/t/etc/db-0-99_04 differ diff --git a/t/etc/db-1-0000 b/t/etc/db-1-0000 new file mode 100644 index 0000000..5dc981d Binary files /dev/null and b/t/etc/db-1-0000 differ diff --git a/t_attic/36_verybighash.t b/t_attic/36_verybighash.t new file mode 100644 index 0000000..4f0bcc3 --- /dev/null +++ b/t_attic/36_verybighash.t @@ -0,0 +1,46 @@ +# This test was contributed by Fedor Soreks, Jan 2007. + +use strict; +use Test::More; + +plan skip_all => "You must set \$ENV{LONG_TESTS} >= 2 to run the superlong tests" + unless $ENV{LONG_TESTS} && $ENV{LONG_TESTS} >= 2; + +use Test::Deep; +use t::common qw( new_fh ); + +plan tests => 2; + +use_ok( 'DBM::Deep' ); + +diag "This test can take up to several hours to run. Please be VERY patient."; + +my ($fh, $filename) = new_fh(); +my $db = DBM::Deep->new( + file => $filename, + type => DBM::Deep->TYPE_HASH, +); + +my $gigs = 2; + +## +# put/get many keys +## +my $max_keys = 4_000_000; +my $max_record_keys = 10; + +for my $key_no ( 0 .. $max_keys ) { + for my $rec_no ( 0 .. $max_record_keys ) { + $db->{"key_$key_no"}{"rec_key_$rec_no"} = "rec_val_$rec_no"; + } + + my $s = -s $filename; + print "$key_no: $s\n"; + + if ( $s > $gigs * 2**30) { + fail "DB file ($filename) size exceeds $gigs GB"; + exit; + } +} + +ok( 1, "We passed the test!" ); diff --git a/t/TODO b/t_attic/TODO similarity index 100% rename from t/TODO rename to t_attic/TODO diff --git a/utils/lib/DBM/Deep/09830.pm b/utils/lib/DBM/Deep/09830.pm new file mode 100644 index 0000000..4022e58 --- /dev/null +++ b/utils/lib/DBM/Deep/09830.pm @@ -0,0 +1,2112 @@ +package DBM::Deep::09830; + +## +# DBM::Deep +# +# Description: +# Multi-level database module for storing hash trees, arrays and simple +# key/value pairs into FTP-able, cross-platform binary database files. +# +# Type `perldoc DBM::Deep` for complete documentation. 
+# +# Usage Examples: +# my %db; +# tie %db, 'DBM::Deep', 'my_database.db'; # standard tie() method +# +# my $db = new DBM::Deep( 'my_database.db' ); # preferred OO method +# +# $db->{my_scalar} = 'hello world'; +# $db->{my_hash} = { larry => 'genius', hashes => 'fast' }; +# $db->{my_array} = [ 1, 2, 3, time() ]; +# $db->{my_complex} = [ 'hello', { perl => 'rules' }, 42, 99 ]; +# push @{$db->{my_array}}, 'another value'; +# my @key_list = keys %{$db->{my_hash}}; +# print "This module " . $db->{my_complex}->[1]->{perl} . "!\n"; +# +# Copyright: +# (c) 2002-2006 Joseph Huckaby. All Rights Reserved. +# This program is free software; you can redistribute it and/or +# modify it under the same terms as Perl itself. +## + +use strict; + +use Fcntl qw( :DEFAULT :flock :seek ); +use Digest::MD5 (); +use Scalar::Util (); + +use vars qw( $VERSION ); +$VERSION = q(0.983); + +## +# Set to 4 and 'N' for 32-bit offset tags (default). Theoretical limit of 4 GB per file. +# (Perl must be compiled with largefile support for files > 2 GB) +# +# Set to 8 and 'Q' for 64-bit offsets. Theoretical limit of 16 XB per file. +# (Perl must be compiled with largefile and 64-bit long support) +## +#my $LONG_SIZE = 4; +#my $LONG_PACK = 'N'; + +## +# Set to 4 and 'N' for 32-bit data length prefixes. Limit of 4 GB for each key/value. +# Upgrading this is possible (see above) but probably not necessary. If you need +# more than 4 GB for a single key or value, this module is really not for you :-) +## +#my $DATA_LENGTH_SIZE = 4; +#my $DATA_LENGTH_PACK = 'N'; +our ($LONG_SIZE, $LONG_PACK, $DATA_LENGTH_SIZE, $DATA_LENGTH_PACK); + +## +# Maximum number of buckets per list before another level of indexing is done. +# Increase this value for slightly greater speed, but larger database files. +# DO NOT decrease this value below 16, due to risk of recursive reindex overrun. +## +my $MAX_BUCKETS = 16; + +## +# Better not adjust anything below here, unless you're me :-) +## + +## +# Setup digest function for keys +## +our ($DIGEST_FUNC, $HASH_SIZE); +#my $DIGEST_FUNC = \&Digest::MD5::md5; + +## +# Precalculate index and bucket sizes based on values above. +## +#my $HASH_SIZE = 16; +my ($INDEX_SIZE, $BUCKET_SIZE, $BUCKET_LIST_SIZE); + +set_digest(); +#set_pack(); +#_precalc_sizes(); + +## +# Setup file and tag signatures. These should never change. +## +sub SIG_FILE () { 'DPDB' } +sub SIG_HASH () { 'H' } +sub SIG_ARRAY () { 'A' } +sub SIG_NULL () { 'N' } +sub SIG_DATA () { 'D' } +sub SIG_INDEX () { 'I' } +sub SIG_BLIST () { 'B' } +sub SIG_SIZE () { 1 } + +## +# Setup constants for users to pass to new() +## +sub TYPE_HASH () { SIG_HASH } +sub TYPE_ARRAY () { SIG_ARRAY } + +sub _get_args { + my $proto = shift; + + my $args; + if (scalar(@_) > 1) { + if ( @_ % 2 ) { + $proto->_throw_error( "Odd number of parameters to " . (caller(1))[2] ); + } + $args = {@_}; + } + elsif ( ref $_[0] ) { + unless ( eval { local $SIG{'__DIE__'}; %{$_[0]} || 1 } ) { + $proto->_throw_error( "Not a hashref in args to " . (caller(1))[2] ); + } + $args = $_[0]; + } + else { + $args = { file => shift }; + } + + return $args; +} + +sub new { + ## + # Class constructor method for Perl OO interface. + # Calls tie() and returns blessed reference to tied hash or array, + # providing a hybrid OO/tie interface. + ## + my $class = shift; + my $args = $class->_get_args( @_ ); + + ## + # Check if we want a tied hash or array. 
+ ## + my $self; + if (defined($args->{type}) && $args->{type} eq TYPE_ARRAY) { + $class = 'DBM::Deep::09830::Array'; + #require DBM::Deep::09830::Array; + tie @$self, $class, %$args; + } + else { + $class = 'DBM::Deep::09830::Hash'; + #require DBM::Deep::09830::Hash; + tie %$self, $class, %$args; + } + + return bless $self, $class; +} + +sub _init { + ## + # Setup $self and bless into this class. + ## + my $class = shift; + my $args = shift; + + # These are the defaults to be optionally overridden below + my $self = bless { + type => TYPE_HASH, + base_offset => length(SIG_FILE), + }, $class; + + foreach my $param ( keys %$self ) { + next unless exists $args->{$param}; + $self->{$param} = delete $args->{$param} + } + + # locking implicitly enables autoflush + if ($args->{locking}) { $args->{autoflush} = 1; } + + $self->{root} = exists $args->{root} + ? $args->{root} + : DBM::Deep::09830::_::Root->new( $args ); + + if (!defined($self->_fh)) { $self->_open(); } + + return $self; +} + +sub TIEHASH { + shift; + #require DBM::Deep::09830::Hash; + return DBM::Deep::09830::Hash->TIEHASH( @_ ); +} + +sub TIEARRAY { + shift; + #require DBM::Deep::09830::Array; + return DBM::Deep::09830::Array->TIEARRAY( @_ ); +} + +#XXX Unneeded now ... +#sub DESTROY { +#} + +sub _open { + ## + # Open a fh to the database, create if nonexistent. + # Make sure file signature matches DBM::Deep spec. + ## + my $self = $_[0]->_get_self; + + local($/,$\); + + if (defined($self->_fh)) { $self->_close(); } + + my $flags = O_RDWR | O_CREAT | O_BINARY; + + my $fh; + sysopen( $fh, $self->_root->{file}, $flags ) + or $self->_throw_error( "Cannot sysopen file: " . $self->_root->{file} . ": $!" ); + + $self->_root->{fh} = $fh; + + if ($self->_root->{autoflush}) { + my $old = select $fh; + $|=1; + select $old; + } + + seek($fh, 0 + $self->_root->{file_offset}, SEEK_SET); + + my $signature; + my $bytes_read = read( $fh, $signature, length(SIG_FILE)); + + ## + # File is empty -- write signature and master index + ## + if (!$bytes_read) { + seek($fh, 0 + $self->_root->{file_offset}, SEEK_SET); + print( $fh SIG_FILE); + $self->_create_tag($self->_base_offset, $self->_type, chr(0) x $INDEX_SIZE); + + my $plain_key = "[base]"; + print( $fh pack($DATA_LENGTH_PACK, length($plain_key)) . 
$plain_key ); + + # Flush the filehandle + my $old_fh = select $fh; + my $old_af = $|; $| = 1; $| = $old_af; + select $old_fh; + + my @stats = stat($fh); + $self->_root->{inode} = $stats[1]; + $self->_root->{end} = $stats[7]; + + return 1; + } + + ## + # Check signature was valid + ## + unless ($signature eq SIG_FILE) { + $self->_close(); + return $self->_throw_error("Signature not found -- file is not a Deep DB"); + } + + my @stats = stat($fh); + $self->_root->{inode} = $stats[1]; + $self->_root->{end} = $stats[7]; + + ## + # Get our type from master index signature + ## + my $tag = $self->_load_tag($self->_base_offset); + +#XXX We probably also want to store the hash algorithm name and not assume anything +#XXX The cool thing would be to allow a different hashing algorithm at every level + + if (!$tag) { + return $self->_throw_error("Corrupted file, no master index record"); + } + if ($self->{type} ne $tag->{signature}) { + return $self->_throw_error("File type mismatch"); + } + + return 1; +} + +sub _close { + ## + # Close database fh + ## + my $self = $_[0]->_get_self; + close $self->_root->{fh} if $self->_root->{fh}; + $self->_root->{fh} = undef; +} + +sub _create_tag { + ## + # Given offset, signature and content, create tag and write to disk + ## + my ($self, $offset, $sig, $content) = @_; + my $size = length($content); + + local($/,$\); + + my $fh = $self->_fh; + + seek($fh, $offset + $self->_root->{file_offset}, SEEK_SET); + print( $fh $sig . pack($DATA_LENGTH_PACK, $size) . $content ); + + if ($offset == $self->_root->{end}) { + $self->_root->{end} += SIG_SIZE + $DATA_LENGTH_SIZE + $size; + } + + return { + signature => $sig, + size => $size, + offset => $offset + SIG_SIZE + $DATA_LENGTH_SIZE, + content => $content + }; +} + +sub _load_tag { + ## + # Given offset, load single tag and return signature, size and data + ## + my $self = shift; + my $offset = shift; + + local($/,$\); + + my $fh = $self->_fh; + + seek($fh, $offset + $self->_root->{file_offset}, SEEK_SET); + if (eof $fh) { return undef; } + + my $b; + read( $fh, $b, SIG_SIZE + $DATA_LENGTH_SIZE ); + my ($sig, $size) = unpack( "A $DATA_LENGTH_PACK", $b ); + + my $buffer; + read( $fh, $buffer, $size); + + return { + signature => $sig, + size => $size, + offset => $offset + SIG_SIZE + $DATA_LENGTH_SIZE, + content => $buffer + }; +} + +sub _index_lookup { + ## + # Given index tag, lookup single entry in index and return . + ## + my $self = shift; + my ($tag, $index) = @_; + + my $location = unpack($LONG_PACK, substr($tag->{content}, $index * $LONG_SIZE, $LONG_SIZE) ); + if (!$location) { return; } + + return $self->_load_tag( $location ); +} + +sub _add_bucket { + ## + # Adds one key/value pair to bucket list, given offset, MD5 digest of key, + # plain (undigested) key and value. + ## + my $self = shift; + my ($tag, $md5, $plain_key, $value) = @_; + my $keys = $tag->{content}; + my $location = 0; + my $result = 2; + + local($/,$\); + + # This verifies that only supported values will be stored. + { + my $r = Scalar::Util::reftype( $value ); + last if !defined $r; + + last if $r eq 'HASH'; + last if $r eq 'ARRAY'; + + $self->_throw_error( + "Storage of variables of type '$r' is not supported." + ); + } + + my $root = $self->_root; + + my $is_dbm_deep = eval { local $SIG{'__DIE__'}; $value->isa( 'DBM::Deep::09830' ) }; + my $internal_ref = $is_dbm_deep && ($value->_root eq $root); + + my $fh = $self->_fh; + + ## + # Iterate through buckets, seeing if this is a new entry or a replace. 
+ ## + for (my $i=0; $i<$MAX_BUCKETS; $i++) { + my $subloc = unpack($LONG_PACK, substr($keys, ($i * $BUCKET_SIZE) + $HASH_SIZE, $LONG_SIZE)); + if (!$subloc) { + ## + # Found empty bucket (end of list). Populate and exit loop. + ## + $result = 2; + + $location = $internal_ref + ? $value->_base_offset + : $root->{end}; + + seek($fh, $tag->{offset} + ($i * $BUCKET_SIZE) + $root->{file_offset}, SEEK_SET); + print( $fh $md5 . pack($LONG_PACK, $location) ); + last; + } + + my $key = substr($keys, $i * $BUCKET_SIZE, $HASH_SIZE); + if ($md5 eq $key) { + ## + # Found existing bucket with same key. Replace with new value. + ## + $result = 1; + + if ($internal_ref) { + $location = $value->_base_offset; + seek($fh, $tag->{offset} + ($i * $BUCKET_SIZE) + $root->{file_offset}, SEEK_SET); + print( $fh $md5 . pack($LONG_PACK, $location) ); + return $result; + } + + seek($fh, $subloc + SIG_SIZE + $root->{file_offset}, SEEK_SET); + my $size; + read( $fh, $size, $DATA_LENGTH_SIZE); $size = unpack($DATA_LENGTH_PACK, $size); + + ## + # If value is a hash, array, or raw value with equal or less size, we can + # reuse the same content area of the database. Otherwise, we have to create + # a new content area at the EOF. + ## + my $actual_length; + my $r = Scalar::Util::reftype( $value ) || ''; + if ( $r eq 'HASH' || $r eq 'ARRAY' ) { + $actual_length = $INDEX_SIZE; + + # if autobless is enabled, must also take into consideration + # the class name, as it is stored along with key/value. + if ( $root->{autobless} ) { + my $value_class = Scalar::Util::blessed($value); + if ( defined $value_class && !$value->isa('DBM::Deep::09830') ) { + $actual_length += length($value_class); + } + } + } + else { $actual_length = length($value); } + + if ($actual_length <= ($size || 0)) { + $location = $subloc; + } + else { + $location = $root->{end}; + seek($fh, $tag->{offset} + ($i * $BUCKET_SIZE) + $HASH_SIZE + $root->{file_offset}, SEEK_SET); + print( $fh pack($LONG_PACK, $location) ); + } + + last; + } + } + + ## + # If this is an internal reference, return now. + # No need to write value or plain key + ## + if ($internal_ref) { + return $result; + } + + ## + # If bucket didn't fit into list, split into a new index level + ## + if (!$location) { + seek($fh, $tag->{ref_loc} + $root->{file_offset}, SEEK_SET); + print( $fh pack($LONG_PACK, $root->{end}) ); + + my $index_tag = $self->_create_tag($root->{end}, SIG_INDEX, chr(0) x $INDEX_SIZE); + my @offsets = (); + + $keys .= $md5 . pack($LONG_PACK, 0); + + for (my $i=0; $i<=$MAX_BUCKETS; $i++) { + my $key = substr($keys, $i * $BUCKET_SIZE, $HASH_SIZE); + if ($key) { + my $old_subloc = unpack($LONG_PACK, substr($keys, ($i * $BUCKET_SIZE) + $HASH_SIZE, $LONG_SIZE)); + my $num = ord(substr($key, $tag->{ch} + 1, 1)); + + if ($offsets[$num]) { + my $offset = $offsets[$num] + SIG_SIZE + $DATA_LENGTH_SIZE; + seek($fh, $offset + $root->{file_offset}, SEEK_SET); + my $subkeys; + read( $fh, $subkeys, $BUCKET_LIST_SIZE); + + for (my $k=0; $k<$MAX_BUCKETS; $k++) { + my $subloc = unpack($LONG_PACK, substr($subkeys, ($k * $BUCKET_SIZE) + $HASH_SIZE, $LONG_SIZE)); + if (!$subloc) { + seek($fh, $offset + ($k * $BUCKET_SIZE) + $root->{file_offset}, SEEK_SET); + print( $fh $key . 
pack($LONG_PACK, $old_subloc || $root->{end}) ); + last; + } + } # k loop + } + else { + $offsets[$num] = $root->{end}; + seek($fh, $index_tag->{offset} + ($num * $LONG_SIZE) + $root->{file_offset}, SEEK_SET); + print( $fh pack($LONG_PACK, $root->{end}) ); + + my $blist_tag = $self->_create_tag($root->{end}, SIG_BLIST, chr(0) x $BUCKET_LIST_SIZE); + + seek($fh, $blist_tag->{offset} + $root->{file_offset}, SEEK_SET); + print( $fh $key . pack($LONG_PACK, $old_subloc || $root->{end}) ); + } + } # key is real + } # i loop + + $location ||= $root->{end}; + } # re-index bucket list + + ## + # Seek to content area and store signature, value and plaintext key + ## + if ($location) { + my $content_length; + seek($fh, $location + $root->{file_offset}, SEEK_SET); + + ## + # Write signature based on content type, set content length and write actual value. + ## + my $r = Scalar::Util::reftype($value) || ''; + if ($r eq 'HASH') { + if ( !$internal_ref && tied %{$value} ) { + return $self->_throw_error("Cannot store a tied value"); + } + print( $fh TYPE_HASH ); + print( $fh pack($DATA_LENGTH_PACK, $INDEX_SIZE) . chr(0) x $INDEX_SIZE ); + $content_length = $INDEX_SIZE; + } + elsif ($r eq 'ARRAY') { + if ( !$internal_ref && tied @{$value} ) { + return $self->_throw_error("Cannot store a tied value"); + } + print( $fh TYPE_ARRAY ); + print( $fh pack($DATA_LENGTH_PACK, $INDEX_SIZE) . chr(0) x $INDEX_SIZE ); + $content_length = $INDEX_SIZE; + } + elsif (!defined($value)) { + print( $fh SIG_NULL ); + print( $fh pack($DATA_LENGTH_PACK, 0) ); + $content_length = 0; + } + else { + print( $fh SIG_DATA ); + print( $fh pack($DATA_LENGTH_PACK, length($value)) . $value ); + $content_length = length($value); + } + + ## + # Plain key is stored AFTER value, as keys are typically fetched less often. + ## + print( $fh pack($DATA_LENGTH_PACK, length($plain_key)) . $plain_key ); + + ## + # If value is blessed, preserve class name + ## + if ( $root->{autobless} ) { + my $value_class = Scalar::Util::blessed($value); + if ( defined $value_class && $value_class ne 'DBM::Deep::09830' ) { + ## + # Blessed ref -- will restore later + ## + print( $fh chr(1) ); + print( $fh pack($DATA_LENGTH_PACK, length($value_class)) . $value_class ); + $content_length += 1; + $content_length += $DATA_LENGTH_SIZE + length($value_class); + } + else { + print( $fh chr(0) ); + $content_length += 1; + } + } + + ## + # If this is a new content area, advance EOF counter + ## + if ($location == $root->{end}) { + $root->{end} += SIG_SIZE; + $root->{end} += $DATA_LENGTH_SIZE + $content_length; + $root->{end} += $DATA_LENGTH_SIZE + length($plain_key); + } + + ## + # If content is a hash or array, create new child DBM::Deep object and + # pass each key or element to it. + ## + if ($r eq 'HASH') { + my %x = %$value; + tie %$value, 'DBM::Deep::09830', { + type => TYPE_HASH, + base_offset => $location, + root => $root, + }; + %$value = %x; + } + elsif ($r eq 'ARRAY') { + my @x = @$value; + tie @$value, 'DBM::Deep::09830', { + type => TYPE_ARRAY, + base_offset => $location, + root => $root, + }; + @$value = @x; + } + + return $result; + } + + return $self->_throw_error("Fatal error: indexing failed -- possibly due to corruption in file"); +} + +sub _get_bucket_value { + ## + # Fetch single value given tag and MD5 digested key. 
+ ## + my $self = shift; + my ($tag, $md5) = @_; + my $keys = $tag->{content}; + + local($/,$\); + + my $fh = $self->_fh; + + ## + # Iterate through buckets, looking for a key match + ## + BUCKET: + for (my $i=0; $i<$MAX_BUCKETS; $i++) { + my $key = substr($keys, $i * $BUCKET_SIZE, $HASH_SIZE); + my $subloc = unpack($LONG_PACK, substr($keys, ($i * $BUCKET_SIZE) + $HASH_SIZE, $LONG_SIZE)); + + if (!$subloc) { + ## + # Hit end of list, no match + ## + return; + } + + if ( $md5 ne $key ) { + next BUCKET; + } + + ## + # Found match -- seek to offset and read signature + ## + my $signature; + seek($fh, $subloc + $self->_root->{file_offset}, SEEK_SET); + read( $fh, $signature, SIG_SIZE); + + ## + # If value is a hash or array, return new DBM::Deep object with correct offset + ## + if (($signature eq TYPE_HASH) || ($signature eq TYPE_ARRAY)) { + my $obj = DBM::Deep::09830->new( + type => $signature, + base_offset => $subloc, + root => $self->_root + ); + + if ($self->_root->{autobless}) { + ## + # Skip over value and plain key to see if object needs + # to be re-blessed + ## + seek($fh, $DATA_LENGTH_SIZE + $INDEX_SIZE, SEEK_CUR); + + my $size; + read( $fh, $size, $DATA_LENGTH_SIZE); $size = unpack($DATA_LENGTH_PACK, $size); + if ($size) { seek($fh, $size, SEEK_CUR); } + + my $bless_bit; + read( $fh, $bless_bit, 1); + if (ord($bless_bit)) { + ## + # Yes, object needs to be re-blessed + ## + my $class_name; + read( $fh, $size, $DATA_LENGTH_SIZE); $size = unpack($DATA_LENGTH_PACK, $size); + if ($size) { read( $fh, $class_name, $size); } + if ($class_name) { $obj = bless( $obj, $class_name ); } + } + } + + return $obj; + } + + ## + # Otherwise return actual value + ## + elsif ($signature eq SIG_DATA) { + my $size; + my $value = ''; + read( $fh, $size, $DATA_LENGTH_SIZE); $size = unpack($DATA_LENGTH_PACK, $size); + if ($size) { read( $fh, $value, $size); } + return $value; + } + + ## + # Key exists, but content is null + ## + else { return; } + } # i loop + + return; +} + +sub _delete_bucket { + ## + # Delete single key/value pair given tag and MD5 digested key. + ## + my $self = shift; + my ($tag, $md5) = @_; + my $keys = $tag->{content}; + + local($/,$\); + + my $fh = $self->_fh; + + ## + # Iterate through buckets, looking for a key match + ## + BUCKET: + for (my $i=0; $i<$MAX_BUCKETS; $i++) { + my $key = substr($keys, $i * $BUCKET_SIZE, $HASH_SIZE); + my $subloc = unpack($LONG_PACK, substr($keys, ($i * $BUCKET_SIZE) + $HASH_SIZE, $LONG_SIZE)); + + if (!$subloc) { + ## + # Hit end of list, no match + ## + return; + } + + if ( $md5 ne $key ) { + next BUCKET; + } + + ## + # Matched key -- delete bucket and return + ## + seek($fh, $tag->{offset} + ($i * $BUCKET_SIZE) + $self->_root->{file_offset}, SEEK_SET); + print( $fh substr($keys, ($i+1) * $BUCKET_SIZE ) ); + print( $fh chr(0) x $BUCKET_SIZE ); + + return 1; + } # i loop + + return; +} + +sub _bucket_exists { + ## + # Check existence of single key given tag and MD5 digested key. 
+ ## + my $self = shift; + my ($tag, $md5) = @_; + my $keys = $tag->{content}; + + ## + # Iterate through buckets, looking for a key match + ## + BUCKET: + for (my $i=0; $i<$MAX_BUCKETS; $i++) { + my $key = substr($keys, $i * $BUCKET_SIZE, $HASH_SIZE); + my $subloc = unpack($LONG_PACK, substr($keys, ($i * $BUCKET_SIZE) + $HASH_SIZE, $LONG_SIZE)); + + if (!$subloc) { + ## + # Hit end of list, no match + ## + return; + } + + if ( $md5 ne $key ) { + next BUCKET; + } + + ## + # Matched key -- return true + ## + return 1; + } # i loop + + return; +} + +sub _find_bucket_list { + ## + # Locate offset for bucket list, given digested key + ## + my $self = shift; + my $md5 = shift; + + ## + # Locate offset for bucket list using digest index system + ## + my $ch = 0; + my $tag = $self->_load_tag($self->_base_offset); + if (!$tag) { return; } + + while ($tag->{signature} ne SIG_BLIST) { + $tag = $self->_index_lookup($tag, ord(substr($md5, $ch, 1))); + if (!$tag) { return; } + $ch++; + } + + return $tag; +} + +sub _traverse_index { + ## + # Scan index and recursively step into deeper levels, looking for next key. + ## + my ($self, $offset, $ch, $force_return_next) = @_; + $force_return_next = undef unless $force_return_next; + + local($/,$\); + + my $tag = $self->_load_tag( $offset ); + + my $fh = $self->_fh; + + if ($tag->{signature} ne SIG_BLIST) { + my $content = $tag->{content}; + my $start; + if ($self->{return_next}) { $start = 0; } + else { $start = ord(substr($self->{prev_md5}, $ch, 1)); } + + for (my $index = $start; $index < 256; $index++) { + my $subloc = unpack($LONG_PACK, substr($content, $index * $LONG_SIZE, $LONG_SIZE) ); + if ($subloc) { + my $result = $self->_traverse_index( $subloc, $ch + 1, $force_return_next ); + if (defined($result)) { return $result; } + } + } # index loop + + $self->{return_next} = 1; + } # tag is an index + + elsif ($tag->{signature} eq SIG_BLIST) { + my $keys = $tag->{content}; + if ($force_return_next) { $self->{return_next} = 1; } + + ## + # Iterate through buckets, looking for a key match + ## + for (my $i=0; $i<$MAX_BUCKETS; $i++) { + my $key = substr($keys, $i * $BUCKET_SIZE, $HASH_SIZE); + my $subloc = unpack($LONG_PACK, substr($keys, ($i * $BUCKET_SIZE) + $HASH_SIZE, $LONG_SIZE)); + + if (!$subloc) { + ## + # End of bucket list -- return to outer loop + ## + $self->{return_next} = 1; + last; + } + elsif ($key eq $self->{prev_md5}) { + ## + # Located previous key -- return next one found + ## + $self->{return_next} = 1; + next; + } + elsif ($self->{return_next}) { + ## + # Seek to bucket location and skip over signature + ## + seek($fh, $subloc + SIG_SIZE + $self->_root->{file_offset}, SEEK_SET); + + ## + # Skip over value to get to plain key + ## + my $size; + read( $fh, $size, $DATA_LENGTH_SIZE); $size = unpack($DATA_LENGTH_PACK, $size); + if ($size) { seek($fh, $size, SEEK_CUR); } + + ## + # Read in plain key and return as scalar + ## + my $plain_key; + read( $fh, $size, $DATA_LENGTH_SIZE); $size = unpack($DATA_LENGTH_PACK, $size); + if ($size) { read( $fh, $plain_key, $size); } + + return $plain_key; + } + } # bucket loop + + $self->{return_next} = 1; + } # tag is a bucket list + + return; +} + +sub _get_next_key { + ## + # Locate next key, given digested previous one + ## + my $self = $_[0]->_get_self; + + $self->{prev_md5} = $_[1] ? $_[1] : undef; + $self->{return_next} = 0; + + ## + # If the previous key was not specifed, start at the top and + # return the first one found. 
+ ## + if (!$self->{prev_md5}) { + $self->{prev_md5} = chr(0) x $HASH_SIZE; + $self->{return_next} = 1; + } + + return $self->_traverse_index( $self->_base_offset, 0 ); +} + +sub lock { + ## + # If db locking is set, flock() the db file. If called multiple + # times before unlock(), then the same number of unlocks() must + # be called before the lock is released. + ## + my $self = $_[0]->_get_self; + my $type = $_[1]; + $type = LOCK_EX unless defined $type; + + if (!defined($self->_fh)) { return; } + + if ($self->_root->{locking}) { + if (!$self->_root->{locked}) { + flock($self->_fh, $type); + + # refresh end counter in case file has changed size + my @stats = stat($self->_root->{file}); + $self->_root->{end} = $stats[7]; + + # double-check file inode, in case another process + # has optimize()d our file while we were waiting. + if ($stats[1] != $self->_root->{inode}) { + $self->_open(); # re-open + flock($self->_fh, $type); # re-lock + $self->_root->{end} = (stat($self->_fh))[7]; # re-end + } + } + $self->_root->{locked}++; + + return 1; + } + + return; +} + +sub unlock { + ## + # If db locking is set, unlock the db file. See note in lock() + # regarding calling lock() multiple times. + ## + my $self = $_[0]->_get_self; + + if (!defined($self->_fh)) { return; } + + if ($self->_root->{locking} && $self->_root->{locked} > 0) { + $self->_root->{locked}--; + if (!$self->_root->{locked}) { flock($self->_fh, LOCK_UN); } + + return 1; + } + + return; +} + +sub _copy_value { + my $self = shift->_get_self; + my ($spot, $value) = @_; + + if ( !ref $value ) { + ${$spot} = $value; + } + elsif ( eval { local $SIG{__DIE__}; $value->isa( 'DBM::Deep::09830' ) } ) { + my $type = $value->_type; + ${$spot} = $type eq TYPE_HASH ? {} : []; + $value->_copy_node( ${$spot} ); + } + else { + my $r = Scalar::Util::reftype( $value ); + my $c = Scalar::Util::blessed( $value ); + if ( $r eq 'ARRAY' ) { + ${$spot} = [ @{$value} ]; + } + else { + ${$spot} = { %{$value} }; + } + ${$spot} = bless ${$spot}, $c + if defined $c; + } + + return 1; +} + +sub _copy_node { + ## + # Copy single level of keys or elements to new DB handle. + # Recurse for nested structures + ## + my $self = shift->_get_self; + my ($db_temp) = @_; + + if ($self->_type eq TYPE_HASH) { + my $key = $self->first_key(); + while ($key) { + my $value = $self->get($key); + $self->_copy_value( \$db_temp->{$key}, $value ); + $key = $self->next_key($key); + } + } + else { + my $length = $self->length(); + for (my $index = 0; $index < $length; $index++) { + my $value = $self->get($index); + $self->_copy_value( \$db_temp->[$index], $value ); + } + } + + return 1; +} + +sub export { + ## + # Recursively export into standard Perl hashes and arrays. 
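+    # The database is locked (lock() defaults to LOCK_EX) for the duration of
+    # the copy, so the exported structure is a consistent snapshot.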
+ ## + my $self = $_[0]->_get_self; + + my $temp; + if ($self->_type eq TYPE_HASH) { $temp = {}; } + elsif ($self->_type eq TYPE_ARRAY) { $temp = []; } + + $self->lock(); + $self->_copy_node( $temp ); + $self->unlock(); + + return $temp; +} + +sub import { + ## + # Recursively import Perl hash/array structure + ## + #XXX This use of ref() seems to be ok + if (!ref($_[0])) { return; } # Perl calls import() on use -- ignore + + my $self = $_[0]->_get_self; + my $struct = $_[1]; + + #XXX This use of ref() seems to be ok + if (!ref($struct)) { + ## + # struct is not a reference, so just import based on our type + ## + shift @_; + + if ($self->_type eq TYPE_HASH) { $struct = {@_}; } + elsif ($self->_type eq TYPE_ARRAY) { $struct = [@_]; } + } + + my $r = Scalar::Util::reftype($struct) || ''; + if ($r eq "HASH" && $self->_type eq TYPE_HASH) { + foreach my $key (keys %$struct) { $self->put($key, $struct->{$key}); } + } + elsif ($r eq "ARRAY" && $self->_type eq TYPE_ARRAY) { + $self->push( @$struct ); + } + else { + return $self->_throw_error("Cannot import: type mismatch"); + } + + return 1; +} + +sub optimize { + ## + # Rebuild entire database into new file, then move + # it back on top of original. + ## + my $self = $_[0]->_get_self; + +#XXX Need to create a new test for this +# if ($self->_root->{links} > 1) { +# return $self->_throw_error("Cannot optimize: reference count is greater than 1"); +# } + + my $db_temp = DBM::Deep::09830->new( + file => $self->_root->{file} . '.tmp', + type => $self->_type + ); + if (!$db_temp) { + return $self->_throw_error("Cannot optimize: failed to open temp file: $!"); + } + + $self->lock(); + $self->_copy_node( $db_temp ); + undef $db_temp; + + ## + # Attempt to copy user, group and permissions over to new file + ## + my @stats = stat($self->_fh); + my $perms = $stats[2] & 07777; + my $uid = $stats[4]; + my $gid = $stats[5]; + chown( $uid, $gid, $self->_root->{file} . '.tmp' ); + chmod( $perms, $self->_root->{file} . '.tmp' ); + + # q.v. perlport for more information on this variable + if ( $^O eq 'MSWin32' || $^O eq 'cygwin' ) { + ## + # Potential race condition when optmizing on Win32 with locking. + # The Windows filesystem requires that the filehandle be closed + # before it is overwritten with rename(). This could be redone + # with a soft copy. + ## + $self->unlock(); + $self->_close(); + } + + if (!rename $self->_root->{file} . '.tmp', $self->_root->{file}) { + unlink $self->_root->{file} . '.tmp'; + $self->unlock(); + return $self->_throw_error("Optimize failed: Cannot copy temp file over original: $!"); + } + + $self->unlock(); + $self->_close(); + $self->_open(); + + return 1; +} + +sub clone { + ## + # Make copy of object and return + ## + my $self = $_[0]->_get_self; + + return DBM::Deep::09830->new( + type => $self->_type, + base_offset => $self->_base_offset, + root => $self->_root + ); +} + +{ + my %is_legal_filter = map { + $_ => ~~1, + } qw( + store_key store_value + fetch_key fetch_value + ); + + sub set_filter { + ## + # Setup filter function for storing or fetching the key or value + ## + my $self = $_[0]->_get_self; + my $type = lc $_[1]; + my $func = $_[2] ? 
$_[2] : undef; + + if ( $is_legal_filter{$type} ) { + $self->_root->{"filter_$type"} = $func; + return 1; + } + + return; + } +} + +## +# Accessor methods +## + +sub _root { + ## + # Get access to the root structure + ## + my $self = $_[0]->_get_self; + return $self->{root}; +} + +sub _fh { + ## + # Get access to the raw fh + ## + #XXX It will be useful, though, when we split out HASH and ARRAY + my $self = $_[0]->_get_self; + return $self->_root->{fh}; +} + +sub _type { + ## + # Get type of current node (TYPE_HASH or TYPE_ARRAY) + ## + my $self = $_[0]->_get_self; + return $self->{type}; +} + +sub _base_offset { + ## + # Get base_offset of current node (TYPE_HASH or TYPE_ARRAY) + ## + my $self = $_[0]->_get_self; + return $self->{base_offset}; +} + +sub error { + ## + # Get last error string, or undef if no error + ## + return $_[0] + ? ( $_[0]->_get_self->{root}->{error} or undef ) + : $@; +} + +## +# Utility methods +## + +sub _throw_error { + ## + # Store error string in self + ## + my $error_text = $_[1]; + + if ( Scalar::Util::blessed $_[0] ) { + my $self = $_[0]->_get_self; + $self->_root->{error} = $error_text; + + unless ($self->_root->{debug}) { + die "DBM::Deep::09830: $error_text\n"; + } + + warn "DBM::Deep::09830: $error_text\n"; + return; + } + else { + die "DBM::Deep::09830: $error_text\n"; + } +} + +sub clear_error { + ## + # Clear error state + ## + my $self = $_[0]->_get_self; + + undef $self->_root->{error}; +} + +sub _precalc_sizes { + ## + # Precalculate index, bucket and bucket list sizes + ## + + #XXX I don't like this ... + set_pack() unless defined $LONG_SIZE; + + $INDEX_SIZE = 256 * $LONG_SIZE; + $BUCKET_SIZE = $HASH_SIZE + $LONG_SIZE; + $BUCKET_LIST_SIZE = $MAX_BUCKETS * $BUCKET_SIZE; +} + +sub set_pack { + ## + # Set pack/unpack modes (see file header for more) + ## + my ($long_s, $long_p, $data_s, $data_p) = @_; + + $LONG_SIZE = $long_s ? $long_s : 4; + $LONG_PACK = $long_p ? $long_p : 'N'; + + $DATA_LENGTH_SIZE = $data_s ? $data_s : 4; + $DATA_LENGTH_PACK = $data_p ? $data_p : 'N'; + + _precalc_sizes(); +} + +sub set_digest { + ## + # Set key digest function (default is MD5) + ## + my ($digest_func, $hash_size) = @_; + + $DIGEST_FUNC = $digest_func ? $digest_func : \&Digest::MD5::md5; + $HASH_SIZE = $hash_size ? $hash_size : 16; + + _precalc_sizes(); +} + +sub _is_writable { + my $fh = shift; + (O_WRONLY | O_RDWR) & fcntl( $fh, F_GETFL, my $slush = 0); +} + +#sub _is_readable { +# my $fh = shift; +# (O_RDONLY | O_RDWR) & fcntl( $fh, F_GETFL, my $slush = 0); +#} + +## +# tie() methods (hashes and arrays) +## + +sub STORE { + ## + # Store single hash key/value or array element in database. + ## + my $self = $_[0]->_get_self; + my $key = $_[1]; + + local($/,$\); + + # User may be storing a hash, in which case we do not want it run + # through the filtering system + my $value = ($self->_root->{filter_store_value} && !ref($_[2])) + ? 
$self->_root->{filter_store_value}->($_[2]) + : $_[2]; + + my $md5 = $DIGEST_FUNC->($key); + + ## + # Make sure file is open + ## + if (!defined($self->_fh) && !$self->_open()) { + return; + } + + if ( $^O ne 'MSWin32' && !_is_writable( $self->_fh ) ) { + $self->_throw_error( 'Cannot write to a readonly filehandle' ); + } + + ## + # Request exclusive lock for writing + ## + $self->lock( LOCK_EX ); + + my $fh = $self->_fh; + + ## + # Locate offset for bucket list using digest index system + ## + my $tag = $self->_load_tag($self->_base_offset); + if (!$tag) { + $tag = $self->_create_tag($self->_base_offset, SIG_INDEX, chr(0) x $INDEX_SIZE); + } + + my $ch = 0; + while ($tag->{signature} ne SIG_BLIST) { + my $num = ord(substr($md5, $ch, 1)); + + my $ref_loc = $tag->{offset} + ($num * $LONG_SIZE); + my $new_tag = $self->_index_lookup($tag, $num); + + if (!$new_tag) { + seek($fh, $ref_loc + $self->_root->{file_offset}, SEEK_SET); + print( $fh pack($LONG_PACK, $self->_root->{end}) ); + + $tag = $self->_create_tag($self->_root->{end}, SIG_BLIST, chr(0) x $BUCKET_LIST_SIZE); + + $tag->{ref_loc} = $ref_loc; + $tag->{ch} = $ch; + + last; + } + else { + $tag = $new_tag; + + $tag->{ref_loc} = $ref_loc; + $tag->{ch} = $ch; + } + $ch++; + } + + ## + # Add key/value to bucket list + ## + my $result = $self->_add_bucket( $tag, $md5, $key, $value ); + + $self->unlock(); + + return $result; +} + +sub FETCH { + ## + # Fetch single value or element given plain key or array index + ## + my $self = shift->_get_self; + my $key = shift; + + ## + # Make sure file is open + ## + if (!defined($self->_fh)) { $self->_open(); } + + my $md5 = $DIGEST_FUNC->($key); + + ## + # Request shared lock for reading + ## + $self->lock( LOCK_SH ); + + my $tag = $self->_find_bucket_list( $md5 ); + if (!$tag) { + $self->unlock(); + return; + } + + ## + # Get value from bucket list + ## + my $result = $self->_get_bucket_value( $tag, $md5 ); + + $self->unlock(); + + #XXX What is ref() checking here? + #YYY Filters only apply on scalar values, so the ref check is making + #YYY sure the fetched bucket is a scalar, not a child hash or array. + return ($result && !ref($result) && $self->_root->{filter_fetch_value}) + ? $self->_root->{filter_fetch_value}->($result) + : $result; +} + +sub DELETE { + ## + # Delete single key/value pair or element given plain key or array index + ## + my $self = $_[0]->_get_self; + my $key = $_[1]; + + my $md5 = $DIGEST_FUNC->($key); + + ## + # Make sure file is open + ## + if (!defined($self->_fh)) { $self->_open(); } + + ## + # Request exclusive lock for writing + ## + $self->lock( LOCK_EX ); + + my $tag = $self->_find_bucket_list( $md5 ); + if (!$tag) { + $self->unlock(); + return; + } + + ## + # Delete bucket + ## + my $value = $self->_get_bucket_value( $tag, $md5 ); + if ($value && !ref($value) && $self->_root->{filter_fetch_value}) { + $value = $self->_root->{filter_fetch_value}->($value); + } + + my $result = $self->_delete_bucket( $tag, $md5 ); + + ## + # If this object is an array and the key deleted was on the end of the stack, + # decrement the length variable. 
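+    # (That bookkeeping is done by DBM::Deep::09830::Array::DELETE, which
+    # calls STORESIZE when the last element is removed.)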
+ ## + + $self->unlock(); + + return $value; +} + +sub EXISTS { + ## + # Check if a single key or element exists given plain key or array index + ## + my $self = $_[0]->_get_self; + my $key = $_[1]; + + my $md5 = $DIGEST_FUNC->($key); + + ## + # Make sure file is open + ## + if (!defined($self->_fh)) { $self->_open(); } + + ## + # Request shared lock for reading + ## + $self->lock( LOCK_SH ); + + my $tag = $self->_find_bucket_list( $md5 ); + + ## + # For some reason, the built-in exists() function returns '' for false + ## + if (!$tag) { + $self->unlock(); + return ''; + } + + ## + # Check if bucket exists and return 1 or '' + ## + my $result = $self->_bucket_exists( $tag, $md5 ) || ''; + + $self->unlock(); + + return $result; +} + +sub CLEAR { + ## + # Clear all keys from hash, or all elements from array. + ## + my $self = $_[0]->_get_self; + + ## + # Make sure file is open + ## + if (!defined($self->_fh)) { $self->_open(); } + + ## + # Request exclusive lock for writing + ## + $self->lock( LOCK_EX ); + + my $fh = $self->_fh; + + seek($fh, $self->_base_offset + $self->_root->{file_offset}, SEEK_SET); + if (eof $fh) { + $self->unlock(); + return; + } + + $self->_create_tag($self->_base_offset, $self->_type, chr(0) x $INDEX_SIZE); + + $self->unlock(); + + return 1; +} + +## +# Public method aliases +## +sub put { (shift)->STORE( @_ ) } +sub store { (shift)->STORE( @_ ) } +sub get { (shift)->FETCH( @_ ) } +sub fetch { (shift)->FETCH( @_ ) } +sub delete { (shift)->DELETE( @_ ) } +sub exists { (shift)->EXISTS( @_ ) } +sub clear { (shift)->CLEAR( @_ ) } + +package DBM::Deep::09830::_::Root; + +sub new { + my $class = shift; + my ($args) = @_; + + my $self = bless { + file => undef, + fh => undef, + file_offset => 0, + end => 0, + autoflush => undef, + locking => undef, + debug => undef, + filter_store_key => undef, + filter_store_value => undef, + filter_fetch_key => undef, + filter_fetch_value => undef, + autobless => undef, + locked => 0, + %$args, + }, $class; + + if ( $self->{fh} && !$self->{file_offset} ) { + $self->{file_offset} = tell( $self->{fh} ); + } + + return $self; +} + +sub DESTROY { + my $self = shift; + return unless $self; + + close $self->{fh} if $self->{fh}; + + return; +} + +package DBM::Deep::09830::Array; + +use strict; + +# This is to allow DBM::Deep::Array to handle negative indices on +# its own. Otherwise, Perl would intercept the call to negative +# indices for us. This was causing bugs for negative index handling. +use vars qw( $NEGATIVE_INDICES ); +$NEGATIVE_INDICES = 1; + +use base 'DBM::Deep::09830'; + +use Scalar::Util (); + +sub _get_self { + eval { local $SIG{'__DIE__'}; tied( @{$_[0]} ) } || $_[0] +} + +sub TIEARRAY { +## +# Tied array constructor method, called by Perl's tie() function. 
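+# Numeric indices are resolved against FETCHSIZE (for negative values) and
+# packed with $LONG_PACK before being passed to the base class as keys.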
+## + my $class = shift; + my $args = $class->_get_args( @_ ); + + $args->{type} = $class->TYPE_ARRAY; + + return $class->_init($args); +} + +sub FETCH { + my $self = $_[0]->_get_self; + my $key = $_[1]; + + $self->lock( $self->LOCK_SH ); + + if ( $key =~ /^-?\d+$/ ) { + if ( $key < 0 ) { + $key += $self->FETCHSIZE; + unless ( $key >= 0 ) { + $self->unlock; + return; + } + } + + $key = pack($DBM::Deep::09830::LONG_PACK, $key); + } + + my $rv = $self->SUPER::FETCH( $key ); + + $self->unlock; + + return $rv; +} + +sub STORE { + my $self = shift->_get_self; + my ($key, $value) = @_; + + $self->lock( $self->LOCK_EX ); + + my $orig = $key; + + my $size; + my $numeric_idx; + if ( $key =~ /^\-?\d+$/ ) { + $numeric_idx = 1; + if ( $key < 0 ) { + $size = $self->FETCHSIZE; + $key += $size; + if ( $key < 0 ) { + die( "Modification of non-creatable array value attempted, subscript $orig" ); + } + } + + $key = pack($DBM::Deep::09830::LONG_PACK, $key); + } + + my $rv = $self->SUPER::STORE( $key, $value ); + + if ( $numeric_idx && $rv == 2 ) { + $size = $self->FETCHSIZE unless defined $size; + if ( $orig >= $size ) { + $self->STORESIZE( $orig + 1 ); + } + } + + $self->unlock; + + return $rv; +} + +sub EXISTS { + my $self = $_[0]->_get_self; + my $key = $_[1]; + + $self->lock( $self->LOCK_SH ); + + if ( $key =~ /^\-?\d+$/ ) { + if ( $key < 0 ) { + $key += $self->FETCHSIZE; + unless ( $key >= 0 ) { + $self->unlock; + return; + } + } + + $key = pack($DBM::Deep::09830::LONG_PACK, $key); + } + + my $rv = $self->SUPER::EXISTS( $key ); + + $self->unlock; + + return $rv; +} + +sub DELETE { + my $self = $_[0]->_get_self; + my $key = $_[1]; + + my $unpacked_key = $key; + + $self->lock( $self->LOCK_EX ); + + my $size = $self->FETCHSIZE; + if ( $key =~ /^-?\d+$/ ) { + if ( $key < 0 ) { + $key += $size; + unless ( $key >= 0 ) { + $self->unlock; + return; + } + } + + $key = pack($DBM::Deep::09830::LONG_PACK, $key); + } + + my $rv = $self->SUPER::DELETE( $key ); + + if ($rv && $unpacked_key == $size - 1) { + $self->STORESIZE( $unpacked_key ); + } + + $self->unlock; + + return $rv; +} + +sub FETCHSIZE { + ## + # Return the length of the array + ## + my $self = shift->_get_self; + + $self->lock( $self->LOCK_SH ); + + my $SAVE_FILTER = $self->_root->{filter_fetch_value}; + $self->_root->{filter_fetch_value} = undef; + + my $packed_size = $self->FETCH('length'); + + $self->_root->{filter_fetch_value} = $SAVE_FILTER; + + $self->unlock; + + if ($packed_size) { + return int(unpack($DBM::Deep::09830::LONG_PACK, $packed_size)); + } + + return 0; +} + +sub STORESIZE { + ## + # Set the length of the array + ## + my $self = $_[0]->_get_self; + my $new_length = $_[1]; + + $self->lock( $self->LOCK_EX ); + + my $SAVE_FILTER = $self->_root->{filter_store_value}; + $self->_root->{filter_store_value} = undef; + + my $result = $self->STORE('length', pack($DBM::Deep::09830::LONG_PACK, $new_length)); + + $self->_root->{filter_store_value} = $SAVE_FILTER; + + $self->unlock; + + return $result; +} + +sub POP { + ## + # Remove and return the last element on the array + ## + my $self = $_[0]->_get_self; + + $self->lock( $self->LOCK_EX ); + + my $length = $self->FETCHSIZE(); + + if ($length) { + my $content = $self->FETCH( $length - 1 ); + $self->DELETE( $length - 1 ); + + $self->unlock; + + return $content; + } + else { + $self->unlock; + return; + } +} + +sub PUSH { + ## + # Add new element(s) to the end of the array + ## + my $self = shift->_get_self; + + $self->lock( $self->LOCK_EX ); + + my $length = $self->FETCHSIZE(); + + while (my 
$content = shift @_) { + $self->STORE( $length, $content ); + $length++; + } + + $self->unlock; + + return $length; +} + +sub SHIFT { + ## + # Remove and return first element on the array. + # Shift over remaining elements to take up space. + ## + my $self = $_[0]->_get_self; + + $self->lock( $self->LOCK_EX ); + + my $length = $self->FETCHSIZE(); + + if ($length) { + my $content = $self->FETCH( 0 ); + + ## + # Shift elements over and remove last one. + ## + for (my $i = 0; $i < $length - 1; $i++) { + $self->STORE( $i, $self->FETCH($i + 1) ); + } + $self->DELETE( $length - 1 ); + + $self->unlock; + + return $content; + } + else { + $self->unlock; + return; + } +} + +sub UNSHIFT { + ## + # Insert new element(s) at beginning of array. + # Shift over other elements to make space. + ## + my $self = shift->_get_self; + my @new_elements = @_; + + $self->lock( $self->LOCK_EX ); + + my $length = $self->FETCHSIZE(); + my $new_size = scalar @new_elements; + + if ($length) { + for (my $i = $length - 1; $i >= 0; $i--) { + $self->STORE( $i + $new_size, $self->FETCH($i) ); + } + } + + for (my $i = 0; $i < $new_size; $i++) { + $self->STORE( $i, $new_elements[$i] ); + } + + $self->unlock; + + return $length + $new_size; +} + +sub SPLICE { + ## + # Splices section of array with optional new section. + # Returns deleted section, or last element deleted in scalar context. + ## + my $self = shift->_get_self; + + $self->lock( $self->LOCK_EX ); + + my $length = $self->FETCHSIZE(); + + ## + # Calculate offset and length of splice + ## + my $offset = shift; + $offset = 0 unless defined $offset; + if ($offset < 0) { $offset += $length; } + + my $splice_length; + if (scalar @_) { $splice_length = shift; } + else { $splice_length = $length - $offset; } + if ($splice_length < 0) { $splice_length += ($length - $offset); } + + ## + # Setup array with new elements, and copy out old elements for return + ## + my @new_elements = @_; + my $new_size = scalar @new_elements; + + my @old_elements = map { + $self->FETCH( $_ ) + } $offset .. ($offset + $splice_length - 1); + + ## + # Adjust array length, and shift elements to accomodate new section. + ## + if ( $new_size != $splice_length ) { + if ($new_size > $splice_length) { + for (my $i = $length - 1; $i >= $offset + $splice_length; $i--) { + $self->STORE( $i + ($new_size - $splice_length), $self->FETCH($i) ); + } + } + else { + for (my $i = $offset + $splice_length; $i < $length; $i++) { + $self->STORE( $i + ($new_size - $splice_length), $self->FETCH($i) ); + } + for (my $i = 0; $i < $splice_length - $new_size; $i++) { + $self->DELETE( $length - 1 ); + $length--; + } + } + } + + ## + # Insert new elements into array + ## + for (my $i = $offset; $i < $offset + $new_size; $i++) { + $self->STORE( $i, shift @new_elements ); + } + + $self->unlock; + + ## + # Return deleted section, or last element in scalar context. + ## + return wantarray ? @old_elements : $old_elements[-1]; +} + +sub EXTEND { + ## + # Perl will call EXTEND() when the array is likely to grow. + # We don't care, but include it for compatibility. + ## +} + +## +# Public method aliases +## +*length = *FETCHSIZE; +*pop = *POP; +*push = *PUSH; +*shift = *SHIFT; +*unshift = *UNSHIFT; +*splice = *SPLICE; + +package DBM::Deep::09830::Hash; + +use strict; + +use base 'DBM::Deep::09830'; + +sub _get_self { + eval { local $SIG{'__DIE__'}; tied( %{$_[0]} ) } || $_[0] +} + +sub TIEHASH { + ## + # Tied hash constructor method, called by Perl's tie() function. 
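+    # Hash keys are run through the filter_store_key filter (when one is set)
+    # before being digested, in STORE as well as in the lookup methods.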
+ ## + my $class = shift; + my $args = $class->_get_args( @_ ); + + $args->{type} = $class->TYPE_HASH; + + return $class->_init($args); +} + +sub FETCH { + my $self = shift->_get_self; + my $key = ($self->_root->{filter_store_key}) + ? $self->_root->{filter_store_key}->($_[0]) + : $_[0]; + + return $self->SUPER::FETCH( $key ); +} + +sub STORE { + my $self = shift->_get_self; + my $key = ($self->_root->{filter_store_key}) + ? $self->_root->{filter_store_key}->($_[0]) + : $_[0]; + my $value = $_[1]; + + return $self->SUPER::STORE( $key, $value ); +} + +sub EXISTS { + my $self = shift->_get_self; + my $key = ($self->_root->{filter_store_key}) + ? $self->_root->{filter_store_key}->($_[0]) + : $_[0]; + + return $self->SUPER::EXISTS( $key ); +} + +sub DELETE { + my $self = shift->_get_self; + my $key = ($self->_root->{filter_store_key}) + ? $self->_root->{filter_store_key}->($_[0]) + : $_[0]; + + return $self->SUPER::DELETE( $key ); +} + +sub FIRSTKEY { + ## + # Locate and return first key (in no particular order) + ## + my $self = $_[0]->_get_self; + + ## + # Make sure file is open + ## + if (!defined($self->_fh)) { $self->_open(); } + + ## + # Request shared lock for reading + ## + $self->lock( $self->LOCK_SH ); + + my $result = $self->_get_next_key(); + + $self->unlock(); + + return ($result && $self->_root->{filter_fetch_key}) + ? $self->_root->{filter_fetch_key}->($result) + : $result; +} + +sub NEXTKEY { + ## + # Return next key (in no particular order), given previous one + ## + my $self = $_[0]->_get_self; + + my $prev_key = ($self->_root->{filter_store_key}) + ? $self->_root->{filter_store_key}->($_[1]) + : $_[1]; + + my $prev_md5 = $DBM::Deep::09830::DIGEST_FUNC->($prev_key); + + ## + # Make sure file is open + ## + if (!defined($self->_fh)) { $self->_open(); } + + ## + # Request shared lock for reading + ## + $self->lock( $self->LOCK_SH ); + + my $result = $self->_get_next_key( $prev_md5 ); + + $self->unlock(); + + return ($result && $self->_root->{filter_fetch_key}) + ? $self->_root->{filter_fetch_key}->($result) + : $result; +} + +## +# Public method aliases +## +*first_key = *FIRSTKEY; +*next_key = *NEXTKEY; + +1; +__END__ diff --git a/utils/upgrade_db.pl b/utils/upgrade_db.pl new file mode 100755 index 0000000..4dfedf4 --- /dev/null +++ b/utils/upgrade_db.pl @@ -0,0 +1,223 @@ +#!/usr/bin/perl + +use 5.6.0; + +use strict; +use warnings FATAL => 'all'; + +use FindBin; +use File::Spec (); +use lib File::Spec->catdir( $FindBin::Bin, 'lib' ); + +# This is for the latest version. +use lib File::Spec->catdir( $FindBin::Bin, '..', 'lib' ); + +use Getopt::Long qw( GetOptions ); +use Pod::Usage; + +my %headerver_to_module = ( + '0' => 'DBM::Deep::09830', + '2' => 'DBM::Deep', +); + +my %is_dev = ( + '1' => 1, +); + +my %opts = ( + man => 0, + help => 0, + version => '1.0000', + autobless => 0, +); +GetOptions( \%opts, + 'input=s', 'output=s', 'version:s', 'autobless:i', + 'help|?', 'man', +) || pod2man(2); +pod2usage(1) if $opts{help}; +pod2usage(-exitstatus => 0, -verbose => 2) if $opts{man}; + +pod2usage(-msg => "Missing required parameters.", verbose => 1) + unless $opts{input} && $opts{output}; + +if ( $opts{input} eq $opts{output} ) { + _exit( "Cannot use the same filename for both input and output." ); +} + +unless ( -f $opts{input} ) { + _exit( "'$opts{input}' is not a file." ); +} + +my %db; +{ + my $ver = _read_file_header( $opts{input} ); + if ( $is_dev{ $ver } ) { + _exit( "'$opts{input}' is a dev release and not supported." 
);
+    }
+
+    my $mod = $headerver_to_module{ $ver };
+    eval "use $mod;";
+    $db{input} = $mod->new({
+        file => $opts{input},
+        locking => 1,
+        autobless => $opts{autobless},
+    });
+    $db{input}->lock;
+}
+
+{
+    my $ver = $opts{version};
+    if ( $ver =~ /^0\.9[1-8]/ ) {
+        $ver = 0;
+    }
+    elsif ( $ver =~ /^0\.99/) {
+        $ver = 1;
+    }
+    elsif ( $ver =~ /^1\.000?0?/) {
+        $ver = 2;
+    }
+    else {
+        _exit( "'$ver' is an unrecognized version." );
+    }
+
+    if ( $is_dev{ $ver } ) {
+        _exit( "-version '$opts{version}' is a dev release and not supported." );
+    }
+
+    # First thing is to destroy the file, in case it's an incompatible version.
+    unlink $opts{output};
+
+    my $mod = $headerver_to_module{ $ver };
+    eval "use $mod;";
+    $db{output} = $mod->new({
+        file => $opts{output},
+        locking => 1,
+        autobless => $opts{autobless},
+    });
+    $db{output}->lock;
+}
+
+# Do the actual conversion. This is the same code that optimize() uses.
+$db{input}->_copy_node( $db{output} );
+undef $db{output};
+
+################################################################################
+
+sub _read_file_header {
+    my ($file) = @_;
+
+    open my $fh, '<', $file
+        or _exit( "Cannot open '$file' for reading: $!" );
+
+    my $buffer = _read_buffer( $fh, 9 );
+    _exit( "'$file' is not a DBM::Deep file." )
+        unless length $buffer == 9;
+
+    my ($file_sig, $header_sig, $header_ver) = unpack( 'A4 A N', $buffer );
+
+    # SIG_FILE == 'DPDB'
+    _exit( "'$file' is not a DBM::Deep file." )
+        unless $file_sig eq 'DPDB';
+
+    # SIG_HEADER == 'h' - if it is missing, this is a pre-1.0 file
+    return 0 unless ($header_sig eq 'h');
+
+    return $header_ver;
+}
+
+sub _read_buffer {
+    my ($fh, $len) = @_;
+    my $buffer;
+    read( $fh, $buffer, $len );
+    return $buffer;
+}
+
+sub _exit {
+    my ($msg) = @_;
+    pod2usage( -verbose => 0, -msg => $msg );
+}
+
+__END__
+
+=head1 NAME
+
+upgrade_db.pl
+
+=head1 SYNOPSIS
+
+  upgrade_db.pl -input <oldfile> -output <newfile>
+
+=head1 DESCRIPTION
+
+This script attempts to upgrade DB files from one version of DBM::Deep to
+another. The version of the input file is detected from the file header. The
+version of the output file defaults to the version of the distribution this
+script ships with, but can be set, if desired.
+
+=head1 OPTIONS
+
+=over 4
+
+=item B<-input> (required)
+
+This is the name of the original DB file.
+
+=item B<-output> (required)
+
+This is the name of the target output DB file.
+
+=item B<-version>
+
+Optionally, you can specify the version of L<DBM::Deep> for the output file.
+This can either be an upgrade or a downgrade. The minimum version supported is
+0.91.
+
+If the version is the same as the input file, this acts like a compressed copy
+of the database.
+
+=item B<-autobless>
+
+In pre-1.0000 versions, autoblessing was an optional setting. This defaults to
+false.
+
+=item B<-help>
+
+Prints a brief help message, then exits.
+
+=item B<-man>
+
+Prints a much longer message, then exits.
+
+=back
+
+=head1 CAVEATS
+
+The following are known issues with this converter.
+
+=over 4
+
+=item * Disk space requirements
+
+This will require about twice the disk space of the input file.
+
+=item * Feature support
+
+Not all versions support the same features. In particular, internal references
+were supported in 0.983, and that support was removed in 1.0000. upgrade_db.pl
+does not detect this.
+
+=back
+
+=head1 MAINTAINER(S)
+
+Rob Kinyon, L<rkinyon@cpan.org>
+
+Originally written by Rob Kinyon, L<rkinyon@cpan.org>
+
+=head1 LICENSE
+
+Copyright (c) 2007 Rob Kinyon. All Rights Reserved.
+This is free software; you may use it and distribute it under the
+same terms as Perl itself.
+
+=cut
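
For example, upgrading a 0.983-era database to the 1.0000 format (the file
names here are illustrative):

  perl utils/upgrade_db.pl -input old-0983.db -output new-10000.db

or downgrading it again while preserving blessed objects:

  perl utils/upgrade_db.pl -input new-10000.db -output old-0983.db \
      -version 0.983 -autobless 1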