=head1 What is DBM::Deep?
-L<DBM::Deep> is a module written completely in Perl that provides a way of
+L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a way of
storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
in memory. The datafile produced can be FTP'ed from one machine to
another, regardless of OS or Perl version. There are several reasons why
=item * Huge datastructures
Normally, datastructures are limited by the size of RAM the server has.
-L<DBM::Deep> allows for the size a given datastructure to be limited by disk
+L<DBM::Deep|DBM::Deep> allows the size of a given datastructure to be limited by disk
instead.
=item * IPC
=item * Consistent
When the transaction begins and when it is committed, the database must be in
-a legal state. This restriction doesn't apply to L<DBM::Deep> very much.
+a legal state. Since L<DBM::Deep|DBM::Deep> imposes no schema-level constraints, this restriction has little bearing on it.
=item * Isolated
As far as a transaction is concerned, it is the only thing running against the
-database while it is running. Unlike most RDBMSes, L<DBM::Deep> provides the
+database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep> provides the
strongest isolation level possible.
=item * Durable
=head2 The backstory
-The addition of transactions to L<DBM::Deep> has easily been the single most
+The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the single most
complex software endeavor I've ever undertaken. The first step was to figure
out exactly how transactions were going to work. After several spikesN<These
are throwaway coding explorations.>, the best design seemed to look to SVN
=head2 DBM::Deep's file structure
-L<DBM::Deep>'s file structure is a record-based structure. The key (or array
+L<DBM::Deep|DBM::Deep>'s file structure is a record-based structure. The key (or array
index - arrays are currently just funny hashes internally) is hashed using MD5
and then stored in a cascade of Index and BucketList records. The BucketList
record stores the actual key string and pointers to where the data records are
doing a modification to the HEAD. Then, you choose to either check in your
code (commit()) or revert (rollback()).
-In L<DBM::Deep>, I chose to make the HEAD transaction ID 0. This has several
+In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This has several
benefits:
=over 4
=head2 Freespace management
The second major piece to the 1.00 release was freespace management. In
-pre-1.00 versions of L<DBM::Deep>, the space used by deleted keys would not be
+pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys would not be
recycled. While always a requested feature, the complexity required to
implement freespace management meant that it had to wait for a complete
rewrite of several pieces, such as the one transactions required.
-Freespace is implemented by regularizing all the records so that L<DBM::Deep>
+Freespace is implemented by regularizing all the records so that L<DBM::Deep|DBM::Deep>
only has three different record sizes - Index, BucketList, and Data. Each
-record type has a fixed length based on various parameters the L<DBM::Deep>
+record type has a fixed length based on various parameters the L<DBM::Deep|DBM::Deep>
datafile is created with. (In order to accommodate values of various sizes, Data
records chain.) Whenever a sector is freed, it's added to a freelist of that
sector's size. Whenever a new sector is requested, the freelist is checked
have a pointer to a record in memory. If that record is deleted, then reused,
the pointer in memory has no way of determining that it was deleted and
re-added rather than just modified. So, a staleness counter was added, which is incremented
-every time the sector is reused through the freelist.
+every time the sector is reused through the freelist. If you then attempt to
+access that stale record, L<DBM::Deep|DBM::Deep> returns C<undef> because, at some point,
+the entry was deleted.
=head2 Staleness counters
-The measures taken for isolation can cause staleness issues, as well.
+Once it was implemented for freespace management, staleness counters proved to
+be a very powerful concept for transactions themselves. Back in L<Protection
+from changes>, I mentioned that other processes modifying the HEAD will
+protect all running transactions from their effects. This provides
+I<Isolation>. But, the running transaction doesn't know about these entries.
+If they're not cleaned up, they will be seen the next time a transaction uses
+that transaction ID.
+
+By providing a staleness counter for transactions, the cost of cleaning up
+finished transactions is deferred until the space is actually used again, at
+the price of less-than-optimal space utilization. Changing this in the future
+would be completely transparent to users, so I felt it was an acceptable
+tradeoff for delivering working code quickly.
=head1 Conclusion
function to lock the database in exclusive mode for writes, and shared mode
for reads. Pass any true value to enable. This affects the base DB handle
I<and any child hashes or arrays> that use the same DB file. This is an
-optional parameter, and defaults to 0 (disabled). See L<LOCKING> below for
+optional parameter, and defaults to 1 (enabled). See L<LOCKING> below for
more.
=item * autoflush
Specifies whether autoflush is to be enabled on the underlying filehandle.
This obviously slows down write operations, but is required if you may have
multiple processes accessing the same DB file (also consider enabling I<locking>).
-Pass any true value to enable. This is an optional parameter, and defaults to 0
-(disabled).
+Pass any true value to enable. This is an optional parameter, and defaults to 1
+(enabled).
=item * filter_*
=head1 LOCKING
-Enable automatic file locking by passing a true value to the C<locking>
-parameter when constructing your DBM::Deep object (see L<SETUP> above).
+Enable or disable automatic file locking by passing a boolean value to the
+C<locking> parameter when constructing your DBM::Deep object (see L<SETUP>
+above).
    my $db = DBM::Deep->new(
        file    => "foo.db",
        locking => 1,
    );
These will cause an infinite loop when importing. There are plans to fix this
in a later release.
+B<Note:> With the addition of transactions, importing is performed within a
+transaction, then immediately committed upon success (and rolled back upon
+failure). As a result, you cannot call C<import()> from within a transaction.
+This restriction will be lifted when subtransactions are added in a future
+release.
+
=head2 EXPORTING
Calling the C<export()> method on an existing DBM::Deep object will return
B<NOTE>: DBM::Deep 0.99_03 has turned off circular references pending
evaluation of some edge cases. I hope to be able to re-enable circular
-references in a future version prior to 1.00.
+references in a future version after 1.00. This means that circular references
+are B<NO LONGER> available.
DBM::Deep has B<experimental> support for circular references, meaning you
can have a nested hash key or array element that points to a parent object.
=item * Make your file as tight as possible
If you know that you are not going to use more than 65K in your database,
-consider using the C<pack_size =#<gt> 'small'> option. This will instruct
+consider using the C<pack_size =E<gt> 'small'> option. This will instruct
DBM::Deep to use 16-bit addresses, meaning that seek times will be shorter (see
the sketch after this list).
-The same goes with the number of transactions. num_Txns defaults to 16. If you
-can set that to 1 or 2, that will reduce the file-size considerably, thus
-reducing seek times.
=back
-=head1 CAVEATS / ISSUES / BUGS
+=head1 TODO
+
+The following items are planned for future releases. They are separate from
+the L<CAVEATS, ISSUES & BUGS> below.
+
+=head2 SUB-TRANSACTIONS
+
+Right now, you cannot run a transaction within a transaction. Removing this
+restriction is technically straightforward, but the combinatorial explosion of
+possible use cases hurts my head. If this is something you want to see
+immediately, please submit many test cases (one possible shape is sketched
+below).
+
+=head2 CACHING
+
+If a user is willing to assert upon opening the file that this process will be
+the only consumer of that datafile, then there are a number of caching
+possibilities that can be taken advantage of. This does, however, mean that
+DBM::Deep is more vulnerable to losing data due to unflushed changes. It also
+means a much larger in-memory footprint. As such, it's not clear exactly how
+this should be done. Suggestions are welcome.
+
+=head2 RAM-ONLY
+
+The techniques used in DBM::Deep simply require a seekable, contiguous
+datastore. This could just as easily be a large string as a file. By using
+C<substr>, the STM capabilities of DBM::Deep could be used within a
+single process. I have no idea how I'd specify this, though. Suggestions are
+welcome.
+
+=head1 CAVEATS, ISSUES & BUGS
-This section describes all the known issues with DBM::Deep. It you have found
-something that is not listed here, please send e-mail to L<jhuckaby@cpan.org>.
+This section describes all the known issues with DBM::Deep. These are issues
+that are either intractable or depend on some feature within Perl working
+exactly right. If you have found something that is not listed below, please
+send an e-mail to L<rkinyon@cpan.org>. Likewise, if you think you know of a
+way around one of these issues, please let me know.
=head2 REFERENCES
The current level of error handling in DBM::Deep is minimal. Files I<are> checked
for a 32-bit signature when opened, but other corruption in files can cause
-segmentation faults. DBM::Deep may try to seek() past the end of a file, or get
+segmentation faults. DBM::Deep may try to C<seek()> past the end of a file, or get
stuck in an infinite loop depending on the level of corruption. File write
operations are not checked for failure (for speed), so if you happen to run
out of disk space, DBM::Deep will probably fail in a bad way. These things will
Beware of using DBM::Deep files over NFS. DBM::Deep uses C<flock()>, which works
well on local filesystems, but will NOT protect you from file corruption over
NFS. I've heard about setting up your NFS server with a locking daemon, then
-using lockf() to lock your files, but your mileage may vary there as well.
+using C<lockf()> to lock your files, but your mileage may vary there as well.
From what I understand, there is no real way to do it. However, if you need
access to the underlying filehandle in DBM::Deep for using some other kind of
-locking scheme like lockf(), see the L<LOW-LEVEL ACCESS> section above.
+locking scheme like C<lockf()>, see the L<LOW-LEVEL ACCESS> section above.
=head2 COPYING OBJECTS
Beware of copying tied objects in Perl. Very strange things can happen.
Instead, use DBM::Deep's C<clone()> method which safely copies the object and
-returns a new, blessed, tied hash or array to the same level in the DB.
+returns a new, blessed and tied hash or array to the same level in the DB.
my $copy = $db->clone();