=head0 Adding transactions to DBM::Deep

For the past nine months, I've been working on adding transactions to
L<DBM::Deep>. During that time, I looked far and wide for an accessible
description of how a programmer should go about implementing transactions.
The only things I found were either extremely pedantic academic papers or
codeN<Ever tried to read the source for BDB or InnoDB? Reading perl's
source is easier.>. The former weren't very easy to read and the latter
were less so. This is the article I wished I'd been able to read nine
months ago.

=head1 What is DBM::Deep?

L<DBM::Deep> is a module written completely in Perl that provides a way of
storing Perl datastructures (scalars, hashes, and arrays) on disk instead
of in memory. The datafile produced can be transferred from one machine to
another, regardless of OS or Perl version. There are several reasons why
someone would want to do this.

=over 4

=item * Transparent Persistence

This is the ability to save a set of data structures to disk and retrieve
them later without the vast majority of the program even knowing that the
data is persisted. Furthermore, the datastructure is persisted immediately,
not at set marshalling periods.

=item * Huge datastructures

Normally, datastructures are limited by the amount of RAM the server has.
L<DBM::Deep> allows the size of a given datastructure to be limited by disk
space instead (up to the limits of the given perl's largefile support).

=item * Database

Once you have persistent datastructures, you start wanting to have multiple
processes be able to use that data. (More on this later.)

=back

=head1 How does DBM::Deep work?

L<DBM::Deep> works by tying a variable to a file on disk. Every read and
write goes directly to the file and modifies the file immediately. To
represent Perl's hashes and arrays, a record-based file format is used.
There is a file header storing file-wide values, such as the size of the
internal file pointers. After the header come the data records.

The most important feature of L<DBM::Deep> is that it can be completely
transparent. Other than the line tying the variable to the file, no other
part of your program needs to know that the variable being used isn't a
"normal" Perl variable. So, the following works just fine:

  # As far as the rest of the program is concerned, the following two lines
  # are identical - they produce a variable $foo that can be used as a
  # hashref.
  # my $foo = {};
  my $foo = DBM::Deep->new( 'mydb.db' );

  $foo->{bar} = 'baz';
  $foo->{complex} = [
      { a => 'b' }, 0, '123', undef, [ 1 .. 5 ],
  ];

  # And so on and so forth.

=head2 DBM::Deep's file structure

L<DBM::Deep>'s file structure is record-based. The key (or array index -
arrays are currently just funny hashes internally) is hashed using MD5 and
then stored in a cascade of Index and BucketList records. The BucketList
record stores the actual key string and pointers to where the data records
are stored. The data records themselves are one of Null, Scalar, or
Reference. Null represents an I<undef>. Scalar represents a string (numbers
are stringified internally for simplicity) and is allocated in 256-byte
chunks. Reference represents an array or hash reference and contains a
pointer to an Index and BucketList cascade of its own. A Reference will
also store the class the hash or array reference is blessed into, meaning
that almost all objects can be stored safely.
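
To make the cascade concrete, here is a rough sketch of how a key lookup
conceptually proceeds. This is illustrative only: the records are modeled
as in-memory hashrefs, and the name C<find_data_pointer>, the field names,
and the per-byte fan-out are assumptions made for the example, not the real
engine code.

  use strict;
  use warnings;
  use Digest::MD5 qw( md5 );

  # Illustrative model of the Index / BucketList cascade. Each Index
  # fans out on successive bytes of the key's MD5 digest; the BucketList
  # at the bottom stores the full key next to its datapointer.
  sub find_data_pointer {
      my ($root_index, $key) = @_;

      my @digest = map { ord } split //, md5( $key );
      my $sector = $root_index;

      # Walk Index records, consuming one digest byte per level.
      while ( $sector->{type} eq 'Index' ) {
          $sector = $sector->{slots}[ shift @digest ]
              or return;    # the key was never stored
      }

      # The BucketList stores the actual key string, so keys whose
      # digests collide at this depth can still be told apart.
      for my $bucket ( @{ $sector->{buckets} } ) {
          return $bucket->{data_ptr} if $bucket->{key} eq $key;
      }
      return;
  }
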
=head2 DBM::Deep's class hierarchy

Managing all of these functions takes a lot of different abstractions.
There are at least three different interfacing layers, and more if you look
hard enough. To manage this complexity, L<DBM::Deep> uses the following
abstractions:

=over 4

=item * Tying classes

These are the classes that manage the external face of the module. They
manage B<all> of the interactions with outside code - the OO interface,
tying, and various utility methods. If they cannot handle a request
themselves, they delegate to the engine. There are currently three classes
in this layer.

=item * Engine classes

These classes manage the file format and all of the ways that the records
interact with each other. Nearly every call will make requests to the File
class for reading and/or writing data to the file. There are currently nine
classes in this layer, including a class for each record type.

=item * File class

This class mediates all interaction with the filesystem. Every read, write,
lock, and unlock goes through this class. There is currently one class in
this layer.

=item * Iterator classes

These are introspection classes that provide iteration for hashes. They
manage keeping track of where the next key should be and how to get there.
There are currently four classes in this layer.

=back

=head1 Why add transactions to DBM::Deep?

For the most part, L<DBM::Deep> functions perfectly well without
transactions. Most uses that I've seen tend to be either single-process
read/write persistence or a multi-process read-only cache for an expensive,
but static, lookup. With transactions, L<DBM::Deep> can now be used safely
for multi-process read/write persistence in situations that don't need (or
cannot use) a full RDBMS.

=head2 What are transactions?

Originally from the database world, a transaction is a way of isolating the
effects of a given set of actions, then applying them all at once. It's a
way of saying "I'm going to try the following steps and see if I like the
result. If I do, I want everyone else looking at this datastore to see the
results all at once."

The most common example is taken from banking. Let's say that an
application receives a request to have Joe pay Bob five zorkmids. Without
transactions, the application would take the money from Joe's account, then
add the money to Bob's account. But, what happens if the application
crashes after debiting Joe, but before crediting Bob? The application has
made money disappear. Or, vice versa, if Bob is credited before Joe is
debited, the application has created money.

With a transaction wrapping the money transfer, if the application crashes
in the middle, it's as if the action never happened. So, when the
application recovers from the crash, Joe and Bob still have the same amount
of money in their accounts as they did before. The transaction can restart
and Bob can finally receive his zorkmids.

More formally, transactions are generally considered to be proper when they
are ACID-compliant. ACID is an acronym that stands for the following:

=over 4

=item * Atomic

Either every change happens or none of the changes happen.

=item * Consistent

When the transaction begins and when it is committed, the database must be
in a legal state. This condition doesn't apply to L<DBM::Deep> as all Perl
data structures are internally consistent.

=item * Isolated

As far as a transaction is concerned, it is the only thing running against
the database while it is running. Unlike most RDBMSes, L<DBM::Deep>
provides the strongest isolation level possible, usually called
I<serializable> by most RDBMSes.

=item * Durable

Once the database says that a commit has happened, the commit is
guaranteed, regardless of what happens afterwards. I chose to not implement
this condition in L<DBM::Deep> (q.v. L</Durability> for the reasons).

=back
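
Returning to the zorkmid example, here is a sketch of the transfer using
L<DBM::Deep>'s transaction API. C<begin_work()>, C<commit()>, and
C<rollback()> are the actual method names; the account layout and the
C<die>-based error handling are assumptions made for the example.
(Depending on your version, the file may need to be created with enough
transaction slots available, e.g. via the C<num_txns> option.)

  use strict;
  use warnings;
  use DBM::Deep;

  my $bank = DBM::Deep->new( 'bank.db' );

  # Seed the (hypothetical) accounts.
  $bank->{joe} = 20;
  $bank->{bob} = 0;

  sub transfer {
      my ($from, $to, $amount) = @_;

      $bank->begin_work();

      eval {
          die "insufficient funds\n" if $bank->{$from} < $amount;
          $bank->{$from} -= $amount;
          $bank->{$to}   += $amount;
          $bank->commit();
      };
      if ( $@ ) {
          # If anything failed mid-transfer, neither account changes.
          $bank->rollback();
          warn "transfer failed: $@";
      }
  }

  # Joe pays Bob five zorkmids - all or nothing.
  transfer( 'joe', 'bob', 5 );
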
=head2 Why add them to DBM::Deep?

The ability to have actions occur either I<all at once> (as in the previous
example) or I<in isolation> from the rest of the users of the data is a
powerful thing. This allows for a large amount of safety and predictability
in how data transformations occur. Imagine, for example, that you have a
set of calculations that will update various variables. However, there are
some situations that will cause you to throw away all results and start
over with a different seed. Without transactions, you would have to put
everything into temporary variables, then transfer the values when the
calculations were found to be successful. If you ever add a new value or if
a value is used in only certain calculations, you may forget to do the
correct thing. With transactions, you start a transaction and do your thing
within it. If the calculations succeed, you commit. If they fail, you
rollback and try again.

If you're thinking that this is very similar to how SVN or CVS works,
you're absolutely correct - they are transactional in exactly the same way.

=head1 How it happened

=head2 The backstory

The addition of transactions to L<DBM::Deep> has easily been the single
most complex software endeavor I've ever undertaken. The first step was to
figure out exactly how transactions were going to work. After several
spikes (quick, throwaway prototypes), the best design seemed to be one that
looked to SVN instead of relational databases. The more I investigated, the
more I ran up against the object-relational impedance mismatch, this time
in terms of being able to translate designs.

In the relational world, transactions are generally implemented either as
row-level locks or using MVCC (Multi-Version Concurrency Control). Both of
these assume that there is a I<row>, or singular object, that can be locked
transparently to everything else. This doesn't translate to a fractally
repeating structure like a hash or an array.

However, the design used by SVN deals with directories and files, which
correspond very closely to hashes and hashkeys. In SVN, the modifications
are stored in the file's metadata. Translating this to hashes and hashkeys,
this means that transactional information should be stored in the key's
metadata. Or, in L<DBM::Deep> terms, within the Bucket for that key. As a
nice side-effect, the entire datafile is unaware of anything to do with
transactions, except for the key's data structure within the bucket.

=head2 Transactions in the keys

The first pass was to expand the BucketList sector to go from a simple
key / datapointer mapping to a more complex key / transaction / datapointer
mapping. Initially, I interposed a Transaction record that the bucketlist
pointed to, which then contained the transaction / datapointer mapping.
This had the advantage of changing nothing except for adding one new sector
type and the handling for it. This was very quickly merged into the
BucketList record to simplify the resulting code.

This first step got me to the point where I could pass the following test:

  my $db1 = DBM::Deep->new( $filename );
  my $db2 = DBM::Deep->new( $filename );

  $db1->{abc} = 'foo';

  is( $db1->{abc}, 'foo' );
  is( $db2->{abc}, 'foo' );

  $db1->begin_work();

  is( $db1->{abc}, 'foo' );
  is( $db2->{abc}, 'foo' );

  $db1->{abc} = 'floober';

  is( $db1->{abc}, 'floober' );
  is( $db2->{abc}, 'foo' );

Just that much was a major accomplishment. The first pass only looked in
the transaction's spot in the bucket for that key. And, that passed my
first tests because I didn't check that C<$db1-E<gt>{abc}> was still 'foo'
I<before> modifying it in the transaction. To pass the test above, the code
for retrieval needed to look first in the transaction's spot and, if that
spot had never been assigned to, look at the spot for the HEAD.
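
In pseudo-Perl, that retrieval logic looks something like the following
sketch, with a bucket's per-transaction spots modeled as a simple array
indexed by transaction ID (slot 0 being the HEAD, as explained in the next
section). The field names are assumptions made for illustration.

  use strict;
  use warnings;

  # $bucket->{slots} is an illustrative array of entries, indexed by
  # transaction ID. Slot 0 is the HEAD.
  sub read_value {
      my ($bucket, $txn_id) = @_;

      if ( $txn_id ) {
          # Inside a transaction, the transaction's own slot wins, but
          # only if it has ever been assigned to.
          my $entry = $bucket->{slots}[$txn_id];
          return $entry->{value} if $entry;
      }

      # Otherwise (or if the slot was never written), fall back to
      # the HEAD.
      my $head = $bucket->{slots}[0];
      return $head ? $head->{value} : undef;
  }
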
=head2 The concept of the HEAD

This is a concept borrowed from SVN. In SVN, the HEAD revision is the
latest revision checked into the repository. When you do a local
modification, you're doing a modification to your copy of the HEAD. Then,
you choose to either check in your code (commit()) or revert (rollback()).

In L<DBM::Deep>, I chose to make the HEAD transaction ID 0. This has
several benefits:

=over 4

=item * Easy identification of a transaction

A simple C<if ( $transaction_id ) { ... }> will run the code if and only if
we are in a running transaction.

=item * The HEAD is the first bucket

In a given bucket, the HEAD is the first datapointer because we multiply
the size of the transactional bookkeeping by the transaction ID to find the
offset to seek into the file.

=back

=head2 Protection from changes

Let's assume that a transaction is running in process A while process B is
modifying the same area of the data. The only way that process B can notify
process A that a change has occurred is through the common point - the DBM
file. Because of this, every process that changes the HEAD needs to protect
all currently running transactions by copying the pointer to the original
value into every transaction which hasn't already modified this entry. (If
it has, then the new value shouldn't overwrite the transaction's
modification.) This is the key piece for providing I<Isolation>.

=head2 Tracking modified buckets

Rolling back changes is very simple - just don't apply them to the HEAD.
The next time that transaction ID is reused, the changes will be ignored
(q.v. L</Staleness counters>). Committing, however, requires that all the
changes be transferred over from the bucket entry for the given transaction
ID to the entry for the HEAD.

This tracking is done by the modified buckets themselves. They register
themselves with the engine as having been modified within the transaction.
Given L</Protection from changes>, this guarantees that only those changes
made by the transaction will be transferred over.

=head2 Deleted marker

Transactions are performed copy-on-write. This means that if there isn't an
entry for that transaction, the HEAD is looked at. This doesn't work if a
key has been deleted within a transaction. So, the entry must be marked as
deleted within the transaction so that the HEAD isn't checked.

Likewise, when a new key is created in a transaction, the HEAD doesn't have
an entry for that key. Consider the following situation:

  ok( !exists $db1->{foo} );
  ok( !exists $db2->{foo} );

  $db1->begin_work();

  $db1->{foo} = 'bar';

  ok( !exists $db2->{foo} );

The entry for the HEAD for 'foo' needs to be marked as deleted so that
transactions which don't have 'foo' don't find something in the HEAD.

=head2 Freespace management

The second major piece of the 1.00 release was freespace management. In
pre-1.00 versions of L<DBM::Deep>, the space used by deleted keys was not
recycled. While this was always a requested feature, the complexity
required to implement freespace meant that it needed to wait for a complete
rewrite of several pieces, such as the engine.

Freespace is implemented by regularizing all the records so that
L<DBM::Deep> only has three different record sizes - Index, BucketList, and
Data. Each record type has a fixed length based on various parameters the
L<DBM::Deep> datafile is created with. (In order to accommodate values of
various sizes, Data records chain.) Whenever a sector is freed, it's added
to a freelist of that sector's size. Whenever a new sector is requested,
the freelist is checked first. If the freelist has a sector, it's reused;
otherwise a new sector is added to the end of the datafile.
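
A minimal in-memory sketch of that allocation strategy follows. The real
engine keeps its freelists inside the datafile itself and works in terms of
file offsets; the data structures here, and the staleness counter (which
anticipates the discussion just below), are assumptions made for
illustration.

  use strict;
  use warnings;

  my %freelist;          # sector size => stack of freed offsets
  my %staleness;         # offset      => reuse counter
  my $end_of_file = 0;

  sub free_sector {
      my ($size, $offset) = @_;
      push @{ $freelist{$size} }, $offset;
  }

  sub allocate_sector {
      my ($size) = @_;

      # Reuse a freed sector of exactly this size if one is available...
      if ( @{ $freelist{$size} ||= [] } ) {
          my $offset = pop @{ $freelist{$size} };

          # Bump the staleness counter so stale in-memory pointers can
          # detect that this sector was recycled (discussed below).
          $staleness{$offset}++;
          return $offset;
      }

      # ...otherwise extend the datafile.
      my $offset = $end_of_file;
      $end_of_file += $size;
      return $offset;
  }
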
Freespace management did bring up another issue - staleness. It is possible
to have a pointer to a record in memory. If that record is deleted, then
reused, the pointer in memory has no way of determining that it was deleted
and re-added rather than modified. So, a staleness counter was added which
is incremented every time the sector is reused through the freelist. If you
then attempt to access that stale record, L<DBM::Deep> returns undef
because, at some point, the entry was deleted.

=head2 Staleness counters

Once they were implemented for freespace management, staleness counters
proved to be a very powerful concept for transactions themselves. Back in
L</Protection from changes>, I mentioned that other processes modifying the
HEAD will protect all running transactions from their effects. This
provides I<Isolation>. But, the running transaction doesn't know about
these entries. If they're not cleaned up, they will be seen the next time a
transaction uses that transaction ID.

By providing a staleness counter for transactions, the cost of cleaning up
finished transactions is deferred until the space is actually used again.
This comes at the cost of less-than-optimal space utilization. Changing
this in the future would be completely transparent to users, so I felt it
was an acceptable tradeoff for quick delivery of a functional product.

=head1 The future

Basic transactions are only the first step. There are several features that
can be added on top of what's been provided. (Note: these are advanced
topics - I cannot be held responsible for any bleeding ears.)

=head2 Conflict resolution

Conflict, in this context, is what happens when two transactions run
simultaneously and change the same piece of data. Like most databases,
L<DBM::Deep> uses a very simplistic form of conflict resolution - last
commit wins. This works quite well for a row-based RDBMS, but doesn't work
as well for fractal structures like hashes.

Contrast this with how Subversion handles conflict. It tracks when each
transaction was started. If the HEAD is at a later version than when the
transaction started, the commit is rejected. It is up to the developer to
pull in the latest changes, mediate any conflicts, and then recommit.

=head2 Checkpoints

A transaction may have several steps within it. The first three may
succeed, but the fourth might fail. Instead of rolling back all the way to
the beginning, you might want to roll back to the last successful step and
try again. This is a checkpoint. Most RDBMSes provide them, but we'll see
if users request them for L<DBM::Deep>.

=head2 Sub-transactions

Similar to L</Checkpoints>, sub-transactions provide a mechanism for trying
something within a transaction without affecting the transaction. However,
instead of saying "These steps are safely finished," sub-transactions still
provide the ability to roll back the primary transaction all the way.
Sub-transactions can also work better with libraries that may want to use
transactions themselves.

=head2 Durability

As mentioned in L</What are transactions?>, the 'D' in ACID stands for
I<Durable>. L<DBM::Deep> does not satisfy that criterion because durability
almost always requires another file (or files) for a commit log. I deemed
this unacceptable for this release because one of L<DBM::Deep>'s features
is the single datafile. However, Berkeley DB provides durability with only
a single file, so cribbing from its implementation may be viable in the
future.

=cut