L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a way of
storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
in memory. The datafile produced can be transferred from one machine to
another, regardless of OS or Perl version. There are several reasons why
someone would want to do this.
Normally, datastructures are limited by the size of RAM the server has.
L<DBM::Deep|DBM::Deep> allows the size of a given datastructure to be limited by disk
instead (up to the given perl's largefile support).
=item * IPC
=back
=head1 How does DBM::Deep work?
L<DBM::Deep|DBM::Deep> works by tying a variable to a file on disk. Every
read and write goes to the file and modifies it immediately. To
represent Perl's hashes and arrays, a record-based file format is used. There
is a file header storing file-wide values, such as the size of the internal
file pointers. Afterwards, there are the data records.
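The header idea can be sketched with Perl's C<pack>. The signature and field
layout below are purely illustrative, not L<DBM::Deep|DBM::Deep>'s actual
header format:

```perl
use strict;
use warnings;

# Sketch of a record-based file header like the one described above: a
# magic signature plus a byte giving the width of internal file pointers.
# The 'DPDB' signature and layout are made up for illustration.
my $header = pack 'a4 C', 'DPDB', 8;    # signature, pointer size in bytes

my ($sig, $ptr_size) = unpack 'a4 C', $header;
print "signature=$sig pointer_size=$ptr_size\n";
```

Every record that follows the header can then be read by seeking to an offset
and unpacking a known layout in the same way.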
The most important feature of L<DBM::Deep|DBM::Deep> is that it can be
completely transparent. Other than the line tying the variable to the file, no
other part of your program needs to know that the variable being used isn't a
"normal" Perl variable.

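That transparency comes from Perl's C<tie> mechanism, which is what
L<DBM::Deep|DBM::Deep> is built on. This toy example uses only the core
L<Tie::Hash> module (no file, no DBM::Deep) just to show that ordinary hash
syntax keeps working after the tie:

```perl
use strict;
use warnings;

# A toy tied hash standing in for DBM::Deep's file-backed storage. After
# tie(), every read and write goes through FETCH/STORE behind the scenes,
# yet the calling code uses plain hash syntax throughout.
package CountingStore;
use Tie::Hash;
our @ISA = ('Tie::StdHash');

my $writes = 0;
sub STORE { my $self = shift; $writes++; $self->SUPER::STORE(@_) }
sub write_count { $writes }

package main;

tie my %h, 'CountingStore';   # with DBM::Deep, this line would name a file
$h{name} = 'joe';             # looks like a normal hash assignment...
$h{age}  = 42;                # ...but each one went through STORE
print "$h{name}: ", CountingStore::write_count(), " writes\n";
```

Swap C<CountingStore> for C<DBM::Deep> plus a filename and the rest of the
program is unchanged.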
=head2 DBM::Deep's file structure
L<DBM::Deep|DBM::Deep>'s file structure is record-based. The key (or array
index - arrays are currently just funny hashes internally) is hashed using MD5
and then stored in a cascade of Index and Bucketlist records. The bucketlist
record stores the actual key string and pointers to where the data records are
stored. The data records themselves are one of Null, Scalar, or Reference.
Null represents an I<undef>, Scalar represents a string (numbers are
stringified internally for simplicity) and is allocated in 256-byte chunks.
References represent an array or hash reference and contain a pointer to an
Index and Bucketlist cascade of their own. References also store the class the
hash or array reference is blessed into, meaning that almost all objects can
be stored safely.
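The key-to-record lookup can be sketched with the core L<Digest::MD5> module.
Here each byte of the hashed key selects a slot at successive Index levels
until a Bucketlist is reached; the fan-out and depth are illustrative, not
L<DBM::Deep|DBM::Deep>'s real parameters:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Sketch of the Index/Bucketlist cascade described above. Each byte of
# the key's MD5 digest picks one of 256 slots at each Index level; after
# a few levels we land in a Bucketlist that stores the full key string
# and a pointer to the data record. Depth of 3 is illustrative only.
my $key   = 'username';
my @bytes = unpack 'C16', md5($key);   # 16 bytes of digest

my $depth  = 3;
my @path   = @bytes[0 .. $depth - 1];  # slot chosen at each Index level
my $bucket = join '/', @path;          # identifies one Bucketlist

print "key '$key' -> index path @path -> bucket $bucket\n";
```

Because MD5 is deterministic, the same key always walks the same path, and
the full key string stored in the Bucketlist resolves any hash collisions.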
=head2 DBM::Deep's class hierarchy
These classes manage the file format and all of the ways that the records
interact with each other. Nearly every call will make requests to the File
class for reading and/or writing data to the file. There are currently nine
classes in this layer, including a class for each record type.
=item * File class
With a transaction wrapping the money transfer, if the application crashes in
the middle, it's as if the action never happened. So, when the application
recovers from the crash, Joe and Bob still have the same amount of money in
their accounts as they did before. The transaction can restart and Bob can
finally receive his zorkmids.
More formally, transactions are generally considered to be proper when they are
ACID-compliant. ACID is an acronym that stands for the following:
=over 4
=item * Consistent
When the transaction begins and when it is committed, the database must be in
a legal state. This condition doesn't apply to L<DBM::Deep|DBM::Deep> as all
Perl data structures are internally consistent.
=item * Isolated
As far as a transaction is concerned, it is the only thing running against the
database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep>
provides the strongest isolation level possible, called I<Serializable> in
most RDBMSes.
=item * Durable
Once the database says that a commit has happened, the commit will be
guaranteed, regardless of whatever happens. I chose not to implement this
condition in L<DBM::Deep|DBM::Deep>N<This condition requires the presence of
at least one more file, which violates the original design goals.>.
=back
The ability to have actions occur either I<atomically> (as in the previous
example) or in I<isolation> from the rest of the users of the data is a
powerful thing. This allows for a large amount of safety and predictability in how
data transformations occur. Imagine, for example, that you have a set of
calculations that will update various variables. However, there are some
situations that will cause you to throw away all results and start over with a
different seed. Without transactions, you would have to put everything into
temporary variables, then transfer the values when the calculations were found
to be successful. If you ever add a new value or if a value is used in only
certain calculations, you may forget to do the correct thing. With
transactions, you start a transaction and do your thing within it. If the
calculations succeed, you commit. If they fail, you rollback and try again. If
you're thinking that this is very similar to how SVN or CVS works, you're
absolutely correct - they are transactional in exactly the same way.
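The commit-or-rollback workflow can be sketched in pure Perl. This is only the
shape of the idea - work happens against a scratch copy, a commit publishes
it, and a failure simply discards it - not L<DBM::Deep|DBM::Deep>'s actual
implementation, which exposes the same workflow through its C<begin_work>,
C<commit>, and C<rollback> methods:

```perl
use strict;
use warnings;

# Minimal pure-Perl sketch of the transactional workflow described above.
# All modifications go to a scratch copy; commit publishes the copy,
# while any failure (die) rolls back by discarding it.
my %data = ( seed => 42, result => 0 );

sub transact {
    my ($data, $body) = @_;
    my %scratch = %$data;          # copy-on-write working set
    my $ok = eval { $body->(\%scratch); 1 };
    %$data = %scratch if $ok;      # commit: publish the scratch copy
    return $ok;                    # rollback: %scratch just goes away
}

transact(\%data, sub { $_[0]{result} = $_[0]{seed} * 2 });        # commits
transact(\%data, sub { $_[0]{result} = -1; die "bad seed\n" });   # rolls back

print "$data{result}\n";   # the failed attempt left no trace: 84
```

Note that the calculation code never touches temporary variables of its own;
the transaction machinery handles the bookkeeping.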
=head1 How it happened
=head2 The backstory
The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the
single most complex software endeavor I've ever undertaken. The first step was
to figure out exactly how transactions were going to work. After several
spikesN<These are throwaway coding explorations.>, the best design seemed to
look to SVN instead of relational databases. The more I investigated, the more
I ran up against the object-relational impedance mismatch
N<http://en.wikipedia.org/wiki/Object-Relational_Impedance_Mismatch>, this
time in terms of being able to translate designs. In the relational world,
transactions are generally implemented either as row-level locks or using MVCC
N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. Both of
these assume that there is a I<row>, or singular object, that can be locked
transparently to everything else. This doesn't translate to a fractally
repeating structure like a hash or an array.
However, the design used by SVN deals with directories and files which
corresponds very closely to hashes and hashkeys. In SVN, the modifications are
stored in the file's metadata. Translating this to hashes and hashkeys, this
means that transactional information should be stored in the key's metadata.
Or, in L<DBM::Deep|DBM::Deep> terms, within the Bucket for that key. As a nice
side-effect, the entire datafile is unaware of anything to do with
transactions, except for the key's data structure within the bucket.
=head2 Transactions in the keys
=head2 The concept of the HEAD
This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
revision checked into the repository. When you do a local modification, you're
doing a modification to your copy of the HEAD. Then, you choose to either
check in your code (commit()) or revert (rollback()).
In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This has several
benefits:
must be transferred over from the bucket entry for the given transaction ID to
the entry for the HEAD.
This tracking is done by the modified buckets themselves.

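A bucket carrying per-transaction entries can be sketched as an array indexed
by transaction ID, with slot 0 as the HEAD. This is purely illustrative, not
the on-disk bucket layout:

```perl
use strict;
use warnings;

# Sketch of a bucket with per-transaction value slots. Slot 0 is the
# HEAD; slots 1..N belong to running transactions. A transaction reads
# its own copy-on-write entry if it has one, and the HEAD otherwise.
my %bucket_for;    # key => [ head_value, txn1_value, txn2_value, ... ]

sub txn_store {
    my ($key, $txn_id, $value) = @_;
    $bucket_for{$key}[$txn_id] = $value;
}

sub txn_fetch {
    my ($key, $txn_id) = @_;
    my $slots = $bucket_for{$key} or return undef;
    return defined $slots->[$txn_id] ? $slots->[$txn_id] : $slots->[0];
}

sub txn_commit {
    my ($txn_id) = @_;
    for my $slots (values %bucket_for) {
        next unless defined $slots->[$txn_id];
        $slots->[0] = $slots->[$txn_id];   # transfer over to the HEAD
        undef $slots->[$txn_id];           # clear the transaction's slot
    }
}

txn_store('balance', 0, 100);              # HEAD value
txn_store('balance', 1, 75);               # transaction 1's modification
print txn_fetch('balance', 2), "\n";       # txn 2 still sees the HEAD: 100
txn_commit(1);
print txn_fetch('balance', 2), "\n";       # now everyone sees 75
```

Because only the buckets know about transaction slots, the rest of the file
format never has to change.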
=head2 Deleted marker
Transactions are performed copy-on-write. This means that if there isn't an
pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys would not be
recycled. While always a requested feature, the complexity required to
implement freespace meant that it needed to wait for a complete rewrite of
several pieces, such as the engine.
Freespace is implemented by regularizing all the records so that L<DBM::Deep|DBM::Deep>
only has three different record sizes - Index, BucketList, and Data. Each
finished transactions is deferred until the space is actually used again. This
is at the cost of having less-than-optimal space utilization. Changing this in
the future would be completely transparent to users, so I felt it was an
acceptable tradeoff for quick delivery of a functional product.
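With only three record sizes, freespace reduces to one free list per size:
freeing a record pushes its offset onto the list, and allocation pops from the
list before extending the file. The record sizes and structure below are
illustrative only:

```perl
use strict;
use warnings;

# Sketch of the freespace scheme described above: a free list per record
# type. Deleted records are reused on the next allocation of the same
# size; the file never shrinks, which is the space-utilization tradeoff.
my %free        = ( Index => [], BucketList => [], Data => [] );
my %size_of     = ( Index => 256, BucketList => 128, Data => 256 ); # made up
my $next_offset = 0;

sub alloc {
    my ($type) = @_;
    # Reuse a freed record of the same size if one is available...
    return pop @{ $free{$type} } if @{ $free{$type} };
    # ...otherwise extend the file by one record.
    my $off = $next_offset;
    $next_offset += $size_of{$type};
    return $off;
}

sub free_record {
    my ($type, $offset) = @_;
    push @{ $free{$type} }, $offset;   # deferred reuse, not reclamation
}

my $a = alloc('Data');        # offset 0
my $b = alloc('Data');        # offset 256
free_record(Data => $a);      # key deleted: record goes on the free list
my $c = alloc('Data');        # reuses offset 0
print "a=$a b=$b c=$c\n";
```

Regularized record sizes are what make this cheap: any freed Data record can
satisfy any future Data allocation without fragmentation concerns.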
=head1 The future
=cut