=head0 Adding transactions to DBM::Deep

For the past nine months, I've been working on adding transactions to
L<DBM::Deep>. During that time, I looked far and wide for an accessible
description of how a programmer should go about implementing transactions.
The only things I found were either extremely pedantic academic papers or the
code of complex applications. The former weren't very easy to read and the
latter were less so N<Have I<you> ever tried to read the source for BDB or
InnoDB? Reading perl's source is easier.>. This is the article I wished I'd
been able to read nine months ago.

=head1 What is DBM::Deep?

L<DBM::Deep> is a module written completely in Perl that provides a way of
storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
in memory. The datafile produced can be transferred from one machine to
another, regardless of OS or Perl version. There are several reasons why
someone would want to do this.

=over 4

=item * Transparent Persistence

This is the ability to save a set of datastructures to disk and retrieve them
later without the vast majority of the program ever knowing that the data is
persisted. Furthermore, the datastructure is persisted immediately and not at
set marshalling periods.

=item * Huge datastructures

Normally, datastructures are limited by the amount of RAM the server has.
L<DBM::Deep> allows the size of a given datastructure to be limited by disk
size instead (up to the given perl's largefile support).

=item * Database

Most programmers hear the word "database" and think "relational database
management system" (or RDBMS). A database is a more general term meaning
"place one stores data." This can be relational, object, or something else.
The software used to manage and query a database is a "database management
system" (DBMS). L<DBM::Deep> provides one half of a DBMS - the data storage
part. Once the datastructures are on disk, L<DBM::Deep> provides the
capability to allow multiple processes to access the data.

=back

=head1 How does DBM::Deep work?

L<DBM::Deep> works by tying a variable to a file on disk. Every read and
write goes directly to the file and modifies the file immediately. To
represent Perl's hashes and arrays, a record-based file format is used. There
is a file header storing file-wide values, such as the size of the internal
file pointers. After the header come the data records.

The most important feature of L<DBM::Deep> is that it can be completely
transparent. Other than the line tying the variable to the file, no other
part of your program needs to know that the variable being used isn't a
"normal" Perl variable. So, the following works just fine:

  # As far as the rest of the program is concerned, the following two
  # lines are identical - they produce a variable $foo that can be used
  # as a hashref.
  # my $foo = {};
  my $foo = DBM::Deep->new( 'mydb.db' );

  $foo->{bar} = 'baz';
  $foo->{complex} = [
      { a => 'b' }, 0, '123', undef, [ 1 .. 5 ],
  ];

  # And so on and so forth.

=head2 DBM::Deep's file structure

L<DBM::Deep>'s file structure is record-based. The key (or array index -
arrays are currently just funny hashes internally) is hashed using MD5 and
then stored in a cascade of Index and bucketlist records. The bucketlist
record stores the actual key string and pointers to where the data records
are stored. The data records themselves are one of Null, Scalar, or
Reference. Null represents an I<undef>. Scalar represents a string (numbers
are stringified internally for simplicity) and is allocated in fixed-size
chunks. Reference represents an array or hash reference and contains a
pointer to an Index and bucketlist cascade of its own. Reference will also
store the class the hash or array reference is blessed into, meaning that
almost all objects can be stored safely.
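To make the cascade concrete, here is a minimal sketch of a key lookup, with
in-memory hashes standing in for the on-disk Index and bucketlist records.
The layout and field names (C<type>, C<slots>) are illustrative, not
L<DBM::Deep>'s actual internals:

  use Digest::MD5 qw( md5 );

  # Walk Index records using successive bytes of the key's MD5 digest
  # until we reach the bucketlist that would hold this key.
  sub find_bucketlist {
      my ($root_index, $key) = @_;
      my $digest = md5( $key );   # 16 raw bytes
      my $depth  = 0;
      my $node   = $root_index;

      while ( $node->{type} eq 'index' ) {
          my $byte = ord( substr( $digest, $depth++, 1 ) );
          $node = $node->{slots}[ $byte ]
              or return;          # nothing stored under this prefix yet
      }

      return $node;               # a bucketlist record
  }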
=head2 DBM::Deep's class hierarchy

Managing all of these functions takes a lot of different abstractions. There
are at least three different interfacing layers, and more if you look hard
enough. To manage this complexity, L<DBM::Deep> uses the following
abstractions:

=over 4

=item * Tying classes

These are the classes that manage the external face of the module. They
manage B<all> of the interactions with outside code - the OO interface,
tying, and various utility methods. If they cannot handle a request
themselves, they delegate to the engine. There are currently three classes
in this layer.

=item * Engine classes

These classes manage the file format and all of the ways that the records
interact with each other. Nearly every call will make requests to the File
class for reading and/or writing data to the file. There are currently nine
classes in this layer, including a class for each record type.

=item * File class

This class mediates all interaction with the filesystem. Every read, write,
lock, and unlock goes through this class. There is currently one class in
this layer.

=item * Iterator classes

These are introspection classes that provide iteration for hashes. They keep
track of where the next key should be and how to get there. There are
currently four classes in this layer.

=back

=head1 Why add transactions to DBM::Deep?

For the most part, L<DBM::Deep> functions perfectly well without
transactions. Most uses that I've seen tend to be either single-process
read/write persistence or a multi-process readonly cache for an expensive,
but static, lookup. With transactions, L<DBM::Deep> can now be used safely
for multi-process read/write persistence in situations that don't need (or
cannot use) a full RDBMS.

=head2 What are transactions?

Originally from the database world, a transaction is a way of isolating the
effects of a given set of actions, then applying them all at once. It's a
way of saying "I'm going to try the following steps, see if I like the
result, and, if I do, I want everyone else looking at this datastore to see
the results all at once."

The most common example is taken from banking. Let's say that an application
receives a request to have Joe pay Bob five zorkmids. Without transactions,
the application would take the money from Joe's account, then add the money
to Bob's account. But, what happens if the application crashes after debiting
Joe, but before crediting Bob? The application has made money disappear. Or,
vice versa, if Bob is credited before Joe is debited, the application has
created money.

With a transaction wrapping the money transfer, if the application crashes in
the middle, it's as if the action never happened. So, when the application
recovers from the crash, Joe and Bob still have the same amount of money in
their accounts as they did before. The transaction can restart and Bob can
finally receive his zorkmids.

More formally, transactions are generally considered to be proper when they
are ACID-compliant. ACID is an acronym that stands for the following:

=over 4

=item * Atomic

Either every change happens or none of the changes happen.

=item * Consistent

When the transaction begins and when it is committed, the database must be in
a legal state. This condition doesn't apply to L<DBM::Deep> as all Perl
datastructures are internally consistent.

=item * Isolated

As far as a transaction is concerned, it is the only thing running against
the database while it is running. Unlike most RDBMSes, L<DBM::Deep> provides
the strongest isolation level possible, usually called I<SERIALIZABLE> by
most RDBMSes.

=item * Durable

Once the database says that a commit has happened, the commit is guaranteed
to persist, regardless of whatever happens afterwards. I chose not to
implement this condition in L<DBM::Deep> N<The reasons for this are discussed
in L</Durability>.>.

=back
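In L<DBM::Deep> terms, the zorkmid transfer above might look something like
the following sketch. The account keys are made up for illustration;
C<begin_work>, C<commit>, and C<rollback> are the module's actual transaction
methods:

  my $db = DBM::Deep->new( 'bank.db' );
  $db->{joe} = 100;
  $db->{bob} = 0;

  $db->begin_work();

  eval {
      $db->{joe} -= 5;    # debit Joe
      $db->{bob} += 5;    # credit Bob
      $db->commit();
  };
  if ( $@ ) {
      # Something failed in between - put both accounts back the way
      # they were.
      $db->rollback();
  }

If the process dies before the C<commit>, the datafile never sees the
transaction's changes, which is exactly the crash behavior described above.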
=head2 Why add them to DBM::Deep?

The ability to have actions occur in either I<atomicity> (as in the previous
example) or I<isolation> from the rest of the users of the data is a powerful
thing. This allows for a large amount of safety and predictability in how
data transformations occur. Imagine, for example, that you have a set of
calculations that will update various variables. However, there are some
situations that will cause you to throw away all results and start over with
a different seed. Without transactions, you would have to put everything into
temporary variables, then transfer the values when the calculations were
found to be successful. If you ever add a new value or if a value is used in
only certain calculations, you may forget to do the correct thing. With
transactions, you start a transaction and do your thing within it. If the
calculations succeed, you commit. If they fail, you rollback and try again.

If you're thinking that this is very similar to how Subversion (SVN) or CVS
works, you're absolutely correct - they are transactional in exactly the same
way.

=head1 How it happened

The addition of transactions to L<DBM::Deep> has easily been the single most
complex software endeavor I've ever undertaken. While transactions are
conceptually simple, the devil is in the details. And there were a B<lot> of
details.

=head2 The naive approach

Initially, I hoped I could just copy the entire datastructure and mark it as
owned by the transaction. This is the most straightforward solution and is
extremely simple to implement. Whenever a transaction starts, copy the whole
thing over to somewhere else. If the transaction is committed, overwrite the
original with the transaction's version. If it's rolled back, throw it away.

It's a popular solution, as seen by the fact that it's the mechanism used by
several other CPAN modules. While very simple to implement, it scales very
poorly as the datastructure grows. As one of the primary usecases for
L<DBM::Deep> is working with huge datastructures, this plan was dead on
arrival.

=head2 The relational approach

As I'm also a MySQL DBA, I looked at how the InnoDB engine implements
transactions. Given that relational databases are designed to work with large
amounts of data, it made sense to look here next.

InnoDB implements transactions using MVCC (multi-version concurrency
control). When a transaction starts, it stores a timestamp corresponding to
its start time. Whenever a modification to a row is committed, the
modification is timestamped. When a transaction modifies a row, it copies the
row into its own scratchpad and modifies it. Whenever a transaction reads a
row, it first attempts to read the row from its scratchpad. If it's not
there, then it reads the version of the row whose timestamp is no later than
the timestamp of the transaction. When committing, the transaction's
scratchpad is written out to the main data area with the timestamp of the
commit and the scratchpad is thrown away. When rolling back, the scratchpad
is simply thrown out.
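The read rule is the subtle part, so here is a minimal sketch of it in Perl.
The version lists and field names (C<ts>, C<start_ts>, C<scratchpad>) are
illustrative stand-ins, not InnoDB's actual structures:

  # Each row keeps a list of timestamped, committed versions. A
  # transaction sees its own uncommitted change first; otherwise it sees
  # the newest version committed no later than its own start time.
  sub mvcc_read {
      my ($txn, $table, $row_id) = @_;

      return $txn->{scratchpad}{$row_id}
          if exists $txn->{scratchpad}{$row_id};

      my ($visible) = grep { $_->{ts} <= $txn->{start_ts} }
                      sort { $b->{ts} <=> $a->{ts} }
                      @{ $table->{$row_id} || [] };

      return $visible ? $visible->{value} : undef;
  }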
At first, this mechanism looked promising and I whipped up a couple of spikes
(or code explorations) to try it out. The problem I ran into, again, was the
existence of large datastructures. When making large changes to a relational
database within a transaction, the engine can store the rows within the
actual table and mark them as being part of a transaction's scratchpad.
Perl's fractal datastructures, however, don't lend themselves to this kind of
treatment. The scratchpad would, in some pathological cases, be a
near-complete copy of the original datastructure.

=head2 The subversive approach

Despairing, I went to YAPC::NA::2006 hoping to discuss the problem with the
best minds in the Perl community. I was lucky enough to run into both Audrey
Tang (author of Pugs) and clkao (author of SVK). In between talks, I managed
to discuss the problems I'd run into with both of them. They looked at me
oddly and asked why I wasn't looking at Subversion (SVN) as a model for
transactions. My first reaction was "It's a source control application. What
does it know about transa- . . . Ohhhh!" And they smiled.

Like Perl datastructures, a filesystem is fractal. Directories contain both
files and directories. Directories act as hashes and files act as scalars
whose names are their hashkeys. When a modification is made to an SVN
checkout, SVN tracks the changes at the filename (or directory name) level.
When a commit is made, only those filenames which have changes are copied
over to the HEAD. Everything else remains untouched.

Translating this to hashes and hashkeys, this implies that transactional
information should be stored at the level of the hashkey. Or, in
L<DBM::Deep> terms, within the bucket for that key. As a nice side-effect,
other than the key's datastructure within the bucket, the entire datafile is
unaware of anything to do with transactions.

=head2 The spike

Spikes are kind of like a reconnaissance mission in the military. They go out
to get intel on the enemy and are explicitly not supposed to take any ground
or, in many cases, take out any of the enemy forces. In coding terms, a spike
is code meant to explore a problemspace that you B<will> throw away and
reimplement.

As transactions were going to sit between the bucket for the key and the
datapointer to the value, my first thought was to put in another sector that
would handle this mapping. This had the advantage of changing nothing except
for adding one new sector type and the handling for it. Just doing this got
me to the point where I could pass the following test:

  my $db1 = DBM::Deep->new( $filename );
  my $db2 = DBM::Deep->new( $filename );

  $db1->{abc} = 'foo';

  is( $db1->{abc}, 'foo' );
  is( $db2->{abc}, 'foo' );

  $db1->begin_work();

  $db1->{abc} = 'floober';

  is( $db1->{abc}, 'floober' );
  is( $db2->{abc}, 'foo' );

Just that much was a major accomplishment.

=head2 Tests, tests, and more tests

I was lucky that, when I took over L<DBM::Deep>, Joe Huckaby (the original
author) handed me a comprehensive test suite. This meant that I could add in
transactions with a high degree of confidence that I hadn't messed up
non-transactional uses. The test suite was also invaluable for working
through the various situations that transactions can cause.

But, a test is only as good as the test-writer. For example, it was a while
before I realized that I needed to test C<< is( $db2->{abc}, 'foo' ) >>
I<before> modifying it in the transaction. To pass that test, the code for
retrieval needed to look first in the transaction's spot and, if that spot
had never been assigned to, look at the spot for the HEAD. While this is how
SVN works, it wasn't an immediately obvious test to write.
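Spelled out in the style of the spike's test, the easy-to-miss case looks
something like this - the point is the read I<before> the transaction writes
to the key:

  $db1->{abc} = 'foo';

  $db1->begin_work();

  # The missed test: inside the transaction, an unmodified key must
  # read through to the HEAD's value.
  is( $db1->{abc}, 'foo' );

  $db1->{abc} = 'floober';

  # And the HEAD must remain protected from the change.
  is( $db2->{abc}, 'foo' );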
=head2 The HEAD

In SVN, the HEAD revision is the latest revision checked into the repository.
When you do a local modification, you're modifying your copy of the HEAD.
Then, you choose to either check in (C<svn commit>) or revert (C<svn revert>)
your changes.

In order to make the code work for the base case (no transaction running),
the first entry in the transaction sector became the HEAD. Thus, it was
assigned transaction ID 0. This also had the really neat side-benefit that
C<if ( $trans_id ) { ... }> will run the code if and only if L<DBM::Deep> is
in a running transaction.

=head2 Ending the spike

At this point, I had learned everything I needed from the spike. Yes, the SVN
idea looked like it was going to work. Yes, there were a lot of squibbly
details. No, it wasn't going to be finished before I left YAPC::NA. *sigh*

The biggest lessons learned from the spike were:

=over 4

=item 1 Tests are good

I seem to have to relearn this every project I work on. It's pretty sad, if
you ask me.

=item 2 The transaction sector is superfluous

As I worked with it, the transaction sector didn't add any information over
extending the actual bucket to hold the transaction-to-datapointer mapping
within it.

=back

=head2 Protection from changes

After the missed test for checking that starting a transaction didn't lose
the connection to the HEAD, I started writing more and more tests, being very
anal about what I was checking. I wrote tests to check every piece of state I
could think of before and after every change in state, regardless of where
the change was made.

Writing these tests immediately raised an issue with changing the HEAD while
a transaction is running. If the transaction has already edited that key, it
already has its new value. However, if it doesn't, it needs to be protected
from the change to the HEAD. This is the key piece for providing
I<isolation>.

My first attempt to solve this problem focused on having the transaction
itself detect changes. But, the primary usecase for transactions is that each
transaction is going to be running in a separate process. Without
implementing IPC, the only common point between the various processes is the
datafile itself. The only process aware of the change is the process making
the change. Even though it seemed counter-intuitive, the only sane mechanism
was that each process modifying the HEAD would also protect all running
transactions from its change, if needed.
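Here is a minimal sketch of that write path, with an in-memory array standing
in for a bucket's per-transaction datapointer slots (slot 0 being the HEAD).
The names are illustrative, not the actual implementation:

  # Before overwriting the HEAD's datapointer, copy the old pointer into
  # the slot of every running transaction that has never written to this
  # key, so those transactions keep seeing the old value.
  sub write_head {
      my ($bucket, $new_ptr, @running_trans_ids) = @_;

      for my $id ( @running_trans_ids ) {
          $bucket->{slots}[ $id ] = $bucket->{slots}[ 0 ]
              if !$bucket->{slots}[ $id ];
      }

      $bucket->{slots}[ 0 ] = $new_ptr;   # now the HEAD can change
  }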
=head2 Committing and rolling back

Now that changes can be made within a transaction, and the transaction, the
HEAD, and other transactions are all protected from one another, the next
step was to provide the ability to both commit and rollback these changes.

=head3 Rollback

Conceptually, rolling back should be the simpler to implement - just discard
the changes that have been made within the transaction and continue onward
with the HEAD. And, for the first attempt, that is exactly what I did. This
meant that the following test would pass:

  $db->{foo} = 'bar';

  $db->begin_work;
  is( $db->{foo}, 'bar' );

  $db->{foo} = 'baz';
  is( $db->{foo}, 'baz' );

  $db->rollback;
  is( $db->{foo}, 'bar' );

But, this didn't work out very well for repeated use of that transaction
slot. I threw a number of solutions at the problem, but none of them
addressed the real issue - knowing which use of a transaction ID made the
change vs. which use of a transaction ID was accessing the value. The
eventual solution is described in L</Transactional staleness counters>.

=head3 Committing

Committing is much harder than rolling back. The primary difficulty lies in
tracking exactly what this transaction has changed in order to copy those
changed bucket entries over to the HEAD. The good news is that only the
actual datapointers for that transaction need to be copied over - the actual
data sectors are left untouched.

The key to the solution lay in the decoupled nature of the code I was
writing, along with the fact that every piece of the code had access to the
engine object, if needed. Committing (and rolling back) are both handled by
the Engine object. To get that information into the engine, each bucket
modified by the transaction would inform the engine object that it had been
modified by that transaction. When a commit occurs, the engine object
iterates over the modified buckets, transfers over the new datapointers, and
discards the old ones.
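Under the same illustrative bucket layout as before, the commit loop might
look like this sketch (again, not L<DBM::Deep>'s actual method names):

  # Each modified bucket has already registered itself with the engine,
  # so committing is just promoting each transaction slot to the HEAD.
  sub commit_txn {
      my ($engine, $trans_id) = @_;

      for my $bucket ( @{ $engine->{modified}{$trans_id} || [] } ) {
          my $new_ptr = $bucket->{slots}[ $trans_id ];
          next unless $new_ptr;                      # defensive check

          $bucket->{slots}[ 0 ]         = $new_ptr;  # promote to HEAD
          $bucket->{slots}[ $trans_id ] = 0;         # clear our slot
      }

      delete $engine->{modified}{$trans_id};
  }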
=head2 Deleted marker

After some more tests, a final edge-case was found. Transactions are
performed copy-on-write. This means that if there isn't an entry for that
transaction, the HEAD is looked at. This doesn't work if a key has been
deleted within a transaction. So, the entry must be marked as deleted within
the transaction so that the HEAD isn't checked.

Likewise, when a new key is created in a transaction, the HEAD doesn't have
an entry for that key. Consider the following situation:

  ok( !exists $db1->{foo} );
  ok( !exists $db2->{foo} );

  $db1->begin_work();

  $db1->{foo} = 'bar';

  ok( !exists $db2->{foo} );

The entry for the HEAD for 'foo' needs to be marked as deleted so that
transactions which don't have 'foo' don't find something in the HEAD.

To add this, I originally used a separate flag for each datapointer to
indicate whether it had been marked as deleted or not. I quickly recognized
that a datapointer can never have a value of 0 or 1, as those would point to
the first and second bytes of the datafile, respectively. As these are part
of the header, they are nonsensical values, so they can be re-used for
metadata. 0 now means "This slot has never been written to" and 1 means "This
slot has been explicitly deleted."

=head2 Freespace management

Pre-1.0000 versions of L<DBM::Deep> didn't have any form of freespace
management. This meant that whenever a value was deleted, the old value just
sat around taking up space, even though it would never be accessed again.
While barely acceptable for non-transactional uses, this made transactions
unusable because transactions, as I've implemented them, are predicated on
the concept of parallel values that are (presumably) cleaned up after the
transaction is done with them.

Freespace had never been added before because it requires a different file
format than the one used in the pre-1.0000 versions. Because I had to change
the file format anyways B<and> I needed the feature, adding freespace now
seemed like a good plan.

Freespace was implemented by regularizing all the records so that
L<DBM::Deep> only has three different record sizes - Index, BucketList, and
Data. Each record type has a fixed length based on various parameters the
L<DBM::Deep> datafile is created with. (In order to accommodate values of
various sizes, Data records chain.) Whenever the engine is finished with a
sector, it is freed and added to a list of free sectors of that sector's
size. Whenever a new sector is requested, the freelist is checked first. If
the freelist has a sector, it's reused; otherwise, a new sector is added to
the end of the datafile.

Just like everything else, I wrote a mess of tests for adding freespace
management. One of the tests I thought up was the following:

  $db->{foo} = [ 1 .. 3 ];
  my $arr = $db->{foo};

  is( $arr->[1], 2 ); # This always has worked.

  delete $db->{foo};

  isnt( $arr->[1], 2 );

If this were a plain Perl datastructure, the last test would actually fail -
C<$arr> would keep the anonymous array alive. With L<DBM::Deep>, though,
deleting the key deletes the array's storage in the file, so the last test
should pass. In the past, it failed anyway, because the old data still sat
untouched in the datafile. The key concept I realized was that the C<$arr>
variable is pointing at a stale sector. So, by adding a staleness counter
that is incremented whenever the sector in use is deleted, I am able to
determine if the variable in question is looking at a stale version of the
sector. At this point, L<DBM::Deep> returns undef because, at some point, the
entry was deleted.

=head2 Transactional staleness counters

Once it was implemented for freespace management, the staleness counter
proved to be a very powerful concept for transactions themselves. Back in
L</Protection from changes>, I mentioned that other processes modifying the
HEAD will protect all running transactions from their effects. This provides
I<isolation>. But, the running transaction doesn't know about these entries.
This is both a benefit and a drawback. It's a benefit that it makes tracking
modified buckets very simple (q.v. L</Committing>). But, it means that
changes made to protect the transaction are not tracked. If they're not
cleaned up, they will be seen the next time a transaction uses that
transaction ID.

By providing a staleness counter for transactions, the cost of cleaning up
finished transactions is deferred until the space is actually used again.
This comes at the cost of less-than-optimal space utilization. Changing this
in the future would be completely transparent to users, so I felt it was an
acceptable tradeoff for quick delivery of a functional product.
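To make the idea concrete, here is a minimal sketch of a staleness check. The
handle and sector layouts are invented for illustration:

  # A handle into the file remembers the staleness counter of the sector
  # it was created against. If the sector has since been deleted (and
  # possibly reused for something else), the counters no longer match.
  sub read_through_handle {
      my ($engine, $handle) = @_;

      my $sector = $engine->{sectors}{ $handle->{offset} };

      return undef
          if !$sector
          || $sector->{staleness} != $handle->{staleness};

      return $sector->{data};
  }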
=head2 Fiddly bits

At this point, all the major pieces were in place. All that was left was to
get all the fiddly bits into place. This included handling the behavior of
C<keys>, simultaneous transactions with commits and rollbacks in various
orders, and making sure that transactions played nicely when a new Index
sector needed to be created due to reindexing a full Bucketlist sector. Of
these, C<keys> was the hardest. This is when I actually implemented the
Iterator classes to handle walking the index/bucketlist chain.

=head1 The future

Basic transactions are only the first step. There are several features that
can be added on top of what's been provided. Whether and in which order any
of these are implemented is completely up to user feedback. (Note: these are
advanced topics - I cannot be held responsible for any bleeding ears.)

=head2 Conflict resolution

Conflict, in this context, is what happens when two transactions run
simultaneously and change the same piece of data. Like most relational
databases, L<DBM::Deep> uses a very simplistic form of conflict resolution -
last commit wins. This works quite well for a row-based RDBMS, but doesn't
work as well for fractal structures like hashes.

Contrast this with how Subversion handles conflict. It tracks when each
transaction was started. If the HEAD was changed after the transaction
started, the commit is rejected. It is up to the developer to pull in the
latest changes, mediate any conflicts, and then recommit. There are several
other ways to handle conflict resolution, many of which can be pulled from
Haskell's use of Software Transactional Memory (STM).

=head2 Checkpoints

A transaction may have several steps within it. The first three may succeed,
but the fourth might fail. Instead of rolling back all the way to the
beginning, you might want to rollback to the last successful step and try
again. This is a checkpoint. Most RDBMSes provide them, and they aren't very
difficult conceptually, but I've seen just how "easy" some features can be
once you start really exploring the problemspace.

=head2 Sub-transactions

Similar to L</Checkpoints>, sub-transactions provide a mechanism for trying
something within a transaction without affecting the transaction. However,
instead of saying "These steps are safely finished," sub-transactions still
provide for the ability to rollback the primary transaction all the way.
Sub-transactions can also work better with libraries that may want to use
transactions themselves.

This, of all the features listed, is the one I'm most interested in
implementing next.

=head2 Durability

As mentioned in L</What are transactions?>, the 'D' in ACID stands for
I<Durable>. L<DBM::Deep> does not satisfy that criterion because durability
almost always requires another file (or files) for a commit log. I deemed
this unacceptable for this release because one of L<DBM::Deep>'s features is
the single datafile. To be honest, I don't anticipate this being an issue for
most users, because the niche that L<DBM::Deep> occupies is one that is
tolerant to failure and a small chance of potential dataloss. However,
Berkeley DB does provide durability with only a single file. If it becomes
necessary, cribbing from them could be an option.

=cut