article.pod

   1 =head0 Adding transactions to DBM::Deep
   2
   3 =head1 What is DBM::Deep?
   4
   5 L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a way of
   6 storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
   7 in memory. The datafile produced is able to be ftp'ed from one machine to
   8 another, regardless of OS or Perl version. There are several reasons why
   9 someone would want to do this.
  10
  11 =over 4
  12
  13 =item * Transparent Persistence
  14
  15 This is the ability to save a set of data structures to disk and retrieve them
  16 later without the vast majority of the program even knowing that the data is
  17 persisted. Furthermore, the datastructure is persisted immediately and not at
  18 set marshalling periods.
  19
  20 =item * Huge datastructures
  21
  22 Normally, datastructures are limited by the size of RAM the server has.
  23 L<DBM::Deep|DBM::Deep> allows for the size a given datastructure to be limited by disk
  24 instead.
  25
  26 =item * IPC
  27
  28 While not a common use, this allows for inter-process communication without
  29 worrying about the specifics of how a given OS handles IPC.
  30
  31 =back
  32
  33 And, with the release of 1.00, there is now a fourth reason -
  34 software-transactional memory, or STM
  35 (L<http://en.wikipedia.org/wiki/Software_transactional_memory>).
  36
  37 =head1 What are transactions?
  38
  39 Originally from the database world, a transaction is a way of isolating the
  40 effects of a given set of actions, then applying them all at once. It's a way
  41 of saying "I'm going to try the following steps, see if I like the result,
  42 then I want everyone else looking at this datastore to see the results
  43 immediately." The most common example is taken from banking. Let's say that an
  44 application receives a request to have Joe pay Bob five zorkmids.  Without
  45 transactions, the application would take the money from Joe's account, then
  46 add the money to Bob's account. But, what happens if the application crashes
  47 after debiting Joe, but before crediting Bob? The application has made money
  48 disappear. Or, vice versa, if Bob is credited before Joe is debited, the
  49 application has created money.
  50
  51 With a transaction wrapping the money transfer, if the application crashes in
  52 the middle, it's as if the action never happened. So, when the application
  53 recovers from the crash, Joe and Bob still have the same amount of money in
  54 their accounts as they did before and the transaction can restart and Bob can
  55 finally receive his zorkmids.
  56
  57 More formally, transactions are generally considered to be proper when they are
  58 ACID-compliant. ACID is an acronym that means the following:
  59
  60 =over 4
  61
  62 =item * Atomic
  63
  64 Either every change happens or none of the changes happen.
  65
  66 =item * Consistent
  67
  68 When the transaction begins and when it is committed, the database must be in
  69 a legal state. This restriction doesn't apply to L<DBM::Deep|DBM::Deep> very much.
  70
  71 =item * Isolated
  72
  73 As far as a transaction is concerned, it is the only thing running against the
  74 database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep> provides the
  75 strongest isolation level possible.
  76
  77 =item * Durable
  78
  79 Once the database says that a comit has happened, the commit will be
  80 guaranteed, regardless of whatever happens.
  81
  82 =back
  83
  84 =head1 Why add them to DBM::Deep?
  85
  86 The ability to have actions occur in either I<atomically> (as in the previous
  87 example) or I<isolation> from the rest of the users of the data is a powerful
  88 thing. This allows for a certain amount of safety and predictability in how
  89 data transformations occur. Imagine, for example, that you have a set of
  90 calculations that will update various variables. However, there are some
  91 situations that will cause you to throw away all results and start over with a
  92 different seed. Without transactions, you would have to put everything into
  93 temporary variables, then transfer the values when the calculations were found
  94 to be successful. With STM, you start a transaction and do your thing within
  95 it. If the calculations succeed, you commit. If they fail, you rollback and
  96 try again. If you're thinking that this is very similar to how SVN or CVS
  97 works, you're absolutely correct - they are transactional in the exact same
  98 way.
  99
 100 =head1 How it happened
 101
 102 =head2 The backstory
 103
 104 The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the single most
 105 complex software endeavor I've ever undertaken. The first step was to figure
 106 out exactly how transactions were going to work. After several spikesN<These
 107 are throwaway coding explorations.>, the best design seemed to look to SVN
 108 instead of relational databases. The more I investigated, the more I ran up
 109 against the object-relational impedance mismatch
 110 N<http://en.wikipedia.org/wiki/Object-Relational_Impedance_Mismatch>, this
 111 time in terms of being able to translate designs. In the relational world,
 112 transactions are generally implemented either as row-level locks or using MVCC
 113 N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. Both of
 114 these assume that there is a I<row>, or singular object, that can be locked
 115 transparently to everything else. This doesn't translate to a fractally
 116 repeating structure like a hash or an array.
 117
 118 However, the design used by SVN deals with directories and files which
 119 corresponds very closely to hashes and hashkeys. In SVN, the modifications are
 120 stored in the file's structure. Translating this to hashes and hashkeys, this
 121 means that transactional information should be stored in the keys. This means
 122 that the entire datafile is unaware of anything to do with transactions, except
 123 for the key's data structure within the bucket.
 124
 125 =head2 DBM::Deep's file structure
 126
 127 L<DBM::Deep|DBM::Deep>'s file structure is a record-based structure. The key (or array
 128 index - arrays are currently just funny hashes internally) is hashed using MD5
 129 and then stored in a cascade of Index and Bucketlist records. The bucketlist
 130 record stores the actual key string and pointers to where the data records are
 131 stored. The data records themselves are one of Null, Scalar, or Reference.
 132 Null represents an I<undef>, Scalar represents a string (numbers are
 133 stringified for simplicity) and are allocated in 256byte chunks.  References
 134 represent an array or hash reference and contains a pointer to an Index and
 135 Bucketlist cascade of its own.
 136
 137 =head2 Transactions in the keys
 138
 139 The first pass was to expand the Bucketlist sector to go from a simple key /
 140 datapointer mapping to a more complex key / transaction / datapointer mapping.
 141 Initially, I interposed a Transaction record that the bucketlist pointed to.
 142 That then contained the transaction / datapointer mapping. This had the
 143 advantage of changing nothing except for adding one new sector type and the
 144 handling for it. This was very quickly merged into the Bucketlist record to
 145 simplify the resulting code.
 146
 147 This first step got me to the point where I could pass the following test:
 148
 149   my $db1 = DBM::Deep->new( $filename );
 150   my $db2 = DBM::Deep->new( $filename );
 151
 152   $db1->{abc} = 'foo';
 153
 154   is( $db1->{abc}, 'foo' );
 155   is( $db2->{abc}, 'foo' );
 156
 157   $db1->begin_work();
 158
 159       is( $db1->{abc}, 'foo' );
 160       is( $db2->{abc}, 'foo' );
 161
 162       $db1->{abc} = 'floober';
 163
 164       is( $db1->{abc}, 'floober' );
 165       is( $db2->{abc}, 'foo' );
 166
 167 Just that much was a major accomplishment. The first pass only looked in the
 168 transaction's spot in the bucket for that key. And, that passed my first tests
 169 because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
 170 modifying it in the transaction. To pass that test, the code for retrieval
 171 needed to look first in the transaction's spot and if that spot had never been
 172 assigned to, look at the spot for the HEAD.
 173
 174 =head2 The concept of the HEAD
 175
 176 This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
 177 revision checked into the repository. When you do a ocal modification, you're
 178 doing a modification to the HEAD. Then, you choose to either check in your
 179 code (commit()) or revert (rollback()).
 180
 181 In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This has several
 182 benefits:
 183
 184 =over 4
 185
 186 =item * Easy identifiaction of a transaction
 187
 188 C<if ( $trans_id ) {}> will run the code if and only if we are in a running
 189 transaction.
 190
 191 =item * The HEAD is the first bucket
 192
 193 In a given bucket, the HEAD is the first datapointer because we mutliply the
 194 size of the transactional bookkeeping by the transaction ID to find the offset
 195 to seek into the file.
 196
 197 =back
 198
 199 =head2 Protection from changes
 200
 201 Let's assume that a transaction is running in one process and another process
 202 is also modifying the same area in the data. The only way that process B can
 203 notify process A that a change has occurred is through the common point - the
 204 DBM file. Because of this, every process that changes the HEAD needs to
 205 protect all currently running transactions by copying over the pointer to the
 206 original value into every transaction which hasn't already modified this
 207 entry. (If it has, then the new value shouldn't overwrite the transaction's
 208 modification.) This is the key piece for providing I<Isolation>.
 209
 210 =head2 Tracking modified buckets
 211
 212 Rolling back changes is very simple - just don't apply them to the HEAD. The
 213 next time that transaction ID is reused, the changes will be ignored (q.v.
 214 L</Staleness counters>). Committing, however, requires that all the changes
 215 must be transferred over from the bucket entry for the given transaction ID to
 216 the entry for the HEAD.
 217
 218 =head2 Deleted marker
 219
 220 Transactions are performed copy-on-write. This means that if there isn't an
 221 entry for that transaction, the HEAD is looked at. This doesn't work if a key
 222 has been deleted within a transaction. So, the entry must be marked as deleted
 223 within the transaction so that the HEAD isn't checekd.
 224
 225 Likewise, when a new key is created in a transaction, the HEAD doesn't have an
 226 entry for that key. Consider the following situation:
 227
 228   ok( !exists $db1->{foo} );
 229   ok( !exists $db2->{foo} );
 230
 231   $db1->begin_work();
 232   $db1->{foo} = 'bar';
 233
 234   ok( !exists $db2->{foo} );
 235
 236 The entry for the HEAD for 'foo' needs to be marked as deleted so that
 237 transactions which don't have 'foo' don't find something in the HEAD.
 238
 239 =head2 Freespace management
 240
 241 The second major piece to the 1.00 release was freespace management. In
 242 pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys would not be
 243 recycled. While always a requested feature, the complexity required to
 244 implement freespace meant that it needed to wait for a complete rewrite of
 245 several pieces, such as for transactions.
 246
 247 Freespace is implemented by regularizing all the records so that L<DBM::Deep|DBM::Deep>
 248 only has three different record sizes - Index, BucketList, and Data. Each
 249 record type has a fixed length based on various parameters the L<DBM::Deep|DBM::Deep>
 250 datafile is created with. (In order to accomodate values of various sizes, Data
 251 records chain.) Whenever a sector is freed, it's added to a freelist of that
 252 sector's size. Whenever a new sector is requested, the freelist is checked
 253 first. If the freelist has a sector, it's reused, otherwise a new sector is
 254 added to the end of the datafile.
 255
 256 Freespace management did bring up another issue - staleness. It is possible to
 257 have a pointer to a record in memory. If that record is deleted, then reused,
 258 the pointer in memory has no way of determining that is was deleted and
 259 readded vs. modified. So, a staleness counter was added which is incremented
 260 every time the sector is reused through the freelist. If you then attempt to
 261 access that stale record, L<DBM::Deep|DBM::Deep> returns undef because, at some point,
 262 the entry was deleted.
 263
 264 =head2 Staleness counters
 265
 266 Once it was implemented for freespace management, staleness counters proved to
 267 be a very powerful concept for transactions themselves. Back in L<Protection
 268 from changes>, I mentioned that other processes modifying the HEAD will
 269 protect all running transactions from their effects. This provides
 270 I<Isolation>. But, the running transaction doesn't know about these entries.
 271 If they're not cleaned up, they will be seen the next time a transaction uses
 272 that transaction ID.
 273
 274 By providing a staleness counter for transactions, the costs of cleaning up
 275 finished transactions is deferred until the space is actually used again. This
 276 is at the cost of having less-than-optimal space utilization. Changing this in
 277 the future would be completely transparent to users, so I felt it was an
 278 acceptable tradeoff for delivering working code quickly.
 279
 280 =head1 Conclusion
 281
 282 =cut