article.pod

   1 =head0 Adding transactions to DBM::Deep
   2
   3 =head1 What is DBM::Deep?
   4
   5 L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a way of
   6 storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
   7 in memory. The datafile produced is able to be ftp'ed from one machine to
   8 another, regardless of OS or Perl version. There are several reasons why
   9 someone would want to do this.
  10
  11 =over 4
  12
  13 =item * Transparent Persistence
  14
  15 This is the ability to save a set of data structures to disk and retrieve them
  16 later without the vast majority of the program even knowing that the data is
  17 persisted. Furthermore, the datastructure is persisted immediately and not at
  18 set marshalling periods.
  19
  20 =item * Huge datastructures
  21
  22 Normally, datastructures are limited by the size of RAM the server has.
  23 L<DBM::Deep|DBM::Deep> allows for the size a given datastructure to be limited by disk
  24 instead.
  25
  26 =item * IPC
  27
  28 While not a common use, this allows for inter-process communication without
  29 worrying about the specifics of how a given OS handles IPC.
  30
  31 =back
  32
  33 And, with the release of 1.00, there is now a fourth reason -
  34 software-transactional memory, or STM
  35 (L<http://en.wikipedia.org/wiki/Software_transactional_memory>).
  36
  37 =head1 How does DBM::Deep work?
  38
  39 L<DBM::Deep|DBM::Deep> works by tying a variable to a file on disk. Every
  40 single read and write go to the file and modify the file immediately. To
  41 represent Perl's hashes and arrays, a record-based file format is used. There
  42 is a file header storing file-wide values, such as the size of the internal
  43 file pointers.  Afterwards, there are the data records.
  44
  45 =head2 DBM::Deep's file structure
  46
  47 L<DBM::Deep|DBM::Deep>'s file structure is a record-based structure. The key (or array
  48 index - arrays are currently just funny hashes internally) is hashed using MD5
  49 and then stored in a cascade of Index and Bucketlist records. The bucketlist
  50 record stores the actual key string and pointers to where the data records are
  51 stored. The data records themselves are one of Null, Scalar, or Reference.
  52 Null represents an I<undef>, Scalar represents a string (numbers are
  53 stringified for simplicity) and are allocated in 256byte chunks.  References
  54 represent an array or hash reference and contains a pointer to an Index and
  55 Bucketlist cascade of its own.
  56
  57 =head2 DBM::Deep's class hierarchy
  58
  59 Managing all of these functions takes a lot of different abstractions. There
  60 are at least 3 different interfacing layers, and more if you look hard enough.
  61 To manage this complexity, L<DBM::Deep|DBM::Deep> uses the following abstractions:
  62
  63 =over 4
  64
  65 =item * Tying classes
  66
  67 These are the classes that manage the external face of the module. They manage
  68 B<all> of the interactions with outside code - OO interface, tying, and
  69 various utility methods. If they cannot handle the request themselves, they
  70 delegate to the engine. There are currently three classes in this layer.
  71
  72 =item * Engine classes
  73
  74 These classes manage the file format and all of the ways that the records
  75 interact with each other. Nearly every call will make requests to the File
  76 class for reading and/or writing data to the file. There are currently nine
  77 classes in this layer.
  78
  79 =item * File class
  80
  81 This class mediates all interaction with the file system. Every read, write,
  82 lock, and unlock goes through this class. There is currently one class in this
  83 layer.
  84
  85 =item * Iterator classes
  86
  87 These are introspection classes that provide iteration for hashes. They manage
  88 keeping track of where the next key should be and how to get there. There are
  89 currently four classes in this layer.
  90
  91 =back
  92
  93 =head1 What are transactions?
  94
  95 Originally from the database world, a transaction is a way of isolating the
  96 effects of a given set of actions, then applying them all at once. It's a way
  97 of saying "I'm going to try the following steps, see if I like the result,
  98 then I want everyone else looking at this datastore to see the results
  99 immediately." The most common example is taken from banking. Let's say that an
 100 application receives a request to have Joe pay Bob five zorkmids.  Without
 101 transactions, the application would take the money from Joe's account, then
 102 add the money to Bob's account. But, what happens if the application crashes
 103 after debiting Joe, but before crediting Bob? The application has made money
 104 disappear. Or, vice versa, if Bob is credited before Joe is debited, the
 105 application has created money.
 106
 107 With a transaction wrapping the money transfer, if the application crashes in
 108 the middle, it's as if the action never happened. So, when the application
 109 recovers from the crash, Joe and Bob still have the same amount of money in
 110 their accounts as they did before and the transaction can restart and Bob can
 111 finally receive his zorkmids.
 112
 113 More formally, transactions are generally considered to be proper when they are
 114 ACID-compliant. ACID is an acronym that means the following:
 115
 116 =over 4
 117
 118 =item * Atomic
 119
 120 Either every change happens or none of the changes happen.
 121
 122 =item * Consistent
 123
 124 When the transaction begins and when it is committed, the database must be in
 125 a legal state. This restriction doesn't apply to L<DBM::Deep|DBM::Deep> very much.
 126
 127 =item * Isolated
 128
 129 As far as a transaction is concerned, it is the only thing running against the
 130 database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep> provides the
 131 strongest isolation level possible.
 132
 133 =item * Durable
 134
 135 Once the database says that a comit has happened, the commit will be
 136 guaranteed, regardless of whatever happens.
 137
 138 =back
 139
 140 =head1 Why add them to DBM::Deep?
 141
 142 The ability to have actions occur in either I<atomically> (as in the previous
 143 example) or I<isolation> from the rest of the users of the data is a powerful
 144 thing. This allows for a certain amount of safety and predictability in how
 145 data transformations occur. Imagine, for example, that you have a set of
 146 calculations that will update various variables. However, there are some
 147 situations that will cause you to throw away all results and start over with a
 148 different seed. Without transactions, you would have to put everything into
 149 temporary variables, then transfer the values when the calculations were found
 150 to be successful. With STM, you start a transaction and do your thing within
 151 it. If the calculations succeed, you commit. If they fail, you rollback and
 152 try again. If you're thinking that this is very similar to how SVN or CVS
 153 works, you're absolutely correct - they are transactional in the exact same
 154 way.
 155
 156 =head1 How it happened
 157
 158 =head2 The backstory
 159
 160 The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the single most
 161 complex software endeavor I've ever undertaken. The first step was to figure
 162 out exactly how transactions were going to work. After several spikesN<These
 163 are throwaway coding explorations.>, the best design seemed to look to SVN
 164 instead of relational databases. The more I investigated, the more I ran up
 165 against the object-relational impedance mismatch
 166 N<http://en.wikipedia.org/wiki/Object-Relational_Impedance_Mismatch>, this
 167 time in terms of being able to translate designs. In the relational world,
 168 transactions are generally implemented either as row-level locks or using MVCC
 169 N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. Both of
 170 these assume that there is a I<row>, or singular object, that can be locked
 171 transparently to everything else. This doesn't translate to a fractally
 172 repeating structure like a hash or an array.
 173
 174 However, the design used by SVN deals with directories and files which
 175 corresponds very closely to hashes and hashkeys. In SVN, the modifications are
 176 stored in the file's structure. Translating this to hashes and hashkeys, this
 177 means that transactional information should be stored in the keys. This means
 178 that the entire datafile is unaware of anything to do with transactions, except
 179 for the key's data structure within the bucket.
 180
 181 =head2 Transactions in the keys
 182
 183 The first pass was to expand the Bucketlist sector to go from a simple key /
 184 datapointer mapping to a more complex key / transaction / datapointer mapping.
 185 Initially, I interposed a Transaction record that the bucketlist pointed to.
 186 That then contained the transaction / datapointer mapping. This had the
 187 advantage of changing nothing except for adding one new sector type and the
 188 handling for it. This was very quickly merged into the Bucketlist record to
 189 simplify the resulting code.
 190
 191 This first step got me to the point where I could pass the following test:
 192
 193   my $db1 = DBM::Deep->new( $filename );
 194   my $db2 = DBM::Deep->new( $filename );
 195
 196   $db1->{abc} = 'foo';
 197
 198   is( $db1->{abc}, 'foo' );
 199   is( $db2->{abc}, 'foo' );
 200
 201   $db1->begin_work();
 202
 203       is( $db1->{abc}, 'foo' );
 204       is( $db2->{abc}, 'foo' );
 205
 206       $db1->{abc} = 'floober';
 207
 208       is( $db1->{abc}, 'floober' );
 209       is( $db2->{abc}, 'foo' );
 210
 211 Just that much was a major accomplishment. The first pass only looked in the
 212 transaction's spot in the bucket for that key. And, that passed my first tests
 213 because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
 214 modifying it in the transaction. To pass that test, the code for retrieval
 215 needed to look first in the transaction's spot and if that spot had never been
 216 assigned to, look at the spot for the HEAD.
 217
 218 =head2 The concept of the HEAD
 219
 220 This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
 221 revision checked into the repository. When you do a ocal modification, you're
 222 doing a modification to the HEAD. Then, you choose to either check in your
 223 code (commit()) or revert (rollback()).
 224
 225 In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This has several
 226 benefits:
 227
 228 =over 4
 229
 230 =item * Easy identifiaction of a transaction
 231
 232 C<if ( $trans_id ) {}> will run the code if and only if we are in a running
 233 transaction.
 234
 235 =item * The HEAD is the first bucket
 236
 237 In a given bucket, the HEAD is the first datapointer because we mutliply the
 238 size of the transactional bookkeeping by the transaction ID to find the offset
 239 to seek into the file.
 240
 241 =back
 242
 243 =head2 Protection from changes
 244
 245 Let's assume that a transaction is running in one process and another process
 246 is also modifying the same area in the data. The only way that process B can
 247 notify process A that a change has occurred is through the common point - the
 248 DBM file. Because of this, every process that changes the HEAD needs to
 249 protect all currently running transactions by copying over the pointer to the
 250 original value into every transaction which hasn't already modified this
 251 entry. (If it has, then the new value shouldn't overwrite the transaction's
 252 modification.) This is the key piece for providing I<Isolation>.
 253
 254 =head2 Tracking modified buckets
 255
 256 Rolling back changes is very simple - just don't apply them to the HEAD. The
 257 next time that transaction ID is reused, the changes will be ignored (q.v.
 258 L</Staleness counters>). Committing, however, requires that all the changes
 259 must be transferred over from the bucket entry for the given transaction ID to
 260 the entry for the HEAD.
 261
 262 =head2 Deleted marker
 263
 264 Transactions are performed copy-on-write. This means that if there isn't an
 265 entry for that transaction, the HEAD is looked at. This doesn't work if a key
 266 has been deleted within a transaction. So, the entry must be marked as deleted
 267 within the transaction so that the HEAD isn't checekd.
 268
 269 Likewise, when a new key is created in a transaction, the HEAD doesn't have an
 270 entry for that key. Consider the following situation:
 271
 272   ok( !exists $db1->{foo} );
 273   ok( !exists $db2->{foo} );
 274
 275   $db1->begin_work();
 276   $db1->{foo} = 'bar';
 277
 278   ok( !exists $db2->{foo} );
 279
 280 The entry for the HEAD for 'foo' needs to be marked as deleted so that
 281 transactions which don't have 'foo' don't find something in the HEAD.
 282
 283 =head2 Freespace management
 284
 285 The second major piece to the 1.00 release was freespace management. In
 286 pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys would not be
 287 recycled. While always a requested feature, the complexity required to
 288 implement freespace meant that it needed to wait for a complete rewrite of
 289 several pieces, such as for transactions.
 290
 291 Freespace is implemented by regularizing all the records so that L<DBM::Deep|DBM::Deep>
 292 only has three different record sizes - Index, BucketList, and Data. Each
 293 record type has a fixed length based on various parameters the L<DBM::Deep|DBM::Deep>
 294 datafile is created with. (In order to accomodate values of various sizes, Data
 295 records chain.) Whenever a sector is freed, it's added to a freelist of that
 296 sector's size. Whenever a new sector is requested, the freelist is checked
 297 first. If the freelist has a sector, it's reused, otherwise a new sector is
 298 added to the end of the datafile.
 299
 300 Freespace management did bring up another issue - staleness. It is possible to
 301 have a pointer to a record in memory. If that record is deleted, then reused,
 302 the pointer in memory has no way of determining that is was deleted and
 303 readded vs. modified. So, a staleness counter was added which is incremented
 304 every time the sector is reused through the freelist. If you then attempt to
 305 access that stale record, L<DBM::Deep|DBM::Deep> returns undef because, at some point,
 306 the entry was deleted.
 307
 308 =head2 Staleness counters
 309
 310 Once it was implemented for freespace management, staleness counters proved to
 311 be a very powerful concept for transactions themselves. Back in L<Protection
 312 from changes>, I mentioned that other processes modifying the HEAD will
 313 protect all running transactions from their effects. This provides
 314 I<Isolation>. But, the running transaction doesn't know about these entries.
 315 If they're not cleaned up, they will be seen the next time a transaction uses
 316 that transaction ID.
 317
 318 By providing a staleness counter for transactions, the costs of cleaning up
 319 finished transactions is deferred until the space is actually used again. This
 320 is at the cost of having less-than-optimal space utilization. Changing this in
 321 the future would be completely transparent to users, so I felt it was an
 322 acceptable tradeoff for delivering working code quickly.
 323
 324 =head1 Conclusion
 325
 326 =cut