article.pod

   1 =head0 Adding transactions to DBM::Deep
   2
   3 For the past nine months, I've been working on adding transactions to
   4 L<DBM::Deep|DBM::Deep>. During that time, I looked far and wide for an
   5 accessible description of how a programmer should go about implementing
   6 transactions. The only things I found were either extremely pedantic academic
   7 papers or the code of complex applications. The former weren't very easy to
   8 read and the latter were less so N<Have I<you> ever tried to read the source
   9 for BDB or InnoDB? Reading perl's source is easier.>. This is the article I
  10 wished I'd been able to read nine months ago.
  11
  12 =head1 What is DBM::Deep?
  13
  14 L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a way of
  15 storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
  16 in memory. The datafile produced is able to be transferred from one machine to
  17 another, regardless of OS or Perl version. There are several reasons why
  18 someone would want to do this.
  19
  20 =over 4
  21
  22 =item * Transparent Persistence
  23
  24 This is the ability to save a set of datastructures to disk and retrieve them
  25 later without the vast majority of the program ever knowing that the data is
  26 persisted. Furthermore, the datastructure is persisted immediately and not at
  27 set marshalling periods.
  28
  29 =item * Huge datastructures
  30
  31 Normally, datastructures are limited by the size of RAM the server has.
  32 L<DBM::Deep|DBM::Deep> allows for the size a given datastructure to be limited by disk
  33 size instead (up to the given perl's largefile support).
  34
  35 =item * Database
  36
  37 Most programmers hear the word "database" and think "relational database
  38 management system." A database is a more general term meaning "place one
  39 stores data." This can be relational, object, or something else. The software
  40 used to manage and query a database is a "database management system" (DBMS).
  41
  42 L<DBM::Deep|DBM::Deep> provides one half of a DBMS - the data storage part.
  43 Once the datastructures on disk, L<DBM::Deep|DBM::Deep> provides the
  44 capability to allow multiple processes to access the data.
  45
  46 =back
  47
  48 =head1 How does DBM::Deep work?
  49
  50 L<DBM::Deep|DBM::Deep> works by tying a variable to a file on disk. Every
  51 read and write go directly to the file and modify the file immediately. To
  52 represent Perl's hashes and arrays, a record-based file format is used. There
  53 is a file header storing file-wide values, such as the size of the internal
  54 file pointers.  Afterwards, there are the data records.
  55
  56 The most important feature of L<DBM::Deep|DBM::Deep> is that it can be
  57 completely transparent. Other than the line tying the variable to the file, no
  58 other part of your program needs to know that the variable being used isn't a
  59 "normal" Perl variable. So, the following works just fine:
  60
  61   # As far as the rest of the program is concerned, the following two lines
  62   # are identical - they produce a variable $foo that can be used as a hashref.
  63   # my $foo = {};
  64   my $foo = DBM::Deep->new( 'mydb.db' );
  65
  66   $foo->{bar} = 'baz';
  67   $foo->{complex} = [
  68     { a => 'b' }, 0, '123', undef, [ 1 .. 5 ],
  69   ];
  70
  71   # And so on and so forth.
  72
  73 =head2 DBM::Deep's file structure
  74
  75 L<DBM::Deep|DBM::Deep>'s file structure is record-based. The key (or array
  76 index - arrays are currently just funny hashes internally) is hashed using MD5
  77 and then stored in a cascade of Index and bucketlist records. The bucketlist
  78 record stores the actual key string and pointers to where the data records are
  79 stored. The data records themselves are one of Null, Scalar, or Reference.
  80 Null represents an I<undef>, Scalar represents a string (numbers are
  81 stringified internally for simplicity) and are allocated in 256byte chunks.
  82 Reference represent an array or hash reference and contains a pointer to an
  83 Index and bucketlist cascade of its own. Reference will also store the class
  84 the hash or array reference is blessed into, meaning that almost all objects
  85 can be stored safely.
  86
  87 =head2 DBM::Deep's class hierarchy
  88
  89 Managing all of these functions takes a lot of different abstractions. There
  90 are at least 3 different interfacing layers, and more if you look hard enough.
  91 To manage this complexity, L<DBM::Deep|DBM::Deep> uses the following abstractions:
  92
  93 =over 4
  94
  95 =item * Tying classes
  96
  97 These are the classes that manage the external face of the module. They manage
  98 B<all> of the interactions with outside code - OO interface, tying, and
  99 various utility methods. If they cannot handle the request themselves, they
 100 delegate to the engine. There are currently three classes in this layer.
 101
 102 =item * Engine classes
 103
 104 These classes manage the file format and all of the ways that the records
 105 interact with each other. Nearly every call will make requests to the File
 106 classes for reading and/or writing data to the file. There are currently nine
 107 classes in this layer, including a class for each record type.
 108
 109 =item * File class
 110
 111 This class mediates all interaction with the file system. Every read, write,
 112 lock, and unlock goes through this class. There is currently one class in this
 113 layer.
 114
 115 =item * Iterator classes
 116
 117 These are introspection classes that provide iteration for hashes. They manage
 118 keeping track of where the next key should be and how to get there. There are
 119 currently four classes in this layer.
 120
 121 =back
 122
 123 =head1 Why add transactions to DBM::Deep?
 124
 125 For the most part, L<DBM::Deep|DBM::Deep> functions perfectly well without
 126 transactions. Most uses that I've seen tend to be either single-process
 127 read/write persistence or a multi-process readonly cache for an expensive, but
 128 static, lookup. With transactions, L<DBM::Deep|DBM::Deep> can now be used
 129 safely for multi-process read/write persistence in situations that don't
 130 need (or cannot use) a full RDBMS.
 131
 132 =head2 What are transactions?
 133
 134 Originally from the database world, a transaction is a way of isolating the
 135 effects of a given set of actions, then applying them all at once. It's a way
 136 of saying "I'm going to try the following steps, see if I like the result,
 137 then I want everyone else looking at this datastore to see the results
 138 immediately." The most common example is taken from banking. Let's say that an
 139 application receives a request to have Joe pay Bob five zorkmids. Without
 140 transactions, the application would take the money from Joe's account, then
 141 add the money to Bob's account. But, what happens if the application crashes
 142 after debiting Joe, but before crediting Bob? The application has made money
 143 disappear. Or, vice versa, if Bob is credited before Joe is debited, the
 144 application has created money.
 145
 146 With a transaction wrapping the money transfer, if the application crashes in
 147 the middle, it's as if the action never happened. So, when the application
 148 recovers from the crash, Joe and Bob still have the same amount of money in
 149 their accounts as they did before. The transaction can restart and Bob can
 150 finally receive his zorkmids.
 151
 152 More formally, transactions are generally considered to be proper when they are
 153 ACID-compliant. ACID is an acronym that stands for the following:
 154
 155 =over 4
 156
 157 =item * Atomic
 158
 159 Either every change happens or none of the changes happen.
 160
 161 =item * Consistent
 162
 163 When the transaction begins and when it is committed, the database must be in
 164 a legal state. This condition doesn't apply to L<DBM::Deep|DBM::Deep> as all
 165 Perl datastructures are internally consistent.
 166
 167 =item * Isolated
 168
 169 As far as a transaction is concerned, it is the only thing running against the
 170 database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep>
 171 provides the strongest isolation level possible, usually called
 172 I<Serializable> by most RDBMSes.
 173
 174 =item * Durable
 175
 176 Once the database says that a commit has happened, the commit will be
 177 guaranteed, regardless of whatever happens. I chose to not implement this
 178 condition in L<DBM::Deep|DBM::Deep> N<This condition requires the presence of
 179 at least one more file, which violates one of the original design goals.>.
 180
 181 =back
 182
 183 =head2 Why add them to DBM::Deep?
 184
 185 The ability to have actions occur in either I<atomically> (as in the previous
 186 example) or I<isolation> from the rest of the users of the data is a powerful
 187 thing. This allows for a large amount of safety and predictability in how
 188 data transformations occur. Imagine, for example, that you have a set of
 189 calculations that will update various variables. However, there are some
 190 situations that will cause you to throw away all results and start over with a
 191 different seed. Without transactions, you would have to put everything into
 192 temporary variables, then transfer the values when the calculations were found
 193 to be successful. If you ever add a new value or if a value is used in only
 194 certain calculations, you may forget to do the correct thing. With
 195 transactions, you start a transaction and do your thing within it. If the
 196 calculations succeed, you commit. If they fail, you rollback and try again.
 197
 198 If you're thinking that this is very similar to how Subversion (SVN) or CVS
 199 works, you're absolutely correct - they are transactional in exactly the same
 200 way.
 201
 202 =head1 How it happened
 203
 204 The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the
 205 single most complex software endeavor I've ever undertaken. While transactions
 206 are conceptually simple, the devil is in the details. And there were a B<lot>
 207 of details.
 208
 209 =head2 The naive approach
 210
 211 Initially, I hoped I could just copy the entire datastructure and mark it as
 212 owned by the transaction. This is the most straightforward solution and is
 213 extremely simple to implement. Whenever a transaction starts, copy the whole
 214 thing over to somewhere else. If the transaction is committed, overwrite the
 215 original with the transaction's version. If it's rolled back, throw it away.
 216
 217 It's a popular solution as seen by the fact that it's the mechanism used in
 218 both L<Data::Transactional|Data::Transactional> and
 219 L<Tie::Scalar::Transactional|Tie::Scalar::Transactional>. While very simple to
 220 implement, it scales very poorly as the datastructure grows. As one of the
 221 primary usecases for L<DBM::Deep|DBM::Deep> is working with huge
 222 datastructures, this plan was dead on arrival.
 223
 224 =head2 The relational approach
 225
 226 As I'm also a MySQL DBA, I looked to how the InnoDB engine implements
 227 transactions. Given that relational databases are designed to work with large
 228 amounts of data, it made sense to look here next.
 229
 230 InnoDB implements transactions using MVCC
 231 N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. When a
 232 transaction starts, it stores a timestamp corresponding to its start time.
 233 Whenever a modification to a row is committed, the modification is
 234 timestamped. When a transaction modifies a row, it copies the row into its
 235 own scratchpad and modifies it. Whenever a transaction reads a row, it first
 236 attempts to read the row from its scratchpad. If it's not there, then it reads
 237 the version of the row whose timestamp is no later than the timestamp of the
 238 transaction. When committing, the transaction's scratchpad is written out to
 239 the main data area with the timestamp of the commit and the scratchpad is
 240 thrown away. When rolling back, the scratchpad is thrown out.
 241
 242 At first, this mechanism looked promising and I whipped up a couple spikes
 243 (or code explorations) to try it out. The problem I ran into, again, was the
 244 existence of large datastructures. When making large changes to a relational
 245 database within a transaction, the engine can store the rows within the actual
 246 table and mark them as being part of a transaction's scratchpad. Perl's
 247 fractal datastructures, however, don't lend themselves to this kind of
 248 treatment. The scratchpad would, in some pathological cases, be a
 249 near-complete copy of the original datastructure. N<Funnily enough, this is
 250 yet another example of the object relational impedance mismatch
 251 (http://en.wikipedia.org/wiki/Object-Relational_impedance_mismatch).>
 252
 253 =head2 The subversive approach
 254
 255 Despairing, I went to YAPC::NA::2006 hoping to discuss the problem with the
 256 best minds in the Perl community. I was lucky enough to run into both Audrey
 257 Tang (author of Pugs) and clkao (author of SVK). In between talks, I managed
 258 to discuss the problems I'd run into with both of them. They looked at me
 259 oddly and asked why I wasn't looking at Subversion (SVN) as a model for
 260 transactions. My first reaction was "It's a source control application. What
 261 does it know about transa- . . . Ohhhh!" And they smiled.
 262
 263 Like Perl datastructures, a filesystem is fractal. Directories contain both
 264 files and directories. Directories act as hashes and a files act as scalars
 265 whose names are their hashkeys. When a modification is made to a SVN checkout,
 266 SVN tracks the changes at the filename (or directory name) level. When a
 267 commit is made, only those filenames which have changes are copied over to the
 268 HEAD. Everything else remains untouched.
 269
 270 Translating this to hashes and hashkeys, this implies that transactional
 271 information should be stored at the level of the hashkey. Or, in
 272 L<DBM::Deep|DBM::Deep> terms, within the bucket for that key. As a nice
 273 side-effect, other than the key's datastructure within the bucket, the entire
 274 datafile is unaware of anything to do with transactions.
 275
 276 =head2 The spike
 277
 278 Spikes are kind of like a reconnaissance mission in the military. They go out
 279 to get intel on the enemy and are explicitly not supposed to take any ground
 280 or, in many cases, take out of the enemy forces. In coding terms, the spike is
 281 code meant to explore a problemspace that you B<will> throw away and
 282 reimplement.
 283
 284 As transactions were going to be between the bucket for the key and the
 285 datapointer to the value, my first thought was to put in another sector that
 286 would handle this mapping. This had the advantage of changing nothing except
 287 for adding one new sector type and the handling for it. Just doing this got me
 288 to the point where I could pass the following test:
 289
 290   my $db1 = DBM::Deep->new( $filename );
 291   my $db2 = DBM::Deep->new( $filename );
 292
 293   $db1->{abc} = 'foo';
 294
 295   is( $db1->{abc}, 'foo' );
 296   is( $db2->{abc}, 'foo' );
 297
 298   $db1->begin_work();
 299
 300       $db1->{abc} = 'floober';
 301
 302       is( $db1->{abc}, 'floober' );
 303       is( $db2->{abc}, 'foo' );
 304
 305 Just that much was a major accomplishment.
 306
 307 =head2 Tests, tests, and more tests
 308
 309 I was lucky that when I took over L<DBM::Deep|DBM::Deep> that Joe Huckaby
 310 (the original author) handed me a comprehensive test suite. This meant that I
 311 could add in transactions with a high degree of confidence that I hadn't
 312 messed up non-transactional uses. The test suite was also invaluable for
 313 working through the various situations that transactions can cause.
 314
 315 But, a test is only as good as the test-writer. For example, it was a while
 316 before I realized that I needed to test C<is( $db1-E<gt>{abc}, 'foo' )>
 317 I<before> modifying it in the transaction.
 318
 319 To pass that test, the code for retrieval needed to look first in the
 320 transaction's spot and if that spot had never been assigned to, look at the
 321 spot for the HEAD. While this is how SVN works, it wasn't an immediately
 322 obvious test to write.
 323
 324 =head2 The HEAD
 325
 326 In SVN, the HEAD revision is the latest revision checked into the repository.
 327 When you do a local modification, you're doing a modification to your copy of
 328 the HEAD. Then, you choose to either check in (C<commit()>) or revert
 329 (C<rollback()>) your changes.
 330
 331 In order to make the code work for the base case (no transaction running), the
 332 first entry in the transaction sector became the HEAD. Thus, it was assigned
 333 transaction ID 0. This also had the really neat side-benefit that C<if (
 334 $trans_id ) {}> will run the code if and only if L<DBM::Deep|DBM::Deep> is
 335 in a running transaction.
 336
 337 =head2 Ending the spike
 338
 339 At this point, I had
 340
 341 =head2 Protection from changes
 342
 343 Let's assume that a transaction is running in one process and another process
 344 is also modifying the same area in the data. The only way that process B can
 345 notify process A that a change has occurred is through the common point - the
 346 DBM file. Because of this, every process that changes the HEAD needs to
 347 protect all currently running transactions by copying over the pointer to the
 348 original value into every transaction which hasn't already modified this
 349 entry. (If it has, then the new value shouldn't overwrite the transaction's
 350 modification.) This is the key piece for providing I<Isolation>.
 351
 352 =head2 Tracking modified buckets
 353
 354 Rolling back changes is very simple - just don't apply them to the HEAD. The
 355 next time that transaction ID is reused, the changes will be ignored (q.v.
 356 L</Staleness counters>). Committing, however, requires that all the changes
 357 must be transferred over from the bucket entry for the given transaction ID to
 358 the entry for the HEAD.
 359
 360 This tracking is done by the modified buckets themselves. They register
 361 themselves with the engine as having been modified within the transaction.
 362 Given L</Protection from changes>, this only guarantees that only those
 363 changes made by the transaction will be transferred over.
 364
 365 =head2 Deleted marker
 366
 367 Transactions are performed copy-on-write. This means that if there isn't an
 368 entry for that transaction, the HEAD is looked at. This doesn't work if a key
 369 has been deleted within a transaction. So, the entry must be marked as deleted
 370 within the transaction so that the HEAD isn't checekd.
 371
 372 Likewise, when a new key is created in a transaction, the HEAD doesn't have an
 373 entry for that key. Consider the following situation:
 374
 375   ok( !exists $db1->{foo} );
 376   ok( !exists $db2->{foo} );
 377
 378   $db1->begin_work();
 379   $db1->{foo} = 'bar';
 380
 381   ok( !exists $db2->{foo} );
 382
 383 The entry for the HEAD for 'foo' needs to be marked as deleted so that
 384 transactions which don't have 'foo' don't find something in the HEAD.
 385
 386 =head2 Freespace management
 387
 388 The second major piece to the 1.00 release was freespace management. In
 389 pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys would not be
 390 recycled. While always a requested feature, the complexity required to
 391 implement freespace meant that it needed to wait for a complete rewrite of
 392 several pieces, such as the engine.
 393
 394 Freespace is implemented by regularizing all the records so that L<DBM::Deep|DBM::Deep>
 395 only has three different record sizes - Index, BucketList, and Data. Each
 396 record type has a fixed length based on various parameters the L<DBM::Deep|DBM::Deep>
 397 datafile is created with. (In order to accomodate values of various sizes, Data
 398 records chain.) Whenever a sector is freed, it's added to a freelist of that
 399 sector's size. Whenever a new sector is requested, the freelist is checked
 400 first. If the freelist has a sector, it's reused, otherwise a new sector is
 401 added to the end of the datafile.
 402
 403 Freespace management did bring up another issue - staleness. It is possible to
 404 have a pointer to a record in memory. If that record is deleted, then reused,
 405 the pointer in memory has no way of determining that is was deleted and
 406 readded vs. modified. So, a staleness counter was added which is incremented
 407 every time the sector is reused through the freelist. If you then attempt to
 408 access that stale record, L<DBM::Deep|DBM::Deep> returns undef because, at some point,
 409 the entry was deleted.
 410
 411 =head2 Staleness counters
 412
 413 Once it was implemented for freespace management, staleness counters proved to
 414 be a very powerful concept for transactions themselves. Back in L</Protection
 415 from changes>, I mentioned that other processes modifying the HEAD will
 416 protect all running transactions from their effects. This provides
 417 I<Isolation>. But, the running transaction doesn't know about these entries.
 418 If they're not cleaned up, they will be seen the next time a transaction uses
 419 that transaction ID.
 420
 421 By providing a staleness counter for transactions, the costs of cleaning up
 422 finished transactions is deferred until the space is actually used again. This
 423 is at the cost of having less-than-optimal space utilization. Changing this in
 424 the future would be completely transparent to users, so I felt it was an
 425 acceptable tradeoff for quick delivery of a functional product.
 426
 427 =head1 The future
 428
 429 Basic transactions are only the first step. There are several features that
 430 can be added on top of what's been provided. (Note: these are advanced topics -
 431 I cannot be held responsible for any bleeding ears.)
 432
 433 =head2 Conflict resolution
 434
 435 Conflict, in this context, is what happens when two transactions run
 436 simultaneously and change the same piece of data. Like most databases,
 437 L<DBM::Deep|DBM::Deep> uses a very simplistic form of conflict resolution -
 438 last commit wins. This works quite well for a row-based RDBM, but doesn't work
 439 as well for fractal structures like hashes.
 440
 441 Contrast this with how Subversion handles conflict. It tracks when each
 442 transaction was started. If the HEAD is at a later version than when the
 443 transaction started, the commit is rejected. It is up to the developer to pull
 444 in the latest changes, mediate any conflicts, and then recommit.
 445
 446 =head2 Checkpoints
 447
 448 A transaction may have several steps within it. The first three may succeed,
 449 but the fourth might fail. Instead of rolling back all the way to the
 450 beginning, you might want to rollback to the last successful step and try
 451 again. This is a checkpoint. Most RDBMSes provide them, but we'll see if users
 452 request them for L<DBM::Deep|DBM::Deep>.
 453
 454 =head2 Sub-transactions
 455
 456 Similar to L</Checkpoints>, sub-transactions provide a mechanism for trying
 457 something within a transaction without affecting the transaction. However,
 458 instead of saying "These steps are safely finished," sub-transactions still
 459 provides for the ability to rollback the primary transaction all the way.
 460 Sub-transactions can also work better with libraries that may want to use
 461 transactions themselves.
 462
 463 =head2 Durability
 464
 465 As mentioned in L</What are transactions?>, the 'D' in ACID stands for
 466 I<Durable>. L<DBM::Deep|DBM::Deep> does not satisfy that criterion because
 467 durability almost always requires another file (or files) for a commit log. I
 468 deemed this unacceptable for this release because one of the
 469 L<DBM::Deep|DBM::Deep>'s features is the single datafile. However, Berkley DB
 470 provides durability with only a single file. Cribbing from them may be viable
 471 in the future.
 472
 473 =cut