articles/how_transactions_were_added.pod

   1 =head0 Adding transactions to DBM::Deep
   2
   3 For the past nine months, I've been working on adding transactions to
   4 L<DBM::Deep|DBM::Deep>. During that time, I looked far and wide for an
   5 accessible description of how a programmer should go about implementing
   6 transactions. The only things I found were either extremely pedantic academic
   7 papers or the code of complex applications. The former weren't very easy to
   8 read and the latter were less so N<Have I<you> ever tried to read the source
   9 for BDB or InnoDB? Reading perl's source is easier.>. This is the article I
  10 wished I'd been able to read nine months ago.
  11
  12 =head1 What is DBM::Deep?
  13
  14 L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a way of
  15 storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
  16 in memory. The datafile produced is able to be transferred from one machine to
  17 another, regardless of OS or Perl version. There are several reasons why
  18 someone would want to do this.
  19
  20 =over 4
  21
  22 =item * Transparent Persistence
  23
  24 This is the ability to save a set of datastructures to disk and retrieve them
  25 later without the vast majority of the program ever knowing that the data is
  26 persisted. Furthermore, the datastructure is persisted immediately and not at
  27 set marshalling periods.
  28
  29 =item * Huge datastructures
  30
  31 Normally, datastructures are limited by the size of RAM the server has.
  32 L<DBM::Deep|DBM::Deep> allows for the size a given datastructure to be limited by disk
  33 size instead (up to the given perl's largefile support).
  34
  35 =item * Database
  36
  37 Most programmers hear the word "database" and think "relational database
  38 management system" (or RDBMS). A database is a more general term meaning
  39 "place one stores data." This can be relational, object, or something else.
  40 The software used to manage and query a database is a "database management
  41 system" (DBMS).
  42
  43 L<DBM::Deep|DBM::Deep> provides one half of a DBMS - the data storage part.
  44 Once the datastructures on disk, L<DBM::Deep|DBM::Deep> provides the
  45 capability to allow multiple processes to access the data. N<Presto is a start
  46 at providing the other half of the DBMS using DBm::Deep as the engine.>
  47
  48 =back
  49
  50 =head1 How does DBM::Deep work?
  51
  52 L<DBM::Deep|DBM::Deep> works by tying a variable to a file on disk. Every
  53 read and write go directly to the file and modify the file immediately. To
  54 represent Perl's hashes and arrays, a record-based file format is used. There
  55 is a file header storing file-wide values, such as the size of the internal
  56 file pointers.  Afterwards, there are the data records.
  57
  58 The most important feature of L<DBM::Deep|DBM::Deep> is that it can be
  59 completely transparent. Other than the line tying the variable to the file, no
  60 other part of your program needs to know that the variable being used isn't a
  61 "normal" Perl variable. So, the following works just fine:
  62
  63   # As far as the rest of the program is concerned, the following two lines
  64   # are identical - they produce a variable $foo that can be used as a hashref.
  65   # my $foo = {};
  66   my $foo = DBM::Deep->new( 'mydb.db' );
  67
  68   $foo->{bar} = 'baz';
  69   $foo->{complex} = [
  70     { a => 'b' }, 0, '123', undef, [ 1 .. 5 ],
  71   ];
  72
  73   # And so on and so forth.
  74
  75 =head2 DBM::Deep's file structure
  76
  77 L<DBM::Deep|DBM::Deep>'s file structure is record-based. The key (or array
  78 index - arrays are currently just funny hashes internally) is hashed using MD5
  79 and then stored in a cascade of Index and bucketlist records. The bucketlist
  80 record stores the actual key string and pointers to where the data records are
  81 stored. The data records themselves are one of Null, Scalar, or Reference.
  82 Null represents an I<undef>, Scalar represents a string (numbers are
  83 stringified internally for simplicity) and are allocated in fixed-size chunks.
  84 Reference represent an array or hash reference and contains a pointer to an
  85 Index and bucketlist cascade of its own. Reference will also store the class
  86 the hash or array reference is blessed into, meaning that almost all objects
  87 can be stored safely.
  88
  89 =head2 DBM::Deep's class hierarchy
  90
  91 Managing all of these functions takes a lot of different abstractions. There
  92 are at least 3 different interfacing layers, and more if you look hard enough.
  93 To manage this complexity, L<DBM::Deep|DBM::Deep> uses the following abstractions:
  94
  95 =over 4
  96
  97 =item * Tying classes
  98
  99 These are the classes that manage the external face of the module. They manage
 100 B<all> of the interactions with outside code - OO interface, tying, and
 101 various utility methods. If they cannot handle the request themselves, they
 102 delegate to the engine. There are currently three classes in this layer.
 103
 104 =item * Engine classes
 105
 106 These classes manage the file format and all of the ways that the records
 107 interact with each other. Nearly every call will make requests to the File
 108 classes for reading and/or writing data to the file. There are currently nine
 109 classes in this layer, including a class for each record type.
 110
 111 =item * File class
 112
 113 This class mediates all interaction with the file system. Every read, write,
 114 lock, and unlock goes through this class. There is currently one class in this
 115 layer.
 116
 117 =item * Iterator classes
 118
 119 These are introspection classes that provide iteration for hashes. They manage
 120 keeping track of where the next key should be and how to get there. There are
 121 currently four classes in this layer.
 122
 123 =back
 124
 125 =head1 Why add transactions to DBM::Deep?
 126
 127 For the most part, L<DBM::Deep|DBM::Deep> functions perfectly well without
 128 transactions. Most uses that I've seen tend to be either single-process
 129 read/write persistence or a multi-process readonly cache for an expensive, but
 130 static, lookup. With transactions, L<DBM::Deep|DBM::Deep> can now be used
 131 safely for multi-process read/write persistence in situations that don't
 132 need (or cannot use) a full RDBMS.
 133
 134 =head2 What are transactions?
 135
 136 Originally from the database world, a transaction is a way of isolating the
 137 effects of a given set of actions, then applying them all at once. It's a way
 138 of saying "I'm going to try the following steps, see if I like the result,
 139 then I want everyone else looking at this datastore to see the results
 140 immediately." The most common example is taken from banking. Let's say that an
 141 application receives a request to have Joe pay Bob five zorkmids. Without
 142 transactions, the application would take the money from Joe's account, then
 143 add the money to Bob's account. But, what happens if the application crashes
 144 after debiting Joe, but before crediting Bob? The application has made money
 145 disappear. Or, vice versa, if Bob is credited before Joe is debited, the
 146 application has created money.
 147
 148 With a transaction wrapping the money transfer, if the application crashes in
 149 the middle, it's as if the action never happened. So, when the application
 150 recovers from the crash, Joe and Bob still have the same amount of money in
 151 their accounts as they did before. The transaction can restart and Bob can
 152 finally receive his zorkmids.
 153
 154 More formally, transactions are generally considered to be proper when they are
 155 ACID-compliant. ACID is an acronym that stands for the following:
 156
 157 =over 4
 158
 159 =item * Atomic
 160
 161 Either every change happens or none of the changes happen.
 162
 163 =item * Consistent
 164
 165 When the transaction begins and when it is committed, the database must be in
 166 a legal state. This condition doesn't apply to L<DBM::Deep|DBM::Deep> as all
 167 Perl datastructures are internally consistent.
 168
 169 =item * Isolated
 170
 171 As far as a transaction is concerned, it is the only thing running against the
 172 database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep>
 173 provides the strongest isolation level possible, usually called
 174 I<Serializable> by most RDBMSes.
 175
 176 =item * Durable
 177
 178 Once the database says that a commit has happened, the commit will be
 179 guaranteed, regardless of whatever happens. I chose to not implement this
 180 condition in L<DBM::Deep|DBM::Deep> N<This condition requires the presence of
 181 at least one more file, which violates one of the original design goals.>.
 182
 183 =back
 184
 185 =head2 Why add them to DBM::Deep?
 186
 187 The ability to have actions occur in either I<atomically> (as in the previous
 188 example) or I<isolation> from the rest of the users of the data is a powerful
 189 thing. This allows for a large amount of safety and predictability in how
 190 data transformations occur. Imagine, for example, that you have a set of
 191 calculations that will update various variables. However, there are some
 192 situations that will cause you to throw away all results and start over with a
 193 different seed. Without transactions, you would have to put everything into
 194 temporary variables, then transfer the values when the calculations were found
 195 to be successful. If you ever add a new value or if a value is used in only
 196 certain calculations, you may forget to do the correct thing. With
 197 transactions, you start a transaction and do your thing within it. If the
 198 calculations succeed, you commit. If they fail, you rollback and try again.
 199
 200 If you're thinking that this is very similar to how Subversion (SVN) or CVS
 201 works, you're absolutely correct - they are transactional in exactly the same
 202 way.
 203
 204 =head1 How it happened
 205
 206 The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the
 207 single most complex software endeavor I've ever undertaken. While transactions
 208 are conceptually simple, the devil is in the details. And there were a B<lot>
 209 of details.
 210
 211 =head2 The naive approach
 212
 213 Initially, I hoped I could just copy the entire datastructure and mark it as
 214 owned by the transaction. This is the most straightforward solution and is
 215 extremely simple to implement. Whenever a transaction starts, copy the whole
 216 thing over to somewhere else. If the transaction is committed, overwrite the
 217 original with the transaction's version. If it's rolled back, throw it away.
 218
 219 It's a popular solution as seen by the fact that it's the mechanism used in
 220 both L<Data::Transactional|Data::Transactional> and
 221 L<Tie::Scalar::Transactional|Tie::Scalar::Transactional>. While very simple to
 222 implement, it scales very poorly as the datastructure grows. As one of the
 223 primary usecases for L<DBM::Deep|DBM::Deep> is working with huge
 224 datastructures, this plan was dead on arrival.
 225
 226 =head2 The relational approach
 227
 228 As I'm also a MySQL DBA, I looked to how the InnoDB engine implements
 229 transactions. Given that relational databases are designed to work with large
 230 amounts of data, it made sense to look here next.
 231
 232 InnoDB implements transactions using MVCC
 233 N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. When a
 234 transaction starts, it stores a timestamp corresponding to its start time.
 235 Whenever a modification to a row is committed, the modification is
 236 timestamped. When a transaction modifies a row, it copies the row into its
 237 own scratchpad and modifies it. Whenever a transaction reads a row, it first
 238 attempts to read the row from its scratchpad. If it's not there, then it reads
 239 the version of the row whose timestamp is no later than the timestamp of the
 240 transaction. When committing, the transaction's scratchpad is written out to
 241 the main data area with the timestamp of the commit and the scratchpad is
 242 thrown away. When rolling back, the scratchpad is thrown out.
 243
 244 At first, this mechanism looked promising and I whipped up a couple spikes
 245 (or code explorations) to try it out. The problem I ran into, again, was the
 246 existence of large datastructures. When making large changes to a relational
 247 database within a transaction, the engine can store the rows within the actual
 248 table and mark them as being part of a transaction's scratchpad. Perl's
 249 fractal datastructures, however, don't lend themselves to this kind of
 250 treatment. The scratchpad would, in some pathological cases, be a
 251 near-complete copy of the original datastructure. N<Funnily enough, this is
 252 yet another example of the object relational impedance mismatch
 253 (http://en.wikipedia.org/wiki/Object-Relational_impedance_mismatch).>
 254
 255 =head2 The subversive approach
 256
 257 Despairing, I went to YAPC::NA::2006 hoping to discuss the problem with the
 258 best minds in the Perl community. I was lucky enough to run into both Audrey
 259 Tang (author of Pugs) and clkao (author of SVK). In between talks, I managed
 260 to discuss the problems I'd run into with both of them. They looked at me
 261 oddly and asked why I wasn't looking at Subversion (SVN) as a model for
 262 transactions. My first reaction was "It's a source control application. What
 263 does it know about transa- . . . Ohhhh!" And they smiled.
 264
 265 Like Perl datastructures, a filesystem is fractal. Directories contain both
 266 files and directories. Directories act as hashes and a files act as scalars
 267 whose names are their hashkeys. When a modification is made to a SVN checkout,
 268 SVN tracks the changes at the filename (or directory name) level. When a
 269 commit is made, only those filenames which have changes are copied over to the
 270 HEAD. Everything else remains untouched.
 271
 272 Translating this to hashes and hashkeys, this implies that transactional
 273 information should be stored at the level of the hashkey. Or, in
 274 L<DBM::Deep|DBM::Deep> terms, within the bucket for that key. As a nice
 275 side-effect, other than the key's datastructure within the bucket, the entire
 276 datafile is unaware of anything to do with transactions.
 277
 278 =head2 The spike
 279
 280 Spikes are kind of like a reconnaissance mission in the military. They go out
 281 to get intel on the enemy and are explicitly not supposed to take any ground
 282 or, in many cases, take out of the enemy forces. In coding terms, the spike is
 283 code meant to explore a problemspace that you B<will> throw away and
 284 reimplement.
 285
 286 As transactions were going to be between the bucket for the key and the
 287 datapointer to the value, my first thought was to put in another sector that
 288 would handle this mapping. This had the advantage of changing nothing except
 289 for adding one new sector type and the handling for it. Just doing this got me
 290 to the point where I could pass the following test:
 291
 292   my $db1 = DBM::Deep->new( $filename );
 293   my $db2 = DBM::Deep->new( $filename );
 294
 295   $db1->{abc} = 'foo';
 296
 297   is( $db1->{abc}, 'foo' );
 298   is( $db2->{abc}, 'foo' );
 299
 300   $db1->begin_work();
 301
 302       $db1->{abc} = 'floober';
 303
 304       is( $db1->{abc}, 'floober' );
 305       is( $db2->{abc}, 'foo' );
 306
 307 Just that much was a major accomplishment.
 308
 309 =head2 Tests, tests, and more tests
 310
 311 I was lucky that when I took over L<DBM::Deep|DBM::Deep> that Joe Huckaby
 312 (the original author) handed me a comprehensive test suite. This meant that I
 313 could add in transactions with a high degree of confidence that I hadn't
 314 messed up non-transactional uses. The test suite was also invaluable for
 315 working through the various situations that transactions can cause.
 316
 317 But, a test is only as good as the test-writer. For example, it was a while
 318 before I realized that I needed to test C<is( $db1-E<gt>{abc}, 'foo' )>
 319 I<before> modifying it in the transaction.
 320
 321 To pass that test, the code for retrieval needed to look first in the
 322 transaction's spot and if that spot had never been assigned to, look at the
 323 spot for the HEAD. While this is how SVN works, it wasn't an immediately
 324 obvious test to write.
 325
 326 =head2 The HEAD
 327
 328 In SVN, the HEAD revision is the latest revision checked into the repository.
 329 When you do a local modification, you're doing a modification to your copy of
 330 the HEAD. Then, you choose to either check in (C<commit()>) or revert
 331 (C<rollback()>) your changes.
 332
 333 In order to make the code work for the base case (no transaction running), the
 334 first entry in the transaction sector became the HEAD. Thus, it was assigned
 335 transaction ID 0. This also had the really neat side-benefit that C<if (
 336 $trans_id ) {}> will run the code if and only if L<DBM::Deep|DBM::Deep> is
 337 in a running transaction.
 338
 339 =head2 Ending the spike
 340
 341 At this point, I had learned everything I needed from the spike. Yes, the
 342 SVN idea looked like it was going to work. Yes, there were a lot of squibbly
 343 details. No, it wasn't going to be finished before I left YAPC::NA. *sigh*
 344
 345 The biggest lessons learned from the spike were:
 346
 347 =over 4
 348
 349 =item 1 Tests are good
 350
 351 I seem to have to relearn this every project I work on. It's pretty sad, if
 352 you ask me.
 353
 354 =item 1 The transaction sector is superfluous
 355
 356 As I worked with it, the transaction sector didn't add any information over
 357 extending the actual bucket to have the transaction to datapointer mapping
 358 within it.
 359
 360 =back
 361
 362 =head2 Protection from changes
 363
 364 After the missed test for checking that starting a transaction didn't lose the
 365 connection to the HEAD, I started writing more and more tests, being very anal
 366 about what I was checking. I wrote tests to check every piece of state I could
 367 think of before and after every change in state, regardless of where the
 368 change was made. Writing these tests immediately raised an issue with changing
 369 the HEAD while a transaction is running. If the transaction has already edited
 370 that key, it already has its new value. However, if it doesn't, it needs to be
 371 protected from the change to the HEAD. This is the key piece for providing
 372 I<Isolation>.
 373
 374 My first attempt to solve this problem focused on having the transaction
 375 itself detect changes. But, the primary usecase for transactions is that each
 376 transaction is going to be running in a separate process. Without implementing
 377 IPC, the only common point between various processes is the datafile itself.
 378 The only process aware of the change is the process making the change. Even
 379 though it seemed counter-intuitive, the only sane mechanism was that each
 380 process modifying the HEAD would also protect all running transactions from
 381 its change, if needed.
 382
 383 =head2 Committing and rolling back
 384
 385 Now that changes are able to be made within a transaction and the transaction,
 386 the HEAD, and other transactions are protected from one other, the next step
 387 was to provide the ability to both commit and rollback these changes.
 388
 389 =head3 Rollback
 390
 391 Conceptually, rolling back should the simpler to implement - just discard the
 392 changes that have been made within the transaction and continue onward with
 393 the HEAD. And, for the first attempt, that is exactly what I did. This meant
 394 that the following test would pass:
 395
 396   $db->{foo} = 'bar';
 397
 398   $db->begin_work;
 399
 400   is( $db->{foo}, 'bar' );
 401
 402   $db->{foo} = 'baz';
 403
 404   is( $db->{foo}, 'baz' );
 405
 406   $db->rollback;
 407
 408   is( $db->{foo}, 'bar' );
 409
 410 But, this didn't work out very well for repeated use of that transaction slot.
 411 I threw a number of solutions at the problem, but none of them were
 412 addressing the real issue - knowing which use of a transaction ID made the
 413 change vs. which use of a transaction ID was accessing the value.
 414
 415 XXX
 416
 417 =head3 Committing
 418
 419 Committing is much harder than rolling back. The primary difficulty lies in
 420 tracking exactly what this transaction has changed in order to copy those
 421 changed bucket entries over to the HEAD. The good news is that only the actual
 422 datapointers for that transaction need to be copied over - the actual data
 423 sectors are left untouched.
 424
 425 The key to the solution lay in the decoupled nature of the code I was writing
 426 along with the fact that every piece of the code had access to the engine
 427 object, if needed. Committing (and rolling back) are both handled by the
 428 Engine object. To get that information into the engine, each bucket modified
 429 by the transaction would inform the engine object that it had been modified by
 430 that transaction. When a commit occurs, the engine objet iterates over the
 431 modified buckets and transfers over the new datapointer and discards the old
 432 one.
 433
 434 =head2 Deleted marker
 435
 436 After some more tests, a final edge-case was found. Transactions are performed
 437 copy-on-write. This means that if there isn't an entry for that transaction,
 438 the HEAD is looked at. This doesn't work if a key has been deleted within a
 439 transaction. So, the entry must be marked as deleted within the transaction so
 440 that the HEAD isn't checekd.
 441
 442 Likewise, when a new key is created in a transaction, the HEAD doesn't have an
 443 entry for that key. Consider the following situation:
 444
 445   ok( !exists $db1->{foo} );
 446   ok( !exists $db2->{foo} );
 447
 448   $db1->begin_work();
 449   $db1->{foo} = 'bar';
 450
 451   ok( !exists $db2->{foo} );
 452
 453 The entry for the HEAD for 'foo' needs to be marked as deleted so that
 454 transactions which don't have 'foo' don't find something in the HEAD. To add
 455 this, I originally used a separate flag for each datapointer to indicate if it
 456 had been marked as deleted or not. I quickly recognized that a data-pointer
 457 can never have a value of 0 or 1 as those would point to the first and second
 458 bytes of the datafile, respectively. As these are part of the header, those
 459 are nonsensical values, so can be re-used for metadata. 0 now means "This
 460 slot has never been written to" and 1 means "This slot has been explicitly
 461 deleted."
 462
 463 =head2 Freespace management
 464
 465 Pre-1.0000 versions of L<DBM::Deep|DBM::Deep> didn't have any form of
 466 freespace management. This meant that whenever a value was deleted, the old
 467 value just sat around taking up space, even though it would never be accessed
 468 again. While barely acceptable for non-transactional uses, this was made
 469 transactions unusable because transactions, as I've implemented them, are
 470 predicated on the concept of parallel values that are (presumably) cleaned up
 471 after the transaction is done with them.
 472
 473 Freespace had never been added before because it requires a different file
 474 format than the one used in the pre-1.0000 versions. Because I had to change
 475 the file format anyways B<and> I needed the feature, adding freespace now
 476 seemed like a good plan.
 477
 478 Freespace was implemented by regularizing all the records so that
 479 L<DBM::Deep|DBM::Deep> only has three different record sizes - Index,
 480 BucketList, and Data. Each record type has a fixed length based on various
 481 parameters the L<DBM::Deep|DBM::Deep> datafile is created with. (In order to
 482 accomodate values of various sizes, Data records chain.) Whenever the engine
 483 is finished with a sector, it is freed and added to a list of free sectors of
 484 that sector's size. Whenever a new sector is requested, the freelist is
 485 checked first. If the freelist has a sector, it's reused, otherwise a new
 486 sector is added to the end of the datafile.
 487
 488 Just like everything else, I wrote a mess of tests for adding freespace
 489 management. One of the tests I thought up was the following:
 490
 491   $db->{foo} = [ 1 .. 3];
 492   my $arr = $db->{foo};
 493
 494   is( $arr->[1], 2 ); # This always has worked.
 495
 496   delete $db->{foo};
 497
 498   isnt( $arr->[1], 2 );
 499
 500 If this was a Perl datastructure, the last test should pass. In the past, that
 501 test would fail. The key concept I realized was that the C<$arr> variable is
 502 pointing to a stale area in memory. So, by adding a staleness counter that is
 503 incremented whenever the sector in use is deleted, I am able to determine if
 504 the variable in question is looking for a stale version of the sector. At this
 505 point, L<DBM::Deep|DBM::Deep> returns undef because, at some point, the entry
 506 was deleted.
 507
 508 =head2 Transactional staleness counters
 509
 510 Once it was implemented for freespace management, staleness counters proved to
 511 be a very powerful concept for transactions themselves. Back in L</Protection
 512 from changes>, I mentioned that other processes modifying the HEAD will
 513 protect all running transactions from their effects. This provides
 514 I<Isolation>. But, the running transaction doesn't know about these entries.
 515 This is both a benefit and a drawback. It's a benefit that it makes tracking
 516 modified buckets very simple (q.v. L</Committing>). But, it means that changes
 517 made to protect the transaction are not tracked.  If they're not cleaned up,
 518 they will be seen the next time a transaction uses that transaction ID.
 519
 520 By providing a staleness counter for transactions, the costs of cleaning up
 521 finished transactions is deferred until the space is actually used again. This
 522 is at the cost of having less-than-optimal space utilization. Changing this in
 523 the future would be completely transparent to users, so I felt it was an
 524 acceptable tradeoff for quick delivery of a functional product.
 525
 526 =head2 Fiddly bits
 527
 528 At this point, all the major pieces were in place. All that was left was to
 529 get all the fiddly bits into place. This included handling the behavior of
 530 C<keys()>, simultaneous transactions with commits and rollbacks in various
 531 order, and making sure that transactions played nicely when a new Index sector
 532 needed to be created due to reindexing a full Bucketlist sector. Of these,
 533 C<keys()> was the hardest. This is when I actually implemented the Iterator classes
 534 to handle walking the index/bucketlist chain.
 535
 536 =head1 The future
 537
 538 Basic transactions are only the first step. There are several features that
 539 can be added on top of what's been provided. If and in which order any of
 540 these are implemented is completely up to user-feedback. (Note: these are
 541 advanced topics - I cannot be held responsible for any bleeding ears.)
 542
 543 =head2 Conflict resolution
 544
 545 Conflict, in this context, is what happens when two transactions run
 546 simultaneously and change the same piece of data. Like most relational databases,
 547 L<DBM::Deep|DBM::Deep> uses a very simplistic form of conflict resolution -
 548 last commit wins. This works quite well for a row-based RDBMS, but doesn't work
 549 as well for fractal structures like hashes.
 550
 551 Contrast this with how Subversion handles conflict. It tracks when each
 552 transaction was started. If the HEAD was changed after the transaction
 553 started, the commit is rejected. It is up to the developer to pull in the
 554 latest changes, mediate any conflicts, and then recommit. There are several
 555 other ways to handle conflict resolution, many of which can be pulled from
 556 Haskell's use of Software Transactional Memory (STM).
 557
 558 =head2 Checkpoints
 559
 560 A transaction may have several steps within it. The first three may succeed,
 561 but the fourth might fail. Instead of rolling back all the way to the
 562 beginning, you might want to rollback to the last successful step and try
 563 again. This is a checkpoint. Most RDBMSes provide them and they aren't very
 564 difficult, conceptually, but I've seen just how "easy" some features can be
 565 once you start really exploring the problemspace.
 566
 567 =head2 Sub-transactions
 568
 569 Similar to L</Checkpoints>, sub-transactions provide a mechanism for trying
 570 something within a transaction without affecting the transaction. However,
 571 instead of saying "These steps are safely finished," sub-transactions still
 572 provides for the ability to rollback the primary transaction all the way.
 573 Sub-transactions can also work better with libraries that may want to use
 574 transactions themselves.
 575
 576 This, of all the features listed, is the one I'm most interested in
 577 implementing next.
 578
 579 =head2 Durability
 580
 581 As mentioned in L</What are transactions?>, the 'D' in ACID stands for
 582 I<Durable>. L<DBM::Deep|DBM::Deep> does not satisfy that criterion because
 583 durability almost always requires another file (or files) for a commit log. I
 584 deemed this unacceptable for this release because one of the
 585 L<DBM::Deep|DBM::Deep>'s features is the single datafile. To be honest, I
 586 don't anticipate this to be an issue for most users because the niche that
 587 L<DBM::Deep|DBM::Deep> occupies is one that is tolerant to failure and a small
 588 chance of potential dataloss.
 589
 590 However, Berkley DB does provide durability with only a single file. If it
 591 becomes necessary, cribbing from them could be an option.
 592
 593 =cut