article.pod

   1 =head0 Adding transactions to DBM::Deep
   2
   3 For the past 9 months, I've been working on adding transactions to
   4 L<DBM::Deep|DBM::Deep>. During that time, I looked far and wide for an
   5 accessible description of how a programmer should go about implementing
   6 transactions. The only things I found were either extremely pedantic academic
   7 papers or code. The former weren't very easy to read and the latter were less
   8 soN<Have I<you> ever tried to read the source for BDB or InnoDB? Reading
   9 perl's source is easier.>. This is the article I wished I'd been able to read
  10 nine months ago.
  11
  12 =head1 What is DBM::Deep?
  13
  14 L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a way of
  15 storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
  16 in memory. The datafile produced is able to be transferred from one machine to
  17 another, regardless of OS or Perl version. There are several reasons why
  18 someone would want to do this.
  19
  20 =over 4
  21
  22 =item * Transparent Persistence
  23
  24 This is the ability to save a set of data structures to disk and retrieve them
  25 later without the vast majority of the program even knowing that the data is
  26 persisted. Furthermore, the datastructure is persisted immediately and not at
  27 set marshalling periods.
  28
  29 =item * Huge datastructures
  30
  31 Normally, datastructures are limited by the size of RAM the server has.
  32 L<DBM::Deep|DBM::Deep> allows for the size a given datastructure to be limited by disk
  33 instead (up to the given perl's largefile support).
  34
  35 =item * Database
  36
  37 Once you have persistent datastructures, you start wanting to have multiple
  38 processes be able to use that data. (More on this later.)
  39
  40 =back
  41
  42 =head1 How does DBM::Deep work?
  43
  44 L<DBM::Deep|DBM::Deep> works by tying a variable to a file on disk. Every
  45 read and write go directly to the file and modify the file immediately. To
  46 represent Perl's hashes and arrays, a record-based file format is used. There
  47 is a file header storing file-wide values, such as the size of the internal
  48 file pointers.  Afterwards, there are the data records.
  49
  50 The most important feature of L<DBM::Deep|DBM::Deep> is that it can be
  51 completely transparent. Other than the line tying the variable to the file, no
  52 other part of your program needs to know that the variable being used isn't a
  53 "normal" Perl variable. So, the following works just fine:
  54
  55   # As far as the rest of the program is concerned, the following two lines
  56   # are identical - they produce a variable $foo that can be used as a hashref.
  57   # my $foo = {};
  58   my $foo = DBM::Deep->new( 'mydb.db' );
  59
  60   $foo->{bar} = 'baz';
  61   $foo->{complex} = [
  62     { a => 'b' }, 0, '123', undef, [ 1 .. 5 ],
  63   ];
  64
  65   # And so on and so forth.
  66
  67 =head2 DBM::Deep's file structure
  68
  69 L<DBM::Deep|DBM::Deep>'s file structure is record-based. The key (or array
  70 index - arrays are currently just funny hashes internally) is hashed using MD5
  71 and then stored in a cascade of Index and Bucketlist records. The bucketlist
  72 record stores the actual key string and pointers to where the data records are
  73 stored. The data records themselves are one of Null, Scalar, or Reference.
  74 Null represents an I<undef>, Scalar represents a string (numbers are
  75 stringified internally for simplicity) and are allocated in 256byte chunks.
  76 Reference represent an array or hash reference and contains a pointer to an
  77 Index and Bucketlist cascade of its own. Reference will also store the class
  78 the hash or array reference is blessed into, meaning that almost all objects
  79 can be stored safely.
  80
  81 =head2 DBM::Deep's class hierarchy
  82
  83 Managing all of these functions takes a lot of different abstractions. There
  84 are at least 3 different interfacing layers, and more if you look hard enough.
  85 To manage this complexity, L<DBM::Deep|DBM::Deep> uses the following abstractions:
  86
  87 =over 4
  88
  89 =item * Tying classes
  90
  91 These are the classes that manage the external face of the module. They manage
  92 B<all> of the interactions with outside code - OO interface, tying, and
  93 various utility methods. If they cannot handle the request themselves, they
  94 delegate to the engine. There are currently three classes in this layer.
  95
  96 =item * Engine classes
  97
  98 These classes manage the file format and all of the ways that the records
  99 interact with each other. Nearly every call will make requests to the File
 100 classes for reading and/or writing data to the file. There are currently nine
 101 classes in this layer, including a class for each record type.
 102
 103 =item * File class
 104
 105 This class mediates all interaction with the file system. Every read, write,
 106 lock, and unlock goes through this class. There is currently one class in this
 107 layer.
 108
 109 =item * Iterator classes
 110
 111 These are introspection classes that provide iteration for hashes. They manage
 112 keeping track of where the next key should be and how to get there. There are
 113 currently four classes in this layer.
 114
 115 =back
 116
 117 =head1 Why add transactions to DBM::Deep?
 118
 119 For the most part, L<DBM::Deep|DBM::Deep> functions perfectly well without
 120 transactions. Most uses that I've seen tend to be either single-process
 121 read/write persistence or a multi-process readonly cache for an expensive, but
 122 static, lookup. With transactions, L<DBM::Deep|DBM::Deep> can now be used
 123 safely for multi-process read/write persistence in situations that don't
 124 need (or cannot use) a full RDBMS.
 125
 126 =head2 What are transactions?
 127
 128 Originally from the database world, a transaction is a way of isolating the
 129 effects of a given set of actions, then applying them all at once. It's a way
 130 of saying "I'm going to try the following steps, see if I like the result,
 131 then I want everyone else looking at this datastore to see the results
 132 immediately." The most common example is taken from banking. Let's say that an
 133 application receives a request to have Joe pay Bob five zorkmids.  Without
 134 transactions, the application would take the money from Joe's account, then
 135 add the money to Bob's account. But, what happens if the application crashes
 136 after debiting Joe, but before crediting Bob? The application has made money
 137 disappear. Or, vice versa, if Bob is credited before Joe is debited, the
 138 application has created money.
 139
 140 With a transaction wrapping the money transfer, if the application crashes in
 141 the middle, it's as if the action never happened. So, when the application
 142 recovers from the crash, Joe and Bob still have the same amount of money in
 143 their accounts as they did before. The transaction can restart and Bob can
 144 finally receive his zorkmids.
 145
 146 More formally, transactions are generally considered to be proper when they are
 147 ACID-compliant. ACID is an acronym that stands for the following:
 148
 149 =over 4
 150
 151 =item * Atomic
 152
 153 Either every change happens or none of the changes happen.
 154
 155 =item * Consistent
 156
 157 When the transaction begins and when it is committed, the database must be in
 158 a legal state. This condition doesn't apply to L<DBM::Deep|DBM::Deep> as all
 159 Perl data structures are internally consistent.
 160
 161 =item * Isolated
 162
 163 As far as a transaction is concerned, it is the only thing running against the
 164 database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep>
 165 provides the strongest isolation level possible, usually called
 166 I<Serializable> by most RDBMSes.
 167
 168 =item * Durable
 169
 170 Once the database says that a commit has happened, the commit will be
 171 guaranteed, regardless of whatever happens. I chose to not implement this
 172 condition in L<DBM::Deep|DBM::Deep>N<This condition requires the prescence of
 173 at least one more file, which violates the original design goals.>.
 174
 175 =back
 176
 177 =head2 Why add them to DBM::Deep?
 178
 179 The ability to have actions occur in either I<atomically> (as in the previous
 180 example) or I<isolation> from the rest of the users of the data is a powerful
 181 thing. This allows for a large amount of safety and predictability in how
 182 data transformations occur. Imagine, for example, that you have a set of
 183 calculations that will update various variables. However, there are some
 184 situations that will cause you to throw away all results and start over with a
 185 different seed. Without transactions, you would have to put everything into
 186 temporary variables, then transfer the values when the calculations were found
 187 to be successful. If you ever add a new value or if a value is used in only
 188 certain calculations, you may forget to do the correct thing. With
 189 transactions, you start a transaction and do your thing within it. If the
 190 calculations succeed, you commit. If they fail, you rollback and try again. If
 191 you're thinking that this is very similar to how SVN or CVS works, you're
 192 absolutely correct - they are transactional in exactly the same way.
 193
 194 =head1 How it happened
 195
 196 =head2 The backstory
 197
 198 The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the
 199 single most complex software endeavor I've ever undertaken. The first step was
 200 to figure out exactly how transactions were going to work. After several
 201 spikesN<These are throwaway coding explorations.>, the best design seemed to
 202 look to SVN instead of relational databases. The more I investigated, the more
 203 I ran up against the object-relational impedance mismatch
 204 N<http://en.wikipedia.org/wiki/Object-Relational_Impedance_Mismatch>, this
 205 time in terms of being able to translate designs. In the relational world,
 206 transactions are generally implemented either as row-level locks or using MVCC
 207 N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>.  Both of
 208 these assume that there is a I<row>, or singular object, that can be locked
 209 transparently to everything else. This doesn't translate to a fractally
 210 repeating structure like a hash or an array.
 211
 212 However, the design used by SVN deals with directories and files which
 213 corresponds very closely to hashes and hashkeys. In SVN, the modifications are
 214 stored in the file's metadata. Translating this to hashes and hashkeys, this
 215 means that transactional information should be stored in the key's metadata.
 216 Or, in L<DBM::Deep|DBM::Deep> terms, within the Bucket for that key. As a nice
 217 side-effect, the entire datafile is unaware of anything to do with
 218 transactions, except for the key's data structure within the bucket.
 219
 220 =head2 Transactions in the keys
 221
 222 The first pass was to expand the Bucketlist sector to go from a simple key /
 223 datapointer mapping to a more complex key / transaction / datapointer mapping.
 224 Initially, I interposed a Transaction record that the bucketlist pointed to.
 225 That then contained the transaction / datapointer mapping. This had the
 226 advantage of changing nothing except for adding one new sector type and the
 227 handling for it. This was very quickly merged into the Bucketlist record to
 228 simplify the resulting code.
 229
 230 This first step got me to the point where I could pass the following test:
 231
 232   my $db1 = DBM::Deep->new( $filename );
 233   my $db2 = DBM::Deep->new( $filename );
 234
 235   $db1->{abc} = 'foo';
 236
 237   is( $db1->{abc}, 'foo' );
 238   is( $db2->{abc}, 'foo' );
 239
 240   $db1->begin_work();
 241
 242       is( $db1->{abc}, 'foo' );
 243       is( $db2->{abc}, 'foo' );
 244
 245       $db1->{abc} = 'floober';
 246
 247       is( $db1->{abc}, 'floober' );
 248       is( $db2->{abc}, 'foo' );
 249
 250 Just that much was a major accomplishment. The first pass only looked in the
 251 transaction's spot in the bucket for that key. And, that passed my first tests
 252 because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
 253 modifying it in the transaction. To pass that test, the code for retrieval
 254 needed to look first in the transaction's spot and if that spot had never been
 255 assigned to, look at the spot for the HEAD.
 256
 257 =head2 The concept of the HEAD
 258
 259 This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
 260 revision checked into the repository. When you do a local modification, you're
 261 doing a modification to your copy of the HEAD. Then, you choose to either
 262 check in your code (commit()) or revert (rollback()).
 263
 264 In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This has several
 265 benefits:
 266
 267 =over 4
 268
 269 =item * Easy identifiaction of a transaction
 270
 271 C<if ( $trans_id ) {}> will run the code if and only if we are in a running
 272 transaction.
 273
 274 =item * The HEAD is the first bucket
 275
 276 In a given bucket, the HEAD is the first datapointer because we mutliply the
 277 size of the transactional bookkeeping by the transaction ID to find the offset
 278 to seek into the file.
 279
 280 =back
 281
 282 =head2 Protection from changes
 283
 284 Let's assume that a transaction is running in one process and another process
 285 is also modifying the same area in the data. The only way that process B can
 286 notify process A that a change has occurred is through the common point - the
 287 DBM file. Because of this, every process that changes the HEAD needs to
 288 protect all currently running transactions by copying over the pointer to the
 289 original value into every transaction which hasn't already modified this
 290 entry. (If it has, then the new value shouldn't overwrite the transaction's
 291 modification.) This is the key piece for providing I<Isolation>.
 292
 293 =head2 Tracking modified buckets
 294
 295 Rolling back changes is very simple - just don't apply them to the HEAD. The
 296 next time that transaction ID is reused, the changes will be ignored (q.v.
 297 L</Staleness counters>). Committing, however, requires that all the changes
 298 must be transferred over from the bucket entry for the given transaction ID to
 299 the entry for the HEAD.
 300
 301 This tracking is done by the modified buckets themselves. They register
 302 themselves with the engine as having been modified within the transaction.
 303 Given L</Protection from changes>, this only guarantees that only those
 304 changes made by the transaction will be transferred over.
 305
 306 =head2 Deleted marker
 307
 308 Transactions are performed copy-on-write. This means that if there isn't an
 309 entry for that transaction, the HEAD is looked at. This doesn't work if a key
 310 has been deleted within a transaction. So, the entry must be marked as deleted
 311 within the transaction so that the HEAD isn't checekd.
 312
 313 Likewise, when a new key is created in a transaction, the HEAD doesn't have an
 314 entry for that key. Consider the following situation:
 315
 316   ok( !exists $db1->{foo} );
 317   ok( !exists $db2->{foo} );
 318
 319   $db1->begin_work();
 320   $db1->{foo} = 'bar';
 321
 322   ok( !exists $db2->{foo} );
 323
 324 The entry for the HEAD for 'foo' needs to be marked as deleted so that
 325 transactions which don't have 'foo' don't find something in the HEAD.
 326
 327 =head2 Freespace management
 328
 329 The second major piece to the 1.00 release was freespace management. In
 330 pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys would not be
 331 recycled. While always a requested feature, the complexity required to
 332 implement freespace meant that it needed to wait for a complete rewrite of
 333 several pieces, such as the engine.
 334
 335 Freespace is implemented by regularizing all the records so that L<DBM::Deep|DBM::Deep>
 336 only has three different record sizes - Index, BucketList, and Data. Each
 337 record type has a fixed length based on various parameters the L<DBM::Deep|DBM::Deep>
 338 datafile is created with. (In order to accomodate values of various sizes, Data
 339 records chain.) Whenever a sector is freed, it's added to a freelist of that
 340 sector's size. Whenever a new sector is requested, the freelist is checked
 341 first. If the freelist has a sector, it's reused, otherwise a new sector is
 342 added to the end of the datafile.
 343
 344 Freespace management did bring up another issue - staleness. It is possible to
 345 have a pointer to a record in memory. If that record is deleted, then reused,
 346 the pointer in memory has no way of determining that is was deleted and
 347 readded vs. modified. So, a staleness counter was added which is incremented
 348 every time the sector is reused through the freelist. If you then attempt to
 349 access that stale record, L<DBM::Deep|DBM::Deep> returns undef because, at some point,
 350 the entry was deleted.
 351
 352 =head2 Staleness counters
 353
 354 Once it was implemented for freespace management, staleness counters proved to
 355 be a very powerful concept for transactions themselves. Back in L</Protection
 356 from changes>, I mentioned that other processes modifying the HEAD will
 357 protect all running transactions from their effects. This provides
 358 I<Isolation>. But, the running transaction doesn't know about these entries.
 359 If they're not cleaned up, they will be seen the next time a transaction uses
 360 that transaction ID.
 361
 362 By providing a staleness counter for transactions, the costs of cleaning up
 363 finished transactions is deferred until the space is actually used again. This
 364 is at the cost of having less-than-optimal space utilization. Changing this in
 365 the future would be completely transparent to users, so I felt it was an
 366 acceptable tradeoff for quick delivery of a functional product.
 367
 368 =head1 The future
 369
 370 Basic transactions are only the first step. There are several features that
 371 can be added on top of what's been provided. (Note: these are advanced topics -
 372 I cannot be held responsible for any bleeding ears.)
 373
 374 =head2 Conflict resolution
 375
 376 Conflict, in this context, is what happens when two transactions run
 377 simultaneously and change the same piece of data. Like most databases,
 378 L<DBM::Deep|DBM::Deep> uses a very simplistic form of conflict resolution -
 379 last commit wins. This works quite well for a row-based RDBM, but doesn't work
 380 as well for fractal structures like hashes.
 381
 382 Contrast this with how Subversion handles conflict. It tracks when each
 383 transaction was started. If the HEAD is at a later version than when the
 384 transaction started, the commit is rejected. It is up to the developer to pull
 385 in the latest changes, mediate any conflicts, and then recommit.
 386
 387 =head2 Checkpoints
 388
 389 A transaction may have several steps within it. The first three may succeed,
 390 but the fourth might fail. Instead of rolling back all the way to the
 391 beginning, you might want to rollback to the last successful step and try
 392 again. This is a checkpoint. Most RDBMSes provide them, but we'll see if users
 393 request them for L<DBM::Deep|DBM::Deep>.
 394
 395 =head2 Sub-transactions
 396
 397 Similar to L</Checkpoints>, sub-transactions provide a mechanism for trying
 398 something within a transaction without affecting the transaction. However,
 399 instead of saying "These steps are safely finished," sub-transactions still
 400 provides for the ability to rollback the primary transaction all the way.
 401 Sub-transactions can also work better with libraries that may want to use
 402 transactions themselves.
 403
 404 =head2 Durability
 405
 406 As mentioned in L</What are transactions?>, the 'D' in ACID stands for
 407 I<Durable>. L<DBM::Deep|DBM::Deep> does not satisfy that criterion because
 408 durability almost always requires another file (or files) for a commit log. I
 409 deemed this unacceptable for this release because one of the
 410 L<DBM::Deep|DBM::Deep>'s features is the single datafile. However, Berkley DB
 411 provides durability with only a single file. Cribbing from them may be viable
 412 in the future.
 413
 414 =cut