From: rkinyon
Date: Fri, 19 Jan 2007 06:40:24 +0000 (+0000)
Subject: r14864@rob-kinyons-computer: rob | 2007-01-18 23:05:43 -0500
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=bb67b1249b0db37b1b67e12c74690ab7647451d3;p=dbsrgits%2FDBM-Deep.git

r14864@rob-kinyons-computer: rob | 2007-01-18 23:05:43 -0500
More article stuff
---

diff --git a/article.pod b/article.pod
index 314b1be..46eafb7 100644
--- a/article.pod
+++ b/article.pod
@@ -81,21 +81,100 @@ explorations.>, the best design seemed to be to make the keys mediate any
 transactional information. This means that the entire datafile is unaware of
 anything to do with transactions, except for the key's data structure.
 
+=head2 Test-driven development
+
+As transactions have so many edge cases that need to be tested for, it was
+imperative that I write my tests along with my development. I subscribe to
+TDD, or Test-Driven Development, particularly in my CPAN work. This means that
+I try to write a test exercising a piece of functionality I<before> writing
+the code providing that functionality. While I haven't always succeeded, doing
+this meant that I was able to catch a lot of regressions very quickly. You'll
+see this shortly.
+
 =head2 DBM::Deep's file structure
 
 L<DBM::Deep>'s file structure is a record-based structure. The key (or array
 index - arrays are just funny hashes internally) is hashed using MD5 and then
 stored in a cascade of Index and Bucketlist records. The bucketlist record
-stores the actual key string and references to where the data records are
-stored. Transactions are handled by associating different data record
-references to the appropriate transaction ID. This allows two things:
+stores the actual key string and pointers to where the data records are
+stored. The data records themselves are one of Null, Scalar, or Reference.
+Null represents an I<undef>, Scalar represents a string (numbers are
+stringified for simplicity) and is allocated in 256-byte chunks.
+Reference represents an array or hash reference and contains a pointer to an
+Index and Bucketlist cascade of its own.
+
+=head2 Transactions in the keys
+
+The first pass was to expand the Bucketlist sector to go from a simple key /
+datapointer mapping to a more complex key / transaction / datapointer mapping.
+Initially, I interposed a Transaction record that the bucketlist pointed to.
+That then contained the transaction / datapointer mapping. This had the
+advantage of changing nothing except for adding one new sector type and the
+handling for it. This was very quickly merged into the Bucketlist record to
+simplify the resulting code.
+
+This first step got me to the point where I could pass the following test:
+
+    my $db1 = DBM::Deep->new( $filename );
+    my $db2 = DBM::Deep->new( $filename );
+
+    $db1->{abc} = 'foo';
+
+    is( $db1->{abc}, 'foo' );
+    is( $db2->{abc}, 'foo' );
+
+    $db1->begin_work();
+
+    is( $db1->{abc}, 'foo' );
+    is( $db2->{abc}, 'foo' );
+
+    $db1->{abc} = 'floober';
+
+    is( $db1->{abc}, 'floober' );
+    is( $db2->{abc}, 'foo' );
+
+Just that much was a major accomplishment. The first pass only looked in the
+transaction's spot in the bucket for that key. And, that passed my first tests
+because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
+modifying it in the transaction. To pass that test, the code for retrieval
+needed to look first in the transaction's spot and if that spot had never been
+assigned to, look at the spot for the HEAD.
+
+=head2 The concept of the HEAD
+
+This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
+revision checked into the repository. When you do a local modification, you're
+doing a modification to the HEAD. Then, you choose to either check in your
+code (commit()) or revert (rollback()).
+
+In L<DBM::Deep>, I chose to make the HEAD transaction ID 0.
+This has several benefits:
 
 =over 4
 
-=item 1 Simplicity
+=item * Easy identification of a transaction
+
+C will run the code iff we are in a running transaction.
+
+=item * The HEAD is the first bucket
 
-=item 1 Ease of portability
+In a given bucket, the HEAD is the first datapointer because we multiply the
+size of the transactional bookkeeping by the transaction ID to find the offset
+to seek into the file.
 
 =back
 
+=head2 Tracking modified buckets
+
+This gets commit
+
+=head2 Deleted marker
+
+This gets delete within the txn and assignment within the txn
+
+=head2 Freespace management
+
+This keeps the file from exploding in size, but it interacts with the deleted
+marker.
+
 =cut
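
The two mechanisms the added text describes, the retrieval fallback to the HEAD and the slot offset computed from the transaction ID, can be sketched in plain Perl. This is only an illustration under stated assumptions, not code from DBM::Deep: the names C<$HEAD_ID>, C<$SLOT_SIZE>, C<slot_offset>, C<store> and C<fetch> are hypothetical, the slot size of 8 bytes is made up, and an in-memory hash stands in for the on-disk Bucketlist record.

```perl
use strict;
use warnings;

# Assumptions for illustration only; the article gives no real sizes.
my $HEAD_ID   = 0;    # the article fixes the HEAD at transaction ID 0
my $SLOT_SIZE = 8;    # hypothetical bytes of bookkeeping per transaction

# Offset of a transaction's slot within one bucket entry.
# Because the HEAD is ID 0, its slot is always at offset 0.
sub slot_offset {
    my ($trans_id) = @_;
    return $trans_id * $SLOT_SIZE;
}

# Record a value for a key under a given transaction ID.
sub store {
    my ($bucket, $key, $trans_id, $value) = @_;
    $bucket->{$key}{$trans_id} = $value;
    return $value;
}

# Look in the transaction's own slot first; if that slot was never
# assigned to, fall back to the HEAD's slot.
sub fetch {
    my ($bucket, $key, $trans_id) = @_;
    my $slots = $bucket->{$key};
    return undef unless defined $slots;
    return $slots->{$trans_id} if exists $slots->{$trans_id};
    return $slots->{$HEAD_ID};
}

my %bucket;
store( \%bucket, 'abc', $HEAD_ID, 'foo' );    # written outside any transaction
store( \%bucket, 'abc', 1, 'floober' );       # written inside transaction 1

print fetch( \%bucket, 'abc', 1 ), "\n";           # prints floober
print fetch( \%bucket, 'abc', $HEAD_ID ), "\n";    # prints foo
print fetch( \%bucket, 'abc', 2 ), "\n";           # prints foo (falls back to HEAD)
print slot_offset(2), "\n";                        # prints 16
```

Keeping the HEAD at ID 0 is what lets a single multiplication locate any slot, and it is also why a plain truth test on the transaction ID is enough to tell whether a transaction is running.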