r14864@rob-kinyons-computer: rob | 2007-01-18 23:05:43 -0500
[dbsrgits/DBM-Deep.git] / article.pod
CommitLineData
315b34eb 1=head0 Adding transactions to DBM::Deep
2
3=head1 What is DBM::Deep?
4
5L<DBM::Deep> is a module written completely in Perl that provides a way of
6storing Perl datastructures (scalars, hashes, and arrays) on disk instead of
7in memory. The datafile produced is able to be ftp'ed from one machine to
8another, regardless of OS or Perl version. There are several reasons why
9someone would want to do this.
10
11=over 4
12
13=item * Transparent Persistence
14
15This is the ability to save a set of data structures to disk and retrieve them
16later without the vast majority of the program even knowing that the data is
17persisted. Furthermore, the datastructure is persisted immediately and not at
18set marshalling periods.
19
20=item * Huge datastructures
21
22Normally, datastructures are limited by the size of RAM the server has.
23L<DBM::Deep> allows for the size a given datastructure to be limited by disk
24instead.
25
26=item * IPC
27
28While not a common use, this allows for inter-process communication without
29worrying about the specifics of how a given OS handles IPC.
30
31=back
32
33And, with the release of 1.00, there is now a fourth reason -
34software-transactional memory, or STM.
35
36=head1 What are transactions?
37
38Originally from the database world, a transaction is a way of isolating the
39effects of a given set of actions, then applying them all at once. It's a way
40of saying "I'm going to try the following steps, see if I like the result,
41then I want everyone else looking at this datastore to see the effects
42immediately." The most common example is taken from banking. Let's say that an
43application receives a request to have Joe pay Bob five zorkmids. Without
44transactions, the application would take the money from Joe's account, then
45add the money to Bob's account. But, what happens if the application crashes
46after debiting Joe, but before crediting Bob? The application has made money
47disappear. Or, vice versa, if Bob is credited before Joe is debited, the
48application has created moneyN<Yes, the US federal government does use
49transactions, so this isn't the cause.>.
50
51With a transaction wrapping the money transfer, if the application crashes in
52the middle, it's as if the action never happened. So, when the application
53recovers from the crash, Joe and Bob still have the same amount of money in
54their accounts as they did before and the transaction can restart and Bob can
55finally receive his zorkmids.
56
57=head1 Why add them to DBM::Deep?
58
59The ability to have actions occur in either I<atomically> (as in the previous
60example) or I<isolation> from the rest of the users of the data is a powerful
61one. This allows for a certain amount of safety and predictability in how data
62transformations occur. Imagine, for example, that you have a set of
63calculations that will update various variables. However, there are some
64situations that will cause you to throw away all results and start over with a
65different seed. Without transactions, you would have to put everything into
66temporary variables, then transfer the values when the calculations were found
67to be successful. With STM, you start a transaction and do your thing within
68it. If the calculations succeed, you commit. If they fail, you rollback and
d7866e36 69try again. If you're thinking that this is very similar to how SVN or CVS
70works, you're absolutely correct - SCMs are transactional in the exact same
71way.
315b34eb 72
73=head1 How it happened
74
d7866e36 75=head2 The backstory
76
315b34eb 77This project has easily been the single most complex software endeavor I've
d7866e36 78ever undertaken. The first step was to figure out exactly how transactions
79were going to work. After several spikesN<These are throwaway coding
80explorations.>, the best design seemed to be to make the keys mediate any
81transactional information. This means that the entire datafile is unaware of
82anything to do with transactions, except for the key's data structure.
83
bb67b124 84=head2 Test-driven development
85
86As transactions have so many edge cases that need to be tested for, it was
87imperative that I write my tests along with my development. I subscribe to
88TDD, or Test-Driven Development, particularly in my CPAN work. This means that
89I try to write a test exercising a piece of functionality I<before> writing
90the code providing that functionality. While I haven't always succeeded, doing
91this meant that I was able to catch a lot of regressions very quickly. You'll
92see this very quickly.
93
d7866e36 94=head2 DBM::Deep's file structure
95
96L<DBM::Deep>'s file structure is a record-based structure. The key (or array
97index - arrays are just funny hashes internally) is hashed using MD5 and then
98stored in a cascade of Index and Bucketlist records. The bucketlist record
bb67b124 99stores the actual key string and pointers to where the data records are
100stored. The data records themselves are one of Null, Scalar, or Reference.
101Null represents an I<undef>, Scalar represents a string (numbers are
102stringified for simplicity) and are allocated in 256byte chunks. Reference
103represents an array or hash reference and contains a pointer to an Index and
104Bucketlist cascade of its own.
105
106=head2 Transactions in the keys
107
108The first pass was to expand the Bucketlist sector to go from a simple key /
109datapointer mapping to a more complex key / transaction / datapointer mapping.
110Initially, I interposed a Transaction record that the bucketlist pointed to.
111That then contained the transaction / datapointer mapping. This had the
112advantage of changing nothing except for adding one new sector type and the
113handling for it. This was very quickly merged into the Bucketlist record to
114simplify the resulting code.
115
116This first step got me to the point where I could pass the following test:
117
118 my $db1 = DBM::Deep->new( $filename );
119 my $db2 = DBM::Deep->new( $filename );
120
121 $db1->{abc} = 'foo';
122
123 is( $db1->{abc}, 'foo' );
124 is( $db2->{abc}, 'foo' );
125
126 $db1->begin_work();
127
128 is( $db1->{abc}, 'foo' );
129 is( $db2->{abc}, 'foo' );
130
131 $db1->{abc} = 'floober';
132
133 is( $db1->{abc}, 'floober' );
134 is( $db2->{abc}, 'foo' );
135
136Just that much was a major accomplishment. The first pass only looked in the
137transaction's spot in the bucket for that key. And, that passed my first tests
138because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
139modifying it in the transaction. To pass that test, the code for retrieval
140needed to look first in the transaction's spot and if that spot had never been
141assigned to, look at the spot for the HEAD.
142
143=head2 The concept of the HEAD
144
145This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
146revision checked into the repository. When you do a ocal modification, you're
147doing a modification to the HEAD. Then, you choose to either check in your
148code (commit()) or revert (rollback()).
149
150In L<DBM::Deep>, I chose to make the HEAD transaction ID 0. This has several
151benefits:
d7866e36 152
153=over 4
154
bb67b124 155=item * Easy identifiaction of a transaction
156
157C<if ( $trans_id ) {}> will run the code iff we are in a running transaction.
158
159=item * The HEAD is the first bucket
d7866e36 160
bb67b124 161In a given bucket, the HEAD is the first datapointer because we mutliply the
162size of the transactional bookkeeping by the transaction ID to find the offset
163to seek into the file.
d7866e36 164
165=back
315b34eb 166
bb67b124 167=head2 Tracking modified buckets
168
169This gets commit
170
171=head2 Deleted marker
172
173This gets delete within the txn and assignment within the txn
174
175=head2 Freespace management
176
177This keeps the file from exploding in size, but it interacts with the deleted
178marker.
179
315b34eb 180=cut