r14213@rob-kinyons-computer (orig r8080): rkinyon | 2006-11-17 20:47:50 -0500
=head0 Adding transactions to DBM::Deep

=head1 What is DBM::Deep?

L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a way of
storing Perl data structures (scalars, hashes, and arrays) on disk instead of
in memory. The datafile produced can be copied from one machine to another
(over FTP, for instance), regardless of OS or Perl version. There are several
reasons why someone would want to do this.
=over 4

=item * Transparent Persistence

This is the ability to save a set of data structures to disk and retrieve them
later without the vast majority of the program even knowing that the data is
persisted. Furthermore, the data structures are persisted immediately, not at
set marshalling periods.

=item * Huge data structures

Normally, data structures are limited by the amount of RAM the server has.
L<DBM::Deep|DBM::Deep> allows the size of a given data structure to be limited by disk
space instead.

=item * IPC

While not a common use, this allows for inter-process communication without
worrying about the specifics of how a given OS handles IPC.

=back

And, with the release of 1.00, there is now a fourth reason -
software transactional memory, or STM
(L<http://en.wikipedia.org/wiki/Software_transactional_memory>).

=head1 What are transactions?

Originally from the database world, a transaction is a way of isolating the
effects of a given set of actions, then applying them all at once. It's a way
of saying "I'm going to try the following steps, see if I like the result,
and only then let everyone else looking at this datastore see the results,
all at once." The most common example is taken from banking. Let's say that an
application receives a request to have Joe pay Bob five zorkmids. Without
transactions, the application would take the money from Joe's account, then
add the money to Bob's account. But, what happens if the application crashes
after debiting Joe, but before crediting Bob? The application has made money
disappear. Or, vice versa, if Bob is credited before Joe is debited, the
application has created money.

With a transaction wrapping the money transfer, if the application crashes in
the middle, it's as if the action never happened. So, when the application
recovers from the crash, Joe and Bob still have the same amount of money in
their accounts as they did before, the transaction can restart, and Bob can
finally receive his zorkmids.
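The two failure modes above can be made concrete with a short sketch. This is plain Python, not DBM::Deep; the account names and amounts are invented for illustration. The transfer stages both sides of the exchange and publishes them in one step, so an interruption before the final step leaves both balances untouched.

```python
# Toy model of an atomic transfer: stage both changes on a scratch
# copy, then publish them together. A crash before the final update()
# leaves the original balances intact -- no money created or destroyed.

def transfer(accounts, frm, to, amount):
    staged = dict(accounts)   # work on a scratch copy
    staged[frm] -= amount     # debit Joe
    staged[to] += amount      # credit Bob
    accounts.update(staged)   # "commit": both changes land at once

accounts = {"joe": 10, "bob": 0}
transfer(accounts, "joe", "bob", 5)
```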

More formally, transactions are generally considered proper when they are
ACID-compliant. ACID is an acronym that stands for the following:

=over 4

=item * Atomic

Either every change happens or none of the changes happen.

=item * Consistent

When the transaction begins and when it is committed, the database must be in
a legal state. This restriction doesn't apply to L<DBM::Deep|DBM::Deep> very much.

=item * Isolated

As far as a transaction is concerned, it is the only thing running against the
database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep> provides the
strongest isolation level possible.

=item * Durable

Once the database says that a commit has happened, the commit is
guaranteed, regardless of what happens afterwards.

=back

=head1 Why add them to DBM::Deep?

The ability to have actions occur either I<atomically> (as in the previous
example) or in I<isolation> from the rest of the users of the data is a
powerful thing. This allows for a certain amount of safety and predictability
in how data transformations occur. Imagine, for example, that you have a set
of calculations that will update various variables. However, there are some
situations that will cause you to throw away all results and start over with a
different seed. Without transactions, you would have to put everything into
temporary variables, then transfer the values when the calculations were found
to be successful. With STM, you start a transaction and do your thing within
it. If the calculations succeed, you commit. If they fail, you rollback and
try again. If you're thinking that this is very similar to how SVN or CVS
works, you're absolutely correct - they are transactional in the exact same
way.
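The commit-or-retry flow just described can be sketched with a toy transactional wrapper around a plain dict. This is illustrative Python, not the DBM::Deep API; all names here are invented, though the method names echo the module's C<begin_work>/C<commit>/C<rollback> vocabulary.

```python
# A toy copy-on-write transaction: writes go into a pending overlay,
# commit() publishes the overlay, rollback() simply discards it.
class TxnDict:
    def __init__(self, data):
        self.data = data
        self.pending = None
    def begin_work(self):
        self.pending = {}
    def set(self, key, value):
        target = self.pending if self.pending is not None else self.data
        target[key] = value
    def commit(self):
        self.data.update(self.pending)
        self.pending = None
    def rollback(self):
        self.pending = None   # the results are thrown away

db = TxnDict({"seed": 1, "result": None})
db.begin_work()
db.set("result", 42)       # the "calculation"
db.rollback()              # didn't like it: nothing reached db.data
db.begin_work()
db.set("result", 99)       # try again with a different outcome
db.commit()                # liked it: the result is published
```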

=head1 How it happened

=head2 The backstory

The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the single most
complex software endeavor I've ever undertaken. The first step was to figure
out exactly how transactions were going to work. After several spikesN<These
are throwaway coding explorations.>, the best design seemed to be the one SVN
uses rather than the one relational databases use. The more I investigated,
the more I ran up against the object-relational impedance mismatch
N<http://en.wikipedia.org/wiki/Object-Relational_Impedance_Mismatch>, this
time in terms of being able to translate designs. In the relational world,
transactions are generally implemented either as row-level locks or using MVCC
N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. Both of
these assume that there is a I<row>, or singular object, that can be locked
transparently to everything else. This doesn't translate to a fractally
repeating structure like a hash or an array.

However, the design used by SVN deals with directories and files, which
correspond very closely to hashes and hash keys. In SVN, the modifications are
stored in the file's structure. Translating this to hashes and hash keys, this
means that transactional information should be stored in the keys. This means
that the entire datafile is unaware of anything to do with transactions, except
for the key's data structure within the bucket.

=head2 DBM::Deep's file structure

L<DBM::Deep|DBM::Deep>'s file structure is record-based. The key (or array
index - arrays are currently just funny hashes internally) is hashed using MD5
and then stored in a cascade of Index and BucketList records. The BucketList
record stores the actual key string and pointers to where the data records are
stored. The data records themselves are one of Null, Scalar, or Reference.
Null represents an I<undef>, Scalar represents a string (numbers are
stringified for simplicity) allocated in 256-byte chunks, and Reference
represents an array or hash reference and contains a pointer to an Index and
BucketList cascade of its own.
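A rough in-memory model of that layout may help. This Python sketch compresses the cascade to a single level and invents all names and offsets; it is not the real on-disk format.

```python
import hashlib

# Keys are MD5-hashed; successive digest bytes select entries in a
# cascade of Index records until a BucketList is reached, which stores
# the actual key string alongside a pointer to its data record.
index = {}   # one-level "cascade": first digest byte -> BucketList dict

def store(key, data_pointer):
    digest = hashlib.md5(key.encode()).digest()
    bucketlist = index.setdefault(digest[0], {})
    bucketlist[key] = data_pointer    # the full key disambiguates collisions

def fetch(key):
    digest = hashlib.md5(key.encode()).digest()
    return index.get(digest[0], {}).get(key)

store("abc", 4096)   # pretend the data record lives at offset 4096
```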

=head2 Transactions in the keys

The first pass was to expand the BucketList sector to go from a simple key /
datapointer mapping to a more complex key / transaction / datapointer mapping.
Initially, I interposed a Transaction record that the BucketList pointed to.
That then contained the transaction / datapointer mapping. This had the
advantage of changing nothing except for adding one new sector type and the
handling for it. This was very quickly merged into the BucketList record to
simplify the resulting code.

This first step got me to the point where I could pass the following test:

  my $db1 = DBM::Deep->new( $filename );
  my $db2 = DBM::Deep->new( $filename );

  $db1->{abc} = 'foo';

  is( $db1->{abc}, 'foo' );
  is( $db2->{abc}, 'foo' );

  $db1->begin_work();

  is( $db1->{abc}, 'foo' );
  is( $db2->{abc}, 'foo' );

  $db1->{abc} = 'floober';

  is( $db1->{abc}, 'floober' );
  is( $db2->{abc}, 'foo' );

Just that much was a major accomplishment. The first pass only looked in the
transaction's spot in the bucket for that key. And, that passed my first tests
because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
modifying it in the transaction. To pass that test, the code for retrieval
needed to look first in the transaction's spot and, if that spot had never
been assigned to, look at the spot for the HEAD.
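That retrieval rule can be sketched directly. The bucket layout here is a plain dict keyed by transaction ID, purely for illustration; the sentinel name is invented.

```python
# Look in the transaction's slot first; only if that slot was never
# assigned fall back to the HEAD's value.
HEAD = 0
UNASSIGNED = object()   # marker for "this txn never wrote the key"

def read(bucket, trans_id):
    value = bucket.get(trans_id, UNASSIGNED)
    if value is UNASSIGNED:
        return bucket[HEAD]   # fall back to the HEAD
    return value

bucket = {HEAD: "foo"}
before = read(bucket, 1)      # txn 1 hasn't written: sees the HEAD
bucket[1] = "floober"         # txn 1 writes into its own slot
after = read(bucket, 1)       # now txn 1 sees its own value
```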

=head2 The concept of the HEAD

This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
revision checked into the repository. When you do a local modification, you're
doing a modification to the HEAD. Then, you choose to either check in your
code (commit()) or revert (rollback()).

In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This has several
benefits:

=over 4

=item * Easy identification of a transaction

C<if ( $trans_id ) {}> will run the code if and only if we are in a running
transaction.

=item * The HEAD is the first bucket entry

In a given bucket, the HEAD is the first datapointer because we multiply the
size of the transactional bookkeeping by the transaction ID to find the offset
to seek into the file.

=back
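The offset rule in the second point is just a multiplication, which is what makes ID 0 land first. The entry size below is made up for illustration; the real size depends on the datafile's parameters.

```python
ENTRY_SIZE = 13   # hypothetical bytes of bookkeeping per transaction entry

def entry_offset(bucket_start, trans_id):
    # the HEAD (ID 0) contributes nothing to the product, so its
    # entry sits at the very start of the bucket
    return bucket_start + ENTRY_SIZE * trans_id
```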

=head2 Protection from changes

Let's assume that a transaction is running in one process and another process
is also modifying the same area in the data. The only way that process B can
notify process A that a change has occurred is through the common point - the
DBM file. Because of this, every process that changes the HEAD needs to
protect all currently running transactions by copying the pointer to the
original value into every transaction which hasn't already modified this
entry. (If it has, then the new value shouldn't overwrite the transaction's
modification.) This is the key piece for providing I<Isolation>.
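The copy-over rule can be written out as a short sketch (illustrative Python over a dict-shaped bucket, not the real file code):

```python
# Before overwriting the HEAD's pointer, copy the old pointer into
# every running transaction that has not already written its own value
# for this entry, so each transaction keeps seeing what it should.
HEAD = 0

def write_head(bucket, new_pointer, running_txns):
    old_pointer = bucket[HEAD]
    for tid in running_txns:
        if tid not in bucket:            # txn hasn't modified this entry
            bucket[tid] = old_pointer    # preserve the value it must see
    bucket[HEAD] = new_pointer

# txn 2 already wrote its own value; txn 1 has not
bucket = {HEAD: "v1", 2: "txn2-own-value"}
write_head(bucket, "v2", running_txns=[1, 2])
```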

=head2 Tracking modified buckets

Rolling back changes is very simple - just don't apply them to the HEAD. The
next time that transaction ID is reused, the changes will be ignored (q.v.
L</Staleness counters>). Committing, however, requires that all the changes
be transferred from the bucket entry for the given transaction ID to
the entry for the HEAD.
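In miniature, on a single bucket entry (again an illustrative dict, not the on-disk structure), that asymmetry looks like this:

```python
# Rollback leaves the HEAD alone -- the transaction's slot is simply
# forgotten. Commit copies the transaction's entry over the HEAD's.
HEAD = 0

def commit(bucket, trans_id):
    if trans_id in bucket:
        bucket[HEAD] = bucket.pop(trans_id)

def rollback(bucket, trans_id):
    bucket.pop(trans_id, None)   # the changes never reach the HEAD

committed = {HEAD: "foo", 1: "floober"}
commit(committed, 1)             # HEAD becomes "floober"
rolled_back = {HEAD: "foo", 1: "floober"}
rollback(rolled_back, 1)         # HEAD stays "foo"
```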

=head2 Deleted marker

Transactions are performed copy-on-write. This means that if there isn't an
entry for that transaction, the HEAD is looked at. This doesn't work if a key
has been deleted within a transaction. So, the entry must be marked as deleted
within the transaction so that the HEAD isn't checked.

Likewise, when a new key is created in a transaction, the HEAD doesn't have an
entry for that key. Consider the following situation:

  ok( !exists $db1->{foo} );
  ok( !exists $db2->{foo} );

  $db1->begin_work();
  $db1->{foo} = 'bar';

  ok( !exists $db2->{foo} );

The entry for the HEAD for 'foo' needs to be marked as deleted so that
transactions which don't have 'foo' don't find something in the HEAD.
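The first case, a delete inside a transaction, shows why the marker is necessary: without it, the copy-on-write fallback would "resurrect" the HEAD's value. In this sketch the C<DELETED> sentinel is an invented stand-in for the real marker.

```python
# A deletion must be recorded explicitly in the transaction's slot;
# otherwise the read would fall through to the HEAD and find the value.
HEAD = 0
DELETED = object()

def read(bucket, trans_id):
    slot = bucket.get(trans_id)
    if slot is None:              # never assigned: fall back to the HEAD
        slot = bucket.get(HEAD)
    return None if slot is DELETED else slot

bucket = {HEAD: "bar"}
bucket[1] = DELETED               # `delete $db->{key}` inside txn 1
in_txn = read(bucket, 1)          # txn 1 no longer sees the value
outside = read(bucket, HEAD)      # the HEAD still does
```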

=head2 Freespace management

The second major piece of the 1.00 release was freespace management. In
pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys would not be
recycled. While always a requested feature, the complexity required to
implement freespace management meant that it needed to wait for a complete
rewrite of several pieces, such as the one transactions required.

Freespace is implemented by regularizing all the records so that L<DBM::Deep|DBM::Deep>
only has three different record sizes - Index, BucketList, and Data. Each
record type has a fixed length based on various parameters the L<DBM::Deep|DBM::Deep>
datafile is created with. (In order to accommodate values of various sizes, Data
records chain.) Whenever a sector is freed, it's added to a freelist for that
sector's size. Whenever a new sector is requested, the freelist is checked
first. If the freelist has a sector, it's reused; otherwise, a new sector is
added to the end of the datafile.
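A minimal freelist of that kind is easy to sketch. Integer offsets stand in for positions in the datafile; the class and its names are invented for illustration.

```python
# Freed sectors are remembered per size class and handed out again
# before the "file" is grown.
class Allocator:
    def __init__(self):
        self.end_of_file = 0
        self.freelist = {}            # sector size -> [free offsets]
    def alloc(self, size):
        free = self.freelist.get(size)
        if free:
            return free.pop()         # reuse a freed sector first
        offset = self.end_of_file
        self.end_of_file += size      # otherwise grow the file
        return offset
    def free(self, offset, size):
        self.freelist.setdefault(size, []).append(offset)

a = Allocator()
first = a.alloc(256)    # fresh sector at the end of the file
a.free(first, 256)
second = a.alloc(256)   # comes back off the freelist, no growth
```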

Freespace management did bring up another issue - staleness. It is possible to
have a pointer to a record in memory. If that record is deleted, then reused,
the pointer in memory has no way of determining that it was deleted and
re-added rather than modified. So, a staleness counter was added which is
incremented every time the sector is reused through the freelist. If you then
attempt to access that stale record, L<DBM::Deep|DBM::Deep> returns undef because, at
some point, the entry was deleted.
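The staleness check itself is a single comparison. This sketch models a sector and a pointer in memory; the structure is invented, but the rule matches the description above: a counter mismatch means the record was deleted and the sector reused, so the access yields undef (C<None> here).

```python
# The sector's counter is bumped on every reuse through the freelist;
# a pointer remembers the counter it saw when it was taken.
class Sector:
    def __init__(self):
        self.staleness = 0
        self.value = None
    def reuse(self, value):           # allocation through the freelist
        self.staleness += 1
        self.value = value

def dereference(sector, seen_staleness):
    if sector.staleness != seen_staleness:
        return None                   # stale pointer: entry was deleted
    return sector.value

sector = Sector()
sector.reuse("first")
pointer = sector.staleness            # in-memory pointer taken here
sector.reuse("second")                # sector freed and reused meanwhile
stale = dereference(sector, pointer)
fresh = dereference(sector, sector.staleness)
```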

=head2 Staleness counters

Once it was implemented for freespace management, the staleness counter proved
to be a very powerful concept for transactions themselves. Back in
L</Protection from changes>, I mentioned that other processes modifying the
HEAD will protect all running transactions from their effects. This provides
I<Isolation>. But, the running transaction doesn't know about these entries.
If they're not cleaned up, they will be seen the next time a transaction uses
that transaction ID.

By providing a staleness counter for transactions, the cost of cleaning up
finished transactions is deferred until the space is actually used again. This
comes at the cost of less-than-optimal space utilization. Changing this in
the future would be completely transparent to users, so I felt it was an
acceptable tradeoff for delivering working code quickly.

=head1 Conclusion

Transactions fit into L<DBM::Deep|DBM::Deep>'s existing design with surprisingly few
concepts: per-transaction entries in each bucket, the HEAD as transaction ID 0,
deleted markers, and staleness counters. Together with freespace management,
these are what made the 1.00 release possible.

=cut