=head0 Adding transactions to DBM::Deep

=head1 What is DBM::Deep?

L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a
way of storing Perl datastructures (scalars, hashes, and arrays) on disk
instead of in memory. The resulting datafile can be copied (even FTPed) from
one machine to another, regardless of OS or Perl version. There are several
reasons why someone would want to do this.

=over 4

=item * Transparent Persistence

This is the ability to save a set of data structures to disk and retrieve them
later without the vast majority of the program even knowing that the data is
persisted. Furthermore, the datastructure is persisted immediately, not at set
marshalling periods.

=item * Huge datastructures

Normally, datastructures are limited by the size of the server's RAM.
L<DBM::Deep|DBM::Deep> allows the size of a given datastructure to be limited
by disk space instead.

=item * IPC

While not a common use, this allows for inter-process communication without
worrying about the specifics of how a given OS handles IPC.

=back

And, with the release of 1.00, there is now a fourth reason -
software-transactional memory, or STM
(L<http://en.wikipedia.org/wiki/Software_transactional_memory>).

=head1 How does DBM::Deep work?

L<DBM::Deep|DBM::Deep> works by tying a variable to a file on disk. Every
single read and write goes to the file and modifies the file immediately. To
represent Perl's hashes and arrays, a record-based file format is used. There
is a file header storing file-wide values, such as the size of the internal
file pointers. After that come the data records.

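Both public interfaces write through to the same file. A minimal example using
the module's documented OO and tie interfaces (the filename and key are
arbitrary):

    use DBM::Deep;

    # OO interface - the object can be used like a hash reference.
    my $db = DBM::Deep->new( "foo.db" );
    $db->{mykey} = 'myvalue';    # hits the disk immediately

    # tie interface - same file format underneath.
    tie my %hash, 'DBM::Deep', 'foo.db';
    print $hash{mykey};          # 'myvalue'
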
=head2 DBM::Deep's file structure

L<DBM::Deep|DBM::Deep>'s file structure is record-based. The key (or array
index - arrays are currently just funny hashes internally) is hashed using MD5
and then stored in a cascade of Index and Bucketlist records. The bucketlist
record stores the actual key string and pointers to where the data records are
stored. The data records themselves are one of Null, Scalar, or Reference.
Null represents an I<undef>, Scalar represents a string (numbers are
stringified for simplicity) and is allocated in 256-byte chunks, and Reference
represents an array or hash reference and contains a pointer to an Index and
Bucketlist cascade of its own.

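The front of that cascade can be sketched with Digest::MD5, which is the
hashing the module uses; the three-level fan-out below is purely illustrative,
not the exact on-disk logic:

    use Digest::MD5 qw( md5 );

    # Every key is reduced to a fixed 16-byte MD5 digest.
    my $digest = md5( 'some_key' );    # length($digest) == 16

    # Conceptually, each Index level fans out on one byte of the digest
    # until a Bucketlist is reached, which holds the real key string and
    # its datapointers.
    my @index_slots = map { ord substr( $digest, $_, 1 ) } 0 .. 2;
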
=head2 DBM::Deep's class hierarchy

Managing all of these functions takes a lot of different abstractions. There
are at least 3 different interfacing layers, and more if you look hard enough.
To manage this complexity, L<DBM::Deep|DBM::Deep> uses the following
abstractions:

=over 4

=item * Tying classes

These are the classes that manage the external face of the module. They manage
B<all> of the interactions with outside code - OO interface, tying, and
various utility methods. If they cannot handle the request themselves, they
delegate to the engine. There are currently three classes in this layer.

=item * Engine classes

These classes manage the file format and all of the ways that the records
interact with each other. Nearly every call will make requests to the File
class for reading and/or writing data to the file. There are currently nine
classes in this layer.

=item * File class

This class mediates all interaction with the file system. Every read, write,
lock, and unlock goes through this class. There is currently one class in this
layer.

=item * Iterator classes

These are introspection classes that provide iteration for hashes. They manage
keeping track of where the next key should be and how to get there. There are
currently four classes in this layer.

=back

=head1 What are transactions?

Originally from the database world, a transaction is a way of isolating the
effects of a given set of actions, then applying them all at once. It's a way
of saying "I'm going to try the following steps, see if I like the result,
and, if I do, I want everyone else looking at this datastore to see the
results immediately." The most common example is taken from banking. Let's say
that an application receives a request to have Joe pay Bob five zorkmids.
Without transactions, the application would take the money from Joe's account,
then add the money to Bob's account. But, what happens if the application
crashes after debiting Joe, but before crediting Bob? The application has made
money disappear. Or, vice versa, if Bob is credited before Joe is debited, the
application has created money.

With a transaction wrapping the money transfer, if the application crashes in
the middle, it's as if the action never happened. So, when the application
recovers from the crash, Joe and Bob still have the same amount of money in
their accounts as they did before, the transaction can restart, and Bob can
finally receive his zorkmids.

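With L<DBM::Deep|DBM::Deep>'s own transaction methods - begin_work(),
commit(), and rollback() - the transfer looks like this (the account keys and
amounts are illustrative):

    use DBM::Deep;

    my $db = DBM::Deep->new( "bank.db" );
    $db->{joe} = 100;
    $db->{bob} = 0;

    $db->begin_work();

    # Neither change is visible to any other handle on bank.db yet.
    $db->{joe} -= 5;
    $db->{bob} += 5;

    # Both changes appear at once; a crash before this line would have
    # left the HEAD untouched.
    $db->commit();
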
More formally, transactions are generally considered to be proper when they
are ACID-compliant. ACID is an acronym that means the following:

=over 4

=item * Atomic

Either every change happens or none of the changes happen.

=item * Consistent

When the transaction begins and when it is committed, the database must be in
a legal state. This restriction doesn't apply to L<DBM::Deep|DBM::Deep> very
much.

=item * Isolated

As far as a transaction is concerned, it is the only thing running against the
database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep>
provides the strongest isolation level possible.

=item * Durable

Once the database says that a commit has happened, the commit is guaranteed,
regardless of whatever happens afterwards.

=back

=head1 Why add them to DBM::Deep?

The ability to have actions occur either I<atomically> (as in the previous
example) or in I<isolation> from the rest of the users of the data is a
powerful thing. This allows for a certain amount of safety and predictability
in how data transformations occur. Imagine, for example, that you have a set
of calculations that will update various variables. However, there are some
situations that will cause you to throw away all results and start over with a
different seed. Without transactions, you would have to put everything into
temporary variables, then transfer the values when the calculations were found
to be successful. With STM, you start a transaction and do your thing within
it. If the calculations succeed, you commit. If they fail, you rollback and
try again. If you're thinking that this is very similar to how SVN or CVS
works, you're absolutely correct - they are transactional in the exact same
way.

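That retry loop can be sketched with the same API (run_calculations() is a
hypothetical stand-in for the real work):

    my $db = DBM::Deep->new( "results.db" );

    while (1) {
        $db->begin_work();

        # Hypothetical: updates various entries in $db and returns
        # true if the results are worth keeping.
        if ( run_calculations( $db ) ) {
            $db->commit();     # keep everything
            last;
        }

        $db->rollback();       # throw it all away, try a new seed
    }
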
=head1 How it happened

=head2 The backstory

The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the
single most complex software endeavor I've ever undertaken. The first step was
to figure out exactly how transactions were going to work. After several
spikesN<These are throwaway coding explorations.>, the best design seemed to
look to SVN instead of relational databases. The more I investigated, the more
I ran up against the object-relational impedance mismatch
N<http://en.wikipedia.org/wiki/Object-Relational_Impedance_Mismatch>, this
time in terms of being able to translate designs. In the relational world,
transactions are generally implemented either as row-level locks or using MVCC
N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. Both of
these assume that there is a I<row>, or singular object, that can be locked
transparently to everything else. This doesn't translate to a fractally
repeating structure like a hash or an array.

However, the design used by SVN deals with directories and files, which
correspond very closely to hashes and hashkeys. In SVN, the modifications are
stored in the file's structure. Translating this to hashes and hashkeys, this
means that transactional information should be stored in the keys. This means
that the entire datafile is unaware of anything to do with transactions,
except for the key's data structure within the bucket.

=head2 Transactions in the keys

The first pass was to expand the Bucketlist sector to go from a simple key /
datapointer mapping to a more complex key / transaction / datapointer mapping.
Initially, I interposed a Transaction record that the bucketlist pointed to.
That then contained the transaction / datapointer mapping. This had the
advantage of changing nothing except for adding one new sector type and the
handling for it. This was very quickly merged into the Bucketlist record to
simplify the resulting code.

This first step got me to the point where I could pass the following test:

    my $db1 = DBM::Deep->new( $filename );
    my $db2 = DBM::Deep->new( $filename );

    $db1->{abc} = 'foo';

    is( $db1->{abc}, 'foo' );
    is( $db2->{abc}, 'foo' );

    $db1->begin_work();

    is( $db1->{abc}, 'foo' );
    is( $db2->{abc}, 'foo' );

    $db1->{abc} = 'floober';

    is( $db1->{abc}, 'floober' );
    is( $db2->{abc}, 'foo' );

Just that much was a major accomplishment. The first pass only looked in the
transaction's spot in the bucket for that key. And, that passed my first tests
because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
modifying it in the transaction. To pass that test, the code for retrieval
needed to look first in the transaction's spot and, if that spot had never
been assigned to, look at the spot for the HEAD.

=head2 The concept of the HEAD

This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
revision checked into the repository. When you make a local modification,
you're modifying the HEAD. Then, you choose to either check in your code
(commit()) or revert (rollback()).

In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This
has several benefits:

=over 4

=item * Easy identification of a transaction

C<if ( $trans_id ) {}> will run the code if and only if we are in a running
transaction.

=item * The HEAD is the first bucket

In a given bucket, the HEAD is the first datapointer because we multiply the
size of the transactional bookkeeping by the transaction ID to find the offset
to seek into the file.

=back

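That seek arithmetic can be sketched as follows; the entry sizes here are
illustrative, not the actual on-disk values:

    # Illustrative sizes, not DBM::Deep's real ones.
    my $data_ptr_size  = 4;    # bytes for the datapointer
    my $staleness_size = 2;    # bytes for the staleness counter
    my $entry_size     = $data_ptr_size + $staleness_size;

    # Offset of a transaction's entry within the bucket's entry area.
    sub entry_offset {
        my ($trans_id) = @_;
        return $trans_id * $entry_size;
    }

    entry_offset(0);    # 0 - the HEAD always lands first
    entry_offset(3);    # 18
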
=head2 Protection from changes

Let's assume that a transaction is running in one process and another process
is also modifying the same area in the data. The only way that process B can
notify process A that a change has occurred is through the common point - the
DBM file. Because of this, every process that changes the HEAD needs to
protect all currently running transactions by copying the pointer to the
original value into every transaction which hasn't already modified this
entry. (If it has, then the new value shouldn't overwrite the transaction's
modification.) This is the key piece for providing I<Isolation>.

=head2 Tracking modified buckets

Rolling back changes is very simple - just don't apply them to the HEAD. The
next time that transaction ID is reused, the changes will be ignored (q.v.
L</Staleness counters>). Committing, however, requires that all the changes
be transferred from the bucket entry for the given transaction ID to the
entry for the HEAD.

=head2 Deleted marker

Transactions are performed copy-on-write. This means that if there isn't an
entry for that transaction, the HEAD is looked at. This doesn't work if a key
has been deleted within a transaction. So, the entry must be marked as deleted
within the transaction so that the HEAD isn't checked.

Likewise, when a new key is created in a transaction, the HEAD doesn't have an
entry for that key. Consider the following situation:

    ok( !exists $db1->{foo} );
    ok( !exists $db2->{foo} );

    $db1->begin_work();
    $db1->{foo} = 'bar';

    ok( !exists $db2->{foo} );

The entry for the HEAD for 'foo' needs to be marked as deleted so that
transactions which don't have 'foo' don't find something in the HEAD.

=head2 Freespace management

The second major piece of the 1.00 release was freespace management. In
pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys
would not be recycled. While always a requested feature, the complexity
required to implement freespace meant that it needed to wait for a complete
rewrite of several pieces, such as for transactions.

Freespace is implemented by regularizing all the records so that
L<DBM::Deep|DBM::Deep> only has three different record sizes - Index,
BucketList, and Data. Each record type has a fixed length based on various
parameters the L<DBM::Deep|DBM::Deep> datafile is created with. (In order to
accommodate values of various sizes, Data records chain.) Whenever a sector is
freed, it's added to a freelist of that sector's size. Whenever a new sector
is requested, the freelist is checked first. If the freelist has a sector,
it's reused; otherwise a new sector is added to the end of the datafile.

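A minimal in-memory sketch of that allocation policy (the real code, of
course, tracks offsets within the datafile itself):

    my %freelist;          # sector size => stack of freed offsets
    my $end_of_file = 0;

    sub free_sector {
        my ($offset, $size) = @_;
        push @{ $freelist{$size} }, $offset;
    }

    sub alloc_sector {
        my ($size) = @_;

        # Reuse a freed sector of this exact size if one is available ...
        if ( $freelist{$size} && @{ $freelist{$size} } ) {
            return pop @{ $freelist{$size} };
        }

        # ... otherwise grow the file.
        my $offset = $end_of_file;
        $end_of_file += $size;
        return $offset;
    }
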
Freespace management did bring up another issue - staleness. It is possible to
have a pointer to a record in memory. If that record is deleted, then reused,
the pointer in memory has no way of determining that it was deleted and
readded vs. modified. So, a staleness counter was added which is incremented
every time the sector is reused through the freelist. If you then attempt to
access that stale record, L<DBM::Deep|DBM::Deep> returns undef because, at
some point, the entry was deleted.

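The staleness check itself is easy to sketch; the field names here are
illustrative, not the module's actual internals:

    # A sector, with its staleness counter.
    my %sector = ( staleness => 0, value => 'foo' );

    # An in-memory pointer remembers the counter it saw when taken.
    my %ptr = ( sector => \%sector, staleness => $sector{staleness} );

    # Freeing and reusing the sector increments the counter.
    $sector{staleness}++;
    $sector{value} = 'something else entirely';

    sub deref {
        my ($p) = @_;

        # A mismatch means the sector was deleted and reused, so the
        # old pointer must act as if the entry no longer exists.
        return undef if $p->{staleness} != $p->{sector}{staleness};
        return $p->{sector}{value};
    }

    deref( \%ptr );    # undef
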
=head2 Staleness counters

Once it was implemented for freespace management, staleness counters proved to
be a very powerful concept for transactions themselves. Back in L</Protection
from changes>, I mentioned that other processes modifying the HEAD will
protect all running transactions from their effects. This provides
I<Isolation>. But, the running transaction doesn't know about these entries.
If they're not cleaned up, they will be seen the next time a transaction uses
that transaction ID.

By providing a staleness counter for transactions, the cost of cleaning up
finished transactions is deferred until the space is actually used again. This
comes at the cost of less-than-optimal space utilization. Changing this in the
future would be completely transparent to users, so I felt it was an
acceptable tradeoff for delivering working code quickly.

=head1 Conclusion

=cut