=head0 Adding transactions to DBM::Deep

=head1 What is DBM::Deep?

L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a
way of storing Perl datastructures (scalars, hashes, and arrays) on disk
instead of in memory. The datafile produced can be FTPed from one machine to
another, regardless of OS or Perl version. There are several reasons why
someone would want to do this.

=over 4

=item * Transparent Persistence

This is the ability to save a set of data structures to disk and retrieve them
later without the vast majority of the program even knowing that the data is
persisted. Furthermore, the datastructure is persisted immediately, not at set
marshalling periods.

=item * Huge datastructures

Normally, datastructures are limited by the size of RAM the server has.
L<DBM::Deep|DBM::Deep> allows the size of a given datastructure to be limited
by disk space instead.

=item * IPC

While not a common use, this allows for inter-process communication without
worrying about the specifics of how a given OS handles IPC.

=back

And, with the release of 1.00, there is now a fourth reason -
software-transactional memory, or STM
(L<http://en.wikipedia.org/wiki/Software_transactional_memory>).

=head1 How does DBM::Deep work?

L<DBM::Deep|DBM::Deep> works by tying a variable to a file on disk. Every
single read and write goes to the file and modifies the file immediately. To
represent Perl's hashes and arrays, a record-based file format is used. There
is a file header storing file-wide values, such as the size of the internal
file pointers. After that come the data records.

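In practice, that tying looks like the following sketch. It uses the module's
documented C<tie> interface; the filename and keys are purely illustrative.

```perl
use strict;
use warnings;
use DBM::Deep;

# Tie a hash to a file on disk; every read and write below goes
# straight to the file. (The filename is illustrative.)
tie my %db, 'DBM::Deep', 'datafile.db';

# Writes are persisted immediately - there is no explicit save step.
$db{name}     = 'Joe';
$db{accounts} = { checking => 100, savings => 250 };

# Reads come back from the file, even across processes or restarts.
print $db{accounts}{checking}, "\n";
```

The same file can be opened with the OO interface,
C<< my $db = DBM::Deep->new( 'datafile.db' ) >>, which is what the later
examples in this article use.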
=head2 DBM::Deep's file structure

L<DBM::Deep|DBM::Deep>'s file structure is a record-based structure. The key
(or array index - arrays are currently just funny hashes internally) is hashed
using MD5 and then stored in a cascade of Index and Bucketlist records. The
bucketlist record stores the actual key string and pointers to where the data
records are stored. The data records themselves are one of Null, Scalar, or
Reference. Null represents an I<undef>. Scalar represents a string (numbers
are stringified for simplicity) and is allocated in 256-byte chunks. Reference
represents an array or hash reference and contains a pointer to an Index and
Bucketlist cascade of its own.

=head2 DBM::Deep's class hierarchy

Managing all of these functions takes a lot of different abstractions. There
are at least 3 different interfacing layers, and more if you look hard enough.
To manage this complexity, L<DBM::Deep|DBM::Deep> uses the following
abstractions:

=over 4

=item * Tying classes

These are the classes that manage the external face of the module. They manage
B<all> of the interactions with outside code - OO interface, tying, and
various utility methods. If they cannot handle the request themselves, they
delegate to the engine. There are currently three classes in this layer.

=item * Engine classes

These classes manage the file format and all of the ways that the records
interact with each other. Nearly every call will make requests to the File
class for reading and/or writing data to the file. There are currently nine
classes in this layer.

=item * File class

This class mediates all interaction with the file system. Every read, write,
lock, and unlock goes through this class. There is currently one class in this
layer.

=item * Iterator classes

These are introspection classes that provide iteration for hashes. They manage
keeping track of where the next key should be and how to get there. There are
currently four classes in this layer.

=back

=head1 What are transactions?

Originally from the database world, a transaction is a way of isolating the
effects of a given set of actions, then applying them all at once. It's a way
of saying "I'm going to try the following steps, see if I like the result,
and, if I do, I want everyone else looking at this datastore to see all the
results at once." The most common example is taken from banking. Let's say
that an application receives a request to have Joe pay Bob five zorkmids.
Without transactions, the application would take the money from Joe's account,
then add the money to Bob's account. But, what happens if the application
crashes after debiting Joe, but before crediting Bob? The application has made
money disappear. Or, vice versa, if Bob is credited before Joe is debited, the
application has created money.

With a transaction wrapping the money transfer, if the application crashes in
the middle, it's as if the action never happened. So, when the application
recovers from the crash, Joe and Bob still have the same amount of money in
their accounts as they did before, the transaction can restart, and Bob can
finally receive his zorkmids.

More formally, transactions are generally considered to be proper when they
are ACID-compliant. ACID is an acronym that stands for the following:

=over 4

=item * Atomic

Either every change happens or none of the changes happen.

=item * Consistent

When the transaction begins and when it is committed, the database must be in
a legal state. This restriction doesn't apply to L<DBM::Deep|DBM::Deep> very
much.

=item * Isolated

As far as a transaction is concerned, it is the only thing running against the
database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep>
provides the strongest isolation level possible.

=item * Durable

Once the database says that a commit has happened, the commit is guaranteed,
regardless of what happens afterwards.

=back

=head1 Why add them to DBM::Deep?

The ability to have actions occur either I<atomically> (as in the previous
example) or in I<isolation> from the rest of the users of the data is a
powerful thing. This allows for a certain amount of safety and predictability
in how data transformations occur. Imagine, for example, that you have a set
of calculations that will update various variables. However, there are some
situations that will cause you to throw away all results and start over with a
different seed. Without transactions, you would have to put everything into
temporary variables, then transfer the values when the calculations were found
to be successful. With STM, you start a transaction and do your thing within
it. If the calculations succeed, you commit. If they fail, you rollback and
try again. If you're thinking that this is very similar to how SVN or CVS
works, you're absolutely correct - they are transactional in the exact same
way.

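That workflow can be sketched with L<DBM::Deep|DBM::Deep>'s transaction
methods (C<begin_work>, C<commit>, and C<rollback>). The calculation and its
acceptance check are placeholder subs, and the filename is illustrative.

```perl
use strict;
use warnings;
use DBM::Deep;

# Stand-ins for the real calculation and its acceptance check.
sub compute_increment { 1 }
sub results_look_good { 1 }

my $db = DBM::Deep->new( 'results.db' );    # illustrative filename

$db->begin_work();    # everything below is isolated from other handles

# A calculation that updates various values in place - no temporary
# variables needed, because nothing is published yet.
$db->{total} = ( $db->{total} || 0 ) + compute_increment();
$db->{runs}  = ( $db->{runs}  || 0 ) + 1;

if ( results_look_good($db) ) {
    $db->commit();      # publish all the changes at once
}
else {
    $db->rollback();    # as if the calculation never happened
}
```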
=head1 How it happened

=head2 The backstory

The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the
single most complex software endeavor I've ever undertaken. The first step was
to figure out exactly how transactions were going to work. After several
spikesN<These are throwaway coding explorations.>, the best design seemed to
look to SVN instead of relational databases. The more I investigated, the more
I ran up against the object-relational impedance mismatch
N<http://en.wikipedia.org/wiki/Object-Relational_Impedance_Mismatch>, this
time in terms of being able to translate designs. In the relational world,
transactions are generally implemented either as row-level locks or using MVCC
N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. Both of
these assume that there is a I<row>, or singular object, that can be locked
transparently to everything else. This doesn't translate to a fractally
repeating structure like a hash or an array.

However, the design used by SVN deals with directories and files, which
correspond very closely to hashes and hashkeys. In SVN, modifications are
stored in the file's structure. Translating this to hashes and hashkeys, this
means that transactional information should be stored in the keys. This means
that the entire datafile is unaware of anything to do with transactions,
except for the key's data structure within the bucket.

=head2 Transactions in the keys

The first pass was to expand the Bucketlist sector to go from a simple key /
datapointer mapping to a more complex key / transaction / datapointer mapping.
Initially, I interposed a Transaction record that the bucketlist pointed to.
That then contained the transaction / datapointer mapping. This had the
advantage of changing nothing except for adding one new sector type and the
handling for it. This was very quickly merged into the Bucketlist record to
simplify the resulting code.

This first step got me to the point where I could pass the following test:

    my $db1 = DBM::Deep->new( $filename );
    my $db2 = DBM::Deep->new( $filename );

    $db1->{abc} = 'foo';

    is( $db1->{abc}, 'foo' );
    is( $db2->{abc}, 'foo' );

    $db1->begin_work();

    is( $db1->{abc}, 'foo' );
    is( $db2->{abc}, 'foo' );

    $db1->{abc} = 'floober';

    is( $db1->{abc}, 'floober' );
    is( $db2->{abc}, 'foo' );

Just that much was a major accomplishment. The first pass only looked in the
transaction's spot in the bucket for that key. That passed my first tests
because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
modifying it in the transaction. To pass that test, the code for retrieval
needed to look first in the transaction's spot and, if that spot had never
been assigned to, look at the spot for the HEAD.

=head2 The concept of the HEAD

This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
revision checked into the repository. When you make a local modification,
you're modifying the HEAD. Then, you choose to either check in your code
(commit()) or revert (rollback()).

In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This has
several benefits:

=over 4

=item * Easy identification of a transaction

C<if ( $trans_id ) {}> will run the code if and only if we are in a running
transaction.

=item * The HEAD is the first bucket

In a given bucket, the HEAD is the first datapointer because we multiply the
size of the transactional bookkeeping by the transaction ID to find the offset
to seek into the file.

=back

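The second benefit is just seek arithmetic: with transaction ID 0, the HEAD
always lands at the start of the bucket's entries. A minimal sketch, with
made-up sizes (the real entry size depends on the file's parameters):

```perl
use strict;
use warnings;

# Hypothetical layout: each per-transaction entry in a bucket holds a
# datapointer plus its transactional bookkeeping.
my $entry_size   = 12;      # bytes per transaction entry (illustrative)
my $bucket_start = 4096;    # file offset of the bucket (illustrative)

sub entry_offset {
    my ($trans_id) = @_;

    # HEAD is transaction ID 0, so it is always the first entry.
    return $bucket_start + $trans_id * $entry_size;
}

print entry_offset(0), "\n";    # the HEAD: 4096
print entry_offset(3), "\n";    # transaction 3: 4132
```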
=head2 Protection from changes

Let's assume that a transaction is running in one process and another process
is also modifying the same area of the data. The only way that process B can
notify process A that a change has occurred is through their common point of
contact - the DBM file. Because of this, every process that changes the HEAD
needs to protect all currently running transactions by copying the pointer to
the original value into every transaction which hasn't already modified this
entry. (If it has, then the new value shouldn't overwrite the transaction's
modification.) This is the key piece for providing I<Isolation>.

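A sketch of that protection step. The per-key array of datapointers (slot 0
is the HEAD, slots 1..N are transactions, C<undef> means "never touched") is
an invented in-memory stand-in for what the real code does against the file.

```perl
use strict;
use warnings;

# One bucket entry: slot 0 is the HEAD, slots 1..3 are transactions.
my @entry   = ( 'old_ptr', undef, 'txn2_ptr', undef );
my @running = ( 1, 3 );    # currently running transaction IDs

sub write_head {
    my ( $entry, $running, $new_ptr ) = @_;
    for my $trans_id (@$running) {
        # Copy the old HEAD pointer into every running transaction that
        # has not already modified this entry; an existing modification
        # must not be overwritten.
        $entry->[$trans_id] //= $entry->[0];
    }
    $entry->[0] = $new_ptr;
}

write_head( \@entry, \@running, 'new_ptr' );
# Transaction 1 keeps seeing 'old_ptr', transaction 2 keeps its own
# modification, and the HEAD now points at 'new_ptr'.
```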
=head2 Tracking modified buckets

Rolling back changes is very simple - just don't apply them to the HEAD. The
next time that transaction ID is reused, the changes will be ignored (q.v.
L</Staleness counters>). Committing, however, requires that all the changes be
transferred from the bucket entry for the given transaction ID to the entry
for the HEAD.

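Over the same illustrative per-key layout (slot 0 is the HEAD), commit and
rollback can be sketched like this; in reality this transfer happens in the
datafile, not in memory.

```perl
use strict;
use warnings;

sub commit_txn {
    my ( $entries, $trans_id ) = @_;
    for my $entry (@$entries) {
        # Only keys this transaction actually modified move to the HEAD.
        next unless defined $entry->[$trans_id];
        $entry->[0]         = $entry->[$trans_id];
        $entry->[$trans_id] = undef;
    }
}

sub rollback_txn {
    my ( $entries, $trans_id ) = @_;
    # Don't apply anything to the HEAD; just drop the txn's entries.
    $_->[$trans_id] = undef for @$entries;
}

# Two keys: the first was modified by transaction 1, the second wasn't.
my @entries = ( [ 'head_a', 'txn_a' ], [ 'head_b', undef ] );
commit_txn( \@entries, 1 );
print $entries[0][0], "\n";    # txn_a - the modification is now the HEAD
print $entries[1][0], "\n";    # head_b - the untouched key is unchanged
```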
=head2 Deleted marker

Transactions are performed copy-on-write. This means that if there isn't an
entry for that transaction, the HEAD is looked at. This doesn't work if a key
has been deleted within a transaction. So, the entry must be marked as deleted
within the transaction so that the HEAD isn't checked.

Likewise, when a new key is created in a transaction, the HEAD doesn't have an
entry for that key. Consider the following situation:

    ok( !exists $db1->{foo} );
    ok( !exists $db2->{foo} );

    $db1->begin_work();
    $db1->{foo} = 'bar';

    ok( !exists $db2->{foo} );

The entry for the HEAD for 'foo' needs to be marked as deleted so that
transactions which don't have 'foo' don't find something in the HEAD.

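The read path that falls out of this can be sketched as follows. The
C<$DELETED> sentinel and the slot layout (slot 0 is the HEAD) are illustrative
stand-ins for the real on-disk marker.

```perl
use strict;
use warnings;

my $DELETED = \'deleted';    # sentinel marking a key deleted in a txn

sub read_key {
    my ( $entry, $trans_id ) = @_;
    my $slot = $entry->[$trans_id];

    # Deleted within this transaction: don't fall through to the HEAD.
    return undef if ref $slot && $slot == $DELETED;

    # Never touched by this transaction: copy-on-write falls back to
    # the HEAD, which may itself carry a deleted marker.
    $slot = $entry->[0] unless defined $slot;
    return undef if ref $slot && $slot == $DELETED;
    return $slot;
}

my @entry = ( 'head_val', $DELETED, undef );
print read_key( \@entry, 2 ) // 'undef', "\n";    # head_val (fallback)
print read_key( \@entry, 1 ) // 'undef', "\n";    # undef (deleted in txn 1)
```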
=head2 Freespace management

The second major piece of the 1.00 release was freespace management. In
pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys
would not be recycled. While this was always a requested feature, the
complexity required to implement freespace management meant that it needed to
wait for a complete rewrite of several pieces, such as the one needed for
transactions.

Freespace is implemented by regularizing all the records so that
L<DBM::Deep|DBM::Deep> only has three different record sizes - Index,
BucketList, and Data. Each record type has a fixed length based on various
parameters the L<DBM::Deep|DBM::Deep> datafile is created with. (In order to
accommodate values of various sizes, Data records chain.) Whenever a sector is
freed, it's added to a freelist for that sector's size. Whenever a new sector
is requested, the freelist is checked first. If the freelist has a sector,
it's reused; otherwise a new sector is added to the end of the datafile.

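A sketch of that freelist policy, with one list per record size; file offsets
are simulated here with a simple counter rather than a real file.

```perl
use strict;
use warnings;

my %freelist;           # record size => [ offsets of freed sectors ]
my $end_of_file = 0;    # next fresh offset (simulated)

sub alloc_sector {
    my ($size) = @_;

    # Reuse a freed sector of this exact size if one exists...
    if ( my $list = $freelist{$size} ) {
        return pop @$list if @$list;
    }

    # ...otherwise extend the end of the datafile.
    my $offset = $end_of_file;
    $end_of_file += $size;
    return $offset;
}

sub free_sector {
    my ( $size, $offset ) = @_;
    push @{ $freelist{$size} }, $offset;
}

my $a = alloc_sector(256);    # 0   - fresh sector
my $b = alloc_sector(256);    # 256 - fresh sector
free_sector( 256, $a );
my $c = alloc_sector(256);    # 0   - reuses the freed sector
```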
Freespace management did bring up another issue - staleness. It is possible to
have a pointer to a record in memory. If that record is deleted, then reused,
the pointer in memory has no way of determining that it was deleted and
readded rather than modified. So, a staleness counter was added which is
incremented every time the sector is reused through the freelist. If you then
attempt to access that stale record, L<DBM::Deep|DBM::Deep> returns undef
because, at some point, the entry was deleted.

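The staleness check can be sketched like this: each sector carries a counter
that is bumped on reuse, and an in-memory pointer remembers the count it saw
when it was created. The sector representation is invented for illustration.

```perl
use strict;
use warnings;

# A sector: its data plus a staleness counter bumped on every reuse.
my %sector = ( staleness => 0, value => 'original' );

# An in-memory pointer records the staleness it saw at creation time.
sub make_pointer {
    my ($sector) = @_;
    return { sector => $sector, seen => $sector->{staleness} };
}

sub deref {
    my ($ptr) = @_;

    # If the sector was freed and reused since, the pointer is stale.
    return undef if $ptr->{seen} != $ptr->{sector}{staleness};
    return $ptr->{sector}{value};
}

my $ptr = make_pointer( \%sector );
print deref($ptr) // 'undef', "\n";    # original

# Deleting then reusing the sector through the freelist bumps the counter.
$sector{staleness}++;
$sector{value} = 'recycled';
print deref($ptr) // 'undef', "\n";    # undef - the entry was deleted
```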
=head2 Staleness counters

Once they were implemented for freespace management, staleness counters proved
to be a very powerful concept for transactions themselves. Back in
L</Protection from changes>, I mentioned that other processes modifying the
HEAD will protect all running transactions from their effects. This provides
I<Isolation>. But, the running transaction doesn't know about these entries.
If they're not cleaned up, they will be seen the next time a transaction uses
that transaction ID.

By providing a staleness counter for transactions, the cost of cleaning up
finished transactions is deferred until the space is actually used again. This
comes at the cost of less-than-optimal space utilization. Changing this in the
future would be completely transparent to users, so I felt it was an
acceptable tradeoff for delivering working code quickly.

=head1 Conclusion

Adding transactions to L<DBM::Deep|DBM::Deep> meant borrowing a design from
SVN rather than from relational databases: transactional information lives in
the keys, the HEAD is simply transaction ID 0, and staleness counters defer
the cost of cleanup until the space is reused. Together with freespace
management, these changes delivered isolated, atomic transactions in the 1.00
release while leaving the rest of the datafile unaware that transactions
exist.

=cut