=head0 Adding transactions to DBM::Deep

=head1 What is DBM::Deep?

L<DBM::Deep|DBM::Deep> is a module written completely in Perl that provides a
way of storing Perl datastructures (scalars, hashes, and arrays) on disk
instead of in memory. The datafile produced can be ftp'ed from one machine to
another, regardless of OS or Perl version. There are several reasons why
someone would want to do this.

=over 4

=item * Transparent Persistence

This is the ability to save a set of data structures to disk and retrieve them
later without the vast majority of the program even knowing that the data is
persisted. Furthermore, the datastructure is persisted immediately, not at set
marshalling periods.

=item * Huge datastructures

Normally, datastructures are limited by the size of RAM the server has.
L<DBM::Deep|DBM::Deep> allows the size of a given datastructure to be limited
by disk instead.

=item * IPC

While not a common use, this allows for inter-process communication without
worrying about the specifics of how a given OS handles IPC.

=back

And, with the release of 1.00, there is now a fourth reason -
software-transactional memory, or STM
(L<http://en.wikipedia.org/wiki/Software_transactional_memory>).

=head1 What are transactions?

Originally from the database world, a transaction is a way of isolating the
effects of a given set of actions, then applying them all at once. It's a way
of saying "I'm going to try the following steps, see if I like the result,
and, if I do, I want everyone else looking at this datastore to see the
results all at once." The most common example is taken from banking. Let's say
that an application receives a request to have Joe pay Bob five zorkmids.
Without transactions, the application would take the money from Joe's account,
then add the money to Bob's account. But, what happens if the application
crashes after debiting Joe, but before crediting Bob? The application has made
money disappear. Or, vice versa, if Bob is credited before Joe is debited, the
application has created money.

With a transaction wrapping the money transfer, if the application crashes in
the middle, it's as if the action never happened. So, when the application
recovers from the crash, Joe and Bob still have the same amount of money in
their accounts as they did before, and the transaction can restart so that Bob
can finally receive his zorkmids.
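
In code terms, the guarantee looks like this minimal in-memory sketch (plain
Python for illustration only - this is not DBM::Deep's API): stage both sides
of the transfer on a private copy, and publish the copy only when every step
has succeeded.

```python
def transfer(accounts, src, dst, amount):
    """Atomically move `amount` from src to dst, or change nothing at all."""
    staged = dict(accounts)            # work against a private copy
    staged[src] -= amount              # debit Joe
    if staged[src] < 0:
        raise ValueError("insufficient funds")
    staged[dst] += amount              # credit Bob
    accounts.clear()                   # publish both changes at once
    accounts.update(staged)

accounts = {"joe": 10, "bob": 0}
transfer(accounts, "joe", "bob", 5)
print(accounts)                        # {'joe': 5, 'bob': 5}

try:
    transfer(accounts, "joe", "bob", 100)  # fails partway through...
except ValueError:
    pass
print(accounts)                        # ...but no money was created or destroyed
```

If the process dies before the final publish step, the shared state never sees
a half-applied transfer - that is the atomicity the example relies on.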

More formally, transactions are generally considered proper when they are
ACID-compliant. ACID is an acronym that means the following:

=over 4

=item * Atomic

Either every change happens or none of the changes happen.

=item * Consistent

When the transaction begins and when it is committed, the database must be in
a legal state. This restriction doesn't apply to L<DBM::Deep|DBM::Deep> very
much.

=item * Isolated

As far as a transaction is concerned, it is the only thing running against the
database while it is running. Unlike most RDBMSes, L<DBM::Deep|DBM::Deep>
provides the strongest isolation level possible.

=item * Durable

Once the database says that a commit has happened, the commit is guaranteed,
regardless of what happens afterwards.

=back

=head1 Why add them to DBM::Deep?

The ability to have actions occur either I<atomically> (as in the previous
example) or in I<isolation> from the rest of the users of the data is a
powerful thing. It allows for a certain amount of safety and predictability in
how data transformations occur. Imagine, for example, that you have a set of
calculations that will update various variables. However, there are some
situations that will cause you to throw away all results and start over with a
different seed. Without transactions, you would have to put everything into
temporary variables, then transfer the values when the calculations were found
to be successful. With STM, you start a transaction and do your thing within
it. If the calculations succeed, you commit. If they fail, you rollback and
try again. If you're thinking that this is very similar to how SVN or CVS
works, you're absolutely correct - they are transactional in exactly the same
way.
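
The compute-then-maybe-discard workflow above can be sketched as a
commit/rollback retry loop (illustrative Python; C<begin_work>, C<commit>, and
C<rollback> are modelled with a plain dict, and the "seeds" are made up for
the example):

```python
def run_with_retries(data, compute, acceptable, max_tries=10):
    """Run `compute` inside a transaction; roll back and retry on a bad result."""
    for _ in range(max_tries):
        txn = dict(data)          # begin_work: a private view of the data
        compute(txn)              # every update lands in the transaction
        if acceptable(txn):
            data.update(txn)      # commit: publish all the changes at once
            return True
        # rollback: simply discard txn and start over with the next seed
    return False

seeds = iter([13, 7, 4])          # pretend the first two seeds give bad results
data = {}
ok = run_with_retries(
    data,
    compute=lambda txn: txn.__setitem__("result", next(seeds)),
    acceptable=lambda txn: txn["result"] % 2 == 0,
)
print(ok, data)                   # True {'result': 4}
```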

=head1 How it happened

=head2 The backstory

The addition of transactions to L<DBM::Deep|DBM::Deep> has easily been the
single most complex software endeavor I've ever undertaken. The first step was
to figure out exactly how transactions were going to work. After several
spikesN<These are throwaway coding explorations.>, the best design seemed to
look to SVN instead of relational databases. The more I investigated, the more
I ran up against the object-relational impedance mismatch
N<http://en.wikipedia.org/wiki/Object-Relational_Impedance_Mismatch>, this
time in terms of being able to translate designs. In the relational world,
transactions are generally implemented either as row-level locks or using MVCC
N<http://en.wikipedia.org/wiki/Multiversion_concurrency_control>. Both of
these assume that there is a I<row>, or singular object, that can be locked
transparently to everything else. This doesn't translate to a fractally
repeating structure like a hash or an array.

However, the design used by SVN deals with directories and files, which
correspond very closely to hashes and hashkeys. In SVN, the modifications are
stored in the file's structure. Translating this to hashes and hashkeys, this
means that transactional information should be stored in the keys. This means
that the entire datafile is unaware of anything to do with transactions,
except for the key's data structure within the bucket.

=head2 DBM::Deep's file structure

L<DBM::Deep|DBM::Deep>'s file structure is record-based. The key (or array
index - arrays are currently just funny hashes internally) is hashed using MD5
and then stored in a cascade of Index and Bucketlist records. The Bucketlist
record stores the actual key string and pointers to where the data records are
stored. The data records themselves are one of Null, Scalar, or Reference. A
Null represents an I<undef>, a Scalar represents a string (numbers are
stringified for simplicity) and is allocated in 256-byte chunks, and a
Reference represents an array or hash reference and contains a pointer to an
Index and Bucketlist cascade of its own.
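
As a rough model of that lookup path (Python for illustration; the real
fan-out, record fields, and bucket-splitting rules are simplified away here),
the MD5 digest's nibbles steer the descent through Index records until a
Bucketlist is reached:

```python
import hashlib

def md5_nibbles(key):
    """Hash the key and return its hex digest one nibble (0-15) at a time."""
    return [int(ch, 16) for ch in hashlib.md5(key.encode()).hexdigest()]

class Index:
    """Interior record: 16 slots, one per possible nibble value."""
    def __init__(self):
        self.slots = [None] * 16

class BucketList:
    """Leaf record: stores the actual key strings and their data pointers."""
    def __init__(self):
        self.buckets = {}

def store(root, key, data_ptr):
    node, depth = root, 0
    nibbles = md5_nibbles(key)
    while isinstance(node, Index):
        if node.slots[nibbles[depth]] is None:
            node.slots[nibbles[depth]] = BucketList()
        node = node.slots[nibbles[depth]]
        depth += 1
    node.buckets[key] = data_ptr          # the bucketlist keeps the full key

def fetch(root, key):
    node, depth = root, 0
    nibbles = md5_nibbles(key)
    while isinstance(node, Index):
        node = node.slots[nibbles[depth]]
        depth += 1
        if node is None:
            return None
    return node.buckets.get(key)

root = Index()
store(root, "abc", "pointer-to-scalar-record")
print(fetch(root, "abc"))       # pointer-to-scalar-record
print(fetch(root, "nope"))      # None
```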

=head2 Transactions in the keys

The first pass was to expand the Bucketlist sector to go from a simple key /
datapointer mapping to a more complex key / transaction / datapointer mapping.
Initially, I interposed a Transaction record that the bucketlist pointed to.
That then contained the transaction / datapointer mapping. This had the
advantage of changing nothing except for adding one new sector type and the
handling for it. This was very quickly merged into the Bucketlist record to
simplify the resulting code.

This first step got me to the point where I could pass the following test:

  my $db1 = DBM::Deep->new( $filename );
  my $db2 = DBM::Deep->new( $filename );

  $db1->{abc} = 'foo';

  is( $db1->{abc}, 'foo' );
  is( $db2->{abc}, 'foo' );

  $db1->begin_work();

  is( $db1->{abc}, 'foo' );
  is( $db2->{abc}, 'foo' );

  $db1->{abc} = 'floober';

  is( $db1->{abc}, 'floober' );
  is( $db2->{abc}, 'foo' );

Just that much was a major accomplishment. The first pass only looked in the
transaction's spot in the bucket for that key. That passed my first tests
because I didn't check that C<$db1-E<gt>{abc}> was still 'foo' I<before>
modifying it in the transaction. To pass that test, the code for retrieval
needed to look first in the transaction's spot and, if that spot had never
been assigned to, look at the spot for the HEAD.
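
That retrieval rule can be modelled with one datapointer slot per transaction
in each bucket entry (a sketch; the slot count and the "never assigned"
sentinel are inventions for illustration):

```python
NUM_TXNS = 4                 # slot 0 is the HEAD; slots 1..3 are transactions
UNASSIGNED = object()        # sentinel: this slot was never written

def new_entry():
    return [UNASSIGNED] * NUM_TXNS

def read(entry, trans_id):
    # Look first in the transaction's own spot...
    if entry[trans_id] is not UNASSIGNED:
        return entry[trans_id]
    # ...and fall back to the HEAD only if that spot was never assigned.
    head = entry[0]
    return None if head is UNASSIGNED else head

entry = new_entry()
entry[0] = "foo"                    # $db1->{abc} = 'foo' outside a transaction
assert read(entry, 1) == "foo"      # a transaction sees the HEAD's value

entry[1] = "floober"                # $db1->{abc} = 'floober' in transaction 1
assert read(entry, 1) == "floober"  # the transaction sees its own value
assert read(entry, 0) == "foo"      # everyone else still sees 'foo'
print("read path OK")
```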

=head2 The concept of the HEAD

This is a concept borrowed from SVN. In SVN, the HEAD revision is the latest
revision checked into the repository. When you do a local modification, you're
doing a modification to the HEAD. Then, you choose to either check in your
code (commit()) or revert (rollback()).

In L<DBM::Deep|DBM::Deep>, I chose to make the HEAD transaction ID 0. This has
several benefits:

=over 4

=item * Easy identification of a transaction

C<if ( $trans_id ) {}> will run the code if and only if we are in a running
transaction.

=item * The HEAD is the first bucket

In a given bucket, the HEAD is the first datapointer because we multiply the
size of the transactional bookkeeping by the transaction ID to find the offset
to seek into the file.

=back
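
Both benefits fall out of simple offset arithmetic. A sketch (the field sizes
are invented; the real bookkeeping widths depend on the file's parameters):

```python
DATAPTR_SIZE = 8                    # invented: bytes for a data pointer
STALENESS_SIZE = 4                  # invented: bytes for a staleness counter
ENTRY_SIZE = DATAPTR_SIZE + STALENESS_SIZE

def slot_offset(bucket_base, trans_id):
    """File offset of a transaction's slot within a bucket entry."""
    return bucket_base + trans_id * ENTRY_SIZE

print(slot_offset(1000, 0))   # 1000 -- the HEAD is the bucket base itself
print(slot_offset(1000, 3))   # 1036 -- transaction 3 sits three entries in

trans_id = 0
if trans_id:                  # falsy for the HEAD, truthy in a transaction
    print("inside a running transaction")
```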

=head2 Protection from changes

Let's assume that a transaction is running in one process and another process
is also modifying the same area in the data. The only way that process B can
notify process A that a change has occurred is through the common point - the
DBM file. Because of this, every process that changes the HEAD needs to
protect all currently running transactions by copying the pointer to the
original value into every transaction which hasn't already modified this
entry. (If it has, then the new value shouldn't overwrite the transaction's
modification.) This is the key piece for providing I<Isolation>.
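
A sketch of that protection step (Python; slot 0 is the HEAD, and an invented
C<UNASSIGNED> sentinel marks slots a transaction never wrote):

```python
UNASSIGNED = object()

def write_head(entry, new_value):
    """Process B changes the HEAD: first shield every running transaction."""
    old = entry[0]
    if old is not UNASSIGNED:
        for tid in range(1, len(entry)):
            if entry[tid] is UNASSIGNED:
                entry[tid] = old     # preserve the value the transaction saw
            # a slot the transaction already modified is left alone
    entry[0] = new_value

entry = ["foo", UNASSIGNED, "mine"]  # transaction 2 already modified this key
write_head(entry, "bar")

print(entry[0])   # bar  -- the HEAD moved on
print(entry[1])   # foo  -- transaction 1 keeps the value it started with
print(entry[2])   # mine -- transaction 2's change wasn't overwritten
```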

=head2 Tracking modified buckets

Rolling back changes is very simple - just don't apply them to the HEAD. The
next time that transaction ID is reused, the changes will be ignored (q.v.
L</Staleness counters>). Committing, however, requires that all the changes
be transferred over from the bucket entry for the given transaction ID to the
entry for the HEAD.
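
Commit is then a per-bucket transfer from the transaction's slot to the HEAD's
slot (a sketch; per the text above, the real code defers rollback cleanup via
staleness counters rather than clearing slots eagerly):

```python
UNASSIGNED = object()

def commit(entries, trans_id):
    """Move everything the transaction wrote over to the HEAD (slot 0)."""
    for entry in entries:
        if entry[trans_id] is not UNASSIGNED:
            entry[0] = entry[trans_id]
            entry[trans_id] = UNASSIGNED

def rollback(entries, trans_id):
    """Rolling back never touches the HEAD: just abandon the slots."""
    for entry in entries:
        entry[trans_id] = UNASSIGNED

abc = ["foo", UNASSIGNED]     # transaction 1 never touched this key
xyz = ["baz", "qux"]          # transaction 1 wrote 'qux' here

commit([abc, xyz], trans_id=1)
print(abc[0], xyz[0])         # foo qux
```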

=head2 Deleted marker

Transactions are performed copy-on-write. This means that if there isn't an
entry for that transaction, the HEAD is looked at. This doesn't work if a key
has been deleted within a transaction. So, the entry must be marked as deleted
within the transaction so that the HEAD isn't checked.

Likewise, when a new key is created in a transaction, the HEAD doesn't have an
entry for that key. Consider the following situation:

  ok( !exists $db1->{foo} );
  ok( !exists $db2->{foo} );

  $db1->begin_work();
  $db1->{foo} = 'bar';

  ok( !exists $db2->{foo} );

The entry for the HEAD for 'foo' needs to be marked as deleted so that
transactions which don't have 'foo' don't find something in the HEAD.
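
A deleted marker distinct from "never assigned" handles both cases. A sketch,
continuing the slot-per-transaction model (the sentinels are illustrative):

```python
UNASSIGNED = object()   # slot never written by this transaction
DELETED = object()      # slot explicitly deleted (or masked off)

def read(entry, trans_id):
    slot = entry[trans_id]
    if slot is DELETED:
        return None                  # deleted here: do NOT check the HEAD
    if slot is not UNASSIGNED:
        return slot
    head = entry[0]
    return None if head in (UNASSIGNED, DELETED) else head

# Key deleted inside transaction 1, but still present in the HEAD:
entry = ["bar", DELETED]
assert read(entry, 1) is None        # gone inside the transaction
assert read(entry, 0) == "bar"       # still there for everyone else

# Key created inside transaction 1: the HEAD slot is marked deleted
entry2 = [DELETED, "bar"]
assert read(entry2, 1) == "bar"      # exists $db1->{foo} is true
assert read(entry2, 0) is None       # exists $db2->{foo} is false
print("deleted-marker model OK")
```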

=head2 Freespace management

The second major piece of the 1.00 release was freespace management. In
pre-1.00 versions of L<DBM::Deep|DBM::Deep>, the space used by deleted keys
was not recycled. While always a requested feature, the complexity required to
implement freespace meant that it needed to wait for a complete rewrite of
several pieces, such as for transactions.

Freespace is implemented by regularizing all the records so that
L<DBM::Deep|DBM::Deep> only has three different record sizes - Index,
BucketList, and Data. Each record type has a fixed length based on various
parameters the L<DBM::Deep|DBM::Deep> datafile is created with. (In order to
accommodate values of various sizes, Data records chain.) Whenever a sector is
freed, it's added to a freelist of that sector's size. Whenever a new sector
is requested, the freelist is checked first. If the freelist has a sector,
it's reused; otherwise, a new sector is added to the end of the datafile.
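
With one freelist per record size, the allocation policy reads as a few lines
(a sketch; the offsets and sizes are invented):

```python
from collections import defaultdict

class Datafile:
    def __init__(self):
        self.end = 0                         # next fresh offset (end of file)
        self.freelists = defaultdict(list)   # record size -> freed offsets

    def alloc(self, size):
        """Reuse a freed sector of this size if possible, else grow the file."""
        if self.freelists[size]:
            return self.freelists[size].pop()
        offset, self.end = self.end, self.end + size
        return offset

    def free(self, offset, size):
        self.freelists[size].append(offset)

f = Datafile()
a = f.alloc(256)       # 0: a brand-new sector at the end of the file
b = f.alloc(256)       # 256: the file grows again
f.free(a, 256)         # the sector at 0 goes onto the 256-byte freelist
c = f.alloc(256)       # 0: recycled instead of growing the file
print(a, b, c)         # 0 256 0
```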

Freespace management did bring up another issue - staleness. It is possible to
have a pointer to a record in memory. If that record is deleted, then reused,
the pointer in memory has no way of determining that it was deleted and
re-added rather than modified. So, a staleness counter was added which is
incremented every time the sector is reused through the freelist. If you then
attempt to access that stale record, L<DBM::Deep|DBM::Deep> returns undef
because, at some point, the entry was deleted.

=head2 Staleness counters

Once they were implemented for freespace management, staleness counters proved
to be a very powerful concept for transactions themselves. Back in
L</Protection from changes>, I mentioned that other processes modifying the
HEAD will protect all running transactions from their effects. This provides
I<Isolation>. But, the running transaction doesn't know about these entries.
If they're not cleaned up, they will be seen the next time a transaction uses
that transaction ID.

By providing a staleness counter for transactions, the cost of cleaning up
finished transactions is deferred until the space is actually used again. This
comes at the cost of less-than-optimal space utilization. Changing this in the
future would be completely transparent to users, so I felt it was an
acceptable tradeoff for delivering working code quickly.
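
The mechanism behind both uses of staleness is the same, and can be sketched
as follows: a pointer remembers the counter value it saw, and any mismatch on
a later access means the sector was freed and reused in the meantime.

```python
class Sector:
    """An on-disk record; its staleness bumps every time it's recycled."""
    def __init__(self, value):
        self.staleness = 0
        self.value = value

    def reuse(self, new_value):
        self.staleness += 1          # incremented on reuse via the freelist
        self.value = new_value

class SectorPointer:
    """An in-memory reference: remembers the staleness it was created with."""
    def __init__(self, sector):
        self.sector = sector
        self.staleness = sector.staleness

    def get(self):
        if self.staleness != self.sector.staleness:
            return None              # stale: the record was deleted at some point
        return self.sector.value

sector = Sector("original tenant")
ptr = SectorPointer(sector)
print(ptr.get())                     # original tenant

sector.reuse("new tenant")           # freed and handed out again
print(ptr.get())                     # None -- the old pointer detects staleness
```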

=head1 Conclusion

Adding transactions to L<DBM::Deep|DBM::Deep> required rethinking the file
structure - transactional bookkeeping in the buckets, a HEAD with transaction
ID 0, deleted markers, freespace management, and staleness counters. The
result is a pure-Perl datastore whose users can try out changes in isolation
and then commit or rollback at will.

=cut