[dbsrgits/DBM-Deep.git] / lib / DBM / Deep / Internals.pod

=head1 NAME

DBM::Deep::Internals

=head1 OUT OF DATE

This document is out-of-date. It describes an intermediate file format used
during the development from 0.983 to 1.0000. It will be rewritten soon.

=head1 DESCRIPTION

This is a document describing the internal workings of L<DBM::Deep>. It is
not necessary to read this document if you only intend to be a user. This
document is intended for people who either want a deeper understanding of
specifics of how L<DBM::Deep> works or who wish to help program
L<DBM::Deep>.

=head1 CLASS LAYOUT

L<DBM::Deep> is broken up into five classes in three inheritance hierarchies.

=over 4

=item *

L<DBM::Deep> is the parent of L<DBM::Deep::Array> and L<DBM::Deep::Hash>.
These classes form the immediate interface to the outside world. They are the
classes that provide the TIE mechanisms as well as the OO methods.

=item *

L<DBM::Deep::Engine> is the layer that deals with the mechanics of reading
and writing to the file. This is where the logic of the file layout is
handled.

=item *

L<DBM::Deep::File> is the layer that deals with the physical file. As a
singleton that every other object has a reference to, it also provides a place
to handle datastructure-wide items, such as transactions.

=back

=head1 FILE LAYOUT

DBM::Deep uses a tagged file layout. Every section has a tag, a size, and then
the data.

=head2 File header

=over 4

=item * File Signature

The first four bytes are 'DPDB' in network byte order, signifying that this is
a DBM::Deep file.

=item * File tag/size

This is the tagging of the file header. The file used by versions prior to
1.00 had a different fifth byte, allowing the difference to the determined.

=item * Version

This is four bytes containing the file version. This lets the file format change over time.

=item * Constants

These are the file-wide constants that determine how the file is laid out.
They can only be set upon file creation.

=item * Transaction information

The current running transactions are stored here, as is the next transaction
ID.

=item * Freespace information

Pointers into the next free sectors of the various sector sizes (Index,
Bucketlist, and Data) are stored here.

=back

=head2 Index

The Index parts can be tagged either as Hash, Array, or Index. The latter
is if there was a reindexing due to a bucketlist growing too large. The others
are the root index for their respective datatypes. The index consists of a
tag, a size, and then 256 sections containing file locations. Each section
corresponds to each value representable in a byte.

The index is used as follows - whenever a hashed key is being looked up, the
first byte is used to determine which location to go to from the root index.
Then, if that's also an index, the second byte is used, and so forth until a
bucketlist is found.

=head2 Bucketlist

This is the part that contains the link to the data section. A bucketlist
defaults to being 16 buckets long (modifiable by the I<max_buckets>
parameter used when creating a new file). Each bucket contains an MD5 and a
location of the appropriate key section.

=head2 Key area

This is the part that handles transactional awareness. There are
I<max_buckets> sections. Each section contains the location to the data
section, a transaction ID, and whether that transaction considers this key to
be deleted or not.

=head2 Data area

This is the part that actual stores the key, value, and class (if
appropriate). The layout is:

=over 4

=item * tag

=item * length of the value

=item * the actual value

=item * keylength

=item * the actual key

=item * a byte indicating if this value has a classname

=item * the classname (if one is there)

=back

The key is stored after the value because the value is requested more often
than the key.

=head1 PERFORMANCE

L<DBM::Deep> is written completely in Perl. It also is a multi-process DBM
that uses the datafile as a method of synchronizing between multiple
processes. This is unlike most RDBMSes like MySQL and Oracle. Furthermore,
unlike all RDBMSes, L<DBM::Deep> stores both the data and the structure of
that data as it would appear in a Perl program.

=head2 CPU

DBM::Deep attempts to be CPU-light. As it stores all the data on disk,
DBM::Deep is I/O-bound, not CPU-bound.

=head2 RAM

DBM::Deep uses extremely little RAM relative to the amount of data you can
access. You can iterate through a million keys (using C<each()>) without
increasing your memeory usage at all.

=head2 DISK

DBM::Deep is I/O-bound, pure and simple. The faster your disk, the faster
DBM::Deep will be. Currently, when performing C<my $x = $db-E<gt>{foo}>, there
are a minimum of 4 seeks and 1332 + N bytes read (where N is the length of your
data). (All values assume a medium filesize.) The actions taken are:

=over 4

=item 1 Lock the file

=item 1 Perform a stat() to determine if the inode has changed

=item 1 Go to the primary index for the $db (1 seek)

=item 1 Read the tag/size of the primary index (5 bytes)

=item 1 Read the body of the primary index (1024 bytes)

=item 1 Go to the bucketlist for this MD5 (1 seek)

=item 1 Read the tag/size of the bucketlist (5 bytes)

=item 1 Read the body of the bucketlist (144 bytes)

=item 1 Go to the keys location for this MD5 (1 seek)

=item 1 Read the tag/size of the keys section (5 bytes)

=item 1 Read the body of the keys location (144 bytes)

=item 1 Go to the data section that corresponds to this transaction ID. (1 seek)

=item 1 Read the tag/size of the data section (5 bytes)

=item 1 Read the value for this data (N bytes)

=item 1 Unlock the file

=back

Every additional level of indexing (if there are enough keys) requires an
additional seek and the reading of 1029 additional bytes. If the value is
blessed, an additional 1 seek and 9 + M bytes are read (where M is the length
of the classname).

Arrays are (currently) even worse because they're considered "funny hashes"
with the length stored as just another key. This means that if you do any sort
of lookup with a negative index, this entire process is performed twice - once
for the length and once for the value.

=head1 ACTUAL TESTS

=head2 SPEED

Obviously, DBM::Deep isn't going to be as fast as some C-based DBMs, such as
the almighty I<BerkeleyDB>.  But it makes up for it in features like true
multi-level hash/array support, and cross-platform FTPable files.  Even so,
DBM::Deep is still pretty fast, and the speed stays fairly consistent, even
with huge databases.  Here is some test data:

    Adding 1,000,000 keys to new DB file...

    At 100 keys, avg. speed is 2,703 keys/sec
    At 200 keys, avg. speed is 2,642 keys/sec
    At 300 keys, avg. speed is 2,598 keys/sec
    At 400 keys, avg. speed is 2,578 keys/sec
    At 500 keys, avg. speed is 2,722 keys/sec
    At 600 keys, avg. speed is 2,628 keys/sec
    At 700 keys, avg. speed is 2,700 keys/sec
    At 800 keys, avg. speed is 2,607 keys/sec
    At 900 keys, avg. speed is 2,190 keys/sec
    At 1,000 keys, avg. speed is 2,570 keys/sec
    At 2,000 keys, avg. speed is 2,417 keys/sec
    At 3,000 keys, avg. speed is 1,982 keys/sec
    At 4,000 keys, avg. speed is 1,568 keys/sec
    At 5,000 keys, avg. speed is 1,533 keys/sec
    At 6,000 keys, avg. speed is 1,787 keys/sec
    At 7,000 keys, avg. speed is 1,977 keys/sec
    At 8,000 keys, avg. speed is 2,028 keys/sec
    At 9,000 keys, avg. speed is 2,077 keys/sec
    At 10,000 keys, avg. speed is 2,031 keys/sec
    At 20,000 keys, avg. speed is 1,970 keys/sec
    At 30,000 keys, avg. speed is 2,050 keys/sec
    At 40,000 keys, avg. speed is 2,073 keys/sec
    At 50,000 keys, avg. speed is 1,973 keys/sec
    At 60,000 keys, avg. speed is 1,914 keys/sec
    At 70,000 keys, avg. speed is 2,091 keys/sec
    At 80,000 keys, avg. speed is 2,103 keys/sec
    At 90,000 keys, avg. speed is 1,886 keys/sec
    At 100,000 keys, avg. speed is 1,970 keys/sec
    At 200,000 keys, avg. speed is 2,053 keys/sec
    At 300,000 keys, avg. speed is 1,697 keys/sec
    At 400,000 keys, avg. speed is 1,838 keys/sec
    At 500,000 keys, avg. speed is 1,941 keys/sec
    At 600,000 keys, avg. speed is 1,930 keys/sec
    At 700,000 keys, avg. speed is 1,735 keys/sec
    At 800,000 keys, avg. speed is 1,795 keys/sec
    At 900,000 keys, avg. speed is 1,221 keys/sec
    At 1,000,000 keys, avg. speed is 1,077 keys/sec

This test was performed on a PowerMac G4 1gHz running Mac OS X 10.3.2 & Perl
5.8.1, with an 80GB Ultra ATA/100 HD spinning at 7200RPM.  The hash keys and
values were between 6 - 12 chars in length.  The DB file ended up at 210MB.
Run time was 12 min 3 sec.

=head2 MEMORY USAGE

One of the great things about L<DBM::Deep> is that it uses very little memory.
Even with huge databases (1,000,000+ keys) you will not see much increased
memory on your process.  L<DBM::Deep> relies solely on the filesystem for storing
and fetching data.  Here is output from I<top> before even opening a database
handle:

    PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  22831 root      11   0  2716 2716  1296 R     0.0  0.2   0:07 perl

Basically the process is taking 2,716K of memory.  And here is the same
process after storing and fetching 1,000,000 keys:

    PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  22831 root      14   0  2772 2772  1328 R     0.0  0.2  13:32 perl

Notice the memory usage increased by only 56K.  Test was performed on a 700mHz
x86 box running Linux RedHat 7.2 & Perl 5.6.1.

=cut
Commit	Line	Data
d8db2929	1	=head1 NAME
	2
	3	DBM::Deep::Internals
	4
417f635b	5	=head1 OUT OF DATE
	6
	7	This document is out-of-date. It describes an intermediate file format used
	8	during the development from 0.983 to 1.0000. It will be rewritten soon.
d8db2929	9
417f635b	10	=head1 DESCRIPTION
1cff45d7	11
b8370759	12	This is a document describing the internal workings of L<DBM::Deep>. It is
d8db2929	13	not necessary to read this document if you only intend to be a user. This
d8db2929	14	document is intended for people who either want a deeper understanding of
b8370759	15	specifics of how L<DBM::Deep> works or who wish to help program
b8370759	16	L<DBM::Deep>.
d8db2929	17
	18	=head1 CLASS LAYOUT
	19
b8370759	20	L<DBM::Deep> is broken up into five classes in three inheritance hierarchies.
d8db2929	21
	22	=over 4
	23
	24	=item *
	25
b8370759	26	L<DBM::Deep> is the parent of L<DBM::Deep::Array> and L<DBM::Deep::Hash>.
d8db2929	27	These classes form the immediate interface to the outside world. They are the
	28	classes that provide the TIE mechanisms as well as the OO methods.
	29
	30	=item *
	31
b8370759	32	L<DBM::Deep::Engine> is the layer that deals with the mechanics of reading
d8db2929	33	and writing to the file. This is where the logic of the file layout is
	34	handled.
	35
	36	=item *
	37
b8370759	38	L<DBM::Deep::File> is the layer that deals with the physical file. As a
d8db2929	39	singleton that every other object has a reference to, it also provides a place
	40	to handle datastructure-wide items, such as transactions.
	41
	42	=back
	43
	44	=head1 FILE LAYOUT
	45
	46	DBM::Deep uses a tagged file layout. Every section has a tag, a size, and then
	47	the data.
	48
	49	=head2 File header
	50
	51	=over 4
	52
	53	=item * File Signature
	54
	55	The first four bytes are 'DPDB' in network byte order, signifying that this is
	56	a DBM::Deep file.
	57
	58	=item * File tag/size
	59
	60	This is the tagging of the file header. The file used by versions prior to
	61	1.00 had a different fifth byte, allowing the difference to the determined.
	62
	63	=item * Version
	64
e9b0b5f0	65	This is four bytes containing the file version. This lets the file format change over time.
	66
	67	=item * Constants
	68
	69	These are the file-wide constants that determine how the file is laid out.
	70	They can only be set upon file creation.
d8db2929	71
	72	=item * Transaction information
	73
	74	The current running transactions are stored here, as is the next transaction
	75	ID.
	76
e9b0b5f0	77	=item * Freespace information
d8db2929	78
e9b0b5f0	79	Pointers into the next free sectors of the various sector sizes (Index,
e9b0b5f0	80	Bucketlist, and Data) are stored here.
d8db2929	81
	82	=back
	83
	84	=head2 Index
	85
	86	The Index parts can be tagged either as Hash, Array, or Index. The latter
	87	is if there was a reindexing due to a bucketlist growing too large. The others
	88	are the root index for their respective datatypes. The index consists of a
	89	tag, a size, and then 256 sections containing file locations. Each section
	90	corresponds to each value representable in a byte.
	91
	92	The index is used as follows - whenever a hashed key is being looked up, the
	93	first byte is used to determine which location to go to from the root index.
	94	Then, if that's also an index, the second byte is used, and so forth until a
	95	bucketlist is found.
	96
	97	=head2 Bucketlist
	98
	99	This is the part that contains the link to the data section. A bucketlist
	100	defaults to being 16 buckets long (modifiable by the I<max_buckets>
	101	parameter used when creating a new file). Each bucket contains an MD5 and a
	102	location of the appropriate key section.
	103
	104	=head2 Key area
	105
	106	This is the part that handles transactional awareness. There are
	107	I<max_buckets> sections. Each section contains the location to the data
	108	section, a transaction ID, and whether that transaction considers this key to
	109	be deleted or not.
	110
	111	=head2 Data area
	112
	113	This is the part that actual stores the key, value, and class (if
	114	appropriate). The layout is:
	115
	116	=over 4
	117
	118	=item * tag
	119
	120	=item * length of the value
	121
	122	=item * the actual value
	123
	124	=item * keylength
	125
	126	=item * the actual key
	127
	128	=item * a byte indicating if this value has a classname
	129
	130	=item * the classname (if one is there)
	131
	132	=back
	133
	134	The key is stored after the value because the value is requested more often
	135	than the key.
	136
	137	=head1 PERFORMANCE
	138
b8370759	139	L<DBM::Deep> is written completely in Perl. It also is a multi-process DBM
d8db2929	140	that uses the datafile as a method of synchronizing between multiple
d8db2929	141	processes. This is unlike most RDBMSes like MySQL and Oracle. Furthermore,
b8370759	142	unlike all RDBMSes, L<DBM::Deep> stores both the data and the structure of
d8db2929	143	that data as it would appear in a Perl program.
	144
	145	=head2 CPU
	146
	147	DBM::Deep attempts to be CPU-light. As it stores all the data on disk,
	148	DBM::Deep is I/O-bound, not CPU-bound.
	149
	150	=head2 RAM
	151
	152	DBM::Deep uses extremely little RAM relative to the amount of data you can
	153	access. You can iterate through a million keys (using C<each()>) without
	154	increasing your memeory usage at all.
	155
	156	=head2 DISK
	157
	158	DBM::Deep is I/O-bound, pure and simple. The faster your disk, the faster
	159	DBM::Deep will be. Currently, when performing C<my $x = $db-E<gt>{foo}>, there
	160	are a minimum of 4 seeks and 1332 + N bytes read (where N is the length of your
e9b0b5f0	161	data). (All values assume a medium filesize.) The actions taken are:
d8db2929	162
	163	=over 4
	164
	165	=item 1 Lock the file
	166
	167	=item 1 Perform a stat() to determine if the inode has changed
	168
	169	=item 1 Go to the primary index for the $db (1 seek)
	170
	171	=item 1 Read the tag/size of the primary index (5 bytes)
	172
	173	=item 1 Read the body of the primary index (1024 bytes)
	174
	175	=item 1 Go to the bucketlist for this MD5 (1 seek)
	176
	177	=item 1 Read the tag/size of the bucketlist (5 bytes)
	178
	179	=item 1 Read the body of the bucketlist (144 bytes)
	180
	181	=item 1 Go to the keys location for this MD5 (1 seek)
	182
	183	=item 1 Read the tag/size of the keys section (5 bytes)
	184
	185	=item 1 Read the body of the keys location (144 bytes)
	186
	187	=item 1 Go to the data section that corresponds to this transaction ID. (1 seek)
	188
	189	=item 1 Read the tag/size of the data section (5 bytes)
	190
	191	=item 1 Read the value for this data (N bytes)
	192
	193	=item 1 Unlock the file
	194
	195	=back
	196
	197	Every additional level of indexing (if there are enough keys) requires an
	198	additional seek and the reading of 1029 additional bytes. If the value is
	199	blessed, an additional 1 seek and 9 + M bytes are read (where M is the length
	200	of the classname).
	201
	202	Arrays are (currently) even worse because they're considered "funny hashes"
	203	with the length stored as just another key. This means that if you do any sort
	204	of lookup with a negative index, this entire process is performed twice - once
	205	for the length and once for the value.
	206
1990c72d	207	=head1 ACTUAL TESTS
	208
	209	=head2 SPEED
	210
	211	Obviously, DBM::Deep isn't going to be as fast as some C-based DBMs, such as
	212	the almighty I<BerkeleyDB>. But it makes up for it in features like true
	213	multi-level hash/array support, and cross-platform FTPable files. Even so,
	214	DBM::Deep is still pretty fast, and the speed stays fairly consistent, even
	215	with huge databases. Here is some test data:
	216
	217	Adding 1,000,000 keys to new DB file...
	218
	219	At 100 keys, avg. speed is 2,703 keys/sec
	220	At 200 keys, avg. speed is 2,642 keys/sec
	221	At 300 keys, avg. speed is 2,598 keys/sec
	222	At 400 keys, avg. speed is 2,578 keys/sec
	223	At 500 keys, avg. speed is 2,722 keys/sec
	224	At 600 keys, avg. speed is 2,628 keys/sec
	225	At 700 keys, avg. speed is 2,700 keys/sec
	226	At 800 keys, avg. speed is 2,607 keys/sec
	227	At 900 keys, avg. speed is 2,190 keys/sec
	228	At 1,000 keys, avg. speed is 2,570 keys/sec
	229	At 2,000 keys, avg. speed is 2,417 keys/sec
	230	At 3,000 keys, avg. speed is 1,982 keys/sec
	231	At 4,000 keys, avg. speed is 1,568 keys/sec
	232	At 5,000 keys, avg. speed is 1,533 keys/sec
	233	At 6,000 keys, avg. speed is 1,787 keys/sec
	234	At 7,000 keys, avg. speed is 1,977 keys/sec
	235	At 8,000 keys, avg. speed is 2,028 keys/sec
	236	At 9,000 keys, avg. speed is 2,077 keys/sec
	237	At 10,000 keys, avg. speed is 2,031 keys/sec
	238	At 20,000 keys, avg. speed is 1,970 keys/sec
	239	At 30,000 keys, avg. speed is 2,050 keys/sec
	240	At 40,000 keys, avg. speed is 2,073 keys/sec
	241	At 50,000 keys, avg. speed is 1,973 keys/sec
	242	At 60,000 keys, avg. speed is 1,914 keys/sec
	243	At 70,000 keys, avg. speed is 2,091 keys/sec
	244	At 80,000 keys, avg. speed is 2,103 keys/sec
	245	At 90,000 keys, avg. speed is 1,886 keys/sec
	246	At 100,000 keys, avg. speed is 1,970 keys/sec
	247	At 200,000 keys, avg. speed is 2,053 keys/sec
	248	At 300,000 keys, avg. speed is 1,697 keys/sec
	249	At 400,000 keys, avg. speed is 1,838 keys/sec
	250	At 500,000 keys, avg. speed is 1,941 keys/sec
	251	At 600,000 keys, avg. speed is 1,930 keys/sec
	252	At 700,000 keys, avg. speed is 1,735 keys/sec
	253	At 800,000 keys, avg. speed is 1,795 keys/sec
	254	At 900,000 keys, avg. speed is 1,221 keys/sec
	255	At 1,000,000 keys, avg. speed is 1,077 keys/sec
	256
	257	This test was performed on a PowerMac G4 1gHz running Mac OS X 10.3.2 & Perl
	258	5.8.1, with an 80GB Ultra ATA/100 HD spinning at 7200RPM. The hash keys and
	259	values were between 6 - 12 chars in length. The DB file ended up at 210MB.
	260	Run time was 12 min 3 sec.
	261
	262	=head2 MEMORY USAGE
	263
b8370759	264	One of the great things about L<DBM::Deep> is that it uses very little memory.
1990c72d	265	Even with huge databases (1,000,000+ keys) you will not see much increased
b8370759	266	memory on your process. L<DBM::Deep> relies solely on the filesystem for storing
1990c72d	267	and fetching data. Here is output from I<top> before even opening a database
	268	handle:
	269
	270	PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
	271	22831 root 11 0 2716 2716 1296 R 0.0 0.2 0:07 perl
	272
	273	Basically the process is taking 2,716K of memory. And here is the same
	274	process after storing and fetching 1,000,000 keys:
	275
	276	PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
	277	22831 root 14 0 2772 2772 1328 R 0.0 0.2 13:32 perl
	278
	279	Notice the memory usage increased by only 56K. Test was performed on a 700mHz
	280	x86 box running Linux RedHat 7.2 & Perl 5.6.1.
	281
d8db2929	282	=cut