=head1 NAME DBM::Deep::Internals =head1 DESCRIPTION This is a document describing the internal workings of L. It is not necessary to read this document if you only intend to be a user. This document is intended for people who either want a deeper understanding of specifics of how L works or who wish to help program L. =head1 CLASS LAYOUT L is broken up into five classes in three inheritance hierarchies. =over 4 =item * L is the parent of L and L. These classes form the immediate interface to the outside world. They are the classes that provide the TIE mechanisms as well as the OO methods. =item * L is the layer that deals with the mechanics of reading and writing to the file. This is where the logic of the file layout is handled. =item * L is the layer that deals with the physical file. As a singleton that every other object has a reference to, it also provides a place to handle datastructure-wide items, such as transactions. =back =head1 FILE LAYOUT DBM::Deep uses a tagged file layout. Every section has a tag, a size, and then the data. =head2 File header =over 4 =item * File Signature The first four bytes are 'DPDB' in network byte order, signifying that this is a DBM::Deep file. =item * File tag/size This is the tagging of the file header. The file used by versions prior to 1.00 had a different fifth byte, allowing the difference to the determined. =item * Version This is four bytes containing the header version. This lets the header change over time. =item * Transaction information The current running transactions are stored here, as is the next transaction ID. =item * Constants These are the file-wide constants that determine how the file is laid out. They can only be set upon file creation. =back =head2 Index The Index parts can be tagged either as Hash, Array, or Index. The latter is if there was a reindexing due to a bucketlist growing too large. The others are the root index for their respective datatypes. The index consists of a tag, a size, and then 256 sections containing file locations. Each section corresponds to each value representable in a byte. The index is used as follows - whenever a hashed key is being looked up, the first byte is used to determine which location to go to from the root index. Then, if that's also an index, the second byte is used, and so forth until a bucketlist is found. =head2 Bucketlist This is the part that contains the link to the data section. A bucketlist defaults to being 16 buckets long (modifiable by the I parameter used when creating a new file). Each bucket contains an MD5 and a location of the appropriate key section. =head2 Key area This is the part that handles transactional awareness. There are I sections. Each section contains the location to the data section, a transaction ID, and whether that transaction considers this key to be deleted or not. =head2 Data area This is the part that actual stores the key, value, and class (if appropriate). The layout is: =over 4 =item * tag =item * length of the value =item * the actual value =item * keylength =item * the actual key =item * a byte indicating if this value has a classname =item * the classname (if one is there) =back The key is stored after the value because the value is requested more often than the key. =head1 PERFORMANCE L is written completely in Perl. It also is a multi-process DBM that uses the datafile as a method of synchronizing between multiple processes. This is unlike most RDBMSes like MySQL and Oracle. Furthermore, unlike all RDBMSes, L stores both the data and the structure of that data as it would appear in a Perl program. =head2 CPU DBM::Deep attempts to be CPU-light. As it stores all the data on disk, DBM::Deep is I/O-bound, not CPU-bound. =head2 RAM DBM::Deep uses extremely little RAM relative to the amount of data you can access. You can iterate through a million keys (using C) without increasing your memeory usage at all. =head2 DISK DBM::Deep is I/O-bound, pure and simple. The faster your disk, the faster DBM::Deep will be. Currently, when performing C{foo}>, there are a minimum of 4 seeks and 1332 + N bytes read (where N is the length of your data). (All values assume a medium filesize.) The actions take are: =over 4 =item 1 Lock the file =item 1 Perform a stat() to determine if the inode has changed =item 1 Go to the primary index for the $db (1 seek) =item 1 Read the tag/size of the primary index (5 bytes) =item 1 Read the body of the primary index (1024 bytes) =item 1 Go to the bucketlist for this MD5 (1 seek) =item 1 Read the tag/size of the bucketlist (5 bytes) =item 1 Read the body of the bucketlist (144 bytes) =item 1 Go to the keys location for this MD5 (1 seek) =item 1 Read the tag/size of the keys section (5 bytes) =item 1 Read the body of the keys location (144 bytes) =item 1 Go to the data section that corresponds to this transaction ID. (1 seek) =item 1 Read the tag/size of the data section (5 bytes) =item 1 Read the value for this data (N bytes) =item 1 Unlock the file =back Every additional level of indexing (if there are enough keys) requires an additional seek and the reading of 1029 additional bytes. If the value is blessed, an additional 1 seek and 9 + M bytes are read (where M is the length of the classname). Arrays are (currently) even worse because they're considered "funny hashes" with the length stored as just another key. This means that if you do any sort of lookup with a negative index, this entire process is performed twice - once for the length and once for the value. =cut