diff --git a/lib/DBM/Deep/Internals.pod b/lib/DBM/Deep/Internals.pod
index d80c816..132bc9e 100644
--- a/lib/DBM/Deep/Internals.pod
+++ b/lib/DBM/Deep/Internals.pod
@@ -4,33 +4,37 @@ DBM::Deep::Internals

 =head1 DESCRIPTION

-This is a document describing the internal workings of L<DBM::Deep>. It is
+B<NOTE>: This document is out-of-date. It describes an intermediate file
+format used during the development from 0.983 to 1.0000. It will be rewritten
+soon.
+
+This is a document describing the internal workings of L<DBM::Deep>. It is
 not necessary to read this document if you only intend to be a user. This
 document is intended for people who either want a deeper understanding of
-specifics of how L<DBM::Deep> works or who wish to help program
-L<DBM::Deep>.
+specifics of how L<DBM::Deep> works or who wish to help program
+L<DBM::Deep>.

 =head1 CLASS LAYOUT

-L<DBM::Deep> is broken up into five classes in three inheritance hierarchies.
+L<DBM::Deep> is broken up into five classes in three inheritance hierarchies.

 =over 4

 =item *

-L<DBM::Deep> is the parent of L<DBM::Deep::Array> and L<DBM::Deep::Hash>.
+L<DBM::Deep> is the parent of L<DBM::Deep::Array> and L<DBM::Deep::Hash>.
 These classes form the immediate interface to the outside world. They are the
 classes that provide the TIE mechanisms as well as the OO methods.
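+
+For example (a minimal sketch; C<foo.db> is an arbitrary file name), the same
+database can be reached through either interface:
+
+    use DBM::Deep;
+
+    # OO interface: the object behaves like a hash reference.
+    my $db = DBM::Deep->new( "foo.db" );
+    $db->{key} = 'value';
+
+    # TIE interface: the same file through a tied hash.
+    tie my %hash, 'DBM::Deep', "foo.db";
+    print $hash{key}, "\n";    # prints 'value'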

 =item *

-L<DBM::Deep::Engine> is the layer that deals with the mechanics of reading
+L<DBM::Deep::Engine> is the layer that deals with the mechanics of reading
 and writing to the file. This is where the logic of the file layout is
 handled.

 =item *

-L<DBM::Deep::File> is the layer that deals with the physical file. As a
+L<DBM::Deep::File> is the layer that deals with the physical file. As a
 singleton that every other object has a reference to, it also provides a place
 to handle datastructure-wide items, such as transactions.

@@ -57,17 +61,22 @@ This is the tagging of the file header. The file used by versions prior to

 =item * Version

-This is four bytes containing the header version. This lets the header change over time.
+This is four bytes containing the file version. This lets the file format change over time.
+
+=item * Constants
+
+These are the file-wide constants that determine how the file is laid out.
+They can only be set upon file creation.

 =item * Transaction information

 The current running transactions are stored here, as is the next
 transaction ID.

-=item * Constants
+=item * Freespace information

-These are the file-wide constants that determine how the file is laid out.
-They can only be set upon file creation.
+Pointers into the next free sectors of the various sector sizes (Index,
+Bucketlist, and Data) are stored here.

 =back
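+
+As an illustration of the header just described, a reader for the first two
+fields might look like the following sketch. The C<DPDB> signature is the tag
+used by DBM::Deep files; the byte offsets and the big-endian integer encoding,
+however, are assumptions made for this example, not a normative description
+of the layout.
+
+    use strict;
+    use warnings;
+
+    open my $fh, '<:raw', 'foo.db' or die "Cannot open: $!";
+    read( $fh, my $header, 8 ) == 8 or die "Short read";
+
+    # 4-byte signature, then (assumed) a big-endian 32-bit version.
+    my ( $sig, $version ) = unpack 'a4 N', $header;
+    die "Not a DBM::Deep file" unless $sig eq 'DPDB';
+    print "File version: $version\n";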

@@ -126,10 +135,10 @@ than the key.

 =head1 PERFORMANCE

-L<DBM::Deep> is written completely in Perl. It also is a multi-process DBM
+L<DBM::Deep> is written completely in Perl. It also is a multi-process DBM
 that uses the datafile as a method of synchronizing between multiple
 processes. This is unlike most RDBMSes like MySQL and Oracle. Furthermore,
-unlike all RDBMSes, L<DBM::Deep> stores both the data and the structure of
+unlike all RDBMSes, L<DBM::Deep> stores both the data and the structure of
 that data as it would appear in a Perl program.

 =head2 CPU

@@ -148,7 +157,7 @@ increasing your memory usage at all.
 DBM::Deep is I/O-bound, pure and simple. The faster your disk, the faster
 DBM::Deep will be. Currently, when performing C<< $db->{foo} >>, there are a
 minimum of 4 seeks and 1332 + N bytes read (where N is the length of your
-data). (All values assume a medium filesize.) The actions take are:
+data). (All values assume a medium filesize.) The actions taken are:

 =over 4

@@ -194,4 +203,79 @@ with the length stored as just another key. This means that if you do any
 sort of lookup with a negative index, this entire process is performed twice
 - once for the length and once for the value.

+=head1 ACTUAL TESTS
+
+=head2 SPEED
+
+Obviously, DBM::Deep isn't going to be as fast as some C-based DBMs, such as
+the almighty I<BerkeleyDB>. But it makes up for it in features like true
+multi-level hash/array support, and cross-platform FTPable files. Even so,
+DBM::Deep is still pretty fast, and the speed stays fairly consistent, even
+with huge databases. Here is some test data:
+
+ Adding 1,000,000 keys to new DB file...
+
+ At 100 keys, avg. speed is 2,703 keys/sec
+ At 200 keys, avg. speed is 2,642 keys/sec
+ At 300 keys, avg. speed is 2,598 keys/sec
+ At 400 keys, avg. speed is 2,578 keys/sec
+ At 500 keys, avg. speed is 2,722 keys/sec
+ At 600 keys, avg. speed is 2,628 keys/sec
+ At 700 keys, avg. speed is 2,700 keys/sec
+ At 800 keys, avg. speed is 2,607 keys/sec
+ At 900 keys, avg. speed is 2,190 keys/sec
+ At 1,000 keys, avg. speed is 2,570 keys/sec
+ At 2,000 keys, avg. speed is 2,417 keys/sec
+ At 3,000 keys, avg. speed is 1,982 keys/sec
+ At 4,000 keys, avg. speed is 1,568 keys/sec
+ At 5,000 keys, avg. speed is 1,533 keys/sec
+ At 6,000 keys, avg. speed is 1,787 keys/sec
+ At 7,000 keys, avg. speed is 1,977 keys/sec
+ At 8,000 keys, avg. speed is 2,028 keys/sec
+ At 9,000 keys, avg. speed is 2,077 keys/sec
+ At 10,000 keys, avg. speed is 2,031 keys/sec
+ At 20,000 keys, avg. speed is 1,970 keys/sec
+ At 30,000 keys, avg. speed is 2,050 keys/sec
+ At 40,000 keys, avg. speed is 2,073 keys/sec
+ At 50,000 keys, avg. speed is 1,973 keys/sec
+ At 60,000 keys, avg. speed is 1,914 keys/sec
+ At 70,000 keys, avg. speed is 2,091 keys/sec
+ At 80,000 keys, avg. speed is 2,103 keys/sec
+ At 90,000 keys, avg. speed is 1,886 keys/sec
+ At 100,000 keys, avg. speed is 1,970 keys/sec
+ At 200,000 keys, avg. speed is 2,053 keys/sec
+ At 300,000 keys, avg. speed is 1,697 keys/sec
+ At 400,000 keys, avg. speed is 1,838 keys/sec
+ At 500,000 keys, avg. speed is 1,941 keys/sec
+ At 600,000 keys, avg. speed is 1,930 keys/sec
+ At 700,000 keys, avg. speed is 1,735 keys/sec
+ At 800,000 keys, avg. speed is 1,795 keys/sec
+ At 900,000 keys, avg. speed is 1,221 keys/sec
+ At 1,000,000 keys, avg. speed is 1,077 keys/sec
+
+This test was performed on a PowerMac G4 1GHz running Mac OS X 10.3.2 & Perl
+5.8.1, with an 80GB Ultra ATA/100 HD spinning at 7200RPM. The hash keys and
+values were between 6 - 12 chars in length. The DB file ended up at 210MB.
+Run time was 12 min 3 sec.
+
+=head2 MEMORY USAGE
+
+One of the great things about L<DBM::Deep> is that it uses very little memory.
+Even with huge databases (1,000,000+ keys) you will not see much increased
+memory on your process. L<DBM::Deep> relies solely on the filesystem for
+storing and fetching data. Here is output from I<top> before even opening a
+database handle:
+
+   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
+ 22831 root      11   0  2716 2716  1296 R     0.0  0.2   0:07 perl
+
+Basically the process is taking 2,716K of memory. And here is the same
+process after storing and fetching 1,000,000 keys:
+
+   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
+ 22831 root      14   0  2772 2772  1328 R     0.0  0.2  13:32 perl
+
+Notice the memory usage increased by only 56K. Test was performed on a 700MHz
+x86 box running Linux RedHat 7.2 & Perl 5.6.1.
+
 =cut