Merged with master and am ready to merge back
[dbsrgits/DBM-Deep.git] / lib / DBM / Deep / Internals.pod
CommitLineData
d8db2929 1=head1 NAME
2
3DBM::Deep::Internals
4
417f635b 5=head1 OUT OF DATE
6
7This document is out-of-date. It describes an intermediate file format used
8during the development from 0.983 to 1.0000. It will be rewritten soon.
d8db2929 9
417f635b 10=head1 DESCRIPTION
1cff45d7 11
b8370759 12This is a document describing the internal workings of L<DBM::Deep>. It is
d8db2929 13not necessary to read this document if you only intend to be a user. This
14document is intended for people who either want a deeper understanding of
b8370759 15specifics of how L<DBM::Deep> works or who wish to help program
16L<DBM::Deep>.
d8db2929 17
18=head1 CLASS LAYOUT
19
b8370759 20L<DBM::Deep> is broken up into five classes in three inheritance hierarchies.
d8db2929 21
22=over 4
23
24=item *
25
b8370759 26L<DBM::Deep> is the parent of L<DBM::Deep::Array> and L<DBM::Deep::Hash>.
d8db2929 27These classes form the immediate interface to the outside world. They are the
28classes that provide the TIE mechanisms as well as the OO methods.
29
30=item *
31
b8370759 32L<DBM::Deep::Engine> is the layer that deals with the mechanics of reading
d8db2929 33and writing to the file. This is where the logic of the file layout is
34handled.
35
36=item *
37
b8370759 38L<DBM::Deep::File> is the layer that deals with the physical file. As a
d8db2929 39singleton that every other object has a reference to, it also provides a place
40to handle datastructure-wide items, such as transactions.
41
42=back
43
44=head1 FILE LAYOUT
45
46DBM::Deep uses a tagged file layout. Every section has a tag, a size, and then
47the data.
48
49=head2 File header
50
51=over 4
52
53=item * File Signature
54
55The first four bytes are 'DPDB' in network byte order, signifying that this is
56a DBM::Deep file.
57
58=item * File tag/size
59
60This is the tagging of the file header. The file used by versions prior to
611.00 had a different fifth byte, allowing the difference to the determined.
62
63=item * Version
64
e9b0b5f0 65This is four bytes containing the file version. This lets the file format change over time.
66
67=item * Constants
68
69These are the file-wide constants that determine how the file is laid out.
70They can only be set upon file creation.
d8db2929 71
72=item * Transaction information
73
74The current running transactions are stored here, as is the next transaction
75ID.
76
e9b0b5f0 77=item * Freespace information
d8db2929 78
e9b0b5f0 79Pointers into the next free sectors of the various sector sizes (Index,
80Bucketlist, and Data) are stored here.
d8db2929 81
82=back
83
84=head2 Index
85
86The Index parts can be tagged either as Hash, Array, or Index. The latter
87is if there was a reindexing due to a bucketlist growing too large. The others
88are the root index for their respective datatypes. The index consists of a
89tag, a size, and then 256 sections containing file locations. Each section
90corresponds to each value representable in a byte.
91
92The index is used as follows - whenever a hashed key is being looked up, the
93first byte is used to determine which location to go to from the root index.
94Then, if that's also an index, the second byte is used, and so forth until a
95bucketlist is found.
96
97=head2 Bucketlist
98
99This is the part that contains the link to the data section. A bucketlist
100defaults to being 16 buckets long (modifiable by the I<max_buckets>
101parameter used when creating a new file). Each bucket contains an MD5 and a
102location of the appropriate key section.
103
104=head2 Key area
105
106This is the part that handles transactional awareness. There are
107I<max_buckets> sections. Each section contains the location to the data
108section, a transaction ID, and whether that transaction considers this key to
109be deleted or not.
110
111=head2 Data area
112
113This is the part that actual stores the key, value, and class (if
114appropriate). The layout is:
115
116=over 4
117
118=item * tag
119
120=item * length of the value
121
122=item * the actual value
123
124=item * keylength
125
126=item * the actual key
127
128=item * a byte indicating if this value has a classname
129
130=item * the classname (if one is there)
131
132=back
133
134The key is stored after the value because the value is requested more often
135than the key.
136
137=head1 PERFORMANCE
138
b8370759 139L<DBM::Deep> is written completely in Perl. It also is a multi-process DBM
d8db2929 140that uses the datafile as a method of synchronizing between multiple
141processes. This is unlike most RDBMSes like MySQL and Oracle. Furthermore,
b8370759 142unlike all RDBMSes, L<DBM::Deep> stores both the data and the structure of
d8db2929 143that data as it would appear in a Perl program.
144
145=head2 CPU
146
147DBM::Deep attempts to be CPU-light. As it stores all the data on disk,
148DBM::Deep is I/O-bound, not CPU-bound.
149
150=head2 RAM
151
152DBM::Deep uses extremely little RAM relative to the amount of data you can
153access. You can iterate through a million keys (using C<each()>) without
154increasing your memeory usage at all.
155
156=head2 DISK
157
158DBM::Deep is I/O-bound, pure and simple. The faster your disk, the faster
159DBM::Deep will be. Currently, when performing C<my $x = $db-E<gt>{foo}>, there
160are a minimum of 4 seeks and 1332 + N bytes read (where N is the length of your
e9b0b5f0 161data). (All values assume a medium filesize.) The actions taken are:
d8db2929 162
163=over 4
164
165=item 1 Lock the file
166
167=item 1 Perform a stat() to determine if the inode has changed
168
169=item 1 Go to the primary index for the $db (1 seek)
170
171=item 1 Read the tag/size of the primary index (5 bytes)
172
173=item 1 Read the body of the primary index (1024 bytes)
174
175=item 1 Go to the bucketlist for this MD5 (1 seek)
176
177=item 1 Read the tag/size of the bucketlist (5 bytes)
178
179=item 1 Read the body of the bucketlist (144 bytes)
180
181=item 1 Go to the keys location for this MD5 (1 seek)
182
183=item 1 Read the tag/size of the keys section (5 bytes)
184
185=item 1 Read the body of the keys location (144 bytes)
186
187=item 1 Go to the data section that corresponds to this transaction ID. (1 seek)
188
189=item 1 Read the tag/size of the data section (5 bytes)
190
191=item 1 Read the value for this data (N bytes)
192
193=item 1 Unlock the file
194
195=back
196
197Every additional level of indexing (if there are enough keys) requires an
198additional seek and the reading of 1029 additional bytes. If the value is
199blessed, an additional 1 seek and 9 + M bytes are read (where M is the length
200of the classname).
201
202Arrays are (currently) even worse because they're considered "funny hashes"
203with the length stored as just another key. This means that if you do any sort
204of lookup with a negative index, this entire process is performed twice - once
205for the length and once for the value.
206
1990c72d 207=head1 ACTUAL TESTS
208
209=head2 SPEED
210
211Obviously, DBM::Deep isn't going to be as fast as some C-based DBMs, such as
212the almighty I<BerkeleyDB>. But it makes up for it in features like true
213multi-level hash/array support, and cross-platform FTPable files. Even so,
214DBM::Deep is still pretty fast, and the speed stays fairly consistent, even
215with huge databases. Here is some test data:
216
217 Adding 1,000,000 keys to new DB file...
218
219 At 100 keys, avg. speed is 2,703 keys/sec
220 At 200 keys, avg. speed is 2,642 keys/sec
221 At 300 keys, avg. speed is 2,598 keys/sec
222 At 400 keys, avg. speed is 2,578 keys/sec
223 At 500 keys, avg. speed is 2,722 keys/sec
224 At 600 keys, avg. speed is 2,628 keys/sec
225 At 700 keys, avg. speed is 2,700 keys/sec
226 At 800 keys, avg. speed is 2,607 keys/sec
227 At 900 keys, avg. speed is 2,190 keys/sec
228 At 1,000 keys, avg. speed is 2,570 keys/sec
229 At 2,000 keys, avg. speed is 2,417 keys/sec
230 At 3,000 keys, avg. speed is 1,982 keys/sec
231 At 4,000 keys, avg. speed is 1,568 keys/sec
232 At 5,000 keys, avg. speed is 1,533 keys/sec
233 At 6,000 keys, avg. speed is 1,787 keys/sec
234 At 7,000 keys, avg. speed is 1,977 keys/sec
235 At 8,000 keys, avg. speed is 2,028 keys/sec
236 At 9,000 keys, avg. speed is 2,077 keys/sec
237 At 10,000 keys, avg. speed is 2,031 keys/sec
238 At 20,000 keys, avg. speed is 1,970 keys/sec
239 At 30,000 keys, avg. speed is 2,050 keys/sec
240 At 40,000 keys, avg. speed is 2,073 keys/sec
241 At 50,000 keys, avg. speed is 1,973 keys/sec
242 At 60,000 keys, avg. speed is 1,914 keys/sec
243 At 70,000 keys, avg. speed is 2,091 keys/sec
244 At 80,000 keys, avg. speed is 2,103 keys/sec
245 At 90,000 keys, avg. speed is 1,886 keys/sec
246 At 100,000 keys, avg. speed is 1,970 keys/sec
247 At 200,000 keys, avg. speed is 2,053 keys/sec
248 At 300,000 keys, avg. speed is 1,697 keys/sec
249 At 400,000 keys, avg. speed is 1,838 keys/sec
250 At 500,000 keys, avg. speed is 1,941 keys/sec
251 At 600,000 keys, avg. speed is 1,930 keys/sec
252 At 700,000 keys, avg. speed is 1,735 keys/sec
253 At 800,000 keys, avg. speed is 1,795 keys/sec
254 At 900,000 keys, avg. speed is 1,221 keys/sec
255 At 1,000,000 keys, avg. speed is 1,077 keys/sec
256
257This test was performed on a PowerMac G4 1gHz running Mac OS X 10.3.2 & Perl
2585.8.1, with an 80GB Ultra ATA/100 HD spinning at 7200RPM. The hash keys and
259values were between 6 - 12 chars in length. The DB file ended up at 210MB.
260Run time was 12 min 3 sec.
261
262=head2 MEMORY USAGE
263
b8370759 264One of the great things about L<DBM::Deep> is that it uses very little memory.
1990c72d 265Even with huge databases (1,000,000+ keys) you will not see much increased
b8370759 266memory on your process. L<DBM::Deep> relies solely on the filesystem for storing
1990c72d 267and fetching data. Here is output from I<top> before even opening a database
268handle:
269
270 PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
271 22831 root 11 0 2716 2716 1296 R 0.0 0.2 0:07 perl
272
273Basically the process is taking 2,716K of memory. And here is the same
274process after storing and fetching 1,000,000 keys:
275
276 PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
277 22831 root 14 0 2772 2772 1328 R 0.0 0.2 13:32 perl
278
279Notice the memory usage increased by only 56K. Test was performed on a 700mHz
280x86 box running Linux RedHat 7.2 & Perl 5.6.1.
281
d8db2929 282=cut