9 Berkeley ndbm for Every UN*X[1] Made Simple
13 The Guild of PD Software Toolmakers
20 Implementation is the sincerest form of flattery. - L. Peter
23 A The Clone of the ndbm library
25 The sources accompanying this notice - sdbm - consti-
26 tute the first public release (Dec. 1990) of a complete
27 clone of the Berkeley UN*X ndbm library. The sdbm library is
28 meant to clone the proven functionality of ndbm as closely
29 as possible, including a few improvements. It is practical,
30 easy to understand, and compatible. The sdbm library is not
31 derived from any licensed, proprietary or copyrighted
34 The sdbm implementation is based on a 1978 algorithm
35 [Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.
36 In the course of searching for a substitute for ndbm, I pro-
37 totyped three different external-hashing algorithms [Lar78,
38 Fag79, Lit80] and ultimately chose Larson's algorithm as a
39 basis of the sdbm implementation. The Bell Labs dbm (and
40 therefore ndbm) is based on an algorithm invented by Ken
41 Thompson, [Tho90, Tor87] and predates Larson's work.
43 The sdbm programming interface is totally compatible
44 with ndbm and includes a slight improvement in database ini-
45 tialization. It is also expected to be binary-compatible
46 under most UN*X versions that support the ndbm library.
48 The sdbm implementation shares the shortcomings of the
49 ndbm library, as a side effect of various simplifications to
50 the original Larson algorithm. It does produce holes in the
51 page file as it writes pages past the end of file. (Larson's
52 paper include a clever solution to this problem that is a
53 result of using the hash value directly as a block address.)
54 On the other hand, extensive tests seem to indicate that
55 sdbm creates fewer holes in general, and the resulting page-
56 files are smaller. The sdbm implementation is also faster
57 than ndbm in database creation. Unlike the ndbm, the sdbm
58 _________________________
60 [1] UN*X is not a trademark of any (dis)organization.
73 store operation will not ``wander away'' trying to split its
74 data pages to insert a datum that cannot (due to elaborate
75 worst-case situations) be inserted. (It will fail after a
76 pre-defined number of attempts.)
78 Important Compatibility Warning
80 The sdbm and ndbm libraries cannot share databases: one
81 cannot read the (dir/pag) database created by the other.
82 This is due to the differences between the ndbm and sdbm
83 algorithms[2], and the hash functions used. It is easy to
84 convert between the dbm/ndbm databases and sdbm by ignoring
85 the index completely: see dbd, dbu etc.
88 Notice of Intellectual Property
90 The entire sdbm library package, as authored by me, Ozan S.
91 Yigit, is hereby placed in the public domain. As such, the
92 author is not responsible for the consequences of use of
93 this software, no matter how awful, even if they arise from
94 defects in it. There is no expressed or implied warranty for
97 Since the sdbm library package is in the public domain,
98 this original release or any additional public-domain
99 releases of the modified original cannot possibly (by defin-
100 ition) be withheld from you. Also by definition, You (singu-
101 lar) have all the rights to this code (including the right
102 to sell without permission, the right to hoard[3] and the
103 right to do other icky things as you see fit) but those
104 rights are also granted to everyone else.
106 Please note that all previous distributions of this
107 software contained a copyright (which is now dropped) to
108 protect its origins and its current public domain status
109 against any possible claims and/or challenges.
113 Many people have been very helpful and supportive. A
114 partial list would necessarily include Rayan Zacherissen
115 (who contributed the man page, and also hacked a MMAP
116 _________________________
118 [2] Torek's discussion [Tor87] indicates that
119 dbm/ndbm implementations use the hash value to traverse
120 the radix trie differently than sdbm and as a result,
121 the page indexes are generated in different order. For
122 more information, send e-mail to the author.
123 [3] You cannot really hoard something that is avail-
124 able to the public at large, but try if it makes you
139 version of sdbm), Arnold Robbins, Chris Lewis, Bill David-
140 sen, Henry Spencer, Geoff Collyer, Rich Salz (who got me
141 started in the first place), Johannes Ruschein (who did the
142 minix port) and David Tilbrook. I thank you all.
144 Distribution Manifest and Notes
146 This distribution of sdbm includes (at least) the following:
150 biblio a small bibliography on external hashing
151 dba.c a crude (n/s)dbm page file analyzer
152 dbd.c a crude (n/s)dbm page file dumper (for conversion)
153 dbe.1 man page for dbe.c
154 dbe.c Janick's database editor
155 dbm.c a dbm library emulation wrapper for ndbm/sdbm
156 dbm.h header file for the above
157 dbu.c a crude db management utility
158 hash.c hashing function
160 pair.c page-level routines (posted earlier)
161 pair.h header file for the above
162 readme.ms troff source for the README file
164 sdbm.c the real thing
165 sdbm.h header file for the above
166 tune.h place for tuning & portability thingies
169 dbu is a simple database manipulation program[4] that
170 tries to look like Bell Labs' cbt utility. It is currently
171 incomplete in functionality. I use dbu to test out the rou-
172 tines: it takes (from stdin) tab separated key/value pairs
173 for commands like build or insert or takes keys for commands
176 dbu <build|creat|look|insert|cat|delete> dbmfile
178 dba is a crude analyzer of dbm/sdbm/ndbm page files. It
179 scans the entire page file, reporting page level statistics,
180 and totals at the end.
182 dbd is a crude dump program for dbm/ndbm/sdbm data-
183 bases. It ignores the bitmap, and dumps the data pages in
184 sequence. It can be used to create input for the dbu util-
185 ity. Note that dbd will skip any NULLs in the key and data
186 fields, thus is unsuitable to convert some peculiar
187 _________________________
189 [4] The dbd, dba, dbu utilities are quick hacks and
190 are not fit for production use. They were developed
191 late one night, just to test out sdbm, and convert some
205 databases that insist in including the terminating null.
207 I have also included a copy of the dbe (ndbm DataBase
208 Editor) by Janick Bergeron [janick@bnr.ca] for your pleas-
209 ure. You may find it more useful than the little dbu util-
212 dbm.[ch] is a dbm library emulation on top of ndbm (and
213 hence suitable for sdbm). Written by Robert Elz.
215 The sdbm library has been around in beta test for quite
216 a long time, and from whatever little feedback I received
217 (maybe no news is good news), I believe it has been func-
218 tioning without any significant problems. I would, of
219 course, appreciate all fixes and/or improvements. Portabil-
220 ity enhancements would especially be useful.
222 Implementation Issues
224 Hash functions: The algorithm behind sdbm implementa-
225 tion needs a good bit-scrambling hash function to be effec-
226 tive. I ran into a set of constants for a simple hash func-
227 tion that seem to help sdbm perform better than ndbm for
231 * polynomial conversion ignoring overflows
232 * 65599 nice. 65587 even better.
235 dbm_hash(char *str, int len) {
236 register unsigned long n = 0;
239 n = n * 65599 + *str++;
243 There may be better hash functions for the purposes of
244 dynamic hashing. Try your favorite, and check the pagefile.
245 If it contains too many pages with too many holes, (in rela-
246 tion to this one for example) or if sdbm simply stops work-
247 ing (fails after SPLTMAX attempts to split) when you feed
248 your NEWS history file to it, you probably do not have a
249 good hashing function. If you do better (for different
250 types of input), I would like to know about the function you
253 Block sizes: It seems (from various tests on a few
254 machines) that a page file block size PBLKSIZ of 1024 is by
255 far the best for performance, but this also happens to limit
256 the size of a key/value pair. Depending on your needs, you
257 may wish to increase the page size, and also adjust PAIRMAX
258 (the maximum size of a key/value pair allowed: should always
271 be at least three words smaller than PBLKSIZ.) accordingly.
272 The system-wide version of the library should probably be
273 configured with 1024 (distribution default), as this appears
274 to be sufficient for most common uses of sdbm.
278 This package has been tested in many different UN*Xes
279 even including minix, and appears to be reasonably portable.
280 This does not mean it will port easily to non-UN*X systems.
282 Notes and Miscellaneous
284 The sdbm is not a very complicated package, at least
285 not after you familiarize yourself with the literature on
286 external hashing. There are other interesting algorithms in
287 existence that ensure (approximately) single-read access to
288 a data value associated with any key. These are directory-
289 less schemes such as linear hashing [Lit80] (+ Larson varia-
290 tions), spiral storage [Mar79] or directory schemes such as
291 extensible hashing [Fag79] by Fagin et al. I do hope these
292 sources provide a reasonable playground for experimentation
293 with other algorithms. See the June 1988 issue of ACM Com-
294 puting Surveys [Enb88] for an excellent overview of the
301 P.-A. Larson, ``Dynamic Hashing'', BIT, vol. 18, pp.
305 Ken Thompson, private communication, Nov. 1990
308 W. Litwin, `` Linear Hashing: A new tool for file and
309 table addressing'', Proceedings of the 6th Conference on
310 Very Large Dabatases (Montreal), pp. 212-223, Very
311 Large Database Foundation, Saratoga, Calif., 1980.
314 R. Fagin, J. Nievergelt, N. Pippinger, and H. R.
315 Strong, ``Extendible Hashing - A Fast Access Method for
316 Dynamic Files'', ACM Trans. Database Syst., vol. 4,
317 no.3, pp. 315-344, Sept. 1979.
320 Rich Wales, ``Discussion of "dbm" data base system'',
321 USENET newsgroup unix.wizards, Jan. 1984.
324 Chris Torek, ``Re: dbm.a and ndbm.a archives'',
337 USENET newsgroup comp.unix, 1987.
340 G. N. Martin, ``Spiral Storage: Incrementally Augment-
341 able Hash Addressed Storage'', Technical Report #27,
342 University of Varwick, Coventry, U.K., 1979.
345 R. J. Enbody and H. C. Du, ``Dynamic Hashing
346 Schemes'',ACM Computing Surveys, vol. 20, no. 2, pp.