Commit | Line | Data |
463ee0b2 |
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | sdbm - Substitute DBM |
8 | or |
9 | Berkeley ndbm for Every UN*X[1] Made Simple |
10 | |
11 | Ozan (oz) Yigit |
12 | |
13 | The Guild of PD Software Toolmakers |
14 | Toronto - Canada |
15 | |
16 | oz@nexus.yorku.ca |
17 | |
18 | |
19 | |
20 | Implementation is the sincerest form of flattery. - L. Peter |
21 | Deutsch |
22 | |
23 | A The Clone of the ndbm library |
24 | |
25 | The sources accompanying this notice - sdbm - consti- |
26 | tute the first public release (Dec. 1990) of a complete |
27 | clone of the Berkeley UN*X ndbm library. The sdbm library is |
28 | meant to clone the proven functionality of ndbm as closely |
29 | as possible, including a few improvements. It is practical, |
30 | easy to understand, and compatible. The sdbm library is not |
31 | derived from any licensed, proprietary or copyrighted |
32 | software. |
33 | |
34 | The sdbm implementation is based on a 1978 algorithm |
35 | [Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''. |
36 | In the course of searching for a substitute for ndbm, I pro- |
37 | totyped three different external-hashing algorithms [Lar78, |
38 | Fag79, Lit80] and ultimately chose Larson's algorithm as a |
39 | basis of the sdbm implementation. The Bell Labs dbm (and |
40 | therefore ndbm) is based on an algorithm invented by Ken |
41 | Thompson, [Tho90, Tor87] and predates Larson's work. |
42 | |
43 | The sdbm programming interface is totally compatible |
44 | with ndbm and includes a slight improvement in database ini- |
45 | tialization. It is also expected to be binary-compatible |
46 | under most UN*X versions that support the ndbm library. |
47 | |
48 | The sdbm implementation shares the shortcomings of the |
49 | ndbm library, as a side effect of various simplifications to |
50 | the original Larson algorithm. It does produce holes in the |
51 | page file as it writes pages past the end of file. (Larson's |
52 | paper include a clever solution to this problem that is a |
53 | result of using the hash value directly as a block address.) |
54 | On the other hand, extensive tests seem to indicate that |
55 | sdbm creates fewer holes in general, and the resulting page- |
56 | files are smaller. The sdbm implementation is also faster |
57 | than ndbm in database creation. Unlike the ndbm, the sdbm |
58 | _________________________ |
59 | |
60 | [1] UN*X is not a trademark of any (dis)organization. |
61 | |
62 | |
63 | |
64 | |
65 | |
66 | |
67 | |
68 | |
69 | |
70 | - 2 - |
71 | |
72 | |
73 | store operation will not ``wander away'' trying to split its |
74 | data pages to insert a datum that cannot (due to elaborate |
75 | worst-case situations) be inserted. (It will fail after a |
76 | pre-defined number of attempts.) |
77 | |
78 | Important Compatibility Warning |
79 | |
80 | The sdbm and ndbm libraries cannot share databases: one |
81 | cannot read the (dir/pag) database created by the other. |
82 | This is due to the differences between the ndbm and sdbm |
83 | algorithms[2], and the hash functions used. It is easy to |
84 | convert between the dbm/ndbm databases and sdbm by ignoring |
85 | the index completely: see dbd, dbu etc. |
86 | |
87 | |
88 | Notice of Intellectual Property |
89 | |
90 | The entire sdbm library package, as authored by me, Ozan S. |
91 | Yigit, is hereby placed in the public domain. As such, the |
92 | author is not responsible for the consequences of use of |
93 | this software, no matter how awful, even if they arise from |
94 | defects in it. There is no expressed or implied warranty for |
95 | the sdbm library. |
96 | |
97 | Since the sdbm library package is in the public domain, |
98 | this original release or any additional public-domain |
99 | releases of the modified original cannot possibly (by defin- |
100 | ition) be withheld from you. Also by definition, You (singu- |
101 | lar) have all the rights to this code (including the right |
102 | to sell without permission, the right to hoard[3] and the |
103 | right to do other icky things as you see fit) but those |
104 | rights are also granted to everyone else. |
105 | |
106 | Please note that all previous distributions of this |
107 | software contained a copyright (which is now dropped) to |
108 | protect its origins and its current public domain status |
109 | against any possible claims and/or challenges. |
110 | |
111 | Acknowledgments |
112 | |
113 | Many people have been very helpful and supportive. A |
114 | partial list would necessarily include Rayan Zacherissen |
115 | (who contributed the man page, and also hacked a MMAP |
116 | _________________________ |
117 | |
118 | [2] Torek's discussion [Tor87] indicates that |
119 | dbm/ndbm implementations use the hash value to traverse |
120 | the radix trie differently than sdbm and as a result, |
121 | the page indexes are generated in different order. For |
122 | more information, send e-mail to the author. |
123 | [3] You cannot really hoard something that is avail- |
124 | able to the public at large, but try if it makes you |
125 | feel any better. |
126 | |
127 | |
128 | |
129 | |
130 | |
131 | |
132 | |
133 | |
134 | |
135 | |
136 | - 3 - |
137 | |
138 | |
139 | version of sdbm), Arnold Robbins, Chris Lewis, Bill David- |
140 | sen, Henry Spencer, Geoff Collyer, Rich Salz (who got me |
141 | started in the first place), Johannes Ruschein (who did the |
142 | minix port) and David Tilbrook. I thank you all. |
143 | |
144 | Distribution Manifest and Notes |
145 | |
146 | This distribution of sdbm includes (at least) the following: |
147 | |
148 | CHANGES change log |
149 | README this file. |
150 | biblio a small bibliography on external hashing |
151 | dba.c a crude (n/s)dbm page file analyzer |
152 | dbd.c a crude (n/s)dbm page file dumper (for conversion) |
153 | dbe.1 man page for dbe.c |
154 | dbe.c Janick's database editor |
155 | dbm.c a dbm library emulation wrapper for ndbm/sdbm |
156 | dbm.h header file for the above |
157 | dbu.c a crude db management utility |
158 | hash.c hashing function |
159 | makefile guess. |
160 | pair.c page-level routines (posted earlier) |
161 | pair.h header file for the above |
162 | readme.ms troff source for the README file |
163 | sdbm.3 man page |
164 | sdbm.c the real thing |
165 | sdbm.h header file for the above |
166 | tune.h place for tuning & portability thingies |
167 | util.c miscellaneous |
168 | |
169 | dbu is a simple database manipulation program[4] that |
170 | tries to look like Bell Labs' cbt utility. It is currently |
171 | incomplete in functionality. I use dbu to test out the rou- |
172 | tines: it takes (from stdin) tab separated key/value pairs |
173 | for commands like build or insert or takes keys for commands |
174 | like delete or look. |
175 | |
176 | dbu <build|creat|look|insert|cat|delete> dbmfile |
177 | |
178 | dba is a crude analyzer of dbm/sdbm/ndbm page files. It |
179 | scans the entire page file, reporting page level statistics, |
180 | and totals at the end. |
181 | |
182 | dbd is a crude dump program for dbm/ndbm/sdbm data- |
183 | bases. It ignores the bitmap, and dumps the data pages in |
184 | sequence. It can be used to create input for the dbu util- |
185 | ity. Note that dbd will skip any NULLs in the key and data |
186 | fields, thus is unsuitable to convert some peculiar |
187 | _________________________ |
188 | |
189 | [4] The dbd, dba, dbu utilities are quick hacks and |
190 | are not fit for production use. They were developed |
191 | late one night, just to test out sdbm, and convert some |
192 | databases. |
193 | |
194 | |
195 | |
196 | |
197 | |
198 | |
199 | |
200 | |
201 | |
202 | - 4 - |
203 | |
204 | |
205 | databases that insist in including the terminating null. |
206 | |
207 | I have also included a copy of the dbe (ndbm DataBase |
208 | Editor) by Janick Bergeron [janick@bnr.ca] for your pleas- |
209 | ure. You may find it more useful than the little dbu util- |
210 | ity. |
211 | |
212 | dbm.[ch] is a dbm library emulation on top of ndbm (and |
213 | hence suitable for sdbm). Written by Robert Elz. |
214 | |
215 | The sdbm library has been around in beta test for quite |
216 | a long time, and from whatever little feedback I received |
217 | (maybe no news is good news), I believe it has been func- |
218 | tioning without any significant problems. I would, of |
219 | course, appreciate all fixes and/or improvements. Portabil- |
220 | ity enhancements would especially be useful. |
221 | |
222 | Implementation Issues |
223 | |
224 | Hash functions: The algorithm behind sdbm implementa- |
225 | tion needs a good bit-scrambling hash function to be effec- |
226 | tive. I ran into a set of constants for a simple hash func- |
227 | tion that seem to help sdbm perform better than ndbm for |
228 | various inputs: |
229 | |
230 | /* |
231 | * polynomial conversion ignoring overflows |
232 | * 65599 nice. 65587 even better. |
233 | */ |
234 | long |
235 | dbm_hash(char *str, int len) { |
236 | register unsigned long n = 0; |
237 | |
238 | while (len--) |
239 | n = n * 65599 + *str++; |
240 | return n; |
241 | } |
242 | |
243 | There may be better hash functions for the purposes of |
244 | dynamic hashing. Try your favorite, and check the pagefile. |
245 | If it contains too many pages with too many holes, (in rela- |
246 | tion to this one for example) or if sdbm simply stops work- |
247 | ing (fails after SPLTMAX attempts to split) when you feed |
248 | your NEWS history file to it, you probably do not have a |
249 | good hashing function. If you do better (for different |
250 | types of input), I would like to know about the function you |
251 | use. |
252 | |
253 | Block sizes: It seems (from various tests on a few |
254 | machines) that a page file block size PBLKSIZ of 1024 is by |
255 | far the best for performance, but this also happens to limit |
256 | the size of a key/value pair. Depending on your needs, you |
257 | may wish to increase the page size, and also adjust PAIRMAX |
258 | (the maximum size of a key/value pair allowed: should always |
259 | |
260 | |
261 | |
262 | |
263 | |
264 | |
265 | |
266 | |
267 | |
268 | - 5 - |
269 | |
270 | |
271 | be at least three words smaller than PBLKSIZ.) accordingly. |
272 | The system-wide version of the library should probably be |
273 | configured with 1024 (distribution default), as this appears |
274 | to be sufficient for most common uses of sdbm. |
275 | |
276 | Portability |
277 | |
278 | This package has been tested in many different UN*Xes |
279 | even including minix, and appears to be reasonably portable. |
280 | This does not mean it will port easily to non-UN*X systems. |
281 | |
282 | Notes and Miscellaneous |
283 | |
284 | The sdbm is not a very complicated package, at least |
285 | not after you familiarize yourself with the literature on |
286 | external hashing. There are other interesting algorithms in |
287 | existence that ensure (approximately) single-read access to |
288 | a data value associated with any key. These are directory- |
289 | less schemes such as linear hashing [Lit80] (+ Larson varia- |
290 | tions), spiral storage [Mar79] or directory schemes such as |
291 | extensible hashing [Fag79] by Fagin et al. I do hope these |
292 | sources provide a reasonable playground for experimentation |
293 | with other algorithms. See the June 1988 issue of ACM Com- |
294 | puting Surveys [Enb88] for an excellent overview of the |
295 | field. |
296 | |
297 | References |
298 | |
299 | |
300 | [Lar78] |
301 | P.-A. Larson, ``Dynamic Hashing'', BIT, vol. 18, pp. |
302 | 184-201, 1978. |
303 | |
304 | [Tho90] |
305 | Ken Thompson, private communication, Nov. 1990 |
306 | |
307 | [Lit80] |
308 | W. Litwin, `` Linear Hashing: A new tool for file and |
309 | table addressing'', Proceedings of the 6th Conference on |
310 | Very Large Dabatases (Montreal), pp. 212-223, Very |
311 | Large Database Foundation, Saratoga, Calif., 1980. |
312 | |
313 | [Fag79] |
314 | R. Fagin, J. Nievergelt, N. Pippinger, and H. R. |
315 | Strong, ``Extendible Hashing - A Fast Access Method for |
316 | Dynamic Files'', ACM Trans. Database Syst., vol. 4, |
317 | no.3, pp. 315-344, Sept. 1979. |
318 | |
319 | [Wal84] |
320 | Rich Wales, ``Discussion of "dbm" data base system'', |
321 | USENET newsgroup unix.wizards, Jan. 1984. |
322 | |
323 | [Tor87] |
324 | Chris Torek, ``Re: dbm.a and ndbm.a archives'', |
325 | |
326 | |
327 | |
328 | |
329 | |
330 | |
331 | |
332 | |
333 | |
334 | - 6 - |
335 | |
336 | |
337 | USENET newsgroup comp.unix, 1987. |
338 | |
339 | [Mar79] |
340 | G. N. Martin, ``Spiral Storage: Incrementally Augment- |
341 | able Hash Addressed Storage'', Technical Report #27, |
342 | University of Varwick, Coventry, U.K., 1979. |
343 | |
344 | [Enb88] |
345 | R. J. Enbody and H. C. Du, ``Dynamic Hashing |
346 | Schemes'',ACM Computing Surveys, vol. 20, no. 2, pp. |
347 | 85-113, June 1988. |
348 | |
349 | |
350 | |
351 | |
352 | |
353 | |
354 | |
355 | |
356 | |
357 | |
358 | |
359 | |
360 | |
361 | |
362 | |
363 | |
364 | |
365 | |
366 | |
367 | |
368 | |
369 | |
370 | |
371 | |
372 | |
373 | |
374 | |
375 | |
376 | |
377 | |
378 | |
379 | |
380 | |
381 | |
382 | |
383 | |
384 | |
385 | |
386 | |
387 | |
388 | |
389 | |
390 | |
391 | |
392 | |
393 | |
394 | |
395 | |
396 | |