Commit | Line | Data |
463ee0b2 |
1 | .\" tbl | readme.ms | [tn]roff -ms | ... |
2 | .\" note the "C" (courier) and "CB" fonts: you will probably have to |
3 | .\" change these. |
4 | .\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $ |
5 | |
6 | .de P1 |
7 | .br |
8 | .nr dT 4 |
9 | .nf |
10 | .ft C |
11 | .sp .5 |
12 | .nr t \\n(dT*\\w'x'u |
13 | .ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu |
14 | .. |
15 | .de P2 |
16 | .br |
17 | .ft 1 |
18 | .br |
19 | .sp .5 |
20 | .br |
21 | .fi |
22 | .. |
23 | .\" CW uses the typewriter/courier font. |
24 | .de CW |
25 | \fC\\$1\\fP\\$2 |
26 | .. |
27 | |
28 | .\" Footnote numbering [by Henry Spencer] |
29 | .\" <text>\*f for a footnote number.. |
30 | .\" .FS |
31 | .\" \*F <footnote text> |
32 | .\" .FE |
33 | .\" |
34 | .ds f \\u\\s-2\\n+f\\s+2\\d |
35 | .nr f 0 1 |
36 | .ds F \\n+F. |
37 | .nr F 0 1 |
38 | |
39 | .ND |
40 | .LP |
41 | .TL |
42 | \fIsdbm\fP \(em Substitute DBM |
43 | .br |
44 | or |
45 | .br |
46 | Berkeley \fIndbm\fP for Every UN*X\** Made Simple |
47 | .AU |
48 | Ozan (oz) Yigit |
49 | .AI |
50 | The Guild of PD Software Toolmakers |
51 | Toronto - Canada |
52 | .sp |
53 | oz@nexus.yorku.ca |
54 | .LP |
55 | .FS |
56 | UN*X is not a trademark of any (dis)organization. |
57 | .FE |
58 | .sp 2 |
59 | \fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP |
60 | .SH |
61 | A The Clone of the \fIndbm\fP library |
62 | .PP |
63 | The sources accompanying this notice \(em \fIsdbm\fP \(em constitute |
64 | the first public release (Dec. 1990) of a complete clone of |
65 | the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to |
66 | clone the proven functionality of \fIndbm\fP as closely as possible, |
67 | including a few improvements. It is practical, easy to understand, and |
68 | compatible. |
69 | The \fIsdbm\fP library is not derived from any licensed, proprietary or |
70 | copyrighted software. |
71 | .PP |
72 | The \fIsdbm\fP implementation is based on a 1978 algorithm |
73 | [Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''. |
74 | In the course of searching for a substitute for \fIndbm\fP, I |
75 | prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80] |
76 | and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP |
77 | implementation. The Bell Labs |
78 | \fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by |
79 | Ken Thompson, [Tho90, Tor87] and predates Larson's work. |
80 | .PP |
81 | The \fIsdbm\fR programming interface is totally compatible |
82 | with \fIndbm\fP and includes a slight improvement in database initialization. |
83 | It is also expected to be binary-compatible under most UN*X versions that |
84 | support the \fIndbm\fP library. |
85 | .PP |
86 | The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP |
87 | library, as a side effect of various simplifications to the original Larson |
88 | algorithm. It does produce \fIholes\fP in the page file as it writes |
89 | pages past the end of file. (Larson's paper include a clever solution to |
90 | this problem that is a result of using the hash value directly as a block |
91 | address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP |
92 | creates fewer holes in general, and the resulting pagefiles are |
93 | smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP |
94 | in database creation. |
95 | Unlike the \fIndbm\fP, the \fIsdbm\fP |
96 | .CW store |
97 | operation will not ``wander away'' trying to split its |
98 | data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case |
99 | situations) be inserted. (It will fail after a pre-defined number of attempts.) |
100 | .SH |
101 | Important Compatibility Warning |
102 | .PP |
103 | The \fIsdbm\fP and \fIndbm\fP |
104 | libraries \fIcannot\fP share databases: one cannot read the (dir/pag) |
105 | database created by the other. This is due to the differences |
106 | between the \fIndbm\fP and \fIsdbm\fP algorithms\**, |
107 | .FS |
108 | Torek's discussion [Tor87] |
109 | indicates that \fIdbm/ndbm\fP implementations use the hash |
110 | value to traverse the radix trie differently than \fIsdbm\fP |
111 | and as a result, the page indexes are generated in \fIdifferent\fP order. |
112 | For more information, send e-mail to the author. |
113 | .FE |
114 | and the hash functions |
115 | used. |
116 | It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP |
117 | by ignoring the index completely: see |
118 | .CW dbd , |
119 | .CW dbu |
120 | etc. |
121 | .R |
122 | .LP |
123 | .SH |
124 | Notice of Intellectual Property |
125 | .LP |
126 | \fIThe entire\fP sdbm \fIlibrary package, as authored by me,\fP Ozan S. Yigit, |
127 | \fIis hereby placed in the public domain.\fP As such, the author is not |
128 | responsible for the consequences of use of this software, no matter how |
129 | awful, even if they arise from defects in it. There is no expressed or |
130 | implied warranty for the \fIsdbm\fP library. |
131 | .PP |
132 | Since the \fIsdbm\fP |
133 | library package is in the public domain, this \fIoriginal\fP |
134 | release or any additional public-domain releases of the modified original |
135 | cannot possibly (by definition) be withheld from you. Also by definition, |
136 | You (singular) have all the rights to this code (including the right to |
137 | sell without permission, the right to hoard\** |
138 | .FS |
139 | You cannot really hoard something that is available to the public at |
140 | large, but try if it makes you feel any better. |
141 | .FE |
142 | and the right to do other icky things as |
143 | you see fit) but those rights are also granted to everyone else. |
144 | .PP |
145 | Please note that all previous distributions of this software contained |
146 | a copyright (which is now dropped) to protect its |
147 | origins and its current public domain status against any possible claims |
148 | and/or challenges. |
149 | .SH |
150 | Acknowledgments |
151 | .PP |
152 | Many people have been very helpful and supportive. A partial list would |
153 | necessarily include Rayan Zacherissen (who contributed the man page, |
154 | and also hacked a MMAP version of \fIsdbm\fP), |
155 | Arnold Robbins, Chris Lewis, |
156 | Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started |
157 | in the first place), Johannes Ruschein |
158 | (who did the minix port) and David Tilbrook. I thank you all. |
159 | .SH |
160 | Distribution Manifest and Notes |
161 | .LP |
162 | This distribution of \fIsdbm\fP includes (at least) the following: |
163 | .P1 |
164 | CHANGES change log |
165 | README this file. |
166 | biblio a small bibliography on external hashing |
167 | dba.c a crude (n/s)dbm page file analyzer |
168 | dbd.c a crude (n/s)dbm page file dumper (for conversion) |
169 | dbe.1 man page for dbe.c |
170 | dbe.c Janick's database editor |
171 | dbm.c a dbm library emulation wrapper for ndbm/sdbm |
172 | dbm.h header file for the above |
173 | dbu.c a crude db management utility |
174 | hash.c hashing function |
175 | makefile guess. |
176 | pair.c page-level routines (posted earlier) |
177 | pair.h header file for the above |
178 | readme.ms troff source for the README file |
179 | sdbm.3 man page |
180 | sdbm.c the real thing |
181 | sdbm.h header file for the above |
182 | tune.h place for tuning & portability thingies |
183 | util.c miscellaneous |
184 | .P2 |
185 | .PP |
186 | .CW dbu |
187 | is a simple database manipulation program\** that tries to look |
188 | .FS |
189 | The |
190 | .CW dbd , |
191 | .CW dba , |
192 | .CW dbu |
193 | utilities are quick hacks and are not fit for production use. They were |
194 | developed late one night, just to test out \fIsdbm\fP, and convert some |
195 | databases. |
196 | .FE |
197 | like Bell Labs' |
198 | .CW cbt |
199 | utility. It is currently incomplete in functionality. |
200 | I use |
201 | .CW dbu |
202 | to test out the routines: it takes (from stdin) tab separated |
203 | key/value pairs for commands like |
204 | .CW build |
205 | or |
206 | .CW insert |
207 | or takes keys for |
208 | commands like |
209 | .CW delete |
210 | or |
211 | .CW look . |
212 | .P1 |
213 | dbu <build|creat|look|insert|cat|delete> dbmfile |
214 | .P2 |
215 | .PP |
216 | .CW dba |
217 | is a crude analyzer of \fIdbm/sdbm/ndbm\fP |
218 | page files. It scans the entire |
219 | page file, reporting page level statistics, and totals at the end. |
220 | .PP |
221 | .CW dbd |
222 | is a crude dump program for \fIdbm/ndbm/sdbm\fP |
223 | databases. It ignores the |
224 | bitmap, and dumps the data pages in sequence. It can be used to create |
225 | input for the |
226 | .CW dbu |
227 | utility. |
228 | Note that |
229 | .CW dbd |
230 | will skip any NULLs in the key and data |
231 | fields, thus is unsuitable to convert some peculiar databases that |
232 | insist in including the terminating null. |
233 | .PP |
234 | I have also included a copy of the |
235 | .CW dbe |
236 | (\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for |
237 | your pleasure. You may find it more useful than the little |
238 | .CW dbu |
239 | utility. |
240 | .PP |
241 | .CW dbm.[ch] |
242 | is a \fIdbm\fP library emulation on top of \fIndbm\fP |
243 | (and hence suitable for \fIsdbm\fP). Written by Robert Elz. |
244 | .PP |
245 | The \fIsdbm\fP |
246 | library has been around in beta test for quite a long time, and from whatever |
247 | little feedback I received (maybe no news is good news), I believe it has been |
248 | functioning without any significant problems. I would, of course, appreciate |
249 | all fixes and/or improvements. Portability enhancements would especially be |
250 | useful. |
251 | .SH |
252 | Implementation Issues |
253 | .PP |
254 | Hash functions: |
255 | The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling |
256 | hash function to be effective. I ran into a set of constants for a simple |
257 | hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP |
258 | for various inputs: |
259 | .P1 |
260 | /* |
261 | * polynomial conversion ignoring overflows |
262 | * 65599 nice. 65587 even better. |
263 | */ |
264 | long |
265 | dbm_hash(char *str, int len) { |
266 | register unsigned long n = 0; |
267 | |
268 | while (len--) |
269 | n = n * 65599 + *str++; |
270 | return n; |
271 | } |
272 | .P2 |
273 | .PP |
274 | There may be better hash functions for the purposes of dynamic hashing. |
275 | Try your favorite, and check the pagefile. If it contains too many pages |
276 | with too many holes, (in relation to this one for example) or if |
277 | \fIsdbm\fP |
278 | simply stops working (fails after |
279 | .CW SPLTMAX |
280 | attempts to split) when you feed your |
281 | NEWS |
282 | .CW history |
283 | file to it, you probably do not have a good hashing function. |
284 | If you do better (for different types of input), I would like to know |
285 | about the function you use. |
286 | .PP |
287 | Block sizes: It seems (from various tests on a few machines) that a page |
288 | file block size |
289 | .CW PBLKSIZ |
290 | of 1024 is by far the best for performance, but |
291 | this also happens to limit the size of a key/value pair. Depending on your |
292 | needs, you may wish to increase the page size, and also adjust |
293 | .CW PAIRMAX |
294 | (the maximum size of a key/value pair allowed: should always be at least |
295 | three words smaller than |
296 | .CW PBLKSIZ .) |
297 | accordingly. The system-wide version of the library |
298 | should probably be |
299 | configured with 1024 (distribution default), as this appears to be sufficient |
300 | for most common uses of \fIsdbm\fP. |
301 | .SH |
302 | Portability |
303 | .PP |
304 | This package has been tested in many different UN*Xes even including minix, |
305 | and appears to be reasonably portable. This does not mean it will port |
306 | easily to non-UN*X systems. |
307 | .SH |
308 | Notes and Miscellaneous |
309 | .PP |
310 | The \fIsdbm\fP is not a very complicated package, at least not after you |
311 | familiarize yourself with the literature on external hashing. There are |
312 | other interesting algorithms in existence that ensure (approximately) |
313 | single-read access to a data value associated with any key. These are |
314 | directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson |
315 | variations), \fIspiral storage\fP [Mar79] or directory schemes such as |
316 | \fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources |
317 | provide a reasonable playground for experimentation with other algorithms. |
318 | See the June 1988 issue of ACM Computing Surveys [Enb88] for an |
319 | excellent overview of the field. |
320 | .PG |
321 | .SH |
322 | References |
323 | .LP |
324 | .IP [Lar78] 4m |
325 | P.-A. Larson, |
326 | ``Dynamic Hashing'', \fIBIT\fP, vol. 18, pp. 184-201, 1978. |
327 | .IP [Tho90] 4m |
328 | Ken Thompson, \fIprivate communication\fP, Nov. 1990 |
329 | .IP [Lit80] 4m |
330 | W. Litwin, |
331 | `` Linear Hashing: A new tool for file and table addressing'', |
332 | \fIProceedings of the 6th Conference on Very Large Dabatases (Montreal)\fP, |
333 | pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980. |
334 | .IP [Fag79] 4m |
335 | R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong, |
336 | ``Extendible Hashing - A Fast Access Method for Dynamic Files'', |
337 | \fIACM Trans. Database Syst.\fP, vol. 4, no.3, pp. 315-344, Sept. 1979. |
338 | .IP [Wal84] 4m |
339 | Rich Wales, |
340 | ``Discussion of "dbm" data base system'', \fIUSENET newsgroup unix.wizards\fP, |
341 | Jan. 1984. |
342 | .IP [Tor87] 4m |
343 | Chris Torek, |
344 | ``Re: dbm.a and ndbm.a archives'', \fIUSENET newsgroup comp.unix\fP, |
345 | 1987. |
346 | .IP [Mar79] 4m |
347 | G. N. Martin, |
348 | ``Spiral Storage: Incrementally Augmentable Hash Addressed Storage'', |
349 | \fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979. |
350 | .IP [Enb88] 4m |
351 | R. J. Enbody and H. C. Du, |
352 | ``Dynamic Hashing Schemes'',\fIACM Computing Surveys\fP, |
353 | vol. 20, no. 2, pp. 85-113, June 1988. |