Remove tmon.out in make clean
[p5sagit/p5-mst-13.2.git] / lib / Memoize / README
CommitLineData
a0cb3900 1
2Name: Memoize
3What: Transparently speed up functions by caching return values.
4Version: 0.51
5Author: Mark-Jason Dominus (mjd-perl-memoize+@plover.com)
6
7################################################################
8
9How to build me:
10
11 perl Makefile.PL
12 make
13 make test
14
15There's a very small chance that the tests in speed.t and
16expire_module_t.t might fail because of clock skew or bizarre system
17load conditions. If the tests there fail, rerun them and see if the
18problem persists.
19
20If the tests work,
21
22 make install
23
24If not, please send me a report that mentions which tests failed.
25The address is: mjd-perl-memoize+@plover.com.
26
27################################################################
28What's new since 0.49:
29
30Just a maintenance release. I made the tests a little more robust,
31and I included the Memoization article that I forgot to put into 0.48.
32
33################################################################
34What's new since 0.48:
35
36You can now expire data from the memoization cache according to any
37expiration policy you desire. A sample policy is provided in the
38Memoize::Expire module. It supports expiration of items that have
39been in the cache a certain number of seconds and items that have been
40accessed a certain number of times. When you call a memoized
41function, and Memoize discovers that a cache item has expired, it
42calls the real function and stores the result in the cache, just as if
43the data had not been in the cache in the first place.
44
45Many people asked for a cache expiration feature, and some people even
46sent patches. Thanks for the patches! But I did not accept them,
47because they all added the expiration stuff into the module, and I was
48sure that this was a bad way to do it. Everyone had a different idea
49of what useful expiration behavior was, so I foresaw an endless series
50of creeeping features and an expiration mechansim that got more and
51more and more complicated and slower and slower and slower.
52
53The new expiration policy mechanism makes use of the TIE feature. You
54write a cache policy module ( which might be very simple) and use the
55TIE feature to insert it between memoize and the real cache. The
56Memoize::Expire module. included in this package, is a useful example
57of this that might satisfy many people. The documentation for that
58module includes an even simpler module for those who would like to
59implement their own expiration policies.
60
61Big win: If you don't use the expiration feature, you don't pay for
62it. Memoize 0.49 with expiration turned off runs *exactly* as fast as
63Memoize 0.48 did. Not one line of code has been changed.
64
65Moral of the story: Sometimes, there is a Right Way to Do Things that
66really is better than the obvious way. It might not be obvious at
67first, and sometimes you have to make people wait for features so that
68the Right Way to Do Things can make itself known.
69
70Many thanks to Mike Cariaso for helping me figure out The Right Way to
71Do Things.
72
73Also: If you try to use ODBM_File, NDBM_File, SDBM_File, GDBM_File, or
74DB_File for the LIST_CACHE, you get an error right away, because those
75kinds of files will only store strings. Thanks to Jonathan Roy for
76suggesting this. If you want to store list values in a persistent
77cache, try Memoize::Storable.
78
79################################################################
80
81What's new since 0.46:
82
83Caching of function return values into NDBM files is now supported.
84You can cache function return values into Memoize::AnyDBM files, which
85is a pseudo-module that selects the `best' available DBM
86implementation.
87
88Bug fix: Prototyped functions are now memoized correctly; memoizing
89used to remove the prototype and issue a warning. Also new tests for
90this feature. (Thanks Alex Dudkevich)
91
92New test suites for SDBM and NDBM caching and prototyped functions.
93Various small fixes in the test suite.
94Various documentation enhancements and fixes.
95
96################################################################
97
98What's new since 0.45:
99
100Now has an interface to `Storable'. This wasn't formerly possible,
101because the base package can only store caches via modules that
102present a tied hash interface, and `Storable' doesn't. Solution:
103Memoize::Storable is a tied hash interface to `Storable'.
104
105################################################################
106
107What's new since 0.06:
108
109Storage of cached function return values in a static file is now
110tentatively supported. `memoize' now accepts new options SCALAR_CACHE
111and LIST_CACHE to specify the destination and protocol for saving
112cached values to disk.
113
114Consider these features alpha, and please report bugs to
115mjd-perl-memoize@plover.com. The beta version is awaiting a more
116complete test suite.
117
118Much new documentation to support all this.
119
120################################################################
121
122What's new since 0.05:
123
124Calling syntax is now
125
126 memoize(function, OPTION1 => VALUE1, ...)
127
128instead of
129
130 memoize(function, { OPTION1 => VALUE1, ... })
131
132
133Functions that return lists can now be memoized.
134
135New tests for list-returning functions and their normalizers.
136
137Various documentation changes.
138
139Return value from `unmemoize' is now the resulting unmemoized
140function, instead of the constant `1'. It was already docmuented to
141do so.
142
143################################################################
144
145
146=head1 NAME
147
148Memoize - Make your functions faster by trading space for time
149
150=head1 SYNOPSIS
151
152 use Memoize;
153 memoize('slow_function');
154 slow_function(arguments); # Is faster than it was before
155
156
157This is normally all you need to know. However, many options are available:
158
159 memoize(function, options...);
160
161Options include:
162
163 NORMALIZER => function
164 INSTALL => new_name
165
166 SCALAR_CACHE => 'MEMORY'
167 SCALAR_CACHE => ['TIE', Module, arguments...]
168 SCALAR_CACHE => 'FAULT'
169 SCALAR_CACHE => 'MERGE'
170
171 LIST_CACHE => 'MEMORY'
172 LIST_CACHE => ['TIE', Module, arguments...]
173 LIST_CACHE => 'FAULT'
174 LIST_CACHE => 'MERGE'
175
176
177=head1 DESCRIPTION
178
179`Memoizing' a function makes it faster by trading space for time. It
180does this by caching the return values of the function in a table.
181If you call the function again with the same arguments, C<memoize>
182jmups in and gives you the value out of the table, instead of letting
183the function compute the value all over again.
184
185Here is an extreme example. Consider the Fibonacci sequence, defined
186by the following function:
187
188 # Compute Fibonacci numbers
189 sub fib {
190 my $n = shift;
191 return $n if $n < 2;
192 fib($n-1) + fib($n-2);
193 }
194
195This function is very slow. Why? To compute fib(14), it first wants
196to compute fib(13) and fib(12), and add the results. But to compute
197fib(13), it first has to compute fib(12) and fib(11), and then it
198comes back and computes fib(12) all over again even though the answer
199is the same. And both of the times that it wants to compute fib(12),
200it has to compute fib(11) from scratch, and then it has to do it
201again each time it wants to compute fib(13). This function does so
202much recomputing of old results that it takes a really long time to
203run---fib(14) makes 1,200 extra recursive calls to itself, to compute
204and recompute things that it already computed.
205
206This function is a good candidate for memoization. If you memoize the
207`fib' function above, it will compute fib(14) exactly once, the first
208time it needs to, and then save the result in a table. Then if you
209ask for fib(14) again, it gives you the result out of the table.
210While computing fib(14), instead of computing fib(12) twice, it does
211it once; the second time it needs the value it gets it from the table.
212It doesn't compute fib(11) four times; it computes it once, getting it
213from the table the next three times. Instead of making 1,200
214recursive calls to `fib', it makes 15. This makes the function about
215150 times faster.
216
217You could do the memoization yourself, by rewriting the function, like
218this:
219
220 # Compute Fibonacci numbers, memoized version
221 { my @fib;
222 sub fib {
223 my $n = shift;
224 return $fib[$n] if defined $fib[$n];
225 return $fib[$n] = $n if $n < 2;
226 $fib[$n] = fib($n-1) + fib($n-2);
227 }
228 }
229
230Or you could use this module, like this:
231
232 use Memoize;
233 memoize('fib');
234
235 # Rest of the fib function just like the original version.
236
237This makes it easy to turn memoizing on and off.
238
239Here's an even simpler example: I wrote a simple ray tracer; the
240program would look in a certain direction, figure out what it was
241looking at, and then convert the `color' value (typically a string
242like `red') of that object to a red, green, and blue pixel value, like
243this:
244
245 for ($direction = 0; $direction < 300; $direction++) {
246 # Figure out which object is in direction $direction
247 $color = $object->{color};
248 ($r, $g, $b) = @{&ColorToRGB($color)};
249 ...
250 }
251
252Since there are relatively few objects in a picture, there are only a
253few colors, which get looked up over and over again. Memoizing
254C<ColorToRGB> speeded up the program by several percent.
255
256=head1 DETAILS
257
258This module exports exactly one function, C<memoize>. The rest of the
259functions in this package are None of Your Business.
260
261You should say
262
263 memoize(function)
264
265where C<function> is the name of the function you want to memoize, or
266a reference to it. C<memoize> returns a reference to the new,
267memoized version of the function, or C<undef> on a non-fatal error.
268At present, there are no non-fatal errors, but there might be some in
269the future.
270
271If C<function> was the name of a function, then C<memoize> hides the
272old version and installs the new memoized version under the old name,
273so that C<&function(...)> actually invokes the memoized version.
274
275=head1 OPTIONS
276
277There are some optional options you can pass to C<memoize> to change
278the way it behaves a little. To supply options, invoke C<memoize>
279like this:
280
281 memoize(function, NORMALIZER => function,
282 INSTALL => newname,
283 SCALAR_CACHE => option,
284 LIST_CACHE => option
285 );
286
287Each of these options is optional; you can include some, all, or none
288of them.
289
290=head2 INSTALL
291
292If you supply a function name with C<INSTALL>, memoize will install
293the new, memoized version of the function under the name you give.
294For example,
295
296 memoize('fib', INSTALL => 'fastfib')
297
298installs the memoized version of C<fib> as C<fastfib>; without the
299C<INSTALL> option it would have replaced the old C<fib> with the
300memoized version.
301
302To prevent C<memoize> from installing the memoized version anywhere, use
303C<INSTALL =E<gt> undef>.
304
305=head2 NORMALIZER
306
307Suppose your function looks like this:
308
309 # Typical call: f('aha!', A => 11, B => 12);
310 sub f {
311 my $a = shift;
312 my %hash = @_;
313 $hash{B} ||= 2; # B defaults to 2
314 $hash{C} ||= 7; # C defaults to 7
315
316 # Do something with $a, %hash
317 }
318
319Now, the following calls to your function are all completely equivalent:
320
321 f(OUCH);
322 f(OUCH, B => 2);
323 f(OUCH, C => 7);
324 f(OUCH, B => 2, C => 7);
325 f(OUCH, C => 7, B => 2);
326 (etc.)
327
328However, unless you tell C<Memoize> that these calls are equivalent,
329it will not know that, and it will compute the values for these
330invocations of your function separately, and store them separately.
331
332To prevent this, supply a C<NORMALIZER> function that turns the
333program arguments into a string in a way that equivalent arguments
334turn into the same string. A C<NORMALIZER> function for C<f> above
335might look like this:
336
337 sub normalize_f {
338 my $a = shift;
339 my %hash = @_;
340 $hash{B} ||= 2;
341 $hash{C} ||= 7;
342
343 join($;, $a, map ($_ => $hash{$_}) sort keys %hash);
344 }
345
346Each of the argument lists above comes out of the C<normalize_f>
347function looking exactly the same, like this:
348
349 OUCH^\B^\2^\C^\7
350
351You would tell C<Memoize> to use this normalizer this way:
352
353 memoize('f', NORMALIZER => 'normalize_f');
354
355C<memoize> knows that if the normalized version of the arguments is
356the same for two argument lists, then it can safely look up the value
357that it computed for one argument list and return it as the result of
358calling the function with the other argument list, even if the
359argument lists look different.
360
361The default normalizer just concatenates the arguments with C<$;> in
362between. This always works correctly for functions with only one
363argument, and also when the arguments never contain C<$;> (which is
364normally character #28, control-\. ) However, it can confuse certain
365argument lists:
366
367 normalizer("a\034", "b")
368 normalizer("a", "\034b")
369 normalizer("a\034\034b")
370
371for example.
372
373The calling context of the function (scalar or list context) is
374propagated to the normalizer. This means that if the memoized
375function will treat its arguments differently in list context than it
376would in scalar context, you can have the normalizer function select
377its behavior based on the results of C<wantarray>. Even if called in
378a list context, a normalizer should still return a single string.
379
380=head2 C<SCALAR_CACHE>, C<LIST_CACHE>
381
382Normally, C<Memoize> caches your function's return values into an
383ordinary Perl hash variable. However, you might like to have the
384values cached on the disk, so that they persist from one run of your
385program to the next, or you might like to associate some other
386interesting semantics with the cached values.
387
388There's a slight complication under the hood of C<Memoize>: There are
389actually I<two> caches, one for scalar values and one for list values.
390When your function is called in scalar context, its return value is
391cached in one hash, and when your function is called in list context,
392its value is cached in the other hash. You can control the caching
393behavior of both contexts independently with these options.
394
395The argument to C<LIST_CACHE> or C<SCALAR_CACHE> must either be one of
396the following four strings:
397
398 MEMORY
399 TIE
400 FAULT
401 MERGE
402
403or else it must be a reference to a list whose first element is one of
404these four strings, such as C<[TIE, arguments...]>.
405
406=over 4
407
408=item C<MEMORY>
409
410C<MEMORY> means that return values from the function will be cached in
411an ordinary Perl hash variable. The hash variable will not persist
412after the program exits. This is the default.
413
414=item C<TIE>
415
416C<TIE> means that the function's return values will be cached in a
417tied hash. A tied hash can have any semantics at all. It is
418typically tied to an on-disk database, so that cached values are
419stored in the database and retrieved from it again when needed, and
420the disk file typically persists after your pogram has exited.
421
422If C<TIE> is specified as the first element of a list, the remaining
423list elements are taken as arguments to the C<tie> call that sets up
424the tied hash. For example,
425
426 SCALAR_CACHE => [TIE, DB_File, $filename, O_RDWR | O_CREAT, 0666]
427
428says to tie the hash into the C<DB_File> package, and to pass the
429C<$filename>, C<O_RDWR | O_CREAT>, and C<0666> arguments to the C<tie>
430call. This has the effect of storing the cache in a C<DB_File>
431database whose name is in C<$filename>.
432
433Other typical uses of C<TIE>:
434
435 LIST_CACHE => [TIE, GDBM_File, $filename, O_RDWR | O_CREAT, 0666]
436 SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, O_RDWR|O_CREAT, 0666]
437 LIST_CACHE => [TIE, My_Package, $tablename, $key_field, $val_field]
438
439This last might tie the cache hash to a package that you wrote
440yourself that stores the cache in a SQL-accessible database.
441A useful use of this feature: You can construct a batch program that
442runs in the background and populates the memo table, and then when you
443come to run your real program the memoized function will be
444screamingly fast because all its results have been precomputed.
445
446=item C<FAULT>
447
448C<FAULT> means that you never expect to call the function in scalar
449(or list) context, and that if C<Memoize> detects such a call, it
450should abort the program. The error message is one of
451
452 `foo' function called in forbidden list context at line ...
453 `foo' function called in forbidden scalar context at line ...
454
455=item C<MERGE>
456
457C<MERGE> normally means the function does not distinguish between list
458and sclar context, and that return values in both contexts should be
459stored together. C<LIST_CACHE =E<gt> MERGE> means that list context
460return values should be stored in the same hash that is used for
461scalar context returns, and C<SCALAR_CACHE =E<gt> MERGE> means the
462same, mutatis mutandis. It is an error to specify C<MERGE> for both,
463but it probably does something useful.
464
465Consider this function:
466
467 sub pi { 3; }
468
469Normally, the following code will result in two calls to C<pi>:
470
471 $x = pi();
472 ($y) = pi();
473 $z = pi();
474
475The first call caches the value C<3> in the scalar cache; the second
476caches the list C<(3)> in the list cache. The third call doesn't call
477the real C<pi> function; it gets the value from the scalar cache.
478
479Obviously, the second call to C<pi> is a waste of time, and storing
480its return value is a waste of space. Specifying C<LIST_CACHE
481=E<gt> MERGE> will make C<memoize> use the same cache for scalar and
482list context return values, so that the second call uses the scalar
483cache that was populated by the first call. C<pi> ends up being
484cvalled only once, and both subsequent calls return C<3> from the
485cache, regardless of the calling context.
486
487Another use for C<MERGE> is when you want both kinds of return values
488stored in the same disk file; this saves you from having to deal with
489two disk files instead of one. You can use a normalizer function to
490keep the two sets of return values separate. For example:
491
492 memoize 'myfunc',
493 NORMALIZER => 'n',
494 SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, ...],
495 LIST_CACHE => MERGE,
496 ;
497
498 sub n {
499 my $context = wantarray() ? 'L' : 'S';
500 # ... now compute the hash key from the arguments ...
501 $hashkey = "$context:$hashkey";
502 }
503
504This normalizer function will store scalar context return values in
505the disk file under keys that begin with C<S:>, and list context
506return values under keys that begin with C<L:>.
507
508=back
509
510=head1 OTHER FUNCTION
511
512There's an C<unmemoize> function that you can import if you want to.
513Why would you want to? Here's an example: Suppose you have your cache
514tied to a DBM file, and you want to make sure that the cache is
515written out to disk if someone interrupts the program. If the program
516exits normally, this will happen anyway, but if someone types
517control-C or something then the program will terminate immediately
518without syncronizing the database. So what you can do instead is
519
520 $SIG{INT} = sub { unmemoize 'function' };
521
522
523Thanks to Jonathan Roy for discovering a use for C<unmemoize>.
524
525C<unmemoize> accepts a reference to, or the name of a previously
526memoized function, and undoes whatever it did to provide the memoized
527version in the first place, including making the name refer to the
528unmemoized version if appropriate. It returns a reference to the
529unmemoized version of the function.
530
531If you ask it to unmemoize a function that was never memoized, it
532croaks.
533
534=head1 CAVEATS
535
536Memoization is not a cure-all:
537
538=over 4
539
540=item *
541
542Do not memoize a function whose behavior depends on program
543state other than its own arguments, such as global variables, the time
544of day, or file input. These functions will not produce correct
545results when memoized. For a particularly easy example:
546
547 sub f {
548 time;
549 }
550
551This function takes no arguments, and as far as C<Memoize> is
552concerned, it always returns the same result. C<Memoize> is wrong, of
553course, and the memoized version of this function will call C<time> once
554to get the current time, and it will return that same time
555every time you call it after that.
556
557=item *
558
559Do not memoize a function with side effects.
560
561 sub f {
562 my ($a, $b) = @_;
563 my $s = $a + $b;
564 print "$a + $b = $s.\n";
565 }
566
567This function accepts two arguments, adds them, and prints their sum.
568Its return value is the numuber of characters it printed, but you
569probably didn't care about that. But C<Memoize> doesn't understand
570that. If you memoize this function, you will get the result you
571expect the first time you ask it to print the sum of 2 and 3, but
572subsequent calls will return the number 11 (the return value of
573C<print>) without actually printing anything.
574
575=item *
576
577Do not memoize a function that returns a data structure that is
578modified by its caller.
579
580Consider these functions: C<getusers> returns a list of users somehow,
581and then C<main> throws away the first user on the list and prints the
582rest:
583
584 sub main {
585 my $userlist = getusers();
586 shift @$userlist;
587 foreach $u (@$userlist) {
588 print "User $u\n";
589 }
590 }
591
592 sub getusers {
593 my @users;
594 # Do something to get a list of users;
595 \@users; # Return reference to list.
596 }
597
598If you memoize C<getusers> here, it will work right exactly once. The
599reference to the users list will be stored in the memo table. C<main>
600will discard the first element from the referenced list. The next
601time you invoke C<main>, C<Memoize> will not call C<getusers>; it will
602just return the same reference to the same list it got last time. But
603this time the list has already had its head removed; C<main> will
604erroneously remove another element from it. The list will get shorter
605and shorter every time you call C<main>.
606
607
608=back
609
610=head1 PERSISTENT CACHE SUPPORT
611
612You can tie the cache tables to any sort of tied hash that you want
613to, as long as it supports C<TIEHASH>, C<FETCH>, C<STORE>, and
614C<EXISTS>. For example,
615
616 memoize 'function', SCALAR_CACHE =>
617 [TIE, GDBM_File, $filename, O_RDWR|O_CREAT, 0666];
618
619works just fine. For some storage methods, you need a little glue.
620
621C<SDBM_File> doesn't supply an C<EXISTS> method, so included in this
622package is a glue module called C<Memoize::SDBM_File> which does
623provide one. Use this instead of plain C<SDBM_File> to store your
624cache table on disk in an C<SDBM_File> database:
625
626 memoize 'function',
627 SCALAR_CACHE =>
628 [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666];
629
630C<NDBM_File> has the same problem and the same solution.
631
632C<Storable> isn't a tied hash class at all. You can use it to store a
633hash to disk and retrieve it again, but you can't modify the hash while
634it's on the disk. So if you want to store your cache table in a
635C<Storable> database, use C<Memoize::Storable>, which puts a hashlike
636front-end onto C<Storable>. The hash table is actually kept in
637memory, and is loaded from your C<Storable> file at the time you
638memoize the function, and stored back at the time you unmemoize the
639function (or when your program exits):
640
641 memoize 'function',
642 SCALAR_CACHE => [TIE, Memoize::Storable, $filename];
643
644 memoize 'function',
645 SCALAR_CACHE => [TIE, Memoize::Storable, $filename, 'nstore'];
646
647Include the `nstore' option to have the C<Storable> database written
648in `network order'. (See L<Storable> for more details about this.)
649
650=head1 EXPIRATION SUPPORT
651
652See Memoize::Expire, which is a plug-in module that adds expiration
653functionality to Memoize. If you don't like the kinds of policies
654that Memoize::Expire implements, it is easy to write your own plug-in
655module to implement whatever policy you desire.
656
657=head1 MY BUGS
658
659Needs a better test suite, especially for the tied and expiration stuff.
660
661Also, there is some problem with the way C<goto &f> works under
662threaded Perl, because of the lexical scoping of C<@_>. This is a bug
663in Perl, and until it is resolved, Memoize won't work with these
664Perls. To fix it, you need to chop the source code a little. Find
665the comment in the source code that says C<--- THREADED PERL
666COMMENT---> and comment out the active line and uncomment the
667commented one. Then try it again.
668
669I wish I could investigate this threaded Perl problem. If someone
670could lend me an account on a machine with threaded Perl for a few
671hours, it would be very helpful.
672
673That is why the version number is 0.49 instead of 1.00.
674
675=head1 MAILING LIST
676
677To join a very low-traffic mailing list for announcements about
678C<Memoize>, send an empty note to C<mjd-perl-memoize-request@plover.com>.
679
680=head1 AUTHOR
681
682Mark-Jason Dominus (C<mjd-perl-memoize+@plover.com>), Plover Systems co.
683
684See the C<Memoize.pm> Page at http://www.plover.com/~mjd/perl/Memoize/
685for news and upgrades. Near this page, at
686http://www.plover.com/~mjd/perl/MiniMemoize/ there is an article about
687memoization and about the internals of Memoize that appeared in The
688Perl Journal, issue #13. (This article is also included in the
689Memoize distribution as `article.html'.)
690
691To join a mailing list for announcements about C<Memoize>, send an
692empty message to C<mjd-perl-memoize-request@plover.com>. This mailing
693list is for announcements only and has extremely low traffic---about
694four messages per year.
695
696=head1 THANK YOU
697
698Many thanks to Jonathan Roy for bug reports and suggestions, to
699Michael Schwern for other bug reports and patches, to Mike Cariaso for
700helping me to figure out the Right Thing to Do About Expiration, to
701Joshua Gerth, Joshua Chamas, Jonathan Roy, Mark D. Anderson, and
702Andrew Johnson for more suggestions about expiration, to Ariel
703Scolnikov for delightful messages about the Fibonacci function, to
704Dion Almaer for thought-provoking suggestions about the default
705normalizer, to Walt Mankowski and Kurt Starsinic for much help
706investigating problems under threaded Perl, to Alex Dudkevich for
707reporting the bug in prototyped functions and for checking my patch,
708to Tony Bass for many helpful suggestions, to Philippe Verdret for
709enlightening discussion of Hook::PrePostCall, to Nat Torkington for
710advice I ignored, to Chris Nandor for portability advice, and to Jenda
711Krynicky for being a light in the world.
712
713=cut
714