3 What: Transparently speed up functions by caching return values.
5 Author: Mark-Jason Dominus (mjd-perl-memoize+@plover.com)
7 ################################################################
15 There's a very small chance that the tests in speed.t and
16 expire_module_t.t might fail because of clock skew or bizarre system
17 load conditions. If the tests there fail, rerun them and see if the
24 If not, please send me a report that mentions which tests failed.
25 The address is: mjd-perl-memoize+@plover.com.
27 ################################################################
28 What's new since 0.49:
30 Just a maintenance release. I made the tests a little more robust,
31 and I included the Memoization article that I forgot to put into 0.48.
33 ################################################################
34 What's new since 0.48:
36 You can now expire data from the memoization cache according to any
37 expiration policy you desire. A sample policy is provided in the
38 Memoize::Expire module. It supports expiration of items that have
39 been in the cache a certain number of seconds and items that have been
40 accessed a certain number of times. When you call a memoized
41 function, and Memoize discovers that a cache item has expired, it
42 calls the real function and stores the result in the cache, just as if
43 the data had not been in the cache in the first place.
45 Many people asked for a cache expiration feature, and some people even
46 sent patches. Thanks for the patches! But I did not accept them,
47 because they all added the expiration stuff into the module, and I was
48 sure that this was a bad way to do it. Everyone had a different idea
49 of what useful expiration behavior was, so I foresaw an endless series
50 of creeeping features and an expiration mechansim that got more and
51 more and more complicated and slower and slower and slower.
53 The new expiration policy mechanism makes use of the TIE feature. You
54 write a cache policy module ( which might be very simple) and use the
55 TIE feature to insert it between memoize and the real cache. The
56 Memoize::Expire module. included in this package, is a useful example
57 of this that might satisfy many people. The documentation for that
58 module includes an even simpler module for those who would like to
59 implement their own expiration policies.
61 Big win: If you don't use the expiration feature, you don't pay for
62 it. Memoize 0.49 with expiration turned off runs *exactly* as fast as
63 Memoize 0.48 did. Not one line of code has been changed.
65 Moral of the story: Sometimes, there is a Right Way to Do Things that
66 really is better than the obvious way. It might not be obvious at
67 first, and sometimes you have to make people wait for features so that
68 the Right Way to Do Things can make itself known.
70 Many thanks to Mike Cariaso for helping me figure out The Right Way to
73 Also: If you try to use ODBM_File, NDBM_File, SDBM_File, GDBM_File, or
74 DB_File for the LIST_CACHE, you get an error right away, because those
75 kinds of files will only store strings. Thanks to Jonathan Roy for
76 suggesting this. If you want to store list values in a persistent
77 cache, try Memoize::Storable.
79 ################################################################
81 What's new since 0.46:
83 Caching of function return values into NDBM files is now supported.
84 You can cache function return values into Memoize::AnyDBM files, which
85 is a pseudo-module that selects the `best' available DBM
88 Bug fix: Prototyped functions are now memoized correctly; memoizing
89 used to remove the prototype and issue a warning. Also new tests for
90 this feature. (Thanks Alex Dudkevich)
92 New test suites for SDBM and NDBM caching and prototyped functions.
93 Various small fixes in the test suite.
94 Various documentation enhancements and fixes.
96 ################################################################
98 What's new since 0.45:
100 Now has an interface to `Storable'. This wasn't formerly possible,
101 because the base package can only store caches via modules that
102 present a tied hash interface, and `Storable' doesn't. Solution:
103 Memoize::Storable is a tied hash interface to `Storable'.
105 ################################################################
107 What's new since 0.06:
109 Storage of cached function return values in a static file is now
110 tentatively supported. `memoize' now accepts new options SCALAR_CACHE
111 and LIST_CACHE to specify the destination and protocol for saving
112 cached values to disk.
114 Consider these features alpha, and please report bugs to
115 mjd-perl-memoize@plover.com. The beta version is awaiting a more
118 Much new documentation to support all this.
120 ################################################################
122 What's new since 0.05:
124 Calling syntax is now
126 memoize(function, OPTION1 => VALUE1, ...)
130 memoize(function, { OPTION1 => VALUE1, ... })
133 Functions that return lists can now be memoized.
135 New tests for list-returning functions and their normalizers.
137 Various documentation changes.
139 Return value from `unmemoize' is now the resulting unmemoized
140 function, instead of the constant `1'. It was already docmuented to
143 ################################################################
148 Memoize - Make your functions faster by trading space for time
153 memoize('slow_function');
154 slow_function(arguments); # Is faster than it was before
157 This is normally all you need to know. However, many options are available:
159 memoize(function, options...);
163 NORMALIZER => function
166 SCALAR_CACHE => 'MEMORY'
167 SCALAR_CACHE => ['TIE', Module, arguments...]
168 SCALAR_CACHE => 'FAULT'
169 SCALAR_CACHE => 'MERGE'
171 LIST_CACHE => 'MEMORY'
172 LIST_CACHE => ['TIE', Module, arguments...]
173 LIST_CACHE => 'FAULT'
174 LIST_CACHE => 'MERGE'
179 `Memoizing' a function makes it faster by trading space for time. It
180 does this by caching the return values of the function in a table.
181 If you call the function again with the same arguments, C<memoize>
182 jmups in and gives you the value out of the table, instead of letting
183 the function compute the value all over again.
185 Here is an extreme example. Consider the Fibonacci sequence, defined
186 by the following function:
188 # Compute Fibonacci numbers
192 fib($n-1) + fib($n-2);
195 This function is very slow. Why? To compute fib(14), it first wants
196 to compute fib(13) and fib(12), and add the results. But to compute
197 fib(13), it first has to compute fib(12) and fib(11), and then it
198 comes back and computes fib(12) all over again even though the answer
199 is the same. And both of the times that it wants to compute fib(12),
200 it has to compute fib(11) from scratch, and then it has to do it
201 again each time it wants to compute fib(13). This function does so
202 much recomputing of old results that it takes a really long time to
203 run---fib(14) makes 1,200 extra recursive calls to itself, to compute
204 and recompute things that it already computed.
206 This function is a good candidate for memoization. If you memoize the
207 `fib' function above, it will compute fib(14) exactly once, the first
208 time it needs to, and then save the result in a table. Then if you
209 ask for fib(14) again, it gives you the result out of the table.
210 While computing fib(14), instead of computing fib(12) twice, it does
211 it once; the second time it needs the value it gets it from the table.
212 It doesn't compute fib(11) four times; it computes it once, getting it
213 from the table the next three times. Instead of making 1,200
214 recursive calls to `fib', it makes 15. This makes the function about
217 You could do the memoization yourself, by rewriting the function, like
220 # Compute Fibonacci numbers, memoized version
224 return $fib[$n] if defined $fib[$n];
225 return $fib[$n] = $n if $n < 2;
226 $fib[$n] = fib($n-1) + fib($n-2);
230 Or you could use this module, like this:
235 # Rest of the fib function just like the original version.
237 This makes it easy to turn memoizing on and off.
239 Here's an even simpler example: I wrote a simple ray tracer; the
240 program would look in a certain direction, figure out what it was
241 looking at, and then convert the `color' value (typically a string
242 like `red') of that object to a red, green, and blue pixel value, like
245 for ($direction = 0; $direction < 300; $direction++) {
246 # Figure out which object is in direction $direction
247 $color = $object->{color};
248 ($r, $g, $b) = @{&ColorToRGB($color)};
252 Since there are relatively few objects in a picture, there are only a
253 few colors, which get looked up over and over again. Memoizing
254 C<ColorToRGB> speeded up the program by several percent.
258 This module exports exactly one function, C<memoize>. The rest of the
259 functions in this package are None of Your Business.
265 where C<function> is the name of the function you want to memoize, or
266 a reference to it. C<memoize> returns a reference to the new,
267 memoized version of the function, or C<undef> on a non-fatal error.
268 At present, there are no non-fatal errors, but there might be some in
271 If C<function> was the name of a function, then C<memoize> hides the
272 old version and installs the new memoized version under the old name,
273 so that C<&function(...)> actually invokes the memoized version.
277 There are some optional options you can pass to C<memoize> to change
278 the way it behaves a little. To supply options, invoke C<memoize>
281 memoize(function, NORMALIZER => function,
283 SCALAR_CACHE => option,
287 Each of these options is optional; you can include some, all, or none
292 If you supply a function name with C<INSTALL>, memoize will install
293 the new, memoized version of the function under the name you give.
296 memoize('fib', INSTALL => 'fastfib')
298 installs the memoized version of C<fib> as C<fastfib>; without the
299 C<INSTALL> option it would have replaced the old C<fib> with the
302 To prevent C<memoize> from installing the memoized version anywhere, use
303 C<INSTALL =E<gt> undef>.
307 Suppose your function looks like this:
309 # Typical call: f('aha!', A => 11, B => 12);
313 $hash{B} ||= 2; # B defaults to 2
314 $hash{C} ||= 7; # C defaults to 7
316 # Do something with $a, %hash
319 Now, the following calls to your function are all completely equivalent:
324 f(OUCH, B => 2, C => 7);
325 f(OUCH, C => 7, B => 2);
328 However, unless you tell C<Memoize> that these calls are equivalent,
329 it will not know that, and it will compute the values for these
330 invocations of your function separately, and store them separately.
332 To prevent this, supply a C<NORMALIZER> function that turns the
333 program arguments into a string in a way that equivalent arguments
334 turn into the same string. A C<NORMALIZER> function for C<f> above
335 might look like this:
343 join($;, $a, map ($_ => $hash{$_}) sort keys %hash);
346 Each of the argument lists above comes out of the C<normalize_f>
347 function looking exactly the same, like this:
351 You would tell C<Memoize> to use this normalizer this way:
353 memoize('f', NORMALIZER => 'normalize_f');
355 C<memoize> knows that if the normalized version of the arguments is
356 the same for two argument lists, then it can safely look up the value
357 that it computed for one argument list and return it as the result of
358 calling the function with the other argument list, even if the
359 argument lists look different.
361 The default normalizer just concatenates the arguments with C<$;> in
362 between. This always works correctly for functions with only one
363 argument, and also when the arguments never contain C<$;> (which is
364 normally character #28, control-\. ) However, it can confuse certain
367 normalizer("a\034", "b")
368 normalizer("a", "\034b")
369 normalizer("a\034\034b")
373 The calling context of the function (scalar or list context) is
374 propagated to the normalizer. This means that if the memoized
375 function will treat its arguments differently in list context than it
376 would in scalar context, you can have the normalizer function select
377 its behavior based on the results of C<wantarray>. Even if called in
378 a list context, a normalizer should still return a single string.
380 =head2 C<SCALAR_CACHE>, C<LIST_CACHE>
382 Normally, C<Memoize> caches your function's return values into an
383 ordinary Perl hash variable. However, you might like to have the
384 values cached on the disk, so that they persist from one run of your
385 program to the next, or you might like to associate some other
386 interesting semantics with the cached values.
388 There's a slight complication under the hood of C<Memoize>: There are
389 actually I<two> caches, one for scalar values and one for list values.
390 When your function is called in scalar context, its return value is
391 cached in one hash, and when your function is called in list context,
392 its value is cached in the other hash. You can control the caching
393 behavior of both contexts independently with these options.
395 The argument to C<LIST_CACHE> or C<SCALAR_CACHE> must either be one of
396 the following four strings:
403 or else it must be a reference to a list whose first element is one of
404 these four strings, such as C<[TIE, arguments...]>.
410 C<MEMORY> means that return values from the function will be cached in
411 an ordinary Perl hash variable. The hash variable will not persist
412 after the program exits. This is the default.
416 C<TIE> means that the function's return values will be cached in a
417 tied hash. A tied hash can have any semantics at all. It is
418 typically tied to an on-disk database, so that cached values are
419 stored in the database and retrieved from it again when needed, and
420 the disk file typically persists after your pogram has exited.
422 If C<TIE> is specified as the first element of a list, the remaining
423 list elements are taken as arguments to the C<tie> call that sets up
424 the tied hash. For example,
426 SCALAR_CACHE => [TIE, DB_File, $filename, O_RDWR | O_CREAT, 0666]
428 says to tie the hash into the C<DB_File> package, and to pass the
429 C<$filename>, C<O_RDWR | O_CREAT>, and C<0666> arguments to the C<tie>
430 call. This has the effect of storing the cache in a C<DB_File>
431 database whose name is in C<$filename>.
433 Other typical uses of C<TIE>:
435 LIST_CACHE => [TIE, GDBM_File, $filename, O_RDWR | O_CREAT, 0666]
436 SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, O_RDWR|O_CREAT, 0666]
437 LIST_CACHE => [TIE, My_Package, $tablename, $key_field, $val_field]
439 This last might tie the cache hash to a package that you wrote
440 yourself that stores the cache in a SQL-accessible database.
441 A useful use of this feature: You can construct a batch program that
442 runs in the background and populates the memo table, and then when you
443 come to run your real program the memoized function will be
444 screamingly fast because all its results have been precomputed.
448 C<FAULT> means that you never expect to call the function in scalar
449 (or list) context, and that if C<Memoize> detects such a call, it
450 should abort the program. The error message is one of
452 `foo' function called in forbidden list context at line ...
453 `foo' function called in forbidden scalar context at line ...
457 C<MERGE> normally means the function does not distinguish between list
458 and sclar context, and that return values in both contexts should be
459 stored together. C<LIST_CACHE =E<gt> MERGE> means that list context
460 return values should be stored in the same hash that is used for
461 scalar context returns, and C<SCALAR_CACHE =E<gt> MERGE> means the
462 same, mutatis mutandis. It is an error to specify C<MERGE> for both,
463 but it probably does something useful.
465 Consider this function:
469 Normally, the following code will result in two calls to C<pi>:
475 The first call caches the value C<3> in the scalar cache; the second
476 caches the list C<(3)> in the list cache. The third call doesn't call
477 the real C<pi> function; it gets the value from the scalar cache.
479 Obviously, the second call to C<pi> is a waste of time, and storing
480 its return value is a waste of space. Specifying C<LIST_CACHE
481 =E<gt> MERGE> will make C<memoize> use the same cache for scalar and
482 list context return values, so that the second call uses the scalar
483 cache that was populated by the first call. C<pi> ends up being
484 cvalled only once, and both subsequent calls return C<3> from the
485 cache, regardless of the calling context.
487 Another use for C<MERGE> is when you want both kinds of return values
488 stored in the same disk file; this saves you from having to deal with
489 two disk files instead of one. You can use a normalizer function to
490 keep the two sets of return values separate. For example:
494 SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, ...],
499 my $context = wantarray() ? 'L' : 'S';
500 # ... now compute the hash key from the arguments ...
501 $hashkey = "$context:$hashkey";
504 This normalizer function will store scalar context return values in
505 the disk file under keys that begin with C<S:>, and list context
506 return values under keys that begin with C<L:>.
510 =head1 OTHER FUNCTION
512 There's an C<unmemoize> function that you can import if you want to.
513 Why would you want to? Here's an example: Suppose you have your cache
514 tied to a DBM file, and you want to make sure that the cache is
515 written out to disk if someone interrupts the program. If the program
516 exits normally, this will happen anyway, but if someone types
517 control-C or something then the program will terminate immediately
518 without syncronizing the database. So what you can do instead is
520 $SIG{INT} = sub { unmemoize 'function' };
523 Thanks to Jonathan Roy for discovering a use for C<unmemoize>.
525 C<unmemoize> accepts a reference to, or the name of a previously
526 memoized function, and undoes whatever it did to provide the memoized
527 version in the first place, including making the name refer to the
528 unmemoized version if appropriate. It returns a reference to the
529 unmemoized version of the function.
531 If you ask it to unmemoize a function that was never memoized, it
536 Memoization is not a cure-all:
542 Do not memoize a function whose behavior depends on program
543 state other than its own arguments, such as global variables, the time
544 of day, or file input. These functions will not produce correct
545 results when memoized. For a particularly easy example:
551 This function takes no arguments, and as far as C<Memoize> is
552 concerned, it always returns the same result. C<Memoize> is wrong, of
553 course, and the memoized version of this function will call C<time> once
554 to get the current time, and it will return that same time
555 every time you call it after that.
559 Do not memoize a function with side effects.
564 print "$a + $b = $s.\n";
567 This function accepts two arguments, adds them, and prints their sum.
568 Its return value is the numuber of characters it printed, but you
569 probably didn't care about that. But C<Memoize> doesn't understand
570 that. If you memoize this function, you will get the result you
571 expect the first time you ask it to print the sum of 2 and 3, but
572 subsequent calls will return the number 11 (the return value of
573 C<print>) without actually printing anything.
577 Do not memoize a function that returns a data structure that is
578 modified by its caller.
580 Consider these functions: C<getusers> returns a list of users somehow,
581 and then C<main> throws away the first user on the list and prints the
585 my $userlist = getusers();
587 foreach $u (@$userlist) {
594 # Do something to get a list of users;
595 \@users; # Return reference to list.
598 If you memoize C<getusers> here, it will work right exactly once. The
599 reference to the users list will be stored in the memo table. C<main>
600 will discard the first element from the referenced list. The next
601 time you invoke C<main>, C<Memoize> will not call C<getusers>; it will
602 just return the same reference to the same list it got last time. But
603 this time the list has already had its head removed; C<main> will
604 erroneously remove another element from it. The list will get shorter
605 and shorter every time you call C<main>.
610 =head1 PERSISTENT CACHE SUPPORT
612 You can tie the cache tables to any sort of tied hash that you want
613 to, as long as it supports C<TIEHASH>, C<FETCH>, C<STORE>, and
614 C<EXISTS>. For example,
616 memoize 'function', SCALAR_CACHE =>
617 [TIE, GDBM_File, $filename, O_RDWR|O_CREAT, 0666];
619 works just fine. For some storage methods, you need a little glue.
621 C<SDBM_File> doesn't supply an C<EXISTS> method, so included in this
622 package is a glue module called C<Memoize::SDBM_File> which does
623 provide one. Use this instead of plain C<SDBM_File> to store your
624 cache table on disk in an C<SDBM_File> database:
628 [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666];
630 C<NDBM_File> has the same problem and the same solution.
632 C<Storable> isn't a tied hash class at all. You can use it to store a
633 hash to disk and retrieve it again, but you can't modify the hash while
634 it's on the disk. So if you want to store your cache table in a
635 C<Storable> database, use C<Memoize::Storable>, which puts a hashlike
636 front-end onto C<Storable>. The hash table is actually kept in
637 memory, and is loaded from your C<Storable> file at the time you
638 memoize the function, and stored back at the time you unmemoize the
639 function (or when your program exits):
642 SCALAR_CACHE => [TIE, Memoize::Storable, $filename];
645 SCALAR_CACHE => [TIE, Memoize::Storable, $filename, 'nstore'];
647 Include the `nstore' option to have the C<Storable> database written
648 in `network order'. (See L<Storable> for more details about this.)
650 =head1 EXPIRATION SUPPORT
652 See Memoize::Expire, which is a plug-in module that adds expiration
653 functionality to Memoize. If you don't like the kinds of policies
654 that Memoize::Expire implements, it is easy to write your own plug-in
655 module to implement whatever policy you desire.
659 Needs a better test suite, especially for the tied and expiration stuff.
661 Also, there is some problem with the way C<goto &f> works under
662 threaded Perl, because of the lexical scoping of C<@_>. This is a bug
663 in Perl, and until it is resolved, Memoize won't work with these
664 Perls. To fix it, you need to chop the source code a little. Find
665 the comment in the source code that says C<--- THREADED PERL
666 COMMENT---> and comment out the active line and uncomment the
667 commented one. Then try it again.
669 I wish I could investigate this threaded Perl problem. If someone
670 could lend me an account on a machine with threaded Perl for a few
671 hours, it would be very helpful.
673 That is why the version number is 0.49 instead of 1.00.
677 To join a very low-traffic mailing list for announcements about
678 C<Memoize>, send an empty note to C<mjd-perl-memoize-request@plover.com>.
682 Mark-Jason Dominus (C<mjd-perl-memoize+@plover.com>), Plover Systems co.
684 See the C<Memoize.pm> Page at http://www.plover.com/~mjd/perl/Memoize/
685 for news and upgrades. Near this page, at
686 http://www.plover.com/~mjd/perl/MiniMemoize/ there is an article about
687 memoization and about the internals of Memoize that appeared in The
688 Perl Journal, issue #13. (This article is also included in the
689 Memoize distribution as `article.html'.)
691 To join a mailing list for announcements about C<Memoize>, send an
692 empty message to C<mjd-perl-memoize-request@plover.com>. This mailing
693 list is for announcements only and has extremely low traffic---about
694 four messages per year.
698 Many thanks to Jonathan Roy for bug reports and suggestions, to
699 Michael Schwern for other bug reports and patches, to Mike Cariaso for
700 helping me to figure out the Right Thing to Do About Expiration, to
701 Joshua Gerth, Joshua Chamas, Jonathan Roy, Mark D. Anderson, and
702 Andrew Johnson for more suggestions about expiration, to Ariel
703 Scolnikov for delightful messages about the Fibonacci function, to
704 Dion Almaer for thought-provoking suggestions about the default
705 normalizer, to Walt Mankowski and Kurt Starsinic for much help
706 investigating problems under threaded Perl, to Alex Dudkevich for
707 reporting the bug in prototyped functions and for checking my patch,
708 to Tony Bass for many helpful suggestions, to Philippe Verdret for
709 enlightening discussion of Hook::PrePostCall, to Nat Torkington for
710 advice I ignored, to Chris Nandor for portability advice, and to Jenda
711 Krynicky for being a light in the world.