X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=lib%2FMemoize%2FREADME;h=552f6212363e81e6258b0a4310eebf54f5735e98;hb=29ddfe354327d85ef66e9723b006d41eb553cd25;hp=60f9b8387bcf9704d695d3d3a747768a4c4096f0;hpb=a0cb39004565ec2396fbdb3f1949b8f13304208e;p=p5sagit%2Fp5-mst-13.2.git diff --git a/lib/Memoize/README b/lib/Memoize/README index 60f9b83..552f621 100644 --- a/lib/Memoize/README +++ b/lib/Memoize/README @@ -1,7 +1,7 @@ Name: Memoize What: Transparently speed up functions by caching return values. -Version: 0.51 +Version: 1.00 Author: Mark-Jason Dominus (mjd-perl-memoize+@plover.com) ################################################################ @@ -25,690 +25,58 @@ If not, please send me a report that mentions which tests failed. The address is: mjd-perl-memoize+@plover.com. ################################################################ -What's new since 0.49: +What's new since 0.66: -Just a maintenance release. I made the tests a little more robust, -and I included the Memoization article that I forgot to put into 0.48. +Minor documentation and test changes only. ################################################################ -What's new since 0.48: +What's new since 0.65: -You can now expire data from the memoization cache according to any -expiration policy you desire. A sample policy is provided in the -Memoize::Expire module. It supports expiration of items that have -been in the cache a certain number of seconds and items that have been -accessed a certain number of times. When you call a memoized -function, and Memoize discovers that a cache item has expired, it -calls the real function and stores the result in the cache, just as if -the data had not been in the cache in the first place. +Test changes only. -Many people asked for a cache expiration feature, and some people even -sent patches. Thanks for the patches! But I did not accept them, -because they all added the expiration stuff into the module, and I was -sure that this was a bad way to do it. Everyone had a different idea -of what useful expiration behavior was, so I foresaw an endless series -of creeeping features and an expiration mechansim that got more and -more and more complicated and slower and slower and slower. - -The new expiration policy mechanism makes use of the TIE feature. You -write a cache policy module ( which might be very simple) and use the -TIE feature to insert it between memoize and the real cache. The -Memoize::Expire module. included in this package, is a useful example -of this that might satisfy many people. The documentation for that -module includes an even simpler module for those who would like to -implement their own expiration policies. - -Big win: If you don't use the expiration feature, you don't pay for -it. Memoize 0.49 with expiration turned off runs *exactly* as fast as -Memoize 0.48 did. Not one line of code has been changed. - -Moral of the story: Sometimes, there is a Right Way to Do Things that -really is better than the obvious way. It might not be obvious at -first, and sometimes you have to make people wait for features so that -the Right Way to Do Things can make itself known. - -Many thanks to Mike Cariaso for helping me figure out The Right Way to -Do Things. - -Also: If you try to use ODBM_File, NDBM_File, SDBM_File, GDBM_File, or -DB_File for the LIST_CACHE, you get an error right away, because those -kinds of files will only store strings. Thanks to Jonathan Roy for -suggesting this. If you want to store list values in a persistent -cache, try Memoize::Storable. - -################################################################ - -What's new since 0.46: - -Caching of function return values into NDBM files is now supported. -You can cache function return values into Memoize::AnyDBM files, which -is a pseudo-module that selects the `best' available DBM -implementation. - -Bug fix: Prototyped functions are now memoized correctly; memoizing -used to remove the prototype and issue a warning. Also new tests for -this feature. (Thanks Alex Dudkevich) - -New test suites for SDBM and NDBM caching and prototyped functions. -Various small fixes in the test suite. -Various documentation enhancements and fixes. + 0.62 was the fist version that would be distributed with Perl. + I got so absorbed in integrating it that I wrote some tests + that used Time::HiRes. I knew this was safe because + Time::HiRes is also distributed with the same versions of + Perl. I totally forgot that some people will get the module + off of CPAN without Perl and they may not have TIme::HiRes. + Sorry! ################################################################ +What's new since 0.62: -What's new since 0.45: - -Now has an interface to `Storable'. This wasn't formerly possible, -because the base package can only store caches via modules that -present a tied hash interface, and `Storable' doesn't. Solution: -Memoize::Storable is a tied hash interface to `Storable'. - -################################################################ - -What's new since 0.06: - -Storage of cached function return values in a static file is now -tentatively supported. `memoize' now accepts new options SCALAR_CACHE -and LIST_CACHE to specify the destination and protocol for saving -cached values to disk. - -Consider these features alpha, and please report bugs to -mjd-perl-memoize@plover.com. The beta version is awaiting a more -complete test suite. - -Much new documentation to support all this. - -################################################################ - -What's new since 0.05: - -Calling syntax is now - - memoize(function, OPTION1 => VALUE1, ...) - -instead of - - memoize(function, { OPTION1 => VALUE1, ... }) - - -Functions that return lists can now be memoized. - -New tests for list-returning functions and their normalizers. - -Various documentation changes. - -Return value from `unmemoize' is now the resulting unmemoized -function, instead of the constant `1'. It was already docmuented to -do so. - -################################################################ - - -=head1 NAME - -Memoize - Make your functions faster by trading space for time - -=head1 SYNOPSIS - - use Memoize; - memoize('slow_function'); - slow_function(arguments); # Is faster than it was before - - -This is normally all you need to know. However, many options are available: - - memoize(function, options...); - -Options include: - - NORMALIZER => function - INSTALL => new_name - - SCALAR_CACHE => 'MEMORY' - SCALAR_CACHE => ['TIE', Module, arguments...] - SCALAR_CACHE => 'FAULT' - SCALAR_CACHE => 'MERGE' - - LIST_CACHE => 'MEMORY' - LIST_CACHE => ['TIE', Module, arguments...] - LIST_CACHE => 'FAULT' - LIST_CACHE => 'MERGE' - - -=head1 DESCRIPTION - -`Memoizing' a function makes it faster by trading space for time. It -does this by caching the return values of the function in a table. -If you call the function again with the same arguments, C -jmups in and gives you the value out of the table, instead of letting -the function compute the value all over again. - -Here is an extreme example. Consider the Fibonacci sequence, defined -by the following function: - - # Compute Fibonacci numbers - sub fib { - my $n = shift; - return $n if $n < 2; - fib($n-1) + fib($n-2); - } - -This function is very slow. Why? To compute fib(14), it first wants -to compute fib(13) and fib(12), and add the results. But to compute -fib(13), it first has to compute fib(12) and fib(11), and then it -comes back and computes fib(12) all over again even though the answer -is the same. And both of the times that it wants to compute fib(12), -it has to compute fib(11) from scratch, and then it has to do it -again each time it wants to compute fib(13). This function does so -much recomputing of old results that it takes a really long time to -run---fib(14) makes 1,200 extra recursive calls to itself, to compute -and recompute things that it already computed. - -This function is a good candidate for memoization. If you memoize the -`fib' function above, it will compute fib(14) exactly once, the first -time it needs to, and then save the result in a table. Then if you -ask for fib(14) again, it gives you the result out of the table. -While computing fib(14), instead of computing fib(12) twice, it does -it once; the second time it needs the value it gets it from the table. -It doesn't compute fib(11) four times; it computes it once, getting it -from the table the next three times. Instead of making 1,200 -recursive calls to `fib', it makes 15. This makes the function about -150 times faster. - -You could do the memoization yourself, by rewriting the function, like -this: - - # Compute Fibonacci numbers, memoized version - { my @fib; - sub fib { - my $n = shift; - return $fib[$n] if defined $fib[$n]; - return $fib[$n] = $n if $n < 2; - $fib[$n] = fib($n-1) + fib($n-2); - } - } - -Or you could use this module, like this: - - use Memoize; - memoize('fib'); - - # Rest of the fib function just like the original version. - -This makes it easy to turn memoizing on and off. - -Here's an even simpler example: I wrote a simple ray tracer; the -program would look in a certain direction, figure out what it was -looking at, and then convert the `color' value (typically a string -like `red') of that object to a red, green, and blue pixel value, like -this: - - for ($direction = 0; $direction < 300; $direction++) { - # Figure out which object is in direction $direction - $color = $object->{color}; - ($r, $g, $b) = @{&ColorToRGB($color)}; - ... - } - -Since there are relatively few objects in a picture, there are only a -few colors, which get looked up over and over again. Memoizing -C speeded up the program by several percent. - -=head1 DETAILS - -This module exports exactly one function, C. The rest of the -functions in this package are None of Your Business. - -You should say - - memoize(function) - -where C is the name of the function you want to memoize, or -a reference to it. C returns a reference to the new, -memoized version of the function, or C on a non-fatal error. -At present, there are no non-fatal errors, but there might be some in -the future. - -If C was the name of a function, then C hides the -old version and installs the new memoized version under the old name, -so that C<&function(...)> actually invokes the memoized version. - -=head1 OPTIONS - -There are some optional options you can pass to C to change -the way it behaves a little. To supply options, invoke C -like this: - - memoize(function, NORMALIZER => function, - INSTALL => newname, - SCALAR_CACHE => option, - LIST_CACHE => option - ); - -Each of these options is optional; you can include some, all, or none -of them. - -=head2 INSTALL - -If you supply a function name with C, memoize will install -the new, memoized version of the function under the name you give. -For example, - - memoize('fib', INSTALL => 'fastfib') - -installs the memoized version of C as C; without the -C option it would have replaced the old C with the -memoized version. - -To prevent C from installing the memoized version anywhere, use -C undef>. - -=head2 NORMALIZER - -Suppose your function looks like this: - - # Typical call: f('aha!', A => 11, B => 12); - sub f { - my $a = shift; - my %hash = @_; - $hash{B} ||= 2; # B defaults to 2 - $hash{C} ||= 7; # C defaults to 7 - - # Do something with $a, %hash - } - -Now, the following calls to your function are all completely equivalent: - - f(OUCH); - f(OUCH, B => 2); - f(OUCH, C => 7); - f(OUCH, B => 2, C => 7); - f(OUCH, C => 7, B => 2); - (etc.) - -However, unless you tell C that these calls are equivalent, -it will not know that, and it will compute the values for these -invocations of your function separately, and store them separately. - -To prevent this, supply a C function that turns the -program arguments into a string in a way that equivalent arguments -turn into the same string. A C function for C above -might look like this: - - sub normalize_f { - my $a = shift; - my %hash = @_; - $hash{B} ||= 2; - $hash{C} ||= 7; - - join($;, $a, map ($_ => $hash{$_}) sort keys %hash); - } - -Each of the argument lists above comes out of the C -function looking exactly the same, like this: - - OUCH^\B^\2^\C^\7 - -You would tell C to use this normalizer this way: - - memoize('f', NORMALIZER => 'normalize_f'); - -C knows that if the normalized version of the arguments is -the same for two argument lists, then it can safely look up the value -that it computed for one argument list and return it as the result of -calling the function with the other argument list, even if the -argument lists look different. - -The default normalizer just concatenates the arguments with C<$;> in -between. This always works correctly for functions with only one -argument, and also when the arguments never contain C<$;> (which is -normally character #28, control-\. ) However, it can confuse certain -argument lists: - - normalizer("a\034", "b") - normalizer("a", "\034b") - normalizer("a\034\034b") - -for example. - -The calling context of the function (scalar or list context) is -propagated to the normalizer. This means that if the memoized -function will treat its arguments differently in list context than it -would in scalar context, you can have the normalizer function select -its behavior based on the results of C. Even if called in -a list context, a normalizer should still return a single string. - -=head2 C, C - -Normally, C caches your function's return values into an -ordinary Perl hash variable. However, you might like to have the -values cached on the disk, so that they persist from one run of your -program to the next, or you might like to associate some other -interesting semantics with the cached values. - -There's a slight complication under the hood of C: There are -actually I caches, one for scalar values and one for list values. -When your function is called in scalar context, its return value is -cached in one hash, and when your function is called in list context, -its value is cached in the other hash. You can control the caching -behavior of both contexts independently with these options. - -The argument to C or C must either be one of -the following four strings: - - MEMORY - TIE - FAULT - MERGE - -or else it must be a reference to a list whose first element is one of -these four strings, such as C<[TIE, arguments...]>. - -=over 4 - -=item C - -C means that return values from the function will be cached in -an ordinary Perl hash variable. The hash variable will not persist -after the program exits. This is the default. - -=item C - -C means that the function's return values will be cached in a -tied hash. A tied hash can have any semantics at all. It is -typically tied to an on-disk database, so that cached values are -stored in the database and retrieved from it again when needed, and -the disk file typically persists after your pogram has exited. - -If C is specified as the first element of a list, the remaining -list elements are taken as arguments to the C call that sets up -the tied hash. For example, - - SCALAR_CACHE => [TIE, DB_File, $filename, O_RDWR | O_CREAT, 0666] - -says to tie the hash into the C package, and to pass the -C<$filename>, C, and C<0666> arguments to the C -call. This has the effect of storing the cache in a C -database whose name is in C<$filename>. - -Other typical uses of C: - - LIST_CACHE => [TIE, GDBM_File, $filename, O_RDWR | O_CREAT, 0666] - SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, O_RDWR|O_CREAT, 0666] - LIST_CACHE => [TIE, My_Package, $tablename, $key_field, $val_field] - -This last might tie the cache hash to a package that you wrote -yourself that stores the cache in a SQL-accessible database. -A useful use of this feature: You can construct a batch program that -runs in the background and populates the memo table, and then when you -come to run your real program the memoized function will be -screamingly fast because all its results have been precomputed. - -=item C - -C means that you never expect to call the function in scalar -(or list) context, and that if C detects such a call, it -should abort the program. The error message is one of - - `foo' function called in forbidden list context at line ... - `foo' function called in forbidden scalar context at line ... - -=item C - -C normally means the function does not distinguish between list -and sclar context, and that return values in both contexts should be -stored together. C MERGE> means that list context -return values should be stored in the same hash that is used for -scalar context returns, and C MERGE> means the -same, mutatis mutandis. It is an error to specify C for both, -but it probably does something useful. - -Consider this function: - - sub pi { 3; } - -Normally, the following code will result in two calls to C: - - $x = pi(); - ($y) = pi(); - $z = pi(); - -The first call caches the value C<3> in the scalar cache; the second -caches the list C<(3)> in the list cache. The third call doesn't call -the real C function; it gets the value from the scalar cache. - -Obviously, the second call to C is a waste of time, and storing -its return value is a waste of space. Specifying C MERGE> will make C use the same cache for scalar and -list context return values, so that the second call uses the scalar -cache that was populated by the first call. C ends up being -cvalled only once, and both subsequent calls return C<3> from the -cache, regardless of the calling context. - -Another use for C is when you want both kinds of return values -stored in the same disk file; this saves you from having to deal with -two disk files instead of one. You can use a normalizer function to -keep the two sets of return values separate. For example: - - memoize 'myfunc', - NORMALIZER => 'n', - SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, ...], - LIST_CACHE => MERGE, - ; - - sub n { - my $context = wantarray() ? 'L' : 'S'; - # ... now compute the hash key from the arguments ... - $hashkey = "$context:$hashkey"; - } - -This normalizer function will store scalar context return values in -the disk file under keys that begin with C, and list context -return values under keys that begin with C. - -=back - -=head1 OTHER FUNCTION - -There's an C function that you can import if you want to. -Why would you want to? Here's an example: Suppose you have your cache -tied to a DBM file, and you want to make sure that the cache is -written out to disk if someone interrupts the program. If the program -exits normally, this will happen anyway, but if someone types -control-C or something then the program will terminate immediately -without syncronizing the database. So what you can do instead is - - $SIG{INT} = sub { unmemoize 'function' }; - - -Thanks to Jonathan Roy for discovering a use for C. - -C accepts a reference to, or the name of a previously -memoized function, and undoes whatever it did to provide the memoized -version in the first place, including making the name refer to the -unmemoized version if appropriate. It returns a reference to the -unmemoized version of the function. - -If you ask it to unmemoize a function that was never memoized, it -croaks. - -=head1 CAVEATS - -Memoization is not a cure-all: - -=over 4 - -=item * - -Do not memoize a function whose behavior depends on program -state other than its own arguments, such as global variables, the time -of day, or file input. These functions will not produce correct -results when memoized. For a particularly easy example: - - sub f { - time; - } - -This function takes no arguments, and as far as C is -concerned, it always returns the same result. C is wrong, of -course, and the memoized version of this function will call C