lib/Memoize/TODO

   1 # Version 0.05 alpha $Revision: 1.5 $ $Date: 1999/09/17 14:57:55 $
   2
   3 =head1 TO DO
   4
   5 =over 4
   6
   7 =item *
   8
   9 LIST_CACHE doesn't work with ties to most DBM implementations, because
  10 Memouze tries to save a listref, and DB_File etc. can only store
  11 strings.  This should at least be documented.  Maybe Memoize could
  12 detect the problem at TIE time and throw a fatal error.
  13
  14 Try out MLDBM here and document it if it works.
  15
  16 =item *
  17
  18 We should extend the benchmarking module to allow
  19
  20         timethis(main, { MEMOIZED => [ suba, subb ] })
  21
  22 What would this do?  It would time C<main> three times, once with
  23 C<suba> and C<subb> unmemoized, twice with them memoized.
  24
  25 Why would you want to do this?  By the third set of runs, the memo
  26 tables would be fully populated, so all calls by C<main> to C<suba>
  27 and C<subb> would return immediately.  You would be able to see how
  28 much of C<main>'s running time was due to time spent computing in
  29 C<suba> and C<subb>.  If that was just a little time, you would know
  30 that optimizing or improving C<suba> and C<subb> would not have a
  31 large effect on the performance of C<main>.  But if there was a big
  32 difference, you would know that C<suba> or C<subb> was a good
  33 candidate for optimization if you needed to make C<main> go faster.
  34
  35 Done.
  36
  37 =item *
  38
  39 Perhaps C<memoize> should return a reference to the original function
  40 as well as one to the memoized version?  But the programmer could
  41 always construct such a reference themselves, so perhaps it's not
  42 necessary.  We save such a reference anyway, so a new package method
  43 could return it on demand even if it wasn't provided by C<memoize>.
  44 We could even bless the new function reference so that it could have
  45 accessor methods for getting to the original function, the options,
  46 the memo table, etc.
  47
  48 Naah.
  49
  50 =item *
  51
  52 The TODISK feature is not ready yet.  It will have to be rather
  53 complicated, providing options for which disk method to use (GDBM?
  54 DB_File?  Flat file?  Storable?  User-supplied?) and which stringizing
  55 method to use (FreezeThaw?  Marshal?  User-supplied?)
  56
  57 Done!
  58
  59 =item *
  60
  61 Maybe an option for automatic expiration of cache values?  (`After one
  62 day,' `After five uses,' etc.)  Also possibly an option to limit the
  63 number of active entries with automatic LRU expiration.
  64
  65 You have a long note to Mike Cariaso that outlines a good approach
  66 that you sent on 9 April 1999.
  67
  68 What's the timeout stuff going to look like?
  69
  70         EXPIRE_TIME => time_in_sec
  71         EXPIRE_USES => num_uses
  72         MAXENTRIES => n
  73
  74 perhaps?  Is EXPIRE_USES actually useful?
  75
  76 19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES.
  77 MAXENTRIES can come later as a separate module.
  78
  79 =item *
  80
  81 Put in a better example than C<fibo>.  Show an example of a
  82 nonrecursive function that simply takes a long time to run.
  83 C<getpwuid> for example?  But this exposes the bug that you can't say
  84 C<memoize('getpwuid')>, so perhaps it's not a very good example.
  85
  86 Well, I did add the ColorToRGB example, but it's still not so good.
  87 These examples need a lot of work.  C<factorial> might be a better
  88 example than C<fibo>.
  89
  90 =item *
  91
  92 Add more regression tests for normalizers.
  93
  94 =item *
  95
  96 Maybe resolve normalizer function to code-ref at memoize time instead
  97 of at function call time for efficiency?  I think there was some
  98 reason not to do this, but I can't remember what it was.
  99
 100 =item *
 101
 102 Add more array value tests to the test suite.
 103
 104 Does it need more now?
 105
 106 =item *
 107
 108 Fix that `Subroutine u redefined ... line 484' message.
 109
 110 Fixed, I think.
 111
 112 =item *
 113
 114 Get rid of any remaining *{$ref}{CODE} or similar magic hashes.
 115
 116 =item *
 117
 118 There should be an option to dump out the memoized values or to
 119 otherwise traverse them.
 120
 121 What for?
 122
 123 Maybe the tied hash interface taskes care of this anyway?
 124
 125 =item *
 126
 127 Include an example that caches DNS lookups.
 128
 129 =item *
 130
 131 Make tie for Storable (Memoize::Storable)
 132
 133 A prototype of Memoize::Storable is finished.  Test it and add to the
 134 test suite.
 135
 136 Done.
 137
 138 =item *
 139
 140 Make tie for DBI  (Memoize::DBI)
 141
 142 =item *
 143
 144 I think there's a bug.  See `###BUG'.
 145
 146 =item *
 147
 148 Storable probably can't be done, because it doesn't allow updating.
 149 Maybe a different interface that supports readonly caches fronted by a
 150 writable in-memory cache?  A generic tied hash maybe?
 151
 152         FETCH {
 153           if (it's in the memory hash) {
 154             return it
 155           } elsif (it's in the readonly disk hash) {
 156             return it
 157           } else {
 158             not-there
 159           }
 160         }
 161
 162         STORE {
 163           put it into the in-memory hash
 164         }
 165
 166 Maybe `save' and `restore' methods?
 167
 168 It isn't working right because the destructor doesn't get called at
 169 the right time.
 170
 171 This is fixed.  `use strict vars' would have caught it immediately.  Duh.
 172
 173 =item *
 174
 175 Don't forget about generic interface to Storable-like packages
 176
 177 =item *
 178
 179
 180 Maybe add in TODISK after all, with TODISK => 'filename' equivalent to
 181
 182         SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666],
 183         LIST_CACHE => MERGE
 184
 185 =item *
 186
 187 Maybe the default for LIST_CACHE should be MERGE anyway.
 188
 189 =item *
 190
 191 There's some terrible bug probably related to use under threaded perl,
 192 possibly connected with line 56:
 193
 194     my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }";
 195
 196 I think becayse C<@_> is lexically scoped in threadperl, the effect of
 197 C<unshift> never makes it into C<_memoizer>.  That's probably a bug in
 198 Perl, but maybe I should work around it.   Can anyone provide more
 199 information here, or lend me a machine with threaded Perl where I can
 200 test this theory?  Line 59, currently commented out, may fix the
 201 problem.
 202
 203 =item *
 204
 205 Maybe if the original function has a prototype, the module can use
 206 that to select the most appropriate default normalizer.  For example,
 207 if the prototype was C<($)>, there's no reason to use `join'.  If it's
 208 C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>.
 209
 210 =item *
 211
 212 Ariel Scolnikov suggests using the change counting problem as an
 213 example.  (How many ways to make change of a dollar?)
 214
 215 =item *
 216
 217 I found a use for `unmemoize'.  If you're using the Storable glue, and
 218 your program gets SIGINT, you find that the cache data is not in the
 219 cache, because Perl normally writes it all out at once from a
 220 DESTROY method, and signals skip DESTROY processing.  So you could add
 221
 222         $sig{INT} = sub { unmemoize ... };
 223
 224 (Jonathan Roy pointed this out)
 225
 226 =item *
 227
 228 This means it would be useful to have a method to return references to
 229 all the currently-memoized functions so that you could say
 230
 231         $sig{INT} = sub { for $f (Memoize->all_memoized) {
 232                             unmemoize $f;
 233                           }
 234                         }
 235
 236
 237 =item *
 238
 239 19990917 There should be a call you can make to get back the cache
 240 itself.  If there were, then you could delete stuff from it to
 241 manually expire data items.
 242
 243 =item *
 244
 245 19990925 Randal says that the docs for Memoize;:Expire should make it
 246 clear that the expired entries are never flushed all at once.  He
 247 asked if you would need to do that manually.  I said:
 248
 249   Right, if that's what you want.  If you have EXISTS return false,
 250   it'll throw away the old cached item and replace it in the cache
 251   with a new item.  But if you want the cache to actually get smaller,
 252   you have to do that yourself.
 253
 254   I was planning to build an Expire module that implemented an LRU
 255   queue and kept the cache at a constant fixed size, but I didn't get
 256   to it yet.  It's not clear to me that the automatic exptynig-out
 257   behavior is very useful anyway.  The whole point of a cache is to
 258   trade space for time, so why bother going through the cache to throw
 259   away old items before you need to?
 260
 261 Randal then pointed out that it could discard expired items at DESTRoY
 262 or TIEHASH time, which seemed like a good idea, because if the cache
 263 is on disk you might like to keep it as small as possible.
 264
 265 =item *
 266
 267 19991219 Philip Gwyn suggests this technique:  You have a load_file
 268 function that memoizes the file contexts.  But then if the file
 269 changes you get the old contents.  So add a normalizer that does
 270
 271         return join $;, (stat($_[0])[9]), $_[0];
 272
 273 Now when the modification date changes, the true key returned by the
 274 normalizer is different, so you get a cache miss and it loads the new
 275 contents.   Disadvantage:  The old contents are still in the cache.  I
 276 think it makes more sense to have a special expiration manager for
 277 this.  Make one up and bundle it.
 278
 279 19991220 I have one written: Memoize::ExpireFile.  But how can you
 280 make this work when the function might have several arguments, of
 281 which some are filenames and some aren't?
 282
 283 =item *
 284
 285 19991219 There should be an inheritable TIEHASH method that does the
 286 argument processing properly.
 287
 288 19991220 Philip Gwyn contributed a patch for this.
 289
 290 20001231 You should really put this in.  Jonathan Roy uncovered a
 291 problem that it will be needed to solve.  Here's the problem:  He has:
 292
 293         memoize "get_items",
 294         LIST_CACHE => ["TIE", "Memoize::Expire",
 295                 LIFETIME => 86400,
 296                 TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666]
 297         ];
 298
 299 This won't work, because memoize is trying to store listrefs in a
 300 DB_File.    He owuld have gotten a fatal error if he had done this:
 301
 302         memoize "get_items",
 303           LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666]'
 304
 305
 306 But in this case, he tied the cache to Memoize::Expire, which is *not*
 307 scalar-only, and the check for scalar-only ties is missing from
 308 Memoize::Expire.  The inheritable method can take care of this.
 309
 310 =item *
 311
 312 20001130 Custom cache manager that checks to make sure the function
 313 return values actually match the memoized values.
 314
 315 =item *
 316
 317 20001231 Expiration manager that watches cache performance and
 318 accumulates statistics.  Variation:  Have it automatically unmemoize
 319 the function if performance is bad.
 320
 321 =item *
 322
 323 20010517 Option to have normalizer *modify* @_ for use by memoized
 324 function.  This would save code and time in cases like the one in the
 325 manual under 'NORMALIZER', where both f() and normalize_f() do the
 326 same analysis and make the same adjustments to the hash.  If the
 327 normalizer could make the adjustments and save the changes in @_, you
 328 wouldn't have to do it twice.
 329
 330 =item *
 331 There was probably some other stuff that I forgot.
 332
 333
 334
 335 =back