Commit | Line | Data |
3d4a255c |
1 | # Version 0.05 alpha $Revision: 1.6 $ $Date: 2001/06/24 17:11:26 $ |
a0cb3900 |
2 | |
3 | =head1 TO DO |
4 | |
5 | =over 4 |
6 | |
7 | =item * |
8 | |
9 | LIST_CACHE doesn't work with ties to most DBM implementations, because |
10 | Memouze tries to save a listref, and DB_File etc. can only store |
11 | strings. This should at least be documented. Maybe Memoize could |
12 | detect the problem at TIE time and throw a fatal error. |
13 | |
899dc88a |
14 | 20010623 This was added sometime prior to 20001025. |
15 | |
a0cb3900 |
16 | Try out MLDBM here and document it if it works. |
17 | |
18 | =item * |
19 | |
20 | We should extend the benchmarking module to allow |
21 | |
22 | timethis(main, { MEMOIZED => [ suba, subb ] }) |
23 | |
24 | What would this do? It would time C<main> three times, once with |
25 | C<suba> and C<subb> unmemoized, twice with them memoized. |
26 | |
27 | Why would you want to do this? By the third set of runs, the memo |
28 | tables would be fully populated, so all calls by C<main> to C<suba> |
29 | and C<subb> would return immediately. You would be able to see how |
30 | much of C<main>'s running time was due to time spent computing in |
31 | C<suba> and C<subb>. If that was just a little time, you would know |
32 | that optimizing or improving C<suba> and C<subb> would not have a |
33 | large effect on the performance of C<main>. But if there was a big |
34 | difference, you would know that C<suba> or C<subb> was a good |
35 | candidate for optimization if you needed to make C<main> go faster. |
36 | |
37 | Done. |
38 | |
39 | =item * |
40 | |
41 | Perhaps C<memoize> should return a reference to the original function |
42 | as well as one to the memoized version? But the programmer could |
43 | always construct such a reference themselves, so perhaps it's not |
44 | necessary. We save such a reference anyway, so a new package method |
45 | could return it on demand even if it wasn't provided by C<memoize>. |
46 | We could even bless the new function reference so that it could have |
47 | accessor methods for getting to the original function, the options, |
48 | the memo table, etc. |
49 | |
50 | Naah. |
51 | |
52 | =item * |
53 | |
54 | The TODISK feature is not ready yet. It will have to be rather |
55 | complicated, providing options for which disk method to use (GDBM? |
56 | DB_File? Flat file? Storable? User-supplied?) and which stringizing |
57 | method to use (FreezeThaw? Marshal? User-supplied?) |
58 | |
59 | Done! |
60 | |
61 | =item * |
62 | |
63 | Maybe an option for automatic expiration of cache values? (`After one |
64 | day,' `After five uses,' etc.) Also possibly an option to limit the |
65 | number of active entries with automatic LRU expiration. |
66 | |
67 | You have a long note to Mike Cariaso that outlines a good approach |
68 | that you sent on 9 April 1999. |
69 | |
70 | What's the timeout stuff going to look like? |
71 | |
72 | EXPIRE_TIME => time_in_sec |
73 | EXPIRE_USES => num_uses |
74 | MAXENTRIES => n |
75 | |
76 | perhaps? Is EXPIRE_USES actually useful? |
77 | |
78 | 19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES. |
79 | MAXENTRIES can come later as a separate module. |
80 | |
81 | =item * |
82 | |
83 | Put in a better example than C<fibo>. Show an example of a |
84 | nonrecursive function that simply takes a long time to run. |
85 | C<getpwuid> for example? But this exposes the bug that you can't say |
86 | C<memoize('getpwuid')>, so perhaps it's not a very good example. |
87 | |
88 | Well, I did add the ColorToRGB example, but it's still not so good. |
89 | These examples need a lot of work. C<factorial> might be a better |
90 | example than C<fibo>. |
91 | |
92 | =item * |
93 | |
94 | Add more regression tests for normalizers. |
95 | |
96 | =item * |
97 | |
98 | Maybe resolve normalizer function to code-ref at memoize time instead |
99 | of at function call time for efficiency? I think there was some |
100 | reason not to do this, but I can't remember what it was. |
101 | |
102 | =item * |
103 | |
104 | Add more array value tests to the test suite. |
105 | |
106 | Does it need more now? |
107 | |
108 | =item * |
109 | |
110 | Fix that `Subroutine u redefined ... line 484' message. |
111 | |
112 | Fixed, I think. |
113 | |
114 | =item * |
115 | |
116 | Get rid of any remaining *{$ref}{CODE} or similar magic hashes. |
117 | |
118 | =item * |
119 | |
120 | There should be an option to dump out the memoized values or to |
121 | otherwise traverse them. |
122 | |
123 | What for? |
124 | |
125 | Maybe the tied hash interface taskes care of this anyway? |
126 | |
127 | =item * |
128 | |
129 | Include an example that caches DNS lookups. |
130 | |
131 | =item * |
132 | |
133 | Make tie for Storable (Memoize::Storable) |
134 | |
135 | A prototype of Memoize::Storable is finished. Test it and add to the |
136 | test suite. |
137 | |
138 | Done. |
139 | |
140 | =item * |
141 | |
142 | Make tie for DBI (Memoize::DBI) |
143 | |
144 | =item * |
145 | |
146 | I think there's a bug. See `###BUG'. |
147 | |
148 | =item * |
149 | |
150 | Storable probably can't be done, because it doesn't allow updating. |
151 | Maybe a different interface that supports readonly caches fronted by a |
152 | writable in-memory cache? A generic tied hash maybe? |
153 | |
154 | FETCH { |
155 | if (it's in the memory hash) { |
156 | return it |
157 | } elsif (it's in the readonly disk hash) { |
158 | return it |
159 | } else { |
160 | not-there |
161 | } |
162 | } |
163 | |
164 | STORE { |
165 | put it into the in-memory hash |
166 | } |
167 | |
168 | Maybe `save' and `restore' methods? |
169 | |
170 | It isn't working right because the destructor doesn't get called at |
171 | the right time. |
172 | |
173 | This is fixed. `use strict vars' would have caught it immediately. Duh. |
174 | |
175 | =item * |
176 | |
177 | Don't forget about generic interface to Storable-like packages |
178 | |
899dc88a |
179 | 20010627 It would appear that you put this into 0.51. |
a0cb3900 |
180 | |
899dc88a |
181 | =item * |
a0cb3900 |
182 | |
183 | Maybe add in TODISK after all, with TODISK => 'filename' equivalent to |
184 | |
185 | SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666], |
186 | LIST_CACHE => MERGE |
187 | |
188 | =item * |
189 | |
190 | Maybe the default for LIST_CACHE should be MERGE anyway. |
191 | |
192 | =item * |
193 | |
194 | There's some terrible bug probably related to use under threaded perl, |
195 | possibly connected with line 56: |
196 | |
197 | my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }"; |
198 | |
199 | I think becayse C<@_> is lexically scoped in threadperl, the effect of |
200 | C<unshift> never makes it into C<_memoizer>. That's probably a bug in |
201 | Perl, but maybe I should work around it. Can anyone provide more |
202 | information here, or lend me a machine with threaded Perl where I can |
203 | test this theory? Line 59, currently commented out, may fix the |
204 | problem. |
205 | |
899dc88a |
206 | 20010623 Working around this in 0.65, but it still blows. |
207 | |
a0cb3900 |
208 | =item * |
209 | |
210 | Maybe if the original function has a prototype, the module can use |
211 | that to select the most appropriate default normalizer. For example, |
212 | if the prototype was C<($)>, there's no reason to use `join'. If it's |
213 | C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>. |
214 | |
215 | =item * |
216 | |
217 | Ariel Scolnikov suggests using the change counting problem as an |
218 | example. (How many ways to make change of a dollar?) |
219 | |
220 | =item * |
221 | |
899dc88a |
222 | Jonathan Roy found a use for `unmemoize'. If you're using the |
223 | Storable glue, and your program gets SIGINT, you find that the cache |
224 | data is not in the cache, because Perl normally writes it all out at |
225 | once from a DESTROY method, and signals skip DESTROY processing. So |
226 | you could add |
a0cb3900 |
227 | |
228 | $sig{INT} = sub { unmemoize ... }; |
229 | |
a0cb3900 |
230 | |
231 | =item * |
232 | |
233 | This means it would be useful to have a method to return references to |
234 | all the currently-memoized functions so that you could say |
235 | |
236 | $sig{INT} = sub { for $f (Memoize->all_memoized) { |
237 | unmemoize $f; |
238 | } |
239 | } |
240 | |
241 | |
242 | =item * |
243 | |
244 | 19990917 There should be a call you can make to get back the cache |
245 | itself. If there were, then you could delete stuff from it to |
246 | manually expire data items. |
247 | |
248 | =item * |
249 | |
250 | 19990925 Randal says that the docs for Memoize;:Expire should make it |
251 | clear that the expired entries are never flushed all at once. He |
252 | asked if you would need to do that manually. I said: |
253 | |
254 | Right, if that's what you want. If you have EXISTS return false, |
255 | it'll throw away the old cached item and replace it in the cache |
256 | with a new item. But if you want the cache to actually get smaller, |
257 | you have to do that yourself. |
258 | |
259 | I was planning to build an Expire module that implemented an LRU |
260 | queue and kept the cache at a constant fixed size, but I didn't get |
261 | to it yet. It's not clear to me that the automatic exptynig-out |
262 | behavior is very useful anyway. The whole point of a cache is to |
263 | trade space for time, so why bother going through the cache to throw |
264 | away old items before you need to? |
265 | |
266 | Randal then pointed out that it could discard expired items at DESTRoY |
267 | or TIEHASH time, which seemed like a good idea, because if the cache |
268 | is on disk you might like to keep it as small as possible. |
269 | |
270 | =item * |
271 | |
272 | 19991219 Philip Gwyn suggests this technique: You have a load_file |
273 | function that memoizes the file contexts. But then if the file |
274 | changes you get the old contents. So add a normalizer that does |
275 | |
276 | return join $;, (stat($_[0])[9]), $_[0]; |
277 | |
278 | Now when the modification date changes, the true key returned by the |
279 | normalizer is different, so you get a cache miss and it loads the new |
280 | contents. Disadvantage: The old contents are still in the cache. I |
281 | think it makes more sense to have a special expiration manager for |
282 | this. Make one up and bundle it. |
283 | |
284 | 19991220 I have one written: Memoize::ExpireFile. But how can you |
285 | make this work when the function might have several arguments, of |
286 | which some are filenames and some aren't? |
287 | |
288 | =item * |
289 | |
290 | 19991219 There should be an inheritable TIEHASH method that does the |
291 | argument processing properly. |
292 | |
293 | 19991220 Philip Gwyn contributed a patch for this. |
294 | |
295 | 20001231 You should really put this in. Jonathan Roy uncovered a |
296 | problem that it will be needed to solve. Here's the problem: He has: |
297 | |
298 | memoize "get_items", |
299 | LIST_CACHE => ["TIE", "Memoize::Expire", |
300 | LIFETIME => 86400, |
301 | TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666] |
302 | ]; |
303 | |
304 | This won't work, because memoize is trying to store listrefs in a |
305 | DB_File. He owuld have gotten a fatal error if he had done this: |
306 | |
307 | memoize "get_items", |
308 | LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666]' |
309 | |
310 | |
311 | But in this case, he tied the cache to Memoize::Expire, which is *not* |
312 | scalar-only, and the check for scalar-only ties is missing from |
313 | Memoize::Expire. The inheritable method can take care of this. |
314 | |
899dc88a |
315 | 20010623 I decided not to put it in. Instead, we avoid the problem by |
316 | getting rid of TIE. The HASH option does the same thing, and HASH is |
317 | so simple to support that a module is superfluous. |
318 | |
a0cb3900 |
319 | =item * |
320 | |
321 | 20001130 Custom cache manager that checks to make sure the function |
322 | return values actually match the memoized values. |
323 | |
324 | =item * |
325 | |
326 | 20001231 Expiration manager that watches cache performance and |
327 | accumulates statistics. Variation: Have it automatically unmemoize |
328 | the function if performance is bad. |
329 | |
330 | =item * |
331 | |
332 | 20010517 Option to have normalizer *modify* @_ for use by memoized |
333 | function. This would save code and time in cases like the one in the |
334 | manual under 'NORMALIZER', where both f() and normalize_f() do the |
335 | same analysis and make the same adjustments to the hash. If the |
336 | normalizer could make the adjustments and save the changes in @_, you |
337 | wouldn't have to do it twice. |
338 | |
3d4a255c |
339 | =item * |
899dc88a |
340 | 20010623 Add CLEAR methods to tied hash modules. |
341 | |
3d4a255c |
342 | =item * |
899dc88a |
343 | 20010623 You get a warning if you try to use DB_File as LIST_CACHE, |
344 | because it won't store lists. But if you use it as the underlying |
345 | cache with an expiration manager in the middle, no warning---the |
346 | expiration manager doesn't know it's managing a list cache, and |
347 | memoize doesn't know that DB_File is underlying. Is this fixable? |
348 | Probably not, but think about it. |
349 | |
a0cb3900 |
350 | =item * |
351 | There was probably some other stuff that I forgot. |
352 | |
353 | |
354 | |
355 | =back |