Commit | Line | Data |
a0cb3900 |
1 | # Version 0.05 alpha $Revision: 1.5 $ $Date: 1999/09/17 14:57:55 $ |
2 | |
3 | =head1 TO DO |
4 | |
5 | =over 4 |
6 | |
7 | =item * |
8 | |
9 | LIST_CACHE doesn't work with ties to most DBM implementations, because |
10 | Memouze tries to save a listref, and DB_File etc. can only store |
11 | strings. This should at least be documented. Maybe Memoize could |
12 | detect the problem at TIE time and throw a fatal error. |
13 | |
14 | Try out MLDBM here and document it if it works. |
15 | |
16 | =item * |
17 | |
18 | We should extend the benchmarking module to allow |
19 | |
20 | timethis(main, { MEMOIZED => [ suba, subb ] }) |
21 | |
22 | What would this do? It would time C<main> three times, once with |
23 | C<suba> and C<subb> unmemoized, twice with them memoized. |
24 | |
25 | Why would you want to do this? By the third set of runs, the memo |
26 | tables would be fully populated, so all calls by C<main> to C<suba> |
27 | and C<subb> would return immediately. You would be able to see how |
28 | much of C<main>'s running time was due to time spent computing in |
29 | C<suba> and C<subb>. If that was just a little time, you would know |
30 | that optimizing or improving C<suba> and C<subb> would not have a |
31 | large effect on the performance of C<main>. But if there was a big |
32 | difference, you would know that C<suba> or C<subb> was a good |
33 | candidate for optimization if you needed to make C<main> go faster. |
34 | |
35 | Done. |
36 | |
37 | =item * |
38 | |
39 | Perhaps C<memoize> should return a reference to the original function |
40 | as well as one to the memoized version? But the programmer could |
41 | always construct such a reference themselves, so perhaps it's not |
42 | necessary. We save such a reference anyway, so a new package method |
43 | could return it on demand even if it wasn't provided by C<memoize>. |
44 | We could even bless the new function reference so that it could have |
45 | accessor methods for getting to the original function, the options, |
46 | the memo table, etc. |
47 | |
48 | Naah. |
49 | |
50 | =item * |
51 | |
52 | The TODISK feature is not ready yet. It will have to be rather |
53 | complicated, providing options for which disk method to use (GDBM? |
54 | DB_File? Flat file? Storable? User-supplied?) and which stringizing |
55 | method to use (FreezeThaw? Marshal? User-supplied?) |
56 | |
57 | Done! |
58 | |
59 | =item * |
60 | |
61 | Maybe an option for automatic expiration of cache values? (`After one |
62 | day,' `After five uses,' etc.) Also possibly an option to limit the |
63 | number of active entries with automatic LRU expiration. |
64 | |
65 | You have a long note to Mike Cariaso that outlines a good approach |
66 | that you sent on 9 April 1999. |
67 | |
68 | What's the timeout stuff going to look like? |
69 | |
70 | EXPIRE_TIME => time_in_sec |
71 | EXPIRE_USES => num_uses |
72 | MAXENTRIES => n |
73 | |
74 | perhaps? Is EXPIRE_USES actually useful? |
75 | |
76 | 19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES. |
77 | MAXENTRIES can come later as a separate module. |
78 | |
79 | =item * |
80 | |
81 | Put in a better example than C<fibo>. Show an example of a |
82 | nonrecursive function that simply takes a long time to run. |
83 | C<getpwuid> for example? But this exposes the bug that you can't say |
84 | C<memoize('getpwuid')>, so perhaps it's not a very good example. |
85 | |
86 | Well, I did add the ColorToRGB example, but it's still not so good. |
87 | These examples need a lot of work. C<factorial> might be a better |
88 | example than C<fibo>. |
89 | |
90 | =item * |
91 | |
92 | Add more regression tests for normalizers. |
93 | |
94 | =item * |
95 | |
96 | Maybe resolve normalizer function to code-ref at memoize time instead |
97 | of at function call time for efficiency? I think there was some |
98 | reason not to do this, but I can't remember what it was. |
99 | |
100 | =item * |
101 | |
102 | Add more array value tests to the test suite. |
103 | |
104 | Does it need more now? |
105 | |
106 | =item * |
107 | |
108 | Fix that `Subroutine u redefined ... line 484' message. |
109 | |
110 | Fixed, I think. |
111 | |
112 | =item * |
113 | |
114 | Get rid of any remaining *{$ref}{CODE} or similar magic hashes. |
115 | |
116 | =item * |
117 | |
118 | There should be an option to dump out the memoized values or to |
119 | otherwise traverse them. |
120 | |
121 | What for? |
122 | |
123 | Maybe the tied hash interface taskes care of this anyway? |
124 | |
125 | =item * |
126 | |
127 | Include an example that caches DNS lookups. |
128 | |
129 | =item * |
130 | |
131 | Make tie for Storable (Memoize::Storable) |
132 | |
133 | A prototype of Memoize::Storable is finished. Test it and add to the |
134 | test suite. |
135 | |
136 | Done. |
137 | |
138 | =item * |
139 | |
140 | Make tie for DBI (Memoize::DBI) |
141 | |
142 | =item * |
143 | |
144 | I think there's a bug. See `###BUG'. |
145 | |
146 | =item * |
147 | |
148 | Storable probably can't be done, because it doesn't allow updating. |
149 | Maybe a different interface that supports readonly caches fronted by a |
150 | writable in-memory cache? A generic tied hash maybe? |
151 | |
152 | FETCH { |
153 | if (it's in the memory hash) { |
154 | return it |
155 | } elsif (it's in the readonly disk hash) { |
156 | return it |
157 | } else { |
158 | not-there |
159 | } |
160 | } |
161 | |
162 | STORE { |
163 | put it into the in-memory hash |
164 | } |
165 | |
166 | Maybe `save' and `restore' methods? |
167 | |
168 | It isn't working right because the destructor doesn't get called at |
169 | the right time. |
170 | |
171 | This is fixed. `use strict vars' would have caught it immediately. Duh. |
172 | |
173 | =item * |
174 | |
175 | Don't forget about generic interface to Storable-like packages |
176 | |
177 | =item * |
178 | |
179 | |
180 | Maybe add in TODISK after all, with TODISK => 'filename' equivalent to |
181 | |
182 | SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666], |
183 | LIST_CACHE => MERGE |
184 | |
185 | =item * |
186 | |
187 | Maybe the default for LIST_CACHE should be MERGE anyway. |
188 | |
189 | =item * |
190 | |
191 | There's some terrible bug probably related to use under threaded perl, |
192 | possibly connected with line 56: |
193 | |
194 | my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }"; |
195 | |
196 | I think becayse C<@_> is lexically scoped in threadperl, the effect of |
197 | C<unshift> never makes it into C<_memoizer>. That's probably a bug in |
198 | Perl, but maybe I should work around it. Can anyone provide more |
199 | information here, or lend me a machine with threaded Perl where I can |
200 | test this theory? Line 59, currently commented out, may fix the |
201 | problem. |
202 | |
203 | =item * |
204 | |
205 | Maybe if the original function has a prototype, the module can use |
206 | that to select the most appropriate default normalizer. For example, |
207 | if the prototype was C<($)>, there's no reason to use `join'. If it's |
208 | C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>. |
209 | |
210 | =item * |
211 | |
212 | Ariel Scolnikov suggests using the change counting problem as an |
213 | example. (How many ways to make change of a dollar?) |
214 | |
215 | =item * |
216 | |
217 | I found a use for `unmemoize'. If you're using the Storable glue, and |
218 | your program gets SIGINT, you find that the cache data is not in the |
219 | cache, because Perl normally writes it all out at once from a |
220 | DESTROY method, and signals skip DESTROY processing. So you could add |
221 | |
222 | $sig{INT} = sub { unmemoize ... }; |
223 | |
224 | (Jonathan Roy pointed this out) |
225 | |
226 | =item * |
227 | |
228 | This means it would be useful to have a method to return references to |
229 | all the currently-memoized functions so that you could say |
230 | |
231 | $sig{INT} = sub { for $f (Memoize->all_memoized) { |
232 | unmemoize $f; |
233 | } |
234 | } |
235 | |
236 | |
237 | =item * |
238 | |
239 | 19990917 There should be a call you can make to get back the cache |
240 | itself. If there were, then you could delete stuff from it to |
241 | manually expire data items. |
242 | |
243 | =item * |
244 | |
245 | 19990925 Randal says that the docs for Memoize;:Expire should make it |
246 | clear that the expired entries are never flushed all at once. He |
247 | asked if you would need to do that manually. I said: |
248 | |
249 | Right, if that's what you want. If you have EXISTS return false, |
250 | it'll throw away the old cached item and replace it in the cache |
251 | with a new item. But if you want the cache to actually get smaller, |
252 | you have to do that yourself. |
253 | |
254 | I was planning to build an Expire module that implemented an LRU |
255 | queue and kept the cache at a constant fixed size, but I didn't get |
256 | to it yet. It's not clear to me that the automatic exptynig-out |
257 | behavior is very useful anyway. The whole point of a cache is to |
258 | trade space for time, so why bother going through the cache to throw |
259 | away old items before you need to? |
260 | |
261 | Randal then pointed out that it could discard expired items at DESTRoY |
262 | or TIEHASH time, which seemed like a good idea, because if the cache |
263 | is on disk you might like to keep it as small as possible. |
264 | |
265 | =item * |
266 | |
267 | 19991219 Philip Gwyn suggests this technique: You have a load_file |
268 | function that memoizes the file contexts. But then if the file |
269 | changes you get the old contents. So add a normalizer that does |
270 | |
271 | return join $;, (stat($_[0])[9]), $_[0]; |
272 | |
273 | Now when the modification date changes, the true key returned by the |
274 | normalizer is different, so you get a cache miss and it loads the new |
275 | contents. Disadvantage: The old contents are still in the cache. I |
276 | think it makes more sense to have a special expiration manager for |
277 | this. Make one up and bundle it. |
278 | |
279 | 19991220 I have one written: Memoize::ExpireFile. But how can you |
280 | make this work when the function might have several arguments, of |
281 | which some are filenames and some aren't? |
282 | |
283 | =item * |
284 | |
285 | 19991219 There should be an inheritable TIEHASH method that does the |
286 | argument processing properly. |
287 | |
288 | 19991220 Philip Gwyn contributed a patch for this. |
289 | |
290 | 20001231 You should really put this in. Jonathan Roy uncovered a |
291 | problem that it will be needed to solve. Here's the problem: He has: |
292 | |
293 | memoize "get_items", |
294 | LIST_CACHE => ["TIE", "Memoize::Expire", |
295 | LIFETIME => 86400, |
296 | TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666] |
297 | ]; |
298 | |
299 | This won't work, because memoize is trying to store listrefs in a |
300 | DB_File. He owuld have gotten a fatal error if he had done this: |
301 | |
302 | memoize "get_items", |
303 | LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666]' |
304 | |
305 | |
306 | But in this case, he tied the cache to Memoize::Expire, which is *not* |
307 | scalar-only, and the check for scalar-only ties is missing from |
308 | Memoize::Expire. The inheritable method can take care of this. |
309 | |
310 | =item * |
311 | |
312 | 20001130 Custom cache manager that checks to make sure the function |
313 | return values actually match the memoized values. |
314 | |
315 | =item * |
316 | |
317 | 20001231 Expiration manager that watches cache performance and |
318 | accumulates statistics. Variation: Have it automatically unmemoize |
319 | the function if performance is bad. |
320 | |
321 | =item * |
322 | |
323 | 20010517 Option to have normalizer *modify* @_ for use by memoized |
324 | function. This would save code and time in cases like the one in the |
325 | manual under 'NORMALIZER', where both f() and normalize_f() do the |
326 | same analysis and make the same adjustments to the hash. If the |
327 | normalizer could make the adjustments and save the changes in @_, you |
328 | wouldn't have to do it twice. |
329 | |
330 | =item * |
331 | There was probably some other stuff that I forgot. |
332 | |
333 | |
334 | |
335 | =back |