Commit | Line | Data |
1e73acc8 |
1 | package Hash::Util::FieldHash; |
2 | |
3 | use 5.009004; |
4 | use strict; |
5 | use warnings; |
1e73acc8 |
6 | use Scalar::Util qw( reftype); |
7 | |
8 | require Exporter; |
9 | our @ISA = qw(Exporter); |
10 | our %EXPORT_TAGS = ( |
11 | 'all' => [ qw( |
12 | fieldhash |
13 | fieldhashes |
14 | )], |
15 | ); |
16 | our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } ); |
1e73acc8 |
17 | |
18 | our $VERSION = '0.01'; |
19 | |
20 | { |
21 | require XSLoader; |
6ff38c27 |
22 | my %ob_reg; # private object registry |
23 | sub _ob_reg { \ %ob_reg } |
1e73acc8 |
24 | XSLoader::load('Hash::Util::FieldHash', $VERSION); |
25 | } |
26 | |
27 | sub fieldhash (\%) { |
28 | for ( shift ) { |
29 | return unless ref() && reftype( $_) eq 'HASH'; |
30 | return $_ if Hash::Util::FieldHash::_fieldhash( $_, 0); |
31 | return $_ if Hash::Util::FieldHash::_fieldhash( $_, 1); |
32 | return; |
33 | } |
34 | } |
35 | |
36 | sub fieldhashes { map &fieldhash( $_), @_ } |
37 | |
38 | 1; |
39 | __END__ |
40 | |
41 | =head1 NAME |
42 | |
43 | Hash::Util::FieldHash - Associate references with data |
44 | |
45 | =head1 SYNOPSIS |
46 | |
47 | use Hash::Util::FieldHash qw(fieldhash fieldhashes); |
6ff38c27 |
48 | |
1e73acc8 |
49 | # Create a single field hash |
50 | fieldhash my %foo; |
6ff38c27 |
51 | |
1e73acc8 |
52 | # Create three at once... |
53 | fieldhashes \ my(%foo, %bar, %baz); |
54 | # ...or any number |
55 | fieldhashes @hashrefs; |
56 | |
57 | =head1 Functions |
58 | |
59 | Two functions generate field hashes: |
60 | |
61 | =over |
62 | |
63 | =item fieldhash |
64 | |
65 | fieldhash %hash; |
66 | |
67 | Creates a single field hash. The argument must be a hash. Returns |
68 | a reference to the given hash if successful, otherwise nothing. |
69 | |
70 | =item fieldhashes |
71 | |
72 | fieldhashes @hashrefs; |
73 | |
74 | Creates any number of field hashes. Arguments must be hash references. |
75 | Returns the converted hashrefs in list context, their number in scalar |
76 | context. |
77 | |
78 | =back |
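The return values described above can be checked with a short script. This is a minimal sketch, assuming the functions are imported directly from C<Hash::Util::FieldHash>; the hash names are made up for illustration.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash fieldhashes);

# fieldhash takes a hash (prototype \%) and returns a reference to it.
my $single = fieldhash my %name;

# fieldhashes takes hash references; list context yields the converted refs.
my @converted = fieldhashes \ my (%age, %email);

# In scalar context the same call yields the number of converted hashes.
my $count = fieldhashes \ my (%phone, %city, %zip);

print "fieldhash returned a reference to the hash itself\n"
    if $single == \%name;
print "list context: ", scalar(@converted), " refs; scalar context: $count\n";
```

Because C<fieldhashes> has no prototype, it needs explicit references, whereas C<fieldhash> accepts the hash itself.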
79 | |
80 | =head1 Description |
81 | |
82 | =head2 Features |
83 | |
84 | Field hashes have three basic features: |
85 | |
86 | =over |
87 | |
88 | =item Key exchange |
89 | |
90 | If a I<reference> is used as a field hash key, it is replaced by |
91 | the integer value of the reference address. |
92 | |
93 | =item Thread support |
94 | |
95 | In a new I<thread> a field hash is updated so that its keys reflect |
96 | the new reference addresses of the original objects. |
97 | |
98 | =item Garbage collection |
99 | |
100 | When a reference goes I<stale> after having been used as a field hash key, |
101 | the hash entry will be deleted. |
102 | |
103 | =back |
104 | |
105 | Field hashes are designed to maintain an association of a reference |
106 | with a value. The association is independent of the bless status of |
107 | the key; it is thread-safe and garbage-collected. These properties |
108 | are desirable in the construction of inside-out classes. |
109 | |
110 | When used with keys that are plain scalars (not references), field |
111 | hashes behave like normal hashes. |
112 | |
113 | =head2 Rationale |
114 | |
115 | The association of a reference (namely an object) with a value is |
116 | central to the concept of inside-out classes. These classes don't |
117 | store the values of object variables (fields) inside the object itself, |
118 | but outside, as it were, in private hashes keyed by the object. |
119 | |
120 | Normal hashes can be used for the purpose, but turn out to have |
121 | some disadvantages: |
122 | |
123 | =over |
124 | |
125 | =item Stringification |
126 | |
127 | The stringification of references depends on the bless status of the |
128 | reference. A plain hash reference C<$ref> may stringify as C<HASH(0x1801018)>, |
129 | but after being blessed into class C<foo> the same reference will look like |
130 | C<foo=HASH(0x1801018)>, unless class C<foo> overloads stringification, |
131 | in which case it may show up as C<wurzelzwerg>. In a normal hash, the |
132 | stringified reference wouldn't be found again after the blessing. |
133 | |
134 | Bypassing stringification by use of C<Scalar::Util::refaddr> has been |
135 | used to correct this. Field hashes automatically stringify their |
136 | keys to the reference address in decimal. |
137 | |
138 | =item Thread Dependency |
139 | |
140 | When a new thread is created, the Perl interpreter is cloned, which |
141 | implies that all variables change their reference address. Thus, |
142 | in a daughter thread, the "same" reference C<$ref> contains a different |
143 | address, but the cloned hash still holds the key based on the original |
144 | address. Again, the association is broken. |
145 | |
146 | A C<CLONE> method is required to update the hash on thread creation. |
147 | Field hashes come with an appropriate C<CLONE>. |
148 | |
149 | =item Garbage Collection |
150 | |
151 | When a reference (an object) is used as a hash key, the entry stays |
152 | in the hash when the object eventually goes out of scope. That can result |
153 | in a memory leak because the data associated with the object is not |
154 | freed. Worse than that, it can lead to a false association if the |
155 | reference address of the original object is later re-used. This |
156 | is not a remote possibility: address re-use happens all the time and |
157 | is a certainty under many conditions. |
158 | |
159 | If the references in question are indeed objects, a C<DESTROY> method |
160 | I<must> clean up hashes that the object uses for storage. Special |
161 | methods are needed when unblessed references can occur. |
162 | |
163 | Field hashes have garbage collection built in. If a reference |
164 | (blessed or unblessed) goes out of scope, corresponding entries |
165 | will be deleted from all field hashes. |
166 | |
167 | =back |
168 | |
169 | Thus, an inside-out class based on field hashes doesn't need a C<DESTROY> |
170 | method, nor a C<CLONE> method for thread support. That facilitates the |
171 | construction considerably. |
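The stringification problem can be seen by storing the same reference in a normal hash and in a field hash, then blessing it. This is an illustrative sketch; C<%plain>, C<%field> and the class name C<foo> are placeholders.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

my %plain;                    # ordinary hash, keyed by the stringified reference
fieldhash my %field;          # field hash, keyed by the reference address

my $ref = {};
$plain{$ref} = 'data';
$field{$ref} = 'data';

bless $ref, 'foo';            # stringification changes to foo=HASH(0x...)

my $plain_found = exists $plain{$ref} ? 1 : 0;
my $field_found = exists $field{$ref} ? 1 : 0;
print "plain hash finds the entry after bless: $plain_found\n";
print "field hash finds the entry after bless: $field_found\n";
```

The normal hash still holds the entry under the old string, but a lookup with the blessed reference no longer reaches it.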
172 | |
173 | =head2 How to use |
174 | |
175 | Traditionally, the definition of an inside-out class contains a bare |
176 | block inside which a number of lexical hashes are declared and the |
177 | basic accessor methods defined, usually through C<Scalar::Util::refaddr>. |
178 | Further methods may be defined outside this block. There has to be |
179 | a DESTROY method and, for thread support, a CLONE method. |
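For comparison, the traditional layout might look roughly like this. It is a sketch only: the class C<Counter>, its methods and the helper C<_live> are hypothetical, and the C<CLONE> method needed for thread support is left out for brevity.

```perl
use strict;
use warnings;

{
    package Counter;              # hypothetical class name
    use Scalar::Util qw(refaddr);

    my %count;                    # one lexical hash per field

    sub new {
        my $class = shift;
        my $self  = bless \ do { my $anon }, $class;
        $count{ refaddr $self } = 0;
        return $self;
    }
    sub inc   { ++$count{ refaddr $_[0] } }
    sub value { $count{ refaddr $_[0] } }

    # Without this, entries would outlive their objects.
    sub DESTROY { delete $count{ refaddr $_[0] } }

    sub _live { scalar keys %count }   # for the demonstration only
}

my $c = Counter->new;
$c->inc for 1 .. 3;
my $value = $c->value;
print "value: $value\n";

undef $c;                         # DESTROY cleans up the field
print "entries after destruction: ", Counter::_live(), "\n";
```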
180 | |
181 | When field hashes are used, the basic structure remains the same. |
182 | Each lexical hash will be made a field hash. The call to C<refaddr> |
183 | can be omitted from the accessor methods. DESTROY and CLONE methods |
184 | are not necessary. |
185 | |
186 | If you have an existing inside-out class, simply making all hashes |
187 | field hashes with no other change should make no difference. Through |
188 | the calls to C<refaddr> or equivalent, the field hashes never get to |
189 | see a reference and work like normal hashes. Your DESTROY (and |
190 | CLONE) methods are still needed. |
191 | |
192 | To make the field hashes kick in, it is easiest to redefine C<refaddr> |
193 | as |
194 | |
195 | sub refaddr { shift } |
196 | |
197 | instead of importing it from C<Scalar::Util>. It should now be possible |
198 | to disable DESTROY and CLONE. Note that while it isn't disabled, |
199 | DESTROY will be called before the garbage collection of field hashes, |
6ff38c27 |
200 | so it will be invoked with a functional object and will continue to |
201 | function. |
202 | |
203 | It is not desirable to import the functions C<fieldhash> and/or |
204 | C<fieldhashes> into every class that is going to use them. They |
205 | are only used once to set up the class. When the class is up and running, |
206 | these functions serve no more purpose. |
1e73acc8 |
207 | |
1e73acc8 |
208 | If there are only a few field hashes to declare, it is simplest to |
209 | |
210 | use Hash::Util::FieldHash; |
211 | |
212 | early and call the functions qualified: |
213 | |
214 | Hash::Util::FieldHash::fieldhash my %foo; |
215 | |
216 | Otherwise, import the functions into a convenient package like |
217 | C<HUF> or, more generic, C<Aux> |
218 | |
219 | { |
220 | package Aux; |
221 | use Hash::Util::FieldHash ':all'; |
222 | } |
223 | |
224 | and call |
225 | |
226 | Aux::fieldhash my %foo; |
227 | |
228 | as needed. |
229 | |
230 | =head2 Examples |
231 | |
232 | Well... really only one example, and a rather trivial one at that. |
233 | There isn't much to exemplify. |
234 | |
235 | =head3 A simple class... |
236 | |
237 | The following example shows an utterly simple inside-out class |
238 | C<TimeStamp>, created using field hashes. It has a single field, |
239 | incorporated as the field hash C<%time>. Besides C<new> it has only |
240 | two methods: an initializer called C<stamp> that sets the field to |
241 | the current time, and a read-only accessor C<when> that returns the |
242 | time in C<localtime> format. |
243 | |
244 | # The class TimeStamp |
245 | |
246 | use Hash::Util::FieldHash; |
247 | { |
248 | package TimeStamp; |
249 | |
250 | Hash::Util::FieldHash::fieldhash my %time; |
251 | |
252 | sub stamp { $time{ $_[ 0]} = time; shift } # initializer |
253 | sub when { scalar localtime $time{ shift()} } # read accessor |
254 | sub new { bless( do { \ my $x }, shift)->stamp } # creator |
255 | } |
256 | |
257 | # See if it works |
258 | my $ts = TimeStamp->new; |
259 | print $ts->when, "\n"; |
260 | |
261 | Remarkable about this class definition is what isn't there: there |
262 | is no C<DESTROY> method, inherited or local, and no C<CLONE> method |
263 | is needed to make it thread-safe. Not to mention no need to call |
264 | C<refaddr> or something similar in the accessors. |
265 | |
266 | =head3 ...in action |
267 | |
268 | The outstanding property of inside-out classes is their "inheritability". |
269 | Like all inside-out classes, C<TimeStamp> is a I<universal base class>. |
270 | We can put it on the C<@ISA> list of arbitrary classes and its methods |
6ff38c27 |
271 | will just work, no matter how the host class is constructed. No traditional |
272 | Perl class allows that. The following program demonstrates the feat: |
1e73acc8 |
273 | |
274 | # Make a sample of objects to add time stamps to. |
275 | |
276 | use Math::Complex; |
277 | use IO::Handle; |
278 | |
279 | my @objects = ( |
280 | Math::Complex->new( 12, 13), |
281 | IO::Handle->new(), |
282 | qr/abc/, # in class Regexp |
283 | bless( [], 'Boing'), # made up on the spot |
6ff38c27 |
284 | # add more |
1e73acc8 |
285 | ); |
286 | |
287 | # Prepare for use with TimeStamp |
6ff38c27 |
288 | |
1e73acc8 |
289 | for ( @objects ) { |
290 | no strict 'refs'; |
291 | push @{ ref() . '::ISA' }, 'TimeStamp'; |
292 | } |
293 | |
294 | # Now apply TimeStamp methods to all objects and show the result |
295 | |
296 | for my $obj ( @objects ) { |
297 | $obj->stamp; |
298 | report( $obj, $obj->when); |
299 | } |
300 | |
301 | # print a description of the object and the result of ->when |
302 | |
303 | use Scalar::Util qw( reftype); |
304 | sub report { |
305 | my ( $obj, $when) = @_; |
306 | my $msg = sprintf "This is a %s object(a %s), its time is %s", |
307 | ref $obj, |
308 | reftype $obj, |
309 | $when; |
310 | $msg =~ s/\ba(?= [aeiouAEIOU])/an/g; # grammar matters :) |
311 | print "$msg\n"; |
312 | } |
313 | |
314 | =head2 Garbage-Collected Hashes |
315 | |
316 | Garbage collection in a field hash means that entries will "spontaneously" |
317 | disappear when the object that created them disappears. That must be |
318 | borne in mind, especially when looping over a field hash. If anything |
319 | you do inside the loop could cause an object to go out of scope, a |
320 | random key may be deleted from the hash you are looping over. That |
321 | can throw off the loop iterator, so it's best to cache a consistent snapshot |
322 | of the keys and/or values and loop over that. You will still have to |
323 | check that a cached entry still exists when you get to it. |
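The snapshot-and-recheck pattern could be written along these lines. This is a sketch under simple assumptions: three anonymous arrays stand in for objects, and one of them is released on the first pass through the loop to simulate an object going out of scope.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %label;
my @objects = ( [], [], [] );
$label{$_} = 'item' for @objects;

my @snapshot  = keys %label;   # stable copy of the keys
my $processed = 0;
my $dropped   = 0;

for my $key (@snapshot) {
    # Simulate an object going out of scope in the middle of the loop.
    pop @objects unless $dropped++;

    # The entry for that object is gone by now, so re-check before use.
    next unless exists $label{$key};
    $processed++;
}

print "processed $processed of ", scalar(@snapshot), " snapshot keys\n";
```

Looping over C<@snapshot> instead of C<keys %label> keeps the iteration stable, and the C<exists> test skips the key whose object has disappeared.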
324 | |
325 | Garbage collection can be confusing when keys are created in a field hash |
326 | from normal scalars as well as references. Once a reference is I<used> with |
327 | a field hash, the entry will be collected, even if it was later overwritten |
328 | with a plain scalar key (every positive integer is a candidate). This |
329 | is true even if the original entry was deleted in the meantime. In fact, |
330 | deletion from a field hash, and also a test for existence constitute |
331 | I<use> in this sense and create a liability to delete the entry when |
332 | the reference goes out of scope. If you happen to create an entry |
333 | with an identical key from a string or integer, that will be collected |
334 | instead. Thus, mixed use of references and plain scalars as field hash |
335 | keys is not entirely supported. |
336 | |
337 | =head1 Guts |
338 | |
339 | To make C<Hash::Util::FieldHash> work, there were two changes to |
340 | F<perl> itself. C<PERL_MAGIC_uvar> was made available for hashes, |
341 | and weak references now call uvar C<set> magic after a weakref has been |
342 | cleared. The first feature is used to make field hashes intercept |
343 | their keys upon access. The second one triggers garbage collection. |
344 | |
345 | =head2 The C<PERL_MAGIC_uvar> interface for hashes |
346 | |
347 | C<PERL_MAGIC_uvar> I<get> magic is called from C<hv_fetch_common> and |
348 | C<hv_delete_common> through the function C<hv_magic_uvar_xkey>, which |
349 | defines the interface. The call happens for hashes with "uvar" magic |
350 | if the C<ufuncs> structure has equal values in the C<uf_val> and C<uf_set> |
351 | fields. Hashes are unaffected if (and as long as) these fields |
352 | hold different values. |
353 | |
354 | Upon the call, the C<mg_obj> field will hold the hash key to be accessed. |
355 | Upon return, the C<SV*> value in C<mg_obj> will be used in place of the |
356 | original key in the hash access. The integer index value in the first |
357 | parameter will be the C<action> value from C<hv_fetch_common>, or -1 |
358 | if the call is from C<hv_delete_common>. |
359 | |
360 | This is a template for a function suitable for the C<uf_val> field in |
361 | a C<ufuncs> structure for this call. The C<uf_set> and C<uf_index> |
362 | fields are irrelevant. |
363 | |
364 | IV watch_key(pTHX_ IV action, SV* field) { |
365 | MAGIC* mg = mg_find(field, PERL_MAGIC_uvar); |
366 | SV* keysv = mg->mg_obj; |
367 | /* Do whatever you need to. If you decide to |
368 | supply a different key newkey, return it like this |
369 | */ |
370 | sv_2mortal(newkey); |
371 | mg->mg_obj = newkey; |
372 | return 0; |
373 | } |
374 | |
375 | =head2 Weakrefs call uvar magic |
376 | |
377 | When a weak reference is stored in an C<SV> that has "uvar" magic, C<set> |
378 | magic is called after the reference has gone stale. This hook can be |
379 | used to trigger further garbage-collection activities associated with |
380 | the referenced object. |
381 | |
382 | =head2 How field hashes work |
383 | |
384 | The three features of field hashes, I<key exchange>, I<thread support>, |
385 | and I<garbage collection> are supported by a data structure called |
6ff38c27 |
386 | the I<object registry>. This is a private hash where every object |
387 | is stored. An "object" in this sense is any reference (blessed or |
388 | unblessed) that has been used as a field hash key. |
1e73acc8 |
389 | |
390 | The object registry keeps track of references that have been used as |
391 | field hash keys. The keys are generated from the reference address |
392 | as in a field hash (though the registry isn't a field hash). Each |
393 | value is a weak copy of the original reference, stored in an C<SV> that |
394 | is itself magical (C<PERL_MAGIC_uvar> again). The magical structure |
395 | holds a list (another hash, really) of field hashes that the reference |
396 | has been used with. When the weakref becomes stale, the magic is |
397 | activated and uses the list to delete the reference from all field |
398 | hashes it has been used with. After that, the entry is removed from |
399 | the object registry itself. Implicitly, that frees the magic structure |
400 | and the storage it has been using. |
401 | |
402 | Whenever a reference is used as a field hash key, the object registry |
403 | is checked and a new entry is made if necessary. The field hash is |
404 | then added to the list of fields this reference has used. |
405 | |
406 | The object registry is also used to repair a field hash after thread |
407 | cloning. Here, the entire object registry is processed. For every |
408 | reference found there, the field hashes it has used are visited and |
409 | the entry is updated. |
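The bookkeeping described above can be modelled in pure Perl. The sketch below is not the module's implementation: the real registry lives in XS and is triggered by magic when a weakref goes stale, whereas this model needs an explicit C<sweep> call. The names C<%registry>, C<register> and C<sweep> are invented for the illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr weaken);

# id => { ref => weak copy of the reference, fields => { hash id => \%hash } }
my %registry;

# Note that $ref is being used as a key of %$hash; return the key to use.
sub register {
    my ($ref, $hash) = @_;
    my $id    = refaddr $ref;
    my $entry = $registry{$id} ||= do {
        my %e = ( ref => $ref, fields => {} );
        weaken $e{ref};           # the registry must not keep the object alive
        \%e;
    };
    $entry->{fields}{ refaddr $hash } = $hash;
    return $id;
}

# Remove stale references from every hash they were used with,
# then from the registry itself.
sub sweep {
    for my $id ( keys %registry ) {
        next if defined $registry{$id}{ref};
        delete $_->{$id} for values %{ $registry{$id}{fields} };
        delete $registry{$id};
    }
}

my ( %name, %age );
my $obj = {};
$name{ register( $obj, \%name ) } = 'Alice';
$age{  register( $obj, \%age  ) } = 30;

undef $obj;                       # the weak copy in the registry goes stale
sweep();
print "name entries: ", scalar( keys %name ), "\n";
print "age entries: ",  scalar( keys %age ),  "\n";
```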
410 | |
411 | =head2 Internal function Hash::Util::FieldHash::_fieldhash |
412 | |
413 | # test if %hash is a field hash |
414 | my $result = _fieldhash \ %hash, 0; |
415 | |
416 | # make %hash a field hash |
417 | my $result = _fieldhash \ %hash, 1; |
418 | |
419 | C<_fieldhash> is the internal function used to create field hashes. |
420 | It takes two arguments, a hashref and a mode. If the mode is boolean |
421 | false, the hash is left unchanged and only tested for being a field hash. If |
422 | the hash isn't a field hash the return value is boolean false. If it |
423 | is, the return value indicates the mode of field hash. When called with |
424 | a boolean true mode, it turns the given hash into a field hash of this |
425 | mode, returning the mode of the created field hash. C<_fieldhash> |
426 | does not erase the given hash. |
427 | |
428 | Currently there is only one type of field hash, and only the boolean |
429 | value of the mode makes a difference, but that may change. |
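A quick way to see the test mode at work is to query a hash before and after conversion. This is a small sketch that relies only on the boolean value of the result, since the exact mode numbers may change; the hash contents are made up.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

my %hash = ( answer => 42 );

# Mode 0 only tests; a plain hash reports false.
my $before = Hash::Util::FieldHash::_fieldhash( \%hash, 0 );

fieldhash %hash;              # the public wrapper calls _fieldhash with a true mode

# After conversion the test reports a true mode value.
my $after = Hash::Util::FieldHash::_fieldhash( \%hash, 0 );

print "before: ", ( $before ? 'field hash' : 'plain hash' ), "\n";
print "after:  ", ( $after  ? 'field hash' : 'plain hash' ), "\n";
print "contents kept: $hash{answer}\n";
```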
430 | |
431 | =head1 AUTHOR |
432 | |
433 | Anno Siegel, E<lt>anno4000@zrz.tu-berlin.deE<gt> |
434 | |
435 | =head1 COPYRIGHT AND LICENSE |
436 | |
6ff38c27 |
437 | Copyright (C) 2006 by (Anno Siegel) |
1e73acc8 |
438 | |
439 | This library is free software; you can redistribute it and/or modify |
440 | it under the same terms as Perl itself, either Perl version 5.8.7 or, |
441 | at your option, any later version of Perl 5 you may have available. |
442 | |
443 | =cut |