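package Hash::Util::FieldHash;

use strict;
use Scalar::Util qw( reftype);

require Exporter;
require XSLoader;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = ( 'all' => [ qw( fieldhash fieldhashes) ] );
our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

our $VERSION = '0.01';

my %ob_reg; # private object registry
sub _ob_reg { \ %ob_reg }
XSLoader::load('Hash::Util::FieldHash', $VERSION);

# Make a hash a field hash; the (\%) prototype allows "fieldhash my %foo"
sub fieldhash (\%) {
    for ( shift ) {
        return unless ref() && reftype( $_) eq 'HASH';
        return $_ if Hash::Util::FieldHash::_fieldhash( $_, 0); # already one
        return $_ if Hash::Util::FieldHash::_fieldhash( $_, 1); # converted now
    }
    return;
}

# Convert any number of hashrefs; "&" bypasses the prototype
sub fieldhashes { map &fieldhash( $_), @_ }

1;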

=head1 NAME

Hash::Util::FieldHash - Associate references with data

=head1 SYNOPSIS

  use Hash::Util::FieldHash qw(fieldhash fieldhashes);

  # Create a single field hash
  fieldhash my %foo;

  # Create three at once...
  fieldhashes \ my(%foo, %bar, %baz);

  # ...or any number
  fieldhashes @hashrefs;

Two functions generate field hashes:

C<fieldhash> creates a single field hash. The argument must be a hash.
It returns a reference to the given hash if successful, otherwise nothing.

  fieldhashes @hashrefs;

C<fieldhashes> creates any number of field hashes. Arguments must be
hash references. It returns the converted hashrefs in list context and
their number in scalar context.

Field hashes have three basic features:

=over

=item Key replacement

If a I<reference> is used as a field hash key, it is replaced by
the integer value of the reference address.

=item Thread support

In a new I<thread> a field hash is updated so that its keys reflect
the new reference addresses of the original objects.

=item Garbage collection

When a reference goes I<stale> after having been used as a field hash key,
the hash entry will be deleted.

=back

Field hashes are designed to maintain an association of a reference
with a value. The association is independent of the bless status of
the key; it is thread-safe and garbage-collected. These properties
are desirable in the construction of inside-out classes.

When used with keys that are plain scalars (not references), field
hashes behave like normal hashes.
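
The features described above can be observed with a short script. This is
a minimal sketch, assuming a perl that ships C<Hash::Util::FieldHash>; the
variable names and the class name C<Some::Class> are made up for
illustration. Thread support is not exercised here.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %name;

my $obj = {};                         # any reference will do
$name{$obj} = 'first object';

# Key replacement: the key actually stored is the reference address.
my ($key) = keys %name;
my $key_is_address = $key == refaddr $obj;

# The association does not depend on the bless status of the key.
bless $obj, 'Some::Class';
my $found_after_bless = exists $name{$obj};

# Garbage collection: the entry goes away together with the object.
undef $obj;
my $entries_left = keys %name;

print "key is address: $key_is_address, found after bless: ",
    "$found_after_bless, entries left: $entries_left\n";
```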

The association of a reference (namely an object) with a value is
central to the concept of inside-out classes. These classes don't
store the values of object variables (fields) inside the object itself,
but outside, as it were, in private hashes keyed by the object.

Normal hashes can be used for the purpose, but turn out to have
some disadvantages:

=over

=item Stringification

The stringification of references depends on the bless status of the
reference. A plain hash reference C<$ref> may stringify as C<HASH(0x1801018)>,
but after being blessed into class C<foo> the same reference will look like
C<foo=HASH(0x1801018)>, unless class C<foo> overloads stringification,
in which case it may show up as C<wurzelzwerg>. In a normal hash, the
entry stored under the old string wouldn't be found again after the blessing.

Bypassing stringification by use of C<Scalar::Util::refaddr> has been
used to correct this. Field hashes automatically stringify their
keys to the reference address in decimal.

=item Thread Dependency

When a new thread is created, the Perl interpreter is cloned, which
implies that all variables change their reference address. Thus,
in a daughter thread, the "same" reference C<$ref> contains a different
address, but the cloned hash still holds the key based on the original
address. Again, the association is broken.

A C<CLONE> method is required to update the hash on thread creation.
Field hashes come with an appropriate C<CLONE>.

=item Garbage Collection

When a reference (an object) is used as a hash key, the entry stays
in the hash when the object eventually goes out of scope. That can result
in a memory leak because the data associated with the object is not
freed. Worse than that, it can lead to a false association if the
reference address of the original object is later re-used. This
is not a remote possibility: address re-use happens all the time and
is a certainty under many conditions.

If the references in question are indeed objects, a C<DESTROY> method
I<must> clean up hashes that the object uses for storage. Special
methods are needed when unblessed references can occur.

Field hashes have garbage collection built in. If a reference
(blessed or unblessed) goes out of scope, corresponding entries
will be deleted from all field hashes.

=back

Thus, an inside-out class based on field hashes doesn't need a C<DESTROY>
method, nor a C<CLONE> method for thread support. That simplifies the
construction considerably.
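
The stringification and garbage-collection problems of normal hashes are
easy to reproduce. The following sketch uses only C<Scalar::Util>; the
hash names and the class name C<Some::Class> are made up for illustration,
and the thread problem is not shown.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my %by_string;      # normal hash keyed by the stringified reference
my %by_addr;        # normal hash keyed by the reference address

my $obj = {};
$by_string{$obj}        = 'data';
$by_addr{ refaddr $obj} = 'data';

# Blessing changes the stringification, but not the address.
bless $obj, 'Some::Class';
my $lost  = !exists $by_string{$obj};
my $found =  exists $by_addr{ refaddr $obj};

# Nothing removes the entry when the object goes away.
undef $obj;
my $stale = keys %by_addr;

print "lost: $lost, found: $found, stale entries: $stale\n";
```

The C<refaddr> workaround repairs the lookup, but the stale entry remains
until a C<DESTROY> method or similar removes it.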

Traditionally, the definition of an inside-out class contains a bare
block inside which a number of lexical hashes are declared and the
basic accessor methods defined, usually through C<Scalar::Util::refaddr>.
Further methods may be defined outside this block. There has to be
a DESTROY method and, for thread support, a CLONE method.
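
For comparison, a traditional inside-out class might look like the
following sketch. The class C<Counter> and all its method names are
hypothetical, and the C<CLONE> method needed for thread support is left
out to keep the example short.

```perl
use strict;
use warnings;

{
    package Counter;                       # hypothetical example class
    use Scalar::Util qw(refaddr);

    my %count;                             # one lexical hash per field

    sub new {
        my $self = bless do { \ my $x }, shift;
        $count{ refaddr $self} = 0;
        return $self;
    }
    sub inc   { ++$count{ refaddr shift} }
    sub count { $count{ refaddr shift} }

    # Without DESTROY the entry would stay behind and could be
    # picked up by a later object that re-uses the address.
    sub DESTROY { delete $count{ refaddr shift} }

    sub live_entries { scalar keys %count }   # for demonstration only
}

my $c = Counter->new;
$c->inc for 1 .. 3;
my $total = $c->count;
undef $c;                                  # DESTROY removes the entry
my $left = Counter::live_entries();

print "count: $total, entries left: $left\n";
```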

When field hashes are used, the basic structure remains the same.
Each lexical hash will be made a field hash. The call to C<refaddr>
can be omitted from the accessor methods. DESTROY and CLONE methods
are not needed.

If you have an existing inside-out class, simply making all hashes
field hashes with no other change should make no difference. Through
the calls to C<refaddr> or equivalent, the field hashes never get to
see a reference and work like normal hashes. Your DESTROY (and
CLONE) methods are still needed.

To make the field hashes kick in, it is easiest to redefine C<refaddr>
as

  sub refaddr { shift }

instead of importing it from C<Scalar::Util>. It should now be possible
to disable DESTROY and CLONE. Note that while it isn't disabled,
DESTROY will be called before the garbage collection of field hashes,
so it will be invoked with a functional object and will continue to
work.

It is not desirable to import the functions C<fieldhash> and/or
C<fieldhashes> into every class that is going to use them. They
are only used once to set up the class. When the class is up and running,
these functions serve no further purpose.

If there are only a few field hashes to declare, it is simplest to

  use Hash::Util::FieldHash;

early and call the functions qualified:

  Hash::Util::FieldHash::fieldhash my %foo;

Otherwise, import the functions into a convenient package like
C<HUF> or, more generically, C<Aux>

  {
      package Aux;
      use Hash::Util::FieldHash ':all';
  }

and call

  Aux::fieldhash my %foo;

Well... really only one example, and a rather trivial one at that.
There isn't much to exemplify.

=head3 A simple class...

The following example shows an utterly simple inside-out class
C<TimeStamp>, created using field hashes. It has a single field,
incorporated as the field hash C<%time>. Besides C<new> it has only
two methods: an initializer called C<stamp> that sets the field to
the current time, and a read-only accessor C<when> that returns the
time in C<localtime> format.

  # The class TimeStamp

  use Hash::Util::FieldHash;
  {
      package TimeStamp;

      Hash::Util::FieldHash::fieldhash my %time;

      sub stamp { $time{ $_[ 0]} = time; shift }           # initializer
      sub when  { scalar localtime $time{ shift()} }       # read accessor
      sub new   { bless( do { \ my $x }, shift)->stamp }   # creator
  }

  my $ts = TimeStamp->new;
  print $ts->when, "\n";

What is remarkable about this class definition is what isn't there: there
is no C<DESTROY> method, inherited or local, and no C<CLONE> method
is needed to make it thread-safe. Nor is there any need to call
C<refaddr> or something similar in the accessors.

The outstanding property of inside-out classes is their "inheritability".
Like all inside-out classes, C<TimeStamp> is a I<universal base class>.
We can put it on the C<@ISA> list of arbitrary classes and its methods
will just work, no matter how the host class is constructed. No traditional
Perl class allows that. The following program demonstrates the feat:

  # Make a sample of objects to add time stamps to.
  use Math::Complex;
  my @objects = (
      Math::Complex->new( 12, 13),
      qr/abc/,                # in class Regexp
      bless( [], 'Boing'),    # made up on the spot
  );

  # Prepare for use with TimeStamp
  for ( @objects ) {
      push @{ ref() . '::ISA' }, 'TimeStamp';
  }

  # Now apply TimeStamp methods to all objects and show the result
  for my $obj ( @objects ) {
      $obj->stamp;
      report( $obj, $obj->when);
  }

  # print a description of the object and the result of ->when
  use Scalar::Util qw( reftype);
  sub report {
      my ( $obj, $when) = @_;
      my $msg = sprintf "This is a %s object(a %s), its time is %s",
          ref $obj, reftype $obj, $when;
      $msg =~ s/\ba(?= [aeiouAEIOU])/an/g; # grammar matters :)
      print "$msg\n";
  }

=head2 Garbage-Collected Hashes

Garbage collection in a field hash means that entries will "spontaneously"
disappear when the object that created them disappears. That must be
borne in mind, especially when looping over a field hash. If anything
you do inside the loop could cause an object to go out of scope, a
random key may be deleted from the hash you are looping over. That
can throw off the loop iterator, so it's best to cache a consistent snapshot
of the keys and/or values and loop over that. You will still have to
check that a cached entry still exists when you get to it.
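
The snapshot technique can be sketched as follows. The names C<%label>,
C<@objects> and C<@visited> are made up for illustration, and emptying
C<@objects> inside the loop stands in for any side effect that lets
objects go out of scope.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %label;

my @objects = ( [], [], [] );
$label{ $objects[$_] } = "object $_" for 0 .. $#objects;

my @keys = keys %label;                  # snapshot taken before the loop
my @visited;
for my $key ( @keys ) {
    next unless exists $label{$key};     # the entry may be gone by now
    push @visited, $label{$key};
    @objects = ();                       # side effect: frees all objects
}

print "snapshot: ", scalar( @keys), ", visited: ", scalar( @visited), "\n";
```

Looping over the snapshot keeps the iteration well-defined, while the
C<exists> check skips entries that were collected in the meantime.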

Garbage collection can be confusing when keys are created in a field hash
from normal scalars as well as references. Once a reference is I<used> with
a field hash, the entry will be collected, even if it was later overwritten
with a plain scalar key (every positive integer is a candidate). This
is true even if the original entry was deleted in the meantime. In fact,
deletion from a field hash, and also a test for existence, constitute
I<use> in this sense and create a liability to delete the entry when
the reference goes out of scope. If you happen to create an entry
with an identical key from a string or integer, that will be collected
instead. Thus, mixed use of references and plain scalars as field hash
keys is not entirely supported.
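
A minimal sketch of the simplest such case, an entry overwritten through a
plain integer key, might look like this. The variable names are made up
for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %h;

my $obj  = [];
my $addr = refaddr $obj;                     # a plain integer

$h{$obj}  = 'stored via the reference';
$h{$addr} = 'stored via a plain integer';    # same key, overwrites

my $before = keys %h;
undef $obj;                                  # collects the entry anyway
my $after  = keys %h;

print "entries before: $before, after: $after\n";
```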

To make C<Hash::Util::FieldHash> work, there were two changes to
F<perl> itself. C<PERL_MAGIC_uvar> was made available for hashes,
and weak references now call uvar C<set> magic after a weakref has been
cleared. The first feature is used to make field hashes intercept
their keys upon access. The second one triggers garbage collection.

=head2 The C<PERL_MAGIC_uvar> interface for hashes

C<PERL_MAGIC_uvar> I<get> magic is called from C<hv_fetch_common> and
C<hv_delete_common> through the function C<hv_magic_uvar_xkey>, which
defines the interface. The call happens for hashes with "uvar" magic
if the C<ufuncs> structure has equal values in the C<uf_val> and C<uf_set>
fields. Hashes are unaffected if (and as long as) these fields
hold different values.

Upon the call, the C<mg_obj> field will hold the hash key to be accessed.
Upon return, the C<SV*> value in C<mg_obj> will be used in place of the
original key in the hash access. The integer index value in the first
parameter will be the C<action> value from C<hv_fetch_common>, or -1
if the call is from C<hv_delete_common>.

This is a template for a function suitable for the C<uf_val> field in
a C<ufuncs> structure for this call. The C<uf_set> and C<uf_index>
fields are irrelevant.

  IV watch_key(pTHX_ IV action, SV* field) {
      MAGIC* mg = mg_find(field, PERL_MAGIC_uvar);
      SV* keysv = mg->mg_obj;
      /* Do whatever you need to. If you decide to
         supply a different key newkey, return it like this
      */
      mg->mg_obj = newkey;
      return 0;
  }

=head2 Weakrefs call uvar magic

When a weak reference is stored in an C<SV> that has "uvar" magic, C<set>
magic is called after the reference has gone stale. This hook can be
used to trigger further garbage-collection activities associated with
the referenced object.

=head2 How field hashes work

The three features of field hashes, I<key replacement>, I<thread support>,
and I<garbage collection>, are supported by a data structure called
the I<object registry>. This is a private hash where every object
is stored. An "object" in this sense is any reference (blessed or
unblessed) that has been used as a field hash key.

The object registry keeps track of references that have been used as
field hash keys. The keys are generated from the reference address
as in a field hash (though the registry isn't a field hash). Each
value is a weak copy of the original reference, stored in an C<SV> that
is itself magical (C<PERL_MAGIC_uvar> again). The magical structure
holds a list (another hash, really) of field hashes that the reference
has been used with. When the weakref becomes stale, the magic is
activated and uses the list to delete the reference from all field
hashes it has been used with. After that, the entry is removed from
the object registry itself. Implicitly, that frees the magic structure
and the storage it has been using.

Whenever a reference is used as a field hash key, the object registry
is checked and a new entry is made if necessary. The field hash is
then added to the list of fields this reference has used.

The object registry is also used to repair a field hash after thread
cloning. Here, the entire object registry is processed. For every
reference found there, the field hashes it has used are visited and
the entry is updated.
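
The bookkeeping described above can be modelled in pure Perl. The
following is only a simplified sketch, not the module's implementation:
the real registry is maintained in XS and is triggered by magic as soon
as a weakref goes stale, whereas this model has to be swept explicitly.
The names C<%registry>, C<note_use> and C<sweep> are hypothetical, and
thread cloning is not modelled.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr weaken);

my %registry;    # address => { ref => weak copy, fields => { ... } }

# Record that $ref was used as a key in the hash $field and
# return the key (the reference address) to use in its place.
sub note_use {
    my ( $field, $ref) = @_;
    my $id = refaddr $ref;
    my $entry = $registry{$id} ||= do {
        my %e = ( ref => $ref, fields => {} );
        weaken $e{ref};
        \ %e;
    };
    $entry->{fields}{ refaddr $field} = $field;
    return $id;
}

# Delete every stale object from all hashes it was used with,
# then from the registry itself.
sub sweep {
    for my $id ( keys %registry ) {
        next if defined $registry{$id}{ref};
        delete $_->{$id} for values %{ $registry{$id}{fields} };
        delete $registry{$id};
    }
}

my ( %name, %age);
my $obj = {};
$name{ note_use( \ %name, $obj) } = 'Alice';
$age{  note_use( \ %age,  $obj) } = 42;

undef $obj;      # the weak copy in the registry becomes undef
sweep();

print "name: ", scalar( keys %name), ", age: ", scalar( keys %age),
    ", registry: ", scalar( keys %registry), "\n";
```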

=head2 Internal function Hash::Util::FieldHash::_fieldhash

  # test if %hash is a field hash
  my $result = _fieldhash \ %hash, 0;

  # make %hash a field hash
  my $result = _fieldhash \ %hash, 1;

C<_fieldhash> is the internal function used to create field hashes.
It takes two arguments, a hashref and a mode. If the mode is boolean
false, the hash is not changed but only tested for being a field hash. If
the hash isn't a field hash, the return value is boolean false. If it
is, the return value indicates the mode of the field hash. When called with
a boolean true mode, it turns the given hash into a field hash of this
mode, returning the mode of the created field hash. C<_fieldhash>
does not erase the given hash.

Currently there is only one type of field hash, and only the boolean
value of the mode makes a difference, but that may change.

=head1 AUTHOR

Anno Siegel, E<lt>anno4000@zrz.tu-berlin.deE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2006 by (Anno Siegel)

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.7 or,
at your option, any later version of Perl 5 you may have available.