package Hash::Util::FieldHash;

use Scalar::Util qw( reftype);

our @ISA = qw(Exporter);

our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

our $VERSION = '0.01';

our %ob_reg; # silence possible 'once' warning in XSLoader
XSLoader::load('Hash::Util::FieldHash', $VERSION);

        return unless ref() && reftype( $_) eq 'HASH';
        return $_ if Hash::Util::FieldHash::_fieldhash( $_, 0);
        return $_ if Hash::Util::FieldHash::_fieldhash( $_, 1);

sub fieldhashes { map &fieldhash( $_), @_ }

Hash::Util::FieldHash - Associate references with data

  use Hash::Util qw(fieldhash fieldhashes);

  # Create a single field hash

  # Create three at once...
  fieldhashes \ my(%foo, %bar, %baz);

  fieldhashes @hashrefs;

Two functions generate field hashes:

Creates a single field hash. The argument must be a hash. Returns
a reference to the given hash if successful, otherwise nothing.

  fieldhashes @hashrefs;

Creates any number of field hashes. Arguments must be hash references.
Returns the converted hashrefs in list context, their number in scalar
context.
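
The return values described above can be checked with a short script.
This is only a sketch: the variable names are made up for illustration,
and it imports directly from C<Hash::Util::FieldHash>.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash fieldhashes);

# fieldhash takes the hash itself and returns a reference to it.
my $ref       = fieldhash my %single;
my $same_hash = $ref == \%single ? 1 : 0;

# fieldhashes takes hash references and returns them in list context...
my @converted = fieldhashes( \ my (%foo, %bar, %baz) );

# ...and returns their number in scalar context.
my $count = fieldhashes( \ my (%one, %two) );

print "same hash: $same_hash, converted: ", scalar(@converted),
      ", count: $count\n";
```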

Field hashes have three basic features:

If a I<reference> is used as a field hash key, it is replaced by
the integer value of the reference address.

In a new I<thread> a field hash is updated so that its keys reflect
the new reference addresses of the original objects.

=item Garbage collection

When a reference goes I<stale> after having been used as a field hash key,
the hash entry will be deleted.

Field hashes are designed to maintain an association of a reference
with a value. The association is independent of the bless status of
the key; it is thread-safe and garbage-collected. These properties
are desirable in the construction of inside-out classes.

When used with keys that are plain scalars (not references), field
hashes behave like normal hashes.
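
A minimal sketch of key replacement next to a plain string key follows.
The variable names are illustrative, and C<refaddr> from C<Scalar::Util>
is used only to show which key the field hash actually stores.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %field;

my $obj = [];
$field{$obj}  = 'keyed by reference';
$field{plain} = 'keyed by string';

# The reference key is stored as the reference address.
my $key_is_address = exists $field{ refaddr($obj) } ? 1 : 0;

# The plain string key behaves as in any normal hash.
my $plain_value = $field{plain};
my $key_count   = scalar keys %field;

print "address key: $key_is_address, keys: $key_count\n";
```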

The association of a reference (namely an object) with a value is
central to the concept of inside-out classes. These classes don't
store the values of object variables (fields) inside the object itself,
but outside, as it were, in private hashes keyed by the object.

Normal hashes can be used for the purpose, but turn out to have
some disadvantages:

=item Stringification

The stringification of references depends on the bless status of the
reference. A plain hash reference C<$ref> may stringify as C<HASH(0x1801018)>,
but after being blessed into class C<foo> the same reference will look
like C<foo=HASH(0x1801018)>, unless class C<foo> overloads stringification,
in which case it may show up as C<wurzelzwerg>. In a normal hash, the
stringified reference wouldn't be found again after the blessing.

Bypassing stringification by use of C<Scalar::Util::refaddr> has been
used to correct this. Field hashes automatically stringify their
keys to the reference address in decimal.
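
The effect of blessing on the two kinds of hash can be seen in a few
lines. This is a sketch; the class name C<Foo> and the variable names
are invented for the demonstration.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

my %plain;
fieldhash my %field;

my $ref = {};
$plain{$ref} = 'data';    # key is the string "HASH(0x...)"
$field{$ref} = 'data';    # key is the reference address

bless $ref, 'Foo';        # stringification now starts with "Foo="

my $in_plain = exists $plain{$ref} ? 1 : 0;   # entry is no longer found
my $in_field = exists $field{$ref} ? 1 : 0;   # entry is still found

print "plain hash: $in_plain, field hash: $in_field\n";
```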

=item Thread Dependency

When a new thread is created, the Perl interpreter is cloned, which
implies that all variables change their reference address. Thus,
in a daughter thread, the "same" reference C<$ref> contains a different
address, but the cloned hash still holds the key based on the original
address. Again, the association is broken.

A C<CLONE> method is required to update the hash on thread creation.
Field hashes come with an appropriate C<CLONE>.

=item Garbage Collection

When a reference (an object) is used as a hash key, the entry stays
in the hash when the object eventually goes out of scope. That can result
in a memory leak because the data associated with the object is not
freed. Worse than that, it can lead to a false association if the
reference address of the original object is later re-used. This
is not a remote possibility; address re-use happens all the time and
is a certainty under many conditions.

If the references in question are indeed objects, a C<DESTROY> method
I<must> clean up hashes that the object uses for storage. Special
methods are needed when unblessed references can occur.

Field hashes have garbage collection built in. If a reference
(blessed or unblessed) goes out of scope, corresponding entries
will be deleted from all field hashes.
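
The built-in garbage collection can be observed with an unblessed
reference and two field hashes. The following is a sketch with
made-up field names.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhashes);

fieldhashes \ my (%name, %age);

my ($inside, $outside);
{
    my $person = {};              # unblessed reference used as key
    $name{$person} = 'Ada';
    $age{$person}  = 36;
    $inside = scalar(keys %name) + scalar(keys %age);
}
# $person is gone, and so are its entries in both field hashes.
$outside = scalar(keys %name) + scalar(keys %age);

print "entries inside the block: $inside, afterwards: $outside\n";
```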

Thus, an inside-out class based on field hashes doesn't need a C<DESTROY>
method, nor a C<CLONE> method for thread support. That facilitates the
construction considerably.

Traditionally, the definition of an inside-out class contains a bare
block inside which a number of lexical hashes are declared and the
basic accessor methods defined, usually through C<Scalar::Util::refaddr>.
Further methods may be defined outside this block. There has to be
a DESTROY method and, for thread support, a CLONE method.
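
For comparison, a traditional inside-out class without field hashes
might look like the sketch below. The class C<Counter> and its methods
are hypothetical, and the C<CLONE> method needed for thread support is
left out to keep the sketch short.

```perl
use strict;
use warnings;

{
    package Counter;                # hypothetical class name
    use Scalar::Util qw(refaddr);

    my %count;                      # one lexical hash per field

    sub new {
        my $self = bless \(my $scalar), shift;
        $count{ refaddr($self) } = 0;
        return $self;
    }
    sub inc   { ++$count{ refaddr($_[0]) } }
    sub value { $count{ refaddr($_[0]) } }

    # Without this, entries would outlive their objects.
    sub DESTROY { delete $count{ refaddr($_[0]) } }

    # Class method, for illustration only: number of stored entries.
    sub entries { scalar keys %count }
}

my $c = Counter->new;
$c->inc for 1 .. 2;
my $value = $c->value;
my $live  = Counter->entries;
undef $c;                           # triggers DESTROY
my $left  = Counter->entries;

print "value: $value, entries before/after: $live/$left\n";
```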

When field hashes are used, the basic structure remains the same.
Each lexical hash will be made a field hash. The call to C<refaddr>
can be omitted from the accessor methods. DESTROY and CLONE methods
are not necessary.

If you have an existing inside-out class, simply making all hashes
field hashes with no other change should make no difference. Through
the calls to C<refaddr> or equivalent, the field hashes never get to
see a reference and work like normal hashes. Your DESTROY (and
CLONE) methods are still needed.

To make the field hashes kick in, it is easiest to redefine C<refaddr>
as

    sub refaddr { shift }

instead of importing it from C<Scalar::Util>. It should now be possible
to disable DESTROY and CLONE. Note that while it isn't disabled,
DESTROY will be called before the garbage collection of field hashes,
so it will be invoked with a functional object.
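
Applied to a small class, the migration might look like the following
sketch. C<Counter> and its methods are hypothetical; the only changes
from a traditional inside-out class are the local C<refaddr>, the
C<fieldhash> call, and the missing DESTROY method.

```perl
use strict;
use warnings;

{
    package Counter;                # hypothetical class name
    use Hash::Util::FieldHash;

    sub refaddr { shift }           # replaces the import from Scalar::Util

    Hash::Util::FieldHash::fieldhash my %count;

    sub new {
        my $self = bless \(my $scalar), shift;
        $count{ refaddr($self) } = 0;   # the field hash sees the reference
        return $self;
    }
    sub inc   { ++$count{ refaddr($_[0]) } }
    sub value { $count{ refaddr($_[0]) } }

    # Class method, for illustration only: number of stored entries.
    sub entries { scalar keys %count }

    # No DESTROY here: the field hash removes stale entries itself.
}

my $c = Counter->new;
$c->inc for 1 .. 2;
my $value = $c->value;
my $live  = Counter->entries;
undef $c;
my $left  = Counter->entries;

print "value: $value, entries before/after: $live/$left\n";
```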

It is not necessary to import the functions C<fieldhash> and/or
C<fieldhashes> into every class that is going to use them. When
the class is up and running, these functions have no business there.
If there are only a few field hashes to declare, it is simplest to

    use Hash::Util::FieldHash;

early and call the functions qualified:

    Hash::Util::FieldHash::fieldhash my %foo;

Otherwise, import the functions into a convenient package like
C<HUF> or, more generically, C<Aux>:

    use Hash::Util::FieldHash ':all';

    Aux::fieldhash my %foo;

Well... really only one example, and a rather trivial one at that.
There isn't much to exemplify.

=head3 A simple class...

The following example shows an utterly simple inside-out class
C<TimeStamp>, created using field hashes. It has a single field,
incorporated as the field hash C<%time>. Besides C<new> it has only
two methods: an initializer called C<stamp> that sets the field to
the current time, and a read-only accessor C<when> that returns the
time in C<localtime> format.

    # The class TimeStamp

    use Hash::Util::FieldHash;

        Hash::Util::FieldHash::fieldhash my %time;

        sub stamp { $time{ $_[ 0]} = time; shift } # initializer
        sub when { scalar localtime $time{ shift()} } # read accessor
        sub new { bless( do { \ my $x }, shift)->stamp } # creator

    my $ts = TimeStamp->new;
    print $ts->when, "\n";

Remarkable about this class definition is what isn't there: there
is no C<DESTROY> method, inherited or local, and no C<CLONE> method
is needed to make it thread-safe. Not to mention no need to call
C<refaddr> or something similar in the accessors.

The outstanding property of inside-out classes is their "inheritability".
Like all inside-out classes, C<TimeStamp> is a I<universal base class>.
We can put it on the C<@ISA> list of arbitrary classes and its methods
will just work, no matter how the host class is constructed. This is
demonstrated by the following program:

    # Make a sample of objects to add time stamps to.

        Math::Complex->new( 12, 13),

        qr/abc/, # in class Regexp
        bless( [], 'Boing'), # made up on the spot

    # Prepare for use with TimeStamp

        push @{ ref() . '::ISA' }, 'TimeStamp';

    # Now apply TimeStamp methods to all objects and show the result

    for my $obj ( @objects ) {

        report( $obj, $obj->when);

    # print a description of the object and the result of ->when

    use Scalar::Util qw( reftype);

        my ( $obj, $when) = @_;
        my $msg = sprintf "This is a %s object(a %s), its time is %s",

        $msg =~ s/\ba(?= [aeiouAEIOU])/an/g; # grammar matters :)

=head2 Garbage-Collected Hashes

Garbage collection in a field hash means that entries will "spontaneously"
disappear when the object that created them disappears. That must be
borne in mind, especially when looping over a field hash. If anything
you do inside the loop could cause an object to go out of scope, a
random key may be deleted from the hash you are looping over. That
can throw the loop iterator, so it's best to cache a consistent snapshot
of the keys and/or values and loop over that. You will still have to
check that a cached entry still exists when you get to it.
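
The snapshot technique can be sketched as follows. The data is made up,
and the loop body deliberately frees all objects on its first pass to
simulate a side effect that triggers garbage collection.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %label;

my @objects = map { [] } 1 .. 3;
$label{ $objects[$_] } = "object $_" for 0 .. $#objects;

my @snapshot = keys %label;     # stable list, safe to iterate over
my @seen;
for my $key (@snapshot) {
    # The entry may have been collected since the snapshot was taken.
    next unless exists $label{$key};
    push @seen, $label{$key};
    @objects = ();              # side effect: all objects go away
}

my $seen_count = scalar @seen;
my $remaining  = scalar keys %label;
print "visited $seen_count of ", scalar(@snapshot), " cached keys\n";
```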

Garbage collection can be confusing when keys are created in a field hash
from normal scalars as well as references. Once a reference is I<used> with
a field hash, the entry will be collected, even if it was later overwritten
with a plain scalar key (every positive integer is a candidate). This
is true even if the original entry was deleted in the meantime. In fact,
deletion from a field hash, and also a test for existence constitute
I<use> in this sense and create a liability to delete the entry when
the reference goes out of scope. If you happen to create an entry
with an identical key from a string or integer, that will be collected
instead. Thus, mixed use of references and plain scalars as field hash
keys is not entirely supported.
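
The simplest case of this hazard, an entry overwritten through a plain
integer key, can be sketched like this. The variable names are
illustrative, and C<refaddr> is only used to produce an integer that
coincides with the reference's key.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %h;

my ($addr, $count_inside, $value_inside);
{
    my $ref = [];
    $addr = refaddr($ref);
    $h{$ref}  = 'from reference';
    $h{$addr} = 'from plain integer';   # same key, overwrites the entry
    $count_inside = scalar keys %h;
    $value_inside = $h{$addr};
}
# The reference is gone, and the overwritten entry went with it.
my $survives = exists $h{$addr} ? 1 : 0;

print "entries inside: $count_inside, survives: $survives\n";
```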

To make C<Hash::Util::FieldHash> work, there were two changes to
F<perl> itself. C<PERL_MAGIC_uvar> was made available for hashes,
and weak references now call uvar C<set> magic after a weakref has been
cleared. The first feature is used to make field hashes intercept
their keys upon access. The second one triggers garbage collection.

=head2 The C<PERL_MAGIC_uvar> interface for hashes

C<PERL_MAGIC_uvar> I<get> magic is called from C<hv_fetch_common> and
C<hv_delete_common> through the function C<hv_magic_uvar_xkey>, which
defines the interface. The call happens for hashes with "uvar" magic
if the C<ufuncs> structure has equal values in the C<uf_val> and C<uf_set>
fields. Hashes are unaffected if (and as long as) these fields
hold different values.

Upon the call, the C<mg_obj> field will hold the hash key to be accessed.
Upon return, the C<SV*> value in C<mg_obj> will be used in place of the
original key in the hash access. The integer index value in the first
parameter will be the C<action> value from C<hv_fetch_common>, or -1
if the call is from C<hv_delete_common>.

This is a template for a function suitable for the C<uf_val> field in
a C<ufuncs> structure for this call. The C<uf_set> and C<uf_index>
fields are irrelevant.

    IV watch_key(pTHX_ IV action, SV* field) {
        MAGIC* mg = mg_find(field, PERL_MAGIC_uvar);
        SV* keysv = mg->mg_obj;
        /* Do whatever you need to. If you decide to
           supply a different key newkey, return it like this

=head2 Weakrefs call uvar magic

When a weak reference is stored in an C<SV> that has "uvar" magic, C<set>
magic is called after the reference has gone stale. This hook can be
used to trigger further garbage-collection activities associated with
the referenced object.

=head2 How field hashes work

The three features of field hashes, I<key replacement>, I<thread support>,
and I<garbage collection> are supported by a data structure called
the I<object registry>. This is currently the hash
C<Hash::Util::FieldHash::ob_reg> though there may be a more private
place for it in the future. An "object" is any reference (blessed
or unblessed) that has been used as a field hash key.

The object registry keeps track of references that have been used as
field hash keys. The keys are generated from the reference address
like in a field hash (though the registry isn't a field hash). Each
value is a weak copy of the original reference, stored in an C<SV> that
is itself magical (C<PERL_MAGIC_uvar> again). The magical structure
holds a list (another hash, really) of field hashes that the reference
has been used with. When the weakref becomes stale, the magic is
activated and uses the list to delete the reference from all field
hashes it has been used with. After that, the entry is removed from
the object registry itself. Implicitly, that frees the magic structure
and the storage it has been using.

Whenever a reference is used as a field hash key, the object registry
is checked and a new entry is made if necessary. The field hash is
then added to the list of fields this reference has used.

The object registry is also used to repair a field hash after thread
cloning. Here, the entire object registry is processed. For every
reference found there, the field hashes it has used are visited and
the entry is updated.

=head2 Internal function Hash::Util::FieldHash::_fieldhash

    # test if %hash is a field hash
    my $result = _fieldhash \ %hash, 0;

    # make %hash a field hash
    my $result = _fieldhash \ %hash, 1;

C<_fieldhash> is the internal function used to create field hashes.
It takes two arguments, a hashref and a mode. If the mode is boolean
false, the hash is not changed, only tested for being a field hash. If
the hash isn't a field hash, the return value is boolean false. If it
is, the return value indicates the mode of field hash. When called with
a boolean true mode, it turns the given hash into a field hash of this
mode, returning the mode of the created field hash. C<_fieldhash>
does not erase the given hash.

Currently there is only one type of field hash, and only the boolean
value of the mode makes a difference, but that may change.

Anno Siegel, E<lt>anno4000@zrz.tu-berlin.deE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2006 by (icke)

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.7 or,
at your option, any later version of Perl 5 you may have available.