package Hash::Util::FieldHash;

use 5.009004;
use strict;
use warnings;
use Scalar::Util qw( reftype);

require Exporter;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = (
    'all' => [ qw(
        fieldhash
        fieldhashes
    )],
);
our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

our $VERSION = '0.01';

{
    require XSLoader;
    my %ob_reg; # private object registry
    sub _ob_reg { \ %ob_reg }
    XSLoader::load('Hash::Util::FieldHash', $VERSION);
}

# Turn a single hash into a field hash.  The (\%) prototype passes
# the hash by reference.
sub fieldhash (\%) {
    for ( shift ) {
        return unless ref() && reftype( $_) eq 'HASH';
        # mode 0 only tests: already a field hash, nothing to do
        return $_ if Hash::Util::FieldHash::_fieldhash( $_, 0);
        # mode 1 converts the hash
        return $_ if Hash::Util::FieldHash::_fieldhash( $_, 1);
        return;
    }
}

# Convert any number of hashrefs; &fieldhash bypasses the prototype.
sub fieldhashes { map &fieldhash( $_), @_ }

1;
__END__

=head1 NAME

Hash::Util::FieldHash - Associate references with data

=head1 SYNOPSIS

    use Hash::Util::FieldHash qw(fieldhash fieldhashes);

    # Create a single field hash
    fieldhash my %foo;

    # Create three at once...
    fieldhashes \ my(%foo, %bar, %baz);
    # ...or any number
    fieldhashes @hashrefs;

=head1 Functions

Two functions generate field hashes:

=over

=item fieldhash

    fieldhash %hash;

Creates a single field hash.  The argument must be a hash.  Returns
a reference to the given hash if successful, otherwise nothing.

=item fieldhashes

    fieldhashes @hashrefs;

Creates any number of field hashes.  Arguments must be hash references.
Returns the converted hashrefs in list context, their number in scalar
context.

=back
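
The return values described above can be observed with a short script.
This is only a sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash fieldhashes);

# fieldhash takes the hash itself (prototype \%) and returns a
# reference to it on success.
my %single;
my $ref = fieldhash %single;

# fieldhashes takes hash references.
my ( %first, %second);
my @converted = fieldhashes \%first, \%second;   # list context: the hashrefs
my $count     = fieldhashes \%first, \%second;   # scalar context: their number

print "fieldhash returned a reference to its argument\n"
    if $ref == \%single;
print "fieldhashes converted ", scalar( @converted), " hashes\n";
```

Calling either function on a hash that already is a field hash is
harmless; the hash is returned unchanged.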

=head1 Description

=head2 Features

Field hashes have three basic features:

=over

=item Key exchange

If a I<reference> is used as a field hash key, it is replaced by
the integer value of the reference address.

=item Thread support

In a new I<thread> a field hash is updated so that its keys reflect
the new reference addresses of the original objects.

=item Garbage collection

When a reference goes I<stale> after having been used as a field hash key,
the hash entry will be deleted.

=back

Field hashes are designed to maintain an association of a reference
with a value.  The association is independent of the bless status of
the key; it is thread safe and garbage-collected.  These properties
are desirable in the construction of inside-out classes.

When used with keys that are plain scalars (not references), field
hashes behave like normal hashes.
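
Key exchange and garbage collection can be seen in a few lines.  The
following is a minimal sketch; the hash C<%note> and the class name
C<Some::Class> are invented for the demonstration.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);
use Scalar::Util qw(refaddr);

fieldhash my %note;
my ( $key_is_addr, $after_bless);

{
    my $obj = {};                    # any reference will do
    $note{ $obj} = 'hello';

    # Key exchange: the stored key is the reference address,
    # not a string like "HASH(0x...)".
    my ( $key) = keys %note;
    $key_is_addr = $key == refaddr( $obj);

    # Blessing changes the stringification but not the address,
    # so the entry is still found.
    bless $obj, 'Some::Class';
    $after_bless = $note{ $obj};
}

# $obj has gone out of scope, so its entry has been collected.
print "key is the address: ", ( $key_is_addr ? 'yes' : 'no'), "\n";
print "value after bless: $after_bless\n";
print "entries left: ", scalar( keys %note), "\n";
```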

=head2 Rationale

The association of a reference (namely an object) with a value is
central to the concept of inside-out classes.  These classes don't
store the values of object variables (fields) inside the object itself,
but outside, as it were, in private hashes keyed by the object.

Normal hashes can be used for the purpose, but turn out to have
some disadvantages:

=over

=item Stringification

The stringification of references depends on the bless status of the
reference.  A plain hash reference C<$ref> may stringify as C<HASH(0x1801018)>,
but after being blessed into class C<foo> the same reference will look
like C<foo=HASH(0x1801018)>, unless class C<foo> overloads stringification,
in which case it may show up as C<wurzelzwerg>.  In a normal hash, the
stringified reference wouldn't be found again after the blessing.

Bypassing stringification by use of C<Scalar::Util::refaddr> has been
used to correct this.  Field hashes automatically stringify their
keys to the reference address in decimal.

=item Thread Dependency

When a new thread is created, the Perl interpreter is cloned, which
implies that all variables change their reference address.  Thus,
in a daughter thread, the "same" reference C<$ref> contains a different
address, but the cloned hash still holds the key based on the original
address.  Again, the association is broken.

A C<CLONE> method is required to update the hash on thread creation.
Field hashes come with an appropriate C<CLONE>.

=item Garbage Collection

When a reference (an object) is used as a hash key, the entry stays
in the hash when the object eventually goes out of scope.  That can result
in a memory leak because the data associated with the object is not
freed.  Worse than that, it can lead to a false association if the
reference address of the original object is later re-used.  This
is not a remote possibility: address re-use happens all the time and
is a certainty under many conditions.

If the references in question are indeed objects, a C<DESTROY> method
I<must> clean up hashes that the object uses for storage.  Special
methods are needed when unblessed references can occur.

Field hashes have garbage collection built in.  If a reference
(blessed or unblessed) goes out of scope, corresponding entries
will be deleted from all field hashes.

=back

Thus, an inside-out class based on field hashes needs neither a C<DESTROY>
method nor a C<CLONE> method for thread support.  That simplifies the
construction considerably.

=head2 How to use

Traditionally, the definition of an inside-out class contains a bare
block inside which a number of lexical hashes are declared and the
basic accessor methods defined, usually through C<Scalar::Util::refaddr>.
Further methods may be defined outside this block.  There has to be
a DESTROY method and, for thread support, a CLONE method.

When field hashes are used, the basic structure remains the same.
Each lexical hash will be made a field hash.  The call to C<refaddr>
can be omitted from the accessor methods.  DESTROY and CLONE methods
are not necessary.
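
To make the comparison concrete, here is a sketch of the same tiny class
written both ways.  The class names C<Counter::Traditional> and
C<Counter::FieldHash> and the helper method C<live> are hypothetical and
exist only for this illustration.

```perl
use strict;
use warnings;

{
    package Counter::Traditional;
    use Scalar::Util qw(refaddr);

    my %count;                                  # ordinary lexical hash

    sub new {
        my $class = shift;
        my $self  = bless( \ my $anon, $class);
        $count{ refaddr( $self)} = 0;           # key by address explicitly
        return $self;
    }
    sub inc     { ++$count{ refaddr( $_[ 0])} }
    sub live    { scalar keys %count }          # number of stored entries
    sub DESTROY { delete $count{ refaddr( $_[ 0])} }   # manual cleanup
    # A CLONE method would be needed as well for thread support.
}

{
    package Counter::FieldHash;
    use Hash::Util::FieldHash ();

    Hash::Util::FieldHash::fieldhash my %count; # field hash

    sub new {
        my $class = shift;
        my $self  = bless( \ my $anon, $class);
        $count{ $self} = 0;                     # the object itself is the key
        return $self;
    }
    sub inc  { ++$count{ $_[ 0]} }
    sub live { scalar keys %count }
    # No DESTROY and no CLONE.
}

for my $class ( qw( Counter::Traditional Counter::FieldHash) ) {
    {
        my $c = $class->new;
        $c->inc for 1 .. 3;
        print "$class: ", $class->live, " live entry while in scope\n";
    }
    print "$class: ", $class->live, " live entries after scope exit\n";
}
```

Both classes clean up after themselves, but only the traditional one has
to say so.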

If you have an existing inside-out class, simply making all hashes
field hashes with no other change should make no difference.  Through
the calls to C<refaddr> or equivalent, the field hashes never get to
see a reference and work like normal hashes.  Your DESTROY (and
CLONE) methods are still needed.

To make the field hashes kick in, it is easiest to redefine C<refaddr>
as

    sub refaddr { shift }

instead of importing it from C<Scalar::Util>.  It should now be possible
to disable DESTROY and CLONE.  Note that as long as it isn't disabled,
DESTROY will be called before the garbage collection of field hashes,
so it will be invoked with a functional object and will continue to
function.

It is not desirable to import the functions C<fieldhash> and/or
C<fieldhashes> into every class that is going to use them.  They
are only used once to set up the class.  When the class is up and running,
these functions serve no more purpose.

If there are only a few field hashes to declare, it is simplest to

    use Hash::Util::FieldHash;

early and call the functions qualified:

    Hash::Util::FieldHash::fieldhash my %foo;

Otherwise, import the functions into a convenient package like
C<HUF> or, more generically, C<Aux>

    {
        package Aux;
        use Hash::Util::FieldHash ':all';
    }

and call

    Aux::fieldhash my %foo;

as needed.

=head2 Examples

Well... really only one example, and a rather trivial one at that.
There isn't much to exemplify.

=head3 A simple class...

The following example shows an utterly simple inside-out class
C<TimeStamp>, created using field hashes.  It has a single field,
incorporated as the field hash C<%time>.  Besides C<new> it has only
two methods: an initializer called C<stamp> that sets the field to
the current time, and a read-only accessor C<when> that returns the
time in C<localtime> format.

    # The class TimeStamp

    use Hash::Util::FieldHash;
    {
        package TimeStamp;

        Hash::Util::FieldHash::fieldhash my %time;

        sub stamp { $time{ $_[ 0]} = time; shift }        # initializer
        sub when  { scalar localtime $time{ shift()} }    # read accessor
        sub new   { bless( do { \ my $x }, shift)->stamp } # creator
    }

    # See if it works
    my $ts = TimeStamp->new;
    print $ts->when, "\n";

Remarkable about this class definition is what isn't there: there
is no C<DESTROY> method, inherited or local, and no C<CLONE> method
is needed to make it thread-safe.  Not to mention no need to call
C<refaddr> or something similar in the accessors.

=head3 ...in action

The outstanding property of inside-out classes is their "inheritability".
Like all inside-out classes, C<TimeStamp> is a I<universal base class>.
We can put it on the C<@ISA> list of arbitrary classes and its methods
will just work, no matter how the host class is constructed.  No traditional
Perl class allows that.  The following program demonstrates the feat:

    # Make a sample of objects to add time stamps to.

    use Math::Complex;
    use IO::Handle;

    my @objects = (
        Math::Complex->new( 12, 13),
        IO::Handle->new(),
        qr/abc/,                # in class Regexp
        bless( [], 'Boing'),    # made up on the spot
        # add more
    );

    # Prepare for use with TimeStamp

    for ( @objects ) {
        no strict 'refs';
        push @{ ref() . '::ISA' }, 'TimeStamp';
    }

    # Now apply TimeStamp methods to all objects and show the result

    for my $obj ( @objects ) {
        $obj->stamp;
        report( $obj, $obj->when);
    }

    # print a description of the object and the result of ->when

    use Scalar::Util qw( reftype);
    sub report {
        my ( $obj, $when) = @_;
        my $msg = sprintf "This is a %s object(a %s), its time is %s",
            ref $obj,
            reftype $obj,
            $when;
        $msg =~ s/\ba(?= [aeiouAEIOU])/an/g; # grammar matters :)
        print "$msg\n";
    }

=head2 Garbage-Collected Hashes

Garbage collection in a field hash means that entries will "spontaneously"
disappear when the object that created them disappears.  That must be
borne in mind, especially when looping over a field hash.  If anything
you do inside the loop could cause an object to go out of scope, a
random key may be deleted from the hash you are looping over.  That
can throw off the loop iterator, so it's best to cache a consistent snapshot
of the keys and/or values and loop over that.  You will still have to
check that a cached entry still exists when you get to it.
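
The snapshot technique might look like the following sketch.  The hash
C<%label>, the array C<@objects> and the counter C<$seen> are made up
for the example; the C<pop> merely simulates an object going out of
scope in the middle of the loop.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %label;
my @objects = ( {}, {}, {} );
$label{ $_} = 'tracked' for @objects;

# Take a snapshot of the keys before looping.  These are plain
# integers (the reference addresses), not references.
my @snapshot = keys %label;

my $seen = 0;
for my $id ( @snapshot ) {
    # Something in the loop body lets an object die, which removes
    # its entry from %label behind our back.
    pop @objects if @objects > 1;

    # So check that the cached key is still present before using it.
    next unless exists $label{ $id};
    $seen++;
}

print "visited $seen of ", scalar( @snapshot), " cached keys\n";
print "entries left: ", scalar( keys %label), "\n";
```

How many cached keys are still valid when they are reached depends on
the order in which C<keys> returned them, so the first number can vary
from run to run.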

Garbage collection can be confusing when keys are created in a field hash
from normal scalars as well as references.  Once a reference is I<used> with
a field hash, the entry will be collected, even if it was later overwritten
with a plain scalar key (every positive integer is a candidate).  This
is true even if the original entry was deleted in the meantime.  In fact,
deletion from a field hash, and also a test for existence constitute
I<use> in this sense and create a liability to delete the entry when
the reference goes out of scope.  If you happen to create an entry
with an identical key from a string or integer, that will be collected
instead.  Thus, mixed use of references and plain scalars as field hash
keys is not entirely supported.

=head1 Guts

To make C<Hash::Util::FieldHash> work, there were two changes to
F<perl> itself.  C<PERL_MAGIC_uvar> was made available for hashes,
and weak references now call uvar C<set> magic after a weakref has been
cleared.  The first feature is used to make field hashes intercept
their keys upon access.  The second one triggers garbage collection.

=head2 The C<PERL_MAGIC_uvar> interface for hashes

C<PERL_MAGIC_uvar> I<get> magic is called from C<hv_fetch_common> and
C<hv_delete_common> through the function C<hv_magic_uvar_xkey>, which
defines the interface.  The call happens for hashes with "uvar" magic
if the C<ufuncs> structure has equal values in the C<uf_val> and C<uf_set>
fields.  Hashes are unaffected if (and as long as) these fields
hold different values.

Upon the call, the C<mg_obj> field will hold the hash key to be accessed.
Upon return, the C<SV*> value in C<mg_obj> will be used in place of the
original key in the hash access.  The integer index value in the first
parameter will be the C<action> value from C<hv_fetch_common>, or -1
if the call is from C<hv_delete_common>.

This is a template for a function suitable for the C<uf_val> field in
a C<ufuncs> structure for this call.  The C<uf_set> and C<uf_index>
fields are irrelevant.

    IV watch_key(pTHX_ IV action, SV* field) {
        MAGIC* mg = mg_find(field, PERL_MAGIC_uvar);
        SV* keysv = mg->mg_obj;
        /* Do whatever you need to.  If you decide to
           supply a different key newkey, return it like this
        */
        sv_2mortal(newkey);
        mg->mg_obj = newkey;
        return 0;
    }

=head2 Weakrefs call uvar magic

When a weak reference is stored in an C<SV> that has "uvar" magic, C<set>
magic is called after the reference has gone stale.  This hook can be
used to trigger further garbage-collection activities associated with
the referenced object.

=head2 How field hashes work

The three features of field hashes, I<key replacement>, I<thread support>,
and I<garbage collection> are supported by a data structure called
the I<object registry>.  This is a private hash where every object
is stored.  An "object" in this sense is any reference (blessed or
unblessed) that has been used as a field hash key.

The object registry keeps track of references that have been used as
field hash keys.  The keys are generated from the reference address
like in a field hash (though the registry isn't a field hash).  Each
value is a weak copy of the original reference, stored in an C<SV> that
is itself magical (C<PERL_MAGIC_uvar> again).  The magical structure
holds a list (another hash, really) of field hashes that the reference
has been used with.  When the weakref becomes stale, the magic is
activated and uses the list to delete the reference from all field
hashes it has been used with.  After that, the entry is removed from
the object registry itself.  Implicitly, that frees the magic structure
and the storage it has been using.

Whenever a reference is used as a field hash key, the object registry
is checked and a new entry is made if necessary.  The field hash is
then added to the list of field hashes this reference has been used with.

The object registry is also used to repair a field hash after thread
cloning.  Here, the entire object registry is processed.  For every
reference found there, the field hashes it has used are visited and
the entry is updated.

=head2 Internal function Hash::Util::FieldHash::_fieldhash

    # test if %hash is a field hash
    my $result = _fieldhash \ %hash, 0;

    # make %hash a field hash
    my $result = _fieldhash \ %hash, 1;

C<_fieldhash> is the internal function used to create field hashes.
It takes two arguments, a hashref and a mode.  If the mode is boolean
false, the hash is not changed but only tested for being a field hash.
If the hash isn't a field hash, the return value is boolean false.  If it
is, the return value indicates the mode of the field hash.  When called with
a boolean true mode, it turns the given hash into a field hash of this
mode, returning the mode of the created field hash.  C<_fieldhash>
does not erase the given hash.

Currently there is only one type of field hash, and only the boolean
value of the mode makes a difference, but that may change.

=head1 AUTHOR

Anno Siegel, E<lt>anno4000@zrz.tu-berlin.deE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2006 by (Anno Siegel)

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.7 or,
at your option, any later version of Perl 5 you may have available.

=cut