Newsgroups: comp.lang.perl
Subject: Re: perl5a4: tie ref restriction?
References: <2h7b64$aai@jethro.Corp.Sun.COM>
Organization: NetLabs, Inc.

In article <2h7b64$aai@jethro.Corp.Sun.COM> Eric.Arnold@Sun.COM writes:
: tie ( @a, TST_tie, "arg1", "arg2" );
:
: Can't assign a reference to a magical variable at ./tsttie line 12.
:
: I'm all agog about the "tie" function, but ... if this restriction
: wasn't there, I think I would be able to tie a top level
: reference/variable to my own package, and then automatically tie in all
: subsequently linked vars/references so that I could "tie" any arbitrary thing
: to a DBM or other type storage area.
:
: Is the restriction necessary?

In the current storage scheme, yes, but as I mentioned in the other
article, I can and probably should relax that. That code is some of
the oldest Perl 5 code, and I didn't see some things then that I do
now.

Ok, let me explain some things about how values are stored. Consider
this a little design document.

Internally everything is unified to look like a scalar, regardless of
its type. There's a type-invariant part of every value, and a
type-variant part. When we modify the type of a value, we can do it in
place because all references point to the invariant part. All we do is
swap the variant part for a different one and change the ANY pointer
(sv_any) in the invariant part to point to the new variant.

The invariant part looks like this:

    struct sv {
        void*   sv_any;     /* pointer to something */
        U32     sv_refcnt;  /* how many references to us */
        SVTYPE  sv_type;    /* what sort of thing pointer points to */
        U8      sv_flags;   /* extra flags, some depending on type */
        U8      sv_storage; /* storage class */
        U8      sv_private; /* extra value, depending on type */
    };

This is typedefed to SV. There are other structurally equivalent
types, AV, HV and CV, that are there merely to help gdb know what kind
of pointer sv_any is, and provide a little bit of C type-checking.
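
As a rough illustration of why the split into invariant and variant parts allows a type change in place, here is a minimal sketch. Everything named demo_* is hypothetical and heavily simplified; it is not the Perl source, only an instance of the "swap the variant part, keep the invariant address" idea described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the structures described above. */
typedef enum { DEMO_PV, DEMO_PVIV } demo_type;

typedef struct { char *pv; size_t cur; size_t len; } demo_xpv;
typedef struct { char *pv; size_t cur; size_t len; long iv; } demo_xpviv;

typedef struct {
    void         *any;     /* points to the type-variant part */
    unsigned long refcnt;  /* how many references to us */
    demo_type     type;    /* says how to interpret any */
} demo_sv;

/* Create a string-only value. Returns NULL on allocation failure. */
static demo_sv *demo_new_pv(const char *s)
{
    size_t n = strlen(s);
    demo_sv *sv = malloc(sizeof *sv);
    demo_xpv *body = malloc(sizeof *body);
    char *copy = malloc(n + 1);
    if (!sv || !body || !copy) {
        free(sv); free(body); free(copy);
        return NULL;
    }
    memcpy(copy, s, n + 1);
    body->pv = copy; body->cur = n; body->len = n + 1;
    sv->any = body; sv->refcnt = 1; sv->type = DEMO_PV;
    return sv;
}

/* Upgrade in place: only the variant part is replaced, so every
 * pointer to the demo_sv itself stays valid. Returns 0 on success. */
static int demo_upgrade_to_pviv(demo_sv *sv)
{
    if (sv->type == DEMO_PVIV)
        return 0;
    demo_xpv *old = sv->any;
    demo_xpviv *body = malloc(sizeof *body);
    if (!body)
        return -1;
    body->pv = old->pv; body->cur = old->cur; body->len = old->len;
    body->iv = 0;
    free(old);              /* the string buffer itself is carried over */
    sv->any = body;
    sv->type = DEMO_PVIV;
    return 0;
}
```

Anything holding a demo_sv pointer before the upgrade sees the new type afterwards without being told about it, which is the property the invariant part exists to provide.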
Here's a key to Perl naming:

Additionally I often use names containing

    NV  numeric value (double)
    LV  lvalue, such as a substr() or vec() being assigned to
    BM  a string containing a Boyer-Moore compiled pattern
    FM  a format line program

You'll notice that in SV there's an sv_type field. This contains one
of the following values, which gives the interpretation of sv_any.

These are arranged ROUGHLY in order of increasing complexity, though
there are some discontinuities. Many of them indicate that sv_any
points to a struct of a similar name with an X on the front. They can
be classified like this:

    SVt_NULL
        The sv_any doesn't point to anything meaningful.

    SVt_REF
        The sv_any points to another SV. (This is what we're talking
        about changing to work more like IV and NV below.)

    SVt_IV
    SVt_NV
        These are a little tricky in order to be efficient in both
        memory and time. The sv_any pointer indicates the location of
        a solitary integer (double), but not directly. The pointer is
        really a pointer to an XPVIV (XPVNV), so that if there's a valid
        integer (double) the same code works regardless of the type of
        the SV. They have special allocators that guarantee that, even
        though sv_any is pointing to a location several words earlier
        than the integer (double), it never points to unallocated
        memory. This does waste a few allocated integers (doubles) at
        the beginning, but it's probably an overall win.
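
The offset trick for bare integers can be sketched as follows. This is only an illustration under stated assumptions: the demo_* names, the arena layout and the way the skipped slots are counted are all hypothetical, and the real allocator may differ. The point is that the stored pointer is the address a full string-plus-integer struct would have if its integer field sat at the slot, so one accessor serves both cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical full body: string fields followed by the integer. */
typedef struct { char *pv; size_t cur; size_t len; long iv; } demo_xpviv;

/* Hypothetical arena holding nothing but bare integers. */
typedef struct {
    long  *slots;
    size_t next;   /* index of the next usable slot */
    size_t count;  /* total slots in the arena */
} demo_iv_arena;

/* Set up an arena, skipping enough leading slots that backing up by
 * offsetof(demo_xpviv, iv) from any usable slot stays inside it. */
static int demo_arena_init(demo_iv_arena *a, size_t count)
{
    size_t skip = (offsetof(demo_xpviv, iv) + sizeof(long) - 1) / sizeof(long);
    if (count <= skip)
        return -1;
    a->slots = malloc(count * sizeof(long));
    if (!a->slots)
        return -1;
    a->next = skip;     /* the wasted integers at the beginning */
    a->count = count;
    return 0;
}

/* Store one integer and return the pointer that would go in "any":
 * the address of a pretend demo_xpviv whose iv field is the slot. */
static void *demo_arena_alloc_iv(demo_iv_arena *a, long value)
{
    if (a->next >= a->count)
        return NULL;
    long *slot = &a->slots[a->next++];
    *slot = value;
    return (char *)slot - offsetof(demo_xpviv, iv);
}

/* Fetch the integer. The same offset works whether "any" came from
 * the arena or is the address of a genuine demo_xpviv. */
static long demo_iv(const void *any)
{
    return *(const long *)((const char *)any + offsetof(demo_xpviv, iv));
}
```

The sketch reads the integer through a computed address rather than through a struct pointer, so it never touches the pretend string fields that precede the slot.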

    SVt_PV
    SVt_PVIV
    SVt_PVNV
    SVt_PVMG
        These are pretty ordinary, and each is "derived" from the
        previous in the sense that it just adds more data to the
        previous one.

        struct xpv {
            char *  xpv_pv;     /* pointer to malloced string */
            STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
            STRLEN  xpv_len;    /* allocated size */
        };

        This is your basic string scalar that is never used numerically.

        struct xpviv {
            char *  xpv_pv;     /* pointer to malloced string */
            STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
            STRLEN  xpv_len;    /* allocated size */
            I32     xiv_iv;     /* integer value or pv offset */
        };

        This is a string scalar that has either been used as an
        integer, or an integer that has been used in a string
        context, or has had the front trimmed off of it, in which
        case xiv_iv contains how far xpv_pv has been incremented
        from the original allocated value.

        struct xpvnv {
            char *  xpv_pv;     /* pointer to malloced string */
            STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
            STRLEN  xpv_len;    /* allocated size */
            I32     xiv_iv;     /* integer value or pv offset */
            double  xnv_nv;     /* numeric value, if any */
        };

        This is a string or integer scalar that has been used in a
        numeric context, or a number that has been used in a string
        context.

        struct xpvmg {
            char *  xpv_pv;     /* pointer to malloced string */
            STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
            STRLEN  xpv_len;    /* allocated size */
            I32     xiv_iv;     /* integer value or pv offset */
            double  xnv_nv;     /* numeric value, if any */
            MAGIC*  xmg_magic;  /* linked list of magicalness */
            HV*     xmg_stash;  /* class package */
        };

        This is the top of the line for ordinary scalars. This scalar
        has been charmed with one or more kinds of magical or object
        behavior. In addition it can contain any or all of integer,
        double and string values.

        These are specialized forms that are never directly visible to
        the Perl script. They are independent of each other, and may
        not be promoted to any other type.

There are several additional data values in the SV structure. The sv_refcnt
gives the number of references to this SV. Some of these references may be
actual Perl language references, but many others are just internal pointers,
from a symbol table, or from the syntax tree, for example. When sv_refcnt
goes to zero, the value can be safely deallocated.
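
The reference-counting rule is simple enough to sketch directly. The names below are hypothetical and the freed-value counter exists only so the behavior can be observed; this is an illustration of the rule as stated, not the actual implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cut-down value: just a body pointer and a count. */
typedef struct {
    void         *any;     /* type-variant part, owned by this value */
    unsigned long refcnt;  /* how many references to us */
} demo_sv;

static int demo_freed_count = 0;  /* demonstration aid only */

/* A new value starts with one reference, held by its creator. */
static demo_sv *demo_sv_new(void)
{
    demo_sv *sv = malloc(sizeof *sv);
    if (!sv)
        return NULL;
    sv->any = NULL;
    sv->refcnt = 1;
    return sv;
}

/* Any new holder (a language reference, a symbol table slot, a
 * syntax tree node) takes a reference the same way. */
static demo_sv *demo_sv_ref(demo_sv *sv)
{
    sv->refcnt++;
    return sv;
}

/* Drop one reference; deallocate when the count reaches zero. */
static void demo_sv_unref(demo_sv *sv)
{
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        free(sv->any);
        free(sv);
        demo_freed_count++;
    }
}
```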

The sv_storage byte is not very well thought out, but tends to indicate
something about where the scalar lives. It's used in allocating
lexical storage, and at runtime contains an 'O' if the value has been
blessed as an object. There may be some conflicts lurking in here, and
I may eventually claim some of the bits for other purposes.

The sv_flags are currently as follows. Most of these are set and cleared
by macros to guarantee their consistency, and you should always use the
proper macro rather than accessing them directly.

#define SVf_IOK         1       /* has valid integer value */
#define SVf_NOK         2       /* has valid numeric value */
#define SVf_POK         4       /* has valid pointer value */
    These tell whether an integer, double or string value is
    immediately available without further consideration. All tainting
    and magic (but not objecthood) works by turning off these bits and
    forcing a routine to be executed to discover the real value. The
    SvIV(), SvNV() and SvPV() macros that fetch values are smart about
    all this, and should always be used if possible. Most of the stuff
    mentioned below you really don't have to deal with directly. (Values
    aren't stored using macros, but using functions sv_setiv(), sv_setnv()
    and sv_setpv(), plus variants. You should never have to explicitly
    follow the sv_any pointer to any X structure in your code.)
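
To make the "fast path if the bit is on, routine otherwise" pattern concrete, here is a small sketch. DEMO_IV and demo_slow_iv are hypothetical stand-ins, not the real SvIV() macro or its helper, and caching the converted integer afterwards is an assumption made for the demonstration.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_IOK 1   /* integer slot is valid */
#define DEMO_POK 4   /* string slot is valid */

/* Hypothetical flattened value holding both representations. */
typedef struct {
    const char   *pv;
    long          iv;
    unsigned char flags;
} demo_val;

static int demo_slow_calls = 0;  /* demonstration aid only */

/* Slow path: runs whenever the IOK bit is off. Here it converts the
 * string; in the scheme described above this is also where magic or
 * taint handling would get its chance to run. */
static long demo_slow_iv(demo_val *v)
{
    demo_slow_calls++;
    long n = (v->flags & DEMO_POK) ? strtol(v->pv, NULL, 10) : 0;
    v->iv = n;
    v->flags |= DEMO_IOK;   /* assumption: cache the result */
    return n;
}

/* Fetch macro: one bit test on the fast path, a call otherwise. */
#define DEMO_IV(v) (((v)->flags & DEMO_IOK) ? (v)->iv : demo_slow_iv(v))
```

Callers never look at the flags themselves. A value that must be intercepted on every read would simply keep its IOK bit off, so the routine always runs.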

#define SVf_OOK         8       /* has valid offset value */
    This is only on when SVf_IOK is off, and indicates that the unused
    integer storage is holding an offset for the string pointer value
    because you've done something like s/^prefix//.
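
A sketch of the offset idea, with hypothetical demo_* names and a flattened struct in place of the real ones: removing a prefix advances the string pointer instead of moving the bytes, and the spare integer slot remembers how far, so the original allocation can still be found.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_IOK 1   /* integer slot holds a real integer */
#define DEMO_OOK 8   /* integer slot holds a pointer offset */

/* Hypothetical flattened string-plus-integer value. */
typedef struct {
    char         *pv;    /* current start of the string */
    size_t        cur;   /* current length */
    size_t        len;   /* allocated size counted from pv */
    long          iv;    /* integer value, or offset when OOK is on */
    unsigned char flags;
} demo_str;

/* Chop n bytes off the front without copying. Refuses when the
 * integer slot is in use as a real integer. Returns 0 on success. */
static int demo_chop(demo_str *s, size_t n)
{
    if ((s->flags & DEMO_IOK) || n > s->cur)
        return -1;
    if (!(s->flags & DEMO_OOK)) {
        s->iv = 0;
        s->flags |= DEMO_OOK;
    }
    s->pv  += n;
    s->cur -= n;
    s->len -= n;
    s->iv  += (long)n;
    return 0;
}

/* Recover the pointer that malloc returned, as needed before any
 * free or realloc. */
static char *demo_alloc_start(const demo_str *s)
{
    return (s->flags & DEMO_OOK) ? s->pv - s->iv : s->pv;
}
```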

#define SVf_MAGICAL     16      /* has special methods */
    This indicates not only that sv_type is at least SVt_PVMG, but
    also that the linked list of magical behaviors is not empty.

#define SVf_OK          32      /* has defined value */
    This indicates that the value is defined. Currently it means either
    that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK
    is set.

#define SVf_TEMP        64      /* eventually in sv_private? */
    This indicates that the string is a temporary allocated by one of
    the sv_mortal functions, and that any string value may be stolen
    from it without copying. (It's important not to steal the value if
    the temporary will continue to require the value, however.)
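
Stealing from a temporary might look roughly like the following. This is a sketch under assumptions: demo_assign and the flattened demo_str are hypothetical, and the decision about whether the temporary still needs its value is left to the caller, as the caveat above requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_TEMP 64   /* value is a temporary; its buffer may be taken */

/* Hypothetical flattened string value. */
typedef struct {
    char         *pv;
    size_t        cur;
    size_t        len;
    unsigned char flags;
} demo_str;

/* Assign src to dst. A temporary gives up its buffer outright;
 * anything else is copied. Returns 0 on success. */
static int demo_assign(demo_str *dst, demo_str *src)
{
    if (src->flags & DEMO_TEMP) {
        free(dst->pv);
        dst->pv = src->pv; dst->cur = src->cur; dst->len = src->len;
        src->pv = NULL; src->cur = 0; src->len = 0;  /* src no longer owns it */
        return 0;
    }
    char *copy = malloc(src->cur + 1);
    if (!copy)
        return -1;
    if (src->cur)
        memcpy(copy, src->pv, src->cur);
    copy[src->cur] = '\0';
    free(dst->pv);
    dst->pv = copy; dst->cur = src->cur; dst->len = src->cur + 1;
    return 0;
}
```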

#define SVf_READONLY    128     /* may not be modified */
    This scalar value may not be modified. Any function that might modify
    a scalar should check for this first, and reject the operation when
    inappropriate. Currently only the builtin values for sv_undef, sv_yes
    and sv_no are marked readonly, but eventually we may provide a language

The sv_private byte contains some additional bits that apply across the
board. Really private bits (that depend on the type) are allocated from
128 down.

#define SVp_IOK         1       /* has valid non-public integer value */
#define SVp_NOK         2       /* has valid non-public numeric value */
#define SVp_POK         4       /* has valid non-public pointer value */
    These shadow the bits in sv_flags for tainted variables, indicating that
    there really is a valid value available, but you have to set the global
    tainted flag if you access them.

#define SVp_SCREAM      8       /* has been studied? */
    Indicates that a study was done on this string. A studied string is
    magical and automatically unstudies itself when modified.

#define SVp_TAINTEDDIR  16      /* PATH component is a security risk */
    A special flag for $ENV{PATH} that indicates that, while the value
    as a whole may be untainted, some path component names an insecure
    directory.

#define SVpfm_COMPILED  128
    For a format, whether its picture has been "compiled" yet. This
    cannot be done until runtime because the user has access to the
    internal formline function, and may supply a variable as the
    picture.

#define SVpbm_VALID     128
#define SVpbm_CASEFOLD  64
#define SVpbm_TAIL      32
    For a Boyer-Moore pattern, whether the search string has been invalidated
    by modification (can happen to $pat between calls to index($string,$pat)),
    whether case folding is in force for regexp matching, and whether we're
    trying to match something like /foo$/.

#define SVpgv_MULTI     128
    For a symbol table entry, set when we've decided that this symbol is
    probably not a typo. Suspected typos can be reported by -w.

Well, that's probably enough for now. As you can see, we could turn
references into something more like an integer or a pointer value. In
fact, I suspect the right thing to do is say that a reference is just
a funny type of string pointer that isn't allocated the same way.
This would let us not only have references to scalars, but might provide
a way to have scalars that point to non-malloced memory. Hmm. I'll
have to think about that s'more. You can think about it too.