1 Newsgroups: comp.lang.perl
2 Subject: Re: perl5a4: tie ref restriction?
5 References: <2h7b64$aai@jethro.Corp.Sun.COM>
9 Organization: NetLabs, Inc.
12 In article <2h7b64$aai@jethro.Corp.Sun.COM> Eric.Arnold@Sun.COM writes:
14 : tie ( @a, TST_tie, "arg1", "arg2" );
19 : Can't assign a reference to a magical variable at ./tsttie line 12.
21 : I'm all agog about the "tie" function, but ... if this restriction
22 : wasn't there, I think I would be able to tie a top level
23 : reference/variable to my own package, and then automatically tie in all
24 : subsequently linked vars/references so that I could "tie" any arbitrary thing
28 : to a DBM or other type storage area.
30 : Is the restriction necessary?
32 In the current storage scheme, yes, but as I mentioned in the other
33 article, I can and probably should relax that. That code is some of
34 the oldest Perl 5 code, and I didn't see some things then that I do now.
39 Ok, let me explain some things about how values are stored. Consider
40 this a little design document.
42 Internally everything is unified to look like a scalar, regardless of
43 its type. There's a type-invariant part of every value, and a
44 type-variant part. When we modify the type of a value, we can do it in
45 place because all references point to the invariant part. All we do is
46 swap the variant part for a different part and change that ANY pointer
47 in the invariant part to point to the new variant.
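The in-place type change described above can be sketched in a few lines of C. This is only a toy under stated assumptions: the names ToyHead, ToyBodyIV, ToyBodyPVIV and the toy_* functions are made up for illustration and are not the real Perl structures or API. The point it shows is that everyone holds a pointer to the head, so swapping the body never invalidates them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; the names are illustrative, not Perl's. */
typedef enum { TOY_IV, TOY_PVIV } ToyType;

typedef struct { long iv; } ToyBodyIV;                          /* small variant part  */
typedef struct { char *pv; size_t cur; long iv; } ToyBodyPVIV;  /* larger variant part */

typedef struct {
    void   *any;   /* points at the current variant part */
    ToyType type;  /* says how to interpret `any`        */
} ToyHead;         /* the invariant part: its address never changes */

ToyHead *toy_new_iv(long value) {
    ToyHead *head = malloc(sizeof *head);
    ToyBodyIV *body = malloc(sizeof *body);
    assert(head != NULL && body != NULL);
    body->iv = value;
    head->any = body;
    head->type = TOY_IV;
    return head;
}

/* Upgrade in place: build the bigger variant, carry the old data over,
 * then repoint `any`. Every pointer to the head stays valid. */
void toy_upgrade_to_pviv(ToyHead *head) {
    if (head->type == TOY_PVIV)
        return;
    ToyBodyIV *old = head->any;
    ToyBodyPVIV *body = malloc(sizeof *body);
    assert(body != NULL);
    body->pv = NULL;
    body->cur = 0;
    body->iv = old->iv;
    free(old);
    head->any = body;
    head->type = TOY_PVIV;
}

/* Read the integer whichever variant is currently attached. */
long toy_iv(const ToyHead *head) {
    return head->type == TOY_IV ? ((ToyBodyIV *)head->any)->iv
                                : ((ToyBodyPVIV *)head->any)->iv;
}
```

A caller that saved the ToyHead pointer before toy_upgrade_to_pviv() can keep using it afterwards; only the body it leads to has changed.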
49 The invariant part looks like this:
52 void* sv_any; /* pointer to something */
53 U32 sv_refcnt; /* how many references to us */
54 SVTYPE sv_type; /* what sort of thing pointer points to */
55 U8 sv_flags; /* extra flags, some depending on type */
56 U8 sv_storage; /* storage class */
57 U8 sv_private; /* extra value, depending on type */
60 [The last 4 bytes have been combined into a single U32.]
62 This is typedefed to SV. There are other structurally equivalent
63 types, AV, HV and CV, that are there merely to help gdb know what kind
64 of pointer sv_any is, and provide a little bit of C type-checking.
65 Here's a key to Perl naming:
72 Additionally I often use names containing
75 NV numeric value (double)
78 LV lvalue, such as a substr() or vec() being assigned to
79 BM a string containing a Boyer-Moore compiled pattern
80 FM a format line program
82 You'll notice that in SV there's an sv_type field. This contains one
83 of the following values, which gives the interpretation of sv_any.
103 [There is no longer a REF type. There's an RV type that holds a minimal ref
104 value but other types can also hold an RV. This was to allow magical refs.]
106 These are arranged ROUGHLY in order of increasing complexity, though
107 there are some discontinuities. Many of them indicate that sv_any
108 points to a struct of a similar name with an X on the front. They can
109 be classified like this:
112 The sv_any doesn't point to anything meaningful.
115 The sv_any points to another SV. (This is what we're talking
116 about changing to work more like IV and NV below.) [And that's what
121 These are a little tricky in order to be efficient in both
122 memory and time. The sv_any pointer indicates the location of
123 a solitary integer(double), but not directly. The pointer is
124 really a pointer to an XPVIV(XPVNV), so that if there's a valid
125 integer(double) the same code works regardless of the type of
126 the SV. They have special allocators that guarantee that, even
127 though sv_any is pointing to a location several words earlier
128 than the integer(double), it never points to unallocated
129 memory. This does waste a few allocated integers(doubles) at
130 the beginning, but it's probably an overall win.
132 [SVt_RV probably belongs here.]
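To make the IV/NV pointer trick concrete, here is a rough sketch, assuming a hypothetical ToyXPVIV layout and a hypothetical ToyIVArena allocator (neither is the real Perl code). A bare integer lives in an arena slot, and the body pointer is backed up by the offset of the integer field, so one accessor serves both bare integers and full bodies. The arena deliberately wastes its first few bytes so the backed-up pointer never leaves allocated memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the full string-plus-integer body. */
typedef struct {
    char  *pv;
    size_t cur;
    size_t len;
    long   iv;
} ToyXPVIV;

#define TOY_IV_OFFSET offsetof(ToyXPVIV, iv)

/* One accessor for every body type: always look TOY_IV_OFFSET bytes in. */
long *toy_iv_slot(void *any) {
    return (long *)((char *)any + TOY_IV_OFFSET);
}

/* Arena of bare integers. The first TOY_IV_OFFSET bytes are never handed
 * out, so a body pointer computed by backing up from a slot still lands
 * inside the arena. */
typedef struct {
    char  *base;
    size_t nslots;
    size_t next;
} ToyIVArena;

ToyIVArena toy_arena_new(size_t nslots) {
    ToyIVArena a;
    a.base = malloc(TOY_IV_OFFSET + nslots * sizeof(long));
    assert(a.base != NULL);
    a.nslots = nslots;
    a.next = 0;
    return a;
}

/* Returns the value to store in the head's `any` pointer for a bare integer. */
void *toy_arena_alloc_iv(ToyIVArena *a, long value) {
    assert(a->next < a->nslots);
    char *slot = a->base + TOY_IV_OFFSET + a->next * sizeof(long);
    a->next++;
    *(long *)slot = value;
    return slot - TOY_IV_OFFSET;  /* "several words earlier" than the integer */
}
```

The cost is the TOY_IV_OFFSET bytes at the front of each arena that can never hold an integer, which is the waste mentioned above.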
137 These are pretty ordinary, and each is "derived" from the
138 previous in the sense that it just adds more data to the
142 SV * xrv_rv; /* pointer to another SV */
145 A reference value. In the following structs its space is reserved
146 as a char* xpv_pv, but if SvROK() is true, xpv_pv is pointing to
147 another SV, not a string.
151 char * xpv_pv; /* pointer to malloced string */
152 STRLEN xpv_cur; /* length of xpv_pv as a C string */
153 STRLEN xpv_len; /* allocated size */
156 This is your basic string scalar that is never used numerically
160 char * xpv_pv; /* pointer to malloced string */
161 STRLEN xpv_cur; /* length of xpv_pv as a C string */
162 STRLEN xpv_len; /* allocated size */
163 I32 xiv_iv; /* integer value or pv offset */
166 This is a string scalar that has either been used as an
167 integer, or an integer that has been used in a string
168 context, or has had the front trimmed off of it, in which
169 case xiv_iv contains how far xpv_pv has been incremented
170 from the original allocated value.
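The "front trimmed off" case is easy to sketch. The following is a toy, assuming a hypothetical ToyStr struct and toy_str_* helpers that only mimic the fields named above: trimming bumps the string pointer and records the distance in the integer slot, and freeing undoes the bump to recover the address malloc returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct mirroring the fields described in the text. */
typedef struct {
    char  *pv;   /* current start of the string                      */
    size_t cur;  /* current length                                   */
    size_t len;  /* bytes available starting at pv                   */
    long   iv;   /* here: how far pv has moved past the malloc start */
} ToyStr;

ToyStr toy_str_new(const char *text) {
    ToyStr s;
    s.cur = strlen(text);
    s.len = s.cur + 1;
    s.pv = malloc(s.len);
    assert(s.pv != NULL);
    memcpy(s.pv, text, s.len);
    s.iv = 0;
    return s;
}

/* Drop n leading bytes without copying: bump the pointer, record the offset. */
void toy_str_chop_front(ToyStr *s, size_t n) {
    assert(n <= s->cur);
    s->pv  += n;
    s->cur -= n;
    s->len -= n;
    s->iv  += (long)n;
}

/* free() needs the address malloc returned, so undo the offset first. */
void toy_str_free(ToyStr *s) {
    free(s->pv - s->iv);
    s->pv = NULL;
    s->cur = 0;
    s->len = 0;
    s->iv = 0;
}
```

Calling toy_str_chop_front(&s, 7) on "prefix-rest" leaves s.pv pointing at "rest" with no bytes moved.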
173 char * xpv_pv; /* pointer to malloced string */
174 STRLEN xpv_cur; /* length of xpv_pv as a C string */
175 STRLEN xpv_len; /* allocated size */
176 I32 xiv_iv; /* integer value or pv offset */
177 double xnv_nv; /* numeric value, if any */
180 This is a string or integer scalar that has been used in a
181 numeric context, or a number that has been used in a string
185 char * xpv_pv; /* pointer to malloced string */
186 STRLEN xpv_cur; /* length of xpv_pv as a C string */
187 STRLEN xpv_len; /* allocated size */
188 I32 xiv_iv; /* integer value or pv offset */
189 double xnv_nv; /* numeric value, if any */
190 MAGIC* xmg_magic; /* linked list of magicalness */
191 HV* xmg_stash; /* class package */
194 This is the top of the line for ordinary scalars. This scalar
195 has been charmed with one or more kinds of magical or object
196 behavior. In addition it can contain any or all of integer,
206 These are specialized forms that are never directly visible to
207 the Perl script. They are independent of each other, and may
208 not be promoted to any other type.
209 [Actually, PVBM doesn't belong here, but in the previous section.
210 Saying index($foo,$bar) will in fact turn $bar into a PVBM so that
211 it can do Boyer-Moore searching.]
213 There are several additional data values in the SV structure. The sv_refcnt
214 gives the number of references to this SV. Some of these references may be
215 actual Perl language references, but many others are just internal pointers,
216 from a symbol table, or from the syntax tree, for example. When sv_refcnt
217 goes to zero, the value can be safely deallocated. Must be, in fact.
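The reference-counting rule can be sketched as below. ToyValue, toy_value_inc(), toy_value_dec() and the toy_live_values counter are hypothetical names for illustration; the real code uses its own macros and allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counted value; only the fields needed for the sketch. */
typedef struct {
    void    *any;
    unsigned refcnt;  /* how many pointers to us exist */
} ToyValue;

int toy_live_values = 0;  /* bookkeeping for the demonstration only */

ToyValue *toy_value_new(void) {
    ToyValue *v = malloc(sizeof *v);
    assert(v != NULL);
    v->any = NULL;
    v->refcnt = 1;        /* the creator holds the first reference */
    toy_live_values++;
    return v;
}

/* Taking another pointer (from a symbol table, the syntax tree, a Perl
 * reference, ...) bumps the count. */
ToyValue *toy_value_inc(ToyValue *v) {
    v->refcnt++;
    return v;
}

/* Dropping a pointer decrements the count; at zero the value is freed. */
void toy_value_dec(ToyValue *v) {
    assert(v->refcnt > 0);
    if (--v->refcnt == 0) {
        free(v->any);
        free(v);
        toy_live_values--;
    }
}
```

Every holder, internal or not, pairs one toy_value_inc() with one toy_value_dec(); the last one out frees the value.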
219 The sv_storage byte is not very well thought out, but tends to indicate
220 something about where the scalar lives. It's used in allocating
221 lexical storage, and at runtime contains an 'O' if the value has been
222 blessed as an object. There may be some conflicts lurking in here, and
223 I may eventually claim some of the bits for other purposes. [I did,
226 The sv_flags are currently as follows. Most of these are set and cleared
227 by macros to guarantee their consistency, and you should always use the
228 proper macro rather than accessing them directly.
230 [Most of these numbers have changed, and there are some new flags.
231 And they're all stuffed into a single U32.]
233 #define SVf_IOK 1 /* has valid integer value */
234 #define SVf_NOK 2 /* has valid numeric value */
235 #define SVf_POK 4 /* has valid pointer value */
236 These tell whether an integer, double or string value is
237 immediately available without further consideration. All tainting
238 and magic (but not objecthood) works by turning off these bits and
239 forcing a routine to be executed to discover the real value. The
240 SvIV(), SvNV() and SvPV() macros that fetch values are smart about
241 all this, and should always be used if possible. Most of the stuff
242 mentioned below you really don't have to deal with directly. (Values
243 aren't stored using macros, but using functions sv_setiv(), sv_setnv()
244 and sv_setpv(), plus variants. You should never have to explicitly
245 follow the sv_any pointer to any X structure in your code.)
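Here is a small sketch of why the fetch macros have to be "smart". It assumes a hypothetical ToyScalar with TOY_IOK/TOY_POK flag bits and is not the real SvIV() or sv_setpv(): the accessor takes the fast path when the integer bit is on, and otherwise runs a routine to discover the value (here, just a string conversion that it then caches).

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_IOK 1u  /* integer slot is valid right now */
#define TOY_POK 4u  /* string slot is valid right now  */

/* Hypothetical scalar with just enough state for the sketch. */
typedef struct {
    unsigned    flags;
    long        iv;
    const char *pv;
    int         fetch_calls;  /* counts slow-path fetches, illustration only */
} ToyScalar;

/* Fetch the integer value: fast path if the flag says it is available,
 * otherwise do the extra work and remember the result. */
long toy_get_iv(ToyScalar *sv) {
    assert(sv != NULL);
    if (sv->flags & TOY_IOK)
        return sv->iv;
    sv->fetch_calls++;
    if (sv->flags & TOY_POK) {
        sv->iv = strtol(sv->pv, NULL, 10);
        sv->flags |= TOY_IOK;
        return sv->iv;
    }
    return 0;  /* nothing valid to convert */
}

/* Store through a function so the flags stay consistent. */
void toy_set_pv(ToyScalar *sv, const char *text) {
    assert(sv != NULL && text != NULL);
    sv->pv = text;
    sv->flags = TOY_POK;  /* any cached integer is now stale */
}
```

Because callers go through toy_get_iv() and toy_set_pv(), they never need to know which slots are currently valid.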
247 #define SVf_OOK 8 /* has valid offset value */
248 This is only on when SVf_IOK is off, and indicates that the unused
249 integer storage is holding an offset for the string pointer value
250 because you've done something like s/^prefix//.
252 #define SVf_MAGICAL 16 /* has special methods */
253 This indicates not only that sv_type is at least SVt_PVMG, but
254 also that the linked list of magical behaviors is not empty.
256 #define SVf_OK 32 /* has defined value */
257 This indicates that the value is defined. Currently it means either
258 that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK
261 #define SVf_TEMP 64 /* eventually in sv_private? */
262 This indicates that the string is a temporary allocated by one of
263 the sv_mortal functions, and that any string value may be stolen
264 from it without copying. (It's important not to steal the value if
265 the temporary will continue to require the value, however.)
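Stealing from a temporary might look like the sketch below, assuming a hypothetical ToyStrSV type, a TOY_TEMP flag bit and a toy_assign() helper (none of these are the real sv_setsv() machinery). When the source is marked temporary, the destination takes over its buffer; otherwise it makes its own copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_TEMP 64u  /* value is a temporary; its string may be stolen */

/* Hypothetical string scalar with only the fields the sketch needs. */
typedef struct {
    unsigned flags;
    char    *pv;
    size_t   cur;
} ToyStrSV;

ToyStrSV toy_strsv_new(const char *text, unsigned flags) {
    ToyStrSV s;
    s.flags = flags;
    s.cur = strlen(text);
    s.pv = malloc(s.cur + 1);
    assert(s.pv != NULL);
    memcpy(s.pv, text, s.cur + 1);
    return s;
}

/* Assign src to dst. A temporary gives up its buffer; anything else is
 * copied so the source keeps its own string. */
void toy_assign(ToyStrSV *dst, ToyStrSV *src) {
    if (dst == src)
        return;
    free(dst->pv);
    if (src->flags & TOY_TEMP) {
        dst->pv = src->pv;   /* steal: no copy, no allocation */
        dst->cur = src->cur;
        src->pv = NULL;      /* the temporary no longer owns the string */
        src->cur = 0;
    } else {
        dst->pv = malloc(src->cur + 1);
        assert(dst->pv != NULL);
        memcpy(dst->pv, src->pv, src->cur + 1);
        dst->cur = src->cur;
    }
}
```

The caveat in the text applies here too: a caller that still needs the temporary's string after the assignment must not let it be stolen.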
267 #define SVf_READONLY 128 /* may not be modified */
268 This scalar value may not be modified. Any function that might modify
269 a scalar should check for this first, and reject the operation when
270 inappropriate. Currently only the builtin values for sv_undef, sv_yes
271 and sv_no are marked readonly, but eventually we may provide a language
274 The sv_private byte contains some additional bits that apply across the
275 board. Really private bits (that depend on the type) are allocated from
278 #define SVp_IOK 1 /* has valid non-public integer value */
279 #define SVp_NOK 2 /* has valid non-public numeric value */
280 #define SVp_POK 4 /* has valid non-public pointer value */
281 These shadow the bits in sv_flags for tainted variables, indicating that
282 there really is a valid value available, but you have to set the global
283 tainted flag if you access them.
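The shadowing of public bits by private ones can be sketched like this. Everything here is a toy under assumptions: ToyTaintSV, the TOYF_/TOYP_ bit names, toy_mark_tainted() and the toy_tainted global are hypothetical and only imitate the idea, not the real taint code.

```c
#include <assert.h>
#include <stddef.h>

#define TOYF_IOK 1u  /* public: integer is immediately available        */
#define TOYP_IOK 1u  /* private: integer is valid but guarded by taint  */

/* Hypothetical scalar with separate public and private flag words. */
typedef struct {
    unsigned flags;  /* public bits  */
    unsigned priv;   /* private bits */
    long     iv;
} ToyTaintSV;

int toy_tainted = 0;  /* stand-in for the global tainted flag */

/* Tainting moves "valid" from the public bit to its private shadow. */
void toy_mark_tainted(ToyTaintSV *sv) {
    assert(sv != NULL);
    if (sv->flags & TOYF_IOK) {
        sv->priv  |= TOYP_IOK;
        sv->flags &= ~TOYF_IOK;
    }
}

/* Fetch: the public bit is the fast path; the private bit still yields
 * the value, but only after raising the global tainted flag. */
long toy_taint_get_iv(ToyTaintSV *sv) {
    assert(sv != NULL);
    if (sv->flags & TOYF_IOK)
        return sv->iv;
    if (sv->priv & TOYP_IOK) {
        toy_tainted = 1;
        return sv->iv;
    }
    return 0;
}
```

Code that only checks the public bit sees "no value immediately available" and is forced through the accessor, which is where the taint gets noticed.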
285 #define SVp_SCREAM 8 /* has been studied? */
286 Indicates that a study was done on this string. A studied string is
287 magical and automatically unstudies itself when modified.
289 #define SVp_TAINTEDDIR 16 /* PATH component is a security risk */
290 A special flag for $ENV{PATH} that indicates that, while the value
291 as a whole may be untainted, some path component names an insecure
294 #define SVpfm_COMPILED 128
295 For a format, whether its picture has been "compiled" yet. This
296 cannot be done until runtime because the user has access to the
297 internal formline function, and may supply a variable as the
300 #define SVpbm_VALID 128
301 #define SVpbm_CASEFOLD 64
302 #define SVpbm_TAIL 32
303 For a Boyer-Moore pattern, whether the search string has been invalidated
304 by modification (can happen to $pat between calls to index($string,$pat)),
305 whether case folding is in force for regexp matching, and whether we're
306 trying to match something like /foo$/.
308 #define SVpgv_MULTI 128
309 For a symbol table entry, set when we've decided that this symbol is
310 probably not a typo. Suspected typos can be reported by -w.
313 Well, that's probably enough for now. As you can see, we could turn
314 references into something more like an integer or a pointer value. In
315 fact, I suspect the right thing to do is say that a reference is just
316 a funny type of string pointer that isn't allocated the same way.
317 This would let us not only have references to scalars, but might provide
318 a way to have scalars that point to non-malloced memory. Hmm. I'll
319 have to think about that s'more. You can think about it too.