perl 5.0 alpha 8
Newsgroups: comp.lang.perl
Subject: Re: perl5a4: tie ref restriction?
Summary:
Expires:
References: <2h7b64$aai@jethro.Corp.Sun.COM>
Sender:
Followup-To:
Distribution: world
Organization: NetLabs, Inc.
Keywords:

In article <2h7b64$aai@jethro.Corp.Sun.COM> Eric.Arnold@Sun.COM writes:
: Darn:
:     tie ( @a, TST_tie, "arg1", "arg2" );
:     $a[2]=[1];
:
: produces:
:
:     Can't assign a reference to a magical variable at ./tsttie line 12.
:
: I'm all agog about the "tie" function, but ... if this restriction
: wasn't there, I think I would be able to tie a top level
: reference/variable to my own package, and then automatically tie in all
: subsequently linked vars/references so that I could "tie" any arbitrary thing
: like:
:     $r->{key}[el]{key}
:
: to a DBM or other type storage area.
:
: Is the restriction necessary?

In the current storage scheme, yes, but as I mentioned in the other
article, I can and probably should relax that. That code is some of
the oldest Perl 5 code, and I didn't see some things then that I do
now.

[I did relax that.]

Ok, let me explain some things about how values are stored. Consider
this a little design document.

Internally everything is unified to look like a scalar, regardless of
its type. There's a type-invariant part of every value, and a
type-variant part. When we modify the type of a value, we can do it in
place because all references point to the invariant part. All we do is
swap the variant part for a different part and change that ANY pointer
in the invariant part to point to the new variant.
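A minimal sketch of that in-place swap, in C. The toy_ names and the two
body structs are illustrative assumptions, not the real Perl definitions;
the point is only that a pointer to the invariant head stays valid while
the variant body behind it is replaced.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only, not the real Perl types. */
typedef enum { TOY_NULL, TOY_IV, TOY_NV } toy_type;

typedef struct toy_sv {
    void     *any;      /* variant part, owned by this head */
    unsigned  refcnt;   /* how many references to the head */
    toy_type  type;     /* says what 'any' points to */
} toy_sv;

typedef struct { long iv; } toy_xiv;              /* integer-only body */
typedef struct { long iv; double nv; } toy_xnv;   /* integer + double body */

/* Create a head whose body holds a single integer. */
toy_sv *toy_new_iv(long i)
{
    toy_sv  *sv   = malloc(sizeof *sv);
    toy_xiv *body = malloc(sizeof *body);
    if (!sv || !body) { free(sv); free(body); return NULL; }
    body->iv   = i;
    sv->any    = body;
    sv->refcnt = 1;
    sv->type   = TOY_IV;
    return sv;
}

/* Change the type in place: build the larger body, copy the old value
 * across, repoint the head, and free the old body. The head itself never
 * moves, so every existing pointer to it remains valid.
 * Returns 0 on success, -1 if allocation fails. */
int toy_upgrade_to_nv(toy_sv *sv)
{
    toy_xiv *old;
    toy_xnv *body;

    assert(sv->type == TOY_IV);
    old  = sv->any;
    body = malloc(sizeof *body);
    if (!body) return -1;
    body->iv = old->iv;
    body->nv = (double)old->iv;
    sv->any  = body;
    sv->type = TOY_NV;
    free(old);
    return 0;
}
```

Any code holding a toy_sv pointer before the call to toy_upgrade_to_nv()
sees the new type and body afterwards without being told about the change.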

The invariant part looks like this:

struct sv {
    void*       sv_any;         /* pointer to something */
    U32         sv_refcnt;      /* how many references to us */
    SVTYPE      sv_type;        /* what sort of thing pointer points to */
    U8          sv_flags;       /* extra flags, some depending on type */
    U8          sv_storage;     /* storage class */
    U8          sv_private;     /* extra value, depending on type */
};

[The last 4 bytes have been combined into a single U32.]

This is typedefed to SV. There are other structurally equivalent
types, AV, HV and CV, that are there merely to help gdb know what kind
of pointer sv_any is, and provide a little bit of C type-checking.
Here's a key to Perl naming:

    SV      scalar value
    AV      array value
    HV      hash value
    CV      code value

Additionally I often use names containing

    IV      integer value
    NV      numeric value (double)
    PV      pointer value
    RV      reference value
    LV      lvalue, such as a substr() or vec() being assigned to
    BM      a string containing a Boyer-Moore compiled pattern
    FM      a format line program

You'll notice that in SV there's an sv_type field. This contains one
of the following values, which gives the interpretation of sv_any.

typedef enum {
    SVt_NULL,
    SVt_REF,
    SVt_IV,
    SVt_NV,
    SVt_PV,
    SVt_PVIV,
    SVt_PVNV,
    SVt_PVMG,
    SVt_PVLV,
    SVt_PVAV,
    SVt_PVHV,
    SVt_PVCV,
    SVt_PVGV,
    SVt_PVBM,
    SVt_PVFM,
} svtype;

[There is no longer a REF type. There's an RV type that holds a minimal ref
value but other types can also hold an RV. This was to allow magical refs.]

These are arranged ROUGHLY in order of increasing complexity, though
there are some discontinuities. Many of them indicate that sv_any
points to a struct of a similar name with an X on the front. They can
be classified like this:

    SVt_NULL
        The sv_any doesn't point to anything meaningful.

    SVt_REF
        The sv_any points to another SV. (This is what we're talking
        about changing to work more like IV and NV below.) [And that's what
        I did.]

    SVt_IV
    SVt_NV
        These are a little tricky in order to be efficient in both
        memory and time. The sv_any pointer indicates the location of
        a solitary integer(double), but not directly. The pointer is
        really a pointer to an XPVIV(XPVNV), so that if there's a valid
        integer(double) the same code works regardless of the type of
        the SV. They have special allocators that guarantee that, even
        though sv_any is pointing to a location several words earlier
        than the integer(double), it never points to unallocated
        memory. This does waste a few allocated integers(doubles) at
        the beginning, but it's probably an overall win.

        [SVt_RV probably belongs here.]
    SVt_PV
    SVt_PVIV
    SVt_PVNV
    SVt_PVMG
        These are pretty ordinary, and each is "derived" from the
        previous in the sense that it just adds more data to the
        previous structure.
[ Need to add this:
        struct xrv {
            SV *        xrv_rv;         /* pointer to another SV */
        };

        A reference value. In the following structs its space is reserved
        as a char* xpv_pv, but if SvROK() is true, xpv_pv is pointing to
        another SV, not a string.
]

        struct xpv {
            char *      xpv_pv;         /* pointer to malloced string */
            STRLEN      xpv_cur;        /* length of xpv_pv as a C string */
            STRLEN      xpv_len;        /* allocated size */
        };

        This is your basic string scalar that is never used numerically
        or magically.

        struct xpviv {
            char *      xpv_pv;         /* pointer to malloced string */
            STRLEN      xpv_cur;        /* length of xpv_pv as a C string */
            STRLEN      xpv_len;        /* allocated size */
            I32         xiv_iv;         /* integer value or pv offset */
        };

        This is a string scalar that has either been used as an
        integer, or an integer that has been used in a string
        context, or has had the front trimmed off of it, in which
        case xiv_iv contains how far xpv_pv has been incremented
        from the original allocated value.

        struct xpvnv {
            char *      xpv_pv;         /* pointer to malloced string */
            STRLEN      xpv_cur;        /* length of xpv_pv as a C string */
            STRLEN      xpv_len;        /* allocated size */
            I32         xiv_iv;         /* integer value or pv offset */
            double      xnv_nv;         /* numeric value, if any */
        };

        This is a string or integer scalar that has been used in a
        numeric context, or a number that has been used in a string
        or integer context.

        struct xpvmg {
            char *      xpv_pv;         /* pointer to malloced string */
            STRLEN      xpv_cur;        /* length of xpv_pv as a C string */
            STRLEN      xpv_len;        /* allocated size */
            I32         xiv_iv;         /* integer value or pv offset */
            double      xnv_nv;         /* numeric value, if any */
            MAGIC*      xmg_magic;      /* linked list of magicalness */
            HV*         xmg_stash;      /* class package */
        };

        This is the top of the line for ordinary scalars. This scalar
        has been charmed with one or more kinds of magical or object
        behavior. In addition it can contain any or all of integer,
        double or string.

    SVt_PVLV
    SVt_PVAV
    SVt_PVHV
    SVt_PVCV
    SVt_PVGV
    SVt_PVBM
    SVt_PVFM
        These are specialized forms that are never directly visible to
        the Perl script. They are independent of each other, and may
        not be promoted to any other type.
        [Actually, PVBM doesn't belong here, but in the previous section.
        Saying index($foo,$bar) will in fact turn $bar into a PVBM so that
        it can do Boyer-Moore searching.]
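
The SVt_IV pointer trick in the classification above can be sketched in C
as follows. This is a toy under stated assumptions: struct toy_xpviv only
mirrors the xpviv layout shown in the text, and the arena helpers are
hypothetical, not Perl's actual allocators. The body pointer is set a
fixed number of bytes before the bare integer, so code that reaches the
integer through the xiv_iv member offset works whether or not a full
struct was ever allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy layout modeled on the xpviv struct in the text (an assumption). */
struct toy_xpviv {
    char   *xpv_pv;
    size_t  xpv_cur;
    size_t  xpv_len;
    int32_t xiv_iv;
};

/* Distance from the start of a body to its integer member. */
#define TOY_IV_OFFSET offsetof(struct toy_xpviv, xiv_iv)

/* Reach the integer from a body pointer by adding the member offset,
 * which is the same address a full struct would give for xiv_iv. */
#define TOY_IVX(body) (*(int32_t *)((char *)(body) + TOY_IV_OFFSET))

/* Allocate room for 'slots' bare integers, preceded by TOY_IV_OFFSET
 * bytes of padding. The padding is the wasted space at the beginning:
 * it guarantees that every body pointer computed by toy_iv_body()
 * still lies inside this allocation. */
char *toy_iv_arena_new(size_t slots)
{
    return malloc(TOY_IV_OFFSET + slots * sizeof(int32_t));
}

/* Body pointer for slot k: the slot's address minus the member offset. */
void *toy_iv_body(char *arena, size_t k)
{
    char *slot;

    assert(arena != NULL);
    slot = arena + TOY_IV_OFFSET + k * sizeof(int32_t);
    return slot - TOY_IV_OFFSET;    /* equals arena + k * sizeof(int32_t) */
}
```

Each integer costs only sizeof(int32_t) bytes here, and the one-time
padding at the front of the arena is the price for letting sv_any look
like a pointer to a complete body.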

There are several additional data values in the SV structure. The sv_refcnt
gives the number of references to this SV. Some of these references may be
actual Perl language references, but many others are just internal pointers,
from a symbol table, or from the syntax tree, for example. When sv_refcnt
goes to zero, the value can be safely deallocated. Must be, in fact.
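
A minimal sketch of that reference-counting rule in C. The counted_sv type
and its helper functions are hypothetical names for illustration, not
Perl's API; the sketch only shows that every holder bumps the count and
that the value is released exactly when the count reaches zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy head with just enough fields to show the counting (an assumption). */
typedef struct counted_sv {
    void     *any;      /* body, owned by the head; may be NULL */
    unsigned  refcnt;   /* number of holders, Perl-level or internal */
} counted_sv;

/* A new value starts with one reference, held by its creator. */
counted_sv *counted_sv_new(void)
{
    counted_sv *sv = malloc(sizeof *sv);
    if (!sv) return NULL;
    sv->any    = NULL;
    sv->refcnt = 1;
    return sv;
}

/* Anything that stores a pointer to the value takes a reference. */
counted_sv *counted_sv_inc(counted_sv *sv)
{
    sv->refcnt++;
    return sv;
}

/* Drop one reference. When the count hits zero the body and the head
 * are freed. Returns 1 if the value was freed, 0 otherwise. */
int counted_sv_dec(counted_sv *sv)
{
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        free(sv->any);
        free(sv);
        return 1;
    }
    return 0;
}
```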

The sv_storage byte is not very well thought out, but tends to indicate
something about where the scalar lives. It's used in allocating
lexical storage, and at runtime contains an 'O' if the value has been
blessed as an object. There may be some conflicts lurking in here, and
I may eventually claim some of the bits for other purposes. [I did,
with a vengeance.]

The sv_flags are currently as follows. Most of these are set and cleared
by macros to guarantee their consistency, and you should always use the
proper macro rather than accessing them directly.

[Most of these numbers have changed, and there are some new flags.
And they're all stuffed into a single U32.]

#define SVf_IOK         1               /* has valid integer value */
#define SVf_NOK         2               /* has valid numeric value */
#define SVf_POK         4               /* has valid pointer value */
    These tell whether an integer, double or string value is
    immediately available without further consideration. All tainting
    and magic (but not objecthood) works by turning off these bits and
    forcing a routine to be executed to discover the real value. The
    SvIV(), SvNV() and SvPV() macros that fetch values are smart about
    all this, and should always be used if possible. Most of the stuff
    mentioned below you really don't have to deal with directly. (Values
    aren't stored using macros, but using functions sv_setiv(), sv_setnv()
    and sv_setpv(), plus variants. You should never have to explicitly
    follow the sv_any pointer to any X structure in your code.)

#define SVf_OOK         8               /* has valid offset value */
    This is only on when SVf_IOK is off, and indicates that the unused
    integer storage is holding an offset for the string pointer value
    because you've done something like s/^prefix//.

#define SVf_MAGICAL     16              /* has special methods */
    This indicates not only that sv_type is at least SVt_PVMG, but
    also that the linked list of magical behaviors is not empty.

#define SVf_OK          32              /* has defined value */
    This indicates that the value is defined. Currently it means either
    that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK
    is set.

#define SVf_TEMP        64              /* eventually in sv_private? */
    This indicates that the string is a temporary allocated by one of
    the sv_mortal functions, and that any string value may be stolen
    from it without copying. (It's important not to steal the value if
    the temporary will continue to require the value, however.)

#define SVf_READONLY    128             /* may not be modified */
    This scalar value may not be modified. Any function that might modify
    a scalar should check for this first, and reject the operation when
    inappropriate. Currently only the builtin values for sv_undef, sv_yes
    and sv_no are marked readonly, but eventually we may provide a language
    construct to set this bit.
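
The SVf_OOK case above can be sketched in C like this. It is a toy under
stated assumptions: toy_str flattens the head and body into one struct,
and the helper names are hypothetical, not Perl's functions. Trimming a
prefix advances the string pointer and parks the distance in the otherwise
unused integer slot, so no bytes are copied and the original malloc
pointer can be recovered before freeing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string scalar (an assumption, not the real xpviv). */
typedef struct {
    char  *pv;    /* start of the visible string */
    size_t cur;   /* length of the visible string */
    size_t len;   /* allocated size, counted from pv */
    long   iv;    /* integer value, or the offset when ook is set */
    int    iok;   /* iv holds a valid integer */
    int    ook;   /* iv holds an offset instead */
} toy_str;

/* Copy 'text' into freshly allocated storage. Returns 0 on success. */
int toy_str_init(toy_str *s, const char *text)
{
    size_t n = strlen(text);

    s->pv = malloc(n + 1);
    if (!s->pv) return -1;
    memcpy(s->pv, text, n + 1);
    s->cur = n;
    s->len = n + 1;
    s->iv  = 0;
    s->iok = 0;
    s->ook = 0;
    return 0;
}

/* Drop n leading bytes without moving any data. */
void toy_str_chop(toy_str *s, size_t n)
{
    assert(n <= s->cur);
    assert(!s->iok);            /* the integer slot must be free */
    if (!s->ook) {
        s->iv  = 0;
        s->ook = 1;
    }
    s->pv  += n;
    s->cur -= n;
    s->len -= n;
    s->iv  += (long)n;          /* total distance from the malloc pointer */
}

/* Undo the offset so free() receives the pointer malloc() returned. */
void toy_str_free(toy_str *s)
{
    if (s->ook)
        s->pv -= s->iv;
    free(s->pv);
    s->pv  = NULL;
    s->ook = 0;
}
```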

The sv_private byte contains some additional bits that apply across the
board. Really private bits (that depend on the type) are allocated from
128 down.

#define SVp_IOK         1               /* has valid non-public integer value */
#define SVp_NOK         2               /* has valid non-public numeric value */
#define SVp_POK         4               /* has valid non-public pointer value */
    These shadow the bits in sv_flags for tainted variables, indicating that
    there really is a valid value available, but you have to set the global
    tainted flag if you access them.

#define SVp_SCREAM      8               /* has been studied? */
    Indicates that a study was done on this string. A studied string is
    magical and automatically unstudies itself when modified.

#define SVp_TAINTEDDIR  16              /* PATH component is a security risk */
    A special flag for $ENV{PATH} that indicates that, while the value
    as a whole may be untainted, some path component names an insecure
    directory.

#define SVpfm_COMPILED  128
    For a format, whether its picture has been "compiled" yet. This
    cannot be done until runtime because the user has access to the
    internal formline function, and may supply a variable as the
    picture.

#define SVpbm_VALID     128
#define SVpbm_CASEFOLD  64
#define SVpbm_TAIL      32
    For a Boyer-Moore pattern, whether the search string has been invalidated
    by modification (can happen to $pat between calls to index($string,$pat)),
    whether case folding is in force for regexp matching, and whether we're
    trying to match something like /foo$/.

#define SVpgv_MULTI     128
    For a symbol table entry, set when we've decided that this symbol is
    probably not a typo. Suspected typos can be reported by -w.
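
The way SVp_IOK shadows SVf_IOK for tainted values can be sketched in C as
follows. This is a toy under stated assumptions: the TOY_ constants, the
toy_iv_sv struct and both functions are hypothetical, and the real fetch
macros do considerably more. A tainted store leaves the public bit off, so
a fetch cannot take the fast path and instead goes through the private bit
and records the taint.

```c
#include <assert.h>

#define TOY_SVf_IOK 1   /* public: integer usable without further checks */
#define TOY_SVp_IOK 1   /* private: integer valid, but may be tainted */

/* Toy scalar holding only an integer (an assumption). */
typedef struct {
    long          iv;
    unsigned char flags;    /* public bits */
    unsigned char priv;     /* private bits */
} toy_iv_sv;

/* Stand-in for the global tainted flag mentioned in the text. */
int toy_tainted = 0;

/* Store an integer. The private bit is always set; the public bit is
 * set only when the value is clean. */
void toy_set_iv(toy_iv_sv *sv, long i, int is_tainted)
{
    sv->iv    = i;
    sv->priv |= TOY_SVp_IOK;
    if (is_tainted)
        sv->flags &= (unsigned char)~TOY_SVf_IOK;
    else
        sv->flags |= TOY_SVf_IOK;
}

/* Fetch the integer. Clean values take the fast path; tainted ones are
 * found through the private bit and set the global tainted flag. */
long toy_get_iv(const toy_iv_sv *sv)
{
    if (sv->flags & TOY_SVf_IOK)
        return sv->iv;
    assert(sv->priv & TOY_SVp_IOK);
    toy_tainted = 1;
    return sv->iv;
}
```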


Well, that's probably enough for now. As you can see, we could turn
references into something more like an integer or a pointer value. In
fact, I suspect the right thing to do is say that a reference is just
a funny type of string pointer that isn't allocated the same way.
This would let us not only have references to scalars, but might provide
a way to have scalars that point to non-malloced memory. Hmm. I'll
have to think about that s'more. You can think about it too.

Larry