Commit | Line | Data |
ed6116ce |
1 | Newsgroups: comp.lang.perl |
2 | Subject: Re: perl5a4: tie ref restriction? |
3 | Summary: |
4 | Expires: |
5 | References: <2h7b64$aai@jethro.Corp.Sun.COM> |
6 | Sender: |
7 | Followup-To: |
8 | Distribution: world |
9 | Organization: NetLabs, Inc. |
10 | Keywords: |
11 | |
12 | In article <2h7b64$aai@jethro.Corp.Sun.COM> Eric.Arnold@Sun.COM writes: |
13 | : Darn: |
14 | : tie ( @a, TST_tie, "arg1", "arg2" ); |
15 | : $a[2]=[1]; |
16 | : |
17 | : produces: |
18 | : |
19 | : Can't assign a reference to a magical variable at ./tsttie line 12. |
20 | : |
21 | : I'm all agog about the "tie" function, but ... if this restriction |
22 | : wasn't there, I think I would be able to tie a top level |
23 | : reference/variable to my own package, and then automatically tie in all |
24 | : subsequently linked vars/references so that I could "tie" any arbitrary thing |
25 | : like: |
26 | : $r->{key}[el]{key} |
27 | : |
28 | : to a DBM or other type storage area. |
29 | : |
30 | : Is the restriction necessary? |
31 | |
32 | In the current storage scheme, yes, but as I mentioned in the other |
33 | article, I can and probably should relax that. That code is some of |
34 | the oldest Perl 5 code, and I didn't see some things then that I do |
35 | now. |
36 | |
2304df62 |
37 | [I did relax that.] |
38 | |
ed6116ce |
39 | Ok, let me explain some things about how values are stored. Consider |
40 | this a little design document. |
41 | |
42 | Internally everything is unified to look like a scalar, regardless of |
43 | its type. There's a type-invariant part of every value, and a |
44 | type-variant part. When we modify the type of a value, we can do it in |
45 | place because all references point to the invariant part. All we do is |
46 | swap the variant part for a different part and change that ANY pointer |
47 | in the invariant part to point to the new variant. |
48 | |
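The in-place type change described above can be sketched in C roughly as follows. This is a toy model, not the Perl source: the names toy_sv, toy_xiv, toy_xpviv and toy_upgrade_to_pviv are hypothetical, and the real variant parts carry more fields. It only shows that holders of a pointer to the invariant part never notice the swap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model; none of these names are in the Perl source. */
typedef enum { TOY_IV, TOY_PVIV } toy_type;

typedef struct { long iv; } toy_xiv;               /* variant: integer only   */
typedef struct { char *pv; long iv; } toy_xpviv;   /* variant: string+integer */

typedef struct {
    void    *any;    /* points at the current variant part */
    toy_type type;   /* says what `any` points to          */
} toy_sv;

/* Replace the variant part with a bigger one. The invariant part (the
 * toy_sv itself) stays at the same address, so every outstanding pointer
 * to it remains valid. */
static void toy_upgrade_to_pviv(toy_sv *sv)
{
    if (sv->type == TOY_PVIV)
        return;                        /* already big enough */

    toy_xiv   *old_body = sv->any;
    toy_xpviv *new_body = malloc(sizeof *new_body);
    if (new_body == NULL)
        abort();

    new_body->pv = NULL;               /* no string value yet       */
    new_body->iv = old_body->iv;       /* carry the old value over  */

    free(old_body);
    sv->any  = new_body;               /* repoint the invariant part */
    sv->type = TOY_PVIV;
}
```

A caller that saved a toy_sv pointer before the upgrade can keep using it afterwards; only the any and type fields have changed.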
49 | The invariant part looks like this: |
50 | |
51 | struct sv { |
52 | void* sv_any; /* pointer to something */ |
53 | U32 sv_refcnt; /* how many references to us */ |
54 | SVTYPE sv_type; /* what sort of thing pointer points to */ |
55 | U8 sv_flags; /* extra flags, some depending on type */ |
56 | U8 sv_storage; /* storage class */ |
57 | U8 sv_private; /* extra value, depending on type */ |
58 | }; |
59 | |
2304df62 |
60 | [The last 4 bytes have been combined into a single U32.] |
61 | |
ed6116ce |
62 | This is typedefed to SV. There are other structurally equivalent |
63 | types, AV, HV and CV, that are there merely to help gdb know what kind |
64 | of pointer sv_any is, and provide a little bit of C type-checking. |
65 | Here's a key to Perl naming: |
66 | |
67 | SV scalar value |
68 | AV array value |
69 | HV hash value |
70 | CV code value |
71 | |
72 | Additionally I often use names containing |
73 | |
74 | IV integer value |
75 | NV numeric value (double) |
76 | PV pointer value |
2304df62 |
77 | RV reference value |
ed6116ce |
78 | LV lvalue, such as a substr() or vec() being assigned to |
79 | BM a string containing a Boyer-Moore compiled pattern |
80 | FM a format line program |
81 | |
82 | You'll notice that in SV there's an sv_type field. This contains one |
83 | of the following values, which gives the interpretation of sv_any. |
84 | |
85 | typedef enum { |
86 | SVt_NULL, |
87 | SVt_REF, |
88 | SVt_IV, |
89 | SVt_NV, |
90 | SVt_PV, |
91 | SVt_PVIV, |
92 | SVt_PVNV, |
93 | SVt_PVMG, |
94 | SVt_PVLV, |
95 | SVt_PVAV, |
96 | SVt_PVHV, |
97 | SVt_PVCV, |
98 | SVt_PVGV, |
99 | SVt_PVBM, |
100 | SVt_PVFM, |
101 | } svtype; |
102 | |
2304df62 |
103 | [There is no longer a REF type. There's an RV type that holds a minimal ref |
104 | value but other types can also hold an RV. This was to allow magical refs.] |
105 | |
ed6116ce |
106 | These are arranged ROUGHLY in order of increasing complexity, though |
107 | there are some discontinuities. Many of them indicate that sv_any |
108 | points to a struct of a similar name with an X on the front. They can |
109 | be classified like this: |
110 | |
111 | SVt_NULL |
112 | The sv_any doesn't point to anything meaningful. |
113 | |
114 | SVt_REF |
115 | The sv_any points to another SV. (This is what we're talking |
2304df62 |
116 | about changing to work more like IV and NV below.) [And that's what |
117 | I did.] |
ed6116ce |
118 | |
119 | SVt_IV |
120 | SVt_NV |
121 | These are a little tricky in order to be efficient in both |
122 | memory and time. The sv_any pointer indicates the location of |
123 | a solitary integer(double), but not directly. The pointer is |
124 | really a pointer to an XPVIV(XPVNV), so that if there's a valid |
125 | integer(double) the same code works regardless of the type of |
126 | the SV. They have special allocators that guarantee that, even |
127 | though sv_any is pointing to a location several words earlier |
128 | than the integer(double), it never points to unallocated |
129 | memory. This does waste a few allocated integers(doubles) at |
130 | the beginning, but it's probably an overall win. |
131 | |
2304df62 |
132 | [SVt_RV probably belongs here.] |
ed6116ce |
133 | SVt_PV |
134 | SVt_PVIV |
135 | SVt_PVNV |
136 | SVt_PVMG |
137 | These are pretty ordinary, and each is "derived" from the |
138 | previous in the sense that it just adds more data to the |
139 | previous structure. |
2304df62 |
140 | [ Need to add this: |
141 | struct xrv { |
142 | SV * xrv_rv; /* pointer to another SV */ |
143 | }; |
144 | |
145 | A reference value. In the following structs its space is reserved |
146 | as a char* xpv_pv, but if SvROK() is true, xpv_pv is pointing to |
147 | another SV, not a string. |
148 | ] |
ed6116ce |
149 | |
150 | struct xpv { |
151 | char * xpv_pv; /* pointer to malloced string */ |
152 | STRLEN xpv_cur; /* length of xpv_pv as a C string */ |
153 | STRLEN xpv_len; /* allocated size */ |
154 | }; |
155 | |
156 | This is your basic string scalar that is never used numerically |
157 | or magically. |
158 | |
159 | struct xpviv { |
160 | char * xpv_pv; /* pointer to malloced string */ |
161 | STRLEN xpv_cur; /* length of xpv_pv as a C string */ |
162 | STRLEN xpv_len; /* allocated size */ |
163 | I32 xiv_iv; /* integer value or pv offset */ |
164 | }; |
165 | |
166 | This is a string scalar that has either been used as an |
167 | integer, or an integer that has been used in a string |
168 | context, or has had the front trimmed off of it, in which |
169 | case xiv_iv contains how far xpv_pv has been incremented |
170 | from the original allocated value. |
171 | |
172 | struct xpvnv { |
173 | char * xpv_pv; /* pointer to malloced string */ |
174 | STRLEN xpv_cur; /* length of xpv_pv as a C string */ |
175 | STRLEN xpv_len; /* allocated size */ |
176 | I32 xiv_iv; /* integer value or pv offset */ |
177 | double xnv_nv; /* numeric value, if any */ |
178 | }; |
179 | |
180 | This is a string or integer scalar that has been used in a |
181 | numeric context, or a number that has been used in a string |
182 | or integer context. |
183 | |
184 | struct xpvmg { |
185 | char * xpv_pv; /* pointer to malloced string */ |
186 | STRLEN xpv_cur; /* length of xpv_pv as a C string */ |
187 | STRLEN xpv_len; /* allocated size */ |
188 | I32 xiv_iv; /* integer value or pv offset */ |
189 | double xnv_nv; /* numeric value, if any */ |
190 | MAGIC* xmg_magic; /* linked list of magicalness */ |
191 | HV* xmg_stash; /* class package */ |
192 | }; |
193 | |
194 | This is the top of the line for ordinary scalars. This scalar |
195 | has been charmed with one or more kinds of magical or object |
196 | behavior. In addition it can contain any or all of integer, |
197 | double or string. |
198 | |
199 | SVt_PVLV |
200 | SVt_PVAV |
201 | SVt_PVHV |
202 | SVt_PVCV |
203 | SVt_PVGV |
204 | SVt_PVBM |
205 | SVt_PVFM |
206 | These are specialized forms that are never directly visible to |
207 | the Perl script. They are independent of each other, and may |
208 | not be promoted to any other type. |
2304df62 |
209 | [Actually, PVBM doesn't belong here, but in the previous section. |
210 | saying index($foo,$bar) will in fact turn $bar into a PVBM so that |
211 | it can do Boyer-Moore searching.] |
ed6116ce |
212 | |
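The allocator trick described under SVt_IV and SVt_NV above might look something like the following sketch. All names here (arena_xpviv, toy_iv_arena, toy_arena_init, toy_new_iv) are hypothetical, and the layout assumptions (a long aligned like a pointer, access through a struct pointer that overlays the arena) are the same kind of liberty the original scheme takes; strictly conforming C does not promise them. The point is only that a few leading slots are sacrificed so the backed-up pointer always lands inside allocated memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the full string+integer body. */
typedef struct {
    char  *pv;
    size_t cur;
    size_t len;
    long   iv;     /* the only field a bare integer really needs */
} arena_xpviv;

enum { TOY_ARENA_SLOTS = 64 };

/* An arena that stores nothing but bare integers. */
typedef struct {
    long   slots[TOY_ARENA_SLOTS];
    size_t next;   /* index of the next free slot */
} toy_iv_arena;

/* Waste enough leading slots that "slot address minus offsetof(iv)"
 * can never fall before the start of the arena. */
static void toy_arena_init(toy_iv_arena *a)
{
    a->next = (offsetof(arena_xpviv, iv) + sizeof(long) - 1) / sizeof(long);
}

/* Hand out one integer slot, but return a pointer backed up so that
 * ->iv lands exactly on that slot. Only ->iv may be touched through it. */
static arena_xpviv *toy_new_iv(toy_iv_arena *a, long value)
{
    if (a->next >= TOY_ARENA_SLOTS)
        return NULL;                   /* arena full */

    long *slot = &a->slots[a->next++];
    *slot = value;
    return (arena_xpviv *)((char *)slot - offsetof(arena_xpviv, iv));
}
```

Code that reads the integer through ->iv then works the same whether the pointer came from this arena or from a fully allocated body, which is the property the text is after.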
213 | There are several additional data values in the SV structure. The sv_refcnt |
214 | gives the number of references to this SV. Some of these references may be |
215 | actual Perl language references, but many others are just internal pointers, |
216 | from a symbol table, or from the syntax tree, for example. When sv_refcnt |
2304df62 |
217 | goes to zero, the value can be safely deallocated. Must be, in fact. |
ed6116ce |
218 | |
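The reference counting rule above can be illustrated with a small sketch. The names rc_sv, rc_sv_new, rc_sv_inc, rc_sv_dec and rc_live_count are hypothetical; the counter exists only so the example can show when deallocation happens.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy value with just a body pointer and a reference count. */
typedef struct {
    void    *any;
    unsigned refcnt;
} rc_sv;

static int rc_live_count = 0;   /* demo only: how many values exist */

/* A new value starts with one reference, held by its creator. */
static rc_sv *rc_sv_new(void)
{
    rc_sv *sv = calloc(1, sizeof *sv);
    if (sv == NULL)
        abort();
    sv->refcnt = 1;
    rc_live_count++;
    return sv;
}

/* Any new holder (a Perl reference, a symbol table slot, a syntax tree
 * node) bumps the count. */
static rc_sv *rc_sv_inc(rc_sv *sv)
{
    sv->refcnt++;
    return sv;
}

/* Dropping the last reference frees the value immediately. */
static void rc_sv_dec(rc_sv *sv)
{
    if (--sv->refcnt == 0) {
        free(sv->any);
        free(sv);
        rc_live_count--;
    }
}
```

Every rc_sv_inc has to be paired with exactly one rc_sv_dec; a missed decrement leaks the value, and an extra one frees it while someone still points at it.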
219 | The sv_storage byte is not very well thought out, but tends to indicate |
220 | something about where the scalar lives. It's used in allocating |
221 | lexical storage, and at runtime contains an 'O' if the value has been |
222 | blessed as an object. There may be some conflicts lurking in here, and |
2304df62 |
223 | I may eventually claim some of the bits for other purposes. [I did, |
224 | with a vengeance.] |
ed6116ce |
225 | |
226 | The sv_flags are currently as follows. Most of these are set and cleared |
227 | by macros to guarantee their consistency, and you should always use the |
228 | proper macro rather than accessing them directly. |
229 | |
2304df62 |
230 | [Most of these numbers have changed, and there are some new flags. |
231 | And they're all stuffed into a single U32.] |
232 | |
ed6116ce |
233 | #define SVf_IOK 1 /* has valid integer value */ |
234 | #define SVf_NOK 2 /* has valid numeric value */ |
235 | #define SVf_POK 4 /* has valid pointer value */ |
236 | These tell whether an integer, double or string value is |
237 | immediately available without further consideration. All tainting |
238 | and magic (but not objecthood) works by turning off these bits and |
239 | forcing a routine to be executed to discover the real value. The |
240 | SvIV(), SvNV() and SvPV() macros that fetch values are smart about |
241 | all this, and should always be used if possible. Most of the stuff |
242 | mentioned below you really don't have to deal with directly. (Values |
243 | aren't stored using macros, but using functions sv_setiv(), sv_setnv() |
244 | and sv_setpv(), plus variants. You should never have to explicitly |
245 | follow the sv_any pointer to any X structure in your code.) |
246 | |
247 | #define SVf_OOK 8 /* has valid offset value */ |
248 | This is only on when SVf_IOK is off, and indicates that the unused |
249 | integer storage is holding an offset for the string pointer value |
250 | because you've done something like s/^prefix//. |
251 | |
252 | #define SVf_MAGICAL 16 /* has special methods */ |
253 | This indicates not only that sv_type is at least SVt_PVMG, but |
254 | also that the linked list of magical behaviors is not empty. |
255 | |
256 | #define SVf_OK 32 /* has defined value */ |
257 | This indicates that the value is defined. Currently it means either |
258 | that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK |
259 | is set. |
260 | |
261 | #define SVf_TEMP 64 /* eventually in sv_private? */ |
262 | This indicates that the string is a temporary allocated by one of |
263 | the sv_mortal functions, and that any string value may be stolen |
264 | from it without copying. (It's important not to steal the value if |
265 | the temporary will continue to require the value, however.) |
266 | |
267 | #define SVf_READONLY 128 /* may not be modified */ |
268 | This scalar value may not be modified. Any function that might modify |
269 | a scalar should check for this first, and reject the operation when |
270 | inappropriate. Currently only the builtin values for sv_undef, sv_yes |
271 | and sv_no are marked readonly, but eventually we may provide a language |
272 | construct to set this bit. |
273 | |
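To make the role of the public flags concrete, here is a sketch of a fetch macro that takes the fast path when the IOK bit is set and otherwise calls a routine to work the value out, together with a setter that honors the readonly bit. The names fl_sv, FL_IV, fl_2iv, fl_setiv and the TOY_ constants are hypothetical; they only imitate the shape of SvIV() and sv_setiv() and make no claim about how those are actually written.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_IOK      1u     /* integer slot is valid (cf. SVf_IOK)   */
#define TOY_POK      4u     /* string slot is valid (cf. SVf_POK)    */
#define TOY_READONLY 128u   /* may not be modified (cf. SVf_READONLY) */

/* Hypothetical flattened scalar: no separate variant part here. */
typedef struct {
    const char *pv;
    long        iv;
    unsigned    flags;
} fl_sv;

static int fl_slow_path_calls = 0;   /* demo only */

/* Slow path: derive the integer from the string and cache it. */
static long fl_2iv(fl_sv *sv)
{
    fl_slow_path_calls++;
    if (sv->flags & TOY_POK) {
        sv->iv = strtol(sv->pv, NULL, 10);
        sv->flags |= TOY_IOK;
        return sv->iv;
    }
    return 0;                        /* nothing valid: treat as 0 */
}

/* Fast path if the bit is on, otherwise fall back to the routine. */
#define FL_IV(sv) (((sv)->flags & TOY_IOK) ? (sv)->iv : fl_2iv(sv))

/* Store an integer, rejecting the write on a readonly value. */
static int fl_setiv(fl_sv *sv, long value)
{
    if (sv->flags & TOY_READONLY)
        return -1;
    sv->iv = value;
    sv->flags = (sv->flags & ~TOY_POK) | TOY_IOK;   /* old string is stale */
    return 0;
}
```

Turning the IOK bit off is then enough to force every fetch through the routine, which is how the text says tainting and magic get their hook.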
274 | The sv_private byte contains some additional bits that apply across the |
275 | board. Really private bits (that depend on the type) are allocated from |
276 | 128 down. |
277 | |
278 | #define SVp_IOK 1 /* has valid non-public integer value */ |
279 | #define SVp_NOK 2 /* has valid non-public numeric value */ |
280 | #define SVp_POK 4 /* has valid non-public pointer value */ |
281 | These shadow the bits in sv_flags for tainted variables, indicating that |
282 | there really is a valid value available, but you have to set the global |
283 | tainted flag if you access them. |
284 | |
285 | #define SVp_SCREAM 8 /* has been studied? */ |
286 | Indicates that a study was done on this string. A studied string is |
287 | magical and automatically unstudies itself when modified. |
288 | |
289 | #define SVp_TAINTEDDIR 16 /* PATH component is a security risk */ |
290 | A special flag for $ENV{PATH} that indicates that, while the value |
291 | as a whole may be untainted, some path component names an insecure |
292 | directory. |
293 | |
294 | #define SVpfm_COMPILED 128 |
295 | For a format, whether its picture has been "compiled" yet. This |
296 | cannot be done until runtime because the user has access to the |
297 | internal formline function, and may supply a variable as the |
298 | picture. |
299 | |
300 | #define SVpbm_VALID 128 |
301 | #define SVpbm_CASEFOLD 64 |
302 | #define SVpbm_TAIL 32 |
303 | For a Boyer-Moore pattern, whether the search string has been invalidated |
304 | by modification (can happen to $pat between calls to index($string,$pat)), |
305 | whether case folding is in force for regexp matching, and whether we're |
306 | trying to match something like /foo$/. |
307 | |
308 | #define SVpgv_MULTI 128 |
309 | For a symbol table entry, set when we've decided that this symbol is |
310 | probably not a typo. Suspected typos can be reported by -w. |
311 | |
312 | |
313 | Well, that's probably enough for now. As you can see, we could turn |
314 | references into something more like an integer or a pointer value. In |
315 | fact, I suspect the right thing to do is say that a reference is just |
316 | a funny type of string pointer that isn't allocated the same way. |
317 | This would let us not only have references to scalars, but might provide |
318 | a way to have scalars that point to non-malloced memory. Hmm. I'll |
319 | have to think about that s'more. You can think about it too. |
320 | |
321 | Larry |