That code is some of the oldest Perl 5 code, and I didn't see some things then that I do now. [I did relax that.]

Ok, let me explain some things about how values are stored. Consider this a little design document.

Internally everything is unified to look like a scalar, regardless of its type. There's a type-invariant part of every value, and a type-variant part. When we modify the type of a value, we can do it in place because all references point to the invariant part. All we do is swap the variant part for a different part and change that ANY pointer in the invariant part to point to the new variant.

The invariant part looks like this:

    struct sv {
        void*   sv_any;     /* pointer to something */
        U32     sv_refcnt;  /* how many references to us */
        SVTYPE  sv_type;    /* what sort of thing pointer points to */
        U8      sv_flags;   /* extra flags, some depending on type */
        U8      sv_storage; /* storage class */
        U8      sv_private; /* extra value, depending on type */
    };

[The last 4 bytes have been combined into a single U32.]

This is typedefed to SV. There are other structurally equivalent types, AV, HV and CV, that are there merely to help gdb know what kind of pointer sv_any is, and to provide a little bit of C type-checking. Here's a key to Perl naming:

    SV  scalar value
    AV  array value
    HV  hash value
    CV  code value

Additionally I often use names containing

    IV  integer value
    NV  numeric value (double)
    PV  pointer value
    RV  reference value
    LV  lvalue, such as a substr() or vec() being assigned to
    BM  a string containing a Boyer-Moore compiled pattern
    FM  a format line program

You'll notice that in SV there's an sv_type field. This contains one of the following values, which gives the interpretation of sv_any.

    typedef enum {
        SVt_NULL,
        SVt_REF,
        SVt_IV,
        SVt_NV,
        SVt_PV,
        SVt_PVIV,
        SVt_PVNV,
        SVt_PVMG,
        SVt_PVLV,
        SVt_PVAV,
        SVt_PVHV,
        SVt_PVCV,
        SVt_PVGV,
        SVt_PVBM,
        SVt_PVFM,
    } svtype;

[There is no longer a REF type. There's an RV type that holds a minimal ref value, but other types can also hold an RV.
This was to allow magical refs.]

These are arranged ROUGHLY in order of increasing complexity, though there are some discontinuities. Many of them indicate that sv_any points to a struct of a similar name with an X on the front. They can be classified like this:

SVt_NULL

The sv_any doesn't point to anything meaningful.

SVt_REF

The sv_any points to another SV. (This is what we're talking about changing to work more like IV and NV below.) [And that's what I did.]

SVt_IV
SVt_NV

These are a little tricky in order to be efficient in both memory and time. The sv_any pointer indicates the location of a solitary integer (double), but not directly. The pointer is really a pointer to an XPVIV (XPVNV), so that if there's a valid integer (double) the same code works regardless of the type of the SV. They have special allocators that guarantee that, even though sv_any is pointing to a location several words earlier than the integer (double), it never points to unallocated memory. This does waste a few allocated integers (doubles) at the beginning, but it's probably an overall win. [SVt_RV probably belongs here.]

SVt_PV
SVt_PVIV
SVt_PVNV
SVt_PVMG

These are pretty ordinary, and each is "derived" from the previous in the sense that it just adds more data to the previous structure.

[Need to add this:

    struct xrv {
        SV *    xrv_rv;     /* pointer to another SV */
    };

A reference value. In the following structs its space is reserved as a char* xpv_pv, but if SvROK() is true, xpv_pv is pointing to another SV, not a string.]

    struct xpv {
        char *  xpv_pv;     /* pointer to malloced string */
        STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
        STRLEN  xpv_len;    /* allocated size */
    };

This is your basic string scalar that is never used numerically or magically.
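Two ideas from this part of the note can be sketched together: the type-invariant head whose sv_any is swapped when a value is upgraded, and the SVt_IV trick of pointing sv_any a few words before a bare integer so that a single accessor serves both a bare IV and a full XPVIV body. This is a minimal sketch, not Perl's actual allocator: the sk_* names, the fixed 16-slot static arena, and the use of long for I32 and size_t for STRLEN are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not Perl's real declarations. */
typedef enum { SK_NULL, SK_IV, SK_PVIV } sk_type;

struct sk_xpviv {              /* mirrors the xpviv layout (long for I32) */
    char  *xpv_pv;
    size_t xpv_cur;
    size_t xpv_len;
    long   xiv_iv;
};

struct sk_sv {                 /* the type-invariant head */
    void    *sv_any;           /* points at, or just before, the body */
    unsigned sv_refcnt;
    sk_type  sv_type;
};

#define SK_IV_OFFSET offsetof(struct sk_xpviv, xiv_iv)

/* One accessor for every type that has an integer slot: add the xpviv
   offset to sv_any and read a long there. */
#define SK_IVX(sv) (*(long *)((char *)(sv)->sv_any + SK_IV_OFFSET))

/* Arena of bare integers. The first few slots are never handed out, so the
   adjusted pointer (SK_IV_OFFSET bytes before a slot) stays inside the
   arena: these are the "few wasted integers at the beginning". */
#define SK_ARENA 16
static long sk_iv_arena[SK_ARENA];
static size_t sk_next = (SK_IV_OFFSET + sizeof(long) - 1) / sizeof(long);

/* Make an empty scalar hold a bare integer. */
static void sk_set_iv(struct sk_sv *sv, long v) {
    long *slot;
    assert(sv->sv_type == SK_NULL);
    assert(sk_next < SK_ARENA);
    slot = &sk_iv_arena[sk_next++];
    sv->sv_any = (char *)slot - SK_IV_OFFSET;   /* points before the slot */
    sv->sv_type = SK_IV;
    SK_IVX(sv) = v;
}

/* Upgrade in place: allocate a bigger body, carry the integer over, and
   repoint sv_any. The head never moves, so every pointer to it stays valid.
   (The old arena slot is simply abandoned in this sketch.) */
static int sk_upgrade_to_pviv(struct sk_sv *sv) {
    struct sk_xpviv *body;
    if (sv->sv_type == SK_PVIV)
        return 1;
    body = calloc(1, sizeof *body);
    if (!body)
        return 0;
    body->xpv_pv = NULL;
    if (sv->sv_type == SK_IV)
        body->xiv_iv = SK_IVX(sv);
    sv->sv_any = body;
    sv->sv_type = SK_PVIV;
    return 1;
}
```

The point of the design shows up in SK_IVX: code that reads the integer never needs to know whether it is looking at a bare arena slot or a full body, and a pointer taken to the head before the upgrade still sees the value afterwards.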
    struct xpviv {
        char *  xpv_pv;     /* pointer to malloced string */
        STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
        STRLEN  xpv_len;    /* allocated size */
        I32     xiv_iv;     /* integer value or pv offset */
    };

This is a string scalar that has been used as an integer, an integer that has been used in a string context, or a string that has had the front trimmed off of it, in which case xiv_iv contains how far xpv_pv has been incremented from the originally allocated value.

    struct xpvnv {
        char *  xpv_pv;     /* pointer to malloced string */
        STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
        STRLEN  xpv_len;    /* allocated size */
        I32     xiv_iv;     /* integer value or pv offset */
        double  xnv_nv;     /* numeric value, if any */
    };

This is a string or integer scalar that has been used in a numeric context, or a number that has been used in a string or integer context.

    struct xpvmg {
        char *  xpv_pv;     /* pointer to malloced string */
        STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
        STRLEN  xpv_len;    /* allocated size */
        I32     xiv_iv;     /* integer value or pv offset */
        double  xnv_nv;     /* numeric value, if any */
        MAGIC*  xmg_magic;  /* linked list of magicalness */
        HV*     xmg_stash;  /* class package */
    };

This is the top of the line for ordinary scalars. This scalar has been charmed with one or more kinds of magical or object behavior. In addition it can contain any or all of integer, double or string.

SVt_PVLV
SVt_PVAV
SVt_PVHV
SVt_PVCV
SVt_PVGV
SVt_PVBM
SVt_PVFM

These are specialized forms that are never directly visible to the Perl script. They are independent of each other, and may not be promoted to any other type. [Actually, PVBM doesn't belong here, but in the previous section. Saying index($foo,$bar) will in fact turn $bar into a PVBM so that it can do Boyer-Moore searching.]

There are several additional data values in the SV structure. The sv_refcnt gives the number of references to this SV.
Some of these references may be actual Perl language references, but many others are just internal pointers, from a symbol table or from the syntax tree, for example. When sv_refcnt goes to zero, the value can be safely deallocated. Must be, in fact.

The sv_storage byte is not very well thought out, but tends to indicate something about where the scalar lives. It's used in allocating lexical storage, and at runtime contains an 'O' if the value has been blessed as an object. There may be some conflicts lurking in here, and I may eventually claim some of the bits for other purposes. [I did, with a vengeance.]

The sv_flags are currently as follows. Most of these are set and cleared by macros to guarantee their consistency, and you should always use the proper macro rather than accessing them directly. [Most of these numbers have changed, and there are some new flags. And they're all stuffed into a single U32.]

    #define SVf_IOK      1      /* has valid integer value */
    #define SVf_NOK      2      /* has valid numeric value */
    #define SVf_POK      4      /* has valid pointer value */

These tell whether an integer, double or string value is immediately available without further consideration. All tainting and magic (but not objecthood) works by turning off these bits and forcing a routine to be executed to discover the real value. The SvIV(), SvNV() and SvPV() macros that fetch values are smart about all this, and should always be used if possible. Most of the stuff mentioned below you really don't have to deal with directly. (Values aren't stored using macros, but using the functions sv_setiv(), sv_setnv() and sv_setpv(), plus variants. You should never have to explicitly follow the sv_any pointer to any X structure in your code.)

    #define SVf_OOK      8      /* has valid offset value */

This is only on when SVf_IOK is off, and indicates that the unused integer storage is holding an offset for the string pointer value because you've done something like s/^prefix//.
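The interaction between the SVf_IOK cache and the SVf_OOK offset can be shown in a small sketch. It flattens the X body into one record, uses long for I32, and all struct and function names are hypothetical. Sliding the string back to the start of its buffer before reusing the integer slot is one plausible way to honor the rule that OOK is only on while IOK is off; it is an assumption here, not a claim about what Perl's own routines do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Flag values as listed in this note (later Perl renumbered them). */
#define SVf_IOK 1
#define SVf_POK 4
#define SVf_OOK 8

/* Hypothetical flattened PVIV-like scalar; long stands in for I32. */
struct sketch_sv {
    unsigned flags;
    char    *pv;    /* current start of the string */
    size_t   cur;   /* string length, excluding the NUL */
    size_t   len;   /* bytes usable from pv onward */
    long     iv;    /* integer value, or pv offset while OOK is on */
};

/* s/^prefix// without copying: advance pv and record the offset in iv.
   The integer slot is now busy, so IOK must go off. */
static void sketch_chop_front(struct sketch_sv *sv, size_t n) {
    assert((sv->flags & SVf_POK) && n <= sv->cur);
    if (!(sv->flags & SVf_OOK)) {
        sv->iv = 0;
        sv->flags = (sv->flags & ~(unsigned)SVf_IOK) | SVf_OOK;
    }
    sv->pv  += n;
    sv->cur -= n;
    sv->len -= n;
    sv->iv  += (long)n;
}

/* Undo the offset by sliding the string back to the start of its buffer,
   which frees the integer slot again. */
static void sketch_backoff(struct sketch_sv *sv) {
    char *start;
    if (!(sv->flags & SVf_OOK))
        return;
    start = sv->pv - sv->iv;
    memmove(start, sv->pv, sv->cur + 1);   /* include the NUL */
    sv->len += (size_t)sv->iv;
    sv->pv = start;
    sv->iv = 0;
    sv->flags &= ~(unsigned)SVf_OOK;
}

/* SvIV()-style fetch: trust the cache when IOK is on; otherwise derive the
   integer from the string, cache it, and turn IOK on. */
static long sketch_sv_iv(struct sketch_sv *sv) {
    if (sv->flags & SVf_IOK)
        return sv->iv;
    if (!(sv->flags & SVf_POK))
        return 0;                          /* undefined: 0, nothing cached */
    sketch_backoff(sv);                    /* need the slot for the integer */
    sv->iv = strtol(sv->pv, NULL, 10);
    sv->flags |= SVf_IOK;
    return sv->iv;
}
```

For example, chopping "xx" off "xx42" leaves pv at "42" with an offset of 2 and OOK on; a later integer fetch backs the string off, parses 42, and leaves IOK on and OOK off. A matching setter would have to clear IOK whenever it stores a new string, which is why values should be stored through the sv_set*() functions rather than by hand.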
    #define SVf_MAGICAL  16     /* has special methods */

This indicates not only that sv_type is at least SVt_PVMG, but also that the linked list of magical behaviors is not empty.

    #define SVf_OK       32     /* has defined value */

This indicates that the value is defined. Currently it means either that the type is SVt_REF or that one of SVf_IOK, SVf_NOK, or SVf_POK is set.

    #define SVf_TEMP     64     /* eventually in sv_private? */

This indicates that the string is a temporary allocated by one of the sv_mortal functions, and that any string value may be stolen from it without copying. (It's important not to steal the value if the temporary will continue to require the value, however.)

    #define SVf_READONLY 128    /* may not be modified */

This scalar value may not be modified. Any function that might modify a scalar should check for this first, and reject the operation when inappropriate. Currently only the builtin values for sv_undef, sv_yes and sv_no are marked readonly, but eventually we may provide a language to set this bit.

The sv_private byte contains some additional bits that apply across the board. Really private bits (that depend on the type) are allocated from 128 down.

    #define SVp_IOK      1      /* has valid non-public integer value */
    #define SVp_NOK      2      /* has valid non-public numeric value */
    #define SVp_POK      4      /* has valid non-public pointer value */

These shadow the bits in sv_flags for tainted variables, indicating that there really is a valid value available, but you have to set the global tainted flag if you access them.

    #define SVp_SCREAM   8      /* has been studied? */

Indicates that a study was done on this string. A studied string is magical and automatically unstudies itself when modified.

    #define SVp_TAINTEDDIR 16   /* PATH component is a security risk */

A special flag for $ENV{PATH} that indicates that, while the value as a whole may be untainted, some path component names an insecure directory.

    #define SVpfm_COMPILED 128

For a format, whether its picture has been "compiled" yet.
This cannot be done until runtime because the user has access to the internal formline function, and may supply a variable as the picture.

    #define SVpbm_VALID    128
    #define SVpbm_CASEFOLD 64
    #define SVpbm_TAIL     32

For a Boyer-Moore pattern, these record whether the search string has been invalidated by modification (which can happen to $pat between calls to index($string,$pat)), whether case folding is in force for regexp matching, and whether we're trying to match something like /foo$/.

    #define SVpgv_MULTI    128

For a symbol table entry, set when we've decided that this symbol is probably not a typo. Suspected typos can be reported by -w.

Well, that's probably enough for now.

As you can see, we could turn references into something more like an integer or a pointer value. In fact, I suspect the right thing to do is say that a reference is just a funny type of string pointer that isn't allocated the same way. This would let us not only have references to scalars, but might provide a way to have scalars that point to non-malloced memory. Hmm. I'll have to think about that s'more. You can think about it too.

Larry
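The SVpbm_VALID bit described above amounts to caching a compiled search table and throwing it away when the pattern changes. The sketch below illustrates that idea under stated assumptions: it uses Horspool's simplification of Boyer-Moore (a bad-character skip table only), the struct and function names are hypothetical rather than Perl's own fbm routines, and SVpbm_CASEFOLD and SVpbm_TAIL are not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define SVpbm_VALID 128   /* value from this note */

/* Hypothetical compiled-pattern record standing in for a PVBM scalar. */
struct sketch_bm {
    unsigned char priv;                 /* private bits, e.g. SVpbm_VALID */
    const char   *pat;
    size_t        patlen;
    size_t        skip[UCHAR_MAX + 1];  /* bad-character shift table */
};

/* Build the Horspool shift table and mark it valid. */
static void sketch_bm_compile(struct sketch_bm *bm) {
    size_t i;
    for (i = 0; i <= UCHAR_MAX; i++)
        bm->skip[i] = bm->patlen;
    for (i = 0; i + 1 < bm->patlen; i++)
        bm->skip[(unsigned char)bm->pat[i]] = bm->patlen - 1 - i;
    bm->priv |= SVpbm_VALID;
}

/* Changing the pattern invalidates the table, as with $pat between calls. */
static void sketch_bm_set(struct sketch_bm *bm, const char *pat) {
    bm->pat = pat;
    bm->patlen = strlen(pat);
    bm->priv &= (unsigned char)~SVpbm_VALID;
}

/* index()-like search: recompile only when the table is stale.
   Returns the offset of the first match, or -1. */
static long sketch_bm_index(struct sketch_bm *bm, const char *s, size_t n) {
    size_t pos = 0, m;
    if (!(bm->priv & SVpbm_VALID))
        sketch_bm_compile(bm);
    m = bm->patlen;
    if (m == 0)
        return 0;
    while (pos + m <= n) {
        if (memcmp(s + pos, bm->pat, m) == 0)
            return (long)pos;
        pos += bm->skip[(unsigned char)s[pos + m - 1]];
    }
    return -1;
}
```

Repeated searches with an unchanged pattern reuse the table; only a modification of the pattern clears the valid bit and forces a rebuild on the next search.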