Perl executable. It is far from complete and probably contains many errors.
Please refer any questions or comments to the author below.
+=head1 Variables
+
=head2 Datatypes
Perl has three typedefs that handle Perl's three main data types:
line and all will be well.
To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
-call is not necessary (see the section on L<Mortality>).
+call is not necessary (see the section on L<Reference Counts and Mortality>).
=head2 What's Really Stored in an SV?
while (i--)
hash = hash * 33 + *s++;
+=head2 Hash API Extensions
+
+Beginning with version 5.004, the following functions are also supported:
+
+ HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
+ HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
+
+ bool hv_exists_ent (HV* tb, SV* key, U32 hash);
+ SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
+
+ SV* hv_iterkeysv (HE* entry);
+
+Note that these functions take C<SV*> keys, which simplifies writing
+of extension code that deals with hash structures. These functions
+also allow passing of C<SV*> keys to C<tie> functions without forcing
+you to stringify the keys (unlike the previous set of functions).
+
+They also return and accept whole hash entries (C<HE*>), making their
+use more efficient (since the hash number for a particular string
+doesn't have to be recomputed every time). See L<API LISTING> later in
+this document for detailed descriptions.
+
+The following macros must always be used to access the contents of hash
+entries. Note that the arguments to these macros must be simple
+variables, since they may get evaluated more than once. See
+L<API LISTING> later in this document for detailed descriptions of these
+macros.
+
+ HePV(HE* he, STRLEN len)
+ HeVAL(HE* he)
+ HeHASH(HE* he)
+ HeSVKEY(HE* he)
+ HeSVKEY_force(HE* he)
+ HeSVKEY_set(HE* he, SV* sv)
+
+These two lower level macros are defined, but must only be used when
+dealing with keys that are not C<SV*>s:
+
+ HeKEY(HE* he)
+ HeKLEN(HE* he)
+
+
=head2 References
References are a special type of scalar that point to other data types
SV* sv_bless(SV* sv, HV* stash);
The C<sv> argument must be a reference. The C<stash> argument specifies
-which class the reference will belong to. See the section on L<Stashes>
-for information on converting class names into stashes.
+which class the reference will belong to. See the section on
+L<Stashes and Globs> for information on converting class names into stashes.
/* Still under construction */
=head2 Stashes and Globs
-A stash is a hash table (associative array) that contains all of the
-different objects that are contained within a package. Each key of the
-stash is a symbol name (shared by all the different types of objects
-that have the same name), and each value in the hash table is called a
-GV (for Glob Value). This GV in turn contains references to the various
-objects of that name, including (but not limited to) the following:
+A "stash" is a hash that contains all of the different objects that
+are contained within a package. Each key of the stash is a symbol
+name (shared by all the different types of objects that have the same
+name), and each value in the hash table is a GV (Glob Value). This GV
+in turn contains references to the various objects of that name,
+including (but not limited to) the following:
Scalar Value
Array Value
For more information on references and blessings, consult L<perlref>.
-=head2 Magic
+=head2 Double-Typed SV's
+
+Scalar variables normally contain only one type of value, an integer,
+double, pointer, or reference. Perl will automatically convert the
+actual scalar data from the stored type into the requested type.
+
+Some scalar variables contain more than one type of scalar data. For
+example, the variable C<$!> contains either the numeric value of C<errno>
+or its string equivalent from either C<strerror> or C<sys_errlist[]>.
+
+To force multiple data values into an SV, you must do two things: use the
+C<sv_set*v> routines to add the additional scalar type, then set a flag
+so that Perl will believe it contains more than one type of data. The
+four macros to set the flags are:
+
+ SvIOK_on
+ SvNOK_on
+ SvPOK_on
+ SvROK_on
+
+The particular macro you must use depends on which C<sv_set*v> routine
+you called first. This is because every C<sv_set*v> routine turns on
+only the bit for the particular type of data being set, and turns off
+all the rest.
+
+For example, to create a new Perl variable called "dberror" that contains
+both the numeric and descriptive string error values, you could use the
+following code:
+
+ extern int dberror;
+ extern char *dberror_list;
+
+ SV* sv = perl_get_sv("dberror", TRUE);
+ sv_setiv(sv, (IV) dberror);
+ sv_setpv(sv, dberror_list[dberror]);
+ SvIOK_on(sv);
+
+If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
+macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
+
+=head2 Magic Variables
[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]
. vtbl_pos $. scalar variable
~ None Used by certain extensions
-When an upper-case and lower-case letter both exist in the table, then the
-upper-case letter is used to represent some kind of composite type (a list
-or a hash), and the lower-case letter is used to represent an element of
+When an uppercase and lowercase letter both exist in the table, then the
+uppercase letter is used to represent some kind of composite type (a list
+or a hash), and the lowercase letter is used to represent an element of
that composite type.
The '~' magic type is defined specifically for use by extensions and
int mg_copy(SV* sv, SV* nsv, char* key, STRLEN klen);
This routine checks to see what types of magic C<sv> has. If the mg_type
-field is an upper-case letter, then the mg_obj is copied to C<nsv>, but
-the mg_type field is changed to be the lower-case letter.
+field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
+the mg_type field is changed to be the lowercase letter.
-=head2 Double-Typed SV's
+=head1 Subroutines
-Scalar variables normally contain only one type of value, an integer,
-double, pointer, or reference. Perl will automatically convert the
-actual scalar data from the stored type into the requested type.
-
-Some scalar variables contain more than one type of scalar data. For
-example, the variable C<$!> contains either the numeric value of C<errno>
-or its string equivalent from either C<strerror> or C<sys_errlist[]>.
-
-To force multiple data values into an SV, you must do two things: use the
-C<sv_set*v> routines to add the additional scalar type, then set a flag
-so that Perl will believe it contains more than one type of data. The
-four macros to set the flags are:
-
- SvIOK_on
- SvNOK_on
- SvPOK_on
- SvROK_on
-
-The particular macro you must use depends on which C<sv_set*v> routine
-you called first. This is because every C<sv_set*v> routine turns on
-only the bit for the particular type of data being set, and turns off
-all the rest.
-
-For example, to create a new Perl variable called "dberror" that contains
-both the numeric and descriptive string error values, you could use the
-following code:
-
- extern int dberror;
- extern char *dberror_list;
-
- SV* sv = perl_get_sv("dberror", TRUE);
- sv_setiv(sv, (IV) dberror);
- sv_setpv(sv, dberror_list[dberror]);
- SvIOK_on(sv);
-
-If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
-macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
-
-=head2 XSUB's and the Argument Stack
+=head2 XSUBs and the Argument Stack
The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
-was compiled with. All XSUB's should now use the functions in the PerlIO
+was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.
For a complete description of the PerlIO abstraction, consult L<perlapio>.
-=head2 Scratchpads
-
=head2 Putting a C value on Perl stack
A lot of opcodes (this is an elementary operation in the internal perl
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.
-Each of the targets is created only once (but see
+Each of the targets is created only once (but see
L<Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on stack.
OP's in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.
-=head2 Scratchpads and recursions
+=head2 Scratchpads and recursion
In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV. In fact it contains a pointer to an AV of
The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.
-=head2 API LISTING
+=head1 Compiled code
+
+=head2 Code tree
+
+Here we describe the internal form your code is converted to by
+Perl. Start with a simple example:
+
+ $a = $b + $c;
+
+This is converted to a tree similar to this one:
+
+ assign-to
+ / \
+ + $a
+ / \
+ $b $c
+
+(but slightly more complicated). This tree reflect the way Perl
+parsed your code, but has nothing to do with the execution order.
+There is an additional "thread" going through the nodes of the tree
+which shows the order of execution of the nodes. In our simplified
+example above it looks like:
+
+ $b ---> $c ---> + ---> $a ---> assign-to
+
+But with the actual compile tree for C<$a = $b + $c> it is different:
+some nodes I<optimized away>. As a corollary, though the actual tree
+contains more nodes than our simplified example, the execution order
+is the same as in our example.
+
+=head2 Examining the tree
+
+If you have your perl compiled for debugging (usually done with C<-D
+optimize=-g> on C<Configure> command line), you may examine the
+compiled tree by specifying C<-Dx> on the Perl command line. The
+output takes several lines per node, and for C<$b+$c> it looks like
+this:
+
+ 5 TYPE = add ===> 6
+ TARG = 1
+ FLAGS = (SCALAR,KIDS)
+ {
+ TYPE = null ===> (4)
+ (was rv2sv)
+ FLAGS = (SCALAR,KIDS)
+ {
+ 3 TYPE = gvsv ===> 4
+ FLAGS = (SCALAR)
+ GV = main::b
+ }
+ }
+ {
+ TYPE = null ===> (5)
+ (was rv2sv)
+ FLAGS = (SCALAR,KIDS)
+ {
+ 4 TYPE = gvsv ===> 5
+ FLAGS = (SCALAR)
+ GV = main::c
+ }
+ }
+
+This tree has 5 nodes (one per C<TYPE> specifier), only 3 of them are
+not optimized away (one per number in the left column). The immediate
+children of the given node correspond to C<{}> pairs on the same level
+of indentation, thus this listing corresponds to the tree:
+
+ add
+ / \
+ null null
+ | |
+ gvsv gvsv
+
+The execution order is indicated by C<===E<gt>> marks, thus it is C<3
+4 5 6> (node C<6> is not included into above listing), i.e.,
+C<gvsv gvsv add whatever>.
+
+=head2 Compile pass 1: check routines
+
+The tree is created by the I<pseudo-compiler> while yacc code feeds it
+the constructions it recognizes. Since yacc works bottom-up, so does
+the first pass of perl compilation.
+
+What makes this pass interesting for perl developers is that some
+optimization may be performed on this pass. This is optimization by
+so-called I<check routines>. The correspondence between node names
+and corresponding check routines is described in F<opcode.pl> (do not
+forget to run C<make regen_headers> if you modify this file).
+
+A check routine is called when the node is fully constructed except
+for the execution-order thread. Since at this time there is no
+back-links to the currently constructed node, one can do most any
+operation to the top-level node, including freeing it and/or creating
+new nodes above/below it.
+
+The check routine returns the node which should be inserted into the
+tree (if the top-level node was not modified, check routine returns
+its argument).
+
+By convention, check routines have names C<ck_*>. They are usually
+called from C<new*OP> subroutines (or C<convert>) (which in turn are
+called from F<perly.y>).
+
+=head2 Compile pass 1a: constant folding
+
+Immediately after the check routine is called the returned node is
+checked for being compile-time executable. If it is (the value is
+judged to be constant) it is immediately executed, and a I<constant>
+node with the "return value" of the corresponding subtree is
+substituted instead. The subtree is deleted.
+
+If constant folding was not performed, the execution-order thread is
+created.
+
+=head2 Compile pass 2: context propagation
+
+When a context for a part of compile tree is known, it is propagated
+down through the tree. Aat this time the context can have 5 values
+(instead of 2 for runtime context): void, boolean, scalar, list, and
+lvalue. In contrast with the pass 1 this pass is processed from top
+to bottom: a node's context determines the context for its children.
+
+Additional context-dependent optimizations are performed at this time.
+Since at this moment the compile tree contains back-references (via
+"thread" pointers), nodes cannot be free()d now. To allow
+optimized-away nodes at this stage, such nodes are null()ified instead
+of free()ing (i.e. their type is changed to OP_NULL).
+
+=head2 Compile pass 3: peephole optimization
+
+After the compile tree for a subroutine (or for an C<eval> or a file)
+is created, an additional pass over the code is performed. This pass
+is neither top-down or bottom-up, but in the execution order (with
+additional compilications for conditionals). These optimizations are
+done in the subroutine peep(). Optimizations performed at this stage
+are subject to the same restrictions as in the pass 2.
+
+=head1 API LISTING
This is a listing of functions, macros, flags, and variables that may be
useful to extension writers or that may be found while reading other
=item gv_fetchmeth
Returns the glob with the given C<name> and a defined subroutine or
-C<NULL>. The glob lives in the given C<stash>, or in the stashes accessable
-via @ISA and @<UNIVERSAL>.
+C<NULL>. The glob lives in the given C<stash>, or in the stashes
+accessable via @ISA and @<UNIVERSAL>.
+
+The argument C<level> should be either 0 or -1. If C<level==0>, as a
+side-effect creates a glob with the given C<name> in the given
+C<stash> which in the case of success contains an alias for the
+subroutine, and sets up caching info for this glob. Similarly for all
+the searched stashes.
+
+This function grants C<"SUPER"> token as a postfix of the stash name.
-As a side-effect creates a glob with the given C<name> in the given C<stash>
-which in the case of success contains an alias for the subroutine, and
-sets up caching info for this glob. Similarly for all the searched
-stashes.
+The GV returned from C<gv_fetchmeth> may be a method cache entry,
+which is not visible to Perl code. So when calling C<perl_call_sv>,
+you should not use the GV directly; instead, you should use the
+method's CV, which can be obtained from the GV with the C<GvCV> macro.
GV* gv_fetchmeth _((HV* stash, char* name, STRLEN len, I32 level));
$AUTOLOAD is already setup.
Note that if you want to keep this glob for a long time, you need to
-check for it being "AUTOLOAD", since at the later time the the call
+check for it being "AUTOLOAD", since at the later time the call
may load a different subroutine due to $AUTOLOAD changing its value.
Use the glob created via a side effect to do this.
-This function grants C<"SUPER"> token as prefix of name or postfix of
-the stash name.
+This function grants C<"SUPER"> token as a prefix of the method name.
-Has the same side-effects and as C<gv_fetchmeth()>. C<name> should be
-writable if contains C<':'> or C<'\''>.
+Has the same side-effects and as C<gv_fetchmeth> with C<level==0>.
+C<name> should be writable if contains C<':'> or C<'\''>.
+The warning against passing the GV returned by C<gv_fetchmeth> to
+C<perl_call_sv> apply equally to C<gv_fetchmethod>.
GV* gv_fetchmethod _((HV* stash, char* name));
Return the SV from the GV.
-=item he_delayfree
+=item HEf_SVKEY
-Releases a hash entry, such as while iterating though the hash, but
-delays actual freeing of key and value until the end of the current
-statement (or thereabouts) with C<sv_2mortal>. See C<hv_iternext>.
+This flag, used in the length slot of hash entries and magic
+structures, specifies the structure contains a C<SV*> pointer where a
+C<char*> pointer is to be expected. (For information only--not to be used).
- void he_delayfree _((HV* hv, HE* hent));
+=item HeHASH
-=item he_free
+Returns the computed hash (type C<U32>) stored in the hash entry.
-Releases a hash entry, such as while iterating though the hash. See
-C<hv_iternext>.
+ HeHASH(HE* he)
+
+=item HeKEY
+
+Returns the actual pointer stored in the key slot of the hash entry.
+The pointer may be either C<char*> or C<SV*>, depending on the value of
+C<HeKLEN()>. Can be assigned to. The C<HePV()> or C<HeSVKEY()> macros
+are usually preferable for finding the value of a key.
+
+ HeKEY(HE* he)
+
+=item HeKLEN
+
+If this is negative, and amounts to C<HEf_SVKEY>, it indicates the entry
+holds an C<SV*> key. Otherwise, holds the actual length of the key.
+Can be assigned to. The C<HePV()> macro is usually preferable for finding
+key lengths.
+
+ HeKLEN(HE* he)
+
+=item HePV
+
+Returns the key slot of the hash entry as a C<char*> value, doing any
+necessary dereferencing of possibly C<SV*> keys. The length of
+the string is placed in C<len> (this is a macro, so do I<not> use
+C<&len>). If you do not care about what the length of the key is,
+you may use the global variable C<na>. Remember though, that hash
+keys in perl are free to contain embedded nulls, so using C<strlen()>
+or similar is not a good way to find the length of hash keys.
+This is very similar to the C<SvPV()> macro described elsewhere in
+this document.
+
+ HePV(HE* he, STRLEN len)
- void he_free _((HV* hv, HE* hent));
+=item HeSVKEY
+
+Returns the key as an C<SV*>, or C<Nullsv> if the hash entry
+does not contain an C<SV*> key.
+
+ HeSVKEY(HE* he)
+
+=item HeSVKEY_force
+
+Returns the key as an C<SV*>. Will create and return a temporary
+mortal C<SV*> if the hash entry contains only a C<char*> key.
+
+ HeSVKEY_force(HE* he)
+
+=item HeSVKEY_set
+
+Sets the key to a given C<SV*>, taking care to set the appropriate flags
+to indicate the presence of an C<SV*> key, and returns the same C<SV*>.
+
+ HeSVKEY_set(HE* he, SV* sv)
+
+=item HeVAL
+
+Returns the value slot (type C<SV*>) stored in the hash entry.
+
+ HeVAL(HE* he)
=item hv_clear
void hv_clear _((HV* tb));
+=item hv_delayfree_ent
+
+Releases a hash entry, such as while iterating though the hash, but
+delays actual freeing of key and value until the end of the current
+statement (or thereabouts) with C<sv_2mortal>. See C<hv_iternext>
+and C<hv_free_ent>.
+
+ void hv_delayfree_ent _((HV* hv, HE* entry));
+
=item hv_delete
Deletes a key/value pair in the hash. The value SV is removed from the hash
SV* hv_delete _((HV* tb, char* key, U32 klen, I32 flags));
+=item hv_delete_ent
+
+Deletes a key/value pair in the hash. The value SV is removed from the hash
+and returned to the caller. The C<flags> value will normally be zero; if set
+to G_DISCARD then null will be returned. C<hash> can be a valid pre-computed
+hash value, or 0 to ask for it to be computed.
+
+ SV* hv_delete_ent _((HV* tb, SV* key, I32 flags, U32 hash));
+
=item hv_exists
Returns a boolean indicating whether the specified hash key exists. The
bool hv_exists _((HV* tb, char* key, U32 klen));
+=item hv_exists_ent
+
+Returns a boolean indicating whether the specified hash key exists. C<hash>
+can be a valid pre-computed hash value, or 0 to ask for it to be computed.
+
+ bool hv_exists_ent _((HV* tb, SV* key, U32 hash));
+
=item hv_fetch
Returns the SV which corresponds to the specified key in the hash. The
SV** hv_fetch _((HV* tb, char* key, U32 klen, I32 lval));
+=item hv_fetch_ent
+
+Returns the hash entry which corresponds to the specified key in the hash.
+C<hash> must be a valid pre-computed hash number for the given C<key>, or
+0 if you want the function to compute it. IF C<lval> is set then the
+fetch will be part of a store. Make sure the return value is non-null
+before accessing it. The return value when C<tb> is a tied hash
+is a pointer to a static location, so be sure to make a copy of the
+structure if you need to store it somewhere.
+
+ HE* hv_fetch_ent _((HV* tb, SV* key, I32 lval, U32 hash));
+
+=item hv_free_ent
+
+Releases a hash entry, such as while iterating though the hash. See
+C<hv_iternext> and C<hv_delayfree_ent>.
+
+ void hv_free_ent _((HV* hv, HE* entry));
+
=item hv_iterinit
Prepares a starting point to traverse a hash table.
char* hv_iterkey _((HE* entry, I32* retlen));
+=item hv_iterkeysv
+
+Returns the key as an C<SV*> from the current position of the hash
+iterator. The return value will always be a mortal copy of the
+key. Also see C<hv_iterinit>.
+
+ SV* hv_iterkeysv _((HE* entry));
+
=item hv_iternext
Returns entries from a hash iterator. See C<hv_iterinit>.
SV** hv_store _((HV* tb, char* key, U32 klen, SV* val, U32 hash));
+=item hv_store_ent
+
+Stores C<val> in a hash. The hash key is specified as C<key>. The C<hash>
+parameter is the pre-computed hash value; if it is zero then Perl will
+compute it. The return value is the new hash entry so created. It will be
+null if the operation failed or if the entry was stored in a tied hash.
+Otherwise the contents of the return value can be accessed using the
+C<He???> macros described here.
+
+ HE* hv_store_ent _((HV* tb, SV* key, SV* val, U32 hash));
+
=item hv_undef
Undefines the hash.
=item newSV
Creates a new SV. The C<len> parameter indicates the number of bytes of
-pre-allocated string space the SV should have. The reference count for the
+preallocated string space the SV should have. The reference count for the
new SV is set to 1.
SV* newSV _((STRLEN len));
=head1 EDITOR
-Jeff Okamoto <okamoto@corp.hp.com>
+Jeff Okamoto <F<okamoto@corp.hp.com>>
With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, and Ulrich Pfeifer.
-API Listing by Dean Roehrich <roehrich@cray.com>.
+API Listing by Dean Roehrich <F<roehrich@cray.com>>.
=head1 DATE
-Version 30: 1997/1/17
+Version 31.3: 1997/3/14