=head1 DESCRIPTION
-This document attempts to describe how to use the Perl API, as well as containing
-some info on the basic workings of the Perl core. It is far from complete
-and probably contains many errors. Please refer any questions or
-comments to the author below.
+This document attempts to describe how to use the Perl API, as well as
+containing some info on the basic workings of the Perl core. It is far
+from complete and probably contains many errors. Please refer any
+questions or comments to the author below.
=head1 Variables
important. Note that this function requires you to specify the length of
the format.
+STRLEN is an integer type (Size_t, usually defined as size_t in
+config.h) guaranteed to be large enough to represent the size of
+any string that perl can handle.
+
The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.
To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).
+=head2 Offsets
+
+Perl provides the function C<sv_chop> to efficiently remove characters
+from the beginning of a string; you give it an SV and a pointer to
+somewhere inside the PV, and it discards everything before the
+pointer. The efficiency comes by means of a little hack: instead of
+actually removing the characters, C<sv_chop> sets the flag C<OOK>
+(offset OK) to signal to other functions that the offset hack is in
+effect, and it puts the number of bytes chopped off into the IV field
+of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
+many bytes, and adjusts C<SvCUR> and C<SvLEN>.
+
+Hence, at this point, the start of the buffer that we allocated lives
+at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
+into the middle of this allocated storage.
+
+This is best demonstrated by example:
+
+ % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
+ SV = PVIV(0x8128450) at 0x81340f0
+ REFCNT = 1
+ FLAGS = (POK,OOK,pPOK)
+ IV = 1 (OFFSET)
+ PV = 0x8135781 ( "1" . ) "2345"\0
+ CUR = 4
+ LEN = 5
+
+Here the number of bytes chopped off (1) is put into IV, and
+C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
+portion of the string between the "real" and the "fake" beginnings is
+shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
+the fake beginning, not the real one.
+
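The bookkeeping behind the offset hack can be modelled in plain C. This is a minimal sketch and not Perl's implementation: C<ToySV>, C<toy_chop> and C<toy_alloc_start> are hypothetical names, and the struct only mimics the PV, CUR, LEN and IV (offset) fields discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the SV fields that the offset hack touches. */
typedef struct {
    char  *pv;      /* plays the role of SvPVX: visible start of the string */
    size_t cur;     /* plays the role of SvCUR: visible length              */
    size_t len;     /* plays the role of SvLEN: allocation counted from pv  */
    size_t offset;  /* plays the role of the IV field while OOK is set      */
    int    ook;     /* plays the role of the OOK flag                       */
} ToySV;

/* Discard everything before ptr without moving any bytes. */
static void toy_chop(ToySV *sv, char *ptr)
{
    size_t delta;

    assert(ptr >= sv->pv && ptr <= sv->pv + sv->cur);
    delta = (size_t)(ptr - sv->pv);

    sv->ook     = 1;
    sv->offset += delta;   /* how far the real start now lies behind pv */
    sv->pv      = ptr;
    sv->cur    -= delta;
    sv->len    -= delta;
}

/* Recover the address that was originally allocated (needed to free it). */
static char *toy_alloc_start(const ToySV *sv)
{
    return sv->pv - sv->offset;
}
```

Chopping one byte off a six-byte buffer holding "12345" leaves this model with a length of 4, an allocation of 5 and an offset of 1, which are the same numbers that appear in the C<Devel::Peek> dump.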
+Something similar to the offset hack is performed on AVs to enable
+efficient shifting and splicing off the beginning of the array; while
+C<AvARRAY> points to the first element in the array that is visible from
+Perl, C<AvALLOC> points to the real start of the C array. These are
+usually the same, but a C<shift> operation can be carried out by
+increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvLEN>.
+Again, the location of the real start of the C array only comes into
+play when freeing the array. See C<av_shift> in F<av.c>.
+
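The same idea for arrays can be sketched as follows. C<ToyAV> and C<toy_shift> are hypothetical names used only for illustration; the fields imitate what the text calls C<AvALLOC>, C<AvARRAY>, C<AvFILL> and C<AvLEN>, and the real code in F<av.c> does considerably more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the AV fields involved in a cheap shift. */
typedef struct {
    int      **alloc;  /* role of AvALLOC: real start of the C array        */
    int      **array;  /* role of AvARRAY: first element visible from Perl  */
    ptrdiff_t  fill;   /* role of AvFILL: index of the last element, or -1  */
    ptrdiff_t  len;    /* role of AvLEN: capacity counted from array        */
} ToyAV;

/* Remove and return the first element by bumping a pointer, not by copying. */
static int *toy_shift(ToyAV *av)
{
    int *first;

    assert(av->array >= av->alloc);
    if (av->fill < 0)
        return NULL;           /* nothing to shift */

    first = av->array[0];
    av->array++;               /* the visible start moves forward ...        */
    av->fill--;                /* ... so the visible indices shrink by one   */
    av->len--;
    /* av->alloc is untouched: it is what must eventually be freed */
    return first;
}
```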
=head2 What's Really Stored in an SV?
Recall that the usual method of determining the type of scalar you have is
SV* newSVrv(SV* rv, const char* classname);
-Copies integer or double into an SV whose reference is C<rv>. SV is blessed
+Copies integer, unsigned integer or double into an SV whose reference is C<rv>. SV is blessed
if C<classname> is non-null.
SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
+ SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
SV* sv_setref_nv(SV* rv, const char* classname, NV iv);
Copies the pointer value (I<the address, not the string!>) into an SV whose
Inside such a I<pseudo-block> the following service is available:
-=over
+=over 4
=item C<SAVEINT(int i)>
=item C<SAVEFREESV(SV *sv)>
The refcount of C<sv> would be decremented at the end of
-I<pseudo-block>. This is similar to C<sv_2mortal>, which should (?) be
-used instead.
+I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
+mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
+extends the lifetime of C<sv> until the beginning of the next statement,
+C<SAVEFREESV> extends it until the end of the enclosing scope. These
+lifetimes can be wildly different.
+
+Also compare C<SAVEMORTALIZESV>.
+
+=item C<SAVEMORTALIZESV(SV *sv)>
+
+Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
+scope instead of decrementing its reference count. This usually has the
+effect of keeping C<sv> alive until the statement that called the currently
+live scope has finished executing.
=item C<SAVEFREEOP(OP *op)>
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.
-=over
+=over 4
=item C<SV* save_scalar(GV *gv)>
These macros automatically adjust the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.
+However, see L</Putting a C value on Perl stack>.
For more information, consult L<perlxs> and L<perlxstut>.
instances of the size of the C<type> data structure (using the C<sizeof>
function).
-Here is a handy table of equivalents between ordinary C and Perl's
-memory abstraction layer:
-
- Instead Of: Use:
-
- malloc New
- calloc Newz
- realloc Renew
- memcopy Copy
- memmove Move
- free Safefree
- strdup savepv
- strndup savepvn (Hey, strndup doesn't exist!)
- memcpy/*(struct foo *) StructCopy
-
=head2 PerlIO
The most recent development releases of Perl have been experimenting with
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[pni]>.
+Because the target is reused, you must be careful when pushing multiple
+values on the stack. The following code will not do what you think:
+
+ XPUSHi(10);
+ XPUSHi(20);
+
+This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
+the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
+At the end of the operation, the stack does not contain the values 10
+and 20, but actually contains two pointers to C<TARG>, which we have set
+to 20. If you need to push multiple different values, use C<XPUSHs>,
+which bypasses C<TARG>.
+
+On a related note, if you do use C<(X)PUSH[npi]>, then you're going to
+need a C<dTARG> in your variable declarations so that the C<*PUSH*>
+macros can make use of the local variable C<TARG>.
+
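The aliasing problem is easy to reproduce outside Perl. The sketch below is a plain C model, not the real macros: C<toy_targ>, C<toy_push_targ> and C<toy_push_fresh> are hypothetical names, with C<toy_push_targ> standing in for the C<XPUSHi> behaviour and C<toy_push_fresh> for pushing a distinct value with C<XPUSHs>.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_MAX 8

static long   toy_targ;                    /* the single, reused target      */
static long   toy_values[TOY_STACK_MAX];   /* separate storage per value     */
static long  *toy_stack[TOY_STACK_MAX];    /* the stack holds pointers only  */
static size_t toy_sp;                      /* next free stack slot           */
static size_t toy_next;                    /* next free slot in toy_values   */

/* Overwrite the shared target, then push a pointer to it. */
static void toy_push_targ(long v)
{
    assert(toy_sp < TOY_STACK_MAX);
    toy_targ = v;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Give the value its own storage, then push a pointer to that. */
static void toy_push_fresh(long v)
{
    assert(toy_sp < TOY_STACK_MAX && toy_next < TOY_STACK_MAX);
    toy_values[toy_next] = v;
    toy_stack[toy_sp++] = &toy_values[toy_next++];
}
```

After C<toy_push_targ(10)> followed by C<toy_push_targ(20)>, both stack slots point at the same C<long>, which holds 20. Two calls to C<toy_push_fresh> leave two distinct values on the stack.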
=head2 Scratchpads
The question remains on when the SVs which are I<target>s for opcodes
4 5 6> (node C<6> is not included into above listing), i.e.,
C<gvsv gvsv add whatever>.
+Each of these nodes represents an op, a fundamental operation inside the
+Perl core. The code which implements each operation can be found in the
+F<pp*.c> files; the function which implements the op with type C<gvsv>
+is C<pp_gvsv>, and so on. As the tree above shows, different ops have
+different numbers of children: C<add> is a binary operator, as one would
+expect, and so has two children. To accommodate the various different
+numbers of children, there are various types of op data structure, and
+they link together in different ways.
+
+The simplest type of op structure is C<OP>: this has no children. Unary
+operators, C<UNOP>s, have one child, and this is pointed to by the
+C<op_first> field. Binary operators (C<BINOP>s) have not only an
+C<op_first> field but also an C<op_last> field. The most complex type of
+op is a C<LISTOP>, which has any number of children. In this case, the
+first child is pointed to by C<op_first> and the last child by
+C<op_last>. The children in between can be found by iteratively
+following the C<op_sibling> pointer from the first child to the last.
+
+There are also two other op types: a C<PMOP> holds a regular expression,
+and has no children, and a C<LOOP> may or may not have children. If the
+C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
+complicate matters, if a C<UNOP> is actually a C<null> op after
+optimization (see L</Compile pass 2: context propagation>) it will still
+have children in accordance with its former type.
+
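Walking the children of an op as described above might look like the following sketch. C<ToyOp> is a hypothetical, flattened stand-in for the real op structures (which differ per op type); only the field names C<op_first>, C<op_last> and C<op_sibling> are taken from the text.

```c
#include <stddef.h>

/* Hypothetical op node carrying just the links discussed in the text. */
typedef struct ToyOp {
    const char   *name;
    struct ToyOp *op_first;    /* first child, or NULL if there is none */
    struct ToyOp *op_last;     /* last child, or NULL if there is none  */
    struct ToyOp *op_sibling;  /* next child of the same parent         */
} ToyOp;

/* Count children by following op_sibling from op_first up to op_last. */
static size_t toy_count_children(const ToyOp *parent)
{
    size_t n = 0;
    const ToyOp *kid;

    for (kid = parent->op_first; kid != NULL; kid = kid->op_sibling) {
        n++;
        if (kid == parent->op_last)
            break;             /* reached the last child of this parent */
    }
    return n;
}
```

For a binary op such as C<add> with two C<gvsv> children, the first child is C<op_first>, its C<op_sibling> is the second child, and that second child is also C<op_last>.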
=head2 Compile pass 1: check routines
The tree is created by the compiler while I<yacc> code feeds it
done in the subroutine peep(). Optimizations performed at this stage
are subject to the same restrictions as in the pass 2.
+=head1 Examining internal data structures with the C<dump> functions
+
+To aid debugging, the source file F<dump.c> contains a number of
+functions which produce formatted output of internal data structures.
+
+The most commonly used of these functions is C<Perl_sv_dump>; it's used
+for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
+C<sv_dump> to produce debugging output from Perl-space, so users of that
+module should already be familiar with its format.
+
+C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
+derivatives, and produces output similar to C<perl -Dx>; in fact,
+C<Perl_dump_eval> will dump the main root of the code being evaluated,
+exactly like C<-Dx>.
+
+Other useful functions are C<Perl_dump_sub>, which dumps the op tree of
+the subroutine held in a C<GV>; C<Perl_dump_packsubs>, which calls
+C<Perl_dump_sub> on all the subroutines in a package, like so (thankfully,
+these are all xsubs, so there is no op tree):
+
+ (gdb) print Perl_dump_packsubs(PL_defstash)
+
+ SUB attributes::bootstrap = (xsub 0x811fedc 0)
+
+ SUB UNIVERSAL::can = (xsub 0x811f50c 0)
+
+ SUB UNIVERSAL::isa = (xsub 0x811f304 0)
+
+ SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
+
+ SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
+
+and C<Perl_dump_all>, which dumps all the subroutines in the stash and
+the op tree of the main root.
+
=head1 How multiple interpreters and concurrency are supported
=head2 Background and PERL_IMPLICIT_CONTEXT
Three macros control the major Perl build flavors: MULTIPLICITY,
USE_THREADS and PERL_OBJECT. The MULTIPLICITY build has a C structure
that packages all the interpreter state, there is a similar thread-specific
-data structure under USE_THREADS, and the PERL_OBJECT build has a C++
-class to maintain interpreter state. In all three cases,
+data structure under USE_THREADS, and the (now deprecated) PERL_OBJECT
+build has a C++ class to maintain interpreter state. In all three cases,
PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
support for passing in a "hidden" first argument that represents all three
data structures.
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
-or 'd' for B<d>eclaration.
+or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
+their variants.
-When Perl is built without PERL_IMPLICIT_CONTEXT, there is no first
-argument containing the interpreter's context. The trailing underscore
+When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
+first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
explicit arguments.
When a core function calls another, it must pass the context. This
-is normally hidden via macros. Consider C<sv_setsv>. It expands
+is normally hidden via macros. Consider C<sv_setsv>. It expands into
something like this:
ifdef PERL_IMPLICIT_CONTEXT
- define sv_setsv(a,b) Perl_sv_setsv(aTHX_ a, b)
+ define sv_setsv(a,b) Perl_sv_setsv(aTHX_ a, b)
/* can't do this for vararg functions, see below */
else
- define sv_setsv Perl_sv_setsv
+ define sv_setsv Perl_sv_setsv
endif
This works well, and means that XS authors can gleefully write:
# see objXSUB.h
Under PERL_OBJECT in extensions (aka PERL_CAPI), or under
-MULTIPLICITY/USE_THREADS w/ PERL_IMPLICIT_CONTEXT in both core
-and extensions, it will be:
+MULTIPLICITY/USE_THREADS with PERL_IMPLICIT_CONTEXT in both core
+and extensions, it will become:
Perl_sv_setsv(aTHX_ foo, bar); # the canonical Perl "API"
# for all build flavors
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.
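
The way the trailing-underscore macros hide a context argument can be imitated in a few lines of standalone C. Everything named C<toy_*>, C<Toy_*> or C<TOY_*> below is hypothetical and exists only for this sketch; the real definitions live in the Perl headers and are more involved.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int calls; } ToyInterp;   /* hypothetical interpreter state */

#define TOY_IMPLICIT_CONTEXT   /* remove this line to model the other build */

#ifdef TOY_IMPLICIT_CONTEXT
#  define toy_pTHX_  ToyInterp *my_toy,      /* prototype: context + comma  */
#  define toy_aTHX_  my_toy,                 /* argument:  context + comma  */
#  define toy_add(a, b)  Toy_add(toy_aTHX_ a, b)
#else
#  define toy_pTHX_                          /* expands to nothing          */
#  define toy_aTHX_
#  define toy_add  Toy_add
static ToyInterp toy_global;                 /* single global state instead */
#endif

/* The "real" function: takes the context first when one is passed. */
static int Toy_add(toy_pTHX_ int a, int b)
{
#ifdef TOY_IMPLICIT_CONTEXT
    assert(my_toy != NULL);
    my_toy->calls++;
#else
    toy_global.calls++;
#endif
    return a + b;
}

/* A caller written once; the macros supply the context where needed. */
static int toy_sum3(toy_pTHX_ int a, int b, int c)
{
    int ab = toy_add(a, b);
    return toy_add(ab, c);
}
```

With C<TOY_IMPLICIT_CONTEXT> defined, a caller that has a C<ToyInterp *my_toy> in scope writes C<toy_sum3(toy_aTHX_ 1, 2, 3)>; without it, the same source line compiles to a call with three plain arguments.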
+=head2 So what happened to dTHR?
+
+C<dTHR> was introduced in perl 5.005 to support the older thread model.
+The older thread model now uses the C<THX> mechanism to pass context
+pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
+later still have it for backward source compatibility, but it is defined
+to be a no-op.
+
=head2 How do I use all this in extensions?
When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
The second, more efficient way is to use the following template for
your Foo.xs:
- #define PERL_NO_GET_CONTEXT /* we want efficiency */
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
+ #define PERL_NO_GET_CONTEXT /* we want efficiency */
+ #include "EXTERN.h"
+ #include "perl.h"
+ #include "XSUB.h"
static SV *my_private_function(int arg1, int arg2);
- static SV *
- my_private_function(int arg1, int arg2)
- {
- dTHX; /* fetch context */
- ... call many Perl API functions ...
- }
+ static SV *
+ my_private_function(int arg1, int arg2)
+ {
+ dTHX; /* fetch context */
+ ... call many Perl API functions ...
+ }
[... etc ...]
- MODULE = Foo PACKAGE = Foo
+ MODULE = Foo PACKAGE = Foo
- /* typical XSUB */
+ /* typical XSUB */
- void
- my_xsub(arg)
- int arg
- CODE:
- my_private_function(arg, 10);
+ void
+ my_xsub(arg)
+ int arg
+ CODE:
+ my_private_function(arg, 10);
Note that the only two changes from the normal way of writing an
extension is the addition of a C<#define PERL_NO_GET_CONTEXT> before
the Perl guts:
- #define PERL_NO_GET_CONTEXT /* we want efficiency */
- #include "EXTERN.h"
- #include "perl.h"
- #include "XSUB.h"
+ #define PERL_NO_GET_CONTEXT /* we want efficiency */
+ #include "EXTERN.h"
+ #include "perl.h"
+ #include "XSUB.h"
/* pTHX_ only needed for functions that call Perl API */
static SV *my_private_function(pTHX_ int arg1, int arg2);
- static SV *
- my_private_function(pTHX_ int arg1, int arg2)
- {
- /* dTHX; not needed here, because THX is an argument */
- ... call Perl API functions ...
- }
+ static SV *
+ my_private_function(pTHX_ int arg1, int arg2)
+ {
+ /* dTHX; not needed here, because THX is an argument */
+ ... call Perl API functions ...
+ }
[... etc ...]
- MODULE = Foo PACKAGE = Foo
+ MODULE = Foo PACKAGE = Foo
- /* typical XSUB */
+ /* typical XSUB */
- void
- my_xsub(arg)
- int arg
- CODE:
- my_private_function(aTHX_ arg, 10);
+ void
+ my_xsub(arg)
+ int arg
+ CODE:
+ my_private_function(aTHX_ arg, 10);
This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
macro with the underscore for functions that take explicit arguments,
or the form without the argument for functions with no explicit arguments.
+=head2 Should I do anything special if I call perl from multiple threads?
+
+If you create interpreters in one thread and then proceed to call them in
+another, you need to make sure perl's own Thread Local Storage (TLS) slot is
+initialized correctly in each of those threads.
+
+The C<perl_alloc> and C<perl_clone> API functions will automatically set
+the TLS slot to the interpreter they created, so that there is no need to do
+anything special if the interpreter is always accessed in the same thread that
+created it, and that thread did not create or call any other interpreters
+afterwards. If that is not the case, you have to set the TLS slot of the
+thread before calling any functions in the Perl API on that particular
+interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
+thread as the first thing you do:
+
+ /* do this before doing anything else with some_perl */
+ PERL_SET_CONTEXT(some_perl);
+
+ ... other Perl API calls on some_perl go here ...
+
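Why the slot has to be reset can be seen in a small model that needs only one thread. This sketch assumes a C11 compiler (for C<_Thread_local>); C<ToyPerl>, C<toy_alloc>, C<toy_current_id> and C<TOY_SET_CONTEXT> are hypothetical names that merely imitate the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } ToyPerl;          /* hypothetical interpreter */

/* Models the per-thread slot that remembers the "current" interpreter. */
static _Thread_local ToyPerl *toy_current;

#define TOY_SET_CONTEXT(p) (toy_current = (p))

/* Like the allocation functions in the text: creating an interpreter
 * also makes it the current one for the calling thread. */
static void toy_alloc(ToyPerl *p, int id)
{
    p->id = id;
    TOY_SET_CONTEXT(p);
}

/* Stands in for any API call that looks the context up in the slot. */
static int toy_current_id(void)
{
    assert(toy_current != NULL);
    return toy_current->id;
}
```

Creating a second interpreter in the same thread silently changes what C<toy_current_id()> reports, so the first interpreter has to be selected again with C<TOY_SET_CONTEXT> before it is used.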
=head2 Future Plans and PERL_IMPLICIT_SYS
Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
-PERL_IMPLICIT_SYS macro. Currently it only works with PERL_OBJECT,
-but is mostly there for MULTIPLICITY and USE_THREADS (see inside
-iperlsys.h).
+PERL_IMPLICIT_SYS macro. Currently it only works with PERL_OBJECT
+and USE_THREADS on Windows (see inside iperlsys.h).
This allows the ability to provide an extra pointer (called the "host"
environment) for all the system calls. This makes it possible for
=item s
-This is a static function and is defined as C<S_whatever>.
+This is a static function and is defined as C<S_whatever>, and usually
+called within the sources as C<whatever(...)>.
=item n
Afprd |void |croak |const char* pat|...
-=item m
+=item M
This function is part of the experimental development API, and may change
or disappear without notice.
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability
- IVdf IV in decimal
- UVuf UV in decimal
- UVof UV in octal
- UVxf UV in hexadecimal
- NVef NV %e-like
- NVff NV %f-like
- NVgf NV %g-like
+ IVdf IV in decimal
+ UVuf UV in decimal
+ UVof UV in octal
+ UVxf UV in hexadecimal
+ NVef NV %e-like
+ NVff NV %f-like
+ NVgf NV %g-like
These will take care of 64-bit integers and long doubles.
For example:
- printf("IV is %"IVdf"\n", iv);
+ printf("IV is %"IVdf"\n", iv);
The IVdf will expand to whatever is the correct format for the IVs.
+If you are printing addresses of pointers, use UVxf combined
+with PTR2UV(); do not use %lx or %p.
+
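These format macros rely on the compiler joining adjacent string literals, the same trick the standard C<PRId64> family in F<inttypes.h> uses. The following standalone sketch shows the pattern; C<ToyIV>, C<ToyIVdf> and C<toy_format_iv> are hypothetical stand-ins, and fixing the type at 64 bits is an assumption made for the example, not how Perl chooses its IV type.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

typedef int64_t ToyIV;      /* hypothetical stand-in for IV            */
#define ToyIVdf PRId64      /* hypothetical stand-in for IVdf          */

/* Write "IV is <n>" into buf; returns the length snprintf reports. */
static int toy_format_iv(char *buf, size_t size, ToyIV iv)
{
    assert(buf != NULL && size > 0);
    /* "IV is %" and the macro's string literal are concatenated
     * by the compiler into one format string. */
    return snprintf(buf, size, "IV is %" ToyIVdf, iv);
}
```

Because the conversion lives in one macro, changing the width of C<ToyIV> only requires changing the C<typedef> and the C<#define>, not every format string.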
+=head2 Pointer-To-Integer and Integer-To-Pointer
+
+Because pointer size does not necessarily equal integer size,
+use the following macros to do it right.
+
+ PTR2UV(pointer)
+ PTR2IV(pointer)
+ PTR2NV(pointer)
+ INT2PTR(pointertotype, integer)
+
+For example:
+
+ IV iv = ...;
+ SV *sv = INT2PTR(SV*, iv);
+
+and
+
+ AV *av = ...;
+ UV uv = PTR2UV(av);
+
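What such conversion macros have to guarantee can be illustrated with the standard C<uintptr_t> type. This is a sketch under the assumption that the platform provides C<uintptr_t> (it is optional in the C standard); C<ToyUV>, C<TOY_PTR2UV>, C<TOY_INT2PTR> and C<toy_round_trip_ok> are hypothetical and are not the definitions Perl uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unsigned type that is wide enough to hold a pointer. */
typedef uintptr_t ToyUV;

#define TOY_PTR2UV(p)         ((ToyUV)(uintptr_t)(p))
#define TOY_INT2PTR(type, i)  ((type)(uintptr_t)(i))

/* Returns non-zero if p survives a trip through the integer type. */
static int toy_round_trip_ok(void *p)
{
    ToyUV uv;

    assert(sizeof(ToyUV) >= sizeof(void *));
    uv = TOY_PTR2UV(p);
    return TOY_INT2PTR(void *, uv) == p;
}
```

Casting a pointer straight to C<int> or C<long> gives no such guarantee on platforms where those types are narrower than a pointer, which is the situation the macros are there to avoid.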
=head2 Source Documentation
There's an effort going on to document the internal functions and
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF8. UTF8 uses
a variable number of bytes to represent a character, instead of just
-one. You can learn more about Unicode at
-L<http://www.unicode.org/|http://www.unicode.org/>
+one. You can learn more about Unicode at http://www.unicode.org/
=head2 How can I recognise a UTF8 string?
As mentioned above, UTF8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
-contines up to character 191, which is C<v194.191>. Now we've run out of
+continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
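
The byte values quoted above can be checked with a small encoder. This is a minimal sketch covering only code points below 0x10000 and ignoring surrogates; C<toy_utf8_encode> is a hypothetical helper written for this illustration and is not part of the Perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Encode code point cp (below 0x10000) into out; returns the byte count. */
static size_t toy_utf8_encode(unsigned long cp, unsigned char out[3])
{
    assert(cp < 0x10000UL);

    if (cp < 0x80UL) {                      /* 0..127: one byte, as ASCII   */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                     /* 128..2047: two bytes         */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 2048..65535: three bytes */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Feeding it 129, 191 and 192 yields the pairs 194.129, 194.191 and 195.128 respectively, and 2048 is the first value that needs three bytes.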
sv_utf8_upgrade(left);
If you do this in a binary operator, you will actually change one of the
-strings that came into the operator, and, while it shouldn't be noticable
+strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.
Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its
string argument. This is useful for having the data available for
-comparisons and so on, without harming the orginal SV. There's also
+comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.