X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlguts.pod;h=83ed06860444bcd1e3b4bddd7476e32630c7017d;hb=cccede5366275457276b68bb148b7872098aaf29;hp=f9ebdae85ac4ea2980da923894e393f9b982e57a;hpb=80a5d8e74b5512d4ab704d0e83466ae41247ce55;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index f9ebdae..83ed068 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -29,24 +29,34 @@ Additionally, there is the UV, which is simply an unsigned IV.
 
 Perl also uses two special typedefs, I32 and I16, which will always be at
 least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
-as well.)
+as well.) They will usually be exactly 32 and 16 bits long, but on Crays
+they will both be 64 bits.
 
 =head2 Working with SVs
 
-An SV can be created and loaded with one command. There are four types of
-values that can be loaded: an integer value (IV), a double (NV),
-a string (PV), and another scalar (SV).
+An SV can be created and loaded with one command. There are five types of
+values that can be loaded: an integer value (IV), an unsigned integer
+value (UV), a double (NV), a string (PV), and another scalar (SV).
 
-The six routines are:
+The seven routines are:
 
     SV* newSViv(IV);
+    SV* newSVuv(UV);
     SV* newSVnv(double);
     SV* newSVpv(const char*, int);
     SV* newSVpvn(const char*, int);
     SV* newSVpvf(const char*, ...);
     SV* newSVsv(SV*);
 
-To change the value of an *already-existing* SV, there are seven routines:
+If you require more complex initialisation you can create an empty SV with
+newSV(len). If C<len> is 0 an empty SV of type NULL is returned, else an
+SV of type PV is returned with len + 1 (for the NUL) bytes of storage
+allocated, accessible via SvPVX. In both cases the SV has value undef.
+
+    SV* newSV(0);   /* no storage allocated */
+    SV* newSV(10);  /* 10 (+1) bytes of uninitialised storage allocated */
+
+To change the value of an *already-existing* SV, there are eight routines:
 
     void sv_setiv(SV*, IV);
     void sv_setuv(SV*, UV);
@@ -609,8 +619,6 @@ Marks the variable as multiply defined, thus preventing the:
 
 warning.
 
-=over
-
 =item GV_ADDWARN
 
 Issues the warning:
@@ -955,6 +963,8 @@ The current kinds of Magic Virtual Tables are:
     t  PERL_MAGIC_taint          vtbl_taint     Taintedness
     U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by extensions
     v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
+    V  PERL_MAGIC_vstring        (none)         v-string scalars
+    w  PERL_MAGIC_utf8           vtbl_utf8      UTF-8 length+offset cache
     x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
     y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator variable /
                                                 smart parameter
@@ -966,10 +976,10 @@ The current kinds of Magic Virtual Tables are:
     ~  PERL_MAGIC_ext            (none)         Available for use by extensions
 
 When an uppercase and lowercase letter both exist in the table, then the
-uppercase letter is used to represent some kind of composite type (a list
-or a hash), and the lowercase letter is used to represent an element of
-that composite type. Some internals code makes use of this case
-relationship.
+uppercase letter is typically used to represent some kind of composite type
+(a list or a hash), and the lowercase letter is used to represent an element
+of that composite type. Some internals code makes use of this case
+relationship. However, 'v' and 'V' (vec and v-string) are in no way related.
 
 The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
 specifically for use by extensions and will not be used by perl itself.
@@ -1789,7 +1799,7 @@ A public function (i.e. part of the internal API, but not necessarily
 sanctioned for use in extensions) begins like this:
 
   void
-  Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv)
+  Perl_sv_setiv(pTHX_ SV* dsv, IV num)
 
 C<pTHX> is one of a number of macros (in perl.h) that hide the
 details of the interpreter's context.
THX stands for "thread", "this",
@@ -1808,19 +1818,19 @@ macro without the trailing underscore is used when there are no additional
 explicit arguments.
 
 When a core function calls another, it must pass the context. This
-is normally hidden via macros. Consider C<sv_setsv>. It expands into
+is normally hidden via macros. Consider C<sv_setiv>. It expands into
 something like this:
 
-    ifdef PERL_IMPLICIT_CONTEXT
-      define sv_setsv(a,b)      Perl_sv_setsv(aTHX_ a, b)
+    #ifdef PERL_IMPLICIT_CONTEXT
+      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
       /* can't do this for vararg functions, see below */
-    else
-      define sv_setsv           Perl_sv_setsv
-    endif
+    #else
+      #define sv_setiv           Perl_sv_setiv
+    #endif
 
 This works well, and means that XS authors can gleefully write:
 
-    sv_setsv(foo, bar);
+    sv_setiv(foo, bar);
 
 and still have it work under all the modes Perl could have been
 compiled with.
@@ -1864,16 +1874,16 @@ with extensions: whenever XSUB.h is #included, it redefines the aTHX
 and aTHX_ macros to call a function that will return the context.
 Thus, something like:
 
-        sv_setsv(asv, bsv);
+        sv_setiv(sv, num);
 
 in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
 in effect:
 
-        Perl_sv_setsv(Perl_get_context(), asv, bsv);
+        Perl_sv_setiv(Perl_get_context(), sv, num);
 
 or to this otherwise:
 
-        Perl_sv_setsv(asv, bsv);
+        Perl_sv_setiv(sv, num);
 
 You have to do nothing new in your extension to get this; since
 the Perl library provides Perl_get_context(), it will all just
@@ -2222,13 +2232,15 @@ C<utf8_hop>, which takes a string and a number of characters to skip
 over. You're on your own about bounds checking, though, so don't use it
 lightly.
 
-All bytes in a multi-byte UTF8 character will have the high bit set, so
-you can test if you need to do something special with this character
-like this:
+All bytes in a multi-byte UTF8 character will have the high bit set,
+so you can test if you need to do something special with this
+character like this (the UTF8_IS_INVARIANT() is a macro that tests
+whether the byte can be encoded as a single byte even in UTF-8):
 
-    UV uv;
+    U8 *utf;
+    UV uv;      /* Note: a UV, not a U8, not a char */
 
-    if (utf & 0x80)
+    if (!UTF8_IS_INVARIANT(*utf))
         /* Must treat this as UTF8 */
         uv = utf8_to_uv(utf);
     else
@@ -2239,7 +2251,7 @@ You can also see in that example that we use C<utf8_to_uv> to get the
 value of the character; the inverse function C<uv_to_utf8> is available
 for putting a UV into UTF8:
 
-    if (uv > 0x80)
+    if (!UTF8_IS_INVARIANT(uv))
         /* Must treat this as UTF8 */
         utf8 = uv_to_utf8(utf8, uv);
     else
@@ -2301,6 +2313,10 @@ In fact, your C function should be made aware of whether or
 not it's dealing with UTF8 data, so that it can handle the string
 appropriately.
 
+Since just passing an SV to an XS function and copying the data of
+the SV is not enough to copy the UTF8 flags, even less right is just
+passing a C<char *> to an XS function.
+
 =head2 How do I convert a string to UTF8?
 
 If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
@@ -2341,12 +2357,13 @@ it's not - if you pass on the PV to somewhere, pass on the flag too.
 =item *
 
 If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
-unless C<!(*s & 0x80)> in which case you can use C<*s>.
+unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.
 
 =item *
 
-When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
-C<uv < 0x80> in which case you can use C<*s = uv>.
+When writing a character C<uv> to a UTF8 string, B<always> use
+C<uv_to_utf8>, unless C<UTF8_IS_INVARIANT(uv))> in which case
+you can use C<*s = uv>.
 
 =item *