X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlguts.pod;h=99c79d0a437e473e2efeaac4ce8dc1b723d74988;hb=85b35914c9f3fc562f8a505e6508276be17f9d70;hp=b763dfea9aae558a6df774520e5267c36589689a;hpb=a938121844a85ea4fc6c37d2a1e11ff1930e084d;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index b763dfe..99c79d0 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -201,9 +201,22 @@ you can call:
     SvOK(SV*)
 
 The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
-Its address can be used whenever an C<SV*> is needed.
-However, you have to be careful when using C<&PL_sv_undef> as a value in AVs
-or HVs (see L<AVs, HVs and undefined values>).
+
+Its address can be used whenever an C<SV*> is needed. Make sure that
+you don't try to compare an arbitrary SV with C<&PL_sv_undef>. For example,
+when interfacing with Perl code, the comparison will work correctly for:
+
+  foo(undef);
+
+but it won't work when called as:
+
+  $x = undef;
+  foo($x);
+
+So, to repeat: always use C<SvOK()> to check whether an SV is defined.
+
+Also, you have to be careful when using C<&PL_sv_undef> as a value in
+AVs or HVs (see L<AVs, HVs and undefined values>).
 
 There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
 boolean TRUE and FALSE values, respectively.  Like C<PL_sv_undef>, their
@@ -1328,7 +1341,8 @@ Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.
 
 Duplicates the current value of C<SV>, on the exit from the current
 C<ENTER>/C<LEAVE> I<pseudo-block> will restore the value of C<SV>
-using the stored value.
+using the stored value. It doesn't handle magic. Use C<save_scalar> if
+magic is affected.
 
 =item C<void save_list(SV **sarg, I32 maxsarg)>
 
@@ -1380,11 +1394,12 @@ and C<num> is the number of elements the stack should be extended by.
 
 Now that there is room on the stack, values can be pushed on it using C<PUSHs>
 macro. The pushed values will often need to be "mortal" (See
-L</Reference Counts and Mortality>).
+L</Reference Counts and Mortality>):
 
     PUSHs(sv_2mortal(newSViv(an_integer)))
+    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
+    PUSHs(sv_2mortal(newSVnv(a_double)))
     PUSHs(sv_2mortal(newSVpv("Some String",0)))
-    PUSHs(sv_2mortal(newSVnv(3.141592)))
 
 And now the Perl program calling C<tzname>, the two values will be assigned
 as in:
@@ -1400,8 +1415,9 @@ This macro automatically adjust the stack for you, if needed.  Thus, you
 do not need to call C<EXTEND> to extend the stack.
 
 Despite their suggestions in earlier versions of this document the macros
-C<PUSHi>, C<PUSHn> and C<PUSHp> are I<not> suited to XSUBs which return
-multiple results, see L</Putting a C value on Perl stack>.
+C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
+For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
+C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.
 
 For more information, consult L<perlxs> and L<perlxstut>.
 
@@ -1535,7 +1551,7 @@ corresponding parts of its I<target> and puts the I<target> on stack.
 
 The macro to put this target on stack is C<PUSHTARG>, and it is
 directly used in some opcodes, as well as indirectly in zillions of
-others, which use it via C<(X)PUSH[pni]>.
+others, which use it via C<(X)PUSH[iunp]>.
 
 Because the target is reused, you must be careful when pushing multiple
 values on the stack. The following code will not do what you think:
@@ -1547,12 +1563,30 @@ This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
 the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
 At the end of the operation, the stack does not contain the values 10
 and 20, but actually contains two pointers to C<TARG>, which we have set
-to 20. If you need to push multiple different values, use C<XPUSHs>,
-which bypasses C<TARG>.
+to 20.
 
-On a related note, if you do use C<(X)PUSH[npi]>, then you're going to
+If you need to push multiple different values then you should either use
+the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
+none of which make use of C<TARG>.  The C<(X)PUSHs> macros simply push an
+SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
+will often need to be "mortal".  The new C<m(X)PUSH[iunp]> macros make
+this a little easier to achieve by creating a new mortal for you (via
+C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
+in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
+Thus, instead of writing this to "fix" the example above:
+
+    XPUSHs(sv_2mortal(newSViv(10)))
+    XPUSHs(sv_2mortal(newSViv(20)))
+
+you can simply write:
+
+    mXPUSHi(10)
+    mXPUSHi(20)
+
+On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
 need a C<dTARG> in your variable declarations so that the C<*PUSH*>
-macros can make use of the local variable C<TARG>.
+macros can make use of the local variable C<TARG>.  See also C<dTARGET>
+and C<dXSTARG>.
 
 =head2 Scratchpads
 
@@ -1762,10 +1796,10 @@ are subject to the same restrictions as in the pass 2.
 =head2 Pluggable runops
 
 The compile tree is executed in a runops function.  There are two runops
-functions in F<run.c>.  C<Perl_runops_debug> is used with DEBUGGING and
-C<Perl_runops_standard> is used otherwise.  For fine control over the
-execution of the compile tree it is possible to provide your own runops
-function.
+functions, one in F<run.c> and one in F<dump.c>.  C<Perl_runops_debug> is
+used with DEBUGGING and C<Perl_runops_standard> is used otherwise.  For
+fine control over the execution of the compile tree it is possible to
+provide your own runops function.
 
 It's probably best to copy one of the existing runops functions and
 change it to suit your needs.  Then, in the BOOT section of your XS
@@ -2085,11 +2119,12 @@ static functions start with C<S_>.)
 Inside the Perl core, you can get at the functions either with or
 without the C<Perl_> prefix, thanks to a bunch of defines that live in
 F<embed.h>. This header file is generated automatically from
-F<embed.pl>. F<embed.pl> also creates the prototyping header files for
-the internal functions, generates the documentation and a lot of other
-bits and pieces. It's important that when you add a new function to the
-core or change an existing one, you change the data in the table at the
-end of F<embed.pl> as well. Here's a sample entry from that table:
+F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
+header files for the internal functions, generates the documentation
+and a lot of other bits and pieces. It's important that when you add
+a new function to the core or change an existing one, you change the
+data in the table in F<embed.fnc> as well. Here's a sample entry from
+that table:
 
     Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval
 
@@ -2148,19 +2183,32 @@ or disappear without notice.
 This function should not have a compatibility macro to define, say,
 C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.
 
-=item j
-
-This function is not a member of C<CPerlObj>. If you don't know
-what this means, don't use it.
-
 =item x
 
 This function isn't exported out of the Perl core.
 
+=item m
+
+This is implemented as a macro.
+
+=item X
+
+This function is explicitly exported.
+
+=item E
+
+This function is visible to extensions included in the Perl core.
+
+=item b
+
+Binary backward compatibility; this function is a macro but also has
+a C<Perl_> implementation (which is exported).
+
 =back
 
-If you edit F<embed.pl>, you will need to run C<make regen_headers> to
-force a rebuild of F<embed.h> and other auto-generated files.
+If you edit F<embed.pl> or F<embed.fnc>, you will need to run
+C<make regen_headers> to force a rebuild of F<embed.h> and other
+auto-generated files.
 
 =head2 Formatted Printing of IVs, UVs, and NVs
 
@@ -2255,34 +2303,34 @@ to one character.
 To fix this, some people formed Unicode, Inc. and
 produced a new character set containing all the characters you can
 possibly think of and more. There are several ways of representing these
-characters, and the one Perl uses is called UTF8. UTF8 uses
+characters, and the one Perl uses is called UTF-8. UTF-8 uses
 a variable number of bytes to represent a character, instead of just
 one. You can learn more about Unicode at http://www.unicode.org/
 
-=head2 How can I recognise a UTF8 string?
+=head2 How can I recognise a UTF-8 string?
 
-You can't. This is because UTF8 data is stored in bytes just like
-non-UTF8 data. The Unicode character 200, (C<0xC8> for you hex types)
+You can't. This is because UTF-8 data is stored in bytes just like
+non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
 capital E with a grave accent, is represented by the two bytes
 C<v196.172>. Unfortunately, the non-Unicode string C<chr(196).chr(172)>
 has that byte sequence as well. So you can't tell just by looking - this
 is what makes Unicode input an interesting problem.
 
 The API function C<is_utf8_string> can help; it'll tell you if a string
-contains only valid UTF8 characters. However, it can't do the work for
+contains only valid UTF-8 characters. However, it can't do the work for
 you. On a character-by-character basis, C<is_utf8_char> will tell you
-whether the current character in a string is valid UTF8.
+whether the current character in a string is valid UTF-8.
 
-=head2 How does UTF8 represent Unicode characters?
+=head2 How does UTF-8 represent Unicode characters?
 
-As mentioned above, UTF8 uses a variable number of bytes to store a
+As mentioned above, UTF-8 uses a variable number of bytes to store a
 character. Characters with values 1...128 are stored in one byte, just
 like good ol' ASCII. Character 129 is stored as C<v194.129>; this
 continues up to character 191, which is C<v194.191>. Now we've run out of
 bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
 so it goes on, moving to three bytes at character 2048.
 
-Assuming you know you're dealing with a UTF8 string, you can find out
+Assuming you know you're dealing with a UTF-8 string, you can find out
 how long the first character in it is with the C<UTF8SKIP> macro:
 
     char *utf = "\305\233\340\240\201";
@@ -2292,12 +2340,12 @@ how long the first character in it is with the C<UTF8SKIP> macro:
     utf += len;
     len = UTF8SKIP(utf); /* len is 3 here */
 
-Another way to skip over characters in a UTF8 string is to use
+Another way to skip over characters in a UTF-8 string is to use
 C<utf8_hop>, which takes a string and a number of characters to skip
 over. You're on your own about bounds checking, though, so don't use it
 lightly.
 
-All bytes in a multi-byte UTF8 character will have the high bit set,
+All bytes in a multi-byte UTF-8 character will have the high bit set,
 so you can test if you need to do something special with this
 character like this (the UTF8_IS_INVARIANT() is a macro that tests
 whether the byte can be encoded as a single byte even in UTF-8):
@@ -2306,7 +2354,7 @@ whether the byte can be encoded as a single byte even in UTF-8):
     UV uv;	/* Note: a UV, not a U8, not a char */
 
     if (!UTF8_IS_INVARIANT(*utf))
-        /* Must treat this as UTF8 */
+        /* Must treat this as UTF-8 */
         uv = utf8_to_uv(utf);
     else
         /* OK to treat this character as a byte */
@@ -2314,7 +2362,7 @@ whether the byte can be encoded as a single byte even in UTF-8):
 
 You can also see in that example that we use C<utf8_to_uv> to get the
 value of the character; the inverse function C<uv_to_utf8> is available
-for putting a UV into UTF8:
+for putting a UV into UTF-8:
 
     if (!UTF8_IS_INVARIANT(uv))
         /* Must treat this as UTF8 */
@@ -2324,14 +2372,14 @@ for putting a UV into UTF8:
         *utf8++ = uv;
 
 You B<must> convert characters to UVs using the above functions if
-you're ever in a situation where you have to match UTF8 and non-UTF8
-characters. You may not skip over UTF8 characters in this case. If you
-do this, you'll lose the ability to match hi-bit non-UTF8 characters;
-for instance, if your UTF8 string contains C<v196.172>, and you skip
-that character, you can never match a C<chr(200)> in a non-UTF8 string.
+you're ever in a situation where you have to match UTF-8 and non-UTF-8
+characters. You may not skip over UTF-8 characters in this case. If you
+do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
+for instance, if your UTF-8 string contains C<v196.172>, and you skip
+that character, you can never match a C<chr(200)> in a non-UTF-8 string.
 So don't do that!
 
-=head2 How does Perl store UTF8 strings?
+=head2 How does Perl store UTF-8 strings?
 
 Currently, Perl deals with Unicode strings and non-Unicode strings
 slightly differently. If a string has been identified as being UTF-8
@@ -2348,8 +2396,8 @@ C<length>, C<substr> and other string handling operations will have
 undesirable results.
 
 The problem comes when you have, for instance, a string that isn't
-flagged is UTF8, and contains a byte sequence that could be UTF8 -
-especially when combining non-UTF8 and UTF8 strings.
+flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
+especially when combining non-UTF-8 and UTF-8 strings.
 
 Never forget that the C<SVf_UTF8> flag is separate to the PV value; you
 need be sure you don't accidentally knock it off while you're
@@ -2366,7 +2414,7 @@ manipulating SVs. More specifically, you cannot expect to do this:
 
 The C<char*> string does not tell you the whole story, and you can't
 copy or reconstruct an SV just by copying the string value. Check if the
-old SV has the UTF8 flag set, and act accordingly:
+old SV has the UTF-8 flag set, and act accordingly:
 
     p = SvPV(sv, len);
     frobnicate(p);
@@ -2375,17 +2423,17 @@ old SV has the UTF8 flag set, and act accordingly:
         SvUTF8_on(nsv);
 
 In fact, your C<frobnicate> function should be made aware of whether or
-not it's dealing with UTF8 data, so that it can handle the string
+not it's dealing with UTF-8 data, so that it can handle the string
 appropriately.
 
 Since just passing an SV to an XS function and copying the data of
-the SV is not enough to copy the UTF8 flags, even less right is just
+the SV is not enough to copy the UTF-8 flags, even less right is just
 passing a C<char *> to an XS function.
 
-=head2 How do I convert a string to UTF8?
+=head2 How do I convert a string to UTF-8?
 
-If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
-to upgrade one of the strings to UTF8. If you've got an SV, the easiest
+If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
+to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
 way to do this is:
 
     sv_utf8_upgrade(sv);
@@ -2399,7 +2447,7 @@ If you do this in a binary operator, you will actually change one of the
 strings that came into the operator, and, while it shouldn't be noticeable
 by the end user, it can cause problems.
 
-Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its
+Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
 string argument. This is useful for having the data available for
 comparisons and so on, without harming the original SV. There's also
 C<utf8_to_bytes> to go the other way, but naturally, this will fail if
@@ -2414,27 +2462,27 @@ Not really. Just remember these things:
 
 =item *
 
-There's no way to tell if a string is UTF8 or not. You can tell if an SV
-is UTF8 by looking at is C<SvUTF8> flag. Don't forget to set the flag if
-something should be UTF8. Treat the flag as part of the PV, even though
+There's no way to tell if a string is UTF-8 or not. You can tell if an SV
+is UTF-8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
+something should be UTF-8. Treat the flag as part of the PV, even though
 it's not - if you pass on the PV to somewhere, pass on the flag too.
 
 =item *
 
-If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
+If a string is UTF-8, B<always> use C<utf8_to_uv> to get at the value,
 unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.
 
 =item *
 
-When writing a character C<uv> to a UTF8 string, B<always> use
+When writing a character C<uv> to a UTF-8 string, B<always> use
 C<uv_to_utf8>, unless C<UTF8_IS_INVARIANT(uv))> in which case
 you can use C<*s = uv>.
 
 =item *
 
-Mixing UTF8 and non-UTF8 strings is tricky. Use C<bytes_to_utf8> to get
-a new string which is UTF8 encoded. There are tricks you can use to
-delay deciding whether you need to use a UTF8 string until you get to a
+Mixing UTF-8 and non-UTF-8 strings is tricky. Use C<bytes_to_utf8> to get
+a new string which is UTF-8 encoded. There are tricks you can use to
+delay deciding whether you need to use a UTF-8 string until you get to a
 high character - C<HALF_UPGRADE> is one of those.
 
 =back