X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlguts.pod;h=99c79d0a437e473e2efeaac4ce8dc1b723d74988;hb=85b35914c9f3fc562f8a505e6508276be17f9d70;hp=b763dfea9aae558a6df774520e5267c36589689a;hpb=a938121844a85ea4fc6c37d2a1e11ff1930e084d;p=p5sagit%2Fp5-mst-13.2.git
diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index b763dfe..99c79d0 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -201,9 +201,22 @@ you can call:
     SvOK(SV*)
 
 The scalar C value is stored in an SV instance called C.
-Its address can be used whenever an C is needed.
-However, you have to be careful when using C<&PL_sv_undef> as a value in AVs
-or HVs (see L).
+
+Its address can be used whenever an C is needed. Make sure that
+you don't try to compare a random sv with C<&PL_sv_undef>. For example
+when interfacing Perl code, it'll work correctly for:
+
+    foo(undef);
+
+But won't work when called as:
+
+    $x = undef;
+    foo($x);
+
+So to repeat always use SvOK() to check whether an sv is defined.
+
+Also you have to be careful when using C<&PL_sv_undef> as a value in
+AVs or HVs (see L).
 
 There are also the two values C and C, which contain
 boolean TRUE and FALSE values, respectively. Like C, their
@@ -1328,7 +1341,8 @@ Similar to C, but localize C<@gv> and C<%gv>.
 
 Duplicates the current value of C, on the exit from the current
 C/C I will restore the value of C
-using the stored value.
+using the stored value. It doesn't handle magic. Use C if
+magic is affected.
 
 =item C
 
@@ -1380,11 +1394,12 @@ and C is the number of elements the stack should be extended by.
 
 Now that there is room on the stack, values can be pushed on it using C
 macro. The pushed values will often need to be "mortal" (See
-L).
+L):
 
     PUSHs(sv_2mortal(newSViv(an_integer)))
+    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
+    PUSHs(sv_2mortal(newSVnv(a_double)))
     PUSHs(sv_2mortal(newSVpv("Some String",0)))
-    PUSHs(sv_2mortal(newSVnv(3.141592)))
 
 And now the Perl program calling C, the two values will be assigned
 as in:
@@ -1400,8 +1415,9 @@ This macro automatically adjust the stack for you, if needed. Thus, you
 do not need to call C to extend the stack.
 
 Despite their suggestions in earlier versions of this document the macros
-C, C and C are I suited to XSUBs which return
-multiple results, see L.
+C<(X)PUSH[iunp]> are I suited to XSUBs which return multiple results.
+For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
+C macros instead; see L.
 
 For more information, consult L and L.
 
@@ -1535,7 +1551,7 @@ corresponding parts of its I and puts the I on stack.
 
 The macro to put this target on stack is C, and it is
 directly used in some opcodes, as well as indirectly in zillions of
-others, which use it via C<(X)PUSH[pni]>.
+others, which use it via C<(X)PUSH[iunp]>.
 
 Because the target is reused, you must be careful when pushing multiple
 values on the stack. The following code will not do what you think:
@@ -1547,12 +1563,30 @@ This translates as "set C to 10, push a pointer to C onto
 the stack; set C to 20, push a pointer to C onto the stack".
 At the end of the operation, the stack does not contain the values 10
 and 20, but actually contains two pointers to C, which we have set
-to 20. If you need to push multiple different values, use C,
-which bypasses C.
+to 20.
 
-On a related note, if you do use C<(X)PUSH[npi]>, then you're going to
+If you need to push multiple different values then you should either use
+the C<(X)PUSHs> macros, or else use the new C macros,
+none of which make use of C. The C<(X)PUSHs> macros simply push an
+SV* on the stack, which, as noted under L,
+will often need to be "mortal". The new C macros make
+this a little easier to achieve by creating a new mortal for you (via
+C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
+in the case of the C macros), and then setting its value.
+Thus, instead of writing this to "fix" the example above:
+
+    XPUSHs(sv_2mortal(newSViv(10)))
+    XPUSHs(sv_2mortal(newSViv(20)))
+
+you can simply write:
+
+    mXPUSHi(10)
+    mXPUSHi(20)
+
+On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
 need a C in your variable declarations so that the C<*PUSH*>
-macros can make use of the local variable C.
+macros can make use of the local variable C. See also C
+and C.
 
 =head2 Scratchpads
 
@@ -1762,10 +1796,10 @@ are subject to the same restrictions as in the pass 2.
 =head2 Pluggable runops
 
 The compile tree is executed in a runops function. There are two runops
-functions in F. C is used with DEBUGGING and
-C is used otherwise. For fine control over the
-execution of the compile tree it is possible to provide your own runops
-function.
+functions, in F and in F. C is used
+with DEBUGGING and C is used otherwise. For fine
+control over the execution of the compile tree it is possible to provide
+your own runops function.
 
 It's probably best to copy one of the existing runops functions and
 change it to suit your needs. Then, in the BOOT section of your XS
@@ -2085,11 +2119,12 @@ static functions start with C.)
 Inside the Perl core, you can get at the functions either with or
 without the C prefix, thanks to a bunch of defines that live in
 F. This header file is generated automatically from
-F. F also creates the prototyping header files for
-the internal functions, generates the documentation and a lot of other
-bits and pieces. It's important that when you add a new function to the
-core or change an existing one, you change the data in the table at the
-end of F as well. Here's a sample entry from that table:
+F and F. F also creates the prototyping
+header files for the internal functions, generates the documentation
+and a lot of other bits and pieces. It's important that when you add
+a new function to the core or change an existing one, you change the
+data in the table in F as well. Here's a sample entry from
+that table:
 
     Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
 
@@ -2148,19 +2183,32 @@ or disappear without notice.
 This function should not have a compatibility macro to define, say,
 C to C. It must be called as C.
 
-=item j
-
-This function is not a member of C. If you don't know
-what this means, don't use it.
-
 =item x
 
 This function isn't exported out of the Perl core.
 
+=item m
+
+This is implemented as a macro.
+
+=item X
+
+This function is explicitly exported.
+
+=item E
+
+This function is visible to extensions included in the Perl core.
+
+=item b
+
+Binary backward compatibility; this function is a macro but also has
+a C implementation (which is exported).
+
 =back
 
-If you edit F, you will need to run C to
-force a rebuild of F and other auto-generated files.
+If you edit F or F, you will need to run
+C to force a rebuild of F and other
+auto-generated files.
 
 =head2 Formatted Printing of IVs, UVs, and NVs
 
@@ -2255,34 +2303,34 @@ to one character.
 To fix this, some people formed Unicode, Inc. and
 produced a new character set containing all the characters you can
 possibly think of and more. There are several ways of representing these
-characters, and the one Perl uses is called UTF8. UTF8 uses
+characters, and the one Perl uses is called UTF-8. UTF-8 uses
 a variable number of bytes to represent a character, instead of just
 one. You can learn more about Unicode at http://www.unicode.org/
 
-=head2 How can I recognise a UTF8 string?
+=head2 How can I recognise a UTF-8 string?
 
-You can't. This is because UTF8 data is stored in bytes just like
-non-UTF8 data. The Unicode character 200, (C<0xC8> for you hex types)
+You can't. This is because UTF-8 data is stored in bytes just like
+non-UTF-8 data. The Unicode character 200, (C<0xC8> for you hex types)
 capital E with a grave accent, is represented by the two bytes
 C. Unfortunately, the non-Unicode string C
 has that byte sequence as well. So you can't tell just by looking - this
 is what makes Unicode input an interesting problem.
 
 The API function C can help; it'll tell you if a string
-contains only valid UTF8 characters. However, it can't do the work for
+contains only valid UTF-8 characters. However, it can't do the work for
 you. On a character-by-character basis, C will tell you
-whether the current character in a string is valid UTF8.
+whether the current character in a string is valid UTF-8.
 
-=head2 How does UTF8 represent Unicode characters?
+=head2 How does UTF-8 represent Unicode characters?
 
-As mentioned above, UTF8 uses a variable number of bytes to store a
+As mentioned above, UTF-8 uses a variable number of bytes to store a
 character. Characters with values 1...128 are stored in one byte, just
 like good ol' ASCII. Character 129 is stored as C; this
 continues up to character 191, which is C. Now we've run out of
 bits (191 is binary C<10111111>) so we move on; 192 is C. And
 so it goes on, moving to three bytes at character 2048.
 
-Assuming you know you're dealing with a UTF8 string, you can find out
+Assuming you know you're dealing with a UTF-8 string, you can find out
 how long the first character in it is with the C macro:
 
     char *utf = "\305\233\340\240\201";
@@ -2292,12 +2340,12 @@ how long the first character in it is with the C macro:
     utf += len;
     len = UTF8SKIP(utf); /* len is 3 here */
 
-Another way to skip over characters in a UTF8 string is to use
+Another way to skip over characters in a UTF-8 string is to use
 C, which takes a string and a number of characters to skip
 over. You're on your own about bounds checking, though, so don't use it
 lightly.
 
-All bytes in a multi-byte UTF8 character will have the high bit set,
+All bytes in a multi-byte UTF-8 character will have the high bit set,
 so you can test if you need to do something special with this
 character like this (the UTF8_IS_INVARIANT() is a macro that tests
 whether the byte can be encoded as a single byte even in UTF-8):
@@ -2306,7 +2354,7 @@ whether the byte can be encoded as a single byte even in UTF-8):
     UV uv; /* Note: a UV, not a U8, not a char */
 
     if (!UTF8_IS_INVARIANT(*utf))
-        /* Must treat this as UTF8 */
+        /* Must treat this as UTF-8 */
         uv = utf8_to_uv(utf);
     else
         /* OK to treat this character as a byte */
@@ -2314,7 +2362,7 @@ whether the byte can be encoded as a single byte even in UTF-8):
 
 You can also see in that example that we use C to get the
 value of the character; the inverse function C is available
-for putting a UV into UTF8:
+for putting a UV into UTF-8:
 
     if (!UTF8_IS_INVARIANT(uv))
         /* Must treat this as UTF8 */
@@ -2324,14 +2372,14 @@ for putting a UV into UTF8:
         *utf8++ = uv;
 
 You B convert characters to UVs using the above functions if
-you're ever in a situation where you have to match UTF8 and non-UTF8
-characters. You may not skip over UTF8 characters in this case. If you
-do this, you'll lose the ability to match hi-bit non-UTF8 characters;
-for instance, if your UTF8 string contains C, and you skip
-that character, you can never match a C in a non-UTF8 string.
+you're ever in a situation where you have to match UTF-8 and non-UTF-8
+characters. You may not skip over UTF-8 characters in this case. If you
+do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
+for instance, if your UTF-8 string contains C, and you skip
+that character, you can never match a C in a non-UTF-8 string.
 So don't do that!
 
-=head2 How does Perl store UTF8 strings?
+=head2 How does Perl store UTF-8 strings?
 
 Currently, Perl deals with Unicode strings and non-Unicode strings
 slightly differently. If a string has been identified as being UTF-8
@@ -2348,8 +2396,8 @@ C, C and other string handling operations will have
 undesirable results.
 
 The problem comes when you have, for instance, a string that isn't
-flagged is UTF8, and contains a byte sequence that could be UTF8 -
-especially when combining non-UTF8 and UTF8 strings.
+flagged is UTF-8, and contains a byte sequence that could be UTF-8 -
+especially when combining non-UTF-8 and UTF-8 strings.
 
 Never forget that the C flag is separate to the PV value; you
 need be sure you don't accidentally knock it off while you're
@@ -2366,7 +2414,7 @@ manipulating SVs. More specifically, you cannot expect to do this:
 
 The C string does not tell you the whole story, and you can't
 copy or reconstruct an SV just by copying the string value. Check if the
-old SV has the UTF8 flag set, and act accordingly:
+old SV has the UTF-8 flag set, and act accordingly:
 
     p = SvPV(sv, len);
     frobnicate(p);
@@ -2375,17 +2423,17 @@ old SV has the UTF8 flag set, and act accordingly:
         SvUTF8_on(nsv);
 
 In fact, your C function should be made aware of whether or
-not it's dealing with UTF8 data, so that it can handle the string
+not it's dealing with UTF-8 data, so that it can handle the string
 appropriately.
 
 Since just passing an SV to an XS function and copying the data of
-the SV is not enough to copy the UTF8 flags, even less right is just
+the SV is not enough to copy the UTF-8 flags, even less right is just
 passing a C to an XS function.
 
-=head2 How do I convert a string to UTF8?
+=head2 How do I convert a string to UTF-8?
 
-If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
-to upgrade one of the strings to UTF8. If you've got an SV, the easiest
+If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
+to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
 way to do this is:
 
     sv_utf8_upgrade(sv);
@@ -2399,7 +2447,7 @@ If you do this in a binary operator, you will actually change one of
 the strings that came into the operator, and, while it shouldn't be
 noticeable by the end user, it can cause problems.
 
-Instead, C will give you a UTF8-encoded B of its
+Instead, C will give you a UTF-8-encoded B of its
 string argument. This is useful for having the data available for
 comparisons and so on, without harming the original SV. There's also
 C to go the other way, but naturally, this will fail if
@@ -2414,27 +2462,27 @@ Not really. Just remember these things:
 
 =item *
 
-There's no way to tell if a string is UTF8 or not. You can tell if an SV
-is UTF8 by looking at is C flag. Don't forget to set the flag if
-something should be UTF8. Treat the flag as part of the PV, even though
+There's no way to tell if a string is UTF-8 or not. You can tell if an SV
+is UTF-8 by looking at is C flag. Don't forget to set the flag if
+something should be UTF-8. Treat the flag as part of the PV, even though
 it's not - if you pass on the PV to somewhere, pass on the flag too.
 
 =item *
 
-If a string is UTF8, B use C to get at the value,
+If a string is UTF-8, B use C to get at the value,
 unless C in which case you can use C<*s>.
 
 =item *
 
-When writing a character C to a UTF8 string, B use
+When writing a character C to a UTF-8 string, B use
 C, unless C in which case
 you can use C<*s = uv>.
 
 =item *
 
-Mixing UTF8 and non-UTF8 strings is tricky. Use C to get
-a new string which is UTF8 encoded. There are tricks you can use to
-delay deciding whether you need to use a UTF8 string until you get to a
+Mixing UTF-8 and non-UTF-8 strings is tricky. Use C to get
+a new string which is UTF-8 encoded. There are tricks you can use to
+delay deciding whether you need to use a UTF-8 string until you get to a
 high character - C is one of those.
 
 =back
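
The UTF-8 hunks in this patch describe how a character is spread over one, two or three bytes, and how UTF8SKIP reads a character's length from its first byte. The standalone C sketch below reproduces that arithmetic outside perl, purely as an illustration. utf8_encode() and utf8_lead_len() are hypothetical helpers written for this note and are not part of the Perl API. Both assume well-formed input and do no bounds checking.

```c
#include <stddef.h>

/* Hypothetical helper (not a perl API): write the UTF-8 form of code
 * point cp into buf, which must hold at least 4 bytes, and return the
 * number of bytes used. Assumes cp <= 0x10FFFF; surrogates are not
 * rejected. */
static size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {                 /* 0..127: one byte, same as ASCII */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 128..2047: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 2048..65535: three bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 65536 and above: four bytes */
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper in the spirit of UTF8SKIP: the length in bytes of
 * the character whose first (lead) byte is *s. It is only meaningful
 * for lead bytes; continuation bytes (0x80..0xBF) are not detected. */
static size_t utf8_lead_len(const unsigned char *s)
{
    if (s[0] < 0x80) return 1;
    if (s[0] < 0xE0) return 2;
    if (s[0] < 0xF0) return 3;
    return 4;
}
```

With these helpers, 129, 191 and 192 encode as 0xC2 0x81, 0xC2 0xBF and 0xC3 0x80, and 2048 is the first value that needs three bytes (0xE0 0xA0 0x80). Walking "\305\233\340\240\201" with utf8_lead_len() gives 2 and then 3, the same lengths the UTF8SKIP example reports. Inside XS code you would still use perl's own macros rather than helpers like these.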