X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlguts.pod;h=4d045318d36501c332d6696bf9ff937a823d0c74;hb=cdd94ca776dceea28cef3714a8bc6d873614b2bf;hp=9a411e15975650fd1def80f57835cf1a083b27b5;hpb=06f6df1768f7810ef4b2d5d7821dcccb9c7bf9e4;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlguts.pod b/pod/perlguts.pod index 9a411e1..4d04531 100644 --- a/pod/perlguts.pod +++ b/pod/perlguts.pod @@ -201,7 +201,22 @@ you can call: SvOK(SV*) The scalar C value is stored in an SV instance called C. -Its address can be used whenever an C is needed. + +Its address can be used whenever an C is needed. Make sure that +you don't try to compare a random sv with C<&PL_sv_undef>. For example +when interfacing Perl code, it'll work correctly for: + + foo(undef); + +But won't work when called as: + + $x = undef; + foo($x); + +So to repeat always use SvOK() to check whether an sv is defined. + +Also you have to be careful when using C<&PL_sv_undef> as a value in +AVs or HVs (see L). There are also the two values C and C, which contain boolean TRUE and FALSE values, respectively. Like C, their @@ -489,6 +504,58 @@ reference count of the stored C, which is the caller's responsibility. If these functions return a NULL value, the caller will usually have to decrement the reference count of C to avoid a memory leak. +=head2 AVs, HVs and undefined values + +Sometimes you have to store undefined values in AVs or HVs. Although +this may be a rare case, it can be tricky. That's because you're +used to using C<&PL_sv_undef> if you need an undefined SV. + +For example, intuition tells you that this XS code: + + AV *av = newAV(); + av_store( av, 0, &PL_sv_undef ); + +is equivalent to this Perl code: + + my @av; + $av[0] = undef; + +Unfortunately, this isn't true. AVs use C<&PL_sv_undef> as a marker +for indicating that an array element has not yet been initialized. +Thus, C would be true for the above Perl code, but +false for the array generated by the XS code. 
+ +Other problems can occur when storing C<&PL_sv_undef> in HVs: + + hv_store( hv, "key", 3, &PL_sv_undef, 0 ); + +This will indeed make the value C, but if you try to modify +the value of C, you'll get the following error: + + Modification of non-creatable hash value attempted + +In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders +in restricted hashes. This caused such hash entries not to appear +when iterating over the hash or when checking for the keys +with the C function. + +You can run into similar problems when you store C<&PL_sv_true> or +C<&PL_sv_false> into AVs or HVs. Trying to modify such elements +will give you the following error: + + Modification of a read-only value attempted + +To make a long story short, you can use the special variables +C<&PL_sv_undef>, C<&PL_sv_true> and C<&PL_sv_false> with AVs and +HVs, but you have to make sure you know what you're doing. + +Generally, if you want to store an undefined value in an AV +or HV, you should not use C<&PL_sv_undef>, but rather create a +new undefined value using the C function, for example: + + av_store( av, 42, newSV(0) ); + hv_store( hv, "foo", 3, newSV(0), 0 ); + =head2 References References are a special type of scalar that point to other data types @@ -859,8 +926,11 @@ SV. The C and C arguments are used to associate a string with the magic, typically the name of a variable. C is stored in the -C field and if C is non-null and C E= 0 a malloc'd -copy of the name is stored in C field. +C field and if C is non-null then either a C copy of +C or C itself is stored in the C field, depending on +whether C is greater than zero or equal to zero respectively. As a +special case, if C<(name && namlen == HEf_SVKEY)> then C is assumed +to contain an C and is stored as-is with its REFCNT incremented. The sv_magic function uses C to determine which, if any, predefined "Magic Virtual Table" should be assigned to the C field. @@ -877,6 +947,9 @@ count of the C object is incremented. 
If it is the same, or if the C argument is C, or if it is a NULL pointer, then C is merely stored, without the reference count being incremented. +See also C in L for a more flexible way to add magic +to an SV. + There is also a function to add magic to an C: void hv_magic(HV *hv, GV *gv, int how); @@ -1274,7 +1347,8 @@ Similar to C, but localize C<@gv> and C<%gv>. Duplicates the current value of C, on the exit from the current C/C I will restore the value of C -using the stored value. +using the stored value. It doesn't handle magic. Use C if +magic is affected. =item C @@ -1326,11 +1400,12 @@ and C is the number of elements the stack should be extended by. Now that there is room on the stack, values can be pushed on it using C macro. The pushed values will often need to be "mortal" (See -L). +L): PUSHs(sv_2mortal(newSViv(an_integer))) + PUSHs(sv_2mortal(newSVuv(an_unsigned_integer))) + PUSHs(sv_2mortal(newSVnv(a_double))) PUSHs(sv_2mortal(newSVpv("Some String",0))) - PUSHs(sv_2mortal(newSVnv(3.141592))) And now the Perl program calling C, the two values will be assigned as in: @@ -1346,8 +1421,9 @@ This macro automatically adjust the stack for you, if needed. Thus, you do not need to call C to extend the stack. Despite their suggestions in earlier versions of this document the macros -C, C and C are I suited to XSUBs which return -multiple results, see L. +C<(X)PUSH[iunp]> are I suited to XSUBs which return multiple results. +For that, either stick to the C<(X)PUSHs> macros shown above, or use the new +C macros instead; see L. For more information, consult L and L. @@ -1481,7 +1557,7 @@ corresponding parts of its I and puts the I on stack. The macro to put this target on stack is C, and it is directly used in some opcodes, as well as indirectly in zillions of -others, which use it via C<(X)PUSH[pni]>. +others, which use it via C<(X)PUSH[iunp]>. Because the target is reused, you must be careful when pushing multiple values on the stack. 
The following code will not do what you think: @@ -1493,12 +1569,30 @@ This translates as "set C to 10, push a pointer to C onto the stack; set C to 20, push a pointer to C onto the stack". At the end of the operation, the stack does not contain the values 10 and 20, but actually contains two pointers to C, which we have set -to 20. If you need to push multiple different values, use C, -which bypasses C. +to 20. + +If you need to push multiple different values then you should either use +the C<(X)PUSHs> macros, or else use the new C macros, +none of which make use of C. The C<(X)PUSHs> macros simply push an +SV* on the stack, which, as noted under L, +will often need to be "mortal". The new C macros make +this a little easier to achieve by creating a new mortal for you (via +C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary +in the case of the C macros), and then setting its value. +Thus, instead of writing this to "fix" the example above: + + XPUSHs(sv_2mortal(newSViv(10))) + XPUSHs(sv_2mortal(newSViv(20))) -On a related note, if you do use C<(X)PUSH[npi]>, then you're going to +you can simply write: + + mXPUSHi(10) + mXPUSHi(20) + +On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to need a C in your variable declarations so that the C<*PUSH*> -macros can make use of the local variable C. +macros can make use of the local variable C. See also C +and C. =head2 Scratchpads @@ -1708,10 +1802,10 @@ are subject to the same restrictions as in the pass 2. =head2 Pluggable runops The compile tree is executed in a runops function. There are two runops -functions in F. C is used with DEBUGGING and -C is used otherwise. For fine control over the -execution of the compile tree it is possible to provide your own runops -function. +functions, in F and in F. C is used +with DEBUGGING and C is used otherwise. For fine +control over the execution of the compile tree it is possible to provide +your own runops function. 
It's probably best to copy one of the existing runops functions and change it to suit your needs. Then, in the BOOT section of your XS @@ -2031,11 +2125,12 @@ static functions start with C.) Inside the Perl core, you can get at the functions either with or without the C prefix, thanks to a bunch of defines that live in F. This header file is generated automatically from -F. F also creates the prototyping header files for -the internal functions, generates the documentation and a lot of other -bits and pieces. It's important that when you add a new function to the -core or change an existing one, you change the data in the table at the -end of F as well. Here's a sample entry from that table: +F and F. F also creates the prototyping +header files for the internal functions, generates the documentation +and a lot of other bits and pieces. It's important that when you add +a new function to the core or change an existing one, you change the +data in the table in F as well. Here's a sample entry from +that table: Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval @@ -2094,19 +2189,32 @@ or disappear without notice. This function should not have a compatibility macro to define, say, C to C. It must be called as C. -=item j - -This function is not a member of C. If you don't know -what this means, don't use it. - =item x This function isn't exported out of the Perl core. +=item m + +This is implemented as a macro. + +=item X + +This function is explicitly exported. + +=item E + +This function is visible to extensions included in the Perl core. + +=item b + +Binary backward compatibility; this function is a macro but also has +a C implementation (which is exported). + =back -If you edit F, you will need to run C to -force a rebuild of F and other auto-generated files. +If you edit F or F, you will need to run +C to force a rebuild of F and other +auto-generated files. 
=head2 Formatted Printing of IVs, UVs, and NVs @@ -2175,6 +2283,26 @@ source, like this: Please try and supply some documentation if you add functions to the Perl core. +=head2 Backwards compatibility + +The Perl API changes over time. New functions are added or the interfaces +of existing functions are changed. The C module tries to +provide compatibility code for some of these changes, so XS writers don't +have to code it themselves when supporting multiple versions of Perl. + +C generates a C header file F that can also +be run as a Perl script. To generate F, run: + + perl -MDevel::PPPort -eDevel::PPPort::WriteFile + +Besides checking existing XS code, the script can also be used to retrieve +compatibility information for various API calls using the C<--api-info> +command line switch. For example: + + % perl ppport.h --api-info=sv_magicext + +For details, see C. + =head1 Unicode Support Perl 5.6.0 introduced Unicode support. It's important for porters and XS @@ -2201,34 +2329,34 @@ to one character. To fix this, some people formed Unicode, Inc. and produced a new character set containing all the characters you can possibly think of and more. There are several ways of representing these -characters, and the one Perl uses is called UTF8. UTF8 uses +characters, and the one Perl uses is called UTF-8. UTF-8 uses a variable number of bytes to represent a character, instead of just one. You can learn more about Unicode at http://www.unicode.org/ -=head2 How can I recognise a UTF8 string? +=head2 How can I recognise a UTF-8 string? -You can't. This is because UTF8 data is stored in bytes just like -non-UTF8 data. The Unicode character 200, (C<0xC8> for you hex types) +You can't. This is because UTF-8 data is stored in bytes just like +non-UTF-8 data. The Unicode character 200, (C<0xC8> for you hex types) capital E with a grave accent, is represented by the two bytes C. Unfortunately, the non-Unicode string C has that byte sequence as well. 
So you can't tell just by looking - this is what makes Unicode input an interesting problem. The API function C can help; it'll tell you if a string -contains only valid UTF8 characters. However, it can't do the work for +contains only valid UTF-8 characters. However, it can't do the work for you. On a character-by-character basis, C will tell you -whether the current character in a string is valid UTF8. +whether the current character in a string is valid UTF-8. -=head2 How does UTF8 represent Unicode characters? +=head2 How does UTF-8 represent Unicode characters? -As mentioned above, UTF8 uses a variable number of bytes to store a +As mentioned above, UTF-8 uses a variable number of bytes to store a character. Characters with values 1...128 are stored in one byte, just like good ol' ASCII. Character 129 is stored as C; this continues up to character 191, which is C. Now we've run out of bits (191 is binary C<10111111>) so we move on; 192 is C. And so it goes on, moving to three bytes at character 2048. -Assuming you know you're dealing with a UTF8 string, you can find out +Assuming you know you're dealing with a UTF-8 string, you can find out how long the first character in it is with the C macro: char *utf = "\305\233\340\240\201"; @@ -2238,12 +2366,12 @@ how long the first character in it is with the C macro: utf += len; len = UTF8SKIP(utf); /* len is 3 here */ -Another way to skip over characters in a UTF8 string is to use +Another way to skip over characters in a UTF-8 string is to use C, which takes a string and a number of characters to skip over. You're on your own about bounds checking, though, so don't use it lightly. 
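The link between a lead byte and the length of the character it starts can be sketched in plain C. This is only an illustration under stated assumptions, not Perl's own implementation: the helper names seq_len and hop are hypothetical, and unlike the hop function described above, this hop stops at the terminating NUL instead of leaving bounds checking to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Perl API): length in bytes of the
 * UTF-8 sequence that starts with lead byte b. Continuation bytes and
 * invalid lead bytes report 1 so that a caller always makes progress. */
static size_t seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: plain ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;                          /* continuation or invalid byte */
}

/* Hypothetical helper: skip up to n characters of a NUL-terminated
 * string, stopping early if the end of the string is reached. */
static const char *hop(const char *s, size_t n)
{
    assert(s != NULL);
    while (n-- > 0 && *s != '\0') {
        size_t len = seq_len((unsigned char)*s);
        /* never step past a NUL inside a truncated sequence */
        while (len-- > 0 && *s != '\0')
            s++;
    }
    return s;
}
```

With the string from the example above, seq_len reports 2 for the first lead byte and 3 for the second, and hop(utf, 1) lands on the start of the second character.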
-All bytes in a multi-byte UTF8 character will have the high bit set, +All bytes in a multi-byte UTF-8 character will have the high bit set, so you can test if you need to do something special with this character like this (the UTF8_IS_INVARIANT() is a macro that tests whether the byte can be encoded as a single byte even in UTF-8): @@ -2252,7 +2380,7 @@ whether the byte can be encoded as a single byte even in UTF-8): UV uv; /* Note: a UV, not a U8, not a char */ if (!UTF8_IS_INVARIANT(*utf)) - /* Must treat this as UTF8 */ + /* Must treat this as UTF-8 */ uv = utf8_to_uv(utf); else /* OK to treat this character as a byte */ @@ -2260,7 +2388,7 @@ whether the byte can be encoded as a single byte even in UTF-8): You can also see in that example that we use C to get the value of the character; the inverse function C is available -for putting a UV into UTF8: +for putting a UV into UTF-8: if (!UTF8_IS_INVARIANT(uv)) /* Must treat this as UTF8 */ @@ -2270,14 +2398,14 @@ for putting a UV into UTF8: *utf8++ = uv; You B convert characters to UVs using the above functions if -you're ever in a situation where you have to match UTF8 and non-UTF8 -characters. You may not skip over UTF8 characters in this case. If you -do this, you'll lose the ability to match hi-bit non-UTF8 characters; -for instance, if your UTF8 string contains C, and you skip -that character, you can never match a C in a non-UTF8 string. +you're ever in a situation where you have to match UTF-8 and non-UTF-8 +characters. You may not skip over UTF-8 characters in this case. If you +do this, you'll lose the ability to match hi-bit non-UTF-8 characters; +for instance, if your UTF-8 string contains C, and you skip +that character, you can never match a C in a non-UTF-8 string. So don't do that! -=head2 How does Perl store UTF8 strings? +=head2 How does Perl store UTF-8 strings? Currently, Perl deals with Unicode strings and non-Unicode strings slightly differently. 
If a string has been identified as being UTF-8 @@ -2294,8 +2422,8 @@ C, C and other string handling operations will have undesirable results. The problem comes when you have, for instance, a string that isn't -flagged is UTF8, and contains a byte sequence that could be UTF8 - -especially when combining non-UTF8 and UTF8 strings. +flagged is UTF-8, and contains a byte sequence that could be UTF-8 - +especially when combining non-UTF-8 and UTF-8 strings. Never forget that the C flag is separate to the PV value; you need be sure you don't accidentally knock it off while you're @@ -2312,7 +2440,7 @@ manipulating SVs. More specifically, you cannot expect to do this: The C string does not tell you the whole story, and you can't copy or reconstruct an SV just by copying the string value. Check if the -old SV has the UTF8 flag set, and act accordingly: +old SV has the UTF-8 flag set, and act accordingly: p = SvPV(sv, len); frobnicate(p); @@ -2321,17 +2449,17 @@ old SV has the UTF8 flag set, and act accordingly: SvUTF8_on(nsv); In fact, your C function should be made aware of whether or -not it's dealing with UTF8 data, so that it can handle the string +not it's dealing with UTF-8 data, so that it can handle the string appropriately. Since just passing an SV to an XS function and copying the data of -the SV is not enough to copy the UTF8 flags, even less right is just +the SV is not enough to copy the UTF-8 flags, even less right is just passing a C to an XS function. -=head2 How do I convert a string to UTF8? +=head2 How do I convert a string to UTF-8? -If you're mixing UTF8 and non-UTF8 strings, you might find it necessary -to upgrade one of the strings to UTF8. If you've got an SV, the easiest +If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary +to upgrade one of the strings to UTF-8. 
If you've got an SV, the easiest way to do this is: sv_utf8_upgrade(sv); @@ -2345,7 +2473,7 @@ If you do this in a binary operator, you will actually change one of the strings that came into the operator, and, while it shouldn't be noticeable by the end user, it can cause problems. -Instead, C will give you a UTF8-encoded B of its +Instead, C will give you a UTF-8-encoded B of its string argument. This is useful for having the data available for comparisons and so on, without harming the original SV. There's also C to go the other way, but naturally, this will fail if @@ -2360,27 +2488,27 @@ Not really. Just remember these things: =item * -There's no way to tell if a string is UTF8 or not. You can tell if an SV -is UTF8 by looking at is C flag. Don't forget to set the flag if -something should be UTF8. Treat the flag as part of the PV, even though +There's no way to tell if a string is UTF-8 or not. You can tell if an SV +is UTF-8 by looking at its C flag. Don't forget to set the flag if +something should be UTF-8. Treat the flag as part of the PV, even though it's not - if you pass on the PV to somewhere, pass on the flag too. =item * -If a string is UTF8, B use C to get at the value, +If a string is UTF-8, B use C to get at the value, unless C in which case you can use C<*s>. =item * -When writing a character C to a UTF8 string, B use +When writing a character C to a UTF-8 string, B use C, unless C in which case you can use C<*s = uv>. =item * -Mixing UTF8 and non-UTF8 strings is tricky. Use C to get -a new string which is UTF8 encoded. There are tricks you can use to -delay deciding whether you need to use a UTF8 string until you get to a +Mixing UTF-8 and non-UTF-8 strings is tricky. Use C to get +a new string which is UTF-8 encoded. There are tricks you can use to +delay deciding whether you need to use a UTF-8 string until you get to a high character - C is one of those. =back
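The rule about writing a character to a UTF-8 string (one byte when the value is invariant, several bytes otherwise) can be illustrated with a small standalone encoder. This is a sketch under stated assumptions, not the Perl API function: encode_utf8 is a hypothetical name, it covers only code points up to 0x10FFFF, and it does not reject surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Perl API): write code point cp
 * as UTF-8 into dst, which must have room for 4 bytes. Returns the
 * number of bytes written, or 0 if cp is above 0x10FFFF. */
static size_t encode_utf8(unsigned long cp, unsigned char *dst)
{
    assert(dst != NULL);
    if (cp < 0x80) {                    /* one byte, same as ASCII */
        dst[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx plus three 10xxxxxx */
        dst[0] = (unsigned char)(0xF0 | (cp >> 18));
        dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* outside the range this sketch covers */
}
```

For character 200 (0xC8), the example used earlier, this produces the two bytes 0xC3 0x88, while any value below 0x80 is stored unchanged in a single byte.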