X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlguts.pod;h=c77abb66bc200a1a270fd51f84f1d1022b2804d1;hb=5145b83ccb40455ee1421b25f5971eb7e2a87afc;hp=a39c8f9cbc9446fa1a92ddd45217508b349a377b;hpb=7b52221de7fe1243a09b164fd1b22d32ce600210;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index a39c8f9..c77abb6 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -814,12 +814,12 @@ in the stash C in C's stash.
 
 To get the stash pointer for a particular package, use the function:
 
-    HV* gv_stashpv(const char* name, I32 create)
-    HV* gv_stashsv(SV*, I32 create)
+    HV* gv_stashpv(const char* name, I32 flags)
+    HV* gv_stashsv(SV*, I32 flags)
 
 The first function takes a literal string, the second uses the string stored
 in the SV. Remember that a stash is just a hash table, so you get back an
-C. The C flag will create a new package if it is set.
+C. The C flag will create a new package if it is set to GV_ADD.
 
 The name that C wants is the name of the package whose symbol table
 you want. The default package is called C. If you have multiply nested
@@ -901,9 +901,9 @@ linked list of C's, typedef'ed to C.
     U16 mg_private;
     char mg_type;
     U8 mg_flags;
+    I32 mg_len;
     SV* mg_obj;
     char* mg_ptr;
-    I32 mg_len;
 };
 
 Note this is current as of patchlevel 0, and could change at any time.
@@ -1187,7 +1187,7 @@ to do this.
     CODE:
         hash = newHV();
         tie = newRV_noinc((SV*)newHV());
-        stash = gv_stashpv("MyTie", TRUE);
+        stash = gv_stashpv("MyTie", GV_ADD);
         sv_bless(tie, stash);
         hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
         RETVAL = newRV_noinc(hash);
@@ -1914,6 +1914,12 @@ please see F for usage details.
 You may also need to use C in your coding to "declare the global
 variables" when you are using them. dTHX does this for you automatically.
 
+To see whether you have non-const data you can use a BSD-compatible C:
+
+  nm libperl.a | grep -v ' [TURtr] '
+
+If this displays any C or C symbols, you have non-const data.
+
 For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
 doesn't actually hide all symbols inside a big global struct: some
 PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE
@@ -2050,9 +2056,9 @@ your Foo.xs:
         #include "perl.h"
         #include "XSUB.h"
 
-        static my_private_function(int arg1, int arg2);
+        STATIC void my_private_function(int arg1, int arg2);
 
-        static SV *
+        STATIC void
         my_private_function(int arg1, int arg2)
         {
             dTHX; /* fetch context */
@@ -2090,9 +2096,9 @@ the Perl guts:
         #include "XSUB.h"
 
         /* pTHX_ only needed for functions that call Perl API */
-        static my_private_function(pTHX_ int arg1, int arg2);
+        STATIC void my_private_function(pTHX_ int arg1, int arg2);
 
-        static SV *
+        STATIC void
         my_private_function(pTHX_ int arg1, int arg2)
         {
             /* dTHX; not needed here, because THX is an argument */
@@ -2425,8 +2431,8 @@ To fix this, some people formed Unicode, Inc. and
 produced a new character set containing all the characters you can
 possibly think of and more. There are several ways of representing these
 characters, and the one Perl uses is called UTF-8. UTF-8 uses
-a variable number of bytes to represent a character, instead of just
-one. You can learn more about Unicode at http://www.unicode.org/
+a variable number of bytes to represent a character. You can learn more
+about Unicode and Perl's Unicode model in L.
 
 =head2 How can I recognise a UTF-8 string?
 
@@ -2437,16 +2443,17 @@ C. Unfortunately, the non-Unicode string C has that byte
 sequence as well. So you can't tell just by looking - this is what makes
 Unicode input an interesting problem.
 
-The API function C can help; it'll tell you if a string
-contains only valid UTF-8 characters. However, it can't do the work for
-you. On a character-by-character basis, C will tell you
-whether the current character in a string is valid UTF-8.
+In general, you either have to know what you're dealing with, or you
+have to guess. The API function C can help; it'll tell
+you if a string contains only valid UTF-8 characters. However, it can't
+do the work for you. On a character-by-character basis, C
+will tell you whether the current character in a string is valid UTF-8.
 
 =head2 How does UTF-8 represent Unicode characters?
 
 As mentioned above, UTF-8 uses a variable number of bytes to store a
-character. Characters with values 1...128 are stored in one byte, just
-like good ol' ASCII. Character 129 is stored as C; this
+character. Characters with values 0...127 are stored in one byte, just
+like good ol' ASCII. Character 128 is stored as C; this
 continues up to character 191, which is C. Now we've run out of
 bits (191 is binary C<10111111>) so we move on; 192 is C. And
 so it goes on, moving to three bytes at character 2048.
@@ -2503,9 +2510,11 @@ So don't do that!
 =head2 How does Perl store UTF-8 strings?
 
 Currently, Perl deals with Unicode strings and non-Unicode strings
-slightly differently. If a string has been identified as being UTF-8
-encoded, Perl will set a flag in the SV, C. You can check and
-manipulate this flag with the following macros:
+slightly differently. A flag in the SV, C, indicates that the
+string is internally encoded as UTF-8. Without it, the byte value is the
+codepoint number and vice versa (in other words, the string is encoded
+as iso-8859-1). You can check and manipulate this flag with the
+following macros:
 
     SvUTF8(sv)
     SvUTF8_on(sv)
@@ -2517,7 +2526,7 @@ C, C and other string handling operations will have
 undesirable results.
 
 The problem comes when you have, for instance, a string that isn't
-flagged is UTF-8, and contains a byte sequence that could be UTF-8 -
+flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
 especially when combining non-UTF-8 and UTF-8 strings.
 
 Never forget that the C flag is separate to the PV value; you
@@ -2535,7 +2544,7 @@ manipulating SVs. More specifically, you cannot expect to do this:
 
 The C string does not tell you the whole story, and you can't
 copy or reconstruct an SV just by copying the string value. Check if the
-old SV has the UTF-8 flag set, and act accordingly:
+old SV has the UTF8 flag set, and act accordingly:
 
     p = SvPV(sv, len);
     frobnicate(p);
@@ -2548,14 +2557,14 @@ not it's dealing with UTF-8 data, so that it can handle the string
 appropriately.
 
 Since just passing an SV to an XS function and copying the data of
-the SV is not enough to copy the UTF-8 flags, even less right is just
+the SV is not enough to copy the UTF8 flags, even less right is just
 passing a C to an XS function.
 
 =head2 How do I convert a string to UTF-8?
 
-If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
-to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
-way to do this is:
+If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
+one of the strings to UTF-8. If you've got an SV, the easiest way to do
+this is:
 
     sv_utf8_upgrade(sv);
 
@@ -2566,7 +2575,7 @@ However, you must not do this, for example:
 
 If you do this in a binary operator, you will actually change one of the
 strings that came into the operator, and, while it shouldn't be noticeable
-by the end user, it can cause problems.
+by the end user, it can cause problems in deficient code.
 
 Instead, C will give you a UTF-8-encoded B of its
 string argument. This is useful for having the data available for