X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlguts.pod;h=c77abb66bc200a1a270fd51f84f1d1022b2804d1;hb=5145b83ccb40455ee1421b25f5971eb7e2a87afc;hp=a39c8f9cbc9446fa1a92ddd45217508b349a377b;hpb=7b52221de7fe1243a09b164fd1b22d32ce600210;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index a39c8f9..c77abb6 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -814,12 +814,12 @@ in the stash C<Baz::> in C<Bar::>'s stash.
 
 To get the stash pointer for a particular package, use the function:
 
-    HV*  gv_stashpv(const char* name, I32 create)
-    HV*  gv_stashsv(SV*, I32 create)
+    HV*  gv_stashpv(const char* name, I32 flags)
+    HV*  gv_stashsv(SV*, I32 flags)
 
 The first function takes a literal string, the second uses the string stored
 in the SV.  Remember that a stash is just a hash table, so you get back an
-C<HV*>.  The C<create> flag will create a new package if it is set.
+C<HV*>.  The C<flags> flag will create a new package if it is set to GV_ADD.
 
 The name that C<gv_stash*v> wants is the name of the package whose symbol table
 you want.  The default package is called C<main>.  If you have multiply nested
@@ -901,9 +901,9 @@ linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.
         U16         mg_private;
         char        mg_type;
         U8          mg_flags;
+        I32         mg_len;
         SV*         mg_obj;
         char*       mg_ptr;
-        I32         mg_len;
     };
 
 Note this is current as of patchlevel 0, and could change at any time.
@@ -1187,7 +1187,7 @@ to do this.
     CODE:
         hash = newHV();
         tie = newRV_noinc((SV*)newHV());
-        stash = gv_stashpv("MyTie", TRUE);
+        stash = gv_stashpv("MyTie", GV_ADD);
         sv_bless(tie, stash);
         hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
         RETVAL = newRV_noinc(hash);
@@ -1914,6 +1914,12 @@ please see F<miniperlmain.c> for usage details.  You may also need
 to use C<dVAR> in your coding to "declare the global variables"
 when you are using them.  dTHX does this for you automatically.
 
+To see whether you have non-const data you can use a BSD-compatible C<nm>:
+
+  nm libperl.a | grep -v ' [TURtr] '
+
+If this displays any C<D> or C<d> symbols, you have non-const data.
+
 For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
 doesn't actually hide all symbols inside a big global struct: some
 PerlIO_xxx vtables are left visible.  The PERL_GLOBAL_STRUCT_PRIVATE
@@ -2050,9 +2056,9 @@ your Foo.xs:
         #include "perl.h"
         #include "XSUB.h"
 
-        static my_private_function(int arg1, int arg2);
+        STATIC void my_private_function(int arg1, int arg2);
 
-        static SV *
+        STATIC void
         my_private_function(int arg1, int arg2)
         {
             dTHX;       /* fetch context */
@@ -2090,9 +2096,9 @@ the Perl guts:
         #include "XSUB.h"
 
         /* pTHX_ only needed for functions that call Perl API */
-        static my_private_function(pTHX_ int arg1, int arg2);
+        STATIC void my_private_function(pTHX_ int arg1, int arg2);
 
-        static SV *
+        STATIC void
         my_private_function(pTHX_ int arg1, int arg2)
         {
             /* dTHX; not needed here, because THX is an argument */
@@ -2425,8 +2431,8 @@ To fix this, some people formed Unicode, Inc. and
 produced a new character set containing all the characters you can
 possibly think of and more. There are several ways of representing these
 characters, and the one Perl uses is called UTF-8. UTF-8 uses
-a variable number of bytes to represent a character, instead of just
-one. You can learn more about Unicode at http://www.unicode.org/
+a variable number of bytes to represent a character. You can learn more
+about Unicode and Perl's Unicode model in L<perlunicode>.
 
 =head2 How can I recognise a UTF-8 string?
 
@@ -2437,16 +2443,17 @@ C<v196.172>. Unfortunately, the non-Unicode string C<chr(196).chr(172)>
 has that byte sequence as well. So you can't tell just by looking - this
 is what makes Unicode input an interesting problem.
 
-The API function C<is_utf8_string> can help; it'll tell you if a string
-contains only valid UTF-8 characters. However, it can't do the work for
-you. On a character-by-character basis, C<is_utf8_char> will tell you
-whether the current character in a string is valid UTF-8.
+In general, you either have to know what you're dealing with, or you
+have to guess.  The API function C<is_utf8_string> can help; it'll tell
+you if a string contains only valid UTF-8 characters. However, it can't
+do the work for you. On a character-by-character basis, C<is_utf8_char>
+will tell you whether the current character in a string is valid UTF-8. 
 
 =head2 How does UTF-8 represent Unicode characters?
 
 As mentioned above, UTF-8 uses a variable number of bytes to store a
-character. Characters with values 1...128 are stored in one byte, just
-like good ol' ASCII. Character 129 is stored as C<v194.129>; this
+character. Characters with values 0...127 are stored in one byte, just
+like good ol' ASCII. Character 128 is stored as C<v194.128>; this
 continues up to character 191, which is C<v194.191>. Now we've run out of
 bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
 so it goes on, moving to three bytes at character 2048.
@@ -2503,9 +2510,11 @@ So don't do that!
 =head2 How does Perl store UTF-8 strings?
 
 Currently, Perl deals with Unicode strings and non-Unicode strings
-slightly differently. If a string has been identified as being UTF-8
-encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
-manipulate this flag with the following macros:
+slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
+string is internally encoded as UTF-8. Without it, the byte value is the
+codepoint number and vice versa (in other words, the string is encoded
+as iso-8859-1). You can check and manipulate this flag with the
+following macros:
 
     SvUTF8(sv)
     SvUTF8_on(sv)
@@ -2517,7 +2526,7 @@ C<length>, C<substr> and other string handling operations will have
 undesirable results.
 
 The problem comes when you have, for instance, a string that isn't
-flagged is UTF-8, and contains a byte sequence that could be UTF-8 -
+flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
 especially when combining non-UTF-8 and UTF-8 strings.
 
 Never forget that the C<SVf_UTF8> flag is separate to the PV value; you
@@ -2535,7 +2544,7 @@ manipulating SVs. More specifically, you cannot expect to do this:
 
 The C<char*> string does not tell you the whole story, and you can't
 copy or reconstruct an SV just by copying the string value. Check if the
-old SV has the UTF-8 flag set, and act accordingly:
+old SV has the UTF8 flag set, and act accordingly:
 
     p = SvPV(sv, len);
     frobnicate(p);
@@ -2548,14 +2557,14 @@ not it's dealing with UTF-8 data, so that it can handle the string
 appropriately.
 
 Since just passing an SV to an XS function and copying the data of
-the SV is not enough to copy the UTF-8 flags, even less right is just
+the SV is not enough to copy the UTF8 flags, even less right is just
 passing a C<char *> to an XS function.
 
 =head2 How do I convert a string to UTF-8?
 
-If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
-to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
-way to do this is:
+If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
+one of the strings to UTF-8. If you've got an SV, the easiest way to do
+this is:
 
     sv_utf8_upgrade(sv);
 
@@ -2566,7 +2575,7 @@ However, you must not do this, for example:
 
 If you do this in a binary operator, you will actually change one of the
 strings that came into the operator, and, while it shouldn't be noticeable
-by the end user, it can cause problems.
+by the end user, it can cause problems in deficient code.
 
 Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
 string argument. This is useful for having the data available for