len);>. It might work with your compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:
- SV *s;
- STRLEN len;
- char * ptr;
- ptr = SvPV(s, len);
- foo(ptr, len);
+ SV *s;
+ STRLEN len;
+ char * ptr;
+ ptr = SvPV(s, len);
+ foo(ptr, len);
If you want to know if the scalar value is TRUE, you can use:
SvOK(SV*)
The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
-Its address can be used whenever an C<SV*> is needed.
-However, you have to be careful when using C<&PL_sv_undef> as a value in AVs
-or HVs (see L<AVs, HVs and undefined values>).
+
+Its address can be used whenever an C<SV*> is needed. Make sure that
+you don't try to compare an arbitrary sv with C<&PL_sv_undef>. For example,
+when interfacing with Perl code, such a comparison works correctly for:
+
+ foo(undef);
+
+But it won't work when called as:
+
+ $x = undef;
+ foo($x);
+
+Here C<$x> is a separate sv that merely holds an undefined value, so its
+address is not C<&PL_sv_undef>. So, to repeat: always use SvOK() to check
+whether an sv is defined.
+
+Also, you have to be careful when using C<&PL_sv_undef> as a value in
+AVs or HVs (see L<AVs, HVs and undefined values>).
There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
-numeric conversion has occured and precision has been lost: only the
+numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
To get the stash pointer for a particular package, use the function:
- HV* gv_stashpv(const char* name, I32 create)
- HV* gv_stashsv(SV*, I32 create)
+ HV* gv_stashpv(const char* name, I32 flags)
+ HV* gv_stashsv(SV*, I32 flags)
The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
-C<HV*>. The C<create> flag will create a new package if it is set.
+C<HV*>. The C<flags> argument will create a new package if it is set to C<GV_ADD>.
The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
U16 mg_private;
char mg_type;
U8 mg_flags;
+ I32 mg_len;
SV* mg_obj;
char* mg_ptr;
- I32 mg_len;
};
Note this is current as of patchlevel 0, and could change at any time.
The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
-C<mg_len> field and if C<name> is non-null and C<namlen> E<gt>= 0 a malloc'd
-copy of the name is stored in C<mg_ptr> field.
+C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
+C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
+whether C<namlen> is greater than zero or equal to zero respectively. As a
+special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
+to contain an C<SV*> and is stored as-is with its REFCNT incremented.
The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer,
then C<obj> is merely stored, without the reference count being incremented.
+See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
+to an SV.
+
There is also a function to add magic to an C<HV>:
void hv_magic(HV *hv, GV *gv, int how);
"Magic Virtual Table" to handle the various operations that might be
applied to that variable.
-The C<MGVTBL> has five pointers to the following routine types:
+The C<MGVTBL> has five (or sometimes eight) pointers to the following
+routine types:
int (*svt_get)(SV* sv, MAGIC* mg);
int (*svt_set)(SV* sv, MAGIC* mg);
int (*svt_clear)(SV* sv, MAGIC* mg);
int (*svt_free)(SV* sv, MAGIC* mg);
+ int (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv, const char *name, I32 namlen);
+ int (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param);
+ int (*svt_local)(SV *nsv, MAGIC *mg);
+
+
This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 19 types (or 21 with overloading turned on). These different
structures contain pointers to various routines that perform additional
svt_get Do something before the value of the SV is retrieved.
svt_set Do something after the SV is assigned a value.
svt_len Report on the SV's length.
- svt_clear Clear something the SV represents.
+ svt_clear Clear something the SV represents.
svt_free Free any extra storage associated with the SV.
+ svt_copy copy tied variable magic to a tied element
+ svt_dup duplicate a magic structure during thread cloning
+ svt_local copy magic to local value during 'local'
+
For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.
+The last three slots are a recent addition, and for source code
+compatibility they are only checked for if one of the three flags
+MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. This means that most
+code can continue declaring a vtable as a 5-element value. These three are
+currently used exclusively by the threading code, and are highly subject
+to change.
+
The current kinds of Magic Virtual Tables are:
mg_type
- (old-style char and macro) MGVTBL Type of magic
- -------------------------- ------ ----------------------------
- \0 PERL_MAGIC_sv vtbl_sv Special scalar variable
- A PERL_MAGIC_overload vtbl_amagic %OVERLOAD hash
+ (old-style char and macro) MGVTBL Type of magic
+ -------------------------- ------ -------------
+ \0 PERL_MAGIC_sv vtbl_sv Special scalar variable
+ A PERL_MAGIC_overload vtbl_amagic %OVERLOAD hash
a PERL_MAGIC_overload_elem vtbl_amagicelem %OVERLOAD hash element
- c PERL_MAGIC_overload_table (none) Holds overload table (AMT)
- on stash
- B PERL_MAGIC_bm vtbl_bm Boyer-Moore (fast string search)
- D PERL_MAGIC_regdata vtbl_regdata Regex match position data
- (@+ and @- vars)
- d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data
- element
- E PERL_MAGIC_env vtbl_env %ENV hash
- e PERL_MAGIC_envelem vtbl_envelem %ENV hash element
- f PERL_MAGIC_fm vtbl_fm Formline ('compiled' format)
- g PERL_MAGIC_regex_global vtbl_mglob m//g target / study()ed string
- I PERL_MAGIC_isa vtbl_isa @ISA array
- i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element
- k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue
- L PERL_MAGIC_dbfile (none) Debugger %_<filename
- l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename element
- m PERL_MAGIC_mutex vtbl_mutex ???
- o PERL_MAGIC_collxfrm vtbl_collxfrm Locale collate transformation
- P PERL_MAGIC_tied vtbl_pack Tied array or hash
- p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element
- q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle
- r PERL_MAGIC_qr vtbl_qr precompiled qr// regex
- S PERL_MAGIC_sig vtbl_sig %SIG hash
- s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element
- t PERL_MAGIC_taint vtbl_taint Taintedness
- U PERL_MAGIC_uvar vtbl_uvar Available for use by extensions
- v PERL_MAGIC_vec vtbl_vec vec() lvalue
- V PERL_MAGIC_vstring (none) v-string scalars
- w PERL_MAGIC_utf8 vtbl_utf8 UTF-8 length+offset cache
- x PERL_MAGIC_substr vtbl_substr substr() lvalue
- y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator
- variable / smart parameter
- vivification
- * PERL_MAGIC_glob vtbl_glob GV (typeglob)
- # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary)
- . PERL_MAGIC_pos vtbl_pos pos() lvalue
- < PERL_MAGIC_backref vtbl_backref ???
- ~ PERL_MAGIC_ext (none) Available for use by extensions
+ c PERL_MAGIC_overload_table (none) Holds overload table (AMT)
+ on stash
+ B PERL_MAGIC_bm vtbl_bm Boyer-Moore (fast string search)
+ D PERL_MAGIC_regdata vtbl_regdata Regex match position data
+ (@+ and @- vars)
+ d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data
+ element
+ E PERL_MAGIC_env vtbl_env %ENV hash
+ e PERL_MAGIC_envelem vtbl_envelem %ENV hash element
+ f PERL_MAGIC_fm vtbl_fm Formline ('compiled' format)
+ g PERL_MAGIC_regex_global vtbl_mglob m//g target / study()ed string
+ H PERL_MAGIC_hints vtbl_sig %^H hash
+ h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element
+ I PERL_MAGIC_isa vtbl_isa @ISA array
+ i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element
+ k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue
+ L PERL_MAGIC_dbfile (none) Debugger %_<filename
+ l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename element
+ o PERL_MAGIC_collxfrm vtbl_collxfrm Locale collate transformation
+ P PERL_MAGIC_tied vtbl_pack Tied array or hash
+ p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element
+ q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle
+ r PERL_MAGIC_qr vtbl_qr precompiled qr// regex
+ S PERL_MAGIC_sig vtbl_sig %SIG hash
+ s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element
+ t PERL_MAGIC_taint vtbl_taint Taintedness
+ U PERL_MAGIC_uvar vtbl_uvar Available for use by extensions
+ v PERL_MAGIC_vec vtbl_vec vec() lvalue
+ V PERL_MAGIC_vstring (none) v-string scalars
+ w PERL_MAGIC_utf8 vtbl_utf8 UTF-8 length+offset cache
+ x PERL_MAGIC_substr vtbl_substr substr() lvalue
+ y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator
+ variable / smart parameter
+ vivification
+ # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary)
+ . PERL_MAGIC_pos vtbl_pos pos() lvalue
+ < PERL_MAGIC_backref vtbl_backref back pointer to a weak ref
+ ~ PERL_MAGIC_ext (none) Available for use by extensions
+ : PERL_MAGIC_symtab (none) hash used as symbol table
+ % PERL_MAGIC_rhash (none) hash used as restricted hash
+ @ PERL_MAGIC_arylen_p vtbl_arylen_p pointer to $#a from @a
+
When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
uf.uf_index = 0;
sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
+Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.
+
+For hashes there is a specialized hook that gives control over hash
+keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic
+if the "set" function in the C<ufuncs> structure is NULL. The hook
+is activated whenever the hash is accessed with a key specified as
+an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
+C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string
+through the functions without the C<..._ent> suffix circumvents the
+hook. See L<Hash::Util::Fieldhash/Guts> for a detailed description.
+
Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on
CODE:
hash = newHV();
tie = newRV_noinc((SV*)newHV());
- stash = gv_stashpv("MyTie", TRUE);
+ stash = gv_stashpv("MyTie", GV_ADD);
sv_bless(tie, stash);
hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
RETVAL = newRV_noinc(hash);
The following three macros are used to initially allocate memory:
- New(x, pointer, number, type);
- Newc(x, pointer, number, type, cast);
- Newz(x, pointer, number, type);
-
-The first argument C<x> was a "magic cookie" that was used to keep track
-of who called the macro, to help when debugging memory problems. However,
-the current code makes no use of this feature (most Perl developers now
-use run-time memory checkers), so this argument can be any number.
+ Newx(pointer, number, type);
+ Newxc(pointer, number, type, cast);
+ Newxz(pointer, number, type);
-The second argument C<pointer> should be the name of a variable that will
+The first argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.
-The third and fourth arguments C<number> and C<type> specify how many of
+The second and third arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
-C<type> is passed to C<sizeof>. The final argument to C<Newc>, C<cast>,
+C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.
-Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
+Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
to zero out all the newly allocated memory.
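The argument order described above can be modelled outside of Perl. The following is only a sketch: the C<MODEL_NEWX>, C<MODEL_NEWXC> and C<MODEL_NEWXZ> macros and the C<model_alloc_demo> function are hypothetical stand-ins built on C<malloc> and C<calloc> so that the snippet compiles without the Perl headers. They are not the definitions Perl uses; they only mirror the (pointer, number, type) calling convention.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins that mirror the argument order of Newx, Newxc, Newxz. */
#define MODEL_NEWX(ptr, n, type) \
    ((ptr) = (type *)malloc((size_t)(n) * sizeof(type)))
#define MODEL_NEWXC(ptr, n, type, cast) \
    ((ptr) = (cast *)malloc((size_t)(n) * sizeof(type)))
#define MODEL_NEWXZ(ptr, n, type) \
    ((ptr) = (type *)calloc((size_t)(n), sizeof(type)))

/* Returns 1 if all three allocations behave as described, 0 otherwise. */
int model_alloc_demo(void)
{
    int *plain = NULL;
    int *zeroed = NULL;
    char *raw = NULL;
    size_t i;
    int ok = 1;

    MODEL_NEWX(plain, 8, int);        /* room for 8 ints, contents undefined */
    MODEL_NEWXZ(zeroed, 8, int);      /* room for 8 ints, all zero */
    MODEL_NEWXC(raw, 8, int, char);   /* room for 8 ints, pointer typed char* */

    if (plain == NULL || zeroed == NULL || raw == NULL) {
        ok = 0;
    } else {
        plain[7] = 42;                        /* last element is usable */
        memset(raw, 0xFF, 8 * sizeof(int));   /* whole block is writable */
        for (i = 0; i < 8; i++)
            if (zeroed[i] != 0)
                ok = 0;
        if (plain[7] != 42)
            ok = 0;
    }

    free(raw);
    free(zeroed);
    free(plain);
    return ok;
}
```

In real XS code the memory would come from the Perl macros and be released with the matching Perl deallocation macro rather than C<free>.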
=head3 Reallocation
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.
-Two macros control the major Perl build flavors: MULTIPLICITY and
-USE_5005THREADS. The MULTIPLICITY build has a C structure
-that packages all the interpreter state, and there is a similar thread-specific
-data structure under USE_5005THREADS. In both cases,
-PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
-support for passing in a "hidden" first argument that represents all three
-data structures.
+One macro controls the major Perl build flavor: MULTIPLICITY. The
+MULTIPLICITY build has a C structure that packages all the interpreter
+state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
+normally defined, and enables the support for passing in a "hidden" first
+argument that represents all three data structures. MULTIPLICITY makes
+multi-threaded perls possible (with the ithreads threading model, related
+to the macro USE_ITHREADS).
+
+Two other "encapsulation" macros are PERL_GLOBAL_STRUCT and
+PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the
+former turns on MULTIPLICITY). PERL_GLOBAL_STRUCT causes all the
+internal variables of Perl to be wrapped inside a single global struct,
+struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or
+the function Perl_GetVars(). PERL_GLOBAL_STRUCT_PRIVATE goes
+one step further: there is still a single struct (allocated in main()
+either from heap or from stack) but there are no global data symbols
+pointing to it. In either case the global struct should be initialised
+as the very first thing in main() using Perl_init_global_struct() and
+correspondingly torn down after perl_free() using Perl_free_global_struct();
+please see F<miniperlmain.c> for usage details. You may also need
+to use C<dVAR> in your coding to "declare the global variables"
+when you are using them. dTHX does this for you automatically.
+
+To see whether you have non-const data you can use a BSD-compatible C<nm>:
+
+ nm libperl.a | grep -v ' [TURtr] '
+
+If this displays any C<D> or C<d> symbols, you have non-const data.
+
+For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
+doesn't actually hide all symbols inside a big global struct: some
+PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE
+then hides everything (see how the PERLIO_FUNCS_DECL is used).
All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
#include "perl.h"
#include "XSUB.h"
- static my_private_function(int arg1, int arg2);
+ STATIC void my_private_function(int arg1, int arg2);
- static SV *
+ STATIC void
my_private_function(int arg1, int arg2)
{
dTHX; /* fetch context */
#include "XSUB.h"
/* pTHX_ only needed for functions that call Perl API */
- static my_private_function(pTHX_ int arg1, int arg2);
+ STATIC void my_private_function(pTHX_ int arg1, int arg2);
- static SV *
+ STATIC void
my_private_function(pTHX_ int arg1, int arg2)
{
/* dTHX; not needed here, because THX is an argument */
macro with the underscore for functions that take explicit arguments,
or the form without the argument for functions with no explicit arguments.
+If Perl is compiled with C<-DPERL_GLOBAL_STRUCT>, the C<dVAR>
+definition is needed in any function that accesses the Perl global
+variables (see F<perlvars.h> or F<globvar.sym>) and does not use
+C<dTHX> (C<dTHX> includes C<dVAR> if necessary). The need for
+C<dVAR> shows up only with that compile-time define, because
+otherwise the Perl global variables are visible as-is.
+
=head2 Should I do anything special if I call perl from multiple threads?
If you create interpreters in one thread and then proceed to call them in
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
-PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS
-and USE_5005THREADS on Windows (see inside iperlsys.h).
+PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
+Windows.
This allows the ability to provide an extra pointer (called the "host"
environment) for all the system calls. This makes it possible for
=item A
-This function is a part of the public API.
+This function is a part of the public API. All such functions should also
+have 'd'; very few do not.
=item p
-This function has a C<Perl_> prefix; ie, it is defined as C<Perl_av_fetch>
+This function has a C<Perl_> prefix; i.e. it is defined as
+C<Perl_av_fetch>.
=item d
This function has documentation using the C<apidoc> feature which we'll
-look at in a second.
+look at in a second. Some functions have 'd' but not 'A'; docs are good.
=back
=item s
-This is a static function and is defined as C<S_whatever>, and usually
-called within the sources as C<whatever(...)>.
+This is a static function and is defined as C<STATIC S_whatever>, and
+usually called within the sources as C<whatever(...)>.
=item n
-This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See
+This does not need an interpreter context, so the definition has no
+C<pTHX>, and it follows that callers don't use C<aTHX>. (See
L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)
=item r
Binary backward compatibility; this function is a macro but also has
a C<Perl_> implementation (which is exported).
+=item others
+
+See the comments at the top of C<embed.fnc> for others.
+
=back
If you edit F<embed.pl> or F<embed.fnc>, you will need to run
AV *av = ...;
UV uv = PTR2UV(av);
+=head2 Exception Handling
+
+There are a couple of macros to do very basic exception handling in XS
+modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
+be able to use these macros:
+
+ #define NO_XSLOCKS
+ #include "XSUB.h"
+
+You can use these macros if you call code that may croak, but you need
+to do some cleanup before giving control back to Perl. For example:
+
+ dXCPT; /* set up necessary variables */
+
+ XCPT_TRY_START {
+ code_that_may_croak();
+ } XCPT_TRY_END
+
+ XCPT_CATCH
+ {
+ /* do cleanup here */
+ XCPT_RETHROW;
+ }
+
+Note that you always have to rethrow an exception that has been
+caught. Using these macros, it is not possible to just catch the
+exception and ignore it. If you have to ignore the exception, you
+have to use one of the C<call_*> functions.
+
+The advantage of using the above macros is that you don't have
+to set up an extra function for C<call_*>, and that using these
+macros is faster than using C<call_*>.
+
=head2 Source Documentation
There's an effort going on to document the internal functions and
Please try and supply some documentation if you add functions to the
Perl core.
+=head2 Backwards compatibility
+
+The Perl API changes over time. New functions are added or the interfaces
+of existing functions are changed. The C<Devel::PPPort> module tries to
+provide compatibility code for some of these changes, so XS writers don't
+have to code it themselves when supporting multiple versions of Perl.
+
+C<Devel::PPPort> generates a C header file F<ppport.h> that can also
+be run as a Perl script. To generate F<ppport.h>, run:
+
+ perl -MDevel::PPPort -eDevel::PPPort::WriteFile
+
+Besides checking existing XS code, the script can also be used to retrieve
+compatibility information for various API calls using the C<--api-info>
+command line switch. For example:
+
+ % perl ppport.h --api-info=sv_magicext
+
+For details, see C<perldoc ppport.h>.
+
=head1 Unicode Support
Perl 5.6.0 introduced Unicode support. It's important for porters and XS
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
-a variable number of bytes to represent a character, instead of just
-one. You can learn more about Unicode at http://www.unicode.org/
+a variable number of bytes to represent a character. You can learn more
+about Unicode and Perl's Unicode model in L<perlunicode>.
=head2 How can I recognise a UTF-8 string?
has that byte sequence as well. So you can't tell just by looking - this
is what makes Unicode input an interesting problem.
-The API function C<is_utf8_string> can help; it'll tell you if a string
-contains only valid UTF-8 characters. However, it can't do the work for
-you. On a character-by-character basis, C<is_utf8_char> will tell you
-whether the current character in a string is valid UTF-8.
+In general, you either have to know what you're dealing with, or you
+have to guess. The API function C<is_utf8_string> can help; it'll tell
+you if a string contains only valid UTF-8 characters. However, it can't
+do the work for you. On a character-by-character basis, C<is_utf8_char>
+will tell you whether the current character in a string is valid UTF-8.
=head2 How does UTF-8 represent Unicode characters?
As mentioned above, UTF-8 uses a variable number of bytes to store a
-character. Characters with values 1...128 are stored in one byte, just
-like good ol' ASCII. Character 129 is stored as C<v194.129>; this
+character. Characters with values 0...127 are stored in one byte, just
+like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
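The byte values quoted above can be reproduced with a small stand-alone encoder. This is a minimal sketch, not Perl's own implementation: the function name C<utf8_encode_sketch> is hypothetical, it covers only code points below 0x10000, and it does not reject surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point below 0x10000 as UTF-8 into out[].
 * Returns the number of bytes written, or 0 if cp is out of range.
 * Hypothetical helper for illustration; not part of the Perl API. */
size_t utf8_encode_sketch(uint32_t cp, unsigned char out[3])
{
    if (cp < 0x80) {                       /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 128..2047: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 2048..65535: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;                              /* four-byte forms left out of this sketch */
}
```

With this sketch, character 128 yields the bytes 194, 128; character 191 yields 194, 191; character 192 yields 195, 128; and character 2048 is the first to need three bytes (224, 160, 128).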
=head2 How does Perl store UTF-8 strings?
Currently, Perl deals with Unicode strings and non-Unicode strings
-slightly differently. If a string has been identified as being UTF-8
-encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
-manipulate this flag with the following macros:
+slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
+string is internally encoded as UTF-8. Without it, the byte value is the
+codepoint number and vice versa (in other words, the string is encoded
+as ISO-8859-1). You can check and manipulate this flag with the
+following macros:
SvUTF8(sv)
SvUTF8_on(sv)
undesirable results.
The problem comes when you have, for instance, a string that isn't
-flagged is UTF-8, and contains a byte sequence that could be UTF-8 -
+flagged as UTF-8, and contains a byte sequence that could be UTF-8 -
especially when combining non-UTF-8 and UTF-8 strings.
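The ambiguity can be seen by counting characters in the same buffer with and without the flag. The sketch below is an illustration only: C<model_char_count> and its C<is_utf8> parameter are hypothetical, the parameter stands in for the C<SVf_UTF8> flag that really lives in the SV, and the input is assumed to be well-formed when the flag is set.

```c
#include <stddef.h>

/* Count the characters in buf[0..len).
 * is_utf8 plays the role of the SV's UTF-8 flag (hypothetical parameter).
 * Assumes well-formed UTF-8 when is_utf8 is non-zero. */
size_t model_char_count(const unsigned char *buf, size_t len, int is_utf8)
{
    size_t i;
    size_t chars = 0;

    if (!is_utf8)
        return len;                      /* unflagged: each byte is one character */

    for (i = 0; i < len; i++)
        if ((buf[i] & 0xC0) != 0x80)     /* continuation bytes look like 10xxxxxx */
            chars++;                     /* count only the lead bytes */
    return chars;
}
```

The two bytes 0xC3 0xA9 count as one character (U+00E9) when the flag is set and as two separate characters when it is not, which is why the flag has to travel with the bytes.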
Never forget that the C<SVf_UTF8> flag is separate to the PV value; you
The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
-old SV has the UTF-8 flag set, and act accordingly:
+old SV has the UTF8 flag set, and act accordingly:
p = SvPV(sv, len);
frobnicate(p);
appropriately.
Since just passing an SV to an XS function and copying the data of
-the SV is not enough to copy the UTF-8 flags, even less right is just
+the SV is not enough to copy the UTF8 flags, even less right is just
passing a C<char *> to an XS function.
=head2 How do I convert a string to UTF-8?
-If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary
-to upgrade one of the strings to UTF-8. If you've got an SV, the easiest
-way to do this is:
+If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
+one of the strings to UTF-8. If you've got an SV, the easiest way to do
+this is:
sv_utf8_upgrade(sv);
If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
-by the end user, it can cause problems.
+by the end user, it can cause problems in deficient code.
Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
-C<gvsv, gvsv, add>.)
+C<gvsv, gvsv, add>.)
This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
place in the C<PL_custom_op_names> and C<PL_custom_op_descs> hashes.
Forthcoming versions of C<B::Generate> (version 1.0 and above) should
-directly support the creation of custom ops by name; C<Opcodes::Custom>
-will provide functions which make it trivial to "register" custom ops to
-the Perl interpreter.
+directly support the creation of custom ops by name.
=head1 AUTHORS