Commit | Line | Data |
a0d0e21e |
1 | =head1 NAME |
2 | |
954c1994 |
3 | perlguts - Introduction to the Perl API |
a0d0e21e |
4 | |
5 | =head1 DESCRIPTION |
6 | |
954c1994 |
7 | This document attempts to describe how to use the Perl API, and also provides |
8 | some info on the basic workings of the Perl core. It is far from complete |
9 | and probably contains many errors. Please refer any questions or |
10 | comments to the author below. |
a0d0e21e |
11 | |
0a753a76 |
12 | =head1 Variables |
13 | |
5f05dabc |
14 | =head2 Datatypes |
a0d0e21e |
15 | |
16 | Perl has three typedefs that handle Perl's three main data types: |
17 | |
18 | SV Scalar Value |
19 | AV Array Value |
20 | HV Hash Value |
21 | |
d1b91892 |
22 | Each typedef has specific routines that manipulate the various data types. |
a0d0e21e |
23 | |
24 | =head2 What is an "IV"? |
25 | |
954c1994 |
26 | Perl uses a special typedef IV which is a simple signed integer type that is |
5f05dabc |
27 | guaranteed to be large enough to hold a pointer (as well as an integer). |
954c1994 |
28 | Additionally, there is the UV, which is simply an unsigned IV. |
a0d0e21e |
29 | |
d1b91892 |
30 | Perl also uses two special typedefs, I32 and I16, which will always be at |
954c1994 |
31 | least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16, |
32 | as well.) |
a0d0e21e |
33 | |
54310121 |
34 | =head2 Working with SVs |
a0d0e21e |
35 | |
36 | An SV can be created and loaded with one command. There are four types of |
37 | values that can be loaded: an integer value (IV), a double (NV), a string |
38 | (PV), and another scalar (SV). |
39 | |
9da1e3b5 |
40 | The six routines are: |
a0d0e21e |
41 | |
42 | SV* newSViv(IV); |
43 | SV* newSVnv(double); |
08105a92 |
44 | SV* newSVpv(const char*, int); |
45 | SV* newSVpvn(const char*, int); |
46fc3d4c |
46 | SV* newSVpvf(const char*, ...); |
a0d0e21e |
47 | SV* newSVsv(SV*); |
48 | |
deb3007b |
49 | To change the value of an *already-existing* SV, there are eight routines: |
a0d0e21e |
50 | |
51 | void sv_setiv(SV*, IV); |
deb3007b |
52 | void sv_setuv(SV*, UV); |
a0d0e21e |
53 | void sv_setnv(SV*, double); |
08105a92 |
54 | void sv_setpv(SV*, const char*); |
55 | void sv_setpvn(SV*, const char*, int); |
46fc3d4c |
56 | void sv_setpvf(SV*, const char*, ...); |
9abd00ed |
57 | void sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *); |
a0d0e21e |
58 | void sv_setsv(SV*, SV*); |
59 | |
60 | Notice that you can choose to specify the length of the string to be |
9da1e3b5 |
61 | assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may |
62 | allow Perl to calculate the length by using C<sv_setpv> or by specifying |
63 | 0 as the second argument to C<newSVpv>. Be warned, though, that Perl will |
64 | determine the string's length by using C<strlen>, which depends on the |
9abd00ed |
65 | string terminating with a NUL character. |
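The difference between an explicit length and a C<strlen>-derived length can be seen without any Perl headers. The following is a minimal plain-C sketch, not Perl API code: C<toy_length_seen_by_strlen>, C<toy_payload> and C<toy_payload_len> are hypothetical names invented for this illustration. It only shows why a buffer with an embedded NUL should be passed with its length rather than measured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Perl API: the length that a
 * strlen-based routine would use for this buffer. */
size_t toy_length_seen_by_strlen(const char *buf)
{
    return strlen(buf);
}

/* Five bytes of payload with a NUL in the middle, followed by an explicit
 * terminating NUL so that strlen is safe to call on the array. */
const char toy_payload[] = { 'a', 'b', '\0', 'c', 'd', '\0' };

/* The real payload length, excluding the final terminator. */
const size_t toy_payload_len = sizeof toy_payload - 1;
```

Here C<strlen> stops at the embedded NUL and reports 2, while the payload is 5 bytes long. Under that assumption, data of this kind would be handed to the length-taking routines named above (C<sv_setpvn>, C<newSVpvn>) together with its real length.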
66 | |
67 | The arguments of C<sv_setpvf> are processed like C<sprintf>, and the |
68 | formatted output becomes the value. |
69 | |
70 | C<sv_setpvfn> is an analogue of C<vsprintf>, but it allows you to specify |
71 | either a pointer to a variable argument list or the address and length of |
72 | an array of SVs. The last argument points to a boolean; on return, if that |
73 | boolean is true, then locale-specific information has been used to format |
c2611fb3 |
74 | the string, and the string's contents are therefore untrustworthy (see |
9abd00ed |
75 | L<perlsec>). This pointer may be NULL if that information is not |
76 | important. Note that this function requires you to specify the length of |
77 | the format. |
78 | |
9da1e3b5 |
79 | The C<sv_set*()> functions are not generic enough to operate on values |
80 | that have "magic". See L<Magic Virtual Tables> later in this document. |
a0d0e21e |
81 | |
a3cb178b |
82 | All SVs that contain strings should be terminated with a NUL character. |
83 | If it is not NUL-terminated there is a risk of |
5f05dabc |
84 | core dumps and corruptions from code which passes the string to C |
85 | functions or system calls which expect a NUL-terminated string. |
86 | Perl's own functions typically add a trailing NUL for this reason. |
87 | Nevertheless, you should be very careful when you pass a string stored |
88 | in an SV to a C function or system call. |
89 | |
a0d0e21e |
90 | To access the actual value that an SV points to, you can use the macros: |
91 | |
92 | SvIV(SV*) |
954c1994 |
93 | SvUV(SV*) |
a0d0e21e |
94 | SvNV(SV*) |
95 | SvPV(SV*, STRLEN len) |
1fa8b10d |
96 | SvPV_nolen(SV*) |
a0d0e21e |
97 | |
954c1994 |
98 | which will automatically coerce the actual scalar type into an IV, UV, double, |
a0d0e21e |
99 | or string. |
100 | |
101 | In the C<SvPV> macro, the length of the string returned is placed into the |
1fa8b10d |
102 | variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do |
103 | not care what the length of the data is, use the C<SvPV_nolen> macro. |
104 | Historically the C<SvPV> macro with the global variable C<PL_na> has been |
105 | used in this case. But that can be quite inefficient because C<PL_na> must |
106 | be accessed in thread-local storage in threaded Perl. In any case, remember |
107 | that Perl allows arbitrary strings of data that may both contain NULs and |
108 | might not be terminated by a NUL. |
a0d0e21e |
109 | |
ce2f5d8f |
110 | Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len), |
111 | len);>. It might work with your compiler, but it won't work for everyone. |
112 | Break this sort of statement up into separate assignments: |
113 | |
b2f5ed49 |
114 | SV *s; |
ce2f5d8f |
115 | STRLEN len; |
116 | char * ptr; |
b2f5ed49 |
117 | ptr = SvPV(s, len); |
ce2f5d8f |
118 | foo(ptr, len); |
119 | |
07fa94a1 |
120 | If you want to know if the scalar value is TRUE, you can use: |
a0d0e21e |
121 | |
122 | SvTRUE(SV*) |
123 | |
124 | Although Perl will automatically grow strings for you, if you need to force |
125 | Perl to allocate more memory for your SV, you can use the macro |
126 | |
127 | SvGROW(SV*, STRLEN newlen) |
128 | |
129 | which will determine if more memory needs to be allocated. If so, it will |
130 | call the function C<sv_grow>. Note that C<SvGROW> can only increase, not |
5f05dabc |
131 | decrease, the allocated memory of an SV and that it does not automatically |
132 | add a byte for a trailing NUL (Perl's own string functions typically do |
8ebc5c01 |
133 | C<SvGROW(sv, len + 1)>). |
a0d0e21e |
134 | |
135 | If you have an SV and want to know what kind of data Perl thinks is stored |
136 | in it, you can use the following macros to check the type of SV you have. |
137 | |
138 | SvIOK(SV*) |
139 | SvNOK(SV*) |
140 | SvPOK(SV*) |
141 | |
142 | You can get and set the current length of the string stored in an SV with |
143 | the following macros: |
144 | |
145 | SvCUR(SV*) |
146 | SvCUR_set(SV*, I32 val) |
147 | |
cb1a09d0 |
148 | You can also get a pointer to the end of the string stored in the SV |
149 | with the macro: |
150 | |
151 | SvEND(SV*) |
152 | |
153 | But note that these last three macros are valid only if C<SvPOK()> is true. |
a0d0e21e |
154 | |
d1b91892 |
155 | If you want to append something to the end of the string stored in an C<SV*>, |
156 | you can use the following functions: |
157 | |
08105a92 |
158 | void sv_catpv(SV*, const char*); |
e65f3abd |
159 | void sv_catpvn(SV*, const char*, STRLEN); |
46fc3d4c |
160 | void sv_catpvf(SV*, const char*, ...); |
9abd00ed |
161 | void sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *); |
d1b91892 |
162 | void sv_catsv(SV*, SV*); |
163 | |
164 | The first function calculates the length of the string to be appended by |
165 | using C<strlen>. In the second, you specify the length of the string |
46fc3d4c |
166 | yourself. The third function processes its arguments like C<sprintf> and |
9abd00ed |
167 | appends the formatted output. The fourth function works like C<vsprintf>. |
168 | You can specify the address and length of an array of SVs instead of the |
169 | va_list argument. The fifth function extends the string stored in the first |
170 | SV with the string stored in the second SV. It also forces the second SV |
171 | to be interpreted as a string. |
172 | |
173 | The C<sv_cat*()> functions are not generic enough to operate on values that |
174 | have "magic". See L<Magic Virtual Tables> later in this document. |
d1b91892 |
175 | |
a0d0e21e |
176 | If you know the name of a scalar variable, you can get a pointer to its SV |
177 | by using the following: |
178 | |
4929bf7b |
179 | SV* get_sv("package::varname", FALSE); |
a0d0e21e |
180 | |
181 | This returns NULL if the variable does not exist. |
182 | |
d1b91892 |
183 | If you want to know if this variable (or any other SV) is actually C<defined>, |
a0d0e21e |
184 | you can call: |
185 | |
186 | SvOK(SV*) |
187 | |
9cde0e7f |
188 | The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. Its |
a0d0e21e |
189 | address can be used whenever an C<SV*> is needed. |
190 | |
9cde0e7f |
191 | There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain Boolean |
192 | TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their addresses can |
a0d0e21e |
193 | be used whenever an C<SV*> is needed. |
194 | |
9cde0e7f |
195 | Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>. |
a0d0e21e |
196 | Take this code: |
197 | |
198 | SV* sv = (SV*) 0; |
199 | if (I-am-to-return-a-real-value) { |
200 | sv = sv_2mortal(newSViv(42)); |
201 | } |
202 | sv_setsv(ST(0), sv); |
203 | |
204 | This code tries to return a new SV (which contains the value 42) if it should |
04343c6d |
205 | return a real value, or undef otherwise. Instead it has returned a NULL |
a0d0e21e |
206 | pointer which, somewhere down the line, will cause a segmentation violation, |
9cde0e7f |
207 | bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the first |
5f05dabc |
208 | line and all will be well. |
a0d0e21e |
209 | |
210 | To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this |
3fe9a6f1 |
211 | call is not necessary (see L<Reference Counts and Mortality>). |
a0d0e21e |
212 | |
d1b91892 |
213 | =head2 What's Really Stored in an SV? |
a0d0e21e |
214 | |
215 | Recall that the usual method of determining the type of scalar you have is |
5f05dabc |
216 | to use C<Sv*OK> macros. Because a scalar can be both a number and a string, |
d1b91892 |
217 | these macros will usually return TRUE, and calling the C<Sv*V> |
a0d0e21e |
218 | macros will do the appropriate conversion of string to integer/double or |
219 | integer/double to string. |
220 | |
221 | If you I<really> need to know if you have an integer, double, or string |
222 | pointer in an SV, you can use the following three macros instead: |
223 | |
224 | SvIOKp(SV*) |
225 | SvNOKp(SV*) |
226 | SvPOKp(SV*) |
227 | |
228 | These will tell you if you truly have an integer, double, or string pointer |
d1b91892 |
229 | stored in your SV. The "p" stands for private. |
a0d0e21e |
230 | |
07fa94a1 |
231 | In general, though, it's best to use the C<Sv*V> macros. |
a0d0e21e |
232 | |
54310121 |
233 | =head2 Working with AVs |
a0d0e21e |
234 | |
07fa94a1 |
235 | There are two ways to create and load an AV. The first method creates an |
236 | empty AV: |
a0d0e21e |
237 | |
238 | AV* newAV(); |
239 | |
54310121 |
240 | The second method both creates the AV and initially populates it with SVs: |
a0d0e21e |
241 | |
242 | AV* av_make(I32 num, SV **ptr); |
243 | |
5f05dabc |
244 | The second argument points to an array containing C<num> C<SV*>'s. Once the |
54310121 |
245 | AV has been created, the SVs can be destroyed, if so desired. |
a0d0e21e |
246 | |
54310121 |
247 | Once the AV has been created, the following operations are possible on AVs: |
a0d0e21e |
248 | |
249 | void av_push(AV*, SV*); |
250 | SV* av_pop(AV*); |
251 | SV* av_shift(AV*); |
252 | void av_unshift(AV*, I32 num); |
253 | |
254 | These should be familiar operations, with the exception of C<av_unshift>. |
255 | This routine adds C<num> elements at the front of the array with the C<undef> |
256 | value. You must then use C<av_store> (described below) to assign values |
257 | to these new elements. |
258 | |
259 | Here are some other functions: |
260 | |
5f05dabc |
261 | I32 av_len(AV*); |
a0d0e21e |
262 | SV** av_fetch(AV*, I32 key, I32 lval); |
a0d0e21e |
263 | SV** av_store(AV*, I32 key, SV* val); |
a0d0e21e |
264 | |
5f05dabc |
265 | The C<av_len> function returns the highest index value in the array (just |
266 | like $#array in Perl). If the array is empty, -1 is returned. The |
267 | C<av_fetch> function returns the value at index C<key>, but if C<lval> |
268 | is non-zero, then C<av_fetch> will store an undef value at that index. |
04343c6d |
269 | The C<av_store> function stores the value C<val> at index C<key>, and does |
270 | not increment the reference count of C<val>. Thus the caller is responsible |
271 | for taking care of that, and if C<av_store> returns NULL, the caller will |
272 | have to decrement the reference count to avoid a memory leak. Note that |
273 | C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their |
274 | return value. |
d1b91892 |
275 | |
a0d0e21e |
276 | void av_clear(AV*); |
a0d0e21e |
277 | void av_undef(AV*); |
cb1a09d0 |
278 | void av_extend(AV*, I32 key); |
5f05dabc |
279 | |
280 | The C<av_clear> function deletes all the elements in the AV* array, but |
281 | does not actually delete the array itself. The C<av_undef> function will |
282 | delete all the elements in the array plus the array itself. The |
adc882cf |
283 | C<av_extend> function extends the array so that it contains at least C<key+1> |
284 | elements. If C<key+1> is less than the currently allocated length of the array, |
285 | then nothing is done. |
a0d0e21e |
286 | |
287 | If you know the name of an array variable, you can get a pointer to its AV |
288 | by using the following: |
289 | |
4929bf7b |
290 | AV* get_av("package::varname", FALSE); |
a0d0e21e |
291 | |
292 | This returns NULL if the variable does not exist. |
293 | |
04343c6d |
294 | See L<Understanding the Magic of Tied Hashes and Arrays> for more |
295 | information on how to use the array access functions on tied arrays. |
296 | |
54310121 |
297 | =head2 Working with HVs |
a0d0e21e |
298 | |
299 | To create an HV, you use the following routine: |
300 | |
301 | HV* newHV(); |
302 | |
54310121 |
303 | Once the HV has been created, the following operations are possible on HVs: |
a0d0e21e |
304 | |
08105a92 |
305 | SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash); |
306 | SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval); |
a0d0e21e |
307 | |
5f05dabc |
308 | The C<klen> parameter is the length of the key being passed in (Note that |
309 | you cannot pass 0 in as a value of C<klen> to tell Perl to measure the |
310 | length of the key). The C<val> argument contains the SV pointer to the |
54310121 |
311 | scalar being stored, and C<hash> is the precomputed hash value (zero if |
5f05dabc |
312 | you want C<hv_store> to calculate it for you). The C<lval> parameter |
313 | indicates whether this fetch is actually a part of a store operation, in |
314 | which case a new undefined value will be added to the HV with the supplied |
315 | key and C<hv_fetch> will return as if the value had already existed. |
a0d0e21e |
316 | |
5f05dabc |
317 | Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just |
318 | C<SV*>. To access the scalar value, you must first dereference the return |
319 | value. However, you should check to make sure that the return value is |
320 | not NULL before dereferencing it. |
a0d0e21e |
321 | |
322 | These two functions check whether a hash table entry exists and delete an entry, respectively. |
323 | |
08105a92 |
324 | bool hv_exists(HV*, const char* key, U32 klen); |
325 | SV* hv_delete(HV*, const char* key, U32 klen, I32 flags); |
a0d0e21e |
326 | |
5f05dabc |
327 | If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will |
328 | create and return a mortal copy of the deleted value. |
329 | |
a0d0e21e |
330 | And more miscellaneous functions: |
331 | |
332 | void hv_clear(HV*); |
a0d0e21e |
333 | void hv_undef(HV*); |
5f05dabc |
334 | |
335 | Like their AV counterparts, C<hv_clear> deletes all the entries in the hash |
336 | table but does not actually delete the hash table. The C<hv_undef> deletes |
337 | both the entries and the hash table itself. |
a0d0e21e |
338 | |
d1b91892 |
339 | Perl keeps the actual data in a linked list of structures with a typedef of HE. |
340 | These contain the actual key and value pointers (plus extra administrative |
341 | overhead). The key is a string pointer; the value is an C<SV*>. However, |
342 | once you have an C<HE*>, to get the actual key and value, use the routines |
343 | specified below. |
344 | |
a0d0e21e |
345 | I32 hv_iterinit(HV*); |
346 | /* Prepares starting point to traverse hash table */ |
347 | HE* hv_iternext(HV*); |
348 | /* Get the next entry, and return a pointer to a |
349 | structure that has both the key and value */ |
350 | char* hv_iterkey(HE* entry, I32* retlen); |
351 | /* Get the key from an HE structure and also return |
352 | the length of the key string */ |
cb1a09d0 |
353 | SV* hv_iterval(HV*, HE* entry); |
a0d0e21e |
354 | /* Return a SV pointer to the value of the HE |
355 | structure */ |
cb1a09d0 |
356 | SV* hv_iternextsv(HV*, char** key, I32* retlen); |
d1b91892 |
357 | /* This convenience routine combines hv_iternext, |
358 | hv_iterkey, and hv_iterval. The key and retlen |
359 | arguments are return values for the key and its |
360 | length. The value is returned in the SV* argument */ |
a0d0e21e |
361 | |
362 | If you know the name of a hash variable, you can get a pointer to its HV |
363 | by using the following: |
364 | |
4929bf7b |
365 | HV* get_hv("package::varname", FALSE); |
a0d0e21e |
366 | |
367 | This returns NULL if the variable does not exist. |
368 | |
8ebc5c01 |
369 | The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro: |
a0d0e21e |
370 | |
a0d0e21e |
371 | hash = 0; |
ab192400 |
372 | while (klen--) |
373 | hash = (hash * 33) + *key++; |
87275199 |
374 | hash = hash + (hash >> 5); /* after 5.6 */ |
ab192400 |
375 | |
87275199 |
376 | The last step was added in version 5.6 to improve distribution of |
ab192400 |
377 | lower bits in the resulting hash value. |
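The loop above can be written as an ordinary C function for experimentation. This is a sketch of the algorithm as described here, not the macro from the Perl sources: the name C<toy_perl_hash>, the use of C<uint32_t> for the hash, and the choice to read key bytes as unsigned are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-alone version of the hash described in the text.
 * Assumption: key bytes are treated as unsigned; for plain ASCII keys
 * this makes no difference. */
uint32_t toy_perl_hash(const char *key, size_t klen)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;

    /* Multiply-by-33 accumulation over every byte of the key. */
    while (klen--)
        hash = (hash * 33) + *p++;

    /* Final mixing step, described in the text as added in 5.6. */
    hash = hash + (hash >> 5);
    return hash;
}
```

For the one-byte key C<"a"> (byte value 97), the loop yields 97 and the final step adds C<97 E<gt>E<gt> 5>, which is 3, giving 100. Because the function takes an explicit length, keys containing NUL bytes are hashed in full.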
a0d0e21e |
378 | |
04343c6d |
379 | See L<Understanding the Magic of Tied Hashes and Arrays> for more |
380 | information on how to use the hash access functions on tied hashes. |
381 | |
1e422769 |
382 | =head2 Hash API Extensions |
383 | |
384 | Beginning with version 5.004, the following functions are also supported: |
385 | |
386 | HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash); |
387 | HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash); |
c47ff5f1 |
388 | |
1e422769 |
389 | bool hv_exists_ent (HV* tb, SV* key, U32 hash); |
390 | SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash); |
c47ff5f1 |
391 | |
1e422769 |
392 | SV* hv_iterkeysv (HE* entry); |
393 | |
394 | Note that these functions take C<SV*> keys, which simplifies writing |
395 | of extension code that deals with hash structures. These functions |
396 | also allow passing of C<SV*> keys to C<tie> functions without forcing |
397 | you to stringify the keys (unlike the previous set of functions). |
398 | |
399 | They also return and accept whole hash entries (C<HE*>), making their |
400 | use more efficient (since the hash number for a particular string |
4a4eefd0 |
401 | doesn't have to be recomputed every time). See L<perlapi> for detailed |
402 | descriptions. |
1e422769 |
403 | |
404 | The following macros must always be used to access the contents of hash |
405 | entries. Note that the arguments to these macros must be simple |
406 | variables, since they may get evaluated more than once. See |
4a4eefd0 |
407 | L<perlapi> for detailed descriptions of these macros. |
1e422769 |
408 | |
409 | HePV(HE* he, STRLEN len) |
410 | HeVAL(HE* he) |
411 | HeHASH(HE* he) |
412 | HeSVKEY(HE* he) |
413 | HeSVKEY_force(HE* he) |
414 | HeSVKEY_set(HE* he, SV* sv) |
415 | |
416 | These two lower level macros are defined, but must only be used when |
417 | dealing with keys that are not C<SV*>s: |
418 | |
419 | HeKEY(HE* he) |
420 | HeKLEN(HE* he) |
421 | |
04343c6d |
422 | Note that both C<hv_store> and C<hv_store_ent> do not increment the |
423 | reference count of the stored C<val>, which is the caller's responsibility. |
424 | If these functions return a NULL value, the caller will usually have to |
425 | decrement the reference count of C<val> to avoid a memory leak. |
1e422769 |
426 | |
a0d0e21e |
427 | =head2 References |
428 | |
d1b91892 |
429 | References are a special type of scalar that point to other data types |
430 | (including references). |
a0d0e21e |
431 | |
07fa94a1 |
432 | To create a reference, use either of the following functions: |
a0d0e21e |
433 | |
5f05dabc |
434 | SV* newRV_inc((SV*) thing); |
435 | SV* newRV_noinc((SV*) thing); |
a0d0e21e |
436 | |
5f05dabc |
437 | The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The |
07fa94a1 |
438 | functions are identical except that C<newRV_inc> increments the reference |
439 | count of the C<thing>, while C<newRV_noinc> does not. For historical |
440 | reasons, C<newRV> is a synonym for C<newRV_inc>. |
441 | |
442 | Once you have a reference, you can use the following macro to dereference |
443 | the reference: |
a0d0e21e |
444 | |
445 | SvRV(SV*) |
446 | |
447 | then call the appropriate routines, casting the returned C<SV*> to either an |
d1b91892 |
448 | C<AV*> or C<HV*>, if required. |
a0d0e21e |
449 | |
d1b91892 |
450 | To determine if an SV is a reference, you can use the following macro: |
a0d0e21e |
451 | |
452 | SvROK(SV*) |
453 | |
07fa94a1 |
454 | To discover what type of value the reference refers to, use the following |
455 | macro and then check the return value. |
d1b91892 |
456 | |
457 | SvTYPE(SvRV(SV*)) |
458 | |
459 | The most useful types that will be returned are: |
460 | |
461 | SVt_IV Scalar |
462 | SVt_NV Scalar |
463 | SVt_PV Scalar |
5f05dabc |
464 | SVt_RV Scalar |
d1b91892 |
465 | SVt_PVAV Array |
466 | SVt_PVHV Hash |
467 | SVt_PVCV Code |
5f05dabc |
468 | SVt_PVGV Glob (possibly a file handle) |
469 | SVt_PVMG Blessed or Magical Scalar |
470 | |
471 | See the sv.h header file for more details. |
d1b91892 |
472 | |
cb1a09d0 |
473 | =head2 Blessed References and Class Objects |
474 | |
475 | References are also used to support object-oriented programming. In the |
476 | OO lexicon, an object is simply a reference that has been blessed into a |
477 | package (or class). Once blessed, the programmer may now use the reference |
478 | to access the various methods in the class. |
479 | |
480 | A reference can be blessed into a package with the following function: |
481 | |
482 | SV* sv_bless(SV* sv, HV* stash); |
483 | |
484 | The C<sv> argument must be a reference. The C<stash> argument specifies |
3fe9a6f1 |
485 | which class the reference will belong to. See |
2ae324a7 |
486 | L<Stashes and Globs> for information on converting class names into stashes. |
cb1a09d0 |
487 | |
488 | /* Still under construction */ |
489 | |
490 | Upgrades C<rv> to a reference if it is not already one. Creates a new SV for C<rv> to |
8ebc5c01 |
491 | point to. If C<classname> is non-null, the SV is blessed into the specified |
492 | class. SV is returned. |
cb1a09d0 |
493 | |
08105a92 |
494 | SV* newSVrv(SV* rv, const char* classname); |
cb1a09d0 |
495 | |
8ebc5c01 |
496 | Copies integer or double into an SV whose reference is C<rv>. SV is blessed |
497 | if C<classname> is non-null. |
cb1a09d0 |
498 | |
08105a92 |
499 | SV* sv_setref_iv(SV* rv, const char* classname, IV iv); |
500 | SV* sv_setref_nv(SV* rv, const char* classname, NV nv); |
cb1a09d0 |
501 | |
5f05dabc |
502 | Copies the pointer value (I<the address, not the string!>) into an SV whose |
8ebc5c01 |
503 | reference is rv. SV is blessed if C<classname> is non-null. |
cb1a09d0 |
504 | |
08105a92 |
505 | SV* sv_setref_pv(SV* rv, const char* classname, PV iv); |
cb1a09d0 |
506 | |
8ebc5c01 |
507 | Copies string into an SV whose reference is C<rv>. Set length to 0 to let |
508 | Perl calculate the string length. SV is blessed if C<classname> is non-null. |
cb1a09d0 |
509 | |
e65f3abd |
510 | SV* sv_setref_pvn(SV* rv, const char* classname, PV iv, STRLEN length); |
cb1a09d0 |
511 | |
9abd00ed |
512 | Tests whether the SV is blessed into the specified class. It does not |
513 | check inheritance relationships. |
514 | |
08105a92 |
515 | int sv_isa(SV* sv, const char* name); |
9abd00ed |
516 | |
517 | Tests whether the SV is a reference to a blessed object. |
518 | |
519 | int sv_isobject(SV* sv); |
520 | |
521 | Tests whether the SV is derived from the specified class. SV can be either |
522 | a reference to a blessed object or a string containing a class name. This |
523 | is the function implementing the C<UNIVERSAL::isa> functionality. |
524 | |
08105a92 |
525 | bool sv_derived_from(SV* sv, const char* name); |
9abd00ed |
526 | |
527 | To check if you've got an object derived from a specific class you have |
528 | to write: |
529 | |
530 | if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... } |
cb1a09d0 |
531 | |
5f05dabc |
532 | =head2 Creating New Variables |
cb1a09d0 |
533 | |
5f05dabc |
534 | To create a new Perl variable with an undef value which can be accessed from |
535 | your Perl script, use the following routines, depending on the variable type. |
cb1a09d0 |
536 | |
4929bf7b |
537 | SV* get_sv("package::varname", TRUE); |
538 | AV* get_av("package::varname", TRUE); |
539 | HV* get_hv("package::varname", TRUE); |
cb1a09d0 |
540 | |
541 | Notice the use of TRUE as the second parameter. The new variable can now |
542 | be set, using the routines appropriate to the data type. |
543 | |
5f05dabc |
544 | There are additional macros whose values may be bitwise OR'ed with the |
545 | C<TRUE> argument to enable certain extra features. Those bits are: |
cb1a09d0 |
546 | |
5f05dabc |
547 | GV_ADDMULTI Marks the variable as multiply defined, thus preventing the |
54310121 |
548 | "Name <varname> used only once: possible typo" warning. |
07fa94a1 |
549 | GV_ADDWARN Issues the warning "Had to create <varname> unexpectedly" if |
550 | the variable did not exist before the function was called. |
cb1a09d0 |
551 | |
07fa94a1 |
552 | If you do not specify a package name, the variable is created in the current |
553 | package. |
cb1a09d0 |
554 | |
5f05dabc |
555 | =head2 Reference Counts and Mortality |
a0d0e21e |
556 | |
54310121 |
557 | Perl uses a reference-count-driven garbage collection mechanism. SVs, |
558 | AVs, or HVs (xV for short in the following) start their life with a |
55497cff |
559 | reference count of 1. If the reference count of an xV ever drops to 0, |
07fa94a1 |
560 | then it will be destroyed and its memory made available for reuse. |
55497cff |
561 | |
562 | This normally doesn't happen at the Perl level unless a variable is |
5f05dabc |
563 | undef'ed or the last variable holding a reference to it is changed or |
564 | overwritten. At the internal level, however, reference counts can be |
55497cff |
565 | manipulated with the following macros: |
566 | |
567 | int SvREFCNT(SV* sv); |
5f05dabc |
568 | SV* SvREFCNT_inc(SV* sv); |
55497cff |
569 | void SvREFCNT_dec(SV* sv); |
570 | |
571 | However, there is one other function which manipulates the reference |
07fa94a1 |
572 | count of its argument. The C<newRV_inc> function, you will recall, |
573 | creates a reference to the specified argument. As a side effect, |
574 | it increments the argument's reference count. If this is not what |
575 | you want, use C<newRV_noinc> instead. |
576 | |
577 | For example, imagine you want to return a reference from an XSUB function. |
578 | Inside the XSUB routine, you create an SV which initially has a reference |
579 | count of one. Then you call C<newRV_inc>, passing it the just-created SV. |
5f05dabc |
580 | This returns the reference as a new SV, but the reference count of the |
581 | SV you passed to C<newRV_inc> has been incremented to two. Now you |
07fa94a1 |
582 | return the reference from the XSUB routine and forget about the SV. |
583 | But Perl hasn't! Whenever the returned reference is destroyed, the |
584 | reference count of the original SV is decreased to one and nothing happens. |
585 | The SV will hang around without any way to access it until Perl itself |
586 | terminates. This is a memory leak. |
5f05dabc |
587 | |
588 | The correct procedure, then, is to use C<newRV_noinc> instead of |
faed5253 |
589 | C<newRV_inc>. Then, if and when the last reference is destroyed, |
590 | the reference count of the SV will go to zero and it will be destroyed, |
07fa94a1 |
591 | stopping any memory leak. |
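The leak described above can be reproduced with a toy model in plain C. Nothing below is Perl API code: C<ToyValue>, C<toy_new>, C<toy_dec>, C<toy_newRV_inc>, C<toy_newRV_noinc> and C<toy_live_count> are hypothetical names that only mimic the counting behaviour described in this section, assuming that destroying a reference decrements the count of the thing it points to.

```c
#include <stdlib.h>

/* Toy stand-in for an xV: a reference count plus an optional target. */
typedef struct ToyValue {
    int refcnt;
    struct ToyValue *target;   /* non-NULL when this value is a reference */
} ToyValue;

/* Number of toy values currently allocated; used to spot leaks. */
int toy_live_count = 0;

/* New values start life with a reference count of 1. */
ToyValue *toy_new(void)
{
    ToyValue *v = malloc(sizeof *v);
    if (v == NULL)
        abort();
    v->refcnt = 1;
    v->target = NULL;
    toy_live_count++;
    return v;
}

/* Drop one reference; destroy the value when the count reaches 0. */
void toy_dec(ToyValue *v)
{
    if (v == NULL || --v->refcnt > 0)
        return;
    toy_dec(v->target);        /* a dying reference releases its target */
    free(v);
    toy_live_count--;
}

/* Make a reference without touching the target's count. */
ToyValue *toy_newRV_noinc(ToyValue *thing)
{
    ToyValue *rv = toy_new();
    rv->target = thing;
    return rv;
}

/* Make a reference and bump the target's count as a side effect. */
ToyValue *toy_newRV_inc(ToyValue *thing)
{
    thing->refcnt++;
    return toy_newRV_noinc(thing);
}
```

In this model, creating a value, wrapping it with C<toy_newRV_inc>, and then calling C<toy_dec> on the reference leaves the value alive with a count of 1 and no remaining owner. Doing the same with C<toy_newRV_noinc> brings C<toy_live_count> back to where it started.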
55497cff |
592 | |
5f05dabc |
593 | There are some convenience functions available that can help with the |
54310121 |
594 | destruction of xVs. These functions introduce the concept of "mortality". |
07fa94a1 |
595 | An xV that is mortal has had its reference count marked to be decremented, |
596 | but not actually decremented, until "a short time later". Generally the |
597 | term "short time later" means a single Perl statement, such as a call to |
54310121 |
598 | an XSUB function. The actual determinant for when mortal xVs have their |
07fa94a1 |
599 | reference count decremented depends on two macros, SAVETMPS and FREETMPS. |
600 | See L<perlcall> and L<perlxs> for more details on these macros. |
55497cff |
601 | |
602 | "Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>. |
603 | However, if you mortalize a variable twice, the reference count will |
604 | later be decremented twice. |
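The "deferred C<SvREFCNT_dec>" idea can be sketched as a small stack of pending decrements. This is a toy model in plain C under stated assumptions, not how the interpreter is implemented: C<ToySV>, C<toy_tmps>, C<toy_2mortal> and C<toy_free_tmps> are hypothetical names, and the model only adjusts counts without freeing anything.

```c
#include <stddef.h>

#define TOY_TMPS_MAX 16

/* Toy stand-in for an SV: only the reference count is modelled. */
typedef struct ToySV {
    int refcnt;
} ToySV;

/* Pending decrements, one slot per mortalization. */
ToySV *toy_tmps[TOY_TMPS_MAX];
size_t toy_tmps_top = 0;

/* Toy stand-in for sv_2mortal: records one pending decrement and
 * leaves the count untouched for now. Requests beyond the fixed
 * capacity are silently ignored in this sketch. */
ToySV *toy_2mortal(ToySV *sv)
{
    if (toy_tmps_top < TOY_TMPS_MAX)
        toy_tmps[toy_tmps_top++] = sv;
    return sv;
}

/* Toy stand-in for FREETMPS: applies every pending decrement. */
void toy_free_tmps(void)
{
    while (toy_tmps_top > 0)
        toy_tmps[--toy_tmps_top]->refcnt--;
}
```

A value mortalized once keeps its count until C<toy_free_tmps> runs, then loses one. A value mortalized twice loses two, which mirrors the warning above about mortalizing the same variable more than once.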
605 | |
606 | You should be careful about creating mortal variables. Strange things |
607 | can happen if you make the same value mortal within multiple contexts, |
5f05dabc |
608 | or if you make a variable mortal multiple times. |
a0d0e21e |
609 | |
610 | To create a mortal variable, use the functions: |
611 | |
612 | SV* sv_newmortal() |
613 | SV* sv_2mortal(SV*) |
614 | SV* sv_mortalcopy(SV*) |
615 | |
5f05dabc |
616 | The first call creates a mortal SV, the second converts an existing |
617 | SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the |
618 | third creates a mortal copy of an existing SV. |
a0d0e21e |
619 | |
54310121 |
620 | The mortal routines are not just for SVs -- AVs and HVs can be |
faed5253 |
621 | made mortal by passing their address (type-casted to C<SV*>) to the |
07fa94a1 |
622 | C<sv_2mortal> or C<sv_mortalcopy> routines. |
a0d0e21e |
623 | |
5f05dabc |
624 | =head2 Stashes and Globs |
a0d0e21e |
625 | |
aa689395 |
626 | A "stash" is a hash that contains all of the different objects that |
627 | are contained within a package. Each key of the stash is a symbol |
628 | name (shared by all the different types of objects that have the same |
629 | name), and each value in the hash table is a GV (Glob Value). This GV |
630 | in turn contains references to the various objects of that name, |
631 | including (but not limited to) the following: |
cb1a09d0 |
632 | |
a0d0e21e |
633 | Scalar Value |
634 | Array Value |
635 | Hash Value |
a3cb178b |
636 | I/O Handle |
a0d0e21e |
637 | Format |
638 | Subroutine |
639 | |
9cde0e7f |
640 | There is a single stash called "PL_defstash" that holds the items that exist |
5f05dabc |
641 | in the "main" package. To get at the items in other packages, append the |
642 | string "::" to the package name. The items in the "Foo" package are in |
9cde0e7f |
643 | the stash "Foo::" in PL_defstash. The items in the "Bar::Baz" package are |
5f05dabc |
644 | in the stash "Baz::" in "Bar::"'s stash. |
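The nesting rule can be illustrated by splitting a package name into the sequence of stash keys that would be followed from the top-level stash. The helper below is a plain-C sketch and not part of the Perl API: C<toy_next_stash_key> is a hypothetical name, and it handles only the C<::> separator described here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the next stash key (one package component
 * followed by "::") from *cursor into out, and advance *cursor.
 * Returns 1 when a key was produced, 0 when the name is exhausted or
 * the output buffer is too small. */
int toy_next_stash_key(const char **cursor, char *out, size_t outsz)
{
    const char *start = *cursor;
    const char *sep;
    size_t len;

    if (*start == '\0')
        return 0;

    sep = strstr(start, "::");
    len = sep ? (size_t)(sep - start) : strlen(start);

    /* Room is needed for the component, "::", and the trailing NUL. */
    if (len + 3 > outsz)
        return 0;

    memcpy(out, start, len);
    memcpy(out + len, "::", 3);    /* copies the terminating NUL too */

    *cursor = sep ? sep + 2 : start + len;
    return 1;
}
```

Called repeatedly on C<"Bar::Baz">, the helper produces C<"Bar::"> and then C<"Baz::">, matching the lookup order described above: first the entry in the top-level stash, then the entry inside that stash.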
a0d0e21e |
645 | |
d1b91892 |
646 | To get the stash pointer for a particular package, use one of the functions: |
a0d0e21e |
647 | |
08105a92 |
648 | HV* gv_stashpv(const char* name, I32 create) |
a0d0e21e |
649 | HV* gv_stashsv(SV*, I32 create) |
650 | |
651 | The first function takes a literal string, the second uses the string stored |
d1b91892 |
652 | in the SV. Remember that a stash is just a hash table, so you get back an |
cb1a09d0 |
653 | C<HV*>. The C<create> flag will create a new package if it is set. |
a0d0e21e |
654 | |
655 | The name that C<gv_stash*v> wants is the name of the package whose symbol table |
656 | you want. The default package is called C<main>. If you have multiply nested |
d1b91892 |
657 | packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl |
658 | language itself. |
a0d0e21e |
659 | |
660 | Alternatively, if you have an SV that is a blessed reference, you can find |
661 | out the stash pointer by using: |
662 | |
663 | HV* SvSTASH(SvRV(SV*)); |
664 | |
665 | then use the following to get the package name itself: |
666 | |
667 | char* HvNAME(HV* stash); |
668 | |
5f05dabc |
669 | If you need to bless or re-bless an object you can use the following |
670 | function: |
a0d0e21e |
671 | |
672 | SV* sv_bless(SV*, HV* stash) |
673 | |
674 | where the first argument, an C<SV*>, must be a reference, and the second |
675 | argument is a stash. The returned C<SV*> can now be used in the same way |
676 | as any other SV. |
677 | |
d1b91892 |
678 | For more information on references and blessings, consult L<perlref>. |
679 | |
54310121 |
680 | =head2 Double-Typed SVs |
0a753a76 |
681 | |
682 | Scalar variables normally contain only one type of value, an integer, |
683 | double, pointer, or reference. Perl will automatically convert the |
684 | actual scalar data from the stored type into the requested type. |
685 | |
686 | Some scalar variables contain more than one type of scalar data. For |
687 | example, the variable C<$!> contains either the numeric value of C<errno> |
688 | or its string equivalent from either C<strerror> or C<sys_errlist[]>. |
689 | |
690 | To force multiple data values into an SV, you must do two things: use the |
691 | C<sv_set*v> routines to add the additional scalar type, then set a flag |
692 | so that Perl will believe it contains more than one type of data. The |
693 | four macros to set the flags are: |
694 | |
695 | SvIOK_on |
696 | SvNOK_on |
697 | SvPOK_on |
698 | SvROK_on |
699 | |
700 | The particular macro you must use depends on which C<sv_set*v> routine |
701 | you called first. This is because every C<sv_set*v> routine turns on |
702 | only the bit for the particular type of data being set, and turns off |
703 | all the rest. |
704 | |
705 | For example, to create a new Perl variable called "dberror" that contains |
706 | both the numeric and descriptive string error values, you could use the |
707 | following code: |
708 | |
709 | extern int dberror; |
710 | extern char *dberror_list[]; |
711 | |
4929bf7b |
712 | SV* sv = get_sv("dberror", TRUE); |
0a753a76 |
713 | sv_setiv(sv, (IV) dberror); |
714 | sv_setpv(sv, dberror_list[dberror]); |
715 | SvIOK_on(sv); |
716 | |
717 | If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the |
718 | macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>. |
719 | |
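The flag bookkeeping described above can be illustrated with a toy model. This
is a sketch in plain C, not the real SV structure: the C<toy_dual_*> names and
the flag values are invented, and the model assumes that setting a string leaves
the previously stored integer in place, which is what makes turning the integer
flag back on meaningful.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names and flag values are hypothetical, not Perl API. */
enum { TOY_DUAL_IOK = 1, TOY_DUAL_POK = 2 };

typedef struct {
    unsigned flags;
    long iv;
    const char *pv;
} toy_dual_sv;

/* Like sv_setiv: store the integer and leave only the integer flag on. */
static void toy_dual_setiv(toy_dual_sv *sv, long iv) {
    assert(sv != NULL);
    sv->iv = iv;
    sv->flags = TOY_DUAL_IOK;
}

/* Like sv_setpv: store the string and leave only the string flag on. */
static void toy_dual_setpv(toy_dual_sv *sv, const char *pv) {
    assert(sv != NULL);
    sv->pv = pv;
    sv->flags = TOY_DUAL_POK;
}

/* Build a double-typed value and return its final flags. */
static unsigned toy_dual_make(long iv, const char *pv) {
    toy_dual_sv sv = { 0, 0, NULL };
    toy_dual_setiv(&sv, iv);
    toy_dual_setpv(&sv, pv);        /* this call switched the integer flag off */
    sv.flags |= TOY_DUAL_IOK;       /* the step that SvIOK_on performs */
    return sv.flags;
}
```

The flag that has to be switched back on is always the one belonging to the
routine called first, because the second call cleared it.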
720 | =head2 Magic Variables |
a0d0e21e |
721 | |
d1b91892 |
722 | [This section still under construction. Ignore everything here. Post no |
723 | bills. Everything not permitted is forbidden.] |
724 | |
d1b91892 |
725 | Any SV may be magical, that is, it has special features that a normal |
726 | SV does not have. These features are stored in the SV structure in a |
5f05dabc |
727 | linked list of C<struct magic>'s, typedef'ed to C<MAGIC>. |
d1b91892 |
728 | |
729 | struct magic { |
730 | MAGIC* mg_moremagic; |
731 | MGVTBL* mg_virtual; |
732 | U16 mg_private; |
733 | char mg_type; |
734 | U8 mg_flags; |
735 | SV* mg_obj; |
736 | char* mg_ptr; |
737 | I32 mg_len; |
738 | }; |
739 | |
740 | Note this is current as of patchlevel 0, and could change at any time. |
741 | |
742 | =head2 Assigning Magic |
743 | |
744 | Perl adds magic to an SV using the sv_magic function: |
745 | |
08105a92 |
746 | void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); |
d1b91892 |
747 | |
748 | The C<sv> argument is a pointer to the SV that is to acquire a new magical |
749 | feature. |
750 | |
751 | If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to |
752 | set the C<SVt_PVMG> flag for the C<sv>. Perl then continues by adding |
753 | it to the beginning of the linked list of magical features. Any prior |
754 | entry of the same type of magic is deleted. Note that this can be |
5fb8527f |
755 | overridden, and multiple instances of the same type of magic can be |
d1b91892 |
756 | associated with an SV. |
757 | |
54310121 |
758 | The C<name> and C<namlen> arguments are used to associate a string with |
759 | the magic, typically the name of a variable. C<namlen> is stored in the |
760 | C<mg_len> field, and if C<name> is non-null and C<namlen> >= 0, a malloc'd |
d1b91892 |
761 | copy of the name is stored in the C<mg_ptr> field. |
762 | |
763 | The sv_magic function uses C<how> to determine which, if any, predefined |
764 | "Magic Virtual Table" should be assigned to the C<mg_virtual> field. |
cb1a09d0 |
765 | See the "Magic Virtual Table" section below. The C<how> argument is also |
766 | stored in the C<mg_type> field. |
d1b91892 |
767 | |
768 | The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC> |
769 | structure. If it is not the same as the C<sv> argument, the reference |
770 | count of the C<obj> object is incremented. If it is the same, or if |
04343c6d |
771 | the C<how> argument is "#", or if it is a NULL pointer, then C<obj> is |
d1b91892 |
772 | merely stored, without the reference count being incremented. |
773 | |
cb1a09d0 |
774 | There is also a function to add magic to an C<HV>: |
775 | |
776 | void hv_magic(HV *hv, GV *gv, int how); |
777 | |
778 | This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>. |
779 | |
780 | To remove the magic from an SV, call the function sv_unmagic: |
781 | |
782 | void sv_unmagic(SV *sv, int type); |
783 | |
784 | The C<type> argument should be equal to the C<how> value when the C<SV> |
785 | was initially made magical. |
786 | |
d1b91892 |
787 | =head2 Magic Virtual Tables |
788 | |
789 | The C<mg_virtual> field in the C<MAGIC> structure is a pointer to a |
790 | C<MGVTBL> (short for "Magic Virtual Table"), a structure of function |
791 | pointers that handle the various operations that might be |
792 | applied to that variable. |
793 | |
794 | The C<MGVTBL> has five pointers to the following routine types: |
795 | |
796 | int (*svt_get)(SV* sv, MAGIC* mg); |
797 | int (*svt_set)(SV* sv, MAGIC* mg); |
798 | U32 (*svt_len)(SV* sv, MAGIC* mg); |
799 | int (*svt_clear)(SV* sv, MAGIC* mg); |
800 | int (*svt_free)(SV* sv, MAGIC* mg); |
801 | |
802 | This MGVTBL structure is set at compile-time in C<perl.h> and there are |
803 | currently 19 types (or 21 with overloading turned on). These different |
804 | structures contain pointers to various routines that perform additional |
805 | actions depending on which function is being called. |
806 | |
807 | Function pointer Action taken |
808 | ---------------- ------------ |
809 | svt_get Do something after the value of the SV is retrieved. |
810 | svt_set Do something after the SV is assigned a value. |
811 | svt_len Report on the SV's length. |
812 | svt_clear Clear something the SV represents. |
813 | svt_free Free any extra storage associated with the SV. |
814 | |
815 | For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds |
816 | to an C<mg_type> of '\0') contains: |
817 | |
818 | { magic_get, magic_set, magic_len, 0, 0 } |
819 | |
820 | Thus, when an SV is determined to be magical and of type '\0', if a get |
821 | operation is being performed, the routine C<magic_get> is called. All |
822 | the various routines for the various magical types begin with C<magic_>. |
954c1994 |
823 | NOTE: the magic routines are not considered part of the Perl API, and may |
824 | not be exported by the Perl library. |
d1b91892 |
825 | |
826 | The current kinds of Magic Virtual Tables are: |
827 | |
bdbeb323 |
828 | mg_type MGVTBL Type of magic |
5f05dabc |
829 | ------- ------ ---------------------------- |
bdbeb323 |
830 | \0 vtbl_sv Special scalar variable |
831 | A vtbl_amagic %OVERLOAD hash |
832 | a vtbl_amagicelem %OVERLOAD hash element |
833 | c (none) Holds overload table (AMT) on stash |
834 | B vtbl_bm Boyer-Moore (fast string search) |
d1b91892 |
835 | E vtbl_env %ENV hash |
836 | e vtbl_envelem %ENV hash element |
bdbeb323 |
837 | f vtbl_fm Formline ('compiled' format) |
838 | g vtbl_mglob m//g target / study()ed string |
d1b91892 |
839 | I vtbl_isa @ISA array |
840 | i vtbl_isaelem @ISA array element |
bdbeb323 |
841 | k vtbl_nkeys scalar(keys()) lvalue |
842 | L (none) Debugger %_<filename |
843 | l vtbl_dbline Debugger %_<filename element |
44a8e56a |
844 | o vtbl_collxfrm Locale transformation |
bdbeb323 |
845 | P vtbl_pack Tied array or hash |
846 | p vtbl_packelem Tied array or hash element |
847 | q vtbl_packelem Tied scalar or handle |
848 | S vtbl_sig %SIG hash |
849 | s vtbl_sigelem %SIG hash element |
d1b91892 |
850 | t vtbl_taint Taintedness |
bdbeb323 |
851 | U vtbl_uvar Available for use by extensions |
852 | v vtbl_vec vec() lvalue |
853 | x vtbl_substr substr() lvalue |
854 | y vtbl_defelem Shadow "foreach" iterator variable / |
855 | smart parameter vivification |
856 | * vtbl_glob GV (typeglob) |
857 | # vtbl_arylen Array length ($#ary) |
858 | . vtbl_pos pos() lvalue |
859 | ~ (none) Available for use by extensions |
d1b91892 |
860 | |
68dc0745 |
861 | When an uppercase and lowercase letter both exist in the table, then the |
862 | uppercase letter is used to represent some kind of composite type (a list |
863 | or a hash), and the lowercase letter is used to represent an element of |
d1b91892 |
864 | that composite type. |
865 | |
bdbeb323 |
866 | The '~' and 'U' magic types are defined specifically for use by |
867 | extensions and will not be used by perl itself. Extensions can use |
868 | '~' magic to 'attach' private information to variables (typically |
869 | objects). This is especially useful because there is no way for |
870 | normal perl code to corrupt this private information (unlike using |
871 | extra elements of a hash object). |
872 | |
873 | Similarly, 'U' magic can be used much like tie() to call a C function |
874 | any time a scalar's value is used or changed. The C<MAGIC>'s |
875 | C<mg_ptr> field points to a C<ufuncs> structure: |
876 | |
877 | struct ufuncs { |
878 | I32 (*uf_val)(IV, SV*); |
879 | I32 (*uf_set)(IV, SV*); |
880 | IV uf_index; |
881 | }; |
882 | |
883 | When the SV is read from or written to, the C<uf_val> or C<uf_set> |
884 | function will be called with C<uf_index> as the first arg and a |
1526ead6 |
885 | pointer to the SV as the second. A simple example of how to add 'U' |
886 | magic is shown below. Note that the ufuncs structure is copied by |
887 | sv_magic, so you can safely allocate it on the stack. |
888 | |
889 | void |
890 | Umagic(sv) |
891 | SV *sv; |
892 | PREINIT: |
893 | struct ufuncs uf; |
894 | CODE: |
895 | uf.uf_val = &my_get_fn; |
896 | uf.uf_set = &my_set_fn; |
897 | uf.uf_index = 0; |
898 | sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf)); |
5f05dabc |
899 | |
bdbeb323 |
900 | Note that because multiple extensions may be using '~' or 'U' magic, |
901 | it is important for extensions to take extra care to avoid conflict. |
902 | Typically only using the magic on objects blessed into the same class |
903 | as the extension is sufficient. For '~' magic, it may also be |
904 | appropriate to add an I32 'signature' at the top of the private data |
905 | area and check that. |
5f05dabc |
906 | |
ef50df4b |
907 | Also note that the C<sv_set*()> and C<sv_cat*()> functions described |
908 | earlier do B<not> invoke 'set' magic on their targets. This must |
909 | be done by the user either by calling the C<SvSETMAGIC()> macro after |
910 | calling these functions, or by using one of the C<sv_set*_mg()> or |
911 | C<sv_cat*_mg()> functions. Similarly, generic C code must call the |
912 | C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV |
913 | obtained from external sources in functions that don't handle magic. |
4a4eefd0 |
914 | See L<perlapi> for a description of these functions. |
189b2af5 |
915 | For example, calls to the C<sv_cat*()> functions typically need to be |
916 | followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()> |
917 | since their implementation handles 'get' magic. |
918 | |
d1b91892 |
919 | =head2 Finding Magic |
920 | |
921 | MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */ |
922 | |
923 | This routine returns a pointer to the C<MAGIC> structure stored in the SV. |
924 | If the SV does not have that magical feature, C<NULL> is returned. Also, |
54310121 |
925 | if the SV is not of type SVt_PVMG, Perl may core dump. |
d1b91892 |
926 | |
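Since magic is kept as a linked list threaded through C<mg_moremagic>, a lookup
by type amounts to a list walk. The following is a toy model in plain C, not
the code of C<mg_find>: the C<toy_chain_*> names are invented, the structure
keeps only two of the C<MAGIC> fields, and, unlike C<sv_magic>, adding an entry
does not delete an earlier entry of the same type.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a cut-down stand-in for struct magic. */
typedef struct toy_chain_magic {
    struct toy_chain_magic *mg_moremagic;   /* next entry in the chain */
    char mg_type;                           /* the `how` character */
} toy_chain_magic;

/* Add an entry at the beginning of the chain and return the new head. */
static toy_chain_magic *toy_chain_add(toy_chain_magic *head,
                                      toy_chain_magic *mg, char how) {
    assert(mg != NULL);
    mg->mg_type = how;
    mg->mg_moremagic = head;
    return mg;
}

/* Walk the chain; return the first entry of the given type, or NULL. */
static toy_chain_magic *toy_chain_find(toy_chain_magic *head, char type) {
    for (toy_chain_magic *mg = head; mg != NULL; mg = mg->mg_moremagic)
        if (mg->mg_type == type)
            return mg;
    return NULL;
}

/* Returns 1 if the chain behaves as described. */
static int toy_chain_demo(void) {
    toy_chain_magic ext = { NULL, 0 }, uvar = { NULL, 0 };
    toy_chain_magic *head = NULL;
    head = toy_chain_add(head, &ext, '~');
    head = toy_chain_add(head, &uvar, 'U');  /* newest entry becomes the head */
    return head == &uvar
        && toy_chain_find(head, '~') == &ext
        && toy_chain_find(head, 'P') == NULL;
}
```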
08105a92 |
927 | int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); |
d1b91892 |
928 | |
929 | This routine checks to see what types of magic C<sv> has. If the mg_type |
68dc0745 |
930 | field is an uppercase letter, then the mg_obj is copied to C<nsv>, but |
931 | the mg_type field is changed to be the lowercase letter. |
a0d0e21e |
932 | |
04343c6d |
933 | =head2 Understanding the Magic of Tied Hashes and Arrays |
934 | |
935 | Tied hashes and arrays are magical beasts of the 'P' magic type. |
9edb2b46 |
936 | |
937 | WARNING: As of the 5.004 release, proper usage of the array and hash |
938 | access functions requires understanding a few caveats. Some |
939 | of these caveats are actually considered bugs in the API, to be fixed |
940 | in later releases, and are bracketed with [MAYCHANGE] below. If |
941 | you find yourself actually applying such information in this section, be |
942 | aware that the behavior may change in the future, umm, without warning. |
04343c6d |
943 | |
1526ead6 |
944 | The perl tie function associates a variable with an object that implements |
945 | the various GET, SET etc methods. To perform the equivalent of the perl |
946 | tie function from an XSUB, you must mimic this behaviour. The code below |
947 | carries out the necessary steps - firstly it creates a new hash, and then |
948 | creates a second hash which it blesses into the class which will implement |
949 | the tie methods. Lastly it ties the two hashes together, and returns a |
950 | reference to the new tied hash. Note that the code below does NOT call the |
951 | TIEHASH method in the MyTie class - |
952 | see L<Calling Perl Routines from within C Programs> for details on how |
953 | to do this. |
954 | |
955 | SV* |
956 | mytie() |
957 | PREINIT: |
958 | HV *hash; |
959 | HV *stash; |
960 | SV *tie; |
961 | CODE: |
962 | hash = newHV(); |
963 | tie = newRV_noinc((SV*)newHV()); |
964 | stash = gv_stashpv("MyTie", TRUE); |
965 | sv_bless(tie, stash); |
966 | hv_magic(hash, (GV*)tie, 'P'); |
967 | RETVAL = newRV_noinc((SV*)hash); |
968 | OUTPUT: |
969 | RETVAL |
970 | |
04343c6d |
971 | The C<av_store> function, when given a tied array argument, merely |
972 | copies the magic of the array onto the value to be "stored", using |
973 | C<mg_copy>. It may also return NULL, indicating that the value did not |
9edb2b46 |
974 | actually need to be stored in the array. [MAYCHANGE] After a call to |
975 | C<av_store> on a tied array, the caller will usually need to call |
976 | C<mg_set(val)> to actually invoke the perl level "STORE" method on the |
977 | TIEARRAY object. If C<av_store> did return NULL, a call to |
978 | C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory |
979 | leak. [/MAYCHANGE] |
04343c6d |
980 | |
981 | The previous paragraph is applicable verbatim to tied hash access using the |
982 | C<hv_store> and C<hv_store_ent> functions as well. |
983 | |
984 | C<av_fetch> and the corresponding hash functions C<hv_fetch> and |
985 | C<hv_fetch_ent> actually return an undefined mortal value whose magic |
986 | has been initialized using C<mg_copy>. Note the value so returned does not |
9edb2b46 |
987 | need to be deallocated, as it is already mortal. [MAYCHANGE] But you will |
988 | need to call C<mg_get()> on the returned value in order to actually invoke |
989 | the perl level "FETCH" method on the underlying TIE object. Similarly, |
04343c6d |
990 | you may also call C<mg_set()> on the return value after possibly assigning |
991 | a suitable value to it using C<sv_setsv>, which will invoke the "STORE" |
9edb2b46 |
992 | method on the TIE object. [/MAYCHANGE] |
04343c6d |
993 | |
9edb2b46 |
994 | [MAYCHANGE] |
04343c6d |
995 | In other words, the array or hash fetch/store functions don't really |
996 | fetch and store actual values in the case of tied arrays and hashes. They |
997 | merely call C<mg_copy> to attach magic to the values that were meant to be |
998 | "stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually |
999 | do the job of invoking the TIE methods on the underlying objects. Thus |
9edb2b46 |
1000 | the magic mechanism currently implements a kind of lazy access to arrays |
04343c6d |
1001 | and hashes. |
1002 | |
1003 | Currently (as of perl version 5.004), use of the hash and array access |
1004 | functions requires the user to be aware of whether they are operating on |
9edb2b46 |
1005 | "normal" hashes and arrays, or on their tied variants. The API may be |
1006 | changed to provide more transparent access to both tied and normal data |
1007 | types in future versions. |
1008 | [/MAYCHANGE] |
04343c6d |
1009 | |
1010 | You would do well to understand that the TIEARRAY and TIEHASH interfaces |
1011 | are mere sugar to invoke some perl method calls while using the uniform hash |
1012 | and array syntax. The use of this sugar imposes some overhead (typically |
1013 | about two to four extra opcodes per FETCH/STORE operation, in addition to |
1014 | the creation of all the mortal variables required to invoke the methods). |
1015 | This overhead will be comparatively small if the TIE methods are themselves |
1016 | substantial, but if they are only a few statements long, the overhead |
1017 | will not be insignificant. |
1018 | |
d1c897a1 |
1019 | =head2 Localizing changes |
1020 | |
1021 | Perl has a very handy construction |
1022 | |
1023 | { |
1024 | local $var = 2; |
1025 | ... |
1026 | } |
1027 | |
1028 | This construction is I<approximately> equivalent to |
1029 | |
1030 | { |
1031 | my $oldvar = $var; |
1032 | $var = 2; |
1033 | ... |
1034 | $var = $oldvar; |
1035 | } |
1036 | |
1037 | The biggest difference is that the first construction would |
1038 | reinstate the initial value of $var, irrespective of how control exits |
1039 | the block: C<goto>, C<return>, C<die>/C<eval> etc. It is a little bit |
1040 | more efficient as well. |
1041 | |
1042 | There is a way to achieve a similar task from C via Perl API: create a |
1043 | I<pseudo-block>, and arrange for some changes to be automatically |
1044 | undone at the end of it, either explicitly, or via a non-local exit (via |
1045 | die()). A I<block>-like construct is created by a pair of |
b687b08b |
1046 | C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">). |
1047 | Such a construct may be created specially for some important localized |
1048 | task, or an existing one (like boundaries of enclosing Perl |
1049 | subroutine/block, or an existing pair for freeing TMPs) may be |
1050 | used. (In the second case the overhead of additional localization must |
1051 | be almost negligible.) Note that any XSUB is automatically enclosed in |
1052 | an C<ENTER>/C<LEAVE> pair. |
d1c897a1 |
1053 | |
1054 | Inside such a I<pseudo-block> the following service is available: |
1055 | |
1056 | =over |
1057 | |
1058 | =item C<SAVEINT(int i)> |
1059 | |
1060 | =item C<SAVEIV(IV i)> |
1061 | |
1062 | =item C<SAVEI32(I32 i)> |
1063 | |
1064 | =item C<SAVELONG(long i)> |
1065 | |
1066 | These macros arrange things to restore the value of the integer variable |
1067 | C<i> at the end of the enclosing I<pseudo-block>. |
1068 | |
1069 | =item C<SAVESPTR(s)> |
1070 | |
1071 | =item C<SAVEPPTR(p)> |
1072 | |
1073 | These macros arrange things to restore the value of pointers C<s> and |
1074 | C<p>. C<s> must be a pointer of a type which survives conversion to |
1075 | C<SV*> and back, C<p> should be able to survive conversion to C<char*> |
1076 | and back. |
1077 | |
1078 | =item C<SAVEFREESV(SV *sv)> |
1079 | |
1080 | The refcount of C<sv> will be decremented at the end of |
1081 | I<pseudo-block>. This is similar to C<sv_2mortal>, which should (?) be |
1082 | used instead. |
1083 | |
1084 | =item C<SAVEFREEOP(OP *op)> |
1085 | |
1086 | The C<OP *> is op_free()ed at the end of I<pseudo-block>. |
1087 | |
1088 | =item C<SAVEFREEPV(p)> |
1089 | |
1090 | The chunk of memory which is pointed to by C<p> is Safefree()ed at the |
1091 | end of I<pseudo-block>. |
1092 | |
1093 | =item C<SAVECLEARSV(SV *sv)> |
1094 | |
1095 | Clears a slot in the current scratchpad which corresponds to C<sv> at |
1096 | the end of I<pseudo-block>. |
1097 | |
1098 | =item C<SAVEDELETE(HV *hv, char *key, I32 length)> |
1099 | |
1100 | The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The |
1101 | string pointed to by C<key> is Safefree()ed. If one has a I<key> in |
1102 | short-lived storage, the corresponding string may be reallocated like |
1103 | this: |
1104 | |
9cde0e7f |
1105 | SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf)); |
d1c897a1 |
1106 | |
c76ac1ee |
1107 | =item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)> |
d1c897a1 |
1108 | |
1109 | At the end of I<pseudo-block> the function C<f> is called with the |
c76ac1ee |
1110 | only argument C<p>. |
1111 | |
1112 | =item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)> |
1113 | |
1114 | At the end of I<pseudo-block> the function C<f> is called with the |
1115 | implicit context argument (if any), and C<p>. |
d1c897a1 |
1116 | |
1117 | =item C<SAVESTACK_POS()> |
1118 | |
1119 | The current offset on the Perl internal stack (cf. C<SP>) is restored |
1120 | at the end of I<pseudo-block>. |
1121 | |
1122 | =back |
1123 | |
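How a save macro and an C<ENTER>/C<LEAVE> pair cooperate can be pictured with a
toy save stack. The sketch below is plain C and only a model: the C<toy_save_*>
names are invented, only C<int> variables are handled, and a non-local exit via
die() is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: toy_save_* names are hypothetical, not Perl API. */
#define TOY_SAVE_MAX 32

typedef struct { int *addr; int old; } toy_save_entry;

static toy_save_entry toy_save_stack[TOY_SAVE_MAX];
static size_t toy_save_ix = 0;
static size_t toy_save_scopes[TOY_SAVE_MAX];
static size_t toy_save_scope_ix = 0;

/* Open a pseudo-block: remember how tall the save stack is right now. */
static void toy_save_enter(void) {
    assert(toy_save_scope_ix < TOY_SAVE_MAX);
    toy_save_scopes[toy_save_scope_ix++] = toy_save_ix;
}

/* Record where the variable lives and what it currently holds. */
static void toy_save_int(int *p) {
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_save_stack[toy_save_ix].addr = p;
    toy_save_stack[toy_save_ix].old = *p;
    toy_save_ix++;
}

/* Close the pseudo-block: undo, newest first, everything saved inside it. */
static void toy_save_leave(void) {
    size_t base;
    assert(toy_save_scope_ix > 0);
    base = toy_save_scopes[--toy_save_scope_ix];
    while (toy_save_ix > base) {
        toy_save_ix--;
        *toy_save_stack[toy_save_ix].addr = toy_save_stack[toy_save_ix].old;
    }
}

/* Two nested pseudo-blocks; returns inner*10 + outer restored values. */
static int toy_save_demo(void) {
    int level = 1, inner;
    toy_save_enter();
    toy_save_int(&level);
    level = 2;
    toy_save_enter();
    toy_save_int(&level);
    level = 3;
    toy_save_leave();           /* level is 2 again */
    inner = level;
    toy_save_leave();           /* level is 1 again */
    return inner * 10 + level;
}
```

Because each saved entry carries the address of the variable, the code between
the two calls can assign to the variable freely, and closing the pseudo-block
puts the old value back without any further help from the caller.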
1124 | The following API list contains functions, thus one needs to |
1125 | provide pointers to the modifiable data explicitly (either C pointers, |
1126 | or Perlish C<GV *>s). Where the above macros take C<int>, a similar |
1127 | function takes C<int *>. |
1128 | |
1129 | =over |
1130 | |
1131 | =item C<SV* save_scalar(GV *gv)> |
1132 | |
1133 | Equivalent to Perl code C<local $gv>. |
1134 | |
1135 | =item C<AV* save_ary(GV *gv)> |
1136 | |
1137 | =item C<HV* save_hash(GV *gv)> |
1138 | |
1139 | Similar to C<save_scalar>, but localize C<@gv> and C<%gv>. |
1140 | |
1141 | =item C<void save_item(SV *item)> |
1142 | |
1143 | Duplicates the current value of C<SV>. On exit from the current |
1144 | C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored |
1145 | using the stored value. |
1146 | |
1147 | =item C<void save_list(SV **sarg, I32 maxsarg)> |
1148 | |
1149 | A variant of C<save_item> which takes multiple arguments via an array |
1150 | C<sarg> of C<SV*> of length C<maxsarg>. |
1151 | |
1152 | =item C<SV* save_svref(SV **sptr)> |
1153 | |
1154 | Similar to C<save_scalar>, but will reinstate a C<SV *>. |
1155 | |
1156 | =item C<void save_aptr(AV **aptr)> |
1157 | |
1158 | =item C<void save_hptr(HV **hptr)> |
1159 | |
1160 | Similar to C<save_svref>, but localize C<AV *> and C<HV *>. |
1161 | |
1162 | =back |
1163 | |
1164 | The C<Alias> module implements localization of the basic types within the |
1165 | I<caller's scope>. People who are interested in how to localize things in |
1166 | the containing scope should take a look there too. |
1167 | |
0a753a76 |
1168 | =head1 Subroutines |
a0d0e21e |
1169 | |
68dc0745 |
1170 | =head2 XSUBs and the Argument Stack |
5f05dabc |
1171 | |
1172 | The XSUB mechanism is a simple way for Perl programs to access C subroutines. |
1173 | An XSUB routine will have a stack that contains the arguments from the Perl |
1174 | program, and a way to map from the Perl data structures to a C equivalent. |
1175 | |
1176 | The stack arguments are accessible through the C<ST(n)> macro, which returns |
1177 | the C<n>'th stack argument. Argument 0 is the first argument passed in the |
1178 | Perl subroutine call. These arguments are C<SV*>, and can be used anywhere |
1179 | an C<SV*> is used. |
1180 | |
1181 | Most of the time, output from the C routine can be handled through use of |
1182 | the RETVAL and OUTPUT directives. However, there are some cases where the |
1183 | argument stack is not already long enough to handle all the return values. |
1184 | An example is the POSIX tzname() call, which takes no arguments, but returns |
1185 | two, the local time zone's standard and summer time abbreviations. |
1186 | |
1187 | To handle this situation, the PPCODE directive is used and the stack is |
1188 | extended using the macro: |
1189 | |
924508f0 |
1190 | EXTEND(SP, num); |
5f05dabc |
1191 | |
924508f0 |
1192 | where C<SP> is the macro that represents the local copy of the stack pointer, |
1193 | and C<num> is the number of elements the stack should be extended by. |
5f05dabc |
1194 | |
1195 | Now that there is room on the stack, values can be pushed on it using the |
54310121 |
1196 | macros to push IVs, doubles, strings, and SV pointers respectively: |
5f05dabc |
1197 | |
1198 | PUSHi(IV) |
1199 | PUSHn(double) |
1200 | PUSHp(char*, I32) |
1201 | PUSHs(SV*) |
1202 | |
1203 | In the Perl program calling C<tzname>, the two values will then be assigned |
1204 | as in: |
1205 | |
1206 | ($standard_abbrev, $summer_abbrev) = POSIX::tzname; |
1207 | |
1208 | An alternative (and possibly simpler) method of pushing values on the stack is |
1209 | to use the macros: |
1210 | |
1211 | XPUSHi(IV) |
1212 | XPUSHn(double) |
1213 | XPUSHp(char*, I32) |
1214 | XPUSHs(SV*) |
1215 | |
1216 | These macros automatically adjust the stack for you, if needed. Thus, you |
1217 | do not need to call C<EXTEND> to extend the stack. |
1218 | |
1219 | For more information, consult L<perlxs> and L<perlxstut>. |
1220 | |
1221 | =head2 Calling Perl Routines from within C Programs |
a0d0e21e |
1222 | |
1223 | There are four routines that can be used to call a Perl subroutine from |
1224 | within a C program. These four are: |
1225 | |
954c1994 |
1226 | I32 call_sv(SV*, I32); |
1227 | I32 call_pv(const char*, I32); |
1228 | I32 call_method(const char*, I32); |
1229 | I32 call_argv(const char*, I32, register char**); |
a0d0e21e |
1230 | |
954c1994 |
1231 | The routine most often used is C<call_sv>. The C<SV*> argument |
d1b91892 |
1232 | contains either the name of the Perl subroutine to be called, or a |
1233 | reference to the subroutine. The second argument consists of flags |
1234 | that control the context in which the subroutine is called, whether |
1235 | or not the subroutine is being passed arguments, how errors should be |
1236 | trapped, and how to treat return values. |
a0d0e21e |
1237 | |
1238 | All four routines return the number of arguments that the subroutine returned |
1239 | on the Perl stack. |
1240 | |
954c1994 |
1241 | These routines used to be called C<perl_call_sv> etc., before Perl v5.6.0, |
1242 | but those names are now deprecated; macros of the same name are provided for |
1243 | compatibility. |
1244 | |
1245 | When using any of these routines (except C<call_argv>), the programmer |
d1b91892 |
1246 | must manipulate the Perl stack. The tools for this include the following macros and |
1247 | functions: |
a0d0e21e |
1248 | |
1249 | dSP |
924508f0 |
1250 | SP |
a0d0e21e |
1251 | PUSHMARK() |
1252 | PUTBACK |
1253 | SPAGAIN |
1254 | ENTER |
1255 | SAVETMPS |
1256 | FREETMPS |
1257 | LEAVE |
1258 | XPUSH*() |
cb1a09d0 |
1259 | POP*() |
a0d0e21e |
1260 | |
5f05dabc |
1261 | For a detailed description of calling conventions from C to Perl, |
1262 | consult L<perlcall>. |
a0d0e21e |
1263 | |
5f05dabc |
1264 | =head2 Memory Allocation |
a0d0e21e |
1265 | |
86058a2d |
1266 | All memory meant to be used with the Perl API functions should be manipulated |
1267 | using the macros described in this section. The macros provide the necessary |
1268 | transparency between differences in the actual malloc implementation that is |
1269 | used within perl. |
1270 | |
1271 | It is suggested that you enable the version of malloc that is distributed |
5f05dabc |
1272 | with Perl. It keeps pools of various sizes of unallocated memory in |
07fa94a1 |
1273 | order to satisfy allocation requests more quickly. However, on some |
1274 | platforms, it may cause spurious malloc or free errors. |
d1b91892 |
1275 | |
1276 | New(x, pointer, number, type); |
1277 | Newc(x, pointer, number, type, cast); |
1278 | Newz(x, pointer, number, type); |
1279 | |
07fa94a1 |
1280 | These three macros are used to initially allocate memory. |
5f05dabc |
1281 | |
1282 | The first argument C<x> was a "magic cookie" that was used to keep track |
1283 | of who called the macro, to help when debugging memory problems. However, |
07fa94a1 |
1284 | the current code makes no use of this feature (most Perl developers now |
1285 | use run-time memory checkers), so this argument can be any number. |
5f05dabc |
1286 | |
1287 | The second argument C<pointer> should be the name of a variable that will |
1288 | point to the newly allocated memory. |
d1b91892 |
1289 | |
d1b91892 |
1290 | The third and fourth arguments C<number> and C<type> specify how many of |
1291 | the specified type of data structure should be allocated. The argument |
1292 | C<type> is passed to C<sizeof>. The final argument to C<Newc>, C<cast>, |
1293 | should be used if the C<pointer> argument is different from the C<type> |
1294 | argument. |
1295 | |
1296 | Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero> |
1297 | to zero out all the newly allocated memory. |
1298 | |
1299 | Renew(pointer, number, type); |
1300 | Renewc(pointer, number, type, cast); |
1301 | Safefree(pointer) |
1302 | |
1303 | These three macros are used to change a memory buffer size or to free a |
1304 | piece of memory no longer needed. The arguments to C<Renew> and C<Renewc> |
1305 | match those of C<New> and C<Newc> with the exception of not needing the |
1306 | "magic cookie" argument. |
1307 | |
1308 | Move(source, dest, number, type); |
1309 | Copy(source, dest, number, type); |
1310 | Zero(dest, number, type); |
1311 | |
1312 | These three macros are used to move, copy, or zero out previously allocated |
1313 | memory. The C<source> and C<dest> arguments point to the source and |
1314 | destination starting points. Perl will move, copy, or zero out C<number> |
1315 | instances of the size of the C<type> data structure (using the C<sizeof> |
1316 | operator). |
a0d0e21e |
1317 | |
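One detail that is easy to miss is the argument order: these macros take the
source first and the destination second, the reverse of the C library
routines. The sketch below is plain C with stand-in macros, on the assumption
that C<Move> behaves like C<memmove>, C<Copy> like C<memcpy>, and C<Zero> like
a zero-filling C<memset>. The C<TOY_MEM_*> names are invented so that the
example builds without the Perl headers.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins only (hypothetical names); note the (source, dest) order. */
#define TOY_MEM_MOVE(s, d, n, t) ((void)memmove((d), (s), (n) * sizeof(t)))
#define TOY_MEM_COPY(s, d, n, t) ((void)memcpy((d), (s), (n) * sizeof(t)))
#define TOY_MEM_ZERO(d, n, t)    ((void)memset((d), 0, (n) * sizeof(t)))

/* Shift the first n-1 elements one slot to the right and clear slot 0.
 * Source and destination overlap, so the move variant is required. */
static void toy_mem_shift_right(int *buf, size_t n) {
    assert(buf != NULL && n > 0);
    TOY_MEM_MOVE(buf, buf + 1, n - 1, int);
    TOY_MEM_ZERO(buf, 1, int);
}

/* Returns 1 if shifting and then copying produced {0, 1, 2, 3}. */
static int toy_mem_demo(void) {
    int a[4] = { 1, 2, 3, 4 };
    int b[4];
    toy_mem_shift_right(a, 4);
    TOY_MEM_COPY(a, b, 4, int);     /* separate buffers, a plain copy is fine */
    return b[0] == 0 && b[1] == 1 && b[2] == 2 && b[3] == 3;
}
```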
5f05dabc |
1318 | =head2 PerlIO |
ce3d39e2 |
1319 | |
5f05dabc |
1320 | The most recent development releases of Perl have been experimenting with |
1321 | removing Perl's dependency on the "normal" standard I/O suite and allowing |
1322 | other stdio implementations to be used. This involves creating a new |
1323 | abstraction layer that then calls whichever implementation of stdio Perl |
68dc0745 |
1324 | was compiled with. All XSUBs should now use the functions in the PerlIO |
5f05dabc |
1325 | abstraction layer and not make any assumptions about what kind of stdio |
1326 | is being used. |
1327 | |
1328 | For a complete description of the PerlIO abstraction, consult L<perlapio>. |
1329 | |
8ebc5c01 |
1330 | =head2 Putting a C value on Perl stack |
ce3d39e2 |
1331 | |
1332 | A lot of opcodes (an opcode is an elementary operation in the internal perl |
1333 | stack machine) put an SV* on the stack. However, as an optimization |
1334 | the corresponding SV is (usually) not recreated each time. The opcodes |
1335 | reuse specially assigned SVs (I<target>s) which are (as a corollary) |
1336 | not constantly freed/created. |
1337 | |
0a753a76 |
1338 | Each of the targets is created only once (but see |
ce3d39e2 |
1339 | L<Scratchpads and recursion> below), and when an opcode needs to put |
1340 | an integer, a double, or a string on the stack, it just sets the |
1341 | corresponding parts of its I<target> and puts the I<target> on the stack. |
1342 | |
1343 | The macro to put this target on the stack is C<PUSHTARG>, and it is |
1344 | directly used in some opcodes, as well as indirectly in zillions of |
1345 | others, which use it via C<(X)PUSH[pni]>. |
1346 | |
8ebc5c01 |
1347 | =head2 Scratchpads |
ce3d39e2 |
1348 | |
54310121 |
1349 | The question remains as to when the SVs which are I<target>s for opcodes |
5f05dabc |
1350 | are created. The answer is that they are created when the current unit -- |
1351 | a subroutine or a file (for opcodes for statements outside of |
1352 | subroutines) -- is compiled. During this time a special anonymous Perl |
ce3d39e2 |
1353 | array is created, which is called a scratchpad for the current |
1354 | unit. |
1355 | |
54310121 |
1356 | A scratchpad keeps SVs which are lexicals for the current unit and are |
ce3d39e2 |
1357 | targets for opcodes. One can deduce that an SV lives on a scratchpad |
1358 | by looking at its flags: lexicals have C<SVs_PADMY> set, and |
1359 | I<target>s have C<SVs_PADTMP> set. |
1360 | |
54310121 |
1361 | The correspondence between OPs and I<target>s is not 1-to-1. Different |
1362 | OPs in the compile tree of the unit can use the same target, if this |
ce3d39e2 |
1363 | would not conflict with the expected life of the temporary. |
1364 | |
2ae324a7 |
1365 | =head2 Scratchpads and recursion |
ce3d39e2 |
1366 | |
1367 | In fact it is not 100% true that a compiled unit contains a pointer to |
1368 | the scratchpad AV. In fact it contains a pointer to an AV of |
1369 | (initially) one element, and this element is the scratchpad AV. Why do |
1370 | we need an extra level of indirection? |
1371 | |
1372 | The answer is B<recursion>, and maybe (sometime soon) B<threads>. Both |
1373 | these can create several execution pointers going into the same |
1374 | subroutine. For the subroutine-child not to write over the temporaries |
1375 | for the subroutine-parent (lifespan of which covers the call to the |
1376 | child), the parent and the child should have different |
1377 | scratchpads. (I<And> the lexicals should be separate anyway!) |
1378 | |
5f05dabc |
1379 | So each subroutine is born with an array of scratchpads (of length 1). |
1380 | On each entry to the subroutine it is checked that the current |
ce3d39e2 |
1381 | depth of the recursion is not more than the length of this array, and |
1382 | if it is, a new scratchpad is created and pushed into the array. |
1383 | |
1384 | The I<target>s on this scratchpad are C<undef>s, but they are already |
1385 | marked with correct flags. |
1386 | |
0a753a76 |
1387 | =head1 Compiled code |
1388 | |
1389 | =head2 Code tree |
1390 | |
1391 | Here we describe the internal form your code is converted to by |
1392 | Perl. Start with a simple example: |
1393 | |
1394 | $a = $b + $c; |
1395 | |
1396 | This is converted to a tree similar to this one: |
1397 | |
1398 | assign-to |
1399 | / \ |
1400 | + $a |
1401 | / \ |
1402 | $b $c |
1403 | |
7b8d334a |
1404 | (but slightly more complicated). This tree reflects the way Perl |
0a753a76 |
1405 | parsed your code, but has nothing to do with the execution order. |
1406 | There is an additional "thread" going through the nodes of the tree |
1407 | which shows the order of execution of the nodes. In our simplified |
1408 | example above it looks like: |
1409 | |
1410 | $b ---> $c ---> + ---> $a ---> assign-to |
1411 | |
1412 | But with the actual compile tree for C<$a = $b + $c> it is different: |
1413 | some nodes are I<optimized away>. As a corollary, though the actual tree |
1414 | contains more nodes than our simplified example, the execution order |
1415 | is the same as in our example. |
1416 | |
1417 | =head2 Examining the tree |
1418 | |
1419 | If you have your perl compiled for debugging (usually done with C<-D |
1420 | optimize=-g> on the C<Configure> command line), you may examine the |
1421 | compiled tree by specifying C<-Dx> on the Perl command line. The |
1422 | output takes several lines per node, and for C<$b+$c> it looks like |
1423 | this: |
1424 | |
1425 | 5 TYPE = add ===> 6 |
1426 | TARG = 1 |
1427 | FLAGS = (SCALAR,KIDS) |
1428 | { |
1429 | TYPE = null ===> (4) |
1430 | (was rv2sv) |
1431 | FLAGS = (SCALAR,KIDS) |
1432 | { |
1433 | 3 TYPE = gvsv ===> 4 |
1434 | FLAGS = (SCALAR) |
1435 | GV = main::b |
1436 | } |
1437 | } |
1438 | { |
1439 | TYPE = null ===> (5) |
1440 | (was rv2sv) |
1441 | FLAGS = (SCALAR,KIDS) |
1442 | { |
1443 | 4 TYPE = gvsv ===> 5 |
1444 | FLAGS = (SCALAR) |
1445 | GV = main::c |
1446 | } |
1447 | } |
1448 | |
1449 | This tree has 5 nodes (one per C<TYPE> specifier), but only 3 of them are |
1450 | not optimized away (one per number in the left column). The immediate |
1451 | children of the given node correspond to C<{}> pairs on the same level |
1452 | of indentation, thus this listing corresponds to the tree: |
1453 | |
1454 | add |
1455 | / \ |
1456 | null null |
1457 | | | |
1458 | gvsv gvsv |
1459 | |
1460 | The execution order is indicated by C<===E<gt>> marks, thus it is C<3 |
1461 | 4 5 6> (node C<6> is not included in the above listing), i.e., |
1462 | C<gvsv gvsv add whatever>. |
1463 | |
1464 | =head2 Compile pass 1: check routines |
1465 | |
8870b5c7 |
1466 | The tree is created by the compiler while I<yacc> code feeds it |
1467 | the constructions it recognizes. Since I<yacc> works bottom-up, so does |
0a753a76 |
1468 | the first pass of perl compilation. |
1469 | |
1470 | What makes this pass interesting for perl developers is that some |
1471 | optimization may be performed on this pass. This is optimization by |
8870b5c7 |
1472 | so-called "check routines". The correspondence between node names |
0a753a76 |
1473 | and corresponding check routines is described in F<opcode.pl> (do not |
1474 | forget to run C<make regen_headers> if you modify this file). |
1475 | |
1476 | A check routine is called when the node is fully constructed except |
7b8d334a |
1477 | for the execution-order thread. Since at this time there are no |
0a753a76 |
1478 | back-links to the currently constructed node, one can do almost any |
1479 | operation to the top-level node, including freeing it and/or creating |
1480 | new nodes above/below it. |
1481 | |
1482 | The check routine returns the node which should be inserted into the |
1483 | tree (if the top-level node was not modified, check routine returns |
1484 | its argument). |
1485 | |
1486 | By convention, check routines have names C<ck_*>. They are usually |
1487 | called from C<new*OP> subroutines (or C<convert>) (which in turn are |
1488 | called from F<perly.y>). |
1489 | |
1490 | =head2 Compile pass 1a: constant folding |
1491 | |
1492 | Immediately after the check routine is called the returned node is |
1493 | checked for being compile-time executable. If it is (the value is |
1494 | judged to be constant) it is immediately executed, and a I<constant> |
1495 | node with the "return value" of the corresponding subtree is |
1496 | substituted instead. The subtree is deleted. |
1497 | |
1498 | If constant folding was not performed, the execution-order thread is |
1499 | created. |
1500 | |
1501 | =head2 Compile pass 2: context propagation |
1502 | |
1503 | When a context for a part of compile tree is known, it is propagated |
a3cb178b |
1504 | down through the tree. At this time the context can have 5 values |
0a753a76 |
1505 | (instead of 2 for runtime context): void, boolean, scalar, list, and |
1506 | lvalue. In contrast with the pass 1 this pass is processed from top |
1507 | to bottom: a node's context determines the context for its children. |
1508 | |
1509 | Additional context-dependent optimizations are performed at this time. |
1510 | Since at this moment the compile tree contains back-references (via |
1511 | "thread" pointers), nodes cannot be free()d now. To allow |
1512 | optimized-away nodes at this stage, such nodes are null()ified instead |
1513 | of free()ing (i.e. their type is changed to OP_NULL). |
1514 | |
1515 | =head2 Compile pass 3: peephole optimization |
1516 | |
1517 | After the compile tree for a subroutine (or for an C<eval> or a file) |
1518 | is created, an additional pass over the code is performed. This pass |
1519 | is neither top-down nor bottom-up, but in the execution order (with |
7b8d334a |
1520 | additional complications for conditionals). These optimizations are |
0a753a76 |
1521 | done in the subroutine peep(). Optimizations performed at this stage |
1522 | are subject to the same restrictions as in the pass 2. |
1523 | |
954c1994 |
1524 | =head1 How multiple interpreters and concurrency are supported |
ee072b34 |
1525 | |
ee072b34 |
1526 | =head2 Background and PERL_IMPLICIT_CONTEXT |
1527 | |
1528 | The Perl interpreter can be regarded as a closed box: it has an API |
1529 | for feeding it code or otherwise making it do things, but it also has |
1530 | functions for its own use. This smells a lot like an object, and |
1531 | there are ways for you to build Perl so that you can have multiple |
1532 | interpreters, with one interpreter represented either as a C++ object, |
1533 | a C structure, or inside a thread. The thread, the C structure, or |
1534 | the C++ object will contain all the context, the state of that |
1535 | interpreter. |
1536 | |
54aff467 |
1537 | Three macros control the major Perl build flavors: MULTIPLICITY, |
1538 | USE_THREADS and PERL_OBJECT. The MULTIPLICITY build has a C structure |
1539 | that packages all the interpreter state, there is a similar thread-specific |
1540 | data structure under USE_THREADS, and the PERL_OBJECT build has a C++ |
1541 | class to maintain interpreter state. In all three cases, |
1542 | PERL_IMPLICIT_CONTEXT is also normally defined, and enables the |
1543 | support for passing in a "hidden" first argument that represents all three |
651a3225 |
1544 | data structures. |
54aff467 |
1545 | |
1546 | All this obviously requires a way for the Perl internal functions to be |
ee072b34 |
1547 | C++ methods, subroutines taking some kind of structure as the first |
1548 | argument, or subroutines taking nothing as the first argument. To |
1549 | enable these three very different ways of building the interpreter, |
1550 | the Perl source (as it does in so many other situations) makes heavy |
1551 | use of macros and subroutine naming conventions. |
1552 | |
54aff467 |
1553 | First problem: deciding which functions will be public API functions and |
954c1994 |
1554 | which will be private. All functions whose names begin with C<S_> are private |
1555 | (think "S" for "secret" or "static"). All other functions begin with |
1556 | "Perl_", but just because a function begins with "Perl_" does not mean it is |
1557 | part of the API. The easiest way to be B<sure> a function is part of the API |
1558 | is to find its entry in L<perlapi>. If it exists in L<perlapi>, it's part |
4375e838 |
1559 | of the API. If it doesn't, and you think it should be (i.e., you need it for |
1560 | your extension), send mail via L<perlbug> explaining why you think it |
954c1994 |
1561 | should be. |
1562 | |
1563 | (L<perlapi> itself is generated by embed.pl, a Perl script that generates |
1564 | significant portions of the Perl source code. It has a list of almost |
1565 | all the functions defined by the Perl interpreter along with their calling |
1566 | characteristics and some flags. Functions that are part of the public API |
1567 | are marked with an 'A' in their flags.) |
ee072b34 |
1568 | |
1569 | Second problem: there must be a syntax so that the same subroutine |
1570 | declarations and calls can pass a structure as their first argument, |
1571 | or pass nothing. To solve this, the subroutines are named and |
1572 | declared in a particular way. Here's a typical start of a static |
1573 | function used within the Perl guts: |
1574 | |
1575 | STATIC void |
1576 | S_incline(pTHX_ char *s) |
1577 | |
1578 | STATIC becomes "static" in C, and is #define'd to nothing in C++. |
1579 | |
651a3225 |
1580 | A public function (i.e. part of the internal API, but not necessarily |
1581 | sanctioned for use in extensions) begins like this: |
ee072b34 |
1582 | |
1583 | void |
1584 | Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv) |
1585 | |
1586 | C<pTHX_> is one of a number of macros (in perl.h) that hide the |
1587 | details of the interpreter's context. THX stands for "thread", "this", |
1588 | or "thingy", as the case may be. (And no, George Lucas is not involved. :-) |
1589 | The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, |
1590 | or 'd' for B<d>eclaration. |
1591 | |
1592 | When Perl is built without PERL_IMPLICIT_CONTEXT, there is no first |
1593 | argument containing the interpreter's context. The trailing underscore |
1594 | in the pTHX_ macro indicates that the macro expansion needs a comma |
1595 | after the context argument because other arguments follow it. If |
1596 | PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the |
54aff467 |
1597 | subroutine is not prototyped to take the extra argument. The form of the |
1598 | macro without the trailing underscore is used when there are no additional |
ee072b34 |
1599 | explicit arguments. |
1600 | |
54aff467 |
1601 | When a core function calls another, it must pass the context. This |
ee072b34 |
1602 | is normally hidden via macros. Consider C<sv_setsv>. It expands |
1603 | something like this: |
1604 | |
1605 | #ifdef PERL_IMPLICIT_CONTEXT |
1606 | # define sv_setsv(a,b) Perl_sv_setsv(aTHX_ a, b) |
1607 | /* can't do this for vararg functions, see below */ |
1608 | #else |
1609 | # define sv_setsv Perl_sv_setsv |
1610 | #endif |
1611 | |
1612 | This works well, and means that XS authors can gleefully write: |
1613 | |
1614 | sv_setsv(foo, bar); |
1615 | |
1616 | and still have it work under all the modes Perl could have been |
1617 | compiled with. |
1618 | |
1619 | Under PERL_OBJECT in the core, that will translate to either: |
1620 | |
1621 | CPerlObj::Perl_sv_setsv(foo,bar); # in CPerlObj functions, |
1622 | # C++ takes care of 'this' |
1623 | or |
1624 | |
1625 | pPerl->Perl_sv_setsv(foo,bar); # in truly static functions, |
1626 | # see objXSUB.h |
1627 | |
1628 | Under PERL_OBJECT in extensions (aka PERL_CAPI), or under |
1629 | MULTIPLICITY/USE_THREADS w/ PERL_IMPLICIT_CONTEXT in both core |
1630 | and extensions, it will be: |
1631 | |
1632 | Perl_sv_setsv(aTHX_ foo, bar); # the canonical Perl "API" |
1633 | # for all build flavors |
1634 | |
1635 | This doesn't work so cleanly for varargs functions, though, as macros |
1636 | imply that the number of arguments is known in advance. Instead we |
1637 | either need to spell them out fully, passing C<aTHX_> as the first |
1638 | argument (the Perl core tends to do this with functions like |
1639 | Perl_warner), or use a context-free version. |
1640 | |
1641 | The context-free version of Perl_warner is called |
1642 | Perl_warner_nocontext, and does not take the extra argument. Instead |
1643 | it does dTHX; to get the context from thread-local storage. We |
1644 | C<#define warner Perl_warner_nocontext> so that extensions get source |
1645 | compatibility at the expense of performance. (Passing an arg is |
1646 | cheaper than grabbing it from thread-local storage.) |
1647 | |
1648 | You can ignore [pad]THX[xo] when browsing the Perl headers/sources. |
1649 | Those are strictly for use within the core. Extensions and embedders |
1650 | need only be aware of [pad]THX. |
1651 | |
1652 | =head2 How do I use all this in extensions? |
1653 | |
1654 | When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call |
1655 | any functions in the Perl API will need to pass the initial context |
1656 | argument somehow. The kicker is that you will need to write it in |
1657 | such a way that the extension still compiles when Perl hasn't been |
1658 | built with PERL_IMPLICIT_CONTEXT enabled. |
1659 | |
1660 | There are three ways to do this. First, the easy but inefficient way, |
1661 | which is also the default, in order to maintain source compatibility |
1662 | with extensions: whenever XSUB.h is #included, it redefines the aTHX |
1663 | and aTHX_ macros to call a function that will return the context. |
1664 | Thus, something like: |
1665 | |
1666 | sv_setsv(asv, bsv); |
1667 | |
4375e838 |
1668 | in your extension will translate to this when PERL_IMPLICIT_CONTEXT is |
54aff467 |
1669 | in effect: |
ee072b34 |
1670 | |
2fa86c13 |
1671 | Perl_sv_setsv(Perl_get_context(), asv, bsv); |
ee072b34 |
1672 | |
54aff467 |
1673 | or to this otherwise: |
ee072b34 |
1674 | |
1675 | Perl_sv_setsv(asv, bsv); |
1676 | |
1677 | You have to do nothing new in your extension to get this; since |
2fa86c13 |
1678 | the Perl library provides Perl_get_context(), it will all just |
ee072b34 |
1679 | work. |
1680 | |
1681 | The second, more efficient way is to use the following template for |
1682 | your Foo.xs: |
1683 | |
1684 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
1685 | #include "EXTERN.h" |
1686 | #include "perl.h" |
1687 | #include "XSUB.h" |
1688 | |
1689 | static SV *my_private_function(int arg1, int arg2); |
1690 | |
1691 | static SV * |
54aff467 |
1692 | my_private_function(int arg1, int arg2) |
ee072b34 |
1693 | { |
1694 | dTHX; /* fetch context */ |
1695 | ... call many Perl API functions ... |
1696 | } |
1697 | |
1698 | [... etc ...] |
1699 | |
1700 | MODULE = Foo PACKAGE = Foo |
1701 | |
1702 | /* typical XSUB */ |
1703 | |
1704 | void |
1705 | my_xsub(arg) |
1706 | int arg |
1707 | CODE: |
1708 | my_private_function(arg, 10); |
1709 | |
1710 | Note that the only two changes from the normal way of writing an |
1711 | extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before |
1712 | including the Perl headers, followed by a C<dTHX;> declaration at |
1713 | the start of every function that will call the Perl API. (You'll |
1714 | know which functions need this, because the C compiler will complain |
1715 | that there's an undeclared identifier in those functions.) No changes |
1716 | are needed for the XSUBs themselves, because the XS() macro is |
1717 | correctly defined to pass in the implicit context if needed. |
1718 | |
1719 | The third, even more efficient way is to ape how it is done within |
1720 | the Perl guts: |
1721 | |
1722 | |
1723 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
1724 | #include "EXTERN.h" |
1725 | #include "perl.h" |
1726 | #include "XSUB.h" |
1727 | |
1728 | /* pTHX_ only needed for functions that call Perl API */ |
1729 | static SV *my_private_function(pTHX_ int arg1, int arg2); |
1730 | |
1731 | static SV * |
1732 | my_private_function(pTHX_ int arg1, int arg2) |
1733 | { |
1734 | /* dTHX; not needed here, because THX is an argument */ |
1735 | ... call Perl API functions ... |
1736 | } |
1737 | |
1738 | [... etc ...] |
1739 | |
1740 | MODULE = Foo PACKAGE = Foo |
1741 | |
1742 | /* typical XSUB */ |
1743 | |
1744 | void |
1745 | my_xsub(arg) |
1746 | int arg |
1747 | CODE: |
1748 | my_private_function(aTHX_ arg, 10); |
1749 | |
1750 | This implementation never has to fetch the context using a function |
1751 | call, since it is always passed as an extra argument. Depending on |
1752 | your needs for simplicity or efficiency, you may mix the previous |
1753 | two approaches freely. |
1754 | |
651a3225 |
1755 | Never add a comma after C<pTHX> yourself--always use the form of the |
1756 | macro with the underscore for functions that take explicit arguments, |
1757 | or the form without the underscore for functions with no explicit arguments. |
ee072b34 |
1758 | |
1759 | =head2 Future Plans and PERL_IMPLICIT_SYS |
1760 | |
1761 | Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything |
1762 | that the interpreter knows about itself and pass it around, so too are |
1763 | there plans to allow the interpreter to bundle up everything it knows |
1764 | about the environment it's running on. This is enabled with the |
1765 | PERL_IMPLICIT_SYS macro. Currently it only works with PERL_OBJECT, |
1766 | but is mostly there for MULTIPLICITY and USE_THREADS (see inside |
1767 | iperlsys.h). |
1768 | |
1769 | This makes it possible to provide an extra pointer (called the "host" |
1770 | environment) for all the system calls. It lets all the system-level |
1771 | code maintain its own state, broken down into |
1772 | seven C structures. These are thin wrappers around the usual system |
1773 | calls (see win32/perllib.c) for the default perl executable, but for a |
1774 | more ambitious host (like the one that would do fork() emulation) all |
1775 | the extra work needed to pretend that different interpreters are |
1776 | actually different "processes", would be done here. |
1777 | |
1778 | The Perl engine/interpreter and the host are orthogonal entities. |
1779 | There could be one or more interpreters in a process, and one or |
1780 | more "hosts", with free association between them. |
1781 | |
954c1994 |
1782 | =head1 AUTHORS |
e89caa19 |
1783 | |
954c1994 |
1784 | Until May 1997, this document was maintained by Jeff Okamoto |
1785 | <okamoto@corp.hp.com>. It is now maintained as part of Perl itself |
1786 | by the Perl 5 Porters <perl5-porters@perl.org>. |
cb1a09d0 |
1787 | |
954c1994 |
1788 | With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, |
1789 | Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil |
1790 | Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, |
1791 | Stephen McCamant, and Gurusamy Sarathy. |
cb1a09d0 |
1792 | |
954c1994 |
1793 | API Listing originally by Dean Roehrich <roehrich@cray.com>. |
cb1a09d0 |
1794 | |
954c1994 |
1795 | Modifications to autogenerate the API listing (L<perlapi>) by Benjamin |
1796 | Stuhl. |
cb1a09d0 |
1797 | |
954c1994 |
1798 | =head1 SEE ALSO |
cb1a09d0 |
1799 | |
954c1994 |
1800 | perlapi(1), perlintern(1), perlxs(1), perlembed(1) |