Commit | Line | Data |
e8cd7eae |
1 | =head1 NAME |
2 | |
3 | perlhack - How to hack at the Perl internals |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | This document attempts to explain how Perl development takes place, |
8 | and ends with some suggestions for people wanting to become bona fide |
9 | porters. |
10 | |
11 | The perl5-porters mailing list is where the Perl standard distribution |
12 | is maintained and developed. The list can get anywhere from 10 to 150 |
13 | messages a day, depending on the heatedness of the debate. Most days |
14 | there are two or three patches, extensions, features, or bugs being |
15 | discussed at a time. |
16 | |
f8e3975a |
17 | A searchable archive of the list is at either: |
e8cd7eae |
18 | |
19 | http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/ |
20 | |
f8e3975a |
21 | or |
22 | |
23 | http://archive.develooper.com/perl5-porters@perl.org/ |
24 | |
e8cd7eae |
25 | List subscribers (the porters themselves) come in several flavours. |
26 | Some are quiet curious lurkers, who rarely pitch in and instead watch |
27 | the ongoing development to ensure they're forewarned of new changes or |
28 | features in Perl. Some are representatives of vendors, who are there |
29 | to make sure that Perl continues to compile and work on their |
30 | platforms. Some patch any reported bug that they know how to fix, |
31 | some are actively patching their pet area (threads, Win32, the regexp |
32 | engine), while others seem to do nothing but complain. In other |
33 | words, it's your usual mix of technical people. |
34 | |
35 | Over this group of porters presides Larry Wall. He has the final word |
f6c51b38 |
36 | in what does and does not change in the Perl language. Various |
37 | releases of Perl are shepherded by a ``pumpking'', a porter |
38 | responsible for gathering patches, deciding on a patch-by-patch, |
39 | feature-by-feature basis what will and will not go into the release. |
caf100c0 |
40 | For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of |
961f29c6 |
41 | Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and |
42 | Hugo van der Sanden and Rafael Garcia-Suarez share the pumpking role for |
43 | the 5.10 release. |
e8cd7eae |
44 | |
45 | In addition, various people are pumpkings for different things. For |
961f29c6 |
46 | instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the |
47 | I<Configure> pumpkin up till the 5.8 release. For the 5.10 release |
48 | H.Merijn Brand took over. |
e8cd7eae |
49 | |
50 | Larry sees Perl development along the lines of the US government: |
51 | there's the Legislature (the porters), the Executive branch (the |
52 | pumpkings), and the Supreme Court (Larry). The legislature can |
53 | discuss and submit patches to the executive branch all they like, but |
54 | the executive branch is free to veto them. Rarely, the Supreme Court |
55 | will side with the executive branch over the legislature, or the |
56 | legislature over the executive branch. Mostly, however, the |
57 | legislature and the executive branch are supposed to get along and |
58 | work out their differences without impeachment or court cases. |
59 | |
60 | You might sometimes see reference to Rule 1 and Rule 2. Larry's power |
61 | as Supreme Court is expressed in The Rules: |
62 | |
63 | =over 4 |
64 | |
65 | =item 1 |
66 | |
67 | Larry is always by definition right about how Perl should behave. |
68 | This means he has final veto power on the core functionality. |
69 | |
70 | =item 2 |
71 | |
72 | Larry is allowed to change his mind about any matter at a later date, |
73 | regardless of whether he previously invoked Rule 1. |
74 | |
75 | =back |
76 | |
77 | Got that? Larry is always right, even when he was wrong. It's rare |
78 | to see either Rule exercised, but they are often alluded to. |
79 | |
80 | New features and extensions to the language are contentious, because |
81 | the criteria used by the pumpkings, Larry, and other porters to decide |
82 | which features should be implemented and incorporated are not codified |
83 | in a few small design goals as with some other languages. Instead, |
84 | the heuristics are flexible and often difficult to fathom. Here is |
85 | one person's list, roughly in decreasing order of importance, of |
86 | heuristics that new features have to be weighed against: |
87 | |
88 | =over 4 |
89 | |
90 | =item Does the concept match the general goals of Perl? |
91 | |
92 | These haven't been written anywhere in stone, but one approximation |
93 | is: |
94 | |
95 | 1. Keep it fast, simple, and useful. |
96 | 2. Keep features/concepts as orthogonal as possible. |
97 | 3. No arbitrary limits (platforms, data sizes, cultures). |
98 | 4. Keep it open and exciting to use/patch/advocate Perl everywhere. |
99 | 5. Either assimilate new technologies, or build bridges to them. |
100 | |
101 | =item Where is the implementation? |
102 | |
103 | All the talk in the world is useless without an implementation. In |
104 | almost every case, the person or people who argue for a new feature |
105 | will be expected to be the ones who implement it. Porters capable |
106 | of coding new features have their own agendas, and are not available |
107 | to implement your (possibly good) idea. |
108 | |
109 | =item Backwards compatibility |
110 | |
111 | It's a cardinal sin to break existing Perl programs. New warnings are |
112 | contentious--some say that a program that emits warnings is not |
113 | broken, while others say it is. Adding keywords has the potential to |
114 | break programs; changing the meaning of existing token sequences or |
115 | functions might break programs. |
116 | |
117 | =item Could it be a module instead? |
118 | |
119 | Perl 5 has extension mechanisms, modules and XS, specifically to avoid |
120 | the need to keep changing the Perl interpreter. You can write modules |
121 | that export functions, you can give those functions prototypes so they |
122 | can be called like built-in functions, you can even write XS code to |
123 | mess with the runtime data structures of the Perl interpreter if you |
124 | want to implement really complicated things. If it can be done in a |
125 | module instead of in the core, it's highly unlikely to be added. |
126 | |
127 | =item Is the feature generic enough? |
128 | |
129 | Is this something that only the submitter wants added to the language, |
130 | or would it be broadly useful? Sometimes, instead of adding a feature |
131 | with a tight focus, the porters might decide to wait until someone |
132 | implements the more generalized feature. For instance, instead of |
133 | implementing a ``delayed evaluation'' feature, the porters are waiting |
134 | for a macro system that would permit delayed evaluation and much more. |
135 | |
136 | =item Does it potentially introduce new bugs? |
137 | |
138 | Radical rewrites of large chunks of the Perl interpreter have the |
139 | potential to introduce new bugs. The smaller and more localized the |
140 | change, the better. |
141 | |
142 | =item Does it preclude other desirable features? |
143 | |
144 | A patch is likely to be rejected if it closes off future avenues of |
145 | development. For instance, a patch that placed a true and final |
146 | interpretation on prototypes is likely to be rejected because there |
147 | are still options for the future of prototypes that haven't been |
148 | addressed. |
149 | |
150 | =item Is the implementation robust? |
151 | |
152 | Good patches (tight code, complete, correct) stand more chance of |
153 | going in. Sloppy or incorrect patches might be placed on the back |
154 | burner until the pumpking has time to fix, or might be discarded |
155 | altogether without further notice. |
156 | |
157 | =item Is the implementation generic enough to be portable? |
158 | |
159 | The worst patches make use of system-specific features. It's highly |
160 | unlikely that nonportable additions to the Perl language will be |
161 | accepted. |
162 | |
a936dd3c |
163 | =item Is the implementation tested? |
164 | |
165 | Patches which change behaviour (fixing bugs or introducing new features) |
166 | must include regression tests to verify that everything works as expected. |
167 | Without tests provided by the original author, how can anyone else changing |
168 | perl in the future be sure that they haven't unwittingly broken the behaviour |
169 | the patch implements? And without tests, how can the patch's author be |
9d077eaa |
170 | confident that his/her hard work put into the patch won't be accidentally |
a936dd3c |
171 | thrown away by someone in the future? |
172 | |
e8cd7eae |
173 | =item Is there enough documentation? |
174 | |
175 | Patches without documentation are probably ill-thought out or |
176 | incomplete. Nothing can be added without documentation, so submitting |
177 | a patch for the appropriate manpages as well as the source code is |
a936dd3c |
178 | always a good idea. |
e8cd7eae |
179 | |
180 | =item Is there another way to do it? |
181 | |
182 | Larry said ``Although the Perl Slogan is I<There's More Than One Way |
183 | to Do It>, I hesitate to make 10 ways to do something''. This is a |
184 | tricky heuristic to navigate, though--one man's essential addition is |
185 | another man's pointless cruft. |
186 | |
187 | =item Does it create too much work? |
188 | |
189 | Work for the pumpking, work for Perl programmers, work for module |
190 | authors, ... Perl is supposed to be easy. |
191 | |
f6c51b38 |
192 | =item Patches speak louder than words |
193 | |
194 | Working code is always preferred to pie-in-the-sky ideas. A patch to |
195 | add a feature stands a much higher chance of making it to the language |
196 | than does a random feature request, no matter how fervently argued the |
197 | request might be. This ties into ``Will it be useful?'', as the fact |
198 | that someone took the time to make the patch demonstrates a strong |
199 | desire for the feature. |
200 | |
e8cd7eae |
201 | =back |
202 | |
203 | If you're on the list, you might hear the word ``core'' bandied |
204 | around. It refers to the standard distribution. ``Hacking on the |
205 | core'' means you're changing the C source code to the Perl |
206 | interpreter. ``A core module'' is one that ships with Perl. |
207 | |
a1f349fd |
208 | =head2 Keeping in sync |
209 | |
e8cd7eae |
210 | The source code to the Perl interpreter, in its different versions, is |
f224927c |
211 | kept in a repository managed by a revision control system ( which is |
212 | currently the Perforce program, see http://perforce.com/ ). The |
e8cd7eae |
213 | pumpkings and a few others have access to the repository to check in |
214 | changes. Periodically the pumpking for the development version of Perl |
215 | will release a new version, so the rest of the porters can see what's |
2be4c08b |
216 | changed. The current state of the main trunk of repository, and patches |
217 | that describe the individual changes that have happened since the last |
218 | public release are available at this location: |
219 | |
0cfb3454 |
220 | http://public.activestate.com/gsar/APC/ |
2be4c08b |
221 | ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/ |
222 | |
0cfb3454 |
223 | If you're looking for a particular change, or a change that affected |
224 | a particular set of files, you may find the B<Perl Repository Browser> |
225 | useful: |
226 | |
227 | http://public.activestate.com/cgi-bin/perlbrowse |
228 | |
229 | You may also want to subscribe to the perl5-changes mailing list to |
230 | receive a copy of each patch that gets submitted to the maintenance |
231 | and development "branches" of the perl repository. See |
232 | http://lists.perl.org/ for subscription information. |
233 | |
a1f349fd |
234 | If you are a member of the perl5-porters mailing list, it is a good |
235 | thing to keep in touch with the most recent changes, if only to |
236 | verify that what you would have posted as a bug report isn't already |
237 | solved in the most recent available perl development branch, also |
238 | known as perl-current, bleading edge perl, bleedperl or bleadperl. |
2be4c08b |
239 | |
240 | Needless to say, the source code in perl-current is usually in a perpetual |
241 | state of evolution. You should expect it to be very buggy. Do B<not> use |
242 | it for any purpose other than testing and development. |
e8cd7eae |
243 | |
3e148164 |
244 | Keeping in sync with the most recent branch can be done in several ways, |
245 | but the most convenient and reliable way is using B<rsync>, available at |
246 | ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent |
247 | branch by FTP.) |
a1f349fd |
248 | |
249 | If you choose to keep in sync using rsync, there are two approaches |
3e148164 |
250 | to doing so: |
a1f349fd |
251 | |
252 | =over 4 |
253 | |
254 | =item rsync'ing the source tree |
255 | |
3e148164 |
256 | Presuming you are in the directory where your perl source resides |
a1f349fd |
257 | and you have rsync installed and available, you can `upgrade' to |
258 | the bleadperl using: |
259 | |
260 | # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ . |
261 | |
262 | This takes care of updating every single item in the source tree to |
263 | the latest applied patch level, creating files that are new (to your |
264 | distribution) and setting date/time stamps of existing files to |
265 | reflect the bleadperl status. |
266 | |
c6d0653e |
267 | Note that this will not delete any files that were in '.' before |
268 | the rsync. Once you are sure that the rsync is running correctly, |
269 | run it with the --delete and the --dry-run options like this: |
270 | |
271 | # rsync -avz --delete --dry-run rsync://ftp.linux.activestate.com/perl-current/ . |
272 | |
273 | This will I<simulate> an rsync run that also deletes files not |
274 | present in the bleadperl master copy. Observe the results from |
275 | this run closely. If you are sure that the actual run would delete |
276 | no files precious to you, you could remove the '--dry-run' option. |
277 | |
a1f349fd |
278 | You can then check what patch was the latest that was applied by |
279 | looking in the file B<.patch>, which will show the number of the |
280 | latest patch. |
281 | |
282 | If you have more than one machine to keep in sync, and not all of |
283 | them have access to the WAN (so you are not able to rsync all the |
284 | source trees to the real source), there are some ways to get around |
285 | this problem. |
286 | |
287 | =over 4 |
288 | |
289 | =item Using rsync over the LAN |
290 | |
291 | Set up a local rsync server which makes the rsynced source tree |
3e148164 |
292 | available to the LAN and sync the other machines against this |
a1f349fd |
293 | directory. |
294 | |
1577cd80 |
295 | From http://rsync.samba.org/README.html : |
a1f349fd |
296 | |
297 | "Rsync uses rsh or ssh for communication. It does not need to be |
298 | setuid and requires no special privileges for installation. It |
3958b146 |
299 | does not require an inetd entry or a daemon. You must, however, |
a1f349fd |
300 | have a working rsh or ssh system. Using ssh is recommended for |
301 | its security features." |
302 | |
303 | =item Pushing over NFS |
304 | |
305 | With the other systems mounted over NFS, you can take an |
3e148164 |
306 | active pushing approach by checking the just updated tree against |
307 | the other not-yet synced trees. An example would be |
308 | |
309 |     #!/usr/bin/perl -w |
310 | |
311 |     use strict; |
312 |     use File::Copy; |
313 | |
314 |     my %MF = map { |
315 |         m/(\S+)/; |
316 |         $1 => [ (stat $1)[2, 7, 9] ]; # mode, size, mtime |
317 |     } `cat MANIFEST`; |
318 | |
319 |     my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2); |
320 | |
321 |     foreach my $host (keys %remote) { |
322 |         unless (-d $remote{$host}) { |
323 |             print STDERR "Cannot Xsync for host $host\n"; |
324 |             next; |
325 |         } |
326 |         foreach my $file (keys %MF) { |
327 |             my $rfile = "$remote{$host}/$file"; |
328 |             my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9]; |
329 |             defined $size or ($mode, $size, $mtime) = (0, 0, 0); |
330 |             $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next; |
331 |             printf "%4s %-34s %8d %9d %8d %9d\n", |
332 |                 $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime; |
333 |             unlink $rfile; |
334 |             copy ($file, $rfile); |
335 |             utime time, $MF{$file}[2], $rfile; |
336 |             chmod $MF{$file}[0], $rfile; |
337 |         } |
338 |     } |
339 | |
340 | though this is not perfect. It could be improved by checking |
a1f349fd |
341 | file checksums before updating. Not all NFS implementations handle |
342 | utime reliably. |
343 | |
344 | =back |
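
The checksum improvement mentioned above could look roughly like the
following shell sketch. It assumes that the POSIX C<cksum> utility is
available on the machine doing the push. The function names
C<same_content> and C<push_if_changed> are made up for this
illustration and are not part of the Perl distribution.

```shell
# Hypothetical helpers; the names are illustrative only.

# Succeed if both files exist and have the same CRC and byte count.
same_content() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    # Reading from stdin keeps the file name out of cksum's output.
    a=`cksum < "$1"`
    b=`cksum < "$2"`
    [ "$a" = "$b" ]
}

# Copy a file to its NFS-mounted counterpart only if the content differs.
push_if_changed() {
    if same_content "$1" "$2"; then
        return 0
    fi
    rm -f "$2"
    cp -p "$1" "$2"     # -p asks cp to preserve mode and mtime
}
```

Comparing content instead of size and mtime avoids depending on how
well the NFS server handles utime, at the cost of reading every file
on both sides.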
345 | |
346 | =item rsync'ing the patches |
347 | |
348 | The source tree is maintained by the pumpking who applies patches to |
349 | the files in the tree. These patches are either created by the |
350 | pumpking himself using C<diff -c> after updating the file manually or |
351 | by applying patches sent in by posters on the perl5-porters list. |
352 | These patches are also saved and rsync'able, so you can apply them |
353 | yourself to the source files. |
354 | |
355 | Presuming you are in a directory where your patches reside, you can |
3e148164 |
356 | get them in sync with |
a1f349fd |
357 | |
358 | # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ . |
359 | |
360 | This makes sure the latest available patch is downloaded to your |
361 | patch directory. |
362 | |
3e148164 |
363 | It's then up to you to apply these patches, using something like |
a1f349fd |
364 | |
df3477ff |
365 | # last=`ls -t *.gz | sed q` |
a1f349fd |
366 | # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ . |
367 | # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch |
368 | # cd ../perl-current |
369 | # patch -p1 -N <../perl-current-diffs/blead.patch |
370 | |
371 | or, since this is only a hint towards how it works, use CPAN-patchaperl |
372 | from Andreas König to have better control over the patching process. |
373 | |
374 | =back |
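
The C<find> command shown above does not guarantee any particular
output order, while patches generally need to be applied in the order
they were made. A minimal sketch of one way to list the pending
patches lowest number first follows. It assumes that the diffs are
stored as F<E<lt>change-numberE<gt>.gz> (for example F<7583.gz>),
which is an assumption about the layout of the patch directory, and
the function name C<pending_patches> is hypothetical.

```shell
# List patch files in directory $1 numbered above $2, lowest first.
# Assumes files are named <change-number>.gz (an assumption, see above).
pending_patches() {
    for f in "$1"/*.gz; do
        [ -f "$f" ] || continue
        n=`basename "$f" .gz`
        case $n in
            ''|*[!0-9]*) continue ;;    # skip names that are not plain numbers
        esac
        if [ "$n" -gt "$2" ]; then
            echo "$n"
        fi
    done | sort -n | sed 's,$,.gz,'
}
```

The second argument would typically be the number found in the
F<.patch> file of the source tree, and each name printed can then be
uncompressed and fed to C<patch -p1 -N> as in the commands above.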
375 | |
f7e1e956 |
376 | =head2 Why rsync the source tree |
a1f349fd |
377 | |
378 | =over 4 |
379 | |
10f58044 |
380 | =item It's easier to rsync the source tree |
a1f349fd |
381 | |
382 | Since you don't have to apply the patches yourself, you are sure all |
383 | files in the source tree are in the right state. |
384 | |
a1f349fd |
385 | =item It's more reliable |
386 | |
0cfb3454 |
387 | While both the rsync-able source and patch areas are automatically |
388 | updated every few minutes, keep in mind that applying patches may |
389 | sometimes mean careful hand-holding, especially if your version of |
390 | the C<patch> program does not understand how to deal with new files, |
391 | files with 8-bit characters, or files without trailing newlines. |
a1f349fd |
392 | |
393 | =back |
394 | |
f7e1e956 |
395 | =head2 Why rsync the patches |
a1f349fd |
396 | |
397 | =over 4 |
398 | |
10f58044 |
399 | =item It's easier to rsync the patches |
a1f349fd |
400 | |
401 | If you have more than one machine that you want to keep in sync with |
3e148164 |
402 | bleadperl, it's easier to rsync the patches only once and then apply |
a1f349fd |
403 | them to all the source trees on the different machines. |
404 | |
405 | If you try to keep pace on 5 different machines, only one of |
406 | which has access to the WAN, rsync'ing all the source |
3e148164 |
407 | trees would then have to be done 5 times over NFS. Having |
a1f349fd |
408 | rsync'ed the patches only once, I can apply them to all the source |
3e148164 |
409 | trees automatically. Need I say more ;-) |
a1f349fd |
410 | |
411 | =item It's a good reference |
412 | |
413 | If you do not only like to have the most recent development branch, |
414 | but also like to B<fix> bugs, or extend features, you want to dive |
415 | into the sources. If you are a seasoned perl core diver, you don't |
416 | need no manuals, tips, roadmaps, perlguts.pod or other aids to find |
417 | your way around. But if you are a starter, the patches may help you |
418 | in finding where you should start and how to change the bits that |
419 | bug you. |
420 | |
421 | The file B<Changes> is updated on occasions the pumpking sees as his |
422 | own little sync points. On those occasions, he releases a tar-ball of |
423 | the current source tree (e.g. perl@7582.tar.gz), which will be an |
424 | excellent point to start with when choosing to use the 'rsync the |
425 | patches' scheme. Starting with perl@7582, which means a set of source |
426 | files on which the latest applied patch is number 7582, you apply all |
f18956b7 |
427 | succeeding patches available from then on (7583, 7584, ...). |
a1f349fd |
428 | |
429 | You can use the patches later as a kind of search archive. |
430 | |
431 | =over 4 |
432 | |
433 | =item Finding a start point |
434 | |
435 | If you want to fix/change the behaviour of function/feature Foo, just |
436 | scan the patches for patches that mention Foo either in the subject, |
3e148164 |
437 | the comments, or the body of the fix. Chances are good that the patch shows |
a1f349fd |
438 | you the files affected by that patch, which are very likely |
439 | to be the starting point of your journey into the guts of perl. |
440 | |
441 | =item Finding how to fix a bug |
442 | |
443 | If you've found I<where> the function/feature Foo misbehaves, but you |
444 | don't know how to fix it (but you do know the change you want to |
445 | make), you can, again, peruse the patches for similar changes and |
446 | see how others applied the fix. |
447 | |
448 | =item Finding the source of misbehaviour |
449 | |
450 | When you keep in sync with bleadperl, the pumpking would love to |
3958b146 |
451 | I<see> that the community efforts really work. So after each of his |
a1f349fd |
452 | sync points, you are to 'make test' to check if everything is still |
453 | in working order. If it is, you do 'make ok', which will send an OK |
454 | report to perlbug@perl.org. (If you do not have access to a mailer |
3e148164 |
455 | from the system on which you just successfully ran 'make test', you can |
a1f349fd |
456 | do 'make okfile', which creates the file C<perl.ok>, which you can |
457 | then take to your favourite mailer and mail yourself). |
458 | |
3958b146 |
459 | But of course, things will not always go smoothly, |
a1f349fd |
460 | and one or more tests may fail during 'make test'. Before |
461 | sending in a bug report (using 'make nok' or 'make nokfile'), check |
462 | the mailing list if someone else has reported the bug already and if |
463 | so, confirm it by replying to that message. If not, you might want to |
464 | trace the source of that misbehaviour B<before> sending in the bug, |
465 | which will help all the other porters in finding the solution. |
466 | |
3e148164 |
467 | Here the saved patches come in very handy. You can check the list of |
468 | patches to see which patch changed what file and what change caused |
469 | the misbehaviour. If you note that in the bug report, it saves |
470 | whoever tries to solve it from having to look for that point. |
a1f349fd |
471 | |
472 | =back |
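
Searching the saved patches as described above can be done with a
small shell function. This is only a sketch: it assumes the patches
are gzip-compressed files in a single directory, and the name
C<patches_mentioning> is made up for the example.

```shell
# Print the names of gzipped patches in directory $1 whose text matches
# the basic regular expression $2 (for instance a file or function name).
patches_mentioning() {
    for f in "$1"/*.gz; do
        [ -f "$f" ] || continue
        # Redirecting to /dev/null lets grep read the whole stream,
        # so gzip is never cut off in mid-output.
        if gzip -dc "$f" | grep -e "$2" >/dev/null; then
            echo "$f"
        fi
    done
}
```

Running it with a file name as the pattern gives the list of patches
that touched or mentioned that file, which narrows down where a
misbehaviour may have been introduced.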
473 | |
474 | If searching the patches is too bothersome, you might consider using |
475 | perl's bugtron to find more information about discussions and |
476 | ramblings on posted bugs. |
477 | |
3e148164 |
478 | If you want to get the best of both worlds, rsync both the source |
479 | tree for convenience, reliability and ease and rsync the patches |
480 | for reference. |
481 | |
52315700 |
482 | =back |
483 | |
fcc89a64 |
484 | =head2 Working with the source |
485 | |
486 | Because you cannot use the Perforce client, you cannot easily generate |
487 | diffs against the repository, nor will merges occur when you update |
488 | via rsync. If you edit a file locally and then rsync against the |
489 | latest source, changes made in the remote copy will I<overwrite> your |
490 | local versions! |
491 | |
492 | The best way to deal with this is to maintain a tree of symlinks to |
493 | the rsync'd source. Then, when you want to edit a file, you remove |
494 | the symlink, copy the real file into the other tree, and edit it. You |
495 | can then diff your edited file against the original to generate a |
496 | patch, and you can safely update the original tree. |
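
To make the idea concrete, here is a throwaway demonstration of the
symlink approach on a single file. Everything in it is made up for
illustration: the directory names F<pristine> and F<dev>, the file
F<foo.c>, and the use of C<mktemp -d> to get a scratch directory
(which is widely available but not guaranteed on every system).

```shell
# Throwaway demonstration in a scratch directory; all paths are made up.
work=`mktemp -d` && cd "$work" || exit 1
mkdir pristine dev
echo 'original line' > pristine/foo.c     # stands in for the rsync'd tree
ln -s ../pristine/foo.c dev/foo.c         # stands in for the symlink tree

# To edit: replace the symlink with a private copy of the real file.
rm dev/foo.c
cp pristine/foo.c dev/foo.c
echo 'my change' >> dev/foo.c             # stands in for an editing session

# The pristine copy is untouched, so a diff against it yields the patch.
# diff exits with status 1 when the files differ, which is expected here.
diff -u pristine/foo.c dev/foo.c > foo.patch || true
```

Because the edited file is a real file and no longer a symlink, a
later rsync into the pristine tree cannot overwrite the local change.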
497 | |
498 | Perl's F<Configure> script can generate this tree of symlinks for you. |
499 | The following example assumes that you have used rsync to pull a copy |
500 | of the Perl source into the F<perl-rsync> directory. In the directory |
501 | above that one, you can execute the following commands: |
502 | |
503 | mkdir perl-dev |
504 | cd perl-dev |
505 | ../perl-rsync/Configure -Dmksymlinks -Dusedevel -D"optimize=-g" |
506 | |
507 | This will start the Perl configuration process. After a few prompts, |
508 | you should see something like this: |
509 | |
510 | Symbolic links are supported. |
511 | |
512 | Checking how to test for symbolic links... |
513 | Your builtin 'test -h' may be broken. |
514 | Trying external '/usr/bin/test -h'. |
515 | You can test for symbolic links with '/usr/bin/test -h'. |
516 | |
517 | Creating the symbolic links... |
518 | (First creating the subdirectories...) |
519 | (Then creating the symlinks...) |
520 | |
521 | The specifics may vary based on your operating system, of course. |
522 | After you see this, you can abort the F<Configure> script, and you |
523 | will see that the directory you are in has a tree of symlinks to the |
524 | F<perl-rsync> directories and files. |
525 | |
526 | If you plan to do a lot of work with the Perl source, here are some |
527 | Bourne shell script functions that can make your life easier: |
528 | |
529 |     edit() { |
530 |         if [ -L "$1" ]; then |
531 |             mv "$1" "$1.orig" |
532 |             cp "$1.orig" "$1" |
533 |             vi "$1" |
534 |         else |
535 |             vi "$1" |
536 |         fi |
537 |     } |
538 | |
539 |     unedit() { |
540 |         if [ -L "$1.orig" ]; then |
541 |             rm "$1" |
542 |             mv "$1.orig" "$1" |
543 |         fi |
544 |     } |
545 | |
546 | Replace "vi" with your favorite flavor of editor. |
547 | |
548 | Here is another function which will quickly generate a patch for the |
549 | files which have been edited in your symlink tree: |
550 | |
551 |     mkpatchorig() { |
552 |         local diffopts |
553 |         for f in `find . -name '*.orig' | sed 's,^\./,,'` |
554 |         do |
555 |             case `echo $f | sed 's,\.orig$,,;s,.*\.,,'` in |
556 |                 c)   diffopts=-p ;; |
557 |                 pod) diffopts='-F^=' ;; |
558 |                 *)   diffopts= ;; |
559 |             esac |
560 |             diff -du $diffopts $f `echo $f | sed 's,\.orig$,,'` |
561 |         done |
562 |     } |
563 | |
564 | This function produces patches which include enough context to make |
565 | your changes obvious. This makes it easier for the Perl pumpking(s) |
566 | to review them when you send them to the perl5-porters list, and that |
567 | means they're more likely to get applied. |
568 | |
569 | This function assumes GNU diff, and may require some tweaking for |
570 | other diff variants. |
52315700 |
571 | |
3fd28c4e |
572 | =head2 Perlbug administration |
52315700 |
573 | |
3fd28c4e |
574 | There is a single remote administrative interface for modifying bug status, |
575 | category, open issues etc. using the B<RT> I<bugtracker> system, maintained |
576 | by I<Robert Spier>. Become an administrator, and close any bugs you can get |
577 | your sticky mitts on: |
52315700 |
578 | |
3fd28c4e |
579 | http://rt.perl.org |
52315700 |
580 | |
3fd28c4e |
581 | The bugtracker mechanism for B<perl5> bugs in particular is at: |
52315700 |
582 | |
3fd28c4e |
583 | http://bugs6.perl.org/perlbug |
52315700 |
584 | |
3fd28c4e |
585 | To email the bug system administrators: |
52315700 |
586 | |
3fd28c4e |
587 | "perlbug-admin" <perlbug-admin@perl.org> |
52315700 |
588 | |
52315700 |
589 | |
a1f349fd |
590 | =head2 Submitting patches |
591 | |
f7e1e956 |
592 | Always submit patches to I<perl5-porters@perl.org>. If you're |
593 | patching a core module and there's an author listed, send the author a |
594 | copy (see L<Patching a core module>). This lets other porters review |
595 | your patch, which catches a surprising number of errors in patches. |
596 | Either use the diff program (available in source code form from |
f224927c |
597 | ftp://ftp.gnu.org/pub/gnu/ ), or use Johan Vromans' I<makepatch> |
f7e1e956 |
598 | (available from I<CPAN/authors/id/JV/>). Unified diffs are preferred, |
599 | but context diffs are accepted. Do not send RCS-style diffs or diffs |
600 | without context lines. More information is given in the |
601 | I<Porting/patching.pod> file in the Perl source distribution. Please |
602 | patch against the latest B<development> version (e.g., if you're |
603 | fixing a bug in the 5.005 track, patch against the latest 5.005_5x |
604 | version). Only patches that survive the heat of the development |
605 | branch get applied to maintenance versions. |
606 | |
607 | Your patch should update the documentation and test suite. See |
608 | L<Writing a test>. |
e8cd7eae |
609 | |
610 | To report a bug in Perl, use the program I<perlbug> which comes with |
611 | Perl (if you can't get Perl to work, send mail to the address |
f18956b7 |
612 | I<perlbug@perl.org> or I<perlbug@perl.com>). Reporting bugs through |
e8cd7eae |
613 | I<perlbug> feeds into the automated bug-tracking system, access to |
f224927c |
614 | which is provided through the web at http://bugs.perl.org/ . It |
e8cd7eae |
615 | often pays to check the archives of the perl5-porters mailing list to |
616 | see whether the bug you're reporting has been reported before, and if |
617 | so whether it was considered a bug. See above for the location of |
618 | the searchable archives. |
619 | |
f224927c |
620 | The CPAN testers ( http://testers.cpan.org/ ) are a group of |
ba139f7d |
621 | volunteers who test CPAN modules on a variety of platforms. Perl |
622 | Smokers ( http://archives.develooper.com/daily-build@perl.org/ ) |
623 | automatically test Perl source releases on platforms with various |
624 | configurations. Both efforts welcome volunteers. |
e8cd7eae |
625 | |
e8cd7eae |
626 | It's a good idea to read and lurk for a while before chipping in. |
627 | That way you'll get to see the dynamic of the conversations, learn the |
628 | personalities of the players, and hopefully be better prepared to make |
629 | a useful contribution when you do speak up. |
630 | |
631 | If after all this you still think you want to join the perl5-porters |
f6c51b38 |
632 | mailing list, send mail to I<perl5-porters-subscribe@perl.org>. To |
633 | unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>. |
e8cd7eae |
634 | |
a422fd2d |
635 | To hack on the Perl guts, you'll need to read the following things: |
636 | |
637 | =over 3 |
638 | |
639 | =item L<perlguts> |
640 | |
641 | This is of paramount importance, since it's the documentation of what |
642 | goes where in the Perl source. Read it over a couple of times and it |
643 | might start to make sense - don't worry if it doesn't yet, because the |
644 | best way to study it is to read it in conjunction with poking at Perl |
645 | source, and we'll do that later on. |
646 | |
647 | You might also want to look at Gisle Aas's illustrated perlguts - |
648 | there's no guarantee that this will be absolutely up-to-date with the |
649 | latest documentation in the Perl core, but the fundamentals will be |
1577cd80 |
650 | right. ( http://gisle.aas.no/perl/illguts/ ) |
a422fd2d |
651 | |
652 | =item L<perlxstut> and L<perlxs> |
653 | |
654 | A working knowledge of XSUB programming is incredibly useful for core |
655 | hacking; XSUBs use techniques drawn from the PP code, the portion of the |
656 | guts that actually executes a Perl program. It's a lot gentler to learn |
657 | those techniques from simple examples and explanation than from the core |
658 | itself. |
659 | |
660 | =item L<perlapi> |
661 | |
662 | The documentation for the Perl API explains what some of the internal |
663 | functions do, as well as the many macros used in the source. |
664 | |
665 | =item F<Porting/pumpkin.pod> |
666 | |
667 | This is a collection of words of wisdom for a Perl porter; some of it is |
668 | only useful to the pumpkin holder, but most of it applies to anyone |
669 | wanting to go about Perl development. |
670 | |
671 | =item The perl5-porters FAQ |
672 | |
7d7d5695 |
673 | This should be available from http://simon-cozens.org/writings/p5p-faq ; |
674 | alternatively, you can get the FAQ emailed to you by sending mail to |
675 | C<perl5-porters-faq@perl.org>. It contains hints on reading perl5-porters, |
676 | information on how perl5-porters works and how Perl development in general |
677 | works. |
a422fd2d |
678 | |
679 | =back |
680 | |
681 | =head2 Finding Your Way Around |
682 | |
683 | Perl maintenance can be split into a number of areas, and certain people |
684 | (pumpkins) will have responsibility for each area. These areas sometimes |
685 | correspond to files or directories in the source kit. Among the areas are: |
686 | |
687 | =over 3 |
688 | |
689 | =item Core modules |
690 | |
691 | Modules shipped as part of the Perl core live in the F<lib/> and F<ext/> |
692 | subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/> |
693 | contains the core XS modules. |
694 | |
f7e1e956 |
695 | =item Tests |
696 | |
697 | There are tests for nearly all the modules, built-ins and major bits |
698 | of functionality. Test files all have a .t suffix. Module tests live |
699 | in the F<lib/> and F<ext/> directories next to the module being |
700 | tested. Others live in F<t/>. See L<Writing a test>. |
701 | |
a422fd2d |
702 | =item Documentation |
703 | |
704 | Documentation maintenance includes looking after everything in the |
705 | F<pod/> directory (as well as contributing new documentation) and |
706 | the documentation to the modules in core. |
707 | |
708 | =item Configure |
709 | |
710 | The configure process is the way we make Perl portable across the |
711 | myriad of operating systems it supports. Responsibility for the |
712 | configure, build and installation process, as well as the overall |
713 | portability of the core code rests with the configure pumpkin - others |
714 | help out with individual operating systems. |
715 | |
716 | The files involved are the operating system directories (F<win32/>, |
717 | F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h> |
718 | and F<Makefile>, as well as the metaconfig files which generate |
719 | F<Configure>. (metaconfig isn't included in the core distribution.) |
720 | |
721 | =item Interpreter |
722 | |
723 | And of course, there's the core of the Perl interpreter itself. Let's |
724 | have a look at that in a little more detail. |
725 | |
726 | =back |
727 | |
728 | Before we leave looking at the layout, though, don't forget that |
729 | F<MANIFEST> contains not only the file names in the Perl distribution, |
730 | but short descriptions of what's in them, too. For an overview of the |
731 | important files, try this: |
732 | |
733 | perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST |
734 | |
735 | =head2 Elements of the interpreter |
736 | |
737 | The work of the interpreter has two main stages: compiling the code |
738 | into the internal representation, or bytecode, and then executing it. |
739 | L<perlguts/Compiled code> explains exactly how the compilation stage |
740 | happens. |
741 | |
742 | Here is a short breakdown of perl's operation: |
743 | |
744 | =over 3 |
745 | |
746 | =item Startup |
747 | |
748 | The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl). |
749 | This is very high-level code, enough to fit on a single screen, and it |
750 | resembles the code found in L<perlembed>; most of the real action takes |
751 | place in F<perl.c>. |
752 | |
753 | First, F<perlmain.c> allocates some memory and constructs a Perl |
754 | interpreter: |
755 | |
756 | 1 PERL_SYS_INIT3(&argc,&argv,&env); |
757 | 2 |
758 | 3 if (!PL_do_undump) { |
759 | 4 my_perl = perl_alloc(); |
760 | 5 if (!my_perl) |
761 | 6 exit(1); |
762 | 7 perl_construct(my_perl); |
763 | 8 PL_perl_destruct_level = 0; |
764 | 9 } |
765 | |
766 | Line 1 is a macro, and its definition is dependent on your operating |
767 | system. Line 3 references C<PL_do_undump>, a global variable - all |
768 | global variables in Perl start with C<PL_>. This tells you whether the |
769 | current running program was created with the C<-u> flag to perl and then |
770 | F<undump>, which means it's going to be false in any sane context. |
771 | |
772 | Line 4 calls a function in F<perl.c> to allocate memory for a Perl |
773 | interpreter. It's quite a simple function, and the guts of it look like |
774 | this: |
775 | |
776 | my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter)); |
777 | |
778 | Here you see an example of Perl's system abstraction, which we'll see |
779 | later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's |
780 | own C<malloc> as defined in F<malloc.c> if you selected that option at |
781 | configure time. |
782 | |
783 | Next, in line 7, we construct the interpreter; this sets up all the |
784 | special variables that Perl needs, the stacks, and so on. |
785 | |
786 | Now we pass Perl the command line options, and tell it to go: |
787 | |
788 | exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL); |
789 | if (!exitstatus) { |
790 | exitstatus = perl_run(my_perl); |
791 | } |
792 | |
793 | |
794 | C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined |
795 | in F<perl.c>, which processes the command line options, sets up any |
796 | statically linked XS modules, opens the program and calls C<yyparse> to |
797 | parse it. |
798 | |
799 | =item Parsing |
800 | |
801 | The aim of this stage is to take the Perl source, and turn it into an op |
802 | tree. We'll see what one of those looks like later. Strictly speaking, |
803 | there are three things going on here. |
804 | |
805 | C<yyparse>, the parser, lives in F<perly.c>, although you're better off |
806 | reading the original YACC input in F<perly.y>. (Yes, Virginia, there |
807 | B<is> a YACC grammar for Perl!) The job of the parser is to take your |
808 | code and `understand' it, splitting it into sentences, deciding which |
809 | operands go with which operators and so on. |
810 | |
811 | The parser is nobly assisted by the lexer, which chunks up your input |
812 | into tokens, and decides what type of thing each token is: a variable |
813 | name, an operator, a bareword, a subroutine, a core function, and so on. |
814 | The main point of entry to the lexer is C<yylex>, and that and its |
815 | associated routines can be found in F<toke.c>. Perl isn't much like |
816 | other computer languages; it's highly context sensitive at times, and it can |
817 | be tricky to work out what sort of token something is, or where a token |
818 | ends. As such, there's a lot of interplay between the tokeniser and the |
819 | parser, which can get pretty frightening if you're not used to it. |
820 | |
821 | As the parser understands a Perl program, it builds up a tree of |
822 | operations for the interpreter to perform during execution. The routines |
823 | which construct and link together the various operations are to be found |
824 | in F<op.c>, and will be examined later. |
825 | |
826 | =item Optimization |
827 | |
828 | Now the parsing stage is complete, and the finished tree represents |
829 | the operations that the Perl interpreter needs to perform to execute our |
830 | program. Next, Perl does a dry run over the tree looking for |
831 | optimisations: constant expressions such as C<3 + 4> will be computed |
832 | now, and the optimizer will also see if any multiple operations can be |
833 | replaced with a single one. For instance, to fetch the variable C<$foo>, |
834 | instead of grabbing the glob C<*foo> and looking at the scalar |
835 | component, the optimizer fiddles the op tree to use a function which |
836 | directly looks up the scalar in question. The main optimizer is C<peep> |
837 | in F<op.c>, and many ops have their own optimizing functions. |
838 | |
839 | =item Running |
840 | |
841 | Now we're finally ready to go: we have compiled Perl byte code, and all |
842 | that's left to do is run it. The actual execution is done by the |
843 | C<runops_standard> function in F<run.c>; more specifically, it's done by |
844 | these three innocent looking lines: |
845 | |
846 | while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) { |
847 | PERL_ASYNC_CHECK(); |
848 | } |
849 | |
850 | You may be more comfortable with the Perl version of that: |
851 | |
852 | PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}}; |
853 | |
854 | Well, maybe not. Anyway, each op contains a function pointer, which |
855 | stipulates the function which will actually carry out the operation. |
856 | This function will return the next op in the sequence - this allows for |
857 | things like C<if> which choose the next op dynamically at run time. |
858 | The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt |
859 | execution if required. |
860 | |
861 | The actual functions called are known as PP code, and they're spread |
862 | between four files: F<pp_hot.c> contains the `hot' code, which is most |
863 | often used and highly optimized, F<pp_sys.c> contains all the |
864 | system-specific functions, F<pp_ctl.c> contains the functions which |
865 | implement control structures (C<if>, C<while> and the like) and F<pp.c> |
866 | contains everything else. These are, if you like, the C code for Perl's |
867 | built-in functions and operators. |
868 | |
dfc98234 |
869 | Note that each C<pp_> function is expected to return a pointer to the next |
870 | op. Calls to perl subs (and eval blocks) are handled within the same |
871 | runops loop, and do not consume extra space on the C stack. For example, |
872 | C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or C<CxEVAL> block |
873 | struct onto the context stack, which contains the address of the op |
874 | following the sub call or eval. They then return the first op of that sub |
875 | or eval block, and so execution continues with that sub or block. Later, a |
876 | C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> or C<CxEVAL>, |
877 | retrieves the return op from it, and returns it. |
878 | |
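The shape of that loop is easy to reproduce outside perl. What follows is
a toy sketch in C, not code from F<run.c>: the C<toy_op> structure, the two
op functions and C<toy_run_demo> are all invented for illustration. It shows
only the idea that each op function returns the op to run next, so that a
branch is just an op which picks its successor at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the runops loop; every name here is hypothetical. */
typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*ppaddr)(toy_op *self); /* function that carries out the op */
    toy_op *next;                    /* default successor                */
    toy_op *other;                   /* alternative successor (branch)   */
    int     arg;                     /* immediate operand                */
};

static int toy_acc;   /* one "register" standing in for perl's stacks */

/* Add the operand to the accumulator, then fall through to the next op. */
static toy_op *toy_pp_incr(toy_op *o)
{
    toy_acc += o->arg;
    return o->next;
}

/* Choose the successor at run time, as a conditional op does. */
static toy_op *toy_pp_branch(toy_op *o)
{
    return toy_acc ? o->next : o->other;
}

/* The loop itself: call the current op, carry on with whatever it returns. */
static int toy_runops(toy_op *start)
{
    toy_op *op = start;

    assert(op != NULL);
    toy_acc = 0;
    while ((op = op->ppaddr(op)) != NULL) {
        /* perl does PERL_ASYNC_CHECK() here */
    }
    return toy_acc;
}

/* Build "acc += first; if (acc) acc += 10; else acc += 1;" and run it. */
int toy_run_demo(int first)
{
    toy_op add10  = { toy_pp_incr,   NULL,    NULL,  10 };
    toy_op add1   = { toy_pp_incr,   NULL,    NULL,  1 };
    toy_op branch = { toy_pp_branch, &add10,  &add1, 0 };
    toy_op start  = { toy_pp_incr,   &branch, NULL,  first };

    return toy_runops(&start);
}
```

With these definitions, C<toy_run_demo(0)> takes the `false' branch and
returns 1, while C<toy_run_demo(5)> takes the `true' branch and returns 15.
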
879 | =item Exception handling |
880 | |
881 | Perl's exception handling (i.e. C<die> etc.) is built on top of the low-level |
882 | C<setjmp()>/C<longjmp()> C-library functions. These basically provide a |
883 | way to capture the current PC and SP registers and later restore them; i.e. |
884 | a C<longjmp()> continues at the point in code where a previous C<setjmp()> |
885 | was done, with anything further up on the C stack being lost. This is why |
886 | code should always save values using C<SAVE_FOO> rather than in auto |
887 | variables. |
888 | |
889 | The perl core wraps C<setjmp()> etc in the macros C<JMPENV_PUSH> and |
890 | C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and |
891 | C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while |
892 | C<die> within C<eval> does a C<JMPENV_JUMP(3)>. |
893 | |
894 | Entry points to perl, such as C<perl_parse()>, C<perl_run()> and |
895 | C<call_sv(cv, G_EVAL)>, each do a C<JMPENV_PUSH>, then enter a runops |
896 | loop or whatever, and handle possible exception returns. For a 2 return, |
897 | final cleanup is performed, such as popping stacks and calling C<CHECK> or |
898 | C<END> blocks. Amongst other things, this is how scope cleanup still |
899 | occurs during an C<exit>. |
900 | |
901 | If a C<die> can find a C<CxEVAL> block on the context stack, then the |
902 | stack is popped to that level and the return op in that block is assigned |
903 | to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed. This normally |
904 | passes control back to the guard. In the case of C<perl_run> and |
905 | C<call_sv>, a non-null C<PL_restartop> triggers re-entry to the runops |
906 | loop. This is the normal way that C<die> or C<croak> is handled within an |
907 | C<eval>. |
908 | |
909 | Sometimes ops are executed within an inner runops loop, such as tie, sort |
910 | or overload code. In this case, something like |
911 | |
912 | sub FETCH { eval { die } } |
913 | |
914 | would cause a longjmp right back to the guard in C<perl_run>, popping both |
915 | runops loops, which is clearly incorrect. One way to avoid this is for the |
916 | tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in the inner |
917 | runops loop, but for efficiency reasons, perl in fact just sets a flag, |
918 | using C<CATCH_SET(TRUE)>. The C<pp_require>, C<pp_entereval> and |
919 | C<pp_entertry> ops check this flag, and if true, they call C<docatch>, |
920 | which does a C<JMPENV_PUSH> and starts a new runops level to execute the |
921 | code, rather than doing it on the current loop. |
922 | |
923 | As a further optimisation, on exit from the eval block in the C<FETCH>, |
924 | execution of the code following the block is still carried on in the inner |
925 | loop. When an exception is raised, C<docatch> compares the C<JMPENV> |
926 | level of the C<CxEVAL> with C<PL_top_env> and if they differ, just |
927 | re-throws the exception. In this way any inner loops get popped. |
928 | |
929 | Here's an example. |
930 | |
931 | 1: eval { tie @a, 'A' }; |
932 | 2: sub A::TIEARRAY { |
933 | 3: eval { die }; |
934 | 4: die; |
935 | 5: } |
936 | |
937 | To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH> then |
938 | enters a runops loop. This loop executes the eval and tie ops on line 1, |
939 | with the eval pushing a C<CxEVAL> onto the context stack. |
940 | |
941 | The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops loop |
942 | to execute the body of C<TIEARRAY>. When it executes the entertry op on |
943 | line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch> which |
944 | does a C<JMPENV_PUSH> and starts a third runops loop, which then executes |
945 | the die op. At this point the C call stack looks like this: |
946 | |
947 | Perl_pp_die |
948 | Perl_runops # third loop |
949 | S_docatch_body |
950 | S_docatch |
951 | Perl_pp_entertry |
952 | Perl_runops # second loop |
953 | S_call_body |
954 | Perl_call_sv |
955 | Perl_pp_tie |
956 | Perl_runops # first loop |
957 | S_run_body |
958 | perl_run |
959 | main |
960 | |
961 | and the context and data stacks, as shown by C<-Dstv>, look like: |
962 | |
963 | STACK 0: MAIN |
964 | CX 0: BLOCK => |
965 | CX 1: EVAL => AV() PV("A"\0) |
966 | retop=leave |
967 | STACK 1: MAGIC |
968 | CX 0: SUB => |
969 | retop=(null) |
970 | CX 1: EVAL => * |
971 | retop=nextstate |
972 | |
973 | The die pops the first C<CxEVAL> off the context stack, sets |
974 | C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns to |
975 | the top C<docatch>. This then starts another third-level runops level, |
976 | which executes the nextstate, pushmark and die ops on line 4. At the point |
977 | that the second C<pp_die> is called, the C call stack looks exactly like |
978 | that above, even though we are no longer within an inner eval; this is |
979 | because of the optimization mentioned earlier. However, the context stack |
980 | now looks like this, i.e. with the top CxEVAL popped: |
981 | |
982 | STACK 0: MAIN |
983 | CX 0: BLOCK => |
984 | CX 1: EVAL => AV() PV("A"\0) |
985 | retop=leave |
986 | STACK 1: MAGIC |
987 | CX 0: SUB => |
988 | retop=(null) |
989 | |
990 | The die on line 4 pops the context stack back down to the CxEVAL, leaving |
991 | it as: |
992 | |
993 | STACK 0: MAIN |
994 | CX 0: BLOCK => |
995 | |
996 | As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a |
997 | C<JMPENV_JUMP(3)> done, which pops the C stack back to the docatch: |
998 | |
999 | S_docatch |
1000 | Perl_pp_entertry |
1001 | Perl_runops # second loop |
1002 | S_call_body |
1003 | Perl_call_sv |
1004 | Perl_pp_tie |
1005 | Perl_runops # first loop |
1006 | S_run_body |
1007 | perl_run |
1008 | main |
1009 | |
1010 | In this case, because the C<JMPENV> level recorded in the C<CxEVAL> |
1011 | differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)> |
1012 | and the C stack unwinds to: |
1013 | |
1014 | perl_run |
1015 | main |
1016 | |
1017 | Because C<PL_restartop> is non-null, C<run_body> starts a new runops loop |
1018 | and execution continues. |
1019 | |
a422fd2d |
1020 | =back |
1021 | |
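The guard mechanism described under exception handling can be imitated with
nothing but C<setjmp()> and C<longjmp()>. The following is a rough sketch
under stated assumptions, not perl's implementation: C<toy_jmpenv>,
C<toy_guard> and C<toy_jump> are invented names that stand in for the
C<JMPENV> chain, C<JMPENV_PUSH> and C<JMPENV_JUMP>, and the codes 2 and 3
are borrowed from the text above. There is no context stack and no
C<PL_restartop> here; only the C-level unwinding is modelled.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy model of a chain of guards; names are hypothetical, not perl's. */
typedef struct toy_jmpenv {
    jmp_buf            env;
    struct toy_jmpenv *prev;
} toy_jmpenv;

static toy_jmpenv *toy_top_env;  /* innermost guard, in the role of PL_top_env */
static int         toy_inner_result;

/* Unwind the C stack to the innermost guard, handing it `code`. */
static void toy_jump(int code)
{
    assert(toy_top_env != NULL);
    longjmp(toy_top_env->env, code);
}

/* Run `body` under a new guard. Returns 0 if the body finished normally,
 * otherwise the code that was passed to toy_jump(). */
static int toy_guard(void (*body)(void))
{
    toy_jmpenv cur;
    int        ret;

    cur.prev    = toy_top_env;
    toy_top_env = &cur;               /* the "push" */
    switch (setjmp(cur.env)) {        /* setjmp as a whole switch expression */
    case 0:  body(); ret = 0; break;  /* normal path                         */
    case 2:  ret = 2; break;          /* exit-style unwinding                */
    case 3:  ret = 3; break;          /* die-within-eval-style unwinding     */
    default: ret = -1; break;
    }
    toy_top_env = cur.prev;           /* the "pop" */
    return ret;
}

static void toy_ok_body(void)  { }
static void toy_die_body(void) { toy_jump(3); }

/* An inner guard catches its own 3; the later 2 goes to the outer guard. */
static void toy_nested_body(void)
{
    toy_inner_result = toy_guard(toy_die_body);
    toy_jump(2);
}
```

Here C<toy_guard(toy_ok_body)> returns 0, C<toy_guard(toy_die_body)> returns
3, and C<toy_guard(toy_nested_body)> returns 2 after the inner guard has
already caught its own 3. Anything the bodies had in auto variables is gone
once the jump happens, which is the reason given above for saving values
with C<SAVE_FOO>.
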
1022 | =head2 Internal Variable Types |
1023 | |
1024 | You should by now have had a look at L<perlguts>, which tells you about |
1025 | Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do |
1026 | that now. |
1027 | |
1028 | These variables are used not only to represent Perl-space variables, but |
1029 | also any constants in the code, as well as some structures completely |
1030 | internal to Perl. The symbol table, for instance, is an ordinary Perl |
1031 | hash. Your code is represented by an SV as it's read into the parser; |
1032 | any program files you call are opened via ordinary Perl filehandles, and |
1033 | so on. |
1034 | |
1035 | The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a |
1036 | Perl program. Let's see, for instance, how Perl treats the constant |
1037 | C<"hello">. |
1038 | |
1039 | % perl -MDevel::Peek -e 'Dump("hello")' |
1040 | 1 SV = PV(0xa041450) at 0xa04ecbc |
1041 | 2 REFCNT = 1 |
1042 | 3 FLAGS = (POK,READONLY,pPOK) |
1043 | 4 PV = 0xa0484e0 "hello"\0 |
1044 | 5 CUR = 5 |
1045 | 6 LEN = 6 |
1046 | |
1047 | Reading C<Devel::Peek> output takes a bit of practice, so let's go |
1048 | through it line by line. |
1049 | |
1050 | Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in |
1051 | memory. SVs themselves are very simple structures, but they contain a |
1052 | pointer to a more complex structure. In this case, it's a PV, a |
1053 | structure which holds a string value, at location C<0xa041450>. Line 2 |
1054 | is the reference count; there are no other references to this data, so |
1055 | it's 1. |
1056 | |
1057 | Line 3 shows the flags for this SV - it's OK to use it as a PV, it's a |
1058 | read-only SV (because it's a constant) and the data is a PV internally. |
1059 | Next we've got the contents of the string, starting at location |
1060 | C<0xa0484e0>. |
1061 | |
1062 | Line 5 gives us the current length of the string - note that this does |
1063 | B<not> include the null terminator. Line 6 is not the length of the |
1064 | string, but the length of the currently allocated buffer; as the string |
1065 | grows, Perl automatically extends the available storage via a routine |
1066 | called C<SvGROW>. |
1067 | |
1068 | You can get at any of these quantities from C very easily; just add |
1069 | C<Sv> to the name of the field shown in the snippet, and you've got a |
1070 | macro which will return the value: C<SvCUR(sv)> returns the current |
1071 | length of the string, C<SvREFCNT(sv)> returns the reference count, |
1072 | C<SvPV(sv, len)> returns the string itself with its length, and so on. |
1073 | More macros to manipulate these properties can be found in L<perlguts>. |
1074 | |
1075 | Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>: |
1076 | |
1077 | 1 void |
1078 | 2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len) |
1079 | 3 { |
1080 | 4 STRLEN tlen; |
1081 | 5 char *junk; |
1082 | |
1083 | 6 junk = SvPV_force(sv, tlen); |
1084 | 7 SvGROW(sv, tlen + len + 1); |
1085 | 8 if (ptr == junk) |
1086 | 9 ptr = SvPVX(sv); |
1087 | 10 Move(ptr,SvPVX(sv)+tlen,len,char); |
1088 | 11 SvCUR(sv) += len; |
1089 | 12 *SvEND(sv) = '\0'; |
1090 | 13 (void)SvPOK_only_UTF8(sv); /* validate pointer */ |
1091 | 14 SvTAINT(sv); |
1092 | 15 } |
1093 | |
1094 | This is a function which adds a string, C<ptr>, of length C<len> onto |
1095 | the end of the PV stored in C<sv>. The first thing we do in line 6 is |
1096 | make sure that the SV B<has> a valid PV, by calling the C<SvPV_force> |
1097 | macro to force a PV. As a side effect, C<tlen> gets set to the current |
1098 | length of the PV, and the PV itself is returned to C<junk>. |
1099 | |
b1866b2d |
1100 | In line 7, we make sure that the SV will have enough room to accommodate |
a422fd2d |
1101 | the old string, the new string and the null terminator. If C<LEN> isn't |
1102 | big enough, C<SvGROW> will reallocate space for us. |
1103 | |
1104 | Now, if C<junk> is the same as the string we're trying to add, we can |
1105 | grab the string directly from the SV; C<SvPVX> is the address of the PV |
1106 | in the SV. |
1107 | |
1108 | Line 10 does the actual catenation: the C<Move> macro moves a chunk of |
1109 | memory around: we move the string C<ptr> to the end of the PV - that's |
1110 | the start of the PV plus its current length. We're moving C<len> bytes |
1111 | of type C<char>. After doing so, we need to tell Perl we've extended the |
1112 | string, by altering C<CUR> to reflect the new length. C<SvEND> is a |
1113 | macro which gives us the end of the string, so that needs to be a |
1114 | C<"\0">. |
1115 | |
1116 | Line 13 manipulates the flags; since we've changed the PV, any IV or NV |
1117 | values will no longer be valid: if we have C<$a=10; $a.="6";> we don't |
1e54db1a |
1118 | want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF-8-aware |
a422fd2d |
1119 | version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags |
1120 | and turns on POK. The final C<SvTAINT> is a macro which marks the SV as |
1121 | tainted if taint mode is turned on and tainted data was involved. |
1122 | |
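The bookkeeping in that walk-through (grow the buffer, cope with the source
being the buffer itself, copy, bump the length, terminate) can be tried out
on a toy structure. This is a minimal sketch, assuming a made-up C<toy_sv>
with only C<pv>, C<cur> and C<len> fields; C<toy_grow> and C<toy_catpvn>
are hypothetical stand-ins for C<SvGROW> and C<sv_catpvn>, with no flags,
magic or tainting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string value with CUR/LEN bookkeeping; not perl's SV layout. */
typedef struct {
    char  *pv;   /* buffer, NUL-terminated once allocated   */
    size_t cur;  /* bytes in use, excluding the terminator  */
    size_t len;  /* bytes allocated                         */
} toy_sv;

/* Make sure at least `want` bytes are allocated (the role SvGROW plays). */
static void toy_grow(toy_sv *sv, size_t want)
{
    if (want > sv->len) {
        char *p = realloc(sv->pv, want);
        assert(p != NULL);   /* a sketch: real code would report the failure */
        sv->pv  = p;
        sv->len = want;
    }
}

/* Append `len` bytes at `ptr`; `ptr` may be sv's own buffer. */
static void toy_catpvn(toy_sv *sv, const char *ptr, size_t len)
{
    /* Check for self-append before growing: the buffer may move. */
    int self = (sv->pv != NULL && ptr == sv->pv);

    toy_grow(sv, sv->cur + len + 1);      /* old string + new string + NUL */
    if (self)
        ptr = sv->pv;                     /* re-fetch the possibly moved buffer */
    memmove(sv->pv + sv->cur, ptr, len);  /* copy onto the end */
    sv->cur += len;
    sv->pv[sv->cur] = '\0';
}
```

Starting from an empty C<toy_sv>, appending C<"hello"> leaves C<cur> at 5
and C<len> at 6, the same figures as in the C<Devel::Peek> dump earlier.
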
1123 | AVs and HVs are more complicated, but SVs are by far the most common |
1124 | variable type being thrown around. Having seen something of how we |
1125 | manipulate these, let's go on and look at how the op tree is |
1126 | constructed. |
1127 | |
1128 | =head2 Op Trees |
1129 | |
1130 | First, what is the op tree, anyway? The op tree is the parsed |
1131 | representation of your program, as we saw in our section on parsing, and |
1132 | it's the sequence of operations that Perl goes through to execute your |
1133 | program, as we saw in L</Running>. |
1134 | |
1135 | An op is a fundamental operation that Perl can perform: all the built-in |
1136 | functions and operators are ops, and there is a series of ops which |
1137 | deal with concepts the interpreter needs internally - entering and |
1138 | leaving a block, ending a statement, fetching a variable, and so on. |
1139 | |
1140 | The op tree is connected in two ways: you can imagine that there are two |
1141 | "routes" through it, two orders in which you can traverse the tree. |
1142 | First, parse order reflects how the parser understood the code, and |
1143 | secondly, execution order tells perl what order to perform the |
1144 | operations in. |
1145 | |
1146 | The easiest way to examine the op tree is to stop Perl after it has |
1147 | finished parsing, and get it to dump out the tree. This is exactly what |
7d7d5695 |
1148 | the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise> |
1149 | and L<B::Debug|B::Debug> do. |
a422fd2d |
1150 | |
1151 | Let's have a look at how Perl sees C<$a = $b + $c>: |
1152 | |
1153 | % perl -MO=Terse -e '$a=$b+$c' |
1154 | 1 LISTOP (0x8179888) leave |
1155 | 2 OP (0x81798b0) enter |
1156 | 3 COP (0x8179850) nextstate |
1157 | 4 BINOP (0x8179828) sassign |
1158 | 5 BINOP (0x8179800) add [1] |
1159 | 6 UNOP (0x81796e0) null [15] |
1160 | 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b |
1161 | 8 UNOP (0x81797e0) null [15] |
1162 | 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c |
1163 | 10 UNOP (0x816b4f0) null [15] |
1164 | 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a |
1165 | |
1166 | Let's start in the middle, at line 4. This is a BINOP, a binary |
1167 | operator, which is at location C<0x8179828>. The specific operator in |
1168 | question is C<sassign> - scalar assignment - and you can find the code |
1169 | which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a |
1170 | binary operator, it has two children: the add operator, providing the |
1171 | result of C<$b+$c>, is uppermost on line 5, and the left hand side is on |
1172 | line 10. |
1173 | |
1174 | Line 10 is the null op: this does exactly nothing. What is that doing |
1175 | there? If you see the null op, it's a sign that something has been |
1176 | optimized away after parsing. As we mentioned in L</Optimization>, |
1177 | the optimization stage sometimes converts two operations into one, for |
1178 | example when fetching a scalar variable. When this happens, instead of |
1179 | rewriting the op tree and cleaning up the dangling pointers, it's easier |
1180 | just to replace the redundant operation with the null op. Originally, |
1181 | the tree would have looked like this: |
1182 | |
1183 | 10 UNOP (0x816b4f0) rv2sv [15] |
1184 | 11 SVOP (0x816dcf0) gv GV (0x80fa460) *a |
1185 | |
1186 | That is, fetch the C<a> entry from the main symbol table, and then look |
1187 | at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>) |
1188 | happens to do both these things. |
1189 | |
1190 | The right hand side, starting at line 5 is similar to what we've just |
1191 | seen: we have the C<add> op (C<pp_add> also in F<pp_hot.c>) add together |
1192 | two C<gvsv>s. |
1193 | |
1194 | Now, what's this about? |
1195 | |
1196 | 1 LISTOP (0x8179888) leave |
1197 | 2 OP (0x81798b0) enter |
1198 | 3 COP (0x8179850) nextstate |
1199 | |
1200 | C<enter> and C<leave> are scoping ops, and their job is to perform any |
1201 | housekeeping every time you enter and leave a block: lexical variables |
1202 | are tidied up, unreferenced variables are destroyed, and so on. Every |
1203 | program will have those first three lines: C<leave> is a list, and its |
1204 | children are all the statements in the block. Statements are delimited |
1205 | by C<nextstate>, so a block is a collection of C<nextstate> ops, with |
1206 | the ops to be performed for each statement being the children of |
1207 | C<nextstate>. C<enter> is a single op which functions as a marker. |
1208 | |
1209 | That's how Perl parsed the program, from top to bottom: |
1210 | |
1211 | Program |
1212 | | |
1213 | Statement |
1214 | | |
1215 | = |
1216 | / \ |
1217 | / \ |
1218 | $a + |
1219 | / \ |
1220 | $b $c |
1221 | |
1222 | However, it's impossible to B<perform> the operations in this order: |
1223 | you have to find the values of C<$b> and C<$c> before you add them |
1224 | together, for instance. So, the other thread that runs through the op |
1225 | tree is the execution order: each op has a field C<op_next> which points |
1226 | to the next op to be run, so following these pointers tells us how perl |
1227 | executes the code. We can traverse the tree in this order using |
1228 | the C<exec> option to C<B::Terse>: |
1229 | |
1230 | % perl -MO=Terse,exec -e '$a=$b+$c' |
1231 | 1 OP (0x8179928) enter |
1232 | 2 COP (0x81798c8) nextstate |
1233 | 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b |
1234 | 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c |
1235 | 5 BINOP (0x8179878) add [1] |
1236 | 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a |
1237 | 7 BINOP (0x81798a0) sassign |
1238 | 8 LISTOP (0x8179900) leave |
1239 | |
1240 | This probably makes more sense for a human: enter a block, start a |
1241 | statement. Get the values of C<$b> and C<$c>, and add them together. |
1242 | Find C<$a>, and assign one to the other. Then leave. |
1243 | |
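The two routes through the tree can be modelled with a node that carries
both kinds of link. The sketch below is a simplification and makes its own
assumptions: C<toy_node>, C<toy_link> and C<toy_order_demo> are invented
names, and the execution order is derived by visiting children before
their parent, which happens to reproduce the listing above for this
straight-line statement. Perl itself threads C<op_next> while it builds the
tree and treats branching ops specially.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy op with both sets of links; names are hypothetical. */
typedef struct toy_node toy_node;
struct toy_node {
    const char *name;
    toy_node   *first;    /* first child     (parse order)     */
    toy_node   *sibling;  /* next sibling    (parse order)     */
    toy_node   *next;     /* next op to run  (execution order) */
};

/* Thread `next` through the subtree at `o`, children before parent.
 * `prev` is the last op threaded so far (or NULL); returns the new last op. */
static toy_node *toy_link(toy_node *o, toy_node *prev, toy_node **start)
{
    toy_node *kid;

    for (kid = o->first; kid != NULL; kid = kid->sibling)
        prev = toy_link(kid, prev, start);
    if (prev != NULL)
        prev->next = o;
    else
        *start = o;          /* nothing ran before this op */
    o->next = NULL;
    return o;
}

/* Follow the `next` chain, writing op names separated by spaces. */
static void toy_exec_order(const toy_node *start, char *buf, size_t size)
{
    const toy_node *o;

    assert(size > 0);
    buf[0] = '\0';
    for (o = start; o != NULL; o = o->next) {
        if (buf[0] != '\0')
            strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, o->name, size - strlen(buf) - 1);
    }
}

/* Build the tree for $a = $b + $c in parse order, then list it in run order. */
const char *toy_order_demo(void)
{
    static toy_node leave, enter, state, sassign, add, b, c, a;
    static char buf[128];
    toy_node *start = NULL;

    leave.name   = "leave";      leave.first   = &enter;
    enter.name   = "enter";      enter.sibling = &state;
    state.name   = "nextstate";  state.sibling = &sassign;
    sassign.name = "sassign";    sassign.first = &add;
    add.name     = "add";        add.first     = &b;  add.sibling = &a;
    b.name       = "gvsv(b)";    b.sibling     = &c;
    c.name       = "gvsv(c)";
    a.name       = "gvsv(a)";

    toy_link(&leave, NULL, &start);
    toy_exec_order(start, buf, sizeof buf);
    return buf;
}
```

C<toy_order_demo()> returns the op names in the same order as the
C<Terse,exec> output: enter, nextstate, the two fetches, add, the fetch of
C<$a>, sassign and finally leave.
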
1244 | The way Perl builds up these op trees in the parsing process can be |
1245 | unravelled by examining F<perly.y>, the YACC grammar. Let's take the |
1246 | piece we need to construct the tree for C<$a = $b + $c>: |
1247 | |
1248 | 1 term : term ASSIGNOP term |
1249 | 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); } |
1250 | 3 | term ADDOP term |
1251 | 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } |
1252 | |
1253 | If you're not used to reading BNF grammars, this is how it works: You're |
1254 | fed certain things by the tokeniser, which generally end up in upper |
1255 | case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your |
1256 | code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are |
1257 | `terminal symbols', because you can't get any simpler than them. |
1258 | |
1259 | The grammar, lines one and three of the snippet above, tells you how to |
1260 | build up more complex forms. These complex forms, `non-terminal symbols' |
1261 | are generally placed in lower case. C<term> here is a non-terminal |
1262 | symbol, representing a single expression. |
1263 | |
1264 | The grammar gives you the following rule: you can make the thing on the |
1265 | left of the colon if you see all the things on the right in sequence. |
1266 | This is called a "reduction", and the aim of parsing is to completely |
1267 | reduce the input. There are several different ways you can perform a |
1268 | reduction, separated by vertical bars: so, C<term> followed by C<=> |
1269 | followed by C<term> makes a C<term>, and C<term> followed by C<+> |
1270 | followed by C<term> can also make a C<term>. |
1271 | |
1272 | So, if you see two terms with an C<=> or C<+> between them, you can |
1273 | turn them into a single expression. When you do this, you execute the |
1274 | code in the block on the next line: if you see C<=>, you'll do the code |
1275 | in line 2. If you see C<+>, you'll do the code in line 4. It's this code |
1276 | which contributes to the op tree. |
1277 | |
1278 | | term ADDOP term |
1279 | { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } |
1280 | |
1281 | What this does is create a new binary op, and feed it a number of |
1282 | variables. The variables refer to the tokens: C<$1> is the first token in |
1283 | the input, C<$2> the second, and so on - think regular expression |
1284 | backreferences. C<$$> is the op returned from this reduction. So, we |
1285 | call C<newBINOP> to create a new binary operator. The first parameter to |
1286 | C<newBINOP>, a function in F<op.c>, is the op type. It's an addition |
1287 | operator, so we want the type to be C<ADDOP>. We could specify this |
1288 | directly, but it's right there as the second token in the input, so we |
1289 | use C<$2>. The second parameter is the op's flags: 0 means `nothing |
1290 | special'. Then the things to add: the left and right hand side of our |
1291 | expression, in scalar context. |
1292 | |
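To make the effect of such a reduction concrete, here is a small sketch of
a constructor in the spirit of C<newBINOP>. It is an illustration only:
C<toy_term>, C<toy_new_binop> and C<toy_new_var> are hypothetical names,
the node layout is made up, and none of perl's context or flag handling is
attempted. Each call corresponds to one reduction, with the operator taken
from the middle token and the operands from the terms on either side.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy syntax-tree node; names and layout are hypothetical. */
typedef struct toy_term toy_term;
struct toy_term {
    char        op;     /* '+' or '=' for binary nodes, 0 for a leaf */
    int         flags;  /* 0 means "nothing special"                 */
    const char *name;   /* variable name, leaves only                */
    toy_term   *first;  /* left operand                              */
    toy_term   *last;   /* right operand                             */
};

static toy_term *toy_new(char op, int flags, const char *name,
                         toy_term *first, toy_term *last)
{
    toy_term *t = malloc(sizeof *t);

    assert(t != NULL);   /* a sketch: real code would report the failure */
    t->op    = op;
    t->flags = flags;
    t->name  = name;
    t->first = first;
    t->last  = last;
    return t;
}

/* The action for "term OP term": type from $2, operands from $1 and $3. */
static toy_term *toy_new_binop(char op, int flags,
                               toy_term *left, toy_term *right)
{
    return toy_new(op, flags, NULL, left, right);
}

/* A leaf for a variable. */
static toy_term *toy_new_var(const char *name)
{
    return toy_new(0, 0, name, NULL, NULL);
}

/* Number of nodes in the tree. */
static int toy_count(const toy_term *t)
{
    return t == NULL ? 0 : 1 + toy_count(t->first) + toy_count(t->last);
}

/* Release the whole tree. */
static void toy_free(toy_term *t)
{
    if (t == NULL)
        return;
    toy_free(t->first);
    toy_free(t->last);
    free(t);
}
```

For C<$a = $b + $c> the addition is reduced first, giving a C<'+'> node over
two leaves, and the assignment is reduced second, giving a C<'='> node whose
right operand is that addition: five nodes in all.
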
1293 | =head2 Stacks |
1294 | |
1295 | When perl executes something like C<addop>, how does it pass on its |
1296 | results to the next op? The answer is, through the use of stacks. Perl |
1297 | has a number of stacks to store things it's currently working on, and |
1298 | we'll look at the three most important ones here. |
1299 | |
1300 | =over 3 |
1301 | |
1302 | =item Argument stack |
1303 | |
1304 | Arguments are passed to PP code and returned from PP code using the |
1305 | argument stack, C<ST>. The typical way to handle arguments is to pop |
1306 | them off the stack, deal with them how you wish, and then push the result |
1307 | back onto the stack. This is how, for instance, the cosine operator |
1308 | works: |
1309 | |
1310 | NV value; |
1311 | value = POPn; |
1312 | value = Perl_cos(value); |
1313 | XPUSHn(value); |
1314 | |
1315 | We'll see a more tricky example of this when we consider Perl's macros |
1316 | below. C<POPn> gives you the NV (floating point value) of the top SV on |
1317 | the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push |
1318 | the result back as an NV. The C<X> in C<XPUSHn> means that the stack |
1319 | should be extended if necessary - it can't be necessary here, because we |
1320 | know there's room for one more item on the stack, since we've just |
1321 | removed one! The C<XPUSH*> macros at least guarantee safety. |
1322 | |
1323 | Alternatively, you can fiddle with the stack directly: C<SP> gives you |
1324 | the first element in your portion of the stack, and C<TOP*> gives you |
1325 | the top SV/IV/NV/etc. on the stack. So, for instance, to do unary |
1326 | negation of an integer: |
1327 | |
1328 | SETi(-TOPi); |
1329 | |
1330 | Just set the integer value of the top stack entry to its negation. |
1331 | |
1332 | Argument stack manipulation in the core is exactly the same as it is in |
1333 | XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer |
1334 | description of the macros used in stack manipulation. |
1335 | |
1336 | =item Mark stack |
1337 | |
1338 | I say `your portion of the stack' above because PP code doesn't |
1339 | necessarily get the whole stack to itself: if your function calls |
1340 | another function, you'll only want to expose the arguments aimed for the |
1341 | called function, and not (necessarily) let it get at your own data. The |
1342 | way we do this is to have a `virtual' bottom-of-stack, exposed to each |
1343 | function. The mark stack keeps bookmarks to locations in the argument |
1344 | stack usable by each function. For instance, when dealing with a tied |
1345 | variable (internally, something with `P' magic), Perl has to call |
1346 | methods for accesses to the tied variables. However, we need to separate |
1347 | the arguments exposed to the method from the arguments exposed to the |
1348 | original function - the store or fetch or whatever it may be. Here's how |
1349 | the tied C<push> is implemented; see C<av_push> in F<av.c>: |
1350 | |
1351 | 1 PUSHMARK(SP); |
1352 | 2 EXTEND(SP,2); |
1353 | 3 PUSHs(SvTIED_obj((SV*)av, mg)); |
1354 | 4 PUSHs(val); |
1355 | 5 PUTBACK; |
1356 | 6 ENTER; |
1357 | 7 call_method("PUSH", G_SCALAR|G_DISCARD); |
1358 | 8 LEAVE; |
1359 | 9 POPSTACK; |
13a2d996 |
1360 | |
a422fd2d |
1361 | The lines which concern the mark stack are the first, fifth and last |
1362 | lines: they save away, restore and remove the current position of the |
1363 | argument stack. |
1364 | |
1365 | Let's examine the whole implementation, for practice: |
1366 | |
1367 | 1 PUSHMARK(SP); |
1368 | |
1369 | Push the current state of the stack pointer onto the mark stack. This is |
1370 | so that when we've finished adding items to the argument stack, Perl |
1371 | knows how many things we've added recently. |
1372 | |
1373 | 2 EXTEND(SP,2); |
1374 | 3 PUSHs(SvTIED_obj((SV*)av, mg)); |
1375 | 4 PUSHs(val); |
1376 | |
1377 | We're going to add two more items onto the argument stack: when you have |
1378 | a tied array, the C<PUSH> subroutine receives the object and the value |
1379 | to be pushed, and that's exactly what we have here - the tied object, |
1380 | retrieved with C<SvTIED_obj>, and the value, the SV C<val>. |
1381 | |
1382 | 5 PUTBACK; |
1383 | |
1384 | Next we tell Perl to make the change to the global stack pointer: C<dSP> |
1385 | only gave us a local copy, not a reference to the global. |
1386 | |
1387 | 6 ENTER; |
1388 | 7 call_method("PUSH", G_SCALAR|G_DISCARD); |
1389 | 8 LEAVE; |
1390 | |
1391 | C<ENTER> and C<LEAVE> localise a block of code - they make sure that all |
1392 | variables are tidied up, everything that has been localised gets |
1393 | its previous value returned, and so on. Think of them as the C<{> and |
1394 | C<}> of a Perl block. |
1395 | |
1396 | To actually do the magic method call, we have to call a subroutine in |
1397 | Perl space: C<call_method> takes care of that, and it's described in |
1398 | L<perlcall>. We call the C<PUSH> method in scalar context, and we're |
1399 | going to discard its return value. |
1400 | |
1401 | 9 POPSTACK; |
1402 | |
1403 | Finally, we remove the value we placed on the mark stack, since we |
1404 | don't need it any more. |
1405 | |
1406 | =item Save stack |
1407 | |
1408 | C has no equivalent of Perl's dynamic (C<local>) scoping, so perl provides one. We've |
1409 | seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save |
1410 | stack implements the C equivalent of, for example: |
1411 | |
1412 | { |
1413 | local $foo = 42; |
1414 | ... |
1415 | } |
1416 | |
1417 | See L<perlguts/Localising Changes> for how to use the save stack. |
1418 | |
1419 | =back |
1420 | |
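The interplay between the argument stack and the mark stack can be sketched in a few lines of standalone C. This is a toy model under loud assumptions: real perl keeps C<SV*> pointers on the argument stack and offsets on the mark stack, and every C<toy_*> name below is hypothetical. The point is only to show how a mark hides the caller's data from the callee.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_MAX 32

typedef struct {
    double args[TOY_STACK_MAX];   /* stand-in for the argument stack */
    size_t sp;                    /* number of items currently on it */
    size_t marks[TOY_STACK_MAX];  /* stand-in for the mark stack */
    size_t mark_sp;               /* number of saved marks */
} toy_interp;

static void toy_push(toy_interp *in, double v) {
    assert(in->sp < TOY_STACK_MAX);
    in->args[in->sp++] = v;
}

static double toy_pop(toy_interp *in) {
    assert(in->sp > 0);
    return in->args[--in->sp];
}

/* Like PUSHMARK: remember where the callee's arguments begin. */
static void toy_pushmark(toy_interp *in) {
    assert(in->mark_sp < TOY_STACK_MAX);
    in->marks[in->mark_sp++] = in->sp;
}

/* A "callee" that sums only the items above the most recent mark,
 * removes them and the mark, and leaves one result behind. */
static void toy_call_sum(toy_interp *in) {
    assert(in->mark_sp > 0);
    size_t base = in->marks[--in->mark_sp];
    double total = 0.0;
    while (in->sp > base)
        total += toy_pop(in);
    toy_push(in, total);
}

/* The caller keeps one private value below the mark; it is stored in
 * *caller_value afterwards so we can see it was left untouched. */
static double toy_demo(double *caller_value) {
    toy_interp in = { .sp = 0, .mark_sp = 0 };
    toy_push(&in, 100.0);     /* caller's own data */
    toy_pushmark(&in);
    toy_push(&in, 1.5);       /* arguments for the callee */
    toy_push(&in, 2.5);
    toy_call_sum(&in);
    double result = toy_pop(&in);
    *caller_value = toy_pop(&in);
    return result;
}
```

The callee never looks below its mark, which is the `virtual' bottom-of-stack idea described above in miniature.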
1421 | =head2 Millions of Macros |
1422 | |
1423 | One thing you'll notice about the Perl source is that it's full of |
1424 | macros. Some have called the pervasive use of macros the hardest thing |
1425 | to understand; others find it adds to clarity. Let's take an example, |
1426 | the code which implements the addition operator: |
1427 | |
1428 | 1 PP(pp_add) |
1429 | 2 { |
39644a26 |
1430 | 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); |
a422fd2d |
1431 | 4 { |
1432 | 5 dPOPTOPnnrl_ul; |
1433 | 6 SETn( left + right ); |
1434 | 7 RETURN; |
1435 | 8 } |
1436 | 9 } |
1437 | |
1438 | Every line here (apart from the braces, of course) contains a macro. The |
1439 | first line sets up the function declaration as Perl expects for PP code; |
1440 | line 3 sets up variable declarations for the argument stack and the |
1441 | target, the return value of the operation. Finally, it tries to see if |
1442 | the addition operation is overloaded; if so, the appropriate subroutine |
1443 | is called. |
1444 | |
1445 | Line 5 is another variable declaration - all variable declarations start |
1446 | with C<d> - which pops from the top of the argument stack two NVs (hence |
1447 | C<nn>) and puts them into the variables C<right> and C<left>, hence the |
1448 | C<rl>. These are the two operands to the addition operator. Next, we |
1449 | call C<SETn> to set the NV of the return value to the result of adding |
1450 | the two values. This done, we return - the C<RETURN> macro makes sure |
1451 | that our return value is properly handled, and we pass the next operator |
1452 | to run back to the main run loop. |
1453 | |
1454 | Most of these macros are explained in L<perlapi>, and some of the more |
1455 | important ones are explained in L<perlxs> as well. Pay special attention |
1456 | to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on |
1457 | the C<[pad]THX_?> macros. |
1458 | |
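If the macro style still feels opaque, it may help to see the same pattern in a self-contained toy. The sketch below is not perl's code: C<toy_stack>, C<toy_sp> and the C<TOY_*> and C<dTOY_POPTOPnnrl> macros are made-up stand-ins that operate on plain doubles, but they follow the same conventions (a C<d> prefix for declarations, C<POP>/C<TOP>/C<SET> for stack access).

```c
#include <assert.h>

/* Toy stand-ins, not perl's real definitions. */
static double toy_stack[16];
static double *toy_sp = toy_stack;   /* points at the top item;
                                        slot 0 is an unused sentinel */

#define TOY_PUSHn(v)  (*++toy_sp = (v))   /* push a number */
#define TOY_POPn      (*toy_sp--)         /* fetch and remove the top */
#define TOY_TOPn      (*toy_sp)           /* fetch without removing */
#define TOY_SETn(v)   (*toy_sp = (v))     /* overwrite the top */

/* "d" for declaration: pop the right operand, peek at the left one. */
#define dTOY_POPTOPnnrl \
    double right = TOY_POPn; \
    double left = TOY_TOPn

/* The toy equivalent of the body of pp_add. */
static void toy_pp_add(void) {
    dTOY_POPTOPnnrl;
    TOY_SETn(left + right);
}

/* Push two operands, run the op, and collect the result. */
static double toy_add(double a, double b) {
    toy_sp = toy_stack;
    TOY_PUSHn(a);
    TOY_PUSHn(b);
    toy_pp_add();
    return TOY_POPn;
}
```

Note how the result overwrites the slot that held the left operand: two items come in, one goes out, and nothing needs to be pushed.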
52d59bef |
1459 | =head2 The .i Targets |
1460 | |
1461 | You can expand the macros in a F<foo.c> file by saying |
1462 | |
1463 | make foo.i |
1464 | |
1465 | which will expand the macros using cpp. Don't be scared by the results. |
1466 | |
a422fd2d |
1467 | =head2 Poking at Perl |
1468 | |
1469 | To really poke around with Perl, you'll probably want to build Perl for |
1470 | debugging, like this: |
1471 | |
1472 | ./Configure -d -D optimize=-g |
1473 | make |
1474 | |
1475 | C<-g> is a flag to the C compiler to have it produce debugging |
1476 | information which will allow us to step through a running program. |
1477 | F<Configure> will also turn on the C<DEBUGGING> compilation symbol which |
1478 | enables all the internal debugging code in Perl. There are a whole bunch |
1479 | of things you can debug with this: L<perlrun> lists them all, and the |
1480 | best way to find out about them is to play about with them. The most |
1481 | useful options are probably |
1482 | |
1483 | l Context (loop) stack processing |
1484 | t Trace execution |
1485 | o Method and overloading resolution |
1486 | c String/numeric conversions |
1487 | |
1488 | Some of the functionality of the debugging code can be achieved using XS |
1489 | modules. |
13a2d996 |
1490 | |
a422fd2d |
1491 | -Dr => use re 'debug' |
1492 | -Dx => use O 'Debug' |
1493 | |
1494 | =head2 Using a source-level debugger |
1495 | |
1496 | If the debugging output of C<-D> doesn't help you, it's time to step |
1497 | through perl's execution with a source-level debugger. |
1498 | |
1499 | =over 3 |
1500 | |
1501 | =item * |
1502 | |
1503 | We'll use C<gdb> for our examples here; the principles will apply to any |
1504 | debugger, but check the manual of the one you're using. |
1505 | |
1506 | =back |
1507 | |
1508 | To fire up the debugger, type |
1509 | |
1510 | gdb ./perl |
1511 | |
1512 | You'll want to do that in your Perl source tree so the debugger can read |
1513 | the source code. You should see the copyright message, followed by the |
1514 | prompt. |
1515 | |
1516 | (gdb) |
1517 | |
1518 | C<help> will get you into the documentation, but here are the most |
1519 | useful commands: |
1520 | |
1521 | =over 3 |
1522 | |
1523 | =item run [args] |
1524 | |
1525 | Run the program with the given arguments. |
1526 | |
1527 | =item break function_name |
1528 | |
1529 | =item break source.c:xxx |
1530 | |
1531 | Tells the debugger that we'll want to pause execution when we reach |
cea6626f |
1532 | either the named function (but see L<perlguts/Internal Functions>!) or the given |
a422fd2d |
1533 | line in the named source file. |
1534 | |
1535 | =item step |
1536 | |
1537 | Steps through the program a line at a time. |
1538 | |
1539 | =item next |
1540 | |
1541 | Steps through the program a line at a time, without descending into |
1542 | functions. |
1543 | |
1544 | =item continue |
1545 | |
1546 | Run until the next breakpoint. |
1547 | |
1548 | =item finish |
1549 | |
1550 | Run until the end of the current function, then stop again. |
1551 | |
13a2d996 |
1552 | =item 'enter' |
a422fd2d |
1553 | |
1554 | Just pressing Enter will do the most recent operation again - it's a |
1555 | blessing when stepping through miles of source code. |
1556 | |
1557 | =item print |
1558 | |
1559 | Execute the given C code and print its results. B<WARNING>: Perl makes |
52d59bef |
1560 | heavy use of macros, and F<gdb> does not necessarily support macros |
1561 | (see later L</"gdb macro support">). You'll have to substitute them |
1562 | yourself, or invoke cpp on the source code files |
1563 | (see L</"The .i Targets">). |
1564 | So, for instance, you can't say |
a422fd2d |
1565 | |
1566 | print SvPV_nolen(sv) |
1567 | |
1568 | but you have to say |
1569 | |
1570 | print Perl_sv_2pv_nolen(sv) |
1571 | |
ffc145e8 |
1572 | =back |
1573 | |
a422fd2d |
1574 | You may find it helpful to have a "macro dictionary", which you can |
1575 | produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't |
52d59bef |
1576 | recursively apply those macros for you. |
1577 | |
1578 | =head2 gdb macro support |
a422fd2d |
1579 | |
52d59bef |
1580 | Recent versions of F<gdb> have fairly good macro support, but |
ea031e66 |
1581 | in order to use it you'll need to compile perl with macro definitions |
1582 | included in the debugging information. Using F<gcc> version 3.1, this |
1583 | means configuring with C<-Doptimize=-g3>. Other compilers might use a |
1584 | different switch (if they support debugging macros at all). |
1585 | |
a422fd2d |
1586 | =head2 Dumping Perl Data Structures |
1587 | |
1588 | One way to get around this macro hell is to use the dumping functions in |
1589 | F<dump.c>; these work a little like an internal |
1590 | L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures |
1591 | that you can't get at from Perl. Let's take an example. We'll use the |
1592 | C<$a = $b + $c> we used before, but give it a bit of context: |
1593 | C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around? |
1594 | |
1595 | What about C<pp_add>, the function we examined earlier to implement the |
1596 | C<+> operator: |
1597 | |
1598 | (gdb) break Perl_pp_add |
1599 | Breakpoint 1 at 0x46249f: file pp_hot.c, line 309. |
1600 | |
cea6626f |
1601 | Notice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>. |
a422fd2d |
1602 | With the breakpoint in place, we can run our program: |
1603 | |
1604 | (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c' |
1605 | |
1606 | Lots of junk will go past as gdb reads in the relevant source files and |
1607 | libraries, and then: |
1608 | |
1609 | Breakpoint 1, Perl_pp_add () at pp_hot.c:309 |
39644a26 |
1610 | 309 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); |
a422fd2d |
1611 | (gdb) step |
1612 | 311 dPOPTOPnnrl_ul; |
1613 | (gdb) |
1614 | |
1615 | We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul> |
1616 | arranges for two C<NV>s to be placed into C<left> and C<right> - let's |
1617 | slightly expand it: |
1618 | |
1619 | #define dPOPTOPnnrl_ul NV right = POPn; \ |
1620 | SV *leftsv = TOPs; \ |
1621 | NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0 |
1622 | |
1623 | C<POPn> takes the SV from the top of the stack and obtains its NV either |
1624 | directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function. |
1625 | C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses |
1626 | C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from |
1627 | C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>. |
1628 | |
1629 | Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to |
1630 | convert it. If we step again, we'll find ourselves there: |
1631 | |
1632 | Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669 |
1633 | 1669 if (!sv) |
1634 | (gdb) |
1635 | |
1636 | We can now use C<Perl_sv_dump> to investigate the SV, by asking gdb to C<print Perl_sv_dump(sv)>: |
1637 | |
1638 | SV = PV(0xa057cc0) at 0xa0675d0 |
1639 | REFCNT = 1 |
1640 | FLAGS = (POK,pPOK) |
1641 | PV = 0xa06a510 "6XXXX"\0 |
1642 | CUR = 5 |
1643 | LEN = 6 |
1644 | $1 = void |
1645 | |
1646 | We know we're going to get C<6> from this, so let's finish the |
1647 | subroutine: |
1648 | |
1649 | (gdb) finish |
1650 | Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671 |
1651 | 0x462669 in Perl_pp_add () at pp_hot.c:311 |
1652 | 311 dPOPTOPnnrl_ul; |
1653 | |
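Why C<6>? Perl takes the leading numeric part of the string and ignores the rest. Perl has its own conversion code for this, so the following is only an analogy in standard C: C<strtod> happens to behave similarly for this input. The function name C<toy_numeric_prefix> is invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the value of the longest numeric prefix of s, and report
 * through *clean whether the whole string was consumed. */
static double toy_numeric_prefix(const char *s, int *clean) {
    char *end;
    double value = strtod(s, &end);
    /* clean only if something was parsed and nothing was left over */
    *clean = (end != s && *end == '\0');
    return value;
}
```

Given C<"6XXXX">, the function returns 6 and reports that trailing characters were left unparsed, which is the situation in which perl would warn about a non-numeric argument if warnings were enabled.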
1654 | We can also dump out this op: the current op is always stored in |
1655 | C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us |
1656 | similar output to L<B::Debug|B::Debug>. |
1657 | |
1658 | { |
1659 | 13 TYPE = add ===> 14 |
1660 | TARG = 1 |
1661 | FLAGS = (SCALAR,KIDS) |
1662 | { |
1663 | TYPE = null ===> (12) |
1664 | (was rv2sv) |
1665 | FLAGS = (SCALAR,KIDS) |
1666 | { |
1667 | 11 TYPE = gvsv ===> 12 |
1668 | FLAGS = (SCALAR) |
1669 | GV = main::b |
1670 | } |
1671 | } |
1672 | |
10f58044 |
1673 | # finish this later # |
a422fd2d |
1674 | |
1675 | =head2 Patching |
1676 | |
1677 | All right, we've now had a look at how to navigate the Perl sources and |
1678 | some things you'll need to know when fiddling with them. Let's now get |
1679 | on and create a simple patch. Here's something Larry suggested: if a |
1680 | C<U> is the first active format during a C<pack> (for example, |
1681 | C<pack "U3C8", @stuff>), then the resulting string should be treated as |
1e54db1a |
1682 | UTF-8 encoded. |
a422fd2d |
1683 | |
1684 | How do we prepare to fix this up? First we locate the code in question - |
1685 | the C<pack> happens at runtime, so it's going to be in one of the F<pp> |
1686 | files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be |
1687 | altering this file, let's copy it to F<pp.c~>. |
1688 | |
a6ec74c1 |
1689 | [Well, it was in F<pp.c> when this tutorial was written. It has now been |
1690 | split off with C<pp_unpack> to its own file, F<pp_pack.c>] |
1691 | |
a422fd2d |
1692 | Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then |
1693 | loop over the pattern, taking each format character in turn into |
1694 | C<datumtype>. Then for each possible format character, we swallow up |
1695 | the other arguments in the pattern (a field width, an asterisk, and so |
1696 | on) and convert the next chunk of input into the specified format, adding |
1697 | it onto the output SV C<cat>. |
1698 | |
1699 | How do we know if the C<U> is the first format in the C<pat>? Well, if |
1700 | we have a pointer to the start of C<pat> then, if we see a C<U> we can |
1701 | test whether we're still at the start of the string. So, here's where |
1702 | C<pat> is set up: |
1703 | |
1704 | STRLEN fromlen; |
1705 | register char *pat = SvPVx(*++MARK, fromlen); |
1706 | register char *patend = pat + fromlen; |
1707 | register I32 len; |
1708 | I32 datumtype; |
1709 | SV *fromstr; |
1710 | |
1711 | We'll have another string pointer in there: |
1712 | |
1713 | STRLEN fromlen; |
1714 | register char *pat = SvPVx(*++MARK, fromlen); |
1715 | register char *patend = pat + fromlen; |
1716 | + char *patcopy; |
1717 | register I32 len; |
1718 | I32 datumtype; |
1719 | SV *fromstr; |
1720 | |
1721 | And just before we start the loop, we'll set C<patcopy> to be the start |
1722 | of C<pat>: |
1723 | |
1724 | items = SP - MARK; |
1725 | MARK++; |
1726 | sv_setpvn(cat, "", 0); |
1727 | + patcopy = pat; |
1728 | while (pat < patend) { |
1729 | |
1730 | Now if we see a C<U> which was at the start of the string, we turn on |
1e54db1a |
1731 | the C<UTF8> flag for the output SV, C<cat>: |
a422fd2d |
1732 | |
1733 | + if (datumtype == 'U' && pat==patcopy+1) |
1734 | + SvUTF8_on(cat); |
1735 | if (datumtype == '#') { |
1736 | while (pat < patend && *pat != '\n') |
1737 | pat++; |
1738 | |
1739 | Remember that it has to be C<patcopy+1> because the first character of |
1740 | the string is the C<U> which has been swallowed into C<datumtype>! |
1741 | |
1742 | Oops, we forgot one thing: what if there are spaces at the start of the |
1743 | pattern? C<pack(" U*", @stuff)> will have C<U> as the first active |
1744 | character, even though it's not the first thing in the pattern. In this |
1745 | case, we have to advance C<patcopy> along with C<pat> when we see spaces: |
1746 | |
1747 | if (isSPACE(datumtype)) |
1748 | continue; |
1749 | |
1750 | needs to become |
1751 | |
1752 | if (isSPACE(datumtype)) { |
1753 | patcopy++; |
1754 | continue; |
1755 | } |
1756 | |
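The pointer bookkeeping is easy to get wrong, so here is the same idea pulled out into a standalone C function that you can compile and test without building perl. It is a sketch, not the patch itself: C<toy_first_active_is_U> is a made-up name, it uses the standard C<isspace> rather than perl's C<isSPACE>, and it only scans the pattern instead of packing anything.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if 'U' is the first non-space format character in the
 * pattern, using the same pat/patcopy comparison as the patch. */
static int toy_first_active_is_U(const char *pat) {
    const char *patend = pat + strlen(pat);
    const char *patcopy = pat;   /* start of the "active" pattern */

    while (pat < patend) {
        int datumtype = (unsigned char)*pat++;
        if (isspace(datumtype)) {
            patcopy++;           /* keep patcopy level with pat */
            continue;
        }
        /* pat has already moved past the character we just read,
         * hence the +1 */
        if (datumtype == 'U' && pat == patcopy + 1)
            return 1;
        /* any other character: keep scanning, as pp_pack would
         * while packing the corresponding values */
    }
    return 0;
}
```

Because C<patcopy> only advances on spaces while C<pat> advances on every character, the two can only be one apart when everything seen so far has been whitespace.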
1757 | OK. That's the C part done. Now we must do two additional things before |
1758 | this patch is ready to go: we've changed the behaviour of Perl, and so |
1759 | we must document that change. We must also provide some more regression |
1760 | tests to make sure our patch works and doesn't create a bug somewhere |
1761 | else along the line. |
1762 | |
b23b8711 |
1763 | The regression tests for each operator live in F<t/op/>, and so we |
1764 | make a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our |
1765 | tests to the end. First, we'll test that the C<U> does indeed create |
1766 | Unicode strings. |
1767 | |
1768 | t/op/pack.t has a sensible ok() function, but if it didn't we could |
35c336e6 |
1769 | use the one from t/test.pl. |
b23b8711 |
1770 | |
35c336e6 |
1771 | require './test.pl'; |
1772 | plan( tests => 159 ); |
b23b8711 |
1773 | |
1774 | so instead of this: |
a422fd2d |
1775 | |
1776 | print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000); |
1777 | print "ok $test\n"; $test++; |
1778 | |
35c336e6 |
1779 | we can write the more sensible version (see L<Test::More> for a full |
1780 | explanation of is() and other testing functions): |
b23b8711 |
1781 | |
35c336e6 |
1782 | is( "1.20.300.4000", sprintf("%vd", pack("U*",1,20,300,4000)), |
812f5127 |
1783 | "U* produces unicode" ); |
b23b8711 |
1784 | |
a422fd2d |
1785 | Now we'll test that we got that space-at-the-beginning business right: |
1786 | |
35c336e6 |
1787 | is( "1.20.300.4000", sprintf("%vd", pack(" U*",1,20,300,4000)), |
812f5127 |
1788 | " with spaces at the beginning" ); |
a422fd2d |
1789 | |
1790 | And finally we'll test that we don't make Unicode strings if C<U> is B<not> |
1791 | the first active format: |
1792 | |
35c336e6 |
1793 | isnt( "1.20.300.4000", sprintf("%vd", pack("C0U*",1,20,300,4000)), |
812f5127 |
1794 | "U* not first isn't unicode" ); |
a422fd2d |
1795 | |
35c336e6 |
1796 | Mustn't forget to change the number of tests which appears at the top, |
1797 | or else the automated tester will get confused. This will either look |
1798 | like this: |
a422fd2d |
1799 | |
35c336e6 |
1800 | print "1..156\n"; |
1801 | |
1802 | or this: |
1803 | |
1804 | plan( tests => 156 ); |
a422fd2d |
1805 | |
1806 | We now compile up Perl, and run it through the test suite. Our new |
1807 | tests pass, hooray! |
1808 | |
1809 | Finally, the documentation. The job is never done until the paperwork is |
1810 | over, so let's describe the change we've just made. The relevant place |
1811 | is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert |
1812 | this text in the description of C<pack>: |
1813 | |
1814 | =item * |
1815 | |
1816 | If the pattern begins with a C<U>, the resulting string will be treated |
1e54db1a |
1817 | as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string |
1818 | with an initial C<U0>, and the bytes that follow will be interpreted as |
1819 | Unicode characters. If you don't want this to happen, you can begin your |
1820 | pattern with C<C0> (or anything else) to force Perl not to UTF-8 encode your |
a422fd2d |
1821 | string, and then follow this with a C<U*> somewhere in your pattern. |
1822 | |
1823 | All done. Now let's create the patch. F<Porting/patching.pod> tells us |
1824 | that if we're making major changes, we should copy the entire directory |
1825 | to somewhere safe before we begin fiddling, and then do |
13a2d996 |
1826 | |
a422fd2d |
1827 | diff -ruN old new > patch |
1828 | |
1829 | However, we know which files we've changed, and we can simply do this: |
1830 | |
1831 | diff -u pp.c~ pp.c > patch |
1832 | diff -u t/op/pack.t~ t/op/pack.t >> patch |
1833 | diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch |
1834 | |
1835 | We end up with a patch looking a little like this: |
1836 | |
1837 | --- pp.c~ Fri Jun 02 04:34:10 2000 |
1838 | +++ pp.c Fri Jun 16 11:37:25 2000 |
1839 | @@ -4375,6 +4375,7 @@ |
1840 | register I32 items; |
1841 | STRLEN fromlen; |
1842 | register char *pat = SvPVx(*++MARK, fromlen); |
1843 | + char *patcopy; |
1844 | register char *patend = pat + fromlen; |
1845 | register I32 len; |
1846 | I32 datumtype; |
1847 | @@ -4405,6 +4406,7 @@ |
1848 | ... |
1849 | |
1850 | And finally, we submit it, with our rationale, to perl5-porters. Job |
1851 | done! |
1852 | |
f7e1e956 |
1853 | =head2 Patching a core module |
1854 | |
1855 | This works just like patching anything else, with an extra |
1856 | consideration. Many core modules also live on CPAN. If this is so, |
1857 | patch the CPAN version instead of the core and send the patch off to |
1858 | the module maintainer (with a copy to p5p). This will help the module |
1859 | maintainer keep the CPAN version in sync with the core version without |
1860 | constantly scanning p5p. |
1861 | |
acbe17fc |
1862 | =head2 Adding a new function to the core |
1863 | |
1864 | If, as part of a patch to fix a bug, or just because you have an |
1865 | especially good idea, you decide to add a new function to the core, |
1866 | discuss your ideas on p5p well before you start work. It may be that |
1867 | someone else has already attempted to do what you are considering and |
1868 | can give lots of good advice or even provide you with bits of code |
1869 | that they already started (but never finished). |
1870 | |
1871 | You have to follow all of the advice given above for patching. It is |
1872 | extremely important to test any addition thoroughly and add new tests |
1873 | to explore all boundary conditions that your new function is expected |
1874 | to handle. If your new function is used only by one module (e.g. toke), |
1875 | then it should probably be named S_your_function (for static); on the |
210b36aa |
1876 | other hand, if you expect it to be accessible from other functions in |
acbe17fc |
1877 | Perl, you should name it Perl_your_function. See L<perlguts/Internal Functions> |
1878 | for more details. |
1879 | |
1880 | The location of any new code is also an important consideration. Don't |
1881 | just create a new top level .c file and put your code there; you would |
1882 | have to make changes to Configure (so the Makefile is created properly), |
1883 | as well as possibly lots of include files. This is strictly pumpking |
1884 | business. |
1885 | |
1886 | It is better to add your function to one of the existing top level |
1887 | source code files, but your choice is complicated by the nature of |
1888 | the Perl distribution. Only the files that are marked as compiled |
1889 | static are located in the perl executable. Everything else is located |
1890 | in the shared library (or DLL if you are running under WIN32). So, |
1891 | for example, if a function was only used by functions located in |
1892 | toke.c, then your code can go in toke.c. If, however, you want to call |
1893 | the function from universal.c, then you should put your code in another |
1894 | location, for example util.c. |
1895 | |
1896 | In addition to writing your C code, you will need to create an |
1897 | appropriate entry in embed.pl describing your function, then run |
1898 | 'make regen_headers' to create the entries in the numerous header |
1899 | files that perl needs to compile correctly. See L<perlguts/Internal Functions> |
1900 | for information on the various options that you can set in embed.pl. |
1901 | You will forget to do this a few (or many) times and you will get |
1902 | warnings during the compilation phase. Make sure that you mention |
1903 | this when you post your patch to P5P; the pumpking needs to know this. |
1904 | |
1905 | When you write your new code, please be conscious of existing code |
884bad00 |
1906 | conventions used in the perl source files. See L<perlstyle> for |
acbe17fc |
1907 | details. Although most of the guidelines discussed seem to focus on |
1908 | Perl code, rather than C, they all apply (except when they don't ;). |
1909 | See also the F<Porting/patching.pod> file in the Perl source distribution |
1910 | for lots of details about both formatting and submitting patches of |
1911 | your changes. |
1912 | |
1913 | Lastly, TEST TEST TEST TEST TEST any code before posting to p5p. |
1914 | Test on as many platforms as you can find. Test as many perl |
1915 | Configure options as you can (e.g. MULTIPLICITY). If you have |
1916 | profiling or memory tools, see L<EXTERNAL TOOLS FOR DEBUGGING PERL> |
210b36aa |
1917 | below for how to use them to further test your code. Remember that |
acbe17fc |
1918 | most of the people on P5P are doing this on their own time and |
1919 | don't have the time to debug your code. |
f7e1e956 |
1920 | |
1921 | =head2 Writing a test |
1922 | |
1923 | Every module and built-in function has an associated test file (or |
1924 | should...). If you add or change functionality, you have to write a |
1925 | test. If you fix a bug, you have to write a test so that bug never |
1926 | comes back. If you alter the docs, it would be nice to test what the |
1927 | new documentation says. |
1928 | |
1929 | In short, if you submit a patch you probably also have to patch the |
1930 | tests. |
1931 | |
1932 | For modules, the test file is right next to the module itself. |
1933 | F<lib/strict.t> tests F<lib/strict.pm>. This is a recent innovation, |
1934 | so there are some snags (and it would be wonderful for you to brush |
1935 | them out), but it basically works that way. Everything else lives in |
1936 | F<t/>. |
1937 | |
1938 | =over 3 |
1939 | |
1940 | =item F<t/base/> |
1941 | |
1942 | Testing of the absolute basic functionality of Perl. Things like |
1943 | C<if>, basic file reads and writes, simple regexes, etc. These are |
1944 | run first in the test suite and if any of them fail, something is |
1945 | I<really> broken. |
1946 | |
1947 | =item F<t/cmd/> |
1948 | |
1949 | These test the basic control structures, C<if/else>, C<while>, |
35c336e6 |
1950 | subroutines, etc. |
f7e1e956 |
1951 | |
1952 | =item F<t/comp/> |
1953 | |
1954 | Tests basic issues of how Perl parses and compiles itself. |
1955 | |
1956 | =item F<t/io/> |
1957 | |
1958 | Tests for built-in IO functions, including command line arguments. |
1959 | |
1960 | =item F<t/lib/> |
1961 | |
1962 | The old home for the module tests; you shouldn't put anything new in |
1963 | here. There are still some bits and pieces hanging around in here |
1964 | that need to be moved. Perhaps you could move them? Thanks! |
1965 | |
1966 | =item F<t/op/> |
1967 | |
1968 | Tests for perl's built-in functions that don't fit into any of the |
1969 | other directories. |
1970 | |
1971 | =item F<t/pod/> |
1972 | |
1973 | Tests for POD directives. There are still some tests for the Pod |
1974 | modules hanging around in here that need to be moved out into F<lib/>. |
1975 | |
1976 | =item F<t/run/> |
1977 | |
1978 | Testing features of how perl actually runs, including exit codes and |
1979 | handling of PERL* environment variables. |
1980 | |
244d9cb7 |
1981 | =item F<t/uni/> |
1982 | |
1983 | Tests for the core support of Unicode. |
1984 | |
1985 | =item F<t/win32/> |
1986 | |
1987 | Windows-specific tests. |
1988 | |
1989 | =item F<t/x2p/> |
1990 | |
1991 | A test suite for the s2p converter. |
1992 | |
f7e1e956 |
1993 | =back |
1994 | |
1995 | The core uses the same testing style as the rest of Perl, a simple |
1996 | "ok/not ok" run through Test::Harness, but there are a few special |
1997 | considerations. |
1998 | |
35c336e6 |
1999 | There are three ways to write a test in the core: Test::More, |
2000 | t/test.pl and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. The |
2001 | decision of which to use depends on what part of the test suite you're |
2002 | working on. This is a measure to prevent a high-level failure (such |
2003 | as Config.pm breaking) from causing basic functionality tests to fail. |
2004 | |
2005 | =over 4 |
2006 | |
2007 | =item t/base t/comp |
2008 | |
2009 | Since we don't know if require works, or even subroutines, use ad hoc |
2010 | tests for these two. Step carefully to avoid using the feature being |
2011 | tested. |
2012 | |
2013 | =item t/cmd t/run t/io t/op |
2014 | |
2015 | Now that basic require() and subroutines are tested, you can use the |
2016 | t/test.pl library which emulates the important features of Test::More |
2017 | while using a minimum of core features. |
2018 | |
2019 | You can also conditionally use certain libraries like Config, but be |
2020 | sure to skip the test gracefully if it's not there. |
2021 | |
2022 | =item t/lib ext lib |
2023 | |
2024 | Now that the core of Perl is tested, Test::More can be used. You can |
2025 | also use the full suite of core modules in the tests. |
2026 | |
2027 | =back |
f7e1e956 |
2028 | |
2029 | When you say "make test" Perl uses the F<t/TEST> program to run the |
7205a85d |
2030 | test suite (except under Win32, where it uses F<t/harness> instead). |
2031 | All tests are run from the F<t/> directory, B<not> the directory |
2032 | which contains the test. This causes some problems with the tests |
2033 | in F<lib/>, so here's some opportunity for some patching. |
f7e1e956 |
2034 | |
2035 | You must be triply conscious of cross-platform concerns. This usually |
2036 | boils down to using File::Spec and avoiding things like C<fork()> and |
2037 | C<system()> unless absolutely necessary. |
2038 | |
e018f8be |
2039 | =head2 Special Make Test Targets |
2040 | |
2041 | There are various special make targets that can be used to test Perl |
2042 | slightly differently than the standard "test" target. Not all of them |
2043 | are expected to give a 100% success rate. Many of them have several |
7205a85d |
2044 | aliases, and many of them are not available on certain operating |
2045 | systems. |
e018f8be |
2046 | |
2047 | =over 4 |
2048 | |
2049 | =item coretest |
2050 | |
7d7d5695 |
2051 | Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests). |
e018f8be |
2052 | |
7205a85d |
2053 | (Not available on Win32) |
2054 | |
e018f8be |
2055 | =item test.deparse |
2056 | |
b26492ee |
2057 | Run all the tests through B::Deparse. Not all tests will succeed. |
2058 | |
7205a85d |
2059 | (Not available on Win32) |
2060 | |
b26492ee |
2061 | =item test.taintwarn |
2062 | |
2063 | Run all tests with the B<-t> command-line switch. Not all tests |
2064 | are expected to succeed (until they're specifically fixed, of course). |
e018f8be |
2065 | |
7205a85d |
2066 | (Not available on Win32) |
2067 | |
e018f8be |
2068 | =item minitest |
2069 | |
2070 | Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>, |
2071 | F<t/op>, and F<t/uni> tests. |
2072 | |
7a834142 |
2073 | =item test.valgrind check.valgrind utest.valgrind ucheck.valgrind |
2074 | |
2075 | (Only in Linux) Run all the tests using the memory leak + naughty |
2076 | memory access tool "valgrind". The log files will be named |
2077 | F<testname.valgrind>. |
2078 | |
e018f8be |
2079 | =item test.third check.third utest.third ucheck.third |
2080 | |
2081 | (Only in Tru64) Run all the tests using the memory leak + naughty |
2082 | memory access tool "Third Degree". The log files will be named |
2083 | F<perl3.log.testname>. |
2084 | |
2085 | =item test.torture torturetest |
2086 | |
2087 | Run all the usual tests and some extra tests. As of Perl 5.8.0 the |
244d9cb7 |
2088 | only extra tests are Abigail's JAPHs, F<t/japh/abigail.t>. |
e018f8be |
2089 | |
2090 | You can also run the torture test with F<t/harness> by giving |
2091 | the C<-torture> argument to F<t/harness>. |
2092 | |
2093 | =item utest ucheck test.utf8 check.utf8 |
2094 | |
2095 | Run all the tests with -Mutf8. Not all tests will succeed. |
2096 | |
7205a85d |
2097 | (Not available on Win32) |
2098 | |
cc0710ff |
2099 | =item minitest.utf16 test.utf16 |
2100 | |
2101 | Runs the tests with UTF-16 encoded scripts, encoded with different |
2102 | versions of this encoding. |
2103 | |
2104 | C<make utest.utf16> runs the test suite with a combination of C<-utf8> and |
2105 | C<-utf16> arguments to F<t/TEST>. |
2106 | |
7205a85d |
2107 | (Not available on Win32) |
2108 | |
244d9cb7 |
2109 | =item test_harness |
2110 | |
2111 | Run the test suite with the F<t/harness> controlling program, instead of |
2112 | F<t/TEST>. F<t/harness> is more sophisticated, and uses the |
2113 | L<Test::Harness> module, thus using this test target supposes that perl |
2114 | mostly works. The main advantage for our purposes is that it prints a |
00bf5cd9 |
2115 | detailed summary of failed tests at the end. Also, unlike F<t/TEST>, it |
2116 | doesn't redirect stderr to stdout. |
244d9cb7 |
2117 | |
7205a85d |
2118 | Note that under Win32 F<t/harness> is always used instead of F<t/TEST>, so |
2119 | there is no special "test_harness" target. |
2120 | |
2121 | Under Win32's "test" target you may use the TEST_SWITCHES and TEST_FILES |
2122 | environment variables to control the behaviour of F<t/harness>. This means |
2123 | you can say |
2124 | |
2125 | nmake test TEST_FILES="op/*.t" |
2126 | nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t" |
2127 | |
2128 | =item test-notty test_notty |
2129 | |
2130 | Sets PERL_SKIP_TTY_TEST to true before running the normal tests. |
2131 | |
244d9cb7 |
2132 | =back |
2133 | |
2134 | =head2 Running tests by hand |
2135 | |
2136 | You can run part of the test suite by hand by using one of the following |
2137 | commands from the F<t/> directory: |
2138 | |
2139 | ./perl -I../lib TEST list-of-.t-files |
2140 | |
2141 | or |
2142 | |
2143 | ./perl -I../lib harness list-of-.t-files |
2144 | |
2145 | (if you don't specify test scripts, the whole test suite will be run.) |
2146 | |
7205a85d |
2147 | =head3 Using t/harness for testing |
2148 | |
2149 | If you use C<harness> for testing you have several command line options |
2150 | available to you. The arguments are as follows, and are in the order |
2151 | that they must appear if used together. |
2152 | |
2153 | harness -v -torture -re=pattern LIST OF FILES TO TEST |
2154 | harness -v -torture -re LIST OF PATTERNS TO MATCH |
2155 | |
2156 | If C<LIST OF FILES TO TEST> is omitted the file list is obtained from |
2157 | the manifest. The file list may include shell wildcards which will be |
2158 | expanded out. |
2159 | |
2160 | =over 4 |
2161 | |
2162 | =item -v |
2163 | |
2164 | Run the tests under verbose mode so you can see what tests were run, |
2165 | and debug output. |
2166 | |
2167 | =item -torture |
2168 | |
2169 | Run the torture tests as well as the normal set. |
2170 | |
2171 | =item -re=PATTERN |
2172 | |
2173 | Filter the file list so that all the test files run match PATTERN. |
2174 | Note that this form is distinct from the B<-re LIST OF PATTERNS> form below |
2175 | in that it allows the file list to be provided as well. |
2176 | |
2177 | =item -re LIST OF PATTERNS |
2178 | |
2179 | Filter the file list so that all the test files run match |
2180 | /(LIST|OF|PATTERNS)/. Note that with this form the patterns |
2181 | are joined by '|' and you cannot supply a list of files, instead |
2182 | the test files are obtained from the MANIFEST. |
2183 | |
2184 | =back |
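
As a rough illustration of the B<-re LIST OF PATTERNS> form, the following
shell sketch joins its arguments with '|' and filters a list of file names
in the same spirit. The C<join_patterns> function and the sample file names
are hypothetical and are not part of F<t/harness>; the sketch only mimics
the filtering described above, using C<grep -E> in place of a Perl regular
expression.

```shell
# join_patterns: hypothetical helper, not part of t/harness.
# Joins its arguments with '|' and wraps them in parentheses,
# e.g. "op/sort" "op/split" becomes "(op/sort|op/split)".
# The body runs in a subshell so the IFS change does not leak.
join_patterns() (
    IFS='|'
    printf '(%s)\n' "$*"
)

# Filter a made-up file list; only names matching a pattern survive.
pattern=$(join_patterns op/sort op/split)
printf '%s\n' op/sort.t op/split.t io/open.t | grep -E "$pattern"
```

With the B<-re=PATTERN> form the same kind of filter would be applied to
a file list that you supply yourself, rather than to one taken from the
MANIFEST.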
2185 | |
244d9cb7 |
2186 | You can run an individual test by a command similar to |
2187 | |
2188 | ./perl -I../lib path/to/foo.t |
2189 | |
2190 | except that the harnesses set up some environment variables that may |
2191 | affect the execution of the test: |
2192 | |
2193 | =over 4 |
2194 | |
2195 | =item PERL_CORE=1 |
2196 | |
2197 | indicates that we're running this test as part of the perl core test suite. |
2198 | This is useful for modules that have a dual life on CPAN. |
2199 | |
2200 | =item PERL_DESTRUCT_LEVEL=2 |
2201 | |
2202 | is set to 2 if it isn't set already (see L</PERL_DESTRUCT_LEVEL>) |
2203 | |
2204 | =item PERL |
2205 | |
2206 | (used only by F<t/TEST>) if set, overrides the path to the perl executable |
2207 | that should be used to run the tests (the default being F<./perl>). |
2208 | |
2209 | =item PERL_SKIP_TTY_TEST |
2210 | |
2211 | if set, causes the tests that need a terminal to be skipped. It's actually set |
2212 | automatically by the Makefile, but can also be forced artificially by |
2213 | running 'make test_notty'. |
2214 | |
e018f8be |
2215 | =back |
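
When running a single test by hand, a small shell wrapper can approximate
the environment listed above. This is only a sketch: the function name
C<with_core_test_env> is hypothetical, and the harnesses may set up other
things that are not covered here.

```shell
# with_core_test_env: hypothetical helper, not part of the Perl sources.
# Runs a command with PERL_CORE=1 and with PERL_DESTRUCT_LEVEL
# defaulting to 2 unless the caller has already set it.
# The body runs in a subshell, so the caller's environment is untouched.
with_core_test_env() (
    PERL_CORE=1
    : "${PERL_DESTRUCT_LEVEL:=2}"
    export PERL_CORE PERL_DESTRUCT_LEVEL
    "$@"
)

# Intended use from the t/ directory (not executed here):
#   with_core_test_env ./perl -I../lib path/to/foo.t
```

Because the default is applied only when PERL_DESTRUCT_LEVEL is unset or
empty, a value chosen by the caller is passed through unchanged.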
f7e1e956 |
2216 | |
902b9dbf |
2217 | =head1 EXTERNAL TOOLS FOR DEBUGGING PERL |
2218 | |
2219 | Sometimes it helps to use external tools while debugging and |
2220 | testing Perl. This section tries to guide you through using |
2221 | some common testing and debugging tools with Perl. This is |
2222 | meant as a guide to interfacing these tools with Perl, not |
2223 | as any kind of guide to the use of the tools themselves. |
2224 | |
a958818a |
2225 | B<NOTE 1>: Running under memory debuggers such as Purify, valgrind, or |
2226 | Third Degree greatly slows down the execution: seconds become minutes, |
2227 | minutes become hours. For example as of Perl 5.8.1, the |
2228 | ext/Encode/t/Unicode.t takes extraordinarily long to complete under |
2229 | e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more |
2230 | than six hours, even on a snappy computer-- the said test must be |
2231 | doing something that is quite unfriendly for memory debuggers. If you |
2232 | don't feel like waiting, you can simply kill the perl |
2233 | process. |
2234 | |
2235 | B<NOTE 2>: To minimize the number of memory leak false alarms (see |
2236 | L</PERL_DESTRUCT_LEVEL> for more information), you have to have |
2237 | environment variable PERL_DESTRUCT_LEVEL set to 2. The F<TEST> |
2238 | and harness scripts do that automatically. But if you are running |
2239 | some of the tests manually, you must set it yourself. For csh-like shells: |
2240 | |
2241 | setenv PERL_DESTRUCT_LEVEL 2 |
2242 | |
2243 | and for Bourne-type shells: |
2244 | |
2245 | PERL_DESTRUCT_LEVEL=2 |
2246 | export PERL_DESTRUCT_LEVEL |
2247 | |
2248 | or in UNIXy environments you can also use the C<env> command: |
2249 | |
2250 | env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ... |
a1b65709 |
2251 | |
37c0adeb |
2252 | B<NOTE 3>: There are known memory leaks when there are compile-time |
2253 | errors within eval or require; seeing C<S_doeval> in the call stack |
2254 | is a good sign of these. Fixing these leaks is non-trivial, |
2255 | unfortunately, but they must be fixed eventually. |
2256 | |
902b9dbf |
2257 | =head2 Rational Software's Purify |
2258 | |
2259 | Purify is a commercial tool that is helpful in identifying |
2260 | memory overruns, wild pointers, memory leaks and other such |
2261 | badness. Perl must be compiled in a specific way for |
2262 | optimal testing with Purify. Purify is available under |
2263 | Windows NT, Solaris, HP-UX, SGI, and Siemens Unix. |
2264 | |
902b9dbf |
2265 | =head2 Purify on Unix |
2266 | |
2267 | On Unix, Purify creates a new Perl binary. To get the most |
2268 | benefit out of Purify, you should create the perl to Purify |
2269 | using: |
2270 | |
2271 | sh Configure -Accflags=-DPURIFY -Doptimize='-g' \ |
2272 | -Uusemymalloc -Dusemultiplicity |
2273 | |
2274 | where these arguments mean: |
2275 | |
2276 | =over 4 |
2277 | |
2278 | =item -Accflags=-DPURIFY |
2279 | |
2280 | Disables Perl's arena memory allocation functions, as well as |
2281 | forcing use of memory allocation functions derived from the |
2282 | system malloc. |
2283 | |
2284 | =item -Doptimize='-g' |
2285 | |
2286 | Adds debugging information so that you see the exact source |
2287 | statements where the problem occurs. Without this flag, all |
2288 | you will see is the source filename of where the error occurred. |
2289 | |
2290 | =item -Uusemymalloc |
2291 | |
2292 | Disable Perl's malloc so that Purify can more closely monitor |
2293 | allocations and leaks. Using Perl's malloc will make Purify |
2294 | report most leaks in the "potential" leaks category. |
2295 | |
2296 | =item -Dusemultiplicity |
2297 | |
2298 | Enabling the multiplicity option allows perl to clean up |
2299 | thoroughly when the interpreter shuts down, which reduces the |
2300 | number of bogus leak reports from Purify. |
2301 | |
2302 | =back |
2303 | |
2304 | Once you've compiled a perl suitable for Purify'ing, then you |
2305 | can just: |
2306 | |
2307 | make pureperl |
2308 | |
2309 | which creates a binary named 'pureperl' that has been Purify'ed. |
2310 | This binary is used in place of the standard 'perl' binary |
2311 | when you want to debug Perl memory problems. |
2312 | |
2313 | As an example, to show any memory leaks produced during the |
2314 | standard Perl testset you would create and run the Purify'ed |
2315 | perl as: |
2316 | |
2317 | make pureperl |
2318 | cd t |
2319 | ../pureperl -I../lib harness |
2320 | |
2321 | which would run Perl on test.pl and report any memory problems. |
2322 | |
2323 | Purify outputs messages in "Viewer" windows by default. If |
2324 | you don't have a windowing environment or if you simply |
2325 | want the Purify output to unobtrusively go to a log file |
2326 | instead of to the interactive window, use the following |
2327 | options to output to the log file "perl.log": |
2328 | |
2329 | setenv PURIFYOPTIONS "-chain-length=25 -windows=no \ |
2330 | -log-file=perl.log -append-logfile=yes" |
2331 | |
2332 | If you plan to use the "Viewer" windows, then you only need this option: |
2333 | |
2334 | setenv PURIFYOPTIONS "-chain-length=25" |
2335 | |
c406981e |
2336 | In Bourne-type shells: |
2337 | |
98631ff8 |
2338 | PURIFYOPTIONS="..." |
2339 | export PURIFYOPTIONS |
c406981e |
2340 | |
2341 | or if you have the "env" utility: |
2342 | |
98631ff8 |
2343 | env PURIFYOPTIONS="..." ../pureperl ... |
c406981e |
2344 | |
902b9dbf |
2345 | =head2 Purify on NT |
2346 | |
2347 | Purify on Windows NT instruments the Perl binary 'perl.exe' |
2348 | on the fly. There are several options in the makefile you |
2349 | should change to get the most use out of Purify: |
2350 | |
2351 | =over 4 |
2352 | |
2353 | =item DEFINES |
2354 | |
2355 | You should add -DPURIFY to the DEFINES line so the DEFINES |
2356 | line looks something like: |
2357 | |
2358 | DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1 |
2359 | |
2360 | to disable Perl's arena memory allocation functions, as |
2361 | well as to force use of memory allocation functions derived |
2362 | from the system malloc. |
2363 | |
2364 | =item USE_MULTI = define |
2365 | |
2366 | Enabling the multiplicity option allows perl to clean up |
2367 | thoroughly when the interpreter shuts down, which reduces the |
2368 | number of bogus leak reports from Purify. |
2369 | |
2370 | =item #PERL_MALLOC = define |
2371 | |
2372 | Disable Perl's malloc so that Purify can more closely monitor |
2373 | allocations and leaks. Using Perl's malloc will make Purify |
2374 | report most leaks in the "potential" leaks category. |
2375 | |
2376 | =item CFG = Debug |
2377 | |
2378 | Adds debugging information so that you see the exact source |
2379 | statements where the problem occurs. Without this flag, all |
2380 | you will see is the source filename of where the error occurred. |
2381 | |
2382 | =back |
2383 | |
2384 | As an example, to show any memory leaks produced during the |
2385 | standard Perl testset you would create and run Purify as: |
2386 | |
2387 | cd win32 |
2388 | make |
2389 | cd ../t |
2390 | purify ../perl -I../lib harness |
2391 | |
2392 | which would instrument Perl in memory, run Perl on test.pl, |
2393 | then finally report any memory problems. |
2394 | |
7a834142 |
2395 | =head2 valgrind |
2396 | |
2397 | The excellent valgrind tool can be used to find out both memory leaks |
2398 | and illegal memory accesses. As of August 2003 it unfortunately works |
2399 | only on x86 (ELF) Linux. The special "test.valgrind" target can be used |
d44161bf |
2400 | to run the tests under valgrind. Found errors and memory leaks are |
2401 | logged in files named F<test.valgrind>. |
2402 | |
2403 | As system libraries (most notably glibc) are also triggering errors, |
2404 | valgrind allows you to suppress such errors using suppression files. The |
2405 | default suppression file that comes with valgrind already catches a lot |
2406 | of them. Some additional suppressions are defined in F<t/perl.supp>. |
7a834142 |
2407 | |
2408 | To get valgrind and for more information see |
2409 | |
2410 | http://developer.kde.org/~sewardj/ |
2411 | |
f134cc4e |
2412 | =head2 Compaq's/Digital's/HP's Third Degree |
09187cb1 |
2413 | |
2414 | Third Degree is a tool for memory leak detection and memory access checks. |
2415 | It is one of the many tools in the ATOM toolkit. The toolkit is only |
2416 | available on Tru64 (formerly known as Digital UNIX formerly known as |
2417 | DEC OSF/1). |
2418 | |
2419 | When building Perl, you must first run Configure with -Doptimize=-g |
2420 | and -Uusemymalloc flags, after that you can use the make targets |
51a35ef1 |
2421 | "perl.third" and "test.third". (What is required is that Perl must be |
2422 | compiled using the C<-g> flag, you may need to re-Configure.) |
09187cb1 |
2423 | |
64cea5fd |
2424 | The short story is that with "atom" you can instrument the Perl |
83f0ef60 |
2425 | executable to create a new executable called F<perl.third>. When the |
4ae3d70a |
2426 | instrumented executable is run, it creates a log of dubious memory |
83f0ef60 |
2427 | traffic in a file called F<perl.3log>. See the manual pages of atom and |
4ae3d70a |
2428 | third for more information. The most extensive Third Degree |
2429 | documentation is available in the Compaq "Tru64 UNIX Programmer's |
2430 | Guide", chapter "Debugging Programs with Third Degree". |
64cea5fd |
2431 | |
9c54ecba |
2432 | The "test.third" leaves a lot of files named F<foo_bar.3log> in the t/ |
64cea5fd |
2433 | subdirectory. There is a problem with these files: Third Degree is so |
2434 | effective that it finds problems also in the system libraries. |
9c54ecba |
2435 | Therefore you should use the Porting/thirdclean script to clean up |
2436 | the F<*.3log> files. |
64cea5fd |
2437 | |
2438 | There are also leaks that, for a certain definition of a leak, |
2439 | aren't. See L</PERL_DESTRUCT_LEVEL> for more information. |
2440 | |
2441 | =head2 PERL_DESTRUCT_LEVEL |
2442 | |
a958818a |
2443 | If you want to run any of the tests yourself manually using e.g. |
2444 | valgrind, or the pureperl or perl.third executables, please note that |
2445 | by default perl B<does not> explicitly clean up all the memory it has |
2446 | allocated (such as global memory arenas) but instead lets the exit() |
2447 | of the whole program "take care" of such allocations, also known as |
2448 | "global destruction of objects". |
64cea5fd |
2449 | |
2450 | There is a way to tell perl to do complete cleanup: set the |
2451 | environment variable PERL_DESTRUCT_LEVEL to a non-zero value. |
2452 | The t/TEST wrapper does set this to 2, and this is what you |
2453 | need to do too, if you don't want to see the "global leaks". |
1f56d61a |
2454 | For example, for "third-degreed" Perl: |
64cea5fd |
2455 | |
1f56d61a |
2456 | env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t |
09187cb1 |
2457 | |
414f2397 |
2458 | (Note: the mod_perl apache module also uses this environment variable |
2459 | for its own purposes and has extended its semantics. Refer to the mod_perl |
287a822c |
2460 | documentation for more information. Also, spawned threads do the |
2461 | equivalent of setting this variable to the value 1.) |
5a6c59ef |
2462 | |
2463 | If, at the end of a run you get the message I<N scalars leaked>, you can |
fd0854ff |
2464 | recompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause the addresses |
2465 | of all those leaked SVs to be dumped along with details as to where each |
2466 | SV was originally allocated. This information is also displayed by |
2467 | Devel::Peek. Note that the extra details recorded with each SV increase |
2468 | memory usage, so it shouldn't be used in production environments. It also |
2469 | converts C<new_SV()> from a macro into a real function, so you can use |
2470 | your favourite debugger to discover where those pesky SVs were allocated. |
414f2397 |
2471 | |
51a35ef1 |
2472 | =head2 Profiling |
2473 | |
2474 | Depending on your platform there are various ways of profiling Perl. |
2475 | |
2476 | There are two commonly used techniques of profiling executables: |
10f58044 |
2477 | I<statistical time-sampling> and I<basic-block counting>. |
51a35ef1 |
2478 | |
2479 | The first method periodically takes samples of the CPU program |
2480 | counter, and since the program counter can be correlated with the code |
2481 | generated for functions, we get a statistical view of in which |
2482 | functions the program is spending its time. The caveats are that very |
2483 | small/fast functions have lower probability of showing up in the |
2484 | profile, and that periodically interrupting the program (this is |
2485 | usually done rather frequently, in the scale of milliseconds) imposes |
2486 | an additional overhead that may skew the results. The first problem |
2487 | can be alleviated by running the code for longer (in general this is a |
2488 | good idea for profiling); the second problem is usually kept in check |
2489 | by the profiling tools themselves. |
2490 | |
10f58044 |
2491 | The second method divides up the generated code into I<basic blocks>. |
51a35ef1 |
2492 | Basic blocks are sections of code that are entered only in the |
2493 | beginning and exited only at the end. For example, a conditional jump |
2494 | starts a basic block. Basic block profiling usually works by |
10f58044 |
2495 | I<instrumenting> the code by adding I<enter basic block #nnnn> |
51a35ef1 |
2496 | book-keeping code to the generated code. During the execution of the |
2497 | code the basic block counters are then updated appropriately. The |
2498 | caveat is that the added extra code can skew the results: again, the |
2499 | profiling tools usually try to factor their own effects out of the |
2500 | results. |
2501 | |
83f0ef60 |
2502 | =head2 Gprof Profiling |
2503 | |
51a35ef1 |
2504 | gprof is a profiling tool available on many UNIX platforms; |
2505 | it uses I<statistical time-sampling>. |
83f0ef60 |
2506 | |
2507 | You can build a profiled version of perl called "perl.gprof" by |
51a35ef1 |
2508 | invoking the make target "perl.gprof" (What is required is that Perl |
2509 | must be compiled using the C<-pg> flag, you may need to re-Configure). |
2510 | Running the profiled version of Perl will create an output file called |
2511 | F<gmon.out>, which contains the profiling data collected |
2512 | during the execution. |
83f0ef60 |
2513 | |
2514 | The gprof tool can then display the collected data in various ways. |
2515 | Usually gprof understands the following options: |
2516 | |
2517 | =over 4 |
2518 | |
2519 | =item -a |
2520 | |
2521 | Suppress statically defined functions from the profile. |
2522 | |
2523 | =item -b |
2524 | |
2525 | Suppress the verbose descriptions in the profile. |
2526 | |
2527 | =item -e routine |
2528 | |
2529 | Exclude the given routine and its descendants from the profile. |
2530 | |
2531 | =item -f routine |
2532 | |
2533 | Display only the given routine and its descendants in the profile. |
2534 | |
2535 | =item -s |
2536 | |
2537 | Generate a summary file called F<gmon.sum> which then may be given |
2538 | to subsequent gprof runs to accumulate data over several runs. |
2539 | |
2540 | =item -z |
2541 | |
2542 | Display routines that have zero usage. |
2543 | |
2544 | =back |
2545 | |
2546 | For a more detailed explanation of the available commands and output |
2547 | formats, see your own local documentation of gprof. |
2548 | |
51a35ef1 |
2549 | =head2 GCC gcov Profiling |
2550 | |
10f58044 |
2551 | Starting from GCC 3.0 I<basic block profiling> is officially available |
51a35ef1 |
2552 | for the GNU CC. |
2553 | |
2554 | You can build a profiled version of perl called F<perl.gcov> by |
2555 | invoking the make target "perl.gcov" (what is required is that Perl must |
2556 | be compiled using gcc with the flags C<-fprofile-arcs |
2557 | -ftest-coverage>, you may need to re-Configure). |
2558 | |
2559 | Running the profiled version of Perl will cause profile output to be |
2560 | generated. For each source file an accompanying ".da" file will be |
2561 | created. |
2562 | |
2563 | To display the results you use the "gcov" utility (which should |
2564 | be installed if you have gcc 3.0 or newer installed). F<gcov> is |
2565 | run on source code files, like this |
2566 | |
2567 | gcov sv.c |
2568 | |
2569 | which will cause F<sv.c.gcov> to be created. The F<.gcov> files |
2570 | contain the source code annotated with relative frequencies of |
2571 | execution indicated by "#" markers. |
2572 | |
2573 | Useful options of F<gcov> include C<-b> which will summarise the |
2574 | basic block, branch, and function call coverage, and C<-c> which |
2575 | instead of relative frequencies will use the actual counts. For |
2576 | more information on the use of F<gcov> and basic block profiling |
2577 | with gcc, see the latest GNU CC manual, as of GCC 3.0 see |
2578 | |
2579 | http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html |
2580 | |
2581 | and its section titled "8. gcov: a Test Coverage Program" |
2582 | |
2583 | http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132 |
2584 | |
4ae3d70a |
2585 | =head2 Pixie Profiling |
2586 | |
51a35ef1 |
2587 | Pixie is a profiling tool available on IRIX and Tru64 (aka Digital |
2588 | UNIX aka DEC OSF/1) platforms. Pixie does its profiling using |
10f58044 |
2589 | I<basic-block counting>. |
4ae3d70a |
2590 | |
83f0ef60 |
2591 | You can build a profiled version of perl called F<perl.pixie> by |
51a35ef1 |
2592 | invoking the make target "perl.pixie" (what is required is that Perl |
2593 | must be compiled using the C<-g> flag, you may need to re-Configure). |
2594 | |
2595 | In Tru64 a file called F<perl.Addrs> will also be silently created, |
2596 | this file contains the addresses of the basic blocks. Running the |
2597 | profiled version of Perl will create a new file called "perl.Counts" |
2598 | which contains the counts for the basic block for that particular |
2599 | program execution. |
4ae3d70a |
2600 | |
51a35ef1 |
2601 | To display the results you use the F<prof> utility. The exact |
4ae3d70a |
2602 | incantation depends on your operating system, "prof perl.Counts" in |
2603 | IRIX, and "prof -pixie -all -L. perl" in Tru64. |
2604 | |
6c41479b |
2605 | In IRIX the following prof options are available: |
2606 | |
2607 | =over 4 |
2608 | |
2609 | =item -h |
2610 | |
2611 | Reports the most heavily used lines in descending order of use. |
6e36760b |
2612 | Useful for finding the hotspot lines. |
6c41479b |
2613 | |
2614 | =item -l |
2615 | |
2616 | Groups lines by procedure, with procedures sorted in descending order of use. |
2617 | Within a procedure, lines are listed in source order. |
6e36760b |
2618 | Useful for finding the hotspots of procedures. |
6c41479b |
2619 | |
2620 | =back |
2621 | |
2622 | In Tru64 the following options are available: |
2623 | |
2624 | =over 4 |
2625 | |
3958b146 |
2626 | =item -p[rocedures] |
6c41479b |
2627 | |
3958b146 |
2628 | Procedures sorted in descending order by the number of cycles executed |
6e36760b |
2629 | in each procedure. Useful for finding the hotspot procedures. |
6c41479b |
2630 | (This is the default option.) |
2631 | |
24000d2f |
2632 | =item -h[eavy] |
6c41479b |
2633 | |
6e36760b |
2634 | Lines sorted in descending order by the number of cycles executed in |
2635 | each line. Useful for finding the hotspot lines. |
6c41479b |
2636 | |
24000d2f |
2637 | =item -i[nvocations] |
6c41479b |
2638 | |
6e36760b |
2639 | The called procedures are sorted in descending order by number of calls |
2640 | made to the procedures. Useful for finding the most used procedures. |
6c41479b |
2641 | |
24000d2f |
2642 | =item -l[ines] |
6c41479b |
2643 | |
2644 | Grouped by procedure, sorted by cycles executed per procedure. |
6e36760b |
2645 | Useful for finding the hotspots of procedures. |
6c41479b |
2646 | |
2647 | =item -testcoverage |
2648 | |
2649 | The compiler emitted code for these lines, but the code was unexecuted. |
2650 | |
24000d2f |
2651 | =item -z[ero] |
6c41479b |
2652 | |
2653 | Unexecuted procedures. |
2654 | |
aa500c9e |
2655 | =back |
6c41479b |
2656 | |
2657 | For further information, see your system's manual pages for pixie and prof. |
4ae3d70a |
2658 | |
b8ddf6b3 |
2659 | =head2 Miscellaneous tricks |
2660 | |
2661 | =over 4 |
2662 | |
2663 | =item * |
2664 | |
cc177e1a |
2665 | Those debugging perl with the DDD frontend over gdb may find the |
b8ddf6b3 |
2666 | following useful: |
2667 | |
2668 | You can extend the data conversion shortcuts menu, so for example you |
2669 | can display an SV's IV value with one click, without doing any typing. |
2670 | To do that simply edit the ~/.ddd/init file and add after: |
2671 | |
2672 | ! Display shortcuts. |
2673 | Ddd*gdbDisplayShortcuts: \ |
2674 | /t () // Convert to Bin\n\ |
2675 | /d () // Convert to Dec\n\ |
2676 | /x () // Convert to Hex\n\ |
2677 | /o () // Convert to Oct\n\ |
2678 | |
2679 | the following two lines: |
2680 | |
2681 | ((XPV*) (())->sv_any )->xpv_pv // 2pvx\n\ |
2682 | ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx |
2683 | |
2684 | so now you can do ivx and pvx lookups, or you can plug in the |
2685 | sv_peek "conversion" there: |
2686 | |
2687 | Perl_sv_peek(my_perl, (SV*)()) // sv_peek |
2688 | |
2689 | (The my_perl is for threaded builds.) |
2690 | Just remember that every line, but the last one, should end with \n\ |
2691 | |
2692 | Alternatively edit the init file interactively via: |
2693 | 3rd mouse button -> New Display -> Edit Menu |
2694 | |
2695 | Note: you can define up to 20 conversion shortcuts in the gdb |
2696 | section. |
2697 | |
9965345d |
2698 | =item * |
2699 | |
2700 | If you see in a debugger a memory area mysteriously full of 0xabababab, |
2701 | you may be seeing the effect of the Poison() macro, see L<perlclib>. |
2702 | |
b8ddf6b3 |
2703 | =back |
2704 | |
a422fd2d |
2705 | =head2 CONCLUSION |
2706 | |
2707 | We've had a brief look around the Perl source, an overview of the stages |
2708 | F<perl> goes through when it's running your code, and how to use a |
902b9dbf |
2709 | debugger to poke at the Perl guts. We took a very simple problem and |
2710 | demonstrated how to solve it fully - with documentation, regression |
2711 | tests, and finally a patch for submission to p5p. Finally, we talked |
2712 | about how to use external tools to debug and test Perl. |
a422fd2d |
2713 | |
2714 | I'd now suggest you read over those references again, and then, as soon |
2715 | as possible, get your hands dirty. The best way to learn is by doing, |
2716 | so: |
2717 | |
2718 | =over 3 |
2719 | |
2720 | =item * |
2721 | |
2722 | Subscribe to perl5-porters, follow the patches and try and understand |
2723 | them; don't be afraid to ask if there's a portion you're not clear on - |
2724 | who knows, you may unearth a bug in the patch... |
2725 | |
2726 | =item * |
2727 | |
2728 | Keep up to date with the bleeding edge Perl distributions and get |
2729 | familiar with the changes. Try and get an idea of what areas people are |
2730 | working on and the changes they're making. |
2731 | |
2732 | =item * |
2733 | |
3e148164 |
2734 | Do read the README associated with your operating system, e.g. README.aix |
a1f349fd |
2735 | on the IBM AIX OS. Don't hesitate to supply patches to that README if |
2736 | you find anything missing or changed over a new OS release. |
2737 | |
2738 | =item * |
2739 | |
a422fd2d |
2740 | Find an area of Perl that seems interesting to you, and see if you can |
2741 | work out how it works. Scan through the source, and step over it in the |
2742 | debugger. Play, poke, investigate, fiddle! You'll probably get to |
2743 | understand not just your chosen area but a much wider range of F<perl>'s |
2744 | activity as well, and probably sooner than you'd think. |
2745 | |
2746 | =back |
2747 | |
2748 | =over 3 |
2749 | |
2750 | =item I<The Road goes ever on and on, down from the door where it began.> |
2751 | |
2752 | =back |
2753 | |
2754 | If you can do these things, you've started on the long road to Perl porting. |
2755 | Thanks for wanting to help make Perl better - and happy hacking! |
2756 | |
e8cd7eae |
2757 | =head1 AUTHOR |
2758 | |
2759 | This document was written by Nathan Torkington, and is maintained by |
2760 | the perl5-porters mailing list. |
2761 | |