X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlhack.pod;h=8c5d213fb03c70fa55e16170ff4eaf40a3fb9748;hb=83272a45226e83bd136d713158e9b44ace2dbc8d;hp=c640870264122d484d125c85a81361f67bae3f4c;hpb=85add8c20c52762eef70f97d016f6b677c9a4612;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlhack.pod b/pod/perlhack.pod index c640870..8c5d213 100644 --- a/pod/perlhack.pod +++ b/pod/perlhack.pod @@ -14,14 +14,13 @@ messages a day, depending on the heatedness of the debate. Most days there are two or three patches, extensions, features, or bugs being discussed at a time. -A searchable archive of the list is at: +A searchable archive of the list is at either: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/ -The list is also archived under the usenet group name -C at: +or - http://www.deja.com/ + http://archive.develooper.com/perl5-porters@perl.org/ List subscribers (the porters themselves) come in several flavours. Some are quiet curious lurkers, who rarely pitch in and instead watch @@ -38,12 +37,13 @@ in what does and does not change in the Perl language. Various releases of Perl are shepherded by a ``pumpking'', a porter responsible for gathering patches, deciding on a patch-by-patch feature-by-feature basis what will and will not go into the release. -For instance, Gurusamy Sarathy is the pumpking for the 5.6 release of -Perl. +For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of +Perl, and Jarkko Hietaniemi is the pumpking for the 5.8 release, and +Hugo van der Sanden will be the pumpking for the 5.10 release. In addition, various people are pumpkings for different things. For instance, Andy Dougherty and Jarkko Hietaniemi share the I -pumpkin, and Tom Christiansen is the documentation pumpking. +pumpkin. Larry sees Perl development along the lines of the US government: there's the Legislature (the porters), the Executive branch (the @@ -158,13 +158,22 @@ The worst patches make use of a system-specific features. 
It's highly unlikely that nonportable additions to the Perl language will be accepted. +=item Is the implementation tested? + +Patches which change behaviour (fixing bugs or introducing new features) +must include regression tests to verify that everything works as expected. +Without tests provided by the original author, how can anyone else changing +perl in the future be sure that they haven't unwittingly broken the behaviour +the patch implements? And without tests, how can the patch's author be +confident that his/her hard work put into the patch won't be accidentally +thrown away by someone in the future? + =item Is there enough documentation? Patches without documentation are probably ill-thought out or incomplete. Nothing can be added without documentation, so submitting a patch for the appropriate manpages as well as the source code is -always a good idea. If appropriate, patches should add to the test -suite as well. +always a good idea. =item Is there another way to do it? @@ -194,9 +203,11 @@ around. It refers to the standard distribution. ``Hacking on the core'' means you're changing the C source code to the Perl interpreter. ``A core module'' is one that ships with Perl. +=head2 Keeping in sync + The source code to the Perl interpreter, in its different versions, is -kept in a repository managed by a revision control system (which is -currently the Perforce program, see http://perforce.com/). The +kept in a repository managed by a revision control system ( which is +currently the Perforce program, see http://perforce.com/ ). The pumpkings and a few others have access to the repository to check in changes. Periodically the pumpking for the development version of Perl will release a new version, so the rest of the porters can see what's @@ -204,77 +215,364 @@ changed. 
The current state of the main trunk of the repository, and patches that
describe the individual changes that have happened since the last
public release are available at this location:

+ http://public.activestate.com/gsar/APC/
 ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/

-Selective parts are also visible via the rsync protocol. To get all
-the individual changes to the mainline since the last development
-release, use the following command:
+If you're looking for a particular change, or a change that affected
+a particular set of files, you may find the B<Perl Repository Browser>
+useful:

-  rsync -avuz rsync://ftp.linux.activestate.com/perl-diffs perl-diffs
+  http://public.activestate.com/cgi-bin/perlbrowse

-Use this to get the latest source tree in full:
+You may also want to subscribe to the perl5-changes mailing list to
+receive a copy of each patch that gets submitted to the maintenance
+and development "branches" of the perl repository. See
+http://lists.perl.org/ for subscription information.

-  rsync -avuz rsync://ftp.linux.activestate.com/perl-current perl-current
+If you are a member of the perl5-porters mailing list, it is a good
+idea to keep in touch with the most recent changes, if only to verify
+that what you were about to post as a bug report hasn't already been
+fixed in the most recent available perl development branch, also
+known as perl-current, bleading edge perl, bleedperl or bleadperl.

Needless to say, the source code in perl-current is usually in a
perpetual state of evolution. You should expect it to be very buggy.
Do B<not> use it for any purpose other than testing and development.

-Always submit patches to I. This lets other
-porters review your patch, which catches a surprising number of errors
-in patches. Either use the diff program (available in source code
-form from I), or use Johan Vromans'
-I (available from I). Unified diffs
-are preferred, but context diffs are accepted. Do not send RCS-style
-diffs or diffs without context lines.
More information is given in
-the I file in the Perl source distribution.
-Please patch against the latest B version (e.g., if
-you're fixing a bug in the 5.005 track, patch against the latest
-5.005_5x version). Only patches that survive the heat of the
-development branch get applied to maintenance versions.
-
-Your patch should update the documentation and test suite.
+Keeping in sync with the most recent branch can be done in several ways,
+but the most convenient and reliable way is using B<rsync>, available at
+ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent
+branch by FTP.)
+
+If you choose to keep in sync using rsync, there are two approaches
+to doing so:
+
+=over 4
+
+=item rsync'ing the source tree
+
+Presuming you are in the directory where your perl source resides
+and you have rsync installed and available, you can `upgrade' to
+the bleadperl using:
+
+ # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ .
+
+This takes care of updating every single item in the source tree to
+the latest applied patch level, creating files that are new (to your
+distribution) and setting date/time stamps of existing files to
+reflect the bleadperl status.
+
+Note that this will not delete any files that were in '.' before
+the rsync. Once you are sure that the rsync is running correctly,
+run it with the --delete and the --dry-run options like this:
+
+ # rsync -avz --delete --dry-run rsync://ftp.linux.activestate.com/perl-current/ .
+
+This will I<simulate> an rsync run that also deletes files not
+present in the bleadperl master copy. Observe the results from
+this run closely. If you are sure that the actual run would delete
+no files precious to you, you could remove the '--dry-run' option.
+
+You can then check which patch was the latest one applied by
+looking in the file B<.patch>, which will show the number of the
+latest patch.
+ +If you have more than one machine to keep in sync, and not all of +them have access to the WAN (so you are not able to rsync all the +source trees to the real source), there are some ways to get around +this problem. + +=over 4 + +=item Using rsync over the LAN + +Set up a local rsync server which makes the rsynced source tree +available to the LAN and sync the other machines against this +directory. + +From http://rsync.samba.org/README.html : + + "Rsync uses rsh or ssh for communication. It does not need to be + setuid and requires no special privileges for installation. It + does not require an inetd entry or a daemon. You must, however, + have a working rsh or ssh system. Using ssh is recommended for + its security features." + +=item Using pushing over the NFS + +Having the other systems mounted over the NFS, you can take an +active pushing approach by checking the just updated tree against +the other not-yet synced trees. An example would be + + #!/usr/bin/perl -w + + use strict; + use File::Copy; + + my %MF = map { + m/(\S+)/; + $1 => [ (stat $1)[2, 7, 9] ]; # mode, size, mtime + } `cat MANIFEST`; + + my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2); + + foreach my $host (keys %remote) { + unless (-d $remote{$host}) { + print STDERR "Cannot Xsync for host $host\n"; + next; + } + foreach my $file (keys %MF) { + my $rfile = "$remote{$host}/$file"; + my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9]; + defined $size or ($mode, $size, $mtime) = (0, 0, 0); + $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next; + printf "%4s %-34s %8d %9d %8d %9d\n", + $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime; + unlink $rfile; + copy ($file, $rfile); + utime time, $MF{$file}[2], $rfile; + chmod $MF{$file}[0], $rfile; + } + } + +though this is not perfect. It could be improved with checking +file checksums before updating. Not all NFS systems support +reliable utime support (when used over the NFS). 
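The checksum refinement suggested above can be sketched in a few lines of shell. This is a minimal illustration, not the author's script: the `/tmp/xsync-demo` directories are hypothetical stand-ins for the local source tree and an NFS-mounted replica, and POSIX `cksum` stands in for a stronger digest such as MD5:

```shell
# Stand-in directories; on a real setup these would be the local
# source tree and an NFS-mounted copy on another machine.
mkdir -p /tmp/xsync-demo/master /tmp/xsync-demo/replica
echo 'new content' > /tmp/xsync-demo/master/sv.c
echo 'old content' > /tmp/xsync-demo/replica/sv.c

for f in /tmp/xsync-demo/master/*; do
    base=$(basename "$f")
    remote="/tmp/xsync-demo/replica/$base"
    # Copy only when the file is missing or the checksums disagree,
    # instead of trusting size/mtime as the script above does.
    if [ ! -e "$remote" ] || \
       [ "$(cksum < "$f")" != "$(cksum < "$remote")" ]; then
        cp -p "$f" "$remote"
    fi
done
```

Comparing content rather than metadata sidesteps the unreliable-utime problem entirely, at the cost of reading every file on both sides.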
+
+=back
+
+=item rsync'ing the patches
+
+The source tree is maintained by the pumpking who applies patches to
+the files in the tree. These patches are either created by the
+pumpking himself using C<diff> after updating the file manually or
+by applying patches sent in by posters on the perl5-porters list.
+These patches are also saved and rsync'able, so you can apply them
+yourself to the source files.
+
+Presuming you are in a directory where your patches reside, you can
+get them in sync with
+
+ # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
+
+This makes sure the latest available patch is downloaded to your
+patch directory.
+
+It's then up to you to apply these patches, using something like
+
+ # last=`ls -t *.gz | sed q`
+ # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
+ # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
+ # cd ../perl-current
+ # patch -p1 -N <../perl-current-diffs/blead.patch
+
+or, since this is only a hint towards how it works, use CPAN-patchaperl
+from Andreas König to have better control over the patching process.
+
+=back
+
+=head2 Why rsync the source tree
+
+=over 4
+
+=item It's easier to rsync the source tree
+
+Since you don't have to apply the patches yourself, you are sure all
+files in the source tree are in the right state.
+
+=item It's more reliable
+
+While both the rsync-able source and patch areas are automatically
+updated every few minutes, keep in mind that applying patches may
+sometimes mean careful hand-holding, especially if your version of
+the C<patch> program does not understand how to deal with new files,
+files with 8-bit characters, or files without trailing newlines.
+
+=back
+
+=head2 Why rsync the patches
+
+=over 4
+
+=item It's easier to rsync the patches
+
+If you have more than one machine that you want to keep in sync with
+bleadperl, it's easier to rsync the patches only once and then apply
+them to all the source trees on the different machines.
+
+If you try to keep in sync on 5 different machines, of which only
+one has access to the WAN, rsync'ing all the source trees would have
+to be done 5 times over the NFS. Having rsync'ed the patches only
+once, you can apply them to all the source trees automatically.
+Need I say more? ;-)
+
+=item It's a good reference
+
+If you not only want to have the most recent development branch, but
+also want to B<fix> bugs or extend features, you will want to dive
+into the sources. If you are a seasoned perl core diver, you won't
+need manuals, tips, roadmaps, perlguts.pod or other aids to find
+your way around. But if you are a starter, the patches may help you
+in finding where you should start and how to change the bits that
+bug you.
+
+The file B<Changes> is updated on occasions the pumpking sees as his
+own little sync points. On those occasions, he releases a tar-ball of
+the current source tree (e.g. perl@7582.tar.gz), which will be an
+excellent point to start with when choosing to use the 'rsync the
+patches' scheme. Starting with perl@7582, which means a set of source
+files on which the latest applied patch is number 7582, you apply all
+succeeding patches available from then on (7583, 7584, ...).
+
+You can use the patches later as a kind of search archive.
+
+=over 4
+
+=item Finding a start point
+
+If you want to fix/change the behaviour of function/feature Foo, just
+scan the patches for patches that mention Foo either in the subject,
+the comments, or the body of the fix. There is a good chance that such
+a patch shows you the files that are affected by the change, and those
+files are very likely to be the starting point of your journey into
+the guts of perl.
+
+=item Finding how to fix a bug
+
+If you've found I<where> the function/feature Foo misbehaves, but you
+don't know how to fix it (but you do know the change you want to
+make), you can, again, peruse the patches for similar changes and
+look at how others applied the fix.
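That scan can be done mechanically with `zgrep`, which searches inside the gzipped patch files directly. A minimal sketch, using two made-up patch files in a hypothetical `/tmp/patch-demo` directory (real patches from the perl-current-diffs rsync area are gzipped the same way):

```shell
# Fake patch archive; real ones come from the perl-current-diffs area.
mkdir -p /tmp/patch-demo
printf 'Subject: fix Foo parsing\n--- perl/toke.c\n' | \
    gzip > /tmp/patch-demo/12345.gz
printf 'Subject: unrelated tweak\n--- perl/util.c\n' | \
    gzip > /tmp/patch-demo/12346.gz

# List, in patch order, every patch whose text mentions Foo.
zgrep -l 'Foo' /tmp/patch-demo/*.gz | sort
```

Adding `-i` makes the search case-insensitive, which helps when the subject line and the code use different capitalization.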
+
+=item Finding the source of misbehaviour
+
+When you keep in sync with bleadperl, the pumpking would love to
+I<know> that the community efforts really work. So after each of his
+sync points, you are encouraged to run 'make test' to check that
+everything is still in working order. If it is, you do 'make ok',
+which will send an OK report to perlbug@perl.org. (If you do not have
+access to a mailer from the system on which you just successfully ran
+'make test', you can do 'make okfile', which creates the file C, which
+you can then take to your favourite mailer and mail yourself.)
+
+But of course, things will not always go smoothly, and one or more
+tests may fail during 'make test'. Before sending in a bug report
+(using 'make nok' or 'make nokfile'), check the mailing list to see
+whether someone else has reported the bug already and if so, confirm
+it by replying to that message. If not, you might want to trace the
+source of that misbehaviour B<before> sending in the bug, which will
+help all the other porters in finding the solution.
+
+Here the saved patches come in very handy. You can check the list of
+patches to see which patch changed what file and what change caused
+the misbehaviour. If you note that in the bug report, it saves
+whoever tries to solve it the effort of looking for that point.
+
+=back
+
+If searching the patches is too bothersome, you might consider using
+perl's bugtron to find more information about discussions and
+ramblings on posted bugs.
+
+If you want to get the best of both worlds, rsync the source tree for
+convenience, reliability and ease, and rsync the patches for
+reference.
+
+=back
+
+
+=head2 Perlbug remote interface
+
+=over 4
+
+There are three remote administrative interfaces for modifying bug
+status, category, etc. In all cases an admin must first be registered
+with the Perlbug database by sending an email request to
+richard@perl.org or bugmongers@perl.org.
+
+The main requirement is the willingness to classify outstanding bugs
+(with the emphasis on closing them where possible :). Further
+explanation can be garnered from the web at http://bugs.perl.org/ , or
+by asking on the admin mailing list at: bugmongers@perl.org
+
+For more info on the web see
+
+ http://bugs.perl.org/perlbug.cgi?req=spec
+
+=item 1 http://bugs.perl.org
+
+Login via the web (remove B<admin> if only browsing), where interested
+Cc's, tests, patches and change-ids, etc. may be assigned.
+
+ http://bugs.perl.org/admin/index.html
+
+
+=item 2 bugdb@perl.org
+
+Where the subject line is used for commands:
+
+ To: bugdb@perl.org
+ Subject: -a close bugid1 bugid2 aix install
+
+ To: bugdb@perl.org
+ Subject: -h
+
+
+=item 3 commands_and_bugids@bugs.perl.org
+
+Where the address itself is the source for the commands:
+
+ To: close_bugid1_bugid2_aix@bugs.perl.org
+
+ To: help@bugs.perl.org
+
+
+=item notes, patches, tests
+
+For patches and tests, the message body is assigned to the appropriate
+bugs and forwarded to p5p for their attention.
+
+ To: test__aix_close@bugs.perl.org
+ Subject: this is a test for the (now closed) aix bug
+
+ Test is the body of the mail
+
+=back
+
+=head2 Submitting patches
+
+Always submit patches to I<perl5-porters@perl.org>. If you're
+patching a core module and there's an author listed, send the author a
+copy (see L). This lets other porters review
+your patch, which catches a surprising number of errors in patches.
+Either use the diff program (available in source code form from
+ftp://ftp.gnu.org/pub/gnu/ ), or use Johan Vromans' I<makepatch>
+(available from I). Unified diffs are preferred,
+but context diffs are accepted. Do not send RCS-style diffs or diffs
+without context lines. More information is given in the
+I file in the Perl source distribution. Please
+patch against the latest B<development> version (e.g., if you're
+fixing a bug in the 5.005 track, patch against the latest 5.005_5x
+version).
Only patches that survive the heat of the development +branch get applied to maintenance versions. + +Your patch should update the documentation and test suite. See +L. To report a bug in Perl, use the program I which comes with Perl (if you can't get Perl to work, send mail to the address -I or I). Reporting bugs through +I or I). Reporting bugs through I feeds into the automated bug-tracking system, access to -which is provided through the web at I. It +which is provided through the web at http://bugs.perl.org/ . It often pays to check the archives of the perl5-porters mailing list to see whether the bug you're reporting has been reported before, and if so whether it was considered a bug. See above for the location of the searchable archives. -The CPAN testers (I) are a group of -volunteers who test CPAN modules on a variety of platforms. Perl Labs -(I) automatically tests Perl source releases on -platforms and gives feedback to the CPAN testers mailing list. Both -efforts welcome volunteers. - -To become an active and patching Perl porter, you'll need to learn how -Perl works on the inside. Chip Salzenberg, a pumpking, has written -articles on Perl internals for The Perl Journal -(I) which explain how various parts of the Perl -interpreter work. The C manpage explains the internal data -structures. And, of course, the C source code (sometimes sparsely -commented, sometimes commented well) is a great place to start (begin -with C and see where it goes from there). A lot of the style -of the Perl source is explained in the I file in -the source distribution. - -It is essential that you be comfortable using a good debugger -(e.g. gdb, dbx) before you can patch perl. Stepping through perl -as it executes a script is perhaps the best (if sometimes tedious) -way to gain a precise understanding of the overall architecture of -the language. 
-
-If you build a version of the Perl interpreter with C<-DDEBUGGING>,
-Perl's B<-D> command line flag will cause copious debugging information
-to be emitted (see the C manpage). If you build a version of
-Perl with compiler debugging information (e.g. with the C compiler's
-C<-g> option instead of C<-O>) then you can step through the execution
-of the interpreter with your favourite C symbolic debugger, setting
-breakpoints on particular functions.
+The CPAN testers ( http://testers.cpan.org/ ) are a group of
+volunteers who test CPAN modules on a variety of platforms. Perl
+Smokers ( http://archives.develooper.com/daily-build@perl.org/ )
+automatically test Perl source releases on platforms with various
+configurations. Both efforts welcome volunteers.

It's a good idea to read and lurk for a while before chipping in.
That way you'll get to see the dynamic of the conversations, learn the
@@ -285,6 +583,1774 @@
If after all this you still think you want to join the perl5-porters
mailing list, send mail to I<perl5-porters-subscribe@perl.org>. To
unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>.

+To hack on the Perl guts, you'll need to read the following things:
+
+=over 3
+
+=item L<perlguts>
+
+This is of paramount importance, since it's the documentation of what
+goes where in the Perl source. Read it over a couple of times and it
+might start to make sense - don't worry if it doesn't yet, because the
+best way to study it is to read it in conjunction with poking at Perl
+source, and we'll do that later on.
+
+You might also want to look at Gisle Aas's illustrated perlguts -
+there's no guarantee that this will be absolutely up-to-date with the
+latest documentation in the Perl core, but the fundamentals will be
+right. ( http://gisle.aas.no/perl/illguts/ )
+
+=item L<perlxstut> and L<perlxs>
+
+A working knowledge of XSUB programming is incredibly useful for core
+hacking; XSUBs use techniques drawn from the PP code, the portion of the
+guts that actually executes a Perl program.
It's a lot gentler to learn
+those techniques from simple examples and explanation than from the core
+itself.
+
+=item L<perlapi>
+
+The documentation for the Perl API explains what some of the internal
+functions do, as well as the many macros used in the source.
+
+=item F<Porting/pumpkin.pod>
+
+This is a collection of words of wisdom for a Perl porter; some of it is
+only useful to the pumpkin holder, but most of it applies to anyone
+wanting to go about Perl development.
+
+=item The perl5-porters FAQ
+
+This should be available from http://simon-cozens.org/writings/p5p-faq ;
+alternatively, you can get the FAQ emailed to you by sending mail to
+C. It contains hints on reading perl5-porters,
+information on how perl5-porters works and how Perl development in general
+works.
+
+=back
+
+=head2 Finding Your Way Around
+
+Perl maintenance can be split into a number of areas, and certain people
+(pumpkins) will have responsibility for each area. These areas sometimes
+correspond to files or directories in the source kit. Among the areas are:
+
+=over 3
+
+=item Core modules
+
+Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
+subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
+contains the core XS modules.
+
+=item Tests
+
+There are tests for nearly all the modules, built-ins and major bits
+of functionality. Test files all have a .t suffix. Module tests live
+in the F<lib/> and F<ext/> directories next to the module being
+tested. Others live in F<t/>. See L
+
+=item Documentation
+
+Documentation maintenance includes looking after everything in the
+F<pod/> directory (as well as contributing new documentation) and
+the documentation for the modules in core.
+
+=item Configure
+
+The configure process is the way we make Perl portable across the
+myriad of operating systems it supports. Responsibility for the
+configure, build and installation process, as well as the overall
+portability of the core code rests with the configure pumpkin - others
+help out with individual operating systems.
+
+The files involved are the operating system directories (F<win32/>,
+F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h>
+and F<Makefile>, as well as the metaconfig files which generate
+F<Configure>. (metaconfig isn't included in the core distribution.)
+
+=item Interpreter
+
+And of course, there's the core of the Perl interpreter itself. Let's
+have a look at that in a little more detail.
+
+=back
+
+Before we leave looking at the layout, though, don't forget that
+F<MANIFEST> contains not only the file names in the Perl distribution,
+but short descriptions of what's in them, too. For an overview of the
+important files, try this:
+
+    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
+
+=head2 Elements of the interpreter
+
+The work of the interpreter has two main stages: compiling the code
+into the internal representation, or bytecode, and then executing it.
+L<perlguts/Compiled code> explains exactly how the compilation stage
+happens.
+
+Here is a short breakdown of perl's operation:
+
+=over 3
+
+=item Startup
+
+The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
+This is very high-level code, enough to fit on a single screen, and it
+resembles the code found in L<perlembed>; most of the real action takes
+place in F<perl.c>
+
+First, F<perlmain.c> allocates some memory and constructs a Perl
+interpreter:
+
+    1 PERL_SYS_INIT3(&argc,&argv,&env);
+    2
+    3 if (!PL_do_undump) {
+    4     my_perl = perl_alloc();
+    5     if (!my_perl)
+    6         exit(1);
+    7     perl_construct(my_perl);
+    8     PL_perl_destruct_level = 0;
+    9 }
+
+Line 1 is a macro, and its definition is dependent on your operating
+system. Line 3 references C<PL_do_undump>, a global variable - all
+global variables in Perl start with C<PL_>. This tells you whether the
+current running program was created with the C<-u> flag to perl and then
+F<undump>, which means it's going to be false in any sane context.
+
+Line 4 calls a function in F<perl.c> to allocate memory for a Perl
+interpreter.
It's quite a simple function, and the guts of it looks like
+this:
+
+    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
+
+Here you see an example of Perl's system abstraction, which we'll see
+later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
+own C<malloc> as defined in F<malloc.c> if you selected that option at
+configure time.
+
+Next, in line 7, we construct the interpreter; this sets up all the
+special variables that Perl needs, the stacks, and so on.
+
+Now we pass Perl the command line options, and tell it to go:
+
+    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
+    if (!exitstatus) {
+        exitstatus = perl_run(my_perl);
+    }
+
+
+C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
+in F<perl.c>, which processes the command line options, sets up any
+statically linked XS modules, opens the program and calls C<yyparse> to
+parse it.
+
+=item Parsing
+
+The aim of this stage is to take the Perl source, and turn it into an op
+tree. We'll see what one of those looks like later. Strictly speaking,
+there are three things going on here.
+
+C<yyparse>, the parser, lives in F<perly.c>, although you're better off
+reading the original YACC input in F<perly.y>. (Yes, Virginia, there
+B<is> a YACC grammar for Perl!) The job of the parser is to take your
+code and `understand' it, splitting it into sentences, deciding which
+operands go with which operators and so on.
+
+The parser is nobly assisted by the lexer, which chunks up your input
+into tokens, and decides what type of thing each token is: a variable
+name, an operator, a bareword, a subroutine, a core function, and so on.
+The main point of entry to the lexer is C<yylex>, and that and its
+associated routines can be found in F<toke.c>. Perl isn't much like
+other computer languages; it's highly context sensitive at times, it can
+be tricky to work out what sort of token something is, or where a token
+ends. As such, there's a lot of interplay between the tokeniser and the
+parser, which can get pretty frightening if you're not used to it.
+
+As the parser understands a Perl program, it builds up a tree of
+operations for the interpreter to perform during execution. The routines
+which construct and link together the various operations are to be found
+in F<op.c>, and will be examined later.
+
+=item Optimization
+
+Now the parsing stage is complete, and the finished tree represents
+the operations that the Perl interpreter needs to perform to execute our
+program. Next, Perl does a dry run over the tree looking for
+optimisations: constant expressions such as C<3 + 4> will be computed
+now, and the optimizer will also see if any multiple operations can be
+replaced with a single one. For instance, to fetch the variable C<$foo>,
+instead of grabbing the glob C<*foo> and looking at the scalar
+component, the optimizer fiddles the op tree to use a function which
+directly looks up the scalar in question. The main optimizer is C<peep>
+in F<op.c>, and many ops have their own optimizing functions.
+
+=item Running
+
+Now we're finally ready to go: we have compiled Perl byte code, and all
+that's left to do is run it. The actual execution is done by the
+C<runops_standard> function in F<run.c>; more specifically, it's done by
+these three innocent looking lines:
+
+    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
+        PERL_ASYNC_CHECK();
+    }
+
+You may be more comfortable with the Perl version of that:
+
+    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
+
+Well, maybe not. Anyway, each op contains a function pointer, which
+stipulates the function which will actually carry out the operation.
+This function will return the next op in the sequence - this allows for
+things like C<if> which choose the next op dynamically at run time.
+The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
+execution if required.
+
+The actual functions called are known as PP code, and they're spread
+between four files: F<pp_hot.c> contains the `hot' code, which is most
+often used and highly optimized, F<pp_sys.c> contains all the
+system-specific functions, F<pp_ctl.c> contains the functions which
+implement control structures (C<if>, C<while> and the like) and F<pp.c>
+contains everything else. These are, if you like, the C code for Perl's
+built-in functions and operators.
+
+=back
+
+=head2 Internal Variable Types
+
+You should by now have had a look at L<perlguts>, which tells you about
+Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
+that now.
+
+These variables are used not only to represent Perl-space variables, but
+also any constants in the code, as well as some structures completely
+internal to Perl. The symbol table, for instance, is an ordinary Perl
+hash. Your code is represented by an SV as it's read into the parser;
+any program files you call are opened via ordinary Perl filehandles, and
+so on.
+
+The core L<Devel::Peek> module lets us examine SVs from a
+Perl program. Let's see, for instance, how Perl treats the constant
+C<"hello">.
+
+    % perl -MDevel::Peek -e 'Dump("hello")'
+    1 SV = PV(0xa041450) at 0xa04ecbc
+    2   REFCNT = 1
+    3   FLAGS = (POK,READONLY,pPOK)
+    4   PV = 0xa0484e0 "hello"\0
+    5   CUR = 5
+    6   LEN = 6
+
+Reading C<Devel::Peek> output takes a bit of practise, so let's go
+through it line by line.
+
+Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
+memory. SVs themselves are very simple structures, but they contain a
+pointer to a more complex structure. In this case, it's a PV, a
+structure which holds a string value, at location C<0xa041450>. Line 2
+is the reference count; there are no other references to this data, so
+it's 1.
+
+Line 3 lists the flags for this SV - it's OK to use it as a PV, it's a
+read-only SV (because it's a constant) and the data is a PV internally.
+Next we've got the contents of the string, starting at location
+C<0xa0484e0>.
+
+Line 5 gives us the current length of the string - note that this does
+B<not> include the null terminator. Line 6 is not the length of the
+string, but the length of the currently allocated buffer; as the string
+grows, Perl automatically extends the available storage via a routine
+called C<SvGROW>.
+
+You can get at any of these quantities from C very easily; just add
+C<Sv> to the name of the field shown in the snippet, and you've got a
+macro which will return the value: C<SvCUR(sv)> returns the current
+length of the string, C<SvREFCNT(sv)> returns the reference count,
+C<SvPV(sv, len)> returns the string itself with its length, and so on.
+More macros to manipulate these properties can be found in L<perlapi>.
+
+Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>
+
+     1 void
+     2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
+     3 {
+     4     STRLEN tlen;
+     5     char *junk;
+
+     6     junk = SvPV_force(sv, tlen);
+     7     SvGROW(sv, tlen + len + 1);
+     8     if (ptr == junk)
+     9         ptr = SvPVX(sv);
+    10     Move(ptr,SvPVX(sv)+tlen,len,char);
+    11     SvCUR(sv) += len;
+    12     *SvEND(sv) = '\0';
+    13     (void)SvPOK_only_UTF8(sv);          /* validate pointer */
+    14     SvTAINT(sv);
+    15 }
+
+This is a function which adds a string, C<ptr>, of length C<len> onto
+the end of the PV stored in C<sv>. The first thing we do in line 6 is
+make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
+macro to force a PV. As a side effect, C<tlen> gets set to the current
+length of the PV, and the PV itself is returned to C<junk>.
+
+In line 7, we make sure that the SV will have enough room to accommodate
+the old string, the new string and the null terminator. If C<LEN> isn't
+big enough, C<SvGROW> will reallocate space for us.
+
+Now, if C<junk> is the same as the string we're trying to add, we can
+grab the string directly from the SV; C<SvPVX(sv)> is the address of the
+PV in the SV.
+
+Line 10 does the actual catenation: the C<Move> macro moves a chunk of
+memory around: we move the string C<ptr> to the end of the PV - that's
+the start of the PV plus its current length. We're moving C<len> bytes
+of type C<char>.
After doing so, we need to tell Perl we've extended the
string, by altering C<CUR> to reflect the new length. C<SvEND> is a
macro which gives us the end of the string, so that needs to be a
C<"\0">.

Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF8-aware
version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
and turns on POK. The final C<SvTAINT> is a macro which launders tainted
data if taint mode is turned on.

AVs and HVs are more complicated, but SVs are by far the most common
variable type being thrown around. Having seen something of how we
manipulate these, let's go on and look at how the op tree is
constructed.

=head2 Op Trees

First, what is the op tree, anyway? The op tree is the parsed
representation of your program, as we saw in our section on parsing, and
it's the sequence of operations that Perl goes through to execute your
program, as we saw in L</Running>.

An op is a fundamental operation that Perl can perform: all the built-in
functions and operators are ops, and there are a series of ops which
deal with concepts the interpreter needs internally - entering and
leaving a block, ending a statement, fetching a variable, and so on.

The op tree is connected in two ways: you can imagine that there are two
"routes" through it, two orders in which you can traverse the tree.
First, parse order reflects how the parser understood the code, and
secondly, execution order tells perl what order to perform the
operations in.

The easiest way to examine the op tree is to stop Perl after it has
finished parsing, and get it to dump out the tree. This is exactly what
the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
and L<B::Debug|B::Debug> do.
Let's have a look at how Perl sees C<$a = $b + $c>:

 % perl -MO=Terse -e '$a=$b+$c'
 1 LISTOP (0x8179888) leave
 2     OP (0x81798b0) enter
 3     COP (0x8179850) nextstate
 4     BINOP (0x8179828) sassign
 5         BINOP (0x8179800) add [1]
 6             UNOP (0x81796e0) null [15]
 7                 SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
 8             UNOP (0x81797e0) null [15]
 9                 SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
 10        UNOP (0x816b4f0) null [15]
 11            SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a

Let's start in the middle, at line 4. This is a BINOP, a binary
operator, which is at location C<0x8179828>. The specific operator in
question is C<sassign> - scalar assignment - and you can find the code
which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
binary operator, it has two children: the add operator, providing the
result of C<$b+$c>, is uppermost on line 5, and the left hand side is on
line 10.

Line 10 is the null op: this does exactly nothing. What is that doing
there? If you see the null op, it's a sign that something has been
optimized away after parsing. As we mentioned in L</Optimization>,
the optimization stage sometimes converts two operations into one, for
example when fetching a scalar variable. When this happens, instead of
rewriting the op tree and cleaning up the dangling pointers, it's easier
just to replace the redundant operation with the null op. Originally,
the tree would have looked like this:

 10        SVOP (0x816b4f0) rv2sv [15]
 11            SVOP (0x816dcf0) gv GV (0x80fa460) *a

That is, fetch the C<a> entry from the main symbol table, and then look
at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
happens to do both these things.

The right hand side, starting at line 5 is similar to what we've just
seen: we have the C<add> op (C<pp_add> also in F<pp_hot.c>) add together
two C<gvsv>s.

Now, what's this about?
 1 LISTOP (0x8179888) leave
 2     OP (0x81798b0) enter
 3     COP (0x8179850) nextstate

C<enter> and C<leave> are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: C<leave> is a list, and its
children are all the statements in the block. Statements are delimited
by C<nextstate>, so a block is a collection of C<nextstate> ops, with
the ops to be performed for each statement being the children of
C<nextstate>. C<enter> is a single op which functions as a marker.

That's how Perl parsed the program, from top to bottom:

        Program
           |
       Statement
           |
           =
          / \
         /   \
        $a    +
             / \
            $b  $c

However, it's impossible to B<perform> the operations in this order:
you have to find the values of C<$b> and C<$c> before you add them
together, for instance. So, the other thread that runs through the op
tree is the execution order: each op has a field C<op_next> which points
to the next op to be run, so following these pointers tells us how perl
executes the code. We can traverse the tree in this order using
the C<exec> option to C<B::Terse>:

 % perl -MO=Terse,exec -e '$a=$b+$c'
 1 OP (0x8179928) enter
 2 COP (0x81798c8) nextstate
 3 SVOP (0x81796c8) gvsv  GV (0x80fa4d4) *b
 4 SVOP (0x8179798) gvsv  GV (0x80efeb0) *c
 5 BINOP (0x8179878) add [1]
 6 SVOP (0x816dd38) gvsv  GV (0x80fa468) *a
 7 BINOP (0x81798a0) sassign
 8 LISTOP (0x8179900) leave

This probably makes more sense for a human: enter a block, start a
statement. Get the values of C<$b> and C<$c>, and add them together.
Find C<$a>, and assign one to the other. Then leave.

The way Perl builds up these op trees in the parsing process can be
unravelled by examining F<perly.y>, the YACC grammar.
Let's take the
piece we need to construct the tree for C<$a = $b + $c>

 1 term    :   term ASSIGNOP term
 2                { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
 3         |   term ADDOP term
 4                { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

If you're not used to reading BNF grammars, this is how it works: You're
fed certain things by the tokeniser, which generally end up in upper
case. Here, C<ADDOP>, is provided when the tokeniser sees C<+> in your
code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are
`terminal symbols', because you can't get any simpler than them.

The grammar, lines one and three of the snippet above, tells you how to
build up more complex forms. These complex forms, `non-terminal symbols'
are generally placed in lower case. C<term> here is a non-terminal
symbol, representing a single expression.

The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, C<term> followed by C<=>
followed by C<term> makes a C<term>, and C<term> followed by C<+>
followed by C<term> can also make a C<term>.

So, if you see two terms with an C<=> or C<+>, between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see C<=>, you'll do the code
in line 2. If you see C<+>, you'll do the code in line 4. It's this code
which contributes to the op tree.

    | term ADDOP term
    { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

What this does is creates a new binary op, and feeds it a number of
variables. The variables refer to the tokens: C<$1> is the first token in
the input, C<$2> the second, and so on - think regular expression
backreferences. C<$$> is the op returned from this reduction. So, we
call C<newBINOP> to create a new binary operator.
The first parameter to
C<newBINOP>, a function in F<op.c>, is the op type. It's an addition
operator, so we want the type to be C<OP_ADD>. We could specify this
directly, but it's right there as the second token in the input, so we
use C<$2>. The second parameter is the op's flags: 0 means `nothing
special'. Then the things to add: the left and right hand side of our
expression, in scalar context.

=head2 Stacks

When perl executes something like C<addop>, how does it pass on its
results to the next op? The answer is, through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.

=over 3

=item Argument stack

Arguments are passed to PP code and returned from PP code using the
argument stack, C<ST>. The typical way to handle arguments is to pop
them off the stack, deal with them how you wish, and then push the result
back onto the stack. This is how, for instance, the cosine operator
works:

      NV value;
      value = POPn;
      value = Perl_cos(value);
      XPUSHn(value);

We'll see a more tricky example of this when we consider Perl's macros
below. C<POPn> gives you the NV (floating point value) of the top SV on
the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push
the result back as an NV. The C<X> in C<XPUSHn> means that the stack
should be extended if necessary - it can't be necessary here, because we
know there's room for one more item on the stack, since we've just
removed one! The C<XPUSH*> macros at least guarantee safety.

Alternatively, you can fiddle with the stack directly: C<SP> gives you
the first element in your portion of the stack, and C<TOP*> gives you
the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
negation of an integer:

     SETi(-TOPi);

Just set the integer value of the top stack entry to its negation.

Argument stack manipulation in the core is exactly the same as it is in
XSUBs - see L<perlxstut>, L<perlxs> and L<perlcall> for a longer
description of the macros used in stack manipulation.
=item Mark stack

I say `your portion of the stack' above because PP code doesn't
necessarily get the whole stack to itself: if your function calls
another function, you'll only want to expose the arguments aimed for the
called function, and not (necessarily) let it get at your own data. The
way we do this is to have a `virtual' bottom-of-stack, exposed to each
function. The mark stack keeps bookmarks to locations in the argument
stack usable by each function. For instance, when dealing with a tied
variable, (internally, something with `P' magic) Perl has to call
methods for accesses to the tied variables. However, we need to separate
the arguments exposed to the method to the argument exposed to the
original function - the store or fetch or whatever it may be. Here's how
the tied C<push> is implemented; see C<av_push> in F<av.c>:

     1 PUSHMARK(SP);
     2 EXTEND(SP,2);
     3 PUSHs(SvTIED_obj((SV*)av, mg));
     4 PUSHs(val);
     5 PUTBACK;
     6 ENTER;
     7 call_method("PUSH", G_SCALAR|G_DISCARD);
     8 LEAVE;
     9 POPSTACK;

The lines which concern the mark stack are the first, fifth and last
lines: they save away, restore and remove the current position of the
argument stack.

Let's examine the whole implementation, for practice:

     1 PUSHMARK(SP);

Push the current state of the stack pointer onto the mark stack. This is
so that when we've finished adding items to the argument stack, Perl
knows how many things we've added recently.

     2 EXTEND(SP,2);
     3 PUSHs(SvTIED_obj((SV*)av, mg));
     4 PUSHs(val);

We're going to add two more items onto the argument stack: when you have
a tied array, the C<PUSH> subroutine receives the object and the value
to be pushed, and that's exactly what we have here - the tied object,
retrieved with C<SvTIED_obj>, and the value, the SV C<val>.

     5 PUTBACK;

Next we tell Perl to make the change to the global stack pointer: C<dSP>
only gave us a local copy, not a reference to the global.
     6 ENTER;
     7 call_method("PUSH", G_SCALAR|G_DISCARD);
     8 LEAVE;

C<ENTER> and C<LEAVE> localise a block of code - they make sure that all
variables are tidied up, everything that has been localised gets
its previous value returned, and so on. Think of them as the C<{> and
C<}> of a Perl block.

To actually do the magic method call, we have to call a subroutine in
Perl space: C<call_method> takes care of that, and it's described in
L<perlcall>. We call the C<PUSH> method in scalar context, and we're
going to discard its return value.

     9 POPSTACK;

Finally, we remove the value we placed on the mark stack, since we
don't need it any more.

=item Save stack

C doesn't have a concept of local scope, so perl provides one. We've
seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save
stack implements the C equivalent of, for example:

    {
        local $foo = 42;
        ...
    }

See L<perlguts/Localising Changes> for how to use the save stack.

=back

=head2 Millions of Macros

One thing you'll notice about the Perl source is that it's full of
macros. Some have called the pervasive use of macros the hardest thing
to understand, others find it adds to clarity. Let's take an example,
the code which implements the addition operator:

   1  PP(pp_add)
   2  {
   3      dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
   4      {
   5        dPOPTOPnnrl_ul;
   6        SETn( left + right );
   7        RETURN;
   8      }
   9  }

Every line here (apart from the braces, of course) contains a macro. The
first line sets up the function declaration as Perl expects for PP code;
line 3 sets up variable declarations for the argument stack and the
target, the return value of the operation. Finally, it tries to see if
the addition operation is overloaded; if so, the appropriate subroutine
is called.

Line 5 is another variable declaration - all variable declarations start
with C<d> - which pops from the top of the argument stack two NVs (hence
C<nn>) and puts them into the variables C<right> and C<left>, hence the
C<rl>. These are the two operands to the addition operator.
Next, we
call C<SETn> to set the NV of the return value to the result of adding
the two values. This done, we return - the C<RETURN> macro makes sure
that our return value is properly handled, and we pass the next operator
to run back to the main run loop.

Most of these macros are explained in L<perlapi>, and some of the more
important ones are explained in L<perlxs> as well. Pay special attention
to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on
the C<[pad]THX_?> macros.

=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

    ./Configure -d -D optimize=-g
    make

C<-g> is a flag to the C compiler to have it produce debugging
information which will allow us to step through a running program.
F<Configure> will also turn on the C<DEBUGGING> compilation symbol which
enables all the internal debugging code in Perl. There are a whole bunch
of things you can debug with this: L<perlrun> lists them all, and the
best way to find out about them is to play about with them. The most
useful options are probably

    l  Context (loop) stack processing
    t  Trace execution
    o  Method and overloading resolution
    c  String/numeric conversions

Some of the functionality of the debugging code can be achieved using XS
modules.

    -Dr => use re 'debug'
    -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to any
debugger, but check the manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

You'll want to do that in your Perl source tree so the debugger can read
the source code. You should see the copyright message, followed by the
prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item run [args]

Run the program with the given arguments.
=item break function_name

=item break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or the given
line in the named source file.

=item step

Steps through the program a line at a time.

=item next

Steps through the program a line at a time, without descending into
functions.

=item continue

Run until the next breakpoint.

=item finish

Run until the end of the current function, then stop again.

=item 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and F<gdb> is not aware of macros. You'll have to
substitute them yourself. So, for instance, you can't say

    print SvPV_nolen(sv)

but you have to say

    print Perl_sv_2pv_nolen(sv)

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
recursively apply the macros for you.

=back

=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions in
F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures
that you can't get at from Perl. Let's take an example. We'll use the
C<$a = $b + $c> we used before, but give it a bit of context:
C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around?

What about C<pp_add>, the function we examined earlier to implement the
C<+> operator:

    (gdb) break Perl_pp_add
    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see
L<perlguts/Internal Functions>.
With the breakpoint in place, we can run our program:

    (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
    309         dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
    (gdb) step
    311           dPOPTOPnnrl_ul;
    (gdb)

We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul>
arranges for two C<NV>s to be placed into C<left> and C<right> - let's
slightly expand it:

    #define dPOPTOPnnrl_ul  NV right = POPn; \
                            SV *leftsv = TOPs; \
                            NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV either
directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function.
C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses
C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from
C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>.

Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
convert it. If we step again, we'll find ourselves there:

    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
    1669        if (!sv)
    (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

    (gdb) print Perl_sv_dump(sv)
    SV = PV(0xa057cc0) at 0xa0675d0
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0xa06a510 "6XXXX"\0
      CUR = 5
      LEN = 6
    $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

    (gdb) finish
    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
    0x462669 in Perl_pp_add () at pp_hot.c:311
    311           dPOPTOPnnrl_ul;

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
similar output to L<B::Debug|B::Debug>.
    {
    13  TYPE = add  ===> 14
        TARG = 1
        FLAGS = (SCALAR,KIDS)
        {
            TYPE = null  ===> (12)
              (was rv2sv)
            FLAGS = (SCALAR,KIDS)
            {
    11          TYPE = gvsv  ===> 12
                FLAGS = (SCALAR)
                GV = main::b
            }
        }

# finish this later #

=head2 Patching

All right, we've now had a look at how to navigate the Perl sources and
some things you'll need to know when fiddling with them. Let's now get
on and create a simple patch. Here's something Larry suggested: if a
C<U> is the first active format during a C<pack>, (for example,
C<pack "U3C8", @stuff>) then the resulting string should be treated as
UTF8 encoded.

How do we prepare to fix this up? First we locate the code in question -
the C<pack> happens at runtime, so it's going to be in one of the F<pp>
files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be
altering this file, let's copy it to F<pp.c~>.

[Well, it was in F<pp.c> when this tutorial was written. It has now been
split off with C<pp_unpack> to its own file, F<pp_pack.c>]

Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
loop over the pattern, taking each format character in turn into
C<datumtype>. Then for each possible format character, we swallow up
the other arguments in the pattern (a field width, an asterisk, and so
on) and convert the next chunk input into the specified format, adding
it onto the output SV C<cat>.

How do we know if the C<U> is the first format in the C<pat>? Well, if
we have a pointer to the start of C<pat> then, if we see a C<U> we can
test whether we're still at the start of the string.
So, here's where
C<pat> is set up:

    STRLEN fromlen;
    register char *pat = SvPVx(*++MARK, fromlen);
    register char *patend = pat + fromlen;
    register I32 len;
    I32 datumtype;
    SV *fromstr;

We'll have another string pointer in there:

    STRLEN fromlen;
    register char *pat = SvPVx(*++MARK, fromlen);
    register char *patend = pat + fromlen;
 +  char *patcopy;
    register I32 len;
    I32 datumtype;
    SV *fromstr;

And just before we start the loop, we'll set C<patcopy> to be the start
of C<pat>:

    items = SP - MARK;
    MARK++;
    sv_setpvn(cat, "", 0);
 +  patcopy = pat;
    while (pat < patend) {

Now if we see a C<U> which was at the start of the string, we turn on
the C<UTF8> flag for the output SV, C<cat>:

 +  if (datumtype == 'U' && pat==patcopy+1)
 +      SvUTF8_on(cat);
    if (datumtype == '#') {
        while (pat < patend && *pat != '\n')
            pat++;

Remember that it has to be C<patcopy+1> because the first character of
the string is the C<U> which has been swallowed into C<datumtype>!

Oops, we forgot one thing: what if there are spaces at the start of the
pattern? C<pack(" U*", @stuff)> will have C<U> as the first active
character, even though it's not the first thing in the pattern. In this
case, we have to advance C<patcopy> along with C<pat> when we see spaces:

    if (isSPACE(datumtype))
        continue;

needs to become

    if (isSPACE(datumtype)) {
        patcopy++;
        continue;
    }

OK. That's the C part done. Now we must do two additional things before
this patch is ready to go: we've changed the behaviour of Perl, and so
we must document that change. We must also provide some more regression
tests to make sure our patch works and doesn't create a bug somewhere
else along the line.

The regression tests for each operator live in F<t/op/>, and so we
make a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our
tests to the end. First, we'll test that the C<U> does indeed create
Unicode strings.

t/op/pack.t has a sensible ok() function, but if it didn't we could
use the one from t/test.pl.
    require './test.pl';
    plan( tests => 159 );

so instead of this:

    print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000);
    print "ok $test\n"; $test++;

we can write the more sensible (see L<Test::More> for a full
explanation of is() and other testing functions).

    is( "1.20.300.4000", sprintf "%vd", pack("U*",1,20,300,4000),
                                          "U* produces unicode" );

Now we'll test that we got that space-at-the-beginning business right:

    is( "1.20.300.4000", sprintf "%vd", pack(" U*",1,20,300,4000),
                                          "  with spaces at the beginning" );

And finally we'll test that we don't make Unicode strings if C<U> is B<not>
the first active format:

    isnt( v1.20.300.4000, sprintf "%vd", pack("C0U*",1,20,300,4000),
          "U* not first isn't unicode" );

Mustn't forget to change the number of tests which appears at the top,
or else the automated tester will get confused. This will either look
like this:

    print "1..156\n";

or this:

    plan( tests => 156 );

We now compile up Perl, and run it through the test suite. Our new
tests pass, hooray!

Finally, the documentation. The job is never done until the paperwork is
over, so let's describe the change we've just made. The relevant place
is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert
this text in the description of C<pack>:

 =item *

 If the pattern begins with a C<U>, the resulting string will be treated
 as Unicode-encoded. You can force UTF8 encoding on in a string with an
 initial C<U0>, and the bytes that follow will be interpreted as Unicode
 characters. If you don't want this to happen, you can begin your pattern
 with C<C0> (or anything else) to force Perl not to UTF8 encode your
 string, and then follow this with a C<U*> somewhere in your pattern.

All done. Now let's create the patch.
F<Porting/patching.pod> tells us
that if we're making major changes, we should copy the entire directory
to somewhere safe before we begin fiddling, and then do

    diff -ruN old new > patch

However, we know which files we've changed, and we can simply do this:

    diff -u pp.c~             pp.c             >  patch
    diff -u t/op/pack.t~      t/op/pack.t      >> patch
    diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch

We end up with a patch looking a little like this:

    --- pp.c~       Fri Jun 02 04:34:10 2000
    +++ pp.c        Fri Jun 16 11:37:25 2000
    @@ -4375,6 +4375,7 @@
         register I32 items;
         STRLEN fromlen;
         register char *pat = SvPVx(*++MARK, fromlen);
    +    char *patcopy;
         register char *patend = pat + fromlen;
         register I32 len;
         I32 datumtype;
    @@ -4405,6 +4406,7 @@
    ...

And finally, we submit it, with our rationale, to perl5-porters. Job
done!

=head2 Patching a core module

This works just like patching anything else, with an extra
consideration. Many core modules also live on CPAN. If this is so,
patch the CPAN version instead of the core and send the patch off to
the module maintainer (with a copy to p5p). This will help the module
maintainer keep the CPAN version in sync with the core version without
constantly scanning p5p.

=head2 Adding a new function to the core

If, as part of a patch to fix a bug, or just because you have an
especially good idea, you decide to add a new function to the core,
discuss your ideas on p5p well before you start work. It may be that
someone else has already attempted to do what you are considering and
can give lots of good advice or even provide you with bits of code
that they already started (but never finished).

You have to follow all of the advice given above for patching. It is
extremely important to test any addition thoroughly and add new tests
to explore all boundary conditions that your new function is expected
to handle. If your new function is used only by one module (e.g.
toke),
then it should probably be named S_your_function (for static); on the
other hand, if you expect it to be accessible from other functions in
Perl, you should name it Perl_your_function. See
L<perlguts/Internal Functions> for more details.

The location of any new code is also an important consideration. Don't
just create a new top level .c file and put your code there; you would
have to make changes to Configure (so the Makefile is created properly),
as well as possibly lots of include files. This is strictly pumpking
business.

It is better to add your function to one of the existing top level
source code files, but your choice is complicated by the nature of
the Perl distribution. Only the files that are marked as compiled
static are located in the perl executable. Everything else is located
in the shared library (or DLL if you are running under WIN32). So,
for example, if a function was only used by functions located in
toke.c, then your code can go in toke.c. If, however, you want to call
the function from universal.c, then you should put your code in another
location, for example util.c.

In addition to writing your C code, you will need to create an
appropriate entry in F<embed.pl> describing your function, then run
'make regen_headers' to create the entries in the numerous header
files that perl needs to compile correctly. See
L<perlguts/Internal Functions> for information on the various options
that you can set in F<embed.pl>. You will forget to do this a few (or
many) times and you will get warnings during the compilation phase.
Make sure that you mention this when you post your patch to P5P; the
pumpking needs to know this.

When you write your new code, please be conscious of existing code
conventions used in the perl source files. See L<perlstyle> for
details. Although most of the guidelines discussed seem to focus on
Perl code, rather than C, they all apply (except when they don't ;).
See also the I<Porting/patching.pod> file in the Perl source distribution
for lots of details about both formatting and submitting patches of
your changes.

Lastly, TEST TEST TEST TEST TEST any code before posting to p5p.
Test on as many platforms as you can find. Test as many perl
Configure options as you can (e.g. MULTIPLICITY). If you have
profiling or memory tools, see L</EXTERNAL TOOLS FOR DEBUGGING PERL>
below for how to use them to further test your code. Remember that
most of the people on P5P are doing this on their own time and
don't have the time to debug your code.

=head2 Writing a test

Every module and built-in function has an associated test file (or
should...). If you add or change functionality, you have to write a
test. If you fix a bug, you have to write a test so that bug never
comes back. If you alter the docs, it would be nice to test what the
new documentation says.

In short, if you submit a patch you probably also have to patch the
tests.

For modules, the test file is right next to the module itself.
F<lib/strict.t> tests F<lib/strict.pm>. This is a recent innovation,
so there are some snags (and it would be wonderful for you to brush
them out), but it basically works that way. Everything else lives in
F<t/>.

=over 3

=item F<t/base/>

Testing of the absolute basic functionality of Perl. Things like
C<if>, basic file reads and writes, simple regexes, etc. These are
run first in the test suite and if any of them fail, something is
I<really> broken.

=item F<t/cmd/>

These test the basic control structures, C<if/else>, C<while>,
subroutines, etc.

=item F<t/comp/>

Tests basic issues of how Perl parses and compiles itself.

=item F<t/io/>

Tests for built-in IO functions, including command line arguments.

=item F<t/lib/>

The old home for the module tests, you shouldn't put anything new in
here. There are still some bits and pieces hanging around in here
that need to be moved. Perhaps you could move them? Thanks!

=item F<t/op/>

Tests for perl's built in functions that don't fit into any of the
other directories.
=item F<t/pod/>

Tests for POD directives. There are still some tests for the Pod
modules hanging around in here that need to be moved out into F<lib/>.

=item F<t/run/>

Testing features of how perl actually runs, including exit codes and
handling of PERL* environment variables.

=back

The core uses the same testing style as the rest of Perl, a simple
"ok/not ok" run through Test::Harness, but there are a few special
considerations.

There are three ways to write a test in the core. Test::More,
t/test.pl and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. The
decision of which to use depends on what part of the test suite you're
working on. This is a measure to prevent a high-level failure (such
as Config.pm breaking) from causing basic functionality tests to fail.

=over 4

=item t/base t/comp

Since we don't know if require works, or even subroutines, use ad hoc
tests for these two. Step carefully to avoid using the feature being
tested.

=item t/cmd t/run t/io t/op

Now that basic require() and subroutines are tested, you can use the
t/test.pl library which emulates the important features of Test::More
while using a minimum of core features.

You can also conditionally use certain libraries like Config, but be
sure to skip the test gracefully if it's not there.

=item t/lib ext lib

Now that the core of Perl is tested, Test::More can be used. You can
also use the full suite of core modules in the tests.

=back

When you say "make test" Perl uses the F<t/TEST> program to run the
test suite. All tests are run from the F<t/> directory, B<not> the
directory which contains the test. This causes some problems with the
tests in F<lib/>, so here's some opportunity for some patching.

You must be triply conscious of cross-platform concerns. This usually
boils down to using File::Spec and avoiding things like C<fork> and
C<system> unless absolutely necessary.
=head2 Special Make Test Targets

There are various special make targets that can be used to test Perl
slightly differently than the standard "test" target. Not all them
are expected to give a 100% success rate. Many of them have several
aliases.

=over 4

=item coretest

Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests).

=item test.deparse

Run all the tests through B::Deparse. Not all tests will succeed.

=item minitest

Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>,
F<t/op>, and F<t/uni> tests.

=item test.third check.third utest.third ucheck.third

(Only in Tru64) Run all the tests using the memory leak + naughty
memory access tool "Third Degree". The log files will be named
F<perl.3log.*>.

=item test.torture torturetest

Run all the usual tests and some extra tests. As of Perl 5.8.0 the
only extra tests are Abigail's JAPHs, F<t/japh/abigail.t>.

You can also run the torture test with F<t/harness> by giving a
C<-torture> argument to F<t/harness>.

=item utest ucheck test.utf8 check.utf8

Run all the tests with C<-Mutf8>. Not all tests will succeed.

=back

=head1 EXTERNAL TOOLS FOR DEBUGGING PERL

Sometimes it helps to use external tools while debugging and
testing Perl. This section tries to guide you through using
some common testing and debugging tools with Perl. This is
meant as a guide to interfacing these tools with Perl, not
as any kind of guide to the use of the tools themselves.

=head2 Rational Software's Purify

Purify is a commercial tool that is helpful in identifying
memory overruns, wild pointers, memory leaks and other such
badness. Perl must be compiled in a specific way for
optimal testing with Purify. Purify is available under
Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.

The only currently known leaks happen when there are
compile-time errors within eval or require. (Fixing these
is non-trivial, unfortunately, but they must be fixed
eventually.)

=head2 Purify on Unix

On Unix, Purify creates a new Perl binary.
To get the most
+benefit out of Purify, you should build the perl to be Purified
+using:
+
+    sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
+        -Uusemymalloc -Dusemultiplicity
+
+where these arguments mean:
+
+=over 4
+
+=item -Accflags=-DPURIFY
+
+Disables Perl's arena memory allocation functions, as well as
+forcing use of memory allocation functions derived from the
+system malloc.
+
+=item -Doptimize='-g'
+
+Adds debugging information so that you see the exact source
+statements where the problem occurs. Without this flag, all
+you will see is the source filename of where the error occurred.
+
+=item -Uusemymalloc
+
+Disables Perl's malloc so that Purify can more closely monitor
+allocations and leaks. Using Perl's malloc will make Purify
+report most leaks in the "potential" leaks category.
+
+=item -Dusemultiplicity
+
+Enabling the multiplicity option allows perl to clean up
+thoroughly when the interpreter shuts down, which reduces the
+number of bogus leak reports from Purify.
+
+=back
+
+Once you've compiled a perl suitable for Purify'ing, you
+can just:
+
+    make pureperl
+
+which creates a binary named 'pureperl' that has been Purify'ed.
+This binary is used in place of the standard 'perl' binary
+when you want to debug Perl memory problems.
+
+To minimize the number of memory leak false alarms
+(see L), set the environment variable
+PERL_DESTRUCT_LEVEL to 2.
+
+    setenv PERL_DESTRUCT_LEVEL 2
+
+In Bourne-type shells:
+
+    PERL_DESTRUCT_LEVEL=2
+    export PERL_DESTRUCT_LEVEL
+
+As an example, to show any memory leaks produced during the
+standard Perl test suite you would build and run the Purify'ed
+perl as:
+
+    make pureperl
+    cd t
+    ../pureperl -I../lib harness
+
+which would run the test suite under the Purify'ed perl and
+report any memory problems.
+
+Purify outputs messages in "Viewer" windows by default.
If
+you don't have a windowing environment or if you simply
+want the Purify output to unobtrusively go to a log file
+instead of to the interactive window, use the following
+options to output to the log file "perl.log":
+
+    setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
+        -log-file=perl.log -append-logfile=yes"
+
+If you plan to use the "Viewer" windows, then you only need this option:
+
+    setenv PURIFYOPTIONS "-chain-length=25"
+
+In Bourne-type shells:
+
+    PURIFYOPTIONS="..."
+    export PURIFYOPTIONS
+
+or if you have the "env" utility:
+
+    env PURIFYOPTIONS="..." ../pureperl ...
+
+=head2 Purify on NT
+
+Purify on Windows NT instruments the Perl binary 'perl.exe'
+on the fly. There are several options in the makefile you
+should change to get the most use out of Purify:
+
+=over 4
+
+=item DEFINES
+
+You should add -DPURIFY to the DEFINES line so the DEFINES
+line looks something like:
+
+    DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1
+
+to disable Perl's arena memory allocation functions, as
+well as to force use of memory allocation functions derived
+from the system malloc.
+
+=item USE_MULTI = define
+
+Enabling the multiplicity option allows perl to clean up
+thoroughly when the interpreter shuts down, which reduces the
+number of bogus leak reports from Purify.
+
+=item #PERL_MALLOC = define
+
+Disables Perl's malloc so that Purify can more closely monitor
+allocations and leaks. Using Perl's malloc will make Purify
+report most leaks in the "potential" leaks category.
+
+=item CFG = Debug
+
+Adds debugging information so that you see the exact source
+statements where the problem occurs. Without this flag, all
+you will see is the source filename of where the error occurred.
+
+=back
+
+As an example, to show any memory leaks produced during the
+standard Perl test suite you would build and run Purify as:
+
+    cd win32
+    make
+    cd ../t
+    purify ../perl -I../lib harness
+
+which would instrument Perl in memory, run the test suite, and
+finally report any memory problems.
+
+B: as of Perl 5.8.0, the ext/Encode/t/Unicode.t test takes
+extraordinarily long (hours?) to complete under Purify. It has been
+theorized that it would eventually finish, but nobody has so far been
+patient enough :-) (This same extreme slowdown has also been seen with
+the Third Degree tool, so that test must be doing something that
+is quite unfriendly to memory debuggers.) It is suggested that you
+simply kill that testing process.
+
+=head2 Compaq's/Digital's/HP's Third Degree
+
+Third Degree is a tool for memory leak detection and memory access checks.
+It is one of the many tools in the ATOM toolkit. The toolkit is only
+available on Tru64 (formerly known as Digital UNIX, formerly known as
+DEC OSF/1).
+
+When building Perl, you must first run Configure with the -Doptimize=-g
+and -Uusemymalloc flags; after that you can use the make targets
+"perl.third" and "test.third". (What is required is that Perl must be
+compiled using the C<-g> flag; you may need to re-Configure.)
+
+The short story is that with "atom" you can instrument the Perl
+executable to create a new executable called F. When the
+instrumented executable is run, it creates a log of dubious memory
+traffic in a file called F. See the manual pages of atom and
+third for more information. The most extensive Third Degree
+documentation is available in the Compaq "Tru64 UNIX Programmer's
+Guide", chapter "Debugging Programs with Third Degree".
+
+The "test.third" target leaves a lot of files named F in the t/
+subdirectory. There is a problem with these files: Third Degree is so
+effective that it also finds problems in the system libraries.
+Therefore you should use the Porting/thirdclean script to clean up
+the F<*.3log> files.
+
+There are also leaks that, for a given definition of a leak, aren't
+really leaks at all. See L for more information.
+
+=head2 PERL_DESTRUCT_LEVEL
+
+If you want to run any of the tests yourself manually using the
+pureperl or perl.third executables, please note that by default
+perl B explicitly clean up all the memory it has allocated
+(such as global memory arenas) but instead lets the exit() of
+the whole program "take care" of such allocations, also known
+as "global destruction of objects".
+
+There is a way to tell perl to do complete cleanup: set the
+environment variable PERL_DESTRUCT_LEVEL to a non-zero value.
+The t/TEST wrapper does set this to 2, and this is what you
+need to do too if you don't want to see the "global leaks".
+For example, for a "third-degreed" Perl:
+
+    env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t
+
+(Note: the mod_perl apache module also uses this environment variable
+for its own purposes and extends its semantics. Refer to the mod_perl
+documentation for more information.)
+
+=head2 Profiling
+
+Depending on your platform, there are various ways of profiling Perl.
+
+There are two commonly used techniques of profiling executables:
+I and I.
+
+The first method periodically samples the CPU program
+counter, and since the program counter can be correlated with the code
+generated for functions, we get a statistical view of which
+functions the program is spending its time in. The caveats are that
+very small/fast functions have a lower probability of showing up in the
+profile, and that periodically interrupting the program (this is
+usually done rather frequently, on the scale of milliseconds) imposes
+an additional overhead that may skew the results.
The first problem
+can be alleviated by running the code for longer (in general this is a
+good idea for profiling); the second problem is usually guarded
+against by the profiling tools themselves.
+
+The second method divides up the generated code into I.
+Basic blocks are sections of code that are entered only at the
+beginning and exited only at the end. For example, a conditional jump
+starts a basic block. Basic block profiling usually works by
+I the code, adding I
+book-keeping code to the generated code. During the execution of the
+code the basic block counters are then updated appropriately. The
+caveat is that the added extra code can skew the results: again, the
+profiling tools usually try to factor their own effects out of the
+results.
+
+=head2 Gprof Profiling
+
+gprof is a profiling tool available on many UNIX platforms;
+it uses F.
+
+You can build a profiled version of perl called "perl.gprof" by
+invoking the make target "perl.gprof" (what is required is that Perl
+must be compiled using the C<-pg> flag; you may need to re-Configure).
+Running the profiled version of Perl will create an output file called
+F which contains the profiling data collected during the
+execution.
+
+The gprof tool can then display the collected data in various ways.
+Usually gprof understands the following options:
+
+=over 4
+
+=item -a
+
+Suppress statically defined functions from the profile.
+
+=item -b
+
+Suppress the verbose descriptions in the profile.
+
+=item -e routine
+
+Exclude the given routine and its descendants from the profile.
+
+=item -f routine
+
+Display only the given routine and its descendants in the profile.
+
+=item -s
+
+Generate a summary file called F which may then be given
+to subsequent gprof runs to accumulate data over several runs.
+
+=item -z
+
+Display routines that have zero usage.
+
+=back
+
+For a more detailed explanation of the available commands and output
+formats, see your own local documentation of gprof.
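Putting the pieces above together, a complete gprof session might look
like the following sketch (illustrative only: the "perl.gprof" target
and the C<-pg> requirement are described above, while the data file
name F<gmon.out> and the exact gprof invocation are assumptions that
can vary between platforms):

    make perl.gprof
    cd t
    ../perl.gprof -I../lib harness
    gprof ../perl.gprof gmon.out > perl.prof

The resulting F<perl.prof> would then contain the flat profile and the
call graph, which the options listed above filter and format.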
+
+=head2 GCC gcov Profiling
+
+Starting from GCC 3.0, I is officially available
+for the GNU CC.
+
+You can build a profiled version of perl called F by
+invoking the make target "perl.gcov" (what is required is that Perl
+must be compiled using gcc with the flags C<-fprofile-arcs
+-ftest-coverage>; you may need to re-Configure).
+
+Running the profiled version of Perl will cause profile output to be
+generated. For each source file an accompanying ".da" file will be
+created.
+
+To display the results you use the "gcov" utility (which comes
+with gcc 3.0 or newer). F is
+run on source code files, like this:
+
+    gcov sv.c
+
+which will cause F to be created. The F<.gcov> files
+contain the source code annotated with relative frequencies of
+execution indicated by "#" markers.
+
+Useful options of F include C<-b>, which will summarise the
+basic block, branch, and function call coverage, and C<-c>, which
+will use the actual counts instead of relative frequencies. For
+more information on the use of F and basic block profiling
+with gcc, see the latest GNU CC manual; as of GCC 3.0, see
+
+    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html
+
+and its section titled "8. gcov: a Test Coverage Program":
+
+    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132
+
+=head2 Pixie Profiling
+
+Pixie is a profiling tool available on the IRIX and Tru64 (aka Digital
+UNIX aka DEC OSF/1) platforms. Pixie does its profiling using
+I.
+
+You can build a profiled version of perl called F by
+invoking the make target "perl.pixie" (what is required is that Perl
+must be compiled using the C<-g> flag; you may need to re-Configure).
+
+In Tru64 a file called F will also be silently created;
+this file contains the addresses of the basic blocks. Running the
+profiled version of Perl will create a new file called "perl.Counts"
+which contains the basic block counts for that particular
+program execution.
+
+To display the results you use the F utility.
The exact
+incantation depends on your operating system: "prof perl.Counts" in
+IRIX, and "prof -pixie -all -L. perl" in Tru64.
+
+In IRIX the following prof options are available:
+
+=over 4
+
+=item -h
+
+Reports the most heavily used lines in descending order of use.
+Useful for finding the hotspot lines.
+
+=item -l
+
+Groups lines by procedure, with procedures sorted in descending order of use.
+Within a procedure, lines are listed in source order.
+Useful for finding the hotspots of procedures.
+
+=back
+
+In Tru64 the following options are available:
+
+=over 4
+
+=item -p[rocedures]
+
+Procedures sorted in descending order by the number of cycles executed
+in each procedure. Useful for finding the hotspot procedures.
+(This is the default option.)
+
+=item -h[eavy]
+
+Lines sorted in descending order by the number of cycles executed in
+each line. Useful for finding the hotspot lines.
+
+=item -i[nvocations]
+
+The called procedures are sorted in descending order by the number of
+calls made to them. Useful for finding the most used procedures.
+
+=item -l[ines]
+
+Grouped by procedure, sorted by cycles executed per procedure.
+Useful for finding the hotspots of procedures.
+
+=item -testcoverage
+
+The compiler emitted code for these lines, but the code was unexecuted.
+
+=item -z[ero]
+
+Unexecuted procedures.
+
+=back
+
+For further information, see your system's manual pages for pixie and prof.
+
+=head2 Miscellaneous tricks
+
+=over 4
+
+=item *
+
+Those debugging perl with the DDD frontend over gdb may find the
+following useful:
+
+You can extend the data conversion shortcuts menu, so for example you
+can display an SV's IV value with one click, without doing any typing.
+To do that, simply edit the ~/.ddd/init file and add after:
+
+  ! Display shortcuts.
+  Ddd*gdbDisplayShortcuts: \
+  /t ()   // Convert to Bin\n\
+  /d ()   // Convert to Dec\n\
+  /x ()   // Convert to Hex\n\
+  /o ()   // Convert to Oct(\n\
+
+the following two lines:
+
+  ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
+  ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx
+
+so now you can do ivx and pvx lookups, or you can plug in the
+sv_peek "conversion":
+
+  Perl_sv_peek(my_perl, (SV*)()) // sv_peek
+
+(The my_perl is for threaded builds.)
+Just remember that every line but the last one should end with \n\
+
+Alternatively, edit the init file interactively via:
+3rd mouse button -> New Display -> Edit Menu
+
+Note: you can define up to 20 conversion shortcuts in the gdb
+section.
+
+=item *
+
+If you see in a debugger a memory area mysteriously full of 0xabababab,
+you may be seeing the effect of the Poison() macro, see L.
+
+=back
+
+=head2 CONCLUSION
+
+We've had a brief look around the Perl source, an overview of the stages
+F goes through when it's running your code, and how to use a
+debugger to poke at the Perl guts. We took a very simple problem and
+demonstrated how to solve it fully - with documentation, regression
+tests, and finally a patch for submission to p5p. Finally, we talked
+about how to use external tools to debug and test Perl.
+
+I'd now suggest you read over those references again, and then, as soon
+as possible, get your hands dirty. The best way to learn is by doing,
+so:
+
+=over 3
+
+=item *
+
+Subscribe to perl5-porters, follow the patches, and try to understand
+them; don't be afraid to ask if there's a portion you're not clear on -
+who knows, you may unearth a bug in the patch...
+
+=item *
+
+Keep up to date with the bleeding edge Perl distributions and get
+familiar with the changes. Try to get an idea of what areas people are
+working on and the changes they're making.
+
+=item *
+
+Do read the README associated with your operating system, e.g. README.aix
+on the IBM AIX OS.
Don't hesitate to supply patches to that README if
+you find anything missing or out of date after a new OS release.
+
+=item *
+
+Find an area of Perl that seems interesting to you, and see if you can
+figure out how it works. Scan through the source, and step over it in the
+debugger. Play, poke, investigate, fiddle! You'll probably get to
+understand not just your chosen area but a much wider range of F's
+activity as well, and probably sooner than you'd think.
+
+=back
+
+=over 3
+
+=item I
+
+=back
+
+If you can do these things, you've started on the long road to Perl porting.
+Thanks for wanting to help make Perl better - and happy hacking!
+
 =head1 AUTHOR
 
 This document was written by Nathan Torkington, and is maintained by