=head1 NAME

perltodo - Perl TO-DO List

=head1 DESCRIPTION

This is a list of wishes for Perl. The tasks we think are smaller or
easier are listed first. Anyone is welcome to work on any of these,
but it's a good idea to first contact I<perl5-porters@perl.org> to
avoid duplication of effort, and to learn from any previous attempts.
By all means contact a pumpking privately first if you prefer.

Whilst patches to make the list shorter are most welcome, ideas to add to
the list are also encouraged. Check the perl5-porters archives for past
ideas, and any discussion about them. One set of archives may be found at:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
not, but if your patch is incorporated, then we'll add your name to the
F<AUTHORS> file, which ships in the official distribution. How many other
programming languages offer you 1 line of immortality?

=head1 Tasks that only need Perl knowledge

=head2 Smartmatch design issues

In 5.10.0 the smartmatch operator C<~~> isn't working quite "right". But
before we can fix the implementation, we need to define what "right" is.
The first problem is that Robin Houston implemented the Perl 6 smart match
spec as of February 2006, when smart match was axiomatically symmetrical:
L<http://groups.google.com/group/perl.perl6.language/msg/bf2b486f089ad021>

Since then the Perl 6 target moved, but the Perl 5 implementation did not.

So it would be useful for someone to compare the Perl 6 smartmatch table
as of February 2006 L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?view=markup&pathrev=7615>
and the current table L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?revision=14556&view=markup>
and tabulate the differences in Perl 6. The annotated view of changes is
L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?view=annotate> and the diff is
C<svn diff -r7615:14556 http://svn.perl.org/perl6/doc/trunk/design/syn/S03.pod>
-- search for C<=head1 Smart matching>. (In theory F<viewvc> can generate that,
but in practice when I tried it hung forever, I assume "thinking")

With that done and published, someone (else) can then map any changed Perl 6
semantics back to Perl 5, based on how the existing semantics map to Perl 5:
L<http://search.cpan.org/~rgarcia/perl-5.10.0/pod/perlsyn.pod#Smart_matching_in_detail>

There are also some questions that need answering:

=over 4

=item *

How do you negate one? (documentation issue)
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-01/msg00071.html

=item *

Array behaviors
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-12/msg00799.html

* Should smart matches be symmetrical? (Perl 6 says no)

* Other differences between Perl 5 and Perl 6 smart match?

=item *

Objects and smart match
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-12/msg00865.html

=back

=head2 Remove duplication of test setup.

Schwern notes that there's duplication of code - lots and lots of tests have
some variation on the big block of C<$Is_Foo> checks. We can safely put this
into a file, change it to build an C<%Is> hash and require it. Maybe just put
it into F<test.pl>. Throw in the handy tainting subroutines.
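
As a rough illustration of the C<%Is> idea, the sketch below builds the hash
once from C<$^O>. The helper name C<build_is_hash> and the list of platforms
are assumptions made for this example, not existing code in F<test.pl>.

```perl
use strict;
use warnings;

# Hypothetical helper: build the platform flags once, keyed by the value
# that $^O would have. The platform list is illustrative, not exhaustive.
sub build_is_hash {
    my ($os) = @_;
    my %Is = map { $_ => ($os eq $_ ? 1 : 0) }
             qw(MSWin32 VMS NetWare cygwin os2 dos);
    return %Is;
}

our %Is = build_is_hash($^O);

# A test file would then consult the hash instead of its own $Is_Foo block.
print "running on Windows\n" if $Is{MSWin32};
```

A test would C<require> the shared file and read C<$Is{VMS}> and friends
rather than repeating the checks.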

=head2 POD -E<gt> HTML conversion in the core still sucks

Which is crazy given just how simple POD purports to be, and how simple HTML
can be. It's not actually I<as> simple as it sounds, particularly with the
flexibility POD allows for C<=item>, but it would be good to improve the
visual appeal of the HTML generated, and to avoid it having any validation
errors. See also L</make HTML install work>, as the layout of installation tree
is needed to improve the cross-linking.

The addition of C<Pod::Simple> and its related modules may make this task
easier to complete.

=head2 merge checkpods and podchecker

F<pod/checkpods.PL> (and C<make check> in the F<pod/> subdirectory)
implements a very basic check for pod files, but the errors it discovers
aren't found by podchecker. Add this check to podchecker, get rid of
checkpods and have C<make check> use podchecker.

=head2 Parallel testing

(This probably impacts much more than the core: also the Test::Harness
and TAP::* modules on CPAN.)

All of the tests in F<t/> can now be run in parallel, if C<$ENV{TEST_JOBS}>
is set. However, tests within each directory in F<ext> and F<lib> are still
run in series, with directories run in parallel. This is an adequate
heuristic, but it might be possible to relax it further, and get more
throughput. Specifically, it would be good to audit all of F<lib/*.t>, and
make them use C<File::Temp>.
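
The reason C<File::Temp> helps is that tests running at the same time no
longer compete for one hard-coded scratch filename. A minimal sketch, in
which the helper name C<write_scratch> and the filename template are made up
for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $data to a uniquely named scratch file in the
# system temporary directory and return its name. UNLINK => 1 asks
# File::Temp to remove the file when the program exits.
sub write_scratch {
    my ($data) = @_;
    my ($fh, $filename) = tempfile('perltodoXXXXXX', TMPDIR => 1, UNLINK => 1);
    print {$fh} $data;
    close $fh or die "close $filename: $!";
    return $filename;
}

my $first  = write_scratch("one\n");
my $second = write_scratch("two\n");
print "two distinct scratch files\n" if $first ne $second;
```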

=head2 Make Schwern poorer

We should have tests for everything. When all the core's modules are tested,
Schwern has promised to donate $500 to TPF. We may need volunteers to
hold him upside down and shake vigorously in order to actually extract the
cash.

=head2 Improve the coverage of the core tests

Use Devel::Cover to ascertain the core modules' test coverage, then add
tests that are currently missing.

=head2 test B

A full test suite for the B module would be nice.

=head2 Deparse inlined constants

Code such as this

    use constant PI => 4;
    warn PI

will currently deparse as

    use constant ('PI', 4);
    warn 4;

because the tokenizer inlines the value of the constant subroutine C<PI>.
This allows various compile time optimisations, such as constant folding
and dead code elimination. Where these haven't happened (such as the example
above) it ought to be possible to make B::Deparse work out the name of the
original constant, because just enough information survives in the symbol
table to do this. Specifically, the same scalar is used for the constant in
the optree as is used for the constant subroutine, so by iterating over all
symbol tables and generating a mapping of SV address to constant name, it
would be possible to provide B::Deparse with this functionality.
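
The mapping half of that idea might look roughly like the sketch below. It
walks a single package rather than every symbol table, the helper name
C<constant_address_map> is invented for the example, and it assumes that a
constant subroutine can be recognised by the C<CVf_CONST> flag on its CV.
Matching the addresses against constants found in the optree is not
attempted here.

```perl
use strict;
use warnings;
use B ();
use constant PI => 4;

# Hypothetical helper: for one package, map the address of each constant
# subroutine's SV to the fully qualified name of the constant.
sub constant_address_map {
    my ($package) = @_;
    my %name_for;
    no strict 'refs';
    for my $name (keys %{"${package}::"}) {
        next if $name =~ /::\z/;               # skip nested symbol tables
        my $full = "${package}::$name";
        next unless defined &{$full};          # only names that have a sub
        my $cv = B::svref_2object(\&{$full});
        next unless $cv->CvFLAGS & B::CVf_CONST();
        my $sv = $cv->XSUBANY;                 # the constant's SV, as a B object
        $name_for{$$sv} = $full;               # $$sv is the SV's address
    }
    return \%name_for;
}

my $map = constant_address_map('main');
printf "%#x => %s\n", $_, $map->{$_} for sort { $a <=> $b } keys %$map;
```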

=head2 A decent benchmark

C<perlbench> seems impervious to any recent changes made to the perl core. It
would be useful to have a reasonable general benchmarking suite that roughly
represented what current perl programs do, and measurably reported whether
tweaks to the core improve, degrade or don't really affect performance, to
guide people attempting to optimise the guts of perl. Gisle would welcome
new tests for perlbench.

=head2 fix tainting bugs

Fix the bugs revealed by running the test suite with the C<-t> switch (via
C<make test.taintwarn>).

=head2 Dual life everything

As part of the "dists" plan, anything that doesn't belong in the smallest perl
distribution needs to be dual lifed. Anything else can be too. Figure out what
changes would be needed to package that module and its tests up for CPAN, and
do so. Test it with older perl releases, and fix the problems you find.

To make a minimal perl distribution, it's useful to look at
F<t/lib/commonsense.t>.

=head2 Bundle dual life modules in ext/

For maintenance (and branch merging) reasons, it would be useful to move
some architecture-independent dual-life modules from lib/ to ext/, if this
has no negative impact on the build of perl itself.

=head2 POSIX memory footprint

Ilya observed that C<use POSIX;> eats memory like there's no tomorrow, and at
various times worked to cut it down. There is probably still fat to cut out -
for example POSIX passes Exporter some very memory hungry data structures.

=head2 embed.pl/makedef.pl

There is a script F<embed.pl> that generates several header files to prefix
all of Perl's symbols in a consistent way, to provide some semblance of
namespace support in C<C>. Functions are declared in F<embed.fnc>, variables
in F<interpvar.h>. Quite a few of the functions and variables
are conditionally declared there, using C<#ifdef>. However, F<embed.pl>
doesn't understand the C macros, so the rules about which symbols are present
when are duplicated in F<makedef.pl>. Writing things twice is bad, m'kay.
It would be good to teach C<embed.pl> to understand the conditional
compilation, and hence remove the duplication, and the mistakes it has caused.

=head2 use strict; and AutoLoad

Currently if you write

    package Whack;
    use AutoLoader 'AUTOLOAD';
    use strict;
    1;
    __END__
    sub bloop {
        print join (' ', No, strict, here), "!\n";
    }

then C<use strict;> isn't in force within the autoloaded subroutines. It would
be more consistent (and less surprising) to arrange for all lexical pragmas
in force at the __END__ block to be in force within each autoloaded subroutine.

There's a similar problem with SelfLoader.

=head2 profile installman

The F<installman> script is slow. All it is doing is text processing, which we're
told is something Perl is good at. So it would be nice to know what it is doing
that is taking so much CPU, and where possible address it.

=head1 Tasks that need a little sysadmin-type knowledge

Or if you prefer, tasks that you would learn from, and broaden your skills
base...

=head2 make HTML install work

There is an C<installhtml> target in the Makefile. It's marked as
"experimental". It would be good to get this tested, make it work reliably, and
remove the "experimental" tag. This would include

=over 4

=item 1

Checking that cross linking between various parts of the documentation works.
In particular that links work between the modules (files with POD in F<lib/>)
and the core documentation (files in F<pod/>)

=item 2

Work out how to split C<perlfunc> into chunks, preferably one per function
group, preferably with general case code that could be used elsewhere.
Challenges here are correctly identifying the groups of functions that go
together, and making the right named external cross-links point to the right
page. Things to be aware of are C<-X>, groups such as C<getpwnam> to
C<endservent>, two or more C<=items> giving the different parameter lists, such
as

    =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
    =item substr EXPR,OFFSET,LENGTH
    =item substr EXPR,OFFSET

and different parameter lists having different meanings. (eg C<select>)

=back

=head2 compressed man pages

Be able to install them. This would probably need a configure test to see how
the system does compressed man pages (same directory/different directory?
same filename/different filename), as well as tweaking the F<installman> script
to compress as necessary.

=head2 Add a code coverage target to the Makefile

Make it easy for anyone to run Devel::Cover on the core's tests. The steps
to do this manually are roughly

=over 4

=item *

do a normal C<Configure>, but include Devel::Cover as a module to install
(see F<INSTALL> for how to do this)

=item *

    make perl

=item *

    cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

=item *

Process the resulting Devel::Cover database

=back

This just gives you the coverage of the F<.pm>s. To also get the C level
coverage you need to

=over 4

=item *

Additionally tell C<Configure> to use the appropriate C compiler flags for
C<gcov>

=item *

    make perl.gcov

(instead of C<make perl>)

=item *

After running the tests run C<gcov> to generate all the F<.gcov> files.
(Including down in the subdirectories of F<ext/>)

=item *

(From the top level perl directory) run C<gcov2perl> on all the C<.gcov> files
to get their stats into the cover_db directory.

=item *

Then process the Devel::Cover database

=back

It would be good to add a single switch to C<Configure> to specify that you
wanted to perform perl level coverage, and another to specify C level
coverage, and have C<Configure> and the F<Makefile> do all the right things
automatically.

=head2 Make Config.pm cope with differences between built and installed perl

Quite often vendors ship a perl binary compiled with their (pay-for)
compilers. People install a free compiler, such as gcc. To work out how to
build extensions, Perl interrogates C<%Config>, so in this situation
C<%Config> describes compilers that aren't there, and extension building
fails. This forces people into choosing between re-compiling perl themselves
using the compiler they have, or only using modules that the vendor ships.

It would be good to find a way to teach C<Config.pm> about the installation setup,
possibly involving probing at install time or later, so that the C<%Config> in
a binary distribution better describes the installed machine, when the
installed machine differs from the build machine in some significant way.
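
The sort of probe this implies could start as small as the sketch below,
which only checks whether the compiler named in C<%Config> can be found at
all. The helper name C<find_in_path> is invented here, and the lookup is a
simplification that assumes Unix-like conventions (no C<.exe> suffix
handling, for instance).

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: return the location of $command if it is an
# executable file, searching @dirs unless the name is already absolute.
# Returns nothing when no executable is found.
sub find_in_path {
    my ($command, @dirs) = @_;
    if (File::Spec->file_name_is_absolute($command)) {
        return -f $command && -x _ ? $command : ();
    }
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $command);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# $Config{cc} may carry arguments, so only the first word is looked up.
my ($cc) = split ' ', ($Config{cc} // '');
$cc = 'cc' unless defined $cc;
my $found = find_in_path($cc, File::Spec->path);
print defined $found
    ? "compiler '$cc' found at $found\n"
    : "compiler '$cc' named in %Config was not found\n";
```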

=head2 linker specification files

Some platforms mandate that you provide a list of a shared library's external
symbols to the linker, so the core already has the infrastructure in place to
do this for generating shared perl libraries. My understanding is that the
GNU toolchain can accept an optional linker specification file, and restrict
visibility just to symbols declared in that file. It would be good to extend
F<makedef.pl> to support this format, and to provide a means within
C<Configure> to enable it. This would allow Unix users to test that the
export list is correct, and to build a perl that does not pollute the global
namespace with private symbols.

=head2 Cross-compile support

Currently C<Configure> understands the C<-Dusecrosscompile> option. This option
arranges for building C<miniperl> for the TARGET machine, so this C<miniperl> is
then assumed to be copied to the TARGET machine and used as a replacement for
the full C<perl> executable.

This could be done a little differently. Namely C<miniperl> should be built for
HOST and then full C<perl> with extensions should be compiled for TARGET.
This, however, might require extra trickery for %Config: we have one config
first for HOST and then another for TARGET. Tools like MakeMaker will be
mightily confused. Having around two different types of executables and
libraries (HOST and TARGET) makes life interesting for Makefiles and
shell (and Perl) scripts. There is $Config{run}, normally empty, which
can be used as an execution wrapper. Also note that in some
cross-compilation/execution environments the HOST and the TARGET do
not see the same filesystem(s), so $Config{run} may need to do some
file/directory copying back and forth.

=head2 roffitall

Make F<pod/roffitall> be updated by F<pod/buildtoc>.

=head2 Split "linker" from "compiler"

Right now, Configure probes for two commands, and sets two variables:

=over 4

=item * C<cc> (cc.U)

This variable holds the name of a command to execute a C compiler which
can resolve multiple global references that happen to have the same
name. Usual values are F<cc> and F<gcc>.
Fervent ANSI compilers may be called F<c89>. AIX has F<xlc>.

=item * C<ld> (dlsrc.U)

This variable indicates the program to be used to link
libraries for dynamic loading. On some systems, it is F<ld>.
On ELF systems, it should be C<$cc>. Mostly, we'll try to respect
the hint file setting.

=back

There is an implicit historical assumption from around Perl5.000alpha
something, that C<$cc> is also the correct command for linking object files
together to make an executable. This may be true on Unix, but it's not true
on other platforms, and there is a maze of workarounds in other places (such
as F<Makefile.SH>) to cope with this.

Ideally, we should create a new variable to hold the name of the executable
linker program, probe for it in F<Configure>, and centralise all the special
case logic there or in hints files.

A small bikeshed issue remains - what to call it, given that C<$ld> is already
taken (arguably for the wrong thing now, but on SunOS 4.1 it is the command
for creating dynamically-loadable modules) and C<$link> could be confused with
the Unix command line executable of the same name, which does something
completely different. Andy Dougherty makes the counter argument "In parrot, I
tried to call the command used to link object files and libraries into an
executable F<link>, since that's what my vaguely-remembered DOS and VMS
experience suggested. I don't think any real confusion has ensued, so it's
probably a reasonable name for perl5 to use."

"Alas, I've always worried that introducing it would make things worse,
since now the module building utilities would have to look for
C<$Config{link}> and institute a fall-back plan if it weren't found."
Although I can see that as confusing, given that C<$Config{d_link}> is true
when (hard) links are available.

=head1 Tasks that need a little C knowledge

These tasks would need a little C knowledge, but don't need any specific
background or experience with XS, or how the Perl interpreter works.

=head2 Weed out needless PERL_UNUSED_ARG

The C code uses the macro C<PERL_UNUSED_ARG> to stop compilers warning about
unused arguments. Often the arguments can't be removed, as there is an
external constraint that determines the prototype of the function, so this
approach is valid. However, there are some cases where C<PERL_UNUSED_ARG>
could be removed. Specifically

=over 4

=item *

The prototypes of (nearly all) static functions can be changed

=item *

Unused arguments generated by short cut macros are wasteful - the short cut
macro used can be changed.

=back

=head2 Modernize the order of directories in @INC

The way @INC is laid out by default, one cannot upgrade core (dual-life)
modules without overwriting files. This causes problems for binary
package builders. One possible proposal is laid out in this
message:
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html>.

=head2 -Duse32bit*

Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
On these systems, it might be the default compilation mode, and there
is currently no guarantee that passing no use64bitall option to the
Configure process will build a 32bit perl. Implementing -Duse32bit*
options would be nice for perl 5.12.

=head2 Make it clear from -v if this is the exact official release

Currently perl from C<p4>/C<rsync> ships with a F<patchlevel.h> file that
usually defines one local patch, of the form "MAINT12345" or "RC1". The output
of perl -v doesn't report that a perl isn't an official release, and this
information can get lost in bug reports. Because of this, the minor version
isn't bumped up until RC time, to minimise the possibility of versions of perl
escaping that believe themselves to be newer than they actually are.

It would be useful to find an elegant way to have the "this is an interim
maintenance release" or "this is a release candidate" in the terse -v output,
and have it so that it's easy for the pumpking to remove this just as the
release tarball is rolled up. This way the version pulled out of rsync would
always say "I'm a development release" and it would be safe to bump the
reported minor version as soon as a release ships, which would aid perl
developers.

This task is really about thinking of an elegant way to arrange the C source
such that it's trivial for the Pumpking to flag "this is an official release"
when making a tarball, yet leave the default source saying "I'm not the
official release".

=head2 Profile Perl - am I hot or not?

The Perl source code is stable enough that it makes sense to profile it,
identify and optimise the hotspots. It would be good to measure the
performance of the Perl interpreter using free tools such as cachegrind,
gprof, and dtrace, and work to reduce the bottlenecks they reveal.

As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops,
the ops that are most commonly used. The idea is that by grouping them, their
object code will be adjacent in the executable, so they have a greater chance
of already being in the CPU cache (or swapped in) due to being near another op
already in use.

Except that it's not clear if these really are the most commonly used ops. So
as part of exercising your skills with coverage and profiling tools you might
want to determine what ops I<really> are the most commonly used. And in turn
suggest evictions and promotions to achieve a better F<pp_hot.c>.

One piece of Perl code that might make a good testbed is F<installman>.

=head2 Allocate OPs from arenas

Currently all new OP structures are individually malloc()ed and free()d.
All C<malloc> implementations have space overheads, and are not as fast as
custom allocators, so it would use both less memory and less CPU to allocate
the various OP structures from arenas. The SV arena code can probably be
re-used for this.

Note that Configuring perl with C<-Accflags=-DPL_OP_SLAB_ALLOC> will use
Perl_Slab_alloc() to pack optrees into a contiguous block, which is
probably superior to the use of OP arenas, esp. from a cache locality
standpoint. See L</"Profile Perl - am I hot or not?">.

=head2 Improve win32/wince.c

Currently, numerous functions look virtually, if not completely,
identical in both C<win32/wince.c> and C<win32/win32.c> files, which can't
be good.

=head2 Use secure CRT functions when building with VC8 on Win32

Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the basis
that they were "unsafe" and introduced differently named secure versions of
them as replacements, e.g. instead of writing

    FILE* f = fopen(__FILE__, "r");

one should now write

    FILE* f;
    errno_t err = fopen_s(&f, __FILE__, "r");

Currently, the warnings about these deprecations have been disabled by adding
-D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to remove that
warning suppressant and actually make use of the new secure CRT functions.

There is also a similar issue with POSIX CRT function names like fileno having
been deprecated in favour of ISO C++ conformant names like _fileno. These
warnings are also currently suppressed by adding -D_CRT_NONSTDC_NO_DEPRECATE. It
might be nice to do as Microsoft suggest here too, although, unlike the secure
functions issue, there is presumably little or no benefit in this case.

=head2 Fix POSIX::access() and chdir() on Win32

These functions currently take no account of DACLs and therefore do not behave
correctly in situations where access is restricted by DACLs (as opposed to the
read-only attribute).

Furthermore, POSIX::access() behaves differently for directories having the
read-only attribute set depending on what CRT library is being used. For
example, the _access() function in the VC6 and VC7 CRTs (wrongly) claims that
such directories are not writable, whereas in fact all directories are writable
unless access is denied by DACLs. (In the case of directories, the read-only
attribute actually only means that the directory cannot be deleted.) This CRT
bug is fixed in the VC8 and VC9 CRTs (but, of course, the directory may still
not actually be writable if access is indeed denied by DACLs).

For the chdir() issue, see ActiveState bug #74552:
http://bugs.activestate.com/show_bug.cgi?id=74552

Therefore, DACLs should be checked both for consistency across CRTs and for
the correct answer.

(Note that perl's -w operator should not be modified to check DACLs. It has
been written so that it reflects the state of the read-only attribute, even
for directories (whatever CRT is being used), for symmetry with chmod().)

=head2 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()

Maybe create a utility that checks after each libperl.a creation that
none of the above (nor sprintf(), vsprintf(), or *SHUDDER* gets())
ever creep back to libperl.a.

    nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'

Note, of course, that this will only tell whether B<your> platform
is using those naughty interfaces.
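
If the one-liner were to grow into the utility suggested above, its core
could be a function along these lines. The name C<risky_references> and the
sample input are made up for illustration; the sample merely imitates the
per-member layout that C<nm> prints for an archive.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text printed by nm for an archive, return
# "member symbol" strings for each undefined reference to a risky function.
sub risky_references {
    my ($nm_output) = @_;
    my $member = '';
    my @hits;
    for my $line (split /\n/, $nm_output) {
        my @field = split ' ', $line;
        next unless @field;
        if ($line =~ /:$/) {           # a line such as "av.o:" names a member
            $member = $field[0];
            next;
        }
        next unless @field >= 2;
        push @hits, "$member $field[1]"
            if $field[0] eq 'U'
            && $field[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/;
    }
    return @hits;
}

# Made-up sample input, not real output from any libperl.a.
my $sample = <<'END';

av.o:
0000000000000000 T Perl_av_len
                 U strcpy
                 U memcpy

util.o:
                 U vsprintf
END

print "$_\n" for risky_references($sample);
```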

=head2 -D_FORTIFY_SOURCE=2, -fstack-protector

Recent glibcs support C<-D_FORTIFY_SOURCE=2> and recent gcc
(4.1 onwards?) supports C<-fstack-protector>, both of which give
protection against various kinds of buffer overflow problems.
These should probably be used for compiling Perl whenever available.
Configure and/or hints files should be adjusted to probe for the
availability of these features and enable them as appropriate.

=head2 Arenas for GPs? For MAGIC?

C<struct gp> and C<struct magic> are both currently allocated by C<malloc>.
It might be a speed or memory saving to change to using arenas. Or it might
not. It would need some suitable benchmarking first. In particular, C<GP>s
can probably be changed with minimal compatibility impact (probably nothing
outside of the core, or even outside of F<gv.c> allocates them), but they
probably aren't allocated/deallocated often enough for a speed saving. Whereas
C<MAGIC> is allocated/deallocated more often, but in turn, is also something
more externally visible, so changing the rules here may bite external code.

=head2 Shared arenas

Several SV body structs are now the same size, notably PVMG and PVGV, PVAV and
PVHV, and PVCV and PVFM. It should be possible to allocate and return same
sized bodies from the same actual arena, rather than maintaining one arena for
each. This could save 4-6K per thread, of memory no longer tied up in the
not-yet-allocated part of an arena.

=head1 Tasks that need a knowledge of XS

These tasks would need C knowledge, and roughly the level of knowledge of
the perl API that comes from writing modules that use XS to interface to
C.

=head2 safely supporting POSIX SA_SIGINFO

Some years ago Jarkko supplied patches to provide support for the POSIX
SA_SIGINFO feature in Perl, passing the extra data to the Perl signal handler.

Unfortunately, it only works with "unsafe" signals, because under safe
signals, by the time Perl gets to run the signal handler, the extra
information has been lost. Moreover, it's not easy to store it somewhere,
as you can't use mutexes, or do anything else fancy, from inside a signal
handler.

So it strikes me that we could provide safe SA_SIGINFO support

=over 4

=item 1

Provide global variables for two file descriptors

=item 2

When the first request is made via C<sigaction> for C<SA_SIGINFO>, create a
pipe, store the reader in one, the writer in the other

=item 3

In the "safe" signal handler (C<Perl_csighandler()>/C<S_raise_signal()>), if
the C<siginfo_t> pointer is non-C<NULL>, and the writer file handle is open,

=over 8

=item 1

serialise signal number, C<struct siginfo_t> (or at least the parts we care
about) into a small auto char buff

=item 2

C<write()> that (non-blocking) to the writer fd

=over 12

=item 1

if it writes 100%, flag the signal in a counter of "signals on the pipe" akin
to the current per-signal-number counts

=item 2

if it writes 0%, assume the pipe is full. Flag the data as lost?

=item 3

if it writes partially, croak a panic, as your OS is broken.

=back

=back

=item 4

in the regular C<PERL_ASYNC_CHECK()> processing, if there are "signals on
the pipe", read the data out, deserialise, build the Perl structures on
the stack (code in C<Perl_sighandler()>, the "unsafe" handler), and call as
usual.

=back

I think that this gets us decent C<SA_SIGINFO> support, without the current risk
of running Perl code inside the signal handler context. (With all the dangers
of things like C<malloc> corruption that that currently offers us)
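
The real work would be C inside the interpreter, but the data flow of the
steps above can be modelled in a few lines of Perl. Everything specific in
this sketch is an assumption: the record holds only a signal number, pid and
uid, the pack template is arbitrary, and the function names are invented. No
signal is actually raised, and non-blocking pipes are assumed to behave as on
Unix.

```perl
use strict;
use warnings;
use IO::Handle;

# Fixed-size record: signal number, sending pid, sending uid (assumed fields).
my $TEMPLATE      = 'l L L';
my $RECORD_LENGTH = length pack($TEMPLATE, 0, 0, 0);

pipe(my $reader, my $writer) or die "pipe: $!";
$writer->blocking(0);

# What the signal handler would do: one non-blocking write per signal.
# Returns 1 if the record was queued, 0 if nothing could be written.
sub queue_siginfo {
    my ($signo, $pid, $uid) = @_;
    my $record  = pack($TEMPLATE, $signo, $pid, $uid);
    my $written = syswrite($writer, $record);
    return 0 unless defined $written;            # for example, the pipe is full
    die "panic: partial write to signal pipe" if $written != $RECORD_LENGTH;
    return 1;
}

# What the deferred processing would do: read one record and unpack it.
sub dequeue_siginfo {
    my $got = sysread($reader, my $record, $RECORD_LENGTH);
    die "short read from signal pipe"
        unless defined $got && $got == $RECORD_LENGTH;
    return unpack($TEMPLATE, $record);
}

if (queue_siginfo(10, $$, $<)) {
    my ($signo, $pid, $uid) = dequeue_siginfo();
    print "signal $signo from pid $pid, uid $uid\n";
}
```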

For more information see the thread starting with this message:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html

=head2 autovivification

Make all autovivification consistent w.r.t. LVALUE/RVALUE and strict/no strict.

This task is incremental - even a little bit of work on it will help.
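
One well-known example of the kind of inconsistency meant here, written as
a small script (the variable names are arbitrary): merely reading through a
nested hash creates the intermediate level.

```perl
use strict;
use warnings;

my %lvalue;
$lvalue{a}{b} = 1;                  # lvalue use: creating $lvalue{a} is expected

my %rvalue;
my $value    = $rvalue{a}{b};       # rvalue use: only reading...
my $vivified = exists $rvalue{a};   # ...yet $rvalue{a} now exists

print "reading \$rvalue{a}{b} created \$rvalue{a}\n" if $vivified;
```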
706
707=head2 Unicode in Filenames
708
709chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
710opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
711system, truncate, unlink, utime, -X. All these could potentially accept
712Unicode filenames either as input or output (and in the case of system
713and qx Unicode in general, as input or output to/from the shell).
714Whether a filesystem - an operating system pair understands Unicode in
715filenames varies.
716
717Known combinations that have some level of understanding include
718Microsoft NTFS, Apple HFS+ (In Mac OS 9 and X) and Apple UFS (in Mac
719OS X), NFS v4 is rumored to be Unicode, and of course Plan 9. How to
720create Unicode filenames, what forms of Unicode are accepted and used
721(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
722and so on, varies. Finding the right level of interfacing to Perl
723requires some thought. Remember that an OS does not implicate a
724filesystem.
725
726(The Windows -C command flag "wide API support" has been at least
727temporarily retired in 5.8.1, and the -C has been repurposed, see
728L<perlrun>.)
729
Most probably the right way to do this is the one described in
L</"Virtualize operating system access">.
732
6d71adcd 733=head2 Unicode in %ENV
734
735Currently the %ENV entries are always byte strings.
87a942b1 736See L</"Virtualize operating system access">.
6d71adcd 737
1f2e7916 738=head2 Unicode and glob()
739
740Currently glob patterns and filenames returned from File::Glob::glob()
87a942b1 741are always byte strings. See L</"Virtualize operating system access">.
1f2e7916 742
dbb0c492 743=head2 Unicode and lc/uc operators
744
745Some built-in operators (C<lc>, C<uc>, etc.) behave differently, based on
746what the internal encoding of their argument is. That should not be the
747case. Maybe add a pragma to switch behaviour.
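
A minimal sketch of the problem, assuming a non-EBCDIC platform and no locale
in effect (the variable names are illustrative only). Two strings that compare
equal are lower-cased differently, depending purely on how they are stored
internally:

```perl
use strict;
use warnings;

# U+00C0 (LATIN CAPITAL LETTER A WITH GRAVE), stored as a single byte.
my $native = "\xC0";

# The same character, but stored internally as UTF-8.
my $upgraded = "\xC0";
utf8::upgrade($upgraded);

# The two strings are equal as far as Perl code can tell...
my $same_string = $native eq $upgraded ? 1 : 0;

# ...yet lc() treats them differently: the byte string is left alone,
# while the upgraded string gets Unicode case mapping.
my $lc_native   = lc $native;
my $lc_upgraded = lc $upgraded;
my $same_after  = $lc_native eq $lc_upgraded ? 1 : 0;

print "equal before lc: $same_string\n";
print "equal after lc:  $same_after\n";
```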
748
6d71adcd 749=head2 use less 'memory'
750
Investigate the trade-offs involved in switching perl's choices on memory usage.
In particular, perl should be able to give memory back.
753
754This task is incremental - even a little bit of work on it will help.
755
756=head2 Re-implement C<:unique> in a way that is actually thread-safe
757
758The old implementation made bad assumptions on several levels. A good 90%
759solution might be just to make C<:unique> work to share the string buffer
760of SvPVs. That way large constant strings can be shared between ithreads,
761such as the configuration information in F<Config>.
762
763=head2 Make tainting consistent
764
765Tainting would be easier to use if it didn't take documented shortcuts and
766allow taint to "leak" everywhere within an expression.
767
768=head2 readpipe(LIST)
769
770system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
771running a shell. readpipe() (the function behind qx//) could be similarly
772extended.
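
Until then, the same effect can be approximated in pure Perl with the list form
of a piped C<open>. The sketch below assumes a platform where list-form pipe
opens are implemented; C<readpipe_list> is a hypothetical helper name, not an
existing function:

```perl
use strict;
use warnings;

# Hypothetical helper, not a core function: capture a command's output
# without involving a shell, by using the list form of a piped open.
sub readpipe_list {
    my @command = @_;
    open(my $pipe, '-|', @command)
        or die "Cannot run $command[0]: $!";
    my @lines = <$pipe>;
    close($pipe) or die "Command failed: $command[0] (status $?)";
    return wantarray ? @lines : join('', @lines);
}

# $^X is the running perl, so the example needs no external program.
# The last argument contains shell metacharacters, which are passed
# through untouched because no shell is started.
my $output = readpipe_list($^X, '-e', 'print "$ARGV[0]\n"', 'a; echo b');
print $output;
```

A built-in C<readpipe(LIST)> would make the helper unnecessary, and could
mirror the calling conventions that system() already has.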
773
6d71adcd 774=head2 Audit the code for destruction ordering assumptions
775
776Change 25773 notes
777
778 /* Need to check SvMAGICAL, as during global destruction it may be that
779 AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
780 is now part of the linked list of SV heads, rather than pointing to
781 the original body. */
782 /* FIXME - audit the code for other bugs like this one. */
783
784adding the C<SvMAGICAL> check to
785
786 if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
787 MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);
788
789Go through the core and look for similar assumptions that SVs have particular
790types, as all bets are off during global destruction.
791
749904bf 792=head2 Extend PerlIO and PerlIO::Scalar
793
794PerlIO::Scalar doesn't know how to truncate(). Implementing this
795would require extending the PerlIO vtable.
796
797Similarly the PerlIO vtable doesn't know about formats (write()), or
798about stat(), or chmod()/chown(), utime(), or flock().
799
800(For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
801would mean.)
802
803PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
804opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
805readlink().
806
94da6c29 807See also L</"Virtualize operating system access">.
808
3236f110 809=head2 -C on the #! line
810
811It should be possible to make -C work correctly if found on the #! line,
812given that all perl command line options are strict ASCII, and -C changes
813only the interpretation of non-ASCII characters, and not for the script file
handle. Making it work needs some investigation of the ordering of function
815calls during startup, and (by implication) a bit of tweaking of that order.
816
d6c1e11f 817=head2 Organize error messages
818
819Perl's diagnostics (error messages, see L<perldiag>) could use
a8d0aeb9 820reorganizing and formalizing so that each error message has its
d6c1e11f 821stable-for-all-eternity unique id, categorized by severity, type, and
822subsystem. (The error messages would be listed in a datafile outside
c4bd451b 823of the Perl source code, and the source code would only refer to the
messages by the id.) This clean-up and regularizing should apply
to all croak() messages.
826
827This would enable all sorts of things: easier translation/localization
828of the messages (though please do keep in mind the caveats of
829L<Locale::Maketext> about too straightforward approaches to
830translation), filtering by severity, and instead of grepping for a
831particular error message one could look for a stable error id. (Of
832course, changing the error messages by default would break all the
833existing software depending on some particular error message...)
834
This kind of functionality is known as I<message catalogs>. Look for
inspiration for example in the catgets() system, and possibly even use it
if available, but B<only> if available: B<not> all platforms will
have catgets().
d6c1e11f 839
840For the really pure at heart, consider extending this item to cover
841also the warning messages (see L<perllexwarn>, C<warnings.pl>).
3236f110 842
0bdfc961 843=head1 Tasks that need a knowledge of the interpreter
3298bd4d 844
0bdfc961 845These tasks would need C knowledge, and knowledge of how the interpreter works,
846or a willingness to learn.
3298bd4d 847
718140ec 848=head2 lexicals used only once
849
850This warns:
851
852 $ perl -we '$pie = 42'
853 Name "main::pie" used only once: possible typo at -e line 1.
854
855This does not:
856
857 $ perl -we 'my $pie = 42'
858
859Logically all lexicals used only once should warn, if the user asks for
d6f4ea2e 860warnings. An unworked RT ticket (#5087) has been open for almost seven
861years for this discrepancy.
718140ec 862
a3d15f9a 863=head2 UTF-8 revamp
864
865The handling of Unicode is unclean in many places. For example, the regexp
866engine matches in Unicode semantics whenever the string or the pattern is
867flagged as UTF-8, but that should not be dependent on an internal storage
868detail of the string. Likewise, case folding behaviour is dependent on the
869UTF8 internal flag being on or off.
870
871=head2 Properly Unicode safe tokeniser and pads.
872
873The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
874variable names are stored in stashes as raw bytes, without the utf-8 flag
875set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
876tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
877source filters. All this could be fixed.
878
636e63cb 879=head2 state variable initialization in list context
880
881Currently this is illegal:
882
883 state ($a, $b) = foo();
884
a2874905 885In Perl 6, C<state ($a) = foo();> and C<(state $a) = foo();> have different
a8d0aeb9 886semantics, which is tricky to implement in Perl 5 as currently they produce
a2874905 887the same opcode trees. The Perl 6 design is firm, so it would be good to
a8d0aeb9 888implement the necessary code in Perl 5. There are comments in
a2874905 889C<Perl_newASSIGNOP()> that show the code paths taken by various assignment
890constructions involving state variables.
636e63cb 891
4fedb12c 892=head2 Implement $value ~~ 0 .. $range
893
894It would be nice to extend the syntax of the C<~~> operator to also
895understand numeric (and maybe alphanumeric) ranges.
a393eb28 896
897=head2 A does() built-in
898
899Like ref(), only useful. It would call the C<DOES> method on objects; it
900would also tell whether something can be dereferenced as an
901array/hash/etc., or used as a regexp, etc.
902L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>
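
A rough pure-Perl approximation of the idea, offered only as a sketch: the
name C<does_sketch> and its exact semantics are assumptions made for this
example, and it ignores overloaded dereferencing, which a real built-in would
need to take into account.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical approximation of a does() built-in.
sub does_sketch {
    my ($thing, $what) = @_;
    return 0 unless defined $thing && ref $thing;

    # Objects: ask the class via the DOES method.
    return 1 if blessed($thing) && $thing->DOES($what);

    # Objects and plain references: compare the underlying reference type.
    return reftype($thing) eq $what ? 1 : 0;
}

{
    package Stack;
    sub new { return bless [], shift }
}

my $stack = Stack->new;
print does_sketch($stack, 'Stack'), "\n";   # the object is a Stack
print does_sketch($stack, 'ARRAY'), "\n";   # and is implemented as an array
print does_sketch([],     'HASH'),  "\n";   # a plain array ref is not a hash
```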
903
904=head2 Tied filehandles and write() don't mix
905
906There is no method on tied filehandles to allow them to be called back by
907formats.
4fedb12c 908
d10fc472 909=head2 Attach/detach debugger from running program
1626a787 910
cd793d32 911The old perltodo notes "With C<gdb>, you can attach the debugger to a running
912program if you pass the process ID. It would be good to do this with the Perl
0bdfc961 913debugger on a running Perl program, although I'm not sure how it would be
914done." ssh and screen do this with named pipes in /tmp. Maybe we can too.
1626a787 915
a8cb5b9e 916=head2 Optimize away empty destructors
917
918Defining an empty DESTROY method might be useful (notably in
919AUTOLOAD-enabled classes), but it's still a bit expensive to call. That
920could probably be optimized.
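
The following sketch shows why such empty destructors get written in the first
place. The package names are invented for the example. Without a C<DESTROY>
method, object destruction falls through to C<AUTOLOAD>:

```perl
use strict;
use warnings;

our @autoloaded;    # records every method name that reached AUTOLOAD

{
    package Lazy::Without;
    our $AUTOLOAD;
    sub new { return bless {}, shift }
    sub AUTOLOAD { push @main::autoloaded, $AUTOLOAD; return }
}

{
    package Lazy::With;
    our $AUTOLOAD;
    sub new { return bless {}, shift }
    sub AUTOLOAD { push @main::autoloaded, $AUTOLOAD; return }
    sub DESTROY { }    # empty, but keeps destruction away from AUTOLOAD
}

{ my $obj = Lazy::Without->new; }   # destroyed at the end of the block
{ my $obj = Lazy::With->new; }

print "AUTOLOAD saw: @autoloaded\n";
```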
921
0bdfc961 922=head2 LVALUE functions for lists
923
924The old perltodo notes that lvalue functions don't work for list or hash
925slices. This would be good to fix.
926
927=head2 LVALUE functions in the debugger
928
929The old perltodo notes that lvalue functions don't work in the debugger. This
930would be good to fix.
931
0bdfc961 932=head2 regexp optimiser optional
933
The regexp optimiser is not optional. It should be configurable, to allow
its performance to be measured, and its bugs to be easily demonstrated.
936
02f21748 937=head2 delete &function
938
Allow functions to be deleted. One can already undef them, but they're still
in the stash.
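
A small sketch of the current behaviour (the sub name is arbitrary): after
C<undef>, the sub is no longer defined, but its name is still visible both to
C<exists> and in the stash.

```perl
use strict;
use warnings;

sub doomed { return 42 }

undef &doomed;

# The body is gone, but the name is still known to the symbol table.
my $still_defined = defined &doomed         ? 1 : 0;
my $still_exists  = exists  &doomed         ? 1 : 0;
my $in_stash      = exists  $main::{doomed} ? 1 : 0;

print "defined: $still_defined, exists: $still_exists, in stash: $in_stash\n";
```

Deleting the whole glob with C<delete $main::{doomed}> does remove the name,
but it also takes any scalar, array or hash of the same name with it.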
941
ef36c6a7 942=head2 C</w> regex modifier
943
That flag would enable matching whole words, and also interpolating
arrays as alternations. With it, C</P/w> would be roughly equivalent to:
946
947 do { local $"='|'; /\b(?:P)\b/ }
948
949See L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
950for the discussion.
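
For comparison, here is a runnable version of that hand-written equivalent.
It is only a sketch: C<matches_whole_word> and C<@words> are names invented
for the example, and the words are assumed to contain no regex metacharacters
(otherwise they would need C<quotemeta>).

```perl
use strict;
use warnings;

my @words = qw(cat dog);

# Hand-written equivalent of the proposed /@words/w: join the array with
# "|" while interpolating, and anchor the alternation at word boundaries.
sub matches_whole_word {
    my ($text) = @_;
    local $" = '|';
    return $text =~ /\b(?:@words)\b/ ? 1 : 0;
}

print matches_whole_word('the dog barked'), "\n";   # "dog" is a whole word
print matches_whole_word('concatenate'),    "\n";   # "cat" only occurs inside a word
```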
951
0bdfc961 952=head2 optional optimizer
953
954Make the peephole optimizer optional. Currently it performs two tasks as
955it walks the optree - genuine peephole optimisations, and necessary fixups of
956ops. It would be good to find an efficient way to switch out the
957optimisations whilst keeping the fixups.
958
959=head2 You WANT *how* many
960
Currently contexts are void, scalar and list. split has a special mechanism in
place to pass in the number of return values wanted. It would be useful to
have a general mechanism for this that is backwards compatible and has little
speed hit. This would allow proposals such as short circuiting sort to be
implemented as a module on CPAN.
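
The limitation is easy to see from Perl space. In this sketch (the sub name
C<context> is arbitrary), C<wantarray> reports the same thing whether the
caller wants two values or any number of them:

```perl
use strict;
use warnings;

# Report the context in which the sub was called. wantarray() can only
# distinguish three cases; it cannot say "the caller wants 2 values".
sub context {
    return wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
}

my ($first, $second) = context();   # list context, two values wanted
my @all              = context();   # list context, any number wanted
my $single           = context();   # scalar context

print "two-element list assignment: $first\n";
print "array assignment:            $all[0]\n";
print "scalar assignment:           $single\n";
```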
966
967=head2 lexical aliases
968
Allow lexical aliases (maybe via the syntax C<my \$alias = \$foo>).
970
971=head2 entersub XS vs Perl
972
973At the moment pp_entersub is huge, and has code to deal with entering both
974perl and XS subroutines. Subroutine implementations rarely change between
975perl and XS at run time, so investigate using 2 ops to enter subs (one for
976XS, one for perl) and swap between if a sub is redefined.
2810d901 977
de535794 978=head2 Self-ties
2810d901 979
de535794 980Self-ties are currently illegal because they caused too many segfaults. Maybe
a8d0aeb9 981the causes of these could be tracked down and self-ties on all types
de535794 982reinstated.
0bdfc961 983
984=head2 Optimize away @_
985
986The old perltodo notes "Look at the "reification" code in C<av.c>".
987
87a942b1 988=head2 Virtualize operating system access
989
Implement a set of "vtables" that virtualizes operating system access
(open(), mkdir(), unlink(), readdir(), getenv(), etc.). At the very
least these interfaces should take SVs as "name" arguments instead of
bare char pointers; probably the most flexible and extensible way
would be for the Perl-facing interfaces to accept HVs. The system
needs to be per-operating-system and per-file-system
hookable/filterable, preferably both from XS and Perl level.
(L<perlport/"Files and Filesystems"> is good reading at this point;
in fact, all of L<perlport> is.)
999
This has actually already been implemented (but only for Win32);
take a look at F<iperlsys.h> and F<win32/perlhost.h>. While all Win32
variants go through a set of "vtables" for operating system access,
non-Win32 systems currently go straight for the POSIX/UNIX-style
system/library call. A system similar to the Win32 one should be
implemented for all platforms. The existing Win32 implementation
probably does not need to survive alongside this proposed new
implementation; the approaches could be merged.
87a942b1 1008
1009What would this give us? One often-asked-for feature this would
94da6c29 1010enable is using Unicode for filenames, and other "names" like %ENV,
1011usernames, hostnames, and so forth.
1012(See L<perlunicode/"When Unicode Does Not Happen">.)
1013
But this kind of virtualization would also allow for things like
virtual filesystems, virtual networks, and "sandboxes" (though as long
as dynamic loading of random object code is allowed, not very safe
sandboxes, since external code of course knows nothing of Perl's vtables).
An example of a smaller "sandbox" is that this feature can be used to
implement per-thread working directories: Win32 already does this.
1020
1021See also L</"Extend PerlIO and PerlIO::Scalar">.
87a942b1 1022
ac6197af 1023=head2 Investigate PADTMP hash pessimisation
1024
The peephole optimiser converts constants used for hash key lookups to shared
057163d7 1026hash key scalars. Under ithreads, something is undoing this work.
ac6197af 1027See http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
1028
057163d7 1029=head2 Store the current pad in the OP slab allocator
1030
1031=for clarification
1032I hope that I got that "current pad" part correct
1033
1034Currently we leak ops in various cases of parse failure. I suggested that we
1035could solve this by always using the op slab allocator, and walking it to
1036free ops. Dave comments that as some ops are already freed during optree
1037creation one would have to mark which ops are freed, and not double free them
1038when walking the slab. He notes that one problem with this is that for some ops
1039you have to know which pad was current at the time of allocation, which does
1040change. I suggested storing a pointer to the current pad in the memory allocated
1041for the slab, and swapping to a new slab each time the pad changes. Dave thinks
1042that this would work.
1043
52960e22 1044=head2 repack the optree
1045
1046Repacking the optree after execution order is determined could allow
057163d7 1047removal of NULL ops, and optimal ordering of OPs with respect to cache-line
1048filling. The slab allocator could be reused for this purpose. I think that
1049the best way to do this is to make it an optional step just before the
1050completed optree is attached to anything else, and to use the slab allocator
1051unchanged, so that freeing ops is identical whether or not this step runs.
1052Note that the slab allocator allocates ops downwards in memory, so one would
1053have to actually "allocate" the ops in reverse-execution order to get them
1054contiguous in memory in execution order.
1055
1056See http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html
1057
1058Note that running this copy, and then freeing all the old location ops would
1059cause their slabs to be freed, which would eliminate possible memory wastage if
1060the previous suggestion is implemented, and we swap slabs more frequently.
52960e22 1061
12e06b6f 1062=head2 eliminate incorrect line numbers in warnings
1063
1064This code
1065
1066 use warnings;
1067 my $undef;
1068
1069 if ($undef == 3) {
1070 } elsif ($undef == 0) {
1071 }
1072
18a16cc5 1073used to produce this output:
12e06b6f 1074
1075 Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
1076 Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
1077
18a16cc5 1078where the line of the second warning was misreported - it should be line 5.
1079Rafael fixed this - the problem arose because there was no nextstate OP
1080between the execution of the C<if> and the C<elsif>, hence C<PL_curcop> still
1081reports that the currently executing line is line 4. The solution was to inject
a nextstate OP for each C<elsif>, although it turned out that the nextstate
1083OP needed to be a nulled OP, rather than a live nextstate OP, else other line
1084numbers became misreported. (Jenga!)
12e06b6f 1085
1086The problem is more general than C<elsif> (although the C<elsif> case is the
1087most common and the most confusing). Ideally this code
1088
1089 use warnings;
1090 my $undef;
1091
1092 my $a = $undef + 1;
1093 my $b
1094 = $undef
1095 + 1;
1096
1097would produce this output
1098
1099 Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
1100 Use of uninitialized value $undef in addition (+) at wrong.pl line 7.
1101
1102(rather than lines 4 and 5), but this would seem to require every OP to carry
1103(at least) line number information.
1104
What might work is to have an optional line number in memory just before the
BASEOP structure, with a flag bit in the op to say whether it's present.
Initially during compile every OP would carry its line number. Then add a late
pass to the optimiser (potentially combined with L</repack the optree>) which
looks at the two ops on every edge of the graph of the execution path. If
the line number changes, it flags the destination OP with this information.
Once all paths are traced, replace every flagged op with a
nextstate-light op (that just updates C<PL_curcop>), which in turn then passes
control on to the true op. All ops would then be replaced by variants that
do not store the line number. (Which, logically, is why it would work best in
conjunction with L</repack the optree>, as that is already copying/reallocating
all the OPs.)
1117
(Although I should note that we're not certain that doing this for the general
case is worth it.)
1120
52960e22 1121=head2 optimize tail-calls
1122
Tail-calls present an opportunity for broadly applicable optimization;
anywhere that C<< return foo(...) >> is called, the outer return can
be replaced by a goto, and foo will return directly to the outer
caller, saving (conservatively) 25% of perl's call&return cost, which
is relatively higher than in C. The Scheme language is known to do
this heavily. B::Concise provides good insight into where this
optimization is possible, i.e. anywhere an entersub,leavesub op-sequence
occurs.
1131
1132 perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'
1133
Bottom line on this is probably a new pp_tailcall function which
combines the code in pp_entersub and pp_leavesub. This should probably
be done first in XS, using B::Generate to patch the new OP into the
optrees.
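
The semantics being aimed for can already be written by hand with
C<goto &sub>. The sketch below uses invented sub names, and only illustrates
the frame replacement; it makes no claim about speed. It shows that the
manually tail-called sub runs one frame shallower than the ordinary call:

```perl
use strict;
use warnings;

# Count the sub frames on the call stack above the caller of depth().
sub depth {
    my $frames = 0;
    $frames++ while caller($frames + 1);
    return $frames;
}

sub inner { return depth() }

sub plain_outer { return inner(@_) }   # ordinary call: adds a frame
sub tail_outer  { goto &inner }        # manual tail call: replaces the frame

my $plain = plain_outer();
my $tail  = tail_outer();

print "frames seen via plain call: $plain\n";
print "frames seen via goto &sub:  $tail\n";
```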
1138
0bdfc961 1139=head1 Big projects
1140
Tasks that will get your name mentioned in the description of the "Highlights
of 5.12".
0bdfc961 1143
1144=head2 make ithreads more robust
1145
Generally make ithreads more robust. See also L</iCOW>.
0bdfc961 1147
1148This task is incremental - even a little bit of work on it will help, and
1149will be greatly appreciated.
1150
6c047da7 1151One bit would be to write the missing code in sv.c:Perl_dirp_dup.
1152
59c7f7d5 1153Fix Perl_sv_dup, et al so that threads can return objects.
1154
0bdfc961 1155=head2 iCOW
1156
1157Sarathy and Arthur have a proposal for an improved Copy On Write which
1158specifically will be able to COW new ithreads. If this can be implemented
1159it would be a good thing.
1160
1161=head2 (?{...}) closures in regexps
1162
1163Fix (or rewrite) the implementation of the C</(?{...})/> closures.
1164
1165=head2 A re-entrant regexp engine
1166
1167This will allow the use of a regex from inside (?{ }), (??{ }) and
1168(?(?{ })|) constructs.
6bda09f9 1169
6bda09f9 1170=head2 Add class set operations to regexp engine
1171
Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
1173
1174demerphq has this on his todo list, but right at the bottom.