=head1 NAME

perltodo - Perl TO-DO List

=head1 DESCRIPTION

This is a list of wishes for Perl. The tasks we think are smaller or
easier are listed first. Anyone is welcome to work on any of these,
but it's a good idea to first contact I<perl5-porters@perl.org> to
avoid duplication of effort, and to learn from any previous attempts.
By all means contact a pumpking privately first if you prefer.

Whilst patches to make the list shorter are most welcome, ideas to add to
the list are also encouraged. Check the perl5-porters archives for past
ideas, and any discussion about them. One set of archives may be found at:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
not, but if your patch is incorporated, then we'll add your name to the
F<AUTHORS> file, which ships in the official distribution. How many other
programming languages offer you 1 line of immortality?

=head1 Tasks that only need Perl knowledge

=head2 Remove duplication of test setup.

Schwern notes that there's duplication of code: lots and lots of tests have
some variation on the big block of C<$Is_Foo> checks. We can safely put this
into a file, change it to build an C<%Is> hash and require it. Maybe just put
it into F<test.pl>. Throw in the handy tainting subroutines.
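
A minimal sketch of what the shared helper might look like. The key names,
the C<$^O> values and the C<build_is_hash> function are hypothetical
illustrations, not the core's actual list of platform checks.

```perl
use strict;
use warnings;

# Hypothetical table mapping %Is keys to $^O values. The key names and the
# platform list are illustrative assumptions, not the core's actual checks.
my %os_name_for = (
    VMS     => 'VMS',
    MSWin32 => 'MSWin32',
    NetWare => 'NetWare',
    Cygwin  => 'cygwin',
    Dos     => 'dos',
    OS2     => 'os2',
);

# Build the %Is hash for a given $^O: at most one key is true.
sub build_is_hash {
    my ($osname) = @_;
    return map { ($_ => ($osname eq $os_name_for{$_} ? 1 : 0)) } keys %os_name_for;
}

# What the shared helper (for instance a section of t/test.pl) would provide.
our %Is = build_is_hash($^O);
```

Each test would then check C<$Is{VMS}> and friends instead of recomputing its
own C<$Is_VMS> block, so adding a platform means editing one table.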

=head2 POD -E<gt> HTML conversion in the core still sucks

Which is crazy given just how simple POD purports to be, and how simple HTML
can be. It's not actually I<as> simple as it sounds, particularly with the
flexibility POD allows for C<=item>, but it would be good to improve the
visual appeal of the HTML generated, and to avoid it having any validation
errors. See also L</make HTML install work>, as the layout of the installation
tree is needed to improve the cross-linking.

The addition of C<Pod::Simple> and its related modules may make this task
easier to complete.

=head2 merge checkpods and podchecker

F<pod/checkpods.PL> (and C<make check> in the F<pod/> subdirectory)
implements a very basic check for pod files, but the errors it discovers
aren't found by podchecker. Add this check to podchecker, get rid of
checkpods and have C<make check> use podchecker.

=head2 Parallel testing

(This probably impacts much more than the core: also the Test::Harness
and TAP::* modules on CPAN.)

All of the tests in F<t/> can now be run in parallel, if C<$ENV{TEST_JOBS}>
is set. However, tests within each directory in F<ext> and F<lib> are still
run in series, with directories run in parallel. This is an adequate
heuristic, but it might be possible to relax it further, and get more
throughput. Specifically, it would be good to audit all of F<lib/*.t>, and
make them use C<File::Temp>.
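
As a minimal sketch of the change such an audit would make, assuming a test
that currently writes a fixed file name into the current directory: the
fragment below takes a private scratch directory from File::Temp instead.
C<write_scratch_file> is a hypothetical helper; C<tempdir> with C<CLEANUP> is
File::Temp's own interface.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Each test run gets its own scratch directory, removed automatically at
# exit (CLEANUP => 1), so two tests running at the same time can no longer
# clobber a shared, hard-coded file name.
my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: write $content to $name inside the private directory
# and return the full path.
sub write_scratch_file {
    my ($name, $content) = @_;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} $content;
    close $fh or die "Can't close $path: $!";
    return $path;
}
```

Because every process gets a distinct directory, the same test can run
alongside any other without coordinating file names.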

=head2 Make Schwern poorer

We should have tests for everything. When all the core's modules are tested,
Schwern has promised to donate $500 to TPF. We may need volunteers to
hold him upside down and shake vigorously in order to actually extract the
cash.

=head2 Improve the coverage of the core tests

Use Devel::Cover to ascertain the core modules' test coverage, then add
tests that are currently missing.

=head2 test B

A full test suite for the B module would be nice.

=head2 Deparse inlined constants

Code such as this

    use constant PI => 4;
    warn PI;

will currently deparse as

    use constant ('PI', 4);
    warn 4;

because the tokenizer inlines the value of the constant subroutine C<PI>.
This allows various compile time optimisations, such as constant folding
and dead code elimination. Where these haven't happened (such as the example
above) it ought to be possible to make B::Deparse work out the name of the
original constant, because just enough information survives in the symbol
table to do this. Specifically, the same scalar is used for the constant in
the optree as is used for the constant subroutine, so by iterating over all
symbol tables and generating a mapping of SV address to constant name, it
would be possible to provide B::Deparse with this functionality.

=head2 A decent benchmark

C<perlbench> seems impervious to any recent changes made to the perl core. It
would be useful to have a reasonable general benchmarking suite that roughly
represented what current perl programs do, and measurably reported whether
tweaks to the core improve, degrade or don't really affect performance, to
guide people attempting to optimise the guts of perl. Gisle would welcome
new tests for perlbench.

=head2 fix tainting bugs

Fix the bugs revealed by running the test suite with the C<-t> switch (via
C<make test.taintwarn>).

=head2 Dual life everything

As part of the "dists" plan, anything that doesn't belong in the smallest perl
distribution needs to be dual lifed. Anything else can be too. Figure out what
changes would be needed to package that module and its tests up for CPAN, and
do so. Test it with older perl releases, and fix the problems you find.

To make a minimal perl distribution, it's useful to look at
F<t/lib/commonsense.t>.

=head2 Bundle dual life modules in ext/

For maintenance (and branch merging) reasons, it would be useful to move
some architecture-independent dual-life modules from F<lib/> to F<ext/>, if this
has no negative impact on the build of perl itself.

However, we need to make sure that they are still installed in
architecture-independent directories by C<make install>.

=head2 Improving C<threads::shared>

Investigate whether C<threads::shared> could share aggregates properly with
only Perl level changes to F<shared.pm>.

=head2 POSIX memory footprint

Ilya observed that C<use POSIX;> eats memory like there's no tomorrow, and at
various times worked to cut it down. There is probably still fat to cut out -
for example POSIX passes Exporter some very memory hungry data structures.

=head2 embed.pl/makedef.pl

There is a script F<embed.pl> that generates several header files to prefix
all of Perl's symbols in a consistent way, to provide some semblance of
namespace support in C. Functions are declared in F<embed.fnc>, variables
in F<interpvar.h>. Quite a few of the functions and variables
are conditionally declared there, using C<#ifdef>. However, F<embed.pl>
doesn't understand the C macros, so the rules about which symbols are present
when are duplicated in F<makedef.pl>. Writing things twice is bad, m'kay.
It would be good to teach F<embed.pl> to understand the conditional
compilation, and hence remove the duplication, and the mistakes it has caused.

=head2 use strict; and AutoLoad

Currently if you write

    package Whack;
    use AutoLoader 'AUTOLOAD';
    use strict;
    1;
    __END__
    sub bloop {
        print join (' ', No, strict, here), "!\n";
    }

then C<use strict;> isn't in force within the autoloaded subroutines. It would
be more consistent (and less surprising) to arrange for all lexical pragmas
in force at the C<__END__> marker to be in force within each autoloaded
subroutine.

There's a similar problem with SelfLoader.

=head2 profile installman

The F<installman> script is slow. All it is doing is text processing, which
we're told is something Perl is good at. So it would be nice to know what it is
doing that is taking so much CPU, and where possible address it.


=head1 Tasks that need a little sysadmin-type knowledge

Or if you prefer, tasks that you would learn from, and broaden your skill
base...

=head2 make HTML install work

There is an C<installhtml> target in the Makefile. It's marked as
"experimental". It would be good to get this tested, make it work reliably, and
remove the "experimental" tag. This would include

=over 4

=item 1

Checking that cross linking between various parts of the documentation works.
In particular, check that links work between the modules (files with POD in
F<lib/>) and the core documentation (files in F<pod/>).

=item 2

Work out how to split C<perlfunc> into chunks, preferably one per function
group, preferably with general case code that could be used elsewhere.
Challenges here are correctly identifying the groups of functions that go
together, and making the right named external cross-links point to the right
page. Things to be aware of are C<-X>, groups such as C<getpwnam> to
C<endservent>, two or more C<=items> giving the different parameter lists, such
as

    =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
    =item substr EXPR,OFFSET,LENGTH
    =item substr EXPR,OFFSET

and different parameter lists having different meanings (e.g. C<select>).

=back

=head2 compressed man pages

Be able to install them. This would probably need a configure test to see how
the system does compressed man pages (same directory/different directory?
same filename/different filename), as well as tweaking the F<installman> script
to compress as necessary.

=head2 Add a code coverage target to the Makefile

Make it easy for anyone to run Devel::Cover on the core's tests. The steps
to do this manually are roughly

=over 4

=item *

Do a normal C<Configure>, but include Devel::Cover as a module to install
(see F<INSTALL> for how to do this).

=item *

    make perl

=item *

    cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

=item *

Process the resulting Devel::Cover database.

=back

This just gives you the coverage of the F<.pm>s. To also get the C level
coverage you need to

=over 4

=item *

Additionally tell C<Configure> to use the appropriate C compiler flags for
C<gcov>.

=item *

    make perl.gcov

(instead of C<make perl>)

=item *

After running the tests, run C<gcov> to generate all the F<.gcov> files
(including those down in the subdirectories of F<ext/>).

=item *

(From the top level perl directory) run C<gcov2perl> on all the F<.gcov> files
to get their stats into the cover_db directory.

=item *

Then process the Devel::Cover database.

=back

It would be good to add a single switch to C<Configure> to specify that you
wanted to perform perl level coverage, and another to specify C level
coverage, and have C<Configure> and the F<Makefile> do all the right things
automatically.

=head2 Make Config.pm cope with differences between built and installed perl

Quite often vendors ship a perl binary compiled with their (pay-for)
compilers. People install a free compiler, such as gcc. To work out how to
build extensions, Perl interrogates C<%Config>, so in this situation
C<%Config> describes compilers that aren't there, and extension building
fails. This forces people into choosing between re-compiling perl themselves
using the compiler they have, or only using modules that the vendor ships.

It would be good to find a way to teach C<Config.pm> about the installation
setup, possibly involving probing at install time or later, so that the
C<%Config> in a binary distribution better describes the installed machine,
when the installed machine differs from the build machine in some significant
way.
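
A minimal sketch of the simplest such probe, assuming that checking whether
the compiler named in C<$Config{cc}> exists on this machine is a useful first
step. C<find_program> and C<configured_compiler_available> are hypothetical
helpers; the PATH search is Unix-oriented (it does not try a F<.exe> suffix),
and a real fix would also need to rewrite the affected C<%Config> entries.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: return the full path of $program if it is an
# executable file, searching PATH for bare names; undef otherwise.
sub find_program {
    my ($program) = @_;
    return undef unless defined $program && length $program;

    # A name containing a path is checked directly.
    if (File::Spec->file_name_is_absolute($program) || $program =~ m{/}) {
        return -x $program ? $program : undef;
    }
    # Otherwise search each directory in PATH.
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return undef;
}

# $Config{cc} may carry flags (e.g. "cc -q64"), so only the first word is
# taken as the program name.
sub configured_compiler_available {
    my ($cc) = split ' ', ($Config{cc} || '');
    return defined find_program($cc);
}
```

An installer could run such a check and, when it fails, warn that extension
building will need C<%Config> adjusted for the locally installed compiler.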

=head2 linker specification files

Some platforms mandate that you provide a list of a shared library's external
symbols to the linker, so the core already has the infrastructure in place to
do this for generating shared perl libraries. My understanding is that the
GNU toolchain can accept an optional linker specification file, and restrict
visibility just to symbols declared in that file. It would be good to extend
F<makedef.pl> to support this format, and to provide a means within
C<Configure> to enable it. This would allow Unix users to test that the
export list is correct, and to build a perl that does not pollute the global
namespace with private symbols.

=head2 Cross-compile support

Currently C<Configure> understands the C<-Dusecrosscompile> option. This option
arranges for building C<miniperl> for the TARGET machine; this C<miniperl> is
then assumed to be copied to the TARGET machine and used as a replacement for
the full C<perl> executable.

This could be done a little differently. Namely C<miniperl> should be built for
HOST and then full C<perl> with extensions should be compiled for TARGET.
This, however, might require extra trickery for %Config: we have one config
first for HOST and then another for TARGET. Tools like MakeMaker will be
mightily confused. Having two different types of executables and
libraries (HOST and TARGET) around makes life interesting for Makefiles and
shell (and Perl) scripts. There is $Config{run}, normally empty, which
can be used as an execution wrapper. Also note that in some
cross-compilation/execution environments the HOST and the TARGET do
not see the same filesystem(s), so $Config{run} may need to do some
file/directory copying back and forth.

=head2 roffitall

Make F<pod/roffitall> be updated by F<pod/buildtoc>.

=head1 Tasks that need a little C knowledge

These tasks would need a little C knowledge, but don't need any specific
background or experience with XS, or how the Perl interpreter works.

=head2 Weed out needless PERL_UNUSED_ARG

The C code uses the macro C<PERL_UNUSED_ARG> to stop compilers warning about
unused arguments. Often the arguments can't be removed, as there is an
external constraint that determines the prototype of the function, so this
approach is valid. However, there are some cases where C<PERL_UNUSED_ARG>
could be removed. Specifically:

=over 4

=item *

The prototypes of (nearly all) static functions can be changed.

=item *

Unused arguments generated by short cut macros are wasteful - the short cut
macro used can be changed.

=back

=head2 Modernize the order of directories in @INC

The way @INC is laid out by default, one cannot upgrade core (dual-life)
modules without overwriting files. This causes problems for binary
package builders. One possible proposal is laid out in this
message:
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html>.

=head2 -Duse32bit*

Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
On these systems, it might be the default compilation mode, and there
is currently no guarantee that passing no use64bitall option to the
Configure process will build a 32bit perl. Implementing -Duse32bit*
options would be nice for perl 5.12.

=head2 Make it clear from -v if this is the exact official release

Currently perl from C<p4>/C<rsync> ships with a F<patchlevel.h> file that
usually defines one local patch, of the form "MAINT12345" or "RC1". The output
of perl -v doesn't report that a perl isn't an official release, and this
information can get lost in bug reports. Because of this, the minor version
isn't bumped up until RC time, to minimise the possibility of versions of perl
escaping that believe themselves to be newer than they actually are.

It would be useful to find an elegant way to have the "this is an interim
maintenance release" or "this is a release candidate" in the terse -v output,
and have it so that it's easy for the pumpking to remove this just as the
release tarball is rolled up. This way the version pulled out of rsync would
always say "I'm a development release" and it would be safe to bump the
reported minor version as soon as a release ships, which would aid perl
developers.
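
The decision such a flag would drive can be sketched in a few lines of Perl,
although the real change would live in the C source. This is a minimal sketch
under stated assumptions: C<release_status> and C<version_line> are
hypothetical helpers, and they take the list of local patch names as arguments
rather than reading F<patchlevel.h> themselves.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a build from the names of its local patches.
# The patterns follow the "MAINT12345" / "RC1" forms described above; any
# other local patch is treated as saying nothing about release status.
sub release_status {
    my @patches = @_;
    for my $patch (@patches) {
        return 'release candidate'           if $patch =~ /^RC\d+$/;
        return 'interim maintenance release' if $patch =~ /^MAINT\d+$/;
    }
    return 'official release';
}

# Hypothetical terse -v line: official releases print no extra marker.
sub version_line {
    my ($version, @patches) = @_;
    my $status = release_status(@patches);
    return "This is perl, v$version"
         . ($status eq 'official release' ? '' : " ($status)");
}
```

With this arrangement the marker disappears as soon as the pumpking drops the
"MAINT"/"RC" entry when rolling the tarball, so the default source always
identifies itself as unofficial.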

This task is really about thinking of an elegant way to arrange the C source
such that it's trivial for the pumpking to flag "this is an official release"
when making a tarball, yet leave the default source saying "I'm not the
official release".

=head2 Profile Perl - am I hot or not?

The Perl source code is stable enough that it makes sense to profile it and
identify and optimise the hotspots. It would be good to measure the
performance of the Perl interpreter using free tools such as cachegrind,
gprof, and dtrace, and work to reduce the bottlenecks they reveal.

As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops,
the ops that are most commonly used. The idea is that by grouping them, their
object code will be adjacent in the executable, so they have a greater chance
of already being in the CPU cache (or swapped in) due to being near another op
already in use.

Except that it's not clear if these really are the most commonly used ops. So
as part of exercising your skills with coverage and profiling tools you might
want to determine what ops I<really> are the most commonly used, and in turn
suggest evictions and promotions to achieve a better F<pp_hot.c>.

One piece of Perl code that might make a good testbed is F<installman>.

=head2 Allocate OPs from arenas

Currently all new OP structures are individually malloc()ed and free()d.
All C<malloc> implementations have space overheads, and are not as fast as
custom allocators, so it would use both less memory and less CPU to allocate
the various OP structures from arenas. The SV arena code can probably be
re-used for this.

Note that configuring perl with C<-Accflags=-DPL_OP_SLAB_ALLOC> will use
Perl_Slab_alloc() to pack optrees into a contiguous block, which is
probably superior to the use of OP arenas, esp. from a cache locality
standpoint. See L</"Profile Perl - am I hot or not?">.

=head2 Improve win32/wince.c

Currently, numerous functions look virtually, if not completely,
identical in both F<win32/wince.c> and F<win32/win32.c> files, which can't
be good.

=head2 Use secure CRT functions when building with VC8 on Win32

Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the basis
that they were "unsafe" and introduced differently named secure versions of
them as replacements, e.g. instead of writing

    FILE* f = fopen(__FILE__, "r");

one should now write

    FILE* f;
    errno_t err = fopen_s(&f, __FILE__, "r");

Currently, the warnings about these deprecations have been disabled by adding
C<-D_CRT_SECURE_NO_DEPRECATE> to the CFLAGS. It would be nice to remove that
warning suppressant and actually make use of the new secure CRT functions.

There is also a similar issue with POSIX CRT function names like C<fileno> having
been deprecated in favour of ISO C++ conformant names like C<_fileno>. These
warnings are also currently suppressed by adding C<-D_CRT_NONSTDC_NO_DEPRECATE>. It
might be nice to do as Microsoft suggest here too, although, unlike the secure
functions issue, there is presumably little or no benefit in this case.

=head2 Fix POSIX::access() and chdir() on Win32

These functions currently take no account of DACLs and therefore do not behave
correctly in situations where access is restricted by DACLs (as opposed to the
read-only attribute).

Furthermore, POSIX::access() behaves differently for directories having the
read-only attribute set depending on what CRT library is being used. For
example, the _access() function in the VC6 and VC7 CRTs (wrongly) claims that
such directories are not writable, whereas in fact all directories are writable
unless access is denied by DACLs. (In the case of directories, the read-only
attribute actually only means that the directory cannot be deleted.) This CRT
bug is fixed in the VC8 and VC9 CRTs (but, of course, the directory may still
not actually be writable if access is indeed denied by DACLs).

For the chdir() issue, see ActiveState bug #74552:
http://bugs.activestate.com/show_bug.cgi?id=74552

Therefore, DACLs should be checked both for consistency across CRTs and for
the correct answer.

(Note that perl's -w operator should not be modified to check DACLs. It has
been written so that it reflects the state of the read-only attribute, even
for directories (whatever CRT is being used), for symmetry with chmod().)

=head2 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()

Maybe create a utility that checks after each libperl.a creation that
none of the above (nor, *SHUDDER*, gets()) ever creep back into
libperl.a.

    nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'

Note, of course, that this will only tell whether B<your> platform
is using those naughty interfaces.
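
The one-liner can also be written as a small, testable fragment that a build
step could run after creating F<libperl.a>. This is a sketch:
C<banned_references> is a hypothetical helper, and the optional leading
underscore in the pattern is an assumption to cope with platforms whose C<nm>
prefixes C symbol names with C<_>.

```perl
use strict;
use warnings;

# Banned libc interfaces from the heading above, plus gets(). The optional
# leading underscore is an assumption for nm output on some platforms.
my $banned = qr/^_?(?:strn?c(?:at|py)|v?sprintf|gets)$/;

# Scan lines of "nm libperl.a" output and return "object symbol" pairs for
# every undefined (U) reference to a banned function. Object headers in nm
# output end with a colon, e.g. "sv.o:".
sub banned_references {
    my @lines = @_;
    my $object = '?';
    my @hits;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f;
        if ($line =~ /:$/) {
            ($object = $f[0]) =~ s/:$//;    # remember the current object file
            next;
        }
        push @hits, "$object $f[1]"
            if @f >= 2 && $f[0] eq 'U' && $f[1] =~ $banned;
    }
    return @hits;
}
```

A build step could feed it the lines read from
C<open my $nm, '-|', 'nm', 'libperl.a'> and fail whenever the returned list is
non-empty.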

=head2 -D_FORTIFY_SOURCE=2, -fstack-protector

Recent glibcs support C<-D_FORTIFY_SOURCE=2> and recent gcc
(4.1 onwards?) supports C<-fstack-protector>, both of which give
protection against various kinds of buffer overflow problems.
These should probably be used for compiling Perl whenever available;
Configure and/or hints files should be adjusted to probe for the
availability of these features and enable them as appropriate.

=head2 Arenas for GPs? For MAGIC?

C<struct gp> and C<struct magic> are both currently allocated by C<malloc>.
It might be a speed or memory saving to change to using arenas. Or it might
not. It would need some suitable benchmarking first. In particular, C<GP>s
can probably be changed with minimal compatibility impact (probably nothing
outside of the core, or even outside of F<gv.c> allocates them), but they
probably aren't allocated/deallocated often enough for a speed saving, whereas
C<MAGIC> is allocated/deallocated more often, but in turn, is also something
more externally visible, so changing the rules here may bite external code.

=head2 Shared arenas

Several SV body structs are now the same size, notably PVMG and PVGV, PVAV and
PVHV, and PVCV and PVFM. It should be possible to allocate and return same
sized bodies from the same actual arena, rather than maintaining one arena for
each. This could save 4-6K per thread of memory that is currently tied up in
the not-yet-allocated part of an arena.


=head1 Tasks that need a knowledge of XS

These tasks would need C knowledge, and roughly the level of knowledge of
the perl API that comes from writing modules that use XS to interface to
C.

=head2 safely supporting POSIX SA_SIGINFO

Some years ago Jarkko supplied patches to provide support for the POSIX
SA_SIGINFO feature in Perl, passing the extra data to the Perl signal handler.

Unfortunately, it only works with "unsafe" signals, because under safe
signals, by the time Perl gets to run the signal handler, the extra
information has been lost. Moreover, it's not easy to store it somewhere,
as you can't call mutexes, or do anything else fancy, from inside a signal
handler.

So it strikes me that we could provide safe SA_SIGINFO support as follows:

=over 4

=item 1

Provide global variables for two file descriptors

=item 2

When the first request is made via C<sigaction> for C<SA_SIGINFO>, create a
pipe, and store the reader in one and the writer in the other

=item 3

In the "safe" signal handler (C<Perl_csighandler()>/C<S_raise_signal()>), if
the C<siginfo_t> pointer is non-C<NULL>, and the writer file handle is open,

=over 8

=item 1

serialise the signal number and the C<siginfo_t> (or at least the parts we
care about) into a small automatic char buffer

=item 2

C<write()> that (non-blocking) to the writer fd

=over 12

=item 1

if it writes 100%, flag the signal in a counter of "signals on the pipe" akin
to the current per-signal-number counts

=item 2

if it writes 0%, assume the pipe is full. Flag the data as lost?

=item 3

if it writes partially, croak a panic, as your OS is broken.

=back

=back

=item 4

in the regular C<PERL_ASYNC_CHECK()> processing, if there are "signals on
the pipe", read the data out, deserialise, build the Perl structures on
the stack (code in C<Perl_sighandler()>, the "unsafe" handler), and call as
usual.

=back

I think that this gets us decent C<SA_SIGINFO> support, without the current risk
of running Perl code inside the signal handler context (with all the dangers
of things like C<malloc> corruption that that currently offers us).

For more information see the thread starting with this message:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html

=head2 autovivification

Make all autovivification consistent w.r.t. LVALUE/RVALUE and strict/no strict.

This task is incremental - even a little bit of work on it will help.

=head2 Unicode in Filenames

chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
system, truncate, unlink, utime, -X. All these could potentially accept
Unicode filenames either as input or output (and in the case of system
and qx Unicode in general, as input or output to/from the shell).
Whether a given filesystem and operating system pair understands Unicode in
filenames varies.

Known combinations that have some level of understanding include
Microsoft NTFS, Apple HFS+ (in Mac OS 9 and X) and Apple UFS (in Mac
OS X); NFS v4 is rumored to support Unicode, and of course Plan 9 does. How to
create Unicode filenames, what forms of Unicode are accepted and used
(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
and so on, varies. Finding the right level of interfacing to Perl
requires some thought. Remember that an OS does not imply a
filesystem.

(The Windows -C command flag "wide API support" has been at least
temporarily retired in 5.8.1, and the -C has been repurposed, see
L<perlrun>.)

Most probably the right way to do this would be this:
L</"Virtualize operating system access">.

=head2 Unicode in %ENV

Currently the %ENV entries are always byte strings.
See L</"Virtualize operating system access">.

=head2 Unicode and glob()

Currently glob patterns and filenames returned from File::Glob::glob()
are always byte strings. See L</"Virtualize operating system access">.

=head2 Unicode and lc/uc operators

Some built-in operators (C<lc>, C<uc>, etc.) behave differently, based on
what the internal encoding of their argument is. That should not be the
case. Maybe add a pragma to switch behaviour.
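
The following snippet shows the current behaviour: two strings that compare
C<eq> stop being equal after C<lc>, purely because one of them is stored
internally as UTF-8. It deliberately does not enable
C<use feature 'unicode_strings'>, the switch that perls from 5.12 onward
provide for this.

```perl
use strict;
use warnings;

# Two strings holding the single character U+00C0 (LATIN CAPITAL LETTER A
# WITH GRAVE). Only their internal storage differs.
my $bytes    = "\xC0";
my $upgraded = "\xC0";
utf8::upgrade($upgraded);    # same string, now stored as UTF-8 internally

# Without "use feature 'unicode_strings'" (and outside any locale), lc()
# uses ASCII-only rules on the non-UTF-8 copy but Unicode rules on the other.
my $lc_bytes    = lc $bytes;       # "\xC0", unchanged
my $lc_upgraded = lc $upgraded;    # "\xE0", LATIN SMALL LETTER A WITH GRAVE

printf "equal before lc: %s\n", ($bytes    eq $upgraded)    ? 'yes' : 'no';
printf "equal after lc:  %s\n", ($lc_bytes eq $lc_upgraded) ? 'yes' : 'no';
```

The same string value yields two different results, which is exactly the
dependence on an internal storage detail described above.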

=head2 use less 'memory'

Investigate trade-offs to switch out perl's choices on memory usage.
In particular, perl should be able to give memory back.

This task is incremental - even a little bit of work on it will help.

=head2 Re-implement C<:unique> in a way that is actually thread-safe

The old implementation made bad assumptions on several levels. A good 90%
solution might be just to make C<:unique> work to share the string buffer
of SvPVs. That way large constant strings can be shared between ithreads,
such as the configuration information in F<Config>.

=head2 Make tainting consistent

Tainting would be easier to use if it didn't take documented shortcuts and
allow taint to "leak" everywhere within an expression.

=head2 readpipe(LIST)

system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
running a shell. readpipe() (the function behind qx//) could be similarly
extended.
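
A minimal pure-Perl sketch of the behaviour such an extension might offer.
C<readpipe_list> is a hypothetical name, not a core function; it relies on the
list form of piped C<open>, which likewise runs the program without a shell on
platforms that support it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed readpipe(LIST): run a program with
# an explicit argument list (no shell, so no quoting or interpolation
# surprises) and return its output the way qx// does.
sub readpipe_list {
    my @command = @_;
    die "readpipe_list: no command given\n" unless @command;

    # open FH, '-|', LIST execs the program directly, the same way
    # system(LIST) avoids the shell.
    open my $fh, '-|', @command
        or die "Can't run $command[0]: $!\n";
    my @output = <$fh>;
    close $fh;    # sets $? to the child's exit status, as qx// does

    return wantarray ? @output : join '', @output;
}
```

For example, C<readpipe_list('ls', '-l', 'file with spaces')> passes the file
name as a single argument, with no need to protect the spaces from a shell.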

=head2 Audit the code for destruction ordering assumptions

Change 25773 notes

    /* Need to check SvMAGICAL, as during global destruction it may be that
       AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
       is now part of the linked list of SV heads, rather than pointing to
       the original body. */
    /* FIXME - audit the code for other bugs like this one. */

adding the C<SvMAGICAL> check to

    if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
        MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);

Go through the core and look for similar assumptions that SVs have particular
types, as all bets are off during global destruction.

=head2 Extend PerlIO and PerlIO::Scalar

PerlIO::Scalar doesn't know how to truncate(). Implementing this
would require extending the PerlIO vtable.

Similarly the PerlIO vtable doesn't know about formats (write()), or
about stat(), or chmod()/chown(), utime(), or flock().

(For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
would mean.)

PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
readlink().

See also L</"Virtualize operating system access">.

=head2 -C on the #! line

It should be possible to make -C work correctly if found on the #! line,
given that all perl command line options are strict ASCII, and -C changes
only the interpretation of non-ASCII characters, not that of the script file
handle. Making it work needs some investigation of the ordering of function
calls during startup, and (by implication) a bit of tweaking of that order.

=head2 Organize error messages

Perl's diagnostics (error messages, see L<perldiag>) could use
reorganizing and formalizing so that each error message has its
stable-for-all-eternity unique id, categorized by severity, type, and
subsystem. (The error messages would be listed in a datafile outside
of the Perl source code, and the source code would only refer to the
messages by the id.) This clean-up and regularizing should apply
to all croak() messages.

This would enable all sorts of things: easier translation/localization
of the messages (though please do keep in mind the caveats of
L<Locale::Maketext> about too straightforward approaches to
translation), filtering by severity, and instead of grepping for a
particular error message one could look for a stable error id. (Of
course, changing the error messages by default would break all the
existing software depending on some particular error message...)
745
746This kind of functionality is known as I<message catalogs>. Look for
747inspiration for example in the catgets() system, possibly even use it
748if available-- but B<only> if available, all platforms will B<not>
de96509d 749have catgets().
d6c1e11f 750
751For the really pure at heart, consider extending this item to cover
752also the warning messages (see L<perllexwarn>, C<warnings.pl>).
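A minimal sketch of the message-catalog idea in C, assuming a compiled-in table; in the proposal the entries would live in a data file outside the source. The ids, the C<Severity> levels and the message texts below are invented for illustration and do not correspond to any existing numbering in L<perldiag>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { SEV_WARNING, SEV_ERROR, SEV_FATAL } Severity;

/* Hypothetical catalog entry: a stable id plus its classification. */
typedef struct {
    unsigned    id;         /* stable for all eternity; never reused */
    Severity    severity;
    const char *subsystem;
    const char *text;       /* default (English) printf-style template */
} Message;

/* Invented example entries; ids and wording are illustrative only. */
static const Message catalog[] = {
    { 1001, SEV_FATAL,   "core",    "Can't locate object method \"%s\"" },
    { 2001, SEV_WARNING, "numeric", "Use of uninitialized value in %s" },
};

static const Message *msg_lookup(unsigned id) {
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
        if (catalog[i].id == id)
            return &catalog[i];
    return NULL;
}

/* The source code refers to messages only by id; a localized catalog
 * could supply a different template for the same id. */
static int msg_format(char *out, size_t outlen, unsigned id, const char *arg) {
    const Message *m = msg_lookup(id);
    assert(out != NULL && outlen > 0);
    if (m == NULL)
        return snprintf(out, outlen, "Unknown message id %u", id);
    return snprintf(out, outlen, m->text, arg);
}

/* Filtering by severity becomes a comparison on the catalog entry. */
static int msg_is_at_least(unsigned id, Severity min) {
    const Message *m = msg_lookup(id);
    return m != NULL && m->severity >= min;
}
```

A localized catalog would map the same ids to translated templates, which is exactly where the L<Locale::Maketext> caveats mentioned above start to matter.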

=head1 Tasks that need a knowledge of the interpreter

These tasks would need C knowledge, and knowledge of how the interpreter works,
or a willingness to learn.

=head2 lexicals used only once

This warns:

    $ perl -we '$pie = 42'
    Name "main::pie" used only once: possible typo at -e line 1.

This does not:

    $ perl -we 'my $pie = 42'

Logically, all lexicals used only once should warn, if the user asks for
warnings. An RT ticket (#5087) about this discrepancy has been open, unworked,
for almost seven years.
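One possible shape for the check, sketched in C: count every mention of a lexical while compiling its scope, and warn at scope exit for any that were mentioned only in their declaration. The C<Pad>/C<PadEntry> structures and the warning text are hypothetical; the real pad stores names quite differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PAD 16

/* Hypothetical, simplified pad entry: only tracks mentions of a name. */
typedef struct {
    const char *name;
    unsigned    mentions;   /* the declaration counts as the first mention */
    unsigned    line;       /* line of the declaration */
} PadEntry;

typedef struct {
    PadEntry entries[MAX_PAD];
    int      count;
} Pad;

/* Called when the tokeniser sees "my $name". */
static void pad_declare(Pad *pad, const char *name, unsigned line) {
    assert(pad->count < MAX_PAD);
    pad->entries[pad->count++] = (PadEntry){ name, 1, line };
}

/* Called each time a later occurrence of a name resolves to the pad;
 * searching backwards makes the innermost declaration win. */
static void pad_mention(Pad *pad, const char *name) {
    for (int i = pad->count - 1; i >= 0; i--)
        if (strcmp(pad->entries[i].name, name) == 0) {
            pad->entries[i].mentions++;
            return;
        }
}

/* At scope exit, report lexicals never used after their declaration.
 * Returns the number of warnings emitted. */
static int pad_warn_once(const Pad *pad, FILE *out) {
    int warned = 0;
    for (int i = 0; i < pad->count; i++)
        if (pad->entries[i].mentions == 1) {
            fprintf(out, "Lexical \"%s\" used only once: possible typo at line %u.\n",
                    pad->entries[i].name, pad->entries[i].line);
            warned++;
        }
    return warned;
}
```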

=head2 UTF-8 revamp

The handling of Unicode is unclean in many places. For example, the regexp
engine matches in Unicode semantics whenever the string or the pattern is
flagged as UTF-8, but that should not be dependent on an internal storage
detail of the string. Likewise, case folding behaviour is dependent on the
UTF8 internal flag being on or off.

=head2 Properly Unicode safe tokeniser and pads.

The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
variable names are stored in stashes as raw bytes, without the UTF-8 flag
set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
source filters. All this could be fixed.

=head2 state variable initialization in list context

Currently this is illegal:

    state ($a, $b) = foo();

In Perl 6, C<state ($a) = foo();> and C<(state $a) = foo();> have different
semantics, which is tricky to implement in Perl 5 as currently they produce
the same opcode trees. The Perl 6 design is firm, so it would be good to
implement the necessary code in Perl 5. There are comments in
C<Perl_newASSIGNOP()> that show the code paths taken by various assignment
constructions involving state variables.

=head2 Implement $value ~~ 0 .. $range

It would be nice to extend the syntax of the C<~~> operator to also
understand numeric (and maybe alphanumeric) ranges.

=head2 A does() built-in

Like ref(), only useful. It would call the C<DOES> method on objects; it
would also tell whether something can be dereferenced as an
array/hash/etc., or used as a regexp, etc.
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>

=head2 Tied filehandles and write() don't mix

There is no method on tied filehandles to allow them to be called back by
formats.

=head2 Attach/detach debugger from running program

The old perltodo notes "With C<gdb>, you can attach the debugger to a running
program if you pass the process ID. It would be good to do this with the Perl
0bdfc961 824debugger on a running Perl program, although I'm not sure how it would be
825done." ssh and screen do this with named pipes in /tmp. Maybe we can too.
1626a787 826
a8cb5b9e 827=head2 Optimize away empty destructors
828
829Defining an empty DESTROY method might be useful (notably in
830AUTOLOAD-enabled classes), but it's still a bit expensive to call. That
831could probably be optimized.
832
0bdfc961 833=head2 LVALUE functions for lists
834
835The old perltodo notes that lvalue functions don't work for list or hash
836slices. This would be good to fix.
837
838=head2 LVALUE functions in the debugger
839
840The old perltodo notes that lvalue functions don't work in the debugger. This
841would be good to fix.
842
0bdfc961 843=head2 regexp optimiser optional
844
845The regexp optimiser is not optional. It should configurable to be, to allow
846its performance to be measured, and its bugs to be easily demonstrated.
847
02f21748 848=head2 delete &function
849
850Allow to delete functions. One can already undef them, but they're still
851in the stash.
852
ef36c6a7 853=head2 C</w> regex modifier
854
855That flag would enable to match whole words, and also to interpolate
856arrays as alternations. With it, C</P/w> would be roughly equivalent to:
857
858 do { local $"='|'; /\b(?:P)\b/ }
859
860See L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
861for the discussion.
862
0bdfc961 863=head2 optional optimizer
864
865Make the peephole optimizer optional. Currently it performs two tasks as
866it walks the optree - genuine peephole optimisations, and necessary fixups of
867ops. It would be good to find an efficient way to switch out the
868optimisations whilst keeping the fixups.
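A minimal C sketch of that split, assuming a hypothetical linear chain of ops: the fixup (here just setting a flag, standing in for whatever later stages rely on) always runs, while the optimisation (skipping null ops in the execution chain) is gated by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op carrying only what this sketch needs. */
typedef enum { OPT_NULL, OPT_CONST, OPT_ADD, OPT_PRINT } OpType;
typedef struct Op {
    OpType     type;
    struct Op *next;        /* execution order */
    bool       fixed_up;
} Op;

/* One walk, two concerns: the fixup always runs because later stages
 * depend on it, while the optimisation runs only if 'optimise' is set,
 * so it can be switched off to measure it or to isolate a bug. */
static void peep(Op *start, bool optimise) {
    assert(start != NULL);
    for (Op *o = start; o != NULL; o = o->next) {
        o->fixed_up = true;              /* stand-in for a required fixup */
        if (!optimise)
            continue;
        /* Optimisation: unlink null ops from the execution chain.
         * (A leading null op is kept, to keep the sketch short.) */
        while (o->next != NULL && o->next->type == OPT_NULL)
            o->next = o->next->next;
    }
}

static size_t chain_length(const Op *o) {
    size_t n = 0;
    for (; o != NULL; o = o->next)
        n++;
    return n;
}
```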

=head2 You WANT *how* many

Currently contexts are void, scalar and list. split has a special mechanism in
place to pass in the number of return values wanted. It would be useful to
have a general mechanism for this, one that is backwards compatible and has
little speed hit. This would allow proposals such as short-circuiting sort to
be implemented as a module on CPAN.
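A minimal C sketch of what such a mechanism might look like, assuming a hypothetical C<Context> that carries an optional count next to the usual void/scalar/list choice. The short-circuiting sort uses plain selection sort so that a caller wanting one value (as in C<my ($min) = sort { $a <=> $b } @list>) would pay for a single pass instead of a full sort.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical calling context: the existing void/scalar/list choice
 * plus an optional count of values the caller will actually use.
 * want == 0 means "unknown, produce everything", so callers that do
 * not know about the new field keep today's behaviour. */
typedef enum { WANT_VOID, WANT_SCALAR, WANT_LIST } Gimme;
typedef struct {
    Gimme  gimme;
    size_t want;
} Context;

/* A sort that honours ctx->want: with want == k it performs only k
 * passes of selection sort, so the k smallest values end up, in order,
 * at the front of v. Returns how many leading values are sorted.
 * (Scalar context is treated like list here; Perl leaves sort in
 * scalar context undefined.) */
static size_t sort_for_context(int *v, size_t n, const Context *ctx) {
    size_t limit = n;

    assert(ctx != NULL);
    if (ctx->gimme == WANT_VOID)
        return 0;                        /* nobody looks at the result */
    if (ctx->want != 0 && ctx->want < n)
        limit = ctx->want;               /* short-circuit */

    for (size_t i = 0; i < limit; i++) {
        size_t min = i;
        for (size_t j = i + 1; j < n; j++)
            if (v[j] < v[min])
                min = j;
        int tmp = v[i];
        v[i] = v[min];
        v[min] = tmp;
    }
    return limit;
}
```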

=head2 lexical aliases

Allow lexical aliases (maybe via the syntax C<my \$alias = \$foo>).

=head2 entersub XS vs Perl

At the moment pp_entersub is huge, and has code to deal with entering both
perl and XS subroutines. Subroutine implementations rarely change between
perl and XS at run time, so investigate using 2 ops to enter subs (one for
XS, one for perl) and swapping between them if a sub is redefined.
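Here is a sketch of the two-op idea in C, with hypothetical stand-ins for C<CV> and C<OP>. Rather than swapping ops at redefinition time, which would require finding every op that calls the sub, this variant swaps lazily: each specialised handler performs a cheap guard check and falls back to a generic handler that installs the right one. That lazy swap is a substituted technique (similar to an inline cache), not something the text above proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-ins for CV and OP. */
typedef struct CV {
    bool is_xs;
    int  (*xsub)(int arg);   /* used when is_xs */
    int  body_result;        /* stand-in for running a Perl body */
} CV;

typedef struct OP OP;
typedef int (*pp_fn)(OP *op, int arg);
struct OP {
    pp_fn ppaddr;            /* handler currently installed in this op */
    CV  **target;            /* slot holding the sub, like a glob's CV slot */
};

static int pp_entersub_perl(OP *op, int arg);
static int pp_entersub_xs(OP *op, int arg);

/* Generic entry: install the handler matching what the sub is now. */
static int pp_entersub_generic(OP *op, int arg) {
    assert(op->target != NULL && *op->target != NULL);
    op->ppaddr = (*op->target)->is_xs ? pp_entersub_xs : pp_entersub_perl;
    return op->ppaddr(op, arg);
}

/* XS-only path: a cheap guard catches a sub redefined in Perl. */
static int pp_entersub_xs(OP *op, int arg) {
    CV *cv = *op->target;
    if (!cv->is_xs)
        return pp_entersub_generic(op, arg);
    return cv->xsub(arg);
}

/* Perl-only path: real code would push a context and run the optree. */
static int pp_entersub_perl(OP *op, int arg) {
    CV *cv = *op->target;
    if (cv->is_xs)
        return pp_entersub_generic(op, arg);
    (void)arg;
    return cv->body_result;
}

/* Example XS body, used when the sub is "redefined" as XS. */
static int xs_double(int x) { return 2 * x; }
```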

=head2 Self-ties

Self-ties are currently illegal because they caused too many segfaults. Maybe
the causes of these could be tracked down and self-ties on all types
reinstated.

=head2 Optimize away @_

The old perltodo notes "Look at the 'reification' code in C<av.c>".

=head2 Virtualize operating system access

Implement a set of "vtables" that virtualizes operating system access
(open(), mkdir(), unlink(), readdir(), getenv(), etc.). At the very
least these interfaces should take SVs as "name" arguments instead of
bare char pointers; probably the most flexible and extensible way
would be for the Perl-facing interfaces to accept HVs. The system
needs to be per-operating-system and per-file-system
hookable/filterable, preferably both from XS and Perl level
(L<perlport/"Files and Filesystems"> is good reading at this point;
in fact, all of L<perlport> is.)

This has actually already been implemented (but only for Win32);
take a look at F<iperlsys.h> and F<win32/perlhost.h>. While all Win32
variants go through a set of "vtables" for operating system access,
non-Win32 systems currently go straight for the POSIX/UNIX-style
system/library call. A similar system should be
implemented for all platforms. The existing Win32 implementation
probably does not need to survive alongside this proposed new
implementation; the approaches could be merged.

What would this give us? One often-asked-for feature this would
enable is using Unicode for filenames, and other "names" like %ENV,
usernames, hostnames, and so forth.
(See L<perlunicode/"When Unicode Does Not Happen">.)

But this kind of virtualization would also allow for things like
virtual filesystems, virtual networks, and "sandboxes" (though as long
as dynamic loading of random object code is allowed, not very safe
sandboxes, since external code of course knows nothing of Perl's vtables).
An example of a smaller "sandbox" is that this feature can be used to
implement per-thread working directories: Win32 already does this.
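A minimal C sketch of the vtable approach, in the spirit of F<iperlsys.h> but not its actual interface. C<OsFuncs>, C<Host> and the functions below are hypothetical; names are passed as pointer/length pairs as a stand-in for SVs. A sandbox layer filters paths and delegates to the layer beneath it; the base layer here only records calls, so the example does not touch the filesystem.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical OS-access vtable. Names are (pointer, length) pairs as a
 * stand-in for SVs, so they need not be NUL-terminated native bytes. */
typedef struct Host Host;
typedef struct {
    int (*Mkdir)(Host *h, const char *name, size_t len, int mode);
} OsFuncs;

struct Host {
    const OsFuncs *tab;
    Host          *next;    /* layer to delegate to, if any */
    const char    *prefix;  /* sandbox layer: allowed path prefix */
    int            calls;   /* base layer: number of calls seen */
};

/* Base layer for this sketch: records calls instead of creating
 * directories. A real one would convert the name and call mkdir(2). */
static int base_mkdir(Host *h, const char *name, size_t len, int mode) {
    (void)name; (void)len; (void)mode;
    h->calls++;
    return 0;
}

/* Sandbox layer: refuse names outside h->prefix, else delegate.
 * A real sandbox would canonicalise the name (e.g. "..") first. */
static int sandbox_mkdir(Host *h, const char *name, size_t len, int mode) {
    size_t plen = strlen(h->prefix);
    if (len < plen || memcmp(name, h->prefix, plen) != 0) {
        errno = EACCES;
        return -1;
    }
    return h->next->tab->Mkdir(h->next, name, len, mode);
}

static const OsFuncs base_funcs    = { base_mkdir };
static const OsFuncs sandbox_funcs = { sandbox_mkdir };

/* What the interpreter would call instead of mkdir() directly. */
static int host_mkdir(Host *h, const char *name, size_t len, int mode) {
    assert(h != NULL && h->tab != NULL);
    return h->tab->Mkdir(h, name, len, mode);
}
```

A per-thread working directory could be a similar layer that rewrites relative names before delegating.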

See also L</"Extend PerlIO and PerlIO::Scalar">.

=head2 Investigate PADTMP hash pessimisation

The peephole optimiser converts constants used for hash key lookups to shared
hash key scalars. Under ithreads, something is undoing this work.
See http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html

=head2 Store the current pad in the OP slab allocator

=for clarification
I hope that I got that "current pad" part correct

Currently we leak ops in various cases of parse failure. I suggested that we
could solve this by always using the op slab allocator, and walking it to
free ops. Dave comments that as some ops are already freed during optree
creation one would have to mark which ops are freed, and not double free them
when walking the slab. He notes that one problem with this is that for some ops
you have to know which pad was current at the time of allocation, which does
change. I suggested storing a pointer to the current pad in the memory allocated
for the slab, and swapping to a new slab each time the pad changes. Dave thinks
that this would work.
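A minimal C sketch of the suggestion, under simplifying assumptions: fixed-size C<Op>s, a hypothetical C<Pad> type, and a counter standing in for releasing an op's pad entries. Each slab records the pad that was current when it was opened, a new slab is started whenever the pad changes, and a C<freed> flag prevents double frees when the slabs are walked after a parse failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SLAB_OPS 4

/* Hypothetical stand-ins: real pads and ops are far richer. */
typedef struct Pad { int id; } Pad;
typedef struct Op  { bool freed; } Op;

/* Each slab remembers the pad that was current when it was opened. */
typedef struct Slab {
    struct Slab *prev;
    Pad         *pad;
    int          used;
    Op           ops[SLAB_OPS];
} Slab;

typedef struct {
    Slab *cur;
    int   pad_releases;    /* counts "release this op's pad entries" */
} Allocator;

static Op *op_alloc(Allocator *a, Pad *curpad) {
    assert(curpad != NULL);
    /* Open a new slab when the current one is full or the pad changed. */
    if (a->cur == NULL || a->cur->used == SLAB_OPS || a->cur->pad != curpad) {
        Slab *s = calloc(1, sizeof *s);
        if (s == NULL)
            return NULL;
        s->prev = a->cur;
        s->pad  = curpad;
        a->cur  = s;
    }
    return &a->cur->ops[a->cur->used++];
}

/* Freeing an op marks it, so a later slab walk will not free it twice.
 * Real code would release the op's entries in 'pad' here. */
static void op_free(Allocator *a, Op *o, Pad *pad) {
    (void)pad;
    if (o->freed)
        return;
    o->freed = true;
    a->pad_releases++;
}

/* After a parse failure: free every op not already freed, against the
 * pad recorded in its slab, then release the slabs themselves. */
static void slabs_cleanup(Allocator *a) {
    while (a->cur != NULL) {
        Slab *s = a->cur;
        for (int i = 0; i < s->used; i++)
            op_free(a, &s->ops[i], s->pad);
        a->cur = s->prev;
        free(s);
    }
}
```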

=head2 repack the optree

Repacking the optree after execution order is determined could allow
removal of NULL ops, and optimal ordering of OPs with respect to cache-line
filling. The slab allocator could be reused for this purpose. I think that
the best way to do this is to make it an optional step just before the
completed optree is attached to anything else, and to use the slab allocator
unchanged, so that freeing ops is identical whether or not this step runs.
Note that the slab allocator allocates ops downwards in memory, so one would
have to actually "allocate" the ops in reverse-execution order to get them
contiguous in memory in execution order.

See http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html

Note that running this copy, and then freeing all the ops at their old
locations, would cause their slabs to be freed, which would eliminate possible
memory wastage if the previous suggestion is implemented, and we swap slabs
more frequently.
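To make the reverse-order allocation concrete, here is a C sketch with a hypothetical downward-growing slab of fixed-size ops (the real slab allocator deals with ops of different sizes). Walking the execution chain first and then allocating from the last op to the first leaves the copies contiguous and ascending in execution order, with NULL ops (type 0 here) dropped along the way.

```c
#include <assert.h>
#include <stddef.h>

#define SLAB_SLOTS 16

/* Hypothetical fixed-size op; type 0 plays the role of a NULL op. */
typedef struct Op {
    int        type;
    struct Op *next;     /* execution order */
} Op;

/* Hypothetical slab that hands out slots from the top down. */
typedef struct {
    Op     slots[SLAB_SLOTS];
    size_t top;          /* index one past the next free slot */
} DownSlab;

static Op *slab_alloc_down(DownSlab *s) {
    assert(s->top > 0);
    return &s->slots[--s->top];
}

/* Copy the execution chain into the slab, dropping NULL ops. Because the
 * slab grows downwards, the copies are allocated last-op-first, which
 * leaves them contiguous and ascending in execution order. */
static Op *repack(DownSlab *s, const Op *start) {
    const Op *order[SLAB_SLOTS];
    size_t n = 0;

    for (const Op *o = start; o != NULL; o = o->next)
        if (o->type != 0) {
            assert(n < SLAB_SLOTS);
            order[n++] = o;
        }

    Op *following = NULL;            /* copy of the next op to run */
    for (size_t i = n; i-- > 0; ) {
        Op *copy = slab_alloc_down(s);
        copy->type = order[i]->type;
        copy->next = following;
        following  = copy;
    }
    return following;                /* copy of the first live op */
}
```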

=head2 eliminate incorrect line numbers in warnings

This code

    use warnings;
    my $undef;

    if ($undef == 3) {
    } elsif ($undef == 0) {
    }

used to produce this output:

    Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
    Use of uninitialized value in numeric eq (==) at wrong.pl line 4.

where the line of the second warning was misreported - it should be line 5.
Rafael fixed this - the problem arose because there was no nextstate OP
between the execution of the C<if> and the C<elsif>, hence C<PL_curcop> still
reported that the currently executing line was line 4. The solution was to
inject a nextstate OP for each C<elsif>, although it turned out that the
nextstate OP needed to be a nulled OP, rather than a live nextstate OP, else
other line numbers became misreported. (Jenga!)

The problem is more general than C<elsif> (although the C<elsif> case is the
most common and the most confusing). Ideally this code

    use warnings;
    my $undef;

    my $a = $undef + 1;
    my $b
      = $undef
      + 1;

would produce this output:

    Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
    Use of uninitialized value $undef in addition (+) at wrong.pl line 7.

(rather than lines 4 and 5), but this would seem to require every OP to carry
(at least) line number information.

What might work is to have an optional line number in memory just before the
BASEOP structure, with a flag bit in the op to say whether it's present.
Initially, during compilation, every OP would carry its line number. Then add a
late pass to the optimiser (potentially combined with L</repack the optree>)
which looks at the two ops on every edge of the graph of the execution path. If
the line number changes, it flags the destination OP with this information.
Once all paths are traced, replace every op with the flag with a
nextstate-light op (that just updates C<PL_curcop>), which in turn then passes
control on to the true op. All ops would then be replaced by variants that
do not store the line number. (Which is, logically, why it would work best in
conjunction with L</repack the optree>, as that is already copying/reallocating
all the OPs.)

(Although I should note that we're not certain that doing this for the general
case is worth it.)
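The late pass described above might look roughly like this C sketch. The C<Op> structure is hypothetical, and the sketch only follows a single linear chain; the real execution path is a graph (e.g. the C<if>/C<elsif> branches), so every edge would need the same comparison. The usage data loosely models C<$undef + 1> split over lines 6 and 7, with invented per-op line numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op that still carries its own line number at compile time. */
typedef struct Op {
    struct Op *next;          /* execution order */
    unsigned   line;
    bool       line_changes;  /* set by the late pass */
} Op;

/* Late pass: compare the two ops on each edge of the execution path and
 * flag the destination when the line differs. Flagged ops are where a
 * nextstate-light op updating PL_curcop would go. Only a single linear
 * chain is followed here; returns the number of newly flagged ops. */
static size_t flag_line_changes(Op *start) {
    size_t flagged = 0;
    assert(start != NULL);
    for (Op *o = start; o != NULL && o->next != NULL; o = o->next) {
        Op *dest = o->next;
        if (dest->line != o->line && !dest->line_changes) {
            dest->line_changes = true;
            flagged++;
        }
    }
    return flagged;
}
```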

=head2 optimize tail-calls

Tail-calls present an opportunity for broadly applicable optimization;
anywhere that C<< return foo(...) >> is called, the outer return can
be replaced by a goto, and foo will return directly to the outer
caller, saving (conservatively) 25% of perl's call&return cost, which
is relatively higher than in C. The Scheme language is known to do
this heavily. B::Concise provides good insight into where this
optimization is possible, i.e. anywhere an entersub, leavesub op sequence
occurs.

    perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'

Bottom line on this is probably a new pp_tailcall function which
combines the code in pp_entersub and pp_leavesub. This should probably
be done first in XS, using B::Generate to patch the new OP into the
optrees.
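A toy C model of the saving, not Perl's calling convention: C<Sub>, C<Step> and both call functions are hypothetical. C<call_plain> keeps one frame per pending call, as an entersub/leavesub pair does, while C<call_tail> replaces the current frame with the callee's, which is roughly what a pp_tailcall op would do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a sub's body either returns a value or asks for a call in
 * tail position ("return f(arg)"). Hypothetical, not Perl's internals. */
typedef struct Sub Sub;
typedef struct {
    const Sub *callee;   /* NULL: return 'value'; else tail-call callee(arg) */
    int        arg;
    int        value;
} Step;
struct Sub { Step (*body)(int arg); };

typedef struct { size_t depth, max_depth; } Stats;

static void push_frame(Stats *st) {
    if (++st->depth > st->max_depth)
        st->max_depth = st->depth;
}

/* entersub ... leavesub: every call keeps its caller's frame alive until
 * the callee returns, so a chain of n tail calls needs n + 1 frames. */
static int call_plain(const Sub *s, int arg, Stats *st) {
    assert(s != NULL);
    push_frame(st);
    Step r = s->body(arg);
    int v = r.callee ? call_plain(r.callee, r.arg, st) : r.value;
    st->depth--;
    return v;
}

/* Tail-call version: the callee reuses the current frame (a "goto"),
 * so the depth never exceeds one frame. */
static int call_tail(const Sub *s, int arg, Stats *st) {
    assert(s != NULL);
    push_frame(st);
    for (;;) {
        Step r = s->body(arg);
        if (r.callee == NULL) {
            st->depth--;
            return r.value;
        }
        s   = r.callee;
        arg = r.arg;
    }
}

/* Example sub: countdown(n) is "return n ? countdown(n - 1) : 0". */
static Step countdown_body(int n);
static const Sub countdown = { countdown_body };
static Step countdown_body(int n) {
    if (n == 0)
        return (Step){ NULL, 0, 0 };
    return (Step){ &countdown, n - 1, 0 };
}
```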

=head1 Big projects

Tasks that will get your name mentioned in the description of the "Highlights
of 5.12".

=head2 make ithreads more robust

Generally make ithreads more robust. See also L</iCOW>.

This task is incremental - even a little bit of work on it will help, and
will be greatly appreciated.

One bit would be to write the missing code in sv.c:Perl_dirp_dup.

Fix Perl_sv_dup, et al. so that threads can return objects.

=head2 iCOW

Sarathy and Arthur have a proposal for an improved Copy On Write which
specifically will be able to COW new ithreads. If this can be implemented
it would be a good thing.

=head2 (?{...}) closures in regexps

Fix (or rewrite) the implementation of the C</(?{...})/> closures.

=head2 A re-entrant regexp engine

This will allow the use of a regex from inside (?{ }), (??{ }) and
(?(?{ })|) constructs.

=head2 Add class set operations to regexp engine

Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.

demerphq has this on his todo list, but right at the bottom.