=head1 NAME

perltodo - Perl TO-DO List

=head1 DESCRIPTION

This is a list of wishes for Perl. The most up to date version of this file
is at http://perl5.git.perl.org/perl.git/blob_plain/HEAD:/pod/perltodo.pod

The tasks we think are smaller or easier are listed first. Anyone is welcome
to work on any of these, but it's a good idea to first contact
I<perl5-porters@perl.org> to avoid duplication of effort, and to learn from
any previous attempts. By all means contact a pumpking privately first if you
prefer.

Whilst patches to make the list shorter are most welcome, ideas to add to
the list are also encouraged. Check the perl5-porters archives for past
ideas, and any discussion about them. One set of archives may be found at:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
not, but if your patch is incorporated, then we'll add your name to the
F<AUTHORS> file, which ships in the official distribution. How many other
programming languages offer you 1 line of immortality?

=head1 Tasks that only need Perl knowledge

=head2 Remove macperl references from tests

MacPerl is gone. The tests don't need to be there.

=head2 Remove duplication of test setup.

Schwern notes that there's duplication of code - lots and lots of tests have
some variation on the big block of C<$Is_Foo> checks. We can safely put this
into a file, change it to build an C<%Is> hash and require it. Maybe just put
it into F<test.pl>. Throw in the handy tainting subroutines.

=head2 POD -E<gt> HTML conversion in the core still sucks

Which is crazy given just how simple POD purports to be, and how simple HTML
can be. It's not actually I<as> simple as it sounds, particularly with the
flexibility POD allows for C<=item>, but it would be good to improve the
visual appeal of the HTML generated, and to avoid it having any validation
errors. See also L</make HTML install work>, as the layout of the installation
tree is needed to improve the cross-linking.

The addition of C<Pod::Simple> and its related modules may make this task
easier to complete.

=head2 Make ExtUtils::ParseXS use strict;

F<lib/ExtUtils/ParseXS.pm> contains this line

    # use strict;  # One of these days...

Simply uncomment it, and fix all the resulting issues :-)

The more practical approach, to break the task down into manageable chunks, is
to work your way through the code from bottom to top, if necessary adding
extra C<{ ... }> blocks, and turning on strict within them.

=head2 Make Schwern poorer

We should have tests for everything. When all the core's modules are tested,
Schwern has promised to donate $500 to TPF. We may need volunteers to
hold him upside down and shake vigorously in order to actually extract the
cash.

=head2 Improve the coverage of the core tests

Use Devel::Cover to ascertain the core modules' test coverage, then add
tests that are currently missing.

=head2 test B

A full test suite for the B module would be nice.

=head2 A decent benchmark

C<perlbench> seems impervious to any recent changes made to the perl core. It
would be useful to have a reasonable general benchmarking suite that roughly
represented what current perl programs do, and measurably reported whether
tweaks to the core improve, degrade or don't really affect performance, to
guide people attempting to optimise the guts of perl. Gisle would welcome
new tests for perlbench.

=head2 fix tainting bugs

Fix the bugs revealed by running the test suite with the C<-t> switch (via
C<make test.taintwarn>).

=head2 Dual life everything

As part of the "dists" plan, anything that doesn't belong in the smallest perl
distribution needs to be dual lifed. Anything else can be too. Figure out what
changes would be needed to package that module and its tests up for CPAN, and
do so. Test it with older perl releases, and fix the problems you find.

To make a minimal perl distribution, it's useful to look at
F<t/lib/commonsense.t>.

=head2 Move dual-life pod/*.PL into ext

Nearly all the dual-life modules have been moved to F<ext>. However, we
still need to move F<pod/*.PL> into their respective directories
in F<ext/>. They're referenced by (at least) C<plextract> in F<Makefile.SH>
and C<utils> in F<win32/Makefile> and F<win32/makefile.ml>, and listed
explicitly in F<win32/pod.mak>, F<vms/descrip_mms.template> and F<utils.lst>.

=head2 POSIX memory footprint

Ilya observed that C<use POSIX;> eats memory like there's no tomorrow, and at
various times worked to cut it down. There is probably still fat to cut out -
for example POSIX passes Exporter some very memory hungry data structures.

=head2 embed.pl/makedef.pl

There is a script F<embed.pl> that generates several header files to prefix
all of Perl's symbols in a consistent way, to provide some semblance of
namespace support in C<C>. Functions are declared in F<embed.fnc>, variables
in F<interpvar.h>. Quite a few of the functions and variables
are conditionally declared there, using C<#ifdef>. However, F<embed.pl>
doesn't understand the C macros, so the rules about which symbols are present
when are duplicated in F<makedef.pl>. Writing things twice is bad, m'kay.
It would be good to teach C<embed.pl> to understand the conditional
compilation, and hence remove the duplication, and the mistakes it has caused.

=head2 use strict; and AutoLoad

Currently if you write

    package Whack;
    use AutoLoader 'AUTOLOAD';
    use strict;
    1;
    __END__
    sub bloop {
        print join (' ', No, strict, here), "!\n";
    }

then C<use strict;> isn't in force within the autoloaded subroutines. It would
be more consistent (and less surprising) to arrange for all lexical pragmas
in force at the __END__ block to be in force within each autoloaded subroutine.

There's a similar problem with SelfLoader.

=head2 profile installman

The F<installman> script is slow. All it is doing is text processing, which
we're told is something Perl is good at. So it would be nice to know what it is
doing that is taking so much CPU, and where possible address it.

=head1 Tasks that need a little sysadmin-type knowledge

Or if you prefer, tasks that you would learn from, and broaden your skills
base...

=head2 make HTML install work

There is an C<installhtml> target in the Makefile. It's marked as
"experimental". It would be good to get this tested, make it work reliably, and
remove the "experimental" tag. This would include

=over 4

=item 1

Checking that cross linking between various parts of the documentation works.
In particular that links work between the modules (files with POD in F<lib/>)
and the core documentation (files in F<pod/>).

=item 2

Work out how to split C<perlfunc> into chunks, preferably one per function
group, preferably with general case code that could be used elsewhere.
Challenges here are correctly identifying the groups of functions that go
together, and making the right named external cross-links point to the right
page. Things to be aware of are C<-X>, groups such as C<getpwnam> to
C<endservent>, two or more C<=item>s giving the different parameter lists, such
as

    =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
    =item substr EXPR,OFFSET,LENGTH
    =item substr EXPR,OFFSET

and different parameter lists having different meanings (e.g. C<select>).

=back

=head2 compressed man pages

Be able to install them. This would probably need a configure test to see how
the system does compressed man pages (same directory/different directory?
same filename/different filename), as well as tweaking the F<installman> script
to compress as necessary.

=head2 Add a code coverage target to the Makefile

Make it easy for anyone to run Devel::Cover on the core's tests. The steps
to do this manually are roughly

=over 4

=item *

do a normal C<Configure>, but include Devel::Cover as a module to install
(see F<INSTALL> for how to do this)

=item *

    make perl

=item *

    cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

=item *

Process the resulting Devel::Cover database

=back

This just gives you the coverage of the F<.pm>s. To also get the C level
coverage you need to

=over 4

=item *

Additionally tell C<Configure> to use the appropriate C compiler flags for
C<gcov>

=item *

    make perl.gcov

(instead of C<make perl>)

=item *

After running the tests run C<gcov> to generate all the F<.gcov> files.
(Including down in the subdirectories of F<ext/>)

=item *

(From the top level perl directory) run C<gcov2perl> on all the C<.gcov> files
to get their stats into the cover_db directory.

=item *

Then process the Devel::Cover database

=back

It would be good to add a single switch to C<Configure> to specify that you
wanted to perform perl level coverage, and another to specify C level
coverage, and have C<Configure> and the F<Makefile> do all the right things
automatically.

=head2 Make Config.pm cope with differences between built and installed perl

Quite often vendors ship a perl binary compiled with their (pay-for)
compilers. People install a free compiler, such as gcc. To work out how to
build extensions, Perl interrogates C<%Config>, so in this situation
C<%Config> describes compilers that aren't there, and extension building
fails. This forces people into choosing between re-compiling perl themselves
using the compiler they have, or only using modules that the vendor ships.

It would be good to find a way to teach C<Config.pm> about the installation
setup, possibly involving probing at install time or later, so that the
C<%Config> in a binary distribution better describes the installed machine,
when the installed machine differs from the build machine in some significant
way.

=head2 linker specification files

Some platforms mandate that you provide a list of a shared library's external
symbols to the linker, so the core already has the infrastructure in place to
do this for generating shared perl libraries. My understanding is that the
GNU toolchain can accept an optional linker specification file, and restrict
visibility just to symbols declared in that file. It would be good to extend
F<makedef.pl> to support this format, and to provide a means within
C<Configure> to enable it. This would allow Unix users to test that the
export list is correct, and to build a perl that does not pollute the global
namespace with private symbols.

=head2 Cross-compile support

Currently C<Configure> understands the C<-Dusecrosscompile> option. This option
arranges for building C<miniperl> for the TARGET machine, so this C<miniperl>
is assumed then to be copied to the TARGET machine and used as a replacement
for the full C<perl> executable.

This could be done a little differently. Namely C<miniperl> should be built for
HOST and then full C<perl> with extensions should be compiled for TARGET.
This, however, might require extra trickery for C<%Config>: we have one config
first for HOST and then another for TARGET. Tools like MakeMaker will be
mightily confused. Having around two different types of executables and
libraries (HOST and TARGET) makes life interesting for Makefiles and
shell (and Perl) scripts. There is C<$Config{run}>, normally empty, which
can be used as an execution wrapper. Also note that in some
cross-compilation/execution environments the HOST and the TARGET do
not see the same filesystem(s), so C<$Config{run}> may need to do some
file/directory copying back and forth.

=head2 roffitall

Make F<pod/roffitall> be updated by F<pod/buildtoc>.

=head2 Split "linker" from "compiler"

Right now, Configure probes for two commands, and sets two variables:

=over 4

=item * C<cc> (in F<cc.U>)

This variable holds the name of a command to execute a C compiler which
can resolve multiple global references that happen to have the same
name. Usual values are F<cc> and F<gcc>.
Fervent ANSI compilers may be called F<c89>. AIX has F<xlc>.

=item * C<ld> (in F<dlsrc.U>)

This variable indicates the program to be used to link
libraries for dynamic loading. On some systems, it is F<ld>.
On ELF systems, it should be C<$cc>. Mostly, we'll try to respect
the hint file setting.

=back

There is an implicit historical assumption from around Perl5.000alpha
something, that C<$cc> is also the correct command for linking object files
together to make an executable. This may be true on Unix, but it's not true
on other platforms, and there is a maze of workarounds in other places (such
as F<Makefile.SH>) to cope with this.

Ideally, we should create a new variable to hold the name of the executable
linker program, probe for it in F<Configure>, and centralise all the special
case logic there or in hints files.

A small bikeshed issue remains - what to call it, given that C<$ld> is already
taken (arguably for the wrong thing now, but on SunOS 4.1 it is the command
for creating dynamically-loadable modules) and C<$link> could be confused with
the Unix command line executable of the same name, which does something
completely different. Andy Dougherty makes the counter argument "In parrot, I
tried to call the command used to link object files and libraries into an
executable F<link>, since that's what my vaguely-remembered DOS and VMS
experience suggested. I don't think any real confusion has ensued, so it's
probably a reasonable name for perl5 to use."

"Alas, I've always worried that introducing it would make things worse,
since now the module building utilities would have to look for
C<$Config{link}> and institute a fall-back plan if it weren't found."
Although I can see that as confusing, given that C<$Config{d_link}> is true
when (hard) links are available.

=head2 Configure Windows using PowerShell

Currently, Windows uses hard-coded config files to build the
config.h for compiling Perl. Makefiles are also hard-coded and need to be
hand edited prior to building Perl. While this makes it easy to create a
perl.exe that works across multiple Windows versions, being able to accurately
configure a perl.exe for a specific Windows version and VS C++ would be
a nice enhancement. With PowerShell available on Windows XP and up, this
may now be possible. Step 1 might be to investigate whether this is possible
and use this to clean up our current makefile situation. Step 2 would be to
see if there would be a way to use our existing metaconfig units to configure a
Windows Perl or whether we go in a separate direction and make it so. Of
course, we all know what step 3 is.

=head2 decouple -g and -DDEBUGGING

Currently F<Configure> automatically adds C<-DDEBUGGING> to the C compiler
flags if it spots C<-g> in the optimiser flags. The pre-processor directive
C<DEBUGGING> enables F<perl>'s command line C<-D> options, but in the process
makes F<perl> slower. It would be good to disentangle this logic, so that
C-level debugging with C<-g> and Perl level debugging with C<-D> can easily
be enabled independently.

=head1 Tasks that need a little C knowledge

These tasks would need a little C knowledge, but don't need any specific
background or experience with XS, or how the Perl interpreter works.

=head2 Weed out needless PERL_UNUSED_ARG

The C code uses the macro C<PERL_UNUSED_ARG> to stop compilers warning about
unused arguments. Often the arguments can't be removed, as there is an
external constraint that determines the prototype of the function, so this
approach is valid. However, there are some cases where C<PERL_UNUSED_ARG>
could be removed. Specifically:

=over 4

=item *

The prototypes of (nearly all) static functions can be changed

=item *

Unused arguments generated by short cut macros are wasteful - the short cut
macro used can be changed.

=back

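The two situations can be illustrated with a small, self-contained sketch. It
is not taken from the perl source: C<UNUSED_ARG> is a hypothetical stand-in
for C<PERL_UNUSED_ARG>, and the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PERL_UNUSED_ARG. */
#define UNUSED_ARG(x) ((void)(x))

/* Case 1: the prototype is dictated by an external callback type, so the
 * unused argument has to stay and the macro is the right tool. */
typedef int (*visit_fn)(int value, void *context);

static int count_positive(int value, void *context)
{
    UNUSED_ARG(context);
    return value > 0;
}

/* Case 2 (before): a static helper carries an argument nobody reads. */
static int clamp_before(int value, int lo, int hi, int flags)
{
    UNUSED_ARG(flags);
    return value < lo ? lo : (value > hi ? hi : value);
}

/* Case 2 (after): every caller lives in this file, so the prototype can
 * simply lose the argument, and the macro goes with it. */
static int clamp_after(int value, int lo, int hi)
{
    assert(lo <= hi);
    return value < lo ? lo : (value > hi ? hi : value);
}

/* Driver that needs the callback prototype from case 1. */
static int visit_all(const int *values, size_t n, visit_fn fn, void *context)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += fn(values[i], context);
    return total;
}
```

Only the second kind of function is a candidate for this task; the callback
keeps its argument because C<visit_fn> fixes its shape.
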
=head2 Modernize the order of directories in @INC

The way @INC is laid out by default, one cannot upgrade core (dual-life)
modules without overwriting files. This causes problems for binary
package builders. One possible proposal is laid out in this
message:
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html>.

=head2 -Duse32bit*

Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
On these systems, it might be the default compilation mode, and there
is currently no guarantee that passing no use64bitall option to the
Configure process will build a 32bit perl. Implementing -Duse32bit*
options would be nice for perl 5.12.

=head2 Profile Perl - am I hot or not?

The Perl source code is stable enough that it makes sense to profile it,
identify and optimise the hotspots. It would be good to measure the
performance of the Perl interpreter using free tools such as cachegrind,
gprof, and dtrace, and work to reduce the bottlenecks they reveal.

As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops,
the ops that are most commonly used. The idea is that by grouping them, their
object code will be adjacent in the executable, so they have a greater chance
of already being in the CPU cache (or swapped in) due to being near another op
already in use.

Except that it's not clear if these really are the most commonly used ops. So
as part of exercising your skills with coverage and profiling tools you might
want to determine what ops I<really> are the most commonly used. And in turn
suggest evictions and promotions to achieve a better F<pp_hot.c>.

One piece of Perl code that might make a good testbed is F<installman>.

=head2 Allocate OPs from arenas

Currently all new OP structures are individually malloc()ed and free()d.
All C<malloc> implementations have space overheads, and are not as fast as
custom allocators, so it would both use less memory and less CPU to allocate
the various OP structures from arenas. The SV arena code can probably be
re-used for this.

Note that Configuring perl with C<-Accflags=-DPL_OP_SLAB_ALLOC> will use
Perl_Slab_alloc() to pack optrees into a contiguous block, which is
probably superior to the use of OP arenas, esp. from a cache locality
standpoint. See L<Profile Perl - am I hot or not?>.

0bdfc961 459
a229ae3b 460Currently, numerous functions look virtually, if not completely,
02f21748 461identical in both C<win32/wince.c> and C<win32/win32.c> files, which can't
6d71adcd 462be good.
463
=head2 Use secure CRT functions when building with VC8 on Win32

Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the basis
that they were "unsafe" and introduced differently named secure versions of
them as replacements, e.g. instead of writing

    FILE* f = fopen(__FILE__, "r");

one should now write

    FILE* f;
    errno_t err = fopen_s(&f, __FILE__, "r");

Currently, the warnings about these deprecations have been disabled by adding
-D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to remove that
warning suppressant and actually make use of the new secure CRT functions.

There is also a similar issue with POSIX CRT function names like fileno having
been deprecated in favour of ISO C++ conformant names like _fileno. These
warnings are also currently suppressed by adding -D_CRT_NONSTDC_NO_DEPRECATE. It
might be nice to do as Microsoft suggest here too, although, unlike the secure
functions issue, there is presumably little or no benefit in this case.

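One way such a change might be contained is a small wrapper that picks the
secure variant only where it exists. The following is a sketch under that
assumption; C<portable_fopen> is a hypothetical name and is not part of the
perl source.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a file with fopen_s() under VC8 and later,
 * and with plain fopen() everywhere else. Returns NULL on failure. */
static FILE *portable_fopen(const char *path, const char *mode)
{
#if defined(_MSC_VER) && _MSC_VER >= 1400   /* 1400 is Visual C++ 2005 */
    FILE *f = NULL;
    assert(path != NULL && mode != NULL);
    if (fopen_s(&f, path, mode) != 0)
        return NULL;
    return f;
#else
    assert(path != NULL && mode != NULL);
    return fopen(path, mode);
#endif
}
```

Callers would then use C<portable_fopen(__FILE__, "r")> in place of either
form shown above, and the C<#if> lives in one place.
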
=head2 Fix POSIX::access() and chdir() on Win32

These functions currently take no account of DACLs and therefore do not behave
correctly in situations where access is restricted by DACLs (as opposed to the
read-only attribute).

Furthermore, POSIX::access() behaves differently for directories having the
read-only attribute set depending on what CRT library is being used. For
example, the _access() function in the VC6 and VC7 CRTs (wrongly) claims that
such directories are not writable, whereas in fact all directories are writable
unless access is denied by DACLs. (In the case of directories, the read-only
attribute actually only means that the directory cannot be deleted.) This CRT
bug is fixed in the VC8 and VC9 CRTs (but, of course, the directory may still
not actually be writable if access is indeed denied by DACLs).

For the chdir() issue, see ActiveState bug #74552:
http://bugs.activestate.com/show_bug.cgi?id=74552

Therefore, DACLs should be checked both for consistency across CRTs and for
the correct answer.

(Note that perl's -w operator should not be modified to check DACLs. It has
been written so that it reflects the state of the read-only attribute, even
for directories (whatever CRT is being used), for symmetry with chmod().)

=head2 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()

Maybe create a utility that checks after each libperl.a creation that
none of the above (nor sprintf(), vsprintf(), or *SHUDDER* gets())
ever creep back to libperl.a.

    nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'

Note, of course, that this will only tell whether B<your> platform
is using those naughty interfaces.

=head2 -D_FORTIFY_SOURCE=2, -fstack-protector

Recent glibcs support C<-D_FORTIFY_SOURCE=2> and recent gcc
(4.1 onwards?) supports C<-fstack-protector>, both of which give
protection against various kinds of buffer overflow problems.
These should probably be used for compiling Perl whenever available;
Configure and/or hints files should be adjusted to probe for the
availability of these features and enable them as appropriate.

=head2 Arenas for GPs? For MAGIC?

C<struct gp> and C<struct magic> are both currently allocated by C<malloc>.
It might be a speed or memory saving to change to using arenas. Or it might
not. It would need some suitable benchmarking first. In particular, C<GP>s
can probably be changed with minimal compatibility impact (probably nothing
outside of the core, or even outside of F<gv.c> allocates them), but they
probably aren't allocated/deallocated often enough for a speed saving. Whereas
C<MAGIC> is allocated/deallocated more often, but in turn, is also something
more externally visible, so changing the rules here may bite external code.

=head2 Shared arenas

Several SV body structs are now the same size, notably PVMG and PVGV, PVAV and
PVHV, and PVCV and PVFM. It should be possible to allocate and return same
sized bodies from the same actual arena, rather than maintaining one arena for
each. This could save 4-6K per thread, of memory no longer tied up in the
not-yet-allocated part of an arena.

=head1 Tasks that need a knowledge of XS

These tasks would need C knowledge, and roughly the level of knowledge of
the perl API that comes from writing modules that use XS to interface to
C.

=head2 Remove the use of SVs as temporaries in dump.c

F<dump.c> contains debugging routines to dump out the contents of perl data
structures, such as C<SV>s, C<AV>s and C<HV>s. Currently, the dumping code
B<uses> C<SV>s for its temporary buffers, which was a logical initial
implementation choice, as they provide ready made memory handling.

However, they also lead to a lot of confusion when it happens that what you're
trying to debug is seen by the code in F<dump.c>, correctly or incorrectly, as
a temporary scalar it can use for a temporary buffer. It's also not possible
to dump scalars before the interpreter is properly set up, such as during
ithreads cloning. It would be good to progressively replace the use of scalars
as string accumulation buffers with something much simpler, directly allocated
by C<malloc>. The F<dump.c> code is (or should be) only producing 7 bit
US-ASCII, so output character sets are not an issue.

Producing and proving an internal simple buffer allocation would make it easier
to re-write the internals of the PerlIO subsystem to avoid using C<SV>s for
B<its> buffers, use of which can cause problems similar to those of F<dump.c>,
at similar times.

580
581Some years ago Jarkko supplied patches to provide support for the POSIX
582SA_SIGINFO feature in Perl, passing the extra data to the Perl signal handler.
583
584Unfortunately, it only works with "unsafe" signals, because under safe
585signals, by the time Perl gets to run the signal handler, the extra
586information has been lost. Moreover, it's not easy to store it somewhere,
587as you can't call mutexs, or do anything else fancy, from inside a signal
588handler.
589
590So it strikes me that we could provide safe SA_SIGINFO support
591
592=over 4
593
594=item 1
595
596Provide global variables for two file descriptors
597
598=item 2
599
600When the first request is made via C<sigaction> for C<SA_SIGINFO>, create a
601pipe, store the reader in one, the writer in the other
602
603=item 3
604
605In the "safe" signal handler (C<Perl_csighandler()>/C<S_raise_signal()>), if
606the C<siginfo_t> pointer non-C<NULL>, and the writer file handle is open,
607
608=over 8
609
610=item 1
611
612serialise signal number, C<struct siginfo_t> (or at least the parts we care
613about) into a small auto char buff
614
615=item 2
616
617C<write()> that (non-blocking) to the writer fd
618
619=over 12
620
621=item 1
622
623if it writes 100%, flag the signal in a counter of "signals on the pipe" akin
624to the current per-signal-number counts
625
626=item 2
627
628if it writes 0%, assume the pipe is full. Flag the data as lost?
629
630=item 3
631
632if it writes partially, croak a panic, as your OS is broken.
633
634=back
635
636=back
637
638=item 4
639
640in the regular C<PERL_ASYNC_CHECK()> processing, if there are "signals on
641the pipe", read the data out, deserialise, build the Perl structures on
642the stack (code in C<Perl_sighandler()>, the "unsafe" handler), and call as
643usual.
644
645=back
646
647I think that this gets us decent C<SA_SIGINFO> support, without the current risk
648of running Perl code inside the signal handler context. (With all the dangers
649of things like C<malloc> corruption that that currently offers us)
650
651For more information see the thread starting with this message:
652http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html
653
6d71adcd 654=head2 autovivification
655
656Make all autovivification consistent w.r.t LVALUE/RVALUE and strict/no strict;
657
658This task is incremental - even a little bit of work on it will help.
659
660=head2 Unicode in Filenames
661
662chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
663opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
664system, truncate, unlink, utime, -X. All these could potentially accept
665Unicode filenames either as input or output (and in the case of system
666and qx Unicode in general, as input or output to/from the shell).
667Whether a filesystem - an operating system pair understands Unicode in
668filenames varies.
669
670Known combinations that have some level of understanding include
671Microsoft NTFS, Apple HFS+ (In Mac OS 9 and X) and Apple UFS (in Mac
672OS X), NFS v4 is rumored to be Unicode, and of course Plan 9. How to
673create Unicode filenames, what forms of Unicode are accepted and used
674(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
675and so on, varies. Finding the right level of interfacing to Perl
676requires some thought. Remember that an OS does not implicate a
677filesystem.
678
679(The Windows -C command flag "wide API support" has been at least
680temporarily retired in 5.8.1, and the -C has been repurposed, see
681L<perlrun>.)
682
87a942b1 683Most probably the right way to do this would be this:
684L</"Virtualize operating system access">.
685
=head2 Unicode in %ENV

Currently the %ENV entries are always byte strings.
See L</"Virtualize operating system access">.

=head2 Unicode and glob()

Currently glob patterns and filenames returned from File::Glob::glob()
are always byte strings. See L</"Virtualize operating system access">.

=head2 Unicode and lc/uc operators

Some built-in operators (C<lc>, C<uc>, etc.) behave differently, based on
what the internal encoding of their argument is. That should not be the
case. Maybe add a pragma to switch behaviour.

=head2 use less 'memory'

Investigate trade offs to switch out perl's choices on memory usage.
Particularly perl should be able to give memory back.

This task is incremental - even a little bit of work on it will help.

=head2 Re-implement C<:unique> in a way that is actually thread-safe

The old implementation made bad assumptions on several levels. A good 90%
solution might be just to make C<:unique> work to share the string buffer
of SvPVs. That way large constant strings can be shared between ithreads,
such as the configuration information in F<Config>.

=head2 Make tainting consistent

Tainting would be easier to use if it didn't take documented shortcuts and
allow taint to "leak" everywhere within an expression.

=head2 readpipe(LIST)

system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
running a shell. readpipe() (the function behind qx//) could be similarly
extended.

=head2 Audit the code for destruction ordering assumptions

Change 25773 notes

    /* Need to check SvMAGICAL, as during global destruction it may be that
       AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
       is now part of the linked list of SV heads, rather than pointing to
       the original body.  */
    /* FIXME - audit the code for other bugs like this one.  */

adding the C<SvMAGICAL> check to

    if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
        MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);

Go through the core and look for similar assumptions that SVs have particular
types, as all bets are off during global destruction.

749904bf 745=head2 Extend PerlIO and PerlIO::Scalar
746
747PerlIO::Scalar doesn't know how to truncate(). Implementing this
748would require extending the PerlIO vtable.
749
750Similarly the PerlIO vtable doesn't know about formats (write()), or
751about stat(), or chmod()/chown(), utime(), or flock().
752
753(For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
754would mean.)
755
756PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
757opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
758readlink().
759
94da6c29 760See also L</"Virtualize operating system access">.
761
3236f110 762=head2 -C on the #! line
763
It should be possible to make -C work correctly if found on the #! line,
given that all perl command line options are strict ASCII, and -C changes
only the interpretation of non-ASCII characters, and not for the script file
handle. Making it work needs some investigation of the ordering of function
calls during startup, and (by implication) a bit of tweaking of that order.
769
d6c1e11f 770=head2 Organize error messages
771
Perl's diagnostics (error messages, see L<perldiag>) could use
reorganizing and formalizing so that each error message has its own
stable-for-all-eternity unique id, categorized by severity, type, and
subsystem. (The error messages would be listed in a datafile outside
of the Perl source code, and the source code would only refer to the
messages by the id.) This clean-up and regularizing should apply
to all croak() messages.
779
780This would enable all sorts of things: easier translation/localization
781of the messages (though please do keep in mind the caveats of
782L<Locale::Maketext> about too straightforward approaches to
783translation), filtering by severity, and instead of grepping for a
784particular error message one could look for a stable error id. (Of
785course, changing the error messages by default would break all the
786existing software depending on some particular error message...)
787
This kind of functionality is known as I<message catalogs>. Look for
inspiration for example in the catgets() system, and possibly even use it
if available, but B<only> if available: B<not> all platforms will
have catgets().
d6c1e11f 792
793For the really pure at heart, consider extending this item to cover
794also the warning messages (see L<perllexwarn>, C<warnings.pl>).
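
As a rough illustration of the idea, a catalog keyed by stable id might look
like the sketch below. The ids, field names and C<format_message> helper are
all hypothetical; nothing like this exists in the core:

```perl
use strict;
use warnings;

# Hypothetical catalog: ids, fields and texts are made up for illustration.
my %catalog = (
    'PERL-RT-0001' => {
        severity  => 'fatal',
        subsystem => 'runtime',
        text      => 'Not a %s reference',
    },
    'PERL-IO-0002' => {
        severity  => 'warning',
        subsystem => 'io',
        text      => 'print() on closed filehandle %s',
    },
);

# Code refers to a message by id only; the text lives in the catalog.
sub format_message {
    my ($id, @args) = @_;
    my $entry = $catalog{$id} or die "Unknown message id $id";
    return sprintf("[%s] $entry->{text}", $id, @args);
}

my $msg = format_message('PERL-RT-0001', 'HASH');
print "$msg\n";

# Filtering by severity becomes a lookup rather than a grep over message text.
my @fatal_ids = sort grep { $catalog{$_}{severity} eq 'fatal' } keys %catalog;
print "fatal: @fatal_ids\n";
```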
3236f110 795
0bdfc961 796=head1 Tasks that need a knowledge of the interpreter
3298bd4d 797
0bdfc961 798These tasks would need C knowledge, and knowledge of how the interpreter works,
799or a willingness to learn.
3298bd4d 800
de6375e3 801=head2 truncate() prototype
802
The prototype of truncate() is currently C<$$>. It should probably
be C<*$> instead. (The change would be made in F<opcode.pl>.)
805
2d0587d8 806=head2 decapsulation of smart match argument
807
Currently C<$foo ~~ $object> will die with the message "Smart matching a
non-overloaded object breaks encapsulation". It would be nice to allow
bypassing this by explicitly using the syntax C<$foo ~~ %$object> or
C<$foo ~~ @$object>.
812
565590b5 813=head2 error reporting of [$a ; $b]
814
815Using C<;> inside brackets is a syntax error, and we don't propose to change
816that by giving it any meaning. However, it's not reported very helpfully:
817
818 $ perl -e '$a = [$b; $c];'
819 syntax error at -e line 1, near "$b;"
820 syntax error at -e line 1, near "$c]"
821 Execution of -e aborted due to compilation errors.
822
It should be possible to hook into the tokeniser or the lexer, so that when a
C<;> is parsed where it is not legal as a statement terminator (i.e. inside
C<{}> used as a hashref, C<[]> or C<()>) it issues an error something like
I<';' isn't legal inside an expression - if you need multiple statements use a
do {...} block>. See the thread starting at
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-09/msg00573.html
829
718140ec 830=head2 lexicals used only once
831
832This warns:
833
834 $ perl -we '$pie = 42'
835 Name "main::pie" used only once: possible typo at -e line 1.
836
837This does not:
838
839 $ perl -we 'my $pie = 42'
840
841Logically all lexicals used only once should warn, if the user asks for
d6f4ea2e 842warnings. An unworked RT ticket (#5087) has been open for almost seven
843years for this discrepancy.
718140ec 844
a3d15f9a 845=head2 UTF-8 revamp
846
847The handling of Unicode is unclean in many places. For example, the regexp
848engine matches in Unicode semantics whenever the string or the pattern is
849flagged as UTF-8, but that should not be dependent on an internal storage
850detail of the string. Likewise, case folding behaviour is dependent on the
851UTF8 internal flag being on or off.
852
853=head2 Properly Unicode safe tokeniser and pads.
854
855The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
856variable names are stored in stashes as raw bytes, without the utf-8 flag
857set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
858tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
859source filters. All this could be fixed.
860
636e63cb 861=head2 state variable initialization in list context
862
863Currently this is illegal:
864
865 state ($a, $b) = foo();
866
a2874905 867In Perl 6, C<state ($a) = foo();> and C<(state $a) = foo();> have different
a8d0aeb9 868semantics, which is tricky to implement in Perl 5 as currently they produce
a2874905 869the same opcode trees. The Perl 6 design is firm, so it would be good to
a8d0aeb9 870implement the necessary code in Perl 5. There are comments in
a2874905 871C<Perl_newASSIGNOP()> that show the code paths taken by various assignment
872constructions involving state variables.
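
In the meantime, one possible workaround is to initialise a single scalar
state variable with an array reference. A minimal sketch; C<foo> and C<pair>
are placeholder names:

```perl
use strict;
use warnings;
use feature 'state';

my $calls = 0;
sub foo { $calls++; return (10, 20) }    # stand-in for a costly initialiser

sub pair {
    # Scalar state initialisation is legal, so capture the list in an array ref.
    state $pair = [ foo() ];
    return @$pair;
}

my ($first, $second) = pair();
pair();                                   # second call does not re-run foo()
print "$first $second, foo() called $calls time(s)\n";
```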
636e63cb 873
4fedb12c 874=head2 Implement $value ~~ 0 .. $range
875
876It would be nice to extend the syntax of the C<~~> operator to also
877understand numeric (and maybe alphanumeric) ranges.
a393eb28 878
879=head2 A does() built-in
880
881Like ref(), only useful. It would call the C<DOES> method on objects; it
882would also tell whether something can be dereferenced as an
883array/hash/etc., or used as a regexp, etc.
884L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>
885
886=head2 Tied filehandles and write() don't mix
887
888There is no method on tied filehandles to allow them to be called back by
889formats.
4fedb12c 890
53967bb9 891=head2 Propagate compilation hints to the debugger
892
893Currently a debugger started with -dE on the command-line doesn't see the
894features enabled by -E. More generally hints (C<$^H> and C<%^H>) aren't
895propagated to the debugger. Probably it would be a good thing to propagate
896hints from the innermost non-C<DB::> scope: this would make code eval'ed
897in the debugger see the features (and strictures, etc.) currently in
898scope.
899
d10fc472 900=head2 Attach/detach debugger from running program
1626a787 901
cd793d32 902The old perltodo notes "With C<gdb>, you can attach the debugger to a running
903program if you pass the process ID. It would be good to do this with the Perl
0bdfc961 904debugger on a running Perl program, although I'm not sure how it would be
905done." ssh and screen do this with named pipes in /tmp. Maybe we can too.
1626a787 906
0bdfc961 907=head2 LVALUE functions for lists
908
909The old perltodo notes that lvalue functions don't work for list or hash
910slices. This would be good to fix.
911
0bdfc961 912=head2 regexp optimiser optional
913
The regexp optimiser is not optional. It should be configurable, to allow
its performance to be measured, and its bugs to be easily demonstrated.
916
02f21748 917=head2 delete &function
918
Allow functions to be deleted. One can already undef them, but they're still
in the stash.
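
The current situation can be sketched as follows. Deleting the stash entry by
hand is shown only as the blunt alternative available today: it removes the
whole glob, so any C<$foo>, C<@foo> or C<%foo> package variables go with it.

```perl
use strict;
use warnings;

sub foo { 42 }

undef &foo;    # frees the body, but the name stays behind

my $defined_after_undef  = defined(&foo)      ? 1 : 0;
my $in_stash_after_undef = exists $main::{foo} ? 1 : 0;

# Blunt workaround: drop the whole glob from the symbol table.
delete $main::{foo};
my $in_stash_after_delete = exists $main::{foo} ? 1 : 0;

print "defined after undef:   $defined_after_undef\n";
print "in stash after undef:  $in_stash_after_undef\n";
print "in stash after delete: $in_stash_after_delete\n";
```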
921
ef36c6a7 922=head2 C</w> regex modifier
923
That flag would enable whole-word matching, and would also interpolate
arrays as alternations. With it, C</P/w> would be roughly equivalent to:
926
927 do { local $"='|'; /\b(?:P)\b/ }
928
929See L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
930for the discussion.
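
Written out by hand with today's syntax, the array case looks roughly like
this. The sketch assumes the array elements are meant as pattern text (apply
C<quotemeta> first if they are literal strings):

```perl
use strict;
use warnings;

my @animals = qw(cat dog);

# Join the array with '|' while the pattern is being interpolated.
my $re = do { local $" = '|'; qr/\b(?:@animals)\b/ };

for my $text ('hot dog', 'dogma') {
    printf "%-8s %s\n", $text, ($text =~ $re ? 'matches' : 'no match');
}
```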
931
0bdfc961 932=head2 optional optimizer
933
934Make the peephole optimizer optional. Currently it performs two tasks as
935it walks the optree - genuine peephole optimisations, and necessary fixups of
936ops. It would be good to find an efficient way to switch out the
937optimisations whilst keeping the fixups.
938
939=head2 You WANT *how* many
940
Currently contexts are void, scalar and list. split has a special mechanism in
place to pass in the number of return values wanted. It would be useful to
have a general mechanism for this that is backwards compatible and has little
speed hit. This would allow proposals such as short circuiting sort to be
implemented as a module on CPAN.
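
For reference, this is all a subroutine can find out about its context today.
C<context> is just an illustrative name:

```perl
use strict;
use warnings;

# wantarray distinguishes three contexts and says nothing about how many
# values the caller actually wants.
sub context {
    return wantarray          ? 'list'
         : defined(wantarray) ? 'scalar'
         :                      'void';
}

my ($one)        = context();    # list context, one value wanted
my ($a1, $a2)    = context();    # list context, two values wanted
my $scalar       = context();

print "$one $a1 $scalar\n";
```

Both list assignments see the same answer, even though one wants a single
value and the other wants two.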
946
947=head2 lexical aliases
948
Allow lexical aliases (maybe via the syntax C<my \$alias = \$foo>).
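
The closest thing available without new syntax is the aliasing that C<foreach>
performs on its loop variable. A small sketch of that idiom:

```perl
use strict;
use warnings;

my $foo = 1;

# Inside the loop body, $alias is another name for $foo, not a copy.
for my $alias ($foo) {
    $alias = 42;
}

print "$foo\n";
```

The alias only lasts for the loop body, which is why a declaration-style
syntax would be more convenient.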
950
951=head2 entersub XS vs Perl
952
953At the moment pp_entersub is huge, and has code to deal with entering both
954perl and XS subroutines. Subroutine implementations rarely change between
955perl and XS at run time, so investigate using 2 ops to enter subs (one for
956XS, one for perl) and swap between if a sub is redefined.
2810d901 957
de535794 958=head2 Self-ties
2810d901 959
de535794 960Self-ties are currently illegal because they caused too many segfaults. Maybe
a8d0aeb9 961the causes of these could be tracked down and self-ties on all types
de535794 962reinstated.
0bdfc961 963
964=head2 Optimize away @_
965
966The old perltodo notes "Look at the "reification" code in C<av.c>".
967
87a942b1 968=head2 Virtualize operating system access
969
970Implement a set of "vtables" that virtualizes operating system access
971(open(), mkdir(), unlink(), readdir(), getenv(), etc.) At the very
972least these interfaces should take SVs as "name" arguments instead of
973bare char pointers; probably the most flexible and extensible way
e1a3d5d1 974would be for the Perl-facing interfaces to accept HVs. The system
975needs to be per-operating-system and per-file-system
976hookable/filterable, preferably both from XS and Perl level
87a942b1 977(L<perlport/"Files and Filesystems"> is good reading at this point,
978in fact, all of L<perlport> is.)
979
This has actually already been implemented (but only for Win32);
take a look at F<iperlsys.h> and F<win32/perlhost.h>. While all Win32
variants go through a set of "vtables" for operating system access,
non-Win32 systems currently go straight for the POSIX/UNIX-style
system/library call. A system similar to the Win32 one should be
implemented for all platforms. The existing Win32 implementation
probably does not need to survive alongside this proposed new
implementation; the approaches could be merged.
87a942b1 988
989What would this give us? One often-asked-for feature this would
94da6c29 990enable is using Unicode for filenames, and other "names" like %ENV,
991usernames, hostnames, and so forth.
992(See L<perlunicode/"When Unicode Does Not Happen">.)
993
But this kind of virtualization would also allow for things like
virtual filesystems, virtual networks, and "sandboxes" (though as long
as dynamic loading of random object code is allowed, not very safe
sandboxes, since external code of course knows nothing of Perl's vtables).
An example of a smaller "sandbox" is that this feature can be used to
implement per-thread working directories: Win32 already does this.
1000
1001See also L</"Extend PerlIO and PerlIO::Scalar">.
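
The shape of the idea can be sketched at the Perl level with a table of code
references. This is only an analogy for the C-level vtables; the table layout
and operation names below are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical dispatch table that goes to the real operating system.
my %real_os = (
    mkdir  => sub { mkdir $_[0] },
    rmdir  => sub { rmdir $_[0] },
    exists => sub { -e $_[0] },
);

# An in-memory "sandbox" with the same interface that never touches the disk.
my %fake_dirs;
my %sandbox_os = (
    mkdir  => sub { $fake_dirs{ $_[0] } = 1 },
    rmdir  => sub { delete $fake_dirs{ $_[0] } },
    exists => sub { exists $fake_dirs{ $_[0] } },
);

my $os = \%sandbox_os;    # swap for \%real_os to reach the real system

$os->{mkdir}->('/virtual/dir');
my $seen = $os->{exists}->('/virtual/dir') ? 1 : 0;
$os->{rmdir}->('/virtual/dir');
my $gone = $os->{exists}->('/virtual/dir') ? 0 : 1;

print "seen=$seen gone=$gone\n";
```

Callers only ever go through C<$os>, so replacing the table changes what
"the operating system" means without touching the calling code.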
87a942b1 1002
ac6197af 1003=head2 Investigate PADTMP hash pessimisation
1004
9a2f2e6b 1005The peephole optimiser converts constants used for hash key lookups to shared
057163d7 1006hash key scalars. Under ithreads, something is undoing this work.
ac6197af 1007See http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
1008
057163d7 1009=head2 Store the current pad in the OP slab allocator
1010
1011=for clarification
1012I hope that I got that "current pad" part correct
1013
1014Currently we leak ops in various cases of parse failure. I suggested that we
1015could solve this by always using the op slab allocator, and walking it to
1016free ops. Dave comments that as some ops are already freed during optree
1017creation one would have to mark which ops are freed, and not double free them
1018when walking the slab. He notes that one problem with this is that for some ops
1019you have to know which pad was current at the time of allocation, which does
1020change. I suggested storing a pointer to the current pad in the memory allocated
1021for the slab, and swapping to a new slab each time the pad changes. Dave thinks
1022that this would work.
1023
52960e22 1024=head2 repack the optree
1025
1026Repacking the optree after execution order is determined could allow
057163d7 1027removal of NULL ops, and optimal ordering of OPs with respect to cache-line
1028filling. The slab allocator could be reused for this purpose. I think that
1029the best way to do this is to make it an optional step just before the
1030completed optree is attached to anything else, and to use the slab allocator
1031unchanged, so that freeing ops is identical whether or not this step runs.
1032Note that the slab allocator allocates ops downwards in memory, so one would
1033have to actually "allocate" the ops in reverse-execution order to get them
1034contiguous in memory in execution order.
1035
1036See http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html
1037
1038Note that running this copy, and then freeing all the old location ops would
1039cause their slabs to be freed, which would eliminate possible memory wastage if
1040the previous suggestion is implemented, and we swap slabs more frequently.
52960e22 1041
12e06b6f 1042=head2 eliminate incorrect line numbers in warnings
1043
1044This code
1045
1046 use warnings;
1047 my $undef;
1048
1049 if ($undef == 3) {
1050 } elsif ($undef == 0) {
1051 }
1052
18a16cc5 1053used to produce this output:
12e06b6f 1054
1055 Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
1056 Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
1057
where the line of the second warning was misreported - it should be line 5.
Rafael fixed this - the problem arose because there was no nextstate OP
between the execution of the C<if> and the C<elsif>, hence C<PL_curcop> still
reports that the currently executing line is line 4. The solution was to inject
a nextstate OP for each C<elsif>, although it turned out that the nextstate
OP needed to be a nulled OP, rather than a live nextstate OP, else other line
numbers became misreported. (Jenga!)
12e06b6f 1065
1066The problem is more general than C<elsif> (although the C<elsif> case is the
1067most common and the most confusing). Ideally this code
1068
1069 use warnings;
1070 my $undef;
1071
1072 my $a = $undef + 1;
1073 my $b
1074 = $undef
1075 + 1;
1076
1077would produce this output
1078
1079 Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
1080 Use of uninitialized value $undef in addition (+) at wrong.pl line 7.
1081
1082(rather than lines 4 and 5), but this would seem to require every OP to carry
1083(at least) line number information.
1084
What might work is to have an optional line number in memory just before the
BASEOP structure, with a flag bit in the op to say whether it's present.
Initially during compile every OP would carry its line number. Then add a late
pass to the optimiser (potentially combined with L</repack the optree>) which
looks at the two ops on every edge of the graph of the execution path. If
the line number changes, flag the destination OP with this information.
Once all paths are traced, replace every op with the flag with a
nextstate-light op (that just updates C<PL_curcop>), which in turn then passes
control on to the true op. All ops would then be replaced by variants that
do not store the line number. (Which, logically, is why it would work best in
conjunction with L</repack the optree>, as that is already copying/reallocating
all the OPs.)
1097
18a16cc5 1098(Although I should note that we're not certain that doing this for the general
1099case is worth it)
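
To see what a given perl reports for the multi-line case, the warnings can be
captured and printed. A small sketch; the reported line numbers depend on the
perl in use, so none are shown here:

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @reported;
local $SIG{__WARN__} = sub { push @reported, $_[0] };

my $undef;

my $x = $undef + 1;
my $y
    = $undef
    + 1;

# Each captured message ends with "at FILE line N."
print @reported;
```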
1100
52960e22 1101=head2 optimize tail-calls
1102
Tail-calls present an opportunity for broadly applicable optimization;
anywhere that C<< return foo(...) >> is called, the outer return can
be replaced by a goto, and foo will return directly to the outer
caller, saving (conservatively) 25% of perl's call&return cost, which
is relatively higher than in C. The Scheme language is known to do
this heavily. B::Concise provides good insight into where this
optimization is possible, i.e. anywhere an entersub, leavesub op sequence
occurs.
1111
1112 perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'
1113
Bottom line on this is probably a new pp_tailcall function which
combines the code in pp_entersub and pp_leavesub. This should probably
be done first in XS, using B::Generate to patch the new OP into the
optrees.
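
At the Perl level the same transformation can already be written by hand with
C<goto &sub>, which replaces the current call frame rather than adding one.
The sketch below only illustrates the semantics; it is not the proposed op,
and the sub names are made up:

```perl
use strict;
use warnings;

# Ordinary tail call: every level adds a call frame.
sub sum_plain {
    my ($n, $acc) = @_;
    return $acc if $n == 0;
    return sum_plain($n - 1, $acc + $n);
}

# Manual tail call: goto &sub reuses the current frame, so depth stays flat.
sub sum_goto {
    my ($n, $acc) = @_;
    return $acc if $n == 0;
    @_ = ($n - 1, $acc + $n);
    goto &sum_goto;
}

my $small = sum_plain(50, 0);
my $large = sum_goto(100_000, 0);
print "$small $large\n";
```

The C<goto> version runs to a depth that would trigger "Deep recursion"
warnings in the plain version.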
1118
0bdfc961 1119=head1 Big projects
1120
1121Tasks that will get your name mentioned in the description of the "Highlights
87a942b1 1122of 5.12"
0bdfc961 1123
1124=head2 make ithreads more robust
1125
4e577f8b 1126Generally make ithreads more robust. See also L</iCOW>
0bdfc961 1127
1128This task is incremental - even a little bit of work on it will help, and
1129will be greatly appreciated.
1130
6c047da7 1131One bit would be to write the missing code in sv.c:Perl_dirp_dup.
1132
59c7f7d5 1133Fix Perl_sv_dup, et al so that threads can return objects.
1134
0bdfc961 1135=head2 iCOW
1136
1137Sarathy and Arthur have a proposal for an improved Copy On Write which
1138specifically will be able to COW new ithreads. If this can be implemented
1139it would be a good thing.
1140
1141=head2 (?{...}) closures in regexps
1142
1143Fix (or rewrite) the implementation of the C</(?{...})/> closures.
1144
1145=head2 A re-entrant regexp engine
1146
1147This will allow the use of a regex from inside (?{ }), (??{ }) and
1148(?(?{ })|) constructs.
6bda09f9 1149
6bda09f9 1150=head2 Add class set operations to regexp engine
1151
Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
1153
1154demerphq has this on his todo list, but right at the bottom.
44a7a252 1155
1156
1157=head1 Tasks for microperl
1158
1159
1160[ Each and every one of these may be obsolete, but they were listed
1161 in the old Todo.micro file]
1162
1163
1164=head2 make creating uconfig.sh automatic
1165
1166=head2 make creating Makefile.micro automatic
1167
1168=head2 do away with fork/exec/wait?
1169
1170(system, popen should be enough?)
1171
1172=head2 some of the uconfig.sh really needs to be probed (using cc) in buildtime:
1173
1174(uConfigure? :-) native datatype widths and endianness come to mind
1175