=head1 NAME

perltodo - Perl TO-DO List

=head1 DESCRIPTION

This is a list of wishes for Perl. The tasks we think are smaller or easier
are listed first. Anyone is welcome to work on any of these, but it's a good
idea to first contact I<perl5-porters@perl.org> to avoid duplication of
effort. By all means contact a pumpking privately first if you prefer.

Whilst patches to make the list shorter are most welcome, ideas to add to
the list are also encouraged. Check the perl5-porters archives for past
ideas, and any discussion about them. One set of archives may be found at:

 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
not, but if your patch is incorporated, then we'll add your name to the
F<AUTHORS> file, which ships in the official distribution. How many other
programming languages offer you 1 line of immortality?

=head1 The roadmap to 5.10

The roadmap to 5.10 envisages feature-based releases, as various items in this
TODO are completed.

=head2 Needed for a 5.9.4 release

=over

=item *

Review assertions. Review the syntax for combining assertions. Assertions
could take advantage of the lexical pragmas work. See
L</What hooks would assertions need?>.

=item *

C<encoding::warnings> should be turned into a lexical pragma.

=back

=head2 Needed for a 5.9.5 release

=over

=item *

Implement L</_ prototype character>.

=item *

Implement L</state variables>.

=back

=head2 Needed for a 5.9.6 release

Stabilisation. If all goes well, this will be the equivalent of a 5.10-beta.

=head1 Tasks that only need Perl knowledge

=head2 common test code for timed bail out

Write portable self-destruct code for tests to stop them burning CPU in
infinite loops. This needs to avoid using C<alarm>, as some of the tests are
testing C<alarm>/C<sleep> or timers.
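
One possible shape for this is sketched below. It assumes C<fork> is available, so it is not yet the portable solution asked for. A helper forks a watchdog process that sleeps and then kills the test process, and the test process itself never calls C<alarm>. The names C<start_watchdog> and C<stop_watchdog> are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork a watchdog that SIGKILLs the calling process
# if it is still running after $timeout seconds. The test process never
# calls alarm(), so tests of alarm/sleep/timers are unaffected.
# Assumes fork() is available.
sub start_watchdog {
    my ($timeout) = @_;
    my $test_pid = $$;
    my $watchdog = fork;
    die "fork failed: $!" unless defined $watchdog;
    if ($watchdog == 0) {
        # Child: wait, then kill the test process.
        sleep $timeout;
        kill 'KILL', $test_pid;
        # _exit skips END blocks and destructors inherited from the parent
        # (for example a test framework's END block).
        POSIX::_exit(0);
    }
    return $watchdog;    # parent: keep the pid so the watchdog can be cancelled
}

# Cancel the watchdog once the test has finished normally.
sub stop_watchdog {
    my ($watchdog) = @_;
    kill 'KILL', $watchdog;
    waitpid $watchdog, 0;
}
```

A test would call C<start_watchdog(60)> near the top and C<stop_watchdog($pid)> just before it exits. A real implementation would also have to cope with platforms where C<fork> is emulated or missing.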

=head2 POD -> HTML conversion in the core still sucks

Which is crazy given just how simple POD purports to be, and how simple HTML
can be. It's not actually I<as> simple as it sounds, particularly with the
flexibility POD allows for C<=item>, but it would be good to improve the
visual appeal of the HTML generated, and to avoid it having any validation
errors. See also L</make HTML install work>, as the layout of the installation
tree is needed to improve the cross-linking.

The addition of C<Pod::Simple> and its related modules may make this task
easier to complete.
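
For experiments, C<Pod::Simple::HTML> (part of the C<Pod::Simple> family) can render a POD string directly. This minimal sketch only shows how to drive it. Whether its output is a suitable base for the core's HTML is exactly what would need evaluating.

```perl
use strict;
use warnings;
use Pod::Simple::HTML;

# Render a small POD document to an HTML string held in memory.
my $pod = "=head1 NAME\n\nperltodo - Perl TO-DO List\n\n=head2 Example\n\nSee C<=item>.\n";

my $html   = '';
my $parser = Pod::Simple::HTML->new;
$parser->output_string(\$html);    # collect the output in $html
$parser->parse_string_document($pod);

print $html;
```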

=head2 Parallel testing

The core regression test suite is getting ever more comprehensive, which has
the side effect that it takes longer to run. This isn't so good. Investigate
whether it would be feasible to give the harness script the B<option> of
running sets of tests in parallel. This would be useful for tests in
F<t/op/*.t> and F<t/uni/*.t> and maybe some sets of tests in F<lib/>.

Questions to answer:

=over 4

=item 1

How does screen layout work when you're running more than one test?

=item 2

How does the caller of the harness specify how many tests to run in parallel?

=item 3

How do setup/teardown tests identify themselves?

=back

Pugs already does parallel testing; can its approach be reused?
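
The scheduling part could start out as simple as the sketch below, which assumes C<fork>/C<exec> are available. C<run_parallel> is a hypothetical name. It runs each test file with the current perl (C<$^X>), keeps at most C<$jobs> children alive at once, and records each file's wait status. It deliberately leaves the interleaved-output problem (question 1) unsolved.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical sketch: run test files in parallel, at most $jobs at a time.
# Returns a hash ref mapping each file to its wait status ($?).
sub run_parallel {
    my ($jobs, @files) = @_;
    my %running;    # pid  => file
    my %status;     # file => wait status
    while (@files or %running) {
        # Start new children until the job limit is reached.
        while (@files and keys(%running) < $jobs) {
            my $file = shift @files;
            my $pid  = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                exec $^X, $file or POSIX::_exit(127);
            }
            $running{$pid} = $file;
        }
        # Wait for any child to finish, then record its status.
        my $pid = waitpid(-1, 0);
        last if $pid == -1;
        next unless exists $running{$pid};
        $status{ delete $running{$pid} } = $?;
    }
    return \%status;
}
```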

=head2 Make Schwern poorer

We should have tests for everything. When all the core's modules are tested,
Schwern has promised to donate $500 to TPF. We may need volunteers to
hold him upside down and shake vigorously in order to actually extract the
cash.

See F<t/lib/1_compile.t> for the 3 remaining modules that need tests.

=head2 Improve the coverage of the core tests

Use Devel::Cover to ascertain the core's test coverage, then add tests that
are currently missing.

=head2 test B

A full test suite for the B module would be nice.
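
Tests in such a suite could look like the sketch below. It uses C<Test::More> and only documented C<B> entry points: C<B::svref_2object>, C<B::class>, and the C<GV>/C<STASH> accessors of a C<B::CV>. The sub C<example_sub> is a hypothetical fixture.

```perl
use strict;
use warnings;
use Test::More tests => 3;
use B ();

sub example_sub { return 42 }    # hypothetical fixture

# svref_2object wraps a Perl reference in the matching B:: object.
my $cv = B::svref_2object(\&example_sub);

is(B::class($cv),    'CV',          'a code ref yields a B::CV object');
is($cv->GV->NAME,    'example_sub', 'the CV knows the name of its glob');
is($cv->STASH->NAME, 'main',        'and the package it was compiled in');
```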

=head2 A decent benchmark

C<perlbench> seems impervious to any recent changes made to the perl core. It
would be useful to have a reasonable general benchmarking suite that roughly
represented what current perl programs do, and measurably reported whether
tweaks to the core improve, degrade or don't really affect performance, to
guide people attempting to optimise the guts of perl. Gisle would welcome
new tests for perlbench.

=head2 fix tainting bugs

Fix the bugs revealed by running the test suite with the C<-t> switch (via
C<make test.taintwarn>).

=head2 Dual life everything

As part of the "dists" plan, anything that doesn't belong in the smallest perl
distribution needs to be dual-lifed. Anything else can be too. Figure out what
changes would be needed to package that module and its tests up for CPAN, and
do so. Test it with older perl releases, and fix the problems you find.

=head2 Improving C<threads::shared>

Investigate whether C<threads::shared> could share aggregates properly with
only Perl level changes to F<shared.pm>.

=head2 POSIX memory footprint

Ilya observed that C<use POSIX;> eats memory like there's no tomorrow, and at
various times worked to cut it down. There is probably still fat to cut out;
for example, POSIX passes Exporter some very memory hungry data structures.

=head1 Tasks that need a little sysadmin-type knowledge

Or if you prefer, tasks that you would learn from, and broaden your skill
base...

=head2 Relocatable perl

The C level patches needed to create a relocatable perl binary are done, as
is the work on F<Config.pm>. All that's left to do is the C<Configure> tweaking
to let people specify how they want to do the install.

=head2 make HTML install work

There is an C<installhtml> target in the Makefile. It's marked as
"experimental". It would be good to get this tested, make it work reliably, and
remove the "experimental" tag. This would include:

=over 4

=item 1

Checking that cross linking between various parts of the documentation works.
In particular, that links work between the modules (files with POD in F<lib/>)
and the core documentation (files in F<pod/>).

=item 2

Work out how to split C<perlfunc> into chunks, preferably one per function
group, ideally with general-purpose code that could be used elsewhere.
Challenges here are correctly identifying the groups of functions that go
together, and making the right named external cross-links point to the right
page. Things to be aware of are C<-X>, groups such as C<getpwnam> to
C<endservent>, two or more C<=item>s giving the different parameter lists, such
as

 =item substr EXPR,OFFSET,LENGTH,REPLACEMENT

 =item substr EXPR,OFFSET,LENGTH

 =item substr EXPR,OFFSET

and different parameter lists having different meanings (e.g. C<select>).

=back

=head2 compressed man pages

Be able to install them. This would probably need a configure test to see how
the system does compressed man pages (same directory/different directory?
same filename/different filename), as well as tweaking the F<installman> script
to compress as necessary.

=head2 Add a code coverage target to the Makefile

Make it easy for anyone to run Devel::Cover on the core's tests. The steps
to do this manually are roughly:

=over 4

=item *

do a normal C<Configure>, but include Devel::Cover as a module to install
(see F<INSTALL> for how to do this)

=item *

 make perl

=item *

 cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

=item *

Process the resulting Devel::Cover database

=back

This just gives you the coverage of the F<.pm>s. To also get the C level
coverage you need to:

=over 4

=item *

Additionally tell C<Configure> to use the appropriate C compiler flags for
C<gcov>

=item *

 make perl.gcov

(instead of C<make perl>)

=item *

After running the tests, run C<gcov> to generate all the F<.gcov> files
(including those in the subdirectories of F<ext/>).

=item *

(From the top level perl directory) run C<gcov2perl> on all the F<.gcov> files
to get their stats into the F<cover_db> directory.

=item *

Then process the Devel::Cover database

=back
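
The F<.gcov> files for the C<gcov2perl> step could be collected with a short script like the sketch below. It uses only the core C<File::Find> module. The helper name C<gcov2perl_command> is hypothetical. The sketch assumes C<gcov2perl> (from Devel::Cover) is on the C<PATH> and accepts the F<.gcov> files as arguments.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: build the gcov2perl command line for every .gcov
# file below $top, including those in the subdirectories of ext/.
sub gcov2perl_command {
    my ($top) = @_;
    my @gcov;
    find(
        {
            wanted   => sub { push @gcov, $File::Find::name if /\.gcov\z/ },
            no_chdir => 1,    # $_ is then the full path, not the basename
        },
        $top,
    );
    return ('gcov2perl', sort @gcov);
}

# Usage, from the top level perl directory:
#   my @cmd = gcov2perl_command('.');
#   system(@cmd) == 0 or die "gcov2perl failed: $?";
```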

It would be good to add a single switch to C<Configure> to specify that you
wanted to perform perl level coverage, and another to specify C level
coverage, and have C<Configure> and the F<Makefile> do all the right things
automatically.

=head2 Make Config.pm cope with differences between build and installed perl

Quite often vendors ship a perl binary compiled with their (pay-for)
compilers. People install a free compiler, such as gcc. To work out how to
build extensions, Perl interrogates C<%Config>, so in this situation
C<%Config> describes compilers that aren't there, and extension building
fails. This forces people into choosing between re-compiling perl themselves
using the compiler they have, or only using modules that the vendor ships.

It would be good to find a way to teach C<Config.pm> about the installation
setup, possibly involving probing at install time or later, so that the
C<%Config> in a binary distribution better describes the installed machine,
when the installed machine differs from the build machine in some significant
way.
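
Install-time probing of this kind could start with something like the sketch below. It checks whether the compiler recorded in C<$Config{cc}> actually exists on this machine. The helper C<compiler_on_path> is hypothetical. It assumes a Unix-like C<PATH> and does not handle F<.exe> suffixes.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: return the full path of the compiler named in $cc
# if it can be found and executed here, or undef otherwise.
sub compiler_on_path {
    my ($cc) = @_;
    my ($prog) = split ' ', $cc;    # $Config{cc} may carry extra words
    return undef unless defined $prog;
    if (File::Spec->file_name_is_absolute($prog)) {
        return (-x $prog && !-d _) ? $prog : undef;
    }
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -x $candidate && !-d _;
    }
    return undef;
}

my $found = compiler_on_path($Config{cc});
print defined $found
    ? "cc from %Config found at $found\n"
    : "cc from %Config ($Config{cc}) is missing on this machine\n";
```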

=head2 make parallel builds work

Currently parallel builds (such as C<make -j3>) don't work reliably. We believe
that this is due to incomplete dependency specification in the F<Makefile>.
It would be good if someone were able to track down the causes of these
problems, so that parallel builds worked properly.

=head2 linker specification files

Some platforms mandate that you provide a list of a shared library's external
symbols to the linker, so the core already has the infrastructure in place to
do this for generating shared perl libraries. My understanding is that the
GNU toolchain can accept an optional linker specification file, and restrict
visibility just to symbols declared in that file. It would be good to extend
F<makedef.pl> to support this format, and to provide a means within
C<Configure> to enable it. This would allow Unix users to test that the
export list is correct, and to build a perl that does not pollute the global
namespace with private symbols.

=head1 Tasks that need a little C knowledge

These tasks would need a little C knowledge, but don't need any specific
background or experience with XS, or how the Perl interpreter works.

=head2 Make it clear from -v if this is the exact official release

Currently perl from C<p4>/C<rsync> ships with a F<patchlevel.h> file that
usually defines one local patch, of the form "MAINT12345" or "RC1". The output
of C<perl -v> doesn't report that a perl isn't an official release, and this
information can get lost in bug reports. Because of this, the minor version
isn't bumped up until RC time, to minimise the possibility of versions of perl
escaping that believe themselves to be newer than they actually are.

It would be useful to find an elegant way to have the "this is an interim
maintenance release" or "this is a release candidate" in the terse C<-v>
output, and have it so that it's easy for the pumpking to remove this just as
the release tarball is rolled up. This way the version pulled out of rsync
would always say "I'm a development release" and it would be safe to bump the
reported minor version as soon as a release ships, which would aid perl
developers.

This task is really about thinking of an elegant way to arrange the C source
such that it's trivial for the Pumpking to flag "this is an official release"
when making a tarball, yet leave the default source saying "I'm not the
official release".

=head2 Ordering of "global" variables.

F<thrdvar.h> and F<intrpvar.h> define the "global" variables that need to be
per-thread under ithreads, where the variables are actually elements in a
structure. As C dictates, the variables must be laid out in order of
declaration. There is a comment
C</* Important ones in the first cache line (if alignment is done right) */>
which implies that at some point in the past the ordering was carefully chosen
(at least in part). However, it's clear that the ordering is less than perfect,
as currently there are things such as 7 C<bool>s in a row, then something
typically requiring 4 byte alignment, and then an odd C<bool> later on.
(C<bool>s are typically defined as C<char>s.) So it would be good for someone
to review the ordering of the variables, to see how much alignment padding can
be removed.
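
A little arithmetic shows what is at stake. This simplified sketch uses Perl's C<pack> alignment code C<x!> to mimic C struct layout. It assumes that C<bool> is a 1-byte C<char>, that the aligned member is 4 bytes wide with 4-byte alignment, and that the struct is padded to a multiple of that alignment. Real layouts depend on the compiler and platform.

```perl
use strict;
use warnings;

# 'C' = a 1-byte bool, 'L' = a 4-byte member, 'x![L]' = pad to 4-byte alignment.
# Interleaved: 7 bools, a 4-byte member, then one more bool.
my $interleaved = pack 'C7 x![L] L C x![L]', (1) x 7, 2, 3;

# Reordered: all 8 bools first, then the 4-byte member.
my $grouped = pack 'C8 L x![L]', (1) x 8, 2;

printf "interleaved: %d bytes, grouped: %d bytes\n",
    length $interleaved, length $grouped;    # 16 vs 12
```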

It's also worth checking that all variables are actually used. Perl 5.8.0
shipped with C<PL_nrs> still defined in F<thrdvar.h>, despite it being unused
since a change over a year earlier. Had this been spotted before release, it
could have been removed, but now it has to remain in the 5.8.x releases to
keep the structure the same size, to retain binary compatibility.

=head2 am I hot or not?

The idea of F<pp_hot.c> is that it contains the I<hot> ops, the ops that are
most commonly used. The idea is that by grouping them, their object code will
be adjacent in the executable, so they have a greater chance of already being
in the CPU cache (or swapped in) due to being near another op already in use.

Except that it's not clear if these really are the most commonly used ops. So
anyone feeling like exercising their skill with coverage and profiling tools
might want to determine what ops I<really> are the most commonly used. And in
turn suggest evictions and promotions to achieve a better F<pp_hot.c>.

=head1 Tasks that need a knowledge of XS

These tasks would need C knowledge, and roughly the level of knowledge of
the perl API that comes from writing modules that use XS to interface to
C.

=head2 IPv6

Clean up the IPv6 support. Check that everything in the core works with IPv6.

=head2 shrink C<GV>s, C<CV>s

By removing unused elements and careful re-ordering, the structures for C<AV>s
and C<HV>s have recently been shrunk considerably. It's probable that the same
approach would find savings in C<GV>s and C<CV>s, if not all the other
larger-than-C<PVMG> types.

=head2 UTF8 caching code

The string position/offset cache is not optional. It should be.

=head2 Implicit Latin 1 => Unicode translation

Conversions from byte strings to UTF-8 currently map high bit characters
to Unicode without translation (or, depending on how you look at it, by
implicitly assuming that the byte strings are in Latin-1). As perl assumes
the C locale by default, upgrading a string to UTF-8 may change the
meaning of its contents regarding character classes, case mapping, etc.
This should probably emit a warning (at least).
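
Here is a minimal demonstration of that change in meaning. It assumes that none of C<use locale>, C<use feature 'unicode_strings'> or C<use v5.12> (or later) is in scope, since each of those changes the rules.

```perl
use strict;
use warnings;

my $bytes = "\xE9";       # LATIN SMALL LETTER E WITH ACUTE, stored as one byte
my $chars = $bytes;
utf8::upgrade($chars);    # same character, now stored internally as UTF-8

# The two strings compare equal...
print "equal\n" if $bytes eq $chars;

# ...but character classes and case mapping treat them differently.
printf "byte string matches \\w: %s\n",     ($bytes =~ /\w/ ? 'yes' : 'no');    # no
printf "upgraded string matches \\w: %s\n", ($chars =~ /\w/ ? 'yes' : 'no');    # yes
printf "uc changes byte string: %s\n",      (uc($bytes) ne $bytes ? 'yes' : 'no');    # no
```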

This task is incremental - even a little bit of work on it will help.

=head2 autovivification

Make all autovivification consistent with respect to LVALUE/RVALUE and
strict/no strict.
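
The three snippets below, all under C<use strict>, illustrate the kind of inconsistency meant here:

=over 4

=item *

An rvalue C<exists> test creates the intermediate hash.

=item *

An rvalue dereference of C<undef> dies.

=item *

Iterating over the same dereference with C<foreach> silently creates an array.

=back

```perl
use strict;
use warnings;

# 1. A read-only test still autovivifies the intermediate $h{a}.
my %h;
my $found = exists $h{a}{b};
my $vivified_by_exists = exists $h{a} ? 1 : 0;    # 1

# 2. An rvalue dereference of undef dies under strict refs...
my $r1;
my $died = eval { my @copy = @$r1; 1 } ? 0 : 1;    # 1

# 3. ...but foreach (an lvalue context) quietly creates the array.
my $r2;
for my $elem (@$r2) { }
my $vivified_by_for = ref $r2;                     # 'ARRAY'

print "exists: $vivified_by_exists, rvalue died: $died, foreach: $vivified_by_for\n";
```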

This task is incremental - even a little bit of work on it will help.

=head2 Unicode in Filenames

chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
system, truncate, unlink, utime, -X. All these could potentially accept
Unicode filenames either as input or output (and in the case of system
and qx Unicode in general, as input or output to/from the shell).
Whether a filesystem/operating system pair understands Unicode in
filenames varies.

Known combinations that have some level of understanding include
Microsoft NTFS, Apple HFS+ (in Mac OS 9 and X), Apple UFS (in Mac OS X)
and, of course, Plan 9; NFS v4 is rumored to be Unicode. How to
create Unicode filenames, what forms of Unicode are accepted and used
(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
and so on, varies. Finding the right level of interfacing to Perl
requires some thought. Remember that an OS does not imply a
filesystem.

(The Windows C<-C> command flag "wide API support" has been at least
temporarily retired in 5.8.1, and C<-C> has been repurposed; see
L<perlrun>.)

=head2 Unicode in %ENV

Currently the C<%ENV> entries are always byte strings.

=head2 use less 'memory'

Investigate the trade-offs in switching perl's choices on memory usage.
In particular, perl should be able to give memory back.

This task is incremental - even a little bit of work on it will help.

=head2 Re-implement C<:unique> in a way that is actually thread-safe

The old implementation made bad assumptions on several levels. A good 90%
solution might be just to make C<:unique> work to share the string buffer
of C<SvPV>s. That way large constant strings can be shared between ithreads,
such as the configuration information in F<Config>.

=head2 Make tainting consistent

Tainting would be easier to use if it didn't take documented shortcuts and
allow taint to "leak" everywhere within an expression.

=head2 readpipe(LIST)

C<system()> accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
running a shell. C<readpipe()> (the function behind C<qx//>) could be
similarly extended.
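
Until then, the behaviour can be approximated in Perl with a list-form pipe C<open>. Like C<system LIST>, it runs the program directly with no shell. C<readpipe_list> is a hypothetical name, and list-form pipe opens are not available on every platform.

```perl
use strict;
use warnings;

# Hypothetical helper: like qx// but takes a LIST and bypasses the shell.
# Returns the output as one string in scalar context, or as lines in list
# context, and leaves the child's wait status in $?, as qx// does.
sub readpipe_list {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Can't run $cmd[0]: $!";
    my @lines = <$fh>;
    close $fh;    # waits for the child and sets $?; a non-zero exit is not fatal
    return wantarray ? @lines : join '', @lines;
}

# The semicolon reaches echo as a literal argument; no shell interprets it.
my $out = readpipe_list('echo', 'a;b');
print $out;    # a;b
```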

=head1 Tasks that need a knowledge of the interpreter

These tasks would need C knowledge, and knowledge of how the interpreter works,
or a willingness to learn.

=head2 lexical pragmas

Document the new support for lexical pragmas in 5.9.3 and how C<%^H> works.
C<re>, C<encoding> and maybe other pragmas could be made lexical.
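
The sketch below is a minimal user-defined lexical pragma built on C<%^H>, the kind of example such documentation would need to explain. C<MyPragma> and its hash key are hypothetical. Reading the hints at run time through the eleventh element of C<caller> is the interface that shipped in perl 5.10, so the sketch assumes 5.10 or later.

```perl
use strict;
use warnings;

package MyPragma;

# Called at compile time (as "use MyPragma" would be); the %^H entry is
# scoped to the block currently being compiled.
sub import   { $^H{'MyPragma/on'} = 1 }
sub unimport { $^H{'MyPragma/on'} = 0 }

# At run time, the 11th element of caller() is a copy of the caller's %^H.
sub in_effect {
    my $hints = (caller(0))[10];
    return ($hints && $hints->{'MyPragma/on'}) ? 1 : 0;
}

package main;

my @seen;
{
    BEGIN { MyPragma->import }            # what "use MyPragma;" does here
    push @seen, MyPragma::in_effect();    # inside the block: 1
}
push @seen, MyPragma::in_effect();        # outside the block: 0
print "@seen\n";                          # 1 0
```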

=head2 Attach/detach debugger from running program

The old perltodo notes "With C<gdb>, you can attach the debugger to a running
program if you pass the process ID. It would be good to do this with the Perl
debugger on a running Perl program, although I'm not sure how it would be
done." ssh and screen do this with named pipes in F</tmp>. Maybe we can too.

=head2 Constant folding

The peephole optimiser should trap errors during constant folding, and give
up on the folding, rather than bailing out at compile time. It is quite
possible that the unfoldable constant is in unreachable code, e.g. something
akin to C<$a = 0/0 if 0;>

=head2 LVALUE functions for lists

The old perltodo notes that lvalue functions don't work for list or hash
slices. This would be good to fix.

=head2 LVALUE functions in the debugger

The old perltodo notes that lvalue functions don't work in the debugger. This
would be good to fix.

=head2 _ prototype character

Study the possibility of adding a new prototype character, C<_>, meaning
"this argument defaults to $_".
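
Today the same effect has to be coded by hand, as in this sketch. The proposed prototype character would let a declaration such as C<sub my_lc(_)> express it directly. C<my_lc> is a hypothetical example function.

```perl
use strict;
use warnings;

# Hand-written version of "this argument defaults to $_":
# use the first argument if one was passed, otherwise $_.
sub my_lc {
    my $arg = @_ ? $_[0] : $_;
    return lc $arg;
}

print my_lc('ABC'), "\n";              # abc
for ('XYZ') { print my_lc(), "\n" }    # xyz
```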

=head2 state variables

C<my $foo if 0;> is deprecated, and should be replaced with the Perl 6 syntax
C<state $x = "initial value\n";>.
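
Without C<state>, the supported way to get a variable that keeps its value between calls is a closure over a lexical in an enclosing block, as in this sketch. C<next_id> is a hypothetical name. The C<state> syntax would let the declaration live inside the sub instead.

```perl
use strict;
use warnings;

{
    # Private to next_id and kept between calls: the closure equivalent
    # of "state $count = 0;" inside the sub.
    my $count = 0;
    sub next_id { return ++$count }
}

print next_id(), ' ', next_id(), "\n";    # 1 2
```

One drawback that C<state> would remove: the C<= 0> initialisation only happens when the enclosing block is executed at run time, not when the sub is compiled.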

=head2 @INC source filter to Filter::Simple

The second return value from a sub in C<@INC> can be a source filter. This
isn't documented. It should be changed to use Filter::Simple, tested and
documented.

=head2 regexp optimiser optional

The regexp optimiser is not optional. It should be configurable, to allow
its performance to be measured, and its bugs to be easily demonstrated.

=head2 UNITCHECK

Introduce a new special block, UNITCHECK, which is run at the end of a
compilation unit (module, file, eval(STRING) block). This will correspond to
the Perl 6 CHECK. Perl 5's CHECK cannot be changed or removed because the
F<O.pm>/F<B.pm> backend framework depends on it.

=head2 optional optimizer

Make the peephole optimizer optional. Currently it performs two tasks as
it walks the optree: genuine peephole optimisations, and necessary fixups of
ops. It would be good to find an efficient way to switch out the
optimisations whilst keeping the fixups.

=head2 You WANT *how* many

Currently contexts are void, scalar and list. C<split> has a special mechanism
in place to pass in the number of return values wanted. It would be useful to
have a general mechanism for this, backwards compatible and with little speed
hit. This would allow proposals such as short circuiting sort to be
implemented as a module on CPAN.
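
The sketch below shows what a sub can learn about its calling context today. C<wantarray> distinguishes void, scalar and list, but a caller asking for one value looks identical to a caller asking for all of them. The sub name C<context> is illustrative only.

```perl
use strict;
use warnings;

our $last_context;    # records what the most recent call could see

sub context {
    $last_context = wantarray ? 'list' : defined wantarray ? 'scalar' : 'void';
    return $last_context;
}

my %seen;
context();                $seen{void}   = $last_context;
my $s     = context();    $seen{scalar} = $last_context;
my ($one) = context();    $seen{one}    = $last_context;    # wants 1 value
my @all   = context();    $seen{all}    = $last_context;    # wants every value

# Both list assignments report plain 'list': the count is not available.
print join(', ', map { "$_=$seen{$_}" } sort keys %seen), "\n";
# all=list, one=list, scalar=scalar, void=void
```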

=head2 lexical aliases

Allow lexical aliases (maybe via the syntax C<my \$alias = \$foo>).
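
The nearest existing mechanism is the aliasing done by C<foreach>. It binds a lexical to a variable rather than copying it, but only for the duration of the loop body. A minimal sketch:

```perl
use strict;
use warnings;

my $foo = 1;

# $alias is not a copy: for the duration of the loop body it *is* $foo.
for my $alias ($foo) {
    $alias = 42;
}

print "$foo\n";    # 42
```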

=head2 entersub XS vs Perl

At the moment C<pp_entersub> is huge, and has code to deal with entering both
perl and XS subroutines. Subroutine implementations rarely change between
perl and XS at run time, so investigate using 2 ops to enter subs (one for
XS, one for perl) and swapping between them if a sub is redefined.

=head2 Self ties

Self ties are currently illegal because they caused too many segfaults. Maybe
the causes of these could be tracked down and self-ties on all types
reinstated.

=head2 Optimize away @_

The old perltodo notes: "Look at the reification code in F<av.c>".

=head2 What hooks would assertions need?

Assertions are in the core, and work. However, assertions needed to be added
as a core patch, rather than an XS module in F<ext/>, or a CPAN module, because
the core has no hooks in the necessary places. It would be useful to
investigate what hooks would need to be added to make it possible to provide
the full assertion support from a CPAN module, so that we aren't constraining
the imagination of future CPAN authors.

=head1 Big projects

Tasks that will get your name mentioned in the description of the "Highlights
of 5.10".

=head2 make ithreads more robust

Generally make ithreads more robust. See also L</iCOW>.

This task is incremental - even a little bit of work on it will help, and
will be greatly appreciated.

One part of this would be to write the missing code in C<Perl_dirp_dup> in
F<sv.c>.

=head2 iCOW

Sarathy and Arthur have a proposal for an improved Copy On Write which
specifically will be able to COW new ithreads. If this can be implemented
it would be a good thing.

=head2 (?{...}) closures in regexps

Fix (or rewrite) the implementation of the C</(?{...})/> closures.

=head2 A re-entrant regexp engine

This will allow the use of a regex from inside C<(?{ })>, C<(??{ })> and
C<(?(?{ })|)> constructs.