C<perlio> provides this, but the interface could be a lot more
straightforward.
-=head2 Eliminate need for "use utf8";
-
-While the C<utf8> pragma is autoloaded when necessary, it's still needed
-for things like Unicode characters in a source file. The UTF8 hint can
-always be set to true, but it needs to be set to false when F<utf8.pm>
-is being compiled. (To stop Perl trying to autoload the C<utf8>
-pragma...)
-
-=head2 Create a char *sv_printify(sv, STRLEN *lenp, UV flags) function
-
-For displaying PVs with control characters, embedded nulls, and Unicode.
-This would be useful for printing warnings, or data and regex dumping,
-not_a_number(), and so on.
-
-=head2 Autoload byte.pm
+=head2 Autoload bytes.pm
When the lexer sees, for instance, C<bytes::length>, it should
automatically load the C<bytes> pragma.
C<"\xF00"> and C<"\U{F00}"> on P5P I<will> lead to a long and boring
flamewar.
+=head2 Create a char *sv_pvprintify(sv, STRLEN *lenp, UV flags)
+
+For displaying PVs with control characters, embedded nulls, and Unicode.
+This would be useful for printing warnings, or data and regex dumping,
+not_a_number(), and so on.
+
+Requirements: should handle both byte and UTF8 strings. isPRINT()
+characters printed as-is, character less than 256 as \xHH, Unicode
+characters as \x{HHH}. Don't assume ASCII-like, either, get somebody
+on EBCDIC to test the output.
+
+Possible options, controlled by the flags:
+- whitespace (other than ' ' of isPRINT()) printed as-is
+- use isPRINT_LC() instead of isPRINT()
+- print control characters like this: "\cA"
+- print control characters like this: "^A"
+- non-PRINTables printed as '.' instead of \xHH
+- use \OOO instead of \xHH
+- use the C/Perl-metacharacters like \n, \t
+- have a maximum length for the produced string (read it from *lenp)
+- append a "..." to the produced string if the maximum length is exceeded
+- really fancy: print unicode characters as \N{...}
+
=head2 Overloadable regex assertions
This may or may not be possible with the current regular expression
=head2 Unicode regular expression character classes
-They have some tricks Perl doesn't yet implement.
+They have some tricks Perl doesn't yet implement like character
+class subtraction.
http://www.unicode.org/unicode/reports/tr18/
target systems sees is not the same what the build host sees, various
input, output, and (Perl) library files need to be copied back and forth.
+As of 5.8.0 Configure mostly works for cross-compilation
+(used successfully for iPAQ Linux), miniperl gets built,
+but then building DynaLoader (and other extensions) fails
+since MakeMaker knows nothing of cross-compilation.
+(See INSTALL/Cross-compilation for the state of things.)
+
=head2 Perl preprocessor / macros
Source filters help with this, but do not get us all the way. For
There should be a way of restarting the debugger on demand.
+=head2 Test Suite for the Debugger
+
+The debugger is a complex piece of software and fixing something
+here may inadvertently break something else over there. To tame
+this chaotic behaviour, a test suite is necessary.
+
=head2 my sub foo { }
The basic principle is sound, but there are problems with the semantics
It's unclear what this should do or how to do it without breaking old
code.
-=head2 Make tr/// return histogram
+=head2 Make tr/// return histogram of characters in list context
There is a patch for this, but it may require Unicodification.
=head2 "class"-based lexicals
Use flyweight objects, secure hashes or, dare I say it, pseudo-hashes instead.
+(Or whatever will replace pseudohashes in 5.10.)
=head2 byteperl
=head2 Lazy evaluation / tail recursion removal
-C<List::Util> in core gives some of these; tail recursion removal is
-done manually, with C<goto &whoami;>. (However, MJD has found that
-C<goto &whoami> introduces a performance penalty, so maybe there should
-be a way to do this after all: C<sub foo {START: ... goto START;> is
-better.)
+C<List::Util> gives first() (a short-circuiting grep); tail recursion
+removal is done manually, with C<goto &whoami;>. (However, MJD has
+found that C<goto &whoami> introduces a performance penalty, so maybe
+there should be a way to do this after all: C<sub foo {START: ... goto
+START;> is better.)
=head2 Make "use utf8" the default
-There is a patch available for this, search p5p archives for
-the Subject "[EXPERIMENTAL PATCH] make unicode (utf8) default"
-but this would be unacceptable because of backward compatibility:
-scripts could not contain B<any legacy eight-bit data>. Also would
-introduce a measurable slowdown of at least few percentages since all
-regular expression operations would be done in full UTF-8.
+Because of backward compatibility this is difficult: scripts could not
+contain B<any legacy eight-bit data> (like Latin-1) anymore, even in
+string literals or pod. Also would introduce a measurable slowdown of
+at least few percentages since all regular expression operations would
+be done in full UTF-8. But if you want to try this, add
+-DUSE_UTF8_SCRIPTS to your compilation flags.
+