C<perlio> provides this, but the interface could be a lot more
straightforward.
-=head2 Eliminate need for "use utf8";
+=head2 Autoload bytes.pm
-While the C<utf8> pragma is autoloaded when necessary, it's still needed
-for things like Unicode characters in a source file. The UTF8 hint can
-always be set to true, but it needs to be set to false when F<utf8.pm>
-is being compiled. (To stop Perl trying to autoload the C<utf8>
-pragma...)
+When the lexer sees, for instance, C<bytes::length>, it should
+automatically load the C<bytes> pragma.
+
+=head2 Make "\u{XXXX}" et al work
+
+Danger, Will Robinson! Discussing the semantics of C<"\x{F00}">,
+C<"\xF00"> and C<"\U{F00}"> on P5P I<will> lead to a long and boring
+flamewar.
=head2 Create a char *sv_pvprintify(sv, STRLEN *lenp, UV flags)
- append a "..." to the produced string if the maximum length is exceeded
- really fancy: print unicode characters as \N{...}
-=head2 Autoload byte.pm
-
-When the lexer sees, for instance, C<bytes::length>, it should
-automatically load the C<bytes> pragma.
-
-=head2 Make "\u{XXXX}" et al work
-
-Danger, Will Robinson! Discussing the semantics of C<"\x{F00}">,
-C<"\xF00"> and C<"\U{F00}"> on P5P I<will> lead to a long and boring
-flamewar.
-
=head2 Overloadable regex assertions
This may or may not be possible with the current regular expression
algorithmically computed if you're dealing with Thai text. Hence, the
B<\b> assertion wants to be overloaded by a function.
-=head2 Unicode collation and normalization
+=head2 Unicode
-Simon Cozens promises to work on this.
+=over 4
- Collation? http://www.unicode.org/unicode/reports/tr10/
- Normalization? http://www.unicode.org/unicode/reports/tr15/
+=item *
+
+Allow for the long form of the General Category Properties, e.g.
+C<\p{IsOpenPunctuation}>, not just the abbreviated form, e.g.
+C<\p{IsPs}>.
+
+=item *
+
+Allow for the metaproperties: C<XID Start>, C<XID Continue>,
+C<NF*_NO>, C<NF*_MAYBE> (require the DerivedCoreProperties and
+DerivedNormalizationProperties files).
-=head2 Unicode case mappings
+There are also enumerated properties: C<Decomposition Type>,
+C<Numeric Type>, C<East Asian Width>, C<Line Break>. These
+properties have multiple values: for uniqueness the property
+value should be appended. For example, C<\p{IsAlphabetic}>
+would be the binary property, while C<\p{AlphabeticLineBreak}>
+would mean the enumerated property.
+
+=item *
Case Mappings? http://www.unicode.org/unicode/reports/tr21/
-=head2 Unicode regular expression character classes
+lc(), uc(), lcfirst(), and ucfirst() work only for some of the
+simplest cases, where the mapping goes from a single Unicode character
+to another single Unicode character. See lib/unicore/SpecCase.txt
+(and CaseFold.txt).
+
+=item *
They have some tricks Perl doesn't yet implement like character
class subtraction.
http://www.unicode.org/unicode/reports/tr18/
-=head2 Unicode Scripts support
+=back
-Currently the C<\p{In...}> supports only the Blocks database, like
-C<\p{BasicLatin}>, C<\p{InGreek}>, C<\p{InThai}>, but there's also the
-Scripts database, which has members like C<Latin>, C<Greek>,
-C<Armenian>, C<Han>. It is desireable that also the script names
-could be used for the C<\p{In...}> construct. Note: needs to be
-researched whether this is possible, that is, are there conflicts
-between the Blocks and the Scripts, is the Blocks Greek the same as
-the Scripts Greek?
+See L<perlunicode/UNICODE REGULAR EXPRESSION SUPPORT LEVEL> for what's
+there and what's missing. Almost all of Levels 2 and 3 is missing,
+and as of 5.8.0 not even all of Level 1 is there.
=head2 use Thread for iThreads
target systems sees is not the same what the build host sees, various
input, output, and (Perl) library files need to be copied back and forth.
+As of 5.8.0 Configure mostly works for cross-compilation
+(used successfully for iPAQ Linux) and miniperl gets built,
+but building DynaLoader (and other extensions) then fails
+since MakeMaker knows nothing of cross-compilation.
+(See INSTALL/Cross-compilation for the state of things.)
+
=head2 Perl preprocessor / macros
Source filters help with this, but do not get us all the way. For
=head2 All ARGV input should act like E<lt>E<gt>
+For example, C<read(ARGV, ...)> doesn't currently read across multiple files.
+
=head2 Support for rerunning debugger
There should be a way of restarting the debugger on demand.
It's unclear what this should do or how to do it without breaking old
code.
-=head2 Make tr/// return histogram
+=head2 Make tr/// return histogram of characters in list context
There is a patch for this, but it may require Unicodification.
=head2 "class"-based lexicals
Use flyweight objects, secure hashes or, dare I say it, pseudo-hashes instead.
+(Or whatever will replace pseudohashes in 5.10.)
=head2 byteperl
=head2 Lazy evaluation / tail recursion removal
-C<List::Util> in core gives some of these; tail recursion removal is
-done manually, with C<goto &whoami;>. (However, MJD has found that
-C<goto &whoami> introduces a performance penalty, so maybe there should
-be a way to do this after all: C<sub foo {START: ... goto START;> is
-better.)
+C<List::Util> gives first() (a short-circuiting grep); tail recursion
+removal is done manually, with C<goto &whoami;>. (However, MJD has
+found that C<goto &whoami> introduces a performance penalty, so maybe
+there should be a way to do this after all: C<sub foo {START: ... goto
+START;> is better.)
=head2 Make "use utf8" the default
-There is a patch available for this, search p5p archives for
-the Subject "[EXPERIMENTAL PATCH] make unicode (utf8) default"
-but this would be unacceptable because of backward compatibility:
-scripts could not contain B<any legacy eight-bit data>. Also would
-introduce a measurable slowdown of at least few percentages since all
-regular expression operations would be done in full UTF-8.
+Backward compatibility makes this difficult: scripts could no longer
+contain B<any legacy eight-bit data> (like Latin-1), even in string
+literals or pod. It would also introduce a measurable slowdown of at
+least a few percent, since all regular expression operations would be
+done in full UTF-8. But if you want to try this, add
+-DUSE_UTF8_SCRIPTS to your compilation flags.
+
+=head2 Unicode collation and normalization
+
+The Unicode::Collate and Unicode::Normalize modules
+by SADAHIRO Tomoyuki have been included since 5.8.0.
+
+ Collation? http://www.unicode.org/unicode/reports/tr10/
+ Normalization? http://www.unicode.org/unicode/reports/tr15/
+
+=head2 Create debugging macros
+
+Debugging macros (like printsv, dump) can make debugging perl inside a
+C debugger much easier. A good set for gdb comes with mod_perl.
+Something similar should be distributed with perl.
+
+The proper way to do this is to use and extend Devel::DebugInit.
+Devel::DebugInit also needs to be extended to support threads.
+
+See p5p archives for late May/early June 2001 for a recent discussion
+on this topic.
+=cut