Re: [PATCH mg.c gv.c and others] ${^TAINT}

[p5sagit/p5-mst-13.2.git] / pod / perltodo.pod
diff --git a/pod/perltodo.pod b/pod/perltodo.pod

index f96c770..b903593 100644 (file)
--- a/pod/perltodo.pod
+++ b/pod/perltodo.pod
@@ -20,13 +20,16 @@ of archives may be found at:
 C<perlio> provides this, but the interface could be a lot more
 straightforward.
 
-=head2 Eliminate need for "use utf8";
+=head2 Autoload bytes.pm
 
-While the C<utf8> pragma is autoloaded when necessary, it's still needed
-for things like Unicode characters in a source file. The UTF8 hint can
-always be set to true, but it needs to be set to false when F<utf8.pm>
-is being compiled. (To stop Perl trying to autoload the C<utf8>
-pragma...)
+When the lexer sees, for instance, C<bytes::length>, it should
+automatically load the C<bytes> pragma.
+
+=head2 Make "\u{XXXX}" et al work
+
+Danger, Will Robinson! Discussing the semantics of C<"\x{F00}">,
+C<"\xF00"> and C<"\U{F00}"> on P5P I<will> lead to a long and boring
+flamewar.
 
 =head2 Create a char *sv_pvprintify(sv, STRLEN *lenp, UV flags)
 
@@ -51,17 +54,6 @@ Possible options, controlled by the flags:
 - append a "..." to the produced string if the maximum length is exceeded
 - really fancy: print unicode characters as \N{...}
 
-=head2 Autoload byte.pm
-
-When the lexer sees, for instance, C<bytes::length>, it should
-automatically load the C<bytes> pragma.
-
-=head2 Make "\u{XXXX}" et al work
-
-Danger, Will Robinson! Discussing the semantics of C<"\x{F00}">,
-C<"\xF00"> and C<"\U{F00}"> on P5P I<will> lead to a long and boring
-flamewar.
-
 =head2 Overloadable regex assertions
 
 This may or may not be possible with the current regular expression
@@ -69,34 +61,50 @@ engine. The idea is that, for instance, C<\b> needs to be
 algorithmically computed if you're dealing with Thai text. Hence, the
 B<\b> assertion wants to be overloaded by a function.
 
-=head2 Unicode collation and normalization
+=head2 Unicode
 
-Simon Cozens promises to work on this.
+=over 4
 
-    Collation?     http://www.unicode.org/unicode/reports/tr10/
-    Normalization? http://www.unicode.org/unicode/reports/tr15/
+=item *
+
+Allow for long form of the General Category Properties, e.g
+C<\p{IsOpenPunctuation}>, not just the abbreviated form, e.g.
+C<\p{IsPs}>.
+
+=item *
+
+Allow for the metaproperties: C<XID Start>, C<XID Continue>,
+C<NF*_NO>, C<NF*_MAYBE> (require the DerivedCoreProperties and
+DerviceNormalizationProperties files).
 
-=head2 Unicode case mappings 
+There are also enumerated properties: C<Decomposition Type>,
+C<Numeric Type>, C<East Asian Width>, C<Line Break>.  These
+properties have multiple values: for uniqueness the property
+value should be appended.  For example, C<\p{IsAlphabetic}>
+wouldbe the binary property, while C<\p{AlphabeticLineBreak}>
+would mean the enumerated property.
+
+=item *
 
     Case Mappings? http://www.unicode.org/unicode/reports/tr21/
 
-=head2 Unicode regular expression character classes
+lc(), uc(), lcfirst(), and ucfirst() work only for some of the
+simplest cases, where the mapping goes from a single Unicode character
+to another single Unicode character.  See lib/unicore/SpecCase.txt
+(and CaseFold.txt).
+
+=item *
 
 They have some tricks Perl doesn't yet implement like character
 class subtraction.
 
        http://www.unicode.org/unicode/reports/tr18/
 
-=head2 Unicode Scripts support
+=back
 
-Currently the C<\p{In...}> supports only the Blocks database, like
-C<\p{BasicLatin}>, C<\p{InGreek}>, C<\p{InThai}>, but there's also the
-Scripts database, which has members like C<Latin>, C<Greek>,
-C<Armenian>, C<Han>.  It is desireable that also the script names
-could be used for the C<\p{In...}> construct.  Note: needs to be
-researched whether this is possible, that is, are there conflicts
-between the Blocks and the Scripts, is the Blocks Greek the same as
-the Scripts Greek?
+See L<perlunicode/UNICODE REGULAR EXPRESSION SUPPORT LEVEL> for what's
+there and what's missing.  Almost all of Levels 2 and 3 is missing,
+and as of 5.8.0 not even all of Level 1 is there.
 
 =head2 use Thread for iThreads
 
@@ -302,6 +310,12 @@ is the bootstrapping build process of Perl: if the filesystem the
 target systems sees is not the same what the build host sees, various
 input, output, and (Perl) library files need to be copied back and forth.
 
+As of 5.8.0 Configure mostly works for cross-compilation
+(used successfully for iPAQ Linux), miniperl gets built,
+but then building DynaLoader (and other extensions) fails
+since MakeMaker knows nothing of cross-compilation.
+(See INSTALL/Cross-compilation for the state of things.)
+
 =head2 Perl preprocessor / macros
 
 Source filters help with this, but do not get us all the way. For
@@ -336,6 +350,8 @@ has changed. Detecting a change is perhaps the difficult bit.
 
 =head2 All ARGV input should act like E<lt>E<gt>
 
+eg C<read(ARGV, ...)> doesn't currently read across multiple files.
+
 =head2 Support for rerunning debugger
 
 There should be a way of restarting the debugger on demand.
@@ -510,7 +526,7 @@ Ideas which have been discussed, and which may or may not happen.
 It's unclear what this should do or how to do it without breaking old
 code.
 
-=head2 Make tr/// return histogram
+=head2 Make tr/// return histogram of characters in list context
 
 There is a patch for this, but it may require Unicodification.
 
@@ -799,6 +815,7 @@ Suggesting this on P5P B<will> cause a boring and interminable flamewar.
 =head2 "class"-based lexicals
 
 Use flyweight objects, secure hashes or, dare I say it, pseudo-hashes instead.
+(Or whatever will replace pseudohashes in 5.10.)
 
 =head2 byteperl
 
@@ -806,18 +823,39 @@ C<ByteLoader> covers this.
 
 =head2 Lazy evaluation / tail recursion removal
 
-C<List::Util> in core gives some of these; tail recursion removal is
-done manually, with C<goto &whoami;>. (However, MJD has found that
-C<goto &whoami> introduces a performance penalty, so maybe there should
-be a way to do this after all: C<sub foo {START: ... goto START;> is
-better.)
+C<List::Util> gives first() (a short-circuiting grep); tail recursion
+removal is done manually, with C<goto &whoami;>. (However, MJD has
+found that C<goto &whoami> introduces a performance penalty, so maybe
+there should be a way to do this after all: C<sub foo {START: ... goto
+START;> is better.)
 
 =head2 Make "use utf8" the default
 
-There is a patch available for this, search p5p archives for
-the Subject "[EXPERIMENTAL PATCH] make unicode (utf8) default"
-but this would be unacceptable because of backward compatibility:
-scripts could not contain B<any legacy eight-bit data>.  Also would
-introduce a measurable slowdown of at least few percentages since all
-regular expression operations would be done in full UTF-8.
+Because of backward compatibility this is difficult: scripts could not
+contain B<any legacy eight-bit data> (like Latin-1) anymore, even in
+string literals or pod.  Also would introduce a measurable slowdown of
+at least few percentages since all regular expression operations would
+be done in full UTF-8.  But if you want to try this, add
+-DUSE_UTF8_SCRIPTS to your compilation flags.
+
+=head2 Unicode collation and normalization
+
+The Unicode::Collate and Unicode::Normalize modules
+by SADAHIRO Tomoyuki have been included since 5.8.0.
+
+    Collation?     http://www.unicode.org/unicode/reports/tr10/
+    Normalization? http://www.unicode.org/unicode/reports/tr15/
+
+=head2 Create debugging macros
+
+Debugging macros (like printsv, dump) can make debugging perl inside a
+C debugger much easier.  A good set for gdb comes with mod_perl.
+Something similar should be distributed with perl.
+
+The proper way to do this is to use and extend Devel::DebugInit.
+Devel::DebugInit also needs to be extended to support threads.
+
+See p5p archives for late May/early June 2001 for a recent discussion
+on this topic.
 
+=cut