The Official name of ASCII.

[p5sagit/p5-mst-13.2.git] / pod / perltodo.pod
diff --git a/pod/perltodo.pod b/pod/perltodo.pod

index 3882498..534580a 100644 (file)
--- a/pod/perltodo.pod
+++ b/pod/perltodo.pod
@@ -20,15 +20,7 @@ of archives may be found at:
 C<perlio> provides this, but the interface could be a lot more
 straightforward.
 
-=head2 Eliminate need for "use utf8";
-
-While the C<utf8> pragma is autoloaded when necessary, it's still needed
-for things like Unicode characters in a source file. The UTF8 hint can
-always be set to true, but it needs to be set to false when F<utf8.pm>
-is being compiled. (To stop Perl trying to autoload the C<utf8>
-pragma...)
-
-=head2 Autoload byte.pm
+=head2 Autoload bytes.pm
 
 When the lexer sees, for instance, C<bytes::length>, it should
 automatically load the C<bytes> pragma.
@@ -39,6 +31,32 @@ Danger, Will Robinson! Discussing the semantics of C<"\x{F00}">,
 C<"\xF00"> and C<"\U{F00}"> on P5P I<will> lead to a long and boring
 flamewar.
 
+=head2 Create a char *sv_pvprintify(sv, STRLEN *lenp, UV flags)
+
+For displaying PVs with control characters, embedded nulls, and Unicode.
+This would be useful for printing warnings, or data and regex dumping,
+not_a_number(), and so on.
+
+Requirements: should handle both byte and UTF8 strings.  isPRINT()
+characters printed as-is, character less than 256 as \xHH, Unicode
+characters as \x{HHH}.  Don't assume ASCII-like, either, get somebody
+on EBCDIC to test the output.
+
+Possible options, controlled by the flags:
+- whitespace (other than ' ' of isPRINT()) printed as-is
+- use isPRINT_LC() instead of isPRINT()
+- print control characters like this: "\cA"
+- print control characters like this: "^A"
+- non-PRINTables printed as '.' instead of \xHH
+- use \OOO instead of \xHH
+- use the C/Perl-metacharacters like \n, \t
+- have a maximum length for the produced string (read it from *lenp)
+- append a "..." to the produced string if the maximum length is exceeded
+- really fancy: print unicode characters as \N{...}
+
+NOTE: pv_display(), pv_uni_display(), sv_uni_display() are doing
+something like the above.
+
 =head2 Overloadable regex assertions
 
 This may or may not be possible with the current regular expression
@@ -46,23 +64,47 @@ engine. The idea is that, for instance, C<\b> needs to be
 algorithmically computed if you're dealing with Thai text. Hence, the
 B<\b> assertion wants to be overloaded by a function.
 
-=head2 Unicode collation and normalization
+=head2 Unicode
 
-Simon Cozens promises to work on this.
+=over 4
 
-    Collation?     http://www.unicode.org/unicode/reports/tr10/
-    Normalization? http://www.unicode.org/unicode/reports/tr15/
+=item *
 
-=head2 Unicode case mappings 
+Allow for long form of the General Category Properties, e.g
+C<\p{IsOpenPunctuation}>, not just the abbreviated form, e.g.
+C<\p{IsPs}>.
+
+=item *
+
+Allow for the metaproperties: C<XID Start>, C<XID Continue>,
+C<NF*_NO>, C<NF*_MAYBE> (require the DerivedCoreProperties and
+DerviceNormalizationProperties files).
+
+There are also multiple value properties still unimplemented:
+C<Numeric Type>, C<East Asian Width>.
+
+=item *
 
     Case Mappings? http://www.unicode.org/unicode/reports/tr21/
 
-=head2 Unicode regular expression character classes
+lc(), uc(), lcfirst(), and ucfirst() work only for some of the
+simplest cases, where the mapping goes from a single Unicode character
+to another single Unicode character.  See lib/unicore/SpecCase.txt
+(and CaseFold.txt).
+
+=item *
 
-They have some tricks Perl doesn't yet implement.
+They have some tricks Perl doesn't yet implement like character
+class subtraction.
 
        http://www.unicode.org/unicode/reports/tr18/
 
+=back
+
+See L<perlunicode/UNICODE REGULAR EXPRESSION SUPPORT LEVEL> for what's
+there and what's missing.  Almost all of Levels 2 and 3 is missing,
+and as of 5.8.0 not even all of Level 1 is there.
+
 =head2 use Thread for iThreads
 
 Artur Bergman's C<iThreads> module is a start on this, but needs to
@@ -150,11 +192,6 @@ Have a way to introduce user-defined opcodes without the subroutine call
 overhead of an XSUB; the user should be able to create PP code. Simon
 Cozens has some ideas on this.
 
-=head2 spawnvp() on Win32
-
-Win32 has problems spawning processes, particularly when the arguments
-to the child process contain spaces, quotes or tab characters.
-
 =head2 DLL Versioning
 
 Windows needs a way to know what version of a XS or C<libperl> DLL it's
@@ -259,7 +296,7 @@ That's to say, C<pack "(sI)40"> would be the same as C<pack "sI"x40>
 =head2 Cross compilation
 
 Make Perl buildable with a cross-compiler. This will play havoc with
-Configure, which needs to how how the target system will respond to
+Configure, which needs to know how the target system will respond to
 its tests; maybe C<microperl> will be a good starting point here.
 (Indeed, Bart Schuller reports that he compiled up C<microperl> for
 the Agenda PDA and it works fine.)  A really big spanner in the works
@@ -267,6 +304,12 @@ is the bootstrapping build process of Perl: if the filesystem the
 target systems sees is not the same what the build host sees, various
 input, output, and (Perl) library files need to be copied back and forth.
 
+As of 5.8.0 Configure mostly works for cross-compilation
+(used successfully for iPAQ Linux), miniperl gets built,
+but then building DynaLoader (and other extensions) fails
+since MakeMaker knows nothing of cross-compilation.
+(See INSTALL/Cross-compilation for the state of things.)
+
 =head2 Perl preprocessor / macros
 
 Source filters help with this, but do not get us all the way. For
@@ -301,10 +344,18 @@ has changed. Detecting a change is perhaps the difficult bit.
 
 =head2 All ARGV input should act like E<lt>E<gt>
 
+eg C<read(ARGV, ...)> doesn't currently read across multiple files.
+
 =head2 Support for rerunning debugger
 
 There should be a way of restarting the debugger on demand.
 
+=head2 Test Suite for the Debugger
+
+The debugger is a complex piece of software and fixing something
+here may inadvertently break something else over there.  To tame
+this chaotic behaviour, a test suite is necessary. 
+
 =head2 my sub foo { }
 
 The basic principle is sound, but there are problems with the semantics
@@ -460,6 +511,12 @@ Hugo van der Sanden plans to look at this.
 This has been done in places, but needs a thorough code review.
 Also fchdir is available in some platforms.
 
+=head2 Make v-strings overloaded objects
+
+Instead of having to guess whether a string is a v-string and thus
+needs to be displayed with %vd, make v-strings (readonly) objects
+(class "vstring"?) with a stringify overload.
+
 =head1 Vague ideas
 
 Ideas which have been discussed, and which may or may not happen.
@@ -469,7 +526,7 @@ Ideas which have been discussed, and which may or may not happen.
 It's unclear what this should do or how to do it without breaking old
 code.
 
-=head2 Make tr/// return histogram
+=head2 Make tr/// return histogram of characters in list context
 
 There is a patch for this, but it may require Unicodification.
 
@@ -758,6 +815,7 @@ Suggesting this on P5P B<will> cause a boring and interminable flamewar.
 =head2 "class"-based lexicals
 
 Use flyweight objects, secure hashes or, dare I say it, pseudo-hashes instead.
+(Or whatever will replace pseudohashes in 5.10.)
 
 =head2 byteperl
 
@@ -765,8 +823,39 @@ C<ByteLoader> covers this.
 
 =head2 Lazy evaluation / tail recursion removal
 
-C<List::Util> in core gives some of these; tail recursion removal is
-done manually, with C<goto &whoami;>. (However, MJD has found that
-C<goto &whoami> introduces a performance penalty, so maybe there should
-be a way to do this after all: C<sub foo {START: ... goto START;> is
-better.)
+C<List::Util> gives first() (a short-circuiting grep); tail recursion
+removal is done manually, with C<goto &whoami;>. (However, MJD has
+found that C<goto &whoami> introduces a performance penalty, so maybe
+there should be a way to do this after all: C<sub foo {START: ... goto
+START;> is better.)
+
+=head2 Make "use utf8" the default
+
+Because of backward compatibility this is difficult: scripts could not
+contain B<any legacy eight-bit data> (like Latin-1) anymore, even in
+string literals or pod.  Also would introduce a measurable slowdown of
+at least few percentages since all regular expression operations would
+be done in full UTF-8.  But if you want to try this, add
+-DUSE_UTF8_SCRIPTS to your compilation flags.
+
+=head2 Unicode collation and normalization
+
+The Unicode::Collate and Unicode::Normalize modules
+by SADAHIRO Tomoyuki have been included since 5.8.0.
+
+    Collation?     http://www.unicode.org/unicode/reports/tr10/
+    Normalization? http://www.unicode.org/unicode/reports/tr15/
+
+=head2 Create debugging macros
+
+Debugging macros (like printsv, dump) can make debugging perl inside a
+C debugger much easier.  A good set for gdb comes with mod_perl.
+Something similar should be distributed with perl.
+
+The proper way to do this is to use and extend Devel::DebugInit.
+Devel::DebugInit also needs to be extended to support threads.
+
+See p5p archives for late May/early June 2001 for a recent discussion
+on this topic.
+
+=cut