Larry's fix for buggy propagation of utf8-ness in join(); add test

diff --git a/Todo-5.6 b/Todo-5.6
index 7912102..9abeb55 100644
--- a/Todo-5.6
+++ b/Todo-5.6
@@ -1,6 +1,3 @@
-Bugs
-    fix small memory leaks on compile-time failures
-
 Unicode support
     finish byte <-> utf8 and localencoding <-> utf8 conversions
     make substr($bytestr,0,0,$charstr) do the right conversion
@@ -13,7 +10,6 @@ Unicode support
        - a way to set default disciplines for all handle constructors:
            use open IN => ":any", OUT => ":utf8", SYS => ":utf16"
     eliminate need for "use utf8;"
-    autoload utf8_heavy.pl's swash routines in swash_init()
     autoload byte.pm when byte:: is seen by the parser
     check uv_to_utf8() calls for buffer overflow
     (see also "Locales", "Regexen", and "Miscellaneous")
@@ -47,12 +43,7 @@ Configure
        libswanted <-> usethreads <-> use64bitint <-> use64bitall <->
        uselargefiles <-> ...  
     make configuring+building away from source directory work (VPATH et al)
-       this is related to: cross-compilation configuring
-       scenarios to consider: the host and the target might have
-       shared filesystems, or they might not (the communication
-       channel might be e.g. rsh/ssh, or some batch submission system)
-       most obviously: they might not share the same CPU
-       meaning: assume nothing about shared properties/resources
+       this is related to: cross-compilation configuring (see Todo)
     _r support (see Todo for more detailed description)
     POSIX 1003.1 1996 Edition support--realtime stuff:
        POSIX semaphores, message queues, shared memory, realtime clocks,
@@ -91,6 +82,18 @@ Locales
 
 Regexen
    make RE engine thread-safe
+   a way to do full character set arithmetics: now one can do
+       addition, negate a whole class, and negate certain subclasses
+       (e.g. \D, [:^digit:]), but a more generic way to add/subtract/
+       intersect characters/classes, like described in the Unicode technical
+       report on Regular Expression Guidelines,
+       http://www.unicode.org/unicode/reports/tr18/
+       (amusingly, the TR notes that difference and intersection
+        can be done using "Perl-style look-ahead")
+       difference syntax?  maybe [[:alpha:][^abc]] meaning
+       "all alphabetic except a, b, and c"? or [[:alpha:]-[abc]]?
+       (maybe bad, as we explicitly disallow such 'ranges')
+       intersection syntax? maybe [[..]&[...]]?
    POSIX [=bar=] and [.zap.] would be nice too but there's no API for them
        =bar= could be done with Unicode, though, see the Unicode TR #15 about
        normalization forms:
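
Note on the new Regexen item: the look-ahead workaround that the TR mentions can be sketched as follows. This is an illustration only, not part of the commit. It is written in Python, whose `re` module accepts the same `(?=...)` and `(?!...)` look-ahead syntax as Perl. Python's `re` has no POSIX `[:alpha:]` class, so `[a-z]` stands in for it here as an assumption, and the names `ALPHA_MINUS_ABC`, `LOWER_AND_HEX`, and `matching_chars` are hypothetical.

```python
import re

# Difference: lowercase letters except a, b, and c.
# The negative look-ahead (?![abc]) rejects a, b, or c before
# [a-z] is allowed to consume the character.
# Assumption: [a-z] stands in for POSIX [[:alpha:]].
ALPHA_MINUS_ABC = re.compile(r"(?![abc])[a-z]")

# Intersection: characters that are both lowercase letters and hex digits.
# The positive look-ahead (?=[0-9a-f]) requires membership in the first
# class; [a-z] then requires membership in the second.
LOWER_AND_HEX = re.compile(r"(?=[0-9a-f])[a-z]")


def matching_chars(pattern, text):
    """Return the characters of text that the one-character pattern accepts."""
    return [ch for ch in text if pattern.fullmatch(ch)]


if __name__ == "__main__":
    print(matching_chars(ALPHA_MINUS_ABC, "abcdxyz"))  # ['d', 'x', 'y', 'z']
    print(matching_chars(LOWER_AND_HEX, "a9fgz"))      # ['a', 'f']
```

The same two patterns should work unchanged in Perl. The proposed `[[:alpha:]-[abc]]` and `[[..]&[...]]` forms would express the same sets without the look-ahead.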