-Bugs
- fix small memory leaks on compile-time failures
-
Unicode support
finish byte <-> utf8 and localencoding <-> utf8 conversions
make substr($bytestr,0,0,$charstr) do the right conversion
- a way to set default disciplines for all handle constructors:
use open IN => ":any", OUT => ":utf8", SYS => ":utf16"
eliminate need for "use utf8;"
- autoload utf8_heavy.pl's swash routines in swash_init()
autoload byte.pm when byte:: is seen by the parser
check uv_to_utf8() calls for buffer overflow
- (see also "Locales", "Regexen", and "Miscellaneous")
+ make \uXXXX (and \u{XXXX}?), where XXXX are hex digits,
+ work like the \uXXXX notation of the Unicode tech reports and
+ Java (and like the already existing \x{XXXX})?
+ more than four hex digits? also make \U+XXXX work?
+ See also "Locales", "Regexen", and "Miscellaneous".
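
The "check uv_to_utf8() calls for buffer overflow" item above can be illustrated with a bounds-checked encoder. This is only a minimal sketch: encode_utf8_checked() is a hypothetical helper, not Perl's uv_to_utf8(), and it assumes the standard range U+0000..U+10FFFF without surrogates, which may be narrower than what Perl's internal encoding accepts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Perl): encode one code point into dst,
 * never writing more than dstlen bytes.  Returns the number of bytes
 * written, or 0 if the buffer is too small or the value is a surrogate
 * or lies above U+10FFFF. */
size_t encode_utf8_checked(uint32_t cp, unsigned char *dst, size_t dstlen)
{
    size_t need;

    /* Work out the encoded length before touching the buffer. */
    if (cp < 0x80)            need = 1;
    else if (cp < 0x800)      need = 2;
    else if (cp < 0x10000)    need = 3;
    else if (cp <= 0x10FFFF)  need = 4;
    else return 0;

    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates */
    if (dst == NULL || dstlen < need) return 0;   /* would overflow */

    switch (need) {
    case 1:
        dst[0] = (unsigned char)cp;
        break;
    case 2:
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        dst[0] = (unsigned char)(0xF0 | (cp >> 18));
        dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
    return need;
}
```

A caller would pass the space remaining in its buffer as dstlen and treat a return value of 0 as "grow the buffer or reject the input" instead of assuming the write fits.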
Multi-threading
support "use Thread;" under useithreads
libswanted <-> usethreads <-> use64bitint <-> use64bitall <->
uselargefiles <-> ...
make configuring+building away from source directory work (VPATH et al)
- this is related to: cross-compilation configuring
- scenarios to consider: the host and the target might have
- shared filesystems, or they might not (the communication
- channel might be e.g. rsh/ssh, or some batch submission system)
- most obviously: they might not share the same CPU
- meaning: assume nothing about shared properties/resources
+ this is related to: cross-compilation configuring (see Todo)
_r support (see Todo for a more detailed description)
POSIX 1003.1 1996 Edition support--realtime stuff:
POSIX semaphores, message queues, shared memory, realtime clocks,
timers, signals (the metaconfig units mostly already exist for these)
+ PREFERABLY AS AN EXTENSION
UNIX98 support: reader-writer locks, realtime/asynchronous IO
+ PREFERABLY AS AN EXTENSION
IPv6 support: see RFC2292, RFC2553
+ PREFERABLY AS AN EXTENSION
+ there already is Socket6
Long doubles
figure out where the PV->NV->PV conversion gets it wrong at least
64-bit support
Configure probe for quad_t, uquad_t, and (argh) u_quad_t: on some
systems they might be the only types that work as quadtype and uquadtype.
+ more pain: long_long, u_long_long.
Locales
deprecate traditional/legacy locales?
figure out how to support Unicode locales
suggestion: integrate the IBM Classes for Unicode (ICU)
http://oss.software.ibm.com/developerworks/opensource/icu/project/
- and check out also the Locale Converter:
+ ICU is "portable, open-source Unicode library with:
+ charset-independent locales (with multiple locales
+ simultaneously supported in same thread; character
+ conversions; formatting/parsing for numbers, currencies,
+ date/time and messages; message catalogs (resources);
+ transliteration, collation, normalization, and text
+ boundaries (grapheme, word, line-break))".
+ Check out also the Locale Converter:
http://alphaworks.ibm.com/tech/localeconverter
- ICU is "portable, open-source Unicode library with:
- charset-independent locales (with multiple locales simultaneously
- supported in same thread; character conversions; formatting/parsing
- for numbers, currencies, date/time and messages; message catalogs
- (resources) ; transliteration, collation, normalization, and text
- boundaries (grapheme, word, line-break))".
- There is also 'iconv', either from XPG4 or GNU (glibc).
+ There is also the iconv interface, either from XPG4 or GNU (glibc).
iconv is about character set conversions.
Either ICU or iconv would be valuable to get integrated
into Perl, Configure already probes for libiconv and <iconv.h>.
Regexen
make RE engine thread-safe
+ a way to do full character set arithmetic: now one can do
+ addition, negate a whole class, and negate certain subclasses
+ (e.g. \D, [:^digit:]), but what is wanted is a more generic way to
+ add/subtract/intersect characters/classes, as described in the
+ Unicode technical report on Regular Expression Guidelines,
+ http://www.unicode.org/unicode/reports/tr18/
+ (amusingly, the TR notes that difference and intersection
+ can be done using "Perl-style look-ahead")
+ difference syntax? maybe [[:alpha:][^abc]] meaning
+ "all alphabetic except a, b, and c"? or [[:alpha:]-[abc]]?
+ (maybe bad, as we explicitly disallow such 'ranges')
+ intersection syntax? maybe [[..]&[...]]?
POSIX [=bar=] and [.zap.] would be nice too but there's no API for them
=bar= could be done with Unicode, though, see the Unicode TR #15 about
normalization forms:
this is also a part of the Unicode 3.0:
http://www.unicode.org/unicode/uni2book/u2.html
executive summary: there are several different levels of 'equivalence'
+ trie optimization: factor out common suffixes (and prefixes?)
+ from |-alternating groups (both for exact strings and character
+ classes, use lookaheads?)
approximate matching
Security
Configure doesn't yet probe for usleep/nanosleep/ualarm but
the units exist)
floating point handling: nans, infinities, fp exception masks, etc.
- at least the following interfaces exist: fp_classify(), fp_class(),
- class(), isnan(), isinf(), isfinite(), finite(), isnormal(),
- ordered(), fp_setmask(), fp_getmask(), fp_setround(), fp_getround(),
- ieeefp.h, fp_class.h. There are metaconfig units for most of these.
- Search for ifdef __osf__ in pp.c to find a temporary fix that
- needs to be done right.
+ At least the following interfaces exist: fp_classify(), fp_class(),
+ class(), isinf(), isfinite(), finite(), isnormal(), unordered(),
+ <ieeefp.h>, <fp_class.h> (there are metaconfig units for all these),
+ fp_setmask(), fp_getmask(), fp_setround(), fp_getround()
+ (no metaconfig units yet for these).
+ Don't forget finitel(), fp_classl(), fp_class_l(), (yes, both do,
+ unfortunately, exist), and unorderedl().
+ PREFERABLY AS AN EXTENSION.
+ As of 5.6.1 there is a cpp macro Perl_isnan().
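
As a rough illustration of what a portable classification layer over the interfaces listed above might look like, here is a minimal sketch that uses only the C99 <math.h> macros (fpclassify and the FP_* constants). The wrapper fp_class_name() is hypothetical and is not an existing Perl or metaconfig symbol; platforms without C99 would still need the vendor calls named in the list.

```c
#include <math.h>

/* Hypothetical wrapper (not part of Perl): name the class of a double
 * using only C99 <math.h>, as a stand-in for the platform-specific
 * fp_classify()/fp_class()/class() family. */
const char *fp_class_name(double x)
{
    switch (fpclassify(x)) {
    case FP_NAN:       return "nan";
    case FP_INFINITE:  return x < 0 ? "-inf" : "+inf";
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "subnormal";
    case FP_NORMAL:    return "normal";
    default:           return "unknown";   /* implementation-defined class */
    }
}
```

Keeping the classification behind one function like this is one way to confine the per-platform differences to a single place, which fits the "preferably as an extension" note.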
fix the basic arithmetics (+ - * / %) to preserve IVness/UVness if
both arguments are IVs/UVs
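
The idea of preserving integer-ness can be sketched for addition as follows. This is an assumption-laden illustration and not Perl's implementation: num_result and add_preserving_int() are hypothetical names, and only signed 64-bit integers are handled (no UV case).

```c
#include <stdint.h>

/* Hypothetical result type: is_int says which member holds the value. */
typedef struct {
    int     is_int;
    int64_t iv;
    double  nv;
} num_result;

/* Add two integers, keeping an integer result when it fits and falling
 * back to floating point only when the sum would overflow. */
num_result add_preserving_int(int64_t a, int64_t b)
{
    num_result r = { 0, 0, 0.0 };

    /* Test for overflow before adding; signed overflow is undefined in C. */
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b)) {
        r.nv = (double)a + (double)b;
    } else {
        r.is_int = 1;
        r.iv = a + b;
    }
    return r;
}
```

The same pattern (check the range first, then choose the integer or the floating-point path) would apply to the other operators, each with its own overflow condition.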
replace pod2html with new PodtoHtml? (requires other modules from CPAN)
spot-check all new modules for completeness
better docs for pack()/unpack()
reorg tutorials vs. reference sections
+ make roffitall dynamic about its pods and libs
+