X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperltodo.pod;h=e92d4742454db18f2dcaeb70171ac91ff80ff5f7;hb=0c2f6559512b2211f892f1a6ae8db4739c5369b4;hp=0e5d8a74ef6c16d770f3f2dff66716d479b59f8a;hpb=d8b6afd3d119b21b9c373c553672871c0c5fd1e2;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perltodo.pod b/pod/perltodo.pod
index 0e5d8a7..e92d474 100644
--- a/pod/perltodo.pod
+++ b/pod/perltodo.pod
@@ -37,7 +37,7 @@ For displaying PVs with control characters, embedded nulls, and Unicode.
 This would be useful for printing warnings, or data and regex dumping,
 not_a_number(), and so on.
 
-Requirements: should handle both byte and UTF8 strings. isPRINT()
+Requirements: should handle both byte and UTF-8 strings. isPRINT()
 characters printed as-is, character less than 256 as \xHH, Unicode
 characters as \x{HHH}. Don't assume ASCII-like, either, get somebody
 on EBCDIC to test the output.
@@ -98,7 +98,25 @@ UTF-8 identifier names should probably be canonicalized: NFC?
 
 UTF-8 in package names and sub names? The first is problematic
 because of the mapping to pathnames, ditto for the second one if
-one does autosplitting, for example.
+one does autosplitting, for example. Some of this works already
+in 5.8.0, but essentially it is unsupported. Constructs to consider,
+at the very least:
+
+	use utf8;
+	package UnicodePackage;
+	sub new { bless {}, shift };
+	sub UnicodeMethod1 { ... $_[0]->UnicodeMethod2(...) ... }
+	sub UnicodeMethod2 { ... } # in here caller(0) should contain Unicode
+	...
+	package main;
+	my $x = UnicodePackage->new;
+	print ref $x, "\n"; # should be Unicode
+	$x->UnicodeMethod1(...);
+	my $y = UnicodeMethod3 UnicodePackage ...;
+
+In the above all I contain (identifier-worthy) characters
+beyond the code point 255, for example 256. Wherever package/class or
+subroutine names can be returned needs to be checked for Unicodeness.
 
 =back
 
@@ -186,7 +204,7 @@ Floating point formatting is still causing some weird test failures.
 Locales and Unicode interact with each other in unpleasant ways.
 One possible solution would be to adopt/support ICU:
 
-	http://oss.software.ibm.com/developerworks/opensource/icu/project/
+	http://oss.software.ibm.com/icu/index.html
 
 =head2 Arithmetic on non-Arabic numerals
 
@@ -562,6 +580,13 @@ Should taint be stopped from affecting control flow, if ($tainted)?
 Should tainted symbolic method calls and subref calls be stopped?
 (Look at Ruby's $SAFE levels for inspiration?)
 
+=head2 Perform correctly when XSUBs call subroutines that exit via goto(LABEL) and friends
+
+If an XSUB calls a subroutine that exits using goto(LABEL),
+last(LABEL) or next(LABEL), then the interpreter will very probably crash
+with a segfault because the execution resumes in the XSUB instead of
+never returning there.
+
 =head1 Vague ideas
 
 Ideas which have been discussed, and which may or may not happen.
@@ -660,13 +685,13 @@ This emulation should also go near pp_sys.pp_truncate().
 
 =head2 Unicode in Filenames
 
-chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open, qx,
-readdir, readlink, rename, rmdir, stat, symlink, sysopen, system,
-truncate, unlink, utime. All these could potentially accept Unicode
-filenames either as input or output (and in the case of system and qx
-Unicode in general, as input or output to/from the shell). Whether a
-filesystem - an operating system pair understands Unicode in filenames
-varies.
+chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
+opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
+system, truncate, unlink, utime, -X. All these could potentially accept
+Unicode filenames either as input or output (and in the case of system
+and qx Unicode in general, as input or output to/from the shell).
+Whether a filesystem - an operating system pair understands Unicode in
+filenames varies.
 
 Known combinations that have some level of understanding include
 Microsoft NTFS, Apple HFS+ (In Mac OS 9 and X) and Apple UFS (in Mac
@@ -677,10 +702,13 @@ and so on, varies.
 Finding the right level of interfacing to Perl requires some thought.
 Remember that an OS does not implicate a filesystem.
 
-Note that in Windows the -C command line flag already does quite
-a bit of the above (but even there the support is not complete:
-for example the exec/spawn are not Unicode-aware) by turning on
-the so-called "wide API support".
+(The Windows -C command flag "wide API support" has been at least
+temporarily retired in 5.8.1, and the -C has been repurposed, see
+L.)
+
+=head1 Unicode in %ENV
+
+Currently the %ENV entries are always byte strings.
 
 =head1 Recently done things
 
@@ -778,7 +806,7 @@ Benjamin Sugars has done this.
 
 Nick Ing-Simmons' C supports an C IO method.
 
-=head2 Byte to/from UTF8 and UTF8 to/from local conversion
+=head2 Byte to/from UTF-8 and UTF-8 to/from local conversion
 
 C provides this.
 
@@ -792,7 +820,8 @@ http://lists.perl.org/ , http://archive.develooper.com/
 
 =head2 Bug tracking
 
-Richard Foley has written the bug tracking system at http://bugs.perl.org/
+Since 5.8.0 perl uses the RT bug tracking system from Jesse Vincent,
+implemented by Robert Spier at http://bugs.perl.org/
 
 =head2 Integrate MacPerl
 