Whitespace tweaks.

[p5sagit/p5-mst-13.2.git] / pod / perlport.pod
diff --git a/pod/perlport.pod b/pod/perlport.pod

index e4a50b0..ef05ecf 100644
--- a/pod/perlport.pod
+++ b/pod/perlport.pod
@@ -232,6 +232,9 @@ binary, or else consider using modules like Data::Dumper (included in
 the standard distribution as of Perl 5.005) and Storable (included as
 of perl 5.8).  Keeping all data as text significantly simplifies matters.
 
+V-strings are portable only up to v2147483647 (0x7FFFFFFF), because
+that is as far as EBCDIC, or more precisely UTF-EBCDIC, will go.
+
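Here is a minimal sketch of checking a v-string against that limit. vstring_is_portable() is a hypothetical helper, not a Perl built-in. It relies on the fact that each dotted component of a v-string is stored as one character, so ord() of that character gives back the component's value.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl built-in): true if every component
# of the v-string fits within 0x7FFFFFFF, the UTF-EBCDIC limit.
sub vstring_is_portable {
    my ($v) = @_;
    # Each component is stored as one character, so ord() of each
    # character gives back the component's numeric value.
    return !grep { ord($_) > 0x7FFFFFFF } split //, $v;
}

my $version = v1.22.333;

# %vd formats a v-string as dot-separated decimal numbers.
printf "%vd is %s\n", $version,
    vstring_is_portable($version) ? 'portable' : 'not portable';
```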
 =head2 Files and Filesystems
 
 Most platforms these days structure files in a hierarchical fashion.
@@ -259,6 +262,9 @@ timestamp (meaning that about the only portable timestamp is the
 modification timestamp), or one second granularity of any timestamps
 (e.g. the FAT filesystem limits the time granularity to two seconds).
 
+The "inode change timestamp" (the C<-C> filetest) may really be the
+"creation timestamp" (which it is not in UNIX).
+
 VOS perl can emulate Unix filenames with C</> as path separator.  The
 native pathname characters greater-than, less-than, number-sign, and
 percent-sign are always accepted.
@@ -267,6 +273,13 @@ S<RISC OS> perl can emulate Unix filenames with C</> as path
 separator, or go native and use C<.> for path separator and C<:> to
 signal filesystems and disk names.
 
+Don't assume UNIX filesystem access semantics: that read, write,
+and execute are all the permissions there are, and, even if they exist,
+that their semantics (for example, what r, w, and x mean on
+a directory) are the UNIX ones.  The various UNIX/POSIX compatibility
+layers usually try to make interfaces like chmod() work, but sometimes
+there simply is no good mapping.
+
 If all this is intimidating, have no (well, maybe only a little)
 fear.  There are modules that can help.  The File::Spec modules
 provide methods to do the Right Thing on whatever platform happens
@@ -311,30 +324,61 @@ the user to override the default location of the file.
 Don't assume a text file will end with a newline.  They should,
 but people forget.
 
-Do not have two files of the same name with different case, like
-F<test.pl> and F<Test.pl>, as many platforms have case-insensitive
-filenames.  Also, try not to have non-word characters (except for C<.>)
-in the names, and keep them to the 8.3 convention, for maximum
-portability, onerous a burden though this may appear.
+Do not have two files or directories of the same name with different
+case, like F<test.pl> and F<Test.pl>, as many platforms have
+case-insensitive (or at least case-forgiving) filenames.  Also, try
+not to have non-word characters (except for C<.>) in the names, and
+keep them to the 8.3 convention, for maximum portability, onerous a
+burden though this may appear.
 
 Likewise, when using the AutoSplit module, try to keep your functions to
 8.3 naming and case-insensitive conventions; or, at the least,
 make it so the resulting files have a unique (case-insensitively)
 first 8 characters.
 
-Whitespace in filenames is tolerated on most systems, but not all.
+Whitespace in filenames is tolerated on most systems, but not all,
+and even on systems where it might be tolerated, some utilities
+might become confused by such whitespace.
+
 Many systems (DOS, VMS) cannot have more than one C<.> in their filenames.
 
 Don't assume C<< > >> won't be the first character of a filename.
-Always use C<< < >> explicitly to open a file for reading,
-unless you want the user to be able to specify a pipe open.
+Always use C<< < >> explicitly to open a file for reading, or even
+better, use the three-arg version of open, unless you want the user to
+be able to specify a pipe open.
 
-    open(FILE, "< $existing_file") or die $!;
+    open(FILE, '<', $existing_file) or die $!;
 
 If filenames might use strange characters, it is safest to open it
 with C<sysopen> instead of C<open>.  C<open> is magic and can
 translate characters like C<< > >>, C<< < >>, and C<|>, which may
 be the wrong thing to do.  (Sometimes, though, it's the right thing.)
+Three-arg open can also help protect against this translation in cases
+where it is undesirable.
+
+Don't use C<:> as a part of a filename since many systems use that for
+their own semantics (MacOS Classic for separating pathname components,
+many networking schemes and utilities for separating the nodename and
+the pathname, and so on).  For the same reasons, avoid C<@>, C<;> and
+C<|>.
+
+Don't assume that in pathnames you can collapse two leading slashes
+C<//> into one: some networking and clustering filesystems have special
+semantics for that.  Let the operating system sort it out.
+
+The I<portable filename characters> as defined by POSIX are
+
+ a b c d e f g h i j k l m n o p q r s t u v w x y z
+ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
+ 0 1 2 3 4 5 6 7 8 9
+ . _ -
+
+and the "-" shouldn't be the first character.  If you want to be
+hypercorrect, stay case-insensitive and within the 8.3 naming
+convention (all the files and directories have to be unique within one
+directory if their names are lowercased and truncated to eight
+characters before the C<.>, if any, and to three characters after the
+C<.>, if any).  (And do not use C<.>s in directory names.)
 
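These filename rules can be checked mechanically. The sketch below assumes Perl 5.14 or later, which it needs for the C</a> regex modifier. is_portable_name(), dos_key() and clashing_names() are hypothetical helpers written only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name uses only the portable filename
# characters (ASCII letters, digits, ".", "_", "-") and does not start
# with "-".  The /a modifier (Perl 5.14+) limits [[:alnum:]] to ASCII.
sub is_portable_name {
    my ($name) = @_;
    return $name =~ /\A[[:alnum:]._][[:alnum:]._-]*\z/a ? 1 : 0;
}

# Hypothetical helper: the key under which 8.3 names must be unique
# (lowercased, base cut to 8 characters, extension cut to 3).
sub dos_key {
    my ($name) = @_;
    my ($base, $ext) = split /\./, lc($name), 2;
    $base = '' unless defined $base;
    return substr($base, 0, 8)
        . (defined $ext && length $ext ? '.' . substr($ext, 0, 3) : '');
}

# Hypothetical helper: names from one directory that clash under dos_key().
sub clashing_names {
    my %by_key;
    push @{ $by_key{ dos_key($_) } }, $_ for @_;
    return map { @$_ } grep { @$_ > 1 } values %by_key;
}

for my $name (qw(README.txt -flags my:file)) {
    printf "%-10s %s\n", $name,
        is_portable_name($name) ? 'portable' : 'not portable';
}

my @clash = clashing_names(qw(test.pl Test.pl notes.txt));
print "clashing: @{[ sort @clash ]}\n";
```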
 =head2 System Interaction
 
@@ -423,13 +467,13 @@ simple, platform-independent mailing.
 The Unix System V IPC (C<msg*(), sem*(), shm*()>) is not available
 even on all Unix platforms.
 
-Do not use either the bare result of C<pack("N", 10, 20, 30, 40)>
-or bare v-strings (such as C<v10.20.30.40>) or to represent
-IPv4 addresses: both forms just pack the four bytes into network order.
-That this would be equal to the C language C<in_addr> struct (which is
-what the socket code internally uses) is not guaranteed.  To be
-portable use the routines of the Socket extension, such as
-C<inet_aton()>, C<inet_ntoa()>, and C<sockaddr_in()>.
+Do not use either the bare result of C<pack("N", 10, 20, 30, 40)> or
+bare v-strings (such as C<v10.20.30.40>) to represent IPv4 addresses:
+both forms just pack the four bytes into network order.  That this
+would be equal to the C language C<in_addr> struct (which is what the
+socket code internally uses) is not guaranteed.  To be portable use
+the routines of the Socket extension, such as C<inet_aton()>,
+C<inet_ntoa()>, and C<sockaddr_in()>.
 
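The following is a minimal sketch of the Socket-based approach. The address 10.20.30.40 comes from the text. Port 8080 is an arbitrary value chosen for the example.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa sockaddr_in);

# Let Socket build the in_addr instead of pack("N", ...) or v10.20.30.40.
my $addr = inet_aton('10.20.30.40')
    or die "cannot convert address";

# Scalar context packs a socket address; list context unpacks one.
my $sockaddr = sockaddr_in(8080, $addr);
my ($port, $packed_ip) = sockaddr_in($sockaddr);

print inet_ntoa($packed_ip), ":$port\n";
```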
 The rule of thumb for portable code is: Do it all in portable Perl, or
 use a module (that may internally implement it with platform-specific
@@ -495,15 +539,20 @@ to get what should be the proper value on any system.
 
 =head2 Character sets and character encoding
 
-Assume little about character sets.  Assume nothing about
-numerical values (C<ord>, C<chr>) of characters.  Do not
-assume that the alphabetic characters are encoded contiguously (in
-the numeric sense).  Do not assume anything about the ordering of the
-characters.  The lowercase letters may come before or after the
-uppercase letters; the lowercase and uppercase may be interlaced so
-that both `a' and `A' come before `b'; the accented and other
-international characters may be interlaced so that E<auml> comes
-before `b'.
+Assume very little about character sets.
+
+Assume nothing about numerical values (C<ord>, C<chr>) of characters.
+Do not use explicit code point ranges (like C<\xHH-\xHH>); instead, use
+symbolic character classes such as C<[:print:]>.
+
+Do not assume that the alphabetic characters are encoded contiguously
+(in the numeric sense).  There may be gaps.
+
+Do not assume anything about the ordering of the characters.
+The lowercase letters may come before or after the uppercase letters;
+the lowercase and uppercase may be interlaced so that both `a' and `A'
+come before `b'; the accented and other international characters may
+be interlaced so that E<auml> comes before `b'.
 
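The sketch below follows this advice in two places. It matches with a symbolic class instead of a code point range, and it changes case with uc() instead of arithmetic on ord(). is_printable() is a hypothetical helper.

```perl
use strict;
use warnings;

# True if every character of $s is printable in the native character
# set.  Compare the ASCII-only /\A[\x20-\x7E]*\z/, which hard-codes
# code points and breaks on EBCDIC platforms.
sub is_printable {
    my ($s) = @_;
    return $s =~ /\A[[:print:]]*\z/ ? 1 : 0;
}

# Case mapping with uc() instead of arithmetic such as chr(ord($c) - 32),
# which assumes the ASCII layout of the letters.
my $upper = uc 'q';

for my $s ("plain text", "bell\a here", "tab\there") {
    print is_printable($s) ? "printable\n" : "has control characters\n";
}
print "$upper\n";
```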
 =head2 Internationalisation
 
@@ -538,13 +587,31 @@ more efficient that the first.
 
 Most multi-user platforms provide basic levels of security, usually
 implemented at the filesystem level.  Some, however, do
-not--unfortunately.  Thus the notion of user id, or "home" directory,
+not, unfortunately.  Thus the notion of user id, or "home" directory,
 or even the state of being logged-in, may be unrecognizable on many
 platforms.  If you write programs that are security-conscious, it
 is usually best to know what type of system you will be running
 under so that you can write code explicitly for that platform (or
 class of platforms).
 
+Don't assume the UNIX filesystem access semantics: the operating
+system or the filesystem may be using access control lists (ACLs),
+which are richer than the usual rwx permission bits.  Even where
+rwx exist, their semantics might be different.
+
+(From a security viewpoint, testing for permissions before attempting
+to do something is silly anyway: it opens a window for race conditions,
+because someone or something might change the permissions between the
+permissions check and the actual operation.  Just try the operation
+and check whether it succeeded.)
+
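When you create a file, you can "just try the operation" by calling sysopen() with O_CREAT|O_EXCL. That call performs the existence check and the creation in one step. This sketch assumes the filesystem honours O_EXCL, which some network filesystems historically have not. try_create() and the file name job.lock are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Spec;
use File::Temp qw(tempdir);

# Racy version, for contrast: the file may appear between test and open.
#   open(my $fh, '>', $path) unless -e $path;

# "Just try the operation": with O_CREAT|O_EXCL the existence check and
# the creation happen in one system call, so there is no window to race.
sub try_create {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL) or return 0;
    close($fh);
    return 1;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'job.lock');   # illustrative name

my $first  = try_create($path);   # creates the file
my $second = try_create($path);   # fails: it already exists
my $error  = "$!";                # save the reason before anything else runs
print "first: $first, second: $second ($error)\n";
```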
+Don't assume the UNIX user and group semantics: in particular, don't
+expect C<< $< >> and C<< $> >> (or C<$(> and C<$)>) to work
+for switching identities (or group memberships).
+
+Don't assume set-uid and set-gid semantics. (And even if you do,
+think twice: set-uid and set-gid are a known can of security worms.)
+
 =head2 Style
 
 For those times when it is necessary to have platform-specific code,
@@ -559,7 +626,7 @@ often happens when tests spawn off other processes or call external
 programs to aid in the testing, or when (as noted above) the tests
 assume certain things about the filesystem and paths.  Be careful
 not to depend on a specific output style for errors, such as when
-checking C<$!> after an system call.  Some platforms expect a certain
+checking C<$!> after a system call.  Some platforms expect a certain
 output format, and perl on those platforms may have been adjusted
 accordingly.  Most specifically, don't anchor a regex when testing
 an error value.
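The sketch below tests an error condition by name rather than by message. It uses the %! hash from the standard Errno module. The sketch assumes that a missing file is reported as ENOENT, as it is on POSIX systems.

```perl
use strict;
use warnings;
use Errno;
use File::Spec;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my $missing = File::Spec->catfile($dir, 'no-such-file');

my $opened = open(my $fh, '<', $missing);

# Test the error condition by name via %! (provided by Errno), not the
# message text in "$!", which varies by platform and locale.
my $is_enoent = (!$opened && $!{ENOENT}) ? 1 : 0;

# If a message must be matched, leave the regex unanchored, e.g.
#   $err =~ /No such file/   rather than   $err =~ /^No such file/
print $is_enoent ? "missing, as expected\n" : "unexpected result\n";
```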
@@ -613,6 +680,7 @@ are a few of the more popular Unix flavors:
     --------------------------------------------
     AIX           aix        aix
     BSD/OS        bsdos      i386-bsdos
+    Darwin        darwin     darwin
     dgux          dgux       AViiON-dgux
     DYNIX/ptx     dynixptx   i386-dynixptx
     FreeBSD       freebsd    freebsd-i386    
@@ -1372,6 +1440,9 @@ Only good for changing "owner" and "other" read-write access. (S<RISC OS>)
 
 Access permissions are mapped onto VOS access-control list changes. (VOS)
 
+The actual permissions set depend on the value of the C<CYGWIN>
+environment variable in the SYSTEM environment settings.  (Cygwin)
+
 =item chown LIST
 
 Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>, VOS)
@@ -1416,6 +1487,17 @@ Implemented via Spawn. (VM/ESA)
 Does not automatically flush output handles on some platforms.
 (SunOS, Solaris, HP-UX)
 
+=item exit EXPR
+
+=item exit
+
+Emulates UNIX exit() (which considers C<exit 1> to indicate an error) by
+mapping the C<1> to SS$_ABORT (C<44>).  This behavior may be overridden
+with the pragma C<use vmsish 'exit'>.  As with the CRTL's exit()
+function, C<exit 0> is also mapped to an exit status of SS$_NORMAL
+(C<1>); this mapping cannot be overridden.  Any other argument to exit()
+is used directly as Perl's exit status. (VMS)
+
 =item fcntl FILEHANDLE,FUNCTION,SCALAR
 
 Not implemented. (Win32, VMS)
@@ -1559,20 +1641,9 @@ Not implemented. (S<Mac OS>, Win32, Plan9)
 
 Not implemented. (Plan9, Win32)
 
-=item exit EXPR
-
-=item exit
-
-Emulates UNIX exit() (which considers C<exit 1> to indicate an error) by
-mapping the C<1> to SS$_ABORT (C<44>).  This behavior may be overridden
-with the pragma C<use vmsish 'exit'>.  As with the CRTL's exit()
-function, C<exit 0> is also mapped to an exit status of SS$_NORMAL
-(C<1>); this mapping cannot be overridden.  Any other argument to exit()
-is used directly as Perl's exit status. (VMS)
-
 =item getsockopt SOCKET,LEVEL,OPTNAME
 
-Not implemented. (S<Mac OS>, Plan9)
+Not implemented. (Plan9)
 
 =item glob EXPR
 
@@ -1658,11 +1729,11 @@ Not implemented. (Win32, VMS, S<RISC OS>)
 
 =item select RBITS,WBITS,EBITS,TIMEOUT
 
-Only implemented on sockets. (Win32)
+Only implemented on sockets. (Win32, VMS)
 
 Only reliable on sockets. (S<RISC OS>)
 
-Note that the C<socket FILEHANDLE> form is generally portable.
+Note that the C<select FILEHANDLE> form is generally portable.
 
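A common use of the one-argument form is to switch on autoflush for a handle and then restore the previous default. A minimal sketch:

```perl
use strict;
use warnings;

# One-argument select() returns the currently selected output handle and
# makes its argument the default for print and $|.
my $previous = select(STDERR);
$| = 1;                 # turns on autoflush for STDERR, not STDOUT
select($previous);      # restore the old default handle

print "default handle is again ", select(), "\n";
```

After C<use IO::Handle;>, C<< STDERR->autoflush(1) >> achieves the same result without touching the selected handle.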
 =item semctl ID,SEMNUM,CMD,ARG
 
@@ -1690,7 +1761,7 @@ Not implemented. (MPE/iX, Win32)
 
 =item setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL
 
-Not implemented. (S<Mac OS>, Plan9)
+Not implemented. (Plan9)
 
 =item shmctl ID,CMD,ARG
 
@@ -1722,7 +1793,11 @@ as '', so numeric comparison or manipulation of these fields may cause
 'not numeric' warnings.
 
 mtime and atime are the same thing, and ctime is creation time instead of
-inode change time. (S<Mac OS>)
+inode change time. (S<Mac OS>)
+
+ctime not supported on UFS. (S<Mac OS X>)
+
+ctime is creation time instead of inode change time. (Win32)
 
 device and inode are not meaningful.  (Win32)
 
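Given these differences, a cautious program relies mainly on the size and modification-time fields of stat() and treats the other fields as optional. The sketch below takes this approach. It uses a temporary file from the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "data\n";
close($fh);

my @st = stat($file) or die "cannot stat $file: $!";

# size (7) and mtime (9) are the most portable fields.  ctime (10) may
# be a creation time or unsupported, dev/ino (0, 1) may be meaningless,
# and some fields may come back as '' rather than a number.
my ($size, $mtime, $ctime) = @st[7, 9, 10];

printf "size %d bytes, modified %s\n", $size, scalar localtime($mtime);
print 'ctime: ',
    (defined $ctime && length $ctime ? $ctime : 'not available'), "\n";
```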
@@ -1754,6 +1829,16 @@ OS>, OS/390, VM/ESA)
 
 =item system LIST
 
+In general, do not assume the UNIX/POSIX semantics that you can shift
+C<$?> right by eight to get the exit value, or that C<$? & 127>
+would give you the number of the signal that terminated the program,
+or that C<$? & 128> would test true if the program was terminated by a
+coredump.  Instead, use the POSIX W*() interfaces: for example, use
+WIFEXITED($?) and WEXITSTATUS($?) to test for a normal exit and the exit
+value, and WIFSIGNALED($?) and WTERMSIG($?) for a signal exit and the
+signal.  Core dumping is not a portable concept, so there's no portable
+way to test for that.
+
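Here is a minimal sketch of the W*() approach for POSIX-like systems. The child command, a perl that exits with value 3, is only an example.

```perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

# Run a child perl ($^X is the path of the running perl) that exits with 3.
system($^X, '-e', 'exit 3');
my $status = $?;    # save it before anything else can change $?

if ($status == -1) {
    print "could not run the command: $!\n";
}
elsif (WIFEXITED($status)) {
    printf "exited normally with value %d\n", WEXITSTATUS($status);
}
elsif (WIFSIGNALED($status)) {
    printf "terminated by signal %d\n", WTERMSIG($status);
}
```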
 Only implemented if ToolServer is installed. (S<Mac OS>)
 
 As an optimization, may not call the command shell specified in
@@ -1823,7 +1908,7 @@ is finally closed. (AmigaOS)
 
 =item utime LIST
 
-Only the modification time is updated. (S<Mac OS>, VMS, S<RISC OS>)
+Only the modification time is updated. (S<BeOS>, S<Mac OS>, VMS, S<RISC OS>)
 
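On several platforms only the modification time is reliably updated. The sketch below therefore sets both times but verifies only the modification time. The timestamp value is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close($fh);

# 1_000_000_000 is 2001-09-09 01:46:40 UTC; an even value also suits
# filesystems with two-second granularity, such as FAT.
my $when  = 1_000_000_000;
my $count = utime($when, $when, $file);    # returns number of files changed
die "utime failed: $!" unless $count == 1;

# Check only mtime: on some platforms the access time is not updated.
my $mtime = (stat $file)[9];
print $mtime == $when ? "mtime set\n" : "mtime not set as requested\n";
```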
 May not behave as expected.  Behavior depends on the C runtime
 library's implementation of utime(), and the filesystem being
@@ -1937,7 +2022,7 @@ First public release with perl5.005.
 
 As of early 2001 (the Perl releases 5.6.1 and 5.7.1), the following
 platforms are able to build Perl from the standard source code
-distribution available at http://www.perl.com/CPAN/src/index.html
+distribution available at http://www.cpan.org/src/index.html
 
        AIX
        AmigaOS
@@ -2063,7 +2148,7 @@ Support for the following platform is planned for a future Perl release:
        Netware
 
 The following platforms have their own source code distributions and
-binaries available via http://www.perl.com/CPAN/ports/index.html:
+binaries available via http://www.cpan.org/ports/index.html:
 
                                Perl release
 
@@ -2072,7 +2157,7 @@ binaries available via http://www.perl.com/CPAN/ports/index.html:
        Tandem Guardian         5.004
 
 The following platforms have only binaries available via
-http://www.perl.com/CPAN/ports/index.html :
+http://www.cpan.org/ports/index.html :
 
                                Perl release
 
@@ -2083,7 +2168,7 @@ http://www.perl.com/CPAN/ports/index.html :
 Although we do suggest that you always build your own Perl from
 the source code, both for maximal configurability and for security,
 in case you are in a hurry you can check
-http://www.perl.com/CPAN/ports/index.html for binary distributions.
+http://www.cpan.org/ports/index.html for binary distributions.
 
 =head1 SEE ALSO
 
@@ -2117,7 +2202,7 @@ Andrew M. Langmead <aml@world.std.com>,
 Larry Moore <ljmoore@freespace.net>,
 Paul Moore <Paul.Moore@uk.origin-it.com>,
 Chris Nandor <pudge@pobox.com>,
-Matthias Neeracher <neeri@iis.ee.ethz.ch>,
+Matthias Neeracher <neeracher@mac.com>,
 Philip Newton <pne@cpan.org>,
 Gary Ng <71564.1743@CompuServe.COM>,
 Tom Phoenix <rootbeer@teleport.com>,
@@ -2130,6 +2215,3 @@ Michael G Schwern <schwern@pobox.com>,
 Dan Sugalski <dan@sidhe.org>,
 Nathan Torkington <gnat@frii.com>.
 
-=head1 VERSION
-
-Version 1.50, last modified 10 Jul 2001