X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlport.pod;h=250e4266bfc8273c40e97a4d2926ee3fc6e890b5;hb=d29820a7a5800074874ec72fde5f57c0f699c7c8;hp=fc96531e1bdd93077a8f93d04276fe424e2f77a7;hpb=5e12dbfa1fc5fab9ffcdf3a398fa9f5c92327e0d;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlport.pod b/pod/perlport.pod index fc96531..250e426 100644 --- a/pod/perlport.pod +++ b/pod/perlport.pod @@ -75,7 +75,7 @@ This information should not be considered complete; it includes possibly transient information about idiosyncrasies of some of the ports, almost all of which are in a state of constant evolution. Thus, this material should be considered a perpetual work in progress -(Under Construction). +(C<< Under Construction >>). =head1 ISSUES @@ -171,8 +171,8 @@ newline representation. A single line of code will often suffice: Some of this may be confusing. Here's a handy reference to the ASCII CR and LF characters. You can print it out and stick it in your wallet. - LF == \012 == \x0A == \cJ == ASCII 10 - CR == \015 == \x0D == \cM == ASCII 13 + LF eq \012 eq \x0A eq \cJ eq chr(10) eq ASCII 10 + CR eq \015 eq \x0D eq \cM eq chr(13) eq ASCII 13 | Unix | DOS | Mac | --------------------------- @@ -188,7 +188,23 @@ The Unix column assumes that you are not accessing a serial line "\n", and "\n" on output becomes CRLF. These are just the most common definitions of C<\n> and C<\r> in Perl. -There may well be others. +There may well be others. 
For example, on an EBCDIC implementation such +as z/OS or OS/400 the above material is similar to "Unix" but the code +numbers change: + + LF eq \025 eq \x15 eq chr(21) eq CP-1047 21 + LF eq \045 eq \x25 eq \cU eq chr(37) eq CP-0037 37 + CR eq \015 eq \x0D eq \cM eq chr(13) eq CP-1047 13 + CR eq \015 eq \x0D eq \cM eq chr(13) eq CP-0037 13 + + | z/OS | OS/400 | + ---------------------- + \n | LF | LF | + \r | CR | CR | + \n * | LF | LF | + \r * | CR | CR | + ---------------------- + * text-mode STDIO =head2 Numbers endianness and Width @@ -229,8 +245,11 @@ transferring or storing raw binary numbers. One can circumnavigate both these problems in two ways. Either transfer and store numbers always in text format, instead of raw binary, or else consider using modules like Data::Dumper (included in -the standard distribution as of Perl 5.005) and Storable. Keeping -all data as text significantly simplifies matters. +the standard distribution as of Perl 5.005) and Storable (included as +of Perl 5.8). Keeping all data as text significantly simplifies matters. + +The v-strings are portable only up to v2147483647 (0x7FFFFFFF); that is +how far EBCDIC, or more precisely UTF-EBCDIC, will go. =head2 Files and Filesystems @@ -259,6 +278,9 @@ timestamp (meaning that about the only portable timestamp is the modification timestamp), or one second granularity of any timestamps (e.g. the FAT filesystem limits the time granularity to two seconds). +The "inode change timestamp" (the C<-C> filetest) may really be the +"creation timestamp" (which it is not in UNIX). + VOS perl can emulate Unix filenames with C</> as path separator. The native pathname characters greater-than, less-than, number-sign, and percent-sign are always accepted. @@ -267,6 +289,13 @@ S<RISC OS> perl can emulate Unix filenames with C</> as path separator, or go native and use C<.> for path separator and C<:> to signal filesystems and disk names. 
+Don't assume UNIX filesystem access semantics: that read, write, +and execute are all the permissions there are, and even if they exist, +that their semantics (for example what do r, w, and x mean on +a directory) are the UNIX ones. The various UNIX/POSIX compatibility +layers usually try to make interfaces like chmod() work, but sometimes +there simply is no good mapping. + If all this is intimidating, have no (well, maybe only a little) fear. There are modules that can help. The File::Spec modules provide methods to do the Right Thing on whatever platform happens @@ -311,30 +340,61 @@ the user to override the default location of the file. Don't assume a text file will end with a newline. They should, but people forget. -Do not have two files of the same name with different case, like -F and F, as many platforms have case-insensitive -filenames. Also, try not to have non-word characters (except for C<.>) -in the names, and keep them to the 8.3 convention, for maximum -portability, onerous a burden though this may appear. +Do not have two files or directories of the same name with different +case, like F and F, as many platforms have +case-insensitive (or at least case-forgiving) filenames. Also, try +not to have non-word characters (except for C<.>) in the names, and +keep them to the 8.3 convention, for maximum portability, onerous a +burden though this may appear. Likewise, when using the AutoSplit module, try to keep your functions to 8.3 naming and case-insensitive conventions; or, at the least, make it so the resulting files have a unique (case-insensitively) first 8 characters. -Whitespace in filenames is tolerated on most systems, but not all. +Whitespace in filenames is tolerated on most systems, but not all, +and even on systems where it might be tolerated, some utilities +might become confused by such whitespace. + Many systems (DOS, VMS) cannot have more than one C<.> in their filenames. Don't assume C<< > >> won't be the first character of a filename. 
-Always use C<< < >> explicitly to open a file for reading, -unless you want the user to be able to specify a pipe open. +Always use C<< < >> explicitly to open a file for reading, or even +better, use the three-arg version of open, unless you want the user to +be able to specify a pipe open. - open(FILE, "< $existing_file") or die $!; + open(FILE, '<', $existing_file) or die $!; If filenames might use strange characters, it is safest to open it with C<sysopen> instead of C<open>. C<open> is magic and can translate characters like C<< > >>, C<< < >>, and C<|>, which may be the wrong thing to do. (Sometimes, though, it's the right thing.) +Three-arg open can also help protect against this translation in cases +where it is undesirable. + +Don't use C<:> as a part of a filename since many systems use that for +their own semantics (MacOS Classic for separating pathname components, +many networking schemes and utilities for separating the nodename and +the pathname, and so on). For the same reasons, avoid C<@>, C<;> and +C<|>. + +Don't assume that in pathnames you can collapse two leading slashes +C<//> into one: some networking and clustering filesystems have special +semantics for that. Let the operating system sort it out. + +The I<portable filename characters> as defined by ANSI C are + + a b c d e f g h i j k l m n o p q r s t u v w x y z + A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + 0 1 2 3 4 5 6 7 8 9 + . _ - + +and the "-" shouldn't be the first character. If you want to be +hypercorrect, stay case-insensitive and within the 8.3 naming +convention (all the files and directories have to be unique within one +directory if their names are lowercased and truncated to eight +characters before the C<.>, if any, and to three characters after the +C<.>, if any). (And do not use C<.>s in directory names.) =head2 System Interaction @@ -352,6 +412,25 @@ file already tied or opened; C<untie> or C<close> it first. 
Don't open the same file more than once at a time for writing, as some operating systems put mandatory locks on such files. +Don't assume that write/modify permission on a directory gives the +right to add or delete files/directories in that directory. That is +filesystem specific: in some filesystems you need write/modify +permission also (or even just) in the file/directory itself. In some +filesystems (AFS, DFS) the permission to add/delete directory entries +is a completely separate permission. + +Don't assume that a single C<unlink> completely gets rid of the file: +some systems (most notably VMS) have versioned +filesystems, and unlink() removes only the most recent one (it doesn't +remove all the versions because by default the native tools on those +platforms remove just the most recent version, too). The portable +idiom to remove all the versions of a file is + + 1 while unlink "file"; + +This will terminate if the file is undeletable for some reason +(protected, not there, and so on). + Don't count on a specific environment variable existing in C<%ENV>. Don't count on C<%ENV> entries being case-sensitive, or even case-preserving. Don't try to clear %ENV by saying C<%ENV = ();>, or, @@ -369,6 +448,38 @@ directories. Don't count on specific values of C<$!>. +=head2 Command names versus file pathnames + +Don't assume that the name used to invoke a command or program with +C<system> or C<exec> can also be used to test for the existence of the +file that holds the executable code for that command or program. +First, many systems have "internal" commands that are built into the +shell or OS and while these commands can be invoked, there is no +corresponding file. Second, some operating systems (e.g., Cygwin, +DJGPP, OS/2, and VOS) have required suffixes for executable files; +these suffixes are generally permitted on the command name but are not +required. 
Thus, a command like "perl" might exist in a file named +"perl", "perl.exe", or "perl.pm", depending on the operating system. +The variable "_exe" in the Config module holds the executable suffix, +if any. Third, the VMS port carefully sets up $^X and +$Config{perlpath} so that no further processing is required. This is +just as well, because the matching regular expression used below would +then have to deal with a possible trailing version number in the VMS +file name. + +To convert $^X to a file pathname, taking account of the requirements +of the various operating system possibilities, say: + use Config; + $thisperl = $^X; + if ($^O ne 'VMS') + {$thisperl .= $Config{_exe} unless $thisperl =~ m/$Config{_exe}$/i;} + +To convert $Config{perlpath} to a file pathname, say: + use Config; + $thisperl = $Config{perlpath}; + if ($^O ne 'VMS') + {$thisperl .= $Config{_exe} unless $thisperl =~ m/$Config{_exe}$/i;} + =head2 Interprocess Communication (IPC) In general, don't directly access the system in code meant to be @@ -404,6 +515,14 @@ simple, platform-independent mailing. The Unix System V IPC (C) is not available even on all Unix platforms. +Do not use either the bare result of C or +bare v-strings (such as C) to represent IPv4 addresses: +both forms just pack the four bytes into network order. That this +would be equal to the C language C struct (which is what the +socket code internally uses) is not guaranteed. To be portable use +the routines of the Socket extension, such as C, +C, and C. + The rule of thumb for portable code is: Do it all in portable Perl, or use a module (that may internally implement it with platform-specific code, but expose a common interface). @@ -468,15 +587,20 @@ to get what should be the proper value on any system. =head2 Character sets and character encoding -Assume little about character sets. Assume nothing about -numerical values (C, C) of characters. 
Do not -assume that the alphabetic characters are encoded contiguously (in -the numeric sense). Do not assume anything about the ordering of the -characters. The lowercase letters may come before or after the -uppercase letters; the lowercase and uppercase may be interlaced so -that both `a' and `A' come before `b'; the accented and other -international characters may be interlaced so that E comes -before `b'. +Assume very little about character sets. + +Assume nothing about numerical values (C, C) of characters. +Do not use explicit code point ranges (like \xHH-\xHH); use for +example symbolic character classes like C<[:print:]>. + +Do not assume that the alphabetic characters are encoded contiguously +(in the numeric sense). There may be gaps. + +Do not assume anything about the ordering of the characters. +The lowercase letters may come before or after the uppercase letters; +the lowercase and uppercase may be interlaced so that both `a' and `A' +come before `b'; the accented and other international characters may +be interlaced so that E comes before `b'. =head2 Internationalisation @@ -511,13 +635,31 @@ more efficient that the first. Most multi-user platforms provide basic levels of security, usually implemented at the filesystem level. Some, however, do -not--unfortunately. Thus the notion of user id, or "home" directory, +not-- unfortunately. Thus the notion of user id, or "home" directory, or even the state of being logged-in, may be unrecognizable on many platforms. If you write programs that are security-conscious, it is usually best to know what type of system you will be running under so that you can write code explicitly for that platform (or class of platforms). +Don't assume the UNIX filesystem access semantics: the operating +system or the filesystem may be using some ACL systems, which are +richer languages than the usual rwx. Even if the rwx exist, +their semantics might be different. 
+(From a security viewpoint, testing for permissions before attempting to +do something is silly anyway: if one tries this, there is potential +for race conditions, since someone or something might change the +permissions between the permissions check and the actual operation. +Just try the operation.) + +Don't assume the UNIX user and group semantics: especially, don't +expect the C<< $< >> and C<< $> >> (or the C<$(> and C<$)>) to work +for switching identities (or memberships). + +Don't assume set-uid and set-gid semantics. (And even if you do, +think twice: set-uid and set-gid are a known can of security worms.) + =head2 Style For those times when it is necessary to have platform-specific code, @@ -532,7 +674,7 @@ often happens when tests spawn off other processes or call external programs to aid in the testing, or when (as noted above) the tests assume certain things about the filesystem and paths. Be careful not to depend on a specific output style for errors, such as when -checking C<$!> after an system call. Some platforms expect a certain +checking C<$!> after a system call. Some platforms expect a certain output format, and perl on those platforms may have been adjusted accordingly. Most specifically, don't anchor a regex when testing an error value. @@ -586,6 +728,7 @@ are a few of the more popular Unix flavors: -------------------------------------------- AIX aix aix BSD/OS bsdos i386-bsdos + Darwin darwin darwin dgux dgux AViiON-dgux DYNIX/ptx dynixptx i386-dynixptx FreeBSD freebsd freebsd-i386 @@ -595,7 +738,7 @@ are a few of the more popular Unix flavors: Linux linux ppc-linux HP-UX hpux PA-RISC1.1 IRIX irix irix - Mac OS X rhapsody rhapsody + Mac OS X darwin darwin MachTen PPC machten powerpc-machten NeXT 3 next next-fat NeXT 4 next OPENSTEP-Mach @@ -663,17 +806,22 @@ often assume nothing about their data. 
The C<$^O> variable and the C<$Config{archname}> values for various DOSish perls are as follows: - OS $^O $Config{'archname'} - -------------------------------------------- - MS-DOS dos - PC-DOS dos - OS/2 os2 - Windows 95 MSWin32 MSWin32-x86 - Windows 98 MSWin32 MSWin32-x86 - Windows NT MSWin32 MSWin32-x86 - Windows NT MSWin32 MSWin32-ALPHA - Windows NT MSWin32 MSWin32-ppc - Cygwin cygwin + OS $^O $Config{archname} ID Version + -------------------------------------------------------- + MS-DOS dos ? + PC-DOS dos ? + OS/2 os2 ? + Windows 3.1 ? ? 0 3 01 + Windows 95 MSWin32 MSWin32-x86 1 4 00 + Windows 98 MSWin32 MSWin32-x86 1 4 10 + Windows ME MSWin32 MSWin32-x86 1 ? + Windows NT MSWin32 MSWin32-x86 2 4 xx + Windows NT MSWin32 MSWin32-ALPHA 2 4 xx + Windows NT MSWin32 MSWin32-ppc 2 4 xx + Windows 2000 MSWin32 MSWin32-x86 2 5 xx + Windows XP MSWin32 MSWin32-x86 2 ? + Windows CE MSWin32 ? 3 + Cygwin cygwin ? The various MSWin32 Perl's can distinguish the OS they are running on via the value of the fifth element of the list returned from @@ -697,7 +845,7 @@ and L. The EMX environment for DOS, OS/2, etc. emx@iaehv.nl, http://www.leo.org/pub/comp/os/os2/leo/gnu/emx+gcc/index.html or -ftp://hobbes.nmsu.edu/pub/os2/dev/emx. Also L. +ftp://hobbes.nmsu.edu/pub/os2/dev/emx/ Also L. =item * @@ -784,14 +932,10 @@ the application or MPW tool version is running, check: $is_ppc = $MacPerl::Architecture eq 'MacPPC'; $is_68k = $MacPerl::Architecture eq 'Mac68K'; -S and S, based on NeXT's OpenStep OS, will -(in theory) be able to run MacPerl natively, under the "Classic" -environment. The new "Cocoa" environment (formerly called the "Yellow Box") -may run a slightly modified version of MacPerl, using the Carbon interfaces. - -S and its Open Source version, Darwin, both run Unix -perl natively (with a few patches). Full support for these -is slated for perl 5.6. +S, based on NeXT's OpenStep OS, runs MacPerl natively, under the +"Classic" environment. 
There is no "Carbon" version of MacPerl to run +under the primary Mac OS X environment. S and its Open Source +version, Darwin, both run Unix perl natively. Also see: @@ -799,15 +943,15 @@ Also see: =item * -The MacPerl Pages, http://www.macperl.com/ . +MacPerl Development, http://dev.macperl.org/ . =item * -The MacPerl mailing lists, http://www.macperl.org/ . +The MacPerl Pages, http://www.macperl.com/ . =item * -MacPerl Module Porters, http://pudge.net/mmp/ . +The MacPerl mailing lists, http://lists.perl.org/ . =back @@ -928,12 +1072,12 @@ Perl on VOS is discussed in F in the perl distribution (installed as L). Perl on VOS can accept either VOS- or Unix-style file specifications as in either of the following: - $ perl -ne "print if /perl_setup/i" >system>notices - $ perl -ne "print if /perl_setup/i" /system/notices + C<< $ perl -ne "print if /perl_setup/i" >system>notices >> + C<< $ perl -ne "print if /perl_setup/i" /system/notices >> or even a mixture of both as in: - $ perl -ne "print if /perl_setup/i" >system/notices + C<< $ perl -ne "print if /perl_setup/i" >system/notices >> Even though VOS allows the slash character to appear in object names, because the VOS port of Perl interprets it as a pathname @@ -942,11 +1086,12 @@ contain a slash character cannot be processed. Such files must be renamed before they can be processed by Perl. Note that VOS limits file names to 32 or fewer characters. -See F for restrictions that apply when Perl is built -with the alpha version of VOS POSIX.1 support. - -Perl on VOS is built without any extensions and does not support -dynamic loading. +Perl on VOS can be built using two different compilers and two different +versions of the POSIX runtime. The recommended method for building full +Perl is with the GNU C compiler and the generally-available version of +VOS POSIX support. See F (installed as L) for +restrictions that apply when Perl is built using the VOS Standard C +compiler or the alpha version of VOS POSIX support. 
The value of C<$^O> on VOS is "VOS". To determine the architecture that you are running on without resorting to loading all of C<%Config> you @@ -978,7 +1123,7 @@ Also see: =item * -F +F (installed as L) =item * @@ -986,12 +1131,12 @@ The VOS mailing list. There is no specific mailing list for Perl on VOS. You can post comments to the comp.sys.stratus newsgroup, or subscribe to the general -Stratus mailing list. Send a letter with "Subscribe Info-Stratus" in +Stratus mailing list. Send a letter with "subscribe Info-Stratus" in the message body to majordomo@list.stratagy.com. =item * -VOS Perl on the web at http://ftp.stratus.com/pub/vos/vos.html +VOS Perl on the web at http://ftp.stratus.com/pub/vos/posix/posix.html =back @@ -1214,6 +1359,7 @@ in the "OTHER" category include: OS $^O $Config{'archname'} ------------------------------------------ Amiga DOS amigaos m68k-amigos + BeOS beos MPE/iX mpeix PA-RISC1.1 See also: @@ -1344,6 +1490,9 @@ Only good for changing "owner" and "other" read-write access. (S) Access permissions are mapped onto VOS access-control list changes. (VOS) +The actual permissions set depend on the value of the C +in the SYSTEM environment settings. (Cygwin) + =item chown LIST Not implemented. (S, Win32, Plan9, S, VOS) @@ -1388,6 +1537,17 @@ Implemented via Spawn. (VM/ESA) Does not automatically flush output handles on some platforms. (SunOS, Solaris, HP-UX) +=item exit EXPR + +=item exit + +Emulates UNIX exit() (which considers C to indicate an error) by +mapping the C<1> to SS$_ABORT (C<44>). This behavior may be overridden +with the pragma C. As with the CRTL's exit() +function, C is also mapped to an exit status of SS$_NORMAL +(C<1>); this mapping cannot be overridden. Any other argument to exit() +is used directly as Perl's exit status. (VMS) + =item fcntl FILEHANDLE,FUNCTION,SCALAR Not implemented. (Win32, VMS) @@ -1483,14 +1643,6 @@ Not implemented. (S, Win32, Plan9) Not implemented. (Win32, Plan9) -=item setpwent - -Not implemented. 
(S, Win32, S) - -=item setgrent - -Not implemented. (S, Win32, VMS, S) - =item sethostent STAYOPEN Not implemented. (S, Win32, Plan9, S) @@ -1533,15 +1685,12 @@ Not implemented. (Plan9, Win32) =item getsockopt SOCKET,LEVEL,OPTNAME -Not implemented. (S, Plan9) +Not implemented. (Plan9) =item glob EXPR =item glob -Globbing built-in, but only C<*> and C metacharacters are supported. -(S) - This operator is implemented via the File::Glob extension on most platforms. See L for portability information. @@ -1556,8 +1705,10 @@ Available only for socket handles. (S) =item kill SIGNAL, LIST -Not implemented, hence not useful for taint checking. (S, -S) +C is implemented for the sake of taint checking; +use with other signals is unimplemented. (S) + +Not implemented, hence not useful for taint checking. (S) C doesn't have the semantics of C, i.e. it doesn't send a signal to the identified process like it does on Unix platforms. @@ -1610,8 +1761,6 @@ platforms. (SunOS, Solaris, HP-UX) =item pipe READHANDLE,WRITEHANDLE -Not implemented. (S) - Very limited functionality. (MiNT) =item readlink EXPR @@ -1622,11 +1771,11 @@ Not implemented. (Win32, VMS, S) =item select RBITS,WBITS,EBITS,TIMEOUT -Only implemented on sockets. (Win32) +Only implemented on sockets. (Win32, VMS) Only reliable on sockets. (S) -Note that the C form is generally portable. +Note that the C