X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlport.pod;h=ef05ecf3492c94d774ff5532fc5213618631cf54;hb=666f95b95a2f6347f7b7bbc8951144df2db05479;hp=586913502036fb49a301ccc2b4fc8c253758ff0b;hpb=d1be9408a3c14848d30728674452e191ba5fffaa;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlport.pod b/pod/perlport.pod
index 5869135..ef05ecf 100644
--- a/pod/perlport.pod
+++ b/pod/perlport.pod
@@ -232,6 +232,9 @@ binary, or else consider using modules like Data::Dumper (included in
 the standard distribution as of Perl 5.005) and Storable (included as
 of perl 5.8). Keeping all data as text significantly simplifies matters.

+The v-strings are portable only up to v2147483647 (0x7FFFFFFF), that's
+how far EBCDIC, or more precisely UTF-EBCDIC will go.
+
 =head2 Files and Filesystems

 Most platforms these days structure files in a hierarchical fashion.
@@ -259,6 +262,9 @@ timestamp (meaning that about the only portable timestamp is the
 modification timestamp), or one second granularity of any timestamps
 (e.g. the FAT filesystem limits the time granularity to two seconds).

+The "inode change timestamp" (the <-C> filetest) may really be the
+"creation timestamp" (which it is not in UNIX).
+
 VOS perl can emulate Unix filenames with C as path separator. The
 native pathname characters greater-than, less-than, number-sign, and
 percent-sign are always accepted.
@@ -267,6 +273,13 @@ S perl can emulate Unix filenames with C as path
 separator, or go native and use C<.> for path separator and C<:> to
 signal filesystems and disk names.

+Don't assume UNIX filesystem access semantics: that read, write,
+and execute are all the permissions there are, and even if they exist,
+that their semantics (for example what do r, w, and x mean on
+a directory) are the UNIX ones. The various UNIX/POSIX compatibility
+layers usually try to make interfaces like chmod() work, but sometimes
+there simply is no good mapping.
+
 If all this is intimidating, have no (well, maybe only a little)
 fear. There are modules that can help. The File::Spec modules
 provide methods to do the Right Thing on whatever platform happens
@@ -311,30 +324,61 @@ the user to override the default location of the file.

 Don't assume a text file will end with a newline. They should,
 but people forget.

-Do not have two files of the same name with different case, like
-F and F, as many platforms have case-insensitive
-filenames. Also, try not to have non-word characters (except for C<.>)
-in the names, and keep them to the 8.3 convention, for maximum
-portability, onerous a burden though this may appear.
+Do not have two files or directories of the same name with different
+case, like F and F, as many platforms have
+case-insensitive (or at least case-forgiving) filenames. Also, try
+not to have non-word characters (except for C<.>) in the names, and
+keep them to the 8.3 convention, for maximum portability, onerous a
+burden though this may appear.

 Likewise, when using the AutoSplit module, try to keep your functions to
 8.3 naming and case-insensitive conventions; or, at the least,
 make it so the resulting files have a unique (case-insensitively)
 first 8 characters.

-Whitespace in filenames is tolerated on most systems, but not all.
+Whitespace in filenames is tolerated on most systems, but not all,
+and even on systems where it might be tolerated, some utilities
+might become confused by such whitespace.
+
 Many systems (DOS, VMS) cannot have more than one C<.> in their filenames.

 Don't assume C<< > >> won't be the first character of a filename.
-Always use C<< < >> explicitly to open a file for reading,
-unless you want the user to be able to specify a pipe open.
+Always use C<< < >> explicitly to open a file for reading, or even
+better, use the three-arg version of open, unless you want the user to
+be able to specify a pipe open.

-    open(FILE, "< $existing_file") or die $!;
+    open(FILE, '<', $existing_file) or die $!;

 If filenames might use strange characters, it is safest to open it
 with C instead of C. C is magic and can
 translate characters like C<< > >>, C<< < >>, and C<|>, which may
 be the wrong thing to do. (Sometimes, though, it's the right thing.)
+Three-arg open can also help protect against this translation in cases
+where it is undesirable.
+
+Don't use C<:> as a part of a filename since many systems use that for
+their own semantics (MacOS Classic for separating pathname components,
+many networking schemes and utilities for separating the nodename and
+the pathname, and so on). For the same reasons, avoid C<@>, C<;> and
+C<|>.
+
+Don't assume that in pathnames you can collapse two leading slashes
+C into one: some networking and clustering filesystems have special
+semantics for that. Let the operating system to sort it out.
+
+The I as defined by ANSI C are
+
+    a b c d e f g h i j k l m n o p q r t u v w x y z
+    A B C D E F G H I J K L M N O P Q R T U V W X Y Z
+    0 1 2 3 4 5 6 7 8 9
+    . _ -
+
+and the "-" shouldn't be the first character. If you want to be
+hypercorrect, stay case-insensitive and within the 8.3 naming
+convention (all the files and directories have to be unique within one
+directory if their names are lowercased and truncated to eight
+characters before the C<.>, if any, and to three characters after the
+C<.>, if any). (And do not use C<.>s in directory names.)

 =head2 System Interaction

@@ -495,15 +539,20 @@ to get what should be the proper value on any system.

 =head2 Character sets and character encoding

-Assume little about character sets. Assume nothing about
-numerical values (C, C) of characters. Do not
-assume that the alphabetic characters are encoded contiguously (in
-the numeric sense). Do not assume anything about the ordering of the
-characters. The lowercase letters may come before or after the
-uppercase letters; the lowercase and uppercase may be interlaced so
-that both `a' and `A' come before `b'; the accented and other
-international characters may be interlaced so that E comes
-before `b'.
+Assume very little about character sets.
+
+Assume nothing about numerical values (C, C) of characters.
+Do not use explicit code point ranges (like \xHH-\xHH); use for
+example symbolic character classes like C<[:print:]>.
+
+Do not assume that the alphabetic characters are encoded contiguously
+(in the numeric sense). There may be gaps.
+
+Do not assume anything about the ordering of the characters.
+The lowercase letters may come before or after the uppercase letters;
+the lowercase and uppercase may be interlaced so that both `a' and `A'
+come before `b'; the accented and other international characters may
+be interlaced so that E comes before `b'.

 =head2 Internationalisation

@@ -538,13 +587,31 @@ more efficient that the first.

 Most multi-user platforms provide basic levels of security, usually
 implemented at the filesystem level. Some, however, do
-not--unfortunately. Thus the notion of user id, or "home" directory,
+not-- unfortunately. Thus the notion of user id, or "home" directory,
 or even the state of being logged-in, may be unrecognizable on many
 platforms. If you write programs that are security-conscious, it
 is usually best to know what type of system you will be running
 under so that you can write code explicitly for that platform (or
 class of platforms).

+Don't assume the UNIX filesystem access semantics: the operating
+system or the filesystem may be using some ACL systems, which are
+richer languages than the usual rwx. Even if the rwx exist,
+their semantics might be different.
+
+(From security viewpoint testing for permissions before attempting to
+do something is silly anyway: if one tries this, there is potential
+for race conditions-- someone or something might change the
+permissions between the permissions check and the actual operation.
+Just try the operation.)
+
+Don't assume the UNIX user and group semantics: especially, don't
+expect the C<< $< >> and C<< $> >> (or the C<$(> and C<$)>) to work
+for switching identities (or memberships).
+
+Don't assume set-uid and set-gid semantics. (And even if you do,
+think twice: set-uid and set-gid are a known can of security worms.)
+
 =head2 Style

 For those times when it is necessary to have platform-specific code,
@@ -613,6 +680,7 @@ are a few of the more popular Unix flavors:
     --------------------------------------------
     AIX           aix        aix
     BSD/OS        bsdos      i386-bsdos
+    Darwin        darwin     darwin
     dgux          dgux       AViiON-dgux
     DYNIX/ptx     dynixptx   i386-dynixptx
     FreeBSD       freebsd    freebsd-i386
@@ -1372,6 +1440,9 @@ Only good for changing "owner" and "other" read-write access. (S)

 Access permissions are mapped onto VOS access-control list changes. (VOS)

+The actual permissions set depend on the value of the C
+in the SYSTEM environment settings. (Cygwin)
+
 =item chown LIST

 Not implemented. (S, Win32, Plan9, S, VOS)
@@ -1416,6 +1487,17 @@ Implemented via Spawn. (VM/ESA)

 Does not automatically flush output handles on some platforms.
 (SunOS, Solaris, HP-UX)

+=item exit EXPR
+
+=item exit
+
+Emulates UNIX exit() (which considers C to indicate an error) by
+mapping the C<1> to SS$_ABORT (C<44>). This behavior may be overridden
+with the pragma C. As with the CRTL's exit()
+function, C is also mapped to an exit status of SS$_NORMAL
+(C<1>); this mapping cannot be overridden. Any other argument to exit()
+is used directly as Perl's exit status. (VMS)
+
 =item fcntl FILEHANDLE,FUNCTION,SCALAR

 Not implemented. (Win32, VMS)
@@ -1559,17 +1641,6 @@ Not implemented. (S, Win32, Plan9)

 Not implemented. (Plan9, Win32)

-=item exit EXPR
-
-=item exit
-
-Emulates UNIX exit() (which considers C to indicate an error) by
-mapping the C<1> to SS$_ABORT (C<44>). This behavior may be overridden
-with the pragma C. As with the CRTL's exit()
-function, C is also mapped to an exit status of SS$_NORMAL
-(C<1>); this mapping cannot be overridden. Any other argument to exit()
-is used directly as Perl's exit status. (VMS)
-
 =item getsockopt SOCKET,LEVEL,OPTNAME

 Not implemented. (Plan9)
@@ -1658,11 +1729,11 @@ Not implemented. (Win32, VMS, S)

 =item select RBITS,WBITS,EBITS,TIMEOUT

-Only implemented on sockets. (Win32)
+Only implemented on sockets. (Win32, VMS)

 Only reliable on sockets. (S)

 Note that the C