X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlport.pod;h=8a72de246fdcc77a5849d66e8cee1b45eaf8dfce;hb=b86b480f7301a8816081189c89b366a79ab9909f;hp=36a87050ba015a42a1df599109cf771bd9943a02;hpb=e5218da503dbb4980410e0018f4cc5dcba3ea666;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlport.pod b/pod/perlport.pod index 36a8705..8a72de2 100644 --- a/pod/perlport.pod +++ b/pod/perlport.pod @@ -94,21 +94,9 @@ from) C<\015\012>, depending on whether you're reading or writing. Unix does the same thing on ttys in canonical mode. C<\015\012> is commonly referred to as CRLF. -A common cause of unportable programs is the misuse of chop() to trim -newlines: - - # XXX UNPORTABLE! - while() { - chop; - @array = split(/:/); - #... - } - -You can get away with this on Unix and Mac OS (they have a single -character end-of-line), but the same program will break under DOSish -perls because you're only chop()ing half the end-of-line. Instead, -chomp() should be used to trim newlines. The L module -can help audit your code for misuses of chop(). +To trim trailing newlines from text lines use chomp(). With default +settings that function looks for a trailing C<\n> character and thus +trims in a portable way. When dealing with binary files (or text files in binary mode) be sure to explicitly set $/ to the appropriate value for your file format @@ -360,7 +348,8 @@ Whitespace in filenames is tolerated on most systems, but not all, and even on systems where it might be tolerated, some utilities might become confused by such whitespace. -Many systems (DOS, VMS) cannot have more than one C<.> in their filenames. +Many systems (DOS, VMS ODS-2) cannot have more than one C<.> in their +filenames. Don't assume C<< > >> won't be the first character of a filename. 
Always use C<< < >> explicitly to open a file for reading, or even @@ -444,6 +433,16 @@ if you really have to, make it conditional on C<$^O ne 'VMS'> since in VMS the C<%ENV> table is much more than a per-process key-value string table. +On VMS, some entries in the %ENV hash are dynamically created when +their key is used on a read if they did not previously exist. The +values for C<$ENV{HOME}>, C<$ENV{TERM}>, C<$ENV{PATH}>, and C<$ENV{USER}>, +are known to be dynamically generated. The specific names that are +dynamically generated may vary with the version of the C library on VMS, +and more may exist than are documented. + +On VMS by default, changes to the %ENV hash persist after the process +exits. This can cause unintended issues. + Don't count on signals or C<%SIG> for anything. Don't count on filename globbing. Use C, C, and @@ -642,9 +641,6 @@ The value for C<$offset> in Unix will be C<0>, but in Mac OS will be some large number. C<$offset> can then be added to a Unix time value to get what should be the proper value on any system. -On Windows (at least), you shouldn't pass a negative value to C or -C. - =head2 Character sets and character encoding Assume very little about character sets. @@ -658,9 +654,9 @@ Do not assume that the alphabetic characters are encoded contiguously Do not assume anything about the ordering of the characters. The lowercase letters may come before or after the uppercase letters; -the lowercase and uppercase may be interlaced so that both `a' and `A' -come before `b'; the accented and other international characters may -be interlaced so that E comes before `b'. +the lowercase and uppercase may be interlaced so that both "a" and "A" +come before "b"; the accented and other international characters may +be interlaced so that E comes before "b". =head2 Internationalisation @@ -683,12 +679,9 @@ ISO 8859-1 bytes beyond 0x7f into your strings might cause trouble later. If the bytes are native 8-bit bytes, you can use the C pragma. 
If the bytes are in a string (regular expression being a curious string), you can often also use the C<\xHH> notation instead -of embedding the bytes as-is. If they are in some particular legacy -encoding (ether single-byte or something more complicated), you can -use the C pragma. (If you want to write your code in UTF-8, -you can use either the C pragma, or the C pragma.) -The C and C pragmata are available since Perl 5.6.0, and -the C pragma since Perl 5.8.0. +of embedding the bytes as-is. (If you want to write your code in UTF-8, +you can use the C.) The C and C pragmata are +available since Perl 5.6.0. =head2 System Resources @@ -1057,6 +1050,9 @@ MPW, ftp://ftp.apple.com/developer/Tool_Chest/Core_Mac_OS_Tools/ =head2 VMS Perl on VMS is discussed in L in the perl distribution. + +The official name of VMS as of this writing is OpenVMS. + Perl on VMS can accept either VMS- or Unix-style file specifications as in either of the following: @@ -1093,29 +1089,98 @@ you are so inclined. For example: Do take care with C<$ ASSIGN/nolog/user SYS$COMMAND: SYS$INPUT> if your perl-in-DCL script expects to do things like C<< $read = ; >>. -Filenames are in the format "name.extension;version". The maximum -length for filenames is 39 characters, and the maximum length for +The VMS operating system has two filesystems, known as ODS-2 and ODS-5. + +For ODS-2, filenames are in the format "name.extension;version". The +maximum length for filenames is 39 characters, and the maximum length for extensions is also 39 characters. Version is a number from 1 to 32767. Valid characters are C. -VMS's RMS filesystem is case-insensitive and does not preserve case. -C returns lowercased filenames, but specifying a file for -opening remains case-insensitive. Files without extensions have a -trailing period on them, so doing a C with a file named F -will return F (though that file could be opened with +The ODS-2 filesystem is case-insensitive and does not preserve case. 
+Perl simulates this by converting all filenames to lowercase internally. + +For ODS-5, filenames may have almost any character in them and can include +Unicode characters. Characters that could be misinterpreted by the DCL +shell or file parsing utilities need to be prefixed with the C<^> +character, or replaced with hexadecimal characters prefixed with the +C<^> character. Such prefixing is only needed when the pathnames are +in VMS format in applications. Programs that can accept the UNIX format +of pathnames do not need the escape characters. The maximum length for +filenames is 255 characters. The ODS-5 file system can handle both +a case-preserved and a case-sensitive mode. + +ODS-5 is only available on OpenVMS for 64-bit platforms. + +Support for the extended file specifications is being implemented as optional +settings to preserve backward compatibility with Perl scripts that +assume the previous VMS limitations. + +In general, routines on VMS that get a UNIX format file specification +should return it in a UNIX format, and when they get a VMS format +specification they should return a VMS format unless they are documented +to do a conversion. + +For routines that generate or return a file specification, VMS allows setting, +in the C library on which Perl is built, whether it will be returned in VMS +format or in UNIX format. + +With the ODS-2 file system, there is not much difference in syntax of +filenames without paths for VMS or UNIX. With the extended character +set available with ODS-5 there can be a significant difference. + +Because of this, existing Perl scripts written for VMS were sometimes +treating VMS and UNIX filenames interchangeably. Without the extended +character set enabled, this behavior will mostly be maintained for +backwards compatibility. + +When extended characters are enabled with ODS-5, the handling of +UNIX formatted file specifications follows that of a UNIX system. + +VMS file specifications without extensions have a trailing dot. 
An +equivalent UNIX file specification should not show the trailing dot. + +The result of all of this is that, for portable scripts on VMS, you +cannot depend on Perl to present the filenames in lowercase or to be +case sensitive, and the filenames could be returned in either +UNIX or VMS format. + +And if a routine returns a file specification, unless it is intended to +convert it, it should return it in the same format as it found it. + +C by default has traditionally returned lowercased filenames. +When the ODS-5 support is enabled, it will return the exact case of the +filename on the disk. + +Files without extensions have a trailing period on them, so doing a +C in the default mode with a file named F will +return F (though that file could be opened with C). +With support for extended file specifications and if C was +given a UNIX format directory, a file named F will return F +and optionally in the exact case on the disk. When C is given +a VMS format directory, then C should return F, +again optionally in the exact case. + RMS had an eight level limit on directory depths from any rooted logical -(allowing 16 levels overall) prior to VMS 7.2. Hence -C is a valid directory specification but -C is not. F authors might -have to take this into account, but at least they can refer to the former -as C. +(allowing 16 levels overall) prior to VMS 7.2, and even with versions of +VMS on VAX up through 7.3. Hence C is a +valid directory specification but C is +not. F authors might have to take this into account, but at +least they can refer to the former as C. 
+ +Pumpkings and module integrators can easily see whether files with too many +directory levels have snuck into the core by running the following in the +top-level source directory: + + $ perl -ne "$_=~s/\s+.*//; print if scalar(split /\//) > 8;" < MANIFEST + The VMS::Filespec module, which gets installed as part of the build process on VMS, is a pure Perl module that can easily be installed on non-VMS platforms and can be helpful for conversions to and from RMS -native formats. +native formats. It is also now the only way that you should check to +see if VMS is in a case sensitive mode. What C<\n> represents depends on the type of file opened. It usually represents C<\012> but it could also be C<\015>, C<\012>, C<\015\012>, @@ -1126,6 +1191,10 @@ special fopen() requirements of files with unusual attributes on VMS. TCP/IP stacks are optional on VMS, so socket routines might not be implemented. UDP sockets may not be supported. +The TCP/IP library support for all current versions of VMS is dynamically +loaded if present, so even if the routines are configured, they may +return a status indicating that they are not implemented. + The value of C<$^O> on OpenVMS is "VMS". To determine the architecture that you are running on without resorting to loading all of C<%Config> you can examine the content of the C<@INC> array like so: @@ -1136,10 +1205,16 @@ you can examine the content of the C<@INC> array like so: } elsif (grep(/VMS_VAX/, @INC)) { print "I'm on VAX!\n"; + } elsif (grep(/VMS_IA64/, @INC)) { + print "I'm on IA64!\n"; + } else { print "I'm not so sure about where $^O is...\n"; } +In general, the significant differences should only be if Perl is running +on VMS_VAX or one of the 64 bit OpenVMS platforms. + On VMS, perl determines the UTC offset from the C logical name. 
Although the VMS epoch began at 17-NOV-1858 00:00:00.00, calls to C are adjusted to count offsets from @@ -1155,9 +1230,7 @@ F (installed as L), L =item * -vmsperl list, majordomo@perl.org - -(Put the words C in message body.) +vmsperl list, vmsperl-subscribe@perl.org =item * @@ -1183,7 +1256,8 @@ names, because the VOS port of Perl interprets it as a pathname delimiting character, VOS files, directories, or links whose names contain a slash character cannot be processed. Such files must be renamed before they can be processed by Perl. Note that VOS limits -file names to 32 or fewer characters. +file names to 32 or fewer characters, file names cannot start with a +C<-> character, or contain any character matching C<< tr/ !%&'()*+;<>?// >> The value of C<$^O> on VOS is "VOS". To determine the architecture that you are running on without resorting to loading all of C<%Config> you @@ -1504,6 +1578,11 @@ C<-r>, C<-w>, and C<-x> have a limited meaning only; directories and applications are executable, and there are no uid/gid considerations. C<-o> is not supported. (S) +C<-w> only inspects the read-only file attribute (FILE_ATTRIBUTE_READONLY), +which determines whether the directory can be deleted, not whether it can +be written to. Directories always have read and write access unless denied +by discretionary access control lists (DACLs). (S) + C<-r>, C<-w>, C<-x>, and C<-o> tell whether the file is accessible, which may not reflect UIC-based file protections. (VMS) @@ -1602,7 +1681,7 @@ Not implemented. (VMS, S, VOS) Not useful. (S, S) -Not implemented. (Win32) +Not supported. (Cygwin, Win32) Invokes VMS debugger. (VMS) @@ -1622,11 +1701,17 @@ mapping the C<1> to SS$_ABORT (C<44>). This behavior may be overridden with the pragma C. As with the CRTL's exit() function, C is also mapped to an exit status of SS$_NORMAL (C<1>); this mapping cannot be overridden. Any other argument to exit() -is used directly as Perl's exit status. 
(VMS) +is used directly as Perl's exit status. On VMS, unless the future +POSIX_EXIT mode is enabled, the exit code should always be a valid +VMS exit code and not a generic number. When the POSIX_EXIT mode is +enabled, a generic number will be encoded in a method compatible with +the C library _POSIX_EXIT macro so that it can be decoded by other +programs, particularly ones written in C, like the GNV package. (VMS) =item fcntl -Not implemented. (Win32, VMS) +Not implemented. (Win32) +Some functions available based on the version of VMS. (VMS) =item flock @@ -1773,6 +1858,10 @@ Not implemented. (S) This operator is implemented via the File::Glob extension on most platforms. See L for portability information. +=item gmtime + +gmtime() has a range of about 2 billion years before and after 1970. + =item ioctl FILEHANDLE,FUNCTION,SCALAR Not implemented. (VMS) @@ -1796,19 +1885,39 @@ and makes it exit immediately with exit status $sig. As in Unix, if $sig is 0 and the specified process exists, it returns true without actually terminating it. (Win32) +C will terminate the process specified by $pid and +recursively all child processes owned by it. This is different from +the Unix semantics, where the signal will be delivered to all +processes in the same process group as the process specified by +$pid. (Win32) + +Is not supported for process identification number of 0 or negative +numbers. (VMS) + =item link -Not implemented. (S, MPE/iX, VMS, S) +Not implemented. (S, MPE/iX, S) Link count not updated because hard links are not quite that hard (They are sort of half-way between hard and soft links). (AmigaOS) -Hard links are implemented on Win32 (Windows NT and Windows 2000) -under NTFS only. +Hard links are implemented on Win32 under NTFS only. They are +natively supported on Windows 2000 and later. 
On Windows NT they +are implemented using the Windows POSIX subsystem support and the +Perl process will need Administrator or Backup Operator privileges +to create hard links. + +Available on 64 bit OpenVMS 8.2 and later. (VMS) + +=item localtime + +localtime() has the same range as L, but because time zone +rules change its accuracy for historical and future times may degrade +but usually by no more than an hour. =item lstat -Not implemented. (VMS, S) +Not implemented. (S) Return values (especially for device and inode) may be bogus. (Win32) @@ -1897,7 +2006,9 @@ be implemented even in UNIX platforms. =item socketpair -Not implemented. (Win32, VMS, S, VOS, VM/ESA) +Not implemented. (S, VOS, VM/ESA) + +Available on 64 bit OpenVMS 8.2 and later. (VMS) =item stat @@ -1925,9 +2036,17 @@ meaningful and will differ between stat calls on the same file. (os2) some versions of cygwin when doing a stat("foo") and if not finding it may then attempt to stat("foo.exe") (Cygwin) +On Win32 stat() needs to open the file to determine the link count +and update attributes that may have been changed through hard links. +Setting ${^WIN32_SLOPPY_STAT} to a true value speeds up stat() by +not performing this operation. (Win32) + =item symlink -Not implemented. (Win32, VMS, S) +Not implemented. (Win32, S) + +Implemented on 64 bit VMS 8.3. VMS requires the symbolic link to be in Unix +syntax if it is intended to resolve to a valid path. =item syscall @@ -1974,6 +2093,8 @@ Does not automatically flush output handles on some platforms. The return value is POSIX-like (shifted up by 8 bits), which only allows room for a made-up value derived from the severity bits of the native 32-bit condition code (unless overridden by C). +If the native condition code is one that has a POSIX value encoded, the +POSIX value will be decoded to extract the expected exit value. For more details see L. (VMS) =item times @@ -2206,4 +2327,4 @@ Paul J. 
Schinder, Michael G Schwern, Dan Sugalski, Nathan Torkington. - +John Malmberg