X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlport.pod;h=c1a5483add6795cb520dcce8ed01020d9811360a;hb=2b5ab1e742ea1b1374dcea7f6f90ef5c5cf29914;hp=3c1fba66010aa65996bffd6464321955ac3ef0a4;hpb=dd9f0070190bd7c99e6ea3d164a54285586358ad;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlport.pod b/pod/perlport.pod index 3c1fba6..c1a5483 100644 --- a/pod/perlport.pod +++ b/pod/perlport.pod @@ -10,23 +10,25 @@ a lot in common, they also have their own very particular and unique features. This document is meant to help you to find out what constitutes portable -perl code, so that once you have made your decision to write portably, +Perl code, so that once you have made your decision to write portably, you know where the lines are drawn, and you can stay within them. There is a tradeoff between taking full advantage of B particular type -of computer, and taking advantage of a full B of them. Naturally, -as you make your range bigger (and thus more diverse), the common denominators -drop, and you are left with fewer areas of common ground in which -you can operate to accomplish a particular task. Thus, when you begin -attacking a problem, it is important to consider which part of the tradeoff -curve you want to operate under. Specifically, whether it is important to -you that the task that you are coding needs the full generality of being -portable, or if it is sufficient to just get the job done. This is the -hardest choice to be made. The rest is easy, because Perl provides lots -of choices, whichever way you want to approach your problem. - -Looking at it another way, writing portable code is usually about willfully -limiting your available choices. Naturally, it takes discipline to do that. +of computer, and taking advantage of a full B of them. 
Naturally, +as you make your range bigger (and thus more diverse), the common +denominators drop, and you are left with fewer areas of common ground in +which you can operate to accomplish a particular task. Thus, when you +begin attacking a problem, it is important to consider which part of the +tradeoff curve you want to operate under. Specifically, whether it is +important to you that the task that you are coding needs the full +generality of being portable, or if it is sufficient to just get the job +done. This is the hardest choice to be made. The rest is easy, because +Perl provides lots of choices, whichever way you want to approach your +problem. + +Looking at it another way, writing portable code is usually about +willfully limiting your available choices. Naturally, it takes discipline +to do that. Be aware of two important points: @@ -59,10 +61,15 @@ take advantage of some unique feature of a particular platform, as is often the case with systems programming (whether for Unix, Windows, S, VMS, etc.), consider writing platform-specific code. -When the code will run on only two or three operating systems, then you may -only need to consider the differences of those particular systems. The -important thing is to decide where the code will run, and to be deliberate -in your decision. +When the code will run on only two or three operating systems, then you +may only need to consider the differences of those particular systems. +The important thing is to decide where the code will run, and to be +deliberate in your decision. + +The material below is separated into three main sections: main issues of +portability (L<"ISSUES">, platform-specific issues (L<"PLATFORMS">, and +builtin perl functions that behave differently on various ports +(L<"FUNCTION IMPLEMENTATIONS">. 
This information should not be considered complete; it includes possibly transient information about idiosyncrasies of some of the ports, almost @@ -71,11 +78,13 @@ should be considered a perpetual work in progress (EIMG SRC="yellow_sign.gif" ALT="Under Construction"E). + + =head1 ISSUES =head2 Newlines -In most operating systems, lines in files are separated with newlines. +In most operating systems, lines in files are terminated by newlines. Just what is used as a newline may vary from OS to OS. Unix traditionally uses C<\012>, one kind of Windows I/O uses C<\015\012>, and S uses C<\015>. @@ -97,7 +106,7 @@ C on a file, however, you can usually use C and C with arbitrary values quite safely. A common misconception in socket programming is that C<\n> eq C<\012> -everywhere. When using protocols, such as common Internet protocols, +everywhere. When using protocols such as common Internet protocols, C<\012> and C<\015> are called for specifically, and the values of the logical C<\n> and C<\r> (carriage return) are not reliable. @@ -110,9 +119,9 @@ the most popular EBCDIC webserver, for instance, accepts C<\r\n>, which translates those characters, along with all other characters in text streams, from EBCDIC to ASCII.] -However, C<\015\012> (or C<\cM\cJ>, or C<\x0D\x0A>) can be tedious and -unsightly, as well as confusing to those maintaining the code. As such, -the C module supplies the Right Thing for those who want it. +However, using C<\015\012> (or C<\cM\cJ>, or C<\x0D\x0A>) can be tedious +and unsightly, as well as confusing to those maintaining the code. As +such, the C module supplies the Right Thing for those who want it. use Socket qw(:DEFAULT :crlf); print SOCKET "Hi there, client!$CRLF" # RIGHT @@ -139,8 +148,41 @@ And this example is actually better than the previous one even for Unix platforms, because now any C<\015>'s (C<\cM>'s) are stripped out (and there was much rejoicing). 
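A minimal sketch of the advice above for code that speaks a network protocol: write line endings explicitly using the $CRLF variable exported by the Socket module's :crlf tag, and accept either LF or CR LF when reading. The helper names wire_line and strip_eol are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use Socket qw(:crlf);    # exports $CR, $LF and $CRLF

# Hypothetical helper: end a protocol line with an explicit CR LF pair,
# whatever "\n" happens to mean on the local platform.
sub wire_line {
    my ($text) = @_;
    return $text . $CRLF;
}

# Hypothetical helper: remove a trailing LF or CR LF sent by a peer.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\015?\012\z//;
    return $line;
}

my $out = wire_line("Hi there, client!");
print length($out), " bytes ready for the socket\n";
print strip_eol("HELO example\015\012"), "\n";
```

Keeping both directions in small helpers means the octal escapes appear in one place rather than throughout the program.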
+An important thing to remember is that functions that return data +should translate newlines when appropriate. Often one line of code +will suffice: + + $data =~ s/\015?\012/\n/g; + return $data; + + +=head2 Numbers endianness and Width + +Different CPUs store integers and floating point numbers in different +orders (called I) and widths (32-bit and 64-bit being the +most common). This affects your programs if they attempt to transfer +numbers in binary format from a CPU architecture to another over some +channel: either 'live' via network connections or storing the numbers +to secondary storage such as a disk file. -=head2 Files +Conflicting storage orders make utter mess out of the numbers: if a +little-endian host (Intel, Alpha) stores 0x12345678 (305419896 in +decimal), a big-endian host (Motorola, MIPS, Sparc, PA) reads it as +0x78563412 (2018915346 in decimal). To avoid this problem in network +(socket) connections use the C and C formats C<"n"> +and C<"N">, the "network" orders, they are guaranteed to be portable. + +Different widths can cause truncation even between platforms of equal +endianness: the platform of shorter width loses the upper parts of the +number. There is no good solution for this problem except to avoid +transferring or storing raw binary numbers. + +One can circumnavigate both these problems in two ways: either +transfer and store numbers always in text format, instead of raw +binary, or consider using modules like C (included in +the standard distribution as of Perl 5.005) and C. + +=head2 Files and Filesystems Most platforms these days structure files in a hierarchical fashion. So, it is reasonably safe to assume that any platform supports the @@ -148,13 +190,32 @@ notion of a "path" to uniquely identify a file on the system. Just how that path is actually written, differs. While they are similar, file path specifications differ between Unix, -Windows, S, OS/2, VMS, S and probably others. 
Unix, -for example, is one of the few OSes that has the idea of a root directory. -S uses C<:> as a path separator instead of C. VMS, Windows, and -OS/2 can work similarly to Unix with C as path separator, or in their own -idiosyncratic ways. C perl can emulate Unix filenames with C -as path separator, or go native and use C<.> for path separator and C<:> -to signal filing systems and disc names. +Windows, S, OS/2, VMS, VOS, S and probably others. +Unix, for example, is one of the few OSes that has the idea of a single +root directory. + +VMS, Windows, and OS/2 can work similarly to Unix with C as path +separator, or in their own idiosyncratic ways (such as having several +root directories and various "unrooted" device files such NIL: and +LPT:). + +S uses C<:> as a path separator instead of C. + +The filesystem may support neither hard links (C) nor +symbolic links (C, C, C). + +The filesystem may not support neither access timestamp nor change +timestamp (meaning that about the only portable timestamp is the +modification timestamp), or one second granularity of any timestamps +(e.g. the FAT filesystem limits the time granularity to two seconds). + +VOS perl can emulate Unix filenames with C as path separator. The +native pathname characters greater-than, less-than, number-sign, and +percent-sign are always accepted. + +C perl can emulate Unix filenames with C as path +separator, or go native and use C<.> for path separator and C<:> to +signal filing systems and disc names. As with the newline problem above, there are modules that can help. The C modules provide methods to do the Right Thing on whatever @@ -182,25 +243,41 @@ Also of use is C, from the standard distribution, which splits a pathname into pieces (base filename, full path to directory, and file suffix). -Remember not to count on the existence of system-specific files, like -F. 
If code does need to rely on such a file, include a -description of the file and its format in the code's documentation, and -make it easy for the user to override the default location of the file. +Even when on a single platform (if you can call UNIX a single platform), +remember not to count on the existence or the contents of +system-specific files or directories, like F, +F, F, or even F. For +example, F may exist but it may not contain the encrypted +passwords because the system is using some form of enhanced security -- +or it may not contain all the accounts because the system is using NIS. +If code does need to rely on such a file, include a description of the +file and its format in the code's documentation, and make it easy for +the user to override the default location of the file. -Don't assume that a you can open a full pathname for input with -C, as some platforms can use characters such as C> -which will perl C will interpret and eat. +Don't assume a text file will end with a newline. Do not have two files of the same name with different case, like -F and , as many platforms have case-insensitive +F and F, as many platforms have case-insensitive filenames. Also, try not to have non-word characters (except for C<.>) -in the names, and keep them to the 8.3 convention, for maximum portability. +in the names, and keep them to the 8.3 convention, for maximum +portability. Likewise, if using C, try to keep the split functions to 8.3 naming and case-insensitive conventions; or, at the very least, make it so the resulting files have a unique (case-insensitively) first 8 characters. +There certainly can be whitespace in filenames. Many systems (DOS, +VMS) cannot have more than one C<"."> in their filenames. + +Don't assume C> won't be the first character of a filename. +Always use C> explicitly to open a file for reading. 
+ + open(FILE, "<$existing_file") or die $!; + +Actually, though, if filenames might use strange characters, it is +safest to open it with C instead of C, which is magic. + =head2 System Interaction @@ -215,11 +292,14 @@ the system. Remember to C files when you are done with them. Don't C or C an open file. Don't C to or C a file that is already tied to or opened; C or C first. +Don't open the same file more than once at a time for writing, as some +operating systems put mandatory locks on such files. + Don't count on a specific environment variable existing in C<%ENV>. -Don't even count on C<%ENV> entries being case-sensitive, or even +Don't count on C<%ENV> entries being case-sensitive, or even case-preserving. -Don't count on signals in portable programs. +Don't count on signals. Don't count on filename globbing. Use C, C, and C instead. @@ -227,12 +307,14 @@ C instead. Don't count on per-program environment variables, or per-program current directories. +Don't count on specific values of C<$!>. + =head2 Interprocess Communication (IPC) In general, don't directly access the system in code that is meant to be -portable. That means, no: C, C, C, C, C<``>, -C, C with a C<|>, or any of the other things that makes being +portable. That means, no C, C, C, C, C<``>, +C, C with a C<|>, nor any of the other things that makes being a Unix perl hacker worth being. Commands that launch external processes are generally supported on @@ -257,9 +339,11 @@ mailing methods, including mail, sendmail, and direct SMTP (via C) if a mail transfer agent is not available. The rule of thumb for portable code is: Do it all in portable Perl, or -use a module that may internally implement it with platform-specific code, -but expose a common interface. By portable Perl, we mean code that -avoids the constructs described in this document as being non-portable. +use a module (that may internally implement it with platform-specific +code, but expose a common interface). 
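As one illustration of doing it all in portable Perl, a directory listing can be produced with opendir, readdir and closedir instead of backticks around a system command or filename globbing. This is only a sketch: plain_files is a hypothetical name, and File::Spec is used to build each path so that no separator is hard-coded.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the sorted names of the plain files in $dir
# without launching an external command or relying on glob().
sub plain_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    # readdir also returns directory entries; -f keeps plain files only.
    my @names = grep { -f File::Spec->catfile($dir, $_) } readdir($dh);
    closedir($dh);
    my @sorted = sort @names;
    return @sorted;
}

print "$_\n" for plain_files(File::Spec->curdir);
```

The order in which readdir returns entries differs between filesystems, so the sketch sorts the names itself rather than depending on it.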
+ +The UNIX System V IPC (C) is not available +even in all UNIX platforms. =head2 External Subroutines (XS) @@ -271,8 +355,8 @@ code might be. If the libraries and headers are portable, then it is normally reasonable to make sure the XS code is portable, too. There is a different kind of portability issue with writing XS -code: availability of a C compiler on the end-user's system. C brings with -it its own portability issues, and writing XS code will expose you to +code: availability of a C compiler on the end-user's system. C brings +with it its own portability issues, and writing XS code will expose you to some of those. Writing purely in perl is a comparatively easier way to achieve portability. @@ -286,7 +370,8 @@ C), and DBM modules. There is no one DBM module that is available on all platforms. C and the others are generally available on all Unix and DOSish -ports, but not in MacPerl, where C and C are available. +ports, but not in MacPerl, where only C and C are +available. The good news is that at least some DBM module should be available, and C will use whichever module it can find. Of course, then @@ -296,24 +381,49 @@ denominator (e.g., not exceeding 1K for each record). =head2 Time and Date -The system's notion of time of day and calendar date is controlled in widely -different ways. Don't assume the timezone is stored in C<$ENV{TZ}>, and even -if it is, don't assume that you can control the timezone through that -variable. +The system's notion of time of day and calendar date is controlled in +widely different ways. Don't assume the timezone is stored in C<$ENV{TZ}>, +and even if it is, don't assume that you can control the timezone through +that variable. + +Don't assume that the epoch starts at 00:00:00, January 1, 1970, +because that is OS-specific. Better to store a date in an unambiguous +representation. The ISO 8601 standard defines YYYY-MM-DD as the date +format. 
A text representation (like C<1 Jan 1970>) can be easily +converted into an OS-specific value using a module like +C. An array of values, such as those returned by +C, can be converted to an OS-specific representation using +C. + -Don't assume that the epoch starts at January 1, 1970, because that is -OS-specific. Better to store a date in an unambiguous representation. -A text representation (like C<1 Jan 1970>) can be easily converted into an -OS-specific value using a module like C. An array of values, -such as those returned by C, can be converted to an OS-specific -representation using C. +=head2 Character sets and character encoding + +Assume very little about character sets. Do not assume anything about +the numerical values (C, C) of characters. Do not +assume that the alphabetic characters are encoded contiguously (in +numerical sense). Do not assume anything about the ordering of the +characters. The lowercase letters may come before or after the +uppercase letters, the lowercase and uppercase may be interlaced so +that both 'a' and 'A' come before the 'b', the accented and other +international characters may be interlaced so that E comes +before the 'b'. + + +=head2 Internationalisation + +If you may assume POSIX (a rather large assumption, that in practice +means UNIX), you may read more about the POSIX locale system from +L. The locale system at least attempts to make things a +little bit more portable, or at least more convenient and +native-friendly for non-English users. The system affects character +sets and encoding, and date and time formatting, among other things. =head2 System Resources -If your code is destined for systems with severely constrained (or missing!) -virtual memory systems then you want to be especially mindful of avoiding -wasteful constructs such as: +If your code is destined for systems with severely constrained (or +missing!) 
virtual memory systems then you want to be I mindful +of avoiding wasteful constructs such as: # NOTE: this is no longer "bad" in perl5.005 for (0..10000000) {} # bad @@ -322,41 +432,45 @@ wasteful constructs such as: @lines = ; # bad while () {$file .= $_} # sometimes bad - $file = join '', ; # better + $file = join('', ); # better The last two may appear unintuitive to most people. The first of those two constructs repeatedly grows a string, while the second allocates a large chunk of memory in one go. On some systems, the latter is more efficient that the former. + =head2 Security -Most multi-user platforms provide basic levels of security that is usually felt -at the file-system level. Other platforms usually don't (unfortunately). -Thus the notion of User-ID, or "home" directory, or even the state of -being logged-in may be unrecognizable on many platforms. If you write -programs that are security conscious, it is usually best to know what -type of system you will be operating under, and write code explicitly +Most multi-user platforms provide basic levels of security that is usually +felt at the file-system level. Other platforms usually don't +(unfortunately). Thus the notion of user id, or "home" directory, or even +the state of being logged-in, may be unrecognizable on many platforms. If +you write programs that are security conscious, it is usually best to know +what type of system you will be operating under, and write code explicitly for that platform (or class of platforms). + =head2 Style For those times when it is necessary to have platform-specific code, consider keeping the platform-specific code in one place, making porting to other platforms easier. Use the C module and the special -variable C<$^O> to differentiate platforms, as described in L<"PLATFORMS">. +variable C<$^O> to differentiate platforms, as described in +L<"PLATFORMS">. 
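One possible way to keep the platform tests in one place is a single lookup keyed on $^O, which the rest of the program consults instead of testing $^O itself. The sketch below is an assumption-laden illustration, not a complete list: the family names and the os_family helper are hypothetical, and treating every unlisted value as 'unix' is a simplification made for this example.

```perl
use strict;
use warnings;
use Config;

# Hypothetical table: values of $^O mapped to a coarse platform family.
# Anything not listed falls through to 'unix' (an assumption of this
# sketch, not a rule).
my %OS_FAMILY = (
    MSWin32 => 'dosish',
    dos     => 'dosish',
    os2     => 'dosish',
    MacOS   => 'mac',
    VMS     => 'vms',
);

# Hypothetical helper: classify the given OS name, or the running
# platform when no name is passed.
sub os_family {
    my ($osname) = @_;
    $osname = $^O unless defined $osname;
    return $OS_FAMILY{$osname} || 'unix';
}

print "osname from Config: $Config{osname}\n";
print "family of this platform: ", os_family(), "\n";
```

Porting to a new platform then means adding one table entry and the code for that family, rather than hunting for scattered tests.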
-=head1 CPAN TESTERS +=head1 CPAN Testers -Module uploaded to CPAN are tested by a variety of volunteers on -different platforms. These CPAN testers are notified by e-mail of each +Modules uploaded to CPAN are tested by a variety of volunteers on +different platforms. These CPAN testers are notified by mail of each new upload, and reply to the list with PASS, FAIL, NA (not applicable to -this platform), or ???? (unknown), along with any relevant notations. +this platform), or UNKNOWN (unknown), along with any relevant notations. The purpose of the testing is twofold: one, to help developers fix any -problems in their code; two, to provide users with information about -whether or not a given module works on a given platform. +problems in their code that crop up because of lack of testing on other +platforms; two, to provide users with information about whether or not +a given module works on a given platform. =over 4 @@ -382,24 +496,24 @@ Perl works on a bewildering variety of Unix and Unix-like platforms (see e.g. most of the files in the F directory in the source code kit). On most of these systems, the value of C<$^O> (hence C<$Config{'osname'}>, too) is determined by lowercasing and stripping punctuation from the first -field of the string returned by typing - - % uname -a - -(or a similar command) at the shell prompt. Here, for example, are a few -of the more popular Unix flavors: +field of the string returned by typing C (or a similar command) +at the shell prompt. 
Here, for example, are a few of the more popular +Unix flavors: uname $^O $Config{'archname'} ------------------------------------------- - AIX aix - FreeBSD freebsd - Linux linux - HP-UX hpux - OSF1 dec_osf + AIX aix aix + FreeBSD freebsd freebsd-i386 + Linux linux i386-linux + HP-UX hpux PA-RISC1.1 + IRIX irix irix + OSF1 dec_osf alpha-dec_osf SunOS solaris sun4-solaris SunOS solaris i86pc-solaris - SunOS4 sunos + SunOS4 sunos sun4-sunos +Note that because the C<$Config{'archname'}> may depend on the hardware +architecture it may vary quite a lot, much more than the C<$^O>. =head2 DOS and Derivatives @@ -422,9 +536,9 @@ from calling any external programs, C will work just fine, and probably better, as it is more consistent with popular usage, and avoids the problem of remembering what to backwhack and what not to. -The DOS FAT file system can only accommodate "8.3" style filenames. Under +The DOS FAT filesystem can only accommodate "8.3" style filenames. Under the "case insensitive, but case preserving" HPFS (OS/2) and NTFS (NT) -file systems you may have to be careful about case returned with functions +filesystems you may have to be careful about case returned with functions like C or used with functions like C or C. DOS also treats several filenames as special, such as AUX, PRN, NUL, CON, @@ -463,7 +577,8 @@ Also see: =item The djgpp environment for DOS, C =item The EMX environment for DOS, OS/2, etc. C, -C +C or +C =item Build instructions for Win32, L. @@ -477,7 +592,8 @@ C Any module requiring XS compilation is right out for most people, because MacPerl is built using non-free (and non-cheap!) compilers. Some XS modules that can work with MacPerl are built and distributed in binary -form on CPAN. See I for more details. +form on CPAN. See I and L<"CPAN Testers"> +for more details. Directories are specified as: @@ -492,8 +608,8 @@ Files in a directory are stored in alphabetical order. 
Filenames are limited to 31 characters, and may include any character except C<:>, which is reserved as a path separator. -Instead of C, see C and C in -C. +Instead of C, see C and C in the +C module, or C and C. In the MacPerl application, you can't run a program from the command line; programs that expect C<@ARGV> to be populated can be edited with something @@ -515,7 +631,7 @@ shell: perl myscript.plx some arguments ToolServer is another app from Apple that provides access to MPW tools -from MPW and the MacPerl app, which allows MacPerl program to use +from MPW and the MacPerl app, which allows MacPerl programs to use C, backticks, and piped C. "S" is the proper name for the operating system, but the value @@ -528,9 +644,9 @@ the application or MPW tool version is running, check: $is_ppc = $MacPerl::Architecture eq 'MacPPC'; $is_68k = $MacPerl::Architecture eq 'Mac68K'; -S, to be based on NeXT's OpenStep OS, will be able to run MacPerl -natively (in the Blue Box, and even in the Yellow Box, once some changes -to the toolbox calls are made), but Unix perl will also run natively. +S, to be based on NeXT's OpenStep OS, will (in theory) be able +to run MacPerl natively, but Unix perl will also run natively under the +built-in Unix environment. Also see: @@ -546,7 +662,7 @@ Also see: =head2 VMS Perl on VMS is discussed in F in the perl distribution. -Note that perl on VMS can accept either VMS or Unix style file +Note that perl on VMS can accept either VMS- or Unix-style file specifications as in either of the following: $ perl -ne "print if /perl_setup/i" SYS$LOGIN:LOGIN.COM @@ -591,7 +707,8 @@ VMS' RMS filesystem is case insensitive and does not preserve case. C returns lowercased filenames, but specifying a file for opening remains case insensitive. Files without extensions have a trailing period on them, so doing a C with a file named F -will return F (though that file could be opened with C. +will return F (though that file could be opened with +C). 
RMS had an eight level limit on directory depths from any rooted logical (allowing 16 levels overall) prior to VMS 7.2. Hence @@ -600,10 +717,10 @@ C is not. F authors might have to take this into account, but at least they can refer to the former as C. -The C module, which gets installed as part -of the build process on VMS, is a pure Perl module that can easily be -installed on non-VMS platforms and can be helpful for conversions to -and from RMS native formats. +The C module, which gets installed as part of the build +process on VMS, is a pure Perl module that can easily be installed on +non-VMS platforms and can be helpful for conversions to and from RMS +native formats. What C<\n> represents depends on the type of file that is open. It could be C<\015>, C<\012>, C<\015\012>, or nothing. Reading from a file @@ -640,18 +757,84 @@ Put words C in message body. =back +=head2 VOS + +Perl on VOS is discussed in F in the perl distribution. +Note that perl on VOS can accept either VOS- or Unix-style file +specifications as in either of the following: + + $ perl -ne "print if /perl_setup/i" >system>notices + $ perl -ne "print if /perl_setup/i" /system/notices + +or even a mixture of both as in: + + $ perl -ne "print if /perl_setup/i" >system/notices + +Note that even though VOS allows the slash character to appear in object +names, because the VOS port of Perl interprets it as a pathname +delimiting character, VOS files, directories, or links whose names +contain a slash character cannot be processed. Such files must be +renamed before they can be processed by Perl. + +The following C functions are unimplemented on VOS, and any attempt by +Perl to use them will result in a fatal error message and an immediate +exit from Perl: dup, do_aspawn, do_spawn, fork, waitpid. Once these +functions become available in the VOS POSIX.1 implementation, you can +either recompile and rebind Perl, or you can download a newer port from +ftp.stratus.com. 
+ +The value of C<$^O> on VOS is "VOS". To determine the architecture that +you are running on without resorting to loading all of C<%Config> you +can examine the content of the C<@INC> array like so: + + if (grep(/VOS/, @INC)) { + print "I'm on a Stratus box!\n"; + } else { + print "I'm not on a Stratus box!\n"; + die; + } + + if (grep(/860/, @INC)) { + print "This box is a Stratus XA/R!\n"; + } elsif (grep(/7100/, @INC)) { + print "This box is a Stratus HP 7100 or 8000!\n"; + } elsif (grep(/8000/, @INC)) { + print "This box is a Stratus HP 8000!\n"; + } else { + print "This box is a Stratus 68K...\n"; + } + +Also see: + +=over 4 + +=item L + +=item VOS mailing list + +There is no specific mailing list for Perl on VOS. You can post +comments to the comp.sys.stratus newsgroup, or subscribe to the general +Stratus mailing list. Send a letter with "Subscribe Info-Stratus" in +the message body to majordomo@list.stratagy.com. + +=item VOS Perl on the web at C + +=back + + =head2 EBCDIC Platforms Recent versions of Perl have been ported to platforms such as OS/400 on -AS/400 minicomputers as well as OS/390 for IBM Mainframes. Such computers -use EBCDIC character sets internally (usually Character Code Set ID 00819 -for OS/400 and IBM-1047 for OS/390). Note that on the mainframe perl -currently works under the "Unix system services for OS/390" (formerly -known as OpenEdition). +AS/400 minicomputers as well as OS/390 & VM/ESA for IBM Mainframes. Such +computers use EBCDIC character sets internally (usually Character Code +Set ID 00819 for OS/400 and IBM-1047 for OS/390 & VM/ESA). Note that on +the mainframe perl currently works under the "Unix system services +for OS/390" (formerly known as OpenEdition) and VM/ESA OpenEdition. -As of R2.5 of USS for OS/390 that Unix sub-system did not support the -C<#!> shebang trick for script invocation. 
Hence, on OS/390 perl scripts -can executed with a header similar to the following simple script: +As of R2.5 of USS for OS/390 and Version 2.3 of VM/ESA these Unix +sub-systems do not support the C<#!> shebang trick for script invocation. +Hence, on OS/390 and VM/ESA perl scripts can be executed with a header +similar to the following simple script: : # use perl eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}' @@ -661,20 +844,22 @@ can executed with a header similar to the following simple script: print "Hello from perl!\n"; On these platforms, bear in mind that the EBCDIC character set may have -an effect on what happens with perl functions such as C, C, -C, C, C, C, C, C; as well as -bit-fiddling with ASCII constants using operators like C<^>, C<&> and -C<|>; not to mention dealing with socket interfaces to ASCII computers -(see L<"NEWLINES">). +an effect on what happens with some perl functions (such as C, +C, C, C, C, C, C, C), as +well as bit-fiddling with ASCII constants using operators like C<^>, C<&> +and C<|>, not to mention dealing with socket interfaces to ASCII computers +(see L). Fortunately, most web servers for the mainframe will correctly translate the C<\n> in the following statement to its ASCII equivalent (note that -C<\r> is the same under both ASCII and EBCDIC): +C<\r> is the same under both Unix and OS/390 & VM/ESA): print "Content-type: text/html\r\n\r\n"; The value of C<$^O> on OS/390 is "os390". +The value of C<$^O> on VM/ESA is "vmesa". 
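Because arithmetic on character codes is one of the things a different character set can upset, a lookup table built from the letters themselves is a safer sketch than computing ord($char) - ord('a'). The example assumes that the range 'a' .. 'z' yields the 26 lowercase letters on the platform in question; letter_index is a hypothetical helper name.

```perl
use strict;
use warnings;

# Build the table from the letters themselves, so nothing depends on
# their numeric codes being adjacent.
my @letters = ('a' .. 'z');
my %position;
@position{@letters} = (1 .. scalar @letters);

# Hypothetical helper: 1 for 'a' or 'A', 26 for 'z' or 'Z',
# undef for anything that is not one of those letters.
sub letter_index {
    my ($char) = @_;
    return $position{ lc $char };
}

print letter_index('c'), "\n";
```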
+ Some simple tricks for determining if you are running on an EBCDIC platform could include any of the following (perhaps all): @@ -685,9 +870,9 @@ platform could include any of the following (perhaps all): if (chr(169) eq 'z') { print "EBCDIC may be spoken here!\n"; } Note that one thing you may not want to rely on is the EBCDIC encoding -of punctuation characters since these may differ from code page to code page -(and once your module or script is rumoured to work with EBCDIC, folks will -want it to work with all EBCDIC character sets). +of punctuation characters since these may differ from code page to code +page (and once your module or script is rumoured to work with EBCDIC, +folks will want it to work with all EBCDIC character sets). Also see: @@ -699,22 +884,23 @@ The perl-mvs@perl.org list is for discussion of porting issues as well as general usage issues for all EBCDIC Perls. Send a message body of "subscribe perl-mvs" to majordomo@perl.org. -=item AS/400 Perl information at C +=item AS/400 Perl information at C =back =head2 Acorn RISC OS -As Acorns use ASCII with newlines (C<\n>) in text files as C<\012> like Unix -and Unix filename emulation is turned on by default, it is quite likely that -most simple scripts will work "out of the box". The native filing system is -modular, and individual filing systems are free to be case sensitive or -insensitive, usually case preserving. Some native filing systems have name -length limits which file and directory names are silently truncated to fit - -scripts should be aware that the standard disc filing system currently has -a name length limit of B<10> characters, with up to 77 items in a directory, -but other filing systems may not impose such limitations. +As Acorns use ASCII with newlines (C<\n>) in text files as C<\012> like +Unix and Unix filename emulation is turned on by default, it is quite +likely that most simple scripts will work "out of the box". 
The native +filing system is modular, and individual filing systems are free to be +case-sensitive or insensitive, and are usually case-preserving. Some +native filing systems have name length limits which file and directory +names are silently truncated to fit - scripts should be aware that the +standard disc filing system currently has a name length limit of B<10> +characters, with up to 77 items in a directory, but other filing systems +may not impose such limitations. Native filenames are of the form @@ -734,19 +920,20 @@ where The default filename translation is roughly C Note that C<"ADFS::HardDisc.$.File" ne 'ADFS::HardDisc.$.File'> and that -the second stage of $ interpolation in regular expressions will fall foul -of the C<$.> if scripts are not careful. - -Logical paths specified by system variables containing comma separated -search lists are also allowed, hence C is a valid filename, -and the filesystem will prefix C with each section of C -until a name is made that points to an object on disc. Writing to a new -file C would only be allowed if C contains a -single item list. The filesystem will also expand system variables in -filenames if enclosed in angle brackets, so CSystem$DirE.Modules> -would look for the file S>. The obvious -implication of this is that BE>> -and should be protected when C is used for input. +the second stage of C<$> interpolation in regular expressions will fall +foul of the C<$.> if scripts are not careful. + +Logical paths specified by system variables containing comma-separated +search lists are also allowed, hence C is a valid +filename, and the filesystem will prefix C with each section of +C until a name is made that points to an object on disc. +Writing to a new file C would only be allowed if +C contains a single item list. The filesystem will also +expand system variables in filenames if enclosed in angle brackets, so +CSystem$DirE.Modules> would look for the file +S>. 
The obvious implication of this is +that BE>> and should +be protected when C is used for input. Because C<.> was in use as a directory separator and filenames could not be assumed to be unique after 10 characters, Acorn implemented the C @@ -762,57 +949,44 @@ subdirectories named after the suffix. Hence files are translated: 11charname_.c c.11charname (assuming filesystem truncates at 10) The Unix emulation library's translation of filenames to native assumes -that this sort of translation is required, and allows a user defined list of -known suffixes which it will transpose in this fashion. This may appear -transparent, but consider that with these rules C and -C both map to C, and that C and -C cannot and do not attempt to emulate the reverse mapping. Other '.'s -in filenames are translated to '/'. - -S has "image files", files that behave as directories. For -example with suitable software this allows the contents of a zip file to -be treated as a directory at command line (and therefore script) level, -with full read-write random access. At present the perl port treats images -as directories: C<-d> returns true, C<-f> false, and C checks to -ensure that recognised images are empty before deleting them. In theory -images should never trouble a script, but in practice they may do so if -the software to deal with an image file is loaded and registered while the -script is running, as suddenly "files" that it had cached information on -metamorphose into directories. - -As implied above the environment accessed through C<%ENV> is global, and the -convention is that program specific environment variables are of the form -C. Each filing system maintains a current directory, and -the current filing system's current directory is the B current -directory. 
Consequently sociable scripts don't change the current directory -but rely on full pathnames, and scripts (and Makefiles) cannot assume that -they can spawn a child process which can change the current directory -without affecting its parent (and everyone else for that matter). - -As native operating system filehandles are global and currently are allocated -down from 255, with 0 being a reserved value the Unix emulation library -emulates Unix filehandles. Consequently you can't rely on passing C -C or C to your children. Run time libraries perform -command line processing to emulate Unix shell style C<>> redirection, but -the core operating system is written in assembler and has its own private, -obscure and somewhat broken convention. All this is further complicated by -the desire of users to express filenames of the form CFoo$DirE.Bar> on -the command line unquoted. (Oh yes, it's run time libraries interpreting the -quoting convention.) Hence C<``> command output capture has to perform -a guessing game as to how the command is going to interpret the command line -so that it can bodge it correctly to capture output. It assumes that a -string C[^EE]+\$[^EE]E> is a reference to an environment -variable, whereas anything else involving C> or C> is redirection, -and generally manages to be 99% right. Despite all this the problem remains -that scripts cannot rely on any Unix tools being available, or that any tools -found have Unix-like command line arguments. - -Extensions and XS are in theory buildable by anyone using free tools. In -practice many don't as the Acorn platform is used to binary distribution. -MakeMaker does itself run, but no make currently copes with MakeMaker's -makefiles! Even if (when) this is fixed os that the lack of a Unix-like -shell can cause problems with makefile rules, especially lines of the form -C and anything using quoting. 
+that this sort of translation is required, and allows a user defined list +of known suffixes which it will transpose in this fashion. This may +appear transparent, but consider that with these rules C +and C both map to C, and that C and +C cannot and do not attempt to emulate the reverse mapping. Other +C<.>s in filenames are translated to C. + +As implied above the environment accessed through C<%ENV> is global, and +the convention is that program specific environment variables are of the +form C. Each filing system maintains a current directory, +and the current filing system's current directory is the B current +directory. Consequently, sociable scripts don't change the current +directory but rely on full pathnames, and scripts (and Makefiles) cannot +assume that they can spawn a child process which can change the current +directory without affecting its parent (and everyone else for that +matter). + +As native operating system filehandles are global and currently are +allocated down from 255, with 0 being a reserved value the Unix emulation +library emulates Unix filehandles. Consequently, you can't rely on +passing C, C, or C to your children. + +The desire of users to express filenames of the form +CFoo$DirE.Bar> on the command line unquoted causes problems, +too: C<``> command output capture has to perform a guessing game. It +assumes that a string C[^EE]+\$[^EE]E> is a +reference to an environment variable, whereas anything else involving +C> or C> is redirection, and generally manages to be 99% +right. Of course, the problem remains that scripts cannot rely on any +Unix tools being available, or that any tools found have Unix-like command +line arguments. + +Extensions and XS are, in theory, buildable by anyone using free tools. +In practice, many don't, as users of the Acorn platform are used to binary +distribution. 
MakeMaker does run, but no available make currently copes +with MakeMaker's makefiles; even if/when this is fixed, the lack of a +Unix-like shell can cause problems with makefile rules, especially lines +of the form C, and anything using quoting. "S" is the proper name for the operating system, but the value in C<$^O> is "riscos" (because we don't like shouting). @@ -830,11 +1004,11 @@ Also see: Perl has been ported to a variety of platforms that do not fit into any of the above categories. Some, such as AmigaOS, BeOS, QNX, and Plan 9, have -been well integrated into the standard Perl source code kit. You may need +been well-integrated into the standard Perl source code kit. You may need to see the F directory on CPAN for information, and possibly -binaries, for the likes of: aos, atari, lynxos, HP-MPE/iX, riscos, -Tandem Guardian, vos, I (yes we know that some of these OSes may fall -under the Unix category but we are not a standards body.) +binaries, for the likes of: aos, atari, lynxos, riscos, Tandem Guardian, +vos, I (yes we know that some of these OSes may fall under the Unix +category, but we are not a standards body.) See also: @@ -846,7 +1020,7 @@ See also: =item Novell Netware -A free Perl 5 based PERL.NLM for Novell Netware is available from +A free perl5-based PERL.NLM for Novell Netware is available from C =back @@ -862,14 +1036,12 @@ The list may very well be incomplete, or wrong in some places. When in doubt, consult the platform-specific README files in the Perl source distribution, and other documentation resources for a given port. -Be aware, moreover, that even among Unix-ish systems there are variations, -and not all functions listed here are necessarily available, though -most usually are. +Be aware, moreover, that even among Unix-ish systems there are variations. For many functions, you can also query C<%Config>, exported by default from C. For example, to check if the platform has the C -call, check C<$Config{'d_lstat'}>. 
See L for a full description -of available variables. +call, check C<$Config{'d_lstat'}>. See L for a full +description of available variables. =head2 Alphabetical Listing of Perl Functions @@ -909,8 +1081,8 @@ C<-d> is true if passed a device spec without an explicit directory. (VMS) C<-T> and C<-B> are implemented, but might misclassify Mac text files -with foreign characters; this is the case will all platforms, but -affects S a lot. (S) +with foreign characters; this is the case will all platforms, but may +affect S often. (S) C<-x> (or C<-X>) determine if a file ends in one of the executable suffixes. C<-S> is meaningless. (Win32) @@ -939,9 +1111,11 @@ bits are meaningless. (Win32) Only good for changing "owner" and "other" read-write access. (S) +Access permissions are mapped onto VOS access-control list changes. (VOS) + =item chown LIST -Not implemented. (S, Win32, Plan9, S) +Not implemented. (S, Win32, Plan9, S, VOS) Does nothing, but won't fail. (Win32) @@ -949,20 +1123,22 @@ Does nothing, but won't fail. (Win32) =item chroot -Not implemented. (S, Win32, VMS, Plan9, S) +Not implemented. (S, Win32, VMS, Plan9, S, VOS, VM/ESA) =item crypt PLAINTEXT,SALT May not be available if library or source was not provided when building perl. (Win32) +Not implemented. (VOS) + =item dbmclose HASH -Not implemented. (VMS, Plan9) +Not implemented. (VMS, Plan9, VOS) =item dbmopen HASH,DBNAME,MODE -Not implemented. (VMS, Plan9) +Not implemented. (VMS, Plan9, VOS) =item dump LABEL @@ -976,19 +1152,21 @@ Invokes VMS debugger. (VMS) Not implemented. (S) +Implemented via Spawn. (VM/ESA) + =item fcntl FILEHANDLE,FUNCTION,SCALAR Not implemented. (Win32, VMS) =item flock FILEHANDLE,OPERATION -Not implemented (S, VMS, S). +Not implemented (S, VMS, S, VOS). Available only on Windows NT (not on Windows 95). (Win32) =item fork -Not implemented. (S, Win32, AmigaOS, S) +Not implemented. (S, Win32, AmigaOS, S, VOS, VM/ESA) =item getlogin @@ -996,7 +1174,7 @@ Not implemented. 
(S, S) =item getpgrp PID -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VOS) =item getppid @@ -1004,7 +1182,7 @@ Not implemented. (S, Win32, VMS, S) =item getpriority WHICH,WHO -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VOS, VM/ESA) =item getpwnam NAME @@ -1044,11 +1222,11 @@ Not implemented. (S) =item getpwent -Not implemented. (S, Win32) +Not implemented. (S, Win32, VM/ESA) =item getgrent -Not implemented. (S, Win32, VMS) +Not implemented. (S, Win32, VMS, VM/ESA) =item gethostent @@ -1092,11 +1270,11 @@ Not implemented. (Plan9, Win32, S) =item endpwent -Not implemented. (S, Win32) +Not implemented. (S, Win32, VM/ESA) =item endgrent -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VM/ESA) =item endhostent @@ -1125,13 +1303,14 @@ Not implemented. (S, Plan9) Globbing built-in, but only C<*> and C metacharacters are supported. (S) -Features depend on external perlglob.exe or perlglob.bat. May be overridden -with something like File::DosGlob, which is recommended. (Win32) +Features depend on external perlglob.exe or perlglob.bat. May be +overridden with something like File::DosGlob, which is recommended. +(Win32) Globbing built-in, but only C<*> and C metacharacters are supported. -Globbing relies on operating system calls, which may return filenames in -any order. As most filesystems are case insensitive even "sorted" -filenames will not be in case sensitive order. (S) +Globbing relies on operating system calls, which may return filenames +in any order. As most filesystems are case-insensitive, even "sorted" +filenames will not be in case-sensitive order. (S) =item ioctl FILEHANDLE,FUNCTION,SCALAR @@ -1144,15 +1323,19 @@ Available only for socket handles. (S) =item kill LIST -Not implemented, hence not useful for taint checking. (S, S) +Not implemented, hence not useful for taint checking. (S, +S) -Available only for process handles returned by the C method of -spawning a process. 
(Win32) +Available only for process handles returned by the C +method of spawning a process. (Win32) =item link OLDFILE,NEWFILE Not implemented. (S, Win32, VMS, S) +Link count not updated because hard links are not quite that hard +(They are sort of half-way between hard and soft links). (AmigaOS) + =item lstat FILEHANDLE =item lstat EXPR @@ -1171,7 +1354,7 @@ Return values may be bogus. (Win32) =item msgrcv ID,VAR,SIZE,TYPE,FLAGS -Not implemented. (S, Win32, VMS, Plan9, S) +Not implemented. (S, Win32, VMS, Plan9, S, VOS) =item open FILEHANDLE,EXPR @@ -1186,6 +1369,8 @@ open to C<|-> and C<-|> are unsupported. (S, Win32, S) Not implemented. (S) +Very limited functionality. (MiNT) + =item readlink EXPR =item readlink @@ -1204,15 +1389,15 @@ Only reliable on sockets. (S) =item semop KEY,OPSTRING -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VOS) =item setpgrp PID,PGRP -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VOS) =item setpriority WHICH,WHO,PRIORITY -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VOS) =item setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL @@ -1226,11 +1411,11 @@ Not implemented. (S, Plan9) =item shmwrite ID,STRING,POS,SIZE -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VOS) =item socketpair SOCKET1,SOCKET2,DOMAIN,TYPE,PROTOCOL -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VOS, VM/ESA) =item stat FILEHANDLE @@ -1254,13 +1439,14 @@ Not implemented. (Win32, VMS, S) =item syscall LIST -Not implemented. (S, Win32, VMS, S) +Not implemented. (S, Win32, VMS, S, VOS, VM/ESA) =item sysopen FILEHANDLE,FILENAME,MODE,PERMS The traditional "0", "1", and "2" MODEs are implemented with different -numeric values on some systems. The flags exported by C should work -everywhere though. (S, OS/390) +numeric values on some systems. The flags exported by C +(O_RDONLY, O_WRONLY, O_RDWR) should work everywhere though. 
(S, OS/390, VM/ESA) =item system LIST @@ -1282,6 +1468,11 @@ the child program uses a compatible version of the emulation library. I will call the native command line direct and no such emulation of a child Unix program will exists. Mileage B vary. (S) +Far from being POSIX compliant. Because there may be no underlying +/bin/sh tries to work around the problem by forking and execing the +first token in its argument string. Handles basic redirection +("E" or "E") on its own behalf. (MiNT) + =item times Only the first entry returned is nonzero. (S) @@ -1298,23 +1489,37 @@ Not useful. (S) Not implemented. (VMS) +Truncation to zero-length only. (VOS) + +If a FILEHANDLE is supplied, it must be writable and opened in append +mode (i.e., use C>filename')> +or C. If a filename is supplied, it +should not be held open elsewhere. (Win32) + =item umask EXPR =item umask Returns undef where unavailable, as of version 5.005. +C works but the correct permissions are only set when the file +is finally close()d. (AmigaOS) + =item utime LIST Only the modification time is updated. (S, VMS, S) -May not behave as expected. (Win32) +May not behave as expected. Behavior depends on the C runtime +library's implementation of utime(), and the filesystem being +used. The FAT filesystem typically does not support an "access +time" field, and it may limit timestamps to a granularity of +two seconds. (Win32) =item wait =item waitpid PID,FLAGS -Not implemented. (S) +Not implemented. (S, VOS) Can only be applied to process handles returned for processes spawned using C. (Win32) @@ -1327,15 +1532,43 @@ Not useful. (S) =over 4 -=item 1.32, 05 August 1998 +=item v1.39, 11 February, 1999 + +Changes from Jarkko and EMX URL fixes Michael Schwern. Additional +note about newlines added. + +=item v1.38, 31 December 1998 + +More changes from Jarkko. + +=item v1.37, 19 December 1998 + +More minor changes. Merge two separate version 1.35 documents. + +=item v1.36, 9 September 1998 + +Updated for Stratus VOS. 
Also known as version 1.35.
+
+=item v1.35, 13 August 1998
+
+Integrate more minor changes, plus addition of new sections under
+L<"ISSUES">: L<"Numbers endianness and Width">,
+L<"Character sets and character encoding">,
+L<"Internationalisation">.
+
+=item v1.33, 06 August 1998
+
+Integrate more minor changes.
+
+=item v1.32, 05 August 1998

 Integrate more minor changes.

-=item 1.30, 03 August 1998
+=item v1.30, 03 August 1998

 Major update for RISC OS, other minor changes.

-=item 1.23, 10 July 1998
+=item v1.23, 10 July 1998

 First public release with perl5.005.

@@ -1344,32 +1577,37 @@ First public release with perl5.005.
 =head1 AUTHORS / CONTRIBUTORS

 Abigail E<lt>abigail@fnx.comE<gt>,
-Charles Bailey E<lt>bailey@genetics.upenn.eduE<gt>,
+Charles Bailey E<lt>bailey@newman.upenn.eduE<gt>,
 Graham Barr E<lt>gbarr@pobox.comE<gt>,
 Tom Christiansen E<lt>tchrist@perl.comE<gt>,
 Nicholas Clark E<lt>Nicholas.Clark@liverpool.ac.ukE<gt>,
 Andy Dougherty E<lt>doughera@lafcol.lafayette.eduE<gt>,
 Dominic Dunlop E<lt>domo@vo.luE<gt>,
+Neale Ferguson E<lt>neale@mailbox.tabnsw.com.auE<gt>,
+Paul Green E<lt>Paul_Green@stratus.comE<gt>,
 M.J.T. Guy E<lt>mjtg@cus.cam.ac.ukE<gt>,
+Jarkko Hietaniemi E<lt>jhi@iki.fiE<gt>,
 Luther Huffman E<lt>lutherh@stratcom.comE<gt>,
 Nick Ing-Simmons E<lt>nick@ni-s.u-net.comE<gt>,
-Andreas J. Koenig E<lt>koenig@kulturbox.deE<gt>,
+Andreas J. KE<ouml>nig E<lt>koenig@kulturbox.deE<gt>,
+Markus Laker E<lt>mlaker@contax.co.ukE<gt>,
 Andrew M. Langmead E<lt>aml@world.std.comE<gt>,
 Paul Moore E<lt>Paul.Moore@uk.origin-it.comE<gt>,
 Chris Nandor E<lt>pudge@pobox.comE<gt>,
-Matthias Neercher E<lt>neeri@iis.ee.ethz.chE<gt>,
+Matthias Neeracher E<lt>neeri@iis.ee.ethz.chE<gt>,
 Gary Ng E<lt>71564.1743@CompuServe.COME<gt>,
 Tom Phoenix E<lt>rootbeer@teleport.comE<gt>,
 Peter Prymmer E<lt>pvhp@forte.comE<gt>,
-Hugo van der Sanden E<lt>h.sanden@elsevier.nlE<gt>,
+Hugo van der Sanden E<lt>hv@crypt0.demon.co.ukE<gt>,
 Gurusamy Sarathy E<lt>gsar@umich.eduE<gt>,
 Paul J. Schinder E<lt>schinder@pobox.comE<gt>,
+Michael G Schwern E<lt>schwern@pobox.comE<gt>,
 Dan Sugalski E<lt>sugalskd@ous.eduE<gt>,
 Nathan Torkington E<lt>gnat@frii.comE<gt>.

-This document is maintained by Chris Nandor.
+This document is maintained by Chris Nandor
+E<lt>pudge@pobox.comE<gt>.

 =head1 VERSION

-Version 1.32, last modified 05 August 1998.
-
+Version 1.39, last modified 11 February 1999