X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlport.pod;h=c1a5483add6795cb520dcce8ed01020d9811360a;hb=2b5ab1e742ea1b1374dcea7f6f90ef5c5cf29914;hp=83654689a61cd0fb022c9ac67c4af4b205093643;hpb=e41182b575eaa0023db9380ce3ad8398f8ffe918;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlport.pod b/pod/perlport.pod index 8365468..c1a5483 100644 --- a/pod/perlport.pod +++ b/pod/perlport.pod @@ -10,23 +10,25 @@ a lot in common, they also have their own very particular and unique features. This document is meant to help you to find out what constitutes portable -perl code, so that once you have made your decision to write portably, +Perl code, so that once you have made your decision to write portably, you know where the lines are drawn, and you can stay within them. There is a tradeoff between taking full advantage of B particular type -of computer, and taking advantage of a full B of them. Naturally, -as you make your range bigger (and thus more diverse), the common denominators -drop, and you are left with fewer areas of common ground in which -you can operate to accomplish a particular task. Thus, when you begin -attacking a problem, it is important to consider which part of the tradeoff -curve you want to operate under. Specifically, whether it is important to -you that the task that you are coding needs the full generality of being -portable, or if it is sufficient to just get the job done. This is the -hardest choice to be made. The rest is easy, because Perl provides lots -of choices, whichever way you want to approach your problem. - -Looking at it another way, writing portable code is usually about willfully -limiting your available choices. Naturally, it takes discipline to do that. +of computer, and taking advantage of a full B of them. 
Naturally, +as you make your range bigger (and thus more diverse), the common +denominators drop, and you are left with fewer areas of common ground in +which you can operate to accomplish a particular task. Thus, when you +begin attacking a problem, it is important to consider which part of the +tradeoff curve you want to operate under. Specifically, whether it is +important to you that the task that you are coding needs the full +generality of being portable, or if it is sufficient to just get the job +done. This is the hardest choice to be made. The rest is easy, because +Perl provides lots of choices, whichever way you want to approach your +problem. + +Looking at it another way, writing portable code is usually about +willfully limiting your available choices. Naturally, it takes discipline +to do that. Be aware of two important points: @@ -59,23 +61,30 @@ take advantage of some unique feature of a particular platform, as is often the case with systems programming (whether for Unix, Windows, S<Mac OS>, VMS, etc.), consider writing platform-specific code. -When the code will run on only two or three operating systems, then you may -only need to consider the differences of those particular systems. The -important thing is to decide where the code will run, and to be deliberate -in your decision. +When the code will run on only two or three operating systems, then you +may only need to consider the differences of those particular systems. +The important thing is to decide where the code will run, and to be +deliberate in your decision. + +The material below is separated into three main sections: main issues of +portability (L<"ISSUES">), platform-specific issues (L<"PLATFORMS">), and +builtin perl functions that behave differently on various ports +(L<"FUNCTION IMPLEMENTATIONS">). 
This information should not be considered complete; it includes possibly -transient information about idiosyncracies of some of the ports, almost +transient information about idiosyncrasies of some of the ports, almost all of which are in a state of constant evolution. Thus this material should be considered a perpetual work in progress (E<lt>IMG SRC="yellow_sign.gif" ALT="Under Construction"E<gt>). + + =head1 ISSUES =head2 Newlines -In most operating systems, lines in files are separated with newlines. +In most operating systems, lines in files are terminated by newlines. Just what is used as a newline may vary from OS to OS. Unix traditionally uses C<\012>, one kind of Windows I/O uses C<\015\012>, and S<Mac OS> uses C<\015>. @@ -97,7 +106,7 @@ C<binmode> on a file, however, you can usually use C<seek> and C<tell> with arbitrary values quite safely. A common misconception in socket programming is that C<\n> eq C<\012> -everywhere. When using protocols, such as common Internet protocols, +everywhere. When using protocols such as common Internet protocols, C<\012> and C<\015> are called for specifically, and the values of the logical C<\n> and C<\r> (carriage return) are not reliable. @@ -110,9 +119,9 @@ the most popular EBCDIC webserver, for instance, accepts C<\r\n>, which translates those characters, along with all other characters in text streams, from EBCDIC to ASCII.] -However, C<\015\012> (or C<\cM\cJ>, or C<\x0D\x0A>) can be tedious and -unsightly, as well as confusing to those maintaining the code. As such, -the C<Socket> module supplies the Right Thing for those who want it. +However, using C<\015\012> (or C<\cM\cJ>, or C<\x0D\x0A>) can be tedious +and unsightly, as well as confusing to those maintaining the code. As +such, the C<Socket> module supplies the Right Thing for those who want it. 
use Socket qw(:DEFAULT :crlf); print SOCKET "Hi there, client!$CRLF" # RIGHT @@ -139,8 +148,41 @@ And this example is actually better than the previous one even for Unix platforms, because now any C<\015>'s (C<\cM>'s) are stripped out (and there was much rejoicing). +An important thing to remember is that functions that return data +should translate newlines when appropriate. Often one line of code +will suffice: + + $data =~ s/\015?\012/\n/g; + return $data; + + +=head2 Numbers endianness and Width + +Different CPUs store integers and floating point numbers in different +orders (called I<endianness>) and widths (32-bit and 64-bit being the +most common). This affects your programs if they attempt to transfer +numbers in binary format from one CPU architecture to another over some +channel: either 'live' via network connections or storing the numbers +to secondary storage such as a disk file. + +Conflicting storage orders make an utter mess out of the numbers: if a +little-endian host (Intel, Alpha) stores 0x12345678 (305419896 in +decimal), a big-endian host (Motorola, MIPS, Sparc, PA) reads it as +0x78563412 (2018915346 in decimal). To avoid this problem in network +(socket) connections use the C<pack> and C<unpack> formats C<"n"> +and C<"N">, the "network" orders, which are guaranteed to be portable. + +Different widths can cause truncation even between platforms of equal +endianness: the platform of shorter width loses the upper parts of the +number. There is no good solution for this problem except to avoid +transferring or storing raw binary numbers. + +One can circumnavigate both these problems in two ways: either +transfer and store numbers always in text format, instead of raw +binary, or consider using modules like C<Data::Dumper> (included in +the standard distribution as of Perl 5.005) and C<Storable>. -=head2 File Paths +=head2 Files and Filesystems Most platforms these days structure files in a hierarchical fashion. 
So, it is reasonably safe to assume that any platform supports the @@ -148,11 +190,32 @@ notion of a "path" to uniquely identify a file on the system. Just how that path is actually written, differs. While they are similar, file path specifications differ between Unix, -Windows, S, OS/2, VMS and probably others. Unix, for example, is -one of the few OSes that has the idea of a root directory. S -uses C<:> as a path separator instead of C. VMS, Windows, and OS/2 -can work similarly to Unix with C as path separator, or in their own -idiosyncratic ways. +Windows, S, OS/2, VMS, VOS, S and probably others. +Unix, for example, is one of the few OSes that has the idea of a single +root directory. + +VMS, Windows, and OS/2 can work similarly to Unix with C as path +separator, or in their own idiosyncratic ways (such as having several +root directories and various "unrooted" device files such NIL: and +LPT:). + +S uses C<:> as a path separator instead of C. + +The filesystem may support neither hard links (C) nor +symbolic links (C, C, C). + +The filesystem may not support neither access timestamp nor change +timestamp (meaning that about the only portable timestamp is the +modification timestamp), or one second granularity of any timestamps +(e.g. the FAT filesystem limits the time granularity to two seconds). + +VOS perl can emulate Unix filenames with C as path separator. The +native pathname characters greater-than, less-than, number-sign, and +percent-sign are always accepted. + +C perl can emulate Unix filenames with C as path +separator, or go native and use C<.> for path separator and C<:> to +signal filing systems and disc names. As with the newline problem above, there are modules that can help. The C modules provide methods to do the Right Thing on whatever @@ -180,10 +243,40 @@ Also of use is C, from the standard distribution, which splits a pathname into pieces (base filename, full path to directory, and file suffix). 
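As a rough sketch of the two modules just mentioned, the following builds a path with C<File::Spec> and takes it apart again with C<File::Basename>, without hard-coding a separator. The directory and file names are made up for illustration, and the suffix pattern is only one reasonable choice.

```perl
use strict;
use File::Spec;
use File::Basename qw(fileparse);

# Hypothetical directory and file names, chosen only for illustration.
my $path = File::Spec->catfile('docs', 'notes', 'readme.txt');

# fileparse() returns the base name, the directory part, and whatever
# suffix matched one of the supplied patterns.
my ($name, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

print "name=$name dir=$dir suffix=$suffix\n";
```

Because both calls go through the platform's own rules, the same code should produce a native-looking path on each port, though the exact separator and case of the result will vary.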
-Remember not to count on the existence of system-specific files, like -F. If code does need to rely on such a file, include a -description of the file and its format in the code's documentation, and -make it easy for the user to override the default location of the file. +Even when on a single platform (if you can call UNIX a single platform), +remember not to count on the existence or the contents of +system-specific files or directories, like F, +F, F, or even F. For +example, F may exist but it may not contain the encrypted +passwords because the system is using some form of enhanced security -- +or it may not contain all the accounts because the system is using NIS. +If code does need to rely on such a file, include a description of the +file and its format in the code's documentation, and make it easy for +the user to override the default location of the file. + +Don't assume a text file will end with a newline. + +Do not have two files of the same name with different case, like +F and F, as many platforms have case-insensitive +filenames. Also, try not to have non-word characters (except for C<.>) +in the names, and keep them to the 8.3 convention, for maximum +portability. + +Likewise, if using C, try to keep the split functions to +8.3 naming and case-insensitive conventions; or, at the very least, +make it so the resulting files have a unique (case-insensitively) +first 8 characters. + +There certainly can be whitespace in filenames. Many systems (DOS, +VMS) cannot have more than one C<"."> in their filenames. + +Don't assume C> won't be the first character of a filename. +Always use C> explicitly to open a file for reading. + + open(FILE, "<$existing_file") or die $!; + +Actually, though, if filenames might use strange characters, it is +safest to open it with C instead of C, which is magic. =head2 System Interaction @@ -199,21 +292,29 @@ the system. Remember to C files when you are done with them. Don't C or C an open file. 
Don't C to or C a file that is already tied to or opened; C or C first. +Don't open the same file more than once at a time for writing, as some +operating systems put mandatory locks on such files. + Don't count on a specific environment variable existing in C<%ENV>. -Don't even count on C<%ENV> entries being case-sensitive, or even +Don't count on C<%ENV> entries being case-sensitive, or even case-preserving. -Don't count on signals in portable programs. +Don't count on signals. Don't count on filename globbing. Use C, C, and C instead. +Don't count on per-program environment variables, or per-program current +directories. + +Don't count on specific values of C<$!>. + =head2 Interprocess Communication (IPC) In general, don't directly access the system in code that is meant to be -portable. That means, no: C, C, C, C, C<``>, -C, C with a C<|>, or any of the other things that makes being +portable. That means, no C, C, C, C, C<``>, +C, C with a C<|>, nor any of the other things that makes being a Unix perl hacker worth being. Commands that launch external processes are generally supported on @@ -238,9 +339,11 @@ mailing methods, including mail, sendmail, and direct SMTP (via C) if a mail transfer agent is not available. The rule of thumb for portable code is: Do it all in portable Perl, or -use a module that may internally implement it with platform-specific code, -but expose a common interface. By portable Perl, we mean code that -avoids the constructs described in this document as being non-portable. +use a module (that may internally implement it with platform-specific +code, but expose a common interface). + +The UNIX System V IPC (C) is not available +even in all UNIX platforms. =head2 External Subroutines (XS) @@ -252,8 +355,8 @@ code might be. If the libraries and headers are portable, then it is normally reasonable to make sure the XS code is portable, too. 
There is a different kind of portability issue with writing XS -code: availability of a C compiler on the end-user's system. C brings with -it its own portability issues, and writing XS code will expose you to +code: availability of a C compiler on the end-user's system. C brings +with it its own portability issues, and writing XS code will expose you to some of those. Writing purely in perl is a comparatively easier way to achieve portability. @@ -267,7 +370,8 @@ C), and DBM modules. There is no one DBM module that is available on all platforms. C and the others are generally available on all Unix and DOSish -ports, but not in MacPerl, where C and C are available. +ports, but not in MacPerl, where only C and C are +available. The good news is that at least some DBM module should be available, and C will use whichever module it can find. Of course, then @@ -277,24 +381,49 @@ denominator (e.g., not exceeding 1K for each record). =head2 Time and Date -The system's notion of time of day and calendar date is controlled in widely -different ways. Don't assume the timezone is stored in C<$ENV{TZ}>, and even -if it is, don't assume that you can control the timezone through that -variable. +The system's notion of time of day and calendar date is controlled in +widely different ways. Don't assume the timezone is stored in C<$ENV{TZ}>, +and even if it is, don't assume that you can control the timezone through +that variable. + +Don't assume that the epoch starts at 00:00:00, January 1, 1970, +because that is OS-specific. Better to store a date in an unambiguous +representation. The ISO 8601 standard defines YYYY-MM-DD as the date +format. A text representation (like C<1 Jan 1970>) can be easily +converted into an OS-specific value using a module like +C. An array of values, such as those returned by +C, can be converted to an OS-specific representation using +C. 
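A minimal sketch of the conversions described above, assuming the standard C<Time::Local> and C<POSIX> modules are available. It uses C<timegm> and C<gmtime> rather than C<timelocal> and C<localtime> so that the result does not depend on the local timezone; the date itself is arbitrary.

```perl
use strict;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Broken-down UTC time for 1 Jan 2000 00:00:00, in the order gmtime()
# returns it: sec, min, hour, mday, mon (0-based), year.
my @parts = (0, 0, 0, 1, 0, 2000);

# timegm() turns the list into this platform's native epoch count.
my $native = timegm(@parts);

# Store or transmit the text form rather than the raw number.
my $stamp = strftime('%Y-%m-%d', gmtime($native));
print "$stamp\n";
```

The value in C<$native> is only meaningful on the machine that computed it; the ISO 8601 string in C<$stamp> is the part that is safe to write to a file or send elsewhere.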
+ + +=head2 Character sets and character encoding -Don't assume that the epoch starts at January 1, 1970, because that is -OS-specific. Better to store a date in an unambiguous representation. -A text representation (like C<1 Jan 1970>) can be easily converted into an -OS-specific value using a module like C. An array of values, -such as those returned by C, can be converted to an OS-specific -representation using C. +Assume very little about character sets. Do not assume anything about +the numerical values (C, C) of characters. Do not +assume that the alphabetic characters are encoded contiguously (in +numerical sense). Do not assume anything about the ordering of the +characters. The lowercase letters may come before or after the +uppercase letters, the lowercase and uppercase may be interlaced so +that both 'a' and 'A' come before the 'b', the accented and other +international characters may be interlaced so that E comes +before the 'b'. + + +=head2 Internationalisation + +If you may assume POSIX (a rather large assumption, that in practice +means UNIX), you may read more about the POSIX locale system from +L. The locale system at least attempts to make things a +little bit more portable, or at least more convenient and +native-friendly for non-English users. The system affects character +sets and encoding, and date and time formatting, among other things. =head2 System Resources -If your code is destined for systems with severely constrained (or missing!) -virtual memory systems then you want to be especially mindful of avoiding -wasteful constructs such as: +If your code is destined for systems with severely constrained (or +missing!) 
virtual memory systems then you want to be I<especially> mindful +of avoiding wasteful constructs such as: # NOTE: this is no longer "bad" in perl5.005 for (0..10000000) {} # bad @@ -303,41 +432,45 @@ wasteful constructs such as: @lines = <VERY_LARGE_FILE>; # bad while (<FILE>) {$file .= $_} # sometimes bad - $file = join '', <FILE>; # better + $file = join('', <FILE>); # better The last two may appear unintuitive to most people. The first of those two constructs repeatedly grows a string, while the second allocates a large chunk of memory in one go. On some systems, the latter is more efficient than the former. + =head2 Security -Most Unix platforms provide basic levels of security that is usually felt -at the file-system level. Other platforms usually don't (unfortunately). -Thus the notion of User-ID, or "home" directory, or even the state of -being logged-in may be unrecognizable on may platforms. If you write -programs that are security conscious, it is usually best to know what -type of system you will be operating under, and write code explicitly +Most multi-user platforms provide basic levels of security that are usually +felt at the file-system level. Other platforms usually don't +(unfortunately). Thus the notion of user id, or "home" directory, or even +the state of being logged-in, may be unrecognizable on many platforms. If +you write programs that are security conscious, it is usually best to know +what type of system you will be operating under, and write code explicitly for that platform (or class of platforms). + =head2 Style For those times when it is necessary to have platform-specific code, consider keeping the platform-specific code in one place, making porting to other platforms easier. Use the C<Config> module and the special -variable C<$^O> to differentiate platforms, as described in L<"PLATFORMS">. +variable C<$^O> to differentiate platforms, as described in +L<"PLATFORMS">. 
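One way to keep platform-specific decisions in one place is a single helper that inspects C<$^O>, sketched below. The helper name C<platform_family> and the family labels are hypothetical, and treating every unrecognized value as Unix-like is an assumption that a real program would want to revisit.

```perl
use strict;
use Config;

# Hypothetical helper: the only place that looks at the OS name.
sub platform_family {
    my ($os) = @_;
    return 'dosish' if $os =~ /^(?:MSWin32|dos|os2)$/;
    return 'macos'  if $os eq 'MacOS';
    return 'vms'    if $os eq 'VMS';
    return 'unixish';    # assumption: everything else is Unix-like
}

print "Running on $^O ($Config{'osname'}), family: ",
      platform_family($^O), "\n";
```

The rest of the program can then branch on the returned label, so that adding a new port means touching only this one subroutine.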
-=head1 CPAN TESTERS +=head1 CPAN Testers -Module uploaded to CPAN are tested by a variety of volunteers on -different platforms. These CPAN testers are notified by e-mail of each +Modules uploaded to CPAN are tested by a variety of volunteers on +different platforms. These CPAN testers are notified by mail of each new upload, and reply to the list with PASS, FAIL, NA (not applicable to -this platform), or ???? (unknown), along with any relevant notations. +this platform), or UNKNOWN (unknown), along with any relevant notations. The purpose of the testing is twofold: one, to help developers fix any -problems in their code; two, to provide users with information about -whether or not a given module works on a given platform. +problems in their code that crop up because of lack of testing on other +platforms; two, to provide users with information about whether or not +a given module works on a given platform. =over 4 @@ -363,23 +496,24 @@ Perl works on a bewildering variety of Unix and Unix-like platforms (see e.g. most of the files in the F directory in the source code kit). On most of these systems, the value of C<$^O> (hence C<$Config{'osname'}>, too) is determined by lowercasing and stripping punctuation from the first -field of the string returned by typing - - % uname -a - -(or a similar command) at the shell prompt. Here, for example, are a few -of the more popular Unix flavors: - - uname $^O - -------------------- - AIX aix - FreeBSD freebsd - Linux linux - HP-UX hpux - OSF1 dec_osf - SunOS solaris - SunOS4 sunos - +field of the string returned by typing C (or a similar command) +at the shell prompt. 
Here, for example, are a few of the more popular +Unix flavors: + + uname $^O $Config{'archname'} + ------------------------------------------- + AIX aix aix + FreeBSD freebsd freebsd-i386 + Linux linux i386-linux + HP-UX hpux PA-RISC1.1 + IRIX irix irix + OSF1 dec_osf alpha-dec_osf + SunOS solaris sun4-solaris + SunOS solaris i86pc-solaris + SunOS4 sunos sun4-sunos + +Note that because the C<$Config{'archname'}> may depend on the hardware +architecture it may vary quite a lot, much more than the C<$^O>. =head2 DOS and Derivatives @@ -402,9 +536,9 @@ from calling any external programs, C will work just fine, and probably better, as it is more consistent with popular usage, and avoids the problem of remembering what to backwhack and what not to. -The DOS FAT file system can only accomodate "8.3" style filenames. Under +The DOS FAT filesystem can only accommodate "8.3" style filenames. Under the "case insensitive, but case preserving" HPFS (OS/2) and NTFS (NT) -file systems you may have to be careful about case returned with functions +filesystems you may have to be careful about case returned with functions like C or used with functions like C or C. DOS also treats several filenames as special, such as AUX, PRN, NUL, CON, @@ -443,7 +577,8 @@ Also see: =item The djgpp environment for DOS, C =item The EMX environment for DOS, OS/2, etc. C, -C +C or +C =item Build instructions for Win32, L. @@ -452,12 +587,13 @@ C =back -=head2 MacPerl +=head2 S Any module requiring XS compilation is right out for most people, because MacPerl is built using non-free (and non-cheap!) compilers. Some XS modules that can work with MacPerl are built and distributed in binary -form on CPAN. See I for more details. +form on CPAN. See I and L<"CPAN Testers"> +for more details. Directories are specified as: @@ -472,8 +608,8 @@ Files in a directory are stored in alphabetical order. 
Filenames are limited to 31 characters, and may include any character except C<:>, which is reserved as a path separator. -Instead of C, see C and C in -C. +Instead of C, see C and C in the +C module, or C and C. In the MacPerl application, you can't run a program from the command line; programs that expect C<@ARGV> to be populated can be edited with something @@ -495,7 +631,7 @@ shell: perl myscript.plx some arguments ToolServer is another app from Apple that provides access to MPW tools -from MPW and the MacPerl app, which allows MacPerl program to use +from MPW and the MacPerl app, which allows MacPerl programs to use C, backticks, and piped C. "S" is the proper name for the operating system, but the value @@ -508,6 +644,9 @@ the application or MPW tool version is running, check: $is_ppc = $MacPerl::Architecture eq 'MacPPC'; $is_68k = $MacPerl::Architecture eq 'Mac68K'; +S, to be based on NeXT's OpenStep OS, will (in theory) be able +to run MacPerl natively, but Unix perl will also run natively under the +built-in Unix environment. Also see: @@ -523,7 +662,7 @@ Also see: =head2 VMS Perl on VMS is discussed in F in the perl distribution. -Note that perl on VMS can accept either VMS or Unix style file +Note that perl on VMS can accept either VMS- or Unix-style file specifications as in either of the following: $ perl -ne "print if /perl_setup/i" SYS$LOGIN:LOGIN.COM @@ -566,20 +705,22 @@ extensions is also 39 characters. Version is a number from 1 to VMS' RMS filesystem is case insensitive and does not preserve case. C returns lowercased filenames, but specifying a file for -opening remains case insensitive. Files without extensions have a +opening remains case insensitive. Files without extensions have a trailing period on them, so doing a C with a file named F -will return F (though that file could be opened with C. +will return F (though that file could be opened with +C). 
-RMS has an eight level limit on directory depths from any rooted logical -(allowing 16 levels overall). Hence C -is a valid directory specification but C -is not. F authors might have to take this into account, but -at least they can refer to the former as C. +RMS had an eight level limit on directory depths from any rooted logical +(allowing 16 levels overall) prior to VMS 7.2. Hence +C is a valid directory specification but +C is not. F authors might +have to take this into account, but at least they can refer to the former +as C. -The C module, which gets installed as part -of the build process on VMS, is a pure Perl module that can easily be -installed on non-VMS platforms and can be helpful for conversions to -and from RMS native formats. +The C module, which gets installed as part of the build +process on VMS, is a pure Perl module that can easily be installed on +non-VMS platforms and can be helpful for conversions to and from RMS +native formats. What C<\n> represents depends on the type of file that is open. It could be C<\015>, C<\012>, C<\015\012>, or nothing. Reading from a file @@ -616,18 +757,84 @@ Put words C in message body. =back +=head2 VOS + +Perl on VOS is discussed in F in the perl distribution. +Note that perl on VOS can accept either VOS- or Unix-style file +specifications as in either of the following: + + $ perl -ne "print if /perl_setup/i" >system>notices + $ perl -ne "print if /perl_setup/i" /system/notices + +or even a mixture of both as in: + + $ perl -ne "print if /perl_setup/i" >system/notices + +Note that even though VOS allows the slash character to appear in object +names, because the VOS port of Perl interprets it as a pathname +delimiting character, VOS files, directories, or links whose names +contain a slash character cannot be processed. Such files must be +renamed before they can be processed by Perl. 
+ +The following C functions are unimplemented on VOS, and any attempt by +Perl to use them will result in a fatal error message and an immediate +exit from Perl: dup, do_aspawn, do_spawn, fork, waitpid. Once these +functions become available in the VOS POSIX.1 implementation, you can +either recompile and rebind Perl, or you can download a newer port from +ftp.stratus.com. + +The value of C<$^O> on VOS is "VOS". To determine the architecture that +you are running on without resorting to loading all of C<%Config> you +can examine the content of the C<@INC> array like so: + + if (grep(/VOS/, @INC)) { + print "I'm on a Stratus box!\n"; + } else { + print "I'm not on a Stratus box!\n"; + die; + } + + if (grep(/860/, @INC)) { + print "This box is a Stratus XA/R!\n"; + } elsif (grep(/7100/, @INC)) { + print "This box is a Stratus HP 7100 or 8000!\n"; + } elsif (grep(/8000/, @INC)) { + print "This box is a Stratus HP 8000!\n"; + } else { + print "This box is a Stratus 68K...\n"; + } + +Also see: + +=over 4 + +=item L + +=item VOS mailing list + +There is no specific mailing list for Perl on VOS. You can post +comments to the comp.sys.stratus newsgroup, or subscribe to the general +Stratus mailing list. Send a letter with "Subscribe Info-Stratus" in +the message body to majordomo@list.stratagy.com. + +=item VOS Perl on the web at C + +=back + + =head2 EBCDIC Platforms Recent versions of Perl have been ported to platforms such as OS/400 on -AS/400 minicomputers as well as OS/390 for IBM Mainframes. Such computers -use EBCDIC character sets internally (usually Character Code Set ID 00819 -for OS/400 and IBM-1047 for OS/390). Note that on the mainframe perl -currently works under the "Unix system services for OS/390" (formerly -known as OpenEdition). +AS/400 minicomputers as well as OS/390 & VM/ESA for IBM Mainframes. Such +computers use EBCDIC character sets internally (usually Character Code +Set ID 00819 for OS/400 and IBM-1047 for OS/390 & VM/ESA). 
Note that on +the mainframe perl currently works under the "Unix system services +for OS/390" (formerly known as OpenEdition) and VM/ESA OpenEdition. -As of R2.5 of USS for OS/390 that Unix sub-system did not support the -C<#!> shebang trick for script invocation. Hence, on OS/390 perl scripts -can executed with a header similar to the following simple script: +As of R2.5 of USS for OS/390 and Version 2.3 of VM/ESA these Unix +sub-systems do not support the C<#!> shebang trick for script invocation. +Hence, on OS/390 and VM/ESA perl scripts can be executed with a header +similar to the following simple script: : # use perl eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}' @@ -637,20 +844,22 @@ can executed with a header similar to the following simple script: print "Hello from perl!\n"; On these platforms, bear in mind that the EBCDIC character set may have -an effect on what happens with perl functions such as C, C, -C, C, C, C, C, C; as well as -bit-fiddling with ASCII constants using operators like C<^>, C<&> and -C<|>; not to mention dealing with socket interfaces to ASCII computers -(see L<"NEWLINES">). +an effect on what happens with some perl functions (such as C, +C, C, C, C, C, C, C), as +well as bit-fiddling with ASCII constants using operators like C<^>, C<&> +and C<|>, not to mention dealing with socket interfaces to ASCII computers +(see L). Fortunately, most web servers for the mainframe will correctly translate the C<\n> in the following statement to its ASCII equivalent (note that -C<\r> is the same under both ASCII and EBCDIC): +C<\r> is the same under both Unix and OS/390 & VM/ESA): print "Content-type: text/html\r\n\r\n"; The value of C<$^O> on OS/390 is "os390". +The value of C<$^O> on VM/ESA is "vmesa". 
+ Some simple tricks for determining if you are running on an EBCDIC platform could include any of the following (perhaps all): @@ -661,9 +870,9 @@ platform could include any of the following (perhaps all): if (chr(169) eq 'z') { print "EBCDIC may be spoken here!\n"; } Note that one thing you may not want to rely on is the EBCDIC encoding -of punctuation characters since these may differ from code page to code page -(and once your module or script is rumoured to work with EBCDIC, folks will -want it to work with all EBCDIC character sets). +of punctuation characters since these may differ from code page to code +page (and once your module or script is rumoured to work with EBCDIC, +folks will want it to work with all EBCDIC character sets). Also see: @@ -675,19 +884,131 @@ The perl-mvs@perl.org list is for discussion of porting issues as well as general usage issues for all EBCDIC Perls. Send a message body of "subscribe perl-mvs" to majordomo@perl.org. -=item AS/400 Perl information at C +=item AS/400 Perl information at C =back + +=head2 Acorn RISC OS + +As Acorns use ASCII with newlines (C<\n>) in text files as C<\012> like +Unix and Unix filename emulation is turned on by default, it is quite +likely that most simple scripts will work "out of the box". The native +filing system is modular, and individual filing systems are free to be +case-sensitive or insensitive, and are usually case-preserving. Some +native filing systems have name length limits which file and directory +names are silently truncated to fit - scripts should be aware that the +standard disc filing system currently has a name length limit of B<10> +characters, with up to 77 items in a directory, but other filing systems +may not impose such limitations. + +Native filenames are of the form + + Filesystem#Special_Field::DiscName.$.Directory.Directory.File + +where + + Special_Field is not usually present, but may contain . and $ . 
Filesystem =~ m|[A-Za-z0-9_]| + DiscName =~ m|[A-Za-z0-9_/]| + $ represents the root directory + . is the path separator + @ is the current directory (per filesystem but machine global) + ^ is the parent directory + Directory and File =~ m|[^\0- "\.\$\%\&:\@\\^\|\177]+| + +The default filename translation is roughly C + +Note that C<"ADFS::HardDisc.$.File" ne 'ADFS::HardDisc.$.File'> and that +the second stage of C<$> interpolation in regular expressions will fall +foul of the C<$.> if scripts are not careful. + +Logical paths specified by system variables containing comma-separated +search lists are also allowed, hence C is a valid +filename, and the filesystem will prefix C with each section of +C until a name is made that points to an object on disc. +Writing to a new file C would only be allowed if +C contains a single item list. The filesystem will also +expand system variables in filenames if enclosed in angle brackets, so +C<E<lt>System$DirE<gt>.Modules> would look for the file +S>. The obvious implication of this is +that B<fully qualified filenames can start with C<E<lt>E<gt>>> and should +be protected when C<open> is used for input. + +Because C<.> was in use as a directory separator and filenames could not +be assumed to be unique after 10 characters, Acorn implemented the C +compiler to strip the trailing C<.c> C<.h> C<.s> and C<.o> suffix from +filenames specified in source code and store the respective files in +subdirectories named after the suffix. Hence files are translated: + + foo.h h.foo + C:foo.h C:h.foo (logical path variable) + sys/os.h sys.h.os (C compiler groks Unix-speak) + 10charname.c c.10charname + 10charname.o o.10charname + 11charname_.c c.11charname (assuming filesystem truncates at 10) + +The Unix emulation library's translation of filenames to native assumes +that this sort of translation is required, and allows a user defined list +of known suffixes which it will transpose in this fashion. 
This may +appear transparent, but consider that with these rules C +and C both map to C, and that C and +C cannot and do not attempt to emulate the reverse mapping. Other +C<.>s in filenames are translated to C. + +As implied above the environment accessed through C<%ENV> is global, and +the convention is that program specific environment variables are of the +form C. Each filing system maintains a current directory, +and the current filing system's current directory is the B current +directory. Consequently, sociable scripts don't change the current +directory but rely on full pathnames, and scripts (and Makefiles) cannot +assume that they can spawn a child process which can change the current +directory without affecting its parent (and everyone else for that +matter). + +As native operating system filehandles are global and currently are +allocated down from 255, with 0 being a reserved value the Unix emulation +library emulates Unix filehandles. Consequently, you can't rely on +passing C, C, or C to your children. + +The desire of users to express filenames of the form +CFoo$DirE.Bar> on the command line unquoted causes problems, +too: C<``> command output capture has to perform a guessing game. It +assumes that a string C[^EE]+\$[^EE]E> is a +reference to an environment variable, whereas anything else involving +C> or C> is redirection, and generally manages to be 99% +right. Of course, the problem remains that scripts cannot rely on any +Unix tools being available, or that any tools found have Unix-like command +line arguments. + +Extensions and XS are, in theory, buildable by anyone using free tools. +In practice, many don't, as users of the Acorn platform are used to binary +distribution. MakeMaker does run, but no available make currently copes +with MakeMaker's makefiles; even if/when this is fixed, the lack of a +Unix-like shell can cause problems with makefile rules, especially lines +of the form C, and anything using quoting. 
+ +"S<RISC OS>" is the proper name for the operating system, but the value +in C<$^O> is "riscos" (because we don't like shouting). + +Also see: + +=over 4 + +=item perl list + +=back + + =head2 Other perls Perl has been ported to a variety of platforms that do not fit into any of the above categories. Some, such as AmigaOS, BeOS, QNX, and Plan 9, have -been well integrated into the standard Perl source code kit. You may need +been well-integrated into the standard Perl source code kit. You may need to see the F directory on CPAN for information, and possibly -binaries, for the likes of: acorn, aos, atari, lynxos, HP-MPE/iX, riscos, -Tandem Guardian, vos, I<etc.> (yes we know that some of these OSes may fall -under the Unix category but we are not a standards body.) +binaries, for the likes of: aos, atari, lynxos, riscos, Tandem Guardian, +vos, I<etc.> (yes we know that some of these OSes may fall under the Unix +category, but we are not a standards body.) See also: @@ -699,7 +1020,7 @@ See also: =item Novell Netware -A free Perl 5 based PERL.NLM for Novell Netware is available from +A free perl5-based PERL.NLM for Novell Netware is available from C =back @@ -715,14 +1036,12 @@ The list may very well be incomplete, or wrong in some places. When in doubt, consult the platform-specific README files in the Perl source distribution, and other documentation resources for a given port. -Be aware, moreover, that even among Unix-ish systems there are variations, -and not all functions listed here are necessarily available, though -most usually are. +Be aware, moreover, that even among Unix-ish systems there are variations. For many functions, you can also query C<%Config>, exported by default from C<Config>. For example, to check if the platform has the C<lstat> -call, check C<$Config{'d_lstat'}>. See L<Config> for a full description -of available variables. +call, check C<$Config{'d_lstat'}>. See L<Config> for a full +description of available variables.
=head2 Alphabetical Listing of Perl Functions @@ -742,28 +1061,38 @@ considerations. C<-o> is not supported. (S<Mac OS>) C<-r>, C<-w>, C<-x>, and C<-o> tell whether or not file is accessible, which may not reflect UIC-based file protections. (VMS) +C<-s> returns the size of the data fork, not the total size of data fork +plus resource fork. (S<Mac OS>). + +C<-s> by name on an open file will return the space reserved on disk, +rather than the current extent. C<-s> on an open filehandle returns the +current size. (S<RISC OS>) + C<-R>, C<-W>, C<-X>, C<-O> are indistinguishable from C<-r>, C<-w>, -C<-x>, C<-o>. (S<Mac OS>, Win32, VMS) +C<-x>, C<-o>. (S<Mac OS>, Win32, VMS, S<RISC OS>) C<-b>, C<-c>, C<-k>, C<-g>, C<-p>, C<-u>, C<-A> are not implemented. (S<Mac OS>) C<-g>, C<-k>, C<-l>, C<-p>, C<-u>, C<-A> are not particularly meaningful. -(Win32, VMS) +(Win32, VMS, S<RISC OS>) C<-d> is true if passed a device spec without an explicit directory. (VMS) C<-T> and C<-B> are implemented, but might misclassify Mac text files -with foreign characters; this is the case will all platforms, but -affects S<Mac OS> a lot. (S<Mac OS>) +with foreign characters; this is the case will all platforms, but may +affect S<Mac OS> often. (S<Mac OS>) C<-x> (or C<-X>) determine if a file ends in one of the executable suffixes. C<-S> is meaningless. (Win32) +C<-x> (or C<-X>) determine if a file has an executable file type. +(S<RISC OS>) + =item binmode FILEHANDLE -Meaningless. (S<Mac OS>) +Meaningless. (S<Mac OS>, S<RISC OS>) Reopens file and restores pointer; if function fails, underlying filehandle may be closed, or pointer may be in a different position. @@ -780,9 +1109,13 @@ locking/unlocking the file. (S<Mac OS>) Only good for changing "owner" read-write access, "group", and "other" bits are meaningless. (Win32) +Only good for changing "owner" and "other" read-write access. (S<RISC OS>) + +Access permissions are mapped onto VOS access-control list changes. (VOS) + =item chown LIST -Not implemented. (S<Mac OS>, Win32, Plan9) +Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>, VOS) Does nothing, but won't fail.
(Win32) @@ -790,70 +1123,76 @@ Does nothing, but won't fail. (Win32) =item chroot -Not implemented. (S<Mac OS>, Win32, VMS, Plan9) +Not implemented. (S<Mac OS>, Win32, VMS, Plan9, S<RISC OS>, VOS, VM/ESA) =item crypt PLAINTEXT,SALT May not be available if library or source was not provided when building -perl. (Win32) +perl. (Win32) + +Not implemented. (VOS) =item dbmclose HASH -Not implemented. (VMS, Plan9) +Not implemented. (VMS, Plan9, VOS) =item dbmopen HASH,DBNAME,MODE -Not implemented. (VMS, Plan9) +Not implemented. (VMS, Plan9, VOS) =item dump LABEL -Not useful. (S<Mac OS>) +Not useful. (S<Mac OS>, S<RISC OS>) Not implemented. (Win32) -Invokes VMS debugger. (VMS) +Invokes VMS debugger. (VMS) =item exec LIST Not implemented. (S<Mac OS>) +Implemented via Spawn. (VM/ESA) + =item fcntl FILEHANDLE,FUNCTION,SCALAR Not implemented. (Win32, VMS) =item flock FILEHANDLE,OPERATION -Not implemented (S<Mac OS>, VMS). +Not implemented (S<Mac OS>, VMS, S<RISC OS>, VOS). Available only on Windows NT (not on Windows 95). (Win32) =item fork -Not implemented. (S<Mac OS>, Win32, AmigaOS) +Not implemented. (S<Mac OS>, Win32, AmigaOS, S<RISC OS>, VOS, VM/ESA) =item getlogin -Not implemented. (S<Mac OS>) +Not implemented. (S<Mac OS>, S<RISC OS>) =item getpgrp PID -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS) =item getppid -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>) =item getpriority WHICH,WHO -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS, VM/ESA) =item getpwnam NAME Not implemented. (S<Mac OS>, Win32) +Not useful. (S<RISC OS>) + =item getgrnam NAME -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>) =item getnetbyname NAME @@ -863,9 +1202,11 @@ Not implemented. (S<Mac OS>, Win32, Plan9) Not implemented. (S<Mac OS>, Win32) +Not useful. (S<RISC OS>) + =item getgrgid GID -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>) =item getnetbyaddr ADDR,ADDRTYPE @@ -881,11 +1222,11 @@ Not implemented. (S<Mac OS>) =item getpwent -Not implemented. (S<Mac OS>, Win32) +Not implemented. (S<Mac OS>, Win32, VM/ESA) =item getgrent -Not implemented.
(S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, VM/ESA) =item gethostent @@ -905,35 +1246,35 @@ Not implemented. (Win32, Plan9) =item setpwent -Not implemented. (S<Mac OS>, Win32) +Not implemented. (S<Mac OS>, Win32, S<RISC OS>) =item setgrent -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>) =item sethostent STAYOPEN -Not implemented. (S<Mac OS>, Win32, Plan9) +Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>) =item setnetent STAYOPEN -Not implemented. (S<Mac OS>, Win32, Plan9) +Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>) =item setprotoent STAYOPEN -Not implemented. (S<Mac OS>, Win32, Plan9) +Not implemented. (S<Mac OS>, Win32, Plan9, S<RISC OS>) =item setservent STAYOPEN -Not implemented. (Plan9, Win32) +Not implemented. (Plan9, Win32, S<RISC OS>) =item endpwent -Not implemented. (S<Mac OS>, Win32) +Not implemented. (S<Mac OS>, Win32, VM/ESA) =item endgrent -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VM/ESA) =item endhostent @@ -962,8 +1303,14 @@ Not implemented. (S<Mac OS>, Plan9) Globbing built-in, but only C<*> and C<?> metacharacters are supported. (S<Mac OS>) -Features depend on external perlglob.exe or perlglob.bat. May be overridden -with something like File::DosGlob, which is recommended. (Win32) +Features depend on external perlglob.exe or perlglob.bat. May be +overridden with something like File::DosGlob, which is recommended. +(Win32) + +Globbing built-in, but only C<*> and C<?> metacharacters are supported. +Globbing relies on operating system calls, which may return filenames +in any order. As most filesystems are case-insensitive, even "sorted" +filenames will not be in case-sensitive order. (S<RISC OS>) =item ioctl FILEHANDLE,FUNCTION,SCALAR @@ -972,16 +1319,22 @@ Not implemented. (VMS) Available only for socket handles, and it does what the ioctlsocket() call in the Winsock API does. (Win32) +Available only for socket handles. (S<RISC OS>) + =item kill LIST -Not implemented. (S<Mac OS>) +Not implemented, hence not useful for taint checking. (S<Mac OS>, +S<RISC OS>) -Available only for process handles returned by the C method of -spawning a process.
(Win32) +Available only for process handles returned by the C +method of spawning a process. (Win32) =item link OLDFILE,NEWFILE -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>) + +Link count not updated because hard links are not quite that hard +(They are sort of half-way between hard and soft links). (AmigaOS) =item lstat FILEHANDLE @@ -989,9 +1342,9 @@ Not implemented. (S<Mac OS>, Win32, VMS) =item lstat -Not implemented. (VMS) +Not implemented. (VMS, S<RISC OS>) -Return values may be bogus. (Win32) +Return values may be bogus. (Win32) =item msgctl ID,CMD,ARG @@ -1001,7 +1354,7 @@ Return values may be bogus. (Win32) =item msgrcv ID,VAR,SIZE,TYPE,FLAGS -Not implemented. (S<Mac OS>, Win32, VMS, Plan9) +Not implemented. (S<Mac OS>, Win32, VMS, Plan9, S<RISC OS>, VOS) =item open FILEHANDLE,EXPR @@ -1010,37 +1363,41 @@ Not implemented. (S<Mac OS>, Win32, VMS, Plan9) The C<|> variants are only supported if ToolServer is installed. (S<Mac OS>) -open to C<|-> and C<-|> are unsupported. (S<Mac OS>, Win32) +open to C<|-> and C<-|> are unsupported. (S<Mac OS>, Win32, S<RISC OS>) =item pipe READHANDLE,WRITEHANDLE Not implemented. (S<Mac OS>) +Very limited functionality. (MiNT) + =item readlink EXPR =item readlink -Not implemented. (Win32, VMS) +Not implemented. (Win32, VMS, S<RISC OS>) =item select RBITS,WBITS,EBITS,TIMEOUT Only implemented on sockets. (Win32) +Only reliable on sockets. (S<RISC OS>) + =item semctl ID,SEMNUM,CMD,ARG =item semget KEY,NSEMS,FLAGS =item semop KEY,OPSTRING -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS) =item setpgrp PID,PGRP -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS) =item setpriority WHICH,WHO,PRIORITY -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS) =item setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL @@ -1054,11 +1411,11 @@ Not implemented. (S<Mac OS>, Plan9) =item shmwrite ID,STRING,POS,SIZE -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS) =item socketpair SOCKET1,SOCKET2,DOMAIN,TYPE,PROTOCOL -Not implemented.
(S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS, VM/ESA) =item stat FILEHANDLE @@ -1073,13 +1430,23 @@ device and inode are not meaningful. (Win32) device and inode are not necessarily reliable. (VMS) +mtime, atime and ctime all return the last modification time. Device and +inode are not necessarily reliable. (S<RISC OS>) + =item symlink OLDFILE,NEWFILE -Not implemented. (Win32, VMS) +Not implemented. (Win32, VMS, S<RISC OS>) =item syscall LIST -Not implemented. (S<Mac OS>, Win32, VMS) +Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>, VOS, VM/ESA) + +=item sysopen FILEHANDLE,FILENAME,MODE,PERMS + +The traditional "0", "1", and "2" MODEs are implemented with different +numeric values on some systems. The flags exported by C<Fcntl> +(O_RDONLY, O_WRONLY, O_RDWR) should work everywhere though. (S<Mac OS>, OS/390, VM/ESA) =item system LIST @@ -1091,6 +1458,21 @@ process and immediately returns its process designator, without waiting for it to terminate. Return value may be used subsequently in C<wait> or C<waitpid>. (Win32) +There is no shell to process metacharacters, and the native standard is +to pass a command line terminated by "\n" "\r" or "\0" to the spawned +program. Redirection such as C<E<gt> foo> is performed (if at all) by +the run time library of the spawned program. C I will call +the Unix emulation library's C emulation, which attempts to provide +emulation of the stdin, stdout, stderr in force in the parent, providing +the child program uses a compatible version of the emulation library. +I will call the native command line direct and no such emulation +of a child Unix program will exists. Mileage B vary. (S<RISC OS>) + +Far from being POSIX compliant. Because there may be no underlying +/bin/sh tries to work around the problem by forking and execing the +first token in its argument string. Handles basic redirection +("E<lt>" or "E<gt>") on its own behalf. (MiNT) + =item times Only the first entry returned is nonzero. (S<Mac OS>) @@ -1099,62 +1481,133 @@ Only the first entry returned is nonzero.
(S<Mac OS>) "system" time will be bogus, and "user" time is actually the time returned by the clock() function in the C runtime library. (Win32) +Not useful. (S<RISC OS>) + =item truncate FILEHANDLE,LENGTH =item truncate EXPR,LENGTH Not implemented. (VMS) +Truncation to zero-length only. (VOS) + +If a FILEHANDLE is supplied, it must be writable and opened in append +mode (i.e., use C>filename')> +or C. If a filename is supplied, it +should not be held open elsewhere. (Win32) + =item umask EXPR =item umask Returns undef where unavailable, as of version 5.005. +C works but the correct permissions are only set when the file +is finally close()d. (AmigaOS) + =item utime LIST -Only the modification time is updated. (S<Mac OS>, VMS) +Only the modification time is updated. (S<Mac OS>, VMS, S<RISC OS>) -May not behave as expected. (Win32) +May not behave as expected. Behavior depends on the C runtime +library's implementation of utime(), and the filesystem being +used. The FAT filesystem typically does not support an "access +time" field, and it may limit timestamps to a granularity of +two seconds. (Win32) =item wait =item waitpid PID,FLAGS -Not implemented. (S<Mac OS>) +Not implemented. (S<Mac OS>, VOS) Can only be applied to process handles returned for processes spawned using C. (Win32) +Not useful. (S<RISC OS>) + =back +=head1 CHANGES + +=over 4 + +=item v1.39, 11 February, 1999 + +Changes from Jarkko and EMX URL fixes Michael Schwern. Additional +note about newlines added. + +=item v1.38, 31 December 1998 + +More changes from Jarkko. + +=item v1.37, 19 December 1998 + +More minor changes. Merge two separate version 1.35 documents. + +=item v1.36, 9 September 1998 + +Updated for Stratus VOS. Also known as version 1.35. + +=item v1.35, 13 August 1998 + +Integrate more minor changes, plus addition of new sections under +L<"ISSUES">: L<"Numbers endianness and Width">, +L<"Character sets and character encoding">, +L<"Internationalisation">. + +=item v1.33, 06 August 1998 + +Integrate more minor changes.
+ +=item v1.32, 05 August 1998 + +Integrate more minor changes. + +=item v1.30, 03 August 1998 + +Major update for RISC OS, other minor changes. + +=item v1.23, 10 July 1998 + +First public release with perl5.005. + +=back =head1 AUTHORS / CONTRIBUTORS -Chris Nandor E<lt>pudge@pobox.comE<gt>, -Gurusamy Sarathy E<lt>gsar@umich.eduE<gt>, -Peter Prymmer E<lt>pvhp@forte.comE<gt>, +Abigail E<lt>abigail@fnx.comE<gt>, +Charles Bailey E<lt>bailey@newman.upenn.eduE<gt>, +Graham Barr E<lt>gbarr@pobox.comE<gt>, Tom Christiansen E<lt>tchrist@perl.comE<gt>, -Nathan Torkington E<lt>gnat@frii.comE<gt>, -Paul Moore E<lt>Paul.Moore@uk.origin-it.comE<gt>, -Matthias Neercher E<lt>neeri@iis.ee.ethz.chE<gt>, -Charles Bailey E<lt>bailey@genetics.upenn.eduE<gt>, +Nicholas Clark E<lt>Nicholas.Clark@liverpool.ac.ukE<gt>, +Andy Dougherty E<lt>doughera@lafcol.lafayette.eduE<gt>, +Dominic Dunlop E<lt>domo@vo.luE<gt>, +Neale Ferguson E<lt>neale@mailbox.tabnsw.com.auE<gt> +Paul Green E<lt>Paul_Green@stratus.comE<gt>, +M.J.T. Guy E<lt>mjtg@cus.cam.ac.ukE<gt>, +Jarkko Hietaniemi E<lt>jhi@iki.fiE<gt>, Luther Huffman E<lt>lutherh@stratcom.comE<gt>, -Gary Ng E<lt>71564.1743@CompuServe.COME<gt>, Nick Ing-Simmons E<lt>nick@ni-s.u-net.comE<gt>, -Paul J. Schinder E<lt>schinder@pobox.comE<gt>, +Andreas J. KE<ouml>nig E<lt>koenig@kulturbox.deE<gt>, +Markus Laker E<lt>mlaker@contax.co.ukE<gt>, +Andrew M. Langmead E<lt>aml@world.std.comE<gt>, +Paul Moore E<lt>Paul.Moore@uk.origin-it.comE<gt>, +Chris Nandor E<lt>pudge@pobox.comE<gt>, +Matthias Neeracher E<lt>neeri@iis.ee.ethz.chE<gt>, +Gary Ng E<lt>71564.1743@CompuServe.COME<gt>, Tom Phoenix E<lt>rootbeer@teleport.comE<gt>, -Hugo van der Sanden E<lt>h.sanden@elsevier.nlE<gt>, -Dominic Dunlop E<lt>domo@vo.luE<gt>, +Peter Prymmer E<lt>pvhp@forte.comE<gt>, +Hugo van der Sanden E<lt>hv@crypt0.demon.co.ukE<gt>, +Gurusamy Sarathy E<lt>gsar@umich.eduE<gt>, +Paul J. Schinder E<lt>schinder@pobox.comE<gt>, +Michael G Schwern E<lt>schwern@pobox.comE<gt>, Dan Sugalski E<lt>sugalskd@ous.eduE<gt>, -Andreas J. Koenig E<lt>koenig@kulturbox.deE<gt>, -Andrew M. Langmead E<lt>aml@world.std.comE<gt>, -Andy Dougherty E<lt>doughera@lafcol.lafayette.eduE<gt>, -Abigail E<lt>abigail@fnx.comE<gt>. +Nathan Torkington E<lt>gnat@frii.comE<gt>. -This document is maintained by Chris Nandor.
+This document is maintained by Chris Nandor +E<lt>pudge@pobox.comE<gt>. =head1 VERSION -Version 1.23, last modified 10 July 1998. - +Version 1.39, last modified 11 February 1999