From: Gurusamy Sarathy
Date: Fri, 7 Aug 1998 22:19:42 +0000 (+0000)
Subject: perlport.pod notes from Jarkko Hietaniemi; utime() note for Win32
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=322422def5902825e69a6575c6ba0ca85b18ea34;p=p5sagit%2Fp5-mst-13.2.git

perlport.pod notes from Jarkko Hietaniemi; utime() note for Win32

p4raw-id: //depot/maint-5.005/perl@1753
---

diff --git a/pod/perlport.pod b/pod/perlport.pod
index 8568c25..534f0e2 100644
--- a/pod/perlport.pod
+++ b/pod/perlport.pod
@@ -149,6 +149,32 @@ platforms, because now any C<\015>'s (C<\cM>'s) are stripped out
 (and there was much rejoicing).
 
 
+=head2 Numbers endianness and Width
+
+Different CPUs store integers and floating point numbers in different
+orders (called I<endianness>) and widths (32-bit and 64-bit being the
+most common). This affects your programs if they attempt to transfer
+numbers in binary format from a CPU architecture to another over some
+channel: either 'live' via network connections or storing the numbers
+to secondary storage such as a disk file.
+
+Conflicting storage orders make utter mess out of the numbers: if a
+little-endian host (Intel, Alpha) stores 0x12345678 (305419896 in
+decimal), a big-endian host (Motorola, MIPS, Sparc, PA) reads it as
+0x78563412 (2018915346 in decimal). To avoid this problem in network
+(socket) connections use the C<pack()> and C<unpack()> formats C<"n">
+and C<"N">, the "network" orders, they are guaranteed to be portable.
+
+Different widths can cause truncation even between platforms of equal
+endianness: the platform of shorter width loses the upper parts of the
+number. There is no good solution for this problem except to avoid
+transferring or storing raw binary numbers.
+
+One can circumnavigate both these problems in two ways: either
+transfer and store numbers always in text format, instead of raw
+binary, or consider using modules like C<Data::Dumper> (included in
+the standard distribution as of Perl 5.005) and C<Storable>.
+
 =head2 Files
 
 Most platforms these days structure files in a hierarchical fashion.
@@ -157,13 +183,20 @@ notion of a "path" to uniquely identify a file on the system. Just how
 that path is actually written, differs.
 
 While they are similar, file path specifications differ between Unix,
-Windows, S<Mac OS>, OS/2, VMS, S<RISC OS> and probably others. Unix, for
-example, is one of the few OSes that has the idea of a root directory.
-S<Mac OS> uses C<:> as a path separator instead of C</>. VMS, Windows,
-and OS/2 can work similarly to Unix with C</> as path separator, or in
-their own idiosyncratic ways. C<RISC OS> perl can emulate Unix filenames
-with C</> as path separator, or go native and use C<.> for path separator
-and C<:> to signal filing systems and disc names.
+Windows, S<Mac OS>, OS/2, VMS, S<RISC OS> and probably others. Unix,
+for example, is one of the few OSes that has the idea of a single root
+directory.
+
+VMS, Windows, and OS/2 can work similarly to Unix with C</> as path
+separator, or in their own idiosyncratic ways (such as having several
+root directories and various "unrooted" device files such as NIL: and
+LPT:).
+
+S<Mac OS> uses C<:> as a path separator instead of C</>.
+
+C<RISC OS> perl can emulate Unix filenames with C</> as path
+separator, or go native and use C<.> for path separator and C<:> to
+signal filing systems and disc names.
 
 As with the newline problem above, there are modules that can help. The
 C<File::Spec> modules provide methods to do the Right Thing on whatever
@@ -191,10 +224,16 @@ Also of use is C<File::Basename>, from the standard distribution, which
 splits a pathname into pieces (base filename, full path to directory,
 and file suffix).
 
-Remember not to count on the existence of system-specific files, like
-F</etc/resolv.conf>. If code does need to rely on such a file, include a
-description of the file and its format in the code's documentation, and
-make it easy for the user to override the default location of the file.
+Even when on a single platform (if you can call UNIX a single
+platform), remember not to count on the existence or the contents of
+system-specific files, like F</etc/passwd>, F</etc/sendmail.conf>, or
+F</etc/resolv.conf>. For example the F</etc/passwd> may exist but it
+may not contain the encrypted passwords because the system is using
+some form of enhanced security-- or it may not contain all the
+accounts because the system is using NIS. If code does need to rely
+on such a file, include a description of the file and its format in
+the code's documentation, and make it easy for the user to override
+the default location of the file.
 
 Do not have two files of the same name with different case, like
 F<test.pl> and <Test.pl>, as many platforms have case-insensitive
@@ -274,6 +313,8 @@ The rule of thumb for portable code is: Do it all in portable Perl, or
 use a module (that may internally implement it with platform-specific
 code, but expose a common interface).
 
+The UNIX System V IPC (C<msg*(), sem*(), shm*()>) is not available
+even in all UNIX platforms.
 
 =head2 External Subroutines (XS)
 
@@ -315,12 +356,37 @@ widely different ways. Don't assume the timezone is stored in C<$ENV{TZ}>,
 and even if it is, don't assume that you can control the timezone through
 that variable.
 
-Don't assume that the epoch starts at January 1, 1970, because that is
-OS-specific. Better to store a date in an unambiguous representation.
-A text representation (like C<1 Jan 1970>) can be easily converted into an
-OS-specific value using a module like C<Date::Parse>. An array of values,
-such as those returned by C<localtime>, can be converted to an OS-specific
-representation using C<Time::Local>.
+Don't assume that the epoch starts at 00:00:00, January 1, 1970,
+because that is OS-specific. Better to store a date in an unambiguous
+representation. The ISO 8601 standard defines YYYY-MM-DD as the date
+format. A text representation (like C<1 Jan 1970>) can be easily
+converted into an OS-specific value using a module like
+C<Date::Parse>. An array of values, such as those returned by
+C<localtime>, can be converted to an OS-specific representation using
+C<Time::Local>.
+
+
+=head2 Character sets and character encoding
+
+Assume very little about character sets. Do not assume anything about
+the numerical values (C<ord()>, C<chr()>) of characters. Do not
+assume that the alphabetic characters are encoded contiguously (in
+numerical sense). Do not assume anything about the ordering of the
+characters. The lowercase letters may come before or after the
+uppercase letters, the lowercase and uppercase may be interlaced so
+that both 'a' and 'A' come before the 'b', the accented and other
+international characters may be interlaced so that E<auml> comes
+before the 'b'.
+
+
+=head2 Internationalisation
+
+If you may assume POSIX (a rather large assumption, that: in practise
+that means UNIX) you may read more about the POSIX locale system from
+L<perllocale>. The locale system at least attempts to make things a
+little bit more portable or at least more convenient and
+native-friendly for non-English users. The system affects character
+sets and encoding, and date and time formatting, among other things.
 
 =head2 System Resources
 
@@ -406,15 +472,18 @@ Unix flavors:
 
     uname        $^O        $Config{'archname'}
     -------------------------------------------
-    AIX          aix
-    FreeBSD      freebsd
-    Linux        linux
-    HP-UX        hpux
-    OSF1         dec_osf
+    AIX          aix        aix
+    FreeBSD      freebsd    freebsd-i386
+    Linux        linux      i386-linux
+    HP-UX        hpux       PA-RISC1.1
+    IRIX         irix       irix
+    OSF1         dec_osf    alpha-dec_osf
     SunOS        solaris    sun4-solaris
     SunOS        solaris    i86pc-solaris
-    SunOS4       sunos
+    SunOS4       sunos      sun4-sunos
 
+Note that because the C<$Config{'archname'}> may depend on the hardware
+architecture it may vary quite a lot, much more than the C<$^O>.
 
 =head2 DOS and Derivatives
 
@@ -1266,8 +1335,9 @@ Not implemented. (S<Mac OS>, Win32, VMS, S<RISC OS>)
 =item sysopen FILEHANDLE,FILENAME,MODE,PERMS
 
 The traditional "0", "1", and "2" MODEs are implemented with different
-numeric values on some systems. The flags exported by C<Fcntl> should
-work everywhere though. (S<Mac OS>, OS/390)
+numeric values on some systems. The flags exported by C<Fcntl>
+(O_RDONLY, O_WRONLY, O_RDWR) should work everywhere though. (S<Mac
+OS>, OS/390)
 
 =item system LIST
 
@@ -1315,7 +1385,11 @@ Returns undef where unavailable, as of version 5.005.
 
 Only the modification time is updated. (S<Mac OS>, VMS, S<RISC OS>)
 
-May not behave as expected. (Win32)
+May not behave as expected. Behavior depends on the C runtime
+library's implementation of utime(), and the filesystem being
+used. The FAT filesystem typically does not support an "access
+time" field, and it may limit timestamps to a granularity of
+two seconds. (Win32)
 
 =item wait
 
@@ -1364,15 +1438,15 @@ Dominic Dunlop E<lt>domo@vo.luE<gt>,
 M.J.T. Guy E<lt>mjtg@cus.cam.ac.ukE<gt>,
 Luther Huffman E<lt>lutherh@stratcom.comE<gt>,
 Nick Ing-Simmons E<lt>nick@ni-s.u-net.comE<gt>,
-Andreas J. Koenig E<lt>koenig@kulturbox.deE<gt>,
+Andreas J. KE<ouml>nig E<lt>koenig@kulturbox.deE<gt>,
 Andrew M. Langmead E<lt>aml@world.std.comE<gt>,
 Paul Moore E<lt>Paul.Moore@uk.origin-it.comE<gt>,
 Chris Nandor E<lt>pudge@pobox.comE<gt>,
-Matthias Neercher E<lt>neeri@iis.ee.ethz.chE<gt>,
+Matthias Neeracher E<lt>neeri@iis.ee.ethz.chE<gt>,
 Gary Ng E<lt>71564.1743@CompuServe.COME<gt>,
 Tom Phoenix E<lt>rootbeer@teleport.comE<gt>,
 Peter Prymmer E<lt>pvhp@forte.comE<gt>,
-Hugo van der Sanden E<lt>h.sanden@elsevier.nlE<gt>,
+Hugo van der Sanden E<lt>hv@crypt0.demon.co.ukE<gt>,
 Gurusamy Sarathy E<lt>gsar@umich.eduE<gt>,
 Paul J. Schinder E<lt>schinder@pobox.comE<gt>,
 Dan Sugalski E<lt>sugalskd@ous.eduE<gt>,
@@ -1382,6 +1456,6 @@ This document is maintained by Chris Nandor.
 
 =head1 VERSION
 
-Version 1.33, last modified 06 August 1998.
+Version 1.34, last modified 07 August 1998.
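
A minimal sketch of the byte-order advice in the "Numbers endianness and Width" hunk, assuming 32-bit unsigned values. The helper names encode_u32 and decode_u32 are hypothetical and are not part of perlport.pod; only the pack/unpack formats "N" (32-bit big-endian, "network" order) and "V" (32-bit little-endian) come from Perl itself.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from perlport.pod): move a 32-bit unsigned
# integer across a channel in "network" (big-endian) order.
sub encode_u32 { my ($n) = @_;     return pack("N", $n); }
sub decode_u32 { my ($bytes) = @_; return unpack("N", $bytes); }

my $wire = encode_u32(0x12345678);

# The byte layout is the same on every host: 12 34 56 78.
print unpack("H*", $wire), "\n";          # prints 12345678

# Decoding with the matching format restores the value.
printf "0x%08x\n", decode_u32($wire);     # prints 0x12345678

# Decoding with the opposite byte order ("V" is 32-bit little-endian)
# reproduces the corruption the patch describes.
printf "0x%08x\n", unpack("V", $wire);    # prints 0x78563412
```

Because the byte order is fixed by the format letter rather than by the host CPU, both ends of a connection can use the same two helpers. This does not address the width problem the patch also mentions: values that do not fit in 32 bits would still need a text representation or a serialisation module.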