From: Nick Ing-Simmons
Date: Tue, 18 Jun 2002 09:14:25 +0000 (+0000)
Subject: More PerlIO doc tweaks - trying to make them document what
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=01e6739c7b8104181279fe94fb5488ec35b2ba1c;p=p5sagit%2Fp5-mst-13.2.git

More PerlIO doc tweaks - trying to make them document what
happens in current implementation while leaving way open to
"fixing" things.

p4raw-id: //depot/perlio@17282
---

diff --git a/lib/PerlIO.pm b/lib/PerlIO.pm
index 9a4da3a..a311418 100644
--- a/lib/PerlIO.pm
+++ b/lib/PerlIO.pm
@@ -33,7 +33,7 @@ PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space
 
 =head1 SYNOPSIS
 
- open($fh,">:crlf", "my.txt"); # portably open a text file for writing
+ open($fh,"<:crlf", "my.txt"); # portably open a text file for reading
  open($fh,"<","his.jpg"); # portably open a binary file for reading
  binmode($fh);
 
@@ -110,27 +110,60 @@ to a such a stream.
 
 =item raw
 
-B layer is deprecated.>
+B layer is deprecated:>
 
-A pseudo-layer which performs two functions (which is messy, but
-necessary to maintain compatibility with non-PerlIO builds of Perl
-and their way things have been documented elsewhere).
+C<:raw> has been documented as both the opposite of C<:crlf> and
+as a way to make a stream "binary". With the new IO system those
+two are no longer equivalent. The name has also been read as meaning
+an unbuffered stream "as close to the operating system as possible".
+See below for better ways to do things.
+
+The C<:raw> layer exists to maintain compatibility with non-PerlIO builds
+of Perl and to approximate the way it has been documented and how
+it was "faked" in perl5.6. It is a pseudo-layer which performs two
+functions (which is messy).
 
 Firstly it forces the file handle to be considered binary at that
-point in the layer stack, i.e. it turns off any CRLF translation.
+point in the layer stack, i.e it turns off any CRLF translation.
 Secondly in prevents the IO system seaching back before it in the
-layer specification. Thus:
+layer specification. This second effect is intended to disable other
+non-binary features of the stream.
+
+Thus:
 
  open($fh,":raw:perlio",...)
 
-Forces the use of C layer even if the platform default, or
+forces the use of C layer even if the platform default, or
 C default is something else (such as ":encoding(iso-8859-7)") (the
 C<:encoding> requires C) which would interfere with
 binary nature of the stream.
 
 =back
 
+=head2 Alternatives to raw
+
+To get a binary stream the prefered method is to use:
+
+ binmode($fh);
+
+which is neatly backward compatible with how such things have
+had to be coded on some platforms for years.
+The current implementation comprehends the effects of C<:utf8> and
+C<:crlf> layers and will be extended to comprehend similar translations
+if they are defined in future releases of perl.
+
+To get an un-buffered stream specify an unbuffered layer (e.g. C<:unix>)
+the open call:
+
+ open($fh,"<:unix",$path)
+
+To get a non-CRLF translated stream on any platform start from
+the un-buffered stream and add the appropriate layer:
+
+ open($fh,"<:unix:perlio",$path)
+
+
 =head2 Defaults and how to override them
 
 If the platform is MS-DOS like and normally does CRLF to "\n"
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 858f23f..b4c5de0 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -455,36 +455,44 @@ binary and text files. If FILEHANDLE is an expression, the value
 is taken as the name of the filehandle. Returns true on success,
 C on failure.
 
-DISCIPLINE can be either of C<:bytes> for "binary" mode or C<:crlf>
-for "text" mode. If the DISCIPLINE is omitted, it defaults to
-C<:bytes>. To mark FILEHANDLE as UTF-8, use C<:utf8>. For backward
-compatibility C also implicitly marks the
-filehandle as bytes.
+If DISCIPLINE is ommited the filehandle is made suitable for
+passing binary data. This includes turning off CRLF translation
+and marking it as bytes.
+
+On some systems (in general, DOS and Windows-based systems) binmode()
+is necessary when you're not working with a text file. For the sake
+of portability it is a good idea to always use it when appropriate,
+and to never use it when it isn't appropriate.
+
+In other words: regardless of platform, use binmode() on binary files
+(like for example images).
+
+If DISCIPLINE is present it is a single string, but may contain
+multiple directives. The directives alter the behaviour of the
+file handle. When DISCIPLINE is present using binmode on text
+file makes sense.
+
+To mark FILEHANDLE as UTF-8, use C<:utf8>.
 
 The C<:bytes>, C<:crlf>, and C<:utf8>, and any other directives of the
-form C<:...>, are called I/O I. The C pragma can
-be used to establish default I/O disciplines. See L.
+form C<:...>, are called I/O I. The normal implementation
+of disciplines in perl5.8 and later is in terms of I. See
+L. (There is typically a one-to-one correspondence between
+layers and disiplines.) The C pragma can be used to establish
+default I/O disciplines. See L.
 
 The C<:raw> discipline is deprecated. (As opposed to what Camel III
-said, it is not the inverse of C<:crlf>.) See L and the
-discussion about the PERLIO environment variable.
+said, it is not the inverse of C<:crlf>.) See L, L
+and the discussion about the PERLIO environment variable.
 
 In general, binmode() should be called after open() but before any I/O
-is done on the filehandle. Calling binmode() will flush any possibly
-pending buffered input or output data on the handle. The only
-exception to this is the C<:encoding> discipline that changes
-the default character encoding of the handle, see L.
+is done on the filehandle. Calling binmode() will normally flush any
+pending buffered output data (and perhaps pending input data) on the
+handle. An exception to this is the C<:encoding> discipline that
+changes the default character encoding of the handle, see L.
 The C<:encoding> discipline sometimes needs to be called in
 mid-stream, and it doesn't flush the stream.
 
-On some systems (in general, DOS and Windows-based systems) binmode()
-is necessary when you're not working with a text file. For the sake
-of portability it is a good idea to always use it when appropriate,
-and to never use it when it isn't appropriate.
-
-In other words: regardless of platform, use binmode() on binary files
-(like for example images), and do not use binmode() on text files.
-
 The operating system, device drivers, C libraries, and Perl run-time
 system all work together to let the programmer treat a single
 character (C<\n>) as the line terminator, irrespective of the external
@@ -496,14 +504,13 @@ one character.
 Mac OS, all variants of Unix, and Stream_LF files on VMS use a single
 character to end each line in the external representation of text (even
 though that single character is CARRIAGE RETURN on Mac OS and LINE FEED
-on Unix and most VMS files). Consequently binmode() has no effect on
-these operating systems. In other systems like OS/2, DOS and the various
-flavors of MS-Windows your program sees a C<\n> as a simple C<\cJ>, but
-what's stored in text files are the two characters C<\cM\cJ>. That means
-that, if you don't use binmode() on these systems, C<\cM\cJ> sequences on
-disk will be converted to C<\n> on input, and any C<\n> in your program
-will be converted back to C<\cM\cJ> on output. This is what you want for
-text files, but it can be disastrous for binary files.
+on Unix and most VMS files). In other systems like OS/2, DOS and the
+various flavors of MS-Windows your program sees a C<\n> as a simple C<\cJ>,
+but what's stored in text files are the two characters C<\cM\cJ>. That
+means that, if you don't use binmode() on these systems, C<\cM\cJ>
+sequences on disk will be converted to C<\n> on input, and any C<\n> in
+your program will be converted back to C<\cM\cJ> on output. This is what
+you want for text files, but it can be disastrous for binary files.
 
 Another consequence of using binmode() (on some systems) is that
 special end-of-file markers will be seen as part of the data stream.
@@ -2476,7 +2483,7 @@ and the month of the year, may not necessarily be three characters wide.
 
 =item lock THING
 
-This function places an advisory lock on a shared variable, or referenced
+This function places an advisory lock on a shared variable, or referenced
 object contained in I until the lock goes out of scope.
 
 lock() is a "weak keyword" : this means that if you've defined a function
@@ -2788,13 +2795,16 @@ In the 2-arguments (and 1-argument) form opening C<'-'> opens STDIN
 and opening C<< '>-' >> opens STDOUT.
 
 You may use the three-argument form of open to specify
-I that affect how the input and output
-are processed: see L and L. For example
+I or IO "layers" to be applied to the handle that affect how the input and output
+are processed: (see L and L for more details).
+For example
 
  open(FH, "<:utf8", "file")
 
 will open the UTF-8 encoded file containing Unicode characters,
-see L.
+see L. (Note that if disciplines are specified in the
+three-arg form then default disciplines set by the C pragma are
+ignored.)
 
 Open returns nonzero upon success, the undefined value otherwise. If
 the C involved a pipe, the return value happens to be the pid of
@@ -2808,11 +2818,6 @@ like Unix, Mac OS, and Plan 9, which delimit lines with a single
 character, and which encode that character in C as C<"\n">, do not
 need C. The rest need it.
 
-In the three argument form MODE may also contain a list of IO "layers"
-(see L and L for more details) to be applied to the
-handle. This can be used to achieve the effect of C as well
-as more complex behaviours.
-
 When opening a file, it's usually a bad idea to continue normal execution
 if the request failed, so C is frequently used in connection with
 C. Even if C won't do what you want (say, in a CGI script,