From: Iain Truskett
Date: Thu, 20 Nov 2003 00:41:33 +0000 (+1100)
Subject: [docpatch] PerlIO layers in perlrun.pod and PerlIO.pm
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=3d897973af512747f1e151f6cf3b7bc2230d4067;p=p5sagit%2Fp5-mst-13.2.git

[docpatch] PerlIO layers in perlrun.pod and PerlIO.pm

Message-ID: <20031119134132.GG21314@gytha.anu.edu.au>

p4raw-id: //depot/perl@21754
---

diff --git a/lib/PerlIO.pm b/lib/PerlIO.pm
index c7b9f13..3b277d9 100644
--- a/lib/PerlIO.pm
+++ b/lib/PerlIO.pm
@@ -61,30 +61,65 @@ The following layers are currently defined:
 
 =over 4
 
-=item unix
+=item :unix
 
-Low level layer which calls C<read>, C<write> and C<lseek> etc.
+Lowest level layer which provides basic PerlIO operations in terms of
+UNIX/POSIX numeric file descriptor calls
+(open(), read(), write(), lseek(), close()).
 
-=item stdio
+=item :stdio
 
 Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
 that as this is "real" stdio it will ignore any layers beneath it and
 got straight to the operating system via the C library as usual.
 
-=item perlio
+=item :perlio
 
-This is a re-implementation of "stdio-like" buffering written as a
-PerlIO "layer". As such it will call whatever layer is below it for
-its operations.
+A from scratch implementation of buffering for PerlIO. Provides fast
+access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
+and in general attempts to minimize data copying.
 
-=item crlf
+C<:perlio> will insert a C<:unix> layer below itself to do low level IO.
 
-A layer which does CRLF to "\n" translation distinguishing "text" and
-"binary" files in the manner of MS-DOS and similar operating systems.
-(It currently does I<not> mimic MS-DOS as far as treating of Control-Z
-as being an end-of-file marker.)
+=item :crlf
 
-=item utf8
+A layer that implements DOS/Windows like CRLF line endings. On read
+converts pairs of CR,LF to a single "\n" newline character. On write
+converts each "\n" to a CR,LF pair. Note that this layer likes to be
+one of its kind: it silently ignores attempts to be pushed into the
+layer stack more than once.
+
+It currently does I<not> mimic MS-DOS as far as treating of Control-Z
+as being an end-of-file marker.
+
+(Gory details follow) To be more exact what happens is this: after
+pushing itself to the stack, the C<:crlf> layer checks all the layers
+below itself to find the first layer that is capable of being a CRLF
+layer but is not yet enabled to be a CRLF layer. If it finds such a
+layer, it enables the CRLFness of that other deeper layer, and then
+pops itself off the stack. If not, fine, use the one we just pushed.
+
+The end result is that a C<:crlf> means "please enable the first CRLF
+layer you can find, and if you can't find one, here would be a good
+spot to place a new one."
+
+Based on the C<:perlio> layer.
+
+=item :mmap
+
+A layer which implements "reading" of files by using C<mmap()> to
+make (whole) file appear in the process's address space, and then
+using that as PerlIO's "buffer". This I<may> be faster in certain
+circumstances for large files, and may result in less physical memory
+use when multiple processes are reading the same file.
+
+Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
+layer. Writes also behave like C<:perlio> layer as C<mmap()> for write
+needs extra house-keeping (to extend the file) which negates any advantage.
+
+The C<:mmap> layer will not exist if platform does not support C<mmap()>.
+
+=item :utf8
 
 Declares that the stream accepts perl's internal encoding of
 characters. (Which really is UTF-8 on ASCII machines, but is
@@ -104,7 +139,7 @@ and then read it back in.
     $in = <F>;
     close(F);
 
-=item bytes
+=item :bytes
 
 This is the inverse of C<:utf8> layer. It turns off the flag
 on the layer below so that data read from it is considered to
@@ -112,14 +147,20 @@ be "octets" i.e. characters in range 0..255 only. Likewise
 on output perl will warn if a "wide" character is written
 to a such a stream.
 
-=item raw
+=item :raw
 
 The C<:raw> layer is I<defined> as being identical to calling
 C<binmode($fh)> - the stream is made suitable for passing binary
 data i.e. each byte is passed as-is. The stream will still be
-buffered. Unlike in the earlier versions of Perl C<:raw> is I<not>
-just the inverse of C<:crlf> - other layers which would affect the
-binary nature of the stream are also removed or disabled.
+buffered.
+
+In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
+referred to as a "discipline") is documented as the inverse of the
+C<:crlf> layer. That is no longer the case - other layers which would
+alter binary nature of the stream are also disabled. If you want UNIX
+line endings on a platform that normally does CRLF translation, but still
+want UTF-8 or encoding defaults the appropriate thing to do is to add
+C<:perlio> to PERLIO environment variable.
 
 The implementation of C<:raw> is as a pseudo-layer which when "pushed"
 pops itself and then any layers which do not declare themselves as suitable
@@ -135,7 +176,7 @@ a known base on which to build e.g.
 
 will construct a "binary" stream, but then enable UTF-8 translation.
 
-=item pop
+=item :pop
 
 A pseudo layer that removes the top-most layer. Gives perl code
 a way to manipulate the layer stack. Should be considered
@@ -151,6 +192,12 @@ An example of a possible use might be:
 
 A more elegant (and safer) interface is needed.
 
+=item :win32
+
+On Win32 platforms this I<experimental> layer uses native "handle" IO
+rather than unix-like numeric file descriptor layer. Known to be
+buggy as of perl 5.8.2.
+
 =back
 
 =head2 Custom Layers
diff --git a/pod/perlrun.pod b/pod/perlrun.pod
index 4c74581..fd9d9a2 100644
--- a/pod/perlrun.pod
+++ b/pod/perlrun.pod
@@ -938,7 +938,7 @@ IO in order to load them!. See L<"open pragma"|open> for how to add external
 encodings as defaults.
 
 The layers that it makes sense to include in the PERLIO environment
-variable are summarised below. For more details see L<PerlIO>.
+variable are briefly summarised below. For more details see L<PerlIO>.
 
 =over 8
 
@@ -950,51 +950,27 @@ You perhaps were thinking of C<:crlf:bytes> or C<:perlio:bytes>.
 
 =item :crlf
 
-A layer that implements DOS/Windows like CRLF line endings. On read
-converts pairs of CR,LF to a single "\n" newline character. On write
-converts each "\n" to a CR,LF pair. Note that this layer likes to be
-one of its kind: it silently ignores attempts to be pushed into the
-layer stack more than once.
-
-(Gory details follow) To be more exact what happens is this: after
-pushing itself to the stack, the C<:crlf> layer checks all the layers
-below itself to find the first layer that is capable of being a CRLF
-layer but is not yet enabled to be a CRLF layer. If it finds such a
-layer, it enables the CRLFness of that other deeper layer, and then
-pops itself off the stack. If not, fine, use the one we just pushed.
-
-The end result is that a C<:crlf> means "please enable the first CRLF
-layer you can find, and if you can't find one, here would be a good
-spot to place a new one."
-
-Based on the C<:perlio> layer.
+A layer which does CRLF to "\n" translation distinguishing "text" and
+"binary" files in the manner of MS-DOS and similar operating systems.
+(It currently does I<not> mimic MS-DOS as far as treating of Control-Z
+as being an end-of-file marker.)
 
 =item :mmap
 
 A layer which implements "reading" of files by using C<mmap()> to
 make (whole) file appear in the process's address space, and then
-using that as PerlIO's "buffer". This I<may> be faster in certain
-circumstances for large files, and may result in less physical memory
-use when multiple processes are reading the same file.
-
-Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
-layer. Writes also behave like C<:perlio> layer as C<mmap()> for write
-needs extra house-keeping (to extend the file) which negates any advantage.
-
-The C<:mmap> layer will not exist if platform does not support C<mmap()>.
+using that as PerlIO's "buffer".
 
 =item :perlio
 
-A from scratch implementation of buffering for PerlIO. Provides fast
-access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt>
-and in general attempts to minimize data copying.
-
-C<:perlio> will insert a C<:unix> layer below itself to do low level IO.
+This is a re-implementation of "stdio-like" buffering written as a
+PerlIO "layer". As such it will call whatever layer is below it for
+its operations (typically C<:unix>).
 
 =item :pop
 
 An experimental pseudolayer that removes the topmost layer.
-Use with the same care as is reserved for nitroglyserin.
+Use with the same care as is reserved for nitroglycerin.
 
 =item :raw
 
@@ -1003,16 +979,9 @@ layer is equivalent to calling C<binmode($fh)>. It makes the stream pass
 each byte as-is without any translation. In particular CRLF translation,
 and/or :utf8 intuited from locale are disabled.
 
-Arranges for all accesses go straight to the lowest buffered layer provided
-by the configration. That is it strips off any layers above that layer.
-
-In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
-referred to as a "discipline") is documented as the inverse of the
-C<:crlf> layer. That is no longer the case - other layers which would
-alter binary nature of the stream are also disabled. If you want UNIX
-line endings on a platform that normally does CRLF translation, but still
-want UTF-8 or encoding defaults the appropriate thing to do is to add
-C<:perlio> to PERLIO environment variable.
+Unlike in the earlier versions of Perl C<:raw> is I<not>
+just the inverse of C<:crlf> - other layers which would affect the
+binary nature of the stream are also removed or disabled.
 
 =item :stdio
 
@@ -1024,19 +993,15 @@ to do that.
 
 =item :unix
 
-Lowest level layer which provides basic PerlIO operations in terms of
-UNIX/POSIX numeric file descriptor calls
-C<open(), read(), write(), lseek(), close()>
+Low level layer which calls C<read>, C<write> and C<lseek> etc.
 
 =item :utf8
 
 A pseudolayer that turns on a flag on the layer below to tell perl
-that data sent to the stream should be converted to perl internal
-"utf8" form and that data from the stream should be considered as so
-encoded. On ASCII based platforms the encoding is UTF-8 and on EBCDIC
-platforms UTF-EBCDIC. May be useful in PERLIO environment variable to
-make UTF-8 the default. (To turn off that behaviour use C<:bytes>
-layer.)
+that output should be in utf8 and that input should be regarded as
+already in utf8 form. May be useful in PERLIO environment
+variable to make UTF-8 the default. (To turn off that behaviour
+use C<:bytes> layer.)
 
 =item :win32
 
@@ -1062,8 +1027,8 @@ buffering.
 
 This release uses C<unix> as the bottom layer on Win32 and so still uses C
 compiler's numeric file descriptor routines. There is an experimental native
-C<win32> layer which is expected to be enhanced and should eventually replace
-the C<unix> layer.
+C<win32> layer which is expected to be enhanced and should eventually be
+the default under Win32.
 
 =item PERLIO_DEBUG