by Elizabeth Mattijsen.
p4raw-id: //depot/perl@17410
lib/NEXT/t/actuns.t NEXT
lib/NEXT/t/next.t NEXT
lib/NEXT/t/unseen.t NEXT
-lib/open.pm Pragma to specify default I/O disciplines
+lib/open.pm Pragma to specify default I/O layers
lib/open.t See if the open pragma works
lib/open2.pl Open a two-ended pipe (uses IPC::Open2)
lib/open3.pl Open a three-ended pipe (uses IPC::Open3)
default).
The C<open> pragma serves as one of the interfaces to declare default
-"layers" for all I/O.
-
-The C<open> pragma is used to declare one or more default layers for
-I/O operations. Any open(), readpipe() (aka qx//) and similar
-operators found within the lexical scope of this pragma will use the
-declared defaults.
+"layers" (also known as "disciplines") for all I/O. Any open(),
+readpipe() (aka qx//) and similar operators found within the lexical
+scope of this pragma will use the declared defaults.
With the C<IN> subpragma you can declare the default layers
of input streams, and with the C<OUT> subpragma you can declare
=item *
Previous versions of perl and some readings of some sections of Camel III
-implied that C<:raw> "discipline" was the inverse of C<:crlf>.
+implied that the C<:raw> "discipline" was the inverse of C<:crlf>.
Turning off "crlfness" is no longer enough to make a stream truly
-binary. So the PerlIO C<:raw> discipline is now formally defined as being
+binary. So the PerlIO C<:raw> layer (or "discipline", to use the
+Camel book's older terminology) is now formally defined as being
equivalent to binmode(FH) - which is in turn defined as doing whatever
is necessary to pass each byte as-is without any translation.
In particular binmode(FH) - and hence C<:raw> - will now turn off both CRLF
-and UTF-8 translation and remove other "layers" (e.g. :encoding()) which
+and UTF-8 translation and remove other layers (e.g. :encoding()) which
would modify the byte stream.
=item *
If your environment variables (LC_ALL, LC_CTYPE, LANG, LANGUAGE) look
like you want to use UTF-8 (any of the variables match C</utf-?8/i>),
-your STDIN, STDOUT, STDERR handles and the default open discipline
+your STDIN, STDOUT, STDERR handles and the default open layer
(see L<open>) are marked as UTF-8. (This feature, like other new
features that combine Unicode and I/O, works only if you are using
PerlIO, but that is the default.)
=item *
-C<open> is a new pragma for setting the default I/O disciplines
+C<open> is a new pragma for setting the default I/O layers
for open().
=item *
packed address of the appropriate type for the socket. See the examples in
L<perlipc/"Sockets: Client/Server Communication">.
-=item binmode FILEHANDLE, DISCIPLINE
+=item binmode FILEHANDLE, LAYER
=item binmode FILEHANDLE
taken as the name of the filehandle. Returns true on success,
C<undef> on failure.
-If DISCIPLINE is omitted or specified as C<:raw> the filehandle is made
+If LAYER is omitted or specified as C<:raw> the filehandle is made
suitable for passing binary data. This includes turning off possible CRLF
translation and marking it as bytes (as opposed to Unicode characters).
Note that, despite what may be implied in I<"Programming Perl">
(the Camel) or elsewhere, C<:raw> is I<not> simply the inverse of C<:crlf>
-- other disciplines which would affect binary nature of the stream are
+-- other layers which would affect the binary nature of the stream are
I<also> disabled. See L<PerlIO>, L<perlrun> and the discussion about the
PERLIO environment variable.
+I<The LAYER parameter of the binmode() function is described as "DISCIPLINE"
+in "Programming Perl, 3rd Edition". However, since the publication of this
+book, known to many as "Camel III", the consensus on the naming of this
+functionality has moved from "discipline" to "layer". All documentation
+of this version of Perl therefore refers to "layers" rather than to
+"disciplines". Now back to the regularly scheduled documentation...>
+
On some systems (in general, DOS and Windows-based systems) binmode()
is necessary when you're not working with a text file. For the sake
of portability it is a good idea to always use it when appropriate,
In other words: regardless of platform, use binmode() on binary files
(like for example images).
-If DISCIPLINE is present it is a single string, but may contain
+If LAYER is present it is a single string, but may contain
multiple directives. The directives alter the behaviour of the
-file handle. When DISCIPLINE is present using binmode on text
+file handle. When LAYER is present, using binmode on a text
file makes sense.
To mark FILEHANDLE as UTF-8, use C<:utf8>.
The C<:bytes>, C<:crlf>, and C<:utf8>, and any other directives of the
-form C<:...>, are called I/O I<disciplines>. The normal implementation
-of disciplines in Perl 5.8 and later is in terms of I<layers>. See
-L<PerlIO>. (There is typically a one-to-one correspondence between
-layers and disiplines.) The C<open> pragma can be used to establish
-default I/O disciplines. See L<open>.
+form C<:...>, are called I/O I<layers>. The C<open> pragma can be used to
+establish default I/O layers. See L<open>.
In general, binmode() should be called after open() but before any I/O
is done on the filehandle. Calling binmode() will normally flush any
pending buffered output data (and perhaps pending input data) on the
-handle. An exception to this is the C<:encoding> discipline that
+handle. An exception to this is the C<:encoding> layer that
changes the default character encoding of the handle, see L<open>.
-The C<:encoding> discipline sometimes needs to be called in
+The C<:encoding> layer sometimes needs to be called in
mid-stream, and it doesn't flush the stream.
The operating system, device drivers, C libraries, and Perl run-time
In the 2-arguments (and 1-argument) form opening C<'-'> opens STDIN
and opening C<< '>-' >> opens STDOUT.
-You may use the three-argument form of open to specify
-I<I/O disciplines> or IO "layers" to be applied to the handle that affect how the input and output
-are processed: (see L<open> and L<PerlIO> for more details).
-For example
+You may use the three-argument form of open to specify IO "layers"
+(sometimes also referred to as "disciplines") to be applied to the handle
+that affect how the input and output are processed (see L<open> and
+L<PerlIO> for more details). For example
open(FH, "<:utf8", "file")
will open the UTF-8 encoded file containing Unicode characters,
-see L<perluniintro>. (Note that if disciplines are specified in the
-three-arg form then default disciplines set by the C<open> pragma are
+see L<perluniintro>. (Note that if layers are specified in the
+three-arg form then default layers set by the C<open> pragma are
ignored.)
Open returns nonzero upon success, the undefined value otherwise. If
Note the I<characters>: depending on the status of the filehandle,
either (8-bit) bytes or characters are read. By default all
filehandles operate on bytes, but for example if the filehandle has
-been opened with the C<:utf8> discipline (see L</open>, and the C<open>
+been opened with the C<:utf8> I/O layer (see L</open>, and the C<open>
pragma, L<open>), the I/O will operate on characters, not bytes.
=item readdir DIRHANDLE
Note the I<characters>: depending on the status of the socket, either
(8-bit) bytes or characters are received. By default all sockets
operate on bytes, but for example if the socket has been changed using
-binmode() to operate with the C<:utf8> discipline (see the C<open>
+binmode() to operate with the C<:utf8> I/O layer (see the C<open>
pragma, L<open>), the I/O will operate on characters, not bytes.
=item redo LABEL
Note the I<in bytes>: even if the filehandle has been set to
operate on characters (for example by using the C<:utf8> open
-discipline), tell() will return byte offsets, not character offsets
+layer), tell() will return byte offsets, not character offsets
(because implementing that would render seek() and tell() rather slow).
If you want to position file for C<sysread> or C<syswrite>, don't use
Note the I<characters>: depending on the status of the socket, either
(8-bit) bytes or characters are sent. By default all sockets operate
on bytes, but for example if the socket has been changed using
-binmode() to operate with the C<:utf8> discipline (see L</open>, or
+binmode() to operate with the C<:utf8> I/O layer (see L</open>, or
the C<open> pragma, L<open>), the I/O will operate on characters, not
bytes.
Note the I<characters>: depending on the status of the filehandle,
either (8-bit) bytes or characters are read. By default all
filehandles operate on bytes, but for example if the filehandle has
-been opened with the C<:utf8> discipline (see L</open>, and the C<open>
+been opened with the C<:utf8> I/O layer (see L</open>, and the C<open>
pragma, L<open>), the I/O will operate on characters, not bytes.
An OFFSET may be specified to place the read data at some place in the
negative).
Note the I<in bytes>: even if the filehandle has been set to operate
-on characters (for example by using the C<:utf8> discipline), tell()
+on characters (for example by using the C<:utf8> I/O layer), tell()
will return byte offsets, not character offsets (because implementing
that would render sysseek() very slow).
Note the I<characters>: depending on the status of the filehandle,
either (8-bit) bytes or characters are written. By default all
filehandles operate on bytes, but for example if the filehandle has
-been opened with the C<:utf8> discipline (see L</open>, and the open
+been opened with the C<:utf8> I/O layer (see L</open>, and the open
pragma, L<open>), the I/O will operate on characters, not bytes.
=item tell FILEHANDLE
Note the I<in bytes>: even if the filehandle has been set to
operate on characters (for example by using the C<:utf8> open
-discipline), tell() will return byte offsets, not character offsets
+layer), tell() will return byte offsets, not character offsets
(because implementing that would render seek() and tell() rather slow).
The return value of tell() for the standard streams like the STDIN
=for comment
If/WHEN some brave soul makes these heuristics into a generic
- text-file class (or file discipline?), we can presumably delete
+ text-file class (or PerlIO layer?), we can presumably delete
mention of these icky details from this file, and can instead
- tell people to just use appropriate class/discipline.
+ tell people to just use appropriate class/layer.
Auto-recognition of newline sequences would be another desirable
- feature of such a class/discipline.
+ feature of such a class/layer.
HINT HINT HINT.
=for comment
Arranges for all accesses to go straight to the lowest buffered layer provided
by the configuration. That is, it strips off any layers above that layer.
-In Perl 5.6 and some books the C<:raw> layer (also called a discipline)
-is documented as the inverse of the C<:crlf> layer. That is no longer
-the case - other layers which would alter binary nature of the
-stream are also disabled. If you want UNIX line endings on a platform
-that normally does CRLF translation, but still want UTF-8 or encoding
-defaults the appropriate thing to do is to add C<:perlio> to PERLIO
-environment variable.
+In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
+referred to as a "discipline") is documented as the inverse of the
+C<:crlf> layer. That is no longer the case - other layers which would
+alter the binary nature of the stream are also disabled. If you want UNIX
+line endings on a platform that normally does CRLF translation, but still
+want UTF-8 or encoding defaults, the appropriate thing to do is to add
+C<:perlio> to the PERLIO environment variable.
=item :stdio
=over 4
-=item Input and Output Disciplines
+=item Input and Output Layers
Perl knows when a filehandle uses Perl's internal Unicode encodings
(UTF-8, or UTF-EBCDIC if in EBCDIC) if the filehandle is opened with
for Unicode data and byte semantics for non-Unicode data.
The decision to use character semantics is made transparently. If
input data comes from a Unicode source--for example, if a character
-encoding discipline is added to a filehandle or a literal Unicode
+encoding layer is added to a filehandle or a literal Unicode
string constant appears in a program--character semantics apply.
Otherwise, byte semantics are in effect. The C<bytes> pragma should
be used to force byte semantics on Unicode data.
A user of Perl does not normally need to know nor care how Perl
happens to encode its internal strings, but it becomes relevant when
-outputting Unicode strings to a stream without a discipline--one with
+outputting Unicode strings to a stream without a PerlIO layer -- one with
the "default" encoding. In such a case, the raw bytes used internally
(the native character set or UTF-8, as appropriate for each string)
will be used, and a "Wide character" warning will be issued if those
Wide character in print at ...
-To output UTF-8, use the C<:utf8> output discipline. Prepending
+To output UTF-8, use the C<:utf8> output layer. Prepending
binmode(STDOUT, ":utf8");
binmode(STDOUT, ":encoding(shift_jis)");
The matching of encoding names is loose: case does not matter, and
-many encodings have several aliases. Note that C<:utf8> discipline
+many encodings have several aliases. Note that the C<:utf8> layer
must always be specified exactly like that; it is I<not> subject to
the loose matching of encoding names.
Reading in a file that you know happens to be encoded in one of the
Unicode or legacy encodings does not magically turn the data into
Unicode in Perl's eyes. To do that, specify the appropriate
-discipline when opening files
+layer when opening files
open(my $fh,'<:utf8', 'anything');
my $line_of_unicode = <$fh>;
open(my $fh,'<:encoding(Big5)', 'anything');
my $line_of_unicode = <$fh>;
-The I/O disciplines can also be specified more flexibly with
+The I/O layers can also be specified more flexibly with
the C<open> pragma. See L<open>, or look at the following example.
- use open ':utf8'; # input and output default discipline will be UTF-8
+ use open ':utf8'; # input and output default layer will be UTF-8
open X, ">file";
print X chr(0x100), "\n";
close X;
printf "%#x\n", ord(<Y>); # this should print 0x100
close Y;
-With the C<open> pragma you can use the C<:locale> discipline
+With the C<open> pragma you can use the C<:locale> layer
$ENV{LC_ALL} = $ENV{LANG} = 'ru_RU.KOI8-R';
# the :locale will probe the locale environment variables like LC_ALL
printf "%#x\n", ord(<I>), "\n"; # this should print 0xc1
close I;
-or you can also use the C<':encoding(...)'> discipline
+or you can also use the C<':encoding(...)'> layer
open(my $epic,'<:encoding(iso-8859-7)','iliad.greek');
my $line_of_unicode = <$epic>;
stream. The result is always Unicode.
The L<open> pragma affects all the C<open()> calls after the pragma by
-setting default disciplines. If you want to affect only certain
-streams, use explicit disciplines directly in the C<open()> call.
+setting default layers. If you want to affect only certain
+streams, use explicit layers directly in the C<open()> call.
You can switch encodings on an already opened stream by using
C<binmode()>; see L<perlfunc/binmode>.
C<:utf8> and C<:encoding(...)> methods do work with all of C<open()>,
C<binmode()>, and the C<open> pragma.
-Similarly, you may use these I/O disciplines on output streams to
+Similarly, you may use these I/O layers on output streams to
automatically convert Unicode to the specified encoding when it is
written to the stream. For example, the following snippet copies the
contents of the file "text.jis" (encoded as ISO-2022-JP, aka JIS) to
and C<sysseek()>.
Notice that because of the default behaviour of not doing any
-conversion upon input if there is no default discipline,
+conversion upon input if there is no default layer,
it is easy to mistakenly write code that keeps on expanding a file
by repeatedly encoding the data:
Normal users of Perl should never care how Perl encodes any particular
Unicode string (because the normal ways to get at the contents of a
string with Unicode--via input and output--should always be via
-explicitly-defined I/O disciplines). But if you must, there are two
+explicitly-defined I/O layers). But if you must, there are two
ways of looking behind the scenes.
One way of peeking inside the internal encoding of Unicode characters
=item ${^OPEN}
An internal variable used by PerlIO. A string in two parts, separated
-by a C<\0> byte, the first part is the input disciplines, the second
-part is the output disciplines.
+by a C<\0> byte, the first part describes the input layers, the second
+part describes the output layers.
=item $PERLDB