From: Jarkko Hietaniemi
Date: Thu, 15 Nov 2001 14:35:55 +0000 (+0000)
Subject: Fix for "perlio bug in koi8-r encoding". The problem
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=ed53a2bb7f04bc2b3b8eb8450d6e00842e9d7680;p=p5sagit%2Fp5-mst-13.2.git

Fix for "perlio bug in koi8-r encoding". The problem
seemed to be that binmode() always flushed the handle,
which is not so good when switching encodings. Fixed,
added Matt Sergeant's testcase, documented in perlfunc/binmode,
also added a pointer about disciplines to perlfunc/open,
and in general cleaned up and reformatted the open entry.

p4raw-id: //depot/perl@13019
---

diff --git a/ext/PerlIO/t/encoding.t b/ext/PerlIO/t/encoding.t
index 590fc00..e30e270 100644
--- a/ext/PerlIO/t/encoding.t
+++ b/ext/PerlIO/t/encoding.t
@@ -9,11 +9,12 @@ BEGIN {
     }
 }
 
-print "1..10\n";
+print "1..11\n";
 
 my $grk = "grk$$";
 my $utf = "utf$$";
 my $fail1 = "fail$$";
+my $russki = "koi8r$$";
 
 if (open(GRK, ">$grk")) {
     # alpha beta gamma in ISO 8859-7
@@ -73,6 +74,29 @@ if (!defined $warn) {
     print "not ok 10 # warning is '$warn'";
 }
 
+if (open(RUSSKI, ">$russki")) {
+    print RUSSKI "\x3c\x3f\x78";
+    close RUSSKI;
+    open(RUSSKI, "$russki");
+    binmode(RUSSKI, ":raw");
+    my $buf1;
+    read(RUSSKI, $buf1, 1);
+    eof(RUSSKI);
+    binmode(RUSSKI, ":encoding(koi8-r)");
+    my $buf2;
+    read(RUSSKI, $buf2, 1);
+    my $offset = tell(RUSSKI);
+    if (ord($buf1) == 0x3c && ord($buf2) == 0x3f && $offset == 2) {
+        print "ok 11\n";
+    } else {
+        printf "not ok 11 # %#x %#x %d\n",
+               ord($buf1), ord($buf2), $offset;
+    }
+    close(RUSSKI);
+} else {
+    print "not ok 11 # open failed: $!\n";
+}
+
 END {
-    unlink($grk, $utf, $fail1);
+    unlink($grk, $utf, $fail1, $russki);
 }
diff --git a/perlio.c b/perlio.c
index 8e8b859..88b3758 100644
--- a/perlio.c
+++ b/perlio.c
@@ -1072,16 +1072,19 @@ PerlIO_binmode(pTHX_ PerlIO *f, int iotype, int mode, const char *names)
     PerlIO_debug("PerlIO_binmode f=%p %s %c %x %s\n",
                  f, PerlIOBase(f)->tab->name, iotype, mode,
                  (names) ? names : "(Null)");
-    PerlIO_flush(f);
-    if (!names && (O_TEXT != O_BINARY && (mode & O_BINARY))) {
-        PerlIO *top = f;
-        while (*top) {
-            if (PerlIOBase(top)->tab == &PerlIO_crlf) {
-                PerlIOBase(top)->flags &= ~PERLIO_F_CRLF;
-                break;
+    /* Can't flush if switching encodings. */
+    if (!(names && memEQ(names, ":encoding(", 10))) {
+        PerlIO_flush(f);
+        if (!names && (O_TEXT != O_BINARY && (mode & O_BINARY))) {
+            PerlIO *top = f;
+            while (*top) {
+                if (PerlIOBase(top)->tab == &PerlIO_crlf) {
+                    PerlIOBase(top)->flags &= ~PERLIO_F_CRLF;
+                    break;
+                }
+                top = PerlIONext(top);
+                PerlIO_flush(top);
             }
-            top = PerlIONext(top);
-            PerlIO_flush(top);
         }
     }
     return PerlIO_apply_layers(aTHX_ f, NULL, names) == 0 ? TRUE : FALSE;
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 986c66e..5ad0afb 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -448,13 +448,22 @@ L.
 Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on
 systems where the run-time libraries distinguish between binary and text
 files. If FILEHANDLE is an expression, the value is taken as the
-name of the filehandle. DISCIPLINE can be either of C<":raw"> for
-binary mode or C<":crlf"> for "text" mode. If the DISCIPLINE is
-omitted, it defaults to C<":raw">. Returns true on success, C on
-failure.
+name of the filehandle. DISCIPLINE can be either of C<:raw> for
+binary mode or C<:crlf> for "text" mode. If the DISCIPLINE is
+omitted, it defaults to C<:raw>. Returns true on success, C on
+failure. The C<:raw> and C<:crlf>, and any other directives of the
+form C<:...>, are called I/O I.
 
-binmode() should be called after open() but before any I/O is done on
-the filehandle.
+The C pragma can be used to establish default I/O disciplines.
+See L.
+
+In general, binmode() should be called after open() but before any I/O
+is done on the filehandle. Calling binmode() will flush any possibly
+pending buffered input or output data on the handle. The only
+exception to this is the C<:encoding> discipline that changes
+the default character encoding of the handle, see L.
+The C<:encoding> discipline sometimes needs to be called in
+mid-stream, and it doesn't flush the stream.
 
 On some systems binmode() is necessary when you're not working with a
 text file. For the sake of portability it is a good idea to always use
@@ -463,9 +472,6 @@ it when appropriate, and to never use it when it isn't appropriate.
 In other words: Regardless of platform, use binmode() on binary
 files, and do not use binmode() on text files.
 
-The C pragma can be used to establish default disciplines.
-See L.
-
 The operating system, device drivers, C libraries, and Perl run-time
 system all work together to let the programmer treat a single
 character (C<\n>) as the line terminator, irrespective of the external
@@ -2659,33 +2665,39 @@ conversion assumes base 10.)
 =item open FILEHANDLE
 
 Opens the file whose filename is given by EXPR, and associates it with
-FILEHANDLE. If FILEHANDLE is an undefined lexical (C) variable the variable is
-assigned a reference to a new anonymous filehandle, otherwise if FILEHANDLE is an expression,
-its value is used as the name of the real filehandle wanted. (This is considered a symbolic
-reference, so C should I be in effect.)
-
-If EXPR is omitted, the scalar
-variable of the same name as the FILEHANDLE contains the filename.
-(Note that lexical variables--those declared with C--will not work
-for this purpose; so if you're using C, specify EXPR in your call
-to open.) See L for a kinder, gentler explanation of opening
-files.
-
-If three or more arguments are specified then the mode of opening and the file name
-are separate. If MODE is C<< '<' >> or nothing, the file is opened for input.
-If MODE is C<< '>' >>, the file is truncated and opened for
-output, being created if necessary. If MODE is C<<< '>>' >>>,
+FILEHANDLE.
+
+(The following is a comprehensive reference to open(): for a gentler
+introduction you may consider L.)
+
+If FILEHANDLE is an undefined lexical (C) variable the variable is
+assigned a reference to a new anonymous filehandle, otherwise if
+FILEHANDLE is an expression, its value is used as the name of the real
+filehandle wanted. (This is considered a symbolic reference, so C should I be in effect.)
+
+If EXPR is omitted, the scalar variable of the same name as the
+FILEHANDLE contains the filename. (Note that lexical variables--those
+declared with C--will not work for this purpose; so if you're
+using C, specify EXPR in your call to open.)
+
+If three or more arguments are specified then the mode of opening and
+the file name are separate. If MODE is C<< '<' >> or nothing, the file
+is opened for input. If MODE is C<< '>' >>, the file is truncated and
+opened for output, being created if necessary. If MODE is C<<< '>>' >>>,
 the file is opened for appending, again being created if necessary.
-You can put a C<'+'> in front of the C<< '>' >> or C<< '<' >> to indicate that
-you want both read and write access to the file; thus C<< '+<' >> is almost
-always preferred for read/write updates--the C<< '+>' >> mode would clobber the
-file first. You can't usually use either read-write mode for updating
-textfiles, since they have variable length records. See the B<-i>
-switch in L for a better approach. The file is created with
-permissions of C<0666> modified by the process' C value.
-These various prefixes correspond to the fopen(3) modes of C<'r'>, C<'r+'>,
-C<'w'>, C<'w+'>, C<'a'>, and C<'a+'>.
+You can put a C<'+'> in front of the C<< '>' >> or C<< '<' >> to
+indicate that you want both read and write access to the file; thus
+C<< '+<' >> is almost always preferred for read/write updates--the C<<
+'+>' >> mode would clobber the file first. You can't usually use
+either read-write mode for updating textfiles, since they have
+variable length records. See the B<-i> switch in L for a
+better approach. The file is created with permissions of C<0666>
+modified by the process' C value.
+
+These various prefixes correspond to the fopen(3) modes of C<'r'>,
+C<'r+'>, C<'w'>, C<'w+'>, C<'a'>, and C<'a+'>.
 
 In the 2-arguments (and 1-argument) form of the call the mode and
 filename should be concatenated (in this order), possibly separated by
@@ -2701,38 +2713,46 @@ that pipes both in I out, but see L, L,
 and L
 for alternatives.)
 
-For three or more arguments if MODE is C<'|-'>, the filename is interpreted as a
-command to which output is to be piped, and if MODE is
-C<'-|'>, the filename is interpreted as a command which pipes output to
-us. In the 2-arguments (and 1-argument) form one should replace dash
-(C<'-'>) with the command. See L
-for more examples of this. (You are not allowed to C to a command
-that pipes both in I out, but see L, L,
-and L for alternatives.) In 3+ arg form of
-pipe opens then if LIST is specified (extra arguments after the command name) then
-LIST becomes arguments to the command invoked if the platform supports it.
-The meaning of C with more than three arguments for non-pipe modes
-is not yet specified. Experimental "layers" may give extra LIST arguments meaning.
+For three or more arguments if MODE is C<'|-'>, the filename is
+interpreted as a command to which output is to be piped, and if MODE
+is C<'-|'>, the filename is interpreted as a command which pipes
+output to us. In the 2-arguments (and 1-argument) form one should
+replace dash (C<'-'>) with the command.
+See L for more examples of this.
+(You are not allowed to C to a command that pipes both in I
+out, but see L, L, and
+L for alternatives.)
+
+In the three-or-more argument form of pipe opens, if LIST is specified
+(extra arguments after the command name) then LIST becomes arguments
+to the command invoked if the platform supports it. The meaning of
+C with more than three arguments for non-pipe modes is not yet
+specified. Experimental "layers" may give extra LIST arguments
+meaning.
 
 In the 2-arguments (and 1-argument) form opening C<'-'> opens STDIN
 and opening C<< '>-' >> opens STDOUT.
 
-Open returns
-nonzero upon success, the undefined value otherwise. If the C
-involved a pipe, the return value happens to be the pid of the
-subprocess.
+You may use the three-argument form of open to specify
+I that affect how the input and output
+are processed: see L and L.
+
+Open returns nonzero upon success, the undefined value otherwise. If
+the C involved a pipe, the return value happens to be the pid of
+the subprocess.
 
-If you're unfortunate enough to be running Perl on a system that
-distinguishes between text files and binary files (modern operating
-systems don't care), then you should check out L for tips for
-dealing with this. The key distinction between systems that need C
-and those that don't is their text file formats. Systems like Unix, MacOS, and
-Plan9, which delimit lines with a single character, and which encode that
-character in C as C<"\n">, do not need C. The rest need it.
+If you're running Perl on a system that distinguishes between text
+files and binary files, then you should check out L for tips
+for dealing with this. The key distinction between systems that need
+C and those that don't is their text file formats. Systems
+like Unix, MacOS, and Plan9, which delimit lines with a single
+character, and which encode that character in C as C<"\n">, do not
+need C. The rest need it.
 
-In the three argument form MODE may also contain a list of IO "layers" (see L and
-L for more details) to be applied to the handle. This can be used to achieve the
-effect of C as well as more complex behaviours.
+In the three argument form MODE may also contain a list of IO "layers"
+(see L and L for more details) to be applied to the
+handle. This can be used to achieve the effect of C as well
+as more complex behaviours.
 
 When opening a file, it's usually a bad idea to continue normal execution
 if the request failed, so C is frequently used in connection with
@@ -2742,14 +2762,13 @@ modules that can help with that problem)) you should always check
 the return value from opening a file. The infrequent exception is when
 working with an unopened filehandle is actually what you want to do.
 
-As a special case the 3 arg form with a read/write mode and the third argument
-being C:
+As a special case the 3 arg form with a read/write mode and the third
+argument being C:
 
     open(TMP, "+>", undef) or die ...
 
 opens a filehandle to an anonymous temporary file.
 
-
 Examples:
 
     $ARTICLE = 100;
@@ -2887,17 +2906,16 @@ supported on some platforms (see L). To be safe, you may need to set
 C<$|> ($AUTOFLUSH in English) or call the C method of C
 on any open handles.
 
-On systems that support a
-close-on-exec flag on files, the flag will be set for the newly opened
-file descriptor as determined by the value of $^F. See L.
+On systems that support a close-on-exec flag on files, the flag will
+be set for the newly opened file descriptor as determined by the value
+of $^F. See L.
 
 Closing any piped filehandle causes the parent process to wait for the
 child to finish, and returns the status value in C<$?>.
 
-The filename passed to 2-argument (or 1-argument) form of open()
-will have leading and trailing
-whitespace deleted, and the normal redirection characters
-honored. This property, known as "magic open",
+The filename passed to 2-argument (or 1-argument) form of open() will
+have leading and trailing whitespace deleted, and the normal
+redirection characters honored. This property, known as "magic open",
 can often be used to good effect. A user could specify a filename of
 F<"rsh cat file |">, or you could change certain filenames as needed: