X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlop.pod;h=3c84e608019577152a60d6e74ec8237770c16025;hb=49b8b560b023159cf65bbcf3068dc24e8091bc05;hp=150813e711557da46b5859af57ea902c00d505ea;hpb=0b8d69e96040ec811c067522a2d9770121123a35;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlop.pod b/pod/perlop.pod
index 150813e..3c84e60 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -95,7 +95,7 @@ as well as L<"I/O Operators">.
 
 =head2 The Arrow Operator
 
-"C<-E<gt>>" is an infix dereference operator, just as it is in C
+"C<< -> >>" is an infix dereference operator, just as it is in C
 and C++.  If the right side is either a C<[...]>, C<{...}>, or a
 C<(...)> subscript, then the left side must be either a hard or
 symbolic reference to an array, a hash, or a subroutine respectively.
@@ -148,9 +148,12 @@ starts with a plus or minus, a string starting with the opposite sign
 is returned.  One effect of these rules is that C<-bareword> is equivalent
 to C<"-bareword">.
 
-Unary "~" performs bitwise negation, i.e., 1's complement.  For example,
-C<0666 &~ 027> is 0640.  (See also L<Integer Arithmetic> and L<Bitwise
-String Operators>.)
+Unary "~" performs bitwise negation, i.e., 1's complement.  For
+example, C<0666 & ~027> is 0640.  (See also L<Integer Arithmetic> and
+L<Bitwise String Operators>.)  Note that the width of the result is
+platform-dependent: C<~0> is 32 bits wide on a 32-bit platform, but 64
+bits wide on a 64-bit platform, so if you are expecting a certain bit
+width, remember to use the C<&> operator to mask off the excess bits.
 
 Unary "+" has no effect whatsoever, even on strings.  It is useful
 syntactically for separating a function name from a parenthesized expression
@@ -169,8 +172,11 @@ search or modify the string $_ by default.  This operator makes that kind
 of operation work on some other string.  The right argument is a search
 pattern, substitution, or transliteration.  The left argument is what is
 supposed to be searched, substituted, or transliterated instead of the default
-$_.  The return value indicates the success of the operation.  If the
-right argument is an expression rather than a search pattern,
+$_.  When used in scalar context, the return value generally indicates the
+success of the operation.  Behavior in list context depends on the particular
+operator.  See L</"Regexp Quote-Like Operators"> for details.
+
+If the right argument is an expression rather than a search pattern,
 substitution, or transliteration, it is interpreted as a search pattern at run
 time.  This can be less efficient than an explicit search, because the
 pattern must be compiled every time the expression is evaluated.
@@ -259,16 +265,16 @@ See also L<"Terms and List Operators (Leftward)">.
 
 =head2 Relational Operators
 
-Binary "E<lt>" returns true if the left argument is numerically less than
+Binary "<" returns true if the left argument is numerically less than
 the right argument.
 
-Binary "E<gt>" returns true if the left argument is numerically greater
+Binary ">" returns true if the left argument is numerically greater
 than the right argument.
 
-Binary "E<lt>=" returns true if the left argument is numerically less than
+Binary "<=" returns true if the left argument is numerically less than
 or equal to the right argument.
 
-Binary "E<gt>=" returns true if the left argument is numerically greater
+Binary ">=" returns true if the left argument is numerically greater
 than or equal to the right argument.
 
 Binary "lt" returns true if the left argument is stringwise less than
@@ -291,7 +297,7 @@ the right argument.
 Binary "!=" returns true if the left argument is numerically not equal
 to the right argument.
 
-Binary "E<lt>=E<gt>" returns -1, 0, or 1 depending on whether the left
+Binary "<=>" returns -1, 0, or 1 depending on whether the left
 argument is numerically less than, equal to, or greater than the right
 argument.
 
@@ -541,7 +547,7 @@ argument and returns that value.  This is just like C's comma operator.
 In list context, it's just the list argument separator, and inserts
 both its arguments into the list.
 
-The =E<gt> digraph is mostly just a synonym for the comma operator.  It's useful for
+The => digraph is mostly just a synonym for the comma operator.  It's useful for
 documenting arguments that come in pairs.  As of release 5.001, it also forces
 any word to the left of it to be interpreted as a string.
 
@@ -643,7 +649,7 @@ sorts of brackets (round, angle, square, curly) will all nest, which means
 that 
 
 	q{foo{bar}baz} 
-	
+
 is the same as 
 
 	'foo{bar}baz'
@@ -782,8 +788,8 @@ If "'" is the delimiter, no interpolation is performed on the PATTERN.
 
 PATTERN may contain variables, which will be interpolated (and the
 pattern recompiled) every time the pattern search is evaluated, except
-for when the delimiter is a single quote.  (Note that C<$)> and C<$|>
-might not be interpolated because they look like end-of-string tests.)
+for when the delimiter is a single quote.  (Note that C<$(>, C<$)>, and
+C<$|> are not interpolated because they look like end-of-string tests.)
 If you want such a pattern to be compiled only once, add a C</o> after
 the trailing delimiter.  This avoids expensive run-time recompilations,
 and is useful when the value you are interpolating won't change over
@@ -1045,6 +1051,12 @@ multiple commands in a single line by separating them with the command
 separator character, if your shell supports that (e.g. C<;> on many Unix
 shells; C<&> on the Windows NT C<cmd> shell).
 
+Beginning with v5.6.0, Perl will attempt to flush all files opened for
+output before starting the child process, but this may not be supported
+on some platforms (see L<perlport>).  To be safe, you may need to set
+C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
+C<IO::Handle> on any open handles.
+
 Beware that some command shells may place restrictions on the length
 of the command line.  You must ensure your strings don't exceed this
 limit after any necessary interpolations.  See the platform-specific
@@ -1085,8 +1097,8 @@ Some frequently seen examples:
 
 A common mistake is to try to separate the words with comma or to
 put comments into a multi-line C<qw>-string.  For this reason, the
-B<-w> switch (that is, the C<$^W> variable) produces warnings if
-the STRING contains the "," or the "#" character.
+C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
+produce warnings if the STRING contains the "," or the "#" character.
 
 =item s/PATTERN/REPLACEMENT/egimosx
 
@@ -1127,10 +1139,11 @@ Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
 text is not evaluated as a command.  If the
 PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
 pair of quotes, which may or may not be bracketing quotes, e.g.,
-C<s(foo)(bar)> or C<sE<lt>fooE<gt>/bar/>.  A C</e> will cause the
-replacement portion to be interpreted as a full-fledged Perl expression
-and eval()ed right then and there.  It is, however, syntax checked at
-compile-time.
+C<s(foo)(bar)> or C<< s<foo>/bar/ >>.  A C</e> will cause the
+replacement portion to be treated as a full-fledged Perl expression
+and evaluated right then and there.  It is, however, syntax checked at
+compile-time.  A second C<e> modifier will cause the replacement portion
+to be C<eval>ed before being run as a Perl expression.
 
 Examples:
 
@@ -1157,8 +1170,12 @@ Examples:
     # symbolic dereferencing
     s/\$(\w+)/${$1}/g;
 
-    # /e's can even nest;  this will expand
-    # any embedded scalar variable (including lexicals) in $_
+    # Add one to the value of any numbers in the string
+    s/(\d+)/1 + $1/eg;
+
+    # This will expand any embedded scalar variable
+    # (including lexicals) in $_: first $1 is interpolated
+    # to the variable name, and then evaluated
     s/(\$\w+)/$1/eeg;
 
     # Delete (most) C comments.
@@ -1178,8 +1195,8 @@ Examples:
     s/([^ ]*) *([^ ]*)/$2 $1/;	# reverse 1st two fields
 
 Note the use of $ instead of \ in the last example.  Unlike
-B<sed>, we use the \E<lt>I<digit>E<gt> form in only the left hand side.
-Anywhere else it's $E<lt>I<digit>E<gt>.
+B<sed>, we use the \<I<digit>> form in only the left hand side.
+Anywhere else it's $<I<digit>>.
 
 Occasionally, you can't use just a C</g> to get all the changes
 to occur that you might want.  Here are two common cases:
@@ -1190,9 +1207,9 @@ to occur that you might want.  Here are two common cases:
     # expand tabs to 8-column spacing
     1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
 
-=item tr/SEARCHLIST/REPLACEMENTLIST/cdsUC
+=item tr/SEARCHLIST/REPLACEMENTLIST/cds
 
-=item y/SEARCHLIST/REPLACEMENTLIST/cdsUC
+=item y/SEARCHLIST/REPLACEMENTLIST/cds
 
 Transliterates all occurrences of the characters found in the search list
 with the corresponding character in the replacement list.  It returns
@@ -1208,6 +1225,12 @@ SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
 its own pair of quotes, which may or may not be bracketing quotes,
 e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
 
+Note that C<tr> does B<not> do regular expression character classes
+such as C<\d> or C<[:lower:]>.  The C<tr> operator is not equivalent to
+the tr(1) utility.  If you want to map strings between lower/upper
+cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
+using the C<s> operator if you need regular expressions.
+
 Note also that the whole range idea is rather unportable between
 character sets--and even within character sets they may cause results
 you probably didn't expect.  A sound principle is to use only ranges
@@ -1220,8 +1243,6 @@ Options:
     c	Complement the SEARCHLIST.
     d	Delete found but unreplaced characters.
     s	Squash duplicate replaced characters.
-    U	Translate to/from UTF-8.
-    C	Translate to/from 8-bit char (octet).
 
 If the C</c> modifier is specified, the SEARCHLIST character set
 is complemented.  If the C</d> modifier is specified, any characters
@@ -1239,10 +1260,6 @@ enough.  If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
 This latter is useful for counting characters in a class or for
 squashing character sequences in a class.
 
-The first C</U> or C</C> modifier applies to the left side of the translation.
-The second one applies to the right side.  If present, these modifiers override
-the current utf8 state.
-
 Examples:
 
     $ARGV[1] =~ tr/A-Z/a-z/;	# canonicalize to lower case
@@ -1262,9 +1279,6 @@ Examples:
     tr [\200-\377]
        [\000-\177];		# delete 8th bit
 
-    tr/\0-\xFF//CU;		# change Latin-1 to Unicode
-    tr/\0-\x{FF}//UC;		# change Unicode to Latin-1
-
 If multiple transliterations are given for a character, only the
 first one is used:
 
@@ -1317,8 +1331,8 @@ one to five, but these passes are always performed in the same order.
 The first pass is finding the end of the quoted construct, whether
 it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
 construct, a C</> that terminates a C<qq//> construct, a C<]> which
-terminates C<qq[]> construct, or a C<E<gt>> which terminates a
-fileglob started with C<E<lt>>.
+terminates C<qq[]> construct, or a C<< > >> which terminates a
+fileglob started with C<< < >>.
 
 When searching for single-character non-pairing delimiters, such
 as C</>, combinations of C<\\> and C<\/> are skipped.  However,
@@ -1374,7 +1388,7 @@ No interpolation is performed.
 
 The only interpolation is removal of C<\> from pairs C<\\>.
 
-=item C<"">, C<``>, C<qq//>, C<qx//>, C<<file*globE<gt>>
+=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>
 
 C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
 converted to corresponding Perl constructs.  Thus, C<"$foo\Qbaz$bar">
@@ -1395,7 +1409,7 @@ as C<"\\\t"> (since TAB is not alphanumeric).  Note also that:
 may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
 
 Interpolated scalars and arrays are converted internally to the C<join> and
-C<.> catentation operations.  Thus, C<"$foo XXX '@arr'"> becomes:
+C<.> catenation operations.  Thus, C<"$foo XXX '@arr'"> becomes:
 
   $foo . " XXX '" . (join $", @arr) . "'";
 
@@ -1409,7 +1423,7 @@ scalar.
 
 Note also that the interpolation code needs to make a decision on
 where the interpolated scalar ends.  For instance, whether 
-C<"a $b -E<gt> {c}"> really means:
+C<< "a $b -> {c}" >> really means:
 
   "a " . $b . " -> {c}";
 
@@ -1446,8 +1460,8 @@ the result is not predictable.
 It is at this step that C<\1> is begrudgingly converted to C<$1> in
 the replacement text of C<s///> to correct the incorrigible
 I<sed> hackers who haven't picked up the saner idiom yet.  A warning
-is emitted if the B<-w> command-line flag (that is, the C<$^W> variable)
-was set.
+is emitted if the C<use warnings> pragma or the B<-w> command-line flag
+(that is, the C<$^W> variable) was set.
 
 The lack of processing of C<\\> creates specific restrictions on
 the post-processed text.  If the delimiter is C</>, one cannot get
@@ -1508,7 +1522,7 @@ terminator of a C<{}>-delimited construct.
 It is possible to inspect both the string given to RE engine and the
 resulting finite automaton.  See the arguments C<debug>/C<debugcolor>
 in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
-switch documented in L<perlrun/Switches>.
+switch documented in L<perlrun/"Command Switches">.
 
 =item Optimization of regular expressions
 
@@ -1584,8 +1598,9 @@ to terminate the loop, they should be tested for explicitly:
     while (($_ = <STDIN>) ne '0') { ... }
     while (<STDIN>) { last unless $_; ... }
 
-In other boolean contexts, C<E<lt>I<filehandle>E<gt>> without an
-explicit C<defined> test or comparison elicit a warning if the B<-w>
+In other boolean contexts, C<< <I<filehandle>> >> without an
+explicit C<defined> test or comparison elicits a warning if the
+C<use warnings> pragma or the B<-w>
 command-line switch (the C<$^W> variable) is in effect.
 
 The filehandles STDIN, STDOUT, and STDERR are predefined.  (The
@@ -1595,18 +1610,18 @@ rather than global.)  Additional filehandles may be created with
 the open() function, amongst others.  See L<perlopentut> and
 L<perlfunc/open> for details on this.
 
-If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for
+If a <FILEHANDLE> is used in a context that is looking for
 a list, a list comprising all input lines is returned, one line per
 list element.  It's easy to grow to a rather large data space this
 way, so use with care.
 
-E<lt>FILEHANDLEE<gt> may also be spelled C<readline(*FILEHANDLE)>.
+<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
 See L<perlfunc/readline>.
 
-The null filehandle E<lt>E<gt> is special: it can be used to emulate the
-behavior of B<sed> and B<awk>.  Input from E<lt>E<gt> comes either from
+The null filehandle <> is special: it can be used to emulate the
+behavior of B<sed> and B<awk>.  Input from <> comes either from
 standard input, or from each file listed on the command line.  Here's
-how it works: the first time E<lt>E<gt> is evaluated, the @ARGV array is
+how it works: the first time <> is evaluated, the @ARGV array is
 checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
 gives you standard input.  The @ARGV array is then processed as a list
 of filenames.  The loop
@@ -1628,11 +1643,11 @@ is equivalent to the following Perl-like pseudo code:
 except that it isn't so cumbersome to say, and will actually work.
 It really does shift the @ARGV array and put the current filename
 into the $ARGV variable.  It also uses filehandle I<ARGV>
-internally--E<lt>E<gt> is just a synonym for E<lt>ARGVE<gt>, which
+internally--<> is just a synonym for <ARGV>, which
 is magical.  (The pseudo code above doesn't work because it treats
-E<lt>ARGVE<gt> as non-magical.)
+<ARGV> as non-magical.)
 
-You can modify @ARGV before the first E<lt>E<gt> as long as the array ends up
+You can modify @ARGV before the first <> as long as the array ends up
 containing the list of filenames you really want.  Line numbers (C<$.>)
 continue as though the input were one big happy file.  See the example
 in L<perlfunc/eof> for how to reset line numbers on each file.
@@ -1662,12 +1677,12 @@ Getopts modules or put a loop on the front like this:
 	# ...		# code for each line
     }
 
-The E<lt>E<gt> symbol will return C<undef> for end-of-file only once.  
+The <> symbol will return C<undef> for end-of-file only once.  
 If you call it again after this, it will assume you are processing another 
 @ARGV list, and if you haven't set @ARGV, will read input from STDIN.
 
 If angle brackets contain is a simple scalar variable (e.g.,
-E<lt>$fooE<gt>), then that variable contains the name of the
+<$foo>), then that variable contains the name of the
 filehandle to input from, or its typeglob, or a reference to the
 same.  For example:
 
@@ -1679,16 +1694,16 @@ scalar variable containing a filehandle name, typeglob, or typeglob
 reference, it is interpreted as a filename pattern to be globbed, and
 either a list of filenames or the next filename in the list is returned,
 depending on context.  This distinction is determined on syntactic
-grounds alone.  That means C<E<lt>$xE<gt>> is always a readline() from
-an indirect handle, but C<E<lt>$hash{key}E<gt>> is always a glob().
+grounds alone.  That means C<< <$x> >> is always a readline() from
+an indirect handle, but C<< <$hash{key}> >> is always a glob().
 That's because $x is a simple scalar variable, but C<$hash{key}> is
 not--it's a hash element.
 
 One level of double-quote interpretation is done first, but you can't
-say C<E<lt>$fooE<gt>> because that's an indirect filehandle as explained
+say C<< <$foo> >> because that's an indirect filehandle as explained
 in the previous paragraph.  (In older versions of Perl, programmers
 would insert curly brackets to force interpretation as a filename glob:
-C<E<lt>${foo}E<gt>>.  These days, it's considered cleaner to call the
+C<< <${foo}> >>.  These days, it's considered cleaner to call the
 internal function directly as C<glob($foo)>, which is probably the right
 way to have done it in the first place.)  For example:
 
@@ -1696,7 +1711,7 @@ way to have done it in the first place.)  For example:
 	chmod 0644, $_;
     }
 
-is equivalent to
+is roughly equivalent to:
 
     open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
     while (<FOO>) {
@@ -1704,25 +1719,16 @@ is equivalent to
 	chmod 0644, $_;
     }
 
-In fact, it's currently implemented that way, but this is expected
-to be made completely internal in the near future.  (Which means
-it will not work on filenames with spaces in them unless you have
-csh(1) on your machine.)  Of course, the shortest way to do the
-above is:
+except that the globbing is actually done internally using the standard
+C<File::Glob> extension.  Of course, the shortest way to do the above is:
 
     chmod 0644, <*.c>;
 
-Because globbing currently invokes a shell, it's often faster to
-call readdir() yourself and do your own grep() on the filenames.
-Furthermore, due to its current implementation of using a shell,
-the glob() routine may get "Arg list too long" errors (unless you've
-installed tcsh(1L) as F</bin/csh> or hacked your F<config.sh>).
-
 A (file)glob evaluates its (embedded) argument only when it is
 starting a new list.  All values must be read before it will start
 over.  In list context, this isn't important because you automatically
 get them all anyway.  However, in scalar context the operator returns
-the next value each time it's called, or C
+the next value each time it's called, or C<undef> when the list has
 run out.  As with filehandle reads, an automatic C<defined> is
 generated when the glob occurs in the test part of a C<while>,
 because legal glob returns (e.g. a file called F<0>) would otherwise
@@ -1802,17 +1808,6 @@ operation you intend by using C<""> or C<0+>, as in the examples below.
 See L<perlfunc/vec> for information on how to manipulate individual bits
 in a bit vector.
 
-=head2 Version tuples
-
-A version number of the form C<v1.2.3.4> is parsed as a dual-valued literal.
-It has the string value of C<"\x{1}\x{2}\x{3}\x{4}"> (i.e., a utf8 string)
-and a numeric value of C<1 + 2/1000 + 3/1000000 + 4/1000000000>.  This is
-useful for representing and comparing version numbers.
-
-Version tuples are accepted by both C<require> and C<use>.  The C<$^V> variable
-contains the running Perl interpreter's version in this format.
-See L<perlvar/$^V>.
-
 =head2 Integer Arithmetic
 
 By default, Perl assumes that it must do most of its arithmetic in