X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlop.pod;h=0bb506ddc708ada8a8988131bf1243676f097a0d;hb=0e06870bf080a38cda51c06c6612359afc2334e1;hp=ac9d4b65da8c3bf1b30f1c2adeaa57fac0777542;hpb=61eff3bce55f37db01a97cd993ca4e73fb953a10;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlop.pod b/pod/perlop.pod
index ac9d4b6..0bb506d 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -119,7 +119,7 @@ you increment a variable that is numeric, or that has ever been used in
 a numeric context, you get a normal increment.  If, however, the
 variable has been used in only string contexts since it was set, and
 has a value that is not the empty string and matches the pattern
-C</^[a-zA-Z]*[0-9]*$/>, the increment is done as a string, preserving each
+C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
 character within its range, with carry:
 
     print ++($foo = '99');	# prints '100'
@@ -172,8 +172,11 @@ search or modify the string $_ by default.  This operator makes that kind
 of operation work on some other string.  The right argument is a search
 pattern, substitution, or transliteration.  The left argument is what is
 supposed to be searched, substituted, or transliterated instead of the default
-$_.  The return value indicates the success of the operation.  If the
-right argument is an expression rather than a search pattern,
+$_.  When used in scalar context, the return value generally indicates the
+success of the operation.  Behavior in list context depends on the particular
+operator.  See L</"Regexp Quote-Like Operators"> for details.
+
+If the right argument is an expression rather than a search pattern,
 substitution, or transliteration, it is interpreted as a search pattern at run
 time.  This can be less efficient than an explicit search, because the
 pattern must be compiled every time the expression is evaluated.
@@ -193,7 +196,7 @@ C<$a> minus the largest multiple of C<$b> that is not greater than
 C<$a>.  If C<$b> is negative, then C<$a % $b> is C<$a> minus the
 smallest multiple of C<$b> that is not less than C<$a> (i.e. the
 result will be less than or equal to zero). 
-Note than when C<use integer> is in scope, "%" give you direct access
+Note that when C<use integer> is in scope, "%" gives you direct access
 to the modulus operator as implemented by your C compiler.  This
 operator is not as well defined for negative operands, but it will
 execute faster.
@@ -296,7 +299,14 @@ to the right argument.
 
 Binary "<=>" returns -1, 0, or 1 depending on whether the left
 argument is numerically less than, equal to, or greater than the right
-argument.
+argument.  If your platform supports NaNs (not-a-numbers) as numeric
+values, using them with "<=>" returns undef.  NaN is not "<", "==", ">",
+"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
+returns true, as does NaN != anything else. If your platform doesn't
+support NaNs then NaN is just a string with numeric value 0.
+
+    perl -le '$a = NaN; print "No NaN support here" if $a == $a'
+    perl -le '$a = NaN; print "NaN support here" if $a != $a'
 
 Binary "eq" returns true if the left argument is stringwise equal to
 the right argument.
@@ -304,8 +314,9 @@ the right argument.
 Binary "ne" returns true if the left argument is stringwise not equal
 to the right argument.
 
-Binary "cmp" returns -1, 0, or 1 depending on whether the left argument is stringwise
-less than, equal to, or greater than the right argument.
+Binary "cmp" returns -1, 0, or 1 depending on whether the left
+argument is stringwise less than, equal to, or greater than the right
+argument.
 
 "lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
 by the current locale if C<use locale> is in effect.  See L<perllocale>.
@@ -704,7 +715,7 @@ on a Mac, these are reversed, and on systems without line terminator,
 printing C<"\n"> may emit no actual data.  In general, use C<"\n"> when
 you mean a "newline" for your system, but use the literal ASCII when you
 need an exact character.  For example, most networking protocols expect
-and prefer a CR+LF (C<"\012\015"> or C<"\cJ\cM">) for line terminators,
+and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
 and although they often accept just C<"\012">, they seldom tolerate just
 C<"\015">.  If you get in the habit of using C<"\n"> for networking,
 you may be burned some day.
@@ -785,14 +796,14 @@ If "'" is the delimiter, no interpolation is performed on the PATTERN.
 
 PATTERN may contain variables, which will be interpolated (and the
 pattern recompiled) every time the pattern search is evaluated, except
-for when the delimiter is a single quote.  (Note that C<$)> and C<$|>
-might not be interpolated because they look like end-of-string tests.)
+for when the delimiter is a single quote.  (Note that C<$(>, C<$)>, and
+C<$|> are not interpolated because they look like end-of-string tests.)
 If you want such a pattern to be compiled only once, add a C</o> after
 the trailing delimiter.  This avoids expensive run-time recompilations,
 and is useful when the value you are interpolating won't change over
 the life of the script.  However, mentioning C</o> constitutes a promise
 that you won't change the variables in the pattern.  If you change them,
-Perl won't even notice.  See also L<"qr//">.
+Perl won't even notice.  See also L<"qr/STRING/imosx">.
 
 If the PATTERN evaluates to the empty string, the last
 I<successfully> matched regular expression is used instead.
@@ -845,9 +856,11 @@ string also resets the search position.
 
 You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
 zero-width assertion that matches the exact position where the previous
-C<m//g>, if any, left off.  The C<\G> assertion is not supported without
-the C</g> modifier.  (Currently, without C</g>, C<\G> behaves just like
-C<\A>, but that's accidental and may change in the future.)
+C<m//g>, if any, left off.  Without the C</g> modifier, the C<\G> assertion
+still anchors at pos(), but the match is of course only attempted once.
+Using C<\G> without C</g> on a target string that has not previously had a
+C</g> match applied to it is the same as using the C<\A> assertion to match
+the beginning of the string.
 
 Examples:
 
@@ -855,7 +868,7 @@ Examples:
     ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
 
     # scalar context
-    $/ = ""; $* = 1;  # $* deprecated in modern perls
+    $/ = "";
     while (defined($paragraph = <>)) {
 	while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
 	    $sentences++;
@@ -873,6 +886,7 @@ Examples:
         print "3: '";
         print $1 while /(p)/gc; print "', pos=", pos, "\n";
     }
+    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
 
 The last example should print:
 
@@ -882,6 +896,13 @@ The last example should print:
     1: '', pos=7
     2: 'q', pos=8
     3: '', pos=8
+    Final: 'q', pos=8
+
+Notice that the final match matched C<q> instead of C<p>, which a match
+without the C<\G> anchor would have matched. Also note that the final
+match did not update C<pos>: C<pos> is only updated on a C</g> match. If
+the final match did indeed match C<p>, it's a good bet that you're running
+an older (pre-5.6.0) Perl.
 
 A useful idiom for C<lex>-like scanners is C</\G.../gc>.  You can
 combine several regexps like this to process a string part-by-part,
@@ -935,7 +956,7 @@ A double-quoted, interpolated string.
 
 =item qr/STRING/imosx
 
-This operators quotes--and compiles--its I<STRING> as a regular
+This operator quotes (and possibly compiles) its I<STRING> as a regular
 expression.  I<STRING> is interpolated the same way as I<PATTERN>
 in C<m/PATTERN/>.  If "'" is used as the delimiter, no interpolation
 is done.  Returns a Perl value which may be used instead of the
@@ -994,13 +1015,14 @@ for a detailed look at the semantics of regular expressions.
 
 =item `STRING`
 
-A string which is (possibly) interpolated and then executed as a system
-command with C</bin/sh> or its equivalent.  Shell wildcards, pipes,
-and redirections will be honored.  The collected standard output of the
-command is returned; standard error is unaffected.  In scalar context,
-it comes back as a single (potentially multi-line) string.  In list
-context, returns a list of lines (however you've defined lines with $/
-or $INPUT_RECORD_SEPARATOR).
+A string which is (possibly) interpolated and then executed as a
+system command with C</bin/sh> or its equivalent.  Shell wildcards,
+pipes, and redirections will be honored.  The collected standard
+output of the command is returned; standard error is unaffected.  In
+scalar context, it comes back as a single (potentially multi-line)
+string, or undef if the command failed.  In list context, returns a
+list of lines (however you've defined lines with $/ or
+$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
 
 Because backticks do not affect standard error, use shell file descriptor
 syntax (assuming the shell supports this) if you care to address this.
@@ -1048,6 +1070,12 @@ multiple commands in a single line by separating them with the command
 separator character, if your shell supports that (e.g. C<;> on many Unix
 shells; C<&> on the Windows NT C<cmd> shell).
 
+Beginning with v5.6.0, Perl will attempt to flush all files opened for
+output before starting the child process, but this may not be supported
+on some platforms (see L<perlport>).  To be safe, you may need to set
+C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
+C<IO::Handle> on any open handles.
+
 Beware that some command shells may place restrictions on the length
 of the command line.  You must ensure your strings don't exceed this
 limit after any necessary interpolations.  See the platform-specific
@@ -1088,8 +1116,8 @@ Some frequently seen examples:
 
 A common mistake is to try to separate the words with comma or to
 put comments into a multi-line C<qw>-string.  For this reason, the
-B<-w> switch (that is, the C<$^W> variable) produces warnings if
-the STRING contains the "," or the "#" character.
+C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
+produce warnings if the STRING contains the "," or the "#" character.
 
 =item s/PATTERN/REPLACEMENT/egimosx
 
@@ -1131,9 +1159,10 @@ text is not evaluated as a command.  If the
 PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
 pair of quotes, which may or may not be bracketing quotes, e.g.,
 C<s(foo)(bar)> or C<< s<foo>/bar/ >>.  A C</e> will cause the
-replacement portion to be interpreted as a full-fledged Perl expression
-and eval()ed right then and there.  It is, however, syntax checked at
-compile-time.
+replacement portion to be treated as a full-fledged Perl expression
+and evaluated right then and there.  It is, however, syntax checked at
+compile-time. A second C<e> modifier will cause the replacement portion
+to be C<eval>ed before being run as a Perl expression.
 
 Examples:
 
@@ -1160,8 +1189,12 @@ Examples:
     # symbolic dereferencing
     s/\$(\w+)/${$1}/g;
 
-    # /e's can even nest;  this will expand
-    # any embedded scalar variable (including lexicals) in $_
+    # Add one to the value of any numbers in the string
+    s/(\d+)/1 + $1/eg;
+
+    # This will expand any embedded scalar variable
+    # (including lexicals) in $_ : First $1 is interpolated
+    # to the variable name, and then evaluated
     s/(\$\w+)/$1/eeg;
 
     # Delete (most) C comments.
@@ -1193,9 +1226,9 @@ to occur that you might want.  Here are two common cases:
     # expand tabs to 8-column spacing
     1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
 
-=item tr/SEARCHLIST/REPLACEMENTLIST/cdsUC
+=item tr/SEARCHLIST/REPLACEMENTLIST/cds
 
-=item y/SEARCHLIST/REPLACEMENTLIST/cdsUC
+=item y/SEARCHLIST/REPLACEMENTLIST/cds
 
 Transliterates all occurrences of the characters found in the search list
 with the corresponding character in the replacement list.  It returns
@@ -1211,6 +1244,12 @@ SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
 its own pair of quotes, which may or may not be bracketing quotes,
 e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
 
+Note that C<tr> does B<not> do regular expression character classes
+such as C<\d> or C<[:lower:]>.  The C<tr> operator is not equivalent to
+the tr(1) utility.  If you want to map strings between lower/upper
+cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
+using the C<s> operator if you need regular expressions.
+
 Note also that the whole range idea is rather unportable between
 character sets--and even within character sets they may cause results
 you probably didn't expect.  A sound principle is to use only ranges
@@ -1223,8 +1262,6 @@ Options:
     c	Complement the SEARCHLIST.
     d	Delete found but unreplaced characters.
     s	Squash duplicate replaced characters.
-    U	Translate to/from UTF-8.
-    C	Translate to/from 8-bit char (octet).
 
 If the C</c> modifier is specified, the SEARCHLIST character set
 is complemented.  If the C</d> modifier is specified, any characters
@@ -1242,10 +1279,6 @@ enough.  If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
 This latter is useful for counting characters in a class or for
 squashing character sequences in a class.
 
-The first C</U> or C</C> modifier applies to the left side of the translation.
-The second one applies to the right side.  If present, these modifiers override
-the current utf8 state.
-
 Examples:
 
     $ARGV[1] =~ tr/A-Z/a-z/;	# canonicalize to lower case
@@ -1265,9 +1298,6 @@ Examples:
     tr [\200-\377]
        [\000-\177];		# delete 8th bit
 
-    tr/\0-\xFF//CU;		# change Latin-1 to Unicode
-    tr/\0-\x{FF}//UC;		# change Unicode to Latin-1
-
 If multiple transliterations are given for a character, only the
 first one is used:
 
@@ -1313,7 +1343,7 @@ their results are the same, we consider them individually.  For different
 quoting constructs, Perl performs different numbers of passes, from
 one to five, but these passes are always performed in the same order.
 
-=over
+=over 4
 
 =item Finding the end
 
@@ -1367,7 +1397,7 @@ used in parsing.
 The next step is interpolation in the text obtained, which is now
 delimiter-independent.  There are four different cases.
 
-=over
+=over 4
 
 =item C<<<'EOF'>, C<m''>, C<s'''>, C<tr///>, C<y///>
 
@@ -1398,7 +1428,7 @@ as C<"\\\t"> (since TAB is not alphanumeric).  Note also that:
 may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
 
 Interpolated scalars and arrays are converted internally to the C<join> and
-C<.> catentation operations.  Thus, C<"$foo XXX '@arr'"> becomes:
+C<.> catenation operations.  Thus, C<"$foo XXX '@arr'"> becomes:
 
   $foo . " XXX '" . (join $", @arr) . "'";
 
@@ -1449,8 +1479,8 @@ the result is not predictable.
 It is at this step that C<\1> is begrudgingly converted to C<$1> in
 the replacement text of C<s///> to correct the incorrigible
 I<sed> hackers who haven't picked up the saner idiom yet.  A warning
-is emitted if the B<-w> command-line flag (that is, the C<$^W> variable)
-was set.
+is emitted if the C<use warnings> pragma or the B<-w> command-line flag
+(that is, the C<$^W> variable) was set.
 
 The lack of processing of C<\\> creates specific restrictions on
 the post-processed text.  If the delimiter is C</>, one cannot get
@@ -1511,7 +1541,7 @@ terminator of a C<{}>-delimited construct.
 It is possible to inspect both the string given to RE engine and the
 resulting finite automaton.  See the arguments C<debug>/C<debugcolor>
 in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
-switch documented in L<perlrun/Switches>.
+switch documented in L<perlrun/"Command Switches">.
 
 =item Optimization of regular expressions
 
@@ -1588,7 +1618,8 @@ to terminate the loop, they should be tested for explicitly:
     while (<STDIN>) { last unless $_; ... }
 
 In other boolean contexts, C<< <I<filehandle>> >> without an
-explicit C<defined> test or comparison elicit a warning if the B<-w>
+explicit C<defined> test or comparison elicits a warning if the
+C<use warnings> pragma or the B<-w>
 command-line switch (the C<$^W> variable) is in effect.
 
 The filehandles STDIN, STDOUT, and STDERR are predefined.  (The
@@ -1716,7 +1747,7 @@ A (file)glob evaluates its (embedded) argument only when it is
 starting a new list.  All values must be read before it will start
 over.  In list context, this isn't important because you automatically
 get them all anyway.  However, in scalar context the operator returns
-the next value each time it's called, or C
+the next value each time it's called, or C<undef> when the list has
 run out.  As with filehandle reads, an automatic C<defined> is
 generated when the glob occurs in the test part of a C<while>,
 because legal glob returns (e.g. a file called F<0>) would otherwise
@@ -1796,22 +1827,6 @@ operation you intend by using C<""> or C<0+>, as in the examples below.
 See L<perlfunc/vec> for information on how to manipulate individual bits
 in a bit vector.
 
-=head2 Strings of Character
-
-A literal of the form C<v1.20.300.4000> is parsed as a string composed
-of characters with the specified ordinals.  This provides an alternative,
-more readable way to construct strings, rather than use the somewhat less
-readable interpolation form C<"\x{1}\x{14}\x{12c}\x{fa0}">.  This is useful
-for representing Unicode strings, and for comparing version "numbers"
-using the string comparison operators, C<cmp>, C<gt>, C<lt> etc.
-
-If there are two or more dots in the literal, the leading C<v> may be
-omitted.
-
-Such literals are accepted by both C<require> and C<use> for doing a version
-check.  The C<$^V> special variable also contains the running Perl
-interpreter's version in this form.  See L<perlvar/$^V>.
-
 =head2 Integer Arithmetic
 
 By default, Perl assumes that it must do most of its arithmetic in
@@ -1832,8 +1847,8 @@ integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
 or so.
 
 Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
-and ">>") always produce integral results.  (But see also L<Bitwise
-String Operators>.)  However, C<use integer> still has meaning for
+and ">>") always produce integral results.  (But see also
+L<Bitwise String Operators>.)  However, C<use integer> still has meaning for
 them.  By default, their results are interpreted as unsigned integers, but
 if C<use integer> is in effect, their results are interpreted
 as signed integers.  For example, C<~0> usually evaluates to a large