X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlop.pod;h=17728df9d3e7a846ae264ceb697aa7cdc5946d7d;hb=c69f112c145fabe210a7e2c5c2406baeea71af2f;hp=81efcb267d78069b36c07ba49dfe72443d2d1d1f;hpb=18270fd3b8aafde2f9ea21ea13adde95ef24b149;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlop.pod b/pod/perlop.pod
index 81efcb2..17728df 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -392,7 +392,7 @@ As a scalar operator:
 As a list operator:
 
     for (101 .. 200) { print; }     # print $_ 100 times
-    @foo = @foo[$[ .. $#foo];       # an expensive no-op
+    @foo = @foo[0 .. $#foo];        # an expensive no-op
     @foo = @foo[$#foo-4 .. $#foo];  # slice last 5 items
 
 The range operator (in a list context) makes use of the magical
@@ -567,6 +567,15 @@ the same character fore and aft, but the 4 sorts of brackets
          s{}{}     Substitution      yes
         tr{}{}     Translation       no
 
+Note that there can be whitespace between the operator and the quoting
+characters, except when C<#> is being used as the quoting character.
+C<q#foo#> is parsed as being the string C<foo>, while C<q #foo#> is the
+operator C<q> followed by a comment. Its argument will be taken from the
+next line. This allows you to write:
+
+    s {foo}  # Replace foo
+      {bar}  # with bar.
+
 For constructs that do interpolation, variables beginning with "C<$>"
 or "C<@>" are interpolated, as are the following sequences:
 
@@ -619,9 +628,9 @@ patterns local to the current package are reset.
 This usage is vaguely deprecated, and may be removed in some future
 version of Perl.
 
-=item m/PATTERN/gimosx
+=item m/PATTERN/cgimosx
 
-=item /PATTERN/gimosx
+=item /PATTERN/cgimosx
 
 Searches a string for a pattern match, and in a scalar context returns
 true (1) or false (''). If no string is specified via the C<=~> or
@@ -634,6 +643,7 @@ when C<use locale> is in effect.
 
 Options are:
 
+    c   Do not reset search position on a failed match when /g is in effect.
     g   Match globally, i.e., find all occurrences.
     i   Do case-insensitive pattern matching.
     m   Treat string as multiple lines.
@@ -644,7 +654,8 @@ Options are:
 If "/" is the delimiter then the initial C<m> is optional. With the C<m>
 you can use any pair of non-alphanumeric, non-whitespace characters as
 delimiters. This is particularly useful for matching Unix path names
-that contain "/", to avoid LTS (leaning toothpick syndrome).
+that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
+the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
 
 PATTERN may contain variables, which will be interpolated (and the
 pattern recompiled) every time the pattern search is evaluated. (Note
@@ -696,18 +707,22 @@ If there are no parentheses, it returns a list of all the matched
 strings, as if there were parentheses around the whole pattern.
 
 In a scalar context, C<m//g> iterates through the string, returning TRUE
-each time it matches, and FALSE when it eventually runs out of
-matches. (In other words, it remembers where it left off last time and
-restarts the search at that point. You can actually find the current
-match position of a string or set it using the pos() function--see
-L<perlfunc/pos>.) Note that you can use this feature to stack C<m//g>
-matches or intermix C<m//g> matches with C<m/\G.../g>. Note that
-the C<\G> zero-width assertion is not supported without the C</g>
-modifier; currently, without C</g>, C<\G> behaves just like C<\A>, but
-that's accidental and may change in the future.
-
-If you modify the string in any way, the match position is reset to the
-beginning. Examples:
+each time it matches, and FALSE when it eventually runs out of matches.
+(In other words, it remembers where it left off last time and restarts
+the search at that point. You can actually find the current match
+position of a string or set it using the pos() function; see
+L<perlfunc/pos>.) A failed match normally resets the search position to
+the beginning of the string, but you can avoid that by adding the C</c>
+modifier (e.g. C<m//gc>). Modifying the target string also resets the
+search position.
+
+You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
+zero-width assertion that matches the exact position where the previous
+C<m//g>, if any, left off. The C<\G> assertion is not supported without
+the C</g> modifier; currently, without C</g>, C<\G> behaves just like
+C<\A>, but that's accidental and may change in the future.
+
+Examples:
 
     # list context
     ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
@@ -721,15 +736,15 @@ beginning. Examples:
     }
     print "$sentences\n";
 
-    # using m//g with \G
+    # using m//gc with \G
     $_ = "ppooqppqq";
     while ($i++ < 2) {
         print "1: '";
-        print $1 while /(o)/g; print "', pos=", pos, "\n";
+        print $1 while /(o)/gc; print "', pos=", pos, "\n";
         print "2: '";
-        print $1 if /\G(q)/g; print "', pos=", pos, "\n";
+        print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
         print "3: '";
-        print $1 while /(p)/g; print "', pos=", pos, "\n";
+        print $1 while /(p)/gc; print "', pos=", pos, "\n";
     }
 
 The last example should print:
@@ -741,23 +756,23 @@ The last example should print:
     2: 'q', pos=8
     3: '', pos=8
 
-A useful idiom for C<lex>-like scanners is C</\G.../g>. You can
+A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
 combine several regexps like this to process a string part-by-part,
-doing different actions depending on which regexp matched. The next
-regexp would step in at the place the previous one left off.
+doing different actions depending on which regexp matched. Each
+regexp tries to match where the previous one leaves off.
 
     $_ = <<'EOL';
          $url = new URI::URL "http://www/";   die if $url eq "xXx";
     EOL
     LOOP:
         {
-          print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/g;
-          print(" lowercase"),    redo LOOP if /\G[a-z]+\b[,.;]?\s*/g;
-          print(" UPPERCASE"),    redo LOOP if /\G[A-Z]+\b[,.;]?\s*/g;
-          print(" Capitalized"),  redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/g;
-          print(" MiXeD"),        redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/g;
-          print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/g;
-          print(" line-noise"),   redo LOOP if /\G[^A-Za-z0-9]+/g;
+          print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/gc;
+          print(" lowercase"),    redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
+          print(" UPPERCASE"),    redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
+          print(" Capitalized"),  redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
+          print(" MiXeD"),        redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
+          print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
+          print(" line-noise"),   redo LOOP if /\G[^A-Za-z0-9]+/gc;
           print ". That's all!\n";
         }
 
@@ -803,6 +818,24 @@ with $/ or $INPUT_RECORD_SEPARATOR).
 
     $today = qx{ date };
 
+Note that how the string gets evaluated is entirely subject to the
+command interpreter on your system. On most platforms, you will have
+to protect shell metacharacters if you want them treated literally.
+On some platforms (notably DOS-like ones), the shell may not be
+capable of dealing with multiline commands, so putting newlines in
+the string may not get you what you want. You may be able to evaluate
+multiple commands in a single line by separating them with the command
+separator character, if your shell supports that (e.g. C<;> on many Unix
+shells; C<&> on the Windows NT C<cmd> shell).
+
+Beware that some command shells may place restrictions on the length
+of the command line. You must ensure your strings don't exceed this
+limit after any necessary interpolations. See the platform-specific
+release notes for more details about your particular environment.
+
+Also realize that using this operator frequently leads to unportable
+programs.
+
 See L<"I/O Operators"> for more discussion.
 
 =item qw/STRING/
@@ -1053,7 +1086,7 @@ of filenames. The loop
 
 is equivalent to the following Perl-like pseudo code:
 
-    unshift(@ARGV, '-') if $#ARGV < $[;
+    unshift(@ARGV, '-') unless @ARGV;
     while ($ARGV = shift) {
         open(ARGV, $ARGV);
         while (<ARGV>) {
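
Note on the /c modifier documented by this patch: the short script below is a sketch, not part of the patch, that contrasts a failed match under /g with one under /gc and then uses \G to resume at the saved position. The sample string and all variable names are made up for illustration.

```perl
use strict;
use warnings;

my $str = "aab";    # hypothetical sample input

# Plain /g: a successful scalar-context match advances pos() ...
my $hit = $str =~ /a/g;            # matches the first "a"
my $pos_after_hit = pos($str);     # 1

# ... and a failed match resets it, so pos() becomes undef.
my $miss = $str =~ /z/g;
my $pos_after_miss = pos($str);    # undef

# With /gc, a failed match leaves the search position where it was.
$hit  = $str =~ /a/gc;             # search restarts at offset 0; pos() is 1 again
$miss = $str =~ /z/gc;             # fails, but pos() is untouched
my $pos_kept = pos($str);          # 1

# \G anchors the next attempt at exactly that position.
my $second = $str =~ /\Ga/gc;      # matches the second "a"
my $pos_final = pos($str);         # 2

printf "after hit: %s, after miss: %s, kept: %s, final: %s\n",
    map { defined $_ ? $_ : 'undef' }
    $pos_after_hit, $pos_after_miss, $pos_kept, $pos_final;
```

The lex-like scanner in the patch depends on the same behaviour: each alternative in the LOOP block may fail without discarding the position, so the next alternative can try again at the same offset.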