X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlfaq6.pod;h=6bf1428fb5be817defcc56189f9271ceb9dcfee4;hb=226de479579f4a84dd17654b44e5aef323b0a403;hp=ab19de8cfa7dd7f52c5e76c3669d14e26f3dd8b8;hpb=e573f90328e9db84c5405db01c52908bfac9286d;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod
index ab19de8..6bf1428 100644
--- a/pod/perlfaq6.pod
+++ b/pod/perlfaq6.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq6 - Regular Expressions ($Revision: 7910 $)
+perlfaq6 - Regular Expressions ($Revision: 10126 $)
 
 =head1 DESCRIPTION
 
@@ -153,9 +153,8 @@ Here's another example of using C<..>:
 X<$/, regexes in> X<$INPUT_RECORD_SEPARATOR, regexes in>
 X<$RS, regexes in>
 
-Up to Perl 5.8.0, $/ has to be a string. This may change in 5.10,
-but don't get your hopes up. Until then, you can use these examples
-if you really need to do this.
+$/ has to be a string. You can use these examples if you really need to
+do this.
 
 If you have File::Stream, this is easy.
 
@@ -338,32 +337,63 @@ The use of C<\Q> causes the <.> in the regex to be treated as a
 regular character, so that C matches a C followed by a dot.
 
 =head2 What is C</o> really for?
-X
+X X
 
-Using a variable in a regular expression match forces a re-evaluation
-(and perhaps recompilation) each time the regular expression is
-encountered. The C</o> modifier locks in the regex the first time
-it's used. This always happens in a constant regular expression, and
-in fact, the pattern was compiled into the internal format at the same
-time your entire program was.
+(contributed by brian d foy)
 
-Use of C</o> is irrelevant unless variable interpolation is used in
-the pattern, and if so, the regex engine will neither know nor care
-whether the variables change after the pattern is evaluated the I time.
+The C</o> option for regular expressions (documented in L and
+L) tells Perl to compile the regular expression only once.
+This is only useful when the pattern contains a variable. Perls 5.6
+and later handle this automatically if the pattern does not change.
 
-C</o> is often used to gain an extra measure of efficiency by not
-performing subsequent evaluations when you know it won't matter
-(because you know the variables won't change), or more rarely, when
-you don't want the regex to notice if they do.
+Since the match operator C<m//>, the substitution operator C<s///>,
+and the regular expression quoting operator C<qr//> are double-quotish
+constructs, you can interpolate variables into the pattern. See the
+answer to "How can I quote a variable to use in a regex?" for more
+details.
 
-For example, here's a "paragrep" program:
+This example takes a regular expression from the argument list and
+prints the lines of input that match it:
 
-    $/ = ''; # paragraph mode
-    $pat = shift;
-    while (<>) {
-        print if /$pat/o;
-    }
+    my $pattern = shift @ARGV;
+
+    while( <> ) {
+        print if m/$pattern/;
+    }
+
+Versions of Perl prior to 5.6 would recompile the regular expression
+for each iteration, even if C<$pattern> had not changed.
+The C</o> would prevent this by telling Perl to compile the pattern the first
+time, then reuse that for subsequent iterations:
+
+    my $pattern = shift @ARGV;
+
+    while( <> ) {
+        print if m/$pattern/o; # useful for Perl < 5.6
+    }
+
+In versions 5.6 and later, Perl won't recompile the regular expression
+if the variable hasn't changed, so you probably don't need the C</o>
+option. It doesn't hurt, but it doesn't help either. If you want any
+version of Perl to compile the regular expression only once even if
+the variable changes (thus, only using its initial value), you still
+need the C</o>.
+
+You can watch Perl's regular expression engine at work to verify for
+yourself if Perl is recompiling a regular expression. The C<use re
+'debug'> pragma (comes with Perl 5.005 and later) shows the details.
+With Perls before 5.6, you should see C<re> reporting that its
+compiling the regular expression on each iteration. With Perl 5.6 or
+later, you should only see C<re> report that for the first iteration.
+
+    use re 'debug';
+
+    $regex = 'Perl';
+    foreach ( qw(Perl Java Ruby Python) ) {
+        print STDERR "-" x 73, "\n";
+        print STDERR "Trying $_...\n";
+        print STDERR "\t$_ is good!\n" if m/$regex/;
+    }
 
 =head2 How do I use a regular expression to strip C style comments from a file?
 
@@ -575,7 +605,7 @@ but faster.
         {
         foreach $pattern ( @patterns )
             {
-            print if /\b$pattern\b/i;
+            print if /$pattern/i;
             next LINE;
             }
         }
@@ -684,14 +714,14 @@ string where the last match left off. The regular
 expression engine cannot skip over any characters to find
 the next match with this anchor, so C<\G> is similar to the
 beginning of string anchor, C<^>. The C<\G> anchor is typically
-used with the C<g> flag. It uses the value of pos()
+used with the C<g> flag. It uses the value of C<pos()>
 as the position to start the next match.
 As the match
-operator makes successive matches, it updates pos() with the
+operator makes successive matches, it updates C<pos()> with the
 position of the next character past the last match (or the
 first character of the next match, depending on how you like
-to look at it). Each string has its own pos() value.
+to look at it). Each string has its own C<pos()> value.
 
-Suppose you want to match all of consective pairs of digits
+Suppose you want to match all of consecutive pairs of digits
 in a string like "1122a44" and stop matching when you
 encounter non-digits. You want to match C<11> and C<22> but
 the letter shows up between C<22> and C<44> and you want
@@ -701,7 +731,7 @@ the C<a> and still matches C<44>.
     $_ = "1122a44";
     my @pairs = m/(\d\d)/g; # qw( 11 22 44 )
 
-If you use the \G anchor, you force the match after C<22> to
+If you use the C<\G> anchor, you force the match after C<22> to
 start with the C<a>. The regular expression cannot match there
 since it does not find a digit, so the next match fails and
 the match operator returns the pairs it already
@@ -719,7 +749,7 @@ still need the C<g> flag.
         print "Found $1\n";
         }
 
-After the match fails at the letter C<a>, perl resets pos()
+After the match fails at the letter C<a>, perl resets C<pos()>
 and the next match on the same string starts at the beginning.
 
     $_ = "1122a44";
@@ -730,13 +760,13 @@ and the next match on the same string starts at the beginning.
 
     print "Found $1 after while" if m/(\d\d)/g; # finds "11"
 
-You can disable pos() resets on fail with the C<c> flag.
-Subsequent matches start where the last successful match
-ended (the value of pos()) even if a match on the same
-string as failed in the meantime. In this case, the match
-after the while() loop starts at the C<a> (where the last
-match stopped), and since it does not use any anchor it can
-skip over the C<a> to find "44".
+You can disable C<pos()> resets on fail with the C<c> flag, documented
+in L and L.
+Subsequent matches start where the last
+successful match ended (the value of C<pos()>) even if a match on the
+same string has failed in the meantime. In this case, the match after
+the C<while()> loop starts at the C<a> (where the last match stopped),
+and since it does not use any anchor it can skip over the C<a> to find
+C<44>.
 
     $_ = "1122a44";
     while( m/\G(\d\d)/gc )
@@ -761,7 +791,7 @@ which works in 5.004 or later.
         }
     }
 
-For each line, the PARSER loop first tries to match a series
+For each line, the C<PARSER> loop first tries to match a series
 of digits followed by a word boundary. This match has to
 start at the place the last match left off (or the beginning
 of the string on the first match). Since C for source control details and availability.
 
 =head1 AUTHOR AND COPYRIGHT
 
-Copyright (c) 1997-2006 Tom Christiansen, Nathan Torkington, and
+Copyright (c) 1997-2007 Tom Christiansen, Nathan Torkington, and
 other authors as noted. All rights reserved.
 
 This documentation is free; you can redistribute it and/or modify it