littered with answers involving regular expressions. For example,
decoding a URL and checking whether something is a number are handled
with regular expressions, but those answers are found elsewhere in
-this document (in the section on Data and the Networking one on
-networking, to be precise).
+this document (in L<perlfaq9>: ``How do I decode or create those %-encodings
+on the web'' and L<perlfaq4>: ``How do I determine whether a scalar is
+a number/whole/integer/float'', to be precise).
=head2 How can I hope to use regular expressions without creating illegible and unmaintainable code?
$file->waitfor('/second line\n/');
print $file->getline;
-=head2 How do I substitute case insensitively on the LHS, but preserving case on the RHS?
+=head2 How do I substitute case insensitively on the LHS while preserving case on the RHS?
Here's a lovely Perlish solution by Larry Rosler. It exploits
properties of bitwise xor on ASCII strings.
=head2 What is C</o> really for?
Using a variable in a regular expression match forces a re-evaluation
-(and perhaps recompilation) each time through. The C</o> modifier
-locks in the regex the first time it's used. This always happens in a
-constant regular expression, and in fact, the pattern was compiled
-into the internal format at the same time your entire program was.
+(and perhaps recompilation) each time the regular expression is
+encountered. The C</o> modifier locks in the regex the first time
+it's used. This always happens in a constant regular expression, and
+in fact, the pattern was compiled into the internal format at the same
+time your entire program was.
Use of C</o> is irrelevant unless variable interpolation is used in
the pattern, and if so, the regex engine will neither know nor care
=head2 Can I use Perl regular expressions to match balanced text?
Although Perl regular expressions are more powerful than "mathematical"
-regular expressions, because they feature conveniences like backreferences
-(C<\1> and its ilk), they still aren't powerful enough -- with
+regular expressions because they feature conveniences like backreferences
+(C<\1> and its ilk), they still aren't powerful enough--with
the possible exception of bizarre and experimental features in the
development-track releases of Perl. You still need to use non-regex
techniques to parse balanced text, such as the text enclosed between
or C<(> and C<)> can be found in
http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz .
-The C::Scan module from CPAN contains such subs for internal usage,
+The C::Scan module from CPAN contains such subs for internal use,
but they are undocumented.
=head2 What does it mean that regexes are greedy? How can I get around it?
control on to whatever is next in line, like you would if you were
playing hot potato.
-=head2 How do I process each word on each line?
+=head2 How do I process each word on each line?
Use the split function:
print "$count $line";
}
-If you want these output in a sorted order, see the section on Hashes.
+If you want these output in a sorted order, see L<perlfaq4>: ``How do I
+sort a hash (optionally by value instead of key)?''.
=head2 How can I do approximate matching?
=head2 Why don't word-boundary searches with C<\b> work for me?
-Two common misconceptions are that C<\b> is a synonym for C<\s+>, and
+Two common misconceptions are that C<\b> is a synonym for C<\s+> and
that it's the edge between whitespace characters and non-whitespace
characters. Neither is correct. C<\b> is the place between a C<\w>
character and a C<\W> character (that is, C<\b> is the edge of a
=head2 Why does using $&, $`, or $' slow my program down?
-Because once Perl sees that you need one of these variables anywhere in
-the program, it has to provide them on each and every pattern match.
+Once Perl sees that you need one of these variables anywhere in
+the program, it provides them on each and every pattern match.
The same mechanism that handles these provides for the use of $1, $2,
etc., so you pay the same price for each regex that contains capturing
-parentheses. But if you never use $&, etc., in your script, then regexes
+parentheses. If you never use $&, etc., in your script, then regexes
I<without> capturing parentheses won't be penalized. So avoid $&, $',
and $` if you can, but if you can't, once you've used them at all, use
them at will because you've already paid the price. Remember that some
}
}
-But then you lose the vertical alignment of the regular expressions.
+but then you lose the vertical alignment of the regular expressions.
=head2 Are Perl regexes DFAs or NFAs? Are they POSIX compliant?
chomp($pattern = <STDIN>);
if ($line =~ /$pattern/) { }
-Or, since you have no guarantee that your user entered
+Alternatively, since you have no guarantee that your user entered
a valid regular expression, trap the exception this way:
if (eval { $line =~ /$pattern/ }) { }
-But if all you really want to search for a string, not a pattern,
+If all you really want is to search for a string, not a pattern,
then you should either use the index() function, which is made for
string searching, or if you can't be disabused of using a pattern
match on a non-pattern, then be sure to use C<\Q>...C<\E>, documented