=head1 DESCRIPTION
-This page describes the syntax of regular expressions in Perl. For a
-description of how to I<use> regular expressions in matching
-operations, plus various examples of the same, see discussions
-of C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like Operators">.
+This page describes the syntax of regular expressions in Perl.
+
+If you haven't used regular expressions before, a quick-start
+introduction is available in L<perlrequick>, and a longer tutorial
+introduction is available in L<perlretut>.
+
+For reference on how regular expressions are used in matching
+operations, plus various examples of the same, see discussions of
+C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like
+Operators">.
Matching operations can have various modifiers. Modifiers
that relate to the interpretation of the regular expression inside
"^" to match only at the beginning of the string and "$" to match
only at the end (or just before a newline at the end) of the string.
Together, as /ms, they let the "." match any character whatsoever,
-while yet allowing "^" and "$" to match, respectively, just after
+while still allowing "^" and "$" to match, respectively, just after
and just before newlines within the string.
=item x
This is usually 32766 on the most common platforms. The actual limit can
be seen in the error message generated by code such as this:
- $_ **= $_ , / {$_} / for 2 .. 42;
+ $_ **= $_ , / {$_} / for 2 .. 42;
By default, a quantified subpattern is "greedy", that is, it will match as
many times as possible (given a particular starting location) while still
\x1B hex char
\x{263a} wide hex char (Unicode SMILEY)
\c[ control char
- \C{name} named char
+ \N{name} named char
\l lowercase next char (think vi)
\u uppercase next char (think vi)
\L lowercase till \E (think vi)
If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
and C<\U> is taken from the current locale. See L<perllocale>. For
-documentation of C<\C{name}>, see L<charnames>.
+documentation of C<\N{name}>, see L<charnames>.
You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
An unescaped C<$> or C<@> interpolates the corresponding variable,
In addition, Perl defines the following:
\w Match a "word" character (alphanumeric plus "_")
- \W Match a non-word character
+ \W Match a non-"word" character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\PP Match non-P
\X Match eXtended Unicode "combining character sequence",
equivalent to C<(?:\PM\pM*)>
- \O Match a single C char (octet) even under utf8.
+ \C Match a single C char (octet) even under utf8.
-A C<\w> matches a single alphanumeric character, not a whole word.
+A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
Use C<\w+> to match a string of Perl-identifier characters (which isn't
the same as matching an English word). If C<use locale> is in effect, the
list of alphabetic characters generated by C<\w> is taken from the
current locale. See L<perllocale>. You may use C<\w>, C<\W>, C<\s>, C<\S>,
-C<\d>, and C<\D> within character classes (though not as either end of
-a range). See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.
+C<\d>, and C<\D> within character classes, but if you try to use them
+as endpoints of a range, it is not treated as a range: the "-" is
+understood literally.
+See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.
The POSIX character class syntax
- [:class:]
+ [:class:]
-is also available. The available classes and their \-equivalents
-(if any) are as follows:
+is also available. The available classes and their backslash
+equivalents (if available) are as follows:
alpha
alnum
ascii
+ blank [1]
cntrl
digit \d
graph
lower
print
punct
- space \s
+ space \s [2]
upper
- word \w
+ word \w [3]
xdigit
-Note that the [] are part of the [::] construct, not part of the whole
-character class. For example:
+ [1] A GNU extension equivalent to C<[ \t]>, `all horizontal whitespace'.
+ [2] Not I<exactly equivalent> to C<\s> since the C<[[:space:]]> includes
+ also the (very rare) `vertical tabulator', "\ck", chr(11).
+ [3] A Perl extension.
- [01[:alpha:]%]
+For example, use C<[:upper:]> to match all the uppercase characters.
+Note that the C<[]> are part of the C<[::]> construct, not part of the
+whole character class. For example:
-matches one, zero, any alphabetic character, and the percentage sign.
+ [01[:alpha:]%]
-The exact meanings of the above classes depend from many things:
-if the C<utf8> pragma is used, the following equivalenced to Unicode
-\p{} constructs hold:
+matches zero, one, any alphabetic character, and the percentage sign.
+
+If the C<utf8> pragma is used, the following equivalences to Unicode
+\p{} constructs, and to equivalent backslash character classes (if
+available), will hold:
alpha IsAlpha
alnum IsAlnum
ascii IsASCII
+ blank IsSpace
cntrl IsCntrl
- digit IsDigit
+ digit IsDigit \d
graph IsGraph
lower IsLower
print IsPrint
punct IsPunct
space IsSpace
+ IsSpacePerl \s
upper IsUpper
word IsWord
xdigit IsXDigit
-For example, [:lower:] and \p{IsLower} are equivalent.
+For example, C<[:lower:]> and C<\p{IsLower}> are equivalent.
If the C<utf8> pragma is not used but the C<locale> pragma is, the
-classes correlate with the isalpha(3) interface (except for `word',
-which is a Perl extension).
+classes correlate with the usual isalpha(3) interface (except for
+`word' and `blank').
The assumedly non-obviously named classes are:
=item cntrl
- Any control character. Usually characters that don't produce
- output as such but instead control the terminal somehow:
- for example newline and backspace are control characters.
+Any control character. Usually characters that don't produce output as
+such but instead control the terminal somehow: for example newline and
+backspace are control characters. All characters with ord() less than
+32 are most often classified as control characters (assuming ASCII,
+the ISO Latin character sets, and Unicode), as is the character with
+the ord() value of 127 (C<DEL>).
=item graph
- Any alphanumeric or punctuation character.
+Any alphanumeric or punctuation (special) character.
=item print
- Any alphanumeric or punctuation character or space.
+Any alphanumeric or punctuation (special) character or the space character.
=item punct
- Any punctuation character.
+Any punctuation (special) character.
=item xdigit
- Any hexadecimal digit. Though this may feel silly
- (/0-9a-f/i would work just fine) it is included
- for completeness.
-
-=item
+Any hexadecimal digit. Though this may feel silly ([0-9A-Fa-f] would
+work just fine) it is included for completeness.
=back
You can negate the [::] character classes by prefixing the class name
with a '^'. This is a Perl extension. For example:
- ^digit \D \P{IsDigit}
- ^space \S \P{IsSpace}
- ^word \W \P{IsWord}
+ POSIX trad. Perl utf8 Perl
-The POSIX character classes [.cc.] and [=cc=] are B<not> supported
-and trying to use them will cause an error.
+ [:^digit:] \D \P{IsDigit}
+ [:^space:] \S \P{IsSpace}
+ [:^word:] \W \P{IsWord}
+
+The POSIX character classes [.cc.] and [=cc=] are recognized but
+B<not> supported, and trying to use them will cause an error.
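
As a minimal sketch of the POSIX class syntax described above (the sample strings and variable names are illustrative assumptions, not part of the original text), note that each C<[:class:]> has to sit inside an enclosing bracketed character class:

```perl
use strict;
use warnings;

# POSIX classes are only valid inside a bracketed character class.
my $has_upper  = "Perl 5" =~ /[[:upper:]]/    ? 1 : 0;  # 'P' is uppercase
my $non_digits = join '', "a1b22c" =~ /[[:^digit:]]/g;  # negated class, like \D
my $mixed      = "0x%" =~ /^[01[:alpha:]%]+$/ ? 1 : 0;  # 0, x and % all belong

print "$has_upper $non_digits $mixed\n";
```

The last pattern mixes literal characters with a POSIX class in one bracketed class, which is the case the C<[01[:alpha:]%]> example illustrates.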
Perl defines the following zero-width assertions:
\A Match only at beginning of string
\Z Match only at end of string, or before newline at the end
\z Match only at end of string
- \G Match only where previous m//g left off (works only with /g)
+ \G Match only at pos() (e.g. at the end-of-match position
+ of prior m//g)
A word boundary (C<\b>) is a spot between two characters
that has a C<\w> on one side of it and a C<\W> on the other side
of your string, see the previous reference. The actual location
where C<\G> will match can also be influenced by using C<pos()> as
an lvalue. See L<perlfunc/pos>.
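
The following sketch, which assumes an illustrative string and variable names, shows how assigning to C<pos()> moves the point where C<\G> is allowed to match:

```perl
use strict;
use warnings;

my $s = "12ab34";

# With pos() at 0, \G pins the match to offset 0, where a digit sits.
pos($s) = 0;
my $at_start = $s =~ /\G[a-z]+/g ? 1 : 0;

# Moving pos() to offset 2 lets the same pattern match there.
pos($s) = 2;
my $word;
if ($s =~ /\G([a-z]+)/g) {
    $word = $1;
}
my $after = pos($s);   # offset just past the matched letters

print "$at_start $word $after\n";
```

Both matches are done in scalar context so that C<m//g> records its end position in C<pos()> instead of iterating over the whole string.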
-
+
The bracketing construct C<( ... )> creates capture buffers. To
-refer to the digit'th buffer use \E<lt>digitE<gt> within the
+refer to the digit'th buffer use \<digit> within the
match. Outside the match use "$" instead of "\". (The
-\E<lt>digitE<gt> notation works in certain circumstances outside
+\<digit> notation works in certain circumstances outside
the match. See the warning below about \1 vs $1 for details.)
Referring back to another part of the match is called a
I<backreference>.
There is no limit to the number of captured substrings that you may
use. However Perl also uses \10, \11, etc. as aliases for \010,
-\011, etc. (Recall that 0 means octal, so \011 is the 9'th ASCII
-character, a tab.) Perl resolves this ambiguity by interpreting
-\10 as a backreference only if at least 10 left parentheses have
-opened before it. Likewise \11 is a backreference only if at least
-11 left parentheses have opened before it. And so on. \1 through
-\9 are always interpreted as backreferences."
+\011, etc. (Recall that 0 means octal, so \011 is the character at
+number 9 in your coded character set, which would be the 10th character,
+a horizontal tab under ASCII.) Perl resolves this
+ambiguity by interpreting \10 as a backreference only if at least 10
+left parentheses have opened before it. Likewise \11 is a
+backreference only if at least 11 left parentheses have opened
+before it. And so on. \1 through \9 are always interpreted as
+backreferences.
Examples:
if (/(.)\1/) { # find first doubled char
print "'$1' is the first doubled character\n";
}
-
+
if (/Time: (..):(..):(..)/) { # parse out values
$hours = $1;
$minutes = $2;
$seconds = $3;
}
-
+
Several special variables also refer back to portions of the previous
match. C<$+> returns whatever the last bracket match matched.
C<$&> returns the entire matched string. (At one point C<$0> did
after the matched string.
The numbered variables ($1, $2, $3, etc.) and the related punctuation
-set (C<<$+>, C<$&>, C<$`>, and C<$'>) are all dynamically scoped
+set (C<$+>, C<$&>, C<$`>, and C<$'>) are all dynamically scoped
until the end of the enclosing block or until the next successful
match, whichever comes first. (See L<perlsyn/"Compound Statements">.)
Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
C<\w>, C<\n>. Unlike some other regular expression languages, there
are no backslashed symbols that aren't alphanumeric. So anything
-that looks like \\, \(, \), \E<lt>, \E<gt>, \{, or \} is always
+that looks like \\, \(, \), \<, \>, \{, or \} is always
interpreted as a literal character, not a metacharacter. This was
once used in a common idiom to disable or quote the special meanings
of regular expression metacharacters in a string that you want to
-use for a pattern. Simply quote all non-alphanumeric characters:
+use for a pattern. Simply quote all non-"word" characters:
$pattern =~ s/(\W)/\\$1/g;
+(If C<use locale> is set, then this depends on the current locale.)
Today it is more common to use the quotemeta() function or the C<\Q>
metaquoting escape sequence to disable all metacharacters' special
meanings like this:
/$unquoted\Q$quoted\E$unquoted/
+Beware that if you put literal backslashes (those not inside
+interpolated variables) between C<\Q> and C<\E>, double-quotish
+backslash interpolation may lead to confusing results. If you
+I<need> to use literal backslashes within C<\Q...\E>,
+consult L<perlop/"Gory details of parsing quoted constructs">.
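
A small sketch of the quoting behaviour described above; the sample strings and variable names are assumptions chosen for illustration:

```perl
use strict;
use warnings;

my $literal = 'a.b*c';              # contains two regex metacharacters
my $quoted  = quotemeta($literal);  # backslash-escapes the non-"word" chars

my $exact  = 'a.b*c' =~ /^\Q$literal\E$/ ? 1 : 0;  # matches itself literally
my $loose  = 'aXbbc' =~ /^$literal$/     ? 1 : 0;  # unquoted: . and * are live
my $strict = 'aXbbc' =~ /^\Q$literal\E$/ ? 1 : 0;  # quoted: no longer matches

print "$quoted $exact $loose $strict\n";
```

Interpolating the variable without C<\Q> lets C<.> and C<*> keep their special meaning, which is why the second test string matches only the unquoted pattern.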
+
=head2 Extended Patterns
Perl also defines a consistent extension syntax for features not
For look-behind see below.
-=item C<(?E<lt>=pattern)>
+=item C<(?<=pattern)>
-A zero-width positive look-behind assertion. For example, C</(?E<lt>=\t)\w+/>
+A zero-width positive look-behind assertion. For example, C</(?<=\t)\w+/>
matches a word that follows a tab, without including the tab in C<$&>.
Works only for fixed-width look-behind.
Better yet, use the carefully constrained evaluation within a Safe
module. See L<perlsec> for details about both these mechanisms.
-=item C<(?p{ code })>
+=item C<(??{ code })>
B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.
+A simplified version of the syntax may be introduced for commonly
+used idioms.
This is a "postponed" regular subexpression. The C<code> is evaluated
at run time, at the moment this subexpression may match. The result
(?:
(?> [^()]+ ) # Non-parens without backtracking
|
- (?p{ $re }) # Group with matching parens
+ (??{ $re }) # Group with matching parens
)*
\)
}x;
-=item C<(?E<gt>pattern)>
+=item C<< (?>pattern) >>
B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.
An "independent" subexpression, one which matches the substring
that a I<standalone> C<pattern> would match if anchored at the given
-position--but it matches no more than this substring. This
+position, and it matches I<nothing other than this substring>. This
construct is useful for optimizations of what would otherwise be
"eternal" matches, because it will not backtrack (see L<"Backtracking">).
+It may also be useful in places where the "grab all you can, and do not
+give anything back" semantic is desirable.
-For example: C<^(?E<gt>a*)ab> will never match, since C<(?E<gt>a*)>
+For example: C<< ^(?>a*)ab >> will never match, since C<< (?>a*) >>
(anchored at the beginning of string, as above) will match I<all>
characters C<a> at the beginning of string, leaving no C<a> for
C<ab> to match. In contrast, C<a*ab> will match the same as C<a+b>,
C<a*ab> will match fewer characters than a standalone C<a*>, since
this makes the tail match.
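
The contrast can be checked directly. This is only a sketch, with an illustrative test string and variable names that are not part of the original text:

```perl
use strict;
use warnings;

my $string = 'aaab';

# The independent group keeps every leading "a", so "ab" has nothing to match.
my $atomic = $string =~ /^(?>a*)ab/ ? 1 : 0;

# The ordinary quantifier gives one "a" back so that the tail can match.
my $plain  = $string =~ /^a*ab/     ? 1 : 0;

print "$atomic $plain\n";
```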
-An effect similar to C<(?E<gt>pattern)> may be achieved by writing
+An effect similar to C<< (?>pattern) >> may be achieved by writing
C<(?=(pattern))\1>. This matches the same substring as a standalone
C<a+>, and the following C<\1> eats the matched string; it therefore
-makes a zero-length assertion into an analogue of C<(?E<gt>...)>.
+makes a zero-length assertion into an analogue of C<< (?>...) >>.
(The difference between these two constructs is that the second one
uses a capturing group, thus shifting ordinals of backreferences
in the rest of a regular expression.)
m{ \(
(
- [^()]+
+ [^()]+ # x+
|
\( [^()]* \)
)+
m{ \(
(
- (?> [^()]+ )
+ (?> [^()]+ ) # change x+ above to (?> x+ )
|
\( [^()]* \)
)+
\)
}x
-which uses C<(?E<gt>...)> matches exactly when the one above does (verifying
+which uses C<< (?>...) >> matches exactly when the one above does (verifying
this yourself would be a productive exercise), but finishes in a fourth
the time when used on a similar string with 1000000 C<a>s. Be aware,
however, that this pattern currently triggers a warning message under
-B<-w> saying it C<"matches the null string many times">):
+the C<use warnings> pragma or B<-w> switch saying it
+C<"matches null string many times in regex">.
-On simple groups, such as the pattern C<(?E<gt> [^()]+ )>, a comparable
+On simple groups, such as the pattern C<< (?> [^()]+ ) >>, a comparable
effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>.
This was only 4 times slower on a string with 1000000 C<a>s.
+The "grab all you can, and do not give anything back" semantic is desirable
+in many situations where at first sight a simple C<()*> looks like
+the correct solution. Suppose we parse text with comments being delimited
+by C<#> followed by some optional (horizontal) whitespace. Contrary to
+its appearance, C<#[ \t]*> I<is not> the correct subexpression to match
+the comment delimiter, because it may "give up" some whitespace if
+the remainder of the pattern can be made to match that way. The correct
+answer is either one of these:
+
+ (?>#[ \t]*)
+ #[ \t]*(?![ \t])
+
+For example, to grab non-empty comments into $1, one should use either
+one of these:
+
+ / (?> \# [ \t]* ) ( .+ ) /x;
+ / \# [ \t]* ( [^ \t] .* ) /x;
+
+Which one you pick depends on which of these expressions better reflects
+the above specification of comments.
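
To see the difference in practice, here is a sketch that assumes a comment consisting of C<#> followed only by spaces (the strings and variable names are illustrative):

```perl
use strict;
use warnings;

my $blank = "#  ";   # a '#' followed only by two spaces

# The plain quantifier gives one space back, so .+ "finds" a comment body.
my $naive    = $blank =~ / \# [ \t]* ( .+ ) /x         ? 1 : 0;

# Both corrected forms refuse to treat whitespace as comment text.
my $atomic   = $blank =~ / (?> \# [ \t]* ) ( .+ ) /x   ? 1 : 0;
my $explicit = $blank =~ / \# [ \t]* ( [^ \t] .* ) /x  ? 1 : 0;

# A comment with real content is captured without the leading whitespace.
my ($text) = "#  note" =~ / (?> \# [ \t]* ) ( .+ ) /x;

print "$naive $atomic $explicit $text\n";
```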
+
=item C<(?(condition)yes-pattern|no-pattern)>
=item C<(?(condition)yes-pattern)>
=head2 Backtracking
+NOTE: This section presents an abstract approximation of regular
+expression behavior. For a more rigorous (and complicated) view of
+the rules involved in selecting a match among possible alternatives,
+see L<Combining pieces together>.
+
A fundamental feature of regular expression matching involves the
notion called I<backtracking>, which is currently used (when needed)
by all regular expression quantifiers, namely C<*>, C<*?>, C<+>,
-C<+?>, C<{n,m}>, and C<{n,m}?>.
+C<+?>, C<{n,m}>, and C<{n,m}?>. Backtracking is often optimized
+internally, but the general principle outlined here is valid.
For a regular expression to match, the I<entire> regular expression must
match, not just part of it. So if the beginning of a pattern containing a
got <d is under the >
Here's another example: let's say you'd like to match a number at the end
-of a string, and you also want to keep the preceding part the match.
+of a string, and you also want to keep the preceding part of the match.
So you write this:
$_ = "I have 2 numbers: 53147";
But that isn't going to match; at least, not the way you're hoping. It
claims that there is no 123 in the string. Here's a clearer picture of
-why it that pattern matches, contrary to popular expectations:
+why that pattern matches, contrary to popular expectations:
$x = 'ABC123' ;
$y = 'ABC445' ;
B<WARNING>: particularly complicated regular expressions can take
exponential time to solve because of the immense number of possible
-ways they can use backtracking to try match. For example, this will
-take a painfully long time to run
-
- /((a{0,5}){0,5}){0,5}/
-
-And if you used C<*>'s instead of limiting it to 0 through 5 matches,
-then it would take forever--or until you ran out of stack space.
-
-A powerful tool for optimizing such beasts is "independent" groups,
-which do not backtrace (see L<C<(?E<gt>pattern)>>). Note also that
-zero-length look-ahead/look-behind assertions will not backtrace to make
+ways they can use backtracking to try to match. For example, without
+internal optimizations done by the regular expression engine, this will
+take a painfully long time to run:
+
+ 'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
+
+And if you used C<*>'s in the internal groups instead of limiting them
+to 0 through 5 matches, then it would take forever--or until you ran
+out of stack space. Moreover, these internal optimizations are not
+always applicable. For example, if you put C<{0,5}> instead of C<*>
+on the external group, no current optimization is applicable, and the
+match takes a long time to finish.
+
+A powerful tool for optimizing such beasts is what is known as an
+"independent group",
+which does not backtrack (see L<C<< (?>pattern) >>>). Note also that
+zero-length look-ahead/look-behind assertions will not backtrack to make
the tail match, since they are in "logical" context: only
whether they match is considered relevant. For an example
-where side-effects of a look-ahead I<might> have influenced the
-following match, see L<C<(?E<gt>pattern)>>.
+where side-effects of look-ahead I<might> have influenced the
+following match, see L<C<< (?>pattern) >>>.
=head2 Version 8 Regular Expressions
first character after the "[" is "^", the class matches any character not
in the list. Within a list, the "-" character specifies a
range, so that C<a-z> represents all characters between "a" and "z",
-inclusive. If you want "-" itself to be a member of a class, put it
-at the start or end of the list, or escape it with a backslash. (The
+inclusive. If you want either "-" or "]" itself to be a member of a
+class, put it at the start of the list (possibly after a "^"), or
+escape it with a backslash. "-" is also taken literally when it is
+at the end of the list, just before the closing "]". (The
following all specify the same class of three characters: C<[-az]>,
C<[az-]>, and C<[a\-z]>. All are different from C<[a-z]>, which
-specifies a class containing twenty-six characters.)
+specifies a class containing twenty-six characters, even on EBCDIC
+based coded character sets.) Also, if you try to use the character
+classes C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, or C<\D> as endpoints of
+a range, it is not treated as a range: the "-" is understood literally.
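
A brief sketch of the three equivalent spellings compared with a true range; the variable names are illustrative assumptions:

```perl
use strict;
use warnings;

# Three ways of writing the class containing "-", "a" and "z".
my @classes = (qr/^[-az]$/, qr/^[az-]$/, qr/^[a\-z]$/);

my $hyphen_hits = grep { '-' =~ $_ } @classes;   # how many accept "-"
my $m_hits      = grep { 'm' =~ $_ } @classes;   # how many accept "m"

# A genuine range accepts "m" but not "-".
my $range_m      = 'm' =~ /^[a-z]$/ ? 1 : 0;
my $range_hyphen = '-' =~ /^[a-z]$/ ? 1 : 0;

print "$hyphen_hits $m_hits $range_m $range_hyphen\n";
```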
Note also that the whole range idea is rather unportable between
character sets--and even within character sets they may cause results
Characters may be specified using a metacharacter syntax much like that
used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
"\f" a form feed, etc. More generally, \I<nnn>, where I<nnn> is a string
-of octal digits, matches the character whose ASCII value is I<nnn>.
-Similarly, \xI<nn>, where I<nn> are hexadecimal digits, matches the
-character whose ASCII value is I<nn>. The expression \cI<x> matches the
-ASCII character control-I<x>. Finally, the "." metacharacter matches any
-character except "\n" (unless you use C</s>).
+of octal digits, matches the character whose coded character set value
+is I<nnn>. Similarly, \xI<nn>, where I<nn> are hexadecimal digits,
+matches the character whose numeric value is I<nn>. The expression \cI<x>
+matches the character control-I<x>. Finally, the "." metacharacter
+matches any character except "\n" (unless you use C</s>).
You can specify a series of alternatives for a pattern using "|" to
separate them, so that C<fee|fie|foe> will match any of "fee", "fie",
@chars = split //, $string; # // is not magic in split
($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
-Thus Perl allows the C</()/> construct, which I<forcefully breaks
+Thus Perl allows such constructs, by I<forcefully breaking
the infinite loop>. The rules for this are different for lower-level
loops given by the greedy modifiers C<*+{}>, and for higher-level
ones like the C</g> modifier or split() operator.
$_ = 'bar';
s/\w??/<$&>/g;
-results in C<"<><b><><a><><r><>">. At each position of the string the best
+results in C<< <><b><><a><><r><> >>. At each position of the string the best
match given by non-greedy C<??> is the zero-length match, and the I<second
best> match is what is matched by C<\w>. Thus zero-length matches
alternate with one-character-long matches.
The additional state of being I<matched with zero-length> is associated with
the matched string, and is reset by each assignment to pos().
+Zero-length matches at the end of the previous match are ignored
+during C<split>.
+
+=head2 Combining pieces together
+
+Each of the elementary pieces of regular expressions which were described
+before (such as C<ab> or C<\Z>) could match at most one substring
+at the given position of the input string. However, in a typical regular
+expression these elementary pieces are combined into more complicated
+patterns using combining operators C<ST>, C<S|T>, C<S*> etc.
+(in these examples C<S> and C<T> are regular subexpressions).
+
+Such combinations can include alternatives, leading to a problem of choice:
+if we match a regular expression C<a|ab> against C<"abc">, will it match
+substring C<"a"> or C<"ab">? One way to describe which substring is
+actually matched is the concept of backtracking (see L<"Backtracking">).
+However, this description is too low-level and makes you think
+in terms of a particular implementation.
+
+Another description starts with notions of "better"/"worse". All the
+substrings which may be matched by the given regular expression can be
+sorted from the "best" match to the "worst" match, and it is the "best"
+match which is chosen. This replaces the question of "what is chosen?"
+with the question of "which matches are better, and which are worse?".
+
+Again, for elementary pieces there is no such question, since at most
+one match at a given position is possible. This section describes the
+notion of better/worse for combining operators. In the description
+below C<S> and C<T> are regular subexpressions.
+
+=over 4
+
+=item C<ST>
+
+Consider two possible matches, C<AB> and C<A'B'>, where C<A> and C<A'>
+are substrings which can be matched by C<S>, and C<B> and C<B'> are
+substrings which can be matched by C<T>.
+
+If C<A> is a better match for C<S> than C<A'>, C<AB> is a better
+match than C<A'B'>.
+
+If C<A> and C<A'> coincide: C<AB> is a better match than C<AB'> if
+C<B> is a better match for C<T> than C<B'>.
+
+=item C<S|T>
+
+When C<S> can match, it is a better match than when only C<T> can match.
+
+The ordering of two matches for C<S> is the same as for C<S> alone.
+Similarly for two matches for C<T>.
+
+=item C<S{REPEAT_COUNT}>
+
+Matches as C<SSS...S> (repeated as many times as necessary).
+
+=item C<S{min,max}>
+
+Matches as C<S{max}|S{max-1}|...|S{min+1}|S{min}>.
+
+=item C<S{min,max}?>
+
+Matches as C<S{min}|S{min+1}|...|S{max-1}|S{max}>.
+
+=item C<S?>, C<S*>, C<S+>
+
+Same as C<S{0,1}>, C<S{0,BIG_NUMBER}>, C<S{1,BIG_NUMBER}> respectively.
+
+=item C<S??>, C<S*?>, C<S+?>
+
+Same as C<S{0,1}?>, C<S{0,BIG_NUMBER}?>, C<S{1,BIG_NUMBER}?> respectively.
+
+=item C<< (?>S) >>
+
+Matches the best match for C<S> and only that.
+
+=item C<(?=S)>, C<(?<=S)>
+
+Only the best match for C<S> is considered. (This is important only if
+C<S> has capturing parentheses, and backreferences are used somewhere
+else in the whole regular expression.)
+
+=item C<(?!S)>, C<(?<!S)>
+
+For this grouping operator there is no need to describe the ordering, since
+only whether or not C<S> can match is important.
+
+=item C<(??{ EXPR })>
+
+The ordering is the same as for the regular expression which is
+the result of EXPR.
+
+=item C<(?(condition)yes-pattern|no-pattern)>
+
+Recall that which of C<yes-pattern> or C<no-pattern> actually matches is
+already determined. The ordering of the matches is the same as for the
+chosen subexpression.
+
+=back
+
+The above recipes describe the ordering of matches I<at a given position>.
+One more rule is needed to understand how a match is determined for the
+whole regular expression: a match at an earlier position is always better
+than a match at a later position.
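
The ordering rules can be observed with a few one-line matches. This is a sketch only; the strings and variable names are illustrative assumptions:

```perl
use strict;
use warnings;

# S|T: a match for the first alternative is better than one for the second.
my ($alt)    = 'abc'  =~ /(a|ab)/;

# S+ prefers more repetitions, S+? prefers fewer.
my ($greedy) = 'xaab' =~ /(a+)/;
my ($lazy)   = 'xaab' =~ /(a+?)/;

# An earlier starting position beats a "better" alternative further along.
my ($early)  = 'ab'   =~ /(b|ab)/;

print "$alt $greedy $lazy $early\n";
```

In the last case C<b> is the first alternative, but it can only match at offset 1, so the match of C<ab> at offset 0 is chosen.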
=head2 Creating custom RE engines
=head1 BUGS
-This manpage is varies from difficult to understand to completely
-and utterly opaque.
+This document varies from difficult to understand to completely
+and utterly opaque. The wandering prose riddled with jargon is
+hard to fathom in several places.
+
+This document needs a rewrite that separates the tutorial content
+from the reference content.
=head1 SEE ALSO
+L<perlrequick>.
+
+L<perlretut>.
+
L<perlop/"Regexp Quote-Like Operators">.
L<perlop/"Gory details of parsing quoted constructs">.
L<perllocale>.
+L<perlebcdic>.
+
I<Mastering Regular Expressions> by Jeffrey Friedl, published
by O'Reilly and Associates.