(If a curly bracket occurs in any other context, it is treated
as a regular character. In particular, the lower bound
-is not optional.) The "*" modifier is equivalent to C<{0,}>, the "+"
-modifier to C<{1,}>, and the "?" modifier to C<{0,1}>. n and m are limited
+is not optional.) The "*" quantifier is equivalent to C<{0,}>, the "+"
+quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>. n and m are limited
to integral values less than a preset limit defined when perl is built.
This is usually 32766 on the most common platforms. The actual limit can
be seen in the error message generated by code such as this:
while escaping will cause the literal string C<\$> to be matched.
You'll need to write something like C<m/\Quser\E\@\Qhost/>.
-=head3 Character classes
+=head3 Character Classes and Other Special Escapes
In addition, Perl defines the following:
X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\X> X<\p> X<\P> X<\C>
    \x12       Hexadecimal escape sequence
    \x{1234}   Long hexadecimal escape sequence
    \K         Keep the stuff left of the \K, don't include it in $&
-   \v         Shortcut for (*PRUNE)
-   \V         Shortcut for (*SKIP)
+   \v         Vertical whitespace
+   \V         Not vertical whitespace
+   \h         Horizontal whitespace
+   \H         Not horizontal whitespace
+   \R         Linebreak
A C<\w> matches a single alphanumeric character (an alphabetic
character, or a decimal digit) or C<_>, not a whole word. Use C<\w+>
C<\d>, and C<\D> within character classes, but they aren't usable
as either end of a range. If any of them precedes or follows a "-",
the "-" is understood literally. If Unicode is in effect, C<\s> matches
-also "\x{85}", "\x{2028}, and "\x{2029}". See L<perlunicode> for more
+also "\x{85}", "\x{2028}", and "\x{2029}". See L<perlunicode> for more
details about C<\pP>, C<\PP>, C<\X> and the possibility of defining
your own C<\p> and C<\P> properties, and L<perluniintro> about Unicode
in general.
X<\w> X<\W> X<word>
+C<\R> will atomically match a linebreak, including the network line-ending
+"\x0D\x0A". Specifically, C<\R> is exactly equivalent to
+
+ (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
+
+B<Note:> C<\R> has no special meaning inside of a character class;
+use C<\v> instead (vertical whitespace).
+X<\R>
+
The POSIX character class syntax
X<character class>
This is the "branch reset" pattern, which has the special property
that the capture buffers are numbered from the same starting point
-in each branch. It is available starting from perl 5.10.
+in each alternation branch. It is available starting from perl 5.10.
-Normally capture buffers in a pattern are numbered sequentially,
-from left to right. Inside this construct that behaviour is
-overridden so that the capture buffers are shared between all the
-branches and take their values from the branch that matched.
+Capture buffers are numbered from left to right, but inside this
+construct the numbering is restarted for each branch.
The numbering within each branch will be as normal, and any buffers
following this construct will be numbered as though the construct
look-behind. The use of C<\K> inside of another look-around assertion
is allowed, but the behaviour is currently not well defined.
-For various reasons C<\K> may be signifigantly more efficient than the
+For various reasons C<\K> may be significantly more efficient than the
equivalent C<< (?<=...) >> construct, and it is especially useful in
situations where you want to efficiently remove something following
something else in a string. For instance
A named capture buffer. Identical in every respect to normal capturing
parentheses C<()> but for the additional fact that C<%+> may be used after
-a succesful match to refer to a named buffer. See C<perlvar> for more
+a successful match to refer to a named buffer. See L<perlvar> for more
details on the C<%+> hash.
If multiple distinct capture buffers have the same name then the
in the match.
This can be used to determine which branch of a pattern was matched
-without using a seperate capture buffer for each branch, which in turn
+without using a separate capture buffer for each branch, which in turn
can result in a performance improvement, as perl cannot optimize
C</(?:(x)|(y)|(z))/> as efficiently as something like
C</(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/>.
The C<o?> matches at the beginning of C<'foo'>, and since the position
in the string is not moved by the match, C<o?> would match again and again
-because of the C<*> modifier. Another common way to create a similar cycle
+because of the C<*> quantifier. Another common way to create a similar cycle
is with the looping modifier C<//g>:
@matches = ( 'foo' =~ m{ o? }xg );
Thus Perl allows such constructs, by I<forcefully breaking
the infinite loop>. The rules for this are different for lower-level
-loops given by the greedy modifiers C<*+{}>, and for higher-level
+loops given by the greedy quantifiers C<*+{}>, and for higher-level
ones like the C</g> modifier or split() operator.
The lower-level loops are I<interrupted> (that is, the loop is