X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlreref.pod;h=b9fb3b020260aa81338947c7c5b10604eaf43beb;hb=fcbc2cdbfb87f2f022dd44fc82e01764faffaa19;hp=9f083c7e8637a2e594fd4b4008f43bfb5a89c2c4;hpb=6d014f17427cba892ebb4fb2b45f28cc737c7c9e;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index 9f083c7..b9fb3b0 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -8,51 +8,54 @@ This is a quick reference to Perl's regular expressions.
 For full information see L<perlre> and L<perlop>, as well
 as the L</"SEE ALSO"> section in this document.
 
-=head1 OPERATORS
+=head2 OPERATORS
 
-  =~ determines to which variable the regex is applied.
-     In its absence, $_ is used.
+C<=~> determines to which variable the regex is applied.
+In its absence, $_ is used.
 
-        $var =~ /foo/;
+    $var =~ /foo/;
 
-  !~ determines to which variable the regex is applied,
-     and negates the result of the match; it returns
-     false if the match succeeds, and true if it fails.
+C<!~> determines to which variable the regex is applied,
+and negates the result of the match; it returns
+false if the match succeeds, and true if it fails.
 
-        $var !~ /foo/;
+    $var !~ /foo/;
 
-  m/pattern/igmsoxc searches a string for a pattern match,
-     applying the given options.
+C<m/pattern/msixpogc> searches a string for a pattern match,
+applying the given options.
 
-        i  case-Insensitive
-        g  Global - all occurrences
-        m  Multiline mode - ^ and $ match internal lines
-        s  match as a Single line - . matches \n
-        o  compile pattern Once
-        x  eXtended legibility - free whitespace and comments
-        c  don't reset pos on failed matches when using /g
+    m  Multiline mode - ^ and $ match internal lines
+    s  match as a Single line - . matches \n
+    i  case-Insensitive
+    x  eXtended legibility - free whitespace and comments
+    p  Preserve a copy of the matched string -
+       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
+    o  compile pattern Once
+    g  Global - all occurrences
+    c  don't reset pos on failed matches when using /g
 
-     If 'pattern' is an empty string, the last I<successfully> matched
-     regex is used. Delimiters other than '/' may be used for both this
-     operator and the following ones.
+If 'pattern' is an empty string, the last I<successfully> matched
+regex is used. Delimiters other than '/' may be used for both this
+operator and the following ones. The leading C<m> can be omitted
+if the delimiter is '/'.
 
-  qr/pattern/imsox lets you store a regex in a variable,
-     or pass one around. Modifiers as for m// and are stored
-     within the regex.
+C<qr/pattern/msixpo> lets you store a regex in a variable,
+or pass one around. Modifiers as for C<m//>, and are stored
+within the regex.
 
-  s/pattern/replacement/igmsoxe substitutes matches of
-     'pattern' with 'replacement'. Modifiers as for m//
-     with one addition:
+C<s/pattern/replacement/msixpogce> substitutes matches of
+'pattern' with 'replacement'. Modifiers as for C<m//>,
+with one addition:
 
-        e  Evaluate replacement as an expression
+    e  Evaluate 'replacement' as an expression
 
-     'e' may be specified multiple times. 'replacement' is interpreted
-     as a double quoted string unless a single-quote (') is the delimiter.
+'e' may be specified multiple times. 'replacement' is interpreted
+as a double quoted string unless a single-quote (C<'>) is the delimiter.
 
-  ?pattern? is like m/pattern/ but matches only once. No alternate
-     delimiters can be used. Must be reset with L<reset|perlfunc/reset>.
+C<?pattern?> is like C<m/pattern/> but matches only once. No alternate
+delimiters can be used. Must be reset with reset().
 
-=head1 SYNTAX
+=head2 SYNTAX
 
    \       Escapes the character immediately following it
    .       Matches any single character except a newline (unless /s is used)
@@ -66,7 +69,13 @@ as the L</"SEE ALSO"> section in this document.
    (...)   Groups subexpressions for capturing to $1, $2...
    (?:...) Groups subexpressions without capturing (cluster)
    |       Matches either the subexpression preceding or following it
-   \1, \2 ...  The text from the Nth group
+   \1, \2, \3 ...           Matches the text from the Nth group
+   \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
+   \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
+   \g{name}     Named backreference
+   \k<name>     Named backreference
+   \k'name'     Named backreference
+   (?P=name)    Named backreference (python syntax)
 
 =head2 ESCAPE SEQUENCES
 
@@ -78,18 +87,20 @@ These work as in normal strings.
    \n       Newline
    \r       Carriage return
    \t       Tab
-   \038     Any octal ASCII value
+   \037     Any octal ASCII value
    \x7f     Any hexadecimal ASCII value
    \x{263a} A wide hexadecimal value
    \cx      Control-x
    \N{name} A named character
 
    \l  Lowercase next character
-   \u  Uppercase next character
+   \u  Titlecase next character
    \L  Lowercase until \E
    \U  Uppercase until \E
    \Q  Disable pattern metacharacters until \E
-   \E  End case modification
+   \E  End modification
+
+For Titlecase, see L</Titlecase>.
 
 This one works differently from normal strings:
 
@@ -102,38 +113,46 @@ This one works differently from normal strings:
    [f-j-]   Dash escaped or at start or end means 'dash'
    [^f-j]   Caret indicates "match any character _except_ these"
 
-The following work within or without a character class:
-
-   \d      A digit, same as [0-9]
-   \D      A nondigit, same as [^0-9]
-   \w      A word character (alphanumeric), same as [a-zA-Z0-9_]
-   \W      A non-word character, [^a-zA-Z0-9_]
-   \s      A whitespace character, same as [ \t\n\r\f]
-   \S      A non-whitespace character, [^ \t\n\r\f]
-   \C      Match a byte (with Unicode, '.' matches char)
+The following sequences work within or without a character class.
+The first six are locale aware, all are Unicode aware.  See L<perllocale>
+and L<perlunicode> for details.
+
+   \d      A digit
+   \D      A nondigit
+   \w      A word character
+   \W      A non-word character
+   \s      A whitespace character
+   \S      A non-whitespace character
+   \h      An horizontal white space
+   \H      A non horizontal white space
+   \v      A vertical white space
+   \V      A non vertical white space
+   \R      A generic newline           (?>\v|\x0D\x0A)
+
+   \C      Match a byte (with Unicode, '.' matches a character)
    \pP     Match P-named (Unicode) property
    \p{...} Match Unicode property with long name
    \PP     Match non-P
    \P{...} Match lack of Unicode property with long name
-   \X      Match extended unicode sequence
+   \X      Match extended Unicode combining character sequence
 
 POSIX character classes and their Unicode and Perl equivalents:
 
-   alnum   IsAlnum              Alphanumeric
-   alpha   IsAlpha              Alphabetic
-   ascii   IsASCII              Any ASCII char
-   blank   IsSpace  [ \t]       Horizontal whitespace (GNU)
-   cntrl   IsCntrl              Control characters
-   digit   IsDigit  \d          Digits
-   graph   IsGraph              Alphanumeric and punctuation
-   lower   IsLower              Lowercase chars (locale aware)
-   print   IsPrint              Alphanumeric, punct, and space
-   punct   IsPunct              Punctuation
-   space   IsSpace  [\s\ck]     Whitespace
-           IsSpacePerl \s       Perl's whitespace definition
-   upper   IsUpper              Uppercase chars (locale aware)
-   word    IsWord   \w          Alphanumeric plus _ (Perl)
-   xdigit  IsXDigit [\dA-Fa-f]  Hexadecimal digit
+   alnum   IsAlnum                 Alphanumeric
+   alpha   IsAlpha                 Alphabetic
+   ascii   IsASCII                 Any ASCII char
+   blank   IsSpace     [ \t]       Horizontal whitespace (GNU extension)
+   cntrl   IsCntrl                 Control characters
+   digit   IsDigit     \d          Digits
+   graph   IsGraph                 Alphanumeric and punctuation
+   lower   IsLower                 Lowercase chars (locale and Unicode aware)
+   print   IsPrint                 Alphanumeric, punct, and space
+   punct   IsPunct                 Punctuation
+   space   IsSpace     [\s\ck]     Whitespace
+           IsSpacePerl \s          Perl's whitespace definition
+   upper   IsUpper                 Uppercase chars (locale and Unicode aware)
+   word    IsWord      \w          Alphanumeric plus _ (Perl extension)
+   xdigit  IsXDigit    [0-9A-Fa-f] Hexadecimal digit
 
 Within a character class:
 
@@ -154,48 +173,79 @@ All are zero-width assertions.
    \z Match absolute string end
    \G Match where previous m//g left off
 
+   \K Keep the stuff left of the \K, don't include it in $&
+
 =head2 QUANTIFIERS
 
 Quantifiers are greedy by default -- match the B<longest> leftmost.
 
-   Maximal Minimal Allowed range
-   ------- ------- -------------
-   {n,m}   {n,m}?  Must occur at least n times but no more than m times
-   {n,}    {n,}?   Must occur at least n times
-   {n}     {n}?    Must occur exactly n times
-   *       *?      0 or more times (same as {0,})
-   +       +?      1 or more times (same as {1,})
-   ?       ??      0 or 1 time (same as {0,1})
+   Maximal Minimal Possessive Allowed range
+   ------- ------- ---------- -------------
+   {n,m}   {n,m}?  {n,m}+     Must occur at least n times
+                              but no more than m times
+   {n,}    {n,}?   {n,}+      Must occur at least n times
+   {n}     {n}?    {n}+       Must occur exactly n times
+   *       *?      *+         0 or more times (same as {0,})
+   +       +?      ++         1 or more times (same as {1,})
+   ?       ??      ?+         0 or 1 time (same as {0,1})
+
+The possessive forms (new in Perl 5.10) prevent backtracking: what gets
+matched by a pattern with a possessive quantifier will not be backtracked
+into, even if that causes the whole match to fail.
 
 There is no quantifier {,n} -- that gets understood as a literal string.
 
 =head2 EXTENDED CONSTRUCTS
 
-   (?#text)         A comment
-   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
-   (?=...)          Zero-width positive lookahead assertion
-   (?!...)          Zero-width negative lookahead assertion
-   (?<=...)         Zero-width positive lookbehind assertion
-   (?<!...)         Zero-width negative lookbehind assertion
-   (?>...)          Grab what we can, prohibit backtracking
-   (?{ code })      Embedded code, return value becomes $^R
-   (??{ code })     Dynamic regex, return value used as regex
-   (?(cond)yes|no)  cond being integer corresponding to capturing parens
-   (?(cond)yes)     or a lookaround/eval zero-width assertion
-
-=head1 VARIABLES
+   (?#text)          A comment
+   (?:...)           Groups subexpressions without capturing (cluster)
+   (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
+   (?=...)           Zero-width positive lookahead assertion
+   (?!...)           Zero-width negative lookahead assertion
+   (?<=...)          Zero-width positive lookbehind assertion
+   (?<!...)          Zero-width negative lookbehind assertion
+   (?>...)           Grab what we can, prohibit backtracking
+   (?|...)           Branch reset
+   (?<name>...)      Named capture
+   (?'name'...)      Named capture
+   (?P<name>...)     Named capture (python syntax)
+   (?{ code })       Embedded code, return value becomes $^R
+   (??{ code })      Dynamic regex, return value used as regex
+   (?N)              Recurse into subpattern number N
+   (?-N), (?+N)      Recurse into Nth previous/next subpattern
+   (?R), (?0)        Recurse at the beginning of the whole pattern
+   (?&name)          Recurse into a named subpattern
+   (?P>name)         Recurse into a named subpattern (python syntax)
+   (?(cond)yes|no)
+   (?(cond)yes)      Conditional expression, where "cond" can be:
+                     (N)       subpattern N has matched something
+                     (<name>)  named subpattern has matched something
+                     ('name')  named subpattern has matched something
+                     (?{code}) code condition
+                     (R)       true if recursing
+                     (RN)      true if recursing into Nth subpattern
+                     (R&name)  true if recursing into named subpattern
+                     (DEFINE)  always false, no no-pattern allowed
+
+=head2 VARIABLES
 
    $_    Default variable for operators to use
-   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)
 
-   $&    Entire matched string
    $`    Everything prior to matched string
+   $&    Entire matched string
    $'    Everything after to matched string
 
-The use of those last three will slow down B<all> regex use
-within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
+   ${^PREMATCH}   Everything prior to matched string
+   ${^MATCH}      Entire matched string
+   ${^POSTMATCH}  Everything after to matched string
+
+The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
+within your program. Consult L<perlvar> for C<@->
 to see equivalent expressions that won't cause slow down.
-See also L<Devel::SawAmpersand>.
+See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
+can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
+and C<${^POSTMATCH}>, but for them to be defined, you have to
+specify the C</p> (preserve) modifier on your regular expression.
 
    $1, $2 ...  hold the Xth captured expr
    $+    Last parenthesized pattern match
@@ -203,25 +253,38 @@ See also L<Devel::SawAmpersand>.
    $^R   Holds the result of the last (?{...}) expr
    @-    Offsets of starts of groups. $-[0] holds start of whole match
    @+    Offsets of ends of groups. $+[0] holds end of whole match
+   %+    Named capture buffers
+   %-    Named capture buffers, as array refs
 
 Captured groups are numbered according to their I<opening> paren.
 
-=head1 FUNCTIONS
+=head2 FUNCTIONS
 
    lc          Lowercase a string
    lcfirst     Lowercase first char of a string
    uc          Uppercase a string
-   ucfirst     Uppercase first char of a string
+   ucfirst     Titlecase first char of a string
+
    pos         Return or set current match position
    quotemeta   Quote metacharacters
    reset       Reset ?pattern? status
    study       Analyze string for optimizing matching
 
-   split       Use regex to split a string into parts
+   split       Use a regex to split a string into parts
+
+The first four of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.
+
+=head2 TERMINOLOGY
+
+=head3 Titlecase
+
+Unicode concept which most often is equal to uppercase, but for
+certain characters like the German "sharp s" there is a difference.
 
 =head1 AUTHOR
 
-Iain Truskett.
+Iain Truskett. Updated by the Perl 5 Porters.
 
 This document may be distributed under the same terms as Perl itself.
 
@@ -259,6 +322,14 @@ L<perlfaq6> for FAQs on regular expressions.
 
 =item *
 
+L<perlrebackslash> for a reference on backslash sequences.
+
+=item *
+
+L<perlrecharclass> for a reference on character classes.
+
+=item *
+
 The L<re> module to alter behaviour and aid
 debugging.
 
@@ -268,7 +339,7 @@ L<perldebug/"Debugging regular expressions">
 
 =item *
 
-L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
+L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
 for details on regexes and internationalisation.
 
 =item *