=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, $_ is used.

    $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;
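The two binding operators and the implicit C<$_> case can be compared
side by side. This is a minimal sketch; the variable names and sample
strings are illustrative only.

```perl
use strict;
use warnings;

my $var = "seafood";

# =~ binds the match to $var; true here because "foo" occurs in it.
my $bound = $var =~ /foo/ ? 1 : 0;

# !~ is the negated form: true only when the match fails.
my $negated = $var !~ /foo/ ? 1 : 0;

# Without a binding operator the match is applied to $_.
my $implicit;
for ("no match here") {
    $implicit = /foo/ ? 1 : 0;
}

print "bound=$bound negated=$negated implicit=$implicit\n";
```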

C<m/pattern/msixpogc> searches a string for a pattern match,
applying the given options.

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.
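The modifiers listed above are easiest to see on a small multi-line
string. The following sketch uses made-up sample data and exercises
C</m>, C</g>, C</i>, C</x>, C</s> and C</c>.

```perl
use strict;
use warnings;

my $text = "Alpha\nbeta\nGAMMA\n";

# /m lets ^ match at internal line starts; /g in list context
# returns every match.
my @line_starts = $text =~ /^(\w)/mg;

# /i ignores case; /x allows whitespace and comments in the pattern.
my $has_gamma = $text =~ / gamma  # the word, in any case
                         /xi ? 1 : 0;

# /s lets . match a newline, so the pattern can span two lines.
my ($span) = $text =~ /(Alpha.beta)/s;

# In scalar context /g advances pos(); /c keeps pos() when a match fails.
my $digits = "12ab";
my $first  = $digits =~ /\d+/g;     # succeeds, pos is now 2
my $second = $digits =~ /\d+/gc;    # fails, but /c leaves pos at 2
my $pos    = pos($digits);
```

Without C</c>, the failed second match would have reset C<pos($digits)>
to undef.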

C<qr/pattern/msixpo> lets you store a regex in a variable,
or pass one around. The modifiers are as for C<m//> and are stored
within the regex.
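A short sketch of storing, interpolating and passing a compiled regex.
The helper C<count_matches> is a hypothetical name used only for this
illustration.

```perl
use strict;
use warnings;

# The /i modifier is stored inside the compiled regex.
my $word = qr/perl/i;

# A qr// value can be used directly with =~ ...
my $direct = "I like Perl" =~ $word ? 1 : 0;

# ... or interpolated into a larger pattern, where it keeps its own /i.
my $embedded = "PERL5 rocks" =~ /^${word}\d/ ? 1 : 0;

# It can also be passed to a subroutine like any other scalar.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}
my $count = count_matches($word, "perl", "Python", "PERL");
```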

C<s/pattern/replacement/msixpogce> substitutes matches of
'pattern' with 'replacement'. The modifiers are as for C<m//>,
with one addition:

    e  Evaluate 'replacement' as an expression

'e' may be specified multiple times. 'replacement' is interpreted
as a double-quoted string unless a single-quote (C<'>) is the delimiter.
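The following sketch contrasts an interpolated replacement, an evaluated
one (C</e>, C</ee>) and a single-quoted one. All strings are invented
sample data.

```perl
use strict;
use warnings;

# The replacement is interpolated like a double-quoted string.
my $name = "world";
(my $greeting = "hello NAME") =~ s/NAME/$name/;

# /g replaces every occurrence; /e evaluates the replacement as code.
(my $doubled = "3 apples and 5 pears") =~ s/(\d+)/$1 * 2/ge;

# /ee evaluates twice: first $expr, then the string it contains.
my $expr = '2 + 3';
(my $sum = "total: EXPR") =~ s/EXPR/$expr/ee;

# With ' as the delimiter nothing is interpolated.
(my $literal = "cost: X") =~ s'X'$5.00';

# s/// returns the number of substitutions made.
my $line  = "a-b-c";
my $count = ($line =~ s/-/+/g);
```

Because C</ee> runs string C<eval> on data, it should only be used on
replacement text that is trusted.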

C<?pattern?> is like C<m/pattern/> but matches only once. No alternate
delimiters can be used.  It must be reset with C<reset()> before it
can match again.

=head2 SYNTAX

   \       Escapes the character immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the subexpression preceding or following it
   \1, \2, \3 ...           Matches the text from the Nth group
   \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
   \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
   \g{name}     Named backreference
   \k<name>     Named backreference
   \k'name'     Named backreference
   (?P=name)    Named backreference (python syntax)
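The three styles of backreference in the table (numbered, relative and
named) can be sketched as follows. The sample strings and the group
name C<q> are illustrative.

```perl
use strict;
use warnings;

# \1 matches whatever group 1 captured: here, a doubled word.
my ($dup) = "this is is a test" =~ /\b(\w+) \1\b/;

# \g{-2} and \g{-1} count backwards from the current position.
my $relative = "abab" =~ /(a)(b)\g{-2}\g{-1}/ ? 1 : 0;

# \k<q> requires the same quote character that group "q" matched.
my $quoted = q{say "hi" or 'bye'};
my @strings;
while ($quoted =~ /(?<q>["'])(.*?)\k<q>/g) {
    push @strings, $2;
}
```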

=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class
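A few of these escapes in use, as a rough sketch with invented sample
strings: character escapes inside a pattern, C<\Q...\E> around an
interpolated variable, C<\u> in a replacement, and the two meanings of
C<\b>.

```perl
use strict;
use warnings;

# \x41 and \t mean the same in a pattern as in a double-quoted string.
my $tabbed = "A\tB" =~ /\x41\tB/ ? 1 : 0;

# \Q...\E quotes metacharacters, so $price is matched literally.
my $price   = '$9.99';
my $literal = 'total: $9.99' =~ /\Q$price\E/ ? 1 : 0;
my $wrong   = 'total: $9x99' =~ /\Q$price\E/ ? 1 : 0;   # dot is literal

# \u titlecases the next character of the replacement.
(my $shout = "hello world") =~ s/(\w+)/\u$1/g;

# \b is a word boundary in a pattern, but a backspace inside [...].
my $boundary  = "cat scatter" =~ /\bcat\b/ ? 1 : 0;
my $backspace = "a\x08b" =~ /a[\b]b/ ? 1 : 0;
```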

=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work inside or outside a character class.
The first six are locale aware; all are Unicode aware. See L<perllocale>
and L<perlunicode> for details.

   \d      A digit
   \D      A nondigit
   \w      A word character
   \W      A non-word character
   \s      A whitespace character
   \S      A non-whitespace character
   \h      A horizontal whitespace character
   \H      A non-horizontal whitespace character
   \N      A non-newline (when not followed by a '{'; it's like . without /s)
   \v      A vertical whitespace character
   \V      A non-vertical whitespace character
   \R      A generic newline           (?>\v|\x0D\x0A)

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with name longer than 1 character
   \PP     Match non-P
   \P{...} Match lack of Unicode property with name longer than 1 char
   \X      Match Unicode extended grapheme cluster
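The sketch below tries several of the classes above on small invented
strings. C<\h> and C<\R> need Perl 5.10 or later.

```perl
use strict;
use warnings;

# A bracketed class and a negated range.
my @vowels    = "regular" =~ /[aeiou]/g;
my ($no_digs) = "abc123"  =~ /^([^0-9]+)/;

# \w and \s work on their own or inside [...].
my ($token) = "id_42 = x" =~ /([\w]+)\s*=/;

# \h matches horizontal whitespace only, so the "\n" is not counted.
my $h_count = () = "a \tb\nc" =~ /\h/g;

# \R treats the two-character CRLF as a single newline.
my @lines = split /\R/, "one\r\ntwo\nthree";

# \p{...} matches by Unicode property (U+0394 is GREEK CAPITAL DELTA).
my $upper = "\x{0394}" =~ /\p{Lu}/ ? 1 : 0;
my $lower = "\x{0394}" =~ /\P{Lu}/ ? 1 : 0;
```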

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum              Alphanumeric
   alpha   IsAlpha              Alphabetic
   ascii   IsASCII              Any ASCII char
   blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
   cntrl   IsCntrl              Control characters
   digit   IsDigit  \d          Digits
   graph   IsGraph              Alphanumeric and punctuation
   lower   IsLower              Lowercase chars (locale and Unicode aware)
   print   IsPrint              Alphanumeric, punct, and space
   punct   IsPunct              Punctuation
   space   IsSpace  [\s\ck]     Whitespace
           IsSpacePerl   \s     Perl's whitespace definition
   upper   IsUpper              Uppercase chars (locale and Unicode aware)
   word    IsWord   \w          Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}
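POSIX classes are only valid inside an enclosing bracketed class, which
is why they appear as C<[[:name:]]> in practice. A minimal sketch with
invented ASCII-only sample strings:

```perl
use strict;
use warnings;

# The outer [...] is the character class, [:xdigit:] is a member of it.
my ($hex) = "color: #1aF9" =~ /#([[:xdigit:]]+)/;

# POSIX classes can be mixed with other members such as "_".
my ($ident) = "9lives my_var1" =~ /\b([[:alpha:]_][[:alnum:]_]*)\b/;

# A ^ after the first colon negates the class.
my ($non_digits) = "abc123" =~ /^([[:^digit:]]+)/;

# On this ASCII input the three spellings accept the same string.
my @same = map { "2024" =~ $_ ? 1 : 0 }
           qr/^[[:digit:]]+$/, qr/^\d+$/, qr/^\p{IsDigit}+$/;
```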

=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

   \K Keep the stuff left of the \K, don't include it in $&
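The differences between these anchors show up on a string with a
trailing newline. This is a small sketch on invented data; C<\K> needs
Perl 5.10 or later.

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";

# With /m, ^ matches at each line start; \A ignores /m.
my $caret_m = () = $text =~ /^\w+/mg;
my $a_m     = () = $text =~ /\A\w+/mg;

# \Z matches before a final newline, \z only at the absolute end.
my $big_z   = $text =~ /second\Z/ ? 1 : 0;
my $small_z = $text =~ /second\z/ ? 1 : 0;

# \G anchors each match where the previous /g match stopped,
# so scanning ends at the first token that is not a number.
my $input = "12 34 x 56";
my @numbers;
while ($input =~ /\G\s*(\d+)/gc) {
    push @numbers, $1;
}

# \K drops "dir/" from the match, so only the file stem is replaced.
(my $path = "dir/file.txt") =~ s{dir/\K\w+}{renamed};
```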

=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> string
possible at the leftmost matching position.

   Maximal Minimal Possessive Allowed range
   ------- ------- ---------- -------------
   {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                              but no more than m times
   {n,}    {n,}?   {n,}+      Must occur at least n times
   {n}     {n}?    {n}+       Must occur exactly n times
   *       *?      *+         0 or more times (same as {0,})
   +       +?      ++         1 or more times (same as {1,})
   ?       ??      ?+         0 or 1 time (same as {0,1})

The possessive forms (new in Perl 5.10) prevent backtracking: what gets
matched by a pattern with a possessive quantifier will not be backtracked
into, even if that causes the whole match to fail.

There is no quantifier C<{,n}>. That's interpreted as a literal string.
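The three columns of the table behave differently on the same input.
A minimal sketch, using invented sample strings:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .* takes as much as it can, up to the last ">".
my ($greedy) = $html =~ /(<.*>)/;

# Minimal: .*? stops at the first ">" that lets the match succeed.
my ($minimal) = $html =~ /(<.*?>)/;

# Counted forms.
my $exactly = "2024"  =~ /^\d{4}$/   ? 1 : 0;
my $range   = "12345" =~ /^\d{2,4}$/ ? 1 : 0;   # five digits: too many

# Possessive: a++ never gives characters back, so the final "a"
# in the pattern has nothing left to match.
my $normal     = "aaa" =~ /^a+a$/  ? 1 : 0;
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;
```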

=head2 EXTENDED CONSTRUCTS

   (?#text)          A comment
   (?:...)           Groups subexpressions without capturing (cluster)
   (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)           Zero-width positive lookahead assertion
   (?!...)           Zero-width negative lookahead assertion
   (?<=...)          Zero-width positive lookbehind assertion
   (?<!...)          Zero-width negative lookbehind assertion
   (?>...)           Grab what we can, prohibit backtracking
   (?|...)           Branch reset
   (?<name>...)      Named capture
   (?'name'...)      Named capture
   (?P<name>...)     Named capture (python syntax)
   (?{ code })       Embedded code, return value becomes $^R
   (??{ code })      Dynamic regex, return value used as regex
   (?N)              Recurse into subpattern number N
   (?-N), (?+N)      Recurse into Nth previous/next subpattern
   (?R), (?0)        Recurse at the beginning of the whole pattern
   (?&name)          Recurse into a named subpattern
   (?P>name)         Recurse into a named subpattern (python syntax)
   (?(cond)yes|no)
   (?(cond)yes)      Conditional expression, where "cond" can be:
                     (N)       subpattern N has matched something
                     (<name>)  named subpattern has matched something
                     ('name')  named subpattern has matched something
                     (?{code}) code condition
                     (R)       true if recursing
                     (RN)      true if recursing into Nth subpattern
                     (R&name)  true if recursing into named subpattern
                     (DEFINE)  always false; no "no" branch allowed
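A handful of the constructs above, sketched on invented sample data:
lookaround, named captures, branch reset, inline modifiers, a
conditional and recursion. Pattern and variable names are illustrative,
and most of these need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Lookahead and lookbehind check context without consuming it.
my ($amount) = "price: 42 USD" =~ /(\d+)(?= USD)/;
my ($after)  = "key=value"     =~ /(?<==)(\w+)/;

# Named captures are read back through %+.
my %date;
if ("2024-05-17" =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    %date = %+;
}

# Branch reset: both alternatives store their capture in $1.
my @first = map { $_ =~ /(?|a(\d)|b(\w))/ ? $1 : undef } "a7", "bz";

# An inline modifier applies only inside its own group.
my $inline = "Hello WORLD" =~ /hello (?-i:WORLD)/i ? 1 : 0;

# Conditional: demand a closing ">" only if group 1 matched "<".
my $cond = qr/^(<)?\w+(?(1)>)$/;
my @tags = map { $_ =~ $cond ? 1 : 0 } "<tag>", "tag", "<tag";

# Recursion: (?1) re-enters group 1 to match balanced parentheses.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;
my @parens = map { $_ =~ $balanced ? 1 : 0 } "(a(b)c)", "(a(b c)";
```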

=head2 VARIABLES

   $_    Default variable for operators to use

   $`    Everything prior to matched string
   $&    Entire matched string
   $'    Everything after matched string

   ${^PREMATCH}   Everything prior to matched string
   ${^MATCH}      Entire matched string
   ${^POSTMATCH}  Everything after matched string

The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult the C<@-> entry in L<perlvar>
for equivalent expressions that avoid the slowdown.
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.

   $1, $2 ...  hold the Xth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match
   %+    Named capture buffers
   %-    Named capture buffers, as array refs

Captured groups are numbered according to their I<opening> paren.
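The match variables can be compared on a single match. This sketch uses
the C</p> variables and C<@->/C<@+> rather than C<$`>, C<$&> and C<$'>;
the sample text and variable names are invented.

```perl
use strict;
use warnings;

my $text = "The year 1999 ended";
my ($pre, $match, $post, $from_offsets, $numbered, $named);

if ($text =~ /(?<year>\d{4})/p) {
    # /p makes these three defined for this match.
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});

    # $-[0] and $+[0] are the start and end offsets of the whole match.
    $from_offsets = substr($text, $-[0], $+[0] - $-[0]);

    # $1 and %+ both give access to the captured group.
    $numbered = $1;
    $named    = $+{year};
}

# Groups are numbered by their opening parenthesis, outermost first.
my @groups = "2024-05" =~ /((\d+)-(\d+))/;
```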

=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use a regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.
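A brief sketch of the more commonly used functions from this list, on
invented sample strings (C<reset> and C<study> are left out):

```perl
use strict;
use warnings;

# The case functions mirror \L, \l, \U and \u.
my @cases = (lc("PERL"), lcfirst("PERL"), uc("perl"), ucfirst("perl"));

# quotemeta backslashes regex metacharacters, as \Q does.
my $quoted = quotemeta("a.b*c");

# pos reports where the next /g match on the string will start.
my $data = "aXbXc";
$data =~ /X/g;
my $after_first = pos($data);

# Assigning to pos moves that starting point; 0 rewinds it.
pos($data) = 0;
my $rewound = pos($data);

# split uses a regex as the field separator.
my @fields = split /\s*,\s*/, "a , b,c ,d";
```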

=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is most often the same as uppercase, but
differs for certain characters such as the German "sharp s".
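The difference can be observed with C<uc> and C<ucfirst>. This sketch
deliberately uses a different character from the sharp s mentioned
above: U+01C6, a single-character digraph that has separate uppercase
and titlecase forms in Unicode.

```perl
use strict;
use warnings;

# U+01C6 LATIN SMALL LETTER DZ WITH CARON: one character, two letters.
my $small = "\x{01C6}";

my $upper = uc($small);        # U+01C4: both letters capitalized
my $title = ucfirst($small);   # U+01C5: only the first letter capitalized

printf "upper=U+%04X title=U+%04X\n", ord($upper), ord($title);
```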

=head1 AUTHOR

Iain Truskett. Updated by the Perl 5 Porters.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

L<perlrebackslash> for a reference on backslash sequences.

=item *

L<perlrecharclass> for a reference on character classes.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut