=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, C<$_> is used.

    $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;

C<m/pattern/msixpogc> searches a string for a pattern match,
applying the given options.

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.

C<qr/pattern/msixpo> lets you store a regex in a variable,
or pass one around. Modifiers are as for C<m//>, and are stored
within the regex.

C<s/pattern/replacement/msixpogce> substitutes matches of
'pattern' with 'replacement'. Modifiers are as for C<m//>,
with one addition:

    e  Evaluate 'replacement' as an expression

'e' may be specified multiple times. 'replacement' is interpreted
as a double-quoted string unless a single-quote (C<'>) is the delimiter.

C<?pattern?> is like C<m/pattern/> but matches only once. No alternate
delimiters can be used.  It must be reset with C<reset()>.
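
The operators above can be combined as in the following minimal sketch.
The variable names and sample strings are invented for illustration.

    use strict;
    use warnings;

    my $text = "foo bar foo";

    # =~ binds the match to $text; !~ negates the result.
    print "has foo\n" if $text =~ /foo/;
    print "no baz\n"  if $text !~ /baz/;

    # qr// stores a compiled regex, modifiers included.
    my $re = qr/FOO/i;

    # In list context, /g returns every match.
    my @hits = $text =~ /$re/g;
    print scalar(@hits), "\n";          # prints 2

    # /e evaluates the replacement as Perl code.
    (my $shout = $text) =~ s/(\w+)/ucfirst($1)/ge;
    print "$shout\n";                   # prints "Foo Bar Foo"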

=head2 SYNTAX

   \       Escapes the character immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the subexpression preceding or following it
   \1, \2 ...  Matches the text from the Nth group
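
As a rough illustration of grouping, alternation and backreferences,
consider the sketch below. The sample strings are made up.

    use strict;
    use warnings;

    my $line = "the the cat";

    # (\w+) captures a word; \1 requires the same text again.
    if ($line =~ /\b(\w+) \1\b/) {
        print "doubled: $1\n";          # prints "doubled: the"
    }

    # (?:...) groups the alternation without capturing, so the
    # digits land in $1.
    if ("img42.png" =~ /^(?:img|pic)(\d{1,3})\.png$/) {
        print "number: $1\n";           # prints "number: 42"
    }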

=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class
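
The following sketch shows a few of these escapes in use. The
variables C<$name> and C<$word> are illustrative only.

    use strict;
    use warnings;

    my $name = "a.b";

    # \Q...\E quotes the dot, so it no longer matches any character.
    print "literal\n"  if "a.b" =~ /^\Q$name\E$/;
    print "no match\n" if "axb" !~ /^\Q$name\E$/;

    # \u and \L work as in double-quoted strings.
    my $word = "pERL";
    print "\u\L$word\E\n";              # prints "Perl"

    # \b is a word boundary in a pattern, but a backspace
    # inside a character class.
    print "boundary\n"  if "cat nap" =~ /cat\b/;
    print "backspace\n" if "\x08" =~ /[\b]/;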

=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work inside or outside a character class.
The first six are locale-aware; all are Unicode-aware. See L<perllocale>
and L<perlunicode> for details.

   \d      A digit
   \D      A nondigit
   \w      A word character
   \W      A non-word character
   \s      A whitespace character
   \S      A non-whitespace character
   \h      A horizontal whitespace character
   \H      A non-horizontal whitespace character
   \v      A vertical whitespace character
   \V      A non-vertical whitespace character
   \R      A generic newline           (?>\x0D\x0A|\v)

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode combining character sequence

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum              Alphanumeric
   alpha   IsAlpha              Alphabetic
   ascii   IsASCII              Any ASCII char
   blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
   cntrl   IsCntrl              Control characters
   digit   IsDigit  \d          Digits
   graph   IsGraph              Alphanumeric and punctuation
   lower   IsLower              Lowercase chars (locale and Unicode aware)
   print   IsPrint              Alphanumeric, punct, and space
   punct   IsPunct              Punctuation
   space   IsSpace  [\s\ck]     Whitespace
           IsSpacePerl   \s     Perl's whitespace definition
   upper   IsUpper              Uppercase chars (locale and Unicode aware)
   word    IsWord   \w          Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}
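
A short sketch of the equivalences above, assuming Perl 5.10 or later
(for C<\h>) and plain ASCII input. The sample string is invented.

    use strict;
    use warnings;

    my $s = "Room 101\tB";

    # \d and [[:digit:]] select the same characters here.
    my @a = $s =~ /(\d+)/;
    my @b = $s =~ /([[:digit:]]+)/;
    print "same\n" if $a[0] eq $b[0];   # both are "101"

    # POSIX classes are only valid inside a bracketed class,
    # and [:^alpha:] negates one.
    (my $letters = $s) =~ s/[[:^alpha:]]//g;
    print "$letters\n";                 # prints "RoomB"

    # \h matches horizontal whitespace such as space and tab.
    my $count = () = $s =~ /\h/g;
    print "$count\n";                   # prints 2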

=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off
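
The difference between the end-of-string anchors, and one possible use
of C<\G>, can be sketched as follows. The token names C<NAME>, C<NUM>
and C<OP> are hypothetical.

    use strict;
    use warnings;

    my $line = "end\n";

    # $ and \Z allow one trailing newline; \z does not.
    print "dollar\n" if $line =~ /end$/;
    print "big Z\n"  if $line =~ /end\Z/;
    print "no z\n"   if $line !~ /end\z/;

    # \G with /gc continues from pos(), which makes a simple
    # tokenizer possible.
    my $src = "x=42";
    my @tokens;
    for ($src) {
        while (1) {
            if    (/\G([a-z]+)/gc) { push @tokens, "NAME:$1" }
            elsif (/\G(\d+)/gc)    { push @tokens, "NUM:$1"  }
            elsif (/\G(=)/gc)      { push @tokens, "OP:$1"   }
            else                   { last }
        }
    }
    print "@tokens\n";                  # prints "NAME:x OP:= NUM:42"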

=head2 QUANTIFIERS

Quantifiers are greedy by default -- match the B<longest> leftmost.

   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must occur exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})

There is no quantifier {,n} -- that gets understood as a literal string.
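
A minimal sketch contrasting maximal and minimal matching, using a
made-up fragment of markup:

    use strict;
    use warnings;

    my $html = "<b>one</b><b>two</b>";

    # Greedy: .* runs to the last "</b>".
    my ($greedy) = $html =~ /<b>(.*)<\/b>/;
    print "$greedy\n";                  # prints "one</b><b>two"

    # Minimal: .*? stops at the first "</b>".
    my ($lazy) = $html =~ /<b>(.*?)<\/b>/;
    print "$lazy\n";                    # prints "one"

    # {n,m} bounds the repeat count.
    print "ok\n" if "aaa" =~ /^a{2,3}$/;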

=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)          Zero-width positive lookahead assertion
   (?!...)          Zero-width negative lookahead assertion
   (?<=...)         Zero-width positive lookbehind assertion
   (?<!...)         Zero-width negative lookbehind assertion
   (?>...)          Grab what we can, prohibit backtracking
   (?{ code })      Embedded code, return value becomes $^R
   (??{ code })     Dynamic regex, return value used as regex
   (?(cond)yes|no)  cond being integer corresponding to capturing parens
   (?(cond)yes)        or a lookaround/eval zero-width assertion
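
Some of these constructs are easier to follow in use. The sketch below
is illustrative only; the strings and variable names are invented.

    use strict;
    use warnings;

    # Lookahead and lookbehind test context without consuming it.
    my $price = "cost: 100 USD, 200 EUR";
    my ($usd) = $price =~ /(\d+)(?= USD)/;
    print "$usd\n";                     # prints 100
    my ($after) = $price =~ /(?<=, )(\d+)/;
    print "$after\n";                   # prints 200

    # (?i:...) turns on case-insensitivity for part of a pattern.
    print "partial\n" if "HELLO world" =~ /(?i:hello) world/;

    # (?>...) does not give characters back: a+ takes all three
    # a's, so the final "a" can no longer match.
    print "atomic\n" if "aaa" !~ /^(?>a+)a$/;

    # Conditional: require ")" only if "(" was captured as group 1.
    # Prints "(x): yes", "x: yes" and "(x: no".
    for my $s ("(x)", "x", "(x") {
        my $ok = $s =~ /^(\()?x(?(1)\))$/ ? "yes" : "no";
        print "$s: $ok\n";
    }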

=head2 VARIABLES

   $_    Default variable for operators to use

   $`    Everything prior to matched string
   $&    Entire matched string
   $'    Everything after matched string

   ${^PREMATCH}   Everything prior to matched string
   ${^MATCH}      Entire matched string
   ${^POSTMATCH}  Everything after matched string

The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.

   $1, $2 ...  hold the Xth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match
   %+    Named capture buffers
   %-    Named capture buffers, as array refs

Captured groups are numbered according to their I<opening> paren.
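
The following sketch, which assumes Perl 5.10 or later for C</p> and
named captures, shows several of these variables after one match. The
date string and the capture name C<year> are made up.

    use strict;
    use warnings;

    my $date = "on 2009-07-04!";

    if ($date =~ /(?<year>\d{4})-(\d\d)-(\d\d)/p) {
        print "$1 $2 $3\n";             # prints "2009 07 04"
        print "$+{year}\n";             # prints 2009

        # @- and @+ give offsets; substr recovers the match
        # without touching $&. Prints "2009-07-04".
        print substr($date, $-[0], $+[0] - $-[0]), "\n";

        # /p defines these without the global slowdown.
        # Prints "on |2009-07-04|!".
        print "${^PREMATCH}|${^MATCH}|${^POSTMATCH}\n";
    }

    # Groups are numbered by their opening paren.
    if ("ab" =~ /((a)(b))/) {
        print "$1 $2 $3\n";             # prints "ab a b"
    }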

=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use a regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.
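
A brief sketch of three of these functions, with invented sample data:

    use strict;
    use warnings;

    # quotemeta is the function form of \Q.
    my $pat = quotemeta("1+1");
    print "$pat\n";                     # prints "1\+1"

    # pos reports where the last m//g on a string left off.
    my $str = "aXbXc";
    if ($str =~ /X/g) {
        print pos($str), "\n";          # prints 2
    }

    # split takes a regex; the separator is not returned
    # unless it is captured.
    my @parts = split /\s*,\s*/, "a , b,c";
    print join("|", @parts), "\n";      # prints "a|b|c"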

=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is usually the same as uppercase, but differs
for certain characters such as the German "sharp s".
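
As an illustration, the sketch below uses a different character from
the one named above: U+01C6 (LATIN SMALL LETTER DZ WITH CARON), chosen
here only because its uppercase and titlecase forms are distinct single
code points.

    use strict;
    use warnings;

    my $dz = "\x{1C6}";

    # uc gives the uppercase form, ucfirst the titlecase form.
    printf "%04X\n", ord(uc $dz);       # prints 01C4
    printf "%04X\n", ord(ucfirst $dz);  # prints 01C5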

=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut