pod/perlreref.pod

=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

  =~ determines to which variable the regex is applied.
     In its absence, $_ is used.

        $var =~ /foo/;

  !~ determines to which variable the regex is applied,
     and negates the result of the match; it returns
     false if the match succeeds, and true if it fails.

        $var !~ /foo/;

  m/pattern/igmsoxc searches a string for a pattern match,
     applying the given options.

        i  case-Insensitive
        g  Global - all occurrences
        m  Multiline mode - ^ and $ match internal lines
        s  match as a Single line - . matches \n
        o  compile pattern Once
        x  eXtended legibility - free whitespace and comments
        c  don't reset pos on failed matches when using /g

     If 'pattern' is an empty string, the last I<successfully> matched
     regex is used. Delimiters other than '/' may be used for both this
     operator and the following ones.

  qr/pattern/imsox lets you store a regex in a variable,
     or pass one around. Modifiers are as for m// and are stored
     within the regex.

  s/pattern/replacement/igmsoxe substitutes matches of
     'pattern' with 'replacement'. Modifiers are as for m//,
     with one addition:

        e  Evaluate replacement as an expression

     'e' may be specified multiple times. 'replacement' is interpreted
     as a double-quoted string unless a single quote (') is the delimiter.

  ?pattern? is like m/pattern/ but matches only once. No alternate
     delimiters can be used. Must be reset with reset().

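The operators above can be tried together in a short script. This is
only an illustrative sketch; the variable names (C<$line>, C<$word_re>,
C<$shout> and so on) are made up for the example and are not part of
the reference.

```perl
use strict;
use warnings;

my $line = "Hello, World";

# =~ binds the match to $line; /i makes it case-insensitive.
my $has_world = ($line =~ /world/i) ? 1 : 0;

# !~ is true when the pattern does NOT match.
my $no_digits = ($line !~ /\d/) ? 1 : 0;

# qr// stores a compiled regex that can be interpolated later.
my $word_re = qr/(\w+)/;
my @words   = $line =~ /$word_re/g;   # list context with /g: all captures

# s///ge replaces every match with the result of an expression.
(my $shout = $line) =~ s/(\w+)/uc $1/ge;

print "$has_world $no_digits @words | $shout\n";
```

Copying C<$line> into C<$shout> before substituting leaves the original
string untouched, which is usually what you want in a quick test.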
=head2 SYNTAX

   \       Escapes the character immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the subexpression preceding or following it
   \1, \2 ...  The text matched by the Nth group

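As a rough illustration of grouping, alternation, backreferences and
bounded repetition, consider the following sketch. The sample text and
variable names are invented for the example.

```perl
use strict;
use warnings;

my $text = "the cat sat on the the mat";

# (...) captures and | alternates between subexpressions.
my ($animal) = $text =~ /\b(cat|dog)\b/;

# \1 refers back to whatever group 1 matched: find a doubled word.
my ($doubled) = $text =~ /\b(\w+) \1\b/;

# {2,3} bounds how often the preceding element may repeat.
# The "= () =" idiom counts the matches of a /g pattern.
my $short_words = () = $text =~ /\b\w{2,3}\b/g;

print "$animal $doubled $short_words\n";
```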
=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

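A small sketch of the escapes in use, both in a pattern and in ordinary
double-quoted strings. The input string C<a.b*c> and the variable names
are assumptions made for the example.

```perl
use strict;
use warnings;

# \Q...\E makes interpolated text match literally, so '.' and '*'
# lose their special meaning.
my $user_input = "a.b*c";
my $literal_re = qr/\Q$user_input\E/;
my $hit  = ("xa.b*cx" =~ $literal_re) ? 1 : 0;
my $miss = ("aXbbc"   =~ $literal_re) ? 1 : 0;

# Case-modifying escapes work the same way in double-quoted strings.
my $name  = "perl";
my $title = "\u$name";           # first character titlecased
my $upper = "\U$name\E rocks";   # uppercase until \E
my $lower = "\lPERL";            # first character lowercased

# Octal, hexadecimal and control escapes can name the same character.
my $same_tab = ("\011" eq "\t" && "\x09" eq "\t" && "\cI" eq "\t") ? 1 : 0;

print "$hit $miss $title $upper $lower $same_tab\n";
```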
=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work both inside and outside a character class.
The first six are locale aware; all are Unicode aware.  The default
character class equivalents are given.  See L<perllocale> and
L<perlunicode> for details.

   \d      A digit                     [0-9]
   \D      A nondigit                  [^0-9]
   \w      A word character            [a-zA-Z0-9_]
   \W      A non-word character        [^a-zA-Z0-9_]
   \s      A whitespace character      [ \t\n\r\f]
   \S      A non-whitespace character  [^ \t\n\r\f]

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum              Alphanumeric
   alpha   IsAlpha              Alphabetic
   ascii   IsASCII              Any ASCII char
   blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
   cntrl   IsCntrl              Control characters
   digit   IsDigit  \d          Digits
   graph   IsGraph              Alphanumeric and punctuation
   lower   IsLower              Lowercase chars (locale and Unicode aware)
   print   IsPrint              Alphanumeric, punct, and space
   punct   IsPunct              Punctuation
   space   IsSpace  [\s\ck]     Whitespace
           IsSpacePerl   \s     Perl's whitespace definition
   upper   IsUpper              Uppercase chars (locale and Unicode aware)
   word    IsWord   \w          Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}

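The following sketch shows a few of these classes on plain ASCII input.
The sample string and variable names are assumptions for illustration
only; locale and Unicode behaviour are not exercised here.

```perl
use strict;
use warnings;

my $s = "Room 42-B, floor 3";

# \d and the POSIX class [:digit:] pick out the same characters here.
my @digits       = $s =~ /\d/g;
my @posix_digits = $s =~ /[[:digit:]]/g;

# POSIX classes are only valid inside a bracketed character class.
my @capitals = $s =~ /[[:upper:]]/g;

# Ranges, with a literal dash placed outside the brackets.
my ($id) = $s =~ /([0-9]+-[A-Z])/;

# A negated class: delete everything that is neither \w nor \s.
(my $stripped = $s) =~ s/[^\w\s]//g;

print "@digits | @capitals | $id | $stripped\n";
```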
=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before a final newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

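The differences between C<$>, C<\Z> and C<\z>, and the way C<\G>
cooperates with C</gc>, are easiest to see by experiment. The sketch
below is one possible demonstration; the tiny tokenizer and its token
labels (C<NUM>, C<WORD>) are invented for the example.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

# With /m, ^ matches at the start of every line.
my @line_starts = $text =~ /^(\w+)/mg;

my $dollar  = ($text =~ /two$/)  ? 1 : 0;   # $ tolerates a final newline
my $big_z   = ($text =~ /two\Z/) ? 1 : 0;   # so does \Z
my $small_z = ($text =~ /two\z/) ? 1 : 0;   # \z does not

# \G anchors each attempt where the previous /g match ended;
# /c keeps pos() in place when an attempt fails.
my $src = "12ab34";
my @tokens;
while (1) {
    if    ($src =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
    elsif ($src =~ /\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
    else                           { last }
}

print "@line_starts | $dollar $big_z $small_z | @tokens\n";
```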
=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> leftmost.

   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must occur exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})

There is no quantifier {,n}; it is interpreted as a literal string.

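To see the difference between maximal and minimal matching, compare the
captures below. This is a sketch with made-up sample strings, not a
recommended way to parse markup.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ runs to the last '>' it can still match against.
my ($greedy) = $html =~ /<(.+)>/;

# Minimal: .+? stops at the first '>' that lets the match succeed.
my ($minimal) = $html =~ /<(.+?)>/;

# The same contrast with a bounded quantifier.
my ($max_run) = "aaaaa" =~ /(a{2,3})/;
my ($min_run) = "aaaaa" =~ /(a{2,3}?)/;

print "$greedy\n$minimal\n$max_run $min_run\n";
```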
=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)          Zero-width positive lookahead assertion
   (?!...)          Zero-width negative lookahead assertion
   (?<=...)         Zero-width positive lookbehind assertion
   (?<!...)         Zero-width negative lookbehind assertion
   (?>...)          Grab what we can, prohibit backtracking
   (?{ code })      Embedded code, return value becomes $^R
   (??{ code })     Dynamic regex, return value used as regex
   (?(cond)yes|no)  cond being integer corresponding to capturing parens
   (?(cond)yes)        or a lookaround/eval zero-width assertion

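A few of these constructs in action, as a hedged sketch. The price
string and the variable names are assumptions chosen for the example;
the code-embedding and conditional constructs are not shown.

```perl
use strict;
use warnings;

my $prices = "cost: 100USD, 200EUR, 300USD";

# Lookahead: digits that are followed by "USD" (which is not consumed).
my @usd = $prices =~ /(\d+)(?=USD)/g;

# Lookbehind: digits that are preceded by a comma and a space.
my @after_comma = $prices =~ /(?<=, )(\d+)/g;

# Negative lookahead: "foo" not followed by "bar".
my ($suffix) = "foobar foobaz" =~ /foo(?!bar)(\w+)/;

# Inline modifier: only the parenthesized part ignores case.
my $inline = ("Hello WORLD" =~ /Hello (?i:world)/) ? 1 : 0;

# (?>...) never gives characters back, so the second pattern fails
# where ordinary backtracking lets the first one succeed.
my $backtracking = ("aaab" =~ /a+ab/)     ? 1 : 0;
my $atomic       = ("aaab" =~ /(?>a+)ab/) ? 1 : 0;

print "@usd | @after_comma | $suffix | $inline $backtracking $atomic\n";
```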
=head2 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything prior to the matched string
   $'    Everything after the matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>.

   $1, $2 ...  Hold the Nth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their I<opening> paren.

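The C<substr> expressions below follow the equivalents listed under
C<@LAST_MATCH_START> in L<perlvar>, and rebuild the three slow
variables from C<@-> and C<@+>. The sample string and variable names
are assumptions for the sketch.

```perl
use strict;
use warnings;

my $str = "key=value; other";
my ($k, $v, $pre, $match, $post);

if ($str =~ /(\w+)=(\w+)/) {
    ($k, $v) = ($1, $2);                              # numbered captures

    $pre   = substr($str, 0, $-[0]);                  # stands in for $`
    $match = substr($str, $-[0], $+[0] - $-[0]);      # stands in for $&
    $post  = substr($str, $+[0]);                     # stands in for $'
}

print "[$pre] [$match] [$post] $k $v\n";
```

Copy C<$1> and friends into lexical variables straight away, since the
next successful match overwrites them.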
=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.

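A brief sketch of several of these functions working alongside a
regex. All input strings and variable names are assumptions made for
the example; C<reset> and C<study> are not shown.

```perl
use strict;
use warnings;

# ucfirst and lc are the function forms of \u and \L.
my $name = ucfirst(lc("pERL"));

# quotemeta is the function form of \Q: escape before interpolating.
my $quoted   = quotemeta("1+1=2");
my $is_exact = ("1+1=2" =~ /\A$quoted\z/) ? 1 : 0;

# split takes a regex as its separator.
my @fields = split /\s*,\s*/, "a , b,c";

# pos reports where the last /g match on a string ended, and can be
# assigned to in order to move that position.
my $s = "aXbXc";
$s =~ /X/g;
my $first_end = pos($s);
pos($s) = 0;                  # rewind to the start
$s =~ /X/g;
$s =~ /X/g;
my $second_end = pos($s);

print "$name $quoted $is_exact @fields $first_end $second_end\n";
```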
=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is usually identical to uppercase, but differs
for certain characters such as the German "sharp s".

=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut