Commit | Line | Data |
30487ceb |
1 | =head1 NAME |
2 | |
3 | perlreref - Perl Regular Expressions Reference |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | This is a quick reference to Perl's regular expressions. |
8 | For full information see L<perlre> and L<perlop>, as well |
6d014f17 |
9 | as the L</"SEE ALSO"> section in this document. |
30487ceb |
10 | |
a5365663 |
11 | =head2 OPERATORS |
30487ceb |
12 | |
e17472c5 |
13 | C<=~> determines to which variable the regex is applied. |
14 | In its absence, $_ is used. |
30487ceb |
15 | |
e17472c5 |
16 | $var =~ /foo/; |
30487ceb |
17 | |
e17472c5 |
18 | C<!~> determines to which variable the regex is applied, |
19 | and negates the result of the match; it returns |
20 | false if the match succeeds, and true if it fails. |
6d014f17 |
21 | |
e17472c5 |
22 | $var !~ /foo/; |
6d014f17 |
23 | |
e17472c5 |
24 | C<m/pattern/msixpogc> searches a string for a pattern match, |
25 | applying the given options. |
30487ceb |
26 | |
e17472c5 |
27 | m Multiline mode - ^ and $ match internal lines |
28 | s match as a Single line - . matches \n |
29 | i case-Insensitive |
30 | x eXtended legibility - free whitespace and comments |
31 | p Preserve a copy of the matched string - |
32 | ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined. |
33 | o compile pattern Once |
34 | g Global - all occurrences |
35 | c don't reset pos on failed matches when using /g |
30487ceb |
36 | |
e17472c5 |
37 | If 'pattern' is an empty string, the last I<successfully> matched |
38 | regex is used. Delimiters other than '/' may be used for both this |
64c5a566 |
39 | operator and the following ones. The leading C<m> can be omitted |
e17472c5 |
40 | if the delimiter is '/'. |
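As an illustration of the match operator and some of the modifiers listed above, here is a minimal sketch. The sample string and variable names are invented for this example.

```perl
use strict;
use warnings;

my $text = "Foo bar\nfoo baz\n";

# /i ignores case; in list context /g returns every match.
my @all = $text =~ /foo/gi;                 # ('Foo', 'foo')

# /m lets ^ match at the start of each internal line.
my @starts = $text =~ /^(\w+)/mg;           # ('Foo', 'foo')

# /x ignores pattern whitespace and allows comments;
# m{...} shows an alternate delimiter.
my ($second) = $text =~ m{
    \A \w+ \s+     # first word, then whitespace
    (\w+)          # capture the second word
}x;                                         # 'bar'

# /s lets . match a newline, so .+ can cross the line break.
my $spans = $text =~ /bar.+foo/s ? 1 : 0;   # 1

print "@all | @starts | $second | $spans\n";
```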
30487ceb |
41 | |
e17472c5 |
42 | C<qr/pattern/msixpo> lets you store a regex in a variable, |
43 | or pass one around. The modifiers are as for C<m//>, and they are stored |
44 | within the regex. |
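The following sketch suggests what "stored within the regex" means in practice. The helper C<count_matches> and the sample strings are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# The /i flag is stored inside the compiled regex object.
my $greeting = qr/hello/i;

# The outer pattern has no /i, yet the embedded part stays case-insensitive.
my $embedded = "say HELLO" =~ /say\s+$greeting/ ? 1 : 0;   # 1

# The outer literal "say" is still case-sensitive.
my $outer = "SAY hello" =~ /say\s+$greeting/ ? 1 : 0;      # 0

# A qr// object can be passed to a function like any other scalar.
# count_matches is a hypothetical helper, not a built-in.
sub count_matches {
    my ($re, $string) = @_;
    my $n = () = $string =~ /$re/g;   # list assignment in scalar context counts matches
    return $n;
}

my $n = count_matches($greeting, "Hello, hello, HELLO");   # 3
print "$embedded $outer $n\n";
```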
30487ceb |
45 | |
e17472c5 |
46 | C<s/pattern/replacement/msixpogce> substitutes matches of |
47 | 'pattern' with 'replacement'. Modifiers as for C<m//>, |
48 | with one addition: |
30487ceb |
49 | |
e17472c5 |
50 | e Evaluate 'replacement' as an expression |
30487ceb |
51 | |
e17472c5 |
52 | 'e' may be specified multiple times. 'replacement' is interpreted |
53 | as a double quoted string unless a single-quote (C<'>) is the delimiter. |
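A short sketch of substitution with C</e>, C</g> and single-quote delimiters, using made-up sample data:

```perl
use strict;
use warnings;

my $prices = "apple 3, pear 5";

# /e evaluates the replacement as Perl code; /g applies it to every match.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;   # "apple 6, pear 10"

# s/// returns the number of substitutions it made.
my $count = ($prices =~ s/(\d+)/<$1>/g);        # 2

# With ' as the delimiter the replacement is taken literally,
# so $1 is not interpolated here.
my $literal = "cost";
$literal =~ s'cost'$1';                         # the two characters $1

print "$doubled | $count | $prices | $literal\n";
```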
30487ceb |
54 | |
e17472c5 |
55 | C<?pattern?> is like C<m/pattern/> but matches only once. No alternate |
56 | delimiters can be used. Must be reset with reset(). |
30487ceb |
57 | |
a5365663 |
58 | =head2 SYNTAX |
30487ceb |
59 | |
6d014f17 |
60 | \ Escapes the character immediately following it |
e5a7b003 |
61 | . Matches any single character except a newline (unless /s is used) |
62 | ^ Matches at the beginning of the string (or line, if /m is used) |
63 | $ Matches at the end of the string (or line, if /m is used) |
64 | * Matches the preceding element 0 or more times |
65 | + Matches the preceding element 1 or more times |
66 | ? Matches the preceding element 0 or 1 times |
67 | {...} Specifies a range of occurrences for the element preceding it |
68 | [...] Matches any one of the characters contained within the brackets |
69 | (...) Groups subexpressions for capturing to $1, $2... |
70 | (?:...) Groups subexpressions without capturing (cluster) |
6d014f17 |
71 | | Matches either the subexpression preceding or following it |
64c5a566 |
72 | \1, \2, \3 ... Matches the text from the Nth group |
73 | \g1 or \g{1}, \g2 ... Matches the text from the Nth group |
74 | \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group |
75 | \g{name} Named backreference |
76 | \k<name> Named backreference |
77 | \k'name' Named backreference |
78 | (?P=name) Named backreference (python syntax) |
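The grouping and backreference forms above might be used as in this sketch. The input strings are invented for illustration.

```perl
use strict;
use warnings;

# \1 matches the same text that group 1 captured: find a repeated word.
my ($word) = "this is is a test" =~ /\b(\w+)\s+\1\b/;       # 'is'

# \g{-1} refers to the group immediately before it: find a doubled letter.
my ($letter) = "bookkeeper" =~ /(\w)\g{-1}/;                # 'o'

# \k<name> refers back to a named group.
my $year;
if ("2024-2024" =~ /(?<year>\d{4})-\k<year>/) {
    $year = $+{year};                                       # '2024'
}

# (?:...) groups without capturing, so (gz|bz2) is still $1.
my ($ext) = "archive.tar.gz" =~ /(?:\.tar)?\.(gz|bz2)\z/;   # 'gz'

print "$word $letter $year $ext\n";
```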
30487ceb |
79 | |
80 | =head2 ESCAPE SEQUENCES |
81 | |
82 | These work as in normal strings. |
83 | |
84 | \a Alarm (beep) |
85 | \e Escape |
86 | \f Formfeed |
87 | \n Newline |
88 | \r Carriage return |
89 | \t Tab |
6ed007ae |
90 | \037 Any octal ASCII value |
30487ceb |
91 | \x7f Any hexadecimal ASCII value |
92 | \x{263a} A wide hexadecimal value |
93 | \cx Control-x |
94 | \N{name} A named character |
e526e8bb |
95 | \N{U+263D} A Unicode character by hex ordinal |
30487ceb |
96 | |
6d014f17 |
97 | \l Lowercase next character |
d3b55b48 |
98 | \u Titlecase next character |
30487ceb |
99 | \L Lowercase until \E |
d3b55b48 |
100 | \U Uppercase until \E |
30487ceb |
101 | \Q Disable pattern metacharacters until \E |
e17472c5 |
102 | \E End modification |
30487ceb |
103 | |
47e8a552 |
104 | For Titlecase, see L</Titlecase>. |
105 | |
30487ceb |
106 | This one works differently from normal strings: |
107 | |
108 | \b An assertion, not backspace, except in a character class |
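A small sketch of the case-modification escapes, C<\Q...\E>, and the two meanings of C<\b>. Variable names and strings are assumptions made for this example.

```perl
use strict;
use warnings;

my $name = "a.b";

# Unquoted, the . in $name is a metacharacter and matches any character.
my $loose  = "axb" =~ /\A$name\z/     ? 1 : 0;   # 1

# \Q...\E disables metacharacters, so only a literal "a.b" matches.
my $quoted = "axb" =~ /\A\Q$name\E\z/ ? 1 : 0;   # 0
my $exact  = "a.b" =~ /\A\Q$name\E\z/ ? 1 : 0;   # 1

# \L lowercases until \E; \u titlecases the next character.
my $title = "\u\LHELLO wORLD\E";                 # "Hello world"

# Outside a character class \b is a word-boundary assertion.
my $word  = "a cat"  =~ /\bcat\b/ ? 1 : 0;       # 1
my $inner = "concat" =~ /\bcat\b/ ? 1 : 0;       # 0

# Inside a character class \b is a backspace character.
my $backspace = "x\by" =~ /[\b]/ ? 1 : 0;        # 1

print "$loose $quoted $exact $title $word $inner $backspace\n";
```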
109 | |
110 | =head2 CHARACTER CLASSES |
111 | |
112 | [amy] Match 'a', 'm' or 'y' |
113 | [f-j] Dash specifies "range" |
114 | [f-j-] Dash escaped or at start or end means 'dash' |
6d014f17 |
115 | [^f-j] Caret indicates "match any character _except_ these" |
30487ceb |
116 | |
df225385 |
117 | The following sequences (except C<\N>) work inside or outside a character class. |
e17472c5 |
118 | The first six are locale aware, all are Unicode aware. See L<perllocale> |
119 | and L<perlunicode> for details. |
120 | |
121 | \d A digit |
122 | \D A nondigit |
123 | \w A word character |
124 | \W A non-word character |
125 | \s A whitespace character |
126 | \S A non-whitespace character |
418e7b04 |
127 | \h A horizontal whitespace |
128 | \H A non horizontal whitespace |
b3b85878 |
129 | \N A non newline (when not followed by '{NAME}'; experimental; not |
130 | valid in a character class; equivalent to [^\n]; it's like '.' |
131 | without /s modifier) |
418e7b04 |
132 | \v A vertical whitespace |
133 | \V A non vertical whitespace |
e17472c5 |
134 | \R A generic newline (?>\v|\x0D\x0A) |
e04a154e |
135 | |
136 | \C Match a byte (with Unicode, '.' matches a character) |
30487ceb |
137 | \pP Match P-named (Unicode) property |
e1b711da |
138 | \p{...} Match Unicode property with name longer than 1 character |
30487ceb |
139 | \PP Match non-P |
e1b711da |
140 | \P{...} Match lack of Unicode property with name longer than 1 char |
0111a78f |
141 | \X Match Unicode extended grapheme cluster |
30487ceb |
142 | |
143 | POSIX character classes and their Unicode and Perl equivalents: |
144 | |
e04a154e |
145 |    alnum   IsAlnum              Alphanumeric |
146 |    alpha   IsAlpha              Alphabetic |
147 |    ascii   IsASCII              Any ASCII char |
148 |    blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension) |
149 |    cntrl   IsCntrl              Control characters |
150 |    digit   IsDigit  \d          Digits |
151 |    graph   IsGraph              Alphanumeric and punctuation |
152 |    lower   IsLower              Lowercase chars (locale and Unicode aware) |
153 |    print   IsPrint              Alphanumeric, punct, and space |
154 |    punct   IsPunct              Punctuation |
155 |    space   IsSpace  [\s\ck]     Whitespace |
156 |            IsSpacePerl \s       Perl's whitespace definition |
157 |    upper   IsUpper              Uppercase chars (locale and Unicode aware) |
158 |    word    IsWord   \w          Alphanumeric plus _ (Perl extension) |
159 |    xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit |
30487ceb |
160 | |
161 | Within a character class: |
162 | |
163 |    POSIX        traditional   Unicode |
164 |    [:digit:]    \d            \p{IsDigit} |
165 |    [:^digit:]   \D            \P{IsDigit} |
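The character class forms in this section could be exercised as follows. This is a sketch with invented inputs, and it assumes Perl 5.10 or later for C<\R>.

```perl
use strict;
use warnings;

# \d+ collects runs of digits.
my @runs = "a1b22" =~ /(\d+)/g;                                     # ('1', '22')

# A dash at the end of a class is a literal dash.
my $dash = "x-y" =~ /[f-j-]/ ? 1 : 0;                               # 1

# A leading caret negates the class.
my @outside = "jam" =~ /([^f-j])/g;                                 # ('a', 'm')

# POSIX classes are written inside a bracketed class.
my $posix = "abc123" =~ /\A[[:alpha:]]+[[:digit:]]+\z/ ? 1 : 0;     # 1
my ($non_digit) = "42x" =~ /([[:^digit:]])/;                        # 'x'

# \p{...} matches a Unicode property, here "uppercase letter".
my ($upper) = "aBc" =~ /(\p{Lu})/;                                  # 'B'

# \R treats \r\n as one generic newline.
my @lines = split /\R/, "a\r\nb\nc";                                # ('a', 'b', 'c')

print "@runs | $dash | @outside | $posix | $non_digit | $upper | @lines\n";
```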
166 | |
167 | =head2 ANCHORS |
168 | |
169 | All are zero-width assertions. |
170 | |
171 | ^ Match string start (or line, if /m is used) |
172 | $ Match string end (or line, if /m is used) or before newline |
173 | \b Match word boundary (between \w and \W) |
6d014f17 |
174 | \B Match except at word boundary (between \w and \w or \W and \W) |
30487ceb |
175 | \A Match string start (regardless of /m) |
6d014f17 |
176 | \Z Match string end (before optional newline) |
30487ceb |
177 | \z Match absolute string end |
178 | \G Match where previous m//g left off |
30487ceb |
179 | |
64c5a566 |
180 | \K Keep the stuff left of the \K, don't include it in $& |
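To make the differences between the end-of-string anchors concrete, and to show C<\G> and C<\K> in use, here is a minimal sketch. The tiny tokenizer and its C<NUM:>/C<OP:> labels are assumptions made for illustration only.

```perl
use strict;
use warnings;

my $line = "end\n";

my $dollar  = $line =~ /end$/  ? 1 : 0;   # 1: $ may match before a final newline
my $big_z   = $line =~ /end\Z/ ? 1 : 0;   # 1: \Z allows an optional final newline
my $small_z = $line =~ /end\z/ ? 1 : 0;   # 0: \z is the absolute end

# \G anchors at pos(); /c keeps pos() in place when a match fails,
# so the next alternative can be tried at the same position.
my $src = "12+34";
my @tokens;
pos($src) = 0;
while (pos($src) < length $src) {
    if    ($src =~ /\G(\d+)/gc)   { push @tokens, "NUM:$1" }
    elsif ($src =~ /\G([+*-])/gc) { push @tokens, "OP:$1" }
    else                          { last }
}
# @tokens is ('NUM:12', 'OP:+', 'NUM:34')

# \K drops everything matched so far from the text being replaced.
(my $patched = "foo=bar") =~ s/foo=\Kbar/baz/;   # "foo=baz"

print "$dollar $big_z $small_z | @tokens | $patched\n";
```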
181 | |
30487ceb |
182 | =head2 QUANTIFIERS |
183 | |
ac036724 |
184 | Quantifiers are greedy by default and match the B<longest> leftmost. |
30487ceb |
185 | |
64c5a566 |
186 |    Maximal Minimal Possessive Allowed range |
187 |    ------- ------- ---------- ------------- |
188 |    {n,m}   {n,m}?  {n,m}+     Must occur at least n times |
189 |                               but no more than m times |
190 |    {n,}    {n,}?   {n,}+      Must occur at least n times |
191 |    {n}     {n}?    {n}+       Must occur exactly n times |
192 |    *       *?      *+         0 or more times (same as {0,}) |
193 |    +       +?      ++         1 or more times (same as {1,}) |
194 |    ?       ??      ?+         0 or 1 time (same as {0,1}) |
195 | |
196 | The possessive forms (new in Perl 5.10) prevent backtracking: what gets |
197 | matched by a pattern with a possessive quantifier will not be backtracked |
198 | into, even if that causes the whole match to fail. |
30487ceb |
199 | |
ac036724 |
200 | Before Perl 5.34 there is no quantifier C<{,n}>; it is interpreted as a literal string. |
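The three quantifier flavours can be compared with a sketch like the one below. The sample strings are invented, and the possessive form assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my $html = "<b>one</b><b>two</b>";

# Maximal (greedy): .* takes as much as it can.
my ($greedy) = $html =~ m{<b>(.*)</b>};     # 'one</b><b>two'

# Minimal: .*? takes as little as it can.
my ($lazy)   = $html =~ m{<b>(.*?)</b>};    # 'one'

# A bounded range, greedy and minimal.
my ($most)  = "aaaaa" =~ /(a{2,3})/;        # 'aaa'
my ($least) = "aaaaa" =~ /(a{2,3}?)/;       # 'aa'

# a+ gives one "a" back so the final a can match.
my $plain      = "aaa" =~ /a+a/  ? 1 : 0;   # 1

# a++ never gives anything back, so the final a has nothing left to match.
my $possessive = "aaa" =~ /a++a/ ? 1 : 0;   # 0

print "$greedy | $lazy | $most | $least | $plain | $possessive\n";
```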
6d014f17 |
201 | |
30487ceb |
202 | =head2 EXTENDED CONSTRUCTS |
203 | |
64c5a566 |
204 | (?#text) A comment |
205 | (?:...) Groups subexpressions without capturing (cluster) |
206 | (?pimsx-imsx:...) Enable/disable option (as per m// modifiers) |
207 | (?=...) Zero-width positive lookahead assertion |
208 | (?!...) Zero-width negative lookahead assertion |
209 | (?<=...) Zero-width positive lookbehind assertion |
210 | (?<!...) Zero-width negative lookbehind assertion |
211 | (?>...) Grab what we can, prohibit backtracking |
212 | (?|...) Branch reset |
213 | (?<name>...) Named capture |
214 | (?'name'...) Named capture |
215 | (?P<name>...) Named capture (python syntax) |
216 | (?{ code }) Embedded code, return value becomes $^R |
217 | (??{ code }) Dynamic regex, return value used as regex |
218 | (?N) Recurse into subpattern number N |
219 | (?-N), (?+N) Recurse into Nth previous/next subpattern |
220 | (?R), (?0) Recurse at the beginning of the whole pattern |
221 | (?&name) Recurse into a named subpattern |
222 | (?P>name) Recurse into a named subpattern (python syntax) |
223 | (?(cond)yes|no) |
224 | (?(cond)yes) Conditional expression, where "cond" can be: |
225 | (N) subpattern N has matched something |
226 | (<name>) named subpattern has matched something |
227 | ('name') named subpattern has matched something |
228 | (?{code}) code condition |
229 | (R) true if recursing |
230 | (RN) true if recursing into Nth subpattern |
231 | (R&name) true if recursing into named subpattern |
232 | (DEFINE) always false, no no-pattern allowed |
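A few of the extended constructs above, sketched with invented inputs. The C<$balanced> pattern is only one possible way to use C<(?1)> recursion, not a canonical recipe.

```perl
use strict;
use warnings;

# Lookahead: digits only when followed by "px"; the "px" is not consumed.
my @px = "10px 20em 30px" =~ /(\d+)(?=px)/g;          # ('10', '30')

# Lookbehind: digits only when preceded by a dollar sign.
my ($amount) = 'price: $42' =~ /(?<=\$)(\d+)/;        # '42'

# Named captures are available through %+.
my %date;
if ("2024-05-17" =~ /(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)/) {
    %date = %+;                                       # y => 2024, m => 05, d => 17
}

# Branch reset: both alternatives capture into $1.
my ($digit) = "b7" =~ /(?|a(\d)|b(\d))/;              # '7'

# (?1) recurses into group 1 to match nested parentheses.
my $balanced = qr/\A(\((?:[^()]++|(?1))*\))\z/;
my $ok  = "(a(b)c)" =~ $balanced ? 1 : 0;             # 1
my $bad = "(a(b"    =~ $balanced ? 1 : 0;             # 0

print "@px | $amount | $date{y}/$date{m}/$date{d} | $digit | $ok $bad\n";
```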
30487ceb |
233 | |
a5365663 |
234 | =head2 VARIABLES |
30487ceb |
235 | |
236 | $_ Default variable for operators to use |
30487ceb |
237 | |
30487ceb |
238 | $` Everything prior to matched string |
e17472c5 |
239 | $& Entire matched string |
30487ceb |
240 | $' Everything after matched string |
241 | |
e17472c5 |
242 | ${^PREMATCH} Everything prior to matched string |
243 | ${^MATCH} Entire matched string |
244 | ${^POSTMATCH} Everything after matched string |
245 | |
246 | The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use |
64c5a566 |
247 | within your program. Consult L<perlvar> for C<@-> |
30487ceb |
248 | to see equivalent expressions that won't cause a slowdown. |
e17472c5 |
249 | See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you |
250 | can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}> |
251 | and C<${^POSTMATCH}>, but for them to be defined, you have to |
252 | specify the C</p> (preserve) modifier on your regular expression. |
30487ceb |
253 | |
254 | $1, $2 ... hold the Nth captured expr |
255 | $+ Last parenthesized pattern match |
256 | $^N Holds the most recently closed capture |
257 | $^R Holds the result of the last (?{...}) expr |
6d014f17 |
258 | @- Offsets of starts of groups. $-[0] holds start of whole match |
259 | @+ Offsets of ends of groups. $+[0] holds end of whole match |
e17472c5 |
260 | %+ Named capture buffers |
261 | %- Named capture buffers, as array refs |
30487ceb |
262 | |
6d014f17 |
263 | Captured groups are numbered according to their I<opening> paren. |
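The match variables can be inspected as in this sketch, which assumes Perl 5.10 or later for C</p> and named captures. The string and the keys of the hypothetical C<%seen> hash are made up for illustration.

```perl
use strict;
use warnings;

my $str = "hello world";
my %seen;

if ($str =~ /(?<w>wor)(ld)/p) {
    %seen = (
        pre        => ${^PREMATCH},            # 'hello '
        match      => ${^MATCH},               # 'world'
        post       => ${^POSTMATCH},           # ''
        start      => $-[0],                   # 6, offset where the match starts
        end        => $+[0],                   # 11, offset where the match ends
        first      => $1,                      # 'wor'
        last_paren => $+,                      # 'ld'
        named      => $+{w},                   # 'wor'
        # The text before the match, built from @- instead of $`.
        before     => substr($str, 0, $-[0]),  # 'hello '
    );
}

print join(", ", map { "$_=$seen{$_}" } sort keys %seen), "\n";
```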
30487ceb |
264 | |
a5365663 |
265 | =head2 FUNCTIONS |
30487ceb |
266 | |
267 | lc Lowercase a string |
268 | lcfirst Lowercase first char of a string |
269 | uc Uppercase a string |
47e8a552 |
270 | ucfirst Titlecase first char of a string |
271 | |
30487ceb |
272 | pos Return or set current match position |
273 | quotemeta Quote metacharacters |
274 | reset Reset ?pattern? status |
275 | study Analyze string for optimizing matching |
276 | |
e17472c5 |
277 | split Use a regex to split a string into parts |
30487ceb |
278 | |
d3b55b48 |
279 | The first four of these are like the escape sequences C<\L>, C<\l>, |
280 | C<\U>, and C<\u>. For Titlecase, see L</Titlecase>. |
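A brief sketch of some of these functions working together, using invented sample strings:

```perl
use strict;
use warnings;

# lc then ucfirst: lowercase everything, then titlecase the first character.
my $title = ucfirst(lc("hELLO"));            # 'Hello'

# After a scalar /g match, pos() reports where the next match would start.
my $s = "aXbXc";
my $found = $s =~ /X/g;
my $where = pos($s);                         # 2
pos($s) = 0;                                 # rewind to the start

# quotemeta backslashes regex metacharacters.
my $quoted = quotemeta("a.b*c");             # 'a\.b\*c'

# split takes a regex as its separator.
my @parts = split /\s*,\s*/, "a , b,c";      # ('a', 'b', 'c')

print "$title | $where | $quoted | @parts\n";
```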
47e8a552 |
281 | |
1501d360 |
282 | =head2 TERMINOLOGY |
47e8a552 |
283 | |
a5365663 |
284 | =head3 Titlecase |
47e8a552 |
285 | |
286 | A Unicode concept that is usually the same as uppercase, but differs for |
287 | certain characters such as the German "sharp s". |
288 | |
40506b5d |
289 | =head1 AUTHOR |
30487ceb |
290 | |
64c5a566 |
291 | Iain Truskett. Updated by the Perl 5 Porters. |
30487ceb |
292 | |
293 | This document may be distributed under the same terms as Perl itself. |
294 | |
40506b5d |
295 | =head1 SEE ALSO |
30487ceb |
296 | |
297 | =over 4 |
298 | |
299 | =item * |
300 | |
301 | L<perlretut> for a tutorial on regular expressions. |
302 | |
303 | =item * |
304 | |
305 | L<perlrequick> for a rapid tutorial. |
306 | |
307 | =item * |
308 | |
309 | L<perlre> for more details. |
310 | |
311 | =item * |
312 | |
313 | L<perlvar> for details on the variables. |
314 | |
315 | =item * |
316 | |
317 | L<perlop> for details on the operators. |
318 | |
319 | =item * |
320 | |
321 | L<perlfunc> for details on the functions. |
322 | |
323 | =item * |
324 | |
325 | L<perlfaq6> for FAQs on regular expressions. |
326 | |
327 | =item * |
328 | |
64c5a566 |
329 | L<perlrebackslash> for a reference on backslash sequences. |
330 | |
331 | =item * |
332 | |
333 | L<perlrecharclass> for a reference on character classes. |
334 | |
335 | =item * |
336 | |
30487ceb |
337 | The L<re> module to alter behaviour and aid |
338 | debugging. |
339 | |
340 | =item * |
341 | |
342 | L<perldebug/"Debugging regular expressions"> |
343 | |
344 | =item * |
345 | |
e17472c5 |
346 | L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale> |
30487ceb |
347 | for details on regexes and internationalisation. |
348 | |
349 | =item * |
350 | |
351 | I<Mastering Regular Expressions> by Jeffrey Friedl |
08d7a6b2 |
352 | (F<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and |
30487ceb |
353 | reference on the topic. |
354 | |
355 | =back |
356 | |
40506b5d |
357 | =head1 THANKS |
30487ceb |
358 | |
359 | David P.C. Wollmann, |
360 | Richard Soderberg, |
361 | Sean M. Burke, |
362 | Tom Christiansen, |
e5a7b003 |
363 | Jim Cromie, |
30487ceb |
364 | and |
365 | Jeffrey Goff |
366 | for useful advice. |
6d014f17 |
367 | |
368 | =cut |