Commit | Line | Data |
30487ceb |
1 | =head1 NAME |
2 | |
3 | perlreref - Perl Regular Expressions Reference |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | This is a quick reference to Perl's regular expressions. |
8 | For full information see L<perlre> and L<perlop>, as well |
6d014f17 |
9 | as the L</"SEE ALSO"> section in this document. |
30487ceb |
10 | |
11 | =head1 OPERATORS |
12 | |
13 | =~ determines to which variable the regex is applied. |
e5a7b003 |
14 | In its absence, $_ is used. |
30487ceb |
15 | |
16 | $var =~ /foo/; |
17 | |
6d014f17 |
18 | !~ determines to which variable the regex is applied, |
19 | and negates the result of the match; it returns |
20 | false if the match succeeds, and true if it fails. |
21 | |
22 | $var !~ /foo/; |
23 | |
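As a minimal sketch of the two binding operators described above (the variable names here are illustrative only), the following compares C<=~>, C<!~>, and a match against the default C<$_>:

```perl
use strict;
use warnings;

my $var = "some food here";

# =~ applies the regex to $var rather than to $_
my $bound = ($var =~ /foo/) ? 1 : 0;

# !~ applies the same regex but negates the result
my $negated = ($var !~ /foo/) ? 1 : 0;

# With no binding operator, the regex is applied to $_
my $default;
for ("no match here") {
    $default = /foo/ ? 1 : 0;
}

print "bound=$bound negated=$negated default=$default\n";
```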
30487ceb |
24 | m/pattern/igmsoxc searches a string for a pattern match, |
25 | applying the given options. |
26 | |
27 | i case-Insensitive |
28 | g Global - all occurrences |
29 | m Multiline mode - ^ and $ match internal lines |
30 | s match as a Single line - . matches \n |
31 | o compile pattern Once |
32 | x eXtended legibility - free whitespace and comments |
6d014f17 |
33 | c don't reset pos on failed matches when using /g |
30487ceb |
34 | |
6d014f17 |
35 | If 'pattern' is an empty string, the last I<successfully> matched |
e5a7b003 |
36 | regex is used. Delimiters other than '/' may be used for both this |
30487ceb |
37 | operator and the following ones. |
38 | |
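The modifiers listed above can be tried out with a short script. This is only a sketch with made-up sample strings; it exercises /i, /g, /m, /s, /x, an alternate delimiter, and the effect of /c on pos.

```perl
use strict;
use warnings;

my $text = "Apple\nbanana\nAPPLE pie\n";

# /i ignores case; /g in list context returns every match
my @apples = $text =~ /apple/ig;

# /m lets ^ match at the start of each internal line
my @line_starts = $text =~ /^(\w)/mg;

# /s lets . match the newline between the first two lines
my $spans = ($text =~ /Apple.banana/s) ? 1 : 0;

# /x permits free whitespace and comments inside the pattern
my ($word) = $text =~ /
    ^ (\w+)     # a word at the start of a line
    \s+ pie     # then whitespace and the word pie
/mx;

# Alternate delimiters avoid escaping slashes
my $path_ok = ("/usr/bin" =~ m{^/usr/}) ? 1 : 0;

# /c keeps pos after a failed /g match; without it pos is reset
my $s      = "aab";
my $first  = $s =~ /a/g;
my $failed = $s =~ /z/gc;
my $kept   = pos($s);
$failed    = $s =~ /z/g;
my $reset  = defined pos($s) ? 1 : 0;

print "@apples | @line_starts | $spans | $word | $kept | $reset\n";
```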
39 | qr/pattern/imsox lets you store a regex in a variable, |
e5a7b003 |
40 | or pass one around. Modifiers are as for m//, and are stored |
30487ceb |
41 | within the regex. |
42 | |
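To illustrate that modifiers travel with a qr// object, here is a small sketch (the names are hypothetical). The /i given to qr// applies only to the stored piece, even when that piece is interpolated into a larger, case-sensitive pattern.

```perl
use strict;
use warnings;

# Store a case-insensitive regex in a variable
my $word_re = qr/hello/i;

# Use it directly as a pattern
my $match_plain = ("HELLO world" =~ $word_re) ? 1 : 0;

# Interpolate it; the /i stays attached to the stored part only
my $match_embedded = ("say Hello, WORLD" =~ /$word_re, WORLD/) ? 1 : 0;

# The surrounding text is still case-sensitive, so this one fails
my $outer_sensitive = ("say Hello, world" =~ /$word_re, WORLD/) ? 0 : 1;

print "$match_plain $match_embedded $outer_sensitive\n";
```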
43 | s/pattern/replacement/igmsoxe substitutes matches of |
e5a7b003 |
44 | 'pattern' with 'replacement'. Modifiers are as for m//, |
45 | with one addition: |
30487ceb |
46 | |
47 | e Evaluate replacement as an expression |
48 | |
49 | 'e' may be specified multiple times. 'replacement' is interpreted |
50 | as a double quoted string unless a single-quote (') is the delimiter. |
51 | |
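A brief sketch of substitution, assuming invented sample strings: /e evaluates the replacement as an expression, and single-quote delimiters suppress interpolation in the replacement.

```perl
use strict;
use warnings;

# /e evaluates the replacement as Perl code; /g repeats for each match
my $doubled = "cost: 10 and 20";
$doubled =~ s/(\d+)/$1 * 2/ge;

# With ordinary delimiters the replacement interpolates like "..."
my $name   = "world";
my $interp = "hi X";
$interp =~ s/X/$name/;

# With single-quote delimiters the replacement is taken literally
my $literal = "hi X";
$literal =~ s'X'$name';

print "$doubled | $interp | $literal\n";
```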
e5a7b003 |
52 | ?pattern? is like m/pattern/ but matches only once. No alternate |
6d014f17 |
53 | delimiters can be used. Must be reset with L<reset|perlfunc/reset>. |
30487ceb |
54 | |
55 | =head1 SYNTAX |
56 | |
6d014f17 |
57 | \ Escapes the character immediately following it |
e5a7b003 |
58 | . Matches any single character except a newline (unless /s is used) |
59 | ^ Matches at the beginning of the string (or line, if /m is used) |
60 | $ Matches at the end of the string (or line, if /m is used) |
61 | * Matches the preceding element 0 or more times |
62 | + Matches the preceding element 1 or more times |
63 | ? Matches the preceding element 0 or 1 times |
64 | {...} Specifies a range of occurrences for the element preceding it |
65 | [...] Matches any one of the characters contained within the brackets |
66 | (...) Groups subexpressions for capturing to $1, $2... |
67 | (?:...) Groups subexpressions without capturing (cluster) |
6d014f17 |
68 | | Matches either the subexpression preceding or following it |
30487ceb |
69 | \1, \2 ... The text from the Nth group |
70 | |
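The grouping constructs in the table above can be sketched as follows. The sample strings are made up; the example shows a backreference, a non-capturing cluster with alternation, and a counted quantifier.

```perl
use strict;
use warnings;

# (...) captures, and \1 refers back to the text of group 1
my ($repeat) = "this is is a test" =~ /\b(\w+) \1\b/;

# (?:...) groups the alternation without creating a capture
my @parts = "2024-01-15" =~ m{(\d+)(?:-|/)(\d+)};

# {n,m} limits how many times the preceding element may repeat
my ($run) = "aaaaa" =~ /^(a{2,3})/;

print "$repeat | @parts | $run\n";
```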
71 | =head2 ESCAPE SEQUENCES |
72 | |
73 | These work as in normal strings. |
74 | |
75 | \a Alarm (beep) |
76 | \e Escape |
77 | \f Formfeed |
78 | \n Newline |
79 | \r Carriage return |
80 | \t Tab |
81 | \033 Any octal ASCII value |
82 | \x7f Any hexadecimal ASCII value |
83 | \x{263a} A wide hexadecimal value |
84 | \cx Control-x |
85 | \N{name} A named character |
86 | |
6d014f17 |
87 | \l Lowercase next character |
88 | \u Uppercase next character |
30487ceb |
89 | \L Lowercase until \E |
90 | \U Uppercase until \E |
91 | \Q Disable pattern metacharacters until \E |
92 | \E End case modification |
93 | |
94 | This one works differently from normal strings: |
95 | |
96 | \b An assertion, not backspace, except in a character class |
97 | |
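As a hedged illustration of a few of these escapes (the input strings are invented), the sketch below shows \Q...\E protecting metacharacters, the case-modification escapes, and the two meanings of \b.

```perl
use strict;
use warnings;

# \Q...\E disables metacharacters, so the dot matches only a literal dot
my $user_input   = "a.b";
my $literal_hit  = ("a.b" =~ /\A\Q$user_input\E\z/) ? 1 : 0;
my $literal_miss = ("axb" =~ /\A\Q$user_input\E\z/) ? 1 : 0;

# \U uppercases until \E; \u uppercases only the next character
my $shout = "\Uperl\E rocks";
my $cap   = "\uperl";

# \b is a word-boundary assertion in a pattern...
my $boundary = ("cat nap" =~ /\bnap\b/) ? 1 : 0;

# ...but inside a character class it is a backspace
my $backspace = ("\x08" =~ /[\b]/) ? 1 : 0;

print "$literal_hit $literal_miss $shout $cap $boundary $backspace\n";
```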
98 | =head2 CHARACTER CLASSES |
99 | |
100 | [amy] Match 'a', 'm' or 'y' |
101 | [f-j] Dash specifies "range" |
102 | [f-j-] Dash escaped or at start or end means 'dash' |
6d014f17 |
103 | [^f-j] Caret indicates "match any character _except_ these" |
30487ceb |
104 | |
105 | The following work within or without a character class: |
106 | |
107 | \d A digit, same as [0-9] |
108 | \D A nondigit, same as [^0-9] |
6d014f17 |
109 | \w A word character (alphanumeric), same as [a-zA-Z0-9_] |
110 | \W A non-word character, [^a-zA-Z0-9_] |
30487ceb |
111 | \s A whitespace character, same as [ \t\n\r\f] |
112 | \S A non-whitespace character, [^ \t\n\r\f] |
6d014f17 |
113 | \C Match a byte (with Unicode, '.' matches char) |
30487ceb |
114 | \pP Match P-named (Unicode) property |
115 | \p{...} Match Unicode property with long name |
116 | \PP Match non-P |
117 | \P{...} Match lack of Unicode property with long name |
118 | \X Match extended Unicode sequence |
119 | |
120 | POSIX character classes and their Unicode and Perl equivalents: |
121 | |
122 | alnum IsAlnum Alphanumeric |
123 | alpha IsAlpha Alphabetic |
124 | ascii IsASCII Any ASCII char |
125 | blank IsSpace [ \t] Horizontal whitespace (GNU) |
126 | cntrl IsCntrl Control characters |
127 | digit IsDigit \d Digits |
128 | graph IsGraph Alphanumeric and punctuation |
129 | lower IsLower Lowercase chars (locale aware) |
130 | print IsPrint Alphanumeric, punct, and space |
131 | punct IsPunct Punctuation |
132 | space IsSpace [\s\ck] Whitespace |
133 | IsSpacePerl \s Perl's whitespace definition |
134 | upper IsUpper Uppercase chars (locale aware) |
135 | word IsWord \w Alphanumeric plus _ (Perl) |
136 | xdigit IsXDigit [\dA-Fa-f] Hexadecimal digit |
137 | |
138 | Within a character class: |
139 | |
140 | POSIX traditional Unicode |
141 | [:digit:] \d \p{IsDigit} |
142 | [:^digit:] \D \P{IsDigit} |
143 | |
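The equivalences in the tables above can be checked with a small script. This is a sketch on an invented sample string; it compares the traditional, POSIX, and Unicode spellings and shows a negated range and a literal dash.

```perl
use strict;
use warnings;

my $s = "Room 42b, Floor 7";

# Traditional, POSIX, and Unicode spellings of "digit"
my @trad    = $s =~ /(\d+)/g;
my @posix   = $s =~ /([[:digit:]]+)/g;
my @unicode = $s =~ /(\p{IsDigit}+)/g;

# A Unicode property: uppercase letters
my @upper = $s =~ /(\p{IsUpper})/g;

# A caret at the start of a class negates it
my @not_fj = "fghxyz" =~ /([^f-j])/g;

# A dash at the end of a class is a literal dash
my $dash = ("a-b" =~ /[f-j-]/) ? 1 : 0;

print "@trad | @posix | @unicode | @upper | @not_fj | $dash\n";
```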
144 | =head2 ANCHORS |
145 | |
146 | All are zero-width assertions. |
147 | |
148 | ^ Match string start (or line, if /m is used) |
149 | $ Match string end (or line, if /m is used) or before newline |
150 | \b Match word boundary (between \w and \W) |
6d014f17 |
151 | \B Match except at word boundary (between \w and \w or \W and \W) |
30487ceb |
152 | \A Match string start (regardless of /m) |
6d014f17 |
153 | \Z Match string end (before optional newline) |
30487ceb |
154 | \z Match absolute string end |
155 | \G Match where previous m//g left off |
30487ceb |
156 | |
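The differences between these anchors are easiest to see on a string with a trailing newline. The following sketch (sample data and token labels are hypothetical) also uses \G with /gc to walk a string token by token.

```perl
use strict;
use warnings;

my $s = "line one\nline two\n";

# \Z matches before an optional final newline; \z only at the very end
my $cap_z   = ($s =~ /two\Z/) ? 1 : 0;
my $small_z = ($s =~ /two\z/) ? 1 : 0;

# \A ignores /m, while ^ matches at every line start under /m
my $count_A     = () = $s =~ /\Aline/mg;
my $count_caret = () = $s =~ /^line/mg;

# \G continues from where the previous m//g on the same string stopped
my $input = "12ab34";
my @tokens;
while (1) {
    if    ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
    else                             { last }
}

print "$cap_z $small_z $count_A $count_caret @tokens\n";
```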
157 | =head2 QUANTIFIERS |
158 | |
6d014f17 |
159 | Quantifiers are greedy by default -- match the B<longest> leftmost. |
30487ceb |
160 | |
161 | Maximal Minimal Allowed range |
162 | ------- ------- ------------- |
163 | {n,m} {n,m}? Must occur at least n times but no more than m times |
164 | {n,} {n,}? Must occur at least n times |
6d014f17 |
165 | {n} {n}? Must occur exactly n times |
30487ceb |
166 | * *? 0 or more times (same as {0,}) |
167 | + +? 1 or more times (same as {1,}) |
168 | ? ?? 0 or 1 time (same as {0,1}) |
169 | |
6d014f17 |
170 | There is no quantifier {,n} -- that gets understood as a literal string. |
171 | |
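A short sketch contrasting maximal and minimal quantifiers on invented input. The greedy form takes as much as it can while still allowing a match; the minimal form takes as little as it can.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ runs to the last > in the string
my ($greedy) = $html =~ /<(.+)>/;

# Minimal: .+? stops at the first >
my ($minimal) = $html =~ /<(.+?)>/;

# Counted forms follow the same rule
my ($exact)      = "aaaaa" =~ /^(a{3})/;
my ($range)      = "aaaaa" =~ /^(a{2,4})/;
my ($lazy_range) = "aaaaa" =~ /^(a{2,4}?)/;

print "$greedy | $minimal | $exact | $range | $lazy_range\n";
```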
30487ceb |
172 | =head2 EXTENDED CONSTRUCTS |
173 | |
174 | (?#text) A comment |
6d014f17 |
175 | (?imxs-imsx:...) Enable/disable option (as per m// modifiers) |
30487ceb |
176 | (?=...) Zero-width positive lookahead assertion |
177 | (?!...) Zero-width negative lookahead assertion |
6d014f17 |
178 | (?<=...) Zero-width positive lookbehind assertion |
30487ceb |
179 | (?<!...) Zero-width negative lookbehind assertion |
180 | (?>...) Grab what we can, prohibit backtracking |
181 | (?{ code }) Embedded code, return value becomes $^R |
182 | (??{ code }) Dynamic regex, return value used as regex |
e5a7b003 |
183 | (?(cond)yes|no) cond being integer corresponding to capturing parens |
30487ceb |
184 | (?(cond)yes) or a lookaround/eval zero-width assertion |
185 | |
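Several of the extended constructs can be sketched together. The strings and names below are illustrative assumptions, not taken from the reference itself.

```perl
use strict;
use warnings;

# Negative lookahead: words starting with foo that do not continue with bar
my @words = "foobar foobaz" =~ /\b(foo(?!bar)\w+)/g;

# Positive lookbehind: numbers immediately preceded by USD
my @usd = 'USD100 EUR200 USD300' =~ /(?<=USD)(\d+)/g;

# Inline modifier: only the enclosed part is case-insensitive
my $inline = ("Hello WORLD" =~ /(?i:hello) WORLD/) ? 1 : 0;

# (?>...) does not give characters back, so this match fails...
my $atomic = ("aaab" =~ /(?>a+)ab/) ? 1 : 0;
# ...while the ordinary group can backtrack and succeeds
my $normal = ("aaab" =~ /a+ab/) ? 1 : 0;

# Conditional: require a closing paren only if group 1 matched an opening one
my $balanced = qr/^(\()?\d+(?(1)\))$/;
my @verdicts = map { /$balanced/ ? "ok" : "no" } "(12)", "12", "(12", "12)";

print "@words | @usd | $inline $atomic $normal | @verdicts\n";
```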
186 | =head1 VARIABLES |
187 | |
188 | $_ Default variable for operators to use |
8da7c437 |
189 | $* Enable multiline matching (deprecated; not in 5.9.0 or later) |
30487ceb |
190 | |
191 | $& Entire matched string |
192 | $` Everything prior to matched string |
193 | $' Everything after matched string |
194 | |
195 | The use of those last three will slow down B<all> regex use |
196 | within your program. Consult L<perlvar> for C<@LAST_MATCH_START> |
197 | to see equivalent expressions that won't cause a slowdown. |
198 | See also L<Devel::SawAmpersand>. |
199 | |
200 | $1, $2 ... Hold the Nth captured expr |
201 | $+ Last parenthesized pattern match |
202 | $^N Holds the most recently closed capture |
203 | $^R Holds the result of the last (?{...}) expr |
6d014f17 |
204 | @- Offsets of starts of groups. $-[0] holds start of whole match |
205 | @+ Offsets of ends of groups. $+[0] holds end of whole match |
30487ceb |
206 | |
6d014f17 |
207 | Captured groups are numbered according to their I<opening> paren. |
30487ceb |
208 | |
209 | =head1 FUNCTIONS |
210 | |
211 | lc Lowercase a string |
212 | lcfirst Lowercase first char of a string |
213 | uc Uppercase a string |
6d014f17 |
214 | ucfirst Uppercase first char of a string |
30487ceb |
215 | pos Return or set current match position |
216 | quotemeta Quote metacharacters |
217 | reset Reset ?pattern? status |
218 | study Analyze string for optimizing matching |
219 | |
220 | split Use regex to split a string into parts |
221 | |
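A few of these functions in use, as a minimal sketch with invented inputs:

```perl
use strict;
use warnings;

# Case functions
my $title = ucfirst(lc("hELLO"));
my $lower = lcfirst("ABC");

# quotemeta backslashes the regex metacharacters
my $quoted = quotemeta("a.b*c");

# split uses a regex as the separator
my @fields = split /\s*,\s*/, "a , b,c";

# pos reports where the last m//g on a string left off, and can be assigned
my $s     = "aXbXc";
my $hit   = $s =~ /X/g;
my $after = pos($s);
pos($s) = 2;
$hit = $s =~ /X/g;
my $after_second = pos($s);

print "$title $lower $quoted @fields $after $after_second\n";
```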
222 | =head1 AUTHOR |
223 | |
224 | Iain Truskett. |
225 | |
226 | This document may be distributed under the same terms as Perl itself. |
227 | |
228 | =head1 SEE ALSO |
229 | |
230 | =over 4 |
231 | |
232 | =item * |
233 | |
234 | L<perlretut> for a tutorial on regular expressions. |
235 | |
236 | =item * |
237 | |
238 | L<perlrequick> for a rapid tutorial. |
239 | |
240 | =item * |
241 | |
242 | L<perlre> for more details. |
243 | |
244 | =item * |
245 | |
246 | L<perlvar> for details on the variables. |
247 | |
248 | =item * |
249 | |
250 | L<perlop> for details on the operators. |
251 | |
252 | =item * |
253 | |
254 | L<perlfunc> for details on the functions. |
255 | |
256 | =item * |
257 | |
258 | L<perlfaq6> for FAQs on regular expressions. |
259 | |
260 | =item * |
261 | |
262 | The L<re> module to alter behaviour and aid |
263 | debugging. |
264 | |
265 | =item * |
266 | |
267 | L<perldebug/"Debugging regular expressions"> |
268 | |
269 | =item * |
270 | |
271 | L<perluniintro>, L<perlunicode>, L<charnames> and L<locale> |
272 | for details on regexes and internationalisation. |
273 | |
274 | =item * |
275 | |
276 | I<Mastering Regular Expressions> by Jeffrey Friedl |
277 | (F<http://regex.info/>) for a thorough grounding and |
278 | reference on the topic. |
279 | |
280 | =back |
281 | |
282 | =head1 THANKS |
283 | |
284 | David P.C. Wollmann, |
285 | Richard Soderberg, |
286 | Sean M. Burke, |
287 | Tom Christiansen, |
e5a7b003 |
288 | Jim Cromie, |
30487ceb |
289 | and |
290 | Jeffrey Goff |
291 | for useful advice. |
6d014f17 |
292 | |
293 | =cut |