=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

  =~ determines to which variable the regex is applied.
     In its absence, $_ is used.

        $var =~ /foo/;

  !~ determines to which variable the regex is applied,
     and negates the result of the match; it returns
     false if the match succeeds, and true if it fails.

        $var !~ /foo/;

  m/pattern/igmsoxc searches a string for a pattern match,
     applying the given options.

        i  case-Insensitive
        g  Global - all occurrences
        m  Multiline mode - ^ and $ match internal lines
        s  match as a Single line - . matches \n
        o  compile pattern Once
        x  eXtended legibility - free whitespace and comments
        c  don't reset pos on failed matches when using /g

     If 'pattern' is an empty string, the last *successfully* matched
     regex is used. Delimiters other than '/' may be used for both this
     operator and the following ones.

  qr/pattern/imsox lets you store a regex in a variable,
     or pass one around. Modifiers are as for m// and are stored
     within the regex.

  s/pattern/replacement/igmsoxe substitutes matches of
     'pattern' with 'replacement'. Modifiers are as for m//,
     with one addition:

        e  Evaluate replacement as an expression

     'e' may be specified multiple times. 'replacement' is interpreted
     as a double quoted string unless a single-quote (') is the delimiter.

  ?pattern? is like m/pattern/ but matches only once. No alternate
     delimiters can be used. Must be reset with reset().

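The operators above can be sketched in a few lines. This is only an
illustration: the variable names and the sample string are assumptions
made for the example, not part of this reference.

```perl
use strict;
use warnings;

my $text = "Foo bar\nfoo baz";

# =~ binds the match to $text; /i makes it case-insensitive.
my $found = $text =~ /foo/i ? 1 : 0;          # 1

# !~ negates the result of the match.
my $missing = $text !~ /quux/ ? 1 : 0;        # 1

# /g in list context returns all occurrences; /m lets ^ match
# at the start of each internal line.
my @starts = $text =~ /^(\w+)/mg;             # ('Foo', 'foo')

# qr// stores a compiled regex, modifiers included.
my $word  = qr/ba[rz]/i;
my @words = $text =~ /($word)/g;              # ('bar', 'baz')

# s///e evaluates the replacement as an expression.
(my $shout = $text) =~ s/(ba[rz])/uc $1/ge;   # "Foo BAR\nfoo BAZ"

print "$found $missing @starts @words\n$shout\n";
```

Copying C<$text> into C<$shout> before substituting keeps the original
string intact, since C<s///> modifies its target in place.
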
=head2 SYNTAX

   \       Escapes the character immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the subexpression preceding or following it
   \1, \2 ...  The text from the Nth group

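A short sketch of capturing groups, clusters and backreferences from the
table above. The log-style input line and the variable names are
hypothetical.

```perl
use strict;
use warnings;

my $line = "2024-01-15 error error disk full";

# (...) captures to $1, $2, $3; {n} fixes the repeat count.
my ($y, $m, $d) = $line =~ /^(\d{4})-(\d{2})-(\d{2})/;

# \1 refers back to the text matched by the first group,
# so this finds a doubled word.
my ($dup) = $line =~ /\b(\w+) \1\b/;            # 'error'

# (?:...) groups the alternation without creating a capture,
# so the only capture returned is the final word.
my ($what) = $line =~ /(?:disk|tape) (\w+)$/;   # 'full'

print "$y/$m/$d dup=$dup what=$what\n";
```
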
=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End modification (\L, \U or \Q)

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

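The case-changing and quoting escapes are easiest to see in use. The
following is a minimal sketch with made-up sample strings.

```perl
use strict;
use warnings;

my $name = "jOHN";

# \u affects only the next character; \L runs until \E.
my $nice = "\u\L$name\E!";                        # "John!"

# \Q...\E quotes metacharacters, so the '.' and '+' in $literal
# match only themselves.
my $literal = "a.b+c";
my $hit   = "xa.b+cx" =~ /\Q$literal\E/ ? 1 : 0;  # 1
my $miss  = "xaXbbcx" =~ /\Q$literal\E/ ? 1 : 0;  # 0

# Without \Q, '.' and '+' act as metacharacters again.
my $loose = "xaXbbcx" =~ /$literal/ ? 1 : 0;      # 1

# \b is a word boundary in a pattern, but a backspace inside [...].
my $boundary  = "a cat" =~ /\bcat\b/ ? 1 : 0;     # 1
my $backspace = "\x08"  =~ /[\b]/    ? 1 : 0;     # 1

print "$nice $hit $miss $loose $boundary $backspace\n";
```
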
=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following work within or without a character class:

   \d      A digit, same as [0-9]
   \D      A nondigit, same as [^0-9]
   \w      A word character (alphanumeric), same as [a-zA-Z0-9_]
   \W      A non-word character, [^a-zA-Z0-9_]
   \s      A whitespace character, same as [ \t\n\r\f]
   \S      A non-whitespace character, [^ \t\n\r\f]
   \C      Match a byte (with Unicode, '.' matches char)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum              Alphanumeric
   alpha   IsAlpha              Alphabetic
   ascii   IsASCII              Any ASCII char
   blank   IsSpace  [ \t]       Horizontal whitespace (GNU)
   cntrl   IsCntrl              Control characters
   digit   IsDigit  \d          Digits
   graph   IsGraph              Alphanumeric and punctuation
   lower   IsLower              Lowercase chars (locale aware)
   print   IsPrint              Alphanumeric, punct, and space
   punct   IsPunct              Punctuation
   space   IsSpace  [\s\ck]     Whitespace
           IsSpacePerl   \s     Perl's whitespace definition
   upper   IsUpper              Uppercase chars (locale aware)
   word    IsWord   \w          Alphanumeric plus _ (Perl)
   xdigit  IsXDigit [\dA-Fa-f]  Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}

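As a rough illustration of the three notations side by side (the sample
string is an assumption made for the example):

```perl
use strict;
use warnings;

my $s = "Room 42b, floor -3";

# Traditional shorthand.
my @digits = $s =~ /(\d+)/g;                  # ('42', '3')

# A dash at the end of a class is a literal dash.
my @signed = $s =~ /([\d-]+)/g;               # ('42', '-3')

# Unicode property.
my @upper  = $s =~ /(\p{IsUpper})/g;          # ('R')

# POSIX classes are only valid inside [...]; the leading ^ negates
# the whole class, so this deletes everything except letters and spaces.
(my $letters = $s) =~ s/[^[:alpha:] ]//g;     # "Room b floor "

print "@digits | @signed | @upper | $letters\n";
```
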
=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

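The difference between the end-of-string anchors, and the use of C<\G>
to continue from the previous match, can be sketched as follows. The
tiny tokenizer and its token labels are hypothetical.

```perl
use strict;
use warnings;

my $s = "one\ntwo\n";

my $cap_z   = $s =~ /two\Z/  ? 1 : 0;   # 1: \Z allows a trailing newline
my $small_z = $s =~ /two\z/  ? 1 : 0;   # 0: \z is the absolute end
my $plain   = $s =~ /one$/   ? 1 : 0;   # 0: "one" is not at the string end
my $multi   = $s =~ /one$/m  ? 1 : 0;   # 1: /m lets $ match at a line end

# \G anchors each attempt where the previous m//g stopped; /c keeps
# pos() in place when an alternative fails.
my $input = "12+34";
my @tokens;
pos($input) = 0;
while (pos($input) < length $input) {
    if    ($input =~ /\G(\d+)/gc)   { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([+*-])/gc) { push @tokens, "OP:$1"  }
    else                            { last }
}

print "$cap_z $small_z $plain $multi @tokens\n";
```
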
=head2 QUANTIFIERS

Quantifiers are greedy by default -- match the B<longest> leftmost.

   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must occur exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})

There is no quantifier {,n} -- that gets understood as a literal string.

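A small sketch of maximal versus minimal matching; the markup string is
only sample data.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Maximal: .+ runs to the last '>' in the string.
my ($greedy) = $html =~ /<(.+)>/;     # 'b>bold</b> and <i>italic</i'

# Minimal: .+? stops at the first '>' that lets the match succeed.
my ($lazy)   = $html =~ /<(.+?)>/;    # 'b'

# {n,m} takes as many as it can, up to m, on each match.
my @runs = "aaaa" =~ /(a{1,3})/g;     # ('aaa', 'a')

print "$greedy\n$lazy\n@runs\n";
```
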
=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?imxs-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)          Zero-width positive lookahead assertion
   (?!...)          Zero-width negative lookahead assertion
   (?<=...)         Zero-width positive lookbehind assertion
   (?<!...)         Zero-width negative lookbehind assertion
   (?>...)          Grab what we can, prohibit backtracking
   (?{ code })      Embedded code, return value becomes $^R
   (??{ code })     Dynamic regex, return value used as regex
   (?(cond)yes|no)  cond being integer corresponding to capturing parens
   (?(cond)yes)        or a lookaround/eval zero-width assertion

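A few of these constructs in use, as a hedged sketch. The inputs and
variable names are invented for the example.

```perl
use strict;
use warnings;

# Lookbehind plus lookahead: insert a comma at every position that has
# a digit before it and a multiple of three digits after it.
my $n = "1234567";
(my $pretty = $n) =~ s/(?<=\d)(?=(?:\d{3})+\z)/,/g;   # "1,234,567"

# (?i:...) limits the modifier to part of the pattern.
my $scoped   = "FOObar" =~ /(?i:foo)bar/ ? 1 : 0;     # 1
my $unscoped = "FOOBAR" =~ /(?i:foo)bar/ ? 1 : 0;     # 0

# (?>...) never gives back what it grabbed, so the 'a' after it
# can no longer match.
my $atomic = "aaab" =~ /(?>a+)ab/ ? 1 : 0;            # 0
my $normal = "aaab" =~ /a+ab/     ? 1 : 0;            # 1

# (?(1)yes): require the closing paren only if group 1 matched.
my $re = qr/^(\()?\d+(?(1)\))$/;
my @ok = grep { /$re/ } "42", "(42)", "(42", "42)";   # ('42', '(42)')

print "$pretty $scoped $unscoped $atomic $normal @ok\n";
```
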
=head2 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything prior to matched string
   $'    Everything after matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>.

   $1, $2 ...  Hold the Nth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their I<opening> paren.

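As a sketch of the C<substr> equivalents that L<perlvar> describes, the
offsets in C<@-> and C<@+> can rebuild the three slow variables. The
sample strings and variable names below are illustrative assumptions.

```perl
use strict;
use warnings;

my $s = "see key = value here";
my ($pre, $match, $post, $last, $val_at) = ('', '', '', '', -1);

if ($s =~ /(\w+)\s*=\s*(\w+)/) {
    $pre    = substr($s, 0, $-[0]);               # like $`
    $match  = substr($s, $-[0], $+[0] - $-[0]);   # like $&
    $post   = substr($s, $+[0]);                  # like $'
    $last   = $+;                                 # last group: 'value'
    $val_at = $-[2];                              # start offset of $2
}

# Groups are numbered by their opening paren, outermost first.
my @groups = "abcd" =~ /((a)(b))(c)/;             # ('ab', 'a', 'b', 'c')

print "[$pre][$match][$post] $last $val_at @groups\n";
```
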
=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts

The first four of these correspond to the escape sequences \L, \l,
\U, and \u. For Titlecase, see L</Titlecase>.

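A brief sketch of some of these functions; the inputs are made up for
the example.

```perl
use strict;
use warnings;

# split takes a regex as its separator.
my @fields = split /\s*,\s*/, "a , b,c";      # ('a', 'b', 'c')

# quotemeta backslashes every non-word character, like \Q...\E.
my $safe = quotemeta("1+1=2");                # 1\+1\=2
my $hit  = "so 1+1=2" =~ /$safe/ ? 1 : 0;     # 1

# ucfirst(lc ...) is the function form of "\u\L...".
my $name = ucfirst lc "pERL";                 # "Perl"

# pos reports where the last scalar m//g on a string ended.
my $str = "aXbXc";
my $ok  = $str =~ /X/g;
my $at  = pos $str;                           # 2

print "@fields | $safe | $name | $at\n";
```
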
=head2 Terminology

=head3 Titlecase

A Unicode concept that is most often equal to uppercase, but for
certain characters, such as the German "sharp s", there is a difference.

=head2 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head2 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head2 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut