=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, $_ is used.

    $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;
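
A minimal sketch of the binding operators. The variable names and sample strings are illustrative only.

    use strict;
    use warnings;

    my $var = "food";

    # Explicit binding: the match is applied to $var
    my $has_foo = ($var =~ /foo/) ? 1 : 0;      # 1

    # Negated binding: true only when the match fails
    my $no_bar = ($var !~ /bar/) ? 1 : 0;       # 1

    # No binding operator: the match is applied to $_
    my $implicit = 0;
    for ("seafood", "salad") {
        $implicit++ if /foo/;
    }
    # $implicit is 1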

C<m/pattern/msixpogc> searches a string for a pattern match,
applying the given options.

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.
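
The sketch below exercises several of these modifiers and an alternate delimiter. It needs only core Perl. The sample text is made up.

    use strict;
    use warnings;

    my $text = "Line one\nline two\nLINE three";

    # /m: ^ matches after each internal newline; /i ignores case;
    # /g in list context returns every match
    my @starts = $text =~ /^(line)/mgi;      # ("Line", "line", "LINE")

    # /s: . also matches \n, so the match can span lines
    my ($span) = $text =~ /one(.*)two/s;     # "\nline "

    # /x: whitespace and comments in the pattern are ignored;
    # with {} as delimiters the leading m is required
    my ($num) = "order 42" =~ m{
        (\d+)    # one or more digits
    }x;                                      # 42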

C<qr/pattern/msixpo> lets you store a regex in a variable,
or pass one around. Modifiers are as for C<m//>, and are stored
within the regex.
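
Here is a short sketch of C<qr//>. The regex object keeps its own C</i> even when it is interpolated into a larger, case-sensitive pattern. All names are illustrative.

    use strict;
    use warnings;

    # The /i modifier is stored inside the compiled regex object
    my $word = qr/perl/i;

    # A qr// object can be used directly as a pattern ...
    my $direct = ("I like PERL" =~ $word) ? 1 : 0;       # 1

    # ... or interpolated into a bigger pattern, keeping its own /i
    my $bigger = qr/^(?:$word)\s+(\d+)$/;
    my ($version) = "Perl 5" =~ $bigger;                 # "5"

    # Regex objects are ordinary scalars, so they can be stored in lists
    my @patterns = (qr/^\d+$/, $word);
    my $count = grep { "12345" =~ $_ } @patterns;        # 1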

C<s/pattern/replacement/msixpogce> substitutes matches of
'pattern' with 'replacement'. Modifiers are as for C<m//>,
with one addition:

    e  Evaluate 'replacement' as an expression

'e' may be specified multiple times. 'replacement' is interpreted
as a double-quoted string unless a single-quote (C<'>) is the delimiter.
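
This minimal sketch contrasts C</e> with an ordinary interpolated replacement and with the single-quote delimiter. Each substitution first copies the string using the C<(my $copy = $orig) =~ s///> idiom, so the original stays unchanged. All names are illustrative.

    use strict;
    use warnings;

    my $prices = "apple 3, pear 5";

    # /e evaluates the replacement as Perl code; /g replaces every match
    (my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;    # "apple 6, pear 10"

    # Without /e the replacement is an interpolated string
    (my $labelled = $prices) =~ s/(\d+)/<$1>/g;      # "apple <3>, pear <5>"

    # With ' as the delimiter the replacement is not interpolated
    (my $literal = "cost: 9") =~ s'\d'$1';           # 'cost: $1'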

C<?pattern?> is like C<m/pattern/> but matches only once between
calls to C<reset()>. No alternate delimiters can be used.
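
A minimal sketch of once-only matching. It uses the explicit C<m?pattern?> spelling, because recent Perl releases reject the bare C<?pattern?> form. It also uses the no-argument C<reset>, which re-arms such patterns in the current package. The helper C<count_once> is hypothetical.

    use strict;
    use warnings;

    my @words = qw(apple banana apple cherry);

    # Hypothetical helper: counts matches of a once-only pattern
    sub count_once {
        my $hits = 0;
        for my $word (@_) {
            # m?apple? succeeds at most once until reset is called
            $hits++ if $word =~ m?apple?;
        }
        return $hits;
    }

    my $first  = count_once(@words);    # 1
    my $second = count_once(@words);    # 0, the pattern is still spent
    reset;                              # re-arm m?? patterns in this package
    my $third  = count_once(@words);    # 1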

=head2 SYNTAX

    \          Escapes the character immediately following it
    .          Matches any single character except a newline (unless /s is used)
    ^          Matches at the beginning of the string (or line, if /m is used)
    $          Matches at the end of the string (or line, if /m is used)
    *          Matches the preceding element 0 or more times
    +          Matches the preceding element 1 or more times
    ?          Matches the preceding element 0 or 1 times
    {...}      Specifies a range of occurrences for the element preceding it
    [...]      Matches any one of the characters contained within the brackets
    (...)      Groups subexpressions for capturing to $1, $2...
    (?:...)    Groups subexpressions without capturing (cluster)
    |          Matches either the subexpression preceding or following it
    \1, \2 ... Matches the text from the Nth group
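
Here are a few of the syntax elements above in use. The sample strings are illustrative.

    use strict;
    use warnings;

    # (\w)\1 finds a character that is immediately repeated
    my ($double) = "bookkeeper" =~ /(\w)\1/;                 # "o"

    # | alternates; (?:...) groups without creating a capture
    my ($pet) = "I have a cat" =~ /a (?:big )?(cat|dog)/;    # "cat"

    # {...} bounds the repetition of the [...] character class
    my $is_zip = "90210" =~ /^[0-9]{5}$/ ? 1 : 0;            # 1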

=head2 ESCAPE SEQUENCES

These work as in normal strings.

    \a       Alarm (beep)
    \e       Escape
    \f       Formfeed
    \n       Newline
    \r       Carriage return
    \t       Tab
    \037     Any octal ASCII value
    \x7f     Any hexadecimal ASCII value
    \x{263a} A wide hexadecimal value
    \cx      Control-x
    \N{name} A named character

    \l  Lowercase next character
    \u  Titlecase next character
    \L  Lowercase until \E
    \U  Uppercase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

    \b  An assertion, not backspace, except in a character class
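
This sketch shows C<\Q...\E> and the two meanings of C<\b>. The variable C<$price> is a made-up stand-in for user-supplied text.

    use strict;
    use warnings;

    my $price = "1.5";    # the "." would otherwise match any character

    # \Q...\E quotes metacharacters in the interpolated text
    my $loose  = ("125" =~ /^$price$/)     ? 1 : 0;      # 1: "." matched "2"
    my $strict = ("125" =~ /^\Q$price\E$/) ? 1 : 0;      # 0

    # \b is a word-boundary assertion in a pattern ...
    my $whole = ("cat scatter" =~ /\bscat\b/) ? 1 : 0;   # 0
    # ... but a backspace character inside a character class
    my $bs = ("a\bb" =~ /[\b]/) ? 1 : 0;                 # 1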
102
103=head2 CHARACTER CLASSES
104
105 [amy] Match 'a', 'm' or 'y'
106 [f-j] Dash specifies "range"
107 [f-j-] Dash escaped or at start or end means 'dash'
6d014f17 108 [^f-j] Caret indicates "match any character _except_ these"
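
Here is a small sketch of the bracket forms. The sample string is arbitrary.

    use strict;
    use warnings;

    my $text = "fig-jam, any";

    # [amy] matches one of the listed characters
    my @amy = $text =~ /([amy])/g;        # ("a", "m", "a", "y")

    # [f-j] is a range; a trailing "-" is a literal dash
    my @range = $text =~ /([f-j-]+)/g;    # ("fig-j")

    # [^f-j] is negated: delete everything except f through j
    (my $kept = $text) =~ s/[^f-j]//g;    # "figj"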

The following sequences work both inside and outside a character class.
The first six are locale-aware; all are Unicode-aware. See L<perllocale>
and L<perlunicode> for details.

    \d      A digit
    \D      A nondigit
    \w      A word character
    \W      A non-word character
    \s      A whitespace character
    \S      A non-whitespace character
    \h      A horizontal whitespace character
    \H      A non-horizontal-whitespace character
    \v      A vertical whitespace character
    \V      A non-vertical-whitespace character
    \R      A generic newline           (?>\x0D\x0A|\v)

    \C      Match a byte (with Unicode, '.' matches a character)
    \pP     Match P-named (Unicode) property
    \p{...} Match Unicode property with long name
    \PP     Match non-P
    \P{...} Match lack of Unicode property with long name
    \X      Match extended Unicode combining character sequence
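
This sketch uses C<\R>, C<\d>, C<\h>, C<\s> and C<\w>. It assumes Perl 5.10 or later, the release that introduced C<\R> and C<\h>. The input string is invented for illustration.

    use strict;
    use warnings;

    my $input = "id=42\r\nname=\tBob\nage=7";

    # \R matches any generic newline, treating \r\n as one unit
    my @lines = split /\R/, $input;           # ("id=42", "name=\tBob", "age=7")

    # \d+ collects runs of digits
    my @numbers = $input =~ /(\d+)/g;         # ("42", "7")

    # \h matches horizontal whitespace (spaces, tabs) but not newlines
    (my $squashed = $lines[1]) =~ s/\h+//g;   # "name=Bob"

    # \s* skips optional whitespace, \w+ captures a word
    my @values = map { /=\s*(\w+)/ ? $1 : () } @lines;   # ("42", "Bob", "7")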

POSIX character classes and their Unicode and Perl equivalents:

    POSIX  Unicode      Perl        Description
    alnum  IsAlnum                  Alphanumeric
    alpha  IsAlpha                  Alphabetic
    ascii  IsASCII                  Any ASCII char
    blank  IsSpace      [ \t]       Horizontal whitespace (GNU extension)
    cntrl  IsCntrl                  Control characters
    digit  IsDigit      \d          Digits
    graph  IsGraph                  Alphanumeric and punctuation
    lower  IsLower                  Lowercase chars (locale and Unicode aware)
    print  IsPrint                  Alphanumeric, punct, and space
    punct  IsPunct                  Punctuation
    space  IsSpace      [\s\ck]     Whitespace
           IsSpacePerl  \s          Perl's whitespace definition
    upper  IsUpper                  Uppercase chars (locale and Unicode aware)
    word   IsWord       \w          Alphanumeric plus _ (Perl extension)
    xdigit IsXDigit     [0-9A-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional  Unicode
    [:digit:]   \d           \p{IsDigit}
    [:^digit:]  \D           \P{IsDigit}
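
The three spellings in the table above select the same characters, as this sketch suggests. The sample string is arbitrary. A POSIX class must itself sit inside brackets, as in C<[[:digit:]]>.

    use strict;
    use warnings;

    my $s = "Room 101b";

    # Three spellings of "one or more digits"
    my ($posix) = $s =~ /([[:digit:]]+)/;        # "101"
    my ($trad)  = $s =~ /(\d+)/;                 # "101"
    my ($uni)   = $s =~ /(\p{IsDigit}+)/;        # "101"

    # The negated forms select everything else
    my @posix_not = $s =~ /([[:^digit:]]+)/g;    # ("Room ", "b")
    my @trad_not  = $s =~ /(\D+)/g;              # ("Room ", "b")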

=head2 ANCHORS

All are zero-width assertions.

    ^  Match string start (or line, if /m is used)
    $  Match string end (or line, if /m is used) or before newline
    \b Match word boundary (between \w and \W)
    \B Match except at word boundary (between \w and \w or \W and \W)
    \A Match string start (regardless of /m)
    \Z Match string end (before optional newline)
    \z Match absolute string end
    \G Match where previous m//g left off
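
This sketch covers the end-of-string anchors and a small tokenizer built on C<\G> with C</gc>. The token labels (C<NAME>, C<NUM>, C<OP>) are hypothetical.

    use strict;
    use warnings;

    my $line = "done\n";

    # $ and \Z may match just before a trailing newline; \z may not
    my $dollar  = ($line =~ /done$/)  ? 1 : 0;    # 1
    my $big_z   = ($line =~ /done\Z/) ? 1 : 0;    # 1
    my $small_z = ($line =~ /done\z/) ? 1 : 0;    # 0

    # \G anchors each attempt where the previous m//g left off;
    # /c keeps pos() after a failed match, so the next branch can try
    my $src = "x=10;y=2";
    my @tokens;
    while (1) {
        if    ($src =~ /\G([a-z]+)/gc) { push @tokens, "NAME:$1" }
        elsif ($src =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
        elsif ($src =~ /\G([=;])/gc)   { push @tokens, "OP:$1" }
        else                           { last }    # nothing matched
    }
    # @tokens: NAME:x OP:= NUM:10 OP:; NAME:y OP:= NUM:2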

=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> leftmost string.

    Maximal Minimal Allowed range
    ------- ------- -------------
    {n,m}   {n,m}?  Must occur at least n times but no more than m times
    {n,}    {n,}?   Must occur at least n times
    {n}     {n}?    Must occur exactly n times
    *       *?      0 or more times (same as {0,})
    +       +?      1 or more times (same as {1,})
    ?       ??      0 or 1 time (same as {0,1})

There is no quantifier C<{,n}>; it is interpreted as a literal string.
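
This sketch contrasts maximal and minimal quantifiers. The HTML-like string is sample text only.

    use strict;
    use warnings;

    my $html = "<b>bold</b> and <i>italic</i>";

    # Greedy: .+ runs as far as it can, then backs off to the last ">"
    my ($greedy) = $html =~ /<(.+)>/;      # "b>bold</b> and <i>italic</i"

    # Minimal: .+? stops at the first ">" that lets the match succeed
    my ($lazy) = $html =~ /<(.+?)>/;       # "b"

    # Bounded counts, maximal and minimal
    my @pairs = "abcdefg" =~ /(\w{2})/g;              # ("ab", "cd", "ef")
    my ($upto)      = "aaaaa" =~ /(a{2,3})/;          # "aaa"
    my ($upto_lazy) = "aaaaa" =~ /(a{2,3}?)/;         # "aa"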

=head2 EXTENDED CONSTRUCTS

    (?#text)          A comment
    (?imxs-imsx:...)  Enable/disable option (as per m// modifiers)
    (?=...)           Zero-width positive lookahead assertion
    (?!...)           Zero-width negative lookahead assertion
    (?<=...)          Zero-width positive lookbehind assertion
    (?<!...)          Zero-width negative lookbehind assertion
    (?>...)           Grab what we can, prohibit backtracking
    (?{ code })       Embedded code, return value becomes $^R
    (??{ code })      Dynamic regex, return value used as regex
    (?(cond)yes|no)   cond being integer corresponding to capturing parens
    (?(cond)yes)         or a lookaround/eval zero-width assertion

=head2 VARIABLES

    $_    Default variable for operators to use

    $`    Everything prior to matched string
    $&    Entire matched string
    $'    Everything after the matched string

    ${^PREMATCH}   Everything prior to matched string
    ${^MATCH}      Entire matched string
    ${^POSTMATCH}  Everything after the matched string

The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.
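
This sketch compares the C</p> variables with the equivalent expressions built from the C<@-> (C<@LAST_MATCH_START>) and C<@+> offsets. It assumes Perl 5.10 or later. The sample string is arbitrary.

    use strict;
    use warnings;

    my $s = "hello world";

    # /p makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available
    my ($pre, $match, $post);
    if ($s =~ /o w/p) {
        ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    }
    # $pre is "hell", $match is "o w", $post is "orld"

    # The same pieces from offsets: $-[0] is the start and $+[0]
    # the end of the whole match
    my ($pre2, $match2, $post2);
    if ($s =~ /o w/) {
        $pre2   = substr($s, 0, $-[0]);
        $match2 = substr($s, $-[0], $+[0] - $-[0]);
        $post2  = substr($s, $+[0]);
    }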

    $1, $2 ...  Hold the Nth captured expression
    $+          Last parenthesized pattern match
    $^N         Holds the most recently closed capture
    $^R         Holds the result of the last (?{...}) expr
    @-          Offsets of starts of groups. $-[0] holds start of whole match
    @+          Offsets of ends of groups. $+[0] holds end of whole match
    %+          Named capture buffers
    %-          Named capture buffers, as array refs

Captured groups are numbered according to their I<opening> paren.
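
This sketch shows capture numbering, named captures through C<%+>, and the difference between C<$+> and C<$^N>. Named captures assume Perl 5.10 or later. The dates and strings are illustrative.

    use strict;
    use warnings;

    # Groups are numbered by their opening parenthesis, left to right;
    # in list context the match returns ($1, $2, $3, $4)
    my @groups = "2008-07-15" =~ /((\d{4})-(\d\d))-(\d\d)/;
    # ("2008-07", "2008", "07", "15")

    # Named captures are read back through %+
    my ($year, $month);
    if ("2008-07-15" =~ /(?<year>\d{4})-(?<month>\d\d)/) {
        ($year, $month) = ($+{year}, $+{month});     # ("2008", "07")
    }

    # $+ is the highest-numbered group that matched; $^N is the
    # group that closed most recently (here the outer one)
    my ($last, $recent);
    if ("ab" =~ /((a)(b))/) {
        ($last, $recent) = ($+, $^N);                # ("b", "ab")
    }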

=head2 FUNCTIONS

    lc          Lowercase a string
    lcfirst     Lowercase first char of a string
    uc          Uppercase a string
    ucfirst     Titlecase first char of a string

    pos         Return or set current match position
    quotemeta   Quote metacharacters
    reset       Reset ?pattern? status
    study       Analyze string for optimizing matching

    split       Use a regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.
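
This sketch exercises several of these functions. All inputs are illustrative.

    use strict;
    use warnings;

    # split with a regex separator that absorbs surrounding spaces
    my @fields = split /\s*,\s*/, "red , green,blue";   # ("red", "green", "blue")

    # quotemeta is the function form of \Q
    my $quoted = quotemeta("a.b*c");                    # 'a\.b\*c'

    # pos reports where a scalar-context m//g left off
    my $str = "aXbXc";
    my @positions;
    while ($str =~ /X/g) {
        push @positions, pos($str);                     # 2, then 4
    }

    # lc and ucfirst combine like \L and \u
    my $name = ucfirst(lc("hELLO"));                    # "Hello"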

=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is most often equal to uppercase, but for
certain characters, like the German "sharp s", there is a difference.
=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut