X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlretut.pod;h=6e06f192914b95af82e9cb85856ac46daf6a111f;hb=c23d1eb0e18a49361001d26c686323d50b0c6d21;hp=e90e03d60213c28df88e95657bcfaf46ff258467;hpb=08ce8fc6c711b96a43427a2fe173f1d81abd18c2;p=p5sagit%2Fp5-mst-13.2.git
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index e90e03d..6e06f19 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -689,10 +689,11 @@ inside goes into the special variables C<$1>, C<$2>, etc. They can be
used just as ordinary variables:
# extract hours, minutes, seconds
- $time =~ /(\d\d):(\d\d):(\d\d)/; # match hh:mm:ss format
- $hours = $1;
- $minutes = $2;
- $seconds = $3;
+ if ($time =~ /(\d\d):(\d\d):(\d\d)/) { # match hh:mm:ss format
+ $hours = $1;
+ $minutes = $2;
+ $seconds = $3;
+ }
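The reason for the guard can be sketched as follows. This is a minimal illustration, not part of the patch: C<parse_time> is a hypothetical helper name, and the sample strings are made up. After a failed match, C<$1>, C<$2>, and C<$3> still hold whatever the last successful match left in them, so they should only be read when the match succeeded.

```perl
use strict;
use warnings;

# Hypothetical helper (not from perlretut): returns (hours, minutes, seconds)
# on success, or an empty list when the string is not in hh:mm:ss format.
sub parse_time {
    my ($time) = @_;
    if ($time =~ /(\d\d):(\d\d):(\d\d)/) {    # match hh:mm:ss format
        return ($1, $2, $3);                  # safe: the match succeeded
    }
    return;    # no match: $1..$3 would be stale here, so do not read them
}

my @ok  = parse_time("12:34:56");
my @bad = parse_time("noon");

print "parsed: @ok\n";
print "fields from bad input: ", scalar(@bad), "\n";
```

Returning an empty list on failure lets the caller test the result in boolean context without ever touching the capture variables.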
Now, we know that in scalar context,
S<C<$time =~ /(\d\d):(\d\d):(\d\d)/>> returns a true or false
@@ -1403,6 +1404,8 @@ off. C<\G> allows us to easily do context-sensitive matching:
The combination of C<//g> and C<\G> allows us to process the string a
bit at a time and use arbitrary Perl logic to decide what to do next.
+Currently, the C<\G> anchor is only fully supported when used to anchor
+to the start of the pattern.
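As a hedged sketch of what "anchoring to the start of the pattern" looks like in practice, the small tokenizer below puts C<\G> first in every regexp and uses the C<//gc> modifiers so that a failed match leaves the position untouched. The input string and the token labels (C<IDENT>, C<NUM>, C<OP>) are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

my $input = "x = 42";    # made-up sample input
my @tokens;

for ($input) {           # alias $_ to $input so pos() tracks $input
    while (1) {
        # Each pattern starts with \G, so it can only match exactly
        # where the previous successful match left off.
        if    (/\G\s+/gc)      { next }                       # skip whitespace
        elsif (/\G([a-z]+)/gc) { push @tokens, "IDENT:$1" }
        elsif (/\G(\d+)/gc)    { push @tokens, "NUM:$1" }
        elsif (/\G(=)/gc)      { push @tokens, "OP:$1" }
        else                   { last }                       # end of input or unknown character
    }
}

print join(" ", @tokens), "\n";
```

Without the C<c> modifier, the first failed alternative would reset the position to the start of the string and the loop would not make progress.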
C<\G> is also invaluable in processing fixed length records with
regexps. Suppose we have a snippet of coding region DNA, encoded as
@@ -1705,7 +1708,7 @@ it matches I<any> byte 0-255. So
The last regexp matches, but is dangerous because the string
I<character> position is no longer synchronized to the string I<byte>
position. This generates the warning 'Malformed UTF-8
-character'. C<\C> is best used for matching the binary data in strings
+character'. The C<\C> is best used for matching the binary data in strings
with binary data intermixed with Unicode characters.
Let us now discuss the rest of the character classes. Just as with
@@ -1752,7 +1755,7 @@ For the full list see L<perlunicode>.
The Unicode has also been separated into various sets of characters
which you can test with C<\p{In...}> (in) and C<\P{In...}> (not in),
-for example C<\p{InLatin}>, C<\p{InGreek}>, or C<\P{InKatakana}>.
+for example C<\p{Latin}>, C<\p{Greek}>, or C<\P{Katakana}>.
For the full list see L<perlunicode>.
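A short sketch of the script-name forms from the new line of the patch follows. The test characters are assumptions picked for illustration: U+03B1 is GREEK SMALL LETTER ALPHA, and C<a> is an ordinary Latin letter.

```perl
use strict;
use warnings;

my $alpha = "\x{3B1}";    # GREEK SMALL LETTER ALPHA

# \p{...} matches a character that belongs to the named script,
# \P{...} matches a character that does not.
my $alpha_is_greek    = $alpha =~ /^\p{Greek}$/    ? 1 : 0;
my $alpha_is_latin    = $alpha =~ /^\p{Latin}$/    ? 1 : 0;
my $a_is_latin        = "a"    =~ /^\p{Latin}$/    ? 1 : 0;
my $a_is_not_katakana = "a"    =~ /^\P{Katakana}$/ ? 1 : 0;

print "alpha: Greek=$alpha_is_greek Latin=$alpha_is_latin\n";
print "a: Latin=$a_is_latin not-Katakana=$a_is_not_katakana\n";
```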
C<\X> is an abbreviation for a character class sequence that includes
@@ -1782,10 +1785,11 @@ C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
character classes. To negate a POSIX class, put a C<^> in front of
the name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
C<utf8>, C<\P{IsDigit}>. The Unicode and POSIX character classes can
-be used just like C<\d>, both inside and outside of character classes:
+be used just like C<\d>, with the exception that POSIX character
+classes can only be used inside of a character class:
/\s+[abc[:digit:]xyz]\s*/; # match a,b,c,x,y,z, or a digit
- /^=item\s[:digit:]/; # match '=item',
+ /^=item\s[[:digit:]]/; # match '=item',
# followed by a space and a digit
use charnames ":full";
/\s+[abc\p{IsDigit}xyz]\s+/; # match a,b,c,x,y,z, or a digit
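The corrected form can be checked with a small sketch like the one below. The sample strings are assumptions for illustration. Outside of brackets, C<[:digit:]> is not a POSIX class at all: Perl reads it as an ordinary character class containing the characters C<:>, C<d>, C<i>, C<g>, and C<t>, which is why the patch wraps it in a second pair of brackets.

```perl
use strict;
use warnings;

# POSIX class used inside a bracketed character class: matches a digit.
my $item_digit = "=item 3" =~ /^=item\s[[:digit:]]/ ? 1 : 0;
my $item_alpha = "=item x" =~ /^=item\s[[:digit:]]/ ? 1 : 0;

# Negated POSIX class, also inside brackets: behaves like \D.
my $no_digits = "abc" =~ /^[[:^digit:]]+$/ ? 1 : 0;

# POSIX classes can be mixed with ordinary members of the same class.
my $mixed = "7" =~ /^[abc[:digit:]xyz]$/ ? 1 : 0;

print "item_digit=$item_digit item_alpha=$item_alpha ",
      "no_digits=$no_digits mixed=$mixed\n";
```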
@@ -2001,6 +2005,10 @@ They evaluate true if the regexps do I match:
$x =~ /foo(?!baz)/; # matches, 'baz' doesn't follow 'foo'
$x =~ /(?<!\s)foo/; # matches, there is no \s before 'foo'
+The C<\C> is unsupported in lookbehind, because the already
+treacherous definition of C<\C> would become even more so
+when going backwards.
+
=head2 Using independent subexpressions to prevent backtracking
The last few extended patterns in this tutorial are experimental as of