From: Mark Kvale
Date: Wed, 27 Mar 2002 16:45:37 +0000 (-0800)
Subject: [DOC PATCH] Regex \G and POSIX restrictions
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=54c18d0455d4f9550786bea467f5a04c96e86890;p=p5sagit%2Fp5-mst-13.2.git

[DOC PATCH] Regex \G and POSIX restrictions

Message-Id: <02032716453705.38063@ivy.ucsf.edu>

p4raw-id: //depot/perl@15562
---

diff --git a/pod/perlre.pod b/pod/perlre.pod
index 58cd645..fef8ce3 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -316,8 +316,10 @@ with a '^'. This is a Perl extension. For example:
     [:^space:]      \S      \P{IsSpace}
     [:^word:]       \W      \P{IsWord}
 
-The POSIX character classes [.cc.] and [=cc=] are recognized but
-B<not> supported and trying to use them will cause an error.
+Perl respects the POSIX standard in that POSIX character classes are
+only supported within a character class. The POSIX character classes
+[.cc.] and [=cc=] are recognized but B<not> supported and trying to
+use them will cause an error.
 
 Perl defines the following zero-width assertions:
 
@@ -347,7 +349,8 @@ It is also useful when writing C<lex>-like scanners, when you have several
 patterns that you want to match against consequent substrings of your
 string, see the previous reference. The actual location where C<\G>
 will match can also be influenced by using C<pos()> as
-an lvalue. See L<perlfunc/pos>.
+an lvalue. Currently C<\G> only works when used at the
+beginning of the pattern. See L<perlfunc/pos>.
 
 The bracketing construct C<( ... )> creates capture buffers. To
 refer to the digit'th buffer use \<digit> within the
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index e90e03d..8c12a5c 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1403,6 +1403,7 @@ off. C<\G> allows us to easily do context-sensitive matching:
 
 The combination of C<//g> and C<\G> allows us to process the string a
 bit at a time and use arbitrary Perl logic to decide what to do next.
+Currently, the C<\G> anchor only works at the beginning of a pattern.
 
 C<\G> is also invaluable in processing fixed length records with
 regexps. Suppose we have a snippet of coding region DNA, encoded as
@@ -1782,10 +1783,11 @@ C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
 character classes. To negate a POSIX class, put a C<^> in front of
 the name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
 C<utf8>, C<\P{IsDigit}>. The Unicode and POSIX character classes can
-be used just like C<\d>, both inside and outside of character classes:
+be used just like C<\d>, with the exception that POSIX character
+classes can only be used inside of a character class:
 
     /\s+[abc[:digit:]xyz]\s*/;  # match a,b,c,x,y,z, or a digit
-    /^=item\s[:digit:]/;        # match '=item',
+    /^=item\s[[:digit:]]/;      # match '=item',
                                 # followed by a space and a digit
     use charnames ":full";
     /\s+[abc\p{IsDigit}xyz]\s+/;  # match a,b,c,x,y,z, or a digit