X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlrebackslash.pod;h=ddd7abee3801a161c1919b35df1a4c7492755e53;hb=0c42fe95656e99f238a0bcf90ab2476c175615b7;hp=e5f6ff5d516addeea8f066067a381ad729469dd2;hpb=e2cb52ee1eadebb8ce6e30389dbc092e45144039;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index e5f6ff5..ddd7abe 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -176,7 +176,7 @@ Mnemonic: I<N>amed character.
 
 Octal escapes consist of a backslash followed by two or three octal digits
 matching the code point of the character you want to use. This allows for
-522 characters (C<\00> up to C<\777>) that can be expressed this way.
+512 characters (C<\00> up to C<\777>) that can be expressed this way.
 Enough in pre-Unicode days, but most Unicode characters cannot be escaped
 this way.
 
@@ -202,7 +202,7 @@ the following rules:
 
 =item 1
 
-If the backslash is followed by a single digit, it's a backrefence.
+If the backslash is followed by a single digit, it's a backreference.
 
 =item 2
 
@@ -293,7 +293,7 @@ L.
 C<\w> is a character class that matches any I<word> character (letters,
 digits, underscore). C<\d> is a character class that matches any digit,
 while the character class C<\s> matches any white space character.
-New in perl 5.10 are the classes C<\h> and C<\v> which match horizontal
+New in perl 5.10.0 are the classes C<\h> and C<\v> which match horizontal
 and vertical white space characters.
 
 The uppercase variants (C<\W>, C<\D>, C<\S>, C<\H>, and C<\V>) are
@@ -319,9 +319,10 @@ Mnemonic: I<p>roperty.
 
 If capturing parenthesis are used in a regular expression, we can refer to
 the part of the source string that was matched, and match exactly the
-same thing. (Full details are discussed in L<perlrecapture>). There are
-three ways of referring to such I: absolutely, relatively,
-and by name.
+same thing. There are three ways of referring to such I:
+absolutely, relatively, and by name.
+
+=for later add link to perlrecapture
 
 =head3 Absolute referencing
 
@@ -339,7 +340,7 @@ as well.
 
 =head3 Relative referencing
 
-New in perl 5.10 is different way of referring to capture buffers: C<\g>.
+New in perl 5.10.0 is a different way of referring to capture buffers: C<\g>.
 C<\g> takes a number as argument, with the number in curly braces (the
 braces are optional). If the number (N) does not have a sign, it's a reference
 to the Nth capture group (so C<\g{2}> is equivalent to C<\2> - except that
@@ -368,7 +369,7 @@ Mnemonic: I<g>roup.
 
 =head3 Named referencing
 
-Also new in perl 5.10 is the use of named capture buffers, which can be
+Also new in perl 5.10.0 is the use of named capture buffers, which can be
 referred to by name. This is done with C<\g{name}>, which is a
 backreference to the capture buffer with the name I<name>.
 
@@ -426,7 +427,9 @@ remember where in the source string the last match ended, and the next time,
 it will start the match from where it ended the previous time.
 
 C<\G> matches the point where the previous match ended, or the beginning
-of the string if there was no previous match. See also L<perlremodifiers>.
+of the string if there was no previous match.
+
+=for later add link to perlremodifiers
 
 Mnemonic: I<G>lobal.
 
@@ -479,7 +482,7 @@ Mnemonic: oI<C>tet.
 
 =item \K
 
-This is new in perl 5.10. Anything that is matched left of C<\K> is
+This is new in perl 5.10.0. Anything that is matched left of C<\K> is
 not included in C<$&> - and will not be replaced if the pattern is
 used in a substitution. This will allow you to write C
 instead of C or C.
@@ -495,9 +498,11 @@ a newline by Unicode. This includes all characters matched by C<\v>
 the newline used in Windows text files). C<\R> is equivalent with
 C<< (?>\x0D\x0A)|\v) >>. Since C<\R> can match a more than one character,
 it cannot be put inside a bracketed character class; C is an error.
-C<\R> is introduced in perl 5.10.
+C<\R> was introduced in perl 5.10.0.
 
-Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>.
+Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
+and more importantly because Unicode recommends such a regular expression
+metacharacter, and suggests C<\R> as the notation.
 
 =item \X
 
@@ -509,6 +514,11 @@ mark character followed by zero or more mark characters. Mark characters
 include (but are not restricted to) I and
 I.
 
+C<\X> matches quite well what normal (non-Unicode-programmer) usage
+would consider a single character: for example a base character
+(the C<\PM> above), for example a letter, followed by zero or more
+diacritics, which are I (the C<\pM*> above).
+
 Mnemonic: eI<X>tended Unicode character.
 
 =back