X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlrebackslash.pod;h=40f73fcbc1bfbf7957050096747c92c4310520b1;hb=c1effa61278e47c916466883d74905b04fedc388;hp=e1fdebb791e86058e95614a8623947c8eec28cbf;hpb=353c650532037e4006fbdb2176350717f320f7c3;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index e1fdebb..40f73fc 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -83,6 +83,7 @@ quoted constructs>.
  \l        Lowercase next character.
  \L        Lowercase till \E.
  \n        (Logical) newline character.
+ \N        Any character but newline.
  \N{}      Named (Unicode) character.
  \p{}, \pP Character with a Unicode property.
  \P{}, \PP Character without a Unicode property.
@@ -176,7 +177,7 @@ Mnemonic: Iamed character.
 
 Octal escapes consist of a backslash followed by two or three octal digits
 matching the code point of the character you want to use. This allows for
-522 characters (C<\00> up to C<\777>) that can be expressed this way.
+512 characters (C<\00> up to C<\777>) that can be expressed this way.
 Enough in pre-Unicode days, but most Unicode characters cannot be escaped
 this way.
 
@@ -293,7 +294,7 @@ L.
 C<\w> is a character class that matches any I character (letters,
 digits, underscore). C<\d> is a character class that matches any digit,
 while the character class C<\s> matches any white space character.
-New in perl 5.10 are the classes C<\h> and C<\v> which match horizontal
+New in perl 5.10.0 are the classes C<\h> and C<\v> which match horizontal
 and vertical white space characters.
 
 The uppercase variants (C<\W>, C<\D>, C<\S>, C<\H>, and C<\V>) are
@@ -340,7 +341,7 @@ as well.
 
 =head3 Relative referencing
 
-New in perl 5.10 is different way of referring to capture buffers: C<\g>.
+New in perl 5.10.0 is a different way of referring to capture buffers: C<\g>.
 C<\g> takes a number as argument, with the number in curly braces (the
 braces are optional). If the number (N) does not have a sign, it's a reference
 to the Nth capture group (so C<\g{2}> is equivalent to C<\2> - except that
@@ -369,7 +370,7 @@ Mnemonic: Iroup.
 
 =head3 Named referencing
 
-Also new in perl 5.10 is the use of named capture buffers, which can be
+Also new in perl 5.10.0 is the use of named capture buffers, which can be
 referred to by name. This is done with C<\g{name}>, which is a
 backreference to the capture buffer with the name I.
 
@@ -482,7 +483,7 @@ Mnemonic: oItet.
 
 =item \K
 
-This is new in perl 5.10. Anything that is matched left of C<\K> is
+This is new in perl 5.10.0. Anything that is matched left of C<\K> is
 not included in C<$&> - and will not be replaced if the pattern is
 used in a substitution. This will allow you to write C instead of
 C or C.
@@ -498,9 +499,11 @@ a newline by Unicode. This includes all characters matched by C<\v>
 the newline used in Windows text files). C<\R> is equivalent with
 C<< (?>\x0D\x0A)|\v) >>. Since C<\R> can match a more than one character,
 it cannot be put inside a bracketed character class; C is an error.
-C<\R> is introduced in perl 5.10.
+C<\R> was introduced in perl 5.10.0.
 
-Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>.
+Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
+and more importantly because Unicode recommends such a regular expression
+metacharacter, and suggests C<\R> as the notation.
 
 =item \X
 
@@ -512,6 +515,11 @@ mark character followed by zero or more mark characters. Mark characters
 include (but are not restricted to) I and I.
 
+C<\X> matches quite well what normal (non-Unicode-programmer) usage
+would consider a single character: for example a base character
+(the C<\PM> above), for example a letter, followed by zero or more
+diacritics, which are I (the C<\pM*> above).
+
 Mnemonic: eItended Unicode character.
 
 =back
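
The escapes touched by this patch can be tried out with a short Perl sketch. It is an illustration only, not part of the patch: the sample strings and variable names are made up, and it assumes perl 5.12 or later, since the bare C<\N> (non-newline) form needs 5.12 while C<\K>, C<\R>, C<\h>, and C<\g{}> need 5.10.0.

```perl
use 5.012;      # assumption: 5.12+ so that bare \N is available; also enables strict and say
use warnings;

# \N matches any single character except newline, even under /s.
my ($first_line) = "alpha\nbeta" =~ /\A(\N+)/s;

# \K keeps everything to its left out of the match, so only "bar" is replaced.
(my $kept = "foobar") =~ s/foo\Kbar/BAZ/;

# \R matches a generic newline, including the two-character CRLF sequence.
my @lines = split /\R/, "one\r\ntwo\nthree";

# \g{-1} is a relative reference to the most recently opened capture group.
my $doubled = "hello" =~ /(.)\g{-1}/ ? $1 : undef;

# \g{word} refers back to the named group; \h+ matches horizontal white space.
my $repeated = "the the cat" =~ /\b(?<word>\w+)\h+\g{word}\b/ ? $+{word} : undef;

# Three octal digits give 8**3 distinct values, \000 through \777.
my $octal_count = 8 ** 3;

say "first line:   $first_line";
say "after \\K:     $kept";
say "lines:        @lines";
say "doubled char: $doubled";
say "repeated:     $repeated";
say "octal values: $octal_count";
```

The last value is the arithmetic behind the 522 to 512 correction in the octal-escape hunk.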