X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlrebackslash.pod;h=ddd7abee3801a161c1919b35df1a4c7492755e53;hb=0c42fe95656e99f238a0bcf90ab2476c175615b7;hp=7851bf9f6a0dbfab59dc0b10c0e404de601bfe87;hpb=8a11820602bf2132dec1f28575fd8f008fe20205;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index 7851bf9..ddd7abe 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -37,7 +37,7 @@ future version of Perl may assign a special meaning to it. However, if you
 have warnings turned on, Perl will issue a warning if you use such a sequence.
 [1].
 
-It is however garanteed that backslash or escape sequences never have a
+It is however guaranteed that backslash or escape sequences never have a
 punctuation character following the backslash, not now, and not in a future
 version of Perl 5. So it is safe to put a backslash in front of a non-word
 character.
@@ -92,7 +92,7 @@ quoted constructs>.
  \s                Character class for white space.
  \S                Character class for non white space.
  \t                Tab character.
- \u                Uppercase next character.
+ \u                Titlecase next character.
  \U                Uppercase till \E.
  \v                Character class for vertical white space.
  \V                Character class for non vertical white space.
@@ -107,7 +107,7 @@ quoted constructs>.
 
 =head3 Fixed characters
 
-A handful of characters have a dedidated I<character escape>. The following
+A handful of characters have a dedicated I<character escape>. The following
 table shows them, along with their code points (in decimal and hex), their
 ASCII name, the control escape (see below) and a short description.
 
@@ -176,7 +176,7 @@ Mnemonic: I<N>amed character.
 
 Octal escapes consist of a backslash followed by two or three octal digits
 matching the code point of the character you want to use. This allows for
-522 characters (C<\00> up to C<\777>) that can be expressed this way.
+512 characters (C<\00> up to C<\777>) that can be expressed this way.
 Enough in pre-Unicode days, but most Unicode characters cannot be escaped
 this way.
 
@@ -202,7 +202,7 @@ the following rules:
 
 =item 1
 
-If the backslash is followed by a single digit, it's a backrefence.
+If the backslash is followed by a single digit, it's a backreference.
 
 =item 2
 
@@ -255,12 +255,13 @@ Mnemonic: heI<x>adecimal.
 
 A number of backslash sequences have to do with changing the character,
 or characters following them. C<\l> will lowercase the character following
-it, while C<\u> will uppercase the character following it. (They perform
-similar functionality as the functions C<lcfirst> and C<ucfirst>).
+it, while C<\u> will uppercase (or, more accurately, titlecase) the
+character following it. (They perform similar functionality as the
+functions C<lcfirst> and C<ucfirst>).
 
 To uppercase or lowercase several characters, one might want to use
 C<\L> or C<\U>, which will lowercase/uppercase all characters following
-them, until either the end of the pattern, or the next occurance of
+them, until either the end of the pattern, or the next occurrence of
 C<\E>, whatever comes first. They perform similar functionality as the
 functions C<lc> and C<uc> do.
 
@@ -292,7 +293,7 @@ L<perlrecharclass>.
 C<\w> is a character class that matches any I<word> character (letters,
 digits, underscore). C<\d> is a character class that matches any digit,
 while the character class C<\s> matches any white space character.
-New in perl 5.10 are the classes C<\h> and C<\v> which match horizontal
+New in perl 5.10.0 are the classes C<\h> and C<\v> which match horizontal
 and vertical white space characters.
 
 The uppercase variants (C<\W>, C<\D>, C<\S>, C<\H>, and C<\V>) are
@@ -318,9 +319,10 @@ Mnemonic: I<p>roperty.
 
 If capturing parenthesis are used in a regular expression, we can refer to
 the part of the source string that was matched, and match exactly the
-same thing. (Full details are discussed in L<perlrecapture>). There are
-three ways of refering to such I<backreference>: absolutely, relatively,
-and by name.
+same thing. There are three ways of referring to such I<backreference>:
+absolutely, relatively, and by name.
+
+=for later add link to perlrecapture
 
 =head3 Absolute referencing
 
@@ -338,12 +340,12 @@ as well.
 
 =head3 Relative referencing
 
-New in perl 5.10 is different way of refering to capture buffers: C<\g>.
+New in perl 5.10.0 is a different way of referring to capture buffers: C<\g>.
 C<\g> takes a number as argument, with the number in curly braces (the
 braces are optional). If the number (N) does not have a sign, it's a reference
 to the Nth capture group (so C<\g{2}> is equivalent to C<\2> - except that
 C<\g> always refers to a capture group and will never be seen as an octal
-escape). If the number is negative, the reference is relative, refering to
+escape). If the number is negative, the reference is relative, referring to
 the Nth group before the C<\g{-N}>.
 
 The big advantage of C<\g{-N}> is that it makes it much easier to write
@@ -367,7 +369,7 @@ Mnemonic: I<g>roup.
 
 =head3 Named referencing
 
-Also new in perl 5.10 is the use of named capture buffers, which can be
+Also new in perl 5.10.0 is the use of named capture buffers, which can be
 referred to by name. This is done with C<\g{name}>, which is a
 backreference to the capture buffer with the name I<name>.
 
@@ -425,7 +427,9 @@ remember where in the source string the last match ended, and the next time,
 it will start the match from where it ended the previous time.
 
 C<\G> matches the point where the previous match ended, or the beginning
-of the string if there was no previous match. See also L<perlremodifiers>.
+of the string if there was no previous match.
+
+=for later add link to perlremodifiers
 
 Mnemonic: I<G>lobal.
 
@@ -478,7 +482,7 @@ Mnemonic: oI<C>tet.
 
 =item \K
 
-This is new in perl 5.10. Anything that is matched left of C<\K> is
+This is new in perl 5.10.0. Anything that is matched left of C<\K> is
 not included in C<$&> - and will not be replaced if the pattern is
 used in a substitution. This will allow you to write C instead of
 C or C.
@@ -492,11 +496,13 @@ a newline by Unicode. This includes all characters matched by C<\v>
 (vertical white space), and the multi character sequence C<"\x0D\x0A">
 (carriage return followed by a line feed, aka the network newline, or
 the newline used in Windows text files). C<\R> is equivalent with
-C<(?>\x0D\x0A)|\v)>. Since C<\R> can match a more than one character,
+C<< (?>\x0D\x0A)|\v) >>. Since C<\R> can match a more than one character,
 it cannot be put inside a bracketed character class; C</[\R]/> is an error.
-C<\R> is introduced in perl 5.10.
+C<\R> was introduced in perl 5.10.0.
 
-Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>.
+Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
+and more importantly because Unicode recommends such a regular expression
+metacharacter, and suggests C<\R> as the notation.
 
 =item \X
 
@@ -508,6 +514,11 @@ mark character followed by zero or more mark characters. Mark characters
 include (but are not restricted to) I<combining characters> and I<vowel
 signs>.
 
+C<\X> matches quite well what normal (non-Unicode-programmer) usage
+would consider a single character: for example a base character
+(the C<\PM> above), for example a letter, followed by zero or more
+diacritics, which are I (the C<\pM*> above).
+
 Mnemonic: eI<X>tended Unicode character.
 
 =back