X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlreref.pod;h=700814140d759e48c2c63d99dc21598e6abd2fee;hb=12fc24939aa1955e247b87a4837866062d192a17;hp=c6c7752bae7b833a8966384d2419dc62503cbc89;hpb=47e8a552181f4e4d9a6075d02dfd1f5863b44bf7;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index c6c7752..7008141 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -8,7 +8,7 @@ This is a quick reference to Perl's regular expressions.
 For full information see L and L, as well
 as the L section in this document.
 
-=head1 OPERATORS
+=head2 OPERATORS
 
   =~ determines to which variable the regex is applied.
      In its absence, $_ is used.
@@ -52,7 +52,7 @@ as the L section in this document.
   ?pattern?  is like m/pattern/ but matches only once. No alternate
       delimiters can be used.  Must be reset with L.
 
-=head1 SYNTAX
+=head2 SYNTAX
 
    \       Escapes the character immediately following it
    .       Matches any single character except a newline (unless /s is used)
@@ -85,9 +85,9 @@ These work as in normal strings.
    \N{name} A named character
 
    \l  Lowercase next character
-   \u  Uppercase next character
+   \u  Titlecase next character
    \L  Lowercase until \E
-   \U  Titlecase until \E
+   \U  Uppercase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End case modification
 
@@ -104,15 +104,19 @@ This one works differently from normal strings:
    [f-j-]   Dash escaped or at start or end means 'dash'
    [^f-j]   Caret indicates "match any character _except_ these"
 
-The following work within or without a character class:
+The following sequences work within or without a character class.
+The first six are locale aware, all are Unicode aware.  The default
+character class equivalent are given.  See L and
+L for details.
 
-   \d      A digit, same as [0-9]
-   \D      A nondigit, same as [^0-9]
-   \w      A word character (alphanumeric), same as [a-zA-Z0-9_]
-   \W      A non-word character, [^a-zA-Z0-9_]
-   \s      A whitespace character, same as [ \t\n\r\f]
-   \S      A non-whitespace character, [^ \t\n\r\f]
-   \C      Match a byte (with Unicode, '.' matches char)
+   \d      A digit                     [0-9]
+   \D      A nondigit                  [^0-9]
+   \w      A word character            [a-zA-Z0-9_]
+   \W      A non-word character        [^a-zA-Z0-9_]
+   \s      A whitespace character      [ \t\n\r\f]
+   \S      A non-whitespace character  [^ \t\n\r\f]
+
+   \C      Match a byte (with Unicode, '.' matches a character)
    \pP     Match P-named (Unicode) property
    \p{...} Match Unicode property with long name
    \PP     Match non-P
@@ -121,21 +125,21 @@ The following work within or without a character class:
 
 POSIX character classes and their Unicode and Perl equivalents:
 
-   alnum   IsAlnum              Alphanumeric
-   alpha   IsAlpha              Alphabetic
-   ascii   IsASCII              Any ASCII char
-   blank   IsSpace  [ \t]       Horizontal whitespace (GNU)
-   cntrl   IsCntrl              Control characters
-   digit   IsDigit  \d          Digits
-   graph   IsGraph              Alphanumeric and punctuation
-   lower   IsLower              Lowercase chars (locale aware)
-   print   IsPrint              Alphanumeric, punct, and space
-   punct   IsPunct              Punctuation
-   space   IsSpace  [\s\ck]     Whitespace
-           IsSpacePerl   \s     Perl's whitespace definition
-   upper   IsUpper              Uppercase chars (locale aware)
-   word    IsWord   \w          Alphanumeric plus _ (Perl)
-   xdigit  IsXDigit [\dA-Fa-f]  Hexadecimal digit
+   alnum   IsAlnum              Alphanumeric
+   alpha   IsAlpha              Alphabetic
+   ascii   IsASCII              Any ASCII char
+   blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
+   cntrl   IsCntrl              Control characters
+   digit   IsDigit  \d          Digits
+   graph   IsGraph              Alphanumeric and punctuation
+   lower   IsLower              Lowercase chars (locale and Unicode aware)
+   print   IsPrint              Alphanumeric, punct, and space
+   punct   IsPunct              Punctuation
+   space   IsSpace  [\s\ck]     Whitespace
+           IsSpacePerl   \s     Perl's whitespace definition
+   upper   IsUpper              Uppercase chars (locale and Unicode aware)
+   word    IsWord   \w          Alphanumeric plus _ (Perl extension)
+   xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit
 
 Within a character class:
 
@@ -185,7 +189,7 @@ There is no quantifier {,n} -- that gets understood as a literal string.
    (?(cond)yes|no) cond being integer corresponding to capturing parens
    (?(cond)yes)        or a lookaround/eval zero-width assertion
 
-=head1 VARIABLES
+=head2 VARIABLES
 
    $_    Default variable for operators to use
    $*    Enable multiline matching (deprecated; not in 5.9.0 or later)
@@ -208,7 +212,7 @@ See also L.
 
 Captured groups are numbered according to their I paren.
 
-=head1 FUNCTIONS
+=head2 FUNCTIONS
 
    lc          Lowercase a string
    lcfirst     Lowercase first char of a string
@@ -222,12 +226,12 @@ Captured groups are numbered according to their I paren.
 
    split       Use regex to split a string into parts
 
-The first four of these are identical to the escape sequences \l, \u,
-\L, and \U.  For Titlecase, see L.
+The first four of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, and C<\u>.  For Titlecase, see L.
 
-=head1 Terminology
+=head2 TERMINOLOGY
 
-=head2 Titlecase
+=head3 Titlecase
 
 Unicode concept which most often is equal to uppercase, but for
 certain characters like the German "sharp s" there is a difference.