X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlre.pod;h=58cd6456f5a3b4e3e9c30c5677c2ec8b5b5f8642;hb=c9436a12b1ee8d5e32d19b5870c63a8435afed9d;hp=5c7e76b5ad6b06a55f2de294cbc3aa0d9cc8ef9f;hpb=72ff290864ea88cc224b5d3af7058f500755f94a;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlre.pod b/pod/perlre.pod
index 5c7e76b..58cd645 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -183,19 +183,23 @@ In addition, Perl defines the following:
     \pP  Match P, named property.  Use \p{Prop} for longer names.
     \PP  Match non-P
     \X   Match eXtended Unicode "combining character sequence",
-         equivalent to C<(?:\PM\pM*)>
+         equivalent to (?:\PM\pM*)
     \C   Match a single C char (octet) even under Unicode.
-         B<NOTE:> breaks up characters into their UTF-8 bytes,
+         NOTE: breaks up characters into their UTF-8 bytes,
          so you may end up with malformed pieces of UTF-8.
 
-A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
-Use C<\w+> to match a string of Perl-identifier characters (which isn't
-the same as matching an English word).  If C<use locale> is in effect, the
-list of alphabetic characters generated by C<\w> is taken from the
-current locale.  See L<perllocale>.  You may use C<\w>, C<\W>, C<\s>, C<\S>,
+A C<\w> matches a single alphanumeric character (an alphabetic
+character, or a decimal digit) or C<_>, not a whole word.  Use C<\w+>
+to match a string of Perl-identifier characters (which isn't the same
+as matching an English word).  If C<use locale> is in effect, the list
+of alphabetic characters generated by C<\w> is taken from the current
+locale.  See L<perllocale>.  You may use C<\w>, C<\W>, C<\s>, C<\S>,
 C<\d>, and C<\D> within character classes, but if you try to use them
-as endpoints of a range, that's not a range, the "-" is understood literally.
-See L for details about C<\pP>, C<\PP>, and C<\X>.
+as endpoints of a range, that's not a range, the "-" is understood
+literally.  If Unicode is in effect, C<\s> matches also "\x{85}",
+"\x{2028}, and "\x{2029}", see L for more details about
+C<\pP>, C<\PP>, and C<\X>, and L about Unicode in
+general.
 
 The POSIX character class syntax
 
@@ -219,10 +223,22 @@ equivalents (if available) are as follows:
     word        \w  [3]
     xdigit
 
-    [1] A GNU extension equivalent to C<[ \t]>, `all horizontal whitespace'.
-    [2] Not I<exactly equivalent> to C<\s> since the C<[[:space:]]> includes
-        also the (very rare) `vertical tabulator', "\ck", chr(11).
-    [3] A Perl extension.
+=over
+
+=item [1]
+
+A GNU extension equivalent to C<[ \t]>, `all horizontal whitespace'.
+
+=item [2]
+
+Not exactly equivalent to C<\s> since the C<[[:space:]]> includes
+also the (very rare) `vertical tabulator', "\ck", chr(11).
+
+=item [3]
+
+A Perl extension, see above.
+
+=back
 
 For example use C<[:upper:]> to match all the uppercase characters.
 Note that the C<[]> are part of the C<[::]> construct, not part of the
@@ -446,12 +462,14 @@ C<)> in the comment.
 
 =item C<(?imsx-imsx)>
 
-One or more embedded pattern-match modifiers.  This is particularly
-useful for dynamic patterns, such as those read in from a configuration
-file, read in as an argument, are specified in a table somewhere,
-etc.  Consider the case that some of which want to be case sensitive
-and some do not.  The case insensitive ones need to include merely
-C<(?i)> at the front of the pattern.  For example:
+One or more embedded pattern-match modifiers, to be turned on (or
+turned off, if preceded by C<->) for the remainder of the pattern or
+the remainder of the enclosing pattern group (if any).  This is
+particularly useful for dynamic patterns, such as those read in from a
+configuration file, read in as an argument, are specified in a table
+somewhere, etc.  Consider the case that some of which want to be case
+sensitive and some do not.  The case insensitive ones need to include
+merely C<(?i)> at the front of the pattern.  For example:
 
     $pattern = "foobar";
     if ( /$pattern/i ) { }
@@ -461,8 +479,7 @@ C<(?i)> at the front of the pattern. For example:
     $pattern = "(?i)foobar";
     if ( /$pattern/ ) { }
 
-Letters after a C<-> turn those modifiers off.  These modifiers are
-localized inside an enclosing group (if any).  For example,
+These modifiers are restored at the end of the enclosing group.  For example,
 
     ( (?i) blah ) \s+ \1