From: Karl Williamson
Date: Sun, 28 Mar 2010 04:43:32 +0000 (-0600)
Subject: Remove duplicate information and refer to other pods
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=d0b161077624458de6e6b915c2c6d48c04aca5e4;p=p5sagit%2Fp5-mst-13.2.git

Remove duplicate information and refer to other pods

Things were getting out of sync.
---
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 014f921..12a1119 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -166,7 +166,7 @@ X X X<*> X<+> X X<{n}> X<{n,}> X<{n,m}>
 as a regular character. In particular, the lower bound
 is not optional.) The "*" quantifier is equivalent to C<{0,}>, the "+"
 quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>. n and m are limited
-to integral values less than a preset limit defined when perl is built.
+to non-negative integral values less than a preset limit defined when perl is built.
 This is usually 32766 on the most common platforms. The actual limit can
 be seen in the error message generated by code such as this:
@@ -223,7 +223,7 @@ instance the above example could also be written as follows:
 Because patterns are processed as double quoted strings, the following
 also work:
 X<\t> X<\n> X<\r> X<\f> X<\e> X<\a> X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
-X<\0> X<\c> X<\N> X<\x>
+X<\0> X<\c> X<\N{}> X<\x>
 
     \t          tab                   (HT, TAB)
     \n          newline               (LF, NL)
@@ -256,9 +256,7 @@ You'll need to write something like C.
 =head3 Character Classes and other Special Escapes
 
 In addition, Perl defines the following:
-X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\X> X<\p> X<\P> X<\C>
-X<\g> X<\k> X<\N> X<\K> X<\v> X<\V> X<\h> X<\H>
-X X X X
+X<\g> X<\k> X<\K> X
 
     \w       Match a "word" character (alphanumeric plus "_")
     \W       Match a non-"word" character
@@ -288,29 +286,10 @@ X X X X
     \H       Not horizontal whitespace
     \R       Linebreak
 
-A C<\w> matches a single alphanumeric character (an alphabetic
-character, or a decimal digit) or C<_>, not a whole word. Use C<\w+>
-to match a string of Perl-identifier characters (which isn't the same
-as matching an English word). If C is in effect, the list
-of alphabetic characters generated by C<\w> is taken from the current
-locale. See L. You may use C<\w>, C<\W>, C<\s>, C<\S>,
-C<\d>, and C<\D> within character classes, but they aren't usable
-as either end of a range. If any of them precedes or follows a "-",
-the "-" is understood literally. If Unicode is in effect, C<\s> matches
-also "\x{85}", "\x{2028}", and "\x{2029}". See L for more
-details about C<\pP>, C<\PP>, C<\X> and the possibility of defining
-your own C<\p> and C<\P> properties, and L about Unicode
-in general.
-X<\w> X<\W> X
-
-C<\R> will atomically match a linebreak, including the network line-ending
-"\x0D\x0A". Specifically, X<\R> is exactly equivalent to
-
-    (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
-
-B C<\R> has no special meaning inside of a character class;
-use C<\v> instead (vertical whitespace).
-X<\R>
+See L for details on
+on C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, C<\D>, C<\p>, C<\P>, C<\N>, C<\v>, C<\V>,
+C<\h>, and C<\H>.
+See L for details on C<\R> and C<\X>.
 
 Note that C<\N> has two meanings. When of the form C<\N{NAME}>, it matches the
 character whose name is C; and similarly when of the form
@@ -331,113 +310,33 @@ they must always be used within a character class expression.
     # this is not, and will generate a warning:
     $string =~ /[:alpha:]/;
 
-The following table shows the mapping of POSIX character class
-names, common escapes, literal escape sequences and their equivalent
-Unicode style property names.
-X X<\p> X<\p{}>
-X X X X X X X
-X X X X X X X
-
-B up to Perl 5.10 the property names used were shared with
-standard Unicode properties, this was changed in Perl 5.11, see
-L for details.
-
-    POSIX   Esc  Class               Property           Note
-    --------------------------------------------------------
-    alnum        [0-9A-Za-z]         IsPosixAlnum
-    alpha        [A-Za-z]            IsPosixAlpha
-    ascii        [\000-\177]         IsASCII
-    blank        [\011 ]             IsPosixBlank       [1]
-    cntrl        [\0-\37\177]        IsPosixCntrl
-    digit   \d   [0-9]               IsPosixDigit
-    graph        [!-~]               IsPosixGraph
-    lower        [a-z]               IsPosixLower
-    print        [ -~]               IsPosixPrint
-    punct        [!-/:-@[-`{-~]      IsPosixPunct
-    space        [\11-\15 ]          IsPosixSpace       [2]
-            \s   [\11\12\14\15 ]     IsPerlSpace        [2]
-    upper        [A-Z]               IsPosixUpper
-    word    \w   [0-9A-Z_a-z]        IsPerlWord         [3]
-    xdigit       [0-9A-Fa-f]         IsXDigit
-
-=over
-
-=item [1]
-
-A GNU extension equivalent to C<[ \t]>, "all horizontal whitespace".
-
-=item [2]
-
-Note that C<\s> and C<[[:space:]]> are B equivalent as C<[[:space:]]>
-includes also the (very rare) "vertical tabulator", "\cK" or chr(11) in
-ASCII.
-
-=item [3]
-
-A Perl extension, see above.
-
-=back
-
-For example use C<[:upper:]> to match all the uppercase characters.
-Note that the C<[]> are part of the C<[::]> construct, not part of the
-whole character class. For example:
-
-    [01[:alpha:]%]
-
-matches zero, one, any alphabetic character, and the percent sign.
-
-The other named classes are:
-
-=over 4
-
-=item cntrl
-X
-
-Any control character. Usually characters that don't produce output as
-such but instead control the terminal somehow: for example newline and
-backspace are control characters. All characters with ord() less than
-32 are usually classified as control characters (assuming ASCII,
-the ISO Latin character sets, and Unicode), as is the character with
-the ord() value of 127 (C).
-
-=item graph
-X
-
-Any alphanumeric or punctuation (special) character.
-
-=item print
-X
-
-Any alphanumeric or punctuation (special) character or the space character.
-
-=item punct
-X
-
-Any punctuation (special) character.
-
-=item xdigit
-X
-
-Any hexadecimal digit. Though this may feel silly ([0-9A-Fa-f] would
-work just fine) it is included for completeness.
-
-=back
+The following Posix-style character classes are available:
+
+    [[:alpha:]]  Any alphabetical character.
+    [[:alnum:]]  Any alphanumerical character.
+    [[:ascii:]]  Any character in the ASCII character set.
+    [[:blank:]]  A GNU extension, equal to a space or a horizontal tab
+    [[:cntrl:]]  Any control character.
+    [[:digit:]]  Any decimal digit, equivalent to "\d".
+    [[:graph:]]  Any printable character, excluding a space.
+    [[:lower:]]  Any lowercase character.
+    [[:print:]]  Any printable character, including a space.
+    [[:punct:]]  Any graphical character excluding "word" characters.
+    [[:space:]]  Any whitespace character. "\s" plus the vertical tab ("\cK").
+    [[:upper:]]  Any uppercase character.
+    [[:word:]]   A Perl extension, equivalent to "\w".
+    [[:xdigit:]] Any hexadecimal digit.
 
 You can negate the [::] character classes by prefixing the class name
-with a '^'. This is a Perl extension. For example:
-X
-
-    POSIX           traditional   Unicode
+with a '^'. This is a Perl extension.
 
-    [[:^digit:]]    \D            \P{IsPosixDigit}
-    [[:^space:]]    \S            \P{IsPosixSpace}
-    [[:^word:]]     \W            \P{IsPerlWord}
-
-Perl respects the POSIX standard in that POSIX character classes are
-only supported within a character class. The POSIX character classes
+The POSIX character classes
 [.cc.] and [=cc=] are recognized but B supported and
 trying to use them will cause an error.
 
+Details on POSIX character classes are in
+L.
+
 =head3 Assertions
 
 Perl defines the following zero-width assertions:
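The bracket-expression rules this patch documents can be checked with a short Perl sketch. A POSIX class such as "[:alpha:]" is only valid inside an enclosing character class, and a leading "^" negates it as a Perl extension. The sketch is illustrative only and is not part of the patch. The helper names in_example_class and is_non_digit are hypothetical. The pattern [01[:alpha:]%] is the example from the removed text, which says it matches zero, one, any alphabetic character, and the percent sign.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (illustration only): returns 1 if $c is exactly one
# character matched by [01[:alpha:]%], i.e. '0', '1', any alphabetic
# character, or '%'. The [:alpha:] part is valid only because it sits
# inside the outer [...] character class.
sub in_example_class {
    my ($c) = @_;
    return $c =~ /\A[01[:alpha:]%]\z/ ? 1 : 0;
}

# Hypothetical helper (illustration only): returns 1 if $c is exactly one
# character that is not a decimal digit, using the negated class
# [[:^digit:]] (the '^' negation is a Perl extension).
sub is_non_digit {
    my ($c) = @_;
    return $c =~ /\A[[:^digit:]]\z/ ? 1 : 0;
}

# Report which sample characters fall inside the example class.
for my $c ('0', '1', 'a', 'Z', '%', '5', '#') {
    printf "%-2s %s\n", $c,
        in_example_class($c) ? 'matches [01[:alpha:]%]' : 'does not match';
}
```

Running it reports a match for 0, 1, a, Z and %, and no match for 5 and #. Writing /[:alpha:]/ without the outer brackets instead makes Perl warn, as the context lines in the last hunk point out.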