From: Karl Williamson
Date: Thu, 25 Feb 2010 19:35:14 +0000 (-0700)
Subject: Document \N{U+...}
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=e526e8bb0a50d374ea99b823a0b7ebc449c4cb0b;p=p5sagit%2Fp5-mst-13.2.git

Document \N{U+...}
---

diff --git a/pod/perlop.pod b/pod/perlop.pod
index a6514bb..7a24364 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1025,6 +1025,7 @@ X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N>
     \x{263a}   wide hex char (example: SMILEY)
     \c[        control char (example: ESC)
     \N{name}   named Unicode character
+    \N{U+263D} Unicode character (example: FIRST QUARTER MOON)
 
 The character following C<\c> is mapped to some other character by
 converting letters to upper case and then (on ASCII systems) by inverting
@@ -1034,10 +1035,15 @@ through 0x1F. A '?' maps to the DEL character. On EBCDIC systems only
 '@', the letters, '[', '\', ']', '^', '_' and '?' will work, resulting in
 0x00 through 0x1F and 0x7F.
 
-B: Unlike C and other languages, Perl has no \v escape sequence for
-the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>.
+C<\N{U+I}> means the Unicode character whose Unicode ordinal
+number is I.
+For documentation of C<\N{name}>, see L.
 
-The following escape sequences are available in constructs that interpolate
+B: Unlike C and other languages, Perl has no C<\v> escape sequence for
+the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>. (C<\v>
+does have meaning in regular expression patterns in Perl, see L.)
+
+The following escape sequences are available in constructs that interpolate,
 but not in transliterations.
 X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
 
@@ -1052,8 +1058,7 @@ If C is in effect, the case map used by C<\l>, C<\L>,
 C<\u> and C<\U> is taken from the current locale. See L.
 If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
 beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
-C<\U> is as defined by Unicode.
-For documentation of C<\N{name}>,
-see L.
+C<\U> is as defined by Unicode.
 
 All systems use the virtual C<"\n"> to represent a line terminator,
 called a "newline". There is no such thing as an unvarying, physical
diff --git a/pod/perlre.pod b/pod/perlre.pod
index f82d196..f98fba7 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -226,6 +226,7 @@ X<\0> X<\c> X<\N> X<\x>
  \x{263a}   long hex char (example: Unicode SMILEY)
  \cK        control char (example: VT)
  \N{name}   named Unicode character
+ \N{U+263D} Unicode character (example: FIRST QUARTER MOON)
  \l         lowercase next char (think vi)
  \u         uppercase next char (think vi)
  \L         lowercase till \E (think vi)
@@ -302,7 +303,9 @@ use C<\v> instead (vertical whitespace).
 X<\R>
 
 Note that C<\N> has two meanings. When of the form C<\N{NAME}>, it matches the
-character whose name is C. Otherwise it matches any character but C<\n>.
+character whose name is C; and similarly when of the form
+C<\N{U+I}>, it matches the character whose Unicode ordinal is
+I. Otherwise it matches any character but C<\n>.
 
 The POSIX character class syntax
 X
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index 271f4b3..e14ef35 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -86,7 +86,7 @@ as C
  \L         Lowercase till \E. Not in [].
  \n         (Logical) newline character.
  \N         Any character but newline. Not in [].
- \N{}       Named (Unicode) character.
+ \N{}       Named or numbered (Unicode) character.
  \p{}, \pP  Character with the given Unicode property.
  \P{}, \PP  Character without the given Unicode property.
  \Q         Quotemeta till \E. Not in [].
@@ -156,15 +156,26 @@ Mnemonic: I<C>ontrol character.
 
  $str =~ /\cK/; # Matches if $str contains a vertical tab (control-K).
 
-=head3 Named characters
+=head3 Named or numbered characters
 
-All Unicode characters have a Unicode name. It is even possible to give your
-own names to characters, even to short sequences of characters.
-You can use a
-character by name by using the C<\N{}> construct; the name of the character
-goes between the curly braces. You do have to C to load the
-Unicode names of the characters, otherwise Perl will complain. (If you instead
-have your own names, a C statement will be required for your translator.)
-For more details, see L.
+All Unicode characters have a Unicode name and numeric ordinal value. Use the
+C<\N{}> construct to specify a character by either of these values.
+
+To specify by name, the name of the character goes between the curly braces.
+In this case, you have to C to load the Unicode names of the
+characters, otherwise Perl will complain.
+
+To specify by Unicode ordinal number, use the form
+C<\N{U+I}>, where I is a number in
+hexadecimal that gives the ordinal number that Unicode has assigned to the
+desired character. It is customary (but not required) to use leading zeros to
+pad the number to 4 digits. Thus C<\N{U+0041}> means
+C, and you will rarely see it written without the two
+leading zeros. C<\N{U+0041}> means C even on EBCDIC machines (where the
+ordinal value of C is not 0x41).
+
+It is even possible to give your own names to characters, and even to short
+sequences of characters. For details, see L.
 
 Mnemonic: I<N>amed character.
 
@@ -505,7 +516,8 @@ It is a short-hand for writing C<[^\n]>, and is identical to the C<.>
 metasymbol, except under the C flag, which changes the meaning of C<.>,
 but not C<\N>.
 
-Note that C<\N{...}> can mean a L.
+Note that C<\N{...}> can mean a
+L.
 
 Mnemonic: Complement of I<\n>.
 
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 55b178e..ae44f84 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -120,8 +120,8 @@ named characters, if C<\N> is followed by an opening brace and something that
 is not a quantifier, perl will assume that a character name is coming.
 For example, C<\N{3}> means to match 3 non-newlines; C<\N{5,}> means to match 5 or
 more non-newlines, but C<\N{4F}> is not a legal quantifier, and will cause
-perl to look for a character named C<4F> (and won't find it unless custom names
-have been defined.)
+perl to look for a character named C<4F> (and won't find one unless custom names
+have been defined that include it.)
 
 C<\v> will match any character that is considered vertical white space;
 this includes the carriage return and line feed characters (newline).
@@ -292,7 +292,8 @@ C<\c>,
 C<\e>,
 C<\f>,
 C<\n>,
-C<\N{NAME}>,
+C<\N{I}>,
+C<\N{U+I}>,
 C<\r>,
 C<\t>,
 and
@@ -388,10 +389,10 @@ as if you put all the characters matched by the backslash sequence inside the
 character class. For instance, C<[a-f\d]> will match any digit, or any of the
 lowercase letters between 'a' and 'f' inclusive.
 
-C<\N> within a bracketed character class must be of the form C<\N{NAME}> for
-the same reason that a dot C<.> inside a bracketed character class loses its
-special meaning: it matches nearly anything, which generally isn't what you
-want to happen.
+C<\N> within a bracketed character class must be of the forms C<\N{I}> or
+C<\N{U+I}> for the same reason that a dot C<.> inside a
+bracketed character class loses its special meaning: it matches nearly
+anything, which generally isn't what you want to happen.
 
 Examples:
 
diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index 266868d..a776e5f 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -92,6 +92,7 @@ These work as in normal strings.
    \x{263a}   A wide hexadecimal value
    \cx        Control-x
    \N{name}   A named character
+   \N{U+263D} A Unicode character by hex ordinal
 
    \l         Lowercase next character
    \u         Titlecase next character
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index f9a4351..2798f68 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1866,8 +1866,8 @@ Unicode and encoded in UTF-8, then an explicit C is needed.)
 Figuring out the hexadecimal sequence of a Unicode character you want
 or deciphering someone else's hexadecimal Unicode regexp is about as
 much fun as programming in machine code. So another way to specify
-Unicode characters is to use the I> escape
-sequence C<\N{name}>. C is a name for the Unicode character, as
+Unicode characters is to use the I escape
+sequence C<\N{I}>. I is a name for the Unicode character, as
 specified in the Unicode standard. For instance, if we wanted to
 represent or match the astrological sign for the planet Mercury, we
 could use