Document \N{U+...}

diff --git a/pod/perlop.pod b/pod/perlop.pod
index a6514bb..7a24364 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -1025,6 +1025,7 @@ X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N>
     \x{263a}   wide hex char   (example: SMILEY)
     \c[                control char    (example: ESC)
     \N{name}   named Unicode character
+    \N{U+263D} Unicode character (example: FIRST QUARTER MOON)
 
 The character following C<\c> is mapped to some other character by
 converting letters to upper case and then (on ASCII systems) by inverting
@@ -1034,10 +1035,15 @@ through 0x1F. A '?' maps to the DEL character. On EBCDIC systems only
 '@', the letters, '[', '\', ']', '^', '_' and '?' will work, resulting
 in 0x00 through 0x1F and 0x7F.
 
-B<NOTE>: Unlike C and other languages, Perl has no \v escape sequence for
-the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>.
+C<\N{U+I<wide hex char>}> means the Unicode character whose Unicode ordinal
+number is I<wide hex char>.
+For documentation of C<\N{name}>, see L<charnames>.
 
-The following escape sequences are available in constructs that interpolate
+B<NOTE>: Unlike C and other languages, Perl has no C<\v> escape sequence for
+the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>.  (C<\v>
+does have meaning in regular expression patterns in Perl, see L<perlre>.)
+
+The following escape sequences are available in constructs that interpolate,
 but not in transliterations.
 X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
 
@@ -1052,8 +1058,7 @@ If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
 C<\u> and C<\U> is taken from the current locale.  See L<perllocale>.
 If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
 beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
-C<\U> is as defined by Unicode.  For documentation of C<\N{name}>,
-see L<charnames>.
+C<\U> is as defined by Unicode.
 
 All systems use the virtual C<"\n"> to represent a line terminator,
 called a "newline".  There is no such thing as an unvarying, physical
diff --git a/pod/perlre.pod b/pod/perlre.pod
index f82d196..f98fba7 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -226,6 +226,7 @@ X<\0> X<\c> X<\N> X<\x>
     \x{263a}   long hex char         (example: Unicode SMILEY)
     \cK                control char          (example: VT)
     \N{name}   named Unicode character
+    \N{U+263D} Unicode character     (example: FIRST QUARTER MOON)
     \l         lowercase next char (think vi)
     \u         uppercase next char (think vi)
     \L         lowercase till \E (think vi)
@@ -302,7 +303,9 @@ use C<\v> instead (vertical whitespace).
 X<\R>
 
 Note that C<\N> has two meanings.  When of the form C<\N{NAME}>, it matches the
-character whose name is C<NAME>.  Otherwise it matches any character but C<\n>.
+character whose name is C<NAME>; and similarly when of the form
+C<\N{U+I<wide hex char>}>, it matches the character whose Unicode ordinal is
+I<wide hex char>.  Otherwise it matches any character but C<\n>.
 
 The POSIX character class syntax
 X<character class>
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index 271f4b3..e14ef35 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -86,7 +86,7 @@ as C<Not in [].>
  \L                Lowercase till \E.  Not in [].
  \n                (Logical) newline character.
  \N                Any character but newline.  Not in [].
- \N{}              Named (Unicode) character.
+ \N{}              Named or numbered (Unicode) character.
  \p{}, \pP         Character with the given Unicode property.
  \P{}, \PP         Character without the given Unicode property.
  \Q                Quotemeta till \E.  Not in [].
@@ -156,15 +156,26 @@ Mnemonic: I<c>ontrol character.
 
  $str =~ /\cK/;  # Matches if $str contains a vertical tab (control-K).
 
-=head3 Named characters
+=head3 Named or numbered characters
 
-All Unicode characters have a Unicode name. It is even possible to give your
-own names to characters, even to short sequences of characters.  You can use a
-character by name by using the C<\N{}> construct; the name of the character
-goes between the curly braces.  You do have to C<use charnames> to load the
-Unicode names of the characters, otherwise Perl will complain.  (If you instead
-have your own names, a C<use> statement will be required for your translator.)
-For more details, see L<charnames>.
+All Unicode characters have a Unicode name and numeric ordinal value.  Use the
+C<\N{}> construct to specify a character by either of these values.
+
+To specify by name, the name of the character goes between the curly braces.
+In this case, you have to C<use charnames> to load the Unicode names of the
+characters, otherwise Perl will complain.
+
+To specify by Unicode ordinal number, use the form
+C<\N{U+I<wide hex character>}>, where I<wide hex character> is a number in
+hexadecimal that gives the ordinal number that Unicode has assigned to the
+desired character.  It is customary (but not required) to use leading zeros to
+pad the number to 4 digits.  Thus C<\N{U+0041}> means
+C<Latin Capital Letter A>, and you will rarely see it written without the two
+leading zeros.  C<\N{U+0041}> means C<A> even on EBCDIC machines (where the
+ordinal value of C<A> is not 0x41).
+
+It is even possible to give your own names to characters, and even to short
+sequences of characters.  For details, see L<charnames>.
 
 Mnemonic: I<N>amed character.
 
@@ -505,7 +516,8 @@ It is a short-hand for writing C<[^\n]>, and is identical to the C<.>
 metasymbol, except under the C</s> flag, which changes the meaning of C<.>, but
 not C<\N>.
 
-Note that C<\N{...}> can mean a L<named character|/Named characters>.
+Note that C<\N{...}> can mean a
+L<named or numbered character|/Named or numbered characters>.
 
 Mnemonic: Complement of I<\n>.
 
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index 55b178e..ae44f84 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -120,8 +120,8 @@ named characters, if C<\N> is followed by an opening brace and something that
 is not a quantifier, perl will assume that a character name is coming.  For
 example, C<\N{3}> means to match 3 non-newlines; C<\N{5,}> means to match 5 or
 more non-newlines, but C<\N{4F}> is not a legal quantifier, and will cause
-perl to look for a character named C<4F> (and won't find it unless custom names
-have been defined.)
+perl to look for a character named C<4F> (and won't find one unless custom names
+have been defined that include it.)
 
 C<\v> will match any character that is considered vertical white space;
 this includes the carriage return and line feed characters (newline).
@@ -292,7 +292,8 @@ C<\c>,
 C<\e>,
 C<\f>,
 C<\n>,
-C<\N{NAME}>,
+C<\N{I<NAME>}>,
+C<\N{U+I<wide hex char>}>,
 C<\r>,
 C<\t>,
 and
@@ -388,10 +389,10 @@ as if you put all the characters matched by the backslash sequence inside the
 character class. For instance, C<[a-f\d]> will match any digit, or any of the
 lowercase letters between 'a' and 'f' inclusive.
 
-C<\N> within a bracketed character class must be of the form C<\N{NAME}> for
-the same reason that a dot C<.> inside a bracketed character class loses its
-special meaning: it matches nearly anything, which generally isn't what you
-want to happen.
+C<\N> within a bracketed character class must be of the form C<\N{I<name>}> or
+C<\N{U+I<wide hex char>}> for the same reason that a dot C<.> inside a
+bracketed character class loses its special meaning: it matches nearly
+anything, which generally isn't what you want to happen.
 
 Examples:
 
diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index 266868d..a776e5f 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -92,6 +92,7 @@ These work as in normal strings.
    \x{263a} A wide hexadecimal value
    \cx      Control-x
    \N{name} A named character
+   \N{U+263D} A Unicode character by hex ordinal
 
    \l  Lowercase next character
    \u  Titlecase next character
diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index f9a4351..2798f68 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -1866,8 +1866,8 @@ Unicode and encoded in UTF-8, then an explicit C<use utf8> is needed.)
 Figuring out the hexadecimal sequence of a Unicode character you want
 or deciphering someone else's hexadecimal Unicode regexp is about as
 much fun as programming in machine code.  So another way to specify
-Unicode characters is to use the I<named character>> escape
-sequence C<\N{name}>.  C<name> is a name for the Unicode character, as
+Unicode characters is to use the I<named character> escape
+sequence C<\N{I<name>}>.  I<name> is a name for the Unicode character, as
 specified in the Unicode standard.  For instance, if we wanted to
 represent or match the astrological sign for the planet Mercury, we
 could use