The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used
to drive the layout of the character code charts in the Unicode Standard. The information
@@ -85,12 +162,12 @@ CHAR_ENTRY: NAME_LINE | RESERVED_LINE
| CHAR_ENTRY NOTICE
-
-In other words:
+
+In other words:
-Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.
+Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.
-
-Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, and IGNORED_LINE may
-occur before the first BLOCKHEADER.
+
+Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, and IGNORED_LINE may
+occur before the first BLOCKHEADER.
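The two ordering rules above can be checked mechanically. The following Python sketch is illustrative only: it assumes the line prefixes of the complete NamesList syntax ("@@@" TITLE, "@@@+" SUBTITLE, "@@" BLOCKHEADER or PAGEBREAK, "@" SUBHEADER, ";" IGNORED_LINE), which are not all shown in this excerpt, and it simplifies by counting every tab-initial line as a COMMENT_LINE. The helpers classify and check_header_order are hypothetical names.

```python
# Hypothetical line classifier. Prefixes are assumed from the full NamesList
# syntax and simplified: every tab-initial line is counted as COMMENT_LINE.
def classify(line: str) -> str:
    line = line.rstrip("\n")
    if line.startswith("@@@+\t"):
        return "SUBTITLE"
    if line.startswith("@@@\t"):
        return "TITLE"
    if line == "@@":
        return "PAGEBREAK"
    if line.startswith("@@\t"):
        return "BLOCKHEADER"
    if line.startswith("@\t"):
        return "SUBHEADER"
    if line.startswith(";"):
        return "IGNORED_LINE"
    if line.startswith("\t"):
        return "COMMENT_LINE"
    if line == "":
        return "EMPTY_LINE"
    return "OTHER"

# Line types permitted before the first BLOCKHEADER.
ALLOWED_BEFORE_FIRST_BLOCK = {"TITLE", "SUBTITLE", "SUBHEADER", "PAGEBREAK",
                              "COMMENT_LINE", "IGNORED_LINE"}

def check_header_order(lines):
    """Return (line_number, message) pairs for every ordering violation."""
    errors = []
    seen_block = False
    for n, line in enumerate(lines, start=1):
        kind = classify(line)
        if kind == "BLOCKHEADER":
            seen_block = True
        elif seen_block and kind in ("TITLE", "SUBTITLE"):
            errors.append((n, kind + " after first BLOCKHEADER"))
        elif not seen_block and kind not in ALLOWED_BEFORE_FIRST_BLOCK:
            errors.append((n, kind + " before first BLOCKHEADER"))
    return errors
```

An empty result means both rules hold; each entry otherwise points at the offending line.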
Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of
the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE,
@@ -168,7 +245,7 @@ EMPTY_LINE: LF
// blank page, then output one or more charts
// followed by the list of character names.
// use BLOCKSTART and BLOCKEND to define the
- // what characters belong to a block
+ // characters belonging to a block
// use blockname in page and table headers
"@@" <tab> BLOCKSTART <tab> BLOCKNAME COMMENT <tab> BLOCKEND
// if a comment is present it replaces the blockname
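As a sketch of how a tool might read the BLOCKHEADER syntax above, including the rule that a comment replaces the block name in page and table headers, here is a small Python parser. BlockHeader and parse_block_header are hypothetical names; runs of tabs are treated as a single separator, following the <tab> primitive, and the trailing comment is matched as "(" text ")" with an optional "*", following the COMMENT primitive.

```python
import re
from typing import NamedTuple

class BlockHeader(NamedTuple):
    start: int
    end: int
    title: str  # text to use in page and table headers

# Trailing "(...)" optionally followed by "*", per the COMMENT primitive.
_COMMENT = re.compile(r"\s*\(([^()]*)\)\*?\s*$")

def parse_block_header(line: str) -> BlockHeader:
    """Parse '@@' <tab> BLOCKSTART <tab> BLOCKNAME [COMMENT] <tab> BLOCKEND."""
    # <tab> may be a run of tabs, so drop the empty fields split() produces.
    fields = [f for f in line.rstrip("\n").split("\t") if f != ""]
    if len(fields) != 4 or fields[0] != "@@":
        raise ValueError("not a BLOCKHEADER: " + repr(line))
    _, start, name, end = fields
    m = _COMMENT.search(name)
    # If a comment is present, it replaces the block name.
    title = m.group(1) if m else name
    return BlockHeader(int(start, 16), int(end, 16), title)
```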
@@ -188,21 +265,37 @@ EMPTY_LINE: LF
// character corresponding to char
// If character is combining, it is replaced with
// CHAR NBSP <circ> x NBSP where <circ> is the
- // dotted circle
-
+ // dotted circle
+
+Notes:
+
+Blocks must be aligned on a 16-code-point boundary and contain an integer
+ multiple of 16 code points. The exception to that rule is for blocks of
+ ideographs etc. for which no names are listed in the file. Such blocks must
+ end on the actual last character.
+
+Blocks must be non-overlapping and in ascending order. Namelines
+ must be in ascending order and must follow the block header for the block to
+ which they belong.
+
+Reserved entries are optional and will be supplied automatically. They
+ are required whenever followed by ALIAS_LINE, COMMENT_LINE, or CROSS_REF.
+
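The block rules in these notes translate into a simple validation pass. In this hypothetical Python sketch, each block is a (start, end, has_names) tuple in file order, where has_names marks whether names are listed for the block and stands in for the ideograph exception. Nameline ordering is not checked here.

```python
def check_blocks(blocks):
    """blocks: list of (start, end, has_names) tuples in file order.
    Returns a list of error strings; an empty list means the rules hold."""
    errors = []
    prev_end = -1
    for start, end, has_names in blocks:
        if start % 16 != 0:
            errors.append(f"{start:04X}: start not on a 16-code-point boundary")
        # Blocks with no listed names (e.g. ideographs) may end on the last
        # actual character; all others must span a multiple of 16 code points.
        if has_names and (end + 1) % 16 != 0:
            errors.append(f"{start:04X}: size not a multiple of 16")
        # Ascending, non-overlapping: each block starts after the previous end.
        if start <= prev_end:
            errors.append(f"{start:04X}: overlaps or is out of order")
        prev_end = end
    return errors
```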
1.4 NamesList File Primitives
The following are the primitives and terminals for the NamesList syntax.
-
-LINE: STRING LF
-COMMENT: "(" NAME ")"
- "(" NAME ")" "*"
-
-NAME: <sequence of ASCII characters, except "(" or ")" >
+
+LINE: STRING LF
+COMMENT: "(" NAME ")"
+ "(" NAME ")" "*"
+BLOCKNAME: <sequence of Latin-1 characters, except "(" and ")">
+NAME: <sequence of uppercase ASCII letters, digits, and hyphens>
STRING: <sequence of Latin-1 characters>
CHAR: X X X X
- | X X X X X X X X X
+ | X X X X X
+ | X X X X X X
X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"
<tab>: <sequence of one or more ASCII tab characters 0x09>
SP: <ASCII 0x20>
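Under the revised rule, a CHAR is four, five, or six uppercase hexadecimal digits. A minimal Python check follows; parse_char is a hypothetical helper. The grammar itself does not cap the value at 10FFFF, so this sketch does not either.

```python
import re

# CHAR: 4, 5, or 6 uppercase hex digits, per the revised rule.
CHAR_RE = re.compile(r"[0-9A-F]{4,6}")

def parse_char(token: str) -> int:
    """Return the code point for a CHAR token, or raise ValueError."""
    if not CHAR_RE.fullmatch(token):
        raise ValueError("not a CHAR: " + repr(token))
    return int(token, 16)
```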
@@ -213,14 +306,67 @@ COMMENT: "(" NAME ")"
Special lookahead logic prevents a mention of a 4-digit standard, such as ISO 9999, from
- being misinterpreted as ISO CHAR.
+ being misinterpreted as ISO CHAR. The hyphen-minus in a character range
+ CHAR-CHAR is replaced by an EN DASH.
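The EN DASH substitution for CHAR-CHAR ranges can be sketched with a regular expression. This is an assumption about how such a rewrite could work, not unibook's implementation, and it does not reproduce the ISO lookahead logic; dash_ranges is a hypothetical name.

```python
import re

EN_DASH = "\u2013"
# Two CHARs (4-6 uppercase hex digits) joined by a hyphen-minus. The word
# boundaries stop the pattern from matching inside longer alphanumeric runs.
_RANGE = re.compile(r"\b([0-9A-F]{4,6})-([0-9A-F]{4,6})\b")

def dash_ranges(text: str) -> str:
    """Replace the hyphen-minus in CHAR-CHAR ranges with an EN DASH."""
    return _RANGE.sub(lambda m: m.group(1) + EN_DASH + m.group(2), text)
```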
Use of Latin-1 is supported in unibook.exe, but not portably, unless the file is encoded as
UTF-16LE.
The final LF in the file must be present.
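The encoding and final-LF requirements could be enforced when the file is loaded. This Python sketch assumes, hypothetically, that a UTF-16LE file begins with the byte order mark FF FE and that any other file is Latin-1; read_nameslist is an illustrative name.

```python
def read_nameslist(data: bytes) -> str:
    """Decode a NamesList file and enforce the trailing-LF rule.

    Assumption: a leading FF FE byte order mark means UTF-16LE;
    anything else is decoded as Latin-1.
    """
    if data.startswith(b"\xff\xfe"):
        text = data[2:].decode("utf-16-le")
    else:
        text = data.decode("latin-1")
    if not text.endswith("\n"):
        raise ValueError("the final LF in the file is missing")
    return text
```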
-
-A CHAR inside ' or " is expanded, but only its glyph image is printed, the
- code value is not echoed
-
-Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
- Apostrophes are supported, but nested quotes are not.
+
+A CHAR inside ' or " is expanded, but only its glyph image is printed;
+ the code value is not echoed.
+
+Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
+ Apostrophes are supported, but nested quotes are not.
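The quote replacement can be approximated with a context heuristic. The Python sketch below uses an assumed rule, not necessarily the one unibook applies: a quote at the start of the text or after whitespace or an opening bracket opens, any other quote closes, and a closing single quote doubles as the apostrophe (U+2019). Nested quotes are not handled. curly_quotes is a hypothetical name.

```python
def curly_quotes(s: str) -> str:
    """Replace straight quotes with curly ones (simple English heuristic)."""
    out = []
    for i, ch in enumerate(s):
        prev = s[i - 1] if i > 0 else ""
        # A quote opens at the start, after whitespace, or after a bracket.
        opening = prev == "" or prev.isspace() or prev in "([{"
        if ch == '"':
            out.append("\u201C" if opening else "\u201D")
        elif ch == "'":
            # Apostrophes (it's) and closing single quotes share U+2019.
            out.append("\u2018" if opening else "\u2019")
        else:
            out.append(ch)
    return "".join(out)
```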
+The Unicode Character Database is provided as is by Unicode, Inc. No
+ claims are made as to fitness for any particular purpose. No warranties of any
+ kind are expressed or implied. The recipient agrees to determine applicability
+ of information provided. If this file has been purchased on magnetic or
+ optical media from Unicode, Inc., the sole remedy for any claim will be
+ exchange of defective media within 90 days of receipt.
+
+This disclaimer is applicable for all other data files accompanying the
+ Unicode Character Database, some of which have been compiled by the Unicode
+ Consortium, and some of which have been supplied by other sources.
+
+Limitations on Rights to Redistribute This Data
+
+Recipient is granted the right to make copies in any form for internal
+ distribution and to freely use the information supplied in the creation of
+ products supporting the Unicode™ Standard. The files in the
+ Unicode Character Database can be redistributed to third parties or other
+ organizations (whether for profit or not) as long as this notice and the
+ disclaimer notice are retained. Information can be extracted from these files
+ and used in documentation or programs, as long as there is an accompanying
+ notice indicating the source.