Commit | Line | Data |
505afebf |
1 | <html> |
2 | |
3 | <head> |
4 | <meta name="GENERATOR" content="Microsoft FrontPage 3.0"> |
5 | <title>Unicode 3.0 NamesList File Structure</title> |
6 | </head> |
7 | |
8 | <body> |
9 | |
10 | <h3>Unicode NamesList File Format</h3> |
11 | |
12 | <p>Last updated: 1999-07-06</p> |
13 | |
14 | <h3>1.0 Introduction</h3> |
15 | |
16 | <p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used |
17 | to drive the layout of the character code charts in the Unicode Standard. The information |
18 | in this file is a combination of several fields from the UnicodeData.txt and Blocks.txt files, |
19 | together with additional annotations for many characters. This document describes the |
20 | syntax rules for the file format, but also gives brief information on how each construct |
21 | is rendered when laid out for the book. Some of the syntax elements were used in |
22 | preparation of the drafts of the book and may not be present in the final, released form |
23 | of the NamesList.txt file.</p> |
24 | |
25 | <p>The same input file can be used to do the draft preparation for ISO/IEC 10646 (referred |
26 | to below as ISO-style). This necessitates the presence of some information in the name list |
27 | file that is not needed (and in fact removed during parsing) for the Unicode book.</p> |
28 | |
29 | <p>With access to the layout program (unibook.exe), it is a simple matter to create |
30 | name lists for formatting working drafts that contain proposed characters.</p> |
31 | |
32 | <h3>1.1 NamesList File Overview</h3> |
33 | |
34 | <p>The *.lst files are plain text files which, in their simplest form, look like this:</p> |
35 | |
36 | <p>@@<tab>0020<tab>BASIC LATIN<tab>007F<br> |
37 | ; this is a file comment (ignored)<br> |
38 | 0020<tab>SPACE<br> |
39 | 0021<tab>EXCLAMATION MARK<br> |
40 | 0022<tab>QUOTATION MARK<br> |
41 | . . . <br> |
42 | 007F<tab>DELETE</p> |
43 | |
44 | <p>The semicolon (as first character), @ and <tab> characters are used by the file |
45 | syntax and must be provided as shown. (Hexadecimal digits must be in UPPER CASE.) A double |
46 | @@ introduces a block header, with the title and the start and end codes of the block |
47 | provided as shown.</p> |
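<p>As an illustration only, the minimal form above could be read with a few lines of Python.
This is a sketch under stated assumptions, not the logic used by unibook.exe: the function name
parse_minimal_namelist is hypothetical, code points are assumed to be four upper-case hexadecimal
digits, and any line that is neither a block header nor a name line is simply skipped.</p>

```python
import re

# Assumption: code points are written as four upper-case hex digits.
HEX4 = re.compile(r"^[0-9A-F]{4}$")

def parse_minimal_namelist(text):
    """Return (blocks, names) from a minimal name list.

    blocks: list of (start, title, end) tuples taken from "@@" lines
    names:  dict mapping a code point string to its character name
    """
    blocks, names = [], {}
    for raw in text.splitlines():
        if not raw or raw.startswith(";"):
            continue  # empty lines and file comments are ignored
        # A run of several tabs is treated like a single separator.
        fields = [f for f in raw.split("\t") if f]
        if not fields:
            continue
        if fields[0] == "@@" and len(fields) == 4:
            _, start, title, end = fields
            blocks.append((start, title, end))
        elif len(fields) == 2 and HEX4.match(fields[0]):
            names[fields[0]] = fields[1]
    return blocks, names

sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)
blocks, names = parse_minimal_namelist(sample)
```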
48 | |
49 | <p>For an ISO-style, minimal name list, only the NAME_LINE and BLOCKHEADER and their |
50 | constituent syntax elements are needed.</p> |
51 | |
52 | <p>The full syntax with all the options is provided in the following sections.</p> |
53 | |
54 | <h3>1.2 NamesList File Structure</h3> |
55 | |
56 | <p>This section defines the overall file structure.</p> |
57 | |
58 | <pre><strong>NAMELIST: TITLE_PAGE* BLOCK* |
59 | </strong> |
60 | <strong>TITLE_PAGE: TITLE |
61 | | TITLE_PAGE SUBTITLE |
62 | | TITLE_PAGE SUBHEADER |
63 | | TITLE_PAGE IGNORED_LINE |
64 | | TITLE_PAGE EMPTY_LINE |
65 | | TITLE_PAGE COMMENT_LINE |
66 | | TITLE_PAGE NOTICE |
67 | | TITLE_PAGE PAGEBREAK |
68 | </strong> |
69 | <strong>BLOCK: BLOCKHEADER |
70 | | BLOCK CHAR_ENTRY |
71 | | BLOCK SUBHEADER |
72 | | BLOCK NOTICE |
73 | | BLOCK EMPTY_LINE |
74 | | BLOCK IGNORED_LINE |
75 | | BLOCK PAGEBREAK |
76 | |
77 | CHAR_ENTRY: NAME_LINE | RESERVED_LINE |
78 | | CHAR_ENTRY ALIAS_LINE |
79 | | CHAR_ENTRY COMMENT_LINE |
80 | | CHAR_ENTRY CROSS_REF |
81 | | CHAR_ENTRY DECOMPOSITION |
82 | | CHAR_ENTRY COMPAT_MAPPING |
83 | | CHAR_ENTRY IGNORED_LINE |
84 | | CHAR_ENTRY EMPTY_LINE |
85 | | CHAR_ENTRY NOTICE |
86 | </strong></pre> |
87 | |
88 | <p>In other words:<br> |
89 | <br> |
90 | Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p> |
91 | |
92 | <p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, and IGNORED_LINE may |
93 | occur before the first BLOCKHEADER.</p> |
94 | |
95 | <p>Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of |
96 | the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE, |
97 | COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, NOTICE, EMPTY_LINE and IGNORED_LINE.</p> |
98 | |
99 | <p>Except for EMPTY_LINE, NOTICE and IGNORED_LINE, none of these lines may occur in any other |
100 | place. </p> |
101 | |
102 | <p>Note: A NOTICE displays differently depending on whether it follows a header or title |
103 | or is part of a CHAR_ENTRY.</p> |
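<p>The ordering rules above can be checked mechanically once each line has been labelled with
its element name. The following Python sketch is one possible reading of those rules, not a
reference implementation: the function name check_structure and the message strings are
hypothetical, and COMMENT_LINE is treated as allowed both inside and outside a CHAR_ENTRY
because the grammar lists it in both places.</p>

```python
ENTRY_START = {"NAME_LINE", "RESERVED_LINE"}
# Lines that may only appear as part of a CHAR_ENTRY.
ENTRY_ONLY = {"ALIAS_LINE", "CROSS_REF", "DECOMPOSITION", "COMPAT_MAPPING"}
# Lines that neither start nor interrupt a CHAR_ENTRY.
ENTRY_NEUTRAL = {"COMMENT_LINE", "NOTICE", "EMPTY_LINE", "IGNORED_LINE"}

def check_structure(kinds):
    """Return a list of (index, message) pairs for rule violations.

    kinds is a sequence of element names, one per input line.
    """
    problems = []
    seen_block = False   # True once the first BLOCKHEADER was seen
    in_entry = False     # True while inside a CHAR_ENTRY
    for i, kind in enumerate(kinds):
        if kind == "BLOCKHEADER":
            seen_block = True
        if kind in ("TITLE", "SUBTITLE") and seen_block:
            problems.append((i, kind + " after first BLOCKHEADER"))
        if kind in ENTRY_ONLY and not in_entry:
            problems.append((i, kind + " outside a CHAR_ENTRY"))
        if kind in ENTRY_START:
            in_entry = True
        elif kind not in ENTRY_ONLY and kind not in ENTRY_NEUTRAL:
            in_entry = False  # headers, page breaks etc. end the entry
    return problems

good = ["TITLE", "BLOCKHEADER", "NAME_LINE", "ALIAS_LINE",
        "EMPTY_LINE", "CROSS_REF", "SUBHEADER", "NAME_LINE"]
bad = ["BLOCKHEADER", "TITLE", "SUBHEADER", "ALIAS_LINE"]
```

<p>With these inputs, check_structure(good) reports nothing, while check_structure(bad) flags the
late TITLE and the ALIAS_LINE that does not follow a NAME_LINE or RESERVED_LINE.</p>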
104 | |
105 | <h3>1.3 NamesList File Elements</h3> |
106 | |
107 | <p>This section provides the details of the syntax for the individual elements.</p> |
108 | |
109 | <pre><small><strong>ELEMENT SYNTAX</strong> // How rendered</small></pre> |
110 | |
111 | <pre><small><strong>NAME_LINE: CHAR <tab> LINE |
112 | </strong> // the CHAR and the corresponding image are echoed, |
113 | // followed by the name as given in LINE |
114 | |
115 | <strong> CHAR <tab> NAME COMMENT LF |
116 | </strong> // Names may have a comment, which is stripped off |
117 | // unless the file is parsed for an ISO style list |
118 | |
119 | <strong>RESERVED_LINE: CHAR <tab> <reserved> |
120 | </strong> // the CHAR is echoed followed by an icon for the |
121 | // reserved character and a fixed string e.g. <reserved> |
122 | |
123 | <strong>COMMENT_LINE: <tab> "*" SP EXPAND_LINE |
124 | </strong> // * is replaced by BULLET, output line as comment |
125 | <strong><tab> EXPAND_LINE</strong> |
126 | // output line as comment |
127 | |
128 | <strong>ALIAS_LINE: <tab> "=" SP LINE |
129 | </strong> // replace = by itself, output line as alias |
130 | |
131 | <strong>CROSS_REF: <tab> "X" SP EXPAND_LINE |
132 | </strong> // X is replaced by a right arrow |
133 | <strong> <tab> "X" SP "(" STRING SP "-" SP CHAR ")" |
134 | </strong> // X is replaced by a right arrow |
135 | // the "(", "-", ")" are removed, the |
136 | // order of CHAR and STRING is reversed |
137 | // i.e. both inputs result in the same output |
138 | |
139 | <strong>IGNORED_LINE: <tab> ";" EXPAND_LINE |
140 | EMPTY_LINE: LF |
141 | </strong> // empty lines and file comments are ignored |
142 | |
143 | <strong>DECOMPOSITION: <tab> ":" EXPAND_LINE |
144 | </strong> // replace ':' by EQUIV, expand line into |
145 | // decomposition |
146 | |
147 | <strong>COMPAT_MAPPING: <tab> "#" SP EXPAND_LINE |
148 | </strong> // replace '#' by APPROX, output line as mapping |
149 | |
150 | <strong>NOTICE: "@+" <tab> LINE |
151 | </strong> // skip '@+', output text as notice |
152 | <strong> "@+" <tab> "*" SP LINE |
153 | </strong> // skip '@+', output text as notice |
154 | // "*" expands to a bullet character |
155 | // Notices following a character code apply to the |
156 | // character and are indented. Notices not following |
157 | // a character code apply to the page/block/column |
158 | // and are italicized, but not indented |
159 | |
160 | <strong>SUBTITLE: "@@@+" <tab> LINE |
161 | </strong> // skip "@@@+", output text as subtitle |
162 | |
163 | <strong>SUBHEADER: "@" <tab> LINE |
164 | </strong> // skip '@', output line as text as column header |
165 | |
166 | <strong>BLOCKHEADER: "@@" <tab> BLOCKSTART <tab> BLOCKNAME <tab> BLOCKEND |
167 | </strong> // skip "@@", cause a page break and optional |
168 | // blank page, then output one or more charts |
169 | // followed by the list of character names. |
170 | // use BLOCKSTART and BLOCKEND to define |
171 | // which characters belong to a block |
172 | // use blockname in page and table headers |
173 | <strong> "@@" <tab> BLOCKSTART <tab> BLOCKNAME COMMENT <tab> BLOCKEND |
174 | </strong>// if a comment is present it replaces the blockname |
175 | // when an ISO-style namelist is laid out |
176 | |
177 | <strong>BLOCKSTART: CHAR</strong> // first character position in block |
178 | <strong>BLOCKEND: CHAR</strong> // last character position in block |
179 | <strong>PAGEBREAK: "@@"</strong> // insert a (column) break |
180 | |
181 | <strong>TITLE: "@@@" <tab> LINE</strong> |
182 | // skip "@@@", output line as text |
183 | // Title is used in page headers |
184 | |
185 | <strong>EXPAND_LINE: {CHAR | STRING}+ LF </strong> |
186 | // all instances of CHAR *) are replaced by |
187 | // CHAR NBSP x NBSP where x is the single Unicode |
188 | // character corresponding to char |
189 | // If character is combining, it is replaced with |
190 | // CHAR NBSP <circ> x NBSP where <circ> is the |
191 | // dotted circle</small> |
192 | </pre> |
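<p>The element syntax above lends itself to classification by line prefix. The Python sketch below
shows one way this might be done; it is illustrative only. The function name classify is
hypothetical, CHAR is assumed to be four upper-case hexadecimal digits, and a semicolon in the
first column is also accepted as an IGNORED_LINE because the overview example uses that form.</p>

```python
import re

# Assumption: CHAR is four upper-case hex digits; longer forms are not handled.
CHAR_LINE = re.compile(r"^([0-9A-F]{4})\t+(.*)$")

# Prefixes that start in column one; each must be followed by a tab.
AT_PREFIXES = [
    ("@@@+", "SUBTITLE"),
    ("@@@", "TITLE"),
    ("@@", "BLOCKHEADER"),
    ("@+", "NOTICE"),
    ("@", "SUBHEADER"),
]

# Markers that follow the leading tab of an annotation line,
# written exactly as in the element syntax (including upper-case "X").
TAB_MARKERS = [
    ("* ", "COMMENT_LINE"),
    ("= ", "ALIAS_LINE"),
    ("X ", "CROSS_REF"),
    (";", "IGNORED_LINE"),
    (":", "DECOMPOSITION"),
    ("# ", "COMPAT_MAPPING"),
]

def classify(line):
    """Return the element name for one input line, or "UNKNOWN"."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line == "@@":
        return "PAGEBREAK"
    if line.startswith(";"):
        return "IGNORED_LINE"  # form used in the overview example
    for prefix, kind in AT_PREFIXES:
        if line.startswith(prefix + "\t"):
            return kind
    if line.startswith("\t"):
        body = line.lstrip("\t")
        for marker, kind in TAB_MARKERS:
            if body.startswith(marker):
                return kind
        return "COMMENT_LINE"  # second form: <tab> EXPAND_LINE
    m = CHAR_LINE.match(line)
    if m:
        return "RESERVED_LINE" if m.group(2) == "<reserved>" else "NAME_LINE"
    return "UNKNOWN"
```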
193 | |
194 | <h3><strong>1.4 NamesList File Primitives</strong></h3> |
195 | |
196 | <p>The following are the primitives and terminals for the NamesList syntax.</p> |
197 | |
198 | <pre><small><strong>LINE: STRING LF |
199 | COMMENT: "(" NAME ")" |
200 | "(" NAME ")" "*" |
201 | </strong> |
202 | <strong>NAME</strong>: <sequence of ASCII characters, except "(" or ")" > |
203 | <strong>STRING</strong>: <sequence of Latin-1 characters> |
204 | <strong>CHAR</strong>: <strong>X X X X</strong> |
205 | <strong>| X X X X X X X X X</strong></small> |
206 | <small><strong>X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F" |
207 | <tab>:</strong> <sequence of one or more ASCII tab characters 0x09> |
208 | <strong>SP</strong>: <ASCII 0x20> |
209 | <strong>LF</strong>: <any sequence of ASCII 0x0A and 0x0D> |
210 | </small></pre> |
211 | |
212 | <p><strong>Notes:</strong> |
213 | |
214 | <ul> |
215 | <li>Special lookahead logic prevents a mention of a 4-digit standard, such as ISO 9999, from |
216 | being misinterpreted as ISO CHAR.</li> |
217 | <li>Use of Latin-1 is supported in unibook.exe, but not portably, unless the file is encoded as |
218 | UTF-16LE.</li> |
219 | <li>The final LF in the file must be present.</li> |
220 | <li>A CHAR inside ' or " is expanded, but only its glyph image is printed; the |
221 | code value is not echoed.</li> |
222 | <li>Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules. |
223 | Apostrophes are supported, but nested quotes are not.</li> |
224 | </ul> |
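<p>To make the EXPAND_LINE behaviour and the ISO 9999 note more concrete, here is a rough Python
sketch of the substitution. It is an approximation under several assumptions: the name expand_line
is hypothetical, only four-digit CHAR values are recognized, the check for a preceding "ISO " is a
simple stand-in for the special lookahead logic mentioned above, and quote handling is omitted.</p>

```python
import re
import unicodedata

NBSP = "\u00a0"
DOTTED_CIRCLE = "\u25cc"

# Four upper-case hex digits standing alone. The negative lookbehind is an
# assumed stand-in for the "special lookahead logic" and only covers "ISO ".
# Limitation: a word such as FACE would also match this pattern.
CHAR_REF = re.compile(r"(?<!ISO )\b[0-9A-F]{4}\b")

def expand_line(text):
    """Replace each CHAR with CHAR NBSP x NBSP, where x is the character.

    Combining characters are shown on a dotted circle.
    """
    def repl(match):
        code = match.group(0)
        ch = chr(int(code, 16))
        if unicodedata.combining(ch):
            ch = DOTTED_CIRCLE + ch
        return code + NBSP + ch + NBSP
    return CHAR_REF.sub(repl, text)
```

<p>For example, expand_line("see 0041") yields the code 0041 followed by a no-break space, the
letter A, and another no-break space, while the text "ISO 9999" is left unchanged.</p>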
225 | </body> |
226 | </html> |