Commit | Line | Data |
06bfd75b |
1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" |
2 | |
3 | "http://www.w3.org/TR/REC-html40/loose.dtd"> |
4 | |
505afebf |
5 | <html> |
6 | |
7 | <head> |
06bfd75b |
8 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
9 | <meta http-equiv="Content-Language" content="en-us"> |
10 | <meta name="GENERATOR" content="Microsoft FrontPage 4.0"> |
11 | <meta name="ProgId" content="FrontPage.Editor.Document"> |
12 | <meta name="keywords" |
13 | content="unicode, character names, names list, code charts"> |
14 | <meta name="description" content="Describes the format of the Unicode NamesList file"> |
15 | <title>UCD: Unicode NamesList File Format</title> |
16 | <link rel="stylesheet" type="text/css" href="http://www.unicode.org/unicode.css"> |
17 | <style type="text/css"> |
18 | |
19 | <!-- |
20 | |
21 | .foo { } |
22 | --> |
23 | |
24 | </style> |
505afebf |
25 | </head> |
26 | |
06bfd75b |
27 | <body bgcolor="#ffffff"> |
28 | |
29 | <table width="100%" cellpadding="0" cellspacing="0" border="0"> |
30 | <tr> |
31 | <td> |
32 | <table width="100%" border="0" cellpadding="0" cellspacing="0"> |
33 | <tr> |
34 | <td class="icon"><a href="http://www.unicode.org"><img border="0" |
35 | src="http://www.unicode.org/webscripts/logo60s2.gif" align="middle" |
36 | alt="[Unicode]" width="34" height="33"></a> <a |
37 | class="bar" href="UnicodeCharacterDatabase-3.1.0.html">Unicode Character |
38 | Database</a></td> |
39 | </tr> |
40 | </table> |
41 | </td> |
42 | </tr> |
43 | <tr> |
44 | <td class="gray"> </td> |
45 | </tr> |
46 | </table> |
47 | <h1>Unicode NamesList File Format</h1> |
48 | <table height="87" cellSpacing="2" cellPadding="0" width="100%" border="1"> |
49 | <tbody> |
50 | <tr> |
51 | <td vAlign="top" width="144">Revision</td> |
52 | <td vAlign="top">3.1</td> |
53 | </tr> |
54 | <tr> |
55 | <td vAlign="top" width="144">Authors</td> |
56 | <td vAlign="top">Asmus Freytag</td> |
57 | </tr> |
58 | <tr> |
59 | <td vAlign="top" width="144">Date</td> |
60 | <td vAlign="top">2001-02-26</td> |
61 | </tr> |
62 | <tr> |
63 | <td vAlign="top" width="144">This Version</td> |
64 | <td vAlign="top"><a href="http://www.unicode.org/Public/3.1-Update/NamesList-2.html">http://www.unicode.org/Public/3.1-Update/NamesList-2.html</a></td> |
65 | </tr> |
66 | <tr> |
67 | <td vAlign="top" width="144">Previous Version</td> |
68 | <td vAlign="top"><a href="http://www.unicode.org/Public/3.0-Update/NamesList-1.html">http://www.unicode.org/Public/3.0-Update/NamesList-1.html</a></td> |
69 | </tr> |
70 | <tr> |
71 | <td vAlign="top" width="144">Latest Version</td> |
72 | <td vAlign="top"><a href="http://www.unicode.org/Public/UNIDATA/NamesList.html">http://www.unicode.org/Public/UNIDATA/NamesList.html</a></td> |
73 | </tr> |
74 | </tbody> |
75 | </table> |
76 | <h3> |
77 | <br> |
78 | <i>Summary</i></h3> |
79 | <blockquote> |
80 | <p>This file describes the format and contents of NamesList.txt.</p> |
81 | </blockquote> |
82 | <h3><i>Status</i></h3> |
83 | <blockquote> |
84 | <p> |
85 | <i>This file and the files described herein are part of the <a href="UnicodeCharacterDatabase-3.1.0.html"> Unicode Character Database</a> |
86 | (UCD) |
87 | and are governed by the <a href="#Terms of Use">UCD Terms of Use</a> stated at the end.</i></p> |
88 | </blockquote> |
89 | <hr width="50%"> |
90 | |
91 | <h2>1.0 Introduction</h2> |
505afebf |
92 | |
93 | <p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used |
94 | to drive the layout of the character code charts in the Unicode Standard. The information |
95 | in this file is a combination of several fields from the UnicodeData.txt and Blocks.txt files, |
96 | together with additional annotations for many characters. This document describes the |
97 | syntax rules for the file format, but also gives brief information on how each construct |
98 | is rendered when laid out for the book. Some of the syntax elements were used in |
99 | preparation of the drafts of the book and may not be present in the final, released form |
100 | of the NamesList.txt file.</p> |
101 | |
102 | <p>The same input file can be used to do the draft preparation for ISO/IEC 10646 (referred to |
103 | below as ISO-style). This necessitates the presence of some information in the name list |
104 | file that is not needed (and in fact removed during parsing) for the Unicode book.</p> |
105 | |
106 | <p>With access to the layout program (unibook.exe), it is a simple matter to create |
107 | name lists for formatting working drafts that contain proposed characters.</p> |
108 | |
109 | <h3>1.1 NamesList File Overview</h3> |
110 | |
111 | <p>The *.lst files are plain text files which in their simplest form look like this:</p> |
112 | |
113 | <p>@@<tab>0020<tab>BASIC LATIN<tab>007F<br> |
114 | ; this is a file comment (ignored)<br> |
115 | 0020<tab>SPACE<br> |
116 | 0021<tab>EXCLAMATION MARK<br> |
117 | 0022<tab>QUOTATION MARK<br> |
118 | . . . <br> |
119 | 007F<tab>DELETE</p> |
120 | |
121 | <p>The semicolon (as first character), @ and <tab> characters are used by the file |
122 | syntax and must be provided as shown. Hexadecimal digits must be in UPPER CASE. A double |
123 | @@ introduces a block header, with the title, and start and ending code of the block |
124 | provided as shown.</p> |
125 | |
126 | <p>For an ISO-style, minimal name list, only the NAME_LINE and BLOCKHEADER and their |
127 | constituent syntax elements are needed.</p> |
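<p>As an illustration only, a minimal ISO-style name list of this kind could be read with a few
lines of Python. This is a sketch, not the logic of unibook.exe: the function name
<code>parse_minimal</code> and the returned data shapes are assumptions made for the example, and
only BLOCKHEADER, NAME_LINE, empty lines and leading-semicolon file comments are handled.</p>

```python
import re

# CHAR is taken to be 4 to 6 uppercase hexadecimal digits.
CHAR = r"[0-9A-F]{4,6}"
# "@@" <tab> BLOCKSTART <tab> BLOCKNAME <tab> BLOCKEND
BLOCKHEADER = re.compile(rf"^@@\t+({CHAR})\t+([^\t]+)\t+({CHAR})$")
# CHAR <tab> NAME
NAME_LINE = re.compile(rf"^({CHAR})\t+(.+)$")


def parse_minimal(text):
    """Return (blocks, names) for a minimal name list (hypothetical helper).

    blocks is a list of (start, title, end) tuples; names maps a code
    point to its character name.
    """
    blocks, names = [], {}
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # empty lines and file comments are ignored
        m = BLOCKHEADER.match(line)
        if m:
            blocks.append((int(m.group(1), 16), m.group(2), int(m.group(3), 16)))
            continue
        m = NAME_LINE.match(line)
        if m:
            names[int(m.group(1), 16)] = m.group(2)
    return blocks, names


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)
blocks, names = parse_minimal(sample)
```

<p>Lines that match neither pattern are silently skipped here; a stricter reader would probably
report them instead.</p>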
128 | |
129 | <p>The full syntax with all the options is provided in the following sections.</p> |
130 | |
131 | <h3>1.2 NamesList File Structure</h3> |
132 | |
133 | <p>This section defines the overall file structure.</p> |
134 | |
135 | <pre><strong>NAMELIST: TITLE_PAGE* BLOCK* |
136 | </strong> |
137 | <strong>TITLE_PAGE: TITLE |
138 | | TITLE_PAGE SUBTITLE |
139 | | TITLE_PAGE SUBHEADER |
140 | | TITLE_PAGE IGNORED_LINE |
141 | | TITLE_PAGE EMPTY_LINE |
142 |                 | TITLE_PAGE COMMENT_LINE |
143 | | TITLE_PAGE NOTICE |
144 | | TITLE_PAGE PAGEBREAK |
145 | </strong> |
146 | <strong>BLOCK: BLOCKHEADER |
147 | | BLOCK CHAR_ENTRY |
148 | | BLOCK SUBHEADER |
149 | | BLOCK NOTICE |
150 | | BLOCK EMPTY_LINE |
151 | | BLOCK IGNORED_LINE |
152 | | BLOCK PAGEBREAK |
153 | |
154 | CHAR_ENTRY: NAME_LINE | RESERVED_LINE |
155 | | CHAR_ENTRY ALIAS_LINE |
156 | | CHAR_ENTRY COMMENT_LINE |
157 | | CHAR_ENTRY CROSS_REF |
158 | | CHAR_ENTRY DECOMPOSITION |
159 | | CHAR_ENTRY COMPAT_MAPPING |
160 | | CHAR_ENTRY IGNORED_LINE |
161 | | CHAR_ENTRY EMPTY_LINE |
162 | | CHAR_ENTRY NOTICE |
163 | </strong></pre> |
164 | |
06bfd75b |
165 | <p>In other words:<br> |
505afebf |
166 | <br> |
06bfd75b |
167 | Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p> |
505afebf |
168 | |
06bfd75b |
169 | <p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, and IGNORED_LINE may |
170 | occur before the first BLOCKHEADER.</p> |
505afebf |
171 | |
172 | <p>Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted sequence of |
173 | the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE, |
174 | COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, NOTICE, EMPTY_LINE and IGNORED_LINE.</p> |
175 | |
176 | <p>Except for EMPTY_LINE, NOTICE and IGNORED_LINE, none of these lines may occur in any other |
177 | place. </p> |
178 | |
179 | <p>Note: A NOTICE displays differently depending on whether it follows a header or title |
180 | or is part of a CHAR_ENTRY.</p> |
181 | |
182 | <h3>1.3 NamesList File Elements</h3> |
183 | |
184 | <p>This section provides the details of the syntax for the individual elements.</p> |
185 | |
186 | <pre><small><strong>ELEMENT SYNTAX</strong> // How rendered</small></pre> |
187 | |
188 | <pre><small><strong>NAME_LINE: CHAR <tab> LINE |
189 | </strong> // the CHAR and the corresponding image are echoed, |
190 | // followed by the name as given in LINE |
191 | |
192 | <strong>           CHAR <tab> NAME COMMENT LF |
193 | </strong> // Names may have a comment, which is stripped off |
194 | // unless the file is parsed for an ISO style list |
195 | |
196 | <strong>RESERVED_LINE:	CHAR <tab> <reserved> |
197 | </strong> // the CHAR is echoed followed by an icon for the |
198 | // reserved character and a fixed string e.g. <reserved> |
199 | |
200 | <strong>COMMENT_LINE:	<tab> "*" SP EXPAND_LINE |
201 | </strong> // * is replaced by BULLET, output line as comment |
202 | <strong><tab> EXPAND_LINE</strong> |
203 | // output line as comment |
204 | |
205 | <strong>ALIAS_LINE: <tab> "=" SP LINE |
206 | </strong> // replace = by itself, output line as alias |
207 | |
208 | <strong>CROSS_REF: <tab> "X" SP EXPAND_LINE |
209 | </strong> // X is replaced by a right arrow |
210 | <strong> <tab> "X" SP "(" STRING SP "-" SP CHAR ")" |
211 | </strong> // X is replaced by a right arrow |
212 | // the "(", "-", ")" are removed, the |
213 | // order of CHAR and STRING is reversed |
214 | // i.e. both inputs result in the same output |
215 | |
216 | <strong>IGNORED_LINE: <tab> ";" EXPAND_LINE |
217 | EMPTY_LINE: LF |
218 | </strong> // empty lines and file comments are ignored |
219 | |
220 | <strong>DECOMPOSITION: <tab> ":" EXPAND_LINE |
221 | </strong> // replace ':' by EQUIV, expand line into |
222 | // decomposition |
223 | |
224 | <strong>COMPAT_MAPPING: <tab> "#" SP EXPAND_LINE |
225 | </strong> // replace '#' by APPROX, output line as mapping |
226 | |
227 | <strong>NOTICE: "@+" <tab> LINE |
228 | </strong> // skip '@+', output text as notice |
229 | <strong>		"@+" <tab> "*" SP LINE |
230 | </strong>			// skip '@+', output text as notice |
231 | // "*" expands to a bullet character |
232 | // Notices following a character code apply to the |
233 | // character and are indented. Notices not following |
234 | // a character code apply to the page/block/column |
235 | // and are italicized, but not indented |
236 | |
237 | <strong>SUBTITLE: "@@@+" <tab> LINE |
238 | </strong> // skip "@@@+", output text as subtitle |
239 | |
240 | <strong>SUBHEADER: "@" <tab> LINE |
241 | </strong>			// skip '@', output line as column header |
242 | |
243 | <strong>BLOCKHEADER: "@@" <tab> BLOCKSTART <tab> BLOCKNAME <tab> BLOCKEND |
244 | </strong> // skip "@@", cause a page break and optional |
245 | // blank page, then output one or more charts |
246 | // followed by the list of character names. |
247 | // use BLOCKSTART and BLOCKEND to define the |
06bfd75b |
248 | // characters belonging to a block |
505afebf |
249 | // use blockname in page and table headers |
250 | <strong> "@@" <tab> BLOCKSTART <tab> BLOCKNAME COMMENT <tab> BLOCKEND |
251 | </strong>// if a comment is present it replaces the blockname |
252 | // when an ISO-style namelist is laid out |
253 | |
254 | <strong>BLOCKSTART: CHAR</strong> // first character position in block |
255 | <strong>BLOCKEND: CHAR</strong> // last character position in block |
256 | <strong>PAGEBREAK:	"@@"</strong>		// insert a (column) break |
257 | |
258 | <strong>TITLE: "@@@" <tab> LINE</strong> |
259 | // skip "@@@", output line as text |
260 | // Title is used in page headers |
261 | |
262 | <strong>EXPAND_LINE: {CHAR | STRING}+ LF </strong> |
263 | // all instances of CHAR *) are replaced by |
264 | // CHAR NBSP x NBSP where x is the single Unicode |
265 | 			// character corresponding to CHAR |
266 | // If character is combining, it is replaced with |
267 | // CHAR NBSP <circ> x NBSP where <circ> is the |
06bfd75b |
268 | // dotted circle</small></pre> |
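<p>The element syntax above can be read as a dispatch on the first few characters of each line.
The following Python sketch shows one way to do that. The function name <code>classify</code>
and the "UNKNOWN" result are assumptions made for the example; a line starting with a semicolon is
treated as ignored, following the overview, and the cross reference marker is taken to be an
upper-case "X" as written in the syntax.</p>

```python
import re

# CHAR followed by a tab: 4 to 6 uppercase hexadecimal digits.
CHAR_RE = re.compile(r"^[0-9A-F]{4,6}\t")

# Checked in order, so the longer "@" prefixes come first.
AT_PREFIXES = [
    ("@@@+\t", "SUBTITLE"),
    ("@@@\t", "TITLE"),
    ("@@\t", "BLOCKHEADER"),
    ("@+\t", "NOTICE"),
    ("@\t", "SUBHEADER"),
]

# Markers that may follow the leading tab(s) of an annotation line.
TAB_PREFIXES = [
    ("* ", "COMMENT_LINE"),
    ("= ", "ALIAS_LINE"),
    ("X ", "CROSS_REF"),
    (";", "IGNORED_LINE"),
    (":", "DECOMPOSITION"),
    ("# ", "COMPAT_MAPPING"),
]


def classify(line):
    """Return the element name for one line (hypothetical helper)."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line == "@@":
        return "PAGEBREAK"
    if line.startswith(";"):
        return "IGNORED_LINE"
    for prefix, kind in AT_PREFIXES:
        if line.startswith(prefix):
            return kind
    if line.startswith("\t"):
        rest = line.lstrip("\t")
        for prefix, kind in TAB_PREFIXES:
            if rest.startswith(prefix):
                return kind
        return "COMMENT_LINE"  # second form: tab followed by plain text
    if CHAR_RE.match(line):
        rest = line.split("\t", 1)[1].lstrip("\t")
        return "RESERVED_LINE" if rest == "<reserved>" else "NAME_LINE"
    return "UNKNOWN"
```

<p>The sketch only names each line; checking that elements appear in a permitted order, as
required by the file structure rules, would be a separate pass.</p>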
269 | |
270 | <p><strong>Notes:</strong> |
271 | |
272 | </p> |
273 | |
274 | <ul> |
275 |   <li>Blocks must be aligned on a 16-code point boundary and contain an integer |
276 |     multiple of 16 code points. The exception to that rule is for blocks of |
277 | ideographs etc. for which no names are listed in the file. Such blocks must |
278 | end on the actual last character.</li> |
279 |   <li>Blocks must be non-overlapping and in ascending order. Name lines |
280 |     must be in ascending order and must follow the block header for the block to |
281 |     which they belong.</li> |
282 |   <li>Reserved entries are optional, and will be supplied automatically. They |
283 |     are required whenever followed by ALIAS_LINE, COMMENT_LINE or CROSS_REF.</li> |
284 | </ul> |
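<p>The block constraints in these notes lend themselves to a simple check. The Python sketch below
is illustrative only: the name <code>check_blocks</code>, the <code>named</code> flag and the
message texts are assumptions, and the size rule is read as "a multiple of 16 code points", which
is an interpretation of the note rather than something it states outright.</p>

```python
def check_blocks(blocks, named=True):
    """Check (start, end) code point pairs given in file order.

    Returns a list of problem descriptions, empty if none were found.
    Pass named=False for blocks such as ideographs that list no names
    and may therefore end on the actual last character.
    """
    problems = []
    prev_end = -1
    for start, end in blocks:
        if start % 16 != 0:
            problems.append(f"{start:04X}: start is not on a 16-code point boundary")
        if named and (end - start + 1) % 16 != 0:
            problems.append(f"{start:04X}: size is not a multiple of 16")
        if start <= prev_end:
            problems.append(f"{start:04X}: overlaps or precedes the previous block")
        prev_end = max(prev_end, end)
    return problems


ok = check_blocks([(0x0020, 0x007F), (0x0080, 0x00FF)])
overlapping = check_blocks([(0x0020, 0x007F), (0x0070, 0x00FF)])
misaligned = check_blocks([(0x0021, 0x0030)])
```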
505afebf |
285 | |
286 | <h3><strong>1.4 NamesList File Primitives</strong></h3> |
287 | |
288 | <p>The following are the primitives and terminals for the NamesList syntax.</p> |
289 | |
06bfd75b |
290 | <pre><strong><small>LINE: STRING LF |
291 | COMMENT: "(" NAME ")" |
292 | "(" NAME ")" "*" </small></strong><small> |
293 | <strong>BLOCKNAME:</strong> <sequence of Latin-1 characters, except "(" and ")"> |
294 | <strong>NAME</strong>: <sequence of uppercase ASCII letters, digit and hyphen> |
505afebf |
295 | <strong>STRING</strong>: <sequence of Latin-1 characters> |
296 | <strong>CHAR</strong>: <strong>X X X X</strong> |
06bfd75b |
297 | <strong>| X X X X X</strong> |
298 | <strong>| X X X X X X</strong></small> |
505afebf |
299 | <small><strong>X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F" |
300 | <tab>:</strong> <sequence of one or more ASCII tab characters 0x09> |
301 | <strong>SP</strong>: <ASCII 0x20> |
302 | <strong>LF</strong>: <any sequence of ASCII 0x0A and 0x0D> |
303 | </small></pre> |
304 | |
305 | <p><strong>Notes:</strong> |
306 | |
307 | <ul> |
308 |   <li>Special lookahead logic prevents a mention of a 4-digit standard, such as ISO 9999, from |
06bfd75b |
309 | being misinterpreted as ISO CHAR. The - in a character range CHAR-CHAR is |
310 | replaced by an EN DASH.</li> |
505afebf |
311 | <li>Use of Latin-1 is supported in unibook.exe, but not portably, unless the file is encoded as |
312 | UTF-16LE.</li> |
313 |   <li>The final LF in the file must be present.</li> |
06bfd75b |
314 |   <li>A CHAR inside ' or " is expanded, but only its glyph image is printed; |
315 | the |
316 | code value is not echoed.</li> |
317 | <li>Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules. |
318 | Apostrophes are supported, but nested quotes are not.</li> |
505afebf |
319 | </ul> |
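<p>To make the EXPAND_LINE rule concrete, the following Python sketch replaces each CHAR with
CHAR NBSP x NBSP and inserts a dotted circle before combining characters. It is a rough
approximation under stated assumptions, not the behavior of unibook.exe: the name
<code>expand_line</code> is hypothetical, "combining" is approximated by the General Category
starting with "M", the ISO lookahead is reduced to skipping digits directly preceded by "ISO ",
and quotes and CHAR-CHAR ranges are not handled.</p>

```python
import re
import unicodedata

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"

# 4 to 6 uppercase hex digits standing alone, unless directly preceded by
# "ISO " (a simplified stand-in for the lookahead logic in the notes).
CHAR_RE = re.compile(r"(?<!ISO )\b[0-9A-F]{4,6}\b")


def expand_line(line):
    """Expand every CHAR in line to CHAR NBSP [circ] x NBSP (sketch)."""
    def repl(match):
        digits = match.group(0)
        code = int(digits, 16)
        # Leave values alone that are not usable scalar values.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return digits
        ch = chr(code)
        # Approximation: treat any mark (Mn, Mc, Me) as combining.
        circ = DOTTED_CIRCLE if unicodedata.category(ch).startswith("M") else ""
        return f"{digits}{NBSP}{circ}{ch}{NBSP}"

    return CHAR_RE.sub(repl, line)
```

<p>Note that this pattern would also expand an ordinary upper-case word made only of the letters
A to F, or a four-digit year, so a real implementation would need more context than the sketch
uses.</p>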
06bfd75b |
320 | <h2>Modifications</h2> |
321 | <p>Use of 4-6 digit hex notation is now supported.</p> |
322 | <hr width="50%"> |
323 | <h2> |
324 | UCD <a name="Terms of Use">Terms of Use</a></h2> |
325 | <h3> |
326 | <i>Disclaimer</i></h3> |
327 | <blockquote> |
328 | <p><i>The Unicode Character Database is provided as is by Unicode, Inc. No |
329 | claims are made as to fitness for any particular purpose. No warranties of any |
330 | kind are expressed or implied. The recipient agrees to determine applicability |
331 | of information provided. If this file has been purchased on magnetic or |
332 | optical media from Unicode, Inc., the sole remedy for any claim will be |
333 | exchange of defective media within 90 days of receipt.</i></p> |
334 | <p><i>This disclaimer is applicable for all other data files accompanying the |
335 | Unicode Character Database, some of which have been compiled by the Unicode |
336 | Consortium, and some of which have been supplied by other sources.</i></p> |
337 | </blockquote> |
338 | <h3><i>Limitations on Rights to Redistribute This Data</i></h3> |
339 | <blockquote> |
340 | <p><i>Recipient is granted the right to make copies in any form for internal |
341 | distribution and to freely use the information supplied in the creation of |
342 | products supporting the Unicode<sup>TM</sup> Standard. The files in the |
343 | Unicode Character Database can be redistributed to third parties or other |
344 | organizations (whether for profit or not) as long as this notice and the |
345 | disclaimer notice are retained. Information can be extracted from these files |
346 | and used in documentation or programs, as long as there is an accompanying |
347 | notice indicating the source.</i></p> |
348 | </blockquote> |
349 | <hr width="50%"> |
350 | <div align="center"> |
351 | <center> |
352 | <table cellspacing="0" cellpadding="0" border="0"> |
353 | <tr> |
354 | <td><a href="../../../../../../index.html"><img |
355 | src="http://www.unicode.org/img/hb_home.gif" border="0" |
356 | alt="Home" width="40" height="49"></a><a |
357 | href="../copyright.html"><img |
358 | src="http://www.unicode.org/img/hb_mid.gif" border="0" |
359 | alt="Terms of Use" width="152" height="49"></a><a |
360 | href="mailto:info@unicode.org"><img |
361 | src="http://www.unicode.org/img/hb_mail.gif" border="0" |
362 | alt="E-mail" width="46" height="49"></a></td> |
363 | </tr> |
364 | </table> |
365 | <script language="Javascript" src="http://www.unicode.org/webscripts/lastModified.js"></script> |
366 | </center> |
367 | </div> |
368 | |
369 | |
370 | </body> |
371 | |
372 | </html> |