<html>

<head>
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<title>Unicode 3.0 NamesList File Structure</title>
</head>

<body>

<h3>Unicode NamesList File Format</h3>

<p>Last updated: 1999-07-06</p>

<h3>1.0 Introduction</h3>

<p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used
to drive the layout of the character code charts in the Unicode Standard. The information
in this file combines several fields from the UnicodeData.txt and Blocks.txt files with
additional annotations for many characters. This document describes the syntax rules for
the file format and also briefly explains how each construct is rendered when laid out for
the book. Some of the syntax elements were used in preparing drafts of the book and may not
be present in the final, released form of the NamesList.txt file.</p>

<p>The same input file can be used to prepare drafts of ISO/IEC 10646 (referred to below
as ISO-style). This requires the name list file to contain some information that the
Unicode book does not need (and that is in fact removed during parsing for the book).</p>

<p>With access to the layout program (unibook.exe), it is a simple matter to create name
lists for formatting working drafts that contain proposed characters.</p>

<h3>1.1 NamesList File Overview</h3>

<p>In their simplest form, the *.lst files are plain text files that look like this:</p>

<p>@@&lt;tab&gt;0020&lt;tab&gt;BASIC LATIN&lt;tab&gt;007F<br>
; this is a file comment (ignored)<br>
0020&lt;tab&gt;SPACE<br>
0021&lt;tab&gt;EXCLAMATION MARK<br>
0022&lt;tab&gt;QUOTATION MARK<br>
. . . <br>
007F&lt;tab&gt;DELETE</p>

<p>The semicolon (as first character), @ and &lt;tab&gt; characters are used by the file
syntax and must be provided as shown. Hexadecimal digits must be in UPPER CASE. A double
@@ introduces a block header, which gives the start code, the block name and the end code
of the block, in that order, as shown.</p>
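<p>As an illustration only, the minimal form above can be read with a few lines of Python.
This sketch is not taken from unibook.exe: it assumes the file has already been decoded to
text, recognizes only the block header, file comment and name line forms shown in this
section, handles only four-digit CHAR values, and uses hypothetical names
(<code>parse_minimal</code>, <code>Block</code>).</p>

<pre>
```python
import re
from dataclasses import dataclass, field

# CHAR as four upper-case hex digits, as in the example above;
# the longer CHAR form of section 1.4 is not handled here.
HEX4 = re.compile(r"[0-9A-F]{4}")


@dataclass
class Block:
    start: int
    name: str
    end: int
    names: dict = field(default_factory=dict)  # maps code point to character name


def parse_minimal(text):
    """Read @@ block headers and name lines; skip empty lines and ; comments."""
    blocks = []
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # empty lines and file comments are ignored
        fields = re.split(r"\t+", line)  # one or more tabs separate the fields
        if fields[0] == "@@" and len(fields) == 4:
            start, name, end = fields[1:4]
            blocks.append(Block(int(start, 16), name, int(end, 16)))
        elif len(fields) == 2 and HEX4.fullmatch(fields[0]):
            if not blocks:
                raise ValueError("name line before the first block header: " + line)
            blocks[-1].names[int(fields[0], 16)] = fields[1]
        else:
            raise ValueError("unrecognized line: " + line)
    return blocks


# Usage with the example from this section
sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
    "0022\tQUOTATION MARK\n"
    "007F\tDELETE\n"
)
blocks = parse_minimal(sample)
print(blocks[0].name, hex(blocks[0].start), hex(blocks[0].end))  # BASIC LATIN 0x20 0x7f
print(blocks[0].names[0x21])  # EXCLAMATION MARK
```
</pre>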

<p>For an ISO-style, minimal name list, only the NAME_LINE and BLOCKHEADER and their
constituent syntax elements are needed.</p>

<p>The full syntax with all the options is provided in the following sections.</p>

<h3>1.2 NamesList File Structure</h3>

<p>This section defines the overall file structure.</p>

<pre><strong>NAMELIST:     TITLE_PAGE* BLOCK* 
</strong>
<strong>TITLE_PAGE:   TITLE 
		| TITLE_PAGE SUBTITLE 
		| TITLE_PAGE SUBHEADER 
		| TITLE_PAGE IGNORED_LINE 
		| TITLE_PAGE EMPTY_LINE
		| TITLE_PAGE COMMENT_LINE
		| TITLE_PAGE NOTICE
		| TITLE_PAGE PAGEBREAK 
</strong>
<strong>BLOCK:	      BLOCKHEADER 
		| BLOCK CHAR_ENTRY 
		| BLOCK SUBHEADER 
		| BLOCK NOTICE 
		| BLOCK EMPTY_LINE 
		| BLOCK IGNORED_LINE 
		| BLOCK PAGEBREAK

CHAR_ENTRY:   NAME_LINE | RESERVED_LINE
		| CHAR_ENTRY ALIAS_LINE
		| CHAR_ENTRY COMMENT_LINE
		| CHAR_ENTRY CROSS_REF
		| CHAR_ENTRY DECOMPOSITION
		| CHAR_ENTRY COMPAT_MAPPING
		| CHAR_ENTRY IGNORED_LINE
		| CHAR_ENTRY EMPTY_LINE
		| CHAR_ENTRY NOTICE
</strong></pre>

<p>In other words:<br>
<br>
Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.</p>

<p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE, EMPTY_LINE and
IGNORED_LINE may occur before the first BLOCKHEADER.</p>

<p>Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of
the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE,
COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, NOTICE, EMPTY_LINE and IGNORED_LINE.</p>

<p>Of these, only EMPTY_LINE, NOTICE and IGNORED_LINE may also occur elsewhere in a block,
and only COMMENT_LINE, EMPTY_LINE, NOTICE and IGNORED_LINE may also occur before the first
BLOCKHEADER.</p>

<p>Note: A NOTICE displays differently depending on whether it follows a header or title
or is part of a CHAR_ENTRY.</p>
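<p>The ordering rules above lend themselves to a one-pass check. The following Python sketch
is not part of unibook.exe; it assumes each line has already been classified into one of the
element names used in this document, and the function name <code>check_structure</code> is
hypothetical. It treats an EMPTY_LINE, NOTICE or IGNORED_LINE as continuing the current
character entry, which is one reading of the grammar, while SUBHEADER, PAGEBREAK and
BLOCKHEADER end it.</p>

<pre>
```python
# Kinds that may only appear in the uninterrupted run after a NAME_LINE or
# RESERVED_LINE (COMMENT_LINE is also allowed on the title page).
ENTRY_ONLY = {"ALIAS_LINE", "COMMENT_LINE", "CROSS_REF", "DECOMPOSITION", "COMPAT_MAPPING"}
# Kinds that may follow a character entry without ending it.
ENTRY_CONTINUES = ENTRY_ONLY | {"NOTICE", "EMPTY_LINE", "IGNORED_LINE"}
# Kinds allowed before the first BLOCKHEADER, per the TITLE_PAGE rule.
TITLE_PAGE_KINDS = {"TITLE", "SUBTITLE", "SUBHEADER", "PAGEBREAK", "COMMENT_LINE",
                    "NOTICE", "EMPTY_LINE", "IGNORED_LINE"}


def check_structure(kinds):
    """Return (index, message) pairs for lines that break the ordering rules."""
    errors = []
    seen_block = False
    in_entry = False
    for i, kind in enumerate(kinds):
        if kind == "BLOCKHEADER":
            seen_block, in_entry = True, False
        elif not seen_block:
            if kind not in TITLE_PAGE_KINDS:
                errors.append((i, kind + " before the first BLOCKHEADER"))
        elif kind in ("TITLE", "SUBTITLE"):
            errors.append((i, kind + " after the first BLOCKHEADER"))
            in_entry = False
        elif kind in ("NAME_LINE", "RESERVED_LINE"):
            in_entry = True
        elif kind in ENTRY_ONLY and not in_entry:
            errors.append((i, kind + " does not follow a NAME_LINE or RESERVED_LINE"))
        elif kind not in ENTRY_CONTINUES:
            in_entry = False  # SUBHEADER or PAGEBREAK ends the character entry
    return errors


# Usage
good = ["TITLE", "EMPTY_LINE", "BLOCKHEADER", "NAME_LINE", "ALIAS_LINE",
        "EMPTY_LINE", "CROSS_REF", "SUBHEADER", "RESERVED_LINE", "NOTICE"]
bad = ["BLOCKHEADER", "SUBTITLE", "SUBHEADER", "ALIAS_LINE"]
print(check_structure(good))  # []
print(check_structure(bad))
# [(1, 'SUBTITLE after the first BLOCKHEADER'),
#  (3, 'ALIAS_LINE does not follow a NAME_LINE or RESERVED_LINE')]
```
</pre>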

<h3>1.3 NamesList File Elements</h3>

<p>This section provides the details of the syntax for the individual elements.</p>

<pre><small><strong>ELEMENT		SYNTAX</strong>	// How rendered</small></pre>

<pre><small><strong>NAME_LINE:	CHAR &lt;tab&gt; LINE
</strong>			// the CHAR and the corresponding image are echoed, 
			// followed by the name as given in LINE

<strong>		CHAR &lt;tab&gt; NAME COMMENT LF
</strong>			// Names may have a comment, which is stripped off
			// unless the file is parsed for an ISO style list
										
<strong>RESERVED_LINE:	CHAR &lt;tab&gt; &lt;reserved&gt;
</strong>			// the CHAR is echoed followed by an icon for the
			// reserved character and a fixed string e.g. &lt;reserved&gt;
	
<strong>COMMENT_LINE:	&lt;tab&gt; &quot;*&quot; SP EXPAND_LINE
</strong>			// * is replaced by BULLET, output line as comment
		<strong>&lt;tab&gt; EXPAND_LINE</strong>	
			// output line as comment

<strong>ALIAS_LINE:	&lt;tab&gt; &quot;=&quot; SP LINE	
</strong>			// replace = by itself, output line as alias

<strong>CROSS_REF:	&lt;tab&gt; &quot;X&quot; SP EXPAND_LINE	
</strong>			// X is replaced by a right arrow
<strong>		&lt;tab&gt; &quot;X&quot; SP &quot;(&quot; STRING SP &quot;-&quot; SP CHAR &quot;)&quot;	
</strong>			// X is replaced by a right arrow
			// the &quot;(&quot;, &quot;-&quot;, &quot;)&quot; are removed, the
			// order of CHAR and STRING is reversed
			// i.e. both inputs result in the same output

<strong>IGNORED_LINE:	&lt;tab&gt; &quot;;&quot; EXPAND_LINE	
EMPTY_LINE:	LF			
</strong>			// empty lines and file comments are ignored

<strong>DECOMPOSITION:	&lt;tab&gt; &quot;:&quot; EXPAND_LINE	
</strong>			// replace ':' by EQUIV, expand line into 
			// decomposition 

<strong>COMPAT_MAPPING:	&lt;tab&gt; &quot;#&quot; SP EXPAND_LINE	
</strong>			// replace '#' by APPROX, output line as mapping 

<strong>NOTICE:		&quot;@+&quot; &lt;tab&gt; LINE		
</strong>			// skip '@+', output text as notice
<strong>		&quot;@+&quot; &lt;tab&gt; &quot;*&quot; SP LINE	
</strong>			// skip '@+', output text as notice
			// &quot;*&quot; expands to a bullet character
			// Notices following a character code apply to the
			// character and are indented. Notices not following
			// a character code apply to the page/block/column 
			// and are italicized, but not indented

<strong>SUBTITLE:	&quot;@@@+&quot; &lt;tab&gt; LINE	
</strong>			// skip &quot;@@@+&quot;, output text as subtitle

<strong>SUBHEADER:	&quot;@&quot; &lt;tab&gt; LINE	
</strong>			// skip '@', output line as column header

<strong>BLOCKHEADER:	&quot;@@&quot; &lt;tab&gt; BLOCKSTART &lt;tab&gt; BLOCKNAME &lt;tab&gt; BLOCKEND
</strong>			// skip &quot;@@&quot;, cause a page break and optional
			// blank page, then output one or more charts
			// followed by the list of character names. 
			// use BLOCKSTART and BLOCKEND to define
			// which characters belong to the block
			// use BLOCKNAME in page and table headers
	<strong>	&quot;@@&quot; &lt;tab&gt; BLOCKSTART &lt;tab&gt; BLOCKNAME COMMENT &lt;tab&gt; BLOCKEND
			</strong>// if a comment is present it replaces the blockname
			// when an ISO-style namelist is laid out

<strong>BLOCKSTART:	CHAR</strong>	// first character position in block
<strong>BLOCKEND:	CHAR</strong>	// last character position in block
<strong>PAGEBREAK:	&quot;@@&quot;</strong>	// insert a (column) break

<strong>TITLE:		&quot;@@@&quot; &lt;tab&gt; LINE</strong>	
			// skip &quot;@@@&quot;, output line as text
			// Title is used in page headers

<strong>EXPAND_LINE:	{CHAR | STRING}+ LF	</strong>
			// all instances of CHAR *) are replaced by 
			// CHAR NBSP x NBSP where x is the single Unicode
			// character corresponding to CHAR
			// If character is combining, it is replaced with
			// CHAR NBSP &lt;circ&gt; x NBSP where &lt;circ&gt; is the 
			// dotted circle</small>
</pre>
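<p>Because each element is identified by its first few characters, a reader can classify
lines with an ordered list of prefix patterns. The Python sketch below is an illustration
under stated assumptions, not the logic of unibook.exe: it expects lines with the line
terminator already removed, treats a ';' in the first column as a file comment, and reads
CHAR permissively as four or more upper-case hex digits. The rule order matters: the longer
@ prefixes must be tried before the shorter ones, and the catch-all COMMENT_LINE pattern must
come after the other tab-prefixed forms. The names <code>RULES</code> and
<code>classify</code> are hypothetical.</p>

<pre>
```python
import re

# Ordered (pattern, element) rules, tried with re.match on a line whose
# terminator has already been removed. Longer "@" prefixes must precede shorter
# ones, and the catch-all COMMENT_LINE rule must follow the other tab rules.
# CHAR is read permissively as four or more upper-case hex digits (assumption).
RULES = [
    (r"$", "EMPTY_LINE"),
    (r";", "IGNORED_LINE"),  # file comment in the first column (section 1.1)
    (r"@@@\+\t+", "SUBTITLE"),
    (r"@@@\t+", "TITLE"),
    (r"@@\t+", "BLOCKHEADER"),
    (r"@@$", "PAGEBREAK"),
    (r"@\+\t+", "NOTICE"),
    (r"@\t+", "SUBHEADER"),
    (r"\t+;", "IGNORED_LINE"),
    (r"\t+= ", "ALIAS_LINE"),
    (r"\t+X ", "CROSS_REF"),
    (r"\t+:", "DECOMPOSITION"),
    (r"\t+# ", "COMPAT_MAPPING"),
    (r"\t+", "COMMENT_LINE"),  # both the "* " bullet form and the plain form
    # the fixed string "reserved" in angle brackets (written as \x3c and \x3e)
    (r"[0-9A-F]{4,}\t+\x3creserved\x3e$", "RESERVED_LINE"),
    (r"[0-9A-F]{4,}\t+", "NAME_LINE"),
]
COMPILED = [(re.compile(pattern), kind) for pattern, kind in RULES]


def classify(line):
    """Return the element name for one NamesList line."""
    for pattern, kind in COMPILED:
        if pattern.match(line):
            return kind
    raise ValueError("unrecognized line: %r" % line)


# Usage
for line in ["@@\t0020\tBASIC LATIN\t007F", "0020\tSPACE", "\t* a comment", "@@"]:
    print(classify(line))
# BLOCKHEADER, NAME_LINE, COMMENT_LINE, PAGEBREAK (one per line)
```
</pre>

<p>A comment line whose text happens to begin with "X " is indistinguishable from a CROSS_REF
under these rules; that ambiguity comes from the syntax itself rather than from the sketch.</p>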

<h3><strong>1.4 NamesList File Primitives</strong></h3>

<p>The following are the primitives and terminals for the NamesList syntax.</p>

<pre><small><strong>LINE:		STRING LF
COMMENT:	&quot;(&quot; NAME &quot;)&quot;
		&quot;(&quot; NAME &quot;)&quot; &quot;*&quot;
</strong>
<strong>NAME</strong>:	  	&lt;sequence of ASCII characters, except &quot;(&quot; or &quot;)&quot; &gt; 
<strong>STRING</strong>:	  	&lt;sequence of Latin-1 characters&gt; 
<strong>CHAR</strong>:		<strong>X X X X</strong>
		<strong>| X X X X X X X X X</strong></small>
<small><strong>X:	  	&quot;0&quot;|&quot;1&quot;|&quot;2&quot;|&quot;3&quot;|&quot;4&quot;|&quot;5&quot;|&quot;6&quot;|&quot;7&quot;|&quot;8&quot;|&quot;9&quot;|&quot;A&quot;|&quot;B&quot;|&quot;C&quot;|&quot;D&quot;|&quot;E&quot;|&quot;F&quot; 
&lt;tab&gt;:</strong>	  	&lt;sequence of one or more ASCII tab characters 0x09&gt;	
<strong>SP</strong>:	  	&lt;ASCII 0x20&gt;
<strong>LF</strong>:	  	&lt;any sequence of ASCII 0x0A and 0x0D&gt;
</small></pre>
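<p>The COMMENT primitive is what lets one name list serve both layouts: per section 1.3, a
comment after a name is stripped for the Unicode book but kept for an ISO-style list. The
Python sketch below shows one way to separate NAME and COMMENT on a NAME_LINE. The function
name <code>render_name</code>, the sample line, and the choice to append a kept comment after
a single space are assumptions for illustration; this document does not specify how
unibook.exe lays the comment out.</p>

<pre>
```python
import re

# CHAR, one or more tabs, a NAME without parentheses, then an optional
# COMMENT: "(" NAME ")" possibly followed by "*". CHAR is read permissively
# as four or more upper-case hex digits (assumption).
NAME_LINE = re.compile(r"([0-9A-F]{4,})\t+([^()]*?)\s*(\([^()]*\)\*?)?$")


def render_name(line, iso_style=False):
    """Return (code point, name as laid out) for one NAME_LINE.

    The COMMENT is stripped for the Unicode book and kept for an ISO-style
    list; appending it after one space is an assumed layout, not unibook's.
    """
    match = NAME_LINE.match(line)
    if match is None:
        raise ValueError("not a NAME_LINE: %r" % line)
    char, name, comment = match.groups()
    if comment and iso_style:
        name = name + " " + comment
    return int(char, 16), name


# Usage with a hypothetical name line carrying a comment
line = "1234\tSAMPLE NAME (ALTERNATE NAME)"
print(render_name(line))                  # (4660, 'SAMPLE NAME')
print(render_name(line, iso_style=True))  # (4660, 'SAMPLE NAME (ALTERNATE NAME)')
```
</pre>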

<p><strong>Notes:</strong> 

<ul>
  <li>Special lookahead logic prevents a mention of a four-digit standard, such as ISO 9999,
    from being misinterpreted as a CHAR.</li>
  <li>unibook.exe supports Latin-1 characters, but a file that uses them is portable only if
    it is encoded as UTF-16LE.</li>
  <li>The file must end with a final LF.</li>
  <li>A CHAR enclosed in ' or &quot; quotes is expanded, but only its glyph image is printed;
    the code value is not echoed.</li>
  <li>Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
    Apostrophes are supported, but nested quotes are not.</li>
</ul>
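<p>The EXPAND_LINE rules of section 1.3 and the first and fourth notes above can be combined
into a single substitution pass. The Python sketch below is an approximation, not the
unibook.exe algorithm: it handles only four-digit CHAR values, assumes that "combining" means
a character of General Category M, and stands in for the ISO lookahead with a hypothetical
rule that leaves a CHAR untouched when it directly follows the text "ISO ". Curly-quote
substitution is not attempted, and the function name <code>expand_line</code> is
hypothetical.</p>

<pre>
```python
import re
import unicodedata

NBSP = "\u00a0"
DOTTED_CIRCLE = "\u25cc"

# An optional ' or " quote, a four-digit CHAR on word boundaries, and the same
# quote again. Only the four-digit CHAR form is handled (assumption).
CHAR_RE = re.compile(r"(['\"]?)\b([0-9A-F]{4})\b\1")


def expand_line(text):
    """Expand CHAR values in an EXPAND_LINE, roughly as section 1.3 describes."""

    def replace(match):
        quote, code = match.group(1), match.group(2)
        start = match.start()
        # Hypothetical stand-in for the lookahead rule: a CHAR directly after
        # "ISO " is taken to be a standard number, such as ISO 9999.
        if text[max(0, start - 4):start] == "ISO ":
            return match.group(0)
        ch = chr(int(code, 16))
        # "Combining" is read as General Category M (assumption).
        glyph = ch
        if unicodedata.category(ch).startswith("M"):
            glyph = DOTTED_CIRCLE + ch
        if quote:
            return quote + glyph + quote  # quoted CHAR: glyph only, no code value
        return code + NBSP + glyph + NBSP

    return CHAR_RE.sub(replace, text)


# Usage
print(repr(expand_line("0021 and '0041' but not ISO 9999")))
# "0021\xa0!\xa0 and 'A' but not ISO 9999"
```
</pre>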
</body>
</html>