
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta name="keywords"
content="unicode, normalization, composition, decomposition">
<meta name="description" content="Describes PropList.txt">
<title>UCD: Extended Character Properties</title>
<link rel="stylesheet" type="text/css" href="http://www.unicode.org/unicode.css">
</head>

<body bgcolor="#ffffff">

<table width="100%" cellpadding="0" cellspacing="0" border="0">
  <tr>
    <td>
      <table width="100%" border="0" cellpadding="0" cellspacing="0">
        <tr>
          <td class="icon"><a href="http://www.unicode.org"><img border="0"
            src="http://www.unicode.org/webscripts/logo60s2.gif" align="middle"
            alt="[Unicode]" width="34" height="33"></a>&nbsp;&nbsp;<a
            class="bar" href="UnicodeCharacterDatabase.html">Unicode Character 
            Database</a></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td class="gray">&nbsp;</td>
  </tr>
</table>
<h1>Extended Character Properties</h1>
<table height="87" cellspacing="2" cellpadding="0" width="100%" border="1">
  <tbody>
    <tr>
      <td valign="top" width="144">Revision</td>
      <td valign="top">3.1.0</td>
    </tr>
    <tr>
      <td valign="top" width="144">Authors</td>
      <td valign="top">Mark Davis</td>
    </tr>
    <tr>
      <td valign="top" width="144">Date</td>
      <td valign="top">2001-02-28</td>
    </tr>
    <tr>
      <td valign="top" width="144">This Version</td>
      <td valign="top"><a
        href="http://www.unicode.org/Public/3.1-Update/PropList-3.1.0.html">http://www.unicode.org/Public/3.1-Update/PropList-3.1.0.html</a></td>
    </tr>
    <tr>
      <td valign="top" width="144">Previous Version</td>
      <td valign="top">n/a</td>
    </tr>
    <tr>
      <td valign="top" width="144">Latest Version</td>
      <td valign="top"><a
        href="http://www.unicode.org/Public/UNIDATA/PropList.html">http://www.unicode.org/Public/UNIDATA/PropList.html</a></td>
    </tr>
  </tbody>
</table>
<h3><i><br>
Summary</i></h3>
<blockquote>
  <p><i>This document describes the format and content of the PropList.txt data 
  file in the Unicode Character Database (UCD).</i></p>
</blockquote>
<h3><i>Status</i></h3>
<blockquote>
  <p><i>This file and the files described herein are part of the Unicode 
  Character Database and are governed by the <a href="#UCD_Terms">UCD Terms of Use</a> 
  given below.</i></p>
  <p><i>For general information on file formats and table formats, and the 
  implications of normative vs informative properties, see 
  UnicodeCharacterDatabase.html.</i></p>
  <p><i><b>Warning: </b>The information in this file does not completely 
  describe the use and interpretation of Unicode character properties and 
  behavior. It must be used in conjunction with the data in the other files in 
  the UCD, and relies on the notation and definitions supplied in <a
  href="http://www.unicode.org/unicode/standard/versions/Unicode3.0.html">The 
  Unicode Standard</a>. All chapter references are to Version 3.1.0 of the 
  standard.</i></p>
</blockquote>
<hr width="50%">
<h2>Introduction</h2>
<p align="left">PropList.txt contains extended properties that supplement the 
General Category property described in UnicodeData.html. Unlike the derived 
properties, the properties in PropList.txt cannot be derived directly from 
UnicodeData.txt or other data files of the UCD. These properties are listed in 
the following table.</p>
<div align="center">
  <center>
  <table border="1" cellspacing="0" cellpadding="3" class="smallText">
    <tr>
      <th>Property</th>
      <th>Normative (N) or Informative (I)</th>
      <th>Definition and Usage</th>
    </tr>
    <tr>
      <th valign="top">White_space</th>
      <th valign="top">N</th>
      <td valign="top">Space characters and those format control characters 
        (such as TAB, CR, and LF) that programming languages should treat 
        as &quot;white space&quot; when parsing elements.
        <p><b>Note:</b> ZERO WIDTH SPACE and ZERO WIDTH NO-BREAK SPACE are not 
        included, since their functions are restricted to line-break control. 
        Their names are unfortunately misleading in this respect.</p>
        <p><b>Note: </b>There are other senses of &quot;whitespace&quot; that 
        encompass a different set of characters.</p>
      </td>
    </tr>
    <tr>
      <th valign="top">Bidi_Control</th>
      <th valign="top">N</th>
      <td valign="top">Those format control characters which have specific 
        functions in the Bidirectional Algorithm.</td>
    </tr>
    <tr>
      <th valign="top">Join_Control</th>
      <th valign="top">N</th>
      <td valign="top">Those format control characters which have specific 
        functions for control of cursive joining and ligation.</td>
    </tr>
    <tr>
      <th valign="top">Dash</th>
      <th valign="top">I</th>
      <td valign="top">Those punctuation characters explicitly called out as 
        dashes in the Unicode Standard, plus compatibility equivalents to those. 
        Most of these have the Pd General Category, but some have the Sm General 
        Category because of their use in mathematics.</td>
    </tr>
    <tr>
      <th valign="top">Hyphen</th>
      <th valign="top">I</th>
      <td valign="top">Those dashes used to mark connections between pieces of 
        words, plus the Katakana middle dot. The Katakana middle dot functions 
        like a hyphen, but is shaped like a dot rather than a dash.</td>
    </tr>
    <tr>
      <th valign="top">Quotation_Mark</th>
      <th valign="top">I</th>
      <td valign="top">Those punctuation characters that function as quotation 
        marks.</td>
    </tr>
    <tr>
      <th valign="top">Terminal_Punctuation</th>
      <th valign="top">I</th>
      <td valign="top">Those punctuation characters that generally mark the end 
        of textual units.</td>
    </tr>
    <tr>
      <th valign="top">Other_Math</th>
      <th valign="top">I</th>
      <td valign="top">Math characters that do not have the Sm General Category.</td>
    </tr>
    <tr>
      <th valign="top">Hex_Digit</th>
      <th valign="top">I</th>
      <td valign="top">Characters commonly used for the representation of 
        hexadecimal numbers, plus their compatibility equivalents.</td>
    </tr>
    <tr>
      <th valign="top">Other_Alphabetic</th>
      <th valign="top">I</th>
      <td valign="top">Alphabetic characters whose General Category is not 
        one of the letter categories (Lu, Ll, Lt, Lm, Lo).</td>
    </tr>
    <tr>
      <th valign="top">Ideographic</th>
      <th valign="top">I</th>
      <td valign="top">Characters considered to be CJKV (Chinese, Japanese, 
        Korean, and Vietnamese) ideographs.</td>
    </tr>
    <tr>
      <th valign="top">Diacritic</th>
      <th valign="top">I</th>
      <td valign="top">Characters that linguistically modify the meaning of 
        another character to which they apply. Some diacritics are not combining 
        characters, and some combining characters are not diacritics.</td>
    </tr>
    <tr>
      <th valign="top">Extender</th>
      <th valign="top">I</th>
      <td valign="top">Characters whose principal function is to extend the 
        value or shape of a preceding alphabetic character. Typical of these are 
        length and iteration marks.</td>
    </tr>
    <tr>
      <th valign="top">Other_Lowercase</th>
      <th valign="top">I</th>
      <td valign="top">Lowercase characters that do not have the Ll General 
        Category.</td>
    </tr>
    <tr>
      <th valign="top">Other_Uppercase</th>
      <th valign="top">I</th>
      <td valign="top">Uppercase characters that do not have the Lu General 
        Category.</td>
    </tr>
    <tr>
      <th valign="top">Noncharacter_Code_Point</th>
      <th valign="top">N</th>
      <td valign="top">Code points that are explicitly defined as illegal for 
        the encoding of characters. See <a
        href="http://www.unicode.org/unicode/reports/tr27/">Unicode 3.1</a> for 
        more information.</td>
    </tr>
  </table>
  </center>
</div>
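<p align="left">As a rough illustration of how the properties in the table 
might be consumed, the following Python sketch reads lines of the form 
<i>code point or range ; property name # comment</i>. That layout is the one 
used by PropList.txt in current UCD releases and is an assumption here, since 
this document does not spell it out. The function names 
<code>parse_proplist</code> and <code>has_property</code> are hypothetical, and 
the sample lines are an illustrative excerpt, not the contents of the data 
file.</p>

```python
from typing import Dict, Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) code points

# Illustrative excerpt only; the real data comes from PropList.txt.
SAMPLE = """\
# Hypothetical sample lines in the assumed layout
0009..000D    ; White_space # Cc   [5]
0020          ; White_space # Zs       SPACE
200C..200D    ; Join_Control # Cf   [2]
FDD0..FDEF    ; Noncharacter_Code_Point # Cn  [32]
"""


def parse_proplist(lines: Iterable[str]) -> Dict[str, List[Range]]:
    """Collect inclusive code point ranges for each property name."""
    props: Dict[str, List[Range]] = {}
    for raw in lines:
        # Discard the trailing comment, then surrounding blanks.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 2:
            continue  # not in the assumed two-field layout
        code_field, name = fields
        # A field is either a single code point or "first..last".
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        props.setdefault(name, []).append((start, end))
    return props


def has_property(props: Dict[str, List[Range]], name: str, code_point: int) -> bool:
    """Report whether code_point falls in any range recorded for name."""
    return any(
        code_point in range(start, end + 1)
        for start, end in props.get(name, [])
    )


if __name__ == "__main__":
    table = parse_proplist(SAMPLE.splitlines())
    print(has_property(table, "White_space", 0x000A))   # LF
    print(has_property(table, "White_space", 0x200B))   # ZERO WIDTH SPACE
```

<p align="left">With the sample above, LINE FEED is reported as White_space 
and ZERO WIDTH SPACE is not, which matches the note in the White_space row. A 
linear scan over ranges is enough for a sketch; a sorted list with binary 
search would suit the full file better.</p>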
<h2><i><a name="UCD_Terms"><br>
UCD Terms of Use</a></i></h2>
<h3><i>Disclaimer</i></h3>
<blockquote>
  <p><i>The Unicode Character Database is provided as is by Unicode, Inc. No 
  claims are made as to fitness for any particular purpose. No warranties of any 
  kind are expressed or implied. The recipient agrees to determine applicability 
  of information provided. If this file has been purchased on magnetic or 
  optical media from Unicode, Inc., the sole remedy for any claim will be 
  exchange of defective media within 90 days of receipt.</i></p>
  <p><i>This disclaimer is applicable for all other data files accompanying the 
  Unicode Character Database, some of which have been compiled by the Unicode 
  Consortium, and some of which have been supplied by other sources.</i></p>
</blockquote>
<h3><i>Limitations on Rights to Redistribute This Data</i></h3>
<blockquote>
  <p><i>Recipient is granted the right to make copies in any form for internal 
  distribution and to freely use the information supplied in the creation of 
  products supporting the Unicode<sup>TM</sup> Standard. The files in the 
  Unicode Character Database can be redistributed to third parties or other 
  organizations (whether for profit or not) as long as this notice and the 
  disclaimer notice are retained. Information can be extracted from these files 
  and used in documentation or programs, as long as there is an accompanying 
  notice indicating the source.</i></p>
</blockquote>
<hr width="50%">
<p align="center"><a href="http://www.unicode.org/unicode/copyright.html"><img
src="http://www.unicode.org/img/hb_home.gif" border="0" alt="Home" width="40"
height="49"><img src="http://www.unicode.org/img/hb_mid.gif" border="0"
alt="Terms of Use" width="152" height="49"><img
src="http://www.unicode.org/img/hb_mail.gif" border="0" alt="E-mail" width="46"
height="49"></a></p>

</body>

</html>