X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=lib%2Funicore%2FArabicShaping.txt;h=b851d3839f7be709f2aee54b44499f71ede3efda;hb=6182169b72782336c6202161aa4cde16ac88296e;hp=df1f1933a39a79e025aedcf230b1152310f4c487;hpb=1911be8391700522b225cf514eddd9ebe9eaf644;p=p5sagit%2Fp5-mst-13.2.git diff --git a/lib/unicore/ArabicShaping.txt b/lib/unicore/ArabicShaping.txt index df1f193..b851d38 100644 --- a/lib/unicore/ArabicShaping.txt +++ b/lib/unicore/ArabicShaping.txt @@ -1,44 +1,80 @@ -# ArabicShaping-4.0.0.txt +# ArabicShaping-5.2.0.txt +# Date: 2009-08-17, 11:11:00 PDT [KW] # # This file is a normative contributory data file in the # Unicode Character Database. # -# This file defines the shaping classes for Arabic and Syriac +# Copyright (c) 1991-2009 Unicode, Inc. +# For terms of use, see http://www.unicode.org/terms_of_use.html +# +# This file defines the shaping classes for Arabic, Syriac, and N'Ko # positional shaping, repeating in machine readable form the -# information printed in Tables 8-3, 8-7, 8-8, 8-11, 8-12, and -# 8-13 of The Unicode Standard, Version 4.0. +# information exemplified in Tables 8-3, 8-7, 8-8, 8-11, 8-12, +# 8-13, and 13-5 of The Unicode Standard, Version 5.2. # -# See sections 8.2 and 8.3 of The Unicode Standard, Version 4.0 +# See sections 8.2, 8.3, and 13.5 of The Unicode Standard, Version 5.2 # for more information. # # Each line contains four fields, separated by a semicolon. # # Field 0: the code point, in 4-digit hexadecimal -# form, of an Arabic or Syriac character. +# form, of an Arabic, Syriac, or N'Ko character. +# # Field 1: gives a short schematic name for that character, # abbreviated from the normative Unicode character name. -# Field 2: defines the joining type -# R right-joining, -# L left-joining, -# D dual-joining, -# C join-causing -# U non-joining -# T transparent -# See the Arabic block description for more information on these types. -# Field 3: defines the joining group. # +# Field 2: defines the joining type (property name: Joining_Type) +# R Right_Joining +# L Left_Joining +# D Dual_Joining +# C Join_Causing +# U Non_Joining +# T Transparent +# See Section 8.2, Arabic for more information on these types. +# +# Field 3: defines the joining group (property name: Joining_Group) +# +# The values of the joining group are based schematically on character +# names. Where a schematic character name consists of two or more parts separated +# by spaces, the formal Joining_Group property value, as specified in +# PropertyValueAliases.txt, consists of the same name parts joined by +# underscores. Hence, the entry: +# +# 0629; TEH MARBUTA; R; TEH MARBUTA +# +# corresponds to [Joining_Group = Teh_Marbuta]. +# +# Note: For historical reasons, the property value [Joining_Group = Hamza_On_Heh_Goal] +# is anachronistically named. It used to apply to both of the following characters +# in earlier versions of the standard: +# +# U+06C2 ARABIC LETTER HEH GOAL WITH HAMZA ABOVE +# U+06C3 ARABIC LETTER TEH MARBUTA GOAL +# +# However, it currently applies only to U+06C3, and *not* to U+06C2. +# To avoid destabilizing existing Joining_Group property aliases, the +# value Hamza_On_Heh_Goal has not been changed, despite the fact that it +# no longer applies to Hamza On Heh Goal, but only to Teh Marbuta Goal. +# +# When other cursive scripts are added to the Unicode Standard in +# the future, the joining group value of all its letters will default +# to jg=No_Joining_Group in this data file. Other, more specific +# joining group values will be defined only if an explicit proposal +# to define those values exactly has been approved by the UTC. This +# is the convention exemplified by the N'Ko script. Only the Arabic +# and Syriac scripts currently have explicit joining group values defined. # # Note: Code points that are not explicitly listed in this file are -# either of type T or U: +# either of joining type T or U: # -# - Those that not explicitly listed that are of General Category Mn or Cf +# - Those that not explicitly listed that are of General Category Mn, Me, or Cf # have joining type T. -# - All others not explicitly listed have type U. +# - All others not explicitly listed have joining type U. # # For an explicit listing of characters of joining type T, see # the derived property file DerivedJoiningType.txt. # -# There are currently no characters of type L defined in Unicode. +# There are currently no characters of joining type L defined in Unicode. # # ############################################################# @@ -46,11 +82,13 @@ # Arabic characters -0600; ARABIC NUMBER SIGN; U; -0601; ARABIC SIGN SANAH; U; -0602; ARABIC FOOTNOTE MARKER; U; -0603; ARABIC SIGN SAFHA; U; -0621; HAMZA; U; +0600; ARABIC NUMBER SIGN; U; No_Joining_Group +0601; ARABIC SIGN SANAH; U; No_Joining_Group +0602; ARABIC FOOTNOTE MARKER; U; No_Joining_Group +0603; ARABIC SIGN SAFHA; U; No_Joining_Group +0608; ARABIC RAY; U; No_Joining_Group +060B; AFGHANI SIGN; U; No_Joining_Group +0621; HAMZA; U; No_Joining_Group 0622; MADDA ON ALEF; R; ALEF 0623; HAMZA ON ALEF; R; ALEF 0624; HAMZA ON WAW; R; WAW @@ -76,7 +114,12 @@ 0638; ZAH; D; TAH 0639; AIN; D; AIN 063A; GHAIN; D; AIN -0640; TATWEEL; C; +063B; KEHEH WITH 2 DOTS ABOVE; D; GAF +063C; KEHEH WITH 3 DOTS BELOW; D; GAF +063D; FARSI YEH WITH INVERTED V; D; FARSI YEH +063E; FARSI YEH WITH 2 DOTS ABOVE; D; FARSI YEH +063F; FARSI YEH WITH 3 DOTS ABOVE; D; FARSI YEH +0640; TATWEEL; C; No_Joining_Group 0641; FEH; D; FEH 0642; QAF; D; QAF 0643; KAF; D; KAF @@ -92,7 +135,7 @@ 0671; HAMZAT WASL ON ALEF; R; ALEF 0672; WAVY HAMZA ON ALEF; R; ALEF 0673; WAVY HAMZA UNDER ALEF; R; ALEF -0674; HIGH HAMZA; U; +0674; HIGH HAMZA; U; No_Joining_Group 0675; HIGH HAMZA ALEF; R; ALEF 0676; HIGH HAMZA WAW; R; WAW 0677; HIGH HAMZA WAW WITH DAMMA; R; WAW @@ -145,7 +188,7 @@ 06A6; FEH WITH 4 DOTS ABOVE; D; FEH 06A7; QAF WITH DOT ABOVE; D; QAF 06A8; QAF WITH 3 DOTS ABOVE; D; QAF -06A9; OPEN KAF; D; GAF +06A9; KEHEH; D; GAF 06AA; SWASH KAF; D; SWASH KAF 06AB; KAF WITH RING; D; GAF 06AC; KAF WITH DOT ABOVE; D; KAF @@ -165,12 +208,12 @@ 06BA; DOTLESS NOON; D; NOON 06BB; DOTLESS NOON WITH SMALL TAH; D; NOON 06BC; NOON WITH RING; D; NOON -06BD; NOON WITH 3 DOTS ABOVE; D; NOON +06BD; NYA; D; NYA 06BE; KNOTTED HEH; D; KNOTTED HEH 06BF; HAH WITH MIDDLE 3 DOTS DOWNWARD AND DOT ABOVE; D; HAH 06C0; HAMZA ON HEH; R; TEH MARBUTA 06C1; HEH GOAL; D; HEH GOAL -06C2; HAMZA ON HEH GOAL; R; HAMZA ON HEH GOAL +06C2; HAMZA ON HEH GOAL; D; HEH GOAL 06C3; TEH MARBUTA GOAL; R; HAMZA ON HEH GOAL 06C4; WAW WITH RING; R; WAW 06C5; WAW WITH BAR; R; WAW @@ -180,22 +223,22 @@ 06C9; WAW WITH INVERTED SMALL V; R; WAW 06CA; WAW WITH 2 DOTS ABOVE; R; WAW 06CB; WAW WITH 3 DOTS ABOVE; R; WAW -06CC; DOTLESS YEH; D; YEH +06CC; FARSI YEH; D; FARSI YEH 06CD; YEH WITH TAIL; R; YEH WITH TAIL -06CE; YEH WITH SMALL V; D; YEH +06CE; FARSI YEH WITH SMALL V; D; FARSI YEH 06CF; WAW WITH DOT ABOVE; R; WAW 06D0; YEH WITH 2 DOTS VERTICAL BELOW; D; YEH 06D1; YEH WITH 3 DOTS BELOW; D; YEH 06D2; YEH BARREE; R; YEH BARREE 06D3; HAMZA ON YEH BARREE; R; YEH BARREE 06D5; AE; R; TEH MARBUTA -06DD; ARABIC END OF AYAH; U; +06DD; ARABIC END OF AYAH; U; No_Joining_Group 06EE; DAL WITH INVERTED V; R; DAL 06EF; REH WITH INVERTED V; R; REH -06FF; HEH WITH INVERTED V; D; KNOTTED HEH 06FA; SEEN WITH DOT BELOW AND 3 DOTS ABOVE; D; SEEN 06FB; DAD WITH DOT BELOW; D; SAD 06FC; GHAIN WITH DOT BELOW; D; AIN +06FF; HEH WITH INVERTED V; D; KNOTTED HEH # Syriac characters @@ -234,7 +277,97 @@ 074E; SOGDIAN KHAPH; D; KHAPH 074F; SOGDIAN FE; D; FE +# Arabic supplement characters + +0750; BEH WITH 3 DOTS HORIZONTALLY BELOW; D; BEH +0751; BEH WITH DOT BELOW AND 3 DOTS ABOVE; D; BEH +0752; BEH WITH 3 DOTS POINTING UPWARDS BELOW; D; BEH +0753; BEH WITH 3 DOTS POINTING UPWARDS BELOW AND 2 DOTS ABOVE; D; BEH +0754; BEH WITH 2 DOTS BELOW AND DOT ABOVE; D; BEH +0755; BEH WITH INVERTED SMALL V BELOW; D; BEH +0756; BEH WITH SMALL V; D; BEH +0757; HAH WITH 2 DOTS ABOVE; D; HAH +0758; HAH WITH 3 DOTS POINTING UPWARDS BELOW; D; HAH +0759; DAL WITH 2 DOTS VERTICALLY BELOW AND SMALL TAH; R; DAL +075A; DAL WITH INVERTED SMALL V BELOW; R; DAL +075B; REH WITH STROKE; R; REH +075C; SEEN WITH 4 DOTS ABOVE; D; SEEN +075D; AIN WITH 2 DOTS ABOVE; D; AIN +075E; AIN WITH 3 DOTS POINTING DOWNWARDS ABOVE; D; AIN +075F; AIN WITH 2 DOTS VERTICALLY ABOVE; D; AIN +0760; FEH WITH 2 DOTS BELOW; D; FEH +0761; FEH WITH 3 DOTS POINTING UPWARDS BELOW; D; FEH +0762; KEHEH WITH DOT ABOVE; D; GAF +0763; KEHEH WITH 3 DOTS ABOVE; D; GAF +0764; KEHEH WITH 3 DOTS POINTING UPWARDS BELOW; D; GAF +0765; MEEM WITH DOT ABOVE; D; MEEM +0766; MEEM WITH DOT BELOW; D; MEEM +0767; NOON WITH 2 DOTS BELOW; D; NOON +0768; NOON WITH SMALL TAH; D; NOON +0769; NOON WITH SMALL V; D; NOON +076A; LAM WITH BAR; D; LAM +076B; REH WITH 2 DOTS VERTICALLY ABOVE; R; REH +076C; REH WITH HAMZA ABOVE; R; REH +076D; SEEN WITH 2 DOTS VERTICALLY ABOVE; D; SEEN +076E; HAH WITH SMALL TAH BELOW; D; HAH +076F; HAH WITH SMALL TAH AND 2 DOTS; D; HAH +0770; SEEN WITH SMALL TAH AND 2 DOTS; D; SEEN +0771; REH WITH SMALL TAH AND 2 DOTS; R; REH +0772; HAH WITH SMALL TAH ABOVE; D; HAH +0773; ALEF WITH DIGIT TWO ABOVE; R; ALEF +0774; ALEF WITH DIGIT THREE ABOVE; R; ALEF +0775; FARSI YEH WITH DIGIT TWO ABOVE; D; FARSI YEH +0776; FARSI YEH WITH DIGIT THREE ABOVE; D; FARSI YEH +0777; YEH WITH DIGIT FOUR BELOW; D; YEH +0778; WAW WITH DIGIT TWO ABOVE; R; WAW +0779; WAW WITH DIGIT THREE ABOVE; R; WAW +077A; YEH BARREE WITH DIGIT TWO ABOVE; D; BURUSHASKI YEH BARREE +077B; YEH BARREE WITH DIGIT THREE ABOVE; D; BURUSHASKI YEH BARREE +077C; HAH WITH DIGIT FOUR BELOW; D; HAH +077D; SEEN WITH DIGIT FOUR ABOVE; D; SEEN +077E; SEEN WITH INVERTED V; D; SEEN +077F; KAF WITH 2 DOTS ABOVE; D; KAF + +# N'Ko Characters + +07CA; NKO A; D; No_Joining_Group +07CB; NKO EE; D; No_Joining_Group +07CC; NKO I; D; No_Joining_Group +07CD; NKO E; D; No_Joining_Group +07CE; NKO U; D; No_Joining_Group +07CF; NKO OO; D; No_Joining_Group +07D0; NKO O; D; No_Joining_Group +07D1; NKO DAGBASINNA; D; No_Joining_Group +07D2; NKO N; D; No_Joining_Group +07D3; NKO BA; D; No_Joining_Group +07D4; NKO PA; D; No_Joining_Group +07D5; NKO TA; D; No_Joining_Group +07D6; NKO JA; D; No_Joining_Group +07D7; NKO CHA; D; No_Joining_Group +07D8; NKO DA; D; No_Joining_Group +07D9; NKO RA; D; No_Joining_Group +07DA; NKO RRA; D; No_Joining_Group +07DB; NKO SA; D; No_Joining_Group +07DC; NKO GBA; D; No_Joining_Group +07DD; NKO FA; D; No_Joining_Group +07DE; NKO KA; D; No_Joining_Group +07DF; NKO LA; D; No_Joining_Group +07E0; NKO NA WOLOSO; D; No_Joining_Group +07E1; NKO MA; D; No_Joining_Group +07E2; NKO NYA; D; No_Joining_Group +07E3; NKO NA; D; No_Joining_Group +07E4; NKO HA; D; No_Joining_Group +07E5; NKO WA; D; No_Joining_Group +07E6; NKO YA; D; No_Joining_Group +07E7; NKO NYA WOLOSO; D; No_Joining_Group +07E8; NKO JONA JA; D; No_Joining_Group +07E9; NKO JONA CHA; D; No_Joining_Group +07EA; NKO JONA RA; D; No_Joining_Group +07FA; NKO LAJANYALAN; C; No_Joining_Group + # Other -200D; ZERO WIDTH JOINER; C; -200C; ZERO WIDTH NON-JOINER; U; +200C; ZERO WIDTH NON-JOINER; U; No_Joining_Group +200D; ZERO WIDTH JOINER; C; No_Joining_Group + +# EOF