pod/perlebcdic.pod

   1 =head1 NAME
   2
   3 perlebcdic - Considerations for running Perl on EBCDIC platforms
   4
   5 =head1 DESCRIPTION
   6
   7 This document explores some of the issues facing Perl programmers
   8 on EBCDIC-based computers.  It does not (yet) cover localization,
   9 internationalization, or multi-byte character set issues.
  10
  11 Portions that are still incomplete are marked with XXX.
  12
  13 =head1 COMMON CHARACTER CODE SETS
  14
  15 =head2 ASCII
  16
  17 The American Standard Code for Information Interchange is a set of
  18 integers running from 0 to 127 (decimal) that displays and other
  19 computer systems interpret as characters.  The range 0..127 fits
  20 in a 7-bit binary number, hence the set is sometimes referred to
  21 as "7-bit ASCII".
  22 ASCII was described by the American National Standards Institute
  23 document ANSI X3.4-1986.  It was also described by ISO 646:1991
  24 (with localization for currency symbols).  The full ASCII set is
  25 given in the table below as the first 128 elements.  Languages that
  26 can be written adequately with the characters in ASCII include
  27 English, Hawaiian, Indonesian, Swahili and some Native American
  28 languages.
  29
  30 There are many character sets that extend the range of integers
  31 from 0..2**7-1 up to 2**8-1, that is, to 8-bit bytes (octets if you
  32 prefer).  One common one is the ISO 8859-1 character set.
  33
  34 =head2 ISO 8859
  35
  36 The ISO 8859-$n are a collection of character code sets from the
  37 International Organization for Standardization (ISO), each of which
  38 adds characters to the ASCII set that are typically found in European
  39 languages, many of which are based on the Roman, or Latin, alphabet.
  40
  41 =head2 Latin 1 (ISO 8859-1)
  42
  43 A particular 8-bit extension to ASCII that includes grave and acute
  44 accented Latin characters.  Languages that can employ ISO 8859-1
  45 include all the languages covered by ASCII as well as Afrikaans,
  46 Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
  47 Portuguese, Spanish, and Swedish.  Dutch is covered albeit without
  48 the ij ligature.  French is covered too but without the oe ligature.
  49 German can use ISO 8859-1 but must do so without German-style
  50 quotation marks.  This set is based on Western European extensions
  51 to ASCII and is commonly encountered in world wide web work.
  52 In IBM character code set identification terminology ISO 8859-1 is
  53 also known as CCSID 819 (or sometimes 0819 or even 00819).
  54
  55 =head2 EBCDIC
  56
  57 The Extended Binary Coded Decimal Interchange Code refers to a
  58 large collection of slightly different single- and multi-byte
  59 coded character sets that are different from ASCII or ISO 8859-1
  60 and typically run on host computers.  The EBCDIC encodings derive
  61 from 8-bit byte extensions of Hollerith punched card encodings.
  62 The layout on the cards was such that high bits were set for the
  63 upper and lower case alphabet characters [a-z] and [A-Z], but there
  64 were gaps within each Latin alphabet range.
  65
  66 =head2 13 variant characters
  67
  68 Some IBM EBCDIC character sets may be known by character code set
  69 identification numbers (CCSID numbers) or code page numbers.  Leading
  70 zero digits in CCSID numbers within this document are insignificant.
  71 E.g. CCSID 0037 may be referred to as 37 in places.
  72
  73 Among IBM EBCDIC character code sets there are 13 characters that
  74 are often mapped to different integer values.  Those characters
  75 are known as the 13 "variant" characters and are:
  76
  77     \ [ ] { } ^ ~ ! # | $ @ `
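To see where the variant characters fall on the platform at hand, one
can print ord() for each of them.  The following is only a minimal
sketch; the array name C<@variants> is ours and is not part of any module.

```perl
use strict;
use warnings;

# The 13 variant characters listed above.  Single quotes prevent any
# interpolation; the backslash itself is written as '\\'.
my @variants = ('\\', '[', ']', '{', '}', '^', '~', '!', '#', '|', '$', '@', '`');

# Print each character with its decimal code point on this platform.
printf("%s  %3d\n", $_, ord($_)) for @variants;
```

Comparing the output with the single octet table shows which of the
listed code sets (if any) the platform agrees with for these characters.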
  78
  79 =head2 0037
  80
  81 Character code set ID 0037 is a mapping of the ASCII plus Latin-1
  82 characters (i.e. ISO 8859-1) to an EBCDIC set.  0037 is used
  83 in North American English locales on the OS/400 operating system
  84 that runs on AS/400 computers.  CCSID 37 differs from ISO 8859-1
  85 in 237 places; in other words, they agree on only 19 code point values.
  86
  87 =head2 1047
  88
  89 Character code set ID 1047 is also a mapping of the ASCII plus
  90 Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set.  1047 is
  91 used under Unix System Services for OS/390, and OpenEdition for VM/ESA.
  92 CCSID 1047 differs from CCSID 0037 in eight places.
  93
  94 =head2 POSIX-BC
  95
  96 The EBCDIC code page in use on Siemens' BS2000 system is distinct from
  97 1047 and 0037.  It is identified below as the POSIX-BC set.
  98
  99 =head1 SINGLE OCTET TABLES
 100
 101 The following tables list the ASCII and Latin 1 ordered sets including
 102 the subsets: C0 controls (0..31), ASCII graphics (32..7e), delete (7f),
 103 C1 controls (80..9f), and Latin-1 (a.k.a. ISO 8859-1) (a0..ff).  In the
 104 table non-printing control character names as well as the Latin 1
 105 extensions to ASCII have been labelled with character names roughly
 106 corresponding to I<The Unicode Standard, Version 2.0> albeit with
 107 substitutions such as s/LATIN// and s/VULGAR// in all cases,
 108 s/CAPITAL LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/
 109 in some other cases.  The "names" of the C1 control set
 110 (128..159 in ISO 8859-1) are somewhat arbitrary.  The differences
 111 between the 0037 and 1047 sets are flagged with ***.  The differences
 112 between the 1047 and POSIX-BC sets are flagged with ###.
 113 All ord() numbers listed are decimal.  If you would rather see this
 114 table listing octal values then run the table (that is, the pod
 115 version of this document since this recipe may not work with
 116 a pod2_other_format translation) through:
 117
 118 =over 4
 119
 120 =item recipe 0
 121
 122 =back
 123
 124     perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
 125      -e '{printf("%s%-9o%-9o%-9o%-9o\n",$1,$2,$3,$4,$5)}' perlebcdic.pod
 126
 127 If you would rather see this table listing hexadecimal values then
 128 run the table through:
 129
 130 =over 4
 131
 132 =item recipe 1
 133
 134 =back
 135
 136     perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
 137      -e '{printf("%s%-9X%-9X%-9X%-9X\n",$1,$2,$3,$4,$5)}' perlebcdic.pod
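The two recipes above differ only in the printf() conversion letter.
As a rough sketch, the same idea can be written as a small subroutine;
the name C<convert_row> and the sample row are illustrative assumptions
and do not appear elsewhere in this document.  Like the recipes, it
drops any trailing *** or ### flags.

```perl
use strict;
use warnings;

# Hypothetical helper: re-render one table row with its four decimal
# code points shown in octal ('o') or hexadecimal ('X').  Lines that
# do not look like table rows are returned unchanged.
sub convert_row {
    my ($row, $conv) = @_;
    if ($row =~ /^(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/) {
        my $format = '%s' . ("%-9$conv" x 4);
        return sprintf($format, $1, $2, $3, $4, $5);
    }
    return $row;
}

# A sample row built in the table's layout: 33 columns for the name,
# then four 9-column numeric fields.  The values are those of <LINE FEED>.
my $row = sprintf("    %-29s%-9d%-9d%-9d%-9d", '<LINE FEED>', 10, 37, 21, 21);

print convert_row($row, 'o'), "\n";   # octal rendering
print convert_row($row, 'X'), "\n";   # hexadecimal rendering
```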
 138
 139
 140                                  8859-1
 141     chr                          0819     0037     1047     POSIX-BC
 142     ----------------------------------------------------------------
 143     <NULL>                       0        0        0        0
 144     <START OF HEADING>           1        1        1        1
 145     <START OF TEXT>              2        2        2        2
 146     <END OF TEXT>                3        3        3        3
 147     <END OF TRANSMISSION>        4        55       55       55
 148     <ENQUIRY>                    5        45       45       45
 149     <ACKNOWLEDGE>                6        46       46       46
 150     <BELL>                       7        47       47       47
 151     <BACKSPACE>                  8        22       22       22
 152     <HORIZONTAL TABULATION>      9        5        5        5
 153     <LINE FEED>                  10       37       21       21  ***
 154     <VERTICAL TABULATION>        11       11       11       11
 155     <FORM FEED>                  12       12       12       12
 156     <CARRIAGE RETURN>            13       13       13       13
 157     <SHIFT OUT>                  14       14       14       14
 158     <SHIFT IN>                   15       15       15       15
 159     <DATA LINK ESCAPE>           16       16       16       16
 160     <DEVICE CONTROL ONE>         17       17       17       17
 161     <DEVICE CONTROL TWO>         18       18       18       18
 162     <DEVICE CONTROL THREE>       19       19       19       19
 163     <DEVICE CONTROL FOUR>        20       60       60       60
 164     <NEGATIVE ACKNOWLEDGE>       21       61       61       61
 165     <SYNCHRONOUS IDLE>           22       50       50       50
 166     <END OF TRANSMISSION BLOCK>  23       38       38       38
 167     <CANCEL>                     24       24       24       24
 168     <END OF MEDIUM>              25       25       25       25
 169     <SUBSTITUTE>                 26       63       63       63
 170     <ESCAPE>                     27       39       39       39
 171     <FILE SEPARATOR>             28       28       28       28
 172     <GROUP SEPARATOR>            29       29       29       29
 173     <RECORD SEPARATOR>           30       30       30       30
 174     <UNIT SEPARATOR>             31       31       31       31
 175     <SPACE>                      32       64       64       64
 176     !                            33       90       90       90
 177     "                            34       127      127      127
 178     #                            35       123      123      123
 179     $                            36       91       91       91
 180     %                            37       108      108      108
 181     &                            38       80       80       80
 182     '                            39       125      125      125
 183     (                            40       77       77       77
 184     )                            41       93       93       93
 185     *                            42       92       92       92
 186     +                            43       78       78       78
 187     ,                            44       107      107      107
 188     -                            45       96       96       96
 189     .                            46       75       75       75
 190     /                            47       97       97       97
 191     0                            48       240      240      240
 192     1                            49       241      241      241
 193     2                            50       242      242      242
 194     3                            51       243      243      243
 195     4                            52       244      244      244
 196     5                            53       245      245      245
 197     6                            54       246      246      246
 198     7                            55       247      247      247
 199     8                            56       248      248      248
 200     9                            57       249      249      249
 201     :                            58       122      122      122
 202     ;                            59       94       94       94
 203     <                            60       76       76       76
 204     =                            61       126      126      126
 205     >                            62       110      110      110
 206     ?                            63       111      111      111
 207     @                            64       124      124      124
 208     A                            65       193      193      193
 209     B                            66       194      194      194
 210     C                            67       195      195      195
 211     D                            68       196      196      196
 212     E                            69       197      197      197
 213     F                            70       198      198      198
 214     G                            71       199      199      199
 215     H                            72       200      200      200
 216     I                            73       201      201      201
 217     J                            74       209      209      209
 218     K                            75       210      210      210
 219     L                            76       211      211      211
 220     M                            77       212      212      212
 221     N                            78       213      213      213
 222     O                            79       214      214      214
 223     P                            80       215      215      215
 224     Q                            81       216      216      216
 225     R                            82       217      217      217
 226     S                            83       226      226      226
 227     T                            84       227      227      227
 228     U                            85       228      228      228
 229     V                            86       229      229      229
 230     W                            87       230      230      230
 231     X                            88       231      231      231
 232     Y                            89       232      232      232
 233     Z                            90       233      233      233
 234     [                            91       186      173      187 *** ###
 235     \                            92       224      224      188 ###
 236     ]                            93       187      189      189 ***
 237     ^                            94       176      95       106 *** ###
 238     _                            95       109      109      109
 239     `                            96       121      121      74  ###
 240     a                            97       129      129      129
 241     b                            98       130      130      130
 242     c                            99       131      131      131
 243     d                            100      132      132      132
 244     e                            101      133      133      133
 245     f                            102      134      134      134
 246     g                            103      135      135      135
 247     h                            104      136      136      136
 248     i                            105      137      137      137
 249     j                            106      145      145      145
 250     k                            107      146      146      146
 251     l                            108      147      147      147
 252     m                            109      148      148      148
 253     n                            110      149      149      149
 254     o                            111      150      150      150
 255     p                            112      151      151      151
 256     q                            113      152      152      152
 257     r                            114      153      153      153
 258     s                            115      162      162      162
 259     t                            116      163      163      163
 260     u                            117      164      164      164
 261     v                            118      165      165      165
 262     w                            119      166      166      166
 263     x                            120      167      167      167
 264     y                            121      168      168      168
 265     z                            122      169      169      169
 266     {                            123      192      192      251 ###
 267     |                            124      79       79       79
 268     }                            125      208      208      253 ###
 269     ~                            126      161      161      255 ###
 270     <DELETE>                     127      7        7        7
 271     <C1 0>                       128      32       32       32
 272     <C1 1>                       129      33       33       33
 273     <C1 2>                       130      34       34       34
 274     <C1 3>                       131      35       35       35
 275     <C1 4>                       132      36       36       36
 276     <C1 5>                       133      21       37       37  ***
 277     <C1 6>                       134      6        6        6
 278     <C1 7>                       135      23       23       23
 279     <C1 8>                       136      40       40       40
 280     <C1 9>                       137      41       41       41
 281     <C1 10>                      138      42       42       42
 282     <C1 11>                      139      43       43       43
 283     <C1 12>                      140      44       44       44
 284     <C1 13>                      141      9        9        9
 285     <C1 14>                      142      10       10       10
 286     <C1 15>                      143      27       27       27
 287     <C1 16>                      144      48       48       48
 288     <C1 17>                      145      49       49       49
 289     <C1 18>                      146      26       26       26
 290     <C1 19>                      147      51       51       51
 291     <C1 20>                      148      52       52       52
 292     <C1 21>                      149      53       53       53
 293     <C1 22>                      150      54       54       54
 294     <C1 23>                      151      8        8        8
 295     <C1 24>                      152      56       56       56
 296     <C1 25>                      153      57       57       57
 297     <C1 26>                      154      58       58       58
 298     <C1 27>                      155      59       59       59
 299     <C1 28>                      156      4        4        4
 300     <C1 29>                      157      20       20       20
 301     <C1 30>                      158      62       62       62
 302     <C1 31>                      159      255      255      95  ###
 303     <NON-BREAKING SPACE>         160      65       65       65
 304     <INVERTED EXCLAMATION MARK>  161      170      170      170
 305     <CENT SIGN>                  162      74       74       176 ###
 306     <POUND SIGN>                 163      177      177      177
 307     <CURRENCY SIGN>              164      159      159      159
 308     <YEN SIGN>                   165      178      178      178
 309     <BROKEN BAR>                 166      106      106      208 ###
 310     <SECTION SIGN>               167      181      181      181
 311     <DIAERESIS>                  168      189      187      121 *** ###
 312     <COPYRIGHT SIGN>             169      180      180      180
 313     <FEMININE ORDINAL INDICATOR> 170      154      154      154
 314     <LEFT POINTING GUILLEMET>    171      138      138      138
 315     <NOT SIGN>                   172      95       176      186 *** ###
 316     <SOFT HYPHEN>                173      202      202      202
 317     <REGISTERED TRADE MARK SIGN> 174      175      175      175
 318     <MACRON>                     175      188      188      161 ###
 319     <DEGREE SIGN>                176      144      144      144
 320     <PLUS-OR-MINUS SIGN>         177      143      143      143
 321     <SUPERSCRIPT TWO>            178      234      234      234
 322     <SUPERSCRIPT THREE>          179      250      250      250
 323     <ACUTE ACCENT>               180      190      190      190
 324     <MICRO SIGN>                 181      160      160      160
 325     <PARAGRAPH SIGN>             182      182      182      182
 326     <MIDDLE DOT>                 183      179      179      179
 327     <CEDILLA>                    184      157      157      157
 328     <SUPERSCRIPT ONE>            185      218      218      218
 329     <MASC. ORDINAL INDICATOR>    186      155      155      155
 330     <RIGHT POINTING GUILLEMET>   187      139      139      139
 331     <FRACTION ONE QUARTER>       188      183      183      183
 332     <FRACTION ONE HALF>          189      184      184      184
 333     <FRACTION THREE QUARTERS>    190      185      185      185
 334     <INVERTED QUESTION MARK>     191      171      171      171
 335     <A WITH GRAVE>               192      100      100      100
 336     <A WITH ACUTE>               193      101      101      101
 337     <A WITH CIRCUMFLEX>          194      98       98       98
 338     <A WITH TILDE>               195      102      102      102
 339     <A WITH DIAERESIS>           196      99       99       99
 340     <A WITH RING ABOVE>          197      103      103      103
 341     <CAPITAL LIGATURE AE>        198      158      158      158
 342     <C WITH CEDILLA>             199      104      104      104
 343     <E WITH GRAVE>               200      116      116      116
 344     <E WITH ACUTE>               201      113      113      113
 345     <E WITH CIRCUMFLEX>          202      114      114      114
 346     <E WITH DIAERESIS>           203      115      115      115
 347     <I WITH GRAVE>               204      120      120      120
 348     <I WITH ACUTE>               205      117      117      117
 349     <I WITH CIRCUMFLEX>          206      118      118      118
 350     <I WITH DIAERESIS>           207      119      119      119
 351     <CAPITAL LETTER ETH>         208      172      172      172
 352     <N WITH TILDE>               209      105      105      105
 353     <O WITH GRAVE>               210      237      237      237
 354     <O WITH ACUTE>               211      238      238      238
 355     <O WITH CIRCUMFLEX>          212      235      235      235
 356     <O WITH TILDE>               213      239      239      239
 357     <O WITH DIAERESIS>           214      236      236      236
 358     <MULTIPLICATION SIGN>        215      191      191      191
 359     <O WITH STROKE>              216      128      128      128
 360     <U WITH GRAVE>               217      253      253      224 ###
 361     <U WITH ACUTE>               218      254      254      254
 362     <U WITH CIRCUMFLEX>          219      251      251      221 ###
 363     <U WITH DIAERESIS>           220      252      252      252
 364     <Y WITH ACUTE>               221      173      186      173 *** ###
 365     <CAPITAL LETTER THORN>       222      174      174      174
 366     <SMALL LETTER SHARP S>       223      89       89       89
 367     <a WITH GRAVE>               224      68       68       68
 368     <a WITH ACUTE>               225      69       69       69
 369     <a WITH CIRCUMFLEX>          226      66       66       66
 370     <a WITH TILDE>               227      70       70       70
 371     <a WITH DIAERESIS>           228      67       67       67
 372     <a WITH RING ABOVE>          229      71       71       71
 373     <SMALL LIGATURE ae>          230      156      156      156
 374     <c WITH CEDILLA>             231      72       72       72
 375     <e WITH GRAVE>               232      84       84       84
 376     <e WITH ACUTE>               233      81       81       81
 377     <e WITH CIRCUMFLEX>          234      82       82       82
 378     <e WITH DIAERESIS>           235      83       83       83
 379     <i WITH GRAVE>               236      88       88       88
 380     <i WITH ACUTE>               237      85       85       85
 381     <i WITH CIRCUMFLEX>          238      86       86       86
 382     <i WITH DIAERESIS>           239      87       87       87
 383     <SMALL LETTER eth>           240      140      140      140
 384     <n WITH TILDE>               241      73       73       73
 385     <o WITH GRAVE>               242      205      205      205
 386     <o WITH ACUTE>               243      206      206      206
 387     <o WITH CIRCUMFLEX>          244      203      203      203
 388     <o WITH TILDE>               245      207      207      207
 389     <o WITH DIAERESIS>           246      204      204      204
 390     <DIVISION SIGN>              247      225      225      225
 391     <o WITH STROKE>              248      112      112      112
 392     <u WITH GRAVE>               249      221      221      192 ###
 393     <u WITH ACUTE>               250      222      222      222
 394     <u WITH CIRCUMFLEX>          251      219      219      219
 395     <u WITH DIAERESIS>           252      220      220      220
 396     <y WITH ACUTE>               253      141      141      141
 397     <SMALL LETTER thorn>         254      142      142      142
 398     <y WITH DIAERESIS>           255      223      223      223
 399
 400 If you would rather see the above table in CCSID 0037 order rather than
 401 ASCII + Latin-1 order then run the table through:
 402
 403 =over 4
 404
 405 =item recipe 2
 406
 407 =back
 408
 409     perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
 410      -e '{push(@l,$_)}' \
 411      -e 'END{print map{$_->[0]}' \
 412      -e '          sort{$a->[1] <=> $b->[1]}' \
 413      -e '          map{[$_,substr($_,42,3)]}@l;}' perlebcdic.pod
 414
 415 If you would rather see it in CCSID 1047 order then change the digit
 416 42 in the last line to 51, like this:
 417
 418 =over 4
 419
 420 =item recipe 3
 421
 422 =back
 423
 424     perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
 425      -e '{push(@l,$_)}' \
 426      -e 'END{print map{$_->[0]}' \
 427      -e '          sort{$a->[1] <=> $b->[1]}' \
 428      -e '          map{[$_,substr($_,51,3)]}@l;}' perlebcdic.pod
 429
 430 If you would rather see it in POSIX-BC order then change the digit
 431 51 in the last line to 60, like this:
 432
 433 =over 4
 434
 435 =item recipe 4
 436
 437 =back
 438
 439     perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
 440      -e '{push(@l,$_)}' \
 441      -e 'END{print map{$_->[0]}' \
 442      -e '          sort{$a->[1] <=> $b->[1]}' \
 443      -e '          map{[$_,substr($_,60,3)]}@l;}' perlebcdic.pod
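Recipes 2 through 4 differ only in the substr() offset that selects the
sort column.  The sketch below parameterizes that offset; the subroutine
name C<sort_rows> and the sample rows are assumptions made for
illustration, with values copied from the table above.

```perl
use strict;
use warnings;

# Hypothetical helper: sort table rows numerically by one column.
# $offset is the column's starting position within a row:
# 33 (8859-1), 42 (0037), 51 (1047) or 60 (POSIX-BC).
sub sort_rows {
    my ($offset, @rows) = @_;
    my @keyed;
    for my $row (@rows) {
        # Skip lines too short to have a number at the chosen column.
        next unless length($row) > $offset;
        next unless substr($row, $offset, 3) =~ /^(\d+)/;
        push @keyed, [ $row, $1 ];
    }
    return map { $_->[0] } sort { $a->[1] <=> $b->[1] } @keyed;
}

# Three sample rows in the table's layout.
my @rows = map { sprintf("    %-29s%-9d%-9d%-9d%-9d", @$_) }
    ( [ 'A', 65, 193, 193, 193 ],
      [ '<SPACE>', 32, 64, 64, 64 ],
      [ '0', 48, 240, 240, 240 ] );

print "$_\n" for sort_rows(42, @rows);   # CCSID 0037 order
```

Passing 33 instead of 42 restores ASCII + Latin-1 order.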
 444
 445
 446 =head1 IDENTIFYING CHARACTER CODE SETS
 447
 448 To determine the character set you are running under from perl one
 449 could use the return value of ord() or chr() to test one or more
 450 character values.  For example:
 451
 452     $is_ascii  = "A" eq chr(65);
 453     $is_ebcdic = "A" eq chr(193);
 454
 455 Also, "\t" is a C<HORIZONTAL TABULATION> character so that:
 456
 457     $is_ascii  = ord("\t") == 9;
 458     $is_ebcdic = ord("\t") == 5;
 459
 460 To distinguish EBCDIC code pages try looking at one or more of
 461 the characters that differ between them.  For example:
 462
 463     $is_ebcdic_37   = "\n" eq chr(37);
 464     $is_ebcdic_1047 = "\n" eq chr(21);
 465
 466 Or better still choose a character that is uniquely encoded in any
 467 of the code sets, e.g.:
 468
 469     $is_ascii           = ord('[') == 91;
 470     $is_ebcdic_37       = ord('[') == 186;
 471     $is_ebcdic_1047     = ord('[') == 173;
 472     $is_ebcdic_POSIX_BC = ord('[') == 187;
 473
 474 However, it would be unwise to write tests such as:
 475
 476     $is_ascii = "\r" ne chr(13);  #  WRONG
 477     $is_ascii = "\n" ne chr(10);  #  ILL ADVISED
 478
 479 Obviously the first of these will fail to distinguish most ASCII machines
 480 from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC machine since "\r" eq
 481 chr(13) under all of those coded character sets.  But note too that
 482 because "\n" is chr(13) and "\r" is chr(10) on the Macintosh (which is an
 483 ASCII machine) the second C<$is_ascii> test will lead to trouble there.
 484
 485 To determine whether or not perl was built under an EBCDIC
 486 code page you can use the Config module like so:
 487
 488     use Config;
 489     $is_ebcdic = $Config{'ebcdic'} eq 'define';
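The ord('[') tests shown earlier in this section can be gathered into
a single lookup.  This is a sketch only: the subroutine name
C<code_set_name> is hypothetical, and it recognizes just the four code
sets tabulated in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: name the code set from the code point of '['.
# With no argument it inspects the running platform; a code point may
# be passed in explicitly, which is handy for testing.
sub code_set_name {
    my $ord = @_ ? shift : ord('[');
    my %name_for = (
         91 => 'ASCII',
        186 => 'CCSID 0037',
        173 => 'CCSID 1047',
        187 => 'POSIX-BC',
    );
    return exists $name_for{$ord} ? $name_for{$ord} : 'unknown';
}

print code_set_name(), "\n";
```

Any other EBCDIC code page is reported as 'unknown' rather than guessed at.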
 490
 491 =head1 CONVERSIONS
 492
 493 In order to convert a string of characters from one character set to
 494 another a simple list of numbers, such as in the right columns in the
 495 above table, along with perl's tr/// operator is all that is needed.
 496 The data in the table are in ASCII order hence the EBCDIC columns
 497 provide easy to use ASCII to EBCDIC operations that are also easily
 498 reversed.
 499
 500 For example, to convert ASCII to code page 037 take the ISO 8859-1 column from
 501 the output of recipe 2 (in octal, with "\" added) and use it in tr/// like so:
 502
 503     $cp_037 =
 504     '\000\001\002\003\234\011\206\177\227\215\216\013\014\015\016\017' .
 505     '\020\021\022\023\235\205\010\207\030\031\222\217\034\035\036\037' .
 506     '\200\201\202\203\204\012\027\033\210\211\212\213\214\005\006\007' .
 507     '\220\221\026\223\224\225\226\004\230\231\232\233\024\025\236\032' .
 508     '\040\240\342\344\340\341\343\345\347\361\242\056\074\050\053\174' .
 509     '\046\351\352\353\350\355\356\357\354\337\041\044\052\051\073\254' .
 510     '\055\057\302\304\300\301\303\305\307\321\246\054\045\137\076\077' .
 511     '\370\311\312\313\310\315\316\317\314\140\072\043\100\047\075\042' .
 512     '\330\141\142\143\144\145\146\147\150\151\253\273\360\375\376\261' .
 513     '\260\152\153\154\155\156\157\160\161\162\252\272\346\270\306\244' .
 514     '\265\176\163\164\165\166\167\170\171\172\241\277\320\335\336\256' .
 515     '\136\243\245\267\251\247\266\274\275\276\133\135\257\250\264\327' .
 516     '\173\101\102\103\104\105\106\107\110\111\255\364\366\362\363\365' .
 517     '\175\112\113\114\115\116\117\120\121\122\271\373\374\371\372\377' .
 518     '\134\367\123\124\125\126\127\130\131\132\262\324\326\322\323\325' .
 519     '\060\061\062\063\064\065\066\067\070\071\263\333\334\331\332\237' ;
 520
 521     my $ebcdic_string = $ascii_string;  # tr/// does not interpolate, hence eval
 522     eval '$ebcdic_string =~ tr/' . $cp_037 . '/\000-\377/';
 523
 524 To convert from EBCDIC to ASCII just reverse the order of the tr///
 525 arguments like so:
 526
 527     my $ascii_string = $ebcdic_string;
 528     eval '$ascii_string =~ tr/\000-\377/' . $cp_037 . '/';
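Where a string eval is unwelcome, the same kind of mapping can be done
with a lookup hash.  The following is a sketch under stated assumptions:
the hash holds only a handful of rows copied from the single octet table,
and the names C<%latin1_to_037>, C<%cp037_to_latin1> and C<map_octets>
are ours.  A real conversion would need all 256 entries.

```perl
use strict;
use warnings;

# Partial, illustrative map: ISO 8859-1 code point => CCSID 0037 code
# point, copied from a few rows of the single octet table.
my %latin1_to_037   = (32 => 64, 72 => 200, 101 => 133, 108 => 147, 111 => 150);
my %cp037_to_latin1 = reverse %latin1_to_037;

# Hypothetical helper: translate each octet of $string through %$map,
# dying on any octet that the (partial) map does not cover.
sub map_octets {
    my ($map, $string) = @_;
    return join '', map {
        my $code = ord($_);
        die "no mapping for code point $code\n" unless exists $map->{$code};
        chr($map->{$code});
    } split //, $string;
}

# "Hello" spelled as ISO 8859-1 code points, so that the sketch reads
# the same on any platform.
my $latin1 = join '', map { chr($_) } 72, 101, 108, 108, 111;
my $ebcdic = map_octets(\%latin1_to_037, $latin1);

print join(' ', map { ord($_) } split //, $ebcdic), "\n";   # 200 133 147 147 150
```

Reversing the hash gives the inverse map, which works because the
tabulated mappings are one to one.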
 529
 530 XPG4 operability often implies the presence of an I<iconv> utility
 531 available from the shell or from the C library.  Consult your system's
 532 documentation for information on iconv.
 533
 534 On OS/390 see the iconv(1) man page.  One way to invoke the iconv
 535 shell utility from within perl would be to:
 536
 537     # OS/390 example
 538     $ascii_data = `echo '$ebcdic_data'| iconv -f IBM-1047 -t ISO8859-1`;
 539
 540 or the inverse map:
 541
 542     # OS/390 example
 543     $ebcdic_data = `echo '$ascii_data'| iconv -f ISO8859-1 -t IBM-1047`;
 544
 545 For other perl based conversion options see the Convert::* modules on CPAN.
 546
 547 =head1 OPERATOR DIFFERENCES
 548
 549 The C<..> range operator treats certain character ranges with
 550 care on EBCDIC machines.  For example the following array
 551 will have twenty six elements on either an EBCDIC machine
 552 or an ASCII machine:
 553
 554     @alphabet = ('A'..'Z');   #  $#alphabet == 25
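The care is needed because the letters are not contiguous in EBCDIC:
in the single octet table 'I' is 201 but 'J' is 209, and 'R' is 217
but 'S' is 226, in all three EBCDIC sets listed.  A small sketch that
reports such gaps on whatever platform it is run on (the variable names
are ours):

```perl
use strict;
use warnings;

my @alphabet = ('A' .. 'Z');

# Collect the indices of letters whose code point is not exactly one
# more than that of the preceding letter.
my @after_gap = grep {
    ord($alphabet[$_]) - ord($alphabet[$_ - 1]) != 1
} 1 .. $#alphabet;

print scalar(@alphabet), " letters\n";
print "gap before $alphabet[$_]\n" for @after_gap;
```

On an ASCII machine no gaps are reported; going by the table, the
EBCDIC sets covered here should report gaps before J and S.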
 555
 556 The bitwise operators such as & ^ | may return different results
 557 when operating on string or character data in a perl program running
 558 on an EBCDIC machine than when run on an ASCII machine.  Here is
 559 an example adapted from the one in L<perlop>:
 560
 561     # EBCDIC-based examples
 562     print "j p \n" ^ " a h";                      # prints "JAPH\n"
 563     print "JA" | "  ph\n";                        # prints "japh\n"
 564     print "JAPH\nJunk" & "\277\277\277\277\277";  # prints "japh\n";
 565     print 'p N$' ^ " E<H\n";                      # prints "Perl\n";
 566
 567 An interesting property of the 32 C0 control characters
 568 in the ASCII table is that they can "literally" be constructed
 569 as control characters in perl, e.g. C<(chr(0) eq "\c@")>,
 570 C<(chr(1) eq "\cA")>, and so on.  Perl on EBCDIC machines has been
 571 ported to take "\c@" to chr(0) and "\cA" to chr(1) as well, but the
 572 thirty three characters that result depend on which code page you are
 573 using.  The table below uses the character names from the previous table
 574 but with substitutions such as s/START OF/S.O./; s/END OF /E.O./;
 575 s/TRANSMISSION/TRANS./; s/TABULATION/TAB./; s/VERTICAL/VERT./;
 576 s/HORIZONTAL/HORIZ./; s/DEVICE CONTROL/D.C./; s/SEPARATOR/SEP./;
 577 s/NEGATIVE ACKNOWLEDGE/NEG. ACK./;.  The POSIX-BC and 1047 sets are
 578 identical throughout this range and differ from the 0037 set at only
 579 one spot (21 decimal).  Note that the C<LINE FEED> character
 580 may be generated by "\cJ" on ASCII machines but by "\cU" on 1047 or POSIX-BC
 581 machines and cannot be generated as a C<"\c.letter."> control character on
 582 0037 machines.  Note also that "\c\\" maps to two characters
 583 not one.
 584
 585     chr   ord  8859-1               0037                1047 && POSIX-BC
 586     ------------------------------------------------------------------------
 587     "\c?" 127  <DELETE>             "                   "              ***><
 588     "\c@"   0  <NULL>               <NULL>              <NULL>         ***><
 589     "\cA"   1  <S.O. HEADING>       <S.O. HEADING>      <S.O. HEADING>
 590     "\cB"   2  <S.O. TEXT>          <S.O. TEXT>         <S.O. TEXT>
 591     "\cC"   3  <E.O. TEXT>          <E.O. TEXT>         <E.O. TEXT>
 592     "\cD"   4  <E.O. TRANS.>        <C1 28>             <C1 28>
 593     "\cE"   5  <ENQUIRY>            <HORIZ. TAB.>       <HORIZ. TAB.>
 594     "\cF"   6  <ACKNOWLEDGE>        <C1 6>              <C1 6>
 595     "\cG"   7  <BELL>               <DELETE>            <DELETE>
 596     "\cH"   8  <BACKSPACE>          <C1 23>             <C1 23>
 597     "\cI"   9  <HORIZ. TAB.>        <C1 13>             <C1 13>
 598     "\cJ"  10  <LINE FEED>          <C1 14>             <C1 14>
 599     "\cK"  11  <VERT. TAB.>         <VERT. TAB.>        <VERT. TAB.>
 600     "\cL"  12  <FORM FEED>          <FORM FEED>         <FORM FEED>
 601     "\cM"  13  <CARRIAGE RETURN>    <CARRIAGE RETURN>   <CARRIAGE RETURN>
 602     "\cN"  14  <SHIFT OUT>          <SHIFT OUT>         <SHIFT OUT>
 603     "\cO"  15  <SHIFT IN>           <SHIFT IN>          <SHIFT IN>
 604     "\cP"  16  <DATA LINK ESCAPE>   <DATA LINK ESCAPE>  <DATA LINK ESCAPE>
 605     "\cQ"  17  <D.C. ONE>           <D.C. ONE>          <D.C. ONE>
 606     "\cR"  18  <D.C. TWO>           <D.C. TWO>          <D.C. TWO>
 607     "\cS"  19  <D.C. THREE>         <D.C. THREE>        <D.C. THREE>
 608     "\cT"  20  <D.C. FOUR>          <C1 29>             <C1 29>
 609     "\cU"  21  <NEG. ACK.>          <C1 5>              <LINE FEED>    ***
 610     "\cV"  22  <SYNCHRONOUS IDLE>   <BACKSPACE>         <BACKSPACE>
 611     "\cW"  23  <E.O. TRANS. BLOCK>  <C1 7>              <C1 7>
 612     "\cX"  24  <CANCEL>             <CANCEL>            <CANCEL>
 613     "\cY"  25  <E.O. MEDIUM>        <E.O. MEDIUM>       <E.O. MEDIUM>
 614     "\cZ"  26  <SUBSTITUTE>         <C1 18>             <C1 18>
 615     "\c["  27  <ESCAPE>             <C1 15>             <C1 15>
 616     "\c\\" 28  <FILE SEP.>\         <FILE SEP.>\        <FILE SEP.>\
 617     "\c]"  29  <GROUP SEP.>         <GROUP SEP.>        <GROUP SEP.>
 618     "\c^"  30  <RECORD SEP.>        <RECORD SEP.>       <RECORD SEP.>  ***><
 619     "\c_"  31  <UNIT SEP.>          <UNIT SEP.>         <UNIT SEP.>    ***><
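A short sketch of how one might ask the running perl which C<"\c.letter.">
escape, if any, compares equal to C<"\n">.  Only the ASCII result (C<\cJ>)
is certain here.  The table above suggests C<\cU> on 1047 or POSIX-BC, and
it is an assumption that C<"\n"> is the C<LINE FEED> character on every
EBCDIC platform.

```perl
use strict;
use warnings;

# Collect every "\c<letter>" escape that compares equal to "\n" here.
my @found;
for my $letter ('A' .. 'Z') {
    my $char = eval qq{"\\c$letter"};   # evaluates "\cA", "\cB", ...
    push @found, "\\c$letter" if defined $char && $char eq "\n";
}

print @found
    ? "newline matches: @found\n"
    : "no \\c<letter> escape matches newline\n";
```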
 620
 621
 622 =head1 FUNCTION DIFFERENCES
 623
 624 =over 8
 625
 626 =item chr()
 627
 628 chr() must be given an EBCDIC code number argument to yield a desired
 629 character return value on an EBCDIC machine.  For example:
 630
 631     $CAPITAL_LETTER_A = chr(193);
 632
 633 =item ord()
 634
 635 ord() will return EBCDIC code number values on an EBCDIC machine.
 636 For example:
 637
 638     $the_number_193 = ord("A");
 639
 640 =item pack()
 641
 642 The c and C templates for pack() are dependent upon character set
 643 encoding.  Examples of usage on EBCDIC include:
 644
 645     $foo = pack("CCCC",193,194,195,196);
 646     # $foo eq "ABCD"
 647     $foo = pack("C4",193,194,195,196);
 648     # same thing
 649
 650     $foo = pack("ccxxcc",193,194,195,196);
 651     # $foo eq "AB\0\0CD"
 652
 653 =item print()
 654
 655 One must be careful with scalars and strings that are passed to
 656 print that contain ASCII encodings.  One common place
 657 for this to occur is in the output of the MIME type header for
 658 CGI script writing.  For example, many perl programming guides
 659 recommend something similar to:
 660
 661     print "Content-type:\ttext/html\015\012\015\012";
 662     # this may be wrong on EBCDIC
 663
 664 Under the IBM OS/390 USS Web Server for example you should instead
 665 write that as:
 666
 667     print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et alia
 668
 669 That is because the translation from EBCDIC to ASCII is done
 670 by the web server in this case (such code will not be appropriate for
 671 the Macintosh however).  Consult your web server's documentation for
 672 further details.
 673
 674 =item printf()
 675
 676 The formats that can convert characters to numbers and vice versa
 677 will be different from their ASCII counterparts when executed
 678 on an EBCDIC machine.  Examples include:
 679
 680     printf("%c%c%c",193,194,195);  # prints ABC
 681
 682 =item sort()
 683
 684 EBCDIC sort results may differ from ASCII sort results especially for
 685 mixed case strings.  This is discussed in more detail below.
 686
 687 =item sprintf()
 688
 689 See the discussion of printf() above.  An example of the use
 690 of sprintf would be:
 691
 692     $CAPITAL_LETTER_A = sprintf("%c",193);
 693
 694 =item unpack()
 695
 696 See the discussion of pack() above.
 697
 698 =back
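Because chr() and ord() work with native code numbers, a program can use
them to guess which coded character set it is running under.  The sketch
below is one hedged way to do that.  The helper name C<guess_code_set> is
hypothetical, and the four values tested for C<ord('^')> are the ones this
document uses in its regular expression examples.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the coded character set from ord('^'),
# using the values quoted elsewhere in this document.
sub guess_code_set {
    my $caret = ord('^');
    return 'ascii'    if $caret == 94;
    return '0037'     if $caret == 176;
    return '1047'     if $caret == 95;
    return 'posix-bc' if $caret == 106;
    return 'unknown';
}

my $code_set  = guess_code_set();
my $capital_a = ord('A');   # 65 under ASCII, 193 under the EBCDIC sets

print "code set: $code_set, ord('A') is $capital_a\n";
```

Code that must build characters from numbers can then branch on the result
instead of hard coding one platform's values.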
 699
 700 =head1 REGULAR EXPRESSION DIFFERENCES
 701
 702 As of perl 5.005_03 the letter range regular expressions such as
 703 [A-Z] and [a-z] have been especially coded to not pick up gap
 704 characters.  For example, characters such as E<ocirc> C<o WITH CIRCUMFLEX>
 705 that lie between I and J would not be matched by the
 706 regular expression range C</[H-K]/>.
 707
 708 If you do want to match the alphabet gap characters in a single octet
 709 regular expression try matching the hex or octal code such
 710 as C</\313/> on EBCDIC or C</\364/> on ASCII machines to
 711 have your regular expression match C<o WITH CIRCUMFLEX>.
 712
 713 Another construct to be wary of is the inappropriate use of hex or
 714 octal constants in regular expressions.  Consider the following
 715 set of subs:
 716
 717     sub is_c0 {
 718         my $char = substr(shift,0,1);
 719         $char =~ /[\000-\037]/;
 720     }
 721
 722     sub is_print_ascii {
 723         my $char = substr(shift,0,1);
 724         $char =~ /[\040-\176]/;
 725     }
 726
 727     sub is_delete {
 728         my $char = substr(shift,0,1);
 729         $char eq "\177";
 730     }
 731
 732     sub is_c1 {
 733         my $char = substr(shift,0,1);
 734         $char =~ /[\200-\237]/;
 735     }
 736
 737     sub is_latin_1 {
 738         my $char = substr(shift,0,1);
 739         $char =~ /[\240-\377]/;
 740     }
 741
 742 The above would be adequate if the concern was only with numeric code points.
 743 However, the concern may be with characters rather than code points
 744 and on an EBCDIC machine it may be desirable for constructs such as
 745 C<if (is_print_ascii("A")) {print "A is a printable character\n";}> to print
 746 out the expected message.  One way to represent the above collection
 747 of character classification subs that is capable of working across the
 748 four coded character sets discussed in this document is as follows:
 749
 750     sub Is_c0 {
 751         my $char = substr(shift,0,1);
 752         if (ord('^')==94)  { # ascii
 753             return $char =~ /[\000-\037]/;
 754         }
 755         if (ord('^')==176) { # 37
 756             return $char =~ /[\000-\003\067\055-\057\026\005\045\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
 757         }
 758         if (ord('^')==95 || ord('^')==106) { # 1047 || posix-bc
 759             return $char =~ /[\000-\003\067\055-\057\026\005\025\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
 760         }
 761     }
 762
 763     sub Is_print_ascii {
 764         my $char = substr(shift,0,1);
 765         $char =~ /[ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/;
 766     }
 767
 768     sub Is_delete {
 769         my $char = substr(shift,0,1);
 770         if (ord('^')==94)  { # ascii
 771             return $char eq "\177";
 772         }
 773         else  {              # ebcdic
 774             return $char eq "\007";
 775         }
 776     }
 777
 778     sub Is_c1 {
 779         my $char = substr(shift,0,1);
 780         if (ord('^')==94)  { # ascii
 781             return $char =~ /[\200-\237]/;
 782         }
 783         if (ord('^')==176) { # 37
 784             return $char =~ /[\040-\044\025\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\377]/;
 785         }
 786         if (ord('^')==95)  { # 1047
 787             return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\377]/;
 788         }
 789         if (ord('^')==106) { # posix-bc
 790             return $char =~
 791               /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\137]/;
 792         }
 793     }
 794
 795     sub Is_latin_1 {
 796         my $char = substr(shift,0,1);
 797         if (ord('^')==94)  { # ascii
 798             return $char =~ /[\240-\377]/;
 799         }
 800         if (ord('^')==176) { # 37
 801             return $char =~
 802               /[\101\252\112\261\237\262\152\265\275\264\232\212\137\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
 803         }
 804         if (ord('^')==95)  { # 1047
 805             return $char =~
 806               /[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
 807         }
 808         if (ord('^')==106) { # posix-bc
 809             return $char =~
 810               /[\101\252\260\261\237\262\320\265\171\264\232\212\272\312\257\241\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\340\376\335\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\300\336\333\334\215\216\337]/;
 811         }
 812     }
 813
 814 Note however that only the C<Is_print_ascii()> sub is really independent
 815 of coded character set.  Another way to write C<Is_latin_1()> would be
 816 to use the characters in the range explicitly:
 817
 818     sub Is_latin_1 {
 819         my $char = substr(shift,0,1);
 820         $char =~ /[ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]/;
 821     }
 822
 823 That form, however, may run into trouble in network transit (due to the
 824 presence of 8 bit characters) or on non ISO-Latin character sets.
 825
 826 =head1 SOCKETS
 827
 828 Most socket programming assumes ASCII character encodings in network
 829 byte order.  Exceptions can include CGI script writing under a
 830 host web server where the server may take care of translation for you.
 831 Most host web servers convert EBCDIC data to ISO-8859-1 or Unicode on
 832 output.
 833
 834 =head1 SORTING
 835
 836 One big difference between ASCII based character sets and EBCDIC ones
 837 is in the relative positions of upper and lower case letters and the
 838 letters compared to the digits.  If sorted on an ASCII based machine the
 839 two letter abbreviation for a physician comes before the two letter
 840 abbreviation for drive, that is:
 841
 842     @sorted = sort(qw(Dr. dr.));  # @sorted holds ('Dr.','dr.') on ASCII,
 843                                   # but ('dr.','Dr.') on EBCDIC
 844
 845 The property of lower case before uppercase letters in EBCDIC is
 846 even carried to the Latin 1 EBCDIC pages such as 0037 and 1047.
 847 An example would be that E<Euml> C<E WITH DIAERESIS> (203) comes
 848 before E<euml> C<e WITH DIAERESIS> (235) on an ASCII machine, but
 849 the latter (83) comes before the former (115) on an EBCDIC machine.
 850 (Astute readers will note that the upper case version of E<szlig>
 851 C<SMALL LETTER SHARP S> is simply "SS" and that the upper case version of
 852 E<yuml> C<y WITH DIAERESIS> is not in the 0..255 range but it is
 853 at U+0178 in Unicode, or C<"\x{178}"> in a Unicode enabled Perl).
 854
 855 The sort order will cause differences between results obtained on
 856 ASCII machines versus EBCDIC machines.  What follows are some suggestions
 857 on how to deal with these differences.
 858
 859 =head2 Ignore ASCII vs. EBCDIC sort differences.
 860
 861 This is the least computationally expensive strategy.  It may require
 862 some user education.
 863
 864 =head2 MONO CASE then sort data.
 865
 866 In order to minimize the expense of mono casing mixed text try to
 867 C<tr///> towards the character set case most employed within the data.
 868 If the data are primarily UPPERCASE non Latin 1 then apply tr/[a-z]/[A-Z]/
 869 then sort().  If the data are primarily lowercase non Latin 1 then
 870 apply tr/[A-Z]/[a-z]/ before sorting.  If the data are primarily UPPERCASE
 871 and include Latin-1 characters then apply:
 872
 873     tr/[a-z]/[A-Z]/;
 874     tr/[àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ]/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ]/;
 875     s/ß/SS/g;
 876
 877 then sort().  Do note however that such Latin-1 manipulation does not
 878 address the E<yuml> C<y WITH DIAERESIS> character that will remain at
 879 code point 255 on ASCII machines, but 223 on most EBCDIC machines
 880 where it will sort to a place less than the EBCDIC numerals.  With a
 881 Unicode enabled Perl you might try:
 882
 883     tr/ÿ/\x{178}/;
 884
 885 The strategy of mono casing data before sorting does not preserve the case
 886 of the data and may not be acceptable for that reason.
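If losing the case is the only objection, one alternative is to compare
mono cased copies while leaving the data itself alone.  This is a sketch
of our own, not something the strategy above prescribes.  It assumes
plain C<uc()>, which without locale or Unicode support folds only a-z,
and it removes only the case related differences: digits still sort
before letters on ASCII and after them on EBCDIC.

```perl
use strict;
use warnings;

my @data = qw(dr. Dr. apple Apple);

# Compare upper cased copies of each pair; when two entries differ only
# in case, fall back to the platform's native code point order.
my @sorted = sort { uc($a) cmp uc($b) or $a cmp $b } @data;

print "@sorted\n";
```

The entries keep their original case, and only the order within a pair
such as C<Dr.> and C<dr.> still depends on the machine.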
 887
 888 =head2 Convert, sort data, then re convert.
 889
 890 This is the most expensive proposition that does not employ a network
 891 connection.
 892
 893 =head2 Perform sorting on one type of machine only.
 894
 895 This strategy can employ a network connection.  As such
 896 it would be computationally expensive.
 897
 898 =head1 URL ENCODING and DECODING
 899
 900 Note that some URLs have hexadecimal ASCII code points in them in an
 901 attempt to overcome character limitation issues.  For example the
 902 tilde character is not on every keyboard hence a URL of the form:
 903
 904     http://www.pvhp.com/~pvhp/
 905
 906 may also be expressed as either of:
 907
 908     http://www.pvhp.com/%7Epvhp/
 909
 910     http://www.pvhp.com/%7epvhp/
 911
 912 where 7E is the hexadecimal ASCII code point for '~'.  Here is an example
 913 of decoding such a URL under CCSID 1047:
 914
 915     $url = 'http://www.pvhp.com/%7Epvhp/';
 916     # this array assumes code page 1047
 917     my @a2e_1047 = (
 918           0,  1,  2,  3, 55, 45, 46, 47, 22,  5, 21, 11, 12, 13, 14, 15,
 919          16, 17, 18, 19, 60, 61, 50, 38, 24, 25, 63, 39, 28, 29, 30, 31,
 920          64, 90,127,123, 91,108, 80,125, 77, 93, 92, 78,107, 96, 75, 97,
 921         240,241,242,243,244,245,246,247,248,249,122, 94, 76,126,110,111,
 922         124,193,194,195,196,197,198,199,200,201,209,210,211,212,213,214,
 923         215,216,217,226,227,228,229,230,231,232,233,173,224,189, 95,109,
 924         121,129,130,131,132,133,134,135,136,137,145,146,147,148,149,150,
 925         151,152,153,162,163,164,165,166,167,168,169,192, 79,208,161,  7,
 926          32, 33, 34, 35, 36, 37,  6, 23, 40, 41, 42, 43, 44,  9, 10, 27,
 927          48, 49, 26, 51, 52, 53, 54,  8, 56, 57, 58, 59,  4, 20, 62,255,
 928          65,170, 74,177,159,178,106,181,187,180,154,138,176,202,175,188,
 929         144,143,234,250,190,160,182,179,157,218,155,139,183,184,185,171,
 930         100,101, 98,102, 99,103,158,104,116,113,114,115,120,117,118,119,
 931         172,105,237,238,235,239,236,191,128,253,254,251,252,186,174, 89,
 932          68, 69, 66, 70, 67, 71,156, 72, 84, 81, 82, 83, 88, 85, 86, 87,
 933         140, 73,205,206,203,207,204,225,112,221,222,219,220,141,142,223
 934     );
 935     $url =~ s/%([0-9a-fA-F]{2})/pack("C",$a2e_1047[hex($1)])/ge;
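Encoding goes the other way and needs the inverse table.  The following is
a minimal sketch under stated assumptions: C<invert_table> and
C<url_encode> are hypothetical helper names, the set of characters left
unencoded is chosen only for illustration, and the demonstration uses an
identity table (what an ASCII machine would need) so that the block is
self contained.  On a 1047 machine one would pass the 256 element
ASCII to EBCDIC array shown above instead.

```perl
use strict;
use warnings;

# Invert a 256 entry ASCII-to-EBCDIC table into an EBCDIC-to-ASCII one.
sub invert_table {
    my @a2e = @_;
    my @e2a;
    $e2a[ $a2e[$_] ] = $_ for 0 .. $#a2e;
    return @e2a;
}

# Percent-encode each character outside an illustrative "safe" set,
# writing the ASCII code point looked up in the table referenced by $e2a.
sub url_encode {
    my ($string, $e2a) = @_;
    $string =~ s/([^A-Za-z0-9_.\/:-])/sprintf("%%%02X", $e2a->[ord($1)])/ge;
    return $string;
}

# Identity table: native code points already are ASCII code points.
my @identity = (0 .. 255);
my @e2a      = invert_table(@identity);

print url_encode('http://www.pvhp.com/~pvhp/', \@e2a), "\n";
```

The lookup is by native code point, so the same C<url_encode> should work
unchanged once it is given a table that matches the local code page.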
 936
 937 =head1 I18N AND L10N
 938
 939 Internationalization (I18N) and localization (L10N) are supported at least
 940 in principle even on EBCDIC machines.  The details are system dependent
 941 and discussed under the L<perlebcdic/OS ISSUES> section below.
 942
 943 =head1 MULTI OCTET CHARACTER SETS
 944
 945 Multi byte EBCDIC code pages; Unicode, UTF-8, UTF-EBCDIC, XXX.
 946
 947 =head1 OS ISSUES
 948
 949 There may be a few system dependent issues
 950 of concern to EBCDIC Perl programmers.
 951
 952 =head2 OS/400
 953
 954 The PASE environment.
 955
 956 =over 8
 957
 958 =item IFS access
 959
 960 XXX.
 961
 962 =back
 963
 964 =head2 OS/390
 965
 966 Perl runs under UNIX System Services, or USS.
 967
 968 =over 8
 969
 970 =item chcp
 971
 972 L<chcp> is supported as a shell utility for displaying and changing
 973 one's code page.
 974
 975 =item dataset access
 976
 977 For sequential data set access try:
 978
 979     my @ds_records = `cat //DSNAME`;
 980
 981 or:
 982
 983     my @ds_records = `cat //'HLQ.DSNAME'`;
 984
 985 See also the OS390::Stdio module on CPAN.
 986
 987 =item iconv
 988
 989 L<iconv> is supported as both a shell utility and a C RTL routine.
 990
 991 =item locales
 992
 993 On OS/390 see L<locale> for information on locales.  The L10N files
 994 are in F</usr/nls/locale>.  $Config{d_setlocale} is 'define' on OS/390.
 995
 996 =back
 997
 998 =head2 VM/ESA?
 999
1000 XXX.
1001
1002 =head2 POSIX-BC?
1003
1004 XXX.
1005
1006 =head1 BUGS
1007
1008 This pod document contains literal Latin 1 characters and may encounter
1009 translation difficulties.  In particular one popular nroff implementation
1010 was known to strip accented characters to their unaccented counterparts
1011 while attempting to view this document through the B<pod2man> program
1012 (for example, you may see a plain C<y> rather than one with a diaeresis
1013 as in E<yuml>).  Another nroff truncated the resultant man page at
1014 the first occurrence of 8 bit characters.
1015
1016 Not all shells will allow multiple C<-e> string arguments to perl to
1017 be concatenated together properly as recipes 2, 3, and 4 might seem
1018 to imply.
1019
1020 Perl does not yet work with any Unicode features on EBCDIC platforms.
1021
1022 =head1 SEE ALSO
1023
1024 L<perllocale>, L<perlfunc>.
1025
1026 =head1 REFERENCES
1027
1028 http://anubis.dkuug.dk/i18n/charmaps
1029
1030 http://www.unicode.org/
1031
1032 http://www.unicode.org/unicode/reports/tr16/
1033
1034 http://www.wps.com/texts/codes/
1035 B<ASCII: American Standard Code for Information Infiltration> Tom Jennings,
1036 September 1999.
1037
1038 B<The Unicode Standard Version 2.0> The Unicode Consortium,
1039 ISBN 0-201-48345-9, Addison Wesley Developers Press, July 1996.
1040
1041 B<The Unicode Standard Version 3.0> The Unicode Consortium, Lisa Moore ed.,
1042 ISBN 0-201-61633-5, Addison Wesley Developers Press, February 2000.
1043
1044 B<CDRA: IBM - Character Data Representation Architecture -
1045 Reference and Registry>, IBM SC09-2190-00, December 1996.
1046
1047 "Demystifying Character Sets", Andrea Vine, Multilingual Computing
1048 & Technology, B<#26 Vol. 10 Issue 4>, August/September 1999;
1049 ISSN 1523-0309; Multilingual Computing Inc. Sandpoint ID, USA.
1050
1051 =head1 AUTHOR
1052
1053 Peter Prymmer pvhp@best.com wrote this in 1999 and 2000
1054 with CCSID 0819 and 0037 help from Chris Leach and
1055 AndrE<eacute> Pirard A.Pirard@ulg.ac.be as well as POSIX-BC
1056 help from Thomas Dorner Thomas.Dorner@start.de.
1057 Thanks also to Philip Newton and Vickie Cooper.  Trademarks, registered
1058 trademarks, service marks and registered service marks used in this
1059 document are the property of their respective owners.
1060
1061