=head1 NAME

perlebcdic - Considerations for running Perl on EBCDIC platforms

=head1 DESCRIPTION

An exploration of some of the issues facing Perl programmers
on EBCDIC-based computers. We do not cover localization,
internationalization, or multibyte character set issues (yet).

Portions that are still incomplete are marked with XXX.

=head1 COMMON CHARACTER CODE SETS

=head2 ASCII

The American Standard Code for Information Interchange is a set of
integers running from 0 to 127 (decimal) that computer displays and
other systems interpret as characters. Every value in the range 0..127
fits in 7 binary digits (bits), hence the set is sometimes referred to
as "7-bit ASCII". ASCII was described by the American National
Standards Institute document ANSI X3.4-1986. It was also described by
ISO 646:1991 (with localization for currency symbols). The full ASCII
set is given in the table below as the first 128 elements. Languages
that can be written adequately with the characters in ASCII include
English, Hawaiian, Indonesian, Swahili and some Native American
languages.

=head2 ISO 8859

The ISO 8859-$n are a collection of character code sets from the
International Organization for Standardization (ISO), each of which
adds to the ASCII set characters that are typically found in European
languages, many of which are based on the Roman, or Latin, alphabet.

=head2 Latin 1 (ISO 8859-1)

A particular 8-bit extension to ASCII that includes grave and acute
accented Latin characters. Languages that can employ ISO 8859-1
include all the languages covered by ASCII as well as Afrikaans,
Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
Portuguese, Spanish, and Swedish. Dutch is covered, albeit without
the ij ligature. French is covered too, but without the oe ligature.
German can use ISO 8859-1 but must do so without German-style
quotation marks. This set is based on Western European extensions
to ASCII and is commonly encountered in World Wide Web work.
In IBM character code set identification terminology ISO 8859-1 is
known as CCSID 819 (or sometimes 0819 or even 00819).

=head2 EBCDIC

Extended Binary Coded Decimal Interchange Code. The EBCDIC acronym
refers to a large collection of slightly different single and
multibyte coded character sets that are different from ASCII or
ISO 8859-1 and are typically used on host computers. The
EBCDIC encodings derive from Hollerith punched card encodings.
The layout on the cards was such that high bits were set for the
upper and lower case alphabet characters [a-z] and [A-Z], but there
were gaps within each Latin alphabet range (in CCSID 0037, for
example, "I" is 201 but "J" is 209).

=head2 13 variant characters

XXX.

EBCDIC character sets may be known by character code set identification
numbers (CCSID numbers) or code page numbers.

=head2 0037

Character code set ID 0037 is a mapping of the ASCII plus Latin-1
characters (i.e. ISO 8859-1) to an EBCDIC set. 0037 is used
on the OS/400 operating system that runs on AS/400 computers.
CCSID 0037 differs from ISO 8859-1 in 236 places; in other words
they agree on only 20 code point values.

=head2 1047

Character code set ID 1047 is also a mapping of the ASCII plus
Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set. 1047 is
used under Unix System Services for OS/390 and OpenEdition for VM/ESA.
CCSID 1047 differs from CCSID 0037 in eight places.

=head2 POSIX-BC

The EBCDIC code page in use on Siemens' BS2000 system is distinct from
1047 and 0037. It is identified below as the POSIX-BC set.

=head1 SINGLE OCTET TABLES

The following tables list the ASCII and Latin 1 ordered sets, including
the subsets: C0 controls (0x00..0x1f), ASCII graphics (0x20..0x7e),
delete (0x7f), C1 controls (0x80..0x9f), and Latin-1 (a.k.a. ISO 8859-1)
(0xa0..0xff). In the table, non-printing control character names as well
as the Latin 1 extensions to ASCII have been labelled with character
names roughly corresponding to I<The Unicode Standard, Version 2.0>,
albeit with substitutions such as s/LATIN// and s/VULGAR// in all cases,
s/CAPITAL LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/
in some other cases. The "names" of the C1 control set
(128..159 in ISO 8859-1) are somewhat arbitrary. The differences
between the 0037 and 1047 sets are flagged with ***. The differences
between the 1047 and POSIX-BC sets are flagged with ###.
All ord() numbers listed are decimal. If you would rather see this
table listing octal values, then run the table (that is, the pod
version of this document, since this recipe may not work with
a pod2XXX translation to another format) through:

=over 4

=item recipe 0

=back

    perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
     -e '{printf("%s%-9o%-9o%-9o%-9o\n",$1,$2,$3,$4,$5)}' perlebcdic.pod

If you would rather see this table listing hexadecimal values, then
run the table through:

=over 4

=item recipe 1

=back

    perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \
     -e '{printf("%s%-9X%-9X%-9X%-9X\n",$1,$2,$3,$4,$5)}' perlebcdic.pod


                                 8859-1
    chr                          0819     0037     1047     POSIX-BC
    ----------------------------------------------------------------
    <NULL>                       0        0        0        0
    <START OF HEADING>           1        1        1        1
    <START OF TEXT>              2        2        2        2
    <END OF TEXT>                3        3        3        3
    <END OF TRANSMISSION>        4        55       55       55
    <ENQUIRY>                    5        45       45       45
    <ACKNOWLEDGE>                6        46       46       46
    <BELL>                       7        47       47       47
    <BACKSPACE>                  8        22       22       22
    <HORIZONTAL TABULATION>      9        5        5        5
    <LINE FEED>                  10       37       21       21       ***
    <VERTICAL TABULATION>        11       11       11       11
    <FORM FEED>                  12       12       12       12
    <CARRIAGE RETURN>            13       13       13       13
    <SHIFT OUT>                  14       14       14       14
    <SHIFT IN>                   15       15       15       15
    <DATA LINK ESCAPE>           16       16       16       16
    <DEVICE CONTROL ONE>         17       17       17       17
    <DEVICE CONTROL TWO>         18       18       18       18
    <DEVICE CONTROL THREE>       19       19       19       19
    <DEVICE CONTROL FOUR>        20       60       60       60
    <NEGATIVE ACKNOWLEDGE>       21       61       61       61
    <SYNCHRONOUS IDLE>           22       50       50       50
    <END OF TRANSMISSION BLOCK>  23       38       38       38
    <CANCEL>                     24       24       24       24
    <END OF MEDIUM>              25       25       25       25
    <SUBSTITUTE>                 26       63       63       63
    <ESCAPE>                     27       39       39       39
    <FILE SEPARATOR>             28       28       28       28
    <GROUP SEPARATOR>            29       29       29       29
    <RECORD SEPARATOR>           30       30       30       30
    <UNIT SEPARATOR>             31       31       31       31
    <SPACE>                      32       64       64       64
    !                            33       90       90       90
    "                            34       127      127      127
    #                            35       123      123      123
    $                            36       91       91       91
    %                            37       108      108      108
    &                            38       80       80       80
    '                            39       125      125      125
    (                            40       77       77       77
    )                            41       93       93       93
    *                            42       92       92       92
    +                            43       78       78       78
    ,                            44       107      107      107
    -                            45       96       96       96
    .                            46       75       75       75
    /                            47       97       97       97
    0                            48       240      240      240
    1                            49       241      241      241
    2                            50       242      242      242
    3                            51       243      243      243
    4                            52       244      244      244
    5                            53       245      245      245
    6                            54       246      246      246
    7                            55       247      247      247
    8                            56       248      248      248
    9                            57       249      249      249
    :                            58       122      122      122
    ;                            59       94       94       94
    <                            60       76       76       76
    =                            61       126      126      126
    >                            62       110      110      110
    ?                            63       111      111      111
    @                            64       124      124      124
    A                            65       193      193      193
    B                            66       194      194      194
    C                            67       195      195      195
    D                            68       196      196      196
    E                            69       197      197      197
    F                            70       198      198      198
    G                            71       199      199      199
    H                            72       200      200      200
    I                            73       201      201      201
    J                            74       209      209      209
    K                            75       210      210      210
    L                            76       211      211      211
    M                            77       212      212      212
    N                            78       213      213      213
    O                            79       214      214      214
    P                            80       215      215      215
    Q                            81       216      216      216
    R                            82       217      217      217
    S                            83       226      226      226
    T                            84       227      227      227
    U                            85       228      228      228
    V                            86       229      229      229
    W                            87       230      230      230
    X                            88       231      231      231
    Y                            89       232      232      232
    Z                            90       233      233      233
    [                            91       186      173      187      *** ###
    \                            92       224      224      188          ###
    ]                            93       187      189      189      ***
    ^                            94       176      95       106      *** ###
    _                            95       109      109      109
    `                            96       121      121      74           ###
    a                            97       129      129      129
    b                            98       130      130      130
    c                            99       131      131      131
    d                            100      132      132      132
    e                            101      133      133      133
    f                            102      134      134      134
    g                            103      135      135      135
    h                            104      136      136      136
    i                            105      137      137      137
    j                            106      145      145      145
    k                            107      146      146      146
    l                            108      147      147      147
    m                            109      148      148      148
    n                            110      149      149      149
    o                            111      150      150      150
    p                            112      151      151      151
    q                            113      152      152      152
    r                            114      153      153      153
    s                            115      162      162      162
    t                            116      163      163      163
    u                            117      164      164      164
    v                            118      165      165      165
    w                            119      166      166      166
    x                            120      167      167      167
    y                            121      168      168      168
    z                            122      169      169      169
    {                            123      192      192      251          ###
    |                            124      79       79       79
    }                            125      208      208      253          ###
    ~                            126      161      161      255          ###
    <DELETE>                     127      7        7        7
    <C1 0>                       128      32       32       32
    <C1 1>                       129      33       33       33
    <C1 2>                       130      34       34       34
    <C1 3>                       131      35       35       35
    <C1 4>                       132      36       36       36
    <C1 5>                       133      21       37       37       ***
    <C1 6>                       134      6        6        6
    <C1 7>                       135      23       23       23
    <C1 8>                       136      40       40       40
    <C1 9>                       137      41       41       41
    <C1 10>                      138      42       42       42
    <C1 11>                      139      43       43       43
    <C1 12>                      140      44       44       44
    <C1 13>                      141      9        9        9
    <C1 14>                      142      10       10       10
    <C1 15>                      143      27       27       27
    <C1 16>                      144      48       48       48
    <C1 17>                      145      49       49       49
    <C1 18>                      146      26       26       26
    <C1 19>                      147      51       51       51
    <C1 20>                      148      52       52       52
    <C1 21>                      149      53       53       53
    <C1 22>                      150      54       54       54
    <C1 23>                      151      8        8        8
    <C1 24>                      152      56       56       56
    <C1 25>                      153      57       57       57
    <C1 26>                      154      58       58       58
    <C1 27>                      155      59       59       59
    <C1 28>                      156      4        4        4
    <C1 29>                      157      20       20       20
    <C1 30>                      158      62       62       62
    <C1 31>                      159      255      255      95           ###
    <NON-BREAKING SPACE>         160      65       65       65
    <INVERTED EXCLAMATION MARK>  161      170      170      170
    <CENT SIGN>                  162      74       74       176          ###
    <POUND SIGN>                 163      177      177      177
    <CURRENCY SIGN>              164      159      159      159
    <YEN SIGN>                   165      178      178      178
    <BROKEN BAR>                 166      106      106      208          ###
    <SECTION SIGN>               167      181      181      181
    <DIAERESIS>                  168      189      187      121      *** ###
    <COPYRIGHT SIGN>             169      180      180      180
    <FEMININE ORDINAL INDICATOR> 170      154      154      154
    <LEFT POINTING GUILLEMET>    171      138      138      138
    <NOT SIGN>                   172      95       176      186      *** ###
    <SOFT HYPHEN>                173      202      202      202
    <REGISTERED TRADE MARK SIGN> 174      175      175      175
    <MACRON>                     175      188      188      161          ###
    <DEGREE SIGN>                176      144      144      144
    <PLUS-OR-MINUS SIGN>         177      143      143      143
    <SUPERSCRIPT TWO>            178      234      234      234
    <SUPERSCRIPT THREE>          179      250      250      250
    <ACUTE ACCENT>               180      190      190      190
    <MICRO SIGN>                 181      160      160      160
    <PARAGRAPH SIGN>             182      182      182      182
    <MIDDLE DOT>                 183      179      179      179
    <CEDILLA>                    184      157      157      157
    <SUPERSCRIPT ONE>            185      218      218      218
    <MASC. ORDINAL INDICATOR>    186      155      155      155
    <RIGHT POINTING GUILLEMET>   187      139      139      139
    <FRACTION ONE QUARTER>       188      183      183      183
    <FRACTION ONE HALF>          189      184      184      184
    <FRACTION THREE QUARTERS>    190      185      185      185
    <INVERTED QUESTION MARK>     191      171      171      171
    <A WITH GRAVE>               192      100      100      100
    <A WITH ACUTE>               193      101      101      101
    <A WITH CIRCUMFLEX>          194      98       98       98
    <A WITH TILDE>               195      102      102      102
    <A WITH DIAERESIS>           196      99       99       99
    <A WITH RING ABOVE>          197      103      103      103
    <CAPITAL LIGATURE AE>        198      158      158      158
    <C WITH CEDILLA>             199      104      104      104
    <E WITH GRAVE>               200      116      116      116
    <E WITH ACUTE>               201      113      113      113
    <E WITH CIRCUMFLEX>          202      114      114      114
    <E WITH DIAERESIS>           203      115      115      115
    <I WITH GRAVE>               204      120      120      120
    <I WITH ACUTE>               205      117      117      117
    <I WITH CIRCUMFLEX>          206      118      118      118
    <I WITH DIAERESIS>           207      119      119      119
    <CAPITAL LETTER ETH>         208      172      172      172
    <N WITH TILDE>               209      105      105      105
    <O WITH GRAVE>               210      237      237      237
    <O WITH ACUTE>               211      238      238      238
    <O WITH CIRCUMFLEX>          212      235      235      235
    <O WITH TILDE>               213      239      239      239
    <O WITH DIAERESIS>           214      236      236      236
    <MULTIPLICATION SIGN>        215      191      191      191
    <O WITH STROKE>              216      128      128      128
    <U WITH GRAVE>               217      253      253      224          ###
    <U WITH ACUTE>               218      254      254      254
    <U WITH CIRCUMFLEX>          219      251      251      221          ###
    <U WITH DIAERESIS>           220      252      252      252
    <Y WITH ACUTE>               221      173      186      173      *** ###
    <CAPITAL LETTER THORN>       222      174      174      174
    <SMALL LETTER SHARP S>       223      89       89       89
    <a WITH GRAVE>               224      68       68       68
    <a WITH ACUTE>               225      69       69       69
    <a WITH CIRCUMFLEX>          226      66       66       66
    <a WITH TILDE>               227      70       70       70
    <a WITH DIAERESIS>           228      67       67       67
    <a WITH RING ABOVE>          229      71       71       71
    <SMALL LIGATURE ae>          230      156      156      156
    <c WITH CEDILLA>             231      72       72       72
    <e WITH GRAVE>               232      84       84       84
    <e WITH ACUTE>               233      81       81       81
    <e WITH CIRCUMFLEX>          234      82       82       82
    <e WITH DIAERESIS>           235      83       83       83
    <i WITH GRAVE>               236      88       88       88
    <i WITH ACUTE>               237      85       85       85
    <i WITH CIRCUMFLEX>          238      86       86       86
    <i WITH DIAERESIS>           239      87       87       87
    <SMALL LETTER eth>           240      140      140      140
    <n WITH TILDE>               241      73       73       73
    <o WITH GRAVE>               242      205      205      205
    <o WITH ACUTE>               243      206      206      206
    <o WITH CIRCUMFLEX>          244      203      203      203
    <o WITH TILDE>               245      207      207      207
    <o WITH DIAERESIS>           246      204      204      204
    <DIVISION SIGN>              247      225      225      225
    <o WITH STROKE>              248      112      112      112
    <u WITH GRAVE>               249      221      221      192          ###
    <u WITH ACUTE>               250      222      222      222
    <u WITH CIRCUMFLEX>          251      219      219      219
    <u WITH DIAERESIS>           252      220      220      220
    <y WITH ACUTE>               253      141      141      141
    <SMALL LETTER thorn>         254      142      142      142
    <y WITH DIAERESIS>           255      223      223      223

If you would rather see the above table in CCSID 0037 order rather than
ASCII + Latin-1 order, then run the table through:

=over 4

=item recipe 2

=back

    perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
     -e '{push(@l,$_)}' \
     -e 'END{print map{$_->[0]}' \
     -e '          sort{$a->[1] <=> $b->[1]}' \
     -e '          map{[$_,substr($_,42,3)]}@l;}' perlebcdic.pod

If you would rather see it in CCSID 1047 order, then change the number
42 in the last line to 51, like this:

=over 4

=item recipe 3

=back

    perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
     -e '{push(@l,$_)}' \
     -e 'END{print map{$_->[0]}' \
     -e '          sort{$a->[1] <=> $b->[1]}' \
     -e '          map{[$_,substr($_,51,3)]}@l;}' perlebcdic.pod

If you would rather see it in POSIX-BC order, then change the number
51 in the last line to 60, like this:

=over 4

=item recipe 4

=back

    perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
     -e '{push(@l,$_)}' \
     -e 'END{print map{$_->[0]}' \
     -e '          sort{$a->[1] <=> $b->[1]}' \
     -e '          map{[$_,substr($_,60,3)]}@l;}' perlebcdic.pod


=head1 IDENTIFYING CHARACTER CODE SETS

To determine the character set you are running under from perl, one
could use the return value of ord() or chr() to test one or more
character values. For example:

    $is_ascii  = "A" eq chr(65);
    $is_ebcdic = "A" eq chr(193);

Also, "\t" is a <HORIZONTAL TABULATION>, so that:

    $is_ascii  = ord("\t") == 9;
    $is_ebcdic = ord("\t") == 5;

To distinguish EBCDIC code pages, try looking at one or more of
the characters that differ between them. For example:

    $is_ebcdic_37   = "\n" eq chr(37);
    $is_ebcdic_1047 = "\n" eq chr(21);

Or better still, choose a character that is encoded differently in each
of the code sets, e.g.:

    $is_ascii           = ord('[') == 91;
    $is_ebcdic_37       = ord('[') == 186;
    $is_ebcdic_1047     = ord('[') == 173;
    $is_ebcdic_POSIX_BC = ord('[') == 187;

However, it would be unwise to write tests such as:

    $is_ascii = "\r" ne chr(13);  # WRONG
    $is_ascii = "\n" ne chr(10);  # ILL ADVISED

Obviously the first of these will fail to distinguish most ASCII machines
from either a CCSID 0037, a 1047, or a POSIX-BC EBCDIC machine, since "\r"
eq chr(13) under all of those coded character sets. But note too that
because "\n" is chr(13) and "\r" is chr(10) on the Macintosh (which is an
ASCII machine), the second C<$is_ascii> test will lead to trouble there.

To determine whether or not perl was built under an EBCDIC
code page you can use the Config module like so:

    use Config;
    $is_ebcdic = $Config{ebcdic} eq 'define';

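The single-character tests above can be combined into one helper. The following is only a sketch. The name C<which_code_set> and its return strings are made up for illustration. It relies solely on the ord('[') values from the table, which differ in each of the four coded character sets.

```perl
use strict;
use warnings;

# Hypothetical helper: name the coded character set perl is running
# under, keyed on ord('['), which is different in each of the four sets.
sub which_code_set {
    my %by_ord = (
        91  => 'ASCII',         # ASCII and ISO 8859-1
        186 => 'CCSID 0037',
        173 => 'CCSID 1047',
        187 => 'POSIX-BC',
    );
    my $set = $by_ord{ ord('[') };
    return defined $set ? $set : 'unknown';
}

print which_code_set(), "\n";   # "ASCII" on an ASCII-based perl
```

Keying on a single character keeps the check cheap. On an unfamiliar platform you could also compare several characters, as the examples above do.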
=head1 CONVERSIONS

In order to convert a string of characters from one character set to
another, a simple list of numbers, such as in the right columns in the
above table, along with perl's tr/// operator is all that is needed.
The data in the table are in ASCII order, hence the EBCDIC columns
provide easy-to-use ASCII to EBCDIC operations that are also easily
reversed.

For example, to convert between code page 037 and ISO 8859-1, take the
8859-1 column of the table sorted into CCSID 0037 order (recipe 2),
expressed in octal, and use it in tr///. Position I<n> (0..255) of the
string below holds the ISO 8859-1 code of the character whose CCSID 0037
code is I<n>:

    $cp_037 =
    '\000\001\002\003\234\011\206\177\227\215\216\013\014\015\016\017' .
    '\020\021\022\023\235\205\010\207\030\031\222\217\034\035\036\037' .
    '\200\201\202\203\204\012\027\033\210\211\212\213\214\005\006\007' .
    '\220\221\026\223\224\225\226\004\230\231\232\233\024\025\236\032' .
    '\040\240\342\344\340\341\343\345\347\361\242\056\074\050\053\174' .
    '\046\351\352\353\350\355\356\357\354\337\041\044\052\051\073\254' .
    '\055\057\302\304\300\301\303\305\307\321\246\054\045\137\076\077' .
    '\370\311\312\313\310\315\316\317\314\140\072\043\100\047\075\042' .
    '\330\141\142\143\144\145\146\147\150\151\253\273\360\375\376\261' .
    '\260\152\153\154\155\156\157\160\161\162\252\272\346\270\306\244' .
    '\265\176\163\164\165\166\167\170\171\172\241\277\320\335\336\256' .
    '\136\243\245\267\251\247\266\274\275\276\133\135\257\250\264\327' .
    '\173\101\102\103\104\105\106\107\110\111\255\364\366\362\363\365' .
    '\175\112\113\114\115\116\117\120\121\122\271\373\374\371\372\377' .
    '\134\367\123\124\125\126\127\130\131\132\262\324\326\322\323\325' .
    '\060\061\062\063\064\065\066\067\070\071\263\333\334\331\332\237' ;

Because tr/// does not interpolate variables, the operator has to be
built at run time with a string eval. To convert from code page 037 to
ISO 8859-1:

    my $ascii_string = $ebcdic_string;
    eval '$ascii_string =~ tr/\000-\377/' . $cp_037 . '/';

To convert from ISO 8859-1 to code page 037, just reverse the order of
the tr/// arguments like so:

    my $ebcdic_string = $ascii_string;
    eval '$ebcdic_string =~ tr/' . $cp_037 . '/\000-\377/';

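The same eval technique can be driven from a lookup table instead of a hand-written string. The sketch below assumes an ASCII-based perl. It uses only a handful of ISO 8859-1 to CCSID 0037 pairs copied from the table above, and the hash C<%latin1_to_037> is illustrative. A real converter would list all 256 code points.

```perl
use strict;
use warnings;

# Illustrative subset of the ISO 8859-1 => CCSID 0037 mapping (decimal),
# copied from the table above. A full converter lists all 256 entries.
my %latin1_to_037 = (
    10 => 37,  32 => 64,  65 => 193, 66 => 194, 67 => 195, 91 => 186,
);

# tr/// does not interpolate, so write both lists as octal escape text
# and compile the operator with a string eval.
my @codes  = sort { $a <=> $b } keys %latin1_to_037;
my $latin1 = join '', map { sprintf('\\%03o', $_) } @codes;
my $cp037  = join '', map { sprintf('\\%03o', $latin1_to_037{$_}) } @codes;

my $to_037   = eval "sub { (my \$s = shift) =~ tr/$latin1/$cp037/; \$s }"
    or die $@;
my $from_037 = eval "sub { (my \$s = shift) =~ tr/$cp037/$latin1/; \$s }"
    or die $@;

my $ebcdic = $to_037->("AB [C\n");
printf "%vd\n", $ebcdic;                     # 193.194.64.186.195.37
print $from_037->($ebcdic) eq "AB [C\n" ? "round trip ok\n" : "mismatch\n";
```

Wrapping each eval'd tr/// in a closure means the operator is compiled once and can then be reused on many strings.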
XPG4 interoperability often implies the presence of an I<iconv> utility
available from the shell or from the C library. Consult your system's
documentation for information on iconv.

On OS/390 see the iconv(1) man page. One way to invoke the iconv
shell utility from within perl would be to:

    # naive: echo appends a newline, and a ' in the data breaks the command
    $ascii_data = `echo '$ebcdic_data'| iconv -f IBM-1047 -t ISO8859-1`;

or the inverse map:

    $ebcdic_data = `echo '$ascii_data'| iconv -f ISO8859-1 -t IBM-1047`;

XXX iconv under qsh on OS/400?
XXX iconv on VM?
XXX iconv on BS2k?

For other perl-based conversion options see the Convert::* modules on CPAN.

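On perl 5.8 and later, the core Encode module offers another route. Its Encode::EBCDIC tables include C<cp37>, C<cp1047> and C<posix-bc>. The sketch below assumes such a perl running on an ASCII-based host. It converts a character string to CCSID 1047 bytes and back again.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string -> CCSID 1047 bytes, and back again.
my $text   = "Hello, [world]\n";
my $ebcdic = encode('cp1047', $text);
my $back   = decode('cp1047', $ebcdic);

printf "%vd\n", substr($ebcdic, 0, 5);   # 200.133.147.147.150
print $back eq $text ? "round trip ok\n" : "mismatch\n";
```

Unlike the backtick approach, this avoids a shell round trip and any quoting problems with the data.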
=head1 OPERATOR DIFFERENCES

The C<..> range operator treats certain character ranges with
care on EBCDIC machines. For example, the following array
will have twenty-six elements on either an EBCDIC machine
or an ASCII machine:

    @alphabet = ('A'..'Z');   #  $#alphabet == 25

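The contrast is with ranges built from code points. The sketch below compares the two approaches; its variable names are illustrative. On an ASCII perl both lists have 26 elements. On the three EBCDIC code pages in the table, however, ord('A') .. ord('Z') runs from 193 to 233. That is 41 values, because it also picks up the 15 gap code points after I and after R.

```perl
use strict;
use warnings;

# String range: uses perl's magical string increment, so this is
# 26 letters on ASCII and EBCDIC alike.
my @alphabet = ('A' .. 'Z');

# Code point range: 65 .. 90 on ASCII (26 values), but 193 .. 233 on
# CCSID 0037, 1047 and POSIX-BC (41 values, gaps included).
my @by_code = map { chr($_) } ord('A') .. ord('Z');

printf "%d letters, %d code points\n", scalar @alphabet, scalar @by_code;
# an ASCII-based perl prints "26 letters, 26 code points"
```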
The bitwise operators such as & ^ | may return different results
when operating on string or character data in a perl program running
on an EBCDIC machine than when run on an ASCII machine. Here is
an example adapted from the one in L<perlop>:

    # EBCDIC-based examples
    print "j p \n" ^ " a h";                      # prints "JAPH\n"
    print "JA" | "  h\n";                         # prints "JAh\n"
    print "JAPH\nJunk" & "\277\277\277\277\277";  # prints "japh\n"
    print 'p N$' ^ " E<H\n";                      # prints "Perl\n"

In these code pages the upper case letters have the bit for 64 (the code
of <SPACE>) set and the lower case letters do not. So ^ with a space
toggles case, | with a space can only yield upper case, and & with
"\277" yields lower case.

An interesting property of the 32 C0 control characters
in the ASCII table is that they can "literally" be constructed
as control characters in perl, e.g. (chr(0) eq "\c@"),
(chr(1) eq "\cA"), and so on. Perl on EBCDIC machines has been
ported to take "\c@" -> chr(0) and "\cA" -> chr(1) as well, but the
thirty-three characters that result depend on which code page you are
using. The table below uses the character names from the previous table,
but with substitutions such as s/START OF/S.O./; s/END OF /E.O./;
s/TRANSMISSION/TRANS./; s/TABULATION/TAB./; s/VERTICAL/VERT./;
s/HORIZONTAL/HORIZ./; s/DEVICE CONTROL/D.C./; s/SEPARATOR/SEP./;
s/NEGATIVE ACKNOWLEDGE/NEG. ACK./. The POSIX-BC and 1047 sets are
identical throughout this range and differ from the 0037 set at only
one spot (21 decimal). Note that "\c\\" maps to two characters,
not one.

    chr    ord  8859-1              0037                1047 && POSIX-BC
    ------------------------------------------------------------------------
    "\c?"  127  <DELETE>            "                   "                   ***><
    "\c@"    0  <NULL>              <NULL>              <NULL>              ***><
    "\cA"    1  <S.O. HEADING>      <S.O. HEADING>      <S.O. HEADING>
    "\cB"    2  <S.O. TEXT>         <S.O. TEXT>         <S.O. TEXT>
    "\cC"    3  <E.O. TEXT>         <E.O. TEXT>         <E.O. TEXT>
    "\cD"    4  <E.O. TRANS.>       <C1 28>             <C1 28>
    "\cE"    5  <ENQUIRY>           <HORIZ. TAB.>       <HORIZ. TAB.>
    "\cF"    6  <ACKNOWLEDGE>       <C1 6>              <C1 6>
    "\cG"    7  <BELL>              <DELETE>            <DELETE>
    "\cH"    8  <BACKSPACE>         <C1 23>             <C1 23>
    "\cI"    9  <HORIZ. TAB.>       <C1 13>             <C1 13>
    "\cJ"   10  <LINE FEED>         <C1 14>             <C1 14>
    "\cK"   11  <VERT. TAB.>        <VERT. TAB.>        <VERT. TAB.>
    "\cL"   12  <FORM FEED>         <FORM FEED>         <FORM FEED>
    "\cM"   13  <CARRIAGE RETURN>   <CARRIAGE RETURN>   <CARRIAGE RETURN>
    "\cN"   14  <SHIFT OUT>         <SHIFT OUT>         <SHIFT OUT>
    "\cO"   15  <SHIFT IN>          <SHIFT IN>          <SHIFT IN>
    "\cP"   16  <DATA LINK ESCAPE>  <DATA LINK ESCAPE>  <DATA LINK ESCAPE>
    "\cQ"   17  <D.C. ONE>          <D.C. ONE>          <D.C. ONE>
    "\cR"   18  <D.C. TWO>          <D.C. TWO>          <D.C. TWO>
    "\cS"   19  <D.C. THREE>        <D.C. THREE>        <D.C. THREE>
    "\cT"   20  <D.C. FOUR>         <C1 29>             <C1 29>
    "\cU"   21  <NEG. ACK.>         <C1 5>              <LINE FEED>         ***
    "\cV"   22  <SYNCHRONOUS IDLE>  <BACKSPACE>         <BACKSPACE>
    "\cW"   23  <E.O. TRANS. BLOCK> <C1 7>              <C1 7>
    "\cX"   24  <CANCEL>            <CANCEL>            <CANCEL>
    "\cY"   25  <E.O. MEDIUM>       <E.O. MEDIUM>       <E.O. MEDIUM>
    "\cZ"   26  <SUBSTITUTE>        <C1 18>             <C1 18>
    "\c["   27  <ESCAPE>            <C1 15>             <C1 15>
    "\c\\"  28  <FILE SEP.>\        <FILE SEP.>\        <FILE SEP.>\
    "\c]"   29  <GROUP SEP.>        <GROUP SEP.>        <GROUP SEP.>
    "\c^"   30  <RECORD SEP.>       <RECORD SEP.>       <RECORD SEP.>       ***><
    "\c_"   31  <UNIT SEP.>         <UNIT SEP.>         <UNIT SEP.>         ***><


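The practical consequence can be shown in a short sketch that uses nothing beyond core perl. "\cA" is chr(1) on every platform in the table. However, "\cJ" equals "\n" only on an ASCII perl such as a Unix one. Portable code should therefore write "\n" rather than "\cJ" when it means a newline.

```perl
use strict;
use warnings;

# "\cX" always yields the same number ...
print ord("\cA"), "\n";                           # 1 on ASCII and EBCDIC

# ... but which character that number names depends on the code page:
# on CCSID 0037 and 1047, "\n" is not chr(10).
print "\cJ" eq "\n" ? "same\n" : "different\n";   # "same" on ASCII
```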
=head1 FUNCTION DIFFERENCES

=over 8

=item chr()

chr() must be given an EBCDIC code number argument to yield a desired
character return value on an EBCDIC machine. For example:

    $CAPITAL_LETTER_A = chr(193);

=item ord()

ord() will return EBCDIC code number values on an EBCDIC machine.
For example:

    $the_number_193 = ord("A");

=item pack()

The c and C templates for pack() are dependent upon character set
encoding. Examples of usage on EBCDIC include:

    $foo = pack("CCCC",193,194,195,196);
    # $foo eq "ABCD"
    $foo = pack("C4",193,194,195,196);
    # same thing

    $foo = pack("ccxxcc",193,194,195,196);
    # $foo eq "AB\0\0CD"

=item print()

One must be careful with scalars and strings that are passed to
print that contain ASCII encodings. One common place
for this to occur is in the output of the MIME type header for
CGI script writing. For example, many perl programming guides
recommend something similar to:

    print "Content-type:\ttext/html\015\012\015\012";
    # this may be wrong on EBCDIC

Under the IBM OS/390 USS Web Server, for example, you should instead
write that as:

    print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et alia

That is because the translation from EBCDIC to ASCII is done
by the web server in this case (such code will not be appropriate for
the Macintosh, however). Consult your web server's documentation for
further details.

=item printf()

The formats that can convert characters to numbers and vice versa
will be different from their ASCII counterparts when executed
on an EBCDIC machine. Examples include:

    printf("%c%c%c",193,194,195);  # prints ABC

=item sort()

EBCDIC sort results may differ from ASCII sort results, especially for
mixed case strings. This is discussed in more detail below.

=item sprintf()

See the discussion of printf() above. An example of the use
of sprintf would be:

    $CAPITAL_LETTER_A = sprintf("%c",193);

=item unpack()

See the discussion of pack() above.

=back

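The examples above hard-code EBCDIC numbers. When the goal is simply a particular character, deriving the number from the character with ord() keeps the code portable. The sketch below (its variable names are illustrative) builds the same strings on ASCII and EBCDIC perls.

```perl
use strict;
use warnings;

# Derive code points from characters instead of hard-coding 65 or 193.
my $code_A = ord('A');                                # 65 on ASCII, 193 on EBCDIC
my $letter = chr($code_A);                            # "A" on either
my $abcd   = pack('C4', map { ord($_) } qw(A B C D)); # "ABCD" on either
my $abc    = sprintf('%c%c%c', map { ord($_) } qw(A B C));
my @codes  = unpack('C*', 'AB');                      # (65, 66) on ASCII

print "$letter $abcd $abc @codes\n";   # ASCII: "A ABCD ABC 65 66"
```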
=head1 REGULAR EXPRESSION DIFFERENCES

As of perl 5.005_03 the letter range regular expressions such as
[A-Z] and [a-z] have been specially coded to not pick up gap
characters. For example, characters such as <o WITH CIRCUMFLEX>
that lie between I and J would not be matched by C</[H-K]/>.
If you do want to match such characters in a single octet
regular expression, try matching the hex or octal code such
as C</\313/> on EBCDIC or C</\364/> on ASCII machines to
have your regular expression match <o WITH CIRCUMFLEX>.

Another place to be wary of is the inappropriate use of hex or
octal constants in regular expressions. Consider the following
set of subs:

    sub is_c0 {
        my $char = substr(shift,0,1);
        $char =~ /[\000-\037]/;
    }

    sub is_print_ascii {
        my $char = substr(shift,0,1);
        $char =~ /[\040-\176]/;
    }

    sub is_delete {
        my $char = substr(shift,0,1);
        $char eq "\177";
    }

    sub is_c1 {
        my $char = substr(shift,0,1);
        $char =~ /[\200-\237]/;
    }

    sub is_latin_1 {
        my $char = substr(shift,0,1);
        $char =~ /[\240-\377]/;
    }

The above would be adequate if the concern was only with numeric code
points. However, we may actually be concerned with characters rather than
code points, and on an EBCDIC machine would like constructs such as
C<if (is_print_ascii("A")) {print "A is a printable character\n";}> to
print out the expected message. One way to represent the above collection
of character classification subs that is capable of working across the
four coded character sets discussed in this document is as follows:

    sub Is_c0 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\000-\037]/;
        }
        if (ord('^')==176) { # 37
            return $char =~ /[\000-\003\067\055-\057\026\005\045\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
        }
        if (ord('^')==95 || ord('^')==106) { # 1047 || posix-bc
            return $char =~ /[\000-\003\067\055-\057\026\005\025\013-\023\074\075\062\046\030\031\077\047\034-\037]/;
        }
    }

    sub Is_print_ascii {
        my $char = substr(shift,0,1);
        $char =~ /[ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/;
    }

    sub Is_delete {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char eq "\177";
        }
        else {               # ebcdic
            return $char eq "\007";
        }
    }

    sub Is_c1 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\200-\237]/;
        }
        if (ord('^')==176) { # 37
            return $char =~ /[\040-\044\025\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\377]/;
        }
        if (ord('^')==95)  { # 1047
            return $char =~ /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\377]/;
        }
        if (ord('^')==106) { # posix-bc
            return $char =~
              /[\040-\045\006\027\050-\054\011\012\033\060\061\032\063-\066\010\070-\073\004\024\076\137]/;
        }
    }

    sub Is_latin_1 {
        my $char = substr(shift,0,1);
        if (ord('^')==94)  { # ascii
            return $char =~ /[\240-\377]/;
        }
        if (ord('^')==176) { # 37
            return $char =~
              /[\101\252\112\261\237\262\152\265\275\264\232\212\137\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
        }
        if (ord('^')==95)  { # 1047
            return $char =~
              /[\101\252\112\261\237\262\152\265\273\264\232\212\260\312\257\274\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\375\376\373\374\272\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\335\336\333\334\215\216\337]/;
        }
        if (ord('^')==106) { # posix-bc
            return $char =~
              /[\101\252\260\261\237\262\320\265\171\264\232\212\272\312\257\241\220\217\352\372\276\240\266\263\235\332\233\213\267\270\271\253\144\145\142\146\143\147\236\150\164\161-\163\170\165-\167\254\151\355\356\353\357\354\277\200\340\376\335\374\255\256\131\104\105\102\106\103\107\234\110\124\121-\123\130\125-\127\214\111\315\316\313\317\314\341\160\300\336\333\334\215\216\337]/;
        }
    }

801 | Note however that only the C<Is_ascii_print()> sub is really independent |
802 | of coded character set. Another way to write C<Is_latin_1()> would be |
803 | to use the characters in the range explicitly: |
804 | |
805 | sub Is_latin_1 { |
806 | my $char = substr(shift,0,1); |
807 | $char =~ /[ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]/; |
808 | } |
809 | |
810 | Although that form may run into trouble in network transit (due to the |
811 | presence of 8 bit characters) or on non ISO-Latin character sets. |
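
A third possibility keeps the source 7-bit clean and derives the test
from a translation table instead of from hand-written octal classes.
The following is a minimal sketch, assuming a 256-element table that
maps ISO 8859-1 code points to native code points (such as the
C<@a2e_1047> array in the URL decoding example further below);
C<make_is_latin_1()> is a hypothetical helper, not part of any module.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: returns a closure that tests whether the first
    # character of its argument is one of the ISO 8859-1 characters
    # 0xA0..0xFF, given a table mapping ISO 8859-1 to native code points.
    sub make_is_latin_1 {
        my ($latin1_to_native) = @_;
        my %is_latin_1;
        # Record the native character for each code point A0..FF
        $is_latin_1{ chr $latin1_to_native->[$_] } = 1 for 0xA0 .. 0xFF;
        return sub {
            my $char = substr(shift, 0, 1);
            return exists $is_latin_1{$char} ? 1 : 0;
        };
    }

    # On an ISO 8859-1 platform the table is the identity map; on a
    # CCSID 1047 platform pass \@a2e_1047 instead.
    my $is_latin_1 = make_is_latin_1([0 .. 255]);
    my $accented = $is_latin_1->("\xE9");   # 1 here: e WITH ACUTE
    my $plain    = $is_latin_1->('A');      # 0: 'A' is 7-bit ASCII

```

Only 7-bit characters appear in the source, so it survives network
transit, and porting it to another code page means swapping the table.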


=head1 SOCKETS

Most socket programming assumes ASCII character encodings and network
byte order. Exceptions can include CGI scripts run under a host web
server, where the server may take care of the translation for you.
Most host web servers convert EBCDIC data to ISO-8859-1 or Unicode on
output.
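
On an EBCDIC host, text written directly to such a socket therefore
usually has to be translated first (and translated back after reading).
A minimal sketch, assuming a 256-element table that maps native code
points to ASCII code points; on CCSID 1047 it can be built by inverting
the C<@a2e_1047> array shown under URL ENCODING and DECODING.
C<native_to_ascii()> and C<invert_table()> are hypothetical helpers, and
C<$sock> in the usage comment stands for an already connected socket.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: translate a native string to ASCII/ISO 8859-1,
    # given a 256-element table indexed by native code point.
    sub native_to_ascii {
        my ($string, $native_to_ascii) = @_;
        return join '', map { chr $native_to_ascii->[ord $_] } split //, $string;
    }

    # Hypothetical helper: invert an ASCII-to-native table (such as
    # @a2e_1047) into the native-to-ASCII table needed above.
    sub invert_table {
        my ($ascii_to_native) = @_;
        my @inverse;
        $inverse[ $ascii_to_native->[$_] ] = $_ for 0 .. 255;
        return \@inverse;
    }

    # Usage, assuming $sock is an already connected socket:
    #   my $e2a = invert_table(\@a2e_1047);
    #   print $sock native_to_ascii("GET / HTTP/1.0\r\n\r\n", $e2a);

```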

=head1 SORTING

One big difference between ASCII-based character sets and EBCDIC ones
is the relative positions of uppercase and lowercase letters, and of
the letters relative to the digits. In ASCII the digits sort before the
uppercase letters, which sort before the lowercase letters; in EBCDIC
the lowercase letters come first, then the uppercase letters, then the
digits. If sorted on an ASCII-based machine, the two-letter abbreviation
for a physician comes before the two-letter abbreviation for drive, that is:

    @sorted = sort(qw(Dr. dr.));  # @sorted holds qw(Dr. dr.) on ASCII,
                                  # qw(dr. Dr.) on EBCDIC

The property of lowercase letters sorting before uppercase letters in
EBCDIC is even carried over to the Latin 1 EBCDIC pages such as 0037 and
1047. For example, <E WITH DIAERESIS> (203) comes before
<e WITH DIAERESIS> (235) on an ASCII machine, but the latter (83) comes
before the former (115) on an EBCDIC machine. (Astute readers will note
that the uppercase version of <SMALL LETTER SHARP S> is simply "SS" and
that the uppercase version of <y WITH DIAERESIS> is not in the 0..255
range but is at U+0178 in Unicode.)

The sort order will cause differences between results obtained on
ASCII machines and on EBCDIC machines. What follows are some suggestions
on how to deal with these differences.

=head2 Ignore ASCII vs EBCDIC sort differences.

This is the least computationally expensive strategy. It may require
some user education.

=head2 MONOCASE then sort data.

In order to minimize the expense of monocasing mixed-case text, try to
C<tr///> towards the case most employed within the data. If the data
are primarily UPPERCASE non Latin 1, then apply C<tr/a-z/A-Z/> and then
C<sort()>. If the data are primarily lowercase non Latin 1, then apply
C<tr/A-Z/a-z/> before sorting. If the data are primarily UPPERCASE and
include Latin-1 characters, then apply C<tr/a-z/A-Z/>;
XXX

This strategy does not preserve the case of the data and may not be
acceptable.
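
The primarily UPPERCASE case can be sketched as follows (the sample
data are made up for illustration). The last statement uses a different
technique, comparing uppercased keys with C<uc()> inside the sort block,
which keeps the case of the data at the cost of monocasing on every
comparison.

```perl

    use strict;
    use warnings;

    my @data = qw(dr. Dr. apple Banana);

    # Monocase a copy of each element towards UPPERCASE, then sort.
    # As noted above, the original case is not preserved.
    my @upper  = map { my $copy = $_; $copy =~ tr/a-z/A-Z/; $copy } @data;
    my @sorted = sort @upper;    # APPLE BANANA DR. DR.

    # A different technique that keeps the original case: compare
    # uppercased keys inside the sort block instead of changing the data.
    my @sorted_keep_case = sort { uc($a) cmp uc($b) } @data;

```

Monocasing removes the upper/lower case difference only; digits and
punctuation still sort differently on ASCII and EBCDIC machines.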

=head2 Convert, sort data, then reconvert.

This is the most expensive proposition that does not employ a network
connection.
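
A sketch of this strategy, assuming a 256-element table that maps
native code points to ASCII/ISO 8859-1 code points (on CCSID 1047, the
inverse of the C<@a2e_1047> array shown under URL ENCODING and
DECODING). C<translate()> and C<ascii_order_sort()> are hypothetical
helpers. Because the comparison always happens on ASCII code points,
the result has the same order on every platform, at the cost of two
translations per element.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: translate a string through a 256-element table
    # of code points (native-to-ASCII or ASCII-to-native).
    sub translate {
        my ($string, $table) = @_;
        return join '', map { chr $table->[ord $_] } split //, $string;
    }

    # Hypothetical helper: convert each element to ASCII, sort in ASCII
    # order, then convert the sorted elements back to native code points.
    sub ascii_order_sort {
        my ($native_to_ascii, @data) = @_;
        my @ascii_to_native;
        $ascii_to_native[ $native_to_ascii->[$_] ] = $_ for 0 .. 255;
        my @converted = map { translate($_, $native_to_ascii) } @data;
        return map { translate($_, \@ascii_to_native) } sort @converted;
    }

    # On an ASCII platform the table is the identity map, so this
    # returns ('Dr.', 'dr.'); on CCSID 1047 pass the inverse of @a2e_1047.
    my @sorted = ascii_order_sort([0 .. 255], qw(dr. Dr.));

```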

=head2 Perform sorting on one type of machine only.

This strategy can employ a network connection. As such
it would be computationally expensive.

=head1 URL ENCODING and DECODING

Note that some URLs have hexadecimal ASCII code points in them in an
attempt to overcome character limitation issues. For example, the
tilde character is not on every keyboard, hence a URL of the form:

    http://www.pvhp.com/~pvhp/

may also be expressed as either of:

    http://www.pvhp.com/%7Epvhp/

    http://www.pvhp.com/%7epvhp/

where 7E is the hexadecimal ASCII code point for '~'. Here is an example
of decoding such a URL under CCSID 1047:

    my $url = 'http://www.pvhp.com/%7Epvhp/';
    # this array assumes code page 1047
    my @a2e_1047 = (
          0,  1,  2,  3, 55, 45, 46, 47, 22,  5, 21, 11, 12, 13, 14, 15,
         16, 17, 18, 19, 60, 61, 50, 38, 24, 25, 63, 39, 28, 29, 30, 31,
         64, 90,127,123, 91,108, 80,125, 77, 93, 92, 78,107, 96, 75, 97,
        240,241,242,243,244,245,246,247,248,249,122, 94, 76,126,110,111,
        124,193,194,195,196,197,198,199,200,201,209,210,211,212,213,214,
        215,216,217,226,227,228,229,230,231,232,233,173,224,189, 95,109,
        121,129,130,131,132,133,134,135,136,137,145,146,147,148,149,150,
        151,152,153,162,163,164,165,166,167,168,169,192, 79,208,161,  7,
         32, 33, 34, 35, 36, 37,  6, 23, 40, 41, 42, 43, 44,  9, 10, 27,
         48, 49, 26, 51, 52, 53, 54,  8, 56, 57, 58, 59,  4, 20, 62,255,
         65,170, 74,177,159,178,106,181,187,180,154,138,176,202,175,188,
        144,143,234,250,190,160,182,179,157,218,155,139,183,184,185,171,
        100,101, 98,102, 99,103,158,104,116,113,114,115,120,117,118,119,
        172,105,237,238,235,239,236,191,128,253,254,251,252,186,174, 89,
         68, 69, 66, 70, 67, 71,156, 72, 84, 81, 82, 83, 88, 85, 86, 87,
        140, 73,205,206,203,207,204,225,112,221,222,219,220,141,142,223
    );
    # pack "C" (unsigned char) handles code points above 127
    $url =~ s/%([0-9a-fA-F]{2})/pack("C",$a2e_1047[hex($1)])/ge;
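
Encoding runs in the opposite direction: each native character has to
be mapped to its ASCII code point before it can be written as C<%XX>.
A minimal sketch, assuming a table that maps native code points to
ASCII; C<url_encode()> is a hypothetical helper, and the set of
characters it leaves unescaped is a deliberately small assumption
rather than a complete rule set from the URL specification.

```perl

    use strict;
    use warnings;

    # ASCII code points left unescaped by this sketch: letters, digits
    # and - . / : _ (a conservative choice, not a full rule set).
    my %safe = map { ($_ => 1) }
        (0x2D, 0x2E, 0x2F, 0x3A, 0x5F, 0x30 .. 0x39, 0x41 .. 0x5A, 0x61 .. 0x7A);

    # Hypothetical helper: percent-encode a native string, given a
    # 256-element table that maps native code points to ASCII. On
    # CCSID 1047 such a table can be built from @a2e_1047 with
    #   my @e2a; $e2a[ $a2e_1047[$_] ] = $_ for 0 .. 255;
    sub url_encode {
        my ($string, $native_to_ascii) = @_;
        return join '', map {
            my $ascii = $native_to_ascii->[ord $_];
            $safe{$ascii} ? $_ : sprintf('%%%02X', $ascii);
        } split //, $string;
    }

    # On an ASCII platform the table is the identity map.
    my $encoded = url_encode('http://www.pvhp.com/~pvhp/', [0 .. 255]);
    # $encoded is now 'http://www.pvhp.com/%7Epvhp/'

```

The safe-character test is made on the ASCII code point, so the same
C<%safe> hash works unchanged on any platform once the table is right.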

=head1 I18N AND L10N

Internationalization (I18N) and localization (L10N) are supported, at
least in principle, even on EBCDIC machines. The details are system
dependent and are discussed under the L<perlebcdic/OS ISSUES> section below.
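
Whether a given perl can change locales at all can be checked at run
time with the standard C<Config> module before calling
C<POSIX::setlocale()>. A minimal sketch; C<try_setlocale()> is a
hypothetical name, and the system-specific locale files are covered in
the OS ISSUES section.

```perl

    use strict;
    use warnings;
    use Config;
    use POSIX qw(setlocale LC_ALL);

    # Hypothetical helper: switch all locale categories to $name when this
    # perl was built with setlocale(); returns the new locale name, or
    # undef if setlocale() is unavailable or the locale is unknown.
    sub try_setlocale {
        my ($name) = @_;
        return undef unless $Config{d_setlocale};
        return setlocale(LC_ALL, $name);
    }

    # "C" always exists; an empty string would instead select the locale
    # named by the LC_ALL, LC_* and LANG environment variables.
    my $current = try_setlocale('C');

```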

=head1 MULTI OCTET CHARACTER SETS

Double byte EBCDIC code pages (?) XXX.

UTF-8, UTF-EBCDIC, (?) XXX.

=head1 OS ISSUES

There may be a few system dependent issues
of concern to EBCDIC Perl programmers.

=head2 OS/400

=over 8

=item IFS access

XXX.

=back

=head2 OS/390

=over 8

=item dataset access

For sequential data set access try:

    my @ds_records = `cat //DSNAME`;

or:

    my @ds_records = `cat "//'HLQ.DSNAME'"`;  # "" keep the shell from removing ''

See also the OS390::Stdio module on CPAN.

=item locales

On OS/390 see L<locale> for information on locales. The L10N files
are in F</usr/nls/locale>. C<$Config{d_setlocale}> is 'define' on OS/390.

=back
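
With backticks, a failed C<cat> is easy to overlook. The following
sketch also checks the exit status, assuming Perl 5.8 or later for the
list form of a piped C<open()>; C<read_records()> is a hypothetical
helper. Because no shell is involved, a fully qualified name such as
C<//'HLQ.DSNAME'> reaches C<cat> with its single quotes intact.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: read the records of a file or data set through
    # cat, dying if cat cannot be started or exits with a non-zero status.
    # On OS/390 the name may be //DSNAME or //'HLQ.DSNAME'.
    sub read_records {
        my ($name) = @_;
        # The list form of a piped open bypasses the shell, so the
        # single quotes in //'HLQ.DSNAME' reach cat unchanged.
        open(my $fh, '-|', 'cat', $name) or die "Cannot run cat: $!\n";
        my @records = <$fh>;
        # close() on a piped handle returns false and sets $? on failure
        close($fh) or die "cat $name failed (status $?)\n";
        chomp @records;
        return @records;
    }

    # Usage sketch:
    #   my @ds_records = read_records("//'HLQ.DSNAME'");

```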

=head2 VM/ESA?

XXX.

=head2 POSIX-BC?

XXX.

=head1 REFERENCES

http://anubis.dkuug.dk/i18n/charmaps

L<perllocale>.

http://www.unicode.org/

http://www.unicode.org/unicode/reports/tr16/

B<The Unicode Standard Version 2.0> The Unicode Consortium,
ISBN 0-201-48345-9, Addison Wesley Developers Press, July 1996.

B<CDRA: IBM - Character Data Representation Architecture -
Reference and Registry>, IBM SC09-2190-00, December 1996.

"Demystifying Character Sets", Andrea Vine, Multilingual Computing
& Technology, B<#26 Vol. 10 Issue 4>, August/September 1999;
ISSN 1523-0309; Multilingual Computing Inc. Sandpoint ID, USA.

=head1 AUTHOR

Peter Prymmer E<lt>pvhp@best.comE<gt> wrote this in 1999 and 2000
with CCSID 0819 and 0037 help from Chris Leach and
AndrE<eacute> Pirard E<lt>A.Pirard@ulg.ac.beE<gt> as well as POSIX-BC
help from Thomas Dorner E<lt>Thomas.Dorner@start.deE<gt>.
Thanks also to Philip Newton and Vickie Cooper. Trademarks, registered
trademarks, service marks and registered service marks used in this
document are the property of their respective owners.
