X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlebcdic.pod;h=942526b0e937c378d234fccfcec2f76c612a0700;hb=0dfdcd8a63a82bd61087d84a6f130e03a4b20ed9;hp=12ea2f3ef4b1634e54918ee732f4bbd55fa5945e;hpb=d5d9880cc2523c10f7be68257f4f1768a4e552d9;p=p5sagit%2Fp5-mst-13.2.git

diff --git a/pod/perlebcdic.pod b/pod/perlebcdic.pod
index 12ea2f3..942526b 100644
--- a/pod/perlebcdic.pod
+++ b/pod/perlebcdic.pod
@@ -6,7 +6,8 @@ perlebcdic - Considerations for running Perl on EBCDIC platforms
 
 An exploration of some of the issues facing Perl programmers
 on EBCDIC based computers. We do not cover localization,
-internationalization, or multi byte character set issues (yet).
+internationalization, or multi byte character set issues other
+than some discussion of UTF-8 and UTF-EBCDIC.
 
 Portions that are still incomplete are marked with XXX.
 
@@ -44,7 +45,7 @@ A particular 8-bit extension to ASCII that includes grave and acute
 accented Latin characters. Languages that can employ ISO 8859-1
 include all the languages covered by ASCII as well as Afrikaans,
 Albanian, Basque, Catalan, Danish, Faroese, Finnish, Norwegian,
-Portugese, Spanish, and Swedish. Dutch is covered albeit without
+Portuguese, Spanish, and Swedish. Dutch is covered albeit without
 the ij ligature. French is covered too but without the oe ligature.
 German can use ISO 8859-1 but must do so without German-style
 quotation marks. This set is based on Western European extensions
@@ -54,7 +55,7 @@ also known as CCSID 819 (or sometimes 0819 or even 00819).
 
 =head2 EBCDIC
 
-The Extended Binary Coded Decimal Interchange Code refers to a
+The Extended Binary Coded Decimal Interchange Code refers to a
 large collection of slightly different single and multi byte
 coded character sets that are different from ASCII or ISO 8859-1
 and typically run on host computers. The EBCDIC encodings derive
@@ -88,14 +89,106 @@ in 237 places, in other words they agree on only 19 code point values.
 Character code set ID 1047 is also a mapping of the ASCII plus
 Latin-1 characters (i.e. ISO 8859-1) to an EBCDIC set. 1047 is
-used under Unix System Services for OS/390, and OpenEdition for VM/ESA.
-CCSID 1047 differs from CCSID 0037 in eight places.
+used under Unix System Services for OS/390 or z/OS, and OpenEdition
+for VM/ESA. CCSID 1047 differs from CCSID 0037 in eight places.
 
 =head2 POSIX-BC
 
 The EBCDIC code page in use on Siemens' BS2000 system is distinct from
 1047 and 0037. It is identified below as the POSIX-BC set.
 
+=head2 Unicode code points versus EBCDIC code points
+
+In Unicode terminology a I<code point> is the number assigned to a
+character: for example, in EBCDIC the character "A" is usually assigned
+the number 193. In Unicode the character "A" is assigned the number 65.
+This causes a problem with the semantics of the pack/unpack "U", which
+are supposed to pack Unicode code points to characters and back to numbers.
+The problem is: which code points to use for code points less than 256?
+(for 256 and over there's no problem: Unicode code points are used)
+In EBCDIC, for the low 256 the EBCDIC code points are used. This
+means that the equivalences
+
+    pack("U", ord($character)) eq $character
+    unpack("U", $character) == ord $character
+
+will hold. (If Unicode code points were applied consistently over
+all the possible code points, pack("U",ord("A")) would in EBCDIC
+equal I<A with acute> or chr(101), and unpack("U", "A") would equal
+65, or I<non-breaking space>, not 193, or ord "A".)
+
+=head2 Remaining Perl Unicode problems in EBCDIC
+
+=over 4
+
+=item *
+
+Many of the remaining seem to be related to case-insensitive matching:
+for example, C<< /[\x{131}]/ >> (LATIN SMALL LETTER DOTLESS I) does
+not match "I" case-insensitively, as it should under Unicode.
+(The match succeeds in ASCII-derived platforms.)
+
+=item *
+
+The extensions Unicode::Collate and Unicode::Normalized are not
+supported under EBCDIC, likewise for the encoding pragma.
+
+=back
+
+=head2 Unicode and UTF
+
+UTF is a Unicode Transformation Format. UTF-8 is a Unicode conforming
+representation of the Unicode standard that looks very much like ASCII.
+UTF-EBCDIC is an attempt to represent Unicode characters in an EBCDIC
+transparent manner.
+
+=head2 Using Encode
+
+Starting from Perl 5.8 you can use the standard new module Encode
+to translate from EBCDIC to Latin-1 code points
+
+    use Encode 'from_to';
+
+    my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );
+
+    # $a is in EBCDIC code points
+    from_to($a, $ebcdic{ord '^'}, 'latin1');
+    # $a is ISO 8859-1 code points
+
+and from Latin-1 code points to EBCDIC code points
+
+    use Encode 'from_to';
+
+    my %ebcdic = ( 176 => 'cp37', 95 => 'cp1047', 106 => 'posix-bc' );
+
+    # $a is ISO 8859-1 code points
+    from_to($a, 'latin1', $ebcdic{ord '^'});
+    # $a is in EBCDIC code points
+
+For doing I/O it is suggested that you use the autotranslating features
+of PerlIO, see L.
+
+Since version 5.8 Perl uses the new PerlIO I/O library. This enables
+you to use different encodings per IO channel. For example you may use
+
+    use Encode;
+    open($f, ">:encoding(ascii)", "test.ascii");
+    print $f "Hello World!\n";
+    open($f, ">:encoding(cp37)", "test.ebcdic");
+    print $f "Hello World!\n";
+    open($f, ">:encoding(latin1)", "test.latin1");
+    print $f "Hello World!\n";
+    open($f, ">:encoding(utf8)", "test.utf8");
+    print $f "Hello World!\n";
+
+to get two files containing "Hello World!\n" in ASCII, CP 37 EBCDIC,
+ISO 8859-1 (Latin-1) (in this example identical to ASCII) respective
+UTF-EBCDIC (in this example identical to normal EBCDIC). See the
+documentation of Encode::PerlIO for details.
+
+As the PerlIO layer uses raw IO (bytes) internally, all this totally
+ignores things like the type of your filesystem (ASCII or EBCDIC).
+ =head1 SINGLE OCTET TABLES The following tables list the ASCII and Latin 1 ordered sets including @@ -103,7 +196,7 @@ the subsets: C0 controls (0..31), ASCII graphics (32..7e), delete (7f), C1 controls (80..9f), and Latin-1 (a.k.a. ISO 8859-1) (a0..ff). In the table non-printing control character names as well as the Latin 1 extensions to ASCII have been labelled with character names roughly -corresponding to I albeit with +corresponding to I albeit with substitutions such as s/LATIN// and s/VULGAR// in all cases, s/CAPITAL LETTER// in some cases, and s/SMALL LETTER ([A-Z])/\l$1/ in some other cases (the C pragma names unfortunately do @@ -123,294 +216,342 @@ work with a pod2_other_format translation) through: =back perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \ - -e '{printf("%s%-9o%-9o%-9o%-9o\n",$1,$2,$3,$4,$5)}' perlebcdic.pod + -e '{printf("%s%-9o%-9o%-9o%o\n",$1,$2,$3,$4,$5)}' perlebcdic.pod + +If you want to retain the UTF-x code points then in script form you +might want to write: + +=over 4 + +=item recipe 1 + +=back + + open(FH,") { + if (/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/) { + if ($7 ne '' && $9 ne '') { + printf("%s%-9o%-9o%-9o%-9o%-3o.%-5o%-3o.%o\n",$1,$2,$3,$4,$5,$6,$7,$8,$9); + } + elsif ($7 ne '') { + printf("%s%-9o%-9o%-9o%-9o%-3o.%-5o%o\n",$1,$2,$3,$4,$5,$6,$7,$8); + } + else { + printf("%s%-9o%-9o%-9o%-9o%-9o%o\n",$1,$2,$3,$4,$5,$6,$8); + } + } + } If you would rather see this table listing hexadecimal values then run the table through: =over 4 -=item recipe 1 +=item recipe 2 =back perl -ne 'if(/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/)' \ - -e '{printf("%s%-9X%-9X%-9X%-9X\n",$1,$2,$3,$4,$5)}' perlebcdic.pod - - - 8859-1 - chr 0819 0037 1047 POSIX-BC - ---------------------------------------------------------------- - 0 0 0 0 - 1 1 1 1 - 2 2 2 2 - 3 3 3 3 - 4 55 55 55 - 5 45 45 45 - 6 46 46 46 - 7 47 47 47 - 8 22 22 22 - 9 5 5 5 - 10 37 21 21 *** - 11 11 11 11 -
12 12 12 12 - 13 13 13 13 - 14 14 14 14 - 15 15 15 15 - 16 16 16 16 - 17 17 17 17 - 18 18 18 18 - 19 19 19 19 - 20 60 60 60 - 21 61 61 61 - 22 50 50 50 - 23 38 38 38 - 24 24 24 24 - 25 25 25 25 - 26 63 63 63 - 27 39 39 39 - 28 28 28 28 - 29 29 29 29 - 30 30 30 30 - 31 31 31 31 - 32 64 64 64 - ! 33 90 90 90 - " 34 127 127 127 - # 35 123 123 123 - $ 36 91 91 91 - % 37 108 108 108 - & 38 80 80 80 - ' 39 125 125 125 - ( 40 77 77 77 - ) 41 93 93 93 - * 42 92 92 92 - + 43 78 78 78 - , 44 107 107 107 - - 45 96 96 96 - . 46 75 75 75 - / 47 97 97 97 - 0 48 240 240 240 - 1 49 241 241 241 - 2 50 242 242 242 - 3 51 243 243 243 - 4 52 244 244 244 - 5 53 245 245 245 - 6 54 246 246 246 - 7 55 247 247 247 - 8 56 248 248 248 - 9 57 249 249 249 - : 58 122 122 122 - ; 59 94 94 94 - < 60 76 76 76 - = 61 126 126 126 - > 62 110 110 110 - ? 63 111 111 111 - @ 64 124 124 124 - A 65 193 193 193 - B 66 194 194 194 - C 67 195 195 195 - D 68 196 196 196 - E 69 197 197 197 - F 70 198 198 198 - G 71 199 199 199 - H 72 200 200 200 - I 73 201 201 201 - J 74 209 209 209 - K 75 210 210 210 - L 76 211 211 211 - M 77 212 212 212 - N 78 213 213 213 - O 79 214 214 214 - P 80 215 215 215 - Q 81 216 216 216 - R 82 217 217 217 - S 83 226 226 226 - T 84 227 227 227 - U 85 228 228 228 - V 86 229 229 229 - W 87 230 230 230 - X 88 231 231 231 - Y 89 232 232 232 - Z 90 233 233 233 - [ 91 186 173 187 *** ### - \ 92 224 224 188 ### - ] 93 187 189 189 *** - ^ 94 176 95 106 *** ### - _ 95 109 109 109 - ` 96 121 121 74 ### - a 97 129 129 129 - b 98 130 130 130 - c 99 131 131 131 - d 100 132 132 132 - e 101 133 133 133 - f 102 134 134 134 - g 103 135 135 135 - h 104 136 136 136 - i 105 137 137 137 - j 106 145 145 145 - k 107 146 146 146 - l 108 147 147 147 - m 109 148 148 148 - n 110 149 149 149 - o 111 150 150 150 - p 112 151 151 151 - q 113 152 152 152 - r 114 153 153 153 - s 115 162 162 162 - t 116 163 163 163 - u 117 164 164 164 - v 118 165 165 165 - w 119 166 166 166 - x 120 167 167 167 - y 121 168 168 168 - z 
122 169 169 169 - { 123 192 192 251 ### - | 124 79 79 79 - } 125 208 208 253 ### - ~ 126 161 161 255 ### - 127 7 7 7 - 128 32 32 32 - 129 33 33 33 - 130 34 34 34 - 131 35 35 35 - 132 36 36 36 - 133 21 37 37 *** - 134 6 6 6 - 135 23 23 23 - 136 40 40 40 - 137 41 41 41 - 138 42 42 42 - 139 43 43 43 - 140 44 44 44 - 141 9 9 9 - 142 10 10 10 - 143 27 27 27 - 144 48 48 48 - 145 49 49 49 - 146 26 26 26 - 147 51 51 51 - 148 52 52 52 - 149 53 53 53 - 150 54 54 54 - 151 8 8 8 - 152 56 56 56 - 153 57 57 57 - 154 58 58 58 - 155 59 59 59 - 156 4 4 4 - 157 20 20 20 - 158 62 62 62 - 159 255 255 95 ### - 160 65 65 65 - 161 170 170 170 - 162 74 74 176 ### - 163 177 177 177 - 164 159 159 159 - 165 178 178 178 - 166 106 106 208 ### -
167 181 181 181 - 168 189 187 121 *** ### - 169 180 180 180 - 170 154 154 154 - 171 138 138 138 - 172 95 176 186 *** ### - 173 202 202 202 - 174 175 175 175 - 175 188 188 161 ### - 176 144 144 144 - 177 143 143 143 - 178 234 234 234 - 179 250 250 250 - 180 190 190 190 - 181 160 160 160 - 182 182 182 182 - 183 179 179 179 - 184 157 157 157 - 185 218 218 218 - 186 155 155 155 - 187 139 139 139 - 188 183 183 183 - 189 184 184 184 - 190 185 185 185 - 191 171 171 171 - 192 100 100 100 - 193 101 101 101 - 194 98 98 98 - 195 102 102 102 - 196 99 99 99 - 197 103 103 103 - 198 158 158 158 - 199 104 104 104 - 200 116 116 116 - 201 113 113 113 - 202 114 114 114 - 203 115 115 115 - 204 120 120 120 - 205 117 117 117 - 206 118 118 118 - 207 119 119 119 - 208 172 172 172 - 209 105 105 105 - 210 237 237 237 - 211 238 238 238 - 212 235 235 235 - 213 239 239 239 - 214 236 236 236 - 215 191 191 191 - 216 128 128 128 - 217 253 253 224 ### - 218 254 254 254 - 219 251 251 221 ### - 220 252 252 252 - 221 173 186 173 *** ### - 222 174 174 174 - 223 89 89 89 - 224 68 68 68 - 225 69 69 69 - 226 66 66 66 - 227 70 70 70 - 228 67 67 67 - 229 71 71 71 - 230 156 156 156 - 231 72 72 72 - 232 84 84 84 - 233 81 81 81 - 234 82 82 82 - 235 83 83 83 - 236 88 88 88 - 237 85 85 85 - 238 86 86 86 - 239 87 87 87 - 240 140 140 140 - 241 73 73 73 - 242 205 205 205 - 243 206 206 206 - 244 203 203 203 - 245 207 207 207 - 246 204 204 204 - 247 225 225 225 - 248 112 112 112 - 249 221 221 192 ### - 250 222 222 222 - 251 219 219 219 - 252 220 220 220 - 253 141 141 141 - 254 142 142 142 - 255 223 223 223 + -e '{printf("%s%-9X%-9X%-9X%X\n",$1,$2,$3,$4,$5)}' perlebcdic.pod + +Or, in order to retain the UTF-x code points in hexadecimal: + +=over 4 + +=item recipe 3 + +=back + + open(FH,") { + if (/(.{33})(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.?(\d*)\s+(\d+)\.?(\d*)/) { + if ($7 ne '' && $9 ne '') { + printf("%s%-9X%-9X%-9X%-9X%-2X.%-6X%-2X.%X\n",$1,$2,$3,$4,$5,$6,$7,$8,$9); + } + elsif ($7 ne '') { + 
printf("%s%-9X%-9X%-9X%-9X%-2X.%-6X%X\n",$1,$2,$3,$4,$5,$6,$7,$8); + } + else { + printf("%s%-9X%-9X%-9X%-9X%-9X%X\n",$1,$2,$3,$4,$5,$6,$8); + } + } + } + + + incomp- incomp- + 8859-1 lete lete + chr 0819 0037 1047 POSIX-BC UTF-8 UTF-EBCDIC + ------------------------------------------------------------------------------------ + 0 0 0 0 0 0 + 1 1 1 1 1 1 + 2 2 2 2 2 2 + 3 3 3 3 3 3 + 4 55 55 55 4 55 + 5 45 45 45 5 45 + 6 46 46 46 6 46 + 7 47 47 47 7 47 + 8 22 22 22 8 22 + 9 5 5 5 9 5 + 10 37 21 21 10 21 *** + 11 11 11 11 11 11 + 12 12 12 12 12 12 + 13 13 13 13 13 13 + 14 14 14 14 14 14 + 15 15 15 15 15 15 + 16 16 16 16 16 16 + 17 17 17 17 17 17 + 18 18 18 18 18 18 + 19 19 19 19 19 19 + 20 60 60 60 20 60 + 21 61 61 61 21 61 + 22 50 50 50 22 50 + 23 38 38 38 23 38 + 24 24 24 24 24 24 + 25 25 25 25 25 25 + 26 63 63 63 26 63 + 27 39 39 39 27 39 + 28 28 28 28 28 28 + 29 29 29 29 29 29 + 30 30 30 30 30 30 + 31 31 31 31 31 31 + 32 64 64 64 32 64 + ! 33 90 90 90 33 90 + " 34 127 127 127 34 127 + # 35 123 123 123 35 123 + $ 36 91 91 91 36 91 + % 37 108 108 108 37 108 + & 38 80 80 80 38 80 + ' 39 125 125 125 39 125 + ( 40 77 77 77 40 77 + ) 41 93 93 93 41 93 + * 42 92 92 92 42 92 + + 43 78 78 78 43 78 + , 44 107 107 107 44 107 + - 45 96 96 96 45 96 + . 46 75 75 75 46 75 + / 47 97 97 97 47 97 + 0 48 240 240 240 48 240 + 1 49 241 241 241 49 241 + 2 50 242 242 242 50 242 + 3 51 243 243 243 51 243 + 4 52 244 244 244 52 244 + 5 53 245 245 245 53 245 + 6 54 246 246 246 54 246 + 7 55 247 247 247 55 247 + 8 56 248 248 248 56 248 + 9 57 249 249 249 57 249 + : 58 122 122 122 58 122 + ; 59 94 94 94 59 94 + < 60 76 76 76 60 76 + = 61 126 126 126 61 126 + > 62 110 110 110 62 110 + ? 
63 111 111 111 63 111 + @ 64 124 124 124 64 124 + A 65 193 193 193 65 193 + B 66 194 194 194 66 194 + C 67 195 195 195 67 195 + D 68 196 196 196 68 196 + E 69 197 197 197 69 197 + F 70 198 198 198 70 198 + G 71 199 199 199 71 199 + H 72 200 200 200 72 200 + I 73 201 201 201 73 201 + J 74 209 209 209 74 209 + K 75 210 210 210 75 210 + L 76 211 211 211 76 211 + M 77 212 212 212 77 212 + N 78 213 213 213 78 213 + O 79 214 214 214 79 214 + P 80 215 215 215 80 215 + Q 81 216 216 216 81 216 + R 82 217 217 217 82 217 + S 83 226 226 226 83 226 + T 84 227 227 227 84 227 + U 85 228 228 228 85 228 + V 86 229 229 229 86 229 + W 87 230 230 230 87 230 + X 88 231 231 231 88 231 + Y 89 232 232 232 89 232 + Z 90 233 233 233 90 233 + [ 91 186 173 187 91 173 *** ### + \ 92 224 224 188 92 224 ### + ] 93 187 189 189 93 189 *** + ^ 94 176 95 106 94 95 *** ### + _ 95 109 109 109 95 109 + ` 96 121 121 74 96 121 ### + a 97 129 129 129 97 129 + b 98 130 130 130 98 130 + c 99 131 131 131 99 131 + d 100 132 132 132 100 132 + e 101 133 133 133 101 133 + f 102 134 134 134 102 134 + g 103 135 135 135 103 135 + h 104 136 136 136 104 136 + i 105 137 137 137 105 137 + j 106 145 145 145 106 145 + k 107 146 146 146 107 146 + l 108 147 147 147 108 147 + m 109 148 148 148 109 148 + n 110 149 149 149 110 149 + o 111 150 150 150 111 150 + p 112 151 151 151 112 151 + q 113 152 152 152 113 152 + r 114 153 153 153 114 153 + s 115 162 162 162 115 162 + t 116 163 163 163 116 163 + u 117 164 164 164 117 164 + v 118 165 165 165 118 165 + w 119 166 166 166 119 166 + x 120 167 167 167 120 167 + y 121 168 168 168 121 168 + z 122 169 169 169 122 169 + { 123 192 192 251 123 192 ### + | 124 79 79 79 124 79 + } 125 208 208 253 125 208 ### + ~ 126 161 161 255 126 161 ### + 127 7 7 7 127 7 + 128 32 32 32 194.128 32 + 129 33 33 33 194.129 33 + 130 34 34 34 194.130 34 + 131 35 35 35 194.131 35 + 132 36 36 36 194.132 36 + 133 21 37 37 194.133 37 *** + 134 6 6 6 194.134 6 + 135 23 23 23 194.135 23 + 136 40 40 40 194.136 40 
+ 137 41 41 41 194.137 41 + 138 42 42 42 194.138 42 + 139 43 43 43 194.139 43 + 140 44 44 44 194.140 44 + 141 9 9 9 194.141 9 + 142 10 10 10 194.142 10 + 143 27 27 27 194.143 27 + 144 48 48 48 194.144 48 + 145 49 49 49 194.145 49 + 146 26 26 26 194.146 26 + 147 51 51 51 194.147 51 + 148 52 52 52 194.148 52 + 149 53 53 53 194.149 53 + 150 54 54 54 194.150 54 + 151 8 8 8 194.151 8 + 152 56 56 56 194.152 56 + 153 57 57 57 194.153 57 + 154 58 58 58 194.154 58 + 155 59 59 59 194.155 59 + 156 4 4 4 194.156 4 + 157 20 20 20 194.157 20 + 158 62 62 62 194.158 62 + 159 255 255 95 194.159 255 ### + 160 65 65 65 194.160 128.65 + 161 170 170 170 194.161 128.66 + 162 74 74 176 194.162 128.67 ### + 163 177 177 177 194.163 128.68 + 164 159 159 159 194.164 128.69 + 165 178 178 178 194.165 128.70 + 166 106 106 208 194.166 128.71 ### +
167 181 181 181 194.167 128.72 + 168 189 187 121 194.168 128.73 *** ### + 169 180 180 180 194.169 128.74 + 170 154 154 154 194.170 128.81 + 171 138 138 138 194.171 128.82 + 172 95 176 186 194.172 128.83 *** ### + 173 202 202 202 194.173 128.84 + 174 175 175 175 194.174 128.85 + 175 188 188 161 194.175 128.86 ### + 176 144 144 144 194.176 128.87 + 177 143 143 143 194.177 128.88 + 178 234 234 234 194.178 128.89 + 179 250 250 250 194.179 128.98 + 180 190 190 190 194.180 128.99 + 181 160 160 160 194.181 128.100 + 182 182 182 182 194.182 128.101 + 183 179 179 179 194.183 128.102 + 184 157 157 157 194.184 128.103 + 185 218 218 218 194.185 128.104 + 186 155 155 155 194.186 128.105 + 187 139 139 139 194.187 128.106 + 188 183 183 183 194.188 128.112 + 189 184 184 184 194.189 128.113 + 190 185 185 185 194.190 128.114 + 191 171 171 171 194.191 128.115 + 192 100 100 100 195.128 138.65 + 193 101 101 101 195.129 138.66 + 194 98 98 98 195.130 138.67 + 195 102 102 102 195.131 138.68 + 196 99 99 99 195.132 138.69 + 197 103 103 103 195.133 138.70 + 198 158 158 158 195.134 138.71 + 199 104 104 104 195.135 138.72 + 200 116 116 116 195.136 138.73 + 201 113 113 113 195.137 138.74 + 202 114 114 114 195.138 138.81 + 203 115 115 115 195.139 138.82 + 204 120 120 120 195.140 138.83 + 205 117 117 117 195.141 138.84 + 206 118 118 118 195.142 138.85 + 207 119 119 119 195.143 138.86 + 208 172 172 172 195.144 138.87 + 209 105 105 105 195.145 138.88 + 210 237 237 237 195.146 138.89 + 211 238 238 238 195.147 138.98 + 212 235 235 235 195.148 138.99 + 213 239 239 239 195.149 138.100 + 214 236 236 236 195.150 138.101 + 215 191 191 191 195.151 138.102 + 216 128 128 128 195.152 138.103 + 217 253 253 224 195.153 138.104 ### + 218 254 254 254 195.154 138.105 + 219 251 251 221 195.155 138.106 ### + 220 252 252 252 195.156 138.112 + 221 173 186 173 195.157 138.113 *** ### + 222 174 174 174 195.158 138.114 + 223 89 89 89 195.159 138.115 + 224 68 68 68 195.160 139.65 + 225 69 69 69 195.161 139.66 + 226 66 66 
66 195.162 139.67 + 227 70 70 70 195.163 139.68 + 228 67 67 67 195.164 139.69 + 229 71 71 71 195.165 139.70 + 230 156 156 156 195.166 139.71 + 231 72 72 72 195.167 139.72 + 232 84 84 84 195.168 139.73 + 233 81 81 81 195.169 139.74 + 234 82 82 82 195.170 139.81 + 235 83 83 83 195.171 139.82 + 236 88 88 88 195.172 139.83 + 237 85 85 85 195.173 139.84 + 238 86 86 86 195.174 139.85 + 239 87 87 87 195.175 139.86 + 240 140 140 140 195.176 139.87 + 241 73 73 73 195.177 139.88 + 242 205 205 205 195.178 139.89 + 243 206 206 206 195.179 139.98 + 244 203 203 203 195.180 139.99 + 245 207 207 207 195.181 139.100 + 246 204 204 204 195.182 139.101 + 247 225 225 225 195.183 139.102 + 248 112 112 112 195.184 139.103 + 249 221 221 192 195.185 139.104 ### + 250 222 222 222 195.186 139.105 + 251 219 219 219 195.187 139.106 + 252 220 220 220 195.188 139.112 + 253 141 141 141 195.189 139.113 + 254 142 142 142 195.190 139.114 + 255 223 223 223 195.191 139.115 If you would rather see the above table in CCSID 0037 order rather than ASCII + Latin-1 order then run the table through: =over 4 -=item recipe 2 +=item recipe 4 =back perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\ -e '{push(@l,$_)}' \ -e 'END{print map{$_->[0]}' \ - -e ' sort{$a->[1] <=> $b->[1]}' \ + -e ' sort{$a->[1] <=> $b->[1]}' \ -e ' map{[$_,substr($_,42,3)]}@l;}' perlebcdic.pod If you would rather see it in CCSID 1047 order then change the digit @@ -418,14 +559,14 @@ If you would rather see it in CCSID 1047 order then change the digit =over 4 -=item recipe 3 +=item recipe 5 =back perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\ -e '{push(@l,$_)}' \ -e 'END{print map{$_->[0]}' \ - -e ' sort{$a->[1] <=> $b->[1]}' \ + -e ' sort{$a->[1] <=> $b->[1]}' \ -e ' map{[$_,substr($_,51,3)]}@l;}' perlebcdic.pod If you would rather see it in POSIX-BC order then change the digit @@ -433,14 +574,14 @@ If you would rather see it in POSIX-BC order then change the digit =over 4 -=item recipe 4 
+=item recipe 6
 
 =back
 
     perl -ne 'if(/.{33}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}\s{6,8}\d{1,3}/)'\
      -e '{push(@l,$_)}' \
      -e 'END{print map{$_->[0]}' \
-     -e ' sort{$a->[1] <=> $b->[1]}' \
+     -e ' sort{$a->[1] <=> $b->[1]}' \
      -e ' map{[$_,substr($_,60,3)]}@l;}' perlebcdic.pod
 
 
@@ -523,13 +664,13 @@ it in tr/// like so:
     '\060\061\062\063\064\065\066\067\070\071\263\333\334\331\332\237' ;
 
     my $ebcdic_string = $ascii_string;
-    eval '$ebcdic_string =~ tr/\000-\377/' . $cp_037 . '/';
+    eval '$ebcdic_string =~ tr/' . $cp_037 . '/\000-\377/';
 
 To convert from EBCDIC 037 to ASCII just reverse the order of the tr///
 arguments like so:
 
     my $ascii_string = $ebcdic_string;
-    eval '$ascii_string = tr/' . $cp_037 . '/\000-\377/';
+    eval '$ascii_string =~ tr/\000-\377/' . $cp_037 . '/';
 
 Similarly one could take the output of the third column from recipe 0 to
 obtain a C<$cp_1047> table. The fourth column of the output from recipe
@@ -541,22 +682,22 @@ XPG operability often implies the presence of an I<iconv> utility
 available from the shell or from the C library. Consult your system's
 documentation for information on iconv.
 
-On OS/390 see the iconv(1) man page. One way to invoke the iconv
+On OS/390 or z/OS see the iconv(1) manpage. One way to invoke the iconv
 shell utility from within perl would be to:
 
-    # OS/390 example
+    # OS/390 or z/OS example
     $ascii_data = `echo '$ebcdic_data'| iconv -f IBM-1047 -t ISO8859-1`
 
 or the inverse map:
 
-    # OS/390 example
+    # OS/390 or z/OS example
     $ebcdic_data = `echo '$ascii_data'| iconv -f ISO8859-1 -t IBM-1047`
 
 For other perl based conversion options see the Convert::* modules on CPAN.
 
 =head2 C RTL
 
-The OS/390 C run time library provides _atoe() and _etoa() functions.
+The OS/390 and z/OS C run time libraries provide _atoe() and _etoa() functions.
 =head1 OPERATOR DIFFERENCES
 
@@ -675,8 +816,8 @@ recommend something similar to:
     print "Content-type:\ttext/html\015\012\015\012";
     # this may be wrong on EBCDIC
 
-Under the IBM OS/390 USS Web Server for example you should instead
-write that as:
+Under the IBM OS/390 USS Web Server or WebSphere on z/OS for example
+you should instead write that as:
 
     print "Content-type:\ttext/html\r\n\r\n"; # OK for DGW et alia
 
@@ -717,7 +858,11 @@ As of perl 5.005_03 the letter range regular expression such as
 [A-Z] and [a-z] have been especially coded to not pick up gap
 characters. For example, characters such as E<ocirc> C<o WITH CIRCUMFLEX>
 that lie between I and J would not be matched by the
-regular expression range C</[H-K]/>.
+regular expression range C</[H-K]/>. This works in
+the other direction, too, if either of the range end points is
+explicitly numeric: C<[\x89-\x91]> will match C<\x8e>, even
+though C<\x89> is C<i> and C<\x91 > is C<j>, and C<\x8e>
+is a gap character from the alphabetic viewpoint.
 
 If you do want to match the alphabet gap characters in a single octet
 regular expression try matching the hex or octal code such
@@ -909,7 +1054,7 @@ connection.
 This strategy can employ a network connection. As such
 it would be computationally expensive.
 
-=head1 TRANFORMATION FORMATS
+=head1 TRANSFORMATION FORMATS
 
 There are a variety of ways of transforming data with an intra character set
 mapping that serve a variety of purposes.
 Sorting was discussed in the
@@ -998,7 +1143,7 @@ following will print "Yes indeed\n" on either an ASCII or EBCDIC computer:
     $all_byte_chrs = '';
     for (0..255) { $all_byte_chrs .= chr($_); }
     $uuencode_byte_chrs = pack('u', $all_byte_chrs);
-    ($uu = <<' ENDOFHEREDOC') =~ s/^\s*//gm;
+    ($uu = <<'ENDOFHEREDOC') =~ s/^\s*//gm;
     M``$"`P0%!@<("0H+#`T.#Q`1$A,4%187&!D:&QP='A\@(2(C)"4F)R@I*BLL
     M+2XO,#$R,S0U-C<X.3H[/#T^/T!!0D-$149'2$E*2TQ-3D]045)35%565UA9
     M6EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G-T=79W>'EZ>WQ]?G^`@8*#A(6&
@@ -1073,7 +1218,7 @@ omitted for brevity):
     $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr $a2e[hex $1]/ge;
     $string =~ s/=[\n\r]+$//;
 
-=head2 Caesarian cyphers
+=head2 Caesarian ciphers
 
 The practice of shifting an alphabet one or more characters for encipherment
 dates back thousands of years and was explicitly detailed by Gaius Julius
@@ -1100,6 +1245,9 @@ In one-liner form:
 
 =head1 Hashing order and checksums
 
+To the extent that it is possible to write code that depends on
+hashing order there may be differences between hashes as stored
+on an ASCII based machine and hashes stored on an EBCDIC based machine.
 XXX
 
 =head1 I18N AND L10N
@@ -1110,26 +1258,34 @@ and discussed under the L section below.
 
 =head1 MULTI OCTET CHARACTER SETS
 
-Multi byte EBCDIC code pages; Unicode, UTF-8, UTF-EBCDIC, XXX.
+Perl may work with an internal UTF-EBCDIC encoding form for wide characters
+on EBCDIC platforms in a manner analogous to the way that it works with
+the UTF-8 internal encoding form on ASCII based platforms.
+
+Legacy multi byte EBCDIC code pages XXX.
 
 =head1 OS ISSUES
 
 There may be a few system dependent issues
 of concern to EBCDIC Perl programmers.
 
-=head2 OS/400
-
-The PASE environment.
+=head2 OS/400
 
 =over 8
 
+=item PASE
+
+The PASE environment is runtime environment for OS/400 that can run
+executables built for PowerPC AIX in OS/400, see L. PASE
+is ASCII-based, not EBCDIC-based as the ILE.
+
 =item IFS access
 
 XXX.
 
 =back
 
-=head2 OS/390
+=head2 OS/390, z/OS
 
 Perl runs under Unix Systems Services or USS.
@@ -1152,15 +1308,16 @@ or:
 
 See also the OS390::Stdio module on CPAN.
 
-=item OS/390 iconv
+=item OS/390, z/OS iconv
 
 B<iconv> is supported as both a shell utility and a C RTL routine.
 See also the iconv(1) and iconv(3) manual pages.
 
 =item locales
 
-On OS/390 see L for information on locales. The L10N files
-are in F. $Config{d_setlocale} is 'define' on OS/390.
+On OS/390 or z/OS see L for information on locales. The L10N files
+are in F. $Config{d_setlocale} is 'define' on OS/390
+or z/OS.
 
 =back
 
@@ -1179,18 +1336,16 @@ translation difficulties. In particular one popular nroff implementation
 was known to strip accented characters to their unaccented counterparts while
 attempting to view this document through the B program (for example,
 you may see a plain C rather than one with a diaeresis
-as in E). Another nroff truncated the resultant man page at
-the first occurence of 8 bit characters.
+as in E). Another nroff truncated the resultant manpage at
+the first occurrence of 8 bit characters.
 
 Not all shells will allow multiple C<-e> string arguments to perl to
-be concatenated together properly as recipes 2, 3, and 4 might seem
-to imply.
-
-Perl does not yet work with any Unicode features on EBCDIC platforms.
+be concatenated together properly as recipes 0, 2, 4, 5, and 6 might
+seem to imply.
 
 =head1 SEE ALSO
 
-L, L.
+L, L, L, L.
 
 =head1 REFERENCES
 
@@ -1204,10 +1359,7 @@ http://www.wps.com/texts/codes/
 B Tom Jennings,
 September 1999.
 
-B The Unicode Consortium,
-ISBN 0-201-48345-9, Addison Wesley Developers Press, July 1996.
-
-B The Unicode Consortium, Lisa Moore ed.,
+B The Unicode Consortium, Lisa Moore ed.,
 ISBN 0-201-61633-5, Addison Wesley Developers Press, February 2000.
 
 B Fred B. Wrixon, ISBN 1-57912-040-7, Black Dog & Leventhal Publishers, 1998.
 
+http://www.bobbemer.com/P-BIT.HTM
+B Robert Bemer.
+
+=head1 HISTORY
+
+15 April 2001: added UTF-8 and UTF-EBCDIC to main table, pvhp.
+
 =head1 AUTHOR
 
 Peter Prymmer pvhp@best.com wrote this in 1999 and 2000