The EBCDIC code page in use on Siemens' BS2000 system is distinct from
1047 and 0037. It is identified below as the POSIX-BC set.
+=head2 Unicode code points versus EBCDIC code points
+
+In Unicode terminology a I<code point> is the number assigned to a
+character: for example, in EBCDIC the character "A" is usually assigned
+the number 193. In Unicode the character "A" is assigned the number 65.
+This causes a problem with the semantics of the pack/unpack "U" template,
+which is supposed to pack Unicode code points to characters and unpack
+characters back to numbers. The problem is: which numbering should be
+used for code points less than 256? (For 256 and over there's no
+problem: Unicode code points are used.) On EBCDIC platforms, the EBCDIC
+code points are used for the low 256. This means that the equivalences
+
+ pack("U", ord($character)) eq $character
+ unpack("U", $character) == ord $character
+
+will hold. (If Unicode code points were applied consistently over
+all the possible code points, then on EBCDIC pack("U", ord("A")) would
+equal I<A with acute>, or chr(101), rather than "A", and unpack("U", "A")
+would equal 65, the EBCDIC code point of I<non-breaking space>, rather
+than 193, or ord "A".)
+
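+The equivalences above can be checked with a short script. This is
+only a sketch based on the behavior described here; the sample
+characters are arbitrary choices, and on ASCII-derived platforms the
+checks pass trivially because the native and Unicode code points agree
+below 256.
+
+    use strict;
+    use warnings;
+
+    # Sample characters are an arbitrary, illustrative choice.
+    for my $character ("A", "z", "0") {
+        # Native code point: 65 for "A" on ASCII, usually 193 on EBCDIC.
+        my $number = ord $character;
+
+        # pack "U" should turn the native number back into the character.
+        die "pack mismatch for $character\n"
+            unless pack("U", $number) eq $character;
+
+        # unpack "U" should return the same number that ord reports.
+        die "unpack mismatch for $character\n"
+            unless unpack("U", $character) == $number;
+    }
+    print "equivalences hold\n";
+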
=head2 Unicode and UTF
UTF is a Unicode Transformation Format. UTF-8 is a Unicode conforming
to this sample program ensures the output is completely UTF-8, and
of course, removes the warning.
-Perl 5.8.0 also supports Unicode on EBCDIC platforms. There, the
-support is somewhat harder to implement since additional conversions
-are needed at every step. Because of these difficulties, the Unicode
-support isn't quite as full as in other, mainly ASCII-based, platforms
-(the Unicode support is better than in the 5.6 series, which didn't
-work much at all for EBCDIC platform). On EBCDIC platforms, the
-internal Unicode encoding form is UTF-EBCDIC instead of UTF-8 (the
-difference is that as UTF-8 is "ASCII-safe" in that ASCII characters
-encode to UTF-8 as-is, UTF-EBCDIC is "EBCDIC-safe").
+=head2 Unicode and EBCDIC
+
+Perl 5.8.0 also supports Unicode on EBCDIC platforms. There,
+the Unicode support is somewhat more complex to implement since
+additional conversions are needed at every step. Some problems
+remain, but they all seem to be related to the combination of
+the extra conversions just described and case-insensitive matching:
+for example, "\x{131}" (LATIN SMALL LETTER DOTLESS I) does not
+match "I" case-insensitively, as it should under Unicode.
+(The match succeeds on ASCII-derived platforms.)
+
+In any case, the Unicode support on EBCDIC platforms is better than
+in the 5.6 series, which hardly worked at all on EBCDIC platforms.
+On EBCDIC platforms, the internal Unicode encoding form is UTF-EBCDIC
+instead of UTF-8. Just as UTF-8 is "ASCII-safe", in that ASCII
+characters encode to UTF-8 as-is, UTF-EBCDIC is "EBCDIC-safe".
=head2 Creating Unicode