possible, but as soon as Unicodeness cannot be avoided, the data is
transparently upgraded to Unicode.
-The internal encoding of Unicode in Perl is UTF-8. The internal
-encoding is normally hidden, however, and one need not and should not
-worry about the internal encoding at all: it is all just characters.
+Internally, Perl currently uses either the platform's native eight-bit
+character set (for example Latin-1) or UTF-8 to encode Unicode strings.
+Specifically, if all code points in the string are 0xFF or less, Perl
+uses the native eight-bit character set. Otherwise, it uses UTF-8.
+
+A user of Perl does not normally need to know or care how Perl happens
+to encode its internal strings, but it becomes relevant when outputting
+Unicode strings to a stream without a discipline (one using only the
+"default" discipline). In such a case, the raw bytes used internally (the native
+character set or UTF-8, as appropriate for each string) will be used,
+and if warnings are turned on, a "Wide character" warning will be issued
+if those strings contain a character beyond 0x00FF.
+
+For example,
+
+ perl -w -e 'print "\x{DF}\n", "\x{0100}\x{DF}\n"'
+
+produces a fairly useless mixture of native bytes and UTF-8, as well
+as a warning.
+
+To output UTF-8 always, use the ":utf8" output discipline. Prepending
+
+ binmode(STDOUT, ":utf8");
+
+to this sample program ensures that the output is completely UTF-8 and,
+of course, removes the warning. Another way to achieve this is the
+L<encoding> pragma, discussed later in L</Legacy Encodings>.
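The behaviour described above can be observed from within a program. The following is a minimal sketch, assuming an ASCII-based (non-EBCDIC) platform and a Perl recent enough to provide C<utf8::is_utf8()> and in-memory filehandles; the variable names are purely illustrative. It inspects the internal representation of the two strings from the sample program and then writes them through a ":utf8" discipline into a buffer instead of STDOUT, so the resulting bytes can be examined.

```perl
use strict;
use warnings;

# One string whose code points are all <= 0xFF, and one that
# contains a code point above 0xFF.
my $narrow = "\x{DF}";
my $wide   = "\x{0100}\x{DF}";

# utf8::is_utf8() reports whether a string is currently stored
# as UTF-8 internally.  This is for inspection only; ordinary
# programs should not need to care.
printf "narrow stored as UTF-8: %s\n", utf8::is_utf8($narrow) ? "yes" : "no";
printf "wide stored as UTF-8:   %s\n", utf8::is_utf8($wide)   ? "yes" : "no";

# Either way, length() counts characters, not bytes.
printf "lengths in characters:  %d and %d\n", length($narrow), length($wide);

# Write both strings through a ":utf8" discipline into an
# in-memory buffer, so that every character is output as UTF-8.
open(my $out, '>:utf8', \my $buffer)
    or die "cannot open in-memory handle: $!";
print {$out} $narrow, $wide;
close($out) or die "cannot close in-memory handle: $!";

# Show the bytes that were written.
print join(" ", map { sprintf "%02X", ord } split //, $buffer), "\n";
```

With the ":utf8" discipline in place, C<\x{DF}> is written as the same two-byte UTF-8 sequence in both strings, regardless of how each string happened to be stored internally.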
Perl 5.8.0 will also support Unicode on EBCDIC platforms. There the
support is somewhat harder to implement since additional conversions
support won't be quite as full as in other, mainly ASCII-based,
platforms (the Unicode support will be better than in the 5.6 series,
which didn't work much at all for EBCDIC platform). On EBCDIC
-platforms the internal encoding form used is UTF-EBCDIC.
+platforms the internal encoding form used is UTF-EBCDIC instead
+of UTF-8 (the difference is that, just as UTF-8 is "ASCII-safe" in that
+ASCII characters encode to UTF-8 as-is, UTF-EBCDIC is "EBCDIC-safe").
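The "ASCII-safe" property can be checked directly. This is a small sketch, assuming an ASCII-based platform and the core Encode module; the sample strings are arbitrary. It encodes a pure-ASCII string to UTF-8 and compares the result with the original, then does the same for a non-ASCII character to show the contrast.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A string made only of ASCII characters.
my $ascii = "Perl 5.8";

# Encoding it to UTF-8 yields exactly the same bytes.
my $ascii_bytes = encode("UTF-8", $ascii);
print "ASCII string: ",
      ($ascii_bytes eq $ascii ? "unchanged" : "changed"), "\n";

# A character outside ASCII needs more than one byte in UTF-8.
my $sharp_s_bytes = encode("UTF-8", "\x{DF}");
printf "U+00DF encodes to %d bytes\n", length($sharp_s_bytes);
```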
=head2 Creating Unicode
-To create Unicode literals, use the C<\x{...}> notation in
-doublequoted strings:
+To create Unicode literals for code points above 0xFF, use the
+C<\x{...}> notation in doublequoted strings:
my $smiley = "\x{263a}";