# This is the first line of program text.
sub foo { # This is the second.
-It is an error to try to I<start> a Pod black with a "=cut" command. In
+It is an error to try to I<start> a Pod block with a "=cut" command. In
that case, the Pod processor must halt parsing of the input file, and
must by default emit a warning.
to use "=for formatname text..." to express "text..." as a verbatim
paragraph.
+=item "=encoding encodingname"
+
+This command, which should occur early in the document (at least
+before any non-US-ASCII data!), declares that this document is
+encoded in the encoding I<encodingname>, which must be
+an encoding name that L<Encode> recognizes. (Encode's list
+of supported encodings, in L<Encode::Supported>, is useful here.)
+If the Pod parser cannot decode the declared encoding, it
+should emit a warning and may abort parsing the document
+altogether.
+
+A document having more than one "=encoding" line should be
+considered an error. Pod processors may silently tolerate this if
+the not-first "=encoding" lines are just duplicates of the
+first one (e.g., if there's a "=encoding utf8" line, and later on
+another "=encoding utf8" line). But Pod processors should complain if
+there are contradictory "=encoding" lines in the same document
+(e.g., if there is a "=encoding utf8" early in the document and
+"=encoding big5" later). Pod processors that recognize BOMs
+may also complain if they see an "=encoding" line
+that contradicts the BOM (e.g., if a document with a UTF-16LE
+BOM has an "=encoding shiftjis" line).
+
=back
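The duplicate-versus-contradiction rule for "=encoding" lines can be sketched as follows. This is a minimal illustration, not how any existing Pod parser does it: the function name check_encoding_lines is hypothetical, Python's codecs registry stands in for Perl's Encode as the judge of which names are recognized, and any line that begins with "=encoding" is treated as a command (a real parser would only look at the start of a Pod paragraph).

```python
import codecs


def check_encoding_lines(pod_text):
    """Return (first_declared_name, complaints) for "=encoding" lines.

    Hypothetical helper: names are canonicalized with codecs.lookup,
    which is an assumption standing in for Perl's Encode module.
    """
    first = None        # canonical name from the first "=encoding" line
    complaints = []
    for lineno, line in enumerate(pod_text.splitlines(), start=1):
        # Commands start in column one; indented lines are verbatim text.
        if not line.startswith("=encoding"):
            continue
        parts = line.split()
        if parts[0] != "=encoding" or len(parts) < 2:
            continue
        try:
            name = codecs.lookup(parts[1]).name
        except LookupError:
            complaints.append(f"line {lineno}: unknown encoding {parts[1]!r}")
            continue
        if first is None:
            first = name
        elif name != first:
            complaints.append(
                f"line {lineno}: {name!r} contradicts earlier {first!r}"
            )
        # A later line naming the same encoding is silently tolerated.
    return first, complaints
```

Calling it on a document with two "=encoding utf8" lines yields no complaints, while "=encoding utf8" followed by "=encoding big5" yields one complaint for the second line.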
If a Pod processor sees any command other than the ones listed
This formatting code is syntactically simple, but semantically
complex. What it means is that each space in the printable
-content of this code signifies a nonbreaking space.
+content of this code signifies a non-breaking space.
Consider:
Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, ":", one space, "$z". The
difference is that in the latter, with the S code, those spaces
-are not "normal" spaces, but instead are nonbreaking spaces.
+are not "normal" spaces, but instead are non-breaking spaces.
=back
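For an output format that has a dedicated non-breaking space character, the S code's effect on plain text can be sketched as below. The helper name render_s_content is hypothetical, and the sketch assumes the content has already been reduced to a plain string (nested formatting codes are not handled).

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def render_s_content(text):
    """Replace each ordinary space (character 32) with a non-breaking space.

    Hypothetical helper: only plain strings are handled, not nested codes.
    """
    return text.replace(" ", NBSP)
```

Formats that lack such a character would instead wrap the untouched text in their own "don't break this across lines" construct.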
would parse as equivalent to this:
- C<$foo-E<lt>bar>
+ C<$foo-E<gt>bar>
instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code. This
0xEF 0xBB 0xBF
=for comment
- If toke.c is modified to support UTF32, add mention of those here.
+ If toke.c is modified to support UTF-32, add mention of those here.
=item *
=for comment
If/WHEN some brave soul makes these heuristics into a generic
- text-file class (or file discipline?), we can presumably delete
+ text-file class (or PerlIO layer?), we can presumably delete
mention of these icky details from this file, and can instead
- tell people to just use appropriate class/discipline.
+ tell people to just use appropriate class/layer.
Auto-recognition of newline sequences would be another desirable
- feature of such a class/discipline.
+ feature of such a class/layer.
HINT HINT HINT.
=for comment
=item *
When rendering Pod to a format that has two kinds of hyphens (-), one
-that's a nonbreaking hyphen, and another that's a breakable hyphen
+that's a non-breaking hyphen, and another that's a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
-generally translate "-" to nonbreaking hyphen, but may apply
+generally translate "-" to non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.
=item *
such sequences in the target output format. A converter to *roff
would, for example know that "\xE9" (whether conveyed literally, or via
a EE<lt>...> sequence) is to be conveyed as "e\\*'".
-Similarly, a program rendering Pod in a MacOS application window, would
+Similarly, a program rendering Pod in a Mac OS application window, would
presumably need to know that "\xE9" maps to codepoint 142 in MacRoman
-encoding that (at time of writing) is native for MacOS. Such
+encoding that (at time of writing) is native for Mac OS. Such
Unicode2whatever mappings are presumably already widely available for
common output formats. (Such mappings may be incomplete! Implementers
are not expected to bend over backwards in an attempt to render
=item *
-Some Pod formatters output to formats that implement nonbreaking
+Some Pod formatters output to formats that implement non-breaking
spaces as an individual character (which I'll call "NBSP"), and
-others output to formats that implement nonbreaking spaces just as
+others output to formats that implement non-breaking spaces just as
spaces wrapped in a "don't break this across lines" code. Note that
at the level of Pod, both sorts of codes can occur: Pod can contain a
NBSP character (whether as a literal, or as a "EE<lt>160>" or
"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
-such codes are taken to represent nonbreaking spaces. Pod
+such codes are taken to represent non-breaking spaces. Pod
parsers should consider supporting the optional parsing of "SE<lt>foo
IE<lt>barE<gt> baz>" as if it were
"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the