=for comment
Hm, I wonder what it would look like if
you tried to write a BNF for Pod from this.
-
+
=head3 Dr. Strangelove, or: How I Learned to
Stop Worrying and Love the Bomb
# <- that's the 0th column
=head1 Foo
-
+
Stuff
-
+
$foo->bar
-
+
=cut
Here, "=head1 Foo" and "=cut" are command paragraphs because the first
is a heading. That text may contain formatting codes. Examples:
=head1 Object Attributes
-
+
=head3 What B<Not> to Do!
=item "=pod"
it must be ignored. Examples:
=pod
-
+
This is a plain Pod paragraph.
-
+
=pod This text is ignored.
=item "=cut"
# This is the first line of program text.
sub foo { # This is the second.
-It is an error to try to I<start> a Pod black with a "=cut" command. In
+It is an error to try to I<start> a Pod block with a "=cut" command. In
that case, the Pod processor must halt parsing of the input file, and
must by default emit a warning.
below. Formatting codes are not expanded. Examples:
=over 3
-
+
=over 3.5
-
+
=over
=item "=item"
below. Examples:
=item
-
+
=item *
-
+
=item *
-
+
=item 14
-
+
=item 3.
-
+
=item C<< $thing->stuff(I<dodad>) >>
-
+
=item For transporting us beyond seas to be tried for pretended
offenses
-
+
=item He is at this time transporting large armies of foreign
mercenaries to complete the works of death, desolation and
tyranny, already begun with circumstances of cruelty and perfidy
This is synonymous with:
=begin formatname
-
+
text...
-
+
=end formatname
That is, it creates a region consisting of a single paragraph; that
to use "=for formatname text..." to express "text..." as a verbatim
paragraph.
+=item "=encoding encodingname"
+
+This command, which should occur early in the document (at least
+before any non-US-ASCII data!), declares that this document is
+encoded in the encoding I<encodingname>, which must be
+an encoding name that L<Encode> recognizes. (Encode's list
+of supported encodings, in L<Encode::Supported>, is useful here.)
+If the Pod parser cannot decode the declared encoding, it
+should emit a warning and may abort parsing the document
+altogether.
+
+A document having more than one "=encoding" line should be
+considered an error. Pod processors may silently tolerate this if
+the not-first "=encoding" lines are just duplicates of the
+first one (e.g., if there's a "=encoding utf8" line, and later on
+another "=encoding utf8" line). But Pod processors should complain if
+there are contradictory "=encoding" lines in the same document
+(e.g., if there is a "=encoding utf8" early in the document and
+"=encoding big5" later). Pod processors that recognize BOMs
+may also complain if they see an "=encoding" line
+that contradicts the BOM (e.g., if a document with a UTF-16LE
+BOM has an "=encoding shiftjis" line).
+
=back
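The "=encoding" rules above can be checked mechanically. What follows is a minimal sketch, not code from any Pod module. The helper name check_encoding_lines is hypothetical. It assumes each command begins at the start of a line, whereas a real parser would look only at the first line of each paragraph. It relies on the core Encode module's find_encoding for two things: deciding whether a name is recognized, and telling duplicates (the same canonical encoding) apart from contradictions. BOM checking is left out.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not from any Pod module): apply the "=encoding"
# rules to the raw lines of a document. Assumes every command begins at
# the start of a line; a real parser would only inspect paragraph starts.
# Returns the first declared encoding name, or undef if none was declared.
sub check_encoding_lines {
    my @lines = @_;
    my ($first_raw, $first_canon);

    for my $line (@lines) {
        next unless $line =~ /\A=encoding\s+(\S+)/;
        my $raw = $1;

        # Encode::find_encoding returns undef for names it doesn't know.
        my $enc = Encode::find_encoding($raw);
        if (!$enc) {
            warn "Unrecognized encoding '$raw'\n";
            next;
        }

        if (!defined $first_canon) {
            ($first_raw, $first_canon) = ($raw, $enc->name);
        }
        elsif ($enc->name ne $first_canon) {
            # Contradictory declarations should be reported.
            warn "Contradictory =encoding lines: '$first_raw' vs. '$raw'\n";
        }
        # Otherwise it merely duplicates the first declaration,
        # which this sketch silently tolerates.
    }
    return $first_raw;
}
```

Reporting duplicates instead of tolerating them would be an equally valid choice, because the text says only that processors "may" tolerate them.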
If a Pod processor sees any command other than the ones listed
This formatting code is syntactically simple, but semantically
complex. What it means is that each space in the printable
-content of this code signifies a nonbreaking space.
+content of this code signifies a non-breaking space.
Consider:
Both signify the monospace (c[ode] style) text consisting of
"$x", one space, "?", one space, ":", one space, "$z". The
difference is that in the latter, with the S code, those spaces
-are not "normal" spaces, but instead are nonbreaking spaces.
+are not "normal" spaces, but instead are non-breaking spaces.
=back
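A processor can implement the S code by replacing each space in the printable text under it with an NBSP (character 160, U+00A0). The sketch below does this over a hypothetical parse tree. In that tree, a node is either a string of printable text or an array reference that holds a code letter followed by its children. The tree shape and the name apply_s_codes are assumptions made for illustration. Non-printable data is assumed to live outside the children list, so spaces in something like a link target are never touched.

```perl
use strict;
use warnings;

# Hypothetical parse-tree model (not any real Pod parser's): a node is
# either a plain string of printable text, or an array ref of the form
# [ $code_letter, @children ]; S<C<$x ? $y : $z>> would be
# [ 'S', [ 'C', '$x ? $y : $z' ] ]. Non-printable data such as link
# targets is assumed to be stored outside the children.
sub apply_s_codes {
    my ($node, $inside_s) = @_;

    # Leaf: printable text. Under an S code, each space becomes NBSP.
    if (!ref $node) {
        return $node unless $inside_s;
        (my $text = $node) =~ tr/ /\x{A0}/;
        return $text;
    }

    # Interior node: build a new node, recursing into the children and
    # noting whether they are now inside an S code.
    my ($code, @children) = @$node;
    my $now_inside = $inside_s || $code eq 'S';
    return [ $code, map { apply_s_codes($_, $now_inside) } @children ];
}
```

The function returns a new tree and leaves the input unchanged. Text outside any S code comes back exactly as it was.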
would parse as equivalent to this:
- C<$foo-E<lt>bar>
+ C<$foo-E<gt>bar>
instead of as equivalent to a "C" formatting code containing
only "$foo-", and then a "bar>" outside the "C" formatting code. This
two paragraphs:
I<I told you not to do this!
-
+
Don't make me say it again!>
...must I<not> be parsed as two paragraphs in italics (with the I
above code must parse as if it were:
I<I told you not to do this!>
-
+
Don't make me say it again!E<gt>
(In SGMLish jargon, all Pod commands are like block-level
0xEF 0xBB 0xBF
=for comment
- If toke.c is modified to support UTF32, add mention of those here.
+ If toke.c is modified to support UTF-32, add mention of those here.
=item *
=for comment
If/WHEN some brave soul makes these heuristics into a generic
- text-file class (or file discipline?), we can presumably delete
+ text-file class (or PerlIO layer?), we can presumably delete
mention of these icky details from this file, and can instead
- tell people to just use appropriate class/discipline.
+ tell people to just use an appropriate class/layer.
Auto-recognition of newline sequences would be another desirable
- feature of such a class/discipline.
+ feature of such a class/layer.
HINT HINT HINT.
=for comment
Minimal examples:
%% POD::Pod2PS v3.14159, using POD::Parser v1.92
-
+
<!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->
-
+
{\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}
-
+
.\" Pod::Man version 3.14159, using POD::Parser version 1.92
Formatters may also insert additional comments, including: the
=item *
When rendering Pod to a format that has two kinds of hyphens (-), one
-that's a nonbreaking hyphen, and another that's a breakable hyphen
+that's a non-breaking hyphen, and another that's a breakable hyphen
(as in "object-oriented", which can be split across lines as
"object-", newline, "oriented"), formatters are encouraged to
-generally translate "-" to nonbreaking hyphen, but may apply
+generally translate "-" to a non-breaking hyphen, but may apply
heuristics to convert some of these to breaking hyphens.
=item *
Pod parsers, when processing a series of verbatim paragraphs one
after another, should consider them to be one large verbatim
paragraph that happens to contain blank lines. I.e., these two
-lines, which have an blank line between them:
+lines, which have a blank line between them:
use Foo;
which all Pod formatters must render faithfully. Characters
in the ranges 0-31 and 127-159 should not be used (neither as
literals, nor as EE<lt>number> codes), except for the
-literal byte-sequences for newline (13, 13 10, or 13), and tab (9).
+literal byte-sequences for newline (13, 13 10, or 10), and tab (9).
Characters in the range 160-255 refer to Latin-1 characters (also
defined there by Unicode, with the same meaning). Characters above
presumably does not I<need> special treatment by a Pod processor;
" 0 1 2 3 " doesn't look like a number in any base, so it would
presumably be looked up in the table of HTML-like names. Since
-there is (and cannot be) an HTML-like entity called " 0 1 2 3 ",
+there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ",
this will be treated as an error. However, Pod processors may
treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically>
invalid, potentially earning a different error message than the
such sequences in the target output format. A converter to *roff
would, for example know that "\xE9" (whether conveyed literally, or via
a EE<lt>...> sequence) is to be conveyed as "e\\*'".
-Similarly, a program rendering Pod in a MacOS application window, would
+Similarly, a program rendering Pod in a Mac OS application window would
presumably need to know that "\xE9" maps to codepoint 142 in MacRoman
-encoding that (at time of writing) is native for MacOS. Such
+encoding that (at the time of writing) is native for Mac OS. Such
Unicode2whatever mappings are presumably already widely available for
common output formats. (Such mappings may be incomplete! Implementers
are not expected to bend over backwards in an attempt to render
anything, renderable or not). It is good practice to map Latin letters
with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
unaccented US-ASCII letters (like a simple character 101, "e"), but
-clearly this is often not feasable, and an unrenderable character may
+clearly this is often not feasible, and an unrenderable character may
be represented as "?", or the like. In attempting a sane fallback
(as from EE<lt>233> to "e"), Pod formatters may use the
%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
=item *
-Some Pod formatters output to formats that implement nonbreaking
+Some Pod formatters output to formats that implement non-breaking
spaces as an individual character (which I'll call "NBSP"), and
-others output to formats that implement nonbreaking spaces just as
+others output to formats that implement non-breaking spaces just as
spaces wrapped in a "don't break this across lines" code. Note that
at the level of Pod, both sorts of codes can occur: Pod can contain a
NBSP character (whether as a literal, or as a "EE<lt>160>" or
"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
-such codes are taken to represent nonbreaking spaces. Pod
+such codes are taken to represent non-breaking spaces. Pod
parsers should consider supporting the optional parsing of "SE<lt>foo
IE<lt>barE<gt> baz>" as if it were
"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
=item *
-Some processors may find it the C<SE<lt>...E<gt>> code easiest to
+Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
implement by replacing each space in the parse tree under the content
of the S, with an NBSP. But note: the replacement should apply I<not> to
spaces in I<all> text, but I<only> to spaces in I<printable> text. (This
Besides the NBSP character discussed above, implementors are reminded
of the existence of the other "special" character in Latin-1, the
-"soft hyphen" chararacter, also known as "discretionary hyphen",
+"soft hyphen" character, also known as "discretionary hyphen",
i.e. C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
C<EE<lt>shyE<gt>>). This character expresses an optional hyphenation
point. That is, it normally renders as nothing, but may render as a
in L<Getopt::Std/DESCRIPTION>, "DESCRIPTION" is the section. (Note
that this is not the same as a manpage section like the "5" in "man 5
crontab". "Section Foo" in the Pod sense means the part of the text
-that's introduced by the heading or item whose text is "Foo".
+that's introduced by the heading or item whose text is "Foo".)
=back
<h1><a name="About_the_-M_Operator">About the <code>-M</code>
Operator</h1>
-
+
...
-
+
<a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
Operator" in somedoc</a>
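The HTML above turns a link to that section of "somedoc" into the href "somedoc#About_the_-M_Operator". The sketch below reproduces that mapping under two assumptions. First, it assumes the heading's Pod source is "About the C<-M> Operator". Second, it takes the anchor rule (unwrap formatting codes, then join words with underscores) from this one example alone, and real formatters may well use a different rule. The helper names are hypothetical. URLs, E<...> escapes, and the C<< ... >> form are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical helpers (not taken from any Pod formatter) that turn the
# content of an L<...> code into an HTML href like the one shown above.

# Split L<...> content into link text, page name, and section. Handles
#   name   name/section   name/"section"   /section   /"section"   "section"
# each optionally prefixed by "text|". URLs and escapes are not handled.
sub parse_link {
    my ($content) = @_;
    my ($text, $name, $section);

    # Optional "text|" prefix, up to the first "|".
    if ($content =~ s/\A([^|]*)\|//) {
        $text = $1;
    }

    if ($content =~ m{\A([^/]*)/(.*)\z}s) {    # name/section or /section
        ($name, $section) = ($1, $2);
    }
    elsif ($content =~ /\A"(.*)"\z/s) {        # bare "section"
        $section = $1;
    }
    else {                                     # page name only
        $name = $content;
    }

    # Quotes around a section are optional; strip them if present.
    $section =~ s/\A"(.*)"\z/$1/s if defined $section;
    $name = undef if defined $name && $name eq '';

    return { text => $text, name => $name, section => $section };
}

# Derive an anchor name from a heading's Pod text, mimicking the single
# example above. Only single-angle-bracket codes such as C<-M> are
# unwrapped, innermost first; E<...> escapes are not decoded.
sub section_anchor {
    my ($heading) = @_;
    my $anchor = $heading;
    1 while $anchor =~ s/[A-Z]<([^<>]*)>/$1/g;
    $anchor =~ s/\A\s+|\s+\z//g;    # trim surrounding whitespace
    $anchor =~ s/\s+/_/g;           # join words with underscores
    return $anchor;
}

# Combine the two: page name, then "#" and the anchor if there's a section.
sub link_href {
    my ($content) = @_;
    my $link = parse_link($content);
    my $href = defined $link->{name} ? $link->{name} : '';
    $href .= '#' . section_anchor($link->{section})
        if defined $link->{section};
    return $href;
}
```

For example, link_href('somedoc/"About the C<-M> Operator"') yields the href used in the HTML above. Splitting the link and building the anchor are kept as separate steps so that a formatter could swap in its own anchor rule.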
with no accompanying paragraph. The middle item is an example:
=over
-
+
=item 1
-
+
Pick up dry cleaning.
-
+
=item 2
-
+
=item 3
-
+
Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs.
-
+
=back
=item *
content. That is, authors should not have an empty region like this:
=over
-
+
=back
Pod processors seeing such a contentless "=over" ... "=back" region,
=item Porro
=item Quisquam Est
-
+
Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
velit, sed quia non numquam eius modi tempora incidunt ut
labore et dolore magnam aliquam quaerat voluptatem.
"Ut Enim". In that case, you'd want to format it like so:
Neque
-
+
Porro
-
+
Quisquam Est
Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
velit, sed quia non numquam eius modi tempora incidunt ut
Ut Enim
-That is, there should be (at least roughtly) equal spacing between
+That is, there should be (at least roughly) equal spacing between
items as between paragraphs (although that spacing may well be less
than the full height of a line of text). This leaves it to the reader
to use (con)textual cues to figure out whether the "Qui dolorem
a specific format:
=begin rtf
-
+
\par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
-
+
=end rtf
The exact same effect could, incidentally, be achieved with a single
Another example of a data paragraph:
=begin html
-
+
I like <em>PIE</em>!
-
+
<hr>Especially pecan pie!
-
+
=end html
If these were ordinary paragraphs, the Pod parser would try to
I<identifier> begins with a colon, I<can> contain commands. For example:
=begin :biblio
-
+
Wirth's classic is available in several editions, including:
-
+
=for comment
hm, check abebooks.com for how much used copies cost.
-
+
=over
-
+
=item
-
+
Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
Teubner, Stuttgart. [Yes, it's in German.]
-
+
=item
-
+
Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
Programs.> Prentice-Hall, Englewood Cliffs, NJ.
-
+
=back
-
+
=end :biblio
Note, however, a "=begin I<identifier>"..."=end I<identifier>"
nor "=item". For example, this may be considered invalid:
=begin somedata
-
+
This is a data paragraph.
-
+
=head1 Don't do this!
-
+
This is a data paragraph too.
-
+
=end somedata
A Pod processor may signal that the above (specifically the "=head1"
I<not> be treated as an error:
=begin somedata
-
+
This is a data paragraph.
-
+
=cut
-
+
# Yup, this isn't Pod anymore.
sub excl { (rand() > .5) ? "hoo!" : "hah!" }
-
+
=pod
-
+
This is a data paragraph too.
-
+
=end somedata
And this too is valid:
=begin someformat
-
+
This is a data paragraph.
-
+
And this is a data paragraph.
-
+
=begin someotherformat
-
+
This is a data paragraph too.
-
+
And this is a data paragraph too.
-
+
=begin :yetanotherformat
=head2 This is a command paragraph!
This is an ordinary paragraph!
-
+
And this is a verbatim paragraph!
-
+
=end :yetanotherformat
-
+
=end someotherformat
-
+
Another data paragraph!
-
+
=end someformat
The contents of the above "=begin :yetanotherformat" ...
Also consider this valid structure:
=begin :biblio
-
+
Wirth's classic is available in several editions, including:
-
+
=over
-
+
=item
-
+
Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.>
Teubner, Stuttgart. [Yes, it's in German.]
-
+
=item
-
+
Wirth, Niklaus. 1976. I<Algorithms + Data Structures =
Programs.> Prentice-Hall, Englewood Cliffs, NJ.
=back
-
+
Buy buy buy!
-
+
=begin html
-
+
<img src='wirth_spokesmodeling_book.png'>
-
+
<hr>
-
+
=end html
-
+
Now now now!
-
+
=end :biblio
There, the "=begin html"..."=end html" region is nested inside
paragraphs. I.e., these should be tolerated:
=for html
-
+
=begin html
-
+
=end html
-
+
=begin :biblio
-
+
=end :biblio
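Regions must nest properly. An "=end" has to close the innermost open region, as the html region nested inside :biblio above does. Because of this, a stack of open formatnames is enough to validate them. A minimal sketch follows. The name check_regions is hypothetical, and its input is assumed to be the document's command paragraphs in order. It treats a region that is still open at the end of input as an error; that rule is an assumption of this sketch, not something stated here.

```perl
use strict;
use warnings;

# Hypothetical checker: given a document's command paragraphs in order,
# verify that "=begin X" ... "=end X" regions nest properly. Returns a
# list of error messages (empty if the regions are well-formed).
sub check_regions {
    my @commands = @_;
    my @open;      # stack of currently open formatnames
    my @errors;

    for my $cmd (@commands) {
        if ($cmd =~ /\A=begin\s+(\S+)/) {
            push @open, $1;
        }
        elsif ($cmd =~ /\A=end\b(?:\s+(\S+))?/) {
            my $name = $1;
            if (!defined $name) {
                push @errors, "=end without a formatname";
            }
            elsif (!@open) {
                push @errors, "=end $name with no open region";
            }
            elsif ($open[-1] ne $name) {
                # Closing anything but the innermost region is an error,
                # even if $name matches a region further up the stack.
                push @errors, "=end $name, but the open region is '$open[-1]'";
            }
            else {
                pop @open;
            }
        }
    }

    # Assumption of this sketch: regions left open at the end are errors.
    push @errors, "unclosed region '$_'" for @open;
    return @errors;
}
```

On a mismatch, this sketch records the error and keeps going. A processor could just as well halt at that point.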
Incidentally, note that there's no easy way to express a data
paragraph starting with something that looks like a command. Consider:
=begin stuff
-
+
=shazbot
-
+
=end stuff
There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
is, they must properly nest. For example, this is valid:
=begin outer
-
+
X
-
+
=begin inner
-
+
Y
-
+
=end inner
-
+
Z
-
+
=end outer
while this is invalid:
=begin outer
-
+
X
-
+
=begin inner
-
+
Y
-
+
=end outer
-
+
Z
-
+
=end inner
-
+
This latter is improper because when the "=end outer" command is seen, the
currently open region has the formatname "inner", not "outer". (It just
happens that "outer" is the format name of a higher-up region.) This is
an error. Processors must by default report this as an error, and may halt
-processing the document containing that error. A corrolary of this is that
+processing the document containing that error. A corollary of this is that
regions cannot "overlap" -- i.e., the latter block above does not represent
a region called "outer" which contains X and Y, overlapping a region called
"inner" which contains Y and Z. But because it is invalid (as all
Similarly, this is invalid:
=begin thing
-
+
=end hting
This is an error because the region is opened by "thing", and the "=end"
This is also invalid:
=begin thing
-
+
=end
This is invalid because every "=end" command must have a formatname