From: Sean M. Burke Date: Sat, 20 Oct 2001 17:51:09 +0000 (-0600) Subject: perlpodspec and perlpod rewrite, draft 3 "final" X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=8a93676d2b6d9cfcd46e9efcc3c94cc624b3b332;p=p5sagit%2Fp5-mst-13.2.git perlpodspec and perlpod rewrite, draft 3 "final" Message-Id: <3.0.6.32.20011020175109.007cb3d0@mail.spinn.net> p4raw-id: //depot/perl@12542 --- diff --git a/MANIFEST b/MANIFEST index 9236c3c..f38b378 100644 --- a/MANIFEST +++ b/MANIFEST @@ -1854,6 +1854,7 @@ pod/perlop.pod Operator info pod/perlopentut.pod open() tutorial pod/perlothrtut.pod Threads old tutorial pod/perlpod.pod Pod info +pod/perlpodspec.pod Pod specification pod/perlport.pod Portability guide pod/perlre.pod Regular expression info pod/perlref.pod References info diff --git a/pod/buildtoc.PL b/pod/buildtoc.PL index 5d78962..bb6d0d3 100644 --- a/pod/buildtoc.PL +++ b/pod/buildtoc.PL @@ -109,6 +109,7 @@ if (-d "pod") { perldsc perlrequick perlpod + perlpodspec perlstyle perltrap diff --git a/pod/perl.pod b/pod/perl.pod index 9d585b5..8495648 100644 --- a/pod/perl.pod +++ b/pod/perl.pod @@ -63,6 +63,7 @@ For ease of access, the Perl manual has been split up into several sections. perlfunc Perl built-in functions perlopentut Perl open() tutorial perlpod Perl plain old documentation + perlpodspec Perl plain old documentation format specification perlrun Perl execution and options perldiag Perl diagnostic messages perllexwarn Perl warnings and their control diff --git a/pod/perlpod.pod b/pod/perlpod.pod index 765266b..91cc81a 100644 --- a/pod/perlpod.pod +++ b/pod/perlpod.pod @@ -1,325 +1,685 @@ + +=for comment +This document is in Pod format. To read this, use a Pod formatter, +like "perldoc perlpod". + =head1 NAME -perlpod - plain old documentation +perlpod - the Plain Old Documentation format =head1 DESCRIPTION -A pod-to-whatever translator reads a pod file paragraph by paragraph, -and translates it to the appropriate output format. 
There are -three kinds of paragraphs: -L, -L, and -L. +Pod is a simple-to-use markup language used for writing documentation +for Perl, Perl programs, and Perl modules. + +Translators are available for converting Pod to various formats +like plain text, HTML, man pages, and more. + +Pod markup consists of three basic kinds of paragraphs: +L, +L, and +L. + + +=head2 Ordinary Paragraph + +Most paragraphs in your documentation will be ordinary blocks +of text, like this one. You can simply type in your text without +any markup whatsoever, and with just a blank line before and +after. When it gets formatted, it will undergo minimal formatting, +like being rewrapped, probably put into a proportionally spaced +font, and maybe even justified. + +You can use formatting codes in ordinary paragraphs, for B, +I, C, L, and more. Such +codes are explained in the "L" +section, below. + =head2 Verbatim Paragraph -A verbatim paragraph, distinguished by being indented (that is, -it starts with space or tab). It should be reproduced exactly, -with tabs assumed to be on 8-column boundaries. There are no -special formatting escapes, so you can't italicize or anything -like that. A \ means \, and nothing else. +Verbatim paragraphs are usually used for presenting a codeblock or +other text which does not require any special parsing or formatting, +and which shouldn't be wrapped. + +A verbatim paragraph is distinguished by having its first character +be a space or a tab. (And commonly, all its lines begin with spaces +and/or tabs.) It should be reproduced exactly, with tabs assumed to +be on 8-column boundaries. There are no special formatting codes, +so you can't italicize or anything like that. A \ means \, and +nothing else. + =head2 Command Paragraph -All command paragraphs start with "=", followed by an -identifier, followed by arbitrary text that the command can -use however it pleases. 
Currently recognized commands are +A command paragraph is used for special treatment of whole chunks +of text, usually as headings or parts of lists. + +All command paragraphs (which are typically only one line long) start +with "=", followed by an identifier, followed by arbitrary text that +the command can use however it pleases. Currently recognized commands +are - =head1 heading - =head2 heading - =head3 heading - =head4 heading - =item text - =over N + =head1 Heading Text + =head2 Heading Text + =head3 Heading Text + =head4 Heading Text + =over indentlevel + =item stuff =back =cut =pod - =for X - =begin X - =end X + =begin format + =end format + =for format text... + +To explain them each in detail: + +=over + +=item C<=head1 I> -=over 4 +=item C<=head2 I> -=item =pod +=item C<=head3 I> -=item =cut +=item C<=head4 I> -The "=pod" directive does nothing beyond telling the compiler to lay -off parsing code through the next "=cut". It's useful for adding -another paragraph to the doc if you're mixing up code and pod a lot. +Head1 through head4 produce headings, head1 being the highest +level. The text in the rest of this paragraph is the content of the +heading. For example: -=item =head1 + =head2 Object Attributes -=item =head2 +The text "Object Attributes" comprises the heading there. (Note that +head3 and head4 are recent additions, not supported in older Pod +translators.) The text in these heading commands can use +formatting codes, as seen here: -=item =head3 + =head2 Possible Values for C<$/> -=item =head4 +Such commands are explained in the +"L" section, below. -Head1, head2, head3 and head4 produce first, second, third and fourth -level headings, with the text in the same paragraph as the "=headn" -directive forming the heading description. 
+=item C<=over I> -=item =over +=item C<=item I> -=item =back +=item C<=back> -=item =item +Item, over, and back require a little more explanation: "=over" starts +a region specifically for the generation of a list using "=item" +commands, or for indenting (groups of) normal paragraphs. At the end +of your list, use "=back" to end it. The I option to +"=over" indicates how far over to indent, generally in ems (where +one em is the width of an "M" in the document's base font) or roughly +comparable units; if there is no I option, it defaults +to four. (And some formatters may just ignore whatever I +you provide.) In the I in C<=item I>, you may +use formatting codes, as seen here: -Item, over, and back require a little more explanation: "=over" starts a -section specifically for the generation of a list using "=item" commands. At -the end of your list, use "=back" to end it. You will probably want to give -"4" as the number to "=over", as some formatters will use this for indentation. -The unit of indentation is optional. If the unit is not given the natural -indentation of the formatting system applied will be used. Note also that -there are some basic rules to using =item: don't use them outside of -an =over/=back block, use at least one inside an =over/=back block, you don't -_have_ to include the =back if the list just runs off the document, and -perhaps most importantly, keep the items consistent: either use "=item *" for -all of them, to produce bullets, or use "=item 1.", "=item 2.", etc., to -produce numbered lists, or use "=item foo", "=item bar", etc., i.e., things -that looks nothing like bullets or numbers. If you start with bullets or -numbers, stick with them, as many formatters use the first "=item" type to -decide how to format the list. + =item Using C<$|> to Control Buffering -=item =for +Such commands are explained in the +"L" section, below. -=item =begin +Note also that there are some basic rules to using "=over" ... 
+"=back" regions: -=item =end +=over -For, begin, and end let you include sections that are not interpreted -as pod text, but passed directly to particular formatters. A formatter -that can utilize that format will use the section, otherwise it will be -completely ignored. The directive "=for" specifies that the entire next -paragraph is in the format indicated by the first word after -"=for", like this: +=item * + +Don't use "=item"s outside of an "=over" ... "=back" region. + +=item * - =for html
+The first thing after the "=over" command should be an "=item", unless +there aren't going to be any items at all in this "=over" ... "=back" +region. + +=item * + +Don't put "=headI" commands inside an "=over" ... "=back" region. + +=item * + +And perhaps most importantly, keep the items consistent: either use +"=item *" for all of them, to produce bullets; or use "=item 1.", +"=item 2.", etc., to produce numbered lists; or use "=item foo", +"=item bar", etc. -- namely, things that look nothing like bullets or +numbers. + +If you start with bullets or numbers, stick with them, as +formatters use the first "=item" type to decide how to format the +list. + +=back + +=item C<=cut> + +To end a Pod block, use a blank line, +then a line beginning with "=cut", and a blank +line after it. This lets Perl (and the Pod formatter) know that +this is where Perl code is resuming. (The blank line before the "=cut" +is not technically necessary, but many older Pod processors require it.) + +=item C<=pod> + +The "=pod" command by itself doesn't do much of anything, but it +signals to Perl (and Pod formatters) that a Pod block starts here. A +Pod block starts with I command paragraph, so a "=pod" command is +usually used just when you want to start a Pod block with an ordinary +paragraph or a verbatim paragraph. For example: + + =item stuff() + + This function does stuff. + + =cut + + sub stuff { + ... + } + + =pod + + Remember to check its return value, as in: + + stuff() || die "Couldn't do stufF!"; + + =cut + +=item C<=begin I> + +=item C<=end I> + +=item C<=for I I> + +For, begin, and end will let you have regions of text/code/data that +are not generally interpreted as normal Pod text, but are passed +directly to particular formatters, or are otherwise special. A +formatter that can use that format will use the region, otherwise it +will be completely ignored. 
+ +A command "=begin I<formatname>", some paragraphs, and a +command "=end I<formatname>", mean that the text/data in between +is meant for formatters that understand the special format +called I<formatname>. For example, + + =begin html + + <hr> <img src="thang.png">

<p> This is a raw HTML paragraph </p>

+ + =end html + +The command "=for I<formatname> I<text...>" +specifies that the remainder of just this paragraph (starting +right after I<formatname>) is in that special format. + + =for html <br>
+ <hr> <img src="thang.png">

<p> This is a raw HTML paragraph </p>

+ +This means the same thing as the above "=begin html" ... "=end html" +region. -The paired commands "=begin" and "=end" work very similarly to "=for", but -instead of only accepting a single paragraph, all text from "=begin" to a -paragraph with a matching "=end" are treated as a particular format. +That is, with "=for", you can have only one paragraph's worth +of text (i.e., the text in "=for targetname text..."), but with +"=begin targetname" ... "=end targetname", you can have any amount +of stuff in between. (Note that there still must be a blank line +after the "=begin" command and a blank line before the "=end" +command.) Here are some examples of how to use these: - =begin html + =begin html + +
<br>Figure 1.<br><IMG SRC="figure1.png"><br>

+ + =end html + + =begin text + + --------------- + | foo | + | bar | + --------------- -
<br>Figure 1.<br><IMG SRC="figure1.png"><br>
+ ^^^^ Figure 1. ^^^^ - =end html + =end text - =begin text +Some format names that formatters currently are known to accept +include "roff", "man", "latex", "tex", "text", and "html". (Some +formatters will treat some of these as synonyms.) - --------------- - | foo | - | bar | - --------------- +A format name of "comment" is common for just making notes (presumably +to yourself) that won't appear in any formatted version of the Pod +document: - ^^^^ Figure 1. ^^^^ + =for comment + Make sure that all the available options are documented! - =end text +Some I will require a leading colon (as in +C<"=for :formatname">, or +C<"=begin :formatname" ... "=end :formatname">), +to signal that the text is not raw data, but instead I Pod text +(i.e., possibly containing formatting codes) that's just not for +normal formatting (e.g., may not be a normal-use paragraph, but might +be for formatting as a footnote). -Some format names that formatters currently are known to accept include -"roff", "man", "latex", "tex", "text", and "html". (Some formatters will -treat some of these as synonyms.) +=back -And don't forget, when using any command, that the command lasts up until -the end of the B, not the line. Hence in the examples below, you -can see the empty lines after each command to end its paragraph. +And don't forget, when using any command, that the command lasts up +until the end of its I, not its line. So in the +examples below, you can see that every command needs the blank +line after it, to end its paragraph. Some examples of lists include: - =over 4 + =over + + =item * + + First item + + =item * + + Second item + + =back + + =over + + =item Foo() + + Description of Foo function + + =item Bar() - =item * + Description of Bar function - First item + =back - =item * - Second item +=head2 Formatting Codes - =back +In ordinary paragraphs and in some command paragraphs, various +formatting codes (a.k.a. 
"interior sequences") can be used: - =over 4 +=for comment + "interior sequences" is such an opaque term. + Prefer "formatting codes" instead. - =item Foo() +=over - Description of Foo function +=item CtextE> -- italic text - =item Bar() +Used for emphasis ("Ccareful!E>") and parameters +("CLABELE>") + +=item CtextE> -- bold text + +Used for switches ("C-nE switch>"), programs +("CchfnE for that>"), +emphasis ("Ccareful!E>"), and so on +("CautovivificationE>"). + +=item CcodeE> -- code text + +Renders code in a typewriter font, or gives some other indication that +this represents program text ("Cgmtime($^T)E>") or some other +form of computerese ("Cdrwxr-xr-xE>"). + +=item CnameE> -- a hyperlink + +There are various syntaxes, listed below. In the syntaxes given, +C, C, and C
cannot contain the characters +'/' and '|'; and any '<' or '>' should be matched. + +=over + +=item * - Description of Bar function +CnameE> - =back +Link to a Perl manual page (e.g., CNet::PingE>). Note +that C should not contain spaces. This syntax +is also occasionally used for references to UNIX man pages, as in +Ccrontab(5)E>. + +=item * + +Cname/"sec"E> or Cname/secE> + +Link to a section in other manual page. E.g., +Cperlsyn/"For Loops"E> + +=item * + +C/"sec"E> or C/secE> or C"sec"E> + +Link to a section in this manual page. E.g., +C/"Object Methods"E> =back -=head2 Ordinary Block of Text - -It will be filled, and maybe even -justified. Certain interior sequences are recognized both -here and in commands: - - I Italicize text, used for emphasis or variables - B Embolden text, used for switches and programs - S Text contains non-breaking spaces - C Render code in a typewriter font, or give some other - indication that this represents program text - L A link (cross reference) to name - L manual page - L item in manual page - L section in other manual page - L<"sec"> section in this manual page - (the quotes are optional) - L ditto - same as above but only 'text' is used for output. - (Text can not contain the characters '/' and '|', - and should contain matched '<' or '>') - L - L - L - L - L - - F Used for filenames - X An index entry - Z<> A zero-width character - E A named character (very similar to HTML escapes) - E A literal < - E A literal > - E A literal / - E A literal | - (these are optional except in other interior - sequences and when preceded by a capital letter) - E Character number n (probably in ASCII) - E Some non-numeric HTML entity, such - as E - -Most of the time, you will only need a single set of angle brackets to -delimit the beginning and end of interior sequences. However, sometimes -you will want to put a right angle bracket (or greater-than sign '>') -inside of a sequence. 
This is particularly common when using a sequence -to provide a different font-type for a snippet of code. As with all -things in Perl, there is more than one way to do it. One way is to -simply escape the closing bracket using an C sequence: +A section is started by the named heading or item. For +example, Cperlvar/$.E> or Cperlvar/"$."E> both +link to the section started by "C<=item $.>" in perlvar. And +Cperlsyn/For LoopsE> or Cperlsyn/"For Loops"E> +both link to the section started by "C<=head2 For Loops>" +in perlsyn. + +To control what text is used for display, you +use "Ctext|...E>", as in: + +=over + +=item * + +Ctext|nameE> + +Link this text to that manual page. E.g., +CPerl Error Messages|perldiagE> + +=item * + +Ctext|name/"sec"E> or Ctext|name/secE> + +Link this text to that section in that manual page. E.g., +CSWITCH statements|perlsyn/"Basic BLOCKs and Switch +Statements"E> + +=item * + +Ctext|/"sec"E> or Ctext|/secE> +or Ctext|"sec"E> + +Link this text to that section in this manual page. E.g., +Cthe various attributes|/"Member Data"E> + +=back + +Or you can link to a web page: + +=over + +=item * + +Cscheme:...E> + +Links to an absolute URL. For example, +Chttp://www.perl.org/E>. But note +that there is no corresponding Ctext|scheme:...E> syntax, for +various reasons. + +=back + +=item CescapeE> -- a character escape + +Very similar to HTML/XML C<&I;> "entity references": + +=over + +=item * + +CltE> -- a literal E (less than) + +=item * + +CgtE> -- a literal E (greater than) + +=item * + +CverbarE> -- a literal | (Itical I) + +=item * + +CsolE> = a literal / (Iidus) + +The above four are optional except in other formatting codes, +notably C...E>, and when preceded by a +capital letter. + +=item * + +ChtmlnameE> + +Some non-numeric HTML entity name, such as CeacuteE>, +meaning the same thing as C<é> in HTML -- i.e., a lowercase +e with an acute (/-shaped) accent. + +=item * + +CnumberE> + +The ASCII/Latin-1/Unicode character with that number. 
A +leading "0x" means that I is hex, as in +C0x201EE>. A leading "0" means that I is octal, +as in C075E>. Otherwise I is interpreted as being +in decimal, as in C181E>. + +Note that older Pod formatters might not recognize octal or +hex numeric escapes, and that many formatters cannot reliably +render characters above 255. (Some formatters may even have +to use compromised renderings of Latin-1 characters, like +rendering CeacuteE> as just a plain "e".) + +=back + +=item CfilenameE> -- used for filenames + +Typically displayed in italics. Example: "C.cshrcE>" + +=item CtextE> -- text contains non-breaking spaces + +This means that the words in I should not be broken +across lines. Example: S$x ? $y : $zE>>. + +=item Ctopic nameE> -- an index entry + +This is ignored by most formatters, but some may use it for building +indexes. It always renders as empty-string. +Example: Cabsolutizing relative URLsE> + +=item CE> -- a null (zero-effect) formatting code + +This is rarely used. It's one way to get around using an +EE...E code sometimes. For example, instead of +"CltE3>" (for "NE3") you could write +"CEE3>" (the "ZEE" breaks up the "N" and +the "E" so they can't be considered +the part of a (fictitious) "NE...E" code. + +=for comment + This was formerly explained as a "zero-width character". But it in + most parser models, it parses to nothing at all, as opposed to parsing + as if it were a E or E, which are REAL zero-width characters. + So "width" and "character" are exactly the wrong words. + +=back + +Most of the time, you will need only a single set of angle brackets to +delimit the beginning and end of formatting codes. However, +sometimes you will want to put a real right angle bracket (a +greater-than sign, '>') inside of a formatting code. This is particularly +common when using a formatting code to provide a different font-type for a +snippet of code. As with all things in Perl, there is more than +one way to do it. 
One way is to simply escape the closing bracket +using an C code: C<$a E=E $b> This will produce: "C<$a E=E $b>" -A more readable, and perhaps more "plain" way is to use an alternate set of -delimiters that doesn't require a ">" to be escaped. As of perl5.5.660, -doubled angle brackets ("<<" and ">>") may be used I For example, the following will do the -trick: +A more readable, and perhaps more "plain" way is to use an alternate +set of delimiters that doesn't require a single ">" to be escaped. With +the Pod formatters that are standard starting with perl5.5.660, doubled +angle brackets ("<<" and ">>") may be used I For example, the following will +do the trick: C<< $a <=> $b >> In fact, you can use as many repeated angle-brackets as you like so long as you have the same number of them in the opening and closing delimiters, and make sure that whitespace immediately follows the last -'<' of the opening delimiter, and immediately precedes the first '>' of -the closing delimiter. So the following will also work: +'<' of the opening delimiter, and immediately precedes the first '>' +of the closing delimiter. (The whitespace is ignored.) So the +following will also work: C<<< $a <=> $b >>> - C<<<< $a <=> $b >>>> + C<<<< $a <=> $b >>>> -This is currently supported by pod2text (Pod::Text), pod2man (Pod::Man), -and any other pod2xxx and Pod::Xxxx translator that uses Pod::Parser -1.093 or later. +And they all mean exactly the same as this: + + C<$a E=E $b> + +As a further example, this means that if you wanted to put these bits of +code in C (code) style: + + open(X, ">>thing.dat") || die $! + $foo->bar(); + +you could do it like so: + + C<<< open(X, ">>thing.dat") || die $! 
>>> + C<< $foo->bar(); >> +which is presumably easier to read than the old way: + + CEthing.dat") || die $!> + C<$foo-Ebar(); >> + +This is currently supported by pod2text (Pod::Text), pod2man (Pod::Man), +and any other pod2xxx or Pod::Xxxx translators that use +Pod::Parser 1.093 or later, or Pod::Tree 1.02 or later. =head2 The Intent -That's it. The intent is simplicity, not power. I wanted paragraphs -to look like paragraphs (block format), so that they stand out -visually, and so that I could run them through fmt easily to reformat -them (that's F7 in my version of B). I wanted the translator (and not -me) to worry about whether " or ' is a left quote or a right quote -within filled text, and I wanted it to leave the quotes alone, dammit, in -verbatim mode, so I could slurp in a working program, shift it over 4 -spaces, and have it print out, er, verbatim. And presumably in a -constant width font. - -In particular, you can leave things like this verbatim in your text: - - Perl - FILEHANDLE - $variable - function() - manpage(3r) - -Doubtless a few other commands or sequences will need to be added along -the way, but I've gotten along surprisingly well with just these. - -Note that I'm not at all claiming this to be sufficient for producing a -book. I'm just trying to make an idiot-proof common source for nroff, -TeX, and other markup languages, as used for online documentation. -Translators exist for B (that's for nroff(1) and troff(1)), -B, B, B, and B. +The intent is simplicity of use, not power of expression. Paragraphs +look like paragraphs (block format), so that they stand out +visually, and so that I could run them through C easily to reformat +them (that's F7 in my version of B, or Esc Q in my version of +B). I wanted the translator to always leave the C<'> and C<`> and +C<"> quotes alone, in verbatim mode, so I could slurp in a +working program, shift it over four spaces, and have it print out, er, +verbatim. And presumably in a monospace font. 
+ +The Pod format is not necessarily sufficient for writing a book. Pod +is just meant to be an idiot-proof common source for nroff, HTML, +TeX, and other markup languages, as used for online +documentation. Translators exist for B, B, +B (that's for nroff(1) and troff(1)), B, and +B. Various others are available in CPAN. + =head2 Embedding Pods in Perl Modules -You can embed pod documentation in your Perl scripts. Start your -documentation with a "=head1" command at the beginning, and end it -with a "=cut" command. Perl will ignore the pod text. See any of the -supplied library modules for examples. If you're going to put your -pods at the end of the file, and you're using an __END__ or __DATA__ -cut mark, make sure to put an empty line there before the first pod -directive. +You can embed Pod documentation in your Perl modules and scripts. +Start your documentation with an empty line, a "=head1" command at the +beginning, and end it with a "=cut" command and an empty line. Perl +will ignore the Pod text. See any of the supplied library modules for +examples. If you're going to put your Pod at the end of the file, and +you're using an __END__ or __DATA__ cut mark, make sure to put an +empty line there before the first Pod command. - __END__ + __END__ - =head1 NAME + =head1 NAME - modern - I am a modern module + Time::Local - efficiently compute time from local and GMT time -If you had not had that empty line there, then the translators wouldn't -have seen it. +Without that empty line before the "=head1", many translators wouldn't +have recognized the "=head1" as starting a Pod block. -=head2 Common Pod Pitfalls +=head2 Hints for Writing Pod -=over 4 +=over =item * -Pod translators usually will require paragraphs to be separated by -completely empty lines. If you have an apparently empty line with -some spaces on it, this can cause odd formatting. +The B command is provided for checking Pod syntax for errors +and warnings. 
For example, it checks for completely blank lines in +Pod blocks and for unknown commands and formatting codes. You should +still also pass your document through one or more translators and proofread +the result, or print out the result and proofread that. Some of the +problems found may be bugs in the translators, which you may or may not +wish to work around. =item * -Translators will mostly add wording around a LEE link, so that -Cfoo(1)E> becomes "the I(1) manpage", for example (see -B for details). Thus, you shouldn't write things like CfooE manpage>, if you want the translated document to read -sensibly. +If you're more familiar with writing in HTML than with writing in Pod, you +can try your hand at writing documentation in simple HTML, and coverting +it to Pod with the experimental L module, +(available in CPAN), and looking at the resulting code. The experimental +L module in CPAN might also be useful. + +=item * + +Many older Pod translators require the lines before every Pod +command and after every Pod command (including "=cut"!) to be a blank +line. Having something like this: + + # - - - - - - - - - - - - + =item $firecracker->boom() + + This noisily detonates the firecracker object. + =cut + sub boom { + ... + +...will make such Pod translators completely fail to see the Pod block +at all. + +Instead, have it like this: + + # - - - - - - - - - - - - + + =item $firecracker->boom() + + This noisily detonates the firecracker object. + + =cut + + sub boom { + ... + +=item * + +Some older Pod translators require paragraphs (including command +paragraphs like "=head2 Functions") to be separated by I +empty lines. If you have an apparently empty line with some spaces +on it, this might not count as a separator for those translators, and +that could cause odd formatting. + +=item * -If you need total control of the text used for a link in the output -use the form LEshow this text|fooE instead. 
+Older translators might add wording around an LEE link, so that +CFoo::BarE> may become "the Foo::Bar manpage", for example. +So you shouldn't write things like CfooE +documentation>, if you want the translated document to read sensibly +-- instead write CFoo::Bar|Foo::BarE documentation> or +Cthe Foo::Bar documentation|Foo::BarE>, to control how the +link comes out. =item * -The B command is provided to check pod syntax -for errors and warnings. For example, it checks for completely -blank lines in pod segments and for unknown escape sequences. -It is still advised to pass it through -one or more translators and proofread the result, or print out the -result and proofread that. Some of the problems found may be bugs in -the translators, which you may or may not wish to work around. +Going past the 70th column in a verbatim block might be ungracefully +wrapped by some formatters. =back =head1 SEE ALSO -L, L, -L +L, L, +L, L, L, L, L. =head1 AUTHOR -Larry Wall +Larry Wall, Sean M. Burke +=cut diff --git a/pod/perlpodspec.pod b/pod/perlpodspec.pod new file mode 100644 index 0000000..c87e1cb --- /dev/null +++ b/pod/perlpodspec.pod @@ -0,0 +1,1876 @@ + +=head1 NAME + +perlpodspec - Plain Old Documentation: format specification and notes + +=head1 DESCRIPTION + +This document is detailed notes on the Pod markup language. Most +people will only have to read L to know how to write +in Pod, but this document may answer some incidental questions to do +with parsing and rendering Pod. + +In this document, "must" / "must not", "should" / +"should not", and "may" have their conventional (cf. RFC 2119) +meanings: "X must do Y" means that if X doesn't do Y, it's against +this specification, and should really be fixed. "X should do Y" +means that it's recommended, but X may fail to do Y, if there's a +good reason. 
"X may do Y" is merely a note that X can do Y at
+will (although it is up to the reader to detect any connotation of
+"and I think it would be I<nice> if X did Y" versus "it wouldn't
+really I<bother> me if X did Y").
+
+Notably, when I say "the parser should do Y", the
+parser may fail to do Y, if the calling application explicitly
+requests that the parser I<not> do Y.  I often phrase this as
+"the parser should, by default, do Y."  This doesn't I<require>
+the parser to provide an option for turning off whatever
+feature Y is (like expanding tabs in verbatim paragraphs), although
+it implies that such an option I<may> be provided.
+
+=head1 Pod Definitions
+
+Pod is embedded in files, typically Perl source files -- although you
+can write a file that's nothing but Pod.
+
+A B<line> in a file consists of zero or more non-newline characters,
+terminated by either a newline or the end of the file.
+
+A B<newline sequence> is usually a platform-dependent concept, but
+Pod parsers should understand it to mean any of CR (ASCII 13), LF
+(ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
+addition to any other system-specific meaning.  The first CR/CRLF/LF
+sequence in the file may be used as the basis for identifying the
+newline sequence for parsing the rest of the file.
+
+A B<blank line> is a line consisting entirely of zero or more spaces
+(ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
+A B<non-blank line> is a line containing one or more characters other
+than space or tab (and terminated by a newline or end-of-file).
+
+(I<Note:> Many older Pod parsers did not accept a line consisting of
+spaces/tabs and then a newline as a blank line -- the only lines they
+considered blank were lines consisting of I<no characters at all>,
+terminated by a newline.)
+
+B<Whitespace> is used in this document as a blanket term for spaces,
+tabs, and newline sequences.  (By itself, this term usually refers
+to literal whitespace.
That is, sequences of whitespace characters
+in Pod source, as opposed to "EE<lt>32>", which is a formatting
+code that I<denotes> a whitespace character.)
+
+A B<Pod parser> is a module meant for parsing Pod (regardless of
+whether this involves calling callbacks or building a parse tree or
+directly formatting it).  A B<Pod formatter> (or B<Pod translator>)
+is a module or program that converts Pod to some other format (HTML,
+plaintext, TeX, PostScript, RTF).  A B<Pod processor> might be a
+formatter or translator, or might be a program that does something
+else with the Pod (like wordcounting it, scanning for index points,
+etc.).
+
+Pod content is contained in B<Pod blocks>.  A Pod block starts with a
+line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
+that matches C<m/\A=cut/> -- or up to the end of the file, if there is
+no C<m/\A=cut/> line.
+
+=for comment
+ The current perlsyn says:
+ [beginquote]
+   Note that pod translators should look at only paragraphs beginning
+   with a pod directive (it makes parsing easier), whereas the compiler
+   actually knows to look for pod escapes even in the middle of a
+   paragraph.  This means that the following secret stuff will be ignored
+   by both the compiler and the translators.
+      $a=3;
+      =secret stuff
+       warn "Neither POD nor CODE!?"
+      =cut back
+      print "got $a\n";
+   You probably shouldn't rely upon the warn() being podded out forever.
+   Not all pod translators are well-behaved in this regard, and perhaps
+   the compiler will become pickier.
+ [endquote]
+ I think that those paragraphs should just be removed; paragraph-based
+ parsing seems to have been largely abandoned, because of the hassle
+ with non-empty blank lines messing up what people meant by "paragraph".
+ Even if the "it makes parsing easier" bit were especially true,
+ it wouldn't be worth the confusion of having perl and pod2whatever
+ actually disagree on what can constitute a Pod block.
+
+Within a Pod block, there are B<Pod paragraphs>.  A Pod paragraph
+consists of non-blank lines of text, separated by one or more blank
+lines.
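The definitions above (newline sequences, blank lines, Pod blocks, and Pod paragraphs) can be read as a small line-oriented algorithm. The following is only an illustrative sketch, not code from this patch or from any Pod module: the helper name `pod_paragraphs` is hypothetical, and the block-start and block-end patterns (`m/\A=[a-zA-Z]/` and `m/\A=cut/`) are taken from the prose above as an assumption about how a parser might apply them.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not part of any Pod module):
# return the paragraphs found inside the Pod blocks of $source.
sub pod_paragraphs {
    my ($source) = @_;

    # Treat CRLF, a lone CR, and a lone LF all as newline sequences.
    my @lines = split /\r\n|\r|\n/, $source, -1;

    my $in_pod = 0;    # true while we are inside a Pod block
    my @current;       # lines of the paragraph being collected
    my @paragraphs;

    for my $line (@lines) {
        if (!$in_pod) {
            # Assumed block-start test: "=" plus a letter at line start.
            next unless $line =~ /\A=[a-zA-Z]/;
            $in_pod = 1;
        }
        if ($line =~ /\A=cut/) {
            # "=cut" ends the block; keep it as its own command paragraph.
            push @paragraphs, join("\n", splice(@current)) if @current;
            push @paragraphs, $line;
            $in_pod = 0;
            next;
        }
        if ($line =~ /\A[ \t]*\z/) {
            # A blank line (only spaces/tabs) closes the current paragraph.
            push @paragraphs, join("\n", splice(@current)) if @current;
            next;
        }
        push @current, $line;
    }
    push @paragraphs, join("\n", splice(@current)) if @current;
    return @paragraphs;
}

# Usage: mixed newline styles, and a "blank" line that holds a space and a tab.
my $source = "print 1;\r\n\r\n=head1 NAME\r\n \t\r\nfoo - bar\nbaz\n\n=cut\nprint 2;\n";
print "[$_]\n" for pod_paragraphs($source);
```

The sketch deliberately accepts a line of spaces and tabs as a paragraph separator, which is the behaviour this document asks of parsers; the older parsers mentioned in the note above would instead have glued such paragraphs together.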
+ +For purposes of Pod processing, there are four types of paragraphs in +a Pod block: + +=over + +=item * + +A command paragraph (also called a "directive"). The first line of +this paragraph must match C. Command paragraphs are +typically one line, as in: + + =head1 NOTES + + =item * + +But they may span several (non-blank) lines: + + =for comment + Hm, I wonder what it would look like if + you tried to write a BNF for Pod from this. + + =head3 Dr. Strangelove, or: How I Learned to + Stop Worrying and Love the Bomb + +I command paragraphs allow formatting codes in their content +(i.e., after the part that matches C), as in: + + =head1 Did You Remember to C? + +In other words, the Pod processing handler for "head1" will apply the +same processing to "Did You Remember to CEuse strict;>?" that it +would to an ordinary paragraph -- i.e., formatting codes (like +"CE...>") are parsed and presumably formatted appropriately, and +whitespace in the form of literal spaces and/or tabs is not +significant. + +=item * + +A B. The first line of this paragraph must be a +literal space or tab, and this paragraph must not be inside a "=begin +I", ... "=end I" sequence unless +"I" begins with a colon (":"). That is, if a paragraph +starts with a literal space or tab, but I inside a +"=begin I", ... "=end I" region, then it's +a data paragraph, unless "I" begins with a colon. + +Whitespace I significant in verbatim paragraphs (although, in +processing, tabs are probably expanded). + +=item * + +An B. A paragraph is an ordinary paragraph +if its first line matches neither C nor +C, I if it's not inside a "=begin I", +... "=end I" sequence unless "I" begins with +a colon (":"). + +=item * + +A B. This is a paragraph that I inside a "=begin +I" ... "=end I" sequence where +"I" does I begin with a literal colon (":"). 
In +some sense, a data paragraph is not part of Pod at all (i.e., +effectively it's "out-of-band"), since it's not subject to most kinds +of Pod parsing; but it is specified here, since Pod +parsers need to be able to call an event for it, or store it in some +form in a parse tree, or at least just parse I it. + +=back + +For example: consider the following paragraphs: + + # <- that's the 0th column + + =head1 Foo + + Stuff + + $foo->bar + + =cut + +Here, "=head1 Foo" and "=cut" are command paragraphs because the first +line of each matches C. "I<[space][space]>$foo->bar" +is a verbatim paragraph, because its first line starts with a literal +whitespace character (and there's no "=begin"..."=end" region around). + +The "=begin I" ... "=end I" commands stop +paragraphs that they surround from being parsed as data or verbatim +paragraphs, if I doesn't begin with a colon. This +is discussed in detail in the section +L=end" Regions>. + +=head1 Pod Commands + +This section is intended to supplement and clarify the discussion in +L. These are the currently recognized +Pod commands: + +=over + +=item "=head1", "=head2", "=head3", "=head4" + +This command indicates that the text in the remainder of the paragraph +is a heading. That text may contain formatting codes. Examples: + + =head1 Object Attributes + + =head3 What B to Do! + +=item "=pod" + +This command indicates that this paragraph begins a Pod block. (If we +are already in the middle of a Pod block, this command has no effect at +all.) If there is any text in this command paragraph after "=pod", +it must be ignored. Examples: + + =pod + + This is a plain Pod paragraph. + + =pod This text is ignored. + +=item "=cut" + +This command indicates that this line is the end of this previously +started Pod block. If there is any text after "=cut" on the line, it must be +ignored. Examples: + + =cut + + =cut The documentation ends here. + + =cut + # This is the first line of program text. + sub foo { # This is the second. 
+ +It is an error to try to I<start> a Pod block with a "=cut" command. In +that case, the Pod processor must halt parsing of the input file, and +must by default emit a warning. + +=item "=over" + +This command indicates that this is the start of a list/indent +region. If there is any text following the "=over", it must consist +of only a nonzero positive numeral. The semantics of this numeral is +explained in the L</"About =over...=back Regions"> section, further +below. Formatting codes are not expanded. Examples: + + =over 3 + + =over 3.5 + + =over + +=item "=item" + +This command indicates that an item in a list begins here. Formatting +codes are processed. The semantics of the (optional) text in the +remainder of this paragraph are +explained in the L</"About =over...=back Regions"> section, further +below. Examples: + + =item + + =item * + + =item * + + =item 14 + + =item 3. + + =item C<< $thing->stuff(I<dodad>) >> + + =item For transporting us beyond seas to be tried for pretended + offenses + + =item He is at this time transporting large armies of foreign + mercenaries to complete the works of death, desolation and + tyranny, already begun with circumstances of cruelty and perfidy + scarcely paralleled in the most barbarous ages, and totally + unworthy the head of a civilized nation. + +=item "=back" + +This command indicates that this is the end of the region begun +by the most recent "=over" command. It permits no text after the +"=back" command. + +=item "=begin formatname" + +This marks the following paragraphs (until the matching "=end +formatname") as being for some special kind of processing. Unless +"formatname" begins with a colon, the contained non-command +paragraphs are data paragraphs. But if "formatname" I<does> begin +with a colon, then non-command paragraphs are ordinary paragraphs +or verbatim paragraphs. This is discussed in detail in the section +L</About Data Paragraphs and "=beginE<sol>=end" Regions>. + +It is advised that formatnames match the regexp +C<m/\A:?[-a-zA-Z0-9_]+\z/>.
Implementors should anticipate future +expansion in the semantics and syntax of the first parameter +to "=begin"/"=end"/"=for". + +=item "=end formatname" + +This marks the end of the region opened by the matching +"=begin formatname" region. If "formatname" is not the formatname +of the most recent open "=begin formatname" region, then this +is an error, and must generate an error message. This +is discussed in detail in the section +L=end" Regions>. + +=item "=for formatname text..." + +This is synonymous with: + + =begin formatname + + text... + + =end formatname + +That is, it creates a region consisting of a single paragraph; that +paragraph is to be treated as a normal paragraph if "formatname" +begins with a ":"; if "formatname" I begin with a colon, +then "text..." will constitute a data paragraph. There is no way +to use "=for formatname text..." to express "text..." as a verbatim +paragraph. + +=back + +If a Pod processor sees any command other than the ones listed +above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish", +or "=w123"), that processor must by default treat this as an +error. It must not process the paragraph beginning with that +command, must by default warn of this as an error, and may +abort the parse. A Pod parser may allow a way for particular +applications to add to the above list of known commands, and to +stipulate, for each additional command, whether formatting +codes should be processed. + +Future versions of this specification may add additional +commands. + + + +=head1 Pod Formatting Codes + +(Note that in previous drafts of this document and of perlpod, +formatting codes were referred to as "interior sequences", and +this term may still be found in the documentation for Pod parsers, +and in error messages from Pod processors.) 
+ +There are two syntaxes for formatting codes: + +=over + +=item * + +A formatting code starts with a capital letter (just US-ASCII [A-Z]) +followed by a "<", any number of characters, and ending with the first +matching ">". Examples: + + That's what I think! + + What's C for? + + X and C Under Different Operating Systems> + +=item * + +A formatting code starts with a capital letter (just US-ASCII [A-Z]) +followed by two or more "<"'s, one or more whitespace characters, +any number of characters, one or more whitespace characters, +and ending with the first matching sequence of two or more ">"'s, where +the number of ">"'s equals the number of "<"'s in the opening of this +formatting code. Examples: + + That's what I<< you >> think! + + C<<< open(X, ">>thing.dat") || die $! >>> + + B<< $foo->bar(); >> + +With this syntax, the whitespace character(s) after the "CE<<" +and before the ">>" (or whatever letter) are I renderable -- they +do not signify whitespace, are merely part of the formatting codes +themselves. That is, these are all synonymous: + + C + C<< thing >> + C<< thing >> + C<<< thing >>> + C<<<< + thing + >>>> + +and so on. + +=back + +In parsing Pod, a notably tricky part is the correct parsing of +(potentially nested!) formatting codes. Implementors should +consult the code in the C routine in Pod::Parser as an +example of a correct implementation. + +=over + +=item CtextE> -- italic text + +See the brief discussion in L. + +=item CtextE> -- bold text + +See the brief discussion in L. + +=item CcodeE> -- code text + +See the brief discussion in L. + +=item CfilenameE> -- style for filenames + +See the brief discussion in L. + +=item Ctopic nameE> -- an index entry + +See the brief discussion in L. + +This code is unusual in that most formatters completely discard +this code and its content. Other formatters will render it with +invisible codes that can be used in building an index of +the current document. 
+ +=item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code + +Discussed briefly in L<perlpod/"Formatting Codes">. + +This code is unusual in that it should have no content. That is, +a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether +or not it complains, the I<potatoes> text should be ignored. + +=item C<LE<lt>nameE<gt>> -- a hyperlink + +The complicated syntaxes of this code are discussed at length in +L<perlpod/"Formatting Codes">, and implementation details are +discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the +contents of LE<lt>content> is tricky. Notably, the content has to be +checked for whether it looks like a URL, or whether it has to be split +on literal "|" and/or "/" (in the right order!), and so on, +I<before> EE<lt>...> codes are resolved. + +=item C<EE<lt>escapeE<gt>> -- a character escape + +See L<perlpod/"Formatting Codes">, and several points in +L</Notes on Implementing Pod Processors>. + +=item C<SE<lt>textE<gt>> -- text contains non-breaking spaces + +This formatting code is syntactically simple, but semantically +complex. What it means is that each space in the printable +content of this code signifies a nonbreaking space. + +Consider: + + C<$x ? $y : $z> + + S<C<$x ? $y : $z>> + +Both signify the monospace (c[ode] style) text consisting of +"$x", one space, "?", one space, ":", one space, "$z". The +difference is that in the latter, with the S code, those spaces +are not "normal" spaces, but instead are nonbreaking spaces. + +=back + + +If a Pod processor sees any formatting code other than the ones +listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that +processor must by default treat this as an error. +A Pod parser may allow a way for particular +applications to add to the above list of known formatting codes; +a Pod parser might even allow a way to stipulate, for each additional +command, whether it requires some form of special processing, as +LE<lt>...> does. + +Future versions of this specification may add additional +formatting codes. + +Historical note: A few older Pod processors would not see a ">" as +closing a "CE<lt>" code, if the ">" was immediately preceded by +a "-".
This was so that this: + + C<$foo->bar> + +would parse as equivalent to this: + + C<$foo-E<gt>bar> + +instead of as equivalent to a "C" formatting code containing +only "$foo-", and then a "bar>" outside the "C" formatting code. This +problem has since been solved by the addition of syntaxes like this: + + C<< $foo->bar >> + +Compliant parsers must not treat "->" as special. + +Formatting codes absolutely cannot span paragraphs. If a code is +opened in one paragraph, and no closing code is found by the end of +that paragraph, the Pod parser must close that formatting code, +and should complain (as in "Unterminated I code in the paragraph +starting at line 123: 'Time objects are not...'"). So these +two paragraphs: + + I<I told you not to do this! + + Don't make me say it again!> + +...must I<not> be parsed as two paragraphs in italics (with the I +code starting in one paragraph and ending in another.) Instead, +the first paragraph should generate a warning, but that aside, the +above code must parse as if it were: + + I<I told you not to do this!> + + Don't make me say it again!E<gt> + +(In SGMLish jargon, all Pod commands are like block-level +elements, whereas all Pod formatting codes are like inline-level +elements.) + + + +=head1 Notes on Implementing Pod Processors + +The following is a long section of miscellaneous requirements +and suggestions to do with Pod processing. + +=over + +=item * + +Pod formatters should tolerate lines in verbatim blocks that are of +any length, even if that means having to break them (possibly several +times, for very long lines) to avoid text running off the side of the +page. Pod formatters may warn of such line-breaking. Such warnings +are particularly appropriate for lines that are over 100 characters long, which +are usually not intentional. + +=item * + +Pod parsers must recognize I<all> of the three well-known newline +formats: CR, LF, and CRLF. See L<perlport>. + +=item * + +Pod parsers should accept input lines that are of any length.
+ +=item * + +Since Perl recognizes a Unicode Byte Order Mark at the start of files +as signaling that the file is Unicode encoded as in UTF-16 (whether +big-endian or little-endian) or UTF-8, Pod parsers should do the +same. Otherwise, the character encoding should be understood as +being UTF-8 if the first highbit byte sequence in the file seems +valid as a UTF-8 sequence, or otherwise as Latin-1. + +Future versions of this specification may specify +how Pod can accept other encodings. Presumably treatment of other +encodings in Pod parsing would be as in XML parsing: whatever the +encoding declared by a particular Pod file, content is to be +stored in memory as Unicode characters. + +=item * + +The well-known Unicode Byte Order Marks are as follows: if the +file begins with the two literal byte values 0xFE 0xFF, this is +the BOM for big-endian UTF-16. If the file begins with the two +literal byte values 0xFF 0xFE, this is the BOM for little-endian +UTF-16. If the file begins with the three literal byte values +0xEF 0xBB 0xBF, this is the BOM for UTF-8. + +=for comment + use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}"; + 0xEF 0xBB 0xBF + +=for comment + If toke.c is modified to support UTF32, add mention of those here. + +=item * + +A naive but sufficient heuristic for testing the first highbit +byte-sequence in a BOM-less file (whether in code or in Pod!), to see +whether that sequence is valid as UTF-8 (RFC 2279) is to check whether +the first byte in the sequence is in the range 0xC0 - 0xFD +I<and> whether the next byte is in the range +0x80 - 0xBF. If so, the parser may conclude that this file is in +UTF-8, and all highbit sequences in the file should be assumed to +be UTF-8. Otherwise the parser should treat the file as being +in Latin-1.
In the unlikely circumstance that the first highbit +sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one +can cater to our heuristic (as well as any more intelligent heuristic) +by prefacing that line with a comment line containing a highbit +sequence that is clearly I valid as UTF-8. A line consisting +of simply "#", an e-acute, and any non-highbit byte, +is sufficient to establish this file's encoding. + +=for comment + If/WHEN some brave soul makes these heuristics into a generic + text-file class (or file discipline?), we can presumably delete + mention of these icky details from this file, and can instead + tell people to just use appropriate class/discipline. + Auto-recognition of newline sequences would be another desirable + feature of such a class/discipline. + HINT HINT HINT. + +=for comment + "The probability that a string of characters + in any other encoding appears as valid UTF-8 is low" - RFC2279 + +=item * + +This document's requirements and suggestions about encodings +do not apply to Pod processors running on non-ASCII platforms, +notably EBCDIC platforms. + +=item * + +Pod processors must treat a "=for [label] [content...]" paragraph as +meaning the same thing as a "=begin [label]" paragraph, content, and +an "=end [label]" paragraph. (The parser may conflate these two +constructs, or may leave them distinct, in the expectation that the +formatter will nevertheless treat them the same.) + +=item * + +When rendering Pod to a format that allows comments (i.e., to nearly +any format other than plaintext), a Pod formatter must insert comment +text identifying its name and version number, and the name and +version numbers of any modules it might be using to process the Pod. 
+Minimal examples: + + %% POD::Pod2PS v3.14159, using POD::Parser v1.92 + + + + {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08} + + .\" Pod::Man version 3.14159, using POD::Parser version 1.92 + +Formatters may also insert additional comments, including: the +release date of the Pod formatter program, the contact address for +the author(s) of the formatter, the current time, the name of input +file, the formatting options in effect, version of Perl used, etc. + +Formatters may also choose to note errors/warnings as comments, +besides or instead of emitting them otherwise (as in messages to +STDERR, or Cing). + +=item * + +Pod parsers I emit warnings or error messages ("Unknown E code +EEzslig>!") to STDERR (whether through printing to STDERR, or +Cing/Cing, or Cing/Cing), but I allow +suppressing all such STDERR output, and instead allow an option for +reporting errors/warnings +in some other way, whether by triggering a callback, or noting errors +in some attribute of the document object, or some similarly unobtrusive +mechanism -- or even by appending a "Pod Errors" section to the end of +the parsed form of the document. + +=item * + +In cases of exceptionally aberrant documents, Pod parsers may abort the +parse. Even then, using Cing/Cing is to be avoided; where +possible, the parser library may simply close the input file +and add text like "*** Formatting Aborted ***" to the end of the +(partial) in-memory document. + +=item * + +In paragraphs where formatting codes (like EE...>, BE...>) +are understood (i.e., I verbatim paragraphs, but I +ordinary paragraphs, and command paragraphs that produce renderable +text, like "=head1"), literal whitespace should generally be considered +"insignificant", in that one literal space has the same meaning as any +(nonzero) number of literal spaces, literal newlines, and literal tabs +(as long as this produces no blank lines, since those would terminate +the paragraph). 
Pod parsers should compact literal whitespace in each +processed paragraph, but may provide an option for overriding this +(since some processing tasks do not require it), or may follow +additional special rules (for example, specially treating +period-space-space or period-newline sequences). + +=item * + +Pod parsers should not, by default, try to coerce apostrophe (') and +quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to +turn backtick (`) into anything else but a single backtick character +(distinct from an openquote character!), nor "--" into anything but +two minus signs. They I do any of those things to text +in CE...> formatting codes, and never I to text in verbatim +paragraphs. + +=item * + +When rendering Pod to a format that has two kinds of hyphens (-), one +that's a nonbreaking hyphen, and another that's a breakable hyphen +(as in "object-oriented", which can be split across lines as +"object-", newline, "oriented"), formatters are encouraged to +generally translate "-" to nonbreaking hyphen, but may apply +heuristics to convert some of these to breaking hyphens. + +=item * + +Pod formatters should make reasonable efforts to keep words of Perl +code from being broken across lines. For example, "Foo::Bar" in some +formatting systems is seen as eligible for being broken across lines +as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should +be avoided where possible, either by disabling all line-breaking in +mid-word, or by wrapping particular words with internal punctuation +in "don't break this across lines" codes (which in some formats may +not be a single code, but might be a matter of inserting non-breaking +zero-width spaces between every pair of characters in a word.) + +=item * + +Pod parsers should, by default, expand tabs in verbatim paragraphs as +they are processed, before passing them to the formatter or other +processor. Parsers may also allow an option for overriding this. 
+ +=item * + +Pod parsers should, by default, remove newlines from the end of +ordinary and verbatim paragraphs before passing them to the +formatter. For example, while the paragraph you're reading now +could be considered, in Pod source, to end with (and contain) +the newline(s) that end it, it should be processed as ending with +(and containing) the period character that ends this sentence. + +=item * + +Pod parsers, when reporting errors, should make some effort to report +an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near +line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph +number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where +this is problematic, the paragraph number should at least be +accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in +Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for +the CE<lt>interest rate> attribute...'"). + +=item * + +Pod parsers, when processing a series of verbatim paragraphs one +after another, should consider them to be one large verbatim +paragraph that happens to contain blank lines. I.e., these two +lines, which have a blank line between them: + + use Foo; + + print Foo->VERSION + +should be unified into one paragraph ("\tuse Foo;\n\n\tprint +Foo->VERSION") before being passed to the formatter or other +processor. Parsers may also allow an option for overriding this. + +While this might be too cumbersome to implement in event-based Pod +parsers, it is straightforward for parsers that return parse trees. + +=item * + +Pod formatters, where feasible, are advised to avoid splitting short +verbatim paragraphs (under twelve lines, say) across pages. + +=item * + +Pod parsers must treat a line with only spaces and/or tabs on it as a +"blank line" such as separates paragraphs. (Some older parsers +recognized only two adjacent newlines as a "blank line" but would not +recognize a newline, a space, and a newline, as a blank line. This +is noncompliant behavior.)
+ +=item * + +Authors of Pod formatters/processors should make every effort to +avoid writing their own Pod parser. There are already several in +CPAN, with a wide range of interface styles -- and one of them, +Pod::Parser, comes with modern versions of Perl. + +=item * + +Characters in Pod documents may be conveyed either as literals, or by +number in EE<lt>n> codes, or by an equivalent mnemonic, as in +EE<lt>eacute> which is exactly equivalent to EE<lt>233>. + +Characters in the range 32-126 refer to those well-known US-ASCII +characters (also defined there by Unicode, with the same meaning), +which all Pod formatters must render faithfully. Characters +in the ranges 0-31 and 127-159 should not be used (neither as +literals, nor as EE<lt>number> codes), except for the +literal byte-sequences for newline (13, 13 10, or 10), and tab (9). + +Characters in the range 160-255 refer to Latin-1 characters (also +defined there by Unicode, with the same meaning). Characters above +255 should be understood to refer to Unicode characters. + +=item * + +Be warned +that some formatters cannot reliably render characters outside 32-126; +and many are able to handle 32-126 and 160-255, but nothing above +255. + +=item * + +Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for +less-than and greater-than, Pod parsers must understand "EE<lt>sol>" +for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar, +pipe). Pod parsers should also understand "EE<lt>lchevron>" and +"EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e., +"left-pointing double angle quotation mark" = "left pointing +guillemet" and "right-pointing double angle quotation mark" = "right +pointing guillemet". (These look like little "<<" and ">>", and they +are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>" +and "EE<lt>raquo>".) + +=item * + +Pod parsers should understand all "EE<lt>html>" codes as defined +in the entity declarations in the most recent XHTML specification at +C<www.W3.org>.
Pod parsers must understand at least the entities +that define characters in the range 160-255 (Latin-1). Pod parsers, +when faced with some unknown "EE<lt>I<identifier>>" code, +shouldn't simply replace it with nullstring (by default, at least), +but may pass it through as a string consisting of the literal characters +E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the +alternative option of processing such unknown +"EE<lt>I<identifier>>" codes by firing an event especially +for such codes, or by adding a special node-type to the in-memory +document tree. Such "EE<lt>I<identifier>>" may have special meaning +to some processors, or some processors may choose to add them to +a special error report. + +=item * + +Pod parsers must also support the XHTML codes "EE<lt>quot>" for +character 34 (doublequote, "), "EE<lt>amp>" for character 38 +(ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, '). + +=item * + +Note that in all cases of "EE<lt>whatever>", I<whatever> (whether +an htmlname, or a number in any base) must consist only of +alphanumeric characters -- that is, I<whatever> must match +C<m/\A\w+\z/>. So "EE<lt> 0 1 2 3 >" is invalid, because +it contains spaces, which aren't alphanumeric characters. This +presumably does not I<need> special treatment by a Pod processor; +" 0 1 2 3 " doesn't look like a number in any base, so it would +presumably be looked up in the table of HTML-like names. Since +there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ", +this will be treated as an error. However, Pod processors may +treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically> +invalid, potentially earning a different error message than the +error message (or warning, or event) generated by a merely unknown +(but theoretically valid) htmlname, as in "EE<lt>qacute>" +[sic]. However, Pod parsers are not required to make this +distinction. + +=item * + +Note that EE<lt>number> I<must not> be interpreted as simply +"codepoint I<number> in the current/native character set". It always +means only "the character represented by codepoint I<number> in +Unicode."
(This is identical to the semantics of &#I; in XML.) + +This will likely require many formatters to have tables mapping from +treatable Unicode codepoints (such as the "\xE9" for the e-acute +character) to the escape sequences or codes necessary for conveying +such sequences in the target output format. A converter to *roff +would, for example know that "\xE9" (whether conveyed literally, or via +a EE...> sequence) is to be conveyed as "e\\*'". +Similarly, a program rendering Pod in a MacOS application window, would +presumably need to know that "\xE9" maps to codepoint 142 in MacRoman +encoding that (at time of writing) is native for MacOS. Such +Unicode2whatever mappings are presumably already widely available for +common output formats. (Such mappings may be incomplete! Implementers +are not expected to bend over backwards in an attempt to render +Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any +of the other weird things that Unicode can encode.) And +if a Pod document uses a character not found in such a mapping, the +formatter should consider it an unrenderable character. + +=item * + +If, surprisingly, the implementor of a Pod formatter can't find a +satisfactory pre-existing table mapping from Unicode characters to +escapes in the target format (e.g., a decent table of Unicode +characters to *roff escapes), it will be necessary to build such a +table. If you are in this circumstance, you should begin with the +characters in the range 0x00A0 - 0x00FF, which is mostly the heavily +used accented characters. Then proceed (as patience permits and +fastidiousness compels) through the characters that the (X)HTML +standards groups judged important enough to merit mnemonics +for. These are declared in the (X)HTML specifications at the +www.W3.org site. 
At time of writing (September 2001), the most recent +entity declaration files are: + + http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent + http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent + http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent + +Then you can progress through any remaining notable Unicode characters +in the range 0x2000-0x204D (consult the character tables at +www.unicode.org), and whatever else strikes your fancy. For example, +in F<xhtml-symbol.ent>, there is the entry: + + <!ENTITY infin "&#8734;"> <!-- infinity, U+221E ISOtech --> + +While the mapping "infin" to the character "\x{221E}" will (hopefully) +have been already handled by the Pod parser, the presence of the +character in this file means that it's reasonably important enough to +include in a formatter's table that maps from notable Unicode characters +to the codes necessary for rendering them. So for a Unicode-to-*roff +mapping, for example, this would merit the entry: + + "\x{221E}" => '\(in', + +It is eagerly hoped that in the future, increasing numbers of formats +(and formatters) will support Unicode characters directly (as (X)HTML +does with C<&infin;>, C<&#8734;>, or C<&#x221E;>), reducing the need +for idiosyncratic mappings of Unicode-to-I<my_escapes>. + +=item * + +It is up to individual Pod formatters to display good judgment when +confronted with an unrenderable character (which is distinct from an +unknown EE<lt>thing> sequence that the parser couldn't resolve to +anything, renderable or not). It is good practice to map Latin letters +with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding +unaccented US-ASCII letters (like a simple character 101, "e"), but +clearly this is often not feasible, and an unrenderable character may +be represented as "?", or the like. In attempting a sane fallback +(as from EE<lt>233> to "e"), Pod formatters may use the +%Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or +L<Text::Unidecode|Text::Unidecode>, if available. + +For example, this Pod text: + + magic is enabled if you set C<$Currency> to 'E<euro>'.
+ +may be rendered as: +"magic is enabled if you set C<$Currency> to 'I<?>'" or as +"magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as +"magic is enabled if you set C<$Currency> to '[x20AC]'", etc. + +A Pod formatter may also note, in a comment or warning, a list of what +unrenderable characters were encountered. + +=item * + +EE<lt>...> may freely appear in any formatting code (other than +in another EE<lt>...> or in a ZE<lt>>). That is, "XE<lt>The +EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The +EE<lt>euro>1,000,000 Solution|Million::Euros>". + +=item * + +Some Pod formatters output to formats that implement nonbreaking +spaces as an individual character (which I'll call "NBSP"), and +others output to formats that implement nonbreaking spaces just as +spaces wrapped in a "don't break this across lines" code. Note that +at the level of Pod, both sorts of codes can occur: Pod can contain a +NBSP character (whether as a literal, or as a "EE<lt>160>" or +"EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo +IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in +such codes are taken to represent nonbreaking spaces. Pod +parsers should consider supporting the optional parsing of "SE<lt>foo +IE<lt>barE<gt> baz>" as if it were +"fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the +optional parsing of groups of words joined by NBSP's as if each group +were in a SE<lt>...> code, so that formatters may use the +representation that maps best to what the output format demands. + +=item * + +Some processors may find that the C<SE<lt>...E<gt>> code is easiest to +implement by replacing each space in the parse tree under the content +of the S, with an NBSP. But note: the replacement should apply I<not> to +spaces in I<all> text, but I<only> to spaces in I<printable> text. (This +distinction may or may not be evident in the particular tree/event +model implemented by the Pod parser.) For example, consider this +unusual case: + + S<L</Autoloaded Functions>> + +This means that the space in the middle of the visible link text must +not be broken across lines.
In other words, it's the same as this: + + L<"AutoloadedE<160>Functions"/Autoloaded Functions> + +However, a misapplied space-to-NBSP replacement could (wrongly) +produce something equivalent to this: + + L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions> + +...which is almost definitely not going to work as a hyperlink (assuming +this formatter outputs a format supporting hypertext). + +Formatters may choose to just not support the S format code, +especially in cases where the output format simply has no NBSP +character/code and no code for "don't break this stuff across lines". + +=item * + +Besides the NBSP character discussed above, implementors are reminded +of the existence of the other "special" character in Latin-1, the +"soft hyphen" character, also known as "discretionary hyphen", +i.e. C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> = +C<EE<lt>shyE<gt>>. This character expresses an optional hyphenation +point. That is, it normally renders as nothing, but may render as a +"-" if a formatter breaks the word at that point. Pod formatters +should, as appropriate, do one of the following: 1) render this with +a code with the same meaning (e.g., "\-" in RTF), 2) pass it through +in the expectation that the formatter understands this character as +such, or 3) delete it. + +For example: + + sigE<shy>action + manuE<shy>script + JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi + +These signal to a formatter that if it is to hyphenate "sigaction" +or "manuscript", then it should be done as +"sig-I<[linebreak]>action" or "manu-I<[linebreak]>script" +(and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't +show up at all). And if it is +to hyphenate "Jarkko" and/or "Hietaniemi", it can do +so only at the points where there is a C<EE<lt>shyE<gt>> code. + +In practice, it is anticipated that this character will not be used +often, but formatters should either support it, or delete it.
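The three soft-hyphen strategies listed above can be sketched as follows. This is only an illustration in Python, not part of any Pod formatter: the function name and the strategy labels are hypothetical, and the only format-specific detail used is the "\-" code for RTF that the text itself mentions.

```python
SOFT_HYPHEN = "\u00ad"  # character 173 (0xAD), the character E<shy> stands for


def render_soft_hyphens(text: str, strategy: str = "delete") -> str:
    """Apply one of the three soft-hyphen strategies to a run of text.

    The strategy names are hypothetical labels for this sketch:
      "rtf"         - replace the character with RTF's "\\-" code
      "passthrough" - keep the character and let the output format handle it
      "delete"      - drop the character entirely
    """
    if strategy == "rtf":
        return text.replace(SOFT_HYPHEN, "\\-")
    if strategy == "passthrough":
        return text
    if strategy == "delete":
        return text.replace(SOFT_HYPHEN, "")
    raise ValueError(f"unknown strategy: {strategy!r}")


word = "sig" + SOFT_HYPHEN + "action"
print(render_soft_hyphens(word))         # sigaction
print(render_soft_hyphens(word, "rtf"))  # sig\-action
```

A formatter that cannot be sure its output format understands character 173 would presumably default to deletion, which is why "delete" is the default here.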
+ +=item * + +If you think that you want to add a new command to Pod (like, say, a +"=biblio" command), consider whether you could get the same +effect with a for or begin/end sequence: "=for biblio ..." or "=begin +biblio" ... "=end biblio". Pod processors that don't understand +"=for biblio", etc, will simply ignore it, whereas they may complain +loudly if they see "=biblio". + +=item * + +Throughout this document, "Pod" has been the preferred spelling for +the name of the documentation format. One may also use "POD" or +"pod". For the documentation that is (typically) in the Pod +format, you may use "pod", or "Pod", or "POD". Understanding these +distinctions is useful; but obsessing over how to spell them, usually +is not. + +=back + + + + + +=head1 About LE<lt>...E<gt> Codes + +As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...> +code is the most complex of the Pod formatting codes. The points below +will hopefully clarify what it means and how processors should deal +with it. + +=over + +=item * + +In parsing an LE<lt>...> code, Pod parsers must distinguish at least +four attributes: + +=over + +=item First: + +The link-text. If there is none, this must be undef. (E.g., in +"LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions". +In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no +link text. Note that link text may contain formatting.) + +=item Second: + +The possibly inferred link-text -- i.e., if there was no real link +text, then this is the text that we'll infer in its place. (E.g., for +"LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".) + +=item Third: + +The name or URL, or undef if none. (E.g., in "LE<lt>Perl +Functions|perlfunc>", the name -- also sometimes called the page -- +is "perlfunc". In "LE<lt>/CAVEATS>", the name is undef.) + +=item Fourth: + +The section (AKA "item" in older perlpods), or undef if none. E.g., +in L<Getopt::Std/DESCRIPTION>, "DESCRIPTION" is the section. (Note +that this is not the same as a manpage section like the "5" in "man 5 +crontab".
"Section Foo" in the Pod sense means the part of the text +that's introduced by the heading or item whose text is "Foo". + +=back + +Pod parsers may also note additional attributes including: + +=over + +=item Fifth: + +A flag for whether item 3 (if present) is a URL (like +"http://lists.perl.org" is), in which case there should be no section +attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or +possibly a man page name (like "crontab(5)" is). + +=item Sixth: + +The raw original LE...> content, before text is split on +"|", "/", etc, and before EE...> codes are expanded. + +=back + +(The above were numbered only for concise reference below. It is not +a requirement that these be passed as an actual list or array.) + +For example: + + L + => undef, # link text + "Foo::Bar", # possibly inferred link text + "Foo::Bar", # name + undef, # section + 'pod', # what sort of link + "Foo::Bar" # original content + + L + => "Perlport's section on NL's", # link text + "Perlport's section on NL's", # possibly inferred link text + "perlport", # name + "Newlines", # section + 'pod', # what sort of link + "Perlport's section on NL's|perlport/Newlines" # orig. 
content + + L + => undef, # link text + '"Newlines" in perlport', # possibly inferred link text + "perlport", # name + "Newlines", # section + 'pod', # what sort of link + "perlport/Newlines" # original content + + L + => undef, # link text + '"DESCRIPTION" in crontab(5)', # possibly inferred link text + "crontab(5)", # name + "DESCRIPTION", # section + 'man', # what sort of link + 'crontab(5)/"DESCRIPTION"' # original content + + L + => undef, # link text + '"Object Attributes"', # possibly inferred link text + undef, # name + "Object Attributes", # section + 'pod', # what sort of link + "/Object Attributes" # original content + + L + => undef, # link text + "http://www.perl.org/", # possibly inferred link text + "http://www.perl.org/", # name + undef, # section + 'url', # what sort of link + "http://www.perl.org/" # original content + +Note that you can distinguish URL-links from anything else by the +fact that they match C. So +Chttp://www.perl.comE> is a URL, but +CHTTP::ResponseE> isn't. + +=item * + +In case of LE...> codes with no "text|" part in them, +older formatters have exhibited great variation in actually displaying +the link or cross reference. For example, LEcrontab(5)> would render +as "the C manpage", or "in the C manpage" +or just "C". + +Pod processors must now treat "text|"-less links as follows: + + L => L + L
=> L<"section"|/section> + L => L<"section" in name|name/section> + +=item * + +Note that section names might contain markup. I.e., if a section +starts with: + + =head2 About the C<-M> Operator + +or with: + + =item About the C<-M> Operator + +then a link to it would look like this: + + L Operator> + +Formatters may choose to ignore the markup for purposes of resolving +the link and use only the renderable characters in the section name, +as in: + +
<h1><a name="About_the_-M_Operator">About the <code>-M</code> + Operator</a></h1>
+ + ... + + About the -M + Operator" in somedoc + +=item * + +Previous versions of perlpod distinguished Cname/"section"E> +links from Cname/itemE> links (and their targets). These +have been merged syntactically and semantically in the current +specification, and I
can refer either to a "=headI Heading +Content" command or to a "=item Item Content" command. This +specification does not specify what behavior should be in the case +of a given document having several things all seeming to produce the +same I
identifier (e.g., in HTML, several things all producing +the same I in ... +elements). Where Pod processors can control this behavior, they should +use the first such anchor. That is, CFoo/BarE> refers to the +I "Bar" section in Foo. + +But for some processors/formats this cannot be easily controlled; as +with the HTML example, the behavior of multiple ambiguous +... is most easily just left up to +browsers to decide. + +=item * + +Authors wanting to link to a particular (absolute) URL, must do so +only with "LEscheme:...>" codes (like +LEhttp://www.perl.org>), and must not attempt "LESome Site +Name|scheme:...>" codes. This restriction avoids many problems +in parsing and rendering LE...> codes. + +=item * + +In a Ctext|...E> code, text may contain formatting codes +for formatting or for EE...> escapes, as in: + + Lstuff>|...> + +For C...E> codes without a "name|" part, only +C...E> and CE> codes may occur -- no +other formatting codes. That is, authors should not use +"CBEFoo::BarEE>". + +Note, however, that formatting codes and ZE>'s can occur in any +and all parts of an LE...> (i.e., in I, I
<section>, I<text>, +and I<url>). + +Authors must not nest LE<lt>...> codes. For example, "LE<lt>The +LE<lt>Foo::Bar> man page>" should be treated as an error. + +=item * + +Note that Pod authors may use formatting codes inside the "text" +part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">). + +In other words, this is valid: + + Go read L<the docs on C<$.>|perlvar/"$."> + +Some output formats that do allow rendering "LE<lt>...>" codes as +hypertext, might not allow the link-text to be formatted; in +that case, formatters will have to just ignore that formatting. + +=item * + +At time of writing, C<LE<lt>nameE<gt>> values are of two types: +either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which +might be a real Perl module or program in an @INC / PATH +directory, or a .pod file in those places); or the name of a UNIX +man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>> +is ambiguous between a Pod page called "chmod", or the Unix man page +"chmod" (in whatever man-section). However, the presence of a string +in parens, as in "crontab(5)", is sufficient to signal that what +is being discussed is not a Pod page, and so is presumably a +UNIX man page. The distinction is of no importance to many +Pod processors, but some processors that render to hypertext formats +may need to distinguish them in order to know how to render a +given C<LE<lt>fooE<gt>> code. + +=item * + +Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax +(as in "C<LE<lt>Object AttributesE<gt>>"), which was not easily distinguishable +from C<LE<lt>nameE<gt>> syntax. This syntax is no longer in the +specification, and has been replaced by the C<LE<lt>"section"E<gt>> syntax +(where the quotes were formerly optional). Pod parsers should tolerate +the C<LE<lt>sectionE<gt>> syntax, for a while at least. The suggested +heuristic for distinguishing C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> +is that if it contains any whitespace, it's a I<section>
. Pod processors +may warn about this being deprecated syntax. + +=back + +=head1 About =over...=back Regions + +"=over"..."=back" regions are used for various kinds of list-like +structures. (I use the term "region" here simply as a collective +term for everything from the "=over" to the matching "=back".) + +=over + +=item * + +The non-zero numeric I in "=over I" ... +"=back" is used for giving the formatter a clue as to how many +"spaces" (ems, or roughly equivalent units) it should tab over, +although many formatters will have to convert this to an absolute +measurement that may not exactly match with the size of spaces (or M's) +in the document's base font. Other formatters may have to completely +ignore the number. The lack of any explicit I parameter is +equivalent to an I value of 4. Pod processors may +complain if I is present but is not a positive number +matching C. + +=item * + +Authors of Pod formatters are reminded that "=over" ... "=back" may +map to several different constructs in your output format. For +example, in converting Pod to (X)HTML, it can map to any of +
<ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or +<blockquote>...</blockquote>. Similarly, "=item" can map to <li> or +<dt>
    . + +=item * + +Each "=over" ... "=back" region should be one of the following: + +=over + +=item * + +An "=over" ... "=back" region containing only "=item *" commands, +each followed by some number of ordinary/verbatim paragraphs, other +nested "=over" ... "=back" regions, "=for..." paragraphs, and +"=begin"..."=end" regions. + +(Pod processors must tolerate a bare "=item" as if it were "=item +*".) Whether "*" is rendered as a literal asterisk, an "o", or as +some kind of real bullet character, is left up to the Pod formatter, +and may depend on the level of nesting. + +=item * + +An "=over" ... "=back" region containing only +C paragraphs, each one (or each group of them) +followed by some number of ordinary/verbatim paragraphs, other nested +"=over" ... "=back" regions, "=for..." paragraphs, and/or +"=begin"..."=end" codes. Note that the numbers must start at 1 +in each section, and must proceed in order and without skipping +numbers. + +(Pod processors must tolerate lines like "=item 1" as if they were +"=item 1.", with the period.) + +=item * + +An "=over" ... "=back" region containing only "=item [text]" +commands, each one (or each group of them) followed by some number of +ordinary/verbatim paragraphs, other nested "=over" ... "=back" +regions, or "=for..." paragraphs, and "=begin"..."=end" regions. + +The "=item [text]" paragraph should not match +C or C, nor should it +match just C. + +=item * + +An "=over" ... "=back" region containing no "=item" paragraphs at +all, and containing only some number of +ordinary/verbatim paragraphs, and possibly also some nested "=over" +... "=back" regions, "=for..." paragraphs, and "=begin"..."=end" +regions. Such an itemless "=over" ... "=back" region in Pod is +equivalent in meaning to a "
<blockquote>...</blockquote>
    " element in +HTML. + +=back + +Note that with all the above cases, you can determine which type of +"=over" ... "=back" you have, by examining the first (non-"=cut", +non-"=pod") Pod paragraph after the "=over" command. + +=item * + +Pod formatters I tolerate arbitrarily large amounts of text +in the "=item I" paragraph. In practice, most such +paragraphs are short, as in: + + =item For cutting off our trade with all parts of the world + +But they may be arbitrarily long: + + =item For transporting us beyond seas to be tried for pretended + offenses + + =item He is at this time transporting large armies of foreign + mercenaries to complete the works of death, desolation and + tyranny, already begun with circumstances of cruelty and perfidy + scarcely paralleled in the most barbarous ages, and totally + unworthy the head of a civilized nation. + +=item * + +Pod processors should tolerate "=item *" / "=item I" commands +with no accompanying paragraph. The middle item is an example: + + =over + + =item 1 + + Pick up dry cleaning. + + =item 2 + + =item 3 + + Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs. + + =back + +=item * + +No "=over" ... "=back" region can contain headings. Processors may +treat such a heading as an error. + +=item * + +Note that an "=over" ... "=back" region should have some +content. That is, authors should not have an empty region like this: + + =over + + =back + +Pod processors seeing such a contentless "=over" ... "=back" region, +may ignore it, or may report it as an error. + +=item * + +Processors must tolerate an "=over" list that goes off the end of the +document (i.e., which has no matching "=back"), but they may warn +about such a list. 
+ +=item * + +Authors of Pod formatters should note that this construct: + + =item Neque + + =item Porro + + =item Quisquam Est + + Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci + velit, sed quia non numquam eius modi tempora incidunt ut + labore et dolore magnam aliquam quaerat voluptatem. + + =item Ut Enim + +is semantically ambiguous, in a way that makes formatting decisions +a bit difficult. On the one hand, it could be a mention of an item +"Neque", a mention of another item "Porro", and a mention of another +item "Quisquam Est", with just the last one requiring the explanatory +paragraph "Qui dolorem ipsum quia dolor..."; and then an item +"Ut Enim". In that case, you'd want to format it like so: + + Neque + + Porro + + Quisquam Est + Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci + velit, sed quia non numquam eius modi tempora incidunt ut + labore et dolore magnam aliquam quaerat voluptatem. + + Ut Enim + +But it could equally well be a discussion of three (related or equivalent) +items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph +explaining them all, and then a new item "Ut Enim". In that case, you'd +probably want to format it like so: + + Neque + Porro + Quisquam Est + Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci + velit, sed quia non numquam eius modi tempora incidunt ut + labore et dolore magnam aliquam quaerat voluptatem. + + Ut Enim + +But (for the foreseeable future), Pod does not provide any way for Pod +authors to distinguish which grouping is meant by the above +"=item"-cluster structure. So formatters should format it like so: + + Neque + + Porro + + Quisquam Est + + Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci + velit, sed quia non numquam eius modi tempora incidunt ut + labore et dolore magnam aliquam quaerat voluptatem.
+ + Ut Enim + +That is, there should be (at least roughly) equal spacing between +items as between paragraphs (although that spacing may well be less +than the full height of a line of text). This leaves it to the reader +to use (con)textual cues to figure out whether the "Qui dolorem +ipsum..." paragraph applies to the "Quisquam Est" item or to all three +items "Neque", "Porro", and "Quisquam Est". While not an ideal +situation, this is preferable to providing formatting cues that may +be actually contrary to the author's intent. + +=back + + + +=head1 About Data Paragraphs and "=begin/=end" Regions + +Data paragraphs are typically used for inlining non-Pod data that is +to be used (typically passed through) when rendering the document to +a specific format: + + =begin rtf + + \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par} + + =end rtf + +The exact same effect could, incidentally, be achieved with a single +"=for" paragraph: + + =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par} + +(Although that is not formally a data paragraph, it has the same +meaning as one, and Pod parsers may parse it as one.) + +Another example of a data paragraph: + + =begin html + + I like <em>PIE</em>! + +
<hr>Especially pecan pie! + + =end html + +If these were ordinary paragraphs, the Pod parser would try to +expand the "EE<lt>/em>" (in the first paragraph) as a formatting +code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this +is in a "=begin I<identifier>"..."=end I<identifier>" region I<and> +the identifier "html" doesn't begin with a ":" prefix, the contents +of this region are stored as data paragraphs, instead of being +processed as ordinary paragraphs (or if they began with spaces +and/or tabs, as verbatim paragraphs). + +As a further example: At time of writing, no "biblio" identifier is +supported, but suppose some processor were written to recognize it as +a way of (say) denoting a bibliographic reference (necessarily +containing formatting codes in ordinary paragraphs). The fact that +"biblio" paragraphs were meant for ordinary processing would be +indicated by prefacing each "biblio" identifier with a colon: + + =begin :biblio + + Wirth, Niklaus. 1976. I<Algorithms + Data Structures = Programs.> Prentice-Hall, Englewood Cliffs, NJ. + + =end :biblio + +This would signal to the parser that paragraphs in this begin...end +region are subject to normal handling as ordinary/verbatim paragraphs +(while still tagged as meant only for processors that understand the +"biblio" identifier). The same effect could be had with: + + =for :biblio + Wirth, Niklaus. 1976. I<Algorithms + Data Structures = Programs.> Prentice-Hall, Englewood Cliffs, NJ. + +The ":" on these identifiers means simply "process this stuff +normally, even though the result will be for some special target". +I suggest that parser APIs report "biblio" as the target identifier, +but also report that it had a ":" prefix. (And similarly, with the +above "html", report "html" as the target identifier, and note the +I<lack> of a ":" prefix.) + +Note that a "=begin I<identifier>"..."=end I<identifier>" region where +I<identifier> begins with a colon, I<can> contain commands. For example: + + =begin :biblio + + Wirth's classic is available in several editions, including: + + =for comment + hm, check abebooks.com for how much used copies cost.
+ + =over + + =item + + Wirth, Niklaus. 1975. I + Teubner, Stuttgart. [Yes, it's in German.] + + =item + + Wirth, Niklaus. 1976. I Prentice-Hall, Englewood Cliffs, NJ. + + =back + + =end :biblio + +Note, however, a "=begin I"..."=end I" +region where I does I begin with a colon, should not +directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back", +nor "=item". For example, this may be considered invalid: + + =begin somedata + + This is a data paragraph. + + =head1 Don't do this! + + This is a data paragraph too. + + =end somedata + +A Pod processor may signal that the above (specifically the "=head1" +paragraph) is an error. Note, however, that the following should +I be treated as an error: + + =begin somedata + + This is a data paragraph. + + =cut + + # Yup, this isn't Pod anymore. + sub excl { (rand() > .5) ? "hoo!" : "hah!" } + + =pod + + This is a data paragraph too. + + =end somedata + +And this too is valid: + + =begin someformat + + This is a data paragraph. + + And this is a data paragraph. + + =begin someotherformat + + This is a data paragraph too. + + And this is a data paragraph too. + + =begin :yetanotherformat + + =head2 This is a command paragraph! + + This is an ordinary paragraph! + + And this is a verbatim paragraph! + + =end :yetanotherformat + + =end someotherformat + + Another data paragraph! + + =end someformat + +The contents of the above "=begin :yetanotherformat" ... +"=end :yetanotherformat" region I data paragraphs, because +the immediately containing region's identifier (":yetanotherformat") +begins with a colon. In practice, most regions that contain +data paragraphs will contain I data paragraphs; however, +the above nesting is syntactically valid as Pod, even if it is +rare. However, the handlers for some formats, like "html", +will accept only data paragraphs, not nested regions; and they may +complain if they see (targeted for them) nested regions, or commands, +other than "=end", "=pod", and "=cut". 
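The rule that only the immediately containing region decides how a paragraph is handled can be sketched as follows. This is a Python illustration under stated assumptions, not a description of any existing Pod parser: the function names are hypothetical, the caller is assumed to have already set aside command paragraphs (those starting with "="), and the open regions are assumed to be tracked as a simple list of identifiers.

```python
def split_target(identifier: str):
    """Split a =begin/=for identifier into (target, has_colon_prefix)."""
    if identifier.startswith(":"):
        return identifier[1:], True
    return identifier, False


def paragraph_kind(open_regions, paragraph: str) -> str:
    """Classify a non-command paragraph as data, verbatim, or ordinary.

    open_regions lists the identifiers of the currently open =begin
    regions, outermost first, e.g. ["someformat", ":yetanotherformat"].
    Only the innermost (immediately containing) region matters.
    """
    if open_regions:
        _target, has_colon = split_target(open_regions[-1])
        if not has_colon:
            return "data"
    # Outside any region, or inside a ":"-prefixed one: normal handling.
    if paragraph[:1] in (" ", "\t"):
        return "verbatim"
    return "ordinary"


nesting = ["someformat", "someotherformat", ":yetanotherformat"]
print(paragraph_kind(nesting, "This is an ordinary paragraph!"))  # ordinary
print(paragraph_kind(nesting[:2], "This is a data paragraph too."))  # data
```

Returning the target and the colon flag separately follows the suggestion in the text that parser APIs report "biblio" as the target and note the prefix on the side.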
+ +Also consider this valid structure: + + =begin :biblio + + Wirth's classic is available in several editions, including: + + =over + + =item + + Wirth, Niklaus. 1975. I + Teubner, Stuttgart. [Yes, it's in German.] + + =item + + Wirth, Niklaus. 1976. I Prentice-Hall, Englewood Cliffs, NJ. + + =back + + Buy buy buy! + + =begin html + + + +
    + + =end html + + Now now now! + + =end :biblio + +There, the "=begin html"..."=end html" region is nested inside +the larger "=begin :biblio"..."=end :biblio" region. Note that the +content of the "=begin html"..."=end html" region is data +paragraph(s), because the immediately containing region's identifier +("html") I begin with a colon. + +Pod parsers, when processing a series of data paragraphs one +after another (within a single region), should consider them to +be one large data paragraph that happens to contain blank lines. So +the content of the above "=begin html"..."=end html" I be stored +as two data paragraphs (one consisting of +"\n" +and another consisting of "
    \n"), but I be stored as +a single data paragraph (consisting of +"\n\n
<hr>\n"). + +Pod processors should tolerate empty +"=begin I<something>"..."=end I<something>" regions, +empty "=begin :I<something>"..."=end :I<something>" regions, and +contentless "=for I<something>" and "=for :I<something>" +paragraphs. I.e., these should be tolerated: + + =for html + + =begin html + + =end html + + =begin :biblio + + =end :biblio + +Incidentally, note that there's no easy way to express a data +paragraph starting with something that looks like a command. Consider: + + =begin stuff + + =shazbot + + =end stuff + +There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data +paragraph "=shazbot\n". However, you can express a data paragraph consisting +of "=shazbot\n" using this code: + + =for stuff =shazbot + +The situation where this is necessary, is presumably quite rare. + +Note that =end commands must match the currently open =begin command. That +is, they must properly nest. For example, this is valid: + + =begin outer + + X + + =begin inner + + Y + + =end inner + + Z + + =end outer + +while this is invalid: + + =begin outer + + X + + =begin inner + + Y + + =end outer + + Z + + =end inner + +This latter is improper because when the "=end outer" command is seen, the +currently open region has the formatname "inner", not "outer". (It just +happens that "outer" is the format name of a higher-up region.) This is +an error. Processors must by default report this as an error, and may halt +processing the document containing that error. A corollary of this is that +regions cannot "overlap" -- i.e., the latter block above does not represent +a region called "outer" which contains X and Y, overlapping a region called +"inner" which contains Y and Z. But because it is invalid (as all +apparently overlapping regions would be), it doesn't represent that, or +anything at all. + +Similarly, this is invalid: + + =begin thing + + =end hting + +This is an error because the region is opened by "thing", and the "=end" +tries to close "hting" [sic].
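The matching rule for "=begin" and "=end" amounts to a stack check, which might be sketched like this. It is a minimal Python illustration with hypothetical names. It assumes the commands have already been reduced to (command, formatname) pairs. It also rejects an "=end" with no formatname, since every "=end" has to name the region it closes. What to do about regions left open at the end of the document is not addressed here.

```python
def check_begin_end(commands):
    """Return a list of error messages for a sequence of region commands.

    commands is a list of (command, formatname) pairs, such as
    ("begin", "outer") or ("end", "outer"); formatname is "" when the
    command had no parameter.
    """
    errors = []
    open_regions = []  # stack of formatnames, innermost last
    for command, formatname in commands:
        if command == "begin":
            open_regions.append(formatname)
        elif command == "end":
            if not formatname:
                errors.append('"=end" has no formatname')
            elif not open_regions:
                errors.append(f'"=end {formatname}" has no open region to close')
            elif open_regions[-1] != formatname:
                # Assumption of this sketch: leave the stack untouched
                # after a mismatch and carry on checking.
                errors.append(
                    f'"=end {formatname}" does not match the open '
                    f'"=begin {open_regions[-1]}"'
                )
            else:
                open_regions.pop()
    return errors


overlapping = [("begin", "outer"), ("begin", "inner"),
               ("end", "outer"), ("end", "inner")]
for message in check_begin_end(overlapping):
    print(message)
```

Because only the innermost open region is ever compared, the "overlapping" case is reported at the "=end outer" command, which is where the text above says the error lies.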
+ +This is also invalid: + + =begin thing + + =end + +This is invalid because every "=end" command must have a formatname +parameter. + +=head1 SEE ALSO + +L, L, +L + +=head1 AUTHOR + +Sean M. Burke + +=cut + + diff --git a/pod/perltoc.pod b/pod/perltoc.pod index 3ad81d4..859c2a4 100644 --- a/pod/perltoc.pod +++ b/pod/perltoc.pod @@ -642,7 +642,7 @@ more elaborate constructs =back -=head2 perlpod - plain old documentation +=head2 perlpod - the Plain Old Documentation format =over 4 @@ -650,20 +650,73 @@ more elaborate constructs =over 4 +=item Ordinary Paragraph + =item Verbatim Paragraph =item Command Paragraph -=item Ordinary Block of Text +C<=head1 I>, C<=head2 I>, C<=head3 I>, C<=head4 I>, C<=over I>, C<=item +I>, C<=back>, C<=cut>, C<=pod>, C<=begin I>, C<=end +I>, C<=for I I> + +=item Formatting Codes + +CtextE> -- italic text, CtextE> -- bold text, +CcodeE> -- code text, CnameE> -- a hyperlink, +CescapeE> -- a character escape, CfilenameE> -- used +for filenames, CtextE> -- text contains non-breaking spaces, +Ctopic nameE> -- an index entry, CE> -- a null +(zero-effect) formatting code =item The Intent =item Embedding Pods in Perl Modules -=item Common Pod Pitfalls +=item Hints for Writing Pod + +=back + +=item SEE ALSO + +=item AUTHOR =back +=head2 perlpodspec - Plain Old Documentation: format specification and +notes + +=over 4 + +=item DESCRIPTION + +=item Pod Definitions + +=item Pod Commands + +"=head1", "=head2", "=head3", "=head4", "=pod", "=cut", "=over", "=item", +"=back", "=begin formatname", "=end formatname", "=for formatname text..." 
+ +=item Pod Formatting Codes + +CtextE> -- italic text, CtextE> -- bold text, +CcodeE> -- code text, CfilenameE> -- style for +filenames, Ctopic nameE> -- an index entry, CE> -- a +null (zero-effect) formatting code, CnameE> -- a hyperlink, +CescapeE> -- a character escape, CtextE> -- text +contains non-breaking spaces + +=item Notes on Implementing Pod Processors + +=item About LE...E Codes + +First:, Second:, Third:, Fourth:, Fifth:, Sixth: + +=item About =over...=back Regions + +=item About Data Paragraphs and "=begin/=end" Regions + =item SEE ALSO =item AUTHOR