X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlpodspec.pod;h=5e38e2c184c1c6475909a7dc28129a37b5ae96c4;hb=03f3e794d79fbb3655a305de34b922dd4020d2ec;hp=73872586343db0ae4d97e495067cd6e778754e3c;hpb=fae2c0fbfb26247eb616ab310ef74b1f4084ba68;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlpodspec.pod b/pod/perlpodspec.pod index 7387258..5e38e2c 100644 --- a/pod/perlpodspec.pod +++ b/pod/perlpodspec.pod @@ -65,7 +65,7 @@ directly formatting it). A B (or B) is a module or program that converts Pod to some other format (HTML, plaintext, TeX, PostScript, RTF). A B might be a formatter or translator, or might be a program that does something -else with the Pod (like wordcounting it, scanning for index points, +else with the Pod (like counting words, scanning for index points, etc.). Pod content is contained in B. A Pod block starts with a @@ -238,7 +238,7 @@ ignored. Examples: # This is the first line of program text. sub foo { # This is the second. -It is an error to try to I a Pod black with a "=cut" command. In +It is an error to try to I a Pod block with a "=cut" command. In that case, the Pod processor must halt parsing of the input file, and must by default emit a warning. @@ -332,6 +332,29 @@ then "text..." will constitute a data paragraph. There is no way to use "=for formatname text..." to express "text..." as a verbatim paragraph. +=item "=encoding encodingname" + +This command, which should occur early in the document (at least +before any non-US-ASCII data!), declares that this document is +encoded in the encoding I, which must be +an encoding name that L recognizes. (Encoding's list +of supported encodings, in L, is useful here.) +If the Pod parser cannot decode the declared encoding, it +should emit a warning and may abort parsing the document +altogether. + +A document having more than one "=encoding" line should be +considered an error. 
Pod processors may silently tolerate this if +the not-first "=encoding" lines are just duplicates of the +first one (e.g., if there's a "=encoding utf8" line, and later on +another "=encoding utf8" line). But Pod processors should complain if +there are contradictory "=encoding" lines in the same document +(e.g., if there is a "=encoding utf8" early in the document and +"=encoding big5" later). Pod processors that recognize BOMs +may also complain if they see an "=encoding" line +that contradicts the BOM (e.g., if a document with a UTF-16LE +BOM has an "=encoding shiftjis" line). + =back If a Pod processor sees any command other than the ones listed @@ -463,7 +486,7 @@ L. This formatting code is syntactically simple, but semantically complex. What it means is that each space in the printable -content of this code signifies a nonbreaking space. +content of this code signifies a non-breaking space. Consider: @@ -474,7 +497,7 @@ Consider: Both signify the monospace (c[ode] style) text consisting of "$x", one space, "?", one space, ":", one space, "$z". The difference is that in the latter, with the S code, those spaces -are not "normal" spaces, but instead are nonbreaking spaces. +are not "normal" spaces, but instead are non-breaking spaces. =back @@ -499,7 +522,7 @@ a "-". This was so that this: would parse as equivalent to this: - C<$foo-Ebar> + C<$foo-Ebar> instead of as equivalent to a "C" formatting code containing only "$foo-", and then a "bar>" outside the "C" formatting code. This @@ -589,7 +612,7 @@ UTF-16. If the file begins with the three literal byte values 0xEF 0xBB 0xBF =for comment - If toke.c is modified to support UTF32, add mention of those here. + If toke.c is modified to support UTF-32, add mention of those here. =item * @@ -701,7 +724,7 @@ period-space-space or period-newline sequences). 
Pod parsers should not, by default, try to coerce apostrophe (') and quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to turn backtick (`) into anything else but a single backtick character -(distinct from an openquote character!), nor "--" into anything but +(distinct from an open quote character!), nor "--" into anything but two minus signs. They I do any of those things to text in CE...> formatting codes, and never I to text in verbatim paragraphs. @@ -709,10 +732,10 @@ paragraphs. =item * When rendering Pod to a format that has two kinds of hyphens (-), one -that's a nonbreaking hyphen, and another that's a breakable hyphen +that's a non-breaking hyphen, and another that's a breakable hyphen (as in "object-oriented", which can be split across lines as "object-", newline, "oriented"), formatters are encouraged to -generally translate "-" to nonbreaking hyphen, but may apply +generally translate "-" to non-breaking hyphen, but may apply heuristics to convert some of these to breaking hyphens. =item * @@ -936,7 +959,7 @@ for idiosyncratic mappings of Unicode-to-I. =item * -It is up to individual Pod formatter to display good judgment when +It is up to individual Pod formatter to display good judgement when confronted with an unrenderable character (which is distinct from an unknown EEthing> sequence that the parser couldn't resolve to anything, renderable or not). It is good practice to map Latin letters @@ -969,15 +992,15 @@ EEeuro>1,000,000 Solution|Million::Euros>". =item * -Some Pod formatters output to formats that implement nonbreaking +Some Pod formatters output to formats that implement non-breaking spaces as an individual character (which I'll call "NBSP"), and -others output to formats that implement nonbreaking spaces just as +others output to formats that implement non-breaking spaces just as spaces wrapped in a "don't break this across lines" code. 
Note that at the level of Pod, both sorts of codes can occur: Pod can contain a NBSP character (whether as a literal, or as a "EE160>" or "EEnbsp>" code); and Pod can contain "SEfoo IEbarE baz>" codes, where "mere spaces" (character 32) in -such codes are taken to represent nonbreaking spaces. Pod +such codes are taken to represent non-breaking spaces. Pod parsers should consider supporting the optional parsing of "SEfoo IEbarE baz>" as if it were "fooIIEbarEIbaz", and, going the other way, the @@ -1107,7 +1130,7 @@ is "perlfunc". In "LE/CAVEATS>", the name is undef.) =item Fourth: The section (AKA "item" in older perlpods), or undef if none. E.g., -in L, "DESCRIPTION" is the section. (Note +in "LEGetopt::Std/DESCRIPTIONE", "DESCRIPTION" is the section. (Note that this is not the same as a manpage section like the "5" in "man 5 crontab". "Section Foo" in the Pod sense means the part of the text that's introduced by the heading or item whose text is "Foo".) @@ -1310,7 +1333,7 @@ given CfooE> code. =item * Previous versions of perlpod allowed for a CsectionE> syntax -(as in "CObject AttributesE>"), which was not easily distinguishable +(as in CObject AttributesE>), which was not easily distinguishable from CnameE> syntax. This syntax is no longer in the specification, and has been replaced by the C"section"E> syntax (where the quotes were formerly optional). Pod parsers should tolerate @@ -1518,7 +1541,7 @@ probably want to format it like so: Ut Enim -But (for the forseeable future), Pod does not provide any way for Pod +But (for the foreseeable future), Pod does not provide any way for Pod authors to distinguish which grouping is meant by the above "=item"-cluster structure. So formatters should format it like so: