Commit | Line | Data |
8a93676d |
1 | |
2 | =head1 NAME |
3 | |
4 | perlpodspec - Plain Old Documentation: format specification and notes |
5 | |
6 | =head1 DESCRIPTION |
7 | |
8 | This document is detailed notes on the Pod markup language. Most |
9 | people will only have to read L<perlpod|perlpod> to know how to write |
10 | in Pod, but this document may answer some incidental questions to do |
11 | with parsing and rendering Pod. |
12 | |
13 | In this document, "must" / "must not", "should" / |
14 | "should not", and "may" have their conventional (cf. RFC 2119) |
15 | meanings: "X must do Y" means that if X doesn't do Y, it's against |
16 | this specification, and should really be fixed. "X should do Y" |
17 | means that it's recommended, but X may fail to do Y, if there's a |
18 | good reason. "X may do Y" is merely a note that X can do Y at |
19 | will (although it is up to the reader to detect any connotation of |
20 | "and I think it would be I<nice> if X did Y" versus "it wouldn't |
21 | really I<bother> me if X did Y"). |
22 | |
23 | Notably, when I say "the parser should do Y", the |
24 | parser may fail to do Y, if the calling application explicitly |
25 | requests that the parser I<not> do Y. I often phrase this as |
26 | "the parser should, by default, do Y." This doesn't I<require> |
27 | the parser to provide an option for turning off whatever |
28 | feature Y is (like expanding tabs in verbatim paragraphs), although |
29 | it implies that such an option I<may> be provided. |
30 | |
31 | =head1 Pod Definitions |
32 | |
33 | Pod is embedded in files, typically Perl source files -- although you |
34 | can write a file that's nothing but Pod. |
35 | |
36 | A B<line> in a file consists of zero or more non-newline characters, |
37 | terminated by either a newline or the end of the file. |
38 | |
39 | A B<newline sequence> is usually a platform-dependent concept, but |
40 | Pod parsers should understand it to mean any of CR (ASCII 13), LF |
41 | (ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in |
42 | addition to any other system-specific meaning. The first CR/CRLF/LF |
43 | sequence in the file may be used as the basis for identifying the |
44 | newline sequence for parsing the rest of the file. |
45 | |
46 | A B<blank line> is a line consisting entirely of zero or more spaces |
47 | (ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file. |
48 | A B<non-blank line> is a line containing one or more characters other |
49 | than space or tab (and terminated by a newline or end-of-file). |
50 | |
51 | (I<Note:> Many older Pod parsers did not accept a line consisting of |
52 | spaces/tabs and then a newline as a blank line -- the only lines they |
53 | considered blank were lines consisting of I<no characters at all>, |
54 | terminated by a newline.) |
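
The line and blank-line definitions above can be sketched in a few lines of Python. This is only an illustration, not code from any existing Pod parser; the names C<split_lines> and C<is_blank> are hypothetical, and the sketch treats every CR, LF, or CRLF as a newline rather than fixing on the first sequence seen in the file.

```python
import re

# CRLF is listed first so it is not mistaken for a CR followed by an LF.
NEWLINE = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines on CR, LF, or CRLF, dropping the terminators."""
    lines = NEWLINE.split(text)
    # A final newline terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def is_blank(line):
    """True if the line consists only of spaces (ASCII 32) and tabs (ASCII 9)."""
    return re.fullmatch(r"[ \t]*", line) is not None
```

A parser written to the older behavior described in the note would instead test C<line == "">.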
55 | |
56 | B<Whitespace> is used in this document as a blanket term for spaces, |
57 | tabs, and newline sequences. (By itself, this term usually refers |
58 | to literal whitespace. That is, sequences of whitespace characters |
59 | in Pod source, as opposed to "EE<lt>32>", which is a formatting |
60 | code that I<denotes> a whitespace character.) |
61 | |
62 | A B<Pod parser> is a module meant for parsing Pod (regardless of |
63 | whether this involves calling callbacks or building a parse tree or |
64 | directly formatting it). A B<Pod formatter> (or B<Pod translator>) |
65 | is a module or program that converts Pod to some other format (HTML, |
66 | plaintext, TeX, PostScript, RTF). A B<Pod processor> might be a |
67 | formatter or translator, or might be a program that does something |
68 | else with the Pod (like wordcounting it, scanning for index points, |
69 | etc.). |
70 | |
71 | Pod content is contained in B<Pod blocks>. A Pod block starts with a |
72 | line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line |
73 | that matches C<m/\A=cut/> -- or up to the end of the file, if there is |
74 | no C<m/\A=cut/> line. |
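
As a minimal sketch of the Pod block definition above, the following Python function collects Pod blocks from a list of lines. The name C<pod_blocks> is hypothetical. Two choices here are assumptions rather than requirements of this document: the "=cut" line is kept as the last line of its block, and a "=cut" line found outside any block is silently skipped, whereas a real processor has to report it as an error.

```python
import re

# Both patterns are copied literally from the text above.
POD_START = re.compile(r"\A=[a-zA-Z]")
POD_END = re.compile(r"\A=cut")


def pod_blocks(lines):
    """Return the Pod blocks found in `lines`, each as a list of lines."""
    blocks, current = [], None
    for line in lines:
        if current is None:
            if POD_END.match(line):
                continue              # assumption: ignore a stray "=cut"
            if POD_START.match(line):
                current = [line]      # a Pod block starts here
        else:
            current.append(line)
            if POD_END.match(line):
                blocks.append(current)
                current = None
    if current is not None:           # no "=cut": the block runs to end of file
        blocks.append(current)
    return blocks
```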
75 | |
76 | =for comment |
77 | The current perlsyn says: |
78 | [beginquote] |
79 | Note that pod translators should look at only paragraphs beginning |
80 | with a pod directive (it makes parsing easier), whereas the compiler |
81 | actually knows to look for pod escapes even in the middle of a |
82 | paragraph. This means that the following secret stuff will be ignored |
83 | by both the compiler and the translators. |
84 | $a=3; |
85 | =secret stuff |
86 | warn "Neither POD nor CODE!?" |
87 | =cut back |
88 | print "got $a\n"; |
89 | You probably shouldn't rely upon the warn() being podded out forever. |
90 | Not all pod translators are well-behaved in this regard, and perhaps |
91 | the compiler will become pickier. |
92 | [endquote] |
93 | I think that those paragraphs should just be removed; paragraph-based |
94 | parsing seems to have been largely abandoned, because of the hassle |
95 | with non-empty blank lines messing up what people meant by "paragraph". |
96 | Even if the "it makes parsing easier" bit were especially true, |
97 | it wouldn't be worth the confusion of having perl and pod2whatever |
98 | actually disagree on what can constitute a Pod block. |
99 | |
100 | Within a Pod block, there are B<Pod paragraphs>. A Pod paragraph |
101 | consists of non-blank lines of text, separated by one or more blank |
102 | lines. |
103 | |
104 | For purposes of Pod processing, there are four types of paragraphs in |
105 | a Pod block: |
106 | |
107 | =over |
108 | |
109 | =item * |
110 | |
111 | A command paragraph (also called a "directive"). The first line of |
112 | this paragraph must match C<m/\A=[a-zA-Z]/>. Command paragraphs are |
113 | typically one line, as in: |
114 | |
115 | =head1 NOTES |
116 | |
119 | But they may span several (non-blank) lines: |
120 | |
121 | =for comment |
122 | Hm, I wonder what it would look like if |
123 | you tried to write a BNF for Pod from this. |
210b36aa |
124 | |
8a93676d |
125 | =head3 Dr. Strangelove, or: How I Learned to |
126 | Stop Worrying and Love the Bomb |
127 | |
128 | I<Some> command paragraphs allow formatting codes in their content |
129 | (i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in: |
130 | |
131 | =head1 Did You Remember to C<use strict;>? |
132 | |
133 | In other words, the Pod processing handler for "head1" will apply the |
134 | same processing to "Did You Remember to CE<lt>use strict;>?" that it |
135 | would to an ordinary paragraph -- i.e., formatting codes (like |
136 | "CE<lt>...>") are parsed and presumably formatted appropriately, and |
137 | whitespace in the form of literal spaces and/or tabs is not |
138 | significant. |
139 | |
140 | =item * |
141 | |
142 | A B<verbatim paragraph>. The first line of this paragraph must be a |
143 | literal space or tab, and this paragraph must not be inside a "=begin |
144 | I<identifier>", ... "=end I<identifier>" sequence unless |
145 | "I<identifier>" begins with a colon (":"). That is, if a paragraph |
146 | starts with a literal space or tab, but I<is> inside a |
147 | "=begin I<identifier>", ... "=end I<identifier>" region, then it's |
148 | a data paragraph, unless "I<identifier>" begins with a colon. |
149 | |
150 | Whitespace I<is> significant in verbatim paragraphs (although, in |
151 | processing, tabs are probably expanded). |
152 | |
153 | =item * |
154 | |
155 | An B<ordinary paragraph>. A paragraph is an ordinary paragraph |
156 | if its first line matches neither C<m/\A=[a-zA-Z]/> nor |
157 | C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>", |
158 | ... "=end I<identifier>" sequence unless "I<identifier>" begins with |
159 | a colon (":"). |
160 | |
161 | =item * |
162 | |
163 | A B<data paragraph>. This is a paragraph that I<is> inside a "=begin |
164 | I<identifier>" ... "=end I<identifier>" sequence where |
165 | "I<identifier>" does I<not> begin with a literal colon (":"). In |
166 | some sense, a data paragraph is not part of Pod at all (i.e., |
167 | effectively it's "out-of-band"), since it's not subject to most kinds |
168 | of Pod parsing; but it is specified here, since Pod |
169 | parsers need to be able to call an event for it, or store it in some |
170 | form in a parse tree, or at least just parse I<around> it. |
171 | |
172 | =back |
173 | |
174 | For example: consider the following paragraphs: |
175 | |
176 | # <- that's the 0th column |
177 | |
178 | =head1 Foo |
210b36aa |
179 | |
8a93676d |
180 | Stuff |
210b36aa |
181 | |
8a93676d |
182 |   $foo->bar |
210b36aa |
183 | |
8a93676d |
184 | =cut |
185 | |
186 | Here, "=head1 Foo" and "=cut" are command paragraphs because the first |
187 | line of each matches C<m/\A=[a-zA-Z]/>. "I<[space][space]>$foo->bar" |
188 | is a verbatim paragraph, because its first line starts with a literal |
189 | whitespace character (and there's no "=begin"..."=end" region around). |
190 | |
191 | The "=begin I<identifier>" ... "=end I<identifier>" commands stop |
192 | paragraphs that they surround from being parsed as ordinary or verbatim |
193 | paragraphs, if I<identifier> doesn't begin with a colon. This |
194 | is discussed in detail in the section |
195 | L</About Data Paragraphs and "=beginE<sol>=end" Regions>. |
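
The four paragraph types can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the names C<paragraphs> and C<classify> are hypothetical, nested "=begin" regions are judged only by the innermost identifier, and mismatched "=end" commands are not reported.

```python
import re

COMMAND = re.compile(r"\A=[a-zA-Z]")
VERBATIM = re.compile(r"\A[ \t]")


def paragraphs(lines):
    """Group the lines of a Pod block into runs of non-blank lines."""
    paras, current = [], []
    for line in lines:
        if line.strip(" \t") == "":       # a blank line ends the paragraph
            if current:
                paras.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paras.append(current)
    return paras


def classify(paras):
    """Label each paragraph as command, verbatim, ordinary, or data."""
    labels, regions = [], []              # regions: open "=begin" identifiers
    for para in paras:
        first = para[0]
        if COMMAND.match(first):
            words = first.split()
            if words[0] == "=begin" and len(words) > 1:
                regions.append(words[1])
            elif words[0] == "=end" and regions:
                regions.pop()
            labels.append("command")
        elif regions and not regions[-1].startswith(":"):
            labels.append("data")         # inside "=begin identifier", no colon
        elif VERBATIM.match(first):
            labels.append("verbatim")
        else:
            labels.append("ordinary")
    return labels
```

Applied to the example above, C<classify> labels "=head1 Foo" and "=cut" as command paragraphs, "Stuff" as an ordinary paragraph, and the indented "$foo->bar" as a verbatim paragraph.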
196 | |
197 | =head1 Pod Commands |
198 | |
199 | This section is intended to supplement and clarify the discussion in |
200 | L<perlpod/"Command Paragraph">. These are the currently recognized |
201 | Pod commands: |
202 | |
203 | =over |
204 | |
205 | =item "=head1", "=head2", "=head3", "=head4" |
206 | |
207 | This command indicates that the text in the remainder of the paragraph |
208 | is a heading. That text may contain formatting codes. Examples: |
209 | |
210 | =head1 Object Attributes |
210b36aa |
211 | |
8a93676d |
212 | =head3 What B<Not> to Do! |
213 | |
214 | =item "=pod" |
215 | |
216 | This command indicates that this paragraph begins a Pod block. (If we |
217 | are already in the middle of a Pod block, this command has no effect at |
218 | all.) If there is any text in this command paragraph after "=pod", |
219 | it must be ignored. Examples: |
220 | |
221 | =pod |
210b36aa |
222 | |
8a93676d |
223 | This is a plain Pod paragraph. |
210b36aa |
224 | |
8a93676d |
225 | =pod This text is ignored. |
226 | |
227 | =item "=cut" |
228 | |
229 | This command indicates that this line is the end of this previously |
230 | started Pod block. If there is any text after "=cut" on the line, it must be |
231 | ignored. Examples: |
232 | |
233 | =cut |
234 | |
235 | =cut The documentation ends here. |
236 | |
237 | =cut |
238 | # This is the first line of program text. |
239 | sub foo { # This is the second. |
240 | |
241 | It is an error to try to I<start> a Pod block with a "=cut" command. In |
242 | that case, the Pod processor must halt parsing of the input file, and |
243 | must by default emit a warning. |
244 | |
245 | =item "=over" |
246 | |
247 | This command indicates that this is the start of a list/indent |
248 | region. If there is any text following the "=over", it must consist |
249 | of only a nonzero positive numeral. The semantics of this numeral are |
250 | explained in the L</"About =over...=back Regions"> section, further |
251 | below. Formatting codes are not expanded. Examples: |
252 | |
253 | =over 3 |
210b36aa |
254 | |
8a93676d |
255 | =over 3.5 |
210b36aa |
256 | |
8a93676d |
257 | =over |
258 | |
259 | =item "=item" |
260 | |
261 | This command indicates that an item in a list begins here. Formatting |
262 | codes are processed. The semantics of the (optional) text in the |
263 | remainder of this paragraph are |
264 | explained in the L</"About =over...=back Regions"> section, further |
265 | below. Examples: |
266 | |
267 | =item |
210b36aa |
268 | |
8a93676d |
269 | =item * |
210b36aa |
270 | |
8a93676d |
271 | =item * |
210b36aa |
272 | |
8a93676d |
273 | =item 14 |
210b36aa |
274 | |
8a93676d |
275 | =item 3. |
210b36aa |
276 | |
8a93676d |
277 | =item C<< $thing->stuff(I<dodad>) >> |
210b36aa |
278 | |
8a93676d |
279 | =item For transporting us beyond seas to be tried for pretended |
280 | offenses |
210b36aa |
281 | |
8a93676d |
282 | =item He is at this time transporting large armies of foreign |
283 | mercenaries to complete the works of death, desolation and |
284 | tyranny, already begun with circumstances of cruelty and perfidy |
285 | scarcely paralleled in the most barbarous ages, and totally |
286 | unworthy the head of a civilized nation. |
287 | |
288 | =item "=back" |
289 | |
290 | This command indicates that this is the end of the region begun |
291 | by the most recent "=over" command. It permits no text after the |
292 | "=back" command. |
293 | |
294 | =item "=begin formatname" |
295 | |
296 | This marks the following paragraphs (until the matching "=end |
297 | formatname") as being for some special kind of processing. Unless |
298 | "formatname" begins with a colon, the contained non-command |
299 | paragraphs are data paragraphs. But if "formatname" I<does> begin |
300 | with a colon, then non-command paragraphs are ordinary paragraphs |
301 | or verbatim paragraphs. This is discussed in detail in the section |
302 | L</About Data Paragraphs and "=beginE<sol>=end" Regions>. |
303 | |
304 | It is advised that formatnames match the regexp |
305 | C<m/\A:?[-a-zA-Z0-9_]+\z/>. Implementors should anticipate future |
306 | expansion in the semantics and syntax of the first parameter |
307 | to "=begin"/"=end"/"=for". |
308 | |
309 | =item "=end formatname" |
310 | |
311 | This marks the end of the region opened by the matching |
312 | "=begin formatname" command. If "formatname" is not the formatname |
313 | of the most recent open "=begin formatname" region, then this |
314 | is an error, and must generate an error message. This |
315 | is discussed in detail in the section |
316 | L</About Data Paragraphs and "=beginE<sol>=end" Regions>. |
317 | |
318 | =item "=for formatname text..." |
319 | |
320 | This is synonymous with: |
321 | |
322 | =begin formatname |
210b36aa |
323 | |
8a93676d |
324 | text... |
210b36aa |
325 | |
8a93676d |
326 | =end formatname |
327 | |
328 | That is, it creates a region consisting of a single paragraph; that |
329 | paragraph is to be treated as an ordinary paragraph if "formatname" |
330 | begins with a ":"; if "formatname" I<doesn't> begin with a colon, |
331 | then "text..." will constitute a data paragraph. There is no way |
332 | to use "=for formatname text..." to express "text..." as a verbatim |
333 | paragraph. |
334 | |
335 | =back |
336 | |
337 | If a Pod processor sees any command other than the ones listed |
338 | above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish", |
339 | or "=w123"), that processor must by default treat this as an |
340 | error. It must not process the paragraph beginning with that |
341 | command, must by default warn of this as an error, and may |
342 | abort the parse. A Pod parser may allow a way for particular |
343 | applications to add to the above list of known commands, and to |
344 | stipulate, for each additional command, whether formatting |
345 | codes should be processed. |
346 | |
347 | Future versions of this specification may add additional |
348 | commands. |
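
A minimal Python sketch of recognizing a command and checking it against the list above follows. The function name C<parse_command> is hypothetical; the pattern is the C<m/\A=[a-zA-Z]\S*\s*/> expression used earlier in this document, and the set of names is simply the commands listed in this section.

```python
import re

KNOWN_COMMANDS = {
    "head1", "head2", "head3", "head4", "pod", "cut",
    "over", "item", "back", "begin", "end", "for",
}

# The command name, optional whitespace, then the rest of the paragraph.
COMMAND = re.compile(r"\A=([a-zA-Z]\S*)\s*(.*)", re.DOTALL)


def parse_command(paragraph):
    """Split a command paragraph into (name, text, is_known)."""
    m = COMMAND.match(paragraph)
    if m is None:
        raise ValueError("not a command paragraph")
    name, text = m.group(1), m.group(2)
    return name, text, name in KNOWN_COMMANDS
```

A caller would warn, and skip the paragraph, whenever the third element is false. An application-specific extension could be modeled by adding names to C<KNOWN_COMMANDS>.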
349 | |
350 | |
351 | |
352 | =head1 Pod Formatting Codes |
353 | |
354 | (Note that in previous drafts of this document and of perlpod, |
355 | formatting codes were referred to as "interior sequences", and |
356 | this term may still be found in the documentation for Pod parsers, |
357 | and in error messages from Pod processors.) |
358 | |
359 | There are two syntaxes for formatting codes: |
360 | |
361 | =over |
362 | |
363 | =item * |
364 | |
365 | A formatting code starts with a capital letter (just US-ASCII [A-Z]) |
366 | followed by a "<", any number of characters, and ending with the first |
367 | matching ">". Examples: |
368 | |
369 | That's what I<you> think! |
370 | |
371 | What's C<dump()> for? |
372 | |
373 | X<C<chmod> and C<unlink()> Under Different Operating Systems> |
374 | |
375 | =item * |
376 | |
377 | A formatting code starts with a capital letter (just US-ASCII [A-Z]) |
378 | followed by two or more "<"'s, one or more whitespace characters, |
379 | any number of characters, one or more whitespace characters, |
380 | and ending with the first matching sequence of two or more ">"'s, where |
381 | the number of ">"'s equals the number of "<"'s in the opening of this |
382 | formatting code. Examples: |
383 | |
384 | That's what I<< you >> think! |
385 | |
386 | C<<< open(X, ">>thing.dat") || die $! >>> |
387 | |
388 | B<< $foo->bar(); >> |
389 | |
390 | With this syntax, the whitespace character(s) after the "CE<lt><<" |
391 | (or whatever letter) and before the ">>" are I<not> renderable -- they |
392 | do not signify whitespace, but are merely part of the formatting codes |
393 | themselves. That is, these are all synonymous: |
394 | |
395 | C<thing> |
396 | C<< thing >> |
397 | C<<           thing     >> |
398 | C<<< thing >>> |
399 | C<<<< |
400 | thing |
401 | >>>> |
402 | |
403 | and so on. |
404 | |
405 | =back |
406 | |
407 | In parsing Pod, a notably tricky part is the correct parsing of |
408 | (potentially nested!) formatting codes. Implementors should |
409 | consult the code in the C<parse_text> routine in Pod::Parser as an |
410 | example of a correct implementation. |
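
To make the two syntaxes concrete, here is a small recursive-descent sketch in Python. It is not the Pod::Parser routine mentioned above and makes no claim to match it in every edge case; the names C<parse_codes> and C<_parse> are hypothetical. It assumes that two or more "<" characters I<not> followed by whitespace open an ordinary single-"<" code, and it silently closes any code still open at the end of the paragraph, where a real parser should also complain.

```python
import re

# A capital letter, then either "<<" (or more) plus whitespace, or one "<".
OPEN = re.compile(r"([A-Z])(<<+\s+|<)")


def parse_codes(text):
    """Parse one paragraph into a list of strings and (letter, children) tuples."""
    nodes, _ = _parse(text, 0, None)
    return nodes


def _parse(text, pos, closer):
    nodes, buf = [], []

    def flush():
        if buf:
            nodes.append("".join(buf))
            buf.clear()

    while pos < len(text):
        if closer is not None:
            m = closer.match(text, pos)
            if m:                          # the code opened by our caller ends here
                flush()
                return nodes, m.end()
        m = OPEN.match(text, pos)
        if m:
            flush()
            angles = m.group(2).count("<")
            if angles == 1:
                inner = re.compile(">")
            else:                          # whitespace, then the same number of ">"
                inner = re.compile(r"\s+" + ">" * angles)
            children, pos = _parse(text, m.end(), inner)
            nodes.append((m.group(1), children))
        else:
            buf.append(text[pos])
            pos += 1
    flush()                                # end of paragraph closes any open code
    return nodes, pos
```

With this sketch, "CE<lt>thing>" and "CE<lt>E<lt> thing >>" both parse to a single C code containing "thing", and the ">" inside "BE<lt>E<lt> $foo->bar(); >>" stays literal.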
411 | |
412 | =over |
413 | |
414 | =item C<IE<lt>textE<gt>> -- italic text |
415 | |
416 | See the brief discussion in L<perlpod/"Formatting Codes">. |
417 | |
418 | =item C<BE<lt>textE<gt>> -- bold text |
419 | |
420 | See the brief discussion in L<perlpod/"Formatting Codes">. |
421 | |
422 | =item C<CE<lt>codeE<gt>> -- code text |
423 | |
424 | See the brief discussion in L<perlpod/"Formatting Codes">. |
425 | |
426 | =item C<FE<lt>filenameE<gt>> -- style for filenames |
427 | |
428 | See the brief discussion in L<perlpod/"Formatting Codes">. |
429 | |
430 | =item C<XE<lt>topic nameE<gt>> -- an index entry |
431 | |
432 | See the brief discussion in L<perlpod/"Formatting Codes">. |
433 | |
434 | This code is unusual in that most formatters completely discard |
435 | this code and its content. Other formatters will render it with |
436 | invisible codes that can be used in building an index of |
437 | the current document. |
438 | |
439 | =item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code |
440 | |
441 | Discussed briefly in L<perlpod/"Formatting Codes">. |
442 | |
443 | This code is unusual in that it should have no content. That is, |
444 | a processor may complain if it sees C<ZE<lt>potatoesE<gt>>. Whether |
445 | or not it complains, the I<potatoes> text should be ignored. |
446 | |
447 | =item C<LE<lt>nameE<gt>> -- a hyperlink |
448 | |
449 | The complicated syntaxes of this code are discussed at length in |
450 | L<perlpod/"Formatting Codes">, and implementation details are |
451 | discussed below, in L</"About LE<lt>...E<gt> Codes">. Parsing the |
452 | contents of LE<lt>content> is tricky. Notably, the content has to be |
453 | checked for whether it looks like a URL, or whether it has to be split |
454 | on literal "|" and/or "/" (in the right order!), and so on, |
455 | I<before> EE<lt>...> codes are resolved. |
456 | |
457 | =item C<EE<lt>escapeE<gt>> -- a character escape |
458 | |
459 | See L<perlpod/"Formatting Codes">, and several points in |
460 | L</Notes on Implementing Pod Processors>. |
461 | |
462 | =item C<SE<lt>textE<gt>> -- text contains non-breaking spaces |
463 | |
464 | This formatting code is syntactically simple, but semantically |
465 | complex. What it means is that each space in the printable |
466 | content of this code signifies a nonbreaking space. |
467 | |
468 | Consider: |
469 | |
470 | C<$x ? $y : $z> |
471 | |
472 | S<C<$x ? $y : $z>> |
473 | |
474 | Both signify the monospace (c[ode] style) text consisting of |
475 | "$x", one space, "?", one space, "$y", one space, ":", one space, "$z". The |
476 | difference is that in the latter, with the S code, those spaces |
477 | are not "normal" spaces, but instead are nonbreaking spaces. |
478 | |
479 | =back |
480 | |
481 | |
482 | If a Pod processor sees any formatting code other than the ones |
483 | listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that |
484 | processor must by default treat this as an error. |
485 | A Pod parser may allow a way for particular |
486 | applications to add to the above list of known formatting codes; |
487 | a Pod parser might even allow a way to stipulate, for each additional |
488 | formatting code, whether it requires some form of special processing, as |
489 | LE<lt>...> does. |
490 | |
491 | Future versions of this specification may add additional |
492 | formatting codes. |
493 | |
494 | Historical note: A few older Pod processors would not see a ">" as |
495 | closing a "CE<lt>" code, if the ">" was immediately preceded by |
496 | a "-". This was so that this: |
497 | |
498 | C<$foo->bar> |
499 | |
500 | would parse as equivalent to this: |
501 | |
502 | C<$foo-E<lt>bar> |
503 | |
504 | instead of as equivalent to a "C" formatting code containing |
505 | only "$foo-", and then a "bar>" outside the "C" formatting code. This |
506 | problem has since been solved by the addition of syntaxes like this: |
507 | |
508 | C<< $foo->bar >> |
509 | |
510 | Compliant parsers must not treat "->" as special. |
511 | |
512 | Formatting codes absolutely cannot span paragraphs. If a code is |
513 | opened in one paragraph, and no closing code is found by the end of |
514 | that paragraph, the Pod parser must close that formatting code, |
515 | and should complain (as in "Unterminated I code in the paragraph |
516 | starting at line 123: 'Time objects are not...'"). So these |
517 | two paragraphs: |
518 | |
519 | I<I told you not to do this! |
210b36aa |
520 | |
8a93676d |
521 | Don't make me say it again!> |
522 | |
523 | ...must I<not> be parsed as two paragraphs in italics (with the I |
524 | code starting in one paragraph and ending in another). Instead, |
525 | the first paragraph should generate a warning, but that aside, the |
526 | above code must parse as if it were: |
527 | |
528 | I<I told you not to do this!> |
210b36aa |
529 | |
8a93676d |
530 | Don't make me say it again!E<gt> |
531 | |
532 | (In SGMLish jargon, all Pod commands are like block-level |
533 | elements, whereas all Pod formatting codes are like inline-level |
534 | elements.) |
535 | |
536 | |
537 | |
538 | =head1 Notes on Implementing Pod Processors |
539 | |
540 | The following is a long section of miscellaneous requirements |
541 | and suggestions to do with Pod processing. |
542 | |
543 | =over |
544 | |
545 | =item * |
546 | |
547 | Pod formatters should tolerate lines in verbatim blocks that are of |
548 | any length, even if that means having to break them (possibly several |
549 | times, for very long lines) to avoid text running off the side of the |
550 | page. Pod formatters may warn of such line-breaking. Such warnings |
551 | are particularly appropriate for lines that are over 100 characters long, which |
552 | are usually not intentional. |
553 | |
554 | =item * |
555 | |
556 | Pod parsers must recognize I<all> of the three well-known newline |
557 | formats: CR, LF, and CRLF. See L<perlport|perlport>. |
558 | |
559 | =item * |
560 | |
561 | Pod parsers should accept input lines that are of any length. |
562 | |
563 | =item * |
564 | |
565 | Since Perl recognizes a Unicode Byte Order Mark at the start of files |
566 | as signaling that the file is Unicode encoded as in UTF-16 (whether |
567 | big-endian or little-endian) or UTF-8, Pod parsers should do the |
568 | same. Otherwise, the character encoding should be understood as |
569 | being UTF-8 if the first highbit byte sequence in the file seems |
570 | valid as a UTF-8 sequence, or otherwise as Latin-1. |
571 | |
572 | Future versions of this specification may specify |
573 | how Pod can accept other encodings. Presumably treatment of other |
574 | encodings in Pod parsing would be as in XML parsing: whatever the |
575 | encoding declared by a particular Pod file, content is to be |
576 | stored in memory as Unicode characters. |
577 | |
578 | =item * |
579 | |
580 | The well known Unicode Byte Order Marks are as follows: if the |
581 | file begins with the two literal byte values 0xFE 0xFF, this is |
582 | the BOM for big-endian UTF-16. If the file begins with the two |
583 | literal byte values 0xFF 0xFE, this is the BOM for little-endian |
584 | UTF-16. If the file begins with the three literal byte values |
585 | 0xEF 0xBB 0xBF, this is the BOM for UTF-8. |
586 | |
587 | =for comment |
588 | use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}"; |
589 | 0xEF 0xBB 0xBF |
590 | |
591 | =for comment |
592 | If toke.c is modified to support UTF32, add mention of those here. |
593 | |
594 | =item * |
595 | |
596 | A naive but sufficient heuristic for testing the first highbit |
597 | byte-sequence in a BOM-less file (whether in code or in Pod!), to see |
598 | whether that sequence is valid as UTF-8 (RFC 2279) is to check whether |
599 | the first byte in the sequence is in the range 0xC0 - 0xFD |
600 | I<and> whether the next byte is in the range |
601 | 0x80 - 0xBF. If so, the parser may conclude that this file is in |
602 | UTF-8, and all highbit sequences in the file should be assumed to |
603 | be UTF-8. Otherwise the parser should treat the file as being |
604 | in Latin-1. In the unlikely circumstance that the first highbit |
605 | sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one |
606 | can cater to our heuristic (as well as any more intelligent heuristic) |
607 | by prefacing that line with a comment line containing a highbit |
608 | sequence that is clearly I<not> valid as UTF-8. A line consisting |
609 | of simply "#", an e-acute, and any non-highbit byte, |
610 | is sufficient to establish this file's encoding. |
611 | |
612 | =for comment |
613 | If/WHEN some brave soul makes these heuristics into a generic |
614 | text-file class (or file discipline?), we can presumably delete |
615 | mention of these icky details from this file, and can instead |
616 | tell people to just use appropriate class/discipline. |
617 | Auto-recognition of newline sequences would be another desirable |
618 | feature of such a class/discipline. |
619 | HINT HINT HINT. |
620 | |
621 | =for comment |
622 | "The probability that a string of characters |
623 | in any other encoding appears as valid UTF-8 is low" - RFC2279 |
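
The BOM check and the naive highbit heuristic described in the preceding items could be sketched in Python as follows. The function name C<guess_encoding> is hypothetical, and the returned strings are Python codec names. Returning Latin-1 for a file with no highbit bytes at all is an assumption of this sketch, since such a file decodes identically either way.

```python
import re

BOMS = [
    (b"\xEF\xBB\xBF", "utf-8"),      # UTF-8 BOM
    (b"\xFE\xFF", "utf-16-be"),      # big-endian UTF-16 BOM
    (b"\xFF\xFE", "utf-16-le"),      # little-endian UTF-16 BOM
]

HIGHBIT = re.compile(rb"[\x80-\xFF]")
# A first byte in 0xC0 - 0xFD followed by a byte in 0x80 - 0xBF.
UTF8_START = re.compile(rb"[\xC0-\xFD][\x80-\xBF]")


def guess_encoding(data):
    """Guess the encoding of a file's raw bytes; the BOM is not stripped."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    first = HIGHBIT.search(data)
    if first is None:
        return "latin-1"             # assumption: pure ASCII, either answer works
    if UTF8_START.match(data, first.start()):
        return "utf-8"
    return "latin-1"
```

The comment-line trick from the text works with this sketch: a file whose first highbit byte is an e-acute followed by a non-highbit byte is reported as Latin-1.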
624 | |
625 | =item * |
626 | |
627 | This document's requirements and suggestions about encodings |
628 | do not apply to Pod processors running on non-ASCII platforms, |
629 | notably EBCDIC platforms. |
630 | |
631 | =item * |
632 | |
633 | Pod processors must treat a "=for [label] [content...]" paragraph as |
634 | meaning the same thing as a "=begin [label]" paragraph, content, and |
635 | an "=end [label]" paragraph. (The parser may conflate these two |
636 | constructs, or may leave them distinct, in the expectation that the |
637 | formatter will nevertheless treat them the same.) |
638 | |
639 | =item * |
640 | |
641 | When rendering Pod to a format that allows comments (i.e., to nearly |
642 | any format other than plaintext), a Pod formatter must insert comment |
643 | text identifying its name and version number, and the name and |
644 | version numbers of any modules it might be using to process the Pod. |
645 | Minimal examples: |
646 | |
647 | %% POD::Pod2PS v3.14159, using POD::Parser v1.92 |
210b36aa |
648 | |
8a93676d |
649 | <!-- Pod::HTML v3.14159, using POD::Parser v1.92 --> |
210b36aa |
650 | |
8a93676d |
651 | {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08} |
210b36aa |
652 | |
8a93676d |
653 | .\" Pod::Man version 3.14159, using POD::Parser version 1.92 |
654 | |
655 | Formatters may also insert additional comments, including: the |
656 | release date of the Pod formatter program, the contact address for |
657 | the author(s) of the formatter, the current time, the name of the input |
658 | file, the formatting options in effect, version of Perl used, etc. |
659 | |
660 | Formatters may also choose to note errors/warnings as comments, |
661 | besides or instead of emitting them otherwise (as in messages to |
662 | STDERR, or C<die>ing). |
663 | |
664 | =item * |
665 | |
666 | Pod parsers I<may> emit warnings or error messages ("Unknown E code |
667 | EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or |
668 | C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow |
669 | suppressing all such STDERR output, and instead allow an option for |
670 | reporting errors/warnings |
671 | in some other way, whether by triggering a callback, or noting errors |
672 | in some attribute of the document object, or some similarly unobtrusive |
673 | mechanism -- or even by appending a "Pod Errors" section to the end of |
674 | the parsed form of the document. |
675 | |
676 | =item * |
677 | |
678 | In cases of exceptionally aberrant documents, Pod parsers may abort the |
679 | parse. Even then, using C<die>ing/C<croak>ing is to be avoided; where |
680 | possible, the parser library may simply close the input file |
681 | and add text like "*** Formatting Aborted ***" to the end of the |
682 | (partial) in-memory document. |
683 | |
684 | =item * |
685 | |
686 | In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>) |
687 | are understood (i.e., I<not> verbatim paragraphs, but I<including> |
688 | ordinary paragraphs, and command paragraphs that produce renderable |
689 | text, like "=head1"), literal whitespace should generally be considered |
690 | "insignificant", in that one literal space has the same meaning as any |
691 | (nonzero) number of literal spaces, literal newlines, and literal tabs |
692 | (as long as this produces no blank lines, since those would terminate |
693 | the paragraph). Pod parsers should compact literal whitespace in each |
694 | processed paragraph, but may provide an option for overriding this |
695 | (since some processing tasks do not require it), or may follow |
696 | additional special rules (for example, specially treating |
697 | period-space-space or period-newline sequences). |
698 | |
699 | =item * |
700 | |
701 | Pod parsers should not, by default, try to coerce apostrophe (') and |
702 | quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to |
703 | turn backtick (`) into anything else but a single backtick character |
704 | (distinct from an openquote character!), nor "--" into anything but |
705 | two minus signs. They I<must never> do any of those things to text |
706 | in CE<lt>...> formatting codes, and never I<ever> to text in verbatim |
707 | paragraphs. |
708 | |
709 | =item * |
710 | |
711 | When rendering Pod to a format that has two kinds of hyphens (-), one |
712 | that's a nonbreaking hyphen, and another that's a breakable hyphen |
713 | (as in "object-oriented", which can be split across lines as |
714 | "object-", newline, "oriented"), formatters are encouraged to |
715 | generally translate "-" to nonbreaking hyphen, but may apply |
716 | heuristics to convert some of these to breaking hyphens. |
717 | |
718 | =item * |
719 | |
720 | Pod formatters should make reasonable efforts to keep words of Perl |
721 | code from being broken across lines. For example, "Foo::Bar" in some |
722 | formatting systems is seen as eligible for being broken across lines |
723 | as "Foo::" newline "Bar" or even "Foo::-" newline "Bar". This should |
724 | be avoided where possible, either by disabling all line-breaking in |
725 | mid-word, or by wrapping particular words with internal punctuation |
726 | in "don't break this across lines" codes (which in some formats may |
727 | not be a single code, but might be a matter of inserting non-breaking |
728 | zero-width spaces between every pair of characters in a word.) |
729 | |
730 | =item * |
731 | |
732 | Pod parsers should, by default, expand tabs in verbatim paragraphs as |
733 | they are processed, before passing them to the formatter or other |
734 | processor. Parsers may also allow an option for overriding this. |
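
As a rough illustration of the tab expansion described above, here is a minimal sketch. The helper name C<expand_tabs> is hypothetical, and the tab width of 8 columns is an assumption (this section does not fix a width).

```perl
use strict;
use warnings;

# Hypothetical helper: expand each tab to the next multiple of $width columns.
# The default width of 8 is an assumption, not something this section mandates.
sub expand_tabs {
    my ($text, $width) = @_;
    $width ||= 8;
    my @lines = split /\n/, $text, -1;
    for my $line (@lines) {
        # Replace the first remaining tab with enough spaces to reach the next stop.
        1 while $line =~ s/^([^\t]*)\t/$1 . (' ' x ($width - length($1) % $width))/e;
    }
    return join "\n", @lines;
}
```

Working line by line keeps the column count correct for multi-line verbatim paragraphs, since each tab's width depends on the text before it on the same line.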
735 | |
736 | =item * |
737 | |
738 | Pod parsers should, by default, remove newlines from the end of |
739 | ordinary and verbatim paragraphs before passing them to the |
740 | formatter. For example, while the paragraph you're reading now |
741 | could be considered, in Pod source, to end with (and contain) |
742 | the newline(s) that end it, it should be processed as ending with |
743 | (and containing) the period character that ends this sentence. |
744 | |
745 | =item * |
746 | |
747 | Pod parsers, when reporting errors, should make some effort to report |
748 | an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near |
749 | line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph |
750 | number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!"). Where |
751 | this is problematic, the paragraph number should at least be |
752 | accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in |
753 | Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for |
754 | the CE<lt>interest rate> attribute...'"). |
755 | |
756 | =item * |
757 | |
758 | Pod parsers, when processing a series of verbatim paragraphs one |
759 | after another, should consider them to be one large verbatim |
760 | paragraph that happens to contain blank lines. I.e., these two |
d1be9408 |
761 | lines, which have a blank line between them: |
8a93676d |
762 | |
763 | use Foo; |
764 | |
765 | print Foo->VERSION |
766 | |
767 | should be unified into one paragraph ("\tuse Foo;\n\n\tprint |
768 | Foo->VERSION") before being passed to the formatter or other |
769 | processor. Parsers may also allow an option for overriding this. |
770 | |
771 | While this might be too cumbersome to implement in event-based Pod |
772 | parsers, it is straightforward for parsers that return parse trees. |
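
For a parser that builds a list or tree of paragraphs, the unification above might be sketched as follows. The paragraph model (hash refs with C<type> and C<text> keys) and the name C<merge_verbatim> are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical paragraph model: each paragraph is { type => ..., text => ... }.
sub merge_verbatim {
    my @merged;
    for my $para (@_) {
        if (@merged
            && $para->{type} eq 'verbatim'
            && $merged[-1]{type} eq 'verbatim') {
            # Rejoin with the blank line that separated them in the source.
            $merged[-1]{text} .= "\n\n" . $para->{text};
        }
        else {
            push @merged, { %$para };   # shallow copy; the input is left untouched
        }
    }
    return @merged;
}
```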
773 | |
774 | =item * |
775 | |
776 | Pod formatters, where feasible, are advised to avoid splitting short |
777 | verbatim paragraphs (under twelve lines, say) across pages. |
778 | |
779 | =item * |
780 | |
781 | Pod parsers must treat a line with only spaces and/or tabs on it as a |
782 | "blank line" such as separates paragraphs. (Some older parsers |
783 | recognized only two adjacent newlines as a "blank line" but would not |
784 | recognize a newline, a space, and a newline, as a blank line. This |
785 | is noncompliant behavior.) |
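
A minimal sketch of a compliant paragraph splitter follows. The name C<split_paragraphs> is hypothetical, and normalizing newlines before splitting is one possible approach, not the only one.

```perl
use strict;
use warnings;

# Hypothetical helper: split Pod source into paragraphs.
sub split_paragraphs {
    my ($pod) = @_;
    $pod =~ s/\r\n?/\n/g;    # normalize CRLF and CR to LF first
    # A separator is a newline followed by one or more lines
    # that hold nothing but spaces and/or tabs.
    return grep { length } split /\n(?:[ \t]*\n)+/, $pod;
}
```

The older, noncompliant behavior corresponds to splitting on C</\n\n+/> alone, which misses a line holding a stray space.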
786 | |
787 | =item * |
788 | |
789 | Authors of Pod formatters/processors should make every effort to |
790 | avoid writing their own Pod parser. There are already several in |
791 | CPAN, with a wide range of interface styles -- and one of them, |
792 | Pod::Parser, comes with modern versions of Perl. |
793 | |
794 | =item * |
795 | |
796 | Characters in Pod documents may be conveyed either as literals, or by |
797 | number in EE<lt>n> codes, or by an equivalent mnemonic, as in |
798 | EE<lt>eacute> which is exactly equivalent to EE<lt>233>. |
799 | |
800 | Characters in the range 32-126 refer to those well known US-ASCII |
801 | characters (also defined there by Unicode, with the same meaning), |
802 | which all Pod formatters must render faithfully. Characters |
803 | in the ranges 0-31 and 127-159 should not be used (neither as |
804 | literals, nor as EE<lt>number> codes), except for the |
210b36aa |
805 | literal byte-sequences for newline (13, 13 10, or 10), and tab (9). |
8a93676d |
806 | |
807 | Characters in the range 160-255 refer to Latin-1 characters (also |
808 | defined there by Unicode, with the same meaning). Characters above |
809 | 255 should be understood to refer to Unicode characters. |
810 | |
811 | =item * |
812 | |
813 | Be warned |
814 | that some formatters cannot reliably render characters outside 32-126; |
815 | and many are able to handle 32-126 and 160-255, but nothing above |
816 | 255. |
817 | |
818 | =item * |
819 | |
820 | Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for |
821 | less-than and greater-than, Pod parsers must understand "EE<lt>sol>" |
822 | for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar, |
823 | pipe). Pod parsers should also understand "EE<lt>lchevron>" and |
824 | "EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e., |
825 | "left-pointing double angle quotation mark" = "left pointing |
826 | guillemet" and "right-pointing double angle quotation mark" = "right |
827 | pointing guillemet". (These look like little "<<" and ">>", and they |
828 | are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>" |
829 | and "EE<lt>raquo>".) |
830 | |
831 | =item * |
832 | |
833 | Pod parsers should understand all "EE<lt>html>" codes as defined |
834 | in the entity declarations in the most recent XHTML specification at |
835 | C<www.W3.org>. Pod parsers must understand at least the entities |
836 | that define characters in the range 160-255 (Latin-1). Pod parsers, |
837 | when faced with some unknown "EE<lt>I<identifier>>" code, |
838 | shouldn't simply replace it with nullstring (by default, at least), |
839 | but may pass it through as a string consisting of the literal characters |
840 | E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the |
841 | alternative option of processing such unknown |
842 | "EE<lt>I<identifier>>" codes by firing an event especially |
843 | for such codes, or by adding a special node-type to the in-memory |
844 | document tree. Such "EE<lt>I<identifier>>" may have special meaning |
845 | to some processors, or some processors may choose to add them to |
846 | a special error report. |
847 | |
848 | =item * |
849 | |
850 | Pod parsers must also support the XHTML codes "EE<lt>quot>" for |
851 | character 34 (doublequote, "), "EE<lt>amp>" for character 38 |
852 | (ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, '). |
853 | |
854 | =item * |
855 | |
856 | Note that in all cases of "EE<lt>whatever>", I<whatever> (whether |
857 | an htmlname, or a number in any base) must consist only of |
858 | alphanumeric characters -- that is, I<whatever> must match |
859 | C<m/\A\w+\z/>. So "EE<lt> 0 1 2 3 >" is invalid, because |
860 | it contains spaces, which aren't alphanumeric characters. This |
861 | presumably does not I<need> special treatment by a Pod processor; |
862 | " 0 1 2 3 " doesn't look like a number in any base, so it would |
863 | presumably be looked up in the table of HTML-like names. Since |
210b36aa |
864 | there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ", |
8a93676d |
865 | this will be treated as an error. However, Pod processors may |
866 | treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically> |
867 | invalid, potentially earning a different error message than the |
868 | error message (or warning, or event) generated by a merely unknown |
869 | (but theoretically valid) htmlname, as in "EE<lt>qacute>" |
870 | [sic]. However, Pod parsers are not required to make this |
871 | distinction. |
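
The distinction between syntactically invalid and merely unknown codes can be sketched like this. The helper C<resolve_e_code> and its three status strings are hypothetical, the name table is deliberately tiny, and the octal branch assumes the leading-zero convention described in perlpod.

```perl
use strict;
use warnings;

# A deliberately tiny name table; a real parser needs the full XHTML entity set.
my %NAME2CODE = (
    lt => 60, gt => 62, sol => 47, verbar => 124,
    quot => 34, amp => 38, apos => 39,
    lchevron => 171, laquo => 171, rchevron => 187, raquo => 187,
    nbsp => 160, shy => 173, eacute => 233,
);

# Hypothetical helper. Returns (status, character), where status is:
#   'ok'      - resolved to a character
#   'unknown' - well-formed name that is not in the table
#   'invalid' - content does not match m/\A\w+\z/
sub resolve_e_code {
    my ($content) = @_;
    return ('invalid', undef) unless $content =~ m/\A\w+\z/;
    return ('ok', chr hex $1)       if $content =~ m/\A0[xX]([0-9a-fA-F]+)\z/;
    return ('ok', chr oct $content) if $content =~ m/\A0[0-7]+\z/;   # assumed octal form
    return ('ok', chr $content)     if $content =~ m/\A[0-9]+\z/;
    return exists $NAME2CODE{$content}
        ? ('ok', chr $NAME2CODE{$content})
        : ('unknown', undef);
}
```

A parser that does not care about the distinction can simply treat both non-ok statuses the same way.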
872 | |
873 | =item * |
874 | |
875 | Note that EE<lt>number> I<must not> be interpreted as simply |
876 | "codepoint I<number> in the current/native character set". It always |
877 | means only "the character represented by codepoint I<number> in |
878 | Unicode." (This is identical to the semantics of &#I<number>; in XML.) |
879 | |
880 | This will likely require many formatters to have tables mapping from |
881 | treatable Unicode codepoints (such as the "\xE9" for the e-acute |
882 | character) to the escape sequences or codes necessary for conveying |
883 | such sequences in the target output format. A converter to *roff |
884 | would, for example, know that "\xE9" (whether conveyed literally, or via |
885 | a EE<lt>...> sequence) is to be conveyed as "e\\*'". |
8939ba94 |
886 | Similarly, a program rendering Pod in a Mac OS application window would |
8a93676d |
887 | presumably need to know that "\xE9" maps to codepoint 142 in MacRoman |
8939ba94 |
888 | encoding that (at time of writing) is native for Mac OS. Such |
8a93676d |
889 | Unicode2whatever mappings are presumably already widely available for |
890 | common output formats. (Such mappings may be incomplete! Implementers |
891 | are not expected to bend over backwards in an attempt to render |
892 | Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any |
893 | of the other weird things that Unicode can encode.) And |
894 | if a Pod document uses a character not found in such a mapping, the |
895 | formatter should consider it an unrenderable character. |
896 | |
897 | =item * |
898 | |
899 | If, surprisingly, the implementor of a Pod formatter can't find a |
900 | satisfactory pre-existing table mapping from Unicode characters to |
901 | escapes in the target format (e.g., a decent table of Unicode |
902 | characters to *roff escapes), it will be necessary to build such a |
903 | table. If you are in this circumstance, you should begin with the |
904 | characters in the range 0x00A0 - 0x00FF, which is mostly the heavily |
905 | used accented characters. Then proceed (as patience permits and |
906 | fastidiousness compels) through the characters that the (X)HTML |
907 | standards groups judged important enough to merit mnemonics |
908 | for. These are declared in the (X)HTML specifications at the |
909 | www.W3.org site. At time of writing (September 2001), the most recent |
910 | entity declaration files are: |
911 | |
912 | http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent |
913 | http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent |
914 | http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent |
915 | |
916 | Then you can progress through any remaining notable Unicode characters |
917 | in the range 0x2000-0x204D (consult the character tables at |
918 | www.unicode.org), and whatever else strikes your fancy. For example, |
919 | in F<xhtml-symbol.ent>, there is the entry: |
920 | |
921 | <!ENTITY infin "∞"> <!-- infinity, U+221E ISOtech --> |
922 | |
923 | While the mapping "infin" to the character "\x{221E}" will (hopefully) |
924 | have been already handled by the Pod parser, the presence of the |
925 | character in this file means that it's reasonably important enough to |
926 | include in a formatter's table that maps from notable Unicode characters |
927 | to the codes necessary for rendering them. So for a Unicode-to-*roff |
928 | mapping, for example, this would merit the entry: |
929 | |
930 | "\x{221E}" => '\(in', |
931 | |
932 | It is eagerly hoped that in the future, increasing numbers of formats |
933 | (and formatters) will support Unicode characters directly (as (X)HTML |
934 | does with C<∞>, C<∞>, or C<∞>), reducing the need |
935 | for idiosyncratic mappings of Unicode-to-I<my_escapes>. |
936 | |
937 | =item * |
938 | |
939 | It is up to individual Pod formatters to display good judgment when |
940 | confronted with an unrenderable character (which is distinct from an |
941 | unknown EE<lt>thing> sequence that the parser couldn't resolve to |
942 | anything, renderable or not). It is good practice to map Latin letters |
943 | with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding |
944 | unaccented US-ASCII letters (like a simple character 101, "e"), but |
210b36aa |
945 | clearly this is often not feasible, and an unrenderable character may |
8a93676d |
946 | be represented as "?", or the like. In attempting a sane fallback |
947 | (as from EE<lt>233> to "e"), Pod formatters may use the |
948 | %Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or |
949 | L<Text::Unidecode|Text::Unidecode>, if available. |
950 | |
951 | For example, this Pod text: |
952 | |
953 | magic is enabled if you set C<$Currency> to 'E<euro>'. |
954 | |
955 | may be rendered as: |
956 | "magic is enabled if you set C<$Currency> to 'I<?>'" or as |
957 | "magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as |
958 | "magic is enabled if you set C<$Currency> to '[x20AC]'", etc. |
959 | |
960 | A Pod formatter may also note, in a comment or warning, a list of what |
961 | unrenderable characters were encountered. |
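
One way a formatter might implement such a fallback is sketched below. The helper C<render_char>, the C<$can_render> callback, and the three-entry fallback table are all hypothetical; Pod::Escapes and Text::Unidecode provide much fuller tables.

```perl
use strict;
use warnings;

# Hypothetical fallback table, for illustration only.
my %FALLBACK = ( "\x{E9}" => 'e', "\x{E8}" => 'e', "\x{F1}" => 'n' );

# Hypothetical helper: $can_render is a code ref supplied by the formatter
# that says whether the output format can show a given character.
sub render_char {
    my ($char, $can_render) = @_;
    return $char if $can_render->($char);
    return $FALLBACK{$char} if exists $FALLBACK{$char};
    return sprintf '[x%04X]', ord $char;   # e.g. the euro sign becomes [x20AC]
}
```

A formatter limited to US-ASCII could pass C<sub { ord($_[0]) E<gt>= 32 && ord($_[0]) E<lt>= 126 }> as the callback.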
962 | |
963 | =item * |
964 | |
965 | EE<lt>...> may freely appear in any formatting code (other than |
966 | in another EE<lt>...> or in an ZE<lt>>). That is, "XE<lt>The |
967 | EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The |
968 | EE<lt>euro>1,000,000 Solution|Million::Euros>". |
969 | |
970 | =item * |
971 | |
972 | Some Pod formatters output to formats that implement nonbreaking |
973 | spaces as an individual character (which I'll call "NBSP"), and |
974 | others output to formats that implement nonbreaking spaces just as |
975 | spaces wrapped in a "don't break this across lines" code. Note that |
976 | at the level of Pod, both sorts of codes can occur: Pod can contain a |
977 | NBSP character (whether as a literal, or as a "EE<lt>160>" or |
978 | "EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo |
979 | IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in |
980 | such codes are taken to represent nonbreaking spaces. Pod |
981 | parsers should consider supporting the optional parsing of "SE<lt>foo |
982 | IE<lt>barE<gt> baz>" as if it were |
983 | "fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the |
984 | optional parsing of groups of words joined by NBSP's as if each group |
985 | were in a SE<lt>...> code, so that formatters may use the |
986 | representation that maps best to what the output format demands. |
987 | |
988 | =item * |
989 | |
210b36aa |
990 | Some processors may find that the C<SE<lt>...E<gt>> code is easiest to |
8a93676d |
991 | implement by replacing each space in the parse tree under the content |
992 | of the S, with an NBSP. But note: the replacement should apply I<not> to |
993 | spaces in I<all> text, but I<only> to spaces in I<printable> text. (This |
994 | distinction may or may not be evident in the particular tree/event |
995 | model implemented by the Pod parser.) For example, consider this |
996 | unusual case: |
997 | |
998 | S<L</Autoloaded Functions>> |
999 | |
1000 | This means that the space in the middle of the visible link text must |
1001 | not be broken across lines. In other words, it's the same as this: |
1002 | |
1003 | L<"AutoloadedE<160>Functions"/Autoloaded Functions> |
1004 | |
1005 | However, a misapplied space-to-NBSP replacement could (wrongly) |
1006 | produce something equivalent to this: |
1007 | |
1008 | L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions> |
1009 | |
1010 | ...which is almost definitely not going to work as a hyperlink (assuming |
1011 | this formatter outputs a format supporting hypertext). |
1012 | |
1013 | Formatters may choose to just not support the S format code, |
1014 | especially in cases where the output format simply has no NBSP |
1015 | character/code and no code for "don't break this stuff across lines". |
1016 | |
1017 | =item * |
1018 | |
1019 | Besides the NBSP character discussed above, implementors are reminded |
1020 | of the existence of the other "special" character in Latin-1, the |
210b36aa |
1021 | "soft hyphen" character, also known as "discretionary hyphen", |
8a93676d |
1022 | i.e. C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> = |
1023 | C<EE<lt>shyE<gt>>. This character expresses an optional hyphenation |
1024 | point. That is, it normally renders as nothing, but may render as a |
1025 | "-" if a formatter breaks the word at that point. Pod formatters |
1026 | should, as appropriate, do one of the following: 1) render this with |
1027 | a code with the same meaning (e.g., "\-" in RTF), 2) pass it through |
1028 | in the expectation that the formatter understands this character as |
1029 | such, or 3) delete it. |
1030 | |
1031 | For example: |
1032 | |
1033 | sigE<shy>action |
1034 | manuE<shy>script |
1035 | JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi |
1036 | |
1037 | These signal to a formatter that if it is to hyphenate "sigaction" |
1038 | or "manuscript", then it should be done as |
1039 | "sig-I<[linebreak]>action" or "manu-I<[linebreak]>script" |
1040 | (and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't |
1041 | show up at all). And if it is |
1042 | to hyphenate "Jarkko" and/or "Hietaniemi", it can do |
1043 | so only at the points where there is a C<EE<lt>shyE<gt>> code. |
1044 | |
1045 | In practice, it is anticipated that this character will not be used |
1046 | often, but formatters should either support it, or delete it. |
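
Options 1 and 3 above are both one-line substitutions, sketched here with hypothetical helper names. The RTF variant handles only the soft hyphen and assumes the rest of the text has already been escaped for RTF.

```perl
use strict;
use warnings;

# Option 3: delete every soft hyphen (U+00AD).
sub strip_soft_hyphens {
    my ($text) = @_;
    $text =~ tr/\x{AD}//d;
    return $text;
}

# Option 1, for RTF output: map each soft hyphen to the "\-" code.
sub soft_hyphens_to_rtf {
    my ($text) = @_;
    $text =~ s/\x{AD}/\\-/g;
    return $text;
}
```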
1047 | |
1048 | =item * |
1049 | |
1050 | If you think that you want to add a new command to Pod (like, say, a |
1051 | "=biblio" command), consider whether you could get the same |
1052 | effect with a for or begin/end sequence: "=for biblio ..." or "=begin |
1053 | biblio" ... "=end biblio". Pod processors that don't understand |
1054 | "=for biblio", etc., will simply ignore it, whereas they may complain |
1055 | loudly if they see "=biblio". |
1056 | |
1057 | =item * |
1058 | |
1059 | Throughout this document, "Pod" has been the preferred spelling for |
1060 | the name of the documentation format. One may also use "POD" or |
da75cd15 |
1061 | "pod". For the documentation that is (typically) in the Pod |
8a93676d |
1062 | format, you may use "pod", or "Pod", or "POD". Understanding these |
1063 | distinctions is useful; but obsessing over how to spell them usually |
1064 | is not. |
1065 | |
1066 | =back |
1067 | |
1068 | |
1069 | |
1070 | |
1071 | |
1072 | =head1 About LE<lt>...E<gt> Codes |
1073 | |
1074 | As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...> |
1075 | code is the most complex of the Pod formatting codes. The points below |
1076 | will hopefully clarify what it means and how processors should deal |
1077 | with it. |
1078 | |
1079 | =over |
1080 | |
1081 | =item * |
1082 | |
1083 | In parsing an LE<lt>...> code, Pod parsers must distinguish at least |
1084 | four attributes: |
1085 | |
1086 | =over |
1087 | |
1088 | =item First: |
1089 | |
1090 | The link-text. If there is none, this must be undef. (E.g., in |
1091 | "LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions". |
1092 | In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no |
1093 | link text. Note that link text may contain formatting.) |
1094 | |
1095 | =item Second: |
1096 | |
1097 | The possibly inferred link-text -- i.e., if there was no real link |
1098 | text, then this is the text that we'll infer in its place. (E.g., for |
1099 | "LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".) |
1100 | |
1101 | =item Third: |
1102 | |
1103 | The name or URL, or undef if none. (E.g., in "LE<lt>Perl |
1104 | Functions|perlfunc>", the name -- also sometimes called the page -- |
1105 | is "perlfunc". In "LE<lt>/CAVEATS>", the name is undef.) |
1106 | |
1107 | =item Fourth: |
1108 | |
1109 | The section (AKA "item" in older perlpods), or undef if none. E.g., |
1110 | in "LE<lt>Getopt::Std/DESCRIPTION>", "DESCRIPTION" is the section. (Note |
1111 | that this is not the same as a manpage section like the "5" in "man 5 |
1112 | crontab". "Section Foo" in the Pod sense means the part of the text |
6edf2346 |
1113 | that's introduced by the heading or item whose text is "Foo".) |
8a93676d |
1114 | |
1115 | =back |
1116 | |
1117 | Pod parsers may also note additional attributes including: |
1118 | |
1119 | =over |
1120 | |
1121 | =item Fifth: |
1122 | |
1123 | A flag for whether item 3 (if present) is a URL (like |
1124 | "http://lists.perl.org" is), in which case there should be no section |
1125 | attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or |
1126 | possibly a man page name (like "crontab(5)" is). |
1127 | |
1128 | =item Sixth: |
1129 | |
1130 | The raw original LE<lt>...> content, before text is split on |
1131 | "|", "/", etc, and before EE<lt>...> codes are expanded. |
1132 | |
1133 | =back |
1134 | |
1135 | (The above were numbered only for concise reference below. It is not |
1136 | a requirement that these be passed as an actual list or array.) |
1137 | |
1138 | For example: |
1139 | |
1140 | L<Foo::Bar> |
1141 | => undef, # link text |
1142 | "Foo::Bar", # possibly inferred link text |
1143 | "Foo::Bar", # name |
1144 | undef, # section |
1145 | 'pod', # what sort of link |
1146 | "Foo::Bar" # original content |
1147 | |
1148 | L<Perlport's section on NL's|perlport/Newlines> |
1149 | => "Perlport's section on NL's", # link text |
1150 | "Perlport's section on NL's", # possibly inferred link text |
1151 | "perlport", # name |
1152 | "Newlines", # section |
1153 | 'pod', # what sort of link |
1154 | "Perlport's section on NL's|perlport/Newlines" # orig. content |
1155 | |
1156 | L<perlport/Newlines> |
1157 | => undef, # link text |
1158 | '"Newlines" in perlport', # possibly inferred link text |
1159 | "perlport", # name |
1160 | "Newlines", # section |
1161 | 'pod', # what sort of link |
1162 | "perlport/Newlines" # original content |
1163 | |
1164 | L<crontab(5)/"DESCRIPTION"> |
1165 | => undef, # link text |
1166 | '"DESCRIPTION" in crontab(5)', # possibly inferred link text |
1167 | "crontab(5)", # name |
1168 | "DESCRIPTION", # section |
1169 | 'man', # what sort of link |
1170 | 'crontab(5)/"DESCRIPTION"' # original content |
1171 | |
1172 | L</Object Attributes> |
1173 | => undef, # link text |
1174 | '"Object Attributes"', # possibly inferred link text |
1175 | undef, # name |
1176 | "Object Attributes", # section |
1177 | 'pod', # what sort of link |
1178 | "/Object Attributes" # original content |
1179 | |
1180 | L<http://www.perl.org/> |
1181 | => undef, # link text |
1182 | "http://www.perl.org/", # possibly inferred link text |
1183 | "http://www.perl.org/", # name |
1184 | undef, # section |
1185 | 'url', # what sort of link |
1186 | "http://www.perl.org/" # original content |
1187 | |
1188 | Note that you can distinguish URL-links from anything else by the |
1189 | fact that they match C<m/\A\w+:[^:\s]\S*\z/>. So |
1190 | C<LE<lt>http://www.perl.comE<gt>> is a URL, but |
1191 | C<LE<lt>HTTP::ResponseE<gt>> isn't. |
1192 | |
1193 | =item * |
1194 | |
1195 | In case of LE<lt>...> codes with no "text|" part in them, |
1196 | older formatters have exhibited great variation in actually displaying |
1197 | the link or cross reference. For example, LE<lt>crontab(5)> would render |
1198 | as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage" |
1199 | or just "C<crontab(5)>". |
1200 | |
1201 | Pod processors must now treat "text|"-less links as follows: |
1202 | |
1203 | L<name> => L<name|name> |
1204 | L</section> => L<"section"|/section> |
1205 | L<name/section> => L<"section" in name|name/section> |
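
The attribute list and the inference rules above can be sketched together as a small function. This is a simplified illustration, not a complete parser: C<parse_link> is a hypothetical name, the input is assumed to be the raw content between the angle brackets, and no attempt is made to handle "|" or "/" characters hidden inside nested formatting codes or to expand escape codes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref with the keys
# text, inferred, name, section, type, and raw.
sub parse_link {
    my ($raw) = @_;
    my %link = (raw => $raw, text => undef, name => undef,
                section => undef, type => 'pod');

    if ($raw =~ m/\A\w+:[^:\s]\S*\z/) {          # URL: no text, no section
        @link{qw(name inferred type)} = ($raw, $raw, 'url');
        return \%link;
    }

    my $target = $raw;
    if ($raw =~ m/\A([^|]*)\|(.*)\z/s) {         # optional "text|" part
        $link{text} = $1 if length $1;
        $target = $2;
    }

    my ($name, $section) = split m{/}, $target, 2;
    $link{name} = $name if defined $name && length $name;
    if (defined $section && length $section) {
        $section =~ s/\A"(.*)"\z/$1/s;           # quotes around the section are optional
        $link{section} = $section;
    }
    # A trailing parenthesized string, as in crontab(5), signals a man page.
    $link{type} = 'man'
        if defined $link{name} && $link{name} =~ m/\(\S*\)\z/;

    $link{inferred} =
          defined $link{text}                           ? $link{text}
        : defined $link{name} && defined $link{section} ? qq{"$link{section}" in $link{name}}
        : defined $link{section}                        ? qq{"$link{section}"}
        :                                                 $link{name};
    return \%link;
}
```

Returning a hash rather than a positional list reflects the note above that the attributes need not be passed as an actual list or array.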
1206 | |
1207 | =item * |
1208 | |
1209 | Note that section names might contain markup. I.e., if a section |
1210 | starts with: |
1211 | |
1212 | =head2 About the C<-M> Operator |
1213 | |
1214 | or with: |
1215 | |
1216 | =item About the C<-M> Operator |
1217 | |
1218 | then a link to it would look like this: |
1219 | |
1220 | L<somedoc/About the C<-M> Operator> |
1221 | |
1222 | Formatters may choose to ignore the markup for purposes of resolving |
1223 | the link and use only the renderable characters in the section name, |
1224 | as in: |
1225 | |
1226 | <h1><a name="About_the_-M_Operator">About the <code>-M</code> |
1227 | Operator</a></h1> |
210b36aa |
1228 | |
8a93676d |
1229 | ... |
210b36aa |
1230 | |
8a93676d |
1231 | <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code> |
1232 | Operator" in somedoc</a> |
1233 | |
1234 | =item * |
1235 | |
1236 | Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>> |
1237 | links from C<LE<lt>name/itemE<gt>> links (and their targets). These |
1238 | have been merged syntactically and semantically in the current |
1239 | specification, and I<section> can refer either to a "=headI<n> Heading |
1240 | Content" command or to a "=item Item Content" command. This |
1241 | specification does not specify what behavior should be in the case |
1242 | of a given document having several things all seeming to produce the |
1243 | same I<section> identifier (e.g., in HTML, several things all producing |
1244 | the same I<anchorname> in <a name="I<anchorname>">...</a> |
1245 | elements). Where Pod processors can control this behavior, they should |
1246 | use the first such anchor. That is, C<LE<lt>Foo/BarE<gt>> refers to the |
1247 | I<first> "Bar" section in Foo. |
1248 | |
1249 | But for some processors/formats this cannot be easily controlled; as |
1250 | with the HTML example, the behavior of multiple ambiguous |
1251 | <a name="I<anchorname>">...</a> is most easily just left up to |
1252 | browsers to decide. |
1253 | |
1254 | =item * |
1255 | |
1256 | Authors wanting to link to a particular (absolute) URL, must do so |
1257 | only with "LE<lt>scheme:...>" codes (like |
1258 | LE<lt>http://www.perl.org>), and must not attempt "LE<lt>Some Site |
1259 | Name|scheme:...>" codes. This restriction avoids many problems |
1260 | in parsing and rendering LE<lt>...> codes. |
1261 | |
1262 | =item * |
1263 | |
1264 | In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes |
1265 | for formatting or for EE<lt>...> escapes, as in: |
1266 | |
1267 | L<B<ummE<234>stuff>|...> |
1268 | |
1269 | For C<LE<lt>...E<gt>> codes without a "text|" part, only |
1270 | C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur -- no |
1271 | other formatting codes. That is, authors should not use |
1272 | "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>". |
1273 | |
1274 | Note, however, that formatting codes and ZE<lt>>'s can occur in any |
1275 | and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>, |
1276 | and I<url>). |
1277 | |
1278 | Authors must not nest LE<lt>...> codes. For example, "LE<lt>The |
1279 | LE<lt>Foo::Bar> man page>" should be treated as an error. |
1280 | |
1281 | =item * |
1282 | |
1283 | Note that Pod authors may use formatting codes inside the "text" |
1284 | part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">). |
1285 | |
1286 | In other words, this is valid: |
1287 | |
1288 | Go read L<the docs on C<$.>|perlvar/"$."> |
1289 | |
1290 | Some output formats that do allow rendering "LE<lt>...>" codes as |
1291 | hypertext, might not allow the link-text to be formatted; in |
1292 | that case, formatters will have to just ignore that formatting. |
1293 | |
1294 | =item * |
1295 | |
1296 | At time of writing, C<LE<lt>nameE<gt>> values are of two types: |
1297 | either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which |
1298 | might be a real Perl module or program in an @INC / PATH |
1299 | directory, or a .pod file in those places); or the name of a UNIX |
1300 | man page, like C<LE<lt>crontab(5)E<gt>>. In theory, C<LE<lt>chmodE<gt>> |
1301 | is ambiguous between a Pod page called "chmod" and the Unix man page |
1302 | "chmod" (in whatever man-section). However, the presence of a string |
1303 | in parens, as in "crontab(5)", is sufficient to signal that what |
1304 | is being discussed is not a Pod page, and so is presumably a |
1305 | UNIX man page. The distinction is of no importance to many |
1306 | Pod processors, but some processors that render to hypertext formats |
1307 | may need to distinguish them in order to know how to render a |
1308 | given C<LE<lt>fooE<gt>> code. |
1309 | |
1310 | =item * |
1311 | |
1312 | Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax |
1313 | (as in "C<LE<lt>Object AttributesE<gt>>"), which was not easily distinguishable |
1314 | from C<LE<lt>nameE<gt>> syntax. This syntax is no longer in the |
1315 | specification, and has been replaced by the C<LE<lt>"section"E<gt>> syntax |
1316 | (where the quotes were formerly optional). Pod parsers should tolerate |
1317 | the C<LE<lt>sectionE<gt>> syntax, for a while at least. The suggested |
1318 | heuristic for distinguishing C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> |
1319 | is that if it contains any whitespace, it's a I<section>. Pod processors |
1320 | may warn about this being deprecated syntax. |
1321 | |
1322 | =back |
1323 | |
1324 | =head1 About =over...=back Regions |
1325 | |
1326 | "=over"..."=back" regions are used for various kinds of list-like |
1327 | structures. (I use the term "region" here simply as a collective |
1328 | term for everything from the "=over" to the matching "=back".) |
1329 | |
1330 | =over |
1331 | |
1332 | =item * |
1333 | |
1334 | The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ... |
1335 | "=back" is used for giving the formatter a clue as to how many |
1336 | "spaces" (ems, or roughly equivalent units) it should tab over, |
1337 | although many formatters will have to convert this to an absolute |
1338 | measurement that may not exactly match with the size of spaces (or M's) |
1339 | in the document's base font. Other formatters may have to completely |
1340 | ignore the number. The lack of any explicit I<indentlevel> parameter is |
1341 | equivalent to an I<indentlevel> value of 4. Pod processors may |
1342 | complain if I<indentlevel> is present but is not a positive number |
1343 | matching C<m/\A(\d*\.)?\d+\z/>. |
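
A processor might validate the parameter along these lines. The helper C<over_indent> is hypothetical, and falling back to 4 on a bad value is an assumption; the text above says only that processors may complain.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (indent, warning).
# Falls back to the default of 4 when the parameter is missing or unusable.
sub over_indent {
    my ($param) = @_;
    return (4, undef) unless defined $param && length $param;
    return ($param + 0, undef)
        if $param =~ m/\A(\d*\.)?\d+\z/ && $param > 0;
    return (4, "bad =over indentlevel '$param'; assuming 4");
}
```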
1344 | |
1345 | =item * |
1346 | |
1347 | Authors of Pod formatters are reminded that "=over" ... "=back" may |
1348 | map to several different constructs in your output format. For |
1349 | example, in converting Pod to (X)HTML, it can map to any of |
1350 | <ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or |
1351 | <blockquote>...</blockquote>. Similarly, "=item" can map to <li> or |
1352 | <dt>. |
1353 | |
1354 | =item * |
1355 | |
1356 | Each "=over" ... "=back" region should be one of the following: |
1357 | |
1358 | =over |
1359 | |
1360 | =item * |
1361 | |
1362 | An "=over" ... "=back" region containing only "=item *" commands, |
1363 | each followed by some number of ordinary/verbatim paragraphs, other |
1364 | nested "=over" ... "=back" regions, "=for..." paragraphs, and |
1365 | "=begin"..."=end" regions. |
1366 | |
1367 | (Pod processors must tolerate a bare "=item" as if it were "=item |
1368 | *".) Whether "*" is rendered as a literal asterisk, an "o", or as |
1369 | some kind of real bullet character, is left up to the Pod formatter, |
1370 | and may depend on the level of nesting. |
1371 | |
1372 | =item * |
1373 | |
1374 | An "=over" ... "=back" region containing only |
1375 | C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them) |
1376 | followed by some number of ordinary/verbatim paragraphs, other nested |
1377 | "=over" ... "=back" regions, "=for..." paragraphs, and/or |
1378 | "=begin"..."=end" regions. Note that the numbers must start at 1 |
1379 | in each section, and must proceed in order and without skipping |
1380 | numbers. |
1381 | |
1382 | (Pod processors must tolerate lines like "=item 1" as if they were |
1383 | "=item 1.", with the period.) |
1384 | |
1385 | =item * |
1386 | |
1387 | An "=over" ... "=back" region containing only "=item [text]" |
1388 | commands, each one (or each group of them) followed by some number of |
1389 | ordinary/verbatim paragraphs, other nested "=over" ... "=back" |
1390 | regions, or "=for..." paragraphs, and "=begin"..."=end" regions. |
1391 | |
1392 | The "=item [text]" paragraph should not match |
1393 | C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it |
1394 | match just C<m/\A=item\s*\z/>. |
1395 | |
1396 | =item * |
1397 | |
1398 | An "=over" ... "=back" region containing no "=item" paragraphs at |
1399 | all, and containing only some number of |
1400 | ordinary/verbatim paragraphs, and possibly also some nested "=over" |
1401 | ... "=back" regions, "=for..." paragraphs, and "=begin"..."=end" |
1402 | regions. Such an itemless "=over" ... "=back" region in Pod is |
1403 | equivalent in meaning to a "<blockquote>...</blockquote>" element in |
1404 | HTML. |
1405 | |
1406 | =back |
1407 | |
1408 | Note that with all the above cases, you can determine which type of |
1409 | "=over" ... "=back" you have, by examining the first (non-"=cut", |
1410 | non-"=pod") Pod paragraph after the "=over" command. |
1411 | |
1412 | =item * |
1413 | |
1414 | Pod formatters I<must> tolerate arbitrarily large amounts of text |
1415 | in the "=item I<text...>" paragraph. In practice, most such |
1416 | paragraphs are short, as in: |
1417 | |
1418 | =item For cutting off our trade with all parts of the world |
1419 | |
1420 | But they may be arbitrarily long: |
1421 | |
1422 | =item For transporting us beyond seas to be tried for pretended |
1423 | offenses |
1424 | |
1425 | =item He is at this time transporting large armies of foreign |
1426 | mercenaries to complete the works of death, desolation and |
1427 | tyranny, already begun with circumstances of cruelty and perfidy |
1428 | scarcely paralleled in the most barbarous ages, and totally |
1429 | unworthy the head of a civilized nation. |
1430 | |
1431 | =item * |
1432 | |
1433 | Pod processors should tolerate "=item *" / "=item I<number>" commands |
1434 | with no accompanying paragraph. The middle item is an example: |
1435 | |
1436 | =over |
210b36aa |
1437 | |
8a93676d |
1438 | =item 1 |
210b36aa |
1439 | |
8a93676d |
1440 | Pick up dry cleaning. |
210b36aa |
1441 | |
8a93676d |
1442 | =item 2 |
210b36aa |
1443 | |
8a93676d |
1444 | =item 3 |
210b36aa |
1445 | |
8a93676d |
1446 | Stop by the store. Get Abba Zabas, Stoli, and cheap lawn chairs. |
210b36aa |
1447 | |
8a93676d |
1448 | =back |
1449 | |
1450 | =item * |
1451 | |
1452 | No "=over" ... "=back" region can contain headings. Processors may |
1453 | treat such a heading as an error. |
1454 | |
1455 | =item * |
1456 | |
1457 | Note that an "=over" ... "=back" region should have some |
1458 | content. That is, authors should not have an empty region like this: |
1459 | |
1460 | =over |
210b36aa |
1461 | |
8a93676d |
1462 | =back |
1463 | |
1464 | Pod processors seeing such a contentless "=over" ... "=back" region |
1465 | may ignore it or may report it as an error. |
1466 | |
1467 | =item * |
1468 | |
1469 | Processors must tolerate an "=over" list that goes off the end of the |
1470 | document (i.e., which has no matching "=back"), but they may warn |
1471 | about such a list. |
1472 | |
1473 | =item * |
1474 | |
1475 | Authors of Pod formatters should note that this construct: |
1476 | |
1477 | =item Neque |
1478 | |
1479 | =item Porro |
1480 | |
1481 | =item Quisquam Est |
210b36aa |
1482 | |
8a93676d |
1483 | Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci |
1484 | velit, sed quia non numquam eius modi tempora incidunt ut |
1485 | labore et dolore magnam aliquam quaerat voluptatem. |
1486 | |
1487 | =item Ut Enim |
1488 | |
1489 | is semantically ambiguous, in a way that makes formatting decisions |
1490 | a bit difficult. On the one hand, it could be a mention of an item |
1491 | "Neque", a mention of another item "Porro", and a mention of another |
1492 | item "Quisquam Est", with just the last one requiring the explanatory |
1493 | paragraph "Qui dolorem ipsum quia dolor..."; and then an item |
1494 | "Ut Enim". In that case, you'd want to format it like so: |
1495 | |
1496 | Neque |
210b36aa |
1497 | |
8a93676d |
1498 | Porro |
210b36aa |
1499 | |
8a93676d |
1500 | Quisquam Est |
1501 | Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci |
1502 | velit, sed quia non numquam eius modi tempora incidunt ut |
1503 | labore et dolore magnam aliquam quaerat voluptatem. |
1504 | |
1505 | Ut Enim |
1506 | |
1507 | But it could equally well be a discussion of three (related or equivalent) |
1508 | items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph |
1509 | explaining them all, and then a new item "Ut Enim". In that case, you'd |
1510 | probably want to format it like so: |
1511 | |
1512 | Neque |
1513 | Porro |
1514 | Quisquam Est |
1515 | Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci |
1516 | velit, sed quia non numquam eius modi tempora incidunt ut |
1517 | labore et dolore magnam aliquam quaerat voluptatem. |
1518 | |
1519 | Ut Enim |
1520 | |
1521 | But (for the foreseeable future), Pod does not provide any way for Pod |
1522 | authors to distinguish which grouping is meant by the above |
1523 | "=item"-cluster structure. So formatters should format it like so: |
1524 | |
1525 | Neque |
1526 | |
1527 | Porro |
1528 | |
1529 | Quisquam Est |
1530 | |
1531 | Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci |
1532 | velit, sed quia non numquam eius modi tempora incidunt ut |
1533 | labore et dolore magnam aliquam quaerat voluptatem. |
1534 | |
1535 | Ut Enim |
1536 | |
210b36aa |
1537 | That is, there should be (at least roughly) equal spacing between |
8a93676d |
1538 | items as between paragraphs (although that spacing may well be less |
1539 | than the full height of a line of text). This leaves it to the reader |
1540 | to use (con)textual cues to figure out whether the "Qui dolorem |
1541 | ipsum..." paragraph applies to the "Quisquam Est" item or to all three |
1542 | items "Neque", "Porro", and "Quisquam Est". While not an ideal |
1543 | situation, this is preferable to providing formatting cues that may |
1544 | actually be contrary to the author's intent. |
1545 | |
1546 | =back |
1547 | |
1548 | |
1549 | |
1550 | =head1 About Data Paragraphs and "=begin/=end" Regions |
1551 | |
1552 | Data paragraphs are typically used for inlining non-Pod data that is |
1553 | to be used (typically passed through) when rendering the document to |
1554 | a specific format: |
1555 | |
1556 | =begin rtf |
210b36aa |
1557 | |
8a93676d |
1558 | \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par} |
210b36aa |
1559 | |
8a93676d |
1560 | =end rtf |
1561 | |
1562 | The exact same effect could, incidentally, be achieved with a single |
1563 | "=for" paragraph: |
1564 | |
1565 | =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par} |
1566 | |
1567 | (Although that is not formally a data paragraph, it has the same |
1568 | meaning as one, and Pod parsers may parse it as one.) |
1569 | |
1570 | Another example of a data paragraph: |
1571 | |
1572 | =begin html |
210b36aa |
1573 | |
8a93676d |
1574 | I like <em>PIE</em>! |
210b36aa |
1575 | |
8a93676d |
1576 | <hr>Especially pecan pie! |
210b36aa |
1577 | |
8a93676d |
1578 | =end html |
1579 | |
1580 | If these were ordinary paragraphs, the Pod parser would try to |
1581 | expand the "EE<lt>/em>" (in the first paragraph) as a formatting |
1582 | code, just like "EE<lt>lt>" or "EE<lt>eacute>". But since this |
1583 | is in a "=begin I<identifier>"..."=end I<identifier>" region I<and> |
1584 | the identifier "html" doesn't have a ":" prefix, the contents |
1585 | of this region are stored as data paragraphs, instead of being |
1586 | processed as ordinary paragraphs (or if they began with spaces |
1587 | and/or tabs, as verbatim paragraphs). |
1588 | |
1589 | As a further example: At time of writing, no "biblio" identifier is |
1590 | supported, but suppose some processor were written to recognize it as |
1591 | a way of (say) denoting a bibliographic reference (necessarily |
1592 | containing formatting codes in ordinary paragraphs). The fact that |
1593 | "biblio" paragraphs were meant for ordinary processing would be |
1594 | indicated by prefacing each "biblio" identifier with a colon: |
1595 | |
1596 | =begin :biblio |
1597 | |
1598 | Wirth, Niklaus. 1976. I<Algorithms + Data Structures = |
1599 | Programs.> Prentice-Hall, Englewood Cliffs, NJ. |
1600 | |
1601 | =end :biblio |
1602 | |
1603 | This would signal to the parser that paragraphs in this begin...end |
1604 | region are subject to normal handling as ordinary/verbatim paragraphs |
1605 | (while still tagged as meant only for processors that understand the |
1606 | "biblio" identifier). The same effect could be had with: |
1607 | |
1608 | =for :biblio |
1609 | Wirth, Niklaus. 1976. I<Algorithms + Data Structures = |
1610 | Programs.> Prentice-Hall, Englewood Cliffs, NJ. |
1611 | |
1612 | The ":" on these identifiers means simply "process this stuff |
1613 | normally, even though the result will be for some special target". |
1614 | I suggest that parser APIs report "biblio" as the target identifier, |
1615 | but also report that it had a ":" prefix. (And similarly, with the |
1616 | above "html", report "html" as the target identifier, and note the |
1617 | I<lack> of a ":" prefix.) |
1618 | |
1619 | Note that a "=begin I<identifier>"..."=end I<identifier>" region where |
1620 | I<identifier> begins with a colon, I<can> contain commands. For example: |
1621 | |
1622 | =begin :biblio |
210b36aa |
1623 | |
8a93676d |
1624 | Wirth's classic is available in several editions, including: |
210b36aa |
1625 | |
8a93676d |
1626 | =for comment |
1627 | hm, check abebooks.com for how much used copies cost. |
210b36aa |
1628 | |
8a93676d |
1629 | =over |
210b36aa |
1630 | |
8a93676d |
1631 | =item |
210b36aa |
1632 | |
8a93676d |
1633 | Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.> |
1634 | Teubner, Stuttgart. [Yes, it's in German.] |
210b36aa |
1635 | |
8a93676d |
1636 | =item |
210b36aa |
1637 | |
8a93676d |
1638 | Wirth, Niklaus. 1976. I<Algorithms + Data Structures = |
1639 | Programs.> Prentice-Hall, Englewood Cliffs, NJ. |
210b36aa |
1640 | |
8a93676d |
1641 | =back |
210b36aa |
1642 | |
8a93676d |
1643 | =end :biblio |
1644 | |
1645 | Note, however, that a "=begin I<identifier>"..."=end I<identifier>" |
1646 | region where I<identifier> does I<not> begin with a colon should not |
1647 | directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back", |
1648 | nor "=item". For example, this may be considered invalid: |
1649 | |
1650 | =begin somedata |
210b36aa |
1651 | |
8a93676d |
1652 | This is a data paragraph. |
210b36aa |
1653 | |
8a93676d |
1654 | =head1 Don't do this! |
210b36aa |
1655 | |
8a93676d |
1656 | This is a data paragraph too. |
210b36aa |
1657 | |
8a93676d |
1658 | =end somedata |
1659 | |
1660 | A Pod processor may signal that the above (specifically the "=head1" |
1661 | paragraph) is an error. Note, however, that the following should |
1662 | I<not> be treated as an error: |
1663 | |
1664 | =begin somedata |
210b36aa |
1665 | |
8a93676d |
1666 | This is a data paragraph. |
210b36aa |
1667 | |
8a93676d |
1668 | =cut |
210b36aa |
1669 | |
8a93676d |
1670 | # Yup, this isn't Pod anymore. |
1671 | sub excl { (rand() > .5) ? "hoo!" : "hah!" } |
210b36aa |
1672 | |
8a93676d |
1673 | =pod |
210b36aa |
1674 | |
8a93676d |
1675 | This is a data paragraph too. |
210b36aa |
1676 | |
8a93676d |
1677 | =end somedata |
1678 | |
1679 | And this too is valid: |
1680 | |
1681 | =begin someformat |
210b36aa |
1682 | |
8a93676d |
1683 | This is a data paragraph. |
210b36aa |
1684 | |
8a93676d |
1685 | And this is a data paragraph. |
210b36aa |
1686 | |
8a93676d |
1687 | =begin someotherformat |
210b36aa |
1688 | |
8a93676d |
1689 | This is a data paragraph too. |
210b36aa |
1690 | |
8a93676d |
1691 | And this is a data paragraph too. |
210b36aa |
1692 | |
8a93676d |
1693 | =begin :yetanotherformat |
1694 | |
1695 | =head2 This is a command paragraph! |
1696 | |
1697 | This is an ordinary paragraph! |
210b36aa |
1698 | |
8a93676d |
1699 | And this is a verbatim paragraph! |
210b36aa |
1700 | |
8a93676d |
1701 | =end :yetanotherformat |
210b36aa |
1702 | |
8a93676d |
1703 | =end someotherformat |
210b36aa |
1704 | |
8a93676d |
1705 | Another data paragraph! |
210b36aa |
1706 | |
8a93676d |
1707 | =end someformat |
1708 | |
1709 | The contents of the above "=begin :yetanotherformat" ... |
1710 | "=end :yetanotherformat" region I<aren't> data paragraphs, because |
1711 | the immediately containing region's identifier (":yetanotherformat") |
1712 | begins with a colon. In practice, most regions that contain |
1713 | data paragraphs will contain I<only> data paragraphs; however, |
1714 | the above nesting is syntactically valid as Pod, even if it is |
1715 | rare. Note that the handlers for some formats, like "html", |
1716 | will accept only data paragraphs, not nested regions; and they may |
1717 | complain if they see (targeted for them) nested regions, or commands, |
1718 | other than "=end", "=pod", and "=cut". |
1719 | |
1720 | Also consider this valid structure: |
1721 | |
1722 | =begin :biblio |
210b36aa |
1723 | |
8a93676d |
1724 | Wirth's classic is available in several editions, including: |
210b36aa |
1725 | |
8a93676d |
1726 | =over |
210b36aa |
1727 | |
8a93676d |
1728 | =item |
210b36aa |
1729 | |
8a93676d |
1730 | Wirth, Niklaus. 1975. I<Algorithmen und Datenstrukturen.> |
1731 | Teubner, Stuttgart. [Yes, it's in German.] |
210b36aa |
1732 | |
8a93676d |
1733 | =item |
210b36aa |
1734 | |
8a93676d |
1735 | Wirth, Niklaus. 1976. I<Algorithms + Data Structures = |
1736 | Programs.> Prentice-Hall, Englewood Cliffs, NJ. |
1737 | |
1738 | =back |
210b36aa |
1739 | |
8a93676d |
1740 | Buy buy buy! |
210b36aa |
1741 | |
8a93676d |
1742 | =begin html |
210b36aa |
1743 | |
8a93676d |
1744 | <img src='wirth_spokesmodeling_book.png'> |
210b36aa |
1745 | |
8a93676d |
1746 | <hr> |
210b36aa |
1747 | |
8a93676d |
1748 | =end html |
210b36aa |
1749 | |
8a93676d |
1750 | Now now now! |
210b36aa |
1751 | |
8a93676d |
1752 | =end :biblio |
1753 | |
1754 | There, the "=begin html"..."=end html" region is nested inside |
1755 | the larger "=begin :biblio"..."=end :biblio" region. Note that the |
1756 | content of the "=begin html"..."=end html" region is data |
1757 | paragraph(s), because the immediately containing region's identifier |
1758 | ("html") I<doesn't> begin with a colon. |
1759 | |
1760 | Pod parsers, when processing a series of data paragraphs one |
1761 | after another (within a single region), should consider them to |
1762 | be one large data paragraph that happens to contain blank lines. So |
1763 | the content of the above "=begin html"..."=end html" I<may> be stored |
1764 | as two data paragraphs (one consisting of |
1765 | "<img src='wirth_spokesmodeling_book.png'>\n" |
1766 | and another consisting of "<hr>\n"), but I<should> be stored as |
1767 | a single data paragraph (consisting of |
1768 | "<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n"). |
1769 | |
1770 | Pod processors should tolerate empty |
1771 | "=begin I<something>"..."=end I<something>" regions, |
1772 | empty "=begin :I<something>"..."=end :I<something>" regions, and |
1773 | contentless "=for I<something>" and "=for :I<something>" |
1774 | paragraphs. I.e., these should be tolerated: |
1775 | |
1776 | =for html |
210b36aa |
1777 | |
8a93676d |
1778 | =begin html |
210b36aa |
1779 | |
8a93676d |
1780 | =end html |
210b36aa |
1781 | |
8a93676d |
1782 | =begin :biblio |
210b36aa |
1783 | |
8a93676d |
1784 | =end :biblio |
1785 | |
1786 | Incidentally, note that there's no easy way to express a data |
1787 | paragraph starting with something that looks like a command. Consider: |
1788 | |
1789 | =begin stuff |
210b36aa |
1790 | |
8a93676d |
1791 | =shazbot |
210b36aa |
1792 | |
8a93676d |
1793 | =end stuff |
1794 | |
1795 | There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data |
1796 | paragraph "=shazbot\n". However, you can express a data paragraph consisting |
1797 | of "=shazbot\n" using this code: |
1798 | |
1799 | =for stuff =shazbot |
1800 | |
1801 | Situations where this is necessary are presumably quite rare. |
1802 | |
1803 | Note that "=end" commands must match the currently open "=begin" command. That |
1804 | is, they must properly nest. For example, this is valid: |
1805 | |
1806 | =begin outer |
210b36aa |
1807 | |
8a93676d |
1808 | X |
210b36aa |
1809 | |
8a93676d |
1810 | =begin inner |
210b36aa |
1811 | |
8a93676d |
1812 | Y |
210b36aa |
1813 | |
8a93676d |
1814 | =end inner |
210b36aa |
1815 | |
8a93676d |
1816 | Z |
210b36aa |
1817 | |
8a93676d |
1818 | =end outer |
1819 | |
1820 | while this is invalid: |
1821 | |
1822 | =begin outer |
210b36aa |
1823 | |
8a93676d |
1824 | X |
210b36aa |
1825 | |
8a93676d |
1826 | =begin inner |
210b36aa |
1827 | |
8a93676d |
1828 | Y |
210b36aa |
1829 | |
8a93676d |
1830 | =end outer |
210b36aa |
1831 | |
8a93676d |
1832 | Z |
210b36aa |
1833 | |
8a93676d |
1834 | =end inner |
210b36aa |
1835 | |
8a93676d |
1836 | This latter is improper because when the "=end outer" command is seen, the |
1837 | currently open region has the formatname "inner", not "outer". (It just |
1838 | happens that "outer" is the format name of a higher-up region.) This is |
1839 | an error. Processors must by default report this as an error, and may halt |
210b36aa |
1840 | processing the document containing that error. A corollary of this is that |
8a93676d |
1841 | regions cannot "overlap" -- i.e., the latter block above does not represent |
1842 | a region called "outer" which contains X and Y, overlapping a region called |
1843 | "inner" which contains Y and Z. Because it is invalid (as all |
1844 | apparently overlapping regions would be), it doesn't represent that, or |
1845 | anything at all. |
1846 | |
1847 | Similarly, this is invalid: |
1848 | |
1849 | =begin thing |
210b36aa |
1850 | |
8a93676d |
1851 | =end hting |
1852 | |
1853 | This is an error because the region is opened by "thing", and the "=end" |
1854 | tries to close "hting" [sic]. |
1855 | |
1856 | This is also invalid: |
1857 | |
1858 | =begin thing |
210b36aa |
1859 | |
8a93676d |
1860 | =end |
1861 | |
1862 | This is invalid because every "=end" command must have a formatname |
1863 | parameter. |
1864 | |
1865 | =head1 SEE ALSO |
1866 | |
1867 | L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">, |
1868 | L<podchecker> |
1869 | |
1870 | =head1 AUTHOR |
1871 | |
1872 | Sean M. Burke |
1873 | |
1874 | =cut |
1875 | |
1876 | |