5 Pod::Simple::XHTML -- format Pod as validating XHTML
9 use Pod::Simple::XHTML;
11 my $parser = Pod::Simple::XHTML->new();
15 $parser->parse_file('path/to/file.pod');
19 This class is a formatter that takes Pod and renders it as XHTML
22 This is a subclass of L<Pod::Simple::Methody> and inherits all its
23 methods. The implementation is entirely different than
24 L<Pod::Simple::HTML>, but it largely preserves the same interface.
28 package Pod::Simple::XHTML;
30 use vars qw( $VERSION @ISA $HAS_HTML_ENTITIES );
33 use Pod::Simple::Methody ();
34 @ISA = ('Pod::Simple::Methody');
37 $HAS_HTML_ENTITIES = eval "require HTML::Entities; 1";
49 return HTML::Entities::encode_entities( $_[0] ) if $HAS_HTML_ENTITIES;
51 my $ents = join '', keys %entities;
52 $str =~ s/([$ents])/'&' . $entities{$1} . ';'/ge;
56 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
60 Pod::Simple::XHTML offers a number of methods that modify the format of
61 the HTML output. Call these after creating the parser object, but before
62 the call to C<parse_file>:
64 my $parser = Pod::PseudoPod::HTML->new();
65 $parser->set_optional_param("value");
66 $parser->parse_file($file);
68 =head2 perldoc_url_prefix
70 In turning L<Foo::Bar> into http://whatever/Foo%3a%3aBar, what
71 to put before the "Foo%3a%3aBar". The default value is
72 "http://search.cpan.org/perldoc?".
74 =head2 perldoc_url_postfix
76 What to put after "Foo%3a%3aBar" in the URL. This option is not set by
81 In turning C<< L<crontab(5)> >> into http://whatever/man/1/crontab, what
82 to put before the "1/crontab". The default value is
83 "http://man.he.net/man".
85 =head2 man_url_postfix
87 What to put after "1/crontab" in the URL. This option is not set by default.
89 =head2 title_prefix, title_postfix
91 What to put before and after the title in the head. The values should
96 $parser->html_css('path/to/style.css');
98 The URL or relative path of a CSS file to include. This option is not
101 =head2 html_javascript
103 The URL or relative path of a JavaScript file to pull in. This option is
108 A document type tag for the file. This option is not set by default.
110 =head2 html_header_tags
112 Additional arbitrary HTML tags for the header of the document. The
113 default value is just a content type header tag:
115 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
117 Add additional meta tags here, or blocks of inline CSS or JavaScript
118 (wrapped in the appropriate tags).
122 This is the level of HTML "Hn" element to which a Pod "head1" corresponds. For
123 example, if C<html_h_level> is set to 2, a head1 will produce an H2, a head2
124 will produce an H3, and so on.
128 Set a default title for the page if no title can be determined from the
129 content. The value of this string should already be &-escaped.
133 Force a title for the page (don't try to determine it from the content).
134 The value of this string should already be &-escaped.
136 =head2 html_header, html_footer
138 Set the HTML output at the beginning and end of each file. The default
139 header includes a title, a doctype tag (if C<html_doctype> is set), a
140 content tag (customized by C<html_header_tags>), a tag for a CSS file
141 (if C<html_css> is set), and a tag for a Javascript file (if
142 C<html_javascript> is set). The default footer simply closes the C<html>
145 The options listed above customize parts of the default header, but
146 setting C<html_header> or C<html_footer> completely overrides the
147 built-in header or footer. These may be useful if you want to use
148 template tags instead of literal HTML headers and footers or are
149 integrating converted POD pages in a larger website.
151 If you want no headers or footers output in the HTML, set these options
156 Whether to add a table-of-contents at the top of each page (called an
157 index for the sake of tradition).
162 __PACKAGE__->_accessorize(
163 'perldoc_url_prefix',
164 'perldoc_url_postfix',
167 'title_prefix', 'title_postfix',
173 'title', # Used internally for the title extracted from the content
179 'batch_mode', # whether we're in batch mode
180 'batch_mode_current_level',
181 # When in batch mode, how deep the current module is: 1 for "LWP",
182 # 2 for "LWP::Procotol", 3 for "LWP::Protocol::GHTTP", etc
185 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
189 If the standard options aren't enough, you may want to subclass
190 Pod::Simple::XHMTL. These are the most likely candidates for methods
191 you'll want to override when subclassing.
197 my $new = $self->SUPER::new(@_);
198 $new->{'output_fh'} ||= *STDOUT{IO};
199 $new->perldoc_url_prefix('http://search.cpan.org/perldoc?');
200 $new->man_url_prefix('http://man.he.net/man');
201 $new->html_header_tags('<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />');
202 $new->nix_X_codes(1);
203 $new->codes_in_verbatim(1);
204 $new->{'scratch'} = '';
205 $new->{'to_index'} = [];
206 $new->{'output'} = [];
207 $new->{'saved'} = [];
210 $new->{'__region_targets'} = [];
211 $new->{'__literal_targets'} = {};
212 $new->accept_targets_as_html( 'html', 'HTML' );
217 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
221 This method handles the body of text within any element: it's the body
222 of a paragraph, or everything between a "=begin" tag and the
223 corresponding "=end" tag, or the text within an L entity, etc. You would
224 want to override this if you are adding a custom element type that does
225 more than just display formatted text. Perhaps adding a way to generate
226 HTML tables from an extended version of POD.
228 So, let's say you want add a custom element called 'foo'. In your
229 subclass's C<new> method, after calling C<SUPER::new> you'd call:
231 $new->accept_targets_as_text( 'foo' );
233 Then override the C<start_for> method in the subclass to check for when
234 "$flags->{'target'}" is equal to 'foo' and set a flag that marks that
235 you're in a foo block (maybe "$self->{'in_foo'} = 1"). Then override the
236 C<handle_text> method to check for the flag, and pass $text to your
237 custom subroutine to construct the HTML output for 'foo' elements,
241 my ($self, $text) = @_;
242 if ($self->{'in_foo'}) {
243 $self->{'scratch'} .= build_foo_html($text);
245 $self->{'scratch'} .= $text;
249 =head2 accept_targets_as_html
251 This method behaves like C<accept_targets_as_text>, but also marks the region
252 as one whose content should be emitted literally, without HTML entity escaping
253 or wrapping in a C<div> element.
257 sub __in_literal_xhtml_region {
258 return unless @{ $_[0]{__region_targets} };
259 my $target = $_[0]{__region_targets}[-1];
260 return $_[0]{__literal_targets}{ $target };
263 sub accept_targets_as_html {
264 my ($self, @targets) = @_;
265 $self->accept_targets(@targets);
266 $self->{__literal_targets}{$_} = 1 for @targets;
270 # escape special characters in HTML (<, >, &, etc)
271 $_[0]{'scratch'} .= $_[0]->__in_literal_xhtml_region
273 : encode_entities( $_[1] );
276 sub start_Para { $_[0]{'scratch'} = '<p>' }
277 sub start_Verbatim { $_[0]{'scratch'} = '<pre><code>' }
279 sub start_head1 { $_[0]{'in_head'} = 1 }
280 sub start_head2 { $_[0]{'in_head'} = 2 }
281 sub start_head3 { $_[0]{'in_head'} = 3 }
282 sub start_head4 { $_[0]{'in_head'} = 4 }
284 sub start_item_number {
285 $_[0]{'scratch'} = "</li>\n" if $_[0]{'in_li'};
286 $_[0]{'scratch'} .= '<li><p>';
290 sub start_item_bullet {
291 $_[0]{'scratch'} = "</li>\n" if $_[0]{'in_li'};
292 $_[0]{'scratch'} .= '<li><p>';
296 sub start_item_text {
297 if ($_[0]{'in_dd'}[ $_[0]{'dl_level'} ]) {
298 $_[0]{'scratch'} = "</dd>\n";
299 $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 0;
301 $_[0]{'scratch'} .= '<dt>';
304 sub start_over_bullet { $_[0]{'scratch'} = '<ul>'; $_[0]->emit }
305 sub start_over_block { $_[0]{'scratch'} = '<ul>'; $_[0]->emit }
306 sub start_over_number { $_[0]{'scratch'} = '<ol>'; $_[0]->emit }
307 sub start_over_text {
308 $_[0]{'scratch'} = '<dl>';
310 $_[0]{'in_dd'} ||= [];
314 sub end_over_block { $_[0]{'scratch'} .= '</ul>'; $_[0]->emit }
316 sub end_over_number {
317 $_[0]{'scratch'} = "</li>\n" if delete $_[0]{'in_li'};
318 $_[0]{'scratch'} .= '</ol>';
322 sub end_over_bullet {
323 $_[0]{'scratch'} = "</li>\n" if delete $_[0]{'in_li'};
324 $_[0]{'scratch'} .= '</ul>';
329 if ($_[0]{'in_dd'}[ $_[0]{'dl_level'} ]) {
330 $_[0]{'scratch'} = "</dd>\n";
331 $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 0;
333 $_[0]{'scratch'} .= '</dl>';
338 # . . . . . Now the actual formatters:
340 sub end_Para { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
342 $_[0]{'scratch'} .= '</code></pre>';
347 my $h = delete $_[0]{in_head};
349 my $add = $_[0]->html_h_level;
350 $add = 1 unless defined $add;
353 my $id = $_[0]->idify($_[0]{scratch});
354 my $text = $_[0]{scratch};
355 $_[0]{'scratch'} = qq{<h$h id="$id">$text</h$h>};
357 push @{ $_[0]{'to_index'} }, [$h, $id, $text];
360 sub end_head1 { shift->_end_head(@_); }
361 sub end_head2 { shift->_end_head(@_); }
362 sub end_head3 { shift->_end_head(@_); }
363 sub end_head4 { shift->_end_head(@_); }
365 sub end_item_bullet { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
366 sub end_item_number { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
369 $_[0]{'scratch'} .= "</dt>\n<dd>";
370 $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 1;
374 # This handles =begin and =for blocks of all kinds.
376 my ($self, $flags) = @_;
378 push @{ $self->{__region_targets} }, $flags->{target_matching};
380 unless ($self->__in_literal_xhtml_region) {
381 $self->{scratch} .= '<div';
382 $self->{scratch} .= qq( class="$flags->{target}") if $flags->{target};
383 $self->{scratch} .= '>';
392 $self->{'scratch'} .= '</div>' unless $self->__in_literal_xhtml_region;
394 pop @{ $self->{__region_targets} };
400 if (defined $self->html_header) {
401 $self->{'scratch'} .= $self->html_header;
402 $self->emit unless $self->html_header eq "";
404 my ($doctype, $title, $metatags);
405 $doctype = $self->html_doctype || '';
406 $title = $self->force_title || $self->title || $self->default_title || '';
407 $metatags = $self->html_header_tags || '';
408 if ($self->html_css) {
409 $metatags .= "\n<link rel='stylesheet' href='" .
410 $self->html_css . "' type='text/css'>";
412 if ($self->html_javascript) {
413 $metatags .= "\n<script type='text/javascript' src='" .
414 $self->html_javascript . "'></script>";
416 $self->{'scratch'} .= <<"HTML";
420 <title>$title</title>
431 my $to_index = $self->{'to_index'};
432 if ($self->index && @{ $to_index } ) {
437 my $id = ' id="index"';
439 for my $h (@{ $to_index }, [0]) {
440 my $target_level = $h->[0];
441 # Get to target_level by opening or closing ULs
442 if ($level == $target_level) {
444 } elsif ($level > $target_level) {
445 $out[-1] .= '</li>' if $out[-1] =~ /^\s+<li>/;
446 while ($level > $target_level) {
448 push @out, (' ' x --$indent) . '</li>' if @out && $out[-1] =~ m{^\s+<\/ul};
449 push @out, (' ' x --$indent) . '</ul>';
451 push @out, (' ' x --$indent) . '</li>' if $level;
453 while ($level < $target_level) {
455 push @out, (' ' x ++$indent) . '<li>' if @out && $out[-1]=~ /^\s*<ul/;
456 push @out, (' ' x ++$indent) . "<ul$id>";
463 $space = ' ' x $indent;
464 push @out, sprintf '%s<li><a href="#%s">%s</a>',
465 $space, $h->[1], $h->[2];
467 # Splice the index in between the HTML headers and the first element.
468 my $offset = defined $self->html_header ? $self->html_header eq '' ? 0 : 1 : 1;
469 splice @{ $self->{'output'} }, $offset, 0, join "\n", @out;
472 if (defined $self->html_footer) {
473 $self->{'scratch'} .= $self->html_footer;
474 $self->emit unless $self->html_footer eq "";
476 $self->{'scratch'} .= "</body>\n</html>";
481 print {$self->{'output_fh'}} join ("\n\n", @{ $self->{'output'} }), "\n\n";
482 @{$self->{'output'}} = ();
488 sub start_B { $_[0]{'scratch'} .= '<b>' }
489 sub end_B { $_[0]{'scratch'} .= '</b>' }
491 sub start_C { $_[0]{'scratch'} .= '<code>' }
492 sub end_C { $_[0]{'scratch'} .= '</code>' }
494 sub start_F { $_[0]{'scratch'} .= '<i>' }
495 sub end_F { $_[0]{'scratch'} .= '</i>' }
497 sub start_I { $_[0]{'scratch'} .= '<i>' }
498 sub end_I { $_[0]{'scratch'} .= '</i>' }
501 my ($self, $flags) = @_;
502 my ($type, $to, $section) = @{$flags}{'type', 'to', 'section'};
503 my $url = $type eq 'url' ? $to
504 : $type eq 'pod' ? $self->resolve_pod_page_link($to, $section)
505 : $type eq 'man' ? $self->resolve_man_page_link($to, $section)
508 # If it's an unknown type, use an attribute-less <a> like HTML.pm.
509 $self->{'scratch'} .= '<a' . ($url ? ' href="'. $url . '">' : '>');
512 sub end_L { $_[0]{'scratch'} .= '</a>' }
514 sub start_S { $_[0]{'scratch'} .= '<nobr>' }
515 sub end_S { $_[0]{'scratch'} .= '</nobr>' }
520 push @{ $self->{'output'} }, $self->{'scratch'};
522 print {$self->{'output_fh'}} $self->{'scratch'}, "\n\n";
524 $self->{'scratch'} = '';
528 =head2 resolve_pod_page_link
530 my $url = $pod->resolve_pod_page_link('Net::Ping', 'INSTALL');
531 my $url = $pod->resolve_pod_page_link('perlpodspec');
532 my $url = $pod->resolve_pod_page_link(undef, 'SYNOPSIS');
534 Resolves a POD link target (typically a module or POD file name) and section
535 name to a URL. The resulting link will be returned for the above examples as:
537 http://search.cpan.org/perldoc?Net::Ping#INSTALL
538 http://search.cpan.org/perldoc?perlpodspec
541 Note that when there is only a section argument the URL will simply be a link
542 to a section in the current document.
546 sub resolve_pod_page_link {
547 my ($self, $to, $section) = @_;
548 return undef unless defined $to || defined $section;
549 if (defined $section) {
550 $section = '#' . $self->idify($section, 1);
551 return $section unless defined $to;
556 return ($self->perldoc_url_prefix || '')
557 . encode_entities($to) . $section
558 . ($self->perldoc_url_postfix || '');
561 =head2 resolve_man_page_link
563 my $url = $pod->resolve_man_page_link('crontab(5)', 'EXAMPLE CRON FILE');
564 my $url = $pod->resolve_man_page_link('crontab');
566 Resolves a man page link target and numeric section to a URL. The resulting
567 link will be returned for the above examples as:
569 http://man.he.net/man5/crontab
570 http://man.he.net/man1/crontab
572 Note that the first argument is required. The section number will be parsed
573 from it, and if it's missing will default to 1. The second argument is
574 currently ignored, as L<man.he.net|http://man.he.net> does not currently
575 include linkable IDs or anchor names in its pages. Subclass to link to a
576 different man page HTTP server.
580 sub resolve_man_page_link {
581 my ($self, $to, $section) = @_;
582 return undef unless defined $to;
583 my ($page, $part) = $to =~ /^([^(]+)(?:[(](\d+)[)])?$/;
584 return undef unless $page;
585 return ($self->man_url_prefix || '')
586 . ($part || 1) . "/" . encode_entities($page)
587 . ($self->man_url_postfix || '');
593 my $id = $pod->idify($text);
594 my $hash = $pod->idify($text, 1);
596 This method turns an arbitrary string into a valid XHTML ID attribute value.
597 The rules enforced, following
598 L<http://webdesign.about.com/od/htmltags/a/aa031707.htm>, are:
604 The id must start with a letter (a-z or A-Z)
608 All subsequent characters can be letters, numbers (0-9), hyphens (-),
609 underscores (_), colons (:), and periods (.).
613 Each id must be unique within the document.
617 In addition, the returned value will be unique within the context of the
618 Pod::Simple::XHTML object unless a second argument is passed a true value. ID
619 attributes should always be unique within a single XHTML document, but pass
620 the true value if you are creating not an ID but a URL hash to point to
621 an ID (i.e., if you need to put the "#foo" in C<< <a href="#foo">foo</a> >>.
626 my ($self, $t, $not_unique) = @_;
628 s/<[^>]+>//g; # Strip HTML.
629 s/&[^;]+;//g; # Strip entities.
630 s/^([^a-zA-Z]+)$/pod$1/; # Prepend "pod" if no valid chars.
631 s/^[^a-zA-Z]+//; # First char must be a letter.
632 s/[^-a-zA-Z0-9_:.]+/-/g; # All other chars must be valid.
634 return $t if $not_unique;
636 $i++ while $self->{ids}{"$t$i"}++;
640 =head2 batch_mode_page_object_init
642 $pod->batch_mode_page_object_init($batchconvobj, $module, $infile, $outfile, $depth);
644 Called by L<Pod::Simple::HTMLBatch> so that the class has a chance to
645 initialize the converter. Internally it sets the C<batch_mode> property to
646 true and sets C<batch_mode_current_level()>, but Pod::Simple::XHTML does not
647 currently use those features. Subclasses might, though.
651 sub batch_mode_page_object_init {
652 my ($self, $batchconvobj, $module, $infile, $outfile, $depth) = @_;
653 $self->batch_mode(1);
654 $self->batch_mode_current_level($depth);
664 L<Pod::Simple>, L<Pod::Simple::Text>, L<Pod::Spell>
668 Questions or discussion about POD and Pod::Simple should be sent to the
669 pod-people@perl.org mail list. Send an empty email to
670 pod-people-subscribe@perl.org to subscribe.
672 This module is managed in an open GitHub repository,
673 L<http://github.com/theory/pod-simple/>. Feel free to fork and contribute, or
674 to clone L<git://github.com/theory/pod-simple.git> and send patches!
676 Patches against Pod::Simple are welcome. Please send bug reports to
677 <bug-pod-simple@rt.cpan.org>.
679 =head1 COPYRIGHT AND DISCLAIMERS
681 Copyright (c) 2003-2005 Allison Randal.
683 This library is free software; you can redistribute it and/or modify it
684 under the same terms as Perl itself.
686 This program is distributed in the hope that it will be useful, but
687 without any warranty; without even the implied warranty of
688 merchantability or fitness for a particular purpose.
690 =head1 ACKNOWLEDGEMENTS
692 Thanks to L<Hurricane Electrict|http://he.net/> for permission to use its
693 L<Linux man pages online|http://man.he.net/> site for man page links.
695 Thanks to L<search.cpan.org|http://search.cpan.org/> for permission to use the
696 site for Perl module links.
700 Pod::Simpele::XHTML was created by Allison Randal <allison@perl.org>.
702 Pod::Simple was created by Sean M. Burke <sburke@cpan.org>.
703 But don't bother him, he's retired.
705 Pod::Simple is maintained by:
709 =item * Allison Randal C<allison@perl.org>
711 =item * Hans Dieter Pearcey C<hdp@cpan.org>
713 =item * David E. Wheeler C<dwheeler@cpan.org>