5 Pod::Simple::XHTML -- format Pod as validating XHTML
9 use Pod::Simple::XHTML;
11 my $parser = Pod::Simple::XHTML->new();
15 $parser->parse_file('path/to/file.pod');
19 This class is a formatter that takes Pod and renders it as XHTML
22 This is a subclass of L<Pod::Simple::Methody> and inherits all its
23 methods. The implementation is entirely different than
24 L<Pod::Simple::HTML>, but it largely preserves the same interface.
28 package Pod::Simple::XHTML;
30 use vars qw( $VERSION @ISA $HAS_HTML_ENTITIES );
33 use Pod::Simple::Methody ();
34 @ISA = ('Pod::Simple::Methody');
37 $HAS_HTML_ENTITIES = eval "require HTML::Entities; 1";
49 return HTML::Entities::encode_entities( $_[0] ) if $HAS_HTML_ENTITIES;
51 my $ents = join '', keys %entities;
52 $str =~ s/([$ents])/'&' . $entities{$1} . ';'/ge;
56 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
60 Pod::Simple::XHTML offers a number of methods that modify the format of
61 the HTML output. Call these after creating the parser object, but before
62 the call to C<parse_file>:
64 my $parser = Pod::PseudoPod::HTML->new();
65 $parser->set_optional_param("value");
66 $parser->parse_file($file);
68 =head2 perldoc_url_prefix
70 In turning L<Foo::Bar> into http://whatever/Foo%3a%3aBar, what
71 to put before the "Foo%3a%3aBar". The default value is
72 "http://search.cpan.org/perldoc?".
74 =head2 perldoc_url_postfix
76 What to put after "Foo%3a%3aBar" in the URL. This option is not set by
79 =head2 title_prefix, title_postfix
81 What to put before and after the title in the head. The values should
86 $parser->html_css('path/to/style.css');
88 The URL or relative path of a CSS file to include. This option is not
91 =head2 html_javascript
93 The URL or relative path of a JavaScript file to pull in. This option is
98 A document type tag for the file. This option is not set by default.
100 =head2 html_header_tags
102 Additional arbitrary HTML tags for the header of the document. The
103 default value is just a content type header tag:
105 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
107 Add additional meta tags here, or blocks of inline CSS or JavaScript
108 (wrapped in the appropriate tags).
112 Set a default title for the page if no title can be determined from the
113 content. The value of this string should already be &-escaped.
117 Force a title for the page (don't try to determine it from the content).
118 The value of this string should already be &-escaped.
120 =head2 html_header, html_footer
122 Set the HTML output at the beginning and end of each file. The default
123 header includes a title, a doctype tag (if C<html_doctype> is set), a
124 content tag (customized by C<html_header_tags>), a tag for a CSS file
125 (if C<html_css> is set), and a tag for a Javascript file (if
126 C<html_javascript> is set). The default footer simply closes the C<html>
129 The options listed above customize parts of the default header, but
130 setting C<html_header> or C<html_footer> completely overrides the
131 built-in header or footer. These may be useful if you want to use
132 template tags instead of literal HTML headers and footers or are
133 integrating converted POD pages in a larger website.
135 If you want no headers or footers output in the HTML, set these options
140 Whether to add a table-of-contents at the top of each page (called an
141 index for the sake of tradition).
146 __PACKAGE__->_accessorize(
147 'perldoc_url_prefix',
148 'perldoc_url_postfix',
149 'title_prefix', 'title_postfix',
154 'title', # Used internally for the title extracted from the content
160 'batch_mode', # whether we're in batch mode
161 'batch_mode_current_level',
162 # When in batch mode, how deep the current module is: 1 for "LWP",
163 # 2 for "LWP::Procotol", 3 for "LWP::Protocol::GHTTP", etc
166 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
170 If the standard options aren't enough, you may want to subclass
171 Pod::Simple::XHMTL. These are the most likely candidates for methods
172 you'll want to override when subclassing.
178 my $new = $self->SUPER::new(@_);
179 $new->{'output_fh'} ||= *STDOUT{IO};
180 $new->accept_targets( 'html', 'HTML' );
181 $new->perldoc_url_prefix('http://search.cpan.org/perldoc?');
182 $new->html_header_tags('<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />');
183 $new->nix_X_codes(1);
184 $new->codes_in_verbatim(1);
185 $new->{'scratch'} = '';
186 $new->{'to_index'} = [];
187 $new->{'output'} = [];
188 $new->{'saved'} = [];
193 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
197 This method handles the body of text within any element: it's the body
198 of a paragraph, or everything between a "=begin" tag and the
199 corresponding "=end" tag, or the text within an L entity, etc. You would
200 want to override this if you are adding a custom element type that does
201 more than just display formatted text. Perhaps adding a way to generate
202 HTML tables from an extended version of POD.
204 So, let's say you want add a custom element called 'foo'. In your
205 subclass's C<new> method, after calling C<SUPER::new> you'd call:
207 $new->accept_targets_as_text( 'foo' );
209 Then override the C<start_for> method in the subclass to check for when
210 "$flags->{'target'}" is equal to 'foo' and set a flag that marks that
211 you're in a foo block (maybe "$self->{'in_foo'} = 1"). Then override the
212 C<handle_text> method to check for the flag, and pass $text to your
213 custom subroutine to construct the HTML output for 'foo' elements,
217 my ($self, $text) = @_;
218 if ($self->{'in_foo'}) {
219 $self->{'scratch'} .= build_foo_html($text);
221 $self->{'scratch'} .= $text;
228 # escape special characters in HTML (<, >, &, etc)
229 $_[0]{'scratch'} .= encode_entities( $_[1] )
232 sub start_Para { $_[0]{'scratch'} = '<p>' }
233 sub start_Verbatim { $_[0]{'scratch'} = '<pre><code>' }
235 sub start_head1 { $_[0]{'in_head'} = 1 }
236 sub start_head2 { $_[0]{'in_head'} = 2 }
237 sub start_head3 { $_[0]{'in_head'} = 3 }
238 sub start_head4 { $_[0]{'in_head'} = 4 }
240 sub start_item_number {
241 $_[0]{'scratch'} = "</li>\n" if $_[0]{'in_li'};
242 $_[0]{'scratch'} .= '<li><p>';
246 sub start_item_bullet {
247 $_[0]{'scratch'} = "</li>\n" if $_[0]{'in_li'};
248 $_[0]{'scratch'} .= '<li><p>';
252 sub start_item_text {
253 if ($_[0]{'in_dd'}[ $_[0]{'dl_level'} ]) {
254 $_[0]{'scratch'} = "</dd>\n";
255 $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 0;
257 $_[0]{'scratch'} .= '<dt>';
260 sub start_over_bullet { $_[0]{'scratch'} = '<ul>'; $_[0]->emit }
261 sub start_over_block { $_[0]{'scratch'} = '<ul>'; $_[0]->emit }
262 sub start_over_number { $_[0]{'scratch'} = '<ol>'; $_[0]->emit }
263 sub start_over_text {
264 $_[0]{'scratch'} = '<dl>';
266 $_[0]{'in_dd'} ||= [];
270 sub end_over_block { $_[0]{'scratch'} .= '</ul>'; $_[0]->emit }
272 sub end_over_number {
273 $_[0]{'scratch'} = "</li>\n" if delete $_[0]{'in_li'};
274 $_[0]{'scratch'} .= '</ol>';
278 sub end_over_bullet {
279 $_[0]{'scratch'} = "</li>\n" if delete $_[0]{'in_li'};
280 $_[0]{'scratch'} .= '</ul>';
285 if ($_[0]{'in_dd'}[ $_[0]{'dl_level'} ]) {
286 $_[0]{'scratch'} = "</dd>\n";
287 $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 0;
289 $_[0]{'scratch'} .= '</dl>';
294 # . . . . . Now the actual formatters:
296 sub end_Para { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
298 $_[0]{'scratch'} .= '</code></pre>';
303 my $h = delete $_[0]{in_head};
304 my $id = $_[0]->idify($_[0]{scratch});
305 my $text = $_[0]{scratch};
306 $_[0]{'scratch'} = qq{<h$h id="$id">$text</h$h>};
308 push @{ $_[0]{'to_index'} }, [$h, $id, $text];
311 sub end_head1 { shift->_end_head(@_); }
312 sub end_head2 { shift->_end_head(@_); }
313 sub end_head3 { shift->_end_head(@_); }
314 sub end_head4 { shift->_end_head(@_); }
316 sub end_item_bullet { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
317 sub end_item_number { $_[0]{'scratch'} .= '</p>'; $_[0]->emit }
320 $_[0]{'scratch'} .= "</dt>\n<dd>";
321 $_[0]{'in_dd'}[ $_[0]{'dl_level'} ] = 1;
325 # This handles =begin and =for blocks of all kinds.
327 my ($self, $flags) = @_;
328 $self->{'scratch'} .= '<div';
329 $self->{'scratch'} .= ' class="'.$flags->{'target'}.'"' if ($flags->{'target'});
330 $self->{'scratch'} .= '>';
336 $self->{'scratch'} .= '</div>';
342 if (defined $self->html_header) {
343 $self->{'scratch'} .= $self->html_header;
344 $self->emit unless $self->html_header eq "";
346 my ($doctype, $title, $metatags);
347 $doctype = $self->html_doctype || '';
348 $title = $self->force_title || $self->title || $self->default_title || '';
349 $metatags = $self->html_header_tags || '';
350 if ($self->html_css) {
351 $metatags .= "\n<link rel='stylesheet' href='" .
352 $self->html_css . "' type='text/css'>";
354 if ($self->html_javascript) {
355 $metatags .= "\n<script type='text/javascript' src='" .
356 $self->html_javascript . "'></script>";
358 $self->{'scratch'} .= <<"HTML";
362 <title>$title</title>
373 my $to_index = $self->{'to_index'};
374 if ($self->index && @{ $to_index } ) {
379 my $id = ' id="index"';
381 for my $h (@{ $to_index }, [0]) {
382 my $target_level = $h->[0];
383 # Get to target_level by opening or closing ULs
384 if ($level == $target_level) {
386 } elsif ($level > $target_level) {
387 $out[-1] .= '</li>' if $out[-1] =~ /^\s+<li>/;
388 while ($level > $target_level) {
390 push @out, (' ' x --$indent) . '</li>' if @out && $out[-1] =~ m{^\s+<\/ul};
391 push @out, (' ' x --$indent) . '</ul>';
393 push @out, (' ' x --$indent) . '</li>' if $level;
395 while ($level < $target_level) {
397 push @out, (' ' x ++$indent) . '<li>' if @out && $out[-1]=~ /^\s*<ul/;
398 push @out, (' ' x ++$indent) . "<ul$id>";
405 $space = ' ' x $indent;
406 push @out, sprintf '%s<li><a href="#%s">%s</a>',
407 $space, $h->[1], $h->[2];
409 # Splice the index in between the HTML headers and the first element.
410 my $offset = defined $self->html_header ? $self->html_header eq '' ? 0 : 1 : 1;
411 splice @{ $self->{'output'} }, $offset, 0, join "\n", @out;
414 if (defined $self->html_footer) {
415 $self->{'scratch'} .= $self->html_footer;
416 $self->emit unless $self->html_footer eq "";
418 $self->{'scratch'} .= "</body>\n</html>";
423 print {$self->{'output_fh'}} join ("\n\n", @{ $self->{'output'} }), "\n\n";
424 @{$self->{'output'}} = ();
430 sub start_B { $_[0]{'scratch'} .= '<b>' }
431 sub end_B { $_[0]{'scratch'} .= '</b>' }
433 sub start_C { $_[0]{'scratch'} .= '<code>' }
434 sub end_C { $_[0]{'scratch'} .= '</code>' }
437 my ($self, $flags) = @_;
438 push @{ $self->{'saved'} }, $self->{'scratch'};
439 $self->{'scratch'} = '';
442 my ($self, $flags) = @_;
443 my $previous = pop @{ $self->{'saved'} };
444 my $entity = $self->{'scratch'};
446 if ($entity =~ 'sol' or $entity =~ 'verbar') {
447 my $char = Pod::Escapes::e2char($entity);
448 if (defined($char)) {
449 $self->{'scratch'} = $previous . $char;
454 if ($entity =~ /^[0-9]/) {
455 $entity = '#' . $entity;
458 $self->{'scratch'} = $previous . '&'. $entity . ';'
461 sub start_F { $_[0]{'scratch'} .= '<i>' }
462 sub end_F { $_[0]{'scratch'} .= '</i>' }
464 sub start_I { $_[0]{'scratch'} .= '<i>' }
465 sub end_I { $_[0]{'scratch'} .= '</i>' }
468 my ($self, $flags) = @_;
470 if ($flags->{'type'} eq 'url') {
471 $url = $flags->{'to'};
472 } elsif ($flags->{'type'} eq 'pod') {
473 $url .= $self->perldoc_url_prefix || '';
474 $url .= $flags->{'to'} || '';
475 $url .= '/' . $flags->{'section'} if ($flags->{'section'});
476 $url .= $self->perldoc_url_postfix || '';
477 # require Data::Dumper;
478 # print STDERR Data::Dumper->Dump([$flags]);
481 $self->{'scratch'} .= '<a href="'. $url . '">';
483 sub end_L { $_[0]{'scratch'} .= '</a>' }
485 sub start_S { $_[0]{'scratch'} .= '<nobr>' }
486 sub end_S { $_[0]{'scratch'} .= '</nobr>' }
491 push @{ $self->{'output'} }, $self->{'scratch'};
493 print {$self->{'output_fh'}} $self->{'scratch'}, "\n\n";
495 $self->{'scratch'} = '';
501 my $id = $pod->idify($text);
502 my $hash = $pod->idify($text, 1);
504 This method turns an arbitrary string into a valid XHTML ID attribute value.
505 The rules enforced, following
506 L<http://webdesign.about.com/od/htmltags/a/aa031707.htm>, are:
512 The id must start with a letter (a-z or A-Z)
516 All subsequent characters can be letters, numbers (0-9), hyphens (-),
517 underscores (_), colons (:), and periods (.).
521 Each id must be unique within the document.
525 In addition, the returned value will be unique within the context of the
526 Pod::Simple::XHTML object unless a second argument is passed a true value. ID
527 attributes should always be unique within a single XHTML document, but pass
528 the true value if you are creating not an ID but a URL hash to point to
529 an ID (i.e., if you need to put the "#foo" in C<< <a href="#foo">foo</a> >>.
534 my ($self, $t, $not_unique) = @_;
536 s/<[^>]+>//g; # Strip HTML.
537 s/&[^;]+;//g; # Strip entities.
538 s/^([^a-zA-Z]+)$/pod$1/; # Prepend "pod" if no valid chars.
539 s/^[^a-zA-Z]+//; # First char must be a letter.
540 s/[^-a-zA-Z0-9_:.]+/-/g; # All other chars must be valid.
542 return $t if $not_unique;
544 $i++ while $self->{ids}{"$t$i"}++;
548 # Bypass built-in E<> handling to preserve entity encoding
557 L<Pod::Simple>, L<Pod::Simple::Methody>
561 Copyright (c) 2003-2005 Allison Randal.
563 This library is free software; you can redistribute it and/or modify
564 it under the same terms as Perl itself.
566 This library is distributed in the hope that it will be useful, but
567 without any warranty; without even the implied warranty of
568 merchantability or fitness for a particular purpose.
572 Allison Randal <allison@perl.org>