From: Steve Peters Date: Wed, 7 Dec 2005 11:38:00 +0000 (+0000) Subject: Assimilate Pod-Simple-3.03 to the Perl core X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=351625bd207602dfd0011de5eb4628c180eff839;p=p5sagit%2Fp5-mst-13.2.git Assimilate Pod-Simple-3.03 to the Perl core p4raw-id: //depot/perl@26291 --- diff --git a/MANIFEST b/MANIFEST index 635ecbc..0e2601f 100644 --- a/MANIFEST +++ b/MANIFEST @@ -1900,6 +1900,176 @@ lib/Pod/Perldoc/t/textbasic.t test Pod::Perldoc::ToText lib/Pod/Plainer.pm Pod migration utility module lib/Pod/PlainText.pm Convert POD data to formatted ASCII text lib/Pod/Select.pm Pod-Parser - select portions of POD docs +lib/Pod/Simple/BlackBox.pm Pod::Simple::BlackBox +lib/Pod/Simple/ChangeLog Pod::Simple ChangeLog +lib/Pod/Simple/Checker.pm check the Pod syntax of a document +lib/Pod/Simple/Debug.pm put Pod::Simple into trace/debug mode +lib/Pod/Simple/DumpAsText.pm dump Pod-parsing events as text +lib/Pod/Simple/DumpAsXML.pm turn Pod into XML +lib/Pod/Simple/HTMLBatch.pm convert several Pod files to several HTML files +lib/Pod/Simple/HTMLLegacy.pm Pod::Simple::HTMLLegacy +lib/Pod/Simple/HTML.pm convert Pod to HTML +lib/Pod/Simple/LinkSection.pm represent "section" attributes of L codes +lib/Pod/Simple/Methody.pm turn Pod::Simple events into method calls +lib/Pod/Simple.pm Pod made simple +lib/Pod/Simple.pod Pod for Pod::Simple +lib/Pod/Simple/Progress.pm Pod::Simple::Progress +lib/Pod/Simple/PullParserEndToken.pm end-tokens from Pod::Simple::PullParser +lib/Pod/Simple/PullParser.pm a pull-parser interface to parsing Pod +lib/Pod/Simple/PullParserStartToken.pm start-tokens from Pod::Simple::PullParser +lib/Pod/Simple/PullParserTextToken.pm text-tokens from Pod::Simple::PullParser +lib/Pod/Simple/PullParserToken.pm tokens from Pod::Simple::PullParser +lib/Pod/Simple/README Pod::Simple README file +lib/Pod/Simple/RTF.pm format Pod as RTF +lib/Pod/Simple/Search.pm find POD documents in directory trees +lib/Pod/Simple/SimpleTree.pm parse Pod into a simple parse tree +lib/Pod/Simple/Subclassing.pod write a formatter as a Pod::Simple subclass +lib/Pod/Simple/t/00about.t Pod::Simple test file +lib/Pod/Simple/t/20_skip_before_58.t Pod::Simple test file +lib/Pod/Simple/t/ac_c_extend.t Pod::Simple test file +lib/Pod/Simple/t/ac_c_simple.t Pod::Simple test file +lib/Pod/Simple/t/ac_d.t Pod::Simple test file +lib/Pod/Simple/t/basic.t Pod::Simple test file +lib/Pod/Simple/t/beginend.t Pod::Simple test file +lib/Pod/Simple/t/cbacks.t Pod::Simple test file +lib/Pod/Simple/t/chunking.t Pod::Simple test file +lib/Pod/Simple/t/closeys.t Pod::Simple test file +lib/Pod/Simple/t/corpus/buniya_cp1256.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/buniya_cp1256.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/buniya_iso6.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/buniya_iso6.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/fet_contradiction.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/fet_contradiction.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/fet_duplication.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/fet_duplication.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/fet_when_koi8r.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/fet_when_koi8r.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/french_implicit_latin1.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/french_implicit_latin1.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/french_latin1.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/french_latin1.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/greek_iso_8859_7.pod Pod::Simple test file +lib/Pod/Simple/t/corpus/greek_iso_8859_7.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/haiku-iso2202jp.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/haiku-iso2202jp.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/haiku-iso2202jpx.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/haiku-iso2202jpx.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/haiku-iso2202jpy.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/haiku-iso2202jpy.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/haiku-iso2202jpz.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/haiku-iso2202jpz.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/laozi38b.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/laozi38b.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/laozi38p.pod Pod::Simple test file +lib/Pod/Simple/t/corpus/laozi38p.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/laozi38.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/laozi38.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/nonesuch.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/nonesuch.xml Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/fiqhakbar_iso6.txt Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/fiqhakbar_iso6.xml Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/polish_implicit_utf8.txt Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/polish_utf16be_bom.txt Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/polish_utf16le_bom.txt Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/polish_utf8_bom2.txt Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/polish_utf8_bom2.xml Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/polish_utf8_bom.txt Pod::Simple test file +lib/Pod/Simple/t/corpus_not_yet_impl/polish_utf8_bom.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/pasternak_cp1251.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/pasternak_cp1251.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/plain_explicit.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/plain_explicit.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/plain_latin1.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/plain_latin1.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/plain.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/plain_utf8.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/plain_utf8.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/plain.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/polish_utf8.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/polish_utf8.xml Pod::Simple test file +lib/Pod/Simple/t/corpus/s2763_sjis.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/s2763_sjis.xml Pod::Simple test file +lib/Pod/Simple/t/corpustest.t Pod::Simple test file +lib/Pod/Simple/t/corpus/thai_iso11.txt Pod::Simple test file +lib/Pod/Simple/t/corpus/thai_iso11.xml Pod::Simple test file +lib/Pod/Simple/t/encoding_nonesuch.t Pod::Simple test file +lib/Pod/Simple/t/encoding_not_error0.t Pod::Simple test file +lib/Pod/Simple/t/encoding_not_error.t Pod::Simple test file +lib/Pod/Simple/TextContent.pm get the text content of Pod +lib/Pod/Simple/Text.pm format Pod as plaintext +lib/Pod/Simple/t/fcodes_ee.t Pod::Simple test file +lib/Pod/Simple/t/fcodes_ell.t Pod::Simple test file +lib/Pod/Simple/t/fcodes_ess.t Pod::Simple test file +lib/Pod/Simple/t/fcodes.t Pod::Simple test file +lib/Pod/Simple/t/fornot.t Pod::Simple test file +lib/Pod/Simple/t/for.t Pod::Simple test file +lib/Pod/Simple/t/fullstop_spaces.t Pod::Simple test file +lib/Pod/Simple/t/head_ends_over.t Pod::Simple test file +lib/Pod/Simple/t/heads.t Pod::Simple test file +lib/Pod/Simple/t/htmlbatch_01.t Pod::Simple test file +lib/Pod/Simple/t/html-para.t Pod::Simple test file +lib/Pod/Simple/t/html-styles.t Pod::Simple test file +lib/Pod/Simple/t/html-title.t Pod::Simple test file +lib/Pod/Simple/TiedOutFH.pm Pod::Simple::TiedOutFH +lib/Pod/Simple/t/itemadapt.t Pod::Simple test file +lib/Pod/Simple/t/itemstar.t Pod::Simple test file +lib/Pod/Simple/t/items.t Pod::Simple test file +lib/Pod/Simple/t/linkclass.t Pod::Simple test file +lib/Pod/Simple/t/other^test^lib/hink^honk/Glunk.pod Pod::Simple test file +lib/Pod/Simple/t/other^test^lib/hink^honk/readme.txt Pod::Simple test file +lib/Pod/Simple/t/other^test^lib/hink^honk/Vliff.pm Pod::Simple test file +lib/Pod/Simple/t/other^test^lib/pod/perlthang.pod Pod::Simple test file +lib/Pod/Simple/t/other^test^lib/pod/perlzuk.pod Pod::Simple test file +lib/Pod/Simple/t/other^test^lib/Sizzlesuzzle.pm Pod::Simple test file +lib/Pod/Simple/t/other^test^lib/squaa/Vliff.pm Pod::Simple test file +lib/Pod/Simple/t/other^test^lib/squaa/Wowo.pod Pod::Simple test file +lib/Pod/Simple/t/puller.t Pod::Simple test file +lib/Pod/Simple/t/pulltitle.t Pod::Simple test file +lib/Pod/Simple/TranscodeDumb.pm Pod::Simple::TranscodeDumb +lib/Pod/Simple/Transcode.pm Pod::Simple::Transcode +lib/Pod/Simple/TranscodeSmart.pm Pod::Simple::TranscodeSmart +lib/Pod/Simple/t/render.t Pod::Simple test file +lib/Pod/Simple/t/sanity_tfh.t Pod::Simple test file +lib/Pod/Simple/t/search_05sane.t Pod::Simple test file +lib/Pod/Simple/t/search_10survey_specific.t Pod::Simple test file +lib/Pod/Simple/t/search_12survey_cwd.t Pod::Simple test file +lib/Pod/Simple/t/search_20survey_two.t Pod::Simple test file +lib/Pod/Simple/t/search_22survey_two_shadowing.t Pod::Simple test file +lib/Pod/Simple/t/search_25_glob_squaa_coloncolon_kleene.t Pod::Simple test file +lib/Pod/Simple/t/search_26_glob_kleene_k.t Pod::Simple test file +lib/Pod/Simple/t/search_27_glob_squaa_kleene.t Pod::Simple test file +lib/Pod/Simple/t/search_28_glob_z_kleene_k.t Pod::Simple test file +lib/Pod/Simple/t/search_29_glob_z_qmark_k.t Pod::Simple test file +lib/Pod/Simple/t/search_50survey_inc.t Pod::Simple test file +lib/Pod/Simple/t/stree.t Pod::Simple test file +lib/Pod/Simple/t/test_junk1_out.txt Pod::Simple test file +lib/Pod/Simple/t/test_junk1.pod Pod::Simple test file +lib/Pod/Simple/t/test_junk2_out.txt Pod::Simple test file +lib/Pod/Simple/t/test_junk2.pod Pod::Simple test file +lib/Pod/Simple/t/test^lib/Blorm.pm Pod::Simple test file +lib/Pod/Simple/t/test^lib/hink^honk/Glunk.pod Pod::Simple test file +lib/Pod/Simple/t/test^lib/hink^honk/readme.txt Pod::Simple test file +lib/Pod/Simple/t/test^lib/hink^honk/Vliff.pm Pod::Simple test file +lib/Pod/Simple/t/test^lib/pod/perlfliff.pod Pod::Simple test file +lib/Pod/Simple/t/test^lib/pod/perlthang.pod Pod::Simple test file +lib/Pod/Simple/t/test^lib/squaa/Glunk.pod Pod::Simple test file +lib/Pod/Simple/t/test^lib/squaa.pm Pod::Simple test file +lib/Pod/Simple/t/test^lib/squaa/Vliff.pm Pod::Simple test file +lib/Pod/Simple/t/test^lib/zikzik.pod Pod::Simple test file +lib/Pod/Simple/t/test^lib/Zonk/Fiddle.txt Pod::Simple test file +lib/Pod/Simple/t/test^lib/Zonk/Pronk.pm Pod::Simple test file +lib/Pod/Simple/t/test^lib/Zonk/Veng.pm Pod::Simple test file +lib/Pod/Simple/t/test_old_perlcygwin_out.txt Pod::Simple test file +lib/Pod/Simple/t/test_old_perlcygwin.pod Pod::Simple test file +lib/Pod/Simple/t/test_old_perlfaq3_out.txt Pod::Simple test file +lib/Pod/Simple/t/test_old_perlfaq3.pod Pod::Simple test file +lib/Pod/Simple/t/test_old_perlvar_out.txt Pod::Simple test file +lib/Pod/Simple/t/test_old_perlvar.pod Pod::Simple test file +lib/Pod/Simple/t/verbatim_formatted.t Pod::Simple test file +lib/Pod/Simple/t/verbatims.t Pod::Simple test file +lib/Pod/Simple/t/x_nixer.t Pod::Simple test file +lib/Pod/Simple/t/yet^another^test^lib/squaa/Vliff.pm Pod::Simple test file +lib/Pod/Simple/XMLOutStream.pm turn Pod into XML lib/Pod/t/basic.cap podlators test lib/Pod/t/basic.clr podlators test lib/Pod/t/basic.man podlators test diff --git a/Porting/Maintainers.pl b/Porting/Maintainers.pl index 643b489..175be9f 100644 --- a/Porting/Maintainers.pl +++ b/Porting/Maintainers.pl @@ -13,6 +13,7 @@ package Maintainers; 'abergman' => 'Arthur Bergman ', 'ams' => 'Abhijit Menon-Sen ', 'andk' => 'Andreas J. Koenig ', + 'arandal' => 'Allison Randal ', 'autrijus' => 'Autrijus Tang ', 'bbb' => 'Rob Brown ', 'craig' => 'Craig Berry ', @@ -458,6 +459,12 @@ package Maintainers; 'CPAN' => 1, }, + 'Pod::Simple' => + 'MAINTAINER' => 'arandal', + 'FILES' => q[lib/Pod/Simple.pm lib/Pod/Simple.pod lib/Pod/Simple] + 'CPAN' => 1, + }, + 'Pod::LaTeX' => { 'MAINTAINER' => 'tjenness', diff --git a/lib/Pod/Simple.pm b/lib/Pod/Simple.pm new file mode 100644 index 0000000..965243e --- /dev/null +++ b/lib/Pod/Simple.pm @@ -0,0 +1,1520 @@ + +require 5; +package Pod::Simple; +use strict; +use Carp (); +BEGIN { *DEBUG = sub () {0} unless defined &DEBUG } +use integer; +use Pod::Escapes 1.03 (); +use Pod::Simple::LinkSection (); +use Pod::Simple::BlackBox (); +#use utf8; + +use vars qw( + $VERSION @ISA + @Known_formatting_codes @Known_directives + %Known_formatting_codes %Known_directives + $NL +); + +@ISA = ('Pod::Simple::BlackBox'); +$VERSION = '3.03'; + +@Known_formatting_codes = qw(I B C L E F S X Z); +%Known_formatting_codes = map(($_=>1), @Known_formatting_codes); +@Known_directives = qw(head1 head2 head3 head4 item over back); +%Known_directives = map(($_=>'Plain'), @Known_directives); +$NL = $/ unless defined $NL; + +#----------------------------------------------------------------------------- +# Set up some constants: + +BEGIN { + if(defined &ASCII) { } + elsif(chr(65) eq 'A') { *ASCII = sub () {1} } + else { *ASCII = sub () {''} } + + unless(defined &MANY_LINES) { *MANY_LINES = sub () {20} } + DEBUG > 4 and print "MANY_LINES is ", MANY_LINES(), "\n"; + unless(MANY_LINES() >= 1) { + die "MANY_LINES is too small (", MANY_LINES(), ")!\nAborting"; + } + if(defined &UNICODE) { } + elsif($] >= 5.008) { *UNICODE = sub() {1} } + else { *UNICODE = sub() {''} } +} +if(DEBUG > 2) { + print "# We are ", ASCII ? '' : 'not ', "in ASCII-land\n"; + print "# We are under a Unicode-safe Perl.\n"; +} + +# Design note: +# This is a parser for Pod. It is not a parser for the set of Pod-like +# languages which happens to contain Pod -- it is just for Pod, plus possibly +# some extensions. + +# @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ +#@ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +__PACKAGE__->_accessorize( + 'nbsp_for_S', # Whether to map S<...>'s to \xA0 characters + 'source_filename', # Filename of the source, for use in warnings + 'source_dead', # Whether to consider this parser's source dead + + 'output_fh', # The filehandle we're writing to, if applicable. + # Used only in some derived classes. + + 'hide_line_numbers', # For some dumping subclasses: whether to pointedly + # suppress the start_line attribute + + 'line_count', # the current line number + 'pod_para_count', # count of pod paragraphs seen so far + + 'no_whining', # whether to suppress whining + 'no_errata_section', # whether to suppress the errata section + 'complain_stderr', # whether to complain to stderr + + 'doc_has_started', # whether we've fired the open-Document event yet + + 'bare_output', # For some subclasses: whether to prepend + # header-code and postpend footer-code + + 'fullstop_space_harden', # Whether to turn ". " into ".[nbsp] "; + + 'nix_X_codes', # whether to ignore X<...> codes + 'merge_text', # whether to avoid breaking a single piece of + # text up into several events + + 'preserve_whitespace', # whether to try to keep whitespace as-is + + 'content_seen', # whether we've seen any real Pod content + 'errors_seen', # TODO: document. whether we've seen any errors (fatal or not) + + 'codes_in_verbatim', # for PseudoPod extensions + + 'code_handler', # coderef to call when a code (non-pod) line is seen + 'cut_handler', # coderef to call when a =cut line is seen + #Called like: + # $code_handler->($line, $self->{'line_count'}, $self) if $code_handler; + # $cut_handler->($line, $self->{'line_count'}, $self) if $cut_handler; + +); + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub any_errata_seen { # good for using as an exit() value... + return shift->{'errors_seen'} || 0; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +# Pull in some functions that, for some reason, I expect to see here too: +BEGIN { + *pretty = \&Pod::Simple::BlackBox::pretty; + *stringify_lol = \&Pod::Simple::BlackBox::stringify_lol; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub version_report { + my $class = ref($_[0]) || $_[0]; + if($class eq __PACKAGE__) { + return "$class $VERSION"; + } else { + my $v = $class->VERSION; + return "$class $v (" . __PACKAGE__ . " $VERSION)"; + } +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +#sub curr_open { # read-only list accessor +# return @{ $_[0]{'curr_open'} || return() }; +#} +#sub _curr_open_listref { $_[0]{'curr_open'} ||= [] } + + +sub output_string { + # Works by faking out output_fh. Simplifies our code. + # + my $this = shift; + return $this->{'output_string'} unless @_; # GET. + + require Pod::Simple::TiedOutFH; + my $x = (defined($_[0]) and ref($_[0])) ? $_[0] : \( $_[0] ); + $$x = '' unless defined $$x; + DEBUG > 4 and print "# Output string set to $x ($$x)\n"; + $this->{'output_fh'} = Pod::Simple::TiedOutFH->handle_on($_[0]); + return + $this->{'output_string'} = $_[0]; + #${ ${ $this->{'output_fh'} } }; +} + +sub abandon_output_string { $_[0]->abandon_output_fh; delete $_[0]{'output_string'} } +sub abandon_output_fh { $_[0]->output_fh(undef) } +# These don't delete the string or close the FH -- they just delete our +# references to it/them. +# TODO: document these + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub new { + # takes no parameters + my $class = ref($_[0]) || $_[0]; + #Carp::croak(__PACKAGE__ . " is a virtual base class -- see perldoc " + # . __PACKAGE__ ); + return bless { + 'accept_codes' => { map( ($_=>$_), @Known_formatting_codes ) }, + 'accept_directives' => { %Known_directives }, + 'accept_targets' => {}, + }, $class; +} + + + +# TODO: an option for whether to interpolate E<...>'s, or just resolve to codes. + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub _handle_element_start { # OVERRIDE IN DERIVED CLASS + my($self, $element_name, $attr_hash_r) = @_; + return; +} + +sub _handle_element_end { # OVERRIDE IN DERIVED CLASS + my($self, $element_name) = @_; + return; +} + +sub _handle_text { # OVERRIDE IN DERIVED CLASS + my($self, $text) = @_; + return; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +# +# And now directives (not targets) + +sub accept_directive_as_verbatim { shift->_accept_directives('Verbatim', @_) } +sub accept_directive_as_data { shift->_accept_directives('Data', @_) } +sub accept_directive_as_processed { shift->_accept_directives('Plain', @_) } + +sub _accept_directives { + my($this, $type) = splice @_,0,2; + foreach my $d (@_) { + next unless defined $d and length $d; + Carp::croak "\"$d\" isn't a valid directive name" + unless $d =~ m/^[a-zA-Z][a-zA-Z0-9]*$/s; + Carp::croak "\"$d\" is already a reserved Pod directive name" + if exists $Known_directives{$d}; + $this->{'accept_directives'}{$d} = $type; + DEBUG > 2 and print "Learning to accept \"=$d\" as directive of type $type\n"; + } + DEBUG > 6 and print "$this\'s accept_directives : ", + pretty($this->{'accept_directives'}), "\n"; + + return sort keys %{ $this->{'accept_directives'} } if wantarray; + return; +} + +#-------------------------------------------------------------------------- +# TODO: document these: + +sub unaccept_directive { shift->unaccept_directives(@_) }; + +sub unaccept_directives { + my $this = shift; + foreach my $d (@_) { + next unless defined $d and length $d; + Carp::croak "\"$d\" isn't a valid directive name" + unless $d =~ m/^[a-zA-Z][a-zA-Z0-9]*$/s; + Carp::croak "But you must accept \"$d\" directives -- it's a builtin!" + if exists $Known_directives{$d}; + delete $this->{'accept_directives'}{$d}; + DEBUG > 2 and print "OK, won't accept \"=$d\" as directive.\n"; + } + return sort keys %{ $this->{'accept_directives'} } if wantarray; + return +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +# +# And now targets (not directives) + +sub accept_target { shift->accept_targets(@_) } # alias +sub accept_target_as_text { shift->accept_targets_as_text(@_) } # alias + + +sub accept_targets { shift->_accept_targets('1', @_) } + +sub accept_targets_as_text { shift->_accept_targets('force_resolve', @_) } + # forces them to be processed, even when there's no ":". + +sub _accept_targets { + my($this, $type) = splice @_,0,2; + foreach my $t (@_) { + next unless defined $t and length $t; + # TODO: enforce some limitations on what a target name can be? + $this->{'accept_targets'}{$t} = $type; + DEBUG > 2 and print "Learning to accept \"$t\" as target of type $type\n"; + } + return sort keys %{ $this->{'accept_targets'} } if wantarray; + return; +} + +#-------------------------------------------------------------------------- +sub unaccept_target { shift->unaccept_targets(@_) } + +sub unaccept_targets { + my $this = shift; + foreach my $t (@_) { + next unless defined $t and length $t; + # TODO: enforce some limitations on what a target name can be? + delete $this->{'accept_targets'}{$t}; + DEBUG > 2 and print "OK, won't accept \"$t\" as target.\n"; + } + return sort keys %{ $this->{'accept_targets'} } if wantarray; + return; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +# +# And now codes (not targets or directives) + +sub accept_code { shift->accept_codes(@_) } # alias + +sub accept_codes { # Add some codes + my $this = shift; + + foreach my $new_code (@_) { + next unless defined $new_code and length $new_code; + if(ASCII) { + # A good-enough check that it's good as an XML Name symbol: + Carp::croak "\"$new_code\" isn't a valid element name" + if $new_code =~ + m/[\x00-\x2C\x2F\x39\x3B-\x40\x5B-\x5E\x60\x7B-\x7F]/ + # Characters under 0x80 that aren't legal in an XML Name. + or $new_code =~ m/^[-\.0-9]/s + or $new_code =~ m/:[-\.0-9]/s; + # The legal under-0x80 Name characters that + # an XML Name still can't start with. + } + + $this->{'accept_codes'}{$new_code} = $new_code; + + # Yes, map to itself -- just so that when we + # see "=extend W [whatever] thatelementname", we say that W maps + # to whatever $this->{accept_codes}{thatelementname} is, + # i.e., "thatelementname". Then when we go re-mapping, + # a "W" in the treelet turns into "thatelementname". We only + # remap once. + # If we say we accept "W", then a "W" in the treelet simply turns + # into "W". + } + + return; +} + +#-------------------------------------------------------------------------- +sub unaccept_code { shift->unaccept_codes(@_) } + +sub unaccept_codes { # remove some codes + my $this = shift; + + foreach my $new_code (@_) { + next unless defined $new_code and length $new_code; + if(ASCII) { + # A good-enough check that it's good as an XML Name symbol: + Carp::croak "\"$new_code\" isn't a valid element name" + if $new_code =~ + m/[\x00-\x2C\x2F\x39\x3B-\x40\x5B-\x5E\x60\x7B-\x7F]/ + # Characters under 0x80 that aren't legal in an XML Name. + or $new_code =~ m/^[-\.0-9]/s + or $new_code =~ m/:[-\.0-9]/s; + # The legal under-0x80 Name characters that + # an XML Name still can't start with. + } + + Carp::croak "But you must accept \"$new_code\" codes -- it's a builtin!" + if grep $new_code eq $_, @Known_formatting_codes; + + delete $this->{'accept_codes'}{$new_code}; + + DEBUG > 2 and print "OK, won't accept the code $new_code<...>.\n"; + } + + return; +} + + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub parse_string_document { + my $self = shift; + my @lines; + foreach my $line_group (@_) { + next unless defined $line_group and length $line_group; + pos($line_group) = 0; + while($line_group =~ + m/([^\n\r]*)((?:\r?\n)?)/g + ) { + #print(">> $1\n"), + $self->parse_lines($1) + if length($1) or length($2) + or pos($line_group) != length($line_group); + # I.e., unless it's a zero-length "empty line" at the very + # end of "foo\nbar\n" (i.e., between the \n and the EOS). + } + } + $self->parse_lines(undef); # to signal EOF + return $self; +} + +sub _init_fh_source { + my($self, $source) = @_; + + #DEBUG > 1 and print "Declaring $source as :raw for starters\n"; + #$self->_apply_binmode($source, ':raw'); + #binmode($source, ":raw"); + + return; +} + +#:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:. +# + +sub parse_file { + my($self, $source) = (@_); + + if(!defined $source) { + Carp::croak("Can't use empty-string as a source for parse_file"); + } elsif(ref(\$source) eq 'GLOB') { + $self->{'source_filename'} = '' . ($source); + } elsif(ref $source) { + $self->{'source_filename'} = '' . ($source); + } elsif(!length $source) { + Carp::croak("Can't use empty-string as a source for parse_file"); + } else { + { + local *PODSOURCE; + open(PODSOURCE, "<$source") || Carp::croak("Can't open $source: $!"); + $self->{'source_filename'} = $source; + $source = *PODSOURCE{IO}; + } + $self->_init_fh_source($source); + } + # By here, $source is a FH. + + $self->{'source_fh'} = $source; + + my($i, @lines); + until( $self->{'source_dead'} ) { + splice @lines; + for($i = MANY_LINES; $i--;) { # read those many lines at a time + local $/ = $NL; + push @lines, scalar(<$source>); # readline + last unless defined $lines[-1]; + # but pass thru the undef, which will set source_dead to true + } + $self->parse_lines(@lines); + } + delete($self->{'source_fh'}); # so it can be GC'd + return $self; +} + +#:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:. + +sub parse_from_file { + # An emulation of Pod::Parser's interface, for the sake of Perldoc. + # Basically just a wrapper around parse_file. + + my($self, $source, $to) = @_; + $self = $self->new unless ref($self); # so we tolerate being a class method + + if(!defined $source) { $source = *STDIN{IO} + } elsif(ref(\$source) eq 'GLOB') { # stet + } elsif(ref($source) ) { # stet + } elsif(!length $source + or $source eq '-' or $source =~ m/^<&(STDIN|0)$/i + ) { + $source = *STDIN{IO}; + } + + if(!defined $to) { $self->output_fh( *STDOUT{IO} ); + } elsif(ref(\$to) eq 'GLOB') { $self->output_fh( $to ); + } elsif(ref($to)) { $self->output_fh( $to ); + } elsif(!length $to + or $to eq '-' or $to =~ m/^>&?(?:STDOUT|1)$/i + ) { + $self->output_fh( *STDOUT{IO} ); + } else { + require Symbol; + my $out_fh = Symbol::gensym(); + DEBUG and print "Write-opening to $to\n"; + open($out_fh, ">$to") or Carp::croak "Can't write-open $to: $!"; + binmode($out_fh) + if $self->can('write_with_binmode') and $self->write_with_binmode; + $self->output_fh($out_fh); + } + + return $self->parse_file($source); +} + +#----------------------------------------------------------------------------- + +sub whine { + #my($self,$line,$complaint) = @_; + my $self = shift(@_); + ++$self->{'errors_seen'}; + if($self->{'no_whining'}) { + DEBUG > 9 and print "Discarding complaint (at line $_[0]) $_[1]\n because no_whining is on.\n"; + return; + } + return $self->_complain_warn(@_) if $self->{'complain_stderr'}; + return $self->_complain_errata(@_); +} + +sub scream { # like whine, but not suppressable + #my($self,$line,$complaint) = @_; + my $self = shift(@_); + ++$self->{'errors_seen'}; + return $self->_complain_warn(@_) if $self->{'complain_stderr'}; + return $self->_complain_errata(@_); +} + +sub _complain_warn { + my($self,$line,$complaint) = @_; + return printf STDERR "%s around line %s: %s\n", + $self->{'source_filename'} || 'Pod input', $line, $complaint; +} + +sub _complain_errata { + my($self,$line,$complaint) = @_; + if( $self->{'no_errata_section'} ) { + DEBUG > 9 and print "Discarding erratum (at line $line) $complaint\n because no_errata_section is on.\n"; + } else { + DEBUG > 9 and print "Queuing erratum (at line $line) $complaint\n"; + push @{$self->{'errata'}{$line}}, $complaint + # for a report to be generated later! + } + return 1; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub _get_initial_item_type { + # A hack-wrapper here for when you have like "=over\n\n=item 456\n\n" + my($self, $para) = @_; + return $para->[1]{'~type'} if $para->[1]{'~type'}; + + return $para->[1]{'~type'} = 'text' + if join("\n", @{$para}[2 .. $#$para]) =~ m/^\s*(\d+)\.?\s*$/s and $1 ne '1'; + # Else fall thru to the general case: + return $self->_get_item_type($para); +} + + + +sub _get_item_type { # mutates the item!! + my($self, $para) = @_; + return $para->[1]{'~type'} if $para->[1]{'~type'}; + + + # Otherwise we haven't yet been to this node. Maybe alter it... + + my $content = join "\n", @{$para}[2 .. $#$para]; + + if($content =~ m/^\s*\*\s*$/s or $content =~ m/^\s*$/s) { + # Like: "=item *", "=item * ", "=item" + splice @$para, 2; # so it ends up just being ['=item', { attrhash } ] + $para->[1]{'~orig_content'} = $content; + return $para->[1]{'~type'} = 'bullet'; + + } elsif($content =~ m/^\s*\*\s+(.+)/s) { # tolerance + + # Like: "=item * Foo bar baz"; + $para->[1]{'~orig_content'} = $content; + $para->[1]{'~_freaky_para_hack'} = $1; + DEBUG > 2 and print " Tolerating $$para[2] as =item *\\n\\n$1\n"; + splice @$para, 2; # so it ends up just being ['=item', { attrhash } ] + return $para->[1]{'~type'} = 'bullet'; + + } elsif($content =~ m/^\s*(\d+)\.?\s*$/s) { + # Like: "=item 1.", "=item 123412" + + $para->[1]{'~orig_content'} = $content; + $para->[1]{'number'} = $1; # Yes, stores the number there! + + splice @$para, 2; # so it ends up just being ['=item', { attrhash } ] + return $para->[1]{'~type'} = 'number'; + + } else { + # It's anything else. + return $para->[1]{'~type'} = 'text'; + + } +} + +#----------------------------------------------------------------------------- + +sub _make_treelet { + my $self = shift; # and ($para, $start_line) + my $treelet; + if(!@_) { + return ['']; + } if(ref $_[0] and ref $_[0][0] and $_[0][0][0] eq '~Top') { + # Hack so we can pass in fake-o pre-cooked paragraphs: + # just have the first line be a reference to a ['~Top', {}, ...] + # We use this feechure in gen_errata and stuff. + + DEBUG and print "Applying precooked treelet hack to $_[0][0]\n"; + $treelet = $_[0][0]; + splice @$treelet, 0, 2; # lop the top off + return $treelet; + } else { + $treelet = $self->_treelet_from_formatting_codes(@_); + } + + if( $self->_remap_sequences($treelet) ) { + $self->_treat_Zs($treelet); # Might as well nix these first + $self->_treat_Ls($treelet); # L has to precede E and S + $self->_treat_Es($treelet); + $self->_treat_Ss($treelet); # S has to come after E + + $self->_wrap_up($treelet); # Nix X's and merge texties + + } else { + DEBUG and print "Formatless treelet gets fast-tracked.\n"; + # Very common case! + } + + splice @$treelet, 0, 2; # lop the top off + + return $treelet; +} + +#:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:. + +sub _wrap_up { + my($self, @stack) = @_; + my $nixx = $self->{'nix_X_codes'}; + my $merge = $self->{'merge_text' }; + return unless $nixx or $merge; + + DEBUG > 2 and print "\nStarting _wrap_up traversal.\n", + $merge ? (" Merge mode on\n") : (), + $nixx ? (" Nix-X mode on\n") : (), + ; + + + my($i, $treelet); + while($treelet = shift @stack) { + DEBUG > 3 and print " Considering children of this $treelet->[0] node...\n"; + for($i = 2; $i < @$treelet; ++$i) { # iterate over children + DEBUG > 3 and print " Considering child at $i ", pretty($treelet->[$i]), "\n"; + if($nixx and ref $treelet->[$i] and $treelet->[$i][0] eq 'X') { + DEBUG > 3 and print " Nixing X node at $i\n"; + splice(@$treelet, $i, 1); # just nix this node (and its descendants) + # no need to back-update the counter just yet + redo; + + } elsif($merge and $i != 2 and # non-initial + !ref $treelet->[$i] and !ref $treelet->[$i - 1] + ) { + DEBUG > 3 and print " Merging ", $i-1, + ":[$treelet->[$i-1]] and $i\:[$treelet->[$i]]\n"; + $treelet->[$i-1] .= ( splice(@$treelet, $i, 1) )[0]; + DEBUG > 4 and print " Now: ", $i-1, ":[$treelet->[$i-1]]\n"; + --$i; + next; + # since we just pulled the possibly last node out from under + # ourselves, we can't just redo() + + } elsif( ref $treelet->[$i] ) { + DEBUG > 4 and print " Enqueuing ", pretty($treelet->[$i]), " for traversal.\n"; + push @stack, $treelet->[$i]; + + if($treelet->[$i][0] eq 'L') { + my $thing; + foreach my $attrname ('section', 'to') { + if(defined($thing = $treelet->[$i][1]{$attrname}) and ref $thing) { + unshift @stack, $thing; + DEBUG > 4 and print " +Enqueuing ", + pretty( $treelet->[$i][1]{$attrname} ), + " as an attribute value to tweak.\n"; + } + } + } + } + } + } + DEBUG > 2 and print "End of _wrap_up traversal.\n\n"; + + return; +} + +#:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:. + +sub _remap_sequences { + my($self,@stack) = @_; + + if(@stack == 1 and @{ $stack[0] } == 3 and !ref $stack[0][2]) { + # VERY common case: abort it. + DEBUG and print "Skipping _remap_sequences: formatless treelet.\n"; + return 0; + } + + my $map = ($self->{'accept_codes'} || die "NO accept_codes in $self?!?"); + + my $start_line = $stack[0][1]{'start_line'}; + DEBUG > 2 and printf + "\nAbout to start _remap_sequences on treelet from line %s.\n", + $start_line || '[?]' + ; + DEBUG > 3 and print " Map: ", + join('; ', map "$_=" . ( + ref($map->{$_}) ? join(",", @{$map->{$_}}) : $map->{$_} + ), + sort keys %$map ), + ("B~C~E~F~I~L~S~X~Z" eq join '~', sort keys %$map) + ? " (all normal)\n" : "\n" + ; + + # A recursive algorithm implemented iteratively! Whee! + + my($is, $was, $i, $treelet); # scratch + while($treelet = shift @stack) { + DEBUG > 3 and print " Considering children of this $treelet->[0] node...\n"; + for($i = 2; $i < @$treelet; ++$i) { # iterate over children + next unless ref $treelet->[$i]; # text nodes are uninteresting + + DEBUG > 4 and print " Noting child $i : $treelet->[$i][0]<...>\n"; + + $is = $treelet->[$i][0] = $map->{ $was = $treelet->[$i][0] }; + if( DEBUG > 3 ) { + if(!defined $is) { + print " Code $was<> is UNKNOWN!\n"; + } elsif($is eq $was) { + DEBUG > 4 and print " Code $was<> stays the same.\n"; + } else { + print " Code $was<> maps to ", + ref($is) + ? ( "tags ", map("$_<", @$is), '...', map('>', @$is), "\n" ) + : "tag $is<...>.\n"; + } + } + + if(!defined $is) { + $self->whine($start_line, "Deleting unknown formatting code $was<>"); + $is = $treelet->[$i][0] = '1'; # But saving the children! + # I could also insert a leading "$was<" and tailing ">" as + # children of this node, but something about that seems icky. + } + if(ref $is) { + my @dynasty = @$is; + DEBUG > 4 and print " Renaming $was node to $dynasty[-1]\n"; + $treelet->[$i][0] = pop @dynasty; + my $nugget; + while(@dynasty) { + DEBUG > 4 and printf + " Grafting a new %s node between %s and %s\n", + $dynasty[-1], $treelet->[0], $treelet->[$i][0], + ; + + #$nugget = ; + splice @$treelet, $i, 1, [pop(@dynasty), {}, $treelet->[$i]]; + # relace node with a new parent + } + } elsif($is eq '0') { + splice(@$treelet, $i, 1); # just nix this node (and its descendants) + --$i; # back-update the counter + } elsif($is eq '1') { + splice(@$treelet, $i, 1 # replace this node with its children! + => splice @{ $treelet->[$i] },2 + # (not catching its first two (non-child) items) + ); + --$i; # back up for new stuff + } else { + # otherwise it's unremarkable + unshift @stack, $treelet->[$i]; # just recurse + } + } + } + + DEBUG > 2 and print "End of _remap_sequences traversal.\n\n"; + + if(@_ == 2 and @{ $_[1] } == 3 and !ref $_[1][2]) { + DEBUG and print "Noting that the treelet is now formatless.\n"; + return 0; + } + return 1; +} + +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + +sub _ponder_extend { + + # "Go to an extreme, move back to a more comfortable place" + # -- /Oblique Strategies/, Brian Eno and Peter Schmidt + + my($self, $para) = @_; + my $content = join ' ', splice @$para, 2; + $content =~ s/^\s+//s; + $content =~ s/\s+$//s; + + DEBUG > 2 and print "Ogling extensor: =extend $content\n"; + + if($content =~ + m/^ + (\S+) # 1 : new item + \s+ + (\S+) # 2 : fallback(s) + (?:\s+(\S+))? # 3 : element name(s) + \s* + $ + /xs + ) { + my $new_letter = $1; + my $fallbacks_one = $2; + my $elements_one; + $elements_one = defined($3) ? $3 : $1; + + DEBUG > 2 and print "Extensor has good syntax.\n"; + + unless($new_letter =~ m/^[A-Z]$/s or $new_letter) { + DEBUG > 2 and print " $new_letter isn't a valid thing to entend.\n"; + $self->whine( + $para->[1]{'start_line'}, + "You can extend only formatting codes A-Z, not like \"$new_letter\"" + ); + return; + } + + if(grep $new_letter eq $_, @Known_formatting_codes) { + DEBUG > 2 and print " $new_letter isn't a good thing to extend, because known.\n"; + $self->whine( + $para->[1]{'start_line'}, + "You can't extend an established code like \"$new_letter\"" + ); + + #TODO: or allow if last bit is same? + + return; + } + + unless($fallbacks_one =~ m/^[A-Z](,[A-Z])*$/s # like "B", "M,I", etc. + or $fallbacks_one eq '0' or $fallbacks_one eq '1' + ) { + $self->whine( + $para->[1]{'start_line'}, + "Format for second =extend parameter must be like" + . " M or 1 or 0 or M,N or M,N,O but you have it like " + . $fallbacks_one + ); + return; + } + + unless($elements_one =~ m/^[^ ,]+(,[^ ,]+)*$/s) { # like "B", "M,I", etc. + $self->whine( + $para->[1]{'start_line'}, + "Format for third =extend parameter: like foo or bar,Baz,qu:ux but not like " + . $elements_one + ); + return; + } + + my @fallbacks = split ',', $fallbacks_one, -1; + my @elements = split ',', $elements_one, -1; + + foreach my $f (@fallbacks) { + next if exists $Known_formatting_codes{$f} or $f eq '0' or $f eq '1'; + DEBUG > 2 and print " Can't fall back on unknown code $f\n"; + $self->whine( + $para->[1]{'start_line'}, + "Can't use unknown formatting code '$f' as a fallback for '$new_letter'" + ); + return; + } + + DEBUG > 3 and printf "Extensor: Fallbacks <%s> Elements <%s>.\n", + @fallbacks, @elements; + + my $canonical_form; + foreach my $e (@elements) { + if(exists $self->{'accept_codes'}{$e}) { + DEBUG > 1 and print " Mapping '$new_letter' to known extension '$e'\n"; + $canonical_form = $e; + last; # first acceptable elementname wins! + } else { + DEBUG > 1 and print " Can't map '$new_letter' to unknown extension '$e'\n"; + } + } + + + if( defined $canonical_form ) { + # We found a good N => elementname mapping + $self->{'accept_codes'}{$new_letter} = $canonical_form; + DEBUG > 2 and print + "Extensor maps $new_letter => known element $canonical_form.\n"; + } else { + # We have to use the fallback(s), which might be '0', or '1'. + $self->{'accept_codes'}{$new_letter} + = (@fallbacks == 1) ? $fallbacks[0] : \@fallbacks; + DEBUG > 2 and print + "Extensor maps $new_letter => fallbacks @fallbacks.\n"; + } + + } else { + DEBUG > 2 and print "Extensor has bad syntax.\n"; + $self->whine( + $para->[1]{'start_line'}, + "Unknown =extend syntax: $content" + ) + } + return; +} + + +#:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:. + +sub _treat_Zs { # Nix Z<...>'s + my($self,@stack) = @_; + + my($i, $treelet); + my $start_line = $stack[0][1]{'start_line'}; + + # A recursive algorithm implemented iteratively! Whee! + + while($treelet = shift @stack) { + for($i = 2; $i < @$treelet; ++$i) { # iterate over children + next unless ref $treelet->[$i]; # text nodes are uninteresting + unless($treelet->[$i][0] eq 'Z') { + unshift @stack, $treelet->[$i]; # recurse + next; + } + + DEBUG > 1 and print "Nixing Z node @{$treelet->[$i]}\n"; + + # bitch UNLESS it's empty + unless( @{$treelet->[$i]} == 2 + or (@{$treelet->[$i]} == 3 and $treelet->[$i][2] eq '') + ) { + $self->whine( $start_line, "A non-empty Z<>" ); + } # but kill it anyway + + splice(@$treelet, $i, 1); # thereby just nix this node. + --$i; + + } + } + + return; +} + +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + +# Quoting perlpodspec: + +# In parsing an L<...> code, Pod parsers must distinguish at least four +# attributes: + +############# Not used. Expressed via the element children plus +############# the value of the "content-implicit" flag. +# First: +# The link-text. If there is none, this must be undef. (E.g., in "L", the link-text is "Perl Functions". In +# "L" and even "L<|Time::HiRes>", there is no link text. Note +# that link text may contain formatting.) +# + +############# The element children +# Second: +# The possibly inferred link-text -- i.e., if there was no real link text, +# then this is the text that we'll infer in its place. (E.g., for +# "L", the inferred link text is "Getopt::Std".) +# + +############# The "to" attribute (which might be text, or a treelet) +# Third: +# The name or URL, or undef if none. (E.g., in "L", the name -- also sometimes called the page -- is +# "perlfunc". In "L", the name is undef.) +# + +############# The "section" attribute (which might be next, or a treelet) +# Fourth: +# The section (AKA "item" in older perlpods), or undef if none. E.g., in +# Getopt::Std/DESCRIPTION, "DESCRIPTION" is the section. (Note that this +# is not the same as a manpage section like the "5" in "man 5 crontab". +# "Section Foo" in the Pod sense means the part of the text that's +# introduced by the heading or item whose text is "Foo".) +# +# Pod parsers may also note additional attributes including: +# + +############# The "type" attribute. +# Fifth: +# A flag for whether item 3 (if present) is a URL (like +# "http://lists.perl.org" is), in which case there should be no section +# attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or +# possibly a man page name (like "crontab(5)" is). +# + +############# Not implemented, I guess. +# Sixth: +# The raw original L<...> content, before text is split on "|", "/", etc, +# and before E<...> codes are expanded. + + +# For L<...> codes without a "name|" part, only E<...> and Z<> codes may +# occur -- no other formatting codes. That is, authors should not use +# "L>". +# +# Note, however, that formatting codes and Z<>'s can occur in any and all +# parts of an L<...> (i.e., in name, section, text, and url). + +sub _treat_Ls { # Process our dear dear friends, the L<...> sequences + + # L + # L or L + # L or L or L<"sec"> + # L + # L or L + # L or L or L + # L + + my($self,@stack) = @_; + + my($i, $treelet); + my $start_line = $stack[0][1]{'start_line'}; + + # A recursive algorithm implemented iteratively! Whee! + + while($treelet = shift @stack) { + for(my $i = 2; $i < @$treelet; ++$i) { + # iterate over children of current tree node + next unless ref $treelet->[$i]; # text nodes are uninteresting + unless($treelet->[$i][0] eq 'L') { + unshift @stack, $treelet->[$i]; # recurse + next; + } + + + # By here, $treelet->[$i] is definitely an L node + DEBUG > 1 and print "Ogling L node $treelet->[$i]\n"; + + # bitch if it's empty + if( @{$treelet->[$i]} == 2 + or (@{$treelet->[$i]} == 3 and $treelet->[$i][2] eq '') + ) { + $self->whine( $start_line, "An empty L<>" ); + $treelet->[$i] = 'L<>'; # just make it a text node + next; # and move on + } + + # Catch URLs: + # URLs can, alas, contain E<...> sequences, so we can't /assume/ + # that this is one text node. But it has to START with one text + # node... + if(! ref $treelet->[$i][2] and + $treelet->[$i][2] =~ m/^\w+:[^:\s]\S*$/s + ) { + $treelet->[$i][1]{'type'} = 'url'; + $treelet->[$i][1]{'content-implicit'} = 'yes'; + + # TODO: deal with rel: URLs here? + + if( 3 == @{ $treelet->[$i] } ) { + # But if it IS just one text node (most common case) + DEBUG > 1 and printf qq{Catching "%s as " as ho-hum L link.\n}, + $treelet->[$i][2] + ; + $treelet->[$i][1]{'to'} = Pod::Simple::LinkSection->new( + $treelet->[$i][2] + ); # its own treelet + } else { + # It's a URL but complex (like "Lbar>"). Feh. + #$treelet->[$i][1]{'to'} = [ @{$treelet->[$i]} ]; + #splice @{ $treelet->[$i][1]{'to'} }, 0,2; + #DEBUG > 1 and printf qq{Catching "%s as " as complex L link.\n}, + # join '~', @{$treelet->[$i][1]{'to' }}; + + $treelet->[$i][1]{'to'} = Pod::Simple::LinkSection->new( + $treelet->[$i] # yes, clone the whole content as a treelet + ); + $treelet->[$i][1]{'to'}[0] = ''; # set the copy's tagname to nil + die "SANITY FAILURE" if $treelet->[0] eq ''; # should never happen! + DEBUG > 1 and print + qq{Catching "$treelet->[$i][1]{'to'}" as a complex L link.\n}; + } + + next; # and move on + } + + + # Catch some very simple and/or common cases + if(@{$treelet->[$i]} == 3 and ! ref $treelet->[$i][2]) { + my $it = $treelet->[$i][2]; + if($it =~ m/^[-a-zA-Z0-9]+\([-a-zA-Z0-9]+\)$/s) { # man sections + # Hopefully neither too broad nor too restrictive a RE + DEBUG > 1 and print "Catching \"$it\" as manpage link.\n"; + $treelet->[$i][1]{'type'} = 'man'; + # This's the only place where man links can get made. + $treelet->[$i][1]{'content-implicit'} = 'yes'; + $treelet->[$i][1]{'to' } = + Pod::Simple::LinkSection->new( $it ); # treelet! + + next; + } + if($it =~ m/^[^\/\|,\$\%\@\ \"\<\>\:\#\&\*\{\}\[\]\(\)]+(\:\:[^\/\|,\$\%\@\ \"\<\>\:\#\&\*\{\}\[\]\(\)]+)*$/s) { + # Extremely forgiving idea of what constitutes a bare + # modulename link like L or even L + DEBUG > 1 and print "Catching \"$it\" as ho-hum L link.\n"; + $treelet->[$i][1]{'type'} = 'pod'; + $treelet->[$i][1]{'content-implicit'} = 'yes'; + $treelet->[$i][1]{'to' } = + Pod::Simple::LinkSection->new( $it ); # treelet! + next; + } + # else fall thru... + } + + + + # ...Uhoh, here's the real L<...> parsing stuff... + # "With the ill behavior, with the ill behavior, with the ill behavior..." + + DEBUG > 1 and print "Running a real parse on this non-trivial L\n"; + + + my $link_text; # set to an arrayref if found + my $ell = $treelet->[$i]; + my @ell_content = @$ell; + splice @ell_content,0,2; # Knock off the 'L' and {} bits + + DEBUG > 3 and print " Ell content to start: ", + pretty(@ell_content), "\n"; + + + # Look for the "|" -- only in CHILDREN (not all underlings!) + # Like L + DEBUG > 3 and + print " Peering at L content for a '|' ...\n"; + for(my $j = 0; $j < @ell_content; ++$j) { + next if ref $ell_content[$j]; + DEBUG > 3 and + print " Peering at L-content text bit \"$ell_content[$j]\" for a '|'.\n"; + + if($ell_content[$j] =~ m/^([^\|]*)\|(.*)$/s) { + my @link_text = ($1); # might be 0-length + $ell_content[$j] = $2; # might be 0-length + + DEBUG > 3 and + print " FOUND a '|' in it. Splitting into [$1] + [$2]\n"; + + unshift @link_text, splice @ell_content, 0, $j; + # leaving only things at J and after + @ell_content = grep ref($_)||length($_), @ell_content ; + $link_text = [grep ref($_)||length($_), @link_text ]; + DEBUG > 3 and printf + " So link text is %s\n and remaining ell content is %s\n", + pretty($link_text), pretty(@ell_content); + last; + } + } + + + # Now look for the "/" -- only in CHILDREN (not all underlings!) + # And afterward, anything left in @ell_content will be the raw name + # Like L + my $section_name; # set to arrayref if found + DEBUG > 3 and print " Peering at L-content for a '/' ...\n"; + for(my $j = 0; $j < @ell_content; ++$j) { + next if ref $ell_content[$j]; + DEBUG > 3 and + print " Peering at L-content text bit \"$ell_content[$j]\" for a '/'.\n"; + + if($ell_content[$j] =~ m/^([^\/]*)\/(.*)$/s) { + my @section_name = ($2); # might be 0-length + $ell_content[$j] = $1; # might be 0-length + + DEBUG > 3 and + print " FOUND a '/' in it.", + " Splitting to page [...$1] + section [$2...]\n"; + + push @section_name, splice @ell_content, 1+$j; + # leaving only things before and including J + + @ell_content = grep ref($_)||length($_), @ell_content ; + @section_name = grep ref($_)||length($_), @section_name ; + + # Turn L<.../"foo"> into L<.../foo> + if(@section_name + and !ref($section_name[0]) and !ref($section_name[-1]) + and $section_name[ 0] =~ m/^\"/s + and $section_name[-1] =~ m/\"$/s + and !( # catch weird degenerate case of L<"> ! + @section_name == 1 and $section_name[0] eq '"' + ) + ) { + $section_name[ 0] =~ s/^\"//s; + $section_name[-1] =~ s/\"$//s; + DEBUG > 3 and + print " Quotes removed: ", pretty(@section_name), "\n"; + } else { + DEBUG > 3 and + print " No need to remove quotes in ", pretty(@section_name), "\n"; + } + + $section_name = \@section_name; + last; + } + } + + # Turn L<"Foo Bar"> into L + if(!$section_name and @ell_content + and !ref($ell_content[0]) and !ref($ell_content[-1]) + and $ell_content[ 0] =~ m/^\"/s + and $ell_content[-1] =~ m/\"$/s + and !( # catch weird degenerate case of L<"> ! + @ell_content == 1 and $ell_content[0] eq '"' + ) + ) { + $section_name = [splice @ell_content]; + $section_name->[ 0] =~ s/^\"//s; + $section_name->[-1] =~ s/\"$//s; + } + + # Turn L into L. + if(!$section_name and !$link_text and @ell_content + and grep !ref($_) && m/ /s, @ell_content + ) { + $section_name = [splice @ell_content]; + # That's support for the now-deprecated syntax. + # (Maybe generate a warning eventually?) + # Note that it deliberately won't work on L<...|Foo Bar> + } + + + # Now make up the link_text + # L -> L + # L -> L<"Bar"|Bar> + # L -> L<"Bar" in Foo/Foo> + unless($link_text) { + $ell->[1]{'content-implicit'} = 'yes'; + $link_text = []; + push @$link_text, '"', @$section_name, '"' if $section_name; + + if(@ell_content) { + $link_text->[-1] .= ' in ' if $section_name; + push @$link_text, @ell_content; + } + } + + + # And the E resolver will have to deal with all our treeletty things: + + if(@ell_content == 1 and !ref($ell_content[0]) + and $ell_content[0] =~ m/^[-a-zA-Z0-9]+\([-a-zA-Z0-9]+\)$/s + ) { + $ell->[1]{'type'} = 'man'; + DEBUG > 3 and print "Considering this ($ell_content[0]) a man link.\n"; + } else { + $ell->[1]{'type'} = 'pod'; + DEBUG > 3 and print "Considering this a pod link (not man or url).\n"; + } + + if( defined $section_name ) { + $ell->[1]{'section'} = Pod::Simple::LinkSection->new( + ['', {}, @$section_name] + ); + DEBUG > 3 and print "L-section content: ", pretty($ell->[1]{'section'}), "\n"; + } + + if( @ell_content ) { + $ell->[1]{'to'} = Pod::Simple::LinkSection->new( + ['', {}, @ell_content] + ); + DEBUG > 3 and print "L-to content: ", pretty($ell->[1]{'to'}), "\n"; + } + + # And update children to be the link-text: + @$ell = (@$ell[0,1], defined($link_text) ? splice(@$link_text) : ''); + + DEBUG > 2 and print "End of L-parsing for this node $treelet->[$i]\n"; + + unshift @stack, $treelet->[$i]; # might as well recurse + } + } + + return; +} + +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + +sub _treat_Es { + my($self,@stack) = @_; + + my($i, $treelet, $content, $replacer, $charnum); + my $start_line = $stack[0][1]{'start_line'}; + + # A recursive algorithm implemented iteratively! Whee! + + + # Has frightening side effects on L nodes' attributes. + + #my @ells_to_tweak; + + while($treelet = shift @stack) { + for(my $i = 2; $i < @$treelet; ++$i) { # iterate over children + next unless ref $treelet->[$i]; # text nodes are uninteresting + if($treelet->[$i][0] eq 'L') { + # SPECIAL STUFF for semi-processed L<>'s + + my $thing; + foreach my $attrname ('section', 'to') { + if(defined($thing = $treelet->[$i][1]{$attrname}) and ref $thing) { + unshift @stack, $thing; + DEBUG > 2 and print " Enqueuing ", + pretty( $treelet->[$i][1]{$attrname} ), + " as an attribute value to tweak.\n"; + } + } + + unshift @stack, $treelet->[$i]; # recurse + next; + } elsif($treelet->[$i][0] ne 'E') { + unshift @stack, $treelet->[$i]; # recurse + next; + } + + DEBUG > 1 and print "Ogling E node ", pretty($treelet->[$i]), "\n"; + + # bitch if it's empty + if( @{$treelet->[$i]} == 2 + or (@{$treelet->[$i]} == 3 and $treelet->[$i][2] eq '') + ) { + $self->whine( $start_line, "An empty E<>" ); + $treelet->[$i] = 'E<>'; # splice in a literal + next; + } + + # bitch if content is weird + unless(@{$treelet->[$i]} == 3 and !ref($content = $treelet->[$i][2])) { + $self->whine( $start_line, "An E<...> surrounding strange content" ); + $replacer = $treelet->[$i]; # scratch + splice(@$treelet, $i, 1, # fake out a literal + 'E<', + splice(@$replacer,2), # promote its content + '>' + ); + # Don't need to do --$i, as the 'E<' we just added isn't interesting. + next; + } + + DEBUG > 1 and print "Ogling E<$content>\n"; + + $charnum = Pod::Escapes::e2charnum($content); + DEBUG > 1 and print " Considering E<$content> with char ", + defined($charnum) ? $charnum : "undef", ".\n"; + + if(!defined( $charnum )) { + DEBUG > 1 and print "I don't know how to deal with E<$content>.\n"; + $self->whine( $start_line, "Unknown E content in E<$content>" ); + $replacer = "E<$content>"; # better than nothing + } elsif($charnum >= 255 and !UNICODE) { + $replacer = ASCII ? "\xA4" : "?"; + DEBUG > 1 and print "This Perl version can't handle ", + "E<$content> (chr $charnum), so replacing with $replacer\n"; + } else { + $replacer = Pod::Escapes::e2char($content); + DEBUG > 1 and print " Replacing E<$content> with $replacer\n"; + } + + splice(@$treelet, $i, 1, $replacer); # no need to back up $i, tho + } + } + + return; +} + + +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + +sub _treat_Ss { + my($self,$treelet) = @_; + + _change_S_to_nbsp($treelet,0) if $self->{'nbsp_for_S'}; + + # TODO: or a change_nbsp_to_S + # Normalizing nbsp's to S is harder: for each text node, make S content + # out of anything matching m/([^ \xA0]*(?:\xA0+[^ \xA0]*)+)/ + + + return; +} + + +sub _change_S_to_nbsp { # a recursive function + # Sanely assumes that the top node in the excursion won't be an S node. + my($treelet, $in_s) = @_; + + my $is_s = ('S' eq $treelet->[0]); + $in_s ||= $is_s; # So in_s is on either by this being an S element, + # or by an ancestor being an S element. + + for(my $i = 2; $i < @$treelet; ++$i) { + if(ref $treelet->[$i]) { + if( _change_S_to_nbsp( $treelet->[$i], $in_s ) ) { + my $to_pull_up = $treelet->[$i]; + splice @$to_pull_up,0,2; # ...leaving just its content + splice @$treelet, $i, 1, @$to_pull_up; # Pull up content + $i += @$to_pull_up - 1; # Make $i skip the pulled-up stuff + } + } else { + $treelet->[$i] =~ s/\s/\xA0/g if ASCII and $in_s; + # (If not in ASCIIland, we can't assume that \xA0 == nbsp.) + + # Note that if you apply nbsp_for_S to text, and so turn + # "foo S quux" into "foo bar faz quux", you + # end up with something that fails to say "and don't hyphenate + # any part of 'bar baz'". However, hyphenation is such a vexing + # problem anyway, that most Pod renderers just don't render it + # at all. But if you do want to implement hyphenation, I guess + # that you'd better have nbsp_for_S off. + } + } + + return $is_s; +} + +#----------------------------------------------------------------------------- + +sub _accessorize { # A simple-minded method-maker + no strict 'refs'; + foreach my $attrname (@_) { + next if $attrname =~ m/::/; # a hack + *{caller() . '::' . $attrname} = sub { + use strict; + $Carp::CarpLevel = 1, Carp::croak( + "Accessor usage: \$obj->$attrname() or \$obj->$attrname(\$new_value)" + ) unless (@_ == 1 or @_ == 2) and ref $_[0]; + (@_ == 1) ? $_[0]->{$attrname} + : ($_[0]->{$attrname} = $_[1]); + }; + } + # Ya know, they say accessories make the ensemble! + return; +} + +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . +#============================================================================= + +sub filter { + my($class, $source) = @_; + my $new = $class->new; + $new->output_fh(*STDOUT{IO}); + + if(ref($source || '') eq 'SCALAR') { + $new->parse_string_document( $$source ); + } elsif(ref($source)) { # it's a file handle + $new->parse_file($source); + } else { # it's a filename + $new->parse_file($source); + } + + return $new; +} + + +#----------------------------------------------------------------------------- + +sub _out { + # For use in testing: Class->_out($source) + # returns the transformation of $source + + my $class = shift(@_); + + my $mutor = shift(@_) if @_ and ref($_[0] || '') eq 'CODE'; + + DEBUG and print "\n\n", '#' x 76, + "\nAbout to parse source: {{\n$_[0]\n}}\n\n"; + + + my $parser = $class->new; + $parser->hide_line_numbers(1); + + my $out = ''; + $parser->output_string( \$out ); + DEBUG and print " _out to ", \$out, "\n"; + + $mutor->($parser) if $mutor; + + $parser->parse_string_document( $_[0] ); + # use Data::Dumper; print Dumper($parser), "\n"; + return $out; +} + + +sub _duo { + # For use in testing: Class->_duo($source1, $source2) + # returns the parse trees of $source1 and $source2. + # Good in things like: &ok( Class->duo(... , ...) ); + + my $class = shift(@_); + + Carp::croak "But $class->_duo is useful only in list context!" + unless wantarray; + + my $mutor = shift(@_) if @_ and ref($_[0] || '') eq 'CODE'; + + Carp::croak "But $class->_duo takes two parameters, not: @_" + unless @_ == 2; + + my(@out); + + while( @_ ) { + my $parser = $class->new; + + push @out, ''; + $parser->output_string( \( $out[-1] ) ); + + DEBUG and print " _duo out to ", $parser->output_string(), + " = $parser->{'output_string'}\n"; + + $parser->hide_line_numbers(1); + $mutor->($parser) if $mutor; + $parser->parse_string_document( shift( @_ ) ); + # use Data::Dumper; print Dumper($parser), "\n"; + } + + return @out; +} + + + +#----------------------------------------------------------------------------- +1; +__END__ + +TODO: +A start_formatting_code and end_formatting_code methods, which in the +base class call start_L, end_L, start_C, end_C, etc., if they are +defined. + +have the POD FORMATTING ERRORS section note the localtime, and the +version of Pod::Simple. + +option to delete all Es? +option to scream if under-0x20 literals are found in the input, or +under-E<32> E codes are found in the tree. And ditto \x7f-\x9f + +Option to turn highbit characters into their compromised form? (applies +to E parsing too) + +TODO: BOM/encoding things. + +TODO: ascii-compat things in the XML classes? + diff --git a/lib/Pod/Simple.pod b/lib/Pod/Simple.pod new file mode 100644 index 0000000..b0a8a6f --- /dev/null +++ b/lib/Pod/Simple.pod @@ -0,0 +1,218 @@ + +=head1 NAME + +Pod::Simple - framework for parsing Pod + +=head1 SYNOPSIS + + TODO + +=head1 DESCRIPTION + +Pod::Simple is a Perl library for parsing text in the Pod ("plain old +documentation") markup language that is typically used for writing +documentation for Perl and for Perl modules. The Pod format is explained +in the L man page; the most common formatter is called +"perldoc". + +Pod formatters can use Pod::Simple to parse Pod documents into produce +renderings of them in plain ASCII, in HTML, or in any number of other +formats. Typically, such formatters will be subclasses of Pod::Simple, +and so they will inherit its methods, like C. + +If you're reading this document just because you have a Pod-processing +subclass that you want to use, this document (plus the documentation for +the subclass) is probably all you'll need to read. + +If you're reading this document because you want to write a formatter +subclass, continue reading this document, and then read +L, and then possibly even read L +(some of which is for parser-writers, but much of which is notes to +formatter-writers). + + +=head1 MAIN METHODS + + + +=over + +=item C<< $parser = I->new(); >> + +This returns a new parser object, where I> is a subclass +of Pod::Simple. + +=item C<< $parser->output_fh( *OUT ); >> + +This sets the filehandle that C<$parser>'s output will be written to. +You can pass C<*STDOUT>, otherwise you should probably do something +like this: + + my $outfile = "output.txt"; + open TXTOUT, ">$outfile" or die "Can't write to $outfile: $!"; + $parser->output_fh(*TXTOUT); + +...before you call one of the C<< $parser->parse_I >> methods. + +=item C<< $parser->output_string( \$somestring ); >> + +This sets the string that C<$parser>'s output will be sent to, +instead of any filehandle. + + +=item C<< $parser->parse_file( I<$some_filename> ); >> + +=item C<< $parser->parse_file( *INPUT_FH ); >> + +This reads the Pod content of the file (or filehandle) that you specify, +and processes it with that C<$parser> object, according to however +C<$parser>'s class works, and according to whatever parser options you +have set up for this C<$parser> object. + +=item C<< $parser->parse_string_document( I<$all_content> ); >> + +This works just like C except that it reads the Pod +content not from a file, but from a string that you have already +in memory. + +=item C<< $parser->parse_lines( I<...@lines...>, undef ); >> + +This processes the lines in C<@lines> (where each list item must be a +defined value, and must contain exactly one line of content -- so no +items like C<"foo\nbar"> are allowed). The final C is used to +indicate the end of document being parsed. + +The other C> methods are meant to be called only once +per C<$parser> object; but C can be called as many times per +C<$parser> object as you want, as long as the last call (and only +the last call) ends with an C value. + + +=item C<< $parser->content_seen >> + +This returns true only if there has been any real content seen +for this document. + + +=item C<< I->filter( I<$filename> ); >> + +=item C<< I->filter( I<*INPUT_FH> ); >> + +=item C<< I->filter( I<\$document_content> ); >> + +This is a shortcut method for creating a new parser object, setting the +output handle to STDOUT, and then processing the specified file (or +filehandle, or in-memory document). This is handy for one-liners like +this: + + perl -MPod::Simple::Text -e "Pod::Simple::Text->filter('thingy.pod')" + +=back + + + +=head1 SECONDARY METHODS + +Some of these methods might be of interest to general users, as +well as of interest to formatter-writers. + +Note that the general pattern here is that the accessor-methods +read the attribute's value with C<< $value = $parser->I >> +and set the attribute's value with +C<< $parser->I(I) >>. For each accessor, I typically +only mention one syntax or another, based on which I think you are actually +most likely to use. + + +=over + +=item C<< $parser->no_whining( I ) >> + +If you set this attribute to a true value, you will suppress the +parser's complaints about irregularities in the Pod coding. By default, +this attribute's value is false, meaning that irregularities will +be reported. + +Note that turning this attribute to true won't suppress one or two kinds +of complaints about rarely occurring unrecoverable errors. + + +=item C<< $parser->no_errata_section( I ) >> + +If you set this attribute to a true value, you will stop the parser from +generating a "POD ERRORS" section at the end of the document. By +default, this attribute's value is false, meaning that an errata section +will be generated, as necessary. + + +=item C<< $parser->complain_stderr( I ) >> + +If you set this attribute to a true value, it will send reports of +parsing errors to STDERR. By default, this attribute's value is false, +meaning that no output is sent to STDERR. + +Note that errors can be noted in an errata section, or sent to STDERR, +or both, or neither. So don't think that turning on C +will turn off C or vice versa -- these are +independent attributes. + + +=item C<< $parser->source_filename >> + +This returns the filename that this parser object was set to read from. + + +=item C<< $parser->doc_has_started >> + +This returns true if C<$parser> has read from a source, and has seen +Pod content in it. + + +=item C<< $parser->source_dead >> + +This returns true if C<$parser> has read from a source, and come to the +end of that source. + +=back + + +=head1 CAVEATS + +This is just a beta release -- there are a good number of things still +left to do. Notably, support for EBCDIC platforms is still half-done, +an untested. + + +=head1 SEE ALSO + +L + +L + +L + +L + +L + + +=head1 COPYRIGHT AND DISCLAIMERS + +Copyright (c) 2002 Sean M. Burke. All rights reserved. + +This library is free software; you can redistribute it and/or modify it +under the same terms as Perl itself. + +This program is distributed in the hope that it will be useful, but +without any warranty; without even the implied warranty of +merchantability or fitness for a particular purpose. + +=head1 AUTHOR + +Original author: Sean M. Burke C + +Maintained by: Allison Randal C + +=cut + + diff --git a/lib/Pod/Simple/BlackBox.pm b/lib/Pod/Simple/BlackBox.pm new file mode 100644 index 0000000..12eba58 --- /dev/null +++ b/lib/Pod/Simple/BlackBox.pm @@ -0,0 +1,1905 @@ + +package Pod::Simple::BlackBox; +# +# "What's in the box?" "Pain." +# +########################################################################### +# +# This is where all the scary things happen: parsing lines into +# paragraphs; and then into directives, verbatims, and then also +# turning formatting sequences into treelets. +# +# Are you really sure you want to read this code? +# +#----------------------------------------------------------------------------- +# +# The basic work of this module Pod::Simple::BlackBox is doing the dirty work +# of parsing Pod into treelets (generally one per non-verbatim paragraph), and +# to call the proper callbacks on the treelets. +# +# Every node in a treelet is a ['name', {attrhash}, ...children...] + +use integer; # vroom! +use strict; +use Carp (); +BEGIN { + require Pod::Simple; + *DEBUG = \&Pod::Simple::DEBUG unless defined &DEBUG +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub parse_line { shift->parse_lines(@_) } # alias + +# - - - Turn back now! Run away! - - - + +sub parse_lines { # Usage: $parser->parse_lines(@lines) + # an undef means end-of-stream + my $self = shift; + + my $code_handler = $self->{'code_handler'}; + my $cut_handler = $self->{'cut_handler'}; + $self->{'line_count'} ||= 0; + + my $scratch; + + DEBUG > 4 and + print "# Parsing starting at line ", $self->{'line_count'}, ".\n"; + + DEBUG > 5 and + print "# About to parse lines: ", + join(' ', map defined($_) ? "[$_]" : "EOF", @_), "\n"; + + my $paras = ($self->{'paras'} ||= []); + # paragraph buffer. Because we need to defer processing of =over + # directives and verbatim paragraphs. We call _ponder_paragraph_buffer + # to process this. + + $self->{'pod_para_count'} ||= 0; + + my $line; + foreach my $source_line (@_) { + if( $self->{'source_dead'} ) { + DEBUG > 4 and print "# Source is dead.\n"; + last; + } + + unless( defined $source_line ) { + DEBUG > 4 and print "# Undef-line seen.\n"; + + push @$paras, ['~end', {'start_line' => $self->{'line_count'}}]; + push @$paras, $paras->[-1], $paras->[-1]; + # So that it definitely fills the buffer. + $self->{'source_dead'} = 1; + $self->_ponder_paragraph_buffer; + next; + } + + + if( $self->{'line_count'}++ ) { + ($line = $source_line) =~ tr/\n\r//d; + # If we don't have two vars, we'll end up with that there + # tr/// modding the (potentially read-only) original source line! + + } else { + DEBUG > 2 and print "First line: [$source_line]\n"; + + if( ($line = $source_line) =~ s/^\xEF\xBB\xBF//s ) { + DEBUG and print "UTF-8 BOM seen. Faking a '=encode utf8'.\n"; + $self->_handle_encoding_line( "=encode utf8" ); + $line =~ tr/\n\r//d; + + } elsif( $line =~ s/^\xFE\xFF//s ) { + DEBUG and print "Big-endian UTF-16 BOM seen. Aborting parsing.\n"; + $self->scream( + $self->{'line_count'}, + "UTF16-BE Byte Encoding Mark found; but Pod::Simple v$Pod::Simple::VERSION doesn't implement UTF16 yet." + ); + splice @_; + push @_, undef; + next; + + # TODO: implement somehow? + + } elsif( $line =~ s/^\xFF\xFE//s ) { + DEBUG and print "Little-endian UTF-16 BOM seen. Aborting parsing.\n"; + $self->scream( + $self->{'line_count'}, + "UTF16-LE Byte Encoding Mark found; but Pod::Simple v$Pod::Simple::VERSION doesn't implement UTF16 yet." + ); + splice @_; + push @_, undef; + next; + + # TODO: implement somehow? + + } else { + DEBUG > 2 and print "First line is BOM-less.\n"; + ($line = $source_line) =~ tr/\n\r//d; + } + } + + + DEBUG > 5 and print "# Parsing line: [$line]\n"; + + if(!$self->{'in_pod'}) { + if($line =~ m/^=([a-zA-Z]+)/s) { + if($1 eq 'cut') { + $self->scream( + $self->{'line_count'}, + "=cut found outside a pod block. Skipping to next block." + ); + + ## Before there were errata sections in the world, it was + ## least-pessimal to abort processing the file. But now we can + ## just barrel on thru (but still not start a pod block). + #splice @_; + #push @_, undef; + + next; + } else { + $self->{'in_pod'} = $self->{'start_of_pod_block'} + = $self->{'last_was_blank'} = 1; + # And fall thru to the pod-mode block further down + } + } else { + DEBUG > 5 and print "# It's a code-line.\n"; + $code_handler->(map $_, $line, $self->{'line_count'}, $self) + if $code_handler; + # Note: this may cause code to be processed out of order relative + # to pods, but in order relative to cuts. + + # Note also that we haven't yet applied the transcoding to $line + # by time we call $code_handler! + + if( $line =~ m/^#\s*line\s+(\d+)\s*(?:\s"([^"]+)")?\s*$/ ) { + # That RE is from perlsyn, section "Plain Old Comments (Not!)", + #$fname = $2 if defined $2; + #DEBUG > 1 and defined $2 and print "# Setting fname to \"$fname\"\n"; + DEBUG > 1 and print "# Setting nextline to $1\n"; + $self->{'line_count'} = $1 - 1; + } + + next; + } + } + + # . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + # Else we're in pod mode: + + # Apply any necessary transcoding: + $self->{'_transcoder'} && $self->{'_transcoder'}->($line); + + # HERE WE CATCH =encoding EARLY! + if( $line =~ m/^=encoding\s+\S+\s*$/s ) { + $line = $self->_handle_encoding_line( $line ); + } + + if($line =~ m/^=cut/s) { + # here ends the pod block, and therefore the previous pod para + DEBUG > 1 and print "Noting =cut at line ${$self}{'line_count'}\n"; + $self->{'in_pod'} = 0; + # ++$self->{'pod_para_count'}; + $self->_ponder_paragraph_buffer(); + # by now it's safe to consider the previous paragraph as done. + $cut_handler->(map $_, $line, $self->{'line_count'}, $self) + if $cut_handler; + + # TODO: add to docs: Note: this may cause cuts to be processed out + # of order relative to pods, but in order relative to code. + + } elsif($line =~ m/^\s*$/s) { # it's a blank line + if(!$self->{'start_of_pod_block'} and @$paras and $paras->[-1][0] eq '~Verbatim') { + DEBUG > 1 and print "Saving blank line at line ${$self}{'line_count'}\n"; + push @{$paras->[-1]}, $line; + } # otherwise it's not interesting + + if(!$self->{'start_of_pod_block'} and !$self->{'last_was_blank'}) { + DEBUG > 1 and print "Noting para ends with blank line at ${$self}{'line_count'}\n"; + } + + $self->{'last_was_blank'} = 1; + + } elsif($self->{'last_was_blank'}) { # A non-blank line starting a new para... + + if($line =~ m/^(=[a-zA-Z][a-zA-Z0-9]*)(?:\s+|$)(.*)/s) { + # THIS IS THE ONE PLACE WHERE WE CONSTRUCT NEW DIRECTIVE OBJECTS + my $new = [$1, {'start_line' => $self->{'line_count'}}, $2]; + # Note that in "=head1 foo", the WS is lost. + # Example: ['=head1', {'start_line' => 123}, ' foo'] + + ++$self->{'pod_para_count'}; + + $self->_ponder_paragraph_buffer(); + # by now it's safe to consider the previous paragraph as done. + + push @$paras, $new; # the new incipient paragraph + DEBUG > 1 and print "Starting new ${$paras}[-1][0] para at line ${$self}{'line_count'}\n"; + + } elsif($line =~ m/^\s/s) { + + if(!$self->{'start_of_pod_block'} and @$paras and $paras->[-1][0] eq '~Verbatim') { + DEBUG > 1 and print "Resuming verbatim para at line ${$self}{'line_count'}\n"; + push @{$paras->[-1]}, $line; + } else { + ++$self->{'pod_para_count'}; + $self->_ponder_paragraph_buffer(); + # by now it's safe to consider the previous paragraph as done. + DEBUG > 1 and print "Starting verbatim para at line ${$self}{'line_count'}\n"; + push @$paras, ['~Verbatim', {'start_line' => $self->{'line_count'}}, $line]; + } + } else { + ++$self->{'pod_para_count'}; + $self->_ponder_paragraph_buffer(); + # by now it's safe to consider the previous paragraph as done. + push @$paras, ['~Para', {'start_line' => $self->{'line_count'}}, $line]; + DEBUG > 1 and print "Starting plain para at line ${$self}{'line_count'}\n"; + } + $self->{'last_was_blank'} = $self->{'start_of_pod_block'} = 0; + + } else { + # It's a non-blank line /continuing/ the current para + if(@$paras) { + DEBUG > 2 and print "Line ${$self}{'line_count'} continues current paragraph\n"; + push @{$paras->[-1]}, $line; + } else { + # Unexpected case! + die "Continuing a paragraph but \@\$paras is empty?"; + } + $self->{'last_was_blank'} = $self->{'start_of_pod_block'} = 0; + } + + } # ends the big while loop + + DEBUG > 1 and print(pretty(@$paras), "\n"); + return $self; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub _handle_encoding_line { + my($self, $line) = @_; + + # The point of this routine is to set $self->{'_transcoder'} as indicated. + + return $line unless $line =~ m/^=encoding\s+(\S+)\s*$/s; + DEBUG > 1 and print "Found an encoding line \"=encoding $1\"\n"; + + my $e = $1; + my $orig = $e; + push @{ $self->{'encoding_command_reqs'} }, "=encoding $orig"; + + my $enc_error; + + # Cf. perldoc Encode and perldoc Encode::Supported + + require Pod::Simple::Transcode; + + if( $self->{'encoding'} ) { + my $norm_current = $self->{'encoding'}; + my $norm_e = $e; + foreach my $that ($norm_current, $norm_e) { + $that = lc($that); + $that =~ s/[-_]//g; + } + if($norm_current eq $norm_e) { + DEBUG > 1 and print "The '=encoding $orig' line is ", + "redundant. ($norm_current eq $norm_e). Ignoring.\n"; + $enc_error = ''; + # But that doesn't necessarily mean that the earlier one went okay + } else { + $enc_error = "Encoding is already set to " . $self->{'encoding'}; + DEBUG > 1 and print $enc_error; + } + } elsif ( + # OK, let's turn on the encoding + do { + DEBUG > 1 and print " Setting encoding to $e\n"; + $self->{'encoding'} = $e; + 1; + } + and $e eq 'HACKRAW' + ) { + DEBUG and print " Putting in HACKRAW (no-op) encoding mode.\n"; + + } elsif( Pod::Simple::Transcode::->encoding_is_available($e) ) { + + die($enc_error = "WHAT? _transcoder is already set?!") + if $self->{'_transcoder'}; # should never happen + require Pod::Simple::Transcode; + $self->{'_transcoder'} = Pod::Simple::Transcode::->make_transcoder($e); + eval { + my @x = ('', "abc", "123"); + $self->{'_transcoder'}->(@x); + }; + $@ && die( $enc_error = + "Really unexpected error setting up encoding $e: $@\nAborting" + ); + + } else { + my @supported = Pod::Simple::Transcode::->all_encodings; + + # Note unsupported, and complain + DEBUG and print " Encoding [$e] is unsupported.", + "\nSupporteds: @supported\n"; + my $suggestion = ''; + + # Look for a near match: + my $norm = lc($e); + $norm =~ tr[-_][]d; + my $n; + foreach my $enc (@supported) { + $n = lc($enc); + $n =~ tr[-_][]d; + next unless $n eq $norm; + $suggestion = " (Maybe \"$e\" should be \"$enc\"?)"; + last; + } + my $encmodver = Pod::Simple::Transcode::->encmodver; + $enc_error = join '' => + "This document probably does not appear as it should, because its ", + "\"=encoding $e\" line calls for an unsupported encoding.", + $suggestion, " [$encmodver\'s supported encodings are: @supported]" + ; + + $self->scream( $self->{'line_count'}, $enc_error ); + } + push @{ $self->{'encoding_command_statuses'} }, $enc_error; + + return '=encoding ALREADYDONE'; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _handle_encoding_second_level { + # By time this is called, the encoding (if well formed) will already + # have been acted one. + my($self, $para) = @_; + my @x = @$para; + my $content = join ' ', splice @x, 2; + $content =~ s/^\s+//s; + $content =~ s/\s+$//s; + + DEBUG > 2 and print "Ogling encoding directive: =encoding $content\n"; + + if($content eq 'ALREADYDONE') { + # It's already been handled. Check for errors. + if(! $self->{'encoding_command_statuses'} ) { + DEBUG > 2 and print " CRAZY ERROR: It wasn't really handled?!\n"; + } elsif( $self->{'encoding_command_statuses'}[-1] ) { + $self->whine( $para->[1]{'start_line'}, + sprintf "Couldn't do %s: %s", + $self->{'encoding_command_reqs' }[-1], + $self->{'encoding_command_statuses'}[-1], + ); + } else { + DEBUG > 2 and print " (Yup, it was successfully handled already.)\n"; + } + + } else { + # Otherwise it's a syntax error + $self->whine( $para->[1]{'start_line'}, + "Invalid =encoding syntax: $content" + ); + } + + return; +} + +#~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~`~` + +{ +my $m = -321; # magic line number + +sub _gen_errata { + my $self = $_[0]; + # Return 0 or more fake-o paragraphs explaining the accumulated + # errors on this document. + + return() unless $self->{'errata'} and keys %{$self->{'errata'}}; + + my @out; + + foreach my $line (sort {$a <=> $b} keys %{$self->{'errata'}}) { + push @out, + ['=item', {'start_line' => $m}, "Around line $line:"], + map( ['~Para', {'start_line' => $m, '~cooked' => 1}, + #['~Top', {'start_line' => $m}, + $_ + #] + ], + @{$self->{'errata'}{$line}} + ) + ; + } + + # TODO: report of unknown entities? unrenderable characters? + + unshift @out, + ['=head1', {'start_line' => $m, 'errata' => 1}, 'POD ERRORS'], + ['~Para', {'start_line' => $m, '~cooked' => 1, 'errata' => 1}, + "Hey! ", + ['B', {}, + 'The above document had some coding errors, which are explained below:' + ] + ], + ['=over', {'start_line' => $m, 'errata' => 1}, ''], + ; + + push @out, + ['=back', {'start_line' => $m, 'errata' => 1}, ''], + ; + + DEBUG and print "\n<<\n", pretty(\@out), "\n>>\n\n"; + + return @out; +} + +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +############################################################################## +## +## stop reading now stop reading now stop reading now stop reading now stop +## +## HERE IT BECOMES REALLY SCARY +## +## stop reading now stop reading now stop reading now stop reading now stop +## +############################################################################## + +sub _ponder_paragraph_buffer { + + # Para-token types as found in the buffer. + # ~Verbatim, ~Para, ~end, =head1..4, =for, =begin, =end, + # =over, =back, =item + # and the null =pod (to be complained about if over one line) + # + # "~data" paragraphs are something we generate at this level, depending on + # a currently open =over region + + # Events fired: Begin and end for: + # directivename (like head1 .. head4), item, extend, + # for (from =begin...=end, =for), + # over-bullet, over-number, over-text, over-block, + # item-bullet, item-number, item-text, + # Document, + # Data, Para, Verbatim + # B, C, longdirname (TODO -- wha?), etc. for all directives + # + + my $self = $_[0]; + my $paras; + return unless @{$paras = $self->{'paras'}}; + my $curr_open = ($self->{'curr_open'} ||= []); + + my $scratch; + + DEBUG > 10 and print "# Paragraph buffer: <<", pretty($paras), ">>\n"; + + # We have something in our buffer. So apparently the document has started. + unless($self->{'doc_has_started'}) { + $self->{'doc_has_started'} = 1; + + my $starting_contentless; + $starting_contentless = + ( + !@$curr_open + and @$paras and ! grep $_->[0] ne '~end', @$paras + # i.e., if the paras is all ~ends + ) + ; + DEBUG and print "# Starting ", + $starting_contentless ? 'contentless' : 'contentful', + " document\n" + ; + + $self->_handle_element_start( + ($scratch = 'Document'), + { + 'start_line' => $paras->[0][1]{'start_line'}, + $starting_contentless ? ( 'contentless' => 1 ) : (), + }, + ); + } + + my($para, $para_type); + while(@$paras) { + last if @$paras == 1 and + ( $paras->[0][0] eq '=over' or $paras->[0][0] eq '~Verbatim' + or $paras->[0][0] eq '=item' ) + ; + # Those're the three kinds of paragraphs that require lookahead. + # Actually, an "=item Foo" inside an region + # and any =item inside an region (rare) + # don't require any lookahead, but all others (bullets + # and numbers) do. + +# TODO: winge about many kinds of directives in non-resolving =for regions? +# TODO: many? like what? =head1 etc? + + $para = shift @$paras; + $para_type = $para->[0]; + + DEBUG > 1 and print "Pondering a $para_type paragraph, given the stack: (", + $self->_dump_curr_open(), ")\n"; + + if($para_type eq '=for') { + next if $self->_ponder_for($para,$curr_open,$paras); + + } elsif($para_type eq '=begin') { + next if $self->_ponder_begin($para,$curr_open,$paras); + + } elsif($para_type eq '=end') { + next if $self->_ponder_end($para,$curr_open,$paras); + + } elsif($para_type eq '~end') { # The virtual end-document signal + next if $self->_ponder_doc_end($para,$curr_open,$paras); + } + + + # ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ + #~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ + if(grep $_->[1]{'~ignore'}, @$curr_open) { + DEBUG > 1 and + print "Skipping $para_type paragraph because in ignore mode.\n"; + next; + } + #~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ + # ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ + + if($para_type eq '=pod') { + $self->_ponder_pod($para,$curr_open,$paras); + + } elsif($para_type eq '=over') { + next if $self->_ponder_over($para,$curr_open,$paras); + + } elsif($para_type eq '=back') { + next if $self->_ponder_back($para,$curr_open,$paras); + + } else { + + # All non-magical codes!!! + + # Here we start using $para_type for our own twisted purposes, to + # mean how it should get treated, not as what the element name + # should be. + + DEBUG > 1 and print "Pondering non-magical $para_type\n"; + + my $i; + + # Enforce some =headN discipline + if($para_type =~ m/^=head\d$/s + and ! $self->{'accept_heads_anywhere'} + and @$curr_open + and $curr_open->[-1][0] eq '=over' + ) { + DEBUG > 2 and print "'=$para_type' inside an '=over'!\n"; + $self->whine( + $para->[1]{'start_line'}, + "You forgot a '=back' before '$para_type'" + ); + unshift @$paras, ['=back', {}, ''], $para; # close the =over + next; + } + + + if($para_type eq '=item') { + + my $over; + unless(@$curr_open and ($over = $curr_open->[-1])->[0] eq '=over') { + $self->whine( + $para->[1]{'start_line'}, + "'=item' outside of any '=over'" + ); + unshift @$paras, + ['=over', {'start_line' => $para->[1]{'start_line'}}, ''], + $para + ; + next; + } + + + my $over_type = $over->[1]{'~type'}; + + if(!$over_type) { + # Shouldn't happen1 + die "Typeless over in stack, starting at line " + . $over->[1]{'start_line'}; + + } elsif($over_type eq 'block') { + unless($curr_open->[-1][1]{'~bitched_about'}) { + $curr_open->[-1][1]{'~bitched_about'} = 1; + $self->whine( + $curr_open->[-1][1]{'start_line'}, + "You can't have =items (as at line " + . $para->[1]{'start_line'} + . ") unless the first thing after the =over is an =item" + ); + } + # Just turn it into a paragraph and reconsider it + $para->[0] = '~Para'; + unshift @$paras, $para; + next; + + } elsif($over_type eq 'text') { + my $item_type = $self->_get_item_type($para); + # That kills the content of the item if it's a number or bullet. + DEBUG and print " Item is of type ", $para->[0], " under $over_type\n"; + + if($item_type eq 'text') { + # Nothing special needs doing for 'text' + } elsif($item_type eq 'number' or $item_type eq 'bullet') { + die "Unknown item type $item_type" + unless $item_type eq 'number' or $item_type eq 'bullet'; + # Undo our clobbering: + push @$para, $para->[1]{'~orig_content'}; + delete $para->[1]{'number'}; + # Only a PROPER item-number element is allowed + # to have a number attribute. + } else { + die "Unhandled item type $item_type"; # should never happen + } + + # =item-text thingies don't need any assimilation, it seems. + + } elsif($over_type eq 'number') { + my $item_type = $self->_get_item_type($para); + # That kills the content of the item if it's a number or bullet. + DEBUG and print " Item is of type ", $para->[0], " under $over_type\n"; + + my $expected_value = ++ $curr_open->[-1][1]{'~counter'}; + + if($item_type eq 'bullet') { + # Hm, it's not numeric. Correct for this. + $para->[1]{'number'} = $expected_value; + $self->whine( + $para->[1]{'start_line'}, + "Expected '=item $expected_value'" + ); + push @$para, $para->[1]{'~orig_content'}; + # restore the bullet, blocking the assimilation of next para + + } elsif($item_type eq 'text') { + # Hm, it's not numeric. Correct for this. + $para->[1]{'number'} = $expected_value; + $self->whine( + $para->[1]{'start_line'}, + "Expected '=item $expected_value'" + ); + # Text content will still be there and will block next ~Para + + } elsif($item_type ne 'number') { + die "Unknown item type $item_type"; # should never happen + + } elsif($expected_value == $para->[1]{'number'}) { + DEBUG > 1 and print " Numeric item has the expected value of $expected_value\n"; + + } else { + DEBUG > 1 and print " Numeric item has ", $para->[1]{'number'}, + " instead of the expected value of $expected_value\n"; + $self->whine( + $para->[1]{'start_line'}, + "You have '=item " . $para->[1]{'number'} . + "' instead of the expected '=item $expected_value'" + ); + $para->[1]{'number'} = $expected_value; # correcting!! + } + + if(@$para == 2) { + # For the cases where we /didn't/ push to @$para + if($paras->[0][0] eq '~Para') { + DEBUG and print "Assimilating following ~Para content into $over_type item\n"; + push @$para, splice @{shift @$paras},2; + } else { + DEBUG and print "Can't assimilate following ", $paras->[0][0], "\n"; + push @$para, ''; # Just so it's not contentless + } + } + + + } elsif($over_type eq 'bullet') { + my $item_type = $self->_get_item_type($para); + # That kills the content of the item if it's a number or bullet. + DEBUG and print " Item is of type ", $para->[0], " under $over_type\n"; + + if($item_type eq 'bullet') { + # as expected! + + if( $para->[1]{'~_freaky_para_hack'} ) { + DEBUG and print "Accomodating '=item * Foo' tolerance hack.\n"; + push @$para, delete $para->[1]{'~_freaky_para_hack'}; + } + + } elsif($item_type eq 'number') { + $self->whine( + $para->[1]{'start_line'}, + "Expected '=item *'" + ); + push @$para, $para->[1]{'~orig_content'}; + # and block assimilation of the next paragraph + delete $para->[1]{'number'}; + # Only a PROPER item-number element is allowed + # to have a number attribute. + } elsif($item_type eq 'text') { + $self->whine( + $para->[1]{'start_line'}, + "Expected '=item *'" + ); + # But doesn't need processing. But it'll block assimilation + # of the next para. + } else { + die "Unhandled item type $item_type"; # should never happen + } + + if(@$para == 2) { + # For the cases where we /didn't/ push to @$para + if($paras->[0][0] eq '~Para') { + DEBUG and print "Assimilating following ~Para content into $over_type item\n"; + push @$para, splice @{shift @$paras},2; + } else { + DEBUG and print "Can't assimilate following ", $paras->[0][0], "\n"; + push @$para, ''; # Just so it's not contentless + } + } + + } else { + die "Unhandled =over type \"$over_type\"?"; + # Shouldn't happen! + } + + $para_type = 'Plain'; + $para->[0] .= '-' . $over_type; + # Whew. Now fall thru and process it. + + + } elsif($para_type eq '=extend') { + # Well, might as well implement it here. + $self->_ponder_extend($para); + next; # and skip + } elsif($para_type eq '=encoding') { + # Not actually acted on here, but we catch errors here. + $self->_handle_encoding_second_level($para); + + next; # and skip + } elsif($para_type eq '~Verbatim') { + $para->[0] = 'Verbatim'; + $para_type = '?Verbatim'; + } elsif($para_type eq '~Para') { + $para->[0] = 'Para'; + $para_type = '?Plain'; + } elsif($para_type eq 'Data') { + $para->[0] = 'Data'; + $para_type = '?Data'; + } elsif( $para_type =~ s/^=//s + and defined( $para_type = $self->{'accept_directives'}{$para_type} ) + ) { + DEBUG > 1 and print " Pondering known directive ${$para}[0] as $para_type\n"; + } else { + # An unknown directive! + DEBUG > 1 and printf "Unhandled directive %s (Handled: %s)\n", + $para->[0], join(' ', sort keys %{$self->{'accept_directives'}} ) + ; + $self->whine( + $para->[1]{'start_line'}, + "Unknown directive: $para->[0]" + ); + + # And maybe treat it as text instead of just letting it go? + next; + } + + if($para_type =~ s/^\?//s) { + if(! @$curr_open) { # usual case + DEBUG and print "Treating $para_type paragraph as such because stack is empty.\n"; + } else { + my @fors = grep $_->[0] eq '=for', @$curr_open; + DEBUG > 1 and print "Containing fors: ", + join(',', map $_->[1]{'target'}, @fors), "\n"; + + if(! @fors) { + DEBUG and print "Treating $para_type paragraph as such because stack has no =for's\n"; + + #} elsif(grep $_->[1]{'~resolve'}, @fors) { + #} elsif(not grep !$_->[1]{'~resolve'}, @fors) { + } elsif( $fors[-1][1]{'~resolve'} ) { + # Look to the immediately containing for + + if($para_type eq 'Data') { + DEBUG and print "Treating Data paragraph as Plain/Verbatim because the containing =for ($fors[-1][1]{'target'}) is a resolver\n"; + $para->[0] = 'Para'; + $para_type = 'Plain'; + } else { + DEBUG and print "Treating $para_type paragraph as such because the containing =for ($fors[-1][1]{'target'}) is a resolver\n"; + } + } else { + DEBUG and print "Treating $para_type paragraph as Data because the containing =for ($fors[-1][1]{'target'}) is a non-resolver\n"; + $para->[0] = $para_type = 'Data'; + } + } + } + + #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + if($para_type eq 'Plain') { + $self->_ponder_Plain($para); + } elsif($para_type eq 'Verbatim') { + $self->_ponder_Verbatim($para); + } elsif($para_type eq 'Data') { + $self->_ponder_Data($para); + } else { + die "\$para type is $para_type -- how did that happen?"; + # Shouldn't happen. + } + + #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + $para->[0] =~ s/^[~=]//s; + + DEBUG and print "\n", pretty($para), "\n"; + + # traverse the treelet (which might well be just one string scalar) + $self->{'content_seen'} ||= 1; + $self->_traverse_treelet_bit(@$para); + } + } + + return; +} + +########################################################################### +# The sub-ponderers... + + + +sub _ponder_for { + my ($self,$para,$curr_open,$paras) = @_; + + # Fake it out as a begin/end + my $target; + + if(grep $_->[1]{'~ignore'}, @$curr_open) { + DEBUG > 1 and print "Ignoring ignorable =for\n"; + return 1; + } + + for(my $i = 2; $i < @$para; ++$i) { + if($para->[$i] =~ s/^\s*(\S+)\s*//s) { + $target = $1; + last; + } + } + unless(defined $target) { + $self->whine( + $para->[1]{'start_line'}, + "=for without a target?" + ); + return 1; + } + DEBUG > 1 and + print "Faking out a =for $target as a =begin $target / =end $target\n"; + + $para->[0] = 'Data'; + + unshift @$paras, + ['=begin', + {'start_line' => $para->[1]{'start_line'}, '~really' => '=for'}, + $target, + ], + $para, + ['=end', + {'start_line' => $para->[1]{'start_line'}, '~really' => '=for'}, + $target, + ], + ; + + return 1; +} + +sub _ponder_begin { + my ($self,$para,$curr_open,$paras) = @_; + my $content = join ' ', splice @$para, 2; + $content =~ s/^\s+//s; + $content =~ s/\s+$//s; + unless(length($content)) { + $self->whine( + $para->[1]{'start_line'}, + "=begin without a target?" + ); + DEBUG and print "Ignoring targetless =begin\n"; + return 1; + } + + unless($content =~ m/^\S+$/s) { # i.e., unless it's one word + $self->whine( + $para->[1]{'start_line'}, + "'=begin' only takes one parameter, not several as in '=begin $content'" + ); + DEBUG and print "Ignoring unintelligible =begin $content\n"; + return 1; + } + + + $para->[1]{'target'} = $content; # without any ':' + + $content =~ s/^:!/!:/s; + my $neg; # whether this is a negation-match + $neg = 1 if $content =~ s/^!//s; + my $to_resolve; # whether to process formatting codes + $to_resolve = 1 if $content =~ s/^://s; + + my $dont_ignore; # whether this target matches us + + foreach my $target_name ( + split(',', $content, -1), + $neg ? () : '*' + ) { + DEBUG > 2 and + print " Considering whether =begin $content matches $target_name\n"; + next unless $self->{'accept_targets'}{$target_name}; + + DEBUG > 2 and + print " It DOES match the acceptable target $target_name!\n"; + $to_resolve = 1 + if $self->{'accept_targets'}{$target_name} eq 'force_resolve'; + $dont_ignore = 1; + $para->[1]{'target_matching'} = $target_name; + last; # stop looking at other target names + } + + if($neg) { + if( $dont_ignore ) { + $dont_ignore = ''; + delete $para->[1]{'target_matching'}; + DEBUG > 2 and print " But the leading ! means that this is a NON-match!\n"; + } else { + $dont_ignore = 1; + $para->[1]{'target_matching'} = '!'; + DEBUG > 2 and print " But the leading ! means that this IS a match!\n"; + } + } + + $para->[0] = '=for'; # Just what we happen to call these, internally + $para->[1]{'~really'} ||= '=begin'; + $para->[1]{'~ignore'} = (! $dont_ignore) || 0; + $para->[1]{'~resolve'} = $to_resolve || 0; + + DEBUG > 1 and print " Making note to ", $dont_ignore ? 'not ' : '', + "ignore contents of this region\n"; + DEBUG > 1 and $dont_ignore and print " Making note to treat contents as ", + ($to_resolve ? 'verbatim/plain' : 'data'), " paragraphs\n"; + DEBUG > 1 and print " (Stack now: ", $self->_dump_curr_open(), ")\n"; + + push @$curr_open, $para; + if(!$dont_ignore or scalar grep $_->[1]{'~ignore'}, @$curr_open) { + DEBUG > 1 and print "Ignoring ignorable =begin\n"; + } else { + $self->{'content_seen'} ||= 1; + $self->_handle_element_start((my $scratch='for'), $para->[1]); + } + + return 1; +} + +sub _ponder_end { + my ($self,$para,$curr_open,$paras) = @_; + my $content = join ' ', splice @$para, 2; + $content =~ s/^\s+//s; + $content =~ s/\s+$//s; + DEBUG and print "Ogling '=end $content' directive\n"; + + unless(length($content)) { + $self->whine( + $para->[1]{'start_line'}, + "'=end' without a target?" . ( + ( @$curr_open and $curr_open->[-1][0] eq '=for' ) + ? ( " (Should be \"=end " . $curr_open->[-1][1]{'target'} . '")' ) + : '' + ) + ); + DEBUG and print "Ignoring targetless =end\n"; + return 1; + } + + unless($content =~ m/^\S+$/) { # i.e., unless it's one word + $self->whine( + $para->[1]{'start_line'}, + "'=end $content' is invalid. (Stack: " + . $self->_dump_curr_open() . ')' + ); + DEBUG and print "Ignoring mistargetted =end $content\n"; + return 1; + } + + unless(@$curr_open and $curr_open->[-1][0] eq '=for') { + $self->whine( + $para->[1]{'start_line'}, + "=end $content without matching =begin. (Stack: " + . $self->_dump_curr_open() . ')' + ); + DEBUG and print "Ignoring mistargetted =end $content\n"; + return 1; + } + + unless($content eq $curr_open->[-1][1]{'target'}) { + $self->whine( + $para->[1]{'start_line'}, + "=end $content doesn't match =begin " + . $curr_open->[-1][1]{'target'} + . ". (Stack: " + . $self->_dump_curr_open() . ')' + ); + DEBUG and print "Ignoring mistargetted =end $content at line $para->[1]{'start_line'}\n"; + return 1; + } + + # Else it's okay to close... + if(grep $_->[1]{'~ignore'}, @$curr_open) { + DEBUG > 1 and print "Not firing any event for this =end $content because in an ignored region\n"; + # And that may be because of this to-be-closed =for region, or some + # other one, but it doesn't matter. + } else { + $curr_open->[-1][1]{'start_line'} = $para->[1]{'start_line'}; + # what's that for? + + $self->{'content_seen'} ||= 1; + $self->_handle_element_end( my $scratch = 'for' ); + } + DEBUG > 1 and print "Popping $curr_open->[-1][0] $curr_open->[-1][1]{'target'} because of =end $content\n"; + pop @$curr_open; + + return 1; +} + +sub _ponder_doc_end { + my ($self,$para,$curr_open,$paras) = @_; + if(@$curr_open) { # Deal with things left open + DEBUG and print "Stack is nonempty at end-document: (", + $self->_dump_curr_open(), ")\n"; + + DEBUG > 9 and print "Stack: ", pretty($curr_open), "\n"; + unshift @$paras, $self->_closers_for_all_curr_open; + # Make sure there is exactly one ~end in the parastack, at the end: + @$paras = grep $_->[0] ne '~end', @$paras; + push @$paras, $para, $para; + # We need two -- once for the next cycle where we + # generate errata, and then another to be at the end + # when that loop back around to process the errata. + return 1; + + } else { + DEBUG and print "Okay, stack is empty now.\n"; + } + + # Try generating errata section, if applicable + unless($self->{'~tried_gen_errata'}) { + $self->{'~tried_gen_errata'} = 1; + my @extras = $self->_gen_errata(); + if(@extras) { + unshift @$paras, @extras; + DEBUG and print "Generated errata... relooping...\n"; + return 1; # I.e., loop around again to process these fake-o paragraphs + } + } + + splice @$paras; # Well, that's that for this paragraph buffer. + DEBUG and print "Throwing end-document event.\n"; + + $self->_handle_element_end( my $scratch = 'Document' ); + return 1; # Hasta la byebye +} + +sub _ponder_pod { + my ($self,$para,$curr_open,$paras) = @_; + $self->whine( + $para->[1]{'start_line'}, + "=pod directives shouldn't be over one line long! Ignoring all " + . (@$para - 2) . " lines of content" + ) if @$para > 3; + # Content is always ignored. + return; +} + +sub _ponder_over { + my ($self,$para,$curr_open,$paras) = @_; + return 1 unless @$paras; + my $list_type; + + if($paras->[0][0] eq '=item') { # most common case + $list_type = $self->_get_initial_item_type($paras->[0]); + + } elsif($paras->[0][0] eq '=back') { + # Ignore empty lists. TODO: make this an option? + shift @$paras; + return 1; + + } elsif($paras->[0][0] eq '~end') { + $self->whine( + $para->[1]{'start_line'}, + "=over is the last thing in the document?!" + ); + return 1; # But feh, ignore it. + } else { + $list_type = 'block'; + } + $para->[1]{'~type'} = $list_type; + push @$curr_open, $para; + # yes, we reuse the paragraph as a stack item + + my $content = join ' ', splice @$para, 2; + my $overness; + if($content =~ m/^\s*$/s) { + $para->[1]{'indent'} = 4; + } elsif($content =~ m/^\s*((?:\d*\.)?\d+)\s*$/s) { + no integer; + $para->[1]{'indent'} = $1; + if($1 == 0) { + $self->whine( + $para->[1]{'start_line'}, + "Can't have a 0 in =over $content" + ); + $para->[1]{'indent'} = 4; + } + } else { + $self->whine( + $para->[1]{'start_line'}, + "=over should be: '=over' or '=over positive_number'" + ); + $para->[1]{'indent'} = 4; + } + DEBUG > 1 and print "=over found of type $list_type\n"; + + $self->{'content_seen'} ||= 1; + $self->_handle_element_start((my $scratch = 'over-' . $list_type), $para->[1]); + + return; +} + +sub _ponder_back { + my ($self,$para,$curr_open,$paras) = @_; + # TODO: fire off or or ?? + + my $content = join ' ', splice @$para, 2; + if($content =~ m/\S/) { + $self->whine( + $para->[1]{'start_line'}, + "=back doesn't take any parameters, but you said =back $content" + ); + } + + if(@$curr_open and $curr_open->[-1][0] eq '=over') { + DEBUG > 1 and print "=back happily closes matching =over\n"; + # Expected case: we're closing the most recently opened thing + #my $over = pop @$curr_open; + $self->{'content_seen'} ||= 1; + $self->_handle_element_end( my $scratch = + 'over-' . ( (pop @$curr_open)->[1]{'~type'} ) + ); + } else { + DEBUG > 1 and print "=back found without a matching =over. Stack: (", + join(', ', map $_->[0], @$curr_open), ").\n"; + $self->whine( + $para->[1]{'start_line'}, + '=back without =over' + ); + return 1; # and ignore it + } +} + +sub _ponder_item { + my ($self,$para,$curr_open,$paras) = @_; + my $over; + unless(@$curr_open and ($over = $curr_open->[-1])->[0] eq '=over') { + $self->whine( + $para->[1]{'start_line'}, + "'=item' outside of any '=over'" + ); + unshift @$paras, + ['=over', {'start_line' => $para->[1]{'start_line'}}, ''], + $para + ; + return 1; + } + + + my $over_type = $over->[1]{'~type'}; + + if(!$over_type) { + # Shouldn't happen1 + die "Typeless over in stack, starting at line " + . $over->[1]{'start_line'}; + + } elsif($over_type eq 'block') { + unless($curr_open->[-1][1]{'~bitched_about'}) { + $curr_open->[-1][1]{'~bitched_about'} = 1; + $self->whine( + $curr_open->[-1][1]{'start_line'}, + "You can't have =items (as at line " + . $para->[1]{'start_line'} + . ") unless the first thing after the =over is an =item" + ); + } + # Just turn it into a paragraph and reconsider it + $para->[0] = '~Para'; + unshift @$paras, $para; + return 1; + + } elsif($over_type eq 'text') { + my $item_type = $self->_get_item_type($para); + # That kills the content of the item if it's a number or bullet. + DEBUG and print " Item is of type ", $para->[0], " under $over_type\n"; + + if($item_type eq 'text') { + # Nothing special needs doing for 'text' + } elsif($item_type eq 'number' or $item_type eq 'bullet') { + die "Unknown item type $item_type" + unless $item_type eq 'number' or $item_type eq 'bullet'; + # Undo our clobbering: + push @$para, $para->[1]{'~orig_content'}; + delete $para->[1]{'number'}; + # Only a PROPER item-number element is allowed + # to have a number attribute. + } else { + die "Unhandled item type $item_type"; # should never happen + } + + # =item-text thingies don't need any assimilation, it seems. + + } elsif($over_type eq 'number') { + my $item_type = $self->_get_item_type($para); + # That kills the content of the item if it's a number or bullet. + DEBUG and print " Item is of type ", $para->[0], " under $over_type\n"; + + my $expected_value = ++ $curr_open->[-1][1]{'~counter'}; + + if($item_type eq 'bullet') { + # Hm, it's not numeric. Correct for this. + $para->[1]{'number'} = $expected_value; + $self->whine( + $para->[1]{'start_line'}, + "Expected '=item $expected_value'" + ); + push @$para, $para->[1]{'~orig_content'}; + # restore the bullet, blocking the assimilation of next para + + } elsif($item_type eq 'text') { + # Hm, it's not numeric. Correct for this. + $para->[1]{'number'} = $expected_value; + $self->whine( + $para->[1]{'start_line'}, + "Expected '=item $expected_value'" + ); + # Text content will still be there and will block next ~Para + + } elsif($item_type ne 'number') { + die "Unknown item type $item_type"; # should never happen + + } elsif($expected_value == $para->[1]{'number'}) { + DEBUG > 1 and print " Numeric item has the expected value of $expected_value\n"; + + } else { + DEBUG > 1 and print " Numeric item has ", $para->[1]{'number'}, + " instead of the expected value of $expected_value\n"; + $self->whine( + $para->[1]{'start_line'}, + "You have '=item " . $para->[1]{'number'} . + "' instead of the expected '=item $expected_value'" + ); + $para->[1]{'number'} = $expected_value; # correcting!! + } + + if(@$para == 2) { + # For the cases where we /didn't/ push to @$para + if($paras->[0][0] eq '~Para') { + DEBUG and print "Assimilating following ~Para content into $over_type item\n"; + push @$para, splice @{shift @$paras},2; + } else { + DEBUG and print "Can't assimilate following ", $paras->[0][0], "\n"; + push @$para, ''; # Just so it's not contentless + } + } + + + } elsif($over_type eq 'bullet') { + my $item_type = $self->_get_item_type($para); + # That kills the content of the item if it's a number or bullet. + DEBUG and print " Item is of type ", $para->[0], " under $over_type\n"; + + if($item_type eq 'bullet') { + # as expected! + + if( $para->[1]{'~_freaky_para_hack'} ) { + DEBUG and print "Accomodating '=item * Foo' tolerance hack.\n"; + push @$para, delete $para->[1]{'~_freaky_para_hack'}; + } + + } elsif($item_type eq 'number') { + $self->whine( + $para->[1]{'start_line'}, + "Expected '=item *'" + ); + push @$para, $para->[1]{'~orig_content'}; + # and block assimilation of the next paragraph + delete $para->[1]{'number'}; + # Only a PROPER item-number element is allowed + # to have a number attribute. + } elsif($item_type eq 'text') { + $self->whine( + $para->[1]{'start_line'}, + "Expected '=item *'" + ); + # But doesn't need processing. But it'll block assimilation + # of the next para. + } else { + die "Unhandled item type $item_type"; # should never happen + } + + if(@$para == 2) { + # For the cases where we /didn't/ push to @$para + if($paras->[0][0] eq '~Para') { + DEBUG and print "Assimilating following ~Para content into $over_type item\n"; + push @$para, splice @{shift @$paras},2; + } else { + DEBUG and print "Can't assimilate following ", $paras->[0][0], "\n"; + push @$para, ''; # Just so it's not contentless + } + } + + } else { + die "Unhandled =over type \"$over_type\"?"; + # Shouldn't happen! + } + $para->[0] .= '-' . $over_type; + + return; +} + +sub _ponder_Plain { + my ($self,$para) = @_; + DEBUG and print " giving plain treatment...\n"; + unless( @$para == 2 or ( @$para == 3 and $para->[2] eq '' ) + or $para->[1]{'~cooked'} + ) { + push @$para, + @{$self->_make_treelet( + join("\n", splice(@$para, 2)), + $para->[1]{'start_line'} + )}; + } + # Empty paragraphs don't need a treelet for any reason I can see. + # And precooked paragraphs already have a treelet. + return; +} + +sub _ponder_Verbatim { + my ($self,$para) = @_; + DEBUG and print " giving verbatim treatment...\n"; + + $para->[1]{'xml:space'} = 'preserve'; + for(my $i = 2; $i < @$para; $i++) { + foreach my $line ($para->[$i]) { # just for aliasing + while( $line =~ + # Sort of adapted from Text::Tabs -- yes, it's hardwired in that + # tabs are at every EIGHTH column. For portability, it has to be + # one setting everywhere, and 8th wins. + s/^([^\t]*)(\t+)/$1.(" " x ((length($2)<<3)-(length($1)&7)))/e + ) {} + + # TODO: whinge about (or otherwise treat) unindented or overlong lines + + } + } + + # Now the VerbatimFormatted hoodoo... + if( $self->{'accept_codes'} and + $self->{'accept_codes'}{'VerbatimFormatted'} + ) { + while(@$para > 3 and $para->[-1] !~ m/\S/) { pop @$para } + # Kill any number of terminal newlines + $self->_verbatim_format($para); + } elsif ($self->{'codes_in_verbatim'}) { + push @$para, + @{$self->_make_treelet( + join("\n", splice(@$para, 2)), + $para->[1]{'start_line'}, $para->[1]{'xml:space'} + )}; + $para->[-1] =~ s/\n+$//s; # Kill any number of terminal newlines + } else { + push @$para, join "\n", splice(@$para, 2) if @$para > 3; + $para->[-1] =~ s/\n+$//s; # Kill any number of terminal newlines + } + return; +} + +sub _ponder_Data { + my ($self,$para) = @_; + DEBUG and print " giving data treatment...\n"; + $para->[1]{'xml:space'} = 'preserve'; + push @$para, join "\n", splice(@$para, 2) if @$para > 3; + return; +} + + + + +########################################################################### + +sub _traverse_treelet_bit { # for use only by the routine above + my($self, $name) = splice @_,0,2; + + my $scratch; + $self->_handle_element_start(($scratch=$name), shift @_); + + foreach my $x (@_) { + if(ref($x)) { + &_traverse_treelet_bit($self, @$x); + } else { + $self->_handle_text($x); + } + } + + $self->_handle_element_end($scratch=$name); + return; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub _closers_for_all_curr_open { + my $self = $_[0]; + my @closers; + foreach my $still_open (@{ $self->{'curr_open'} || return }) { + my @copy = @$still_open; + $copy[1] = {%{ $copy[1] }}; + #$copy[1]{'start_line'} = -1; + if($copy[0] eq '=for') { + $copy[0] = '=end'; + } elsif($copy[0] eq '=over') { + $copy[0] = '=back'; + } else { + die "I don't know how to auto-close an open $copy[0] region"; + } + + unless( @copy > 2 ) { + push @copy, $copy[1]{'target'}; + $copy[-1] = '' unless defined $copy[-1]; + # since =over's don't have targets + } + + DEBUG and print "Queuing up fake-o event: ", pretty(\@copy), "\n"; + unshift @closers, \@copy; + } + return @closers; +} + +#-------------------------------------------------------------------------- + +sub _verbatim_format { + my($it, $p) = @_; + + my $formatting; + + for(my $i = 2; $i < @$p; $i++) { # work backwards over the lines + DEBUG and print "_verbatim_format appends a newline to $i: $p->[$i]\n"; + $p->[$i] .= "\n"; + # Unlike with simple Verbatim blocks, we don't end up just doing + # a join("\n", ...) on the contents, so we have to append a + # newline to ever line, and then nix the last one later. + } + + if( DEBUG > 4 ) { + print "<<\n"; + for(my $i = $#$p; $i >= 2; $i--) { # work backwards over the lines + print "_verbatim_format $i: $p->[$i]"; + } + print ">>\n"; + } + + for(my $i = $#$p; $i > 2; $i--) { + # work backwards over the lines, except the first (#2) + + #next unless $p->[$i] =~ m{^#:([ \^\/\%]*)\n?$}s + # and $p->[$i-1] !~ m{^#:[ \^\/\%]*\n?$}s; + # look at a formatty line preceding a nonformatty one + DEBUG > 5 and print "Scrutinizing line $i: $$p[$i]\n"; + if($p->[$i] =~ m{^#:([ \^\/\%]*)\n?$}s) { + DEBUG > 5 and print " It's a formatty line. ", + "Peeking at previous line ", $i-1, ": $$p[$i-1]: \n"; + + if( $p->[$i-1] =~ m{^#:[ \^\/\%]*\n?$}s ) { + DEBUG > 5 and print " Previous line is formatty! Skipping this one.\n"; + next; + } else { + DEBUG > 5 and print " Previous line is non-formatty! Yay!\n"; + } + } else { + DEBUG > 5 and print " It's not a formatty line. Ignoring\n"; + next; + } + + # A formatty line has to have #: in the first two columns, and uses + # "^" to mean bold, "/" to mean underline, and "%" to mean bold italic. + # Example: + # What do you want? i like pie. [or whatever] + # #:^^^^^^^^^^^^^^^^^ ///////////// + + + DEBUG > 4 and print "_verbatim_format considers:\n<$p->[$i-1]>\n<$p->[$i]>\n"; + + $formatting = ' ' . $1; + $formatting =~ s/\s+$//s; # nix trailing whitespace + unless(length $formatting and $p->[$i-1] =~ m/\S/) { # no-op + splice @$p,$i,1; # remove this line + $i--; # don't consider next line + next; + } + + if( length($formatting) >= length($p->[$i-1]) ) { + $formatting = substr($formatting, 0, length($p->[$i-1]) - 1) . ' '; + } else { + $formatting .= ' ' x (length($p->[$i-1]) - length($formatting)); + } + # Make $formatting and the previous line be exactly the same length, + # with $formatting having a " " as the last character. + + DEBUG > 4 and print "Formatting <$formatting> on <", $p->[$i-1], ">\n"; + + + my @new_line; + while( $formatting =~ m{\G(( +)|(\^+)|(\/+)|(\%+))}g ) { + #print "Format matches $1\n"; + + if($2) { + #print "SKIPPING <$2>\n"; + push @new_line, + substr($p->[$i-1], pos($formatting)-length($1), length($1)); + } else { + #print "SNARING $+\n"; + push @new_line, [ + ( + $3 ? 'VerbatimB' : + $4 ? 'VerbatimI' : + $5 ? 'VerbatimBI' : die("Should never get called") + ), {}, + substr($p->[$i-1], pos($formatting)-length($1), length($1)) + ]; + #print "Formatting <$new_line[-1][-1]> as $new_line[-1][0]\n"; + } + } + my @nixed = + splice @$p, $i-1, 2, @new_line; # replace myself and the next line + DEBUG > 10 and print "Nixed count: ", scalar(@nixed), "\n"; + + DEBUG > 6 and print "New version of the above line is these tokens (", + scalar(@new_line), "):", + map( ref($_)?"<@$_> ":"<$_>", @new_line ), "\n"; + $i--; # So the next line we scrutinize is the line before the one + # that we just went and formatted + } + + $p->[0] = 'VerbatimFormatted'; + + # Collapse adjacent text nodes, just for kicks. + for( my $i = 2; $i > $#$p; $i++ ) { # work forwards over the tokens except for the last + if( !ref($p->[$i]) and !ref($p->[$i + 1]) ) { + DEBUG > 5 and print "_verbatim_format merges {$p->[$i]} and {$p->[$i+1]}\n"; + $p->[$i] .= splice @$p, $i+1, 1; # merge + --$i; # and back up + } + } + + # Now look for the last text token, and remove the terminal newline + for( my $i = $#$p; $i >= 2; $i-- ) { + # work backwards over the tokens, even the first + if( !ref($p->[$i]) ) { + if($p->[$i] =~ s/\n$//s) { + DEBUG > 5 and print "_verbatim_format killed the terminal newline on #$i: {$p->[$i]}, after {$p->[$i-1]}\n"; + } else { + DEBUG > 5 and print + "No terminal newline on #$i: {$p->[$i]}, after {$p->[$i-1]} !?\n"; + } + last; # we only want the next one + } + } + + return; +} + + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + + +sub _treelet_from_formatting_codes { + # Given a paragraph, returns a treelet. Full of scary tokenizing code. + # Like [ '~Top', {'start_line' => $start_line}, + # "I like ", + # [ 'B', {}, "pie" ], + # "!" + # ] + + my($self, $para, $start_line, $preserve_space) = @_; + + my $treelet = ['~Top', {'start_line' => $start_line},]; + + unless ($preserve_space || $self->{'preserve_whitespace'}) { + $para =~ s/\. /\.\xA0 /g if $self->{'fullstop_space_harden'}; + + $para =~ s/\s+/ /g; # collapse and trim all whitespace first. + $para =~ s/ $//; + $para =~ s/^ //; + } + + # Only apparent problem the above code is that N<< >> turns into + # N<< >>. But then, word wrapping does that too! So don't do that! + + my @stack; + my @lineage = ($treelet); + + DEBUG > 4 and print "Paragraph:\n$para\n\n"; + + # Here begins our frightening tokenizer RE. The following regex matches + # text in four main parts: + # + # * Start-codes. The first alternative matches C< or C<<, the latter + # followed by some whitespace. $1 will hold the entire start code + # (including any space following a multiple-angle-bracket delimiter), + # and $2 will hold only the additional brackets past the first in a + # multiple-bracket delimiter. length($2) + 1 will be the number of + # closing brackets we have to find. + # + # * Closing brackets. Match some amount of whitespace followed by + # multiple close brackets. The logic to see if this closes anything + # is down below. Note that in order to parse C<< >> correctly, we + # have to use look-behind (?<=\s\s), since the match of the starting + # code will have consumed the whitespace. + # + # * A single closing bracket, to close a simple code like C<>. + # + # * Something that isn't a start or end code. We have to be careful + # about accepting whitespace, since perlpodspec says that any whitespace + # before a multiple-bracket closing delimiter should be ignored. + # + while($para =~ + m/\G + (?: + # Match starting codes, including the whitespace following a + # multiple-delimiter start code. $1 gets the whole start code and + # $2 gets all but one of the {2,}) + | + (\s?>) # $5: simple end-codes + | + ( # $6: stuff containing no start-codes or end-codes + (?: + [^A-Z\s>]+ + | + (?: + [A-Z](?!<) + ) + | + (?: + \s(?!\s*>) + ) + )+ + ) + ) + /xgo + ) { + DEBUG > 4 and print "\nParagraphic tokenstack = (@stack)\n"; + if(defined $1) { + if(defined $2) { + DEBUG > 3 and print "Found complex start-text code \"$1\"\n"; + push @stack, length($2) + 1; + # length of the necessary complex end-code string + } else { + DEBUG > 3 and print "Found simple start-text code \"$1\"\n"; + push @stack, 0; # signal that we're looking for simple + } + push @lineage, [ substr($1,0,1), {}, ]; # new node object + push @{ $lineage[-2] }, $lineage[-1]; + + } elsif(defined $4) { + DEBUG > 3 and print "Found apparent complex end-text code \"$3$4\"\n"; + # This is where it gets messy... + if(! @stack) { + # We saw " >>>>" but needed nothing. This is ALL just stuff then. + DEBUG > 4 and print " But it's really just stuff.\n"; + push @{ $lineage[-1] }, $3, $4; + next; + } elsif(!$stack[-1]) { + # We saw " >>>>" but needed only ">". Back pos up. + DEBUG > 4 and print " And that's more than we needed to close simple.\n"; + push @{ $lineage[-1] }, $3; # That was a for-real space, too. + pos($para) = pos($para) - length($4) + 1; + } elsif($stack[-1] == length($4)) { + # We found " >>>>", and it was exactly what we needed. Commonest case. + DEBUG > 4 and print " And that's exactly what we needed to close complex.\n"; + } elsif($stack[-1] < length($4)) { + # We saw " >>>>" but needed only " >>". Back pos up. + DEBUG > 4 and print " And that's more than we needed to close complex.\n"; + pos($para) = pos($para) - length($4) + $stack[-1]; + } else { + # We saw " >>>>" but needed " >>>>>>". So this is all just stuff! + DEBUG > 4 and print " But it's really just stuff, because we needed more.\n"; + push @{ $lineage[-1] }, $3, $4; + next; + } + #print "\nHOOBOY ", scalar(@{$lineage[-1]}), "!!!\n"; + + push @{ $lineage[-1] }, '' if 2 == @{ $lineage[-1] }; + # Keep the element from being childless + + pop @stack; + pop @lineage; + + } elsif(defined $5) { + DEBUG > 3 and print "Found apparent simple end-text code \"$4\"\n"; + + if(@stack and ! $stack[-1]) { + # We're indeed expecting a simple end-code + DEBUG > 4 and print " It's indeed an end-code.\n"; + + if(length($5) == 2) { # There was a space there: " >" + push @{ $lineage[-1] }, ' '; + } elsif( 2 == @{ $lineage[-1] } ) { # Closing a childless element + push @{ $lineage[-1] }, ''; # keep it from being really childless + } + + pop @stack; + pop @lineage; + } else { + DEBUG > 4 and print " It's just stuff.\n"; + push @{ $lineage[-1] }, $5; + } + + } elsif(defined $6) { + DEBUG > 3 and print "Found stuff \"$6\"\n"; + push @{ $lineage[-1] }, $6; + + } else { + # should never ever ever ever happen + DEBUG and print "AYYAYAAAAA at line ", __LINE__, "\n"; + die "SPORK 512512!"; + } + } + + if(@stack) { # Uhoh, some sequences weren't closed. + my $x= "..."; + while(@stack) { + push @{ $lineage[-1] }, '' if 2 == @{ $lineage[-1] }; + # Hmmmmm! + + my $code = (pop @lineage)->[0]; + my $ender_length = pop @stack; + if($ender_length) { + --$ender_length; + $x = $code . ("<" x $ender_length) . " $x " . (">" x $ender_length); + } else { + $x = $code . "<$x>"; + } + } + DEBUG > 1 and print "Unterminated $x sequence\n"; + $self->whine($start_line, + "Unterminated $x sequence", + ); + } + + return $treelet; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub text_content_of_treelet { # method: $parser->text_content_of_treelet($lol) + return stringify_lol($_[1]); +} + +sub stringify_lol { # function: stringify_lol($lol) + my $string_form = ''; + _stringify_lol( $_[0] => \$string_form ); + return $string_form; +} + +sub _stringify_lol { # the real recursor + my($lol, $to) = @_; + use UNIVERSAL (); + for(my $i = 2; $i < @$lol; ++$i) { + if( ref($lol->[$i] || '') and UNIVERSAL::isa($lol->[$i], 'ARRAY') ) { + _stringify_lol( $lol->[$i], $to); # recurse! + } else { + $$to .= $lol->[$i]; + } + } + return; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub _dump_curr_open { # return a string representation of the stack + my $curr_open = $_[0]{'curr_open'}; + + return '[empty]' unless @$curr_open; + return join '; ', + map {; + ($_->[0] eq '=for') + ? ( ($_->[1]{'~really'} || '=over') + . ' ' . $_->[1]{'target'}) + : $_->[0] + } + @$curr_open + ; +} + +########################################################################### +my %pretty_form = ( + "\a" => '\a', # ding! + "\b" => '\b', # BS + "\e" => '\e', # ESC + "\f" => '\f', # FF + "\t" => '\t', # tab + "\cm" => '\cm', + "\cj" => '\cj', + "\n" => '\n', # probably overrides one of either \cm or \cj + '"' => '\"', + '\\' => '\\\\', + '$' => '\\$', + '@' => '\\@', + '%' => '\\%', + '#' => '\\#', +); + +sub pretty { # adopted from Class::Classless + # Not the most brilliant routine, but passable. + # Don't give it a cyclic data structure! + my @stuff = @_; # copy + my $x; + my $out = + # join ",\n" . + join ", ", + map {; + if(!defined($_)) { + "undef"; + } elsif(ref($_) eq 'ARRAY' or ref($_) eq 'Pod::Simple::LinkSection') { + $x = "[ " . pretty(@$_) . " ]" ; + $x; + } elsif(ref($_) eq 'SCALAR') { + $x = "\\" . pretty($$_) ; + $x; + } elsif(ref($_) eq 'HASH') { + my $hr = $_; + $x = "{" . join(", ", + map(pretty($_) . '=>' . pretty($hr->{$_}), + sort keys %$hr ) ) . "}" ; + $x; + } elsif(!length($_)) { q{''} # empty string + } elsif( + $_ eq '0' # very common case + or( + m/^-?(?:[123456789]\d*|0)(?:\.\d+)?$/s + and $_ ne '-0' # the strange case that that RE lets thru + ) + ) { $_; + } else { + if( chr(65) eq 'A' ) { + s<([^\x20\x21\x23\x27-\x3F\x41-\x5B\x5D-\x7E])> + #<$pretty_form{$1} || '\\x'.(unpack("H2",$1))>eg; + <$pretty_form{$1} || '\\x{'.sprintf("%x", ord($1)).'}'>eg; + } else { + # We're in some crazy non-ASCII world! + s<([^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789])> + #<$pretty_form{$1} || '\\x'.(unpack("H2",$1))>eg; + <$pretty_form{$1} || '\\x{'.sprintf("%x", ord($1)).'}'>eg; + } + qq{"$_"}; + } + } @stuff; + # $out =~ s/\n */ /g if length($out) < 75; + return $out; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +1; + diff --git a/lib/Pod/Simple/ChangeLog b/lib/Pod/Simple/ChangeLog new file mode 100644 index 0000000..630c124 --- /dev/null +++ b/lib/Pod/Simple/ChangeLog @@ -0,0 +1,233 @@ +# ChangeLog for Pod::Simple dist # Time-stamp: "2005-11-21 12:33:54 PST" +#--------------------------------------------------------------------------- + +2005-11-21 Allison Randal + * Release 3.03 + + Applied whitespace patches for Pod::Man and Pod::Text from + Russ Allbery. + + Applied validation patches to Pod::Simple::HTML from Graham Barr. + +2004-05-24 Sean M. Burke + * Release 3.02 + + Just fixing some typos in the CSS generated by Pod::Simple:HTMLBatch. + + +2004-05-24 Sean M. Burke + * Release 3.01 + + No big changes to the main modules, but there's many changes to + the important Pod::Simple::HTML module. + + Also, new modules: + Pod::Simple::HTMLBatch + Pod::Simple::HTMLLegacy + Pod::Simple::Progress + Pod::Simple::Search + and tests for these (well, most). + + Some prettying up of the Makefile.PL. + + The test 00about.t is a bit more careful and verbose now. + + The docs are still incomplete, esp. for Pod::Simple::HTML and + Pod::Simple::HTMLBatch, which I hope to improve quite soon. + + + +2004-05-07 Sean M. Burke + * Release 2.06 + + Allison Randal brilliantly refactors a huge monolithic sub in + Blackbox.pm. Code is now actually sanely readable! + + I add the new option-attributes fullstop_space_harden and + codes_in_verbatim, and the two methods abandon_output_string and + abandon_output_fh. Not yet documented. + + New test fullstop_spaces.t and new corpus document + greek_iso_8859_7.pod. + + Another version should be forthcoming in a few days that has the + new Pod::Simple::HTML stuff in it. + + Note to self: document =extend and =encoding in perlpodspec ASAP! + + +2003-11-04 Sean M. Burke + * Release 2.05 -- bugfix version + + In an attempt to stop Perl 5.6 versions from going completely + crazy, I've tried to basically turn off all Unicode/utf8 support + under 5.6. Under 5.8 and above, Unicode should work fine, and + under 5.6, all Unicode characters should be replaced with a little + "can't render" symbol, either a "¤" or a "?". + Many many thanks to Jarkko Hietaniemi for helping out. + + (Works under 5.005 now too?) + +2003-10-10 Sean M. Burke + * Release 2.04 -- minor bugfix version + + * Added some code to insulate Pod::Simple to runtime changes in + the value of $/. Thanks to Iain Truskett for catching this. + * Added FILENO method to TiedOutFH.pm, to work quietly under some + perls. Thanks to Jochen Stenzel for catching this. + * Fixed some tests that erroneously failed under some 5.6's + because of chdir()s in *.t files messing up @INC Thanks to many + who caught this, including Rocco Caputo. + +2003-09-07 Sean M. Burke + * Release 2.03 -- minor upgrade + * A =head[1234] command now closes any currently open =over, and + warns. + * Fixing a few obscure bugs, including one to do with the + Makefile. + * Added some tests for those bugs. + +2003-09-02 Sean M. Burke + * Release 2.02 -- fixing a testing bug + * Autrijus Tang found a silly bug of mine in corpustest.t's + testing of corpus/t/nonesuch.txt. Fixed, + * I add encoding_nonesuch.t to test corpus/t/nonesuch.txt better. + + +2003-09-01 Sean M. Burke + * Release 2.01 + * Moved all version numbers up to 2.01 for reasons of Tibetan + numerology. + + * Implemented =encoding. Thanks a million to Jarkko, Autrijus, + Dan Kogai, and many others for their help, patience, and + encouragement. + + It's not yet documented, but see ./t/corpus/*.txt for examples at + least. + + * Added 'use utf8' to all the modules. Byebye perl-5.005 + compatibility, hello decent perl-5.6 compatibility. + +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +2003-09-01 Sean M. Burke + * Release 0.98 + * At Michael Schwern's suggestion, a list that begins with a + "=item [number]" line where number isn't a 1, is no longer + treated as an error, but instead is silently coerced into being a + text-item list, just as if it were "=item wubble" or whatever. + +2003-08-12 Sean M. Burke + * Release 0.97 + * Loooking goooooood. Not smelling so much like beta anymore! + * I think this might be the last version before I go + putting "use utf8" all over the place. + + * Added placefolders for "=encoding" support. If you use + "=encoding", you'll get an error about "not yet implemented" + instead of "unknown directive". + + * Rewrote Pod::Simple::Pullparser's get_title, and added + get_version, get_author, and get_description. Documented them. + Thanks to Graham Barr for contributing an initial get_title and + then being patient while I seemed to do nothing with it for months + on end! + + * More tests. + + Otherwise just minor changes since last version: + * Fixed error in Pod::Simple::RTF. + * Added new but undocumented methods to Pod::Simple: + unaccept_directive(s) unaccept_target(s) unaccept_code(s) + * Corrected '=back without =open' error to '=back without =over' + * A little tweaking to the _verbatim_format code. + * Dump routines may use \x{NN} instead of \xNN format. + * Pod::Simple::HTML now uses VerbatimFormatted + * A few changes ot DEBUG messages -- nothing you'd normally see. + +2002-11-19 Sean M. Burke + * Release 0.96 + * Still kinda sorta beta. + + * Much much more docs in this version. Only a few feature + changes: + + * Added any_errata_seen to Pod::Simple. Should've been in there + ages ago. Oops! + * Fixed some errors in the experimental VerbatimFormatted code. + * Added is_tagname / is_tag to Pod::Simple::PullParserStartToken + and Pod::Simple::PullParserEndToken + * Added RTFDEFLANG stuff to Pod::Simple::RTF. + +2002-11-10 Sean M. Burke + * Release 0.95 + * Beta release. Much documentation still to write, many features + still to add. The APIs might change in future versions. + + * Now including perlpod and perlpodspec in the dist. + + * Pod::Simple -- added parse_from_file + * Pod::Simple::RTF -- minor style changes, and minor change to the + RTF-wrapping logic. + * Pod::Simple::BlackBox -- a =cut opening a block is no longer a + fatal(ish) error. + * Pod::Simple::BlackBox -- added experimental new + VerbatimFormatted stuff. + + +2002-10-16 Sean M. Burke + * Release 0.94 + * Beta release. Much documentation still to write, many features + still to add. The APIs might change in future versions. + + * Pod::Simple::RTF -- minor style tweaks + * Pod::Simple::PullParserEndToken and ::PullParserStartToken -- + added ->tag as an alias for ->tagname. + * Pod::Simple and Pod::Simple::BlackBox -- added tolerance for + "=item * Foo" as a variant syntax for "=item *\n\nFoo". Tests + added/changed accordingly. + * Pod::Simple::HTML -- added stuff, and a hack at + doing something with X<...>'s. + + +2002-09-29 Sean M. Burke + * Release 0.93 + * Beta release. Much documentation still to write, many features + still to add. The APIs might change in future versions. + + * Pod::Simple -- added errors_seen attribute + * Pod::Simple::HTML -- added support for =for HTML ... directives, + and =extend directives. + * Oh, and I added Pod::Simple::RTF in 0.92 but forgot to note it + until now. + +2002-09-29 Sean M. Burke + * Release 0.92 + * Beta release. Much documentation still to write, many features + still to add. The APIs might change in future versions. + + * Fixing bungled distribution. + +2002-09-29 Sean M. Burke + * Release 0.91 + * Beta release. Much documentation still to write, many features + still to add. The APIs might change in future versions. + + * Pod::Simple::PullParserTextToken -- added 'text_r' + + * Pod::Simple::PullParser -- added 'get_title' and + 'get_title_short' to + + * Pod::Simple -- corrected wording of "Format for second =extend + parameter" error message. + + * Pod::PullParser -- made its filter() run as intended, like + Pod::Simple's filter. + + * Pod::Subclassing -- more docs + +2002-09-11 Sean M. Burke + * Release 0.90 + * Beta release. Much documentation still to write, many features + still to add. The APIs might change in future versions. diff --git a/lib/Pod/Simple/Checker.pm b/lib/Pod/Simple/Checker.pm new file mode 100644 index 0000000..0d01f50 --- /dev/null +++ b/lib/Pod/Simple/Checker.pm @@ -0,0 +1,171 @@ + +# A quite dimwitted pod2plaintext that need only know how to format whatever +# text comes out of Pod::BlackBox's _gen_errata + +require 5; +package Pod::Simple::Checker; +use strict; +use Carp (); +use Pod::Simple::Methody (); +use Pod::Simple (); +use vars qw( @ISA $VERSION ); +$VERSION = '2.02'; +@ISA = ('Pod::Simple::Methody'); +BEGIN { *DEBUG = defined(&Pod::Simple::DEBUG) + ? \&Pod::Simple::DEBUG + : sub() {0} + } + +use Text::Wrap 98.112902 (); # was 2001.0131, but I don't think we need that +$Text::Wrap::wrap = 'overflow'; +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +sub any_errata_seen { # read-only accessor + return $_[1]->{'Errata_seen'}; +} + +sub new { + my $self = shift; + my $new = $self->SUPER::new(@_); + $new->{'output_fh'} ||= *STDOUT{IO}; + $new->nix_X_codes(1); + $new->nbsp_for_S(1); + $new->{'Thispara'} = ''; + $new->{'Indent'} = 0; + $new->{'Indentstring'} = ' '; + $new->{'Errata_seen'} = 0; + return $new; +} + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +sub handle_text { $_[0]{'Errata_seen'} and $_[0]{'Thispara'} .= $_[1] } + +sub start_Para { $_[0]{'Thispara'} = '' } + +sub start_head1 { + if($_[0]{'Errata_seen'}) { + $_[0]{'Thispara'} = ''; + } else { + if($_[1]{'errata'}) { # start of errata! + $_[0]{'Errata_seen'} = 1; + $_[0]{'Thispara'} = $_[0]{'source_filename'} ? + "$_[0]{'source_filename'} -- " : '' + } + } +} +sub start_head2 { $_[0]{'Thispara'} = '' } +sub start_head3 { $_[0]{'Thispara'} = '' } +sub start_head4 { $_[0]{'Thispara'} = '' } + +sub start_Verbatim { $_[0]{'Thispara'} = '' } +sub start_item_bullet { $_[0]{'Thispara'} = '* ' } +sub start_item_number { $_[0]{'Thispara'} = "$_[1]{'number'}. " } +sub start_item_text { $_[0]{'Thispara'} = '' } + +sub start_over_bullet { ++$_[0]{'Indent'} } +sub start_over_number { ++$_[0]{'Indent'} } +sub start_over_text { ++$_[0]{'Indent'} } +sub start_over_block { ++$_[0]{'Indent'} } + +sub end_over_bullet { --$_[0]{'Indent'} } +sub end_over_number { --$_[0]{'Indent'} } +sub end_over_text { --$_[0]{'Indent'} } +sub end_over_block { --$_[0]{'Indent'} } + + +# . . . . . Now the actual formatters: + +sub end_head1 { $_[0]->emit_par(-4) } +sub end_head2 { $_[0]->emit_par(-3) } +sub end_head3 { $_[0]->emit_par(-2) } +sub end_head4 { $_[0]->emit_par(-1) } +sub end_Para { $_[0]->emit_par( 0) } +sub end_item_bullet { $_[0]->emit_par( 0) } +sub end_item_number { $_[0]->emit_par( 0) } +sub end_item_text { $_[0]->emit_par(-2) } + +sub emit_par { + return unless $_[0]{'Errata_seen'}; + my($self, $tweak_indent) = splice(@_,0,2); + my $indent = ' ' x ( 2 * $self->{'Indent'} + ($tweak_indent||0) ); + # Yes, 'STRING' x NEGATIVE gives '', same as 'STRING' x 0 + + $self->{'Thispara'} =~ tr{\xAD}{}d if Pod::Simple::ASCII; + my $out = Text::Wrap::wrap($indent, $indent, $self->{'Thispara'} .= "\n"); + $out =~ tr{\xA0}{ } if Pod::Simple::ASCII; + print {$self->{'output_fh'}} $out, + #"\n" + ; + $self->{'Thispara'} = ''; + + return; +} + +# . . . . . . . . . . And then off by its lonesome: + +sub end_Verbatim { + return unless $_[0]{'Errata_seen'}; + my $self = shift; + if(Pod::Simple::ASCII) { + $self->{'Thispara'} =~ tr{\xA0}{ }; + $self->{'Thispara'} =~ tr{\xAD}{}d; + } + + my $i = ' ' x ( 2 * $self->{'Indent'} + 4); + + $self->{'Thispara'} =~ s/^/$i/mg; + + print { $self->{'output_fh'} } '', + $self->{'Thispara'}, + "\n\n" + ; + $self->{'Thispara'} = ''; + return; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +1; + +__END__ + +=head1 NAME + +Pod::Simple::Checker -- check the Pod syntax of a document + +=head1 SYNOPSIS + + perl -MPod::Simple::Checker -e \ + "exit Pod::Simple::Checker->filter(shift)->any_errata_seen" \ + thingy.pod + +=head1 DESCRIPTION + +This class is for checking the syntactic validity of Pod. +It works by basically acting like a simple-minded version of +L that formats only the "Pod Errors" section +(if Pod::Simple even generates one for the given document). + +This is a subclass of L and inherits all its methods. + +=head1 SEE ALSO + +L, L, L + +=head1 COPYRIGHT AND DISCLAIMERS + +Copyright (c) 2002 Sean M. Burke. All rights reserved. + +This library is free software; you can redistribute it and/or modify it +under the same terms as Perl itself. + +This program is distributed in the hope that it will be useful, but +without any warranty; without even the implied warranty of +merchantability or fitness for a particular purpose. + +=head1 AUTHOR + +Sean M. Burke C + +=cut + diff --git a/lib/Pod/Simple/Debug.pm b/lib/Pod/Simple/Debug.pm new file mode 100644 index 0000000..b00e58d --- /dev/null +++ b/lib/Pod/Simple/Debug.pm @@ -0,0 +1,151 @@ + +require 5; +package Pod::Simple::Debug; +use strict; + +sub import { + my($value,$variable); + + if(@_ == 2) { + $value = $_[1]; + } elsif(@_ == 3) { + ($variable, $value) = @_[1,2]; + + ($variable, $value) = ($value, $variable) + if defined $value and ref($value) eq 'SCALAR' + and not(defined $variable and ref($variable) eq 'SCALAR') + ; # tolerate getting it backwards + + unless( defined $variable and ref($variable) eq 'SCALAR') { + require Carp; + Carp::croak("Usage:\n use Pod::Simple::Debug (NUMVAL)\nor" + . "\n use Pod::Simple::Debug (\\\$var, STARTNUMVAL)\nAborting"); + } + } else { + require Carp; + Carp::croak("Usage:\n use Pod::Simple::Debug (NUMVAL)\nor" + . "\n use Pod::Simple::Debug (\\\$var, STARTNUMVAL)\nAborting"); + } + + if( defined &Pod::Simple::DEBUG ) { + require Carp; + Carp::croak("It's too late to call Pod::Simple::Debug -- " + . "Pod::Simple has already loaded\nAborting"); + } + + $value = 0 unless defined $value; + + unless($value =~ m/^-?\d+$/) { + require Carp; + Carp::croak( "$value isn't a numeric value." + . "\nUsage:\n use Pod::Simple::Debug (NUMVAL)\nor" + . "\n use Pod::Simple::Debug (\\\$var, STARTNUMVAL)\nAborting"); + } + + if( defined $variable ) { + # make a not-really-constant + *Pod::Simple::DEBUG = sub () { $$variable } ; + $$variable = $value; + print "# Starting Pod::Simple::DEBUG = non-constant $variable with val $value\n"; + } else { + *Pod::Simple::DEBUG = eval " sub () { $value } "; + print "# Starting Pod::Simple::DEBUG = $value\n"; + } + + require Pod::Simple; + return; +} + +1; + + +__END__ + +=head1 NAME + +Pod::Simple::Debug -- put Pod::Simple into trace/debug mode + +=head1 SYNOPSIS + + use Pod::Simple::Debug (5); # or some integer + +Or: + + my $debuglevel; + use Pod::Simple::Debug (\$debuglevel, 0); + ...some stuff that uses Pod::Simple to do stuff, but which + you don't want debug output from... + + $debug_level = 4; + ...some stuff that uses Pod::Simple to do stuff, but which + you DO want debug output from... + + $debug_level = 0; + +=head1 DESCRIPTION + +This is an internal module for controlling the debug level (a.k.a. trace +level) of Pod::Simple. This is of interest only to Pod::Simple +developers. + + +=head1 CAVEATS + +Note that you should load this module I loading Pod::Simple (or +any Pod::Simple-based class). If you try loading Pod::Simple::Debug +after &Pod::Simple::DEBUG is already defined, Pod::Simple::Debug will +throw a fatal error to the effect that +"it's s too late to call Pod::Simple::Debug". + +Note that the C)> mode will make +Pod::Simple (et al) run rather slower, since &Pod::Simple::DEBUG won't +be a constant sub anymore, and so Pod::Simple (et al) won't compile with +constant-folding. + + +=head1 GUTS + +Doing this: + + use Pod::Simple::Debug (5); # or some integer + +is basically equivalent to: + + BEGIN { sub Pod::Simple::DEBUG () {5} } # or some integer + use Pod::Simple (); + +And this: + + use Pod::Simple::Debug (\$debug_level,0); # or some integer + +is basically equivalent to this: + + my $debug_level; + BEGIN { $debug_level = 0 } + BEGIN { sub Pod::Simple::DEBUG () { $debug_level } + use Pod::Simple (); + +=head1 SEE ALSO + +L + +The article "Constants in Perl", in I issue +21. See L + +=head1 COPYRIGHT AND DISCLAIMERS + +Copyright (c) 2002 Sean M. Burke. All rights reserved. + +This library is free software; you can redistribute it and/or modify it +under the same terms as Perl itself. + +This program is distributed in the hope that it will be useful, but +without any warranty; without even the implied warranty of +merchantability or fitness for a particular purpose. + +=head1 AUTHOR + +Sean M. Burke C + +=cut + diff --git a/lib/Pod/Simple/DumpAsText.pm b/lib/Pod/Simple/DumpAsText.pm new file mode 100644 index 0000000..e678e42 --- /dev/null +++ b/lib/Pod/Simple/DumpAsText.pm @@ -0,0 +1,130 @@ + +require 5; +package Pod::Simple::DumpAsText; +$VERSION = '2.02'; +use Pod::Simple (); +BEGIN {@ISA = ('Pod::Simple')} + +use strict; + +use Carp (); + +BEGIN { *DEBUG = \&Pod::Simple::DEBUG unless defined &DEBUG } + +sub new { + my $self = shift; + my $new = $self->SUPER::new(@_); + $new->{'output_fh'} ||= *STDOUT{IO}; + $new->accept_codes('VerbatimFormatted'); + return $new; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub _handle_element_start { + # ($self, $element_name, $attr_hash_r) + my $fh = $_[0]{'output_fh'}; + my($key, $value); + DEBUG and print "++ $_[1]\n"; + + print $fh ' ' x ($_[0]{'indent'} || 0), "++", $_[1], "\n"; + $_[0]{'indent'}++; + while(($key,$value) = each %{$_[2]}) { + unless($key =~ m/^~/s) { + next if $key eq 'start_line' and $_[0]{'hide_line_numbers'}; + _perly_escape($key); + _perly_escape($value); + printf $fh qq{%s \\ "%s" => "%s"\n}, + ' ' x ($_[0]{'indent'} || 0), $key, $value; + } + } + return; +} + +sub _handle_text { + DEBUG and print "== \"$_[1]\"\n"; + + if(length $_[1]) { + my $indent = ' ' x $_[0]{'indent'}; + my $text = $_[1]; + _perly_escape($text); + $text =~ # A not-totally-brilliant wrapping algorithm: + s/( + [^\n]{55} # Snare some characters from a line + [^\n\ ]{0,50} # and finish any current word + ) + \x20{1,10}(?!\n) # capture some spaces not at line-end + /$1"\n$indent . "/gx # => line-break here + ; + + print {$_[0]{'output_fh'}} $indent, '* "', $text, "\"\n"; + } + return; +} + +sub _handle_element_end { + DEBUG and print "-- $_[1]\n"; + print {$_[0]{'output_fh'}} + ' ' x --$_[0]{'indent'}, "--", $_[1], "\n"; + return; +} + +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + +sub _perly_escape { + foreach my $x (@_) { + $x =~ s/([^\x00-\xFF])/sprintf'\x{%X}',ord($1)/eg; + # Escape things very cautiously: + $x =~ s/([^-\n\t \&\<\>\'!\#\%\(\)\*\+,\.\/\:\;=\?\~\[\]\^_\`\{\|\}abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789])/sprintf'\x%02X',ord($1)/eg; + } + return; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +1; + + +__END__ + +=head1 NAME + +Pod::Simple::DumpAsText -- dump Pod-parsing events as text + +=head1 SYNOPSIS + + perl -MPod::Simple::DumpAsText -e \ + "exit Pod::Simple::DumpAsText->filter(shift)->any_errata_seen" \ + thingy.pod + +=head1 DESCRIPTION + +This class is for dumping, as text, the events gotten from parsing a Pod +document. This class is of interest to people writing Pod formatters +based on Pod::Simple. It is useful for seeing exactly what events you +get out of some Pod that you feed in. + +This is a subclass of L and inherits all its methods. + +=head1 SEE ALSO + +L + +L + +=head1 COPYRIGHT AND DISCLAIMERS + +Copyright (c) 2002 Sean M. Burke. All rights reserved. + +This library is free software; you can redistribute it and/or modify it +under the same terms as Perl itself. + +This program is distributed in the hope that it will be useful, but +without any warranty; without even the implied warranty of +merchantability or fitness for a particular purpose. + +=head1 AUTHOR + +Sean M. Burke C + +=cut + diff --git a/lib/Pod/Simple/DumpAsXML.pm b/lib/Pod/Simple/DumpAsXML.pm new file mode 100644 index 0000000..fe0c166 --- /dev/null +++ b/lib/Pod/Simple/DumpAsXML.pm @@ -0,0 +1,146 @@ + +require 5; +package Pod::Simple::DumpAsXML; +$VERSION = '2.02'; +use Pod::Simple (); +BEGIN {@ISA = ('Pod::Simple')} + +use strict; + +use Carp (); + +BEGIN { *DEBUG = \&Pod::Simple::DEBUG unless defined &DEBUG } + +sub new { + my $self = shift; + my $new = $self->SUPER::new(@_); + $new->{'output_fh'} ||= *STDOUT{IO}; + $new->accept_codes('VerbatimFormatted'); + return $new; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +sub _handle_element_start { + # ($self, $element_name, $attr_hash_r) + my $fh = $_[0]{'output_fh'}; + my($key, $value); + DEBUG and print "++ $_[1]\n"; + + print $fh ' ' x ($_[0]{'indent'} || 0), "<", $_[1]; + + foreach my $key (sort keys %{$_[2]}) { + unless($key =~ m/^~/s) { + next if $key eq 'start_line' and $_[0]{'hide_line_numbers'}; + _xml_escape($value = $_[2]{$key}); + print $fh ' ', $key, '="', $value, '"'; + } + } + + + print $fh ">\n"; + $_[0]{'indent'}++; + return; +} + +sub _handle_text { + DEBUG and print "== \"$_[1]\"\n"; + if(length $_[1]) { + my $indent = ' ' x $_[0]{'indent'}; + my $text = $_[1]; + _xml_escape($text); + $text =~ # A not-totally-brilliant wrapping algorithm: + s/( + [^\n]{55} # Snare some characters from a line + [^\n\ ]{0,50} # and finish any current word + ) + \x20{1,10}(?!\n) # capture some spaces not at line-end + /$1\n$indent/gx # => line-break here + ; + + print {$_[0]{'output_fh'}} $indent, $text, "\n"; + } + return; +} + +sub _handle_element_end { + DEBUG and print "-- $_[1]\n"; + print {$_[0]{'output_fh'}} + ' ' x --$_[0]{'indent'}, "\n"; + return; +} + +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + +sub _xml_escape { + foreach my $x (@_) { + # Escape things very cautiously: + $x =~ s/([^-\n\t !\#\$\%\(\)\*\+,\.\~\/\:\;=\?\@\[\\\]\^_\`\{\|\}abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789])/'&#'.(ord($1)).';'/eg; + # Yes, stipulate the list without a range, so that this can work right on + # all charsets that this module happens to run under. + # Altho, hmm, what about that ord? Presumably that won't work right + # under non-ASCII charsets. Something should be done about that. + } + return; +} + +#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ +1; + +__END__ + +=head1 NAME + +Pod::Simple::DumpAsXML -- turn Pod into XML + +=head1 SYNOPSIS + + perl -MPod::Simple::DumpAsXML -e \ + "exit Pod::Simple::DumpAsXML->filter(shift)->any_errata_seen" \ + thingy.pod + +=head1 DESCRIPTION + +Pod::Simple::DumpAsXML is a subclass of L that parses Pod +and turns it into indented and wrapped XML. This class is of +interest to people writing Pod formatters based on Pod::Simple. + +Pod::Simple::DumpAsXML inherits methods from +L. + + +=head1 SEE ALSO + +L is rather like this class. +Pod::Simple::XMLOutStream's output is space-padded in a way +that's better for sending to an XML processor (that is, it has +no ignoreable whitespace). But +Pod::Simple::DumpAsXML's output is much more human-readable, being +(more-or-less) one token per line, with line-wrapping. + +L is rather like this class, +except that it doesn't dump with XML syntax. Try them and see +which one you like best! + +L, L + +The older libraries L, L, L + + +=head1 COPYRIGHT AND DISCLAIMERS + +Copyright (c) 2002 Sean M. Burke. All rights reserved. + +This library is free software; you can redistribute it and/or modify it +under the same terms as Perl itself. + +This program is distributed in the hope that it will be useful, but +without any warranty; without even the implied warranty of +merchantability or fitness for a particular purpose. + +=head1 AUTHOR + +Sean M. Burke C + +=cut + diff --git a/lib/Pod/Simple/HTML.pm b/lib/Pod/Simple/HTML.pm new file mode 100644 index 0000000..c0a505d --- /dev/null +++ b/lib/Pod/Simple/HTML.pm @@ -0,0 +1,889 @@ + +require 5; +package Pod::Simple::HTML; +use strict; +use Pod::Simple::PullParser (); +use vars qw( + @ISA %Tagmap $Computerese $LamePad $Linearization_Limit $VERSION + $Perldoc_URL_Prefix $Perldoc_URL_Postfix + $Title_Prefix $Title_Postfix $HTML_EXTENSION %ToIndex + $Doctype_decl $Content_decl +); +@ISA = ('Pod::Simple::PullParser'); +$VERSION = '3.03'; + +use UNIVERSAL (); +BEGIN { + if(defined &DEBUG) { } # no-op + elsif( defined &Pod::Simple::DEBUG ) { *DEBUG = \&Pod::Simple::DEBUG } + else { *DEBUG = sub () {0}; } +} + +$Doctype_decl ||= ''; # No. Just No. Don't even ask me for it. + # qq{\n}; + +$Content_decl ||= + q{}; + +$HTML_EXTENSION = '.html' unless defined $HTML_EXTENSION; +$Computerese = "" unless defined $Computerese; +$LamePad = '' unless defined $LamePad; + +$Linearization_Limit = 120 unless defined $Linearization_Limit; + # headings/items longer than that won't get an +$Perldoc_URL_Prefix = 'http://search.cpan.org/perldoc?' + unless defined $Perldoc_URL_Prefix; +$Perldoc_URL_Postfix = '' + unless defined $Perldoc_URL_Postfix; + +$Title_Prefix = '' unless defined $Title_Prefix; +$Title_Postfix = '' unless defined $Title_Postfix; +%ToIndex = map {; $_ => 1 } qw(head1 head2 head3 head4 ); # item-text + # 'item-text' stuff in the index doesn't quite work, and may + # not be a good idea anyhow. + + +__PACKAGE__->_accessorize( + 'perldoc_url_prefix', + # In turning L into http://whatever/Foo%3a%3aBar, what + # to put before the "Foo%3a%3aBar". + # (for singleton mode only?) + 'perldoc_url_postfix', + # what to put after "Foo%3a%3aBar" in the URL. Normally "". + + 'batch_mode', # whether we're in batch mode + 'batch_mode_current_level', + # When in batch mode, how deep the current module is: 1 for "LWP", + # 2 for "LWP::Procotol", 3 for "LWP::Protocol::GHTTP", etc + + 'title_prefix', 'title_postfix', + # What to put before and after the title in the head. + # Should already be &-escaped + + 'html_header_before_title', + 'html_header_after_title', + 'html_footer', + + 'index', # whether to add an index at the top of each page + # (actually it's a table-of-contents, but we'll call it an index, + # out of apparently longstanding habit) + + 'html_css', # URL of CSS file to point to + 'html_javascript', # URL of CSS file to point to + + 'force_title', # should already be &-escaped + 'default_title', # should already be &-escaped +); + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +my @_to_accept; + +%Tagmap = ( + 'Verbatim' => "\n", + '/Verbatim' => "\n", + 'VerbatimFormatted' => "\n", + '/VerbatimFormatted' => "\n", + 'VerbatimB' => "", + '/VerbatimB' => "", + 'VerbatimI' => "", + '/VerbatimI' => "", + 'VerbatimBI' => "", + '/VerbatimBI' => "", + + + 'Data' => "\n", + '/Data' => "\n", + + 'head1' => "\n

", # And also stick in an + 'head2' => "\n

", # '' + 'head3' => "\n

", # '' + 'head4' => "\n

", # '' + '/head1' => "

\n", + '/head2' => "

\n", + '/head3' => "\n", + '/head4' => "\n", + + 'X' => "", + + changes(qw( + Para=p + B=b I=i + over-bullet=ul + over-number=ol + over-text=dl + over-block=blockquote + item-bullet=li + item-number=li + item-text=dt + )), + changes2( + map {; m/^([-a-z]+)/s && push @_to_accept, $1; $_ } + qw[ + sample=samp + definition=dfn + kbd=keyboard + variable=var + citation=cite + abbreviation=abbr + acronym=acronym + subscript=sub + superscript=sup + big=big + small=small + underline=u + strikethrough=s + ] # no point in providing a way to get ..., I think + ), + + '/item-bullet' => "$LamePad\n", + '/item-number' => "$LamePad\n", + '/item-text' => "$LamePad\n", + 'item-body' => "\n
", + '/item-body' => "
\n", + + + 'B' => "", '/B' => "", + 'I' => "", '/I' => "", + 'F' => "", '/F' => "", + 'C' => "", '/C' => "", + 'L' => "", # ideally never used! + '/L' => "", +); + +sub changes { + return map {; m/^([-_:0-9a-zA-Z]+)=([-_:0-9a-zA-Z]+)$/s + ? ( $1, => "\n<$2>", "/$1", => "\n" ) : die "Funky $_" + } @_; +} +sub changes2 { + return map {; m/^([-_:0-9a-zA-Z]+)=([-_:0-9a-zA-Z]+)$/s + ? ( $1, => "<$2>", "/$1", => "" ) : die "Funky $_" + } @_; +} + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +sub go { exit Pod::Simple::HTML->parse_from_file(@ARGV) } + # Just so we can run from the command line. No options. + # For that, use perldoc! +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +sub new { + my $new = shift->SUPER::new(@_); + #$new->nix_X_codes(1); + $new->nbsp_for_S(1); + $new->accept_targets( 'html', 'HTML' ); + $new->accept_codes('VerbatimFormatted'); + $new->accept_codes(@_to_accept); + DEBUG > 2 and print "To accept: ", join(' ',@_to_accept), "\n"; + + $new->perldoc_url_prefix( $Perldoc_URL_Prefix ); + $new->perldoc_url_postfix( $Perldoc_URL_Postfix ); + $new->title_prefix( $Title_Prefix ); + $new->title_postfix( $Title_Postfix ); + + $new->html_header_before_title( + qq[$Doctype_decl] + ); + $new->html_header_after_title( join "\n" => + "", + $Content_decl, + "\n", + $new->version_tag_comment, + "\n", + ); + $new->html_footer( qq[\n\n\n\n] ); + + $new->{'Tagmap'} = {%Tagmap}; + return $new; +} + +sub batch_mode_page_object_init { + my($self, $batchconvobj, $module, $infile, $outfile, $depth) = @_; + DEBUG and print "Initting $self\n for $module\n", + " in $infile\n out $outfile\n depth $depth\n"; + $self->batch_mode(1); + $self->batch_mode_current_level($depth); + return $self; +} + +sub run { + my $self = $_[0]; + return $self->do_middle if $self->bare_output; + return + $self->do_beginning && $self->do_middle && $self->do_end; +} + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +sub do_beginning { + my $self = $_[0]; + + my $title; + + if(defined $self->force_title) { + $title = $self->force_title; + DEBUG and print "Forcing title to be $title\n"; + } else { + # Actually try looking for the title in the document: + $title = $self->get_short_title(); + unless($self->content_seen) { + DEBUG and print "No content seen in search for title.\n"; + return; + } + $self->{'Title'} = $title; + + if(defined $title and $title =~ m/\S/) { + $title = $self->title_prefix . esc($title) . $self->title_postfix; + } else { + $title = $self->default_title; + $title = '' unless defined $title; + DEBUG and print "Title defaults to $title\n"; + } + } + + + my $after = $self->html_header_after_title || ''; + if($self->html_css) { + my $link = + $self->html_css =~ m/html_css # It's a big blob of markup, let's drop it in + : sprintf( # It's just a URL, so let's wrap it up + qq[\n], + $self->html_css, + ); + $after =~ s{()}{$link\n$1}i; # otherwise nevermind + } + $self->_add_top_anchor(\$after); + + if($self->html_javascript) { + my $link = + $self->html_javascript =~ m/html_javascript # It's a big blob of markup, let's drop it in + : sprintf( # It's just a URL, so let's wrap it up + qq[\n], + $self->html_javascript, + ); + $after =~ s{()}{$link\n$1}i; # otherwise nevermind + } + + print {$self->{'output_fh'}} + $self->html_header_before_title || '', + $title, # already escaped + $after, + ; + + DEBUG and print "Returning from do_beginning...\n"; + return 1; +} + +sub _add_top_anchor { + my($self, $text_r) = @_; + unless($$text_r and $$text_r =~ m/name=['"]___top['"]/) { # a hack + $$text_r .= "\n"; + } + return; +} + +sub version_tag_comment { + my $self = shift; + return sprintf + "\n", + esc( + ref($self), $self->VERSION(), $ISA[0], $ISA[0]->VERSION(), + $], scalar(gmtime), + ), $self->_modnote(), + ; +} + +sub _modnote { + my $class = ref($_[0]) || $_[0]; + return join "\n " => grep m/\S/, split "\n", + +qq{ +If you want to change this HTML document, you probably shouldn't do that +by changing it directly. Instead, see about changing the calling options +to $class, and/or subclassing $class, +then reconverting this document from the Pod source. +When in doubt, email the author of $class for advice. +See 'perldoc $class' for more info. +}; + +} + +sub do_end { + my $self = $_[0]; + print {$self->{'output_fh'}} $self->html_footer || ''; + return 1; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +# Normally this would just be a call to _do_middle_main_loop -- but we +# have to do some elaborate things to emit all the content and then +# summarize it and output it /before/ the content that it's a summary of. + +sub do_middle { + my $self = $_[0]; + return $self->_do_middle_main_loop unless $self->index; + + if( $self->output_string ) { + # An efficiency hack + my $out = $self->output_string; #it's a reference to it + my $sneakytag = "\f\f\e\e\b\bIndex Here\e\e\b\b\f\f\n"; + $$out .= $sneakytag; + $self->_do_middle_main_loop; + $sneakytag = quotemeta($sneakytag); + my $index = $self->index_as_html(); + if( $$out =~ s/$sneakytag/$index/s ) { + # Expected case + DEBUG and print "Inserted ", length($index), " bytes of index HTML into $out.\n"; + } else { + DEBUG and print "Odd, couldn't find where to insert the index in the output!\n"; + # I don't think this should ever happen. + } + return 1; + } + + unless( $self->output_fh ) { + require Carp; + Carp::confess("Parser object \$p doesn't seem to have any output object! I don't know how to deal with that."); + } + + # If we get here, we're outputting to a FH. So we need to do some magic. + # Namely, divert all content to a string, which we output after the index. + my $fh = $self->output_fh; + my $content = ''; + { + # Our horrible bait and switch: + $self->output_string( \$content ); + $self->_do_middle_main_loop; + $self->abandon_output_string(); + $self->output_fh($fh); + } + print $fh $self->index_as_html(); + print $fh $content; + + return 1; +} + +########################################################################### + +sub index_as_html { + my $self = $_[0]; + # This is meant to be called AFTER the input document has been parsed! + + my $points = $self->{'PSHTML_index_points'} || []; + + @$points > 1 or return qq[
\n]; + # There's no point in having a 0-item or 1-item index, I dare say. + + my(@out) = qq{\n
}; + my $level = 0; + + my( $target_level, $previous_tagname, $tagname, $text, $anchorname, $indent); + foreach my $p (@$points, ['head0', '(end)']) { + ($tagname, $text) = @$p; + $anchorname = $self->section_escape($text); + if( $tagname =~ m{^head(\d+)$} ) { + $target_level = 0 + $1; + } else { # must be some kinda list item + if($previous_tagname =~ m{^head\d+$} ) { + $target_level = $level + 1; + } else { + $target_level = $level; # no change needed + } + } + + # Get to target_level by opening or closing ULs + while($level > $target_level) + { --$level; push @out, (" " x $level) . ""; } + while($level < $target_level) + { ++$level; push @out, (" " x ($level-1)) + . "
    "; } + + $previous_tagname = $tagname; + next unless $level; + + $indent = ' ' x $level; + push @out, sprintf + "%s
  • %s", + $indent, $level, $anchorname, esc($text) + ; + } + push @out, "
\n"; + return join "\n", @out; +} + +########################################################################### + +sub _do_middle_main_loop { + my $self = $_[0]; + my $fh = $self->{'output_fh'}; + my $tagmap = $self->{'Tagmap'}; + + my($token, $type, $tagname, $linkto, $linktype); + my @stack; + my $dont_wrap = 0; + + while($token = $self->get_token) { + + # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + if( ($type = $token->type) eq 'start' ) { + if(($tagname = $token->tagname) eq 'L') { + $linktype = $token->attr('type') || 'insane'; + + $linkto = $self->do_link($token); + + if(defined $linkto and length $linkto) { + esc($linkto); + # (Yes, SGML-escaping applies on top of %-escaping! + # But it's rarely noticeable in practice.) + print $fh qq{}; + } else { + print $fh ""; # Yes, an 'a' element with no attributes! + } + + } elsif ($tagname eq 'item-text' or $tagname =~ m/^head\d$/s) { + print $fh $tagmap->{$tagname} || next; + + my @to_unget; + while(1) { + push @to_unget, $self->get_token; + last if $to_unget[-1]->is_end + and $to_unget[-1]->tagname eq $tagname; + + # TODO: support for X<...>'s found in here? (maybe hack into linearize_tokens) + } + + my $name = $self->linearize_tokens(@to_unget); + + print $fh "section_name_tidy( $name ) ); + print $fh qq[name="$esc"]; + DEBUG and print "Linearized ", scalar(@to_unget), + " tokens as \"$name\".\n"; + push @{ $self->{'PSHTML_index_points'} }, [$tagname, $name] + if $ToIndex{ $tagname }; + # Obviously, this discards all formatting codes (saving + # just their content), but ahwell. + + } else { # ludicrously long, so nevermind + DEBUG and print "Linearized ", scalar(@to_unget), + " tokens, but it was too long, so nevermind.\n"; + } + print $fh "\n>"; + $self->unget_token(@to_unget); + + } elsif ($tagname eq 'Data') { + my $next = $self->get_token; + next unless defined $next; + unless( $next->type eq 'text' ) { + $self->unget_token($next); + next; + } + DEBUG and print " raw text ", $next->text, "\n"; + printf $fh "\n" . $next->text . "\n"; + next; + + } else { + if( $tagname =~ m/^over-/s ) { + push @stack, ''; + } elsif( $tagname =~ m/^item-/s and @stack and $stack[-1] ) { + print $fh $stack[-1]; + $stack[-1] = ''; + } + print $fh $tagmap->{$tagname} || next; + ++$dont_wrap if $tagname eq 'Verbatim' or $tagname eq "VerbatimFormatted" + or $tagname eq 'X'; + } + + # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + } elsif( $type eq 'end' ) { + if( ($tagname = $token->tagname) =~ m/^over-/s ) { + if( my $end = pop @stack ) { + print $fh $end; + } + } elsif( $tagname =~ m/^item-/s and @stack) { + $stack[-1] = $tagmap->{"/$tagname"}; + if( $tagname eq 'item-text' and defined(my $next = $self->get_token) ) { + $self->unget_token($next); + if( $next->type eq 'start' and $next->tagname !~ m/^item-/s ) { + print $fh $tagmap->{"/item-text"},$tagmap->{"item-body"}; + $stack[-1] = $tagmap->{"/item-body"}; + } + } + next; + } + print $fh $tagmap->{"/$tagname"} || next; + --$dont_wrap if $tagname eq 'Verbatim' or $tagname eq 'X'; + + # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + } elsif( $type eq 'text' ) { + esc($type = $token->text); # reuse $type, why not + $type =~ s/([\?\!\"\'\.\,]) /$1\n/g unless $dont_wrap; + print $fh $type; + } + + } + return 1; +} + +########################################################################### +# + +sub do_link { + my($self, $token) = @_; + my $type = $token->attr('type'); + if(!defined $type) { + $self->whine("Typeless L!?", $token->attr('start_line')); + } elsif( $type eq 'pod') { return $self->do_pod_link($token); + } elsif( $type eq 'url') { return $self->do_url_link($token); + } elsif( $type eq 'man') { return $self->do_man_link($token); + } else { + $self->whine("L of unknown type $type!?", $token->attr('start_line')); + } + return 'FNORG'; # should never get called +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub do_url_link { return $_[1]->attr('to') } + +sub do_man_link { return undef } + # But subclasses are welcome to override this if they have man + # pages somewhere URL-accessible. + + +sub do_pod_link { + # And now things get really messy... + my($self, $link) = @_; + my $to = $link->attr('to'); + my $section = $link->attr('section'); + return undef unless( # should never happen + (defined $to and length $to) or + (defined $section and length $section) + ); + + $section = $self->section_escape($section) + if defined $section and length($section .= ''); # (stringify) + + DEBUG and printf "Resolving \"%s\" \"%s\"...\n", + $to || "(nil)", $section || "(nil)"; + + { + # An early hack: + my $complete_url = $self->resolve_pod_link_by_table($to, $section); + if( $complete_url ) { + DEBUG > 1 and print "resolve_pod_link_by_table(T,S) gives ", + $complete_url, "\n (Returning that.)\n"; + return $complete_url; + } else { + DEBUG > 4 and print " resolve_pod_link_by_table(T,S)", + " didn't return anything interesting.\n"; + } + } + + if(defined $to and length $to) { + # Give this routine first hack again + my $there = $self->resolve_pod_link_by_table($to); + if(defined $there and length $there) { + DEBUG > 1 + and print "resolve_pod_link_by_table(T) gives $there\n"; + } else { + $there = + $self->resolve_pod_page_link($to, $section); + # (I pass it the section value, but I don't see a + # particular reason it'd use it.) + DEBUG > 1 and print "resolve_pod_page_link gives ", $to || "(nil)", "\n"; + unless( defined $there and length $there ) { + DEBUG and print "Can't resolve $to\n"; + return undef; + } + # resolve_pod_page_link returning undef is how it + # can signal that it gives up on making a link + } + $to = $there; + } + + #DEBUG and print "So far [", $to||'nil', "] [", $section||'nil', "]\n"; + + my $out = (defined $to and length $to) ? $to : ''; + $out .= "#" . $section if defined $section and length $section; + + unless(length $out) { # sanity check + DEBUG and printf "Oddly, couldn't resolve \"%s\" \"%s\"...\n", + $to || "(nil)", $section || "(nil)"; + return undef; + } + + DEBUG and print "Resolved to $out\n"; + return $out; +} + + +# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + +sub section_escape { + my($self, $section) = @_; + return $self->section_url_escape( + $self->section_name_tidy($section) + ); +} + +sub section_name_tidy { + my($self, $section) = @_; + $section =~ tr/ /_/; + $section =~ tr/\x00-\x1F\x80-\x9F//d if 'A' eq chr(65); # drop crazy characters + $section = $self->unicode_escape_url($section); + $section = '_' unless length $section; + return $section; +} + +sub section_url_escape { shift->general_url_escape(@_) } +sub pagepath_url_escape { shift->general_url_escape(@_) } + +sub general_url_escape { + my($self, $string) = @_; + + $string =~ s/([^\x00-\xFF])/join '', map sprintf('%%%02X',$_), unpack 'C*', $1/eg; + # express Unicode things as urlencode(utf(orig)). + + # A pretty conservative escaping, behoovey even for query components + # of a URL (see RFC 2396) + + $string =~ s/([^-_\.!~*()abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789])/sprintf('%%%02X',ord($1))/eg; + # Yes, stipulate the list without a range, so that this can work right on + # all charsets that this module happens to run under. + # Altho, hmm, what about that ord? Presumably that won't work right + # under non-ASCII charsets. Something should be done + # about that, I guess? + + return $string; +} + +#-------------------------------------------------------------------------- +# +# Oh look, a yawning portal to Hell! Let's play touch football right by it! +# + +sub resolve_pod_page_link { + # resolve_pod_page_link must return a properly escaped URL + my $self = shift; + return $self->batch_mode() + ? $self->resolve_pod_page_link_batch_mode(@_) + : $self->resolve_pod_page_link_singleton_mode(@_) + ; +} + +sub resolve_pod_page_link_singleton_mode { + my($self, $it) = @_; + return undef unless defined $it and length $it; + my $url = $self->pagepath_url_escape($it); + + $url =~ s{::$}{}s; # probably never comes up anyway + $url =~ s{::}{/}g unless $self->perldoc_url_prefix =~ m/\?/s; # sane DWIM? + + return undef unless length $url; + return $self->perldoc_url_prefix . $url . $self->perldoc_url_postfix; +} + +sub resolve_pod_page_link_batch_mode { + my($self, $to) = @_; + DEBUG > 1 and print " During batch mode, resolving $to ...\n"; + my @path = grep length($_), split m/::/s, $to, -1; + unless( @path ) { # sanity + DEBUG and print "Very odd! Splitting $to gives (nil)!\n"; + return undef; + } + $self->batch_mode_rectify_path(\@path); + my $out = join('/', map $self->pagepath_url_escape($_), @path) + . $HTML_EXTENSION; + DEBUG > 1 and print " => $out\n"; + return $out; +} + +sub batch_mode_rectify_path { + my($self, $pathbits) = @_; + my $level = $self->batch_mode_current_level; + $level--; # how many levels up to go to get to the root + if($level < 1) { + unshift @$pathbits, '.'; # just to be pretty + } else { + unshift @$pathbits, ('..') x $level; + } + return; +} + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +sub resolve_pod_link_by_table { + # A crazy hack to allow specifying custom L => URL mappings + + return unless $_[0]->{'podhtml_LOT'}; # An optimizy shortcut + + my($self, $to, $section) = @_; + + # TODO: add a method that actually populates podhtml_LOT from a file? + + if(defined $section) { + $to = '' unless defined $to and length $to; + return $self->{'podhtml_LOT'}{"$to#$section"}; # quite possibly undef! + } else { + return $self->{'podhtml_LOT'}{$to}; # quite possibly undef! + } + return; +} + +########################################################################### + +sub linearize_tokens { # self, tokens + my $self = shift; + my $out = ''; + + my $t; + while($t = shift @_) { + if(!ref $t or !UNIVERSAL::can($t, 'is_text')) { + $out .= $t; # a string, or some insane thing + } elsif($t->is_text) { + $out .= $t->text; + } elsif($t->is_start and $t->tag eq 'X') { + # Ignore until the end of this X<...> sequence: + my $x_open = 1; + while($x_open) { + next if( ($t = shift @_)->is_text ); + if( $t->is_start and $t->tag eq 'X') { ++$x_open } + elsif($t->is_end and $t->tag eq 'X') { --$x_open } + } + } + } + return undef if length $out > $Linearization_Limit; + return $out; +} + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +sub unicode_escape_url { + my($self, $string) = @_; + $string =~ s/([^\x00-\xFF])/'('.ord($1).')'/eg; + # Turn char 1234 into "(1234)" + return $string; +} + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +sub esc { # a function. + if(defined wantarray) { + if(wantarray) { + @_ = splice @_; # break aliasing + } else { + my $x = shift; + $x =~ s/([^-\n\t !\#\$\%\(\)\*\+,\.\~\/\:\;=\?\@\[\\\]\^_\`\{\|\}abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789])/'&#'.(ord($1)).';'/eg; + return $x; + } + } + foreach my $x (@_) { + # Escape things very cautiously: + $x =~ s/([^-\n\t !\#\$\%\(\)\*\+,\.\~\/\:\;=\?\@\[\\\]\^_\`\{\|\}abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789])/'&#'.(ord($1)).';'/eg + if defined $x; + # Leave out "- so that "--" won't make it thru in X-generated comments + # with text in them. + + # Yes, stipulate the list without a range, so that this can work right on + # all charsets that this module happens to run under. + # Altho, hmm, what about that ord? Presumably that won't work right + # under non-ASCII charsets. Something should be done about that. + } + return @_; +} + +#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +1; +__END__ + +=head1 NAME + +Pod::Simple::HTML - convert Pod to HTML + +=head1 SYNOPSIS + + perl -MPod::Simple::HTML -e Pod::Simple::HTML::go thingy.pod + + +=head1 DESCRIPTION + +This class is for making an HTML rendering of a Pod document. + +This is a subclass of L and inherits all its +methods (and options). + +Note that if you want to do a batch conversion of a lot of Pod +documents to HTML, you should see the module L. + + + +=head1 CALLING FROM THE COMMAND LINE + +TODO + + perl -MPod::Simple::HTML -e Pod::Simple::HTML::go Thing.pod Thing.html + + + +=head1 CALLING FROM PERL + +TODO make a new object, set any options, and use parse_from_file + + +=head1 METHODS + +TODO +all (most?) accessorized methods + + +=head1 SUBCLASSING + +TODO + + can just set any of: html_css html_javascript title_prefix + 'html_header_before_title', + 'html_header_after_title', + 'html_footer', + +maybe override do_pod_link + +maybe override do_beginning do_end + + + +=head1 SEE ALSO + +L, L + + +TODO: a corpus of sample Pod input and HTML output? Or common +idioms? + + + +=head1 COPYRIGHT AND DISCLAIMERS + +Copyright (c) 2002-2004 Sean M. Burke. All rights reserved. + +This library is free software; you can redistribute it and/or modify it +under the same terms as Perl itself. + +This program is distributed in the hope that it will be useful, but +without any warranty; without even the implied warranty of +merchantability or fitness for a particular purpose. + +=head1 AUTHOR + +Sean M. Burke C + +=cut + diff --git a/lib/Pod/Simple/HTMLBatch.pm b/lib/Pod/Simple/HTMLBatch.pm new file mode 100644 index 0000000..bce0a44 --- /dev/null +++ b/lib/Pod/Simple/HTMLBatch.pm @@ -0,0 +1,1342 @@ + +require 5; +package Pod::Simple::HTMLBatch; +use strict; +use vars qw( $VERSION $HTML_RENDER_CLASS $HTML_EXTENSION + $CSS $JAVASCRIPT $SLEEPY $SEARCH_CLASS @ISA +); +$VERSION = '3.02'; +@ISA = (); # Yup, we're NOT a subclass of Pod::Simple::HTML! + +# TODO: nocontents stylesheets. Strike some of the color variations? + +use Pod::Simple::HTML (); +BEGIN {*esc = \&Pod::Simple::HTML::esc } +use File::Spec (); +use UNIVERSAL (); + # "Isn't the Universe an amazing place? I wouldn't live anywhere else!" + +use Pod::Simple::Search; +$SEARCH_CLASS ||= 'Pod::Simple::Search'; + +BEGIN { + if(defined &DEBUG) { } # no-op + elsif( defined &Pod::Simple::DEBUG ) { *DEBUG = \&Pod::Simple::DEBUG } + else { *DEBUG = sub () {0}; } +} + +$SLEEPY = 1 if !defined $SLEEPY and $^O =~ /mswin|mac/i; +# flag to occasionally sleep for $SLEEPY - 1 seconds. + +$HTML_RENDER_CLASS ||= "Pod::Simple::HTML"; + +# +# Methods beginning with "_" are particularly internal and possibly ugly. +# + +Pod::Simple::_accessorize( __PACKAGE__, + 'verbose', # how verbose to be during batch conversion + 'html_render_class', # what class to use to render + 'contents_file', # If set, should be the name of a file (in current directory) + # to write the list of all modules to + 'index', # will set $htmlpage->index(...) to this (true or false) + 'progress', # progress object + 'contents_page_start', 'contents_page_end', + + 'css_flurry', '_css_wad', 'javascript_flurry', '_javascript_wad', + 'no_contents_links', # set to true to suppress automatic adding of << links. + '_contents', +); + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +# Just so we can run from the command line more easily +sub go { + @ARGV == 2 or die sprintf( + "Usage: perl -M%s -e %s:go indirs outdir\n (or use \"\@INC\" for indirs)\n", + __PACKAGE__, __PACKAGE__, + ); + + if(defined($ARGV[1]) and length($ARGV[1])) { + my $d = $ARGV[1]; + -e $d or die "I see no output directory named \"$d\"\nAborting"; + -d $d or die "But \"$d\" isn't a directory!\nAborting"; + -w $d or die "Directory \"$d\" isn't writeable!\nAborting"; + } + + __PACKAGE__->batch_convert(@ARGV); +} +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + +sub new { + my $new = bless {}, ref($_[0]) || $_[0]; + $new->html_render_class($HTML_RENDER_CLASS); + $new->verbose(1 + DEBUG); + $new->_contents([]); + + $new->index(1); + + $new-> _css_wad([]); $new->css_flurry(1); + $new->_javascript_wad([]); $new->javascript_flurry(1); + + $new->contents_file( + 'index' . ($HTML_EXTENSION || $Pod::Simple::HTML::HTML_EXTENSION) + ); + + $new->contents_page_start( join "\n", grep $_, + $Pod::Simple::HTML::Doctype_decl, + "", + "Perl Documentation", + $Pod::Simple::HTML::Content_decl, + "", + "\n\n

Perl Documentation

\n" + ); # override if you need a different title + + + $new->contents_page_end( sprintf( + "\n\n

Generated by %s v%s under Perl v%s\n
At %s GMT, which is %s local time.

\n\n\n", + esc( + ref($new), + eval {$new->VERSION} || $VERSION, + $], scalar(gmtime), scalar(localtime), + ))); + + return $new; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub muse { + my $self = shift; + if($self->verbose) { + print 'T+', int(time() - $self->{'_batch_start_time'}), "s: ", @_, "\n"; + } + return 1; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub batch_convert { + my($self, $dirs, $outdir) = @_; + $self ||= __PACKAGE__; # tolerate being called as an optionless function + $self = $self->new unless ref $self; # tolerate being used as a class method + + if(!defined($dirs) or $dirs eq '' or $dirs eq '@INC' ) { + $dirs = ''; + } elsif(ref $dirs) { + # OK, it's an explicit set of dirs to scan, specified as an arrayref. + } else { + # OK, it's an explicit set of dirs to scan, specified as a + # string like "/thing:/also:/whatever/perl" (":"-delim, as usual) + # or, under MSWin, like "c:/thing;d:/also;c:/whatever/perl" (";"-delim!) + require Config; + my $ps = quotemeta( $Config::Config{'path_sep'} || ":" ); + $dirs = [ grep length($_), split qr/$ps/, $dirs ]; + } + + $outdir = $self->filespecsys->curdir + unless defined $outdir and length $outdir; + + $self->_batch_convert_main($dirs, $outdir); +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _batch_convert_main { + my($self, $dirs, $outdir) = @_; + # $dirs is either false, or an arrayref. + # $outdir is a pathspec. + + $self->{'_batch_start_time'} ||= time(); + + $self->muse( "= ", scalar(localtime) ); + $self->muse( "Starting batch conversion to \"$outdir\"" ); + + my $progress = $self->progress; + if(!$progress and $self->verbose > 0 and $self->verbose() <= 5) { + require Pod::Simple::Progress; + $progress = Pod::Simple::Progress->new( + ($self->verbose < 2) ? () # Default omission-delay + : ($self->verbose == 2) ? 1 # Reduce the omission-delay + : 0 # Eliminate the omission-delay + ); + $self->progress($progress); + } + + if($dirs) { + $self->muse(scalar(@$dirs), " dirs to scan: @$dirs"); + } else { + $self->muse("Scanning \@INC. This could take a minute or two."); + } + my $mod2path = $self->find_all_pods($dirs ? $dirs : ()); + $self->muse("Done scanning."); + + my $total = keys %$mod2path; + unless($total) { + $self->muse("No pod found. Aborting batch conversion.\n"); + return $self; + } + + $progress and $progress->goal($total); + $self->muse("Now converting pod files to HTML.", + ($total > 25) ? " This will take a while more." : () + ); + + $self->_spray_css( $outdir ); + $self->_spray_javascript( $outdir ); + + $self->_do_all_batch_conversions($mod2path, $outdir); + + $progress and $progress->done(sprintf ( + "Done converting %d files.", $self->{"__batch_conv_page_count"} + )); + return $self->_batch_convert_finish($outdir); + return $self; +} + + +sub _do_all_batch_conversions { + my($self, $mod2path, $outdir) = @_; + $self->{"__batch_conv_page_count"} = 0; + + foreach my $module (sort {lc($a) cmp lc($b)} keys %$mod2path) { + $self->_do_one_batch_conversion($module, $mod2path, $outdir); + sleep($SLEEPY - 1) if $SLEEPY; + } + + return; +} + +sub _batch_convert_finish { + my($self, $outdir) = @_; + $self->write_contents_file($outdir); + $self->muse("Done with batch conversion. $$self{'__batch_conv_page_count'} files done."); + $self->muse( "= ", scalar(localtime) ); + $self->progress and $self->progress->done("All done!"); + return; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _do_one_batch_conversion { + my($self, $module, $mod2path, $outdir, $outfile) = @_; + + my $retval; + my $total = scalar keys %$mod2path; + my $infile = $mod2path->{$module}; + my @namelets = grep m/\S/, split "::", $module; + # this can stick around in the contents LoL + my $depth = scalar @namelets; + die "Contentless thingie?! $module $infile" unless @namelets; #sanity + + $outfile ||= do { + my @n = @namelets; + $n[-1] .= $HTML_EXTENSION || $Pod::Simple::HTML::HTML_EXTENSION; + $self->filespecsys->catfile( $outdir, @n ); + }; + + my $progress = $self->progress; + + my $page = $self->html_render_class->new; + if(DEBUG > 5) { + $self->muse($self->{"__batch_conv_page_count"} + 1, "/$total: ", + ref($page), " render ($depth) $module => $outfile"); + } elsif(DEBUG > 2) { + $self->muse($self->{"__batch_conv_page_count"} + 1, "/$total: $module => $outfile") + } + + # Give each class a chance to init the converter: + + $page->batch_mode_page_object_init($self, $module, $infile, $outfile, $depth) + if $page->can('batch_mode_page_object_init'); + $self->batch_mode_page_object_init($page, $module, $infile, $outfile, $depth) + if $self->can('batch_mode_page_object_init'); + + # Now get busy... + $self->makepath($outdir => \@namelets); + + $progress and $progress->reach($self->{"__batch_conv_page_count"}, "Rendering $module"); + + if( $retval = $page->parse_from_file($infile, $outfile) ) { + ++ $self->{"__batch_conv_page_count"} ; + $self->note_for_contents_file( \@namelets, $infile, $outfile ); + } else { + $self->muse("Odd, parse_from_file(\"$infile\", \"$outfile\") returned false."); + } + + $page->batch_mode_page_object_kill($self, $module, $infile, $outfile, $depth) + if $page->can('batch_mode_page_object_kill'); + # The following isn't a typo. Note that it switches $self and $page. + $self->batch_mode_page_object_kill($page, $module, $infile, $outfile, $depth) + if $self->can('batch_mode_page_object_kill'); + + DEBUG > 4 and printf "%s %sb < $infile %s %sb\n", + $outfile, -s $outfile, $infile, -s $infile + ; + + undef($page); + return $retval; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +sub filespecsys { $_[0]{'_filespecsys'} || 'File::Spec' } + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub note_for_contents_file { + my($self, $namelets, $infile, $outfile) = @_; + + # I think the infile and outfile parts are never used. -- SMB + # But it's handy to have them around for debugging. + + if( $self->contents_file ) { + my $c = $self->_contents(); + push @$c, + [ join("::", @$namelets), $infile, $outfile, $namelets ] + # 0 1 2 3 + ; + DEBUG > 3 and print "Noting @$c[-1]\n"; + } + return; +} + +#_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_- + +sub write_contents_file { + my($self, $outdir) = @_; + my $outfile = $self->_contents_filespec($outdir) || return; + + $self->muse("Preparing list of modules for ToC"); + + my($toplevel, # maps toplevelbit => [all submodules] + $toplevel_form_freq, # ends up being 'foo' => 'Foo' + ) = $self->_prep_contents_breakdown; + + my $Contents = eval { $self->_wopen($outfile) }; + if( $Contents ) { + $self->muse( "Writing contents file $outfile" ); + } else { + warn "Couldn't write-open contents file $outfile: $!\nAbort writing to $outfile at all"; + return; + } + + $self->_write_contents_start( $Contents, $outfile, ); + $self->_write_contents_middle( $Contents, $outfile, $toplevel, $toplevel_form_freq ); + $self->_write_contents_end( $Contents, $outfile, ); + return $outfile; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _write_contents_start { + my($self, $Contents, $outfile) = @_; + my $starter = $self->contents_page_start || ''; + + { + my $css_wad = $self->_css_wad_to_markup(1); + if( $css_wad ) { + $starter =~ s{()}{\n$css_wad\n$1}i; # otherwise nevermind + } + + my $javascript_wad = $self->_javascript_wad_to_markup(1); + if( $javascript_wad ) { + $starter =~ s{()}{\n$javascript_wad\n$1}i; # otherwise nevermind + } + } + + unless(print $Contents $starter, "
\n" ) { + warn "Couldn't print to $outfile: $!\nAbort writing to $outfile at all"; + close($Contents); + return 0; + } + return 1; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _write_contents_middle { + my($self, $Contents, $outfile, $toplevel2submodules, $toplevel_form_freq) = @_; + + foreach my $t (sort keys %$toplevel2submodules) { + my @downlines = sort {$a->[-1] cmp $b->[-1]} + @{ $toplevel2submodules->{$t} }; + + printf $Contents qq[
%s
\n
\n], + esc( $t, $toplevel_form_freq->{$t} ) + ; + + my($path, $name); + foreach my $e (@downlines) { + $name = $e->[0]; + $path = join( "/", '.', esc( @{$e->[3]} ) ) + . ($HTML_EXTENSION || $Pod::Simple::HTML::HTML_EXTENSION); + print $Contents qq{ }, esc($name), "  \n"; + } + print $Contents "
\n\n"; + } + return 1; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _write_contents_end { + my($self, $Contents, $outfile) = @_; + unless( + print $Contents "
\n", + $self->contents_page_end || '', + ) { + warn "Couldn't write to $outfile: $!"; + } + close($Contents) or warn "Couldn't close $outfile: $!"; + return 1; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _prep_contents_breakdown { + my($self) = @_; + my $contents = $self->_contents; + my %toplevel; # maps lctoplevelbit => [all submodules] + my %toplevel_form_freq; # ends up being 'foo' => 'Foo' + # (mapping anycase forms to most freq form) + + foreach my $entry (@$contents) { + my $toplevel = + $entry->[0] =~ m/^perl\w*$/ ? 'perl_core_docs' + # group all the perlwhatever docs together + : $entry->[3][0] # normal case + ; + ++$toplevel_form_freq{ lc $toplevel }{ $toplevel }; + push @{ $toplevel{ lc $toplevel } }, $entry; + push @$entry, lc($entry->[0]); # add a sort-order key to the end + } + + foreach my $toplevel (sort keys %toplevel) { + my $fgroup = $toplevel_form_freq{$toplevel}; + $toplevel_form_freq{$toplevel} = + ( + sort { $fgroup->{$b} <=> $fgroup->{$a} or $a cmp $b } + keys %$fgroup + # This hash is extremely unlikely to have more than 4 members, so this + # sort isn't so very wasteful + )[0]; + } + + return(\%toplevel, \%toplevel_form_freq) if wantarray; + return \%toplevel; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _contents_filespec { + my($self, $outdir) = @_; + my $outfile = $self->contents_file; + return unless $outfile; + return $self->filespecsys->catfile( $outdir, $outfile ); +} + +#_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_- + +sub makepath { + my($self, $outdir, $namelets) = @_; + return unless @$namelets > 1; + for my $i (0 .. ($#$namelets - 1)) { + my $dir = $self->filespecsys->catdir( $outdir, @$namelets[0 .. $i] ); + if(-e $dir) { + die "$dir exists but not as a directory!?" unless -d $dir; + next; + } + DEBUG > 3 and print " Making $dir\n"; + mkdir $dir, 0777 + or die "Can't mkdir $dir: $!\nAborting" + ; + } + return; +} + +#_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_- + +sub batch_mode_page_object_init { + my $self = shift; + my($page, $module, $infile, $outfile, $depth) = @_; + + # TODO: any further options to percolate onto this new object here? + + $page->default_title($module); + $page->index( $self->index ); + + $page->html_css( $self-> _css_wad_to_markup($depth) ); + $page->html_javascript( $self->_javascript_wad_to_markup($depth) ); + + $self->add_header_backlink($page, $module, $infile, $outfile, $depth); + $self->add_footer_backlink($page, $module, $infile, $outfile, $depth); + + + return $self; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub add_header_backlink { + my $self = shift; + return if $self->no_contents_links; + my($page, $module, $infile, $outfile, $depth) = @_; + $page->html_header_after_title( join '', + $page->html_header_after_title || '', + + qq[

<<

\n], + ) + if $self->contents_file + ; + return; +} + +sub add_footer_backlink { + my $self = shift; + return if $self->no_contents_links; + my($page, $module, $infile, $outfile, $depth) = @_; + $page->html_footer( join '', + qq[

<<

\n], + + $page->html_footer || '', + ) + if $self->contents_file + ; + return; +} + +sub url_up_to_contents { + my($self, $depth) = @_; + --$depth; + return join '/', ('..') x $depth, esc($self->contents_file); +} + +#_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_- + +sub find_all_pods { + my($self, $dirs) = @_; + # You can override find_all_pods in a subclass if you want to + # do extra filtering or whatnot. But for the moment, we just + # pass to modnames2paths: + return $self->modnames2paths($dirs); +} + +#_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_- + +sub modnames2paths { # return a hashref mapping modulenames => paths + my($self, $dirs) = @_; + + my $m2p; + { + my $search = $SEARCH_CLASS->new; + DEBUG and print "Searching via $search\n"; + $search->verbose(1) if DEBUG > 10; + $search->progress( $self->progress->copy->goal(0) ) if $self->progress; + $search->shadows(0); # don't bother noting shadowed files + $search->inc( $dirs ? 0 : 1 ); + $search->survey( $dirs ? @$dirs : () ); + $m2p = $search->name2path; + die "What, no name2path?!" unless $m2p; + } + + $self->muse("That's odd... no modules found!") unless keys %$m2p; + if( DEBUG > 4 ) { + print "Modules found (name => path):\n"; + foreach my $m (sort {lc($a) cmp lc($b)} keys %$m2p) { + print " $m $$m2p{$m}\n"; + } + print "(total ", scalar(keys %$m2p), ")\n\n"; + } elsif( DEBUG ) { + print "Found ", scalar(keys %$m2p), " modules.\n"; + } + $self->muse( "Found ", scalar(keys %$m2p), " modules." ); + + # return the Foo::Bar => /whatever/Foo/Bar.pod|pm hashref + return $m2p; +} + +#=========================================================================== + +sub _wopen { + # this is abstracted out so that the daemon class can override it + my($self, $outpath) = @_; + require Symbol; + my $out_fh = Symbol::gensym(); + DEBUG > 5 and print "Write-opening to $outpath\n"; + return $out_fh if open($out_fh, "> $outpath"); + require Carp; + Carp::croak("Can't write-open $outpath: $!"); +} + +#========================================================================== + +sub add_css { + my($self, $url, $is_default, $name, $content_type, $media, $_code) = @_; + return unless $url; + unless($name) { + # cook up a reasonable name based on the URL + $name = $url; + if( $name !~ m/\?/ and $name =~ m{([^/]+)$}s ) { + $name = $1; + $name =~ s/\.css//i; + } + } + $media ||= 'all'; + $content_type ||= 'text/css'; + + my $bunch = [$url, $name, $content_type, $media, $_code]; + if($is_default) { unshift @{ $self->_css_wad }, $bunch } + else { push @{ $self->_css_wad }, $bunch } + return; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _spray_css { + my($self, $outdir) = @_; + + return unless $self->css_flurry(); + $self->_gen_css_wad(); + + my $lol = $self->_css_wad; + foreach my $chunk (@$lol) { + my $url = $chunk->[0]; + my $outfile; + if( ref($chunk->[-1]) and $url =~ m{^(_[-a-z0-9_]+\.css$)} ) { + $outfile = $self->filespecsys->catfile( $outdir, $1 ); + DEBUG > 5 and print "Noting $$chunk[0] as a file I'll create.\n"; + } else { + DEBUG > 5 and print "OK, noting $$chunk[0] as an external CSS.\n"; + # Requires no further attention. + next; + } + + #$self->muse( "Writing autogenerated CSS file $outfile" ); + my $Cssout = $self->_wopen($outfile); + print $Cssout ${$chunk->[-1]} + or warn "Couldn't print to $outfile: $!\nAbort writing to $outfile at all"; + close($Cssout); + DEBUG > 5 and print "Wrote $outfile\n"; + } + + return; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + +sub _css_wad_to_markup { + my($self, $depth) = @_; + + my @css = @{ $self->_css_wad || return '' }; + return '' unless @css; + + my $rel = 'stylesheet'; + my $out = ''; + + --$depth; + my $uplink = $depth ? ('../' x $depth) : ''; + + foreach my $chunk (@css) { + next unless $chunk and @$chunk; + + my( $url1, $url2, $title, $type, $media) = ( + $self->_maybe_uplink( $chunk->[0], $uplink ), + esc(grep !ref($_), @$chunk) + ); + + $out .= qq{\n}; + + $rel = 'alternate stylesheet'; # alternates = all non-first iterations + } + return $out; +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +sub _maybe_uplink { + # if the given URL looks relative, return the given uplink string -- + # otherwise return emptystring + my($self, $url, $uplink) = @_; + ($url =~ m{^\./} or $url !~ m{[/\:]} ) + ? $uplink + : '' + # qualify it, if/as needed +} + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +sub _gen_css_wad { + my $self = $_[0]; + my $css_template = $self->_css_template; + foreach my $variation ( + + # Commented out for sake of concision: + # + # 011n=black_with_red_on_white + # 001n=black_with_yellow_on_white + # 101n=black_with_green_on_white + # 110=white_with_yellow_on_black + # 010=white_with_green_on_black + # 011=white_with_blue_on_black + # 100=white_with_red_on_black + + qw[ + 110n=black_with_blue_on_white + 010n=black_with_magenta_on_white + 100n=black_with_cyan_on_white + + 101=white_with_purple_on_black + 001=white_with_navy_blue_on_black + + 010a=grey_with_green_on_black + 010b=white_with_green_on_grey + 101an=black_with_green_on_grey + 101bn=grey_with_green_on_white + ]) { + + my $outname = $variation; + my($flipmode, @swap) = ( ($4 || ''), $1,$2,$3) + if $outname =~ s/^([012])([012])([[012])([a-z]*)=?//s; + @swap = () if '010' eq join '', @swap; # 010 is a swop-no-op! + + my $this_css = + "/* This file is autogenerated. Do not edit. $variation */\n\n" + . $css_template; + + # Only look at three-digitty colors, for now at least. + if( $flipmode =~ m/n/ ) { + $this_css =~ s/(#[0-9a-fA-F]{3})\b/_color_negate($1)/eg; + $this_css =~ s/\bthin\b/medium/g; + } + $this_css =~ s<#([0-9a-fA-F])([0-9a-fA-F])([0-9a-fA-F])\b> + < join '', '#', ($1,$2,$3)[@swap] >eg if @swap; + + if( $flipmode =~ m/a/) + { $this_css =~ s/#fff\b/#999/gi } # black -> dark grey + elsif($flipmode =~ m/b/) + { $this_css =~ s/#000\b/#666/gi } # white -> light grey + + my $name = $outname; + $name =~ tr/-_/ /; + $self->add_css( "_$outname.css", 0, $name, 0, 0, \$this_css); + } + + # Now a few indexless variations: + foreach my $variation (qw[ + black_with_blue_on_white white_with_purple_on_black + white_with_green_on_grey grey_with_green_on_white + ]) { + my $outname = "indexless_$variation"; + my $this_css = join "\n", + "/* This file is autogenerated. Do not edit. $outname */\n", + "\@import url(\"./_$variation.css\");", + ".indexgroup { display: none; }", + "\n", + ; + my $name = $outname; + $name =~ tr/-_/ /; + $self->add_css( "_$outname.css", 0, $name, 0, 0, \$this_css); + } + + return; +} + +sub _color_negate { + my $x = lc $_[0]; + $x =~ tr[0123456789abcdef] + [fedcba9876543210]; + return $x; +} + +#=========================================================================== + +sub add_javascript { + my($self, $url, $content_type, $_code) = @_; + return unless $url; + push @{ $self->_javascript_wad }, [ + $url, $content_type || 'text/javascript', $_code + ]; + return; +} + +sub _spray_javascript { + my($self, $outdir) = @_; + return unless $self->javascript_flurry(); + $self->_gen_javascript_wad(); + + my $lol = $self->_javascript_wad; + foreach my $script (@$lol) { + my $url = $script->[0]; + my $outfile; + + if( ref($script->[-1]) and $url =~ m{^(_[-a-z0-9_]+\.js$)} ) { + $outfile = $self->filespecsys->catfile( $outdir, $1 ); + DEBUG > 5 and print "Noting $$script[0] as a file I'll create.\n"; + } else { + DEBUG > 5 and print "OK, noting $$script[0] as an external JavaScript.\n"; + next; + } + + #$self->muse( "Writing JavaScript file $outfile" ); + my $Jsout = $self->_wopen($outfile); + + print $Jsout ${$script->[-1]} + or warn "Couldn't print to $outfile: $!\nAbort writing to $outfile at all"; + close($Jsout); + DEBUG > 5 and print "Wrote $outfile\n"; + } + + return; +} + +sub _gen_javascript_wad { + my $self = $_[0]; + my $js_code = $self->_javascript || return; + $self->add_javascript( "_podly.js", 0, \$js_code); + return; +} + +sub _javascript_wad_to_markup { + my($self, $depth) = @_; + + my @scripts = @{ $self->_javascript_wad || return '' }; + return '' unless @scripts; + + my $out = ''; + + --$depth; + my $uplink = $depth ? ('../' x $depth) : ''; + + foreach my $s (@scripts) { + next unless $s and @$s; + + my( $url1, $url2, $type, $media) = ( + $self->_maybe_uplink( $s->[0], $uplink ), + esc(grep !ref($_), @$s) + ); + + $out .= qq{\n}; + } + return $out; +} + +#=========================================================================== + +sub _css_template { return $CSS } +sub _javascript { return $JAVASCRIPT } + +$CSS = <<'EOCSS'; +/* For accessibility reasons, never specify text sizes in px/pt/pc/in/cm/mm */ + +@media all { .hide { display: none; } } + +@media print { + .noprint, div.indexgroup, .backlinktop, .backlinkbottom { display: none } + + * { + border-color: black !important; + color: black !important; + background-color: transparent !important; + background-image: none !important; + } + + dl.superindex > dd { + word-spacing: .6em; + } +} + +@media aural, braille, embossed { + div.indexgroup { display: none; } /* Too noisy, don't you think? */ + dl.superindex > dt:before { content: "Group "; } + dl.superindex > dt:after { content: " contains:"; } + .backlinktop a:before { content: "Back to contents"; } + .backlinkbottom a:before { content: "Back to contents"; } +} + +@media aural { + dl.superindex > dt { pause-before: 600ms; } +} + +@media screen, tty, tv, projection { + .noscreen { display: none; } + + a:link { color: #7070ff; text-decoration: underline; } + a:visited { color: #e030ff; text-decoration: underline; } + a:active { color: #800000; text-decoration: underline; } + body.contentspage a { text-decoration: none; } + a.u { color: #fff !important; text-decoration: none; } + + body.pod { + margin: 0 5px; + color: #fff; + background-color: #000; + } + + body.pod h1, body.pod h2, body.pod h3, body.pod h4 { + font-family: Tahoma, Verdana, Helvetica, Arial, sans-serif; + font-weight: normal; + margin-top: 1.2em; + margin-bottom: .1em; + border-top: thin solid transparent; + /* margin-left: -5px; border-left: 2px #7070ff solid; padding-left: 3px; */ + } + + body.pod h1 { border-top-color: #0a0; } + body.pod h2 { border-top-color: #080; } + body.pod h3 { border-top-color: #040; } + body.pod h4 { border-top-color: #010; } + + p.backlinktop + h1 { border-top: none; margin-top: 0em; } + p.backlinktop + h2 { border-top: none; margin-top: 0em; } + p.backlinktop + h3 { border-top: none; margin-top: 0em; } + p.backlinktop + h4 { border-top: none; margin-top: 0em; } + + body.pod dt { + font-size: 105%; /* just a wee bit more than normal */ + } + + .indexgroup { font-size: 80%; } + + .backlinktop, .backlinkbottom { + margin-left: -5px; + margin-right: -5px; + background-color: #040; + border-top: thin solid #050; + border-bottom: thin solid #050; + } + + .backlinktop a, .backlinkbottom a { + text-decoration: none; + color: #080; + background-color: #000; + border: thin solid #0d0; + } + .backlinkbottom { margin-bottom: 0; padding-bottom: 0; } + .backlinktop { margin-top: 0; padding-top: 0; } + + body.contentspage { + color: #fff; + background-color: #000; + } + + body.contentspage h1 { + color: #0d0; + margin-left: 1em; + margin-right: 1em; + text-indent: -.9em; + font-family: Tahoma, Verdana, Helvetica, Arial, sans-serif; + font-weight: normal; + border-top: thin solid #fff; + border-bottom: thin solid #fff; + text-align: center; + } + + dl.superindex > dt { + font-family: Tahoma, Verdana, Helvetica, Arial, sans-serif; + font-weight: normal; + font-size: 90%; + margin-top: .45em; + /* margin-bottom: -.15em; */ + } + dl.superindex > dd { + word-spacing: .6em; /* most important rule here! */ + } + dl.superindex > a:link { + text-decoration: none; + color: #fff; + } + + .contentsfooty { + border-top: thin solid #999; + font-size: 90%; + } + +} + +/* The End */ + +EOCSS + +#========================================================================== + +$JAVASCRIPT = <<'EOJAVASCRIPT'; + +// From http://www.alistapart.com/articles/alternate/ + +function setActiveStyleSheet(title) { + var i, a, main; + for(i=0 ; (a = document.getElementsByTagName("link")[i]) ; i++) { + if(a.getAttribute("rel").indexOf("style") != -1 && a.getAttribute("title")) { + a.disabled = true; + if(a.getAttribute("title") == title) a.disabled = false; + } + } +} + +function getActiveStyleSheet() { + var i, a; + for(i=0 ; (a = document.getElementsByTagName("link")[i]) ; i++) { + if( a.getAttribute("rel").indexOf("style") != -1 + && a.getAttribute("title") + && !a.disabled + ) return a.getAttribute("title"); + } + return null; +} + +function getPreferredStyleSheet() { + var i, a; + for(i=0 ; (a = document.getElementsByTagName("link")[i]) ; i++) { + if( a.getAttribute("rel").indexOf("style") != -1 + && a.getAttribute("rel").indexOf("alt") == -1 + && a.getAttribute("title") + ) return a.getAttribute("title"); + } + return null; +} + +function createCookie(name,value,days) { + if (days) { + var date = new Date(); + date.setTime(date.getTime()+(days*24*60*60*1000)); + var expires = "; expires="+date.toGMTString(); + } + else expires = ""; + document.cookie = name+"="+value+expires+"; path=/"; +} + +function readCookie(name) { + var nameEQ = name + "="; + var ca = document.cookie.split(';'); + for(var i=0 ; i < ca.length ; i++) { + var c = ca[i]; + while (c.charAt(0)==' ') c = c.substring(1,c.length); + if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length,c.length); + } + return null; +} + +window.onload = function(e) { + var cookie = readCookie("style"); + var title = cookie ? cookie : getPreferredStyleSheet(); + setActiveStyleSheet(title); +} + +window.onunload = function(e) { + var title = getActiveStyleSheet(); + createCookie("style", title, 365); +} + +var cookie = readCookie("style"); +var title = cookie ? cookie : getPreferredStyleSheet(); +setActiveStyleSheet(title); + +// The End + +EOJAVASCRIPT + +# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +1; +__END__ + + +=head1 NAME + +Pod::Simple::HTMLBatch - convert several Pod files to several HTML files + +=head1 SYNOPSIS + + perl -MPod::Simple::HTMLBatch -e 'Pod::Simple::HTMLBatch::go' in out + + +=head1 DESCRIPTION + +This module is used for running batch-conversions of a lot of HTML +documents + +This class is NOT a subclass of Pod::Simple::HTML +(nor of bad old Pod::Html) -- although it uses +Pod::Simple::HTML for doing the conversion of each document. + +The normal use of this class is like so: + + use Pod::Simple::HTMLBatch; + my $batchconv = Pod::Simple::HTMLBatch->new; + $batchconv->some_option( some_value ); + $batchconv->some_other_option( some_other_value ); + $batchconv->batch_convert( \@search_dirs, $output_dir ); + +=head2 FROM THE COMMAND LINE + +Note that this class also provides +(but does not export) the function Pod::Simple::HTMLBatch::go. +This is basically just a shortcut for C<< +Pod::Simple::HTMLBatch->batch_convert(@ARGV) >>. +It's meant to be handy for calling from the command line. + +However, the shortcut requires that you specify exactly two command-line +arguments, C and C. + +Example: + + % mkdir out_html + % perl -MPod::Simple::HTMLBatch -e Pod::Simple::HTMLBatch::go @INC out_html + (to convert the pod from Perl's @INC + files under the directory ../htmlversion) + +(Note that the command line there contains a literal atsign-I-N-C. This +is handled as a special case by batch_convert, in order to save you having +to enter the odd-looking "" as the first command-line parameter when you +mean "just use whatever's in @INC".) + +Example: + + % mkdir ../seekrut + % chmod og-rx ../seekrut + % perl -MPod::Simple::HTMLBatch -e Pod::Simple::HTMLBatch::go . ../htmlversion + (to convert the pod under the current dir into HTML + files under the directory ../htmlversion) + +Example: + + % perl -MPod::Simple::HTMLBatch -e Pod::Simple::HTMLBatch::go happydocs . + (to convert all pod from happydocs into the current directory) + + + +=head1 MAIN METHODS + +=over + +=item $batchconv = Pod::Simple::HTMLBatch->new; + +This TODO + + +=item $batchconv->batch_convert( I, I ); + +this TODO + +=item $batchconv->batch_convert( undef , ...); + +=item $batchconv->batch_convert( q{@INC}, ...); + +These two values for I specify that the normal Perl @INC + +=item $batchconv->batch_convert( \@dirs , ...); + +This specifies that the input directories are the items in +the arrayref C<\@dirs>. + +=item $batchconv->batch_convert( "somedir" , ...); + +This specifies that the director "somedir" is the input. +(This can be an absolute or relative path, it doesn't matter.) + +A common value you might want would be just "." for the current +directory: + + $batchconv->batch_convert( "." , ...); + + +=item $batchconv->batch_convert( 'somedir:someother:also' , ...); + +This specifies that you want the dirs "somedir", "somother", and "also" +scanned, just as if you'd passed the arrayref +C<[qw( somedir someother also)]>. Note that a ":"-separator is normal +under Unix, but Under MSWin, you'll need C<'somedir;someother;also'> +instead, since the pathsep on MSWin is ";" instead of ":". (And +I is because ":" often comes up in paths, like +C<"c:/perl/lib">.) + +(Exactly what separator character should be used, is gotten from +C<$Config::Config{'path_sep'}>, via the L module.) + +=item $batchconv->batch_convert( ... , undef ); + +This specifies that you want the HTML output to go into the current +directory. + +(Note that a missing or undefined value means a different thing in +the first slot than in the second. That's so that C +with no arguments (or undef arguments) means "go from @INC, into +the current directory.) + +=item $batchconv->batch_convert( ... , 'somedir' ); + +This specifies that you want the HTML output to go into the +directory 'somedir'. +(This can be an absolute or relative path, it doesn't matter.) + +=back + + +Note that you can also call C as a class method, +like so: + + Pod::Simple::HTMLBatch->batch_convert( ... ); + +That is just short for this: + + Pod::Simple::HTMLBatch-> new-> batch_convert(...); + +That is, it runs a conversion with default options, for +whatever inputdirs and output dir you specify. + + +=head2 ACCESSOR METHODS + +The following are all accessor methods -- that is, they don't do anything +on their own, but just alter the contents of the conversion object, +which comprises the options for this particular batch conversion. + +We show the "put" form of the accessors below (i.e., the syntax you use +for setting the accessor to a specific value). But you can also +call each method with no parameters to get its current value. For +example, C<< $self->contents_file() >> returns the current value of +the contents_file attribute. + +=over + + +=item $batchconv->verbose( I ); + +This controls how verbose to be during batch conversion, as far as +notes to STDOUT (or whatever is C