#############################################################################
# Pod/Parser.pm -- package which defines a base class for parsing POD docs.
#
-# Based on Tom Christiansen's Pod::Text module
-# (with extensive modifications).
-#
-# Copyright (C) 1996-1999 Tom Christiansen. All rights reserved.
+# Copyright (C) 1996-1999 by Bradford Appleton. All rights reserved.
# This file is part of "PodParser". PodParser is free software;
# you can redistribute it and/or modify it under the same terms
# as Perl itself.
package Pod::Parser;
use vars qw($VERSION);
-$VERSION = 1.081; ## Current version of this package
+$VERSION = 1.085; ## Current version of this package
require 5.004; ## requires this Perl version or later
#############################################################################
do most of the input parsing for you and leave you free to worry about
how to intepret the commands and translate the result.
+Note that all we have described here in this quick overview overview is
+the simplest most striaghtforward use of B<Pod::Parser> to do stream-based
+parsing. It is also possible to use the B<Pod::Parser::parse_text> function
+to do more sophisticated tree-based parsing. See L<"TREE-BASED PARSING">.
+
+=head1 PARSING OPTIONS
+
+A I<parse-option> is simply a named option of B<Pod::Parser> with a
+value that corresponds to a certain specified behavior. These various
+behaviors of B<Pod::Parser> may be enabled/disabled by setting or
+or unsetting one or more I<parse-options> using the B<parseopts()> method.
+The set of currently accepted parse-options is as follows:
+
+=over 3
+
+=item B<-want_nonPODs> (default: unset)
+
+Normally (by default) B<Pod::Parser> will only provide access to
+the POD sections of the input. Input paragraphs that are not part
+of the POD-format documentation are not made available to the caller
+(not even using B<preprocess_paragraph()>). Setting this option to a
+non-empty, non-zero value will allow B<preprocess_paragraph()> to see
+non-POD sectioins of the input as well as POD sections. The B<cutting()>
+method can be used to determine if the corresponding paragraph is a POD
+paragraph, or some other input paragraph.
+
+=item B<-process_cut_cmd> (default: unset)
+
+Normally (by default) B<Pod::Parser> handles the C<=cut> POD directive
+by itself and does not pass it on to the caller for processing. Setting
+this option to non-empty, non-zero value will cause B<Pod::Parser> to
+pass the C<=cut> directive to the caller just like any other POD command
+(and hence it may be processed by the B<command()> method).
+
+B<Pod::Parser> will still interpret the C<=cut> directive to mean that
+"cutting mode" has been (re)entered, but the caller will get a chance
+to capture the actual C<=cut> paragraph itself for whatever purpose
+it desires.
+
+=back
+
+Please see L<"parseopts()"> for a complete description of the interface
+for the setting and unsetting of parse-options.
+
=cut
#############################################################################
@ISA = qw(Exporter);
## These "variables" are used as local "glob aliases" for performance
-use vars qw(%myData @input_stack);
+use vars qw(%myData %myOpts @input_stack);
#############################################################################
=head1 METHODS FOR PARSING AND PROCESSING
B<Pod::Parser> provides several methods to process input text. These
-methods typically won't need to be overridden, but subclasses may want
-to invoke them to exploit their functionality.
+methods typically won't need to be overridden (and in some cases they
+can't be overridden), but subclasses may want to invoke them to exploit
+their functionality.
=cut
[I<NOTE>: If the B<interior_sequence()> method is specified, then it is
invoked according to the interface specified in L<"interior_sequence()">].
+=item B<-expand_text> =E<gt> I<code-ref>|I<method-name>
+
+Normally, the parse-tree returned by B<parse_text()> will contain a
+text-string for each contiguous sequence of characters outside of an
+interior-sequence. Specifying B<-expand_text> tells B<parse_text()> to
+"preprocess" every such text-string it sees by invoking the referenced
+function (or named method of the parser object) and using the return value
+as the preprocessed (or "expanded") result. [Note that if the result is
+an interior-sequence, then it will I<not> be expanded as specified by the
+B<-expand_seq> option; Any such recursive expansion needs to be handled by
+the specified callback routine.]
+
+If a subroutine reference was given, it is invoked as:
+
+ &$code_ref( $parser, $text, $ptree_node )
+
+and if a method-name was given, it is invoked as:
+
+ $parser->method_name( $text, $ptree_node )
+
+where C<$parser> is a reference to the parser object, C<$text> is the
+text-string encountered, and C<$ptree_node> is a reference to the current
+node in the parse-tree (usually an interior-sequence object or else the
+top-level node of the parse-tree).
+
=item B<-expand_ptree> =E<gt> I<code-ref>|I<method-name>
Rather than returning a C<Pod::ParseTree>, pass the parse-tree as an
## This global regex is used to see if the text before a '>' inside
## an interior sequence looks like '-' or '=', but not '--', '==',
-## '$-', or '$='
+## '!=', '$-', '$=' or <<op>>=
use vars qw( $ARROW_RE );
-$ARROW_RE = join('', qw{ (?: [^-+*/=!&|%^x.<>$]= | [^$-]- )$ });
-#$ARROW_RE = qr/(?:[^=]+=|[^-]+-)$/; ## 5.005+ only!
+$ARROW_RE = join('', qw{ (?: [^-+*/=!&|%^x.<>$]= | [^-$]- )$ });
+#$ARROW_RE = qr/(?:[^-+*/=!&|%^x.<>$]+=|[^-$]+-)$/; ## 5.005+ only!
sub parse_text {
my $self = shift;
## Get options and set any defaults
my %opts = (ref $_[0]) ? %{ shift() } : ();
my $expand_seq = $opts{'-expand_seq'} || undef;
+ my $expand_text = $opts{'-expand_text'} || undef;
my $expand_ptree = $opts{'-expand_ptree'} || undef;
my $text = shift;
## Convert method calls into closures, for our convenience
my $xseq_sub = $expand_seq;
+ my $xtext_sub = $expand_text;
my $xptree_sub = $expand_ptree;
if (defined $expand_seq and $expand_seq eq 'interior_sequence') {
## If 'interior_sequence' is the method to use, we have to pass
};
}
ref $xseq_sub or $xseq_sub = sub { shift()->$expand_seq(@_) };
+ ref $xtext_sub or $xtext_sub = sub { shift()->$expand_text(@_) };
ref $xptree_sub or $xptree_sub = sub { shift()->$expand_ptree(@_) };
## Keep track of the "current" interior sequence, and maintain a stack
## Remember the current cmd-name
$cmd = (@seq_stack > 1) ? $seq_stack[-1]->name : '';
}
- else {
- ## In the middle of a sequence, append this text to it
- $seq->append($_) if length;
+ elsif (length) {
+ ## In the middle of a sequence, append this text to it, and
+ ## dont forget to "expand" it if that's what the caller wanted
+ $seq->append($expand_text ? &$xtext_sub($self,$_,$seq) : $_);
}
## Remember the "current" sequence and the previously seen token
($seq, $prev) = ( $seq_stack[-1], $_ );
}
## Handle unterminated sequences
+ my $errorsub = (@seq_stack > 1) ? $self->errorsub() : undef;
while (@seq_stack > 1) {
($cmd, $file, $line) = ($seq->name, $seq->file_line);
pop @seq_stack;
- warn "** Unterminated $cmd<...> at $file line $line\n";
+ my $errmsg = "** Unterminated $cmd<...> at $file line $line\n";
+ (ref $errorsub) and &{$errorsub}($errmsg)
+ or (defined $errmsg) and $self->$errorsub($errmsg)
+ or warn($errmsg);
$seq_stack[-1]->append($expand_seq ? &$xseq_sub($self,$seq) : $seq);
$seq = $seq_stack[-1];
}
with its corresponding line number, and invokes the appropriate method
(one of B<command()>, B<verbatim()>, or B<textblock()>).
-This method does I<not> usually need to be overridden by subclasses.
+For performance reasons, this method is invoked directly without any
+dynamic lookup; Hence subclasses may I<not> override it!
=end __PRIVATE__
sub parse_paragraph {
my ($self, $text, $line_num) = @_;
- local *myData = $self; ## an alias to avoid deref-ing overhead
+ local *myData = $self; ## alias to avoid deref-ing overhead
+ local *myOpts = ($myData{_PARSEOPTS} ||= {}); ## get parse-options
local $_;
+ ## See if we want to preprocess nonPOD paragraphs as well as POD ones.
+ my $wantNonPods = $myOpts{'-want_nonPODs'} || 0;
+
+ ## Perform any desired preprocessing if we wanted it this early
+ $wantNonPods and $text = $self->preprocess_paragraph($text, $line_num);
+
## This is the end of a non-empty paragraph
## Ignore up until next POD directive if we are cutting
if ($myData{_CUTTING}) {
$self->is_selected($text) or return ($myData{_CUTTING} = 1);
}
- ## Perform any desired preprocessing and re-check the "cutting" state
- $text = $self->preprocess_paragraph($text, $line_num);
- return 1 unless ((defined $text) and (length $text));
- return 1 if ($myData{_CUTTING});
+ ## If we havent already, perform any desired preprocessing and
+ ## then re-check the "cutting" state
+ unless ($wantNonPods) {
+ $text = $self->preprocess_paragraph($text, $line_num);
+ return 1 unless ((defined $text) and (length $text));
+ return 1 if ($myData{_CUTTING});
+ }
## Look for one of the three types of paragraphs
my ($pfx, $cmd, $arg, $sep) = ('', '', '', '');
## except return to "cutting" mode.
if ($cmd eq 'cut') {
$myData{_CUTTING} = 1;
- return;
+ return unless $myOpts{'-process_cut_cmd'};
}
}
## Save the attributes indicating how the command was specified.
##---------------------------------------------------------------------------
+=head1 B<errorsub()>
+
+ $parser->errorsub("method_name");
+ $parser->errorsub(\&warn_user);
+ $parser->errorsub(sub { print STDERR, @_ });
+
+Specifies the method or subroutine to use when printing error messages
+about POD syntax. The supplied method/subroutine I<must> return TRUE upon
+successful printing of the message. If C<undef> is given, then the B<warn>
+builtin is used to issue error messages (this is the default behavior).
+
+ my $errorsub = $parser->errorsub()
+ my $errmsg = "This is an error message!\n"
+ (ref $errorsub) and &{$errorsub}($errmsg)
+ or (defined $errmsg) and $parser->$errorsub($errmsg)
+ or warn($errmsg);
+
+Returns a method name, or else a reference to the user-supplied subroutine
+used to print error messages. Returns C<undef> if the B<warn> builtin
+is used to issue error messages (this is the default behavior).
+
+=cut
+
+sub errorsub {
+ return (@_ > 1) ? ($_[0]->{_ERRORSUB} = $_[1]) : $_[0]->{_ERRORSUB};
+}
+
+##---------------------------------------------------------------------------
+
=head1 B<cutting()>
$boolean = $parser->cutting();
##---------------------------------------------------------------------------
+##---------------------------------------------------------------------------
+
+=head1 B<parseopts()>
+
+When invoked with no additional arguments, B<parseopts> returns a hashtable
+of all the current parsing options.
+
+ ## See if we are parsing non-POD sections as well as POD ones
+ my %opts = $parser->parseopts();
+ $opts{'-want_nonPODs}' and print "-want_nonPODs\n";
+
+When invoked using a single string, B<parseopts> treats the string as the
+name of a parse-option and returns its corresponding value if it exists
+(returns C<undef> if it doesn't).
+
+ ## Did we ask to see '=cut' paragraphs?
+ my $want_cut = $parser->parseopts('-process_cut_cmd');
+ $want_cut and print "-process_cut_cmd\n";
+
+When invoked with multiple arguments, B<parseopts> treats them as
+key/value pairs and the specified parse-option names are set to the
+given values. Any unspecified parse-options are unaffected.
+
+ ## Set them back to the default
+ $parser->parseopts(-process_cut_cmd => 0);
+
+When passed a single hash-ref, B<parseopts> uses that hash to completely
+reset the existing parse-options, all previous parse-option values
+are lost.
+
+ ## Reset all options to default
+ $parser->parseopts( { } );
+
+See L<"PARSING OPTIONS"> for more for the name and meaning of each
+parse-option currently recognized.
+
+=cut
+
+sub parseopts {
+ local *myData = shift;
+ local *myOpts = ($myData{_PARSEOPTS} ||= {});
+ return %myOpts if (@_ == 0);
+ if (@_ == 1) {
+ local $_ = shift;
+ return ref($_) ? $myData{_PARSEOPTS} = $_ : $myOpts{$_};
+ }
+ my @newOpts = (%myOpts, @_);
+ $myData{_PARSEOPTS} = { @newOpts };
+}
+
+##---------------------------------------------------------------------------
+
=head1 B<output_file()>
$fname = $parser->output_file();
#############################################################################
+=head1 TREE-BASED PARSING
+
+If straightforward stream-based parsing wont meet your needs (as is
+likely the case for tasks such as translating PODs into structured
+markup languages like HTML and XML) then you may need to take the
+tree-based approach. Rather than doing everything in one pass and
+calling the B<interpolate()> method to expand sequences into text, it
+may be desirable to instead create a parse-tree using the B<parse_text()>
+method to return a tree-like structure which may contain an ordered list
+list of children (each of which may be a text-string, or a similar
+tree-like structure).
+
+Pay special attention to L<"METHODS FOR PARSING AND PROCESSING"> and
+to the objects described in L<Pod::InputObjects>. The former describes
+the gory details and parameters for how to customize and extend the
+parsing behavior of B<Pod::Parser>. B<Pod::InputObjects> provides
+several objects that may all be used interchangeably as parse-trees. The
+most obvious one is the B<Pod::ParseTree> object. It defines the basic
+interface and functionality that all things trying to be a POD parse-tree
+should do. A B<Pod::ParseTree> is defined such that each "node" may be a
+text-string, or a reference to another parse-tree. Each B<Pod::Paragraph>
+object and each B<Pod::InteriorSequence> object also supports the basic
+parse-tree interface.
+
+The B<parse_text()> method takes a given paragraph of text, and
+returns a parse-tree that contains one or more children, each of which
+may be a text-string, or an InteriorSequence object. There are also
+callback-options that may be passed to B<parse_text()> to customize
+the way it expands or transforms interior-sequences, as well as the
+returned result. These callbacks can be used to create a parse-tree
+with custom-made objects (which may or may not support the parse-tree
+interface, depending on how you choose to do it).
+
+If you wish to turn an entire POD document into a parse-tree, that process
+is fairly straightforward. The B<parse_text()> method is the key to doing
+this successfully. Every paragraph-callback (i.e. the polymorphic methods
+for B<command()>, B<verbatim()>, and B<textblock()> paragraphs) takes
+a B<Pod::Paragraph> object as an argument. Each paragraph object has a
+B<parse_tree()> method that can be used to get or set a corresponding
+parse-tree. So for each of those paragraph-callback methods, simply call
+B<parse_text()> with the options you desire, and then use the returned
+parse-tree to assign to the given paragraph object.
+
+That gives you a parse-tree for each paragraph - so now all you need is
+an ordered list of paragraphs. You can maintain that yourself as a data
+element in the object/hash. The most straightforward way would be simply
+to use an array-ref, with the desired set of custom "options" for each
+invocation of B<parse_text>. Let's assume the desired option-set is
+given by the hash C<%options>. Then we might do something like the
+following:
+
+ package MyPodParserTree;
+
+ @ISA = qw( Pod::Parser );
+
+ ...
+
+ sub begin_pod {
+ my $self = shift;
+ $self->{'-paragraphs'} = []; ## initialize paragraph list
+ }
+
+ sub command {
+ my ($parser, $command, $paragraph, $line_num, $pod_para) = @_;
+ my $ptree = $parser->parse_text({%options}, $paragraph, ...);
+ $pod_para->parse_tree( $ptree );
+ push @{ $self->{'-paragraphs'} }, $pod_para;
+ }
+
+ sub verbatim {
+ my ($parser, $paragraph, $line_num, $pod_para) = @_;
+ push @{ $self->{'-paragraphs'} }, $pod_para;
+ }
+
+ sub textblock {
+ my ($parser, $paragraph, $line_num, $pod_para) = @_;
+ my $ptree = $parser->parse_text({%options}, $paragraph, ...);
+ $pod_para->parse_tree( $ptree );
+ push @{ $self->{'-paragraphs'} }, $pod_para;
+ }
+
+ ...
+
+ package main;
+ ...
+ my $parser = new MyPodParserTree(...);
+ $parser->parse_from_file(...);
+ my $paragraphs_ref = $parser->{'-paragraphs'};
+
+Of course, in this module-author's humble opinion, I'd be more inclined to
+use the existing B<Pod::ParseTree> object than a simple array. That way
+everything in it, paragraphs and sequences, all respond to the same core
+interface for all parse-tree nodes. The result would look something like:
+
+ package MyPodParserTree2;
+
+ ...
+
+ sub begin_pod {
+ my $self = shift;
+ $self->{'-ptree'} = new Pod::ParseTree; ## initialize parse-tree
+ }
+
+ sub parse_tree {
+ ## convenience method to get/set the parse-tree for the entire POD
+ (@_ > 1) and $_[0]->{'-ptree'} = $_[1];
+ return $_[0]->{'-ptree'};
+ }
+
+ sub command {
+ my ($parser, $command, $paragraph, $line_num, $pod_para) = @_;
+ my $ptree = $parser->parse_text({<<options>>}, $paragraph, ...);
+ $pod_para->parse_tree( $ptree );
+ $parser->parse_tree()->append( $pod_para );
+ }
+
+ sub verbatim {
+ my ($parser, $paragraph, $line_num, $pod_para) = @_;
+ $parser->parse_tree()->append( $pod_para );
+ }
+
+ sub textblock {
+ my ($parser, $paragraph, $line_num, $pod_para) = @_;
+ my $ptree = $parser->parse_text({<<options>>}, $paragraph, ...);
+ $pod_para->parse_tree( $ptree );
+ $parser->parse_tree()->append( $pod_para );
+ }
+
+ ...
+
+ package main;
+ ...
+ my $parser = new MyPodParserTree2(...);
+ $parser->parse_from_file(...);
+ my $ptree = $parser->parse_tree;
+ ...
+
+Now you have the entire POD document as one great big parse-tree. You
+can even use the B<-expand_seq> option to B<parse_text> to insert
+whole different kinds of objects. Just don't expect B<Pod::Parser>
+to know what to do with them after that. That will need to be in your
+code. Or, alternatively, you can insert any object you like so long as
+it conforms to the B<Pod::ParseTree> interface.
+
+One could use this to create subclasses of B<Pod::Paragraphs> and
+B<Pod::InteriorSequences> for specific commands (or to create your own
+custom node-types in the parse-tree) and add some kind of B<emit()>
+method to each custom node/subclass object in the tree. Then all you'd
+need to do is recursively walk the tree in the desired order, processing
+the children (most likely from left to right) by formatting them if
+they are text-strings, or by calling their B<emit()> method if they
+are objects/references.
+
=head1 SEE ALSO
L<Pod::InputObjects>, L<Pod::Select>
-#############################################################################
-# Pod/PlainText.pm -- convert POD data to formatted ASCII text
+# Pod::PlainText -- Convert POD data to formatted ASCII text.
+# $Id: Text.pm,v 2.1 1999/09/20 11:53:33 eagle Exp $
#
-# Derived from Tom Christiansen's Pod::PlainText module
-# (with extensive modifications).
+# Copyright 1999 by Russ Allbery <rra@stanford.edu>
#
-# Copyright (C) 1994-1999 Tom Christiansen. All rights reserved.
-# This file is part of "PodParser". PodParser is free software;
-# you can redistribute it and/or modify it under the same terms
-# as Perl itself.
-#############################################################################
-
-package Pod::PlainText;
-
-use vars qw($VERSION);
-$VERSION = 1.081; ## Current version of this package
-require 5.004; ## requires this Perl version or later
-
-=head1 NAME
-
-pod2plaintext - function to convert POD data to formatted ASCII text
-
-Pod::PlainText - a class for converting POD data to formatted ASCII text
-
-=head1 SYNOPSIS
-
- use Pod::PlainText;
- pod2plaintext("perlfunc.pod");
-
-or
-
- use Pod::PlainText;
- package MyParser;
- @ISA = qw(Pod::PlainText);
+# This program is free software; you can redistribute it and/or modify it
+# under the same terms as Perl itself.
+#
+# This module is intended to be a replacement for Pod::Text, and attempts to
+# match its output except for some specific circumstances where other
+# decisions seemed to produce better output. It uses Pod::Parser and is
+# designed to be very easy to subclass.
- sub new {
- ## constructor code ...
- }
+############################################################################
+# Modules and declarations
+############################################################################
- ## implementation of appropriate subclass methods ...
+package Pod::PlainText;
- package main;
- $parser = new MyParser;
- @ARGV = ('-') unless (@ARGV > 0);
- for (@ARGV) {
- $parser->parse_from_file($_);
- }
+require 5.004;
-=head1 REQUIRES
+use Carp qw(carp croak);
+use Pod::Select ();
-perl5.004, Pod::Select, Term::Cap, Exporter, Carp
+use strict;
+use vars qw(@ISA %ESCAPES $VERSION);
+
+# We inherit from Pod::Select instead of Pod::Parser so that we can be used
+# by Pod::Usage.
+@ISA = qw(Pod::Select);
+
+($VERSION = (split (' ', q$Revision: 2.1 $ ))[1]) =~ s/\.(\d)$/.0$1/;
+
+
+############################################################################
+# Table of supported E<> escapes
+############################################################################
+
+# This table is taken near verbatim from Pod::PlainText in Pod::Parser,
+# which got it near verbatim from the original Pod::Text. It is therefore
+# credited to Tom Christiansen, and I'm glad I didn't have to write it. :)
+%ESCAPES = (
+ 'amp' => '&', # ampersand
+ 'lt' => '<', # left chevron, less-than
+ 'gt' => '>', # right chevron, greater-than
+ 'quot' => '"', # double quote
+
+ "Aacute" => "\xC1", # capital A, acute accent
+ "aacute" => "\xE1", # small a, acute accent
+ "Acirc" => "\xC2", # capital A, circumflex accent
+ "acirc" => "\xE2", # small a, circumflex accent
+ "AElig" => "\xC6", # capital AE diphthong (ligature)
+ "aelig" => "\xE6", # small ae diphthong (ligature)
+ "Agrave" => "\xC0", # capital A, grave accent
+ "agrave" => "\xE0", # small a, grave accent
+ "Aring" => "\xC5", # capital A, ring
+ "aring" => "\xE5", # small a, ring
+ "Atilde" => "\xC3", # capital A, tilde
+ "atilde" => "\xE3", # small a, tilde
+ "Auml" => "\xC4", # capital A, dieresis or umlaut mark
+ "auml" => "\xE4", # small a, dieresis or umlaut mark
+ "Ccedil" => "\xC7", # capital C, cedilla
+ "ccedil" => "\xE7", # small c, cedilla
+ "Eacute" => "\xC9", # capital E, acute accent
+ "eacute" => "\xE9", # small e, acute accent
+ "Ecirc" => "\xCA", # capital E, circumflex accent
+ "ecirc" => "\xEA", # small e, circumflex accent
+ "Egrave" => "\xC8", # capital E, grave accent
+ "egrave" => "\xE8", # small e, grave accent
+ "ETH" => "\xD0", # capital Eth, Icelandic
+ "eth" => "\xF0", # small eth, Icelandic
+ "Euml" => "\xCB", # capital E, dieresis or umlaut mark
+ "euml" => "\xEB", # small e, dieresis or umlaut mark
+ "Iacute" => "\xCD", # capital I, acute accent
+ "iacute" => "\xED", # small i, acute accent
+ "Icirc" => "\xCE", # capital I, circumflex accent
+ "icirc" => "\xEE", # small i, circumflex accent
+ "Igrave" => "\xCD", # capital I, grave accent
+ "igrave" => "\xED", # small i, grave accent
+ "Iuml" => "\xCF", # capital I, dieresis or umlaut mark
+ "iuml" => "\xEF", # small i, dieresis or umlaut mark
+ "Ntilde" => "\xD1", # capital N, tilde
+ "ntilde" => "\xF1", # small n, tilde
+ "Oacute" => "\xD3", # capital O, acute accent
+ "oacute" => "\xF3", # small o, acute accent
+ "Ocirc" => "\xD4", # capital O, circumflex accent
+ "ocirc" => "\xF4", # small o, circumflex accent
+ "Ograve" => "\xD2", # capital O, grave accent
+ "ograve" => "\xF2", # small o, grave accent
+ "Oslash" => "\xD8", # capital O, slash
+ "oslash" => "\xF8", # small o, slash
+ "Otilde" => "\xD5", # capital O, tilde
+ "otilde" => "\xF5", # small o, tilde
+ "Ouml" => "\xD6", # capital O, dieresis or umlaut mark
+ "ouml" => "\xF6", # small o, dieresis or umlaut mark
+ "szlig" => "\xDF", # small sharp s, German (sz ligature)
+ "THORN" => "\xDE", # capital THORN, Icelandic
+ "thorn" => "\xFE", # small thorn, Icelandic
+ "Uacute" => "\xDA", # capital U, acute accent
+ "uacute" => "\xFA", # small u, acute accent
+ "Ucirc" => "\xDB", # capital U, circumflex accent
+ "ucirc" => "\xFB", # small u, circumflex accent
+ "Ugrave" => "\xD9", # capital U, grave accent
+ "ugrave" => "\xF9", # small u, grave accent
+ "Uuml" => "\xDC", # capital U, dieresis or umlaut mark
+ "uuml" => "\xFC", # small u, dieresis or umlaut mark
+ "Yacute" => "\xDD", # capital Y, acute accent
+ "yacute" => "\xFD", # small y, acute accent
+ "yuml" => "\xFF", # small y, dieresis or umlaut mark
+
+ "lchevron" => "\xAB", # left chevron (double less than)
+ "rchevron" => "\xBB", # right chevron (double greater than)
+);
-=head1 EXPORTS
-pod2plaintext()
+############################################################################
+# Initialization
+############################################################################
-=head1 DESCRIPTION
+# Initialize the object. Must be sure to call our parent initializer.
+sub initialize {
+ my $self = shift;
-Pod::PlainText is a module that can convert documentation in the POD
-format (such as can be found throughout the Perl distribution) into
-formatted ASCII. Termcap is optionally supported for
-boldface/underline, and can be enabled via C<$Pod::PlainText::termcap=1>.
-If termcap has not been enabled, then backspaces will be used to
-simulate bold and underlined text.
+ $$self{alt} = 0 unless defined $$self{alt};
+ $$self{indent} = 4 unless defined $$self{indent};
+ $$self{loose} = 0 unless defined $$self{loose};
+ $$self{sentence} = 0 unless defined $$self{sentence};
+ $$self{width} = 76 unless defined $$self{width};
-A separate F<pod2plaintext> program is included that is primarily a wrapper
-for C<Pod::PlainText::pod2plaintext()>.
+ $$self{INDENTS} = []; # Stack of indentations.
+ $$self{MARGIN} = $$self{indent}; # Current left margin in spaces.
-The single function C<pod2plaintext()> can take one or two arguments. The first
-should be the name of a file to read the pod from, or "<&STDIN" to read from
-STDIN. A second argument, if provided, should be a filehandle glob where
-output should be sent.
+ $self->SUPER::initialize;
+}
-=head1 SEE ALSO
-L<Pod::Parser>.
+############################################################################
+# Core overrides
+############################################################################
-=head1 AUTHOR
+# Called for each command paragraph. Gets the command, the associated
+# paragraph, the line number, and a Pod::Paragraph object. Just dispatches
+# the command to a method named the same as the command. =cut is handled
+# internally by Pod::Parser.
+sub command {
+ my $self = shift;
+ my $command = shift;
+ return if $command eq 'pod';
+ return if ($$self{EXCLUDE} && $command ne 'end');
+ $self->item ("\n") if defined $$self{ITEM};
+ $command = 'cmd_' . $command;
+ $self->$command (@_);
+}
-Tom Christiansen E<lt>tchrist@mox.perl.comE<gt>
+# Called for a verbatim paragraph. Gets the paragraph, the line number, and
+# a Pod::Paragraph object. Just output it verbatim, but with tabs converted
+# to spaces.
+sub verbatim {
+ my $self = shift;
+ return if $$self{EXCLUDE};
+ $self->item if defined $$self{ITEM};
+ local $_ = shift;
+ return if /^\s*$/;
+ s/^(\s*\S+)/(' ' x $$self{MARGIN}) . $1/gme;
+ $self->output ($_);
+}
-Modified to derive from B<Pod::Parser> by
-Brad Appleton E<lt>bradapp@enteract.comE<gt>
+# Called for a regular text block. Gets the paragraph, the line number, and
+# a Pod::Paragraph object. Perform interpolation and output the results.
+sub textblock {
+ my $self = shift;
+ return if $$self{EXCLUDE};
+ $self->output ($_[0]), return if $$self{VERBATIM};
+ local $_ = shift;
+ my $line = shift;
-=cut
+ # Perform a little magic to collapse multiple L<> references. This is
+ # here mostly for backwards-compatibility. We'll just rewrite the whole
+ # thing into actual text at this part, bypassing the whole internal
+ # sequence parsing thing.
+ s{
+ (
+ L< # A link of the form L</something>.
+ /
+ (
+ [:\w]+ # The item has to be a simple word...
+ (\(\))? # ...or simple function.
+ )
+ >
+ (
+ ,?\s+(and\s+)? # Allow lots of them, conjuncted.
+ L<
+ /
+ (
+ [:\w]+
+ (\(\))?
+ )
+ >
+ )+
+ )
+ } {
+ local $_ = $1;
+ s%L</([^>]+)>%$1%g;
+ my @items = split /(?:,?\s+(?:and\s+)?)/;
+ my $string = "the ";
+ my $i;
+ for ($i = 0; $i < @items; $i++) {
+ $string .= $items[$i];
+ $string .= ", " if @items > 2 && $i != $#items;
+ $string .= " and " if ($i == $#items - 1);
+ }
+ $string .= " entries elsewhere in this document";
+ $string;
+ }gex;
+
+ # Now actually interpolate and output the paragraph.
+ $_ = $self->interpolate ($_, $line);
+ s/\s+$/\n/;
+ if (defined $$self{ITEM}) {
+ $self->item ($_ . "\n");
+ } else {
+ $self->output ($self->reformat ($_ . "\n"));
+ }
+}
-#############################################################################
+# Called for an interior sequence. Gets the command, argument, and a
+# Pod::InteriorSequence object and is expected to return the resulting text.
+# Calls code, bold, italic, file, and link to handle those types of
+# sequences, and handles S<>, E<>, X<>, and Z<> directly.
+sub interior_sequence {
+ my $self = shift;
+ my $command = shift;
+ local $_ = shift;
+ return '' if ($command eq 'X' || $command eq 'Z');
-use strict;
-#use diagnostics;
-use Carp;
-use Exporter;
-use Pod::Select;
-use Term::Cap;
-use vars qw(@ISA @EXPORT %HTML_Escapes);
-
-@ISA = qw(Exporter Pod::Select);
-@EXPORT = qw(&pod2plaintext);
-
-%HTML_Escapes = (
- 'amp' => '&', # ampersand
- 'lt' => '<', # left chevron, less-than
- 'gt' => '>', # right chevron, greater-than
- 'quot' => '"', # double quote
-
- "Aacute" => "\xC1", # capital A, acute accent
- "aacute" => "\xE1", # small a, acute accent
- "Acirc" => "\xC2", # capital A, circumflex accent
- "acirc" => "\xE2", # small a, circumflex accent
- "AElig" => "\xC6", # capital AE diphthong (ligature)
- "aelig" => "\xE6", # small ae diphthong (ligature)
- "Agrave" => "\xC0", # capital A, grave accent
- "agrave" => "\xE0", # small a, grave accent
- "Aring" => "\xC5", # capital A, ring
- "aring" => "\xE5", # small a, ring
- "Atilde" => "\xC3", # capital A, tilde
- "atilde" => "\xE3", # small a, tilde
- "Auml" => "\xC4", # capital A, dieresis or umlaut mark
- "auml" => "\xE4", # small a, dieresis or umlaut mark
- "Ccedil" => "\xC7", # capital C, cedilla
- "ccedil" => "\xE7", # small c, cedilla
- "Eacute" => "\xC9", # capital E, acute accent
- "eacute" => "\xE9", # small e, acute accent
- "Ecirc" => "\xCA", # capital E, circumflex accent
- "ecirc" => "\xEA", # small e, circumflex accent
- "Egrave" => "\xC8", # capital E, grave accent
- "egrave" => "\xE8", # small e, grave accent
- "ETH" => "\xD0", # capital Eth, Icelandic
- "eth" => "\xF0", # small eth, Icelandic
- "Euml" => "\xCB", # capital E, dieresis or umlaut mark
- "euml" => "\xEB", # small e, dieresis or umlaut mark
- "Iacute" => "\xCD", # capital I, acute accent
- "iacute" => "\xED", # small i, acute accent
- "Icirc" => "\xCE", # capital I, circumflex accent
- "icirc" => "\xEE", # small i, circumflex accent
- "Igrave" => "\xCD", # capital I, grave accent
- "igrave" => "\xED", # small i, grave accent
- "Iuml" => "\xCF", # capital I, dieresis or umlaut mark
- "iuml" => "\xEF", # small i, dieresis or umlaut mark
- "Ntilde" => "\xD1", # capital N, tilde
- "ntilde" => "\xF1", # small n, tilde
- "Oacute" => "\xD3", # capital O, acute accent
- "oacute" => "\xF3", # small o, acute accent
- "Ocirc" => "\xD4", # capital O, circumflex accent
- "ocirc" => "\xF4", # small o, circumflex accent
- "Ograve" => "\xD2", # capital O, grave accent
- "ograve" => "\xF2", # small o, grave accent
- "Oslash" => "\xD8", # capital O, slash
- "oslash" => "\xF8", # small o, slash
- "Otilde" => "\xD5", # capital O, tilde
- "otilde" => "\xF5", # small o, tilde
- "Ouml" => "\xD6", # capital O, dieresis or umlaut mark
- "ouml" => "\xF6", # small o, dieresis or umlaut mark
- "szlig" => "\xDF", # small sharp s, German (sz ligature)
- "THORN" => "\xDE", # capital THORN, Icelandic
- "thorn" => "\xFE", # small thorn, Icelandic
- "Uacute" => "\xDA", # capital U, acute accent
- "uacute" => "\xFA", # small u, acute accent
- "Ucirc" => "\xDB", # capital U, circumflex accent
- "ucirc" => "\xFB", # small u, circumflex accent
- "Ugrave" => "\xD9", # capital U, grave accent
- "ugrave" => "\xF9", # small u, grave accent
- "Uuml" => "\xDC", # capital U, dieresis or umlaut mark
- "uuml" => "\xFC", # small u, dieresis or umlaut mark
- "Yacute" => "\xDD", # capital Y, acute accent
- "yacute" => "\xFD", # small y, acute accent
- "yuml" => "\xFF", # small y, dieresis or umlaut mark
-
- "lchevron" => "\xAB", # left chevron (double less than)
- "rchevron" => "\xBB", # right chevron (double greater than)
-);
+ # Expand escapes into the actual character now, carping if invalid.
+ if ($command eq 'E') {
+ return $ESCAPES{$_} if defined $ESCAPES{$_};
+ carp "Unknown escape: E<$_>";
+ return "E<$_>";
+ }
-##---------------------------------
-## Function definitions begin here
-##---------------------------------
+ # For all the other sequences, empty content produces no output.
+ return if $_ eq '';
- ## Try to find #columns for the tty
-my %NotUnix = map {($_ => 1)} qw(MacOS MSWin32 VMS MVS);
-sub get_screen {
- ((defined $ENV{TERMCAP}) && ($ENV{TERMCAP} =~ /co#(\d+)/)[0])
- or ((defined $ENV{COLUMNS}) && $ENV{COLUMNS})
- or (!$NotUnix{$^O} && (`stty -a 2>/dev/null` =~ /(\d+) columns/)[0])
- or 72;
+ # For S<>, compress all internal whitespace and then map spaces to \01.
+ # When we output the text, we'll map this back.
+ if ($command eq 'S') {
+ s/\s{2,}/ /g;
+ tr/ /\01/;
+ return $_;
+ }
+ # Anything else needs to get dispatched to another method.
+ if ($command eq 'B') { return $self->seq_b ($_) }
+ elsif ($command eq 'C') { return $self->seq_c ($_) }
+ elsif ($command eq 'F') { return $self->seq_f ($_) }
+ elsif ($command eq 'I') { return $self->seq_i ($_) }
+ elsif ($command eq 'L') { return $self->seq_l ($_) }
+ else { carp "Unknown sequence $command<$_>" }
}
-sub pod2plaintext {
- my ($infile, $outfile) = @_;
- local $_;
- my $text_parser = new Pod::PlainText;
- $text_parser->parse_from_file($infile, $outfile);
+# Called for each paragraph that's actually part of the POD. We take
+# advantage of this opportunity to untabify the input.
+sub preprocess_paragraph {
+ my $self = shift;
+ local $_ = shift;
+ 1 while s/^(.*?)(\t+)/$1 . ' ' x (length ($2) * 8 - length ($1) % 8)/me;
+ $_;
}
-##-------------------------------
-## Method definitions begin here
-##-------------------------------
-
-sub new {
- my $this = shift;
- my $class = ref($this) || $this;
- my %params = @_;
- my $self = {%params};
- bless $self, $class;
- $self->initialize();
- return $self;
-}
-sub initialize {
+############################################################################
+# Command paragraphs
+############################################################################
+
+# All command paragraphs take the paragraph and the line number.
+
+# First level heading.
+sub cmd_head1 {
my $self = shift;
- $self->SUPER::initialize();
- return;
+ local $_ = shift;
+ s/\s+$//;
+ $_ = $self->interpolate ($_, shift);
+ if ($$self{alt}) {
+ $self->output ("\n==== $_ ====\n\n");
+ } else {
+ $_ .= "\n" if $$self{loose};
+ $self->output ($_ . "\n");
+ }
}
-sub makespace {
+# Second level heading.
+sub cmd_head2 {
my $self = shift;
- my $out_fh = $self->output_handle();
- if ($self->{NEEDSPACE}) {
- print $out_fh "\n";
- $self->{NEEDSPACE} = 0;
+ local $_ = shift;
+ s/\s+$//;
+ $_ = $self->interpolate ($_, shift);
+ if ($$self{alt}) {
+ $self->output ("\n== $_ ==\n\n");
+ } else {
+ $self->output (' ' x ($$self{indent} / 2) . $_ . "\n\n");
}
}
-sub bold {
+# Start a list.
+sub cmd_over {
my $self = shift;
- my $line = shift;
- my $map = $self->{FONTMAP};
- return $line if $self->{USE_FORMAT};
- if ($self->{TERMCAP}) {
- $line = "$map->{BOLD}$line$map->{NORM}";
- }
- else {
- $line =~ s/(.)/$1\b$1/g;
- }
-# $line = "$map->{BOLD}$line$map->{NORM}" if $self->{ANSIFY};
- return $line;
+ local $_ = shift;
+ unless (/^[-+]?\d+\s+$/) { $_ = $$self{indent} }
+ push (@{ $$self{INDENTS} }, $$self{MARGIN});
+ $$self{MARGIN} += ($_ + 0);
}
-sub italic {
+# End a list.
+sub cmd_back {
my $self = shift;
- my $line = shift;
- my $map = $self->{FONTMAP};
- return $line if $self->{USE_FORMAT};
- if ($self->{TERMCAP}) {
- $line = "$map->{UNDL}$line$map->{NORM}";
+ $$self{MARGIN} = pop @{ $$self{INDENTS} };
+ unless (defined $$self{MARGIN}) {
+ carp "Unmatched =back";
+ $$self{MARGIN} = $$self{indent};
}
- else {
- $line =~ s/(.)/$1\b_/g;
- }
-# $line = "$map->{UNDL}$line$map->{NORM}" if $self->{ANSIFY};
- return $line;
}
-# Fill a paragraph including underlined and overstricken chars.
-# It's not perfect for words longer than the margin, and it's probably
-# slow, but it works.
-sub fill {
+# An individual list item.
+sub cmd_item {
my $self = shift;
+ if (defined $$self{ITEM}) { $self->item }
local $_ = shift;
- my $par = "";
- my $indent_space = " " x $self->{INDENT};
- my $marg = $self->{SCREEN} - $self->{INDENT};
- my $line = $indent_space;
- my $line_length;
- foreach (split) {
- my $word_length = length;
- $word_length -= 2 while /\010/g; # Subtract backspaces
-
- if ($line_length + $word_length > $marg) {
- $par .= $line . "\n";
- $line= $indent_space . $_;
- $line_length = $word_length;
- }
- else {
- if ($line_length) {
- $line_length++;
- $line .= " ";
- }
- $line_length += $word_length;
- $line .= $_;
- }
- }
- $par .= "$line\n" if length $line;
- $par .= "\n";
- return $par;
+ s/\s+$//;
+ $$self{ITEM} = $self->interpolate ($_);
}
-## Handle a pending "item" paragraph. The paragraph (if given) is the
-## corresponding item text. (the item tag should be in $self->{ITEM}).
-sub item {
+# Begin a block for a particular translator. Setting VERBATIM triggers
+# special handling in textblock().
+sub cmd_begin {
my $self = shift;
- my $cmd = shift;
local $_ = shift;
- my $line = shift;
- $cmd = '' unless (defined $cmd);
- $_ = '' unless (defined $_);
- my $out_fh = $self->output_handle();
- return unless (defined $self->{ITEM});
- my $paratag = $self->{ITEM};
- my $prev_indent = $self->{INDENTS}->[-1] || $self->{DEF_INDENT};
- ## reset state
- undef $self->{ITEM};
- #$self->rm_callbacks('*');
-
- my $over = $self->{INDENT};
- $over -= $prev_indent if ($prev_indent < $over);
- if (length $cmd) { # tricked - this is another command
- $self->output($paratag, INDENT => $prev_indent);
- $self->command($cmd, $_);
- }
- elsif (/^\s+/o) { # verbatim
- $self->output($paratag, INDENT => $prev_indent);
- s/\s+\Z//;
- $self->verbatim($_);
- }
- else { # plain textblock
- $_ = $self->interpolate($_, $line);
- s/\s+\Z//;
- if ((length $_) && (length($paratag) <= $over)) {
- $self->IP_output($paratag, $_);
- }
- else {
- $self->output($paratag, INDENT => $prev_indent);
- $self->output($_, REFORMAT => 1);
- }
+ my ($kind) = /^(\S+)/ or return;
+ if ($kind eq 'text') {
+ $$self{VERBATIM} = 1;
+ } else {
+ $$self{EXCLUDE} = 1;
}
}
-sub remap_whitespace {
- my $self = shift;
- local($_) = shift;
- tr/\000-\177/\200-\377/;
- return $_;
-}
-
-sub unmap_whitespace {
- my $self = shift;
- local($_) = shift;
- tr/\200-\377/\000-\177/;
- return $_;
-}
-
-sub IP_output {
- my $self = shift;
- my $tag = shift;
- local($_) = @_;
- my $out_fh = $self->output_handle();
- my $tag_indent = $self->{INDENTS}->[-1] || $self->{DEF_INDENT};
- my $tag_cols = $self->{SCREEN} - $tag_indent;
- my $cols = $self->{SCREEN} - $self->{INDENT};
- $tag =~ s/\s*$//;
- s/\s+/ /g;
- s/^ //;
- my $fmt_name = '_Pod_Text_IP_output_format_';
- my $str = "format $fmt_name = \n"
- . (" " x ($tag_indent))
- . '@' . ('<' x ($self->{INDENT} - $tag_indent - 1))
- . "^" . ("<" x ($cols - 1)) . "\n"
- . '$tag, $_'
- . "\n~~"
- . (" " x ($self->{INDENT} - 2))
- . "^" . ("<" x ($cols - 5)) . "\n"
- . '$_' . "\n\n.\n1";
- #warn $str; warn "tag is $tag, _ is $_";
- {
- ## reset format (turn off warning about redefining a format)
- local($^W) = 0;
- eval $str;
- croak if ($@);
- }
- select((select($out_fh), $~ = $fmt_name)[0]);
- local($:) = ($self->curr_headings(1) eq 'SYNOPSIS') ? "\n " : $: ;
- write $out_fh;
-}
+# End a block for a particular translator. We assume that all =begin/=end
+# pairs are properly closed.
+sub cmd_end {
+ my $self = shift;
+ $$self{EXCLUDE} = 0;
+ $$self{VERBATIM} = 0;
+}
-sub output {
+# One paragraph for a particular translator. Ignore it unless it's intended
+# for text, in which case we treat it as a verbatim text block.
+sub cmd_for {
my $self = shift;
local $_ = shift;
- $_ = '' unless (defined $_);
- return unless (length $_);
- my $out_fh = $self->output_handle();
- my %options;
- if (@_ > 1) {
- ## usage was $self->output($text, NAME=>VALUE, ...);
- %options = @_;
- }
- elsif (@_ == 1) {
- if (ref $_[0]) {
- ## usage was $self->output($text, { NAME=>VALUE, ... } );
- %options = %{$_[0]};
- }
- else {
- ## usage was $self->output($text, $number);
- $options{"REFORMAT"} = shift;
- }
- }
- $options{"INDENT"} = $self->{INDENT} unless (defined $options{"INDENT"});
- if ((defined $options{"REFORMAT"}) && $options{"REFORMAT"}) {
- my $cols = $self->{SCREEN} - $options{"INDENT"};
- s/\s+/ /g;
- s/^ //;
- my $fmt_name = '_Pod_Text_output_format_';
- my $str = "format $fmt_name = \n~~"
- . (" " x ($options{"INDENT"} - 2))
- . "^" . ("<" x ($cols - 5)) . "\n"
- . '$_' . "\n\n.\n1";
- {
- ## reset format (turn off warning about redefining a format)
- local($^W) = 0;
- eval $str;
- croak if ($@);
- }
- select((select($out_fh), $~ = $fmt_name)[0]);
- local($:) = ($self->curr_headings(1) eq 'SYNOPSIS') ? "\n " : $: ;
- write $out_fh;
- }
- else {
- s/^/' ' x $options{"INDENT"}/gem;
- s/^\s+\n$/\n/gm;
- print $out_fh $_;
- }
+ my $line = shift;
+ return unless s/^text\b[ \t]*\n?//;
+ $self->verbatim ($_, $line);
}
-sub internal_lrefs {
+
+############################################################################
+# Interior sequences
+############################################################################
+
+# The simple formatting ones. These are here mostly so that subclasses can
+# override them and do more complicated things.
+sub seq_b { return $_[0]{alt} ? "``$_[1]''" : $_[1] }
+sub seq_c { return $_[0]{alt} ? "``$_[1]''" : "`$_[1]'" }
+sub seq_f { return $_[0]{alt} ? "\"$_[1]\"" : $_[1] }
+sub seq_i { return '*' . $_[1] . '*' }
+
+# The complicated one. Handle links. Since this is plain text, we can't
+# actually make any real links, so this is all to figure out what text we
+# print out.
+sub seq_l {
my $self = shift;
local $_ = shift;
- s{L</([^>]+)>}{$1}g;
- my(@items) = split( /(?:,?\s+(?:and\s+)?)/ );
- my $retstr = "the ";
- my $i;
- for ($i = 0; $i <= $#items; $i++) {
- $retstr .= "C<$items[$i]>";
- $retstr .= ", " if @items > 2 && $i != $#items;
- $retstr .= " and " if $i+2 == @items;
- }
- $retstr .= " entr" . ( @items > 1 ? "ies" : "y" )
- . " elsewhere in this document ";
+ # Smash whitespace in case we were split across multiple lines.
+ s/\s+/ /g;
- return $retstr;
+ # If we were given any explicit text, just output it.
+ if (/^([^|]+)\|/) { return $1 }
+
+ # Okay, leading and trailing whitespace isn't important; get rid of it.
+ s/^\s+//;
+ s/\s+$//;
+
+ # Default to using the whole content of the link entry as a section
+ # name. Note that L<manpage/> forces a manpage interpretation, as does
+ # something looking like L<manpage(section)>. The latter is an
+ # enhancement over the original Pod::Text.
+ my ($manpage, $section) = ('', $_);
+ if (/^"\s*(.*?)\s*"$/) {
+ $section = '"' . $1 . '"';
+ } elsif (m/^[-:.\w]+(?:\(\S+\))?$/) {
+ ($manpage, $section) = ($_, '');
+ } elsif (m%/%) {
+ ($manpage, $section) = split (/\s*\/\s*/, $_, 2);
+ }
+
+ # Now build the actual output text.
+ my $text = '';
+ if (!length $section) {
+ $text = "the $manpage manpage" if length $manpage;
+ } elsif ($section =~ /^[:\w]+(?:\(\))?/) {
+ $text .= 'the ' . $section . ' entry';
+ $text .= (length $manpage) ? " in the $manpage manpage"
+ : " elsewhere in this document";
+ } else {
+ $section =~ s/^\"\s*//;
+ $section =~ s/\s*\"$//;
+ $text .= 'the section on "' . $section . '"';
+ $text .= " in the $manpage manpage" if length $manpage;
+ }
+ $text;
}
-sub begin_pod {
- my $self = shift;
- $self->{BEGUN} = [];
- $self->{TERMCAP} = 0;
- #$self->{USE_FORMAT} = 1;
-
- $self->{FONTMAP} = {
- UNDL => "\x1b[4m",
- INV => "\x1b[7m",
- BOLD => "\x1b[1m",
- NORM => "\x1b[0m",
- };
- if ($self->{TERMCAP} and (! defined $self->{SETUPTERMCAP})) {
- $self->{SETUPTERMCAP} = 1;
- my ($term) = Tgetent Term::Cap { TERM => undef, OSPEED => 9600 };
- $self->{FONTMAP}->{UNDL} = $term->{'_us'};
- $self->{FONTMAP}->{INV} = $term->{'_mr'};
- $self->{FONTMAP}->{BOLD} = $term->{'_md'};
- $self->{FONTMAP}->{NORM} = $term->{'_me'};
- }
-
- $self->{SCREEN} = &get_screen;
- $self->{FANCY} = 0;
- $self->{DEF_INDENT} = 4;
- $self->{INDENTS} = [];
- $self->{INDENT} = $self->{DEF_INDENT};
- $self->{NEEDSPACE} = 0;
-}
+############################################################################
+# List handling
+############################################################################
-sub end_pod {
+# This method is called whenever an =item command is complete (in other
+# words, we've seen its associated paragraph or know for certain that it
+# doesn't have one). It gets the paragraph associated with the item as an
+# argument. If that argument is empty, just output the item tag; if it
+# contains a newline, output the item tag followed by the newline.
+# Otherwise, see if there's enough room for us to output the item tag in the
+# margin of the text or if we have to put it on a separate line.
+sub item {
my $self = shift;
- $self->item('', '', '', 0) if (defined $self->{ITEM});
+ local $_ = shift;
+ my $tag = $$self{ITEM};
+ unless (defined $tag) {
+ carp "item called without tag";
+ return;
+ }
+ undef $$self{ITEM};
+ my $indent = $$self{INDENTS}[-1];
+ unless (defined $indent) { $indent = $$self{indent} }
+ my $space = ' ' x $indent;
+ $space =~ s/^ /:/ if $$self{alt};
+ if (!$_ || /^\s+$/ || ($$self{MARGIN} - $indent < length ($tag) + 1)) {
+ my $margin = $$self{MARGIN};
+ $$self{MARGIN} = $indent;
+ my $output = $self->reformat ($tag);
+ $output =~ s/\n*$/\n/;
+ $self->output ($output);
+ $$self{MARGIN} = $margin;
+ $self->output ($self->reformat ($_)) if /\S/;
+ } else {
+ $_ = $self->reformat ($_);
+ s/^ /:/ if ($$self{alt} && $indent > 0);
+ my $tagspace = ' ' x length $tag;
+ s/^($space)$tagspace/$1$tag/ or warn "Bizarre space in item";
+ $self->output ($_);
+ }
}
-sub begun_excluded {
- my $self = shift;
- my @begun = @{ $self->{BEGUN} };
- return (@begun > 0) ? ($begun[-1] ne 'text') : 0;
-}
-sub command {
+############################################################################
+# Output formatting
+############################################################################
+
+# Wrap a line, indenting by the current left margin. We can't use
+# Text::Wrap because it plays games with tabs. We can't use formline, even
+# though we'd really like to, because it screws up non-printing characters.
+# So we have to do the wrapping ourselves.
+sub wrap {
my $self = shift;
- my $cmd = shift;
local $_ = shift;
- my $line = shift;
- $cmd = '' unless (defined $cmd);
- $_ = '' unless (defined $_);
- my $out_fh = $self->output_handle();
-
- return if (($cmd ne 'end') and $self->begun_excluded());
- return $self->item($cmd, $_, $line) if (defined $self->{ITEM});
- $_ = $self->interpolate($_, $line);
- s/\s+\Z/\n/;
-
- return if ($cmd eq 'pod');
- if ($cmd eq 'head1') {
- $self->makespace();
- print $out_fh $_;
- # print $out_fh uc($_);
- }
- elsif ($cmd eq 'head2') {
- $self->makespace();
- # s/(\w+)/\u\L$1/g;
- #print ' ' x $self->{DEF_INDENT}, $_;
- # print "\xA7";
- s/(\w)/\xA7 $1/ if $self->{FANCY};
- print $out_fh ' ' x ($self->{DEF_INDENT}/2), $_, "\n";
- }
- elsif ($cmd eq 'over') {
- /^[-+]?\d+$/ or $_ = $self->{DEF_INDENT};
- push(@{$self->{INDENTS}}, $self->{INDENT});
- $self->{INDENT} += ($_ + 0);
- }
- elsif ($cmd eq 'back') {
- $self->{INDENT} = pop(@{$self->{INDENTS}});
- unless (defined $self->{INDENT}) {
- carp "Unmatched =back\n";
- $self->{INDENT} = $self->{DEF_INDENT};
+ my $output = '';
+ my $spaces = ' ' x $$self{MARGIN};
+ my $width = $$self{width} - $$self{MARGIN};
+ while (length > $width) {
+ if (s/^([^\n]{0,$width})\s+// || s/^([^\n]{$width})//) {
+ $output .= $spaces . $1 . "\n";
+ } else {
+ last;
}
}
- elsif ($cmd eq 'begin') {
- my ($kind) = /^(\S*)/;
- push( @{ $self->{BEGUN} }, $kind );
- }
- elsif ($cmd eq 'end') {
- pop( @{ $self->{BEGUN} } );
- }
- elsif ($cmd eq 'for') {
- $self->textblock($1) if /^text\b\s*(.*)$/s;
- }
- elsif ($cmd eq 'item') {
- $self->makespace();
- # s/\A(\s*)\*/$1\xb7/ if $self->{FANCY};
- # s/^(\s*\*\s+)/$1 /;
- $self->{ITEM} = $_;
- #$self->add_callbacks('*', SUB => \&item);
- }
- else {
- carp "Unrecognized directive: $cmd\n";
- }
+ $output .= $spaces . $_;
+ $output =~ s/\s+$/\n\n/;
+ $output;
}
-sub verbatim {
+# Reformat a paragraph of text for the current margin. Takes the text to
+# reformat and returns the formatted text.
+sub reformat {
my $self = shift;
local $_ = shift;
- my $line = shift;
- return if $self->begun_excluded();
- return $self->item('', $_, $line) if (defined $self->{ITEM});
- $self->output($_);
- #$self->{NEEDSPACE} = 1;
-}
-sub textblock {
- my $self = shift;
- my $text = shift;
- my $line = shift;
- return if $self->begun_excluded();
- return $self->item('', $text, $line) if (defined $self->{ITEM});
- local($_) = $self->interpolate($text, $line);
- s/\s*\Z/\n/;
- $self->makespace();
- $self->output($_, REFORMAT => 1);
+ # If we're trying to preserve two spaces after sentences, do some
+ # munging to support that. Otherwise, smash all repeated whitespace.
+ if ($$self{sentence}) {
+ s/ +$//mg;
+ s/\.\n/. \n/g;
+ s/\n/ /g;
+ s/ +/ /g;
+ } else {
+ s/\s+/ /g;
+ }
+ $self->wrap ($_);
}
-sub interior_sequence {
- my $self = shift;
- my $cmd = shift;
- my $arg = shift;
- local($_) = $arg;
- if ($cmd eq 'C') {
- my ($pre, $post) = ("`", "'");
- ($pre, $post) = ($HTML_Escapes{"lchevron"}, $HTML_Escapes{"rchevron"})
- if ((defined $self->{FANCY}) && $self->{FANCY});
- $_ = $pre . $_ . $post;
- }
- elsif ($cmd eq 'E') {
- if (defined $HTML_Escapes{$_}) {
- $_ = $HTML_Escapes{$_};
- }
+# Output text to the output device.
+sub output { $_[1] =~ tr/\01/ /; print { $_[0]->output_handle } $_[1] }
+
+
+############################################################################
+# Backwards compatibility
+############################################################################
+
+# The old Pod::Text module did everything in a pod2text() function. This
+# tries to provide the same interface for legacy applications.
+sub pod2text {
+ my @args;
+
+ # This is really ugly; I hate doing option parsing in the middle of a
+ # module. But the old Pod::Text module supported passing flags to its
+ # entry function, so handle -a and -<number>.
+ while ($_[0] =~ /^-/) {
+ my $flag = shift;
+ if ($flag eq '-a') { push (@args, alt => 1) }
+ elsif ($flag =~ /^-(\d+)$/) { push (@args, width => $1) }
else {
- carp "Unknown escape: E<$_>";
- $_ = "E<$_>";
+ unshift (@_, $flag);
+ last;
}
- # }
- # elsif ($cmd eq 'B') {
- # $_ = $self->bold($_);
}
- elsif ($cmd eq 'I') {
- # $_ = $self->italic($_);
- $_ = "*" . $_ . "*";
- }
- elsif (($cmd eq 'X') || ($cmd eq 'Z')) {
- $_ = '';
- }
- elsif ($cmd eq 'S') {
- # Escape whitespace until we are ready to print
- #$_ = $self->remap_whitespace($_);
- }
- elsif ($cmd eq 'L') {
- s/\s+/ /g;
- my ($text, $manpage, $sec, $ref) = ('', $_, '', '');
- if (/\A(.*?)\|(.*)\Z/) {
- $text = $1;
- $manpage = $_ = $2;
- }
- if (/^\s*"\s*(.*)\s*"\s*$/o) {
- ($manpage, $sec) = ('', "\"$1\"");
- }
- elsif (m|\s*/\s*|s) {
- ($manpage, $sec) = split(/\s*\/\s*/, $_, 2);
- }
- if (! length $sec) {
- $ref .= "the $manpage manpage" if (length $manpage);
- }
- elsif ($sec =~ /^\s*"\s*(.*)\s*"\s*$/o) {
- $ref .= "the section on \"$1\"";
- $ref .= " in the $manpage manpage" if (length $manpage);
- }
- else {
- $ref .= "the \"$sec\" entry";
- $ref .= (length $manpage) ? " in the $manpage manpage"
- : " in this manpage"
+
+ # Now that we know what arguments we're using, create the parser.
+ my $parser = Pod::PlainText->new (@args);
+
+ # If two arguments were given, the second argument is going to be a file
+ # handle. That means we want to call parse_from_filehandle(), which
+ # means we need to turn the first argument into a file handle. Magic
+ # open will handle the <&STDIN case automagically.
+ if (defined $_[1]) {
+ local *IN;
+ unless (open (IN, $_[0])) {
+ croak ("Can't open $_[0] for reading: $!\n");
+ return;
}
- $_ = $text || $ref;
- #if ( m{^ ([a-zA-Z][^\s\/]+) (\([^\)]+\))? $}x ) {
- # ## LREF: a manpage(3f)
- # $_ = "the $1$2 manpage";
- #}
- #elsif ( m{^ ([^/]+) / ([:\w]+(\(\))?) $}x ) {
- # ## LREF: an =item on another manpage
- # $_ = "the \"$2\" entry in the $1 manpage";
- #}
- #elsif ( m{^ / ([:\w]+(\(\))?) $}x ) {
- # ## LREF: an =item on this manpage
- # $_ = $self->internal_lrefs($1);
- #}
- #elsif ( m{^ (?: ([a-zA-Z]\S+?) / )? "?(.*?)"? $}x ) {
- # ## LREF: a =head2 (head1?), maybe on a manpage, maybe right here
- # ## the "func" can disambiguate
- # $_ = ((defined $1) && $1)
- # ? "the section on \"$2\" in the $1 manpage"
- # : "the section on \"$2\"";
- #}
+ $_[0] = \*IN;
+ return $parser->parse_from_filehandle (@_);
+ } else {
+ return $parser->parse_from_file (@_);
}
- return $_;
}
+
+############################################################################
+# Module return value and documentation
+############################################################################
+
1;
+__END__
+
+=head1 NAME
+
+Pod::PlainText - Convert POD data to formatted ASCII text
+
+=head1 SYNOPSIS
+
+ use Pod::PlainText;
+ my $parser = Pod::PlainText->new (sentence => 0, width => 78);
+
+ # Read POD from STDIN and write to STDOUT.
+ $parser->parse_from_filehandle;
+
+ # Read POD from file.pod and write to file.txt.
+ $parser->parse_from_file ('file.pod', 'file.txt');
+
+=head1 DESCRIPTION
+
+Pod::PlainText is a module that can convert documentation in the POD format (the
+preferred language for documenting Perl) into formatted ASCII. It uses no
+special formatting controls or codes whatsoever, and its output is therefore
+suitable for nearly any device.
+
+As a derived class from Pod::Parser, Pod::PlainText supports the same methods and
+interfaces. See L<Pod::Parser> for all the details; briefly, one creates a
+new parser with C<Pod::PlainText-E<gt>new()> and then calls either
+parse_from_filehandle() or parse_from_file().
+
+new() can take options, in the form of key/value pairs, that control the
+behavior of the parser. The currently recognized options are:
+
+=over 4
+
+=item alt
+
+If set to a true value, selects an alternate output format that, among other
+things, uses a different heading style and marks C<=item> entries with a
+colon in the left margin. Defaults to false.
+
+=item indent
+
+The number of spaces to indent regular text, and the default indentation for
+C<=over> blocks. Defaults to 4.
+
+=item loose
+
+If set to a true value, a blank line is printed after a C<=head1> heading.
+If set to false (the default), no blank line is printed after C<=head1>,
+although one is still printed after C<=head2>. This is the default because
+it's the expected formatting for manual pages; if you're formatting
+arbitrary text documents, setting this to true may result in more pleasing
+output.
+
+=item sentence
+
+If set to a true value, Pod::PlainText will assume that each sentence ends in two
+spaces, and will try to preserve that spacing. If set to false, all
+consecutive whitespace in non-verbatim paragraphs is compressed into a
+single space. Defaults to true.
+
+=item width
+
+The column at which to wrap text on the right-hand side. Defaults to 76.
+
+=back
+
+The standard Pod::Parser method parse_from_filehandle() takes up to two
+arguments, the first being the file handle to read POD from and the second
+being the file handle to write the formatted output to. The first defaults
+to STDIN if not given, and the second defaults to STDOUT. The method
+parse_from_file() is almost identical, except that its two arguments are the
+input and output disk files instead. See L<Pod::Parser> for the specific
+details.
+
+=head1 DIAGNOSTICS
+
+=over 4
+
+=item Bizarre space in item
+
+(W) Something has gone wrong in internal C<=item> processing. This message
+indicates a bug in Pod::PlainText; you should never see it.
+
+=item Can't open %s for reading: %s
+
+(F) Pod::PlainText was invoked via the compatibility mode pod2text() interface
+and the input file it was given could not be opened.
+
+=item Unknown escape: %s
+
+(W) The POD source contained an C<EE<lt>E<gt>> escape that Pod::PlainText didn't
+know about.
+
+=item Unknown sequence: %s
+
+(W) The POD source contained a non-standard internal sequence (something of
+the form C<XE<lt>E<gt>>) that Pod::PlainText didn't know about.
+
+=item Unmatched =back
+
+(W) Pod::PlainText encountered a C<=back> command that didn't correspond to an
+C<=over> command.
+
+=back
+
+=head1 RESTRICTIONS
+
+Embedded Ctrl-As (octal 001) in the input will be mapped to spaces on
+output, due to an internal implementation detail.
+
+=head1 NOTES
+
+This is a replacement for an earlier Pod::Text module written by Tom
+Christiansen. It has a revamped interface, since it now uses Pod::Parser,
+but an interface roughly compatible with the old Pod::Text::pod2text()
+function is still available. Please change to the new calling convention,
+though.
+
+The original Pod::Text contained code to do formatting via termcap
+sequences, although it wasn't turned on by default and it was problematic to
+get it to work at all. This rewrite doesn't even try to do that, but a
+subclass of it does. Look for L<Pod::Text::Termcap|Pod::Text::Termcap>.
+
+=head1 SEE ALSO
+
+L<Pod::Parser|Pod::Parser>, L<Pod::Text::Termcap|Pod::Text::Termcap>,
+pod2text(1)
+
+=head1 AUTHOR
+
+Russ Allbery E<lt>rra@stanford.eduE<gt>, based I<very> heavily on the
+original Pod::Text by Tom Christiansen E<lt>tchrist@mox.perl.comE<gt> and
+its conversion to Pod::Parser by Brad Appleton
+E<lt>bradapp@enteract.comE<gt>.
+
+=cut