#############################################################################
# Pod/Parser.pm -- package which defines a base class for parsing POD docs.
#
-# Copyright (C) 1996-1999 by Bradford Appleton. All rights reserved.
+# Copyright (C) 1996-2000 by Bradford Appleton. All rights reserved.
# This file is part of "PodParser". PodParser is free software;
# you can redistribute it and/or modify it under the same terms
# as Perl itself.
package Pod::Parser;
use vars qw($VERSION);
-$VERSION = 1.085; ## Current version of this package
-require 5.004; ## requires this Perl version or later
+$VERSION = 1.13; ## Current version of this package
+require 5.005; ## requires this Perl version or later
#############################################################################
sub interior_sequence {
my ($parser, $seq_command, $seq_argument) = @_;
## Expand an interior sequence; sample actions might be:
- return "*$seq_argument*" if ($seq_command = 'B');
- return "`$seq_argument'" if ($seq_command = 'C');
- return "_${seq_argument}_'" if ($seq_command = 'I');
+ return "*$seq_argument*" if ($seq_command eq 'B');
+ return "`$seq_argument'" if ($seq_command eq 'C');
+ return "_${seq_argument}_'" if ($seq_command eq 'I');
## ... other sequence commands and their resulting text
}
=head1 REQUIRES
-perl5.004, Pod::InputObjects, Exporter, Carp
+perl5.005, Pod::InputObjects, Exporter, Symbol, Carp
=head1 EXPORTS
do most of the input parsing for you and leave you free to worry about
how to intepret the commands and translate the result.
-Note that all we have described here in this quick overview overview is
-the simplest most striaghtforward use of B<Pod::Parser> to do stream-based
+Note that all we have described here in this quick overview is the
+simplest most straightforward use of B<Pod::Parser> to do stream-based
parsing. It is also possible to use the B<Pod::Parser::parse_text> function
to do more sophisticated tree-based parsing. See L<"TREE-BASED PARSING">.
A I<parse-option> is simply a named option of B<Pod::Parser> with a
value that corresponds to a certain specified behavior. These various
-behaviors of B<Pod::Parser> may be enabled/disabled by setting or
+behaviors of B<Pod::Parser> may be enabled/disabled by setting
or unsetting one or more I<parse-options> using the B<parseopts()> method.
The set of currently accepted parse-options is as follows:
of the POD-format documentation are not made available to the caller
(not even using B<preprocess_paragraph()>). Setting this option to a
non-empty, non-zero value will allow B<preprocess_paragraph()> to see
-non-POD sectioins of the input as well as POD sections. The B<cutting()>
+non-POD sections of the input as well as POD sections. The B<cutting()>
method can be used to determine if the corresponding paragraph is a POD
paragraph, or some other input paragraph.
Normally (by default) B<Pod::Parser> handles the C<=cut> POD directive
by itself and does not pass it on to the caller for processing. Setting
-this option to non-empty, non-zero value will cause B<Pod::Parser> to
+this option to a non-empty, non-zero value will cause B<Pod::Parser> to
pass the C<=cut> directive to the caller just like any other POD command
(and hence it may be processed by the B<command()> method).
to capture the actual C<=cut> paragraph itself for whatever purpose
it desires.
+=item B<-warnings> (default: unset)
+
+Normally (by default) B<Pod::Parser> recognizes a bare minimum of
+pod syntax errors and warnings and issues diagnostic messages
+for errors, but not for warnings. (Use B<Pod::Checker> to do more
+thorough checking of POD syntax.) Setting this option to a non-empty,
+non-zero value will cause B<Pod::Parser> to issue diagnostics for
+the few warnings it recognizes as well as the errors.
+
=back
Please see L<"parseopts()"> for a complete description of the interface
use Pod::InputObjects;
use Carp;
use Exporter;
+BEGIN {
+ if ($] < 5.6) {
+ require Symbol;
+ import Symbol;
+ }
+}
@ISA = qw(Exporter);
## These "variables" are used as local "glob aliases" for performance
place If the empty string is returned or an undefined value is
returned, then the given C<$text> is ignored (not processed).
-This method is invoked after gathering up all thelines in a paragraph
+This method is invoked after gathering up all the lines in a paragraph
+and after determining the cutting state of the paragraph,
but before trying to further parse or interpret them. After
B<preprocess_paragraph()> returns, the current cutting state (which
is returned by C<$self-E<gt>cutting()>) is examined. If it evaluates
-to false then input text (including the given C<$text>) is cut (not
+to true then input text (including the given C<$text>) is cut (not
processed) until the next POD directive is encountered.
Please note that the B<preprocess_line()> method is invoked I<before>
the B<preprocess_paragraph()> method. After all (possibly preprocessed)
-lines in a paragraph have been assembled together and it has been
+lines in a paragraph have been assembled together and either it has been
determined that the paragraph is part of the POD documentation from one
-of the selected sections, then B<preprocess_paragraph()> is invoked.
+of the selected sections or the C<-want_nonPODs> option is true,
+then B<preprocess_paragraph()> is invoked.
The base class implementation of this method returns the given text.
This method is useful if you need to perform your own interpolation
of interior sequences and can't rely upon B<interpolate> to expand
-them in simple bottom-up order order.
+them in simple bottom-up order.
The parameter C<$text> is a string or block of text to be parsed
for interior sequences; and the parameter C<$line_num> is the
=cut
-## This global regex is used to see if the text before a '>' inside
-## an interior sequence looks like '-' or '=', but not '--', '==',
-## '!=', '$-', '$=' or <<op>>=
-use vars qw( $ARROW_RE );
-$ARROW_RE = join('', qw{ (?: [^-+*/=!&|%^x.<>$]= | [^-$]- )$ });
-#$ARROW_RE = qr/(?:[^-+*/=!&|%^x.<>$]+=|[^-$]+-)$/; ## 5.005+ only!
-
sub parse_text {
my $self = shift;
local $_ = '';
my $text = shift;
my $line = shift;
my $file = $self->input_file();
- my ($cmd, $prev) = ('', '');
+ my $cmd = "";
## Convert method calls into closures, for our convenience
my $xseq_sub = $expand_seq;
ref $xseq_sub or $xseq_sub = sub { shift()->$expand_seq(@_) };
ref $xtext_sub or $xtext_sub = sub { shift()->$expand_text(@_) };
ref $xptree_sub or $xptree_sub = sub { shift()->$expand_ptree(@_) };
-
+
## Keep track of the "current" interior sequence, and maintain a stack
## of "in progress" sequences.
##
##
my $seq = Pod::ParseTree->new();
my @seq_stack = ($seq);
+ my ($ldelim, $rdelim) = ('', '');
- ## Iterate over all sequence starts/stops, newlines, & text
- ## (NOTE: split with capturing parens keeps the delimiters)
+ ## Iterate over all sequence starts text (NOTE: split with
+ ## capturing parens keeps the delimiters)
$_ = $text;
- for ( split /([A-Z]<|>|\n)/ ) {
- ## Keep track of line count
- ++$line if ($_ eq "\n");
+ my @tokens = split /([A-Z]<(?:<+\s)?)/;
+ while ( @tokens ) {
+ $_ = shift @tokens;
## Look for the beginning of a sequence
- if ( /^([A-Z])(<)$/ ) {
+ if ( /^([A-Z])(<(?:<+\s)?)$/ ) {
## Push a new sequence onto the stack of those "in-progress"
+ ($cmd, $ldelim) = ($1, $2);
$seq = Pod::InteriorSequence->new(
- -name => ($cmd = $1),
- -ldelim => $2, -rdelim => '',
- -file => $file, -line => $line
+ -name => $cmd,
+ -ldelim => $ldelim, -rdelim => '',
+ -file => $file, -line => $line
);
+ $ldelim =~ s/\s+$//, ($rdelim = $ldelim) =~ tr/</>/;
(@seq_stack > 1) and $seq->nested($seq_stack[-1]);
push @seq_stack, $seq;
}
- ## Look for sequence ending (preclude '->' and '=>' inside C<...>)
- elsif ( (@seq_stack > 1) and
- /^>$/ and ($cmd ne 'C' or $prev !~ /$ARROW_RE/o) )
- {
- ## End of current sequence, record terminating delimiter
- $seq->rdelim($_);
- ## Pop it off the stack of "in progress" sequences
- pop @seq_stack;
- ## Append result to its parent in current parse tree
- $seq_stack[-1]->append($expand_seq ? &$xseq_sub($self,$seq) : $seq);
- ## Remember the current cmd-name
- $cmd = (@seq_stack > 1) ? $seq_stack[-1]->name : '';
+ ## Look for sequence ending
+ elsif ( @seq_stack > 1 ) {
+ ## Make sure we match the right kind of closing delimiter
+ my ($seq_end, $post_seq) = ("", "");
+ if ( ($ldelim eq '<' and /\A(.*?)(>)/s)
+ or /\A(.*?)(\s+$rdelim)/s )
+ {
+ ## Found end-of-sequence, capture the interior and the
+ ## closing the delimiter, and put the rest back on the
+ ## token-list
+ $post_seq = substr($_, length($1) + length($2));
+ ($_, $seq_end) = ($1, $2);
+ (length $post_seq) and unshift @tokens, $post_seq;
+ }
+ if (length) {
+ ## In the middle of a sequence, append this text to it, and
+ ## dont forget to "expand" it if that's what the caller wanted
+ $seq->append($expand_text ? &$xtext_sub($self,$_,$seq) : $_);
+ $_ .= $seq_end;
+ }
+ if (length $seq_end) {
+ ## End of current sequence, record terminating delimiter
+ $seq->rdelim($seq_end);
+ ## Pop it off the stack of "in progress" sequences
+ pop @seq_stack;
+ ## Append result to its parent in current parse tree
+ $seq_stack[-1]->append($expand_seq ? &$xseq_sub($self,$seq)
+ : $seq);
+ ## Remember the current cmd-name and left-delimiter
+ $cmd = (@seq_stack > 1) ? $seq_stack[-1]->name : '';
+ $ldelim = (@seq_stack > 1) ? $seq_stack[-1]->ldelim : '';
+ $ldelim =~ s/\s+$//, ($rdelim = $ldelim) =~ tr/</>/;
+ }
}
elsif (length) {
## In the middle of a sequence, append this text to it, and
## dont forget to "expand" it if that's what the caller wanted
$seq->append($expand_text ? &$xtext_sub($self,$_,$seq) : $_);
}
- ## Remember the "current" sequence and the previously seen token
- ($seq, $prev) = ( $seq_stack[-1], $_ );
+ ## Keep track of line count
+ $line += tr/\n//;
+ ## Remember the "current" sequence
+ $seq = $seq_stack[-1];
}
## Handle unterminated sequences
my $errorsub = (@seq_stack > 1) ? $self->errorsub() : undef;
while (@seq_stack > 1) {
($cmd, $file, $line) = ($seq->name, $seq->file_line);
+ $ldelim = $seq->ldelim;
+ ($rdelim = $ldelim) =~ tr/</>/;
+ $rdelim =~ s/^(\S+)(\s*)$/$2$1/;
pop @seq_stack;
- my $errmsg = "** Unterminated $cmd<...> at $file line $line\n";
+ my $errmsg = "*** ERROR: unterminated ${cmd}${ldelim}...${rdelim}".
+ " at line $line in file $file\n";
(ref $errorsub) and &{$errorsub}($errmsg)
or (defined $errorsub) and $self->$errorsub($errmsg)
or warn($errmsg);
local $_;
## See if we want to preprocess nonPOD paragraphs as well as POD ones.
- my $wantNonPods = $myOpts{'-want_nonPODs'} || 0;
+ my $wantNonPods = $myOpts{'-want_nonPODs'};
+
+ ## Update cutting status
+ $myData{_CUTTING} = 0 if $text =~ /^={1,2}\S/;
## Perform any desired preprocessing if we wanted it this early
$wantNonPods and $text = $self->preprocess_paragraph($text, $line_num);
- ## This is the end of a non-empty paragraph
## Ignore up until next POD directive if we are cutting
- if ($myData{_CUTTING}) {
- return unless ($text =~ /^={1,2}\S/);
- $myData{_CUTTING} = 0;
- }
+ return if $myData{_CUTTING};
## Now we know this is block of text in a POD section!
## and whatever sequence of characters was used to separate them
$pfx = $1;
$_ = substr($text, length $pfx);
- $sep = /(\s+)(?=\S)/ ? $1 : '';
- ($cmd, $text) = split(" ", $_, 2);
+ ($cmd, $sep, $text) = split /(\s+)/, $_, 2;
## If this is a "cut" directive then we dont need to do anything
## except return to "cutting" mode.
if ($cmd eq 'cut') {
my %opts = (ref $_[0] eq 'HASH') ? %{ shift() } : ();
my ($in_fh, $out_fh) = @_;
$in_fh = \*STDIN unless ($in_fh);
+ local *myData = $self; ## alias to avoid deref-ing overhead
+ local *myOpts = ($myData{_PARSEOPTS} ||= {}); ## get parse-options
local $_;
## Put this stream at the top of the stack and do beginning-of-input
++$plines;
}
- ## See of this line is blank and ends the current paragraph.
+ ## See if this line is blank and ends the current paragraph.
## If it isnt, then keep iterating until it is.
- next unless (($textline =~ /^\s*$/) && (length $paragraph));
+ next unless (($textline =~ /^([^\S\r\n]*)[\r\n]*$/)
+ && (length $paragraph));
+
+ ## Issue a warning about any non-empty blank lines
+ if (length($1) > 0 and $myOpts{'-warnings'} and ! $myData{_CUTTING}) {
+ my $errorsub = $self->errorsub();
+ my $file = $self->input_file();
+ my $errmsg = "*** WARNING: line containing nothing but whitespace".
+ " in paragraph at line $nlines in file $file\n";
+ (ref $errorsub) and &{$errorsub}($errmsg)
+ or (defined $errorsub) and $self->$errorsub($errmsg)
+ or warn($errmsg);
+ }
## Now process the paragraph
parse_paragraph($self, $paragraph, ($nlines - $plines) + 1);
my $self = shift;
my %opts = (ref $_[0] eq 'HASH') ? %{ shift() } : ();
my ($infile, $outfile) = @_;
- my ($in_fh, $out_fh);
+ my ($in_fh, $out_fh) = (gensym, gensym) if ($] < 5.6);
my ($close_input, $close_output) = (0, 0);
local *myData = $self;
local $_;
elsif (ref $outfile) {
## Must be a filehandle-ref (or else assume its a ref to an
## object that supports the common IO write operations).
- $myData{_OUTFILE} = ${$outfile};;
+ $myData{_OUTFILE} = ${$outfile};
$out_fh = $outfile;
}
else {
## We have a filename, open it for writing
$myData{_OUTFILE} = $outfile;
+ (-d $outfile) and croak "$outfile is a directory, not POD input!\n";
open($out_fh, "> $outfile") or
croak "Can't open $outfile for writing: $!\n";
$close_output = 1;
my $errorsub = $parser->errorsub()
my $errmsg = "This is an error message!\n"
(ref $errorsub) and &{$errorsub}($errmsg)
- or (defined $errmsg) and $parser->$errorsub($errmsg)
+ or (defined $errorsub) and $parser->$errorsub($errmsg)
or warn($errmsg);
Returns a method name, or else a reference to the user-supplied subroutine
given values. Any unspecified parse-options are unaffected.
## Set them back to the default
- $parser->parseopts(-process_cut_cmd => 0);
+ $parser->parseopts(-warnings => 0);
When passed a single hash-ref, B<parseopts> uses that hash to completely
reset the existing parse-options, all previous parse-option values
## Reset all options to default
$parser->parseopts( { } );
-See L<"PARSING OPTIONS"> for more for the name and meaning of each
+See L<"PARSING OPTIONS"> for more information on the name and meaning of each
parse-option currently recognized.
=cut
tree-based approach. Rather than doing everything in one pass and
calling the B<interpolate()> method to expand sequences into text, it
may be desirable to instead create a parse-tree using the B<parse_text()>
-method to return a tree-like structure which may contain an ordered list
+method to return a tree-like structure which may contain an ordered
list of children (each of which may be a text-string, or a similar
tree-like structure).