#############################################################################
# Pod/Select.pm -- function to select portions of POD docs
#
# Based on Tom Christiansen's pod2text() function
# (with extensive modifications).
#
# Copyright (C) 1996-1999 Tom Christiansen. All rights reserved.
# This file is part of "PodParser". PodParser is free software;
# you can redistribute it and/or modify it under the same terms
# as Perl itself.
#############################################################################

use vars qw($VERSION);
$VERSION = 1.081;  ## Current version of this package
require 5.004;     ## requires this Perl version or later

#############################################################################

=head1 NAME

Pod::Select, podselect() - extract selected sections of POD from input

=head1 SYNOPSIS

    use Pod::Select;

    ## Select all the POD sections for each file in @filelist
    ## and print the result on standard output.
    podselect(@filelist);

    ## Same as above, but write to tmp.out
    podselect({-output => "tmp.out"}, @filelist);

    ## Select from the given filelist, only those POD sections that are
    ## within a 1st level section named any of: NAME, SYNOPSIS, OPTIONS.
    podselect({-sections => ["NAME|SYNOPSIS", "OPTIONS"]}, @filelist);

    ## Select the "DESCRIPTION" section of the PODs from STDIN and write
    ## the result to STDERR.
    podselect({-output => ">&STDERR", -sections => ["DESCRIPTION"]}, \*STDIN);

    ## Create a parser object for selecting POD sections from the input
    $parser = Pod::Select->new();

    ## Select all the POD sections from standard input
    ## and print the result to tmp.out.
    $parser->parse_from_file("<&STDIN", "tmp.out");

    ## Select from the given filelist, only those POD sections that are
    ## within a 1st level section named any of: NAME, SYNOPSIS, OPTIONS.
    $parser->select("NAME|SYNOPSIS", "OPTIONS");
    for (@filelist) { $parser->parse_from_file($_); }

    ## Select the "DESCRIPTION" and "SEE ALSO" sections of the PODs from
    ## STDIN and write the result to STDERR.
    $parser->select("DESCRIPTION");
    $parser->add_selection("SEE ALSO");
    $parser->parse_from_filehandle(\*STDIN, \*STDERR);

=head1 REQUIRES

perl5.004, Pod::Parser, Exporter, FileHandle, Carp

=head1 DESCRIPTION

B<podselect()> is a function which will extract specified sections of
POD documentation from an input stream. This ability is provided by the
B<Pod::Select> module, which is a subclass of B<Pod::Parser>.
B<Pod::Select> provides a method named B<select()> to specify the set of
POD sections to select for processing/printing. B<podselect()> merely
creates a B<Pod::Select> object and then invokes its B<select()> method
followed by B<parse_from_file()>.

=head1 SECTION SPECIFICATIONS

B<podselect()> and B<Pod::Select::select()> may be given one or more
"section specifications" to restrict the text processed to only the
desired set of sections and their corresponding subsections. A section
specification is a string containing one or more Perl-style regular
expressions separated by forward slashes ("/"). If you need to use a
forward slash literally within a section title you can escape it with a
backslash ("\/").

The formal syntax of a section specification is:

I<head1-title-regex>/I<head2-title-regex>/...

Any omitted or empty regular expressions will default to ".*".
Please note that each regular expression given is implicitly
anchored by adding "^" and "$" to the beginning and end. Also, if a
given regular expression starts with a "!" character, then the
expression is I<negated> (so C<!foo> would match anything I<except>
C<foo>).

Some example section specifications follow.

Match the C<NAME> and C<SYNOPSIS> sections and all of their subsections:

Match only the C<Question> and C<Answer> subsections of the C<DESCRIPTION>
section:

C<DESCRIPTION/Question|Answer>

Match the C<Comments> subsection of I<all> sections:

Match all subsections of C<DESCRIPTION> I<except> for C<Comments>:

C<DESCRIPTION/!Comments>

Match the C<DESCRIPTION> section but do I<not> match any of its subsections:

Match all top level sections but none of their subsections:

=begin _NOT_IMPLEMENTED_

=head1 RANGE SPECIFICATIONS

B<podselect()> and B<Pod::Select::select()> may be given one or more
"range specifications" to restrict the text processed to only the
desired ranges of paragraphs in the desired set of sections. A range
specification is a string containing a single Perl-style regular
expression (a regex), or else two Perl-style regular expressions
(regexes) separated by ".." (Perl's range operator).
The regexes in a range specification are delimited by forward slashes
("/"). If you need to use a forward slash literally within a regex you
can escape it with a backslash ("\/").

The formal syntax of a range specification is:

/I<start-range-regex>/[../I<end-range-regex>/]

where the item inside square brackets (the ".." followed by the
end-range-regex) is optional. Each "range-regex" is of the form:

where I<cmd-expr> is intended to match the name of one or more POD
commands, and I<text-expr> is intended to match the paragraph text for
the command. If a range-regex is supposed to match a POD command, then
the first character of the regex (the one after the initial '/')
absolutely I<must> be a single '=' character; it may not be anything
else (not even a regex meta-character) if it is supposed to match
against the name of a POD command.

If no I<=cmd-expr> is given then the text-expr will be matched against
plain text blocks unless it is preceded by a space, in which case it is
matched against verbatim text blocks. If no I<text-expr> is given then
only the command portion of the paragraph is matched against.

Note that these two expressions are each implicitly anchored. This
means that when matching against the command name, there will be an
implicit '^' and '$' around the given I<=cmd-expr>; and when matching
against the paragraph text there will be an implicit '\A' and '\Z'
around the given I<text-expr>.

Unlike with section-specs, the '!' character does I<not> have any special
meaning (negation or otherwise) at the beginning of a range-spec!

Some example range specifications follow.

Match all C<=for html> paragraphs:

Match all paragraphs between C<=begin html> and C<=end html>
(note that this will I<not> work correctly if such sections
are nested):

C</=begin html/../=end html/>

Match all paragraphs between the given C<=item> name until the end of the
current section:

C</=item mine/../=head\d/>

Match all paragraphs between the given C<=item> until the next item, or
until the end of the itemized list (note that this will I<not> work as
desired if the item contains an itemized list nested within it):

C</=item mine/../=(item|back)/>

=end _NOT_IMPLEMENTED_

=cut

#############################################################################

use Carp;   ## provides carp(), which is called further down
use Pod::Parser 1.04;
use vars qw(@ISA @EXPORT $MAX_HEADING_LEVEL);

@ISA = qw(Pod::Parser);
@EXPORT = qw(&podselect);

## Maximum number of heading levels supported for '=headN' directives
*MAX_HEADING_LEVEL = \3;

#############################################################################

=head1 OBJECT METHODS

The following methods are provided in this module. Each one takes a
reference to the object itself as an implicit first parameter.

=cut

##---------------------------------------------------------------------------

## =head1 B<_init_headings()>
##
## Initialize the current set of active section headings.

use vars qw(%myData @section_headings);

sub _init_headings {
    my $self = shift;
    local *myData = $self;

    ## Initialize current section heading titles if necessary
    unless (defined $myData{_SECTION_HEADINGS}) {
        local *section_headings = $myData{_SECTION_HEADINGS} = [];
        for (my $i = 0; $i < $MAX_HEADING_LEVEL; ++$i) {
            $section_headings[$i] = '';
        }
    }
}

##---------------------------------------------------------------------------

=head1 B<curr_headings()>

    ($head1, $head2, $head3, ...) = $parser->curr_headings();
    $head1 = $parser->curr_headings(1);

This method returns a list of the currently active section headings and
subheadings in the document being parsed. The list of headings returned
corresponds to the most recently parsed paragraph of the input.

If an argument is given, it must correspond to the desired section
heading number, in which case only the specified section heading is
returned. If there is no current section heading at the specified
level, then C<undef> is returned.

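As a rough illustration of the calling convention described above, the
following standalone sketch mimics the 1-based level lookup on a plain array.
The helper name headings_at_sketch and the sample headings are assumptions
made for this example; it is not the module's own code and does not need
Pod::Select to run.

```perl
use strict;
use warnings;

# Hypothetical helper: return every heading, or only the one at a 1-based level.
sub headings_at_sketch {
    my ($headings, $level) = @_;
    return (defined $level && $level =~ /^\d+$/)
        ? $headings->[$level - 1]    # single heading (undef if the level is unset)
        : @$headings;                # no level given: the whole list
}

my @current = ('DESCRIPTION', 'Question', '');
my @all     = headings_at_sketch(\@current);
my $second  = headings_at_sketch(\@current, 2);
print scalar(@all), " headings; level 2 is '$second'\n";
```

Asking for a level beyond the end of the list, such as 4 here, yields undef,
which matches the behavior documented for missing headings.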
=cut

sub curr_headings {
    my $self = shift;
    $self->_init_headings() unless (defined $self->{_SECTION_HEADINGS});
    my @headings = @{ $self->{_SECTION_HEADINGS} };
    return (@_ > 0 and $_[0] =~ /^\d+$/) ? $headings[$_[0] - 1] : @headings;
}

##---------------------------------------------------------------------------

=head1 B<select()>

    $parser->select($section_spec1,$section_spec2,...);

This method is used to select the particular sections and subsections of
POD documentation that are to be printed and/or processed. The existing
set of selected sections is I<replaced> with the given set of sections.
See B<add_selection()> for adding to the current set of selected
sections.

Each of the C<$section_spec> arguments should be a section specification
as described in L<"SECTION SPECIFICATIONS">. The section specifications
are parsed by this method and the resulting regular expressions are
stored in the invoking object.

If no C<$section_spec> arguments are given, then the existing set of
selected sections is cleared out (which means I<all> sections will be
processed).

This method should I<not> normally be overridden by subclasses.

=cut

use vars qw(@selected_sections);

sub select {
    my $self = shift;
    my @sections = @_;
    local *myData = $self;
    local $_;

    ### NEED TO DISCERN A SECTION-SPEC FROM A RANGE-SPEC (look for m{^/.+/$}?)

    ##---------------------------------------------------------------------
    ## The following is a blatant hack for backward compatibility, and for
    ## implementing add_selection(). If the *first* *argument* is the
    ## string "+", then the remaining section specifications are *added*
    ## to the current set of selections; otherwise the given section
    ## specifications will *replace* the current set of selections.
    ##
    ## This should probably be fixed someday, but for the present time,
    ## it seems incredibly unlikely that "+" would ever correspond to
    ## a legitimate section heading.
    ##---------------------------------------------------------------------
    ## (Check @sections first so an empty argument list raises no warning.)
    my $add = (@sections && $sections[0] eq "+") ? shift(@sections) : "";

    ## Reset the set of sections to use
    unless (@sections > 0) {
        delete $myData{_SELECTED_SECTIONS} unless ($add);
        return;
    }
    $myData{_SELECTED_SECTIONS} = []
        unless ($add && exists $myData{_SELECTED_SECTIONS});
    local *selected_sections = $myData{_SELECTED_SECTIONS};

    ## Compile each spec
    for my $spec (@sections) {
        if ( defined($_ = &_compile_section_spec($spec)) ) {
            ## Store them in our sections array
            push(@selected_sections, $_);
        }
        else {
            carp "Ignoring section spec \"$spec\"!\n";
        }
    }
}

##---------------------------------------------------------------------------

=head1 B<add_selection()>

    $parser->add_selection($section_spec1,$section_spec2,...);

This method is used to add to the currently selected sections and
subsections of POD documentation that are to be printed and/or
processed. See B<select()> for replacing the currently selected sections.

Each of the C<$section_spec> arguments should be a section specification
as described in L<"SECTION SPECIFICATIONS">. The section specifications
are parsed by this method and the resulting regular expressions are
stored in the invoking object.

This method should I<not> normally be overridden by subclasses.

=cut

sub add_selection { my $self = shift; $self->select("+", @_); }

##---------------------------------------------------------------------------

=head1 B<clear_selections()>

    $parser->clear_selections();

This method takes no arguments; it has the exact same effect as invoking
B<select()> with no arguments.

=cut

sub clear_selections { my $self = shift; $self->select(); }

##---------------------------------------------------------------------------

=head1 B<match_section()>

    $boolean = $parser->match_section($heading1,$heading2,...);

Returns a value of true if the given section and subsection heading
titles match any of the currently selected section specifications in
effect from prior calls to B<select()> and B<add_selection()> (or if
there are no explicitly selected/deselected sections).

The arguments C<$heading1>, C<$heading2>, etc. are the heading titles of
the corresponding sections, subsections, etc. to try to match. If
C<$headingN> is omitted then it defaults to the current corresponding
section heading title in the input.

This method should I<not> normally be overridden by subclasses.

=cut

sub match_section {
    my $self = shift;
    my (@headings) = @_;
    local *myData = $self;

    ## Return true if no restrictions were explicitly specified
    my $selections = (exists $myData{_SELECTED_SECTIONS})
                         ? $myData{_SELECTED_SECTIONS} : undef;
    return 1 unless ((defined $selections) && (@{$selections} > 0));

    ## Default any unspecified sections to the current one
    my @current_headings = $self->curr_headings();
    for (my $i = 0; $i < $MAX_HEADING_LEVEL; ++$i) {
        (defined $headings[$i]) or $headings[$i] = $current_headings[$i];
    }

    ## Look for a match against the specified section expressions
    my ($section_spec, $regex, $negated, $match);
    for $section_spec ( @{$selections} ) {
        ##------------------------------------------------------
        ## Each portion of this spec must match in order for
        ## the spec to be matched. So we will start with a
        ## match-value of 'true' and logically 'and' it with
        ## the results of matching a given element of the spec.
        ##------------------------------------------------------
        $match = 1;
        for (my $i = 0; $i < $MAX_HEADING_LEVEL; ++$i) {
            $regex = $section_spec->[$i];
            $negated = ($regex =~ s/^\!//);
            $match &= ($negated ? ($headings[$i] !~ /${regex}/)
                                : ($headings[$i] =~ /${regex}/));
            last unless ($match);
        }
        return 1 if ($match);
    }
    return 0;  ## no match
}

##---------------------------------------------------------------------------

=head1 B<is_selected()>

    $boolean = $parser->is_selected($paragraph);

This method is used to determine if the block of text given in
C<$paragraph> falls within the currently selected set of POD sections
and subsections to be printed or processed. This method is also
responsible for keeping track of the current input section and
subsections. It is assumed that C<$paragraph> is the most recently read
(but not yet processed) input paragraph.

The value returned will be true if the C<$paragraph> and the rest of the
text in the same section as C<$paragraph> should be selected (included)
for processing; otherwise a false value is returned.

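The heading bookkeeping described above can be sketched in isolation as
follows. This is a simplified, standalone illustration: it recognizes only
plain "=headN" commands, and the names track_heading_sketch, @headings and
$MAX_LEVELS are assumptions made for the example rather than part of the
module.

```perl
use strict;
use warnings;

my $MAX_LEVELS = 3;                       # assumed; mirrors $MAX_HEADING_LEVEL
my @headings   = ('') x $MAX_LEVELS;

# Update @headings for one paragraph; only plain "=headN" commands are handled.
sub track_heading_sketch {
    my ($paragraph) = @_;
    if ($paragraph =~ /^=head(\d)\s+(.*?)\s*$/) {
        my ($level, $title) = ($1, $2);
        return if $level < 1 || $level > $MAX_LEVELS;
        $headings[$level - 1] = $title;
        # A new heading invalidates every deeper heading.
        $headings[$_] = '' for $level .. $MAX_LEVELS - 1;
    }
    return;
}

track_heading_sketch($_) for ("=head1 DESCRIPTION\n", "=head2 Question\n",
                              "Some text.\n", "=head1 SEE ALSO\n");
print join('/', @headings), "\n";         # prints "SEE ALSO//"
```

Clearing the deeper levels on every new heading is what keeps a stale
subsection title from matching after the parser has moved on to a new
top-level section.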
=cut

sub is_selected {
    my ($self, $paragraph) = @_;
    local $_;
    local *myData = $self;

    $self->_init_headings() unless (defined $myData{_SECTION_HEADINGS});

    ## Keep track of current section levels and headings
    $_ = $paragraph;
    if (/^=((?:sub)*)(?:head(?:ing)?|sec(?:tion)?)(\d*)\s+(.*)\s*$/) {
        ## This is a section heading command
        my ($level, $heading) = ($2, $3);
        $level = 1 + (length($1) / 3) if ((! length $level) || (length $1));
        ## Reset the current section heading at this level
        $myData{_SECTION_HEADINGS}->[$level - 1] = $heading;
        ## Reset subsection headings of this one to empty
        for (my $i = $level; $i < $MAX_HEADING_LEVEL; ++$i) {
            $myData{_SECTION_HEADINGS}->[$i] = '';
        }
    }

    return $self->match_section();
}

#############################################################################

=head1 EXPORTED FUNCTIONS

The following functions are exported by this module. Please note that
these are functions (not methods) and therefore do I<not> take an
implicit first argument.

=cut

##---------------------------------------------------------------------------

=head1 B<podselect()>

    podselect(\%options,@filelist);

B<podselect> will print the raw (untranslated) POD paragraphs of all
POD sections in the given input files specified by C<@filelist>
according to the given options.

If any argument to B<podselect> is a reference to a hash
(associative array) then the values with the following keys are
processed as follows:

=over 4

=item B<-output>

A string corresponding to the desired output file (or ">&STDOUT"
or ">&STDERR"). The default is to use standard output.

=item B<-sections>

A reference to an array of section specifications (as described in
L<"SECTION SPECIFICATIONS">) which indicate the desired set of POD
sections and subsections to be selected from input. If no section
specifications are given, then all sections of the PODs are used.

=begin _NOT_IMPLEMENTED_

=item B<-ranges>

A reference to an array of range specifications (as described in
L<"RANGE SPECIFICATIONS">) which indicate the desired range of POD
paragraphs to be selected from the desired input sections. If no range
specifications are given, then all paragraphs of the desired sections
are used.

=end _NOT_IMPLEMENTED_

=back

All other arguments should correspond to the names of input files
containing POD sections. A file name of "-" or "<&STDIN" will
be interpreted to mean standard input (which is the default if no
filenames are given).

=cut

sub podselect {
    my (@argv) = @_;
    my %defaults = ();
    my $pod_parser = Pod::Select->new(%defaults);
    my $num_inputs = 0;
    my $output = ">&STDOUT";
    my %opts;
    local $_;
    for (@argv) {
        if (ref($_)) {
            next unless (ref($_) eq 'HASH');
            %opts = (%defaults, %{$_});

            ##-------------------------------------------------------------
            ## Need this for backward compatibility since we formerly used
            ## options that were all uppercase words rather than ones that
            ## looked like Unix command-line options.
            ##-------------------------------------------------------------
            %opts = map {
                my ($key, $val) = (lc $_, $opts{$_});
                $key =~ s/^(?=\w)/-/;
                $key =~ /^-se[cl]/ and $key = '-sections';
                #! $key eq '-range' and $key .= 's';
                ($key => $val);
            } (keys %opts);

            ## Process the options
            (exists $opts{'-output'}) and $output = $opts{'-output'};

            ## Select the desired sections
            $pod_parser->select(@{ $opts{'-sections'} })
                if ( (defined $opts{'-sections'})
                     && ((ref $opts{'-sections'}) eq 'ARRAY') );

            #! ## Select the desired paragraph ranges
            #! $pod_parser->select(@{ $opts{'-ranges'} })
            #!     if ( (defined $opts{'-ranges'})
            #!          && ((ref $opts{'-ranges'}) eq 'ARRAY') );
        }
        else {
            $pod_parser->parse_from_file($_, $output);
            ++$num_inputs;
        }
    }
    ## No input files given: read standard input, honoring any -output option
    $pod_parser->parse_from_file("-", $output) unless ($num_inputs > 0);
}

#############################################################################

=head1 PRIVATE METHODS AND DATA

B<Pod::Select> makes use of a number of internal methods and data fields
which clients should not need to see or use. For the sake of avoiding
name collisions with client data and methods, these methods and fields
are briefly discussed here. Determined hackers may obtain further
information about them by reading the B<Pod::Select> source code.

Private data fields are stored in the hash-object whose reference is
returned by the B<new()> constructor for this class. The names of all
private methods and data-fields used by B<Pod::Select> begin with a
prefix of "_" and match the regular expression C</^_\w+$/>.

=cut

##---------------------------------------------------------------------------

=head1 B<_compile_section_spec()>

    $listref = _compile_section_spec($section_spec);

This function (note it is a function and I<not> a method) takes a
section specification (as described in L<"SECTION SPECIFICATIONS">)
given in C<$section_spec>, and compiles it into a list of regular
expressions. If C<$section_spec> has no syntax errors, then a reference
to the list (array) of corresponding regular expressions is returned;
otherwise C<undef> is returned and an error message is printed (using
B<carp>) for each invalid regex.

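To make the compilation step concrete, here is a minimal standalone sketch
of the same idea: split the spec on unescaped slashes, default empty levels
to ".*", validate each regex, then anchor it and put any "!" back in front.
The function name compile_spec_sketch and the constant $MAX_LEVELS are
assumptions for this example; unlike the real function it reports no
diagnostics and simply returns undef on the first bad regex.

```perl
use strict;
use warnings;

my $MAX_LEVELS = 3;    # assumed; mirrors $MAX_HEADING_LEVEL

# Hypothetical stand-in for the private helper, not the module's actual code.
sub compile_spec_sketch {
    my ($spec) = @_;

    # Hide escaped backslashes and escaped slashes before splitting on "/".
    (my $work = $spec) =~ s/\\\\/\001/g;
    $work =~ s/\\\//\002/g;

    my @regexes = split m{/}, $work, $MAX_LEVELS;

    for my $i (0 .. $MAX_LEVELS - 1) {
        my $re = $regexes[$i];
        $re = '.*' unless defined $re && length $re;   # omitted level
        $re .= '.+' if $re eq '!';                     # bare "!" is treated as "!.+"
        $re =~ s/\001/\\\\/g;                          # restore escaped backslashes
        $re =~ s/\002/\\\//g;                          # restore escaped slashes
        my $negated = ($re =~ s/^!//);
        return undef unless defined eval { qr/$re/ };  # invalid regex
        $re = "^$re"  unless $re =~ /^\^/;
        $re = "$re\$" unless $re =~ /\$\z/;
        $regexes[$i] = ($negated ? '!' : '') . $re;
    }
    return \@regexes;
}

my $compiled = compile_spec_sketch('DESCRIPTION/!Comments');
print join(' | ', @$compiled), "\n";   # prints "^DESCRIPTION$ | !^Comments$ | ^.*$"
```

Keeping the "!" as a prefix on the stored string, rather than compiling a
negative pattern, lets the matching code decide between =~ and !~ per level.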
=cut

sub _compile_section_spec {
    my ($section_spec) = @_;
    my (@regexs, $negated);
    my $bad_regexs = 0;

    ## Compile the spec into a list of regexs
    local $_ = $section_spec;
    s|\\\\|\001|g;  ## handle escaped backslashes
    s|\\/|\002|g;   ## handle escaped forward slashes

    ## Parse the regexs for the heading titles
    @regexs = split('/', $_, $MAX_HEADING_LEVEL);

    ## Set default regex for omitted levels
    for (my $i = 0; $i < $MAX_HEADING_LEVEL; ++$i) {
        $regexs[$i] = '.*' unless ((defined $regexs[$i])
                                   && (length $regexs[$i]));
    }
    ## Modify the regexs as needed and validate their syntax
    for (@regexs) {
        $_ .= '.+' if ($_ eq '!');
        s|\001|\\\\|g;       ## restore escaped backslashes
        s|\002|\\/|g;        ## restore escaped forward slashes
        $negated = s/^\!//;  ## check for negation
        eval "/$_/";         ## check regex syntax
        if ($@) {
            ++$bad_regexs;
            carp "Bad regular expression /$_/ in \"$section_spec\": $@\n";
        }
        else {
            ## Add the forward and rear anchors (and put the negator back)
            $_ = '^' . $_ unless (/^\^/);
            $_ = $_ . '$' unless (/\$$/);
            $_ = '!' . $_ if ($negated);
        }
    }
    return (! $bad_regexs) ? [ @regexs ] : undef;
}

##---------------------------------------------------------------------------

=head2 $self->{_SECTION_HEADINGS}

A reference to an array of the current section heading titles for each
heading level (note that the first heading level title is at index 0).

=cut

##---------------------------------------------------------------------------

=head2 $self->{_SELECTED_SECTIONS}

A reference to an array of references to arrays. Each subarray is a list
of anchored regular expressions (preceded by a "!" if the expression is to
be negated). The index of the expression in the subarray should correspond
to the index of the heading title in C<$self-E<gt>{_SECTION_HEADINGS}>
that it is to be matched against.

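The shape of this structure, and how it lines up with the heading titles,
can be pictured with the standalone sketch below. The sample selections and
the function name headings_match_sketch are assumptions for illustration
only; the logic follows the description above (one anchored regex per
heading level, "!" meaning the level must I<not> match) and is not the
module's own code.

```perl
use strict;
use warnings;

# Each selection is one compiled spec: one anchored regex per heading level,
# with a leading "!" marking a negated expression (illustrative data only).
my @selected_sections = (
    [ '^NAME$',        '^.*$',        '^.*$' ],
    [ '^DESCRIPTION$', '!^Comments$', '^.*$' ],
);

sub headings_match_sketch {
    my ($selections, @headings) = @_;
    return 1 unless @$selections;              # nothing selected: everything matches

    for my $spec (@$selections) {
        my $match = 1;
        for my $i (0 .. $#$spec) {
            my $regex   = $spec->[$i];         # a copy, so the stored spec is untouched
            my $negated = ($regex =~ s/^!//);
            my $heading = defined $headings[$i] ? $headings[$i] : '';
            my $hit     = ($heading =~ /$regex/) ? 1 : 0;
            $match = 0 if ($negated ? $hit : !$hit);
            last unless $match;
        }
        return 1 if $match;                    # one fully matching spec is enough
    }
    return 0;
}

print headings_match_sketch(\@selected_sections, 'DESCRIPTION', 'Question', ''), "\n";  # 1
print headings_match_sketch(\@selected_sections, 'DESCRIPTION', 'Comments', ''), "\n";  # 0
```

Every level of a spec must agree for that spec to match, while a single
matching spec is enough for the headings as a whole to be selected.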
=cut

#############################################################################

=head1 AUTHOR

Brad Appleton E<lt>bradapp@enteract.comE<gt>

Based on code for B<pod2text> written by
Tom Christiansen E<lt>tchrist@mox.perl.comE<gt>