1 #############################################################################
2 # Pod/Select.pm -- function to select portions of POD docs
4 # Copyright (C) 1996-2000 by Bradford Appleton. All rights reserved.
5 # This file is part of "PodParser". PodParser is free software;
6 # you can redistribute it and/or modify it under the same terms
8 #############################################################################
12 use vars qw($VERSION);
13 $VERSION = 1.13; ## Current version of this package
14 require 5.005; ## requires this Perl version or later
16 #############################################################################
20 Pod::Select, podselect() - extract selected sections of POD from input
26 ## Select all the POD sections for each file in @filelist
27 ## and print the result on standard output.
30 ## Same as above, but write to tmp.out
31 podselect({-output => "tmp.out"}, @filelist):
33 ## Select from the given filelist, only those POD sections that are
34 ## within a 1st level section named any of: NAME, SYNOPSIS, OPTIONS.
35 podselect({-sections => ["NAME|SYNOPSIS", "OPTIONS"]}, @filelist):
37 ## Select the "DESCRIPTION" section of the PODs from STDIN and write
38 ## the result to STDERR.
39 podselect({-output => ">&STDERR", -sections => ["DESCRIPTION"]}, \*STDIN);
45 ## Create a parser object for selecting POD sections from the input
46 $parser = new Pod::Select();
48 ## Select all the POD sections for each file in @filelist
49 ## and print the result to tmp.out.
50 $parser->parse_from_file("<&STDIN", "tmp.out");
52 ## Select from the given filelist, only those POD sections that are
53 ## within a 1st level section named any of: NAME, SYNOPSIS, OPTIONS.
54 $parser->select("NAME|SYNOPSIS", "OPTIONS");
55 for (@filelist) { $parser->parse_from_file($_); }
57 ## Select the "DESCRIPTION" and "SEE ALSO" sections of the PODs from
58 ## STDIN and write the result to STDERR.
59 $parser->select("DESCRIPTION");
60 $parser->add_selection("SEE ALSO");
61 $parser->parse_from_filehandle(\*STDIN, \*STDERR);
65 perl5.005, Pod::Parser, Exporter, Carp
73 B<podselect()> is a function which will extract specified sections of
74 pod documentation from an input stream. This ability is provided by the
75 B<Pod::Select> module which is a subclass of B<Pod::Parser>.
76 B<Pod::Select> provides a method named B<select()> to specify the set of
77 POD sections to select for processing/printing. B<podselect()> merely
78 creates a B<Pod::Select> object and then invokes the B<podselect()>
79 followed by B<parse_from_file()>.
81 =head1 SECTION SPECIFICATIONS
83 B<podselect()> and B<Pod::Select::select()> may be given one or more
84 "section specifications" to restrict the text processed to only the
85 desired set of sections and their corresponding subsections. A section
86 specification is a string containing one or more Perl-style regular
87 expressions separated by forward slashes ("/"). If you need to use a
88 forward slash literally within a section title you can escape it with a
91 The formal syntax of a section specification is:
97 I<head1-title-regex>/I<head2-title-regex>/...
101 Any omitted or empty regular expressions will default to ".*".
102 Please note that each regular expression given is implicitly
103 anchored by adding "^" and "$" to the beginning and end. Also, if a
104 given regular expression starts with a "!" character, then the
105 expression is I<negated> (so C<!foo> would match anything I<except>
108 Some example section specifications follow.
113 Match the C<NAME> and C<SYNOPSIS> sections and all of their subsections:
118 Match only the C<Question> and C<Answer> subsections of the C<DESCRIPTION>
121 C<DESCRIPTION/Question|Answer>
124 Match the C<Comments> subsection of I<all> sections:
129 Match all subsections of C<DESCRIPTION> I<except> for C<Comments>:
131 C<DESCRIPTION/!Comments>
134 Match the C<DESCRIPTION> section but do I<not> match any of its subsections:
139 Match all top level sections but none of their subsections:
145 =begin _NOT_IMPLEMENTED_
147 =head1 RANGE SPECIFICATIONS
149 B<podselect()> and B<Pod::Select::select()> may be given one or more
150 "range specifications" to restrict the text processed to only the
151 desired ranges of paragraphs in the desired set of sections. A range
152 specification is a string containing a single Perl-style regular
153 expression (a regex), or else two Perl-style regular expressions
154 (regexs) separated by a ".." (Perl's "range" operator is "..").
155 The regexs in a range specification are delimited by forward slashes
156 ("/"). If you need to use a forward slash literally within a regex you
157 can escape it with a backslash ("\/").
159 The formal syntax of a range specification is:
165 /I<start-range-regex>/[../I<end-range-regex>/]
169 Where each the item inside square brackets (the ".." followed by the
170 end-range-regex) is optional. Each "range-regex" is of the form:
174 Where I<cmd-expr> is intended to match the name of one or more POD
175 commands, and I<text-expr> is intended to match the paragraph text for
176 the command. If a range-regex is supposed to match a POD command, then
177 the first character of the regex (the one after the initial '/')
178 absolutely I<must> be an single '=' character; it may not be anything
179 else (not even a regex meta-character) if it is supposed to match
180 against the name of a POD command.
182 If no I<=cmd-expr> is given then the text-expr will be matched against
183 plain textblocks unless it is preceded by a space, in which case it is
184 matched against verbatim text-blocks. If no I<text-expr> is given then
185 only the command-portion of the paragraph is matched against.
187 Note that these two expressions are each implicitly anchored. This
188 means that when matching against the command-name, there will be an
189 implicit '^' and '$' around the given I<=cmd-expr>; and when matching
190 against the paragraph text there will be an implicit '\A' and '\Z'
191 around the given I<text-expr>.
193 Unlike with section-specs, the '!' character does I<not> have any special
194 meaning (negation or otherwise) at the beginning of a range-spec!
196 Some example range specifications follow.
201 Match all C<=for html> paragraphs:
206 Match all paragraphs between C<=begin html> and C<=end html>
207 (note that this will I<not> work correctly if such sections
210 C</=begin html/../=end html/>
213 Match all paragraphs between the given C<=item> name until the end of the
216 C</=item mine/../=head\d/>
219 Match all paragraphs between the given C<=item> until the next item, or
220 until the end of the itemized list (note that this will I<not> work as
221 desired if the item contains an itemized list nested within it):
223 C</=item mine/../=(item|back)/>
227 =end _NOT_IMPLEMENTED_
231 #############################################################################
236 use Pod::Parser 1.04;
237 use vars qw(@ISA @EXPORT $MAX_HEADING_LEVEL);
239 @ISA = qw(Pod::Parser);
240 @EXPORT = qw(&podselect);
242 ## Maximum number of heading levels supported for '=headN' directives
243 *MAX_HEADING_LEVEL = \3;
245 #############################################################################
247 =head1 OBJECT METHODS
249 The following methods are provided in this module. Each one takes a
250 reference to the object itself as an implicit first parameter.
254 ##---------------------------------------------------------------------------
258 ## =head1 B<_init_headings()>
260 ## Initialize the current set of active section headings.
266 use vars qw(%myData @section_headings);
270 local *myData = $self;
272 ## Initialize current section heading titles if necessary
273 unless (defined $myData{_SECTION_HEADINGS}) {
274 local *section_headings = $myData{_SECTION_HEADINGS} = [];
275 for (my $i = 0; $i < $MAX_HEADING_LEVEL; ++$i) {
276 $section_headings[$i] = '';
281 ##---------------------------------------------------------------------------
283 =head1 B<curr_headings()>
285 ($head1, $head2, $head3, ...) = $parser->curr_headings();
286 $head1 = $parser->curr_headings(1);
288 This method returns a list of the currently active section headings and
289 subheadings in the document being parsed. The list of headings returned
290 corresponds to the most recently parsed paragraph of the input.
292 If an argument is given, it must correspond to the desired section
293 heading number, in which case only the specified section heading is
294 returned. If there is no current section heading at the specified
295 level, then C<undef> is returned.
301 $self->_init_headings() unless (defined $self->{_SECTION_HEADINGS});
302 my @headings = @{ $self->{_SECTION_HEADINGS} };
303 return (@_ > 0 and $_[0] =~ /^\d+$/) ? $headings[$_[0] - 1] : @headings;
306 ##---------------------------------------------------------------------------
310 $parser->select($section_spec1,$section_spec2,...);
312 This method is used to select the particular sections and subsections of
313 POD documentation that are to be printed and/or processed. The existing
314 set of selected sections is I<replaced> with the given set of sections.
315 See B<add_selection()> for adding to the current set of selected
318 Each of the C<$section_spec> arguments should be a section specification
319 as described in L<"SECTION SPECIFICATIONS">. The section specifications
320 are parsed by this method and the resulting regular expressions are
321 stored in the invoking object.
323 If no C<$section_spec> arguments are given, then the existing set of
324 selected sections is cleared out (which means C<all> sections will be
327 This method should I<not> normally be overridden by subclasses.
331 use vars qw(@selected_sections);
336 local *myData = $self;
339 ### NEED TO DISCERN A SECTION-SPEC FROM A RANGE-SPEC (look for m{^/.+/$}?)
341 ##---------------------------------------------------------------------
342 ## The following is a blatant hack for backward compatibility, and for
343 ## implementing add_selection(). If the *first* *argument* is the
344 ## string "+", then the remaining section specifications are *added*
345 ## to the current set of selections; otherwise the given section
346 ## specifications will *replace* the current set of selections.
348 ## This should probably be fixed someday, but for the present time,
349 ## it seems incredibly unlikely that "+" would ever correspond to
350 ## a legitimate section heading
351 ##---------------------------------------------------------------------
352 my $add = ($sections[0] eq "+") ? shift(@sections) : "";
354 ## Reset the set of sections to use
355 unless (@sections > 0) {
356 delete $myData{_SELECTED_SECTIONS} unless ($add);
359 $myData{_SELECTED_SECTIONS} = []
360 unless ($add && exists $myData{_SELECTED_SECTIONS});
361 local *selected_sections = $myData{_SELECTED_SECTIONS};
365 for $spec (@sections) {
366 if ( defined($_ = &_compile_section_spec($spec)) ) {
367 ## Store them in our sections array
368 push(@selected_sections, $_);
371 carp "Ignoring section spec \"$spec\"!\n";
376 ##---------------------------------------------------------------------------
378 =head1 B<add_selection()>
380 $parser->add_selection($section_spec1,$section_spec2,...);
382 This method is used to add to the currently selected sections and
383 subsections of POD documentation that are to be printed and/or
384 processed. See <select()> for replacing the currently selected sections.
386 Each of the C<$section_spec> arguments should be a section specification
387 as described in L<"SECTION SPECIFICATIONS">. The section specifications
388 are parsed by this method and the resulting regular expressions are
389 stored in the invoking object.
391 This method should I<not> normally be overridden by subclasses.
397 $self->select("+", @_);
400 ##---------------------------------------------------------------------------
402 =head1 B<clear_selections()>
404 $parser->clear_selections();
406 This method takes no arguments, it has the exact same effect as invoking
407 <select()> with no arguments.
411 sub clear_selections {
416 ##---------------------------------------------------------------------------
418 =head1 B<match_section()>
420 $boolean = $parser->match_section($heading1,$heading2,...);
422 Returns a value of true if the given section and subsection heading
423 titles match any of the currently selected section specifications in
424 effect from prior calls to B<select()> and B<add_selection()> (or if
425 there are no explictly selected/deselected sections).
427 The arguments C<$heading1>, C<$heading2>, etc. are the heading titles of
428 the corresponding sections, subsections, etc. to try and match. If
429 C<$headingN> is omitted then it defaults to the current corresponding
430 section heading title in the input.
432 This method should I<not> normally be overridden by subclasses.
439 local *myData = $self;
441 ## Return true if no restrictions were explicitly specified
442 my $selections = (exists $myData{_SELECTED_SECTIONS})
443 ? $myData{_SELECTED_SECTIONS} : undef;
444 return 1 unless ((defined $selections) && (@{$selections} > 0));
446 ## Default any unspecified sections to the current one
447 my @current_headings = $self->curr_headings();
448 for (my $i = 0; $i < $MAX_HEADING_LEVEL; ++$i) {
449 (defined $headings[$i]) or $headings[$i] = $current_headings[$i];
452 ## Look for a match against the specified section expressions
453 my ($section_spec, $regex, $negated, $match);
454 for $section_spec ( @{$selections} ) {
455 ##------------------------------------------------------
456 ## Each portion of this spec must match in order for
457 ## the spec to be matched. So we will start with a
458 ## match-value of 'true' and logically 'and' it with
459 ## the results of matching a given element of the spec.
460 ##------------------------------------------------------
462 for (my $i = 0; $i < $MAX_HEADING_LEVEL; ++$i) {
463 $regex = $section_spec->[$i];
464 $negated = ($regex =~ s/^\!//);
465 $match &= ($negated ? ($headings[$i] !~ /${regex}/)
466 : ($headings[$i] =~ /${regex}/));
467 last unless ($match);
469 return 1 if ($match);
471 return 0; ## no match
474 ##---------------------------------------------------------------------------
476 =head1 B<is_selected()>
478 $boolean = $parser->is_selected($paragraph);
480 This method is used to determine if the block of text given in
481 C<$paragraph> falls within the currently selected set of POD sections
482 and subsections to be printed or processed. This method is also
483 responsible for keeping track of the current input section and
484 subsections. It is assumed that C<$paragraph> is the most recently read
485 (but not yet processed) input paragraph.
487 The value returned will be true if the C<$paragraph> and the rest of the
488 text in the same section as C<$paragraph> should be selected (included)
489 for processing; otherwise a false value is returned.
494 my ($self, $paragraph) = @_;
496 local *myData = $self;
498 $self->_init_headings() unless (defined $myData{_SECTION_HEADINGS});
500 ## Keep track of current sections levels and headings
502 if (/^=((?:sub)*)(?:head(?:ing)?|sec(?:tion)?)(\d*)\s+(.*)\s*$/) {
503 ## This is a section heading command
504 my ($level, $heading) = ($2, $3);
505 $level = 1 + (length($1) / 3) if ((! length $level) || (length $1));
506 ## Reset the current section heading at this level
507 $myData{_SECTION_HEADINGS}->[$level - 1] = $heading;
508 ## Reset subsection headings of this one to empty
509 for (my $i = $level; $i < $MAX_HEADING_LEVEL; ++$i) {
510 $myData{_SECTION_HEADINGS}->[$i] = '';
514 return $self->match_section();
517 #############################################################################
519 =head1 EXPORTED FUNCTIONS
521 The following functions are exported by this module. Please note that
522 these are functions (not methods) and therefore C<do not> take an
523 implicit first argument.
527 ##---------------------------------------------------------------------------
529 =head1 B<podselect()>
531 podselect(\%options,@filelist);
533 B<podselect> will print the raw (untranslated) POD paragraphs of all
534 POD sections in the given input files specified by C<@filelist>
535 according to the given options.
537 If any argument to B<podselect> is a reference to a hash
538 (associative array) then the values with the following keys are
539 processed as follows:
545 A string corresponding to the desired output file (or ">&STDOUT"
546 or ">&STDERR"). The default is to use standard output.
550 A reference to an array of sections specifications (as described in
551 L<"SECTION SPECIFICATIONS">) which indicate the desired set of POD
552 sections and subsections to be selected from input. If no section
553 specifications are given, then all sections of the PODs are used.
555 =begin _NOT_IMPLEMENTED_
559 A reference to an array of range specifications (as described in
560 L<"RANGE SPECIFICATIONS">) which indicate the desired range of POD
561 paragraphs to be selected from the desired input sections. If no range
562 specifications are given, then all paragraphs of the desired sections
565 =end _NOT_IMPLEMENTED_
569 All other arguments should correspond to the names of input files
570 containing POD sections. A file name of "-" or "<&STDIN" will
571 be interpeted to mean standard input (which is the default if no
572 filenames are given).
579 my $pod_parser = new Pod::Select(%defaults);
581 my $output = ">&STDOUT";
586 next unless (ref($_) eq 'HASH');
587 %opts = (%defaults, %{$_});
589 ##-------------------------------------------------------------
590 ## Need this for backward compatibility since we formerly used
591 ## options that were all uppercase words rather than ones that
592 ## looked like Unix command-line options.
593 ## to be uppercase keywords)
594 ##-------------------------------------------------------------
596 my ($key, $val) = (lc $_, $opts{$_});
597 $key =~ s/^(?=\w)/-/;
598 $key =~ /^-se[cl]/ and $key = '-sections';
599 #! $key eq '-range' and $key .= 's';
603 ## Process the options
604 (exists $opts{'-output'}) and $output = $opts{'-output'};
606 ## Select the desired sections
607 $pod_parser->select(@{ $opts{'-sections'} })
608 if ( (defined $opts{'-sections'})
609 && ((ref $opts{'-sections'}) eq 'ARRAY') );
611 #! ## Select the desired paragraph ranges
612 #! $pod_parser->select(@{ $opts{'-ranges'} })
613 #! if ( (defined $opts{'-ranges'})
614 #! && ((ref $opts{'-ranges'}) eq 'ARRAY') );
617 $pod_parser->parse_from_file($_, $output);
621 $pod_parser->parse_from_file("-") unless ($num_inputs > 0);
624 #############################################################################
626 =head1 PRIVATE METHODS AND DATA
628 B<Pod::Select> makes uses a number of internal methods and data fields
629 which clients should not need to see or use. For the sake of avoiding
630 name collisions with client data and methods, these methods and fields
631 are briefly discussed here. Determined hackers may obtain further
632 information about them by reading the B<Pod::Select> source code.
634 Private data fields are stored in the hash-object whose reference is
635 returned by the B<new()> constructor for this class. The names of all
636 private methods and data-fields used by B<Pod::Select> begin with a
637 prefix of "_" and match the regular expression C</^_\w+$/>.
641 ##---------------------------------------------------------------------------
645 =head1 B<_compile_section_spec()>
647 $listref = $parser->_compile_section_spec($section_spec);
649 This function (note it is a function and I<not> a method) takes a
650 section specification (as described in L<"SECTION SPECIFICATIONS">)
651 given in C<$section_sepc>, and compiles it into a list of regular
652 expressions. If C<$section_spec> has no syntax errors, then a reference
653 to the list (array) of corresponding regular expressions is returned;
654 otherwise C<undef> is returned and an error message is printed (using
655 B<carp>) for each invalid regex.
661 sub _compile_section_spec {
662 my ($section_spec) = @_;
663 my (@regexs, $negated);
665 ## Compile the spec into a list of regexs
666 local $_ = $section_spec;
667 s|\\\\|\001|g; ## handle escaped backward slashes
668 s|\\/|\002|g; ## handle escaped forward slashes
670 ## Parse the regexs for the heading titles
671 @regexs = split('/', $_, $MAX_HEADING_LEVEL);
673 ## Set default regex for ommitted levels
674 for (my $i = 0; $i < $MAX_HEADING_LEVEL; ++$i) {
675 $regexs[$i] = '.*' unless ((defined $regexs[$i])
676 && (length $regexs[$i]));
678 ## Modify the regexs as needed and validate their syntax
681 $_ .= '.+' if ($_ eq '!');
682 s|\001|\\\\|g; ## restore escaped backward slashes
683 s|\002|\\/|g; ## restore escaped forward slashes
684 $negated = s/^\!//; ## check for negation
685 eval "/$_/"; ## check regex syntax
688 carp "Bad regular expression /$_/ in \"$section_spec\": $@\n";
691 ## Add the forward and rear anchors (and put the negator back)
692 $_ = '^' . $_ unless (/^\^/);
693 $_ = $_ . '$' unless (/\$$/);
694 $_ = '!' . $_ if ($negated);
697 return (! $bad_regexs) ? [ @regexs ] : undef;
700 ##---------------------------------------------------------------------------
704 =head2 $self->{_SECTION_HEADINGS}
706 A reference to an array of the current section heading titles for each
707 heading level (note that the first heading level title is at index 0).
713 ##---------------------------------------------------------------------------
717 =head2 $self->{_SELECTED_SECTIONS}
719 A reference to an array of references to arrays. Each subarray is a list
720 of anchored regular expressions (preceded by a "!" if the expression is to
721 be negated). The index of the expression in the subarray should correspond
722 to the index of the heading title in C<$self-E<gt>{_SECTION_HEADINGS}>
723 that it is to be matched against.
729 #############################################################################
737 Brad Appleton E<lt>bradapp@enteract.comE<gt>
739 Based on code for B<pod2text> written by
740 Tom Christiansen E<lt>tchrist@mox.perl.comE<gt>