1 #############################################################################
2 # Pod/ParseUtils.pm -- helpers for POD parsing and conversion
4 # Copyright (C) 1999-2000 by Marek Rouchal. All rights reserved.
5 # This file is part of "PodParser". PodParser is free software;
6 # you can redistribute it and/or modify it under the same terms
8 #############################################################################
10 package Pod::ParseUtils;
13 use vars qw($VERSION);
14 $VERSION = '1.36'; ## Current version of this package
15 require 5.005; ## requires this Perl version or later
19 Pod::ParseUtils - helpers for POD parsing and conversion
25 my $list = new Pod::List;
26 my $link = Pod::Hyperlink->new('Pod::Parser');
30 B<Pod::ParseUtils> contains a few object-oriented helper packages for
31 POD parsing and processing (i.e. in POD formatters and translators).
35 #-----------------------------------------------------------------------------
38 # class to hold POD list info (=over, =item, =back)
39 #-----------------------------------------------------------------------------
47 B<Pod::List> can be used to hold information about POD lists
48 (written as =over ... =item ... =back) for further processing.
49 The following methods are available:
53 =item Pod::List-E<gt>new()
55 Create a new list object. Properties may be specified through a hash
58 my $list = Pod::List->new({ -start => $., -indent => 4 });
60 See the individual methods/properties for details.
66 my $class = ref($this) || $this;
76 $self->{-file} ||= 'unknown';
77 $self->{-start} ||= 'unknown';
78 $self->{-indent} ||= 4; # perlpod: "should be the default"
80 $self->{-type} ||= '';
83 =item $list-E<gt>file()
85 Without argument, retrieves the file name the list is in. This must
86 have been set before by either specifying B<-file> in the B<new()>
87 method or by calling the B<file()> method with a scalar argument.
91 # The POD file name the list appears in
93 return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};
96 =item $list-E<gt>start()
98 Without argument, retrieves the line number where the list started.
99 This must have been set before by either specifying B<-start> in the
100 B<new()> method or by calling the B<start()> method with a scalar
105 # The line in the file the node appears
107 return (@_ > 1) ? ($_[0]->{-start} = $_[1]) : $_[0]->{-start};
110 =item $list-E<gt>indent()
112 Without argument, retrieves the indent level of the list as specified
113 in C<=over n>. This must have been set before by either specifying
114 B<-indent> in the B<new()> method or by calling the B<indent()> method
115 with a scalar argument.
121 return (@_ > 1) ? ($_[0]->{-indent} = $_[1]) : $_[0]->{-indent};
124 =item $list-E<gt>type()
126 Without argument, retrieves the list type, which can be an arbitrary value,
127 e.g. C<OL>, C<UL>, ... when thinking the HTML way.
128 This must have been set before by either specifying
129 B<-type> in the B<new()> method or by calling the B<type()> method
130 with a scalar argument.
134 # The type of the list (UL, OL, ...)
136 return (@_ > 1) ? ($_[0]->{-type} = $_[1]) : $_[0]->{-type};
139 =item $list-E<gt>rx()
141 Without argument, retrieves a regular expression for simplifying the
142 individual item strings once the list type has been determined. Usage:
143 E.g. when converting to HTML, one might strip the leading number in
144 an ordered list as C<E<lt>OLE<gt>> already prints numbers itself.
145 This must have been set before by either specifying
146 B<-rx> in the B<new()> method or by calling the B<rx()> method
147 with a scalar argument.
151 # The regular expression to simplify the items
153 return (@_ > 1) ? ($_[0]->{-rx} = $_[1]) : $_[0]->{-rx};
156 =item $list-E<gt>item()
158 Without argument, retrieves the array of the items in this list.
159 The items may be represented by any scalar.
160 If an argument has been given, it is pushed on the list of items.
164 # The individual =items of this list
166 my ($self,$item) = @_;
168 push(@{$self->{_items}}, $item);
172 return @{$self->{_items}};
176 =item $list-E<gt>parent()
178 Without argument, retrieves information about the parent holding this
179 list, which is represented as an arbitrary scalar.
180 This must have been set before by either specifying
181 B<-parent> in the B<new()> method or by calling the B<parent()> method
182 with a scalar argument.
186 # possibility for parsers/translators to store information about the
187 # lists's parent object
189 return (@_ > 1) ? ($_[0]->{-parent} = $_[1]) : $_[0]->{-parent};
192 =item $list-E<gt>tag()
194 Without argument, retrieves information about the list tag, which can be
196 This must have been set before by either specifying
197 B<-tag> in the B<new()> method or by calling the B<tag()> method
198 with a scalar argument.
204 # possibility for parsers/translators to store information about the
207 return (@_ > 1) ? ($_[0]->{-tag} = $_[1]) : $_[0]->{-tag};
210 #-----------------------------------------------------------------------------
213 # class to manipulate POD hyperlinks (L<>)
214 #-----------------------------------------------------------------------------
216 package Pod::Hyperlink;
218 =head2 Pod::Hyperlink
220 B<Pod::Hyperlink> is a class for manipulation of POD hyperlinks. Usage:
222 my $link = Pod::Hyperlink->new('alternative text|page/"section in page"');
224 The B<Pod::Hyperlink> class is mainly designed to parse the contents of the
225 C<LE<lt>...E<gt>> sequence, providing a simple interface for accessing the
226 different parts of a POD hyperlink for further processing. It can also be
227 used to construct hyperlinks.
231 =item Pod::Hyperlink-E<gt>new()
233 The B<new()> method can either be passed a set of key/value pairs or a single
234 scalar value, namely the contents of a C<LE<lt>...E<gt>> sequence. An object
235 of the class C<Pod::Hyperlink> is returned. The value C<undef> indicates a
236 failure, the error message is stored in C<$@>.
244 my $class = ref($this) || $this;
250 # called with a list of parameters
252 $self->_construct_text();
255 # called with L<> contents
256 return unless($self->parse($_[0]));
264 $self->{-line} ||= 'undef';
265 $self->{-file} ||= 'undef';
266 $self->{-page} ||= '';
267 $self->{-node} ||= '';
268 $self->{-alttext} ||= '';
269 $self->{-type} ||= 'undef';
270 $self->{_warnings} = [];
273 =item $link-E<gt>parse($string)
275 This method can be used to (re)parse a (new) hyperlink, i.e. the contents
276 of a C<LE<lt>...E<gt>> sequence. The result is stored in the current object.
277 Warnings are stored in the B<warnings> property.
278 E.g. sections like C<LE<lt>open(2)E<gt>> are deprecated, as they do not point
279 to Perl documents. C<LE<lt>DBI::foo(3p)E<gt>> is wrong as well, the manpage
280 section can simply be dropped.
287 # syntax check the link and extract destination
288 my ($alttext,$page,$node,$type,$quoted) = (undef,'','','',0);
290 $self->{_warnings} = [];
292 # collapse newlines with whitespace
295 # strip leading/trailing whitespace
297 $self->warning('ignoring leading whitespace in link');
300 $self->warning('ignoring trailing whitespace in link');
303 _invalid_link('empty link');
307 ## Check for different possibilities. This is tedious and error-prone
308 # we match all possibilities (alttext, page, section/item)
309 #warn "DEBUG: link=$_\n";
312 # problem: a lot of people use (), or (1) or the like to indicate
313 # man page sections. But this collides with L<func()> that is supposed
314 # to point to an internal funtion...
315 my $page_rx = '[\w.-]+(?:::[\w.-]+)*(?:[(](?:\d\w*|)[)]|)';
317 if(/^($page_rx)$/o) {
321 # alttext, page and "section"
322 elsif(m{^(.*?)\s*[|]\s*($page_rx)\s*/\s*"(.+)"$}o) {
323 ($alttext, $page, $node) = ($1, $2, $3);
325 $quoted = 1; #... therefore | and / are allowed
328 elsif(/^(.*?)\s*[|]\s*($page_rx)$/o) {
329 ($alttext, $page) = ($1, $2);
332 # alttext and "section"
333 elsif(m{^(.*?)\s*[|]\s*(?:/\s*|)"(.+)"$}) {
334 ($alttext, $node) = ($1,$2);
339 elsif(m{^($page_rx)\s*/\s*"(.+)"$}o) {
340 ($page, $node) = ($1, $2);
345 elsif(m{^($page_rx)\s*/\s*(.+)$}o) {
346 ($page, $node) = ($1, $2);
350 elsif(m{^/?"(.+)"$}) {
356 elsif(m{^\s*/(.+)$}) {
361 # non-standard: Hyperlink with alt-text - doesn't remove protocol prefix, maybe it should?
362 elsif(/^ \s* (.*?) \s* [|] \s* (\w+:[^:\s] [^\s|]*?) \s* $/ix) {
363 ($alttext,$node) = ($1,$2);
367 # non-standard: Hyperlink
368 elsif(/^(\w+:[^:\s]\S*)$/i) {
372 # alttext, page and item
373 elsif(m{^(.*?)\s*[|]\s*($page_rx)\s*/\s*(.+)$}o) {
374 ($alttext, $page, $node) = ($1, $2, $3);
378 elsif(m{^(.*?)\s*[|]\s*/(.+)$}) {
379 ($alttext, $node) = ($1,$2);
381 # must be an item or a "malformed" section (without "")
386 # collapse whitespace in nodes
389 # empty alternative text expands to node name
390 if(defined $alttext) {
391 if(!length($alttext)) {
392 $alttext = $node || $page;
399 if($page =~ /[(]\w*[)]$/) {
400 $self->warning("(section) in '$page' deprecated");
402 if(!$quoted && $node =~ m{[|/]} && $type ne 'hyperlink') {
403 $self->warning("node '$node' contains non-escaped | or /");
405 if($alttext =~ m{[|/]}) {
406 $self->warning("alternative text '$node' contains non-escaped | or /");
408 $self->{-page} = $page;
409 $self->{-node} = $node;
410 $self->{-alttext} = $alttext;
411 #warn "DEBUG: page=$page section=$section item=$item alttext=$alttext\n";
412 $self->{-type} = $type;
413 $self->_construct_text();
417 sub _construct_text {
419 my $alttext = $self->alttext();
420 my $type = $self->type();
421 my $section = $self->node();
422 my $page = $self->page();
424 $page =~ s/([(]\w*[)])$// && ($page_ext = $1);
426 $self->{_text} = $alttext;
428 elsif($type eq 'hyperlink') {
429 $self->{_text} = $section;
432 $self->{_text} = ($section || '') .
433 (($page && $section) ? ' in ' : '') .
436 # for being marked up later
437 # use the non-standard markers P<> and Q<>, so that the resulting
438 # text can be parsed by the translators. It's their job to put
439 # the correct hypertext around the linktext
441 $self->{_markup} = "Q<$alttext>";
443 elsif($type eq 'hyperlink') {
444 $self->{_markup} = "Q<$section>";
447 $self->{_markup} = (!$section ? '' : "Q<$section>") .
448 ($page ? ($section ? ' in ':'') . "P<$page>$page_ext" : '');
452 =item $link-E<gt>markup($string)
454 Set/retrieve the textual value of the link. This string contains special
455 markers C<PE<lt>E<gt>> and C<QE<lt>E<gt>> that should be expanded by the
456 translator's interior sequence expansion engine to the
457 formatter-specific code to highlight/activate the hyperlink. The details
458 have to be implemented in the translator.
462 #' retrieve/set markuped text
464 return (@_ > 1) ? ($_[0]->{_markup} = $_[1]) : $_[0]->{_markup};
467 =item $link-E<gt>text()
469 This method returns the textual representation of the hyperlink as above,
470 but without markers (read only). Depending on the link type this is one of
471 the following alternatives (the + and * denote the portions of the text
475 *$|* in +perlvar+ L<perlvar/$|>
476 *OPTIONS* in +perldoc+ L<perldoc/"OPTIONS">
477 *DESCRIPTION* L<"DESCRIPTION">
481 # The complete link's text
483 return $_[0]->{_text};
486 =item $link-E<gt>warning()
488 After parsing, this method returns any warnings encountered during the
493 # Set/retrieve warnings
497 push(@{$self->{_warnings}}, @_);
500 return @{$self->{_warnings}};
503 =item $link-E<gt>file()
505 =item $link-E<gt>line()
507 Just simple slots for storing information about the line and the file
508 the link was encountered in. Has to be filled in manually.
512 # The line in the file the link appears
514 return (@_ > 1) ? ($_[0]->{-line} = $_[1]) : $_[0]->{-line};
517 # The POD file name the link appears in
519 return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};
522 =item $link-E<gt>page()
524 This method sets or returns the POD page this link points to.
528 # The POD page the link appears on
531 $_[0]->{-page} = $_[1];
532 $_[0]->_construct_text();
534 return $_[0]->{-page};
537 =item $link-E<gt>node()
539 As above, but the destination node text of the link.
543 # The link destination
546 $_[0]->{-node} = $_[1];
547 $_[0]->_construct_text();
549 return $_[0]->{-node};
552 =item $link-E<gt>alttext()
554 Sets or returns an alternative text specified in the link.
558 # Potential alternative text
561 $_[0]->{-alttext} = $_[1];
562 $_[0]->_construct_text();
564 return $_[0]->{-alttext};
567 =item $link-E<gt>type()
569 The node type, either C<section> or C<item>. As an unofficial type,
570 there is also C<hyperlink>, derived from e.g. C<LE<lt>http://perl.comE<gt>>
574 # The type: item or headn
576 return (@_ > 1) ? ($_[0]->{-type} = $_[1]) : $_[0]->{-type};
579 =item $link-E<gt>link()
581 Returns the link as contents of C<LE<lt>E<gt>>. Reciprocal to B<parse()>.
590 my $link = $self->page() || '';
592 my $node = $self->node();
593 $node =~ s/\|/E<verbar>/g;
594 $node =~ s{/}{E<sol>}g;
595 if($self->type() eq 'section') {
596 $link .= ($link ? '/' : '') . '"' . $node . '"';
598 elsif($self->type() eq 'hyperlink') {
599 $link = $self->node();
602 $link .= '/' . $node;
605 if($self->alttext()) {
606 my $text = $self->alttext();
607 $text =~ s/\|/E<verbar>/g;
608 $text =~ s{/}{E<sol>}g;
609 $link = "$text|$link";
617 #eval { die "$msg\n" };
619 $@ = $msg; # this seems to work, too!
623 #-----------------------------------------------------------------------------
626 # class to hold POD page details
627 #-----------------------------------------------------------------------------
633 B<Pod::Cache> holds information about a set of POD documents,
634 especially the nodes for hyperlinks.
635 The following methods are available:
639 =item Pod::Cache-E<gt>new()
641 Create a new cache object. This object can hold an arbitrary number of
642 POD documents of class Pod::Cache::Item.
648 my $class = ref($this) || $this;
654 =item $cache-E<gt>item()
656 Add a new item to the cache. Without arguments, this method returns a
657 list of all cache elements.
662 my ($self,%param) = @_;
664 my $item = Pod::Cache::Item->new(%param);
673 =item $cache-E<gt>find_page($name)
675 Look for a POD document named C<$name> in the cache. Returns the
676 reference to the corresponding Pod::Cache::Item object or undef if
684 my ($self,$page) = @_;
686 if($_->page() eq $page) {
693 package Pod::Cache::Item;
695 =head2 Pod::Cache::Item
697 B<Pod::Cache::Item> holds information about individual POD documents,
698 that can be grouped in a Pod::Cache object.
699 It is intended to hold information about the hyperlink nodes of POD
701 The following methods are available:
705 =item Pod::Cache::Item-E<gt>new()
713 my $class = ref($this) || $this;
715 my $self = {%params};
723 $self->{-nodes} = [] unless(defined $self->{-nodes});
726 =item $cacheitem-E<gt>page()
728 Set/retrieve the POD document name (e.g. "Pod::Parser").
734 return (@_ > 1) ? ($_[0]->{-page} = $_[1]) : $_[0]->{-page};
737 =item $cacheitem-E<gt>description()
739 Set/retrieve the POD short description as found in the C<=head1 NAME>
744 # The POD description, taken out of NAME if present
746 return (@_ > 1) ? ($_[0]->{-description} = $_[1]) : $_[0]->{-description};
749 =item $cacheitem-E<gt>path()
751 Set/retrieve the POD file storage path.
757 return (@_ > 1) ? ($_[0]->{-path} = $_[1]) : $_[0]->{-path};
760 =item $cacheitem-E<gt>file()
762 Set/retrieve the POD file name.
768 return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};
771 =item $cacheitem-E<gt>nodes()
773 Add a node (or a list of nodes) to the document's node list. Note that
774 the order is kept, i.e. start with the first node and end with the last.
775 If no argument is given, the current list of nodes is returned in the
776 same order the nodes have been added.
777 A node can be any scalar, but usually is a pair of node string and
778 unique id for the C<find_node> method to work correctly.
784 my ($self,@nodes) = @_;
786 push(@{$self->{-nodes}}, @nodes);
790 return @{$self->{-nodes}};
794 =item $cacheitem-E<gt>find_node($name)
796 Look for a node or index entry named C<$name> in the object.
797 Returns the unique id of the node (i.e. the second element of the array
798 stored in the node array) or undef if not found.
803 my ($self,$node) = @_;
805 push(@search, @{$self->{-nodes}}) if($self->{-nodes});
806 push(@search, @{$self->{-idx}}) if($self->{-idx});
808 if($_->[0] eq $node) {
815 =item $cacheitem-E<gt>idx()
817 Add an index entry (or a list of them) to the document's index list. Note that
818 the order is kept, i.e. start with the first node and end with the last.
819 If no argument is given, the current list of index entries is returned in the
820 same order the entries have been added.
821 An index entry can be any scalar, but usually is a pair of string and
828 # The POD index entries
830 my ($self,@idx) = @_;
832 push(@{$self->{-idx}}, @idx);
836 return @{$self->{-idx}};
842 Please report bugs using L<http://rt.cpan.org>.
844 Marek Rouchal E<lt>marekr@cpan.orgE<gt>, borrowing
845 a lot of things from L<pod2man> and L<pod2roff> as well as other POD
846 processing tools by Tom Christiansen, Brad Appleton and Russ Allbery.
850 L<pod2man>, L<pod2roff>, L<Pod::Parser>, L<Pod::Checker>,