1 #############################################################################
2 # Pod/ParseUtils.pm -- helpers for POD parsing and conversion
4 # Copyright (C) 1999-2000 by Marek Rouchal. All rights reserved.
5 # This file is part of "PodParser". PodParser is free software;
6 # you can redistribute it and/or modify it under the same terms
8 #############################################################################
10 package Pod::ParseUtils;
12 use vars qw($VERSION);
13 $VERSION = 0.22; ## Current version of this package
14 require 5.005; ## requires this Perl version or later
18 Pod::ParseUtils - helpers for POD parsing and conversion
24 my $list = new Pod::List;
25 my $link = Pod::Hyperlink->new('Pod::Parser');
29 B<Pod::ParseUtils> contains a few object-oriented helper packages for
30 POD parsing and processing (i.e. in POD formatters and translators).
34 #-----------------------------------------------------------------------------
37 # class to hold POD list info (=over, =item, =back)
38 #-----------------------------------------------------------------------------
46 B<Pod::List> can be used to hold information about POD lists
47 (written as =over ... =item ... =back) for further processing.
48 The following methods are available:
52 =item Pod::List-E<gt>new()
54 Create a new list object. Properties may be specified through a hash
57 my $list = Pod::List->new({ -start => $., -indent => 4 });
59 See the individual methods/properties for details.
65 my $class = ref($this) || $this;
75 $self->{-file} ||= 'unknown';
76 $self->{-start} ||= 'unknown';
77 $self->{-indent} ||= 4; # perlpod: "should be the default"
79 $self->{-type} ||= '';
82 =item $list-E<gt>file()
84 Without argument, retrieves the file name the list is in. This must
85 have been set before by either specifying B<-file> in the B<new()>
86 method or by calling the B<file()> method with a scalar argument.
90 # The POD file name the list appears in
92 return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};
95 =item $list-E<gt>start()
97 Without argument, retrieves the line number where the list started.
98 This must have been set before by either specifying B<-start> in the
99 B<new()> method or by calling the B<start()> method with a scalar
104 # The line in the file the node appears
106 return (@_ > 1) ? ($_[0]->{-start} = $_[1]) : $_[0]->{-start};
109 =item $list-E<gt>indent()
111 Without argument, retrieves the indent level of the list as specified
112 in C<=over n>. This must have been set before by either specifying
113 B<-indent> in the B<new()> method or by calling the B<indent()> method
114 with a scalar argument.
120 return (@_ > 1) ? ($_[0]->{-indent} = $_[1]) : $_[0]->{-indent};
123 =item $list-E<gt>type()
125 Without argument, retrieves the list type, which can be an arbitrary value,
126 e.g. C<OL>, C<UL>, ... when thinking the HTML way.
127 This must have been set before by either specifying
128 B<-type> in the B<new()> method or by calling the B<type()> method
129 with a scalar argument.
133 # The type of the list (UL, OL, ...)
135 return (@_ > 1) ? ($_[0]->{-type} = $_[1]) : $_[0]->{-type};
138 =item $list-E<gt>rx()
140 Without argument, retrieves a regular expression for simplifying the
141 individual item strings once the list type has been determined. Usage:
142 E.g. when converting to HTML, one might strip the leading number in
143 an ordered list as C<E<lt>OLE<gt>> already prints numbers itself.
144 This must have been set before by either specifying
145 B<-rx> in the B<new()> method or by calling the B<rx()> method
146 with a scalar argument.
150 # The regular expression to simplify the items
152 return (@_ > 1) ? ($_[0]->{-rx} = $_[1]) : $_[0]->{-rx};
155 =item $list-E<gt>item()
157 Without argument, retrieves the array of the items in this list.
158 The items may be represented by any scalar.
159 If an argument has been given, it is pushed on the list of items.
163 # The individual =items of this list
165 my ($self,$item) = @_;
167 push(@{$self->{_items}}, $item);
171 return @{$self->{_items}};
175 =item $list-E<gt>parent()
177 Without argument, retrieves information about the parent holding this
178 list, which is represented as an arbitrary scalar.
179 This must have been set before by either specifying
180 B<-parent> in the B<new()> method or by calling the B<parent()> method
181 with a scalar argument.
185 # possibility for parsers/translators to store information about the
186 # lists's parent object
188 return (@_ > 1) ? ($_[0]->{-parent} = $_[1]) : $_[0]->{-parent};
191 =item $list-E<gt>tag()
193 Without argument, retrieves information about the list tag, which can be
195 This must have been set before by either specifying
196 B<-tag> in the B<new()> method or by calling the B<tag()> method
197 with a scalar argument.
203 # possibility for parsers/translators to store information about the
206 return (@_ > 1) ? ($_[0]->{-tag} = $_[1]) : $_[0]->{-tag};
209 #-----------------------------------------------------------------------------
212 # class to manipulate POD hyperlinks (L<>)
213 #-----------------------------------------------------------------------------
215 package Pod::Hyperlink;
217 =head2 Pod::Hyperlink
219 B<Pod::Hyperlink> is a class for manipulation of POD hyperlinks. Usage:
221 my $link = Pod::Hyperlink->new('alternative text|page/"section in page"');
223 The B<Pod::Hyperlink> class is mainly designed to parse the contents of the
224 C<LE<lt>...E<gt>> sequence, providing a simple interface for accessing the
225 different parts of a POD hyperlink for further processing. It can also be
226 used to construct hyperlinks.
230 =item Pod::Hyperlink-E<gt>new()
232 The B<new()> method can either be passed a set of key/value pairs or a single
233 scalar value, namely the contents of a C<LE<lt>...E<gt>> sequence. An object
234 of the class C<Pod::Hyperlink> is returned. The value C<undef> indicates a
235 failure, the error message is stored in C<$@>.
243 my $class = ref($this) || $this;
249 # called with a list of parameters
251 $self->_construct_text();
254 # called with L<> contents
255 return undef unless($self->parse($_[0]));
263 $self->{-line} ||= 'undef';
264 $self->{-file} ||= 'undef';
265 $self->{-page} ||= '';
266 $self->{-node} ||= '';
267 $self->{-alttext} ||= '';
268 $self->{-type} ||= 'undef';
269 $self->{_warnings} = [];
272 =item $link-E<gt>parse($string)
274 This method can be used to (re)parse a (new) hyperlink, i.e. the contents
275 of a C<LE<lt>...E<gt>> sequence. The result is stored in the current object.
276 Warnings are stored in the B<warnings> property.
277 E.g. sections like C<LE<lt>open(2)E<gt>> are deprected, as they do not point
278 to Perl documents. C<LE<lt>DBI::foo(3p)E<gt>> is wrong as well, the manpage
279 section can simply be dropped.
286 # syntax check the link and extract destination
287 my ($alttext,$page,$node,$type) = (undef,'','','');
289 $self->{_warnings} = [];
291 # collapse newlines with whitespace
292 if(s/\s*\n+\s*/ /g) {
293 $self->warning("collapsing newlines to blanks");
295 # strip leading/trailing whitespace
297 $self->warning("ignoring leading whitespace in link");
300 $self->warning("ignoring trailing whitespace in link");
303 _invalid_link("empty link");
307 ## Check for different possibilities. This is tedious and error-prone
308 # we match all possibilities (alttext, page, section/item)
309 #warn "DEBUG: link=$_\n";
312 # problem: a lot of people use (), or (1) or the like to indicate
313 # man page sections. But this collides with L<func()> that is supposed
314 # to point to an internal funtion...
315 my $page_rx = '[\w.]+(?:::[\w.]+)*(?:[(](?:\d\w*|)[)]|)';
317 if(m!^($page_rx)$!o) {
321 # alttext, page and "section"
322 elsif(m!^(.*?)\s*[|]\s*($page_rx)\s*/\s*"(.+)"$!o) {
323 ($alttext, $page, $node) = ($1, $2, $3);
327 elsif(m!^(.*?)\s*[|]\s*($page_rx)$!o) {
328 ($alttext, $page) = ($1, $2);
331 # alttext and "section"
332 elsif(m!^(.*?)\s*[|]\s*(?:/\s*|)"(.+)"$!) {
333 ($alttext, $node) = ($1,$2);
337 elsif(m!^($page_rx)\s*/\s*"(.+)"$!o) {
338 ($page, $node) = ($1, $2);
342 elsif(m!^($page_rx)\s*/\s*(.+)$!o) {
343 ($page, $node) = ($1, $2);
347 elsif(m!^/?"(.+)"$!) {
352 elsif(m!^\s*/(.+)$!) {
356 # non-standard: Hyperlink
357 elsif(m!^((?:http|ftp|mailto|news):.+)$!i) {
361 # alttext, page and item
362 elsif(m!^(.*?)\s*[|]\s*($page_rx)\s*/\s*(.+)$!o) {
363 ($alttext, $page, $node) = ($1, $2, $3);
367 elsif(m!^(.*?)\s*[|]\s*/(.+)$!) {
368 ($alttext, $node) = ($1,$2);
370 # nonstandard: alttext and hyperlink
371 elsif(m!^(.*?)\s*[|]\s*((?:http|ftp|mailto|news):.+)$!) {
372 ($alttext, $node) = ($1,$2);
375 # must be an item or a "malformed" section (without "")
380 # collapse whitespace in nodes
383 # empty alternative text expands to node name
384 if(defined $alttext) {
385 if(!length($alttext)) {
386 $alttext = $node | $page;
393 if($page =~ /[(]\w*[)]$/) {
394 $self->warning("(section) in '$page' deprecated");
396 if($node =~ m:[|/]:) {
397 $self->warning("node '$node' contains non-escaped | or /");
399 if($alttext =~ m:[|/]:) {
400 $self->warning("alternative text '$node' contains non-escaped | or /");
402 $self->{-page} = $page;
403 $self->{-node} = $node;
404 $self->{-alttext} = $alttext;
405 #warn "DEBUG: page=$page section=$section item=$item alttext=$alttext\n";
406 $self->{-type} = $type;
407 $self->_construct_text();
411 sub _construct_text {
413 my $alttext = $self->alttext();
414 my $type = $self->type();
415 my $section = $self->node();
416 my $page = $self->page();
418 $page =~ s/([(]\w*[)])$// && ($page_ext = $1);
420 $self->{_text} = $alttext;
422 elsif($type eq 'hyperlink') {
423 $self->{_text} = $section;
426 $self->{_text} = (!$section ? '' :
427 $type eq 'item' ? "the $section entry" :
428 "the section on $section" ) .
429 ($page ? ($section ? ' in ':'') . "the $page$page_ext manpage" :
430 ' elsewhere in this document');
432 # for being marked up later
433 # use the non-standard markers P<> and Q<>, so that the resulting
434 # text can be parsed by the translators. It's their job to put
435 # the correct hypertext around the linktext
437 $self->{_markup} = "Q<$alttext>";
439 elsif($type eq 'hyperlink') {
440 $self->{_markup} = "Q<$section>";
443 $self->{_markup} = (!$section ? '' :
444 $type eq 'item' ? "the Q<$section> entry" :
445 "the section on Q<$section>" ) .
446 ($page ? ($section ? ' in ':'') . "the P<$page>$page_ext manpage" :
447 ' elsewhere in this document');
451 =item $link-E<gt>markup($string)
453 Set/retrieve the textual value of the link. This string contains special
454 markers C<PE<lt>E<gt>> and C<QE<lt>E<gt>> that should be expanded by the
455 translator's interior sequence expansion engine to the
456 formatter-specific code to highlight/activate the hyperlink. The details
457 have to be implemented in the translator.
461 #' retrieve/set markuped text
463 return (@_ > 1) ? ($_[0]->{_markup} = $_[1]) : $_[0]->{_markup};
466 =item $link-E<gt>text()
468 This method returns the textual representation of the hyperlink as above,
469 but without markers (read only). Depending on the link type this is one of
470 the following alternatives (the + and * denote the portions of the text
474 the *$|* entry in the +perlvar+ manpage
475 the section on *OPTIONS* in the +perldoc+ manpage
476 the section on *DESCRIPTION* elsewhere in this document
480 # The complete link's text
485 =item $link-E<gt>warning()
487 After parsing, this method returns any warnings encountered during the
492 # Set/retrieve warnings
496 push(@{$self->{_warnings}}, @_);
499 return @{$self->{_warnings}};
502 =item $link-E<gt>file()
504 =item $link-E<gt>line()
506 Just simple slots for storing information about the line and the file
507 the link was encountered in. Has to be filled in manually.
511 # The line in the file the link appears
513 return (@_ > 1) ? ($_[0]->{-line} = $_[1]) : $_[0]->{-line};
516 # The POD file name the link appears in
518 return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};
521 =item $link-E<gt>page()
523 This method sets or returns the POD page this link points to.
527 # The POD page the link appears on
530 $_[0]->{-page} = $_[1];
531 $_[0]->_construct_text();
536 =item $link-E<gt>node()
538 As above, but the destination node text of the link.
542 # The link destination
545 $_[0]->{-node} = $_[1];
546 $_[0]->_construct_text();
551 =item $link-E<gt>alttext()
553 Sets or returns an alternative text specified in the link.
557 # Potential alternative text
560 $_[0]->{-alttext} = $_[1];
561 $_[0]->_construct_text();
566 =item $link-E<gt>type()
568 The node type, either C<section> or C<item>. As an unofficial type,
569 there is also C<hyperlink>, derived from e.g. C<LE<lt>http://perl.comE<gt>>
573 # The type: item or headn
575 return (@_ > 1) ? ($_[0]->{-type} = $_[1]) : $_[0]->{-type};
578 =item $link-E<gt>link()
580 Returns the link as contents of C<LE<lt>E<gt>>. Reciprocal to B<parse()>.
589 my $link = $self->page() || '';
591 my $node = $self->node();
592 $text =~ s/\|/E<verbar>/g;
593 $text =~ s:/:E<sol>:g;
594 if($self->type() eq 'section') {
595 $link .= ($link ? '/' : '') . '"' . $node . '"';
597 elsif($self->type() eq 'hyperlink') {
598 $link = $self->node();
601 $link .= '/' . $node;
604 if($self->alttext()) {
605 my $text = $self->alttext();
606 $text =~ s/\|/E<verbar>/g;
607 $text =~ s:/:E<sol>:g;
608 $link = "$text|$link";
616 #eval { die "$msg\n" };
618 $@ = $msg; # this seems to work, too!
622 #-----------------------------------------------------------------------------
625 # class to hold POD page details
626 #-----------------------------------------------------------------------------
632 B<Pod::Cache> holds information about a set of POD documents,
633 especially the nodes for hyperlinks.
634 The following methods are available:
638 =item Pod::Cache-E<gt>new()
640 Create a new cache object. This object can hold an arbitrary number of
641 POD documents of class Pod::Cache::Item.
647 my $class = ref($this) || $this;
653 =item $cache-E<gt>item()
655 Add a new item to the cache. Without arguments, this method returns a
656 list of all cache elements.
661 my ($self,%param) = @_;
663 my $item = Pod::Cache::Item->new(%param);
672 =item $cache-E<gt>find_page($name)
674 Look for a POD document named C<$name> in the cache. Returns the
675 reference to the corresponding Pod::Cache::Item object or undef if
683 my ($self,$page) = @_;
685 if($_->page() eq $page) {
692 package Pod::Cache::Item;
694 =head2 Pod::Cache::Item
696 B<Pod::Cache::Item> holds information about individual POD documents,
697 that can be grouped in a Pod::Cache object.
698 It is intended to hold information about the hyperlink nodes of POD
700 The following methods are available:
704 =item Pod::Cache::Item-E<gt>new()
712 my $class = ref($this) || $this;
714 my $self = {%params};
722 $self->{-nodes} = [] unless(defined $self->{-nodes});
725 =item $cacheitem-E<gt>page()
727 Set/retrieve the POD document name (e.g. "Pod::Parser").
733 return (@_ > 1) ? ($_[0]->{-page} = $_[1]) : $_[0]->{-page};
736 =item $cacheitem-E<gt>description()
738 Set/retrieve the POD short description as found in the C<=head1 NAME>
743 # The POD description, taken out of NAME if present
745 return (@_ > 1) ? ($_[0]->{-description} = $_[1]) : $_[0]->{-description};
748 =item $cacheitem-E<gt>path()
750 Set/retrieve the POD file storage path.
756 return (@_ > 1) ? ($_[0]->{-path} = $_[1]) : $_[0]->{-path};
759 =item $cacheitem-E<gt>file()
761 Set/retrieve the POD file name.
767 return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};
770 =item $cacheitem-E<gt>nodes()
772 Add a node (or a list of nodes) to the document's node list. Note that
773 the order is kept, i.e. start with the first node and end with the last.
774 If no argument is given, the current list of nodes is returned in the
775 same order the nodes have been added.
776 A node can be any scalar, but usually is a pair of node string and
777 unique id for the C<find_node> method to work correctly.
783 my ($self,@nodes) = @_;
785 push(@{$self->{-nodes}}, @nodes);
789 return @{$self->{-nodes}};
793 =item $cacheitem-E<gt>find_node($name)
795 Look for a node or index entry named C<$name> in the object.
796 Returns the unique id of the node (i.e. the second element of the array
797 stored in the node arry) or undef if not found.
802 my ($self,$node) = @_;
804 push(@search, @{$self->{-nodes}}) if($self->{-nodes});
805 push(@search, @{$self->{-idx}}) if($self->{-idx});
807 if($_->[0] eq $node) {
814 =item $cacheitem-E<gt>idx()
816 Add an index entry (or a list of them) to the document's index list. Note that
817 the order is kept, i.e. start with the first node and end with the last.
818 If no argument is given, the current list of index entries is returned in the
819 same order the entries have been added.
820 An index entry can be any scalar, but usually is a pair of string and
827 # The POD index entries
829 my ($self,@idx) = @_;
831 push(@{$self->{-idx}}, @idx);
835 return @{$self->{-idx}};
841 Marek Rouchal E<lt>marek@saftsack.fs.uni-bayreuth.deE<gt>, borrowing
842 a lot of things from L<pod2man> and L<pod2roff> as well as other POD
843 processing tools by Tom Christiansen, Brad Appleton and Russ Allbery.
847 L<pod2man>, L<pod2roff>, L<Pod::Parser>, L<Pod::Checker>,