Michael G. Schwern [Wed, 10 Nov 1999 17:21:46 +0000 (12:21 -0500)]
To: perl5-porters@perl.org, pod-people@perl.org
Cc: tchrist@mox.perl.com, gnat@frii.com
Message-ID: <19991110172146.A23527@athens.aocn.com>
p4raw-id: //depot/cfgperl@4569
=head2 How do I remove HTML from a string?
The most correct way (albeit not the fastest) is to use HTML::Parser
-from CPAN (part of the HTML-Tree package on CPAN).
+from CPAN (part of the HTML-Tree package on CPAN). Another correct
+way is to use HTML::FormatText which not only removes HTML but also
+attempts to do a little simple formatting of the resulting plain text.
Many folks attempt a simple-minded regular expression approach, like
C<s/E<lt>.*?E<gt>//g>, but that fails in many cases because the tags