package Text::ParseWords;
use vars qw($VERSION @ISA @EXPORT $PERL_SINGLE_QUOTE);
@EXPORT = qw(shellwords quotewords nested_quotewords parse_line);
@EXPORT_OK = qw(old_shellwords);
    foreach my $line (@lines) {
        my @words = parse_line('\s+', 0, $line);
        pop @words if (@words and !defined $words[-1]);
        return() unless (@words || !length($line));
        push(@allwords, @words);
    my($delim, $keep, @lines) = @_;
    my($line, @words, @allwords);
    foreach $line (@lines) {
        @words = parse_line($delim, $keep, $line);
        return() unless (@words || !length($line));
        push(@allwords, @words);
sub nested_quotewords {
    my($delim, $keep, @lines) = @_;
    for ($i = 0; $i < @lines; $i++) {
        @{$allwords[$i]} = parse_line($delim, $keep, $lines[$i]);
        return() unless (@{$allwords[$i]} || !length($lines[$i]));
    my($delimiter, $keep, $line) = @_;
    no warnings 'uninitialized';    # we will be testing undef strings
    while (length($line)) {
        $line =~ s/^(["'])                  # a $quote
                    ((?:\\.|(?!\1)[^\\])*)  # and $quoted text
                    \1                      # followed by the same quote
                   |                        # --OR--
                   ^((?:\\.|[^\\"'])*?)     # an $unquoted text
                    (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                            # plus EOL, delimiter, or quote
                  //xs or return;           # extended layout
        my($quote, $quoted, $unquoted, $delim) = ($1, $2, $3, $4);
        return() unless( defined($quote) || length($unquoted) || length($delim));
            $quoted = "$quote$quoted$quote";
            $unquoted =~ s/\\(.)/$1/sg;
                $quoted =~ s/\\(.)/$1/sg if ($quote eq '"');
                $quoted =~ s/\\([\\'])/$1/g if ( $PERL_SINGLE_QUOTE && $quote eq "'");
        $word .= substr($line, 0, 0);   # leave results tainted
        $word .= defined $quote ? $quoted : $unquoted;
            push(@pieces, $delim) if ($keep eq 'delimiters');
    #   @words = old_shellwords($line);
    #   @words = old_shellwords(@lines);
    #   @words = old_shellwords();      # defaults to $_ (and clobbers it)
    no warnings 'uninitialized';        # we will be testing undef strings
    local *_ = \join('', @_) if @_;
    my (@words, $snippet);
        my $field = substr($_, 0, 0);   # leave results tainted
            if (s/\A"(([^"\\]|\\.)*)"//s) {
                ($snippet = $1) =~ s#\\(.)#$1#sg;
                Carp::carp("Unmatched double quote: $_");
            elsif (s/\A'(([^'\\]|\\.)*)'//s) {
                ($snippet = $1) =~ s#\\(.)#$1#sg;
                Carp::carp("Unmatched single quote: $_");
            elsif (s/\A\\(.?)//s) {
            elsif (s/\A([^\s\\'"]+)//) {
        push(@words, $field);
Text::ParseWords - parse text into an array of tokens or array of arrays
  use Text::ParseWords;
  @lists = &nested_quotewords($delim, $keep, @lines);
  @words = &quotewords($delim, $keep, @lines);
  @words = &shellwords(@lines);
  @words = &parse_line($delim, $keep, $line);
  @words = &old_shellwords(@lines); # DEPRECATED!
The &nested_quotewords() and &quotewords() functions accept a delimiter
(which can be a regular expression)
and a list of lines, and then break those lines up into a list of
words, ignoring delimiters that appear inside quotes. &quotewords()
returns all of the tokens in a single long list, while &nested_quotewords()
returns a list of token lists corresponding to the elements of @lines.
&parse_line() tokenizes a single string. The &*quotewords()
functions simply call &parse_line(), so if you're only splitting
one line you can call &parse_line() directly and save a function
call.
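
The difference between the three entry points can be sketched as follows. This is a minimal illustration, assuming Text::ParseWords is installed (it ships with core Perl); the two input records and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords nested_quotewords parse_line);

# Hypothetical sample input: two comma-separated records.
my @lines = ('a,"b,c"', 'd,e');

# quotewords() flattens the tokens from every line into one list.
my @flat = quotewords(',', 0, @lines);

# nested_quotewords() returns one array reference per input line.
my @nested = nested_quotewords(',', 0, @lines);

# parse_line() tokenizes exactly one string.
my @single = parse_line(',', 0, $lines[0]);

print join('|', @flat), "\n";                                # a|b,c|d|e
print join(' / ', map { join('|', @$_) } @nested), "\n";     # a|b,c / d|e
print join('|', @single), "\n";                              # a|b,c
```

The comma inside the quoted field is not treated as a delimiter in any of the three calls.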
The $keep argument is a boolean flag. If true, then the tokens are
split on the specified delimiter, but all other characters (quotes,
backslashes, etc.) are kept in the tokens. If $keep is false then the
&*quotewords() functions remove all quotes and backslashes that are
not themselves backslash-escaped or inside of single quotes (i.e.,
&quotewords() tries to interpret these characters just like the Bourne
shell). NB: these semantics are significantly different from the
original version of this module shipped with Perl 5.000 through 5.004.
As an additional feature, $keep may be the keyword "delimiters" which
causes the functions to preserve the delimiters in each string as
tokens in the token lists, in addition to preserving quote and
backslash characters.
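
The three settings of $keep described above can be compared side by side. The following is a small sketch under the assumption that the installed Text::ParseWords behaves as documented here; the input string is hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical input: a quoted field containing the delimiter,
# plus a backslash-escaped delimiter in the last field.
my $line = q{a,"b,c",d\,e};

my @plain  = quotewords(',', 0, $line);             # quotes and backslashes removed
my @kept   = quotewords(',', 1, $line);             # quotes and backslashes kept
my @delims = quotewords(',', 'delimiters', $line);  # delimiters returned as tokens too

print join('', map { "[$_]" } @plain),  "\n";   # [a][b,c][d,e]
print join('', map { "[$_]" } @kept),   "\n";   # [a]["b,c"][d\,e]
print join('', map { "[$_]" } @delims), "\n";   # [a][,]["b,c"][,][d\,e]
```

Note that 'delimiters' implies a true $keep, so the quotes and the backslash survive in the last call as well.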
&shellwords() is written as a special case of &quotewords(), and it
does token parsing with whitespace as a delimiter, similar to most
Unix shells.
  use Text::ParseWords;
  @words = &quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
multiple spaces are skipped because of our $delim
use of quotes to include a space in a word
use of a backslash to include a space in a word
use of a backslash to remove the special meaning of a double-quote
another simple word (note the lack of effect of the
backslashed double-quote)
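
To see which token each of the notes above refers to, the example call can be run and its result printed with indices. This is a sketch only; the loop and the output format are illustrative choices, not part of the module.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Same call as in the example above; note the run of spaces after "this".
my @words = quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});

# Print each token with its index, wrapped in <> so embedded spaces are visible.
for my $i (0 .. $#words) {
    print "$i: <$words[$i]>\n";
}
# 0: <this>
# 1: <is>
# 2: <a test>
# 3: <of quotewords>
# 4: <"for>
# 5: <you>
```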
Replacing C<&quotewords('\s+', 0, q{this   is...})>
with C<&shellwords(q{this   is...})>
is a simpler way to accomplish the same thing.
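
A quick way to convince yourself of this equivalence for the example string is to compare the two results directly. A minimal sketch, assuming the documented behavior; the variable names are made up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords shellwords);

my $line = q{this   is "a test" of\ quotewords \"for you};

my @via_quotewords = quotewords('\s+', 0, $line);
my @via_shellwords = shellwords($line);

# Join on a NUL byte so tokens containing spaces cannot produce a false match.
my $same = join("\0", @via_quotewords) eq join("\0", @via_shellwords);
print $same ? "same tokens\n" : "different tokens\n";
```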
Maintainer is Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original
author unknown). Much of the code for &parse_line() (including the
primary regexp) comes from Joerk Behrends <jbehrends@multimediaproduzenten.de>.

Examples section and other documentation provided by John Heidemann

Bug reports, patches, and nagging provided by lots of folks-- thanks
everybody! Special thanks to Michael Schwern <schwern@envirolink.org>
for assuring me that a &nested_quotewords() would be useful, and to
Jeff Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
error-checking (sort of-- you had to be there).