1 package Text::ParseWords;
3 use vars qw($VERSION @ISA @EXPORT $PERL_SINGLE_QUOTE);
10 @EXPORT = qw(shellwords quotewords nested_quotewords parse_line);
11 @EXPORT_OK = qw(old_shellwords);
16 $lines[$#lines] =~ s/\s+$//;
17 return(quotewords('\s+', 0, @lines));
23 my($delim, $keep, @lines) = @_;
24 my($line, @words, @allwords);
27 foreach $line (@lines) {
28 @words = parse_line($delim, $keep, $line);
29 return() unless (@words || !length($line));
30 push(@allwords, @words);
37 sub nested_quotewords {
38 my($delim, $keep, @lines) = @_;
41 for ($i = 0; $i < @lines; $i++) {
42 @{$allwords[$i]} = parse_line($delim, $keep, $lines[$i]);
43 return() unless (@{$allwords[$i]} || !length($lines[$i]));
51 # We will be testing undef strings
53 use re 'taint'; # if it's tainted, leave it as such
55 my($delimiter, $keep, $line) = @_;
58 while (length($line)) {
59 $line =~ s/^(["']) # a $quote
60 ((?:\\.|(?!\1)[^\\])*) # and $quoted text
61 \1 # followed by the same quote
63 ^((?:\\.|[^\\"'])*?) # an $unquoted text
64 (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
65 # plus EOL, delimiter, or quote
66 //xs; # extended layout
67 my($quote, $quoted, $unquoted, $delim) = ($1, $2, $3, $4);
68 return() unless( defined($quote) || length($unquoted) || length($delim));
71 $quoted = "$quote$quoted$quote";
74 $unquoted =~ s/\\(.)/$1/sg;
76 $quoted =~ s/\\(.)/$1/sg if ($quote eq '"');
77 $quoted =~ s/\\([\\'])/$1/g if ( $PERL_SINGLE_QUOTE && $quote eq "'");
80 $word .= defined $quote ? $quoted : $unquoted;
84 push(@pieces, $delim) if ($keep eq 'delimiters');
100 # @words = old_shellwords($line);
102 # @words = old_shellwords(@lines);
104 local($_) = join('', @_);
105 my(@words,$snippet,$field);
111 if (s/^"(([^"\\]|\\.)*)"//) {
112 ($snippet = $1) =~ s#\\(.)#$1#g;
117 elsif (s/^'(([^'\\]|\\.)*)'//) {
118 ($snippet = $1) =~ s#\\(.)#$1#g;
126 elsif (s/^([^\s\\'"]+)//) {
135 push(@words, $field);
146 Text::ParseWords - parse text into an array of tokens or array of arrays
150 use Text::ParseWords;
151 @lists = &nested_quotewords($delim, $keep, @lines);
152 @words = &quotewords($delim, $keep, @lines);
153 @words = &shellwords(@lines);
154 @words = &parse_line($delim, $keep, $line);
155 @words = &old_shellwords(@lines); # DEPRECATED!
159 The &nested_quotewords() and &quotewords() functions accept a delimiter
160 (which can be a regular expression)
161 and a list of lines and then break those lines up into a list of
162 words ignoring delimiters that appear inside quotes. &quotewords()
163 returns all of the tokens in a single long list, while &nested_quotewords()
164 returns a list of token lists corresponding to the elements of @lines.
165 &parse_line() does tokenizing on a single string. The &*quotewords()
166 functions simply call &parse_line(), so if you're only splitting
167 one line you can call &parse_line() directly and save a function call.
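The difference between the flat and nested return shapes can be sketched as follows. This is a minimal illustration, not part of the module: the input lines and the variable names (@flat, @nested, @single) are made up for the example.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords nested_quotewords parse_line);

# Two hypothetical comma-separated input lines; the quoted field
# contains a delimiter that must not split the token.
my @lines = ('a,"b,c",d', 'e,f');

# quotewords() returns every token from every line in one flat list.
my @flat = quotewords(',', 0, @lines);

# nested_quotewords() returns one array reference per input line.
my @nested = nested_quotewords(',', 0, @lines);

# parse_line() handles exactly one string.
my @single = parse_line(',', 0, $lines[0]);

print scalar(@flat), " tokens in the flat list\n";
print scalar(@nested), " token lists in the nested result\n";
print "second token of the first line: $nested[0][1]\n";
```

Here the flat list holds five tokens, while the nested result holds two lists of three and two tokens respectively.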
170 The $keep argument is a boolean flag. If true, then the tokens are
171 split on the specified delimiter, but all other characters (quotes,
172 backslashes, etc.) are kept in the tokens. If $keep is false then the
173 &*quotewords() functions remove all quotes and backslashes that are
174 not themselves backslash-escaped or inside of single quotes (i.e.,
175 &quotewords() tries to interpret these characters just like the Bourne
176 shell). NB: these semantics are significantly different from the
177 original version of this module shipped with Perl 5.000 through 5.004.
178 As an additional feature, $keep may be the keyword "delimiters" which
179 causes the functions to preserve the delimiters in each string as
180 tokens in the token lists, in addition to preserving quote and
181 backslash characters.
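As a rough illustration of the three $keep modes described above, the sketch below parses the same string three times. The sample string and the variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical sample: one quoted phrase and one backslash-escaped space.
my $line = q{alpha "beta gamma" delta\ epsilon};

# $keep false: quotes and backslashes are removed from the tokens.
my @stripped = quotewords('\s+', 0, $line);

# $keep true: tokens are split on the delimiter, but quotes and
# backslashes stay in place.
my @kept = quotewords('\s+', 1, $line);

# $keep eq 'delimiters': as above, and each delimiter match is also
# returned as a token of its own.
my @with_delims = quotewords('\s+', 'delimiters', $line);

print join('|', @stripped),    "\n";   # alpha|beta gamma|delta epsilon
print join('|', @kept),        "\n";   # alpha|"beta gamma"|delta\ epsilon
print join('|', @with_delims), "\n";   # alpha| |"beta gamma"| |delta\ epsilon
```

Joining on "|" makes the token boundaries visible, including the single-space delimiter tokens in the last case.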
183 &shellwords() is written as a special case of &quotewords(), and it
184 does token parsing with whitespace as a delimiter-- similar to most
191 use Text::ParseWords;
192 @words = &quotewords('\s+', 0, q{this is "a test" of\ quotewords \"for you});
218 multiple spaces are skipped because of our $delim
222 use of quotes to include a space in a word
226 use of a backslash to include a space in a word
230 use of a backslash to remove the special meaning of a double-quote
234 another simple word (note the lack of effect of the
235 backslashed double-quote)
239 Replacing C<&quotewords('\s+', 0, q{this is...})>
240 with C<&shellwords(q{this is...})>
241 is a simpler way to accomplish the same thing.
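A small check of that equivalence might look like the sketch below. It assumes an input string with no leading or trailing whitespace; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords shellwords);

# Sample input with repeated spaces, a quoted phrase, an escaped space
# and an escaped double quote.
my $text = q{this   is "a test" of\ quotewords \"for you};

my @via_quotewords = quotewords('\s+', 0, $text);
my @via_shellwords = shellwords($text);

# Both calls should yield the same six tokens.
print join('|', @via_quotewords), "\n";
print join('|', @via_shellwords), "\n";
```

For this input, both lists contain the tokens: this, is, a test, of quotewords, "for, you.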
245 Maintainer is Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original
246 author unknown). Much of the code for &parse_line() (including the
247 primary regexp) from Joerk Behrends <jbehrends@multimediaproduzenten.de>.
249 Examples section and other documentation provided by John Heidemann
252 Bug reports, patches, and nagging provided by lots of folks-- thanks
253 everybody! Special thanks to Michael Schwern <schwern@envirolink.org>
254 for assuring me that a &nested_quotewords() would be useful, and to
255 Jeff Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
256 error-checking (sort of-- you had to be there).