package Text::ParseWords;
use vars qw($VERSION @ISA @EXPORT $PERL_SINGLE_QUOTE);
@EXPORT = qw(shellwords quotewords nested_quotewords parse_line);
@EXPORT_OK = qw(old_shellwords);
    $lines[$#lines] =~ s/\s+$//;
    return(quotewords('\s+', 0, @lines));
    my($delim, $keep, @lines) = @_;
    my($line, @words, @allwords);
    foreach $line (@lines) {
        @words = parse_line($delim, $keep, $line);
        return() unless (@words || !length($line));
        push(@allwords, @words);
sub nested_quotewords {
    my($delim, $keep, @lines) = @_;
    for ($i = 0; $i < @lines; $i++) {
        @{$allwords[$i]} = parse_line($delim, $keep, $lines[$i]);
        return() unless (@{$allwords[$i]} || !length($lines[$i]));
    # We will be testing undef strings
    use re 'taint'; # if it's tainted, leave it as such
    my($delimiter, $keep, $line) = @_;
    my($quote, $quoted, $unquoted, $delim, $word, @pieces);
    while (length($line)) {
        ($quote, $quoted, undef, $unquoted, $delim, undef) =
            $line =~ m/^(["'])                 # a $quote
                       ((?:\\.|(?!\1)[^\\])*)  # and $quoted text
                       \1                      # followed by the same quote
                       ([\000-\377]*)          # and the rest
                      |                        # --OR--
                       ^((?:\\.|[^\\"'])*?)    # an $unquoted text
                       (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                               # plus EOL, delimiter, or quote
                       ([\000-\377]*)          # the rest
                      /x;                      # extended layout
        return() unless( $quote || length($unquoted) || length($delim));
            $quoted = "$quote$quoted$quote";
            $unquoted =~ s/\\(.)/$1/g;
                $quoted =~ s/\\(.)/$1/g if ($quote eq '"');
                $quoted =~ s/\\([\\'])/$1/g if ( $PERL_SINGLE_QUOTE && $quote eq "'");
        $word .= defined $quote ? $quoted : $unquoted;
            push(@pieces, $delim) if ($keep eq 'delimiters');
    #   @words = old_shellwords($line);
    #   or
    #   @words = old_shellwords(@lines);
    local($_) = join('', @_);
    my(@words,$snippet,$field);
            if (s/^"(([^"\\]|\\.)*)"//) {
                ($snippet = $1) =~ s#\\(.)#$1#g;
            elsif (s/^'(([^'\\]|\\.)*)'//) {
                ($snippet = $1) =~ s#\\(.)#$1#g;
            elsif (s/^([^\s\\'"]+)//) {
        push(@words, $field);
Text::ParseWords - parse text into an array of tokens or array of arrays
  use Text::ParseWords;
  @lists = &nested_quotewords($delim, $keep, @lines);
  @words = &quotewords($delim, $keep, @lines);
  @words = &shellwords(@lines);
  @words = &parse_line($delim, $keep, $line);
  @words = &old_shellwords(@lines); # DEPRECATED!
The &nested_quotewords() and &quotewords() functions accept a delimiter
(which can be a regular expression)
and a list of lines, and then break those lines up into a list of
words, ignoring delimiters that appear inside quotes. &quotewords()
returns all of the tokens in a single long list, while &nested_quotewords()
returns a list of token lists corresponding to the elements of @lines.
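A minimal sketch of the difference between the two functions, assuming the core Text::ParseWords module. The input lines and variable names (`@flat`, `@nested`, `@broken`) are made up for illustration. The delimiter here is the regular expression '\s*,\s*'. The last call shows what the `return() unless (...)` checks in the code above imply: an unterminated quote makes the functions return an empty list.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords nested_quotewords);

# Two hypothetical input lines; the delimiter is a comma with optional
# surrounding whitespace, passed as a regular expression string.
my @lines = (q{red , "green, blue" ,yellow}, q{cyan,magenta});

# quotewords(): one flat list of tokens taken from all lines.
my @flat = quotewords('\s*,\s*', 0, @lines);

# nested_quotewords(): one array reference of tokens per input line.
my @nested = nested_quotewords('\s*,\s*', 0, @lines);

# An unterminated quote makes parsing fail, so the result is an empty list.
my @broken = quotewords('\s+', 0, q{one "two});

print join('|', @flat), "\n";                    # red|green, blue|yellow|cyan|magenta
print join('|', map { scalar @$_ } @nested), "\n";  # 3|2
print scalar(@broken), "\n";                     # 0
```

Note that the comma inside "green, blue" does not split the token, because it sits inside double quotes.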
&parse_line() does tokenizing on a single string. The &*quotewords()
functions simply call &parse_line(), so if you're only splitting
one line you can call &parse_line() directly and save a function
call.
The $keep argument is a boolean flag. If true, then the tokens are
split on the specified delimiter, but all other characters (quotes,
backslashes, etc.) are kept in the tokens. If $keep is false then the
&*quotewords() functions remove all quotes and backslashes that are
not themselves backslash-escaped or inside of single quotes (i.e.,
&quotewords() tries to interpret these characters just like the Bourne
shell). NB: these semantics are significantly different from the
original version of this module shipped with Perl 5.000 through 5.004.
As an additional feature, $keep may be the keyword "delimiters" which
causes the functions to preserve the delimiters in each string as
tokens in the token lists, in addition to preserving quote and
backslash characters.
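The effect of $keep is easiest to see by calling &parse_line() directly on a single string. This is a sketch against the core Text::ParseWords module, and the input strings are invented for illustration. The last call sets the package variable $Text::ParseWords::PERL_SINGLE_QUOTE. The code above consults that variable so that \\ and \' are also unescaped inside single quotes.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q{"a b" c\ d};

# $keep false: quotes and unescaped backslashes are removed.
my @stripped = parse_line('\s+', 0, $line);

# $keep true: split on the delimiter but leave quotes and backslashes in place.
my @kept = parse_line('\s+', 1, $line);

# $keep eq 'delimiters': the delimiters themselves come back as tokens.
my @with_delims = parse_line(',', 'delimiters', 'x,y');

# Backslashes are interpreted inside double quotes but not single quotes.
my @mixed = parse_line('\s+', 0, q{"a\b" 'a\b'});

# With $PERL_SINGLE_QUOTE set, \' inside single quotes is unescaped as well.
my @perlish = do {
    local $Text::ParseWords::PERL_SINGLE_QUOTE = 1;
    parse_line('\s+', 0, q{'it\'s'});
};

print join('|', @stripped), "\n";     # a b|c d
print join('|', @kept), "\n";         # "a b"|c\ d
print join('|', @with_delims), "\n";  # x|,|y
print join('|', @mixed), "\n";        # ab|a\b
print join('|', @perlish), "\n";      # it's
```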
&shellwords() is written as a special case of &quotewords(), and it
does token parsing with whitespace as a delimiter, similar to most
Unix shells.
  use Text::ParseWords;
  @words = &quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
=over 4

=item 0

a simple word

=item 1

multiple spaces are skipped because of our $delim

=item 2

use of quotes to include a space in a word

=item 3

use of a backslash to include a space in a word

=item 4

use of a backslash to remove the special meaning of a double-quote

=item 5

another simple word (note the lack of effect of the
backslashed double-quote)

=back
Replacing C<&quotewords('\s+', 0, q{this is...})>
with C<&shellwords(q{this is...})>
is a simpler way to accomplish the same thing.
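Here is a runnable version of the sample program. It also checks the claim above that &shellwords() yields the same tokens. This is a sketch against the core Text::ParseWords module. The indexed print loop and the variable names are just one way to display the result.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords shellwords);

my $text = q{this   is "a test" of\ quotewords \"for you};

# Split on runs of whitespace, removing quotes and backslashes ($keep = 0).
my @words = quotewords('\s+', 0, $text);

# shellwords() uses the same whitespace delimiter with $keep = 0.
my @shell = shellwords($text);

my $i = 0;
foreach my $word (@words) {
    print "$i: <$word>\n";
    $i++;
}
# 0: <this>
# 1: <is>
# 2: <a test>
# 3: <of quotewords>
# 4: <"for>
# 5: <you>

my $same = join('|', @words) eq join('|', @shell);
print $same ? "same\n" : "different\n";   # same
```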
Maintainer is Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original
author unknown). Much of the code for &parse_line() (including the
primary regexp) came from Joerk Behrends <jbehrends@multimediaproduzenten.de>.
Examples section and other documentation provided by John Heidemann
Bug reports, patches, and nagging provided by lots of folks-- thanks
everybody! Special thanks to Michael Schwern <schwern@envirolink.org>
for assuring me that a &nested_quotewords() would be useful, and to
Jeff Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
error-checking (sort of-- you had to be there).