package Text::ParseWords;
use vars qw($VERSION @ISA @EXPORT $PERL_SINGLE_QUOTE);
@EXPORT = qw(shellwords quotewords nested_quotewords parse_line);
@EXPORT_OK = qw(old_shellwords);
$lines[$#lines] =~ s/\s+$//;
return(quotewords('\s+', 0, @lines));
my($delim, $keep, @lines) = @_;
my($line, @words, @allwords);
foreach $line (@lines) {
@words = parse_line($delim, $keep, $line);
return() unless (@words || !length($line));
push(@allwords, @words);
sub nested_quotewords {
my($delim, $keep, @lines) = @_;
for ($i = 0; $i < @lines; $i++) {
@{$allwords[$i]} = parse_line($delim, $keep, $lines[$i]);
return() unless (@{$allwords[$i]} || !length($lines[$i]));
# We will be testing undef strings
my($delimiter, $keep, $line) = @_;
my($quote, $quoted, $unquoted, $delim, $word, @pieces);
while (length($line)) {
($quote, $quoted, undef, $unquoted, $delim, undef) =
$line =~ m/^(["']) # a $quote
((?:\\.|(?!\1)[^\\])*) # and $quoted text
\1 # followed by the same quote
([\000-\377]*) # and the rest
^((?:\\.|[^\\"'])*?) # an $unquoted text
(\Z(?!\n)|$delimiter|(?!^)(?=["']))
# plus EOL, delimiter, or quote
([\000-\377]*) # the rest
return() unless( $quote || length($unquoted) || length($delim));
$quoted = "$quote$quoted$quote";
$unquoted =~ s/\\(.)/$1/g;
$quoted =~ s/\\(.)/$1/g if ($quote eq '"');
$quoted =~ s/\\([\\'])/$1/g if ( $PERL_SINGLE_QUOTE && $quote eq "'");
$word .= ($quote) ? $quoted : $unquoted;
push(@pieces, $delim) if ($keep eq 'delimiters');
# @words = old_shellwords($line);
# @words = old_shellwords(@lines);
local($_) = join('', @_);
my(@words,$snippet,$field);
if (s/^"(([^"\\]|\\.)*)"//) {
($snippet = $1) =~ s#\\(.)#$1#g;
elsif (s/^'(([^'\\]|\\.)*)'//) {
($snippet = $1) =~ s#\\(.)#$1#g;
elsif (s/^([^\s\\'"]+)//) {
push(@words, $field);
Text::ParseWords - parse text into an array of tokens or array of arrays
  use Text::ParseWords;
  @lists = &nested_quotewords($delim, $keep, @lines);
  @words = &quotewords($delim, $keep, @lines);
  @words = &shellwords(@lines);
  @words = &parse_line($delim, $keep, $line);
  @words = &old_shellwords(@lines); # DEPRECATED!
The &nested_quotewords() and &quotewords() functions accept a delimiter
(which can be a regular expression)
and a list of lines, and then break those lines up into a list of
words, ignoring delimiters that appear inside quotes. &quotewords()
returns all of the tokens in a single long list, while &nested_quotewords()
returns a list of token lists corresponding to the elements of @lines.
&parse_line() tokenizes a single string. The &*quotewords()
functions simply call &parse_line(), so if you're only splitting
one line you can call &parse_line() directly and save a function
call.
The $keep argument is a boolean flag. If true, then the tokens are
split on the specified delimiter, but all other characters (quotes,
backslashes, etc.) are kept in the tokens. If $keep is false then the
&*quotewords() functions remove all quotes and backslashes that are
not themselves backslash-escaped or inside of single quotes (i.e.,
&quotewords() tries to interpret these characters just like the Bourne
shell). NB: these semantics are significantly different from the
original version of this module shipped with Perl 5.000 through 5.004.
As an additional feature, $keep may be the keyword "delimiters" which
causes the functions to preserve the delimiters in each string as
tokens in the token lists, in addition to preserving quote and
backslash characters.
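
As a rough illustration of the three $keep modes described above, the
sketch below runs &parse_line() on one sample string. The sample string
and the variable names are made up for this example; only &parse_line()
and its arguments come from the module.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical input: a quoted phrase and a backslash-escaped space.
my $line = q{one "two three" four\ five};

# $keep false: quotes and the escaping backslash are stripped.
my @plain = parse_line('\s+', 0, $line);

# $keep true: split on the delimiter only; quotes and backslashes stay.
my @kept = parse_line('\s+', 1, $line);

# $keep eq 'delimiters': as above, and each matched delimiter is a token too.
my @with_delims = parse_line('\s+', 'delimiters', $line);

print join('|', @plain), "\n";        # one|two three|four five
print join('|', @kept), "\n";         # one|"two three"|four\ five
print join('|', @with_delims), "\n";  # one| |"two three"| |four\ five
```

Joining with '|' is only there to make the token boundaries visible in
the printed output.
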
&shellwords() is written as a special case of &quotewords(), and it
does token parsing with whitespace as a delimiter-- similar to most
  use Text::ParseWords;
  @words = &quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
multiple spaces are skipped because of our $delim
use of quotes to include a space in a word
use of a backslash to include a space in a word
use of a backslash to remove the special meaning of a double-quote
another simple word (note the lack of effect of the
backslashed double-quote)
Replacing C<&quotewords('\s+', 0, q{this is...})>
with C<&shellwords(q{this is...})>
is a simpler way to accomplish the same thing.
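
A minimal sketch of that equivalence, using the sample string from this
section (the variable names are made up for the example). It assumes the
input has no leading or trailing whitespace, which is where the two calls
could otherwise differ.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords shellwords);

# Sample string from the example above; note the run of spaces after "this".
my $text = q{this   is "a test" of\ quotewords \"for you};

my @via_quotewords = quotewords('\s+', 0, $text);
my @via_shellwords = shellwords($text);

# Both calls should yield the same six tokens.
print join('|', @via_shellwords), "\n";  # this|is|a test|of quotewords|"for|you
print "same result\n"
    if join("\0", @via_quotewords) eq join("\0", @via_shellwords);
```
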
Maintainer is Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original
author unknown). Much of the code for &parse_line() (including the
primary regexp) from Joerk Behrends <jbehrends@multimediaproduzenten.de>.
Examples section and other documentation provided by John Heidemann
Bug reports, patches, and nagging provided by lots of folks-- thanks
everybody! Special thanks to Michael Schwern <schwern@envirolink.org>
for assuring me that a &nested_quotewords() would be useful, and to
Jeff Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
error-checking (sort of-- you had to be there).