package Text::ParseWords;
use vars qw($VERSION @ISA @EXPORT @EXPORT_OK);
@EXPORT = qw(shellwords quotewords nested_quotewords parse_line);
@EXPORT_OK = qw(old_shellwords);
$lines[$#lines] =~ s/\s+$//;
return(quotewords('\s+', 0, @lines));
my($delim, $keep, @lines) = @_;
my($line, @words, @allwords);
foreach $line (@lines) {
@words = parse_line($delim, $keep, $line);
return() unless (@words || !length($line));
push(@allwords, @words);
sub nested_quotewords {
my($delim, $keep, @lines) = @_;
for ($i = 0; $i < @lines; $i++) {
@{$allwords[$i]} = parse_line($delim, $keep, $lines[$i]);
return() unless (@{$allwords[$i]} || !length($lines[$i]));
my($delimiter, $keep, $line) = @_;
my($quote, $quoted, $unquoted, $delim, $word, @pieces);
while (length($line)) {
($quote, $quoted, $unquoted, $delim) =
$line =~ m/^(["']) # a $quote
((?:\\.|(?!\1)[^\\])*?) # and $quoted text (a backreference is not allowed inside [...])
\1 # followed by the same quote
^((?:\\.|[^\\"'])*?) # an $unquoted text
(\Z(?!\n)|$delimiter|(?!^)(?=["']))
# plus EOL, delimiter, or quote
return() unless(length($&));
$quoted = "$quote$quoted$quote";
$unquoted =~ s/\\(.)/$1/g;
$quoted =~ s/\\(.)/$1/g if ($quote eq '"');
$word .= ($quote) ? $quoted : $unquoted;
push(@pieces, $delim) if ($keep eq 'delimiters');
# @words = old_shellwords($line);
# @words = old_shellwords(@lines);
local($_) = join('', @_);
my(@words,$snippet,$field);
if (s/^"(([^"\\]|\\.)*)"//) {
($snippet = $1) =~ s#\\(.)#$1#g;
elsif (s/^'(([^'\\]|\\.)*)'//) {
($snippet = $1) =~ s#\\(.)#$1#g;
elsif (s/^([^\s\\'"]+)//) {
push(@words, $field);
Text::ParseWords - parse text into an array of tokens or array of arrays
  use Text::ParseWords;
  @lists = &nested_quotewords($delim, $keep, @lines);
  @words = &quotewords($delim, $keep, @lines);
  @words = &shellwords(@lines);
  @words = &parse_line($delim, $keep, $line);
  @words = &old_shellwords(@lines); # DEPRECATED!
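In the source above, old_shellwords appears only in @EXPORT_OK, so a plain "use Text::ParseWords;" does not import it. A minimal sketch of requesting it by name, assuming a Perl installation where Text::ParseWords is available (it normally ships with Perl); the input string is an arbitrary illustration, and the output comments reflect current releases of the module.

```perl
use strict;
use warnings;
# old_shellwords() is in @EXPORT_OK, so it has to be named explicitly in
# the import list; shellwords() is named here too so both can be compared.
use Text::ParseWords qw(old_shellwords shellwords);

my @old = old_shellwords(q{a "b c" d});
my @new = shellwords(q{a "b c" d});

print join('|', @old), "\n";  # a|b c|d
print join('|', @new), "\n";  # a|b c|d
```

For simple input like this the deprecated and current functions agree; new code should prefer &shellwords().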
The &nested_quotewords() and &quotewords() functions accept a delimiter
(which can be a regular expression)
and a list of lines, and then break those lines up into a list of
words, ignoring delimiters that appear inside quotes. &quotewords()
returns all of the tokens in a single long list, while &nested_quotewords()
returns a list of token lists corresponding to the elements of @lines.
&parse_line() does tokenizing on a single string. The &*quotewords()
functions simply call &parse_line(), so if you're only splitting
one line you can call &parse_line() directly and save a function call.
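A minimal sketch contrasting the two list shapes, assuming Text::ParseWords is installed. The input lines and the delimiter regex '\s*,\s*' (a comma with optional surrounding whitespace) are illustrative choices; the quoted field "b, c" shows a delimiter being ignored inside quotes.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords nested_quotewords);

# Two input lines; the delimiter is a regex, not a fixed string.
my @lines = ('a, "b, c", d', 'e ,f');

# quotewords(): every token from every line in one flat list.
my @flat = quotewords('\s*,\s*', 0, @lines);

# nested_quotewords(): one array reference per input line.
my @nested = nested_quotewords('\s*,\s*', 0, @lines);

print join('|', @flat), "\n";            # a|b, c|d|e|f
print scalar(@nested), "\n";             # 2
print join('|', @{ $nested[0] }), "\n";  # a|b, c|d
print join('|', @{ $nested[1] }), "\n";  # e|f
```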
The $keep argument is a boolean flag. If true, then the tokens are
split on the specified delimiter, but all other characters (quotes,
backslashes, etc.) are kept in the tokens. If $keep is false then the
&*quotewords() functions remove all quotes and backslashes that are
not themselves backslash-escaped or inside of single quotes (i.e.,
&quotewords() tries to interpret these characters just like the Bourne
shell). NB: these semantics are significantly different from the
original version of this module shipped with Perl 5.000 through 5.004.
As an additional feature, $keep may be the keyword "delimiters" which
causes the functions to preserve the delimiters in each string as
tokens in the token lists, in addition to preserving quote and
backslash characters.
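The three settings of $keep can be compared on one line with &parse_line(). This is a sketch assuming Text::ParseWords is installed; the sample line is an arbitrary illustration containing a quoted phrase and a backslash-escaped space.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q{one "two three" four\ five};

# $keep false: quotes and unescaped backslashes are removed, shell-style.
my @plain = parse_line('\s+', 0, $line);

# $keep true: split on the delimiter only; quotes and backslashes stay.
my @kept = parse_line('\s+', 1, $line);

# $keep eq 'delimiters': like a true $keep, and each delimiter is also
# returned as its own element between the tokens.
my @with_delims = parse_line('\s+', 'delimiters', $line);

print join('|', @plain), "\n";        # one|two three|four five
print join('|', @kept), "\n";         # one|"two three"|four\ five
print join('|', @with_delims), "\n";  # one| |"two three"| |four\ five
```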
&shellwords() is written as a special case of &quotewords(), and it
does token parsing with whitespace as a delimiter-- similar to most
  use Text::ParseWords;
  @words = &quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
multiple spaces are skipped because of our $delim
use of quotes to include a space in a word
use of a backslash to include a space in a word
use of a backslash to remove the special meaning of a double-quote
another simple word (note the lack of effect of the
backslashed double-quote)
Replacing C<&quotewords('\s+', 0, q{this is...})>
with C<&shellwords(q{this is...})>
is a simpler way to accomplish the same thing.
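A runnable sketch of the example above, assuming Text::ParseWords is installed. It prints the token list the notes describe and checks the &shellwords() shortcut against it. It also shows one difference visible in the source: &shellwords() strips trailing whitespace from the last line before splitting, whereas with &quotewords() the trailing delimiter leaves an undefined final element. The string "foo bar  " is an arbitrary illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords quotewords);

my $text = q{this   is "a test" of\ quotewords \"for you};

# The documented example, followed by the shellwords() shortcut.
my @via_quotewords = quotewords('\s+', 0, $text);
my @via_shellwords = shellwords($text);

# Print each token with its index to compare against the notes above.
for my $i (0 .. $#via_shellwords) {
    print "$i: <$via_shellwords[$i]>\n";
}
# 0: <this>
# 1: <is>
# 2: <a test>
# 3: <of quotewords>
# 4: <"for>
# 5: <you>

# Trailing whitespace: shellwords() returns 2 tokens here, while
# quotewords() returns 3, the last one undefined.
my @trailing_q = quotewords('\s+', 0, "foo bar  ");
my @trailing_s = shellwords("foo bar  ");
print scalar(@trailing_q), " vs ", scalar(@trailing_s), "\n";  # 3 vs 2
```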
Maintainer is Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original
author unknown). Much of the code for &parse_line() (including the
primary regexp) from Joerk Behrends <jbehrends@multimediaproduzenten.de>.
Examples section and other documentation provided by John Heidemann
Bug reports, patches, and nagging provided by lots of folks-- thanks
everybody! Special thanks to Michael Schwern <schwern@envirolink.org>
for assuring me that a &nested_quotewords() would be useful, and to
Jeff Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
error-checking (sort of-- you had to be there).