package Text::ParseWords;
*AUTOLOAD = \&AutoLoader::AUTOLOAD;
@EXPORT = qw(shellwords quotewords);
@EXPORT_OK = qw(old_shellwords);
Text::ParseWords - parse text into an array of tokens

    @words = &quotewords($delim, $keep, @lines);
    @words = &shellwords(@lines);
    @words = &old_shellwords(@lines);
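
Here is a minimal usage sketch showing the interface with concrete input. It assumes the Text::ParseWords module bundled with current Perl releases, which still exports quotewords() and shellwords() with the argument order shown above. Its internals differ from the 1994 code in this file, so read the results as illustrating the interface rather than this exact implementation.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords quotewords);

# Shell-style splitting: whitespace separates words, quotes group them.
my @shell = shellwords(q{ls -l "My Documents" 'old files'});
print join('|', @shell), "\n";     # prints: ls|-l|My Documents|old files

# Splitting on a literal comma; the comma inside the quotes is not a delimiter.
my @fields = quotewords(',', 0, q{x,"y,z",w});
print join('|', @fields), "\n";    # prints: x|y,z|w
```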

&quotewords() accepts a delimiter (which can be a regular expression)
and a list of lines, then breaks those lines up into a list of
words, ignoring delimiters that appear inside quotes.
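
Because the delimiter is a regular expression, it can also absorb surrounding whitespace. A small sketch, again assuming the Text::ParseWords bundled with current Perl rather than this exact file:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Delimiter: a comma with optional whitespace on either side.
# Passed as a single-quoted string so that \s reaches the regexp engine intact.
my @words = quotewords('\s*,\s*', 0, q{a , "b, c" ,d});
print join('|', @words), "\n";    # prints: a|b, c|d
```

The comma inside "b, c" is protected by the quotes, so that field survives intact while the padded commas outside the quotes are consumed as delimiters.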

The $keep argument is a boolean flag. If true, the quotes are kept
with each word; otherwise, quotes are stripped during splitting.
$keep also determines whether unprotected backslashes are retained.
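
The effect of $keep is easiest to see by splitting the same line twice. This sketch makes the same assumption (the Text::ParseWords shipped with current Perl); for this input its behavior matches the description above.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# q{} keeps the backslash, so the string really contains a\ b.
my $line = q{say "hi there" a\ b};

# $keep false: quotes are removed and the backslash escape is resolved.
my @stripped = quotewords('\s+', 0, $line);    # ('say', 'hi there', 'a b')

# $keep true: the quotes and the backslash stay in the words.
my @kept = quotewords('\s+', 1, $line);        # ('say', '"hi there"', 'a\ b')

print join('|', @stripped), "\n";
print join('|', @kept), "\n";
```

In both calls the backslash protects the space, so a\ b comes back as a single word; $keep only decides whether the backslash is still visible in the result.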

A &shellwords() replacement is included to demonstrate the new package.
This version differs from the original in that it will _NOT_ default
to using $_ if no arguments are given. I personally find the old behavior

&quotewords() works by simply jamming all of @lines into a single
string in $_ and then pulling off words a bit at a time until $_

Hal Pomeranz (pomeranz@netcom.com), 23 March 1994

Basically an update and generalization of the old shellwords.pl.
Much code shamelessly stolen from the old version (author unknown).
    $lines[$#lines] =~ s/\s+$//;
    &quotewords('\s+', 0, @lines);
# The inner "for" loop builds up each word (or $field) one $snippet
# at a time. A $snippet is a quoted string, a backslashed character,
# or an unquoted string. We fall out of the "for" loop when we reach
# the end of $_ or when we hit a delimiter. Falling out of the "for"
# loop, we push the $field we've been building up onto the list of
# @words we'll be returning, and then loop back and pull another word

# The first two cases inside the "for" loop deal with quoted strings.
# The first case matches a double-quoted string, removes it from $_,
# and assigns the double-quoted string to $snippet in the body of the
# conditional. The second case handles single-quoted strings. In
# the third case we've found a quote at the current beginning of $_,
# but it didn't match the quoted-string regexps in the first two cases,
# so it must be an unbalanced quote and we croak with an error (which can
# be caught by eval()).

# The next case handles backslashed characters, and the case after that
# is the exit case: reaching the end of the string or finding a delimiter.

# Otherwise, we've found an unquoted thing and we pull off characters one
# at a time until we reach something that could start another $snippet:
# a quote of some sort, a backslash, or the delimiter. This one-character-at-a-time
# behavior was necessary if the delimiter was going to be a
# regexp (love to hear it if you can figure out a better way).
    my ($delim, $keep, @lines) = @_;
    my (@words, $snippet, $field);

    local $_ = join('', @lines);

            if (s/^"([^"\\]*(\\.[^"\\]*)*)"//) {
                $snippet = qq|"$snippet"| if $keep;
            elsif (s/^'([^'\\]*(\\.[^'\\]*)*)'//) {
                $snippet = "'$snippet'" if $keep;
                croak 'Unmatched quote';
                $snippet = "\\$snippet" if $keep;
            elsif (!length || s/^$delim//) {
                while (length && !(/^$delim/ || /^['"\\]/)) {
                    $snippet .= substr($_, 0, 1);
                    substr($_, 0, 1) = '';
    # @words = old_shellwords($line);
    # @words = old_shellwords(@lines);
    local($_) = join('', @_);
    my(@words,$snippet,$field);

            if (s/^"(([^"\\]|\\.)*)"//) {
                ($snippet = $1) =~ s#\\(.)#$1#g;
                croak "Unmatched double quote: $_";
            elsif (s/^'(([^'\\]|\\.)*)'//) {
                ($snippet = $1) =~ s#\\(.)#$1#g;
                croak "Unmatched single quote: $_";
            elsif (s/^([^\s\\'"]+)//) {
        push(@words, $field);