that have character classes such as \w as either endpoint.
This change re-establishes the old behavior, in which such ranges
weren't really ranges: the "-" was literal. It also makes the old
behavior more consistent: [\w-.] and [\s-\w] worked, but [.-\w]
didn't. Now they all work as described above. Change #3926 had
outlawed all of those.
p4raw-id: //depot/cfgperl@4355
=item invalid [] range in regexp
(F) The range specified in a character class had a minimum character
-greater than the maximum character, or the range didn't start/end with
-a literal character. See L<perlre>.
+greater than the maximum character. See L<perlre>.
=item Invalid conversion in %s: "%s"
the same as matching an English word). If C<use locale> is in effect, the
list of alphabetic characters generated by C<\w> is taken from the
current locale. See L<perllocale>. You may use C<\w>, C<\W>, C<\s>, C<\S>,
-C<\d>, and C<\D> within character classes (though not as either end of
-a range). See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.
+C<\d>, and C<\D> within character classes, but if you use them as
+endpoints of a range, the result is not a range: the "-" is understood
+literally. See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.
The POSIX character class syntax
following all specify the same class of three characters: C<[-az]>,
C<[az-]>, and C<[a\-z]>. All are different from C<[a-z]>, which
specifies a class containing twenty-six characters.)
+Also, if you use the character classes C<\w>, C<\W>, C<\s>, C<\S>,
+C<\d>, or C<\D> as endpoints of a range, the result is not a range:
+the "-" is understood literally.
Note also that the whole range idea is rather unportable between
character sets--and even within character sets they may cause results
}
}
if (!SIZE_ONLY && namedclass > OOB_NAMEDCLASS) {
- if (range)
- FAIL("invalid [] range in regexp"); /* [a-\w], [a-[:word:]] */
+ if (range) {
+ ANYOF_BITMAP_SET(opnd, lastvalue);
+ ANYOF_BITMAP_SET(opnd, '-');
+ }
switch (namedclass) {
case ANYOF_ALNUM:
if (LOC)
ANYOF_FLAGS(opnd) |= ANYOF_CLASS;
continue;
}
+ if (range && namedclass > OOB_NAMEDCLASS)
+ range = 0; /* [a-\d], [a-[:digit:]], not a true range. */
if (range) {
if (lastvalue > value)
FAIL("invalid [] range in regexp"); /* [b-a] */
lastvalue = value;
if (*PL_regcomp_parse == '-' && PL_regcomp_parse+1 < PL_regxend &&
PL_regcomp_parse[1] != ']') {
- if (namedclass > OOB_NAMEDCLASS)
- FAIL("invalid [] range in regexp"); /* [\w-a] */
PL_regcomp_parse++;
range = 1;
continue; /* do it next time */
}
}
if (!SIZE_ONLY && namedclass > OOB_NAMEDCLASS) {
- if (range)
- FAIL("invalid [] range in regexp"); /* [a-\w], [a-[:word:]] */
- switch (namedclass) {
+ if (range) /* [a-\d], [a-[:digit:]] */
+ Perl_sv_catpvf(aTHX_ listsv, /* 0x002D is Unicode for '-' */
+ "%04"UVxf"\n002D\n", (UV)lastvalue);
+ switch (namedclass) {
case ANYOF_ALNUM:
Perl_sv_catpvf(aTHX_ listsv, "+utf8::IsWord\n"); break;
case ANYOF_NALNUM:
}
continue;
}
+ if (range && namedclass > OOB_NAMEDCLASS)
+ range = 0; /* [a-\d], [a-[:digit:]], not a true range. */
if (range) {
if (lastvalue > value)
FAIL("invalid [] range in regexp"); /* [b-a] */
lastvalue = value;
if (*PL_regcomp_parse == '-' && PL_regcomp_parse+1 < PL_regxend &&
PL_regcomp_parse[1] != ']') {
- if (namedclass > OOB_NAMEDCLASS)
- FAIL("invalid [] range in regexp"); /* [\w-a] */
PL_regcomp_parse++;
range = 1;
continue; /* do it next time */
.[X](.+)+[X][X] bbbbXXXaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa n - -
.[X][X](.+)+[X] bbbbXXXaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa n - -
tt+$ xxxtt y - -
-[a-\w] - c - /[a-\w]/: invalid [] range in regexp
-[\w-z] - c - /[\w-z]/: invalid [] range in regexp
-[0-[:digit:]] - c - /[0-[:digit:]]/: invalid [] range in regexp
-[[:digit:]-9] - c - /[[:digit:]-9]/: invalid [] range in regexp
+([a-\d]+) za-9z y $1 a-9
+([\d-\s]+) a0- z y $1 0-
+([\d-z]+) a0-za y $1 0-z
+([a-[:digit:]]+) za-9z y $1 a-9
+([[:digit:]-[:alpha:]]+) =0-z= y $1 0-z
+([[:digit:]-z]+) =0-z= y $1 0-z
\GX.*X aaaXbX n - -