From: Karl Williamson Date: Wed, 14 Apr 2010 03:25:36 +0000 (-0600) Subject: PATCH: [perl #72998] regex looping X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=7dcb3b25fc4113f0eeb68d0d3c47ccedd5ff3f2a;p=p5sagit%2Fp5-mst-13.2.git PATCH: [perl #72998] regex looping If a character folds to multiple ones in case-insensitive matching, it should not match just one of those, or the regular expression can loop. For example, \N{LATIN SMALL LIGATURE FF} folds to 'ff', and so "\N{LATIN SMALL LIGATURE FF}" =~ /f+/i should match. Prior to this patch, this function returned that there is a match, but left the matching string pointer at the beginning of the "\N{LATIN SMALL LIGATURE FF}" because it doesn't make sense to match just half a character, and at this level it doesn't know about the '+'. This leaves things in an inconsistent state, with the reporting of a match, but the input pointer unchanged, the result of which is a loop. I don't know how to fix this so that it correctly matches, and there are semantic issues with doing so. For example, if "\N{LATIN SMALL LIGATURE FF}" =~ /ff/i matches, then one would think that so should "\N{LATIN SMALL LIGATURE FF}" =~ /(f)(f)/i But $1 and $2 don't really make sense here, since they both refer to the half of the same character. So this patch just returns failure if only a partial character is matched. That leaves things consistent, and solves the problem of looping, so that Perl doesn't hang on such a construct, but leaves the ultimate solution for another day. --- diff --git a/t/re/re.t b/t/re/re.t index 87965f2..249c6dd 100644 --- a/t/re/re.t +++ b/t/re/re.t @@ -51,6 +51,14 @@ if ('1234'=~/(?:(?\d)|(?!))(?\d)(?\d)(?\d)/){ } is(regnames_count(),3); } + + { # Keep this test last, as whole script will be interrupted if times out + # Bug #72998; this can loop + watchdog(2); + eval '"\x{100}\x{FB00}" =~ /\x{100}\N{U+66}+/i'; + pass("Didn't loop"); + } + # New tests above this line, don't forget to update the test count below! -BEGIN { plan tests => 18 } +BEGIN { plan tests => 19 } # No tests here! diff --git a/utf8.c b/utf8.c index 9ed0663..1a6077c 100644 --- a/utf8.c +++ b/utf8.c @@ -2609,7 +2609,8 @@ Perl_ibcmp_utf8(pTHX_ const char *s1, char **pe1, register UV l1, bool u1, const /* A match is defined by all the scans that specified * an explicit length reaching their final goals. */ - match = (f1 == 0 || p1 == f1) && (f2 == 0 || p2 == f2); + match = (n1 == 0 && n2 == 0 /* Must not match partial char; Bug #72998 */ + && (f1 == 0 || p1 == f1) && (f2 == 0 || p2 == f2)); if (match) { if (pe1)