From: Hugo van der Sanden
Date: Sat, 14 Jun 1997 04:14:33 +0000 (+1200)
Subject: Avoid core dump on some paren'd regexp matches
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=44ed422101809141bc33c2b85c1cff357de4d7bf;p=p5sagit%2Fp5-mst-13.2.git

Avoid core dump on some paren'd regexp matches

In article <199706260526.XAA01060@lunkwill.ml.org> Jason wrote:
:This script causes Perl to dump core with a segmentation fault under
:Linux as well as HP-UX. Here's the script:
:
:#!/usr/local/bin/perl
:@justalist = ("foo\nbar" =~ /(\s|(foo)|(bar))*/ );

It does the same under 5.004_01. The reason is that on the second
match it tried to match (foo) and succeeded, leaving startp[2] and
endp[2] pointing to the beginning and end of the matched 'foo'. On
the third match, it tried to match (foo) and failed; in doing so, it
overwrote startp[2] with the startpoint it was trying to match
('bar'), but left endp[2] unaltered.

If that third match had failed, no problem would occur - it would
restore startp[] and endp[] from saved copies. However, because the
third match then succeeded on the final alternate the modified
startp[] and endp[] were retained, leaving a mismatched pair of
values for $2.

The solution depends on what the answer should be - one
interpretation is that, since (foo) failed to match the last time it
was tried, the results should be ('bar', undef, 'bar'). The first
patch below effects this. Alternatively, you could say that it was
more correct and/or more useful for it to return the last successful
match on (foo), in which case you want the rather more complicated
second patch below.

I'm not an expert on this stuff - Ilya, can you take a look at these
patches and tell me how broken they are, please? My own feeling is
that the second interpretation is more useful, but I have much less
confidence in the completeness of my patch for this.

No test cases supplied at this stage: Jason's testcase above should
suffice for the moment.
Perl passes all tests here with either patch.

p5p-msgid: 199706261236.NAA03472@crypt.compulink.co.uk
---

diff --git a/regexec.c b/regexec.c
index 7f60a91..19fdbfa 100644
--- a/regexec.c
+++ b/regexec.c
@@ -134,6 +134,30 @@ regcppop()
     return input;
 }
 
+static void
+regcppartblow()
+{
+    I32 i = SSPOPINT;
+    U32 paren = 0;
+    char *input;
+    char *startp;
+    char *endp;
+    int lastparen;
+    int size;
+    assert(i == SAVEt_REGCONTEXT);
+    i = SSPOPINT;
+    input = (char *) SSPOPPTR;
+    lastparen = SSPOPINT;
+    size = SSPOPINT;
+    for (i -= 3; i > 0; i -= 3) {
+        paren = (U32)SSPOPINT;
+        startp = (char *) SSPOPPTR;
+        endp = (char *) SSPOPPTR;
+        if (paren <= *reglastparen && regendp[paren] == endp)
+            regstartp[paren] = startp;
+    }
+}
+
 #define regcpblow(cp) leave_scope(cp)
 
 /*
@@ -864,6 +888,7 @@ char *prog;
         case OPEN:
             n = ARG1(scan); /* which paren pair */
             regstartp[n] = locinput;
+            regendp[n] = 0;
             if (n > regsize)
                 regsize = n;
             break;
@@ -944,7 +969,7 @@ char *prog;
                     ln = regcc->cur;
                     cp = regcppush(cc->parenfloor);
                     if (regmatch(cc->next)) {
-                        regcpblow(cp);
+                        regcppartblow(cp);
                         sayYES; /* All done. */
                     }
                     regcppop();
@@ -960,7 +985,7 @@ char *prog;
                 cc->lastloc = locinput;
                 cp = regcppush(cc->parenfloor);
                 if (regmatch(cc->scan)) {
-                    regcpblow(cp);
+                    regcppartblow(cp);
                     sayYES;
                 }
                 regcppop();
@@ -975,7 +1000,7 @@ char *prog;
                 cc->cur = n;
                 cc->lastloc = locinput;
                 if (regmatch(cc->scan)) {
-                    regcpblow(cp);
+                    regcppartblow(cp);
                     sayYES;
                 }
                 regcppop(); /* Restore some previous $s? */
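
For readers who want to see the failure mode from the commit message in
isolation, here is a rough standalone sketch. It is a toy model, not code
from regexec.c: the arrays startp[] and endp[] only stand in for
regstartp[] and regendp[], and the names try_group, run_match, group_ok
and the clear_end_on_open flag are all made up for illustration. It
hard-codes the alternation (\s|(foo)|(bar))* from Jason's script, leaves
out the outer group $1, and models only the one-line OPEN change
(regendp[n] = 0), not regcppartblow().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPAREN 4                /* groups 1..3; index 0 and group 1 unused */

/* Toy stand-ins for regstartp[] / regendp[]; not the real interpreter state. */
static const char *startp[NPAREN];
static const char *endp[NPAREN];

/* Attempt capture group n as the literal `lit` at `pos` (hypothetical helper). */
static const char *try_group(int n, const char *pos, const char *lit,
                             int clear_end_on_open)
{
    size_t len = strlen(lit);

    assert(n > 0 && n < NPAREN);
    startp[n] = pos;            /* OPEN records the start before matching */
    if (clear_end_on_open)
        endp[n] = NULL;         /* models the added "regendp[n] = 0;" */
    if (strncmp(pos, lit, len) != 0)
        return NULL;            /* failure leaves endp[n] as it was */
    endp[n] = pos + len;        /* CLOSE records the end on success */
    return pos + len;
}

/* Run the hard-coded (\s|(foo)|(bar))* loop over `subject`. */
static void run_match(const char *subject, int clear_end_on_open)
{
    const char *pos = subject;
    int i;

    for (i = 0; i < NPAREN; i++)
        startp[i] = endp[i] = NULL;

    for (;;) {
        const char *saved_start[NPAREN];
        const char *saved_end[NPAREN];
        const char *next = NULL;

        /* Save the capture state so a failed iteration can be undone. */
        memcpy(saved_start, startp, sizeof startp);
        memcpy(saved_end, endp, sizeof endp);

        if (*pos == ' ' || *pos == '\t' || *pos == '\n')
            next = pos + 1;                                   /* \s    */
        if (next == NULL)
            next = try_group(2, pos, "foo", clear_end_on_open); /* (foo) */
        if (next == NULL)
            next = try_group(3, pos, "bar", clear_end_on_open); /* (bar) */

        if (next == NULL) {
            /* Whole iteration failed: restore the saved copies and stop. */
            memcpy(startp, saved_start, sizeof startp);
            memcpy(endp, saved_end, sizeof endp);
            break;
        }
        pos = next;             /* iteration succeeded: state is retained */
    }
}

/* 1 if group n is unset or has start <= end; 0 if the pair is mismatched. */
static int group_ok(int n)
{
    if (endp[n] == NULL)
        return 1;               /* treated as "did not match" (undef) */
    return startp[n] != NULL && startp[n] <= endp[n];
}
```

Calling run_match("foo\nbar", 0) and then group_ok(2) shows the mismatched
pair for $2 in this model; with the flag set to 1, group 2 ends up unset
and group 3 covers 'bar', which corresponds to the first interpretation
described above.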