From: hv@crypt.org
Date: Thu, 2 Jul 2009 10:36:08 +0000 (+0100)
Subject: Some bugs in Perl regexp (core Perl issues)
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=906cdd2b284d712169319a6934ba68b578748c8f;p=p5sagit%2Fp5-mst-13.2.git

"Hugo van der Sanden via RT" wrote:
:This is caused by a failure of the start_class optimization in the case
:of lookahead, as per the attached comment.
:
:In more detail: at the point study_chunk() attempts to deal with the
:start_class discovered for the lookahead chunk, we have
:SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
:ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.
[...]
:In other words, we need to stack an alternation of ANDs and ORs to cope
:with this situation, and we don't have a mechanism to do that except to
:recurse into study_chunk() some more.
:
:A simpler short-term fix is instead to throw up our hands in this
:situation, and just nullify start_class. I'm not sure exactly how to do
:that, but it seems the more likely to be achievable for 5.10.1.

This patch implements the simple fix and passes all tests, including
Abigail's test cases for the bug.

Yves: note that I've preserved the 'was' code in this chunk, which you
introduced in patch [1] and which was discussed in thread [2]. As far as
I can see, the 3 lines propagating ANYOF_EOS via 'was' (and the copy of
those 3 lines a little later) are simply doing the wrong thing - they
seem to be saying "when we combine two start classes using
SCF_DO_STCLASS_AND, claim that end-of-string is valid if the first class
says it would be, even though the second says it wouldn't be". Removing
those lines doesn't cause any test failures - can you remember why you
introduced them, and perhaps add a test case that fails without them?
Hugo

[1] http://perl5.git.perl.org/perl.git/commit/b515a41db88584b4fd1c30cf890c92d3f9697760
[2] http://groups.google.co.uk/group/perl.perl5.porters/browse_thread/thread/436187077ef96918/f11c3268394abf89

Message-Id: <200907021036.n62Aa8rv029500@zen.crypt.org>
rt.perl.org #56690
---
diff --git a/regcomp.c b/regcomp.c
index 7e80041..50b0632 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -3727,11 +3727,22 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp,
                     data->whilem_c = data_fake.whilem_c;
                 }
                 if (f & SCF_DO_STCLASS_AND) {
-                    const int was = (data->start_class->flags & ANYOF_EOS);
-
-                    cl_and(data->start_class, &intrnl);
-                    if (was)
-                        data->start_class->flags |= ANYOF_EOS;
+                    if (flags & SCF_DO_STCLASS_OR) {
+                        /* OR before, AND after: ideally we would recurse with
+                         * data_fake to get the AND applied by study of the
+                         * remainder of the pattern, and then derecurse;
+                         * *** HACK *** for now just treat as "no information".
+                         * See [perl #56690].
+                         */
+                        cl_init(pRExC_state, data->start_class);
+                    } else {
+                        /* AND before and after: combine and continue */
+                        const int was = (data->start_class->flags & ANYOF_EOS);
+
+                        cl_and(data->start_class, &intrnl);
+                        if (was)
+                            data->start_class->flags |= ANYOF_EOS;
+                    }
                 }
             }
 #if PERL_ENABLE_POSITIVE_ASSERTION_STUDY
diff --git a/t/op/re_tests b/t/op/re_tests
index 8c7381f..10bee20 100644
--- a/t/op/re_tests
+++ b/t/op/re_tests
@@ -1371,8 +1371,8 @@ foo(\h)bar	foo\tbar	y	$1	\t
 .*?(?:(\w)|(\w))x	abx	y	$1-$2	b-
 0{50}	000000000000000000000000000000000000000000000000000	y	-	-
-^a?(?=b)b	ab	B	$&	ab	# Bug #56690
-^a*(?=b)b	ab	B	$&	ab	# Bug #56690
+^a?(?=b)b	ab	y	$&	ab	# Bug #56690
+^a*(?=b)b	ab	y	$&	ab	# Bug #56690
 />\d+$ \n/ix	>10\n	y	$&	>10
 />\d+$ \n/ix	>1\n	y	$&	>1
 /\d+$ \n/ix	>10\n	y	$&	10