From: Jarkko Hietaniemi
Date: Fri, 30 Nov 2001 01:16:22 +0000 (+0000)
Subject: Add a note about folding vs lowercase.
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=cadb39a9446639e3c297a768022eb9c72347992a;p=p5sagit%2Fp5-mst-13.2.git

Add a note about folding vs lowercase.

p4raw-id: //depot/perl@13376
---

diff --git a/regexec.c b/regexec.c
index a8acb06..415bc70 100644
--- a/regexec.c
+++ b/regexec.c
@@ -959,6 +959,14 @@ S_find_byclass(pTHX_ regexp * prog, regnode *c, char *s, char *strend, char *sta
             if (do_utf8) {
                 STRLEN len;
+                /* The ibcmp_utf8() uses to_uni_fold() which is more
+                 * correct folding for Unicode than using lowercase.
+                 * However, it doesn't work quite fully since the folding
+                 * is a one-to-many mapping and the regex optimizer is
+                 * unaware of this, so it may throw out good matches.
+                 * Fortunately, not getting this right is allowed
+                 * for Unicode Regular Expression Support level 1,
+                 * only one-to-one matching is required. --jhi */
                 if (c1 == c2)
                     while (s <= e) {
                         if ( utf8_to_uvchr((U8*)s, &len) == c1
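A minimal sketch in C of the distinction the comment draws, assuming a tiny hand-written fold table with a few entries transcribed from Unicode's CaseFolding.txt. The names simple_fold, full_fold, eq_simple, eq_full and scan_start are hypothetical illustrations, not Perl internals such as to_uni_fold() or ibcmp_utf8(). Under one-to-one (simple) folding, U+00DF LATIN SMALL LETTER SHARP S does not equal "ss"; under full folding it does. A start-position scan that only looks for two single code points, in the spirit of the c1/c2 loop in the hunk, never proposes a candidate inside "Strasse", illustrating how such a prefilter can discard a match that full folding would accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t cp_t;          /* one Unicode code point */

#define MAX_IN   8              /* illustration only: short inputs */
#define MAX_FOLD 3              /* a full fold yields at most 3 code points */

/* Hypothetical mini fold table: a few entries from CaseFolding.txt.
 * This is NOT Perl's to_uni_fold(). */
static cp_t simple_fold(cp_t c)
{
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    switch (c) {
    case 0x212A: return 'k';    /* KELVIN SIGN, status C */
    case 0x1E9E: return 0x00DF; /* LATIN CAPITAL LETTER SHARP S, status S */
    default:     return c;      /* U+00DF has no simple (one-to-one) fold */
    }
}

/* Full folding: may expand one code point into several. */
static size_t full_fold(cp_t c, cp_t *out)
{
    if (c == 0x00DF || c == 0x1E9E) {   /* status F: folds to "ss" */
        out[0] = 's';
        out[1] = 's';
        return 2;
    }
    out[0] = simple_fold(c);
    return 1;
}

/* One-to-one matching: lengths must agree, code points compared pairwise. */
static int eq_simple(const cp_t *a, size_t na, const cp_t *b, size_t nb)
{
    if (na != nb)
        return 0;
    for (size_t i = 0; i < na; i++)
        if (simple_fold(a[i]) != simple_fold(b[i]))
            return 0;
    return 1;
}

/* Fully fold a whole string; returns the folded length. */
static size_t fold_all(const cp_t *s, size_t n, cp_t *out)
{
    size_t k = 0;
    assert(n <= MAX_IN);        /* keeps out[] within MAX_IN * MAX_FOLD */
    for (size_t i = 0; i < n; i++)
        k += full_fold(s[i], out + k);
    return k;
}

/* One-to-many matching: fold both sides fully, then compare. */
static int eq_full(const cp_t *a, size_t na, const cp_t *b, size_t nb)
{
    cp_t fa[MAX_IN * MAX_FOLD], fb[MAX_IN * MAX_FOLD];
    size_t ka = fold_all(a, na, fa);
    size_t kb = fold_all(b, nb, fb);
    if (ka != kb)
        return 0;
    for (size_t i = 0; i < ka; i++)
        if (fa[i] != fb[i])
            return 0;
    return 1;
}

/* Prefilter: only offsets holding c1 or c2 are match candidates.
 * Returns the first such index, or n if there is none. */
static size_t scan_start(const cp_t *s, size_t n, cp_t c1, cp_t c2)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == c1 || s[i] == c2)
            return i;
    return n;
}
```

With the sharp s as the first pattern character (c1 = U+00DF, c2 = U+1E9E), scan_start over "Strasse" finds no candidate, even though eq_full says the "ss" at offset 4 matches. As the comment notes, level 1 only requires one-to-one matching, which is the behaviour of eq_simple.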