From: Yves Orton <demerphq@gmail.com>
Date: Sun, 24 Dec 2006 14:38:15 +0000 (+0100)
Subject: Re: Named-capture regex syntax
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=1f1031fe96c14865e4f60fdd3a6a6ce073d190c1;p=p5sagit%2Fp5-mst-13.2.git

Re: Named-capture regex syntax
Message-ID: <9b18b3110612240538m5c45654br7d27171835f6664@mail.gmail.com>

p4raw-id: //depot/perl@29621
---

diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index af7c9c4..c219182 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3683,6 +3683,11 @@ where the problem was discovered. See L<perlre>.
 <-- HERE shows in the regular expression about where the problem was
 discovered.  See L<perlre>.
 
+=item Sequence \\%s... not terminated in regex; marked by <-- HERE in m/%s/
+
+(F) The regular expression expects a mandatory argument following the escape
+sequence and this has been omitted or incorrectly written.
+
 =item Sequence (?#... not terminated in regex; marked by <-- HERE in m/%s/
 
 (F) A regular expression comment must be terminated by a closing
diff --git a/pod/perlre.pod b/pod/perlre.pod
index a876211..6c20496 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -250,6 +250,7 @@ X<word> X<whitespace>
     \g1      Backreference to a specific or previous group,
     \g{-1}   number may be negative indicating a previous buffer and may
              optionally be wrapped in curly brackets for safer parsing.
+    \g{name} Named backreference
     \k<name> Named backreference
     \N{name} Named unicode character, or unicode escape
     \x12     Hexadecimal escape sequence
@@ -486,7 +487,7 @@ backreference only if at least 11 left parentheses have opened
 before it.  And so on.  \1 through \9 are always interpreted as
 backreferences.
 
-X<\g{1}> X<\g{-1}> X<relative backreference>
+X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
 In order to provide a safer and easier way to construct patterns using
 backrefs, in Perl 5.10 the C<\g{N}> notation is provided. The curly
 brackets are optional, however omitting them is less safe as the meaning
@@ -494,6 +495,8 @@ of the pattern can be changed by text (such as digits) following it.
 When N is a positive integer the C<\g{N}> notation is exactly equivalent
 to using normal backreferences. When N is a negative integer then it is
 a relative backreference referring to the previous N'th capturing group.
+When the bracket form is used and N is not an integer, it is treated as a
+reference to a named buffer.
 
 Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the
 buffer before that. For example:
@@ -510,11 +513,12 @@ buffer before that. For example:
 and would match the same as C</(Y) ( (X) \3 \1 )/x>.
 
 Additionally, as of Perl 5.10 you may use named capture buffers and named
-backreferences. The notation is C<< (?<name>...) >> and C<< \k<name> >>
-(you may also use single quotes instead of angle brackets to quote the
-name). The only difference with named capture buffers and unnamed ones is
+backreferences. The notation is C<< (?<name>...) >> to declare and C<< \k<name> >>
+to reference. You may also use single quotes instead of angle brackets to quote the
+name; and you may use the bracketed C<< \g{name} >> back reference syntax.
+The only difference between named capture buffers and unnamed ones is
 that multiple buffers may have the same name and that the contents of
-named capture buffers is available via the C<%+> hash. When multiple
+named capture buffers are available via the C<%+> hash. When multiple
 groups share the same name C<$+{name}> and C<< \k<name> >> refer to the
 leftmost defined group, thus it's possible to do things with named capture
 buffers that would otherwise require C<(??{})> code to accomplish. Named
@@ -751,12 +755,20 @@ pattern
 $+{foo} will be the same as $2, and $3 will contain 'z' instead of
 the opposite which is what a .NET regex hacker might expect.
 
-Currently NAME is restricted to word chars only. In other words, it
-must match C</^\w+$/>.
+Currently NAME is restricted to simple identifiers only.
+In other words, it must match C</^[_A-Za-z][_A-Za-z0-9]*\z/> or
+its Unicode extension (see L<utf8>),
+though it isn't extended by the locale (see L<perllocale>).
 
-=item C<< \k<name> >>
+B<NOTE:> In order to make things easier for programmers with experience
+with the Python or PCRE regex engines the pattern C<< (?P<NAME>pattern) >>
+maybe be used instead of C<< (?<NAME>pattern) >>; however this form does not
+support the use of single quotes as a delimiter for the name. This is
+only available in Perl 5.10 or later.
 
-=item C<< \k'name' >>
+=item C<< \k<NAME> >>
+
+=item C<< \k'NAME' >>
 
 Named backreference. Similar to numeric backreferences, except that
 the group is designated by name and not number. If multiple groups
@@ -768,6 +780,10 @@ earlier in the pattern.
 
 Both forms are equivalent.
 
+B<NOTE:> In order to make things easier for programmers with experience
+with the Python or PCRE regex engines the pattern C<< (?P=NAME) >>
+maybe be used instead of C<< \k<NAME> >> in Perl 5.10 or later.
+
 =item C<(?{ code })>
 X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, code in>
 
@@ -989,6 +1005,10 @@ the same name, then it recurses to the leftmost.
 It is an error to refer to a name that is not declared somewhere in the
 pattern.
 
+B<NOTE:> In order to make things easier for programmers with experience
+with the Python or PCRE regex engines the pattern C<< (?P>NAME) >>
+maybe be used instead of C<< (?&NAME) >> as of Perl 5.10.
+
 =item C<(?(condition)yes-pattern|no-pattern)>
 X<(?()>
 
@@ -1980,6 +2000,28 @@ part of this regular expression needs to be converted explicitly
     $re = customre::convert $re;
     /\Y|$re\Y|/;
 
+=head1 PCRE/Python Support
+
+As of Perl 5.10 Perl supports several Python/PCRE specific extensions
+to the regex syntax. While Perl programmers are encouraged to use the
+Perl specific syntax, the following are legal in Perl 5.10:
+
+=over 4
+
+=item C<< (?P<NAME>pattern) >>
+
+Define a named capture buffer. Equivalent to C<< (?<NAME>pattern) >>.
+
+=item C<< (?P=NAME) >>
+
+Backreference to a named capture buffer. Equivalent to C<< \g{NAME} >>.
+
+=item C<< (?P>NAME) >>
+
+Subroutine call to a named capture buffer. Equivalent to C<< (?&NAME) >>.
+
+=back 4
+
 =head1 BUGS
 
 This document varies from difficult to understand to completely
diff --git a/regcomp.c b/regcomp.c
index bbab843..9047b1d 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -126,6 +126,7 @@ typedef struct RExC_state_t {
     I32		utf8;
     HV		*charnames;		/* cache of named sequences */
     HV		*paren_names;		/* Paren names */
+    
     regnode	**recurse;		/* Recurse regops */
     I32		recurse_count;		/* Number of recurse regops */
 #if ADD_TO_REGEXEC
@@ -135,8 +136,10 @@ typedef struct RExC_state_t {
 #ifdef DEBUGGING
     const char  *lastparse;
     I32         lastnum;
+    AV          *paren_name_list;       /* idx -> name */
 #define RExC_lastparse	(pRExC_state->lastparse)
 #define RExC_lastnum	(pRExC_state->lastnum)
+#define RExC_paren_name_list    (pRExC_state->paren_name_list)
 #endif
 } RExC_state_t;
 
@@ -4055,6 +4058,9 @@ Perl_re_compile(pTHX_ char *exp, char *xend, PMOP *pm)
     RExC_close_parens = NULL;
     RExC_opend = NULL;
     RExC_paren_names = NULL;
+#ifdef DEBUGGING
+    RExC_paren_name_list = NULL;
+#endif
     RExC_recurse = NULL;
     RExC_recurse_count = 0;
 
@@ -4576,7 +4582,14 @@ reStudy:
         r->paren_names = (HV*)SvREFCNT_inc(RExC_paren_names);
     else
         r->paren_names = NULL;
-        	
+#ifdef DEBUGGING
+    if (RExC_paren_names) {
+        ri->name_list_idx = add_data( pRExC_state, 1, "p" );
+        ri->data->data[ri->name_list_idx] = (void*)SvREFCNT_inc(RExC_paren_name_list);
+    } else
+        ri->name_list_idx = 0;
+#endif
+
     if (RExC_recurse_count) {
         for ( ; RExC_recurse_count ; RExC_recurse_count-- ) {
             const regnode *scan = RExC_recurse[RExC_recurse_count-1];
@@ -4660,17 +4673,19 @@ Perl_reg_named_buff_sv(pTHX_ SV* namesv)
 STATIC SV*
 S_reg_scan_name(pTHX_ RExC_state_t *pRExC_state, U32 flags) {
     char *name_start = RExC_parse;
-    if ( UTF ) {
-	STRLEN numlen;
-        while( isIDFIRST_uni(utf8n_to_uvchr((U8*)RExC_parse,
-            RExC_end - RExC_parse, &numlen, UTF8_ALLOW_DEFAULT)))
-        {
-                RExC_parse += numlen;
-        }
-    } else {
-        while( isIDFIRST(*RExC_parse) )
-	    RExC_parse++;
+
+    if (isIDFIRST_lazy_if(RExC_parse, UTF)) {
+	 /* skip IDFIRST by using do...while */
+	if (UTF)
+	    do {
+		RExC_parse += UTF8SKIP(RExC_parse);
+	    } while (isALNUM_utf8((U8*)RExC_parse));
+	else
+	    do {
+		RExC_parse++;
+	    } while (isALNUM(*RExC_parse));
     }
+
     if ( flags ) {
         SV* sv_name = sv_2mortal(Perl_newSVpvn(aTHX_ name_start,
             (int)(RExC_parse - name_start)));
@@ -4916,10 +4931,46 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
 	    ret = NULL;			/* For look-ahead/behind. */
 	    switch (paren) {
 
+	    case 'P':	/* (?P...) variants for those used to PCRE/Python */
+	        paren = *RExC_parse++;
+		if ( paren == '<')         /* (?P<...>) named capture */
+		    goto named_capture;
+                else if (paren == '>') {   /* (?P>name) named recursion */
+                    goto named_recursion;
+                }
+                else if (paren == '=') {   /* (?P=...)  named backref */
+                    /* this pretty much dupes the code for \k<NAME> in regatom(), if
+                       you change this make sure you change that */
+                    char* name_start = RExC_parse;
+		    U32 num = 0;
+                    SV *sv_dat = reg_scan_name(pRExC_state,
+                        SIZE_ONLY ? REG_RSN_RETURN_NULL : REG_RSN_RETURN_DATA);
+                    if (RExC_parse == name_start || *RExC_parse != ')')
+                        vFAIL2("Sequence %.3s... not terminated",parse_start);
+
+                    if (!SIZE_ONLY) {
+                        num = add_data( pRExC_state, 1, "S" );
+                        RExC_rxi->data->data[num]=(void*)sv_dat;
+                        SvREFCNT_inc(sv_dat);
+                    }
+                    RExC_sawback = 1;
+                    ret = reganode(pRExC_state,
+                    	   (U8)(FOLD ? (LOC ? NREFFL : NREFF) : NREF),
+                    	   num);
+                    *flagp |= HASWIDTH;
+
+                    Set_Node_Offset(ret, parse_start+1);
+                    Set_Node_Cur_Length(ret); /* MJD */
+
+                    nextchar(pRExC_state);
+                    return ret;
+                }
+                goto unknown;
 	    case '<':           /* (?<...) */
 		if (*RExC_parse == '!')
 		    paren = ',';
 		else if (*RExC_parse != '=') 
+              named_capture:
 		{               /* (?<...>) */
 		    char *name_start;
 		    SV *svname;
@@ -4944,6 +4995,10 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
                         if (!RExC_paren_names) {
                             RExC_paren_names= newHV();
                             sv_2mortal((SV*)RExC_paren_names);
+#ifdef DEBUGGING
+                            RExC_paren_name_list= newAV();
+                            sv_2mortal((SV*)RExC_paren_name_list);
+#endif
                         }
                         he_str = hv_fetch_ent( RExC_paren_names, svname, 1, 0 );
                         if ( he_str )
@@ -4964,6 +5019,10 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
                             SvIOK_on(sv_dat);
                             SvIVX(sv_dat)= 1;
                         }
+#ifdef DEBUGGING
+                        if (!av_store(RExC_paren_name_list, RExC_npar, SvREFCNT_inc(svname)))
+                            SvREFCNT_dec(svname);
+#endif
 
                         /*sv_dump(sv_dat);*/
                     }
@@ -5009,6 +5068,7 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth)
                 char * parse_start;
             case '&':            /* (?&NAME) */
                 parse_start = RExC_parse - 1;
+              named_recursion:
                 {
     		    SV *sv_dat = reg_scan_name(pRExC_state,
     		        SIZE_ONLY ? REG_RSN_RETURN_NULL : REG_RSN_RETURN_DATA);
@@ -6323,46 +6383,44 @@ tryagain:
             ret= reg_namedseq(pRExC_state, NULL); 
             break;
 	case 'k':    /* Handle \k<NAME> and \k'NAME' */
+	parse_named_seq:
         {   
             char ch= RExC_parse[1];	    
-	    if (ch != '<' && ch != '\'') {
-	        if (SIZE_ONLY)
-	            vWARN( RExC_parse + 1, 
-	                "Possible broken named back reference treated as literal k");
-	        parse_start--;
-	        goto defchar;
+	    if (ch != '<' && ch != '\'' && ch != '{') {
+	        RExC_parse++;
+	        vFAIL2("Sequence %.2s... not terminated",parse_start);
 	    } else {
+	        /* this pretty much dupes the code for (?P=...) in reg(), if
+                   you change this make sure you change that */
 		char* name_start = (RExC_parse += 2);
 		U32 num = 0;
                 SV *sv_dat = reg_scan_name(pRExC_state,
                     SIZE_ONLY ? REG_RSN_RETURN_NULL : REG_RSN_RETURN_DATA);
-                ch= (ch == '<') ? '>' : '\'';
-                    
+                char sch = ch;                        
+                ch= (ch == '<') ? '>' : (ch == '{') ? '}' : '\'';
                 if (RExC_parse == name_start || *RExC_parse != ch)
-                    vFAIL2("Sequence \\k%c... not terminated",
-                        (ch == '>' ? '<' : ch));
-                
+                    vFAIL2("Sequence %.3s... not terminated",parse_start);
+
+                if (!SIZE_ONLY) {
+                    num = add_data( pRExC_state, 1, "S" );
+                    RExC_rxi->data->data[num]=(void*)sv_dat;
+                    SvREFCNT_inc(sv_dat);
+                }
+
                 RExC_sawback = 1;
                 ret = reganode(pRExC_state,
                 	   (U8)(FOLD ? (LOC ? NREFFL : NREFF) : NREF),
                 	   num);
                 *flagp |= HASWIDTH;
-                
-    		
-                if (!SIZE_ONLY) {
-                    num = add_data( pRExC_state, 1, "S" );
-                    ARG_SET(ret,num);
-                    RExC_rxi->data->data[num]=(void*)sv_dat;
-                    SvREFCNT_inc(sv_dat);
-                }    
+
                 /* override incorrect value set in reganode MJD */
                 Set_Node_Offset(ret, parse_start+1);
                 Set_Node_Cur_Length(ret); /* MJD */
                 nextchar(pRExC_state);
-		               
+
             }
             break;
-        }            
+	}
 	case 'n':
 	case 'r':
 	case 't':
@@ -6391,7 +6449,11 @@ tryagain:
 		        RExC_parse++;
 		        isrel = 1;
 		    }
-		}   
+		    if (hasbrace && !isDIGIT(*RExC_parse)) {
+		        if (isrel) RExC_parse--;
+                        RExC_parse -= 2;		            
+		        goto parse_named_seq;
+		}   }
 		num = atoi(RExC_parse);
                 if (isrel) {
                     num = RExC_npar - num;
@@ -6404,6 +6466,8 @@ tryagain:
 		    char * const parse_start = RExC_parse - 1; /* MJD */
 		    while (isDIGIT(*RExC_parse))
 			RExC_parse++;
+	            if (parse_start == RExC_parse - 1) 
+	                vFAIL("Unterminated \\g... pattern");
                     if (hasbrace) {
                         if (*RExC_parse != '}') 
                             vFAIL("Unterminated \\g{...} pattern");
@@ -8306,9 +8370,29 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o)
     }
     else if (k == WHILEM && o->flags)			/* Ordinal/of */
 	Perl_sv_catpvf(aTHX_ sv, "[%d/%d]", o->flags & 0xf, o->flags>>4);
-    else if (k == REF || k == OPEN || k == CLOSE || k == GROUPP || OP(o)==ACCEPT) 
+    else if (k == REF || k == OPEN || k == CLOSE || k == GROUPP || OP(o)==ACCEPT) {
 	Perl_sv_catpvf(aTHX_ sv, "%d", (int)ARG(o));	/* Parenth number */
-    else if (k == GOSUB) 
+	if ( prog->paren_names ) {
+	    AV *list= (AV *)progi->data->data[progi->name_list_idx];
+	    SV **name= av_fetch(list, ARG(o), 0 );
+	    if (name)
+	        Perl_sv_catpvf(aTHX_ sv, " '%"SVf"'", *name);
+        }	    
+    } else if (k == NREF) {
+        if ( prog->paren_names ) {
+            AV *list= (AV *)progi->data->data[ progi->name_list_idx ];
+            SV *sv_dat=(SV*)progi->data->data[ ARG( o ) ];
+            I32 *nums=(I32*)SvPVX(sv_dat);
+            SV **name= av_fetch(list, nums[0], 0 );
+            I32 n;
+            if (name) {
+                for ( n=0; n<SvIVX(sv_dat); n++ ) {
+                    Perl_sv_catpvf(aTHX_ sv, "%s%d",( n ? "," : "" ),nums[n]);
+                }
+                Perl_sv_catpvf(aTHX_ sv, " '%"SVf"'", *name );
+            }
+        }
+    } else if (k == GOSUB) 
 	Perl_sv_catpvf(aTHX_ sv, "%d[%+d]", (int)ARG(o),(int)ARG2L(o));	/* Paren and offset */
     else if (k == VERB) {
         if (!o->flags) 
@@ -9116,9 +9200,10 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node,
     register U8 op = PSEUDO;	/* Arbitrary non-END op. */
     register const regnode *next;
     const regnode *optstart= NULL;
+    
     RXi_GET_DECL(r,ri);
     GET_RE_DEBUG_FLAGS_DECL;
-
+    
 #ifdef DEBUG_DUMPUNTIL
     PerlIO_printf(Perl_debug_log, "--- %d : %d - %d - %d\n",indent,node-start,
         last ? last-start : 0,plast ? plast-start : 0);
@@ -9129,13 +9214,12 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node,
 
     while (PL_regkind[op] != END && (!last || node < last)) {
 	/* While that wasn't END last time... */
-
 	NODE_ALIGN(node);
 	op = OP(node);
 	if (op == CLOSE || op == WHILEM)
 	    indent--;
 	next = regnext((regnode *)node);
-	
+
 	/* Where, what. */
 	if (OP(node) == OPTIMIZED) {
 	    if (!optstart && RE_DEBUG_FLAG(RE_DEBUG_COMPILE_OPTIMISE))
@@ -9144,23 +9228,21 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node,
 		goto after_print;
 	} else
 	    CLEAR_OPTSTART;
-	    
+	
 	regprop(r, sv, node);
 	PerlIO_printf(Perl_debug_log, "%4"IVdf":%*s%s", (IV)(node - start),
 		      (int)(2*indent + 1), "", SvPVX_const(sv));
-
-	if (OP(node) != OPTIMIZED) {
-	    if (next == NULL)		/* Next ptr. */
-		PerlIO_printf(Perl_debug_log, "(0)");
-	    else if (PL_regkind[(U8)op] == BRANCH && PL_regkind[OP(next)] != BRANCH )
-	        PerlIO_printf(Perl_debug_log, "(FAIL)");
-	    else
-		PerlIO_printf(Perl_debug_log, "(%"IVdf")", (IV)(next - start));
-		
-	    /*if (PL_regkind[(U8)op]  != TRIE)*/
-	        (void)PerlIO_putc(Perl_debug_log, '\n');
-	}
-
+        
+        if (OP(node) != OPTIMIZED) {		      
+            if (next == NULL)		/* Next ptr. */
+                PerlIO_printf(Perl_debug_log, " (0)");
+            else if (PL_regkind[(U8)op] == BRANCH && PL_regkind[OP(next)] != BRANCH )
+                PerlIO_printf(Perl_debug_log, " (FAIL)");
+            else 
+                PerlIO_printf(Perl_debug_log, " (%"IVdf")", (IV)(next - start));
+            (void)PerlIO_putc(Perl_debug_log, '\n'); 
+        }
+        
       after_print:
 	if (PL_regkind[(U8)op] == BRANCHJ) {
 	    assert(next);
diff --git a/regcomp.h b/regcomp.h
index a5eb0fb..7df47d3 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -103,10 +103,14 @@ typedef struct regexp_paren_ofs {
 } regexp_paren_ofs;
 
  typedef struct regexp_internal {
-        regexp_paren_ofs *swap; /* Swap copy of *startp / *endp */
+#ifdef DEBUGGING
+        int name_list_idx;	/* Optional data index of an array of paren names */
+#endif
+
 	U32 *offsets;           /* offset annotations 20001228 MJD 
                                    data about mapping the program to the 
                                    string*/
+        regexp_paren_ofs *swap; /* Swap copy of *startp / *endp */                                   
         regnode *regstclass;    /* Optional startclass as identified or constructed
                                    by the optimiser */
         struct reg_data *data;	/* Additional miscellaneous data used by the program.
diff --git a/t/op/pat.t b/t/op/pat.t
index 31922e9..f42847f 100755
--- a/t/op/pat.t
+++ b/t/op/pat.t
@@ -3663,6 +3663,32 @@ SKIP:{
     $s=~s/(?'digits'\d+)\k'digits'/$+{digits}/;
     ok($s eq '123456','Named capture (single quotes) s///');    
 }
+
+{
+    if (ord("A") == 193) {
+	for (1..10) {
+	    print "ok $test # Skip: in EBCDIC";
+	    $test++;
+	}
+    } else {
+	use utf8;
+	# Ã± = U+00F1 (n-tilde)
+	# Ì§ = U+0327 (cedilla)
+	# Â² = U+00B2 (superscript two)
+
+	ok("..foo foo.." =~ /(?'Ã±'foo) \k<Ã±>/, 'Named capture UTF');
+	ok($+{Ã±} eq 'foo', 'Named capture UTF');
+	ok("..bar bar.." =~ /(?<_Ã±>bar) \k'_Ã±'/, 'Named capture UTF');
+	ok($+{_Ã±} eq 'bar', 'Named capture UTF');
+	ok("..abc abc.." =~ /(?'cÌ§'abc) \k'cÌ§'/, 'Named capture UTF');
+	ok($+{cÌ§} eq 'abc', 'Named capture UTF');
+	ok("..xyz xyz.." =~ /(?'Ã±Ì§'xyz) \k'Ã±Ì§'/, 'Named capture UTF');
+	ok($+{Ã±Ì§} eq 'xyz', 'Named capture UTF');
+	ok("..456 456.." =~ /(?<aÂ²>456) \k'aÂ²'/, 'Named capture UTF');
+	ok($+{aÂ²} eq '456', 'Named capture UTF');
+    }
+}
+
 sub iseq($$;$) { 
     my ( $got, $expect, $name)=@_;
     
@@ -4193,6 +4219,9 @@ ok((q(a)x 100) =~ /^(??{'(.)'x 100})/,
         "Regexp /^(??{'(.)'x 100})/ crashes older perls")
     or print "# Unexpected outcome: should pass or crash perl\n";
 
+eval '/\k/';
+ok($@=~/\QSequence \k... not terminated in regex;\E/);
+
 {
     local $Message = "substitution with lookahead (possible segv)";
     $_="ns1ns1ns1";
@@ -4210,7 +4239,7 @@ ok((q(a)x 100) =~ /^(??{'(.)'x 100})/,
 iseq(0+$::test,$::TestCount,"Got the right number of tests!");
 # Don't forget to update this!
 BEGIN {
-    $::TestCount = 1573; 
+    $::TestCount = 1584; 
     print "1..$::TestCount\n";
 }
 
diff --git a/t/op/re_tests b/t/op/re_tests
index 48dbb79..1700588 100644
--- a/t/op/re_tests
+++ b/t/op/re_tests
@@ -1029,6 +1029,15 @@ X(?<=foo.)[YZ]	..XfooXY..	y	pos	8
 (?<n>foo|bar|baz)(?<m>[ew]+)	snofooewa	y	$+{m}	ew
 (?<n>foo)|(?<n>bar)|(?<n>baz)	snofooewa	y	$+{n}	foo
 (?<n>foo)(??{ $+{n} })	snofooefoofoowaa	y	$+{n}	foo
+(?P<n>foo|bar|baz)	snofooewa	y	$1	foo
+(?P<n>foo|bar|baz)	snofooewa	y	$+{n}	foo
+(?P<n>foo|bar|baz)(?P<m>[ew]+)	snofooewa	y	$+{n}	foo
+(?P<n>foo|bar|baz)(?P<m>[ew]+)	snofooewa	y	$+{m}	ew
+(?P<n>foo)|(?P<n>bar)|(?P<n>baz)	snofooewa	y	$+{n}	foo
+(?P<n>foo)(??{ $+{n} })	snofooefoofoowaa	y	$+{n}	foo
+(?P<=n>foo|bar|baz)	snofooewa	c	-	Sequence (?P<=...) not recognized
+(?P<!n>foo|bar|baz)	snofooewa	c	-	Sequence (?P<!...) not recognized
+(?PX<n>foo|bar|baz)	snofooewa	c	-	Sequence (?PX<...) not recognized
 /(?'n'foo|bar|baz)/	snofooewa	y	$1	foo
 /(?'n'foo|bar|baz)/	snofooewa	y	$+{n}	foo
 /(?'n'foo|bar|baz)(?'m'[ew]+)/	snofooewa	y	$+{n}	foo
@@ -1179,7 +1188,6 @@ round\(([^()]++)\)	_I(round(xs * sz),1)	y	$1	xs * sz
 (x|y|z[QW])*+(longish|loquatious|excessive|overblown[QW])*+	xyzQzWlongishoverblownW	y	$1-$2	zW-overblownW
 (x|y|z[QW]){1,5}+(longish|loquatious|excessive|overblown[QW]){1,5}+	xyzQzWlongishoverblownW	y	$1-$2	zW-overblownW
 
-
 a*(?!)	aaaab	n	-	-
 a*(*FAIL)	aaaab	n	-	-
 a*(*F)	aaaab	n	-	-
@@ -1196,9 +1204,64 @@ a*(*F)	aaaab	n	-	-
 (([abc]+) \g-1)(([abc]+) \g{-1})	abc abccba cba	y	$2-$4	abc-cba
 (a)(b)(c)\g1\g2\g3	abcabc	y	$1$2$3	abc
 
-
+# \k<n> preceded by a literal
 /(?'n'foo) \k<n>/	..foo foo..	y	$1	foo
 /(?'n'foo) \k<n>/	..foo foo..	y	$+{n}	foo
 /(?<n>foo) \k'n'/	..foo foo..	y	$1	foo
 /(?<n>foo) \k'n'/	..foo foo..	y	$+{n}	foo
+/(?'a1'foo) \k'a1'/	..foo foo..	y	$+{a1}	foo
+/(?<a1>foo) \k<a1>/	..foo foo..	y	$+{a1}	foo
+/(?'_'foo) \k'_'/	..foo foo..	y	$+{_}	foo
+/(?<_>foo) \k<_>/	..foo foo..	y	$+{_}	foo
+/(?'_0_'foo) \k'_0_'/	..foo foo..	y	$+{_0_}	foo
+/(?<_0_>foo) \k<_0_>/	..foo foo..	y	$+{_0_}	foo
+/(?'0'foo) bar/	..foo bar..	c	-	Sequence (?'
+/(?<0>foo) bar/	..foo bar..	c	-	Sequence (?<
+/(?'12'foo) bar/	..foo bar..	c	-	Sequence (?'
+/(?<12>foo) bar/	..foo bar..	c	-	Sequence (?<
+/(?'1a'foo) bar/	..foo bar..	c	-	Sequence (?'
+/(?<1a>foo) bar/	..foo bar..	c	-	Sequence (?<
+/(?''foo) bar/	..foo bar..	c	-	Sequence (?''
+/(?<>foo) bar/	..foo bar..	c	-	Sequence (?<>
+/foo \k'n'/	foo foo	c	-	Reference to nonexistent named group
+/foo \k<n>/	foo foo	c	-	Reference to nonexistent named group
+/foo \k'a1'/	foo foo	c	-	Reference to nonexistent named group
+/foo \k<a1>/	foo foo	c	-	Reference to nonexistent named group
+/foo \k'_'/	foo foo	c	-	Reference to nonexistent named group
+/foo \k<_>/	foo foo	c	-	Reference to nonexistent named group
+/foo \k'_0_'/	foo foo	c	-	Reference to nonexistent named group
+/foo \k<_0_>/	foo foo	c	-	Reference to nonexistent named group
+/foo \k'0'/	foo foo	c	-	Sequence \\k'
+/foo \k<0>/	foo foo	c	-	Sequence \\k<
+/foo \k'12'/	foo foo	c	-	Sequence \\k'
+/foo \k<12>/	foo foo	c	-	Sequence \\k<
+/foo \k'1a'/	foo foo	c	-	Sequence \\k'
+/foo \k<1a>/	foo foo	c	-	Sequence \\k<
+/foo \k''/	foo foo	c	-	Sequence \\k'
+/foo \k<>/	foo foo	c	-	Sequence \\k<
 /(?<as>as) (\w+) \k<as> (\w+)/	as easy as pie	y	$1-$2-$3	as-easy-pie
+
+# \g{...} with a name as the argument 
+/(?'n'foo) \g{n}/	..foo foo..	y	$1	foo
+/(?'n'foo) \g{n}/	..foo foo..	y	$+{n}	foo
+/(?<n>foo) \g{n}/	..foo foo..	y	$1	foo
+/(?<n>foo) \g{n}/	..foo foo..	y	$+{n}	foo
+/(?<as>as) (\w+) \g{as} (\w+)/	as easy as pie	y	$1-$2-$3	as-easy-pie
+
+# Python style named capture buffer stuff
+/(?P<n>foo)(?P=n)/	..foofoo..	y	$1	foo
+/(?P<n>foo)(?P=n)/	..foofoo..	y	$+{n}	foo
+/(?:(?P<n>foo)|(?P<n>bar))(?P=n)/	..barbar..	y	$+{n}	bar
+/^(?P<PAL>(?P<CHAR>.)((?P>PAL)|.?)(?P=CHAR))$/	madamimadam	y	$&	madamimadam
+/^(?P<PAL>(?P<CHAR>.)((?P>PAL)|.?)(?P=CHAR))$/	madamiamadam	n	-	-
+/(?P<n>foo) (?P=n)/	..foo foo..	y	$1	foo
+/(?P<n>foo) (?P=n)/	..foo foo..	y	$+{n}	foo
+/(?P<as>as) (\w+) (?P=as) (\w+)/	as easy as pie	y	$1-$2-$3	as-easy-pie
+
+#check that non identifiers as names are treated as the appropriate lookaround
+(?<=bar>)foo	bar>foo	y	$&	foo
+(?<!bar>)foo	bar>foo	n	-	-
+(?<=bar>ABC)foo	bar>ABCfoo	y	$&	foo
+(?<!bar>ABC)foo	bar>ABCfoo	n	-	-
+(?<bar>)foo	bar>ABCfoo	y	$&	foo
+(?<bar>ABC)foo	bar>ABCfoo	y	$&	ABCfoo
\ No newline at end of file