Re: [PATCH] Initial attempt at named captures for perls regexp engine
# regcomp.sym
#
# File has two sections, divided by a line of dashes '-'.
#
# Rows that are empty after #-comments are removed from the input are ignored
#
# First section is for regops, second section is for regmatch-states
#
# Note that the order in this file is important.
#
# Format for first section:
# NAME \t TYPE, arg-description [num-args] [longjump-len] \t DESCRIPTION
#
#
# run perl regen.pl after editing this file


#* Exit points (0,1)

END	END, no	End of program.
SUCCEED	END, no	Return from a subroutine, basically.

#* Anchors: (2..13)

BOL	BOL, no	Match "" at beginning of line.
MBOL	BOL, no	Same, assuming multiline.
SBOL	BOL, no	Same, assuming singleline.
EOS	EOL, no	Match "" at end of string.
EOL	EOL, no	Match "" at end of line.
MEOL	EOL, no	Same, assuming multiline.
SEOL	EOL, no	Same, assuming singleline.
BOUND	BOUND, no	Match "" at any word boundary
BOUNDL	BOUND, no	Match "" at any word boundary in locale
NBOUND	NBOUND, no	Match "" at any word non-boundary
NBOUNDL	NBOUND, no	Match "" at any word non-boundary in locale
GPOS	GPOS, no	Matches where last m//g left off.

#* [Special] alternatives: (14..30)

REG_ANY	REG_ANY, no	Match any one character (except newline).
SANY	REG_ANY, no	Match any one character.
CANY	REG_ANY, no	Match any one byte.
ANYOF	ANYOF, sv	Match character in (or not in) this class.
ALNUM	ALNUM, no	Match any alphanumeric character
ALNUML	ALNUM, no	Match any alphanumeric char in locale
NALNUM	NALNUM, no	Match any non-alphanumeric character
NALNUML	NALNUM, no	Match any non-alphanumeric char in locale
SPACE	SPACE, no	Match any whitespace character
SPACEL	SPACE, no	Match any whitespace char in locale
NSPACE	NSPACE, no	Match any non-whitespace character
NSPACEL	NSPACE, no	Match any non-whitespace char in locale
DIGIT	DIGIT, no	Match any numeric character
DIGITL	DIGIT, no	Match any numeric character in locale
NDIGIT	NDIGIT, no	Match any non-numeric character
NDIGITL	NDIGIT, no	Match any non-numeric character in locale
CLUMP	CLUMP, no	Match any combining character sequence

#* Alternation (31)

# BRANCH	The set of branches constituting a single choice are hooked
#		together with their "next" pointers, since precedence prevents
#		anything being concatenated to any individual branch. The
#		"next" pointer of the last BRANCH in a choice points to the
#		thing following the whole choice. This is also where the
#		final "next" pointer of each individual branch points; each
#		branch starts with the operand node of a BRANCH node.
#
BRANCH	BRANCH, node	Match this alternative, or the next...

#*Back pointer (32)

# BACK		Normal "next" pointers all implicitly point forward; BACK
#		exists to make loop structures possible.
# not used
BACK	BACK, no	Match "", "next" ptr points backward.

#*Literals (33..35)

EXACT	EXACT, str	Match this string (preceded by length).
EXACTF	EXACT, str	Match this string, folded (prec. by length).
EXACTFL	EXACT, str	Match this string, folded in locale (w/len).

#*Do nothing types (36..37)

NOTHING	NOTHING,no	Match empty string.
# A variant of above which delimits a group, thus stops optimizations
TAIL	NOTHING,no	Match empty string. Can jump here from outside.

#*Loops (38..44)

# STAR,PLUS	'?', and complex '*' and '+', are implemented as circular
#		BRANCH structures using BACK. Simple cases (one character
#		per match) are implemented with STAR and PLUS for speed
#		and to minimize recursive plunges.
#
STAR	STAR, node	Match this (simple) thing 0 or more times.
PLUS	PLUS, node	Match this (simple) thing 1 or more times.

CURLY	CURLY, sv 2	Match this simple thing {n,m} times.
CURLYN	CURLY, no 2	Match next-after-this simple thing
#			{n,m} times, set parens.
CURLYM	CURLY, no 2	Match this medium-complex thing {n,m} times.
CURLYX	CURLY, sv 2	Match this complex thing {n,m} times.

# This terminator creates a loop structure for CURLYX
WHILEM	WHILEM, no	Do curly processing and see if rest matches.

#*Buffer related (45..49)

# OPEN,CLOSE,GROUPP	...are numbered at compile time.
OPEN	OPEN, num 1	Mark this point in input as start of #n.
CLOSE	CLOSE, num 1	Analogous to OPEN.

REF	REF, num 1	Match some already matched string
REFF	REF, num 1	Match already matched string, folded
REFFL	REF, num 1	Match already matched string, folded in loc.

#*Grouping assertions (50..54)

IFMATCH	BRANCHJ,off 1 2	Succeeds if the following matches.
UNLESSM	BRANCHJ,off 1 2	Fails if the following matches.
SUSPEND	BRANCHJ,off 1 1	"Independent" sub-RE.
IFTHEN	BRANCHJ,off 1 1	Switch, should be preceded by switcher.
GROUPP	GROUPP, num 1	Whether the group matched.

#*Support for long RE (55..56)

LONGJMP	LONGJMP,off 1 1	Jump far away.
BRANCHJ	BRANCHJ,off 1 1	BRANCH with long offset.

#*The heavy worker (57)

EVAL	EVAL, evl 1	Execute some Perl code.

#*Modifiers (58..59)

MINMOD	MINMOD, no	Next operator is not greedy.
LOGICAL	LOGICAL,no	Next opcode should set the flag only.

# This is not used yet (60)
RENUM	BRANCHJ,off 1 1	Group with independently numbered parens.

#*Trie Related (61..64)

# Behave the same as A|LIST|OF|WORDS would. The '..C' variants have
# inline charclass data (ascii only), the 'C' store it in the structure.
# NOTE: the relative order of the TRIE-like regops is significant

TRIE	TRIE, trie 1	Match many EXACT(FL?)? at once. flags==type
TRIEC	TRIE, trie charclass	Same as TRIE, but with embedded charclass data

# For start classes, contains an added fail table.
AHOCORASICK	TRIE, trie 1	Aho Corasick stclass. flags==type
AHOCORASICKC	TRIE, trie charclass	Same as AHOCORASICK, but with embedded charclass data

#*Recursion (65..66)
RECURSE	RECURSE, num/ofs 2L	recurse to paren arg1 at (signed) ofs arg2
SRECURSE	RECURSE, no	recurse to start of pattern

#*Named references (67..69)
NREF	NREF, no-sv 1	Match some already matched string
NREFF	NREF, no-sv 1	Match already matched string, folded
NREFFL	NREF, no-sv 1	Match already matched string, folded in loc.


# NEW STUFF ABOVE THIS LINE -- Please update counts below.

################################################################################

#*SPECIAL REGOPS (70, 71)

# This is not really a node, but an optimized away piece of a "long" node.
# To simplify debugging output, we mark it as if it were a node
OPTIMIZED	NOTHING,off	Placeholder for dump.

# Special opcode with the property that no opcode in a compiled program
# will ever be of this type. Thus it can be used as a flag value that
# no other opcode has been seen. END is used similarly, in that an END
# node can't be optimized. So END implies "unoptimizable" and PSEUDO means
# "not seen anything to optimize yet".
PSEUDO	PSEUDO,off	Pseudo opcode for internal use.

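The first-section row format described in the header (NAME, a tab, `TYPE, arg-description [num-args] [longjump-len]`, a tab, DESCRIPTION) can be read with a short script. The following is only a minimal sketch in Python, not the logic that regen.pl actually runs: the names `Regop`, `parse_regops` and `SAMPLE` are hypothetical, and numbering the regops from 0 in file order is an assumption inferred from the counts in the `#*` headings. Only rows that start with `#` are treated as comments here, because a description may itself contain a `#` (see OPEN).

```python
import re
from typing import List, NamedTuple, Optional


class Regop(NamedTuple):
    number: int                  # position in file order, from 0 (assumed)
    name: str
    optype: str
    arg: str                     # arg-description, e.g. "no", "sv", "off"
    num_args: Optional[str]      # kept as text: may be "1", "2L", "charclass"
    longjump_len: Optional[str]
    description: str


def parse_regops(text: str) -> List[Regop]:
    """Parse first-section rows, stopping at the line of dashes."""
    ops: List[Regop] = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped and set(stripped) == {"-"}:
            break                          # divider: second section follows
        if not stripped or stripped.startswith("#"):
            continue                       # blank row or comment row
        # Fields are separated by one or more tabs; this sketch assumes
        # every data row has all three fields.
        name, spec, description = re.split(r"\t+", stripped, maxsplit=2)
        optype, _, rest = spec.partition(",")
        fields = rest.split()
        ops.append(Regop(
            number=len(ops),
            name=name,
            optype=optype.strip(),
            arg=fields[0],
            num_args=fields[1] if len(fields) > 1 else None,
            longjump_len=fields[2] if len(fields) > 2 else None,
            description=description,
        ))
    return ops


# A few rows copied from this file, with the tabs written out explicitly.
SAMPLE = (
    "#* Exit points (0,1)\n"
    "\n"
    "END\tEND, no\tEnd of program.\n"
    "SUCCEED\tEND, no\tReturn from a subroutine, basically.\n"
    "# a comment row, skipped\n"
    "OPEN\tOPEN, num 1\tMark this point in input as start of #n.\n"
    "IFMATCH\tBRANCHJ,off 1 2\tSucceeds if the following matches.\n"
    "--------\n"
    "TRIE\tnext:FAIL\n"
)

for op in parse_regops(SAMPLE):
    print(op.number, op.name, op.optype, op.arg, op.num_args, op.longjump_len)
```

Running the same function over the whole file and comparing each regop's number with the range in its `#*` heading is a quick way to check that the counts were updated after adding a regop.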
-------------------------------------------------------------------------------
# Format for second section:
# REGOP \t typelist [ \t typelist] [# Comment]
# typelist= namelist
#         = namelist:FAIL
#         = name:count

# Anything below is a state
#
#
TRIE	next:FAIL
EVAL	AB:FAIL
CURLYX	end:FAIL
WHILEM	A_pre,A_min,A_max,B_min,B_max:FAIL
BRANCH	next:FAIL
CURLYM	A,B:FAIL
IFMATCH	A:FAIL
CURLY	B_min_known,B_min,B_max:FAIL
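
The typelist grammar of the second section can be illustrated by expanding each row into a flat list of state names. This is a sketch under stated assumptions, not the generator's actual code: the function name `expand_states` is hypothetical, and the naming scheme (`REGOP_name`, plus a `_fail` twin for `:FAIL` and numbered suffixes `1..count` for `name:count`) is an assumption that should be checked against the script regen.pl runs.

```python
import re
from typing import List


def expand_states(line: str) -> List[str]:
    """Expand one second-section row into state names (assumed naming)."""
    body = line.split("#", 1)[0].strip()   # drop the optional "# Comment"
    if not body:
        return []
    regop, *typelists = re.split(r"\s*\t+\s*", body)
    states: List[str] = []
    for typelist in typelists:
        names, _, special = typelist.partition(":")
        if special == "":
            suffixes = [""]                        # plain namelist
        elif special == "FAIL":
            suffixes = ["", "_fail"]               # namelist:FAIL
        elif special.isdigit():
            suffixes = [str(i) for i in range(1, int(special) + 1)]  # name:count
        else:
            raise ValueError(f"unknown typelist suffix ':{special}'")
        for name in names.split(","):
            for suffix in suffixes:
                states.append(f"{regop}_{name}{suffix}")
    return states


print(expand_states("TRIE\tnext:FAIL"))
print(expand_states("WHILEM\tA_pre,A_min,A_max,B_min,B_max:FAIL"))
```

Under these assumptions every name in a `:FAIL` list yields two states, so the WHILEM row above would contribute ten.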