From: Ævar Arnfjörð Bjarmason
Date: Wed, 16 May 2007 16:38:44 +0000 (+0000)
Subject: Minor perlreapi.pod cleanup
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=882227b7f0b6e1ca62725268e60a7fd0211899ca;p=p5sagit%2Fp5-mst-13.2.git

Minor perlreapi.pod cleanup

From: "Ævar Arnfjörð Bjarmason"
Message-ID: <51dd1af80705160938w13789b63m6d5f4710441ceac@mail.gmail.com>

p4raw-id: //depot/perl@31244
---

diff --git a/pod/perlreapi.pod b/pod/perlreapi.pod
index 5f9c1a2..1a170ff 100644
--- a/pod/perlreapi.pod
+++ b/pod/perlreapi.pod
@@ -46,7 +46,7 @@ to provide an extra argument to the routine holding a pointer back to
 the interpreter that is executing the regexp. So under threading all
 routines get an extra argument.
 
-The routines are as follows:
+=head1 Callbacks
 
 =head2 comp
 
@@ -142,12 +142,12 @@ Set if the pattern is L, set by Perl_pmruntime.
 
 =back
 
-In general these flags should be preserved in regex->extflags after
-compilation, although it is possible the regex includes constructs
-that changes them. The perl engine for instance may upgrade non-utf8
-strings to utf8 if the pattern includes constructs such as C<\x{...}>
-that can only match unicode values. RXf_SKIPWHITE should always be
-preserved verbatim in regex->extflags.
+In general these flags should be preserved in C<< rx->extflags >>
+after compilation, although it is possible the regex includes
+constructs that changes them. The perl engine for instance may upgrade
+non-utf8 strings to utf8 if the pattern includes constructs such as
+C<\x{...}> that can only match unicode values. RXf_SKIPWHITE should
+always be preserved verbatim in C<< regex->extflags >>.
 
 =head2 exec
 
@@ -373,11 +373,12 @@ execute patterns in various contexts such as is the pattern anchored in
 some way, or what flags were used during the compile, or whether the
 program contains special constructs that perl needs to be aware of.
-In addition it contains two fields that are intended for the private use
-of the regex engine that compiled the pattern. These are the C
-and pprivate members. The C is a void pointer to an arbitrary
-structure whose use and management is the responsibility of the compiling
-engine. perl will never modify either of these values.
+In addition it contains two fields that are intended for the private
+use of the regex engine that compiled the pattern. These are the
+C and C members. C is a void pointer to
+an arbitrary structure whose use and management is the responsibility
+of the compiling engine. perl will never modify either of these
+values.
 
     typedef struct regexp {
         /* what engine created this regexp? */
@@ -430,9 +431,7 @@ engine. perl will never modify either of these values.
 
 The fields are discussed in more detail below:
 
-=over 4
-
-=item C
+=head2 C
 
 This field points at a regexp_engine structure which contains pointers
 to the subroutines that are to be used for performing a match. It
@@ -443,16 +442,16 @@ Internally this is set to C unless a custom engine is specified in
 C<$^H{regcomp}>, perl's own set of callbacks can be accessed in the struct
 pointed to by C.
 
-=item C
+=head2 C
 
 TODO, see L
 
-=item C
+=head2 C
 
 This will be used by perl to see what flags the regexp was compiled with, this
 will normally be set to the value of the flags parameter on L.
 
-=item C C
+=head2 C C
 
 The minimum string length required for the pattern to match. This is used to
 prune the search space by not bothering to match any closer to the end of a
@@ -474,36 +473,36 @@ distinction is particularly important as the substitution logic uses the
 C to tell whether it can do in-place substition which can result in
 considerable speedup.
 
-=item C
+=head2 C
 
 Left offset from pos() to start match at.
-=item C +=head2 C TODO: document -=item C, C, and C +=head2 C, C, and C These fields are used to keep track of how many paren groups could be matched in the pattern, which was the last open paren to be entered, and which was the last close paren to be entered. -=item C +=head2 C The engine's private copy of the flags the pattern was compiled with. Usually this is the same as C unless the engine chose to modify one of them -=item C +=head2 C A void* pointing to an engine-defined data structure. The perl engine uses the C structure (see L) but a custom engine should use something else. -=item C +=head2 C TODO: document -=item C +=head2 C A C structure which defines offsets into the string being matched which correspond to the C<$&> and C<$1>, C<$2> etc. captures, the @@ -519,12 +518,12 @@ capture buffer did not match. C<< ->offs[0].start/end >> represents C<$&> (or C<${^MATCH> under C) and C<< ->offs[paren].end >> matches C<$$paren> where C<$paren >= 1>. -=item C C +=head2 C C Used for debugging purposes. C holds a copy of the pattern that was compiled and C its length. -=item C +=head2 C This is a hash used internally to track named capture buffers and their offsets. The keys are the names of the buffers the values are dualvars, @@ -533,7 +532,7 @@ pv being an embedded array of I32. The values may also be contained independently in the data array in cases where named backreferences are used. -=item C +=head2 C Holds information on the longest string that must occur at a fixed offset from the start of the pattern, and the longest string that must @@ -541,7 +540,7 @@ occur at a floating offset from the start of the pattern. Used to do Fast-Boyer-Moore searches on the string to find out if its worth using the regex engine at all, and if so where in the string to search. -=item C C C +=head2 C C C #define SAVEPVN(p,n) ((p) ? 
savepvn(p,n) : NULL) if (RX_MATCH_COPIED(ret)) @@ -554,7 +553,7 @@ Cextflags & RXf_PMf_KEEPCOPY> These are used during execution phase for managing search and replace patterns. -=item C C +=head2 C C Stores the string C stringifies to, for example C<(?-xism:eek)> in the case of C. @@ -572,26 +571,17 @@ understand some for of inline modifiers. The C in F does the stringification work. -=item C +=head2 C This stores the number of eval groups in the pattern. This is used for security purposes when embedding compiled regexes into larger patterns with C. -=item C +=head2 C The number of times the structure is referenced. When this falls to 0 the regexp is automatically freed by a call to pregfree. This should be set to 1 in each engine's L routine. -=back - -=head2 De-allocation and Cloning - -Any patch that adds data items to the REGEXP struct will need to include -changes to F (C) and F (C). This -involves freeing or cloning items in the regexp's data array based on the data -item's type. - =head1 HISTORY Originally part of L. diff --git a/regexp.h b/regexp.h index faec656..1f72112 100644 --- a/regexp.h +++ b/regexp.h @@ -55,8 +55,17 @@ typedef struct regexp_paren_pair { I32 end; } regexp_paren_pair; -/* this is ordered such that the most commonly used - fields are at the start of the struct */ +/* + The regexp/REGEXP struct, see L for further documentation + on the individual fields. The struct is ordered so that the most + commonly used fields are placed at the start. + + Any patch that adds items to this struct will need to include + changes to F (C) and F + (C). This involves freeing or cloning items in the + regexp's data array based on the data item's type. +*/ + typedef struct regexp { /* what engine created this regexp? */ const struct regexp_engine* engine;
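
Note on the reference-count rule in the perlreapi.pod hunks above: the
documentation says the count should be set to 1 in each engine's comp routine
and that the regexp is freed by pregfree once the count falls to 0. The
standalone C sketch below illustrates that ownership pattern only. It is not
Perl's implementation and does not use Perl's headers: toy_regexp, toy_comp,
toy_ref, toy_unref and toy_live are hypothetical names invented for this
example, and the struct models just three fields, assuming the engine owns
both the pattern copy and its private data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct regexp: only the fields needed to
   illustrate reference counting are modelled here. */
typedef struct toy_regexp {
    int   refcnt;    /* number of holders of this structure */
    char *precomp;   /* engine-owned copy of the pattern text */
    void *pprivate;  /* engine-private data, opaque to callers */
} toy_regexp;

/* Number of structures currently allocated (demo bookkeeping only). */
static int toy_live = 0;

/* Stand-in for an engine's comp callback: allocate the structure and
   start its reference count at 1, as the documentation requires. */
toy_regexp *toy_comp(const char *pattern)
{
    toy_regexp *rx = calloc(1, sizeof *rx);
    if (rx == NULL)
        return NULL;

    size_t len = strlen(pattern);
    rx->precomp = malloc(len + 1);
    if (rx->precomp == NULL) {
        free(rx);
        return NULL;
    }
    memcpy(rx->precomp, pattern, len + 1);

    rx->refcnt = 1;
    toy_live++;
    return rx;
}

/* Take an additional reference to an existing structure. */
toy_regexp *toy_ref(toy_regexp *rx)
{
    assert(rx->refcnt > 0);
    rx->refcnt++;
    return rx;
}

/* Drop one reference. When the count reaches 0 the structure and the
   data it owns are released. Returns 1 if the structure was freed. */
int toy_unref(toy_regexp *rx)
{
    assert(rx->refcnt > 0);
    if (--rx->refcnt > 0)
        return 0;

    free(rx->pprivate);  /* free(NULL) is a no-op */
    free(rx->precomp);
    free(rx);
    toy_live--;
    return 1;
}
```

In this sketch a caller that embeds a compiled pattern elsewhere would call
toy_ref once per extra holder and toy_unref when each holder is done; only the
last toy_unref releases the memory. The new regexp.h comment makes the matching
point for the real struct: any field added to it needs corresponding free and
clone handling.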