Commit | Line | Data |
055fd3a9 |
1 | =head1 NAME |
2 | |
3 | perldebguts - Guts of Perl debugging |
4 | |
5 | =head1 DESCRIPTION |
6 | |
ba555bf5 |
7 | This is not L<perldebug>, which tells you how to use |
74410c12 |
8 | the debugger. This manpage describes low-level details concerning |
9 | the debugger's internals, which range from difficult to impossible |
10 | to understand for anyone who isn't incredibly intimate with Perl's guts. |
11 | Caveat lector. |
055fd3a9 |
12 | |
13 | =head1 Debugger Internals |
14 | |
15 | Perl has special debugging hooks at compile-time and run-time used |
16 | to create debugging environments. These hooks are not to be confused |
4375e838 |
17 | with the I<perl -Dxxx> command described in L<perlrun>, which is |
18 | usable only if a special Perl is built per the instructions in the |
055fd3a9 |
19 | F<INSTALL> podpage in the Perl source tree. |
20 | |
21 | For example, whenever you call Perl's built-in C<caller> function |
74410c12 |
22 | from the package C<DB>, the arguments that the corresponding stack |
23 | frame was called with are copied to the C<@DB::args> array. These |
24 | mechanisms are enabled by calling Perl with the B<-d> switch. |
25 | Specifically, the following additional features are enabled |
26 | (cf. L<perlvar/$^P>): |
055fd3a9 |
27 | |
13a2d996 |
28 | =over 4 |
055fd3a9 |
29 | |
30 | =item * |
31 | |
32 | Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require |
33 | 'perl5db.pl'}> if not present) before the first line of your program. |
34 | |
35 | =item * |
36 | |
aa0b556f |
37 | Each array C<@{"_<$filename"}> holds the lines of $filename for a |
74410c12 |
38 | file compiled by Perl. The same is also true for C<eval>ed strings |
39 | that contain subroutines, or which are currently being executed. |
40 | The $filename for C<eval>ed strings looks like C<(eval 34)>. |
41 | Code assertions in regexes look like C<(re_eval 19)>. |
8894c26d |
42 | |
43 | Values in this array are magical in numeric context: they compare |
44 | equal to zero only if the line is not breakable. |
055fd3a9 |
45 | |
46 | =item * |
47 | |
aa0b556f |
48 | Each hash C<%{"_<$filename"}> contains breakpoints and actions keyed |
055fd3a9 |
49 | by line number. Individual entries (as opposed to the whole hash) |
50 | are settable. Perl only cares about Boolean true here, although |
51 | the values used by F<perl5db.pl> have the form |
8894c26d |
52 | C<"$break_condition\0$action">. |
055fd3a9 |
53 | |
54 | The same holds for evaluated strings that contain subroutines, or |
55 | which are currently being executed. The $filename for C<eval>ed strings |
56 | looks like C<(eval 34)> or C<(re_eval 19)>. |
57 | |
58 | =item * |
59 | |
aa0b556f |
60 | Each scalar C<${"_<$filename"}> contains C<"_<$filename">. This is |
055fd3a9 |
61 | also the case for evaluated strings that contain subroutines, or |
62 | which are currently being executed. The $filename for C<eval>ed |
63 | strings looks like C<(eval 34)> or C<(re_eval 19)>. |
64 | |
65 | =item * |
66 | |
67 | After each C<require>d file is compiled, but before it is executed, |
68 | C<DB::postponed(*{"_<$filename"})> is called if the subroutine |
69 | C<DB::postponed> exists. Here, the $filename is the expanded name of |
70 | the C<require>d file, as found in the values of %INC. |
71 | |
72 | =item * |
73 | |
74 | After each subroutine C<subname> is compiled, the existence of |
75 | C<$DB::postponed{subname}> is checked. If this key exists, |
76 | C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine |
77 | also exists. |
78 | |
79 | =item * |
80 | |
81 | A hash C<%DB::sub> is maintained, whose keys are subroutine names |
82 | and whose values have the form C<filename:startline-endline>. |
83 | C<filename> has the form C<(eval 34)> for subroutines defined inside |
84 | C<eval>s, or C<(re_eval 19)> for those within regex code assertions. |
85 | |
86 | =item * |
87 | |
88 | When the execution of your program reaches a point that can hold a |
74410c12 |
89 | breakpoint, the C<DB::DB()> subroutine is called if any of the variables |
90 | C<$DB::trace>, C<$DB::single>, or C<$DB::signal> is true. These variables |
055fd3a9 |
91 | are not C<local>izable. This feature is disabled when executing |
92 | inside C<DB::DB()>, including functions called from it |
93 | unless C<< $^D & (1<<30) >> is true. |
94 | |
95 | =item * |
96 | |
97 | When execution of the program reaches a subroutine call, a call to |
98 | C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the |
74410c12 |
99 | name of the called subroutine. (This doesn't happen if the subroutine |
055fd3a9 |
100 | was compiled in the C<DB> package.) |
101 | |
102 | =back |
103 | |
104 | Note that if C<&DB::sub> needs external data for it to work, no |
74410c12 |
105 | subroutine call is possible without it. As an example, the standard |
106 | debugger's C<&DB::sub> depends on the C<$DB::deep> variable |
107 | (it defines how many levels of recursion deep into the debugger you can go |
108 | before a mandatory break). If C<$DB::deep> is not defined, subroutine |
109 | calls are not possible, even though C<&DB::sub> exists. |
055fd3a9 |
110 | |
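As a rough illustration of the bookkeeping described above, the following sketch runs a child perl under B<-d> with a tiny debugger supplied through C<PERL5DB> and prints the C<%DB::sub> entries for package C<main>. The child program, the subroutine names C<foo> and C<bar>, and the report format are assumptions made up for this example. The exact line ranges depend on the program being debugged.

```perl
use strict;
use warnings;

# Hypothetical one-line debugger: it does nothing at breakpoints, and at
# program end it dumps the %DB::sub entries that belong to package main.
local $ENV{PERL5DB} =
    'sub DB::DB {} '
  . 'END { print "$_ => $DB::sub{$_}\n" for sort grep { /^main::/ } keys %DB::sub }';

# Hypothetical program to be "debugged".
my $program = "sub foo { 1 }\nsub bar { 2 }\nfoo(); bar();\n";

# Run the same perl binary with -d; the list form of open avoids shell quoting.
open(my $child, '-|', $^X, '-d', '-e', $program)
    or die "cannot run $^X: $!";
my $report = do { local $/; <$child> };
close $child;

print $report;
```

Each reported value should have the C<filename:startline-endline> form described above, with C<-e> as the file name here.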
111 | =head2 Writing Your Own Debugger |
112 | |
74410c12 |
113 | =head3 Environment Variables |
666f95b9 |
114 | |
74410c12 |
115 | The C<PERL5DB> environment variable can be used to define a debugger. |
116 | For example, the minimal "working" debugger (it actually doesn't do anything) |
117 | consists of one line: |
666f95b9 |
118 | |
055fd3a9 |
119 | sub DB::DB {} |
120 | |
74410c12 |
121 | It can easily be defined like this: |
666f95b9 |
122 | |
055fd3a9 |
123 | $ PERL5DB="sub DB::DB {}" perl -d your-script |
124 | |
74410c12 |
125 | Another brief debugger, slightly more useful, can be created |
055fd3a9 |
126 | with only the line: |
127 | |
128 | sub DB::DB {print ++$i; scalar <STDIN>} |
129 | |
74410c12 |
130 | This debugger prints a number which increments for each statement |
131 | encountered and waits for you to hit a newline before continuing |
132 | to the next statement. |
666f95b9 |
133 | |
74410c12 |
134 | The following debugger is actually useful: |
666f95b9 |
135 | |
055fd3a9 |
136 | { |
137 | package DB; |
138 | sub DB {} |
139 | sub sub {print ++$i, " $sub\n"; &$sub} |
140 | } |
141 | |
74410c12 |
142 | It prints the sequence number of each subroutine call and the name of the |
143 | called subroutine. Note that C<&DB::sub> is being compiled into the |
144 | package C<DB> through the use of the C<package> directive. |
055fd3a9 |
145 | |
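A slightly extended variant of the debugger above can be sketched as follows. It indents each subroutine name by call depth. The C<$depth> variable, the child program, and the names C<outer> and C<inner> are hypothetical and exist only for this illustration. The debugger source is flattened to one line before being placed in C<PERL5DB>, which is a cautious choice rather than a documented requirement.

```perl
use strict;
use warnings;

# Hypothetical debugger source; both subs are compiled in package DB,
# so calls to them are not themselves routed through &DB::sub.
my $debugger = <<'END_DEBUGGER';
{
    package DB;
    our ($sub, $depth);
    sub DB {}
    sub sub {
        print '  ' x ($depth || 0), "-> $sub\n";
        local $depth = ($depth || 0) + 1;
        &$sub;
    }
}
END_DEBUGGER
$debugger =~ tr/\n/ /;    # flatten to a single line
local $ENV{PERL5DB} = $debugger;

# Hypothetical program with one nested call.
my $program = 'sub inner { 1 } sub outer { inner() } outer();';

open(my $child, '-|', $^X, '-d', '-e', $program)
    or die "cannot run $^X: $!";
my $trace = do { local $/; <$child> };
close $child;

print $trace;
```

Because C<&$sub> is called with the current C<@_> and is the last statement, the arguments and calling context of the original call are passed through unchanged.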
74410c12 |
146 | When it starts, the debugger reads your rc file (F<./.perldb> or |
147 | F<~/.perldb> under Unix), which can set important options. |
148 | (A subroutine (C<&afterinit>) can be defined here as well; it is executed |
149 | after the debugger completes its own initialization.) |
055fd3a9 |
150 | |
151 | After the rc file is read, the debugger reads the PERLDB_OPTS |
74410c12 |
152 | environment variable and uses it to set debugger options. The |
153 | contents of this variable are treated as if they were the argument |
492652be |
154 | of an C<o ...> debugger command (q.v. in L<perldebug/Options>). |
74410c12 |
155 | |
156 | =head3 Debugger internal variables |
25cf7dea |
157 | |
74410c12 |
158 | In addition to the file and subroutine-related variables mentioned above, |
159 | the debugger also maintains various magical internal variables. |
160 | |
161 | =over 4 |
162 | |
163 | =item * |
055fd3a9 |
164 | |
74410c12 |
165 | C<@DB::dbline> is an alias for C<@{"::_<current_file"}>, which |
166 | holds the lines of the currently-selected file (compiled by Perl), either |
167 | explicitly chosen with the debugger's C<f> command, or implicitly by flow |
168 | of execution. |
169 | |
170 | Values in this array are magical in numeric context: they compare |
171 | equal to zero only if the line is not breakable. |
172 | |
173 | =item * |
174 | |
175 | C<%DB::dbline> is an alias for C<%{"::_<current_file"}>, which |
176 | contains breakpoints and actions keyed by line number in |
177 | the currently-selected file, either explicitly chosen with the |
055fd3a9 |
178 | debugger's C<f> command, or implicitly by flow of execution. |
179 | |
74410c12 |
180 | As previously noted, individual entries (as opposed to the whole hash) |
181 | are settable. Perl only cares about Boolean true here, although |
182 | the values used by F<perl5db.pl> have the form |
183 | C<"$break_condition\0$action">. |
184 | |
185 | =back |
186 | |
7eabac42 |
187 | =head3 Debugger customization functions |
74410c12 |
188 | |
189 | Some functions are provided to simplify customization. |
190 | |
191 | =over 4 |
192 | |
193 | =item * |
194 | |
71110851 |
195 | See L<perldebug/"Configurable Options"> for a description of options parsed by |
196 | C<DB::parse_options(string)>. |
74410c12 |
197 | |
198 | =item * |
199 | |
200 | C<DB::dump_trace(skip[,count])> skips the specified number of frames |
201 | and returns a list containing information about the calling frames (all |
202 | of them, if C<count> is missing). Each entry is a reference to a hash |
203 | with keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine |
055fd3a9 |
204 | name, or info about C<eval>), C<args> (C<undef> or a reference to |
205 | an array), C<file>, and C<line>. |
206 | |
74410c12 |
207 | =item * |
208 | |
209 | C<DB::print_trace(FH, skip[, count[, short]])> prints |
055fd3a9 |
210 | formatted info about caller frames. The last two functions may be |
211 | convenient as arguments to C<< < >>, C<< << >> commands. |
212 | |
74410c12 |
213 | =back |
214 | |
055fd3a9 |
215 | Note that any variables and functions that are not documented in |
216 | this manpage (or in L<perldebug>) are considered for internal |
217 | use only, and as such are subject to change without notice. |
218 | |
219 | =head1 Frame Listing Output Examples |
220 | |
221 | The C<frame> option can be used to control the output of frame |
222 | information. For example, contrast this expression trace: |
223 | |
224 | $ perl -de 42 |
225 | Stack dump during die enabled outside of evals. |
226 | |
227 | Loading DB routines from perl5db.pl patch level 0.94 |
228 | Emacs support available. |
229 | |
230 | Enter h or `h h' for help. |
231 | |
232 | main::(-e:1): 0 |
233 | DB<1> sub foo { 14 } |
234 | |
235 | DB<2> sub bar { 3 } |
236 | |
237 | DB<3> t print foo() * bar() |
238 | main::((eval 172):3): print foo() * bar(); |
239 | main::foo((eval 168):2): |
240 | main::bar((eval 170):2): |
241 | 42 |
242 | |
492652be |
243 | with this one, once the C<o>ption C<frame=2> has been set: |
055fd3a9 |
244 | |
492652be |
245 | DB<4> o f=2 |
055fd3a9 |
246 | frame = '2' |
247 | DB<5> t print foo() * bar() |
248 | 3: foo() * bar() |
249 | entering main::foo |
250 | 2: sub foo { 14 }; |
251 | exited main::foo |
252 | entering main::bar |
253 | 2: sub bar { 3 }; |
254 | exited main::bar |
255 | 42 |
256 | |
257 | By way of demonstration, we present below a laborious listing |
258 | resulting from setting your C<PERLDB_OPTS> environment variable to |
259 | the value C<f=n N>, and running I<perl -d -V> from the command line. |
260 | Examples using various values of C<n> are shown to give you a feel |
261 | for the difference between settings. Long though it may be, this |
262 | is not a complete listing, but only excerpts. |
263 | |
264 | =over 4 |
265 | |
266 | =item 1 |
267 | |
268 | entering main::BEGIN |
269 | entering Config::BEGIN |
270 | Package lib/Exporter.pm. |
271 | Package lib/Carp.pm. |
272 | Package lib/Config.pm. |
273 | entering Config::TIEHASH |
274 | entering Exporter::import |
275 | entering Exporter::export |
276 | entering Config::myconfig |
277 | entering Config::FETCH |
278 | entering Config::FETCH |
279 | entering Config::FETCH |
280 | entering Config::FETCH |
281 | |
282 | =item 2 |
283 | |
284 | entering main::BEGIN |
285 | entering Config::BEGIN |
286 | Package lib/Exporter.pm. |
287 | Package lib/Carp.pm. |
288 | exited Config::BEGIN |
289 | Package lib/Config.pm. |
290 | entering Config::TIEHASH |
291 | exited Config::TIEHASH |
292 | entering Exporter::import |
293 | entering Exporter::export |
294 | exited Exporter::export |
295 | exited Exporter::import |
296 | exited main::BEGIN |
297 | entering Config::myconfig |
298 | entering Config::FETCH |
299 | exited Config::FETCH |
300 | entering Config::FETCH |
301 | exited Config::FETCH |
302 | entering Config::FETCH |
303 | |
d5e42f17 |
304 | =item 3 |
055fd3a9 |
305 | |
306 | in $=main::BEGIN() from /dev/null:0 |
307 | in $=Config::BEGIN() from lib/Config.pm:2 |
308 | Package lib/Exporter.pm. |
309 | Package lib/Carp.pm. |
310 | Package lib/Config.pm. |
311 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
312 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
313 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li |
314 | in @=Config::myconfig() from /dev/null:0 |
315 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
316 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
317 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
318 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 |
319 | in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574 |
320 | in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574 |
321 | |
d5e42f17 |
322 | =item 4 |
055fd3a9 |
323 | |
324 | in $=main::BEGIN() from /dev/null:0 |
325 | in $=Config::BEGIN() from lib/Config.pm:2 |
326 | Package lib/Exporter.pm. |
327 | Package lib/Carp.pm. |
328 | out $=Config::BEGIN() from lib/Config.pm:0 |
329 | Package lib/Config.pm. |
330 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
331 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 |
332 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
333 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ |
334 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ |
335 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
336 | out $=main::BEGIN() from /dev/null:0 |
337 | in @=Config::myconfig() from /dev/null:0 |
338 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
339 | out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
340 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
341 | out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
342 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
343 | out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
344 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 |
345 | |
d5e42f17 |
346 | =item 5 |
055fd3a9 |
347 | |
348 | in $=main::BEGIN() from /dev/null:0 |
349 | in $=Config::BEGIN() from lib/Config.pm:2 |
350 | Package lib/Exporter.pm. |
351 | Package lib/Carp.pm. |
352 | out $=Config::BEGIN() from lib/Config.pm:0 |
353 | Package lib/Config.pm. |
354 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
355 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 |
356 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
357 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E |
358 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E |
359 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
360 | out $=main::BEGIN() from /dev/null:0 |
361 | in @=Config::myconfig() from /dev/null:0 |
362 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 |
363 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 |
364 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 |
365 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 |
366 | |
d5e42f17 |
367 | =item 6 |
055fd3a9 |
368 | |
369 | in $=CODE(0x15eca4)() from /dev/null:0 |
370 | in $=CODE(0x182528)() from lib/Config.pm:2 |
371 | Package lib/Exporter.pm. |
372 | out $=CODE(0x182528)() from lib/Config.pm:0 |
373 | scalar context return from CODE(0x182528): undef |
374 | Package lib/Config.pm. |
375 | in $=Config::TIEHASH('Config') from lib/Config.pm:628 |
376 | out $=Config::TIEHASH('Config') from lib/Config.pm:628 |
377 | scalar context return from Config::TIEHASH: empty hash |
378 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
379 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 |
380 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 |
381 | scalar context return from Exporter::export: '' |
382 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
383 | scalar context return from Exporter::import: '' |
384 | |
385 | =back |
386 | |
387 | In all cases shown above, the line indentation shows the call tree. |
388 | If bit 2 of C<frame> is set, a line is printed on exit from a |
389 | subroutine as well. If bit 4 is set, the arguments are printed |
390 | along with the caller info. If bit 8 is set, the arguments are |
391 | printed even if they are tied or references. If bit 16 is set, the |
392 | return value is printed, too. |
393 | |
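The bit values listed above can be decoded mechanically. The following is a minimal sketch, not part of the debugger itself: the C<describe_frame> helper and the wording of the descriptions are made up for this illustration.

```perl
use strict;
use warnings;

# Bit values of the frame option, as described in the text above.
my @FRAME_BITS = (
    [ 2  => 'print a line on subroutine exit' ],
    [ 4  => 'print arguments along with the caller info' ],
    [ 8  => 'print arguments even if tied or references' ],
    [ 16 => 'print the return value' ],
);

# Hypothetical helper: list the effects enabled by a given frame value.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

for my $frame (2, 6, 14, 30) {
    print "frame=$frame: ", join('; ', describe_frame($frame)), "\n";
}
```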
394 | When a package is compiled, a line like this |
395 | |
396 | Package lib/Carp.pm. |
397 | |
398 | is printed with proper indentation. |
399 | |
400 | =head1 Debugging regular expressions |
401 | |
402 | There are two ways to enable debugging output for regular expressions. |
403 | |
404 | If your perl is compiled with C<-DDEBUGGING>, you may use the |
405 | B<-Dr> flag on the command line. |
406 | |
407 | Otherwise, one can C<use re 'debug'>, which has effects at |
408 | compile time and run time. It is not lexically scoped. |
409 | |
410 | =head2 Compile-time output |
411 | |
412 | The debugging output at compile time looks like this: |
413 | |
1c102323 |
414 | Compiling REx `[bc]d(ef*g)+h[ij]k$' |
415 | size 45 Got 364 bytes for offset annotations. |
416 | first at 1 |
417 | rarest char g at 0 |
418 | rarest char d at 0 |
419 | 1: ANYOF[bc](12) |
420 | 12: EXACT <d>(14) |
421 | 14: CURLYX[0] {1,32767}(28) |
422 | 16: OPEN1(18) |
423 | 18: EXACT <e>(20) |
424 | 20: STAR(23) |
425 | 21: EXACT <f>(0) |
426 | 23: EXACT <g>(25) |
427 | 25: CLOSE1(27) |
428 | 27: WHILEM[1/1](0) |
429 | 28: NOTHING(29) |
430 | 29: EXACT <h>(31) |
431 | 31: ANYOF[ij](42) |
432 | 42: EXACT <k>(44) |
433 | 44: EOL(45) |
434 | 45: END(0) |
435 | anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) |
436 | stclass `ANYOF[bc]' minlen 7 |
437 | Offsets: [45] |
438 | 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] |
439 | 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] |
440 | 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] |
441 | 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] |
442 | Omitting $` $& $' support. |
055fd3a9 |
443 | |
444 | The first line shows the pre-compiled form of the regex. The second |
445 | shows the size of the compiled form (in arbitrary units, usually |
1c102323 |
446 | 4-byte words) and the total number of bytes allocated for the |
447 | offset/length table, usually 4+C<size>*8. The next line shows the |
448 | label I<id> of the first node that does a match. |
055fd3a9 |
449 | |
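For the example above, the usual relation between the reported size and the bytes allocated for the offset/length table can be checked with a few lines of Perl. This is only a sketch: the variable names are made up, and the input is the second line of the sample output.

```perl
use strict;
use warnings;

# Second line of the compile-time output shown above.
my $line = 'size 45 Got 364 bytes for offset annotations.';

my ($size, $bytes) = $line =~ /^size (\d+) Got (\d+) bytes/
    or die "unexpected format: $line";

# The text gives the usual table size as 4 + size * 8.
my $expected = 4 + $size * 8;
printf "size=%d bytes=%d expected=%d\n", $size, $bytes, $expected;
```

Here 4 + 45 * 8 is 364, which agrees with the figure in the sample output.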
1c102323 |
450 | The |
451 | |
452 | anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) |
453 | stclass `ANYOF[bc]' minlen 7 |
454 | |
455 | line (split into two lines above) contains optimizer |
055fd3a9 |
456 | information. In the example shown, the optimizer found that the match |
457 | should contain a substring C<de> at offset 1, plus substring C<gh> |
458 | at some offset between 3 and infinity. Moreover, when checking for |
459 | these substrings (to abandon impossible matches quickly), Perl will check |
460 | for the substring C<gh> before checking for the substring C<de>. The |
461 | optimizer may also use the knowledge that the match starts (at the |
1c102323 |
462 | C<first> I<id>) with a character class, and no string |
463 | shorter than 7 characters can possibly match. |
055fd3a9 |
464 | |
1c102323 |
465 | The fields of interest which may appear in this line are |
055fd3a9 |
466 | |
13a2d996 |
467 | =over 4 |
055fd3a9 |
468 | |
469 | =item C<anchored> I<STRING> C<at> I<POS> |
470 | |
471 | =item C<floating> I<STRING> C<at> I<POS1..POS2> |
472 | |
473 | See above. |
474 | |
475 | =item C<checking floating/anchored> |
476 | |
477 | Which substring to check first. |
478 | |
479 | =item C<minlen> |
480 | |
481 | The minimal length of the match. |
482 | |
483 | =item C<stclass> I<TYPE> |
484 | |
485 | Type of first matching node. |
486 | |
487 | =item C<noscan> |
488 | |
489 | Don't scan for the found substrings. |
490 | |
491 | =item C<isall> |
492 | |
1c102323 |
493 | Means that the optimizer information is all that the regular |
055fd3a9 |
494 | expression contains, and thus one does not need to enter the regex engine at |
495 | all. |
496 | |
497 | =item C<GPOS> |
498 | |
499 | Set if the pattern contains C<\G>. |
500 | |
501 | =item C<plus> |
502 | |
503 | Set if the pattern starts with a repeated char (as in C<x+y>). |
504 | |
505 | =item C<implicit> |
506 | |
507 | Set if the pattern starts with C<.*>. |
508 | |
509 | =item C<with eval> |
510 | |
511 | Set if the pattern contains eval-groups, such as C<(?{ code })> and |
512 | C<(??{ code })>. |
513 | |
514 | =item C<anchored(TYPE)> |
515 | |
516 | If the pattern may match only at a handful of places, with C<TYPE> |
517 | being C<BOL>, C<MBOL>, or C<GPOS>. See the table below. |
518 | |
519 | =back |
520 | |
521 | If a substring is known to match at end-of-line only, it may be |
522 | followed by C<$>, as in C<floating `k'$>. |
523 | |
1c102323 |
524 | The optimizer-specific information is used to avoid entering (a slow) regex |
525 | engine on strings that definitely will not match. If the C<isall> flag |
055fd3a9 |
526 | is set, a call to the regex engine may be avoided even when the optimizer |
527 | found an appropriate place for the match. |
528 | |
1c102323 |
529 | Above the optimizer section is the list of I<nodes> of the compiled |
055fd3a9 |
530 | form of the regex. Each line has format |
531 | |
532 | C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>) |
533 | |
534 | =head2 Types of nodes |
535 | |
536 | Here are the possible types, with short descriptions: |
537 | |
538 | # TYPE arg-description [num-args] [longjump-len] DESCRIPTION |
539 | |
540 | # Exit points |
541 | END no End of program. |
542 | SUCCEED no Return from a subroutine, basically. |
543 | |
544 | # Anchors: |
545 | BOL no Match "" at beginning of line. |
546 | MBOL no Same, assuming multiline. |
547 | SBOL no Same, assuming singleline. |
548 | EOS no Match "" at end of string. |
549 | EOL no Match "" at end of line. |
550 | MEOL no Same, assuming multiline. |
551 | SEOL no Same, assuming singleline. |
552 | BOUND no Match "" at any word boundary |
553 | BOUNDL no Match "" at any word boundary |
554 | NBOUND no Match "" at any word non-boundary |
555 | NBOUNDL no Match "" at any word non-boundary |
556 | GPOS no Matches where last m//g left off. |
557 | |
558 | # [Special] alternatives |
559 | ANY no Match any one character (except newline). |
560 | SANY no Match any one character. |
561 | ANYOF sv Match character in (or not in) this class. |
562 | ALNUM no Match any alphanumeric character |
563 | ALNUML no Match any alphanumeric char in locale |
564 | NALNUM no Match any non-alphanumeric character |
565 | NALNUML no Match any non-alphanumeric char in locale |
566 | SPACE no Match any whitespace character |
567 | SPACEL no Match any whitespace char in locale |
568 | NSPACE no Match any non-whitespace character |
569 | NSPACEL no Match any non-whitespace char in locale |
570 | DIGIT no Match any numeric character |
571 | NDIGIT no Match any non-numeric character |
572 | |
573 | # BRANCH The set of branches constituting a single choice are hooked |
574 | # together with their "next" pointers, since precedence prevents |
575 | # anything being concatenated to any individual branch. The |
576 | # "next" pointer of the last BRANCH in a choice points to the |
577 | # thing following the whole choice. This is also where the |
578 | # final "next" pointer of each individual branch points; each |
579 | # branch starts with the operand node of a BRANCH node. |
580 | # |
581 | BRANCH node Match this alternative, or the next... |
582 | |
583 | # BACK Normal "next" pointers all implicitly point forward; BACK |
584 | # exists to make loop structures possible. |
585 | # not used |
586 | BACK no Match "", "next" ptr points backward. |
587 | |
588 | # Literals |
589 | EXACT sv Match this string (preceded by length). |
590 | EXACTF sv Match this string, folded (prec. by length). |
591 | EXACTFL sv Match this string, folded in locale (w/len). |
592 | |
593 | # Do nothing |
594 | NOTHING no Match empty string. |
595 | # A variant of above which delimits a group, thus stops optimizations |
596 | TAIL no Match empty string. Can jump here from outside. |
597 | |
598 | # STAR,PLUS '?', and complex '*' and '+', are implemented as circular |
599 | # BRANCH structures using BACK. Simple cases (one character |
600 | # per match) are implemented with STAR and PLUS for speed |
601 | # and to minimize recursive plunges. |
602 | # |
603 | STAR node Match this (simple) thing 0 or more times. |
604 | PLUS node Match this (simple) thing 1 or more times. |
605 | |
606 | CURLY sv 2 Match this simple thing {n,m} times. |
607 | CURLYN no 2 Match next-after-this simple thing |
608 | # {n,m} times, set parens. |
609 | CURLYM no 2 Match this medium-complex thing {n,m} times. |
610 | CURLYX sv 2 Match this complex thing {n,m} times. |
611 | |
612 | # This terminator creates a loop structure for CURLYX |
613 | WHILEM no Do curly processing and see if rest matches. |
614 | |
615 | # OPEN,CLOSE,GROUPP ...are numbered at compile time. |
616 | OPEN num 1 Mark this point in input as start of #n. |
617 | CLOSE num 1 Analogous to OPEN. |
618 | |
619 | REF num 1 Match some already matched string |
620 | REFF num 1 Match already matched string, folded |
621 | REFFL num 1 Match already matched string, folded in loc. |
622 | |
623 | # grouping assertions |
624 | IFMATCH off 1 2 Succeeds if the following matches. |
625 | UNLESSM off 1 2 Fails if the following matches. |
626 | SUSPEND off 1 1 "Independent" sub-regex. |
627 | IFTHEN off 1 1 Switch, should be preceded by switcher. |
628 | GROUPP num 1 Whether the group matched. |
629 | |
630 | # Support for long regex |
631 | LONGJMP off 1 1 Jump far away. |
632 | BRANCHJ off 1 1 BRANCH with long offset. |
633 | |
634 | # The heavy worker |
635 | EVAL evl 1 Execute some Perl code. |
636 | |
637 | # Modifiers |
638 | MINMOD no Next operator is not greedy. |
639 | LOGICAL no Next opcode should set the flag only. |
640 | |
641 | # This is not used yet |
642 | RENUM off 1 1 Group with independently numbered parens. |
643 | |
644 | # This is not really a node, but an optimized away piece of a "long" node. |
645 | # To simplify debugging output, we mark it as if it were a node |
646 | OPTIMIZED off Placeholder for dump. |
647 | |
1c102323 |
648 | =for unprinted-credits |
649 | Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421 |
650 | |
651 | Following the optimizer information is a dump of the offset/length |
652 | table, here split across several lines: |
653 | |
654 | Offsets: [45] |
655 | 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] |
656 | 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] |
657 | 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] |
658 | 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] |
659 | |
660 | The first line here indicates that the offset/length table contains 45 |
661 | entries. Each entry is a pair of integers, denoted by C<offset[length]>. |
17c338f3 |
662 | Entries are numbered starting with 1, so entry #1 here is C<1[4]> and |
1c102323 |
663 | entry #12 is C<5[1]>. C<1[4]> indicates that the node labeled C<1:> |
664 | (the C<1: ANYOF[bc]>) begins at character position 1 in the |
665 | pre-compiled form of the regex, and has a length of 4 characters. |
666 | C<5[1]> in position 12 |
667 | indicates that the node labeled C<12:> |
668 | (the C<< 12: EXACT <d> >>) begins at character position 5 in the |
669 | pre-compiled form of the regex, and has a length of 1 character. |
670 | C<12[1]> in position 14 |
671 | indicates that the node labeled C<14:> |
672 | (the C<< 14: CURLYX[0] {1,32767} >>) begins at character position 12 in the |
673 | pre-compiled form of the regex, and has a length of 1 character---that |
674 | is, it corresponds to the C<+> symbol in the precompiled regex. |
675 | |
676 | C<0[0]> items indicate that there is no corresponding node. |
677 | |
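To make the correspondence concrete, the sketch below maps each non-empty entry of the table above back to the piece of the pre-compiled regex it covers. The variable names are made up for this illustration, and positions are treated as 1-based, as in the description above.

```perl
use strict;
use warnings;

my $regex = '[bc]d(ef*g)+h[ij]k$';

# The offset/length table from the sample output above.
my $table = <<'END_TABLE';
1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
END_TABLE

my %text_for;    # node id => corresponding text in the regex source
my $node = 0;
while ($table =~ /(\d+)\[(\d+)\]/g) {
    my ($offset, $length) = ($1, $2);
    $node++;                                      # entries are numbered from 1
    next if $offset == 0 && $length == 0;         # 0[0]: no corresponding node
    $text_for{$node} = substr($regex, $offset - 1, $length);
}

printf "%2d: '%s'\n", $_, $text_for{$_} for sort { $a <=> $b } keys %text_for;
```

Entries with a length of zero, such as C<12[0]>, map to an empty string at the given position.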
055fd3a9 |
678 | =head2 Run-time output |
679 | |
680 | First of all, when doing a match, one may get no run-time output even |
681 | if debugging is enabled. This means that the regex engine was never |
682 | entered and that all of the job was therefore done by the optimizer. |
683 | |
684 | If the regex engine was entered, the output may look like this: |
685 | |
686 | Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__' |
687 | Setting an EVAL scope, savestack=3 |
688 | 2 <ab> <cdefg__gh_> | 1: ANYOF |
689 | 3 <abc> <defg__gh_> | 11: EXACT <d> |
690 | 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767} |
691 | 4 <abcd> <efg__gh_> | 26: WHILEM |
692 | 0 out of 1..32767 cc=effff31c |
693 | 4 <abcd> <efg__gh_> | 15: OPEN1 |
694 | 4 <abcd> <efg__gh_> | 17: EXACT <e> |
695 | 5 <abcde> <fg__gh_> | 19: STAR |
696 | EXACT <f> can match 1 times out of 32767... |
697 | Setting an EVAL scope, savestack=3 |
698 | 6 <bcdef> <g__gh__> | 22: EXACT <g> |
699 | 7 <bcdefg> <__gh__> | 24: CLOSE1 |
700 | 7 <bcdefg> <__gh__> | 26: WHILEM |
701 | 1 out of 1..32767 cc=effff31c |
702 | Setting an EVAL scope, savestack=12 |
703 | 7 <bcdefg> <__gh__> | 15: OPEN1 |
704 | 7 <bcdefg> <__gh__> | 17: EXACT <e> |
705 | restoring \1 to 4(4)..7 |
706 | failed, try continuation... |
707 | 7 <bcdefg> <__gh__> | 27: NOTHING |
708 | 7 <bcdefg> <__gh__> | 28: EXACT <h> |
709 | failed... |
710 | failed... |
711 | |
712 | The most significant information in the output is about the particular I<node> |
713 | of the compiled regex that is currently being tested against the target string. |
714 | The format of these lines is |
715 | |
716 | C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE> |
717 | |
718 | The I<TYPE> info is indented with respect to the backtracking level. |
719 | Other incidental information appears interspersed within. |
720 | |
721 | =head1 Debugging Perl memory usage |
722 | |
723 | Perl is a profligate wastrel when it comes to memory use. There |
724 | is a saying that to estimate memory usage of Perl, assume a reasonable |
725 | algorithm for memory allocation, multiply that estimate by 10, and |
726 | while you still may miss the mark, at least you won't be quite so |
4375e838 |
727 | astonished. This is not absolutely true, but may provide a good |
055fd3a9 |
728 | grasp of what happens. |
729 | |
730 | Assume that an integer cannot take less than 20 bytes of memory, a |
731 | float cannot take less than 24 bytes, a string cannot take less |
732 | than 32 bytes (all these examples assume 32-bit architectures; the |
733 | results are quite a bit worse on 64-bit architectures). If a variable |
734 | is accessed in two of three different ways (which require an integer, |
735 | a float, or a string), the memory footprint may increase yet another |
b9449ee0 |
736 | 20 bytes. A sloppy malloc(3) implementation can inflate these |
055fd3a9 |
737 | numbers dramatically. |
738 | |
739 | On the opposite end of the scale, a declaration like |
740 | |
741 | sub foo; |
742 | |
743 | may take up to 500 bytes of memory, depending on which release of Perl |
744 | you're running. |
745 | |
746 | Anecdotal estimates of source-to-compiled code bloat suggest an |
747 | eightfold increase. This means that the compiled form of reasonable |
748 | (normally commented, properly indented etc.) code will take |
749 | about eight times more space in memory than the code took |
750 | on disk. |
751 | |
b30f304a |
752 | The B<-DL> command-line switch has been obsolete since around Perl 5.6.0 |
753 | (it was available only if Perl was built with C<-DDEBUGGING>). |
754 | The switch was used to track Perl's memory allocations and possible |
755 | memory leaks. These days the use of malloc debugging tools like |
5b6a3331 |
756 | F<Purify> or F<valgrind> is suggested instead. See also |
757 | L<perlhack/PERL_MEM_LOG>. |
b30f304a |
758 | |
759 | One way to find out how much memory is being used by Perl data |
760 | structures is to install the Devel::Size module from CPAN: it gives |
761 | you the minimum number of bytes required to store a particular data |
762 | structure. Please be mindful of the difference between size() |
763 | and total_size(). |
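
A minimal sketch of how Devel::Size might be used, assuming the module is
installed from CPAN. The variable names are invented for the example, and
the byte counts it prints depend on the Perl release and architecture, so
none are quoted here. It measures a scalar before and after it has been
used as a string, then compares size() with total_size() on an array.

```perl
use strict;
use warnings;
use Devel::Size qw(size total_size);

# A scalar first used as an integer, then also as a string.
my $n = 42;
my $as_integer = size($n);
my $text = "$n";              # string context may cache a string form in $n
my $as_both = size($n);
print "scalar: $as_integer bytes as integer, $as_both after string use\n";

# size() looks at the array itself; total_size() also counts its elements.
my @list = map { "item $_" } 1 .. 100;
printf "array: size=%d total_size=%d\n", size(\@list), total_size(\@list);
```

For containers the two functions can differ widely, because total_size()
follows the contained values while size() does not.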
764 | |
765 | If Perl has been compiled using Perl's malloc you can analyze Perl |
766 | memory usage by setting $ENV{PERL_DEBUG_MSTATS}. |
055fd3a9 |
767 | |
768 | =head2 Using C<$ENV{PERL_DEBUG_MSTATS}> |
769 | |
770 | If your perl is using Perl's malloc() and was compiled with the |
771 | necessary switches (this is the default), then it will print memory |
4375e838 |
772 | usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS} |
055fd3a9 |
773 | > 1 >>, and before termination of the program when C<< |
774 | $ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to |
775 | the following example: |
776 | |
777 | $ PERL_DEBUG_MSTATS=2 perl -e "require Carp" |
778 | Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) |
779 | 14216 free: 130 117 28 7 9 0 2 2 1 0 0 |
780 | 437 61 36 0 5 |
781 | 60924 used: 125 137 161 55 7 8 6 16 2 0 1 |
782 | 74 109 304 84 20 |
783 | Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. |
784 | Memory allocation statistics after execution: (buckets 4(4)..8188(8192) |
785 | 30888 free: 245 78 85 13 6 2 1 3 2 0 1 |
786 | 315 162 39 42 11 |
787 | 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 |
788 | 196 178 1066 798 39 |
789 | Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. |
790 | |
791 | It is possible to ask for such a statistic at arbitrary points in |
b9449ee0 |
792 | your execution using the mstat() function from the standard |
055fd3a9 |
793 | Devel::Peek module. |
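
For instance, a script could bracket an allocation-heavy step with two
mstat() calls, as in the sketch below. The marker strings and the list
being built are arbitrary choices for illustration. The reports go to
standard error, and on a perl that was not built with Perl's malloc,
mstat() only prints a notice saying so, which is why the sketch prints the
C<usemymalloc> configuration value first.

```perl
use strict;
use warnings;
use Config;
use Devel::Peek qw(mstat);

# Reports are only meaningful when perl was built with its own malloc.
print "usemymalloc=$Config{usemymalloc}\n";

mstat("before building the list");
my @list = map { "item $_" } 1 .. 10_000;
mstat("after building the list");
```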
794 | |
795 | Here is some explanation of that format: |
796 | |
13a2d996 |
797 | =over 4 |
055fd3a9 |
798 | |
799 | =item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)> |
800 | |
801 | Perl's malloc() uses bucketed allocations. Every request is rounded |
802 | up to the closest bucket size available, and a bucket is taken from |
803 | the pool of buckets of that size. |
804 | |
805 | The line above describes the limits of buckets currently in use. |
806 | Each bucket has two sizes: memory footprint and the maximal size |
807 | of user data that can fit into this bucket. In the example |
808 | above, the smallest bucket has size 4. The biggest bucket |
809 | has usable size 8188, and its memory footprint is 8192. |
810 | |
811 | In a Perl built for debugging, some buckets may have negative usable |
812 | size. This means that these buckets cannot (and will not) be used. |
813 | For larger buckets, the memory footprint may be one page greater |
814 | than a power of 2. If so, the corresponding power of two is |
815 | printed in the C<APPROX> field above. |
816 | |
817 | =item Free/Used |
818 | |
819 | The 1 or 2 rows of numbers following that correspond to the number |
820 | of buckets of each size between C<SMALLEST> and C<GREATEST>. In |
821 | the first row, the sizes (memory footprints) of buckets are powers |
822 | of two--or possibly one page greater. In the second row, if present, |
823 | the memory footprints of the buckets are between the memory footprints |
824 | of two buckets "above". |
825 | |
4375e838 |
826 | For example, suppose that in the previous example the memory footprints |
055fd3a9 |
827 | were |
828 | |
829 | free: 8 16 32 64 128 256 512 1024 2048 4096 8192 |
830 | 4 12 24 48 80 |
831 | |
832 | With a non-C<DEBUGGING> perl, the buckets starting from C<128> have |
d1be9408 |
833 | a 4-byte overhead, and thus an 8192-byte bucket can hold allocations |
055fd3a9 |
834 | of up to 8188 bytes. |
835 | |
836 | =item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS> |
837 | |
838 | The first two fields give the total amount of memory perl sbrk(2)ed |
839 | (ess-broken? :-) and number of sbrk(2)s used. The third number is |
840 | what perl thinks about continuity of returned chunks. So long as |
841 | this number is positive, malloc() will assume that it is probable |
842 | that sbrk(2) will provide continuous memory. |
843 | |
844 | Memory allocated by external libraries is not counted. |
845 | |
846 | =item C<pad: 0> |
847 | |
848 | The amount of sbrk(2)ed memory needed to keep buckets aligned. |
849 | |
850 | =item C<heads: 2192> |
851 | |
852 | Although memory overhead of bigger buckets is kept inside the bucket, for |
853 | smaller buckets, it is kept in separate areas. This field gives the |
854 | total size of these areas. |
855 | |
856 | =item C<chain: 0> |
857 | |
858 | malloc() may want to subdivide a bigger bucket into smaller buckets. |
859 | If only a part of the deceased bucket is left unsubdivided, the rest |
860 | is kept as an element of a linked list. This field gives the total |
861 | size of these chunks. |
862 | |
863 | =item C<tail: 6144> |
864 | |
865 | To minimize the number of sbrk(2)s, malloc() asks for more memory than it |
866 | needs at the moment. This field gives the size of the yet unused part, |
867 | which is sbrk(2)ed but never touched. |
868 | |
869 | =back |
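
The summary fields described above can be pulled out of a report line with
a short pattern. This is only a sketch: the function name
C<parse_sbrk_line> and the hash keys are invented for the example, and the
pattern is written against the sample report shown earlier, not against a
guaranteed format.

```perl
use strict;
use warnings;

# Pull the summary fields out of a "Total sbrk()" report line.
# Returns nothing if the line does not look like one.
sub parse_sbrk_line {
    my ($line) = @_;
    return unless $line =~ m{
        Total \s sbrk\(\): \s (\d+) / (\d+) : (-?\d+) \.
        \s+ Odd \s ends: \s pad\+heads\+chain\+tail: \s
        (\d+) \+ (\d+) \+ (\d+) \+ (\d+)
    }x;
    return {
        sbrked => $1, sbrks => $2, continuous => $3,
        pad => $4, heads => $5, chain => $6, tail => $7,
    };
}

my $report = parse_sbrk_line(
    'Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.'
);

# Memory that was sbrk(2)ed but is not part of any bucket.
my $odd_ends = $report->{pad} + $report->{heads}
             + $report->{chain} + $report->{tail};
printf "%d of %d sbrk()ed bytes are odd ends (%d sbrk calls)\n",
    $odd_ends, $report->{sbrked}, $report->{sbrks};
```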
870 | |
055fd3a9 |
871 | =head1 SEE ALSO |
872 | |
873 | L<perldebug>, |
874 | L<perlguts>, |
875 | L<perlrun>, |
876 | L<re>, |
877 | and |
fe854a6f |
878 | L<Devel::DProf>. |