1 | =head1 NAME |
2 | |
3 | perldebguts - Guts of Perl debugging |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | This is not the perldebug(1) manpage, which tells you how to use |
8 | the debugger. This manpage describes low-level details ranging |
9 | between difficult and impossible for anyone who isn't incredibly |
10 | intimate with Perl's guts to understand. Caveat lector. |
11 | |
12 | =head1 Debugger Internals |
13 | |
14 | Perl has special debugging hooks at compile-time and run-time used |
15 | to create debugging environments. These hooks are not to be confused |
16 | with the I<perl -Dxxx> command described in L<perlrun>, which is |
17 | usable only if a special Perl is built per the instructions in the |
18 | F<INSTALL> podpage in the Perl source tree. |
19 | |
20 | For example, whenever you call Perl's built-in C<caller> function |
21 | from the package DB, the arguments that the corresponding stack |
22 | frame was called with are copied to the @DB::args array. The |
23 | general mechanism is enabled by calling Perl with the B<-d> switch. When it is, the |
24 | following additional features are enabled (cf. L<perlvar/$^P>): |
25 | |
26 | =over |
27 | |
28 | =item * |
29 | |
30 | Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require |
31 | 'perl5db.pl'}> if not present) before the first line of your program. |
32 | |
33 | =item * |
34 | |
35 | The array C<@{"_<$filename"}> holds the lines of $filename for all |
36 | files compiled by Perl. The same for C<eval>ed strings that contain |
37 | subroutines, or which are currently being executed. The $filename |
38 | for C<eval>ed strings looks like C<(eval 34)>. Code assertions |
39 | in regexes look like C<(re_eval 19)>. |
40 | |
41 | =item * |
42 | |
43 | The hash C<%{"_<$filename"}> contains breakpoints and actions keyed |
44 | by line number. Individual entries (as opposed to the whole hash) |
45 | are settable. Perl only cares about Boolean true here, although |
46 | the values used by F<perl5db.pl> have the form |
47 | C<"$break_condition\0$action">. Values in this hash are magical |
48 | in numeric context: they are zeros if the line is not breakable. |
49 | |
50 | The same holds for evaluated strings that contain subroutines, or |
51 | which are currently being executed. The $filename for C<eval>ed strings |
52 | looks like C<(eval 34)> or C<(re_eval 19)>. |
53 | |
54 | =item * |
55 | |
56 | The scalar C<${"_<$filename"}> contains C<"_<$filename">. This is |
57 | also the case for evaluated strings that contain subroutines, or |
58 | which are currently being executed. The $filename for C<eval>ed |
59 | strings looks like C<(eval 34)> or C<(re_eval 19)>. |
60 | |
61 | =item * |
62 | |
63 | After each C<require>d file is compiled, but before it is executed, |
64 | C<DB::postponed(*{"_<$filename"})> is called if the subroutine |
65 | C<DB::postponed> exists. Here, the $filename is the expanded name of |
66 | the C<require>d file, as found in the values of %INC. |
67 | |
68 | =item * |
69 | |
70 | After each subroutine C<subname> is compiled, the existence of |
71 | C<$DB::postponed{subname}> is checked. If this key exists, |
72 | C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine |
73 | also exists. |
74 | |
75 | =item * |
76 | |
77 | A hash C<%DB::sub> is maintained, whose keys are subroutine names |
78 | and whose values have the form C<filename:startline-endline>. |
79 | C<filename> has the form C<(eval 34)> for subroutines defined inside |
80 | C<eval>s, or C<(re_eval 19)> for those within regex code assertions. |
81 | |
82 | =item * |
83 | |
84 | When the execution of your program reaches a point that can hold a |
85 | breakpoint, the C<DB::DB()> subroutine is called if any of the variables |
86 | $DB::trace, $DB::single, or $DB::signal is true. These variables |
87 | are not C<local>izable. This feature is disabled when executing |
88 | inside C<DB::DB()>, including functions called from it |
89 | unless C<< $^D & (1<<30) >> is true. |
90 | |
91 | =item * |
92 | |
93 | When execution of the program reaches a subroutine call, a call to |
94 | C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the |
95 | name of the called subroutine. (This doesn't happen if the subroutine |
96 | was compiled in the C<DB> package.) |
97 | |
98 | =back |
99 | |
100 | Note that if C<&DB::sub> needs external data for it to work, no |
101 | subroutine call is possible until that data is available. For the standard |
102 | debugger, the C<$DB::deep> variable (how many levels of recursion |
103 | deep into the debugger you can go before a mandatory break) gives |
104 | an example of such a dependency. |
105 | |
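The C<filename:startline-endline> values stored in C<%DB::sub> can be taken apart with a single pattern match. The following is only a sketch: the helper name C<parse_sub_location> is hypothetical and is not part of any debugger API, and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Split a %DB::sub-style value ("filename:startline-endline") into its parts.
# The greedy (.*) keeps any colons that belong to the filename itself.
# parse_sub_location is a hypothetical helper, not a debugger function.
sub parse_sub_location {
    my ($value) = @_;
    my ($file, $start, $end) = $value =~ /\A(.*):(\d+)-(\d+)\z/s
        or return;
    return ($file, $start, $end);
}

# Illustrative values only; under -d you would read them from %DB::sub.
for my $value ('lib/Carp.pm:120-135', '(eval 34):1-3') {
    my ($file, $start, $end) = parse_sub_location($value);
    print "$file starts at line $start and ends at line $end\n";
}
```

Under B<-d>, the same helper could be applied to C<$DB::sub{$name}> for any subroutine name known to the debugger.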
106 | =head2 Writing Your Own Debugger |
107 | |
108 | The minimal working debugger consists of one line |
109 | |
110 | sub DB::DB {} |
111 | |
112 | which is quite handy as the contents of the C<PERL5DB> environment |
113 | variable: |
114 | |
115 | $ PERL5DB="sub DB::DB {}" perl -d your-script |
116 | |
117 | Another brief debugger, slightly more useful, could be created |
118 | with only the line: |
119 | |
120 | sub DB::DB {print ++$i; scalar <STDIN>} |
121 | |
122 | This debugger would print the sequence number of each statement |
123 | encountered, and would wait for you to hit a newline before continuing. |
124 | |
125 | The following debugger is quite functional: |
126 | |
127 | { |
128 | package DB; |
129 | sub DB {} |
130 | sub sub {print ++$i, " $sub\n"; &$sub} |
131 | } |
132 | |
133 | It prints the sequence number of each subroutine call and the name of the |
134 | called subroutine. Note that C<&DB::sub> should be compiled into the |
135 | package C<DB>. |
136 | |
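As a further sketch, a debugger can combine C<DB::DB> with the C<@{"_<$filename"}> arrays described earlier to echo each statement before it runs. The file name F<mydb.pl> is an assumption made for this example, and the array is only populated when Perl runs with B<-d>.

```perl
# Hypothetical file mydb.pl; one way to try it is
#   PERL5DB='BEGIN { require "./mydb.pl" }' perl -d your-script
package DB;

# Called before each breakable statement when running under -d.
sub DB {
    my ($package, $filename, $line) = caller;
    no strict 'refs';
    # @{"main::_<$filename"} holds the source lines, indexed by line number.
    my $text = ${"main::_<$filename"}[$line];
    $text = "\n" unless defined $text;    # no source text available
    print STDERR "$filename:$line: $text";
}

1;
```

Because the subroutine is compiled in package C<DB>, its own statements are not traced.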
137 | At the start, the debugger reads your rc file (F<./.perldb> or |
138 | F<~/.perldb> under Unix), which can set important options. This file may |
139 | define a subroutine C<&afterinit> to be executed after the debugger is |
140 | initialized. |
141 | |
142 | After the rc file is read, the debugger reads the PERLDB_OPTS |
143 | environment variable and parses this as the remainder of a C<O ...> |
144 | line as one might enter at the debugger prompt. |
145 | |
146 | The debugger also maintains magical internal variables, such as |
147 | C<@DB::dbline> and C<%DB::dbline>, which are aliases for |
148 | C<@{"::_<current_file"}> and C<%{"::_<current_file"}>. Here C<current_file> |
149 | is the currently selected file, either explicitly chosen with the |
150 | debugger's C<f> command, or implicitly by flow of execution. |
151 | |
152 | Some functions are provided to simplify customization. See |
153 | L<perldebug/"Options"> for a description of the options parsed by |
154 | C<DB::parse_options(string)>. The function C<DB::dump_trace(skip[, |
155 | count])> skips the specified number of frames and returns a list |
156 | containing information about the calling frames (all of them, if |
157 | C<count> is missing). Each entry is a reference to a hash with |
158 | keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine |
159 | name, or info about C<eval>), C<args> (C<undef> or a reference to |
160 | an array), C<file>, and C<line>. |
161 | |
162 | The function C<DB::print_trace(FH, skip[, count[, short]])> prints |
163 | formatted info about caller frames. The last two functions may be |
164 | convenient as arguments to the C<< < >> and C<< << >> commands. |
165 | |
166 | Note that any variables and functions that are not documented in |
167 | this manpage (or in L<perldebug>) are considered for internal |
168 | use only, and as such are subject to change without notice. |
169 | |
170 | =head1 Frame Listing Output Examples |
171 | |
172 | The C<frame> option can be used to control the output of frame |
173 | information. For example, contrast this expression trace: |
174 | |
175 | $ perl -de 42 |
176 | Stack dump during die enabled outside of evals. |
177 | |
178 | Loading DB routines from perl5db.pl patch level 0.94 |
179 | Emacs support available. |
180 | |
181 | Enter h or `h h' for help. |
182 | |
183 | main::(-e:1): 0 |
184 | DB<1> sub foo { 14 } |
185 | |
186 | DB<2> sub bar { 3 } |
187 | |
188 | DB<3> t print foo() * bar() |
189 | main::((eval 172):3): print foo() * bar(); |
190 | main::foo((eval 168):2): |
191 | main::bar((eval 170):2): |
192 | 42 |
193 | |
194 | with this one, once the C<O>ption C<frame=2> has been set: |
195 | |
196 | DB<4> O f=2 |
197 | frame = '2' |
198 | DB<5> t print foo() * bar() |
199 | 3: foo() * bar() |
200 | entering main::foo |
201 | 2: sub foo { 14 }; |
202 | exited main::foo |
203 | entering main::bar |
204 | 2: sub bar { 3 }; |
205 | exited main::bar |
206 | 42 |
207 | |
208 | By way of demonstration, we present below a laborious listing |
209 | resulting from setting your C<PERLDB_OPTS> environment variable to |
210 | the value C<f=n N>, and running I<perl -d -V> from the command line. |
211 | Examples using various values of C<n> are shown to give you a feel |
212 | for the difference between settings. Long though it may be, this |
213 | is not a complete listing, but only excerpts. |
214 | |
215 | =over 4 |
216 | |
217 | =item 1 |
218 | |
219 | entering main::BEGIN |
220 | entering Config::BEGIN |
221 | Package lib/Exporter.pm. |
222 | Package lib/Carp.pm. |
223 | Package lib/Config.pm. |
224 | entering Config::TIEHASH |
225 | entering Exporter::import |
226 | entering Exporter::export |
227 | entering Config::myconfig |
228 | entering Config::FETCH |
229 | entering Config::FETCH |
230 | entering Config::FETCH |
231 | entering Config::FETCH |
232 | |
233 | =item 2 |
234 | |
235 | entering main::BEGIN |
236 | entering Config::BEGIN |
237 | Package lib/Exporter.pm. |
238 | Package lib/Carp.pm. |
239 | exited Config::BEGIN |
240 | Package lib/Config.pm. |
241 | entering Config::TIEHASH |
242 | exited Config::TIEHASH |
243 | entering Exporter::import |
244 | entering Exporter::export |
245 | exited Exporter::export |
246 | exited Exporter::import |
247 | exited main::BEGIN |
248 | entering Config::myconfig |
249 | entering Config::FETCH |
250 | exited Config::FETCH |
251 | entering Config::FETCH |
252 | exited Config::FETCH |
253 | entering Config::FETCH |
254 | |
255 | =item 4 |
256 | |
257 | in $=main::BEGIN() from /dev/null:0 |
258 | in $=Config::BEGIN() from lib/Config.pm:2 |
259 | Package lib/Exporter.pm. |
260 | Package lib/Carp.pm. |
261 | Package lib/Config.pm. |
262 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
263 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
264 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li |
265 | in @=Config::myconfig() from /dev/null:0 |
266 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
267 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
268 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
269 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 |
270 | in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574 |
271 | in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574 |
272 | |
273 | =item 6 |
274 | |
275 | in $=main::BEGIN() from /dev/null:0 |
276 | in $=Config::BEGIN() from lib/Config.pm:2 |
277 | Package lib/Exporter.pm. |
278 | Package lib/Carp.pm. |
279 | out $=Config::BEGIN() from lib/Config.pm:0 |
280 | Package lib/Config.pm. |
281 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
282 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 |
283 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
284 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ |
285 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ |
286 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
287 | out $=main::BEGIN() from /dev/null:0 |
288 | in @=Config::myconfig() from /dev/null:0 |
289 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
290 | out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
291 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
292 | out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
293 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
294 | out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
295 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 |
296 | |
297 | =item 14 |
298 | |
299 | in $=main::BEGIN() from /dev/null:0 |
300 | in $=Config::BEGIN() from lib/Config.pm:2 |
301 | Package lib/Exporter.pm. |
302 | Package lib/Carp.pm. |
303 | out $=Config::BEGIN() from lib/Config.pm:0 |
304 | Package lib/Config.pm. |
305 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
306 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 |
307 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
308 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E |
309 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E |
310 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
311 | out $=main::BEGIN() from /dev/null:0 |
312 | in @=Config::myconfig() from /dev/null:0 |
313 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 |
314 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 |
315 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 |
316 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 |
317 | |
318 | =item 30 |
319 | |
320 | in $=CODE(0x15eca4)() from /dev/null:0 |
321 | in $=CODE(0x182528)() from lib/Config.pm:2 |
322 | Package lib/Exporter.pm. |
323 | out $=CODE(0x182528)() from lib/Config.pm:0 |
324 | scalar context return from CODE(0x182528): undef |
325 | Package lib/Config.pm. |
326 | in $=Config::TIEHASH('Config') from lib/Config.pm:628 |
327 | out $=Config::TIEHASH('Config') from lib/Config.pm:628 |
328 | scalar context return from Config::TIEHASH: empty hash |
329 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
330 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 |
331 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 |
332 | scalar context return from Exporter::export: '' |
333 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
334 | scalar context return from Exporter::import: '' |
335 | |
336 | =back |
337 | |
338 | In all cases shown above, the line indentation shows the call tree. |
339 | If bit 2 of C<frame> is set, a line is printed on exit from a |
340 | subroutine as well. If bit 4 is set, the arguments are printed |
341 | along with the caller info. If bit 8 is set, the arguments are |
342 | printed even if they are tied or references. If bit 16 is set, the |
343 | return value is printed, too. |
344 | |
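The bit values described above can be decoded mechanically. The sketch below covers only the four bits documented in this section. The helper name C<describe_frame> and the wording of the descriptions are assumptions made for illustration; they are not taken from the debugger itself.

```perl
use strict;
use warnings;

# Bit values of the frame option, as described in this section.
my @FRAME_BITS = (
    [ 2  => 'print a line on subroutine exit' ],
    [ 4  => 'print arguments with the caller info' ],
    [ 8  => 'print arguments even if tied or references' ],
    [ 16 => 'print the return value' ],
);

# describe_frame is a hypothetical helper: it lists the documented
# features switched on by a given frame value.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

for my $frame (2, 6, 14, 30) {
    print "frame=$frame: ", join('; ', describe_frame($frame)), "\n";
}
```

The values 2, 6, 14 and 30 are the ones used in the listings above, so the output can be compared against those excerpts.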
345 | When a package is compiled, a line like this |
346 | |
347 | Package lib/Carp.pm. |
348 | |
349 | is printed with proper indentation. |
350 | |
351 | =head1 Debugging regular expressions |
352 | |
353 | There are two ways to enable debugging output for regular expressions. |
354 | |
355 | If your perl is compiled with C<-DDEBUGGING>, you may use the |
356 | B<-Dr> flag on the command line. |
357 | |
358 | Otherwise, one can C<use re 'debug'>, which has effects at |
359 | compile time and run time. It is not lexically scoped. |
360 | |
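A minimal way to produce such output yourself is sketched below, using the pattern and target string from the examples in the following sections. The debugging text goes to STDERR, and its exact layout depends on the Perl release, so it may not match the listings in this document line for line. The second test string is an assumption added to show a successful match.

```perl
use strict;
use warnings;
use re 'debug';    # debugging output goes to STDERR

# The pattern discussed in the compile-time and run-time sections.
my $re = qr/[bc]d(ef*g)+h[ij]k$/;

for my $string ('abcdefg__gh__', 'bdeghik') {
    my $result = $string =~ $re ? 'matches' : 'does not match';
    print "$string $result\n";
}
```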
361 | =head2 Compile-time output |
362 | |
363 | The debugging output at compile time looks like this: |
364 | |
365 | compiling RE `[bc]d(ef*g)+h[ij]k$' |
366 | size 43 first at 1 |
367 | 1: ANYOF(11) |
368 | 11: EXACT <d>(13) |
369 | 13: CURLYX {1,32767}(27) |
370 | 15: OPEN1(17) |
371 | 17: EXACT <e>(19) |
372 | 19: STAR(22) |
373 | 20: EXACT <f>(0) |
374 | 22: EXACT <g>(24) |
375 | 24: CLOSE1(26) |
376 | 26: WHILEM(0) |
377 | 27: NOTHING(28) |
378 | 28: EXACT <h>(30) |
379 | 30: ANYOF(40) |
380 | 40: EXACT <k>(42) |
381 | 42: EOL(43) |
382 | 43: END(0) |
383 | anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) |
384 | stclass `ANYOF' minlen 7 |
385 | |
386 | The first line shows the pre-compiled form of the regex. The second |
387 | shows the size of the compiled form (in arbitrary units, usually |
388 | 4-byte words) and the label I<id> of the first node that does a |
389 | match. |
390 | |
391 | The last line (split into two lines above) contains optimizer |
392 | information. In the example shown, the optimizer found that the match |
393 | should contain a substring C<de> at offset 1, plus substring C<gh> |
394 | at some offset between 3 and infinity. Moreover, when checking for |
395 | these substrings (to abandon impossible matches quickly), Perl will check |
396 | for the substring C<gh> before checking for the substring C<de>. The |
397 | optimizer may also use the knowledge that the match starts (at the |
398 | C<first> I<id>) with a character class, and the match cannot be |
399 | shorter than 7 chars. |
400 | |
401 | The fields of interest which may appear in the last line are |
402 | |
403 | =over |
404 | |
405 | =item C<anchored> I<STRING> C<at> I<POS> |
406 | |
407 | =item C<floating> I<STRING> C<at> I<POS1..POS2> |
408 | |
409 | See above. |
410 | |
411 | =item C<matching floating/anchored> |
412 | |
413 | Which substring to check first. |
414 | |
415 | =item C<minlen> |
416 | |
417 | The minimal length of the match. |
418 | |
419 | =item C<stclass> I<TYPE> |
420 | |
421 | Type of first matching node. |
422 | |
423 | =item C<noscan> |
424 | |
425 | Don't scan for the found substrings. |
426 | |
427 | =item C<isall> |
428 | |
429 | Means that the optimizer info is all that the regular |
430 | expression contains, and thus one does not need to enter the regex engine at |
431 | all. |
432 | |
433 | =item C<GPOS> |
434 | |
435 | Set if the pattern contains C<\G>. |
436 | |
437 | =item C<plus> |
438 | |
439 | Set if the pattern starts with a repeated char (as in C<x+y>). |
440 | |
441 | =item C<implicit> |
442 | |
443 | Set if the pattern starts with C<.*>. |
444 | |
445 | =item C<with eval> |
446 | |
447 | Set if the pattern contains eval-groups, such as C<(?{ code })> and |
448 | C<(??{ code })>. |
449 | |
450 | =item C<anchored(TYPE)> |
451 | |
452 | If the pattern may match only at a handful of places, with C<TYPE> |
453 | being C<BOL>, C<MBOL>, or C<GPOS>. See the table below. |
454 | |
455 | =back |
456 | |
457 | If a substring is known to match at end-of-line only, it may be |
458 | followed by C<$>, as in C<floating `k'$>. |
459 | |
460 | The optimizer-specific info is used to avoid entering (a slow) regex |
461 | engine on strings that will definitely not match. If the C<isall> flag |
462 | is set, a call to the regex engine may be avoided even when the optimizer |
463 | found an appropriate place for the match. |
464 | |
465 | The rest of the output contains the list of I<nodes> of the compiled |
466 | form of the regex. Each line has the format |
467 | |
468 | C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>) |
469 | |
470 | =head2 Types of nodes |
471 | |
472 | Here are the possible types, with short descriptions: |
473 | |
474 | # TYPE arg-description [num-args] [longjump-len] DESCRIPTION |
475 | |
476 | # Exit points |
477 | END no End of program. |
478 | SUCCEED no Return from a subroutine, basically. |
479 | |
480 | # Anchors: |
481 | BOL no Match "" at beginning of line. |
482 | MBOL no Same, assuming multiline. |
483 | SBOL no Same, assuming singleline. |
484 | EOS no Match "" at end of string. |
485 | EOL no Match "" at end of line. |
486 | MEOL no Same, assuming multiline. |
487 | SEOL no Same, assuming singleline. |
488 | BOUND no Match "" at any word boundary |
489 | BOUNDL no Match "" at any word boundary |
490 | NBOUND no Match "" at any word non-boundary |
491 | NBOUNDL no Match "" at any word non-boundary |
492 | GPOS no Matches where last m//g left off. |
493 | |
494 | # [Special] alternatives |
495 | ANY no Match any one character (except newline). |
496 | SANY no Match any one character. |
497 | ANYOF sv Match character in (or not in) this class. |
498 | ALNUM no Match any alphanumeric character |
499 | ALNUML no Match any alphanumeric char in locale |
500 | NALNUM no Match any non-alphanumeric character |
501 | NALNUML no Match any non-alphanumeric char in locale |
502 | SPACE no Match any whitespace character |
503 | SPACEL no Match any whitespace char in locale |
504 | NSPACE no Match any non-whitespace character |
505 | NSPACEL no Match any non-whitespace char in locale |
506 | DIGIT no Match any numeric character |
507 | NDIGIT no Match any non-numeric character |
508 | |
509 | # BRANCH The set of branches constituting a single choice are hooked |
510 | # together with their "next" pointers, since precedence prevents |
511 | # anything being concatenated to any individual branch. The |
512 | # "next" pointer of the last BRANCH in a choice points to the |
513 | # thing following the whole choice. This is also where the |
514 | # final "next" pointer of each individual branch points; each |
515 | # branch starts with the operand node of a BRANCH node. |
516 | # |
517 | BRANCH node Match this alternative, or the next... |
518 | |
519 | # BACK Normal "next" pointers all implicitly point forward; BACK |
520 | # exists to make loop structures possible. |
521 | # not used |
522 | BACK no Match "", "next" ptr points backward. |
523 | |
524 | # Literals |
525 | EXACT sv Match this string (preceded by length). |
526 | EXACTF sv Match this string, folded (prec. by length). |
527 | EXACTFL sv Match this string, folded in locale (w/len). |
528 | |
529 | # Do nothing |
530 | NOTHING no Match empty string. |
531 | # A variant of above which delimits a group, thus stops optimizations |
532 | TAIL no Match empty string. Can jump here from outside. |
533 | |
534 | # STAR,PLUS '?', and complex '*' and '+', are implemented as circular |
535 | # BRANCH structures using BACK. Simple cases (one character |
536 | # per match) are implemented with STAR and PLUS for speed |
537 | # and to minimize recursive plunges. |
538 | # |
539 | STAR node Match this (simple) thing 0 or more times. |
540 | PLUS node Match this (simple) thing 1 or more times. |
541 | |
542 | CURLY sv 2 Match this simple thing {n,m} times. |
543 | CURLYN no 2 Match next-after-this simple thing |
544 | # {n,m} times, set parens. |
545 | CURLYM no 2 Match this medium-complex thing {n,m} times. |
546 | CURLYX sv 2 Match this complex thing {n,m} times. |
547 | |
548 | # This terminator creates a loop structure for CURLYX |
549 | WHILEM no Do curly processing and see if rest matches. |
550 | |
551 | # OPEN,CLOSE,GROUPP ...are numbered at compile time. |
552 | OPEN num 1 Mark this point in input as start of #n. |
553 | CLOSE num 1 Analogous to OPEN. |
554 | |
555 | REF num 1 Match some already matched string |
556 | REFF num 1 Match already matched string, folded |
557 | REFFL num 1 Match already matched string, folded in loc. |
558 | |
559 | # grouping assertions |
560 | IFMATCH off 1 2 Succeeds if the following matches. |
561 | UNLESSM off 1 2 Fails if the following matches. |
562 | SUSPEND off 1 1 "Independent" sub-regex. |
563 | IFTHEN off 1 1 Switch, should be preceded by switcher. |
564 | GROUPP num 1 Whether the group matched. |
565 | |
566 | # Support for long regex |
567 | LONGJMP off 1 1 Jump far away. |
568 | BRANCHJ off 1 1 BRANCH with long offset. |
569 | |
570 | # The heavy worker |
571 | EVAL evl 1 Execute some Perl code. |
572 | |
573 | # Modifiers |
574 | MINMOD no Next operator is not greedy. |
575 | LOGICAL no Next opcode should set the flag only. |
576 | |
577 | # This is not used yet |
578 | RENUM off 1 1 Group with independently numbered parens. |
579 | |
580 | # This is not really a node, but an optimized away piece of a "long" node. |
581 | # To simplify debugging output, we mark it as if it were a node |
582 | OPTIMIZED off Placeholder for dump. |
583 | |
584 | =head2 Run-time output |
585 | |
586 | First of all, when doing a match, one may get no run-time output even |
587 | if debugging is enabled. This means that the regex engine was never |
588 | entered and that all of the job was therefore done by the optimizer. |
589 | |
590 | If the regex engine was entered, the output may look like this: |
591 | |
592 | Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__' |
593 | Setting an EVAL scope, savestack=3 |
594 | 2 <ab> <cdefg__gh_> | 1: ANYOF |
595 | 3 <abc> <defg__gh_> | 11: EXACT <d> |
596 | 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767} |
597 | 4 <abcd> <efg__gh_> | 26: WHILEM |
598 | 0 out of 1..32767 cc=effff31c |
599 | 4 <abcd> <efg__gh_> | 15: OPEN1 |
600 | 4 <abcd> <efg__gh_> | 17: EXACT <e> |
601 | 5 <abcde> <fg__gh_> | 19: STAR |
602 | EXACT <f> can match 1 times out of 32767... |
603 | Setting an EVAL scope, savestack=3 |
604 | 6 <bcdef> <g__gh__> | 22: EXACT <g> |
605 | 7 <bcdefg> <__gh__> | 24: CLOSE1 |
606 | 7 <bcdefg> <__gh__> | 26: WHILEM |
607 | 1 out of 1..32767 cc=effff31c |
608 | Setting an EVAL scope, savestack=12 |
609 | 7 <bcdefg> <__gh__> | 15: OPEN1 |
610 | 7 <bcdefg> <__gh__> | 17: EXACT <e> |
611 | restoring \1 to 4(4)..7 |
612 | failed, try continuation... |
613 | 7 <bcdefg> <__gh__> | 27: NOTHING |
614 | 7 <bcdefg> <__gh__> | 28: EXACT <h> |
615 | failed... |
616 | failed... |
617 | |
618 | The most significant information in the output is about the particular I<node> |
619 | of the compiled regex that is currently being tested against the target string. |
620 | The format of these lines is |
621 | |
622 | C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE> |
623 | |
624 | The I<TYPE> info is indented with respect to the backtracking level. |
625 | Other incidental information appears interspersed within. |
626 | |
627 | =head1 Debugging Perl memory usage |
628 | |
629 | Perl is a profligate wastrel when it comes to memory use. There |
630 | is a saying that to estimate memory usage of Perl, assume a reasonable |
631 | algorithm for memory allocation, multiply that estimate by 10, and |
632 | while you still may miss the mark, at least you won't be quite so |
633 | astonished. This is not absolutely true, but may provide a good |
634 | grasp of what happens. |
635 | |
636 | Assume that an integer cannot take less than 20 bytes of memory, a |
637 | float cannot take less than 24 bytes, a string cannot take less |
638 | than 32 bytes (all these examples assume 32-bit architectures; the |
639 | results are quite a bit worse on 64-bit architectures). If a variable |
640 | is accessed in two of three different ways (which require an integer, |
641 | a float, or a string), the memory footprint may increase yet another |
642 | 20 bytes. A sloppy malloc(3) implementation can inflate these |
643 | numbers dramatically. |
644 | |
645 | On the opposite end of the scale, a declaration like |
646 | |
647 | sub foo; |
648 | |
649 | may take up to 500 bytes of memory, depending on which release of Perl |
650 | you're running. |
651 | |
652 | Anecdotal estimates of source-to-compiled code bloat suggest an |
653 | eightfold increase. This means that the compiled form of reasonable |
654 | (normally commented, properly indented etc.) code will take |
655 | about eight times more space in memory than the code took |
656 | on disk. |
657 | |
658 | There are two Perl-specific ways to analyze memory usage: |
659 | $ENV{PERL_DEBUG_MSTATS} and B<-DL> command-line switch. The first |
660 | is available only if Perl is compiled with Perl's malloc(); the |
661 | second only if Perl was built with C<-DDEBUGGING>. See the |
662 | instructions for how to do this in the F<INSTALL> podpage at |
663 | the top level of the Perl source tree. |
664 | |
665 | =head2 Using C<$ENV{PERL_DEBUG_MSTATS}> |
666 | |
667 | If your perl is using Perl's malloc() and was compiled with the |
668 | necessary switches (this is the default), then it will print memory |
669 | usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS} |
670 | > 1 >>, and before termination of the program when C<< |
671 | $ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to |
672 | the following example: |
673 | |
674 | $ PERL_DEBUG_MSTATS=2 perl -e "require Carp" |
675 | Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) |
676 | 14216 free: 130 117 28 7 9 0 2 2 1 0 0 |
677 | 437 61 36 0 5 |
678 | 60924 used: 125 137 161 55 7 8 6 16 2 0 1 |
679 | 74 109 304 84 20 |
680 | Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. |
681 | Memory allocation statistics after execution: (buckets 4(4)..8188(8192) |
682 | 30888 free: 245 78 85 13 6 2 1 3 2 0 1 |
683 | 315 162 39 42 11 |
684 | 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 |
685 | 196 178 1066 798 39 |
686 | Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. |
687 | |
688 | It is possible to ask for such a statistic at arbitrary points in |
689 | your execution using the mstat() function out of the standard |
690 | Devel::Peek module. |
691 | |
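A call to mstat() might look like the sketch below. The label string and the throwaway list are assumptions made for illustration. On a perl that was not built with Perl's malloc(), expect a short notice instead of the statistics.

```perl
use strict;
use warnings;
use Devel::Peek qw(mstat);

# Allocate something so that there is memory use to report.
my @list = map { "item $_" } 1 .. 10_000;

# Prints the statistics to STDERR, labelled with the given string.
mstat("after building the list");
```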
692 | Here is some explanation of that format: |
693 | |
694 | =over |
695 | |
696 | =item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)> |
697 | |
698 | Perl's malloc() uses bucketed allocations. Every request is rounded |
699 | up to the closest bucket size available, and a bucket is taken from |
700 | the pool of buckets of that size. |
701 | |
702 | The line above describes the limits of buckets currently in use. |
703 | Each bucket has two sizes: memory footprint and the maximal size |
704 | of user data that can fit into this bucket. Suppose in the above |
705 | example that the smallest bucket were size 4. The biggest bucket |
706 | would have usable size 8188, and the memory footprint would be 8192. |
707 | |
708 | In a Perl built for debugging, some buckets may have negative usable |
709 | size. This means that these buckets cannot (and will not) be used. |
710 | For larger buckets, the memory footprint may be one page greater |
711 | than a power of 2. If so, the corresponding power of two is |
712 | printed in the C<APPROX> field above. |
713 | |
714 | =item Free/Used |
715 | |
716 | The 1 or 2 rows of numbers following that correspond to the number |
717 | of buckets of each size between C<SMALLEST> and C<GREATEST>. In |
718 | the first row, the sizes (memory footprints) of buckets are powers |
719 | of two--or possibly one page greater. In the second row, if present, |
720 | the memory footprints of the buckets are between the memory footprints |
721 | of two buckets "above". |
722 | |
723 | For example, suppose under the previous example, the memory footprints |
724 | were |
725 | |
726 | free: 8 16 32 64 128 256 512 1024 2048 4096 8192 |
727 | 4 12 24 48 80 |
728 | |
729 | With non-C<DEBUGGING> perl, the buckets starting from C<128> have |
730 | a 4-byte overhead, and thus an 8192-long bucket may take up to |
731 | 8188-byte allocations. |
732 | |
733 | =item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS> |
734 | |
735 | The first two fields give the total amount of memory perl sbrk(2)ed |
736 | (ess-broken? :-) and number of sbrk(2)s used. The third number is |
737 | what perl thinks about continuity of returned chunks. So long as |
738 | this number is positive, malloc() will assume that it is probable |
739 | that sbrk(2) will provide continuous memory. |
740 | |
741 | Memory allocated by external libraries is not counted. |
742 | |
743 | =item C<pad: 0> |
744 | |
745 | The amount of sbrk(2)ed memory needed to keep buckets aligned. |
746 | |
747 | =item C<heads: 2192> |
748 | |
749 | Although memory overhead of bigger buckets is kept inside the bucket, for |
750 | smaller buckets, it is kept in separate areas. This field gives the |
751 | total size of these areas. |
752 | |
753 | =item C<chain: 0> |
754 | |
755 | malloc() may want to subdivide a bigger bucket into smaller buckets. |
756 | If only a part of the deceased bucket is left unsubdivided, the rest |
757 | is kept as an element of a linked list. This field gives the total |
758 | size of these chunks. |
759 | |
760 | =item C<tail: 6144> |
761 | |
762 | To minimize the number of sbrk(2)s, malloc() asks for more memory than |
763 | it immediately needs. This field gives the size of the still-unused |
764 | part, which has been sbrk(2)ed but never touched. |
765 | |
766 | =back |
767 | |
768 | =head2 Example of using the B<-DL> switch |
769 | |
770 | Below we show how to analyse memory usage by |
771 | |
772 | do 'lib/auto/POSIX/autosplit.ix'; |
773 | |
774 | The file in question contains a header and 146 lines similar to |
775 | |
776 | sub getcwd; |
777 | |
778 | B<WARNING>: The discussion below assumes a 32-bit architecture. In |
779 | newer releases of Perl, memory usage of the constructs discussed |
780 | here is greatly improved, but the story discussed below is a real-life |
781 | story. This story is mercilessly terse, and assumes rather more than cursory |
782 | knowledge of Perl internals. Type space to continue, `q' to quit. |
783 | (Actually, you just want to skip to the next section.) |
784 | |
785 | Here is the itemized list of Perl allocations performed during parsing |
786 | of this file: |
787 | |
788 |  !!! "after" at test.pl line 3. |
789 |     Id  subtot   4   8  12  16  20  24  28  32  36  40  48  56  64  72  80 80+ |
790 |   0 02   13752   .   .   .   . 294   .   .   .   .   .   .   .   .   .   .   4 |
791 |   0 54    5545   .   .   8 124  16   .   .   .   1   1   .   .   .   .   .   3 |
792 |   5 05      32   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   . |
793 |   6 02    7152   .   .   .   .   .   .   .   .   .   . 149   .   .   .   .   . |
794 |   7 02    3600   .   .   .   .   . 150   .   .   .   .   .   .   .   .   .   . |
795 |   7 03      64   .  -1   .   1   .   .   2   .   .   .   .   .   .   .   .   . |
796 |   7 04    7056   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   7 |
797 |   7 17   38404   .   .   .   .   .   .   .   1   .   . 442 149   .   . 147   . |
798 |   9 03    2078  17 249  32   .   .   .   .   2   .   .   .   .   .   .   .   . |
799 | |
800 | |
801 | To see this list, insert two C<warn('!...')> statements around the call: |
802 | |
803 | warn('!'); |
804 | do 'lib/auto/POSIX/autosplit.ix'; |
805 | warn('!!! "after"'); |
806 | |
4375e838 |
807 | and run it with Perl's B<-DL> option. The first warn() will print |
055fd3a9 |
808 | memory allocation info before parsing the file and will memorize |
809 | the statistics at this point (we ignore what it prints). The second |
810 | warn() prints increments with respect to these memorized data. This |
811 | is the printout shown above. |
812 | |
813 | Different I<Id>s on the left correspond to different subsystems of |
814 | the perl interpreter. They are just the first argument given to |
815 | the perl memory allocation API named New(). To find what C<9 03> |
816 | means, just B<grep> the perl source for C<903>. You'll find it in |
817 | F<util.c>, function savepvn(). (I know, you wonder why we told you |
818 | to B<grep> and then gave away the answer. That's because grepping |
819 | the source is good for the soul.) This function is used to store |
820 | a copy of an existing chunk of memory. Using a C debugger, one can |
821 | see that the function was called either directly from gv_init() or |
822 | via sv_magic(), and that gv_init() is called from gv_fetchpv()--which |
823 | was itself called from newSUB(). Please stop to catch your breath now. |
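
If B<grep> is not at hand, the same search can be sketched in Perl.
This is a rough illustration and not part of the perl distribution:
the helper names are invented, and the pattern assumes the I<Id>
appears as the first argument of a C<New>-style macro call (C<New>,
C<Newc> or C<Newz>), following the description of New() above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an Id as printed by -DL ("9 03") into the
# number used in the source ("903") by dropping the embedded whitespace.
sub id_to_number {
    my ($id) = @_;
    (my $number = $id) =~ s/\s+//g;
    return $number;
}

# Hypothetical helper: build a pattern matching a New-style call whose
# first argument is the given Id (an assumption about the call shape).
sub id_pattern {
    my ($id) = @_;
    my $number = id_to_number($id);
    return qr/\bNew[cz]?\s*\(\s*\Q$number\E\b/;
}

# Print every matching line as "file:line: text".
sub scan_files {
    my ($id, @files) = @_;
    my $pattern = id_pattern($id);
    for my $file (@files) {
        my $fh;
        unless (open $fh, '<', $file) {
            warn "Cannot open $file: $!\n";
            next;
        }
        while (my $line = <$fh>) {
            print "$file:$.: $line" if $line =~ $pattern;
        }
        close $fh;
    }
}

# Usage (file name is arbitrary): perl findid.pl *.c
scan_files('9 03', @ARGV);
```

Run from the top of the perl source tree with the C files as
arguments. With no arguments, the script prints nothing.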
824 | |
825 | B<NOTE>: To reach this point in the debugger and skip the calls to |
826 | savepvn() during the compilation of the main program, you should |
827 | set a C breakpoint |
828 | in Perl_warn(), continue until this point is reached, and I<then> set |
829 | a C breakpoint in Perl_savepvn(). Note that you may need to skip a |
830 | handful of Perl_savepvn() calls that do not correspond to mass production |
831 | of CVs (there are more C<903> allocations than 146 similar lines of |
832 | F<lib/auto/POSIX/autosplit.ix>). Note also that C<Perl_> prefixes are |
833 | added by macroization code in perl header files to avoid conflicts |
834 | with external libraries. |
835 | |
836 | Anyway, we see that C<903> ids correspond to creation of globs, twice |
837 | per glob: once for the glob name and once for the glob stringification magic. |
838 | |
839 | Here are explanations for other I<Id>s above: |
840 | |
841 | =over |
842 | |
843 | =item C<717> |
844 | |
4375e838 |
845 | Creates bigger C<XPV*> structures. In the case above, it |
055fd3a9 |
846 | creates 3 C<AV>s per subroutine, one for a list of lexical variable |
847 | names, one for a scratchpad (which contains lexical variables and |
848 | C<targets>), and one for the array of scratchpads needed for |
849 | recursion. |
850 | |
851 | It also creates a C<GV> and a C<CV> per subroutine, all called from |
852 | start_subparse(). |
853 | |
854 | =item C<002> |
855 | |
856 | Creates a C array corresponding to the C<AV> of scratchpads and the |
857 | scratchpad itself. The first fake entry of this scratchpad is |
858 | created even though the subroutine itself is not yet defined. |
859 | |
860 | It also creates C arrays to keep data for the stash. This is one HV, |
861 | but it grows; thus, there are 4 big allocations: the big chunks are not |
862 | freed, but are kept as additional arenas for C<SV> allocations. |
863 | |
864 | =item C<054> |
865 | |
866 | Creates a C<HEK> for the name of the glob for the subroutine. This |
867 | name is a key in a I<stash>. |
868 | |
869 | Big allocations with this I<Id> correspond to allocations of new |
870 | arenas to keep C<HE>. |
871 | |
872 | =item C<602> |
873 | |
874 | Creates a C<GP> for the glob for the subroutine. |
875 | |
876 | =item C<702> |
877 | |
878 | Creates the C<MAGIC> for the glob for the subroutine. |
879 | |
880 | =item C<704> |
881 | |
882 | Creates I<arenas> which keep SVs. |
883 | |
884 | =back |
885 | |
886 | =head2 B<-DL> details |
887 | |
888 | If Perl is run with the B<-DL> option, then warn()s that start with `!' |
889 | behave specially. They print a list of I<categories> of memory |
890 | allocations, and statistics of allocations of different sizes for |
891 | these categories. |
892 | |
893 | If the warn() string starts with |
894 | |
895 | =over |
896 | |
897 | =item C<!!!> |
898 | |
899 | print changed categories only; print the differences in counts of allocations. |
900 | |
901 | =item C<!!> |
902 | |
903 | print grown categories only; print the absolute values of counts, and totals. |
904 | |
905 | =item C<!> |
906 | |
907 | print nonempty categories; print the absolute values of counts and totals. |
908 | |
909 | =back |
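
The three prefixes can be summarized in a small lookup. The sketch
below only mirrors the description above in plain Perl. It does not
call into the interpreter, and the function name C<dl_report_mode> is
invented for illustration. Longer prefixes are tested first, since a
string starting with C<!!!> also starts with C<!!> and C<!>.

```perl
use strict;
use warnings;

# Hypothetical helper: describe what a -DL perl would report for a given
# warn() string, following the three cases listed above.
sub dl_report_mode {
    my ($message) = @_;
    return 'changed categories; differences in counts'
        if $message =~ /^!!!/;
    return 'grown categories; absolute counts and totals'
        if $message =~ /^!!/;
    return 'nonempty categories; absolute counts and totals'
        if $message =~ /^!/;
    return 'ordinary warning';
}

# Sample strings; only the leading run of '!' matters.
for my $message ('!', '!! midway', '!!! "after"', 'plain text') {
    printf "%-12s => %s\n", $message, dl_report_mode($message);
}
```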
910 | |
911 | =head2 Limitations of B<-DL> statistics |
912 | |
913 | If an extension or external library does not use the Perl API to |
914 | allocate memory, such allocations are not counted. |
915 | |
916 | =head1 SEE ALSO |
917 | |
918 | L<perldebug>, |
919 | L<perlguts>, |
920 | L<perlrun>, |
921 | L<re>, |
922 | and |
923 | L<Devel::DProf>. |