Commit | Line | Data |
54a137f5 |
1 | =head1 NAME |
2 | |
3 | perlcompile - Introduction to the Perl Compiler-Translator |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | Perl has always had a compiler: your source is compiled into an |
8 | internal form (a parse tree) which is then optimized before being |
9 | run. Since version 5.005, Perl has shipped with a module |
10 | capable of inspecting the optimized parse tree (C<B>), and this has |
11 | been used to write many useful utilities, including a module that lets |
12 | you turn your Perl into C source code that can be compiled into a |
13 | native executable. |
14 | |
15 | The C<B> module provides access to the parse tree, and other modules |
16 | ("back ends") do things with the tree. Some write it out as |
17 | bytecode, C source code, or a semi-human-readable text. Another |
18 | traverses the parse tree to build a cross-reference of which |
19 | subroutines, formats, and variables are used where. Another checks |
20 | your code for dubious constructs. Yet another back end dumps the |
21 | parse tree back out as Perl source, acting as a source code beautifier |
22 | or deobfuscator. |
23 | |
24 | Because its original purpose was to be a way to produce C code |
25 | corresponding to a Perl program, and in turn a native executable, the |
26 | C<B> module and its associated back ends are known as "the |
27 | compiler", even though they don't really compile anything. |
28 | Different parts of the compiler are more accurately a "translator", |
29 | or an "inspector", but people want Perl to have a "compiler |
30 | option" not an "inspector gadget". What can you do? |
31 | |
32 | This document covers the use of the Perl compiler: which modules |
33 | it comprises, how to use the most important of the back end modules, |
34 | what problems there are, and how to work around them. |
35 | |
36 | =head2 Layout |
37 | |
38 | The compiler back ends are in the C<B::> hierarchy, and the front-end |
39 | (the module that you, the user of the compiler, will sometimes |
40 | interact with) is the C<O> module. Some back ends (e.g., C<B::C>) have |
41 | programs (e.g., I<perlcc>) to hide the modules' complexity. |
42 | |
43 | Here are the important back ends to know about, with their status |
44 | expressed as a number from 0 (outline for later implementation) to |
45 | 10 (if there's a bug in it, we're very surprised): |
46 | |
47 | =over 4 |
48 | |
49 | =item B::Bytecode |
50 | |
51 | Stores the parse tree in a machine-independent format, suitable |
52 | for later reloading through the ByteLoader module. Status: 5 (some |
53 | things work, some things don't, some things are untested). |
54 | |
55 | =item B::C |
56 | |
57 | Creates a C source file containing code to rebuild the parse tree |
58 | and resume the interpreter. Status: 6 (many things work adequately, |
59 | including programs using Tk). |
60 | |
61 | =item B::CC |
62 | |
63 | Creates a C source file corresponding to the run time code path in |
64 | the parse tree. This is the closest to a Perl-to-C translator there |
65 | is, but the code it generates is almost incomprehensible because it |
66 | translates the parse tree into a giant switch structure that |
67 | manipulates Perl structures. The eventual goal is to reduce (given |
68 | sufficient type information in the Perl program) some of the |
69 | Perl data structure manipulations into manipulations of C-level |
70 | ints, floats, etc. Status: 5 (some things work, including |
71 | uncomplicated Tk examples). |
72 | |
73 | =item B::Lint |
74 | |
75 | Complains if it finds dubious constructs in your source code. Status: |
76 | 6 (it works adequately, but only has a very limited number of areas |
77 | that it checks). |
78 | |
79 | =item B::Deparse |
80 | |
81 | Recreates the Perl source, making an attempt to format it coherently. |
82 | Status: 8 (it works nicely, but a few obscure things are missing). |
83 | |
84 | =item B::Xref |
85 | |
86 | Reports on the declaration and use of subroutines and variables. |
87 | Status: 8 (it works nicely, but still has a few lingering bugs). |
88 | |
89 | =back |
90 | |
91 | =head1 Using The Back Ends |
92 | |
93 | The following sections describe how to use the various compiler back |
94 | ends. They're presented roughly in order of maturity, so that the |
95 | most stable and proven back ends are described first, and the most |
96 | experimental and incomplete back ends are described last. |
97 | |
98 | The O module automatically enables the B<-c> flag to Perl, which |
99 | prevents Perl from executing your code once it has been compiled. |
100 | This is why all the back ends print: |
101 | |
102 | myperlprogram syntax OK |
103 | |
104 | before producing any other output. |
105 | |
106 | =head2 The Cross Referencing Back End (B::Xref) |
107 | |
108 | The cross referencing back end produces a report on your program, |
109 | breaking down declarations and uses of subroutines and variables (and |
110 | formats) by file and subroutine. For instance, here's part of the |
111 | report from the I<pod2man> program that comes with Perl: |
112 | |
113 | Subroutine clear_noremap |
114 | Package (lexical) |
115 | $ready_to_print i1069, 1079 |
116 | Package main |
117 | $& 1086 |
118 | $. 1086 |
119 | $0 1086 |
120 | $1 1087 |
121 | $2 1085, 1085 |
122 | $3 1085, 1085 |
123 | $ARGV 1086 |
124 | %HTML_Escapes 1085, 1085 |
125 | |
126 | This shows the variables used in the subroutine C<clear_noremap>. The |
127 | variable C<$ready_to_print> is a my() (lexical) variable, |
128 | B<i>ntroduced (first declared with my()) on line 1069, and used on |
129 | line 1079. The variable C<$&> from the main package is used on 1086, |
130 | and so on. |
131 | |
132 | A line number may be prefixed by a single letter: |
133 | |
134 | =over 4 |
135 | |
136 | =item i |
137 | |
138 | Lexical variable introduced (declared with my()) for the first time. |
139 | |
140 | =item & |
141 | |
142 | Subroutine or method call. |
143 | |
144 | =item s |
145 | |
146 | Subroutine defined. |
147 | |
148 | =item r |
149 | |
150 | Format defined. |
151 | |
152 | =back |
153 | |
154 | The most useful option the cross referencer has is to save the report |
155 | to a separate file. For instance, to save the report on |
156 | I<myperlprogram> to the file I<report>: |
157 | |
158 | $ perl -MO=Xref,-oreport myperlprogram |
159 | |
160 | =head2 The Decompiling Back End |
161 | |
162 | The Deparse back end turns your Perl source back into Perl source. It |
163 | can reformat along the way, making it useful as a de-obfuscator. The |
164 | most basic way to use it is: |
165 | |
166 | $ perl -MO=Deparse myperlprogram |
167 | |
168 | You'll notice immediately that Perl has no idea of how to paragraph |
169 | your code. You'll have to separate chunks of code from each other |
170 | with newlines by hand. However, watch what it will do with |
171 | one-liners: |
172 | |
173 | $ perl -MO=Deparse -e '$op=shift||die "usage: $0 |
174 | code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op; |
175 | die$@ if$@; rename$was,$_ unless$was eq $_}' |
176 | -e syntax OK |
177 | $op = shift @ARGV || die("usage: $0 code [...]"); |
178 | chomp(@ARGV = <ARGV>) unless @ARGV; |
179 | foreach $_ (@ARGV) { |
180 | $was = $_; |
181 | eval $op; |
182 | die $@ if $@; |
183 | rename $was, $_ unless $was eq $_; |
184 | } |
185 | |
186 | (This is the I<rename> program that comes in the I<eg/> directory |
187 | of the Perl source distribution.) |
188 | |
189 | The decompiler has several options for the code it generates. For |
190 | instance, you can set the size of each indent from 4 (as above) to |
191 | 2 with: |
192 | |
193 | $ perl -MO=Deparse,-si2 myperlprogram |
194 | |
195 | The B<-p> option adds parentheses where normally they are omitted: |
196 | |
197 | $ perl -MO=Deparse -e 'print "Hello, world\n"' |
198 | -e syntax OK |
199 | print "Hello, world\n"; |
200 | $ perl -MO=Deparse,-p -e 'print "Hello, world\n"' |
201 | -e syntax OK |
202 | print("Hello, world\n"); |
203 | |
204 | See L<B::Deparse> for more information on the formatting options. |
205 | |
206 | =head2 The Lint Back End (B::Lint) |
207 | |
208 | The lint back end inspects programs for poor style. One programmer's |
209 | bad style is another programmer's useful tool, so options let you |
210 | select what is complained about. |
211 | |
212 | To run the style checker across your source code: |
213 | |
214 | $ perl -MO=Lint myperlprogram |
215 | |
216 | To disable context checks and undefined subroutines: |
217 | |
218 | $ perl -MO=Lint,-context,-undefined-subs myperlprogram |
219 | |
220 | See L<B::Lint> for information on the options. |
221 | |
222 | =head2 The Simple C Back End |
223 | |
224 | This module saves the internal compiled state of your Perl program |
225 | to a C source file, which can be turned into a native executable |
226 | for that particular platform using a C compiler. The resulting |
227 | program links against the Perl interpreter library, so it |
228 | will not save you disk space (unless you build Perl with a shared |
229 | library) or program size. It may, however, save you startup time. |
230 | |
231 | The C<perlcc> tool generates such executables by default. |
232 | |
233 | perlcc myperlprogram.pl |
234 | |
235 | =head2 The Bytecode Back End |
236 | |
237 | This back end is only useful if you also have a way to load and |
238 | execute the bytecode that it produces. The ByteLoader module provides |
239 | this functionality. |
240 | |
241 | To turn a Perl program into executable byte code, you can use C<perlcc> |
242 | with the C<-b> switch: |
243 | |
244 | perlcc -b myperlprogram.pl |
245 | |
246 | The byte code is machine independent, so once you have a compiled |
247 | module or program, it is as portable as Perl source (assuming that |
248 | the user of the module or program has a modern-enough Perl interpreter |
249 | to decode the byte code). |
250 | |
251 | See L<B::Bytecode> for information on options to control the |
252 | optimization and nature of the code generated by the Bytecode module. |
253 | |
254 | =head2 The Optimized C Back End |
255 | |
256 | The optimized C back end will turn your Perl program's run time |
257 | code-path into an equivalent (but optimized) C program that manipulates |
258 | the Perl data structures directly. The program will still link against |
259 | the Perl interpreter library, to allow for eval(), C<s///e>, |
260 | C<require>, etc. |
261 | |
262 | The C<perlcc> tool generates such executables when using the C<-opt> |
263 | switch. To compile a Perl program (ending in C<.pl> |
264 | or C<.p>): |
265 | |
266 | perlcc -opt myperlprogram.pl |
267 | |
268 | To produce a shared library from a Perl module (ending in C<.pm>): |
269 | |
270 | perlcc -opt Myperlmodule.pm |
271 | |
272 | For more information, see L<perlcc> and L<B::CC>. |
273 | |
274 | =over 4 |
275 | |
276 | =item B |
277 | |
278 | This module is the introspective ("reflective" in Java terms) |
279 | module, which allows a Perl program to inspect its innards. The |
280 | back end modules all use this module to gain access to the compiled |
281 | parse tree. You, the user of a back end module, will not need to |
282 | interact with B. |
283 | |
284 | =item O |
285 | |
286 | This module is the front-end to the compiler's back ends. It is |
287 | normally invoked like this: |
288 | |
289 | $ perl -MO=Deparse myperlprogram |
290 | |
291 | This is like saying C<use O 'Deparse'> in your Perl program. |
292 | |
293 | =item B::Asmdata |
294 | |
295 | This module is used by the B::Assembler module, which is in turn used |
296 | by the B::Bytecode module, which stores a parse-tree as |
297 | bytecode for later loading. It's not a back end itself, but rather a |
298 | component of a back end. |
299 | |
300 | =item B::Assembler |
301 | |
302 | This module turns a parse-tree into data suitable for storing |
303 | and later decoding back into a parse-tree. It's not a back end |
304 | itself, but rather a component of a back end. It's used by the |
305 | I<assemble> program that produces bytecode. |
306 | |
307 | =item B::Bblock |
308 | |
309 | This module is used by the B::CC back end. It walks "basic blocks": |
310 | straight-line runs of operations with no branching or halting inside. |
311 | |
312 | =item B::Bytecode |
313 | |
314 | This module is a back end that generates bytecode from a |
315 | program's parse tree. This bytecode is written to a file, from where |
316 | it can later be reconstructed back into a parse tree. The goal is to |
317 | do the expensive program compilation once, save the interpreter's |
318 | state into a file, and then restore the state from the file when the |
319 | program is to be executed. See L</"The Bytecode Back End"> |
320 | for details about usage. |
321 | |
322 | =item B::C |
323 | |
324 | This module writes out C code corresponding to the parse tree and |
325 | other interpreter internal structures. You compile the corresponding |
326 | C file, and get an executable file that will restore the internal |
327 | structures and the Perl interpreter will begin running the |
328 | program. See L</"The Simple C Back End"> for details about usage. |
329 | |
330 | =item B::CC |
331 | |
332 | This module writes out C code corresponding to your program's |
333 | operations. Unlike the B::C module, which merely stores the |
334 | interpreter and its state in a C program, the B::CC module translates |
335 | the run-time code path itself into C (the result still links against |
336 | the Perl interpreter library). As a consequence, programs translated |
337 | by B::CC can execute faster than normal interpreted programs. See |
338 | L</"The Optimized C Back End"> for details about usage. |
339 | |
340 | =item B::Debug |
341 | |
342 | This module dumps the Perl parse tree in verbose detail to STDOUT. |
343 | It's useful for people who are writing their own back end, or who |
344 | are learning about the Perl internals. It's not useful to the |
345 | average programmer. |
346 | |
347 | =item B::Deparse |
348 | |
349 | This module produces Perl source code from the compiled parse tree. |
350 | It is useful for debugging and deconstructing other people's code, |
351 | and also as a pretty-printer for your own source. See |
352 | L</"The Decompiling Back End"> for details about usage. |
353 | |
354 | =item B::Disassembler |
355 | |
356 | This module turns bytecode back into a parse tree. It's not a back |
357 | end itself, but rather a component of a back end. It's used by the |
358 | I<disassemble> program that comes with the bytecode. |
359 | |
360 | =item B::Lint |
361 | |
362 | This module inspects the compiled form of your source code for things |
363 | which, while some people frown on them, aren't necessarily bad enough |
364 | to justify a warning. For instance, use of an array in scalar context |
365 | without explicitly saying C<scalar(@array)> is something that Lint |
366 | can identify. See L</"The Lint Back End (B::Lint)"> for details about usage. |
367 | |
368 | =item B::Showlex |
369 | |
370 | This module prints out the my() variables used in a function or a |
371 | file. To get a list of the my() variables used in the subroutine |
372 | mysub() defined in the file myperlprogram: |
373 | |
374 | $ perl -MO=Showlex,mysub myperlprogram |
375 | |
376 | To get a list of the my() variables used in the file myperlprogram: |
377 | |
378 | $ perl -MO=Showlex myperlprogram |
379 | |
380 | [BROKEN] |
381 | |
382 | =item B::Stackobj |
383 | |
384 | This module is used by the B::CC module. It's not a back end itself, |
385 | but rather a component of a back end. |
386 | |
387 | =item B::Stash |
388 | |
389 | This module is used by the L<perlcc> program, which compiles a module |
390 | into an executable. B::Stash prints the symbol tables in use by a |
391 | program, and is used to prevent B::CC from producing C code for the |
392 | B::* and O modules. It's not a back end itself, but rather a |
393 | component of a back end. |
394 | |
395 | =item B::Terse |
396 | |
397 | This module prints the contents of the parse tree, but without as much |
398 | information as B::Debug. For comparison, C<print "Hello, world."> |
399 | produced 96 lines of output from B::Debug, but only 6 from B::Terse. |
400 | |
401 | This module is useful for people who are writing their own back end, |
402 | or who are learning about the Perl internals. It's not useful to the |
403 | average programmer. |
404 | |
405 | =item B::Xref |
406 | |
407 | This module prints a report on where the variables, subroutines, and |
408 | formats are defined and used within a program and the modules it |
409 | loads. See L</"The Cross Referencing Back End (B::Xref)"> for details about |
410 | usage. |
411 | |
412 | =back |
413 | |
414 | =head1 KNOWN PROBLEMS |
415 | |
416 | The simple C backend currently only saves typeglobs with alphanumeric |
417 | names. |
418 | |
419 | The optimized C backend outputs code for more modules than it should |
420 | (e.g., DirHandle). It also has little hope of properly handling |
421 | C<goto LABEL> outside the running subroutine (C<goto &sub> is ok). |
422 | C<goto LABEL> currently does not work at all in this backend. |
423 | It also creates a huge initialization function that gives |
424 | C compilers headaches. Splitting the initialization function gives |
425 | better results. Other problems include: unsigned math does not |
426 | work correctly; some opcodes are handled incorrectly by the default |
427 | opcode handling mechanism. |
428 | |
429 | BEGIN{} blocks are executed while compiling your code. Any external |
430 | state that is initialized in BEGIN{}, such as opened files or initiated |
431 | database connections, does not behave properly. To work around |
432 | this, Perl has an INIT{} block that corresponds to code being executed |
433 | before your program begins running but after your program has finished |
434 | being compiled. Execution order: BEGIN{}, (possible save of state |
435 | through compiler back-end), INIT{}, program runs, END{}. |
436 | |
437 | =head1 AUTHOR |
438 | |
439 | This document was originally written by Nathan Torkington, and is now |
440 | maintained by the perl5-porters mailing list |
441 | I<perl5-porters@perl.org>. |
442 | |
443 | =cut |