1 | =head1 NAME |
2 | |
3 | perlfilter - Source Filters |
4 | |
5 | |
6 | =head1 DESCRIPTION |
7 | |
8 | This article is about a little-known feature of Perl called |
9 | I<source filters>. Source filters alter the program text of a module |
10 | before Perl sees it, much as a C preprocessor alters the source text of |
11 | a C program before the compiler sees it. This article tells you more |
12 | about what source filters are, how they work, and how to write your |
13 | own. |
14 | |
15 | The original purpose of source filters was to let you encrypt your |
16 | program source to prevent casual piracy. This isn't all they can do, as |
17 | you'll soon learn. But first, the basics. |
18 | |
19 | =head1 CONCEPTS |
20 | |
21 | Before the Perl interpreter can execute a Perl script, it must first |
22 | read it from a file into memory for parsing and compilation. If that |
23 | script itself includes other scripts with a C<use> or C<require> |
24 | statement, then each of those scripts will have to be read from their |
25 | respective files as well. |
26 | |
27 | Now think of each logical connection between the Perl parser and an |
28 | individual file as a I<source stream>. A source stream is created when |
29 | the Perl parser opens a file, it continues to exist as the source code |
30 | is read into memory, and it is destroyed when Perl is finished parsing |
31 | the file. If the parser encounters a C<require> or C<use> statement in |
32 | a source stream, a new and distinct stream is created just for that |
33 | file. |
34 | |
35 | The diagram below represents a single source stream, with the flow of |
36 | source from a Perl script file on the left into the Perl parser on the |
37 | right. This is how Perl normally operates. |
38 | |
39 | file -------> parser |
40 | |
41 | There are two important points to remember: |
42 | |
43 | =over 5 |
44 | |
45 | =item 1. |
46 | |
47 | Although there can be any number of source streams in existence at any |
48 | given time, only one will be active. |
49 | |
50 | =item 2. |
51 | |
52 | Every source stream is associated with only one file. |
53 | |
54 | =back |
55 | |
56 | A source filter is a special kind of Perl module that intercepts and |
57 | modifies a source stream before it reaches the parser. A source filter |
58 | changes our diagram like this: |
59 | |
60 | file ----> filter ----> parser |
61 | |
62 | If that doesn't make much sense, consider the analogy of a command |
63 | pipeline. Say you have a shell script stored in the compressed file |
64 | I<trial.gz>. The simple pipeline command below runs the script without |
65 | needing to create a temporary file to hold the uncompressed file. |
66 | |
67 | gunzip -c trial.gz | sh |
68 | |
69 | In this case, the data flow from the pipeline can be represented as follows: |
70 | |
71 | trial.gz ----> gunzip ----> sh |
72 | |
73 | With source filters, you can store the text of your script compressed and use a source filter to uncompress it for Perl's parser: |
74 | |
75 |      compressed           gunzip |
76 |     Perl program ---> source filter ---> parser |
77 | |
78 | =head1 USING FILTERS |
79 | |
80 | So how do you use a source filter in a Perl script? Above, I said that |
81 | a source filter is just a special kind of module. Like all Perl |
82 | modules, a source filter is invoked with a use statement. |
83 | |
84 | Say you want to pass your Perl source through the C preprocessor before |
85 | execution. You could use the existing C<-P> command line option to do |
86 | this, but as it happens, the source filters distribution comes with a C |
87 | preprocessor filter module called Filter::cpp. Let's use that instead. |
88 | |
89 | Below is an example program, C<cpp_test>, which makes use of this filter. |
90 | Line numbers have been added to allow specific lines to be referenced |
91 | easily. |
92 | |
93 | 1: use Filter::cpp ; |
94 | 2: #define TRUE 1 |
95 | 3: $a = TRUE ; |
96 | 4: print "a = $a\n" ; |
97 | |
98 | When you execute this script, Perl creates a source stream for the |
99 | file. Before the parser processes any of the lines from the file, the |
100 | source stream looks like this: |
101 | |
102 | cpp_test ---------> parser |
103 | |
104 | Line 1, C<use Filter::cpp>, includes and installs the C<cpp> filter |
105 | module. All source filters work this way. The use statement is compiled |
106 | and executed at compile time, before any more of the file is read, and |
107 | it attaches the cpp filter to the source stream behind the scenes. Now |
108 | the data flow looks like this: |
109 | |
110 | cpp_test ----> cpp filter ----> parser |
111 | |
112 | As the parser reads the second and subsequent lines from the source |
113 | stream, it feeds those lines through the C<cpp> source filter before |
114 | processing them. The C<cpp> filter simply passes each line through the |
115 | real C preprocessor. The output from the C preprocessor is then |
116 | inserted back into the source stream by the filter. |
117 | |
118 |                   .-> cpp --. |
119 |                   |         | |
120 |                   |         | |
121 |                   |       <-' |
122 |    cpp_test ----> cpp filter ----> parser |
123 | |
124 | The parser then sees the following code: |
125 | |
126 | use Filter::cpp ; |
127 | $a = 1 ; |
128 | print "a = $a\n" ; |
129 | |
130 | Let's consider what happens when the filtered code includes another |
131 | module with use: |
132 | |
133 | 1: use Filter::cpp ; |
134 | 2: #define TRUE 1 |
135 | 3: use Fred ; |
136 | 4: $a = TRUE ; |
137 | 5: print "a = $a\n" ; |
138 | |
139 | The C<cpp> filter does not apply to the text of the Fred module, only |
140 | to the text of the file that used it (C<cpp_test>). Although the use |
141 | statement on line 3 will pass through the cpp filter, the module that |
142 | gets included (C<Fred>) will not. The source streams look like this |
143 | after line 3 has been parsed and before line 4 is parsed: |
144 | |
145 | cpp_test ---> cpp filter ---> parser (INACTIVE) |
146 | |
147 | Fred.pm ----> parser |
148 | |
149 | As you can see, a new stream has been created for reading the source |
150 | from C<Fred.pm>. This stream will remain active until all of C<Fred.pm> |
151 | has been parsed. The source stream for C<cpp_test> will still exist, |
152 | but is inactive. Once the parser has finished reading Fred.pm, the |
153 | source stream associated with it will be destroyed. The source stream |
154 | for C<cpp_test> then becomes active again and the parser reads line 4 |
155 | and subsequent lines from C<cpp_test>. |
156 | |
157 | You can use more than one source filter on a single file. Similarly, |
158 | you can reuse the same filter in as many files as you like. |
159 | |
160 | For example, if you have a uuencoded and compressed source file, it is |
161 | possible to stack a uudecode filter and an uncompression filter like |
162 | this: |
163 | |
164 | use Filter::uudecode ; use Filter::uncompress ; |
165 | M'XL(".H<US4''V9I;F%L')Q;>7/;1I;_>_I3=&E=%:F*I"T?22Q/ |
166 | M6]9*<IQCO*XFT"0[PL%%'Y+IG?WN^ZYN-$'J.[.JE$,20/?K=_[> |
167 | ... |
168 | |
169 | Once the first line has been processed, the flow will look like this: |
170 | |
171 |     file ---> uudecode ---> uncompress ---> parser |
172 |                filter         filter |
173 | |
174 | Data flows through filters in the same order they appear in the source |
175 | file. The uudecode filter appeared before the uncompress filter, so the |
176 | source file will be uudecoded before it's uncompressed. |
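
To see why the order matters, here is a minimal sketch that models each filter as a plain Perl function instead of a real source filter. The names C<unhex>, C<unrot13>, and C<apply_filters> are hypothetical and are not part of the source filters distribution; hex decoding and rot13 merely stand in for uudecode and uncompress.

```perl
use strict;
use warnings;

# Each hypothetical "filter" is a plain function from text to text.
sub unhex   { my ($t) = @_; return pack 'H*', $t }
sub unrot13 { my ($t) = @_; $t =~ tr/n-za-mN-ZA-M/a-zA-Z/; return $t }

# Apply the filters in the order given, as a stack of source filters would.
sub apply_filters {
    my ($text, @filters) = @_;
    $text = $_->($text) for @filters;
    return $text;
}

# Encode in the opposite order: rot13 first, then hex.
(my $encoded = 'print "hi";') =~ tr/a-zA-Z/n-za-mN-ZA-M/;
$encoded = unpack 'H*', $encoded;

# Decoding must undo the outermost encoding first.
my $decoded = apply_filters($encoded, \&unhex, \&unrot13);
print "$decoded\n";
```

The text was encoded with rot13 and then hex, so the hex decoder has to come first in the list, just as C<Filter::uudecode> has to appear before C<Filter::uncompress> in the example above.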
177 | |
178 | =head1 WRITING A SOURCE FILTER |
179 | |
180 | There are three ways to write your own source filter. You can write it |
181 | in C, use an external program as a filter, or write the filter in Perl. |
182 | I won't cover the first two in any great detail, so I'll get them out |
183 | of the way first. Writing the filter in Perl is most convenient, so |
184 | I'll devote the most space to it. |
185 | |
186 | =head1 WRITING A SOURCE FILTER IN C |
187 | |
188 | The first of the three available techniques is to write the filter |
189 | completely in C. The external module you create interfaces directly |
190 | with the source filter hooks provided by Perl. |
191 | |
192 | The advantage of this technique is that you have complete control over |
193 | the implementation of your filter. The big disadvantage is the |
194 | increased complexity required to write the filter - not only do you |
195 | need to understand the source filter hooks, but you also need a |
196 | reasonable knowledge of Perl guts. One of the few times it is worth |
197 | going to this trouble is when writing a source scrambler. The |
198 | C<decrypt> filter (which unscrambles the source before Perl parses it) |
199 | included with the source filter distribution is an example of a C |
200 | source filter (see Decryption Filters, below). |
201 | |
202 | |
203 | =over 5 |
204 | |
205 | =item B<Decryption Filters> |
206 | |
207 | All decryption filters work on the principle of "security through |
208 | obscurity." Regardless of how well you write a decryption filter and |
209 | how strong your encryption algorithm, anyone determined enough can |
210 | retrieve the original source code. The reason is quite simple - once |
211 | the decryption filter has decrypted the source back to its original |
212 | form, fragments of it will be stored in the computer's memory as Perl |
213 | parses it. The source might only be in memory for a short period of |
214 | time, but anyone possessing a debugger, skill, and lots of patience can |
215 | eventually reconstruct your program. |
216 | |
217 | That said, there are a number of steps that can be taken to make life |
218 | difficult for the potential cracker. The most important: Write your |
219 | decryption filter in C and statically link the decryption module into |
220 | the Perl binary. For further tips to make life difficult for the |
221 | potential cracker, see the file I<decrypt.pm> in the source filters |
222 | module. |
223 | |
224 | =back |
225 | |
226 | =head1 CREATING A SOURCE FILTER AS A SEPARATE EXECUTABLE |
227 | |
228 | An alternative to writing the filter in C is to create a separate |
229 | executable in the language of your choice. The separate executable |
230 | reads from standard input, does whatever processing is necessary, and |
231 | writes the filtered data to standard output. C<Filter::cpp> is an |
232 | example of a source filter implemented as a separate executable - the |
233 | executable is the C preprocessor bundled with your C compiler. |
234 | |
235 | The source filter distribution includes two modules that simplify this |
236 | task: C<Filter::exec> and C<Filter::sh>. Both allow you to run any |
237 | external executable. Both use a coprocess to control the flow of data |
238 | into and out of the external executable. (For details on coprocesses, |
239 | see Stevens, W.R. "Advanced Programming in the UNIX Environment." |
240 | Addison-Wesley, ISBN 0-201-56317-7, pages 441-445.) The difference |
241 | between them is that C<Filter::exec> spawns the external command |
242 | directly, while C<Filter::sh> spawns a shell to execute the external |
243 | command. (Unix uses the Bourne shell; NT uses the cmd shell.) Spawning |
244 | a shell allows you to make use of the shell metacharacters and |
245 | redirection facilities. |
246 | |
247 | Here is an example script that uses C<Filter::sh>: |
248 | |
249 | use Filter::sh 'tr XYZ PQR' ; |
250 | $a = 1 ; |
251 | print "XYZ a = $a\n" ; |
252 | |
253 | The output you'll get when the script is executed: |
254 | |
255 | PQR a = 1 |
256 | |
257 | Writing a source filter as a separate executable works fine, but a |
258 | small performance penalty is incurred. For example, if you execute the |
259 | small example above, a separate subprocess will be created to run the |
260 | Unix C<tr> command. Each use of the filter requires its own subprocess. |
261 | If creating subprocesses is expensive on your system, you might want to |
262 | consider one of the other options for creating source filters. |
263 | |
264 | =head1 WRITING A SOURCE FILTER IN PERL |
265 | |
266 | The easiest and most portable option available for creating your own |
267 | source filter is to write it completely in Perl. To distinguish this |
268 | from the previous two techniques, I'll call it a Perl source filter. |
269 | |
270 | To help understand how to write a Perl source filter we need an example |
271 | to study. Here is a complete source filter that performs rot13 |
272 | decoding. (Rot13 is a very simple encryption scheme used in Usenet |
273 | postings to hide the contents of offensive posts. It moves every letter |
274 | forward thirteen places, so that A becomes N, B becomes O, and Z |
275 | becomes M.) |
276 | |
277 | |
278 |     package Rot13 ; |
279 | |
280 |     use Filter::Util::Call ; |
281 | |
282 |     sub import { |
283 |         my ($type) = @_ ; |
284 |         my ($ref) = [] ; |
285 |         filter_add(bless $ref) ; |
286 |     } |
287 | |
288 |     sub filter { |
289 |         my ($self) = @_ ; |
290 |         my ($status) ; |
291 | |
292 |         tr/n-za-mN-ZA-M/a-zA-Z/ |
293 |             if ($status = filter_read()) > 0 ; |
294 |         $status ; |
295 |     } |
296 | |
297 |     1; |
298 | |
299 | All Perl source filters are implemented as Perl classes and have the |
300 | same basic structure as the example above. |
301 | |
302 | First, we include the C<Filter::Util::Call> module, which exports a |
303 | number of functions into your filter's namespace. The filter shown |
304 | above uses two of these functions, C<filter_add()> and |
305 | C<filter_read()>. |
306 | |
307 | Next, we create the filter object and associate it with the source |
308 | stream by defining the C<import> function. If you know Perl well |
309 | enough, you know that C<import> is called automatically every time a |
310 | module is included with a use statement. This makes C<import> the ideal |
311 | place to both create and install a filter object. |
312 | |
313 | In the example filter, the object (C<$ref>) is blessed just like any |
314 | other Perl object. Our example uses an anonymous array, but this isn't |
315 | a requirement. Because this example doesn't need to store any context |
316 | information, we could have used a scalar or hash reference just as |
317 | well. The next section demonstrates context data. |
318 | |
319 | The association between the filter object and the source stream is made |
320 | with the C<filter_add()> function. This takes a filter object as a |
321 | parameter (C<$ref> in this case) and installs it in the source stream. |
322 | |
323 | Finally, there is the code that actually does the filtering. For this |
324 | type of Perl source filter, all the filtering is done in a method |
325 | called C<filter()>. (It is also possible to write a Perl source filter |
326 | using a closure. See the C<Filter::Util::Call> manual page for more |
327 | details.) It's called every time the Perl parser needs another line of |
328 | source to process. The C<filter()> method, in turn, reads lines from |
329 | the source stream using the C<filter_read()> function. |
330 | |
331 | If a line was available from the source stream, C<filter_read()> |
332 | returns a status value greater than zero and appends the line to C<$_>. |
333 | A status value of zero indicates end-of-file, less than zero means an |
334 | error. The filter function itself is expected to return its status in |
335 | the same way, and put the filtered line it wants written to the source |
336 | stream in C<$_>. The use of C<$_> accounts for the brevity of most Perl |
337 | source filters. |
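
The calling convention can be illustrated without a real source stream. In this sketch, C<make_reader> is a hypothetical stand-in for C<filter_read()> and C<run_filter> is a hypothetical stand-in for the parser; neither is part of C<Filter::Util::Call>. The filter body is the rot13 one from above, with the reader passed in explicitly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for filter_read(): appends the next source line
# to $_ and returns 1, or returns 0 at end of input.
sub make_reader {
    my @source = @_;
    return sub {
        return 0 unless @source;
        $_ .= shift @source;
        return 1;
    };
}

# Hypothetical driver playing the parser's role: it empties $_, calls the
# filter, and collects lines until the filter reports a status <= 0.
sub run_filter {
    my ($filter, @lines) = @_;
    my $read = make_reader(@lines);
    my @out;
    while (1) {
        local $_ = '';
        my $status = $filter->($read);
        last if $status <= 0;
        push @out, $_;
    }
    return @out;
}

# Same logic as Rot13::filter, but reading through the stand-in reader.
sub rot13_filter {
    my ($read) = @_;
    my $status;
    tr/n-za-mN-ZA-M/a-zA-Z/
        if ($status = $read->()) > 0;
    return $status;
}

print join '', run_filter(\&rot13_filter, "cevag 1 ;\n", "cevag 2 ;\n");
```

The driver stops as soon as the filter returns zero, which is how the end of the file propagates from C<filter_read()> through the filter to the parser.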
338 | |
339 | In order to make use of the rot13 filter we need some way of encoding |
340 | the source file in rot13 format. The script below, C<mkrot13>, does |
341 | just that. |
342 | |
343 |     die "usage: mkrot13 filename\n" unless @ARGV ; |
344 |     my $in = $ARGV[0] ; |
345 |     my $out = "$in.tmp" ; |
346 |     open(my $in_fh, '<', $in) or die "Cannot open file $in: $!\n"; |
347 |     open(my $out_fh, '>', $out) or die "Cannot open file $out: $!\n"; |
348 | |
349 |     print $out_fh "use Rot13;\n" ; |
350 |     while (<$in_fh>) { |
351 |         tr/a-zA-Z/n-za-mN-ZA-M/ ; |
352 |         print $out_fh $_ ; |
353 |     } |
354 | |
355 |     close $in_fh; |
356 |     close $out_fh; |
357 |     unlink $in; |
358 |     rename $out, $in; |
359 | |
360 | If we encrypt this with C<mkrot13>: |
361 | |
362 | print "hello fred\n" ; |
363 | |
364 | the result will be this: |
365 | |
366 | use Rot13; |
367 | cevag "uryyb serq\a" ; |
368 | |
369 | Running it produces this output: |
370 | |
371 | hello fred |
372 | |
373 | =head1 USING CONTEXT: THE DEBUG FILTER |
374 | |
375 | The rot13 example was trivial. Here's another demonstration that |
376 | shows off a few more features. |
377 | |
378 | Say you wanted to include a lot of debugging code in your Perl script |
379 | during development, but you didn't want it available in the released |
380 | product. Source filters offer a solution. In order to keep the example |
381 | simple, let's say you wanted the debugging output to be controlled by |
382 | an environment variable, C<DEBUG>. Debugging code is enabled if the |
383 | variable exists; otherwise it is disabled. |
384 | |
385 | Two special marker lines will bracket debugging code, like this: |
386 | |
387 | ## DEBUG_BEGIN |
388 | if ($year > 1999) { |
389 | warn "Debug: millennium bug in year $year\n" ; |
390 | } |
391 | ## DEBUG_END |
392 | |
393 | The filter ensures that Perl parses the code between the |
394 | C<DEBUG_BEGIN> and C<DEBUG_END> markers only when the C<DEBUG> |
395 | environment variable exists. When C<DEBUG> does exist, the code above |
396 | should be passed through the filter unchanged. The marker lines can |
397 | also be passed through as-is, because the Perl parser will see them as |
398 | comment lines. When C<DEBUG> isn't set, we need a way to disable the |
399 | debug code. A simple way to achieve that is to convert the lines |
400 | between the two markers into comments: |
401 | |
402 | ## DEBUG_BEGIN |
403 | #if ($year > 1999) { |
404 | # warn "Debug: millennium bug in year $year\n" ; |
405 | #} |
406 | ## DEBUG_END |
407 | |
408 | Here is the complete Debug filter: |
409 | |
410 |     package Debug; |
411 | |
412 |     use strict; |
413 |     use warnings; |
414 |     use Filter::Util::Call ; |
415 | |
416 |     use constant TRUE => 1 ; |
417 |     use constant FALSE => 0 ; |
418 | |
419 |     sub import { |
420 |         my ($type) = @_ ; |
421 |         my (%context) = ( |
422 |             Enabled => defined $ENV{DEBUG}, |
423 |             InTraceBlock => FALSE, |
424 |             Filename => (caller)[1], |
425 |             LineNo => 0, |
426 |             LastBegin => 0, |
427 |         ) ; |
428 |         filter_add(bless \%context) ; |
429 |     } |
430 | |
431 |     sub Die { |
432 |         my ($self) = shift ; |
433 |         my ($message) = shift ; |
434 |         my ($line_no) = shift || $self->{LastBegin} ; |
435 |         die "$message at $self->{Filename} line $line_no.\n" |
436 |     } |
437 | |
438 |     sub filter { |
439 |         my ($self) = @_ ; |
440 |         my ($status) ; |
441 |         $status = filter_read() ; |
442 |         ++ $self->{LineNo} ; |
443 | |
444 |         # deal with EOF/error first |
445 |         if ($status <= 0) { |
446 |             $self->Die("DEBUG_BEGIN has no DEBUG_END") |
447 |                 if $self->{InTraceBlock} ; |
448 |             return $status ; |
449 |         } |
450 | |
451 |         if ($self->{InTraceBlock}) { |
452 |             if (/^\s*##\s*DEBUG_BEGIN/ ) { |
453 |                 $self->Die("Nested DEBUG_BEGIN", $self->{LineNo}) |
454 |             } elsif (/^\s*##\s*DEBUG_END/) { |
455 |                 $self->{InTraceBlock} = FALSE ; |
456 |             } |
457 | |
458 |             # comment out the debug lines when the filter is disabled |
459 |             s/^/#/ if ! $self->{Enabled} ; |
460 |         } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) { |
461 |             $self->{InTraceBlock} = TRUE ; |
462 |             $self->{LastBegin} = $self->{LineNo} ; |
463 |         } elsif ( /^\s*##\s*DEBUG_END/ ) { |
464 |             $self->Die("DEBUG_END has no DEBUG_BEGIN", $self->{LineNo}); |
465 |         } |
466 |         return $status ; |
467 |     } |
468 | |
469 |     1 ; |
470 | |
471 | The big difference between this filter and the previous example is the |
472 | use of context data in the filter object. The filter object is based on |
473 | a hash reference, and is used to keep various pieces of context |
474 | information between calls to the filter function. All but two of the |
475 | hash fields are used for error reporting. The first of those two, |
476 | Enabled, is used by the filter to determine whether the debugging code |
477 | should be given to the Perl parser. The second, InTraceBlock, is true |
478 | when the filter has encountered a C<DEBUG_BEGIN> line, but has not yet |
479 | encountered the following C<DEBUG_END> line. |
480 | |
481 | If you ignore all the error checking that most of the code does, the |
482 | essence of the filter is as follows: |
483 | |
484 |     sub filter { |
485 |         my ($self) = @_ ; |
486 |         my ($status) ; |
487 |         $status = filter_read() ; |
488 | |
489 |         # deal with EOF/error first |
490 |         return $status if $status <= 0 ; |
491 |         if ($self->{InTraceBlock}) { |
492 |             if (/^\s*##\s*DEBUG_END/) { |
493 |                 $self->{InTraceBlock} = FALSE |
494 |             } |
495 | |
496 |             # comment out debug lines when the filter is disabled |
497 |             s/^/#/ if ! $self->{Enabled} ; |
498 |         } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) { |
499 |             $self->{InTraceBlock} = TRUE ; |
500 |         } |
501 |         return $status ; |
502 |     } |
503 | |
504 | Be warned: just as the C preprocessor doesn't know C, the Debug filter |
505 | doesn't know Perl. It can be fooled quite easily: |
506 | |
507 | print <<EOM; |
508 | ##DEBUG_BEGIN |
509 | EOM |
510 | |
511 | Such things aside, you can see that a lot can be achieved with a modest |
512 | amount of code. |
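
The effect of the essential logic can be tried out on a list of lines without installing a real filter. This is only a sketch: C<debug_transform> is a hypothetical helper, and its C<$enabled> argument plays the role of the C<Enabled> field.

```perl
use strict;
use warnings;

# Hypothetical helper: run the essence of the Debug filter over a list of
# lines and return the transformed copies.
sub debug_transform {
    my ($enabled, @lines) = @_;
    my $in_block = 0;
    for (@lines) {
        if ($in_block) {
            # The END marker closes the block but is still inside it here.
            $in_block = 0 if /^\s*##\s*DEBUG_END/;

            # comment out debug lines when debugging is disabled
            s/^/#/ unless $enabled;
        }
        elsif (/^\s*##\s*DEBUG_BEGIN/) {
            $in_block = 1;
        }
    }
    return @lines;
}

my @source = (
    "## DEBUG_BEGIN\n",
    "warn \"Debug: checkpoint\\n\" ;\n",
    "## DEBUG_END\n",
    "print \"done\\n\" ;\n",
);

print join '', debug_transform(0, @source);
```

With the first argument set to 0, the lines inside the block come back prefixed with C<#>. The C<DEBUG_END> marker itself also gains a C<#>, just as it does in the filter above, which is harmless because the line is already a comment.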
513 | |
514 | =head1 CONCLUSION |
515 | |
516 | You now have a better understanding of what a source filter is, and |
517 | you might even have a possible use for one. If you feel like playing with |
518 | source filters but need a bit of inspiration, here are some extra |
519 | features you could add to the Debug filter. |
520 | |
521 | First, an easy one. Rather than having debugging code that is |
522 | all-or-nothing, it would be much more useful to be able to control |
523 | which specific blocks of debugging code get included. Try extending the |
524 | syntax for debug blocks to allow each to be identified. The contents of |
525 | the C<DEBUG> environment variable can then be used to control which |
526 | blocks get included. |
527 | |
528 | Once you can identify individual blocks, try allowing them to be |
529 | nested. That isn't difficult either. |
530 | |
531 | Here is an interesting idea that doesn't involve the Debug filter. |
532 | Currently Perl subroutines have fairly limited support for formal |
533 | parameter lists. You can specify the number of parameters and their |
534 | type, but you still have to manually take them out of the C<@_> array |
535 | yourself. Write a source filter that allows you to have a named |
536 | parameter list. Such a filter would turn this: |
537 | |
538 | sub MySub ($first, $second, @rest) { ... } |
539 | |
540 | into this: |
541 | |
542 | sub MySub($$@) { |
543 | my ($first) = shift ; |
544 | my ($second) = shift ; |
545 | my (@rest) = @_ ; |
546 | ... |
547 | } |
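
As a starting point, the per-line rewrite at the heart of such a filter might look like the sketch below. C<expand_signature> is a hypothetical helper, not an existing module function, and it makes strong simplifying assumptions: the declaration up to the opening brace sits on one line by itself, and the list contains only plain C<$scalar> and C<@array> names, with any array last.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a "sub NAME (LIST) {" line into a
# prototype plus explicit unpacking of @_. Other lines pass through.
sub expand_signature {
    my ($line) = @_;
    return $line
        unless $line =~ /^(\s*)sub\s+(\w+)\s*\(\s*([\$\@]\w+(?:\s*,\s*[\$\@]\w+)*)\s*\)\s*\{\s*$/;
    my ($indent, $name, $list) = ($1, $2, $3);
    my @params = split /\s*,\s*/, $list;

    # Build the prototype from the sigils: ($first, @rest) becomes ($@).
    my $proto = join '', map { substr($_, 0, 1) } @params;

    my $out = "${indent}sub $name($proto) {\n";
    for my $p (@params) {
        # Scalars take one argument each; an array takes what is left.
        my $value = $p =~ /^\$/ ? 'shift' : '@_';
        $out .= "$indent    my ($p) = $value ;\n";
    }
    return $out;
}

print expand_signature('sub MySub ($first, $second, @rest) {' . "\n");
```

A real filter would call something like this on C<$_> after each successful C<filter_read()>, and would have to decide how to cope with declarations that span several lines.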
548 | |
549 | Finally, if you feel like a real challenge, have a go at writing a |
550 | full-blown Perl macro preprocessor as a source filter. Borrow the |
551 | useful features from the C preprocessor and any other macro processors |
552 | you know. The tricky bit will be choosing how much knowledge of Perl's |
553 | syntax you want your filter to have. |
554 | |
555 | =head1 REQUIREMENTS |
556 | |
557 | The Source Filters distribution is available on CPAN, in |
558 | |
559 | CPAN/modules/by-module/Filter |
560 | |
561 | =head1 AUTHOR |
562 | |
563 | Paul Marquess E<lt>Paul.Marquess@btinternet.comE<gt> |
564 | |
565 | =head1 Copyrights |
566 | |
567 | This article originally appeared in The Perl Journal #11, and is |
568 | copyright 1998 The Perl Journal. It appears courtesy of Jon Orwant and |
569 | The Perl Journal. This document may be distributed under the same terms |
570 | as Perl itself. |