4 use File::Basename qw(&basename &dirname);
7 # List explicitly here the variables you want Configure to
8 # generate. Metaconfig only looks for shell variables, so you
9 # have to mention them as if they were shell variables, not
10 # %Config entries. Thus you write
12 # to ensure Configure will look for $Config{startperl}.
14 # This forces PL files to create target in same directory as PL file.
15 # This is so that make depend always knows where to find PL derivatives.
18 $file = basename($0, '.PL');
19 $file .= '.com' if $^O eq 'VMS';
21 open OUT,">$file" or die "Can't create $file: $!";
23 print "Extracting $file (with variable substitutions)\n";
25 # In this section, perl variables will be expanded during extraction.
26 # You can use $Config{...} to use Configure variables.
28 print OUT <<"!GROK!THIS!";
30 eval 'exec $Config{perlpath} -S \$0 \${1+"\$@"}'
31 if \$running_under_some_shell;
32 my \$startperl = '$Config{startperl}';
33 my \$perlpath = '$Config{perlpath}';
36 # In the following, perl variables are not expanded during extraction.
38 print OUT <<'!NO!SUBS!';
40 $0 =~ s/^.*?(\w+)$/$1/;
42 # (p)sed - a stream editor
43 # History: Aug 12 2000: Original version.
55 sed [-an] script [file ...]
56 sed [-an] [-e script] [-f script-file] [file ...]
60 A stream editor reads the input stream consisting of the specified files
61 (or standard input, if none are given), processes is line by line by
62 applying a script consisting of edit commands, and writes resulting lines
63 to standard output. The filename `C<->' may be used to read standard input.
65 The edit script is composed from arguments of B<-e> options and
66 script-files, in the given order. A single script argument may be specified
67 as the first parameter.
69 If this program is invoked with the name F<s2p>, it will act as a
70 sed-to-Perl translator. See L<"sed Script Translation">.
72 B<sed> returns an exit code of 0 on success or >0 if an error occurred.
80 A file specified as argument to the B<w> edit command is by default
81 opened before input processing starts. Using B<-a>, opening of such
82 files is delayed until the first line is actually written to the file.
86 The editing commands defined by I<script> are appended to the script.
87 Multiple commands must be separated by newlines.
89 =item B<-f> I<script-file>
91 Editing commands from the specified I<script-file> are read and appended
96 By default, a line is written to standard output after the editing script
97 has been applied to it. The B<-n> option suppresses automatic printing.
103 B<sed> command syntax is defined as
105 Z<> Z<> Z<> Z<>[I<address>[B<,>I<address>]][B<!>]I<function>[I<argument>]
107 with whitespace being permitted before or after addresses, and between
108 the function character and the argument. The I<address>es and the
109 address inverter (C<!>) are used to restrict the application of a
110 command to the selected line(s) of input.
112 Each command must be on a line of its own, except where noted in
115 The edit cycle performed on each input line consist of reading the line
116 (without its trailing newline character) into the I<pattern space>,
117 applying the applicable commands of the edit script, writing the final
118 contents of the pattern space and a newline to the standard output.
119 A I<hold space> is provided for saving the contents of the
120 pattern space for later use.
124 A sed address is either a line number or a pattern, which may be combined
125 arbitrarily to construct ranges. Lines are numbered across all input files.
127 Any address may be followed by an exclamation mark (`C<!>'), selecting
128 all lines not matching that address.
134 The line with the given number is selected.
138 A dollar sign (C<$>) is the line number of the last line of the input stream.
140 =item B</>I<regular expression>B</>
142 A pattern address is a basic regular expression (see
143 L<"Basic Regular Expressions">), between the delimiting character C</>.
144 Any other character except C<\> or newline may be used to delimit a
145 pattern address when the initial delimiter is prefixed with a
150 If no address is given, the command selects every line.
152 If one address is given, it selects the line (or lines) matching the
155 Two addresses select a range that begins whenever the first address
156 matches, and ends (including that line) when the second address matches.
157 If the first (second) address is a matching pattern, the second
158 address is not applied to the very same line to determine the end of
159 the range. Likewise, if the second address is a matching pattern, the
160 first address is not applied to the very same line to determine the
161 begin of another range. If both addresses are line numbers,
162 and the second line number is less than the first line number, then
163 only the first line is selected.
168 The maximum permitted number of addresses is indicated with each
169 function synopsis below.
171 The argument I<text> consists of one or more lines following the command.
172 Embedded newlines in I<text> must be preceded with a backslash. Other
173 backslashes in I<text> are deleted and the following character is taken
181 #--------------------------------------------------------------------------
182 $ComTab{'a'}=[ 1, 'txt', \&Emit, '{ push( @Q, <<'."'TheEnd' ) }\n" ]; #ok
184 =item [1addr]B<a\> I<text>
186 Write I<text> (which must start on the line following the command)
187 to standard output immediately before reading the next line
188 of input, either by executing the B<N> function or by beginning a new cycle.
192 #--------------------------------------------------------------------------
193 $ComTab{'b'}=[ 2, 'str', \&Branch, '{ goto XXX; }' ]; #ok
195 =item [2addr]B<b> [I<label>]
197 Branch to the B<:> function with the specified I<label>. If no label
198 is given, branch to the end of the script.
202 #--------------------------------------------------------------------------
203 $ComTab{'c'}=[ 2, 'txt', \&Change, <<'-X-' ]; #ok
204 { print <<'TheEnd'; } $doPrint = 0; goto EOS;
206 ### continue OK => next CYCLE;
208 =item [2addr]B<c\> I<text>
210 The line, or range of lines, selected by the address is deleted.
211 The I<text> (which must start on the line following the command)
212 is written to standard output. With an address range, this occurs at
213 the end of the range.
217 #--------------------------------------------------------------------------
218 $ComTab{'d'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
223 ### continue OK => next CYCLE;
227 Deletes the pattern space and starts the next cycle.
231 #--------------------------------------------------------------------------
232 $ComTab{'D'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
234 if(length($_)){ goto BOS } else { goto EOS }
237 ### continue OK => next CYCLE;
241 Deletes the pattern space through the first embedded newline or to the end.
242 If the pattern space becomes empty, a new cycle is started, otherwise
243 execution of the script is restarted.
247 #--------------------------------------------------------------------------
248 $ComTab{'g'}=[ 2, '', \&Emit, '{ $_ = $Hold };' ]; #ok
252 Replace the contents of the pattern space with the hold space.
256 #--------------------------------------------------------------------------
257 $ComTab{'G'}=[ 2, '', \&Emit, '{ $_ .= "\n"; $_ .= $Hold };' ]; #ok
261 Append a newline and the contents of the hold space to the pattern space.
265 #--------------------------------------------------------------------------
266 $ComTab{'h'}=[ 2, '', \&Emit, '{ $Hold = $_ }' ]; #ok
270 Replace the contents of the hold space with the pattern space.
274 #--------------------------------------------------------------------------
275 $ComTab{'H'}=[ 2, '', \&Emit, '{ $Hold .= "\n"; $Hold .= $_; }' ]; #ok
279 Append a newline and the contents of the pattern space to the hold space.
283 #--------------------------------------------------------------------------
284 $ComTab{'i'}=[ 1, 'txt', \&Emit, '{ print <<'."'TheEnd' }\n" ]; #ok
286 =item [1addr]B<i\> I<text>
288 Write the I<text> (which must start on the line following the command)
293 #--------------------------------------------------------------------------
294 $ComTab{'l'}=[ 2, '', \&Emit, '{ _l() }' ]; #okUTF8
298 Print the contents of the pattern space: non-printable characters are
299 shown in C-style escaped form; long lines are split and have a trailing
300 `C<\>' at the point of the split; the true end of a line is marked with
301 a `C<$>'. Escapes are: `\a', `\t', `\n', `\f', `\r', `\e' for
302 BEL, HT, LF, FF, CR, ESC, respectively, and `\' followed by a three-digit
303 octal number for all other non-printable characters.
307 #--------------------------------------------------------------------------
308 $ComTab{'n'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
309 { print $_, "\n" if $doPrint;
312 last CYCLE unless getsARGV();
319 If automatic printing is enabled, write the pattern space to the standard
320 output. Replace the pattern space with the next line of input. If
321 there is no more input, processing is terminated.
325 #--------------------------------------------------------------------------
326 $ComTab{'N'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
329 last CYCLE unless getsARGV( $h );
337 Append a newline and the next line of input to the pattern space. If
338 there is no more input, processing is terminated.
342 #--------------------------------------------------------------------------
343 $ComTab{'p'}=[ 2, '', \&Emit, '{ print $_, "\n"; }' ]; #ok
347 Print the pattern space to the standard output. (Use the B<-n> option
348 to suppress automatic printing at the end of a cycle if you want to
349 avoid double printing of lines.)
353 #--------------------------------------------------------------------------
354 $ComTab{'P'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
355 { if( /^(.*)/ ){ print $1, "\n"; } }
360 Prints the pattern space through the first embedded newline or to the end.
364 #--------------------------------------------------------------------------
365 $ComTab{'q'}=[ 1, '', \&Emit, <<'-X-' ]; #ok
366 { print $_, "\n" if $doPrint;
373 Branch to the end of the script and quit without starting a new cycle.
377 #--------------------------------------------------------------------------
378 $ComTab{'r'}=[ 1, 'str', \&Emit, "{ _r( '-X-' ) }" ]; #ok
379 ### FIXME: lazy reading - big files???
381 =item [1addr]B<r> I<file>
383 Copy the contents of the I<file> to standard output immediately before
384 the next attempt to read a line of input. Any error encountered while
385 reading I<file> is silently ignored.
389 #--------------------------------------------------------------------------
390 $ComTab{'s'}=[ 2, 'sub', \&Emit, '' ]; #ok
392 =item [2addr]B<s/>I<regular expression>B</>I<replacement>B</>I<flags>
394 Substitute the I<replacement> string for the first substring in
395 the pattern space that matches the I<regular expression>.
396 Any character other than backslash or newline can be used instead of a
397 slash to delimit the regular expression and the replacement.
398 To use the delimiter as a literal character within the regular expression
399 and the replacement, precede the character by a backslash (`C<\>').
401 Literal newlines may be embedded in the replacement string by
402 preceding a newline with a backslash.
404 Within the replacement, an ampersand (`C<&>') is replaced by the string
405 matching the regular expression. The strings `C<\1>' through `C<\9>' are
406 replaced by the corresponding subpattern (see L<"Basic Regular Expressions">).
407 To get a literal `C<&>' or `C<\>' in the replacement text, precede it
410 The following I<flags> modify the behaviour of the B<s> command:
416 The replacement is performed for all matching, non-overlapping substrings
417 of the pattern space.
421 Replace only the n-th matching substring of the pattern space.
425 If the substitution was made, print the new value of the pattern space.
429 If the substitution was made, write the new value of the pattern space
430 to the specified file.
436 #--------------------------------------------------------------------------
437 $ComTab{'t'}=[ 2, 'str', \&Branch, '{ goto XXX if _t() }' ]; #ok
439 =item [2addr]B<t> [I<label>]
441 Branch to the B<:> function with the specified I<label> if any B<s>
442 substitutions have been made since the most recent reading of an input line
443 or execution of a B<t> function. If no label is given, branch to the end of
449 #--------------------------------------------------------------------------
450 $ComTab{'w'}=[ 2, 'str', \&Write, "{ _w( '-X-' ) }" ]; #ok
452 =item [2addr]B<w> I<file>
454 The contents of the pattern space are written to the I<file>.
458 #--------------------------------------------------------------------------
459 $ComTab{'x'}=[ 2, '', \&Emit, '{ ($Hold, $_) = ($_, $Hold) }' ]; #ok
463 Swap the contents of the pattern space and the hold space.
467 #--------------------------------------------------------------------------
468 $ComTab{'y'}=[ 2, 'tra', \&Emit, '' ]; #ok
469 =item [2addr]B<y>B</>I<string1>B</>I<string2>B</>
471 In the pattern space, replace all characters occuring in I<string1> by the
472 character at the corresponding position in I<string2>. It is possible
473 to use any character (other than a backslash or newline) instead of a
474 slash to delimit the strings. Within I<string1> and I<string2>, a
475 backslash followed by any character other than a newline is that literal
476 character, and a backslash followed by an `n' is replaced by a newline
481 #--------------------------------------------------------------------------
482 $ComTab{'='}=[ 1, '', \&Emit, '{ print "$.\n" }' ]; #ok
486 Prints the current line number on the standard output.
490 #--------------------------------------------------------------------------
491 $ComTab{':'}=[ 0, 'str', \&Label, '' ]; #ok
493 =item [0addr]B<:> [I<label>]
495 The command specifies the position of the I<label>. It has no other effect.
499 #--------------------------------------------------------------------------
500 $ComTab{'{'}=[ 2, '', \&BeginBlock, '{' ]; #ok
501 $ComTab{'}'}=[ 0, '', \&EndBlock, ';}' ]; #ok
502 # ';' to avoid warning on empty {}-block
504 =item [2addr]B<{> [I<command>]
508 These two commands begin and end a command list. The first command may
509 be given on the same line as the opening B<{> command. The commands
510 within the list are jointly selected by the address(es) given on the
511 B<{> command (but may still have individual addresses).
515 #--------------------------------------------------------------------------
516 $ComTab{'#'}=[ 0, 'str', \&Comment, '' ]; #ok
518 =item [0addr]B<#> [I<comment>]
520 The entire line is ignored (treated as a comment). If, however, the first
521 two characters in the script are `C<#n>', automatic printing of output is
522 suppressed, as if the B<-n> option were given on the command line.
528 use vars qw{ $isEOF $Hold %wFiles @Q $CondReg $doPrint };
530 my $useDEBUG = exists( $ENV{PSEDDEBUG} );
531 my $useEXTBRE = $ENV{PSEDEXTBRE} || '';
532 $useEXTBRE =~ s/[^<>wWyB]//g; # gawk RE's handle these
534 my $doAutoPrint = 1; # automatic printing of pattern space (-n => 0)
535 my $doOpenWrite = 1; # open w command output files at start (-a => 0)
536 my $svOpenWrite = 0; # save $doOpenWrite
537 my $doGenerate = $0 eq 's2p';
539 # Collected and compiled script
541 my( @Commands, %Defined, @BlockStack, %Label, $labNum, $Code );
551 my( $msg, $loc ) = @_;
553 $loc .= ': ' if length( $loc );
554 warn( "$0: $loc$msg\n" );
559 return 'L_'.++$labNum;
562 # safeHere: create safe here delimiter and modify opcode and argument
565 my( $codref, $argref ) = @_;
567 while( $$argref =~ /^$eod$/m ){
570 $$codref =~ s/TheEnd/$eod/e;
571 $$argref .= "$eod\n";
574 # Emit: create address logic and emit command
577 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
579 if( defined( $addr1 ) ){
580 if( defined( $addr2 ) ){
581 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
583 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
585 $cond = $negated ? "unless( $addr1 )\n" : "if( $addr1 )\n";
589 $Code .= "$cond$arg\n";
591 } elsif( $opcode =~ s/-X-/$arg/e ){
592 $Code .= "$cond$opcode\n";
594 } elsif( $opcode =~ /TheEnd/ ){
595 safeHere( \$opcode, \$arg );
596 $Code .= "$cond$opcode$arg";
599 $Code .= "$cond$opcode\n";
604 # Write (w command, w flag): store pathname
607 my( $addr1, $addr2, $negated, $opcode, $path, $fl ) = @_;
609 Emit( $addr1, $addr2, $negated, $opcode, $path, $fl );
613 # Label (: command): label definition
616 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
619 if( length( $lab ) ){
621 if( ! exists( $Label{$lab} ) ){
622 $h = $Label{$lab}{name} = newLabel();
624 $h = $Label{$lab}{name};
625 if( exists( $Label{$lab}{defined} ) ){
626 my $dl = $Label{$lab}{defined};
627 Warn( "duplicate label $lab (first defined at $dl)", $fl );
631 $Label{$lab}{defined} = $fl;
637 # BeginBlock ({ command): push block start
639 sub BeginBlock($$$$$$){
640 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
641 push( @BlockStack, [ $fl, $addr1, $addr2, $negated ] );
642 Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
645 # EndBlock (} command): check proper nesting
647 sub EndBlock($$$$$$){
648 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
650 my $jcom = pop( @BlockStack );
651 if( defined( $jcom ) ){
652 $rc = Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
654 Warn( "unexpected `}'", $fl );
660 # Branch (t, b commands): check or create label, substitute default
663 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
664 $lab =~ s/\s+//; # no spaces at end
666 if( length( $lab ) ){
667 if( ! exists( $Label{$lab} ) ){
668 $h = $Label{$lab}{name} = newLabel();
670 $h = $Label{$lab}{name};
672 push( @{$Label{$lab}{used}}, $fl );
676 $opcode =~ s/XXX/$h/e;
677 Emit( $addr1, $addr2, $negated, $opcode, '', $fl );
680 # Change (c command): is special due to range end watching
683 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
684 my $kwd = $negated ? 'unless' : 'if';
685 if( defined( $addr2 ) ){
686 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
688 $addr1 = '$icnt = ('.$addr1.')';
689 $opcode = 'if( $icnt =~ /E0$/ )' . $opcode;
692 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
694 safeHere( \$opcode, \$arg );
695 $Code .= "$kwd( $addr1 ){\n $opcode$arg}\n";
700 # Comment (# command): A no-op. Who would've thought that!
703 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
704 ### $Code .= "# $arg\n";
710 my( $del, $sref ) = @_;
712 print "stripRegex:$del:$$sref:\n" if $useDEBUG;
713 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E(\s*)}{}s ){
715 $regex .= $1.$sl.$del;
716 if( length( $sl ) % 2 == 0 ){
724 # stripTrans: take a <del> terminated string from y command
725 # honoring and cleaning up of \-escaped <del>'s
728 my( $del, $sref ) = @_;
730 print "stripTrans:$del:$$sref:\n" if $useDEBUG;
731 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E}{}s ){
734 if( length( $sl ) % 2 == 0 ){
745 # makey - construct Perl y/// from sed y///
748 my( $fr, $to, $fl ) = @_;
751 # Ensure that any '-' is up front.
752 # Diagnose duplicate contradicting mappings
754 for( my $i = 0; $i < length($fr); $i++ ){
755 my $fc = substr($fr,$i,1);
756 my $tc = substr($to,$i,1);
757 if( exists( $tr{$fc} ) && $tr{$fc} ne $tc ){
758 Warn( "ambiguos translation for character `$fc' in `y' command",
765 if( exists( $tr{'-'} ) ){
766 ( $fr, $to ) = ( '-', $tr{'-'} );
771 # might just as well sort it...
772 for my $fc ( sort keys( %tr ) ){
776 # make embedded delimiters and newlines safe
777 $fr =~ s/([{}])/\$1/g;
778 $to =~ s/([{}])/\$1/g;
781 return $error ? undef() : "{ y{$fr}{$to}; }";
785 # makes - construct Perl s/// from sed s///
788 my( $regex, $subst, $path, $global, $print, $nmatch, $fl ) = @_;
790 # make embedded newlines safe
791 $regex =~ s/\n/\\n/g;
792 $subst =~ s/\n/\\n/g;
797 if( length( $nmatch ) ){
800 while( --\$n && ( \$s = m ${regex}g ) ){}
801 \$s = ( substr( \$_, pos() ) =~ s ${regex}${subst}s ) if \$s;
806 { \$s = s ${regex}${subst}s${global};
811 $code .= ' print $_, "\n" if $s;'."\n";
813 if( defined( $path ) ){
815 $code .= " _w( '$path' ) if \$s;\n";
820 =head1 BASIC REGULAR EXPRESSIONS
822 A I<Basic Regular Expression> (BRE), as defined in POSIX 1003.2, consists
823 of I<atoms>, for matching parts of a string, and I<bounds>, specifying
824 repetitions of a preceding atom.
828 The possible atoms of a BRE are: B<.>, matching any single character;
829 B<^> and B<$>, matching the null string at the beginning or end
830 of a string, respectively; a I<bracket expressions>, enclosed
831 in B<[> and B<]> (see below); and any single character with no
832 other significance (matching that character). A B<\> before one
833 of: B<.>, B<^>, B<$>, B<[>, B<*>, B<\>, matching the character
834 after the backslash. A sequence of atoms enclosed in B<\(> and B<\)>
835 becomes an atom and establishes the target for a I<backreference>,
836 consisting of the substring that actually matches the enclosed atoms.
837 Finally, B<\> followed by one of the digits B<0> through B<9> is a
840 A B<^> that is not first, or a B<$> that is not last does not have
841 a special significance and need not be preceded by a backslash to
842 become literal. The same is true for a B<]>, that does not terminate
843 a bracket expression.
845 An unescaped backslash cannot be last in a BRE.
849 The BRE bounds are: B<*>, specifying 0 or more matches of the preceding
850 atom; B<\{>I<count>B<\}>, specifying that many repetitions;
851 B<\{>I<minimum>B<,\}>, giving a lower limit; and
852 B<\{>I<minimum>B<,>I<maximum>B<\}> finally defines a lower and upper
855 A bound appearing as the first item in a BRE is taken literally.
857 =head2 Bracket Expressions
859 A I<bracket expression> is a list of characters, character ranges
860 and character classes enclosed in B<[> and B<]> and matches any
861 single character from the represented set of characters.
863 A character range is written as two characters separated by B<-> and
864 represents all characters (according to the character collating sequence)
865 that are not less than the first and not greater than the second.
866 (Ranges are very collating-sequence-dependent, and portable programs
867 should avoid relying on them.)
869 A character class is one of the class names
876 enclosed in B<[:> and B<:]> and represents the set of characters
877 as defined in ctype(3).
879 If the first character after B<[> is B<^>, the sense of matching is
882 To include a literal `C<^>', place it anywhere else but first. To
883 include a literal 'C<]>' place it first or immediately after an
884 initial B<^>. To include a literal `C<->' make it the first (or
885 second after B<^>) or last character, or the second endpoint of
888 The special bracket expression constructs C<[[:E<lt>:]]> and C<[[:E<gt>:]]>
889 match the null string at the beginning and end of a word respectively.
890 (Note that neither is identical to Perl's `\b' atom.)
892 =head2 Additional Atoms
894 Since some sed implementations provide additional regular expression
895 atoms (not defined in POSIX 1003.2), B<psed> is capable of translating
896 the following backslash escapes:
900 =item B<\E<lt>> This is the same as C<[[:E<gt>:]]>.
902 =item B<\E<gt>> This is the same as C<[[:E<lt>:]]>.
904 =item B<\w> This is an abbreviation for C<[[:alnum:]_]>.
906 =item B<\W> This is an abbreviation for C<[^[:alnum:]_]>.
908 =item B<\y> Match the empty string at a word boundary.
910 =item B<\B> Match the empty string between any two either word or non-word characters.
914 To enable this feature, the environment variable PSEDEXTBRE must be set
915 to a string containing the requested characters, e.g.:
916 C<PSEDEXTBRE='E<lt>E<gt>wW'>.
921 # bre2p - convert BRE to Perl RE
924 my( $pref, $ic ) = @_;
925 $ic < length($$pref)-1 ? substr( $$pref, $ic+1, 1 ) : '';
929 my( $del, $pat, $fl ) = @_;
931 $led =~ tr/{([</})]>/;
932 $led = '' if $led eq $del;
934 $pat = substr( $pat, 1, length($pat) - 2 );
939 for( my $ic = 0; $ic < length( $pat ); $ic++ ){
940 my $c = substr( $pat, $ic, 1 );
942 ### backslash escapes
943 my $nc = peek($pat,$ic);
945 Warn( "`\\' cannot be last in pattern", $fl );
949 if( $nc eq $del ){ ## \<pattern del> => \<pattern del>
952 } elsif( $nc =~ /([[.*\\n])/ ){
953 ## check for \-escaped magics and \n:
954 ## \[ \. \* \\ \n stay as they are
957 } elsif( $nc eq '(' ){ ## \( => (
961 } elsif( $nc eq ')' ){ ## \) => )
965 Warn( "unmatched `\\)'", $fl );
970 } elsif( $nc eq '{' ){ ## repetition factor \{<i>[,[<j>]]\}
971 my $endpos = index( $pat, '\\}', $ic );
973 Warn( "unmatched `\\{'", $fl );
976 my $rep = substr( $pat, $ic+1, $endpos-($ic+1) );
979 if( $res =~ /^\^?$/ ){
981 } elsif( $rep =~ /^(\d+)(,?)(\d*)?$/ ){
985 if( length( $max ) ){
987 Warn( "maximum less than minimum in `\\{$rep\\}'",
995 if( $min == 0 && $max eq '1' ){
997 } elsif( $min == 1 && "$com$max" eq ',' ){
999 } elsif( $min == 0 && "$com$max" eq ',' ){
1002 $res .= "{$min$com$max}";
1005 Warn( "invalid repeat clause `\\{$rep\\}'", $fl );
1009 } elsif( $nc =~ /^[1-9]$/ ){
1010 ## \1 .. \9 => \1 .. \9, but check for a following digit
1011 if( $nc > $backref ){
1012 Warn( "invalid backreference ($nc)", $fl );
1016 if( peek($pat,$ic) =~ /[0-9]/ ){
1020 } elsif( $useEXTBRE && ( $nc =~ /[$useEXTBRE]/ ) ){
1021 ## extensions - at most <>wWyB - not in POSIX
1022 if( $nc eq '<' ){ ## \< => \b(?=\w), be precise
1023 $res .= '\\b(?<=\\W)';
1024 } elsif( $nc eq '>' ){ ## \> => \b(?=\W), be precise
1025 $res .= '\\b(?=\\W)';
1026 } elsif( $nc eq 'y' ){ ## \y => \b
1028 } else { ## \B, \w, \W remain the same
1031 } elsif( $nc eq $led ){
1032 ## \<closing bracketing-delimiter> - keep '\'
1035 } else { ## \ <char> => <char> ("as if `\' were not present")
1039 } elsif( $c eq '.' ){ ## . => .
1042 } elsif( $c eq '*' ){ ## * => * but \* if there's nothing preceding it
1043 if( $res =~ /^\^?$/ ){
1045 } elsif( substr( $res, -1, 1 ) ne '*' ){
1049 } elsif( $c eq '[' ){
1050 ## parse []: [^...] [^]...] [-...]
1052 if( peek($pat,$ic) eq '^' ){
1056 my $nc = peek($pat,$ic);
1057 if( $nc eq ']' || $nc eq '-' ){
1061 # check that [ is not trailing
1062 if( $ic >= length( $pat ) - 1 ){
1063 Warn( "unmatched `['", $fl );
1066 # look for [:...:] and x-y
1067 my $rstr = substr( $pat, $ic+1 );
1068 if( $rstr =~ /^((?:\[:\(\w+|[><]\):\]|[^]-](?:-[^]])?)*)/ ){
1070 $ic += length( $cnt );
1071 $cnt =~ s/([\\\$])/\\$1/g; # `\', `$' are magic in Perl []
1072 # try some simplifications
1074 if( $red =~ s/0-9// ){
1076 if( $red =~ s/A-Z// && $red =~ s/a-z// && $red =~ s/_// ){
1082 # POSIX 1003.2 has this (optional) for begin/end word
1083 $add = '\\b(?=\\W)' if $add eq '[[:<:]]';
1084 $add = '\\b(?<=\\W)' if $add eq '[[:>:]]';
1088 ## may have a trailing `-' before `]'
1089 if( $ic < length($pat) - 1 &&
1090 substr( $pat, $ic+1 ) =~ /^(-?])/ ){
1091 $ic += length( $1 );
1093 # another simplification
1094 $add =~ s/^\[(\^?)(\\[dw])]$/ $1 eq '^' ? uc($2) : $2 /e;
1097 Warn( "unmatched `['", $fl );
1101 } elsif( $c eq $led ){ ## unescaped <closing bracketing-delimiter>
1104 } elsif( $c eq ']' ){ ## unmatched ] is not magic
1107 } elsif( $c =~ /[|+?{}()]/ ){ ## not magic in BRE, but in Perl: \-quote
1110 } elsif( $c eq '^' ){ ## not magic unless 1st, but in Perl: \-quote
1111 $res .= length( $res ) ? '\\^' : '^';
1113 } elsif( $c eq '$' ){ ## not magic unless last, but in Perl: \-quote
1114 $res .= $ic == length( $pat ) - 1 ? '$' : '\\$';
1122 Warn( "unmatched `\\('", $fl );
1126 # final cleanup: eliminate raw HTs
1128 return $del . $res . ( $led ? $led : $del );
1133 # sub2p - convert sed substitution to Perl substitution
1136 my( $del, $subst, $fl ) = @_;
1138 $led =~ tr/{([</})]>/;
1139 $led = '' if $led eq $del;
1141 $subst = substr( $subst, 1, length($subst) - 2 );
1144 for( my $ic = 0; $ic < length( $subst ); $ic++ ){
1145 my $c = substr( $subst, $ic, 1 );
1147 ### backslash escapes
1148 my $nc = peek($subst,$ic);
1150 Warn( "`\\' cannot be last in substitution", $fl );
1154 if( $nc =~ /[\\$del$led]/ ){ ## \ and delimiter
1156 } elsif( $nc =~ /[1-9]/ ){ ## \1 - \9 => ${1} - ${9}
1157 $res .= '${' . $nc . '}';
1158 } else { ## everything else (includes &): omit \
1161 } elsif( $c eq '&' ){ ## & => $&
1163 } elsif( $c =~ /[\$\@$led]/ ){ ## magic in Perl's substitution string
1170 # final cleanup: eliminate raw HTs
1172 return ( $led ? $del : $led ) . $res . ( $led ? $led : $del );
1178 my( $pdef, $pfil, $plin );
1179 for( my $icom = 0; $icom < @Commands; $icom++ ){
1180 my $cmd = $Commands[$icom];
1181 print "Parse:$cmd:\n" if $useDEBUG;
1183 next unless length( $cmd );
1185 if( exists( $Defined{$icom} ) ){
1186 $pdef = $Defined{$icom};
1187 if( $pdef =~ /^ #(\d+)/ ){
1188 $pfil = 'expression #';
1197 my $fl = "$pfil$plin";
1199 # insert command as comment in gnerated code
1201 $Code .= "# $cmd\n" if $doGenerate;
1205 my( $negated, $naddr, $addr1, $addr2 );
1207 if( $cmd =~ s/^(\d+)\s*// ){
1208 $addr1 = "$1"; $naddr++;
1209 } elsif( $cmd =~ s/^\$\s*// ){
1210 $addr1 = 'eofARGV()'; $naddr++;
1211 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1213 my $regex = stripRegex( $del, \$cmd );
1214 if( defined( $regex ) ){
1215 $addr1 = 'm '.bre2p( $del, $regex, $fl ).'s';
1218 Warn( "malformed regex, 1st address", $fl );
1223 if( defined( $addr1 ) && $cmd =~ s/,\s*// ){
1224 if( $cmd =~ s/^(\d+)\s*// ){
1225 $addr2 = "$1"; $naddr++;
1226 } elsif( $cmd =~ s/^\$\s*// ){
1227 $addr2 = 'eofARGV()'; $naddr++;
1228 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1230 my $regex = stripRegex( $del, \$cmd );
1231 if( defined( $regex ) ){
1232 $addr2 = 'm '. bre2p( $del, $regex, $fl ).'s';
1235 Warn( "malformed regex, 2nd address", $fl );
1240 Warn( "invalid address after `,'", $fl );
1246 # address modifier `!'
1248 $negated = $cmd =~ s/^!\s*//;
1249 if( defined( $addr1 ) ){
1250 print "Parse: addr1=$addr1" if $useDEBUG;
1251 if( defined( $addr2 ) ){
1252 print ", addr2=$addr2 " if $useDEBUG;
1253 # both numeric and addr1 > addr2 => eliminate addr2
1254 undef( $addr2 ) if $addr1 =~ /^\d+$/ &&
1255 $addr2 =~ /^\d+$/ && $addr1 > $addr2;
1258 print 'negated' if $useDEBUG && $negated;
1259 print " command:$cmd\n" if $useDEBUG;
1263 if( $cmd !~ s/^([:#={}abcdDgGhHilnNpPqrstwxy])\s*// ){
1264 my $h = substr( $cmd, 0, 1 );
1265 Warn( "unknown command `$h'", $fl );
1271 my $tabref = $ComTab{$key};
1272 if( $naddr > $tabref->[0] ){
1273 Warn( "excess address(es)", $fl );
1279 if( $tabref->[1] eq 'str' ){
1280 # take remainder - don't care if it is empty
1284 } elsif( $tabref->[1] eq 'txt' ){
1286 my $goon = $cmd =~ /(.*)\\$/;
1288 Warn( "extra characters after command ($cmd)", $fl );
1293 if( $icom > $#Commands ){
1294 Warn( "unexpected end of script", $fl );
1298 $cmd = $Commands[$icom];
1299 $Code .= "# $cmd\n" if $doGenerate;
1300 $goon = $cmd =~ s/\\$//;
1301 $cmd =~ s/\\(.)/$1/g;
1302 $arg .= "\n" if length( $arg );
1305 $arg .= "\n" if length( $arg );
1308 } elsif( $tabref->[1] eq 'sub' ){
1310 if( ! length( $cmd ) ){
1311 Warn( "`s' command requires argument", $fl );
1315 if( $cmd =~ s{^([^\\\n])}{} ){
1317 my $regex = stripRegex( $del, \$cmd );
1318 if( ! defined( $regex ) ){
1319 Warn( "malformed regular expression", $fl );
1323 $regex = bre2p( $del, $regex, $fl );
1325 # a trailing \ indicates embedded NL (in replacement string)
1326 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1328 if( $icom > $#Commands ){
1329 Warn( "unexpected end of script", $fl );
1333 $cmd .= $Commands[$icom];
1334 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1337 my $subst = stripRegex( $del, \$cmd );
1338 if( ! defined( $regex ) ){
1339 Warn( "malformed substitution expression", $fl );
1343 $subst = sub2p( $del, $subst, $fl );
1345 # parse s/// modifier: g|p|0-9|w <file>
1346 my( $global, $nmatch, $print, $write ) =
1347 ( '', '', 0, undef );
1348 while( $cmd =~ s/^([gp0-9])// ){
1349 $1 eq 'g' ? ( $global = 'g' ) :
1350 $1 eq 'p' ? ( $print = $1 ) : ( $nmatch .= $1 );
1352 $write = $1 if $cmd =~ s/w\s*(.*)$//;
1353 ### $nmatch =~ s/^(\d)\1*$/$1/; ### may be dangerous?
1354 if( $global && length( $nmatch ) || length( $nmatch ) > 1 ){
1355 Warn( "conflicting flags `$global$nmatch'", $fl );
1360 $arg = makes( $regex, $subst,
1361 $write, $global, $print, $nmatch, $fl );
1362 if( ! defined( $arg ) ){
1368 Warn( "improper delimiter in s command", $fl );
1373 } elsif( $tabref->[1] eq 'tra' ){
1375 # a trailing \ indicates embedded newline
1376 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1378 if( $icom > $#Commands ){
1379 Warn( "unexpected end of script", $fl );
1383 $cmd .= $Commands[$icom];
1384 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1386 if( ! length( $cmd ) ){
1387 Warn( "`y' command requires argument", $fl );
1391 my $d = substr( $cmd, 0, 1 ); $cmd = substr( $cmd, 1 );
1393 Warn( "`\\' not valid as delimiter in `y' command", $fl );
1397 my $fr = stripTrans( $d, \$cmd );
1398 if( ! defined( $fr ) || ! length( $cmd ) ){
1399 Warn( "malformed `y' command argument", $fl );
1403 my $to = stripTrans( $d, \$cmd );
1404 if( ! defined( $to ) ){
1405 Warn( "malformed `y' command argument", $fl );
1409 if( length($fr) != length($to) ){
1410 Warn( "string lengths in `y' command differ", $fl );
1414 if( ! defined( $arg = makey( $fr, $to, $fl ) ) ){
1421 # $cmd must be now empty - exception is {
1422 if( $cmd !~ /^\s*$/ ){
1424 # dirty hack to process command on '{' line
1425 $Commands[$icom--] = $cmd;
1427 Warn( "extra characters after command ($cmd)", $fl );
1435 if( &{$tabref->[2]}( $addr1, $addr2, $negated,
1436 $tabref->[3], $arg, $fl ) ){
1441 while( @BlockStack ){
1442 my $bl = pop( @BlockStack );
1443 Warn( "start of unterminated `{'", $bl );
1447 for my $lab ( keys( %Label ) ){
1448 if( ! exists( $Label{$lab}{defined} ) ){
1449 for my $used ( @{$Label{$lab}{used}} ){
1450 Warn( "undefined label `$lab'", $used );
1456 exit( 1 ) if $error;
1465 print STDERR "Usage: sed [-an] command [file...]\n";
1466 print STDERR " [-an] [-e command] [-f script-file] [file...]\n";
1470 # Here we go again...
1473 while( @ARGV && $ARGV[0] =~ /^-(.)(.*)$/ ){
1478 if( length( $arg ) ){
1479 push( @Commands, split( "\n", $arg ) );
1481 push( @Commands, shift( @ARGV ) );
1483 Warn( "option -e requires an argument" );
1488 $Defined{$#Commands} = " #$expr";
1493 if( length( $arg ) ){
1496 $path = shift( @ARGV );
1498 Warn( "option -f requires an argument" );
1502 my $fst = $#Commands + 1;
1503 open( SCRIPT, "<$path" ) || die( "$0: $path: could not open ($!)\n" );
1505 while( defined( $cmd = <SCRIPT> ) ){
1507 push( @Commands, $cmd );
1510 if( $#Commands >= $fst ){
1511 $Defined{$fst} = "$path";
1515 if( $opt eq '-' && $arg eq '' ){
1518 if( $opt eq 'h' || $opt eq '?' ){
1524 } elsif( $opt eq 'a' ){
1527 Warn( "illegal option `$opt'" );
1531 if( length( $arg ) ){
1532 unshift( @ARGV, "-$arg" );
1536 # A singleton command may be the 1st argument when there are no options.
1538 if( @Commands == 0 ){
1540 Warn( "no script command given" );
1544 push( @Commands, split( "\n", shift( @ARGV ) ) );
1545 $Defined{0} = ' #1';
1548 print STDERR "Files: @ARGV\n" if $useDEBUG;
1550 # generate leading code
1552 $Code = <<'[TheEnd]';
1555 unshift( @ARGV, '-' ) unless @ARGV;
1556 my $file = shift( @ARGV );
1557 open( ARG, "<$file" )
1558 || die( "$0: can't open $file for reading ($!)\n" );
1563 my $argref = @_ ? shift() : \$_;
1564 while( $isEOF || ! defined( $$argref = <ARG> ) ){
1566 return 0 unless @ARGV;
1567 my $file = shift( @ARGV );
1568 open( ARG, "<$file" )
1569 || die( "$0: can't open $file for reading ($!)\n" );
1576 return @ARGV == 0 && ( $isEOF = eof( ARG ) );
1582 if( ! exists( $wFiles{$path} ) || $wFiles{$path} eq '' ){
1583 $handle = $wFiles{$path} = gensym();
1585 if( ! open( $handle, ">$path" ) ){
1586 die( "$0: can't open $path for writing: ($!)\n" );
1590 $handle = $wFiles{$path};
1604 if( $h =~ /[^[:print:]]/ ){
1611 $h =~ s/([^[:print:]])/sprintf("\\%03o", ord($1))/ge;
1613 while( length( $h ) > $mcpl ){
1614 my $l = substr( $h, 0, $mcpl-1 );
1615 $h = substr( $h, $mcpl );
1616 # remove incomplete \-escape from end of line
1617 if( $l =~ s/(?<!\\)(\\[0-7]{0,2})$// ){
1627 my $handle = $wFiles{$path};
1628 if( ! $doOpenWrite &&
1629 ! defined( fileno( $handle ) ) ){
1630 open( $handle, ">$path" )
1631 || die( "$0: $path: cannot open ($!)\n" );
1633 print $handle $_, "\n";
1636 # condition register test/reset
1649 if( exists( $wFiles{$$q} ) && $wFiles{$$q} ne '' ){
1650 open( $wFiles{$$q}, ">>$$q" );
1652 if( open( RF, "<$$q" ) ){
1654 while( defined( $line = <RF> ) ){
1667 my( $h, $icnt, $s, $n );
1668 # hack (not unbreakable :-/) to avoid // matching an empty string
1669 my $z = "\000"; $z =~ /$z/;
1674 $doPrint = $doAutoPrint;
1676 while( getsARGV() ){
1678 $CondReg = 0; # cleared on t
1682 # parse - avoid opening files when doing s2p
1684 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1687 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1690 # append trailing code
1692 $Code .= <<'[TheEnd]';
1693 EOS: if( $doPrint ){
1696 $doPrint = $doAutoPrint;
1705 # magic "#n" - same as -n option
1707 $doAutoPrint = 0 if substr( $Commands[0], 0, 2 ) eq '#n';
1709 # eval code - check for errors
1711 print "Code:\n$Code" if $useDEBUG;
1714 print "Code:\n$Code";
1715 die( "$0: internal error - generated incorrect Perl code: $@\n" );
1720 # write full Perl program
1723 # bang line, declarations
1726 eval 'exec $perlpath -S \$0 \${1+"\$@"}'
1728 \$0 =~ s/^.*?(\\w+)\$/\$1/;
1732 use vars qw{ \$isEOF \$Hold \%wFiles \@Q \$CondReg
1733 \$doAutoPrint \$doOpenWrite \$doPrint };
1734 \$doAutoPrint = $doAutoPrint;
1735 \$doOpenWrite = $doOpenWrite;
1738 my $wf = "'" . join( "', '", keys( %wFiles ) ) . "'";
1743 exit( 1 ) unless makeHandle( \$p );
1754 # execute: make handles (and optionally open) all w files; run!
1756 for my $p ( keys( %wFiles ) ){
1757 exit( 1 ) unless makeHandle( $p );
1765 The environment variable C<PSEDEXTBRE> may be set to extend BREs.
1766 See L<"Additional Atoms">.
1772 =item ambiguos translation for character `%s' in `y' command
1774 The indicated character appears twice, with different translations.
1776 =item `[' cannot be last in pattern
1778 A `[' in a BRE indicates the beginning of a I<bracket expression>.
1780 =item `\' cannot be last in pattern
1782 A `\' in a BRE is used to make the subsequent character literal.
1784 =item `\' cannot be last in substitution
1786 A `\' in a subsitution string is used to make the subsequent character literal.
1788 =item conflicting flags `%s'
1790 In an B<s> command, either the `g' flag and an n-th occurrence flag, or
1791 multiple n-th occurrence flags are specified. Note that only the digits
1792 `1' through `9' are permitted.
1794 =item duplicate label %s (first defined at %s)
1796 =item excess address(es)
1798 The command has more than the permitted number of addresses.
1800 =item extra characters after command (%s)
1802 =item illegal option `%s'
1804 =item improper delimiter in s command
1806 The BRE and substitution may not be delimited with `\' or newline.
1808 =item invalid address after `,'
1810 =item invalid backreference (%s)
1812 The specified backreference number exceeds the number of backreferences
1815 =item invalid repeat clause `\{%s\}'
1817 The repeat clause does not contain a valid integer value, or pair of
1820 =item malformed regex, 1st address
1822 =item malformed regex, 2nd address
1824 =item malformed regular expression
1826 =item malformed substitution expression
1828 =item malformed `y' command argument
1830 The first or second string of a B<y> command is syntactically incorrect.
1832 =item maximum less than minimum in `\{%s\}'
1834 =item no script command given
1836 There must be at least one B<-e> or one B<-f> option specifying a
1837 script or script file.
1839 =item `\' not valid as delimiter in `y' command
1841 =item option -e requires an argument
1843 =item option -f requires an argument
1845 =item `s' command requires argument
1847 =item start of unterminated `{'
1849 =item string lengths in `y' command differ
1851 The translation table strings in a B<y> commanf must have equal lengths.
1853 =item undefined label `%s'
1855 =item unexpected `}'
1857 A B<}> command without a preceding B<{> command was encountered.
1859 =item unexpected end of script
1861 The end of the script was reached although a text line after a
1862 B<a>, B<c> or B<i> command indicated another line.
1864 =item unknown command `%s'
1866 =item unterminated `['
1868 A BRE contains an unterminated bracket expression.
1870 =item unterminated `\('
1872 A BRE contains an unterminated backreference.
1874 =item `\{' without closing `\}'
1876 A BRE contains an unterminated bounds specification.
1878 =item `\)' without preceding `\('
1880 =item `y' command requires argument
1886 The basic material for the preceding section was generated by running
1890 s/^.*Warn( *"\([^"]*\)".*$/\1/
1895 s/$[_[:alnum:]]\{1,\}/%s/g
1900 on the program's own text, and piping the output into C<sort -u>.
1903 =head1 SED SCRIPT TRANSLATION
1905 If this program is invoked with the name F<s2p> it will act as a
1906 sed-to-Perl translator. After option processing (all other
1907 arguments are ignored), a Perl program is printed on standard
1908 output, which will process the input stream (as read from all
1909 arguments) in the way defined by the sed script and the option setting
1910 used for the translation.
1914 perl(1), re_format(7)
1918 The B<l> command will show escape characters (ESC) as `C<\e>', but
1919 a vertical tab (VT) in octal.
1921 Trailing spaces are truncated from labels in B<:>, B<t> and B<b> commands.
1923 The meaning of an empty regular expression (`C<//>'), as defined by B<sed>,
1924 is "the last pattern used, at run time". This deviates from the Perl
1925 interpretation, which will re-use the "last last successfully executed
1926 regular expression". Since keeping track of pattern usage would create
1927 terribly cluttered code, and differences would only appear in obscure
1928 context (where other B<sed> implementations appear to deviate, too),
1929 the Perl semantics was adopted. Note that common usage of this feature,
1930 such as in C</abc/s//xyz/>, will work as expected.
1932 Collating elements (of bracket expressions in BREs) are not implemented.
1936 This B<sed> implementation conforms to the IEEE Std1003.2-1992 ("POSIX.2")
1937 definition of B<sed>, and is compatible with the I<OpenBSD>
1938 implementation, except where otherwise noted (see L<"BUGS">).
1942 This Perl implementation of I<sed> was written by Wolfgang Laun,
1943 I<Wolfgang.Laun@alcatel.at>.
1945 =head1 COPYRIGHT and LICENSE
1947 This program is free and open software. You may use, modify,
1948 distribute, and sell this program (and any modified variants) in any
1949 way you wish, provided you do not restrict others from doing the same.
1955 close OUT or die "Can't close $file: $!";
1956 chmod 0755, $file or die "Can't reset permissions for $file: $!\n";
1957 exec("$Config{'eunicefix'} $file") if $Config{'eunicefix'} ne ':';