4 use File::Basename qw(&basename &dirname);
7 # List explicitly here the variables you want Configure to
8 # generate. Metaconfig only looks for shell variables, so you
9 # have to mention them as if they were shell variables, not
10 # %Config entries. Thus you write
12 # to ensure Configure will look for $Config{startperl}.
14 # This forces PL files to create target in same directory as PL file.
15 # This is so that make depend always knows where to find PL derivatives.
18 $file = basename($0, '.PL');
19 $file .= '.com' if $^O eq 'VMS';
21 open OUT,">$file" or die "Can't create $file: $!";
23 print "Extracting $file (with variable substitutions)\n";
25 # In this section, perl variables will be expanded during extraction.
26 # You can use $Config{...} to use Configure variables.
28 print OUT <<"!GROK!THIS!";
30 eval 'exec $Config{perlpath} -S \$0 \${1+"\$@"}'
31 if \$running_under_some_shell;
34 (\$startperl = <<'/../') =~ s/\\s*\\z//;
37 (\$perlpath = <<'/../') =~ s/\\s*\\z//;
42 # In the following, perl variables are not expanded during extraction.
44 print OUT <<'!NO!SUBS!';
46 $0 =~ s/^.*?(\w+)$/$1/;
48 # (p)sed - a stream editor
49 # History: Aug 12 2000: Original version.
61 sed [-an] script [file ...]
62 sed [-an] [-e script] [-f script-file] [file ...]
66 A stream editor reads the input stream consisting of the specified files
67 (or standard input, if none are given), processes is line by line by
68 applying a script consisting of edit commands, and writes resulting lines
69 to standard output. The filename `C<->' may be used to read standard input.
71 The edit script is composed from arguments of B<-e> options and
72 script-files, in the given order. A single script argument may be specified
73 as the first parameter.
75 If this program is invoked with the name F<s2p>, it will act as a
76 sed-to-Perl translator. See L<"sed Script Translation">.
78 B<sed> returns an exit code of 0 on success or >0 if an error occurred.
86 A file specified as argument to the B<w> edit command is by default
87 opened before input processing starts. Using B<-a>, opening of such
88 files is delayed until the first line is actually written to the file.
92 The editing commands defined by I<script> are appended to the script.
93 Multiple commands must be separated by newlines.
95 =item B<-f> I<script-file>
97 Editing commands from the specified I<script-file> are read and appended
102 By default, a line is written to standard output after the editing script
103 has been applied to it. The B<-n> option suppresses automatic printing.
109 B<sed> command syntax is defined as
111 Z<> Z<> Z<> Z<>[I<address>[B<,>I<address>]][B<!>]I<function>[I<argument>]
113 with whitespace being permitted before or after addresses, and between
114 the function character and the argument. The I<address>es and the
115 address inverter (C<!>) are used to restrict the application of a
116 command to the selected line(s) of input.
118 Each command must be on a line of its own, except where noted in
121 The edit cycle performed on each input line consist of reading the line
122 (without its trailing newline character) into the I<pattern space>,
123 applying the applicable commands of the edit script, writing the final
124 contents of the pattern space and a newline to the standard output.
125 A I<hold space> is provided for saving the contents of the
126 pattern space for later use.
130 A sed address is either a line number or a pattern, which may be combined
131 arbitrarily to construct ranges. Lines are numbered across all input files.
133 Any address may be followed by an exclamation mark (`C<!>'), selecting
134 all lines not matching that address.
140 The line with the given number is selected.
144 A dollar sign (C<$>) is the line number of the last line of the input stream.
146 =item B</>I<regular expression>B</>
148 A pattern address is a basic regular expression (see
149 L<"Basic Regular Expressions">), between the delimiting character C</>.
150 Any other character except C<\> or newline may be used to delimit a
151 pattern address when the initial delimiter is prefixed with a
156 If no address is given, the command selects every line.
158 If one address is given, it selects the line (or lines) matching the
161 Two addresses select a range that begins whenever the first address
162 matches, and ends (including that line) when the second address matches.
163 If the first (second) address is a matching pattern, the second
164 address is not applied to the very same line to determine the end of
165 the range. Likewise, if the second address is a matching pattern, the
166 first address is not applied to the very same line to determine the
167 begin of another range. If both addresses are line numbers,
168 and the second line number is less than the first line number, then
169 only the first line is selected.
174 The maximum permitted number of addresses is indicated with each
175 function synopsis below.
177 The argument I<text> consists of one or more lines following the command.
178 Embedded newlines in I<text> must be preceded with a backslash. Other
179 backslashes in I<text> are deleted and the following character is taken
187 #--------------------------------------------------------------------------
188 $ComTab{'a'}=[ 1, 'txt', \&Emit, '{ push( @Q, <<'."'TheEnd' ) }\n" ]; #ok
190 =item [1addr]B<a\> I<text>
192 Write I<text> (which must start on the line following the command)
193 to standard output immediately before reading the next line
194 of input, either by executing the B<N> function or by beginning a new cycle.
198 #--------------------------------------------------------------------------
199 $ComTab{'b'}=[ 2, 'str', \&Branch, '{ goto XXX; }' ]; #ok
201 =item [2addr]B<b> [I<label>]
203 Branch to the B<:> function with the specified I<label>. If no label
204 is given, branch to the end of the script.
208 #--------------------------------------------------------------------------
209 $ComTab{'c'}=[ 2, 'txt', \&Change, <<'-X-' ]; #ok
210 { print <<'TheEnd'; } $doPrint = 0; goto EOS;
212 ### continue OK => next CYCLE;
214 =item [2addr]B<c\> I<text>
216 The line, or range of lines, selected by the address is deleted.
217 The I<text> (which must start on the line following the command)
218 is written to standard output. With an address range, this occurs at
219 the end of the range.
223 #--------------------------------------------------------------------------
224 $ComTab{'d'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
229 ### continue OK => next CYCLE;
233 Deletes the pattern space and starts the next cycle.
237 #--------------------------------------------------------------------------
238 $ComTab{'D'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
240 if(length($_)){ goto BOS } else { goto EOS }
243 ### continue OK => next CYCLE;
247 Deletes the pattern space through the first embedded newline or to the end.
248 If the pattern space becomes empty, a new cycle is started, otherwise
249 execution of the script is restarted.
253 #--------------------------------------------------------------------------
254 $ComTab{'g'}=[ 2, '', \&Emit, '{ $_ = $Hold };' ]; #ok
258 Replace the contents of the pattern space with the hold space.
262 #--------------------------------------------------------------------------
263 $ComTab{'G'}=[ 2, '', \&Emit, '{ $_ .= "\n"; $_ .= $Hold };' ]; #ok
267 Append a newline and the contents of the hold space to the pattern space.
271 #--------------------------------------------------------------------------
272 $ComTab{'h'}=[ 2, '', \&Emit, '{ $Hold = $_ }' ]; #ok
276 Replace the contents of the hold space with the pattern space.
280 #--------------------------------------------------------------------------
281 $ComTab{'H'}=[ 2, '', \&Emit, '{ $Hold .= "\n"; $Hold .= $_; }' ]; #ok
285 Append a newline and the contents of the pattern space to the hold space.
289 #--------------------------------------------------------------------------
290 $ComTab{'i'}=[ 1, 'txt', \&Emit, '{ print <<'."'TheEnd' }\n" ]; #ok
292 =item [1addr]B<i\> I<text>
294 Write the I<text> (which must start on the line following the command)
299 #--------------------------------------------------------------------------
300 $ComTab{'l'}=[ 2, '', \&Emit, '{ _l() }' ]; #okUTF8
304 Print the contents of the pattern space: non-printable characters are
305 shown in C-style escaped form; long lines are split and have a trailing
306 `C<\>' at the point of the split; the true end of a line is marked with
307 a `C<$>'. Escapes are: `\a', `\t', `\n', `\f', `\r', `\e' for
308 BEL, HT, LF, FF, CR, ESC, respectively, and `\' followed by a three-digit
309 octal number for all other non-printable characters.
313 #--------------------------------------------------------------------------
314 $ComTab{'n'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
315 { print $_, "\n" if $doPrint;
318 last CYCLE unless getsARGV();
325 If automatic printing is enabled, write the pattern space to the standard
326 output. Replace the pattern space with the next line of input. If
327 there is no more input, processing is terminated.
331 #--------------------------------------------------------------------------
332 $ComTab{'N'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
335 last CYCLE unless getsARGV( $h );
343 Append a newline and the next line of input to the pattern space. If
344 there is no more input, processing is terminated.
348 #--------------------------------------------------------------------------
349 $ComTab{'p'}=[ 2, '', \&Emit, '{ print $_, "\n"; }' ]; #ok
353 Print the pattern space to the standard output. (Use the B<-n> option
354 to suppress automatic printing at the end of a cycle if you want to
355 avoid double printing of lines.)
359 #--------------------------------------------------------------------------
360 $ComTab{'P'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
361 { if( /^(.*)/ ){ print $1, "\n"; } }
366 Prints the pattern space through the first embedded newline or to the end.
370 #--------------------------------------------------------------------------
371 $ComTab{'q'}=[ 1, '', \&Emit, <<'-X-' ]; #ok
372 { print $_, "\n" if $doPrint;
379 Branch to the end of the script and quit without starting a new cycle.
383 #--------------------------------------------------------------------------
384 $ComTab{'r'}=[ 1, 'str', \&Emit, "{ _r( '-X-' ) }" ]; #ok
385 ### FIXME: lazy reading - big files???
387 =item [1addr]B<r> I<file>
389 Copy the contents of the I<file> to standard output immediately before
390 the next attempt to read a line of input. Any error encountered while
391 reading I<file> is silently ignored.
395 #--------------------------------------------------------------------------
396 $ComTab{'s'}=[ 2, 'sub', \&Emit, '' ]; #ok
398 =item [2addr]B<s/>I<regular expression>B</>I<replacement>B</>I<flags>
400 Substitute the I<replacement> string for the first substring in
401 the pattern space that matches the I<regular expression>.
402 Any character other than backslash or newline can be used instead of a
403 slash to delimit the regular expression and the replacement.
404 To use the delimiter as a literal character within the regular expression
405 and the replacement, precede the character by a backslash (`C<\>').
407 Literal newlines may be embedded in the replacement string by
408 preceding a newline with a backslash.
410 Within the replacement, an ampersand (`C<&>') is replaced by the string
411 matching the regular expression. The strings `C<\1>' through `C<\9>' are
412 replaced by the corresponding subpattern (see L<"Basic Regular Expressions">).
413 To get a literal `C<&>' or `C<\>' in the replacement text, precede it
416 The following I<flags> modify the behaviour of the B<s> command:
422 The replacement is performed for all matching, non-overlapping substrings
423 of the pattern space.
427 Replace only the n-th matching substring of the pattern space.
431 If the substitution was made, print the new value of the pattern space.
435 If the substitution was made, write the new value of the pattern space
436 to the specified file.
442 #--------------------------------------------------------------------------
443 $ComTab{'t'}=[ 2, 'str', \&Branch, '{ goto XXX if _t() }' ]; #ok
445 =item [2addr]B<t> [I<label>]
447 Branch to the B<:> function with the specified I<label> if any B<s>
448 substitutions have been made since the most recent reading of an input line
449 or execution of a B<t> function. If no label is given, branch to the end of
455 #--------------------------------------------------------------------------
456 $ComTab{'w'}=[ 2, 'str', \&Write, "{ _w( '-X-' ) }" ]; #ok
458 =item [2addr]B<w> I<file>
460 The contents of the pattern space are written to the I<file>.
464 #--------------------------------------------------------------------------
465 $ComTab{'x'}=[ 2, '', \&Emit, '{ ($Hold, $_) = ($_, $Hold) }' ]; #ok
469 Swap the contents of the pattern space and the hold space.
473 #--------------------------------------------------------------------------
474 $ComTab{'y'}=[ 2, 'tra', \&Emit, '' ]; #ok
475 =item [2addr]B<y>B</>I<string1>B</>I<string2>B</>
477 In the pattern space, replace all characters occuring in I<string1> by the
478 character at the corresponding position in I<string2>. It is possible
479 to use any character (other than a backslash or newline) instead of a
480 slash to delimit the strings. Within I<string1> and I<string2>, a
481 backslash followed by any character other than a newline is that literal
482 character, and a backslash followed by an `n' is replaced by a newline
487 #--------------------------------------------------------------------------
488 $ComTab{'='}=[ 1, '', \&Emit, '{ print "$.\n" }' ]; #ok
492 Prints the current line number on the standard output.
496 #--------------------------------------------------------------------------
497 $ComTab{':'}=[ 0, 'str', \&Label, '' ]; #ok
499 =item [0addr]B<:> [I<label>]
501 The command specifies the position of the I<label>. It has no other effect.
505 #--------------------------------------------------------------------------
506 $ComTab{'{'}=[ 2, '', \&BeginBlock, '{' ]; #ok
507 $ComTab{'}'}=[ 0, '', \&EndBlock, ';}' ]; #ok
508 # ';' to avoid warning on empty {}-block
510 =item [2addr]B<{> [I<command>]
514 These two commands begin and end a command list. The first command may
515 be given on the same line as the opening B<{> command. The commands
516 within the list are jointly selected by the address(es) given on the
517 B<{> command (but may still have individual addresses).
521 #--------------------------------------------------------------------------
522 $ComTab{'#'}=[ 0, 'str', \&Comment, '' ]; #ok
524 =item [0addr]B<#> [I<comment>]
526 The entire line is ignored (treated as a comment). If, however, the first
527 two characters in the script are `C<#n>', automatic printing of output is
528 suppressed, as if the B<-n> option were given on the command line.
534 use vars qw{ $isEOF $Hold %wFiles @Q $CondReg $doPrint };
536 my $useDEBUG = exists( $ENV{PSEDDEBUG} );
537 my $useEXTBRE = $ENV{PSEDEXTBRE} || '';
538 $useEXTBRE =~ s/[^<>wWyB]//g; # gawk RE's handle these
540 my $doAutoPrint = 1; # automatic printing of pattern space (-n => 0)
541 my $doOpenWrite = 1; # open w command output files at start (-a => 0)
542 my $svOpenWrite = 0; # save $doOpenWrite
543 my $doGenerate = $0 eq 's2p';
545 # Collected and compiled script
547 my( @Commands, %Defined, @BlockStack, %Label, $labNum, $Code );
557 my( $msg, $loc ) = @_;
559 $loc .= ': ' if length( $loc );
560 warn( "$0: $loc$msg\n" );
565 return 'L_'.++$labNum;
568 # safeHere: create safe here delimiter and modify opcode and argument
571 my( $codref, $argref ) = @_;
573 while( $$argref =~ /^$eod$/m ){
576 $$codref =~ s/TheEnd/$eod/e;
577 $$argref .= "$eod\n";
580 # Emit: create address logic and emit command
583 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
585 if( defined( $addr1 ) ){
586 if( defined( $addr2 ) ){
587 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
589 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
591 $cond = $negated ? "unless( $addr1 )\n" : "if( $addr1 )\n";
595 $Code .= "$cond$arg\n";
597 } elsif( $opcode =~ s/-X-/$arg/e ){
598 $Code .= "$cond$opcode\n";
600 } elsif( $opcode =~ /TheEnd/ ){
601 safeHere( \$opcode, \$arg );
602 $Code .= "$cond$opcode$arg";
605 $Code .= "$cond$opcode\n";
610 # Write (w command, w flag): store pathname
613 my( $addr1, $addr2, $negated, $opcode, $path, $fl ) = @_;
615 Emit( $addr1, $addr2, $negated, $opcode, $path, $fl );
619 # Label (: command): label definition
622 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
625 if( length( $lab ) ){
627 if( ! exists( $Label{$lab} ) ){
628 $h = $Label{$lab}{name} = newLabel();
630 $h = $Label{$lab}{name};
631 if( exists( $Label{$lab}{defined} ) ){
632 my $dl = $Label{$lab}{defined};
633 Warn( "duplicate label $lab (first defined at $dl)", $fl );
637 $Label{$lab}{defined} = $fl;
643 # BeginBlock ({ command): push block start
645 sub BeginBlock($$$$$$){
646 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
647 push( @BlockStack, [ $fl, $addr1, $addr2, $negated ] );
648 Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
651 # EndBlock (} command): check proper nesting
653 sub EndBlock($$$$$$){
654 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
656 my $jcom = pop( @BlockStack );
657 if( defined( $jcom ) ){
658 $rc = Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
660 Warn( "unexpected `}'", $fl );
666 # Branch (t, b commands): check or create label, substitute default
669 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
670 $lab =~ s/\s+//; # no spaces at end
672 if( length( $lab ) ){
673 if( ! exists( $Label{$lab} ) ){
674 $h = $Label{$lab}{name} = newLabel();
676 $h = $Label{$lab}{name};
678 push( @{$Label{$lab}{used}}, $fl );
682 $opcode =~ s/XXX/$h/e;
683 Emit( $addr1, $addr2, $negated, $opcode, '', $fl );
686 # Change (c command): is special due to range end watching
689 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
690 my $kwd = $negated ? 'unless' : 'if';
691 if( defined( $addr2 ) ){
692 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
694 $addr1 = '$icnt = ('.$addr1.')';
695 $opcode = 'if( $icnt =~ /E0$/ )' . $opcode;
698 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
700 safeHere( \$opcode, \$arg );
701 $Code .= "$kwd( $addr1 ){\n $opcode$arg}\n";
706 # Comment (# command): A no-op. Who would've thought that!
709 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
710 ### $Code .= "# $arg\n";
716 my( $del, $sref ) = @_;
718 print "stripRegex:$del:$$sref:\n" if $useDEBUG;
719 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E(\s*)}{}s ){
721 $regex .= $1.$sl.$del;
722 if( length( $sl ) % 2 == 0 ){
730 # stripTrans: take a <del> terminated string from y command
731 # honoring and cleaning up of \-escaped <del>'s
734 my( $del, $sref ) = @_;
736 print "stripTrans:$del:$$sref:\n" if $useDEBUG;
737 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E}{}s ){
740 if( length( $sl ) % 2 == 0 ){
751 # makey - construct Perl y/// from sed y///
754 my( $fr, $to, $fl ) = @_;
757 # Ensure that any '-' is up front.
758 # Diagnose duplicate contradicting mappings
760 for( my $i = 0; $i < length($fr); $i++ ){
761 my $fc = substr($fr,$i,1);
762 my $tc = substr($to,$i,1);
763 if( exists( $tr{$fc} ) && $tr{$fc} ne $tc ){
764 Warn( "ambiguos translation for character `$fc' in `y' command",
771 if( exists( $tr{'-'} ) ){
772 ( $fr, $to ) = ( '-', $tr{'-'} );
777 # might just as well sort it...
778 for my $fc ( sort keys( %tr ) ){
782 # make embedded delimiters and newlines safe
783 $fr =~ s/([{}])/\$1/g;
784 $to =~ s/([{}])/\$1/g;
787 return $error ? undef() : "{ y{$fr}{$to}; }";
791 # makes - construct Perl s/// from sed s///
794 my( $regex, $subst, $path, $global, $print, $nmatch, $fl ) = @_;
796 # make embedded newlines safe
797 $regex =~ s/\n/\\n/g;
798 $subst =~ s/\n/\\n/g;
803 if( length( $nmatch ) ){
806 while( --\$n && ( \$s = m ${regex}g ) ){}
807 \$s = ( substr( \$_, pos() ) =~ s ${regex}${subst}s ) if \$s;
812 { \$s = s ${regex}${subst}s${global};
817 $code .= ' print $_, "\n" if $s;'."\n";
819 if( defined( $path ) ){
821 $code .= " _w( '$path' ) if \$s;\n";
826 =head1 BASIC REGULAR EXPRESSIONS
828 A I<Basic Regular Expression> (BRE), as defined in POSIX 1003.2, consists
829 of I<atoms>, for matching parts of a string, and I<bounds>, specifying
830 repetitions of a preceding atom.
834 The possible atoms of a BRE are: B<.>, matching any single character;
835 B<^> and B<$>, matching the null string at the beginning or end
836 of a string, respectively; a I<bracket expressions>, enclosed
837 in B<[> and B<]> (see below); and any single character with no
838 other significance (matching that character). A B<\> before one
839 of: B<.>, B<^>, B<$>, B<[>, B<*>, B<\>, matching the character
840 after the backslash. A sequence of atoms enclosed in B<\(> and B<\)>
841 becomes an atom and establishes the target for a I<backreference>,
842 consisting of the substring that actually matches the enclosed atoms.
843 Finally, B<\> followed by one of the digits B<0> through B<9> is a
846 A B<^> that is not first, or a B<$> that is not last does not have
847 a special significance and need not be preceded by a backslash to
848 become literal. The same is true for a B<]>, that does not terminate
849 a bracket expression.
851 An unescaped backslash cannot be last in a BRE.
855 The BRE bounds are: B<*>, specifying 0 or more matches of the preceding
856 atom; B<\{>I<count>B<\}>, specifying that many repetitions;
857 B<\{>I<minimum>B<,\}>, giving a lower limit; and
858 B<\{>I<minimum>B<,>I<maximum>B<\}> finally defines a lower and upper
861 A bound appearing as the first item in a BRE is taken literally.
863 =head2 Bracket Expressions
865 A I<bracket expression> is a list of characters, character ranges
866 and character classes enclosed in B<[> and B<]> and matches any
867 single character from the represented set of characters.
869 A character range is written as two characters separated by B<-> and
870 represents all characters (according to the character collating sequence)
871 that are not less than the first and not greater than the second.
872 (Ranges are very collating-sequence-dependent, and portable programs
873 should avoid relying on them.)
875 A character class is one of the class names
882 enclosed in B<[:> and B<:]> and represents the set of characters
883 as defined in ctype(3).
885 If the first character after B<[> is B<^>, the sense of matching is
888 To include a literal `C<^>', place it anywhere else but first. To
889 include a literal 'C<]>' place it first or immediately after an
890 initial B<^>. To include a literal `C<->' make it the first (or
891 second after B<^>) or last character, or the second endpoint of
894 The special bracket expression constructs C<[[:E<lt>:]]> and C<[[:E<gt>:]]>
895 match the null string at the beginning and end of a word respectively.
896 (Note that neither is identical to Perl's `\b' atom.)
898 =head2 Additional Atoms
900 Since some sed implementations provide additional regular expression
901 atoms (not defined in POSIX 1003.2), B<psed> is capable of translating
902 the following backslash escapes:
906 =item B<\E<lt>> This is the same as C<[[:E<gt>:]]>.
908 =item B<\E<gt>> This is the same as C<[[:E<lt>:]]>.
910 =item B<\w> This is an abbreviation for C<[[:alnum:]_]>.
912 =item B<\W> This is an abbreviation for C<[^[:alnum:]_]>.
914 =item B<\y> Match the empty string at a word boundary.
916 =item B<\B> Match the empty string between any two either word or non-word characters.
920 To enable this feature, the environment variable PSEDEXTBRE must be set
921 to a string containing the requested characters, e.g.:
922 C<PSEDEXTBRE='E<lt>E<gt>wW'>.
927 # bre2p - convert BRE to Perl RE
930 my( $pref, $ic ) = @_;
931 $ic < length($$pref)-1 ? substr( $$pref, $ic+1, 1 ) : '';
935 my( $del, $pat, $fl ) = @_;
937 $led =~ tr/{([</})]>/;
938 $led = '' if $led eq $del;
940 $pat = substr( $pat, 1, length($pat) - 2 );
945 for( my $ic = 0; $ic < length( $pat ); $ic++ ){
946 my $c = substr( $pat, $ic, 1 );
948 ### backslash escapes
949 my $nc = peek($pat,$ic);
951 Warn( "`\\' cannot be last in pattern", $fl );
955 if( $nc eq $del ){ ## \<pattern del> => \<pattern del>
958 } elsif( $nc =~ /([[.*\\n])/ ){
959 ## check for \-escaped magics and \n:
960 ## \[ \. \* \\ \n stay as they are
963 } elsif( $nc eq '(' ){ ## \( => (
967 } elsif( $nc eq ')' ){ ## \) => )
971 Warn( "unmatched `\\)'", $fl );
976 } elsif( $nc eq '{' ){ ## repetition factor \{<i>[,[<j>]]\}
977 my $endpos = index( $pat, '\\}', $ic );
979 Warn( "unmatched `\\{'", $fl );
982 my $rep = substr( $pat, $ic+1, $endpos-($ic+1) );
985 if( $res =~ /^\^?$/ ){
987 } elsif( $rep =~ /^(\d+)(,?)(\d*)?$/ ){
991 if( length( $max ) ){
993 Warn( "maximum less than minimum in `\\{$rep\\}'",
1001 if( $min == 0 && $max eq '1' ){
1003 } elsif( $min == 1 && "$com$max" eq ',' ){
1005 } elsif( $min == 0 && "$com$max" eq ',' ){
1008 $res .= "{$min$com$max}";
1011 Warn( "invalid repeat clause `\\{$rep\\}'", $fl );
1015 } elsif( $nc =~ /^[1-9]$/ ){
1016 ## \1 .. \9 => \1 .. \9, but check for a following digit
1017 if( $nc > $backref ){
1018 Warn( "invalid backreference ($nc)", $fl );
1022 if( peek($pat,$ic) =~ /[0-9]/ ){
1026 } elsif( $useEXTBRE && ( $nc =~ /[$useEXTBRE]/ ) ){
1027 ## extensions - at most <>wWyB - not in POSIX
1028 if( $nc eq '<' ){ ## \< => \b(?=\w), be precise
1029 $res .= '\\b(?<=\\W)';
1030 } elsif( $nc eq '>' ){ ## \> => \b(?=\W), be precise
1031 $res .= '\\b(?=\\W)';
1032 } elsif( $nc eq 'y' ){ ## \y => \b
1034 } else { ## \B, \w, \W remain the same
1037 } elsif( $nc eq $led ){
1038 ## \<closing bracketing-delimiter> - keep '\'
1041 } else { ## \ <char> => <char> ("as if `\' were not present")
1045 } elsif( $c eq '.' ){ ## . => .
1048 } elsif( $c eq '*' ){ ## * => * but \* if there's nothing preceding it
1049 if( $res =~ /^\^?$/ ){
1051 } elsif( substr( $res, -1, 1 ) ne '*' ){
1055 } elsif( $c eq '[' ){
1056 ## parse []: [^...] [^]...] [-...]
1058 if( peek($pat,$ic) eq '^' ){
1062 my $nc = peek($pat,$ic);
1063 if( $nc eq ']' || $nc eq '-' ){
1067 # check that [ is not trailing
1068 if( $ic >= length( $pat ) - 1 ){
1069 Warn( "unmatched `['", $fl );
1072 # look for [:...:] and x-y
1073 my $rstr = substr( $pat, $ic+1 );
1074 if( $rstr =~ /^((?:\[:\(\w+|[><]\):\]|[^]-](?:-[^]])?)*)/ ){
1076 $ic += length( $cnt );
1077 $cnt =~ s/([\\\$])/\\$1/g; # `\', `$' are magic in Perl []
1078 # try some simplifications
1080 if( $red =~ s/0-9// ){
1082 if( $red =~ s/A-Z// && $red =~ s/a-z// && $red =~ s/_// ){
1088 # POSIX 1003.2 has this (optional) for begin/end word
1089 $add = '\\b(?=\\W)' if $add eq '[[:<:]]';
1090 $add = '\\b(?<=\\W)' if $add eq '[[:>:]]';
1094 ## may have a trailing `-' before `]'
1095 if( $ic < length($pat) - 1 &&
1096 substr( $pat, $ic+1 ) =~ /^(-?])/ ){
1097 $ic += length( $1 );
1099 # another simplification
1100 $add =~ s/^\[(\^?)(\\[dw])]$/ $1 eq '^' ? uc($2) : $2 /e;
1103 Warn( "unmatched `['", $fl );
1107 } elsif( $c eq $led ){ ## unescaped <closing bracketing-delimiter>
1110 } elsif( $c eq ']' ){ ## unmatched ] is not magic
1113 } elsif( $c =~ /[|+?{}()]/ ){ ## not magic in BRE, but in Perl: \-quote
1116 } elsif( $c eq '^' ){ ## not magic unless 1st, but in Perl: \-quote
1117 $res .= length( $res ) ? '\\^' : '^';
1119 } elsif( $c eq '$' ){ ## not magic unless last, but in Perl: \-quote
1120 $res .= $ic == length( $pat ) - 1 ? '$' : '\\$';
1128 Warn( "unmatched `\\('", $fl );
1132 # final cleanup: eliminate raw HTs
1134 return $del . $res . ( $led ? $led : $del );
1139 # sub2p - convert sed substitution to Perl substitution
1142 my( $del, $subst, $fl ) = @_;
1144 $led =~ tr/{([</})]>/;
1145 $led = '' if $led eq $del;
1147 $subst = substr( $subst, 1, length($subst) - 2 );
1150 for( my $ic = 0; $ic < length( $subst ); $ic++ ){
1151 my $c = substr( $subst, $ic, 1 );
1153 ### backslash escapes
1154 my $nc = peek($subst,$ic);
1156 Warn( "`\\' cannot be last in substitution", $fl );
1160 if( $nc =~ /[\\$del$led]/ ){ ## \ and delimiter
1162 } elsif( $nc =~ /[1-9]/ ){ ## \1 - \9 => ${1} - ${9}
1163 $res .= '${' . $nc . '}';
1164 } else { ## everything else (includes &): omit \
1167 } elsif( $c eq '&' ){ ## & => $&
1169 } elsif( $c =~ /[\$\@$led]/ ){ ## magic in Perl's substitution string
1176 # final cleanup: eliminate raw HTs
1178 return ( $led ? $del : $led ) . $res . ( $led ? $led : $del );
1184 my( $pdef, $pfil, $plin );
1185 for( my $icom = 0; $icom < @Commands; $icom++ ){
1186 my $cmd = $Commands[$icom];
1187 print "Parse:$cmd:\n" if $useDEBUG;
1189 next unless length( $cmd );
1191 if( exists( $Defined{$icom} ) ){
1192 $pdef = $Defined{$icom};
1193 if( $pdef =~ /^ #(\d+)/ ){
1194 $pfil = 'expression #';
1203 my $fl = "$pfil$plin";
1205 # insert command as comment in gnerated code
1207 $Code .= "# $cmd\n" if $doGenerate;
1211 my( $negated, $naddr, $addr1, $addr2 );
1213 if( $cmd =~ s/^(\d+)\s*// ){
1214 $addr1 = "$1"; $naddr++;
1215 } elsif( $cmd =~ s/^\$\s*// ){
1216 $addr1 = 'eofARGV()'; $naddr++;
1217 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1219 my $regex = stripRegex( $del, \$cmd );
1220 if( defined( $regex ) ){
1221 $addr1 = 'm '.bre2p( $del, $regex, $fl ).'s';
1224 Warn( "malformed regex, 1st address", $fl );
1229 if( defined( $addr1 ) && $cmd =~ s/,\s*// ){
1230 if( $cmd =~ s/^(\d+)\s*// ){
1231 $addr2 = "$1"; $naddr++;
1232 } elsif( $cmd =~ s/^\$\s*// ){
1233 $addr2 = 'eofARGV()'; $naddr++;
1234 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1236 my $regex = stripRegex( $del, \$cmd );
1237 if( defined( $regex ) ){
1238 $addr2 = 'm '. bre2p( $del, $regex, $fl ).'s';
1241 Warn( "malformed regex, 2nd address", $fl );
1246 Warn( "invalid address after `,'", $fl );
1252 # address modifier `!'
1254 $negated = $cmd =~ s/^!\s*//;
1255 if( defined( $addr1 ) ){
1256 print "Parse: addr1=$addr1" if $useDEBUG;
1257 if( defined( $addr2 ) ){
1258 print ", addr2=$addr2 " if $useDEBUG;
1259 # both numeric and addr1 > addr2 => eliminate addr2
1260 undef( $addr2 ) if $addr1 =~ /^\d+$/ &&
1261 $addr2 =~ /^\d+$/ && $addr1 > $addr2;
1264 print 'negated' if $useDEBUG && $negated;
1265 print " command:$cmd\n" if $useDEBUG;
1269 if( $cmd !~ s/^([:#={}abcdDgGhHilnNpPqrstwxy])\s*// ){
1270 my $h = substr( $cmd, 0, 1 );
1271 Warn( "unknown command `$h'", $fl );
1277 my $tabref = $ComTab{$key};
1278 if( $naddr > $tabref->[0] ){
1279 Warn( "excess address(es)", $fl );
1285 if( $tabref->[1] eq 'str' ){
1286 # take remainder - don't care if it is empty
1290 } elsif( $tabref->[1] eq 'txt' ){
1292 my $goon = $cmd =~ /(.*)\\$/;
1294 Warn( "extra characters after command ($cmd)", $fl );
1299 if( $icom > $#Commands ){
1300 Warn( "unexpected end of script", $fl );
1304 $cmd = $Commands[$icom];
1305 $Code .= "# $cmd\n" if $doGenerate;
1306 $goon = $cmd =~ s/\\$//;
1307 $cmd =~ s/\\(.)/$1/g;
1308 $arg .= "\n" if length( $arg );
1311 $arg .= "\n" if length( $arg );
1314 } elsif( $tabref->[1] eq 'sub' ){
1316 if( ! length( $cmd ) ){
1317 Warn( "`s' command requires argument", $fl );
1321 if( $cmd =~ s{^([^\\\n])}{} ){
1323 my $regex = stripRegex( $del, \$cmd );
1324 if( ! defined( $regex ) ){
1325 Warn( "malformed regular expression", $fl );
1329 $regex = bre2p( $del, $regex, $fl );
1331 # a trailing \ indicates embedded NL (in replacement string)
1332 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1334 if( $icom > $#Commands ){
1335 Warn( "unexpected end of script", $fl );
1339 $cmd .= $Commands[$icom];
1340 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1343 my $subst = stripRegex( $del, \$cmd );
1344 if( ! defined( $regex ) ){
1345 Warn( "malformed substitution expression", $fl );
1349 $subst = sub2p( $del, $subst, $fl );
1351 # parse s/// modifier: g|p|0-9|w <file>
1352 my( $global, $nmatch, $print, $write ) =
1353 ( '', '', 0, undef );
1354 while( $cmd =~ s/^([gp0-9])// ){
1355 $1 eq 'g' ? ( $global = 'g' ) :
1356 $1 eq 'p' ? ( $print = $1 ) : ( $nmatch .= $1 );
1358 $write = $1 if $cmd =~ s/w\s*(.*)$//;
1359 ### $nmatch =~ s/^(\d)\1*$/$1/; ### may be dangerous?
1360 if( $global && length( $nmatch ) || length( $nmatch ) > 1 ){
1361 Warn( "conflicting flags `$global$nmatch'", $fl );
1366 $arg = makes( $regex, $subst,
1367 $write, $global, $print, $nmatch, $fl );
1368 if( ! defined( $arg ) ){
1374 Warn( "improper delimiter in s command", $fl );
1379 } elsif( $tabref->[1] eq 'tra' ){
1381 # a trailing \ indicates embedded newline
1382 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1384 if( $icom > $#Commands ){
1385 Warn( "unexpected end of script", $fl );
1389 $cmd .= $Commands[$icom];
1390 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1392 if( ! length( $cmd ) ){
1393 Warn( "`y' command requires argument", $fl );
1397 my $d = substr( $cmd, 0, 1 ); $cmd = substr( $cmd, 1 );
1399 Warn( "`\\' not valid as delimiter in `y' command", $fl );
1403 my $fr = stripTrans( $d, \$cmd );
1404 if( ! defined( $fr ) || ! length( $cmd ) ){
1405 Warn( "malformed `y' command argument", $fl );
1409 my $to = stripTrans( $d, \$cmd );
1410 if( ! defined( $to ) ){
1411 Warn( "malformed `y' command argument", $fl );
1415 if( length($fr) != length($to) ){
1416 Warn( "string lengths in `y' command differ", $fl );
1420 if( ! defined( $arg = makey( $fr, $to, $fl ) ) ){
1427 # $cmd must be now empty - exception is {
1428 if( $cmd !~ /^\s*$/ ){
1430 # dirty hack to process command on '{' line
1431 $Commands[$icom--] = $cmd;
1433 Warn( "extra characters after command ($cmd)", $fl );
1441 if( &{$tabref->[2]}( $addr1, $addr2, $negated,
1442 $tabref->[3], $arg, $fl ) ){
1447 while( @BlockStack ){
1448 my $bl = pop( @BlockStack );
1449 Warn( "start of unterminated `{'", $bl );
1453 for my $lab ( keys( %Label ) ){
1454 if( ! exists( $Label{$lab}{defined} ) ){
1455 for my $used ( @{$Label{$lab}{used}} ){
1456 Warn( "undefined label `$lab'", $used );
1462 exit( 1 ) if $error;
1471 print STDERR "Usage: sed [-an] command [file...]\n";
1472 print STDERR " [-an] [-e command] [-f script-file] [file...]\n";
1476 # Here we go again...
1479 while( @ARGV && $ARGV[0] =~ /^-(.)(.*)$/ ){
1484 if( length( $arg ) ){
1485 push( @Commands, split( "\n", $arg ) );
1487 push( @Commands, shift( @ARGV ) );
1489 Warn( "option -e requires an argument" );
1494 $Defined{$#Commands} = " #$expr";
1499 if( length( $arg ) ){
1502 $path = shift( @ARGV );
1504 Warn( "option -f requires an argument" );
1508 my $fst = $#Commands + 1;
1509 open( SCRIPT, "<$path" ) || die( "$0: $path: could not open ($!)\n" );
1511 while( defined( $cmd = <SCRIPT> ) ){
1513 push( @Commands, $cmd );
1516 if( $#Commands >= $fst ){
1517 $Defined{$fst} = "$path";
1521 if( $opt eq '-' && $arg eq '' ){
1524 if( $opt eq 'h' || $opt eq '?' ){
1530 } elsif( $opt eq 'a' ){
1533 Warn( "illegal option `$opt'" );
1537 if( length( $arg ) ){
1538 unshift( @ARGV, "-$arg" );
1542 # A singleton command may be the 1st argument when there are no options.
1544 if( @Commands == 0 ){
1546 Warn( "no script command given" );
1550 push( @Commands, split( "\n", shift( @ARGV ) ) );
1551 $Defined{0} = ' #1';
1554 print STDERR "Files: @ARGV\n" if $useDEBUG;
1556 # generate leading code
1558 $Code = <<'[TheEnd]';
1561 unshift( @ARGV, '-' ) unless @ARGV;
1562 my $file = shift( @ARGV );
1563 open( ARG, "<$file" )
1564 || die( "$0: can't open $file for reading ($!)\n" );
1569 my $argref = @_ ? shift() : \$_;
1570 while( $isEOF || ! defined( $$argref = <ARG> ) ){
1572 return 0 unless @ARGV;
1573 my $file = shift( @ARGV );
1574 open( ARG, "<$file" )
1575 || die( "$0: can't open $file for reading ($!)\n" );
1582 return @ARGV == 0 && ( $isEOF = eof( ARG ) );
1588 if( ! exists( $wFiles{$path} ) || $wFiles{$path} eq '' ){
1589 $handle = $wFiles{$path} = gensym();
1591 if( ! open( $handle, ">$path" ) ){
1592 die( "$0: can't open $path for writing: ($!)\n" );
1596 $handle = $wFiles{$path};
1610 if( $h =~ /[^[:print:]]/ ){
1617 $h =~ s/([^[:print:]])/sprintf("\\%03o", ord($1))/ge;
1619 while( length( $h ) > $mcpl ){
1620 my $l = substr( $h, 0, $mcpl-1 );
1621 $h = substr( $h, $mcpl );
1622 # remove incomplete \-escape from end of line
1623 if( $l =~ s/(?<!\\)(\\[0-7]{0,2})$// ){
1633 my $handle = $wFiles{$path};
1634 if( ! $doOpenWrite &&
1635 ! defined( fileno( $handle ) ) ){
1636 open( $handle, ">$path" )
1637 || die( "$0: $path: cannot open ($!)\n" );
1639 print $handle $_, "\n";
1642 # condition register test/reset
1655 if( exists( $wFiles{$$q} ) && $wFiles{$$q} ne '' ){
1656 open( $wFiles{$$q}, ">>$$q" );
1658 if( open( RF, "<$$q" ) ){
1660 while( defined( $line = <RF> ) ){
1673 my( $h, $icnt, $s, $n );
1674 # hack (not unbreakable :-/) to avoid // matching an empty string
1675 my $z = "\000"; $z =~ /$z/;
1680 $doPrint = $doAutoPrint;
1682 while( getsARGV() ){
1684 $CondReg = 0; # cleared on t
1688 # parse - avoid opening files when doing s2p
1690 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1693 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1696 # append trailing code
1698 $Code .= <<'[TheEnd]';
1699 EOS: if( $doPrint ){
1702 $doPrint = $doAutoPrint;
1711 # magic "#n" - same as -n option
1713 $doAutoPrint = 0 if substr( $Commands[0], 0, 2 ) eq '#n';
1715 # eval code - check for errors
1717 print "Code:\n$Code" if $useDEBUG;
1720 print "Code:\n$Code";
1721 die( "$0: internal error - generated incorrect Perl code: $@\n" );
1726 # write full Perl program
1729 # bang line, declarations
1732 eval 'exec $perlpath -S \$0 \${1+"\$@"}'
1734 \$0 =~ s/^.*?(\\w+)\$/\$1/;
1738 use vars qw{ \$isEOF \$Hold \%wFiles \@Q \$CondReg
1739 \$doAutoPrint \$doOpenWrite \$doPrint };
1740 \$doAutoPrint = $doAutoPrint;
1741 \$doOpenWrite = $doOpenWrite;
1744 my $wf = "'" . join( "', '", keys( %wFiles ) ) . "'";
1749 exit( 1 ) unless makeHandle( \$p );
1760 # execute: make handles (and optionally open) all w files; run!
1762 for my $p ( keys( %wFiles ) ){
1763 exit( 1 ) unless makeHandle( $p );
1771 The environment variable C<PSEDEXTBRE> may be set to extend BREs.
1772 See L<"Additional Atoms">.
1778 =item ambiguos translation for character `%s' in `y' command
1780 The indicated character appears twice, with different translations.
1782 =item `[' cannot be last in pattern
1784 A `[' in a BRE indicates the beginning of a I<bracket expression>.
1786 =item `\' cannot be last in pattern
1788 A `\' in a BRE is used to make the subsequent character literal.
1790 =item `\' cannot be last in substitution
1792 A `\' in a subsitution string is used to make the subsequent character literal.
1794 =item conflicting flags `%s'
1796 In an B<s> command, either the `g' flag and an n-th occurrence flag, or
1797 multiple n-th occurrence flags are specified. Note that only the digits
1798 `1' through `9' are permitted.
1800 =item duplicate label %s (first defined at %s)
1802 =item excess address(es)
1804 The command has more than the permitted number of addresses.
1806 =item extra characters after command (%s)
1808 =item illegal option `%s'
1810 =item improper delimiter in s command
1812 The BRE and substitution may not be delimited with `\' or newline.
1814 =item invalid address after `,'
1816 =item invalid backreference (%s)
1818 The specified backreference number exceeds the number of backreferences
1821 =item invalid repeat clause `\{%s\}'
1823 The repeat clause does not contain a valid integer value, or pair of
1826 =item malformed regex, 1st address
1828 =item malformed regex, 2nd address
1830 =item malformed regular expression
1832 =item malformed substitution expression
1834 =item malformed `y' command argument
1836 The first or second string of a B<y> command is syntactically incorrect.
1838 =item maximum less than minimum in `\{%s\}'
1840 =item no script command given
1842 There must be at least one B<-e> or one B<-f> option specifying a
1843 script or script file.
1845 =item `\' not valid as delimiter in `y' command
1847 =item option -e requires an argument
1849 =item option -f requires an argument
1851 =item `s' command requires argument
1853 =item start of unterminated `{'
1855 =item string lengths in `y' command differ
1857 The translation table strings in a B<y> commanf must have equal lengths.
1859 =item undefined label `%s'
1861 =item unexpected `}'
1863 A B<}> command without a preceding B<{> command was encountered.
1865 =item unexpected end of script
1867 The end of the script was reached although a text line after a
1868 B<a>, B<c> or B<i> command indicated another line.
1870 =item unknown command `%s'
1872 =item unterminated `['
1874 A BRE contains an unterminated bracket expression.
1876 =item unterminated `\('
1878 A BRE contains an unterminated backreference.
1880 =item `\{' without closing `\}'
1882 A BRE contains an unterminated bounds specification.
1884 =item `\)' without preceding `\('
1886 =item `y' command requires argument
1892 The basic material for the preceding section was generated by running
1896 s/^.*Warn( *"\([^"]*\)".*$/\1/
1901 s/$[_[:alnum:]]\{1,\}/%s/g
1906 on the program's own text, and piping the output into C<sort -u>.
1909 =head1 SED SCRIPT TRANSLATION
1911 If this program is invoked with the name F<s2p> it will act as a
1912 sed-to-Perl translator. After option processing (all other
1913 arguments are ignored), a Perl program is printed on standard
1914 output, which will process the input stream (as read from all
1915 arguments) in the way defined by the sed script and the option setting
1916 used for the translation.
1920 perl(1), re_format(7)
1924 The B<l> command will show escape characters (ESC) as `C<\e>', but
1925 a vertical tab (VT) in octal.
1927 Trailing spaces are truncated from labels in B<:>, B<t> and B<b> commands.
1929 The meaning of an empty regular expression (`C<//>'), as defined by B<sed>,
1930 is "the last pattern used, at run time". This deviates from the Perl
1931 interpretation, which will re-use the "last last successfully executed
1932 regular expression". Since keeping track of pattern usage would create
1933 terribly cluttered code, and differences would only appear in obscure
1934 context (where other B<sed> implementations appear to deviate, too),
1935 the Perl semantics was adopted. Note that common usage of this feature,
1936 such as in C</abc/s//xyz/>, will work as expected.
1938 Collating elements (of bracket expressions in BREs) are not implemented.
1942 This B<sed> implementation conforms to the IEEE Std1003.2-1992 ("POSIX.2")
1943 definition of B<sed>, and is compatible with the I<OpenBSD>
1944 implementation, except where otherwise noted (see L<"BUGS">).
1948 This Perl implementation of I<sed> was written by Wolfgang Laun,
1949 I<Wolfgang.Laun@alcatel.at>.
1951 =head1 COPYRIGHT and LICENSE
1953 This program is free and open software. You may use, modify,
1954 distribute, and sell this program (and any modified variants) in any
1955 way you wish, provided you do not restrict others from doing the same.
1961 close OUT or die "Can't close $file: $!";
1962 chmod 0755, $file or die "Can't reset permissions for $file: $!\n";
1963 exec("$Config{'eunicefix'} $file") if $Config{'eunicefix'} ne ':';