From: Larry Wall
Date: Wed, 28 Feb 1990 21:56:11 +0000 (+0000)
Subject: perl 3.0 patch #9 (combined patch)
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=449aadcac0f13893f2b716ea169bf74293ee9c41;p=p5sagit%2Fp5-mst-13.2.git

perl 3.0 patch #9 (combined patch)

Well, I didn't quite fix 100 things--only 94. There are still some other things to do, so don't think if I didn't fix your favorite bug that your bug report is in the bit bucket. (It may be, but don't think it. :-)

There are very few enhancements here. One is the new pipe() function. There was just no way to emulate this using the current operations, unless you happened to have socketpair() on your system. Not even syscall() was useful in this respect.

Configure now determines whether volatile is supported, since some compilers implement volatile but don't define __STDC__.

Some compilers can put structure members and global variables into registers, so more variables had to be declared volatile to avoid clobbering during longjmp().

Some systems have wanted routines stashed away in libBSD.a and libPW.a. Configure can now find them.

A number of Configure tests create a file called "try" and then execute it. Unfortunately, if there was a "try" elsewhere in PATH it got that one instead. All references are now to "./try".

On Ultrix machines running the Mips cpu, some header files define things differently for assembly language than for the C language. To differentiate these, cc passes a -DLANGUAGE_C to the C preprocessor. Unfortunately, Configure, makedepend and perl want to use the preprocessor independently of cc. Configure now defaults to adding -DLANGUAGE_C on machines containing that symbol in signal.h.

In Configure, some libraries were getting into the list more than once, causing extra extraction overhead. The names are now uniquified.

Someone has invented yet another output format for nm. Sigh. Why do people assume that only people read the output of programs?

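The uniquifying of library names mentioned above can be sketched in a few lines of shell. This is only an illustration in the spirit of a sort-and-uniq pipeline, not the Configure code itself: the helper name uniq_words and the library paths are made up for the example.

```shell
#!/bin/sh
# uniq_words is a hypothetical helper, not something Configure defines.
# It prints each distinct argument once, one per line, in sorted order.
uniq_words() {
    for word in "$@"; do
        printf '%s\n' "$word"
    done | sort | uniq
}

# Made-up library list in which two names appear twice.
libs="/lib/libc.a /usr/lib/libnm.a /lib/libc.a /usr/lib/libPW.a /usr/lib/libnm.a"

# $libs and the command substitution are left unquoted on purpose so that
# the shell splits them into one word per library.
set -- `uniq_words $libs`
echo "Extracting names from $# libraries instead of 5"
```

Each duplicate costs a full pass of nm over that library, which is the extraction overhead the uniquifying avoids.
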
Due to commentary between a declaration and its semicolon, some standard versions of stdio weren't being considered standard, and the type of char used by stdio was being misidentified.

People trying to use bison instead of yacc ran into two problems. One, lack of alloca(), is solved on some machines by finding libPW.a. The other is that you have to supply a -y switch to bison to get it to emulate yacc naming conventions. Configure now prompts correctly for bison -y.

The make clean had a
    rm -f $suidperl
where it just wanted a
    rm -f suidperl

In the README, documented more weirdities on various machines, including a pointer to the JMPCLOBBER symbol.

In the construct
    OUTER: foreach (1,2,3) {
        INNER: foreach (4,5) {
            ...
            next OUTER;
        }
    }
the inner loop was not getting reset to the first element. This was one of those bugs that arise because longjmp() doesn't execute exit handlers as it unwinds the stack.

Perl reallocs many things as they grow, including the stack (its stack, not the C program's stack). This means that routines have to be careful to retrieve the new stack when they call subroutines that can do such a realloc. In cmd.c there was such code but it was hidden inside an #ifdef JMPCLOBBER that it should have been outside of, so you could get bad return values if JMPCLOBBER wasn't defined.

If you defined JMPCLOBBER to work around this problem, you should consider undefining it if your compiler guarantees that register variables get the value they had either at setjmp() or longjmp() time. Perl runs slightly faster without JMPCLOBBER defined.

The longjmp()s that perl does return known values, but as a paranoid programming measure, it now checks that the values are one of the expected ones.

If you say something like
    while (s/ /_/) {}
the substitution almost always succeeds (on normal text). There is an optimization that quickly discovers and bypasses operations that are going to fail, but does nothing to help generally successful ones such as the one above.
So there's a heuristic that disables the optimization if it isn't buying us anything. Unfortunately, in the above case, it's in the conditional of a while loop, which is duplicated by another optimization to be a
    last unless s/ /_/;
at the end of the loop, to avoid unnecessary subroutine calls. Because the conditional was duplicated (not the expression itself, just the structure pointing to it), the heuristic mentioned above tried to disable the first optimization twice, resulting in the label stack getting corrupted.

Some subroutines mix both return mechanisms, like this:
    sub foo {
        local($foo);
        return $foo if $whatever;
        $foo;
    }
This clobbered the return value of $foo when the end of the scope of the local($foo) was reached. This was because such a routine turns into something like this internally:
    sub foo {
        _SUB_: {
            local($foo);
            if ($whatever) { $foo; last _SUB_; }
            $foo;
        }
    }
Because the outer _SUB_ block was manufactured by non-standard means, it wasn't getting marked as an expression that could return a value, i.e. a terminal expression. So the return value wasn't getting properly saved off to the side before the local() exited.

The internal label on subroutine blocks used to be SUB, but I changed it to _SUB_ to avoid possible confusion. Evals now have labels too, so they are labelled with _EVAL_.

The reason evals now have a label is that nested evals need separate longjmp environments, or fatal errors end up getting a longjmp() botch. So eval now uses the same label stack as loops and subroutines.

The eval routine used to always return undef on failure. In an array context, however, this makes a non-null array, which when assigned is TRUE, which is counter-intuitive. It now returns a null array upon failure in an array context.

When a foreach operator works on a non-array, the compiler translates
    foreach (1,2,3) {
into something like
    @_GEN_0 = (1,2,3);
    foreach (@_GEN_0) {
Unfortunately, the line number was not correctly propagated to both command structures, so huge line numbers could appear in error messages and while debugging.

The x operator was stupidly written, just calling the internal routine str_scat() multiple times, and not preextending the string to the known new length. It now preextends the string and calls a special routine to replicate the string quickly. On long strings like '\0' x 1024, the operator is more than 10 times faster.

The split operator is supposed to split into @_ if called in a scalar context. Unfortunately, it was also splitting into @_ in an array context that wasn't a real array, such as assignment to a list:
    ($foo,$bar) = split;
This has now been fixed.

The split and substitute operators have a check to make sure that they aren't looping endlessly. Unfortunately, they had a hardwired limit of 10000 iterations. There are applications conceivable where you could work on longer values than that, so they now calculate a reasonable limit based on the length of the arguments.

Pack and unpack called atoi all the time on the template fields. Since there are usually at most one or two digits of number, this wasted a lot of time on machines with slow subroutine calls. It now picks up the number itself.

There were several places that casts could blow up. In particular, it appears that a sun3 can't cast a negative float to an unsigned integer. Appropriate measures have been taken--hopefully this won't blow someone else up.

A local($.) didn't work right because the actual value of the current line number is derived from the last input filehandle. This has been fixed by causing the last input filehandle to be restored after the scope of a local($.) to what it was when the local was executed.

Assignment is supposed to return the final value of the left hand side.
In the case of array assignment (in an array context), it was actually returning the right hand side. This showed up in things that referred to the actual elements of an array value, such as grep(s/foo/bar/, @abc = @xyz), which modified @xyz rather than @abc.

The syscall() function was returning a garbage value (the index of the top of the stack, actually) rather than the value of the system call.

There was some discussion about how to open files with arbitrary characters in the filename. In particular, the open function strips trailing spaces. There was no way to suppress this. Now you can put an explicit null at the end of the string
    open(FOO,"$filename\0")
and this will hide any spaces on the end of the filename. The Unix open() function will of course treat the null as the trailing delimiter.

As a hangover from when Perl was not useful on binary files, there was a check to make sure that the file being opened was a normal file or character special file or socket. Now that Perl can handle binary data, this is useless, and has been removed.

Some versions of utime.h have microseconds specified as acusec and modusec. Perl was referring to these in order to zero out the fields. But not everyone has these. Perl now just bzero's out the structure and refers only to fields that everyone has.

You used to have to say
    ($foo) = unpack("L",$bar);
Now you can say
    $foo = unpack("L",$bar);
and it will just unpack the first thing specified by the template.

The subscripts for slices were ignoring the value of $[. (This never made any difference for people who leave $[ set to 0.)

It seems reasonable that grep in a scalar context should return the number of items matched so that it can be used in, say, a conditional. Formerly it returned an undef.

Another problem with grep was that if you said something like
    grep(/$1/, @foo)
then each iteration of grep was executing in the context of the previous iteration's regexp, so $1 might be wiped out after the first iteration.
All iterations of grep now operate in the regexp context of the grep operator itself.

The eg/README file now explicitly states that the examples in the eg directory are to be considered in the Public Domain, and thus do not have the same restrictions as the Perl source.

In a previous patch the shift operator was made to shift @_ inside of subroutines. This made some of the getopt code wrong.

The sample rename command (and the new relink command) can either take a list of filenames from stdin, or if stdin is a terminal, default to a * in the current directory.

A sample travesty program is now included. If you want to know what it does, feed it about 10 Usenet articles, or the perl manual, and see what it prints out.

If a return operator was embedded in an expression that supplied a scalar context, but the subroutine containing the return was called in an array context, an array was not returned correctly. Now it is.

The !~ operator used to ignore the negation in an array context and do the same thing as =~. It now always returns scalar even in array context, so if you say
    ($foo) = ($bar !~ /(pat)/)
$foo will get a value of either 1 or ''.

Opens on pipes were defined to return the child's pid in the parent, and FALSE in the child. Unfortunately, what the child actually got was an undef, making it indistinguishable from a failure to open the pipe successfully. The child now gets a 0, and undef means a failure to fork a child.

Formerly, @array in a scalar context returned the last value of the array, by analogy to the comma operator. This makes for counter-intuitive results when you say
    if (@array)
if 0 or '' is a legal array value. @array now returns the length of the array (not the subscript of the last element, which is $#array). To get the last element of the array you must either pop(@array) or refer to $array[$#array].

The chdir operator with no argument was supposed to change directory to your home directory, but it core dumped instead.

The wait operator was ignoring SIGINT and SIGQUIT, by analogy to the system and pipe operations. But wait is a lower level operation, and it gives you more freedom if those signals aren't automatically ignored. If you want them ignored, you now have to explicitly ignore them by setting the proper %SIG entry.

Different versions of /bin/mkdir and /bin/rmdir return different messages upon failure. Perl now knows about more of them.

-l FILEHANDLE now disallowed

The use of the -l file test makes no sense on a filehandle, since you can't open symbolic links. So -l FILEHANDLE now is a fatal error. This also means you can't say -l _, which is also a useless operation.

The heavy wizardry involved in saying $#foo -= 2 didn't work quite right.

In formats, you can say ... in a ^ field to have ... output when there is more for that field that is getting truncated. The next field was getting shifted over by three characters, however.

The perl library routines abbrev.pl, complete.pl, getopt.pl and getopts.pl were assuming $[ == 0.

The Getopt routine wasn't returning an error on unrecognized switches.

The look.pl routine had never been tested, and didn't work at all. Now it does.

There were several difficulties in termcap.pl. Tgoto was documented backwards for $rows and $cols. The Tgetent routine could loop endlessly if there was a tc entry. And it didn't interpret the ^x form of specifying control characters right because of base treachery (031 instead of 31). There were also problems with using @_ as a temporary array.

In perl.h, the unused VREG symbol was deleted because it conflicted with somebody's header files.

If perl detects a #! line that specifies some other interpreter than perl, it will now start up that interpreter for you. This lets you specify a SHELL of perl to some programs.

The $/ variable specifies the input record separator.
It was possible to set it to a non-text character and read in an entire text file as one input, but it wasn't possible to do that for a binary file. Now you can undef $/, and there will be no record separator, so you are guaranteed to get the entire file with one <>.

The example in the manual of an open() inside a ?: had the branches of the ?: backwards.

I documented the fact that grep can modify arrays in place (with caveats about modifying literal values). I also put in how to deal with filenames that might have arbitrary characters, and mentioned the problem of unflushed buffers on opens that cause forks.

It's now documented how to force top of page before the next write.

Formerly, $0 was guaranteed to contain the name of the perl script only till the first regular expression was executed. It now keeps that value permanently. $0 can no longer be used as a synonym for $&.

The regular expression evaluator didn't handle character classes with the 8th bit set. None of /[\200-\377]/, \d, \w or \s worked right--the character class because signed characters were not interpreted right, and the builtins because the isdigit(), isalpha() and isspace() macros are only defined if isascii() is true.

Patterns of the form /\bfoo/i didn't work right because the \b wants to compare the preceding character with the next one to look for word boundaries, and the i modifier forced a move of the string to a place where it couldn't do that without examining malloc garbage.

The type glob syntax *foo produces the symbol table entry for all the various foo variables. Perl has to do certain bookkeeping when moving such values around. The symbol table entry was not adequately differentiated from normal data to prevent occasional confusion, however.

On MICROPORTs, the CRIPPLED_CC option made the stab_array() and stab_hash() macros into function calls, but neglected to supply the function definitions.

The string length allocated to turn a number into a string internally turned out to be too short on a Sun 4.

Several constructs were not recognized properly inside double-quoted strings:
    underline in name
    required @foo to be defined rather than %foo
    threw off bracket matcher
    not identified with $1

The base.term test gives misleading results if /dev/null happens not to be a character special file. So it now checks for that.

The op.stat could exceed the shell's maximum argument length when evaluating . It now chdirs to /usr/bin and does <*>.

return grandfathered to never be function call

The construct
    return (1,2,3);
did not do what was expected, since return was swallowing the parens in order to consider itself a function. The solution, since return never wants any trailing expression such as
    return (1,2,3) + 2;
is to simply make return an exception to the paren-makes-a-function rule, and treat it the way it always was, so that it doesn't strip the parens.

If perldb.pl didn't exist, there was no reasonable error message given when you invoked perl -d. It now does a do-or-die internally.

null hereis core dumped

The hereis construct dumped core on a null string:
    print <<'FOO';
    FOO

Certain pattern matches weren't working on patterns with embedded nulls because the fbminstr() routine, when it decided it couldn't do a fancy search, degenerated to using instr(), rather than ninstr(), which is better about embedded nulls.

The s2p sed-to-perl translator didn't translate \< and \> to \b. Now it does.

The a2p awk-to-perl translator didn't put a $ on ExitValue when translating the awk exit construct. It also didn't allow logical expressions inside normal expressions:
    i = ($1 == 2 || $2 ~ /bar/)

a2p.h had the definition of a bzero() macro inside an ifdef of BCOPY. The two don't always go together, and since Configure is already looking for both separately...

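The Configure change from "try" to "./try" described earlier in this message is easy to reproduce. The following is a minimal sketch, assuming a POSIX shell and mktemp; the directory names and the two stand-in scripts are invented for the demonstration and are not part of Configure.

```shell
#!/bin/sh
# Sketch: a bare "try" is looked up via PATH, "./try" always runs the local file.
# All directory and file names below are made up for the demonstration.
work=`mktemp -d` || exit 1
mkdir "$work/elsewhere" "$work/build"

# A stray "try" somewhere on PATH, standing in for an unrelated program.
printf '#!/bin/sh\necho stray\n' > "$work/elsewhere/try"
# The test program that a Configure-like script has just generated.
printf '#!/bin/sh\necho local\n' > "$work/build/try"
chmod +x "$work/elsewhere/try" "$work/build/try"

cd "$work/build" || exit 1
PATH="$work/elsewhere:$PATH"
export PATH

bare=`try`       # resolved through PATH, so the stray program runs
dotted=`./try`   # explicit path, so the freshly generated program runs
echo "try -> $bare"
echo "./try -> $dotted"

cd / && rm -rf "$work"
```

With the stray directory ahead of everything else in PATH, the bare name picks up the wrong program even though a "try" sits in the current directory, which is the failure the patch avoids.
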
---

diff --git a/Configure b/Configure
index 4e8856f..08b1e10 100755
--- a/Configure
+++ b/Configure
@@ -8,7 +8,7 @@
 # and edit it to reflect your system. Some packages may include samples
 # of config.h for certain machines, so you might look for one of those.)
 #
-# $Header: Configure,v 3.0.1.4 89/12/21 18:57:00 lwall Locked $
+# $Header: Configure,v 3.0.1.5 90/02/28 16:17:50 lwall Locked $
 #
 # Yes, you may rip this off to use in other distribution packages.
 # (Note: this Configure script was generated automatically. Rather than
@@ -154,6 +154,7 @@ d_syscall=''
 d_varargs=''
 d_vfork=''
 d_voidsig=''
+d_volatile=''
 d_vprintf=''
 d_charvspr=''
 d_wait4=''
@@ -256,7 +257,7 @@ attrlist="$attrlist i186 __m88k__ m88k DGUX __DGUX__"
 pth="/usr/ucb /bin /usr/bin /usr/local /usr/local/bin /usr/lbin /usr/plx /usr/5bin /vol/local/bin /etc /usr/lib /lib /usr/local/lib /sys5.3/bin /sys5.3/usr/bin /bsd4.3/bin /bsd4.3/usr/bin /bsd4.3/usr/ucb"
 d_newshome="/usr/NeWS"
 defvoidused=7
-libswanted="net_s net nsl_s nsl socket nm ndir ndbm dbm sun bsd x c_s"
+libswanted="net_s net nsl_s nsl socket nm ndir ndbm dbm sun bsd BSD x c_s PW"
 inclwanted='/usr/netinclude /usr/include/sun /usr/include/bsd /usr/include/lan'
 : some greps do not return status, grrr.
 echo "grimblepritz" >grimble
@@ -291,7 +292,7 @@ if sh -c '#' >/dev/null 2>&1 ; then
 echo "#!/bin/echo hi" > try
 $eunicefix try
 chmod +x try
- try > today
+ ./try > today
 if $contains hi today >/dev/null 2>&1; then
 echo "It does."
 sharpbang='#!'
@@ -299,7 +300,7 @@ if sh -c '#' >/dev/null 2>&1 ; then
 echo "#! /bin/echo hi" > try
 $eunicefix try
 chmod +x try
- try > today
+ ./try > today
 if test -s today; then
 echo "It does."
 sharpbang='#! '
@@ -332,7 +333,7 @@ EOSS
 chmod +x try
 $eunicefix try
-if try; then
+if ./try; then
 echo "Yup, it does."
 else
 echo "Nope. You may have to fix up the shell scripts to make sure sh runs them."
@@ -1043,6 +1044,12 @@ case "$optimize" in
 esac
 ;;
 esac
+if $contains 'LANGUAGE_C' /usr/include/signal.h >/dev/null 2>&1; then
+ case "$dflt" in
+ *LANGUAGE_C*);;
+ *) dflt="$dflt -DLANGUAGE_C";;
+ esac
+fi
 case "$dflt" in
 '') dflt=none;;
 esac
@@ -1208,7 +1215,7 @@ main()
 }
 EOCP
 if $cc try.c -o try >/dev/null 2>&1 ; then
- dflt=`try`
+ dflt=`./try`
 case "$dflt" in
 ????|????????) echo "(The test program ran ok.)";;
 *) echo "(The test program didn't run right for some reason.)";;
@@ -1422,7 +1429,7 @@ EOM
 fi
 fi
 echo " "
-set $libc $libnames
+set `echo $libc $libnames | tr ' ' '\012' | sort | uniq`
 $echo $n "Extracting names from $* for later perusal...$c"
 nm $* 2>/dev/null >libc.tmp
 $sed -n -e 's/^.* [AT] *_[_.]*//p' -e 's/^.* [AT] //p' libc.list
@@ -1435,6 +1442,8 @@ else
 $contains '^printf$' libc.list >/dev/null 2>&1 || \
 $sed -n -e 's/^_//' \
 -e 's/^\([a-zA-Z_0-9]*\).*xtern.*text.*/\1/p' libc.list
+ $contains '^printf$' libc.list >/dev/null 2>&1 || \
+ $sed -n -e 's/^.*|FUNC |GLOB .*|//p' libc.list
 if $contains '^printf$' libc.list >/dev/null 2>&1; then
 echo "done"
 else
@@ -1605,7 +1614,7 @@ until a better solution is devised for the kernel problem.
 EOM
 rp="Do you want to do setuid/setgid emulation? [$dflt]"
-echo $n "$rp $c"
+$echo $n "$rp $c"
 . myread
 case "$ans" in
 '') $ans="$dflt";;
@@ -1913,7 +1922,7 @@ fi
 : see if stdio is really std
 echo " "
-if $contains 'char.*_ptr;' /usr/include/stdio.h >/dev/null 2>&1 ; then
+if $contains 'char.*_ptr.*;' /usr/include/stdio.h >/dev/null 2>&1 ; then
 if $contains '_cnt;' /usr/include/stdio.h >/dev/null 2>&1 ; then
 echo "Your stdio is pretty std."
 d_stdstdio="$define"
@@ -2052,6 +2061,25 @@ else
 fi
 rm -f $$.tmp
+: check for volatile keyword
+echo " "
+echo 'Checking to see if your C compiler knows about "volatile"...'
+$cat >try.c <<'EOCP'
+main()
+{
+ volatile int foo;
+ foo = foo;
+}
+EOCP
+if $cc -c try.c >/dev/null 2>&1 ; then
+ d_volatile="$define"
+ echo "Yup, it does."
+else
+ d_volatile="$undef"
+ echo "Nope, it doesn't."
+fi
+$rm -f try.*
+
 : see if there is a wait4
 set wait4 d_wait4
 eval $inlibc
@@ -2216,7 +2244,7 @@ else
 echo "No sys/ndir.h found."
 fi
-: see if this is DG/UX with a funky utime.h
+: see if we should include utime.h
 echo " "
 if $test -r /usr/include/utime.h ; then
 i_utime="$define"
@@ -2259,7 +2287,7 @@ main()
 }
 EOCP
 if $cc try.c -o try >/dev/null 2>&1 ; then
- dflt=`try`
+ dflt=`./try`
 else
 dflt='4'
 echo "(I can't seem to compile the test program. Guessing...)"
@@ -2317,7 +2345,7 @@ main()
 }
 EOCP
 if $cc try.c -o try >/dev/null 2>&1 ; then
- dflt=`try`
+ dflt=`./try`
 else
 dflt='?'
 echo "(I can't seem to compile the test program...)"
@@ -2376,7 +2404,7 @@ echo "Signals are: $sig_name"
 : see what type of char stdio uses.
 echo " "
-if $contains 'unsigned.*char.*_ptr;' /usr/include/stdio.h >/dev/null 2>&1 ; then
+if $contains 'unsigned.*char.*_ptr.*;' /usr/include/stdio.h >/dev/null 2>&1 ; then
 echo "Your stdio uses unsigned chars."
 stdchar="unsigned char"
 else
@@ -2444,7 +2472,7 @@ case "$yacc" in
 esac
 cont=true
 echo " "
-rp="Which compiler compiler (yacc or bison) will you use? [$dflt]"
+rp="Which compiler compiler (yacc or bison -y) will you use? [$dflt]"
 $echo $n "$rp $c"
 . myread
 case "$ans" in
@@ -2583,6 +2611,7 @@ d_syscall='$d_syscall'
 d_varargs='$d_varargs'
 d_vfork='$d_vfork'
 d_voidsig='$d_voidsig'
+d_volatile='$d_volatile'
 d_vprintf='$d_vprintf'
 d_charvspr='$d_charvspr'
 d_wait4='$d_wait4'
diff --git a/Makefile.SH b/Makefile.SH
index 73890cb..63d326d 100644
--- a/Makefile.SH
+++ b/Makefile.SH
@@ -25,9 +25,12 @@ esac
 echo "Extracting Makefile (with variable substitutions)"
 cat >Makefile <Makefile <