From: Hallvard B Furuseth Date: Sun, 30 Mar 1997 22:19:13 +0000 (+0200) Subject: Re: 5.004's new srand() default seed X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=d7d933a26349f945f93b2f0dbf85b773d8ca3219;p=p5sagit%2Fp5-mst-13.2.git Re: 5.004's new srand() default seed In the remote past, Roderick Schertler wrote: > I posted a message to clpm asking for comments on Perl's new srand() > default seed and Dean Inada replied. Here are his observations. I'm > just a conduit, would somebody who knows what they're talking about > please step in? Sorry to be late: I believe I'm the one who inserted the new code. I browse clpm very infrequently - and skip large parts of p5p too, for that matter. I'll freely admit it's not particularly well tuned. Since I don't know much about randomness, anyone who do could do a better job, and I suggested (on p5p) that someone might give it a try. My change was just to grab a few more garbage values (pointer addresses, time & pid) and multipliers that shold at least never produce worse results than the original code. The multipliers - are prime numbers, just in case that makes a difference:-) - have few 1 bits at the beginning in binary, to distribute values varying in a small range "not worse than the old version did", or something like that. I don't quite remember. If you improve it, just remember: - *every* input value (time, pid & so on) may vary within a very narrow range and/or with a constant multiplier in some instances - or not vary at all. - cast things to a "large enough" unsigned and to avoid overflow exception and/or warning on some compilers. And don't trust a variable/function to be unsigned just because some standard says that it should be. >> Using ^ here seems pretty silly. It again invites identical seeds >> every 1000001 microseconds. A much more natural seed would seem to be >> >> tv_sec*1000000 + tv_usec >> >> Which at least will never repeat on different ticks. Unless int is small, which gives faster wraparound. >> But (...) some round tv_usec to multiples of 10000 microseconds, Ah, yes. That's why I didn't multiply tv_sec with anything. Better fix: Replace 1000000 above with some uneven larger number. >> it would still be useful to mix in a getpid(). Which I did. >>> seed ^= ( ( 269 * (U32)getpid()) >>> ^ (26107 * (U32)&when) >>> ^ (73819 * (U32)stack_sp)); >> >> Since you're using primes here, why not use the addition operator from >> the same field that the primes are in, which would generate a group >> the size of the product of those primes for avoiding repetitions? Fine. Someone said xor, but not with any arguments that I can rember. >> Are you still using rand(3), rather than the better random(3)? If so, >> there may be the additional issue that small changes in the seed tend >> to produce small changes in the sequence. In that case, something >> like seed*32769 (i.e. seed + (seed<<15)) may help. (though a prime >> like 32771 might be better.) INSTALL suggests -Drand=random -Dsrand=srandom if one wants larger period. Which means the period could be anything. However, #if RANDBITS > 32 ... #else #if RANDBITS > 16 ... #else ... should allow us to tune this for the most common seed sizes. Anyway, what should be used is a *real* hash function which spreads the effect of every bit over a whole 1<According to Fabien TASSIN: >>> warning(1412): destination type of cast is too small to >>> hold all pointers: truncation possible >>> ^ (26107 * (U32)&when) >>> ^ >> >> this one is interesting for all 64bit OSs.. > > It's on purpose. The value of "&when" is just a presumably large, > difficult-to-predict value; converting it to U32 is the Right Thing, > even on 64-bit machines. Well, (U32)(unsigned_with_same_size_as_pointers_t)&when wouldn't hurt, if the latter type exists:-) And the same for (U32)stack_sp, of course. p5p-msgid: 199703302219.AAA20998@bombur2.uio.no --- diff --git a/pp.c b/pp.c index a988646..4ca4b3e 100644 --- a/pp.c +++ b/pp.c @@ -1363,28 +1363,50 @@ PP(pp_srand) static U32 seed() { + /* + * This is really just a quick hack which grabs various garbage + * values. It really should be a real hash algorithm which + * spreads the effect of every input bit onto every output bit, + * if someone who knows about such tings would bother to write it. + * Might be a good idea to add that function to CORE as well. + * No numbers below come from careful analysis or anyting here, + * except they are primes and SEED_C1 > 1E6 to get a full-width + * value from (tv_sec * SEED_C1 + tv_usec). The multipliers should + * probably be bigger too. + */ +#if RANDBITS > 16 +# define SEED_C1 1000003 +#define SEED_C4 73819 +#else +# define SEED_C1 25747 +#define SEED_C4 20639 +#endif +#define SEED_C2 3 +#define SEED_C3 269 +#define SEED_C5 26107 + U32 u; #ifdef VMS # include unsigned int when[2]; _ckvmssts(sys$gettim(when)); - u = when[0] ^ when[1]; + /* Please tell us: Which value is seconds and what is the other here? */ + u = (U32)SEED_C1 * when[0] + (U32)SEED_C2 * when[1]; #else # ifdef HAS_GETTIMEOFDAY struct timeval when; gettimeofday(&when,(struct timezone *) 0); - u = when.tv_sec ^ when.tv_usec; + u = (U32)SEED_C1 * when.tv_sec + (U32)SEED_C2 * when.tv_usec; # else Time_t when; (void)time(&when); - u = when; + u = (U32)SEED_C1 * when; # endif #endif -#ifndef PLAN9 /* XXX Plan9 assembler chokes on this; fix needed */ - /* What is a good hashing algorithm here? */ - u ^= ( ( 269 * (U32)getpid()) - ^ (26107 * (U32)&when) - ^ (73819 * (U32)stack_sp)); + u += SEED_C3 * (U32)getpid(); + u += SEED_C4 * (U32)stack_sp; +#ifndef PLAN9 /* XXX Plan9 assembler chokes on this; fix needed */ + u += SEED_C5 * (U32)&when; #endif return u; }