In the remote past, Roderick Schertler wrote:
> I posted a message to clpm asking for comments on Perl's new srand()
> default seed and Dean Inada replied. Here are his observations. I'm
> just a conduit, would somebody who knows what they're talking about
> please step in?
Sorry to be late:
I believe I'm the one who inserted the new code. I browse clpm very
infrequently - and skip large parts of p5p too, for that matter.
I'll freely admit it's not particularly well tuned. Since I don't know
much about randomness, anyone who do could do a better job, and I
suggested (on p5p) that someone might give it a try. My change was just
to grab a few more garbage values (pointer addresses, time & pid) and
multipliers that shold at least never produce worse results than the
original code.
The multipliers
- are prime numbers, just in case that makes a difference:-)
- have few 1 bits at the beginning in binary, to distribute values
varying in a small range "not worse than the old version did",
or something like that. I don't quite remember.
If you improve it, just remember:
- *every* input value (time, pid & so on) may vary within a very narrow
range and/or with a constant multiplier in some instances - or not
vary at all.
- cast things to a "large enough" unsigned and to avoid overflow
exception and/or warning on some compilers. And don't trust a
variable/function to be unsigned just because some standard says
that it should be.
>> Using ^ here seems pretty silly. It again invites identical seeds
>> every 1000001 microseconds. A much more natural seed would seem to be
>>
>> tv_sec*1000000 + tv_usec
>>
>> Which at least will never repeat on different ticks.
Unless int is small, which gives faster wraparound.
>> But (...) some round tv_usec to multiples of 10000 microseconds,
Ah, yes. That's why I didn't multiply tv_sec with anything.
Better fix: Replace 1000000 above with some uneven larger number.
>> it would still be useful to mix in a getpid().
Which I did.
>>> seed ^= ( ( 269 * (U32)getpid())
>>> ^ (26107 * (U32)&when)
>>> ^ (73819 * (U32)stack_sp));
>>
>> Since you're using primes here, why not use the addition operator from
>> the same field that the primes are in, which would generate a group
>> the size of the product of those primes for avoiding repetitions?
Fine. Someone said xor, but not with any arguments that I can rember.
>> Are you still using rand(3), rather than the better random(3)? If so,
>> there may be the additional issue that small changes in the seed tend
>> to produce small changes in the sequence. In that case, something
>> like seed*32769 (i.e. seed + (seed<<15)) may help. (though a prime
>> like 32771 might be better.)
INSTALL suggests -Drand=random -Dsrand=srandom if one wants larger
period. Which means the period could be anything. However,
#if RANDBITS > 32 ... #else #if RANDBITS > 16 ... #else ...
should allow us to tune this for the most common seed sizes.
Anyway, what should be used is a *real* hash function which spreads the
effect of every bit over a whole 1<<RANDBITS, so truncating to 32, 48 or
16 bits will all work. Multiplication with primes is not good enough,
because changes in high bits do not affect low bits of the seed.
However, I can't do that without checking some book, I don't know what
code to avoid which could cause recurring patterns instead...
Anyway, here is a starting point - slightly improved from your version:
[included code block applied as patch - ed]
Oh, also note:
Some time ago, Chip Salzenberg wrote:
>According to Fabien TASSIN:
>>> warning(1412): destination type of cast is too small to
>>> hold all pointers: truncation possible
>>> ^ (26107 * (U32)&when)
>>> ^
>>
>> this one is interesting for all 64bit OSs..
>
> It's on purpose. The value of "&when" is just a presumably large,
> difficult-to-predict value; converting it to U32 is the Right Thing,
> even on 64-bit machines.
Well,
(U32)(unsigned_with_same_size_as_pointers_t)&when
wouldn't hurt, if the latter type exists:-) And the same for
(U32)stack_sp, of course.
p5p-msgid:
199703302219.AAA20998@bombur2.uio.no