RAND_MAX broken (was: Re: svn commit: r252484 - head/sys/ufs/ffs)

Tue Jul 2 07:59:36 UTC 2013

On Tue, 2 Jul 2013, Bruce Evans wrote:

> ...
> Also, random(9) is internally broken on 64-bit arches.  It shouldn't
> exist.  It is not random() at all, but just rand() with the clamp to
> RAND_MAX removed.  Its linear congruential generator is suboptimal
> for 64 bits, and other parts of its algorithm are just broken for
> 64 bits.  It uses longs internally to try to fake unsigned 31 bits,
> but the faking is broken in different ways depending on the size of
> long.  It is documented to return a 31-bit value, but on 32-bit
> arches it seems to be possible for it to return 0xfffffffff, and on
> 64-bit arches it does return full 64-bit values.
>
> random(9) was cloned from rand(3).  The userland versions have been
> edited a bit, but still have most of the bugs:
>
> random(3) uses an internal copy of rand(3) named good_rand() for
> seeding.  If USE_WEAK_SEEDING is defined, this uses a weaker linear
> congruential generator (LCG).  This uses int32_t instead of long
> internally, and doesn't attempt to reduce the value to 31 bits.
> Otherwise, the same LCG as random(9) is used, with the same buggy code
> except for changing the longs to 32-bits.  Using int32_t gives the
> same overflow bugs on all (supported) arches, and avoids returning
> invalid values half the time on 64-bit arches.  There are 2 callers
> of good_rand(), and only 1 of them discards bits above the 31st.
>
> srand() also supports USE_WEAK_SEEDING.  It uses u_long internally,
> so it is actually correct.  The internal value has the number of
> bits in a u_long and is generated without overflow and without any
> bias in the reduction to 31 bits.  Then returning this value as
> an int in by taking the value modulo ((u_long)RAND_MAX + 1) gives
> a correct reduction to 31 bits when RAND_MAX is 0x7fffffff (or
> 15 bits if RAND_MAX is 0x7fff, etc.).
>
> srand() in the !USE_WEAK_SEEDING case still uses the same buggy code
> as random(9), with type long, so the internal values overflow and the
> inernal reduction to 31 bits is buggy, with the bugs depending on the
> size of long and other things.  But it is mostly saved by taking the
> value modulo ((u_long)RAND_MAX + 1).  This reduces to a valid value
> and leaves only minor biases from the buggy earlier reduction.
>
> random(6) used to have bugs related to the buggy internal reduction,
> and the biases from these were noticed.  It uses floating point, so
> the reduction was easier, but it was still done wrong, by dividing
> by LONG_MAX instead of RANDOM_MAX_PLUS1.  Using LONG_MAX is like using
> 0x7fffffff in random(9), but obviously buggier since the range for
> both is documented to be [0,0x7fffffff].  Hard-coding 0x7fffffff in
> random(6) would have been equally buggy.  I think 0x80000000 is correct
> in both, but in random(9) this assumes too much about type sizes and
> layouts.  The correct method in integer code is to take an unsigned
> modulus with divisor 0x80000000.  Let the compiler optimize this to
> masking with 0x7fffffff.  This depends on the maximum value plus 1
> being representable in an unsigned type.  For rand(), this occurs
> because rand() returns int, and for random(3), this occurs because
> random(3) returns long.
>
> Another bug in random(9) is that it returns u_long, so its API is
> different from random(3), but since it is documented to return only
> 31 bits, it is not really different except in the buggy cases where
> it returns 32-64 bits.

The bugs are a little different than I said above.  There is no overflow
problem and no problem with invalid values being produces, since the
algorithm from ACM is careful to do everything with 32 bit signed
integers without causing overflow.  The algorithm just doesn't produce
values mod 2**32 as expected by all the functions.  It does what it
claims to do -- it produces values mod (2**32 - 1).  The largest bug
is that RAND_MAX is off by 1.  It is specified as the largest value
returned by rand(), but in FreeBSD rand() never returns it unless
USE_WEAK_SEEDING is confgured.  The values should be unifornly
distributed on average beteen 0 and RAND_MAX,but that is impossible
if RADND_MAX is never returned.  From libc/stdlib/srand.c:

% static int
% do_rand(unsigned long *ctx)
% {
% #ifdef  USE_WEAK_SEEDING
% /*
%  * Historic implementation compatibility.
%  * The random sequences do not vary much with the seed,
%  * even with overflowing.
%  */
% 	return ((*ctx = *ctx * 1103515245 + 12345) % ((u_long)RAND_MAX + 1));

RAND_MAX is correct for this algorithm.  Except it is off by about a factor
of 65536 instead of just by +1.  This is the LCG documented as an example in
C standards.  It is only suitable for use with 16-bit ints and/or with
RAND_MAX = 32767 and that is how it is used in the example.  With FreeBSD's
RAND_MAX = 0x7fffffff, RAND_MAX is correct but the implementation is of low
quality.

% #else   /* !USE_WEAK_SEEDING */
% /*
%  * Compute x = (7^5 * x) mod (2^31 - 1)

Note that (2**31 - 1) is not 2**31.

%  * without overflowing 31 bits:
%  *      (2^31 - 1) = 127773 * (7^5) + 2836
%  * From "Random number generators: good ones are hard to find",
%  * Park and Miller, Communications of the ACM, vol. 31, no. 10,
%  * October 1988, p. 1195.
%  */
% 	long hi, lo, x;
% 
% 	/* Can't be initialized with 0, so use another value. */
% 	if (*ctx == 0)
% 		*ctx = 123459876;
% 	hi = *ctx / 127773;
% 	lo = *ctx % 127773;
% 	x = 16807 * lo - 2836 * hi;

This splits up the multiplication so that it cannot overflow.  But the
subtraction may give negative value.

% 	if (x < 0)
% 		x += 0x7fffffff;

This does the mod by (2**31 - 1).  The range of values before and after
the mod operation are not clear to me.  If the comment is correct,
then the algorithm must have arranged that the value is never 2's
complement INT32_MIN or INT32_MAX before the mod operation, else the
final value would be out of bounds.

% 	return ((*ctx = x) % ((u_long)RAND_MAX + 1));

If the ACM part of the algorithm is correct, then this part is nonsense
(has no effect), since the mod has already been taken using the correct
modulus, and the correct modulus is smaller than the 1 used here.

If the ACM part of the algorithm is incorrect, then this part prevents
returning the invalid value -1, but the values are very unlikely to
be correctly distributed.

With 64-bit longs, the multiplication can be written more simply and
run more efficiently as a 64-bit one, but it isn't clear that translation to
match the comment:

%  * Compute x = (7^5 * x) mod (2^31 - 1)

 	     x = ((int64_t)16807 * x) % 0x7fffffff;

gives the same result.  This expression would also be much slower if the
mod operation cannot be reduced by the compiler, so perhaps you should
write this expression as:

              int64_t y = (int64_t)16807 * x;
              int32_t z = y;             /* but do this more carefully */
              if (y < 0)
                      y += 0x7fffffff;	/* (mod 2**31 - 1), as before */

% #endif  /* !USE_WEAK_SEEDING */

random(9) and good_rand() don't have the bogus mod by (RAND_MAX + 1).  They
don't really document their maximum value, and it might not matter that
they never return 0x7fffffff.

Bruce