Ask for opinion: changing rand(3) to random(3) in awk(1)

Chenguang Li horus.li at gmail.com
Fri Aug 29 02:44:20 UTC 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Lowell Gilbert <freebsd-questions-local at be-well.ilk.org> wrote:

> Chenguang Li <horus.li at gmail.com> writes:
> 
>> The problem I was trying to describe is its "one-shot" randomness. Take these two examples (where it matters):
>> 
>> 1. You write a script[1] that simulates rolling a die; it produces
>> the same result if executed again within, say, 5 seconds.
>> [1] BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }, either one.
> 
> One second, not 5. Calling srand() without a parameter seeds the random
> number generator with the current time in seconds, so the value changes
> once per second.

Did you actually run this line? I will let the examples speak for me:

m1: FreeBSD 10.0-RELEASE-p6 amd64

m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277292
53
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277300
53
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277302
53

m2: FreeBSD 10.0-RELEASE i386

m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277368
53
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277374
53
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277379
53

m3: FreeBSD 10.0-RELEASE i386

m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248690
31
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248697
31
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248700
31

m1, m2 and m3 are three different machines I have access to. I have
not tested other versions or architectures.

>> 2. You have a CGI script which will show different content based on the number generated by rand().
>> 
>> In the first situation, you can generate all the outcomes in a single
>> run by using a for loop, but the first outcome will be the same. OSX's
>> awk(1) produces a reasonable number every time I run it. In the
>> latter one, you could call rand() once, throw away the result, and
>> call it again to get another number. Both are practical workarounds,
>> but we do have a better choice: applying the modification I suggested
>> before.

> You are still misunderstanding the relationship between srand() and
> rand(), in a way that will not be fixed by changing awk's implementation
> from rand(3) to random(3). srand() "seeds" the random number generator
> with a particular value, and the sequence of numbers is completely
> determined afterwards. This isn't a bug; the ability to exactly
> reproduce a sequence of "random" numbers is an essential feature in a
> lot of simulation uses. This is also why we refer to these algorithms as
> "pseudo-random" rather than just "random."

I'm fairly confident that I understand the relationship between them
reasonably well.

> In your cases, you really do want a different sequence every time. The
> way that is handled is by using a different seed each time. The normal
> use of srand() uses the current time, so as long as it isn't called
> twice within one second, it will always use a different sequence of
> numbers. If it *is* called twice within the same second, it will produce
> the same sequence of numbers (not just the same first number, but the
> second, third, etc. number will be the same also). This is just as true
> on OSX as on FreeBSD. Your use of srand() in your first script is buggy
> because it calls srand() for *every* call to rand(); your second version
> fixes this problem.

Yes, it's buggy, but it was only meant as a one-shot demonstration, so it
makes no difference here. One more example:

m1$ date +%s ; echo | awk 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409278327
54
15
10
6
56
m1$ date +%s ; echo | awk 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409278335
54
20
14
73
82

Only the first number stays the same. Now which one is to blame: rand(), the
timer, or the compiler? It's weird, and I had the same thought before: it should
change every second. The fact is, it doesn't. Is it just me or ...
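
One possible explanation, offered as a sketch rather than a diagnosis: if
the rand(3) behind awk's rand() is a Park-Miller "minimal standard"
generator, x' = 16807 * x mod (2^31 - 1), then seeds a few seconds apart
shift the first output by only a few parts in 100,000, while later outputs
diverge quickly. That generator is an assumption; nothing in this thread
confirms it. The helper name lcg_rolls is hypothetical, and the seeds used
(one more than the timestamps printed by date above) are a guess as well.

```shell
# Hypothetical model, not FreeBSD's actual source: a Park-Miller
# "minimal standard" generator, assumed to approximate rand(3).
lcg_rolls() {
    # $1 = seed, $2 = how many values in 1..100 to print on one line
    awk -v seed="$1" -v n="$2" 'BEGIN {
        m = 2147483647                  # 2^31 - 1
        x = seed % m
        for (i = 1; i <= n; i++) {
            x = (16807 * x) % m         # stays exact in double precision
            printf "%d%s", int(1 + x / 2147483648 * 100), (i < n ? " " : "\n")
        }
    }'
}

lcg_rolls 1409278328 5    # prints: 54 15 10 6 56
lcg_rolls 1409278336 5    # prints: 54 20 14 73 82
```

Under these assumptions the model reproduces both m1 runs above: the
seeds differ by 8, so the first state differs by 8 * 16807 = 134456 out
of roughly 2^31, far too little to move int(1+rand()*100).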

> How do we deal with the one-second window? Well, most of the time we
> ignore it. For a CGI script, it won't matter. If you really do need to
> run separate copies of an awk script more often, you'll need a better
> seed. Reading it from /dev/random would be one place for your awk
> script to get that. An important point that you may have missed is that
> when your script calls srand(), it can provide a parameter, which will
> be used instead of the current time.
>
>> If others are not affected by the problem I described above, then I am
>> okay with that. The other reason why I suggest this is, I see no loss,
>> only to make it better.
> 
> The problem you described is caused by your calling srand() multiple
> times. This is a bug on your part, not a problem with awk that would
> affect other people. Changing awk to use random(3) instead of rand(3)
> will not fix your problem, because continually reseeding srandom(3) with
> the same seed will give you the same values from random(3) just as much
> as doing the same with srand(3) and rand(3) will. In your example:
> BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }
> the first one is broken and the second one works (try them and compare
> the output).

I do know that I can provide a better seed when calling srand(). I know that
I shouldn't call srand() every time I call rand(). I insist that our awk(1)
should provide good randomness in every single run; based on the examples given,
it's not doing its job well. Below is a locally patched version:

m2$ date +%s ; echo | ./a.out 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409279522
59
38
84
67
8
m2$ date +%s ; echo | ./a.out 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409279524
80
71
94
80
94

Much better.
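
For comparison, the two workarounds discussed in this thread (passing an
explicit seed to srand(), and discarding the first rand() value) might
look like the following with an unpatched awk. This is a minimal sketch:
the function names are hypothetical, and it assumes od(1) and
/dev/urandom are available.

```shell
# Workaround 1 (hypothetical helper): seed from the random device
# instead of the clock, so two runs in the same second still differ.
roll_urandom() {
    # Read 4 bytes as one unsigned decimal number; keep only the digits.
    seed=$(od -An -N4 -tu4 /dev/urandom | tr -dc '0-9')
    awk -v seed="$seed" 'BEGIN { srand(seed); print int(1 + rand() * 100) }'
}

# Workaround 2 (hypothetical helper): keep the time-based seed but
# throw away the first value, which barely changes between nearby seeds.
roll_discard_first() {
    awk 'BEGIN { srand(); rand(); print int(1 + rand() * 100) }'
}

roll_urandom
roll_discard_first
```

The second variant still repeats itself when run twice within the same
second, since both runs get the same seed.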

> Although it may not fix the problem you thought it would, you're right
> that there's no loss in making the change, so I think it's a good idea.
> 
> Be well.
>       Lowell

I am afraid I have only scratched the surface of the problem; nevertheless, the
modification does fix my problem. My journey ends here.

Chenguang Li

-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJT/+j9AAoJELG4cS+11lRh0SsP+wROOZIHSuA2iR+NsnrAVEM8
WH6UY/Gqyh/uxzWVDJ+FIEfgFz9GGVFfOndOhsTMYnQdLWTkrbKcAcjDUP4zBXG/
nFMxKwdVws8Q3gIRM6+ZIDiPt8Yui2w+JrPks0fJQ9LVJTtGnv7v0t+jkCag5u8G
aeseg1SQU5Z3aSoBaxBtuObjjNg+0wSMntwJDToG5AriKzB8uYvu5ljZ6tDhKb2z
q19uVcP5AUCxr7WgOoNOhVWHP+kLYMUmpiWR7rTmkKa3Bx4jbMwIJzQZ86rjyaGk
8EyKCd+K+4GsKMEvaA+yXBYwsB4rM4f0dYUfPQ7EmQX0hS78xkO7Y7cP8QAfyv1j
/ziWuecSYo0RgipU3S8gLCxt9zm9CHoTmNy81tFqJA2ZV7cqhXlx7AKwcqzoOhtI
tSW9iXimUhAxTB7pB04M/hGCooZrgW0bdyP5VeaetZHTz8TNTyOHrhCPCHBwSV3O
aXM+qMwYkRMcs3lEGzRzxoRdo0J4dg7FpORTT8mrm81vGIcuqFfidZpah2RLgD1K
JUyd+TTUAs6aqWDC+pG80dOSdA/yE5iHnApEQp6gG3egIQK893jD7Hk4Flnsem8n
RJKNTVB3ewbxwwcyJQIatFao209cvMXgsS9OsbSzvv5mYndPLhxSp7XpApvnCcCs
Ob720IJk95ixCo7/tklZ
=q2fd
-----END PGP SIGNATURE-----

