Ask for opinion: changing rand(3) to random(3) in awk(1)
Chenguang Li
horus.li at gmail.com
Fri Aug 29 02:44:20 UTC 2014
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Lowell Gilbert <freebsd-questions-local at be-well.ilk.org> wrote:
> Chenguang Li <horus.li at gmail.com> writes:
>
>> The problem I was trying to describe was its "one-shot" randomness. Take these two as examples (where it matters):
>>
>> 1. You wrote a script[1] that simulates rolling a die; it would
>> produce the same result if executed again within, say, 5 seconds.
>> [1] BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }; which one won't matter.
>
> One second, not 5. Calling srand() without a parameter seeds the random
> number generator with the current time in seconds, so the value changes
> once per second.
Did you actually run this line? I will let the examples speak for me:
m1: FreeBSD 10.0-RELEASE-p6 amd64
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277292
53
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277300
53
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277302
53
m2: FreeBSD 10.0-RELEASE i386
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277368
53
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277374
53
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277379
53
m3: FreeBSD 10.0-RELEASE i386
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248690
31
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248697
31
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248700
31
m1, m2, and m3 are three different machines I have access to. I have not
tested other versions or architectures.
>> 2. You have a CGI script which will show different content based on the number generated by rand().
>>
>> In the first situation, you can generate all the outcomes in a single
>> run by using a for-loop, but the first outcome will be the same. OSX's
>> awk(1) will produce a reasonable number every time I run it. In the
>> latter one, you could call rand() once and throw away the result, and
>> call it again to get another number. Both are practical workarounds,
>> but we do have a better choice: applying the modification I suggested
>> before.
> You are still misunderstanding the relationship between srand() and
> rand(), in a way that will not be fixed by changing awk's implementation
> from rand(3) to random(3). srand() "seeds" the random number generator
> with a particular value, and the sequence of numbers is completely
> determined afterwards. This isn't a bug; the ability to exactly
> reproduce a sequence of "random" numbers is an essential feature in a
> lot of simulation uses. This is also why we refer to these algorithms as
> "pseudo-random" rather than just "random."
I'm fairly confident that I have a reasonable understanding of the relationship
between srand() and rand().
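As a quick check of the determinism described in the quoted paragraph, an
explicit seed can be passed to srand(). This is only a minimal sketch: the
seed 42 and the function name same_seed_rolls are arbitrary choices for
illustration, and the actual numbers printed depend on the awk implementation.

```sh
# Print five "rolls" from awk after seeding with the given value.
# same_seed_rolls is a hypothetical helper; 42 below is an arbitrary seed.
same_seed_rolls() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)
        for (i = 1; i <= 5; i++) printf "%d ", int(1 + rand() * 100)
        print ""
    }'
}

same_seed_rolls 42
same_seed_rolls 42   # same seed, so the same five numbers as the line above
```

Both calls should print an identical line, whatever the numbers are.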
> In your cases, you really do want a different sequence every time. The
> way that is handled is by using a different seed each time. The normal
> use of srand() uses the current time, so as long as it isn't called
> twice within one second, it will always use a different sequence of
> numbers. If it *is* called twice within the same second, it will produce
> the same sequence of numbers (not just the same first number, but the
> second, third, etc. number will be the same also). This is just as true
> on OSX as on FreeBSD. Your use of srand() in your first script is buggy
> because it calls srand() for *every* call to rand(); your second version
> fixes this problem.
Yes, it's buggy, but it was for a one-shot demonstration only, so it makes no
difference here. One more example:
m1$ date +%s ; echo | awk 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409278327
54
15
10
6
56
m1$ date +%s ; echo | awk 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409278335
54
20
14
73
82
Only the first number stays the same. Now which one is to blame: rand(), the timer, or the compiler?
It's weird, and I had the same thought before: it should change every second. The
fact is, it doesn't. Is it just me or ...
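One possible explanation, offered as a sketch rather than a diagnosis: assume
rand(3) here is a Park-Miller "minimal standard" generator
(x' = 16807 * x mod (2^31 - 1)), that srand() with no argument seeds it with
the time in seconds, and that awk divides the result by 2^31 - 1. None of
these is confirmed in this thread. Under those assumptions, two seeds one
second apart move the first rand() value by only 16807 / 2^31, about
0.0000078, so int(1+rand()*100) would take roughly twenty minutes to step to
the next integer. first_roll is a hypothetical helper name.

```sh
# Hypothetical helper: first value of int(1+rand()*100) for a given seed,
# ASSUMING x1 = 16807 * seed mod (2^31 - 1) and rand() = x1 / (2^31 - 1).
first_roll() {
    awk -v seed="$1" 'BEGIN {
        m = 2147483647            # 2^31 - 1
        x = (16807 * seed) % m    # product stays below 2^53, so it is exact
        print int(1 + x / m * 100)
    }'
}

first_roll 1409277292   # first timestamp from the m1 runs above
first_roll 1409277302   # ten seconds later
first_roll 1409248690   # first timestamp from the m3 runs above
```

Under these assumptions the sketch prints 53, 53 and 31, the same first
values as in the m1 and m3 transcripts above.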
> How do we deal with the one-second window? Well, most of the time we
> ignore it. For a CGI script, it won't matter. If you really do need to
> run separate copies of an awk script more often, you'll need a better
> seed. Reading it from /dev/random would be one place for your awk
> script to get that. An important point that you may have missed is that
> when your script calls srand(), it can provide a parameter, which will
> be used instead of the current time.
>
>> If others are not affected by the problem I described above, then I am
>> okay with that. The other reason why I suggest this is, I see no loss,
>> only to make it better.
>
> The problem you described is caused by your calling srand() multiple
> times. This is a bug on your part, not a problem with awk that would
> affect other people. Changing awk to use random(3) instead of rand(3)
> will not fix your problem, because continually reseeding srandom(3) with
> the same seed will give you the same values from random(3) just as much
> as doing the same with srand(3) and rand(3) will. In your example:
> BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }
> the first one is broken and the second one works (try them and compare
> the output).
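The suggestion quoted above, reading a seed from the system's random device
and passing it to srand(), could look like the sketch below. It reads
/dev/urandom rather than /dev/random, on the assumption that it is available
and does not block; random_seed and roll_die are hypothetical names, and od(1)
is used to turn four bytes into an unsigned decimal integer.

```sh
# Hypothetical helper: read 4 bytes from the random device as one unsigned integer.
random_seed() {
    od -An -N4 -tu4 /dev/urandom | tr -dc '0-9'
}

# Seed awk explicitly instead of relying on the time-based default of srand().
roll_die() {
    awk -v seed="$(random_seed)" 'BEGIN { srand(seed); print int(1 + rand() * 6) }'
}

roll_die
```

With a fresh seed on every invocation, two runs within the same second no
longer share a sequence.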
I do know that I can provide a better seed when calling srand(), and I know that
I shouldn't call srand() every time I call rand(). I insist that our awk(1)
should provide good randomness in every single run; based on the examples given,
it's not doing its job well. Below is the output of a locally patched version:
m2$ date +%s ; echo | ./a.out 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409279522
59
38
84
67
8
m2$ date +%s ; echo | ./a.out 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409279524
80
71
94
80
94
Much better.
> Although it may not fix the problem you thought it would, you're right
> that there's no loss in making the change, so I think it's a good idea.
>
> Be well.
> Lowell
I am afraid I have only touched the surface of the problem; nevertheless, the modification does
fix my problem. My journey ends here.
Chenguang Li
-----BEGIN PGP SIGNATURE-----
iQIcBAEBCgAGBQJT/+j9AAoJELG4cS+11lRh0SsP+wROOZIHSuA2iR+NsnrAVEM8
WH6UY/Gqyh/uxzWVDJ+FIEfgFz9GGVFfOndOhsTMYnQdLWTkrbKcAcjDUP4zBXG/
nFMxKwdVws8Q3gIRM6+ZIDiPt8Yui2w+JrPks0fJQ9LVJTtGnv7v0t+jkCag5u8G
aeseg1SQU5Z3aSoBaxBtuObjjNg+0wSMntwJDToG5AriKzB8uYvu5ljZ6tDhKb2z
q19uVcP5AUCxr7WgOoNOhVWHP+kLYMUmpiWR7rTmkKa3Bx4jbMwIJzQZ86rjyaGk
8EyKCd+K+4GsKMEvaA+yXBYwsB4rM4f0dYUfPQ7EmQX0hS78xkO7Y7cP8QAfyv1j
/ziWuecSYo0RgipU3S8gLCxt9zm9CHoTmNy81tFqJA2ZV7cqhXlx7AKwcqzoOhtI
tSW9iXimUhAxTB7pB04M/hGCooZrgW0bdyP5VeaetZHTz8TNTyOHrhCPCHBwSV3O
aXM+qMwYkRMcs3lEGzRzxoRdo0J4dg7FpORTT8mrm81vGIcuqFfidZpah2RLgD1K
JUyd+TTUAs6aqWDC+pG80dOSdA/yE5iHnApEQp6gG3egIQK893jD7Hk4Flnsem8n
RJKNTVB3ewbxwwcyJQIatFao209cvMXgsS9OsbSzvv5mYndPLhxSp7XpApvnCcCs
Ob720IJk95ixCo7/tklZ
=q2fd
-----END PGP SIGNATURE-----