Unexpected threading performance result

Sun Oct 7 13:09:51 PDT 2007

Tijl Coosemans wrote:
> On Sunday 07 October 2007 16:52:03 Ivan Voras wrote:
>> For an unrelated purpose, I'm benchmarking performance of tree 
>> algorithms in SMP environments and my preliminary run has an unexpected 
>> result. Here's the typical output from the (small) benchmark program, 
>> run on a dual-core Athlon64 (i386 mode):
>>
>> Running benchmarks on small_nonuniform, 1000000 samples
>> Step 1: Running 100 loops
>> ** Step 1 benchmark completed 100 loops in 84.44 seconds.
>> Step 2: Running 2 threads with 100 loops each
>> ** Step 2 benchmark completed 100 loops in 2 threads in 167.46 seconds.
> 
> My guess is that in the beginning of step1() and step2() you have to
> add a line "time_start = gettime();".

Of course I have to. I was so focused on the low-level stuff that I made
exactly the kind of silly mistake you suggested. Thanks for the help!

The results make sense now, and in case anyone's interested, I'm pasting
them below. I made the extra effort to run it under both the 4BSD and ULE
schedulers on 7-CURRENT (SMP, dual-core).

-- 4BSD nonuniform samples --
Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 86.33 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 82.79 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 124.67 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 166.32 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 210.67 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 251.83 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 291.25 seconds.

-- ULE nonuniform samples --
Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 84.09 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 83.43 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 126.21 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 166.66 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 209.40 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 250.36 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 291.92 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 333.42 seconds.

-- 4BSD uniform samples --
Running benchmarks on small_uniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 93.33 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 89.33 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 135.20 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 179.96 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 226.40 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 269.57 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 314.06 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 358.67 seconds.

-- ULE uniform samples --
Running benchmarks on small_uniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 89.76 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 89.90 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 135.75 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 179.72 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 226.10 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 269.63 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 314.76 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 359.44 seconds.

"uniform" / "nonuniform" describe the distribution of the random number
function used to generate the samples.