Unexpected threading performance result

Ivan Voras ivoras at freebsd.org
Sun Oct 7 07:52:40 PDT 2007


Hi,

For an unrelated purpose, I'm benchmarking performance of tree 
algorithms in SMP environments and my preliminary run has an unexpected 
result. Here's the typical output from the (small) benchmark program, 
run on a dual-core Athlon64 (i386 mode):

Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 84.44 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 167.46 seconds.

The interpretation is: running the same loop twice, in two parallel 
threads doesn't result in a speedup, but it looks like the execution is 
serialized. The problem is: the loops are completely independent, no 
locking in their execution, and 'top' confirms that both threads in the 
program are running at 100% CPU each.

I verified this behaviour on:

- 7-CURRENT, i386, ULE
- 7-CURRENT, i386, 4BSD
- 6-STABLE, amd64, 4BSD

I can't really explain this behaviour, but it might not be related to 
FreeBSD - maybe I made a mistake in the program or there's a 
hardware-related reason for it (maybe CPU cache trashing from the tree 
traversal?). In any case, can someone shed some light on this?

The main part of the (small) program is pasted below.


  47 double time_start, time_b1, time_b2;
  48 int n_data, n_samples;
  49 int *data, *samples;
  50
  51
  52 void bench_loop()
  53 {
  54         int i;
  55         struct node *nd, find;
  56         for (i = 0; i < n_samples; i++) {
  57                 find.data = samples[i];
  58                 nd = RB_FIND(node_tree, &head, &find);
  59                 if (nd == NULL)
  60                         errx(1, "Sample %d was not found", find.data);
  61         }
  62 }
  63
  64 void step1()
  65 {
  66         int n;
  67         /* step 1 - simple tree traversal */
  68         printf("Step 1: Running %d loops\n", STEP1_ITER);
  69         for (n = 0; n < STEP1_ITER; n++)
  70                 bench_loop();
  71         time_b1 = gettime();
  72         printf("** Step 1 benchmark completed %d loops in %0.2lf 
seconds.\n", STEP1_ITER, time_b1 - time_start);
  73 }
  74
  75 void *step2_thread(void *arg) {
  76         int n;
  77         for (n = 0; n < STEP2_ITER; n++)
  78                 bench_loop();
  79         return NULL;
  80 }
  81
  82 void step2()
  83 {
  84         /* step 2 - run tree traversal in parallel threads */
  85         int n;
  86         pthread_t threads[STEP2_THREADS];
  87
  88         printf("Step 2: Running %d threads with %d loops each\n", 
STEP2_THREADS, STEP2_ITER);
  89         for (n = 0; n < STEP2_THREADS; n++) {
  90                 if (pthread_create(&threads[n], NULL, step2_thread, 
NULL) != 0)
  91                         err(1, "Cannot spawn thread");
  92         }
  93         for (n = 0; n < STEP2_THREADS; n++)
  94                 pthread_join(threads[n], NULL);
  95         time_b2 = gettime();
  96         printf("** Step 2 benchmark completed %d loops in %d 
threads in %0.2lf seconds.\n",
  97                         STEP2_ITER, STEP2_THREADS, time_b2 - 
time_start);
  98 }



More information about the freebsd-threads mailing list