run-time performance regression of sparse matrix vector multiplication

Mon Mar 24 14:59:43 UTC 2008

I have found a performance problem on FreeBSD AMD64 (maybe i386 too):
the same benchmark runs noticeably slower than on Linux. I am attaching
the results and the piece of code I used. You have to install g++42 on
FreeBSD first.

Here are the results of the benchmark:

===============
==== LINUX ====
===============

                      Intel Core 2
                      ============
        number of threads: 1/  2
Sun CC      create  :    808/443
            multiply:    5063/4488

g++-4.2.2  create  :    881/479
            multiply:    5245/4691

intel icpc  create  :    724/404
            multiply:    4903/4594

  We see that although the creation (allocation) step parallelizes well,
  the multiplication gains very little from the second thread.

  Are there any problems with this approach I cannot see?
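
  For readers without the attachment, the kind of kernel being timed in
  the "multiply" rows looks roughly like the sketch below. The CSR
  (compressed sparse row) layout and the names CsrMatrix and multiply are
  assumptions made for illustration; the attached code may store the
  matrix differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container; not necessarily the layout used in the
// attached benchmark.
struct CsrMatrix {
    int rows;
    std::vector<int> row_ptr;    // size rows + 1, start of each row
    std::vector<int> col_idx;    // column index of each stored entry
    std::vector<double> values;  // value of each stored entry
};

// Computes y = A * x. Every row is independent, so the outer loop can be
// split across threads. Built without -fopenmp the pragma is ignored and
// the loop simply runs serially.
void multiply(const CsrMatrix& a, const std::vector<double>& x,
              std::vector<double>& y)
{
    assert(a.row_ptr.size() == static_cast<std::size_t>(a.rows) + 1);
    y.assign(static_cast<std::size_t>(a.rows), 0.0);

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            sum += a.values[k] * x[a.col_idx[k]];
        y[i] = sum;  // each thread writes only its own rows
    }
}
```

  Each stored entry is read once and used for a single multiply-add, so a
  kernel of this shape does very little arithmetic per byte loaded. That
  may be part of why a second thread helps so little, but it is only a
  guess until someone profiles it.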

[archwn at home /usr/home/archwn/sparsematrixvector]$ sysctl dev.cpu.0.freq
dev.cpu.0.freq: 1654

=====================
==== FreeBSD 7.0 ====
=====================

                      Intel Core 2
                      ============
      number of threads:  1/  2
g++-4.2.2  create  :    1750/1288
            multiply:    7098/5271
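
To compare the two g++-4.2.2 tables directly, the ratios can be worked
out as in the small sketch below. It assumes the figures are elapsed
times in the same (unstated) unit, so that lower is better; the names
Timing and ratio are made up for this example.

```cpp
#include <cassert>

// One table row: timing with 1 thread and with 2 threads.
struct Timing {
    double one_thread;
    double two_threads;
};

// g++-4.2.2 figures copied from the Linux and FreeBSD 7.0 tables.
const Timing linux_create     = {  881.0,  479.0 };
const Timing linux_multiply   = { 5245.0, 4691.0 };
const Timing freebsd_create   = { 1750.0, 1288.0 };
const Timing freebsd_multiply = { 7098.0, 5271.0 };

// Ratio of two timings; a result above 1 means `slow` took longer.
double ratio(double slow, double fast)
{
    assert(fast > 0.0);
    return slow / fast;
}

// Speedup gained by going from one thread to two on the same system.
double speedup(const Timing& t)
{
    return ratio(t.one_thread, t.two_threads);
}
```

Under that assumption, the two-thread speedup on Linux is about 1.84 for
create and about 1.12 for multiply, while on FreeBSD it is about 1.36
for create and about 1.35 for multiply. With a single thread, FreeBSD
takes about 1.99 times as long as Linux for create and about 1.35 times
as long for multiply.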

The same optimization flags were used in both cases with g++-4.2.2. I
have also written a pthreads version of the above code, which doesn't
need an OpenMP-capable compiler at all. This allows us to try the
gcc-3.4.6 compiler, which is unlikely to have problems of its own. Is
there anything you would like me to try out? Is anybody interested in
having the code in order to run their own tests?

Thanks in advance,
Archwn.
