5.4-R ULE performance and MPI

Peter C. Lai sirmoo at cowbert.2y.net
Tue May 24 04:20:51 PDT 2005


On 5.4-R, the 4BSD scheduler appears to be much faster than the ULE
scheduler, all else being equal, when an application is parallelized with
MPI. The hardware is a dual Pentium III.

The application we are using to benchmark this is the science/gromacs
molecular dynamics simulation port, custom built against the
single-precision floating point configuration of the math/fftw and
net/mpich ports. None of these are thread-safe, so we do not link against
a threading library. In repeated runs from the same initial conditions of
a particular test simulation, gromacs reports about 350 MFLOPS under ULE
versus about 700 MFLOPS under 4BSD, and the total CPU time used is ~230s
under ULE versus ~118s under 4BSD.
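
For reference, a minimal two-rank timing sketch like the following (not
the gromacs code; it assumes the mpicc and mpirun wrappers installed by
the net/mpich port, with the loop body standing in for the real
simulation step) could be used to cross-check wall-clock time under the
two schedulers:

/* build: mpicc harness.c -o harness; run: mpirun -np 2 ./harness */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);        /* start both ranks together */
    t0 = MPI_Wtime();

    /* ... the real work (e.g. the simulation step loop) goes here ... */

    MPI_Barrier(MPI_COMM_WORLD);        /* wait for both ranks to finish */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("wall clock: %.2f s\n", t1 - t0);

    MPI_Finalize();
    return 0;
}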

Using top(1), we notice that under ULE the two gromacs processes are
unable to fully use the two CPUs: the IPC causes them both to request the
same CPU about half the time, so each process runs at about 50% CPU all
the time. My guess is that ULE is spin-locking, so that one process is
effectively blocking the other, but I don't have time(1) data to show the
context switching.
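
To illustrate the guess: a toy two-rank ping-pong like the sketch below
(again assuming the net/mpich port; the iteration count is arbitrary)
keeps both ranks busy in MPI_Recv, since as far as I know mpich's default
device busy-polls rather than sleeps while waiting for a message, and
getrusage(2) reports the context-switch counts that time(1) would
otherwise show:

/* build: mpicc pingpong.c -o pingpong; run: mpirun -np 2 ./pingpong */
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, i, buf = 0;
    MPI_Status st;
    struct rusage ru;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* bounce a message back and forth; the waiting side busy-polls */
    for (i = 0; i < 100000; i++) {
        if (rank == 0) {
            MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    /* the context-switch counts that time(1) would have reported */
    getrusage(RUSAGE_SELF, &ru);
    printf("rank %d: %ld voluntary, %ld involuntary context switches\n",
           rank, ru.ru_nvcsw, ru.ru_nivcsw);

    MPI_Finalize();
    return 0;
}

If ULE really is keeping both processes on the same CPU, I would expect
the involuntary context-switch count to be noticeably higher there than
under 4BSD.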

My cursory googling didn't turn up anything related to this, so I was
wondering whether this is already a known issue. Thanks.

peter
-- 
Peter C. Lai
University of Connecticut
Dept. of Molecular and Cell Biology
Yale University School of Medicine
SenseLab | Research Assistant
http://cowbert.2y.net/


