5.4-R ULE performance and MPI
Peter C. Lai
sirmoo at cowbert.2y.net
Tue May 24 04:20:51 PDT 2005
On 5.4-R, the 4BSD scheduler appears to be much faster than the ULE scheduler,
all else being equal, when an application is parallelized with MPI.
The hardware is a dual Pentium III machine.
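For reference, the only intended difference between the two kernels is the
scheduler option in the kernel config, along these lines (the rest of the
config is identical between the two builds):

    options    SCHED_4BSD    # the 4BSD kernel
    #options   SCHED_ULE     # swapped in for the ULE kernel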
The application we are using to benchmark this is the
science/gromacs molecular dynamics simulation port, custom-built against
the single-precision floating point configuration of the math/fftw and
net/mpich ports. None of these are threadsafe, so we do not link against a
threading library. In repeated runs of a particular test simulation from the
same initial conditions, gromacs reports about 350 MFLOPS under ULE versus
700 MFLOPS under 4BSD, and the total CPU time used is ~230s under ULE versus
~118s under 4BSD.
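In case it is useful for reproducing this without building the whole
gromacs/fftw stack, a minimal mpich ping-pong along these lines (my own
sketch; the file name, message size, and iteration count are arbitrary)
should exercise the same two-process IPC path:

    /* pingpong.c - minimal two-rank MPI ping-pong.
     * Build: mpicc -O2 -o pingpong pingpong.c
     * Run:   mpirun -np 2 ./pingpong
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define BUFSZ 1024
    #define ITERS 100000

    int main(int argc, char **argv)
    {
        int rank, size, i;
        char buf[BUFSZ];
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size != 2) {
            if (rank == 0)
                fprintf(stderr, "run with -np 2\n");
            MPI_Finalize();
            return 1;
        }
        memset(buf, 0, BUFSZ);

        for (i = 0; i < ITERS; i++) {
            if (rank == 0) {
                /* rank 0 sends, then waits for the echo */
                MPI_Send(buf, BUFSZ, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, BUFSZ, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else {
                /* rank 1 echoes everything back */
                MPI_Recv(buf, BUFSZ, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, BUFSZ, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0)
            printf("done: %d round trips\n", ITERS);
        MPI_Finalize();
        return 0;
    }

Watching this under top(1) on each kernel should show the same
50%-per-process pattern if the scheduler is at fault rather than anything
gromacs-specific.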
Using top(1), we notice that under ULE the two gromacs processes are unable
to make full use of the two CPUs: the IPC causes them both to request the
same CPU about half the time, so each process runs at about 50% the whole
time. My guess is that ULE is spin locking, so that one process is
effectively blocking the other. I don't have time(1) data to show the
context switching, though.
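If it would help, I can rerun one of these under /usr/bin/time -l, which
prints the rusage counters, including voluntary and involuntary context
switches; the invocation would be something like this (exact mdrun
arguments elided):

    /usr/bin/time -l mpirun -np 2 mdrun ...

Comparing those counters between the ULE and 4BSD runs should show whether
the ranks really are fighting over one CPU.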
My cursory googling didn't turn up anything related to this, so I was
wondering if you already know about this issue. Thanks,
peter
--
Peter C. Lai
University of Connecticut
Dept. of Molecular and Cell Biology
Yale University School of Medicine
SenseLab | Research Assistant
http://cowbert.2y.net/