Heavy I/O blocks FreeBSD box for several seconds
Steve Kargl
sgk at troutmask.apl.washington.edu
Thu Jul 7 20:08:46 UTC 2011
On Thu, Jul 07, 2011 at 10:42:39PM +0300, Andriy Gapon wrote:
> on 07/07/2011 18:14 Steve Kargl said the following:
>>
>> I'm using OpenMPI. These are N > Ncpu processes not threads,
>
> I used 'thread' in a sense of a kernel thread. It shouldn't
> actually matter if it's a process or a thread in userland
> in this context.
>
>> and without
>> the loss of generality let N = Ncpu + 1. It is a classic master-slave
>> situation where 1 process initializes all others. The N - 1 slave processes
>> are then independent of each other. After 20 minutes or so of number
>> crunching, each slave sends a few 10s of KB of data to the master. The
>> master collects all the data, writes it to disk, and then sends the
>> slaves the next set of computations to do. The computations are nearly
>> identical, so each slave finishes its task in the same amount of time. The
>> problem appears to be that 2 slaves are bound to the same cpu and the
>> remaining N - 3 slaves are each bound to their own cpu. The N - 3 slaves
>> finish their task, send data to the master, and then spin (chewing up
>> nearly 100% cpu) waiting for the 2 ping-ponging slaves to finish.
>> This causes a stall in the computation. When a complete computation
>> takes days to complete, these stalls become problematic. So, yes, I
>> want the processes to get a more uniform access to cpus via migration
>> to other cpus. This is what 4BSD appears to do.
>
> I would imagine that periodic rebalancing would take care of this,
> but probably the ULE rebalancing algorithm is not perfect.
:-)
> There was a suggestion on performance@ to try to use a lower value for
> kern.sched.steal_thresh, a value of 1 was recommended:
> http://article.gmane.org/gmane.os.freebsd.performance/3459
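For reference, the master/slave pattern described above can be sketched like this (a minimal illustration using Python's multiprocessing in place of OpenMPI; the worker/master names and the toy workload are illustrative, not taken from the actual sasmp program):

```python
# Minimal sketch of the master/slave pattern: the master hands out one task
# per slave, then blocks until *every* slave has reported back, so a single
# slow (ping-ponging) slave stalls the whole round.
from multiprocessing import Process, Queue

def worker(task_q, result_q):
    """Slave: pull a task, 'crunch' it, send the result to the master."""
    while True:
        task = task_q.get()
        if task is None:                  # sentinel: no more work
            break
        rank, data = task
        # stand-in for 20 minutes of real number crunching
        result_q.put((rank, sum(x * x for x in data)))

def master(n_slaves, rounds):
    task_q, result_q = Queue(), Queue()
    slaves = [Process(target=worker, args=(task_q, result_q))
              for _ in range(n_slaves)]
    for p in slaves:
        p.start()
    collected = []
    for _ in range(rounds):
        for rank in range(n_slaves):      # hand out one task per slave
            task_q.put((rank, list(range(rank + 1))))
        # collect from all slaves; one laggard delays the entire round
        collected.append(sorted(result_q.get() for _ in range(n_slaves)))
        # ...write results to disk, then send the next set of computations...
    for _ in slaves:
        task_q.put(None)                  # tell each slave to exit
    for p in slaves:
        p.join()
    return collected

if __name__ == "__main__":
    print(master(n_slaves=3, rounds=2))
```

The stall Steve describes corresponds to the `result_q.get()` loop: the master (and any already-finished slaves spinning in an MPI busy-wait) cannot proceed until the last slave of the round reports in.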
node16:kargl[215] uname -a
FreeBSD node16.cimu.org 9.0-CURRENT FreeBSD 9.0-CURRENT #2 r223824M:
Thu Jul 7 11:12:15 PDT 2011
node16:kargl[216] sysctl -a | grep smp.cpu
kern.smp.cpus: 4
The 4BSD kernel gives, for N = Ncpu:
33 processes: 5 running, 28 sleeping
PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
1387 kargl 1 67 0 370M 293M CPU1 1 1:31 98.34% sasmp
1384 kargl 1 67 0 370M 293M CPU2 2 1:31 98.34% sasmp
1386 kargl 1 67 0 370M 294M CPU3 3 1:30 98.34% sasmp
1385 kargl 1 67 0 370M 294M RUN 0 1:31 98.29% sasmp
The 4BSD kernel gives, for N = Ncpu + 1:
34 processes: 6 running, 28 sleeping
PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
1417 kargl 1 71 0 370M 294M RUN 0 1:30 79.39% sasmp
1416 kargl 1 71 0 370M 294M RUN 0 1:30 79.20% sasmp
1418 kargl 1 71 0 370M 294M CPU2 0 1:29 78.81% sasmp
1420 kargl 1 71 0 370M 294M CPU1 2 1:30 78.27% sasmp
1419 kargl 1 70 0 370M 294M CPU3 0 1:30 77.59% sasmp
I then recompiled the kernel to use ULE instead of 4BSD, with the exact
same hardware and kernel configuration.
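(For anyone wanting to reproduce this: the scheduler is selected at kernel
build time, so the switch is a one-line change in the kernel config file,
e.g.

```
options SCHED_ULE       # ULE scheduler
# instead of
options SCHED_4BSD      # 4BSD scheduler
```

followed by the usual buildkernel/installkernel cycle.)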
The ULE kernel gives, for N = Ncpu:
33 processes: 5 running, 28 sleeping
PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
1294 kargl 1 103 0 370M 294M CPU3 3 1:30 100.00% sasmp
1292 kargl 1 103 0 370M 294M RUN 2 1:30 100.00% sasmp
1295 kargl 1 103 0 370M 293M CPU0 0 1:30 100.00% sasmp
1293 kargl 1 103 0 370M 294M CPU1 1 1:28 100.00% sasmp
The ULE kernel gives, for N = Ncpu + 1:
34 processes: 6 running, 28 sleeping
PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
1318 kargl 1 103 0 370M 294M CPU0 0 1:31 100.00% sasmp
1319 kargl 1 103 0 370M 294M RUN 1 1:29 100.00% sasmp
1322 kargl 1 99 0 370M 294M CPU2 2 1:03 87.26% sasmp
1320 kargl 1 91 0 370M 294M RUN 3 1:07 60.79% sasmp
1321 kargl 1 89 0 370M 294M CPU3 3 1:06 55.18% sasmp
node16:root[165] sysctl -w kern.sched.steal_thresh=1
kern.sched.steal_thresh: 2 -> 1
34 processes: 6 running, 28 sleeping
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
1396 kargl 1 103 0 366M 291M CPU3 3 1:30 100.00% sasmp
1397 kargl 1 103 0 366M 291M CPU2 2 1:30 99.17% sasmp
1400 kargl 1 97 0 366M 291M CPU0 0 1:05 83.25% sasmp
1399 kargl 1 94 0 366M 291M RUN 1 1:04 73.97% sasmp
1398 kargl 1 98 0 366M 291M RUN 0 1:01 54.05% sasmp
--
Steve