SCHED_ULE should not be the default
ohartman at zedat.fu-berlin.de
Tue Dec 13 00:04:06 UTC 2011
On 12/12/11 18:06, Steve Kargl wrote:
> On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
>> On 12/12/2011 15:51, Steve Kargl wrote:
>>> This comes up every 9 months or so, and must be approaching FAQ
>>> status. In an HPC environment, I recommend 4BSD. Depending on the
>>> workload, ULE can cause a severe increase in turnaround time when
>>> doing already long computations. If you have an MPI application,
>>> simply launching greater than ncpu+1 jobs can show the problem. PS:
>>> search the list archives for "kargl and ULE".
>> This isn't something that can be fixed by tuning ULE? For example for
>> desktop applications kern.sched.preempt_thresh should be set to 224 from
>> its default. I'm wondering if the installer should ask people what the
>> typical use will be, and tune the scheduler appropriately.
Is the tuning of kern.sched.preempt_thresh, along with a proper method of
estimating the correct value for the intended workload, documented
in the manpages, maybe tuning(7)?
I find it hard to dig through all the pros and cons scattered across
mailing lists to work out a correct value for this seemingly important
tunable.
> Tuning kern.sched.preempt_thresh did not seem to help for
> my workload. My code is a classic master-slave OpenMPI
> application where the master runs on one node and all
> cpu-bound slaves are sent to a second node. If I send
> ncpu+1 jobs to the 2nd node with ncpu cpus, then
> ncpu-1 jobs are assigned to the 1st ncpu-1 cpus. The
> last two jobs are assigned to the ncpu'th cpu, and
> these ping-pong on this cpu. AFAICT, it is a cpu
> affinity issue, where ULE is trying to keep each job
> associated with its initially assigned cpu.
> While one might suggest that starting ncpu+1 jobs
> is not prudent, my example is just that. It is an
> example showing that ULE has performance issues.
> So, I now can start only ncpu jobs on each node
> in the cluster and send emails to all other users
> to not use those nodes, or use 4BSD and not worry
> about loading issues.
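
To put rough numbers on the ncpu+1 case described above, here is a toy
model in Python. It is only a sketch and not how ULE or 4BSD actually
schedule: it assumes all jobs are equal, purely cpu-bound, and never
migrate once placed (the "pinned" case), and compares that against an
idealized scheduler that spreads ncpu+1 jobs evenly over all cpus. The
function names and the unit "work" value are made up for illustration.

```python
from fractions import Fraction


def pinned_makespan(ncpu, work=Fraction(1)):
    """Time until the last job finishes when jobs never migrate.

    Placement follows the description above: one job on each of the
    first ncpu-1 cpus and two jobs sharing the last cpu.
    """
    jobs_per_cpu = [1] * (ncpu - 1) + [2]
    # k equal cpu-bound jobs time-sharing one cpu all finish after k * work.
    return max(k * work for k in jobs_per_cpu)


def balanced_makespan(ncpu, work=Fraction(1)):
    """Time until the last job finishes under idealized even sharing.

    Assumption: ncpu+1 jobs share ncpu cpus perfectly, so every job
    runs at ncpu/(ncpu+1) of full speed.
    """
    return work * Fraction(ncpu + 1, ncpu)


if __name__ == "__main__":
    for ncpu in (2, 4, 8):
        p = pinned_makespan(ncpu)
        b = balanced_makespan(ncpu)
        print(f"ncpu={ncpu}: pinned={float(p):.3f} "
              f"balanced={float(b):.3f} ratio={float(p / b):.2f}")
```

Under these assumptions the pinned placement always takes twice the
single-job run time, while the even split approaches the single-job
run time as ncpu grows. For a master that has to wait for every slave,
that gap is the increase in turnaround time.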