[Bug 221029] AMD Ryzen: strange compilation failures using poudriere or plain buildkernel/buildworld

bugzilla-noreply at freebsd.org
Sun Aug 20 07:19:30 UTC 2017


--- Comment #80 from Don Lewis <truckman at FreeBSD.org> ---
The events controlled by kern.sched.balance are timer based and are probably
much less frequent than occurrences of a CPU running out of work to do and
triggering a steal_idle event, unless the machine has a load average high
enough that every CPU has a long queue of runnable processes.  Since I don't
really care about scheduling fairness in these experiments, I'm planning on
leaving kern.sched.balance=0.  I think that kern.sched.steal_idle is much more
interesting.  To make the experiments as meaningful as possible, I'd like to
keep the overall CPU idle time as low as possible.  If there is a lot of CPU
idle time with kern.sched.steal_idle=0, then the reduced memory bandwidth
demand and lower die temperature can confuse the results.

Based on my previous experiment, I suspect that steal_idle events ignore the
CPU affinity time setting (kern.sched.affinity), which sort of makes sense.

In my latest experiment, I set kern.sched.affinity=10 and set both steal_idle
and balance back to 0.  This time ghc built successfully, which I attribute to
a lucky roll of the dice.  During the lang/ghc build, ghc (either the bootstrap
or the newly built version) gets executed a number of times.  How many times it
succeeds before eventually getting a SIGBUS seems random.  What is somewhat
surprising is that once built, ghc was able to successfully build a large
number of hs_* ports without any failures.  This time chromium also avoided
the rename issue and built successfully.  The only failure was lang/go, which
has also failed frequently on my Ryzen machine.  I think the lang/ghc and
lang/go issues are distinct from the other random build problems.

It would be interesting to dig into sched_ule and tweak steal_idle to
restrict the CPUs that it can steal tasks from.  One experiment would be to
only allow it to steal from the other thread on the same CPU core.  Another
would be to only allow it to steal from a core in the same CCX.
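To make the two proposed restrictions concrete, here is a toy model in Python, not a patch to sched_ule. Everything in it is an assumption for illustration: the topology (an 8-core/16-thread Ryzen with two CCXs of four cores each, SMT siblings numbered 2k and 2k+1), the hypothetical names StealScope, pick_victim and min_load, and the rule that a victim must keep at least one runnable thread for itself. It does not reproduce how ULE actually walks its cpu_group hierarchy; it only shows the filtering step each experiment would add.

```python
# Toy model (hypothetical, NOT sched_ule code) of restricting which CPUs an
# idle CPU may steal work from. Assumed topology: 8 cores / 16 threads,
# SMT siblings numbered 2k and 2k+1, two CCXs of four cores each.
from enum import Enum

THREADS_PER_CORE = 2
CORES_PER_CCX = 4


class StealScope(Enum):
    ANY = "any"         # unrestricted: any other CPU may be a victim
    SAME_CORE = "core"  # only the SMT sibling on the same physical core
    SAME_CCX = "ccx"    # only CPUs in the same core complex


def core_of(cpu):
    return cpu // THREADS_PER_CORE


def ccx_of(cpu):
    return core_of(cpu) // CORES_PER_CCX


def allowed(idle_cpu, victim, scope):
    """Return True if idle_cpu may steal from victim under the given scope."""
    if victim == idle_cpu:
        return False
    if scope is StealScope.SAME_CORE:
        return core_of(victim) == core_of(idle_cpu)
    if scope is StealScope.SAME_CCX:
        return ccx_of(victim) == ccx_of(idle_cpu)
    return True


def pick_victim(idle_cpu, runq_len, scope, min_load=2):
    """Return the most loaded permitted CPU, or None if none qualifies.

    runq_len[cpu] is that CPU's number of runnable threads; a victim needs
    at least min_load of them so it keeps one after losing one (assumption).
    Ties go to the lowest-numbered CPU.
    """
    best = None
    for cpu, load in enumerate(runq_len):
        if load < min_load or not allowed(idle_cpu, cpu, scope):
            continue
        if best is None or load > runq_len[best]:
            best = cpu
    return best


if __name__ == "__main__":
    loads = [0] * 16
    loads[1], loads[2], loads[12] = 3, 5, 6
    for scope in StealScope:
        print(scope.name, pick_victim(0, loads, scope))
```

With idle CPU 0 and the loads in the demo, the unrestricted search picks CPU 12 on the other CCX, SAME_CCX picks CPU 2, and SAME_CORE picks the SMT sibling, CPU 1. When the permitted set has nothing to give, pick_victim returns None and the CPU stays idle, which is the cost to weigh against the goal of keeping overall idle time low.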
