[Bug 221029] AMD Ryzen: strange compilation failures using poudriere or plain buildkernel/buildworld

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Mon Aug 21 20:25:07 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029

--- Comment #87 from Don Lewis <truckman at FreeBSD.org> ---
I set affinity back to its default value of 1 and got another clean 1700 port
poudriere run.  It's curious that the only issues I've had when steal_idle=0
and balance=0 happened when I set affinity=1000.  This is the opposite of what
I would expect.

I would expect migrations controlled by the steal_idle and balance knobs
to have similar issues.  In either case, the thread that is getting migrated is
one that was preempted by an interrupt, and before being resumed, the scheduler
noticed that the thread had exhausted its run time quantum and moved the thread
to the back of the run queue for that cpu before resuming the thread that is at
the front of the run queue.  The only difference between steal_idle and balance
is the event that actually causes the thread to migrate.  When the migrated
threads restart, they basically just execute the kernel code to restore their
state before dropping back into user mode where they were preempted from.  For
some reason, threads that have exhausted their time quantum seem to resume
properly on the same CPU that they were previously running on, but sometimes go
wonky if they resume on some other CPU.

The migrations controlled by the affinity knob are different.  In those cases,
the thread has voluntarily put itself to sleep, either because it blocked in a
syscall, or perhaps because it trapped on a page fault and then went to sleep
in the kernel while the missing page was brought in.  When these threads get a
wakeup event,
they then execute the remaining part of the syscall or the page fault handler
before returning to user mode.  It doesn't seem to matter what CPU these
threads restart on.

As a test, I set balance=1 and reduced balance_interval from its default 127 to
10 so that balance events would happen a lot more frequently to try to make up
for the steal_idle being disabled.  I had three port build failures.  The first
was a guile segfault when building finance/gnucash.  The second was a unit test
failure in editors/openoffice-devel.  The third was a build runaway in
devel/doxygen.

The steal_idle code in sched_ule is topology-aware, so it looks like it should
be easy to hack the code to only allow migrations between SMT threads sharing
the same core, or cores in the same CCX.


