[Bug 221029] AMD Ryzen: strange compilation failures using poudriere or plain buildkernel/buildworld

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Wed Aug 23 01:19:43 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029

--- Comment #92 from Don Lewis <truckman at FreeBSD.org> ---
I'm building the same set of ports as I do on my FX-8320E machine, so I have a
reasonable idea of what to expect in terms of package build fallout.  A
certain number of logged core dump messages is fairly normal.

I've never seen the "Failed to fully fault in a core file segment" message.

The motherboard that I'm using has an igb interface.  The interrupt storm
messages are likely to be specific to that chip, driver, or motherboard.

I've set kern.sched.balance=0 for my testing since I suspect it could have
issues similar to those of kern.sched.steal_idle and I want to eliminate that
source of noise.

On Ryzen, at the CPU topology level that includes only the SMT threads
belonging to one core, the steal_idle code will steal a thread from the other
SMT thread only if the load on that other SMT thread exceeds steal_thresh
(default 2).  At the other CPU hierarchy levels, the threshold for stealing a
thread is hardwired to 1.  Because those wider levels also contain the sibling
SMT thread, a thread can sometimes be stolen from the other SMT thread on the
same core even though that was not allowed on the previous iteration.  Since my
last experiment (with steal_thresh=1) exhibited a lot of random failures, I
hacked the code to set the threshold at the other hierarchy levels to 2.  I
also hacked the code to only steal threads from cores in the same CCX.  This
experiment produced only two build failures.  One was the usual ghc SIGBUS, and
the other appears to have been a SIGSEGV in lang/go14.  The latter was a bit
different from the usual go build failures: all of the ones that I have
previously looked at appear to have been caused by corruption of the internal
malloc state.

fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0x0 pc=0x49890c]

runtime stack:
runtime.gothrow(0x6fd3f0, 0x2a)
        /usr/local/go14/src/runtime/panic.go:503 +0x8e
runtime.sigpanic()
        /usr/local/go14/src/runtime/sigpanic_unix.go:14 +0x5e
futexsleep()
        /usr/local/go14/src/runtime/os_freebsd.c:72 +0x6c
runtime.onM(0xc208349f50)
        /usr/local/go14/src/runtime/asm_amd64.s:273 +0x9a
runtime.futexsleep(0xc208116ed8, 0xc200000000, 0xffffffffffffffff)
        /usr/local/go14/src/runtime/os_freebsd.c:58 +0x73
runtime.notesleep(0xc208116ed8)
        /usr/local/go14/src/runtime/lock_futex.go:145 +0xae
stopm()
        /usr/local/go14/src/runtime/proc.c:1178 +0x119
exitsyscall0(0xc2082197a0)
        /usr/local/go14/src/runtime/proc.c:2020 +0xd8
runtime.mcall(0x49b4c4)
        /usr/local/go14/src/runtime/asm_amd64.s:186 +0x5a
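
For readers following the steal_idle discussion above, here is a minimal
stand-alone sketch of the per-level threshold logic.  It is a toy model, not
the sched_ule code: the names pick_victim, in_group, and the LVL_* constants
are hypothetical, the CPU numbering (SMT siblings adjacent, 8 hardware threads
per CCX) is assumed, and a CPU is treated as a steal candidate when its load
is at least the threshold, which is also an assumption.

```c
#include <assert.h>

#define NCPU         16   /* assumption: 8 cores x 2 SMT threads */
#define CPUS_PER_CCX 8    /* assumption: 4 cores x 2 threads per CCX */

/* Hypothetical topology levels, searched from narrowest to widest. */
enum level { LVL_SMT, LVL_CCX, LVL_PKG };

/* Return 1 if 'other' belongs to the same group as 'self' at level 'lvl'. */
static int
in_group(int self, int other, enum level lvl)
{
	switch (lvl) {
	case LVL_SMT:
		return (self / 2 == other / 2);
	case LVL_CCX:
		return (self / CPUS_PER_CCX == other / CPUS_PER_CCX);
	default:
		return (1);	/* whole package */
	}
}

/*
 * Pick a CPU to steal from, or return -1 if none qualifies.
 * steal_thresh applies at the SMT level; upper_thresh applies at every
 * wider level (1 models the hardwired value, 2 models the hack).
 * max_level limits how far the search widens (LVL_CCX models the
 * "only steal within the same CCX" hack).
 */
static int
pick_victim(int self, const int load[NCPU], int steal_thresh,
    int upper_thresh, enum level max_level)
{
	assert(self >= 0 && self < NCPU);

	for (int lvl = LVL_SMT; lvl <= (int)max_level; lvl++) {
		int thresh = (lvl == LVL_SMT) ? steal_thresh : upper_thresh;
		int best = -1;

		for (int cpu = 0; cpu < NCPU; cpu++) {
			if (cpu == self || !in_group(self, cpu, (enum level)lvl))
				continue;
			if (load[cpu] < thresh)
				continue;	/* not loaded enough to steal from */
			if (best == -1 || load[cpu] > load[best])
				best = cpu;	/* prefer the most loaded CPU */
		}
		if (best != -1)
			return (best);
	}
	return (-1);
}
```

In this model, with CPU 1 (the SMT sibling of CPU 0) at load 1 and everything
else idle, pick_victim(0, load, 2, 1, LVL_PKG) skips the sibling at the SMT
level but then selects it at the CCX level, because the wider group still
contains it and the threshold there is 1.  With upper_thresh set to 2 the same
call returns -1.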


More information about the freebsd-bugs mailing list