[Bug 219399] System panics after several hours of 14-threads-compilation orgies using poudriere on AMD Ryzen...

Fri Jul 28 19:15:51 UTC 2017

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399

--- Comment #184 from Mark Millard <markmi at dsl-only.net> ---
(In reply to Nils Beyer from comment #183)

For:

#3  0xffffffff80a6b9e3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80a4cf71 in _mtx_lock_spin_cookie (c=<value optimized out>, v=<value optimized out>, tid=18446735290631902560, opts=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_mutex.c:672

I looked up the panic call at:

/usr/src/sys/kern/kern_mutex.c:672

It is the one in:

static void
_mtx_lock_spin_failed(struct mtx *m)
{
        struct thread *td;

        td = mtx_owner(m);

        /* If the mutex is unlocked, try again. */
        if (td == NULL)
                return;

        printf( "spin lock %p (%s) held by %p (tid %d) too long\n",
            m, m->lock_object.lo_name, td, td->td_tid);
#ifdef WITNESS
        witness_display_spinlock(&m->lock_object, td, printf);
#endif
        panic("spin lock held too long");
}

So this specific panic depends on how long the lock
is held: it fires only if the mutex still has an
owner when _mtx_lock_spin_failed checks it.

In _mtx_lock_spin_cookie there is:

        for (;;) {
                if (v == MTX_UNOWNED) {
                        if (_mtx_obtain_lock_fetch(m, &v, tid))
                                break;
                        continue;
                }
                /* Give interrupts a chance while we spin. */
                spinlock_exit();
                do {
                        if (lda.spin_cnt < 10000000) {
                                lock_delay(&lda);
                        } else {
                                lda.spin_cnt++;
                                if (lda.spin_cnt < 60000000 || kdb_active ||
                                    panicstr != NULL)
                                        DELAY(1);
                                else
                                        _mtx_lock_spin_failed(m);
                                cpu_spinwait();
                        }
                        v = MTX_READ_VALUE(m);
                } while (v != MTX_UNOWNED);
                spinlock_enter();
        }

So apparently lda.spin_cnt reached 60000000 or
beyond while the mutex was still owned.
