[Bug 237544] graphics/drm-fbsd12.0-kmod: panic on 12-STABLE with Radeon HD 7450 (but not with drm-fbsd11.2-kmod)

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 01 Jan 2022 20:00:52 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237544

--- Comment #14 from Bill Paul <noisetube@gmail.com> ---
I think I figured it out.

The problem seems to be in dma_fence_signal_locked_sub():

static inline void
dma_fence_signal_locked_sub(struct dma_fence *fence)
{
        struct dma_fence_cb *cur;

        while ((cur = list_first_entry_or_null(&fence->cb_list,
                    struct dma_fence_cb, node)) != NULL) {
                list_del_init(&cur->node);
                spin_unlock(fence->lock);
                cur->func(fence, cur);
                spin_lock(fence->lock);
        }
}

This function is shared by dma_fence_signal() and dma_fence_signal_locked().
It looks like the problem is the spin_unlock()/spin_lock() calls used to drop
the fence lock while calling the signal callbacks. The drm-fbsd11.2-kmod code
did not do this, and for that matter it looks like the most recent Linux code
doesn't do it either. As far as I can tell, dropping the lock here is what
causes the race condition. The rest of the code does not expect this to happen
when dma_fence_signal() is called; only dma_fence_signal_locked() should work
this way.

If I patch the drm-fbsd12.0-kmod code to remove the spin_unlock()/spin_lock()
calls, I also don't get any crashes.
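
For anyone reviewing the change, here is a minimal user-space sketch of the
loop with the spin_unlock()/spin_lock() pair removed. It is not the kernel
code: the mock_* types, the singly linked callback list and demo_signal() are
hypothetical stand-ins for struct dma_fence, cb_list and the real callers, and
the "lock" is just a flag. It only illustrates that each callback now runs
with the fence lock still held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel types. */
struct mock_lock {
        int held;                       /* 1 while the "lock" is held */
};

struct mock_fence;

struct mock_cb {
        struct mock_cb *next;           /* stand-in for the cb_list node */
        void (*func)(struct mock_fence *, struct mock_cb *);
};

struct mock_fence {
        struct mock_lock *lock;
        struct mock_cb *cb_list;        /* pending callbacks */
        int calls_with_lock_held;       /* bookkeeping for the demo only */
};

static void
mock_lock_acquire(struct mock_lock *l)
{
        assert(!l->held);               /* flag the double acquire */
        l->held = 1;
}

static void
mock_lock_release(struct mock_lock *l)
{
        assert(l->held);                /* flag the unbalanced release */
        l->held = 0;
}

/*
 * Shape of the patched loop: the caller holds the lock on entry, and the
 * lock is not dropped around the callback.
 */
static void
mock_signal_locked_sub(struct mock_fence *fence)
{
        struct mock_cb *cur;

        while ((cur = fence->cb_list) != NULL) {
                fence->cb_list = cur->next;     /* unlink, like list_del_init() */
                cur->next = NULL;
                cur->func(fence, cur);          /* lock is still held here */
        }
}

/* Example callback: records whether it saw the lock held. */
static void
count_cb(struct mock_fence *fence, struct mock_cb *cb)
{
        (void)cb;
        if (fence->lock->held)
                fence->calls_with_lock_held++;
}

/* Queue two callbacks, signal them, and report how many saw the lock held. */
static int
demo_signal(void)
{
        struct mock_lock lock = { 0 };
        struct mock_cb b = { NULL, count_cb };
        struct mock_cb a = { &b, count_cb };
        struct mock_fence fence = { &lock, &a, 0 };

        mock_lock_acquire(&lock);
        mock_signal_locked_sub(&fence);
        mock_lock_release(&lock);
        return (fence.calls_with_lock_held);
}
```

In this mock, demo_signal() counts both callbacks as having run under the
lock. Putting a release/acquire pair back around cur->func() would drop that
count to zero, which is the window the rest of the code does not expect.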

I created a new tarball with a single patch that has just this fix:

http://people.freebsd.org/~wpaul/radeon/drm-fbsd12.0-kmod.tar.gz

I've been running with this patch for the last day or so and haven't had any
panics. I would appreciate it if anyone else who has been experiencing this
same crash (i.e. similar to the panics in this PR) could test this patch and
see if it fixes the problem for you.

It would also be nice if someone could review the code and confirm whether my
findings make sense.

Oh, one last thing: from a cursory inspection of the FreeBSD 13 code, I don't
see this same problem, so if you claim that you're experiencing "the same
crash" with FreeBSD 13 or later, please back up your claim by showing me the
panic stack trace. If it doesn't match the examples in this PR, then your
problem may be something entirely different. I'm sorry if your system is also
unstable, but it's important to be sure, because I don't want to waste a lot of
time on something that turns out to be unrelated.
