[Bug 237544] graphics/drm-fbsd12.0-kmod: panic on 12-STABLE with Radeon HD 7450 (but not with drm-fbsd11.2-kmod)
Date: Sat, 01 Jan 2022 20:00:52 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237544

--- Comment #14 from Bill Paul <noisetube@gmail.com> ---

I think I figured it out. The problem seems to be in dma_fence_signal_locked_sub():

    static inline void
    dma_fence_signal_locked_sub(struct dma_fence *fence)
    {
            struct dma_fence_cb *cur;

            while ((cur = list_first_entry_or_null(&fence->cb_list,
                struct dma_fence_cb, node)) != NULL) {
                    list_del_init(&cur->node);
                    spin_unlock(fence->lock);
                    cur->func(fence, cur);
                    spin_lock(fence->lock);
            }
    }

This function is shared by dma_fence_signal() and dma_fence_signal_unlocked(). It looks like the problem is the spin_unlock()/spin_lock() calls used to drop the fence lock while calling the signal callbacks. The drm-fbsd11.2-kmod code did not do this, and for that matter it looks like the most recent Linux code doesn't do it either.

As far as I can tell, dropping this lock here is what causes the race condition. The rest of the code is not expecting this to happen when dma_fence_signal() is called: it's only dma_fence_signal_locked() that should work this way. If I patch the drm-fbsd12.0-kmod code to remove the spin_unlock()/spin_lock() calls, I also don't get any crashes.

I created a new tarball with a single patch that has just this fix:

http://people.freebsd.org/~wpaul/radeon/drm-fbsd12.0-kmod.tar.gz

I've been running with this patch for the last day or so and haven't had any panics. I would appreciate it if anyone else who has been experiencing this same crash (i.e. similar to the panics in this PR) could test this patch and see if it fixes it for you. It would also be nice if someone could review the code and confirm whether my findings make sense.

Oh, one last thing: from a cursory inspection of the FreeBSD 13 code, I don't see this same problem, so if you claim that you're experiencing "the same crash" with FreeBSD 13 or later, please back up your claim by showing me the panic stack trace.
If it doesn't match the examples in this PR, then your problem may be something entirely different. I'm sorry if your system is also unstable, but it's important to be sure, because I don't want to waste a lot of time on something that turns out to be unrelated.
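
For anyone reviewing the finding, the difference between the two loop variants can be modeled in user space. What follows is only a minimal sketch, not the patch from the tarball and not the drm-kmod code: every name prefixed with model_ is hypothetical, the lock is a plain flag rather than a real spinlock, and the callback list is a simple singly linked list. It does not reproduce the race itself. It only shows that with the unlock/relock pair every callback runs in a window where the fence lock is not held, and that without the pair no such window exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; NOT the drm-kmod definitions. */
struct model_fence;
struct model_cb;
typedef void (*model_cb_fn)(struct model_fence *, struct model_cb *);

struct model_cb {
	struct model_cb *next;    /* the kernel code uses a list_head here */
	model_cb_fn func;
};

struct model_fence {
	bool lock_held;           /* models fence->lock for a single thread */
	struct model_cb *cb_list;
	int ran_locked;           /* callbacks that saw the lock held */
	int ran_unlocked;         /* callbacks that saw the lock dropped */
};

static void model_lock(struct model_fence *f)
{
	assert(!f->lock_held);
	f->lock_held = true;
}

static void model_unlock(struct model_fence *f)
{
	assert(f->lock_held);
	f->lock_held = false;
}

/* Callback that records whether the lock was held when it ran. */
static void count_cb(struct model_fence *f, struct model_cb *cb)
{
	(void)cb;
	if (f->lock_held)
		f->ran_locked++;
	else
		f->ran_unlocked++;
}

static void model_add_cb(struct model_fence *f, struct model_cb *cb,
    model_cb_fn fn)
{
	cb->func = fn;
	cb->next = f->cb_list;
	f->cb_list = cb;
}

/*
 * Caller must hold the lock. drop_lock = true models the loop quoted in
 * the comment above; drop_lock = false models the loop with the
 * unlock/relock pair removed.
 */
static void model_signal_locked_sub(struct model_fence *f, bool drop_lock)
{
	struct model_cb *cur;

	while ((cur = f->cb_list) != NULL) {
		f->cb_list = cur->next;   /* unlink the entry before calling it */
		cur->next = NULL;
		if (drop_lock)
			model_unlock(f);
		cur->func(f, cur);
		if (drop_lock)
			model_lock(f);
	}
}

/* Signals three callbacks; returns how many ran with the lock dropped. */
int model_run(bool drop_lock)
{
	struct model_fence f = { false, NULL, 0, 0 };
	struct model_cb cbs[3];

	for (int i = 0; i < 3; i++)
		model_add_cb(&f, &cbs[i], count_cb);

	model_lock(&f);
	model_signal_locked_sub(&f, drop_lock);
	model_unlock(&f);

	assert(f.ran_locked + f.ran_unlocked == 3);
	return f.ran_unlocked;
}
```

In this model, model_run(true) returns 3 and model_run(false) returns 0. In the real driver, any other thread that takes the fence lock during one of those unlocked windows can modify the fence or its callback list, which is consistent with the race described above.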