[Bug 237544] graphics/drm-fbsd12.0-kmod: panic on 12-STABLE with Radeon HD 7450 (but not with drm-fbsd11.2-kmod)

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 31 Dec 2021 22:01:57 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237544

--- Comment #13 from Bill Paul <noisetube@gmail.com> ---
> But radeonkms still panics. As you said it's really old hardware at this point[...]

I really don't think it's fair to call it really old hardware. It may not be
current hardware, but it's still perfectly serviceable, and I suspect it would
work fine with the same driver code in Linux. I think the real problem here is
a bug in the Linux compatibility code, and the fact that it only happens to
trip with Radeon hardware is just dumb luck.

To be fair, the reason I have this hardware in the first place is that I've
been salvaging it from the e-waste bin at work. Bear in mind, sometimes stuff
just ends up in there because it was used for one project and then forgotten.
(I have a 32-core, 32GB system at work because of this.) As a result, I have
several Radeon graphics cards, and recently I also ended up with a laptop with
built-in R600/SUMO graphics. The laptop works great with FreeBSD 12.3
on it, except for the stupid panic in the radeonkms driver. Now that I seem to
have worked around that too, I have no complaints.

Anyway, if I understand you correctly, it sounds like the problem is still
present in FreeBSD 14-CURRENT. From what I can tell, the major difference between
the older drm-fbsd11.2-kmod code and the later code is that the dmafence code
was updated to include support for some new APIs in Linux. In particular, there
is a dma_fence_get_status() function which wasn't there before, and support for
tracking timestamps. There also seems to be a dma_fence_get_rcu_locked()
routine which wasn't there before (there was just dma_fence_get_rcu()). I
suspect that the implementation of these routines is not quite correct, but
only the Radeon driver seems to use them in a way which causes them to fail.
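When checking whether the linuxkpi version of dma_fence_get_status() behaves correctly, it may help to compare it against the return-value contract of the Linux function: 0 while the fence is still pending, 1 once it has signaled without error, and a negative errno if it signaled with an error. The following is a minimal userspace sketch of that contract only. struct model_fence and the model_* helpers are hypothetical stand-ins, not linuxkpi or Linux code; the real routine also takes the fence's spinlock and may poll the driver's signaled() callback before answering.

```c
/*
 * Userspace model (NOT kernel code) of the Linux dma_fence_get_status()
 * return-value contract. All names here are hypothetical stand-ins.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_fence {
    bool signaled; /* has the fence completed? */
    int error;     /* 0, or a negative errno recorded before signaling */
};

/* Record an error; like Linux, only allowed before the fence signals. */
static void model_fence_set_error(struct model_fence *f, int error)
{
    assert(!f->signaled && error < 0);
    f->error = error;
}

/* Mark the fence as completed. */
static void model_fence_signal(struct model_fence *f)
{
    f->signaled = true;
}

/*
 * Mirrors the "_locked" variant: the caller is assumed to hold the fence
 * lock. Unlike the real code, this does not poll a driver callback.
 */
static int model_fence_get_status(const struct model_fence *f)
{
    if (!f->signaled)
        return 0;                        /* still pending */
    return f->error != 0 ? f->error : 1; /* stored error, or plain success */
}
```

The three states (pending, signaled cleanly, signaled with an error) are the cases worth exercising against the linuxkpi implementation.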

Note that these routines are used both by the Radeon driver directly and
internally by other modules, e.g. linux_sync_file and drm_syncobj. The use
patterns may also depend on how the user-space drivers that are part of the X
server use these facilities, and I don't know much about that.

> Oh wow seems like you narrowed it down a lot.

I think I've narrowed it down even more. I realized that the only functional
difference in linux_dmafence.c between the 11.2 and 12.0 cases is the addition
of the dma_fence_get_status() function, so I created a smaller patch for this
module that just #ifdefs this function off and leaves everything else alone.
I'm using that code right now, and it seems to be holding up. I updated the
tarball with the new patch:

http://people.freebsd.org/~wpaul/radeon/drm-fbsd12.0-kmod.tar.gz

This means that right now, the only major difference is that I'm using the
older version of dma-fence.h from the drm-fbsd11.2-kmod code, with only one
minor compatibility fixup in linux_sync_file.c.

Unfortunately I can't easily analyze the 14-CURRENT code right now because I
don't have a machine with it installed. I might be able to fix that once I get
back to the office next week. One thing I've noticed is that the linuxkpi
directory in the drm-fbsd-kmod package gets smaller for more recent versions of
FreeBSD. I guess this is because the GPL'ed modules in the drm driver are
gradually being rewritten and migrated into the FreeBSD kernel sources proper.
I don't know if the dma-fence code is part of that. It seems that at least for
drm-fbsd13.0-kmod the dma-fence code is still in the driver package, but I
haven't checked for -CURRENT yet.

If the dma-fence code is still in the driver package, then it may still have
the same bug, but I can't easily kludge up a patch for it just yet.

> When I'm done shuffling hardware I'm gonna get my radeonkms test computer back
> on my desk and leave it running and gather more infos on crashes if that helps[...]

It would be interesting to see if the stack traces are similar. I would expect
to see the same drm_ioctl()->drm_ioctl_kernel()->radeon_cs() path as that seems
to be the most common.

What would really help is if whoever put together the dma-fence support in the
linuxkpi module would step forward, review it a bit, and maybe offer some
guidance. I'm only vaguely familiar with what this facility is even for; it
would be nice if someone with more insight would comment.

Also, my other question remains: should I open a new PR for this?
