From: bugzilla-noreply@freebsd.org
To: x11@FreeBSD.org
Subject: [Bug 237544] graphics/drm-fbsd12.0-kmod: panic on 12-STABLE with Radeon HD 7450 (but not with drm-fbsd11.2-kmod)
Date: Thu, 30 Dec 2021 20:55:13 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 12.0-STABLE
X-Bugzilla-Keywords: crash, needs-qa
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: noisetube@gmail.com
X-Bugzilla-Status: Closed
X-Bugzilla-Resolution: Overcome By Events
X-Bugzilla-Assigned-To: x11@FreeBSD.org
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
List-Archive: https://lists.freebsd.org/archives/freebsd-x11

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237544

--- Comment #11 from Bill Paul ---
So, since I'm off work this week and don't have much else to do, I decided
to try isolating the actual problem here. Now that I have a known working
set of code (drm-fbsd11.2-kmod), I thought I could compare it to the
non-working code (drm-fbsd12.0-kmod) and gradually bisect things to narrow
down the fault.

After much hair-pulling and gnashing of teeth, I finally isolated things
down to the dma-fence module in the linuxkpi code. Here's what I tried:

- Replaced the contents of the drivers/gpu/drm/radeon directory in
  drm-fbsd12.0-kmod with the contents from the radeon directory in
  drm-fbsd11.2-kmod
- Result: no change, panic still occurred

- Replaced the contents of the drivers/gpu/drm/ttm directory in
  drm-fbsd12.0-kmod with the contents of the drm directory in
  drm-fbsd11.2-kmod (as well as the associated header files)
- Result: no change, panic still occurred

- Replaced the contents of the linuxkpi and drivers/gpu/drm/ttm
  directories in drm-fbsd12.0-kmod with the contents of linuxkpi and ttm
  directories from drm-fbsd11.2-kmod (as well as the associated header
  files)
- Result: No panic

- Replaced _just_ the contents of the linuxkpi directory in
  drm-fbsd12.0-kmod with the contents of the linuxkpi directory in
  drm-fbsd11.2-kmod (this time taking care to preserve the ttm module;
  they are somewhat tightly coupled so this took a bit more effort)
- Result: No panic

- Replaced _just_ the dma-fence.h and linux_dmafence.c modules in the
  linuxkpi directory in drm-fbsd12.0-kmod with the ones from
  drm-fbsd11.2-kmod, and also tweaked linux_sync_file.c a little (it uses
  an API from the 12.0 code which isn't in the 11.2 code)
- Result: No panic

I'm still not exactly sure what's wrong here, but there seems to be a
problem in the dma-fence module with locking and/or reference counting
that causes fence structures to be deleted unexpectedly. This is what
leads to the traps on bad pointers.

I created a custom tarball of the drm-fbsd12.0-kmod port which includes
patches to the FreeBSDDesktop 4.16 code to revert the dma-fence code as
described above. You can download it from here:

http://people.freebsd.org/~wpaul/radeon/drm-fbsd12.0-kmod.tar.gz

The specific things I did are:

1) Replaced dma-fence.h and linux_dmafence.c in the drm-fbsd12.0-kmod port
   with the versions from drm-fbsd11.2-kmod.

2) Added a compat wrapper function in dma-fence.h for
   dma_fence_get_rcu_safe() which just calls dma_fence_get_rcu().

3) Added a compat macro in dma-fence.h for dma_fence_is_signaled_locked()
   which just calls dma_fence_is_signaled().

4) In linux_sync_file.c, changed the sync_fill_fence_info() function back
   to how it looked in the 11.2 codebase, because it uses
   dma_fence_get_status() and DMA_FENCE_FLAG_TIMESTAMP_BIT, which were not
   available in the older 11.2 dma-fence code.

Just unpack the tarball under /usr/ports/graphics in place of the old one
and then run make, followed by "make deinstall" and "make reinstall".

It occurred to me that instead of taking the older 11.2 dma-fence module
and porting it forward, it might make more sense to take the 13.0 module
and port it back. But this assumes that the drm-fbsd13.0-kmod code doesn't
have the same stability problem in it as drm-fbsd12.0-kmod, and I don't
know if that's true. (So far nobody has said whether or not they're using
a Radeon card with 13.0 and whether or not they've encountered the same
problems.) I may still try this anyway if I'm still sufficiently bored.

So far I've tested this on two devices:

vgapci0@pci0:1:0:0: class=0x030000 card=0x21261028 chip=0x68f91002 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Cedar [Radeon HD 5000/6000/7350/8350 Series]'
    class      = display
    subclass   = VGA

vgapci0@pci0:0:1:0: class=0x030000 card=0x168b103c chip=0x96481002 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Sumo [Radeon HD 6480G]'
    class      = display
    subclass   = VGA

I'm using the machine with the CEDAR device right now. The laptop with the
SUMO device is much more prone to crashing. Usually what I do to provoke
it is:

- Boot and load the driver
- Plug in my phone and set up tethering over USB
- Start KDE5
- Start Firefox
- Browse Facebook or Reddit for a while

It usually panics within a few minutes.

Lastly, I have a question: I followed up to this particular PR because it
seemed to most closely match the problems I was having, but it's been
closed. Should I open a new PR? This bug is still present with 12.3 and
I'm clearly not the only one affected by it. (I also still can't explain
why it doesn't seem to affect the i915kms driver.)

-- 
You are receiving this mail because:
You are the assignee for the bug.
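
For readers who want a picture of the compat shims described in items 2) and 3) above, here is a minimal sketch. It is not the code from the tarball: struct dma_fence here is a hypothetical userspace stand-in with a plain integer reference count and a boolean flag, and the bodies of dma_fence_get_rcu() and dma_fence_is_signaled() are simplified assumptions so that the example compiles on its own. Only the shape of the two shims (a wrapper that forwards to dma_fence_get_rcu(), and a macro that forwards to dma_fence_is_signaled()) is taken from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct dma_fence (assumption). */
struct dma_fence {
	int  refcount;	/* stands in for the embedded reference counter */
	bool signaled;	/* stands in for the "signaled" flag bit */
};

/*
 * Simplified model of the older getter: take a reference unless the
 * fence is missing or its count has already dropped to zero.
 */
static inline struct dma_fence *
dma_fence_get_rcu(struct dma_fence *fence)
{
	if (fence == NULL || fence->refcount == 0)
		return (NULL);
	fence->refcount++;
	return (fence);
}

/* Simplified model of the older, lock-free signaled check. */
static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
	return (fence->signaled);
}

/*
 * Compat wrapper as in item 2): newer callers pass the address of the
 * fence pointer; this version just dereferences it once and forwards.
 */
static inline struct dma_fence *
dma_fence_get_rcu_safe(struct dma_fence **fencep)
{
	return (dma_fence_get_rcu(*fencep));
}

/* Compat macro as in item 3): the "_locked" name maps to the old call. */
#define	dma_fence_is_signaled_locked(fence)	dma_fence_is_signaled(fence)
```

Note that a forwarding wrapper like this gives up any extra checking the newer dma_fence_get_rcu_safe() may do on the pointer it is handed, so it should be read as a porting aid for testing rather than a proposed fix.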