From: bugzilla-noreply@freebsd.org
To: x11@FreeBSD.org
Subject: [Bug 253461] [AMD/ATI] RV730 PRO [Radeon HD 4650] panic kernel
Date: Tue, 04 Jan 2022 22:52:02 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 12.2-RELEASE
X-Bugzilla-Keywords: panic
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: noisetube@gmail.com
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: bugs@FreeBSD.org
X-Bugzilla-Changed-Fields: cc
List-Archive: https://lists.freebsd.org/archives/freebsd-x11

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253461

Bill Paul changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |noisetube@gmail.com

--- Comment #3 from Bill Paul ---
I believe I have a fix for this bug. It is a problem with the linuxkpi code
in the FreeBSDDesktop-kms-drm-4.16.g20201016-8843e1fc5_GH0.tar.gz
distribution.

Notes:

- This problem has been there for some time. I've had it happen in FreeBSD
  12.2-RELEASE and FreeBSD 12.3-RELEASE.

- It's not confined to a single Radeon card. I've observed the problem with
  the following hardware on different machines:

vgapci0@pci0:1:0:0: class=0x030000 card=0x21261028 chip=0x68f91002 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Cedar [Radeon HD 5000/6000/7350/8350 Series]'
    class      = display
    subclass   = VGA

vgapci0@pci0:0:1:0: class=0x030000 card=0x168b103c chip=0x96481002 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Sumo [Radeon HD 6480G]'
    class      = display
    subclass   = VGA

vgapci1@pci0:131:0:0: class=0x030000 card=0x90b8103c chip=0x67711002 rev=0x00 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Caicos XTX [Radeon HD 8490 / R5 235X OEM]'
    class      = display
    subclass   = VGA

  (Note that the Sumo device is built into a laptop, an HP ProBook 4535S.)

- This problem has been reported by others. PR 237544 is a duplicate. The
  panics I experienced had the same stack traces as shown in both PRs.

- PR 237544 provides an important hint that this crash did _not_ happen with
  the drm-fbsd11.2-kmod port/package. Although it has been deprecated, I was
  able to build and install the drm-fbsd11.2-kmod code on my FreeBSD
  12.3-RELEASE system (the laptop) and the crashes went away.

- In my case, the panics were more likely to occur when the system was under
  load.
  The laptop seemed to trigger it more frequently (which actually made it
  easier to track it down).

I tried to track the problem down by comparing the drm-fbsd11.2-kmod and
drm-fbsd12.0-kmod code and swapping bits of the 11.2 code into the 12.0 tree
to see what effect that would have. Eventually I traced the problem to the
linuxkpi code, and then to the dma-fence code, and then finally, to this
function in linuxkpi/gplv2/include/linux/dma-fence.h:

static inline void
dma_fence_signal_locked_sub(struct dma_fence *fence)
{
        struct dma_fence_cb *cur;

        while ((cur = list_first_entry_or_null(&fence->cb_list,
            struct dma_fence_cb, node)) != NULL) {
                list_del_init(&cur->node);
                spin_unlock(fence->lock);       /* <-- No! */
                cur->func(fence, cur);
                spin_lock(fence->lock);         /* <-- No! */
        }
}

Note the two lines highlighted above.

The dma_fence_signal_locked_sub() routine is shared by both
dma_fence_signal() and dma_fence_signal_locked(). The latter function is
intended to be used when the caller is already holding the fence spinlock.
The former takes the spinlock itself.

The problem is that the above code causes the spinlock to be dropped in the
case where dma_fence_signal() is called. This is not the same behavior as
the older 11.2 code: in that case, the lock is held while the callouts are
invoked. (I *think* this is also the case in the later code in FreeBSD 13.)

I believe that dropping the lock before calling the callouts opens a race
condition window and this is what leads to the crash. It's difficult to
ascertain from the crash stack traces that this is what's happening, but in
my analysis I found that at least sometimes the problem was that something
was trying to dereference a NULL DMA fence pointer.

I patched my copy of the code to remove the spin_unlock() and spin_lock()
calls shown above, and that seemed to fix the problem. The laptop has not
crashed since I did this.
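
The locking behavior described above can be illustrated with a small
userland sketch. This is not the linuxkpi code and not the patch attached to
the PR: every toy_* name is hypothetical, the "lock" is just a boolean flag,
and the program is single-threaded, so it cannot reproduce the race itself.
It only shows the intended invariant, namely that each callback is invoked
while the fence lock is still held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the linuxkpi types. */
struct toy_fence;

struct toy_cb {
	struct toy_cb *next;
	void (*func)(struct toy_fence *, struct toy_cb *);
};

struct toy_fence {
	bool locked;              /* models "fence->lock is held" */
	struct toy_cb *cb_list;   /* pending callbacks */
	int calls_with_lock;      /* callbacks that saw the lock held */
	int calls_without_lock;   /* callbacks that saw the lock dropped */
};

/* Example callback: records whether the lock was held when it ran. */
void
toy_count_cb(struct toy_fence *f, struct toy_cb *cb)
{
	(void)cb;
	if (f->locked)
		f->calls_with_lock++;
	else
		f->calls_without_lock++;
}

/*
 * Drain the callback list without dropping the lock around each call,
 * mirroring the behavior described for the patched function.
 */
void
toy_signal_locked_sub(struct toy_fence *f)
{
	struct toy_cb *cur;

	assert(f->locked);        /* caller must already hold the lock */
	while ((cur = f->cb_list) != NULL) {
		f->cb_list = cur->next;   /* unlink before invoking */
		cur->next = NULL;
		cur->func(f, cur);        /* lock is still held here */
	}
}

/* Takes the lock itself, like the unlocked entry point in the text. */
void
toy_signal(struct toy_fence *f)
{
	f->locked = true;
	toy_signal_locked_sub(f);
	f->locked = false;
}
```

Queuing two toy_cb entries on a toy_fence and calling toy_signal() should
leave calls_without_lock at zero. Reintroducing an unlock/relock pair around
cur->func() in this sketch would make every callback count as unlocked.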
I also made the same change to the 12.2-RELEASE system with the "Cedar" card
and exercised it a bit, and that one seemed to run ok too. I have just
patched the "Caicos" machine today and so far it's running stable as well
(this is my work machine and this is my first day back at the office for the
new year).

I created a version of the drm-fbsd12.0-kmod port with this change included
as a patch, which can be downloaded from here:

http://people.freebsd.org/~wpaul/radeon/drm-fbsd12.0-kmod.tar.gz

I will also attach the patch to this PR.

Can someone please test this to see if it fixes the problem for them too?

Note: I happen to have about 3 or 4 extra Radeon cards as spares (I rescued
these from the e-waste bin) and would be happy to send one to a developer if
that would help (assuming they have a machine with a slot that can
accommodate it).