From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 253461] LinuxKPI: [AMD/ATI] RV730 PRO [Radeon HD 4650] crashes kernel
Date: Sun, 09 Jan 2022 23:30:52 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253461

--- Comment #9 from Bill Paul ---
(In reply to Vladimir Kondratyev from comment #7)

It looked to me as though Linux originally had the dma_fence_signal() API, and a new API, dma_fence_signal_locked(), was added later. According to what I've read, the idea is that dma_fence_signal_locked() can be used if the caller is already holding the DMA fence object spinlock, while the older dma_fence_signal() function takes the lock for you.

The question here is: when you signal a dma fence object and invoke its attached callout routines, do you hold the spinlock or do you drop it?

The older linuxkpi code in drm-fbsd11.2-kmod was based on Linux 4.11 and only had the dma_fence_signal() API, and that code always held the fence spinlock when invoking the callouts.

In drm-fbsd12.0-kmod, based on Linux 4.16, both dma_fence_signal() and dma_fence_signal_locked() are present. HOWEVER, the logic is now such that both functions drop the dma fence spinlock when calling the callouts.

This changes the behavior of dma_fence_signal(), and I think the change was wrong (though likely unintentional). Now, dma_fence_signal() drops the spinlock when invoking the callouts. This does not seem to harm the Intel i915kms.ko driver, but it seems to cause the radeonkms.ko driver to panic when the system is under load. I must assume that dropping the lock leads to a race condition when two different threads try to access the same dma fence object.
If you browse the most recent Linux kernel code, you can also see that this behavior is inconsistent with the native Linux implementations of dma_fence_signal() and dma_fence_signal_locked():

https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-fence.c#L376

The dma_fence_signal_timestamp_locked() function shown here is used by both dma_fence_signal() and dma_fence_signal_locked(). dma_fence_signal() takes the fence spinlock before calling it. Note that the fence spinlock is _not_ released when invoking the callbacks.

From this I am forced to conclude:

- When calling dma_fence_signal(), the fence spinlock is supposed to be held until the function returns, including when the callbacks are called.

- When calling dma_fence_signal_locked(), the same is true, except it is the caller that's expected to take the fence spinlock.

- The current behavior in drm-fbsd12.0-kmod, where the lock is dropped when invoking the callouts, is therefore wrong on two counts: it deviates from the Linux behavior, and it breaks synchronization in the Radeon driver.

I think my fix preserves the expected behavior of both routines, because dma_fence_signal_unlocked() does not call dma_fence_signal_unlocked_sub() with the spinlock held, while dma_fence_signal() does.

My office machine with the CAICOS chipset has been running with this fix for a week now and has been stable. I've been using the same fix on my laptop with the SUMO chipset for a bit longer, and it hasn't crashed either. Before the fix, the laptop would not last more than 5 minutes before it would panic.

-Bill