Fri, 07 Jul 2023 01:38:12 UTC

--- Comment #48 from Jonathan Vasquez <jon@xyinn.org> ---
Hey all,

So I spent a few hours today debugging this issue on 13.2-RELEASE and I have
interesting stuff to report.


1. There definitely seems to be a race condition somewhere with how either the
AMD Raven HDA Controller is being enumerated, or how it's being accessed.

2. I was able to build on John's idea regarding the delays and come up with
something that seems to stop my system from crashing. I don't think it would be
an acceptable solution, though, since it introduces a delay into every
"hdac_intr_handler()" call for any device that uses that function. I'll keep
testing it locally to see if I notice any new weirdness (beyond the known
issues I experienced before this patch), and also because I don't want my
system to keep crashing. As a side note, I ordered 2 PCIe sound cards to see
whether they are FreeBSD compatible, which would at least let me work around
this issue. Best case scenario, we fix this issue, and I also end up with a
better sounding sound card than the on-board sound :).

3. The severity of the failure depends on the length of the delay.


So this is what the patch looks like that allows my system to no longer crash
on first boot:

diff --git a/sys/dev/sound/pci/hda/hdac.c b/sys/dev/sound/pci/hda/hdac.c
index 9aa0e4bffdc8..e9d581a422cb 100644
--- a/sys/dev/sound/pci/hda/hdac.c
+++ b/sys/dev/sound/pci/hda/hdac.c
@@ -378,6 +378,11 @@ hdac_one_intr(struct hdac_softc *sc, uint32_t intsts)
 static void
 hdac_intr_handler(void *context)
 {
+       /*
+        * Add slight delay to avoid crashes with AMD Raven HDA Controllers
+        */
+       DELAY(5000);
+
        struct hdac_softc *sc;
        uint32_t intsts;


- If there is no DELAY (the default), the system will crash.
- If there is a DELAY of 1000 (microseconds, so 1 ms), the system won't crash,
but we will see access errors, which is revealing:


hdac2: <AMD Raven HDA Controller> mem 0xfc980000-0xfc987fff at device 0.6 on
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000

- If there is a DELAY of 5000, the system won't crash, and we no longer see any
of these errors.

Before I arrived at this reduced patch, and without using any delays, I was
able to get the machine to stop crashing by adding at least 4 printf statements
lol. With 3 printf statements, it would still crash. I suppose 4 printf calls
are roughly equivalent to a DELAY of 5000 on my machine.

As stated before, with the above patch, the machine no longer crashes for me on
a cold boot. I was also able to access and use my pcm8 device immediately and
sound worked. This is progress.
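
Since the DELAY in the patch applies to every controller that goes through
"hdac_intr_handler()", one possible way to limit the impact would be to gate it
on a per-device flag. Below is a minimal userspace sketch of that idea only. It
is not the driver code and has not been tested in a kernel: "fake_softc",
"QUIRK_INTR_DELAY", "fake_delay()" and "fake_intr_handler()" are hypothetical
names made up for illustration, and the delay is stubbed out with a counter
instead of the real DELAY(9).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bit; NOT an existing flag in the hdac driver. */
#define QUIRK_INTR_DELAY 0x1u

/* Stand-in for the driver's softc; only the fields this sketch needs. */
struct fake_softc {
	uint32_t quirks;	/* hypothetical per-device quirk bits */
	unsigned delay_calls;	/* how many times the stub delay ran */
};

/* Stub for the kernel's DELAY(9): records the call instead of spinning. */
static void
fake_delay(struct fake_softc *sc, int usec)
{
	(void)usec;
	sc->delay_calls++;
}

/* Same shape as the patched handler, but delays only flagged devices. */
static void
fake_intr_handler(void *context)
{
	struct fake_softc *sc = context;

	assert(sc != NULL);
	if (sc->quirks & QUIRK_INTR_DELAY)
		fake_delay(sc, 5000);
	/* ... the real handler would read the interrupt status here ... */
}
```

In a real patch the flag would presumably be set at attach time only for the
affected controller, so other devices would skip the delay entirely. This still
only papers over the race; it doesn't explain it.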

I've attached the following files:

- bad.0.txt - Shows the access errors with a delay of 1000 with my previous
expanded debug messages.
- good.0.txt - Shows a good cold boot with a delay of 5000 with my previous
expanded debug messages.
- bad.1.txt - Shows the access errors with a delay of 1000 (minimal logging).

root@weshly:/usr/src # uname -a
FreeBSD weshly 13.2-RELEASE-p1 FreeBSD 13.2-RELEASE-p1 #23
releng/13.2-n254621-08b87f63a046-dirty: Thu Jul  6 21:22:10 EDT 2023    
root@weshly:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

debugging on:

commit 08b87f63a046bd966bd0ed548211ae98ff50e638 (HEAD -> releng/13.2,
Author: Gordon Tetlow <gordon@FreeBSD.org>
Date:   Tue Jun 20 22:40:02 2023 -0700

    Add UPDATING entries and bump version.

    Approved by:    so

