Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

From: Warner Losh <imp_at_bsdimp.com>
Date: Fri, 01 Sep 2023 22:00:03 UTC
I think that the problem is that amdsmn has probed, but not attached (or
failed to attach for some reason), so we find the device, but it's not
initialized yet, so when we call amdsmn_read, it tries to lock a mutex
that's not yet initialized.

Not sure why this is happening, or why loading them as modules fixes it...

But since I don't have the hardware, I can't help more. Sorry.

Warner

On Fri, Sep 1, 2023 at 10:21 AM Gary Jennejohn <garyj@gmx.de> wrote:

> On Fri, 01 Sep 2023 17:14:02 +0200
> "Herbert J. Skuhra" <herbert@gojira.at> wrote:
>
> > On Fri, 01 Sep 2023 16:04:41 +0200, Gary Jennejohn wrote:
> > >
> > > On Fri, 01 Sep 2023 14:15:20 +0200
> > > "Herbert J. Skuhra" <herbert@gojira.at> wrote:
> > >
> > > > On Fri, 01 Sep 2023 13:03:14 +0200, Gary Jennejohn wrote:
> > > > >
> > > > > > I have a laptop with an AMD Ryzen 5 and a tower with an AMD
> > > > > > Ryzen 7 3700X.
> > > > >
> > > > > These are respectively Zen 1 and Zen 2 CPUs.
> > > > >
> > > > > > I built a kernel on both computers using the FreeBSD-15 source
> > > > > > tree.
> > > > >
> > > > > > If I include the amdtemp device in my kernel file BOTH computers
> > > > > > end up with a kernel panic while trying to attach the amdtemp
> > > > > > device.
> > > > >
> > > > > If I remove amdtemp both computers boot without any issues.
> > > > >
> > > > > I suspect that this commit is the cause:
> > > > >
> > > > > commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> > > > > Author: Akio Morita <akio.morita@kek.jp>
> > > > > Date:   Tue Aug 1 22:32:12 2023 +0200
> > > > >
> > > > >     amdsmn(4), amdtemp(4): add support for Zen 4
> > > > >
> > > > >     Zen 4 support, tested on Ryzen 9 7900
> > > > >
> > > > >     Reviewed by:    imp (previous version), mhorne
> > > > >     Approved by:    mhorne
> > > > > >     Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> > > > >     Differential Revision:  https://reviews.freebsd.org/D41049
> > > >
> > > > Thanks for sharing your findings.
> > > >
> > > > Now I probably know why my old kernel from stable/13 no longer booted
> > > > > after updating to stable/14. I've created a new kernel config and
> > > > forgot to add "device amdtemp" & "device amdsmn" and forgot about the
> > > > issue. After removing only "device amdtemp" from my old kernel config
> > > > it boots again.
> > > >
> > > > Unfortunately reverting this commit (git revert -n 323a94afb623)
> > > > doesn't resolve this issue. Old kernel does not boot if "device
> > > > > amdtemp" is enabled. Probably wrong commit or I am doing something
> > > > wrong!?
> > > >
> > >
> > > > Strange.  My FreeBSD-14 kernel boots with device amdtemp (which
> > > > automatically results in amdsmn being included).  It's FreeBSD-15
> > > > which fails for me.
> >
> > 1. 'kldload amdtemp' works:
> >    12    1 0xffffffff81e7c000     3160 amdtemp.ko
> >    13    1 0xffffffff81e80000     2138 amdsmn.ko
> >
> >    amdsmn0: <AMD Family 17h System Management Network> on hostb0
> >    amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
> >
> > 2. GENERIC boots fine. The following kernel does not:
> >
> >    include GENERIC
> >
> >    ident      TEST
> >    device     amdtemp
> >
> > 3. Unfortunately this is a remote server without a serial console. I
> > don't get a crashdump and I can't find anything in /var/log/messages.
> >
> > 4. I have no good revision for stable/14 and main. On main I always
> > use GENERIC-NODEBUG. :-(
> >
>
> Thanks, Herbert!  kldload'ing amdsmn and amdtemp really does work!
>
> Now I can run FBSD-15 :)
>
> --
> Gary Jennejohn
>
>