Random panics in 11.0 and 12.0 on J1900

Marco Steinbach coco at executive-computing.de
Sat Jul 20 16:56:30 UTC 2019


> I have a set of J1900 hosts running 11.0-RELEASE-p1 that experience
> seemingly random panics. The panics are all basically the same:
>
> Fatal trap 12: page fault while in kernel mode
> fault code = supervisor read data, page not present
>
> Adding workloads to the hosts seems to increase panic frequency, but the
> panics have also occurred on completely idle hosts. Similarly, uptime
> when panicking has been as low as minutes, and as high as ~620 days.
>
> For reasons, it has not been possible to extract a coredump from these
> hosts, nor practical to run memtest on them or upgrade them to a newer
> release. About 1% of our hosts are affected each day, so we've just been
> living with the problem.
>
> However, while testing 12.0 on the same hardware, I encountered the same
> panic and was able to capture the core dump. (See below.)
>
> All of my Google-fu on this panic has turned up threads suggesting the
> problem is hardware, but there are two problems with that idea...
>
> One, memtest has turned up no errors on the 12.0 host I witnessed the panic
> on.
>
> Two, a small number of systems on the same hardware are running
> 10.3-RELEASE, and have experienced no panics in their history. Panics
> have only happened on 11s, and now 12.
>
> kgdb output from the panic follows. (This particular host was in the
> middle of rebooting when it panicked.)
>
> Hoping someone here has some insight. My uninformed wild-ass guess is
> something relating to spectre/meltdown fixes.
>
> Thanks,
>
>
> -Snow

I've been running 10.x, 11.x and 12.0 for a while on several J1900s, namely ASRock Q1900M and Q1900M Pro3 boards.

All of them get a good beating on occasion, for example running poudriere on top of GELI and ZFS software RAIDs attached to the onboard 2-port AHCI SATA controller and to Marvell-based PCIe 4-port SATA controllers.

I've outfitted all of them with 4-port Intel PRO/1000 PCIe NICs driven by igb(4), and am not using the onboard re(4) NICs.

I can't recall ever seeing a panic like the one you described. Could you share a full dmesg and tell us which mainboard(s) you are using?

MfG CoCo



More information about the freebsd-stable mailing list