[Bug 270943] Complete system freeze on Asus dual socket AMD 7742 system
Date: Thu, 20 Apr 2023 02:08:36 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270943
Bug ID: 270943
Summary: Complete system freeze on Asus dual socket AMD 7742 system
Product: Base System
Version: 13.2-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: misc
Assignee: bugs@FreeBSD.org
Reporter: nb@synthcom.com
Created attachment 241605
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=241605&action=edit
dmesg.boot for this system
I have a dual socket 7742 system (128 total real cores, 128 threads) that
locks up completely in under an hour if left idle. By "lock up", I mean:
* Console unresponsive (no keyboard/USB/numlock)
* Networking unresponsive (no pings, no arps, nothing)
It behaves as if it's "jumping to self" with all interrupts disabled. The system
needs to be reset or power cycled. I have tried the following versions over the
last few months with the same results:
FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
FreeBSD 13.1
FreeBSD 13.0
FreeBSD 12.3
Several memstick images of 14.0 since December 2022
Other notes:
* The lockup is guaranteed. I've never had it not lock up when left idle.
Always locks up in <1 hour (usually in 10-20 minutes).
* If I run a "stress" program, the system runs for days at a time without any
observed lockups. If there's any significant system activity, it appears to not
lock up.
* At one point (on a 14.0 build) I was able to get the kernel debugger compiled
in. When the system locked up, the key sequence to enter the kernel debugger
still worked from the local USB keyboard. Entering the debugger also seemed to
unlock the system: after I exited it, the system was alive again.
* I've installed the OSes on either 2GB M.2 Samsung SSDs *OR* on a Western
Digital SN200 NVMe disk, with no change in behavior. Storage does not appear to
be a factor.
* I've halved the memory and swapped DIMMs entirely. No change.
System specs:
Motherboard : Asus rs700a-e11-rs12u-wocpu009z
CPUs : Dual AMD 7742 CPUs
BIOS Version : 0901
BMC Firmware model : RS700A-E11-RS12U
BMC Firmware version: 1.2.15
Installed ECC memory: 512GB
Storage : Two Samsung EVO 980 TB M.2 SSDs, and a WD SN200 7.68TB
NVME U.2 disk
Video is provided by the ASpeed AST2500.
I'd be happy to put this system on the internet and allow any and all
interested parties access to it for troubleshooting/debugging. Thank you!