[Bug 264141] nvme(4): Heavy load to SSD wedges 13.1 system: Controller in fatal status, resetting ... Resetting controller due to a timeout and possible hot unplug.
Date: Tue, 05 Jul 2022 23:04:50 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264141

--- Comment #23 from Warner Losh <imp@FreeBSD.org> ---
(In reply to dgilbert from comment #22)
> theory: FreeBSD is stomping on the host DRAM reserved for the NVME

There's no host ram reserved for nvme, per se. The driver will optionally allocate memory for the drive to use, however. Do you have "nvmeX: Allocated %lluMB host memory buffer" in your dmesg? Without it, you're not using nvme memory.

You can set the tunable hw.nvme.hmb_max=0 as well to disable using host memory for the DRAM-less cards at the cost of some additional latency if you think that this is the cause of the problem. This would rule it out as a problem. There may be some cards that lose their minds when this is enabled as well, though I've not seen reports of that in Linux world (I could easily have missed them). Ruling this in/out would be useful...

But corrupting host memory seems unlikely to be a cause given that the card drops off the bus and has its memory BARs reset so it isn't decoding anything (which is what's indicated by the possible hotplug messages). This indicates some kind of power or connection issue to the card, a faulty power controller on the card or wonky firmware in the cases that I've diagnosed. There might be a possible additional cause that's still unknown, but absent better evidence I'm at a loss for where to look.
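
For anyone who wants to script the dmesg check described above, here is a minimal sketch in Python. It assumes the message has exactly the form quoted in the comment ("nvmeX: Allocated %lluMB host memory buffer"). The function name find_hmb_allocations is made up for illustration and is not part of any FreeBSD tool.

```python
import re

# Pattern derived from the message format quoted in the comment:
#   "nvmeX: Allocated %lluMB host memory buffer"
# Assumption: X is a decimal unit number and the size is a plain integer.
HMB_LINE = re.compile(r"\b(nvme\d+): Allocated (\d+)MB host memory buffer")


def find_hmb_allocations(dmesg_text):
    """Return a dict mapping controller name to host memory buffer size in MB.

    An empty dict means no allocation message was found, which per the
    comment above would mean the drive is not using host memory.
    """
    return {m.group(1): int(m.group(2)) for m in HMB_LINE.finditer(dmesg_text)}
```

One possible way to use it is to pass in the text of dmesg, for example the stdout of subprocess.run(["dmesg"], capture_output=True, text=True). If it reports an allocation and you want to rule the host memory buffer out, the tunable named in the comment is hw.nvme.hmb_max=0; loader tunables like this are normally set in /boot/loader.conf and take effect at the next boot.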