Re: nvme device errors & zfs

From: Tomek CEDRO <tomek_at_cedro.info>
Date: Tue, 05 Nov 2024 11:10:35 UTC
On Tue, Nov 5, 2024 at 10:15 AM Dave Cottlehuber <dch@freebsd.org> wrote:
> these are samsung 990, mainly chosen for low price at the time:
> nda0: <Samsung SSD 990 PRO 2TB 0B2QJXG7 S7DNNJ0WC12665P>
> nda1: <Samsung SSD 990 PRO 2TB 0B2QJXG7 S7DNNJ0WC12664X>

These are pretty decent and not really cheap drives!

Samsung Magician software can upgrade the firmware and run other checks. It
works on Windoze, macOS and Android:

https://www.samsung.com/ca/support/model/MZ-V9P2T0B/AM/#downloads

> I forgot to mention dmesg prior:
> Oct 31 16:11:05 wintermute kernel[9406]: nvme1: Resetting controller due to a timeout.
> Oct 31 16:11:05 wintermute kernel[9406]: nvme1: event="start"
> Oct 31 16:11:05 wintermute kernel[9406]: nvme1: Waiting for reset to complete
> Oct 31 16:11:05 wintermute kernel[9406]: nvme1: Waiting for reset to complete
> ... repeated x400

Another idea: maybe the disk overheats and resets itself to cool down?

Lots of people in reviews of various NVMe drives ask about
temperature and suggest using a heatsink ;-)
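To see whether the resets cluster in time (for example during sustained writes, when heat would build up), here is a minimal Python sketch that pulls the "Resetting controller due to a timeout" events per controller out of a kernel log excerpt like the one quoted above. The regex assumes the syslog line layout shown in the quote (with or without the `[pid]` after `kernel`), and `reset_events` is a hypothetical helper name, not part of any FreeBSD tool.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the quoted log, e.g.
# "Oct 31 16:11:05 wintermute kernel[9406]: nvme1: Resetting controller due to a timeout."
# The "[pid]" part after "kernel" is optional so plain "kernel:" lines match too.
RESET_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel(?:\[\d+\])?:\s+"
    r"(?P<dev>nvme\d+):\s+Resetting controller due to a timeout"
)


def reset_events(lines):
    """Return {controller: [timestamps]} for every timeout-triggered reset."""
    events = defaultdict(list)
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            # Only the first line of each reset sequence is counted;
            # the "Waiting for reset to complete" repeats are ignored.
            events[m.group("dev")].append(m.group("ts"))
    return dict(events)


if __name__ == "__main__":
    sample = [
        "Oct 31 16:11:05 wintermute kernel[9406]: nvme1: Resetting controller due to a timeout.",
        'Oct 31 16:11:05 wintermute kernel[9406]: nvme1: event="start"',
        "Oct 31 16:11:05 wintermute kernel[9406]: nvme1: Waiting for reset to complete",
    ]
    print(reset_events(sample))
```

On FreeBSD the kernel messages typically land in /var/log/messages, so that file can be fed to `reset_events` line by line, and the timestamps compared with the drive's own temperature readings from `nvmecontrol logpage -p 2 nvme1` (the SMART / Health Information log page).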


-- 
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info