nvme(4) losing control, and subsequent use of fsck_ffs(8) with UFS

From: Graham Perrin <grahamperrin_at_gmail.com>
Date: Sat, 17 Jul 2021 13:32:07 +0100

When the file system is stress-tested, it seems that the device (an 
internal drive) is lost.

A recent photograph:

<https://photos.app.goo.gl/wB7gZKLF5PQzusrz7>

Transcribed manually:

nvme0: Resetting controller due to a timeout.
nvme0: resetting controller
nvme0: controller ready did not become 0 within 5500 ms
nvme0: failing outstanding i/o
nvme0: WRITE sqid:2 cid:115 nsid:1 lba:296178856 len:64
nvme0: ABORTED - BY REQUEST (00/07) sqid:2 cid:115 cdw0:0
g_vfs_done():nvd0p2[WRITE(offset=151370924032, length=32768)]error = 6
UFS: forcibly unmounting /dev/nvd0p2 from /
nvme0: failing outstanding i/o

… et cetera.
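
Because the lines above were transcribed by hand, a quick arithmetic cross-check of the WRITE line against the g_vfs_done() line may be useful. The following is only a sketch. It assumes 512-byte logical sectors, that len in the nvme0 line counts logical blocks, and that both lines describe the same request. The function name and variable names are made up for illustration.

```python
# Cross-check of two transcribed log lines (a sketch, not a diagnostic tool).
# Assumption: the namespace uses 512-byte logical sectors.
SECTOR_BYTES = 512

# From: nvme0: WRITE sqid:2 cid:115 nsid:1 lba:296178856 len:64
nvme_lba = 296178856
nvme_len_sectors = 64

# From: g_vfs_done():nvd0p2[WRITE(offset=151370924032, length=32768)]
geom_offset_bytes = 151370924032
geom_length_bytes = 32768


def partition_start_sector(disk_lba, part_offset_bytes, sector_bytes=SECTOR_BYTES):
    """Infer where the partition starts on the disk, in sectors.

    disk_lba is relative to the whole device; part_offset_bytes is
    relative to the partition (here, nvd0p2).
    """
    if part_offset_bytes % sector_bytes != 0:
        raise ValueError("offset is not sector-aligned")
    return disk_lba - part_offset_bytes // sector_bytes


# Do the two lines agree on the size of the request?
length_matches = nvme_len_sectors * SECTOR_BYTES == geom_length_bytes

# Implied first sector of nvd0p2, if both lines are the same request.
start = partition_start_sector(nvme_lba, geom_offset_bytes)

print(length_matches, start)
```

Under those assumptions the lengths agree (64 sectors is 32768 bytes) and the implied start of nvd0p2 is sector 532520, which could be compared with the output of gpart show nvd0.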

Is this a sure sign of a hardware problem? Or must I do something 
special to gain reliability under stress?

I don't know how to interpret parts of the manual page for nvme(4). There's 
direction to include this line in loader.conf(5):

nvme_load="YES"

– however when I used kldload(8), it seemed that the module was already 
loaded, or in kernel.

Using StressDisk:

<https://github.com/ncw/stressdisk>

– failures typically occur after around six minutes of testing.

The drive is very new, less than 2 TB written:

<https://bsd-hardware.info/?probe=7138e2a9e7&log=smartctl>

I do suspect a hardware problem, because two prior installations of 
Windows 10 became non-bootable.

Also: I find peculiarities with use of fsck_ffs(8), which I can describe 
later. Maybe to be expected, if there's a problem with the drive.
Received on Sat Jul 17 2021 - 12:32:07 UTC
