Re: Changes in cam/nvme causes issues?

From: Alexander Leidinger <Alexander_at_Leidinger.net>
Date: Sun, 21 Dec 2025 15:35:52 UTC
On 2025-12-14 14:05, Warner Losh wrote:

> Let's do one issue at a time. There's too much missing info. Top
> posting since there's not a lot of context to this request.

The disk has now died completely, so the CRC errors can no longer be
reproduced.

> First, let's start with pciconf -l of the nvme drive. I have a strong 
> idea, but need some data.

I already provided this privately along with some other data; I am
posting it here publicly so that people are aware that there is
currently an issue with such drives:
nvme0@pci0:5:0:0: class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d device=0xa809 subvendor=0x144d subdevice=0xa801
Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V
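
For anyone comparing their own hardware against this report, here is a
minimal sketch that splits a pciconf -l line like the one above into
its fields. The function names (parse_pciconf_line, decode_class) are
made up for illustration and are not part of any FreeBSD tool; the
sample line is copied from this message. It assumes the usual
"name@selector: key=0xvalue ..." layout and does not call pciconf
itself.

```python
import re

# Sample line copied from this message (joined onto one line).
PCICONF_LINE = (
    "nvme0@pci0:5:0:0: class=0x010802 rev=0x00 hdr=0x00 "
    "vendor=0x144d device=0xa809 subvendor=0x144d subdevice=0xa801"
)


def parse_pciconf_line(line):
    """Split one `pciconf -l` style line into name, location and hex fields."""
    # Everything before the first ": " is "driver-unit@selector".
    selector, _, rest = line.partition(": ")
    name, _, location = selector.partition("@")
    fields = {"name": name, "location": location}
    # The remaining tokens are key=0x... pairs; store them as integers.
    for key, value in re.findall(r"(\w+)=(0x[0-9a-fA-F]+)", rest):
        fields[key] = int(value, 16)
    return fields


def decode_class(class_code):
    """Break the 24-bit class code into (base class, subclass, prog-if)."""
    return (class_code >> 16) & 0xFF, (class_code >> 8) & 0xFF, class_code & 0xFF


info = parse_pciconf_line(PCICONF_LINE)
print(info["name"], hex(info["vendor"]), hex(info["device"]))
print(decode_class(info["class"]))
```

For the line above, the class code splits into base class 0x01 (mass
storage), subclass 0x08 (non-volatile memory) and programming
interface 0x02 (NVM Express). Vendor 0x144d is Samsung's PCI vendor
ID, which matches the model string reported for the drive.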

Bye,
Alexander.

> Also, the disk report needs full logs with and without the settings 
> that have uncorrectable in them. I'd expect that a shorter timeout 
> would lead to different behavior, but maybe that error syndrome isn't 
> one I've seen. It would also be helpful to know which of the timeouts
> changes the behavior...
> 
> Warner
> 
> On Sun, Dec 14, 2025, 5:06 AM Alexander Leidinger 
> <Alexander@leidinger.net> wrote:
> 
>> Hi Warner,
>> 
>> I am trying to update a 15-current (as of 2025-11-27-110715) to a
>> recent 16-current (as of 2025-12-13-132815). It fails to import a pool
>> due to a missing nvme. I also have a broken HD in this system, which I
>> mention to be on the safe side.
>> 
>> This is from 15-current:
>> ---snip---
>> NAME                               STATE     READ WRITE CKSUM
>> rpool                              DEGRADED     0     0     0
>>   mirror-0                         DEGRADED     0     0     0
>>     diskid/DISK-WD-WCC4N4KLEZT7p3  ONLINE       0     0     0
>>     diskid/DISK-WD-WCC4N1DF9DA2p3  ONLINE       0     0     0
>>     diskid/DISK-WD-WX52D625R0NTp3  ONLINE       0     0     0
>>     diskid/DISK-WD-WCC4N1PYJ3F8p3  OFFLINE      0     0     0
>> logs
>>   diskid/DISK-493504058890547p1    ONLINE       0     0     0
>> cache
>>   diskid/DISK-493504058890547p2    ONLINE       0     0     0
>> 
>> NAME                               STATE     READ WRITE CKSUM
>> space                              DEGRADED     0     0     0
>>   raidz2-0                         DEGRADED     0     0     0
>>     diskid/DISK-WD-WCC4N4KLEZT7p4  ONLINE       0     0     0
>>     diskid/DISK-WD-WCC4N1DF9DA2p4  ONLINE       0     0     0
>>     diskid/DISK-WD-WX52D625R0NTp4  ONLINE       0     0     0
>>     diskid/DISK-WD-WX52D625R2TPp4  ONLINE       0     0     0
>>     diskid/DISK-WD-WCC4N1PYJ3F8p4  OFFLINE      0     0     0
>> logs
>>   diskid/DISK-S649NL0T819360Vp2    ONLINE       0     0     0
>> cache
>>   diskid/DISK-S649NL0T819360Vp3    ONLINE       0     0     0
>> ---snip---
>> 
>> The partitions marked offline are on the same HD (the broken one). The
>> DISK-S649NL0T819360V device, used as log and cache in the second pool,
>> causes the issue on 16-current.
>> 
>> On 16-current I get "uncorrectable parity/CRC error" messages on boot
>> from the broken disk. I used this to get rid of those errors:
>> ---snip---
>> # grep kern.cam /tmp/be_mount.MhLw/boot/loader.conf
>> kern.cam.tur_timeout="60"
>> kern.cam.inquiry_timeout="60"
>> kern.cam.modesense_timeout="60"
>> ---snip---
>> 
>> But the second pool ("space") fails to get imported. When I import it
>> via "zpool import -m space" it shows me that the log and cache devices
>> (different partitions on the same hardware) are not available.
>> This is the device in question as seen from 15-current:
>> ---snip---
>> nda0: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
>> nda0: Serial Number S649NL0T819360V
>> [1] nda0: nvme version 1.4
>> nda0: 953869MB (1953525168 512 byte sectors)
>> [1] GEOM: new disk nda0
>> ...
>> [1] pass6 at nvme0 bus 0 scbus6 target 0 lun 1
>> pass6: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
>> pass6: Serial Number S649NL0T819360V
>> [1] pass6: nvme version 1.4
>> ---snip---
>> 
>> In case you need some info from the 15- or 16-current BE, which info
>> do you need?
>> 
>> Bye,
>> Alexander.
>> 
>> --
>> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
>> http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF