Re: Changes in cam/nvme causes issues?
Date: Sun, 21 Dec 2025 15:35:52 UTC
On 2025-12-14 14:05, Warner Losh wrote:
> Let's do one issue at a time. There's too much missing info. Top
> posting since there's not a lot of context to this request.
The disk has now died completely, so the CRC errors can no longer be examined.
> First, let's start with pciconf -l of the nvme drive. I have a strong
> idea, but need some data.
I already provided this privately, along with some other data. Here it is
for the public, so that people are aware that there is currently an issue
with such drives:
nvme0@pci0:5:0:0: class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d device=0xa809 subvendor=0x144d subdevice=0xa801
Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V
Bye,
Alexander.
> Also, the disk report needs full logs with and without the settings
> that have "uncorrectable" in them. I'd expect that a shorter timeout
> would lead to different behavior, but maybe that error syndrome isn't
> one I've seen. It would also be helpful to know which of the timeouts
> changes the behavior...
>
> Warner
>
> On Sun, Dec 14, 2025, 5:06 AM Alexander Leidinger <Alexander@leidinger.net> wrote:
>
>> Hi Warner,
>>
>> I am trying to update a 15-current (as of 2025-11-27-110715) to a recent
>> 16-current (as of 2025-12-13-132815). It fails to import a pool due to a
>> missing nvme device. I also have a broken HD in this system; I mention it
>> just to be on the safe side.
>>
>> This is from 15-current:
>> ---snip---
>> NAME                               STATE     READ WRITE CKSUM
>> rpool                              DEGRADED     0     0     0
>>   mirror-0                         DEGRADED     0     0     0
>>     diskid/DISK-WD-WCC4N4KLEZT7p3  ONLINE       0     0     0
>>     diskid/DISK-WD-WCC4N1DF9DA2p3  ONLINE       0     0     0
>>     diskid/DISK-WD-WX52D625R0NTp3  ONLINE       0     0     0
>>     diskid/DISK-WD-WCC4N1PYJ3F8p3  OFFLINE      0     0     0
>> logs
>>   diskid/DISK-493504058890547p1    ONLINE       0     0     0
>> cache
>>   diskid/DISK-493504058890547p2    ONLINE       0     0     0
>>
>> NAME                               STATE     READ WRITE CKSUM
>> space                              DEGRADED     0     0     0
>>   raidz2-0                         DEGRADED     0     0     0
>>     diskid/DISK-WD-WCC4N4KLEZT7p4  ONLINE       0     0     0
>>     diskid/DISK-WD-WCC4N1DF9DA2p4  ONLINE       0     0     0
>>     diskid/DISK-WD-WX52D625R0NTp4  ONLINE       0     0     0
>>     diskid/DISK-WD-WX52D625R2TPp4  ONLINE       0     0     0
>>     diskid/DISK-WD-WCC4N1PYJ3F8p4  OFFLINE      0     0     0
>> logs
>>   diskid/DISK-S649NL0T819360Vp2    ONLINE       0     0     0
>> cache
>>   diskid/DISK-S649NL0T819360Vp3    ONLINE       0     0     0
>> ---snip---
>>
>> The partitions marked OFFLINE are on the same HD (the broken one). The
>> DISK-S649NL0T819360V device, used as log and cache in the second pool,
>> causes the issue on 16-current.
>>
>> On 16-current I get "uncorrectable parity/CRC error" messages on boot
>> from the broken disk. I used this to get rid of those errors:
>> ---snip---
>> # grep kern.cam /tmp/be_mount.MhLw/boot/loader.conf
>> kern.cam.tur_timeout="60"
>> kern.cam.inquiry_timeout="60"
>> kern.cam.modesense_timeout="60"
>> ---snip---
>>
>> However, the second pool ("space") fails to import. When I import it
>> via "zpool import -m space", it shows that the log and cache devices
>> (different partitions on the same hardware) are not available.
>> This is the device in question as seen from 15-current:
>> ---snip---
>> nda0: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
>> nda0: Serial Number S649NL0T819360V
>> [1] nda0: nvme version 1.4
>> nda0: 953869MB (1953525168 512 byte sectors)
>> [1] GEOM: new disk nda0
>> ...
>> [1] pass6 at nvme0 bus 0 scbus6 target 0 lun 1
>> pass6: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
>> pass6: Serial Number S649NL0T819360V
>> [1] pass6: nvme version 1.4
>> ---snip---
>>
>> If you need some info from the 15- or 16-current BE, which info do you
>> need?
>>
>> Bye,
>> Alexander.
>>
>> --
>> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
>> http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF