Re: Changes in cam/nvme causes issues?
- Reply: Alexander Leidinger : "Re: Changes in cam/nvme causes issues?"
- In reply to: Warner Losh : "Re: Changes in cam/nvme causes issues?"
Date: Tue, 23 Dec 2025 09:31:43 UTC
On 2025-12-22 17:58, Warner Losh wrote:
> On Sun, Dec 21, 2025 at 8:37 AM Alexander Leidinger
> <Alexander@leidinger.net> wrote:
>
> On 2025-12-14 14:05, Warner Losh wrote:
>
> Let's do one issue at a time. There's too much missing info. Top
> posting since there's not a lot of context to this request
>
> The disk has now died completely, so the CRC errors can no longer be
> reproduced.
>
> First, let's start with pciconf -l of the nvme drive. I have a strong
> idea, but need some data.
>
> I already provided this privately along with some other data; here it is
> for the public, so that people are aware that there is currently an issue
> with such drives:
> nvme0@pci0:5:0:0: class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d
> device=0xa809 subvendor=0x144d subdevice=0xa801
> Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V
Yea, so far this is the only report I've received, and there's not
enough data in it to reproduce it with any of the dozen NVMe drives that
I have, or to spot a difference with what I know I check in the code. So
if it's compiled into the kernel with cam also compiled into the kernel,
I know it works.
CAM is in the kernel, nvme is loaded as a module (from 15-current):
---snip---
# kldstat | egrep '(nvm|cam)'
2 1 0xffffffff811e3000 20db8 nvme.ko
---snip---
I will do a clean rebuild with the most recent 16-current and provide a
full dmesg if this still doesn't work.
Bye,
Alexander.
> Warner
>
> Bye,
> Alexander.
>
> Also, the disk report needs full logs with and without the settings
> that have uncorrectable in them. I'd expect that a shorter timeout
> would lead to different behavior, but maybe that error syndrome isn't
> one I've seen. It would also be helpful to know which of the times
> changes the behavior...
>
> Warner
>
> On Sun, Dec 14, 2025, 5:06 AM Alexander Leidinger
> <Alexander@leidinger.net> wrote: Hi Warner,
>
> I am trying to update a 15-current (as of 2025-11-27-110715) to a recent
> 16-current (as of 2025-12-13-132815). It fails to import a pool due to a
> missing nvme device. I also have a broken HD in this system; I mention it
> to be on the safe side.
>
> This is from 15-current:
> ---snip---
> NAME STATE READ WRITE CKSUM
> rpool DEGRADED 0 0 0
> mirror-0 DEGRADED 0 0 0
> diskid/DISK-WD-WCC4N4KLEZT7p3 ONLINE 0 0 0
> diskid/DISK-WD-WCC4N1DF9DA2p3 ONLINE 0 0 0
> diskid/DISK-WD-WX52D625R0NTp3 ONLINE 0 0 0
> diskid/DISK-WD-WCC4N1PYJ3F8p3 OFFLINE 0 0 0
> logs
> diskid/DISK-493504058890547p1 ONLINE 0 0 0
> cache
> diskid/DISK-493504058890547p2 ONLINE 0 0 0
>
> NAME STATE READ WRITE CKSUM
> space DEGRADED 0 0 0
> raidz2-0 DEGRADED 0 0 0
> diskid/DISK-WD-WCC4N4KLEZT7p4 ONLINE 0 0 0
> diskid/DISK-WD-WCC4N1DF9DA2p4 ONLINE 0 0 0
> diskid/DISK-WD-WX52D625R0NTp4 ONLINE 0 0 0
> diskid/DISK-WD-WX52D625R2TPp4 ONLINE 0 0 0
> diskid/DISK-WD-WCC4N1PYJ3F8p4 OFFLINE 0 0 0
> logs
> diskid/DISK-S649NL0T819360Vp2 ONLINE 0 0 0
> cache
> diskid/DISK-S649NL0T819360Vp3 ONLINE 0 0 0
> ---snip---
>
> The partitions marked offline are on the same HD (the broken one). The
> DISK-S649NL0T819360V device, used as log and cache in the second pool,
> causes the issue on 16-current.
>
> On 16-current I get "uncorrectable parity/CRC error" messages on boot
> from the broken disk. I used this to get rid of those errors:
> ---snip---
> # grep kern.cam /tmp/be_mount.MhLw/boot/loader.conf
> kern.cam.tur_timeout="60"
> kern.cam.inquiry_timeout="60"
> kern.cam.modesense_timeout="60"
> ---snip---
>
> But the second pool ("space") fails to get imported. When I import it
> via "zpool import -m space" it shows me that the log and cache devices
> (different partitions on the same hardware) are not available.
> This is the device in question as seen from 15-current:
> ---snip---
> nda0: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
> nda0: Serial Number S649NL0T819360V
> [1] nda0: nvme version 1.4
> nda0: 953869MB (1953525168 512 byte sectors)
> [1] GEOM: new disk nda0
> ...
> [1] pass6 at nvme0 bus 0 scbus6 target 0 lun 1
> pass6: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
> pass6: Serial Number S649NL0T819360V
> [1] pass6: nvme version 1.4
> ---snip---
>
> In case you need some info from the 15- or 16-current BE, which info do
> you need?
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net Alexander@Leidinger.net: PGP
> 0x8F31830F9F2772BF
> http://www.FreeBSD.org netchild@FreeBSD.org : PGP
> 0x8F31830F9F2772BF
--
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF