Date: Mon, 12 Feb 2024 20:06:21 -0800 (PST)
From: Don Lewis
Subject: Re: nvme controller reset failures on recent -CURRENT
To: Mark Johnston
cc: FreeBSD current, John Baldwin
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 12 Feb, Mark Johnston wrote:
> On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote:
>> I just upgraded my package build machine to:
>>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
>> from:
>>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
>> and I've had two nvme-triggered panics in the last day.
>>
>> nvme is being used for swap and L2ARC.  I'm not able to get a crash
>> dump, probably because the nvme device has gone away and I get an error
>> about not having a dump device.  It looks like a low-memory panic
>> because free memory is low and zfs is calling malloc().
>>
>> This shows up in the log leading up to the panic:
>> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
>> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
>> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
>> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
>> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.
>
> Are you by chance using the drive mentioned here?
> https://github.com/openzfs/zfs/discussions/14793
>
> I was bitten by that and ended up replacing the drive with a different
> model.  The crash manifested exactly as you describe, though I didn't
> have L2ARC or swap enabled on it.
Nope:

nda0 at nvme0 bus 0 scbus9 target 0 lun 1
nda0:
nda0: Serial Number BTNH940617WE512A
nda0: nvme version 1.3
nda0: 488386MB (1000215216 512 byte sectors)

I'm not seeing super high I/O rates.  I happened to have iostat running
when the machine panicked:

   0  584  88.4   31  2.68  65.8  112  7.18  68.2  107  7.13  80  0 20  0  0
   0  565  99.1   32  3.06  27.9   74  2.01  30.5   70  2.08  80  0 20  0  0
   0  612  92.8   31  2.77  18.9  148  2.74  18.9  148  2.73  86  0 14  0  0
   0  618  88.6   13  1.17  25.0   59  1.44  24.2   61  1.44  89  0 11  0  0
   0  586  45.4    5  0.22  31.4   55  1.70  30.8   57  1.70  84  0 16  0  0
   0  598  12.7    3  0.03  38.1   64  2.40  37.1   66  2.40  84  0 16  0  0
   0  675  36.1    6  0.21  23.7  156  3.62  22.7  164  3.63  88  0 12  0  0
   0  641   6.9    6  0.04  25.7  243  6.10  25.3  246  6.08  71  0 29  0  0
   0  737  20.1    9  0.18  36.4  148  5.24  37.2  144  5.24  78  0 22  0  0
   0  578  44.7   23  1.03  25.1  164  4.01  25.5  161  3.99  86  0 14  0  0
   0  608  70.3   15  1.06  51.1   64  3.19  51.3   64  3.19  89  0 11  0  0
   0  624  38.6    9  0.35  32.3  121  3.80  32.2  121  3.79  90  0 10  0  0
   0  577  80.6   16  1.28  37.8   66  2.44  36.5   69  2.46  90  0 10  0  0
      tty             nda0             ada0             ada1             cpu
 tin tout  KB/t  tps  MB/s  KB/t  tps  MB/s  KB/t  tps  MB/s  us ni sy in id
   0  566  87.7   16  1.39  27.2   60  1.60  25.3   66  1.62  87  0 13  0  0
   0  599  77.2   11  0.83  17.4  391  6.66  17.3  395  6.66  74  0 26  0  0
   0  660  45.0    7  0.31  18.7  575 10.51  18.6  578 10.49  76  0 24  0  0
   0  615  37.7    8  0.31  24.0  303  7.11  24.0  303  7.11  58  0 42  0  0
Fssh_packet_write_wait: ... port 22: Broken pipe

ada* are old and slow spinning rust.

That report does mention something else that could also be a cause.  I
upgraded the motherboard BIOS around the same time.  When I get a
chance, I'll drop back to the older FreeBSD version and see if the
problem goes away.
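To put a number on "not super high", the nda0 columns of the iostat samples above can be summarized with a few lines of Python.  This is only a sketch: the function name and the column indices are my own choices, and the indices assume the column order shown in the iostat header (tin, tout, then KB/t, tps, MB/s for nda0).

```python
# Sketch: summarize the nda0 columns of the iostat samples quoted above.
# Assumed column order (from the iostat header):
#   tin tout | nda0: KB/t tps MB/s | ada0: ... | ada1: ... | cpu: us ni sy in id
# The repeated header lines are left out of ROWS.
ROWS = """\
0 584 88.4 31 2.68 65.8 112 7.18 68.2 107 7.13 80 0 20 0 0
0 565 99.1 32 3.06 27.9 74 2.01 30.5 70 2.08 80 0 20 0 0
0 612 92.8 31 2.77 18.9 148 2.74 18.9 148 2.73 86 0 14 0 0
0 618 88.6 13 1.17 25.0 59 1.44 24.2 61 1.44 89 0 11 0 0
0 586 45.4 5 0.22 31.4 55 1.70 30.8 57 1.70 84 0 16 0 0
0 598 12.7 3 0.03 38.1 64 2.40 37.1 66 2.40 84 0 16 0 0
0 675 36.1 6 0.21 23.7 156 3.62 22.7 164 3.63 88 0 12 0 0
0 641 6.9 6 0.04 25.7 243 6.10 25.3 246 6.08 71 0 29 0 0
0 737 20.1 9 0.18 36.4 148 5.24 37.2 144 5.24 78 0 22 0 0
0 578 44.7 23 1.03 25.1 164 4.01 25.5 161 3.99 86 0 14 0 0
0 608 70.3 15 1.06 51.1 64 3.19 51.3 64 3.19 89 0 11 0 0
0 624 38.6 9 0.35 32.3 121 3.80 32.2 121 3.79 90 0 10 0 0
0 577 80.6 16 1.28 37.8 66 2.44 36.5 69 2.46 90 0 10 0 0
0 566 87.7 16 1.39 27.2 60 1.60 25.3 66 1.62 87 0 13 0 0
0 599 77.2 11 0.83 17.4 391 6.66 17.3 395 6.66 74 0 26 0 0
0 660 45.0 7 0.31 18.7 575 10.51 18.6 578 10.49 76 0 24 0 0
0 615 37.7 8 0.31 24.0 303 7.11 24.0 303 7.11 58 0 42 0 0
"""


def summarize(rows, tps_col=3, mbs_col=4):
    """Return sample count, peak tps, peak MB/s and mean MB/s for one device.

    tps_col and mbs_col are zero-based field indices; the defaults pick
    the nda0 columns under the assumed layout.
    """
    samples = [line.split() for line in rows.strip().splitlines()]
    tps = [int(fields[tps_col]) for fields in samples]
    mbs = [float(fields[mbs_col]) for fields in samples]
    return {
        "samples": len(samples),
        "max_tps": max(tps),
        "max_mbs": max(mbs),
        "mean_mbs": sum(mbs) / len(mbs),
    }


NDA0 = summarize(ROWS)
print(NDA0)
```

Passing tps_col=6, mbs_col=7 or tps_col=9, mbs_col=10 gives the same summary for ada0 and ada1 under the same layout assumption.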