From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 262421] zfs checksum errors and panic with invalid abd_t
Date: Tue, 08 Mar 2022 14:34:05 +0000
List-Archive: https://lists.freebsd.org/archives/freebsd-bugs

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262421

            Bug ID: 262421
           Summary: zfs checksum errors and panic with invalid abd_t
           Product: Base System
           Version: 13.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: jfc@mit.edu

During a scrub my zfs pool reported a few dozen checksum errors per disk,
about 1 per 200 GB scanned:

$ zpool status -v data
  pool: data
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub in progress since Sun Mar  6 19:16:15 2022
        13.6T scanned at 942M/s, 11.7T issued at 202M/s, 18.2T total
        2.42M repaired, 64.64% done, 09:16:24 to go
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0    18  (repairing)
            ada1    ONLINE       0     0    17  (repairing)
            ada2    ONLINE       0     0    12  (repairing)
            ada3    ONLINE       0     0    23  (repairing)
        cache
          ada4p5    ONLINE       0     0     0

errors: No known data errors

This affects all disks so it is not a single bad disk (unless the cache disk
is bad).  More likely it is data corruption in the controller, the data path
from controller to kernel ZFS code, or the ZFS data structures.

After several hours the system crashed with

VERIFY3(abd->abd_size <= SPA_MAXBLOCKSIZE) failed (930062841 <= 16777216)

This indicates a corrupt abd_t structure (see abd.c line 113).  savecore did
not generate a stack trace.

After rebooting the checksum error counters had reset to zero and the scrub
finished without error.  Probably something mysterious and irreproducible in
the state of my kernel that one time.

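To make the size of the violation in the failed assertion concrete, here is a minimal Python sketch that repeats the comparison using only the two numbers printed in the panic message. The helper names `abd_size_ok` and `describe` are hypothetical and are not part of OpenZFS; the only fact used beyond the message is that 16777216 equals 2^24 (16 MiB).

```python
# Values copied from the panic message in this report.
SPA_MAXBLOCKSIZE = 16777216    # right-hand side of the failed VERIFY3 (2**24)
REPORTED_ABD_SIZE = 930062841  # left-hand side of the failed VERIFY3


def abd_size_ok(size: int, limit: int = SPA_MAXBLOCKSIZE) -> bool:
    """Hypothetical helper mirroring the asserted condition: size <= limit."""
    return size <= limit


def describe(size: int, limit: int = SPA_MAXBLOCKSIZE) -> str:
    """Show the size and the limit in decimal and hex, plus their ratio."""
    return (
        f"size={size} ({size:#x}), "
        f"limit={limit} ({limit:#x}), "
        f"ratio={size / limit:.1f}x"
    )


if __name__ == "__main__":
    print(abd_size_ok(REPORTED_ABD_SIZE))
    print(describe(REPORTED_ABD_SIZE))
```

The reported value is roughly 55 times the limit rather than a near miss. That fits the reading that the field held garbage, though it does not by itself show where the corruption came from.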
My kernel was up to date on stable/13:

FreeBSD flaviventris 13.1-PRERELEASE FreeBSD 13.1-PRERELEASE #8
stable/13-n249920-d1f3afc4a47: Mon Mar  7 10:10:37 EST 2022
root@flaviventris:/usr/obj/usr/src/amd64.amd64/sys/CALIGATA amd64

Worth noting:

1. I have dedup enabled.
2. I have encryption enabled.
3. Since the previous scrub I did a zfs send | zfs receive of close to 50% of
   the pool size to enable encryption.  The pool was very nearly full when I
   had both an encrypted and an unencrypted copy around.  Now it is half
   full.
4. In /etc/make.conf I set "CPUTYPE?=amdfam10", appropriate for the HP
   MicroServer hardware.

ada0 to ada3 are identical spinning disks; ada4 (cache) is an SSD.

ahci0: port 0xe050-0xe057,0xe040-0xe043,0xe030-0xe037,0xe020-0xe023,0xe000-0xe01f mem 0xfea40000-0xfea407ff at device 0.0 on pci1
ahci0: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahci0: quirks=0x1000900
ada3: ACS-4 ATA SATA 3.x device
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 9537536MB (19532873728 512 byte sectors)
ada4: ACS-4 ATA SATA 3.x device
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada4: Command Queueing enabled
ada4: 953869MB (1953525168 512 byte sectors)
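As a sanity check on the probe output quoted in this report, the sketch below recomputes each disk's size from its sector count. It assumes that the "MB" figure is the byte count divided by 2^20 and rounded down; that convention is an assumption made here, not something the report states. The names `sectors_to_mib`, `REPORTED`, and `check` are hypothetical.

```python
SECTOR_BYTES = 512  # sector size printed in the ada3/ada4 lines

# (sector count, size in "MB") exactly as printed for each device.
REPORTED = {
    "ada3": (19532873728, 9537536),
    "ada4": (1953525168, 953869),
}


def sectors_to_mib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Convert a sector count to whole mebibytes, rounding down (assumed)."""
    return (sectors * sector_bytes) // (1024 * 1024)


def check(reported=REPORTED) -> dict:
    """Return, per device, whether the recomputed size matches the printed one."""
    return {
        name: sectors_to_mib(sectors) == printed_mb
        for name, (sectors, printed_mb) in reported.items()
    }


if __name__ == "__main__":
    for name, ok in check().items():
        print(name, "consistent" if ok else "mismatch")
```

Under that assumption both printed sizes agree with their sector counts, so the capacity figures in the probe lines appear to have survived transcription intact.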