[Bug 265222] T480 zfs checksum mismatch error

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 14 Jul 2022 22:16:46 UTC

            Bug ID: 265222
           Summary: T480 zfs checksum mismatch error
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: mail@malisek.org

I am encountering severe I/O-related issues on my ThinkPad T480.
They seem to be similar to #239801, #241476 and #262421, although I'm not using
any special/obscure drivers on this ThinkPad.
After a clean install and configuration of FreeBSD 13.1-RELEASE, the following
errors appear after I extract a big archive or reinstall a big package.

zpool status -v
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:38 with 3 errors on Thu Jul 14 22:13:49 2022

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          ada0p4.eli  ONLINE       0     0    12

errors: Permanent errors have been detected in the following files:

When I scrub the pool, the following errors appear in the log:

Jul 14 22:13:33 ltop ZFS[59816]: pool I/O failure, zpool=zroot error=97
Jul 14 22:13:33 ltop ZFS[60942]: checksum mismatch, zpool=zroot
path=/dev/ada0p4.eli offset=77657673728 size=131072
Jul 14 22:13:49 ltop ZFS[99701]: pool I/O failure, zpool=zroot error=97
Jul 14 22:13:49 ltop ZFS[621]: checksum mismatch, zpool=zroot
path=/dev/ada0p4.eli offset=193825124352 size=131072
Jul 14 22:13:49 ltop ZFS[2696]: pool I/O failure, zpool=zroot error=97
Jul 14 22:13:49 ltop ZFS[4749]: checksum mismatch, zpool=zroot
path=/dev/ada0p4.eli offset=197592481792 size=131072
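
To compare where the failing blocks sit, the offset and size fields of the
checksum mismatch entries above can be extracted with a short script. This is
a minimal Python sketch, not part of the original report: the function name
parse_mismatches is hypothetical, and the regular expression only covers the
fields visible in this log. It tolerates the line wrapping introduced by the
mail by treating any whitespace, including a newline, as a separator.

```python
import re

# Matches "checksum mismatch" events as they appear in this report.
# \s+ also matches the newline that the mail wrapping inserted before "path=".
MISMATCH_RE = re.compile(
    r"checksum mismatch, zpool=(?P<pool>\S+)\s+"
    r"path=(?P<path>\S+)\s+offset=(?P<offset>\d+)\s+size=(?P<size>\d+)"
)


def parse_mismatches(log_text):
    """Return a list of (pool, path, offset, size) tuples found in log_text."""
    events = []
    for m in MISMATCH_RE.finditer(log_text):
        events.append(
            (m.group("pool"), m.group("path"),
             int(m.group("offset")), int(m.group("size")))
        )
    return events


if __name__ == "__main__":
    sample = (
        "Jul 14 22:13:33 ltop ZFS[60942]: checksum mismatch, zpool=zroot\n"
        "path=/dev/ada0p4.eli offset=77657673728 size=131072\n"
    )
    for pool, path, offset, size in parse_mismatches(sample):
        # offset and size are byte counts; convert for readability.
        print(f"{pool} {path} offset={offset / 2**30:.2f} GiB "
              f"size={size // 1024} KiB")
```

Lines that report only "pool I/O failure" carry no offset and are skipped.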

I tried a few different things to determine whether this is a hardware-related
issue or not:

1. I tried a spare motherboard (and CPU), a spare (new) SATA SSD and a spare
SSD cable that I managed to borrow. The errors on FreeBSD still persist.
Neither memtest86+ nor the ThinkPad BIOS diagnostic tools reported any issues
with the hardware. Both SSDs I tried are new, and smartctl reported no errors.

2. I downloaded several big distro ISO files on Windows 10 and verified their
checksums manually there. None of those checksums failed on Windows, so it
seems to be a FreeBSD issue.

3. I have not yet tried running and testing Linux with OpenZFS.
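
The manual checksum test from step 2 can be repeated the same way on any
system. The following is a minimal Python sketch under stated assumptions: the
report does not say which tool or hash was used, so SHA-256 is an assumption,
and the function names sha256_of_file and verify are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large ISOs need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA-256 matches the published value.

    The expected value is normalised (whitespace stripped, lower-cased)
    because published checksum files differ in formatting.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling verify("some.iso", "<published sha256>") on the same file from each
operating system gives a like-for-like comparison; the file name here is only
a placeholder.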

Worth noting:

1. pkg also randomly reports checksum mismatches (especially for big
packages), which causes the install to fail.

2. This seems to happen only when X11 is running. No such error occurred when
my post-install script installed packages while X was not even present.

3. The system even crashed randomly several times on a previous install due to
partial corruption of drm-kmod.
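
Because the errors show up with large writes and apparently only while X11 is
running, one way to compare the two conditions is a write-then-read-back test
run once with X and once without. The following is a hedged Python sketch, not
something used in the original report: all function names are hypothetical,
and the 128 KiB block size is only chosen to match the size field in the log.
A read directly after the write may be served from memory cache rather than
the disk, so a clean result here does not rule out on-disk corruption; a
scrub afterwards remains the stronger check.

```python
import hashlib
import os


def block_for(index, block_size):
    """Deterministic pseudo-random block derived from its index."""
    out = bytearray()
    counter = 0
    while len(out) < block_size:
        out += hashlib.sha256(f"{index}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:block_size])


def write_pattern(path, blocks, block_size=128 * 1024):
    """Write `blocks` deterministic blocks and ask the OS to flush them."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(block_for(i, block_size))
        f.flush()
        os.fsync(f.fileno())


def find_bad_blocks(path, blocks, block_size=128 * 1024):
    """Re-read the file and return byte offsets of blocks that differ."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            offset = i * block_size
            try:
                f.seek(offset)
                data = f.read(block_size)
            except OSError:
                # The filesystem may return an I/O error instead of bad data.
                data = None
            if data != block_for(i, block_size):
                bad.append(offset)
    return bad
```

For example, write_pattern("testfile.bin", 8192) writes 1 GiB, and
find_bad_blocks("testfile.bin", 8192) then lists the offsets of any blocks
that no longer match.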



