[Bug 286869] ctld randomly corrupted data

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 17 May 2025 09:48:43 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=286869

            Bug ID: 286869
           Summary: ctld randomly corrupted data
           Product: Base System
           Version: Unspecified
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: d8zNeCFG@aon.at

Scenario:
- Server running FreeBSD stable/14 (ca. Feb. 1) or openSUSE Leap 15.6 (dual
boot)
- zpool accessible by both operating systems
- zpool contains zvols which are exported as iSCSI targets:
  . on FreeBSD using ctl
  . on openSUSE using targetcli
- Client running FreeBSD stable/14
- Latest ports on the client (and on the FreeBSD server)
- Windows 10 guest on the client, running under virtualbox-ose-6.1.50 and
accessing the iSCSI target

Result:
- When running the server with openSUSE, the client always works correctly.
- A few days ago, when running the server with FreeBSD stable/14, the client
complained about a corrupted disk, and the disk eventually became unbootable;
it seems that even the EFI boot got corrupted.
- The symptoms perfectly fit what is described in
https://www.reddit.com/r/freebsd/comments/15asthr/freebsd_iscsi_disk_corruption_issue/
- Switching the server back to openSUSE and rolling back the zvol to the latest
backup snapshot resolved the issue.
- Additionally, (only) when the server is running FreeBSD, the client VM stops
with a dropped iSCSI connection; on the server console, iscsi ping timeout
messages are printed. However, this is most likely unrelated to the issue
described here.
- Today, running the same client again with FreeBSD on the server, it seemed to
be working normally, except again for several freezes due to dropped iSCSI
connections. I ultimately switched the server to openSUSE, rolled back the
changes, and started again.

Some observations:
- The same zvol was previously on an older FreeBSD stable/14 server; there the
issue never occurred. It was transferred to the zpool on the new server using
zfs send -R | zfs receive.
- Some of the differences between the old and the new server:
  . Network interface:
    old: re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet>
    new: re0: <Realtek PCIe 2.5GbE Family Controller>
  . Disks comprising the zpools:
    old: SATA spinning rust
    new: SSD
- In general, the new server is orders of magnitude faster than the old one. It
might be that there is a race condition in CTL which gets triggered by this.
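
A possible way to narrow this down (a sketch only, not something that was run
for this report) would be to compare the volume contents after a corrupting
session against a known-good copy, block by block, and list the offsets that
differ. The function name, the two paths and the 4096-byte block size are
assumptions chosen for illustration; any two block devices or image files of
the same volume would do.

```python
def differing_blocks(path_a, path_b, block_size=4096):
    """Return the byte offsets of blocks that differ between two files/devices.

    path_a and path_b are hypothetical: e.g. a known-good image and an image
    taken after the corruption was observed.
    """
    offsets = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            chunk_a = a.read(block_size)
            chunk_b = b.read(block_size)
            # Stop once both inputs are exhausted.
            if not chunk_a and not chunk_b:
                break
            # A length mismatch at the end also counts as a difference.
            if chunk_a != chunk_b:
                offsets.append(offset)
            offset += block_size
    return offsets
```

If the differing offsets cluster in a few regions rather than being spread
evenly, that pattern might help relate the corruption to specific writes.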

As a final aside, it seems virtualbox-ose-6.1.50 has lost the ability to simply
resume the VM after a lost iSCSI connection. This worked previously, but now I
must explicitly stop the machine using suspend-to-disk and then restart it to
get the iSCSI connection going again. This should go into another bug
report...

-- Martin
