ZFS pool hangs (live-locks?) after adding L2ARC
Date: Wed, 20 Dec 2023 13:31:15 UTC
Hello!
The system in question runs FreeBSD 13.2-STABLE stable/13-n256849-05c55eed44e5.
I have 3 ZFS pools: one "simple" pool (nda1p2), which is the system pool with BEs, root, etc., and two raidz1 pools:
"zstor", consisting of 5 HDDs (daX), and
"ztorr", consisting of 3 HDDs (adaX).
I also have an NVMe disk, nvme0 (nda0, a brand new AData Legend 960 2TB), with one GPT partition of type "freebsd-swap"
(it is NOT configured or enabled as swap in the system!). The size of this partition is 1.6T.
When I add nda0p1 (the AData partition) as a "cache" device to the "zstor" pool, it is added without problems, but later the pool hangs.
I've experienced 2 hangs:
(1) Right after I added the cache device and rebooted, the import of the pool hung. When I tried to import the pool by hand in single-user mode,
I saw one kernel thread with a name like "z_int_2_2" consuming 100% of one core.
I waited for an hour without any result. After that I physically removed the NVMe, booted successfully,
and removed the cache device from the pool with "zpool remove".
(2) After that I re-added the "cache" device and everything worked for some time (10+ days). But suddenly one filesystem on the pool
(only one!) started to livelock: if you run "ls" on this filesystem, it hangs forever; "ls" consumes 100% of one core in system time, and
again a thread with a name like "z_int_X_Y" consumes 100% of another core. "ls" cannot be killed; only a reboot (which also hangs after
"all bufs synced"!) helps. But after the reboot the problem reproduced again, with exactly the same symptoms.
This time I was able to remove the cache device with "zpool remove", without detaching it physically.
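For reference, the add and remove steps described above would look roughly like the commands below. This is only a sketch that prints the commands instead of running them; the pool and device names are the ones from this message, and the shell variable names are just for illustration.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# Pool and device names are the ones from this message;
# the variable names are hypothetical.
POOL="zstor"
CACHE_DEV="nda0p1"

# Add the partition to the pool as an L2ARC (cache) device.
ADD_CMD="zpool add ${POOL} cache ${CACHE_DEV}"

# Remove the cache device from the pool again.
REMOVE_CMD="zpool remove ${POOL} ${CACHE_DEV}"

echo "${ADD_CMD}"
echo "${REMOVE_CMD}"
```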
Status of pool:
> zpool status zstor
  pool: zstor
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 1.85T in 04:02:19 with 0 errors on Sat Dec 9 16:21:33 2023
config:

        NAME        STATE     READ WRITE CKSUM
        zstor       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da0     ONLINE       0     0     0

errors: No known data errors
I have two non-default settings for ZFS:
vfs.zfs.min_auto_ashift=12
vfs.zfs.abd_scatter_enabled=0
I cannot find any discussion of such a problem on the Internet. Also, the "live" system doesn't have these "z_int_X_Y" threads at all.
I want my L2ARC, I've paid for this NVMe!
--
// Lev Serebryakov