[Bug 218954] [ZFS] Add a sysctl to toggle zfs_free_leak_on_eio
bugzilla-noreply at freebsd.org
Sat Apr 29 12:19:07 UTC 2017
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218954
Bug ID: 218954
Summary: [ZFS] Add a sysctl to toggle zfs_free_leak_on_eio
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Keywords: patch
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: fk at fabiankeil.de
CC: freebsd-fs at FreeBSD.org
Created attachment 182174
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=182174&action=edit
sys/cddl: Add a sysctl to toggle zfs_free_leak_on_eio
The attached patch adds a sysctl to toggle zfs_free_leak_on_eio.
Setting the sysctl makes it possible to break a previously endless cycle
of ZFS collecting checksum errors for metadata.
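With the patch applied, the knob would be toggled like any other ZFS sysctl (sketch; the sysctl name is taken from the patch, and the default of 0 matches the stock zfs_free_leak_on_eio value):

```shell
# Query the current setting (0: keep retrying the frees every txg):
sysctl vfs.zfs.free_leak_on_eio
# Leak blocks whose free fails with EIO/ECKSUM instead of retrying forever:
sysctl vfs.zfs.free_leak_on_eio=1
```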
Before setting vfs.zfs.free_leak_on_eio=1:
fk at t520 ~ $zpool status cloudia2
  pool: cloudia2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 308K in 53h23m with 3358 errors on Sun Apr 16 20:33:26 2017
config:

        NAME                 STATE     READ WRITE CKSUM
        cloudia2             ONLINE       0     0   129
          label/cloudia2.eli ONLINE       0     0   516

errors: 3362 data errors, use '-v' for a list
fk at t520 ~ $zpool status -v cloudia2
[..]
errors: Permanent errors have been detected in the following files:
<0x186>:<0x28>
<0x186>:<0x35>
<0xffffffffffffffff>:<0x28>
Every five seconds, the checksum counter increased.
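The growth is easy to sample from the status output; a minimal sketch (the status line is hard-coded here, in practice it would be re-read from "zpool status cloudia2" in a loop with "sleep 5"):

```shell
# Pool-wide error counters as printed in the config section above;
# fields are NAME STATE READ WRITE CKSUM.
status_line='cloudia2 ONLINE 0 0 129'
cksum=$(echo "$status_line" | awk '{ print $5 }')
echo "CKSUM: $cksum"
```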
zfsdbg-msg output:
2017 Apr 21 11:12:43: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:12:43: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:12:48: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:12:53: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:12:58: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
fk at t520 ~ $zpool get all cloudia2
NAME      PROPERTY       VALUE                SOURCE
cloudia2  size           2.98T                -
cloudia2  capacity       54%                  -
cloudia2  altroot        -                    default
cloudia2  health         ONLINE               -
cloudia2  guid           4205907112567218706  default
cloudia2  version        -                    default
cloudia2  bootfs         -                    default
cloudia2  delegation     on                   default
cloudia2  autoreplace    off                  default
cloudia2  cachefile      -                    default
cloudia2  failmode       wait                 default
cloudia2  listsnapshots  off                  default
cloudia2  autoexpand     off                  default
cloudia2  dedupditto     0                    default
cloudia2  dedupratio     1.00x                -
cloudia2  free           1.37T                -
cloudia2  allocated      1.62T                -
cloudia2  readonly       off                  -
cloudia2  comment        -                    default
cloudia2  expandsize     -                    -
cloudia2  freeing        24.2G                default
cloudia2  fragmentation  32%                  -
cloudia2  leaked         0                    default
[...]
After setting vfs.zfs.free_leak_on_eio=1:
zfsdbg-msg output:
2017 Apr 21 11:13:03: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:13:06: freed 100000 blocks in 3050ms from free_bpobj/bptree txg 17892; err=-1
2017 Apr 21 11:13:07: bptree index 0: traversing from min_txg=1 bookmark -1/68/0/718
2017 Apr 21 11:13:08: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: freed 96110 blocks in 1927ms from free_bpobj/bptree txg 17893; err=0
2017 Apr 21 11:15:33: command: zpool clear cloudia2
The checksum error counters stopped incrementing, "freeing" dropped
to 0, and "leaked" went from 0 to 256M.
fk at t520 ~ $zpool get all cloudia2
NAME      PROPERTY       VALUE                SOURCE
cloudia2  size           2.98T                -
cloudia2  capacity       53%                  -
cloudia2  altroot        -                    default
cloudia2  health         ONLINE               -
cloudia2  guid           4205907112567218706  default
cloudia2  version        -                    default
cloudia2  bootfs         -                    default
cloudia2  delegation     on                   default
cloudia2  autoreplace    off                  default
cloudia2  cachefile      -                    default
cloudia2  failmode       wait                 default
cloudia2  listsnapshots  off                  default
cloudia2  autoexpand     off                  default
cloudia2  dedupditto     0                    default
cloudia2  dedupratio     1.00x                -
cloudia2  free           1.39T                -
cloudia2  allocated      1.59T                -
cloudia2  readonly       off                  -
cloudia2  comment        -                    default
cloudia2  expandsize     -                    -
cloudia2  freeing        0                    default
cloudia2  fragmentation  32%                  -
cloudia2  leaked         256M                 default
[...]
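Most of the 24.2G backlog was thus actually freed, with only 256M written off; a quick sanity check on the figures above (MiB conversions rounded, and the values hard-coded from the zpool get output rather than queried live via "zpool get -Hp freeing,leaked cloudia2"):

```shell
# "freeing" before and "leaked" after toggling the sysctl, in MiB
# (24.2G and 256M from the zpool get output above; 24.2 * 1024 ~= 24781).
freeing_before_mib=24781
leaked_after_mib=256
freed_mib=$((freeing_before_mib - leaked_after_mib))
echo "${freed_mib}M actually freed, ${leaked_after_mib}M leaked"
```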
The difference on the receiving side confirmed that some space had been
recovered:
[fk at kendra ~]$ zfs list -r -p -t all dpool/ggated/cloudia2
NAME                                       USED  AVAIL          REFER  MOUNTPOINT
[...]
dpool/ggated/cloudia2@2017-04-21_10:37  9251840      -  1812645106176  -
dpool/ggated/cloudia2@2017-04-21_11:17  3950592      -  1800267106304  -
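From the parsable REFER values above, the space recovered on the receiving side works out to roughly 11.5G:

```shell
# REFER of the two snapshots, in bytes (zfs list -p output above):
refer_before=1812645106176   # 10:37 snapshot
refer_after=1800267106304    # 11:17 snapshot
recovered_mib=$(( (refer_before - refer_after) / 1024 / 1024 ))
echo "recovered: ${recovered_mib}M"
```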
It's not obvious to me whether the 256M were really leaked,
but either way it looks like a clear win.
On another ZFS pool with the same issue, but backed by a USB disk, all
the space in "freeing" was supposedly "leaked"; there was a lot less
of it to begin with:
Before setting vfs.zfs.free_leak_on_eio=1:
fk at t520 /usr/src $zpool get all wde4
NAME  PROPERTY       VALUE                 SOURCE
wde4  size           1.81T                 -
wde4  capacity       94%                   -
wde4  altroot        -                     default
wde4  health         ONLINE                -
wde4  guid           14402430966328721211  default
wde4  version        -                     default
wde4  bootfs         -                     default
wde4  delegation     on                    default
wde4  autoreplace    off                   default
wde4  cachefile      -                     default
wde4  failmode       wait                  default
wde4  listsnapshots  off                   default
wde4  autoexpand     off                   default
wde4  dedupditto     0                     default
wde4  dedupratio     1.00x                 -
wde4  free           107G                  -
wde4  allocated      1.71T                 -
wde4  readonly       off                   -
wde4  comment        -                     default
wde4  expandsize     -                     -
wde4  freeing        1.18M                 default
wde4  fragmentation  23%                   -
wde4  leaked         0                     default
After setting vfs.zfs.free_leak_on_eio=1:
fk at t520 /usr/src $zpool get all wde4
NAME  PROPERTY       VALUE                 SOURCE
wde4  size           1.81T                 -
wde4  capacity       94%                   -
wde4  altroot        -                     default
wde4  health         ONLINE                -
wde4  guid           14402430966328721211  default
wde4  version        -                     default
wde4  bootfs         -                     default
wde4  delegation     on                    default
wde4  autoreplace    off                   default
wde4  cachefile      -                     default
wde4  failmode       wait                  default
wde4  listsnapshots  off                   default
wde4  autoexpand     off                   default
wde4  dedupditto     0                     default
wde4  dedupratio     1.00x                 -
wde4  free           107G                  -
wde4  allocated      1.71T                 -
wde4  readonly       off                   -
wde4  comment        -                     default
wde4  expandsize     -                     -
wde4  freeing        0                     default
wde4  fragmentation  23%                   -
wde4  leaked         1.18M                 default
[...]
The pool had been affected by the issue since 2015:
https://lists.freebsd.org/pipermail/freebsd-fs/2015-February/020845.html
Obtained from: ElectroBSD