[Bug 258208] [zfs] locks up when using rollback or destroy on both 13.0-RELEASE & sysutils/openzfs port
Date: Thu, 02 Sep 2021 10:10:42 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258208
Bug ID: 258208
Summary: [zfs] locks up when using rollback or destroy on both
13.0-RELEASE & sysutils/openzfs port
Product: Base System
Version: 13.0-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: bin
Assignee: bugs@FreeBSD.org
Reporter: dch@freebsd.org
ZFS operations such as rollback or destroy deadlock FreeBSD. Subsequent
commands such as `mount -p` or `bectl list` also hang. Writing data to
files still works. dmesg, syslog, etc. are all empty.
## environment
Tried under the "default" 13.0-RELEASE-p3 zfs, and also under the "kmod" zfs (sysutils/openzfs):
zfs-2.1.99-1
zfs-kmod-v2021073000-zfs_7eebcd2be
I will build CURRENT with debug, re-try this, and report back.
## pools
embiggen/koans dataset (zpool of 4 drive striped mirrors)
envy (nvme zpool)
  pool: embiggen
 state: ONLINE
  scan: scrub repaired 0B in 02:42:41 with 0 errors on Thu Aug 26 18:22:44 2021
config:

        NAME          STATE     READ WRITE CKSUM
        embiggen      ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/zfs0  ONLINE       0     0     0
            gpt/zfs1  ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            gpt/zfs2  ONLINE       0     0     0
            gpt/zfs3  ONLINE       0     0     0

errors: No known data errors

  pool: envy
 state: ONLINE
  scan: scrub repaired 0B in 00:07:20 with 0 errors on Thu Aug 26 15:47:42 2021
config:

        NAME        STATE     READ WRITE CKSUM
        envy        ONLINE       0     0     0
          gpt/envy  ONLINE       0     0     0

errors: No known data errors
## default 13.0-RELEASE zfs
- the parent process is not a zombie process
- not writing a coredump
- can't be attached to from lldb etc
- won't respond to `kill -CONT ...`
```
$ top -SjwbHzPp 47255
last pid: 20225; load averages: 0.78, 0.81, 0.70 up 2+02:11:14 14:14:38
3744 threads: 10 running, 3693 sleeping, 6 stopped, 35 waiting
CPU 0: 2.6% user, 0.8% nice, 1.0% system, 0.0% interrupt, 95.6% idle
CPU 1: 3.0% user, 0.9% nice, 1.1% system, 0.6% interrupt, 94.3% idle
CPU 2: 3.2% user, 1.0% nice, 1.2% system, 0.0% interrupt, 94.6% idle
CPU 3: 3.3% user, 1.0% nice, 1.2% system, 0.0% interrupt, 94.6% idle
CPU 4: 3.4% user, 1.0% nice, 1.2% system, 0.1% interrupt, 94.4% idle
CPU 5: 3.5% user, 1.0% nice, 1.2% system, 0.1% interrupt, 94.3% idle
CPU 6: 3.5% user, 1.0% nice, 1.3% system, 1.1% interrupt, 93.0% idle
CPU 7: 3.4% user, 1.0% nice, 1.2% system, 0.0% interrupt, 94.3% idle
Mem: 2489M Active, 20G Inact, 472M Laundry, 66G Wired, 8074K Buf, 35G Free
ARC: 51G Total, 34G MFU, 9441M MRU, 15M Anon, 518M Header, 7401M Other
36G Compressed, 69G Uncompressed, 1.93:1 Ratio
Swap: 252G Total, 252G Free
  PID JID USERNAME PRI NICE  SIZE  RES SWAP STATE C TIME  WCPU COMMAND
47255   0 dch       20    0 1316M 111M   0B STOP  4 0:00 0.00% beam.smp{10_dirty_io_sch}
47255   0 dch       20    0 1316M 111M   0B STOP  4 0:00 0.00% beam.smp{sys_sig_dispatc}
$ ps augx -p 47255
USER   PID %CPU %MEM     VSZ    RSS TT STAT STARTED    TIME COMMAND
dch  47255  0.0  0.1 1347516 113316  1 TX   13:57   0:03.48 /usr/local/lib/erlang24/erts-12.0.3/bin/beam.smp -- -root /usr/local/lib/erlang24 -progname erl
$ sudo procstat -kk 47255
  PID    TID COMM     TDNAME          KSTACK
47255 574625 beam.smp sys_sig_dispatc mi_switch+0xc1 thread_suspend_switch+0xc0 thread_single+0x69c exit1+0xc1 sys_sys_exit+0xd amd64_syscall+0x10c fast_syscall_common+0xf8
47255 574653 beam.smp 10_dirty_io_sch mi_switch+0xc1 _sleep+0x1cb rms_rlock_fallback+0x90 zfs_lookup+0x7e zfs_freebsd_cachedlookup+0x6b vfs_cache_lookup+0xad lookup+0x68c namei+0x487 kern_statat+0xcf sys_fstatat+0x2f amd64_syscall+0x10c fast_syscall_common+0xf8
```
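When comparing several such traces, it can help to pull out, per thread, the innermost ZFS function it is sleeping under. The following is only a rough sketch, not part of the original report: it assumes `procstat -kk` style text in which frames look like `name+0xoffset`, are listed innermost first, and may be wrapped onto continuation lines by the mail client. The helper names `parse_kstacks` and `first_zfs_frame` are made up for illustration.

```python
import re

# One kernel frame as printed by `procstat -kk`, e.g. "zfs_lookup+0x7e".
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+")


def parse_kstacks(text):
    """Return a list of (pid, tid, frames) from procstat -kk style text.

    A line starting with two numeric fields opens a new thread; any
    following lines (mail-wrapped frames) are attached to that thread.
    """
    threads = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            threads.append((int(fields[0]), int(fields[1]), []))
        if threads:
            threads[-1][2].extend(FRAME_RE.findall(line))
    return threads


def first_zfs_frame(frames):
    """Return the innermost frame whose name starts with "zfs", or None."""
    for name in frames:
        if name.startswith("zfs"):
            return name
    return None


# Sample input copied from the trace above.
SAMPLE = """\
PID TID COMM TDNAME KSTACK
47255 574625 beam.smp sys_sig_dispatc mi_switch+0xc1
thread_suspend_switch+0xc0 thread_single+0x69c exit1+0xc1 sys_sys_exit+0xd
amd64_syscall+0x10c fast_syscall_common+0xf8
47255 574653 beam.smp 10_dirty_io_sch mi_switch+0xc1
_sleep+0x1cb rms_rlock_fallback+0x90 zfs_lookup+0x7e
zfs_freebsd_cachedlookup+0x6b vfs_cache_lookup+0xad lookup+0x68c namei+0x487
kern_statat+0xcf sys_fstatat+0x2f amd64_syscall+0x10c fast_syscall_common+0xf8
"""

if __name__ == "__main__":
    for pid, tid, frames in parse_kstacks(SAMPLE):
        # Show the three innermost frames and the first ZFS function, if any.
        print(pid, tid, frames[:3], first_zfs_frame(frames))
```

For the sample, thread 574625 has no ZFS frame (it is suspended in `exit1`), while thread 574653 is sleeping in `rms_rlock_fallback` called from `zfs_lookup`.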
This situation repeats after reboot, and shows a hanging zfs rollback or
similar command each time:
```
/sbin/zfs rollback -r embiggen/...@pristine
18990 168480 zfs - mi_switch+0xc1 _vm_page_busy_sleep+0x100 vm_page_sleep_if_busy+0x28 vm_object_page_remove+0xdf vn_pages_remove+0x4c zfs_rezget+0x35 zfs_resume_fs+0x258 zfs_ioc_rollback+0x158 zfsdev_ioctl_common+0x4e3 zfsdev_ioctl+0x143 devfs_ioctl+0xc7 vn_ioctl+0x1a4 devfs_ioctl_f+0x1e kern_ioctl+0x26d sys_ioctl+0xf6 amd64_syscall+0x10c fast_syscall_common+0xf8
```