[Bug 244048] mksnap_ffs hangs machine (12.1 regression over 11.3)

bugzilla-noreply at freebsd.org
Sun Feb 21 18:51:38 UTC 2021


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244048

--- Comment #6 from ml at netfence.it ---
After some investigation, this is what I found.
(Note that I'm no kernel expert, so I hope I'm not saying anything stupid.)

The thread originating from mksnap_ffs is stuck in softdep_check_suspend,
sleeping on mp->mnt_secondary_writes (shown as wait message "secwr" by
userland utilities).
Full backtrace:
#0  sched_switch (td=0xfffff8000237e760, newtd=0xfffff8000212c760,
flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2143
#1  0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:452
#2  0xffffffff805f272b in sleepq_switch (wchan=0xfffff80004120a00, pri=119) at
/usr/src/sys/kern/subr_sleepqueue.c:626
#3  0xffffffff805f25c3 in sleepq_wait (wchan=0xfffff80004120a00, pri=119) at
/usr/src/sys/kern/subr_sleepqueue.c:705
#4  0xffffffff805a4a6b in _sleep (ident=0xfffff80004120a00, lock=<optimized
out>, priority=631, wmesg=0xffffffff80921d1b "secwr", sbt=0, pr=0, flags=256)
at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff807b6324 in softdep_check_suspend (mp=0xfffff80004120000,
devvp=0xfffff800041ff780, softdep_depcnt=15, softdep_accdepcnt=50188,
secondary_writes=1, secondary_accwrites=106) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:14299
#6  0xffffffff807c19a6 in ffs_sync (mp=0xfffff80004120000, waitfor=4) at
/usr/src/sys/ufs/ffs/ffs_vfsops.c:1620
#7  0xffffffff8067c8bf in vfs_write_suspend (mp=0xfffff80004120000, flags=0) at
/usr/src/sys/kern/vfs_vnops.c:1864
#8  0xffffffff8079c0b9 in ffs_snapshot (mp=0xfffff80004120000,
snapfile=0xfffff80004214780
"A\003\215\200\377\377\377\377\070\232\254\200\377\377\377\377 \b\361\035") at
/usr/src/sys/ufs/ffs/ffs_snapshot.c:430
#9  0xffffffff807bfe5a in ffs_mount (mp=<unavailable>) at
/usr/src/sys/ufs/ffs/ffs_vfsops.c:479
#10 0xffffffff80661a54 in vfs_domount_update (td=0xfffff80080ac9401,
vp=<optimized out>, fsflags=<optimized out>, optlist=<optimized out>) at
/usr/src/sys/kern/vfs_mount.c:1037
#11 vfs_domount (td=0xfffff80080ac9401, fstype=<optimized out>,
fspath=<optimized out>, fsflags=<optimized out>, optlist=0xfffffe000053aa38) at
/usr/src/sys/kern/vfs_mount.c:1191
#12 0xffffffff80660b27 in vfs_donmount (td=0xfffff8000237e760, fsflags=2166784,
fsoptions=0xfffff80004108600) at /usr/src/sys/kern/vfs_mount.c:726
#13 0xffffffff80660312 in sys_nmount (td=0xfffff8000237e760,
uap=0xfffff8000237eb20) at /usr/src/sys/kern/vfs_mount.c:431
#14 0xffffffff808418b7 in syscallenter (td=0xfffff8000237e760) at
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#15 amd64_syscall (td=0xfffff8000237e760, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:1163
#16 <signal handler called>
#17 0x00000008002dcb9a in ?? ()

I *think* it should be awakened by the softdep_flush thread, in the function
process_worklist_item.
Alas, that thread is itself stuck waiting for a buffer in bufspace_wait.
Full backtrace:
#0  sched_switch (td=0xfffff8000425f760, newtd=0xfffff800040df000,
flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2143
#1  0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:452
#2  0xffffffff805f272b in sleepq_switch (wchan=0xffffffff80a0a8b8
<bdomain+33208>, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:626
#3  0xffffffff805f25c3 in sleepq_wait (wchan=0xffffffff80a0a8b8
<bdomain+33208>, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:705
#4  0xffffffff805a4a6b in _sleep (ident=0xffffffff80a0a8b8 <bdomain+33208>,
lock=<optimized out>, priority=96, wmesg=0xffffffff808e5676 "newbuf", sbt=0,
pr=0, flags=256) at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff8064f6ff in bufspace_wait (bd=0xffffffff80a02700 <bdomain>,
vp=0xfffff800042145a0, gbflags=<optimized out>, slpflag=<optimized out>,
slptimeo=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:773
#6  0xffffffff8064bbfc in getnewbuf (vp=<optimized out>, slpflag=0, slptimeo=0,
maxsize=32768, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:3284
#7  0xffffffff80649155 in getblkx (vp=0xfffff800042145a0, blkno=<optimized
out>, size=32768, slpflag=0, slptimeo=0, flags=0, bpp=0xfffffe00005853d8) at
/usr/src/sys/kern/vfs_bio.c:4022
#8  0xffffffff8064b905 in getblk (vp=<unavailable>, blkno=<unavailable>,
size=<unavailable>, slpflag=<unavailable>, slptimeo=<unavailable>,
flags=<unavailable>) at /usr/src/sys/kern/vfs_bio.c:3802
#9  0xffffffff807c69b5 in readindir (vp=<unavailable>, lbn=<unavailable>,
daddr=56528, bpp=0xfffffe0000585478) at /usr/src/sys/ufs/ufs/ufs_bmap.c:111
#10 0xffffffff807c6468 in ufs_bmaparray (vp=0xfffff800042145a0, bn=-6049804,
bnp=0xfffffe0000585518, nbp=<optimized out>, runp=<optimized out>, runb=0x0) at
/usr/src/sys/ufs/ufs/ufs_bmap.c:266
#11 0xffffffff807d3975 in ufs_strategy (ap=<optimized out>) at
/usr/src/sys/ufs/ufs/ufs_vnops.c:2309
#12 0xffffffff808b9911 in VOP_STRATEGY_APV (vop=0xffffffff80aca540
<ufs_vnodeops>, a=0xfffffe0000585570) at vnode_if.c:2279
#13 0xffffffff80647114 in VOP_STRATEGY (vp=<unavailable>,
bp=0xfffffe000090b5c0) at ./vnode_if.h:940
#14 bufstrategy (bo=<optimized out>, bp=0xfffffe000090b5c0) at
/usr/src/sys/kern/vfs_bio.c:4999
#15 0xffffffff80648b1e in bstrategy (bp=<optimized out>) at
/usr/src/sys/sys/buf.h:419
#16 breadn_flags (vp=<optimized out>, blkno=<optimized out>, size=<optimized
out>, rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, flags=0, ckhashfunc=0x0,
bpp=0xfffffe0000585780) at /usr/src/sys/kern/vfs_bio.c:2181
#17 0xffffffff807982b8 in ffs_balloc_ufs2 (vp=<optimized out>,
startoffset=<optimized out>, size=<optimized out>, cred=0xfffff8000211b000,
flags=<optimized out>, bpp=0xfffffe0000585820) at
/usr/src/sys/ufs/ffs/ffs_balloc.c:894
#18 0xffffffff8079fb22 in ffs_snapblkfree (fs=0xfffffe0011c76000,
devvp=<optimized out>, bno=48416024, size=32768, inum=5, vtype=VREG,
wkhd=0xfffffe0000585950) at /usr/src/sys/ufs/ffs/ffs_snapshot.c:1790
#19 0xffffffff80790d16 in ffs_blkfree (ump=0xfffff80004116800,
fs=0xfffffe0011c76000, devvp=0xfffff800041ff780, bno=48416024, size=32768,
inum=5, vtype=VREG, dephd=0xfffffe0000585950, key=2) at
/usr/src/sys/ufs/ffs/ffs_alloc.c:2602
#20 0xffffffff807baa4f in indir_trunc (freework=0xfffff800048ed480,
dbn=<optimized out>, lbn=<optimized out>) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:8259
#21 0xffffffff807ba93b in indir_trunc (freework=0xfffff800048ed480,
dbn=<optimized out>, lbn=<optimized out>) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:8240
#22 0xffffffff807ad4f9 in handle_workitem_indirblk (freework=<optimized out>)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:7875
#23 handle_workitem_freeblocks (freeblks=0xfffff800048eda00, flags=512) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:7970
#24 0xffffffff807b5ba1 in process_worklist_item (mp=0xfffff80004120000,
target=10, flags=512) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1806
#25 0xffffffff807a1e92 in softdep_process_worklist (mp=0xfffff80004120000,
full=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1600
#26 0xffffffff807a580f in softdep_flush (addr=0xfffff80004120000) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:1402
#27 0xffffffff80556f8c in fork_exit (callout=0xffffffff807a5720
<softdep_flush>, arg=0xfffff80004120000, frame=0xfffffe0000585c00) at
/usr/src/sys/kern/kern_fork.c:1080
#28 <signal handler called>
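
To compare the two stacks without reading every wrapped line, a small helper along these lines may be useful. This is only a sketch: the regular expressions are assumptions based on the gdb output format shown in this message, and the names frame_functions and wait_message are made up for illustration (they are not part of gdb or FreeBSD).

```python
import re
from typing import List, Optional, Tuple

# Start of a gdb frame as it appears in the traces above, for example:
#   "#5  0xffffffff807b6324 in softdep_check_suspend (mp=..."
#   "#11 vfs_domount (td=..."          (no address on this frame)
#   "#16 <signal handler called>"
# Assumption: wrapped continuation lines never start with "#<number>".
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(<[^>]+>|[A-Za-z_?][\w?]*)"
)


def frame_functions(backtrace: str) -> List[Tuple[int, str]]:
    """Return (frame number, function name) pairs from gdb 'bt' output."""
    frames = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2)))
    return frames


def wait_message(backtrace: str) -> Optional[str]:
    """Return the first wmesg string passed to _sleep, if one is present."""
    joined = " ".join(backtrace.split())  # undo the mail's line wrapping
    m = re.search(r'wmesg=0x[0-9a-f]+ "([^"]*)"', joined)
    return m.group(1) if m else None


# Frames #4 and #5 of the mksnap_ffs trace, wrapped exactly as in the mail.
SAMPLE = '''#4  0xffffffff805a4a6b in _sleep (ident=0xfffff80004120a00, lock=<optimized
out>, priority=631, wmesg=0xffffffff80921d1b "secwr", sbt=0, pr=0, flags=256)
at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff807b6324 in softdep_check_suspend (mp=0xfffff80004120000,
devvp=0xfffff800041ff780, softdep_depcnt=15, softdep_accdepcnt=50188,
secondary_writes=1, secondary_accwrites=106) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:14299
'''

print(frame_functions(SAMPLE))
print(wait_message(SAMPLE))
```

Run on the excerpt, this lists frames 4 (_sleep) and 5 (softdep_check_suspend) and reports the wait message "secwr". Feeding it each full trace gives a compact call path and the wait message for each thread.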

The above explains why mksnap_ffs is halted; however, the whole machine is
hung.
It seems almost any user thread (e.g. an "ls" *on a different filesystem*) gets
stuck in bufspace_wait.
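
The blocking relationships described so far can be written down as a toy wait-for table. This is a sketch of my reading of the traces, not a statement about the kernel: the table entries are hypotheses, the "ls" entry assumes it sleeps with the same "newbuf" message seen in the softdep_flush trace, and wait_chain is a made-up helper name.

```python
from typing import Dict, List, Optional, Tuple

# thread -> (wait message, thread expected to wake it up).
# None means this report does not identify who should do the wakeup.
# All entries are hypotheses drawn from the backtraces in this message.
WAITS: Dict[str, Tuple[str, Optional[str]]] = {
    "mksnap_ffs": ("secwr", "softdep_flush"),
    "softdep_flush": ("newbuf", None),
    "ls": ("newbuf", None),  # assumption: same sleep as softdep_flush
}


def wait_chain(thread: Optional[str],
               waits: Dict[str, Tuple[str, Optional[str]]]
               ) -> List[Tuple[str, str]]:
    """Follow 'who should wake me' links and return (thread, wmesg) pairs."""
    chain = []
    seen = set()
    while thread is not None and thread in waits and thread not in seen:
        seen.add(thread)  # guard against cycles in the table
        wmesg, waker = waits[thread]
        chain.append((thread, wmesg))
        thread = waker
    return chain


for name in ("mksnap_ffs", "ls"):
    print(name, "->", wait_chain(name, WAITS))
```

Under these assumptions, both chains end in a "newbuf" sleep with no known waker, so the open question is why buffer space never becomes available.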

Neither the buf_daemon thread nor its child bufspace_daemon is stuck (both are
running their normal loops).


So far I haven't been able to pinpoint what changed between 11.4 and 12.1 to
cause this.
Any hints appreciated.


