[Bug 224292] processes are hanging in state ufs / possible deadlock in file system

bugzilla-noreply at freebsd.org
Wed Mar 31 08:12:59 UTC 2021


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292

--- Comment #18 from sigsys at gmail.com ---
(In reply to Konstantin Belousov from comment #17)
This sure seems to have helped.  I was about to report that the problem is most
likely gone since it hadn't happened in a while (despite running kyua in a loop
for hours) after getting that patch series.

But then it happened again with chrome this time and I got a dump.  Dunno if
running "sync" would have unwedged the whole thing since I made it panic
instead.

There were two threads from two processes looping and doing crazy I/O: a chrome
process and a zsh process.

zsh thread backtrace:

#0  sched_switch (td=td@entry=0xfffffe00aa35ce00, flags=<optimized out>,
flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at
/usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xfffffe00097da0a8,
pri=92, pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6f7fe in sleepq_wait (wchan=<optimized out>, pri=<optimized
out>) at /usr/src/sys/kern/subr_sleepqueue.c:659
#4  0xffffffff80c1e9e6 in _sleep (ident=ident@entry=0xfffffe00097da0a8,
lock=<optimized out>, lock@entry=0xfffffe000863b0c0,
priority=priority@entry=92, wmesg=<optimized out>, sbt=sbt@entry=0,
pr=pr@entry=0, flags=256) at /usr/src/sys/kern/kern_synch.c:221
#5  0xffffffff80cd5214 in bwait (bp=0xfffffe00097da0a8, pri=92 '\\',
wchan=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:5020
#6  bufwait (bp=bp@entry=0xfffffe00097da0a8) at
/usr/src/sys/kern/vfs_bio.c:4433
#7  0xffffffff80cd285a in bufwrite (bp=0xfffffe00097da0a8, bp@entry=<error
reading variable: value is not available>) at /usr/src/sys/kern/vfs_bio.c:2305
#8  0xffffffff80f01789 in bwrite (bp=<unavailable>) at
/usr/src/sys/sys/buf.h:430
#9  ffs_update (vp=vp@entry=0xfffff80004c61380, waitfor=waitfor@entry=1) at
/usr/src/sys/ufs/ffs/ffs_inode.c:204
#10 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80004c61380,
waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0)
at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#11 0xffffffff80f0f91d in softdep_prelink (dvp=dvp@entry=0xfffff80004c61380,
vp=vp@entry=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#12 0xffffffff80f3fee3 in ufs_makeinode (mode=33188, dvp=0xfffff80004c61380,
vpp=0xfffffe00aae0a9d8, cnp=<unavailable>, callfunc=<unavailable>) at
/usr/src/sys/ufs/ufs/ufs_vnops.c:2741
#13 0xffffffff80f3bfa4 in ufs_create (ap=0xfffffe00aae0a8a8) at
/usr/src/sys/ufs/ufs/ufs_vnops.c:213
#14 0xffffffff8118a31d in VOP_CREATE_APV (vop=0xffffffff81b63158
<ffs_vnodeops2>, a=a@entry=0xfffffe00aae0a8a8) at vnode_if.c:244
#15 0xffffffff80d15233 in VOP_CREATE (dvp=<unavailable>,
vpp=0xfffffe00aae0a9d8, cnp=0xfffffe00aae0aa00, vap=0xfffffe00aae0a7f0) at
./vnode_if.h:133
#16 vn_open_cred (ndp=ndp@entry=0xfffffe00aae0a968,
flagp=flagp@entry=0xfffffe00aae0aa94, cmode=cmode@entry=420,
vn_open_flags=<optimized out>, vn_open_flags@entry=0, cred=0xfffff80048d42e00,
fp=0xfffff8010aeabc30) at /usr/src/sys/kern/vfs_vnops.c:285
#17 0xffffffff80d14f6d in vn_open (ndp=<unavailable>,
ndp@entry=0xfffffe00aae0a968, flagp=<unavailable>,
flagp@entry=0xfffffe00aae0aa94, cmode=<unavailable>, cmode@entry=420,
fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:202
#18 0xffffffff80d08999 in kern_openat (td=0xfffffe00aa35ce00, fd=-100,
path=0x8002fd420 <error: Cannot access memory at address 0x8002fd420>,
pathseg=UIO_USERSPACE, flags=34306, mode=<optimized out>) at
/usr/src/sys/kern/vfs_syscalls.c:1142
#19 0xffffffff810c5803 in syscallenter (td=<optimized out>) at
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#20 amd64_syscall (td=0xfffffe00aa35ce00, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:1156
#21 <signal handler called>
#22 0x00000008004f223a in ?? ()

chrome thread backtrace:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1475
#1  0xffffffff8108afe9 in ipi_nmi_handler () at
/usr/src/sys/x86/x86/mp_x86.c:1432
#2  0xffffffff810c4256 in trap (frame=0xfffffe0009848f30) at
/usr/src/sys/amd64/amd64/trap.c:201
#3  <signal handler called>
#4  vtpci_legacy_notify_vq (dev=<optimized out>, queue=0, offset=16) at
/usr/src/sys/dev/virtio/pci/virtio_pci_legacy.c:485
#5  0xffffffff80a45417 in VIRTIO_BUS_NOTIFY_VQ (dev=0xfffff8000362fb00,
queue=0, offset=16) at ./virtio_bus_if.h:144
#6  vq_ring_notify_host (vq=0xfffffe0063e27000) at
/usr/src/sys/dev/virtio/virtqueue.c:834
#7  virtqueue_notify (vq=0xfffffe0063e27000, vq@entry=0xfffff8004de6f600) at
/usr/src/sys/dev/virtio/virtqueue.c:439
#8  0xffffffff80a538c0 in vtblk_startio (sc=sc@entry=0xfffff8000362f100) at
/usr/src/sys/dev/virtio/block/virtio_blk.c:1123
#9  0xffffffff80a53bed in vtblk_strategy (bp=0xfffff8004de6f600) at
/usr/src/sys/dev/virtio/block/virtio_blk.c:571
#10 0xffffffff80b4bcfc in g_disk_start (bp=<optimized out>) at
/usr/src/sys/geom/geom_disk.c:473
#11 0xffffffff80b4f147 in g_io_request (bp=0xfffff80021d33c00, cp=<optimized
out>, cp@entry=0xfffff8000398ce80) at /usr/src/sys/geom/geom_io.c:589
#12 0xffffffff80b5b1a9 in g_part_start (bp=0xfffff8004e974900) at
/usr/src/sys/geom/part/g_part.c:2332
#13 0xffffffff80b4f147 in g_io_request (bp=0xfffff8004e974900, cp=<optimized
out>) at /usr/src/sys/geom/geom_io.c:589
#14 0xffffffff80cd284c in bstrategy (bp=0xfffffe0008ac5388) at
/usr/src/sys/sys/buf.h:442
#15 bufwrite (bp=0xfffffe0008ac5388) at /usr/src/sys/kern/vfs_bio.c:2302
#16 0xffffffff80f01789 in bwrite (bp=0x0) at /usr/src/sys/sys/buf.h:430
#17 ffs_update (vp=vp@entry=0xfffff80139495000, waitfor=waitfor@entry=1) at
/usr/src/sys/ufs/ffs/ffs_inode.c:204
#18 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80139495000,
waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0)
at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#19 0xffffffff80f0f86f in softdep_prelink (dvp=dvp@entry=0xfffff80139495000,
vp=vp@entry=0xfffff8013c8328c0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#20 0xffffffff80f3d797 in ufs_remove (ap=0xfffffe00aabdfa20) at
/usr/src/sys/ufs/ufs/ufs_vnops.c:1011
#21 0xffffffff8118bf90 in VOP_REMOVE_APV (vop=0xffffffff81b63158
<ffs_vnodeops2>, a=a@entry=0xfffffe00aabdfa20) at vnode_if.c:1540
#22 0xffffffff80d0a468 in VOP_REMOVE (dvp=0x0, vp=0xfffff8013c8328c0,
cnp=<optimized out>) at ./vnode_if.h:802
#23 kern_funlinkat (td=0xfffffe00aa6e3100, dfd=dfd@entry=-100, path=0x8288d40e0
<error: Cannot access memory at address 0x8288d40e0>, fd=<optimized out>,
fd@entry=-200, pathseg=pathseg@entry=UIO_USERSPACE, flag=<optimized out>,
flag@entry=0, oldinum=0) at /usr/src/sys/kern/vfs_syscalls.c:1927
#24 0xffffffff80d0a138 in sys_unlink (td=0xfffff8000362fb00, uap=<optimized
out>) at /usr/src/sys/kern/vfs_syscalls.c:1808
#25 0xffffffff810c5803 in syscallenter (td=<optimized out>) at
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#26 amd64_syscall (td=0xfffffe00aa6e3100, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:1156
#27 <signal handler called>
#28 0x000000080e40d17a in ?? ()

syncer backtrace:

#0  sched_switch (td=td@entry=0xfffffe00a5e29100, flags=<optimized out>,
flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at
/usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xffffffff81fa9550
<sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6fe3b in sleepq_timedwait
(wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at
/usr/src/sys/kern/subr_sleepqueue.c:690
#4  0xffffffff80ba34b0 in _cv_timedwait_sbt (cvp=0xffffffff81fa9550
<sync_wakeup>, lock=0xffffffff81fa9520 <sync_mtx>, sbt=<optimized out>,
pr=<optimized out>, pr@entry=0, flags=0, flags@entry=256) at
/usr/src/sys/kern/kern_condvar.c:312
#5  0xffffffff80d036dc in sched_sync () at /usr/src/sys/kern/vfs_subr.c:2739
#6  0xffffffff80bcb9a0 in fork_exit (callout=0xffffffff80d03090 <sched_sync>,
arg=0x0, frame=0xfffffe006a491c00) at /usr/src/sys/kern/kern_fork.c:1077
#7  <signal handler called>

It seems like some kind of livelock involving ERELOOKUP loops.  I can only
guess though; softupdates is way too complicated for me.
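
To make the guess a bit more concrete, here is a rough userspace sketch of the
kind of ERELOOKUP loop I have in mind.  It is not the kernel code and I am not
claiming this is what actually happens: fake_dir, fake_vop, fake_syscall and
the redirty flag are all made up for illustration.  The only thing borrowed
from the real tree is the ERELOOKUP name (defined as -5 in sys/errno.h, as far
as I know).  The idea is that the VOP flushes the directory and asks the
caller to retry, and if something else keeps adding new dependencies to the
same directory in between, the retry never succeeds and every attempt costs a
synchronous write.

```c
#include <stdbool.h>

/* Kernel-internal "retry the lookup" code; value assumed from sys/errno.h. */
#define ERELOOKUP (-5)

/* Hypothetical stand-in for a directory with pending softdep work. */
struct fake_dir {
	int pending_deps;	/* work that must be flushed before a link change */
};

/*
 * Hypothetical stand-in for the prelink step of a VOP: if the directory
 * still has pending work, flush it and ask the caller to redo the lookup.
 */
static int
fake_vop(struct fake_dir *dp)
{
	if (dp->pending_deps > 0) {
		dp->pending_deps = 0;	/* the synchronous flush, i.e. the I/O */
		return (ERELOOKUP);
	}
	return (0);
}

/*
 * Hypothetical syscall-level retry loop.  "redirty" models another thread
 * adding new dependencies between our flush and our retry.  Returns the
 * number of attempts used, or -1 if max_tries ran out (the livelock case).
 */
int
fake_syscall(struct fake_dir *dp, bool redirty, int max_tries)
{
	for (int tries = 1; tries <= max_tries; tries++) {
		if (fake_vop(dp) != ERELOOKUP)
			return (tries);
		if (redirty)
			dp->pending_deps++;
	}
	return (-1);
}
```

With redirty false the second attempt goes through; with redirty true the loop
only ends because of max_tries, which would match two threads each stuck
flushing in softdep_prelink() over and over.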

That's with cb0dd7e122b8936ad61a141e65ef8ef874bfebe5 merged.  This kernel has
some local changes and I'm a little bit worried that this might be the problem
but I think it's unlikely.  The problem happens pretty rarely and that's the
only -CURRENT install on UFS that I'm working with so that's the best that I've
got.  That's with a virtio disk backed by a ZFS volume on bhyve BTW.
