Deadlock in UFS/SU+J?
Lucius Windschuh
lwindschuh at googlemail.com
Thu Aug 12 07:37:05 UTC 2010
Dear list members,
I tried to reproduce another bug on my test machine (i386, CURRENT
r211175), but ran into the following deadlock:
lock order reversal:
1st 0xc66677c8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2101
2nd 0xe5ba2460 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11291
3rd 0xc8647af8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2101
KDB: stack backtrace:
db_trace_self_wrapper(c08cfabe,f60ee2d4,c06441c5,c06339ab,c08d2b4d,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c06339ab,c08d2b4d,c5112100,c5115b18,f60ee330,...) at
kdb_backtrace+0x29
_witness_debugger(c08d2b4d,c8647af8,c08c4963,c5115b18,c08d9dc5,...) at
_witness_debugger+0x25
witness_checkorder(c8647af8,9,c08d9dc5,835,0,...) at witness_checkorder+0x839
__lockmgr_args(c8647af8,80100,c8647b18,0,0,...) at __lockmgr_args+0x7f4
ffs_lock(f60ee458,c0643f6b,c08d9267,80100,c8647aa0,...) at ffs_lock+0x8a
VOP_LOCK1_APV(c0945da0,f60ee458,c82ea364,c095e280,c8647aa0,...) at
VOP_LOCK1_APV+0xb5
_vn_lock(c8647aa0,80100,c08d9dc5,835,4,...) at _vn_lock+0x5e
vget(c8647aa0,80100,c82ea2c0,50,0,...) at vget+0xb9
vfs_hash_get(c5ea2a20,5c000e,80000,c82ea2c0,f60ee5a8,...) at vfs_hash_get+0xe6
ffs_vgetf(c5ea2a20,5c000e,80000,f60ee5a8,1,...) at ffs_vgetf+0x49
softdep_sync_metadata(c6667770,0,c08edf39,144,0,...) at
softdep_sync_metadata+0xc92
ffs_syncvnode(c6667770,1,f60ee664,246,c666781c,...) at ffs_syncvnode+0x3e2
ffs_truncate(c6667770,200,0,880,c5e4f780,...) at ffs_truncate+0x87e
ufs_direnter(c6667770,c8647aa0,f60ee91c,f60eeba4,0,...) at ufs_direnter+0x8d4
ufs_makeinode(f60eeba4,0,f60eeb00,f60eea64,c0879745,...) at ufs_makeinode+0x517
ufs_create(f60eeb00,f60eeb18,0,0,f60eeb78,...) at ufs_create+0x30
VOP_CREATE_APV(c0945da0,f60eeb00,2,c0643f6b,0,...) at VOP_CREATE_APV+0xa5
vn_open_cred(f60eeb78,f60eec2c,180,0,c5e4f780,...) at vn_open_cred+0x1d3
vn_open(f60eeb78,f60eec2c,180,c5f077e0,0,...) at vn_open+0x3b
kern_openat(c82ea2c0,ffffff9c,2848cc80,0,a03,...) at kern_openat+0xf5
kern_open(c82ea2c0,2848cc80,0,a02,180,...) at kern_open+0x35
open(c82ea2c0,f60eecec,f60eed28,c08d1426,0,...) at open+0x30
syscallenter(c82ea2c0,f60eece4,f60eece4,0,c097b8f0,...) at syscallenter+0x23f
syscall(f60eed28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (5, FreeBSD ELF32, open), eip = 0x282a2623, esp =
0xbfbfb08c, ebp = 0xbfbfb538 ---
The test, upgrading lang/gcc44 in an endless loop, gut stuck sometime
in the night when /etc/periodic/daily/450.status-security was running
in parallel:
$ ps axl | grep ufs
0 44219 44215 0 76 0 13624 7200 ufs D ?? 0:03.19
find -sx / /usr /zfs/test /dev/null -type f ( -perm -u+x -or -perm
-g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {}
+
0 44713 2638 0 44 0 9752 1304 piperd S+ 3 0:00.01 grep ufs
0 43824 43823 0 76 0 10532 1368 ufs D+ 2 0:00.02
/usr/ports/lang/gcc44/work/build/./prev-gcc/collect2 -dynamic-linker
/libexec/ld-elf.so.1 -o cc1 /usr/lib/crt1.o /usr/lib/crti.o
/usr/ports/lang/gcc44/work/build/./prev-gcc/crtbegin.o
-L/usr/local/lib -L/usr/ports/lang/gcc44/work/build/./prev-gcc
-L/usr/local/i386-portbld-freebsd9.0/bin c-lang.o stub-objc.o
attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o
c-convert.o c-aux-info.o c-common.o c-opts.o c-format.o c-semantics.o
c-ppoutput.o c-cppbuiltin.o c-objc-common.o c-dump.o c-pch.o
c-parser.o i386-c.o c-gimplify.o tree-mudflap.o c-pretty-print.o
c-omp.o cc1-checksum.o main.o libbackend.a ../libcpp/libcpp.a
../libdecnumber/libdecnumber.a ../libcpp/libcpp.a
/usr/local/lib/libiconv.so -rpath /usr/local/lib
../libiberty/libiberty.a ../libdecnumber/libdecnumber.a -lmpfr -lgmp
-lgcc -lgcc_eh -lc -lgcc -lgcc_eh
/usr/ports/lang/gcc44/work/build/./prev-gcc/crtend.o /usr/lib/crtn.o
0 43828 43780 0 76 0 9608 3580 ufs D+ 2 0:00.02
/usr/local/bin/as -I. -I. -I.././../gcc-4.4-20100803/gcc
-I.././../gcc-4.4-20100803/gcc/.
-I.././../gcc-4.4-20100803/gcc/../include
-I.././../gcc-4.4-20100803/gcc/../libcpp/include -I/usr/local/include
-I.././../gcc-4.4-20100803/gcc/../libdecnumber
-I.././../gcc-4.4-20100803/gcc/../libdecnumber/dpd -I../libdecnumber
-I/usr/local/include -o gcc.o /var/tmp//ccxkWkSs.s
Short backtraces of the affected processes:
# procstat -kk 44219
PID TID COMM TDNAME KSTACK
44219 100252 find - mi_switch+0x200
sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf
ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9
cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5
lookup+0x5b3 namei+0x4af kern_statat_vnhook+0x72 kern_statat+0x3c
kern_lstat+0x36 lstat+0x2f syscallenter+0x23f
# procstat -kk 43824
PID TID COMM TDNAME KSTACK
43824 100082 collect2 - mi_switch+0x200
sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf
ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9
cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5
lookup+0x5b3 namei+0x4af vn_open_cred+0x2ed vn_open+0x3b
kern_openat+0xf5 kern_open+0x35 open+0x30
# procstat -kk 43828
PID TID COMM TDNAME KSTACK
43828 100276 as - mi_switch+0x200
sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf
ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9
cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5
lookup+0x5b3 namei+0x4af vn_open_cred+0x2ed vn_open+0x3b
kern_openat+0xf5 kern_open+0x35 open+0x30
As you see, both gcc-related process hang in open(), the find process
hangs in lstat(). I don't know which one got stuck first, but I am
confident that I ran into the deadlock described by WITNESS above.
Is there anything I can help with? The machine is still hanging in
this state, entering the debugger is possible, generating a coredump
should work (did hang the last time, but since than, I updated the
kernel).
Kernel config: https://sites.google.com/site/lwfreebsd/Home/files/kernelconfig-CURRENT.txt.gz?attredirects=0&d=1
DDB output from a textdump simulation:
https://sites.google.com/site/lwfreebsd/Home/files/current-r211175-UFS-deadlock.txt.gz?attredirects=0&d=1
Regards
Lucius
More information about the freebsd-current
mailing list