kern/160662: Snapshots cause a lockup on UFS with SU+J enabled

Hans Ottevanger hans at beastielabs.net
Sun Sep 11 15:50:08 UTC 2011


>Number:         160662
>Category:       kern
>Synopsis:       Snapshots cause a lockup on UFS with SU+J enabled
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Sep 11 15:50:07 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Hans Ottevanger
>Release:        9.0-BETA2
>Organization:
>Environment:
FreeBSD testp4.beastielabs.net 9.0-BETA2 FreeBSD 9.0-BETA2 #0: Wed Aug 31 17:26:34 UTC 2011     root at obrian.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
On a UFS filesystem with SU+J enabled, attempting to make a snapshot with
mksnap_ffs causes the system to lock up completely after a while, requiring a
reset to recover.

This is not the extreme slowdown due to the snapshot taking all available
disk bandwidth: the system becomes fully unresponsive, i.e. it does not react
to keyboard or mouse input, and remote ssh sessions just stop. However, the
system remains pingable.

If journalling is disabled by running tunefs -j disable <fs> (in single
user mode, if needed), making a snapshot will succeed again.
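
A minimal sketch of how the journaling state could be checked before applying
this workaround. It assumes that tunefs -p prints a line containing
"soft update journaling" that ends in "enabled" or "disabled", and that this
output goes to stderr. The helper name suj_state is hypothetical and not part
of any FreeBSD tool.

```sh
# suj_state (hypothetical helper): read `tunefs -p` output on stdin and
# print the soft update journaling state ("enabled" or "disabled").
# Prints "unknown" if no matching line is found.
suj_state() {
    awk '/soft update journaling/ { state = $NF; found = 1 }
         END { if (found) print state; else print "unknown" }'
}

# Usage (as root; /usr is the filesystem from this report):
#   tunefs -p /usr 2>&1 | suj_state
# If this prints "enabled", the workaround is:
#   tunefs -j disable /usr
# with the filesystem unmounted or mounted read-only.
```

The helper only parses text, so it can be tried on saved tunefs output
without touching the filesystem.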

The same two lock order reversals occur in both cases, identical apart from the addresses.
These are the ones for the SU+J case:

lock order reversal:
 1st 0xc6347498 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
 2nd 0xdf326728 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 3rd 0xc603baf8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
KDB: stack backtrace:
db_trace_self_wrapper(c0efdd0c,616e735f,6f687370,3a632e74,a363435,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0a415fb,c0f016fc,c5965370,c5969198,c57a8404,...) at kdb_backtrace+0x2a
_witness_debugger(c0f016fc,c603baf8,c0ef0968,c5969198,c0f2c002,...) at _witness_debugger+0x25
witness_checkorder(c603baf8,9,c0f2c002,222,0,...) at witness_checkorder+0x839
__lockmgr_args(c603baf8,80100,c603bb18,0,0,...) at __lockmgr_args+0x824
ffs_lock(c57a852c,c11dd3c8,c5ee5390,80100,c603baa0,...) at ffs_lock+0x8a
VOP_LOCK1_APV(c1047760,c57a852c,c57a854c,c1057e00,c603baa0,...) at VOP_LOCK1_APV+0xb5
_vn_lock(c603baa0,80100,c0f2c002,222,c598de80,...) at _vn_lock+0x5e
ffs_snapshot(c5ec4798,c5aea300,c0f2f410,1a2,0,...) at ffs_snapshot+0x14fc
ffs_mount(c5ec4798,c5ca6000,ff,394,c5962450,...) at ffs_mount+0x1c13
vfs_donmount(c5ee52e0,11000,c5997d80,c5997d80,c5f0d588,...) at vfs_donmount+0x1219
nmount(c5ee52e0,c57a8cec,c57a8d28,c0efffda,0,...) at nmount+0x84
syscallenter(c5ee52e0,c57a8ce4,c57a8ce4,0,0,...) at syscallenter+0x263
syscall(c57a8d28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280dc61b, esp = 0xbfbfe56c, ebp = 0xbfbfece8 ---
lock order reversal:
 1st 0xdf326728 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 2nd 0xc5995a1c snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
KDB: stack backtrace:
db_trace_self_wrapper(c0efdd0c,662f7366,735f7366,7370616e,2e746f68,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0a415fb,c0f016e3,c5965370,c5969540,c57a8404,...) at kdb_backtrace+0x2a
_witness_debugger(c0f016e3,c5995a1c,c0f2c064,c5969540,c0f2c002,...) at _witness_debugger+0x25
witness_checkorder(c5995a1c,9,c0f2c002,332,c63474b8,...) at witness_checkorder+0x839
__lockmgr_args(c5995a1c,80400,c63474b8,0,0,...) at __lockmgr_args+0x824
ffs_lock(c57a852c,df2f7f68,100000,80400,c6347440,...) at ffs_lock+0x8a
VOP_LOCK1_APV(c1047760,c57a852c,df2f7fc4,c1057e00,c6347440,...) at VOP_LOCK1_APV+0xb5
_vn_lock(c6347440,80400,c0f2c002,332,0,...) at _vn_lock+0x5e
ffs_snapshot(c5ec4798,c5aea300,c0f2f410,1a2,0,...) at ffs_snapshot+0x298e
ffs_mount(c5ec4798,c5ca6000,ff,394,c5962450,...) at ffs_mount+0x1c13
vfs_donmount(c5ee52e0,11000,c5997d80,c5997d80,c5f0d588,...) at vfs_donmount+0x1219
nmount(c5ee52e0,c57a8cec,c57a8d28,c0efffda,0,...) at nmount+0x84
syscallenter(c5ee52e0,c57a8ce4,c57a8ce4,0,0,...) at syscallenter+0x263
syscall(c57a8d28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280dc61b, esp = 0xbfbfe56c, ebp = 0xbfbfece8

It is not clear if the LORs are related to the lockup.

This issue was observed on i386 (2.4 GHz P4, 2 GByte RAM, 500 GByte PATA disk) running 9.0-BETA2 as distributed, and it is also 100% reproducible on amd64 running a more recent 9.0-BETA2.
>How-To-Repeat:
Attempt to make a snapshot of the /usr filesystem (32 GByte in my case), which has SU+J enabled by default. This can be done by typing (as root):

cd /usr; mksnap_ffs /usr/.snap/testsnap

After a few seconds of heavy disk activity the system locks up as described.
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

