System deadlock when using mksnap_ffs

Tim Bishop tim-lists at bishnet.net
Wed Nov 12 16:41:11 PST 2008


On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
> On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > I've been playing around with snapshots lately but I've got a problem on
> > one of my servers running 7-STABLE amd64:
> > 
> > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 tdb at paladin:/usr/obj/usr/src/sys/PALADIN  amd64
> > 
> > I run the mksnap_ffs command to take the snapshot and some time later
> > the system completely freezes up:
> > 
> > paladin# cd /u2/.snap/
> > paladin# mksnap_ffs /u2 test.1
> > 
> > It only happens on this one filesystem, though, which might be to do
> > with its size. It's not over the 2TB mark, but it's pretty close. It's
> > also backed by a hardware RAID system, although a smaller filesystem on
> > the same RAID has no issues.
> > 
> > Filesystem  1K-blocks       Used     Avail Capacity  Mounted on
> > /dev/da0s1a 2078881084 921821396 990749202    48%    /u2
> > 
> > To clarify "completely freezes up": unresponsive to all services over
> > the network, except ping. On the console I can switch between the ttys,
> > but none of them respond. The only way out is to hit the reset button.
> 
> You need to provide information described in the
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> and especially
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

OK, I've done that, and removed the patch that seemed to fix things.

The first thing I notice after running mksnap_ffs again on the console is
that I can still press Ctrl+T to get a status line for the process:

load: 0.14  cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k

But the top and ps sessions I left running on other ttys have all stopped
responding.

Also the following kernel message came out:

Expensive timeout(9) function: 0xffffffff802ce380(0xffffff000677ca50) 0.006121001 s

There is also still some disk I/O.

Dropping to ddb worked, but I don't have a serial console, so I can't
paste the output; what follows is typed in by hand.

ps shows mksnap_ffs sleeping in newbuf, as we already saw. A trace of
mksnap_ffs looks like this:

Tracing pid 2603 tid 100214 td 0xffffff0006a0e370
sched_switch() at sched_switch+0x2a1
mi_switch() at mi_switch+0x233
sleepq_switch() at sleepq_switch+0xe9
sleepq_wait() at sleepq_wait+0x44
_sleep() at _sleep+0x351
getnewbuf() at getnewbuf+0x2e1
getblk() at getblk+0x30d
setup_allocindir_phase2() at setup_allocindir_phase2+0x338
softdep_setup_allocindir_page() at softdep_setup_allocindir_page+0xa7
ffs_balloc_ufs2() at ffs_balloc_ufs2+0x121e
ffs_snapshot() at ffs_snapshot+0xc52
ffs_mount() at ffs_mount+0x735
vfs_donmount() at vfs_donmount+0xeb5
kernel_mount() at kernel_mount+0xa1
ffs_cmount() at ffs_cmount+0x92
mount() at mount+0x1cc
syscall() at syscall+0x1f6
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (21, FreeBSD ELF64, mount), rip = 0x80068636c, rsp = 0x7fffffffe518, rbp = 0x8008447a0 ---

show pcpu shows cpuid 3 (this is a quad-core machine) in thread
"swi6: Giant taskq". All the other CPUs are idle.

show locks shows:

exclusive sleep mutex Giant r = 0 (0xffffffff806ae040) locked @ /usr/src/sys/kern/kern_intr.c:1087

There are two other locks shown by show all locks, one for sshd and one
for mysqld, both in kern/uipc_sockbuf.c.

show lockedvnods shows that mksnap_ffs holds a lock on da0s1a, with
ffs_vget at the top of the stack.

Sorry for any typos. I'll sort out a serial cable if more is needed :-)

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984


More information about the freebsd-stable mailing list