Many processes stuck in zfs

Pawel Jakub Dawidek pjd at FreeBSD.org
Wed Mar 10 17:31:53 UTC 2010


On Wed, Mar 10, 2010 at 04:12:36PM +0100, Borja Marcos wrote:
>
> On Mar 10, 2010, at 12:02 PM, Pawel Jakub Dawidek wrote:
> 
> > Once the deadlock occurs, enter DDB and send me the output of:
> > 
> > 	ps
> > 	show alllocks
> > 	show lockedvnods
> > 	show allchains
> > 	alltrace
> 
> (Again, crossposted to -fs, ZFS related)
> 
> 
> Previous one was a panic when performing the test with several tar jobs running in parallel.
> 
> Now this is a capture of the deadlock itself, instead of a panic. (I called panic from the debugger to generate a dump)
[...]

Hmm, interesting. Especially those two traces:

Tracing command zfs pid 1820 tid 100105 td 0xffffff0002ca4000
[...]
_cv_wait() at _cv_wait+0x17a
txg_wait_synced() at txg_wait_synced+0x98
zfsvfs_teardown() at zfsvfs_teardown+0x1f6
zfs_suspend_fs() at zfs_suspend_fs+0x2b
zfs_ioc_recv() at zfs_ioc_recv+0x28b
zfsdev_ioctl() at zfsdev_ioctl+0x8d
devfs_ioctl_f() at devfs_ioctl_f+0x76
kern_ioctl() at kern_ioctl+0xc5
ioctl() at ioctl+0xfd
[...]

Tracing command bsdtar pid 1699 tid 100093 td 0xffffff000262dae0
[...]
_sx_slock_hard() at _sx_slock_hard+0x1b7
_sx_slock() at _sx_slock+0xc1 
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x63
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xb5
vgonel() at vgonel+0x119
vnlru_free() at vnlru_free+0x345
getnewvnode() at getnewvnode+0x24f
zfs_znode_cache_constructor() at zfs_znode_cache_constructor+0x43
zfs_znode_alloc() at zfs_znode_alloc+0x38
zfs_mknode() at zfs_mknode+0x259
zfs_freebsd_create() at zfs_freebsd_create+0x661
VOP_CREATE_APV() at VOP_CREATE_APV+0xb3
vn_open_cred() at vn_open_cred+0x473
kern_openat() at kern_openat+0x179
[...]

This should be impossible. If we are that deep in zfsvfs_teardown(), it means
that we hold the z_teardown_lock exclusively, and the 'show alllocks' output
confirms that we do. But if we are holding this lock exclusively, we shouldn't
be that deep in the create code path, because that path needs to hold the same
lock as a reader. The reader hold isn't visible in the 'show alllocks' output,
because this lock is special (rrwlock.c).

I see three possibilities:
1. We are looking at different file systems here. But where is the deadlock
   coming from then?
2. There is a bug in rrwlock.c. Highly unlikely, I think.
3. My thinking is incorrect somewhere.
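
To make the expectation concrete, here is a minimal toy model of the invariant
described above, plus possibility 1. It is only a sketch and not the rrwlock.c
code: the TeardownLock class and both helper functions are hypothetical names,
the lock uses non-blocking "try" calls instead of sleeping, and it does not
model the re-entrant reader tracking that makes the real lock special. It
assumes one teardown lock per file system.

```python
import threading


class TeardownLock:
    """Toy reader/writer lock: many readers or one writer, never both.

    Hypothetical stand-in for a per-file-system teardown lock.
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._readers = 0
        self._writer = False

    def try_enter_reader(self):
        # A reader may enter only while no writer holds the lock.
        with self._mutex:
            if self._writer:
                return False
            self._readers += 1
            return True

    def exit_reader(self):
        with self._mutex:
            assert self._readers > 0
            self._readers -= 1

    def try_enter_writer(self):
        # A writer may enter only when the lock is completely free.
        with self._mutex:
            if self._writer or self._readers:
                return False
            self._writer = True
            return True

    def exit_writer(self):
        with self._mutex:
            assert self._writer
            self._writer = False


def create_during_teardown():
    """Same file system: teardown holds the lock as writer, create wants it
    as reader. Returns whether the create path got in."""
    fs = TeardownLock()
    assert fs.try_enter_writer()      # teardown path (exclusive)
    entered = fs.try_enter_reader()   # create path (shared)
    fs.exit_writer()
    return entered


def create_on_other_fs():
    """Possibility 1: the two traces belong to different file systems, so
    they use different locks and do not exclude each other."""
    fs_a, fs_b = TeardownLock(), TeardownLock()
    assert fs_a.try_enter_writer()    # teardown on file system A
    entered = fs_b.try_enter_reader() # create on file system B
    if entered:
        fs_b.exit_reader()
    fs_a.exit_writer()
    return entered
```

In this model create_during_teardown() can never get in, which is why seeing
both traces on the same file system would point at possibility 2 or 3, while
create_on_other_fs() succeeds without any conflict.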

Let me do some more thinking and I'll get back to you (possibly with a patch
that will help us find the right possibility).

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!