ZFS: Deadlock during vnode recycling
Pawel Jakub Dawidek
pjd at FreeBSD.org
Sat Sep 29 14:45:08 UTC 2012
On Tue, Sep 18, 2012 at 09:14:22AM -0600, Justin T. Gibbs wrote:
> One of our systems became unresponsive due to an inability to recycle
> vnodes. We tracked this down to a deadlock in zfs_zget(). I've attached
> the stack trace from the vnlru process to the end of this email.
>
> We are currently testing the following patch. Since this issue is hard to
> replicate I would appreciate review and feedback before I commit it to
> FreeBSD.
I think the patch is OK. The whole mess we have to deal with when it comes to
reclamation won't get much messier with this patch in place :)
> Patch
> ===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<
> Change 635310 by justing at justing_ns1_spectrabsd on 2012/09/17 15:30:14
>
> For most vnode consumers of ZFS, the appropriate behavior
> when encountering a vnode that is in the process of being
> reclaimed is to wait for that process to complete and then
> allocate a new vnode. This behavior is enforced in zfs_zget()
> by checking for the VI_DOOMED vnode flag. In the case of
> the thread actually reclaiming the vnode, zfs_zget() must
> return the current vnode, otherwise a deadlock will occur.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h:
> Create a virtual znode field, z_reclaim_td, which is
> implemented as a macro that redirects to z_task.ta_context.
>
> z_task is only used by the reclaim code to perform the
> final cleanup of a znode in a secondary thread. Since
> this can only occur after any calls to zfs_zget(), it
> is safe to reuse the ta_context field.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:
> In zfs_freebsd_reclaim(), record curthread in the
> znode being reclaimed.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:
> o Null out z_reclaim_td when znode_ts are constructed.
>
> o In zfs_zget(), return a "doomed vnode" if the current
> thread is actively reclaiming this object.
>
> Affected files ...
>
> ... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h#2 edit
> ... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#3 edit
> ... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#2 edit
>
> Differences ...
>
> ==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h#2 (text) ====
>
> @@ -241,6 +241,7 @@
> struct task z_task;
> } znode_t;
>
> +#define z_reclaim_td z_task.ta_context
>
> /*
> * Convert between znode pointers and vnode pointers
>
> ==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#3 (text) ====
>
> @@ -6083,6 +6083,13 @@
>
> ASSERT(zp != NULL);
>
> + /*
> + * Mark the znode so that operations that typically block
> + * waiting for reclamation to complete will return the current,
> + * "doomed vnode", for this thread.
> + */
> + zp->z_reclaim_td = curthread;
> +
> /*
> * Destroy the vm object and flush associated pages.
> */
>
> ==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#2 (text) ====
>
> @@ -158,6 +158,7 @@
> zp->z_dirlocks = NULL;
> zp->z_acl_cached = NULL;
> zp->z_moved = 0;
> + zp->z_reclaim_td = NULL;
> return (0);
> }
>
> @@ -1192,7 +1193,8 @@
> dying = 1;
> else {
> VN_HOLD(vp);
> - if ((vp->v_iflag & VI_DOOMED) != 0) {
> + if ((vp->v_iflag & VI_DOOMED) != 0 &&
> + zp->z_reclaim_td != curthread) {
> dying = 1;
> /*
> * Don't VN_RELE() vnode here, because
>
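The effect of the zfs_zget() hunk can be modelled in userspace. What follows is a minimal sketch, not the kernel code: every model_* name and MODEL_VI_DOOMED are hypothetical stand-ins for the real FreeBSD types and flag, and the model folds "vnode marked doomed" and "reclaim routine records curthread" into one step, although the change description and patch only show the latter happening in zfs_freebsd_reclaim(). Only the z_reclaim_td macro and the shape of the final condition are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real kernel definitions. */
#define MODEL_VI_DOOMED 0x1u

struct model_thread { int id; };
struct model_vnode  { unsigned v_iflag; };
struct model_task   { void *ta_context; };

struct model_znode {
	struct model_vnode *vp;
	struct model_task   z_task;
};

/* Same aliasing trick as the patch: a "virtual" field backed by z_task. */
#define z_reclaim_td z_task.ta_context

/* Mirrors the constructor hunk: no thread is reclaiming a fresh znode. */
static void
model_znode_init(struct model_znode *zp, struct model_vnode *vp)
{
	zp->vp = vp;
	zp->z_reclaim_td = NULL;
}

/* Model assumption: dooming the vnode and recording the reclaimer
 * happen together here. */
static void
model_begin_reclaim(struct model_znode *zp, struct model_thread *td)
{
	zp->vp->v_iflag |= MODEL_VI_DOOMED;
	zp->z_reclaim_td = td;
}

/*
 * Returns true when the caller would take the "dying = 1" path, i.e.
 * sleep and retry. With patched == false this is the old check, under
 * which the reclaiming thread waits on itself forever.
 */
static bool
model_zget_must_wait(const struct model_znode *zp,
    const struct model_thread *cur, bool patched)
{
	assert(zp != NULL && zp->vp != NULL);

	bool doomed = (zp->vp->v_iflag & MODEL_VI_DOOMED) != 0;

	if (!patched)
		return (doomed);
	return (doomed && zp->z_reclaim_td != cur);
}
```

In this model, once a thread has called model_begin_reclaim(), the unpatched check tells that same thread to wait, which corresponds to the tsleep()/goto again loop at frame 4 of the trace. The patched check lets the reclaiming thread through while every other thread still waits.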
> vnlru_proc debug session
> ===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<
> #0 sched_switch (td=0xfffffe000f87b470, newtd=0xfffffe000d36c8e0, flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1927
> #1 0xffffffff8057f2b6 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
> #2 0xffffffff805b8982 in sleepq_timedwait (wchan=0xfffffe05c7515640, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:658
> #3 0xffffffff8057f89f in _sleep (ident=0xfffffe05c7515640, lock=0x0, priority=Variable "priority" is not available.
> ) at /usr/src/sys/kern/kern_synch.c:246
> #4 0xffffffff81093035 in zfs_zget (zfsvfs=0xfffffe001de4c000, obj_num=81963, zpp=0xffffff8c60dc51b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1224
> #5 0xffffffff810bec9a in zfs_get_data (arg=0xfffffe001de4c000, lr=0xffffff820f5330b8, buf=0x0, zio=0xfffffe0584625000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1142
> #6 0xffffffff81096891 in zil_commit (zilog=0xfffffe001c382800, foid=Variable "foid" is not available.
> ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1048
> #7 0xffffffff810bceb0 in zfs_freebsd_write (ap=Variable "ap" is not available.
> ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1083
> #8 0xffffffff8081f112 in VOP_WRITE_APV (vop=0xffffffff8112cf40, a=0xffffff8c60dc5680) at vnode_if.c:951
> #9 0xffffffff807b1a6b in vnode_pager_generic_putpages (vp=0xfffffe05c76171e0, ma=0xffffff8c60dc5890, bytecount=Variable "bytecount" is not available.
> ) at vnode_if.h:413
> #10 0xffffffff807b1749 in vnode_pager_putpages (object=0xfffffe05e9ee9bc8, m=0xffffff8c60dc5890, count=61440, sync=1, rtvals=0xffffff8c60dc57a0) at vnode_if.h:1189
> #11 0xffffffff807aaee0 in vm_pageout_flush (mc=0xffffff8c60dc5890, count=15, flags=1, mreq=0, prunlen=0xffffff8c60dc594c, eio=0xffffff8c60dc59c0) at vm_pager.h:145
> #12 0xffffffff807a3da3 in vm_object_page_collect_flush (object=Variable "object" is not available.
> ) at /usr/src/sys/vm/vm_object.c:936
> #13 0xffffffff807a3f23 in vm_object_page_clean (object=0xfffffe05e9ee9bc8, start=Variable "start" is not available.
> ) at /usr/src/sys/vm/vm_object.c:861
> #14 0xffffffff807a42d4 in vm_object_terminate (object=0xfffffe05e9ee9bc8) at /usr/src/sys/vm/vm_object.c:706
> #15 0xffffffff807b241e in vnode_destroy_vobject (vp=0xfffffe05c76171e0) at /usr/src/sys/vm/vnode_pager.c:167
> #16 0xffffffff810beec7 in zfs_freebsd_reclaim (ap=Variable "ap" is not available.
> ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6146
> #17 0xffffffff806101e1 in vgonel (vp=0xfffffe05c76171e0) at vnode_if.h:830
> #18 0xffffffff80616379 in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:734
>
> (kgdb) frame 4
> #4 0xffffffff81093035 in zfs_zget (zfsvfs=0xfffffe001de4c000, obj_num=81963, zpp=0xffffff8c60dc51b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1224
> 1224 tsleep(zp, 0, "zcollide", 1);
> (kgdb) l
> 1219 sa_buf_rele(db, NULL);
> 1220 mutex_exit(&zp->z_lock);
> 1221 ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
> 1222 if (vp != NULL)
> 1223 VN_RELE(vp);
> 1224 tsleep(zp, 0, "zcollide", 1);
> 1225 goto again;
> 1226 }
> 1227 *zpp = zp;
> 1228 err = 0;
> (kgdb)
--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl