Possible kqueue related issue on STABLE/RC.

Wed Sep 25 21:06:22 UTC 2013

Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 22:40 +0300:
> On Wed, Sep 25, 2013 at 09:19:54AM -0700, John-Mark Gurney wrote:
> > Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300:
> > > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> > > > I'd like to understand why you think protecting these functions w/
> > > > the _DETACHED check is correct...  In kern_event.c, all calls to
> > > > f_detach are followed by knote_drop which will ensure that the knote
> > > > is removed and free, so no more f_event calls will be called on that
> > > > knote..
> > > 
> > > My current belief is that what happens is a glitch in the
> > > kqueue_register(). After a new knote is created and attached, the kq
> > > lock is dropped and then f_event() is called. If the vnode is reclaimed
> > > or possible freed meantime, f_event() seems to dereference freed memory,
> > > since kn_hook points to freed vnode.
> > 
> > Well, if that happens, then the vnode isn't properly clearing up the
> > knote before it gets reclaimed...  It is the vnode's responsibility to
> > make sure any knotes that are associated w/ it get cleaned up properly..
> See below.
> 
> > 
> > > The issue as I see it is that vnode lifecycle is detached from the knote
> > > lifecycle.  Might be, only the second patch, which acquires a hold reference
> > > on the vnode for each knote, is really needed.  But before going into any
> > > conclusions, I want to see the testing results.
> > 
> > The vnode lifecycle can't/shouldn't be detached from the knote lifecycle
> > since the knote contains a pointer to the vnode...  There is the function
> > knlist_clear that can be used to clean up knotes when the object goes
> > away..
> This is done from the vdropl() (null hold count) -> destroy_vpollinfo().
> But this is too late, IMO. vdropl() is only executing with the vnode
> interlock locked, and knote lock is vnode lock.  This way, you might
> get far enough into vdropl in other thread, while trying to operate on
> a vnode with zero hold count in some kqueue code path.
> 
> We do not drain the vnode lock holders when destroying vnode, because
> VFS interface require that anybody locking the vnode own a hold reference
> on it.  My short patch should fix exactly this issue, hopefully we will see.

Which clearly wasn't happening before...  With the above, and rereading
your patch, I understand how this patch should fix the issue and bring
the life cycle of the two back into sync...

> > I was looking at the code, is there a good reason why you do
> > VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in
> > the vfs_knllock/vfs_knlunlock functions?  Because kq code will modify
> > the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions,
> > so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to
> > vfs_knllock/vfs_knlunlock...
> 
> vfs_knllock() is fine. The problematic case if the
> VOP_{PRE,POST}->VFS_KNOTE->VN_KNOTE->KNOTE calls from the VOPs. If you
> look at the vfs_knl_assert_locked(), you would note that the function
> only asserts that vnode is locked, not that it is locked exclusively.
> This is because some filesystems started require from VFS to do e.g.
> VOP_WRITE() with the vnode only shared-locked, and KNOTE() is called
> with shared-locked vnode lock.
> 
> The vfs_knllock() obtain the exclusive lock on the vnode, so kqueue
> callers are fine. Taking vnode interlock inside the filters provides
> enough exclusion for the VOP callers.

Ahh, ok, makes sense now..  Clearly I need to learn more about the
VFS/vnope system.. :)

Thanks for the explanations...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."