Possible kqueue related issue on STABLE/RC.

Konstantin Belousov kostikbel at gmail.com
Wed Sep 25 19:40:30 UTC 2013

On Wed, Sep 25, 2013 at 09:19:54AM -0700, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300:
> > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote:
> > > I'd like to understand why you think protecting these functions w/
> > > the _DETACHED check is correct...  In kern_event.c, all calls to
> > > f_detach are followed by knote_drop which will ensure that the knote
> > > is removed and free, so no more f_event calls will be called on that
> > > knote..
> > 
> > My current belief is that what happens is a glitch in the
> > kqueue_register(). After a new knote is created and attached, the kq
> > lock is dropped and then f_event() is called. If the vnode is reclaimed
> > or possible freed meantime, f_event() seems to dereference freed memory,
> > since kn_hook points to freed vnode.
> Well, if that happens, then the vnode isn't properly clearing up the
> knote before it gets reclaimed...  It is the vnode's responsibility to
> make sure any knotes that are associated w/ it get cleaned up properly..
See below.

> > The issue as I see it is that vnode lifecycle is detached from the knote
> > lifecycle.  Might be, only the second patch, which acquires a hold reference
> > on the vnode for each knote, is really needed.  But before going into any
> > conclusions, I want to see the testing results.
> The vnode lifecycle can't/shouldn't be detached from the knote lifecycle
> since the knote contains a pointer to the vnode...  There is the function
> knlist_clear that can be used to clean up knotes when the object goes
> away..
This is done from the vdropl() (null hold count) -> destroy_vpollinfo().
But this is too late, IMO. vdropl() is only executing with the vnode
interlock locked, and knote lock is vnode lock.  This way, you might
get far enough into vdropl in other thread, while trying to operate on
a vnode with zero hold count in some kqueue code path.

We do not drain the vnode lock holders when destroying vnode, because
VFS interface require that anybody locking the vnode own a hold reference
on it.  My short patch should fix exactly this issue, hopefully we will see.

> I was looking at the code, is there a good reason why you do
> VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in
> the vfs_knllock/vfs_knlunlock functions?  Because kq code will modify
> the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions,
> so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to
> vfs_knllock/vfs_knlunlock...

vfs_knllock() is fine. The problematic case if the
VOP_{PRE,POST}->VFS_KNOTE->VN_KNOTE->KNOTE calls from the VOPs. If you
look at the vfs_knl_assert_locked(), you would note that the function
only asserts that vnode is locked, not that it is locked exclusively.
This is because some filesystems started require from VFS to do e.g.
VOP_WRITE() with the vnode only shared-locked, and KNOTE() is called
with shared-locked vnode lock.

The vfs_knllock() obtain the exclusive lock on the vnode, so kqueue
callers are fine. Taking vnode interlock inside the filters provides
enough exclusion for the VOP callers.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20130925/6090e2a0/attachment.sig>

More information about the freebsd-stable mailing list