panic in vget()

Fri Apr 16 21:12:03 UTC 2010

> -----Original Message-----
> From: Kostik Belousov [mailto:kostikbel at gmail.com]
> Sent: Friday, April 16, 2010 1:41 PM
> To: Matthew Fleming
> Cc: freebsd-stable at freebsd.org
> Subject: Re: panic in vget()
> 
> On Fri, Apr 16, 2010 at 01:23:17PM -0700, Matthew Fleming wrote:
> > I'm looking at this panic in vget() on stable/7:
> >
> > 	if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0)
> > 		panic("vget: vn_lock failed to return ENOENT\n");
> >
> > It seems to me that this is not a correct assertion, because if the
> > caller passed in no lock flags (i.e. just checking the vnode for
> > validity) then there is a window between the VI_UNLOCK() in
> > _vn_lock(9) and the subsequent VI_LOCK() in vget() where another
> > thread could have set VI_DOOMED.
> >
> > This isn't a problem on CURRENT because the code has been changed to
> > not allow an empty lock flags.
> >
> > I believe the following is a potential fix is:
> >
> >  	vholdl(vp);
> >  	if ((error = vn_lock(vp, flags | LK_INTERLOCK, td)) != 0) {
> >  		vdrop(vp);
> >  		return (error);
> >  	}
> >  	VI_LOCK(vp);
> > +	/*
> > +	 * Deal with a timing window when the interlock is not held
> > +	 * and VI_DOOMED can be set, since we only have a holdcnt,
> > +	 * not a usecount.
> > +	 */
> > +	if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0) {
> > +		KASSERT((flags & LK_TYPE_MASK) == 0, ("Unexpected flags
> > %x", flags));
> > +		vdropl(vp);
> > +		return (ENOENT);
> > +	}
> >  	/* Upgrade our holdcnt to a usecount. */
> >  	v_upgrade_usecount(vp);
> > -	if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0)
> > -		panic("vget: vn_lock failed to return ENOENT\n");
> >  	if (oweinact) {
> >  		if (vp->v_iflag & VI_OWEINACT)
> >  			vinactive(vp, td);
> >  		VI_UNLOCK(vp);
> >  		if ((oldflags & LK_TYPE_MASK) == 0)
> 
> Both the analysis and the patch look good.
> 
> Did you considered locking the vnode even when no locking flags were
> given, as is done for VI_OWEINACT handling ? Your solution is better,
esp.
> for old lockmgr, but acquiring vnode lock might be safer.

For our systems, the vnode lock is distributed across the entire
cluster, so we prefer not to take it unless required.  The code path
that produced this panic is one such; it is using other mechanisms to
guarantee the data is correct.

(As a side note, splitting the vnode lock into a lock on the vnode
struct and a "file" lock would be really great, since the VOP_LOCK uses
seem split between serializing the file contents and serializing some of
the members of struct vnode itself, and we only need a distributed lock
for the file contents).

Thanks,
matthew