svn commit: r306512 - in head/sys: kern sys

Bryan Drewery bdrewery at FreeBSD.org
Fri Aug 25 21:29:50 UTC 2017


On 8/25/2017 3:35 AM, Konstantin Belousov wrote:
> On Thu, Aug 24, 2017 at 08:18:03PM -0700, Bryan Drewery wrote:
>> On 9/30/2016 10:27 AM, Mateusz Guzik wrote:
>>> Author: mjg
>>> Date: Fri Sep 30 17:27:17 2016
>>> New Revision: 306512
>>> URL: https://svnweb.freebsd.org/changeset/base/306512
>>>
>>> Log:
>>>   vfs: batch free vnodes in per-mnt lists
>>>   
>>>   Previously free vnodes would always by directly returned to the global
>>>   LRU list. With this change up to mnt_free_list_batch vnodes are collected
>>>   first.
>>>   
>>>   syncer runs always return the batch regardless of its size.
>>>   
>>>   While vnodes on per-mnt lists are not counted as free, they can be
>>>   returned in case of vnode shortage.
>>>   
>>>   Reviewed by:	kib
>>>   Tested by:	pho
>>>
>>> Modified:
>>>   head/sys/kern/vfs_mount.c
>>>   head/sys/kern/vfs_subr.c
>>>   head/sys/sys/mount.h
>>>   head/sys/sys/vnode.h
>>>
>> ...
>>> @@ -2753,17 +2824,25 @@ _vhold(struct vnode *vp, bool locked)
>>>  	 * Remove a vnode from the free list, mark it as in use,
>>>  	 * and put it on the active list.
>>>  	 */
>>> -	mtx_lock(&vnode_free_list_mtx);
>>> -	TAILQ_REMOVE(&vnode_free_list, vp, v_actfreelist);
>>> -	freevnodes--;
>>> -	vp->v_iflag &= ~VI_FREE;
>>> +	mp = vp->v_mount;
>>> +	mtx_lock(&mp->mnt_listmtx);
>>                   ^^
>>
>>> +	if ((vp->v_mflag & VMP_TMPMNTFREELIST) != 0) {
>>> +		TAILQ_REMOVE(&mp->mnt_tmpfreevnodelist, vp, v_actfreelist);
>>> +		mp->mnt_tmpfreevnodelistsize--;
>>> +		vp->v_mflag &= ~VMP_TMPMNTFREELIST;
>>> +	} else {
>>> +		mtx_lock(&vnode_free_list_mtx);
>>> +		TAILQ_REMOVE(&vnode_free_list, vp, v_actfreelist);
>>> +		freevnodes--;
>>> +		mtx_unlock(&vnode_free_list_mtx);
>>> +	}
>>>  	KASSERT((vp->v_iflag & VI_ACTIVE) == 0,
>>>  	    ("Activating already active vnode"));
>>> +	vp->v_iflag &= ~VI_FREE;
>>>  	vp->v_iflag |= VI_ACTIVE;
>>> -	mp = vp->v_mount;
>>>  	TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist);
>>>  	mp->mnt_activevnodelistsize++;
>>> -	mtx_unlock(&vnode_free_list_mtx);
>>> +	mtx_unlock(&mp->mnt_listmtx);
>>>  	refcount_acquire(&vp->v_holdcnt);
>>>  	if (!locked)
>>>  		VI_UNLOCK(vp);
>>> @@ -2819,21 +2898,25 @@ _vdrop(struct vnode *vp, bool locked)
>>>  		if ((vp->v_iflag & VI_OWEINACT) == 0) {
>>>  			vp->v_iflag &= ~VI_ACTIVE;
>>>  			mp = vp->v_mount;
>>> -			mtx_lock(&vnode_free_list_mtx);
>>> +			mtx_lock(&mp->mnt_listmtx);
>>                                   ^^
>>
>> If code runs getnewvnode() and then immediately runs vhold() or vrele(),
>> without first running insmntque(vp, mp), then vp->v_mount is NULL here
>> and the lock/freelist dereferencing just panic.
> getnewvnode() returns vref-ed vnode, i.e. both hold and use counts are
> set to 1.  What is the use case there which requires vhold-ing vnode
> without finishing its construction ?

None that I can think of.  For vhold I was just observing that a NULL
dereference is possible.

> 
>> Locking the vnode and then using vgone() and vput() on it avoids the
>> issue since it marks the vnode VI_DOOMED and instead frees it in
>> _vdrop() rather than try to re-add it to its NULL per-mount free list.
> This is the common pattern where insmntque() fails.
> 
>>
>> I'm not sure what the right thing here is.  Is it a requirement to
>> insmntque() a new vnode immediately, or to vgone() it before releasing
>> it if was just returned from getnewvnode()?
> These are the only uses of a newly allocated vnode which make sense.
> Do you need this for something that does not happen in the svn tree ?

Outside of the tree in OneFS, we have some code that does getnewvnode(),
then adds the unlocked vnode to a kqueue event list (it's actually very
complicated compared to the basic description), and then does insmntque.
 The event list registration is failing in a stress run and it only does
a vrele in this case.  I need to change it to lock/vgone/vrele (or swap
the event registration and insmntque calls of which I'm unsure on the
impact).  Previously the vrele was enough, and not vgone too, since the
vnode was just added back to the global free list.  But now the v_mount
is required since the free list is per-mount.

It is trivial for me to fix the panic I am running into, but it feels
like there's a bug in _vdrop now due to this commit and I want to be
sure I follow through with it being fixed properly.  My
confusion/question is probably about what the real distinction here is
between using vgone before vrele to cause the vnode to be free'd and not
using vgone and causing it to go back into the free list instead - which
I cannot do now without swapping the calls.

It seems to me that an assert would now make sense in the _vdrop
!VI_DOOMED case since a v_mount is required to add back into the free
list, and probably _vhold, such as:
VNASSERT(mp != NULL, vn, ("%s: not on a mount vnode list"));

Or we just add it back to the global mount list in _vdrop as it was
supported before with something like:

https://people.freebsd.org/~bdrewery/patches/vdrop-global-list.diff

> 
>>
>> Perhaps these cases should assert that v_mount is not NULL or handle the
>> odd case and just free the vnode instead, or can it still just add to
>> the global vnode_free_list if mp is NULL?
>>
>> The old code handled the case fine since the freelist was global and not
>> per-mount.  'mp' was only dereferenced below if the vnode had been
>> active, which was not the case for my example.
> 


-- 
Regards,
Bryan Drewery

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freebsd.org/pipermail/svn-src-head/attachments/20170825/46d02b31/attachment.sig>


More information about the svn-src-head mailing list