Approval request of some additions to src/sys/kern/vfs_subr.c
and src/sys/sys/vnode.h
Daichi GOTO
daichi at freebsd.org
Mon Apr 28 17:26:49 UTC 2008
Kostik Belousov wrote:
> On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote:
>> Thanks for your response and explanation :)
>>
>> Kostik Belousov wrote:
>>> On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote:
>>>> Kostik Belousov wrote:
>>>>> On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote:
>>>>>> Hi Konstantin :)
>>>>>>
>>>>>> To fix a unionfs issue of
>>>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=109377,
>>>>>> we need to add new functions
>>>>>>
>>>>>> void vkernrele(struct vnode *vp);
>>>>>> void vkernref(struct vnode *vp);
>>>>>>
>>>>>> and one value
>>>>>>
>>>>>> int v_kernusecount; /* i ref count of kernel */
>>>>>>
>>>>>> to src/sys/sys/vnode.h and src/sys/kern/vfs_subr.c.
>>>>>>
>>>>>> Unionfs panics when a lower fs layer is forcibly unmounted with
>>>>>> "umount -f". To avoid this issue, we have added a "v_kernusecount"
>>>>>> field, meaning "the number of references the kernel holds on this
>>>>>> vnode". vkernrele() and vkernref() are the functions that manage
>>>>>> "v_kernusecount".
>>>>>>
>>>>>> Please review them and give us your approval or some comments!
>>>>> There is already the vnode reference count. From your description, I
>>>>> cannot understand how the kernusecount would prevent the panic when
>>>>> forced unmount is performed. Could you, please, show the actual code?
>>>>> The PR mentioned does not contain any patch.
>>>> Our patch avoids the kernel panic by making the "umount -f" operation
>>>> fail with EBUSY.
>>>>
>>>> In the current implementation (without our patch), "umount -f" tries to
>>>> release vnodes regardless of their reference counts. As a result,
>>>> unionfs and nullfs access invalid vnodes, which leads to a kernel
>>>> panic. To prevent this, we need some mechanism that refuses the
>>>> unmount in invalid cases (e.g., a filesystem within a unionfs stack
>>>> must be unmounted in the correct order), and we think the current
>>>> vnode reference count is not enough to realize that.
>>>>
>>>> If you have any idea how to achieve the same result with the current
>>>> vnode reference count, would you please tell us? :)
>>>>
>>>>> The problem you described is common for kernel code, and the right
>>>>> way to handle it, for now, is to keep the refcount _and_ check for
>>>>> the forced reclaim.
>>> Your patch in essence disables the forced unmount. I would object against
>>> such decision.
>> Oooooo.... OK. We understand.
>>
>>> Even if taking this direction, I believe a cleaner solution would be
>>> to introduce a counter that disables the (forced) unmount into the
>>> struct mount, instead of the struct vnode. With the counter in the
>>> vnode, the unmount -f behaviour is non-deterministic and dependent on
>>> the presence of the cached vnodes of the upper layer. The mount counter
>>> would be incremented by the unionfs cover mount. But, as I said above,
>>> this looks like a wrong solution.
>>>
>>> The right way to handle the forced reclaim with the current VFS is to
>>> add the explicit checks for the reclaimed vnodes where it is needed. The
>>> vnode cannot be reclaimed while the vnode lock is held. When obtaining
>>> the vnode lock, the reclamation can be detected. For instance, a
>>> vget() without LK_RETRY should be checked for ENOENT.
>> Ultimately, we want to check whether a vnode has been released in
>> places where unionfs cannot know about it. If we can make that check,
>> our patch is not needed to solve this issue.
>>
>> Would you please tell us how to check whether a target vnode has been
>> released before accessing it?
>
> The basic rules of our VFS are:
> 1. You _must_ hold the vnode unless the vnode is locked. Hold count
> prevents the vnode memory from being reused and guarantees the
> validity of the counters, v_vnlock, v_mount and vop (but please note
> that validity != stability). E.g., v_mount may be NULLed and vop
> become the deadfs_vop due to reclamation.
> 2. The vnode lock is held when the vnode is vgone(9)'ed. In other
> words, if you have a pointer to a non-reclaimed vnode that
> is locked, the vnode cannot be reclaimed until the lock is released.
> 3. The verbs that lock a vnode (vget() and vn_lock(9)) have two modes
> of operation.
> - If you specify LK_RETRY in the lock flags, you will get the
> vnode locked even if it has been reclaimed.
> - If you do not specify LK_RETRY, you will get ENOENT for a
> reclaimed vnode.
> [See the #1 for the reason why you must have a vnode held while
> calling vget() or vn_lock()].
> 4. The reclaimed vnode has the VI_DOOMED flag set; you must hold the
> vnode interlock to check the content of v_iflag. Most filesystems,
> as opposed to the VFS, use the other technique to detect the reclaimed
> vnode, if needed. They clear the v_data in the vop_reclaim, and
> verification of the (v_data != NULL) is enough to check for reclamation.
>
> A very good example of the practical usage of the rules above is the
> nullfs routines null_reclaim(), null_lock() and null_nodeget().
Thanks for your explanation! We'll research this and try to find a new
solution for this issue :)
>>> You said that that nullfs is vulnerable to the problem. Could you,
>>> please, point me to the corresponding stack trace ? At least, the nullfs
>>> vop_lock() seems to carefully check the possible problems.
--
Daichi GOTO, http://people.freebsd.org/~daichi
More information about the freebsd-fs mailing list