Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h

Mon Apr 28 16:22:51 UTC 2008

On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote:
> Thanks for your response and explanation :)
> 
> Kostik Belousov wrote:
> >On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote:
> >>Kostik Belousov wrote:
> >>>On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote:
> >>>>Hi Konstantin :)
> >>>>
> >>>>To fix a unionfs issue of 
> >>>>http://www.freebsd.org/cgi/query-pr.cgi?pr=109377,
> >>>>we need to add new functions
> >>>>
> >>>>  void vkernrele(struct vnode *vp);
> >>>>  void vkernref(struct vnode *vp);
> >>>>
> >>>>and one value
> >>>>
> >>>>  int	v_kernusecount; /* i ref count of kernel */
> >>>>
> >>>>to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c.
> >>>>
> >>>>Unionfs will be panic when lower fs layer is forced umounted by
> >>>>"umount -f".  So to avoid this issue, we've added
> >>>>"v_kernusecount" value that means "a vnode count that kernel are
> >>>>using".  vkernrele() and vkernref() are functions that manage
> >>>>"v_kernusecount" value.
> >>>>
> >>>>Please check those and give us an approve or some comments!
> >>>There is already the vnode reference count. From your description, I 
> >>>cannot
> >>>understand how the kernusecount would prevent the panic when forced 
> >>>unmount
> >>>is performed. Could you, please, show the actual code ? PR mentioned
> >>>does not contain any patch.
> >>Our patch realizes avoiding kernel panic by "umount -f" operation using 
> >>with
> >>EBUSY process.
> >>
> >>On current implementation (not applied our patch), "umount -f" tries to
> >>release vnode at any vnode reference count value. Since that, unionfs
> >>and nullfs access invalid vnode and lead kernel panic. To prevent this
> >>issue, we need a some kind of not-umount-accept-mechanism in invalid case
> >>(e.x. fs in unionfsed stack, it must be umounted in correct order) and
> >>to realize that, current vnode reference count is not enough we are 
> >>thinking.
> >>
> >>If you have any ideas to realize the same solution with current vnode
> >>reference, would you please tell us your idea :)
> >>
> >>>The problem you described is common for the kernel code, and right way
> >>>to handle it, for now, is to keep refcount _and_ check for the forced
> >>>reclaim.
> >
> >Your patch in essence disables the forced unmount. I would object against
> >such decision.
> 
> Oooooo....   OK. We understand.
> 
> >Even if taking this direction, I believe more cleaner solution would be
> >to introduce a counter that disables the (forced) unmount into the
> >struct mount, instead of the struct vnode. Having the counter in the
> >vnode, the unmount -f behaviour is non-deterministic and depended on
> >the presence of the cached vnodes of the upper layer. The mount counter
> >would be incremented by unionfs cover mount. But, as I said above, this
> >looks like a wrong solution.
> >
> >The right way to handle the forced reclaim with the current VFS is to
> >add the explicit checks for the reclaimed vnodes where it is needed. The
> >vnode cannot be reclaimed while the vnode lock is held. When obtaining
> >the vnode lock, the reclamation can be detected. For instance, the
> >vget() without LK_RETRY shall be checked for ENOENT.
> 
> At last, we want to check that vnode is released or not where
> unionfs does not know. If we can do that check, our patch is
> not needed for solving that issue.
> 
> Would you please give us the way to check that target vnode is
> released or not before accessing it.

The basic rules of our VFS are:
1. You _must_ hold the vnode unless the vnode is locked. Hold count
   prevents the vnode memory from being reused and guarantees the
   validity of the counters, v_vnlock, v_mount and vop (but please note
   that validity != stability). E.g., v_mount may be NULLed and vop
   become the deadfs_vop due to reclamation.
2. The vnode lock is held when the vnode is vgone(9)'ed. In the other
   words, if you have a pointer to the non-reclaimed vnode that
   is locked, the vnode cannot be reclaimed until the lock is freed.
3. The verbs that lock a vnode (vget() and vn_lock(9)) have two mode
   of operations.
   - If you specify the LK_RETRY in the lock flags, you would get
     even the reclaimed vnode locked.
   - If you do not specified LK_RETRY, you would get ENOENT for the
     reclaimed vnode.
   [See the #1 for the reason why you must have a vnode held while
    calling vget() or vn_lock()].
4. The reclaimed vnode has the VI_DOOMED flag set; you must have vnode
   interlock locked to check the context of the v_iflag. Most filesystems,
   as opposed to the VFS, use the other technique to detect the reclaimed
   vnode, if needed. They clear the v_data in the vop_reclaim, and
   verification of the (v_data != NULL) is enough to check for reclamation.

Very good example of the practical usage of the rules above are the
nullfs routines null_reclaim(), null_lock() and null_nodeget().

> 
> 
> >You said that that nullfs is vulnerable to the problem. Could you,
> >please, point me to the corresponding stack trace ? At least, the nullfs
> >vop_lock() seems to carefully check the possible problems.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20080428/0a0079e6/attachment.pgp