svn commit: r254585 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

Xin Li delphij at delphij.net
Wed Aug 28 01:03:04 UTC 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 08/27/13 01:59, Davide Italiano wrote:
> On Tue, Aug 27, 2013 at 10:00 AM, Andriy Gapon <avg at freebsd.org>
> wrote:
>> on 27/08/2013 02:03 Xin Li said the following:
>>> On 08/26/13 08:35, Andriy Gapon wrote:
>>>> on 26/08/2013 01:15 Jeremie Le Hen said the following:
>>>>> Hi Xin,
>>>>> 
>>>>> On Tue, Aug 20, 2013 at 10:31:14PM +0000, Xin LI wrote:
>> [snip]
>>>>>> @@ zfs_rename(vnode_t *sdvp, char *snm, vno if 
>>>>>> (VOP_REALVP(tdvp, &realvp, ct) == 0) tdvp = realvp;
>>>>>> 
>>>>>> -  if (tdvp->v_vfsp != sdvp->v_vfsp ||
>>>>>> zfsctl_is_node(tdvp)) { + tdzp = VTOZ(tdvp);
>>> 
>>>> The problem with this change, at least on FreeBSD, is that
>>>> tdvp may not belong to ZFS.  In that case VTOZ(tdvp) does not
>>>> make any sense and would produce garbage or trigger an
>>>> assert.  Previously tdvp->v_vfsp != sdvp->v_vfsp check would
>>>> catch that situations. Two side notes: - v_vfsp is actually
>>>> v_mount on FreeBSD
>>> 
>>> Ah that's good point.  Any objection in putting a change to
>>> their _freebsd_ counterpart (zfs_freebsd_rename and
>>> zfs_freebsd_link) as attached?
>> 
>> I think that at least the change to zfs_freebsd_rename as it is
>> now would break locking and reference counting semantics for the
>> vnodes involved -- vreles and vputs have to be done even in the
>> error case.
>> 
>> Also, look at the check at the start of ufs_rename, it also
>> considers tvp when it's not NULL.  I am not sure if it is
>> possible to have a situation where fvp and tdvp are from the same
>> fs, but tvp is from a different one.
>> 
>> I've been also tempted to place the check in the common code in
>> kern_renameat. That should cover the system calls.  But could be
>> insufficient for direct calls to VOP_RENAME in other places.
>> 
>> diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c 
>> index a7cb87a..cfa4d93 100644 --- a/sys/kern/vfs_syscalls.c +++
>> b/sys/kern/vfs_syscalls.c @@ -3608,6 +3608,14 @@
>> kern_renameat(struct thread *td, int oldfd, char *old, int newfd,
>> char *new, error = EINVAL; goto out; } + +       /* Check for
>> cross-device rename. */ +       if ((fvp->v_mount !=
>> tdvp->v_mount) || +           (tvp && (fvp->v_mount !=
>> tvp->v_mount))) { +               error = EXDEV; +
>> goto out; +       } + /* * If the source is the same as the
>> destination (that is, if they * are links to the same vnode),
>> then there is nothing to do.
>> 
>>>> - VOP_REALVP is a glorified nop on FreeBSD
>>> 
>>> It's not clear to me what was the intention for having a macro
>>> instead of just ifdef'ing the code out -- maybe to prevent
>>> unwanted conflict with upstream?  These two VOP's are the only
>>> consumers of VOP_REALVP in tree.
>> 
>> Yes.  Personally I would just ifdef out the calls.
>> 
>>>> Another unrelated problem that existed before this change and
>>>> has been noted by Davide Italiano is that sdvp is not locked
>>>> and so it can potentially be recycled before ZFS_ENTER()
>>>> enter and for that reason sdzp and zfsvfs could be invalid.
>>>> Because sdvp is referenced, this problem can currently occur
>>>> only if a forced unmount runs concurrently to zfs_rename.
>>>> tdvp should be locked and thus could be used instead of sdvp
>>>> as a source for zfsvfs.
>>> 
>>> I think this would need more work, I'll take a look after we
>>> have this regression fixed.
>> 
>> Thank you.
>> 
>> -- Andriy Gapon
> 
> I've written a patch that attempts to fix the second problem. 
> http://people.freebsd.org/~davide/review/zfsrename_recycle.diff The
> culprit is that we're dereferencing sdvp without the vnode lock 
> held, so data-stability is not guaranteed. tdvp, instead, is locked
> at the entry point of VOP_RENAME() (at least according to
> vop_rename_pre()) so it can be safely used. As pointed out by
> Andriy ZFS has some different mechanisms wrt other file systems
> that allow the vnode to not be recycled when it's referenced
> (ZFS_ENTER), so once we get zfsvfs_t * from tdzp we can call
> ZFS_ENTER and dereference sdvp in a safe manner.

I'm not sure that this is right.  Now we have:

        tdzp = VTOZ(tdvp);
        ZFS_VERIFY_ZP(tdzp);
        zfsvfs = tdzp->z_zfsvfs;
        ZFS_ENTER(zfsvfs);		// tdzp's z_zfsvfs entered
        zilog = zfsvfs->z_log;
        sdzp = VTOZ(sdvp);
        ZFS_VERIFY_ZP(sdzp);		// (*)

Note that in the (*) step, when sdzp is invalid and sdzp have
different z_zfsvfs than tdzp (for instance when the node is in the
snapshot directory; the code later would test this), we could end up
with ZFS_EXIT()'ing the wrong z_zfsvfs.

Cheers,
- -- 
Xin LI <delphij at delphij.net>    https://www.delphij.net/
FreeBSD - The Power to Serve!           Live free or die
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.21 (FreeBSD)

iQEcBAEBCgAGBQJSHUxBAAoJEG80Jeu8UPuzkzEIAKXlfGuTodrPcPkDkxBhhaOj
QoxoT06jFwMTplysICuCpslQNyyG2Jxq654u9nMh6q5dww370eTtm2FJ0n2QTxk4
JeWLpZVUu7VHWNXYVJQqrmATaMFHO4wVf5AYpSHn+1iCWo0kQn1MPxPw/oSUt0yw
1628jhWTs8n+rxQaYrYN6ewXYeylMwC50ZB0kE/gQpQZ+cKAGmrM/8tur24SQBEo
WwrHakrv1DGJ8rv3Q53FB2iUZ+zEAZs6MJ28w32lV3vOI20jfQbEK+RR7i7HvxdV
c/7M96wCd0X4iEqZb87VVfJFSRdQsXy/35scXhhh/BSviT7rervS8XxPGhfpXOs=
=MzwT
-----END PGP SIGNATURE-----


More information about the svn-src-head mailing list