fix for per-mount i/o counting in ffs
Konstantin Belousov
kostikbel at gmail.com
Fri May 20 14:27:04 UTC 2016
On Fri, May 20, 2016 at 09:22:08PM +1000, Bruce Evans wrote:
> On Fri, 20 May 2016, Konstantin Belousov wrote:
>
> > On Fri, May 20, 2016 at 09:27:38AM +1000, Bruce Evans wrote:
> >>> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> >>> index 712fc21..21425f5 100644
> >>> --- a/sys/ufs/ffs/ffs_vfsops.c
> >>> +++ b/sys/ufs/ffs/ffs_vfsops.c
> >>> @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
> >>> cred = td ? td->td_ucred : NOCRED;
> >>> ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
> >>>
> >>> + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
> >>> dev = devvp->v_rdev;
> >>> dev_ref(dev);
> >>> + if (!atomic_cmpset_ptr(&dev->si_mountpt, 0, mp)) {
> >>> + dev_rel(dev);
> >>> + VOP_UNLOCK(devvp, 0);
> >>> + return (EBUSY);
> >>> + }
> >>
> >> This is cleaner and safer than my version.
> >>
> >>> DROP_GIANT();
> >>> g_topology_lock();
> >>> error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> >>
> >> g_vfs_open() already sets devvp->v_bufobj.bo_ops to g_vfs_bufops unless
> >> it fails. This clobbered our setting in the buggy multiple-mount case.
> >> But with multiple mounts not allowed, this cleans up any garbage in
> >> v_bufobj.
> > Yes, and this orders things. g_vfs_open() shoudl have devvp locked,
> > both fo bo manipulations and for vnode_create_vobject() call.
> > We can only assign to bo_ops after g_vfs_open() was done successfully.
>
> The atomic cmpset now orders things too. Is that enough? It ensures
> that an old mount cannot be active. I don't know if v_bufobj is used
> for non-mounts.
v_bufobj is logically protected against modifications by the vnode lock.
>
> Except, for zfs there is no g_vfs_open() to order things, and for all
> other file systems there is no atomic cmpset yet.
>
> >> g_vfs_open() has 2 failures for non-exclusive access. It starts by
> >> checking v_bufobj.bo_private == devvp (this is after translating its
> >> pointers to the ones passed here). This is avg's fix for the multiple-
> >> mounts problem (r206130). It doesn't work in all cases. I think this
> >> is unecessary now.
> > At least it weeds out other devfs mounts.
>
> Yes, we need it until everything is converted.
>
> >> ...
> >> I now see another cleanup: don't goto out when g_vfs_open() fails. This
> >> depends on it setting cp to NULL and leaving nothing to clean when it
> >> fails. It has no man page and this detail is documented in its source
> >> code.
> > Then I would need to add another NULL assignment, VOP_UNLOCK etc.
>
> g_vfs_open() already sets cp to NULL when it fails, and the cleanup
> depends on that now, but it is just as good to depend on no cleanup
> being needed on failure. You do need another dev_rel().
>
> I thought about moving the dev_ref() later to simplify the early returns.
> I thought that this didn't quite work. Now I think it does work, for
> obvious reasons:
> - the device is attached to a vnode, so it is referenced to prevent it
> going away unless the device is revoked. It seems to be referenced
> at least 3 times in FreeBSD-9.
> - the vnode is locked, so the reference count remains > 0 until we unlock.
> So we just need a dev_ref() before the unlock in the non-error case, to
> keep the device from going away if it is revoked.
Yes, and this is how the current patched code is structured.
>
> > Updated patch to add acq/rel.
> >
> > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> > index 712fc21..670bb15 100644
> > --- a/sys/ufs/ffs/ffs_vfsops.c
> > +++ b/sys/ufs/ffs/ffs_vfsops.c
> > @@ -764,24 +764,29 @@ ffs_mountfs(devvp, mp, td)
> > cred = td ? td->td_ucred : NOCRED;
> > ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
> >
> > + KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
>
> Hrmph.
I want this, it would remove amount of obvious questions.
>
> > dev = devvp->v_rdev;
> > dev_ref(dev);
>
> Move later...
>
> > + if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, mp) != 0) {
>
> I changed the first 0 to NULL, and this works on i386, but now I remember
> that i386 has bogus casts which break detection of type mismatches --
> the atomic ptr functions take a [u]intptr_t, not a pointer type, so
> NULL won't work if it is ((void *)0). At least amd64 is still missing
> this bug.
cmpset_<bar>_ptr() on i386 has cast for old and new parameters to u_int.
store_rel_ptr() on i386 does not cast value to u_int. As result, NULL
is acceptable for cmpset, but not for store. I spelled it 0 in all cases.
Hm, I also should add uintptr_t cast for cmpset, otherwise, I suspect,
some arch might be broken.
>
> > + dev_rel(dev);
>
> ...then this dev_rel() is not needed.
>
> > + VOP_UNLOCK(devvp, 0);
> > + return (EBUSY);
> > + }
> > DROP_GIANT();
> > g_topology_lock();
> > error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> > g_topology_unlock();
> > PICKUP_GIANT();
> > - VOP_UNLOCK(devvp, 0);
> > - if (error)
> > + if (error != 0) {
> > + VOP_UNLOCK(devvp, 0);
> > goto out;
>
> This becomes:
>
> if (error != 0) {
> VOP_UNLOCK(devvp, 0);
> return (EBUSY);
> }
>
> Then assign v_bufobj.
>
> Then dev_ref(), just in time for unlocking.
>
> Then unlock.
Ok.
>
> > - if (devvp->v_rdev->si_iosize_max != 0)
> > - mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
> > + }
> > + if (dev->si_iosize_max != 0)
> > + mp->mnt_iosize_max = dev->si_iosize_max;
> > if (mp->mnt_iosize_max > MAXPHYS)
> > mp->mnt_iosize_max = MAXPHYS;
> > -
> > devvp->v_bufobj.bo_ops = &ffs_ops;
> > - if (devvp->v_type == VCHR)
> > - devvp->v_rdev->si_mountpt = mp;
> > + VOP_UNLOCK(devvp, 0);
>
> This belongs earlier.
>
> >
> > fs = NULL;
> > sblockloc = 0;
> > ...
>
> We need this in a central function. g_vfs_open/close() can do it for
> all cases except zfs. This looks like:
I might look at this later.
>
> DROP_GIANT();
> g_topology_lock();
> // atomic_cmpset and its error = EBUSY moved to top of g_vfs_open()
> error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
> g_topology_unlock();
> PICKUP_GIANT();
> if (error != 0) {
> VOP_UNLOCK(devvp, 0);
> return (error);
> }
> devvp->v_bufobj.bo_ops = &ffs_ops;
> dev_ref(dev);
> VOP_UNLOCK(devvp, 0);
> if (dev->si_iosize_max != 0)
> mp->mnt_iosize_max = dev->si_iosize_max;
> if (mp->mnt_iosize_max > MAXPHYS)
> mp->mnt_iosize_max = MAXPHYS;
>
> where 2 of 2 lines with GIANT and 3 of 4 lines with iosize_max remain to
> be cleaned up.
>
> Resetting si_mountpt in g_vfs_close() is even simpler. Oops, it also has
> to be reset in g_vfs_open() on a later failure there.
diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 712fc21..65b1891 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -764,25 +764,30 @@ ffs_mountfs(devvp, mp, td)
cred = td ? td->td_ucred : NOCRED;
ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
+ KASSERT(devvp->v_type == VCHR, ("reclaimed devvp"));
dev = devvp->v_rdev;
- dev_ref(dev);
+ if (atomic_cmpset_acq_ptr(&dev->si_mountpt, 0, (uintptr_t)mp) != 0) {
+ VOP_UNLOCK(devvp, 0);
+ return (EBUSY);
+ }
DROP_GIANT();
g_topology_lock();
error = g_vfs_open(devvp, &cp, "ffs", ronly ? 0 : 1);
g_topology_unlock();
PICKUP_GIANT();
+ if (error != 0) {
+ VOP_UNLOCK(devvp, 0);
+ atomic_store_rel_ptr(&dev->si_mountpt, 0);
+ return (error);
+ }
+ dev_ref(dev);
+ devvp->v_bufobj.bo_ops = &ffs_ops;
VOP_UNLOCK(devvp, 0);
- if (error)
- goto out;
- if (devvp->v_rdev->si_iosize_max != 0)
- mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
+ if (dev->si_iosize_max != 0)
+ mp->mnt_iosize_max = dev->si_iosize_max;
if (mp->mnt_iosize_max > MAXPHYS)
mp->mnt_iosize_max = MAXPHYS;
- devvp->v_bufobj.bo_ops = &ffs_ops;
- if (devvp->v_type == VCHR)
- devvp->v_rdev->si_mountpt = mp;
-
fs = NULL;
sblockloc = 0;
/*
@@ -1083,8 +1088,6 @@ ffs_mountfs(devvp, mp, td)
out:
if (bp)
brelse(bp);
- if (devvp->v_type == VCHR && devvp->v_rdev != NULL)
- devvp->v_rdev->si_mountpt = NULL;
if (cp != NULL) {
DROP_GIANT();
g_topology_lock();
@@ -1102,6 +1105,7 @@ out:
free(ump, M_UFSMNT);
mp->mnt_data = NULL;
}
+ atomic_store_rel_ptr(&dev->si_mountpt, 0);
dev_rel(dev);
return (error);
}
@@ -1287,8 +1291,7 @@ ffs_unmount(mp, mntflags)
g_vfs_close(ump->um_cp);
g_topology_unlock();
PICKUP_GIANT();
- if (ump->um_devvp->v_type == VCHR && ump->um_devvp->v_rdev != NULL)
- ump->um_devvp->v_rdev->si_mountpt = NULL;
+ atomic_store_rel_ptr(&ump->um_dev->si_mountpt, 0);
vrele(ump->um_devvp);
dev_rel(ump->um_dev);
mtx_destroy(UFS_MTX(ump));
More information about the freebsd-fs
mailing list