svn commit: r277199 - in head/sys: fs/devfs kern

Konstantin Belousov kostikbel at gmail.com
Fri Jan 16 08:03:40 UTC 2015


On Thu, Jan 15, 2015 at 01:14:39PM +0100, Hans Petter Selasky wrote:
> On 01/15/15 12:51, Konstantin Belousov wrote:
> > On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote:
> >>
> >> I see no leakage in that case either!
> > Because these cases come through destroy_dev().
> >
> >>
> >> Are there more cases which I don't see?
> > You are breaking existig devfs KPI by your hack.  You introduce yet another
> > reference on the device, which is not supposed to be there.
> 
> Hi Konstantin,
> 
> I need a non-sleeping way to say a character device is no longer 
> supposed to be used and be able to re-use the device name right away 
> creating a new device. I guess you got that.
Yes, I got it.  The devfs design is not suitable for this, and your
hack is the good witness of the fact.

My opinion is that you should have tried to handle the issue at the driver
level, instead of making this devfs issue.  I.e., if you already have
cdev node with the correct name, driver should have replaced the private
data to point to new device.

This would also close a window where /dev node is non-existent or operate
erronously.

> 
> >
> > If some code calls delist_dev(), it could be said that it is a contract
> > of the new function that destroy_dev() must be called eventually on
> > the cdev. Then, the reference could be said to be shared-owned by
> > delist_dev() and destroy_dev(). But, for arbitrary devfs user this new
> > reference is unacceptable and breaks interface.
> 
> delist_dev() changes no references. It can be called multiple times 
> even, also inside destroy_devl(). Also I think that the 
> "destroy_dev_sched_cbl()" function should call delist_dev() first so 
> that we don't have a time from when the "destroy_dev_sched_cbl()" 
> function is called where the device entry still exists in devfs mounts 
> until the final destroy_devl() is done by a taskqueue.
You do not understand my point.

I object against imposing one additional global reference on all cdevs
just to cope with the delist hack.  See the patch at the end of the message.

WRT destroy_dev_sched_cb() calling delist_dev(), even after calling
delist_dev(), the node still exists in the /dev. It is only removed
after populate loop is run sometime later. dev_sched() KPI is inheritly
racy, drivers must handle the races for other reasons.

> 
> >>>> In the case of direct free through #1, the reference count is ignored
> >>>> and it doesn't matter if it is one or zero. Only in the case of
> >>>> destruction through destroy_dev() it matters.
> >>>>
> >>>> Like the comment says in destroy_devl():
> >>>>
> >>>> /* Avoid race with dev_rel() */
> >>>>
> >>>> The problem is that the "cdev->si_refcount" is zero when the initial
> >>>> devfs_create() is called. Then one ref is made. When we clear the
> >>>> CDP_ACTIVE flag in devfs_destroy() it instructs a !parallel! running
> >>>> process to destroy all the FS related structures and the reference count
> >>>> goes back to zero when the "cdp" is removed from the "cdevp_list". Then
> >>>> the cdev is freed too early. This happens because destroy_devl() is
> >>>> dropping the dev_lock() to sleep waiting for pending references.
> >>> Basically, this is very good explanation why your delist hack is wrong,
> >>> for one of the reason.  Another reason is explained below.
> >>> You are trying to cover it with additional reference, but this is wrong
> >>> as well.
> >>>
> >>>>
> >>>> Do you see something else?
> >>>
> >>> I think that what you are trying to do with the CDP_ACTIVE hack is doomed
> >>> anyway, because you are allowing for devfs directory to have two entries
> >>> with the same name, until the populate loop cleans up the inactive one.
> >>> In the meantime, any access to the directory operates on random entry.
> >>
> >> The entry will not be random, because upon an open() call to a character
> >> device, I believe the devfs_lookup() function will be called, which
> >> always populate the devfs tree at first by calls to
> >> devfs_populate_xxx(). Any delisted devices which don't have the
> >> "CDP_ACTIVE" bit set, will never be seen by any open.
> > Entry can be random, since after the populate loop is ran, your code in
> > other thread could start and create duplicate entry. There is a window
> > in the lookup where both directory vnode lock and mount point sx locks
> > are dropped. So running the populate does not guarantee anything.
> 
> If there is such a race, it is already there! My patch changes nothing 
> in that area:
> 
> Thread1:
> Calls destroy_dev() and clears CDP_ACTIVE, after dev_unlock() it goes 
> waiting for some refs for xxx milliseconds.
> Thread2:
> Tries to create create a new character device having the same name like 
> the one in thread1. Device name duplication check is missed because 
> CDP_ACTIVE is cleared. Still thread1 is waiting.
> Thread3:
> Tries to open character device created by thread2 while thread1 is still 
> waiting for some ref held by a userspace app to go away.
> 
> This can happen already before my patches! What do you think?
Possibly.

> 
> >
> >>
> >> Regarding leftover filedescriptors which still access the old "cdev"
> >> this is not a problem, and these will be closed when the si_refcount
> >> goes to zero after the destroy_devl() call.
> >>
> >>>
> >>> The checks for existent names in make_dev() are performed for the reason,
> >>> and you makes the rounds to effectively ignore it.
> >>>
> >>
> >> These checks are still correct and don't conflict with my patch from
> >> what I can see. Else the existing destroy_devl() would also be broken
> >> even before my patch with regard to the "random" selection of character
> >> devices at open() from userspace.
> >
> > The checks are done to avoid duplicate names.  Your patch makes these
> > checks ineffective (i.e. broken).
> 
> At what level do you mean duplicate names, I don't get this fully? At 
> the directory level (DE nodes)? Or inside the list of character devices 
> (LIST_XXX)?
It does not matter, dup at either one directory level, or dup of full
names in the global list are equivalent (bad) things.

> 
> > Let me summarize:
> > - the extra reference on arbitrary cdev should be eliminated. The
> >    delist_dev_locked() may add the ref and set some CDP_ flag to
> >    indicate to destroy_dev() that it should do dev_rel().
> 
> It is possible to do this. I thought about this before doing my patch, 
> but decided to try to avoid adding yet another cdev flag.
> 
> > - the existence of the duplicated entries should be either eliminated
> >    (I am not sure it is possible with your code), or we must ensure
> >    that only one name with CDP_ACTIVE flag set and given name exists.
> >    Then, all lookup code must be audited to take CDP_ACTIVE into account
> >    when accessing names.  I see at least devfs_find() and devfs_mknod()
> >    which must be changed.  I did not performed full audit.
> 
> I will check this path out aswell.
> 

It seems it is simpler for me to try to clean up after the commit.
The patch was only lightly tested.  I post the patch for discussion,
not for you to committing it.  I will expedite the change into HEAD
after the consensus on it is made and adequate testing is performed.

diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c
index 294bd62..6620aef 100644
--- a/sys/fs/devfs/devfs_devs.c
+++ b/sys/fs/devfs/devfs_devs.c
@@ -137,12 +137,6 @@ devfs_alloc(int flags)
 	vfs_timestamp(&ts);
 	cdev->si_atime = cdev->si_mtime = cdev->si_ctime = ts;
 	cdev->si_cred = NULL;
-	/*
-	 * Avoid race with dev_rel() by setting the initial
-	 * reference count to 1. This last reference is taken
-	 * by the destroy_dev() function.
-	 */
-	cdev->si_refcount = 1;
 
 	return (cdev);
 }
@@ -192,6 +186,16 @@ devfs_find(struct devfs_dirent *dd, const char *name, int namelen, int type)
 			continue;
 		if (type != 0 && type != de->de_dirent->d_type)
 			continue;
+
+		/*
+		 * The race with finding non-active name is not
+		 * completely closed by the check, but it is similar
+		 * to the devfs_allocv() in making it unlikely enough.
+		 */
+		if (de->de_dirent->d_type == DT_CHR &&
+		    (de->de_cdp->cdp_flags & CDP_ACTIVE) == 0)
+			continue;
+
 		if (bcmp(name, de->de_dirent->d_name, namelen) != 0)
 			continue;
 		break;
diff --git a/sys/fs/devfs/devfs_int.h b/sys/fs/devfs/devfs_int.h
index ce55416..6c57109 100644
--- a/sys/fs/devfs/devfs_int.h
+++ b/sys/fs/devfs/devfs_int.h
@@ -56,6 +56,7 @@ struct cdev_priv {
 	u_int			cdp_flags;
 #define CDP_ACTIVE		(1 << 0)
 #define CDP_SCHED_DTR		(1 << 1)
+#define	CDP_UNREF_DTR		(1 << 2)
 
 	u_int			cdp_inuse;
 	u_int			cdp_maxdirent;
diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c
index 9153588..570f710 100644
--- a/sys/fs/devfs/devfs_vnops.c
+++ b/sys/fs/devfs/devfs_vnops.c
@@ -1045,6 +1045,9 @@ devfs_mknod(struct vop_mknod_args *ap)
 	TAILQ_FOREACH(de, &dd->de_dlist, de_list) {
 		if (cnp->cn_namelen != de->de_dirent->d_namlen)
 			continue;
+		if (de->de_dirent->d_type == DT_CHR &&
+		    (de->de_cdp->cdp_flags & CDP_ACTIVE) == 0)
+			continue;
 		if (bcmp(cnp->cn_nameptr, de->de_dirent->d_name,
 		    de->de_dirent->d_namlen) != 0)
 			continue;
diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c
index bcd6fb9..79c8fea 100644
--- a/sys/kern/kern_conf.c
+++ b/sys/kern/kern_conf.c
@@ -116,6 +116,8 @@ dev_free_devlocked(struct cdev *cdev)
 
 	mtx_assert(&devmtx, MA_OWNED);
 	cdp = cdev2priv(cdev);
+	KASSERT((cdp->cdp_flags & CDP_UNREF_DTR) == 0,
+	    ("destroy_dev() was not called after delist_dev(%p)", cdev));
 	TAILQ_INSERT_HEAD(&cdevp_free_list, cdp, cdp_list);
 }
 
@@ -1035,6 +1037,7 @@ destroy_devl(struct cdev *dev)
 {
 	struct cdevsw *csw;
 	struct cdev_privdata *p;
+	struct cdev_priv *cdp;
 
 	mtx_assert(&devmtx, MA_OWNED);
 	KASSERT(dev->si_flags & SI_NAMED,
@@ -1043,7 +1046,18 @@ destroy_devl(struct cdev *dev)
 	    ("WARNING: Driver mistake: destroy_dev on eternal %d\n",
 	     dev2unit(dev)));
 
-	devfs_destroy(dev);
+	cdp = cdev2priv(dev);
+	if ((cdp->cdp_flags & CDP_UNREF_DTR) == 0) {
+		/*
+		 * Avoid race with dev_rel(), e.g. from the populate
+		 * loop.  If CDP_UNREF_DTR flag is set, the reference
+		 * to be dropped at the end of destroy_devl() was
+		 * already taken by delist_dev_locked().
+		 */
+		dev_refl(dev);
+
+		devfs_destroy(dev);
+	}
 
 	/* Remove name marking */
 	dev->si_flags &= ~SI_NAMED;
@@ -1103,19 +1117,27 @@ destroy_devl(struct cdev *dev)
 		}
 	}
 	dev->si_flags &= ~SI_ALIAS;
-	dev->si_refcount--;	/* Avoid race with dev_rel() */
+	cdp->cdp_flags &= ~CDP_UNREF_DTR;
+	dev->si_refcount--;
 
-	if (dev->si_refcount > 0) {
+	if (dev->si_refcount > 0)
 		LIST_INSERT_HEAD(&dead_cdevsw.d_devs, dev, si_list);
-	} else {
+	else
 		dev_free_devlocked(dev);
-	}
 }
 
 static void
 delist_dev_locked(struct cdev *dev)
 {
+	struct cdev_priv *cdp;
 	struct cdev *child;
+
+	mtx_assert(&devmtx, MA_OWNED);
+	cdp = cdev2priv(dev);
+	if ((cdp->cdp_flags & CDP_UNREF_DTR) != 0)
+		return;
+	cdp->cdp_flags |= CDP_UNREF_DTR;
+	dev_refl(dev);
 	devfs_destroy(dev);
 	LIST_FOREACH(child, &dev->si_children, si_siblings)
 		delist_dev_locked(child);
@@ -1124,6 +1146,7 @@ delist_dev_locked(struct cdev *dev)
 void
 delist_dev(struct cdev *dev)
 {
+
 	dev_lock();
 	delist_dev_locked(dev);
 	dev_unlock();


More information about the svn-src-head mailing list