protection against module unloading ?

Mon Jul 13 19:15:06 UTC 2015

On Mon, Jul 13, 2015 at 08:56:36PM +0200, Luigi Rizzo wrote:
> On Mon, Jul 13, 2015 at 06:29:12PM +0300, Konstantin Belousov wrote:
> > On Mon, Jul 13, 2015 at 05:00:30PM +0200, Luigi Rizzo wrote:
> ...
> > > thanks a lot for the clarification on the intent.
> > > I clearly need to understand more on the architecture of the module unload.
> > > 
> > > In any case: the global contention on devmtx for I/O syscalls is
> > > really a showstopper for making effective use of modular drivers
> > > so we should really find a way to remove it.
> > What contention do you see ?  Is it on the device node for a single
> > device ?  If yes, then any modification of the below proposal would
> > not help.  I explained this below.
> 
> It was adrian that pointed it out to me the huge devmtx contention
> with multiple threads doing I/O on netmap file descriptor
> (4-8 threads each of them issuing around 200K syscalls/s)
> 
> Now i see how even if my idea of per-dev lock was correct
> it would not remove contention at all.
> 
> One final thing:
> 
> > > Is there any other way to protect access to dev->si_threadcount ?
> > > 
> > > Eg how about the following:
> > > - use a (leaf) lock into struct cdev to protect dev->si_threadcount, so that
> > >   we could replace dev_lock() with mtx_lock(&dev->foo) in dev_refthread(dev)
> > >   dev_relthread(dev) and other places that access si_threadcount
> > This would not work, you cannot protect a lifetime of the object by a lock
> > contained in the object.
> 
> i thought so but then the current dev_refthread() is already unsafe,
> accessing dev->si_flags unprotected
> 
>     sys/kern/kern_conf.c:
> 
> 	struct cdevsw *
> 	dev_refthread(struct cdev *dev, int *ref)
> 	{
> 		struct cdevsw *csw;
> 		struct cdev_priv *cdp;
> 
> 		mtx_assert(&devmtx, MA_NOTOWNED);
> 		if ((dev->si_flags & SI_ETERNAL) != 0) {
> 			*ref = 0;
> 			return (dev->si_devsw);
> 		}
> 		dev_lock();
> 		csw = dev->si_devsw;
> 		if (csw != NULL) {
> 			cdp = cdev2priv(dev);
> 			if ((cdp->cdp_flags & CDP_SCHED_DTR) == 0)
> 				atomic_add_long(&dev->si_threadcount, 1);
> 			else
> 				csw = NULL;
> 		}
> 		dev_unlock();
> 		*ref = 1;
> 		return (csw);
> 	}
> 
> that is particularly bad though, because it prevents from
> checking the SI_ETERNAL flag without holding the lock (short
> of encoding the flag in the pointer!)
> 

I believe this check is fine, although for the complicated reasons.

E.g., for devfs_open(), the vnode is locked, so it cannot be reclaimed.
Since the vnode is not doomed, it referenced the v_rdev and struct cdev
memory cannot be freed and reused. So the access to the so_flags is
fine.

Note that devfs_fp_check(), which is called with the vnode unlocked,
still has a guarantee that the vnode is not going away, because the
vnode is referenced by the file descriptor. Then, the devvn_refthread()
checks VV_ETERNALDEV, without accessing the struct cdev at all. If the
vnode claims that the backing device is eternal, we have a guarantee
that v_rdev is either valid cdev, or NULL.

For the non-managed device pager, the vm object owns a reference on
the cdev, so again the device memory cannot go away while in the
dev_refthread().

I believe I considered all cases when I added eternal flag.

But note that what I explained above does not allow to put the per-device
lock (like dev_mtx) into the struct cdev, e.g. due to the vnode reference
trick.  If the vnode is not VV_ETERNAL, v_rdev might be invalidated at
any time, if you do not own vnode lock or interlock.