cvs commit: src/sys/dev/io iodev.c

Bruce Evans brde at optusnet.com.au
Wed Aug 13 11:52:37 UTC 2008


On Tue, 12 Aug 2008, John Baldwin wrote:

> On Tuesday 12 August 2008 01:23:47 pm Bruce Evans wrote:
>> On Tue, 12 Aug 2008, John Baldwin wrote:
>>> Of course bpf is
>>> broken with revoke, but nobody uses revoke with bpf.  What people do do in
>>> the normal course of using bpf is lots of concurrent bpf accesses, and w/o
>>> D_TRACKCLOSE, bpf devices don't get closed.
>>
>> Why not fix the actual bug?
>>
>> Your commit doesn't give enough details on the actual bug, so I will
>> try to guess it:  libpcap has to probe for a free bpf unit, so it
>> does lots of failing opens of bpfN (especially when N == 0) when bpfN
>> is already in use.  Failing opens break last closes with which they
>> are concurrent, because the relevant reference count (si_usecount) is
>> increased during the failing open (I think it is the vref() in _fgetvp()
>> that does it)...
>
> Correct-ish.  The actual extra reference is in devfs_lookup() rather than
> _fgetvp().  Specifically, it is an open concurrent with a close.  The opening
> thread first does the lookup which bumps the reference count in
> devfs_allocv() (IIRC).  Eventually we get to devfs_open() which drops the
> vnode lock while it invokes d_open().  If the closing thread is waiting for
> the vnode lock for close, then it can acquire the lock and do devfs_close().

si_usecount is actually locked by dev_lock() (a global mutex), so in general
the vnode lock is neither necessary nor sufficient for implementing or fixing
the bug, but the general case is rarely used.  The general case involves
doing failing opens using more than one devfs mount.  dev_lock() is only
held briefly, so failing opens can easily get in and change si_usecount.
count_dev() and vcount() of course use dev_lock(), but this only gives the
transient value, so the value means very little unless it is the magic value
1 (this means that our reference is the only one).  devfs_close() has to
unlock everything, so si_usecount becomes even more volatile.
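To see why the transient value means so little, consider something like the
following (userland C with pthreads and hypothetical names; not the real
devfs/cdev code, just a sketch of the timing): the closing thread samples
the count under the lock, drops the lock, and then acts on a snapshot that a
concurrent failing open has already invalidated.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t dev_mtx = PTHREAD_MUTEX_INITIALIZER;
static int usecount = 1;		/* the closing thread's own reference */

static void *
failing_open(void *arg)
{
	(void)arg;
	/* A concurrent (even failing) open bumps the count for a while. */
	pthread_mutex_lock(&dev_mtx);
	usecount++;
	pthread_mutex_unlock(&dev_mtx);
	usleep(1000);			/* ... the d_open() eventually fails ... */
	pthread_mutex_lock(&dev_mtx);
	usecount--;
	pthread_mutex_unlock(&dev_mtx);
	return (NULL);
}

int
main(void)
{
	pthread_t t;
	int snapshot;

	pthread_create(&t, NULL, failing_open, NULL);
	usleep(100);			/* let the failing open get in first */

	pthread_mutex_lock(&dev_mtx);
	snapshot = usecount;		/* like count_dev() under dev_lock() */
	pthread_mutex_unlock(&dev_mtx);

	if (snapshot > 1)
		printf("snapshot %d > 1: last close skipped\n", snapshot);

	pthread_join(t, NULL);
	printf("count is back to %d, but nothing will ever do the last close\n",
	    usecount);
	return (0);
}

The snapshot exceeds 1 only because of the failing open, so the real last
close is skipped and nothing ever performs it, which is essentially the bpf
symptom.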

> This sees count_dev() > 1 and doesn't call d_close().  Meanwhile, the opening
> thread will fail bpfopen() and return, but the bpf device is permanently
> open.  When I talked with phk@ about it originally his reply to my various
> suggestions was D_TRACKCLOSE.  I'm not sure how you'd really fix this
> otherwise:

I would first try to actually count opens correctly.  It should be possible
to tell when a last-close is needed if everything is locked.  This is
complicated by the possibility of any number of opens and last-closes
already being in progress, with each having progressed to a timing-dependent
and driver-dependent state.  This can certainly happen now, but at least the
general case of it shouldn't be allowed to happen since no drivers can handle
it.  Ways that it can happen:
- after devfs_close() sees count_dev() == 1, it calls d_close().  Since it
   unlocks everything, devfs_open() can easily call d_open() any number of
   times before the d_close() completes.
- d_close() may be called any number of times to last-close the new open
   instances.  The new d_close()s may complete before the original ones.
This behaviour is easy to set up for hardware serial devices, and is
useful.  Let the first open instance be blocking and make it block on
drainwait in close.  Do several non-blocking open/closes using stty
-f to look at the problem.  Then do another non-blocking open/close
using stty -f or comcontrol to clear the problem.
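As a sketch of what "counting opens correctly" might look like (again
hypothetical names and userland C with pthreads, not the real cdev
interfaces): count only successful d_open()s, under a per-device lock that
lookups and failing opens never touch, and derive the last-close decision
from that count alone.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct mydev {
	pthread_mutex_t	 mtx;
	int		 opens;		/* successful opens not yet closed */
};

/* Trivial stand-ins for a driver's d_open() and last-close work. */
static int
drv_open(struct mydev *d)
{
	(void)d;
	return (0);			/* pretend the open succeeds */
}

static void
drv_lastclose(struct mydev *d)
{
	(void)d;
	printf("last close\n");
}

static int
dev_open(struct mydev *d)
{
	int error;

	error = drv_open(d);
	if (error != 0)
		return (error);		/* a failing open leaves no trace */
	pthread_mutex_lock(&d->mtx);
	d->opens++;
	pthread_mutex_unlock(&d->mtx);
	return (0);
}

static void
dev_close(struct mydev *d)
{
	bool last;

	pthread_mutex_lock(&d->mtx);
	last = (--d->opens == 0);
	pthread_mutex_unlock(&d->mtx);
	if (last)
		drv_lastclose(d);	/* can still overlap new opens; see below */
}

int
main(void)
{
	struct mydev d;

	pthread_mutex_init(&d.mtx, NULL);
	d.opens = 0;

	dev_open(&d);
	dev_open(&d);
	dev_close(&d);
	dev_close(&d);			/* only this one is a last close */
	return (0);
}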

> calling d_close() from devfs_open() if the use_count is 1 after
> re-acquiring the vnode lock (you have to drop the vnode lock while you call
> d_close() though, so you might still race with other concurrent opens for
> example), having a count of pending opens and subtracting that from
> count_dev() when checking for whether or not to call d_close() (but is that
> dubious?

This is basically what I put in FreeBSD-1.  It almost worked there.
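A sketch of that pending-opens variant (hypothetical names; a plain counter
stands in for count_dev(), and this is not claimed to be what FreeBSD-1
actually did):

#include <pthread.h>

static pthread_mutex_t dev_mtx = PTHREAD_MUTEX_INITIALIZER;
static int usecount;		/* stand-in for what count_dev() reports */
static int pending_opens;	/* opens past lookup but not yet through d_open() */

static void
dev_close(void)
{
	int effective;

	pthread_mutex_lock(&dev_mtx);
	/*
	 * Discount opens still in flight: each one either fails (and drops
	 * its reference) or succeeds (and will do its own close later), so
	 * neither case should suppress this last close.
	 */
	effective = usecount - pending_opens;
	pthread_mutex_unlock(&dev_mtx);
	if (effective <= 1) {
		/* Call d_close() here; it can still overlap a d_open()
		   that already got past the pending stage, which is the
		   remaining race described below. */
	}
}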

> Then you might call d_close() while d_open() is running, so the
> driver would need to put locking to synchronize the two and handle the case
> that the d_close() actually runs after a d_open() that was successful, so
> each driver now has to duplicate that work.), etc.

Strictly, all drivers already need to do this.

In FreeBSD-1, simple methods worked provided d_close() and d_open()
did not sleep, since the kernel was not preemptive.  After sleeping, it
is obvious that d_close() and d_open() should check for relevant state
changes after waking up and not continue if the changes are too large.
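A sketch of that recheck-after-sleeping idea (hypothetical names; a
condition variable stands in for a tsleep on drainwait):

#include <pthread.h>

struct mydev {
	pthread_mutex_t	 mtx;
	pthread_cond_t	 drain_cv;
	int		 gen;		/* bumped on every successful open */
	int		 outq;		/* pretend output-queue length */
};

/*
 * Hypothetical last-close that sleeps waiting for output to drain.  While
 * it sleeps, any number of opens and closes may run; comparing the
 * generation after waking up tells it whether its view of the device is
 * stale, in which case the teardown is abandoned.
 */
static void
dev_lastclose(struct mydev *d)
{
	int mygen;

	pthread_mutex_lock(&d->mtx);
	mygen = d->gen;
	while (d->outq > 0 && d->gen == mygen)
		pthread_cond_wait(&d->drain_cv, &d->mtx);
	if (d->gen != mygen) {
		/* Reopened while we slept: no longer a last close. */
		pthread_mutex_unlock(&d->mtx);
		return;
	}
	/* ... tear down device state here ... */
	pthread_mutex_unlock(&d->mtx);
}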

Now, similar behaviour could be obtained using locking.  Don't unlock
things in devfs_close() and devfs_open() by default.  Only drivers
that unlocked things would face the full complications.  In particular,
in the usual case, holding dev_lock() or a corresponding device-dependent
lock would block new opens (and even lookups?) until after d_close()
returns.
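A sketch of that arrangement (hypothetical names again; a flag plus
condition variable stand in for dev_lock() or a device-dependent lock): new
opens simply wait while a last close is in progress, so a driver that does
not drop the lock sees fully serialized open/close transitions.

#include <pthread.h>
#include <stdbool.h>

struct mydev {
	pthread_mutex_t	 mtx;
	pthread_cond_t	 cv;
	int		 opens;
	bool		 closing;	/* a last close is in progress */
};

static int
dev_open(struct mydev *d)
{
	pthread_mutex_lock(&d->mtx);
	while (d->closing)		/* new opens wait for d_close() to return */
		pthread_cond_wait(&d->cv, &d->mtx);
	d->opens++;
	pthread_mutex_unlock(&d->mtx);
	return (0);
}

static void
dev_close(struct mydev *d)
{
	pthread_mutex_lock(&d->mtx);
	if (--d->opens > 0) {
		pthread_mutex_unlock(&d->mtx);
		return;
	}
	d->closing = true;
	/*
	 * With the lock held across the last-close work (only drivers that
	 * explicitly drop it face the full complications), no new open can
	 * slip in between the decision and the teardown.
	 */
	/* ... driver last-close work ... */
	d->closing = false;
	pthread_cond_broadcast(&d->cv);
	pthread_mutex_unlock(&d->mtx);
}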

Bruce

