kern/109152: [rp] RocketPort panic from device_unbusy()

Bruce Evans bde at zeta.org.au
Fri Mar 16 12:02:34 UTC 2007


On Thu, 15 Mar 2007, Doug Ambrisko wrote:

> Craig Leres writes:
> | I was still able to crash Doug Ambrisko's version of the rp driver.
> | I think the problem is that in some cases rp_handle_port() calls
> | rpclose() which calls device_unbusy(). Later rpclose() is called
> | again and we hit the panic.
> |
> | I looked at the last known good version of the driver I'd used,
> | 1.45.2.2 from 4.10-RELEASE, and found it has code to avoid calling
> | rpclose() twice. I did something similar that seems to work.
> |
> | Note: The appended diffs are against version 1.67.2.2 of rp.c.
> ...
> I wonder if this should be a reference count?  Good find.

It is already a (buggy) reference count, and that is what causes the
panic.  device_busy() increments the device reference count, and
device_unbusy() decrements it.  A panic occurs when the reference
count is decremented below 0.

device_busy()/device_unbusy() pairs cannot work in general, since even
vfs doesn't know how to do the reference counting necessary for using
them.  Fortunately, only 6 drivers in -current use such pairs.  These
drivers are psm, bktr, drm, fdc, rp and agp.  The panic is most likely
to be triggered by rp, because it is the only tty driver among these,
the vfs refcount problems are largest near revoke(2), and revoke()s
are usually only done on tty devices.  revoke() sometimes causes a
last close on a device that is already closed or in the process of
being closed (driver blocked in it close function).  The result in
naive drivers like rp, that do an unconditional device_busy() in their
open function and an unconditional device_unbusy() in their close
function, is the device refcount going below 0 on the doubled close.
Conditional busy/unbusy calls are little better.  vfs doesn't know
the correct vfs refcount, and device drivers cannot know what vfs is
doing better than vfs (they can only know better what they are doing,
but the problem is partly internal).

Naive unconditional calls to device_busy()/device_unbusy() in a driver's
open/close functions also cannot work.  It is possible for revoke()
to be called while a driver's i/o/ioctl functions are active.  revoke()
then calls close() to cancel activity on the device.  Most drivers are
not aware of the possibility of close being called on active devices,
so they don't do anything to force the i/o/ioctl functions to finish
before the close, and it is quite likely for the close to complete
first.  Then calling device_unbusy() for the close doesn't panic, but
is wrong when it decrements the device refcount to 0, since the device
is still busy in the i/o/ioctl functions at that point.  Panics might
occur would an uncontrolled way if upper layers actually used the
device refcount alone to device whether to unload the device, but I
think the device refcount is unused except for invariants checking
(not under INVARIANTS) in device_unbusy().  Since device_busy()/unbusy()
are unusable, the invariants checking reduces to checking that they
are never used in cases where they have an effect :-).

I sent mail to Craig about different details of the bugs.  The preceding
paragraph (about why device_unbusy() cannot be called unconditionally in
device close, since being closed doesn't imply being unbusied) is the
best argument for never using these functions.

Bruce


More information about the freebsd-bugs mailing list