cvs commit: src/sys/dev/io iodev.c
brde at optusnet.com.au
Tue Aug 12 17:23:59 UTC 2008
On Tue, 12 Aug 2008, John Baldwin wrote:
> On Tuesday 12 August 2008 10:07:43 am Bruce Evans wrote:
>> I checked that bpf panics (even under UP) due to the obvious bugs in
>> its d_close():
>> # Generate lots of network activity using something like:
>> sysctl net.inet.icmp.icmplim=0; ping -fq localhost &
>> # Race to panic eventually:
>> while :; do tcpdump -i lo0 & sleep 0.001; revoke /dev/bpf0
>> Most or all device drivers have obvious bugs in their d_close(); bpf
>> is just a bit easier to understand and more likely to cause a panic
>> than most device drivers, since it is simple and frees resources. A
>> panic is very likely when si_drv1 is freed, and si_drv1 is only locked
> I think revoke(2) should EINVAL (or ENOTTY) for non-ttys.
That might work, but revoke() is supposed to work for all types of
files, and there may other paths to a forced d_close(). umount -f
should be one (umount -f should call vflush() with flags FORCECLOSE;
then vflush() calls vgonel(). revoke() is essentially vgone()).
However, devfs seems to be already missing support for umount -f.
> Of course bpf is
> broken with revoke, but nobody uses revoke with bpf. What people do do in
> the normal course of using bpf is lots of concurrent bpf accesses, and w/o
> D_TRACKCLOSE, bpf devices don't get closed.
Why not fix the actual bug?
Your commit doesn't give enough details on the actual bug, so I will
try to guess it: libpcap has to probe for for a free bpf unit, so it
does lots of failing opens of bpfN (especially when N == 0) when bpfN
is already in use. Failing opens break last closes with which they
are concurrent, because the relevant reference count (si_usecount) is
increased during the failing open (I think it is the vref() in _fgetvp()
that does it). Then when the opens fail, si_usecount is decremented
to 1, but devfs_close() is not called again because only 1 real last
close is possible (I think -- at least without races with revoke()),
so d_close() is never called twice for 1 real least close. Failing
opens shouldn't take long, so it is surprising that the race is often
lost. Apparently there is some synchronization.
This bug probably also affects the probe for a free pty.
The bug affects callin/callout serial devices in a non-race fashion.
I committed a work-around for it in FreeBSD-1 and still use this, but
never committed the work-around to FreeBSD-2+. The workaround is to
not count non-returned d_open()s in count_dev() or vcount(). This is
only a workaround because it might be correct to count non-returned
d_open()s in some cases. One interesting case is the vcount(vp) > 1
check for revoke(). We hold 1 reference to the vp, and non-returned
d_open()s (or even to-be-called d_open()s) may hold more, so vcount()
may be > 1 for non-open devices. Then we eventually call devfs_close(),
and devfs_close() may later be confused into calling d_close() on a
non-open device (I think devfs_close() is confused enough iff there
is precisely 1 non-returned d_open() later). This is bogus, but we
should do something to cancel any partially complete d_open()s if
there is >= 1 of them.
I once thought that using D_TRACKCLOSE was the right fix for this in
serial drivers. Now I know better :-). The bug affects all devices,
and has a tricky interaction with revoke() and with open()s and closes()
racing each other. vfs should handle it as an easy special case of
handling the others.
More information about the cvs-all