cvs commit: src/sys/dev/io iodev.c

Wed Aug 13 10:16:29 UTC 2008

On Wed, 13 Aug 2008, Bruce Evans wrote:

> On Tue, 12 Aug 2008, John Baldwin wrote:

>> Of course bpf is
>> broken with revoke, but nobody uses revoke with bpf.  What people do do in
>> the normal course of using bpf is lots of concurrent bpf accesses, and w/o
>> D_TRACKCLOSE, bpf devices don't get closed.
>
> Why not fix the actual bug?
>
> Your commit doesn't give enough details on the actual bug, so I will
> try to guess it:  libpcap has to probe for for a free bpf unit, so it
> does lots of failing opens of bpfN (especially when N == 0) when bpfN
> is already in use.  Failing opens break last closes with which they
> are concurrent, because the relevant reference count (si_usecount) is
> increased during the failing open (I think it is the vref() in _fgetvp()
> that does it).  Then when the opens fail, si_usecount is decremented
> to 1, but devfs_close() is not called again because only 1 real last
> close is possible (I think -- at least without races with revoke()),
> so d_close() is never called twice for 1 real least close.  Failing
> opens shouldn't take long, so it is surprising that the race is often
> lost.  Apparently there is some synchronization.
>
> This bug probably also affects the probe for a free pty.

In fact, it is well known that it does.  There is a PR or 2 about this,
and committed fixes that were backed out because they gave panics instead
of just leaked ptys.  The following seems to my most recent mail about
this.  The test program in it still leaks ptys under -current.

% From bde at zeta.org.au Thu Nov  9 16:39:06 2006 +1100
% Date: Thu, 9 Nov 2006 16:39:03 +1100 (EST)
% From: Bruce Evans <bde at zeta.org.au>
% X-X-Sender: bde at delplex.bde.org
% To: Rong-en Fan <grafan at gmail.com>
% cc: Kostik Belousov <kostikbel at gmail.com>, bde at FreeBSD.org, tegge at FreeBSD.org, 
%     mb at imp.ch, Vlad Galu <dudu at dudu.ro>
% Subject: Re: Fwd: panic when portupgrade in jail (devfs related?)
% In-Reply-To: <6eb82e0611081559s2dd6d088k529b161a63b9eec2 at mail.gmail.com>
% Message-ID: <20061109155921.O59908 at delplex.bde.org>
% References: <6eb82e0611050847i54d16638x89c428c9dffcc106 at mail.gmail.com> 
%  <6eb82e0611060946p726b3418of94036f583e95dcc at mail.gmail.com> 
%  <20061106193405.GU12108 at deviant.kiev.zoral.com.ua> 
%  <6eb82e0611061146gd6b8402xc254a992d149dd11 at mail.gmail.com> 
%  <20061107041730.GV12108 at deviant.kiev.zoral.com.ua> 
%  <6eb82e0611062223o4b7193bbu4fb19c79f49e9700 at mail.gmail.com> 
%  <6eb82e0611070026m682bc3f7vfdbcb7c0527fdd06 at mail.gmail.com> 
%  <6eb82e0611080221u47473713m17b19e60307663a0 at mail.gmail.com>
%  <6eb82e0611081559s2dd6d088k529b161a63b9eec2 at mail.gmail.com>
% MIME-Version: 1.0
% Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
% Status: O
% X-Status: 
% X-Keywords: 
% X-UID: 623
% 
% On Thu, 9 Nov 2006, Rong-en Fan wrote:
% 
% > OK, after running 4 instances of script in jail for 10+ hours. It seems
% > that ttys are leaked (I haven't got any panic). I get lots of
% >
% > script: openpty: No such file or directory
% 
% This is another known bug.  No fix is in sight, but I think one of the
% other known bugs can be exploited to write a program to unleak the
% ptys.  (Arrange for a nonzero vfs refcount, possibly by attempting to
% open the leaked pty, and race with revoke().  The open() will always fail
% int ptcopen(), but it may be possible to trick revoke() into calling
% ptcclose() for the unopen device; then the ptcclose() will unleak the
% device.)
% 
% I now have a better SMP machine for testing the races.  The original
% version of my program to demonstrate the leak doesn't show the leak
% very fast under -current on this machine, and it takes 2 instances
% (parent doing open()s and child doing stat()s) to leak at all, though
% the leak occurs fast on a machine with similar speed (but configured
% with INVARIANTS etc) running 6.1.  Apparently, locking fixes in
% -current have fixed or reduced the race with stat().
% 
% The following version (with TIOCEXCL and both the parent and child
% doing open()'s) leaks fast on all machines that I tested on.
% 
% %%%
% #include <sys/ioctl.h>
% #include <sys/stat.h>
% 
% #include <err.h>
% #include <fcntl.h>
% #include <unistd.h>
% 
% int
% main(int argc, char **argv)
% {
%  	struct stat sb;
%  	const char *pty, *who;
%  	pid_t child, parent;
%  	int fd, i, lastok;
% 
%  	pty = (argc == 1 ? "/dev/ptyp0" : argv[1]);
%  	i = 0;
%  	lastok = -1;
%  	parent = getpid();
%  	child = fork();
%  	who = (child == 0 ? "child" : "parent");
%  	for (;;) {
%  		fd = open(pty, O_WRONLY | O_NONBLOCK);
%  		if (fd < 0) {
%  			if (i - lastok >= 1000) {
%  				if (child != 0)
%  					usleep(50000);
%  				err(1, "%s open %d", who, i);
%  			}
%  			i++;
%  			continue;
%  		}
%  		(void)ioctl(fd, TIOCEXCL);
%  		if (close(fd) != 0)
%  			err(1, "%s close %d", who, i);
%  		lastok = i;
%  		i++;
%  	}
% }
% %%%
% 
% Usage is something like "./foo /dev/ptyp<victim>" as non-root.  It
% quits after 1000 failed opens and a leaked pty is shown by failing
% open #'s 0 through 999 and printing i == 999 when quitting.  Some
% open failures are now normal for ptyp* since all opens of ptyp*
% are exclusive and the program now races itself.  The TIOCEXCL is to
% give similar exclusivity for testing other devices.  It only works
% as non-root.  Remove it to simplify testing of ptyp* only.  After
% an open() failure, the usleep() is to give up control so as not
% to see 1000 sequential open() failures on UP systems just because
% 1000 open()s can be done before being rescheduled.  Remove it to
% simplify testing on SMP systems.
% 
% Bruce
%

Any device tha enforces exclusive access for all opens (instead of
selectively/correctly according to O_EXCL and/or TIOCEXCL) has the
same leak.  Such devices must have some state which indicates that
the device is open.  When a last-close is missed, the device-is-open
state remains set, so the device cannot be opened again.  Then it
cannot be closed again.  For devices that don't do this, the missed
last-close isn't much of a problem.  New opens just work, and it
is possible to unleak the device by repeating open()/close() until
the last-close isn't missed.

Bruce