vgone() calling VOP_CLOSE() -> blocked threads?

Kostik Belousov kostikbel at gmail.com
Sat Mar 15 21:06:00 UTC 2008


On Sun, Mar 16, 2008 at 03:55:18AM +1100, Bruce Evans wrote:
> On Sat, 15 Mar 2008, Ed Schouten wrote:
> 
> >The last couple of days I'm seeing some strange things in my mpsafetty
> >branch related to terminal revocation.
> >
> >In my current TTY design, I hold a count (t_ldisccnt) of the amount of
> >threads that are sleeping in the line discipline. I need to store such a
> >count, because it's not possible to change line disciplines while some
> >threads are still blocked inside the discipline. This means that when
> >d_close() is called on a TTY, t_ldisccnt should always be 0. There
> >cannot be any threads stuck inside the line discipline when there aren't
> >any descriptors referencing it.
> >
> >Unfortunately, this isn't entirely true with the current VFS/devfs
> >design. When vgone() is called, a VOP_CLOSE() is performed , which means
> >there could be a dozen threads still stuck inside a device driver, but
> >the close routine is already called to clean up stuff. There are a
> >*real* lot of drivers that blindly clean up their stuff in the d_close()
> >routine, expecting that the device is completely unused. This can
> >easily be demonstrated by revoking a bpf device, while running tcpdump.
> 
> Yes, most drivers are broken here, but the problem is rarely noticed
> because revoke() isn't normally applied to any devices except ttys.
> Even ordinary close() can cause problems when a thread is sleeping
> in device open, but this too is only common for ttys (for callin and
> callout devices).
> 
> The tty driver is about the only driver that handles this problem
> almost correctly.  It uses a generation count.  All tty drivers are
> supposed to sleep using only ttysleep().  ttysleep() checks the
> generation count and returns ERESTART if the generation is new.  All
> tty drivers should consider this error to be fatal and propagate it
> up to the syscall level where the syscall is restarted.  This tends
> to happen naturally, but some places (in device close IIRC), the driver
> ignores the error and does more i/o (to finish cleaning up in close
> -- close and open can easily pass each other and clobber each others
> state when this happens).  More I/O also tends to occur if a revoke()
> happens when a thread is blocked but not sleeping.  Then ttysleep()
> isn't in sight, so the thread has no idea that the generation count
> changed.  Giant locking limits this problem.
> 
> >To be honest, I'm not completely sure how to solve this issue, though I
> >know it should at least do something similar to this:
> >
> >- The device driver should have a seperate routine (d_revoke) to wake
> > up any blocked threads, to make sure they leave the device driver
> > properly.
> 
> Something is needed to signal blocked but non-sleeping threads.  I
> think the wakeup for ttys now normally occurs as a side effect of
> flushing i/o.  revoke() normally calls ttyclose() which calls ttyldclose()
> which normally calls ttylclose() which flushes i/o which wakes up
> threads waiting on the i/o.  I don't see how ttylclose() can work right
> in the usual !FNONBLOCK case.  Maybe revoke() sets FNONBLOCK.  The
> generation count stuff doesn't help here because the flush is done
> before incrementing the generation count.
> 
> There is an obvious race here for threads doing i/o instead of waiting
> for it.  These muse be blocked (by Giant now for ttys, or by your
> MPSAFE locking).  They will run when revoke() releases the lock and
> find the i/o flushed and maybe the generation count incrememented, but
> they normally won't check these states and will just blunder on doing
> more i/o.  It would be painful to check these states every time the
> lock is aquired, but this seems to be necessary.  Magic Giant locking
> makes the places where the lock is acquired hard see.
> 
> >- Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the
> > vnode into deadfs, with the exception of the close() routine. Maybe
> > it's better to add a new function to do this, vrevoke().
> >
> >This means that when a revoke() call is performed, all blocked threads
> >are woken up, will leave the driver, to find out their terminal has been
> >revoked. Further system calls will fail, because the vnode is in deadfs,
> >but when the processes close the descriptor, the device driver can still
> >clean up everything.
> 
> I think vfs already moves the vnode to deadfs.  It doesn't do anything
> to synchronize with threads running in device drivers.  The forced
> last-close() should complete synchronously as part of revoke().  Then
> other threads leave the device driver asynchronously, hopefully not
> much later.  Then if the generation count stuff is working right, the
> syscall is restarted, but now file descriptors point to deadfs so the
> syscall normally fails.  I think the async completion is OK provided
> it is done right (don't delay it indefinitely, and don't do more
> i/o on completion).  It doesn't seem to be useful to make revoke()
> wait for the completions.
> 
> I don't think it would work well to move everything except d_close to
> deadfs.
> 
> Other problems near here:
> - neither vfs nor drivers currently know how many threads are in a
>   driver.  vfs uses vp->v_rdev->si_usecount, but this doesn't quite work
This is provided by si_threadcount.
See the dev(vn)_refthread and it usage in the devfs vnops and fops.


>   since it doesn't count threads sleeping in open.  Maybe ones excuting
>   last close too -- this would be more of a problem.  revoke() just
>   uses vcount(), which just acquires the device locks and returns
>   si_usecount after releasing the device lock.  (I don't understand
>   this locking -- what stops the count changing after the lock is
>   released, or if it cannot changed then why acquire the lock?)  This
>   can result in revoke() not calling device close when it should.
>   Drivers can obviously keep count of their activities using large
>   code.  I can't see any way for vfs to keep count short of asking
>   drivers for their counts.
> - there can be any number of threads in device open and close concurrently,
>   even without the complications for revoke().  The most problematic
>   cases happen when last-close blocks, as is common for ttys waiting
>   for output to drain (since no one cares about their output actually
>   working and ensures draining it using tcdrain() -- normal losing
>   programs finish up with something like printf(); exit(); and depend
>   on the close() in exit(); blocking to drain the output).  Then new
>   opens are allowed, and this is useful for doing ioctls() to unblocked
>   blocked closes.  If the new open or fcntl sets non-blocking mode, then
>   the last-close for the new open may pass the blocked last close.  If
>   the new mode is blocking, then the last-close for the new open may block
>   too.  The number of threads in last-close is thus unlimited.  A thundering
>   herd of them tends to stomp on each other when they are all unblocked at
>   the same time.
> 
>   The connections of this with revoke() are:
>   - it takes vfs's not counting of all threads in the device driver to
>     allow the useful behaviour of opens while a close is blocked and
>     the necessary behaviour of last-close while another last-close is
>     executing (drivers should be aware of this possibility and merge
>     the closes, but don't).
>   - I think revoke() sets FNONBLOCK somewhere.  Thus it tends to unblock
>     any thread waiting in last-close for output to drain.
> 
>   Less problematic cases occur when opens block.  ttyopen() understands
>   this possibility and handles it almost right using its t_wopeners
>   count.  ttyopen() uses various sleeps where it should use ttysleep()
>   or check the generation count itself; this results in it looping
>   internally instead of restarting the syscall, which is only a small
>   error since for open() alone, restarting the syscall would call back
>   to the same non-dead device open except in unusual cases where there
>   was a signal and syscalls are not restarted, or the device name went
>   away.  There is still a problem with the vfs usage counting -- in
>   one case involving callin and callout devices whose details I forget,
>   last-close is not called when it needs to be called to wake up all
>   the threads sleeping in open so that they can enter a new state.

The device driver already could provide the d_purge method that
is intended to safely drain all threads that are now in the
driver. See the kern_conf.c:destroy_dev() for the usage.

Also, please note that the drivers cannot call destroy_dev() from the
d_close method due to selflock with si_threadcount. The livelock is
caused by the fix for the problem identical to the problem you described,
with substitution s/ldisc/cdev/. The destroy_dev_sched()
function is provided to execute destroy_dev() from another context.

Alternatively to what was proposed regarding vrevoke(), you could
use the similar lifecycle management for the ldisc, if suitable.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080315/3ffb6366/attachment.pgp


More information about the freebsd-arch mailing list