vgone() calling VOP_CLOSE() -> blocked threads?
Kostik Belousov
kostikbel at gmail.com
Sat Mar 15 21:06:00 UTC 2008
On Sun, Mar 16, 2008 at 03:55:18AM +1100, Bruce Evans wrote:
> On Sat, 15 Mar 2008, Ed Schouten wrote:
>
> >The last couple of days I'm seeing some strange things in my mpsafetty
> >branch related to terminal revocation.
> >
> >In my current TTY design, I hold a count (t_ldisccnt) of the amount of
> >threads that are sleeping in the line discipline. I need to store such a
> >count, because it's not possible to change line disciplines while some
> >threads are still blocked inside the discipline. This means that when
> >d_close() is called on a TTY, t_ldisccnt should always be 0. There
> >cannot be any threads stuck inside the line discipline when there aren't
> >any descriptors referencing it.
> >
> >Unfortunately, this isn't entirely true with the current VFS/devfs
> >design. When vgone() is called, a VOP_CLOSE() is performed , which means
> >there could be a dozen threads still stuck inside a device driver, but
> >the close routine is already called to clean up stuff. There are a
> >*real* lot of drivers that blindly clean up their stuff in the d_close()
> >routine, expecting that the device is completely unused. This can
> >easily be demonstrated by revoking a bpf device, while running tcpdump.
>
> Yes, most drivers are broken here, but the problem is rarely noticed
> because revoke() isn't normally applied to any devices except ttys.
> Even ordinary close() can cause problems when a thread is sleeping
> in device open, but this too is only common for ttys (for callin and
> callout devices).
>
> The tty driver is about the only driver that handles this problem
> almost correctly. It uses a generation count. All tty drivers are
> supposed to sleep using only ttysleep(). ttysleep() checks the
> generation count and returns ERESTART if the generation is new. All
> tty drivers should consider this error to be fatal and propagate it
> up to the syscall level where the syscall is restarted. This tends
> to happen naturally, but some places (in device close IIRC), the driver
> ignores the error and does more i/o (to finish cleaning up in close
> -- close and open can easily pass each other and clobber each others
> state when this happens). More I/O also tends to occur if a revoke()
> happens when a thread is blocked but not sleeping. Then ttysleep()
> isn't in sight, so the thread has no idea that the generation count
> changed. Giant locking limits this problem.
>
> >To be honest, I'm not completely sure how to solve this issue, though I
> >know it should at least do something similar to this:
> >
> >- The device driver should have a seperate routine (d_revoke) to wake
> > up any blocked threads, to make sure they leave the device driver
> > properly.
>
> Something is needed to signal blocked but non-sleeping threads. I
> think the wakeup for ttys now normally occurs as a side effect of
> flushing i/o. revoke() normally calls ttyclose() which calls ttyldclose()
> which normally calls ttylclose() which flushes i/o which wakes up
> threads waiting on the i/o. I don't see how ttylclose() can work right
> in the usual !FNONBLOCK case. Maybe revoke() sets FNONBLOCK. The
> generation count stuff doesn't help here because the flush is done
> before incrementing the generation count.
>
> There is an obvious race here for threads doing i/o instead of waiting
> for it. These muse be blocked (by Giant now for ttys, or by your
> MPSAFE locking). They will run when revoke() releases the lock and
> find the i/o flushed and maybe the generation count incrememented, but
> they normally won't check these states and will just blunder on doing
> more i/o. It would be painful to check these states every time the
> lock is aquired, but this seems to be necessary. Magic Giant locking
> makes the places where the lock is acquired hard see.
>
> >- Maybe vgonel() shouldn't call VOP_CLOSE(). It should probably move the
> > vnode into deadfs, with the exception of the close() routine. Maybe
> > it's better to add a new function to do this, vrevoke().
> >
> >This means that when a revoke() call is performed, all blocked threads
> >are woken up, will leave the driver, to find out their terminal has been
> >revoked. Further system calls will fail, because the vnode is in deadfs,
> >but when the processes close the descriptor, the device driver can still
> >clean up everything.
>
> I think vfs already moves the vnode to deadfs. It doesn't do anything
> to synchronize with threads running in device drivers. The forced
> last-close() should complete synchronously as part of revoke(). Then
> other threads leave the device driver asynchronously, hopefully not
> much later. Then if the generation count stuff is working right, the
> syscall is restarted, but now file descriptors point to deadfs so the
> syscall normally fails. I think the async completion is OK provided
> it is done right (don't delay it indefinitely, and don't do more
> i/o on completion). It doesn't seem to be useful to make revoke()
> wait for the completions.
>
> I don't think it would work well to move everything except d_close to
> deadfs.
>
> Other problems near here:
> - neither vfs nor drivers currently know how many threads are in a
> driver. vfs uses vp->v_rdev->si_usecount, but this doesn't quite work
This is provided by si_threadcount.
See the dev(vn)_refthread and it usage in the devfs vnops and fops.
> since it doesn't count threads sleeping in open. Maybe ones excuting
> last close too -- this would be more of a problem. revoke() just
> uses vcount(), which just acquires the device locks and returns
> si_usecount after releasing the device lock. (I don't understand
> this locking -- what stops the count changing after the lock is
> released, or if it cannot changed then why acquire the lock?) This
> can result in revoke() not calling device close when it should.
> Drivers can obviously keep count of their activities using large
> code. I can't see any way for vfs to keep count short of asking
> drivers for their counts.
> - there can be any number of threads in device open and close concurrently,
> even without the complications for revoke(). The most problematic
> cases happen when last-close blocks, as is common for ttys waiting
> for output to drain (since no one cares about their output actually
> working and ensures draining it using tcdrain() -- normal losing
> programs finish up with something like printf(); exit(); and depend
> on the close() in exit(); blocking to drain the output). Then new
> opens are allowed, and this is useful for doing ioctls() to unblocked
> blocked closes. If the new open or fcntl sets non-blocking mode, then
> the last-close for the new open may pass the blocked last close. If
> the new mode is blocking, then the last-close for the new open may block
> too. The number of threads in last-close is thus unlimited. A thundering
> herd of them tends to stomp on each other when they are all unblocked at
> the same time.
>
> The connections of this with revoke() are:
> - it takes vfs's not counting of all threads in the device driver to
> allow the useful behaviour of opens while a close is blocked and
> the necessary behaviour of last-close while another last-close is
> executing (drivers should be aware of this possibility and merge
> the closes, but don't).
> - I think revoke() sets FNONBLOCK somewhere. Thus it tends to unblock
> any thread waiting in last-close for output to drain.
>
> Less problematic cases occur when opens block. ttyopen() understands
> this possibility and handles it almost right using its t_wopeners
> count. ttyopen() uses various sleeps where it should use ttysleep()
> or check the generation count itself; this results in it looping
> internally instead of restarting the syscall, which is only a small
> error since for open() alone, restarting the syscall would call back
> to the same non-dead device open except in unusual cases where there
> was a signal and syscalls are not restarted, or the device name went
> away. There is still a problem with the vfs usage counting -- in
> one case involving callin and callout devices whose details I forget,
> last-close is not called when it needs to be called to wake up all
> the threads sleeping in open so that they can enter a new state.
The device driver already could provide the d_purge method that
is intended to safely drain all threads that are now in the
driver. See the kern_conf.c:destroy_dev() for the usage.
Also, please note that the drivers cannot call destroy_dev() from the
d_close method due to selflock with si_threadcount. The livelock is
caused by the fix for the problem identical to the problem you described,
with substitution s/ldisc/cdev/. The destroy_dev_sched()
function is provided to execute destroy_dev() from another context.
Alternatively to what was proposed regarding vrevoke(), you could
use the similar lifecycle management for the ldisc, if suitable.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080315/3ffb6366/attachment.pgp
More information about the freebsd-arch
mailing list