cvs commit: src/sys/dev/io iodev.c

Mon Aug 11 16:36:46 UTC 2008

On Sat, 9 Aug 2008, John Baldwin wrote:

> On Saturday 09 August 2008 12:35:50 am Bruce Evans wrote:
>> On Fri, 8 Aug 2008, John Baldwin wrote:
>>> On Friday 08 August 2008 09:43:56 am Ed Schouten wrote:
>>>> ed          2008-08-08 13:43:56 UTC
>>>>
>>>>   Apart from this change, I think some fishy things may happen when
>>>> using /dev/io in multithreaded applications. I haven't tested, but
>>>> looking at the code, the flag doesn't get cleared when close() is called
>>>> from another thread, but this may not be this important.
>>
>> Of course it isn't.  The flag only gets cleared on last close.  Threads
>> probably involve mainly file descriptors, and for file descriptors close()
>> doesn't even reach vn_close() until the fd reference count drops to 0.
>> Then for files, vn_close() normally doesn't reach device close until the
>> file reference count drops to 0.
>>
>>> It should be setting D_TRACKCLOSE though so that close() reliably clears
>>> the flag even in single-threaded processes.  You can still get odd
>>> behavior if
>>
>> You mean "should _not_ be setting D_TRACKCLOSE", so that close() works
>> normally.
>>
>> close() can never reliably clear the flag, since even vn_close() is not
>> reached after open()/dup()/close() or open()/fork()/close()...
>>
>>> you explicitly open it twice in an app and then close one of the two
>>> fd's. You will no longer have IO permission even though you still have
>>> one fd open. However, if you do that I think you deserve what you asked
>>> for. :)
>>
>> You asked for normal last-close behaviour and deserve that not being broken
>> by setting D_TRACKCLOSE.
>>
>> D_TRACKCLOSE allows different handling of non-last closes, but this is
>> rarely wanted, and in theory it allows better determination of last
>> closes, since vfs doesn't count last closes right.  However, it is
>> difficult to count last closes right, and most uses of D_TRACKCLOSE handle
>> last closes are worse than vfs:
>> - ast: uses D_TRACKCLOSE to trash (write a filemark) and/or rewind tapes
>>    on non-last closes.  After the trash and rewind, astclose() uses
>>    count_dev() to avoid doing a full hardware close if the device is still
>>    open.  This involves using count_dev(), but count_dev() is just an
>>    interface to the vfs count, so it miscounts in the same way as vfs.  In
>>    particular, it doesn't count incompleted opens.  I don't know if ast's
>>    locking is sufficient to prevent close() being called while a new open
>>    is in progress.  For most devices, it isn't.
>> - drm: it uses its own count of opens and closes, and it doesn't use
>>    count_dev().  The own count has some chance of working, but needs
>>    delicate locking of device open and device close to prevent them
>>    running into each other, and some defense against the vfs bugs which
>>    at least used to result in missing and/or extra closes.
>> - smb: like drm, except it limits its own count to 0 or 1 to give half-
>>    baked exclusive access (fork() and dup(), etc, still give non-exclusive
>>    access).  D_TRACKCLOSE thus has no effect except for its interaction
>> with other bugs.
>> - geom: unlike most or all of the others, this may actually need
>>    D_TRACKCLOSE, to implement multiple logical drivers per physical device.
>>    I don't understand its details.
>> - apm: like drm
>> - bpf: like smb
>
> You failed to note that not using D_TRACKCLOSE doesn't actually reliable close
> devices.

No, I noted that vfs doesn't count last closes right.  Unreliability of
last-close is a special case of that.  But using D_TRACKCLOSE mostly makes
things worse.  To avoid the vfs bugs in tracked closes, it is necessary
for individual drivers to understand vfs's bugs better than vfs does.

> I set D_TRACKCLOSE on bpf because under load at work (lots of
> concurrent nmap's) bpf devices would sometimes never get fully closed, so
> you'd have an unopened bpf device that was no longer available.

It is unclear how that helps.  vfs's bugs must be really large for the
last-close to go completely missing.  kib fixed some of them a few months
ago (before you changed bpf I think).  bpf is especially simple since it
is exclusive-accesses, so the vfs bugs should be smaller for it.

> D_TRACKCLOSE
> gives much saner semantics for devices, each open() of an fd calls
> the 'd_open' method, and the last close of that fd (which could be in another
> process) calls 'd_close'.

This is insane semantics for most devices, and D_TRACKCLOSE doesn't give it.
Most devices don't care about the difference between file descriptors and
files -- they want d_close to be called only when the last file descriptor
on the device is closed.  !D_TRACKCLOSE gives this, modulo the vfs bugs.

D_TRACKCLOSE doesn't give these semantics: revoke() closes everything
in 1 step and then calls d_close(), giving an N-1 correspondence between
d_open()s and d_close()s (modulo the vfs bugs giving a fuzzier
correspondence).  (IIRC, at least before kib's fixes, it was possible
for revoke() to cause any of 0, 1 or 2 calls to d_close().  I think 1
call always gets as far as devfs_close() but this is only guaranteed
to reaach d_close() with D_TRACKCLOSE.)  This is not a bug, but all
drivers that depend on the correspondence being 1-1 are broken.  Drivers
that enforce "exclusive" access like bpf have N=1, so they probably
work modulo the vfs bugs, and they have a good chance of not being
affected by the vfs bug that gives an extra devfs_close() and thus an
extra d_close().

> /dev/io in particular is quite bogus since even though a child process
> inherits the file descriptor, it doesn't inherit the behavior of
> having /dev/io open.

Not just child processes.  The same process disinherits the behaviour on
exec though it normally keeps the fd open.

Bruce