USB stack getting confused

Sun Mar 10 09:48:31 UTC 2019

On Sat, Mar 09, 2019 at 10:35:28PM +0100, Hans Petter Selasky wrote:
> On 3/9/19 8:23 PM, Konstantin Belousov wrote:
> > On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote:
> >>
> >> Is there a form of destroy_dev() that does a revoke on all open instances?
> >> Eg, this is gone, you can't use it anymore, and all further attempts to use
> >> the device will generate an error, but in the mean time we destroy the
> >> device and let the detach routine get on with life. waiting may make sense
> >> when you are merely unloading the driver (and getting to the detach routine
> >> that way), but when the device is gone, I've come around to the point of
> >> view that we should just destroy it w/o waiting for closes and anybody that
> >> touches it afterwards gets an error and has to cope with the error. But
> >> even in the unload case, we maybe we shouldn't get to the detach routine
> >> unless we're forcing and/or the detach routine just returns EBUSY since the
> >> only one that knows what dev_t's are associated with the device_t is the
> >> driver itself.
> > You are asking very basic questions about devfs there.
> > 
> > destroy_dev(9) waits for two things:
> > - that all threads left the cdevsw methods for the given device;
> > - that all cdevpriv destructors finished running.
> 
> Hi,
> 
> > To facilitate waking up threads potentially sleeping inside the cdevsw
> > methods, drivers might implement d_purge method which must weed out sleeping
> > threads from inside the code in the bound time.
> 
> USB will purge all callers before calling destroy_dev(). This is not the 
> problem.
> 
> > After that we return from destroy_dev(9) and guarantee that no new calls
> > into cdevsw is done for this device.  devfs magic consumes  the fo_ and
> > VOP_ calls and does not allow them to reach into the driver.
> 
> When I designed the current USB devfs it was important to me to keep 
> open() and close() calls balanced to avoid situations where an open call 
> may setup some resource and then close(), which free this resource 
> again, never gets called. destroy_dev(9) makes no such guarantee, and I 
> think that is a failure of destroy_dev(9). That's when I started using 
> the cdev's destructor callback function.
Lets correct the terminology first.
Are you referring to the d_open/d_close pairing ?

Without D_TRACKCLOSE, d_close() is only called on the last close of
the device.  With D_TRACKCLOSE, devfs _tries_ to call d_close each time
it sees the VOP_CLOSE() operation from VFS, but due to way VFS works
VOP_CLOSE() could be missed.  Also, d_open vs d_close are not synchronized,
so a driver might get call to d_open in parallel to last d_close.

What do you mean by cdev destructor callback function ?  Do you mean
callback from destroy_dev_cb(), or do you actually reference the
destructors from devfs_set_cdevpriv(9) ?

If the later, then destroy_dev() guarantees that all cdevpriv destructors
for all file descriptors opened against the destroyed cdev are finished
before destroy_dev() returns.  In other words, if you use cdevpriv, you
can remove the drain for your refcount and everything should just work.

> 
> > So what usb does there is actively defeating existing mechanism by
> > keeping internal refcount on opens and refusing to call destroy_dev()
> > until the count goes to zero 
> 
> The FreeBSD USB stack also is used in environments w/o DEVFS and need 
> own refcounts.
I completely disagree with use of code sharing as excuse for FreeBSD bugs.

> 
> > (I did not read the usb code, but I believe
> > that I am not too wrong).  
>  >Would usb core just destroy_dev() when the
> > physical device goes away, then at worst the existing file descriptors
> > opened against the lost devices would become dead (not same dead as
> > terminals after revoke(2), but very similar).
> 
> Yes, I can do that if destroy_dev() ensures that d_close is called for 
> all open file handles once and only once before it returns. I think this 
> is where the problem comes from.
See above.  For d_close it is impossible, for cdevpriv dtr it is already
there by design.

> 
> > 
> > If the problem is due to keeping some instance data for the opened device,
> > then cdevpriv might be the better fit (at least the KPI was designed
> > to be) than blocking destroy until all users are gone.
> > 
> 
> The USB stack does not use MMAP, so this is not a problem.
I do not follow, why does it matter ?

On Sat, Mar 09, 2019 at 10:40:02PM +0100, Hans Petter Selasky wrote:
> On 3/9/19 8:28 PM, Rozhuk Ivan wrote:
> > On Sat, 9 Mar 2019 18:26:40 +0200
> > Konstantin Belousov <kostikbel at gmail.com> wrote:
> > 
> >> In fact I saw something similar with apcupsd and either usb/com
> >> adapters or native usb control card for APC UPSes.  For reasons I do
> >> not understand, these devices are often disconnected.  For older
> >> versions of apcupsd, it required restart for newly reattached device
> >> to be recreated in /dev. Sometimes it hangs whole usb stack.
> >>
> >> Newer apcupsd seems to open /dev/ugen only for the duration of the
> >> query, which makes the erratic behaviour is much less likely, but
> >> could still cause breakage when device disappear while apcupsd has it
> >> opened.
> >>
> > 
> > Same problem with usb sound cards.
> > I try to fix it, but fail with dsp, only mixer can be fixed with small code change.
> > https://reviews.freebsd.org/D11140
> > 
> 
> Hi,
> 
> How will these apps detect that they need to open the new /dev/mixer node?
> 
> I mean, after hang is fixed, mixer app will still try to query the old 
> file handle forever?
Userspace gets either ENXIO or EIO from syscalls.  For polls, it gets
POLLIN | POLLHUP immediately.