[PATCH] Newbus locking

Fri Jul 31 16:18:16 UTC 2009

Hi,

Speaking about the USB subsystem and newbus:

I consider this patch to be a step backwards for the USB stack. USB is 
multithreaded, whilst the patch tries to enforce single threading through the 
newbus lock which covers the complete device_xxx domain and all its device_t 
nodes.

Let me define two terms.

1) Inside caller
For each USB controller there is a usbusX device associated with a single 
thread that is responsible for creating and hooking on children. See reference 
at end of e-mail. An inside caller is when virtual device methods originate 
from the same thread that created it.

In USB context, the following device methods are known inside callers:

device_attach
device_detach
device_probe
device_suspend
device_resume

2) Outside caller
When an external thread tries to execute a method on a device that was not 
created by the same thread.

In USB context, the following device methods are possibly are being calling 
from the outside:

        DEVMETHOD(bus_child_location_str, uhub_child_location_string),
        DEVMETHOD(bus_child_pnpinfo_str, uhub_child_pnpinfo_string),
        DEVMETHOD(bus_driver_added, uhub_driver_added),
        DEVMETHOD(usb_handle_request, ustorage_fs_handle_request),

Comments:

The USB stack ensures that "usb_handle_request" is being called when only when 
the device is present and attached, through use of its own SX-locks.

"bus_driver_added" can be called anytime.

"bus_child_pnpinfo_str" and "bus_child_location_str" should be called at 
probe/attach and its contents strdup'ed to avoid outside calls.

1+2) There should be no requirement to lock any newbus locks before calling 
device_xxxx() when an inside call happens. For the outside calls that happen, 
the data can be copied / strdup'ed to avoid any problems with regard to the 
USB stack. After this change there are no problems left with regard to USB and 
newbus.

I was hoping that the Giant requirement for newbus would go away, but all that 
has been done is to replace Giant with an sx-lock protecting all access and 
calls into any device_xxx() method. This is a step backwards.

Do you have any paper Attilio, describing your other efforts and why you did 
not choose them in an orderly way?

Is your patch to be considered a temporary bandaid patch?

--HPS

Reference: Example output devinfo:

        uhci0
          usbus0
            uhub0
        uhci1
          usbus1
            uhub1
        uhci2
          usbus2
            uhub2
        uhci3
          usbus3
            uhub3
        ehci0
          usbus4
            uhub4
              uhub5
                uaudio0
                  pcm2
                ums0
                ukbd0
                ums1
              umass0
              uhub6
                umass1
                ihfc0
                uhub7
                  uplcom0
                  uaudio1
                    pcm3
                  uhid0

On Friday 31 July 2009 16:59:21 Attilio Rao wrote:
> I spent the last two weeks searching, coding and trying different ways to
> get newbus locked.
>
> In newbus we have a lot of different members in datastructures which
> are accessed in parallel:
> * A list of devclasses
> * Any devclass maintains a list of drivers and a table of devices
> (useful for accessing them through the unit number)
> * Any device maintain a list of children devices and makes operations
> on them (generally a bus)
> * Flags for devclasses and states for devices
>
> In order to maintain consistency on all the accesses to such
> datastructures I outlined the following pre-requisites:
> * Locking must maintain consistency between set of different
> operations (you can't consider to break things like
> devclasses/children devices lists modifications in the middle)
> * Locking must not change pre-requisites in the callers (aka: don't
> switch a normally not sleeping function into a sleeping one)
> * Locking must take into account that virtual functions (example:
> BUS_DRIVER_ADDED(), DEVICE_DETACH(), etc) can sleeep and however their
> content is unknown.
> * Caching objects within the above mentioned datastructure was not a
> good option because if datastructures were allowed to add/remove new
> objects in the while some informations could have lost
>
> In order to have a good locking scheme I tried several things. The
> problem is that newbus presents a lot of calls willing to access to
> one of the lists and calling a virtual function inside. This was
> giving some problems because just dropping the lock, call the function
> and reacquiring the lock was not a viable option. Refcounting an
> objects on reading of such datastructures (like the list of drivers
> for a given devclass, for example) was going to work but it was going
> to offer much pain on the write path for the datastructure: as long as
> we could not sleep we had to fail the operation. While this could work
> in place it was going to complicate things a lot and adding a lot of
> failure point. While it is true that in newbus operations can fail, it
> is also true they often don't fail (they just usually do in the case
> of failed malloc()s, for example). This would have given high likely
> of failure on simple things like device probe and attach.
> This method, however, was also not going to protect efficiently
> members like devices flags.
>
> In order to satisfy prerequisites, the better thing would have been to:
> * Have a lock which could have be maintained through virtual functions
> (so probabilly a sx lock)
> * Have a lock to be acquired outside of the functions itself (in order
> to avoid recursion in the newbus specific functions and maintain a
> critical path across several calls)
>
> This is not too difficult and it is basically what Giant currently does.
> Additively, newbus design helps us a bit here because newbus modifies
> happens in some well known contexts:
> 1) boot time, single-threaded, busses/device initialization
> 2) post-boot, multi-threaded, busses/device initialization via the config
> hooks 3) character device operations which handles new devices (it is the
> case of ATA, for example)
> 4) run-time loaded modules (referred to driver_module_handler())
> 5) edge cases of adeguately dedicated threads (usb) and taskqueues
> (ATA) that do post-initialization probing
>
> The idea is to use a global lock to be acquired in key points, with a
> bit of help from the consumers (and please note that consumers alredy
> forsee such help in the Giant case, so we are no adding anything
> difficult).
> More in detail:
> 1 is adeguately protected alone on its own because it is single-threaded.
> 2 can be protected by locking in the config hooks themselves by each
> driver (as it happens now) or locking in the
> run_interrupt_driven_config_hooks() directly (for the patch I followed the
> former approach in order to minimize the overhead when it is not necessary)
> 3 can have sleepable context so they can handle to acquire a sx lock there.
> 4 can be simply locked in driver_module_handler()
> 5 should have its own locking as any normal thread, so they can
> acquire newbus lock
>
> So I prepared this patch based on the above mentioned analysis:
> http://www.freebsd.org/~attilio/Yahoo/newbus/newbus_locking3.diff
>
> This patch has been reviewed by all the key committers (in the newbus
> and device drivers area) and tested by a couple of testers, but I'd
> really would like to gather more testing before to commit the patch as
> we are very near to a new release.
> More specifically I would like to get more testing in the 'hotplugging
> devices' area like USB and pccard.
> Ideally, a tester, should compile a kernel with KDB, DDB,
> INVARIANT_SUPPORT, INVARIANTS and WITNESS and try how many different
> devices he can, both loading as module or directly at boot time.
> Ideally, testers should seek for LOR involving the newbus lock, panic
> for newbus lock recursion or lost assertions on newbus lock and Giant.
> If one of these happens, please report the backtrace along with the
> 'show alllocks' from ddb (if you can't collect with textdumps, console
> or pictures, even writing down by hand is a viable option).
> This patch is considered a major cornerstone for FreeBSD-8.0 RELEASE
> and we should consider its successfull completition an high priority
> task.
>
> The work has been kindly founded by Yahoo! incorporated.
>
> Thanks,
> Attilio