skipping locks, mutex_owned, usb

Tue Aug 23 12:38:36 UTC 2011

On Tuesday 23 August 2011 14:09:15 Andriy Gapon wrote:
> Yes, the subject sounds quite hairy, so please let me try to explain it.
> First, let's consider one concrete function:
> 
> static int
> ukbd_poll(keyboard_t *kbd, int on)
> {
>         struct ukbd_softc *sc = kbd->kb_data;
> 
>         if (!mtx_owned(&Giant)) {
>                 /* XXX cludge */
>                 int retval;
>                 mtx_lock(&Giant);
>                 retval = ukbd_poll(kbd, on);
>                 mtx_unlock(&Giant);
>                 return (retval);
>         }
> 
>         if (on) {
>                 sc->sc_flags |= UKBD_FLAG_POLLING;
>                 sc->sc_poll_thread = curthread;
>         } else {
>                 sc->sc_flags &= ~UKBD_FLAG_POLLING;
>                 ukbd_start_timer(sc);   /* start timer */
>         }
>         return (0);
> }
> 
> This "XXX cludge" [sic] pattern is scattered around a few functions in the
> ukbd code and perhaps other usb code:
> func()
> {
> 	if (!mtx_owned(&Giant)) {
> 		mtx_lock(&Giant);
>                 func();
>                 mtx_unlock(&Giant);
> 		return;
> 	}
> 
> 	// etc ...
> }
> 
> I have a few question directly and indirectly related to this pattern.

Hi Andriy,

Thanks for bringing up this topic.

> 
> I.  [Why] do we need this pattern?
> Can the code be re-written in a smarter (if not to say proper) way?

Unless that API's surrounding ukbd change, we are stuck with auto-magic Giant 
locking inside ukbd. The USB stack itself does not require that Giant is used 
for the USB keyboard driver. It is some times since I touched this topic. From 
what I remember some interfacing layers of ukbd are using Giant. Some are not.

Much of the existing non-USB keyboard code is written to get/put key-presses 
directly through legacy PCI/ISA registers. This approach is deferred in the 
USB stack, hence it would cause blocking delays of 1ms per polling operation.

Another corner case:

When you are scrolling the console having scroll lock locked, then the first 
printf on that console will directly call an IOCTL in the ukbd to switch off 
the scroll lock led. As you might be aware this can be dangerous. The ukbd is 
also sometimes printing stuff. The chance of deadlock and LOR is increasing.

Without having to push for multiple changes outside ukbd, this problem is 
simply hidden in UKBD by checking if Giant is locked, which is required for 
starting the USB transfers in UKBD. I think it would be a very bad idea to 
lock another lock in a sub-function of printf. See!

A multithread / taskqueue interface is needed for postponing keyboard events.

And what about polling and UKBD? This needs a separate calling path I think.

> 
> II.  ukbd_poll() in particular can be called from the kdb context (kdb_trap
> -> db_trap -> db_command_loop -> etc).  What would happen if the Giant is
> held by a thread on some other CPU (which would be hard-stopped by
> kdp_trap)?  Won't we get a lockup here?
> So shouldn't this code actually check for kdb_active?
> 
> III.  With my stop_scheduler_on_panic patch ukbd_poll() produces infinite
> chains of 'infinite' recursion because mtx_owned() always returns false. 
> This is because I skip all lock/unlock/etc operations in the post-panic
> context.  I think that it's a good philosophical question: what operations
> like mtx_owned(), mtx_recursed(), mtx_trylock() 'should' return when we
> actually act as if no locks exist at all?
> 
> My first knee-jerk reaction was to change mtx_owned() to return true in
> this "lock-less" context, but _hypothetically_ there could exist some
> symmetric code that does something like:
> func()
> {
> 	if (mtx_owned(&Giant)) {
> 		mtx_unlock(&Giant);
>                 func();
>                 mtx_lock(&Giant);
> 		return;
> 	}
> 
> 	// etc ...
> 
> What do you think about this problem?
> Should we re-write ukbd_poll() (and possibly usb code) or should
> mutex_owned() be adjusted?

I think you've brought an interesing problem. I would suggest to change 
mtx_owned or make a new function to take an integer value to handle the case 
of SCHEDULER_STOPPED:

mtx_owned_default(struct mtx *, int return value in case of 
SCHEDULER_STOPPED);

> 
> That question III actually brings another thought: perhaps the whole of
> idea of skipping locks in a certain context is not a correct direction?
> Perhaps, instead we should identify all the code that could be legitimately
> executed in the after-panic and/or kdb contexts and make that could
> explicitly aware of its execution context.  That is, instead of trying to
> make _lock, _unlock, _owned, _trylock, etc do the right thing
> auto-magically, we should try to make the calling code check panicstr and
> kdb_active and do the right thing on that level (which would include not
> invoking those lock-related operations or other inappropriate operations).

> Thank you very much in advance for your insights and help!

--HPS