svn commit: r331298 - head/sys/dev/syscons

Thu Mar 22 00:31:50 UTC 2018

On Wed, Mar 21, 2018 at 11:53 AM, Bruce Evans <brde at optusnet.com.au> wrote:

> On Wed, 21 Mar 2018, Warner Losh wrote:
>
> Log:
>>  Unlock giant when calling shutdown_nice()
>>
>
> This breaks the driver.  Giant is syscons' driver lock, and also the
> interrupt handler lock for at least the atkbd keyboard driver, so vt
> sometimes holds the lock for.
>

OK. I got carried away. You're right. The proper fix is to unlock Giant at
the top of kern_reboot() instead. This handles the case where we call
shutdown_nice() with no init running and have to call kern_reboot directly.
Otherwise it's perfectly fine to just call shutdown_nice() with Giant held
since we just signal init from there. So I'll revert this change and make
that other change instead.


> vt has to sprinkle lots of Giant locking and unlocking where the
> corresponding locking is automatic for syscons, but this seems to be
> nonsense in its event handler.  vt has to acquire Giant when calling
> the keyboard driver, but event handling is a call from the keyboard
> driver, usually from the interrupt handler with a suitable lock held.
> For atkbd, the suitable lock is Giant -- see atkbd_intr().  (All keyboard
> drivers that use the kbd(4) for the top level use Giant looking since
> toe top level specifies D_NEEDGIANT in its cdevsw.  I think that is all
> keyboard drivers.)  So where vt_kbdevent() sprinkles Giant, that has no
> effect since Giant is held by the caller.  So vt_kbdevent() calls
> vt_processkey() with Giant held despite it not acquiring Giant explicitly;
> vt_processkey() calls vt_machine_kbdevent() and that calls various
> shutdown functions, all with Giant held.


True, and covered above.


> Modified: head/sys/dev/syscons/syscons.c
>> ============================================================
>> ==================
>> --- head/sys/dev/syscons/syscons.c      Wed Mar 21 14:47:08 2018
>> (r331297)
>> +++ head/sys/dev/syscons/syscons.c      Wed Mar 21 14:47:12 2018
>> (r331298)
>> @@ -3858,22 +3858,28 @@ next_code:
>>
>>             case RBT:
>> #ifndef SC_DISABLE_REBOOT
>> -               if (enable_reboot && !(flags & SCGETC_CN))
>> +               if (enable_reboot && !(flags & SCGETC_CN)) {
>> +                       mtx_unlock(&Giant);
>>                         shutdown_nice(0);
>> +               }
>> #endif
>>                 break;
>>
>>             case HALT:
>> #ifndef SC_DISABLE_REBOOT
>> -               if (enable_reboot && !(flags & SCGETC_CN))
>> +               if (enable_reboot && !(flags & SCGETC_CN)) {
>> +                       mtx_unlock(&Giant);
>>                         shutdown_nice(RB_HALT);
>> +               }
>> #endif
>>                 break;
>>
>>             case PDWN:
>> #ifndef SC_DISABLE_REBOOT
>> -               if (enable_reboot && !(flags & SCGETC_CN))
>> +               if (enable_reboot && !(flags & SCGETC_CN)) {
>> +                       mtx_unlock(&Giant);
>>                         shutdown_nice(RB_HALT|RB_POWEROFF);
>> +               }
>> #endif
>>                 break;
>>
>
> The new bugs should cause assertion failures.  shutdown_nice() just
> signals init and returns.  Giant is not re-acquired, so assertions
> should fail when the caller drops the lock.  So the shutdown should
> be a nasty panic.
>

I must have been using vt when I tested it, since I didn't see that.


> Transiently dropping the lock is probably not fatal depending on what
> the caller does.  On x86 with atkbd, nested interrupts are prevented
> by masking the in the ATPIC or APIC -- Giant is not really the driver
> lock, but just a lock for interfacing between related screen and keyboard
> drivers.
>
> Serial console drivers with fast interrupt handlers have much more
> broken locking for ddb special keys.  It is invalid to either drop locks
> or call the "any" function from a fast interrupt handler, but buggy
> serial console drivers calls kbd_alt_break(), and that now calls
> shutdown_nice() and other functions that cannot be called from a fast
> interrupt handler.  ddb keys supply most of the shutdown_nice()
> functionality for serial consoles, and there are no escape sequence to
> get this without ddb or maybe another debugger, so these bugs don't
> affect most users.
>

It's called it before my changes... I'll make a note to look into this.


> Handling this correctly requires much the same fix as an unsafe signal
> handler, and fixes have much the same problems -- not much more than
> setting a flag is safe, and the flag might never be looked at if the
> system is in a bad state.  However, if a nice shutdown is possible then
> the sytem must be in a good enough state to poll for flags.
>
> For normal signal handlers, there is no problem sending a signal to init
> or at least with setting a flag and waking up some thread to check the
> flag.
>
> I don't quite understand this commit.  It should be valid to send a
> signal to init() in proc or non-fast ithread context including with
> Giant held.  shutdown_nice() starts with PROC_LOCK(initproc), so it
> would be a LOR to call it with a spinlock held, and is even more
> obviously wrong to call it from kbd_alt_break() with interrupts masked
> and possibly a spinlock held, but Giant is a special sleep lock that
> causes fewer LORs than most sleep locks.
>

Yes, this commit is wrong.


> Actual testing shows that doesn't cause a panic, but it also doesn't
> actually unlock for shutdown_nice(), since the lock is acquired twice
> and only released once.  syscons has much the same extra lock sprinkling
> for event handling as vt:
>
> - intr_event_execute_handlers() acquires Giant and calls atkbdintr()
> - atkbdintr() calls sckbdevent()
> - sckbdevent() unnecessarily acquires Giant again
> - the buggy unlocking drops Giant just once
> - shutdown_nice() is called with Giant held
> - the buggy unlocking fails to re-acquire Giant
> - sckbdevent() releases Giant, leaving it not held
> - sckbdevent() returns
> - atkbdintr() returns
> - intr_event_execute_handlers() releases Giant.  This should panic, but
>   it apparently blocks for long enough for init to shut down first.
>
> When I trace the last step, I get a panic which might be either from the
> different timing or just a bug in kdb.


Good point. I think the following change is good for everything except
calling shutdown_nice() from a fast interrupt handler with noinit running:

diff --git a/sys/kern/kern_shutdown.c b/sys/kern/kern_shutdown.c
index e5ea9644ad3f..564aecd811be 100644
--- a/sys/kern/kern_shutdown.c
+++ b/sys/kern/kern_shutdown.c
@@ -366,6 +366,12 @@ kern_reboot(int howto)
 {
        static int once = 0;

+       /*
+        * Drop Giant once and for all.
+        */
+       while (mtx_owned(&Giant))
+               mtx_unlock(&Giant);
+
 #if defined(SMP)
        /*
         * Bind us to the first CPU so that all shutdown code runs there.
Some

Comments?

Warner