git: 35547df5c786 - main - Call wakeup() with the lock held to avoid missed wakeup races.

Scott Long scottl at samsco.org
Wed Aug 11 22:50:25 UTC 2021



> On Aug 11, 2021, at 4:42 PM, John Baldwin <jhb at freebsd.org> wrote:
> 
> On 8/11/21 3:31 PM, Luiz Otavio O Souza wrote:
>>> On Wed, Aug 11, 2021 at 7:03 PM John Baldwin <jhb at freebsd.org> wrote:
>>> 
>>> On 8/11/21 2:38 PM, Luiz Otavio O Souza wrote:
>>>> On Wed, Aug 11, 2021 at 3:49 PM John Baldwin <jhb at freebsd.org> wrote:
>>>>> 
>>>>> On 8/10/21 3:41 PM, Scott Long wrote:
>>>>>> The branch main has been updated by scottl:
>>>>>> 
>>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=35547df5c78653b2da030f920323c0357056099f
>>>>>> 
>>>>>> commit 35547df5c78653b2da030f920323c0357056099f
>>>>>> Author:     Scott Long <scottl at FreeBSD.org>
>>>>>> AuthorDate: 2021-08-10 22:36:38 +0000
>>>>>> Commit:     Scott Long <scottl at FreeBSD.org>
>>>>>> CommitDate: 2021-08-10 22:36:38 +0000
>>>>>> 
>>>>>>       Call wakeup() with the lock held to avoid missed wakeup races.
>>>>>> 
>>>>>>       Submitted by: luiz
>>>>>>       Sponsored by: Rubicon Communications, LLC ("Netgate")
>>>>>> ---
>>>>>>    sys/dev/sdhci/sdhci.c | 2 +-
>>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>> 
>>>>>> diff --git a/sys/dev/sdhci/sdhci.c b/sys/dev/sdhci/sdhci.c
>>>>>> index d075c2e05000..573e6949b57e 100644
>>>>>> --- a/sys/dev/sdhci/sdhci.c
>>>>>> +++ b/sys/dev/sdhci/sdhci.c
>>>>>> @@ -2078,8 +2078,8 @@ sdhci_generic_release_host(device_t brdev __unused, device_t reqdev)
>>>>>>        /* Deactivate led. */
>>>>>>        WR1(slot, SDHCI_HOST_CONTROL, slot->hostctrl &= ~SDHCI_CTRL_LED);
>>>>>>        slot->bus_busy--;
>>>>>> -     SDHCI_UNLOCK(slot);
>>>>>>        wakeup(slot);
>>>>>> +     SDHCI_UNLOCK(slot);
>>>>>>        return (0);
>>>>>>    }
>>>>> 
>>>>> Hmm, how does this avoid a race?  The sleep is checking bus_busy under
>>>>> the lock and should never see a stale value and go back to sleep after
>>>>> the wakeup has occurred:
>>>>> 
>>>>>          SDHCI_LOCK(slot);
>>>>>          while (slot->bus_busy)
>>>>>                  msleep(slot, &slot->mtx, 0, "sdhciah", 0);
>>>>>          slot->bus_busy++;
>>>>>          /* Activate led. */
>>>>>          WR1(slot, SDHCI_HOST_CONTROL, slot->hostctrl |= SDHCI_CTRL_LED);
>>>>>          SDHCI_UNLOCK(slot);
>>>>> 
>>>>> Dropping the lock before wakeup() is a tiny optimization that avoids
>>>>> having the second thread wakeup and immediately block on the lock before
>>>>> it has been released by the first thread.
>>>>> 
>>>> 
>>>> 'race' is probably wrong here.  this change will prevent a second
>>>> thread from taking the bus before you call wakeup() - poking all other
>>>> threads unnecessarily.
>>> 
>>> This change does not prevent that.  The other thread and the thread that
>>> are awakened will race with each other to acquire the lock.  wakeup()
>>> doesn't do any sort of explicit lock handoff to the thread being awakened
>>> and it's just as likely for a thread not yet asleep to acquire the lock as
>>> for the thread being awakened to acquire the lock.  If you have observed
>>> thundering herd problems with this wakeup() then you might want to change
>>> it to wakeup_one().
>> correct, but to be more specific, on the first thread, after you free
>> the bus and release the lock, a new thread can run, successfully
>> acquire the lock and grab the bus.  at this point the first thread
>> resumes and call wakeup().  when that happens, the new thread always
>> wins, the threads being awakened won't have a chance.
> 
> Perhaps on a uniprocessor system this might be true, but otherwise
> your new thread is likely spinning adaptively on the lock on another
> CPU and grabs it as soon as it is released before the thread awakened by
> wakeup() is even scheduled on a CPU and given a chance to run.  That is,
> I suspect the new thread always wins even with this change on any
> multiprocessor system, but it now has to wait a bit longer before it
> wins.  Have you observed some specific behavior with traces that this
> change seeks to address?
> 
> -- 
> John Baldwin

Thats a lousy question, John.  Yes, we have, and yes, we still support uniprocessor systems.  This isnt fast path code, and it solves a problem for us.

Scott


More information about the dev-commits-src-all mailing list