[BUG] I think sleepqueue needs to be protected in sleepq_broadcast

Benjamin Close Benjamin.Close at clearchain.com
Thu Aug 28 14:56:10 UTC 2008


Attilio Rao wrote:
> 2008/8/23, John Baldwin <jhb at freebsd.org>:
>   
>> On Friday 22 August 2008 01:33:28 pm kevinxlinuz wrote:
>>     
>>> Hi,
>>>   I'm looking into the problem (amd64/124200: kernel panic on mutex sleepq
>>> chain). It has troubled me for a long time. I added a KASSERT in
>>> sleepq_broadcast() to check the sleepqueue's wait channel. It turns out
>>> that the sleepqueue's wait channel was changed before
>>> sleepq_resume_thread(). In sleepq_lookup(), we can easily see that
>>> sq->sq_wchan == wchan. But a short time later, sq->sq_wchan is no longer
>>> equal to wchan, so I think it was changed by another thread.
>>>       
>> The sleepq chain lock is already held for all of sleepq_broadcast() by the
>> caller (see wakeup() and cv_broadcastpri()).  That said, I don't have any
>> other good ideas for the panic you are seeing.  Do you have a crash dump?  It
>> might be interesting to see what other thread is using that sleep queue.
>>     
>
> Ben Close and I investigated this bug extensively and still haven't
> found the source.
> What we know so far:
> 1) When inspected with DDB, the lock is shown as owned by another
> thread even though it should be held by curthread. It is as if the
> mutex cookie gets overwritten by the other thread, as though the lock
> were free. An extra drop (and subsequent acquire) is not very likely
> because of (2).
> 2) KTR traces don't show anything wrong. Accesses to the sleepqueue
> chain lock are paired (both via the mtx_* interface and thread_lock
> respectively). This is very strange because it rules out wrong locking
> semantics.
> 3) The problem is reproducible even on 4BSD, without PREEMPTION and
> even with the smp sysctl disabled (it just takes more time).
> 4) The bug seems to be triggered by sx + waitchannel when used in
> sx_sleep() and such.
>
> I'm thinking this could be some nasty, but sort of deterministic, race
> between the sx sleepqueue and the waitchannel sleepqueue.
> I still have to think more about it, but I'm pretty busy right now,
> so if you have good ideas please let me know.
>   
The other common factor, though not 100% verified, is that everyone
experiencing the race is running amd64.
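
For anyone who wants to play with the two checks discussed in this
thread outside the kernel, here is a rough userland C sketch. It is a
single-threaded model only, not the kernel's sleepqueue code: the
model_* functions, the sc_owner "cookie", the sq_waiters counter, the
tid argument and the SC_TABLESIZE value are all made up for
illustration. It just shows where an owner check and a
"sq->sq_wchan == wchan" check would sit relative to the lookup and the
wakeup loop, with the chain lock taken by the caller as John describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SC_TABLESIZE 8   /* hypothetical size, kept small for the sketch */
#define NO_OWNER     0   /* hypothetical "unlocked" cookie value */

struct sleepqueue {
    struct sleepqueue *sq_next;    /* next queue on the same hash chain */
    const void        *sq_wchan;   /* wait channel this queue belongs to */
    int                sq_waiters; /* stand-in for the blocked-thread list */
};

struct sleepqueue_chain {
    int                sc_owner;   /* stand-in for the mutex cookie */
    struct sleepqueue *sc_queues;  /* queues hashed to this chain */
};

static struct sleepqueue_chain chains[SC_TABLESIZE];

static struct sleepqueue_chain *chain_for(const void *wchan)
{
    return &chains[((uintptr_t)wchan >> 4) % SC_TABLESIZE];
}

/* Model of taking the chain lock; there is no contention in this sketch. */
static void model_lock(const void *wchan, int tid)
{
    struct sleepqueue_chain *sc = chain_for(wchan);
    assert(sc->sc_owner == NO_OWNER);
    sc->sc_owner = tid;
}

static void model_release(const void *wchan, int tid)
{
    struct sleepqueue_chain *sc = chain_for(wchan);
    assert(sc->sc_owner == tid);   /* only the owner may release */
    sc->sc_owner = NO_OWNER;
}

/* Caller must hold the chain lock. */
static struct sleepqueue *model_lookup(const void *wchan, int tid)
{
    struct sleepqueue_chain *sc = chain_for(wchan);
    struct sleepqueue *sq;

    assert(sc->sc_owner == tid);
    for (sq = sc->sc_queues; sq != NULL; sq = sq->sq_next)
        if (sq->sq_wchan == wchan)
            return sq;
    return NULL;
}

/* Register a queue with a given number of pretend waiters. */
static void model_add(struct sleepqueue *sq, const void *wchan,
                      int waiters, int tid)
{
    struct sleepqueue_chain *sc = chain_for(wchan);

    model_lock(wchan, tid);
    sq->sq_wchan = wchan;
    sq->sq_waiters = waiters;
    sq->sq_next = sc->sc_queues;
    sc->sc_queues = sq;
    model_release(wchan, tid);
}

/* Caller must hold the chain lock; returns the number of waiters woken. */
static int model_broadcast(const void *wchan, int tid)
{
    struct sleepqueue_chain *sc = chain_for(wchan);
    struct sleepqueue *sq = model_lookup(wchan, tid);
    int woken = 0;

    if (sq == NULL)
        return 0;
    while (sq->sq_waiters > 0) {
        /* The two invariants under discussion, re-checked per waiter. */
        assert(sc->sc_owner == tid);
        assert(sq->sq_wchan == wchan);
        sq->sq_waiters--;
        woken++;
    }
    return woken;
}

/* Model of the caller side: lock the chain, broadcast, release. */
static int model_wakeup(const void *wchan, int tid)
{
    int woken;

    model_lock(wchan, tid);
    woken = model_broadcast(wchan, tid);
    model_release(wchan, tid);
    return woken;
}
```

In this model the asserts can never fire, which is the point: with
correctly paired lock and release calls neither the owner cookie nor
sq_wchan should change under the broadcast loop, so seeing either one
change in the real kernel suggests something is writing to that memory
without going through the lock.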

Cheers,
    Benjamin


