7.2-release/amd64: panic, spin lock held too long

Tue Jul 7 01:27:51 UTC 2009

2009/7/7 Dan Naumov <dan.naumov at gmail.com>:
> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao<attilio at freebsd.org> wrote:
>> 2009/7/7 Dan Naumov <dan.naumov at gmail.com>:
>>> I just got a panic followed by a reboot a few seconds after running
>>> "portsnap update". /var/log/messages shows the following:
>>>
>>> Jul  7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>> Jul  7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
>>> 1) held by 0xffffff00017d8370 (tid 100054) too long
>>> Jul  7 03:49:38 atom kernel: panic: spin lock held too long
>>
>> That's a known bug, affecting -CURRENT as well.
>> The cpustop IPI is handled through an NMI, which means it can
>> interrupt a CPU at any moment, even while it is holding a spinlock,
>> violating one well-known FreeBSD rule.
>> That means a CPU can stop itself while its current thread is holding
>> the sched lock spinlock, without releasing it (there is no way, short
>> of something highly hackish, to fix that).
>> In the meantime, hardclock() wants to schedule something else to run
>> and gets stuck on the thread lock.
>>
>> The ideal fix would involve not using an NMI to serve the cpustop,
>> while having a cheap way (without making the common path too costly)
>> to tell hardclock() to avoid scheduling while a cpustop is in flight.
>>
>> Thanks,
>> Attilio
>
> Any idea if a fix is being worked on, and how unlucky must one be to
> run into this issue? Should I expect it to happen again? Is it
> basically completely random?

I'd like to work on that issue before BETA3 (and backport the fix to
STABLE_7), but I'm time-constrained right now.
It is completely random.
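
As a rough illustration of the failure mode described in the quoted
explanation above, here is a toy Python model, not FreeBSD kernel code.
Every name in it (ToyCpu, ToySpinLock, spin_acquire, simulate,
SPIN_LIMIT) is hypothetical, and the spin bound is an arbitrary
assumption rather than the kernel's real timeout. It only shows why a
waiter never gets the lock once the holder has been stopped while still
holding it.

```python
from dataclasses import dataclass
from typing import Optional

SPIN_LIMIT = 1000  # hypothetical bound, not the kernel's real timeout


@dataclass
class ToyCpu:
    cpu_id: int
    stopped: bool = False  # True once a stop request has parked this CPU


@dataclass
class ToySpinLock:
    name: str
    owner: Optional[ToyCpu] = None

    def try_acquire(self, cpu: ToyCpu) -> bool:
        """Take the lock if it is free; never blocks."""
        if self.owner is None:
            self.owner = cpu
            return True
        return False

    def release(self, cpu: ToyCpu) -> None:
        assert self.owner is cpu, "only the owner may release"
        self.owner = None


def spin_acquire(lock: ToySpinLock, waiter: ToyCpu,
                 spin_limit: int = SPIN_LIMIT) -> int:
    """Spin until the lock is acquired and return the number of failed tries.

    Raises RuntimeError when the bound is exceeded, standing in for the
    "spin lock held too long" panic.
    """
    for spins in range(spin_limit):
        if lock.try_acquire(waiter):
            return spins
        holder = lock.owner
        # A running holder finishes its critical section and releases the
        # lock; a stopped holder makes no progress, so it never releases.
        if holder is not None and not holder.stopped:
            lock.release(holder)
    raise RuntimeError(f"spin lock ({lock.name}) held too long")


def simulate(stop_holder: bool) -> str:
    """CPU 0 holds the lock; CPU 1 (think hardclock()) then wants it."""
    cpu0, cpu1 = ToyCpu(0), ToyCpu(1)
    sched_lock = ToySpinLock("sched lock 1")
    assert sched_lock.try_acquire(cpu0)
    if stop_holder:
        # The stop request lands while the lock is still held.
        cpu0.stopped = True
    try:
        spin_acquire(sched_lock, cpu1)
    except RuntimeError as exc:
        return f"panic: {exc}"
    return "acquired"
```

Under these assumptions, simulate(False) returns "acquired" because the
running holder releases the lock, while simulate(True) returns
"panic: spin lock (sched lock 1) held too long" because the stopped
holder never does.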

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein