7.2-release/amd64: panic, spin lock held too long

Dan Naumov dan.naumov at gmail.com
Wed Jul 8 00:57:30 UTC 2009


On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao<attilio at freebsd.org> wrote:
> 2009/7/7 Dan Naumov <dan.naumov at gmail.com>:
>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao<attilio at freebsd.org> wrote:
>>> 2009/7/7 Dan Naumov <dan.naumov at gmail.com>:
>>>> I just got a panic followed by a reboot a few seconds after running
>>>> "portsnap update". /var/log/messages shows the following:
>>>>
>>>> Jul  7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>>> Jul  7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
>>>> 1) held by 0xffffff00017d8370 (tid 100054) too long
>>>> Jul  7 03:49:38 atom kernel: panic: spin lock held too long
>>>
>>> That's a known bug, affecting -CURRENT as well.
>>> The cpustop IPI is handled through an NMI, which means it could
>>> interrupt a CPU at any moment, even while it is holding a spinlock,
>>> violating one well-known FreeBSD rule.
>>> That means that the CPU can stop itself while the thread is holding
>>> the sched lock spinlock, without releasing it (there is no way, short
>>> of something highly hackish, to fix that).
>>> In the meantime, hardclock() wants to schedule something else to run
>>> and gets stuck on the thread lock.
>>>
>>> The ideal fix would involve not using an NMI to serve the cpustop,
>>> while having a cheap way (not making the common path too hard) to
>>> tell hardclock() to avoid scheduling while a cpustop is in flight.
>>>
>>> Thanks,
>>> Attilio
>>
>> Any idea whether a fix is being worked on, and how unlucky one must be
>> to run into this issue? Should I expect it to happen again? Is it
>> basically completely random?
>
> I'd like to work on that issue before BETA3 (and backport to
> STABLE_7), but I'm just time-constrained right now.
> It is completely random.
>
> Thanks,
> Attilio

OK, this is getting pretty bad. 23 hours later, I got the same kind of
panic. The only difference is that instead of "portsnap update", this
one was triggered by "portsnap cron", which I have running between 3 and
4 am every day:

Jul  8 03:03:49 atom kernel: ssppiinn  lloocckk
00xxffffffffffffffff8800bb33eeddc400  ((sscchheedd  lloocck k1 )0 )h
ehledl db yb y 0x0xfffffffffff0f00001081735339760e 0( t(itdi d
10100006070)5 )t otoo ol olnogng
Jul  8 03:03:49 atom kernel: p
Jul  8 03:03:49 atom kernel: anic: spin lock held too long
Jul  8 03:03:49 atom kernel: cpuid = 0
Jul  8 03:03:49 atom kernel: Uptime: 23h2m38s
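
For anyone following along, here is a rough user-space sketch of the
failure mode as I understand it from Attilio's explanation: one CPU
takes the sched lock, gets parked by the stop NMI before it can release
it, and another CPU spinning on the same lock eventually gives up. This
is not the FreeBSD kernel code. The names spin_lock_bounded(),
simulate_cpustop() and SPIN_LIMIT are made up for illustration, and the
spin budget is an arbitrary assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical spin budget; the real kernel uses its own limit. */
#define SPIN_LIMIT 1000000u

struct spinlock {
    atomic_flag locked;
};

/*
 * Try to take the lock, giving up after `limit` attempts.
 * Returns true if the lock was acquired, false if it stayed held
 * for the whole budget (the "held too long" case).
 */
static bool spin_lock_bounded(struct spinlock *l, uint32_t limit)
{
    for (uint32_t i = 0; i < limit; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->locked,
                                               memory_order_acquire))
            return true;
    }
    return false;
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Single-threaded model of the two CPUs.
 * Returns true if the waiting CPU would hit the "held too long" path.
 */
static bool simulate_cpustop(bool stopped_while_holding)
{
    struct spinlock sched_lock = { ATOMIC_FLAG_INIT };

    /* "CPU 1" takes the sched lock. */
    (void)spin_lock_bounded(&sched_lock, 1);

    /* If the stop NMI parks CPU 1 here, the unlock never happens. */
    if (!stopped_while_holding)
        spin_unlock(&sched_lock);

    /* "CPU 0", in its clock handler, now wants the same lock. */
    return !spin_lock_bounded(&sched_lock, SPIN_LIMIT);
}
```

Calling simulate_cpustop(true) models the bad case where the lock holder
is stopped and the waiter runs out of budget. simulate_cpustop(false)
models the normal case where the lock is released and the waiter
acquires it on the first attempt.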

- Sincerely,
Dan Naumov


More information about the freebsd-stable mailing list