proliant server lockups with freebsd-amd64-stable (2010-03-10)

Alexander Motin mav at FreeBSD.org
Fri Mar 12 18:38:39 UTC 2010


Pawel Jakub Dawidek wrote:
> On Thu, Mar 11, 2010 at 01:39:16PM +0100, Kai Gallasch wrote:
>> I have some trouble with an opteron server locking up spontaneously. It loses
>> all network connectivity and even through the console I can get no shell.
>>
>> Lockups occur mostly under disk load (periodic daily, bacula backup
>> running, make buildworld/buildkernel) and I can provoke them easily.
> [...]
>>     4     0     0     0  LL     *cissmtx  0xffffff04ed820c00 [g_down]
> [...]
>> 100046                   L      *cissmtx  0xffffff04ed820c00 [irq257: ciss0]
> [...]
> 
> I was analyzing a similar problem as a potential ZFS bug. It turned out to
> be a bug in ciss(4) and I believe mav@ (CCed) has a fix for that.

My patch has been in 8-STABLE since r204873 of 2010-03-08. Make
sure you have it.

In this case the trap stopped the process in ciss_get_request(), which is
indeed called with the cissmtx lock held. But there is nowhere to sleep or
loop there, so maybe it was just a coincidence. With the bugs I was fixing,
there was a chance of looping indefinitely between ciss and CAM under a
resource constraint, which increases the chance of catching such a situation.
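
As a rough illustration of that kind of loop, here is a toy model. It is
only a sketch: it is not the actual ciss(4) or CAM code, it is not the change
in r204873, and every name in it is invented. It assumes a driver that answers
"requeue" when it has no free request slot and a caller that resubmits at once:

```python
# Hypothetical model; none of these names come from ciss(4) or CAM.
class Driver:
    def __init__(self, free_requests):
        self.free_requests = free_requests

    def submit(self):
        """Take a request slot, or report that the caller must requeue."""
        if self.free_requests == 0:
            return "requeue"
        self.free_requests -= 1
        return "ok"


def send(driver, max_attempts):
    """Resubmit immediately on "requeue"; return attempts used, or None."""
    for attempt in range(1, max_attempts + 1):
        if driver.submit() == "ok":
            return attempt
        # Nothing on this path frees a slot, so the retry fails the same way.
    return None


print(send(Driver(free_requests=1), max_attempts=1000))  # slot available
print(send(Driver(free_requests=0), max_attempts=1000))  # no slot: gives up
```

The attempt cap exists only so the model terminates; without it, the second
call would spin for as long as no slot is released.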

You may also try looking at what's going on with `top -HS` and `systat -vm 1`.

-- 
Alexander Motin

