MegaRAID 'Bad Slot' Kernel message and crash.

Doug White dwhite at gumbysoft.com
Thu Jan 13 17:35:33 PST 2005


On Tue, 11 Jan 2005, Tony Byrne wrote:

> Basically, after some amount of uptime the kernel will emit an "amr0:
> Bad slot x completed" message and pretty soon after this the box goes into a
> partially unresponsive state, forcing us to reboot it.  So far the only
> thing triggering the problem is the nightly jobs, where the amount of
> I/O is higher than during the day.

scottl has been able to reproduce this on a U320 controller he has. I only
have U160 equipment and can't get the transaction rate high enough to
reproduce the issue.  The driver needs KTR instrumentation so we can see
where the bad slot is coming from.  The "bad slot" message appears when the
controller reports completion for a command that has already completed.
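
To illustrate the condition described above, here is a minimal userland
sketch in C of slot bookkeeping that detects a repeated completion. It is
not the amr driver's code: the names MAX_SLOTS, struct command, busy,
submit_command and complete_slot are all hypothetical, and the real driver's
data structures and locking may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_SLOTS 8             /* hypothetical; not the controller's real limit */

struct command {
    int id;                     /* hypothetical per-command tag */
};

/* One entry per outstanding command; NULL means the slot is free. */
static struct command *busy[MAX_SLOTS];

/* Claim the first free slot for a command; returns the slot or -1 if full. */
static int
submit_command(struct command *cmd)
{
    assert(cmd != NULL);
    for (int slot = 0; slot < MAX_SLOTS; slot++) {
        if (busy[slot] == NULL) {
            busy[slot] = cmd;
            return slot;
        }
    }
    return -1;
}

/*
 * Handle a completion reported for a slot.  If the slot is out of range or
 * holds no outstanding command (for example because it already completed),
 * report it and return -1 instead of touching stale state.
 */
static int
complete_slot(int slot)
{
    if (slot < 0 || slot >= MAX_SLOTS || busy[slot] == NULL) {
        printf("bad slot %d completed\n", slot);
        return -1;
    }
    busy[slot] = NULL;          /* slot is free again */
    return 0;
}
```

In this sketch, calling complete_slot() twice for the same slot without an
intervening submit_command() takes the "bad slot" path on the second call,
which is the kind of duplicate completion the message points at.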

The amr driver has several other issues and is in dire need of an
overhaul. Unfortunately, LSI has not been forthcoming with documentation,
so Scott and I are pretty much scratching our heads, not knowing how to
proceed.

This affects 5.x and HEAD, at least.  I can't comment on 4.x.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite at gumbysoft.com          |  www.FreeBSD.org

