PERC5 (LSI MegaSAS) Patrol Read crashes

Scott Long scottl at samsco.org
Mon Jun 30 16:47:33 UTC 2008


Brian A. Seklecki wrote:
> On Thu, 2007-11-15 at 15:55 -0500, Brian A Seklecki (Mobile) wrote:
>> Normally I'd be praising Dell, but I think a little vendor bashing is
>> due here.
> 
> All:
> 
> Just to follow up, we've been running these 1st-generation 2950s in our
> lab with RHEL 5.2 x86_64 for ~3 weeks w/o any disk or I/O problems.
> 
> It must have been some random bug with the FreeBSD mfi(4) that only
> affected that revision of the PERC5, or, since the motherboard/CPU
> family/chipset is entirely different in R2 and R3, something with
> FreeBSD and how it was handling the controller (ACPI?)
> 
> We never had any stability problems with R2 and R3 on RELENG_6_3 on the
> 2950 or 1950.
> 
> From now on we'll wait for R2 before we go anywhere near new Dell
> gear.  
> 
> What do you think the chances are of them dumping LSI for Areca and Broadcom
> for Intel? :)
> 
> ~BAS
> 
>> It's a software bug (driver).  It can probably be easily fixed.  I
>> think there's a PR on it somewhere (will check).

The problem is a firmware bug in the MegaRAID SAS controller.  It seems
that while the controller can handle 512 or more concurrent commands,
it can only handle 128 concurrent commands to each array.  Patrol
reads aren't the primary cause; they just help the problem appear.  When
a patrol read cycle runs, it tends to slow down I/O enough that commands
to the array get backed up, and you tend to reach the 128 limit.

I don't know if there is a firmware fix from Dell/LSI, or if there will
ever be a fix.  FreeBSD drivers tend to stress hardware a lot more
than Linux and Windows do, and since the latter two are used as the
QA yardstick, anything that doesn't affect them doesn't usually get
fixed.  An easy work-around for the driver is to change the following
line in /sys/dev/mfi/mfi.c::mfi_alloc_commands():

ncmds = sc->mfi_max_fw_cmds;

to

ncmds = 128;
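
Hard-coding 128 would also raise the count on a controller whose
firmware reports fewer than 128 commands.  A slightly more defensive
variant of the same work-around clamps the reported value instead.  The
following is only a sketch: the constant MFI_PER_ARRAY_CMD_LIMIT, the
helper mfi_clamp_ncmds(), the stand-in struct, and the uint32_t field
type are all assumptions for illustration, not code from the driver.

```c
#include <stdint.h>

/* Hypothetical name; 128 is the per-array limit described above. */
#define MFI_PER_ARRAY_CMD_LIMIT	128

/* Stand-in for the real softc; only the one field used here is modeled. */
struct mfi_softc_sketch {
	uint32_t mfi_max_fw_cmds;	/* command count reported by firmware */
};

/* Return the number of commands to allocate, never above the limit. */
static uint32_t
mfi_clamp_ncmds(const struct mfi_softc_sketch *sc)
{
	uint32_t ncmds = sc->mfi_max_fw_cmds;

	if (ncmds > MFI_PER_ARRAY_CMD_LIMIT)
		ncmds = MFI_PER_ARRAY_CMD_LIMIT;
	return (ncmds);
}
```

In the driver, the result would take the place of the value assigned to
ncmds in the line shown above.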

A more complete solution would require me to write an I/O scheduler in
the driver, which would take quite a bit of effort.
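
To give a rough idea of what such a scheduler would have to do, here is
a minimal sketch of per-array throttling: each array tracks how many
commands are outstanding, and anything beyond 128 is held back in the
driver until a completion frees a slot.  All names here (array_sched,
sched_submit, sched_complete) are hypothetical, and plain counters stand
in for the command queues and locking that a real driver would need.

```c
#include <stdbool.h>

#define ARRAY_CMD_LIMIT	128	/* per-array limit described above */

/* Per-array bookkeeping; counters stand in for real command queues. */
struct array_sched {
	unsigned int inflight;	/* commands currently issued to firmware */
	unsigned int deferred;	/* commands held back in the driver */
};

/*
 * Submit one command to an array.  Returns true if it was issued to
 * the controller, false if it was deferred because the array is full.
 */
static bool
sched_submit(struct array_sched *a)
{
	if (a->inflight < ARRAY_CMD_LIMIT) {
		a->inflight++;
		return (true);
	}
	a->deferred++;
	return (false);
}

/*
 * Account for one completed command.  If a deferred command is
 * waiting, issue it into the freed slot and return true.
 */
static bool
sched_complete(struct array_sched *a)
{
	if (a->inflight > 0)
		a->inflight--;
	if (a->deferred > 0 && a->inflight < ARRAY_CMD_LIMIT) {
		a->deferred--;
		a->inflight++;
		return (true);
	}
	return (false);
}
```

The point of doing it per array is that the controller-wide command
pool could stay at its full size while no single array is ever handed
more than 128 commands at once.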

With all this said, I still stand behind LSI controllers.  This bug,
while unfortunate, is relatively minor and easy to work around, and
it's the only significant bug that has turned up in over two and a half
years with this hardware.

Scott



More information about the freebsd-hardware mailing list