mfi timeouts

Vincent Hoffman vince at unsane.co.uk
Thu Oct 27 23:39:35 UTC 2011


On 28/10/2011 00:04, Jeremy Chadwick wrote:
> On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
>>     I've recently installed a new NAS at work which uses a rebranded LSI
>> MegaRAID SAS:
>> [root@banshee ~]# mfiutil show adapter
>> mfi0 Adapter:
>>     Product Name: Supermicro SMC2108
>>    Serial Number:
>>         Firmware: 12.12.0-0047
>>      RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>>   Battery Backup: present
>>            NVRAM: 32K
>>   Onboard Memory: 512M
>>   Minimum Stripe: 8k
>>   Maximum Stripe: 1M
>>
>> I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB drives)
>>
>> I'm seeing a lot of messages like
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
>>
>> When these appear, I see I/O stall on the array connected to the mfi
>> adapter. This can continue for 20 minutes or so before resuming, seemingly
>> at random (although there is a little more on this later on).
>>
>> From pciconf -lv
>> mfi0@pci0:5:0:0:        class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
>>     vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
>>     class      = mass storage
>>     subclass   = RAID
>>
>> From dmesg
>> mfi0: <LSI MegaSAS Gen2> port 0xe000-0xe0ff mem 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
>> mfi0: Megaraid SAS driver Ver 3.00
>> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
>> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/0700/15d9)
>> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
>> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
>> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
>> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
>>
>> I found this thread after a bit of googling, but it doesn't end too well.
>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
>> Was this ever taken further?
>>
>> One thing I have noticed is that the stall (and the timeout messages) seem
>> to go away if I query the card using mfiutil. I currently have a cron job
>> doing this every 2 minutes to see whether this is a coincidence or not.
>>
>>
>> Any suggestions welcome, and I'm happy to provide more info if I can, but
>> I don't have a duplicate machine to do much debugging on. I'm happy to try
>> patches though.
>>
>> Is this worth filing a PR?
> Can you please provide uname -a output?  The version of FreeBSD you're
> using matters greatly here.
>
Sure
FreeBSD banshee.foobar.net 8.2-STABLE FreeBSD 8.2-STABLE #2: Wed Oct 26 16:14:09 BST 2011     toor@banshee.foobar.net:/usr/obj/usr/src/sys/BANSHEE  amd64
[root@banshee /usr/src]# svn info
Path: .
Working Copy Root Path: /usr/src
URL: http://svn.freebsd.org/base/stable/8
Repository Root: http://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 226708
Node Kind: directory
Schedule: normal
Last Changed Author: brueffer
Last Changed Rev: 226671
Last Changed Date: 2011-10-23 19:37:57 +0100 (Sun, 23 Oct 2011)


It's looking like the mfiutil query ending the stall is not a coincidence:
the last two stalls have lasted less than the two-minute interval I set the
cron job to run at, much less than previously.
The cron job is a simple /usr/sbin/mfiutil show volumes | grep -v OPTIMAL
so I at least get an email if the volume breaks ;)
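
For anyone who would rather script the check than use the pipeline, here is a rough Python sketch of the same idea. It only mirrors the grep -v OPTIMAL filter; the function names are made up for illustration, and nothing here assumes a particular layout for the mfiutil output beyond "one line per item".

```python
import subprocess

def non_optimal_lines(report):
    """Return the lines of `report` that do not contain "OPTIMAL".

    This mirrors `grep -v OPTIMAL`: every line without the word is kept,
    including any header lines the tool prints.
    """
    return [line for line in report.splitlines() if "OPTIMAL" not in line]

def check_volumes(mfiutil="/usr/sbin/mfiutil"):
    """Run `mfiutil show volumes` and return the filtered lines.

    Hypothetical helper: it is defined here but not called, so the
    sketch can be imported on a machine without an mfi(4) controller.
    """
    result = subprocess.run(
        [mfiutil, "show", "volumes"],
        capture_output=True,
        text=True,
        check=True,
    )
    return non_optimal_lines(result.stdout)
```

As with the cron pipeline, running the query at all is the part that seems to matter for the stall; the filtering is only there to make the output worth mailing.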
Oct 28 00:01:06 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 59 SECONDS
Oct 28 00:01:36 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 89 SECONDS
Oct 28 00:13:09 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 50 SECONDS
Oct 28 00:13:39 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 80 SECONDS
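
To compare stalls before and after the cron job went in, the timeout messages can be summarised per command. The following is a minimal Python sketch, assuming each message sits on a single line as syslog writes it (not wrapped as in a mail archive). The regular expression is derived only from the messages quoted in this mail, and the helper name longest_timeouts is made up.

```python
import re

# Matches driver messages of the form seen in this thread, e.g.
#   mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 59 SECONDS
TIMEOUT_RE = re.compile(
    r"(?P<dev>mfi\d+): COMMAND (?P<cmd>0x[0-9a-fA-F]+) "
    r"TIMEOUT AFTER (?P<secs>\d+) SECONDS"
)

def longest_timeouts(lines):
    """Map (device, command address) to the largest timeout reported for it."""
    worst = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a timeout message
        key = (match.group("dev"), match.group("cmd"))
        worst[key] = max(worst.get(key, 0), int(match.group("secs")))
    return worst

# The four messages logged after the cron job was installed.
SAMPLE = [
    "Oct 28 00:01:06 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 59 SECONDS",
    "Oct 28 00:01:36 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 89 SECONDS",
    "Oct 28 00:13:09 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 50 SECONDS",
    "Oct 28 00:13:39 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 80 SECONDS",
]

for (dev, cmd), secs in longest_timeouts(SAMPLE).items():
    print(f"{dev} {cmd}: outstanding for at least {secs} seconds")
```

The largest value per command is only a lower bound on how long that command was outstanding, since the driver reports at intervals rather than when the command finally completes.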

I'm guessing this must kick something on the card.

Vince


