mfi timeouts

Vincent Hoffman vince at unsane.co.uk
Thu Oct 27 22:52:53 UTC 2011


Hi,
    I've recently installed a new NAS at work which uses a rebranded LSI
MegaRAID SAS controller:
[root@banshee ~]# mfiutil show adapter
mfi0 Adapter:
    Product Name: Supermicro SMC2108
   Serial Number:
        Firmware: 12.12.0-0047
     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
  Battery Backup: present
           NVRAM: 32K
  Onboard Memory: 512M
  Minimum Stripe: 8k
  Maximum Stripe: 1M

I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB drives).

I'm seeing a lot of messages like
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS

While these are being logged, IO stalls on the array connected to the mfi
adapter. This can continue for 20 minutes or so before resuming, seemingly
at random (although a little more on this later on).
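
As a rough way to see how long each stalled command has been outstanding,
the timeout messages can be reduced to one line per command address. This
is only a sketch: the function name mfi_timeout_summary is made up for
illustration, and it assumes the messages keep the exact
"COMMAND <addr> TIMEOUT AFTER <n> SECONDS" wording shown above.

```shell
# Hypothetical helper (not part of mfiutil or the mfi driver).
# Reads log text on stdin and prints "<command address> <longest timeout>"
# for each command, in order of first appearance.
mfi_timeout_summary() {
    awk '
    /mfi[0-9]+: COMMAND 0x[0-9a-f]+ TIMEOUT AFTER [0-9]+ SECONDS/ {
        # Find the fields by keyword so a syslog prefix does not matter.
        for (i = 1; i < NF; i++) {
            if ($i == "COMMAND") cmd = $(i + 1)
            if ($i == "AFTER") secs = $(i + 1) + 0
        }
        if (!(cmd in max)) {
            order[++n] = cmd
            max[cmd] = secs
        } else if (secs > max[cmd]) {
            max[cmd] = secs
        }
    }
    END {
        for (i = 1; i <= n; i++) printf "%s %d\n", order[i], max[order[i]]
    }'
}
```

Fed the messages above (for example with "dmesg | mfi_timeout_summary"), it
would report 391 seconds for 0xffffff8000b216c8 and 85 seconds for
0xffffff8000b21b08.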

From pciconf -lv:
mfi0@pci0:5:0:0:        class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
    class      = mass storage
    subclass   = RAID

From dmesg:
mfi0: <LSI MegaSAS Gen2> port 0xe000-0xe0ff mem 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
mfi0: Megaraid SAS driver Ver 3.00
mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/0700/15d9)
mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision

I found this thread after a bit of googling, but it doesn't end too well:
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
Was this ever taken further?

One thing I have noticed is that the stall (and the timeout messages) seem
to go away if I query the card using mfiutil. I currently have a cron job
doing this every 2 minutes to see whether this is coincidence or not.
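
For reference, the periodic query could look something like the sketch
below. The wrapper name mfi_poke, the choice of "show adapter" as the
query, and the log path in the comment are assumptions for illustration,
not necessarily what the cron job described here runs. The timestamp is
there so each query can be lined up against the timeout messages.

```shell
# Hypothetical wrapper around mfiutil; prints a timestamped result line.
# MFIUTIL can be set to override the binary (default /usr/sbin/mfiutil).
#
# Example crontab entry for root, assuming this is saved in a script
# that ends by calling mfi_poke:
#   */2 * * * * /root/mfi_poke.sh >> /var/log/mfi_poke.log 2>&1
mfi_poke() {
    mfi_poke_bin=${MFIUTIL:-/usr/sbin/mfiutil}
    mfi_poke_stamp=$(date '+%Y-%m-%d %H:%M:%S')
    # Any query that reaches the controller firmware should do; the
    # output itself is discarded, only the exit status is recorded.
    if "$mfi_poke_bin" show adapter > /dev/null 2>&1; then
        echo "$mfi_poke_stamp mfi poke ok"
    else
        echo "$mfi_poke_stamp mfi poke FAILED"
        return 1
    fi
}
```

If the stalls only ever clear within a few seconds of an "ok" line, that
would suggest the query itself is what wakes things up rather than
coincidence.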


Any suggestions are welcome, and I'm happy to provide more info if I can.
I don't have a duplicate machine to do much debugging on, but I'm happy to
try patches.

Is this worth filing a PR?

Vince

