Write Timeouts with MPS

Desai, Kashyap Kashyap.Desai at lsi.com
Thu Apr 12 12:30:07 UTC 2012


We have never seen this issue on our test machines.
I am adding Sreekanth, who plans to reproduce it locally so that we can analyze it further.

Please help Sreekanth reproduce it locally.


~ Kashyap

> -----Original Message-----
> From: owner-freebsd-scsi at freebsd.org [mailto:owner-freebsd-scsi at freebsd.org] On Behalf Of John Hickey
> Sent: Wednesday, April 11, 2012 1:06 PM
> To: freebsd-scsi at freebsd.org
> Subject: Re: Write Timeouts with MPS
> 
> I pretty much did this and filed a ticket with Seagate this afternoon.
> They told me the latest firmware is 0006 (I am at 0001) and wanted
> the serial numbers of the other drives in the array (probably to
> confirm firmware compatibility).  I suspect I'll have the update in
> hand tomorrow and see how that works.  Running FreeBSD didn't seem to
> be an issue to them aside from concern about reading the serial numbers
> without seatools.  Only issue with that was that I initially gave them
> the whole inquiry serial string, but only the first 8 (X) characters of
> inquiry are the serial number:
> 
>     $ sudo camcontrol inquiry da3
>     pass3: <SEAGATE ST2000NM0001 0001> Fixed Direct Access SCSI-6 device
>     pass3: Serial Number XXXXXXXX0000YYYYYYYY
>     pass3: 600.000MB/s transfers, Command Queueing Enabled
> 
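As an illustration (not from the thread itself), the serial number could be pulled out of the camcontrol inquiry output with a short script. This is a minimal sketch: the function name drive_serial and the width parameter are hypothetical, and the 8-character width is only what the message above reports for this drive model.

```python
import re


def drive_serial(inquiry_output, width=8):
    """Return the first `width` characters of the reported serial string.

    Assumption: the text contains a "Serial Number <string>" field, as in
    the camcontrol inquiry output quoted above. Returns None if absent.
    """
    match = re.search(r"Serial Number (\S+)", inquiry_output)
    if match is None:
        return None
    return match.group(1)[:width]


# Placeholder serial string copied from the quoted output above.
sample = "pass3: Serial Number XXXXXXXX0000YYYYYYYY"
print(drive_serial(sample))
```

The remaining characters of the inquiry string are simply dropped; other drive models may use a different layout.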
> John
> 
> On Wed, Apr 11, 2012 at 07:35:09AM +0200, Peter Maloney wrote:
> > Well, when I emailed some Seagate people, they just told me to use
> > supported ones. So I suggest you email them about it, telling them it is
> > on the compatibility list, and asking for an explanation and fix (eg.
> > firmware bug fix). You could also say it is fairly common on seagate
> > (and Samsung) disks, and very uncommon with other brands.
> >
> > Peter
> >
> > On 11.04.2012 00:26, John Hickey wrote:
> > > I have 19 drives in my array, so changing them isn't that easy. ;-)
> > > They are Seagate Constellation ES 2TB SAS drives (SEAGATE ST2000NM0001
> > > 0001) and according to LSI documents my whole setup should be supported.
> > > The drives at least aren't being marked as failed.  I believe a change
> > > was made a while back to make FreeBSD less sensitive to these sorts of
> > > timeouts.  I have had a panic or two on the system, but haven't tracked
> > > down the exact cause yet.
> > >
> > > John
> > >
> > > On Apr 10, 2012, at 12:35 PM, Peter Maloney wrote:
> > >
> > >> I found this only happens with specific disks / disk firmware... but
> > >> nobody seems to listen to me about it. They all seem to blame the
> > >> driver. (I blame both, but changing disks is a simple fix.)
> > >>
> > >> And looking around, most reports are with various Seagates (including
> > >> one that can cause this type of error with smartctl -a with a SAS
> > >> Seagate, but cannot reproduce with the binary LSI driver) or Samsung
> > >> Spinpoints. The only other disk I know of that does this is a Crucial
> > >> SSD with old firmware. One guy said he can do a camcontrol rescan to get
> > >> it back; I tried that and get either panics, hangs, or nothing.
> > >>
> > >> What HBA are you using? With my LSI 9211-8i HBAs, the new 3TB Seagate
> > >> greens don't seem to have this problem. I have no idea if different
> > >> disks behave differently with different controllers. I asked Seagate
> > >> about it and they reply with marketing nonsense about buying enterprise
> > >> disks instead, and say I should buy disks that are on the specific
> > >> compatibility list for the HBA.
> > >>
> > >> I found that with the few disks that I have that fail randomly (and
> > >> others), I can reproduce the issue (not exact same symptoms though) by
> > >> hot pulling the disk while writing something, putting it back, wait a
> > >> few seconds (<10; less than enough for the SCSI controller to rescan)
> > >> pull and replace again. The old 2TB seagate greens fail this test, but
> > >> the 3TB ones pass. All 2 and 3 TB Hitachis I tried pass this test, as
> > >> well as 3TB WD greens. (all enterprise disks I tried pass this test
> > >> except the Toshiba 2TB ones I tried)
> > >>
> > >> If I put a "failed" disk back in, it does not work. If I put it in a
> > >> different slot, same. But if I put any other disk in, it works fine. So
> > >> it is the disk, but it is also FreeBSD not being able to reset/rescan
> > >> it. But it is simple enough to blame both, and since you can't get rid
> > >> of the driver, get different disks (eg. swap them with some different
> > >> same sized ones in a different machine).
> > >>
> > >> Here is my forum thread about it, including disk product ids for ones I
> > >> tested, and a huge list of things that don't fix it.
> > >> http://forums.freebsd.org/showthread.php?t=28252
> > >>
> > >> Peter
> > >>
> > >>
> > >> On 10.04.2012 03:52, John Hickey wrote:
> > >>> I've seen people having this problem before, but I don't think anyone
> > >>> has figured it out.  I am running:
> > >>>
> > >>> FreeBSD zfs 10.0-CURRENT FreeBSD 10.0-CURRENT #5: Sat Apr  7 18:05:57 PDT 2012     root at zfs:/usr/obj/usr/src/sys/GENERIC  amd64
> > >>>
> > >>> I have the latest LSI IT firmware 13 loaded:
> > >>>
> > >>> mps1: <LSI SAS2008> port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 16 at device 0.0 on pci5
> > >>> mps1: Firmware: 13.00.01.00, Driver: 13.00.00.00-fbsd
> > >>> mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
> > >>>
> > >>> All disks are on a SuperMicro SAS II backplane:
> > >>>
> > >>> root at zfs:/usr/ports/sysutils/dmidecode# camcontrol devlist
> > >>> <SEAGATE ST3300657SS 0008>         at scbus0 target 0 lun 0 (da0,pass0)
> > >>> <SEAGATE ST3300657SS 0008>         at scbus0 target 1 lun 0 (da1,pass1)
> > >>> <SEAGATE ST2000NM0001 0001>        at scbus1 target 8 lun 0 (da2,pass2)
> > >>> .... x16 more of the same
> > >>> <SEAGATE ST2000NM0001 0001>        at scbus1 target 46 lun 0 (da20,pass20)
> > >>> <LSI CORP SAS2X36 0717>            at scbus1 target 47 lun 0 (ses0,pass21)
> > >>>
> > >>> Essentially when putting the ZFS filesystem under load, I am getting
> > >>> these sorts of errors:
> > >>>
> > >>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 2a c6 b1 0 1 0 0 length 131072 SMID 214 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da16:mps1:0:25:0): WRITE(10). CDB: 2a 0 19 2b 83 aa 0 1 0 0 length 131072 SMID 879 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 c 3 31 0 1 0 0 length 131072 SMID 578 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 41 6f ff 0 1 0 0 length 131072 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 c e5 2e 0 1 0 0 length 131072 SMID 684 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 41 b1 4b 0 1 0 0 length 131072 SMID 212 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 d 1e 5c 0 1 0 0 length 131072 SMID 63 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 56 1c 0 1 0 0 length 131072 SMID 412 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 42 2c f1 0 1 0 0 length 131072 SMID 1019 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 6d 22 0 1 0 0 length 131072 SMID 175 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 42 62 bc 0 1 0 0 length 131072 SMID 458 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da10:mps1:0:16:0): WRITE(10). CDB: 2a 0 18 f 4b d2 0 1 0 0 length 131072 SMID 986 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 43 f4 50 0 1 0 0 length 131072 SMID 809 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 30 e4 73 0 1 0 0 length 131072 SMID 489 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 10 8d 19 0 1 0 0 length 131072 SMID 275 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 32 e7 0 0 1 0 0 length 131072 SMID 666 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da8:mps1:0:14:0): WRITE(10). CDB: 2a 0 18 13 2b 68 0 1 0 0 length 131072 SMID 463 terminated ioc 804b scsi 0 state c xfer 0
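As an illustration (not from the thread itself), errors like the ones quoted above could be tallied per disk to see whether the terminations cluster on particular drives. This is a minimal sketch: the names LINE_RE, decode_write10 and summarize are hypothetical, each log entry is assumed to sit on one unwrapped line, and the CDB is decoded using the standard WRITE(10) layout (bytes 2-5 are the big-endian LBA, bytes 7-8 the transfer length in blocks).

```python
import re
from collections import Counter

# Matches one unwrapped mps(4) "terminated" line in the format quoted above.
LINE_RE = re.compile(
    r"\((?P<dev>da\d+):mps\d+:\d+:\d+:\d+\): WRITE\(10\)\. "
    r"CDB: (?P<cdb>[0-9a-f ]+?) length (?P<length>\d+) "
    r"SMID (?P<smid>\d+) terminated ioc (?P<ioc>[0-9a-f]+)"
)


def decode_write10(cdb_text):
    """Return (lba, blocks) from a space-separated hex WRITE(10) CDB."""
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB: %r" % cdb_text)
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return lba, blocks


def summarize(lines):
    """Count terminated writes per device and collect decoded records."""
    per_device = Counter()
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip anything that is not a terminated WRITE(10)
        lba, blocks = decode_write10(m.group("cdb"))
        per_device[m.group("dev")] += 1
        records.append((m.group("dev"), lba, blocks, int(m.group("length"))))
    return per_device, records


# Three entries copied from the log above, unwrapped.
SAMPLE = [
    "(da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0",
    "(da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0",
    "(da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0",
]

per_device, records = summarize(SAMPLE)
for dev, count in per_device.most_common():
    print(dev, count)
```

Each CDB above requests 0x0100 = 256 blocks, which is consistent with the logged length of 131072 bytes if the drives use 512-byte logical blocks (an assumption, not something the log states).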
> > >>> _______________________________________________
> > >>> freebsd-scsi at freebsd.org mailing list
> > >>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> > >>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"

