mpt request timed out

Sun Jun 6 02:25:34 UTC 2010

I used to have "UNIT ATTENTION asc:29,0" errors on mpt when it was
running the IR firmware (the one that supports RAID). The issue
disappeared after the firmware was changed to the IT variant (without
RAID). Keep in mind that in my case the original IR firmware was quite
a bit older than the IT version that replaced it, so it's possible that
the newer firmware fixed the issue for me, not the switch to IT.

In my case those errors correlated pretty well with the disks' SMART
UDMA_CRC_Error_Count. I guess the corrupted transaction was re-issued
and succeeded in the end, as there were no ZFS errors in my case either.
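
To check for the same correlation on another box, here is a minimal
sketch that pulls the raw UDMA_CRC_Error_Count out of "smartctl -A"
output. It assumes smartctl (from smartmontools) is installed and prints
its usual 10-column attribute table; the function names are made up for
illustration, and the raw value is assumed to be a plain integer.

```python
import subprocess


def udma_crc_errors(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            try:
                return int(fields[9])
            except ValueError:
                # Assumption violated: raw value is not a plain integer.
                return None
    return None


def read_crc_errors(device):
    """Run `smartctl -A` on a device (e.g. /dev/da0) and parse its output."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return udma_crc_errors(result.stdout)
```

Sampling this per disk before and after a burst of mpt errors would show
whether the counter moves along with them.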

--Artem

2010/6/5 Ståle Kristoffersen <staale at kristoffersen.ws>:
> Hi, I'm not sure if this is the right list, please tell me if it isn't.
>
> I'm having problems with mpt timeouts when putting load on the disks
> connected to it.
> I have the mpt adapter connected to a SAS expander, with several disks
> connected to that expander:
>
> mpt0: <LSILogic SAS/SATA Adapter> port 0xc800-0xc8ff mem
> 0xfe8fc000-0xfe8fffff,0xfe8e0000-0xfe8effff irq 16 at device 0.0 on pci1
> mpt0: [ITHREAD]
> mpt0: MPI Version=1.5.20.0
>
> ses0 at mpt0 bus 0 scbus0 target 0 lun 0
> ses0: <LSILOGIC SASX36 A.1 7015> Fixed Enclosure Services SCSI-3 device
> ses0: 300.000MB/s transfers
> ses0: Command Queueing enabled
> ses0: SCSI-3 SES Device
>
> And all the disks are consumer-grade SATA disks like this:
> da0 at mpt0 bus 0 scbus0 target 1 lun 0
> da0: <ATA ST31000528AS CC38> Fixed Direct Access SCSI-5 device
> da0: 300.000MB/s transfers
> da0: Command Queueing enabled
> da0: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
>
> The error I'm seeing is this:
> http://folk.uio.no/stalk/mpt/timeout.txt
>
> I've also put out a full dmesg from boot:
> http://folk.uio.no/stalk/mpt/dmesg.txt
> (I've since added 4 new disks, but the error was there before that).
>
> What can be causing these timeouts? The controller resets everything and
> ZFS is not complaining:
>
>  pool: media
>  state: ONLINE
>  scrub: scrub stopped after 0h4m with 0 errors on Wed May 12 13:58:05 2010
> config:
>
>        NAME        STATE     READ WRITE CKSUM
>        media       ONLINE       0     0     0
>          raidz1    ONLINE       0     0     0
>            da14    ONLINE       0     0     0
>            da11    ONLINE       0     0     0
>            da6     ONLINE       0     0     0
>            da3     ONLINE       0     0     0
>          raidz1    ONLINE       0     0     0
>            da15    ONLINE       0     0     0
>            da12    ONLINE       0     0     0
>            da9     ONLINE       0     0     0
>            da5     ONLINE       0     0     0
>          raidz1    ONLINE       0     0     0
>            da8     ONLINE       0     0     0
>            da2     ONLINE       0     0     0
>            da0     ONLINE       0     0     0
>            da4     ONLINE       0     0     0
>          raidz1    ONLINE       0     0     0
>            da1     ONLINE       0     0     0
>            da13    ONLINE       0     0     0
>            da10    ONLINE       0     0     0
>            da7     ONLINE       0     0     0
>          raidz1    ONLINE       0     0     0
>            da17    ONLINE       0     0     0
>            da18    ONLINE       0     0     0
>            da19    ONLINE       0     0     0
>            da20    ONLINE       0     0     0
>
> errors: No known data errors
>
> but clients time out or get an error if they try to do I/O while the
> connection is down, and that's causing havoc. The timeouts last from 10
> to 30 seconds each.
>
> I'd appreciate any ideas!
>
> --
> Ståle Kristoffersen