mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags

Peter Maloney peter.maloney at brockmann-consult.de
Thu Nov 3 10:31:42 UTC 2011


Dear Jason,

On 11/02/2011 07:05 PM, Jason Wolfe wrote:
> Hello,
> Testing with the LSI supplied driver, it appears they have a code path for
> this condition that causes our driver to crash.  Here are 2 sets of
> messages:
>
> mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm
> 0xffffff800040bdf8
> (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072
> SMID 97 command timeout cm 0xffffff800040bdf8 ccb 0xffffff00
> mpslsi0: mpssas_alloc_tm freezing simq
> mpslsi0: timedout cm 0xffffff800040bdf8 allocated tm 0xffffff8000409070
> (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 55 96 48 7f 0 0 80 0 length 65536
> SMID 171 completed cm 0xffffff80004105a8 ccb 0xffffff03c3443y
> (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 54 f8 a4 3f 0 0 80 0 length 65536
> SMID 762 completed cm 0xffffff8000434230 ccb 0xffffff001317ay
> (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072
> SMID 97 completed timedout cm 0xffffff800040bdf8 ccb 0xffff1
> (noperiph:mpslsi0:0:8:0): SMID 50 finished recovery after aborting TaskMID
> 97
> mpslsi0: mpssas_free_tm releasing simq
>
>
> mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm
> 0xffffff8000441e18
> (da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length
> 131072 SMID 989 command timeout cm 0xffffff8000441e18 ccb 0xfffff0
> mpslsi0: mpssas_alloc_tm freezing simq
> mpslsi0: timedout cm 0xffffff8000441e18 allocated tm 0xffffff80004063e0
> (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 14 a1 4f 0 1 0 0 length 131072
> SMID 857 completed cm 0xffffff8000439e38 ccb 0xffffff001316y
> (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 e4 98 57 0 0 80 0 length 65536
> SMID 300 completed cm 0xffffff80004182a0 ccb 0xffffff0392f0y
> (da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length
> 131072 SMID 989 completed timedout cm 0xffffff8000441e18 ccb 0xff1
> (noperiph:mpslsi0:0:15:0): SMID 4 finished recovery after aborting TaskMID
> 989
> mpslsi0: mpssas_free_tm releasing simq
>
> The server ran for 10 minutes with these happening every 10-30 seconds,
> with our community driver the first instance of commands timing out during
> this smartctl storm would cause the server to hang and sometimes the
> controller to reset.  Hopefully this is helpful to someone.
>

Does this mean it didn't hang at all, or that it ran your smartctl -a
test for 10 minutes before hanging?

I am also trying the mpslsi driver now, although I could not reproduce
the problem with the mps driver using "smartctl -a" (I also tried -A, -h
and -i). Tags were set to 255 on all disks. I only tried this on the
backup server, which never crashed randomly on its own either, so I will
just have to assume the new driver works if the problem does not recur
within a month or two.
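
For anyone who wants to repeat this kind of test, here is a minimal
sketch of the procedure described above: raise the tag count on each
disk, then query SMART data in a loop. The function names, the disk
list and the round count are hypothetical and not taken from this
thread; only "camcontrol tags <dev> -N <n>" and "smartctl -a <dev>" are
existing commands. It is not the exact script either of us ran.

```python
import subprocess


def tag_command(disk, tags=255):
    # "camcontrol tags <device> -N <count>" sets the number of tagged openings.
    return ["camcontrol", "tags", disk, "-N", str(tags)]


def smart_command(disk, flag="-a"):
    # "smartctl -a /dev/<device>" prints all SMART information for the disk.
    return ["smartctl", flag, "/dev/" + disk]


def stress(disks, rounds=1, run=subprocess.run):
    """Raise the tag count on every disk, then query SMART data repeatedly.

    `run` is injectable so the command sequence can be checked without
    touching real hardware (an assumption made for this sketch).
    """
    for disk in disks:
        run(tag_command(disk), check=True)
    for _ in range(rounds):
        for disk in disks:
            # smartctl reports disk health through nonzero exit bits,
            # so a nonzero exit status is not treated as a failure here.
            run(smart_command(disk), check=False)


# Example (hypothetical disk names; needs root on a FreeBSD host):
# stress(["da0", "da1"], rounds=100)
```

Running it alongside a scrub or other heavy I/O is what matters; the
SMART queries on their own did not trigger anything here.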

However, with the mpslsi driver, during a scrub on the backup server
(probably while smartctl -a was running), I got the messages below,
including what looks like a controller reset. No disks were lost, and
zpool status reported no read errors. I can't get it to happen a second
time, so I hope that means our problems are over.

Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_scsiio_timeout
checking sc 0xffffff800f629000 cm 0xffffff800f65f698
Nov  3 09:17:10 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0
SMID 717 command timeout cm 0xffffff800f65f698 ccb
 0xffffff0026bbb800
Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_alloc_tm freezing simq
Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: timedout cm
0xffffff800f65f698 allocated tm 0xffffff800f6340f8
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 completed cm
0xffffff800f643cd8 ccb 0xffffff0026bd1000 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 completed cm
0xffffff800f654550 ccb 0xffffff0026b96000 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 completed cm
0xffffff800f664510 ccb 0xffffff003d438000 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 completed cm
0xffffff800f657b90 ccb 0xffffff00314ce800 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 completed cm
0xffffff800f65a630 ccb 0xffffff0026ba1800 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 completed cm
0xffffff800f65ece8 ccb 0xffffff0026bb1800 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0
SMID 717 completed timedout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800
during recov(da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 1c dc 68 73 0 0 2b
0 length 22016 SMID 690 completed cm 0xffffff800f65dc70 ccb
0xffffff0026bea800 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 1c dc 68 73 0 0 2b 0 length 22016 SMID 690 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 completed cm
0xffffff800f66d568 ccb 0xffffff0026bf9000 during recovery ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 completed cm
0xffffff800f65d5a8 ccb 0xffffff003d47f800 during recovery ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 completed cm
0xffffff800f641428 ccb 0xffffff0031536000 during recovery ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 completed cm
0xffffff800f63e3b8 ccb 0xffffff00314ec800 during recovery ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512
SMID 139 completed cm 0xffffff800f63c6a8 ccb 0xffffff0026a89000 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 139 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0
SMID 876 completed cm 0xffffff800f6690a0 ccb 0xffffff00314c8800 during
recovery ioc 8(pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 876 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512
SMID 661 completed cm 0xffffff800f65c058 ccb 0xffffff0026b7d000 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 661 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512
SMID 471 completed cm 0xffffff800f650848 ccb 0xffffff0026be7800 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 471 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512
SMID 215 completed cm 0xffffff800f641048 ccb 0xffffff0026bef800 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 215 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512
SMID 203 completed cm 0xffffff800f6404a8 ccb 0xffffff0026bb6000 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 203 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512
SMID 546 completed cm 0xffffff800f6550f0 ccb 0xffffff003d447800 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 546 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512
SMID 513 completed cm 0xffffff800f6530f8 ccb 0xffffff0026bcb800 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 513 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1
abort TaskMID 717 status 0x0 code 0x0 count 20
Nov  3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1
finished recovery after aborting TaskMID 717
Nov  3 09:17:11 bcnas1bak kernel: mpslsi0: mpssas_free_tm releasing simq
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 41 1e 9a 58 0 0 2a 0
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): CAM status: SCSI
Status Error
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI status:
Check Condition
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI sense: UNIT
ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
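
To check the guess that the timed-out command came from smartctl, the
pass-through CDBs in the log above can be decoded by hand. The sketch
below does that for ATA PASS-THROUGH(16) and also splits the "ioc"
value. The function and table names are hypothetical; the byte offsets
follow the SAT layout of the 16-byte CDB, and the IOCStatus constants
are my reading of the MPI2 headers (0x8000 as the "log info available"
flag, 0x004b as "SCSI IOC terminated"), so treat those as assumptions.

```python
# Feature-register values for the ATA SMART command (0xB0).
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD5: "SMART READ LOG",
    0xDA: "SMART RETURN STATUS",
}

# Protocol field values of ATA PASS-THROUGH (subset).
ATA_PROTOCOLS = {3: "Non-data", 4: "PIO Data-In", 5: "PIO Data-Out", 6: "DMA"}

# MPI2 IOCStatus layout (assumed from the MPI2 headers, see lead-in).
IOCSTATUS_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_MASK = 0x7FFF
IOCSTATUS_NAMES = {0x004B: "SCSI IOC terminated"}


def decode_ata_passthrough16(cdb_text):
    """Decode a CDB printed as space-separated hex bytes, as in the log."""
    cdb = [int(byte, 16) for byte in cdb_text.split()]
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH(16) CDB")
    protocol = (cdb[1] >> 1) & 0x0F   # bits 4:1 of byte 1
    feature = cdb[4]                  # FEATURES (7:0)
    command = cdb[14]                 # ATA command register
    is_smart = command == 0xB0
    return {
        "protocol": ATA_PROTOCOLS.get(protocol, hex(protocol)),
        "command": "SMART" if is_smart else hex(command),
        "feature": SMART_FEATURES.get(feature, hex(feature)) if is_smart
                   else hex(feature),
        "count": cdb[6],              # sector count (7:0)
        "lba_low": cdb[8],            # log address for SMART READ LOG
        "lba_mid": cdb[10],           # 0x4f for SMART commands
        "lba_high": cdb[12],          # 0xc2 for SMART commands
    }


def decode_ioc_status(value):
    """Return (status name, log-info flag) for an IOCStatus word."""
    code = value & IOCSTATUS_MASK
    name = IOCSTATUS_NAMES.get(code, hex(code))
    return name, bool(value & IOCSTATUS_LOG_INFO_AVAILABLE)


if __name__ == "__main__":
    print(decode_ata_passthrough16("85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0"))
    print(decode_ata_passthrough16("85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0"))
    print(decode_ioc_status(0x804B))
```

For the CDBs in this log it reports SMART RETURN STATUS (the command
that timed out as SMID 717), SMART READ DATA and SMART READ LOG with
log address 6, all with the 4f/c2 SMART signature. That is consistent
with smartctl -a having been running at the time.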


Peter

> Jason
> _______________________________________________
> freebsd-scsi at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"


-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney at brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------


