9211 (LSI/SAS) issues on 11.2-STABLE

Borja Marcos borjam at sarenet.es
Wed Feb 6 15:18:51 UTC 2019



> On 5 Feb 2019, at 23:49, Karl Denninger <karl at denninger.net> wrote:
> 
> BTW under 12.0-STABLE (built this afternoon after the advisories came
> out, with the patches) it's MUCH worse.  I get the same device resets
> BUT it's followed by an immediate panic which I cannot dump as it
> generates a page-fault (supervisor read data, page not present) in the
> mps *driver* at mpssas_send_abort+0x21.
> 
> This precludes a dump of course since attempting to do so gives you a
> double-panic (I was wondering why I didn't get a crash dump!); I'll
> re-jigger the box to stick a dump device on an internal SATA device so I
> can successfully get the dump when it happens and see if I can obtain a
> proper crash dump on this.
> 
> I think it's fair to assume that 12.0-STABLE should not panic on a disk
> problem (unless of course the problem is trying to page something back
> in -- it's not, the drive that aborts and resets is on a data pack doing
> a scrub)

It shouldn’t panic, I imagine.

>>>> mps0: Sending reset from mpssas_send_abort for target ID 37


>> 0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
>> 0x06  0x008  4               6  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>> 
>> 0x06  0x008  4               7  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>> 
>> Number of Hardware Resets has incremented.  There are no other errors shown:

What _exactly_ does that value count? Is it the number of resets sent from the HBA,
_or_ the number of times the device has reset by itself?
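
To watch which counters move between two snapshots without eyeballing the
tables, something like the following might help. It is only a rough sketch,
assuming the rows come from "smartctl -l devstat" and keep the column layout
quoted above; the function names are made up for illustration, and it makes
no attempt to say what caused a reset.

```python
import re

# Matches counter rows in the layout quoted above, for example:
#   0x06  0x008  4               6  ---  Number of Hardware Resets
# Header rows ("=====") and rows without a numeric value are skipped.
ROW = re.compile(
    r"^(0x[0-9a-f]{2})\s+(0x[0-9a-f]{3})\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+)$",
    re.IGNORECASE,
)


def parse_devstat(text):
    """Return {description: value} for every counter row found in text."""
    counters = {}
    for line in text.splitlines():
        # Drop any mail quoting ("> ", ">> ") before matching.
        m = ROW.match(line.lstrip("> ").strip())
        if m:
            counters[m.group(6).strip()] = int(m.group(4))
    return counters


def changed_counters(before, after):
    """Return {description: (old, new)} for counters whose value differs."""
    old, new = parse_devstat(before), parse_devstat(after)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


# The two snapshots quoted in this thread.
before = """\
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4               6  ---  Number of Hardware Resets
0x06  0x010  4               0  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
"""
after = """\
0x06  0x008  4               7  ---  Number of Hardware Resets
0x06  0x010  4               0  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
"""

print(changed_counters(before, after))
```

Taking a snapshot right before and right after a logged mpssas_send_abort
would at least show whether the counter moves in step with the HBA's resets.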

>> I'd throw possible shade at the backplane or cable /but I have already
>> swapped both out for spares without any change in behavior./

What about the power supply? 





Borja.



