replaced da devices not being detected

Graham Allan allan at physics.umn.edu
Thu Jul 10 18:43:04 UTC 2014


On Thu, Jul 03, 2014 at 06:41:05PM -0500, Graham Allan wrote:
> On 7/3/2014 3:03 PM, Graham Allan wrote:
> >
> >It does seem to me like we get to replace some number of drives without
> >incident, then after some point no new da devices are detected.
> 
> I should have given some more info about the HBA etc in use - it's
> an LSI 9205-8e (SAS2308, using mps driver), and dmesg is telling me
> the HBA has (IT) firmware 14.00.00.00. Don't know if this is good or
> bad but it appears to match the mps driver version, if that means
> anything.
> 
> I can see LSI is up to firmware 19.00.00.00 for the card, and I know
> I've seen discussion here of the favored version, but can't find it
> now.
 
> However SAS2IRCU can see the added drive even when camcontrol fails
> to, so I'm not sure that it's related to the HBA as such - unless
> SAS2IRCU gets that information by a different path such as querying
> the enclosure controller.

Funnily enough, the "missing" drive showed up around the time I was
messing with sas2ircu, though I didn't notice at first.

The first time I ran "sas2ircu 0 display", it took a *really* long time
to respond - subsequent runs were instant. I see now in kern.log that
something issued a reinit to the HBA:

Jul  3 17:57:39 hostname kernel: mps0: Calling Reinit from mps_wait_command
Jul  3 17:57:39 hostname kernel: mps0: mps_reinit sc 0xffffff8002a77000
Jul  3 17:57:39 hostname kernel: mps0: mps_reinit mask interrupts
Jul  3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup
Jul  3 17:57:40 hostname kernel: mps0: mpssas_announce_reset code 1 target -1 lun -1
Jul  3 17:57:40 hostname kernel: mps0: mpssas_complete_all_commands
Jul  3 17:57:40 hostname kernel: (noperiph:mps0:0:4294967295:0): SMID 370 waking up cm 0xffffff8002aa7a10 state 1 ccb 0 for diag reset
Jul  3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup 0 tm 0 after command completion
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit doorbell 0x24000000
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit unmask interrupts post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit restarting post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit finished sc 0xffffff8002a77000 post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: Reinit success
Jul  3 17:57:40 hostname kernel: mps0: mps_user_pass_thru: invalid request: error 60

The drive showed up about two and a half minutes after this:

Jul  3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0
Jul  3 18:00:15 hostname kernel: da91: <ATA ST3000DM001-1CH1 CC26> Fixed Direct Access SCSI-6 device
Jul  3 18:00:15 hostname kernel: da91: 600.000MB/s transfers
Jul  3 18:00:15 hostname kernel: da91: Command Queueing enabled
Jul  3 18:00:15 hostname kernel: da91: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)

I suspect sas2ircu was responsible for the reinit, and so for the drive
finally appearing. The system was generally unresponsive during that
first sas2ircu run, but was normal before and after.
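
To check how closely a da attach follows a reinit, the two kern.log
events can be lined up with a short script. This is only a sketch: the
function name attaches_after_reinit, the regular expressions, and the
assumed year (syslog timestamps carry none) are illustrative choices,
and the sample is three lines taken from the log excerpts above with the
mail-wrapped lines rejoined.

```python
import re
from datetime import datetime

# Three lines copied from the kern.log excerpts (wrapped lines rejoined).
SAMPLE = """\
Jul  3 17:57:39 hostname kernel: mps0: Calling Reinit from mps_wait_command
Jul  3 17:57:40 hostname kernel: mps0: Reinit success
Jul  3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0
"""

# "Mon DD HH:MM:SS host kernel: message" (assumed syslog layout).
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")
# "daN at mpsN ..." marks a newly attached disk.
ATTACH_RE = re.compile(r"^(da\d+) at (mps\d+) ")


def parse_time(stamp, year=2014):
    """Parse a syslog timestamp; the year is an assumption (not in the log)."""
    stamp = " ".join(stamp.split())  # collapse the padded day field
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def attaches_after_reinit(text):
    """Return (device, seconds since the last 'Reinit success') per attach."""
    last_reinit = None
    results = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        when = parse_time(m.group(1))
        msg = m.group(2)
        if msg.endswith("Reinit success"):
            last_reinit = when
            continue
        attach = ATTACH_RE.match(msg)
        if attach and last_reinit is not None:
            delay = (when - last_reinit).total_seconds()
            results.append((attach.group(1), delay))
    return results


if __name__ == "__main__":
    print(attaches_after_reinit(SAMPLE))
```

On the sample above this reports da91 attaching 155 seconds after the
reinit completed. A delay that short is suggestive but not proof that
the reinit caused the attach; running the same check across several
drive swaps would show whether the pattern repeats.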

Does this make any sense?

Is there a recommended firmware version (other than our current
14.00.00.00) for the 9205-8e which might help with this?

Thanks for any ideas,

Graham
-- 
-------------------------------------------------------------------------
Graham Allan
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------


More information about the freebsd-fs mailing list