replaced da devices not being detected
Graham Allan
allan at physics.umn.edu
Thu Jul 10 18:43:04 UTC 2014
On Thu, Jul 03, 2014 at 06:41:05PM -0500, Graham Allan wrote:
> On 7/3/2014 3:03 PM, Graham Allan wrote:
> >
> >It does seem to me like we get to replace some number of drives without
> >incident, then after some point no new da devices are detected.
>
> I should have given some more info about the HBA etc in use - it's
> an LSI 9205-8e (SAS2308, using mps driver), and dmesg is telling me
> the HBA has (IT) firmware 14.00.00.00. Don't know if this is good or
> bad but it appears to match the mps driver version, if that means
> anything.
>
> I can see LSI is up to firmware 19.00.00.00 for the card, and I know
> I've seen discussion here of the favored version, but can't find it
> now.
> However SAS2IRCU can see the added drive even when camcontrol fails
> to, so I'm not sure that it's related to the HBA as such - unless
> SAS2IRCU gets that information by a different path such as querying
> the enclosure controller.
Funnily enough the "missing" drive showed up round about the time I was
messing with sas2ircu - though I didn't notice at first.
The first time I ran "sas2ircu 0 display", it took a *really* long time
to respond - subsequent runs were instant. I see now in kern.log that
something issued a reinit to the HBA:
Jul 3 17:57:39 hostname kernel: mps0: Calling Reinit from mps_wait_command
Jul 3 17:57:39 hostname kernel: mps0: mps_reinit sc 0xffffff8002a77000
Jul 3 17:57:39 hostname kernel: mps0: mps_reinit mask interrupts
Jul 3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup
Jul 3 17:57:40 hostname kernel: mps0: mpssas_announce_reset code 1 target -1 lun -1
Jul 3 17:57:40 hostname kernel: mps0: mpssas_complete_all_commands
Jul 3 17:57:40 hostname kernel: (noperiph:mps0:0:4294967295:0): SMID 370 waking up cm 0xffffff8002aa7a10 state 1 ccb 0 for diag reset
Jul 3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup 0 tm 0 after command completion
Jul 3 17:57:40 hostname kernel: mps0: mps_reinit doorbell 0x24000000
Jul 3 17:57:40 hostname kernel: mps0: mps_reinit unmask interrupts post 0 free 1055
Jul 3 17:57:40 hostname kernel: mps0: mps_reinit restarting post 0 free 1055
Jul 3 17:57:40 hostname kernel: mps0: mps_reinit finished sc 0xffffff8002a77000 post 0 free 1055
Jul 3 17:57:40 hostname kernel: mps0: Reinit success
Jul 3 17:57:40 hostname kernel: mps0: mps_user_pass_thru: invalid request: error 60
The drive showed up about two and a half minutes after this:
Jul 3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0
Jul 3 18:00:15 hostname kernel: da91: <ATA ST3000DM001-1CH1 CC26> Fixed Direct Access SCSI-6 device
Jul 3 18:00:15 hostname kernel: da91: 600.000MB/s transfers
Jul 3 18:00:15 hostname kernel: da91: Command Queueing enabled
Jul 3 18:00:15 hostname kernel: da91: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
I suspect sas2ircu was responsible for the reinit. The system was
generally unresponsive during that first sas2ircu run, but was normal
before and after.
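To check whether this pattern repeats on other drive swaps, the gap between a
"Reinit success" message and the next da attach on the same controller can be
pulled out of kern.log. The following is only a rough sketch: it assumes
syslog-style lines like the ones quoted above, with any wrapped lines already
joined, and it assumes the year (2014), since syslog timestamps omit it. The
function name reinit_to_attach and both regular expressions are illustrative,
not part of any FreeBSD tool.

```python
import re
from datetime import datetime

# Assumed line layout: "Mon D HH:MM:SS host kernel: message"
LINE_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(.*)$"
)
# Matches e.g. "mps0: Reinit success"
REINIT_RE = re.compile(r"^(mps\d+): Reinit success")
# Matches e.g. "da91 at mps0 bus 0 scbus0 target 218 lun 0"
ATTACH_RE = re.compile(r"^(da\d+) at (mps\d+) bus \d+ scbus\d+ target \d+ lun")


def reinit_to_attach(lines, year=2014):
    """Return (device, seconds) for each da attach that follows a
    'Reinit success' on the same mps controller."""
    last_reinit = {}  # controller name -> time of most recent reinit
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not kernel syslog entries
        mon, day, clock, msg = m.groups()
        ts = datetime.strptime(f"{year} {mon} {day} {clock}",
                               "%Y %b %d %H:%M:%S")
        r = REINIT_RE.match(msg)
        if r:
            last_reinit[r.group(1)] = ts
            continue
        a = ATTACH_RE.match(msg)
        if a and a.group(2) in last_reinit:
            delta = (ts - last_reinit[a.group(2)]).total_seconds()
            results.append((a.group(1), delta))
    return results


# Two lines taken from the log excerpts above (unwrapped).
sample = [
    "Jul 3 17:57:40 hostname kernel: mps0: Reinit success",
    "Jul 3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0",
]
print(reinit_to_attach(sample))
```

On the two sample lines this reports da91 attaching 155 seconds after the
reinit. A short gap like that is consistent with the reinit triggering the
detection, but it does not prove it.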
Does this make any sense?
Is there a recommended firmware version (other than our current
14.00.00.00) for the 9205-8e which might help with this?
Thanks for any ideas,
Graham
--
-------------------------------------------------------------------------
Graham Allan
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------
More information about the freebsd-fs
mailing list