mps0-troubles

Kenneth D. Merry ken at freebsd.org
Mon Feb 21 21:45:46 UTC 2011


On Mon, Feb 21, 2011 at 22:26:31 +0100, Joachim Tingvold wrote:
> On Mon, Feb 21, 2011, at 16:50:41PM GMT+01:00, Kenneth D. Merry wrote:
> >Okay, good.  It looks like it is running as designed.
> 
> It is? It's still terminating the commands, which I guess it shouldn't?
> 
> 	mps0: (0:40:0) terminated ioc 804b scsi 0 state c xfer 0

Sorry, I missed that, I was just looking at the first part.

I'm still waiting for LSI to look at the SAS analyzer trace I sent them for
the "IOC terminated" bug.

It appears to be (at least on my hardware) a backend issue of some sort,
and probably not anything we can fix in the driver.
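
For reference, the "ioc 804b" and "state c" fields in the log line quoted
above can be picked apart roughly as follows.  This is only a sketch: the
constant values are as recalled from the MPI 2.0 headers and should be
checked against the copy shipped with the driver, and the function name is
made up for illustration.

```python
# Constant values as recalled from the MPI 2.0 headers (assumption:
# verify against the headers in the driver source before relying on them).
IOCSTATUS_FLAG_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_MASK = 0x7FFF
IOCSTATUS_SCSI_IOC_TERMINATED = 0x004B

SCSI_STATE_BITS = {
    0x01: "AUTOSENSE_VALID",
    0x02: "AUTOSENSE_FAILED",
    0x04: "NO_SCSI_STATUS",
    0x08: "TERMINATED",
    0x10: "RESPONSE_INFO_VALID",
}


def decode_mps_status(ioc_status, scsi_state):
    """Hypothetical helper: split the raw 'ioc' and 'state' log fields."""
    code = ioc_status & IOCSTATUS_MASK
    log_info = bool(ioc_status & IOCSTATUS_FLAG_LOG_INFO_AVAILABLE)
    flags = [name for bit, name in sorted(SCSI_STATE_BITS.items())
             if scsi_state & bit]
    return code, log_info, flags


# Values taken from: terminated ioc 804b scsi 0 state c xfer 0
code, log_info, flags = decode_mps_status(0x804B, 0x0C)
print(hex(code), log_info, flags)
```

Under those assumptions, 0x804b is the "IOC terminated" status with the
log-info flag set, and state 0xc means the command was terminated without
any SCSI status being returned.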

Since you've got an HP branded expander, that makes it a little more
difficult to determine whether it's an LSI, Maxim, or some other expander.
Can you try the following on your system?  You'll need the sg3_utils port:

sg_inq -i ses0

(I need to update camcontrol to parse page 0x83 output.)

This is example output from a Supermicro chassis with an LSI 3Gb expander:

VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 12
    transport: Serial Attached SCSI (SAS)
    designator_type: NAA,  code_set: Binary
    associated with the target port
      NAA 5, IEEE Company_id: 0x3048
      Vendor Specific Identifier: 0x45157f
      [0x500304800045157f]
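
To make the last three lines of that output easier to compare across
chassis, here is a small sketch of how the 64-bit NAA 5 value breaks down
into the fields sg_inq prints (4-bit NAA, 24-bit IEEE company ID, 36-bit
vendor specific identifier).  The function name is made up for
illustration; it is not part of sg3_utils or camcontrol.

```python
def decode_naa5(wwn):
    """Hypothetical helper: split a 64-bit NAA 5 designator into fields."""
    naa = (wwn >> 60) & 0xF
    if naa != 5:
        raise ValueError("not an NAA 5 designator: NAA=%d" % naa)
    company_id = (wwn >> 36) & 0xFFFFFF      # 24-bit IEEE company ID
    vendor_specific = wwn & 0xFFFFFFFFF      # low 36 bits
    return company_id, vendor_specific


# The value from the Supermicro example above.
company_id, vendor_specific = decode_naa5(0x500304800045157F)
print(hex(company_id), hex(vendor_specific))
```

The company ID is the part that may give a hint about the expander
vendor, with the caveat that an OEM can program its own value.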

Maxim expanders seem to report LUN descriptors in VPD page 0x83 instead of
target port descriptors.  We might get a slight clue from the output, but
it's hard to say for certain since HP could have customized the page 0x83
values in the expander firmware.

> >I'll probably just double the number of chain buffers and disable the
> >messages.
> >
> >From previous experiments, the problem is much less likely to occur when
> >you have 2048 chain buffers, correct?
> 
> It just doesn't display the 'out of chain'-errors, that's all I think.

Well, if you don't see the 'out of chain' errors with 2048 chain buffers,
that means the condition isn't happening.

The cost of going from 1024 to 2048 is only 32K of extra memory, which is
not a big deal, so I think I'll go ahead and bump the limit up and remove
the printfs.  We've now proven the recovery strategy, so it'll just slow
things down slightly if anyone runs into that issue again.
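
As a quick check on those numbers, the arithmetic works out as below.
Note that the per-buffer figure is only what the 32K number implies (taking
32K to mean 32 KiB); it is not a size quoted from the driver source, and
the variable names are made up for illustration.

```python
OLD_CHAIN_BUFFERS = 1024
NEW_CHAIN_BUFFERS = 2048
EXTRA_MEMORY_BYTES = 32 * 1024   # "32K of extra memory", read as 32 KiB

extra_buffers = NEW_CHAIN_BUFFERS - OLD_CHAIN_BUFFERS
# Implied cost of each additional chain buffer (an inference, not a
# value taken from the driver).
implied_bytes_per_buffer = EXTRA_MEMORY_BYTES // extra_buffers

print(extra_buffers, implied_bytes_per_buffer)
```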

> >What filesystem are you using by the way?
> 
> ZFS.

Interesting.  I haven't been able to run out of chain elements with ZFS,
but I can use quite a few with UFS.  I had to artificially limit the number
of chain elements to test the change.

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG


More information about the freebsd-scsi mailing list