svn commit: r254615 - head/sys/dev/mps

Kenneth D. Merry ken at freebsd.org
Thu Aug 22 14:21:09 UTC 2013


On Thu, Aug 22, 2013 at 16:42:41 +0400, Dmitry Morozovsky wrote:
> Ken,
> 
> On Wed, 21 Aug 2013, Kenneth D. Merry wrote:
> 
> > Author: ken
> > Date: Wed Aug 21 21:30:56 2013
> > New Revision: 254615
> > URL: http://svnweb.freebsd.org/changeset/base/254615
> > 
> > Log:
> >   Fix mps(4) driver breakage that came in in change 253550 that
> >   manifested itself in out of chain frame conditions.
> >   
> >   When the driver ran out of chain frames, the request in question
> >   would get completed early, and go through mpssas_scsiio_complete().
> >   
> >   In mpssas_scsiio_complete(), the negation of the CAM status values
> >   (CAM_STATUS_MASK | CAM_SIM_QUEUED) was ORed in instead of being
> >   ANDed in.  This resulted in a bogus CAM CCB status value.  This
> >   didn't show up in the non-error case, because the status was reset
> >   to something valid (e.g. CAM_REQ_CMP) later on in the function.
> >   
> >   But in the error case, such as when the driver ran out of chain
> >   frames, the CAM_REQUEUE_REQ status was ORed in to the bogus status
> >   value.  This led to the CAM transport layer repeatedly releasing
> > >   the SIM queue, because it thought that the CAM_RELEASE_SIMQ flag had
> >   been set.  The symptom was messages like this on the console when
> >   INVARIANTS were enabled:
> >   
> >   xpt_release_simq: requested 1 > present 0
> >   xpt_release_simq: requested 1 > present 0
> >   xpt_release_simq: requested 1 > present 0
> 
> what is real impact of the bug?

Your system will essentially hang, at least for anything connected to the
controller in question.

> >   
> >   mps_sas.c:	In mpssas_scsiio_complete(), use &= to take status
> >   		bits out.  |= adds them in.
> >   
> >   		In the error case in mpssas_scsiio_complete(), set
> >   		the status to CAM_REQUEUE_REQ, don't OR it in.
> >   
> >   MFC after:	3 days
> 
> This patch does not apply cleanly as r253550 had not been merged, and the first 
> masking does not occur on contemporary stable/9. Comments?

As far as I know, this is not a problem on the version of the driver in
stable/9.  But then again, I have not tested the out of chain frames code
since early 2011 when I last fixed it.

If you want to verify the behavior is correct in stable/9, do this:

1. Enable INVARIANTS

2. In /boot/loader.conf:
hw.mps.max_chains=32

3. Use up most of your memory.  If you're using ZFS, just do a sequential
write to a file so that the ARC starts filling up with cached data.  Look
at the free memory in top to see how much you've used.  This will cause
enough fragmentation to lead to more scatter/gather segments getting used
in the driver.

4. Do something like this:

((i=0)); while [ $i -lt 60 ]; do dd if=/dev/da0 of=/dev/null bs=1m &
((i++)); done

5. Look for an out of chain frames message on the console.  To see how far
you are towards using the chain frames, run 'sysctl dev.mps'.  You can see
how many chain frames you have free, and how many requests have failed.

This change just needs to be merged along with the other changes to avoid
having the regression in stable.
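
For reference, the masking mistake described in the commit log can be
sketched outside the driver.  This is only an illustration, not the code
from mps_sas.c: the function names are hypothetical, and the constant
values are written from memory of sys/cam/cam.h, so check them against the
header before relying on them.

```c
#include <stdint.h>

/* Assumed values modeled on sys/cam/cam.h; verify against the real header. */
#define CAM_REQ_INPROG   0x00u   /* request still in progress */
#define CAM_REQUEUE_REQ  0x13u   /* ask CAM to requeue the request */
#define CAM_STATUS_MASK  0x3Fu   /* low bits hold the status code */
#define CAM_RELEASE_SIMQ 0x100u  /* flag: release the SIM queue */
#define CAM_SIM_QUEUED   0x200u  /* flag: CCB is queued in the SIM */

/* Hypothetical helper: the pattern the log describes as broken. */
uint32_t
buggy_error_path(uint32_t status)
{
	/* |= with the negated mask sets every bit outside the mask. */
	status |= ~(CAM_STATUS_MASK | CAM_SIM_QUEUED);
	/* ORing the new code in keeps all of those stray bits. */
	status |= CAM_REQUEUE_REQ;
	return (status);
}

/* Hypothetical helper: the pattern the log describes as the fix. */
uint32_t
fixed_error_path(uint32_t status)
{
	/* &= with the negated mask takes the status bits out. */
	status &= ~(CAM_STATUS_MASK | CAM_SIM_QUEUED);
	/* Error case: set the status outright instead of ORing it in. */
	status = CAM_REQUEUE_REQ;
	return (status);
}
```

With these assumed values, starting from CAM_REQ_INPROG | CAM_SIM_QUEUED,
the buggy path returns 0xffffffd3, which has the CAM_RELEASE_SIMQ bit set,
while the fixed path returns plain CAM_REQUEUE_REQ (0x13).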

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG

