svn commit: r355056 - in head/sys/dev: mpr mps

Warner Losh imp at FreeBSD.org
Sun Nov 24 15:24:06 UTC 2019


Author: imp
Date: Sun Nov 24 15:24:05 2019
New Revision: 355056
URL: https://svnweb.freebsd.org/changeset/base/355056

Log:
  Fix leak in state machine for commands.
  
  When we get a device departed message from the firmware, we send a
  TARGET_RESET to the device, both to let the firmware know we're done and as
  part of the recovery process. This aborts all the commands. While the
  documentation says the IOC is responsible for writing a completion message
  with an aborted status for every pending command, we sometimes have commands
  queued for the target that haven't been completed and so are still in the
  INQUEUE state. When we later complete the pending CCB as aborted, these
  commands are freed and we hit the "state not busy" panic.
  
  Elsewhere, when we dequeue commands, we move the state from INQUEUE to BUSY.
  Do that here as well. In talking to Ken, Scott and Justin, they recommended a
  series of tests to see if this is 100% safe. Those tests are ongoing, but
  preliminary tests suggest this is safe, as we see no duplicate completions
  when we hit this case at work. We have a machine with a dodgy power supply
  that usually doesn't apply power to a few drives, but sometimes does when the
  machine is under heavy load, so we get a rash of connect / disconnect
  messages over half an hour. Without this change, we'd see the "state not
  busy" panic. With this change, the drives just annoyingly come and go without
  affecting the rest of the machine. Without a complete error injection test
  suite, though, it's hard to know whether all edge cases are now covered.
  
  Discussed with: scottl, ken, gibbs
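
  The state transition described in the log can be sketched in isolation. This
  is a minimal model, not the driver code: the names cm_state, command,
  cmd_free and complete_missed are hypothetical stand-ins, and it assumes the
  free path only accepts commands in the BUSY state, as the "state not busy"
  panic message suggests. Where the driver would panic, the sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the driver's per-command states
 * (the real drivers use MPR_CM_STATE_* and MPS_CM_STATE_*).
 */
enum cm_state { CM_STATE_FREE, CM_STATE_BUSY, CM_STATE_INQUEUE };

struct command {
	enum cm_state state;
};

/*
 * Models the free path. Assumption: only BUSY commands may be freed.
 * Returns -1 where the driver would hit the "state not busy" panic.
 */
static int
cmd_free(struct command *cm)
{
	assert(cm != NULL);
	if (cm->state != CM_STATE_BUSY)
		return (-1);
	cm->state = CM_STATE_FREE;
	return (0);
}

/*
 * Completes a command that was still queued when its target departed.
 * With apply_fix set, the command is moved to BUSY first, which is the
 * transition this commit adds before calling the completion routine.
 */
static int
complete_missed(struct command *cm, int apply_fix)
{
	assert(cm != NULL);
	if (apply_fix)
		cm->state = CM_STATE_BUSY;
	return (cmd_free(cm));
}
```

  Calling complete_missed() on a command in CM_STATE_INQUEUE without the fix
  returns -1 and leaves the state untouched; with the fix it returns 0 and the
  command ends up in CM_STATE_FREE.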

Modified:
  head/sys/dev/mpr/mpr_sas.c
  head/sys/dev/mps/mps_sas.c

Modified: head/sys/dev/mpr/mpr_sas.c
==============================================================================
--- head/sys/dev/mpr/mpr_sas.c	Sun Nov 24 15:03:35 2019	(r355055)
+++ head/sys/dev/mpr/mpr_sas.c	Sun Nov 24 15:24:05 2019	(r355056)
@@ -624,6 +624,7 @@ mprsas_remove_device(struct mpr_softc *sc, struct mpr_
 		mpr_dprint(sc, MPR_XINFO, "Completing missed command %p\n", tm);
 		ccb = tm->cm_complete_data;
 		mprsas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE);
+		tm->cm_state = MPR_CM_STATE_BUSY;
 		mprsas_scsiio_complete(sc, tm);
 	}
 }

Modified: head/sys/dev/mps/mps_sas.c
==============================================================================
--- head/sys/dev/mps/mps_sas.c	Sun Nov 24 15:03:35 2019	(r355055)
+++ head/sys/dev/mps/mps_sas.c	Sun Nov 24 15:24:05 2019	(r355056)
@@ -619,6 +619,7 @@ mpssas_remove_device(struct mps_softc *sc, struct mps_
 		mps_dprint(sc, MPS_XINFO, "Completing missed command %p\n", tm);
 		ccb = tm->cm_complete_data;
 		mpssas_set_ccbstatus(ccb, CAM_DEV_NOT_THERE);
+		tm->cm_state = MPS_CM_STATE_BUSY;
 		mpssas_scsiio_complete(sc, tm);
 	}
 }

