MFC of "Large set of CAM improvements" breaks I/O to Adaptec 29160 SCSI controller

Thu Apr 29 13:55:40 UTC 2010

Alexander Motin wrote:
> Pete French wrote:
>>> I have some 29160N locally and I'll try to reproduce this.
>> I would suggest you try gmirror across two drives - that is how
>> both myself and the original poster first noticed the issue.
> 
> Thanks. First step successful: I can reliably reproduce the problem on
> CURRENT. raidtest with 200 I/O streams over a gmirror of two disks on the
> same channel triggers the issue in seconds. All I/O on the channel dies
> after both disks report a "Queue full" error at the same time. The rest
> of the system works fine. If I manually adjust the queue depth of one
> disk beforehand, everything works fine. I'll investigate it tomorrow.

It seems I've found the cause. The attached patch fixes the problem for me.

This call was removed by mistake in the specified commit. It is not needed
during normal operation, only when a device's queue depth is shrinking. Even
then, the problem was often not triggered if more requests were arriving
and the controller's request allocation queue was not exhausted at that
moment. That is why the problem went undetected, and why gmirror increased
its chances.

-- 
Alexander Motin
-------------- next part --------------

--- cam_xpt.c.prev	2010-04-28 08:15:40.000000000 +0300
+++ cam_xpt.c	2010-04-29 16:01:23.000000000 +0300
@@ -4903,6 +4903,10 @@ camisr_runqueue(void *V_queue)
 			if ((dev->flags & CAM_DEV_TAG_AFTER_COUNT) != 0
 			 && (--dev->tag_delay_count == 0))
 				xpt_start_tags(ccb_h->path);
+			if (!device_is_send_queued(dev)) {
+				runq = xpt_schedule_dev_sendq(ccb_h->path->bus,
+				    dev);
+			}
 		}
 
 		if (ccb_h->status & CAM_RELEASE_SIMQ) {