kern/140438: isp(4) causes threads to hang due to callwheel bucket list corruption

Tue Nov 10 02:40:02 UTC 2009

>Number:         140438
>Category:       kern
>Synopsis:       isp(4) causes threads to hang due to callwheel bucket list corruption
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 10 02:40:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Alexander Sack
>Release:        RELENG_7, RELENG_7_2_RELEASE, RELENG_6_1_RELEASE
>Organization:
Niksun
>Environment:
FreeBSD 7.2-STABLE FreeBSD 7.2-STABLE #3: Mon Nov  9 18:23:49 EST 2009  amd6
>Description:
Hardware:

IBM x346 Series with 8GB of RAM
QLA2342 BIOS 1.47 and 1.52 connected to an external SAN
ispfw(4) 3.3.19 as well as 3.3.25 (taken from QLogic website)

The basic signature of the issue is that a process will call usleep() and the nanosleep() system call will never come back!  Quite disturbing.  The way I was reproducing the issue was by executing a test C binary that called usleep() in a tight loop, sample code:

    int n = 1;
    while (n <= times) {
	usleep(usecs);

	if ((n % 1000) == 0) {
	    printf("sleeper: called %d times\n", n);
	}

	if (gotSignal) {
	    printf("sleeper: received sig %d\n", theSignal);
	    theSignal = -1;
	    gotSignal = 0;
	}

	++n;
    }

[this based off of a colleague and friend's initial test program, I added signals so I could restart it and debug the sleepq path]

etc.  Clearly the above should NEVER hang on a system.  The problem stems from the isp(4) driver.  I found that if no fibre channel devices were found or the cards were removed, the system would act normally.  I also discovered that if I swap the 23xx 2GB FC card for a 24xx 4GB FC card, again the symptoms disappear.  Also if a SIGIO was sent to the process, it would restart (which means more than likely it was not sleep queue stuff, i.e. a valid td was there to setrunnable and move on).

Root Cause:

The issue stems from the fact that there is a race condition of who starts the isp(4)'s Lost Device Timer callout or isp_ldt() function.  The callout is started at attach time in the isp_attach() function.  This is called right after the firmware of the card is initialized via mbox command.  The isp_ldt() callout is also started in isp_async() in response to a ISPASYNC_LOOP_DOWN event.   The issue is the ISP_LOCK() is dropped in isp_attach() while interrupts are attached which allows a ISASYNC_LOOP_DOWN event to occur BEFORE the LDT callout is initialized (via callout_init) on the 23xx series of cards.  As a result, callout_reset() is called FIRST out of isp_async() BEFORE callout_init() in isp_attach() which depending on release (6.1 vs. 7.2) causes different callwheel side effects.  The gist of it is the callwheel bucket list gets corrupt where by other callout's that happen to fall on the bucket the isp_ldt() wound up on, get stuck (to go even further, the head of a bucket
  list head pointer gets set to NULL even though there is a valid callwheel_tailq first and last pointers.  When I walked the whole list backwards (i.e. follow last backwards) I found my sleeper on the corrupt list!  softclock() SKIPs the whole bucket list since it believes nothing is actually on it due to the corrupt head pointer being NULL and HENCE, valid callouts on the list never wake up and hang).  Again callout_init() needs to be called before the first callout_reset to avoid these side effects.  Depending on timing, you get different callwheel types of issues.  ALSO bare in mind, if you happen to get a boot where callout_init() does get called in time, everything just works as it should which makes this problem very sporadic.

I have not gone into thinking about how to protecting the callout wheel from misbehaving clients (i.e. perhaps a flag should be added and checked to ensure callout_init() (and variants) was called before allowing callout_reset() to proceed, a simple check has very minimal overhead, my 2 cents).

Just to note, it is my feeling that interrupts really shouldn't be serviced this early anyway but trying to rearrange the logic of isp_reset()->isp_init()->isp_attach() logic is way too much of a change.  This patch was been tested extensively on 7.2 and 6.1 with great success.

I know due to the 8Gbps FC support on RELENG_8 and HEAD, this problem may not exist, I have to verify that in my testing (also the callout stuff is also changed a lot on a per cpu basis so even if the problem DOES exist it may not manifest itself in the same fashion).  Though 7.x is still susceptible to this issue and I feel my patch or a variant should be considered for MFC ASAP.

Please feel free to email me for any further questions or comments!

Thanks!

-aps
>How-To-Repeat:
Run usleep() in a tight loop on a system, it will eventually hang.  I found that if I disconnect and replug in a FC port to cause LDT callout cycle, the box would hang pretty quickly (just got to get my sleep program on the same bucket as isp_ldt()).
>Fix:
The patch submitted below is an attempt to fix the issue while trying to do the minimal amount to keep regression testing low.  

It composes of two main fixes:

- Initialize the callout structures before attach time initialization
- Start LDT only once during attach time (either from the actual attach OR from the IPASYNC_LOOP_DOWN event, whatever comes first, drivers was designed the loop is down until isp_kthread() calls isp_fc_runstate() which will generate NOTIFY messages which will cancel the LDT otherwise it times out as usual). 

With the patch I no longer have any of these issues and my sleep program never hangs (without it, I can get it to hang almost immediately).  Was hoping to get some feedback from the core group before I go ahead and try this in the field.

Patch attached with submission follows:

Index: isp_freebsd.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/isp/isp_freebsd.c,v
retrieving revision 1.146.2.1
diff -u -r1.146.2.1 isp_freebsd.c

--- isp_freebsd.c	13 May 2009 19:26:04 -0000	1.146.2.1
+++ isp_freebsd.c	10 Nov 2009 02:33:32 -0000
@@ -247,8 +247,6 @@
 	 * Create a kernel thread for fibre channel instances.
 	 */
 	if (IS_FC(isp)) {
-		isp_callout_init(&isp->isp_osinfo.ldt);
-		isp_callout_init(&isp->isp_osinfo.gdt);
 		ISP_UNLOCK(isp);
 #if __FreeBSD_version >= 500000  
 		if (kthread_create(isp_kthread, isp, &isp->isp_osinfo.kproc,
@@ -272,11 +270,13 @@
 		 */
 		if (isp->isp_role & ISP_ROLE_INITIATOR) {
 			isp_freeze_loopdown(isp, "isp_attach");
-			isp->isp_osinfo.ldt_running = 1;
-			callout_reset(&isp->isp_osinfo.ldt,
-			    isp_quickboot_time * hz, isp_ldt, isp);
-			isp_prt(isp, ISP_LOGSANCFG|ISP_LOGDEBUG0,
-			   "Starting Initial Loop Down Timer");
+			if (isp->isp_osinfo.ldt_running == 0) {
+			    	isp->isp_osinfo.ldt_running = 1;
+				callout_reset(&isp->isp_osinfo.ldt,
+			    	   isp_quickboot_time * hz, isp_ldt, isp);
+				isp_prt(isp, ISP_LOGSANCFG|ISP_LOGDEBUG0,
+				   "Starting Initial Loop Down Timer");
+			}
 		}
 	}
 }
Index: isp_pci.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/isp/isp_pci.c,v
retrieving revision 1.148
diff -u -r1.148 isp_pci.c
--- isp_pci.c	26 Jun 2007 23:08:57 -0000	1.148
+++ isp_pci.c	10 Nov 2009 02:33:33 -0000
@@ -1193,6 +1193,15 @@
 	}
 
 	/*
+	 * Initialize callouts here to ensure no side-effects
+	 * after we start the ISP's firmware.  
+	 */
+	if (IS_FC(isp)) {
+	    	isp_callout_init(&isp->isp_osinfo.ldt);
+	    	isp_callout_init(&isp->isp_osinfo.gdt);
+	}
+
+	/*
 	 * Make sure we're in reset state.
 	 */
 	ISP_LOCK(isp);


>Release-Note:
>Audit-Trail:
>Unformatted: