ZFS panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066

Richard Todd rmtodd at servalan.servalan.com
Sun Mar 23 00:02:19 UTC 2014


I just recently updated my main machine to a recent (Wed. morning)
9-STABLE version.  While doing backups to an external ZFS-formatted 
USB drive I got the following panic:

panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066

The culprit seems to be the write-throttling code in dmu_tx.c (I've noted thea
actual line of the panic with an arrow):

#ifdef _KERNEL
#ifdef illumos
	mutex_enter(&curthread->t_delay_lock);
	while (cv_timedwait_hires(&curthread->t_delay_cv,
	    &curthread->t_delay_lock, wakeup, zfs_delay_resolution_ns,
	    CALLOUT_FLAG_ABSOLUTE | CALLOUT_FLAG_ROUNDUP) > 0)
		continue;
	mutex_exit(&curthread->t_delay_lock);
#else
	/* XXX High resolution callouts are not available */
	ASSERT(wakeup >= now);   <------ panic
	pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
#endif
#else
	hrtime_t delta = wakeup - gethrtime();
	struct timespec ts;
	ts.tv_sec = delta / NANOSEC;
	ts.tv_nsec = delta % NANOSEC;
	(void) nanosleep(&ts, NULL);
#endif

The code is supposed to put the thread to sleep until such time as the time
"wakeup" arrives, but apparently under some circumstances when the system is
under heavy enough load things get delayed enough so that by the time the
code gets to line 1066 above, the actual time is past the "wakeup" time, 
making the assert fail.  (The system was quite busy when this happened, with
not only the backups running but a couple of the zfs pools were doing the
/etc/periodic-initiated regular scrub at the time.) 
I've gone ahead and hacked my copy to skip the pause
if the wakeup time has already passed instead of doing a panic, logging an 
informative printf when this happens:

diff -r e817c2457f83 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c	Wed Mar 19 21:08:39 2014 -0500
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c	Sat Mar 22 16:28:06 2014 -0500
@@ -1063,8 +1063,11 @@
 	mutex_exit(&curthread->t_delay_lock);
 #else
 	/* XXX High resolution callouts are not available */
-	ASSERT(wakeup >= now);
-	pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
+        if (wakeup < now) {
+            printf("Warning: dmu_tx_delay: wakeup %lu < now %lu\n", (unsigned long)wakeup, (unsigned long)now);
+        } else { 
+	    pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
+        }
 #endif
 #else
 	hrtime_t delta = wakeup - gethrtime();

Things seem to be running okay with this patch, with an occasional console 
message like
Mar 22 17:43:09 ichotolot kernel: Warning: dmu_tx_delay: wakeup 3965794197537 < now 3975734195460



More information about the freebsd-fs mailing list