ZFS panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066
Richard Todd
rmtodd at servalan.servalan.com
Sun Mar 23 00:02:19 UTC 2014
I just recently updated my main machine to a recent (Wed. morning)
9-STABLE version. While doing backups to an external ZFS-formatted
USB drive I got the following panic:
panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066
The culprit seems to be the write-throttling code in dmu_tx.c (I've noted thea
actual line of the panic with an arrow):
#ifdef _KERNEL
#ifdef illumos
mutex_enter(&curthread->t_delay_lock);
while (cv_timedwait_hires(&curthread->t_delay_cv,
&curthread->t_delay_lock, wakeup, zfs_delay_resolution_ns,
CALLOUT_FLAG_ABSOLUTE | CALLOUT_FLAG_ROUNDUP) > 0)
continue;
mutex_exit(&curthread->t_delay_lock);
#else
/* XXX High resolution callouts are not available */
ASSERT(wakeup >= now); <------ panic
pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
#endif
#else
hrtime_t delta = wakeup - gethrtime();
struct timespec ts;
ts.tv_sec = delta / NANOSEC;
ts.tv_nsec = delta % NANOSEC;
(void) nanosleep(&ts, NULL);
#endif
The code is supposed to put the thread to sleep until such time as the time
"wakeup" arrives, but apparently under some circumstances when the system is
under heavy enough load things get delayed enough so that by the time the
code gets to line 1066 above, the actual time is past the "wakeup" time,
making the assert fail. (The system was quite busy when this happened, with
not only the backups running but a couple of the zfs pools were doing the
/etc/periodic-initiated regular scrub at the time.)
I've gone ahead and hacked my copy to skip the pause
if the wakeup time has already passed instead of doing a panic, logging an
informative printf when this happens:
diff -r e817c2457f83 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c Wed Mar 19 21:08:39 2014 -0500
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c Sat Mar 22 16:28:06 2014 -0500
@@ -1063,8 +1063,11 @@
mutex_exit(&curthread->t_delay_lock);
#else
/* XXX High resolution callouts are not available */
- ASSERT(wakeup >= now);
- pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
+ if (wakeup < now) {
+ printf("Warning: dmu_tx_delay: wakeup %lu < now %lu\n", (unsigned long)wakeup, (unsigned long)now);
+ } else {
+ pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
+ }
#endif
#else
hrtime_t delta = wakeup - gethrtime();
Things seem to be running okay with this patch, with an occasional console
message like
Mar 22 17:43:09 ichotolot kernel: Warning: dmu_tx_delay: wakeup 3965794197537 < now 3975734195460
More information about the freebsd-fs
mailing list