zfs process hang on pool access

David P Discher dpd at bitgravity.com
Wed Jul 27 20:41:43 UTC 2011


I found this by breaking into the debugger, doing some backtraces, continuing, breaking in again, and doing more backtraces on the hung processes to see what was going on, then walking through the code.

Then, once I had specific loops and code locations, I asked the higher powers of the FreeBSD kernel world.

Of course, I also had high CPU usage, peaking in the arc_reclaim_thread.

I've seen this in production nearly like clockwork at 106-107 days of uptime. If the system stays up much longer than that, things deadlock.

But at 112 days on 8.2, you definitely have the LBOLT overflow.

Otherwise, reboot and patch. However, I have not fully vetted the patch under heavy load, and I am currently seeing another deadlock issue with 8.1+ and ZFS v14, though seemingly during writes after 6-40 hours. Still investigating.

Note that my proposal to use "time_uptime" doesn't work, as it causes a buildworld error in the ZFS userland tools.

This is what I'm currently running on 8.1 to fix the 26-day LBOLT issue in the l2arc feeder and arc_reclaim_thread:


Index: sys/cddl/compat/opensolaris/sys/time.h
===================================================================
--- sys/cddl/compat/opensolaris/sys/time.h      (.../8.1-BGOS-20110105) (revision 3322)
+++ sys/cddl/compat/opensolaris/sys/time.h      (.../8.1-BGOS-20110613) (working copy)
@@ -38,7 +38,7 @@
 
 typedef longlong_t     hrtime_t;
 
-#define        LBOLT   ((gethrtime() * hz) / NANOSEC)
+#define        LBOLT   (gethrtime() / (NANOSEC/hz))
 
 #if defined(__i386__) || defined(__powerpc__)
 #define        TIMESPEC_OVERFLOW(ts)                                           \

Index: sys/cddl/compat/opensolaris/sys/types.h
===================================================================
--- sys/cddl/compat/opensolaris/sys/types.h     (.../8.1-BGOS-20110105) (revision 3322)
+++ sys/cddl/compat/opensolaris/sys/types.h     (.../8.1-BGOS-20110613) (working copy)
@@ -34,6 +34,12 @@
  */
 
 #include <sys/stdint.h>
+
+#ifdef _KERNEL
+typedef        int64_t         clock_t;
+#define        _CLOCK_T_DECLARED
+#endif
+
 #include_next <sys/types.h>
 
 #define        MAXNAMELEN      256


---
David P. Discher
dpd at bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Jul 27, 2011, at 7:34 AM, Andriy Gapon wrote:

>> Ahh, is there anyway to confirm that before I reboot, or any other
>> information we could glean that might be useful?
> 
> No quick ideas, unfortunately.



More information about the freebsd-fs mailing list