zfs process hang on pool access
David P Discher
dpd at bitgravity.com
Tue Aug 2 18:01:27 UTC 2011
On Aug 1, 2011, at 5:55 AM, Martin Matuska wrote:
>>> ===================================================================
>>> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c (revision
>>> 224527)
>>> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c (working copy)
>>> @@ -488,7 +488,7 @@
>>> txg_delay(dsl_pool_t *dp, uint64_t txg, int ticks)
>>> {
>>> tx_state_t *tx = &dp->dp_tx;
>>> - int timeout = ddi_get_lbolt() + ticks;
>>> + clock_t timeout = ddi_get_lbolt() + ticks;
>>
>> So you recon that one line will fix the 100+ days overflow David's
>> talking about?
> No, this bug causes that after 28,5 days of uptime this function won't
> delay txg sync threads anymore (ddi_get_lbolt() will be always larger
> than timeout).This may lead to a slowdown of writes.
Thanks Martin. I see you committed to -head as well.
However, a bit confused with my issue still. I've been able to recreating it and I can't explain why LBOLT is still coming back negative. I'm in 8.1-RELEASE in this case, with the patches that I posted before for LBOLT and typedef of clock_t.
I added some debugging and am seeing LBOLT still return a negative number, which then (timeout-LBOLT) because a large number of ticks to wait.
I put some debugging in txg_delay, without fixing the timeout's type:
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c (revision 3343)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c (working copy)
@@ -427,6 +427,7 @@
{
tx_state_t *tx = &dp->dp_tx;
int timeout = LBOLT + ticks;
+ /* if ( timeout < 0 ) { printf ("txg_delay: timeout negative timeout=%d, ticks=%d\n", timeout, ticks); } */
/* don't delay if this txg could transition to quiesing immediately */
if (tx->tx_open_txg > txg ||
@@ -439,10 +440,10 @@
return;
}
- while (LBOLT < timeout &&
- tx->tx_syncing_txg < txg-1 && !txg_stalled(dp))
- (void) cv_timedwait(&tx->tx_quiesce_more_cv, &tx->tx_sync_lock,
- timeout - LBOLT);
+ while (LBOLT < timeout && tx->tx_syncing_txg < txg-1 && !txg_stalled(dp)) {
+ printf ("txg_delay::cv_timeoutwait: timeout negative timeout=%d, ticks=%d %lld \n", timeout, ticks, LBOLT );
+ (void) cv_timedwait(&tx->tx_quiesce_more_cv, &tx->tx_sync_lock, timeout - LBOLT);
+ }
mutex_exit(&tx->tx_sync_lock);
}
After two hours of doing 9 parallel file creates ... I got it deadlocked ... well, probably just a long wait ...
txg_delay::cv_timeoutwait: timeout negative timeout=-518109823, ticks=1 -9182683359751551616
txg_delay::cv_timeoutwait: timeout negative timeout=-355729919, ticks=1 -9182562881461551616
txg_delay::cv_timeoutwait: timeout negative timeout=1272384705, ticks=1 -9182430354322551616
txg_delay::cv_timeoutwait: timeout negative timeout=1272384705, ticks=1 -9182308872293551616
txg_delay::cv_timeoutwait: timeout negative timeout=1272384705, ticks=1 -9182180363183551616
txg_delay::cv_timeoutwait: timeout negative timeout=1272384705, ticks=1 -9182051853654551616
Grabbed a core, and did some more poking :
(kgdb) up
#3 0xffffffff804d222b in _cv_timedwait (cvp=0xffffff0029524260,
lock=0xffffff00295241d0, timo=1914894593)
at /usr/src/sys/kern/kern_condvar.c:323
323 rval = sleepq_timedwait(cvp, 0);
(kgdb) up
#4 0xffffffff8106ef7e in txg_delay (dp=0xffffff0029524000, txg=9264, ticks=1)
at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:445
445 (void) cv_timedwait(&tx->tx_quiesce_more_cv, &tx->tx_sync_lock, timeout - LBOLT);
(kgdb) info locals
timeout = Variable "timeout" is not available.
(kgdb)
As you can see, LBOLT is apparently negative - if I'm getting my printf formatting correct - but regardless the timo value to cv_timedwait() is timo=1914894593 ... which is trying to wait for :
1914894593 / 1000hz / 60s / 60m /24h = 22.16 hours
Regardless of setting timeout's type correctly ... why is LBOLT still negative ? It's not clear to me that fixing timeout's type will fix this issue. However, I will do so, and re-run this same test.
My patches earlier show my changes to sys/cddl/compat/opensolaris/sys/time.h but in summary :
37 #define NANOSEC 1000000000
39 typedef longlong_t hrtime_t;
41 #define LBOLT (gethrtime() * (NANOSEC/hz))
And gerhrtime() still remains unchanged from 8.1:
51 #ifdef _KERNEL
52 #define lbolt64 (int64_t)(LBOLT)
53
54 static __inline hrtime_t
55 gethrtime(void) {
56
57 struct timespec ts;
58 hrtime_t nsec;
59
60 #if 1
61 getnanouptime(&ts);
62 #else
63 nanouptime(&ts);
64 #endif
65 nsec = (hrtime_t)ts.tv_sec * NANOSEC + ts.tv_nsec;
66 return (nsec);
67 }
68
69 #define gethrestime_sec() (time_second)
70 #define gethrestime(ts) getnanotime(ts)
---
David P. Discher
dpd at bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com
More information about the freebsd-fs
mailing list