Possible ZFS livelock or SCHED_ULE bug ?

Ben Kelly ben at wanderview.com
Wed Dec 16 16:41:07 UTC 2009


On Dec 16, 2009, at 11:04 AM, Arnaud Houdelette wrote:

> Hi all !
> I have a uniprocessor AMD64 box with 512 MB of RAM and 2 ZFS pools, used as a home NAS.
> 
> I have had some I/O issues since I moved from 7.2 to 8.0.
> With a GENERIC kernel (or a stripped-down one), during high I/O activity (such as a make buildworld can cause), I encounter random hangs or deadlocks.
> top shows system CPU usage at 99%, the most CPU-hungry process being [zfskern] ( {txg_thread_enter} if I switch to thread view).
> The box still responds to ping. Current processes can still run, but I can't start new ones.
> Sometimes I can return to normal by Ctrl-C-ing the buildworld (or other operation); sometimes I can't, and I have to reboot the box.
> 
> The issue seems to have become less frequent with 8.0-STABLE than with 8.0-RELEASE, but it is still present (I get approximately a 75% chance of a hang with a buildworld).
> I get the hang with prefetch enabled or disabled. The same goes for the ZIL.
> 
> I tried to enable kernel dumps, but the box hangs while saving the dump (root is on ZFS) or when starting kgdb on it.
> I recompiled the kernel with SCHED_4BSD, and it seems I can't reproduce the hang.
> 
> What do you think ?
> Did I misconfigure something ?

This sounds similar to something I ran into on CURRENT last year:

  http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832196+0+archive/2009/freebsd-current/20090322.freebsd-current

The immediate problem was a priority inversion between the txg_thread_enter threads and the spa_zio threads.  This should be solved (or at least mitigated) on 8.0 now that these threads have explicit priorities set.  Can you check what priorities these threads are running at on your machine?  They should be something like -8 for txg_thread_enter and -16 for spa_zio.
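
For anyone who wants to script that check rather than read the values off the thread view in top, here is a minimal Python sketch.  It assumes a simplified, made-up listing layout (a header row containing TDNAME and PRI, whitespace-separated fields, no spaces inside names); the real output of top or procstat has different columns and would need adapting.  The function name, the SAMPLE text and its PID/TID values are all hypothetical; only the thread names and the -8 / -16 figures come from the message above.

```python
def thread_priorities(listing, names):
    """Return (thread name, priority) pairs for threads whose name starts
    with one of `names`, read from a whitespace-separated listing."""
    lines = [ln.split() for ln in listing.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Locate the columns by header label instead of hard-coding positions.
    name_col = header.index("TDNAME")
    pri_col = header.index("PRI")
    found = []
    for row in rows:
        if len(row) != len(header):
            continue  # skip rows whose fields do not line up with the header
        if any(row[name_col].startswith(n) for n in names):
            found.append((row[name_col], int(row[pri_col])))
    return found


# Hypothetical sample data in the assumed layout (not captured from a real box).
SAMPLE = """\
PID TID COMM TDNAME PRI
  7 100050 zfskern txg_thread_enter -8
  7 100051 zfskern txg_thread_enter -8
 10 100060 spa_zio spa_zio_0 -16
 11 100070 idle idle 171
"""

if __name__ == "__main__":
    for name, pri in thread_priorities(SAMPLE, ["txg_thread_enter", "spa_zio"]):
        print(name, pri)
```

If the txg_thread_enter and spa_zio threads all report the same priority, or sit at ordinary timeshare priorities, that would point back at the inversion described above.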

- Ben





More information about the freebsd-stable mailing list