[patch] zfs livelock and thread priorities

Artem Belevich fbsdlist at src.cx
Tue Apr 14 00:03:31 UTC 2009


I tried your patch that uses PRIBIO+{1,2} for the priorities on -current
r191008, and the kernel died with a "spinlock held too long" panic.
Apparently there were two instances of the panic, on different
cores.

Here's the output of "alltrace" and "ps" after the crash:
http://pastebin.com/f140f4596

I've reverted the change, and the kernel booted just fine.

The box is quad-core with two ZFS pools: one single-disk and the other
a two-disk mirror. FreeBSD is installed on UFS partitions; ZFS is used
for user data only.

--Artem


On Thu, Mar 19, 2009 at 5:19 PM, Ben Kelly <ben at wanderview.com> wrote:
> On Mar 19, 2009, at 7:06 PM, Adam McDougall wrote:
>>
>> I was really impressed with your diagnosis but didn't try your patch until
>> this afternoon.  I had not seen processes spin, but I have had zfs get stuck
>> roughly every 2 days on a somewhat busy ftp/rsync server until I turned off
>> zil again, then it was up for over 13 days when I decided to try this patch.
>>  This system boots from a ufs / and turns around to try mounting a zfs root
>> over top, but the first time it stalled for a few minutes at the root mount
>> and "gave up" with a spinlock held too long, second time same thing but I
>> didn't wait long enough for the spinlock error. Then I tried a power cycle
>> just because, and the next two tries I got a page fault kernel panic.  I'd
>> try to give more details but right now I'm trying to get the server back up
>> with a livecd because I goofed and don't have an old kernel to fall back on.
>>  Just wanted to let you know, and thanks for getting as far as you did!
>
> Ouch!  Sorry you ran into that.
>
> I haven't seen these problems, but I keep my root partition on UFS and only
> use zfs for /usr, /var, etc.  Perhaps that explains the difference in
> behavior.
>
> You could try changing the patch to use lower priorities.  To do this change
> compat/opensolaris/sys/proc.h so that it reads:
>
>  #define        minclsyspri     PRI_MAX_REALTIME
>  #define        maxclsyspri    (PRI_MAX_REALTIME - 4)
>
> This compiles and runs on my machine.  The theory here is that other kernel
> threads will be able to run as they used to, but the zfs threads will still
> be fixed relative to one another.  It's really just a stab in the dark,
> though.  I don't have any experience with the "zfs mounted on top of ufs
> root" configuration.  If this works we should try to see if we can replace
> PRI_MAX_REALTIME with PRI_MAX_KERN so that the zfs kernel threads run in the
> kernel priority range.
>
> If you could get a stack trace of the kernel panic that would be helpful.
>  Also, if you have console access, can you break to debugger during the boot
> spinlock hang and get a backtrace of the blocked process?
>
> If you want to compare other aspects of your environment to mine I uploaded
> a bunch of info here:
>
>  http://www.wanderview.com/svn/public/misc/zfs_livelock
>
> Finally, I'm CC'ing the list and some other people so they are aware that
> the patch runs the risk of a panic.
>
> I hope that helps.
>
> - Ben
> _______________________________________________
> freebsd-current at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"
>

