[patch] zfs livelock and thread priorities
Ben Kelly
ben at wanderview.com
Sat May 16 16:40:50 UTC 2009
On May 15, 2009, at 11:13 PM, Adam McDougall wrote:
> On Tue, Apr 28, 2009 at 04:52:23PM -0400, Ben Kelly wrote:
> On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
>> My system had eventually deadlocked overnight, though it took much
>> longer than before to reach that point.
>>
>> In the end I've got many many processes sleeping in zio_wait with no
>> disk activity whatsoever.
>> I'm not sure if that's the same issue or not.
>>
>> Here are stack traces for all processes -- http://pastebin.com/f364e1452
>> I've got the core saved, so if you want me to dig out some more info,
>> let me know if/how I could help.
>
> It looks like there is a possible deadlock between zfs_zget() and
> zfs_zinactive(). They both acquire a lock via ZFS_OBJ_HOLD_ENTER().
> The zfs_zinactive() path can get called indirectly from within
> zio_done(). The zfs_zget() can in turn block waiting for zio_done()'s
> completion while holding the object lock.
>
> The following patch might help:
>
> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>
> This simply bails out of the inactive processing if the object lock is
> already held. I'm not sure if this is 100% correct or not as it
> cannot verify there are references to the vnode. I also tried
> executing the zfs_zinactive() logic in a taskqueue to avoid the
> deadlock, but that caused other deadlocks to occur.
>
> Hope that helps.
>
> - Ben
>
> It's my understanding that the deadlock was fixed in -current,
> how does that affect the usefulness of the thread priorities
> patch? Should I continue testing it or is it effectively a
> NOOP now?
As far as I know the vnode release deadlock is unrelated to the thread
prioritization patch.

The particular problem I ran into that caused me to look at the
priorities was a livelock. When the ARC got low on memory, user and
txg threads would sometimes begin signaling each other in a seemingly
infinite pattern while waiting for space to be freed. Unfortunately,
these threads were simultaneously starving the spa_zio threads, which
are the ones that actually flush data to the disks. This effectively
blocked all disk-related activity and would wedge the box once the
syncer process got into the mix as well. This condition doesn't
happen on OpenSolaris because its use of explicit priorities ensures
that the spa_zio threads take precedence over user and txg threads.
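
To make the priority argument concrete, here is a toy model of a
strict-priority pick with round-robin among equals. It is only a sketch
and not the FreeBSD scheduler: the struct, the function name
toy_pick_next(), and the priority numbers are all made up for
illustration. The only convention borrowed from FreeBSD is that a
numerically lower value means a better priority. It does not reproduce
the livelock; it only shows that a thread with a better priority is
chosen whenever it is runnable, while at equal priority it merely takes
its turn.

```c
#include <assert.h>

/* Toy model only. Lower number means better priority. */
struct toy_thread {
	const char *name;	/* label for readability, e.g. "spa_zio" */
	int prio;		/* hypothetical priority value */
	int runnable;		/* nonzero if the thread wants the CPU */
};

/*
 * Return the index of the runnable thread with the best (lowest)
 * priority, or -1 if nothing is runnable. The scan starts just after
 * 'last' (pass -1 on the first call), so threads with equal priority
 * are picked round-robin.
 */
int
toy_pick_next(const struct toy_thread *t, int n, int last)
{
	int best = -1;

	assert(n > 0 && last >= -1 && last < n);
	for (int i = 1; i <= n; i++) {
		int idx = (last + i) % n;

		if (!t[idx].runnable)
			continue;
		/* Strict '<' keeps the first thread found among equals. */
		if (best == -1 || t[idx].prio < t[best].prio)
			best = idx;
	}
	return (best);
}
```

With three runnable threads at the same priority, the flushing thread
only gets every third pick. Give it a better priority and it wins every
pick for as long as it stays runnable.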

Beyond this particular scenario, it seems possible that there are other
priority-related problems lurking. ZFS in OpenSolaris is either
explicitly or implicitly designed around the different threads having
certain relative priorities. While it seems to mostly work without
these priorities, we are definitely opening ourselves up to untested
corner cases by ignoring the prioritization.

The one downside I have noticed to setting ZFS thread priorities
explicitly is a reduction in interactivity during heavy disk load.
This is somewhat to be expected since the spa_zio threads are running
at a higher priority than user threads. This has been an issue on
OpenSolaris as well:

http://bugs.opensolaris.org/view_bug.do?bug_id=6586537

The bug states that a fix is available, but I haven't had a chance to
go back and see what they ended up doing to make things more responsive.

Currently the thread priority patch for FreeBSD is a proof of
concept. If people think it's a valid approach, I can try to clean it
up so that it could be committed. The two main issues with it right
now are:

1) It changes the kproc(9) API by adding a kproc_create_priority()
function that allows you to set the priority of the newly created
thread. I'm not sure how people feel about this.

2) It makes the OpenSolaris thread_create() function take FreeBSD
priority values and sets the constants maxclsyspri and minclsyspri to
somewhat arbitrary values. This means that if someone ports other
OpenSolaris code over and passes priority values to thread_create()
without using these constants, they will get unexpected behavior. This
could be addressed by creating a mapping function from OpenSolaris
priorities to FreeBSD priorities.
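
The mapping function mentioned in (2) could look roughly like the
sketch below. This is not code from the patch. The function name
sol_to_bsd_pri() and the macro names are hypothetical, and the range
values are placeholders that I am assuming for illustration; they would
need to be checked against the real headers on both systems. The only
firm point is the direction: on OpenSolaris a larger number means a
better priority, while on FreeBSD a smaller number does, so the mapping
has to invert as well as rescale.

```c
#include <assert.h>

/* Assumed ranges, for illustration only; verify before real use. */
#define	SOL_MINCLSYSPRI	60	/* assumed worst OpenSolaris system priority */
#define	SOL_MAXCLSYSPRI	99	/* assumed best OpenSolaris system priority */
#define	BSD_PRI_BEST	64	/* assumed best FreeBSD kernel priority */
#define	BSD_PRI_WORST	127	/* assumed worst FreeBSD kernel priority */

/*
 * Map an OpenSolaris-style priority (higher is better) onto a
 * FreeBSD-style priority (lower is better). Out-of-range inputs are
 * clamped to the ends of the assumed OpenSolaris range.
 */
int
sol_to_bsd_pri(int sol_pri)
{
	const int sol_span = SOL_MAXCLSYSPRI - SOL_MINCLSYSPRI;
	const int bsd_span = BSD_PRI_WORST - BSD_PRI_BEST;

	assert(sol_span > 0 && bsd_span > 0);
	if (sol_pri < SOL_MINCLSYSPRI)
		sol_pri = SOL_MINCLSYSPRI;
	if (sol_pri > SOL_MAXCLSYSPRI)
		sol_pri = SOL_MAXCLSYSPRI;

	/* Linear rescale, inverted so that better stays better. */
	return (BSD_PRI_WORST -
	    ((sol_pri - SOL_MINCLSYSPRI) * bsd_span) / sol_span);
}
```

With something like this in place, thread_create() could keep accepting
OpenSolaris values and convert them internally, so ported code that
passes raw numbers would still end up with a sensible relative order.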
> Also, I've been doing some fairly intense testing of zfs in
> recent -current and I am tracking down a situation where
> performance gets worse but I think I found a workaround.
> I am gathering more data regarding the cause, workaround,
> symptoms, and originating commit and will post about it soon.
I'd be interested to hear more about this.
Thanks!
- Ben