[patch] zfs livelock and thread priorities

Ben Kelly ben at wanderview.com
Tue May 19 11:14:26 UTC 2009

On May 19, 2009, at 5:40 AM, Attilio Rao wrote:
> 2009/5/19 Ben Kelly <ben at wanderview.com>:
>> On May 18, 2009, at 1:38 PM, Attilio Rao wrote:
>>> OMG.
>>> This still doesn't explain priorities like 49 or such seen in the
>>> first report as long as we don't set priorities by hand,
>> I'm trying to understand why this particular priority value is so
>> concerning, but I'm a little bit confused.  Can you elaborate on  
>> why you
>> think it's a problem?  From previous off-list e-mails I get the  
>> impression
>> that you are concerned that it does not fall on an RQ_PPQ  
>> boundary.  Is this
>> the case?  Again, I may be completely confused, but ULE does not  
>> seem to
>> consider RQ_PPQ when it assigns priorities for interactive  
>> threads.  Here is
>> how I came to this conclusion:
> I'm concerned because the first starvation I saw in this thread was
> caused by the priority being lowered inappropriately (it was 49 vs. 45
> IIRC). 49 means that the thread will never be chosen while the 45s are
> still in the runqueue. I'm not concerned about RQ_PPQ boundaries.

Ah, ok.  Sorry for my confusion.

I guess the condition seemed somewhat reasonable to me because the  
behavior of the 45s probably looks very interactive to the scheduler.   
The user threads wake up, see that there is no space in the arc,  
signal the txg threads, then sleep.  The txg threads then wake up, see  
that the spa_zio threads are not done, signal all the user threads,  
then sleep.  They bounce back and forth like this very quickly while  
waiting for data to be flushed to the disk.  (On my system this can  
take a while since my backup pool is on a set of encrypted external  
USB drives.)  It seems likely that their runtime and sleeptime values  
are balanced such that the scheduler marks them as high-priority  
interactive threads.

So to me the interprocess communication within zfs appears to be  
somewhat brain damaged in low memory conditions, but I do not think it  
points to a problem in the scheduler.  No matter what algorithm the  
scheduler uses to determine interactivity, an application will be able  
to devise a perverse workload that gets misclassified.

Anyway, that was my rough guesstimate of what was happening.  If you  
have time to do a more thorough analysis of the KTR dump that would be  
great.  Thanks again for your help!

- Ben

More information about the freebsd-current mailing list