[developer] Re: Potential bug recently introduced in arc_adjust() that leads to unintended pressure on MFU eventually leading to dramatic reduction in its size

Richard Elling richard.elling at richardelling.com
Wed Aug 29 22:03:33 UTC 2018


Thanks for passing this along, Mark.
Comments embedded

> On Aug 29, 2018, at 2:22 PM, Mark Johnston <markj at freebsd.org> wrote:
> 
> On Wed, Aug 29, 2018 at 12:42:33PM +0300, Paul wrote:
>> Hello team,
>> 
>> 
>> It seems that a commit on Mar 23 introduced a bug: if, during execution of arc_adjust(),
>> the target is reached after MRU eviction, the current code still goes on to evict MFU.
>> Before said commit, on the step prior to MFU eviction, the target value was recalculated as:

arc_size is hot, so it was broken up into per-cpu counters and asize is now a snapshot
of the sum of the counters...

>> 
>>  target = arc_size - arc_c;
>> 
>> arc_size here is a global variable that was updated during MRU eviction, hence this
>> expression resulted in a zero or negative target if MRU eviction was enough
>> to reach the original goal.
>> 
>> Modern version uses cached value of arc_size, called asize:
>> 
>>  target = asize - arc_c;
>> 
>> Because asize stays constant during execution of the whole body of arc_adjust(), the
>> above expression will always evaluate to a value > 0, causing MFU to be evicted every
>> time, even if MRU eviction has already reached the goal. Because of the difference in
>> nature between MFU and MRU, over time this shrinks the MFU portion of the ARC
>> dramatically.
> 
> Hi Paul,
> 
> Your analysis does seem right to me.  I cc'ed the openzfs mailing list
> so that an actual ZFS expert can chime in; it looks like this behaviour
> is consistent between FreeBSD, illumos and ZoL.

Agree. In the pre-aggsum code, arc_size would have changed after the MRU adjustment.
Now it does not. I have at least one correlation to this occurring in a repeatable test that
I can run on my ZoL test machine (when it is finished punishing some other code).

> 
> Have you already tried the obvious "fix" of subtracting total_evicted
> from the MFU target?

ameta also needs to be re-aggsummed after the MRU adjustments.
 -- richard
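
A minimal sketch of the behaviour discussed above, in C, using a toy model
rather than the real arc.c code. The names toy_arc, toy_evict, adjust_stale
and adjust_fixed are hypothetical, and the model ignores arc_p, ameta and the
ghost lists. It only shows why reusing the asize snapshot for the MFU target
evicts twice, and how subtracting the bytes already evicted (the "obvious fix"
Mark mentions) avoids that.

```c
#include <stdint.h>

/* Hypothetical stand-in for ARC state; not the real arc.c structures. */
struct toy_arc {
	int64_t size;	/* plays the role of the live arc_size */
	int64_t c;	/* plays the role of arc_c, the target size */
	int64_t mru;	/* bytes held in the MRU list */
	int64_t mfu;	/* bytes held in the MFU list */
};

/* Evict up to `want` bytes from *list; returns the bytes actually evicted. */
static int64_t
toy_evict(struct toy_arc *arc, int64_t *list, int64_t want)
{
	int64_t n;

	if (want <= 0)
		return (0);
	n = (want < *list) ? want : *list;
	*list -= n;
	arc->size -= n;
	return (n);
}

/* Both targets come from one snapshot, as in the behaviour Paul reports. */
static int64_t
adjust_stale(struct toy_arc *arc)
{
	int64_t asize = arc->size;	/* snapshot taken once */
	int64_t total_evicted = 0;

	total_evicted += toy_evict(arc, &arc->mru, asize - arc->c);
	/* asize is unchanged here, so the MFU target is still positive. */
	total_evicted += toy_evict(arc, &arc->mfu, asize - arc->c);
	return (total_evicted);
}

/* Assumed fix: discount what was already evicted from the MFU target. */
static int64_t
adjust_fixed(struct toy_arc *arc)
{
	int64_t asize = arc->size;	/* snapshot taken once */
	int64_t total_evicted = 0;

	total_evicted += toy_evict(arc, &arc->mru, asize - arc->c);
	total_evicted += toy_evict(arc, &arc->mfu,
	    asize - total_evicted - arc->c);
	return (total_evicted);
}
```

With size = 100, c = 80, mru = 60 and mfu = 40, adjust_stale() evicts 20 from
MRU and then another 20 from MFU, leaving size at 60, while adjust_fixed()
stops after the MRU eviction and leaves MFU untouched at 40.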

> 
>> Servers that run the version of FreeBSD prior to the issue have this picture of ARC:
>> 
>>   ARC: 369G Total, 245G MFU, 97G MRU, 36M Anon, 3599M Header, 24G Other
>> 
>> As you can see, MFU dominates. This is the nature of our workload: we have a relatively
>> small dataset that we use constantly and repeatedly, and a large dataset that we use
>> rarely.
>> 
>> But on the modern version of FreeBSD the picture is dramatically different:
>> 
>>   ARC: 360G Total, 50G MFU, 272G MRU, 211M Anon, 7108M Header, 30G Other
>> 
>> This leads to a much heavier burden on the disk sub-system.
>> 
>> 
>> Commit that introduced the bug:
>> https://github.com/freebsd/freebsd/commit/555f9563c9dc217341d4bb5129f5d233cf1f92b8
> 
> ------------------------------------------
> openzfs: openzfs-developer
> Permalink: https://openzfs.topicbox.com/groups/developer/T10a105c53bcce15c-M8152dc2430a5ea4e625ad564
> Delivery options: https://openzfs.topicbox.com/groups/developer/subscription


