ZFS: arc_reclaim_thread running 100%, 8.1-RELEASE, LBOLT related
Artem Belevich
art at freebsd.org
Tue May 31 19:21:34 UTC 2011
On Tue, May 31, 2011 at 12:07 PM, David P Discher <dpd at bitgravity.com> wrote:
>
> On May 31, 2011, at 9:56 AM, Andriy Gapon wrote:
>
>>>
>>> FYI, we've already changed clock_t for opensolaris code to int64_t in
>>> r218169 regardless of $MACHINE.
>>
>> I think that's a good solution.
>> I hope that we don't have any place where osol clock_t would somehow mix with fbsd
>> clock_t with detrimental effects.
>
> Well, clock_t to int64_t only fixes l2arc_reclaim_thread.
>
> If you look at lines 1712-1725 in sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, in arc_evict() (8.1-RELEASE), you'll see that clock_t is not used. So, yes, clock_t will fix the l2arc issue, but not the arc_reclaim/arc_evict issue of the signed 64-bit int overflowing for LBOLT.
No disagreement there. The way LBOLT is implemented now would indeed
cause the overflow that leads to the CPU hogging you've reported.
> And from the conversation on this thread, there isn't any agreement on how to really fix it.
Andriy has proposed a potential fix that would eliminate the
multiplication of the gethrtime() result by hz and would delay the
overflow by a few hundred years. Should be good enough.
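
To put rough numbers on that, here is a minimal sketch of the arithmetic. The helper names are hypothetical, and lbolt_from_ns() is only one way such a rearrangement might look; the exact form of Andriy's proposal is not quoted in this message. It assumes hz divides NANOSEC evenly.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC 1000000000LL	/* nanoseconds per second, as in the LBOLT macro */

/*
 * Uptime, in whole days, after which (gethrtime() * hz) no longer fits in
 * a signed 64-bit integer. Computed by division, so nothing overflows here.
 * Hypothetical helper, for illustration only.
 */
static int64_t
days_until_mul_overflow(int64_t hz)
{
	assert(hz > 0);
	int64_t max_ns = INT64_MAX / hz;	/* largest safe gethrtime() value */
	return (max_ns / NANOSEC / 86400);
}

/*
 * Hypothetical rearrangement: divide by the tick length in nanoseconds
 * instead of multiplying by hz first. Assumes hz divides NANOSEC evenly.
 */
static int64_t
lbolt_from_ns(int64_t ns, int64_t hz)
{
	assert(hz > 0 && NANOSEC % hz == 0);
	return (ns / (NANOSEC / hz));
}
```

With hz = 1000 (an assumed value), days_until_mul_overflow(1000) gives 106 days, while the divide-first form only runs out when the nanosecond counter itself does, after 106751 days, or roughly 292 years.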
> Someone please explain to me the units of LBOLT?
Ticks, counted at hz ticks per second.
>
> hz is cycles per second right ?
>
> #define LBOLT ((gethrtime() * hz) / NANOSEC)
>
>
> gethrtime() is returning nanoseconds. How is LBOLT in ticks-per-second ?
gethrtime() is indeed in nanoseconds. After division by NANOSEC it
becomes good old seconds. After multiplication by hz (which is
ticks/sec), the whole thing represents just 'ticks'.
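
A worked example of that unit conversion, as a sketch only: ns_to_ticks() is a hypothetical stand-in that takes the time and hz as parameters instead of calling gethrtime(), but uses the same arithmetic as the macro.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC 1000000000LL	/* nanoseconds per second */

/*
 * ns [nanoseconds] * hz [ticks/second] / NANOSEC [nanoseconds/second]
 * leaves plain ticks. The assert rejects inputs that would overflow
 * the intermediate product.
 */
static int64_t
ns_to_ticks(int64_t ns, int64_t hz)
{
	assert(hz > 0 && ns >= 0 && ns <= INT64_MAX / hz);
	return ((ns * hz) / NANOSEC);
}
```

For example, 2.5 seconds of uptime (2500000000 ns) at an assumed hz of 1000 comes out as 2500 ticks.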
>
> In sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c :
>
> 152 /* number of seconds before growing cache again */
> 153 static int arc_grow_retry = 60;
That would be in seconds.
> ...
> 2255 /* reset the growth delay for every reclaim */
> 2256 growtime = LBOLT + (arc_grow_retry * hz);
(arc_grow_retry in seconds * hz in ticks/sec) --> ticks
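
Since both terms are in ticks, the sum is a plain deadline in ticks. A small sketch, with the current tick count passed in as a parameter; both helper names are hypothetical and the comparison is only an illustration of how such a deadline would be checked:

```c
#include <assert.h>
#include <stdint.h>

/* ticks now + (seconds * ticks/second) = ticks at which growth may resume */
static int64_t
grow_deadline(int64_t now_ticks, int grow_retry_sec, int hz)
{
	assert(hz > 0 && grow_retry_sec >= 0);
	return (now_ticks + (int64_t)grow_retry_sec * hz);
}

/* nonzero once the current tick count has reached the deadline */
static int
may_grow(int64_t now_ticks, int64_t growtime)
{
	return (now_ticks >= growtime);
}
```

With an assumed hz of 1000 and the tick counter at 5000, a 60-second retry gives a deadline of 65000 ticks.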
>
>
> LBOLT is then clearly using its math with ticks-per-second. Am I just crazy? It seems to me that in this case, line 2256 ... can't be added together.
LBOLT is in 'ticks', not ticks/sec.
--Artem