ZFS: arc_reclaim_thread running 100%, 8.1-RELEASE, LBOLT related

Artem Belevich art at freebsd.org
Tue May 31 16:48:27 UTC 2011


On Tue, May 31, 2011 at 8:24 AM, Andriy Gapon <avg at freebsd.org> wrote:
>>>         65         nsec = (hrtime_t)ts.tv_sec * NANOSEC + ts.tv_nsec;
>>
>> Yup. This would indeed overflow in ~106.75 days.
>
> Have you referred to the LBOLT above?
> gethrtime() should have several hundred years before overflowing.

hrtime_t is 64-bit. NANOSEC=1000000000.

When it's time to use LBOLT, we further multiply that nanosecond count by hz:
>>         41 #define LBOLT   ((gethrtime() * hz) / NANOSEC)

In the end we want a signed 64-bit scalar to hold the number of seconds
times 1e12 (NANOSEC * hz, with hz=1000).
0x7fffffffffffffff/1000000000000 = 9223372  # number of seconds before
signed overflow
9223372/(24*60*60) --> 106   # .. or about 106 days
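
The same arithmetic as a small C sketch, in case anyone wants to check it
for other hz values. The function names are made up for illustration and
are not part of the ZFS or opensolaris compat code.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC 1000000000LL

/*
 * Seconds of uptime before the intermediate product gethrtime() * hz
 * stops fitting in a signed 64-bit integer. That product equals
 * seconds * NANOSEC * hz, so the limit is INT64_MAX / (NANOSEC * hz).
 */
static int64_t
lbolt_overflow_seconds(int64_t hz)
{
	/* Guard against NANOSEC * hz itself overflowing. */
	assert(hz > 0 && hz <= INT64_MAX / NANOSEC);
	return (INT64_MAX / (NANOSEC * hz));
}

/* The same limit expressed in whole days. */
static int64_t
lbolt_overflow_days(int64_t hz)
{
	return (lbolt_overflow_seconds(hz) / (24 * 60 * 60));
}
```

With hz=1000 this gives 9223372 seconds, or 106 whole days. With hz=100
the limit is ten times further out.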

>> The side effect is that it limits bolt resolution to hz units. With
>> HZ=100, that will be 10ms. Whether it's good enough or too coarse I
>
> Nope, I think you did your math wrong here.
> As shown above it limits the resolution to hz ticks, i.e. hz * 1/hz seconds :)

Point taken.

>
>> have no idea. Perhaps we can compromise and update lbolt in
>> microseconds. That should give us a few hundred years until the
>> overflow.
>
> Well, we can either use the ticks variable, since we are not switching to tickless
> mode yet.  But we would need to make it 64-bit to avoid early overflows.
> Or, perhaps, to be somewhat future-friendly we could do approximately what
> OpenSolaris [upstream :-)] code does:
>
> gethrtime() / (NANOSEC / hz)
>
> Or, given that NANOSEC is constant and hz is invariant, we could apply extended
> invariant division by multiplication approach to get precise and fast result
> without overflowing.  But likely that's overkill here.  Though we definitely
> should pre-calculate, store and re-use (NANOSEC / hz) value just like OpenSolaris
> does it.

This should work.
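
A rough sketch of the divide-by-precomputed-constant form, just to make
the idea concrete. This is not the actual patch: lbolt_init(),
lbolt_from_hrtime() and nsec_per_tick are hypothetical names, and the
hrtime value is passed in as a parameter instead of calling gethrtime()
so the snippet stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC 1000000000LL

typedef int64_t hrtime_t;

/* NANOSEC / hz, computed once because hz never changes after boot. */
static hrtime_t nsec_per_tick;

static void
lbolt_init(int hz)
{
	assert(hz > 0 && hz <= NANOSEC);
	nsec_per_tick = NANOSEC / hz;
}

/*
 * Convert a nanosecond timestamp to ticks by dividing first.
 * Unlike (now * hz) / NANOSEC there is no intermediate product,
 * so the result is valid for any non-negative 64-bit timestamp.
 */
static int64_t
lbolt_from_hrtime(hrtime_t now)
{
	return (now / nsec_per_tick);
}
```

One caveat: if hz does not divide NANOSEC evenly, NANOSEC / hz is
truncated and the tick count drifts slightly from the exact value. For
hz=100 or hz=1000 the division is exact.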

>
>>> clock_t will still need to be typedef'ed to 64-bit to address the l2arc usage of LBOLT.
>
> Hm, I see that OpenSolaris defines clock_t to long, while we use int32_t.
> So, I think that it means two things:
> - i386 OpenSolaris (if it exists) should be affected by the problem as well
> - we should not use our native clock_t definition for ported OpenSolaris code
>
> Maybe we should fix our clock_t to be something wider at least for 64-bit
> platforms.  But I am not prepared to discuss this.

>
> To summarize:
> 1. We must fix ddi_get_lbolt*() to avoid unnecessary overflow in multiplication

Agreed.

> 2. We could fix 31-bit clock_t overflow by using Solaris-compatible definition of
> it (long), but that still won't help on i386.  Maybe we should bring up this issue
> to the attention of upstream ZFS developers.  Universally using ddi_get_lbolt64()
> seems like a safer bet.

FYI, we've already changed clock_t for opensolaris code to int64_t in
r218169 regardless of $MACHINE.
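
For a sense of scale on the clock_t width, here is a back-of-the-envelope
helper. The name days_until_wrap() is made up for this example; it only
does the division and does not touch any kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Whole days until a signed tick counter with the given maximum
 * value overflows when it advances hz times per second.
 */
static int64_t
days_until_wrap(int64_t counter_max, int64_t hz)
{
	assert(hz > 0);
	return (counter_max / hz / (24 * 60 * 60));
}
```

days_until_wrap(INT32_MAX, 1000) comes out to 24 days and
days_until_wrap(INT32_MAX, 100) to 248 days, while the int64_t version
lasts longer than anyone will care about.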

--Artem

