ZFS L2ARC statistics interpretation

Andriy Gapon avg at FreeBSD.org
Fri Aug 21 12:33:24 UTC 2015


On 20/08/2015 10:34, Andriy Gapon wrote:
> On 20/08/2015 03:29, Gary Palmer wrote:
>> On Wed, Aug 19, 2015 at 04:08:47PM -0700, Wim Lewis wrote:
>>> I'm trying to understand some problems we've been having with our ZFS systems, in particular their L2ARC performance. Before I make too many guesses about what's going on, I'm hoping someone can clarify what some of the ZFS statistics actually mean, or point me to documentation if any exists.
>>>
>>> In particular, I'm hoping someone can tell me the interpretation of:
>>>
>>> Errors:
>>>    kstat.zfs.misc.arcstats.l2_cksum_bad
>>>    kstat.zfs.misc.arcstats.l2_io_error
>>>
>>> Other than problems with the underlying disk (or controller or cable or...), are there reasons for these counters to be nonzero? On some of our systems, they increase fairly rapidly (20000/day). Is this considered normal, or does it indicate a problem? If a problem, what should I be looking at?
>>>
>>> Size:
>>>    kstat.zfs.misc.arcstats.l2_size
>>>    kstat.zfs.misc.arcstats.l2_asize
>>>
>>> What does l2_size/l2_asize measure? Compressed or uncompressed size? It sometimes tops out at roughly the size of my L2ARC device, and sometimes just continually grows (e.g., one of my systems has an l2_size of about 1.3T but a 190G L2ARC; I doubt I'm getting nearly 7:1 compression on my dataset! But maybe I am? How can I tell?)
>>>
>>> There are reports over the last few years [1,2,3,4] that suggest that there's a ZFS bug that attempts to use space past the end of the L2ARC, resulting both in l2_size being larger than is possible and also in io_errors and bad cksums (when the nonexistent sectors are read back). But given that this behavior has been reported off and on for several years now, and many of the threads devolve into supposition and folklore, I'm hoping to get an informed answer about what these statistics mean, whether the numbers I'm seeing indicate a problem or not, and be able to make a judgment about whether a given fix in FreeBSD might solve the problem.
>>>
>>> FWIW, I'm seeing these problems on FreeBSD 10.0 and 10.1; I'm not seeing them on 9.2. 
>>>
>>>
>>> [1] https://lists.freebsd.org/pipermail/freebsd-current/2013-October/045088.html
>>> [2] https://forums.freebsd.org/threads/l2arc-degraded.47540/
>>> [3] https://lists.freebsd.org/pipermail/freebsd-fs/2014-October/020256.html
>>> [4] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198242
>>
>>
>> I think the checksum/IO problems, as well as the huge reported size
>> of your L2ARC, are both a result of the problem described at the
>> following URL:
>>
>> https://reviews.freebsd.org/D2764
>>
>> Not sure if a fix is in 10.2 or not yet.
> 
> The fix is not in head yet.
> And the patch needs to be rebased after the recent large imports of the
> upstream code.

An updated patch for head is here:
https://reviews.freebsd.org/D2764?download=true
Testers are welcome!
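Testers may want to compare the counters before and after applying the patch. Below is a rough Python sketch for checking the symptoms discussed in this thread. It is not part of D2764. It assumes that l2_size is the logical (uncompressed) size of the data held in L2ARC and that l2_asize is the space actually allocated on the cache device. Under that assumption, l2_size / l2_asize approximates the compression ratio, and an l2_asize larger than the device is a strong hint that the space accounting is wrong. The device size is not one of these counters, so the device_bytes parameter is a hypothetical input you supply yourself. The counters are read with `sysctl -n`.

```python
import subprocess

# Prefix of the ZFS ARC statistics exported through FreeBSD's sysctl tree.
ARCSTATS = "kstat.zfs.misc.arcstats."


def read_arcstat(name: str) -> int:
    """Read one arcstats counter, e.g. "l2_size", via `sysctl -n` (FreeBSD only)."""
    result = subprocess.run(
        ["sysctl", "-n", ARCSTATS + name],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def compression_ratio(l2_size: int, l2_asize: int) -> float:
    """Approximate the L2ARC compression ratio.

    Assumption: l2_size is the logical (uncompressed) size and l2_asize is
    the space allocated on the cache device.
    """
    if l2_asize <= 0:
        raise ValueError("l2_asize must be positive")
    return l2_size / l2_asize


def exceeds_device(l2_asize: int, device_bytes: int) -> bool:
    """Return True if more space is accounted for than the cache device holds.

    device_bytes is a hypothetical input: the caller supplies the usable
    size of the L2ARC device.
    """
    return l2_asize > device_bytes


def per_day(before: int, after: int, interval_seconds: float) -> float:
    """Scale the growth of a counter (e.g. l2_io_error) to a daily rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (after - before) * 86400 / interval_seconds
```

For example, read l2_io_error or l2_cksum_bad twice, an hour apart, and pass both values with interval_seconds=3600 to per_day() to get a rate comparable to the 20000/day figure quoted above.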


-- 
Andriy Gapon

