Still getting kmem exhausted panic

Ben Kelly ben at wanderview.com
Tue Sep 28 18:40:07 UTC 2010


On Sep 28, 2010, at 1:17 PM, Andriy Gapon wrote:

> on 28/09/2010 19:46 Ben Kelly said the following:
>> Hmm.  My server is currently idle with no I/O happening:
>> 
>>  kstat.zfs.misc.arcstats.c: 25165824
>>  kstat.zfs.misc.arcstats.c_max: 46137344
>>  kstat.zfs.misc.arcstats.size: 91863156
>> 
>> If what you say is true, this shouldn't happen, should it?  This system is an i386 machine with kmem max at 800M and arc set to 40M.  This is running head from April 6, 2010, so it is a bit old, though.
> 
> Well, your system is a bit old indeed.
> And the branch is unknown, so I can't really see what sources you have.
> And I am not sure if I'll be able to say anything about those sources.

Quite old.  I've been intending to update, but haven't found the time lately.  I'll try to do the upgrade this weekend and see if it changes anything.

> As to the numbers - yes, with current code I'd expect arcstats.size to go down to
> arcstats.c when there is no I/O.  arc_reclaim_thread should do that.

That's what I thought as well, but when I debugged it a year or two ago I found that the buffers were still referenced and thus could not be reclaimed.  As far as I can remember, a vfs/vnops call such as zfs_vnops_inactive or zfs_vnops_reclaim had to be executed in order to free the reference.  What is responsible for making those calls?
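
To put numbers on the gap being discussed, here is a minimal sketch that parses the three arcstats values quoted earlier in this thread and reports how far arcstats.size sits above arcstats.c and arcstats.c_max.  The helper names parse_arcstats and arc_overshoot are made up for illustration and are not part of any FreeBSD tool; the input is just the text pasted above.

```python
# Illustrative sketch only: the values are copied from the sysctl output
# quoted earlier in this message, not read from a live system.
SYSCTL_OUTPUT = """\
kstat.zfs.misc.arcstats.c: 25165824
kstat.zfs.misc.arcstats.c_max: 46137344
kstat.zfs.misc.arcstats.size: 91863156
"""

MIB = 1024 * 1024


def parse_arcstats(text):
    """Parse 'name: value' lines into a dict keyed by the last name component."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not a 'name: value' line
        short_name = name.strip().rsplit(".", 1)[-1]
        stats[short_name] = int(value)
    return stats


def arc_overshoot(stats):
    """Bytes by which size exceeds the target c and the cap c_max.

    A negative value means size is below that limit.
    """
    return {
        "over_c": stats["size"] - stats["c"],
        "over_c_max": stats["size"] - stats["c_max"],
    }


stats = parse_arcstats(SYSCTL_OUTPUT)
over = arc_overshoot(stats)
for key in ("c", "c_max", "size"):
    print(f"{key:>6}: {stats[key] / MIB:8.1f} MiB")
for key, value in over.items():
    print(f"{key:>10}: {value / MIB:8.1f} MiB")
```

On a live system the same text could presumably be taken from the output of `sysctl kstat.zfs.misc.arcstats` instead of a pasted string.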

> 
>> At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max.  Or at least I thought it did.
>> 
>> In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs.  So I'm more than willing to believe I'm confused.  Any insights are greatly appreciated.
> 
> ARC is a ZFS private cache.
> ZFS doesn't use unified buffer/page cache.
> So ARC is not directly affected by pagedaemon.
> But this is not exactly a VFS layer thing.

Can you explain the difference in how the vfs/vnode operations are called or used for those two situations?

I thought that the buffer cache was used by filesystems to implement these operations, so that the buffer cache sat below the vfs/vnops layer.  While zfs implemented its operations in terms of the arc, filesystems like UFS implemented vfs/vnops in terms of the buffer cache.  I thought the layers further up the chain, like the page daemon, did not distinguish much between these two implementations because of the VFS interface layer.  (Although there seems to be a layering violation in that the buffer cache signals directly to the upper page daemon layer to trigger page reclamation.)

The old (ancient) patch I tried previously to help reduce the arc working set and allow it to shrink is here:

  http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff

Unfortunately, there are a couple of ideas on fighting fragmentation mixed into that patch; see the part about arc_reclaim_pages().  This patch did seem to allow my arc to stay under the target maximum even under load that previously caused the system to exceed it.  When I update this weekend I'll try a stripped-down version of the patch to see whether it helps with the latest zfs.

Thanks for your help in understanding this stuff!

- Ben

