FreeBSD 8.2 - active plus inactive memory leak!?
freebsd at damnhippie.dyndns.org
Wed Mar 7 00:48:57 UTC 2012
On Tue, 2012-03-06 at 19:13 +0000, Luke Marsden wrote:
> Hi all,
> I'm having some trouble with some production 8.2-RELEASE servers where
> the 'Active' and 'Inact' memory values reported by top don't seem to
> correspond with the processes which are running on the machine. I have
> two near-identical machines (with slightly different workloads); on one,
> let's call it A, active + inactive is small (6.5G) and on the other (B)
> active + inactive is large (13.6G), even though they have almost
> identical sums-of-resident memory (8.3G on A and 9.3G on B).
> The only difference is that A has a smaller number of quite long-running
> processes (it's hosting a small number of busy sites) and B has a larger
> number of more frequently killed/recycled processes (it's hosting a
> larger number of quiet sites, so the FastCGI processes get killed and
> restarted frequently). Notably B has many more ZFS filesystems mounted
> than A (around 4,000 versus 100). The machines are otherwise under
> similar amounts of load. I hoped that the community could please help
> me understand what's going on with respect to the worryingly large
> amount of active + inactive memory on B.
> Both machines are ZFS-on-root with FreeBSD 8.2-RELEASE with uptimes
> around 5-6 days. I have recently reduced the ARC cache on both machines
> since my previous thread, and Wired memory usage is now stable at 6G
> on A and 7G on B with an arc_max of 4G on both machines.
> Neither of the machines have any swap in use:
> Swap: 10G Total, 10G Free
> My current (probably quite simplistic) understanding of the FreeBSD
> virtual memory system is that, for each process as reported by top:
> * Size corresponds to the total size of all the text pages for the
> process (those belonging to code in the binary itself and in linked
> libraries) plus its data pages (including the stack and memory that
> has been malloc()'d but not yet written to).
> * Resident corresponds to the subset of those pages which actually
> occupy physical/core memory. Notably, a page may be counted in Size
> but not in Resident: for example, a read-only text page from a
> library which has not been touched yet, or memory which has been
> malloc()'d but not yet written to.
> My understanding for the values for the system as a whole (at the top in
> 'top') is as follows:
> * Active and inactive memory are essentially the same thing:
> resident memory belonging to running processes. Being on the
> inactive rather than the active list simply indicates that the
> pages in question were less recently used and are therefore more
> likely to be paged out if the machine comes under memory pressure.
> * Wired is mostly kernel memory.
> * Cache is freed memory which the kernel has decided to keep in
> case it corresponds to a useful page in the future; it can be
> cheaply moved to the free list.
> * Free memory is actually not being used for anything.
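[As an aside: the system-wide figures above are derived from the kernel's page-queue counters, which top reads and aggregates. A minimal sketch of reading them directly, assuming FreeBSD's vm.stats.vm.* sysctl names (the counters are in pages, not bytes):]

```python
# Sketch: the Active/Inact/Wired/Cache/Free figures shown by top come
# from the kernel's page-queue counters, exposed as vm.stats.vm.*
# sysctls on FreeBSD. Counters are page counts, so convert via the
# page size (hw.pagesize).
import subprocess

PAGE_SIZE = 4096  # typical amd64 page size; query hw.pagesize to be sure

def pages_to_mb(pages, page_size=PAGE_SIZE):
    """Convert a page count to megabytes."""
    return pages * page_size / (1024.0 * 1024.0)

def sysctl_pages(name):
    """Read one integer sysctl via sysctl(8) (FreeBSD only)."""
    out = subprocess.check_output(['sysctl', '-n', name])
    return int(out.strip())

def vm_summary_mb():
    """Return the page-queue sizes in MB, roughly matching top's Mem line."""
    counters = ['v_active_count', 'v_inactive_count', 'v_wire_count',
                'v_cache_count', 'v_free_count']
    return {c: pages_to_mb(sysctl_pages('vm.stats.vm.' + c))
            for c in counters}
```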
> It seems that pages which occur in the active + inactive lists must
> occur in the resident memory of one or more processes ("or more" since
> processes can share pages in e.g. read-only shared libs or COW forked
> address space). Conversely, if a page *does not* occur in the resident
> memory of any process, it must not occupy any space in the active +
> inactive lists.
> Therefore the active + inactive memory should always be less than or
> equal to the sum of the resident memory of all the processes on the
> system, right?
> But it's not. So, I wrote a very simple Python script to add up the
> resident memory values in the output from 'top' and, on machine A:
> Mem: 3388M Active, 3209M Inact, 6066M Wired, 196K Cache, 11G Free
> There were 246 processes totalling 8271 MB resident memory
> Whereas on machine B:
> Mem: 11G Active, 2598M Inact, 7177M Wired, 733M Cache, 1619M Free
> There were 441 processes totalling 9297 MB resident memory
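[A sum-of-resident script along those lines might look like the sketch below. This is not Luke's actual script, which isn't shown in the thread; the column position of RES assumes FreeBSD top's default batch-mode (`top -b`) layout.]

```python
# Sketch of a sum-of-resident script like the one described above
# (not the actual script from the thread). It parses batch-mode top
# output, assuming RES is the 7th column as in FreeBSD top's default
# layout.

def parse_size(field):
    """Convert top's human-readable sizes ('196K', '8271M', '11G') to MB."""
    units = {'K': 1.0 / 1024, 'M': 1.0, 'G': 1024.0}
    if field[-1] in units:
        return float(field[:-1]) * units[field[-1]]
    return float(field) / (1024.0 * 1024.0)  # bare byte count

def sum_resident(top_output):
    """Sum the RES column over all process lines; returns (count, MB)."""
    count, total_mb = 0, 0.0
    for line in top_output.splitlines():
        fields = line.split()
        # Process lines begin with a numeric PID; the header, Mem, and
        # Swap lines do not, so they are skipped.
        if len(fields) > 6 and fields[0].isdigit():
            total_mb += parse_size(fields[6])
            count += 1
    return count, total_mb
```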
> Now, on machine A:
> 3388M active + 3209M inactive - 8271M sum-of-resident = -1674M
> I can attribute this negative value to shared libraries between the
> running processes (which the sum-of-res is double-counting but active +
> inactive is not). But on machine B:
> 11264M active + 2598M inactive - 9297M sum-of-resident = 4565M
> I'm struggling to explain how, when there are only 9.2G (worst case,
> discounting shared pages) of resident processes, the system is using 11G
> + 2598M = 13.8G of memory!
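[Restating the arithmetic above as a quick check, values in MB taken from the top output:]

```python
# Re-check of the arithmetic above: (active + inactive) minus the
# summed resident sizes, in MB, using the figures reported by top.
machines = {
    'A': {'active': 3388, 'inact': 3209, 'sum_res': 8271},
    'B': {'active': 11 * 1024, 'inact': 2598, 'sum_res': 9297},
}

def gap_mb(m):
    """Positive gap: memory on the active+inactive lists not attributable
    to any process's resident set; negative gap: shared pages counted
    more than once in the sum-of-resident."""
    return m['active'] + m['inact'] - m['sum_res']

# A: 3388 + 3209 - 8271 = -1674 (explained by shared pages)
# B: 11264 + 2598 - 9297 = 4565 (the unexplained "missing memory")
```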
> This "missing memory" is scary, because it seems to be increasing over
> time, and eventually when the system runs out of free memory, I'm
> certain it will crash in the same way described in my previous thread
> Is my understanding of the virtual memory system badly broken - in which
> case please educate me ;-) or is there a real problem here? If so how
> can I dig deeper to help uncover/fix it?
> Best Regards,
> Luke Marsden
>  lists.freebsd.org/pipermail/freebsd-fs/2012-February/013775.html
>  https://gist.github.com/1988153
In my experience, the bulk of the memory in the inactive category is
cached disk blocks, at least for ufs (I think zfs does things
differently). On this desktop machine I have 12G physical and typically
have roughly 11G inactive, and I can unmount one particular filesystem
where most of my work is done and instantly I have almost no inactive
and roughly 11G free.
More information about the freebsd-stable mailing list