11.2-STABLE kernel wired memory leak

Karl Denninger karl at denninger.net
Tue Feb 12 17:12:02 UTC 2019


On 2/12/2019 10:49, Eugene Grosbein wrote:
> 12.02.2019 23:34, Mark Johnston wrote:
>
>> I suspect that the "leaked" memory is simply being used to cache UMA
>> items.  Note that the values in the FREE column of vmstat -z output are
>> quite large.  The cached items are reclaimed only when the page daemon
>> wakes up to reclaim memory; if there are no memory shortages, large
>> amounts of memory may accumulate in UMA caches.  In this case, the sum
>> of the product of columns 2 and 5 gives a total of roughly 4GB cached.
> Forgot to note that before I got the system into single-user mode, there was heavy swap usage (over 3.5GB)
> and heavy page-in/page-out, 10-20 megabytes per second, and the system was crawling slowly due to paging.
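For reference, the "sum of the product of columns 2 and 5" in the quoted
reply is SIZE x FREE from `vmstat -z`, summed over all zones. A minimal
sketch of that arithmetic in Python, assuming the 11.x layout
`ITEM: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP`. The function name is
hypothetical, and the sample lines are made up for illustration; they
are not from Eugene's system.

```python
# Sketch: estimate memory held as free (cached) items in UMA zones by
# summing SIZE * FREE over every zone line of `vmstat -z` output.
# Assumes the "ITEM: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP" layout.

def uma_cached_bytes(vmstat_z_output: str) -> int:
    total = 0
    for line in vmstat_z_output.splitlines():
        if ":" not in line:
            continue  # header and blank lines carry no zone data
        _name, _, rest = line.rpartition(":")
        fields = [f.strip() for f in rest.split(",")]
        try:
            size, free = int(fields[0]), int(fields[3])
        except (IndexError, ValueError):
            continue  # skip anything that does not parse as a zone line
        total += size * free
    return total


# Made-up sample lines, for illustration only.
SAMPLE = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               384,      0,     229,       1,     229,   0,   0
zio_buf_131072:      131072,      0,      10,     200,   50000,   0,   0
"""

print(uma_cached_bytes(SAMPLE))
```

On a live system the input would come from running `vmstat -z`;
dividing the result by 2**30 gives the figure in GiB.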

This is a manifestation of the general issue I've been "discussing" in
a long-running thread on Bugzilla: the interaction between UMA, ARC and
the VM system.

The short version is that the VM system does pathological things,
including paging out working set when a large amount of UMA is
allocated but unused. In addition, the means by which the ARC code is
"told" that it needs to release RAM interacts with the same mechanisms
and exacerbates the problem.

I've basically given up on getting anything effective for this merged
into the base code. I have my own private set of patches, which I
published in that thread for a while (and which saw some collaborative
development and testing). To the extent I need them in a given workload
and environment, I simply apply them myself and go on my way.

I don't have enough experience with 12 yet to know whether the same
approach will be necessary there (that is, which changes, if any, made
it into the 12.x code). I also never ran 11.2 much, choosing to stay on
11.1, where those patches may not have been the most elegant means of
dealing with the problem but were successful. There was also a
Phabricator review on this, but I don't know what part of it, if any,
got merged (it was more elegant, in my opinion, than what I had coded
up). Under certain workloads here, without the patches, I was
experiencing "freezes" caused by unnecessary page-outs onto spinning
rust that in some cases reached double-digit *seconds*. With them, the
issue was entirely resolved.

At the core of the issue, "something" has to be taught two things.
First, if a large amount of UMA is allocated to ARC but not in use,
that RAM should be released *before* the pager starts evicting working
set to swap. Second, if ARC is allocated and in use but the system is
approaching the point where VM will page working set out, there needs
to be some meaningful way of deciding whether to release some of the
ARC rather than take the paging hit -- and in virtually every case the
answer is to release the RAM consumed by ARC. Part of the difficulty is
that UMA can be allocated for things other than ARC, yet in this
situation you really only want to release the ARC-related UMA that is
allocated but unused.
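The ordering argued for above can be written as a small decision
function. This is a hypothetical sketch of the policy only, not of any
FreeBSD kernel interface or of the patches mentioned earlier:
`plan_reclaim`, its byte-count inputs and the step names are invented
for illustration. `arc_min` stands in for a floor like the
`vfs.zfs.arc_min` tunable.

```python
# Hypothetical sketch of the reclaim ordering described above; none of
# these names correspond to real FreeBSD kernel functions or sysctls.
from typing import List, Tuple


def plan_reclaim(shortfall: int, arc_uma_free: int, arc_in_use: int,
                 arc_min: int) -> List[Tuple[str, int]]:
    """Return (action, bytes) steps that cover `shortfall` bytes.

    Order of preference:
      1. free ARC-related UMA items that are allocated but unused,
      2. shrink the in-use ARC, but never below arc_min,
      3. only then page out working set.
    """
    steps: List[Tuple[str, int]] = []
    remaining = shortfall

    # Step 1: cached-but-unused ARC UMA costs no I/O to give back.
    take = min(remaining, arc_uma_free)
    if take > 0:
        steps.append(("drain_arc_uma_cache", take))
        remaining -= take

    # Step 2: shrink live ARC down to its floor.
    take = min(remaining, max(arc_in_use - arc_min, 0))
    if take > 0:
        steps.append(("shrink_arc", take))
        remaining -= take

    # Step 3: whatever is still missing has to come from paging.
    if remaining > 0:
        steps.append(("page_out", remaining))
    return steps
```

For example, `plan_reclaim(300, 100, 1000, 800)` drains 100 bytes of
cached ARC UMA and shrinks the ARC by 200, and never reaches the
page-out step.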

The logic is IMHO pretty simple on this -- a page-out of a process that
will run again always requires TWO disk operations: one to page it out
right now and a second, later, to page it back in. Releasing ARC cache
*MAY* require ONE disk operation (to retrieve the data from disk), and
only if there would have been a cache hit in the future.

Two is always greater than one, and "maybe one later" is never worse
than one. Therefore, choosing two *definite* disk I/Os (or one definite
I/O now and one possible one later) over one *possible* disk I/O later
is always a net loss -- and thus, IMHO, substantial effort should be
made to avoid doing that.

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

More information about the freebsd-stable mailing list