[SOLVED] Re: Strange behavior after running under high load

Sun Apr 4 20:01:46 UTC 2021

--------
Konstantin Belousov writes:

> > B) We lack a nuanced call-back to tell the subsystems to release some of their memory "without major delay".

> The delay in the wall clock sense does not drive the issue.

I didnt say anything about "wall clock" and you're missing my point by a wide margin.

We need to make major memory consumers, like vnodes take action *before* shortages happen, so that *when* they happen, a lot of memory can be released to relive them.

> We cannot expect any io to proceed while we are low on memory [...]

Which is precisely why the top level goal should be for that to never happen, while still allowing the freeable" memory to be used as a cache as much as possible.

> > C) We have never attempted to enlist userland, where jemalloc often hang on to a lot of unused VM pages.
> > 
> The userland does not add to this problem, [...]

No, but userland can help solve it:  The unused pages from jemalloc/userland can very quickly be released to relieve any imminent shortage the kernel might have.

As can pages from vnodes, and for that matter socket buffers.

But there are always costs, actual costs, ie: what it will take to release the memory (locking, VM mappings, washing) and potential costs (lack of future caching opportunities).

These costs need to be presented to the central memory allocator, so when it decides back-pressure is appropriate, it can decide who to punk for how much memory.

> But normally operating system does not have an issue with user pages.  

Only if you disregard all non-UNIX operating systems.

Many other kernels have cooperated with userland to balance memory (and for that matter disk-space).

Just imagine how much better the desktop experience would be, if we could send SIGVM to firefox to tell it stop being a memory-pig.

(At least two of the major operating systems in the desktop world does something like that today.)

> Io latency is not the factor there. We must avoid situations where
> instantiating a vnode stalls waiting for KVA to appear, similarly we
> must avoid system state where vnodes allocation consumed so much kmem
> that other allocations stall.

My argument is the precise opposite:  We must make vnodes and the allocations they cause responsive to the sytems overall memory availability, well in advance of the shortage happening in the first place.

> Quite indicative is that we do not shrink the vnode list on low memory
> events.  Vnlru also does not account for the memory pressure.

The only reason we do not, is that we cannot tell definitively if freeing a vnode will cause disk-I/O (which may not matter with SSD's) or even how much memory it might free, if anything.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.