High mbuf count leading to processes getting killed for "lack of swap space"

From: Fabian Keil <freebsd-listen_at_fabiankeil.de>
Date: Tue, 14 Dec 2021 15:28:36 UTC

A while ago one of my physical systems running ElectroBSD based on
FreeBSD stable/11 occasionally became unresponsive to network
input and had to be power-cycled to get into a usable state again.

It's conceivable that console access would still have worked,
but I didn't have console access to the system (and still don't).

The problem was always preceded by
"[zone: mbuf_cluster] kern.ipc.nmbclusters limit reached"
messages and only occurred when the system was busy reproducing
ElectroBSD or building ports with poudriere while additionally
running the normal workload, which includes serving web pages
with nginx and relaying Tor traffic.

Some additional log messages and munin graphs from March are
available at:
<https://www.fabiankeil.de/blog-surrogat/2021/03/14/website-ausfall-durch-mbuf-cluster-limit.html>

As a first step to diagnose the problem I added munin plugins
for the mbuf state but unfortunately munin needs the network
to work and thus is unreliable when mbufs are scarce ...

While I'm generally not a big fan of cargo-cult administration,
I later set kern.ipc.nmbclusters=1000000, while the auto-tuned
value was "only" 247178, which is more than enough for normal
operation.
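
To put the raised limit in perspective, the worst-case memory behind a
zone limit can be estimated by multiplying the limit by the object size.
The following is only a minimal sketch: it assumes the FreeBSD defaults
of 2048 bytes per mbuf cluster (MCLBYTES) and 256 bytes per mbuf (MSIZE),
and the helper name zone_mib is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: upper bound in MiB for "count" objects of "size" bytes.
# Assumed object sizes (not taken from the affected system):
#   2048 bytes per mbuf cluster, 256 bytes per mbuf.
zone_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

zone_mib 247178 2048     # auto-tuned cluster limit
zone_mib 1000000 2048    # manually raised cluster limit
```

Under these assumptions the auto-tuned limit corresponds to about 482 MiB
of clusters and the raised limit to about 1953 MiB.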

This prevented the system, which is now running ElectroBSD amd64
based on FreeBSD stable/12, from becoming completely unresponsive
under load, but unfortunately it also results in important processes
getting killed with messages like:

2021-12-14T06:24:40.267263+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 63731 (tor), jid 33, uid 256, was killed: out of swap space
2021-12-14T06:24:41.000777+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 91800 (tor), jid 37, uid 256, was killed: out of swap space
2021-12-14T06:24:41.000862+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 81324 (c++), jid 56, uid 1001, was killed: out of swap space
2021-12-14T06:24:41.000903+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 17233 to 200 packets/sec
2021-12-14T06:24:41.000917+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 19764 (c++), jid 56, uid 1001, was killed: out of swap space
2021-12-14T06:24:41.000954+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 1635 to 200 packets/sec
2021-12-14T06:24:41.000967+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 1192 to 200 packets/sec
2021-12-14T06:24:41.000980+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 974 (xz), jid 0, uid 0, was killed: out of swap space
2021-12-14T06:24:41.001016+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 441 to 200 packets/sec
2021-12-14T06:24:41.001029+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 10872 (perl), jid 0, uid 842, was killed: out of swap space
2021-12-14T06:24:41.001065+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 1052 to 200 packets/sec
2021-12-14T06:24:41.001078+01:00 elektrobier.fabiankeil.de kernel <3>1 - - - pid 62569 (tor), jid 35, uid 256, was killed: out of swap space
2021-12-14T06:24:41.001114+01:00 elektrobier.fabiankeil.de kernel <5>1 - - - Limiting closed port RST response from 269 to 200 packets/sec

My impression is that the system isn't actually out of swap space.

The system has 4 GB of RAM and I temporarily increased the swap space
from 8 GB to 16 GB which didn't make a difference.

As far as munin is concerned, the swap space isn't full during the
periods when munin is working:
<https://www.fabiankeil.de/bilder/munin/mbuf-issues-2021-12-14/>

As munin isn't reliable under load, I additionally let the system
dump sysctls periodically and it looks like mbuf usage goes up to
over 800000:

[fk@elektrobier /var/log/sysctl-dumps]$ grep "mbufs in use" sysctl-dump-2021-12-14_0[56]\:*
[...]
sysctl-dump-2021-12-14_06:11:53.txt:829476/729/830205 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:12:57.txt:831954/831/832785 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:14:02.txt:834506/4/834510 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:15:11.txt:837446/814/838260 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:16:19.txt:840177/948/841125 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:17:26.txt:842652/603/843255 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:22:31.txt:657/1293/1950 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:25:40.txt:528/1422/1950 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:26:40.txt:518/1432/1950 mbufs in use (current/cache/total)
sysctl-dump-2021-12-14_06:27:40.txt:517/1433/1950 mbufs in use (current/cache/total)

The sysctls were supposed to be dumped once per minute but
apparently the system can't be trusted to do this under
pressure either ...
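
Since the dumps arrive at irregular intervals, it can help to look at
the change between consecutive samples rather than at absolute values.
The following is only a minimal sketch that assumes the grep output
format shown above; the function name mbuf_growth is made up for this
example. It only processes text, so it can be run on any machine that
has a copy of the dumps.

```sh
#!/bin/sh
# Hypothetical helper: reads the output of grep "mbufs in use" on stdin
# and prints "<time> <mbufs currently in use> <change since previous sample>".
# Possible usage, in the directory holding the dumps:
#   grep "mbufs in use" sysctl-dump-2021-12-14_0[56]\:* | mbuf_growth
mbuf_growth() {
    awk '
    /mbufs in use/ {
        # $1 looks like "sysctl-dump-2021-12-14_06:11:53.txt:829476/729/830205"
        sample = $1
        sub(/^.*_/, "", sample)           # keep "06:11:53.txt:829476/729/830205"
        split(sample, parts, /\.txt:/)    # parts[1] = time, parts[2] = counters
        split(parts[2], counts, "/")      # counts[1] = current
        current = counts[1] + 0
        if (seen) {
            printf "%s %d %d\n", parts[1], current, current - previous
        } else {
            # First sample: there is nothing to compare against yet.
            printf "%s %d -\n", parts[1], current
        }
        previous = current
        seen = 1
    }'
}
```

Applied to the samples above, the steady growth shows up as a few
thousand mbufs per sample, followed by a single large negative change
between the 06:17:26 and 06:22:31 dumps.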

It's interesting to me that in the case above the mbuf usage went
up but the mbuf cluster usage didn't go up as well. In the past
both went up together (as shown in the munin graphs for the week).

I'm wondering if killing processes is the best way to deal
with the problem.

I would prefer it if the kernel simply stopped allocating new
mbufs and mbuf clusters before memory becomes too scarce for the
system to function.

I'm aware that this would affect applications as well and
would probably result in dropped connections, but my expectation
would be that it would be less annoying than the whole system
becoming unresponsive or important applications getting killed
and becoming unavailable until I can restart them.

Has anyone already looked into this?

Or is there a non-obvious reason why refusing to allocate more
mbufs and mbuf clusters than the system can handle isn't expected
to work?

Fabian