[Bug 259651] bhyve process uses all memory/swap

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 04 Nov 2021 20:34:27 UTC

            Bug ID: 259651
           Summary: bhyve process uses all memory/swap
           Product: Base System
           Version: 12.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: reg@FreeBSD.org

I've had a Windows Server running in bhyve on FreeNAS for a few years now.  It
uses DFS-R to sync a few Windows file systems to my remote backup location.
The VM has several zvol backed AHCI devices and a virtio network adapter.  It
has been running (mostly) stably for a long time with adequate performance (as
in, it can mostly saturate the 1Gb link it's on and can get disk speeds in the
VM which are as fast as I expect from the low power backing store).  Recently I
made a few changes to the machine and the host, some of which are hard to
reverse, and the VM has started to consume all available RAM, then all the swap
and eventually it gets killed by the OOM handler...  A few crashes corrupted
the DFS-R databases, and so now the machine wants to do a huge amount of IO
(both network and disk) to resync (but that's my problem).

There are other reports online of RAM exhaustion from bhyve, but I couldn't
find an open bug, so I'm filing one.  My problem seemed to start on updating to
TrueNAS-12.0-U5.1, but I also did some other reconfiguration around this time,
and judging from the other reports, this might be a long-standing issue.

The other change I made was to mess with the CPU/RAM allocation to this VM, and
I accidentally misread the number of cores as the total number of cores, not
the cores per CPU, so I allocated far more cores than my CPU has threads
(2xCPUs, 2xcores, 2xthreads, 8GB RAM)...  Needless to say, the VM quickly
swamped the host.  However, this also caused the memory use to grow.  I've now
scaled the CPUs back to (1xCPU, 1xcore, 2xthreads, 6GB RAM) and the memory use
is now staying stable - although it's currently rebuilding some DFS-R database
so it's not maxing out the VM CPUs.
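
To make the oversubscription concrete, here is a minimal sketch of the
arithmetic.  It assumes the guest's vCPU count is sockets x cores x threads,
and that the host i3-4130 exposes 2 cores x 2 threads; the function name
vcpu_count is just illustrative.

```python
def vcpu_count(sockets: int, cores: int, threads: int) -> int:
    """Total vCPUs, assuming topology multiplies out as sockets*cores*threads."""
    return sockets * cores * threads


# Host: 2 cores x 2 threads (from the hardware description in this report).
HOST_THREADS = 2 * 2

for label, topology in [("misconfigured", (2, 2, 2)), ("scaled back", (1, 1, 2))]:
    n = vcpu_count(*topology)
    verdict = "oversubscribed" if n > HOST_THREADS else "fits"
    print(f"{label}: {n} vCPUs on {HOST_THREADS} host threads ({verdict})")
```

Under those assumptions the mistaken setting asks for twice as many vCPUs as
the host has hardware threads, while the scaled-back setting uses half of them.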

The behavior I observe is that the memory use stays stable as long as the host
CPU use is reasonable.  As soon as the host starts to max out its real cores
(it's a 2xcore, 2xthread CPU) and the bhyve VM is doing a lot of IO, the memory
use grows rapidly.  When the bhyve process is stopped (by shutting down the VM,
if you can get in quick enough), it takes a very long time to exit and sits in
a 'tx->tx' state.  It looks like it's trying to flush buffers, although the
zpool seems to show only reads while the process is exiting.  My guess as to
the bug is that bhyve has a huge amount of outstanding IO, but I'm not sure how
to monitor that.  When the host CPU is really busy these IO buffers are not
being freed properly, and are eventually leaking.

Around the same time as making these changes, I also turned on dedup on one of
the zvols (the backups on that disk are rewritten every day, even though
they're the same, so I was getting a lot of snapshot growth).  I've turned that
off, but it didn't seem to change the behavior.  I also added the ZIL and L2ARC
devices to the pool around this time.  I've not tried removing them.

The host and the VM have been set up for a long time and working, so I'm going
to ignore suggestions to get a bigger box or tune my ZFS ARC values...  But I'm
happy to debug it - I've been able to reproduce this relatively reliably with
different CPU settings, although it does rely on Windows cooperating.  I can't
mess with it too much since I do need to keep the other backups running
directly via TrueNAS to the other pools ;-).

TrueNAS Server:
ThinkServer TS140, Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz, 20GB RAM.
zpool: 2 striped mirrored 3TB TOSHIBA HDWD130, with mirrored 12GB ZIL and 19GB
L2ARC.
TrueNAS-12.0-U6 (FreeBSD 12.2-RELEASE-p10), 25GB of swap.

Windows Server VM:
2xCPU, 1xcore, 1xthread, 6GB RAM (original, see other comments).
4xAHCI zvol with 64K cluster, one of which had dedup on for a period as an
experiment, 512B blocks. (VM BSODs immediately if I try using virtio-blk).
1xVirtIO NIC (em0), with 0.1.208 virtio-win drivers.
Windows Server 2019, fully patched.