Re: ZFS + FreeBSD XEN dom0 panic

From: Brian Buhrow <buhrow_at_nfbcal.org>
Date: Wed, 02 Mar 2022 20:07:34 UTC
	Hello.  Given Roger's message and the presentation of the errors you're seeing, I think
we're both in agreement that you're running out of memory.  Below I've listed some suggestions
on how you might make your system more stable and, possibly, figure out where exactly the
trouble is.  When I was building my infrastructure, I ran into a bunch of edge
conditions and weird bugs as well, so some ideas that may not seem obvious may in fact work,
simply because they cause you to skirt some latent bug in the system.

Here's what my setup looks like, in case it helps.

1.  Use FreeBSD-12.1 or 12.2, and xen-4.13 or xen-4.14
(Using FreeBSD as dom0.)

2.  Use a minimum of 8G of RAM for the dom0.

3.  Put the root filesystem on a mirrored set of partitions or drives; I do not use ZFS as a
root filesystem, but put root and swap on mirrored media using the gmirror(8) utility and
mirror driver.  In this way, if the system needs to use swap, it can do so without touching ZFS
and, hopefully, without requiring additional memory to get to the swap.
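For reference, a minimal gmirror setup along these lines might look like the following; the
device names (ada0, ada1) and the mirror name gm0 are examples and will differ on your hardware:

```shell
# Load the mirror driver now and on every boot.
gmirror load
echo 'geom_mirror_load="YES"' >> /boot/loader.conf

# Mirror the two boot disks (device names are illustrative).
gmirror label -v gm0 ada0 ada1

# Then partition /dev/mirror/gm0 and put root and swap on it
# (e.g. gm0p2 as root, gm0p3 as swap), so paging never touches ZFS.
```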

4.  Create a network bridge using the bridge(4) driver for attaching the domu's to the network.
You can either attach this bridge to one of your physical network interfaces, or you can attach
it to a vlan(4) interface, thus allowing separation between the network the dom0 lives on and
the network(s) the domu's live on.  I do not do any firewalling or natting on the dom0, it's
just a packet forwarder.
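As a sketch, the bridge can be created at boot from rc.conf(5); the interface names here (em0,
vlan tag 10) are assumptions for illustration:

```shell
# /etc/rc.conf -- bridge the domu traffic onto a tagged vlan,
# keeping it separate from the network the dom0 lives on.
cloned_interfaces="bridge0 vlan10"
ifconfig_vlan10="vlan 10 vlandev em0 up"
ifconfig_bridge0="addm vlan10 up"
```

Each domu then names the bridge in its vif line, e.g. vif = [ 'bridge=bridge0' ].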

5.  Create a zpool for your domu disks.  It can be comprised of partitions on the same disks
you use for booting, or it can live on separate disks.
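Either way, creating the pool is a one-liner; the pool name and devices below are examples:

```shell
# Mirrored pool on dedicated disks for domu storage.
zpool create vmpool mirror da2 da3

# Or, on spare partitions of the boot disks:
# zpool create vmpool mirror ada0p4 ada1p4
```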

6.  Create zvols for the disks you want to attach to your domu's.  I typically attach one or two
disks to each domu.  I've not experimented with attaching large numbers of disks to a single
domu, instead partitioning the virtual disks from inside the domu itself if I want separation.
I use the xbd(4) block driver for attaching the disks as raw block devices to the domu's.  I've
not played with the other virtual drivers.  (I have played with HVM hosts using the qemu
drivers and found things sort of work, but are not stable; the dom0 itself remains fine.)
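To make this concrete, here is roughly what creating a zvol and handing it to a domu looks
like; the names and size are illustrative:

```shell
# Carve out a 20G zvol for the guest's first disk.
zfs create -V 20G vmpool/guest1-disk0

# In the domu's xl config file, attach it as a raw block device:
#   disk = [ 'phy:/dev/zvol/vmpool/guest1-disk0,xvda,w' ]
```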

	While I've not tried oversubscribing the RAM reserved for domu use on the systems I run,
if the domu's you're running support the balloon driver, I think you will have success in doing
that.  I disagree with your comment that adding memory just puts off the inevitable problem.  Once
you figure out how much memory your dom0 needs to do its work, then it's just a matter of
deciding how much memory you want for your domu's.  Having said that, however, it's important
to figure out what the minimum amount of memory you need for your dom0 is; going below that
threshold will result in instability.  
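On a FreeBSD dom0 that reservation is set on the hypervisor command line in loader.conf(5);
something like the following pins the dom0 at 8G (the exact values here are examples):

```shell
# /boot/loader.conf -- boot the Xen hypervisor with a fixed
# 8G dom0 reservation; the rest of RAM is left for the domu's.
xen_kernel="/boot/xen"
xen_cmdline="dom0_mem=8192M dom0_max_vcpus=4 dom0=pvh console=vga,com1"
```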

As I think about your issue, I'm guessing you'll find things are much more stable and
you'll be happy with 8G for the dom0 if you put the root and swap on a mirror and get them out
of ZFS.  I think you're running into a situation where your system is short of memory, it tries
to swap, the swap is on ZFS, which needs more memory to fulfill the paging request and, well,
you get the idea.  The system I'm sending this from, for example, has 8GB devoted to the dom0
and is currently using 36MB of swap.  It's been up for 69 days, which is the time
since the last power failure.  I have another system, the one running pfsense as one of its
domu's, which has 8GB of memory, no swap configured, and has been up for 740 days.

Hope these notes are helpful.
-Brian