Re: ZFS + FreeBSD XEN dom0 panic

From: Brian Buhrow <buhrow_at_nfbcal.org>
Date: Wed, 02 Mar 2022 17:05:22 UTC
	Hello.  One difference between my systems and yours, though I don't think that's the
problem, is that I'm not running a firewall on the dom0 itself.  The dom0 runs on a protected
vlan with respect to the external network and the domUs are connected to bridges that are
directly connected to the external network.  I have one system where the customer wants the
pfSense system running, so pfSense runs as a domU on this system, connected to an internal
"private" bridge and the public bridge, doing all the firewalling between them.  In this way,
the FreeBSD dom0 is only doing ZFS, simple IP routing and Xen management.

	If I had to wager a guess as to your trouble, it's that you don't have enough memory on
your dom0.  ZFS is a memory hog and I can't imagine getting away with anything less than 8G on
the dom0 with FreeBSD-12 and ZFS.  I'm using 8G for the dom0 on the system I'm writing from and
it is quite stable, but, then again, I'm not doing as much with the dom0 as you are.
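
	For reference, on a FreeBSD dom0 the amount of memory Xen hands to the dom0 is set on the
Xen command line in /boot/loader.conf.  A sketch, assuming the PVH dom0 setup described in the
FreeBSD Handbook; the vcpu count is only a placeholder, and any other options already on your
xen_cmdline should stay as they are:

```sh
# /boot/loader.conf (fragment)
# dom0_mem is the Xen boot option that fixes how much RAM the dom0 gets.
# dom0_max_vcpus=4 is a placeholder value; keep whatever you already use.
xen_kernel="/boot/xen"
xen_cmdline="dom0_mem=8192M dom0_max_vcpus=4 dom0=pvh"
```

The change only takes effect after the machine is rebooted into Xen.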

	I too am using zvols as disks for the domUs, but I've not been trying to make ZFS
snapshots from them.  Obvious question, but I'll ask it anyway: you're not trying to make
snapshots of the zvols while the domUs on top of them are running, are you?  I would imagine
that would not give you good images, but I wouldn't expect it to panic the dom0 either.
However, it will stretch your meager memory resources even further.

Have you been able to get a panic message or does the system just spontaneously reboot?  If it
just reboots, then, again, I think you are having a memory shortage.
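
If nothing readable reaches the console before the reset, it may still be possible to capture
a kernel crash dump.  A minimal sketch of the usual rc.conf settings, assuming the dom0 has a
swap device large enough to hold the dump; whether a dump actually gets written when the dom0
goes down under Xen is an open question here, not something confirmed in this thread:

```sh
# /etc/rc.conf (fragment)
dumpdev="AUTO"        # let the system pick a configured swap device for kernel dumps
dumpdir="/var/crash"  # where savecore(8) saves the dump on the next boot
```

After a panic and reboot, a vmcore.N file (with a matching info.N summary) in /var/crash
would mean the dump was saved.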

My suggestion is to try giving the dom0 8G of RAM and then use the balloon driver to
oversubscribe the remaining memory among the domUs.  Of course, the best course of
action is to see if you can put more memory in this system; 16GB just isn't that much when
you're trying to run Xen plus a few domUs, especially on top of ZFS.
If you can get a panic message or a crash dump, that would be helpful in figuring out more
accurately what's going on.
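
To put rough numbers on the memory split suggested above, here is a small sh sketch.  The three
domUs and their sizes are made up for illustration; the two figures per guest correspond to the
memory= and maxmem= settings in an xl guest config.  The assumption is that the boot-time
allocations have to fit in what is left after the dom0, while the maxmem values may add up to
more than that, which is where the balloon driver comes in:

```sh
#!/bin/sh
total_mb=16384                    # the 16GB machine from this thread
dom0_mb=8192                      # suggested dom0 allocation
pool_mb=$((total_mb - dom0_mb))   # upper bound; Xen itself needs a little as well

# Hypothetical guests as "name:memory:maxmem", sizes in MB
domus="web:2048:4096 db:3072:6144 mail:1024:2048"

boot_sum=0
max_sum=0
for d in $domus; do
    rest=${d#*:}        # strip the name, leaving "memory:maxmem"
    mem=${rest%%:*}     # boot-time allocation
    max=${rest#*:}      # ceiling the guest may balloon up to
    boot_sum=$((boot_sum + mem))
    max_sum=$((max_sum + max))
done

echo "pool=${pool_mb} boot_sum=${boot_sum} max_sum=${max_sum}"
if [ "$boot_sum" -gt "$pool_mb" ]; then
    echo "domUs will not all start: need ${boot_sum} MB, have ${pool_mb} MB"
fi
```

With these made-up sizes the guests fit at boot (6144 MB out of 8192 MB), but their combined
maxmem of 12288 MB is more than the machine can give them all at once.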

Another thought, since you were getting some crashes when running jails with Xen, is to get
memtest86 running on the raw machine and let it run for 3 or 4 days.  If you don't get any
memory errors, then I think you can be pretty sure it's not a hardware problem.  If, however,
you get any errors at all with that test, then I think it's a good bet you have a hardware issue.


-thanks
-Brian