Re: ZFS + FreeBSD XEN dom0 panic

From: Ze Dupsys <zedupsys_at_gmail.com>
Date: Wed, 02 Mar 2022 18:12:28 UTC
I agree with you that the firewall and networking are most probably not at
fault. When Dom0 was running jails with netgraph, it panicked a lot. Anyway,
I have now concluded that mixing jails with VMs is not a good idea; it is
better to run the jails inside a DomU.

Well, I could beef up the RAM a bit, but it seems that this would just
postpone the inevitable. While the DomUs only load the CPU, use the network
and put little load on the HDDs, all is fine, but once HDD load increases,
at some point the system crashes. It really could be, as you say, due to ZFS
and its monopolistic/unfriendly RAM usage. I guess I will go on a quest to
find out how to tune/limit ZFS a bit.
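As a starting point for that, a small Python sketch of one way to pick an ARC
cap. It assumes the 8 GiB dom0 size suggested in the quoted message and an
arbitrary rule of capping the ARC at half of dom0's RAM with a 1 GiB floor;
the fraction, the floor and the helper names are hypothetical, not FreeBSD
defaults. The printed line is meant for /boot/loader.conf, where FreeBSD 12
reads the cap from the vfs.zfs.arc_max tunable (in bytes).

```python
GIB = 1024 ** 3


def suggest_arc_max(dom0_bytes: int, fraction: float = 0.5, floor: int = 1 * GIB) -> int:
    """Hypothetical rule: cap the ARC at `fraction` of dom0 RAM, never below `floor`."""
    if dom0_bytes <= 0:
        raise ValueError("dom0 memory must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return max(int(dom0_bytes * fraction), floor)


def loader_conf_line(arc_max: int) -> str:
    """Format the cap as a /boot/loader.conf entry (value in bytes)."""
    return f'vfs.zfs.arc_max="{arc_max}"'


if __name__ == "__main__":
    dom0 = 8 * GIB  # assumed dom0 size, as suggested in the quoted message
    print(loader_conf_line(suggest_arc_max(dom0)))
```

The floor only matters for very small dom0 sizes; with 8 GiB the rule simply
gives half, leaving the rest of dom0's memory for the kernel, Xen tooling and
the backend drivers.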

About snapshotting ZFS volumes: I'm not doing it while the DomUs are running.
At the moment, the crashing lab machine has no snapshots at all.

The lab machine reboots with panic messages written to the serial output,
which are logged by an old laptop. So it does not feel like a hardware error,
but I will run memtest to be sure. At first I got only partial panic
messages, since Xen rebooted too soon, but then I added sync_console=1, so
now I get the full panic messages. It seems that the reboot=no value is not
taken into account, though.

About memory ballooning, I feel somewhat hesitant; I am trying not to use
too many different techniques that could introduce more problems. Is
ballooning on Xen with a FreeBSD Dom0 considered stable?
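To make the ballooning suggestion from the quoted message concrete, here is a
rough Python sketch of the budget it implies. It assumes a 16 GiB host, 8 GiB
for dom0, an arbitrary 512 MiB set aside for the hypervisor itself, and three
made-up guests. `memory` and `maxmem` are the xl guest-config settings for the
initial allocation and the balloon ceiling; the class and function names are
hypothetical. The initial allocations have to fit in what is left after dom0,
while the ceilings may add up to more, and that excess is the
oversubscription.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    name: str
    memory_mib: int  # initial allocation, like xl.cfg `memory`
    maxmem_mib: int  # balloon ceiling, like xl.cfg `maxmem`


def check_budget(host_mib: int, dom0_mib: int, guests: list[Guest], reserve_mib: int = 0) -> dict:
    """Compare guest allocations with the RAM left after dom0 and a reserve."""
    available = host_mib - dom0_mib - reserve_mib
    if available <= 0:
        raise ValueError("no memory left for guests")
    boot_total = sum(g.memory_mib for g in guests)
    ceiling_total = sum(g.maxmem_mib for g in guests)
    return {
        "available_mib": available,
        "boot_total_mib": boot_total,
        "fits_at_boot": boot_total <= available,
        # A ratio above 1.0 means the guests could together balloon past physical RAM.
        "oversubscription": round(ceiling_total / available, 2),
    }


if __name__ == "__main__":
    guests = [
        Guest("web", 2048, 4096),  # hypothetical guests
        Guest("db", 3072, 6144),
        Guest("fw", 1024, 1024),
    ]
    print(check_budget(16384, 8192, guests, reserve_mib=512))
```

If fits_at_boot comes out False, the guests cannot all start at their initial
sizes no matter how the balloon driver behaves later.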

Have you used Xen's driver domains with FreeBSD to "export/provide" disks? It
seemed an interesting approach as well, but while following the documentation
I could not understand how to even configure FreeBSD as a driver domain (if
that is even possible) to provide block devices to Dom0 so it can provide
them to other DomUs. This might solve the RAM issues as well, since the
driver domain would have its own reserved RAM and could not put pressure on
Dom0's RAM for whatever reason.

In a way, I am thinking about various strategies to shave services off Dom0
to ensure its stability. Maybe I should configure the firewall inside a DomU,
as in your pfSense example, since for me CPU resources are usually not
exhausted, but NICs and HDDs are.

Thanks for the ideas on what else could be done to solve this!

Best wishes,
Ze Dupsys


On Wed, Mar 2, 2022 at 7:05 PM Brian Buhrow <buhrow@nfbcal.org> wrote:

> Hello. One difference between my systems and yours, though I don't
> think that's the problem, is that I'm not running a firewall on the
> dom0 itself. The dom0 runs on a protected VLAN with respect to the
> external network, and the domUs are connected to bridges that are
> directly connected to the external network. I have one system where
> the customer wants the pfSense system running, so pfSense runs as a
> domU on that system, connected to an internal "private" bridge and the
> public bridge, doing all the firewalling between them. This way, the
> FreeBSD dom0 is only doing ZFS, simple IP routing and Xen management.
>
> If I had to wager a guess as to your trouble, it's that you don't have
> enough memory on your dom0. ZFS is a memory hog, and I can't imagine
> getting away with anything less than 8G on the dom0 with FreeBSD-12
> and ZFS. I'm using 8G for the dom0 on the system I'm writing from and
> it is quite stable, but then again, I'm not doing as much with the
> dom0 as you are.
>
> I too am using zvols as disks for the domUs, but I've not been trying
> to make ZFS snapshots of them. Obvious question, but I'll ask it
> anyway: you're not trying to make snapshots of the zvols while the
> domUs on top of them are running, are you? I would imagine that would
> not give you good images, but I wouldn't expect it to panic the dom0
> either. However, it will stretch your meager memory resources even
> further.
>
> Have you been able to get a panic message, or does the system just
> spontaneously reboot? If it just reboots, then again, I think you are
> having a memory shortage.
>
> My suggestion is to try giving the dom0 8G of RAM and then use the
> balloon driver to oversubscribe the remaining memory for the domUs. Of
> course, the best course of action is to see if you can put more memory
> in this system; 16GB just isn't that much when you're trying to run
> Xen plus a few domUs, especially on top of ZFS. If you can get a panic
> message or a crash dump, that would be helpful in figuring out more
> accurately what's going on.
>
> Another thought, since you were getting some crashes when running
> jails with Xen, is to get memtest86 running on the raw machine and let
> it run for 3 or 4 days. If you don't get any memory errors, then I
> think you can be pretty sure it's not a hardware problem. If, however,
> you get any errors at all with that test, then I think it's a good bet
> you have a hardware issue.
>
>
> -thanks
> -Brian
>
>