Re: ZFS + FreeBSD XEN dom0 panic

From: Ze Dupsys <zedupsys_at_gmail.com>
Date: Wed, 02 Mar 2022 08:57:37 UTC
Hello,

I started using Xen on a pre-production machine (with the aim of using it
later in production) running 12.2, but since it experienced random crashes,
I updated to 13.0 in the hope that the errors might disappear.

I am not sure how much detail to give. I want to keep this email short
while still giving enough information.

The FreeBSD Dom0 is installed on ZFS. It is a fairly basic install, with
IPFW and NAT rules. The zpool consists of 2 mirrored disks. There is a ZVOL
(volmode=dev) for each VM and for each of the VM's jails, attached to the
DomU as a raw device. At the moment the DomUs run FreeBSD 12.0 to 13.0 on
UFS, with VNET jails whose epairs are all bridged to the DomU's xn0
interface. On Dom0 I have bridge interfaces. Each DomU is connected to one
of them depending on its "zone/network". DomUs that are allowed outgoing
connections are NATted by IPFW on a specific physical NIC and IP.
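For reference, a ZVOL like the ones above can be created roughly as
follows. This is a minimal sketch. The dataset name is taken from the DomU
config below, and the 20G size is only a hypothetical example:

    # -p creates missing parent datasets; -V makes a volume of the given size.
    # volmode=dev exposes it only as a raw device under /dev/zvol/, so Dom0
    # does not try to taste or use the partitions the DomU puts on it.
    zfs create -p -V 20G -o volmode=dev sys/vmdk/root/sys-01-root
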

xen_cmdline="dom0_mem=6144M cpufreq=dom0-kernel dom0_max_vcpus=4 dom0=pvh console=vga,com1 com1=115200,8n1 guest_loglvl=all loglvl=all"

The physical hardware is a Xeon CPU, 16G ECC RAM and 2x8TB HDDs.

The DomU config is something like this:
memory = 1024
vcpus=2
name = "sys-01"

type = "hvm"
boot = "dc"

vif = [ 'vifname=xbr0p5,type=vif,mac=00:16:3E:01:63:05,bridge=xbr0' ]
disk = [ 'backendtype=phy, format=raw, vdev=xvda, target=/dev/zvol/sys/vmdk/root/sys-01-root',
         'backendtype=phy, format=raw, vdev=xvdb, target=/dev/zvol/sys/vmdk/root/sys-01-jail1',
         'backendtype=phy, format=raw, vdev=xvdc, target=/dev/zvol/sys/vmdk/root/sys-01-jail2'
         # .. more defs, if any ..
       ]

vnc=1
vnclisten="0.0.0.0:X"
usbdevice = "tablet"
serial = "pty"


Right after startup the overall system works, speeds are acceptable, and
the load is low, so the system is not under stress. The problem is that at
unexpected times I noticed the system rebooting. This happened, for example,
when I created a new ZFS volume in Dom0, when I rebooted a DomU, or when I
did something in Dom0 that seemed unrelated. Sometimes "init 0" would reboot
the system, and sometimes it would shut it down. The panics seemed to
happen under HDD load. So I got a similar machine for a testing/lab
environment (16G ECC, slower Xeon, 2x2TB HDDs, with a serial port) and
started pushing that system to its limits with various combinations,
restricting RAM, CPUs, etc. The bug report contains the combination that
seemed to me to be the fastest way to panic the system.

For Xen startup, "vifname=" did not work as described in the Xen user
manual pages for the default startup script, so I added "ifconfig name
$vifname" to that script. I needed this because IPFW rules with "via
$ifname in" require a specific NIC name. By default, however, Xen created
a new NIC name each time, depending on which name was free. This change is
not active on the lab system, and that system still crashes, so I do not
think it is the cause of the problem.


Some history.
I believe the hardware is okay. Before Xen I used FreeBSD 12.2 (upgraded
incrementally from 12.0) with ZFS and jails a lot, and VNET was done with
netgraph (VNET bridge and ethernet interfaces). What I loved about that
setup was the clean ifconfig output: the host had only a bridge interface,
and the virtual ethernet interfaces for the jails came directly from that
bridge. Creating a new jail was just a "zfs clone", and it did not take
much space. Snapshots could be made for backups, and disk space could
easily be expanded or limited for each jail, thanks to ZFS. The system was
stable. The problem with that setup was that if some jail started to
misbehave badly, it was hard to control overall system performance and
behavior. I tried rctl, but jails kept misbehaving in new and unexpected
ways: exhausting RAM or the process count, causing CPU or HDD load, opening
too many network sockets, etc. If the OOM killer started to kill processes,
it was impossible to control which process/jail should be killed first and
which should be kept. So virtualization seemed a better way to solve this:
a system VM with DNS, a web gateway, etc., plus lower-priority VMs that
could crash if they misbehave. I like the Xen architecture in general, and
I would like to use FreeBSD as Dom0 if possible, because of ZFS, my
knowledge of the OS and its good history of stability.

Since a ZFS dataset cannot be passed through to a DomU, my idea was to use
ZVOLs with UFS inside the VM. I could snapshot those ZVOLs for backups, and
the DomU could run growfs when necessary. This is somewhat less convenient
than the jail architecture, but still good enough.
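A rough sketch of that backup/grow workflow follows. The dataset name comes
from the DomU config above. The snapshot name and the 40G size are
hypothetical:

    # In Dom0: take a snapshot of the ZVOL for backup
    zfs snapshot sys/vmdk/root/sys-01-root@backup
    # In Dom0: grow the volume (the DomU may need a restart to see the new size)
    zfs set volsize=40G sys/vmdk/root/sys-01-root
    # In the DomU: grow the mounted UFS file system; if the disk is
    # partitioned, the partition must be enlarged (gpart resize) first
    growfs /
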

My first attempt was to keep netgraph jails in Dom0, but that turned out
badly. The system panicked almost every time a jail was started or stopped.
The first jail was fine, but starting the 5th or a later jail panicked the
system with high probability. So I switched to epairs. That was less
unstable, but it still crashed from time to time. Now there are no jails
in Dom0, and it still crashes.

I tried other approaches too. First, I passed a whole HDD through as a raw
device to an iSCSI DomU and used ctld on Dom0 to provide disks for the
other DomUs. HDD speed was bad, and the system still crashed. I also tried
raw files on ZFS datasets. Speeds were actually close to ZVOLs, but the
system still crashed. So now I am starting to wonder: what configurations
do people use successfully? What have I missed?


On Tue, Mar 1, 2022 at 5:40 PM Brian Buhrow <buhrow@nfbcal.org> wrote:

>         hello.  I've been running FreeBSD-12.1 and Freebsd-12.2 plus ZFS
> plus Xen with FreeBSD as
> dom0  without any stability issues for about 2 years now.  I'm doing this
> on a number of
> systems, with a variety of  NetBSD, FreeBSD and Linux as domU guests.  I
> haven't looked at your
> bug details, but are you running FreeBSD-13?
> -thanks
> -Brian
>
>