Re: ZFS + FreeBSD XEN dom0 panic

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Thu, 03 Mar 2022 10:39:43 UTC
On Wed, Mar 02, 2022 at 07:26:18PM +0200, Ze Dupsys wrote:
> Today managed to crash lab Dom0 with:
> xen_cmdline="dom0_mem=6144M dom0_max_vcpus=2 dom0=pvh,verbose=1
> console=vga,com1 com1=9600,8n1 guest_loglvl=all loglvl=all sync_console=1
> reboot=no"

Hm, it's weird that reboot=no doesn't work for you. Does noreboot
instead make a difference?

> 
> I ran 'vmstat -m | sort -k 2 -r' every 120 seconds; the latest output is
> in the attachment. The panic had the same fingerprint as the one with the
> "rman_is_region_manager" line already reported.



> The scripts I ran in parallel were generally the same as those attached to
> the bug report, just slightly modified.
> 1) ./libexec.sh zfs_volstress_fast_4g (this just creates new ZVOLs and,
> instead of 2GB, writes 4GB to each ZVOL created, using dd if=/dev/zero)
> 2)  ./test_vm1_zvol_3gb.sh (this loops commands: start first DomU, write
> 3GB in its /tmp, restart DomU, remove /tmp, repeat)
> 3) ./test_vm2_zvol_5_on_off.sh (this loops: start second DomU, which has 5
> disks attached, turn off DomU, repeat)

Right. So the trigger for this seems to be related to creating (and
destroying) VMs in a loop?

Do you still see the same if you only execute steps 1 and 4 from the
repro described above?

> 4) monitoring: sleep 120 seconds, print vmstat | sort to serial output.
> 
> Around dom id 108, the system started to behave suspiciously: xl list showed
> the DomUs as created, but they did not really start up; the script timed out
> waiting for the ssh connection, and there was no vnc. When I manually ran
> xl destroy and then xl create, the system panicked.

Could you also add the output of `top -n1` to see where memory is
going?
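
Something along these lines might do for the monitoring step. This is only
a rough sketch: the function names (snapshot, monitor) and the optional
count argument are made up for illustration, and it assumes vmstat -m and
top behave as they do on your dom0. The top invocation is copied verbatim
from the request above.

```shell
#!/bin/sh
# Hypothetical monitoring helper; names and defaults are assumptions,
# not taken from the scripts attached to the bug report.

# Print one timestamped snapshot of kernel malloc stats and top output.
snapshot() {
    echo "=== $(date -u '+%Y-%m-%dT%H:%M:%SZ') ==="
    # Kernel malloc usage by type, sorted as in the original report.
    vmstat -m | sort -k 2 -r
    # One top sample, invoked exactly as requested above.
    top -n1
}

# monitor INTERVAL [COUNT]
# Take a snapshot, sleep INTERVAL seconds, repeat.
# COUNT=0 (the default) means loop until interrupted.
monitor() {
    interval=${1:-120}
    count=${2:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        snapshot
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Calling "monitor 120" from a shell whose output goes to the serial console
would keep the last snapshot before a panic visible in the console log.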

I'm quite sure we have a leak in some of the backends, maybe the
bounce buffer used by blkback.

Thanks, Roger.