[Bug 261059] Kernel panic XEN + ZFS volume.
- Reply: bugzilla-noreply_a_freebsd.org: "[Bug 261059] Kernel panic XEN + ZFS volume."
- Reply: bugzilla-noreply_a_freebsd.org: "[Bug 261059] Kernel panic XEN + ZFS volume."
- Reply: bugzilla-noreply_a_freebsd.org: "[Bug 261059] Kernel panic XEN + ZFS volume."
- Reply: bugzilla-noreply_a_freebsd.org: "[Bug 261059] Kernel panic XEN + ZFS volume."
- Reply: bugzilla-noreply_a_freebsd.org: "[Bug 261059] Kernel panic XEN + ZFS volume."
- Reply: bugzilla-noreply_a_freebsd.org: "[Bug 261059] Kernel panic XEN + ZFS volume."
- Reply: bugzilla-noreply_a_freebsd.org: "[Bug 261059] Kernel panic XEN + ZFS volume."
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Sun, 09 Jan 2022 13:21:17 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261059 Bug ID: 261059 Summary: Kernel panic XEN + ZFS volume. Product: Base System Version: 13.0-RELEASE Hardware: amd64 OS: Any Status: New Severity: Affects Only Me Priority: --- Component: kern Assignee: bugs@FreeBSD.org Reporter: zedupsys@gmail.com Created attachment 230842 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=230842&action=edit all config and test script files Broadly describing, problem is simple - whole system reboots at unexpected/unwanted times uncontrollably. XEN virtualization toolstack is used. FreeBSD is run as Dom0 PVH and hosted FreeBSD VMs DomU HVM. ZFS file system is used for Dom0. And disks for DomUs are exposed as block devices, ZFS volumes. I haven't been able to narrow down which area is one that is causing crash. At first i thought that this is XEN related problem, but the more i tested, it somewhat started to feel ZFS related as well; sort of concurrency related. While searching i have created some scripts (will add as attachments), which when run atleast on my testing hardware can crash system most of the time. Based on my observations, the most effective way to crash the system is to run as root three scripts in parallel: 1) one that creates 2GB ZFS volumes and copies data from IMG file onto ZVOL by executing dd, 2) script that turns on/off VM1, 3) script that turns on/off VM2 and VM2 has at least 5 disks. But it is not the only way, it is the one that crashes system faster than other ways. System hardware: CPU: Intel(R) Xeon(R) CPU X3440 @ 2.53GHz RAM: 16GB ECC HDD: 2x WDC WD2003FYYS 2TB System installed from FreeBSD-13.0-RELEASE-amd64-dvd1.iso, all defaults except IP and some basic configuration. ZFS pool created automatically with name sys. XEN toolstack installed by pkg install. Done freebsd-update. root@lab-01 > uname -a FreeBSD lab-01.b7.abj.lv 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 root@lab-01 > freebsd-version 13.0-RELEASE-p5 root@lab-01 > zpool status pool: sys state: ONLINE scan: resilvered 3.70M in 00:00:03 with 0 errors on Fri Jan 7 11:06:07 2022 config: NAME STATE READ WRITE CKSUM sys ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 gpt/sys0 ONLINE 0 0 0 gpt/sys1 ONLINE 0 0 0 errors: No known data errors root@lab-01 > pkg info argp-standalone-1.3_4 Standalone version of arguments parsing functions from GLIBC ca_root_nss-3.69_1 Root certificate bundle from the Mozilla Project curl-7.79.1 Command line tool and library for transferring data with URLs edk2-xen-x64-g202102 EDK2 Firmware for xen_x64 gettext-runtime-0.21 GNU gettext runtime libraries and programs glib-2.70.1,2 Some useful routines of C programming (current stable version) indexinfo-0.3.1 Utility to regenerate the GNU info page index libevent-2.1.12 API for executing callback functions on events or timeouts libffi-3.3_1 Foreign Function Interface libiconv-1.16 Character set conversion library libnghttp2-1.44.0 HTTP/2.0 C Library libssh2-1.9.0_3,3 Library implementing the SSH2 protocol libxml2-2.9.12 XML parser library for GNOME lzo2-2.10_1 Portable speedy, lossless data compression library mpdecimal-2.5.1 C/C++ arbitrary precision decimal floating point libraries pcre-8.45 Perl Compatible Regular Expressions library perl5-5.32.1_1 Practical Extraction and Report Language pixman-0.40.0_1 Low-level pixel manipulation library pkg-1.17.5 Package manager python38-3.8.12 Interpreted object-oriented programming language readline-8.1.1 Library for editing command lines as they are typed seabios-1.14.0 Open source implementation of a 16bit X86 BIOS tmux23-2.3_1 Terminal Multiplexer (old stable version 2.3) vim-8.2.3458 Improved version of the vi editor (console flavor) xen-kernel-4.15.0_1 Hypervisor using a microkernel design xen-tools-4.15.0_2 Xen management tools yajl-2.1.0 Portable JSON parsing and serialization library in ANSI C zsh-5.8 The Z shell root@lab-01 > cat /boot/loader.conf zfs_load="YES" vfs.root.mountfrom="zfs:sys" beastie_disable="YES" autoboot_delay="5" boot_multicons="YES" boot_serial="YES" comconsole_speed="9600" console="comconsole,vidconsole" xen_kernel="/boot/xen" xen_cmdline="dom0_mem=2048M cpufreq=dom0-kernel dom0_max_vcpus=2 dom0=pvh console=vga,com1 com1=9600,8n1 guest_loglvl=all loglvl=all" hw.usb.no_boot_wait=1 root@lab-01 > cat /etc/rc.conf hostname="lab-01.b7.abj.lv" cloned_interfaces="bridge10" create_args_bridge10="name xbr0" cloned_interfaces_sticky="YES" ifconfig_xbr0="inet 10.63.0.1/16" zfs_enable="YES" sshd_enable="YES" xencommons_enable="YES" Besides default ZFS dataset that is mounted at /, i have created parent for VM ZVOLs and for working in /service directory. root@lab-01 > zfs list NAME USED AVAIL REFER MOUNTPOINT sys 98.6G 1.66T 1.99G / sys/service 96.6G 1.66T 96.6G /service sys/vmdk 48K 1.66T 24K none sys/vmdk/dev 24K 1.66T 24K none # zfs create -o mountpoint=none sys/vmdk # etc. I am running scripts from folder /service/crash, so attachments can just be placed there on fresh system. Scripts need SSH key, thus create it by command ssh-keygen. Attached file descriptions: lib.sh - this file contains reusable functions for tests and VM preparation, used by test scripts and manually. libexec.sh - this is just a wrapper file which uses first arg as function name to be called from lib.sh, this is used for manual function calls. test_vm1_zvol_on_off.sh - this script in loop executes VM1 boot, sleep, VM1 power off test_vm2_zvol_on_off.sh - this script in loop executes VM2 boot, sleep, VM2 power off test_vm2_zvol_5_on_off.sh - this turns on/off VM2 which has 5 HDDs test_vm1_zvol_3gb.sh - this turns VM1 on/off, and writes/removes 3GB file in VM1:/tmp folder xen-vm1-zvol.conf - XEN config file for VM1 xen-vm2-zvol.conf - XEN config file for VM2 xen-vm2-zvol-5.conf - XEN config file for VM2 with 5 HDDs. To create VMs: With all those attached files in /service/crash. Run as root: ./libexec.sh vm1_img_create ./libexec.sh vm2_img_create These commands will create VM1 and VM2 disk images, set internal IP as defined in lib.sh and copy SSH key from hosts /root/.ssh int VM disks. VM image is downloaded from https://download.freebsd.org/ftp/releases/VM-IMAGES/13.0-RELEASE/amd64/Latest/FreeBSD-13.0-RELEASE-amd64.raw.xz, so network connection is necessary or file FreeBSD-13.0-RELEASE-amd64.raw.xz must be placed in folder /service/crash/cache. Then to convert IMG to ZVOL: ./libexec.sh vm1_img_to_zvol ./libexec.sh vm2_img_to_zvol Sometimes at this point there is dd error, that /dev/zvol/sys/vmdk/dev/vm1-root is not accessible. There is some ZFS bug, but i could not repeat it reliably enough to write bug report. So just reboot system, it will show up, just rerun command. Create dummy disks for VM2 data. ./libexec.sh vmdk_empty_create vm2-data1.img 2G ./libexec.sh vmdk_empty_create vm2-data2.img 2G ./libexec.sh vmdk_empty_create vm2-data3.img 2G ./libexec.sh vmdk_empty_create vm2-data4.img 2G ./libexec.sh vm2_data_to_zvol Now that everything is prepared, just test VMs with xl create xen-vm1-zvol.conf To see that VM boots, run: xl console xen-vm1-zvol It is necessary to connect with SSH manually once, to ensure that connection works and SSH updates /root/.ssh/known_hosts. Before test start, expected ZFS layout is: root@lab-01 #1> zfs list NAME USED AVAIL REFER MOUNTPOINT sys 142G 1.62T 1.99G / sys/service 111G 1.62T 111G /service sys/vmdk 28.9G 1.62T 24K none sys/vmdk/dev 28.9G 1.62T 24K none sys/vmdk/dev/vm1-root 10.3G 1.62T 5.07G - sys/vmdk/dev/vm2-data1 2.06G 1.62T 12K - sys/vmdk/dev/vm2-data2 2.06G 1.62T 2.00G - sys/vmdk/dev/vm2-data3 2.06G 1.62T 2.00G - sys/vmdk/dev/vm2-data4 2.06G 1.62T 12K - sys/vmdk/dev/vm2-root 10.3G 1.62T 5.07G - And directory # ls -la /dev/zvol/sys/vmdk/dev/ total 1 dr-xr-xr-x 2 root wheel 512 Jan 9 14:27 . dr-xr-xr-x 3 root wheel 512 Jan 9 14:27 .. crw-r----- 1 root operator 0x72 Jan 9 14:27 vm1-root crw-r----- 1 root operator 0x70 Jan 9 14:27 vm2-data1 crw-r----- 1 root operator 0x71 Jan 9 14:27 vm2-data2 crw-r----- 1 root operator 0x75 Jan 9 14:27 vm2-data3 crw-r----- 1 root operator 0x73 Jan 9 14:27 vm2-data4 crw-r----- 1 root operator 0x74 Jan 9 14:27 vm2-root For me sometimes there are missing ZVOLs in /dev/zvol directory, vm2-data1 or vm2-data3, even if zfs list shows them up, thus init 6, before tests can be started. Once the environment is ready, just run from three different SSH sessions commands: 1) cd /service/crash; ./libexec.sh zfs_volstress 2) cd /service/crash; ./test_vm1_zvol_on_off.sh 3) cd /service/crash; ./test_vm2_zvol_5_on_off.sh Sometimes it crashes fast (in 2 minutes) sometimes it takes some time, like 30 minutes. My observations so far. 1. ZVOLs are acting weird, for example at some point i see output like this: ./libexec.sh: creating sys/stress/data1 2G dd: /dev/zvol/sys/stress/data1: No such file or directory ./libexec.sh: creating sys/stress/data2 2G 4194304+0 records in 4194304+0 records out 2147483648 bytes transferred in 70.178650 secs (30600241 bytes/sec) ./libexec.sh: creating sys/stress/data3 2G 4194304+0 records in 4194304+0 records out 2147483648 bytes transferred in 73.259213 secs (29313496 bytes/sec) ./libexec.sh: creating sys/stress/data4 2G dd: /dev/zvol/sys/stress/data4: Operation not supported ./libexec.sh: creating sys/stress/data5 2G dd: /dev/zvol/sys/stress/data5: Operation not supported ./libexec.sh: creating sys/stress/data6 2G For me this seems somewhat unexpected behaviour, since each time before dd is run, zfs create has returned; it is not done in parallel from user's perspective. See function zfs_volstress in lib.sh file. 2. Often, but not always there are problems with starting VM2 before system crash, output: libxl: error: libxl_device.c:1111:device_backend_callback: Domain 53:unable to add device with path /local/domain/0/backend/vbd/53/51712 libxl: error: libxl_create.c:1613:domcreate_launch_dm: Domain 53:unable to add disk devices libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 53:Non-existant domain libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 53:Unable to destroy guest libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 53:Destruction of domain failed ./test_vm2_zvol_single_hdd_on_off.sh: waiting VM to be ready Sometimes must restart script ./test_vm2_zvol_single_hdd_on_off.sh, because it is not smart with waiting for VM2 start. 3. It is not necessary for VM2 to have 5 disks to crash system; even running 1) cd /service/crash; ./libexec.sh zfs_volstress 2) cd /service/crash; ./test_vm1_zvol_on_off.sh 3) cd /service/crash; ./test_vm2_zvol_on_off.sh Will crash system eventually, but it takes much longer to wait for it; sometimes for me it takes 2-3 hours. 4. If just running, test_vm1_zvol_on_off and test_vm2_zvol_on_off, system seems not to crash, or maybe i did not wait long enough; it was whole day. Thus ZFS load seems essential to provoke panic. 5. It is possible to crash system with scripts only 2 scripts: 1) cd /service/crash; ./test_vm1_zvol_3gb.sh (this writes 3GB data inside VM1:/tmp) 2) cd /service/crash; ./test_vm2_zvol_5_on_off.sh Writing larger files inside VM1 tends to provoke panic sooner; with 1GB could not repeat the case often enough. The problem is that there is little info when system crashes. I am open for advices how could i capture more useful data, but below are some incomplete, for me seemed interesting fragments from serial output: Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 02 fault virtual address = 0x30028 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80c45832 stack pointer = 0x28:0xfffffe00967ec930 frame pointer = 0x28:0xfffffe00967ec930 cod Fatal trap 9: general protection fault while in kernel mode cpuid = 0; apic id = 00 instruction pointer = 0x20:0xffffffff80c45832 stack pointer = 0x28:0xfffffe009666b930 frame pointer = 0x28:0xfffffe009666b930 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, Fatal trap 12: page fault w (d2) Booting from Hard Disk... (d2) Booting from 0000:7c00 (XEN) d1v0: upcall vector 93 (XEN) d2v0: upcall vector 93 xnb(xnb_frontend_changed:1391): frontend_state=Connected, xnb_state=InitWait xnb(xnb_connect_comms:787): rings connected! xbbd4: Error 12 Unable to allocate request bounce buffers xbbd4: Fatal error. Transitioning to Closing State panic: pmap_growkernel: no memory to grow kernel cpuid = 0 time = 1641731595 KDB: stack backtrace: #0 0xffffffff80c574c5 at kdb_backtrace+0x65 #1 0xffffffff80c09ea1 at vpanic+0x181 #2 0xffffffff80c09d13 at panic+0x43 #3 0xffffffff81073eed at pmap_growkernel+0x27d #4 0xffffffff80f2da88 at vm_map_insert+0x248 #5 0xffffffff80f301e9 at vm_map_find+0x549 #6 0xffffffff80f2bf16 at kmem_init+0x226 Loading /boot/loader.conf.local I am interested in solving this. This is a testing machine, thus i can run modified tests any time. But i am somewhat out of ideas what could be done to get more verbose output, so that at least complete messages are written in serial output before automatic reboot happens. As for "panic: pmap_growkernel: no memory to grow kernel", for me it seemed that it should be enough that Dom0 has 8GB RAM, and each VM 1GB. But i do not claim that i am XEN expert and maybe this could be clasified as misconfiguration of system. If so, i am open to pointers what could be done to make system more stable. The same scripts can crash RELEASE-12.1 as well. Tested. -- You are receiving this mail because: You are the assignee for the bug.