Re: ZFS + FreeBSD XEN dom0 panic

From: Ze Dupsys <zedupsys_at_gmail.com>
Date: Sun, 27 Mar 2022 17:42:23 UTC
On Sun, Mar 27, 2022 at 12:13 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Sun, Mar 27, 2022 at 12:38:00AM +0200, Ze Dupsys wrote:
> > On 2022.03.26. 16:38, Roger Pau Monné wrote:
> > > ..
> > > It's weird, because here you get a page fault, but there are also
> > > traces with:
> > > ..
> > > general protection fault while in kernel mode
> > > ..
> > > That show a general protection fault instead of a page fault.
> >
> > Yes indeed, i had not noticed this. Grepped across 34 stored panic log
> > files, i see that 28 are page fault, 4 are general protection fault, 2
> > other. I though maybe RAM size influences this, but page faults have 2G, 4G,
> > 6G, 8G Dom0, general protection faults have 2G, 4G, 8G.
> >
> > I have no idea what triggers what, since stress tests and command line args
> > are more or less the same. Builds are different with patches, some debug
> > info, etc. Almost all panic traces have "rman_is_region_manager" in mid,
> > actually looking all of them together seemed interesting. I'll attach unique
> > panic traces, since some included snprintf, kvprintf as well, maybe helpful.
> > Unfortunately i do not know which version and what patches were applied.
> >
> >
> > > I've also noticed it seems to always be 'devmatch' the process that
> > > triggers the panic.
> >
> > Yes, it seems to be the case most of the time. There are 3 cases when
> > process is "xbbd* taskq". 2 cases with 2G RAM, 1 with 6G.
> >
> >
> > > I've been able to get a better trace with gdb and your debug symbols,
> > > and this is:
> > >
> > > (gdb) info line *0xffffffff80c6a2b2
> > > Line 1386 of "/usr/src/sys/kern/subr_bus.c" starts at address
> > 0xffffffff80c6a2b2 <device_get_name+18>
> > >     and ends at 0xffffffff80c6a2b6 <device_get_name+22>.
> > > (gdb) info line *0xffffffff80c86ed1
> > > Line 1052 of "/usr/src/sys/kern/subr_rman.c" starts at address
> > 0xffffffff80c86ecc <sysctl_rman+540>
> > >     and ends at 0xffffffff80c86ed5 <sysctl_rman+549>.
> >
> > This is a nice find!
> >
> >
> > > I'm trying to figure out how the device could be removed or
> > > disconnected from the rman. I will try to create a patch to catch the
> > > device that leaves rman regions when destroyed/removed.
> >
> > Okay, i'll apply when it will be possible.
> >
> > I did run xen-debug on system with applied blkback.patch as you sent in next
> > message to this.
> >
> > System had panic with new trace:
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 2; apic id = 04
> > fault virtual address = 0xa4
> > fault code            = supervisor read data, page not present
> > instruction pointer   = 0x20:0xffffffff80c90ed0
> > stack pointer         = 0x28:0xfffffe0051927ab0
> > frame pointer         = 0x28:0xfffffe0051927ad0
> > code segment          = base 0x0, limit 0xfffff, type 0x1b
> >                       = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags      = interrupt enabled, resume, IOPL = 0
> > current process               = 16 (xenwatch)
> > trap number           = 12
> > panic: page fault
> > cpuid = 1
> > time = 1648331592
> > KDB: stack backtrace:
> > #0 0xffffffff80c7c275 at kdb_backtrace+0x65
> > #1 0xffffffff80c2e2d1 at vpanic+0x181
> > #2 0xffffffff80c2e143 at panic+0x43
> > #3 0xffffffff810c8b97 at trap+0xba7
> > #4 0xffffffff810c8bef at trap+0xbff
> > #5 0xffffffff810c8243 at trap+0x253
> > #6 0xffffffff810a0838 at calltrap+0x8
> > #7 0xffffffff80a98515 at xbd_instance_create+0x7895
> > #8 0xffffffff80a98462 at xbd_instance_create+0x77e2
> > #9 0xffffffff80a9619b at xbd_instance_create+0x551b
> > #10 0xffffffff80f95c54 at xenbusb_localend_changed+0x7c4
> > #11 0xffffffff80ab0ef4 at xs_unlock+0x704
> > #12 0xffffffff80beaede at fork_exit+0x7e
> > #13 0xffffffff810a18ae at fork_trampoline+0xe
> >
> > cat /tmp/panic.log| sed -Ee 's/^#[0-9]* //' -e 's/ .*//' | xargs addr2line
> > -e /usr/lib/debug/boot/kernel/kernel.debug
> >
> > /usr/src/sys/kern/subr_kdb.c:443
> > /usr/src/sys/kern/kern_shutdown.c:0
> > /usr/src/sys/kern/kern_shutdown.c:844
> > /usr/src/sys/amd64/amd64/trap.c:944
> > /usr/src/sys/amd64/amd64/trap.c:0
> > /usr/src/sys/amd64/amd64/trap.c:0
> > /usr/src/sys/amd64/amd64/exception.S:292
> > /usr/src/sys/dev/xen/blkback/blkback.c:2789
> > /usr/src/sys/dev/xen/blkback/blkback.c:3431
> > /usr/src/sys/dev/xen/blkback/blkback.c:3912
> > /usr/src/sys/xen/xenbus/xenbusb_back.c:238
> > /usr/src/sys/dev/xen/xenstore/xenstore.c:1007
> > /usr/src/sys/kern/kern_fork.c:1099
> > /usr/src/sys/amd64/amd64/exception.S:1091
>
> Thanks, unfortunately that patch was incomplete. I have an updated
> version that I think is better now, and I've slightly tested it
> (creating and destroying a domain with it doesn't seem to crash).
> Appended patch at the end of the message.
>
> >
> > Full serial log in attachment.
> >
> > Thanks.
>
> > ==== COUNT: 1
> > Fatal trap 9: general protection fault while in kernel mode
> > cpuid = 0; apic id = 00
> > instruction pointer   = 0x20:0xffffffff80c45892
> > stack pointer         = 0x28:0xfffffe00d2d2b930
> > frame pointer         = 0x28:0xfffffe00d2d2b930
> > code segment          = base 0x0, limit 0xfffff, type 0x1b
> >                       = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags      = interrupt enabled, resume, IOPL = 0
> > current process               = 10984 (devmatch)
> > trap number           = 9
> > panic: general protection fault
> > cpuid = 0
> > time = 1646305680
> > KDB: stack backtrace:
> > #0 0xffffffff80c57525 at kdb_backtrace+0x65
> > #1 0xffffffff80c09f01 at vpanic+0x181
> > #2 0xffffffff80c09d73 at panic+0x43
> > #3 0xffffffff8108b1a7 at trap+0xbc7
> > #4 0xffffffff8108a66e at trap+0x8e
> > #5 0xffffffff81061b18 at calltrap+0x8
> > #6 0xffffffff80c62011 at rman_is_region_manager+0x241
> > #7 0xffffffff80c1a051 at sbuf_new_for_sysctl+0x101
> > #8 0xffffffff80c1949c at kernel_sysctl+0x43c
> > #9 0xffffffff80c19b13 at userland_sysctl+0x173
> > #10 0xffffffff80c1995f at sys___sysctl+0x5f
> > #11 0xffffffff8108baac at amd64_syscall+0x10c
> > #12 0xffffffff8106243e at Xfast_syscall+0xfe
> > Uptime: 1h15m46s
> >
> >
> > ==== COUNT: 3
> > Fatal trap 9: general protection fault while in kernel mode
> > cpuid = 0; apic id = 00
> > instruction pointer   = 0x20:0xffffffff80d0728f
> > stack pointer         = 0x28:0xfffffe00a17ea790
> > frame pointer         = 0x28:0xfffffe00a17ea790
> > code segment          = base 0x0, limit 0xfffff, type 0x1b
> >                       = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags      = interrupt enabled, resume, IOPL = 0
> > current process               = 2785 (devmatch)
> > trap number           = 9
> > panic: general protection fault
> > cpuid = 1
> > time = 1646419029
> > KDB: stack backtrace:
> > #0 0xffffffff80c57525 at kdb_backtrace+0x65
> > #1 0xffffffff80c09f01 at vpanic+0x181
> > #2 0xffffffff80c09d73 at panic+0x43
> > #3 0xffffffff8108b1a7 at trap+0xbc7
> > #4 0xffffffff8108a66e at trap+0x8e
> > #5 0xffffffff81061b18 at calltrap+0x8
> > #6 0xffffffff80c5da17 at kvprintf+0x1007
> > #7 0xffffffff80c5e719 at snprintf+0x59
> > #8 0xffffffff80c6204b at rman_is_region_manager+0x27b
> > #9 0xffffffff80c1a051 at sbuf_new_for_sysctl+0x101
> > #10 0xffffffff80c1949c at kernel_sysctl+0x43c
> > #11 0xffffffff80c19b13 at userland_sysctl+0x173
> > #12 0xffffffff80c1995f at sys___sysctl+0x5f
> > #13 0xffffffff8108baac at amd64_syscall+0x10c
> > #14 0xffffffff8106243e at Xfast_syscall+0xfe
>
> > Unique on "current process" and trace fingerprint #0-#*.
> >
> > ==== COUNT: 23
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address = 0x22710028
> > fault code    = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff80c45892
> > stack pointer         = 0x28:0xfffffe0096600930
> > frame pointer         = 0x28:0xfffffe0096600930
> > code segment    = base 0x0, limit 0xfffff, type 0x1b
> >       = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags  = interrupt enabled, resume, IOPL = 0
> > current process   = 1496 (devmatch)
> > trap number   = 12
> > panic: page fault
> > cpuid = 0
> > time = 1646123791
> > KDB: stack backtrace:
> > #0 0xffffffff80c57525 at kdb_backtrace+0x65
> > #1 0xffffffff80c09f01 at vpanic+0x181
> > #2 0xffffffff80c09d73 at panic+0x43
> > #3 0xffffffff8108b1a7 at trap+0xbc7
> > #4 0xffffffff8108b1ff at trap+0xc1f
> > #5 0xffffffff8108a85d at trap+0x27d
> > #6 0xffffffff81061b18 at calltrap+0x8
> > #7 0xffffffff80c62011 at rman_is_region_manager+0x241
> > #8 0xffffffff80c1a051 at sbuf_new_for_sysctl+0x101
> > #9 0xffffffff80c1949c at kernel_sysctl+0x43c
> > #10 0xffffffff80c19b13 at userland_sysctl+0x173
> > #11 0xffffffff80c1995f at sys___sysctl+0x5f
> > #12 0xffffffff8108baac at amd64_syscall+0x10c
> > #13 0xffffffff8106243e at Xfast_syscall+0xfe
> >
> >
> >
> > ==== COUNT: 2
> > current process   = 20284 (devmatch)
> > trap number   = 12
> > panic: page fault
> > cpuid = 3
> > time = 1647247618
> > KDB: stack backtrace:
> > #0 0xffffffff80c7c615 at kdb_backtrace+0x65
> > #1 0xffffffff80c2e621 at vpanic+0x181
> > #2 0xffffffff80c2e493 at panic+0x43
> > #3 0xffffffff810c8b97 at trap+0xba7
> > #4 0xffffffff810c8bef at trap+0xbff
> > #5 0xffffffff810c8243 at trap+0x253
> > #6 0xffffffff810a09d8 at calltrap+0x8
> > #7 0xffffffff80c82c77 at kvprintf+0x1007
> > #8 0xffffffff80c83a09 at snprintf+0x59
> > #9 0xffffffff80c8729b at rman_is_region_manager+0x27b
> > #10 0xffffffff80c3ee81 at sbuf_new_for_sysctl+0x101
> > #11 0xffffffff80c3e2cc at kernel_sysctl+0x3ec
> > #12 0xffffffff80c3e943 at userland_sysctl+0x173
> > #13 0xffffffff80c3e78f at sys___sysctl+0x5f
> > #14 0xffffffff810c949c at amd64_syscall+0x10c
> > #15 0xffffffff810a12eb at Xfast_syscall+0xfb
>
> Thanks, those all seem to be related to a device being removed without
> cleaning it's rman regions properly. So far I've spotted an issue in
> blkback in this regard, but I wouldn't discard other issues in either
> blkback or netback. Let's see if the updated blkback patch makes a
> difference now.
>
> > ==== COUNT: 2
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 1; apic id = 02
> > fault virtual address = 0x68
> > fault code    = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff824a599d
> > stack pointer         = 0x28:0xfffffe00b1e27910
> > frame pointer         = 0x28:0xfffffe00b1e279b0
> > code segment    = base 0x0, limit 0xfffff, type 0x1b
> >       = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags  = interrupt enabled, resume, IOPL = 0
> > current process   = 0 (xbbd7 taskq)
> > trap number   = 12
> > panic: page fault
> > cpuid = 1
> > time = 1646122723
> > KDB: stack backtrace:
> > #0 0xffffffff80c57525 at kdb_backtrace+0x65
> > #1 0xffffffff80c09f01 at vpanic+0x181
> > #2 0xffffffff80c09d73 at panic+0x43
> > #3 0xffffffff8108b1a7 at trap+0xbc7
> > #4 0xffffffff8108b1ff at trap+0xc1f
> > #5 0xffffffff8108a85d at trap+0x27d
> > #6 0xffffffff81061b18 at calltrap+0x8
> > #7 0xffffffff8248935a at dmu_read+0x2a
> > #8 0xffffffff82456a3a at zvol_geom_bio_strategy+0x2aa
> > #9 0xffffffff80a7f214 at xbd_instance_create+0xa394
> > #10 0xffffffff80a7b1ea at xbd_instance_create+0x636a
> > #11 0xffffffff80c6b1c1 at taskqueue_run+0x2a1
> > #12 0xffffffff80c6c4dc at taskqueue_thread_loop+0xac
> > #13 0xffffffff80bc7e3e at fork_exit+0x7e
> > #14 0xffffffff81062b9e at fork_trampoline+0xe
> >
> >
> > ==== COUNT: 1
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 1; apic id = 02
> > fault virtual address = 0x148
> > fault code    = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff8248cef4
> > stack pointer         = 0x28:0xfffffe009941d9a0
> > frame pointer         = 0x28:0xfffffe009941d9a0
> > code segment    = base 0x0, limit 0xfffff, type 0x1b
> >       = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags  = interrupt enabled, resume, IOPL = 0
> > current process   = 0 (xbbd1 taskq)
> > trap number   = 12
> > panic: page fault
> > cpuid = 1
> > time = 1646129773
> > KDB: stack backtrace:
> > #0 0xffffffff80c57525 at kdb_backtrace+0x65
> > #1 0xffffffff80c09f01 at vpanic+0x181
> > #2 0xffffffff80c09d73 at panic+0x43
> > #3 0xffffffff8108b1a7 at trap+0xbc7
> > #4 0xffffffff8108b1ff at trap+0xc1f
> > #5 0xffffffff8108a85d at trap+0x27d
> > #6 0xffffffff81061b18 at calltrap+0x8
> > #7 0xffffffff825cb76e at zil_open+0xe
> > #8 0xffffffff82456d02 at zvol_ensure_zilog+0xb2
> > #9 0xffffffff82456818 at zvol_geom_bio_strategy+0x88
> > #10 0xffffffff80a7f214 at xbd_instance_create+0xa394
> > #11 0xffffffff80a7b1ea at xbd_instance_create+0x636a
> > #12 0xffffffff80c6b1c1 at taskqueue_run+0x2a1
> > #13 0xffffffff80c6c4dc at taskqueue_thread_loop+0xac
> > #14 0xffffffff80bc7e3e at fork_exit+0x7e
> > #15 0xffffffff81062b9e at fork_trampoline+0xe
>
> Hm, those last ones are in ZFS code, can you try to get the line
> numbers for those?
>
> Maybe it's blkback providing bad data to the disk open functions.
>
> Since you are doing so much testing, it might make sense for you to
> use a debug FreeBSD kernel rather than a production one (one with
> WITNESS and INVARIANTS enabled).
>
> Thanks, Roger.
>
> ---8<---
> diff --git a/sys/dev/xen/blkback/blkback.c b/sys/dev/xen/blkback/blkback.c
> index 33414295bf5e..4007a93a54c7 100644
> --- a/sys/dev/xen/blkback/blkback.c
> +++ b/sys/dev/xen/blkback/blkback.c
> @@ -2774,19 +2774,12 @@ xbb_free_communication_mem(struct xbb_softc *xbb)
>  static int
>  xbb_disconnect(struct xbb_softc *xbb)
>  {
> -       struct gnttab_unmap_grant_ref  ops[XBB_MAX_RING_PAGES];
> -       struct gnttab_unmap_grant_ref *op;
> -       u_int                          ring_idx;
> -       int                            error;
> -
>         DPRINTF("\n");
>
> -       if ((xbb->flags & XBBF_RING_CONNECTED) == 0)
> -               return (0);
> -
>         mtx_unlock(&xbb->lock);
>         xen_intr_unbind(&xbb->xen_intr_handle);
> -       taskqueue_drain(xbb->io_taskqueue, &xbb->io_task);
> +       if (xbb->io_taskqueue != NULL)
> +               taskqueue_drain(xbb->io_taskqueue, &xbb->io_task);
>         mtx_lock(&xbb->lock);
>
>         /*
> @@ -2796,19 +2789,28 @@ xbb_disconnect(struct xbb_softc *xbb)
>         if (xbb->active_request_count != 0)
>                 return (EAGAIN);
>
> -       for (ring_idx = 0, op = ops;
> -            ring_idx < xbb->ring_config.ring_pages;
> -            ring_idx++, op++) {
> -               op->host_addr    = xbb->ring_config.gnt_addr
> -                                + (ring_idx * PAGE_SIZE);
> -               op->dev_bus_addr = xbb->ring_config.bus_addr[ring_idx];
> -               op->handle       = xbb->ring_config.handle[ring_idx];
> -       }
> +       if (xbb->flags & XBBF_RING_CONNECTED) {
> +               struct gnttab_unmap_grant_ref  ops[XBB_MAX_RING_PAGES];
> +               struct gnttab_unmap_grant_ref *op;
> +               unsigned int ring_idx;
> +               int error;
> +
> +               for (ring_idx = 0, op = ops;
> +                    ring_idx < xbb->ring_config.ring_pages;
> +                    ring_idx++, op++) {
> +                       op->host_addr    = xbb->ring_config.gnt_addr
> +                                        + (ring_idx * PAGE_SIZE);
> +                       op->dev_bus_addr = xbb->ring_config.bus_addr[ring_idx];
> +                       op->handle       = xbb->ring_config.handle[ring_idx];
> +               }
>
> -       error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ops,
> -                                         xbb->ring_config.ring_pages);
> -       if (error != 0)
> -               panic("Grant table op failed (%d)", error);
> +               error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, ops,
> +                                                 xbb->ring_config.ring_pages);
> +               if (error != 0)
> +                       panic("Grant table op failed (%d)", error);
> +
> +               xbb->flags &= ~XBBF_RING_CONNECTED;
> +       }
>
>         xbb_free_communication_mem(xbb);
>
> @@ -2839,7 +2841,6 @@ xbb_disconnect(struct xbb_softc *xbb)
>                 xbb->request_lists = NULL;
>         }
>
> -       xbb->flags &= ~XBBF_RING_CONNECTED;
>         return (0);
>  }

Hello,

I applied given patch, i did not have enough time to test thoroughly,
but for 3 hours system was running without panic whereas previously it
would crash in around 1,5 hours in similar settings. Till Thursday i
will not be able to test.

About those ZFS panic traces, i will try to get line numbers, but the
problem is that i do not have /usr/lib/debug/boot/kernel/kernel.debug
for FreeBSD 13.0-RELEASE-p7. I tried on laptop's VirtualBox to set up
13.0-RELEASE, but freebsd-update now updates to -p10 version not -p7,
and i did not find a way to to get -p7. It seems to be unsupported
feature.

What do you mean to use debug kernel with WITNESS and INVARIANTS? To
build custom kernel GENERIC + add those two options or is there a
common kernel build config used by devs that already includes those
options?

Thanks.