Re: Stable/13 doesn't boot with xen

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Tue, 07 Jun 2022 07:49:00 UTC

On Mon, Jun 06, 2022 at 11:18:09PM -0700, Brian Buhrow wrote:
> 	hello.  Following up on my recent post, here is the log of the failed shutdown from
>  FreeBSD head.
> This probably doesn't tell you everything, but hopefully it will give you  a clue.
> 
> -thanks
> -Brian
> 
> 
> Jun  6 10:25:34 xen-lothlorien shutdown[1582]: reboot by buhrow:
> Stopping sshd.
> Waiting for PIDS: 1292.
> Stopping cron.
> Waiting for PIDS: 1276.
> Stopping devd.
> Waiting for PIDS: 537.
> Writing entropy file: .
> Writing early boot entropy file: .
> Terminated
> .
> Jun  6 10:25:34 xen-lothlorien syslogd: exiting on signal 15
> Waiting (max 60 seconds) for system process `vnlru' to stop... done
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining... 2 3 2 1 0 done
> All buffers synced.
> Uptime: 39m13s
> GEOM_MIRROR: Device gptback: provider destroyed.
> GEOM_MIRROR: Device gptswap: provider destroyed.
> GEOM_MIRROR: Device gptswap destroyed.
> GEOM_MIRROR: Device gptroot: provider destroyed.
> GEOM_MIRROR: Device gptroot destroyed.
> uhub4: detached
> uhub0: detached
> uhub2: detached
> uhub3: detached
> uhub1: detached
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x188
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff82cafe74
> stack pointer           = 0x28:0xfffffe00d18a9bb0
> frame pointer           = 0x28:0xfffffe00d18a9bb0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (xbbd0 taskq)
> trap number             = 12
> panic: page fault
> cpuid = 0
> time = 1654536343
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d18a9970
> vpanic() at vpanic+0x151/frame 0xfffffe00d18a99c0
> panic() at panic+0x43/frame 0xfffffe00d18a9a20
> trap_fatal() at trap_fatal+0x387/frame 0xfffffe00d18a9a80
> trap_pfault() at trap_pfault+0xab/frame 0xfffffe00d18a9ae0
> calltrap() at calltrap+0x8/frame 0xfffffe00d18a9ae0
> --- trap 0xc, rip = 0xffffffff82cafe74, rsp = 0xfffffe00d18a9bb0, rbp = 0xfffffe00d18a9bb0 ---
> dmu_objset_zil() at dmu_objset_zil+0x4/frame 0xfffffe00d18a9bb0
> zil_open() at zil_open+0xf/frame 0xfffffe00d18a9bd0
> zvol_ensure_zilog() at zvol_ensure_zilog+0xf1/frame 0xfffffe00d18a9bf0
> zvol_geom_bio_strategy() at zvol_geom_bio_strategy+0x90/frame 0xfffffe00d18a9c70
> xbb_dispatch_dev() at xbb_dispatch_dev+0x274/frame 0xfffffe00d18a9d20
> xbb_run_queue() at xbb_run_queue+0xbf5/frame 0xfffffe00d18a9e40

This seems to be caused by a stale blkback thread that's still active
after the underlying device has been shut down.

Can you provide a bit more information about how to reproduce the
issue?

I assume you are shutting down with at least one guest still active
and doing I/O?

Which command are you using to reboot the system?

Thanks, Roger.