Re: Stable/13 doesn't boot with xen

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Tue, 07 Jun 2022 09:32:57 UTC
On Sat, Jun 04, 2022 at 01:03:55PM -0700, Brian Buhrow wrote:
> 	Hello.  In the process of getting the netback.c patch into a working kernel on the system
> I'm updating, I discovered that Stable/13 doesn't boot successfully when xen-4.16 is running.
> Specifically, the system hangs while configuring the PCI busses.  I suspect it has something to
> do with the failure of the hpet timers to attach.
> I do not know if it works with older versions of the xen hypervisor, but it boots and runs fine
> without xen running.  Below are two boot logs, the first with xen installed and running, the
> one that fails, and the successful boot, with FreeBSD running on bare metal.
> Hopefully this is a known issue and there is an easy fix.
> 
> -thanks
> -Brian
> 
> Version info:
> 
> FreeBSD xen-lothlorien.nfbcal.org 13.1-STABLE FreeBSD 13.1-STABLE #0 stable/13-984a45d77: Sat Jun  4 07:38:09 PDT 2022     buhrow@fbsd_dev.nfbcal.org:/usr/home/buhrow/obj/usr/home/buhrow/src/fbsd-src/13/amd64.amd64/sys/GENERIC amd64
> 
> <broken boot log with xen>
> 
> BIOS drive C: is disk0
> BIOS drive D: is disk1
> BIOS drive E: is disk2
> BIOS drive F: is disk3
> 
> FreeBSD/x86 bootstrap loader, Revision 1.1
> Loading /boot/defaults/loader.conf
> Loading /boot/defaults/loader.conf
> Loading /boot/device.hints
> Loading /boot/loader.conf
> Loading /boot/loader.conf.local
> Autoboot in 1 seconds. [Space] to pause 
> Loading Xen kernel...
> /boot/xen data=0x2659c8+0x13e638 -
> Loading kernel...
> /boot/kernel/kernel 
> Loading configured modules...
> /boot/kernel/ipmi.ko size 0x11950 at 0x21a6000
> loading required module 'smbus'
> /boot/kernel/smbus.ko size 0x3cb0 at 0x21b8000
> /boot/kernel/geom_mirror.ko size 0x20c80 at 0x21bc000
> /boot/kernel/tpm.ko size 0xad70 at 0x21dd000
> /boot/firmware/intel-ucode.bin -size=0x303800
> /etc/hostid size=0x25
> /boot/entropy size=0x1000
>  Xen 4.16.0
> (XEN) Xen version 4.16.0 (buhrow@) (FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)) debug=n Thu Jun  2 11:30:17 PDT 2022
> (XEN) Latest ChangeSet: 
> (XEN) Bootloader: FreeBSD Loader
> (XEN) Command line: dom0_mem=8192m dom0_max_vcpus=2 dom0=pvh pv-l1tf=off,domu=off console=com1,vga com1=9600,8n1
> (XEN) Xen image load base address: 0
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> (XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
> (XEN)  EDID info not retrieved because no DDC retrieval method detected
> (XEN) Disc information:
> (XEN)  Found 4 MBR signatures
> (XEN)  Found 4 EDD information structures
> (XEN) Xen-e820 RAM map:
> (XEN)  [0000000000000000, 00000000000997ff] (usable)
> (XEN)  [0000000000099800, 000000000009ffff] (reserved)
> (XEN)  [00000000000e0000, 00000000000fffff] (reserved)
> (XEN)  [0000000000100000, 000000001fffffff] (usable)
> (XEN)  [0000000020000000, 00000000201fffff] (reserved)
> (XEN)  [0000000020200000, 000000003fffffff] (usable)
> (XEN)  [0000000040000000, 00000000401fffff] (reserved)
> (XEN)  [0000000040200000, 00000000bc855fff] (usable)
> (XEN)  [00000000bc856000, 00000000bc85efff] (ACPI data)
> (XEN)  [00000000bc85f000, 00000000bc8a9fff] (ACPI NVS)
> (XEN)  [00000000bc8aa000, 00000000bc8b1fff] (usable)
> (XEN)  [00000000bc8b2000, 00000000bc9a4fff] (reserved)
> (XEN)  [00000000bc9a5000, 00000000bc9a6fff] (usable)
> (XEN)  [00000000bc9a7000, 00000000bcbc5fff] (reserved)
> (XEN)  [00000000bcbc6000, 00000000bcbc6fff] (usable)
> (XEN)  [00000000bcbc7000, 00000000bcbd6fff] (reserved)
> (XEN)  [00000000bcbd7000, 00000000bcbf4fff] (ACPI NVS)
> (XEN)  [00000000bcbf5000, 00000000bcc18fff] (reserved)
> (XEN)  [00000000bcc19000, 00000000bcc5bfff] (ACPI NVS)
> (XEN)  [00000000bcc5c000, 00000000bce7bfff] (reserved)
> (XEN)  [00000000bce7c000, 00000000bcffffff] (usable)
> (XEN)  [00000000bd800000, 00000000bf9fffff] (reserved)
> (XEN)  [00000000fed1c000, 00000000fed3ffff] (reserved)
> (XEN)  [00000000ff000000, 00000000ffffffff] (reserved)
> (XEN)  [0000000100000000, 000000083e5fffff] (usable)
> (XEN) New Xen image base address: 0xbc200000
> (XEN) ACPI: RSDP 000F0450, 0024 (r2  INTEL)
> (XEN) ACPI: XSDT BC856070, 0064 (r1 INTEL  DQ67SW    1072009 AMI     10013)
> (XEN) ACPI: FACP BC85DBC0, 00F4 (r4 INTEL  DQ67SW    1072009 AMI     10013)
> (XEN) ACPI: DSDT BC856168, 7A54 (r2 INTEL  DQ67SW         16 INTL 20051117)
> (XEN) ACPI: FACS BCBDBF80, 0040
> (XEN) ACPI: APIC BC85DCB8, 0072 (r3 INTEL  DQ67SW    1072009 AMI     10013)
> (XEN) ACPI: TCPA BC85DD30, 0032 (r2 INTEL  DQ67SW          1 MSFT  1000013)
> (XEN) ACPI: SSDT BC85DD68, 0102 (r1 INTEL  DQ67SW          1 MSFT  3000001)
> (XEN) ACPI: MCFG BC85DE70, 003C (r1 INTEL  DQ67SW    1072009 MSFT       97)
> (XEN) ACPI: HPET BC85DEB0, 0038 (r1 INTEL  DQ67SW    1072009 AMI.        4)
> (XEN) ACPI: ASF! BC85DEE8, 00A0 (r32 INTEL  DQ67SW          1 TFSM    F4240)
> (XEN) ACPI: DMAR BC85DF88, 00E8 (r1 INTEL  DQ67SW          1 INTL        1)
> (XEN) System RAM: 32683MB (33467896kB)
> (XEN) Domain heap initialised
> (XEN) ACPI: 32/64X FACS address mismatch in FADT - bcbdbf80/0000000000000000, using 32
> (XEN) IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
> (XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
> (XEN) PCI: Not using MCFG for segment 0000 bus 00-3f
> (XEN) Switched to APIC driver x2apic_cluster
> (XEN) CPU0: 1600 ... 3100 MHz
> (XEN) xstate: size: 0x340 and states: 0x7
> (XEN) Speculative mitigation facilities:
> (XEN)   Hardware hints:
> (XEN)   Hardware features: IBPB IBRS STIBP SSBD L1D_FLUSH
> (XEN)   Compiled-in support: INDIRECT_THUNK SHADOW_PAGING
> (XEN)   Xen settings: BTI-Thunk RETPOLINE, SPEC_CTRL: IBRS- STIBP- SSBD-, Other: IBPB L1D_FLUSH BRANCH_HARDEN
> (XEN)   L1TF: believed vulnerable, maxphysaddr L1D 46, CPUID 36, Safe address 1000000000
> (XEN)   Support for HVM VMs: MSR_SPEC_CTRL RSB EAGER_FPU
> (XEN)   Support for PV VMs: MSR_SPEC_CTRL RSB EAGER_FPU
> (XEN)   XPTI (64-bit PV only): Dom0 enabled, DomU enabled (without PCID)
> (XEN)   PV L1TF shadowing: Dom0 disabled, DomU disabled
> (XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) Initializing Credit2 scheduler
> (XEN) Platform timer is 14.318MHz HPET
> (XEN) Detected 3092.988 MHz processor.
> (XEN) Intel VT-d iommu 0 supported page sizes: 4kB
> (XEN) Intel VT-d iommu 1 supported page sizes: 4kB
> (XEN) Intel VT-d Snoop Control not enabled.
> (XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
> (XEN) Intel VT-d Queued Invalidation enabled.
> (XEN) Intel VT-d Interrupt Remapping enabled.
> (XEN) Intel VT-d Posted Interrupt not enabled.
> (XEN) Intel VT-d Shared EPT tables not enabled.
> (XEN) I/O virtualisation enabled
> (XEN)  - Dom0 mode: Relaxed
> (XEN) Interrupt remapping enabled
> (XEN) Enabled directed EOI with ioapic_ack_old on!
> (XEN) ENABLING IO-APIC IRQs
> (XEN)  -> Using old ACK method
> (XEN) Allocated console ring of 16 KiB.
> (XEN) VMX: Supported advanced features:
> (XEN)  - APIC MMIO access virtualisation
> (XEN)  - APIC TPR shadow
> (XEN)  - Extended Page Tables (EPT)
> (XEN)  - Virtual-Processor Identifiers (VPID)
> (XEN)  - Virtual NMI
> (XEN)  - MSR direct-access bitmap
> (XEN)  - Unrestricted Guest
> (XEN) HVM: ASIDs enabled.
> (XEN) VMX: Disabling executable EPT superpages due to CVE-2018-12207
> (XEN) HVM: VMX enabled
> (XEN) HVM: Hardware Assisted Paging (HAP) detected
> (XEN) HVM: HAP page sizes: 4kB, 2MB
> (XEN) Brought up 4 CPUs
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> (XEN) Dom0 has maximum 440 PIRQs
> (XEN) Bogus DMIBAR 0xfed18001 on 0000:00:00.0
> (XEN) WARNING: PVH is an experimental mode with limited functionality
> (XEN) Initial low memory virq threshold set at 0x4000 pages.
> (XEN) Scrubbing Free RAM in background
> (XEN) Std. Loglevel: Errors and warnings
> (XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
> (XEN) Xen is relinquishing VGA console.
> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
> (XEN) Freed 600kB init memory
> ---<<BOOT>>---
> Copyright (c) 1992-2021 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 13.1-STABLE #0 stable/13-984a45d77: Sat Jun  4 07:38:09 PDT 2022
>     buhrow@fbsd_dev.nfbcal.org:/usr/home/buhrow/obj/usr/home/buhrow/src/fbsd-src/13/amd64.amd64/sys/GENERIC amd64
> FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
> VT(vga): text 80x25
> XEN: Hypervisor version 4.16 detected.
> CPU microcode: no matching update found
> CPU: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (3092.99-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x206a7  Family=0x6  Model=0x2a  Stepping=7
>   Features=0x1fc3fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT>
>   Features2=0x9fba2203<SSE3,PCLMULQDQ,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,HV>
>   AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
>   AMD Features2=0x1<LAHF>
>   Structured Extended Features3=0x9c000000<IBPB,STIBP,L1DFL,SSBD>
>   XSAVE Features=0x1<XSAVEOPT>
>   AMD Extended Feature Extensions ID EBX=0x101000<IBPB>
> Hypervisor: Origin = "XenVMMXenVMM"
> real memory  = 9725026304 (9274 MB)
> avail memory = 8273047552 (7889 MB)
> Event timer "LAPIC" quality 100
> ACPI APIC Table: <INTEL  DQ67SW  >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> FreeBSD/SMP: 1 package(s) x 2 core(s)
> random: unblocking device.
> ioapic0 <Version 1.1> irqs 0-23
> Launching APs: 1
> random: entropy device external interface
> kbd1 at kbdmux0
> vtvga0: <VT VGA driver>
> smbios0: <System Management BIOS> at iomem 0xf0480-0xf049e
> smbios0: Version: 2.6, BCD Revision: 2.6
> aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
> acpi0: <INTEL DQ67SW>
> acpi0: Power Button (fixed)
> attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Event timer "i8254" frequency 1193182 Hz quality 100
> atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
> atrtc0: registered as a time-of-day clock, resolution 1.000000s
> Event timer "RTC" frequency 32768 Hz quality 0
> hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
> hpet0: memory region width 1024 too small for 32 timers
> device_attach: hpet0 attach returned 6
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> [ ... At this point, the system hangs, needing a hard reset ...]

Interesting.  I don't seem to be able to reproduce the hang, so I will
need a bit more data from your side in order to debug.

When the system hangs, can you switch to the Xen console (press Ctrl-A
three times on the serial line), then press the 'd' key and paste the
output here?

If possible, also upload your kernel.debug somewhere so I can get the
symbols (it's at /usr/lib/debug/boot/kernel/kernel.debug).

Alternatively, you can try to find the commit that broke it by using
'git bisect' between 13.1-RELEASE and the current stable/13.  13.1 is
fairly recent, so it shouldn't take many bisection steps to identify
the offending commit.
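
To give a rough idea of why few steps are needed, here is a minimal
sketch of the search that 'git bisect' performs, simplified to a linear
history.  The commit names and the is_bad() predicate are hypothetical;
in practice "testing" a commit means building that kernel and checking
whether it boots under Xen.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search an oldest-to-newest commit list for the first bad one.

    Assumes commits[0] is known good (e.g. the release) and commits[-1]
    is known bad (e.g. the current branch tip).  Returns the first bad
    commit and the number of commits that had to be tested.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return commits[hi], steps


if __name__ == "__main__":
    # Hypothetical history of 200 commits where commit 137 broke the boot.
    history = [f"c{i}" for i in range(200)]
    culprit, steps = first_bad(history, lambda c: int(c[1:]) >= 137)
    print(f"first bad commit: {culprit}, kernels tested: {steps}")
```

Each test halves the remaining range, so the number of kernels to build
grows only with the logarithm of the number of commits in between.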

Thanks, Roger.