Re: git: a695ac2ce8bc - main - arm64: Move intr_pic_init_secondary earlier
- In reply to: John Baldwin : "Re: git: a695ac2ce8bc - main - arm64: Move intr_pic_init_secondary earlier"
Date: Sat, 22 Nov 2025 22:00:59 UTC
> On 22. Nov 2025, at 17:47, John Baldwin <jhb@freebsd.org> wrote:
>
> On 11/18/25 13:02, Andrew Turner wrote:
>> The branch main has been updated by andrew:
>> URL: https://cgit.FreeBSD.org/src/commit/?id=a695ac2ce8bc8e8b989359002659063f2e056dcf
>> commit a695ac2ce8bc8e8b989359002659063f2e056dcf
>> Author: Andrew Turner <andrew@FreeBSD.org>
>> AuthorDate: 2025-11-18 18:00:32 +0000
>> Commit: Andrew Turner <andrew@FreeBSD.org>
>> CommitDate: 2025-11-18 18:00:32 +0000
>> arm64: Move intr_pic_init_secondary earlier
>> This may have been called after intr_irq_shuffle. For most interrupt
>> controllers this appears to be safe, however for the GICv5 we need to
>> read a per-CPU ID register before we can assign interrupts to a given
>> CPU.
>> Fix the race by moving intr_pic_init_secondary earlier in the boot,
>> after devices have been enumerated and before the interrupts are moved
>> to their assigned CPUs.
>> Sponsored by: Arm Ltd
>> Differential Revision: https://reviews.freebsd.org/D53685
>
> This reliably panics on boot on an Ampere Altra system I have access to.
I think this also affects FreeBSD running under VMware Fusion or VirtualBox on Arm-based Macs.
Booting in safe mode always worked, and booting under QEMU did not show any problem.
Best regards
Michael
> Unfortunately the panic isn't very helpful as multiple CPUs panic at once
> cluttering the console and there appear to be secondary panics in the
> console code that obscure whatever the original panic is. A few sample
> crashes below:
>
> pci24: <PCI bus> numa-domain 0 on pcib24
> cpu0: <ACPI CPU> on acpi0
> armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
> Fa t xal d x0: 0xffff0000:<E5><FF> 0 x0FFF Fpaatnailc :d ata abormt:t
> x_ Asser0ion p->tp_row < t->t_winsize.tp_row failed at /usr/src/sys/teke dttken
> .cr103
> puid = -65536
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> KDB: enter: panic
> panic: kdb_backend_permitted: missing cred for 0xffff0000455b21a0
> cpuid = -65536
> time = 1
> ...
>
> pcib24: <PCI-PCI bridge> at device 7.0 numa-domain 0 on pci20
> pci24: <PCI bus> numa-domain 0 on pcib24
> cpu0: <ACPI CPU> on acpi0
> armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
> Fatal data abo rxt0:: 0xFfaftf lFaFF x0: 0x0000000096000004
> x1: 0xffff0000454c3640 (crypto_dev + 0x43a95f80)
> x2: 0x0000000096000004
> x3: 0x0000000096000504
> x4: 0xffff0000454c3590 (crypto_dev + 0x43a95ed0)
> x5: 0xffff00000088881c (handle_el1h_sync + 0x1c)
> x6: 0x0000000000000000
> x7: 0xffff00000088881c (handle_el1h_sync + 0x1c)
> x8: 0x00000000f0c1a000
> x9: 0x0000000000000620
> x10: 0x000000000x0:pa00
> x11: 0x000 000000000500
> x12: 0x0000000096000004
> x13: 0xffff0000454c36e0 (crypto_dev + 0x43a96020)
> x14: 0xffff0000454c3610 (crypto_dev + 0x43a95f50)
> x15: 0xffff00000088881c (handle_el1h_sync + 0x1c)
> x16: 0xffff0000008b59e4 (data_abort + 0x158)
> x17: 0x00000000804000c9
> x18: 0xffff00004553a000 (crypto_dev + 0x43b0c940)
> x19: 0xffff0000454c3640 (crypto_dev + 0x43a95f80)
> x20: 0x0000000096000004
> x21: 0x0000000096000504
> x22: 0x0000000096000004
> x23: 0x0000000000000620
> x24: 0x00000000f0c1a000
> x25: 0x0000000000000000
> x26: 0xffff000000000000
> x27: 0xffff000000a318d6 (notify.prefix + 0x3e2a5)
> x28: 0xffff000000a02aa1 (notify.prefix + 0xf470)
> x29: 0xffff000000b77488 (abort_handlers + 0x0)
> sp: 0xffff0000454c3570
> lr: 0xffff00000088881c (handle_el1h_sync + 0x1c)
> elr: 0xffff0000008b59e4 (data_abort + 0x158)
> spsr: 0x00000000804000c9
> far: 0x0000000096000504
> esr: 0x0000000096000004
> panic: data abort with spinlock held (spinlock count 356126888 != 0)
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> data_abort() at data_abort+0x3a0
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0x96000004
> data_abort() at data_abort+0x158
> (null)() at -0x4
> WARNING: D-cacheline size mismatch 64 != 1024
> WARNING: I-cacheline size mismatch 64 != 16384
> WARNING: D-cacheline size mismatch 64 != 8192
> WARNING: D-cacheline size mismatch 64 != 8
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 2048
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 1024
> WARNING: I-cacheline size mismatch 64 != 16384
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 8192
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 128
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 512
> WARNING: I-cacheline size mismatch 64 != 1024
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 8
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 2048
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 1024
> WARNING: I-cacheline size mismatch 64 != 16384
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 8192
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 2048
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 1024
> WARNING: I-cacheline size mismatch 64 != 16384
> WARNING: D-cacheline size mismatch 64 != 4
> WARNING: I-cacheline size mismatch 64 != 4
> WARNING: D-cacheline size mismatch 64 != 8192
> WARNING: I-cacheline size mismatch 64 != 4
> Fatal data abort:
> x0: 0x0000000096000504
> x1: 0xffff0000015e7ef6 ($d + 0x46)
> x2: 0x00000000000000df
> x3: 0x0000000000000074
> x4: 0x0000000000000000
> x5: 0x020f352e0d060319
> x6: 0x0000000000000004
> x7: 0x656e6f7a5f716b73
> x8: 0x0101010101010101
> x9: 0x0000000000000003
> x10: 0xfffeffff6b5e79f2
> x11: 0x0000000000000001
> x12: 0x0000000000000000
> x13: 0x0000000000000017
> x14: 0x0000080080000000
> x15: 0xffff000000b73548 (mvfr1_fields + 0x0)
> x16: 0xffff0000018edd30 (__stop_set_modmetadata_set + 0xf00)
> x17: 0xffff000000831d3c (uma_zcreate + 0x0)
> x18: 0xffff0000011bc900 (pcpu0 + 0x0)
> x19: 0xffff000116200200
> x20: 0xffff000000e5b9c8 (initstack + 0x39c8)
> x21: 0xffff0000015e7ef6 ($d + 0x46)
> x22: 0xffff0000404cd200 (crypto_dev + 0x3ea9fb40)
> x23: 0x0000000000000000
> x24: 0xffff00004548b000 (crypto_dev + 0x43a5d940)
> x25: 0xffff0000018b7128 (system_taskq_init_sys_init + 0x0)
> x26: 0xffff0000010bd478 (mp_ncpus + 0x0)
> x27: 0x0000000003800000
> x28: 0xffff00000103b000 (g_bio_run_down + 0x30)
> x29: 0xffff000000e5b8b0 (initstack + 0x38b0)
> sp: 0xffff000000e5b880
> lr: 0xffff0000008313e0 (zone_ctor + 0xd8)
> elr: 0xffff0000008b38f8 (strcmp + 0x98)
> spsr: 0x0000000000400009
> far: 0x0000000096000504
> esr: 0x0000000096000004
> panic: vm_fault failed: 0xffff0000008b38f8 error 1
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> data_abort() at data_abort+0x28c
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0x96000004
> strcmp() at strcmp+0x98
> item_ctor() at item_ctor+0x218
> zone_alloc_item() at zone_alloc_item+0x140
> uma_zcreate() at uma_zcreate+0xa4
> system_taskq_init() at system_taskq_init+0x10c
> mi_startup() at mi_startup+0x1f4
> virtdone() at virtdone+0x74
> KDB: enter: panic
> KDBh rcdntnrc0i^Mtpa1ick0spi
> Siot ece_ opanic ankc:tmtt_ckanic:cpamppa :p sc p tal cp ix: i a 0:nixpfpfnm0npan2c5a0nippp: (ntc: npteap p nia: mnxap1:i0:0m00anpa00p0p_pon
> p p2p 0papfn:0n000p0nab: mppppanip_nic:_mtx.loc p nx1: 3p)_
> o k_s an: 0ec0ra0dpapmpic
> ppan::pxpac0 00x_lo0k_nic
> pax5n pxp0npa0 00x_lo050^Mpinx6p 0xpmffc:0mt0_lb128spnpa (nvnr anapa mp4p0n
> mt:alxp0pic: m3a_0p04p
> pap8_ capani0mp0n1c:amt0apppantnr ad0_lp p ni0np
> anp9:mpac:p0000: nix0pap
> cp0:pcnpcp pan0c: m00pan
> an1np0nip0np pa00pp0na0:cp p2: nxtp00tpan00: 0tap
> xnic pxpp000c00mt0p0o0fpppxc4c m0xi000c: 0p0p0nca
> sapa: 0x00 ap00i0:0map00^Mpix: anx_lpfficp0ni3:cmd0pp antc:rmnan: :impu_pa 0nic:
> :a1ac:pa0m0:p0nip: mtx_0a
> icni:a anfnp0:p00p1bpmp0_lpippamic:pmta_l)ck s19pi0x00npc: m0n0c:ppan
> nxpa:nicf fpx0000k_np3nicpipa(icapapsnpr:_can_leck+ac:0)
> tx2:p picpf:00p0pan1c00mtx_lock_spinv rnvnpnpoptxpace4apepa ptxplp3ni^M:pnp2n c:fana0ic:0mtx_0ppaip:nppapreanipsc: 0apanp
> xc:capxc: mpip0n0n:am0xpppipanicpictpan_parnpbpo trpapptpbpa_scapap ninicp mp24pacic:fn0a00c1 mlxcpanic:pmcx_loci_bp+ 0plo)^Minpp5p nxcf ft00lopa6can0pppanp : o_expcpenfep pvapaocnii:apap c:net9anp
> px26pa0apap000cp0panpanic
> nxn7: 0ni0:0p0nic:i0n m0
> panic atxfp00ic: m5a_3ock_sppn: itauacad+c:pap38)^Mnpx29::amtpap000appnnap0nicpipanic:tnixcp ni0:a70pa^Mipani p0xfffcp0n0c0 op7nin
> pplp:n0p:9max_9ocka5pan1
> pelp: ox49c2dm9xbcapan:1^Mpspxrp nipa0apan0c0 a0xcla
> f:i: x4pandc:abca5anic
> mapini inknicpakaraplpa ctxtinnpac2 mtxpana1icxnan: 0t0
> ock_d pax
> panic= 1t
> panic= 1t
> pDn:nipankcb mtxp:cenp
> : nppb_nraca_spapataaic:npapbnipaci_mexp^Mpicppp_prac _nilf mraploc(_ atn:precnicncc_ mtf_lrpp_icipxni
> pmtp_ppanic(ppnncppptp_lini +0xn:p
> iapanpcn ct)_lpapppmppnicn mpanix_pppppappan1p_apnp(p c:nppppppx_nil1 mtxnl+cxpsp^Mnpcnicp papln:el papyncc: acx_ppcpanacdlp_eic:_spnpanip8pap-cpanceapanic estx_loaka0pc0n
> pipapac: )tananipppp_anacppn92nic: 0t1p
> picic:ppriic:(mtxtpockpspvnrcn apa_apa
> ia:ipecp1n
> papppppppppppppaanic:mmtx_occk_spin: recursed o nnnreecursivemmutex trmlck @ /usrssrc/sys/krrn/sbbr_temmina.c::605
>
> ccpuid = 05
> ti^Me i 1
> KDB: DB:ck bakkbrace:
> e:
> db_tract_sel_() atb_trdb_tracf_self
> e: db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnputc() at termcn_cnputc+0x2c
> cnputc() at cnputc+0xa0
> kvprintf() at kvprintf+0xa4
> _vprintf() at _vprintf+0x78
> printf() at printf+0x58
> vpanic() at vpanic+0x26c
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnput<FF>NOTICE: DRAM FW version 211207
> ...
>
>
> pci24: <PCI bus> numa-domain 0 on pcib24
> cpu0: <ACPI CPU> on acpi0
> armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
> <C9>p a nxip cx:0aF n:a it0caxl:f F<DF> panic: stack overflow detected; back
> trace may be corrupted
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> __stack_chk_fail() at __stack_chk_fail+0x14
> msgbuf_addst (x0: 0x00000s0b<E0> x0dd0xr00004000000000ul
> x1: 0x0x
> KDB: enter: panic
> KDB: KeKK KKKKlKKKnKKK KKKKKpKKaKKKKKcKKpKKaKKKKKKKKKKKKKKKpKKKKKKppKKKKpKKKpKK
> KKKKKKKKKKKKKKKpKKKKpKKKKKKKpKKKKKKpKKpKpKKpKKKpKpKpppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppanic: mtx_lock_spin: eecrrsed on non-recursive mutxx trmlkk @ (null)x//yys/kern/uubrtterminl..c:60
>
>
> cpuid =-65
> tim
> = 1
> KDB: sB:ck aacktaacer
> e:
> db_trace_aelf() adb_trdb_trace^Mself
> e: ds_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1d0
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnputc() at termcn_cnputc+0x2c
> cnputc() at cnputc+0xa0
> kvprintf() at kvprintf+0xa4
> _vprintf() at _vprintf+0x78
> printf() at printf+0x58
> vpanic() at vpanic+0x26c
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnputc() at termcn_cnputc+0x2c
> cnputc() at cnputc+0xa0
> kvprintf() at kvprintf+0xa4
> _vprintf() at _vprintf+0x78
> printf() at printf+0x58
> vpanic() at vpanic+0x26c
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnputc() at termcn_cnputc+0x2c
> cnputc() at cnputc+0xa0
> kvprintf() at kvprintf+0xa4
> _vprintf() at _vprintf+0x78
> printf() at printf+0x58
> vpanic() at vpanic+0x26c
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnputc() at termcn_cnputc+0x2c
> cnputc() at cnputc+0xa0
> kvprintf() at kvprintf+0xa4
> _vprintf() at _vprintf+0x78
> printf() at printf+0x58
> vpanic() at vpanic+0x26c
> panic() at panic+0x48
> __mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
> termcn_cnputc() at termcn_cnputc+0x2c
> cnputc() at cnputc+0xaNOTICE: DRAM FW version 211207
>
> I do see gic0 attached in dmesg before each of the crashes.
>
> Hmm, this tries to use spin locks in the gic driver before curthread is
> set and that's probably not going to work.
>
> Indeed, the fix below lets my box boot again:
>
> diff --git a/sys/arm64/arm64/mp_machdep.c b/sys/arm64/arm64/mp_machdep.c
> index ba673ce9d6ee..5fd5197b6818 100644
> --- a/sys/arm64/arm64/mp_machdep.c
> +++ b/sys/arm64/arm64/mp_machdep.c
> @@ -270,6 +270,10 @@ init_secondary(uint64_t cpu)
>  	install_cpu_errata();
>  	enable_cpu_feat(CPU_FEAT_AFTER_DEV);
>
> +	/* Initialize curthread */
> +	KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
> +	pcpup->pc_curthread = pcpup->pc_idlethread;
> +
>  	intr_pic_init_secondary();
>
>  	/* Signal we are done */
> @@ -279,9 +283,6 @@ init_secondary(uint64_t cpu)
>  	while (!atomic_load_int(&aps_ready))
>  		__asm __volatile("wfe");
>
> -	/* Initialize curthread */
> -	KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
> -	pcpup->pc_curthread = pcpup->pc_idlethread;
>  	schedinit_ap();
>
>  	/* Initialize curpmap to match TTBR0's current setting. */
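
To illustrate why the ordering in the patch above matters, here is a minimal user-space sketch in C. It is not kernel code: every model_* name is hypothetical, and it only assumes, as the analysis above suggests, that taking a spin lock records the current thread as owner and bumps a per-thread spin lock count. With no current thread set there is nothing valid to update; the model reports that case as an error, whereas a real kernel would dereference whatever pointer happens to be there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct thread and struct pcpu. */
struct model_thread {
	int spinlock_count;	/* nesting depth of held spin locks */
};

struct model_pcpu {
	struct model_thread *curthread;	/* NULL until the AP sets it */
	struct model_thread *idlethread;
};

struct model_spinlock {
	struct model_thread *owner;	/* NULL when unlocked */
};

/*
 * Acquire a spin lock on behalf of the CPU's current thread.
 * Returns 0 on success, -1 if there is no current thread to own the lock.
 * (This check exists only in the model; it makes the failure observable.)
 */
static int
model_spin_lock(struct model_pcpu *pc, struct model_spinlock *l)
{
	struct model_thread *td = pc->curthread;

	if (td == NULL)
		return (-1);
	td->spinlock_count++;
	l->owner = td;
	return (0);
}

static void
model_spin_unlock(struct model_pcpu *pc, struct model_spinlock *l)
{
	assert(l->owner == pc->curthread);
	l->owner = NULL;
	pc->curthread->spinlock_count--;
}

/* Stand-in for a PIC driver's per-CPU init that takes a spin lock. */
static int
model_pic_init_secondary(struct model_pcpu *pc, struct model_spinlock *l)
{
	if (model_spin_lock(pc, l) != 0)
		return (-1);
	model_spin_unlock(pc, l);
	return (0);
}

/*
 * Bring up one modelled secondary CPU. When curthread_first is true,
 * curthread is set before the PIC init, as in the patch; otherwise the
 * PIC init runs first and curthread is only set afterwards.
 */
static int
model_init_secondary(bool curthread_first)
{
	struct model_thread idle = { 0 };
	struct model_pcpu pc = { NULL, &idle };
	struct model_spinlock lock = { NULL };
	int error;

	if (curthread_first)
		pc.curthread = pc.idlethread;
	error = model_pic_init_secondary(&pc, &lock);
	if (!curthread_first)
		pc.curthread = pc.idlethread;
	return (error);
}
```

Calling model_init_secondary(false) models the pre-patch ordering and returns -1, while model_init_secondary(true) models the patched ordering and returns 0.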
>
> --
> John Baldwin
>
>