Re: git: a695ac2ce8bc - main - arm64: Move intr_pic_init_secondary earlier

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Sat, 22 Nov 2025 16:47:20 UTC

On 11/18/25 13:02, Andrew Turner wrote:
> The branch main has been updated by andrew:
> 
> URL: https://cgit.FreeBSD.org/src/commit/?id=a695ac2ce8bc8e8b989359002659063f2e056dcf
> 
> commit a695ac2ce8bc8e8b989359002659063f2e056dcf
> Author:     Andrew Turner <andrew@FreeBSD.org>
> AuthorDate: 2025-11-18 18:00:32 +0000
> Commit:     Andrew Turner <andrew@FreeBSD.org>
> CommitDate: 2025-11-18 18:00:32 +0000
> 
>      arm64: Move intr_pic_init_secondary earlier
>      
>      This may have been called after intr_irq_shuffle. For most interrupt
>      controllers this appears to be safe, however for the GICv5 we need to
>      read a per-CPU ID register before we can assign interrupts to a given
>      CPU.
>      
>      Fix the race by moving intr_pic_init_secondary earlier in the boot,
>      after devices have been enumerated and before the interrupts are moved
>      to their assigned CPUs.
>      
>      Sponsored by:   Arm Ltd
>      Differential Revision:  https://reviews.freebsd.org/D53685

This reliably panics on boot on an Ampere Altra system I have access to.
Unfortunately, the panic output isn't very helpful: multiple CPUs panic
at once and clutter the console, and there appear to be secondary panics
in the console code that obscure whatever the original panic is.  A few
sample crashes are below:

pci24: <PCI bus> numa-domain 0 on pcib24
cpu0: <ACPI CPU> on acpi0
armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
Fa t xal  d x0: 0xffff0000:<E5><FF> 0 x0FFF   Fpaatnailc :d ata abormt:t
x_  Asser0ion p->tp_row < t->t_winsize.tp_row failed at /usr/src/sys/teke dttken
.cr103
  puid = -65536
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
KDB: enter: panic
panic: kdb_backend_permitted: missing cred for 0xffff0000455b21a0
cpuid = -65536
time = 1
...

pcib24: <PCI-PCI bridge> at device 7.0 numa-domain 0 on pci20
pci24: <PCI bus> numa-domain 0 on pcib24
cpu0: <ACPI CPU> on acpi0
armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
  Fatal data  abo rxt0:: 0xFfaftf lFaFF  x0: 0x0000000096000004
   x1: 0xffff0000454c3640 (crypto_dev + 0x43a95f80)
   x2: 0x0000000096000004
   x3: 0x0000000096000504
   x4: 0xffff0000454c3590 (crypto_dev + 0x43a95ed0)
   x5: 0xffff00000088881c (handle_el1h_sync + 0x1c)
   x6: 0x0000000000000000
   x7: 0xffff00000088881c (handle_el1h_sync + 0x1c)
   x8: 0x00000000f0c1a000
   x9: 0x0000000000000620
  x10: 0x000000000x0:pa00
  x11: 0x000 000000000500
  x12: 0x0000000096000004
  x13: 0xffff0000454c36e0 (crypto_dev + 0x43a96020)
  x14: 0xffff0000454c3610 (crypto_dev + 0x43a95f50)
  x15: 0xffff00000088881c (handle_el1h_sync + 0x1c)
  x16: 0xffff0000008b59e4 (data_abort + 0x158)
  x17: 0x00000000804000c9
  x18: 0xffff00004553a000 (crypto_dev + 0x43b0c940)
  x19: 0xffff0000454c3640 (crypto_dev + 0x43a95f80)
  x20: 0x0000000096000004
  x21: 0x0000000096000504
  x22: 0x0000000096000004
  x23: 0x0000000000000620
  x24: 0x00000000f0c1a000
  x25: 0x0000000000000000
  x26: 0xffff000000000000
  x27: 0xffff000000a318d6 (notify.prefix + 0x3e2a5)
  x28: 0xffff000000a02aa1 (notify.prefix + 0xf470)
  x29: 0xffff000000b77488 (abort_handlers + 0x0)
   sp: 0xffff0000454c3570
   lr: 0xffff00000088881c (handle_el1h_sync + 0x1c)
  elr: 0xffff0000008b59e4 (data_abort + 0x158)
spsr: 0x00000000804000c9
  far: 0x0000000096000504
  esr: 0x0000000096000004
panic: data abort with spinlock held (spinlock count 356126888 != 0)
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1d0
panic() at panic+0x48
data_abort() at data_abort+0x3a0
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x96000004
data_abort() at data_abort+0x158
(null)() at -0x4
WARNING: D-cacheline size mismatch 64 != 1024
WARNING: I-cacheline size mismatch 64 != 16384
WARNING: D-cacheline size mismatch 64 != 8192
WARNING: D-cacheline size mismatch 64 != 8
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 2048
WARNING: D-cacheline size mismatch 64 != 2048
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 1024
WARNING: I-cacheline size mismatch 64 != 16384
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 8192
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 128
WARNING: D-cacheline size mismatch 64 != 2048
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 512
WARNING: I-cacheline size mismatch 64 != 1024
WARNING: D-cacheline size mismatch 64 != 2048
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 2048
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 8
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 2048
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 1024
WARNING: I-cacheline size mismatch 64 != 16384
WARNING: D-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 8192
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 2048
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 1024
WARNING: I-cacheline size mismatch 64 != 16384
WARNING: D-cacheline size mismatch 64 != 4
WARNING: I-cacheline size mismatch 64 != 4
WARNING: D-cacheline size mismatch 64 != 8192
WARNING: I-cacheline size mismatch 64 != 4
Fatal data abort:
   x0: 0x0000000096000504
   x1: 0xffff0000015e7ef6 ($d + 0x46)
   x2: 0x00000000000000df
   x3: 0x0000000000000074
   x4: 0x0000000000000000
   x5: 0x020f352e0d060319
   x6: 0x0000000000000004
   x7: 0x656e6f7a5f716b73
   x8: 0x0101010101010101
   x9: 0x0000000000000003
  x10: 0xfffeffff6b5e79f2
  x11: 0x0000000000000001
  x12: 0x0000000000000000
  x13: 0x0000000000000017
  x14: 0x0000080080000000
  x15: 0xffff000000b73548 (mvfr1_fields + 0x0)
  x16: 0xffff0000018edd30 (__stop_set_modmetadata_set + 0xf00)
  x17: 0xffff000000831d3c (uma_zcreate + 0x0)
  x18: 0xffff0000011bc900 (pcpu0 + 0x0)
  x19: 0xffff000116200200
  x20: 0xffff000000e5b9c8 (initstack + 0x39c8)
  x21: 0xffff0000015e7ef6 ($d + 0x46)
  x22: 0xffff0000404cd200 (crypto_dev + 0x3ea9fb40)
  x23: 0x0000000000000000
  x24: 0xffff00004548b000 (crypto_dev + 0x43a5d940)
  x25: 0xffff0000018b7128 (system_taskq_init_sys_init + 0x0)
  x26: 0xffff0000010bd478 (mp_ncpus + 0x0)
  x27: 0x0000000003800000
  x28: 0xffff00000103b000 (g_bio_run_down + 0x30)
  x29: 0xffff000000e5b8b0 (initstack + 0x38b0)
   sp: 0xffff000000e5b880
   lr: 0xffff0000008313e0 (zone_ctor + 0xd8)
  elr: 0xffff0000008b38f8 (strcmp + 0x98)
spsr: 0x0000000000400009
  far: 0x0000000096000504
  esr: 0x0000000096000004
panic: vm_fault failed: 0xffff0000008b38f8 error 1
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1d0
panic() at panic+0x48
data_abort() at data_abort+0x28c
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x96000004
strcmp() at strcmp+0x98
item_ctor() at item_ctor+0x218
zone_alloc_item() at zone_alloc_item+0x140
uma_zcreate() at uma_zcreate+0xa4
system_taskq_init() at system_taskq_init+0x10c
mi_startup() at mi_startup+0x1f4
virtdone() at virtdone+0x74
KDB: enter: panic
KDBh rcdntnrc0i^Mtpa1ick0spi
Siot ece_ opanic   ankc:tmtt_ckanic:cpamppa   :p  sc  p tal cp  ix:  i a 0:nixpfpfnm0npan2c5a0nippp: (ntc: npteap p nia: mnxap1:i0:0m00anpa00p0p_pon
p p2p 0papfn:0n000p0nab: mppppanip_nic:_mtx.loc p nx1: 3p)_
o k_s an: 0ec0ra0dpapmpic
ppan::pxpac0 00x_lo0k_nic
pax5n pxp0npa0 00x_lo050^Mpinx6p 0xpmffc:0mt0_lb128spnpa (nvnr anapa mp4p0n
   mt:alxp0pic: m3a_0p04p
pap8_ capani0mp0n1c:amt0apppantnr ad0_lp p ni0np
anp9:mpac:p0000: nix0pap
  cp0:pcnpcp pan0c: m00pan
  an1np0nip0np pa00pp0na0:cp p2: nxtp00tpan00: 0tap
  xnic pxpp000c00mt0p0o0fpppxc4c  m0xi000c: 0p0p0nca
sapa: 0x00 ap00i0:0map00^Mpix: anx_lpfficp0ni3:cmd0pp antc:rmnan: :impu_pa 0nic:
:a1ac:pa0m0:p0nip: mtx_0a
icni:a anfnp0:p00p1bpmp0_lpippamic:pmta_l)ck s19pi0x00npc: m0n0c:ppan
nxpa:nicf fpx0000k_np3nicpipa(icapapsnpr:_can_leck+ac:0)
tx2:p picpf:00p0pan1c00mtx_lock_spinv rnvnpnpoptxpace4apepa ptxplp3ni^M:pnp2n c:fana0ic:0mtx_0ppaip:nppapreanipsc:  0apanp
  xc:capxc: mpip0n0n:am0xpppipanicpictpan_parnpbpo trpapptpbpa_scapap ninicp mp24pacic:fn0a00c1 mlxcpanic:pmcx_loci_bp+ 0plo)^Minpp5p nxcf ft00lopa6can0pppanp : o_expcpenfep pvapaocnii:apap c:net9anp
px26pa0apap000cp0panpanic
nxn7: 0ni0:0p0nic:i0n m0
panic  atxfp00ic: m5a_3ock_sppn: itauacad+c:pap38)^Mnpx29::amtpap000appnnap0nicpipanic:tnixcp ni0:a70pa^Mipani p0xfffcp0n0c0 op7nin
pplp:n0p:9max_9ocka5pan1
pelp: ox49c2dm9xbcapan:1^Mpspxrp nipa0apan0c0 a0xcla
  f:i:  x4pandc:abca5anic
mapini inknicpakaraplpa ctxtinnpac2 mtxpana1icxnan: 0t0
ock_d  pax
panic= 1t
panic= 1t
pDn:nipankcb mtxp:cenp
: nppb_nraca_spapataaic:npapbnipaci_mexp^Mpicppp_prac _nilf mraploc(_ atn:precnicncc_ mtf_lrpp_icipxni
pmtp_ppanic(ppnncppptp_lini +0xn:p
iapanpcn ct)_lpapppmppnicn mpanix_pppppappan1p_apnp(p c:nppppppx_nil1 mtxnl+cxpsp^Mnpcnicp papln:el papyncc: acx_ppcpanacdlp_eic:_spnpanip8pap-cpanceapanic estx_loaka0pc0n
pipapac: )tananipppp_anacppn92nic: 0t1p
picic:ppriic:(mtxtpockpspvnrcn apa_apa
ia:ipecp1n
papppppppppppppaanic:mmtx_occk_spin: recursed o  nnnreecursivemmutex trmlck @ /usrssrc/sys/krrn/sbbr_temmina.c::605

ccpuid = 05
ti^Me i 1
KDB: DB:ck bakkbrace:
e:
db_tract_sel_()          atb_trdb_tracf_self
e: db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1d0
panic() at panic+0x48
vpanic() at vpanic+0x1d0
panic() at panic+0x48
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
termcn_cnputc() at termcn_cnputc+0x2c
cnputc() at cnputc+0xa0
kvprintf() at kvprintf+0xa4
_vprintf() at _vprintf+0x78
printf() at printf+0x58
vpanic() at vpanic+0x26c
panic() at panic+0x48
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
termcn_cnput<FF>NOTICE:  DRAM FW version 211207
...


pci24: <PCI bus> numa-domain 0 on pcib24
cpu0: <ACPI CPU> on acpi0
armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
  <C9>p a  nxip cx:0aF n:a it0caxl:f   F<DF> panic: stack overflow detected; back
trace may be corrupted
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1d0
panic() at panic+0x48
__stack_chk_fail() at __stack_chk_fail+0x14
msgbuf_addst (x0: 0x00000s0b<E0> x0dd0xr00004000000000ul
   x1: 0x0x
KDB: enter: panic
KDB:  KeKK KKKKlKKKnKKK KKKKKpKKaKKKKKcKKpKKaKKKKKKKKKKKKKKKpKKKKKKppKKKKpKKKpKK
KKKKKKKKKKKKKKKpKKKKpKKKKKKKpKKKKKKpKKpKpKKpKKKpKpKpppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppanic: mtx_lock_spin: eecrrsed on non-recursive mutxx trmlkk @ (null)x//yys/kern/uubrtterminl..c:60


cpuid =-65
tim
  = 1
KDB: sB:ck aacktaacer
e:
db_trace_aelf()          adb_trdb_trace^Mself
e: ds_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1d0
panic() at panic+0x48
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
termcn_cnputc() at termcn_cnputc+0x2c
cnputc() at cnputc+0xa0
kvprintf() at kvprintf+0xa4
_vprintf() at _vprintf+0x78
printf() at printf+0x58
vpanic() at vpanic+0x26c
panic() at panic+0x48
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
termcn_cnputc() at termcn_cnputc+0x2c
cnputc() at cnputc+0xa0
kvprintf() at kvprintf+0xa4
_vprintf() at _vprintf+0x78
printf() at printf+0x58
vpanic() at vpanic+0x26c
panic() at panic+0x48
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
termcn_cnputc() at termcn_cnputc+0x2c
cnputc() at cnputc+0xa0
kvprintf() at kvprintf+0xa4
_vprintf() at _vprintf+0x78
printf() at printf+0x58
vpanic() at vpanic+0x26c
panic() at panic+0x48
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
termcn_cnputc() at termcn_cnputc+0x2c
cnputc() at cnputc+0xa0
kvprintf() at kvprintf+0xa4
_vprintf() at _vprintf+0x78
printf() at printf+0x58
vpanic() at vpanic+0x26c
panic() at panic+0x48
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0x188
termcn_cnputc() at termcn_cnputc+0x2c
cnputc() at cnputc+0xaNOTICE:  DRAM FW version 211207

I do see gic0 attached in dmesg before each of the crashes.

Hmm, with this change intr_pic_init_secondary() runs on each AP before
curthread is set, so the gic driver ends up taking spin locks without a
valid curthread, and that's probably not going to work.
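
To illustrate the ordering problem, here is a minimal user-space sketch.
It is a toy model, not kernel code: every toy_* name is hypothetical, and
it only assumes that a spin lock needs a valid current-thread pointer to
record its owner and nesting count.  The toy returns an error where a
real kernel would dereference the unset pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are the real FreeBSD definitions. */
struct toy_thread {
	int spinlock_count;		/* models per-thread spin lock nesting */
};

struct toy_pcpu {
	struct toy_thread *curthread;	/* NULL until the AP sets it */
	struct toy_thread idlethread;
};

struct toy_spin_mtx {
	uintptr_t owner;		/* 0 when free, else owner's thread pointer */
};

enum toy_result { TOY_OK, TOY_NO_CURTHREAD, TOY_RECURSED, TOY_BUSY };

static enum toy_result
toy_lock_spin(struct toy_pcpu *pc, struct toy_spin_mtx *m)
{
	struct toy_thread *td = pc->curthread;

	/* The toy reports the error; a real kernel would dereference td. */
	if (td == NULL)
		return (TOY_NO_CURTHREAD);
	if (m->owner == (uintptr_t)td)
		return (TOY_RECURSED);
	if (m->owner != 0)
		return (TOY_BUSY);	/* a real spin lock would spin here */
	td->spinlock_count++;
	m->owner = (uintptr_t)td;
	return (TOY_OK);
}

static void
toy_unlock_spin(struct toy_pcpu *pc, struct toy_spin_mtx *m)
{
	assert(m->owner == (uintptr_t)pc->curthread);
	m->owner = 0;
	pc->curthread->spinlock_count--;
}

/* Hypothetical stand-in for per-CPU interrupt controller setup. */
static enum toy_result
toy_pic_init_secondary(struct toy_pcpu *pc, struct toy_spin_mtx *m)
{
	enum toy_result r = toy_lock_spin(pc, m);

	if (r == TOY_OK)
		toy_unlock_spin(pc, m);
	return (r);
}

/* Ordering that panics: PIC init runs while curthread is still unset. */
static enum toy_result
toy_init_secondary_old(struct toy_pcpu *pc, struct toy_spin_mtx *m)
{
	enum toy_result r = toy_pic_init_secondary(pc, m);

	pc->curthread = &pc->idlethread;
	return (r);
}

/* Ordering that boots: curthread is set before PIC init. */
static enum toy_result
toy_init_secondary_new(struct toy_pcpu *pc, struct toy_spin_mtx *m)
{
	pc->curthread = &pc->idlethread;
	return (toy_pic_init_secondary(pc, m));
}
```

On a zeroed toy_pcpu, toy_init_secondary_old() returns TOY_NO_CURTHREAD,
while toy_init_secondary_new() returns TOY_OK and leaves the lock free.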

Indeed, the fix below lets my box boot again:

diff --git a/sys/arm64/arm64/mp_machdep.c b/sys/arm64/arm64/mp_machdep.c
index ba673ce9d6ee..5fd5197b6818 100644
--- a/sys/arm64/arm64/mp_machdep.c
+++ b/sys/arm64/arm64/mp_machdep.c
@@ -270,6 +270,10 @@ init_secondary(uint64_t cpu)
         install_cpu_errata();
         enable_cpu_feat(CPU_FEAT_AFTER_DEV);
  
+       /* Initialize curthread */
+       KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
+       pcpup->pc_curthread = pcpup->pc_idlethread;
+
         intr_pic_init_secondary();
  
         /* Signal we are done */
@@ -279,9 +283,6 @@ init_secondary(uint64_t cpu)
         while (!atomic_load_int(&aps_ready))
                 __asm __volatile("wfe");
  
-       /* Initialize curthread */
-       KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
-       pcpup->pc_curthread = pcpup->pc_idlethread;
         schedinit_ap();
  
         /* Initialize curpmap to match TTBR0's current setting. */

-- 
John Baldwin