Re: regression: memory issues on main/arm64 over sched/runq changes
Date: Mon, 30 Jun 2025 14:03:28 UTC
On 6/28/25 11:35, Zhenlei Huang wrote:
> I boot from disk.
>
> Updates on this locking issue,
>
> I think I finally figured out why. More stack trace from my video:
>
> ```
> shared lock of (sx) ifnet_sx @/usr/home/zlei/freebsd-src/sys/net/if.c:1467
> while exclusively locked from /usr/home/zlei/freebsd-src/sys/net/if.c:1416
> panic: excl->share
> ...
> witness_checkorder() at ...
> _sx_slock_int() at _sx_slock_int+0x64/frame ....
> if_addgroup() at ...
> if_attach_internal() at ...
> ether_ifattach() at ...
> iflib_device_register() at ...
> iflib_device_attach() at ...
> device_attach() at ...
> ...
> root_bus_configure() at ...
> configure() at ...
> mi_startup() at ...
> ```
>
> The ifnet_sx has the SX_RECURSE flag bit set, so it can be locked
> recursively.
>
> iflib_device_register() acquires ifnet_sx exclusively and then calls
> ether_ifattach(), which in turn calls if_addgroup(). Re-acquiring the
> same lock shared is prohibited, so witness complains.
>
> I think witness should show the file location of the first exclusive
> lock, i.e. sys/net/iflib.c rather than sys/net/if.c:1416, so that it is
> more straightforward to figure out how that happens. CC John to see if
> that can be improved.

Hmm, I think we stop at the first lle we find when walking back up the
lle list (find_instance() always works this way). You could add a
'find_last_instance' and use it in a few places perhaps. I guess both
the share->excl and excl->share checks are places where you would maybe
use it. Alternatively, you could have a 'find_next_instance' and maybe
output all of them before the panic?

-- 
John Baldwin
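
For illustration only, here is a rough user-space model of the
'find_last_instance' idea. It is not the code in subr_witness.c: the
structures are cut down, LOCK_NCHILDREN is an arbitrary small value,
record_acquire() is a made-up helper, and it assumes every acquisition
gets its own lock_instance, which may not match how witness records a
recursed lock. find_instance() returns the newest match; the
hypothetical find_last_instance() keeps scanning and returns the oldest.

```c
#include <assert.h>
#include <stddef.h>

#define LOCK_NCHILDREN 4	/* arbitrary size for this model */

struct lock_object {
	const char *lo_name;
};

struct lock_instance {
	const struct lock_object *li_lock;
	const char *li_file;
	int li_line;
};

struct lock_list_entry {
	struct lock_list_entry *ll_next;	/* next older entry */
	struct lock_instance ll_children[LOCK_NCHILDREN];
	unsigned int ll_count;
};

/* Newest match: scan from the head entry, highest index first. */
static struct lock_instance *
find_instance(struct lock_list_entry *list, const struct lock_object *lock)
{
	struct lock_list_entry *lle;
	int i;

	for (lle = list; lle != NULL; lle = lle->ll_next)
		for (i = (int)lle->ll_count - 1; i >= 0; i--)
			if (lle->ll_children[i].li_lock == lock)
				return (&lle->ll_children[i]);
	return (NULL);
}

/* Hypothetical: same walk, but remember the last (oldest) match. */
static struct lock_instance *
find_last_instance(struct lock_list_entry *list,
    const struct lock_object *lock)
{
	struct lock_list_entry *lle;
	struct lock_instance *last = NULL;
	int i;

	for (lle = list; lle != NULL; lle = lle->ll_next)
		for (i = (int)lle->ll_count - 1; i >= 0; i--)
			if (lle->ll_children[i].li_lock == lock)
				last = &lle->ll_children[i];
	return (last);
}

/* Model-only helper: append an acquisition record to one entry. */
static void
record_acquire(struct lock_list_entry *lle, const struct lock_object *lock,
    const char *file, int line)
{
	struct lock_instance *instance;

	assert(lle->ll_count < LOCK_NCHILDREN);
	instance = &lle->ll_children[lle->ll_count++];
	instance->li_lock = lock;
	instance->li_file = file;
	instance->li_line = line;
}
```

With an older record for ifnet_sx and a newer one on the same list,
find_instance() reports the newer location, while find_last_instance()
reports the place where the lock was first taken, which is what the
excl->share message would want to print.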