Re: regression: memory issues on main/arm64 over sched/runq changes
Date: Fri, 27 Jun 2025 20:12:04 UTC
On Sat, 28 Jun 2025, Zhenlei Huang wrote:
>
>
>> On Jun 27, 2025, at 11:02 PM, Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
>>
>> On Wed, 25 Jun 2025, Zhenlei Huang wrote:
>>
>> Hi,
>>
>> I applied olce's change from the review but it didn't make a difference
>> on my arm64 and now on a tree with local changes (wifi bits, user space
>> bits, etc.).
>>
>> Now I netbooted that tree on x86 hardware (an old Lenovo laptop) and ran
>> into something else (the same tree boots in a bhyve instance on a
>> different machine from a local disk image).
>>
>> At the end of if_addgroup() I had added the following for local
>> debugging (really crude, sorry):
>>
>> ...
>>
>> +	atomic_thread_fence_seq_cst();
>> 	IF_ADDR_WLOCK(ifp);
>> 	CK_STAILQ_INSERT_TAIL(&ifg->ifg_members, ifgm, ifgm_next);
>> 	CK_STAILQ_INSERT_TAIL(&ifp->if_groups, ifgl, ifgl_next);
>> 	IF_ADDR_WUNLOCK(ifp);
>>
>> 	IFNET_WUNLOCK(); // excl unlock
>>
>> 	if (new)
>> 		EVENTHANDLER_INVOKE(group_attach_event, ifg);
>> 	EVENTHANDLER_INVOKE(group_change_event, groupname);
>>
>> +	IFNET_RLOCK(); // shared, panic
>> +	CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next) {
>> +		if (bz_debug_groups)
>> +			if_printf(ifp, "XXXXXXXXXXXXXXXXXXXXXXXXXXX-BZ %s:%d: ifgl %p, ifgl_group %p, ifg_group %p\n",
>> +			    __func__, __LINE__, ifgl,
>> +			    (ifgl != NULL) ? ifgl->ifgl_group : NULL,
>> +			    (ifgl != NULL && ifgl->ifgl_group != NULL) ? ifgl->ifgl_group->ifg_group : NULL);
>> +	}
>> +	IFNET_RUNLOCK();
>> +
>> 	return (0);
>> }
>>
>>
>>
>> You see the annotation //shared?
>>
>> I got an "excl->share" panic with that.
>
> Well, I applied the same patch as you and can reproduce that panic, but my screen freezes and the topmost frames of the stack are:
I took a video of the boot at 60 fps so I could "scroll" back a bit.
> ```
> _sx_slock_int() at _sx_slock_int+0x64/frame 0xff....
> if_addgroup() at .....
> ....
> device_attach() at ...
> ...
> root_bus_configure() at ...
> configure() at ...
> mi_startup() at ..
> ```
>
> I've no idea what's wrong. From the disassembly, it appears the panic happens just after witness_checkorder().
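As far as I understand sys/kern/subr_witness.c, witness_checkorder() first looks up whether the current thread already holds the lock it is about to acquire. If the thread holds it exclusively and the new request is shared, witness prints "shared lock of ... while exclusively locked from ..." and panics with "excl->share". So that panic would mean witness still believed the ifnet sx lock was held exclusively after IFNET_WUNLOCK(). Below is a toy userspace model of only that same-thread check, under the assumption that the description above is accurate. It is not the kernel code: every type and function in it (struct thread_locks, model_acquire(), model_release(), find_instance() here) is hypothetical, and the real witness does much more (lock order graph, recursion rules per lock class).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical names, not FreeBSD's) of the per-thread
 * "do I already hold this lock?" check done before an acquire.
 */
enum lock_result {
	LOCK_OK,
	LOCK_EXCL_SHARE,	/* witness would panic "excl->share" */
	LOCK_SHARE_EXCL,	/* witness would panic "share->excl" */
	LOCK_NOT_HELD		/* release of a lock this thread does not hold */
};

struct held_lock {
	const void *lock;	/* identity of the lock object */
	bool exclusive;		/* mode it is currently held in */
	unsigned count;		/* same-mode recursion depth */
};

#define MAXHELD 16

struct thread_locks {
	struct held_lock held[MAXHELD];
	size_t nheld;
};

static struct held_lock *
find_instance(struct thread_locks *tl, const void *lock)
{
	for (size_t i = 0; i < tl->nheld; i++)
		if (tl->held[i].lock == lock)
			return (&tl->held[i]);
	return (NULL);
}

/* Record an acquire, rejecting a mode mismatch on a lock already held. */
static enum lock_result
model_acquire(struct thread_locks *tl, const void *lock, bool exclusive)
{
	struct held_lock *hl = find_instance(tl, lock);

	if (hl != NULL) {
		if (hl->exclusive && !exclusive)
			return (LOCK_EXCL_SHARE);
		if (!hl->exclusive && exclusive)
			return (LOCK_SHARE_EXCL);
		/* Same mode: just count it (the real witness checks more). */
		hl->count++;
		return (LOCK_OK);
	}
	assert(tl->nheld < MAXHELD);
	tl->held[tl->nheld++] = (struct held_lock){ lock, exclusive, 1 };
	return (LOCK_OK);
}

/* Record a release; the entry disappears when its count drops to zero. */
static enum lock_result
model_release(struct thread_locks *tl, const void *lock)
{
	struct held_lock *hl = find_instance(tl, lock);

	if (hl == NULL)
		return (LOCK_NOT_HELD);
	if (--hl->count == 0)
		*hl = tl->held[--tl->nheld];	/* swap-remove the entry */
	return (LOCK_OK);
}
```

Replaying the patch's sequence through the model (exclusive acquire, exclusive release, shared acquire) returns LOCK_OK at every step; only when the exclusive release is never recorded does the shared acquire come back as LOCK_EXCL_SHARE. That is one possible reading of the panic: the lock bookkeeping (or the memory behind it) is off, rather than the debug patch itself being illegal.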
That is interesting. So it's not just me.
Did you netboot or boot from disk?
--
Bjoern A. Zeeb r15:7