Re: regression: memory issues on main/arm64 over sched/runq changes
Date: Fri, 27 Jun 2025 15:02:35 UTC
On Wed, 25 Jun 2025, Zhenlei Huang wrote:
Hi,
I applied olce's change from the review but it didn't make a difference
on my arm64 and now on a tree with local changes (wifi bits, user space
bits, etc).
Now I netbooted that tree on X86 hardware (an old Lenovo Laptop) and ran
into something else (the same tree boots in a bhyve instance on a
different machine from a local disk image).
At the end of if_addgroup() I had added the following for local
debugging (really crude, sorry):
...
+	atomic_thread_fence_seq_cst();
 	IF_ADDR_WLOCK(ifp);
 	CK_STAILQ_INSERT_TAIL(&ifg->ifg_members, ifgm, ifgm_next);
 	CK_STAILQ_INSERT_TAIL(&ifp->if_groups, ifgl, ifgl_next);
 	IF_ADDR_WUNLOCK(ifp);
 	IFNET_WUNLOCK();	// excl unlock
 	if (new)
 		EVENTHANDLER_INVOKE(group_attach_event, ifg);
 	EVENTHANDLER_INVOKE(group_change_event, groupname);
+	IFNET_RLOCK();		// shared, panic
+	CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next) {
+		if (bz_debug_groups)
+			if_printf(ifp, "XXXXXXXXXXXXXXXXXXXXXXXXXXX-BZ %s:%d: "
+			    "ifgl %p, ifgl_group %p, ifg_group %p\n",
+			    __func__, __LINE__, ifgl,
+			    (ifgl != NULL) ? ifgl->ifgl_group : NULL,
+			    (ifgl != NULL && ifgl->ifgl_group != NULL) ?
+			    ifgl->ifgl_group->ifg_group : NULL);
+	}
+	IFNET_RUNLOCK();
+
 	return (0);
 }
You see the annotation "// shared"?
I got a panic: excl->share with that.
The excl. is the
IFNET_WLOCK(); // excl
at the top of the function after the groupname check.
But that gets unlocked before the event handler above
so how can this happen?
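To make the question concrete: an excl->share complaint means the lock
bookkeeping believes the current thread still holds the lock exclusively
at the moment it asks for it shared. A minimal userspace sketch of that
kind of check, assuming a simple per-thread list of held locks (all names
here are hypothetical and this is not the kernel's actual code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-thread lock bookkeeping, not kernel code. */
enum lock_mode { MODE_SHARED, MODE_EXCL };
enum check_result { CHECK_OK, CHECK_EXCL_TO_SHARE, CHECK_SHARE_TO_EXCL };

struct held_lock {
	const void	*lock;	/* identity of the lock */
	enum lock_mode	 mode;	/* how this thread holds it */
};

#define	MAX_HELD	8

struct thread_locks {
	struct held_lock held[MAX_HELD];
	size_t		 n;
};

/* Would acquiring 'lock' in mode 'want' conflict with what is already held? */
static enum check_result
check_acquire(const struct thread_locks *t, const void *lock,
    enum lock_mode want)
{
	for (size_t i = 0; i < t->n; i++) {
		if (t->held[i].lock != lock)
			continue;
		if (t->held[i].mode == MODE_EXCL && want == MODE_SHARED)
			return (CHECK_EXCL_TO_SHARE);
		if (t->held[i].mode == MODE_SHARED && want == MODE_EXCL)
			return (CHECK_SHARE_TO_EXCL);
	}
	return (CHECK_OK);
}

/* Record a successful acquire. */
static void
record_acquire(struct thread_locks *t, const void *lock, enum lock_mode mode)
{
	assert(t->n < MAX_HELD);
	t->held[t->n].lock = lock;
	t->held[t->n].mode = mode;
	t->n++;
}

/* Drop the most recent record for 'lock', if any. */
static void
record_release(struct thread_locks *t, const void *lock)
{
	for (size_t i = t->n; i > 0; i--) {
		if (t->held[i - 1].lock != lock)
			continue;
		for (size_t j = i; j < t->n; j++)
			t->held[j - 1] = t->held[j];
		t->n--;
		return;
	}
}
```

In this model, once the release for the exclusive hold has been
recorded, a later shared acquire of the same lock checks out fine. The
complaint only appears if the exclusive record is still there, which is
exactly what should not be the case after the IFNET_WUNLOCK() above.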
Sadly I cannot even dump or anything as the keyboard is as dead
as the rest of the laptop. Have to power cycle it hard.
Apart from the debugging I added I have no local changes in sys/net
in that tree. sys/kern seems to have no relevant changes either
(added a bus func, toggle link_elf_leak_locals default, and a printf
got an extra argument to print %d error when modules fail to load).
I'll try a plain main (hopefully tonight) on that machine too but I am
really at a loss here now that it's also happening on X86 and only for me
and always around the same code there...
I'll also try to boot this tree from a USB pen drive or something; not
that my problem comes in from netbooting...
I'll keep you posted...
/bz
--
Bjoern A. Zeeb r15:7