Re: regression: memory issues on main/arm64 over sched/runq changes

From: Olivier Certner <olce_at_freebsd.org>
Date: Sat, 21 Jun 2025 20:04:52 UTC
Hi Bjoern,

Given the amount of analysis and testing that went into the runq/scheduler commits, a priori it is extremely unlikely that there's any problem with them (assuming you're using ULE). Additionally, the scheduler never allocates memory except at initialization.

> (testing/fiddling reports)

Assuming the commits you listed are actually the cause of the change in behavior, what you're likely observing is a race condition exposed by slightly changed execution orderings/thread selection for timesharing threads.  Since you said you observe no deletions, this may be a race due to the list being read while it is being added to.  Actually, skimming through the code of if_addgroup() and if_getgroup(), I suspect some barriers are missing.
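
To illustrate the kind of missing barrier meant here, below is a minimal userland sketch of publishing a new list element to lockless readers. It is not the if_addgroup()/if_getgroup() code: the names (struct node, node_add(), node_exists()) are hypothetical, it uses C11 <stdatomic.h> rather than the kernel's own atomic primitives, and it assumes that writers are serialized by some external lock and that elements are never removed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type; NOT the kernel's interface group structures. */
struct node {
	char name[16];
	struct node *_Atomic next;
};

/* List head, traversed by readers without any lock. */
static struct node *_Atomic head;

/*
 * Writer: fully initialize the element, then publish it.
 * Assumption: concurrent writers are serialized by an external lock.
 */
static int
node_add(const char *name)
{
	struct node *n;

	assert(name != NULL);
	n = malloc(sizeof(*n));
	if (n == NULL)
		return (-1);
	strncpy(n->name, name, sizeof(n->name) - 1);
	n->name[sizeof(n->name) - 1] = '\0';
	atomic_init(&n->next,
	    atomic_load_explicit(&head, memory_order_relaxed));
	/*
	 * Release store: the initialization above becomes visible to other
	 * CPUs no later than the pointer to the element itself.  With a
	 * plain store, a reader could see the new element before its
	 * contents.
	 */
	atomic_store_explicit(&head, n, memory_order_release);
	return (0);
}

/* Reader: acquire loads pair with the writer's release store. */
static int
node_exists(const char *name)
{
	struct node *n;

	for (n = atomic_load_explicit(&head, memory_order_acquire);
	    n != NULL;
	    n = atomic_load_explicit(&n->next, memory_order_acquire))
		if (strcmp(n->name, name) == 0)
			return (1);
	return (0);
}
```

If the release/acquire pair is dropped, whether a reader sees a half-initialized element depends on timing and on the memory model of the machine, so a change in thread selection could expose such a bug without being its cause.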

Out of caution, are you absolutely sure you've so far bisected without any local change?

> I do not know if it's feasible or doable to bi-sect those changes further?

All these commits are independent, and you can bisect them as usual.  Actually, it would be great if you can do so, as apparently the problem you're having is not completely deterministic, involves interface groups which I'm not familiar with, and personally I won't be able to spend a significant amount of time on it until Monday.

If you do so, my bet is the outcome will be baecdea10eb5 or af8de65ef23e (more likely), which would be one more hint supporting what I've just said above.  Even better, if something like fdf31d274769 comes out, that would be actual proof of a race.  Most other changes should never come out, and if they do, that may invalidate what I've said above.

So, if you have the opportunity, I would be grateful if you could bisect and report.

Thanks and regards.

-- 
Olivier Certner