[Bug 246207] [geom] geli livelocks during panic
bugzilla-noreply at freebsd.org
Tue May 5 01:44:40 UTC 2020
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246207
Bug ID: 246207
Summary: [geom] geli livelocks during panic
Product: Base System
Version: 12.1-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: asomers at FreeBSD.org
Some geli-using machines I administer occasionally panic. When they do, they
sometimes dump core but often don't. When they don't, they simply hang after
printing the stack trace, but before printing the uptime.
I've traced the problem to geli's shutdown_pre_sync event handler. It tries to
destroy each geli device. We can't simply skip that step if a panic is
underway; erasing the keys is necessary to prevent warm-boot attacks. The
problem lies in the following lines.
g_eli_destroy:
    sc->sc_flags |= G_ELI_FLAG_DESTROY;
    wakeup(sc);
    /*
     * Wait for kernel threads self destruction.
     */
    while (!LIST_EMPTY(&sc->sc_workers)) {
        msleep(&sc->sc_workers, &sc->sc_queue_mtx, PRIBIO,
            "geli:destroy", 0);
    }

_sleep:
    if (SCHEDULER_STOPPED_TD(td)) {
        if (lock != NULL && priority & PDROP)
            class->lc_unlock(lock);
        return (0);
    }
As you can see, if the scheduler is stopped for the current thread (which it
will be during a panic), then msleep returns immediately without sleeping,
causing g_eli_destroy to loop indefinitely. The obvious solution, which I
haven't yet tested, would be to skip that wait loop in g_eli_destroy when the
scheduler is stopped. What I don't understand is why g_eli_destroy _ever_
works during a panic. Perhaps it has something to do with how worker threads
are distributed among cores? Perhaps it only succeeds when all worker threads
happen to be on different cores? I find that unlikely, though, because these
servers have thousands of worker threads.
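To make the failure mode concrete, here is a minimal userspace model in C of the wait loop quoted above, with the untested fix described in this report (skip the wait when the scheduler is stopped) available as an option. It is a sketch under stated assumptions, not kernel code: `scheduler_stopped`, `nworkers`, `model_msleep`, and `model_destroy` are hypothetical stand-ins for `SCHEDULER_STOPPED_TD(td)`, the `sc_workers` list, `msleep`, and the loop in `g_eli_destroy`. It also assumes, as a simplification, that each real sleep lets exactly one worker exit, and it adds a hypothetical `max_iters` cap so the livelock case terminates and can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the g_eli_destroy() wait loop.
 * None of these names exist in the FreeBSD kernel.
 */
static bool scheduler_stopped; /* stands in for SCHEDULER_STOPPED_TD(td) */
static int nworkers;           /* stands in for the length of sc_workers */

/*
 * Models msleep(): with the scheduler stopped, _sleep() returns 0 at once
 * and no other thread runs, so no worker can exit. Otherwise we assume
 * (a simplification) that exactly one worker exits per sleep.
 */
static int
model_msleep(void)
{
	if (scheduler_stopped)
		return (0);
	if (nworkers > 0)
		nworkers--;
	return (0);
}

/*
 * Models the wait in g_eli_destroy(). If skip_when_stopped is true, apply
 * the report's untested fix: do not wait when the scheduler is stopped.
 * Returns the number of sleeps taken, or -1 if max_iters (a hypothetical
 * cap that the real loop lacks) is reached, i.e. the loop would livelock.
 */
static int
model_destroy(bool skip_when_stopped, int max_iters)
{
	int iters = 0;

	assert(max_iters >= 0);
	if (skip_when_stopped && scheduler_stopped)
		return (0);
	while (nworkers > 0) {
		if (iters >= max_iters)
			return (-1);
		iters++;
		(void)model_msleep();
	}
	return (iters);
}
```

Starting from four workers with the scheduler running, `model_destroy(false, 100)` drains them in four sleeps. With the scheduler stopped, the same call hits the cap and returns -1 with all four workers still present, whereas `model_destroy(true, 100)` returns 0 immediately. The real loop has no cap, so it spins forever instead.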