[Bug 246207] [geom] geli livelocks during panic

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Tue May 5 01:44:40 UTC 2020


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246207

            Bug ID: 246207
           Summary: [geom] geli livelocks during panic
           Product: Base System
           Version: 12.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: asomers at FreeBSD.org

Some geli-using machines I administer occasionally panic.  When they do, they
sometimes dump core but often don't.  When they don't, they simply hang after
printing the stack trace, but before printing the uptime.

I've traced the problem to geli's shutdown_pre_sync event handler.  It tries to
destroy each geli device.  We can't simply skip that step if a panic is
underway; erasing the keys is necessary to prevent warm-boot attacks.  The
problem lies in the following lines.  

g_eli_destroy:
        sc->sc_flags |= G_ELI_FLAG_DESTROY;
        wakeup(sc);
        /*
         * Wait for kernel threads self destruction.
         */
        while (!LIST_EMPTY(&sc->sc_workers)) {
                msleep(&sc->sc_workers, &sc->sc_queue_mtx, PRIBIO,
                    "geli:destroy", 0);
        }

_sleep:
        if (SCHEDULER_STOPPED_TD(td)) {
                if (lock != NULL && priority & PDROP)
                        class->lc_unlock(lock);
                return (0);
        }

As you can see, if the scheduler is stopped for the current thread (which it
will be during a panic), then msleep returns immediately without sleeping,
causing g_eli_destroy to loop indefinitely.  The obvious solution, which I
haven't yet tested, would be to skip that section in g_eli_destroy when the
scheduler is stopped.  What I don't understand is why g_eli_destroy _ever_
works during a panic.  Perhaps it has something to do with the allocation of
worker threads among cores?  Perhaps it only succeeds when all worker threads
happen to be on different cores?  I find that unlikely though, because these
servers have thousands of worker threads.
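
To make the failure mode concrete, here is a minimal userspace sketch of the
interaction described above.  It is not kernel code: scheduler_stopped,
workers_left, model_msleep and model_destroy_wait are hypothetical stand-ins
for SCHEDULER_STOPPED_TD(td), the sc_workers list, _sleep and the wait loop in
g_eli_destroy.  The spin_limit parameter exists only so the model terminates;
the real loop has no such bound.  The skip_when_stopped flag models the
untested fix proposed above, and the sketch says nothing about whether
skipping the wait is safe for the worker threads that are still alive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real kernel APIs. */
static bool scheduler_stopped;	/* models SCHEDULER_STOPPED_TD(td) */
static int workers_left;	/* models the length of sc->sc_workers */

/* Models _sleep(): it returns at once when the scheduler is stopped. */
static int
model_msleep(void)
{
	if (scheduler_stopped)
		return (0);	/* no context switch, so no worker can exit */
	workers_left = 0;	/* otherwise the workers run and exit */
	return (0);
}

/*
 * Models the wait loop in g_eli_destroy.  Returns the number of loop
 * iterations, or -1 if spin_limit was reached without the worker list
 * draining (the livelock from this report).
 */
static long
model_destroy_wait(bool skip_when_stopped, long spin_limit)
{
	long spins = 0;

	assert(spin_limit > 0);
	/* Proposed (untested) fix: do not wait when the scheduler is stopped. */
	if (skip_when_stopped && scheduler_stopped)
		return (0);
	while (workers_left > 0) {
		if (spins++ >= spin_limit)
			return (-1);
		model_msleep();
	}
	return (spins);
}
```

With scheduler_stopped set and workers_left nonzero,
model_destroy_wait(false, 1000) hits the spin limit and returns -1, while
model_destroy_wait(true, 1000) returns 0 without entering the loop.  With the
scheduler running, the loop exits after a single sleep.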


