git: 893be9d8ac16 - main - sleepqueue: Address a lock order reversal

From: Mark Johnston <markj_at_FreeBSD.org>
Date: Mon, 14 Feb 2022 15:07:02 UTC
The branch main has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=893be9d8ac161c4cc96e9f3f12f1260355dd123b

commit 893be9d8ac161c4cc96e9f3f12f1260355dd123b
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-02-14 14:38:53 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-02-14 15:06:47 +0000

    sleepqueue: Address a lock order reversal
    
    After commit 74cf7cae4d22 ("softclock: Use dedicated ithreads for
    running callouts."), there is a lock order reversal between the per-CPU
    callout lock and the scheduler lock.  softclock_thread() locks callout
    lock then the scheduler lock, when preparing to switch off-CPU, and
    sleepq_remove_thread() stops the timed sleep callout while potentially
    holding a scheduler lock.  In the latter case, it's the thread itself
    that's locked, and if the thread is sleeping then its lock will be a
    sleepqueue lock, but if it's still in the process of going to sleep
    it'll be a scheduler lock.
    
    We could perhaps change softclock_thread() to try to acquire locks in
    the opposite order, but that'd require dropping and re-acquiring the
    callout lock, which seems expensive for an operation that will happen
    quite frequently.  We can instead perhaps avoid stopping the
    td_slpcallout callout if the thread is still going to sleep, which is
    what this patch does.  This will result in a spurious call to
    sleepq_timeout(), but some counters suggest that this is very rare.
    
    PR:             261198
    Fixes:          74cf7cae4d22 ("softclock: Use dedicated ithreads for running callouts.")
    Reported and tested by: thj
    Reviewed by:    kib
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D34204
---
 sys/kern/subr_sleepqueue.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/sys/kern/subr_sleepqueue.c b/sys/kern/subr_sleepqueue.c
index 36832ef96ba4..af5a001b46fb 100644
--- a/sys/kern/subr_sleepqueue.c
+++ b/sys/kern/subr_sleepqueue.c
@@ -833,7 +833,8 @@ sleepq_remove_thread(struct sleepqueue *sq, struct thread *td)
 		td->td_sleepqueue = LIST_FIRST(&sq->sq_free);
 	LIST_REMOVE(td->td_sleepqueue, sq_hash);
 
-	if ((td->td_flags & TDF_TIMEOUT) == 0 && td->td_sleeptimo != 0)
+	if ((td->td_flags & TDF_TIMEOUT) == 0 && td->td_sleeptimo != 0 &&
+	    td->td_lock == &sc->sc_lock) {
 		/*
 		 * We ignore the situation where timeout subsystem was
 		 * unable to stop our callout.  The struct thread is
@@ -843,8 +844,16 @@ sleepq_remove_thread(struct sleepqueue *sq, struct thread *td)
 		 * sleepq_timeout() ensure that the thread does not
 		 * get spurious wakeups, even if the callout was reset
 		 * or thread reused.
+		 *
+		 * We also cannot safely stop the callout if a scheduler
+		 * lock is held since softclock_thread() forces a lock
+		 * order of callout lock -> scheduler lock.  The thread
+		 * lock will be a scheduler lock only if the thread is
+		 * preparing to go to sleep, so this is hopefully a rare
+		 * scenario.
 		 */
 		callout_stop(&td->td_slpcallout);
+	}
 
 	td->td_wmesg = NULL;
 	td->td_wchan = NULL;