git: afc3d49b17a3 - main - nvme: Close a race in destroying qpair and timeouts

From: Warner Losh <imp_at_FreeBSD.org>
Date: Tue, 10 Oct 2023 22:26:20 UTC
The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=afc3d49b17a35db3b70c9e4f63a508a14a8237fe

commit afc3d49b17a35db3b70c9e4f63a508a14a8237fe
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2023-10-10 17:13:25 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-10-10 22:13:57 +0000

    nvme: Close a race in destroying qpair and timeouts
    
    While we should have cleared all the pending I/O prior to calling
    nvme_qpair_destroy, which should ensure that if the callout_drain causes
    a call to nvme_qpair_timeout(), it won't schedule any new
    timeout. However, it doesn't hurt to set timeout_pending to false in
    nvme_qpair_destroy() and have nvme_qpair_timeout() exit early if it sees
    it w/o scheduling a timeout. Since we don't otherwise stop the timeout
    until we're about to destroy the qpair, this ensures we fail safe. The
    lock/unlock also ensures the callout_drain will either remove the callout,
    or wait for it to run with the early bailout.
    
    We can likely further improve this by using callout_stop() inside the
    pending lock. I'll investigate that for future refinement.
    
    Sponsored by:           Netflix
    Suggestions by:         jhb
    Reviewed by:            gallatin
    Differential Revision:  https://reviews.freebsd.org/D42065
---
 sys/dev/nvme/nvme_qpair.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c
index 6d9d337e76a5..569d54ab6aef 100644
--- a/sys/dev/nvme/nvme_qpair.c
+++ b/sys/dev/nvme/nvme_qpair.c
@@ -727,6 +727,10 @@ nvme_qpair_construct(struct nvme_qpair *qpair,
 	mtx_init(&qpair->lock, "nvme qpair lock", NULL, MTX_DEF);
 	mtx_init(&qpair->recovery, "nvme qpair recovery", NULL, MTX_DEF);
 
+	callout_init_mtx(&qpair->timer, &qpair->recovery, 0);
+	qpair->timer_armed = false;
+	qpair->recovery_state = RECOVERY_WAITING;
+
 	/* Note: NVMe PRP format is restricted to 4-byte alignment. */
 	err = bus_dma_tag_create(bus_get_dma_tag(ctrlr->dev),
 	    4, ctrlr->page_size, BUS_SPACE_MAXADDR,
@@ -792,10 +796,6 @@ nvme_qpair_construct(struct nvme_qpair *qpair,
 	qpair->cpl_bus_addr = queuemem_phys + cmdsz;
 	prpmem_phys = queuemem_phys + cmdsz + cplsz;
 
-	callout_init_mtx(&qpair->timer, &qpair->recovery, 0);
-	qpair->timer_armed = false;
-	qpair->recovery_state = RECOVERY_WAITING;
-
 	/*
 	 * Calcuate the stride of the doorbell register. Many emulators set this
 	 * value to correspond to a cache line. However, some hardware has set
@@ -891,6 +891,9 @@ nvme_qpair_destroy(struct nvme_qpair *qpair)
 {
 	struct nvme_tracker	*tr;
 
+	mtx_lock(&qpair->recovery);
+	qpair->timer_armed = false;
+	mtx_unlock(&qpair->recovery);
 	callout_drain(&qpair->timer);
 
 	if (qpair->tag) {
@@ -1039,6 +1042,19 @@ nvme_qpair_timeout(void *arg)
 		return;
 	}
 
+	/*
+	 * Shutdown condition: We set qpair->timer_armed to false in
+	 * nvme_qpair_destroy before calling callout_drain. When we call that,
+	 * this routine might get called one last time. Exit w/o setting a
+	 * timeout. None of the watchdog stuff needs to be done since we're
+	 * destroying the qpair.
+	 */
+	if (!qpair->timer_armed) {
+		nvme_printf(qpair->ctrlr,
+		    "Timeout fired during nvme_qpair_destroy\n");
+		return;
+	}
+
 	switch (qpair->recovery_state) {
 	case RECOVERY_NONE:
 		/*