git: cb702452adeb - stable/13 - g_mirror: don't fail reads while losing next-to-last disk

Go to: [ bottom of page ] [ top of archives ] [ this month ]
From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Thu, 17 Feb 2022 06:48:34 UTC
The branch stable/13 has been updated by avg:

URL: https://cgit.FreeBSD.org/src/commit/?id=cb702452adeb2cab1e5c083f81108f049bcafa3a

commit cb702452adeb2cab1e5c083f81108f049bcafa3a
Author:     Andriy Gapon <avg@FreeBSD.org>
AuthorDate: 2022-01-27 10:49:04 +0000
Commit:     Andriy Gapon <avg@FreeBSD.org>
CommitDate: 2022-02-17 06:46:14 +0000

    g_mirror: don't fail reads while losing next-to-last disk
    
    I observed a situation where some read requests failed when a 2-way geom
    mirror lost one disk.  The problem appears to be in the logic that skips
    retrying a failed request when a mirror has only one active disk.
    Generally, that makes sense.  But during a transition from two disks to
    one it is possible that the request failed on the failing disk before it
    was inactivated and, so, the remaining active disk is the disk that
    should be tried.
    
    This change adds an additional check to ensure that it was the (only)
    active disk that was already tried.
    
    (cherry picked from commit 5d5f44623eb3d121d528060d131ee5d6bcd63489)
---
 sys/geom/mirror/g_mirror.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/sys/geom/mirror/g_mirror.c b/sys/geom/mirror/g_mirror.c
index 455aae4bebf8..cb5129dabd33 100644
--- a/sys/geom/mirror/g_mirror.c
+++ b/sys/geom/mirror/g_mirror.c
@@ -1035,9 +1035,19 @@ g_mirror_regular_request(struct g_mirror_softc *sc, struct bio *bp)
 	case BIO_READ:
 		if (pbp->bio_inbed < pbp->bio_children)
 			break;
-		if (g_mirror_ndisks(sc, G_MIRROR_DISK_STATE_ACTIVE) == 1)
+
+		/*
+		 * If there is only one active disk we want to double-check that
+		 * it is, in fact, the disk that we already tried.  This is
+		 * necessary because we might have just lost a race with a
+		 * removal of the tried disk (likely because of the same error)
+		 * and the only remaining disk is still viable for a retry.
+		 */
+		if (g_mirror_ndisks(sc, G_MIRROR_DISK_STATE_ACTIVE) == 1 &&
+		    disk != NULL &&
+		    disk->d_state == G_MIRROR_DISK_STATE_ACTIVE) {
 			g_io_deliver(pbp, pbp->bio_error);
-		else {
+		} else {
 			pbp->bio_error = 0;
 			mtx_lock(&sc->sc_queue_mtx);
 			TAILQ_INSERT_TAIL(&sc->sc_queue, pbp, bio_queue);