Device departure processing
Justin T. Gibbs
gibbs at scsiguy.com
Mon Jan 31 21:58:00 UTC 2011
On both -current and -stable (v16), I've been able to reliably panic the
system by pulling a drive (leaf vdev in a RAIDZ2) while I/O is
in progress. The panic comes either from GEOM complaining that
vdev_geom's consumer isn't idle, or from "use after free" bugs caused
by in-flight I/O (often due to a re-probe). The following patch resolves
the issue for me.
From a brief review of the v28 code, the patch seems appropriate there too.
The code in vdev_geom_orphan() is identical save for an added call to
zfs_post_remove(). In our port, this call appears to do nothing more
than post the devctl notice event a little earlier than the scheduled
removal of the device by the SPA code would. Is there some benefit
I'm missing that makes zfs_post_remove() necessary?
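(For concreteness: on our side the notice ends up going out via
devctl_notify() from sys/bus.h. The helper below is a sketch only;
the function itself, the "ZFS" system name, and the event strings are
illustrative, not the exact code in the sysevent shim.)

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>

/*
 * Sketch: post a device-removal notice via devctl.  The system,
 * subsystem, and type strings are made up for illustration.
 */
static void
example_post_remove_notice(const char *pool, const char *vdev_path)
{
        char data[128];

        snprintf(data, sizeof(data), "pool=%s vdev=%s", pool, vdev_path);
        devctl_notify("ZFS", pool, "REMOVED", data);
}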
--
Justin
-------------- next part --------------
Change 475132 by justing at justing-ns1 on 2011/01/14 13:08:34
Correct ZFS panic during device removal.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
When the ZFS geom device consumer is orphaned, simply
signal the SPA to begin device removal proceedings instead
of attempting to destroy the consumer directly.
The orphan handler is called with the geom topology lock
held from the geom event thread. In order to perform
a complete consumer shutdown from this context, the
SPA ZIO configuration lock would need to be held as a
writer to ensure that no ZIOs are active on this consumer.
This lock acquisition order is the reverse of the normal
SPA config lock -> geom topology lock order. We could
probably drop locks and acquire in the correct order, but
it is much simpler to just defer all of this until the
SPA's eventual close of the vdev_geom instance.
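To illustrate the ordering problem, here is a sketch (not code
from the tree; normal_order() and orphan_context() are made-up
functions, but the locking calls are the real APIs):

static void
normal_order(spa_t *spa)
{
        spa_config_enter(spa, SCL_ZIO, FTAG, RW_READER);  /* lock 1 */
        g_topology_lock();                                /* lock 2 */
        /* ... operate on the vdev's GEOM consumer ... */
        g_topology_unlock();
        spa_config_exit(spa, SCL_ZIO, FTAG);
}

static void
orphan_context(spa_t *spa)
{
        /* Entered with the topology lock (lock 2) already held. */
        g_topology_assert();
        /*
         * Waiting here for the config lock (lock 1) as a writer
         * reverses the order above and can deadlock against any
         * thread in normal_order().
         */
        spa_config_enter(spa, SCL_ZIO, FTAG, RW_WRITER);
        /* ... */
        spa_config_exit(spa, SCL_ZIO, FTAG);
}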
Affected files ...
... //depot/SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#7 edit
Differences ...
==== //depot/SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#7 (text) ====
@@ -50,28 +50,26 @@
 static void
 vdev_geom_orphan(struct g_consumer *cp)
 {
-	struct g_geom *gp;
 	vdev_t *vd;
-	int error;
 
 	g_topology_assert();
 
 	vd = cp->private;
-	gp = cp->geom;
-	error = cp->provider->error;
 
-	ZFS_LOG(1, "Closing access to %s.", cp->provider->name);
-	if (cp->acr + cp->acw + cp->ace > 0)
-		g_access(cp, -cp->acr, -cp->acw, -cp->ace);
-	ZFS_LOG(1, "Destroyed consumer to %s.", cp->provider->name);
-	g_detach(cp);
-	g_destroy_consumer(cp);
-	/* Destroy geom if there are no consumers left. */
-	if (LIST_EMPTY(&gp->consumer)) {
-		ZFS_LOG(1, "Destroyed geom %s.", gp->name);
-		g_wither_geom(gp, error);
-	}
-	vd->vdev_tsd = NULL;
+	/*
+	 * Orphan callbacks occur from the GEOM event thread.
+	 * Concurrent with this call, new I/O requests may be
+	 * working their way through GEOM about to find out
+	 * (only once executed by the g_down thread) that we've
+	 * been orphaned from our disk provider.  These I/Os
+	 * must be retired before we can detach our consumer.
+	 * This is most easily achieved by acquiring the
+	 * SPA ZIO configuration lock as a writer, but doing
+	 * so with the GEOM topology lock held would cause
+	 * a lock order reversal.  Instead, rely on the SPA's
+	 * async removal support to invoke a close on this
+	 * vdev once it is safe to do so.
+	 */
 	vd->vdev_remove_wanted = B_TRUE;
 	spa_async_request(vd->vdev_spa, SPA_ASYNC_REMOVE);
 }
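For reference, the deferral leans on the existing SPA async machinery:
spa_async_request() sets SPA_ASYNC_REMOVE and wakes the SPA's async
thread, which walks the vdev tree looking for vdev_remove_wanted. The
sketch below is abbreviated from memory of the stock spa.c, not an
exact quote; details differ between versions.

static void
spa_async_remove(spa_t *spa, vdev_t *vd)
{
        int c;

        if (vd->vdev_remove_wanted) {
                vd->vdev_remove_wanted = B_FALSE;
                /*
                 * Moving a leaf vdev to the REMOVED state closes
                 * the underlying device (the vdev_geom consumer)
                 * from a context where it is safe to do so.
                 */
                vdev_set_state(vd, B_FALSE, VDEV_STATE_REMOVED,
                    VDEV_AUX_NONE);
        }

        for (c = 0; c < vd->vdev_children; c++)
                spa_async_remove(spa, vd->vdev_child[c]);
}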