[PATCH] ZFS: Access vdevs by GUIDs when necessary/appropriate.

Justin T. Gibbs gibbs at scsiguy.com
Fri Feb 4 18:19:43 UTC 2011


The attached patch addresses three issues:

 o The Zpool command (via libzfs) does not display vdev GUID information
   in all cases where it is the only safe way to refer to a vdev.  For
   example, once a vdev is removed, there is no guarantee that it will
   return via the same devfs node.  The Zpool command already reports GUID
   information for vdevs found to be missing at system boot time.  This
   change extends that behavior to vdevs that go missing some time after a
   pool is initially imported/mounted.

 o The Zpool command does not allow a vdev to be referenced by GUID
   for operations such as "online" or "replace".  This is especially
   frustrating when the vdev you wish to replace was "last seen at"
   a devfs path that is currently owned by an active member of the pool.
   In this situation I know of no way to properly replace a vdev or, in
   the case of an online, to ensure that the device is associated with
   the correct in-core vdev metadata.

 o The ZFS vdev geom provider uses an incomplete heuristic to determine
   whether a vdev open is for a "new device" or for one that has been
   seen/labeled before.  This heuristic fails during a "zpool online"
   operation, allowing the open to associate with a geom provider at the
   "last seen at" device path even when a device exists elsewhere in the
   system (at a different device path) whose label properly matches the
   pool and vdev GUIDs referenced in the open.  (The failing logic is
   sketched just below.)
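
For reference, the failing heuristic boils down to the sketch below.  The
wrapper function name is invented for illustration; the helpers are the
real vdev_geom.c routines, and the actual lines being removed appear in
the attached diff.

static struct g_consumer *
vdev_geom_open_prepatch_policy(vdev_t *vd)
{
	struct g_consumer *cp;

	if (vd->vdev_spa->spa_load_state == SPA_LOAD_NONE) {
		/*
		 * No pool load is in progress.  This is true for pool
		 * create/add, but also for post-import operations such
		 * as "zpool online", so the open trusts whatever device
		 * currently answers at vd->vdev_path and ignores GUID
		 * mismatches.
		 */
		cp = vdev_geom_open_by_path(vd, 0);
	} else {
		/* Pool import/load: insist on a label GUID match. */
		cp = vdev_geom_open_by_path(vd, 1);
		if (cp == NULL)
			cp = vdev_geom_open_by_guid(vd);
	}
	return (cp);
}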

These patches are against -current.  Checkin comments indicating what I
changed and why are also in the attached file.  I haven't reviewed the
userland aspects of the pending v28 import yet, but I know that the vdev
geom changes are appropriate there too.

I would appreciate review and comments on these changes before I commit
them to -current and -stable.

Thanks,
Justin
-------------- next part --------------
Change 477305 by justing at justing-ns1 on 2011/02/03 14:32:58

	Allow zpool operations on vdevs specified by GUID.
	
	cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
		The "zpool status" command reports the "last seen at"
		device node path when the vdev name is being reported
		by GUID.  Augment this code to assume a GUID is reported
		when a device goes missing after initial boot, in addition
		to the previous behavior of assuming this only for devices
		that were missing at boot.
	
	cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c:
		o In zpool_vdev_name(), report recently missing devices
		  by GUID, since there is no guarantee they will return
		  at their previous location.  (The resulting naming
		  policy is sketched after this description.)
		o In vdev_to_nvlist_iter(), remove the test for the
		  ZPOOL_CONFIG_NOT_PRESENT vdev attribute before allowing
		  a search by GUID.  The search parameter unambiguously
		  indicates the desired search behavior and this test
		  made it impossible to search for an active (or removed
		  but active at boot) device by GUID.
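	
	For clarity, the naming decision the libzfs_pool.c hunks implement
	can be condensed into the sketch below.  The helper name is
	hypothetical and this is illustration only, not the committed
	code; the headers are the usual libzfs/nvpair ones.
	
	#include <libzfs.h>		/* nvlist_t, vdev_stat_t */
	#include <sys/fs/zfs.h>		/* ZPOOL_CONFIG_*, VDEV_STATE_* */
	
	/* Hypothetical helper: should this vdev be displayed by GUID? */
	static boolean_t
	vdev_name_by_guid(nvlist_t *nv)
	{
		vdev_stat_t *vs;
		uint64_t notpresent;
		uint_t vsc;
	
		/* Marked as never present since the pool was imported. */
		if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
		    &notpresent) == 0)
			return (B_TRUE);
	
		/* Present at import, but currently unopenable. */
		if (nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_STATS,
		    (uint64_t **)&vs, &vsc) == 0 &&
		    vs->vs_state <= VDEV_STATE_CANT_OPEN)
			return (B_TRUE);
	
		return (B_FALSE);
	}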

Affected files ...

... //depot/SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c#5 edit
... //depot/SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c#6 edit

Differences ...

==== //depot/SpectraBSD/head/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c#5 (text) ====

@@ -1071,7 +1071,8 @@
 	}
 
 	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
-	    &notpresent) == 0) {
+	    &notpresent) == 0 ||
+	    vs->vs_state <= VDEV_STATE_CANT_OPEN) {
 		char *path;
 		verify(nvlist_lookup_string(nv, ZPOOL_CONFIG_PATH, &path) == 0);
 		(void) printf("  was %s", path);

==== //depot/SpectraBSD/head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c#6 (text) ====

@@ -1391,11 +1391,9 @@
 
 	verify(nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID, &theguid) == 0);
 
-	if (search == NULL &&
-	    nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT, &present) == 0) {
+	if (search == NULL) {
 		/*
-		 * If the device has never been present since import, the only
-		 * reliable way to match the vdev is by GUID.
+		 * Search by GUID.
 		 */
 		if (theguid == guid)
 			return (nv);
@@ -2396,9 +2394,17 @@
 	char buf[64];
 	vdev_stat_t *vs;
 	uint_t vsc;
+	int have_stats;
 
-	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT,
-	    &value) == 0) {
+	have_stats = nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_STATS,
+	    (uint64_t **)&vs, &vsc) == 0;
+
+	/*
+	 * If the device is not currently present, assume it will not
+	 * come back at the same device path.  Display the device by GUID.
+	 */
+	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_NOT_PRESENT, &value) == 0 ||
+	    (have_stats != 0 && vs->vs_state <= VDEV_STATE_CANT_OPEN)) {
 		verify(nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID,
 		    &value) == 0);
 		(void) snprintf(buf, sizeof (buf), "%llu",
@@ -2412,8 +2418,7 @@
 		 * open a misbehaving device, which can have undesirable
 		 * effects.
 		 */
-		if ((nvlist_lookup_uint64_array(nv, ZPOOL_CONFIG_STATS,
-		    (uint64_t **)&vs, &vsc) != 0 ||
+		if ((have_stats == 0 ||
 		    vs->vs_state >= VDEV_STATE_DEGRADED) &&
 		    zhp != NULL &&
 		    nvlist_lookup_string(nv, ZPOOL_CONFIG_DEVID, &devid) == 0) {

Change 477306 by justing at justing-ns1 on 2011/02/03 14:49:16

	Modify the geom vdev provider's open behavior so that it
	unconditionally opens a device by path only if the open is part
	of a pool create or device add operation and a search of all
	known geom providers' label data doesn't yield a device with
	matching pool and vdev GUIDs.
	
	This fixes a bug where the wrong disk could be associated with
	a vdev's configuration data when devfs paths change due to
	insert and remove events.  While ZFS detects this kind of mixup
	and immediately flags the device as faulted before the confusion
	can cause permanent data loss, a reboot was necessary in order
	to resurrect the configuration.
	
	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
		o When opening by GUID, require both the pool and vdev
		  GUIDs to match.  While it is highly unlikely for two
		  vdevs to have the same vdev GUID, the ZFS storage
		  pool allocator only guarantees GUIDs are unique within
		  a pool.
		o Modify the open behavior (sketched below) to:
		    - Open by the recorded device path, requiring
		      matching GUIDs.
		    - If that fails, search all geom providers for a
		      device with matching GUIDs.
		    - If that fails and we are opening a "new to a
		      pool configuration" vdev, open by path.
		    - Otherwise fail the open.
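	
	Condensed, the new open ordering looks like the sketch below.  The
	wrapper name is hypothetical; the helpers are the real routines
	modified in the vdev_geom.c diff that follows.
	
	static struct g_consumer *
	vdev_geom_open_postpatch_policy(vdev_t *vd)
	{
		struct g_consumer *cp;
	
		/* 1. Trust the recorded path only if its label GUIDs match. */
		cp = vdev_geom_open_by_path(vd, 1);
	
		/* 2. The disk may have moved; search every GEOM provider. */
		if (cp == NULL)
			cp = vdev_geom_open_by_guids(vd);
	
		/*
		 * 3. Only a vdev that has never been opened since boot and
		 *    is not part of a pool being loaded (i.e. a create or
		 *    add operation) may be opened purely by path, ignoring
		 *    GUID mismatches.
		 */
		if (cp == NULL &&
		    vd->vdev_prevstate == VDEV_STATE_UNKNOWN &&
		    vd->vdev_spa->spa_load_state == SPA_LOAD_NONE)
			cp = vdev_geom_open_by_path(vd, 0);
	
		return (cp);	/* NULL here means the open fails. */
	}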

Affected files ...

... //depot/SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#8 edit

Differences ...

==== //depot/SpectraBSD/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#8 (text) ====

@@ -160,20 +160,26 @@
 	}
 }
 
-static uint64_t
-nvlist_get_guid(nvlist_t *list)
+static void
+nvlist_get_guids(nvlist_t *list, uint64_t *pguid, uint64_t *vguid)
 {
 	nvpair_t *elem = NULL;
-	uint64_t value;
 
+	*vguid = 0;
+	*pguid = 0;
 	while ((elem = nvlist_next_nvpair(list, elem)) != NULL) {
-		if (nvpair_type(elem) == DATA_TYPE_UINT64 &&
-		    strcmp(nvpair_name(elem), "guid") == 0) {
-			VERIFY(nvpair_value_uint64(elem, &value) == 0);
-			return (value);
+		if (nvpair_type(elem) != DATA_TYPE_UINT64)
+			continue;
+
+		if (strcmp(nvpair_name(elem), ZPOOL_CONFIG_POOL_GUID) == 0) {
+			VERIFY(nvpair_value_uint64(elem, pguid) == 0);
+		} else if (strcmp(nvpair_name(elem), ZPOOL_CONFIG_GUID) == 0) {
+			VERIFY(nvpair_value_uint64(elem, vguid) == 0);
 		}
+
+		if (*pguid != 0 && *vguid != 0)
+			break;
 	}
-	return (0);
 }
 
 static int
@@ -211,8 +217,8 @@
 	return (error);
 }
 
-static uint64_t
-vdev_geom_read_guid(struct g_consumer *cp)
+static void
+vdev_geom_read_guids(struct g_consumer *cp, uint64_t *pguid, uint64_t *vguid)
 {
 	struct g_provider *pp;
 	vdev_label_t *label;
@@ -220,16 +226,17 @@
 	size_t buflen;
 	uint64_t psize;
 	off_t offset, size;
-	uint64_t guid;
 	int error, l, len, iszvol;
 
 	g_topology_assert_not();
 
+	*pguid = 0;
+	*vguid = 0;
 	pp = cp->provider;
 	ZFS_LOG(1, "Reading guid from %s...", pp->name);
 	if (g_getattr("ZFS::iszvol", cp, &iszvol) == 0 && iszvol) {
 		ZFS_LOG(1, "Skipping ZVOL-based provider %s.", pp->name);
-		return (0);
+		return;
 	}
 
 	psize = pp->mediasize;
@@ -238,7 +245,6 @@
 	size = sizeof(*label) + pp->sectorsize -
 	    ((sizeof(*label) - 1) % pp->sectorsize) - 1;
 
-	guid = 0;
 	label = kmem_alloc(size, KM_SLEEP);
 	buflen = sizeof(label->vl_vdev_phys.vp_nvlist);
 
@@ -256,20 +262,21 @@
 		if (nvlist_unpack(buf, buflen, &config, 0) != 0)
 			continue;
 
-		guid = nvlist_get_guid(config);
+		nvlist_get_guids(config, pguid, vguid);
 		nvlist_free(config);
-		if (guid != 0)
+		if (*pguid != 0 && *vguid != 0)
 			break;
 	}
 
 	kmem_free(label, size);
-	if (guid != 0)
-		ZFS_LOG(1, "guid for %s is %ju", pp->name, (uintmax_t)guid);
-	return (guid);
+	if (*pguid != 0 && *vguid != 0)
+		ZFS_LOG(1, "guid for %s is %ju:%ju", pp->name,
+		    (uintmax_t)*pguid, (uintmax_t)*vguid);
 }
 
 struct vdev_geom_find {
-	uint64_t guid;
+	uint64_t pool_guid;
+	uint64_t vdev_guid;
 	struct g_consumer *cp;
 };
 
@@ -289,7 +296,8 @@
 	struct g_geom *gp, *zgp;
 	struct g_provider *pp;
 	struct g_consumer *zcp;
-	uint64_t guid;
+	uint64_t pguid;
+	uint64_t vguid;
 
 	g_topology_assert();
 
@@ -315,15 +323,16 @@
 					continue;
 				}
 				g_topology_unlock();
-				guid = vdev_geom_read_guid(zcp);
+				vdev_geom_read_guids(zcp, &pguid, &vguid);
 				g_topology_lock();
 				g_access(zcp, -1, 0, 0);
 				g_detach(zcp);
-				if (guid != ap->guid)
+				if (pguid != ap->pool_guid ||
+				    vguid != ap->vdev_guid)
 					continue;
 				ap->cp = vdev_geom_attach(pp);
 				if (ap->cp == NULL) {
-					printf("ZFS WARNING: Unable to attach to %s.\n",
+					ZFS_LOG(1, "ZFS WARNING: Unable to attach to %s.",
 					    pp->name);
 					continue;
 				}
@@ -338,13 +347,14 @@
 }
 
 static struct g_consumer *
-vdev_geom_attach_by_guid(uint64_t guid)
+vdev_geom_attach_by_guid(uint64_t pool_guid, uint64_t vdev_guid)
 {
 	struct vdev_geom_find *ap;
 	struct g_consumer *cp;
 
 	ap = kmem_zalloc(sizeof(*ap), KM_SLEEP);
-	ap->guid = guid;
+	ap->pool_guid = pool_guid;
+	ap->vdev_guid = vdev_guid;
 	g_waitfor_event(vdev_geom_attach_by_guid_event, ap, M_WAITOK, NULL);
 	cp = ap->cp;
 	kmem_free(ap, sizeof(*ap));
@@ -352,14 +362,14 @@
 }
 
 static struct g_consumer *
-vdev_geom_open_by_guid(vdev_t *vd)
+vdev_geom_open_by_guids(vdev_t *vd)
 {
 	struct g_consumer *cp;
 	char *buf;
 	size_t len;
 
 	ZFS_LOG(1, "Searching by guid [%ju].", (uintmax_t)vd->vdev_guid);
-	cp = vdev_geom_attach_by_guid(vd->vdev_guid);
+	cp = vdev_geom_attach_by_guid(spa_guid(vd->vdev_spa), vd->vdev_guid);
 	if (cp != NULL) {
 		len = strlen(cp->provider->name) + strlen("/dev/") + 1;
 		buf = kmem_alloc(len, KM_SLEEP);
@@ -368,10 +378,12 @@
 		spa_strfree(vd->vdev_path);
 		vd->vdev_path = buf;
 
-		ZFS_LOG(1, "Attach by guid [%ju] succeeded, provider %s.",
+		ZFS_LOG(1, "Attach by guid [%ju:%ju] succeeded, provider %s.",
+		    (uintmax_t)spa_guid(vd->vdev_spa),
 		    (uintmax_t)vd->vdev_guid, vd->vdev_path);
 	} else {
-		ZFS_LOG(1, "Search by guid [%ju] failed.",
+		ZFS_LOG(1, "Search by guid [%ju:%ju] failed.",
+		    (uintmax_t)spa_guid(vd->vdev_spa),
 		    (uintmax_t)vd->vdev_guid);
 	}
 
@@ -383,7 +395,8 @@
 {
 	struct g_provider *pp;
 	struct g_consumer *cp;
-	uint64_t guid;
+	uint64_t pguid;
+	uint64_t vguid;
 
 	cp = NULL;
 	g_topology_lock();
@@ -393,14 +406,17 @@
 		cp = vdev_geom_attach(pp);
 		if (cp != NULL && check_guid) {
 			g_topology_unlock();
-			guid = vdev_geom_read_guid(cp);
+			vdev_geom_read_guids(cp, &pguid, &vguid);
 			g_topology_lock();
-			if (guid != vd->vdev_guid) {
+			if (pguid != spa_guid(vd->vdev_spa) ||
+			    vguid != vd->vdev_guid) {
 				vdev_geom_detach(cp, 0);
 				cp = NULL;
 				ZFS_LOG(1, "guid mismatch for provider %s: "
-				    "%ju != %ju.", vd->vdev_path,
-				    (uintmax_t)vd->vdev_guid, (uintmax_t)guid);
+				    "%ju:%ju != %ju:%ju.", vd->vdev_path,
+				    (uintmax_t)spa_guid(vd->vdev_spa),
+				    (uintmax_t)vd->vdev_guid,
+				    (uintmax_t)pguid, (uintmax_t)vguid);
 			} else {
 				ZFS_LOG(1, "guid match for provider %s.",
 				    vd->vdev_path);
@@ -434,22 +450,42 @@
 	error = 0;
 
 	/*
-	 * If we're creating pool, just find GEOM provider by its name
-	 * and ignore GUID mismatches.
+	 * Try using the recorded path for this device, but only
+	 * accept it if its label data contains the expected GUIDs.
 	 */
-	if (vd->vdev_spa->spa_load_state == SPA_LOAD_NONE)
+	cp = vdev_geom_open_by_path(vd, 1);
+	if (cp == NULL) {
+		/*
+		 * The device at vd->vdev_path doesn't have the
+		 * expected GUIDs. The disks might have merely
+		 * moved around so try all other GEOM providers
+		 * to find one with the right GUIDs.
+		 */
+		cp = vdev_geom_open_by_guids(vd);
+	}
+
+	if (cp == NULL &&
+	    vd->vdev_prevstate == VDEV_STATE_UNKNOWN &&
+	    vd->vdev_spa->spa_load_state == SPA_LOAD_NONE) {
+		/*
+		 * We are dealing with a vdev that hasn't been previously
+		 * opened (since boot), and we are not loading an
+		 * existing pool configuration.  This looks like a
+		 * vdev add operation to a new or existing pool.
+		 * Assume the user knows what he/she is doing and find
+		 * GEOM provider by its name, ignoring GUID mismatches.
+		 *
+		 * XXPOLICY: It would be safer to only allow a device
+		 *           that is devoid of label information to be
+		 *           opened in this fashion.  This would require
+		 *           a new option to the zpool command line tool
+		 *           allowing the label information to be wiped
+		 *           on a device, and augmented error reporting
+		 *           so the user can understand why their request
+		 *           failed and the required steps to repurpose
+		 *           the device.
+		 */
 		cp = vdev_geom_open_by_path(vd, 0);
-	else {
-		cp = vdev_geom_open_by_path(vd, 1);
-		if (cp == NULL) {
-			/*
-			 * The device at vd->vdev_path doesn't have the
-			 * expected guid. The disks might have merely
-			 * moved around so try all other GEOM providers
-			 * to find one with the right guid.
-			 */
-			cp = vdev_geom_open_by_guid(vd);
-		}
 	}
 
 	if (cp == NULL) {

