ZFS stripesize patch (in the context of 4k sector drives)

Ivan Voras ivoras at freebsd.org
Sun Dec 5 14:29:14 UTC 2010


On 11/18/10 09:24, Xin LI wrote:
> On 11/12/10 10:09, Ivan Voras wrote:
>> On 11/12/10 16:00, Ivan Voras wrote:
>>> Hello,
>>>
>>> Any objections to me committing the following patch?
>>>
>>> The intention is to use stripesize info from GEOM in creating vdevs, in
>>> the hope that the 4 KiB sector magic will work.
> 
>> Or maybe not. I've grepped and other tools use stripesize in the way its
>> name suggests - as RAID stripe size, not as logical sector size.
> 
>> New idea on the menu: make the logical sector size a separate concept
>> and a separate variable from stripe size. Would that be a better approach?
> 
> Have you tested this booting from existing ZFS file system?

No, but it will probably work because ashift is stored in ZFS metadata
and compliant implementations should read it.

I tested ZFS with several combinations of ashift and sector size (using
gnop to fake the sector size), and here is what works and what doesn't:

* Pools created with ashift=9 (512-byte blocks) and imported while the
GEOM sectorsize is 512 bytes will work.
* Pools created with ashift=9 (512-byte blocks) and imported while the
GEOM sectorsize is 4096 bytes will NOT work.
* Pools created with ashift=12 (4096-byte blocks) and imported while the
GEOM sectorsize is 512 bytes will work.
* Pools created with ashift=12 (4096-byte blocks) and imported while the
GEOM sectorsize is 4096 bytes will work.

Basically, only increasing the sectorsize (i.e. the minimum I/O
alignment) causes trouble: pools on drives that were formerly formatted
with the old (512-byte) sector size will not work. Personally, I'd still
make the change sooner rather than later to reduce the number of users
who run into it, but after discussing it with mav I also understand the
conservative side.

This discussion also produced the idea of capping ashift at some upper
bound. SPA_MAXBLOCKSIZE (128 KiB, i.e. ashift 17) looks reasonable for
this, so here's an updated patch. As the goal is to deal with current
4 KiB sector drives, the whole thing may need to be revisited in the
future if other devices start filling in stripesize (probably by
introducing a separate "physsectorsize" field).

Comments? Ideas?


--- vdev_geom.c.ori	2010-12-05 15:08:09.000000000 +0100
+++ vdev_geom.c	2010-12-05 15:10:50.000000000 +0100
@@ -496,7 +496,10 @@
 	/*
 	 * Determine the device's minimum transfer size.
 	 */
-	*ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;
+	if (pp->stripesize != 0 && pp->stripesize > pp->sectorsize)
+		*ashift = highbit(MIN(pp->stripesize, SPA_MAXBLOCKSIZE)) - 1;
+	else
+		*ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;

 	/*
 	 * Clear the nowritecache bit, so that on a vdev_reopen() we will



More information about the freebsd-geom mailing list