AF (4096 byte sector) drives: Can you mix/match in a ZFS pool?

Daniel Kalchev daniel at digsys.bg
Wed Oct 12 16:50:26 UTC 2011



On 12.10.11 18:59, Jeremy Chadwick wrote:
> On Wed, Oct 12, 2011 at 10:11:04AM -0500, Larry Rosenman wrote:
>> I have a root on ZFS box with 6 drives, all 400G (except one 500G)
>> in a pool.
>>
>> I want to upgrade to 2T or 3T drives, but was wondering if you can
>> mix/match while doing the drive by drive
>> replacement.
>>
>> This is on 9.0-BETA3 if that matters.
> This is a very good question, and opens a large can of worms.  My gut
> feeling tells me this discussion is going to be very long.
>
> I'm going to say that no, mixing 512-byte and 4096-byte sector drives in
> a single vdev is a bad idea.  Here's why:

This was not the original question. The original question was whether
512-byte sector drives in a zpool created with 512-byte alignment can
be replaced with 4096-byte sector drives.

It is possible, of course, since most 4096-byte sector drives today
emulate 512-byte sectors, and some even report themselves as 512-byte
sector drives.

Performance may degrade, depending on the workload. In some cases
(for example, many writes smaller than 4096 bytes) it can be very bad.

>
> The procedure I've read for doing this is as follows:
>
> ada0 =  512-byte sector disk
> ada1 = 4096-byte sector disk
> ada2 =  512-byte sector disk
>
> gnop create -S 4096 ada1
> zpool create mypool raidz ada0 ada1.nop ada2
> zdb | grep ashift
>     <should show "ashift: 12" for 4096-byte alignment or "ashift: 9" for
>      512-byte alignment>
> zpool export mypool
> gnop destroy ada1.nop
> zpool import mypool

It is not important which of the underlying drives is gnop-ed. You
may well gnop all of them. The point is that ZFS uses the largest
sector size of any of the underlying devices to determine the ashift
value. That is the "minimum write" value, or the smallest unit of data
ZFS will write in an I/O.
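
To illustrate that rule, here is a small sketch in plain sh. It is not
how ZFS computes the value internally. The function name ashift_for is
made up for this example, and it assumes every sector size passed in is
a power of two. The ashift is simply the base-2 logarithm of the largest
sector size among the devices:

```sh
#!/bin/sh
# Hypothetical helper (not part of ZFS or FreeBSD): print the ashift
# implied by a list of device sector sizes, given in bytes.
# Assumption: every size is a power of two (512, 4096, ...).
ashift_for() {
    max=0
    for size in "$@"; do
        # Keep the largest sector size seen so far.
        if [ "$size" -gt "$max" ]; then
            max=$size
        fi
    done
    # Count how many times the size can be halved: that is log2(max).
    shift_bits=0
    while [ "$max" -gt 1 ]; do
        max=$((max / 2))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}

ashift_for 512 4096 512   # one gnop-ed 4096-byte provider: prints 12
ashift_for 512 512 512    # only 512-byte providers: prints 9
```

This is why a single gnop-ed provider is enough: one 4096-byte device in
the list already raises the result to 12.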

> Circling back to the procedure I stated above: this would result in an
> ashift=12 alignment for all I/O to all underlying disks.  How do you
> think your 512-byte sector drives are going to perform when doing reads
> and writes?  (Answer: badly)

The gnop trick is used not because you will ask a 512-byte sector drive
to write 8 sectors with one I/O, but because you may ask a 4096-byte
sector drive to write only 512 bytes -- which for the drive means it has
to read 4096 bytes, modify 512 of these bytes and write back 4096 bytes.
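
The arithmetic can be sketched as follows. This is a deliberately
simplified model, not a measurement: the function name rmw_cost is
invented for the example, the write is assumed to start on a sector
boundary, and any partial write is charged a read of everything it
touches.

```sh
#!/bin/sh
# Hypothetical helper: bytes a drive must physically read and write to
# service one write of $1 bytes on a drive with $2-byte physical sectors.
# Assumption: the write starts on a physical sector boundary.
rmw_cost() {
    write_bytes=$1
    sector_bytes=$2
    # Number of physical sectors touched, rounded up.
    sectors=$(( (write_bytes + sector_bytes - 1) / sector_bytes ))
    physical=$(( sectors * sector_bytes ))
    if [ "$write_bytes" -lt "$physical" ]; then
        # Partial sector: read it, modify it, write it back.
        echo "read=$physical write=$physical"
    else
        # Whole sectors only: no read is needed.
        echo "read=0 write=$physical"
    fi
}

rmw_cost 512 4096    # ashift=9 write on a 4096-byte sector drive
rmw_cost 4096 4096   # ashift=12 write on the same drive
rmw_cost 4096 512    # ashift=12 write on a 512-byte sector drive
```

Only the first case needs the extra read. In the third case the 512-byte
sector drive simply writes 8 whole sectors.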

> So my advice is do not mix-match 512-byte and 4096-byte sector disks in a
> vdev that consists of multiple disks.
>

The proper way to handle this is to create your zpool with 4096-byte
alignment (ashift=12), which for the time being means using the gnop
'hack' above.

This way you avoid the performance penalty no matter which drives
(512-byte or 4096-byte sector) you use in the vdev.

There should be no problem with having one vdev with 512-byte
alignment and another with 4096-byte alignment. ZFS is smart enough to
issue writes of at least 512 bytes to the former and at least 4096
bytes to the latter, so this does not create a bottleneck.

Daniel

PS: I didn't say you are wrong. ;)

