effect of differing spindle speeds on prospective zfs vdevs

Mon Dec 7 22:23:26 UTC 2020

On Sat, 5 Dec 2020 19:16:33 +0000, tech-lists <tech-lists at zyxst.net> wrote:

> Hi,
> 
> On Sat, Dec 05, 2020 at 08:51:08AM -0500, Paul Mather wrote:
>> IIRC, ZFS pools have a single ashift for the entire pool, so you should 
>> set it to accommodate the 4096/4096 devices to avoid performance 
>> degradation.  I believe it defaults to that now, and should auto-detect 
>> anyway.  But, in a mixed setup of vdevs like you have, you should be 
>> using ashift=12.
>> 
>> I believe using ashift=9 on your mixed-drive setup would do the most 
>> to reduce performance.
> 
> Part of my confusion about the ashift thing is I thought ashift=9 was for
> 512/512 logical/physical. Is this still the case?
> 
> On a different machine which has been running since FreeBSD12 was -current,
> one of the disks in the array went bang. zdb shows ashift=9 (as was default
> when it was created). The only available replacement was an otherwise 
> identical disk but 512 logical/4096 physical. zpool status mildly warns 
> about performance degradation like this:
> 
> ada2    ONLINE       0     0     0  block size: 512B configured, 4096B native
> 
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>      Expect reduced performance.
> action: Replace affected devices with devices that support the
>      configured block size, or migrate data to a properly configured
>      pool.
> 
> The other part of my confusion is that I understood zfs to set its own 
> blocksize on the fly.

You're correct that ZFS has its own concept of a block size (the "recordsize" property), but this is not the block size that ashift concerns.  When "zpool" complains about a "non-native block size" it is talking about the physical block size of the device underlying the vdev.  That is the smallest unit of data that can be read from or written to the device.  (It also affects where partitions should be aligned.)
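
To make the numbers concrete: ashift is the base-2 logarithm of the sector size, so ashift=9 means 512-byte sectors and ashift=12 means 4096-byte sectors.  A minimal Python sketch of that arithmetic follows (the function names are my own, purely for illustration; they are not anything ZFS provides):

```python
def ashift_for(sector_size: int) -> int:
    """Return log2 of a power-of-two sector size, i.e. the matching ashift."""
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_size.bit_length() - 1


def ashift_for_mixed_drives(physical_sector_sizes):
    """Pick the ashift that suits the largest physical sector in the set."""
    return max(ashift_for(size) for size in physical_sector_sizes)


print(ashift_for(512))                       # 9
print(ashift_for(4096))                      # 12
print(ashift_for_mixed_drives([512, 4096]))  # 12
```

So a mix of 512-byte and 4096-byte physical-sector drives calls for ashift=12.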

When hard drives became larger, the number of bits used to address logical blocks (LBAs) became insufficient to reference all blocks on the device.  One way around this, and to enable devices to store more total data, was to make the referenced blocks larger.  (Larger block sizes are also good in that they require relatively less space for ECC data.)  Hence, the 4K "advanced format" drives arrived.  Before that, block (a.k.a. sector) sizes had typically been 512 bytes for hard drives.  Afterwards, 4096 bytes became the norm.
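
As a rough illustration of the addressing argument, here is the arithmetic in Python.  The 32-bit address width is an assumption chosen for the example (it is the width of the block address field in MBR partition tables); other interfaces use different widths, and the helper name is hypothetical.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def max_addressable_bytes(address_bits: int, sector_size: int) -> int:
    """Largest capacity reachable with a given address width and sector size."""
    return (2 ** address_bits) * sector_size


# Same number of address bits, eight times the sector size,
# eight times the addressable capacity.
print(max_addressable_bytes(32, 512) // TIB)   # 2
print(max_addressable_bytes(32, 4096) // TIB)  # 16
```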

For some drives, the device actually utilises 4096-byte sectors but advertises a 512-byte sector size to the outside world.  From a read standpoint this doesn't create a problem.  It is when writing that you can incur performance issues.  This is because writing/updating a 512-byte sector within a 4096-byte physical sector involves a read-modify-write operation: the original 4096-byte contents must be read, then the 512-byte subset updated, and finally the new 4096-byte whole written back to disk.  That involves more than simply writing a 512-byte block as-is to a 512-byte sector.  (In similar fashion, partitions not aligned on a 4K boundary can incur performance degradation on drives that have 4096-byte physical sectors but advertise 512-byte ones.)
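
The read-modify-write cost can be sketched in Python as a simple count of physical-sector operations.  This is a simplified model under the assumption that every partially covered physical sector costs one extra read; real drive firmware may cache or coalesce, and the function name is made up for illustration.

```python
PHYSICAL = 4096  # assumed physical sector size in bytes


def physical_ops(offset: int, length: int, physical: int = PHYSICAL):
    """Return (reads, writes) of physical sectors for one logical write.

    A physical sector that the write only partly covers must be read first
    so that its untouched bytes can be preserved (read-modify-write).
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    end = offset + length
    first = offset // physical
    last = (end - 1) // physical
    writes = last - first + 1
    reads = 0
    for sector in range(first, last + 1):
        start = sector * physical
        fully_covered = offset <= start and start + physical <= end
        if not fully_covered:
            reads += 1
    return reads, writes


print(physical_ops(512, 512))    # (1, 1): 512-byte write inside a 4K sector
print(physical_ops(0, 4096))     # (0, 1): aligned 4K write, no read needed
print(physical_ops(2048, 4096))  # (2, 2): misaligned 4K write straddles two sectors
```

The last case is the misaligned-partition situation: a write that is the right size but starts off a 4K boundary touches two physical sectors and has to read both.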

> (I guess there must be some performance degradation but it's not
> yet enough for me to notice. Or it might only be noticeable if low on space).

ZFS has a lot of caching, plus it "batches" writes (into transaction groups), and all of this can ameliorate the effects of misaligned block sizes and partition boundaries.  (Large sequential writes are best for performance, especially on spinning disks, which incur penalties for head movement and can incur rotational delays.)  But, if you have a write-intensive pool, you are unnecessarily causing yourself a performance hit by not using the correct ashift and/or partition boundaries.

BTW, low space mainly affects performance due to fragmentation.  It is a different issue vs. mismatched block size (ashift).

When I replaced my ashift=9 512-byte drives I eventually recreated the pool with ashift=12.  Using ashift=12 on pools with 512-byte sector size drives will not incur any performance penalty, which is why ashift defaults to 12 nowadays.  (I wouldn't be surprised if the default changes to ashift=13 due to the prevalence of SSDs these days.)

Cheers,

Paul.