getting to 4K disk blocks in ZFS

Aristedes Maniatis ari at ish.com.au
Wed Sep 10 06:47:15 UTC 2014


As we all know, it is important to ensure that modern disks are set up properly with the correct block size. Everything is good if all the disks and the pool are "ashift=9" (512 byte blocks). But as soon as one new drive requires 4K blocks, performance of the entire pool drops through the floor.
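
For anyone not familiar with the notation: ashift is the base-2 logarithm of the sector size the pool assumes, so ashift=9 is 512 bytes and ashift=12 is 4096 bytes. A small Python sketch of the arithmetic (the function names are my own, not part of any ZFS tool):

```python
def ashift_to_block_size(ashift: int) -> int:
    """Return the sector size in bytes implied by an ashift value."""
    return 1 << ashift


def block_size_to_ashift(block_size: int) -> int:
    """Return the ashift for a power-of-two sector size in bytes."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    return block_size.bit_length() - 1


for ashift in (9, 12):
    print(f"ashift={ashift} -> {ashift_to_block_size(ashift)} bytes")
```

So a 4K disk in an ashift=9 pool can be asked for writes one eighth the size of its physical sector.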


In order to upgrade there appear to be two separate things that must be done for a ZFS pool.

1. Create partitions on 4K boundaries. This is simple with the "-a 4k" option in gpart, and it isn't hard to remove disks one at a time from a pool, repartition them on the right boundaries and put them back. Hopefully you've left a few spare bytes on the disk to ensure that your partition doesn't get smaller when you reinsert it into the pool.

2. Create a brand new pool which has ashift=12 and zfs send|receive all the data over.
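
To make the alignment in step 1 concrete: a partition is 4K-aligned when its starting byte offset is a multiple of 4096. A rough Python sketch, assuming the start offset is given in 512-byte sectors as "gpart show" reports it on a disk with 512-byte logical sectors (the helper names are mine, for illustration only):

```python
SECTOR_BYTES = 512   # assumed logical sector size
ALIGN_BYTES = 4096   # desired alignment


def is_4k_aligned(start_sector: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the partition's starting byte offset is a multiple of 4096."""
    return (start_sector * sector_bytes) % ALIGN_BYTES == 0


def round_up_to_4k(start_sector: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Return the first 4K-aligned sector at or after start_sector."""
    sectors_per_block = ALIGN_BYTES // sector_bytes
    # Ceiling division, then scale back up to a sector number.
    return -(-start_sector // sectors_per_block) * sectors_per_block


for start in (34, 40, 2048):
    print(start, is_4k_aligned(start), round_up_to_4k(start))
```

A partition starting at sector 34 is not aligned, and the next aligned start is sector 40. Those lost sectors are part of the reason to leave some slack when the partition is recreated.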


I guess I don't understand enough about zpool to know why the pool itself has a block size, since I understood ZFS to have variable stripe widths.

The problem with step 2 is that you need to have enough hard disks spare to create a whole new pool and throw away the old disks. Plus a disk controller with lots of spare ports. Plus the ability to take the system offline for hours or days while the migration happens.

One way to reduce this slightly is to create a new pool with reduced redundancy. For example, create a RAIDZ2 with two fake disks, then offline those disks.


So, given how much this problem sucks (it is extremely easy to add a 4K disk by mistake as a replacement for a failed disk), and how painful the workaround is... will ZFS ever gain the ability to change block size for the pool? Or is this so deep in the internals of ZFS that, like being able to dynamically add disks to an existing vdev, it belongs in the "never going to happen" basket?

And secondly, is it also bad to have ashift=9 disks inside an ashift=12 pool? That is, do we need to replace all our disks in one go and forever keep big sticky labels on each disk so we never mix them?



Thanks for any advice
Ari Maniatis





-- 
-------------------------->
Aristedes Maniatis
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A


More information about the freebsd-stable mailing list