ZFS 4K drive overhead

Sun May 6 13:55:32 UTC 2012

A couple of months back, I finished rebuilding my zpools to use
ashift=12, after trying a 4k drive in a pool with ashift=9.  If you
try a 4k drive on an ashift=9 pool, you're going to have a bad time.

Performance for occasional IO (particularly streaming) isn't too bad
with misaligned sectors.  However, resilvering time is MUCH, MUCH,
MUCH higher - I saw estimates for resilver completion go up by over an
order of magnitude, and the pool became nearly unusable while a
resilver was in progress.

ZFS will dynamically adjust block size for a file, between the
smallest block size the pool allows (512 bytes with ashift=9) and
128k or so (IIRC).  That means that even if you align a partition on
your 4k disk, or use the raw disk itself (so ZFS starts on an aligned
sector), after the first small file is written you'll be doing
unaligned IOs.  Resilvering a 1.5 TB drive was estimated at over 230
hours for me; it actually took less time to abort the resilver and
rebuild the server from backups.
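
To illustrate why one small write is enough to knock everything after
it off the 4k boundaries, here is a rough Python sketch.  It is a toy
model only: it assumes blocks are laid out back to back, each rounded
up to 2**ashift bytes, which is NOT how the real ZFS allocator places
data.  The function names and the sample block sizes are made up for
the example.

```python
PHYS = 4096  # physical sector size of a 4k drive, in bytes


def is_aligned(offset, length, phys=PHYS):
    """True if an IO starts and ends on physical sector boundaries."""
    return offset % phys == 0 and length % phys == 0


def misaligned_fraction(block_sizes, ashift, phys=PHYS):
    """Toy model: place blocks end to end, each rounded up to
    2**ashift bytes, and return the fraction of writes that do not
    line up with the physical sectors."""
    unit = 1 << ashift
    offset = 0
    bad = 0
    for size in block_sizes:
        alloc = -(-size // unit) * unit  # round up to the allocation unit
        if not is_aligned(offset, alloc, phys):
            bad += 1
        offset += alloc
    return bad / len(block_sizes)


# Hypothetical mix: one 512-byte write first, then mostly 128k blocks.
sizes = [512, 131072, 131072, 1536, 131072]
print(misaligned_fraction(sizes, ashift=9))   # 1.0
print(misaligned_fraction(sizes, ashift=12))  # 0.0
```

In this model the single 512-byte block at the front shifts every
later block by 512 bytes, so even the 128k writes straddle physical
sectors.  With ashift=12 every allocation is a multiple of 4096, so
nothing can drift.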

Given the prevalence of 4k drives on the market now, and the
likelihood that they'll be the only product available in the future,
I'd highly recommend using ashift=12 on any new zpools.  It's time to
stop using ashift=9.
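
For reference, ashift is the base-2 logarithm of the smallest
allocation unit, so ashift=9 means 512 bytes and ashift=12 means 4096
bytes.  The sketch below estimates only the padding lost to rounding a
single allocation up to that unit.  It deliberately ignores metadata,
raidz parity and compression, so treat it as a rough illustration of
the space side of the trade-off and not as a measurement.  The helper
names are invented for the example.

```python
def sector_size(ashift):
    """Smallest allocation unit in bytes for a given ashift."""
    return 1 << ashift


def allocated(size, ashift):
    """Bytes consumed when `size` is rounded up to the allocation unit."""
    unit = sector_size(ashift)
    return -(-size // unit) * unit


def padding_overhead(size, ashift):
    """Padding as a fraction of the useful data size."""
    return (allocated(size, ashift) - size) / size


for size in (1024, 19_000_000):  # a 1 kB file and a 19 MB file
    for ashift in (9, 12):
        print(size, ashift, allocated(size, ashift),
              padding_overhead(size, ashift))
```

Under these assumptions a 1 kB file takes a full 4096 bytes at
ashift=12 (three times its own size in padding), while for a 19 MB
file the padding is a tiny fraction of a percent.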

On Sun, May 6, 2012 at 4:46 AM, Miroslav Lachman <000.fbsd at quip.cz> wrote:
> Chris wrote:
>>
>> Hi all,
>>
>> I'm planning on making a raidz2 with 6 2 TB drives - all 4K sectors,
>> all reporting as 512 bytes. I've been reading some disturbing things
>> about ZFS when used on 4K drives. In this discussion
>>
>> (http://mail.opensolaris.org/pipermail/zfs-discuss/2011-October/049959.html),
>> Jim Klimov pointed out that when ZFS is used with ashift=12, the
>> metadata overhead for a filesystem with a lot of small files can reach
>> 100%
>> (http://mail.opensolaris.org/pipermail/zfs-discuss/2011-October/049960.html)!
>> That seems pretty bad to me. My questions are:
>>
>> Does anyone on this list have experience using ZFS on 4K drives with
>> ashift=12? Is the overhead per file, such that having a relatively
>> large average filesize, say, 19 MB, would render it insignificant? Or
>> would the overhead be large regardless?
>
>
> An average size of 19 MB is much larger than 4k (metadata), so the overhead
> will not be as high as with really small files (files of a few kB).
>
>
>> What is the speed penalty for using ashift=9 on the array? Is the
>> safety of the data on the array an issue  (due to how ZFS can't write
>> to a 512 byte sector but it's coded with the assumption that it can
>> thus making it no longer strictly copy-on-write)? Does anyone have any
>> experience with ashift=9 arrays on 4K drives?
>
>
> Even if the overhead is larger, the speed penalty is much higher. You
> should read about it in the posts on this blog:
>
> http://blog.des.no/search/label/freebsd
>
> There are various articles with benchmarks of 4k-sector drives, and some of
> the drives are almost useless with unaligned writes. So I strongly recommend
> that you use 4k (ashift=12).
>
> Use ashift=9 only if performance doesn't matter and you are concerned only
> with available space.
>
> Miroslav Lachman