ZFS extra space overhead for ashift=12 vs ashift=9 raidz2 pool?

Taylor j.freebsd-zfs at enone.net
Sat Mar 24 18:38:57 UTC 2012


Dennis,

This is a bit off topic from my original question and I'm hoping not to distract from it too much,
but to briefly answer your question:

My experience with the 4TB Hitachi drives is limited; I've only had them for about a week. One of the
drives exhibited ICRC errors, which in theory could have been just a cabling issue, but I couldn't reproduce
the problem with the same cable/slot and a different drive, so I ended up RMAing the ICRC drive just
in case. However, I have had good luck with Hitachi 3TB drives over the past year and with one Hitachi
4TB drive over the last month, and I have not encountered any other problems with this
batch of 4TB drives so far.

Cheers,

-Taylor


On Mar 23, 2012, at 9:40 AM, Dennis Glatting wrote:

> 
> Somewhat related:
> 
> I am also using 4TB Hitachi drives, but only four. Although fairly happy with these drives, I have had one disk fail in the two months I have been using them. This may have been an infant failure, but I am wondering if you have had any similar experiences with these drives.
> 
> 
> 
> On Fri, 23 Mar 2012, Taylor wrote:
> 
>> Hello,
>> 
>> I'm bringing up a new ZFS filesystem and have noticed something strange with respect to the overhead from ZFS. When I create a raidz2 pool with 512-byte sectors (ashift=9), I have an overhead of 2.59%, but when I create the zpool using 4k sectors (ashift=12), I have an overhead of 8.06%. This amounts to a difference of 2.79TiB in my particular application, which I'd like to avoid. :)
>> 
>> (Assuming I haven't done anything wrong. :) ) Is the extra overhead for 4k sector (ashift=12) raidz2 pools expected? Is there any way to reduce this?
>> 
>> (In my very limited performance testing, 4K sectors do seem to perform slightly better and more consistently, so I'd like to use them if I can avoid the extra overhead.)
>> 
>> Details below.
>> 
>> Thanks in advance for your time,
>> 
>> -Taylor
>> 
>> 
>> 
>> I'm running:
>> FreeBSD host 9.0-RELEASE FreeBSD 9.0-RELEASE #0  amd64
>> 
>> I'm using Hitachi 4TB Deskstar 0S03364 drives, which are 4K sector devices.
>> 
>> In order to "future proof" the raidz2 pool against possible variations in replacement drive size, I've created a single partition on each drive, starting at sector 2048 and sized about 100MB smaller than the total available space on the disk. (A rough sketch of the gpart commands appears after the listing below.)
>> $ sudo gpart list da2
>> Geom name: da2
>> modified: false
>> state: OK
>> fwheads: 255
>> fwsectors: 63
>> last: 7814037134
>> first: 34
>> entries: 128
>> scheme: GPT
>> Providers:
>> 1. Name: da2p1
>> Mediasize: 4000682172416 (3.7T)
>> Sectorsize: 512
>> Stripesize: 0
>> Stripeoffset: 1048576
>> Mode: r1w1e1
>> rawuuid: 71ebbd49-7241-11e1-b2dd-00259055e634
>> rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
>> label: (null)
>> length: 4000682172416
>> offset: 1048576
>> type: freebsd-zfs
>> index: 1
>> end: 7813834415
>> start: 2048
>> Consumers:
>> 1. Name: da2
>> Mediasize: 4000787030016 (3.7T)
>> Sectorsize: 512
>> Mode: r1w1e2
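>> 
>> For reference, a partition like the one above can be created with something along these lines (the -s value here is just the partition length from the listing divided by the 512-byte sector size, so treat this as a sketch rather than a recipe):
>> $ sudo gpart create -s gpt da2
>> $ sudo gpart add -t freebsd-zfs -b 2048 -s 7813832368 da2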
>> 
>> Each partition gives me 4000682172416 bytes (or 3.64 TiB). I'm using 16 drives.  I create the zpool with 4K sectors as follows:
>> $ sudo gnop create -S 4096 /dev/da2p1
>> $ sudo zpool create zav raidz2 da2p1.nop da3p1 da4p1 da5p1 da6p1 da7p1 da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1
>> 
>> I confirm ashift=12:
>> $ sudo zdb zav | grep ashift
>>              ashift: 12
>>              ashift: 12
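>> 
>> (As I understand it, the .nop provider is only needed at pool creation time; the usual follow-up is to export the pool, destroy the gnop device, and re-import, after which the pool attaches to the plain partition and keeps ashift=12. Something like:)
>> $ sudo zpool export zav
>> $ sudo gnop destroy da2p1.nop
>> $ sudo zpool import zav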
>> 
>> "zpool list" approximately matches the expected raw capacity of 16*4000682172416 = 64010914758656 bytes (58.28 TiB).
>> $ zpool list zav
>> NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>> zav     58T  1.34M  58.0T     0%  1.00x  ONLINE  -
>> 
>> For raidz2, I'd expect to see 4000682172416*14 = 56009550413824 bytes (50.94 TiB). However, I only get:
>> $ zfs list zav
>> NAME   USED  AVAIL  REFER  MOUNTPOINT
>> zav   1.10M  46.8T  354K  /zav
>> 
>> Or using df for greater accuracy:
>> $ df zav
>> Filesystem 1K-blocks   Used       Avail Capacity  Mounted on
>> zav        50288393472  354 50288393117     0%    /zav
>> 
>> A total of 50288393472 KiB, or 51495314915328 bytes (46.83 TiB). (This is for a freshly created zpool, before any snapshots, etc. have been taken.)
>> 
>> I measure overhead as (expected - actual) / expected, which for the 4k sector (ashift=12) raidz2 pool comes to 8.06%.
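>> 
>> As a quick sanity check with bc, plugging in the byte counts above for the ashift=12 pool:
>> $ echo "scale=6; (56009550413824 - 51495314915328) / 56009550413824 * 100" | bc
>> 8.059700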
>> 
>> To create a 512-byte sector (ashift=9) raidz2 pool, I basically just replace "da2p1.nop" with "da2p1" when creating the zpool, and I confirm ashift=9. The zpool raw size is the same (as far as I can tell with the limited precision of zpool list). However, the available size according to zfs list/df is 54560512935936 bytes (49.62 TiB), which amounts to an overhead of 2.59%. There are some minor differences in the ALLOC and USED listings, so I repeat them here for the 512-byte sector raidz2 pool:
>> $ zpool list zav
>> NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>> zav     58T   228K  58.0T     0%  1.00x  ONLINE  -
>> $ zfs list zav
>> NAME   USED  AVAIL  REFER  MOUNTPOINT
>> zav    198K  49.6T  73.0K  /zav
>> $ df zav
>> Filesystem 1K-blocks   Used       Avail Capacity  Mounted on
>> zav        53281750914   73 53281750841     0%    /zav
>> 
>> I expect some overhead from ZFS, and according to this blog post:
>> http://www.cuddletech.com/blog/pivot/entry.php?id=1013
>> (via http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/041773.html)
>> there may be a 1/64 (about 1.56%) overhead baked into ZFS. Interestingly enough, when I create a pool with no raid/mirroring, I get an overhead of 1.93% regardless of ashift=9 or ashift=12, which is quite close to the 1/64 number. I have also tested raidz, which behaves similarly to raidz2, although the overhead is slightly lower in each case: 1) ashift=9 raidz overhead is 2.33%, and 2) ashift=12 raidz overhead is 7.04%.
>> 
>> To save space here, I've put the zdb listings for both the ashift=9 and ashift=12 raidz2 pools at:
>> http://pastebin.com/v2xjZkNw
>> 
>> There are also some differences in the zdb output; for example, "SPA allocated" is higher in the 4K sector raidz2 pool, which seems interesting, although I don't understand the significance of this.
>> 
>> 
> 


