ZFS extra space overhead for ashift=12 vs ashift=9 raidz2 pool?
Dennis Glatting
dg at pki2.com
Fri Mar 23 16:40:33 UTC 2012
Somewhat related:
I am also using 4TB Hitachi drives, but only four. Although I am fairly happy
with these drives, one disk has failed in the two months I have been
using them. This may have been an infant-mortality failure, but I am wondering
if you have had any similar experiences with the drives.
On Fri, 23 Mar 2012, Taylor wrote:
> Hello,
>
> I'm bringing up a new ZFS filesystem and have noticed something strange with respect to the overhead from ZFS. When I create a raidz2 pool with 512-byte sectors (ashift=9), I have an overhead of 2.59%, but when I create the zpool using 4k sectors (ashift=12), I have an overhead of 8.06%. This amounts to a difference of 2.79TiB in my particular application, which I'd like to avoid. :)
>
> (Assuming I haven't done anything wrong. :) ) Is the extra overhead for 4k sector (ashift=12) raidz2 pools expected? Is there any way to reduce this?
>
> (In my very limited performance testing, 4K sectors do seem to perform slightly better and more consistently, so I'd like to use them if I can avoid the extra overhead.)
>
> Details below.
>
> Thanks in advance for your time,
>
> -Taylor
>
>
>
> I'm running:
> FreeBSD host 9.0-RELEASE FreeBSD 9.0-RELEASE #0 amd64
>
> I'm using Hitachi 4TB Deskstar 0S03364 drives, which are 4K sector devices.
>
> In order to "future proof" the raidz2 pool against possible variations in replacement drive size, I've created a single partition on each drive, starting at sector 2048 and using 100MB less than total available space on the disk.
> $ sudo gpart list da2
> Geom name: da2
> modified: false
> state: OK
> fwheads: 255
> fwsectors: 63
> last: 7814037134
> first: 34
> entries: 128
> scheme: GPT
> Providers:
> 1. Name: da2p1
> Mediasize: 4000682172416 (3.7T)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 1048576
> Mode: r1w1e1
> rawuuid: 71ebbd49-7241-11e1-b2dd-00259055e634
> rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
> label: (null)
> length: 4000682172416
> offset: 1048576
> type: freebsd-zfs
> index: 1
> end: 7813834415
> start: 2048
> Consumers:
> 1. Name: da2
> Mediasize: 4000787030016 (3.7T)
> Sectorsize: 512
> Mode: r1w1e2
>
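For reference, the partition's byte offset and size follow directly from the start and end sector numbers in the listing above. gpart reports 512-byte sectors for this disk, and end is inclusive. The quick Python check below uses values copied from the listing (the variable names are only illustrative). It confirms that the partition is 4 KiB aligned and exactly 100 MiB smaller than the whole disk.

```python
SECTOR = 512                      # gpart's Sectorsize for da2

start, end = 2048, 7813834415     # da2p1 "start" and "end" (end is inclusive)
disk_mediasize = 4000787030016    # da2 consumer "Mediasize"

offset = start * SECTOR
length = (end - start + 1) * SECTOR

print(offset, length)                          # 1048576 4000682172416
print(offset % 4096 == 0, length % 4096 == 0)  # True True (4 KiB aligned)
print((disk_mediasize - length) / 2**20)       # 100.0 (MiB smaller than disk)
print(round(length / 2**40, 2))                # 3.64 (TiB)
```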
> Each partition gives me 4000682172416 bytes (or 3.64 TiB). I'm using 16 drives. I create the zpool with 4K sectors as follows:
> $ sudo gnop create -S 4096 /dev/da2p1
> $ sudo zpool create zav raidz2 da2p1.nop da3p1 da4p1 da5p1 da6p1 da7p1 da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1
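The gnop step works because a raidz vdev appears to take the largest ashift reported by any of its children. As far as I can tell, this is what vdev_raidz_open does in the ZFS source. One 4096-byte .nop provider is therefore enough to put all 16 disks at ashift=12. Below is a minimal Python model of that rule. The function names are hypothetical, not ZFS APIs, and the per-device sector sizes are assumed inputs.

```python
def ashift_for(sector_size: int) -> int:
    """log2 of a power-of-two sector size, e.g. 512 -> 9, 4096 -> 12."""
    assert sector_size > 0 and sector_size & (sector_size - 1) == 0
    return sector_size.bit_length() - 1


def raidz_ashift(child_sector_sizes: list[int]) -> int:
    """Model: a raidz vdev uses the largest ashift among its children."""
    return max(ashift_for(s) for s in child_sector_sizes)


# Assumed inputs: da2p1.nop advertises 4096-byte sectors,
# while da3p1..da17p1 report 512-byte sectors.
with_gnop = [4096] + [512] * 15
without_gnop = [512] * 16

print(raidz_ashift(with_gnop))     # 12
print(raidz_ashift(without_gnop))  # 9
```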
>
> I confirm ashift=12:
> $ sudo zdb zav | grep ashift
> ashift: 12
> ashift: 12
>
> "zpool list" approximately matches the expected raw capacity of 16*4000682172416 = 64010914758656 bytes (58.22 TiB).
> $ zpool list zav
> NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
> zav 58T 1.34M 58.0T 0% 1.00x ONLINE -
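zpool list reports the raw size of a raidz vdev, parity included, and rounds it to 58T. Converting the exact figure with binary units (1 TiB = 2^40 bytes) gives about 58.22 TiB, which is consistent with that display:

```python
TIB = 2**40                       # binary terabyte
partition_bytes = 4000682172416   # per-partition size from gpart
ndisks = 16

raw = ndisks * partition_bytes
print(raw)                        # 64010914758656
print(round(raw / TIB, 2))        # 58.22
```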
>
> For raidz2 (16 drives, two of them for parity), I'd expect to see 4000682172416*14 = 56009550413824 bytes (50.94 TiB). However, I only get:
> $ zfs list zav
> NAME USED AVAIL REFER MOUNTPOINT
> zav 1.10M 46.8T 354K /zav
>
> Or using df for greater accuracy:
> $ df zav
> Filesystem 1K-blocks Used Avail Capacity Mounted on
> zav 50288393472 354 50288393117 0% /zav
>
> That is 50288393472 1K-blocks * 1024 = 51495314915328 bytes (46.83 TiB). (This is for a freshly created zpool, before any snapshots or data have been created.)
>
> I measure overhead as (expected - actual) / expected, which for the 4K-sector (ashift=12) raidz2 pool comes to 8.06%.
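Here is the same calculation written out in Python, with the parentheses the formula needs and with df's 1K-blocks converted to bytes, using the ashift=12 numbers above. overhead() is a hypothetical helper, not part of any tool.

```python
partition_bytes = 4000682172416
data_disks = 16 - 2                      # raidz2: two disks' worth of parity

expected = data_disks * partition_bytes  # 56009550413824 bytes
df_1k_blocks = 50288393472               # "1K-blocks" column from df (ashift=12)
actual = df_1k_blocks * 1024             # 51495314915328 bytes


def overhead(expected: int, actual: int) -> float:
    """Fraction of the expected capacity that is not reported as available."""
    return (expected - actual) / expected


print(f"{overhead(expected, actual):.2%}")  # 8.06%
```

The same function applied to the ashift=9 df figure (53281750914 1K-blocks) gives 2.59%.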
>
> To create a 512-byte sector (ashift=9) raidz2 pool, I simply replace "da2p1.nop" with "da2p1" when creating the zpool, and I confirmed ashift=9. The raw zpool size is the same (as far as I can tell, given the limited precision of zpool list). However, the available size according to zfs list/df is 54560512935936 bytes (49.62 TiB), which amounts to an overhead of 2.59%. There are some minor differences in the ALLOC and USED values, so I repeat the listings here for the 512-byte sector raidz2 pool:
> $ zpool list zav
> NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
> zav 58T 228K 58.0T 0% 1.00x ONLINE -
> $ zfs list zav
> NAME USED AVAIL REFER MOUNTPOINT
> zav 198K 49.6T 73.0K /zav
> $ df zav
> Filesystem 1K-blocks Used Avail Capacity Mounted on
> zav 53281750914 73 53281750841 0% /zav
>
> I expect some overhead from ZFS and according to this blog post:
> http://www.cuddletech.com/blog/pivot/entry.php?id=1013
> (via http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/041773.html)
> there may be a 1/64 (1.56%) overhead built into ZFS. Interestingly, when I create a pool with no RAID/mirroring, I get an overhead of 1.93% for both ashift=9 and ashift=12, which is quite close to the 1/64 figure. I have also tested raidz, which behaves similarly to raidz2, although the overhead is slightly lower in each case: 1) ashift=9 raidz overhead is 2.33%, and 2) ashift=12 raidz overhead is 7.04%.
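A likely explanation for the ashift dependence is raidz allocation padding. As I read the ZFS source (vdev_raidz_asize), a block gets nparity parity sectors for every row of (ndisks - nparity) data sectors. The total is then rounded up to a multiple of (nparity + 1) sectors.

The free space shown by zfs list/df is then "deflated" using the cost of a 128 KiB reference block (vdev_deflate_ratio, an integer in 512-byte units). With 4 KiB sectors, a 128 KiB block is only 32 sectors, so the padding and rounding are proportionally much larger.

The Python sketch below is a simplified model of that rule, not the actual implementation; the function names only mirror the C source. The 1.93% base overhead is an assumption taken from the non-redundant pool measurement above, not derived.

```python
SPA_MINBLOCKSHIFT = 9   # deflate ratio is expressed in 512-byte units
RECORD = 1 << 17        # 128 KiB reference block


def raidz_asize(psize: int, ndisks: int, nparity: int, ashift: int) -> int:
    """Bytes allocated for a psize-byte block on a raidz vdev (simplified)."""
    sectors = ((psize - 1) >> ashift) + 1               # data sectors
    data_cols = ndisks - nparity
    rows = (sectors + data_cols - 1) // data_cols       # ceil division
    sectors += nparity * rows                           # parity per row
    pad_to = nparity + 1
    sectors = (sectors + pad_to - 1) // pad_to * pad_to  # round up (padding)
    return sectors << ashift


def deflate_ratio(ndisks: int, nparity: int, ashift: int) -> int:
    """Usable 512-byte units per 512 raw units, truncated to an integer."""
    asize = raidz_asize(RECORD, ndisks, nparity, ashift)
    return RECORD // (asize >> SPA_MINBLOCKSHIFT)


def predicted_overhead(ndisks: int, nparity: int, ashift: int,
                       base: float = 0.0193) -> float:
    """Overhead vs. (ndisks - nparity) full disks, with an assumed base loss."""
    ideal = (ndisks - nparity) / ndisks
    usable = deflate_ratio(ndisks, nparity, ashift) / 512
    return 1 - (usable / ideal) * (1 - base)


for nparity, name in ((2, "raidz2"), (1, "raidz")):
    for ashift in (9, 12):
        pct = predicted_overhead(16, nparity, ashift)
        print(f"{name} ashift={ashift}: {pct:.2%}")

# raidz2 ashift=9: 2.59%
# raidz2 ashift=12: 8.06%
# raidz ashift=9: 2.34%
# raidz ashift=12: 7.04%
```

In the ashift=12 raidz2 case, a 128 KiB block is 32 data sectors plus 6 parity sectors, padded to 39. That is roughly 82% efficiency instead of the ideal 14/16 = 87.5%. With ashift=9, the same block is 256 + 38 = 294 sectors, which is already a multiple of 3, so no padding is needed.

The modeled figures agree with the measured 2.59%, 8.06%, 2.33% and 7.04% to within about 0.01 percentage point. If the model is right, the extra overhead comes from the 16-disk geometry at 4 KiB sectors rather than from anything done wrong when creating the pool.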
>
> To save space, I've put the zdb listings for both the ashift=9 and ashift=12 raidz2 pools here:
> http://pastebin.com/v2xjZkNw
>
> There are also some differences in the zdb output; for example, "SPA allocated" is higher in the 4K-sector raidz2 pool, which seems interesting, although I don't understand its significance.
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>
>