16.0E ExpandSize? -- New Server

Steven Hartland killing at multiplay.co.uk
Wed Feb 1 02:43:53 UTC 2017


Thanks, I've put a PR in upstream to get some eyes on the fix.
https://github.com/openzfs/openzfs/pull/296

If no objections are raised to the approach I've used, I'll commit the
fix to HEAD too.

On 01/02/2017 02:31, Larry Rosenman wrote:
>
> no grief that I can see:
>
> borg-new /home/ler $ sudo zdb
> Password:
> zroot:
>     version: 5000
>     name: 'zroot'
>     state: 0
>     txg: 96143
>     pool_guid: 11945658884309024932
>     hostid: 3619181042
>     hostname: ''
>     com.delphix:has_per_vdev_zaps
>     vdev_children: 1
>     vdev_tree:
>         type: 'root'
>         id: 0
>         guid: 11945658884309024932
>         create_txg: 4
>         children[0]:
>             type: 'raidz'
>             id: 0
>             guid: 7596925654112466913
>             nparity: 1
>             metaslab_array: 42
>             metaslab_shift: 36
>             ashift: 12
>             asize: 11947471798272
>             is_log: 0
>             create_txg: 4
>             com.delphix:vdev_zap_top: 35
>             children[0]:
>                 type: 'disk'
>                 id: 0
>                 guid: 1443238581175429852
>                 path: '/dev/mfid4p4'
>                 whole_disk: 1
>                 DTL: 137
>                 create_txg: 4
>                 com.delphix:vdev_zap_leaf: 131
>             children[1]:
>                 type: 'disk'
>                 id: 1
>                 guid: 1865792721003775978
>                 path: '/dev/mfid0p4'
>                 whole_disk: 1
>                 DTL: 133
>                 create_txg: 4
>                 com.delphix:vdev_zap_leaf: 37
>             children[2]:
>                 type: 'disk'
>                 id: 2
>                 guid: 12541720522827927342
>                 path: '/dev/mfid1p4'
>                 whole_disk: 1
>                 DTL: 132
>                 create_txg: 4
>                 com.delphix:vdev_zap_leaf: 38
>             children[3]:
>                 type: 'disk'
>                 id: 3
>                 guid: 13053934791777776444
>                 path: '/dev/mfid3p4'
>                 whole_disk: 1
>                 DTL: 136
>                 create_txg: 4
>                 com.delphix:vdev_zap_leaf: 135
>             children[4]:
>                 type: 'disk'
>                 id: 4
>                 guid: 4432707573898874857
>                 path: '/dev/mfid2p4'
>                 whole_disk: 1
>                 DTL: 130
>                 create_txg: 4
>                 com.delphix:vdev_zap_leaf: 40
>             children[5]:
>                 type: 'disk'
>                 id: 5
>                 guid: 5106293125005422556
>                 path: '/dev/mfid5p4'
>                 whole_disk: 1
>                 DTL: 129
>                 create_txg: 4
>                 com.delphix:vdev_zap_leaf: 41
>     features_for_read:
>         com.delphix:hole_birth
>         com.delphix:embedded_data
> borg-new /home/ler $ sudo zpool list -v
> NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> zroot       10.8T  94.3G  10.7T         -     0%     0%  1.00x  ONLINE  -
>   raidz1    10.8T  94.3G  10.7T         -     0%     0%
>     mfid4p4     -      -      -         -      -      -
>     mfid0p4     -      -      -         -      -      -
>     mfid1p4     -      -      -         -      -      -
>     mfid3p4     -      -      -         -      -      -
>     mfid2p4     -      -      -         -      -      -
>     mfid5p4     -      -      -         -      -      -
> borg-new /home/ler $ sudo zpool get all
> NAME PROPERTY VALUE SOURCE
> zroot size 10.8T -
> zroot capacity 0% -
> zroot altroot - default
> zroot health ONLINE -
> zroot guid 11945658884309024932 default
> zroot version - default
> zroot bootfs zroot/ROOT/default local
> zroot delegation on default
> zroot autoreplace off default
> zroot cachefile - default
> zroot failmode wait default
> zroot listsnapshots off default
> zroot autoexpand off default
> zroot dedupditto 0 default
> zroot dedupratio 1.00x -
> zroot free 10.7T -
> zroot allocated 94.3G -
> zroot readonly off -
> zroot comment - default
> zroot expandsize - -
> zroot freeing 0 default
> zroot fragmentation 0% -
> zroot leaked 0 default
> zroot feature@async_destroy enabled local
> zroot feature@empty_bpobj active local
> zroot feature@lz4_compress active local
> zroot feature@multi_vdev_crash_dump enabled local
> zroot feature@spacemap_histogram active local
> zroot feature@enabled_txg active local
> zroot feature@hole_birth active local
> zroot feature@extensible_dataset enabled local
> zroot feature@embedded_data active local
> zroot feature@bookmarks enabled local
> zroot feature@filesystem_limits enabled local
> zroot feature@large_blocks enabled local
> zroot feature@sha512 enabled local
> zroot feature@skein enabled local
> borg-new /home/ler $
>
>
>
> On 01/31/2017 5:22 pm, Steven Hartland wrote:
>
>> Yep
>>
>> On 31/01/2017 21:49, Larry Rosenman wrote:
>>>
>>> revert the other patch and apply this one?
>>>
>>>
>>>
>>> On 01/31/2017 3:47 pm, Steven Hartland wrote:
>>>
>>>     Hmm, looks like there's also a bug in the way vdev_min_asize is
>>>     calculated for raidz: because it uses truncating integer
>>>     division, it can produce (and here has produced) a child
>>>     min_asize which won't provide enough space for the parent.
>>>
>>>     1981411579221 * 6 = 11888469475326 < 11888469475328
>>>
>>>     You should have vdev_min_asize: 1981411579222 for your children.
>>>
>>>     Updated patch attached. The calculation still isn't 100%
>>>     reversible so it may need more work, but it does now ensure that
>>>     the children will provide enough capacity for min_asize even if
>>>     all of them are shrunk to their individual min_asize, which I
>>>     believe previously may not have been the case.
>>>
>>>     This isn't related to the incorrect EXPANDSZ output, but would
>>>     be good if you could confirm it doesn't cause any issues for
>>>     your pool given its state.
>>>
>>>     On 31/01/2017 21:00, Larry Rosenman wrote:
>>>
>>>         borg-new /home/ler $ sudo ./vdev-stats.d
>>>         Password:
>>>         vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0,
>>>         vdev_min_asize: 0
>>>         vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize:
>>>         11947478089728, vdev_min_asize: 11888469475328
>>>         vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712,
>>>         vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
>>>         vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288,
>>>         vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>         vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288,
>>>         vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>         vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152,
>>>         vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
>>>         vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288,
>>>         vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>         vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288,
>>>         vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>         ^C
>>>
>>>         borg-new /home/ler $
>>>
>>>
>>>         borg-new /home/ler $ sudo zpool list -v
>>>         Password:
>>>         NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>>>         zroot       10.8T  94.3G  10.7T     16.0E     0%     0%  1.00x  ONLINE  -
>>>           raidz1    10.8T  94.3G  10.7T     16.0E     0%     0%
>>>             mfid4p4     -      -      -         -      -      -
>>>             mfid0p4     -      -      -         -      -      -
>>>             mfid1p4     -      -      -         -      -      -
>>>             mfid3p4     -      -      -         -      -      -
>>>             mfid2p4     -      -      -         -      -      -
>>>             mfid5p4     -      -      -         -      -      -
>>>         borg-new /home/ler $
>>>
>>>
>>>         On 01/31/2017 2:37 pm, Steven Hartland wrote:
>>>
>>>             In that case, based on your zpool history, I suspect that
>>>             the original mfid4p4 was the same size as mfid0p4
>>>             (1991246348288) but it's been replaced with a drive which
>>>             is slightly smaller (1991245299712).
>>>
>>>             This smaller size results in a max_asize of
>>>             1991245299712 * 6 instead of the original 1991246348288 * 6.
>>>
>>>             Now given the way min_asize (the value used to check if
>>>             the device size is acceptable) is rounded to the
>>>             nearest metaslab, I believe that replace would be allowed.
>>>             https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
>>>
>>>             Now the problem is that on open the calculated asize is
>>>             only updated if it's expanding:
>>>             https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
>>>
>>>             The updated dtrace file outputs vdev_min_asize which
>>>             should confirm my suspicion about why the replace was
>>>             allowed.
>>>
>>>             On 31/01/2017 19:05, Larry Rosenman wrote:
>>>
>>>                 I've replaced some disks due to failure, and some of
>>>                 the partition sizes are different.
>>>
>>>
>>>                 autoexpand is off:
>>>
>>>                 borg-new /home/ler $ zpool get all zroot
>>>                 NAME PROPERTY VALUE SOURCE
>>>                 zroot size 10.8T -
>>>                 zroot capacity 0% -
>>>                 zroot altroot - default
>>>                 zroot health ONLINE -
>>>                 zroot guid 11945658884309024932 default
>>>                 zroot version - default
>>>                 zroot bootfs zroot/ROOT/default local
>>>                 zroot delegation on default
>>>                 zroot autoreplace off default
>>>                 zroot cachefile - default
>>>                 zroot failmode wait default
>>>                 zroot listsnapshots off default
>>>                 zroot autoexpand off default
>>>                 zroot dedupditto 0 default
>>>                 zroot dedupratio 1.00x -
>>>                 zroot free 10.7T -
>>>                 zroot allocated 94.3G -
>>>                 zroot readonly off -
>>>                 zroot comment - default
>>>                 zroot expandsize 16.0E -
>>>                 zroot freeing 0 default
>>>                 zroot fragmentation 0% -
>>>                 zroot leaked 0 default
>>>                 zroot feature@async_destroy enabled local
>>>                 zroot feature@empty_bpobj active local
>>>                 zroot feature@lz4_compress active local
>>>                 zroot feature@multi_vdev_crash_dump enabled local
>>>                 zroot feature@spacemap_histogram active local
>>>                 zroot feature@enabled_txg active local
>>>                 zroot feature@hole_birth active local
>>>                 zroot feature@extensible_dataset enabled local
>>>                 zroot feature@embedded_data active local
>>>                 zroot feature@bookmarks enabled local
>>>                 zroot feature@filesystem_limits enabled local
>>>                 zroot feature@large_blocks enabled local
>>>                 zroot feature@sha512 enabled local
>>>                 zroot feature@skein enabled local
>>>                 borg-new /home/ler $
>>>
>>>
>>>                 borg-new /home/ler $ gpart show
>>>                 => 40 3905945520 mfid0 GPT (1.8T)
>>>                 40 1600 1 efi (800K)
>>>                 1640 1024 2 freebsd-boot (512K)
>>>                 2664 1432 - free - (716K)
>>>                 4096 16777216 3 freebsd-swap (8.0G)
>>>                 16781312 3889162240 4 freebsd-zfs (1.8T)
>>>                 3905943552 2008 - free - (1.0M)
>>>
>>>                 => 40 3905945520 mfid1 GPT (1.8T)
>>>                 40 1600 1 efi (800K)
>>>                 1640 1024 2 freebsd-boot (512K)
>>>                 2664 1432 - free - (716K)
>>>                 4096 16777216 3 freebsd-swap (8.0G)
>>>                 16781312 3889162240 4 freebsd-zfs (1.8T)
>>>                 3905943552 2008 - free - (1.0M)
>>>
>>>                 => 40 3905945520 mfid2 GPT (1.8T)
>>>                 40 1600 1 efi (800K)
>>>                 1640 1024 2 freebsd-boot (512K)
>>>                 2664 1432 - free - (716K)
>>>                 4096 16777216 3 freebsd-swap (8.0G)
>>>                 16781312 3889162240 4 freebsd-zfs (1.8T)
>>>                 3905943552 2008 - free - (1.0M)
>>>
>>>                 => 40 3905945520 mfid3 GPT (1.8T)
>>>                 40 1600 1 efi (800K)
>>>                 1640 1024 2 freebsd-boot (512K)
>>>                 2664 16777216 3 freebsd-swap (8.0G)
>>>                 16779880 3889165680 4 freebsd-zfs (1.8T)
>>>
>>>                 => 40 3905945520 mfid5 GPT (1.8T)
>>>                 40 1600 1 efi (800K)
>>>                 1640 1024 2 freebsd-boot (512K)
>>>                 2664 1432 - free - (716K)
>>>                 4096 16777216 3 freebsd-swap (8.0G)
>>>                 16781312 3889162240 4 freebsd-zfs (1.8T)
>>>                 3905943552 2008 - free - (1.0M)
>>>
>>>                 => 40 3905945520 mfid4 GPT (1.8T)
>>>                 40 1600 1 efi (800K)
>>>                 1640 1024 2 freebsd-boot (512K)
>>>                 2664 1432 - free - (716K)
>>>                 4096 16777216 3 freebsd-swap (8.0G)
>>>                 16781312 3889160192 4 freebsd-zfs (1.8T)
>>>                 3905941504 4056 - free - (2.0M)
>>>
>>>                 borg-new /home/ler $
>>>
>>>
>>>                 this system was built last week, and I **CAN**
>>>                 rebuild it if necessary, but I didn't do anything
>>>                 strange (so I thought :) )
>>>
>>>
>>>
>>>
>>>                 On 01/31/2017 12:30 pm, Steven Hartland wrote:
>>>
>>>                     Your issue is that the reported vdev_max_asize <
>>>                     vdev_asize:
>>>                     vdev_max_asize: 11947471798272
>>>                     vdev_asize:     11947478089728
>>>
>>>                     max asize is smaller than asize by 6291456
>>>
>>>                     For raidz1 Xsize should be the smallest disk
>>>                     Xsize * disks so:
>>>                     1991245299712 * 6 = 11947471798272
>>>
>>>                     So your max asize looks right but asize looks
>>>                     too big
>>>
>>>                     Expand Size is calculated by:
>>>                     if (vd->vdev_aux == NULL && tvd != NULL &&
>>>                         vd->vdev_max_asize != 0) {
>>>                             vs->vs_esize = P2ALIGN(
>>>                                 vd->vdev_max_asize - vd->vdev_asize,
>>>                                 1ULL << tvd->vdev_ms_shift);
>>>                     }
>>>
>>>                     So the question is why is asize too big?
>>>
>>>                     Given you seem to have some random disk sizes do
>>>                     you have auto expand turned on?
>>>
>>>                     On 31/01/2017 17:39, Larry Rosenman wrote:
>>>
>>>                         vdev_path: n/a, vdev_max_asize:
>>>                         11947471798272, vdev_asize: 11947478089728
>>>
>>>
>>>                 -- 
>>>                 Larry Rosenman http://people.freebsd.org/~ler
>>>                 <http://people.freebsd.org/%7Eler>
>>>                 Phone: +1 214-642-9640                 E-Mail:
>>>                 ler at FreeBSD.org <mailto:ler at FreeBSD.org>
>>>                 US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>>>
>>>


