16.0E ExpandSize? -- New Server

Steven Hartland killing at multiplay.co.uk
Sun Feb 5 23:16:32 UTC 2017


It's still actually waiting on review; the merged icon at the bottom was 
for an unrelated PR which referenced mine, as they saw the same random 
test failure as I did.

On 05/02/2017 03:54, Larry Rosenman wrote:
> I saw it was accepted upstream. Can it be committed to FreeBSD?
>
>
> On Wed, Feb 01, 2017 at 02:43:51AM +0000, Steven Hartland wrote:
>> Thanks I've put a PR in upstream to get some eyes on the fix.
>> https://github.com/openzfs/openzfs/pull/296
>>
>> If no objections are raised to the approach I've used I'll commit the fix to
>> HEAD too.
>>
>> On 01/02/2017 02:31, Larry Rosenman wrote:
>>> no grief that I can see:
>>>
>>> borg-new /home/ler $ sudo zdb
>>> Password:
>>> zroot:
>>> version: 5000
>>> name: 'zroot'
>>> state: 0
>>> txg: 96143
>>> pool_guid: 11945658884309024932
>>> hostid: 3619181042
>>> hostname: ''
>>> com.delphix:has_per_vdev_zaps
>>> vdev_children: 1
>>> vdev_tree:
>>> type: 'root'
>>> id: 0
>>> guid: 11945658884309024932
>>> create_txg: 4
>>> children[0]:
>>> type: 'raidz'
>>> id: 0
>>> guid: 7596925654112466913
>>> nparity: 1
>>> metaslab_array: 42
>>> metaslab_shift: 36
>>> ashift: 12
>>> asize: 11947471798272
>>> is_log: 0
>>> create_txg: 4
>>> com.delphix:vdev_zap_top: 35
>>> children[0]:
>>> type: 'disk'
>>> id: 0
>>> guid: 1443238581175429852
>>> path: '/dev/mfid4p4'
>>> whole_disk: 1
>>> DTL: 137
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 131
>>> children[1]:
>>> type: 'disk'
>>> id: 1
>>> guid: 1865792721003775978
>>> path: '/dev/mfid0p4'
>>> whole_disk: 1
>>> DTL: 133
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 37
>>> children[2]:
>>> type: 'disk'
>>> id: 2
>>> guid: 12541720522827927342
>>> path: '/dev/mfid1p4'
>>> whole_disk: 1
>>> DTL: 132
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 38
>>> children[3]:
>>> type: 'disk'
>>> id: 3
>>> guid: 13053934791777776444
>>> path: '/dev/mfid3p4'
>>> whole_disk: 1
>>> DTL: 136
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 135
>>> children[4]:
>>> type: 'disk'
>>> id: 4
>>> guid: 4432707573898874857
>>> path: '/dev/mfid2p4'
>>> whole_disk: 1
>>> DTL: 130
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 40
>>> children[5]:
>>> type: 'disk'
>>> id: 5
>>> guid: 5106293125005422556
>>> path: '/dev/mfid5p4'
>>> whole_disk: 1
>>> DTL: 129
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 41
>>> features_for_read:
>>> com.delphix:hole_birth
>>> com.delphix:embedded_data
>>> borg-new /home/ler $ sudo zpool list -v
>>> NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>>> zroot        10.8T  94.3G  10.7T         -     0%     0%  1.00x  ONLINE  -
>>>   raidz1     10.8T  94.3G  10.7T         -     0%     0%
>>>     mfid4p4      -      -      -         -      -      -
>>>     mfid0p4      -      -      -         -      -      -
>>>     mfid1p4      -      -      -         -      -      -
>>>     mfid3p4      -      -      -         -      -      -
>>>     mfid2p4      -      -      -         -      -      -
>>>     mfid5p4      -      -      -         -      -      -
>>> borg-new /home/ler $ sudo zpool get all
>>> NAME PROPERTY VALUE SOURCE
>>> zroot size 10.8T -
>>> zroot capacity 0% -
>>> zroot altroot - default
>>> zroot health ONLINE -
>>> zroot guid 11945658884309024932 default
>>> zroot version - default
>>> zroot bootfs zroot/ROOT/default local
>>> zroot delegation on default
>>> zroot autoreplace off default
>>> zroot cachefile - default
>>> zroot failmode wait default
>>> zroot listsnapshots off default
>>> zroot autoexpand off default
>>> zroot dedupditto 0 default
>>> zroot dedupratio 1.00x -
>>> zroot free 10.7T -
>>> zroot allocated 94.3G -
>>> zroot readonly off -
>>> zroot comment - default
>>> zroot expandsize - -
>>> zroot freeing 0 default
>>> zroot fragmentation 0% -
>>> zroot leaked 0 default
>>> zroot feature@async_destroy enabled local
>>> zroot feature@empty_bpobj active local
>>> zroot feature@lz4_compress active local
>>> zroot feature@multi_vdev_crash_dump enabled local
>>> zroot feature@spacemap_histogram active local
>>> zroot feature@enabled_txg active local
>>> zroot feature@hole_birth active local
>>> zroot feature@extensible_dataset enabled local
>>> zroot feature@embedded_data active local
>>> zroot feature@bookmarks enabled local
>>> zroot feature@filesystem_limits enabled local
>>> zroot feature@large_blocks enabled local
>>> zroot feature@sha512 enabled local
>>> zroot feature@skein enabled local
>>> borg-new /home/ler $
>>>
>>>
>>>
>>> On 01/31/2017 5:22 pm, Steven Hartland wrote:
>>>
>>>> Yep
>>>>
>>>> On 31/01/2017 21:49, Larry Rosenman wrote:
>>>>> revert the other patch and apply this one?
>>>>>
>>>>>
>>>>>
>>>>> On 01/31/2017 3:47 pm, Steven Hartland wrote:
>>>>>
>>>>>      Hmm, looks like there's also a bug in the way vdev_min_asize is
>>>>>      calculated for raidz, as it can result, and has resulted, in a
>>>>>      child min_asize which won't provide enough space for the parent
>>>>>      due to the use of unrounded integer division.
>>>>>
>>>>>      1981411579221 * 6 = 11888469475326 < 11888469475328
>>>>>
>>>>>      You should have vdev_min_asize: 1981411579222 for your children.
>>>>>
>>>>>      Updated patch attached. The calculation still isn't 100%
>>>>>      reversible so it may need work, but it does now ensure that the
>>>>>      children will provide enough capacity for min_asize even if all
>>>>>      of them are shrunk to their individual min_asize, which I
>>>>>      believe previously may not have been the case.
>>>>>
>>>>>      This isn't related to the incorrect EXPANDSZ output, but it
>>>>>      would be good if you could confirm it doesn't cause any issues
>>>>>      for your pool given its state.
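For illustration, the rounding problem described above can be reproduced with the numbers from the vdev-stats.d output. This is only a sketch: the function names are made up for the example and it is not the attached patch; it just contrasts truncating division with division that rounds up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Truncating division: the "unrounded integer division" described above.
 * Hypothetical helper name, for illustration only.
 */
static uint64_t
child_min_asize_floor(uint64_t parent_min_asize, uint64_t children)
{
	assert(children != 0);
	return (parent_min_asize / children);
}

/*
 * Round up, so that children * result >= parent_min_asize always holds.
 * Hypothetical helper name, for illustration only.
 */
static uint64_t
child_min_asize_ceil(uint64_t parent_min_asize, uint64_t children)
{
	assert(children != 0);
	return ((parent_min_asize + children - 1) / children);
}
```

With a parent min_asize of 11888469475328 and 6 children, the truncating version gives 1981411579221, and six of those fall 2 bytes short of the parent. Rounding up gives 1981411579222, which covers it.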
>>>>>
>>>>>      On 31/01/2017 21:00, Larry Rosenman wrote:
>>>>>
>>>>>          borg-new /home/ler $ sudo ./vdev-stats.d
>>>>>          Password:
>>>>>          vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0,
>>>>>          vdev_min_asize: 0
>>>>>          vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize:
>>>>>          11947478089728, vdev_min_asize: 11888469475328
>>>>>          vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712,
>>>>>          vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288,
>>>>>          vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288,
>>>>>          vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152,
>>>>>          vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288,
>>>>>          vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288,
>>>>>          vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>>          ^C
>>>>>
>>>>>          borg-new /home/ler $
>>>>>
>>>>>
>>>>>          borg-new /home/ler $ sudo zpool list -v
>>>>>          Password:
>>>>>          NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>>>>>          zroot        10.8T  94.3G  10.7T     16.0E     0%     0%  1.00x  ONLINE  -
>>>>>            raidz1     10.8T  94.3G  10.7T     16.0E     0%     0%
>>>>>              mfid4p4      -      -      -         -      -      -
>>>>>              mfid0p4      -      -      -         -      -      -
>>>>>              mfid1p4      -      -      -         -      -      -
>>>>>              mfid3p4      -      -      -         -      -      -
>>>>>              mfid2p4      -      -      -         -      -      -
>>>>>              mfid5p4      -      -      -         -      -      -
>>>>>          borg-new /home/ler $
>>>>>
>>>>>
>>>>>          On 01/31/2017 2:37 pm, Steven Hartland wrote:
>>>>>
>>>>>              In that case, based on your zpool history, I suspect that
>>>>>              the original mfid4p4 was the same size as mfid0p4
>>>>>              (1991246348288) but it's been replaced with a drive which
>>>>>              is (1991245299712), slightly smaller.
>>>>>
>>>>>              This smaller size results in a max_asize of
>>>>>              1991245299712 * 6 instead of original 1991246348288* 6.
>>>>>
>>>>>              Now given the way min_asize (the value used to check if
>>>>>              the device size is acceptable) is rounded to the
>>>>>              nearest metaslab, I believe that replace would be allowed.
>>>>>              https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
>>>>>
>>>>>              Now the problem is that on open the calculated asize is
>>>>>              only updated if it's expanding:
>>>>>              https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
>>>>>
>>>>>              The updated dtrace file outputs vdev_min_asize which
>>>>>              should confirm my suspicion about why the replace was
>>>>>              allowed.
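As a rough model of the sequence described above, the following sketch shows how a slightly smaller replacement can pass the min_asize check and still leave asize larger than max_asize. It is not the actual spa.c / vdev.c code: the helper names are invented for the example, and the rule that the stored asize is kept whenever the newly calculated value is not larger is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x down to a multiple of align (a power of two). */
#define P2ALIGN(x, align) ((x) & -(align))

/* raidz size as described in the thread: smallest child times child count. */
static uint64_t
raidz_size(const uint64_t *child_asize, unsigned children)
{
	uint64_t smallest = UINT64_MAX;

	for (unsigned i = 0; i < children; i++) {
		if (child_asize[i] < smallest)
			smallest = child_asize[i];
	}
	return (children == 0 ? 0 : smallest * children);
}

/* Top-level min_asize: current asize rounded down to a whole metaslab. */
static uint64_t
top_min_asize(uint64_t asize, unsigned ms_shift)
{
	assert(ms_shift < 64);
	return (P2ALIGN(asize, UINT64_C(1) << ms_shift));
}

/* A replacement is accepted if it is at least the child's min_asize. */
static bool
replace_allowed(uint64_t new_child_asize, uint64_t child_min_asize)
{
	return (new_child_asize >= child_min_asize);
}

/* Simplified open: the stored asize only ever grows (assumption). */
static uint64_t
reopen_asize(uint64_t stored_asize, uint64_t calculated_asize)
{
	return (calculated_asize > stored_asize ?
	    calculated_asize : stored_asize);
}
```

Plugging in the sizes from this pool with a metaslab shift of 36 (the metaslab_shift that zdb reports for this pool), the original children give 11947478089728, the min_asize for each child comes out as 1981411579221, and the 1991245299712 byte replacement is accepted. The recalculated size is then 11947471798272, but the stored asize stays at 11947478089728.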
>>>>>
>>>>>              On 31/01/2017 19:05, Larry Rosenman wrote:
>>>>>
>>>>>                  I've replaced some disks due to failure, and some of
>>>>>                  the partition sizes are different.
>>>>>
>>>>>
>>>>>                  autoexpand is off:
>>>>>
>>>>>                  borg-new /home/ler $ zpool get all zroot
>>>>>                  NAME PROPERTY VALUE SOURCE
>>>>>                  zroot size 10.8T -
>>>>>                  zroot capacity 0% -
>>>>>                  zroot altroot - default
>>>>>                  zroot health ONLINE -
>>>>>                  zroot guid 11945658884309024932 default
>>>>>                  zroot version - default
>>>>>                  zroot bootfs zroot/ROOT/default local
>>>>>                  zroot delegation on default
>>>>>                  zroot autoreplace off default
>>>>>                  zroot cachefile - default
>>>>>                  zroot failmode wait default
>>>>>                  zroot listsnapshots off default
>>>>>                  zroot autoexpand off default
>>>>>                  zroot dedupditto 0 default
>>>>>                  zroot dedupratio 1.00x -
>>>>>                  zroot free 10.7T -
>>>>>                  zroot allocated 94.3G -
>>>>>                  zroot readonly off -
>>>>>                  zroot comment - default
>>>>>                  zroot expandsize 16.0E -
>>>>>                  zroot freeing 0 default
>>>>>                  zroot fragmentation 0% -
>>>>>                  zroot leaked 0 default
>>>>>                  zroot feature@async_destroy enabled local
>>>>>                  zroot feature@empty_bpobj active local
>>>>>                  zroot feature@lz4_compress active local
>>>>>                  zroot feature@multi_vdev_crash_dump enabled local
>>>>>                  zroot feature@spacemap_histogram active local
>>>>>                  zroot feature@enabled_txg active local
>>>>>                  zroot feature@hole_birth active local
>>>>>                  zroot feature@extensible_dataset enabled local
>>>>>                  zroot feature@embedded_data active local
>>>>>                  zroot feature@bookmarks enabled local
>>>>>                  zroot feature@filesystem_limits enabled local
>>>>>                  zroot feature@large_blocks enabled local
>>>>>                  zroot feature@sha512 enabled local
>>>>>                  zroot feature@skein enabled local
>>>>>                  borg-new /home/ler $
>>>>>
>>>>>
>>>>>                  borg-new /home/ler $ gpart show
>>>>>                  => 40 3905945520 mfid0 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889162240 4 freebsd-zfs (1.8T)
>>>>>                  3905943552 2008 - free - (1.0M)
>>>>>
>>>>>                  => 40 3905945520 mfid1 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889162240 4 freebsd-zfs (1.8T)
>>>>>                  3905943552 2008 - free - (1.0M)
>>>>>
>>>>>                  => 40 3905945520 mfid2 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889162240 4 freebsd-zfs (1.8T)
>>>>>                  3905943552 2008 - free - (1.0M)
>>>>>
>>>>>                  => 40 3905945520 mfid3 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 16777216 3 freebsd-swap (8.0G)
>>>>>                  16779880 3889165680 4 freebsd-zfs (1.8T)
>>>>>
>>>>>                  => 40 3905945520 mfid5 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889162240 4 freebsd-zfs (1.8T)
>>>>>                  3905943552 2008 - free - (1.0M)
>>>>>
>>>>>                  => 40 3905945520 mfid4 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889160192 4 freebsd-zfs (1.8T)
>>>>>                  3905941504 4056 - free - (2.0M)
>>>>>
>>>>>                  borg-new /home/ler $
>>>>>
>>>>>
>>>>>                  this system was built last week, and I **CAN**
>>>>>                  rebuild it if necessary, but I didn't do anything
>>>>>                  strange (so I thought :) )
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                  On 01/31/2017 12:30 pm, Steven Hartland wrote:
>>>>>
>>>>>                      Your issue is the reported vdev_max_asize <
>>>>>                      vdev_asize:
>>>>>                      vdev_max_asize: 11947471798272
>>>>>                      vdev_asize:     11947478089728
>>>>>
>>>>>                      max asize is smaller than asize by 6291456
>>>>>
>>>>>                      For raidz1 Xsize should be the smallest disk
>>>>>                      Xsize * disks so:
>>>>>                      1991245299712 * 6 = 11947471798272
>>>>>
>>>>>                      So your max asize looks right but asize looks
>>>>>                      too big
>>>>>
>>>>>                      Expand Size is calculated by:
>>>>>                      if (vd->vdev_aux == NULL && tvd != NULL &&
>>>>>                          vd->vdev_max_asize != 0) {
>>>>>                          vs->vs_esize = P2ALIGN(
>>>>>                              vd->vdev_max_asize - vd->vdev_asize,
>>>>>                              1ULL << tvd->vdev_ms_shift);
>>>>>                      }
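To make the effect of that calculation concrete, here is a minimal standalone sketch of what happens when max_asize is smaller than asize. The P2ALIGN definition mirrors the usual power-of-two round-down macro, and expand_size is a hypothetical helper standing in for the vs_esize assignment, not the real vdev code.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of align (a power of two). */
#define P2ALIGN(x, align) ((x) & -(align))

/*
 * Hypothetical helper: expandable bytes for a top-level vdev, rounded
 * down to a whole metaslab. The subtraction is done on unsigned 64-bit
 * values, so it wraps around when max_asize < asize.
 */
static uint64_t
expand_size(uint64_t max_asize, uint64_t asize, unsigned ms_shift)
{
	assert(ms_shift < 64);
	return (P2ALIGN(max_asize - asize, UINT64_C(1) << ms_shift));
}
```

With max_asize 11947471798272, asize 11947478089728 and a metaslab shift of 36 (the metaslab_shift that zdb reports for this pool), the difference wraps to 2^64 - 6291456 and aligns down to 2^64 - 2^36 bytes, which is just under 16 EiB and would explain the 16.0E shown by zpool.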
>>>>>
>>>>>                      So the question is why is asize too big?
>>>>>
>>>>>                      Given you seem to have some random disk sizes do
>>>>>                      you have auto expand turned on?
>>>>>
>>>>>                      On 31/01/2017 17:39, Larry Rosenman wrote:
>>>>>
>>>>>                          vdev_path: n/a, vdev_max_asize:
>>>>>                          11947471798272, vdev_asize: 11947478089728
>>>>>
>>>>>
>>>>>                  --                 Larry Rosenman
>>>>> http://people.freebsd.org/~ler
>>>>>                  <http://people.freebsd.org/%7Eler>
>>>>>                  Phone: +1 214-642-9640                 E-Mail:
>>>>>                  ler at FreeBSD.org <mailto:ler at FreeBSD.org>
>>>>>                  US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>>>>>



More information about the freebsd-fs mailing list