16.0E ExpandSize? -- New Server

Steven Hartland killing at multiplay.co.uk
Tue Jan 31 22:37:31 UTC 2017


On 31/01/2017 22:02, Marie Helene Kvello-Aune wrote:
> On Tue, Jan 31, 2017 at 10:49 PM Larry Rosenman <ler at freebsd.org 
> <mailto:ler at freebsd.org>> wrote:
>
>     revert the other patch and apply this one?
>
>     On 01/31/2017 3:47 pm, Steven Hartland wrote:
>
>     > Hmm, looks like there's also a bug in the way vdev_min_asize is
>     calculated for raidz, as it can result (and here has resulted) in a child
>     min_asize which won't provide enough space for the parent, due to the use
>     of unrounded integer division.
>     >
>     > 1981411579221 * 6 = 11888469475326 < 11888469475328
>     >
>     > You should have vdev_min_asize: 1981411579222 for your children.
>     >
>     > Updated patch attached. The calculation still isn't 100%
>     reversible, so it may need more work; however, it does now ensure that
>     the children will provide enough capacity for min_asize even if all of
>     them are shrunk to their individual min_asize, which I believe
>     previously may not have been the case.
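
To illustrate the division problem with this pool's numbers, here's a minimal
sketch assuming the raidz child min_asize is simply the parent's min_asize
divided by the number of children (the identifiers are illustrative, not the
actual ZFS functions):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t parent_min_asize = 11888469475328ULL; /* from the dtrace output */
        uint64_t children = 6;

        /* Plain integer division rounds down... */
        uint64_t floor_child = parent_min_asize / children;   /* 1981411579221 */
        /* ...so the children can no longer cover the parent. */
        printf("floor: %ju * %ju = %ju\n", (uintmax_t)floor_child,
            (uintmax_t)children, (uintmax_t)(floor_child * children));

        /* Rounding up keeps child * children >= parent_min_asize. */
        uint64_t ceil_child = (parent_min_asize + children - 1) / children; /* 1981411579222 */
        printf("ceil:  %ju * %ju = %ju\n", (uintmax_t)ceil_child,
            (uintmax_t)children, (uintmax_t)(ceil_child * children));
        return 0;
    }
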
>     >
>     > This isn't related to the incorrect EXPANDSZ output, but it would
>     be good if you could confirm it doesn't cause any issues for your
>     pool given its state.
>     >
>     > On 31/01/2017 21:00, Larry Rosenman wrote:
>     >
>     > borg-new /home/ler $ sudo ./vdev-stats.d
>     > Password:
>     > vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0, vdev_min_asize: 0
>     > vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize:
>     11947478089728, vdev_min_asize: 11888469475328
>     > vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712,
>     vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
>     > vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288,
>     vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>     > vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288,
>     vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>     > vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152,
>     vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
>     > vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288,
>     vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>     > vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288,
>     vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>     > ^C
>     >
>     > borg-new /home/ler $
>     >
>     > borg-new /home/ler $ sudo zpool list -v
>     > Password:
>     > NAME          SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>     > zroot        10.8T  94.3G  10.7T     16.0E     0%     0%  1.00x  ONLINE  -
>     >   raidz1     10.8T  94.3G  10.7T     16.0E     0%     0%
>     >     mfid4p4      -      -      -         -      -      -
>     >     mfid0p4      -      -      -         -      -      -
>     >     mfid1p4      -      -      -         -      -      -
>     >     mfid3p4      -      -      -         -      -      -
>     >     mfid2p4      -      -      -         -      -      -
>     >     mfid5p4      -      -      -         -      -      -
>     > borg-new /home/ler $
>     >
>     > On 01/31/2017 2:37 pm, Steven Hartland wrote: In that case, based
>     on your zpool history, I suspect that the original mfid4p4 was the
>     same size as mfid0p4 (1991246348288) but it has been replaced with a
>     slightly smaller drive (1991245299712).
>     >
>     > This smaller size results in a max_asize of 1991245299712 * 6
>     instead of the original 1991246348288 * 6.
>     >
>     > Now, given the way min_asize (the value used to check if the
>     device size is acceptable) is rounded to the nearest metaslab,
>     I believe that replace would be allowed.
>     >
>     https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
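
Roughly speaking, that acceptance check boils down to something like the
sketch below (a simplification using the values from the dtrace output, not
the exact spa.c logic):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Values from the dtrace output in this thread. */
        uint64_t new_child_asize = 1991245299712ULL; /* replacement mfid4p4 */
        uint64_t child_min_asize = 1981411579221ULL; /* parent min_asize / 6, rounded down */

        /* ~9 GiB of headroom, so the slightly smaller disk sails through. */
        if (new_child_asize >= child_min_asize)
            printf("replace allowed: %ju >= %ju\n",
                (uintmax_t)new_child_asize, (uintmax_t)child_min_asize);
        else
            printf("replace rejected (ENOSPC)\n");
        return 0;
    }
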
>     >
>     > Now the problem is that, on open, the calculated asize is only
>     updated if it's expanding:
>     >
>     https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
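
In simplified form, the behaviour being described is roughly the following (a
sketch of the idea, not the actual vdev.c code):

    #include <stdio.h>
    #include <stdint.h>

    /*
     * Sketch: on (re)open the recorded asize is only ever raised, never
     * lowered, so replacing a child with a slightly smaller disk leaves the
     * stale, larger asize in place while max_asize tracks the new device.
     */
    static void
    update_asize_sketch(uint64_t *recorded_asize, uint64_t detected_asize)
    {
        if (detected_asize > *recorded_asize)
            *recorded_asize = detected_asize; /* expansion is picked up */
        /* a shrink is ignored */
    }

    int main(void)
    {
        uint64_t asize = 11947478089728ULL;             /* original: 1991246348288 * 6 */
        update_asize_sketch(&asize, 11947471798272ULL); /* now:      1991245299712 * 6 */
        printf("asize stays at %ju, above max_asize\n", (uintmax_t)asize);
        return 0;
    }
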
>     >
>     > The updated dtrace file outputs vdev_min_asize which should
>     confirm my suspicion about why the replace was allowed.
>     >
>     > On 31/01/2017 19:05, Larry Rosenman wrote:
>     >
>     > I've replaced some disks due to failure, and some of the
>     partition sizes are different.
>     >
>     > autoexpand is off:
>     >
>     > borg-new /home/ler $ zpool get all zroot
>     > NAME PROPERTY VALUE SOURCE
>     > zroot size 10.8T -
>     > zroot capacity 0% -
>     > zroot altroot - default
>     > zroot health ONLINE -
>     > zroot guid 11945658884309024932 default
>     > zroot version - default
>     > zroot bootfs zroot/ROOT/default local
>     > zroot delegation on default
>     > zroot autoreplace off default
>     > zroot cachefile - default
>     > zroot failmode wait default
>     > zroot listsnapshots off default
>     > zroot autoexpand off default
>     > zroot dedupditto 0 default
>     > zroot dedupratio 1.00x -
>     > zroot free 10.7T -
>     > zroot allocated 94.3G -
>     > zroot readonly off -
>     > zroot comment - default
>     > zroot expandsize 16.0E -
>     > zroot freeing 0 default
>     > zroot fragmentation 0% -
>     > zroot leaked 0 default
>     > zroot feature@async_destroy enabled local
>     > zroot feature@empty_bpobj active local
>     > zroot feature@lz4_compress active local
>     > zroot feature@multi_vdev_crash_dump enabled local
>     > zroot feature@spacemap_histogram active local
>     > zroot feature@enabled_txg active local
>     > zroot feature@hole_birth active local
>     > zroot feature@extensible_dataset enabled local
>     > zroot feature@embedded_data active local
>     > zroot feature@bookmarks enabled local
>     > zroot feature@filesystem_limits enabled local
>     > zroot feature@large_blocks enabled local
>     > zroot feature@sha512 enabled local
>     > zroot feature@skein enabled local
>     > borg-new /home/ler $
>     >
>     > borg-new /home/ler $ gpart show
>     > => 40 3905945520 mfid0 GPT (1.8T)
>     > 40 1600 1 efi (800K)
>     > 1640 1024 2 freebsd-boot (512K)
>     > 2664 1432 - free - (716K)
>     > 4096 16777216 3 freebsd-swap (8.0G)
>     > 16781312 3889162240 4 freebsd-zfs (1.8T)
>     > 3905943552 2008 - free - (1.0M)
>     >
>     > => 40 3905945520 mfid1 GPT (1.8T)
>     > 40 1600 1 efi (800K)
>     > 1640 1024 2 freebsd-boot (512K)
>     > 2664 1432 - free - (716K)
>     > 4096 16777216 3 freebsd-swap (8.0G)
>     > 16781312 3889162240 4 freebsd-zfs (1.8T)
>     > 3905943552 2008 - free - (1.0M)
>     >
>     > => 40 3905945520 mfid2 GPT (1.8T)
>     > 40 1600 1 efi (800K)
>     > 1640 1024 2 freebsd-boot (512K)
>     > 2664 1432 - free - (716K)
>     > 4096 16777216 3 freebsd-swap (8.0G)
>     > 16781312 3889162240 4 freebsd-zfs (1.8T)
>     > 3905943552 2008 - free - (1.0M)
>     >
>     > => 40 3905945520 mfid3 GPT (1.8T)
>     > 40 1600 1 efi (800K)
>     > 1640 1024 2 freebsd-boot (512K)
>     > 2664 16777216 3 freebsd-swap (8.0G)
>     > 16779880 3889165680 4 freebsd-zfs (1.8T)
>     >
>     > => 40 3905945520 mfid5 GPT (1.8T)
>     > 40 1600 1 efi (800K)
>     > 1640 1024 2 freebsd-boot (512K)
>     > 2664 1432 - free - (716K)
>     > 4096 16777216 3 freebsd-swap (8.0G)
>     > 16781312 3889162240 4 freebsd-zfs (1.8T)
>     > 3905943552 2008 - free - (1.0M)
>     >
>     > => 40 3905945520 mfid4 GPT (1.8T)
>     > 40 1600 1 efi (800K)
>     > 1640 1024 2 freebsd-boot (512K)
>     > 2664 1432 - free - (716K)
>     > 4096 16777216 3 freebsd-swap (8.0G)
>     > 16781312 3889160192 4 freebsd-zfs (1.8T)
>     > 3905941504 4056 - free - (2.0M)
>     >
>     > borg-new /home/ler $
>     >
>     > This system was built last week, and I **CAN** rebuild it if
>     necessary, but I didn't do anything strange (or so I thought :) )
>     >
>     > On 01/31/2017 12:30 pm, Steven Hartland wrote: Your issue is that the
>     reported vdev_asize is larger than vdev_max_asize:
>     > vdev_max_asize: 11947471798272
>     > vdev_asize:     11947478089728
>     >
>     > max asize is smaller than asize by 6291456
>     >
>     > For raidz1, each size (asize / max_asize) should be the smallest
>     disk's corresponding size * the number of disks, so:
>     > 1991245299712 * 6 = 11947471798272
>     >
>     > So your max_asize looks right, but your asize looks too big.
>     >
>     > Expand Size is calculated by:
>     > if (vd->vdev_aux == NULL && tvd != NULL && vd->vdev_max_asize != 0) {
>     >         vs->vs_esize = P2ALIGN(vd->vdev_max_asize - vd->vdev_asize,
>     >             1ULL << tvd->vdev_ms_shift);
>     > }
>     >
>     > So the question is why is asize too big?
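
With the numbers above that expression wraps around: vdev_asize is larger than
vdev_max_asize, so the unsigned subtraction underflows to almost 2^64, which
is what ends up being reported as 16.0E. A small standalone sketch (the
metaslab shift below is assumed for illustration, the real value isn't shown
in this thread):

    #include <stdio.h>
    #include <stdint.h>

    #define P2ALIGN(x, align)  ((x) & -(align)) /* same idea as the kernel macro */

    int main(void)
    {
        uint64_t max_asize = 11947471798272ULL;
        uint64_t asize     = 11947478089728ULL; /* 6291456 larger: 1 MiB per child * 6 */
        uint64_t ms_shift  = 36;                /* assumed metaslab shift */

        /*
         * max_asize - asize underflows to 18446744073703260160 (~16 EiB),
         * and aligning that down to a metaslab boundary leaves it enormous,
         * hence the 16.0E EXPANDSZ.
         */
        uint64_t esize = P2ALIGN(max_asize - asize, 1ULL << ms_shift);
        printf("esize = %ju bytes\n", (uintmax_t)esize);
        return 0;
    }
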
>     >
>     > Given that you seem to have somewhat varied disk sizes, do you have
>     autoexpand turned on?
>     >
>     > On 31/01/2017 17:39, Larry Rosenman wrote: vdev_path: n/a,
>     vdev_max_asize: 11947471798272, vdev_asize: 11947478089728
>
>     --
>     Larry Rosenman                  http://people.freebsd.org/~ler
>     Phone: +1 214-642-9640          E-Mail: ler at FreeBSD.org
>     US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>
>
> I have the same observation on my home file server. I've not tried the 
> patches (will try that once I get time next week), but the output of 
> the dtrace script while doing 'zpool list -v' shows:
>
>  # ./dtrace.sh
> vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0
> vdev_path: n/a, vdev_max_asize: 23907502915584, vdev_asize: 23907504488448
> vdev_path: /dev/gpt/Bay1.eli, vdev_max_asize: 3984583819264, 
> vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay2.eli, vdev_max_asize: 3984583819264, 
> vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay3.eli, vdev_max_asize: 3984583819264, 
> vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay4.eli, vdev_max_asize: 3984583819264, 
> vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay5.eli, vdev_max_asize: 3984583819264, 
> vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay6.eli, vdev_max_asize: 3984583819264, 
> vdev_asize: 3984583819264
>
> The second line has the same discrepancy as above. This pool was 
> created without geli encryption first, then while the pool was still 
> empty, each drive was offlined and replaced with its .eli counterpart. 
> IIRC geli leaves some metadata on the disk, shrinking available space 
> ever so slightly, which seems to fit the proposed cause earlier in 
> this thread.
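
Running the same arithmetic over those totals seems to fit: assuming the
top-level raidz sizes are the smallest child size times the number of children
(as discussed above), the recorded asize exceeds max_asize by 1572864 bytes,
i.e. 262144 bytes per child, consistent with each .eli provider ending up
slightly smaller than the raw partition it replaced. A back-of-the-envelope
check, not anything from the patches:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Totals from the dtrace output above. */
        uint64_t asize     = 23907504488448ULL; /* recorded when the pool was created */
        uint64_t max_asize = 23907502915584ULL; /* 6 * 3984583819264, the current .eli children */
        uint64_t children  = 6;

        uint64_t total_delta = asize - max_asize;      /* 1572864 bytes */
        uint64_t per_child   = total_delta / children; /* 262144 bytes (256 KiB) */

        printf("shortfall: %ju bytes total, %ju bytes per child\n",
            (uintmax_t)total_delta, (uintmax_t)per_child);
        return 0;
    }
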
>
> MH
Yes indeed it does.

     Regards
     Steve

