16.0E ExpandSize? -- New Server

Marie Helene Kvello-Aune marieheleneka at gmail.com
Tue Jan 31 22:02:29 UTC 2017


On Tue, Jan 31, 2017 at 10:49 PM Larry Rosenman <ler at freebsd.org> wrote:

> revert the other patch and apply this one?
>
> On 01/31/2017 3:47 pm, Steven Hartland wrote:
>
> > Hmm, it looks like there's also a bug in the way vdev_min_asize is
> calculated for raidz: it can result, and in your case has resulted, in a
> child min_asize that won't provide enough space for the parent, due to
> the use of truncating integer division.
> >
> > 1981411579221 * 6 = 11888469475326 < 11888469475328
> >
> > You should have vdev_min_asize: 1981411579222 for your children.
> >
> > Updated patch attached. The calculation still isn't 100% reversible, so
> it may need more work, but it does now ensure that the children will
> provide enough capacity for the parent's min_asize even if all of them are
> shrunk to their individual min_asize, which I believe previously may not
> have been the case.
> >
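> > As a minimal sketch of the rounding fix (a hypothetical helper, not the
> attached patch itself): divide rounding up, so that children *
> child_min_asize can never fall below the parent's min_asize.
> >
> >     /* Hypothetical helper: split the parent's min_asize across the
> >      * children, rounding up instead of truncating. */
> >     static uint64_t
> >     raidz_child_min_asize(uint64_t parent_min_asize, uint64_t children)
> >     {
> >             return ((parent_min_asize + children - 1) / children);
> >     }
> >
> >     /* 11888469475328 / 6 truncates to 1981411579221 (too small);
> >      * rounding up gives 1981411579222, and
> >      * 1981411579222 * 6 = 11888469475332 >= 11888469475328. */
> >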
> > This isn't related to the incorrect EXPANDSZ output, but it would be
> good if you could confirm it doesn't cause any issues for your pool given
> its state.
> >
> > On 31/01/2017 21:00, Larry Rosenman wrote:
> >
> > borg-new /home/ler $ sudo ./vdev-stats.d
> > Password:
> > vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0, vdev_min_asize: 0
> > vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize: 11947478089728, vdev_min_asize: 11888469475328
> > vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712, vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152, vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > ^C
> >
> > borg-new /home/ler $
> >
> > borg-new /home/ler $ sudo zpool list -v
> > Password:
> > NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> > zroot       10.8T  94.3G  10.7T     16.0E     0%     0%  1.00x  ONLINE  -
> >   raidz1    10.8T  94.3G  10.7T     16.0E     0%     0%
> >     mfid4p4     -      -      -         -      -      -
> >     mfid0p4     -      -      -         -      -      -
> >     mfid1p4     -      -      -         -      -      -
> >     mfid3p4     -      -      -         -      -      -
> >     mfid2p4     -      -      -         -      -      -
> >     mfid5p4     -      -      -         -      -      -
> > borg-new /home/ler $
> >
> > On 01/31/2017 2:37 pm, Steven Hartland wrote: In that case, based on
> your zpool history, I suspect that the original mfid4p4 was the same size
> as mfid0p4 (1991246348288) but it's been replaced with a drive which is
> slightly smaller (1991245299712).
> >
> > This smaller size results in a max_asize of 1991245299712 * 6 instead of
> the original 1991246348288 * 6.
> >
> > Now, given the way min_asize (the value used to check whether the
> device size is acceptable) is rounded to the nearest metaslab, I believe
> that replace would be allowed:
> >
> https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
> >
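> > In rough C terms (a sketch of the effect only, not the exact spa.c
> code; the names follow the ZFS sources):
> >
> >     /* min_asize is taken at metaslab granularity (aligned down to
> >      * 1ULL << vdev_ms_shift of the top-level vdev), so a replacement
> >      * disk that is only slightly smaller than the original can still
> >      * pass the "new device is big enough" check. */
> >     uint64_t ms_size   = 1ULL << tvd->vdev_ms_shift;
> >     uint64_t min_asize = vd->vdev_asize & ~(ms_size - 1); /* P2ALIGN */
> >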
> > Now the problem is that on open the calculated asize is only updated if
> it's expanding:
> >
> https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
> >
> > The updated dtrace file outputs vdev_min_asize, which should confirm my
> suspicion about why the replace was allowed.
> >
> > On 31/01/2017 19:05, Larry Rosenman wrote:
> >
> > I've replaced some disks due to failure, and some of the partition
> sizes are different.
> >
> > autoexpand is off:
> >
> > borg-new /home/ler $ zpool get all zroot
> > NAME   PROPERTY                       VALUE                 SOURCE
> > zroot  size                           10.8T                 -
> > zroot  capacity                       0%                    -
> > zroot  altroot                        -                     default
> > zroot  health                         ONLINE                -
> > zroot  guid                           11945658884309024932  default
> > zroot  version                        -                     default
> > zroot  bootfs                         zroot/ROOT/default    local
> > zroot  delegation                     on                    default
> > zroot  autoreplace                    off                   default
> > zroot  cachefile                      -                     default
> > zroot  failmode                       wait                  default
> > zroot  listsnapshots                  off                   default
> > zroot  autoexpand                     off                   default
> > zroot  dedupditto                     0                     default
> > zroot  dedupratio                     1.00x                 -
> > zroot  free                           10.7T                 -
> > zroot  allocated                      94.3G                 -
> > zroot  readonly                       off                   -
> > zroot  comment                        -                     default
> > zroot  expandsize                     16.0E                 -
> > zroot  freeing                        0                     default
> > zroot  fragmentation                  0%                    -
> > zroot  leaked                         0                     default
> > zroot  feature@async_destroy          enabled               local
> > zroot  feature@empty_bpobj            active                local
> > zroot  feature@lz4_compress           active                local
> > zroot  feature@multi_vdev_crash_dump  enabled               local
> > zroot  feature@spacemap_histogram     active                local
> > zroot  feature@enabled_txg            active                local
> > zroot  feature@hole_birth             active                local
> > zroot  feature@extensible_dataset     enabled               local
> > zroot  feature@embedded_data          active                local
> > zroot  feature@bookmarks              enabled               local
> > zroot  feature@filesystem_limits      enabled               local
> > zroot  feature@large_blocks           enabled               local
> > zroot  feature@sha512                 enabled               local
> > zroot  feature@skein                  enabled               local
> > borg-new /home/ler $
> >
> > borg-new /home/ler $ gpart show
> > =>        40  3905945520  mfid0  GPT  (1.8T)
> >           40        1600      1  efi  (800K)
> >         1640        1024      2  freebsd-boot  (512K)
> >         2664        1432         - free -  (716K)
> >         4096    16777216      3  freebsd-swap  (8.0G)
> >     16781312  3889162240      4  freebsd-zfs  (1.8T)
> >   3905943552        2008         - free -  (1.0M)
> >
> > =>        40  3905945520  mfid1  GPT  (1.8T)
> >           40        1600      1  efi  (800K)
> >         1640        1024      2  freebsd-boot  (512K)
> >         2664        1432         - free -  (716K)
> >         4096    16777216      3  freebsd-swap  (8.0G)
> >     16781312  3889162240      4  freebsd-zfs  (1.8T)
> >   3905943552        2008         - free -  (1.0M)
> >
> > =>        40  3905945520  mfid2  GPT  (1.8T)
> >           40        1600      1  efi  (800K)
> >         1640        1024      2  freebsd-boot  (512K)
> >         2664        1432         - free -  (716K)
> >         4096    16777216      3  freebsd-swap  (8.0G)
> >     16781312  3889162240      4  freebsd-zfs  (1.8T)
> >   3905943552        2008         - free -  (1.0M)
> >
> > =>        40  3905945520  mfid3  GPT  (1.8T)
> >           40        1600      1  efi  (800K)
> >         1640        1024      2  freebsd-boot  (512K)
> >         2664    16777216      3  freebsd-swap  (8.0G)
> >     16779880  3889165680      4  freebsd-zfs  (1.8T)
> >
> > =>        40  3905945520  mfid5  GPT  (1.8T)
> >           40        1600      1  efi  (800K)
> >         1640        1024      2  freebsd-boot  (512K)
> >         2664        1432         - free -  (716K)
> >         4096    16777216      3  freebsd-swap  (8.0G)
> >     16781312  3889162240      4  freebsd-zfs  (1.8T)
> >   3905943552        2008         - free -  (1.0M)
> >
> > =>        40  3905945520  mfid4  GPT  (1.8T)
> >           40        1600      1  efi  (800K)
> >         1640        1024      2  freebsd-boot  (512K)
> >         2664        1432         - free -  (716K)
> >         4096    16777216      3  freebsd-swap  (8.0G)
> >     16781312  3889160192      4  freebsd-zfs  (1.8T)
> >   3905941504        4056         - free -  (2.0M)
> >
> > borg-new /home/ler $
> >
> > this system was built last week, and I **CAN** rebuild it if necessary,
> but I didn't do anything strange (so I thought :) )
> >
> > On 01/31/2017 12:30 pm, Steven Hartland wrote: Your issue is that the
> reported vdev_max_asize < vdev_asize:
> > vdev_max_asize: 11947471798272
> > vdev_asize:     11947478089728
> >
> > max_asize is smaller than asize by 6291456.
> >
> > For raidz1, asize and max_asize should each be the smallest disk's
> value * number of disks, so:
> > 1991245299712 * 6 = 11947471798272
> >
> > So your max_asize looks right, but asize looks too big.
> >
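> > (Working the numbers the other way: 11947478089728 / 6 = 1991246348288,
> which matches four of the six disks (mfid0/1/2/5p4), so asize looks like
> it was computed before the smaller replacement mfid4p4 arrived.)
> >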
> > Expand Size is calculated by:
> >
> >     if (vd->vdev_aux == NULL && tvd != NULL && vd->vdev_max_asize != 0) {
> >             vs->vs_esize = P2ALIGN(vd->vdev_max_asize - vd->vdev_asize,
> >                 1ULL << tvd->vdev_ms_shift);
> >     }
> >
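> > Note where the 16.0E comes from: both fields are uint64_t, so when
> vdev_max_asize < vdev_asize the subtraction wraps around. A minimal
> standalone demonstration:
> >
> >     #include <stdio.h>
> >     #include <stdint.h>
> >     #include <inttypes.h>
> >
> >     int main(void)
> >     {
> >             uint64_t max_asize = 11947471798272ULL;
> >             uint64_t asize     = 11947478089728ULL;
> >             uint64_t diff = max_asize - asize;
> >             /* Prints 18446744073703260160 (~16.0 EiB); P2ALIGN only
> >              * rounds this down to a metaslab boundary, so zpool
> >              * still reports 16.0E. */
> >             printf("%" PRIu64 "\n", diff);
> >             return (0);
> >     }
> >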
> > So the question is why is asize too big?
> >
> > Given that you seem to have some varying disk sizes, do you have
> autoexpand turned on?
> >
> > On 31/01/2017 17:39, Larry Rosenman wrote:
> > vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize: 11947478089728
>
> --
> Larry Rosenman                     http://people.freebsd.org/~ler
> Phone: +1 214-642-9640                 E-Mail: ler at FreeBSD.org
> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281

I have the same observation on my home file server. I haven't tried the
patches yet (I'll try them once I get time next week), but the output of
the dtrace script while doing 'zpool list -v' shows:

 # ./dtrace.sh
vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0
vdev_path: n/a, vdev_max_asize: 23907502915584, vdev_asize: 23907504488448
vdev_path: /dev/gpt/Bay1.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
vdev_path: /dev/gpt/Bay2.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
vdev_path: /dev/gpt/Bay3.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
vdev_path: /dev/gpt/Bay4.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
vdev_path: /dev/gpt/Bay5.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
vdev_path: /dev/gpt/Bay6.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264

The second line shows the same discrepancy as above (23907504488448 -
23907502915584 = 1572864, i.e. 262144 bytes per disk). This pool was
created without geli encryption first; then, while the pool was still
empty, each drive was offlined and replaced with its .eli counterpart.
IIRC geli leaves some metadata on the disk, shrinking the available space
ever so slightly, which seems to fit the cause proposed earlier in this
thread.
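
For reference, a back-of-the-envelope sketch of that size arithmetic in C
(the single-sector overhead and the raw partition size below are my
assumptions for illustration, not measured values):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
            /* geli stores its metadata in the provider's last sector,
             * so the .eli device is one sector smaller than the raw
             * partition. Both numbers here are hypothetical. */
            uint64_t sectorsize = 4096;
            uint64_t raw_size   = 3984583823360;
            uint64_t eli_size   = raw_size - sectorsize;
            printf("%" PRIu64 "\n", eli_size);  /* 3984583819264 */
            return (0);
    }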

MH

