16.0E ExpandSize? -- New Server

Steven Hartland killing at multiplay.co.uk
Tue Jan 31 21:47:16 UTC 2017


Hmm, looks like there's also a bug in the way vdev_min_asize is 
calculated for raidz: it can, and in your case has, resulted in a child 
min_asize which won't provide enough space for the parent, due to the 
use of unrounded (floor) integer division.

1981411579221 * 6 = 11888469475326 < 11888469475328

You should have vdev_min_asize: 1981411579222 for your children.
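
To see the effect outside the kernel, here's a minimal userland sketch 
in plain C (using the numbers above; the helper variable names are mine, 
not ZFS's) comparing the current floor division with the rounded-up 
division in the attached patch:

	#include <stdio.h>
	#include <stdint.h>

	int
	main(void)
	{
		uint64_t parent_min_asize = 11888469475328ULL; /* raidz1 min_asize from your dtrace output */
		uint64_t children = 6;

		/* Current code: floor division, children can end up too small. */
		uint64_t child_floor = parent_min_asize / children;
		/* Patched code: round up so child_min * children >= parent_min_asize. */
		uint64_t child_ceil = (parent_min_asize + children - 1) / children;

		printf("floor: %ju * 6 = %ju\n", (uintmax_t)child_floor,
		    (uintmax_t)(child_floor * children)); /* 11888469475326, 2 bytes short */
		printf("ceil:  %ju * 6 = %ju\n", (uintmax_t)child_ceil,
		    (uintmax_t)(child_ceil * children));  /* 11888469475332, sufficient */
		return (0);
	}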

Updated patch attached. The calculation still isn't 100% reversible, so 
it may need more work, but it does now ensure that the children will 
provide enough capacity for the parent's min_asize even if all of them 
are shrunk to their individual min_asize, which I believe previously may 
not have been the case.

This isn't related to the incorrect EXPANDSZ output, but it would be 
good if you could confirm it doesn't cause any issues for your pool 
given its state.
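
For reference, the bogus 16.0E itself is almost certainly the esize 
calculation (quoted further down) wrapping: vdev_max_asize - vdev_asize 
is an unsigned 64-bit subtraction, so when asize is larger than 
max_asize, as it is on your top-level raidz, the result wraps to just 
under 2^64 bytes, which zpool prints as 16.0E. A rough standalone 
illustration in plain C (P2ALIGN copied from the kernel macro; ms_shift 
here is an assumed example value, not read from your pool):

	#include <stdio.h>
	#include <stdint.h>
	#include <inttypes.h>

	/* Same rounding as the kernel's P2ALIGN macro. */
	#define	P2ALIGN(x, align)	((x) & -(align))

	int
	main(void)
	{
		uint64_t max_asize = 11947471798272ULL; /* vdev_max_asize from your dtrace output */
		uint64_t asize = 11947478089728ULL;     /* vdev_asize from your dtrace output */
		uint64_t ms_shift = 34;                 /* assumed metaslab shift, illustration only */

		/* asize > max_asize, so the unsigned subtraction wraps to ~2^64. */
		uint64_t esize = P2ALIGN(max_asize - asize, 1ULL << ms_shift);
		printf("esize = %" PRIu64 " bytes (~16.0E)\n", esize);
		return (0);
	}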

On 31/01/2017 21:00, Larry Rosenman wrote:
>
> borg-new /home/ler $ sudo ./vdev-stats.d
> Password:
> vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0, vdev_min_asize: 0
> vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize: 11947478089728, vdev_min_asize: 11888469475328
> vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712, vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
> vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152, vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
> vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> ^C
>
> borg-new /home/ler $
>
>
> borg-new /home/ler $ sudo zpool list -v
> Password:
> NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> zroot       10.8T  94.3G  10.7T     16.0E     0%     0%  1.00x  ONLINE  -
>   raidz1    10.8T  94.3G  10.7T     16.0E     0%     0%
>     mfid4p4     -      -      -         -      -      -
>     mfid0p4     -      -      -         -      -      -
>     mfid1p4     -      -      -         -      -      -
>     mfid3p4     -      -      -         -      -      -
>     mfid2p4     -      -      -         -      -      -
>     mfid5p4     -      -      -         -      -      -
> borg-new /home/ler $
>
>
> On 01/31/2017 2:37 pm, Steven Hartland wrote:
>
>> In that case, based on your zpool history, I suspect that the original 
>> mfid4p4 was the same size as mfid0p4 (1991246348288) but it's been 
>> replaced with a drive which is slightly smaller (1991245299712).
>>
>> This smaller size results in a max_asize of 1991245299712 * 6 instead 
>> of the original 1991246348288 * 6.
>>
>> Now, given the way min_asize (the value used to check if the device 
>> size is acceptable) is rounded to the nearest metaslab, I believe 
>> that replace would be allowed.
>> https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
>>
>> Now the problem is that, on open, the calculated asize is only updated 
>> if it's expanding:
>> https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
>>
>> The updated dtrace file outputs vdev_min_asize, which should confirm 
>> my suspicion about why the replace was allowed.
>>
>> On 31/01/2017 19:05, Larry Rosenman wrote:
>>>
>>> I've replaced some disks due to failure, and some of the partition 
>>> sizes are different.
>>>
>>>
>>> autoexpand is off:
>>>
>>> borg-new /home/ler $ zpool get all zroot
>>> NAME PROPERTY VALUE SOURCE
>>> zroot size 10.8T -
>>> zroot capacity 0% -
>>> zroot altroot - default
>>> zroot health ONLINE -
>>> zroot guid 11945658884309024932 default
>>> zroot version - default
>>> zroot bootfs zroot/ROOT/default local
>>> zroot delegation on default
>>> zroot autoreplace off default
>>> zroot cachefile - default
>>> zroot failmode wait default
>>> zroot listsnapshots off default
>>> zroot autoexpand off default
>>> zroot dedupditto 0 default
>>> zroot dedupratio 1.00x -
>>> zroot free 10.7T -
>>> zroot allocated 94.3G -
>>> zroot readonly off -
>>> zroot comment - default
>>> zroot expandsize 16.0E -
>>> zroot freeing 0 default
>>> zroot fragmentation 0% -
>>> zroot leaked 0 default
>>> zroot feature@async_destroy enabled local
>>> zroot feature@empty_bpobj active local
>>> zroot feature@lz4_compress active local
>>> zroot feature@multi_vdev_crash_dump enabled local
>>> zroot feature@spacemap_histogram active local
>>> zroot feature@enabled_txg active local
>>> zroot feature@hole_birth active local
>>> zroot feature@extensible_dataset enabled local
>>> zroot feature@embedded_data active local
>>> zroot feature@bookmarks enabled local
>>> zroot feature@filesystem_limits enabled local
>>> zroot feature@large_blocks enabled local
>>> zroot feature@sha512 enabled local
>>> zroot feature@skein enabled local
>>> borg-new /home/ler $
>>>
>>>
>>> borg-new /home/ler $ gpart show
>>> => 40 3905945520 mfid0 GPT (1.8T)
>>> 40 1600 1 efi (800K)
>>> 1640 1024 2 freebsd-boot (512K)
>>> 2664 1432 - free - (716K)
>>> 4096 16777216 3 freebsd-swap (8.0G)
>>> 16781312 3889162240 4 freebsd-zfs (1.8T)
>>> 3905943552 2008 - free - (1.0M)
>>>
>>> => 40 3905945520 mfid1 GPT (1.8T)
>>> 40 1600 1 efi (800K)
>>> 1640 1024 2 freebsd-boot (512K)
>>> 2664 1432 - free - (716K)
>>> 4096 16777216 3 freebsd-swap (8.0G)
>>> 16781312 3889162240 4 freebsd-zfs (1.8T)
>>> 3905943552 2008 - free - (1.0M)
>>>
>>> => 40 3905945520 mfid2 GPT (1.8T)
>>> 40 1600 1 efi (800K)
>>> 1640 1024 2 freebsd-boot (512K)
>>> 2664 1432 - free - (716K)
>>> 4096 16777216 3 freebsd-swap (8.0G)
>>> 16781312 3889162240 4 freebsd-zfs (1.8T)
>>> 3905943552 2008 - free - (1.0M)
>>>
>>> => 40 3905945520 mfid3 GPT (1.8T)
>>> 40 1600 1 efi (800K)
>>> 1640 1024 2 freebsd-boot (512K)
>>> 2664 16777216 3 freebsd-swap (8.0G)
>>> 16779880 3889165680 4 freebsd-zfs (1.8T)
>>>
>>> => 40 3905945520 mfid5 GPT (1.8T)
>>> 40 1600 1 efi (800K)
>>> 1640 1024 2 freebsd-boot (512K)
>>> 2664 1432 - free - (716K)
>>> 4096 16777216 3 freebsd-swap (8.0G)
>>> 16781312 3889162240 4 freebsd-zfs (1.8T)
>>> 3905943552 2008 - free - (1.0M)
>>>
>>> => 40 3905945520 mfid4 GPT (1.8T)
>>> 40 1600 1 efi (800K)
>>> 1640 1024 2 freebsd-boot (512K)
>>> 2664 1432 - free - (716K)
>>> 4096 16777216 3 freebsd-swap (8.0G)
>>> 16781312 3889160192 4 freebsd-zfs (1.8T)
>>> 3905941504 4056 - free - (2.0M)
>>>
>>> borg-new /home/ler $
>>>
>>>
>>> this system was built last week, and I **CAN** rebuild it if 
>>> necessary, but I didn't do anything strange (so I thought :) )
>>>
>>>
>>>
>>>
>>> On 01/31/2017 12:30 pm, Steven Hartland wrote:
>>>
>>>     Your issue is that the reported vdev_max_asize < vdev_asize:
>>>     vdev_max_asize: 11947471798272
>>>     vdev_asize:     11947478089728
>>>
>>>     max asize is smaller than asize by 6291456
>>>
>>>     For raidz1 Xsize should be the smallest disk Xsize * disks so:
>>>     1991245299712 * 6 = 11947471798272
>>>
>>>     So your max asize looks right but asize looks too big
>>>
>>>     Expand Size is calculated by:
>>>     if (vd->vdev_aux == NULL && tvd != NULL && vd->vdev_max_asize != 0) {
>>>         vs->vs_esize = P2ALIGN(vd->vdev_max_asize - vd->vdev_asize,
>>>             1ULL << tvd->vdev_ms_shift);
>>>     }
>>>
>>>     So the question is why is asize too big?
>>>
>>>     Given you seem to have some random disk sizes, do you have
>>>     autoexpand turned on?
>>>
>>>     On 31/01/2017 17:39, Larry Rosenman wrote:
>>>
>>>         vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize:
>>>         11947478089728
>>>
>>>
>>> -- 
>>> Larry Rosenman                  http://people.freebsd.org/~ler
>>> Phone: +1 214-642-9640                 E-Mail: ler at FreeBSD.org
>>> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>
>
> -- 
> Larry Rosenman                  http://people.freebsd.org/~ler
> Phone: +1 214-642-9640                 E-Mail: ler at FreeBSD.org
> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281

-------------- next part --------------
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	(revision 313003)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	(working copy)
@@ -229,7 +229,8 @@
 	 * so each child must provide at least 1/Nth of its asize.
 	 */
 	if (pvd->vdev_ops == &vdev_raidz_ops)
-		return (pvd->vdev_min_asize / pvd->vdev_children);
+		return ((pvd->vdev_min_asize + pvd->vdev_children - 1) /
+		    pvd->vdev_children);
 
 	return (pvd->vdev_min_asize);
 }
@@ -1377,7 +1378,7 @@
 	vd->vdev_psize = psize;
 
 	/*
-	 * Make sure the allocatable size hasn't shrunk.
+	 * Make sure the allocatable size hasn't shrunk too much.
 	 */
 	if (asize < vd->vdev_min_asize) {
 		vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
@@ -1420,10 +1421,19 @@
 	 * If all children are healthy and the asize has increased,
 	 * then we've experienced dynamic LUN growth.  If automatic
 	 * expansion is enabled then use the additional space.
+	 * 
+	 * Otherwise if asize has reduced, shrink to ensure that
+	 * calculations based of max_asize and asize e.g. esize are
+	 * always valid. This is safe as we've already validated that
+	 * asize is not less than min_asize.
 	 */
-	if (vd->vdev_state == VDEV_STATE_HEALTHY && asize > vd->vdev_asize &&
-	    (vd->vdev_expanding || spa->spa_autoexpand))
-		vd->vdev_asize = asize;
+	if (vd->vdev_state == VDEV_STATE_HEALTHY) {
+		if (asize > vd->vdev_asize &&
+		    (vd->vdev_expanding || spa->spa_autoexpand))
+			vd->vdev_asize = asize;
+		else if (asize < vd->vdev_asize)
+			vd->vdev_asize = asize;
+	}
 
 	vdev_set_min_asize(vd);
 

