Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Thu, 07 Sep 2023 20:07:55 UTC
Thanks, Mark.

On 07.09.2023 15:40, Mark Millard wrote:
> On Sep 7, 2023, at 11:48, Glen Barber <gjb@FreeBSD.org> wrote:
> 
>> On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote:
>>> When I next have time, should I retry based on a more recent
>>> vintage of main that includes 969071be938c ?
>>
>> Yes, please, if you can.
> 
> As stands, I rebooted that machine into my normal
> enviroment, so the after-crash-with-dump-info
> context is preserved. I'll presume lack of a need
> to preserve that context unless I hear otherwise.
> (But I'll work on this until later today.)
> 
> Even my normal environment predates the commit in
> question by a few commits. So I'll end up doing a
> more general round of updates overall.
> 
> Someone can let me know if there is a preference
> for debug over non-debug for the next test run.

It is not unknown when some bugs disappear once debugging is enabled due 
to different execution timings, but generally debug may to detect the 
problem closer to its origin instead of looking on random consequences. 
I am only starting to look on this report (unless Pawel or somebody beat 
me on it), and don't have additional requests yet, but if you can repeat 
the same with debug kernel (in-base ZFS's ZFS_DEBUG setting follows 
kernel's INVARIANTS), it may give us some additional information.

> Looking at "git: 969071be938c - main", the relevant
> part seems to be just (white space possibly not
> preserved accurately):
> 
> diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
> index 9fb5aee6a023..4e4161ef1a7f 100644
> --- a/sys/kern/vfs_vnops.c
> +++ b/sys/kern/vfs_vnops.c
> @@ -3076,12 +3076,14 @@ vn_copy_file_range(struct vnode *invp, off_t *inoffp, struct vnode *outvp,
>   		goto out;
>   
>   	/*
> -	 * If the two vnode are for the same file system, call
> +	 * If the two vnodes are for the same file system type, call
>   	 * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range()
> -	 * which can handle copies across multiple file systems.
> +	 * which can handle copies across multiple file system types.
>   	 */
>   	*lenp = len;
> -	if (invp->v_mount == outvp->v_mount)
> +	if (invp->v_mount == outvp->v_mount ||
> +	    strcmp(invp->v_mount->mnt_vfc->vfc_name,
> +	    outvp->v_mount->mnt_vfc->vfc_name) == 0)
>   		error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
>   		    lenp, flags, incred, outcred, fsize_td);
>   	else
> 
> That looks to call VOP_COPY_FILE_RANGE in more contexts and
> vn_generic_copy_file_range in fewer.
> 
> The backtrace I reported involves: VOP_COPY_FILE_RANGE
> So it appears this change is unlikely to invalidate my
> test result,  although failure might happen sooner if
> more VOP_COPY_FILE_RANGE calls happen with the newer code.

Your logic is likely right, but if you have block cloning requests both 
within and between datasets, this patch may change the pattern.  Though 
it is obviously not a fix for the issue.  I responded to the commit 
email only because it makes no difference while vfs.zfs.bclone_enabled is 0.

> That in turns means that someone may come up with some
> other change for me to test by the time I get around to
> setting up another test. Let me know if so.

-- 
Alexander Motin