Re: git: a4e4132fa3bf - main - swapoff(2): replace special device name argument with a structure

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sun, 05 Dec 2021 19:53:37 UTC
On Sun, Dec 05, 2021 at 01:38:52PM -0600, Mike Karels wrote:
> On 5 Dec 2021, at 12:56, Jessica Clarke wrote:
> 
> > On 5 Dec 2021, at 18:51, Konstantin Belousov <kostikbel@gmail.com> wrote:
> >> On Sun, Dec 05, 2021 at 05:14:54PM +0000, Jessica Clarke wrote:
> >>> On 5 Dec 2021, at 13:22, Konstantin Belousov <kostikbel@gmail.com> wrote:
> >>>>
> >>>> On Sun, Dec 05, 2021 at 03:03:26AM +0000, Jessica Clarke wrote:
> >>>>> On 4 Dec 2021, at 22:21, Konstantin Belousov <kib@FreeBSD.org> wrote:
> >>>>>>
> >>>>>> The branch main has been updated by kib:
> >>>>>>
> >>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=a4e4132fa3bfadb6047fc0fa5f399f4640460300
> >>>>>>
> >>>>>> commit a4e4132fa3bfadb6047fc0fa5f399f4640460300
> >>>>>> Author:     Konstantin Belousov <kib@FreeBSD.org>
> >>>>>> AuthorDate: 2021-11-29 16:26:31 +0000
> >>>>>> Commit:     Konstantin Belousov <kib@FreeBSD.org>
> >>>>>> CommitDate: 2021-12-04 22:20:58 +0000
> >>>>>>
> >>>>>>  swapoff(2): replace special device name argument with a structure
> >>>>>>
> >>>>>>  For compatibility, add a placeholder pointer to the start of the
> >>>>>>  added struct swapoff_new_args, and use it to distinguish old vs. new
> >>>>>>  style of syscall invocation.
> >>>>>>
> >>>>>>  Reviewed by:    markj
> >>>>>>  Discussed with: alc
> >>>>>>  Sponsored by:   The FreeBSD Foundation
> >>>>>>  MFC after:      1 week
> >>>>>>  Differential revision:  https://reviews.freebsd.org/D33165
> >>>>>> ---
> >>>>>> sys/vm/swap_pager.c | 27 +++++++++++++++++++++++++--
> >>>>>> sys/vm/swap_pager.h |  8 ++++++++
> >>>>>> 2 files changed, 33 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
> >>>>>> index 165373d1b527..dc1df79f4fcd 100644
> >>>>>> --- a/sys/vm/swap_pager.c
> >>>>>> +++ b/sys/vm/swap_pager.c
> >>>>>> @@ -2491,15 +2491,38 @@ sys_swapoff(struct thread *td, struct swapoff_args *uap)
> >>>>>> 	struct vnode *vp;
> >>>>>> 	struct nameidata nd;
> >>>>>> 	struct swdevt *sp;
> >>>>>> -	int error;
> >>>>>> +	struct swapoff_new_args sa;
> >>>>>> +	int error, probe_byte;
> >>>>>>
> >>>>>> 	error = priv_check(td, PRIV_SWAPOFF);
> >>>>>> 	if (error)
> >>>>>> 		return (error);
> >>>>>>
> >>>>>> +	/*
> >>>>>> +	 * Detect old vs. new-style swapoff(2) syscall.  The first
> >>>>>> +	 * pointer in the memory pointed to by uap->name is NULL for
> >>>>>> +	 * the new variant.
> >>>>>> +	 */
> >>>>>> +	probe_byte = fubyte(uap->name);
> >>>>>> +	switch (probe_byte) {
> >>>>>> +	case -1:
> >>>>>> +		return (EFAULT);
> >>>>>> +	case 0:
> >>>>>> +		error = copyin(uap->name, &sa, sizeof(sa));
> >>>>>> +		if (error != 0)
> >>>>>> +			return (error);
> >>>>>> +		if (sa.flags != 0)
> >>>>>> +			return (EINVAL);
> >>>>>> +		break;
> >>>>>> +	default:
> >>>>>> +		bzero(&sa, sizeof(sa));
> >>>>>> +		sa.name = uap->name;
> >>>>>> +		break;
> >>>>>> +	}
> >>>>>
> >>>>> Doesn’t this change the semantics of swapoff("")?
> >>>>>
> >>>>> Previously it would fail deterministically, presumably with ENOENT or
> >>>>> something, but now it reinterprets whatever follows that string in
> >>>>> memory as the new argument structure. It probably doesn’t matter, but
> >>>>> this approach is ugly. Can we not just define a new syscall rather than
> >>>>> this kind of bodge?
> >>>>
> >>>> Having two swapoff() syscalls is worse, and having them only differ in
> >>>> semantic by single flag is kind of crime.
> >>>>
> >>>> I do not see swapoff("") as problematic, we are changing a minor semantic of
> >>>> the management syscall.  I only wanted to avoid flag day for swapoff binaries.
> >>>>
> >>>> BTW, I considered requiring proper alignment for uap->name, and then checking
> >>>> the whole uap->name_old_syscall for NULL, but then decided that this is
> >>>> overkill.  If you think that swapoff("") that important, I can add that
> >>>> additional verification.
> >>>
> >>> Why’s it worse? It’s just a syscall number, you deprecate the old one
> >>> and move on, we do that for things relatively regularly. This is really
> >>> not a good solution; harder to use as a caller since the prototype is
> >>> wrong, impossible to ensure you preserve the semantics for the existing
> >>> interface in all cases, and ugly to implement. You don’t need a flag
> >>> day for a new syscall, either, you can continue to only use the new
> >>> method for -f for a release and then switch over to the new syscall
> >>> entirely. Or switch over to the new syscall entirely now and fall back
> >>> on the old syscall if -f isn’t passed. Defining a new syscall also lets
> >>> you not need the name_old_syscall member in the struct, and gives you a
> >>> clean, fully-extensible syscall to which future features can be added
> >>> in a backwards-compatible way, rather than forever keeping around this
> >>> legacy mess.
> >>
> >> I disagree, it is not just a syscall number, it is whole user/kernel
> >> interface that bloats, which means cognitive efforts from anybody using
> >> this interfaces, and for which we must maintain ABI compatibility.
> >
> > Which is just as true of this approach; you have the same two
> > interfaces here, just smashed together into a single harder-to-use
> > syscall rather than two separate syscalls. Having a separate syscall at
> > least allows the old one to return ENOSYS in the future, whereas if you
> > ever want to deprecate the old interface with this method then you’ll
> > need some other weird error response that’s harder to interpret as
> > meaning “that variant of this syscall doesn’t exist any more”.
> >
> >> New syscall allocation should be done only as a last resort, when existing
> >> interfaces cannot be adopted for new functionality.
> >
> > Which this can’t without breaking the existing well-defined semantics,
> > as I’ve stated.
> >
> >> Good (or rather, bad) example of the uglyness that is backed by the attitude
> >> that syscalls are free, is whole *at() mess, or specific stat*() mess (old,
> >> other bsds, pre-ino64, ino64, at, then stat vs fstat, then Linux statx which
> >> probably fixes the interface ultimately).
> >
> > It’s better than this approach.
> 
> I have less resistance to adding new syscalls, and I agree with Jess that it
> is the right thing to do in this case.  Adding a syscall means the kernel
> supports either old or new interface, so there is no flag day.  And it is
> easier to clean up; maintaining two syscalls should only be needed for a
> while on -current, and not in any release.
No, we do not do this.  We maintain backward compatibility, old swapoff(2)
would have to live under COMPAT_FREEBSD12 forever.