From nobody Thu Jan 13 15:27:23 2022 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 7747C194BCD5 for ; Thu, 13 Jan 2022 15:27:35 +0000 (UTC) (envelope-from wma@semihalf.com) Received: from mail-yb1-xb2a.google.com (mail-yb1-xb2a.google.com [IPv6:2607:f8b0:4864:20::b2a]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4JZSwl20drz3M0L for ; Thu, 13 Jan 2022 15:27:35 +0000 (UTC) (envelope-from wma@semihalf.com) Received: by mail-yb1-xb2a.google.com with SMTP id h14so15955520ybe.12 for ; Thu, 13 Jan 2022 07:27:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=semihalf-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=o/BM3SiSgk05WdPj42R4ijNkmNtf+eVpkNCRz5Mqldg=; b=QQxZar7T8p8KETGgkT+QeU8QbizxwCJsyVD/8ZkoMWG+BDCQmnVHj94wxvwQy+uM/S 2O9rpXjcLnUz+GEbk06JAWZ550xLpoaeBqdMg+KbNTaDtIXg977gwu7SEwY390mwHReT 5Q6msImrAne2QO+26X9oXozXeY+QdJ+USsgAwSMLi1YJCSLUpTkQ0HK6zrEZ9SMvjr68 HHNdJsXwlxWJa7oH0xFFy1WePonDKEc5vLva2lU8WSsRkv0DgQshCXaUmpL6CvYXPLER Zsj+eEQKioFTHRDZmDs1+QaTpgt/gSPXxhQUiD5dT+/f05aUoKecGhY18wwqBTF5yFrt wBxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=o/BM3SiSgk05WdPj42R4ijNkmNtf+eVpkNCRz5Mqldg=; b=6ZhUYw2luo5YAOmvdGlpD8X4vTBvWzP63201wgg2yZ8zXYMlnahR4vvaYAyPXbDV9r A568CcUzVcyYtbCmG+0hcApFjjlQcVkhLzLESo/4j6ZaxvGlF6S1XylYGd3KaBWMKkUz NDHEAZ2xSo8Rx5oYFVVDIambyuU2+gxbWg7QymndoMcwPqHAOisM6gqlI/p0jXgvoBQc cRwjKc37KScomg705mKFGFY+mbsM79HflPXD57ujJxguqXNye8JmUK4nF/UlHUnEkZWV lercGXHwyDXEM1KfvOITpDcJ3knOaw/KhK7KJYmp4cVgSB38dzQrzedkvek+9YWmuSFH VZOA== X-Gm-Message-State: AOAM5312aTb2Bq2BVWkweBycLcx6NtOENY1e2nyAr0AG8Gkqq032ErZG 9RbevEgqQ5HzlnW/j5SNOQdkOQUr6C9IkrJdRaNoRg== X-Google-Smtp-Source: ABdhPJxVdQTj3JupcLisl0b+6rJVw0UHQqAo8HGNy2zxUZgK4EDUyqq5SYdhCO7RRfXEe/bTAXXvFc5ww7DweGdpxu4= X-Received: by 2002:a5b:f0e:: with SMTP id x14mr6524864ybr.671.1642087654696; Thu, 13 Jan 2022 07:27:34 -0800 (PST) List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 References: <202201111019.20BAJjCg058259@gitrepo.freebsd.org> In-Reply-To: From: Wojciech Macek Date: Thu, 13 Jan 2022 16:27:23 +0100 Message-ID: Subject: Re: git: 2e72208b6c62 - main - ip_mroute: do not call epoch_waitwhen lock is taken To: Mark Johnston Cc: Wojciech Macek , src-committers , "" , dev-commits-src-main@freebsd.org Content-Type: multipart/alternative; boundary="000000000000593d8f05d578538d" X-Rspamd-Queue-Id: 4JZSwl20drz3M0L X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N --000000000000593d8f05d578538d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks, i'll take a look if moving mrouter_done outside the lock would work= . Wojtek =C5=9Br., 12 sty 2022, 22:40 u=C5=BCytkownik Mark Johnston napisa=C5=82: > On Tue, Jan 11, 2022 at 10:19:45AM +0000, Wojciech Macek wrote: > > The branch main has been updated by wma: > > > > URL: > https://cgit.FreeBSD.org/src/commit/?id=3D2e72208b6c622505323ed48dc58830f= c307392b1 > > > > commit 2e72208b6c622505323ed48dc58830fc307392b1 > > Author: Wojciech Macek > > AuthorDate: 2022-01-11 10:08:35 +0000 > > Commit: Wojciech Macek > > CommitDate: 2022-01-11 10:19:32 +0000 > > > > ip_mroute: do not call epoch_waitwhen lock is taken > > > > mrouter_done is called with RAW IP lock taken. Some annoying > > printfs are visible on the console if INVARIANTS option is enabled. > > > > Provide atomic-based mechanism which counts enters and exits from/t= o > > critical section in ip_input and ip_output. > > Before de-initialization of function pointers ensure (with busy-wai= t) > > that mrouter de-initialization is visible to all readers and that w= e > don't > > remove pointers (like ip_mforward etc.) in the middle of packet > processing. > > This doesn't address the problem that the warning is complaining about, > which is that a non-sleepable lock is held while performing an expensive > operation. Now we are silently spinning instead, and there's no > guarantee of forward progress since threads may be frequently entering > the MROUTER_RLOCK-protected section in ip_input(). And the change > converts a per-CPU atomic operation (in epoch_enter()) to a global > atomic operation, potentially hurting performance. The atomics are also > missing requisite barriers. > > I think a better solution is to not hold the raw inpcb lock while > calling the ip_mrouter_done() callback. That lock does not provide > synchronization for anything related to ip_mroute.c. Also note that > X_ip_mrouter_done() can sleep in taskqueue_drain(), which is another > reason to avoid holding the lock there. > > > --- > > sys/netinet/ip_mroute.h | 9 +++++---- > > sys/netinet/raw_ip.c | 1 + > > 2 files changed, 6 insertions(+), 4 deletions(-) > > > > diff --git a/sys/netinet/ip_mroute.h b/sys/netinet/ip_mroute.h > > index 65c5bdd3a025..faf5b8c72a46 100644 > > --- a/sys/netinet/ip_mroute.h > > +++ b/sys/netinet/ip_mroute.h > > @@ -365,11 +365,12 @@ extern int (*ip_mrouter_set)(struct socket *= , > struct sockopt *); > > extern int (*ip_mrouter_get)(struct socket *, struct sockopt *); > > extern int (*ip_mrouter_done)(void); > > extern int (*mrt_ioctl)(u_long, caddr_t, int); > > +extern int ip_mrouter_critical_section_cnt; > > > > -#define MROUTER_RLOCK_TRACKER struct epoch_tracker mrouter_et > > -#define MROUTER_RLOCK() epoch_enter_preempt(net_epoch_preempt, > &mrouter_et) > > -#define MROUTER_RUNLOCK() > epoch_exit_preempt(net_epoch_preempt, &mrouter_et) > > -#define MROUTER_WAIT() epoch_wait_preempt(net_epoch_preempt) > > +#define MROUTER_RLOCK_TRACKER > > Why keep this definition at all? > > > +#define MROUTER_RLOCK() > atomic_add_int(&ip_mrouter_critical_section_cnt, 1) > > +#define MROUTER_RUNLOCK() > atomic_subtract_int(&ip_mrouter_critical_section_cnt, 1) > > +#define MROUTER_WAIT() do {} while > (atomic_load_int(&ip_mrouter_critical_section_cnt) !=3D 0) > > > > #endif /* _KERNEL */ > > > > diff --git a/sys/netinet/raw_ip.c b/sys/netinet/raw_ip.c > > index 7c495745806e..dc49c36f25ad 100644 > > --- a/sys/netinet/raw_ip.c > > +++ b/sys/netinet/raw_ip.c > > @@ -120,6 +120,7 @@ VNET_DEFINE(struct socket *, ip_mrouter); > > int (*ip_mrouter_set)(struct socket *, struct sockopt *); > > int (*ip_mrouter_get)(struct socket *, struct sockopt *); > > int (*ip_mrouter_done)(void); > > +int ip_mrouter_critical_section_cnt; > > int (*ip_mforward)(struct ip *, struct ifnet *, struct mbuf *, > > struct ip_moptions *); > > int (*mrt_ioctl)(u_long, caddr_t, int); > --000000000000593d8f05d578538d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks, i'll take a look if moving mrouter_done outsi= de the lock would work.

Wojtek=

=C5=9Br., 12 sty 2022, 22:40 u=C5=BCytkownik Mark Johnston <markj@freebsd.org> napisa=C5=82:
<= /div>
On Tue, Jan 11, 2022 at 10:19:45AM +000= 0, Wojciech Macek wrote:
> The branch main has been updated by wma:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=3D2e72208b6c622505323ed48dc58830= fc307392b1
>
> commit 2e72208b6c622505323ed48dc58830fc307392b1
> Author:=C2=A0 =C2=A0 =C2=A0Wojciech Macek <wma@FreeBSD.org>
> AuthorDate: 2022-01-11 10:08:35 +0000
> Commit:=C2=A0 =C2=A0 =C2=A0Wojciech Macek <wma@FreeBSD.org>
> CommitDate: 2022-01-11 10:19:32 +0000
>
>=C2=A0 =C2=A0 =C2=A0ip_mroute: do not call epoch_waitwhen lock is taken=
>=C2=A0 =C2=A0 =C2=A0
>=C2=A0 =C2=A0 =C2=A0mrouter_done is called with RAW IP lock taken. Some= annoying
>=C2=A0 =C2=A0 =C2=A0printfs are visible on the console if INVARIANTS op= tion is enabled.
>=C2=A0 =C2=A0 =C2=A0
>=C2=A0 =C2=A0 =C2=A0Provide atomic-based mechanism which counts enters = and exits from/to
>=C2=A0 =C2=A0 =C2=A0critical section in ip_input and ip_output.
>=C2=A0 =C2=A0 =C2=A0Before de-initialization of function pointers ensur= e (with busy-wait)
>=C2=A0 =C2=A0 =C2=A0that mrouter de-initialization is visible to all re= aders and that we don't
>=C2=A0 =C2=A0 =C2=A0remove pointers (like ip_mforward etc.) in the midd= le of packet processing.

This doesn't address the problem that the warning is complaining about,=
which is that a non-sleepable lock is held while performing an expensive operation.=C2=A0 Now we are silently spinning instead, and there's no guarantee of forward progress since threads may be frequently entering
the MROUTER_RLOCK-protected section in ip_input().=C2=A0 And the change
converts a per-CPU atomic operation (in epoch_enter()) to a global
atomic operation, potentially hurting performance.=C2=A0 The atomics are al= so
missing requisite barriers.

I think a better solution is to not hold the raw inpcb lock while
calling the ip_mrouter_done() callback.=C2=A0 That lock does not provide synchronization for anything related to ip_mroute.c.=C2=A0 Also note that X_ip_mrouter_done() can sleep in taskqueue_drain(), which is another
reason to avoid holding the lock there.

> ---
>=C2=A0 sys/netinet/ip_mroute.h | 9 +++++----
>=C2=A0 sys/netinet/raw_ip.c=C2=A0 =C2=A0 | 1 +
>=C2=A0 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/sys/netinet/ip_mroute.h b/sys/netinet/ip_mroute.h
> index 65c5bdd3a025..faf5b8c72a46 100644
> --- a/sys/netinet/ip_mroute.h
> +++ b/sys/netinet/ip_mroute.h
> @@ -365,11 +365,12 @@ extern int=C2=A0 =C2=A0 =C2=A0 (*ip_mrouter_set)= (struct socket *, struct sockopt *);
>=C2=A0 extern int=C2=A0 =C2=A0(*ip_mrouter_get)(struct socket *, struct= sockopt *);
>=C2=A0 extern int=C2=A0 =C2=A0(*ip_mrouter_done)(void);
>=C2=A0 extern int=C2=A0 =C2=A0(*mrt_ioctl)(u_long, caddr_t, int);
> +extern int=C2=A0 =C2=A0ip_mrouter_critical_section_cnt;
>=C2=A0
> -#define=C2=A0 =C2=A0 =C2=A0 MROUTER_RLOCK_TRACKER=C2=A0 =C2=A0struct = epoch_tracker mrouter_et
> -#define=C2=A0 =C2=A0 =C2=A0 MROUTER_RLOCK() epoch_enter_preempt(net_e= poch_preempt, &mrouter_et)
> -#define=C2=A0 =C2=A0 =C2=A0 MROUTER_RUNLOCK()=C2=A0 =C2=A0 =C2=A0 =C2= =A0epoch_exit_preempt(net_epoch_preempt, &mrouter_et)
> -#define=C2=A0 =C2=A0 =C2=A0 MROUTER_WAIT()=C2=A0 epoch_wait_preempt(n= et_epoch_preempt)
> +#define=C2=A0 =C2=A0 =C2=A0 MROUTER_RLOCK_TRACKER

Why keep this definition at all?

> +#define=C2=A0 =C2=A0 =C2=A0 MROUTER_RLOCK()=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0atomic_add_int(&ip_mrouter_critical_section_cnt, 1)
> +#define=C2=A0 =C2=A0 =C2=A0 MROUTER_RUNLOCK()=C2=A0 =C2=A0 =C2=A0 =C2= =A0atomic_subtract_int(&ip_mrouter_critical_section_cnt, 1)
> +#define=C2=A0 =C2=A0 =C2=A0 MROUTER_WAIT()=C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 do {} while (atomic_load_int(&ip_mrouter_critical_section_cnt) = !=3D 0)
>=C2=A0
>=C2=A0 #endif /* _KERNEL */
>=C2=A0
> diff --git a/sys/netinet/raw_ip.c b/sys/netinet/raw_ip.c
> index 7c495745806e..dc49c36f25ad 100644
> --- a/sys/netinet/raw_ip.c
> +++ b/sys/netinet/raw_ip.c
> @@ -120,6 +120,7 @@ VNET_DEFINE(struct socket *, ip_mrouter);
>=C2=A0 int (*ip_mrouter_set)(struct socket *, struct sockopt *);
>=C2=A0 int (*ip_mrouter_get)(struct socket *, struct sockopt *);
>=C2=A0 int (*ip_mrouter_done)(void);
> +int ip_mrouter_critical_section_cnt;
>=C2=A0 int (*ip_mforward)(struct ip *, struct ifnet *, struct mbuf *, >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 struct i= p_moptions *);
>=C2=A0 int (*mrt_ioctl)(u_long, caddr_t, int);
--000000000000593d8f05d578538d--