Re: git: 2e72208b6c62 - main - ip_mroute: do not call epoch_waitwhen lock is taken
Date: Wed, 12 Jan 2022 21:40:33 UTC
On Tue, Jan 11, 2022 at 10:19:45AM +0000, Wojciech Macek wrote:
> The branch main has been updated by wma:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=2e72208b6c622505323ed48dc58830fc307392b1
>
> commit 2e72208b6c622505323ed48dc58830fc307392b1
> Author: Wojciech Macek <wma@FreeBSD.org>
> AuthorDate: 2022-01-11 10:08:35 +0000
> Commit: Wojciech Macek <wma@FreeBSD.org>
> CommitDate: 2022-01-11 10:19:32 +0000
>
> ip_mroute: do not call epoch_waitwhen lock is taken
>
> mrouter_done is called with RAW IP lock taken. Some annoying
> printfs are visible on the console if INVARIANTS option is enabled.
>
> Provide atomic-based mechanism which counts enters and exits from/to
> critical section in ip_input and ip_output.
> Before de-initialization of function pointers ensure (with busy-wait)
> that mrouter de-initialization is visible to all readers and that we don't
> remove pointers (like ip_mforward etc.) in the middle of packet processing.
This doesn't address the problem that the warning is complaining about,
which is that a non-sleepable lock is held while performing an expensive
operation. Now we are silently spinning instead, and there's no
guarantee of forward progress since threads may be frequently entering
the MROUTER_RLOCK-protected section in ip_input(). And the change
converts a per-CPU atomic operation (in epoch_enter()) to a global
atomic operation, potentially hurting performance. The atomics are also
missing requisite barriers.
I think a better solution is to not hold the raw inpcb lock while
calling the ip_mrouter_done() callback. That lock does not provide
synchronization for anything related to ip_mroute.c. Also note that
X_ip_mrouter_done() can sleep in taskqueue_drain(), which is another
reason to avoid holding the lock there.
> ---
> sys/netinet/ip_mroute.h | 9 +++++----
> sys/netinet/raw_ip.c | 1 +
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/sys/netinet/ip_mroute.h b/sys/netinet/ip_mroute.h
> index 65c5bdd3a025..faf5b8c72a46 100644
> --- a/sys/netinet/ip_mroute.h
> +++ b/sys/netinet/ip_mroute.h
> @@ -365,11 +365,12 @@ extern int (*ip_mrouter_set)(struct socket *, struct sockopt *);
> extern int (*ip_mrouter_get)(struct socket *, struct sockopt *);
> extern int (*ip_mrouter_done)(void);
> extern int (*mrt_ioctl)(u_long, caddr_t, int);
> +extern int ip_mrouter_critical_section_cnt;
>
> -#define MROUTER_RLOCK_TRACKER struct epoch_tracker mrouter_et
> -#define MROUTER_RLOCK() epoch_enter_preempt(net_epoch_preempt, &mrouter_et)
> -#define MROUTER_RUNLOCK() epoch_exit_preempt(net_epoch_preempt, &mrouter_et)
> -#define MROUTER_WAIT() epoch_wait_preempt(net_epoch_preempt)
> +#define MROUTER_RLOCK_TRACKER
Why keep this definition at all?
> +#define MROUTER_RLOCK() atomic_add_int(&ip_mrouter_critical_section_cnt, 1)
> +#define MROUTER_RUNLOCK() atomic_subtract_int(&ip_mrouter_critical_section_cnt, 1)
> +#define MROUTER_WAIT() do {} while (atomic_load_int(&ip_mrouter_critical_section_cnt) != 0)
>
> #endif /* _KERNEL */
>
> diff --git a/sys/netinet/raw_ip.c b/sys/netinet/raw_ip.c
> index 7c495745806e..dc49c36f25ad 100644
> --- a/sys/netinet/raw_ip.c
> +++ b/sys/netinet/raw_ip.c
> @@ -120,6 +120,7 @@ VNET_DEFINE(struct socket *, ip_mrouter);
> int (*ip_mrouter_set)(struct socket *, struct sockopt *);
> int (*ip_mrouter_get)(struct socket *, struct sockopt *);
> int (*ip_mrouter_done)(void);
> +int ip_mrouter_critical_section_cnt;
> int (*ip_mforward)(struct ip *, struct ifnet *, struct mbuf *,
> struct ip_moptions *);
> int (*mrt_ioctl)(u_long, caddr_t, int);