git: 1f628be888b7 - main - tcp_ratelimit: provide an api for drivers to release ratesets at detach
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Mon, 05 Aug 2024 16:52:07 UTC
The branch main has been updated by gallatin: URL: https://cgit.FreeBSD.org/src/commit/?id=1f628be888b74f1219b3ea7ccea1e7a3d1db77a2 commit 1f628be888b74f1219b3ea7ccea1e7a3d1db77a2 Author: Andrew Gallatin <gallatin@FreeBSD.org> AuthorDate: 2024-08-05 15:45:42 +0000 Commit: Andrew Gallatin <gallatin@FreeBSD.org> CommitDate: 2024-08-05 16:51:35 +0000 tcp_ratelimit: provide an api for drivers to release ratesets at detach When the kernel is compiled with options RATELIMIT, the mlx5en driver cannot detach. It gets stuck waiting for all kernel users of its rates to drop to zero before finally calling ether_ifdetach. The tcp ratelimit code has an eventhandler for ifnet departure which causes rates to be released. However, this is called as an ifnet departure eventhandler, which is invoked as part of ifdetach(), via either_ifdetach(). This means that the tcp ratelimit code holds down many hw rates when the mlx5en driver is waiting for the rate count to go to 0. Thus devctl detach will deadlock on mlx5 with this stack: mi_switch+0xcf sleepq_timedwait+0x2f _sleep+0x1a3 pause_sbt+0x77 mlx5e_destroy_ifp+0xaf mlx5_remove_device+0xa7 mlx5_unregister_device+0x78 mlx5_unload_one+0x10a remove_one+0x1e linux_pci_detach_device+0x36 linux_pci_detach+0x24 device_detach+0x180 devctl2_ioctl+0x3dc devfs_ioctl+0xbb vn_ioctl+0xca devfs_ioctl_f+0x1e kern_ioctl+0x1c3 sys_ioctl+0x10a To fix this, provide an explicit API for a driver to call the tcp ratelimit code telling it to detach itself from an ifnet. This allows the mlx5 driver to unload cleanly. I considered adding an ifnet pre-departure eventhandler. However, that would need to be invoked by the driver, so a simple function call seemed better. The mlx5en driver has been updated to call this function. Reviewed by: kib, rrs Differential Revision: https://reviews.freebsd.org/D46221 Sponsored by: Netflix --- sys/dev/mlx5/mlx5_en/mlx5_en_main.c | 8 +++++++- sys/netinet/tcp_ratelimit.c | 6 ++++++ sys/netinet/tcp_ratelimit.h | 9 +++++++++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/sys/dev/mlx5/mlx5_en/mlx5_en_main.c b/sys/dev/mlx5/mlx5_en/mlx5_en_main.c index ccbdf11a1dd5..a80235f0f347 100644 --- a/sys/dev/mlx5/mlx5_en/mlx5_en_main.c +++ b/sys/dev/mlx5/mlx5_en/mlx5_en_main.c @@ -36,6 +36,7 @@ #include <machine/atomic.h> #include <net/debugnet.h> +#include <netinet/tcp_ratelimit.h> #include <netipsec/keydb.h> #include <netipsec/ipsec_offload.h> @@ -4876,7 +4877,12 @@ mlx5e_destroy_ifp(struct mlx5_core_dev *mdev, void *vpriv) #ifdef RATELIMIT /* - * The kernel can have reference(s) via the m_snd_tag's into + * Tell the TCP ratelimit code to release the rate-sets attached + * to our ifnet. + */ + tcp_rl_release_ifnet(ifp); + /* + * The kernel can still have reference(s) via the m_snd_tag's into * the ratelimit channels, and these must go away before * detaching: */ diff --git a/sys/netinet/tcp_ratelimit.c b/sys/netinet/tcp_ratelimit.c index 1834c702c493..22bdf707fa89 100644 --- a/sys/netinet/tcp_ratelimit.c +++ b/sys/netinet/tcp_ratelimit.c @@ -1298,6 +1298,12 @@ tcp_rl_ifnet_departure(void *arg __unused, struct ifnet *ifp) NET_EPOCH_EXIT(et); } +void +tcp_rl_release_ifnet(struct ifnet *ifp) +{ + tcp_rl_ifnet_departure(NULL, ifp); +} + static void tcp_rl_shutdown(void *arg __unused, int howto __unused) { diff --git a/sys/netinet/tcp_ratelimit.h b/sys/netinet/tcp_ratelimit.h index cd540d1164e1..0ce42dea0d90 100644 --- a/sys/netinet/tcp_ratelimit.h +++ b/sys/netinet/tcp_ratelimit.h @@ -94,6 +94,8 @@ CK_LIST_HEAD(head_tcp_rate_set, tcp_rate_set); #ifndef ETHERNET_SEGMENT_SIZE #define ETHERNET_SEGMENT_SIZE 1514 #endif +struct tcpcb; + #ifdef RATELIMIT #define DETAILED_RATELIMIT_SYSCTL 1 /* * Undefine this if you don't want @@ -131,6 +133,9 @@ tcp_get_pacing_burst_size_w_divisor(struct tcpcb *tp, uint64_t bw, uint32_t segs void tcp_rl_log_enobuf(const struct tcp_hwrate_limit_table *rte); +void +tcp_rl_release_ifnet(struct ifnet *ifp); + #else static inline const struct tcp_hwrate_limit_table * tcp_set_pacing_rate(struct tcpcb *tp, struct ifnet *ifp, @@ -218,6 +223,10 @@ tcp_rl_log_enobuf(const struct tcp_hwrate_limit_table *rte) { } +static inline void +tcp_rl_release_ifnet(struct ifnet *ifp) +{ +} #endif /*