Re: git: e9fc0c538264 - main - if_clone: Make ifnet_detach_sxlock opaque to consumers
Date: Tue, 14 Apr 2026 17:45:53 UTC
> On Apr 14, 2026, at 11:57 PM, Kristof Provost <kp@FreeBSD.org> wrote:
>
> On 13 Apr 2026, at 6:39, Zhenlei Huang wrote:
>
> The branch main has been updated by zlei:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290 <https://cgit.freebsd.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290>
> commit e9fc0c538264355bd3fd9120c650078281c2a290
> Author: Zhenlei Huang zlei@FreeBSD.org <mailto:zlei@FreeBSD.org>
> AuthorDate: 2026-04-13 04:38:44 +0000
> Commit: Zhenlei Huang zlei@FreeBSD.org <mailto:zlei@FreeBSD.org>
> CommitDate: 2026-04-13 04:38:44 +0000
>
> if_clone: Make ifnet_detach_sxlock opaque to consumers
>
> The change e133271fc1b5e introduced ifnet_detach_sxlock, and change
> 6d2a10d96fb5 widened its coverage, but there are still consumers,
> net80211 and tuntap e.g., want it. Instead of sprinkling it everywhere,
> make it opaque to consumers.
>
> Out of tree drivers shall also benefit from this change.
>
> Reviewed by: kp
> MFC after: 2 weeks
> Differential Revision: https://reviews.freebsd.org/D56298
> I don’t really understand why, but this commit causes bricoler test runs to fail for me.
> Or more precisely, it looks like parallel test runs see a number of failures in the pf tests.
>
> Reverting only part of this change, specifically this:
>
> diff --git a/sys/net/if_clone.c b/sys/net/if_clone.c
> index db3db78c1b76..d0fe54b6146b 100644
> --- a/sys/net/if_clone.c
> +++ b/sys/net/if_clone.c
> @@ -683,11 +683,12 @@ if_clone_detach(struct if_clone *ifc)
> V_if_cloners_count--;
> IF_CLONERS_UNLOCK();
>
> - sx_xlock(&ifnet_detach_sxlock);
> /* destroy all interfaces for this cloner */
> - while (!LIST_EMPTY(&ifc->ifc_iflist))
> + while (!LIST_EMPTY(&ifc->ifc_iflist)) {
> + sx_xlock(&ifnet_detach_sxlock);
> if_clone_destroyif_flags(ifc, LIST_FIRST(&ifc->ifc_iflist), IFC_F_FORCE);
> - sx_xunlock(&ifnet_detach_sxlock);
> + sx_xunlock(&ifnet_detach_sxlock);
> + }
>
> IF_CLONE_REMREF(ifc);
> }
> Seems to be enough to avoid the problem. So it looks like somehow deleting all interfaces for a given cloner under that lock (rather than releasing it and re-acquiring it) causes the test failures.
>
>
That is interesting. I do not expect a re-acquiring will make any differences, given ifc_detach_cloner() and / or if_clone_detach() are basically be invoked by VNET_SYSUNINIT() and vnet_destroy() has acquired ifnet_detach_sxlock ( I planed to removed the acquiring of it in vnet_destroy() but not yet done ). So I'd expect an assert `sx_assert(&ifnet_detach_sxlock, SA_XLOCKED)` in if_clone_detach() will mostly succeed.
Some drivers, for example wg(4), iterate all VNETs and invoke ifc_detach_cloner() on module unload. That should be the only case that `ifnet_detach_sxlock` is not recursively acquired, and not acquired prior to
the commit.
> The failing tests I see are:
>
> sys/netpfil/pf/divert-to:in_div -> failed: Test case body returned a non-ok exit code, but this is not allowed [36.591s]
> sys/netpfil/pf/divert-to:in_div_in -> failed: atf-check failed; see the output of the test for details [30.867s]
> sys/netpfil/pf/divert-to:in_div_in_fwd_out_div_out -> failed: atf-check failed; see the output of the test for details [39.705s]
> sys/netpfil/pf/divert-to:in_dn_in_div_in_out_div_out_dn_out -> failed: atf-check failed; see the output of the test for details [45.059s]
> sys/netpfil/pf/divert-to:out_div -> failed: Test case body returned a non-ok exit code, but this is not allowed [23.629s]
> sys/netpfil/pf/killstate:gateway -> failed: Killing with a different gateway removed the state. [32.803s]
> sys/netpfil/pf/killstate:id -> failed: Killing a different ID removed the state. [32.789s]
> sys/netpfil/pf/killstate:label -> failed: Killing a non-existing label removed the state. [47.749s]
> sys/netpfil/pf/killstate:multilabel -> failed: Setting new rules removed the state. [43.570s]
> sys/netpfil/pf/killstate:src_dst -> failed: Killing with the wrong destination IP removed our state. [57.819s]
> sys/netpfil/pf/killstate:v6 -> failed: Killing with the wrong IP removed our state. [54.957s]
> sys/netpfil/pf/nat64:pool -> failed: atf-check failed; see the output of the test for details [51.118s]
> sys/netpfil/pf/nat64:sctp_in -> failed: Failed to connect to SCTP server [55.495s]
> sys/netpfil/pf/nat64.py:TestNAT64::test_udp_checksum -> failed: /usr/tests/sys/netpfil/pf/nat64.py:205: AttributeError [80.001s]
> sys/netpfil/pf/pflog:matches -> failed: atf-check failed; see the output of the test for details [61.807s]
> sys/netpfil/pf/pflog:matches_logif -> failed: atf-check failed; see the output of the test for details [51.261s]
> sys/netpfil/pf/pflog:unspecified_v6 -> failed: atf-check failed; see the output of the test for details [36.281s]
> sys/netpfil/pf/rdr:srcport_pass -> failed: atf-check failed; see the output of the test for details [76.815s]
> sys/netpfil/pf/sctp:pfsync -> failed: Initial SCTP connection failed [134.866s]
> sys/netpfil/pf/sctp:related_icmp -> failed: SCTP connection failed [59.620s]
> sys/netpfil/pf/table:zero_one -> failed: atf-check failed; see the output of the test for details [64.125s]
> sys/netpfil/pf/tcp:rst -> failed: atf-check failed; see the output of the test for details [80.261s]
> Looking at it now, those runtimes seems suspiciously high as well.
>
>
I noticed that. My local test shows most of them finish in seconds. Some require more time to finish but still within about 20 seconds.
I tested as the following,
```
# kyua test -k Kyuafile pf/divert-to
```
Is the failing tests always fail ? Does the kyua report show anything useful ?
> Best regards,
> Kristof
>
Best regards,
Zhenlei