Re: git: e9fc0c538264 - main - if_clone: Make ifnet_detach_sxlock opaque to consumers

From: Zhenlei Huang <zlei_at_FreeBSD.org>
Date: Tue, 14 Apr 2026 17:45:53 UTC

> On Apr 14, 2026, at 11:57 PM, Kristof Provost <kp@FreeBSD.org> wrote:
> 
> On 13 Apr 2026, at 6:39, Zhenlei Huang wrote:
> 
> The branch main has been updated by zlei:
> 
> URL: https://cgit.FreeBSD.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290 <https://cgit.freebsd.org/src/commit/?id=e9fc0c538264355bd3fd9120c650078281c2a290>
> commit e9fc0c538264355bd3fd9120c650078281c2a290
> Author: Zhenlei Huang zlei@FreeBSD.org <mailto:zlei@FreeBSD.org>
> AuthorDate: 2026-04-13 04:38:44 +0000
> Commit: Zhenlei Huang zlei@FreeBSD.org <mailto:zlei@FreeBSD.org>
> CommitDate: 2026-04-13 04:38:44 +0000
> 
> if_clone: Make ifnet_detach_sxlock opaque to consumers
> 
> The change e133271fc1b5e introduced ifnet_detach_sxlock, and change
> 6d2a10d96fb5 widened its coverage, but there are still consumers,
> net80211 and tuntap e.g., want it. Instead of sprinkling it everywhere,
> make it opaque to consumers.
> 
> Out of tree drivers shall also benefit from this change.
> 
> Reviewed by:    kp
> MFC after:      2 weeks
> Differential Revision:  https://reviews.freebsd.org/D56298
> I don’t really understand why, but this commit causes bricoler test runs to fail for me.
> Or more precisely, it looks like parallel test runs see a number of failures in the pf tests.
> 
> Reverting only part of this change, specifically this:
> 
> diff --git a/sys/net/if_clone.c b/sys/net/if_clone.c
> index db3db78c1b76..d0fe54b6146b 100644
> --- a/sys/net/if_clone.c
> +++ b/sys/net/if_clone.c
> @@ -683,11 +683,12 @@ if_clone_detach(struct if_clone *ifc)
>         V_if_cloners_count--;
>         IF_CLONERS_UNLOCK();
> 
> -       sx_xlock(&ifnet_detach_sxlock);
>         /* destroy all interfaces for this cloner */
> -       while (!LIST_EMPTY(&ifc->ifc_iflist))
> +       while (!LIST_EMPTY(&ifc->ifc_iflist)) {
> +               sx_xlock(&ifnet_detach_sxlock);
>                 if_clone_destroyif_flags(ifc, LIST_FIRST(&ifc->ifc_iflist), IFC_F_FORCE);
> -       sx_xunlock(&ifnet_detach_sxlock);
> +               sx_xunlock(&ifnet_detach_sxlock);
> +       }
> 
>         IF_CLONE_REMREF(ifc);
>  }
> Seems to be enough to avoid the problem. So it looks like somehow deleting all interfaces for a given cloner under that lock (rather than releasing it and re-acquiring it) causes the test failures.
> 
> 

That is interesting. I do not expect a re-acquiring will make any differences, given ifc_detach_cloner() and / or if_clone_detach() are basically be invoked by VNET_SYSUNINIT() and  vnet_destroy() has acquired ifnet_detach_sxlock ( I planed to removed the acquiring of it in vnet_destroy() but not yet done ). So I'd expect an assert `sx_assert(&ifnet_detach_sxlock, SA_XLOCKED)` in if_clone_detach() will mostly succeed.

Some drivers, for example wg(4), iterate all VNETs and invoke ifc_detach_cloner() on module unload. That should be the only case that `ifnet_detach_sxlock` is not recursively acquired, and not acquired prior to
the commit.
> The failing tests I see are:
> 
> sys/netpfil/pf/divert-to:in_div  ->  failed: Test case body returned a non-ok exit code, but this is not allowed  [36.591s]
> sys/netpfil/pf/divert-to:in_div_in  ->  failed: atf-check failed; see the output of the test for details  [30.867s]
> sys/netpfil/pf/divert-to:in_div_in_fwd_out_div_out  ->  failed: atf-check failed; see the output of the test for details  [39.705s]
> sys/netpfil/pf/divert-to:in_dn_in_div_in_out_div_out_dn_out  ->  failed: atf-check failed; see the output of the test for details  [45.059s]
> sys/netpfil/pf/divert-to:out_div  ->  failed: Test case body returned a non-ok exit code, but this is not allowed  [23.629s]
> sys/netpfil/pf/killstate:gateway  ->  failed: Killing with a different gateway removed the state.  [32.803s]
> sys/netpfil/pf/killstate:id  ->  failed: Killing a different ID removed the state.  [32.789s]
> sys/netpfil/pf/killstate:label  ->  failed: Killing a non-existing label removed the state.  [47.749s]
> sys/netpfil/pf/killstate:multilabel  ->  failed: Setting new rules removed the state.  [43.570s]
> sys/netpfil/pf/killstate:src_dst  ->  failed: Killing with the wrong destination IP removed our state.  [57.819s]
> sys/netpfil/pf/killstate:v6  ->  failed: Killing with the wrong IP removed our state.  [54.957s]
> sys/netpfil/pf/nat64:pool  ->  failed: atf-check failed; see the output of the test for details  [51.118s]
> sys/netpfil/pf/nat64:sctp_in  ->  failed: Failed to connect to SCTP server  [55.495s]
> sys/netpfil/pf/nat64.py:TestNAT64::test_udp_checksum  ->  failed: /usr/tests/sys/netpfil/pf/nat64.py:205: AttributeError  [80.001s]
> sys/netpfil/pf/pflog:matches  ->  failed: atf-check failed; see the output of the test for details  [61.807s]
> sys/netpfil/pf/pflog:matches_logif  ->  failed: atf-check failed; see the output of the test for details  [51.261s]
> sys/netpfil/pf/pflog:unspecified_v6  ->  failed: atf-check failed; see the output of the test for details  [36.281s]
> sys/netpfil/pf/rdr:srcport_pass  ->  failed: atf-check failed; see the output of the test for details  [76.815s]
> sys/netpfil/pf/sctp:pfsync  ->  failed: Initial SCTP connection failed  [134.866s]
> sys/netpfil/pf/sctp:related_icmp  ->  failed: SCTP connection failed  [59.620s]
> sys/netpfil/pf/table:zero_one  ->  failed: atf-check failed; see the output of the test for details  [64.125s]
> sys/netpfil/pf/tcp:rst  ->  failed: atf-check failed; see the output of the test for details  [80.261s]
> Looking at it now, those runtimes seems suspiciously high as well.
> 
> 

I noticed that. My local test shows most of them finish in seconds. Some require more time to finish but still within about 20 seconds.
I tested as the following,
```
# kyua test -k Kyuafile pf/divert-to
```

Is the failing tests always fail ? Does the kyua report show anything useful ?
> Best regards,
> Kristof
> 

Best regards,
Zhenlei