Re: What's going on with vnets and epairs w/ addresses?

From: Zhenlei Huang <zlei_at_FreeBSD.org>
Date: Sat, 17 Dec 2022 13:36:10 UTC
> On Dec 17, 2022, at 6:55 AM, Bjoern A. Zeeb <bz@FreeBSD.org> wrote:
> 
> On Fri, 16 Dec 2022, Zhenlei Huang wrote:
> 
> Hi,
> 
>> I managed to reproduce this issue on CURRENT/14 with this small snippet:
>> 
>> -------------------------------------------
>> #!/bin/sh
>> 
>> # test jail name
>> n="test_ref_leak"
>> 
>> jail -c name=$n path=/ vnet persist
>> # The following line triggers the jail pr_ref leak
>> jexec $n ifconfig lo0 inet 127.0.0.1/8
>> 
>> jail -R $n
>> 
>> # wait a moment
>> sleep 1
>> 
>> jls -j $n
>> 
>> 
>> -------------------------------------------
>> 
>> 
>> After debugging and tracing with DDB, it seems that this is triggered by a combination of [1] and [2].
>> 
>> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b
>> 
>> 
>> In [1], the per-VNET pcbinfo no longer has its own UMA zone; it shares the global one:
>> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>> 
>> In [2], dropping the reference on `inp->inp_cred` is deferred: it now happens in inpcb_dtor(), which is invoked via uma_zfree_smr().
>> 
>> Unfortunately, inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not called immediately,
>> so the `inp->inp_cred` reference, and with it `prison->pr_ref`, is leaked.
>> 
>> It is also not possible to free up the cache from the per-VNET SYSUNINITs tcp_destroy / udp_destroy / rip_destroy.
> 
> Thanks a lot for tracking it down.
> 
> That seems to be a regression, then, which needs to be fixed before
> 14.0-RELEASE, as it will break people's management utilities.
> 
> Could you open a bug report and flag it as such?

While I was trying to open a new bug report, Bugzilla pointed me to an existing PR, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264981,
opened by olivier. I think this issue is the same as that one.

> 
> /bz
> 
> 
>> 
>> 
>> Best regards,
>> Zhenlei
>> 
>>> On Dec 14, 2022, at 9:56 AM, Zhenlei Huang <zlei@FreeBSD.org> wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> I also encountered this problem while testing a gif tunnel between jails.
>>> 
>>> My script is similar but with additional gif tunnels.
>>> 
>>> 
>>> There are reports on the mailing lists [1], [2], and another one on the forum [3].
>>> 
>>> This seems to be a long-standing issue.
>>> 
>>> [1] https://lists.freebsd.org/pipermail/freebsd-stable/2016-October/086126.html
>>> [2] https://lists.freebsd.org/pipermail/freebsd-jail/2017-March/003357.html
>>> [3] https://forums.freebsd.org/threads/jails-stopping-prolonged-deaths-starting-networking-et-cetera.84200/
>>> 
>>> 
>>> Best regards,
>>> Zhenlei
>>> 
>>>> On Dec 14, 2022, at 7:03 AM, Bjoern A. Zeeb <bz@FreeBSD.org> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I have used scripts like the below for almost a decade and a half
>>>> (obviously doing more than that in the middle).  I haven't used them
>>>> much lately but given other questions I just wanted to fire up a test.
>>>> 
>>>> I have an end-of-November kernel; when doing the below, my epairs do not
>>>> come back to be destroyed (immediately).
>>>> I have to start polling for the jid to be no longer alive and not in
>>>> dying state (hence added the jls/ifconfig -l lines and removed the
>>>> error checking from ifconfig destroy).  That seems sometimes rather
>>>> unreasonably long (to the point I give up).
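
The polling described above could be sketched roughly as follows. This is
only an illustration, not part of the original script: the function name
wait_jail_gone and the JAIL_PROBE override are hypothetical, and it assumes
that `jls -d -j <jail>` exits non-zero once the jail is neither alive nor
dying.

```shell
#!/bin/sh

# Probe used to decide whether a jail still exists (alive or dying).
# Assumption: "jls -d -j JAIL" exits non-zero once the jail is fully gone.
# JAIL_PROBE is a hypothetical override hook so that the loop can be
# exercised with a stub instead of a real jail.
: "${JAIL_PROBE:=jls -d -j}"

# wait_jail_gone JAIL [TRIES] [DELAY]
# Returns 0 as soon as the probe fails (jail gone), or 1 if the jail is
# still visible after TRIES probes, sleeping DELAY seconds between them.
wait_jail_gone() {
    name=$1
    tries=${2:-30}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Left unquoted on purpose so "jls -d -j" splits into words.
        if ! ${JAIL_PROBE} "$name" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Called as `wait_jail_gone ${js} 60 1` before `ifconfig ${ep}a destroy`, it
would give up after about a minute instead of spinning forever, which at
least makes the "unreasonably long" case visible in a script.
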
>>>> 
>>>> If I don't configure the addresses below this isn't a problem.
>>>> 
>>>> Sorry, I am confused by too many incarnations of the code; I know I once
>>>> had a version with an async shutdown path but I believe that never made
>>>> it into mainline, so why are we now holding onto the epairs instead of
>>>> nuking the addresses, returning the epairs, and being clean?
>>>> 
>>>> It's a bit more funny; I added a twiddle loop at the end and nothing
>>>> happened.  So I stop the script and start it again and suddenly another
>>>> jail or two have cleaned up and their epairs are back.  Something feels
>>>> very very wonky.  Play around with this and see ... and let me know if
>>>> you can reproduce this...  I quite wonder why some test cases haven't
>>>> gone crazy ...
>>>> 
>>>> /bz
>>>> 
>>>> ------------------------------------------------------------------------
>>>> #!/bin/sh
>>>> 
>>>> set -e
>>>> set -x
>>>> 
>>>> js=`jail -i -c -n jl host.hostname=left.example.net vnet persist`
>>>> jb=`jail -i -c -n jr host.hostname=right.example.net vnet persist`
>>>> 
>>>> # Create an epair connecting the two machines (vnet jails).
>>>> ep=`ifconfig epair create | sed -e 's/a$//'`
>>>> 
>>>> # Add one end to each vnet jail.
>>>> ifconfig ${ep}a vnet ${js}
>>>> ifconfig ${ep}b vnet ${jb}
>>>> 
>>>> # Add an IP address on the epairs in each vnet jail.
>>>> # XXX Leave these out and the cleanup seems to work fine.
>>>> jexec ${js}  ifconfig ${ep}a inet  192.0.2.1/24
>>>> jexec ${jb}  ifconfig ${ep}b inet  192.0.2.2/24
>>>> 
>>>> # Clean up.
>>>> jail -r ${jb}
>>>> jail -r ${js}
>>>> 
>>>> # You want to be able to remove this line ...
>>>> set +e
>>>> 
>>>> # No epairs to destroy with addresses configured; fine otherwise.
>>>> ifconfig ${ep}a destroy
>>>> # echo $?
>>>> 
>>>> # This is here only as things are funny ...
>>>> # jls -av jid dying
>>>> # ifconfig -l
>>>> 
>>>> # end
>>>> ------------------------------------------------------------------------
>>>> 
>>>> --
>>>> Bjoern A. Zeeb                                                     r15:7
>>>> 
>>> 
>> 
>> 
> 
> -- 
> Bjoern A. Zeeb                                                     r15:7