Re: epair and vnet jail loose connection.

From: Johan Hendriks <joh.hendriks_at_gmail.com>
Date: Thu, 10 Mar 2022 00:40:39 UTC
I remembered that it used to work, so I thought, let's go back in time.
So I did a git reset --hard 375fdb6e161ea78a957314efeecd5ee0654a2793, which
is a commit from January 1, 2022.

[root]@[jhost001] -
[ ~ ] > uname -a
FreeBSD jhost001 13.0-STABLE FreeBSD 13.0-STABLE #0
stable/13-n248793-375fdb6e161: Thu Mar 10 00:11:19 CET 2022
root@jhost001:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

With this version I do not see the jails go down, so it is caused by something
that was committed after 01-01-2022.
I will try rebuilding at a few later commits to see when it breaks.
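
Rebuilding at hand-picked commits can be narrowed down with git bisect. What follows is only a minimal sketch, not a tested procedure for this machine: the source path /usr/src, the use of stable/13 as the bad end, and the helper names bisect_begin, bisect_mark and bisect_end are assumptions for illustration. Only the good commit hash comes from this mail.

```shell
#!/bin/sh
# Sketch only: bisect between the known-good commit and a known-bad one.
# SRC and BAD are assumptions; adjust them to the local setup.
SRC="${SRC:-/usr/src}"                                    # assumed source tree
GOOD="${GOOD:-375fdb6e161ea78a957314efeecd5ee0654a2793}"  # works (from this mail)
BAD="${BAD:-stable/13}"                                   # assumed to show the problem

# Start the bisect; git checks out a commit roughly halfway in between.
bisect_begin() {
    git -C "$SRC" bisect start "$BAD" "$GOOD"
}

# After building, booting and load testing the checked-out commit,
# record the result. Usage: bisect_mark good | bisect_mark bad
bisect_mark() {
    case "$1" in
        good|bad) git -C "$SRC" bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad" >&2; return 2 ;;
    esac
}

# Leave bisect mode and return to what was checked out before.
bisect_end() {
    git -C "$SRC" bisect reset
}
```

The idea is to call bisect_begin once, then for every commit git selects: build and boot that kernel, run the load test against the haproxy jail, and call bisect_mark good or bisect_mark bad. When git reports the first bad commit, bisect_end restores the original checkout.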



On Wed, 9 Mar 2022 at 20:55, Johan Hendriks <joh.hendriks@gmail.com> wrote:

> The problem:
> I have a FreeBSD 14 machine and a FreeBSD 13-STABLE machine, both running the same jails, just to test how things work.
>
> The jails that are running are a salt master, a haproxy jail, 2 webservers, 2 varnish servers, and 2 PHP jails, one with PHP 8.0 and one with 8.1. All the jails are connected to bridge0 and all the jails use vnet.
>
> I believe this worked on an older 14-HEAD machine, but I did not do a lot with it back then. When I started testing again after updating the OS, I noticed that one of the varnish jails lost its network connection after running for a few hours. I thought it was just something on HEAD, so I never really looked at it. But later on, when I started using the jails again and was testing a test WordPress site, I noticed that with a simple load test my haproxy jail loses its network connection within one minute. I see nothing in the logs, neither on the host nor in the jail.
> From the jail I cannot ping the other jails or the IP address of the bridge. I can, however, ping the jail's own IP address. From the host I also cannot ping the haproxy jail's IP address. If I start a tcpdump on the epaira interface of the haproxy jail, I do see the packets arrive, but they do not show up in the jail.
>
> I used ZFS to send all the jails to a 13-STABLE machine and copied over the jail.conf file as well as the pf.conf file, and I saw the same behavior.
>
> Then I tried 13.0-RELEASE-p7, and on that machine I do not see this happening. There I can stress test the machine for 10 minutes without a problem, but on 14-HEAD and 13-STABLE the jail's network connection fails within a minute. Only a restart of the jail brings it back online, and it exhibits the same behavior again as soon as I start a simple load test, which it should handle nicely.
>
> One of the jail hosts is running under VMware and the other is running under Ubuntu with KVM. The 13.0-RELEASE-p7 jail host is also running under Ubuntu with KVM.
>
> Thank you for your time.
> regards
> Johan
>
>