Re: epair and vnet jail loose connection.

From: Johan Hendriks <joh.hendriks_at_gmail.com>
Date: Sun, 13 Mar 2022 10:26:38 UTC
Op zo 13 mrt. 2022 01:17 schreef Michael Gmelin <grembo@freebsd.org>:

> I also gave it another go (this time with multiple CPUs assigned to the
> vm), still works just fine - so I think we would need more details about
> the setup.
>
> Would it make sense to share our test setups, so Johan can try to
> reproduce with them?
>
> -m
>
> On 13. Mar 2022, at 00:48, Kristof Provost <kp@freebsd.org> wrote:
>
> 
>
> I’m still failing to reproduce.
>
> Is pf absolutely required to trigger the issue? Is haproxy (i.e. can you
> trigger it with iperf)?
> Is the bridge strictly required?
>
> Kristof
>
> On 12 Mar 2022, at 8:18, Johan Hendriks wrote:
>
> For me this minimal setup let me see the drop off of the network from the
> haproxy server.
>
> 2 jails, one with haproxy, one with nginx which is using the following
> html file to be served.
>
> <!DOCTYPE html>
> <html>
> <head>
> <title>Page Title</title>
> </head>
> <body>
>
> <h1>My First Heading</h1>
> <p>My first paragraph.</p>
>
> </body>
> </html>
>
> From a remote machine i do a  hey -h2 -n 10 -c 10 -z 300s
> https://wp.test.nl
> Then a ping on the jailhost to the haproxy shows the following
>
> [ /] > ping 10.233.185.20
> PING 10.233.185.20 (10.233.185.20): 56 data bytes
> 64 bytes from 10.233.185.20: icmp_seq=0 ttl=64 time=0.054 ms
> 64 bytes from 10.233.185.20: icmp_seq=1 ttl=64 time=0.050 ms
> 64 bytes from 10.233.185.20: icmp_seq=2 ttl=64 time=0.041 ms
> <SNIP>
> 64 bytes from 10.233.185.20: icmp_seq=169 ttl=64 time=0.050 ms
> 64 bytes from 10.233.185.20: icmp_seq=170 ttl=64 time=0.154 ms
> 64 bytes from 10.233.185.20: icmp_seq=171 ttl=64 time=0.054 ms
> 64 bytes from 10.233.185.20: icmp_seq=172 ttl=64 time=0.039 ms
> 64 bytes from 10.233.185.20: icmp_seq=173 ttl=64 time=0.160 ms
> 64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms
> ^C
> --- 10.233.185.20 ping statistics ---
> 335 packets transmitted, 175 packets received, 47.8% packet loss
> round-trip min/avg/max/stddev = 0.037/0.070/0.251/0.040 ms
>
>
> ifconfig
> vtnet0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric
> 0 mtu 1500
>
> options=4c00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
> ether 56:16:e9:80:5e:41
> inet 87.233.191.146 netmask 0xfffffff0 broadcast 87.233.191.159
> inet 87.233.191.156 netmask 0xffffffff broadcast 87.233.191.156
> inet 87.233.191.155 netmask 0xffffffff broadcast 87.233.191.155
> inet 87.233.191.154 netmask 0xffffffff broadcast 87.233.191.154
> media: Ethernet autoselect (10Gbase-T <full-duplex>)
> status: active
> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> vtnet1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu
> 1500
>
> options=4c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
> ether 56:16:2c:64:32:35
> media: Ethernet autoselect (10Gbase-T <full-duplex>)
> status: active
> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
> options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
> inet6 ::1 prefixlen 128
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
> inet 127.0.0.1 netmask 0xff000000
> groups: lo
> nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu
> 1500
> ether 58:9c:fc:10:ff:82
> inet 10.233.185.1 netmask 0xffffff00 broadcast 10.233.185.255
> id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
> maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
> root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
> member: epair20a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>        ifmaxaddr 0 port 7 priority 128 path cost 2000
> member: epair18a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>        ifmaxaddr 0 port 15 priority 128 path cost 2000
> groups: bridge
> nd6 options=9<PERFORMNUD,IFDISABLED>
> bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu
> 1500
> ether 58:9c:fc:10:d9:1a
> id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
> maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
> root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
> member: vtnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>        ifmaxaddr 0 port 1 priority 128 path cost 2000
> groups: bridge
> nd6 options=9<PERFORMNUD,IFDISABLED>
> pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33160
> groups: pflog
> epair18a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
> metric 0 mtu 1500
> description: jail_web01
> options=8<VLAN_MTU>
> ether 02:77:ea:19:c7:0a
> groups: epair
> media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
> status: active
> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> epair20a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
> metric 0 mtu 1500
> description: jail_haproxy
> options=8<VLAN_MTU>
> ether 02:9b:93:8c:59:0a
> groups: epair
> media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
> status: active
> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>
> jail.conf
>
> # Global settings applied to all jails.
> $domain = "test.nl";
>
> exec.start = "/bin/sh /etc/rc";
> exec.stop = "/bin/sh /etc/rc.shutdown";
> exec.clean;
>
> mount.fstab = "/storage/jails/$name.fstab";
>
> exec.system_user  = "root";
> exec.jail_user    = "root";
> mount.devfs;
> sysvshm="new";
> sysvsem="new";
> allow.raw_sockets;
> allow.set_hostname = 0;
> allow.sysvipc;
> enforce_statfs = "2";
> devfs_ruleset     = "11";
>
> path = "/storage/jails/${name}";
> host.hostname = "${name}.${domain}";
>
>
> # Networking
> vnet;
> vnet.interface    = "vnet0";
>
>   # Commands to run on host before jail is created
>   exec.prestart  = "ifconfig epair${ip} create up description
> jail_${name}";
>   exec.prestart  += "ifconfig epair${ip}a up";
>   exec.prestart  += "ifconfig bridge0 addm epair${ip}a up";
>   exec.created   = "ifconfig epair${ip}b name vnet0";
>
>   # Commands to run in jail after it is created
>   exec.start  += "/bin/sh /etc/rc";
>
>   # commands to run in jail when jail is stopped
>   exec.stop  = "/bin/sh /etc/rc.shutdown";
>
>   # Commands to run on host when jail is stopped
>   exec.poststop  = "ifconfig bridge0 deletem epair${ip}a";
>   exec.poststop  += "ifconfig epair${ip}a destroy";
>   persist;
>
> web01 {
>     $ip = 18;
> }
>
> haproxy {
>     $ip = 20;
>     mount.fstab = "";
>     path = "/storage/jails/${name}";
> }
>
> pf.conf
>
> #######################################################################
> ext_if="vtnet0"
> table <bruteforcers> persist
> table <torlist> persist
> table <ssh-trusted> persist file "/usr/local/etc/pf/ssh-trusted"
> table <custom-block> persist file "/usr/local/etc/pf/custom-block"
> table <jailnetworks> { 10.233.185.0/24, 192.168.10.0/24 }
>
> icmp_types = "echoreq"
> junk_ports="{ 135,137,138,139,445,68,67,3222,17500 }"
>
> # Log interface
> set loginterface $ext_if
>
> # Set limits
> set limit { states 40000, frags 20000, src-nodes 20000 }
>
> scrub on $ext_if all fragment reassemble no-df random-id
>
> # ---- Nat jails to the web
> binat on $ext_if from 10.233.185.15/32 to !10.233.185.0/24 ->
> 87.233.191.156/32 # saltmaste
> binat on $ext_if from 10.233.185.20/32 to !10.233.185.0/24 ->
> 87.233.191.155/32 # haproxy
> binat on $ext_if from 10.233.185.22/32 to !10.233.185.0/24 ->
> 87.233.191.154/32 # web-comb
>
> nat on $ext_if from <jailnetworks> to any -> ($ext_if:0)
>
> # ---- First rule obligatory "Pass all on loopback"
> pass quick on lo0 all
> pass quick on bridge0 all
> pass quick on bridge1 all
>
> # ---- Block TOR exit addresses
> block quick proto { tcp, udp } from <torlist> to $ext_if
>
> # ---- Second rule "Block all in and pass all out"
> block in log all
> pass out all keep state
>
> # IPv6 pass in/out all IPv6 ICMP traffic
> pass in quick proto icmp6 all
>
> # Pass all lo0
> set skip on lo0
>
> ############### FIREWALL ###############################################
> # ---- Block custom ip's and logs
> block quick proto { tcp, udp } from <custom-block> to $ext_if
>
> # ---- Jail poorten
> pass in quick on { $ext_if } proto tcp from any to 10.233.185.22 port {
> smtp 80 443 993 995 1956 } keep state
> pass in quick on { $ext_if } proto tcp from any to 10.233.185.20 port {
> smtp 80 443 993 995 1956 } keep state
> pass in quick on { $ext_if } proto tcp from any to 10.233.185.15 port {
> 4505 4506 } keep state
>
> # ---- Allow ICMP
> pass in inet proto icmp all icmp-type $icmp_types keep state
> pass out inet proto icmp all icmp-type $icmp_types keep state
>
> pass in quick on $ext_if inet proto tcp from any to $ext_if port { 80, 443
> } flags S/SA keep state
> pass in quick on $ext_if inet proto tcp from <ssh-trusted> to $ext_if port
> { 4505 4506 } flags S/SA keep state
> block log quick from <bruteforcers>
> pass quick proto tcp from <ssh-trusted> to $ext_if port ssh flags S/SA
> keep state
>
> This is as minimal i can get it.
>
> Hope this helps.
> regards,
> Johan Hendriks
>
>
> Op za 12 mrt. 2022 om 02:10 schreef Kristof Provost <kp@freebsd.org>:
>
>> On 11 Mar 2022, at 18:55, Michael Gmelin wrote:
>> >> On 12. Mar 2022, at 01:21, Kristof Provost <kp@freebsd.org> wrote:
>> >>
>> >> On 11 Mar 2022, at 17:44, Johan Hendriks wrote:
>> >>>> On 09/03/2022 20:55, Johan Hendriks wrote:
>> >>>> The problem:
>> >>>> I have a FreeBSD 14 machine and a FreeBSD 13-stable machine, both
>> running the same jails just to test the workings.
>> >>>>
>> >>>> The jails that are running are a salt master, a haproxy  jail, 2
>> webservers, 2 varnish servers, 2 php jails one for php8.0 and one with 8.1.
>> All the jails are connected to bridge0 and all the jails use vnet.
>> >>>>
>> >>>> I believe this worked on an older 14-HEAD machine, but i did not do
>> a lot with it back then, and when i started testing again and after
>> updating the OS i noticed that one of the varnish jails lost it's network
>> connection after running for a few hours. I thought it was just something
>> on HEAD so never really looked at it. But later on when i start using the
>> jails again and testing a test wordpress site i noticed that with a simple
>> load test my haproxy jail within one minute looses it's network connection.
>> I see nothing in the logs, on the host and on the jail.
>> >>>> From the jail i can not ping the other jails or the IP adres of the
>> bridge. I can however ping the jails own IP adres. From the host i can also
>> not ping the haproxy jail IP adres. If i start a tcpdump on the epaira
>> interface from the haproxy jail i do see the packets arrive but not in the
>> jail.
>> >>>>
>> >>>> I used ZFS to send all the jails to a 13-STABLE machine and copied
>> over the jail.conf file as well as the pf.conf file and i saw the same
>> behavior.
>> >>>>
>> >>>> Then i tried to use 13.0-RELEASE-p7 and on that machine i do not see
>> this happening. There i can stress test the machine for 10 minutes without
>> a problem but on 14-HEAD and 13-STABLE within a minute the jail's network
>> connection fails and only a restart of the jail brings it back online to
>> exhibit the same behavior if i start a simple load test which it should
>> handle nicely.
>> >>>>
>> >>>> One of the jail hosts is running under VMWARE and the other is
>> running under Ubuntu with KVM. The 13.0-RELEASE-p7 jail host is running
>> under Ubuntu with KVM
>> >>>>
>> >>>> Thank you for your time.
>> >>>> regards
>> >>>> Johan
>> >>>>
>> >>> I did some bisecting and the latest commit that works on FreeBSD
>> 13-Stable is 009a56b2e
>> >>> Then the commit 2e0bee4c7  if_epair: implement fanout and above is
>> showing the symptoms described above.
>> >>>
>> >> Interestingly I cannot reproduce stalls in simple epair setups.
>> >> It would be useful if you could reduce the setup with the problem into
>> a minimal configuration so we can figure out what other factors are
>> involved.
>> >
>> > If there are clear instructions on how to reproduce, I’m happy to help
>> experimenting (I’m relying heavily on epair at this point).
>> >
>> > @Kristof: Did you try on bare metal or on vms?
>> >
>> Both.
>>
>> Kristof
>>
> I also did do a new install, this time based on 13.1-PRERELEASE.
Copyd my haproxy en web01 jail to this machine and have the same problem.

Could it be a sysctl i use? or boot/loader.conf setting.

this is my /boot/loader.conf
# -- sysinstall generated deltas -- #

autoboot_delay="2"  #optional

cryptodev_load="YES"

vbe_max_resolution=1024x768

# disable hyperthreading
machdep.hyperthreading_allowed=0

# filemon
filemon_load="YES"

# use gpt ids instead of gptids or disks idents
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gpt.enable="1"
kern.geom.label.gptid.enable="0"

# ZFS
zfs_load="YES"

My /etc/sysctl.conf

# $FreeBSD$
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#
kern.timecounter.hardware=HPET
# accept queue
kern.ipc.soacceptqueue=4096

# PF vnet jail
net.link.bridge.pfil_member=0
net.link.bridge.pfil_bridge=0
net.inet.ip.forwarding=1                   # (default 0)
net.inet.tcp.tso=0  # (default 1)
vfs.zfs.min_auto_ashift=12

I f you want i can give you full root access on this machine.

I do use a machine outside of the host machine to do the hey command. The
host file points to the alias which is binat for the haproxy jail.

Thank you all for your time on this!

regards
Johan Hendriks