From: Johan Hendriks
Date: Sun, 13 Mar 2022 11:26:38 +0100
Subject: Re: epair and vnet jail loose connection.
To: Michael Gmelin
Cc: Kristof Provost, freeBSD-net, Patrick M. Hausen
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net


On Sun, 13 Mar 2022 at 01:17, Michael Gmelin <grembo@freebsd.org> wrote:
I also gave it another go (this time with multiple CPUs assigned to the VM), and it still works just fine, so I think we would need more details about the setup.

Would it make sense to share our test setups, so Johan can try to reproduce with them?

-m

On 13 Mar 2022, at 00:48, Kristof Provost <kp@freebsd.org> wrote:

I’m still failing to reproduce.

Is pf absolutely required to trigger the issue? Is haproxy required (i.e. can you trigger it with iperf)?
Is the bridge strictly required?

Kristof

On 12 Mar 2022, at 8:18, Johan Hendriks wrote:

For me, this minimal setup is enough to see the haproxy server drop off the network.

Two jails: one with haproxy, and one with nginx serving the following HTML file.

<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>

From a remote machine I run hey -h2 -n 10 -c 10 -z 300s https://wp.test.nl
Then a ping from the jail host to the haproxy jail shows the following:

[ /] > ping 10.233.185.20
PING 10.233.185.20 (10.233.185.20): 56 data bytes
64 bytes from 10.233.185.20: icmp_seq=0 ttl=64 time=0.054 ms
64 bytes from 10.233.185.20: icmp_seq=1 ttl=64 time=0.050 ms
64 bytes from 10.233.185.20: icmp_seq=2 ttl=64 time=0.041 ms
<SNIP>
64 bytes from 10.233.185.20: icmp_seq=169 ttl=64 time=0.050 ms
64 bytes from 10.233.185.20: icmp_seq=170 ttl=64 time=0.154 ms
64 bytes from 10.233.185.20: icmp_seq=171 ttl=64 time=0.054 ms
64 bytes from 10.233.185.20: icmp_seq=172 ttl=64 time=0.039 ms
64 bytes from 10.233.185.20: icmp_seq=173 ttl=64 time=0.160 ms
64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms
^C
--- 10.233.185.20 ping statistics ---
335 packets transmitted, 175 packets received, 47.8% packet loss
round-trip min/avg/max/stddev = 0.037/0.070/0.251/0.040 ms
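To pinpoint when the jail dropped off, a minimal Python sketch like the one below can summarize ping output in the FreeBSD ping(8) format shown above. It records the highest icmp_seq that got a reply, reads the totals from the statistics line, and counts how many requests went unanswered after the last reply. The names `summarize_ping` and `PingSummary` are hypothetical, and the regexes assume exactly the line format quoted here.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Reply lines look like:
#   64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms
REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=[\d.]+ ms")
# The statistics line looks like:
#   335 packets transmitted, 175 packets received, 47.8% packet loss
STATS_RE = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


class PingSummary(NamedTuple):
    transmitted: int
    received: int
    last_reply_seq: Optional[int]
    loss_percent: float
    silent_tail: int  # requests sent after the last reply, all unanswered


def summarize_ping(lines: Iterable[str]) -> PingSummary:
    """Hypothetical helper: summarize FreeBSD ping(8) output."""
    last_seq: Optional[int] = None
    transmitted = received = 0
    for line in lines:
        reply = REPLY_RE.search(line)
        if reply:
            seq = int(reply.group(1))
            last_seq = seq if last_seq is None else max(last_seq, seq)
            continue
        stats = STATS_RE.search(line)
        if stats:
            transmitted, received = int(stats.group(1)), int(stats.group(2))
    if transmitted == 0:
        raise ValueError("no ping statistics line found")
    loss = round(100.0 * (transmitted - received) / transmitted, 1)
    # icmp_seq starts at 0, so requests 0..last_seq precede the last reply;
    # every request after that one went unanswered.
    tail = transmitted - (last_seq + 1) if last_seq is not None else transmitted
    return PingSummary(transmitted, received, last_seq, loss, tail)


if __name__ == "__main__":
    sample = [
        "64 bytes from 10.233.185.20: icmp_seq=0 ttl=64 time=0.054 ms",
        "64 bytes from 10.233.185.20: icmp_seq=174 ttl=64 time=0.045 ms",
        "335 packets transmitted, 175 packets received, 47.8% packet loss",
    ]
    print(summarize_ping(sample))
```

On the output above, the sketch reports 47.8% loss, a last reply at icmp_seq=174 and a silent tail of 160 requests. The run received 175 replies, and the last one was seq 174, so every lost packet falls in that tail. In other words, the jail stopped answering for good rather than dropping packets intermittently.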


ifconfig
vtnet0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4c00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
        ether 56:16:e9:80:5e:41
        inet 87.233.191.146 netmask 0xfffffff0 broadcast 87.233.191.159
        inet 87.233.191.156 netmask 0xffffffff broadcast 87.233.191.156
        inet 87.233.191.155 netmask 0xffffffff broadcast 87.233.191.155
        inet 87.233.191.154 netmask 0xffffffff broadcast 87.233.191.154
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vtnet1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,TXCSUM_IPV6>
        ether 56:16:2c:64:32:35
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 58:9c:fc:10:ff:82
        inet 10.233.185.1 netmask 0xffffff00 broadcast 10.233.185.255
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: epair20a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000
        member: epair18a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 15 priority 128 path cost 2000
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>
bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 58:9c:fc:10:d9:1a
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: vtnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 2000
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>
pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33160
        groups: pflog
epair18a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: jail_web01
        options=8<VLAN_MTU>
        ether 02:77:ea:19:c7:0a
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
epair20a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: jail_haproxy
        options=8<VLAN_MTU>
        ether 02:9b:93:8c:59:0a
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

jail.conf

# Global settings applied to all jails.
$domain = "test.nl";

exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;

mount.fstab = "/storage/jails/$name.fstab";

exec.system_user = "root";
exec.jail_user = "root";
mount.devfs;
sysvshm="new";
sysvsem="new";
allow.raw_sockets;
allow.set_hostname = 0;
allow.sysvipc;
enforce_statfs = "2";
devfs_ruleset = "11";

path = "/storage/jails/${name}";
host.hostname = "${name}.${domain}";


# Networking
vnet;
vnet.interface = "vnet0";

  # Commands to run on host before jail is created
  exec.prestart = "ifconfig epair${ip} create up description jail_${name}";
  exec.prestart += "ifconfig epair${ip}a up";
  exec.prestart += "ifconfig bridge0 addm epair${ip}a up";
  exec.created = "ifconfig epair${ip}b name vnet0";

  # Commands to run in jail after it is created
  exec.start += "/bin/sh /etc/rc";

  # commands to run in jail when jail is stopped
  exec.stop = "/bin/sh /etc/rc.shutdown";

  # Commands to run on host when jail is stopped
  exec.poststop = "ifconfig bridge0 deletem epair${ip}a";
  exec.poststop += "ifconfig epair${ip}a destroy";
  persist;

web01 {
    $ip = 18;
}

haproxy {
    $ip = 20;
    mount.fstab = "";
    path = "/storage/jails/${name}";
}
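The epair plumbing in this jail.conf is easier to follow once the variables are expanded per jail. The rough Python model below performs `${name}` / `${ip}` substitution on the host-side commands. It treats each `exec.*` parameter's `=` / `+=` lines as an ordered list of commands, assuming `+=` appends as described in jail.conf(5). `JAIL_VARS`, `HOST_COMMANDS`, `expand` and `plan` are hypothetical names for illustration, not anything provided by jail(8).

```python
import re
from typing import Dict, List

# Per-jail variables from the jail.conf above.
JAIL_VARS: Dict[str, Dict[str, str]] = {
    "web01": {"ip": "18"},
    "haproxy": {"ip": "20"},
}

# Host-side interface commands from the global section, copied verbatim;
# each "=" / "+=" line becomes one list entry, in order.
HOST_COMMANDS: Dict[str, List[str]] = {
    "exec.prestart": [
        "ifconfig epair${ip} create up description jail_${name}",
        "ifconfig epair${ip}a up",
        "ifconfig bridge0 addm epair${ip}a up",
    ],
    "exec.created": ["ifconfig epair${ip}b name vnet0"],
    "exec.poststop": [
        "ifconfig bridge0 deletem epair${ip}a",
        "ifconfig epair${ip}a destroy",
    ],
}

VAR_RE = re.compile(r"\$\{(\w+)\}")


def expand(cmd: str, name: str) -> str:
    """Substitute ${name} and the jail's own variables (here only ${ip})."""
    values = {"name": name, **JAIL_VARS[name]}
    return VAR_RE.sub(lambda m: values[m.group(1)], cmd)


def plan(name: str) -> Dict[str, List[str]]:
    """Expanded host-side commands for one jail, keyed by parameter."""
    return {param: [expand(c, name) for c in cmds]
            for param, cmds in HOST_COMMANDS.items()}


if __name__ == "__main__":
    for jail in JAIL_VARS:
        for param, cmds in plan(jail).items():
            for cmd in cmds:
                print(f"{jail:8} {param:14} {cmd}")
```

For haproxy, this model runs `ifconfig epair20 create up description jail_haproxy` and adds `epair20a` to bridge0. It also renames the jail-facing end, `epair20b`, to `vnet0`, which is the name given in `vnet.interface`. That matches the epair20a / jail_haproxy entry in the ifconfig output above.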

pf.conf

#######################################################################
ext_if="vtnet0"
table <bruteforcers> persist
table <torlist> persist
table <ssh-trusted> persist file "/usr/local/etc/pf/ssh-trusted"
table <custom-block> persist file "/usr/local/etc/pf/custom-block"
table <jailnetworks> { 10.233.185.0/24, 192.168.10.0/24 }

icmp_types = "echoreq"
junk_ports="{ 135,137,138,139,445,68,67,3222,17500 }"

# Log interface
set loginterface $ext_if

# Set limits
set limit { states 40000, frags 20000, src-nodes 20000 }

scrub on $ext_if all fragment reassemble no-df random-id

# ---- Nat jails to the web
binat on $ext_if from 10.233.185.15/32 to !10.233.185.0/24 -> 87.233.191.156/32 # saltmaster
binat on $ext_if from 10.233.185.20/32 to !10.233.185.0/24 -> 87.233.191.155/32 # haproxy
binat on $ext_if from 10.233.185.22/32 to !10.233.185.0/24 -> 87.233.191.154/32 # web-comb

nat on $ext_if from <jailnetworks> to any -> ($ext_if:0)

# ---- First rule obligatory "Pass all on loopback"
pass quick on lo0 all
pass quick on bridge0 all
pass quick on bridge1 all

# ---- Block TOR exit addresses
block quick proto { tcp, udp } from <torlist> to $ext_if

# ---- Second rule "Block all in and pass all out"
block in log all
pass out all keep state

# IPv6 pass in/out all IPv6 ICMP traffic
pass in quick proto icmp6 all

# Pass all lo0
set skip on lo0

############### FIREWALL ###############################################
# ---- Block custom ip's and logs
block quick proto { tcp, udp } from <custom-block> to $ext_if

# ---- Jail ports
pass in quick on { $ext_if } proto tcp from any to 10.233.185.22 port { smtp 80 443 993 995 1956 } keep state
pass in quick on { $ext_if } proto tcp from any to 10.233.185.20 port { smtp 80 443 993 995 1956 } keep state
pass in quick on { $ext_if } proto tcp from any to 10.233.185.15 port { 4505 4506 } keep state

# ---- Allow ICMP
pass in inet proto icmp all icmp-type $icmp_types keep state
pass out inet proto icmp all icmp-type $icmp_types keep state

pass in quick on $ext_if inet proto tcp from any to $ext_if port { 80, 443 } flags S/SA keep state
pass in quick on $ext_if inet proto tcp from <ssh-trusted> to $ext_if port { 4505 4506 } flags S/SA keep state
block log quick from <bruteforcers>
pass quick proto tcp from <ssh-trusted> to $ext_if port ssh flags S/SA keep state
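For anyone trying to reproduce this, a rough Python model of the source address this pf.conf gives outbound jail traffic on vtnet0 may help. It relies on the pf.conf(5) rule that binat rules are evaluated before nat rules, with the first matching translation rule winning. It ignores state tracking, filtering and the inbound direction (where binat maps 87.233.191.155 back to 10.233.185.20 for the remote hey client). It also assumes that `($ext_if:0)` resolves to vtnet0's primary address, 87.233.191.146, from the ifconfig output. `translate_source` is a hypothetical name.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

JAIL_NET = ip_network("10.233.185.0/24")
# table <jailnetworks> { 10.233.185.0/24, 192.168.10.0/24 }
JAILNETWORKS = [ip_network("10.233.185.0/24"), ip_network("192.168.10.0/24")]
# Assumption: ($ext_if:0) resolves to vtnet0's primary (non-alias) address.
EXT_IF_PRIMARY = ip_address("87.233.191.146")

# binat on $ext_if from <jail>/32 to !10.233.185.0/24 -> <alias>/32
BINAT = [
    (ip_address("10.233.185.15"), ip_address("87.233.191.156")),  # saltmaster
    (ip_address("10.233.185.20"), ip_address("87.233.191.155")),  # haproxy
    (ip_address("10.233.185.22"), ip_address("87.233.191.154")),  # web-comb
]


def translate_source(src: str, dst: str) -> Optional[str]:
    """Translated source address for an outbound packet on vtnet0,
    or None if no translation rule matches (simplified model)."""
    s, d = ip_address(src), ip_address(dst)
    # binat rules are evaluated before nat rules; first match wins.
    for internal, public in BINAT:
        if s == internal and d not in JAIL_NET:
            return str(public)
    # nat on $ext_if from <jailnetworks> to any -> ($ext_if:0)
    if any(s in net for net in JAILNETWORKS):
        return str(EXT_IF_PRIMARY)
    return None
```

Under this model, haproxy (10.233.185.20) leaves via its dedicated alias 87.233.191.155. web01 (10.233.185.18) has no binat rule, so it falls through to the nat rule and leaves via 87.233.191.146.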

This is as minimal as I can get it.

Hope this helps.
regards,
Johan Hendriks


On Sat, 12 Mar 2022 at 02:10, Kristof Provost <kp@freebsd.org> wrote:
On 11 Mar 2022, at 18:55, Michael Gmelin wrote:
>> On 12 Mar 2022, at 01:21, Kristof Provost <kp@freebsd.org> wrote:
>>
>> On 11 Mar 2022, at 17:44, Johan Hendriks wrote:
>>>> On 09/03/2022 20:55, Johan Hendriks wrote:
>>>> The problem:
>>>> I have a FreeBSD 14 machine and a FreeBSD 13-STABLE machine, both running the same jails just to test how they work.
>>>>
>>>> The jails that are running are a salt master, a haproxy jail, 2 web servers, 2 varnish servers, and 2 PHP jails, one for PHP 8.0 and one for 8.1. All the jails are connected to bridge0 and all of them use vnet.
>>>>
>>>> I believe this worked on an older 14-HEAD machine, but I did not do much with it back then. When I started testing again after updating the OS, I noticed that one of the varnish jails lost its network connection after running for a few hours. I thought it was just something on HEAD, so I never really looked into it. Later, when I started using the jails again and testing a WordPress site, I noticed that under a simple load test my haproxy jail loses its network connection within one minute. I see nothing in the logs, either on the host or in the jail.
>>>> From the jail I cannot ping the other jails or the IP address of the bridge. I can, however, ping the jail's own IP address. From the host I also cannot ping the haproxy jail's IP address. If I start a tcpdump on the haproxy jail's epair a-side interface, I do see the packets arrive, but they do not show up in the jail.
>>>>
>>>> I used ZFS to send all the jails to a 13-STABLE machine, copied over the jail.conf and pf.conf files, and saw the same behavior.
>>>>
>>>> Then I tried 13.0-RELEASE-p7, and on that machine I do not see this happening. There I can stress test the machine for 10 minutes without a problem, but on 14-HEAD and 13-STABLE the jail's network connection fails within a minute. Only a restart of the jail brings it back online, and it then shows the same behavior as soon as I start a simple load test, which it should handle easily.
>>>>
>>>> One of the jail hosts is running under VMware and the other under Ubuntu with KVM. The 13.0-RELEASE-p7 jail host is running under Ubuntu with KVM.
>>>>
>>>> Thank you for your time.
>>>> regards
>>>> Johan
>>>>
>>> I did some bisecting, and the latest commit that works on FreeBSD 13-STABLE is 009a56b2e.
>>> Commit 2e0bee4c7 (if_epair: implement fanout) and later show the symptoms described above.
>>>
>> Interestingly, I cannot reproduce stalls in simple epair setups.
>> It would be useful if you could reduce the setup with the problem to a minimal configuration so we can figure out what other factors are involved.
>
> If there are clear instructions on how to reproduce, I’m happy to help experiment (I’m relying heavily on epair at this point).
>
> @Kristof: Did you try on bare metal or on vms?
>
Both.

Kristof

I also did a new install, this time based on 13.1-PRERELEASE.
I copied my haproxy and web01 jails to this machine and have the same problem.

Could it be a sysctl I use, or a /boot/loader.conf setting?

This is my /boot/loader.conf:
# -- sysinstall generated deltas -- #

autoboot_delay="2"  #optional

cryptodev_load="YES"

vbe_max_resolution=1024x768

# disable hyperthreading
machdep.hyperthreading_allowed=0

# filemon
filemon_load="YES"

# use gpt ids instead of gptids or disks idents
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gpt.enable="1"
kern.geom.label.gptid.enable="0"

# ZFS
zfs_load="YES"

My /etc/sysctl.conf:

# $FreeBSD$
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#
kern.timecounter.hardware=HPET
# accept queue
kern.ipc.soacceptqueue=4096

# PF vnet jail
net.link.bridge.pfil_member=0
net.link.bridge.pfil_bridge=0
net.inet.ip.forwarding=1            # (default 0)
net.inet.tcp.tso=0  # (default 1)
vfs.zfs.min_auto_ashift=12
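One way to check whether a tunable is involved is to diff these files against the ones on the 13.0-RELEASE-p7 host where the problem does not occur. Below is a minimal Python sketch that assumes the simple `name=value  # comment` syntax used in the loader.conf and sysctl.conf above. It also assumes that no value contains a `#`. `parse_conf` and `diff_conf` are hypothetical helpers, and the other host's file contents would have to be supplied.

```python
from typing import Dict, Iterable, List, Tuple


def parse_conf(lines: Iterable[str]) -> Dict[str, str]:
    """Parse loader.conf/sysctl.conf-style 'name=value' lines.

    Everything after '#' is treated as a comment; surrounding double
    quotes are stripped from values.
    """
    settings: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def diff_conf(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (name, value_in_a, value_in_b) for every setting that differs.

    '<unset>' marks a setting present in only one of the two files.
    """
    names = sorted(set(a) | set(b))
    return [
        (n, a.get(n, "<unset>"), b.get(n, "<unset>"))
        for n in names
        if a.get(n) != b.get(n)
    ]
```

For example, `diff_conf(parse_conf(open("/etc/sysctl.conf")), parse_conf(open("working-host-sysctl.conf")))` would list the differences. The second file name is a placeholder for a copy taken from the working host.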

If you want, I can give you full root access to this machine.

I run the hey command from a machine outside the host machine. Its hosts file points to the alias address that is binat'ed to the haproxy jail.

Thank you all for your time on this!

regards
Johan Hendriks

