From: Michael Gmelin
To: Kristof Provost
Cc: Johan Hendriks, freebsd-net@freebsd.org, Patrick M. Hausen
Subject: Re: epair and vnet jail loose connection.
Date: Sat, 12 Mar 2022 01:55:44 +0100
Message-Id: <43AA6B37-6235-4787-A03F-B4C264C75A58@freebsd.org>
In-Reply-To: <41ED1534-5E98-4D46-A562-811E80F82C5F@FreeBSD.org>
References: <41ED1534-5E98-4D46-A562-811E80F82C5F@FreeBSD.org>
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

> On 12. Mar 2022, at 01:21, Kristof Provost wrote:
>
> On 11 Mar 2022, at 17:44, Johan Hendriks wrote:
>>> On 09/03/2022 20:55, Johan Hendriks wrote:
>>> The problem:
>>> I have a FreeBSD 14 machine and a FreeBSD 13-STABLE machine, both running the same jails, just to test how they work.
>>>
>>> The jails that are running are a salt master, a haproxy jail, 2 webservers, 2 varnish servers, and 2 php jails, one for php8.0 and one for 8.1. All the jails are connected to bridge0 and all the jails use vnet.
>>>
>>> I believe this worked on an older 14-HEAD machine, but I did not do a lot with it back then. When I started testing again after updating the OS, I noticed that one of the varnish jails lost its network connection after running for a few hours. I thought it was just something on HEAD, so I never really looked at it. But later on, when I started using the jails again and testing a test wordpress site, I noticed that with a simple load test my haproxy jail loses its network connection within one minute. I see nothing in the logs, on the host or in the jail.
>>> From the jail I cannot ping the other jails or the IP address of the bridge. I can, however, ping the jail's own IP address. From the host I also cannot ping the haproxy jail's IP address. If I start a tcpdump on the epaira interface of the haproxy jail, I do see the packets arrive, but they do not show up in the jail.
>>>
>>> I used ZFS to send all the jails to a 13-STABLE machine and copied over the jail.conf file as well as the pf.conf file, and I saw the same behavior.
>>>
>>> Then I tried 13.0-RELEASE-p7, and on that machine I do not see this happening. There I can stress test the machine for 10 minutes without a problem, but on 14-HEAD and 13-STABLE the jail's network connection fails within a minute. Only a restart of the jail brings it back online, and it exhibits the same behavior again as soon as I start a simple load test, which it should handle nicely.
>>>
>>> One of the jail hosts is running under VMware and the other is running under Ubuntu with KVM. The 13.0-RELEASE-p7 jail host is running under Ubuntu with KVM.
>>>
>>> Thank you for your time.
>>> regards
>>> Johan
>>>
>> I did some bisecting and the latest commit that works on FreeBSD 13-STABLE is 009a56b2e.
>> The commit 2e0bee4c7 ("if_epair: implement fanout") and everything after it show the symptoms described above.
>>
> Interestingly I cannot reproduce stalls in simple epair setups.
> It would be useful if you could reduce the setup with the problem into a minimal configuration so we can figure out what other factors are involved.

If there are clear instructions on how to reproduce, I’m happy to help with experimenting (I’m relying heavily on epair at this point).

@Kristof: Did you try on bare metal or on VMs?

-m
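
For reference, a reduced setup of the kind asked for above could start from something like the following sketch: one bridge, one epair, and one persistent vnet jail, with no pf and no services. The jail name, the addresses, and the `DRY_RUN` switch are hypothetical and not taken from the report. The script assumes that bridge0 and epair0 do not exist yet, and it has not been confirmed to trigger the stall. The load generator is deliberately left out, since the thread does not say which one was used.

```sh
#!/bin/sh
# Hypothetical names and addresses; adjust them to the real setup.
JAIL=epairtest
BRIDGE=bridge0
HOST_IP=10.99.0.1/24
JAIL_IP=10.99.0.2/24

# Print each command by default; set DRY_RUN=0 (as root, on FreeBSD) to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup() {
    # Host side: a bridge with the "a" end of the epair as a member.
    run ifconfig "$BRIDGE" create
    run ifconfig epair0 create
    run ifconfig "$BRIDGE" addm epair0a up
    run ifconfig "$BRIDGE" inet "$HOST_IP"
    run ifconfig epair0a up
    # Jail side: an empty persistent vnet jail that receives the "b" end.
    run jail -c name="$JAIL" persist vnet
    run ifconfig epair0b vnet "$JAIL"
    run jexec "$JAIL" ifconfig epair0b inet "$JAIL_IP" up
}

check() {
    # Basic reachability from the host; rerun this after applying load.
    run ping -c 3 "${JAIL_IP%/*}"
}

setup
check
```

Run without arguments, the script only prints the commands it would execute. With `DRY_RUN=0` it creates the interfaces and the jail. If the jail stops answering the ping after a load test, the same commands can be replayed on 009a56b2e and on 2e0bee4c7 to compare.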