Date: Tue, 8 Jun 2021 23:57:24 -0700 (PDT)
From: Don Lewis
Subject: Re: ssh connections break with "Fssh_packet_write_wait" on 13 [SOLVED]
To: Michael Gmelin
cc: "freebsd-current@freebsd.org"

On 8 Jun, Michael Gmelin wrote:
> On Thu, 3 Jun 2021 15:09:06 +0200
> Michael Gmelin wrote:
>
>> On Tue, 1 Jun 2021 13:47:47 +0200
>> Michael Gmelin wrote:
>>
>> > Hi,
>> >
>> > Since upgrading servers from 12.2 to 13.0, I get
>> >
>> >   Fssh_packet_write_wait: Connection to 1.2.3.4 port 22: Broken pipe
>> >
>> > consistently, usually after about 11 idle minutes, both with and
>> > without pf enabled. The client (11.4 in a VM) wasn't altered.
>> >
>> > Verbose logging (client and server side) doesn't show anything
>> > special when the connection breaks. In the past, QoS problems
>> > caused these disconnects, but I didn't see anything apparent
>> > changing between 12.2 and 13 in this respect.
>> >
>> > I did a test on a newly commissioned server to rule out other
>> > factors (so, same client connections, same routes, same
>> > everything). On 12.2 before the update: the connection stays open
>> > for hours. After the update (same server): connections break
>> > consistently after < 15 minutes (this is with unaltered
>> > configurations, no *AliveInterval configured on either side of the
>> > connection).
>>
>> I did a little bit more testing and realized that the problem goes
>> away when I disable "Proportional Rate Reduction per RFC 6937" on the
>> server side:
>>
>>   sysctl net.inet.tcp.do_prr=0
>>
>> Keeping it on and enabling net.inet.tcp.do_prr_conservative doesn't
>> fix the problem.
>>
>> This seems to be specific to Parallels. After some more digging, I
>> realized that Parallels Desktop's NAT daemon (prl_naptd) handles
>> keep-alive between the VM and the external server on its own. There is
>> no direct communication between the client and the server. This means:
>>
>> - The NAT daemon starts sending keep-alive packets right away (not
>>   after the VM's net.inet.tcp.keepidle), every 75 seconds.
>> - Keep-alive packets originating in the VM never reach the server.
>> - Keep-alive packets originating on the server never reach the VM.
>> - Client and server basically do keep-alive with the NAT daemon, not
>>   with each other.
>>
>> It also seems like Parallels is filtering the tos field (so it's
>> always 0x00), but that's unrelated.
>>
>> I configured a bhyve VM running FreeBSD 11.4 on a separate laptop on
>> the same network for comparison, and it has no such issues.
>>
>> Looking at tcpdump output on the server, this is what a keep-alive
>> packet sent by Parallels looks like:
>>
>>   10:14:42.449681 IP (tos 0x0, ttl 64, id 15689, offset 0, flags [none], proto TCP (6), length 40)
>>     192.168.1.1.58222 > 192.168.1.2.22: Flags [.], cksum x (correct), seq 2534, ack 3851, win 4096, length 0
>>
>> while those originating from the bhyve VM (after lowering
>> net.inet.tcp.keepidle) look like this:
>>
>>   12:18:43.105460 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 52)
>>     192.168.1.3.57555 > 192.168.1.2.22: Flags [.], cksum x (correct), seq 1780337696, ack 45831723, win 1026, options [nop,nop,TS val 3003646737 ecr 3331923346], length 0
>>
>> As written above, once net.inet.tcp.do_prr is disabled, keepalive
>> seems to work just fine. Otherwise, Parallels' NAT daemon kills
>> the connection, as its keep-alive requests are not answered (well,
>> that's what I think is happening):
>>
>>   10:19:43.614803 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
>>     192.168.1.1.58222 > 192.168.1.2.22: Flags [R.], cksum x (correct), seq 2535, ack 3851, win 4096, length 0
>>
>> The easiest way to work around the problem on the client side is to
>> configure ServerAliveInterval in ~/.ssh/config in the client VM.
>>
>> I'm curious, though, whether this is basically a Parallels problem
>> that has only been exposed by PRR being more correct (which is what I
>> suspect), or whether this is actually a FreeBSD problem.
>
> So, PRR probably was a red herring, and the real reason this is
> happening is that FreeBSD (since version 13[0]) by default discards
> packets without timestamps for connections that had formally
> negotiated to have them.
> This new behavior seems to be in line with RFC 7323, section
> 3.2[1]:
>
>   "Once TSopt has been successfully negotiated, that is both <SYN> and
>   <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
>   segment for the duration of the connection, and SHOULD be sent in an
>   <RST> segment (see Section 5.2 for details)."
>
> As it turns out, macOS does exactly this: it sends keep-alive packets
> without a timestamp for connections that were negotiated to have them.

I wonder if I'm running into this with ssh connections to freefall. My
outgoing IPv6 connections pass through an ipfw firewall that uses
dynamic rules. When a dynamic rule gets close to expiration, ipfw
generates keep-alive packets that just seem to be ignored by freefall.
Eventually the dynamic rule expires, and some time later sshd on
freefall sends a keepalive, which gets dropped at my end.
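A note on the captures quoted above: the "length" tcpdump prints on the IP line is the IPv4 total length. Assuming 20-byte IPv4 and base TCP headers (no IP options, zero payload), the Parallels probe (length 40) carries 0 bytes of TCP options, while the bhyve probe (length 52) carries 12: two NOPs plus the 10-byte Timestamps option (kind 8). The Python sketch below decodes raw TCP options and models the receiver behavior described in this thread, where a non-RST segment lacking TSopt on a timestamp-negotiated connection is discarded. It is a simplified illustration only; the function names are hypothetical and this is not FreeBSD's actual TCP input code.

```python
import struct

# TCP option kinds used here (RFC 793 / RFC 7323).
TCPOPT_EOL = 0
TCPOPT_NOP = 1
TCPOPT_TIMESTAMP = 8  # Timestamps option: kind 8, length 10

# Assumption: IPv4 header without options, base TCP header without options.
IPV4_HDR_LEN = 20
TCP_BASE_HDR_LEN = 20


def tcp_options_len(ip_total_length, payload_len=0):
    """Bytes of TCP options implied by the IPv4 total length tcpdump prints."""
    return ip_total_length - IPV4_HDR_LEN - TCP_BASE_HDR_LEN - payload_len


def parse_timestamp(options):
    """Return (TSval, TSecr) if the raw TCP options contain TSopt, else None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break                      # end of option list
        if kind == TCPOPT_NOP:
            i += 1                     # single-byte padding
            continue
        if i + 1 >= len(options):
            break                      # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                      # malformed length, stop parsing
        if kind == TCPOPT_TIMESTAMP and length == 10:
            return struct.unpack("!II", options[i + 2:i + 10])
        i += length
    return None


def accept_segment(ts_negotiated, is_rst, options):
    """Hypothetical receiver check: once TSopt was negotiated, drop any
    non-RST segment that lacks it. RST is exempt because RFC 7323 only
    says TSopt SHOULD be sent in an RST."""
    if ts_negotiated and not is_rst and parse_timestamp(options) is None:
        return False
    return True


# Options as shown for the bhyve probe: [nop,nop,TS val 3003646737 ecr 3331923346]
bhyve_opts = bytes([TCPOPT_NOP, TCPOPT_NOP, TCPOPT_TIMESTAMP, 10]) + struct.pack(
    "!II", 3003646737, 3331923346)
# The Parallels probe (IP length 40) carries no TCP options at all.
parallels_opts = b""

print(tcp_options_len(52), len(bhyve_opts))      # 12 12
print(tcp_options_len(40), len(parallels_opts))  # 0 0
print(accept_segment(True, False, bhyve_opts))      # True
print(accept_segment(True, False, parallels_opts))  # False
```

Under this simplified model, a timestamp-less keep-alive like the Parallels or macOS probes would be silently dropped on a connection that negotiated timestamps, while the bhyve probe would be accepted, which would be consistent with the NAT daemon's probes going unanswered.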