From nobody Wed Jun 09 08:30:27 2021
From: "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Message-Id: <202106090830.1598URfk012205@gndrsh.dnsmgr.net>
Subject: Re: ssh connections break with "Fssh_packet_write_wait" on 13 [SOLVED]
In-Reply-To: <tkrat.41b4bdb2f948847a@FreeBSD.org>
To: Don Lewis <truckman@FreeBSD.org>
Date: Wed, 9 Jun 2021 01:30:27 -0700 (PDT)
CC: Michael Gmelin <freebsd@grem.de>,
        "freebsd-current@freebsd.org" <freebsd-current@FreeBSD.org>
X-Mailer: ELM [version 2.4ME+ PL121h (25)]
List-Id: Discussions about the use of FreeBSD-current <freebsd-current.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-current
List-Help: <mailto:freebsd-current+help@freebsd.org>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Subscribe: <mailto:freebsd-current+subscribe@freebsd.org>
List-Unsubscribe: <mailto:freebsd-current+unsubscribe@freebsd.org>
Sender: owner-freebsd-current@freebsd.org
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII

> On  8 Jun, Michael Gmelin wrote:
> > 
> > 
> > On Thu, 3 Jun 2021 15:09:06 +0200
> > Michael Gmelin <freebsd@grem.de> wrote:
> > 
> >> On Tue, 1 Jun 2021 13:47:47 +0200
> >> Michael Gmelin <freebsd@grem.de> wrote:
> >> 
> >> > Hi,
> >> > 
> >> > Since upgrading servers from 12.2 to 13.0, I get
> >> > 
> >> >   Fssh_packet_write_wait: Connection to 1.2.3.4 port 22: Broken pipe
> >> > 
> >> > consistently, usually after about 11 idle minutes, that's with and
> >> > without pf enabled. Client (11.4 in a VM) wasn't altered.
> >> > 
> >> > Verbose logging (client and server side) doesn't show anything
> >> > special when the connection breaks. In the past, QoS problems
> >> > caused these disconnects, but I didn't see anything apparent
> >> > changing between 12.2 and 13 in this respect.
> >> > 
> >> > I did a test on a newly commissioned server to rule out other
> >> > factors (so, same client connections, same routes, same
> >> > everything). On 12.2 before the update: connection stays open for
> >> > hours. After the update (same server): connection breaks
> >> > consistently after < 15 minutes (this is with unaltered
> >> > configurations, no *AliveInterval configured on either side of the
> >> > connection). 
> >> 
> >> I did a little bit more testing and realized that the problem goes
> >> away when I disable "Proportional Rate Reduction per RFC 6937" on the
> >> server side:
> >> 
> >>   sysctl net.inet.tcp.do_prr=0
> >> 
> >> Keeping it on and enabling net.inet.tcp.do_prr_conservative doesn't
> >> fix the problem.
> >> 
> >> This seems to be specific to Parallels. After some more digging, I
> >> realized that Parallels Desktop's NAT daemon (prl_naptd) handles
> >> keep-alive between the VM and the external server on its own. There is
> >> no direct communication between the client and the server. This means:
> >> 
> >> - The NAT daemon starts sending keep-alive packets right away (not
> >>   after the VM's net.inet.tcp.keepidle), every 75 seconds.
> >> - Keep-alive packets originating in the VM never reach the server.
> >> - Keep-alive packets originating on the server never reach the VM.
> >> - Client and server basically do keep-alive with the NAT daemon, not
> >>   with each other.
> >> 
> >> It also seems like Parallels is filtering the tos field (so it's
> >> always 0x00), but that's unrelated.
> >> 
> >> I configured a bhyve VM running FreeBSD 11.4 on a separate laptop on
> >> the same network for comparison and it has no such issues.
> >> 
> >> Looking at tcpdump output on the server, this is what a keep-alive
> >> packet sent by Parallels looks like:
> >> 
> >>   10:14:42.449681 IP (tos 0x0, ttl 64, id 15689, offset 0, flags [none],
> >>     proto TCP (6), length 40)
> >>     192.168.1.1.58222 > 192.168.1.2.22: Flags [.], cksum x (correct),
> >>     seq 2534, ack 3851, win 4096, length 0
> >> 
> >> While those originating from the bhyve VM (after lowering
> >> net.inet.tcp.keepidle) look like this:
> >> 
> >>   12:18:43.105460 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF],
> >>     proto TCP (6), length 52)
> >>     192.168.1.3.57555 > 192.168.1.2.22: Flags [.], cksum x
> >>     (correct), seq 1780337696, ack 45831723, win 1026, options
> >>     [nop,nop,TS val 3003646737 ecr 3331923346], length 0
> >> 
> >> As written above, once net.inet.tcp.do_prr is disabled, keep-alive
> >> seems to be working just fine. Otherwise, Parallels' NAT daemon kills
> >> the connection, as its keep-alive requests are not answered (well,
> >> that's what I think is happening):
> >> 
> >>   10:19:43.614803 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> >>     proto TCP (6), length 40)
> >>     192.168.1.1.58222 > 192.168.1.2.22: Flags [R.], cksum x (correct),
> >>     seq 2535, ack 3851, win 4096, length 0
> >> 
> >> The easiest way to work around the problem on the client side is to
> >> configure ServerAliveInterval in ~/.ssh/config in the client VM.
> >> 
> >> I'm curious though if this is basically a Parallels problem that has
> >> only been exposed by PRR being more correct (which is what I suspect),
> >> or if this is actually a FreeBSD problem.
> >> 
> > 
> > So, PRR probably was a red herring and the real reason that's happening
> > is that FreeBSD (since version 13[0]) by default discards packets
> > without timestamps for connections that formally had negotiated to have
> > them. This new behavior seems to be in line with RFC 7323, section
> > 3.2[1]:
> > 
> >     "Once TSopt has been successfully negotiated, that is both <SYN> and
> >     <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
> >     segment for the duration of the connection, and SHOULD be sent in an
> >     <RST> segment (see Section 5.2 for details)."
> > 
> > As it turns out, macOS does exactly this: it sends keep-alive packets
> > without a timestamp for connections that were negotiated to have them.
> 
> I wonder if I'm running into this with ssh connections to freefall.  My
> outgoing IPv6 connections pass through an ipfw firewall that uses
> dynamic rules.  When the dynamic rule gets close to expiration, it
> generates keep alive packets that just seem to be ignored by freefall.
> Eventually the dynamic rule expires, then sometime later sshd on
> freefall sends a keepalive which gets dropped at my end.

Very likely:
freefall:rgrimes {101} sysctl net.inet.tcp.tolerate_missing_ts
net.inet.tcp.tolerate_missing_ts: 0

Can someone please set this to 1 on freefall?
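
For anyone skimming the thread, here is a rough Python model of the
check as I understand it from Michael's description and the RFC 7323
text quoted above. It is a sketch, not the kernel code: the names
Segment and accept_segment are made up for illustration, and the
assumption that a RST without a timestamp is still accepted is taken
from the RFC's SHOULD wording and from the RST in Michael's trace.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for an incoming TCP segment."""
    has_timestamp: bool    # True if the segment carries the TSopt
    is_rst: bool = False   # True if the RST flag is set


def accept_segment(seg, ts_negotiated, tolerate_missing_ts=False):
    """Return True if the segment passes the timestamp check.

    ts_negotiated: both SYN and SYN,ACK carried the TSopt.
    tolerate_missing_ts: models net.inet.tcp.tolerate_missing_ts=1.
    """
    if ts_negotiated and not seg.has_timestamp:
        # Assumption: RST is exempt (RFC 7323 only says SHOULD for RST);
        # everything else is dropped unless the sysctl relaxes the check.
        return seg.is_rst or tolerate_missing_ts
    return True


# A keep-alive with no TCP options, like the ones from the NAT daemon.
keepalive = Segment(has_timestamp=False)
print(accept_segment(keepalive, ts_negotiated=True))
print(accept_segment(keepalive, ts_negotiated=True, tolerate_missing_ts=True))
```

With the sysctl at 0 the first call returns False (the keep-alive is
dropped and never answered); with it at 1 the second call returns True.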
-- 
Rod Grimes                                                 rgrimes@freebsd.org