Re: ssh connections break with "Fssh_packet_write_wait" on 13 [SOLVED]

From: Rodney W. Grimes <freebsd-rwg_at_gndrsh.dnsmgr.net>
Date: Tue, 08 Jun 2021 22:20:56 UTC
> 
> On Thu, 3 Jun 2021 15:09:06 +0200
> Michael Gmelin <freebsd@grem.de> wrote:
> 
> > On Tue, 1 Jun 2021 13:47:47 +0200
> > Michael Gmelin <freebsd@grem.de> wrote:
> > 
> > > Hi,
> > > 
> > > Since upgrading servers from 12.2 to 13.0, I get
> > > 
> > >   Fssh_packet_write_wait: Connection to 1.2.3.4 port 22: Broken pipe
> > > 
> > > consistently, usually after about 11 idle minutes, that's with and
> > > without pf enabled. Client (11.4 in a VM) wasn't altered.
> > > 
> > > Verbose logging (client and server side) doesn't show anything
> > > special when the connection breaks. In the past, QoS problems
> > > caused these disconnects, but I didn't see anything apparent
> > > changing between 12.2 and 13 in this respect.
> > > 
> > > I did a test on a newly commissioned server to rule out other
> > > factors (so, same client connections, same routes, same
> > > everything). On 12.2 before the update: Connection stays open for
> > > hours. After the update (same server): connections break
> > > consistently after < 15 minutes (this is with unaltered
> > > configurations, no *AliveInterval configured on either side of the
> > > connection). 
> > 
> > I did a little bit more testing and realized that the problem goes
> > away when I disable "Proportional Rate Reduction per RFC 6937" on the
> > server side:
> > 
> >   sysctl net.inet.tcp.do_prr=0
> > 
> > Keeping it on and enabling net.inet.tcp.do_prr_conservative doesn't
> > fix the problem.
> > 
> > This seems to be specific to Parallels. After some more digging, I
> > realized that Parallels Desktop's NAT daemon (prl_naptd) handles
> > keep-alive between the VM and the external server on its own. There is
> > no direct communication between the client and the server. This means:
> > 
> > - The NAT daemon starts sending keep-alive packets right away (not
> >   after the VM's net.inet.tcp.keepidle), every 75 seconds.
> > - Keep-alive packets originating in the VM never reach the server.
> > - Keep-alive packets originating on the server never reach the VM.
> > - Client and server basically do keep-alive with the NAT daemon, not
> >   with each other.
> > 
> > It also seems like Parallels is filtering the tos field (so it's
> > always 0x00), but that's unrelated.
> > 
> > I configured a bhyve VM running FreeBSD 11.4 on a separate laptop on
> > the same network for comparison and it has no such issues.
> > 
> > Looking at tcpdump output on the server, this is what a keep-alive
> > packet sent by Parallels looks like:
> > 
> >   10:14:42.449681 IP (tos 0x0, ttl 64, id 15689, offset 0, flags [none],
> >     proto TCP (6), length 40)
> >     192.168.1.1.58222 > 192.168.1.2.22: Flags [.], cksum x (correct),
> >     seq 2534, ack 3851, win 4096, length 0
> > 
> > While those originating from the bhyve VM (after lowering
> > net.inet.tcp.keepidle) look like this:
> > 
> >   12:18:43.105460 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF],
> >     proto TCP (6), length 52)
> >     192.168.1.3.57555 > 192.168.1.2.22: Flags [.], cksum x
> >     (correct), seq 1780337696, ack 45831723, win 1026, options
> >     [nop,nop,TS val 3003646737 ecr 3331923346], length 0
> > 
> > As written above, once net.inet.tcp.do_prr is disabled, keep-alive
> > seems to be working just fine. Otherwise, Parallels' NAT daemon kills
> > the connection, as its keep-alive requests are not answered (well,
> > that's what I think is happening):
> > 
> >   10:19:43.614803 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> >     proto TCP (6), length 40)
> >     192.168.1.1.58222 > 192.168.1.2.22: Flags [R.], cksum x (correct),
> >     seq 2535, ack 3851, win 4096, length 0
> > 
> > The easiest way to work around the problem Client side is to configure
> > ServerAliveInterval in ~/.ssh/config in the Client VM.
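
A minimal sketch of that client-side workaround, assuming a 30 second
interval and the host alias "myserver" (both are illustrative choices, not
values taken from this thread). The helper name ssh_keepalive_stanza is
hypothetical; it only prints an ssh_config(5) stanza so it can be reviewed
before it is appended to ~/.ssh/config:

```sh
# Print an ssh_config(5) stanza that enables application-level keep-alives.
# $1 = host pattern (assumption: "myserver"), $2 = interval in seconds.
ssh_keepalive_stanza() {
    printf 'Host %s\n    ServerAliveInterval %s\n' "$1" "$2"
}

# Show the stanza; append it with ">> ~/.ssh/config" once it looks right.
ssh_keepalive_stanza myserver 30
```

Because ServerAliveInterval probes travel inside the encrypted ssh channel,
they count as regular traffic and should keep the connection from sitting
idle long enough for the NAT daemon's own keep-alives to matter.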
> > 
> > I'm curious though if this is basically a Parallels problem that has
> > only been exposed by PRR being more correct (which is what I suspect),
> > or if this is actually a FreeBSD problem.
> > 
> 
> So, PRR probably was a red herring and the real reason that's happening
> is that FreeBSD (since version 13[0]) by default discards packets
> without timestamps for connections that formally had negotiated to have
> them. This new behavior seems to be in line with RFC 7323, section
> 3.2[1]:
> 
>     "Once TSopt has been successfully negotiated, that is both <SYN> and
>     <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
>     segment for the duration of the connection, and SHOULD be sent in an
>     <RST> segment (see Section 5.2 for details)."
> 
> As it turns out, macOS does exactly this - it sends keep-alive packets
> without a timestamp for connections that were negotiated to have them.
> 
> Under normal circumstances - ssh from macOS to a server running FreeBSD
> 13 - this won't be noticed, since macOS uses the same default settings
> as FreeBSD (2 hours idle time, 75 seconds intervals), so the server
> side initiated keep-alive will save the connection before it has a
> chance to break due to eight consecutive unanswered keep-alives at the
> client side.
> 
> This is different for ssh connections originating from a VM inside
> Parallels, as connections created by prl_naptd will start sending tcp
> keep-alives shortly after the connection becomes idle. As a result,
> idle connections break after about 11 minutes of idle time (60s
> + 8*75s = 660s == 11m), unless countermeasures are taken.
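
The arithmetic above can be checked with a small sketch. The function name
keepalive_deadline is hypothetical; the inputs are the idle time, probe
interval and probe count quoted in this thread (60s/75s/8 for the Parallels
case, and 5s/5s/8 for the macOS demonstration with keepidle and keepintvl
set to 5000 ms):

```sh
# Seconds of idle time before a peer gives up on unanswered keep-alives.
# $1 = idle seconds before the first probe
# $2 = seconds between probes
# $3 = number of unanswered probes tolerated
keepalive_deadline() {
    echo $(( $1 + $3 * $2 ))
}

keepalive_deadline 60 75 8   # Parallels NAT case: prints 660 (11 minutes)
keepalive_deadline 5 5 8     # macOS demo case: prints 45
```

The second result lines up with the lower end of the 45-60 second window
seen when reproducing the problem from macOS.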
> 
> An easy way to demonstrate the problem is to change keep-alive defaults
> on *macOS* using sysctl and sshing to a FreeBSD 13 server:
> 
>     $ sudo sysctl net.inet.tcp.keepidle=5000
>     $ sudo sysctl net.inet.tcp.keepintvl=5000
>     $ ssh -oTCPKeepAlive=yes myserver
> 
> This way, the problem described can be reproduced quite easily:
> Disconnect due to broken pipe after 45-60 seconds of idle time, tcpdump
> confirming that keep-alive packets don't have tcp timestamps, while
> they were used when negotiating the connection.
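
One way to spot the offending segments in a capture is sketched below. It
assumes tcpdump's default one-line-per-packet output (no -v) on the server,
and the filter name missing_ts is made up for this example. It drops lines
that carry a TS option and lines that are resets (RFC 7323 only says
SHOULD for RST), so what remains are segments sent without timestamps:

```sh
# Keep only tcpdump lines that lack a TCP timestamp option and are not RSTs.
missing_ts() {
    grep -v -e 'TS val' -e 'Flags \[R'
}

# Typical use on the server (needs root; port 22 is the ssh session):
#   tcpdump -l -n 'tcp port 22' | missing_ts
```

On a connection that negotiated timestamps, any line that survives the
filter is a candidate for being discarded by FreeBSD 13.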
> 
> There are various ways to work around the issue.
> 
> Client side workarounds:
> - Use ServerAlive* settings in ~/.ssh/config (ssh only)
> - Tune net.inet.tcp.keep* sysctls on macOS (for all services)
> 
> Server side workarounds:
> - Use ClientAlive* settings in /etc/ssh/sshd_config (ssh only)
> - Tolerate missing timestamps in packets using sysctl, which makes
>   FreeBSD 13 behave like previous versions did:
> 
>     sysctl net.inet.tcp.tolerate_missing_ts=1
> 
> The last option probably being the most practical one.
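
To keep that sysctl across reboots it also has to go into /etc/sysctl.conf.
A minimal sketch, assuming a hypothetical helper persist_sysctl that takes
the file as an argument (so it can be tried on a scratch file first) and
appends the setting only if the exact line is not already present:

```sh
# Append a name=value line to a sysctl.conf-style file unless it is
# already there. $1 = file, $2 = setting.
persist_sysctl() {
    grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# On the FreeBSD 13 server, as root:
#   sysctl net.inet.tcp.tolerate_missing_ts=1
#   persist_sysctl /etc/sysctl.conf net.inet.tcp.tolerate_missing_ts=1
```

Running it twice leaves a single entry in the file, so it is safe to call
from a provisioning script.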
> 
> rscheff@ and tuexen@ (thank you!) were able to reproduce the issue and
> reached out to Apple to see if there is something they can do to fix
> this at their end (macOS) in the future.

Can we please have the default of tolerate_missing_ts changed to 1 in
current and stable/13, with an erratum issued for releng_13, and have it
stay that way until the buggy TCP stacks are found and eliminated?

> 
> Best
> Michael
> 
> [0]https://cgit.freebsd.org/src/commit/?id=283c76c7c3f2f634f19f303a771a3f81fe890cab
> [1]https://datatracker.ietf.org/doc/html/rfc7323#section-3.2
> 
> -- 
> Michael Gmelin
> 
> 

-- 
Rod Grimes                                                 rgrimes@freebsd.org