[CURRENT]: Broken ssh: Fssh_packet_write_wait: Connection to XXX.XXX.XXX.XXX port 22: Broken pipe

O. Hartmann ohartman at zedat.fu-berlin.de
Thu May 12 16:44:55 UTC 2016

For some time now (about 1 1/2 months) I have been bothered by very unreliable ssh
connections between CURRENT boxes. Very often, the connection simply dies with

Fssh_packet_write_wait: Connection to XXX.XXX.XXX.XXX port 22: Broken pipe

This is worse than annoying: how can one maintain systems remotely with such unreliable
connections?

The problem seems to be related to CURRENT, but I do not have a trustworthy reference
since we use only one 10.3-STABLE box.

I will describe my observations; hopefully someone can make sense of them.

The "Broken pipe" kills poudriere sessions and buildworld runs (worse still if an
installworld gets caught by the Broken pipe!). It occurs between CURRENT systems; the
"controlling" box is a CURRENT box with X11/xterm from which I start the ssh session.

Connections from such X11/xterm systems to remote servers seem to be "stable" as long as
I do not open a second ssh connection. But this is not very reliable, just an
observation. Sometimes an open ssh connection lasts tens of minutes, whether there is
"noise" (output) on the terminal or the session is idle (a blinking cursor awaiting
further input), but in other cases a connection dies very quickly. The behaviour seems
random to me: it occurs on loaded and on idle systems alike, sometimes very quickly,
sometimes only after a longer time. Today's observation about the single ssh connection
is weak evidence, but I suspect that concurrent sessions trigger the drops faster. In
any case, the ssh session seems to go "asleep" after a while, either after some random
time or very quickly; I have no clue what triggers this erratic behaviour. It then takes
a while before the ssh connection/xterm accepts input again, up to 30 seconds (even on
fast, idle systems), or, as the final consequence, the session ends with a "Broken pipe".

Today I made another observation. With autofs mounts on several systems,
performance/bandwidth seemed very poor (both server and clients are CURRENT, most
recent builds as of today).

I reported earlier on this list about shaky and slow performance in conjunction with the
ssh problem, but I wasn't able to figure out what causes it. I am also wondering why
nobody else seems to be facing such dramatic dropouts of ssh connections or performance.

I think I will file a PR on this, too.

Kind regards,

O. Hartmann
