Random 'Connection reset' issues between jails on same host

Eirik Øverby ltning at anduin.net
Sun Jan 15 18:36:19 UTC 2012


On Jan 15, 2012, at 18:44, Eirik Øverby wrote:

> Hi all,
> 
> We're trying to implement our puppet infrastructure, and have discovered something strange about TCP connections between jails on the same host. As our jails generally haven't made many connections to each other, this issue hasn't come up before.
> 
> We have two identical host systems running FreeBSD 8.2-RELEASE-p4. These are 8-core Intel systems with 16GB RAM each. I have just upgraded one of the two systems to 9.0-RELEASE, and it shows the same problem.
> 
> When the puppetmaster jail is running on the same host as the jail running the puppet agent, connections from the puppet agent randomly fail with 'Connection reset by peer'. This happens at random stages of the configuration sync. If either of the jails is moved to another system on the same physical network (jail stop, zfs snapshot, zfs send/recv, jail start), there are no such problems. It is not a hardware issue, as this happens no matter which of the two hosts we use. Whenever both the puppetmaster and the puppet agent reside on the same physical box, the errors show up.

Replying to myself here:

Assigning a cpuset with a single CPU to the jail with puppetmaster seems to cure the symptom. I've made a few thousand connects now and no failures so far. Repeatable on 8 and 9. This is obviously only a workaround - but may give some hints as to where the problem is.
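
For anyone who wants to repeat this, here is a rough sketch of the two steps in Python. It is not the exact procedure used above, only an illustration under assumptions: the function names are made up for this example, the jail ID has to be looked up with jls, and a bare TCP connect may not exercise the same code path as a full puppet run. cpuset_command() only builds the cpuset(1) command line (-l for the CPU list, -j for the jail ID) and does not execute it.

```python
import socket


def cpuset_command(jail_id, cpu=0):
    """Build the cpuset(1) argv that restricts a jail to a single CPU.

    Corresponds to typing: cpuset -l <cpu> -j <jail_id>
    The command is only constructed here, never executed.
    """
    return ["cpuset", "-l", str(cpu), "-j", str(jail_id)]


def count_failed_connects(host, port, attempts=1000, timeout=5.0):
    """Open `attempts` TCP connections one after another, count failures."""
    failures = 0
    for _ in range(attempts):
        try:
            # create_connection() completes the TCP handshake; the socket
            # is closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Resets, refusals and timeouts are all OSError subclasses.
            failures += 1
    return failures
```

From the agent jail, something like count_failed_connects("10.0.0.10", 8140, attempts=2000) would then give the number of failed connects, once before and once after applying the cpuset. The address is a placeholder, and 8140 assumes the puppetmaster listens on its default port.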

/Eirik



More information about the freebsd-stable mailing list