Debugging dropped shell connections over a VPN

Wed Jul 27 15:04:19 UTC 2011

On 07/27/11 06:50, Gary Palmer wrote:
> On Tue, Jul 26, 2011 at 01:35:16PM -0500, Paul Keusemann wrote:
>> On 07/26/11 08:05, Gary Palmer wrote:
>>> On Tue, Jul 26, 2011 at 06:53:59AM -0500, Paul Keusemann wrote:
>>>> Again, sorry for the sluggish response.
>>>>
>>>> On 07/20/11 15:15, Gary Palmer wrote:
>>>>> On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
>>>>>> On 07/07/11 14:39, Chuck Swiger wrote:
>>>>>>> On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
>>>>>>>> My setup is something like this:
>>>>>>>> - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and Solaris
>>>>>>>> machines running various OS versions.
>>>>>>>> - My gateway / firewall machine is running FreeBSD-8.1-RELEASE-p1 with
>>>>>>>> ipfw, nat and racoon for the firewall and VPN.
>>>>>>>>
>>>>>>>> The problem is that rlogin, ssh and telnet connections over the VPN get
>>>>>>>> dropped after some period of inactivity.
>>>>>>> You're probably getting NAT timeouts against the VPN connection if it is
>>>>>>> left idle.  racoon ought to have a config setting called natt_keepalive
>>>>>>> which sends periodic keepalives-- see whether that's disabled.
>>>>>>>
>>>>>>> Regards,
>>>>>> Thanks for the suggestions, Chuck.  Sorry it's taken so long to respond,
>>>>>> but I had to reconfigure and rebuild my kernel to enable IPSEC_NAT_T in
>>>>>> order to try this out.
>>>>>>
>>>>>> One thing that I did not explicitly mention before is that I am routing
>>>>>> a network over the VPN.
>>>>> Hi Paul,
>>>>>
>>>>> Even if you are not being NAT'd on the VPN there may be a firewall (or
>>>>> other active network component like a load balancer) with an
>>>>> overflowing state table somewhere at the remote end.  We see this
>>>>> frequently where I work with customer networks.  The firewall/VPN/network
>>>>> admin denies that it's a timeout issue, yet there is likely some device in
>>>>> the network that has a state table, and if the connection is idle for a
>>>>> few minutes it gets dropped.
>>>> Hmmm,  this seems likely.  Have you had any luck in finding the culprit
>>>> and resolving the problem?
>>> Unfortunately no.  We know the problem exists but as a vendor we have
>>> very little success in getting the customer to identify the problematic
>>> device inside their network as it only seems to affect our connections
>>> to them when we are helping them with problems, so there is almost
>>> always something more important going on and the timeout issue gets put
>>> on the back burner and forgotten.  We've worked around it in some
>>> places by using the ssh 'ServerAliveInterval' directive to make ssh
>>> send packets and keep the session open even if we're idle, but that
>>> doesn't always work.
>> OK, I found the ClientAliveInterval and ClientAliveCountMax settings in
>> the ssh_config man page.  I assume these are what you are referring to.
>> I tried setting ClientAliveInterval to 15 seconds with
>> ClientAliveCountMax set to 3 and this seems to help.  I've only tried
>> this a couple of times but I have seen an ssh session stay alive for
>> over an hour.  The bad news is that the sessions are still getting
>> dropped, but at least now I know when it happens.  Now I'm getting the
>> following message:
>>
>>      Received disconnect from 10.64.20.69: 2: Timeout, your session not responding.
>>
>> From a quick perusal of the OpenSSH source, it is not obvious whether
>> this message is coming from the client or the server side.  Initially,
>> because the keepalive timer is a server side setting, I assumed the
>> message was coming from the server side, but if the session is not
>> responding, how is the message getting to the client?  If it is a client
>> side problem, then I have much more flexibility to fix it.  All I can do is
>> whine about server side problems.
>
> Hi Paul,
>
> ServerAliveInterval is actually a client setting.  For example, putting
> this in your ~/.ssh/config file
>
> host *
> 	ServerAliveInterval 15
>
> will set the client to ping the server every 15 seconds and try to
> keep the connection alive.  You can replace '*' with a specific host
> if you want to be more targeted in your configuration.

Ah, I see.  I was looking at the Solaris ssh_config man page.  The 
OpenSSH ssh_config man page is third in the sequence.  The ServerAlive* 
options are not documented in the Solaris ssh_config man page.  I'll try 
it out too.  Thanks.
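
A more targeted version of the stanza above might look like the sketch
below.  The host pattern 10.64.20.* is only an assumption based on the
address in the disconnect message, and ServerAliveCountMax is shown at
what the OpenSSH ssh_config man page documents as its default of 3.

```
# ~/.ssh/config (OpenSSH client); the host pattern is an assumption.
# Probe the server after 15 idle seconds and give up after 3
# unanswered probes, i.e. after roughly 45 seconds.
Host 10.64.20.*
	ServerAliveInterval 15
	ServerAliveCountMax 3
```

With these values the client should send a probe through the encrypted
channel whenever the session has been idle for 15 seconds, which may be
enough traffic to keep an intermediate state table entry from expiring.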

> I've never played with the server side settings for various reasons.
>
> Regards,
>
> Gary
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>

-- 
Paul Keusemann			                      pkeusem at visi.com
4266 Joppa Court		                      (952) 894-7805
Savage, MN  55378