VIMAGE UDP memory leak fix

Sat Nov 22 17:09:24 UTC 2014

On 21 Nov 2014, at 17:40, Adrian Chadd <adrian at freebsd.org> wrote:

>>> Skimming through a bunch of moderately loaded hosts with reasonably
>>> high uptime, I couldn't find one where net.inet.tcp.timer_race was
>>> not zero. Any suggestions on how best to reproduce the race(s) in
>>> tcp_timer.c?
>> 
>> They would likely occur only on very highly loaded hosts, as they require race conditions to arise between TCP timers and TCP close. I think I did manage to reproduce it at one stage, and left the counter in to see if we could spot it in production, and I have had (multiple) reports of it in deployed systems. I'm not sure it's worth trying to reproduce them, given that knowledge -- we should simply fix them.
> 
> Wasn't this just fixed by Julien @ Verisign?

I don't believe so, although it's the kind of thing Julien is very good at fixing!

The issue here is that we can't call callout_drain() from the contexts in which we finalise TCP connection close and attempt to free the inpcb. The 'easy' fix is to create a taskqueue thread to do the callout_drain() whenever we discover that callout_stop() can't guarantee that pending callouts are neither executing nor scheduled. We'd then defer the very tail of TCP teardown to that asynchronous context rather than trying to run it to completion in the current (and rather more sensitive) one. This should happen only very infrequently, so it would have little overhead in practice, although one would want to look carefully at the synchronisation behaviour to make sure it didn't happen so often that a backlog could build up.
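
The deferred-teardown idea above can be illustrated with a small userspace model. This is a hypothetical pthreads sketch, not FreeBSD kernel code and not the callout(9) or taskqueue(9) API. `struct conn`, `timer_try_stop()` (standing in for callout_stop()), `teardown_worker()` (standing in for the taskqueue thread, whose wait plays the part of callout_drain()) and `run_demo()` are all invented names. Close frees the object inline when the timer is quiescent and otherwise queues it for a thread that is allowed to sleep until the handler has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection; one global mutex guards all state for brevity. */
struct conn {
    bool         timer_pending; /* scheduled, not yet started */
    bool         timer_running; /* handler executing on another thread */
    struct conn *next;          /* link in the deferred-teardown queue */
};
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER; /* any state change */
static struct conn    *q_head;
static bool            q_stop;
int freed_inline, freed_deferred;                      /* updated under mtx */

static struct conn *conn_new(bool running)
{
    struct conn *c = calloc(1, sizeof(*c));
    if (c == NULL) abort();
    c->timer_pending = !running;
    c->timer_running = running;
    return c;
}

static void conn_free(struct conn *c)
{
    assert(!c->timer_pending && !c->timer_running); /* no timer may touch c */
    free(c);
}

/* Models callout_stop(): cancels a pending timer but cannot stop a running
 * one. Returns true only if the timer is now quiescent. Caller holds mtx. */
static bool timer_try_stop(struct conn *c)
{
    c->timer_pending = false;
    return !c->timer_running;
}

static void timer_handler_done(struct conn *c) /* simulated handler returns */
{
    pthread_mutex_lock(&mtx);
    c->timer_running = false;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
}

/* Models the taskqueue thread: it may sleep, so it can drain, then free. */
static void *teardown_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    for (;;) {
        while (q_head == NULL && !q_stop)
            pthread_cond_wait(&cv, &mtx);
        if (q_head == NULL)
            break;                          /* stop requested, queue empty */
        struct conn *c = q_head;            /* dequeue before waiting */
        q_head = c->next;
        while (c->timer_running)            /* the callout_drain() step */
            pthread_cond_wait(&cv, &mtx);
        conn_free(c);
        freed_deferred++;
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Final teardown: never waits for a running handler, only takes mtx. */
static void conn_close(struct conn *c)
{
    pthread_mutex_lock(&mtx);
    if (timer_try_stop(c)) {                /* common case: finish inline */
        conn_free(c);
        freed_inline++;
    } else {                                /* rare case: defer the tail */
        c->next = q_head;
        q_head = c;
        pthread_cond_broadcast(&cv);
    }
    pthread_mutex_unlock(&mtx);
}

int run_demo(void)      /* one connection per path; returns number freed */
{
    pthread_t worker;
    if (pthread_create(&worker, NULL, teardown_worker, NULL) != 0) abort();
    conn_close(conn_new(false));            /* timer merely pending */
    struct conn *busy = conn_new(true);     /* handler mid-execution */
    conn_close(busy);                       /* deferred to the worker */
    timer_handler_done(busy);               /* worker may now free it */
    pthread_mutex_lock(&mtx);
    q_stop = true;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
    pthread_join(worker, NULL);
    return freed_inline + freed_deferred;
}
```

Build with -pthread. Calling run_demo() frees one connection on each path and returns 2. A single global mutex keeps the sketch short; a real implementation would presumably use finer-grained locking, and would still need the check mentioned above that deferrals stay rare enough for the queue not to back up.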

> As for the vimage stability side of things - I'd really like to see
> some VIMAGE torture tests written. Stuff like "do a high rate TCP
> connection test whilst creating and destroying VIMAGEs."

... and even for non-VIMAGE. :-)

Robert