TCP socket shutdown race condition

Ed Maste emaste at sandvine.com
Wed Aug 13 14:01:58 PDT 2003


Mike "Silby" Silbersack wrote:

>Well, as ui_ref is the best bet, redoing your tests with it expanded to
>ui_int is where we need to start before looking further. :)
>
>I believe that a uidinfo->ui_ref over/underflow could cause random memory
>corruption, so maybe the panic you're seeing comes about after a bunch of
>memory has already been trashed.
>
>So anyway, promote ui_ref to a u_int and retest.  Tell us what happens.

So as Scot mentioned (http://news.gw.com/freebsd.net/10900) it doesn't 
look like the ui_ref is overflowing, and the panic still happens with a 
32 bit ref count.

I think I've found the problem.

crfree() is called from a lot of places (I counted at least 20) including 
sodealloc() in the socket code, crcopy() etc.  It's called at splnet() from 
sodealloc().   I'm not sure what spl (if any) it might be called at from 
elsewhere, but certainly not splnet().

I believe the non-atomic (on SMP) increment and decrement in crhold() and 
crfree() result in a race condition, with the ref count ending up less than 
it should be.  By adding a busy wait loop to crhold() and crfree() and
making 
them "even less atomic" I was able to reliably make the same panic occur 
within a minute or two of starting my test.  (Running the same test, it took

on the order of a day for Scot to observe a panic.)

I've added an splhigh() around the code in crhold() and crfree() (with the 
delay left in) and haven't observed a panic yet.  I'm not sure what the best
way to fix this is, but the ref count inc/dec either needs to be protected
or made atomic.

I'm going to investigate the correct solution for this and supply a 
PR / patch, but for now let me know if more information is desired.

-ed


More information about the freebsd-net mailing list