svn commit: r278472 - in head/sys: netinet netinet6

Gleb Smirnoff glebius at FreeBSD.org
Fri Feb 13 21:21:33 UTC 2015


On Mon, Feb 09, 2015 at 03:11:21PM -0500, John Baldwin wrote:
J> On Monday, February 09, 2015 07:28:12 PM Randall Stewart wrote:
J> > Author: rrs
J> > Date: Mon Feb  9 19:28:11 2015
J> > New Revision: 278472
J> > URL: https://svnweb.freebsd.org/changeset/base/278472
J> > 
J> > Log:
J> >   This fixes a bug in the way that the LLE timers for nd6
J> >   and arp were being used. They basically would pass in the
J> >   mutex to the callout_init. Because they used this method
J> >   to the callout system, it was possible to "stop" the callout.
J> >   When flushing the table and you stopped the running callout, the
J> >   callout_stop code would return 1 indicating that it was going
J> >   to stop the callout (that was about to run on the callout_wheel blocked
J> >   by the function calling the stop). Now when 1 was returned, it would
J> >   lower the reference count one extra time for the stopped timer, then
J> >   a few lines later delete the memory. Of course the callout_wheel was
J> >   stuck in the lock code and would then crash since it was accessing
J> >   freed memory. By using callout_init(c, 1) we always get a 0 back
J> >   and the reference counting bug does not rear its head. We do have
J> >   to make a few adjustments to the callouts themselves though to make
J> >   sure it does the proper thing if rescheduled as well as gets the lock.
J> > 
J> >   Commented upon by hiren and sbruno
J> >   See Phabricator D1777 for more details.
J> > 
J> >   Reviewed by:	adrian, jhb and bz
J> >   Sponsored by:	Netflix Inc.
J> 
J> Eh, I looked at it, but I really, really don't like it.  I think 
J> callout_init_*() should be preferred to CALLOUT_MPSAFE whenever possible as it 
J> is less race-prone.  I think this should probably be fixed by adding Hans' 
J> callout_drain_async() instead, though this is fine as a temporary workaround.

I second these concerns. Please look at kern/165863 and r238990, which fixed it.
The transition from CALLOUT_MPSAFE to callout_init_rw() was intentional
and fixed races.

I have added to Cc the people who helped to track down those races. Maybe
someone still has the test scripts at hand. AFAIR, some of them could bring
a box down quite quickly.
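
To make the reference counting discussed in the quoted log easier to follow, here is a minimal userland sketch of the convention as I understand it: an armed timer holds one reference, a stop that returns 1 means the handler will not run and the caller drops that reference, and a stop that returns 0 means the handler keeps it. Every toy_* name below is hypothetical and made up for illustration. None of this is the kernel's callout(9) API or the actual lltable code, and it does not model locking at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a table entry; not a kernel structure. */
struct toy_entry {
	int  refcnt;   /* one for the table, plus one while a timer is armed */
	bool pending;  /* timer armed and handler not yet started */
	bool freed;    /* set where real code would free the memory */
};

/* Drop one reference; "free" the entry when the last one goes away. */
void toy_unref(struct toy_entry *e)
{
	assert(!e->freed);        /* touching a freed entry is the bug */
	if (--e->refcnt == 0)
		e->freed = true;
}

/* Arm the timer: the timer takes its own reference on the entry. */
void toy_timer_arm(struct toy_entry *e)
{
	e->refcnt++;
	e->pending = true;
}

/* Returns 1 if a pending timer was cancelled (handler will not run), else 0. */
int toy_timer_stop(struct toy_entry *e)
{
	if (!e->pending)
		return 0;
	e->pending = false;
	return 1;
}

/* The handler has started; from here on a stop can no longer cancel it. */
void toy_timer_begin(struct toy_entry *e)
{
	e->pending = false;
}

/* The handler is done and gives back the timer's reference. */
void toy_timer_finish(struct toy_entry *e)
{
	toy_unref(e);
}

/* Teardown: drop the timer's reference only if the stop really cancelled it. */
void toy_entry_remove(struct toy_entry *e)
{
	if (toy_timer_stop(e))
		toy_unref(e);     /* timer's reference */
	toy_unref(e);             /* table's reference */
}

/* Runs both orderings; returns 0 if the counts come out as expected. */
int toy_run_scenarios(void)
{
	/* A: timer still pending, so the stop cancels it. */
	struct toy_entry a = { .refcnt = 1, .pending = false, .freed = false };
	toy_timer_arm(&a);
	toy_entry_remove(&a);
	if (!a.freed || a.refcnt != 0)
		return 1;

	/* B: handler already started, so the stop returns 0. */
	struct toy_entry b = { .refcnt = 1, .pending = false, .freed = false };
	toy_timer_arm(&b);
	toy_timer_begin(&b);
	toy_entry_remove(&b);     /* only the table's reference is dropped */
	if (b.freed || b.refcnt != 1)
		return 2;
	toy_timer_finish(&b);     /* handler drops the last reference */
	return b.freed ? 0 : 3;
}
```

Calling toy_run_scenarios() from a small main() should return 0. The point of the sketch is only that the return value of the stop decides who drops the timer's reference. If a stop reports 1 while the handler still goes on to run, the entry in this model would be released one time too many, which is the kind of crash the log describes.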

-- 
Totus tuus, Glebius.

