[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests).

Wed Feb 4 12:55:42 UTC 2015

rrs added a comment.

Ok guys, I have puzzled out what that crash *may* be that was posted by Hiren. The same
issue exists in the timeout code rewrite that Han's has up on the board as well. Though
the callout_drain_async() may solve it if the user called that instead. Here is what is happening.

1) CPU1 is running the callout from the callout wheel it gets to 
    softclock_call_cc() line 635, its all ready to run and unlocks the
    callout mutex. It then sees c_lock is populated and thus
    calls class->c_lock()...) to switch locks. It begins to spin
    in __rw_wlock_hard() in the while loop waiting to get the lock since its held.

2) lltable_free() has been called on CPU 2, it grabs its AFDATA lock and then goes through
    each entry and does
    LIST_FOREACH_SAFE(..) {
           LLE_WLOCK(lle); <-- this is the lock that CPU 1 is waiting on
           if(callout_stop(la->la_timer))
                 LLE_REMREF(lle)
            llentry_free(lle);
      }

3) The callout_stop() on CPU 2 does what it is supposed to and sets the cc_cancel bit to true and
     return 1. This causes callout_stop() to lower the reference count which means when llentry_free()
     is called the lle *is* freed. This calls into either in_lltable_free() or in6_lltable_free() which
     unlocks the lock (letting CPU 1 go forward) and then promptly destroys the lock
     and frees the memory.

4) Finally our spinning CPU 1 loops back around and finds the destroyed mutex and crashes.

Now, some interesting notes on this:

a) If the lle* code had used MPSAFE instead of passing the lock in, then zero would have been 
    returned since the callout "could not have been stopped". This would have left the reference
    and *not* freed the memory.. the timer would have done that has it executed.
b) If the lle* code just did not stop the timer, and instead let it run off, it would have freed itself
    on expiration.. of course what you really want here is "stop it if you can" but don't stop it
    if its already running. Which is why <a> is true since thats what you get if you control the
    lock. I
c) This little hole is also in Han's new code unless we change the user of the code
    to use the async_drain routine. Which again is changing the KPI something I don't like to do
    since its in *so* many places ;-) Of course we may want an async-drain I have not thought about that.
d) This race has always been there near as I can tell and really is not caused by the migration
    code that was added like the other code that was fixed.

I don't really see an easy way to fix this accept to change the caller to use MPSAFE.. or maybe make
a async_drain routine.. but then we still end up changing the caller ;-)

INLINE COMMENTS
  kern/kern_timeout.c:1159 sounds good I will make it so.
  kern/kern_timeout.c:1234 Imp:

  Yes, it looks wrong in the kern_timeout.c code.. let me fix it ;-)

REVISION DETAIL
  https://reviews.freebsd.org/D1711

To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, hselasky, adrian, imp, sbruno
Cc: hiren, jhb, kostikbel, emaste, delphij, neel, erj, freebsd-net