[Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage.

rrs (Randall Stewart) phabric-noreply at FreeBSD.org
Thu Feb 5 17:11:48 UTC 2015


rrs added a comment.

Jhb/Others

So lets go through your  scenario with code in arp:

a) softclock dequeues callout to run

-- Which calls softclock_call_cc 
    We make it to line:676 and see that "yes" the user (arp) init'd with a rw_mtx
    and run the next line 677 (to get the lock).

b) other thread grabs lle_lock
   -- Our other thread is a call in to flush the table in net/if_llatble.c the
      lucky winner is line 181.

c) softclock blocks on lle_wlock above
    -- from the call to line 677 right.

d) other thread tears down structure, unlocks lock, zeros memory, 0xdeadc0de, etc.

  -- Now here is where its interesting, the other thread does
      if (callout_stop(&lle->la_timer))
             LLE_REMREF(lle);
      lleentry_free

 llentry_free is going to do:
   - remove the entry from the lists
   - and in the end call LLE_FREE_LOCKED()
   - LLE_FREE_LOCKED() is a macro that checks
    if (lle->lle_refcnt == 1) 
           call free function
    else {
           LLE_REMREF(lle)
           LLE_WUNLOCK()
    }

   Since we are a "stoppable callout" and the callout has a lock, it will fall through (even though its not
   safe) and return 1, yes the callout was stopped. So we hit the call free function which in this
   case in_lltable_free
   which does:
        LLE_WUNLOCK(lle)
        LLE_LOK_DESTROY(lle)
        free(lle, M_LLTABLE)


e) softclock wakes up in mutex code and panics becuase the mutex is destroyed and it either triggers an assertion, follows a bad pointer trying to propagate priority or see if the "owner" is running, etc.

-- And we wake up and boom.

However, with the change  from callout_init_rw(c,..) -> callout_init(c, 1)  things change.

Instead you get a 0 return from callout_stop, since the callout can *not* be stopped at this point.
We don't do that first reference  lower so we hit the else case in llentry_free which just reduces
the count to 1 and unlocks and returns. 

Now our callout proceeds, getting the lock and it will then go through and check only
the pending flag (the active has been removed by the stop but we don't care). There
we now do the free and all is well.

That is how this fix avoids the issue.

Would it be better to have callout_async_drain()? Yes probably so, but then this
code would have to be restructured a lot more than this small change.

I hope that explains how it works here.. unless of course I am missing something???

R

REVISION DETAIL
  https://reviews.freebsd.org/D1777

To: rrs, imp, sbruno, gnn, rwatson, lstewart, kostikbel, adrian, bz, jhb
Cc: bz, emaste, hiren, julian, hselasky, freebsd-net


More information about the freebsd-net mailing list