Reference count race window

Thu Jan 2 22:17:40 UTC 2014

Hi,
  I am Suresh from NetAPP and I have  questions/queries related to the reference count usage in the BSD kernel. We are seeing some corruptions/use after free
  issues and while  debugging we found that the corruption pattern is a ucred/crgroups structure and started looking at ucred reference count implemenation.

This is my understanding of ref count race window,  please correct me if I am wrong.

It seems there is a timing window exposed by the FreeBSD reference count usage/implementation. Let's start with the definitions of the acquire and release routines
in freebsd/sys/sys/refcount.h

static __inline void
refcount_acquire(volatile u_int *count)
{

        atomic_add_acq_int(count, 1);
}

static __inline int
refcount_release(volatile u_int *count)
{
        u_int old;

        /* XXX: Should this have a rel membar? */
        old = atomic_fetchadd_int(count, -1);
        KASSERT(old > 0, ("negative refcount %p", count));
        return (old == 1);
}

As implemented, a call to refcount_acquire  atomically increments the reference count while refcount_release decrements
the reference count and returns true if this release dropped the reference count to zero.

Consider the following sequence of events in the absence of other external synchronization:

* Object foo has a refcount of 1
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.
* atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b,
  decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* thread a, seeing refcount_release return success, frees foo.
* thread b, believing it has a reference count on foo, continues to use it.

* The major hole here is that refcount_acquire is a void function. If it also returned status,
   calling software could determine that it had a valid reference and take appropriate action if it failed to acquire.

One such implementation might look like:
static __inline int
refcount_acquire(volatile u_int *count)
{
        u_int old;

        old = atomic_fetchadd_int(count, 1);
        return (old != 0);
}

This change would require modification of all calls to refcount_acquire and determining appropriate action in the case of a non-success return.

Without changing the return-value semantics of refcount_acquire, we have introduced a panic if we detected a race as below.
static __inline void
refcount_acquire(volatile u_int *count)
{
        u_int old;

        old = atomic_fetchadd_int(count, 1);
        if (old == 0) {
          panic("refcount_acquire race condition detected!\n");
        }
}

After this change , we have seen this panic in one of our systems. Could someone look at my understanding and give me some ways to narrow down this problem.
As I mentioned earlier, one option is to change refcount_acquire to be non void and change all the callers, but it seems there are many paths to be changed on failure case.

Thank you
Suresh