flowtable, collisions, locking and CPU affinity

Thu Feb 13 07:48:46 UTC 2014

On 12 February 2014 21:48, Gleb Smirnoff <glebius at freebsd.org> wrote:
> On Fri, Feb 07, 2014 at 04:12:56PM -0800, Adrian Chadd wrote:
> A> I've been knee deep in the flowtable code looking at some of the less
> A> .. predictable ways it behaves.
> A>
> A> One of them is the collisions that do pop up from time to time.
> A>
> A> I dug into it in quite some depth and found out what's going on. This
> A> assumes it's a per-CPU flowtable.
> A>
> A> * A flowtable lookup is performed, on say CPU #0
> A> * the flowtable lookup fails, so it goes to do a flowtable insert
> A> * .. but in between the two, the flowtable "lock" is released so
> A> it can do a route/adjacency lookup, and that grabs a lock
> A> * .. then the flowtable insert is done on a totally different CPU
> A> * .. which happens to _have_ the flowtable entry already, so it fails
> A> as a collision which already has a matching entry.
> A>
> A> Now, the reason for this is primarily because there's no CPU pinning
> A> in the lookup path and if there's contention during the route lookup
> A> phase, the scheduler may decide to schedule the kernel thread on a
> A> totally different CPU to the one that was running the code when the
> A> lock was entered.
> A>
> A> Now, Gleb's recent changes seem to have made the instances of this
> A> drop, but he didn't set out to fix it. So there's something about his
> A> changes that has changed the locking/contention profile that I was
> A> using to easily reproduce it.
> A>
> A> In any case - the reason it's happening above is because there's no
> A> actual lock held over the whole lookup/insert path. It's a per-CPU
> A> critical enter/exit path, so the only way to guarantee consistency is
> A> to use sched_pin() for the entirety of the function.
> A>
> A> I'll go and test that out in a moment and see if it quietens the
> A> collisions that I see in lab testing.
> A>
> A> Has anyone already debugged/diagnosed this? Can anyone think of an
> A> alternate (better) way to fix this?
>
> Can't we just reuse the colliding entry?
>
> Can you evaluate patch attached (against head) in your testing
> conditions?

It's late and I'm exhausted, but one of the things that stood out to
me was the uncertainty as to whether a thread that gets preempted by
some other work is resumed on the same CPU, or whether it can resume
on a different CPU. If it does migrate whilst the flowtable code is
also running on that other CPU, entries could be freed from underneath
the code that's about to use them.

I'll look at this in some more depth tomorrow, but I think the safest
thing to do right now is to use sched_pin() so that the concurrency
and locking model stays consistent. The window of opportunity for a
thread to be resumed on a different CPU may be almost zero, but we're
doing this over a million times a second on some forwarding-based
platforms.

Thanks,

-a