flowtable, collisions, locking and CPU affinity

Adrian Chadd adrian at freebsd.org
Sat Feb 8 00:12:58 UTC 2014


Hi,

I've been knee deep in the flowtable code, looking at some of the less
predictable ways it behaves.

One of them is the collisions that do pop up from time to time.

I dug into it in quite some depth and found out what's going on. This
assumes it's a per-CPU flowtable.

* A flowtable lookup is performed on, say, CPU #0
* the flowtable lookup fails, so it goes to do a flowtable insert
* .. but in between the two, the flowtable "lock" is released so it
can do a route/adjacency lookup, and that lookup grabs a lock
* .. so the flowtable insert can end up being done on a totally
different CPU
* .. which happens to _already have_ the flowtable entry, so the insert
fails as a collision with an existing matching entry.
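
To make the sequence above concrete, here's a tiny userspace model of
it. This is only a sketch, not the kernel code: the names ft_lookup(),
ft_insert() and ft_lookup_or_insert(), the table sizes and the "hash 0
means empty" convention are all made up for illustration, and a
different hash landing in the same bucket is simply overwritten.

```c
/* Userspace model of the lookup/insert race; NOT FreeBSD kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU    2	/* hypothetical CPU count */
#define NBUCKET 8	/* hypothetical buckets per table */

/* One tiny table per CPU; 0 marks an empty bucket. */
static uint32_t table[NCPU][NBUCKET];

/* True if this CPU's table already holds the flow. */
bool
ft_lookup(int cpu, uint32_t hash)
{
	assert(cpu >= 0 && cpu < NCPU && hash != 0);
	return (table[cpu][hash % NBUCKET] == hash);
}

/* Returns false on a collision: the bucket already holds this hash. */
bool
ft_insert(int cpu, uint32_t hash)
{
	assert(cpu >= 0 && cpu < NCPU && hash != 0);
	if (table[cpu][hash % NBUCKET] == hash)
		return (false);
	table[cpu][hash % NBUCKET] = hash;
	return (true);
}

/*
 * Look up on lookup_cpu; on a miss, insert on insert_cpu.  The two
 * differ when the thread was migrated during the route lookup.
 * Returns 1 on a hit, 0 on a clean insert, -1 on a collision.
 */
int
ft_lookup_or_insert(int lookup_cpu, int insert_cpu, uint32_t hash)
{
	if (ft_lookup(lookup_cpu, hash))
		return (1);
	/* The "lock" is dropped here; the route lookup may block. */
	return (ft_insert(insert_cpu, hash) ? 0 : -1);
}
```

If CPU #1's table already holds the flow, ft_lookup_or_insert(0, 1, hash)
misses on CPU #0 and then collides on CPU #1, whereas
ft_lookup_or_insert(0, 0, hash) inserts cleanly.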

The reason for this is primarily that there's no CPU pinning in the
lookup path. If there's contention during the route lookup phase, the
scheduler may decide to run the kernel thread on a totally different
CPU to the one that was running the code when the lock was entered.

Now, Gleb's recent changes seem to have made the instances of this
drop, but he didn't set out to fix it. So there's something about his
changes that has changed the locking/contention profile that I was
using to easily reproduce it.

In any case - the reason it's happening is that there's no actual lock
held over the whole lookup/insert path. The per-CPU "lock" is only a
critical enter/exit section (critical_enter()/critical_exit()) that is
dropped between the lookup and the insert, so the only way to guarantee
consistency is to use sched_pin() for the entirety of the function.
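
Roughly what I have in mind, again as a userspace model rather than a
patch: everything prefixed model_ below is hypothetical, and
model_sched_pin()/model_sched_unpin() only mimic the pin-count behaviour
of the kernel's sched_pin()/sched_unpin(). The migrate_to argument
stands in for the scheduler trying to move the thread while it is
blocked in the route lookup.

```c
/* Userspace model of pinning across lookup+insert; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NCPU    2	/* hypothetical CPU count */
#define MODEL_NBUCKET 8	/* hypothetical buckets per table */

struct model_thread {
	int cpu;	/* CPU the thread is currently running on */
	int pinned;	/* pin nesting count; > 0 forbids migration */
};

/* One tiny table per CPU; 0 marks an empty bucket. */
static uint32_t model_table[MODEL_NCPU][MODEL_NBUCKET];

void
model_sched_pin(struct model_thread *td)
{
	td->pinned++;
}

void
model_sched_unpin(struct model_thread *td)
{
	assert(td->pinned > 0);
	td->pinned--;
}

/* Stand-in for the scheduler moving a thread that blocked on a lock. */
void
model_migrate(struct model_thread *td, int new_cpu)
{
	assert(new_cpu >= 0 && new_cpu < MODEL_NCPU);
	if (td->pinned == 0)
		td->cpu = new_cpu;
}

/* Returns 1 on a hit, 0 on a clean insert, -1 on a collision. */
int
model_lookup_or_insert(struct model_thread *td, uint32_t hash, bool pin,
    int migrate_to)
{
	uint32_t *slot;
	int ret;

	if (pin)
		model_sched_pin(td);
	slot = &model_table[td->cpu][hash % MODEL_NBUCKET];
	if (*slot == hash) {
		ret = 1;
	} else {
		/* "Lock" dropped; route lookup; migration is attempted. */
		model_migrate(td, migrate_to);
		slot = &model_table[td->cpu][hash % MODEL_NBUCKET];
		if (*slot == hash) {
			ret = -1;
		} else {
			*slot = hash;
			ret = 0;
		}
	}
	if (pin)
		model_sched_unpin(td);
	return (ret);
}
```

With pin set, the insert always lands in the same per-CPU table the
lookup missed in, so the collision from the migrated case can't occur
in this model. Whether that holds up in the real code is what needs
testing.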

I'll go and test that out in a moment and see if it quietens the
collisions that I see in lab testing.

Has anyone already debugged/diagnosed this? Can anyone think of an
alternate (better) way to fix this?

Thanks,



-a


More information about the freebsd-arch mailing list