From gallatin at netflix.com Fri May 3 15:11:32 2019
From: gallatin at netflix.com (Drew Gallatin)
Date: Fri, 3 May 2019 11:11:15 -0400
Subject: RFC: NUMA mods for SO_REUSEPORT_LB
Message-ID:

The next patch up in my NUMA patchset is my patch to affinitize
SO_REUSEPORT_LB sockets. I have to admit that I'm not super happy with
it, and I'm looking for constructive feedback.

In our (Netflix) workload, we have an nginx master process which creates
N different listen sockets when SO_REUSEPORT_LB is in use. It forks off
the workers, and they then affinitize themselves as directed in the
nginx.conf (worker N might not be bound to CPU N). They then take over
the listen sockets and start serving.

To deal with this, I made a TCP_REUSPORT_LB_NUMA socket option. The
inpcblbgroup struct has been modified to add an il_numa_domain field.
When a group is created, this is set to M_NODOM ("numa wildcard"). On
lookup, when an mbuf has a non-M_NODOM m_numa_domain field set, only
groups with a matching numa domain are considered. If no matches are
found, a numa wildcard match is done.

When nginx wants to use this, it calls
setsockopt(... TCP_REUSEPORT_LB_NUMA...) on the existing listen socket.
This sockopt:

- gets the CPU affinity mask of the calling thread
- finds the current NUMA domain for the calling thread
- looks up the inp, removes it from the numa-wildcard (M_NODOM) group,
  and inserts it into a new group specific to that numa domain.

This actually works quite well for me, but I don't think it is ready for
prime-time. The sockopt API was admittedly done to satisfy my particular
use case, and I'm looking for feedback on how to improve it.
Specifically:

1) Is it OK to add a new option that modifies an existing listen socket?
   - This was the right choice for my application. Is it too awkward in
     general?

2) Should the sockopt put the job of selecting the appropriate numa
   domain onto the caller? Right now, everything is automatic. Should it
   just take an argument which corresponds to a NUMA domain (or -1 to
   remove the NUMA domain affinity)? Should it take an argument that
   corresponds to a CPUSET?

Any feedback is welcome.

Thanks,

Drew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reuse_numa.diff
Type: text/x-patch
Size: 16043 bytes
Desc: not available
URL:
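
For readers without the attachment, the lookup and regrouping behaviour described in the message can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions and is not taken from reuse_numa.diff. The names `SKETCH_NODOM`, `struct listener`, `pick_in_domain`, `lb_lookup` and `lb_set_domain` are hypothetical. Each "group" is modelled as the set of listeners sharing a domain tag rather than as a real inpcblbgroup. Picking a listener within a group by `hash % count` is an assumption about load balancing that the message does not spell out. The treatment of packets with no NUMA domain (they use the wildcard group) is also assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's M_NODOM ("numa wildcard"); value is an assumption. */
#define SKETCH_NODOM (-1)

/* Hypothetical model of one listen socket and the domain of its group. */
struct listener {
    int id;           /* which worker's listen socket this is */
    int numa_domain;  /* models the group's il_numa_domain */
};

/*
 * Pick one listener among those tagged with `domain`, spreading load by
 * hash. Returns the listener id, or -1 if no listener has that domain.
 */
static int
pick_in_domain(const struct listener *ls, size_t n, int domain, unsigned hash)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (ls[i].numa_domain == domain)
            count++;
    if (count == 0)
        return -1;

    size_t target = hash % count;
    for (size_t i = 0; i < n; i++) {
        if (ls[i].numa_domain != domain)
            continue;
        if (target == 0)
            return ls[i].id;
        target--;
    }
    return -1; /* not reached */
}

/*
 * Lookup as described in the message: a packet carrying a NUMA domain
 * only considers listeners in that domain; if there are none, it falls
 * back to the wildcard listeners. Packets without a domain go straight
 * to the wildcard listeners (assumed).
 */
static int
lb_lookup(const struct listener *ls, size_t n, int pkt_domain, unsigned hash)
{
    if (pkt_domain != SKETCH_NODOM) {
        int id = pick_in_domain(ls, n, pkt_domain, hash);
        if (id != -1)
            return id;
    }
    return pick_in_domain(ls, n, SKETCH_NODOM, hash);
}

/*
 * Models the effect of the sockopt: the listener leaves its current
 * group and joins the group for `domain`. Passing SKETCH_NODOM puts it
 * back in the wildcard group.
 */
static void
lb_set_domain(struct listener *l, int domain)
{
    assert(l != NULL);
    l->numa_domain = domain;
}
```

In this model the caller passes the domain explicitly, which corresponds to the alternative raised in question 2. The automatic behaviour in the patch would instead derive the domain from the calling thread before doing the equivalent of `lb_set_domain`.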