NUMA Support is there in FreeBSD.

Mon Oct 3 17:34:31 UTC 2011

On Mon, Oct 3, 2011 at 10:24 AM, Arnaud Lacombe <lacombar at gmail.com> wrote:
> Hi,
>
> On Mon, Oct 3, 2011 at 12:31 PM,  <mdf at freebsd.org> wrote:
>> On Mon, Oct 3, 2011 at 7:55 AM, satish kondapalli <nitw.satish at gmail.com> wrote:
>>> I am new to FreeBSD and just want to know whether FreeBSD supports NUMA.
>>> If FreeBSD supports NUMA, what is the kernel API to allocate memory?
>>> Is there an example driver, or any driver that uses the NUMA API?
>>>
>>> Please provide some input...
>>
>> The kernel is NUMA-aware (at least for x86),
>>
> Which "x86"? i386? amd64? Both?

Both; see sys/x86/acpica/srat.c which parses the SRAT table.

>> and memory is allocated
>> round-robin amongst the memory domains.  There are not yet any KPIs
>> for allocating memory in a specific NUMA domain, nor for binding
>> specific threads / processes to get their memory local to a bound cpu
>> instead of round robin.
>>
> I'm not sure I follow you. Say you have 2 memory domains attached to 2
> different CPU packages, each package providing a memory domain, 4
> physical cores and possibly 8 virtual ones. Say you have a network
> adapter supporting 8 RX/TX queues, dispatching RX packets to 8 netisr.
> Ideally, you'd want those 8 queues/netisr to each have an affinity for
> a given CPU/memory domain, and have the network adapter route flows
> evenly across those 8 CPUs. Now, if you allocate an mbuf from memory
> domain 1 and it ends up being processed by a CPU in domain 0, that is
> likely to introduce a performance penalty.

Your statement isn't incorrect.  What I'm saying is that there's no
KPI for requesting bound memory because, while the netisr example is
a fine one for where local memory is desired, the majority [1] of
processing is not bound to a CPU and so round-robin allocations will
produce uniform performance results -- that is, not the best possible,
but not wildly fluctuating as scheduling decisions over different runs
give different remote memory penalties.

[1] for some definition of 'majority'.
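
The trade-off described above can be sketched with a toy user-space
model.  This is only an illustration under stated assumptions: the names
alloc_policy, pick_domain and remote_pages are hypothetical and are not
FreeBSD KPIs, and "round-robin" is modeled simply as page number modulo
the number of domains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policies for this sketch; these are not FreeBSD KPIs. */
enum alloc_policy { POLICY_ROUND_ROBIN, POLICY_LOCAL };

/*
 * Pick the memory domain backing page number `page`.
 * alloc_domain is the domain of the CPU performing the allocation.
 */
static int
pick_domain(enum alloc_policy policy, size_t page, int alloc_domain,
    int ndomains)
{
	assert(ndomains > 0 && alloc_domain >= 0 && alloc_domain < ndomains);
	if (policy == POLICY_ROUND_ROBIN)
		return ((int)(page % (size_t)ndomains));
	return (alloc_domain);
}

/*
 * Count how many of npages pages are remote when they were allocated by
 * a thread running in alloc_domain and are later accessed from
 * run_domain (for example, after the scheduler migrated the thread).
 */
static size_t
remote_pages(enum alloc_policy policy, size_t npages, int alloc_domain,
    int run_domain, int ndomains)
{
	size_t page, remote = 0;

	for (page = 0; page < npages; page++)
		if (pick_domain(policy, page, alloc_domain, ndomains) !=
		    run_domain)
			remote++;
	return (remote);
}
```

With 8 pages and 2 domains, the round-robin policy gives 4 remote pages
whether the thread ends up running in domain 0 or domain 1.  The local
policy gives 0 remote pages if the thread stays in domain 0 and 8 if it
is migrated to domain 1, which is the run-to-run fluctuation mentioned
above.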

> Now, what about userland?
>
> This is certainly a horribly big picture :/

Yes, and that's why I said only that there's no KPI.  One reason there
is no KPI is that there are a lot of fiddly bits to take into account.

My experience at IBM on AIX was that NUMA is very easy to get wrong.
Specifically, what one usually wants is for the OS to get the answer
right (especially for userspace) without a lot of manual tuning.
Except for some specific applications, like netisr queues, a machine
doing HPC, or one mostly running e.g. an Oracle db server, there's too
much happening for any one program to configure itself "right" for all
the uses of that code.  I remember a lot of customer reports of problems
from overly aggressive local memory use.  Most of the time no one
complained when things had consistent performance, even if that wasn't
quite as fast as possible.

In fact, I may be wrong about the round-robin; I sent jhb@ a patch and
I no longer recall whether it's actually in CURRENT.  It's been over a
year since I thought about this much (BSDCan 2010 was the last time I
remember).

Cheers,
matthew