getting NUMA into the tree (userland most interesting for me)

K. Macy kmacy at freebsd.org
Thu Feb 19 22:49:19 UTC 2015


>> I personally don't think the infrastructure is far enough along that
>> this is near to be an interesting value proposition. However, that
>> said, I do believe that maintaining linux compatibility is important.
>> Thus I would be for adding it to the linux compatibility layer and
>> export it on the FreeBSD API side purely as an SPI until consensus is
>> reached.
>
> Yes, I think we have a fair bit to do in the kernel before we are in a
> position to export anything truly useful to userland unfortunately.  The last
> time I talked with Jeff about projects/numa (after the first draft of the wiki
> page) I came away with the impression that there might be some things we can
> pull out of that branch, but that it isn't suitable for merging upstream
> directly.  Jeff noted that he and Alan had gone through several iterations of
> this already (I believe at least 3 completely different policy designs) all of
> which had their own issues.
>
> Outside of the VM I think that we can keep the APIs somewhat stable by having
> this opaque policy cookie to pass around that we can redefine the guts of
> later.  However, various parts of the VM all have to handle whatever the
> policy defines, and while the vm_phys bits and contigmalloc() might be kind of
> obvious to implement, higher level VM layers like kmem() and malloc() are more
> complicated.  One thing that is in projects/numa is changes for UMA that we
> can hopefully reuse much of, but I don't recall how much (if any) of
> kmem/malloc is in there.  Also, while vm_phys is one of the first things to
> do, I know that Alan and Jeff have pending patches to remove the cache queue
> (since it is far less useful than it seems) which simplify vm_phys making it
> easier to implement NUMA policies there, so I'm hoping we can get that in
> sooner before having to start tearing up the VM too much.  This is why the
> stuff I currently have is targeted non-VM bits like interrupts as getting that
> correct is lower-hanging fruit that might provide some gains regardless.  Even
> once vm_phys is done I think the first thing to tackle next is contigmalloc to
> facilitate static bus_dma allocations (descriptor rings and such) being local
> to a device.
>

Contigmalloc improvements and cache queue removal are in the
Phabricator queue now. They are also prerequisites for per-CPU free
page caches, which are a huge scalability improvement for some
workloads such as Netflix's.

There is still a fair amount of scalability work (including Jeffr's
per-domain pagedaemon work) that really needs to happen before we can
seriously think about a general user-level NUMA interface.



-K


More information about the freebsd-arch mailing list