getting NUMA into the tree (userland most interesting for me)

Thu Feb 19 22:41:26 UTC 2015

On Thursday, February 19, 2015 01:32:13 PM K. Macy wrote:
> On Wed, Feb 18, 2015 at 8:10 PM, John-Mark Gurney <jmg at funkthat.com> wrote:
> > I would like to help drive getting NUMA into the tree.  Specifically,
> > I want userland allocations to be done from a specified domain.
> > 
> > I've looked at the projects/numa tree, but it appears that not much was
> > done to get userland mappings to be NUMA aware.
> > 
> > How are we going to do this?  Do people have code to do this?
> > 
> > I've looked at how Linux does this, at least at the programming
> > interface level.  They use mmap to create the mapping, and then call
> > mbind to tell the kernel which node(s) the allocations should come
> > from.  Is this what people are thinking?
> > 
> > I've checked the wiki status, and the userland section is quite
> > empty.
> 
> I personally don't think the infrastructure is far enough along for
> this to be an interesting value proposition yet.  That said, I do
> believe that maintaining Linux compatibility is important, so I would
> be in favor of adding it to the Linux compatibility layer and
> exporting it on the FreeBSD API side purely as an SPI until consensus
> is reached.

Yes, unfortunately I think we have a fair bit to do in the kernel before we
are in a position to export anything truly useful to userland.  The last
time I talked with Jeff about projects/numa (after the first draft of the wiki
page), I came away with the impression that there might be some things we can
pull out of that branch, but that it isn't suitable for merging upstream
directly.  Jeff noted that he and Alan had already gone through several
iterations of this (I believe at least 3 completely different policy designs),
each of which had its own issues.

Outside of the VM, I think we can keep the APIs somewhat stable by passing
around an opaque policy cookie whose guts we can redefine later.  However,
various parts of the VM all have to handle whatever the policy defines, and
while the vm_phys bits and contigmalloc() might be fairly obvious to
implement, higher-level VM layers like kmem() and malloc() are more
complicated.  The projects/numa branch does contain changes to UMA, much of
which we can hopefully reuse, but I don't recall how much (if any) of
kmem/malloc is in there.

Also, while vm_phys is one of the first things to do, I know that Alan and
Jeff have pending patches to remove the cache queue (since it is far less
useful than it seems).  Those patches simplify vm_phys and make it easier to
implement NUMA policies there, so I'm hoping we can get them in before we have
to start tearing up the VM too much.  This is why the work I currently have
targets non-VM bits like interrupts: getting those correct is lower-hanging
fruit that might provide some gains regardless.  Once vm_phys is done, I think
the next thing to tackle is contigmalloc(), so that static bus_dma allocations
(descriptor rings and such) can be local to a device.
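
A minimal C sketch of the "opaque policy cookie" idea, written as userland
code purely for illustration: callers only ever hold a pointer to an
incomplete struct type, so the policy's internals can be redefined later
without touching them.  Every name here (struct numa_policy,
numa_policy_create(), numa_policy_next_domain(), the two policy kinds) is
hypothetical and not an existing FreeBSD interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Public side: callers see only an incomplete type (the opaque cookie).
 * All names are hypothetical, not an existing FreeBSD API. */
struct numa_policy;

enum numa_policy_kind { NUMA_POLICY_FIXED, NUMA_POLICY_ROUND_ROBIN };

struct numa_policy *numa_policy_create(enum numa_policy_kind kind,
                                       int domain, int ndomains);
int numa_policy_next_domain(struct numa_policy *p);
void numa_policy_destroy(struct numa_policy *p);

/* Private side: the "guts".  In real code this definition would live in one
 * private .c file; it sits here only to keep the sketch self-contained. */
struct numa_policy {
    enum numa_policy_kind kind;
    int domain;   /* fixed domain, or next domain to hand out (round-robin) */
    int ndomains; /* number of memory domains in the system */
};

struct numa_policy *
numa_policy_create(enum numa_policy_kind kind, int domain, int ndomains)
{
    assert(ndomains > 0 && domain >= 0 && domain < ndomains);
    struct numa_policy *p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->kind = kind;
    p->domain = domain;
    p->ndomains = ndomains;
    return p;
}

/* Return the domain the next allocation should come from. */
int
numa_policy_next_domain(struct numa_policy *p)
{
    assert(p != NULL);
    if (p->kind == NUMA_POLICY_FIXED)
        return p->domain;
    int d = p->domain;
    p->domain = (p->domain + 1) % p->ndomains;
    return d;
}

void
numa_policy_destroy(struct numa_policy *p)
{
    free(p);
}
```

A consumer such as a hypothetical domain-aware contigmalloc() wrapper would
only call numa_policy_next_domain() and never inspect the fields, which is
what would let the policy design be reworked later without changing callers.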

-- 
John Baldwin