getting NUMA into the tree (userland most interesting for me)
John Baldwin
john at baldwin.cx
Thu Feb 19 22:41:26 UTC 2015
On Thursday, February 19, 2015 01:32:13 PM K. Macy wrote:
> On Wed, Feb 18, 2015 at 8:10 PM, John-Mark Gurney <jmg at funkthat.com> wrote:
> > I would like to help drive getting NUMA into the tree. Specifically,
> > getting userland allocations to be done from a specified domain.
> >
> > I've looked at the projects/numa tree, but it appears that not much was
> > done to get userland mappings to be NUMA aware.
> >
> > How are we going to do this? Do people have code to do this?
> >
> > I've looked at how Linux does this, at least from a programming
> > interface. They use mmap to create the mapping, and then use the call
> > mbind to tell the kernel where to handle the allocations. Is this
> > what people are thinking?
> >
> > I've checked the wiki status, and the userland section is quite
> > empty.
>
> I personally don't think the infrastructure is far enough along that
> this is close to being an interesting value proposition. However, that
> said, I do believe that maintaining Linux compatibility is important.
> Thus I would be for adding it to the Linux compatibility layer and
> exporting it on the FreeBSD API side purely as an SPI until consensus is
> reached.

Yes, unfortunately I think we have a fair bit to do in the kernel before we
are in a position to export anything truly useful to userland. The last
time I talked with Jeff about projects/numa (after the first draft of the wiki
page) I came away with the impression that there might be some things we can
pull out of that branch, but that it isn't suitable for merging upstream
directly. Jeff noted that he and Alan had already gone through several
iterations of this (I believe at least 3 completely different policy designs),
all of which had their own issues.

Outside of the VM I think that we can keep the APIs somewhat stable by having
an opaque policy cookie to pass around whose guts we can redefine later.
However, various parts of the VM all have to handle whatever the policy
defines, and while the vm_phys bits and contigmalloc() might be kind of
obvious to implement, higher-level VM layers like kmem() and malloc() are more
complicated. One thing that is in projects/numa is a set of changes for UMA,
much of which we can hopefully reuse, but I don't recall how much (if any) of
kmem/malloc is in there. Also, while vm_phys is one of the first things to
do, I know that Alan and Jeff have pending patches to remove the cache queue
(since it is far less useful than it seems), which simplifies vm_phys and
makes it easier to implement NUMA policies there, so I'm hoping we can get
that in sooner, before we have to start tearing up the VM too much. This is
why the stuff I currently have is targeted at non-VM bits like interrupts, as
getting that correct is lower-hanging fruit that might provide some gains
regardless. Even once vm_phys is done, I think the next thing to tackle is
contigmalloc, to facilitate static bus_dma allocations (descriptor rings and
such) being local to a device.
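
To make the "opaque policy cookie" idea above a bit more concrete, here is a
minimal sketch in plain C. Every name in it (struct domain_policy,
policy_cookie_t, policy_create_fixed(), policy_pick_domain(),
policy_destroy(), consumer_choose_domain()) is hypothetical and made up for
illustration; none of it comes from projects/numa or any existing FreeBSD
interface. It only shows the shape: consumers hold a pointer to an incomplete
type, so the policy's contents can be redefined later without touching them.

```c
#include <assert.h>
#include <stdlib.h>

/* "Public" side: consumers only ever see an incomplete type. */
struct domain_policy;                           /* hypothetical name */
typedef struct domain_policy *policy_cookie_t;  /* the opaque cookie */

policy_cookie_t policy_create_fixed(int domain);
int             policy_pick_domain(policy_cookie_t p, int ndomains);
void            policy_destroy(policy_cookie_t p);

/*
 * A consumer outside the policy code (think: an allocator or a driver).
 * It passes the cookie along and never looks inside it.
 */
int
consumer_choose_domain(policy_cookie_t p, int ndomains)
{
	return (policy_pick_domain(p, ndomains));
}

/*
 * "Private" side: the guts.  This layout is an assumption for the sketch
 * and could be replaced (round-robin, first-touch, ...) without changing
 * any of the code above.
 */
struct domain_policy {
	int preferred;          /* preferred memory domain */
};

policy_cookie_t
policy_create_fixed(int domain)
{
	policy_cookie_t p = malloc(sizeof(*p));

	if (p != NULL)
		p->preferred = domain;
	return (p);
}

int
policy_pick_domain(policy_cookie_t p, int ndomains)
{
	assert(ndomains > 0);
	/* No policy, or preferred domain not present: fall back to domain 0. */
	if (p == NULL || p->preferred < 0 || p->preferred >= ndomains)
		return (0);
	return (p->preferred);
}

void
policy_destroy(policy_cookie_t p)
{
	free(p);
}
```

With this shape, a caller would create a cookie once (say,
policy_create_fixed(1)), hand it to whatever does the allocation, and only
the policy code would need to change if the design is reworked again.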
--
John Baldwin