Question about adding flags to mmap system call / NVIDIA amd64 driver implementation

Thu Apr 30 21:41:32 UTC 2009

On Tuesday 28 April 2009 7:58:57 pm Julian Elischer wrote:
> Robert Noland wrote:
> > On Tue, 2009-04-28 at 16:48 -0500, Kevin Day wrote:
> >> On Apr 28, 2009, at 3:19 PM, Julian Bangert wrote:
> >>
> >>> Hello,
> >>>
> >>> I am currently trying to work a bit on the remaining "missing
> >>> feature" that NVIDIA requires
> >>> (http://wiki.freebsd.org/NvidiaFeatureRequests
> >>> or a back post in this ML) - the improved mmap system call.
> 
> 
> you might check with jhb (John Baldwin) as I think (from his
> p4 work) that he may be doing something in this area in p4.

After some prompting from Robert, who needs this for Xorg, I recently started 
hacking on this again.  However, I haven't tested it yet.  What I have done 
so far is in //depot/user/jhb/pat/... and it does the following:

1) Adds a vm_cache_mode_t.  Each arch defines the valid values for this (I've 
only done the MD portions of this work for amd64 so far).  Every arch must at 
least define a value for VM_CACHE_DEFAULT.

2) Stores a cache mode in each vm_map_entry struct.  This cache mode is then 
passed down to a few pmap functions: pmap_object_init_pt(), 
pmap_enter_object(), and pmap_enter_quick().  Several vm_map routines such as 
vm_map_insert() and vm_map_find() now take a cache mode to use when adding a 
new mapping.

3) Each VM object stores a cache mode as well (defaults to VM_CACHE_DEFAULT).  
When a VM_CACHE_DEFAULT mapping is made of an object, the cache mode of the 
object is used.

4) A new VM object type: OBJT_SG.  This object type has its own pager that is 
sort of like the device pager.  However, instead of invoking d_mmap() to 
determine the physaddr for a given page, it consults a pre-created 
scatter/gather list (an ADT from my branch for working on unmapped buffer 
I/O) to determine the backing physical address for a given virtual address.

5) A new callback for device mmap: d_mmap_single().  One of the features of 
this is that it can return a vm_object_t to be used to satisfy the mmap() 
request instead of using the device's device pager VM object.

6) A new mcache() system call similar to mprotect(), except that it changes 
the cache mode of an address range rather than the protection.  This may not 
be all that useful really.
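
For illustration only, here is a rough userland model of items 3 and 4.  It 
is not code from the branch: struct sg_seg, struct sg_model, 
effective_cache_mode() and sg_lookup() are hypothetical names, the numeric 
enum values are made up, and the lookup is keyed by an offset into the 
object (an assumption about how the pager would index the list).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the real kernel definitions. */
typedef enum { VM_CACHE_DEFAULT = 0, VM_CACHE_WC = 1 } vm_cache_mode_t;

/* One physically contiguous segment of a scatter/gather list. */
struct sg_seg {
	uint64_t paddr;		/* physical start address */
	size_t len;		/* length in bytes */
};

struct sg_model {
	const struct sg_seg *segs;
	size_t nsegs;
};

/* Item 3: a VM_CACHE_DEFAULT mapping falls back to the object's mode. */
static vm_cache_mode_t
effective_cache_mode(vm_cache_mode_t entry_mode, vm_cache_mode_t object_mode)
{
	return (entry_mode == VM_CACHE_DEFAULT ? object_mode : entry_mode);
}

/*
 * Item 4: walk the list to find the physical address backing 'offset'.
 * Returns 0 and fills *paddr on success, -1 if offset is past the end.
 */
static int
sg_lookup(const struct sg_model *sg, uint64_t offset, uint64_t *paddr)
{
	size_t i;

	for (i = 0; i < sg->nsegs; i++) {
		if (offset < sg->segs[i].len) {
			*paddr = sg->segs[i].paddr + offset;
			return (0);
		}
		/* Not in this segment; skip past it. */
		offset -= sg->segs[i].len;
	}
	return (-1);
}
```

The point of the model is that the pager needs no driver callback per 
page: the list alone answers the "which physical page?" question.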

Given all this, a driver could do the following to map a "thing" as WC in both 
userland and the kernel:

1) When it learns about a "thing" it creates an SG list to describe it.  If 
the "thing" consists of userland pages, it has to wire the pages first.  The 
driver can use vm_pager_allocate() to create an OBJT_SG VM object.  It sets 
the object's cache mode to VM_CACHE_WC (if the arch supports that).

2) When userland wants to map the "thing" it does a device mmap() with a 
proper length and a file offset that is a cookie for the "thing".  The device 
driver's d_mmap_single() recognizes the magic file offset and returns 
the "thing"'s VM object.  Since the mapping info is now part of a normal 
object mapping, it will go away via munmap(), etc.  The driver no longer has 
to do weird gymnastics to invalidate mappings from its device pager 
as "transient" mappings are no longer stored in the device pager.

3) When the driver wants to map the "thing" into the kernel, it can use 
vm_map_find() to insert the "thing"'s VM object into the kernel map.
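
The cookie lookup in step 2 could look roughly like the userland model 
below.  This is only a sketch under assumptions: struct thing_model and 
thing_mmap_single() are hypothetical names, a plain pointer stands in for 
the vm_object_t, and resetting the offset to 0 (so the mapping starts at 
the beginning of the object) is a guess at reasonable behavior, not 
something the branch is known to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver bookkeeping for one "thing"; not a kernel type. */
struct thing_model {
	uint64_t cookie;	/* magic file offset handed to userland */
	size_t size;		/* bytes described by the thing's SG list */
	void *object;		/* stands in for the thing's vm_object_t */
};

/*
 * Model of what a d_mmap_single()-style handler might do: match the
 * requested file offset against known cookies and hand back the object.
 * Returns 0 and sets *objp on a match, -1 if the offset is not a known
 * cookie or the requested length exceeds the thing's size.
 */
static int
thing_mmap_single(const struct thing_model *things, size_t nthings,
    uint64_t *offset, size_t size, void **objp)
{
	size_t i;

	for (i = 0; i < nthings; i++) {
		if (things[i].cookie != *offset)
			continue;
		if (size > things[i].size)
			return (-1);
		*objp = things[i].object;
		*offset = 0;	/* assumption: map from the object's start */
		return (0);
	}
	return (-1);
}
```

Returning failure for an unknown offset leaves room for the caller to fall 
back to the ordinary device pager path.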

And I think that is all there is to it.  I still need to test this somehow, 
though, and confirm that it meets the needs of Robert and Nvidia.

-- 
John Baldwin