NVIDIA FreeBSD kernel feature requests
czander at nvidia.com
Thu Jun 29 11:12:35 UTC 2006
NVIDIA has been looking at ways to improve its graphics driver for the
FreeBSD i386 platform, as well as investigating the possibility of adding
support for the FreeBSD amd64 platform, and identified a number of
obstacles. Some progress has been made to resolve them, and NVIDIA would
like to summarize the current status. We would also like to thank John
Baldwin and Doug Rabson for their valuable help.
This summary makes an attempt to describe the kernel interfaces needed by
the NVIDIA FreeBSD i386 graphics driver to achieve feature parity with
the Linux/Solaris graphics drivers, and/or required to make support for
the FreeBSD amd64 platform feasible. It also describes some of the
technical difficulties encountered by NVIDIA during the FreeBSD i386
graphics driver's development, how these problems have been worked around
and what could be done to solve them better.
While the following is focused on the NVIDIA FreeBSD graphics drivers, we
believe the interfaces discussed below are generally applicable to any
modern high performance graphics driver.
The interfaces in question can be loosely grouped into three classes:
reliability, compatibility and performance.
Reliability:
The NVIDIA graphics driver needs to be able to create uncached kernel
and user mappings of I/O memory, such as NVIDIA GPU registers. The
FreeBSD kernel does not currently provide the interfaces necessary to
specify the memory type when creating such mappings, which makes it
difficult for the NVIDIA graphics driver to guarantee that the correct
memory type is selected.
Kernel mappings of I/O memory can be created with the pmap_mapdev()
interface; user mappings are created with mmap(2). On FreeBSD i386 and
on FreeBSD amd64, the effective memory type of mappings created with
either interface is determined by a given system's MTRR configuration
by default, which will specify the correct UC memory type in most, but
not in all cases.
MTRR configurations with non-UC memory ranges overlapping I/O memory
mapped via pmap_mapdev() or mmap(2) can result in the incorrect memory
type being selected, which can impair reliability.
To reduce the likelihood of problems, the FreeBSD i386 driver updates
the mappings returned by pmap_mapdev() with the PCD/PWT flags to force
use of the UC memory type. On FreeBSD amd64, the presence of the large
static direct mapping, which uses 2MB pages, makes this approach
infeasible.
In the case of user mappings, limited control over the memory type can
be exerted with the help of MTRRs, but their lack of flexibility
greatly reduces the feasibility of this approach.
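The i386 workaround relies on the x86 page attribute mechanism: on processors with PAT support, the PAT, PCD and PWT bits of a 4KB page table entry form a three-bit index into the IA32_PAT MSR, and with the power-on PAT value, setting PCD and PWT selects entry 3, which is UC. A page-level UC type takes precedence over whatever type the MTRRs specify. The user-space sketch below decodes that index; the constant and function names are illustrative (only the bit positions and the PAT encodings are architectural), and the combination with non-UC MTRR types is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 4KB PTE bits (illustrative names). */
#define PTE_PWT (1ULL << 3)
#define PTE_PCD (1ULL << 4)
#define PTE_PAT (1ULL << 7)   /* PAT bit of a 4KB PTE */

/* Memory type encodings used in the PAT MSR. */
enum memtype { MT_UC = 0x00, MT_WC = 0x01, MT_WT = 0x04,
               MT_WP = 0x05, MT_WB = 0x06, MT_UC_MINUS = 0x07 };

/* Power-on value of IA32_PAT: entries 0..3 = WB, WT, UC-, UC (repeated for 4..7). */
#define PAT_DEFAULT 0x0007040600070406ULL

/* Page-level memory type selected by a 4KB PTE under the given PAT value. */
static enum memtype pte_memtype(uint64_t pte, uint64_t pat_msr)
{
    unsigned idx = ((pte & PTE_PAT) ? 4u : 0u) |
                   ((pte & PTE_PCD) ? 2u : 0u) |
                   ((pte & PTE_PWT) ? 1u : 0u);
    return (enum memtype)((pat_msr >> (idx * 8)) & 0x07);
}

/* Force UC the way the i386 driver does: set PCD and PWT, clear PAT. */
static uint64_t pte_force_uc(uint64_t pte)
{
    return (pte & ~PTE_PAT) | PTE_PCD | PTE_PWT;
}
```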
1) The NVIDIA FreeBSD graphics driver needs a new interface that
supports the creation of UC kernel mappings on FreeBSD i386 and on
FreeBSD amd64.
John Baldwin is working on a new interface, pmap_mapdev_attr(), which
will allow the NVIDIA graphics driver to create UC kernel mappings
on FreeBSD i386 and on FreeBSD amd64; the implementation on the latter
platform will handle the direct mapping transparently.
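Handling the direct mapping presumably involves demoting the 2MB entry that covers a page into 512 4KB entries before the attributes of that one page can be changed. The following user-space sketch models only that demotion step on amd64-format entries; the names are hypothetical, and a real pmap implementation would additionally have to allocate the page table page, install it atomically and invalidate the TLB. Note that bit 7 is the page size (PS) bit in a 2MB entry but the PAT bit in a 4KB entry, while a 2MB entry keeps its PAT bit in bit 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_V        (1ULL << 0)
#define PG_RW       (1ULL << 1)
#define PG_PWT      (1ULL << 3)
#define PG_PCD      (1ULL << 4)
#define PG_PS       (1ULL << 7)   /* page size bit in a PDE */
#define PG_PTE_PAT  (1ULL << 7)   /* PAT bit in a 4KB PTE */
#define PG_PDE_PAT  (1ULL << 12)  /* PAT bit in a 2MB PDE */
#define PG_NX       (1ULL << 63)

#define PAGE_SIZE   4096ULL
#define NPTEPG      512                     /* 4KB PTEs per 2MB page */
#define PDE_FRAME   0x000FFFFFFFE00000ULL   /* 2MB frame, bits 21..51 */

/* Split one 2MB PDE into 512 PTEs mapping the same memory with the same attributes. */
static void demote_pde(uint64_t pde, uint64_t ptes[NPTEPG])
{
    uint64_t base = pde & PDE_FRAME;
    /* Keep the low attribute bits except PS, which means PAT in a PTE. */
    uint64_t flags = (pde & 0xFFFULL & ~PG_PS) | (pde & PG_NX);

    assert(pde & PG_PS);
    if (pde & PG_PDE_PAT)
        flags |= PG_PTE_PAT;            /* move PAT from bit 12 to bit 7 */
    for (size_t i = 0; i < NPTEPG; i++)
        ptes[i] = (base + i * PAGE_SIZE) | flags;
}

/* Make a single 4KB page UC (PAT index 3 under the power-on PAT). */
static void set_page_uc(uint64_t ptes[NPTEPG], size_t idx)
{
    ptes[idx] = (ptes[idx] & ~PG_PTE_PAT) | PG_PCD | PG_PWT;
}
```

After demote_pde(), set_page_uc() can change one 4KB page to UC without affecting the other 511 pages of the former 2MB mapping.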
2) As described above, user mappings of I/O memory are created via the
mmap(2) interface and the FreeBSD device pager; unfortunately, drivers
do not currently have control over the memory type used.
The NVIDIA FreeBSD graphics driver needs to be able to specify the
memory type used for user mappings created via mmap(2). This interface
is also important for high performance graphics (see 'Performance',
item 1, below).
Compatibility:
1) The NVIDIA graphics driver needs to be able to set the memory type of
the kernel mapping of memory allocated with malloc()/contigmalloc()
to UC, which presents essentially the same problems as those outlined
above for I/O memory mappings.
The ability to change the memory type is necessary to avoid aliasing
problems when the memory is mapped into the AGP aperture, which is
accessed via WC user mappings. If the creation of UC/WC user mappings
becomes possible for system memory in the future (see below), the
ability to change the memory type of the associated kernel mappings to
UC will be important for the same reason.
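The aliasing concern can be expressed as a simple check over all mappings of one physical page: flag the page if one mapping is cacheable (WB, WT or WP) while another is uncacheable (UC or WC). The sketch below encodes that rule, which matches the arrangement described above (UC kernel mapping, WC aperture mapping); the names are hypothetical, and the rule is a simplification, since the processor manuals discourage aliasing a page with any mix of memory types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum memtype { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB };

static bool memtype_cacheable(enum memtype t)
{
    return t == MT_WB || t == MT_WT || t == MT_WP;
}

/* True if the mappings of one physical page mix cacheable and uncacheable types. */
static bool aliases_conflict(const enum memtype *types, size_t n)
{
    bool cached = false, uncached = false;

    for (size_t i = 0; i < n; i++) {
        if (memtype_cacheable(types[i]))
            cached = true;
        else
            uncached = true;
    }
    return cached && uncached;
}
```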
Newer NVIDIA FreeBSD i386 graphics drivers manually update the memory
type of the kernel mappings of malloc() allocated memory using the
approach described for kernel mappings above. This is not feasible on
FreeBSD amd64 due to the static direct mapping (see above).
The NVIDIA FreeBSD graphics driver needs an interface that allows it
to change the memory type of the kernel mapping(s) of system memory
allocated with malloc()/contigmalloc(). The interface should flush the
CPU caches and the TLB when necessary.
John Baldwin is working on pmap_change_attr() for FreeBSD i386 and for
FreeBSD amd64, which will allow specifying the desired memory types
for kernel mappings created with e.g. malloc()/contigmalloc().
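Conceptually, an interface like pmap_change_attr() walks the 4KB entries covering the requested range, rewrites their cache-control bits and must then flush caches and invalidate TLB entries for the pages it actually changed. The model below uses a plain array as the page table and the power-on PAT encodings; its name and signature are hypothetical and are not meant to describe the interface John Baldwin is implementing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_PWT        (1ULL << 3)
#define PG_PCD        (1ULL << 4)
#define PG_PTE_PAT    (1ULL << 7)
#define PG_CACHE_MASK (PG_PWT | PG_PCD | PG_PTE_PAT)

/* Cache-control bit patterns under the power-on PAT (index = PAT:PCD:PWT). */
#define ATTR_WB 0ULL
#define ATTR_UC (PG_PCD | PG_PWT)

/*
 * Apply the cache-control bits 'attr' to ptes[first .. first + count - 1].
 * Returns the number of entries whose attributes changed; a kernel
 * implementation would flush caches and invalidate the TLB for those pages.
 */
static size_t change_attr_sim(uint64_t *ptes, size_t first, size_t count,
                              uint64_t attr)
{
    size_t changed = 0;

    for (size_t i = first; i < first + count; i++) {
        uint64_t npte = (ptes[i] & ~PG_CACHE_MASK) | attr;
        if (npte != ptes[i]) {
            ptes[i] = npte;
            changed++;
        }
    }
    return changed;
}
```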
2) The NVIDIA graphics driver needs to map different types of memory into
the address spaces of user clients, most commonly:
a) NVIDIA graphics device registers
b) NVIDIA graphics device frame buffer memory
c) AGP memory allocations (mapped via the AGP aperture)
d) DMA system memory allocations
This is currently done via mmap(2) and the device pager, i.e. the user
client performs a private ioctl(2) to allocate memory (this step is
specific to the b) - d) memory types), then calls mmap(2) to obtain a
user mapping of the memory. The NVIDIA graphics driver's d_mmap()
callback is invoked first to check the logical mmap(2) offset(s), then
again to return the associated page frame number(s) when the mapping
is accessed for the first time.
The device pager mechanism works well for a) - c), but not for d). The
system memory allocations are frequently very large (several MB) and
need to be allocated physically non-contiguous. This leads to problems
with the d_mmap() interface:
- d_mmap() is called per page with logical offsets computed based on
the mmap(2) base offset provided by the client and the current
page's position within the allocation, but no context information
is provided to d_mmap(). The NVIDIA FreeBSD graphics driver can
look up the associated system memory allocation and determine the
page frame number(s) for a given logical offset only if a linear
address range is associated with each system memory allocation, in
which case the start address can serve as the mmap(2) offset used
by the client and the logical offsets can be compared with each
allocation's linear address range.
Since the memory itself is not physically contiguous, the physical
addresses of pages in the allocation cannot be used as mmap(2)
offsets; a different address range needs to be used. The FreeBSD
i386 driver currently allocates its system memory with malloc() and
derives the address range used with mmap(2) from the allocation's
kernel virtual address range.
This allocation of DMA system memory with malloc() is problematic
on FreeBSD i386 PAE and FreeBSD amd64 systems with more than 4GB of
RAM and older NVIDIA GPUs limited to 32-bit DMA, since malloc()
does not currently allow drivers to specify allocation constraints
the way contigmalloc() does; it may therefore allocate physical memory
that cannot be addressed by such GPUs.
Further, since the physical addresses of non-contiguous allocations
cannot be used as mmap(2) offsets for system memory, but need to
be used for a) - c), the logical and physical addresses used as
mmap(2) offsets can potentially be confused by d_mmap(). The NVIDIA
graphics driver tries to minimize this risk, but cannot avoid it
completely without a significant performance penalty.
- The device pager was designed for I/O memory regions and it assumes
that d_mmap() will always return the same page frame number for a
given logical offset. As a result, d_mmap() is invoked exactly once
for any given logical offset by default. In the case of system memory
allocations, however, the physical page backing a given offset may
change as the malloc()'d memory is freed/reallocated.
The NVIDIA FreeBSD graphics driver needs to manually invalidate the
translation cache to work around this problem. It does so with the
msync() system call, which was extended for this purpose in FreeBSD
4.7 and again in FreeBSD 4.9 and 5.2.1. This leads to performance
problems on some configurations.
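The per-page lookup the current scheme forces on d_mmap() can be modelled as follows: each system memory allocation is registered under the start of its kernel virtual address range, the client uses that address as its mmap(2) offset, and every logical offset passed to d_mmap() has to be matched against all registered ranges. The structure and function names are hypothetical, and a linear scan stands in for whatever data structure the driver actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* One physically non-contiguous system memory allocation. */
struct sysmem_alloc {
    uintptr_t       kva;     /* start of its kernel virtual range, used as mmap(2) offset */
    size_t          npages;
    const uint64_t *pfns;    /* page frame number of each page */
};

/*
 * Model of the per-page d_mmap() lookup: only the logical offset is known,
 * so every registered range has to be searched. Returns 0 on success.
 */
static int sysmem_offset_to_pfn(const struct sysmem_alloc *allocs,
                                size_t nallocs, uintptr_t offset,
                                uint64_t *pfn)
{
    for (size_t i = 0; i < nallocs; i++) {
        const struct sysmem_alloc *a = &allocs[i];
        if (offset >= a->kva && offset < a->kva + a->npages * PAGE_SIZE) {
            *pfn = a->pfns[(offset - a->kva) >> PAGE_SHIFT];
            return 0;
        }
    }
    return -1;   /* not system memory: treated as an I/O or AGP offset */
}
```

If a physical offset meant for a) - c) happens to fall inside a registered kernel virtual range, this lookup misinterprets it, which is the confusion between logical and physical offsets described above.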
The NVIDIA FreeBSD graphics driver needs a different interface to make
the mapping of system memory allocations via mmap(2) simpler. If the
d_mmap() callback were extended to be called with the base offset in
addition to the current offset, the first two of the problems detailed
above would no longer be an issue; the NVIDIA graphics driver would
then be able to use physical addresses as mmap(2) offsets for a) - d).
The new interface must not require a FreeBSD-specific ioctl(2), as this
would break compatibility with the NVIDIA Linux OpenGL library used
in the FreeBSD Linux ABI compatibility environment.
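With the base offset available, the driver could identify an allocation by the physical address of its first page and index into it directly, so physical addresses could serve as mmap(2) offsets for a) - d) alike. The sketch below assumes a hypothetical callback shape taking (base, offset); it is not an existing FreeBSD interface and the names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

struct sysmem_alloc {
    size_t          npages;
    const uint64_t *paddrs;   /* physical address of each (non-contiguous) page */
};

/*
 * Hypothetical lookup for an extended d_mmap(): 'base' is the offset the
 * client passed to mmap(2), i.e. the physical address of the allocation's
 * first page; 'offset' is the current per-page offset. Returns 0 on success.
 */
static int lookup_by_base(const struct sysmem_alloc *allocs, size_t nallocs,
                          uint64_t base, uint64_t offset, uint64_t *paddr)
{
    for (size_t i = 0; i < nallocs; i++) {
        const struct sysmem_alloc *a = &allocs[i];
        if (a->npages == 0 || a->paddrs[0] != base)
            continue;          /* not the allocation this mapping was made for */
        if (offset < base || ((offset - base) >> PAGE_SHIFT) >= a->npages)
            return -1;         /* offset lies outside the allocation */
        *paddr = a->paddrs[(offset - base) >> PAGE_SHIFT];
        return 0;
    }
    return -1;                 /* base does not start a system memory allocation */
}
```

A page in the middle of an allocation is rejected as a base, so the offsets of different allocations cannot be mixed up.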
3) To be able to support FreeBSD i386 PAE and FreeBSD amd64 systems with
more than 4GB of physical memory and NVIDIA GPUs that are limited to
32-bit DMA, the NVIDIA FreeBSD graphics driver will need to be updated
to allocate memory from within the first 4GB of memory.
Unfortunately, this is not feasible with the current interfaces. The
malloc() interface does not allow the caller to specify allocation
constraints and while contigmalloc() does, its usefulness is currently
limited. This is because DMA memory cannot realistically be allocated
contiguously unless the allocations are very small, and because a
contiguous address range is needed for mmap(2), as described above,
which would need to be maintained separately for contigmalloc() memory.
The introduction of a malloc() variant that allows the specification
of allocation constraints would solve the addressing problem, but
due to the problems caused by using both logical and physical addresses
as mmap(2) offsets, a different solution would be preferred. By making
it possible to use physical addresses exclusively as mmap(2) offsets,
as described above, the NVIDIA FreeBSD graphics driver could use the
contigmalloc() interface to allocate the individual pages of the larger
allocations.
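Allocating the individual pages with a [low, high] constraint, as contigmalloc() permits, could look roughly like the model below, where an array stands in for the kernel's free page pool and high is 0xffffffff for a GPU limited to 32-bit DMA. The function names and the pool are hypothetical; in the kernel, each page would instead come from a PAGE_SIZE contigmalloc() call with the corresponding low and high arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Does the whole page at 'paddr' lie within [low, high]? */
static bool page_in_range(uint64_t paddr, uint64_t low, uint64_t high)
{
    return paddr >= low && paddr + PAGE_SIZE - 1 <= high;
}

/*
 * Build an n-page allocation one page at a time, taking only pages that
 * satisfy the constraint. 'free_pages' models the free page pool; used
 * entries are marked with UINT64_MAX. Returns the number of pages obtained.
 */
static size_t alloc_constrained_pages(uint64_t *free_pages, size_t nfree,
                                      uint64_t low, uint64_t high,
                                      uint64_t *out, size_t n)
{
    size_t got = 0;

    for (size_t i = 0; i < nfree && got < n; i++) {
        if (free_pages[i] != UINT64_MAX &&
            page_in_range(free_pages[i], low, high)) {
            out[got++] = free_pages[i];
            free_pages[i] = UINT64_MAX;
        }
    }
    return got;
}
```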
If contigmalloc() were used, the NVIDIA FreeBSD graphics driver would
need to be able to create contiguous virtual mappings spanning more
than one page within larger virtually non-contiguous allocations; this
functionality would best be implemented in the FreeBSD kernel.
The 'vmap()' kernel interface does this on Linux. It takes an array of
pages and maps them into a single contiguous address range.
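What a vmap()-like interface provides can be modelled with a toy page table: scattered physical pages are entered into consecutive virtual page slots, after which translation within the range behaves as if the memory were contiguous. The flat table and all names are illustrative; a kernel implementation would also have to reserve kernel virtual address space, choose the memory type and invalidate the TLB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)
#define PG_V       1ULL
#define PG_RW      2ULL
#define NPT        64              /* entries in the toy page table */

static uint64_t toy_pt[NPT];       /* index = virtual page number */

/* Map 'n' physical pages into consecutive virtual pages starting at 'first_vpn'. */
static uintptr_t toy_vmap(const uint64_t *pages, size_t n, size_t first_vpn)
{
    assert(first_vpn + n <= NPT);
    for (size_t i = 0; i < n; i++)
        toy_pt[first_vpn + i] = (pages[i] & ~PAGE_MASK) | PG_V | PG_RW;
    return (uintptr_t)first_vpn << PAGE_SHIFT;
}

/* Translate a virtual address through the toy page table; 0 if unmapped. */
static uint64_t toy_translate(uintptr_t va)
{
    size_t vpn = va >> PAGE_SHIFT;

    if (vpn >= NPT || !(toy_pt[vpn] & PG_V))
        return 0;
    return (toy_pt[vpn] & ~PAGE_MASK) | (va & PAGE_MASK);
}
```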
Performance:
1) For optimal PCI-E performance and improved compatibility with systems
where MTRR memory ranges do not provide sufficient flexibility, the
NVIDIA FreeBSD graphics driver needs to be able to specify the memory
type used for user mappings created with mmap(2).
John Baldwin is working on PAT support for FreeBSD, which will be used
by the pmap_mapdev_attr() and pmap_change_attr() kernel interfaces
referred to above. This support can provide the desired flexibility if
the d_mmap() interface is extended or complemented with a new one,
allowing drivers to take advantage of the PAT support.
In order to provide optimal PCI-E performance, NVIDIA FreeBSD graphics
drivers need to be able to create WC system memory mappings.
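The power-on PAT value contains no WC entry, so creating WC mappings typically involves reprogramming one PAT entry; Linux, for example, changes entry 1 (selected by PWT alone) from WT to WC. The sketch below only computes the new PAT value; writing it to the IA32_PAT MSR on every CPU, following the update procedure in the processor manuals, is not shown, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAT_DEFAULT 0x0007040600070406ULL  /* WB, WT, UC-, UC, repeated */
#define PAT_UC 0x00
#define PAT_WC 0x01
#define PAT_WB 0x06

/* Return 'pat' with entry 'idx' (0..7) replaced by memory type 'type'. */
static uint64_t pat_set_entry(uint64_t pat, unsigned idx, uint8_t type)
{
    assert(idx < 8);
    pat &= ~(0xFFULL << (idx * 8));
    return pat | ((uint64_t)type << (idx * 8));
}

/* Memory type of entry 'idx', i.e. the type selected by PAT:PCD:PWT == idx. */
static uint8_t pat_get_entry(uint64_t pat, unsigned idx)
{
    return (uint8_t)((pat >> (idx * 8)) & 0x07);
}
```

Every existing mapping that selects the reprogrammed entry changes type along with it, so such a change has to be made before any mapping relies on the entry's old meaning.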
2) The device pager mechanism is page-fault based, which incurs noticeable
overhead due to the large number of user/kernel context switches.
This can result in significant performance penalties with very large
or numerous kernel mappings. It also currently requires the use of the
msync() workaround (see above), which incurs additional overhead.
Performance with the NVIDIA FreeBSD graphics driver would benefit from
an mmap(2) interface that is independent of the device pager and
allows the mappings' page tables to be prebuilt. The Linux and Solaris
operating systems support such interfaces.
3) On Linux and Solaris, the NVIDIA graphics driver can maintain
per-open-instance data, i.e. data that is specific to a process's file
descriptors associated with NVIDIA character special files. This is
useful primarily to achieve optimal results with the driver's internal
notification mechanism, which is used to implement Sync-to-VBlank
functionality, among other things. On these two operating systems, the
NVIDIA graphics driver can selectively wake threads select(2)'ing the
device files (/dev/nvidia0..N).
The NVIDIA FreeBSD graphics driver can only maintain per-device state
at the moment. It wakes all processes waiting on /dev/nvidiaX and,
for each of these processes, needs to traverse a per-device event list
to check whether an event was delivered to it, which incurs some
overhead. The logic also cannot currently guarantee correct delivery
of events to different threads in the same process.
Future versions of the NVIDIA FreeBSD graphics driver are likely to
employ the notification mechanism more aggressively, to better support
composited X desktop functionality.
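The difference between per-device and per-open state can be sketched as follows: when each open instance keeps its own pending-event mask, an event is recorded on, and wakes, only the instance it is addressed to, and a woken waiter consumes its own events without scanning a shared per-device list. The structures, the fixed-size instance table and the event bitmask are hypothetical simplifications, not a proposed FreeBSD interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPENS 8

/* State kept per open(2) of a device node rather than per device. */
struct open_instance {
    bool     in_use;
    uint32_t pending;   /* bitmask of undelivered event types */
    unsigned wakeups;   /* stands in for waking the select(2)ing thread */
};

struct device_state {
    struct open_instance opens[MAX_OPENS];
};

/* Record 'event' (0..31) on one open instance and wake only that instance. */
static void notify_instance(struct device_state *dev, size_t inst,
                            unsigned event)
{
    assert(inst < MAX_OPENS && event < 32);
    struct open_instance *oi = &dev->opens[inst];
    assert(oi->in_use);
    oi->pending |= 1u << event;
    oi->wakeups++;
}

/* What a woken waiter does: take its own events, no list traversal needed. */
static uint32_t consume_events(struct device_state *dev, size_t inst)
{
    uint32_t ev = dev->opens[inst].pending;

    dev->opens[inst].pending = 0;
    return ev;
}
```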
Summary of Tasks:
# Task: implement pmap_mapdev_attr() on FreeBSD i386 and on FreeBSD amd64.
Motivation: allows reliable creation of kernel mappings of I/O
memory with specific cache attributes (with per-page
Priority: gates FreeBSD amd64 support.
Status: is being implemented for i386 and amd64 (work is being
done to allow easily breaking down 2MB pages).
# Task: design/implement better mmap(2) mechanism for mapping
memory to user space (context information, cache attributes).
Motivation: allows reliable creation of user mappings of DMA and
I/O memory and support for systems with more than
4GB of RAM.
Priority: gates improved FreeBSD i386 support (PCI-E performance,
SLI support, improved reliability); gates FreeBSD
Status: has not been started, pending.
# Task: implement pmap_change_attr() on FreeBSD i386 and on FreeBSD amd64.
Motivation: allows prevention of cache coherency problems.
Priority: gates FreeBSD amd64 support.
Status: is being implemented for i386 and amd64.
# Task: implement vmap()-like kernel interface.
Motivation: allows creation of contiguous kernel mappings of
parts of or complete non-contiguous DMA/system memory allocations.
Priority: gates support for systems with more than 4GB of RAM.
Status: has not been started.
# Task: implement mechanism to allow character drivers to
maintain per-open instance data (e.g. like the Linux
kernel's 'struct file *').
Motivation: allows per-thread NVIDIA notification delivery; also
reduces CPU overhead for notification delivery
from the NVIDIA kernel module to the X driver and to
Priority: should translate to improved X/OpenGL performance.
Status: has not been started.