x86 IOMMU support (DMAR)
kostikbel at gmail.com
Mon May 27 10:58:55 UTC 2013
For the past several months I have been working (and continue to
work) on a driver for Intel VT-d for FreeBSD. VT-d is sold as an I/O
virtualization technology, but in essence it is a DMA address
remapping engine, i.e. an advanced and improved I/O MMU, as also
found on other big-iron machines, e.g. PowerPC or Sparc. See the
Intel document titled 'Intel Virtualization Technology for Directed
I/O Architecture Specification' and the chipset datasheets for a
description of the facility.
The development was greatly facilitated by Jim Harris from Intel, who
gave me access to the Sandy and Ivy Bridge north bridge
documentation. John Baldwin patiently educated me about newbus and
helped develop the required hooks for integration with the existing
The core hardware element of VT-d is the DMA remapping unit, referred
to as DMAR both in the documentation and in the source code. Besides
DMA remapping, VT-d can also remap MSI/MSI-X interrupt messages.
FreeBSD could use this functionality for interrupt rebalancing,
instead of reprogramming the MSI registers of the PCI devices, but
this part is not (yet) implemented.
In the FreeBSD architecture, DMAR naturally fits as a busdma engine,
making it possible to eliminate bounce page copying. Another great
benefit of using DMAR is improved reliability and security, since DMA
transfers are only allowed to the memory areas explicitly designated
by the device driver as buffers. As noted by Jim Harris, this
security angle could find a use in the NTB driver.
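To make the "only designated buffers are reachable" property concrete,
here is a toy userspace model in C. It is only a sketch: the flat
one-level table, the toy_* names and the 16-page address space are all
invented for illustration, and the real hardware walks multi-level
page tables instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1u << TOY_PAGE_SHIFT)
#define TOY_NPAGES	16	/* toy I/O virtual address space, in pages */

/* One toy translation entry: valid bit plus physical page number. */
struct toy_pte {
	bool		valid;
	uint64_t	ppn;
};

/* A toy domain: flat table indexed by I/O virtual page number. */
struct toy_domain {
	struct toy_pte	pte[TOY_NPAGES];
};

/* Designate one page as a DMA buffer: iova page -> physical page. */
static bool
toy_map(struct toy_domain *d, uint64_t iova, uint64_t pa)
{
	uint64_t vpn = iova >> TOY_PAGE_SHIFT;

	assert(d != NULL);
	if (vpn >= TOY_NPAGES || d->pte[vpn].valid)
		return (false);
	d->pte[vpn].valid = true;
	d->pte[vpn].ppn = pa >> TOY_PAGE_SHIFT;
	return (true);
}

/* Revoke device access to the page containing iova. */
static void
toy_unmap(struct toy_domain *d, uint64_t iova)
{
	uint64_t vpn = iova >> TOY_PAGE_SHIFT;

	assert(d != NULL);
	if (vpn < TOY_NPAGES)
		d->pte[vpn].valid = false;
}

/* Translate a device address; returning false models a DMA fault. */
static bool
toy_translate(const struct toy_domain *d, uint64_t iova, uint64_t *pa)
{
	uint64_t vpn = iova >> TOY_PAGE_SHIFT;

	assert(d != NULL && pa != NULL);
	if (vpn >= TOY_NPAGES || !d->pte[vpn].valid)
		return (false);
	*pa = (d->pte[vpn].ppn << TOY_PAGE_SHIFT) |
	    (iova & (TOY_PAGE_SIZE - 1));
	return (true);
}
```

In this model a device access to any page that was never passed to
toy_map(), or that has since been passed to toy_unmap(), fails to
translate instead of reaching memory.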
The existing busdma code for x86 was split into a generic interface,
kept in busdma_machdep.c, and the bouncing implementation in
busdma_bounce.c. The DMAR-based implementation, which calls the DMAR
core, is located in busdma_dmar.c. No KPI is provided to manage
DMARs, but I plan to implement a proper interface after discussing
the needs of bhyve.
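The split into a generic layer and two backends can be pictured as a
method table selected per tag. The following userspace C sketch shows
the idea only; struct toy_dma_impl, struct toy_tag and all toy_*
functions are hypothetical names and not the actual contents of the
files above. Buffers are assumed page-aligned to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_tag;

/* Hypothetical per-implementation method table. */
struct toy_dma_impl {
	const char *name;
	/* Return the bus address the device should use for the buffer. */
	uint64_t (*load)(struct toy_tag *tag, uint64_t pa);
};

struct toy_tag {
	const struct toy_dma_impl *impl;
	uint64_t highaddr;	/* highest address the device can reach */
	uint64_t bounce_pa;	/* bounce: low page the data is copied through */
	uint64_t next_iova;	/* dmar: next free I/O virtual address */
	unsigned copies;	/* number of bounce copies that were needed */
};

/* Bounce backend: copy through a low page if the buffer is unreachable. */
static uint64_t
toy_bounce_load(struct toy_tag *tag, uint64_t pa)
{
	if (pa <= tag->highaddr)
		return (pa);
	tag->copies++;
	return (tag->bounce_pa);
}

/* DMAR backend: hand out an I/O virtual address, no copy needed. */
static uint64_t
toy_dmar_load(struct toy_tag *tag, uint64_t pa)
{
	uint64_t iova = tag->next_iova;

	(void)pa;	/* a real backend would enter pa into the page tables */
	tag->next_iova += 4096;
	return (iova);
}

static const struct toy_dma_impl toy_bounce_impl = { "bounce", toy_bounce_load };
static const struct toy_dma_impl toy_dmar_impl = { "dmar", toy_dmar_load };

/* Generic entry point: dispatch to whichever backend the tag selected. */
static uint64_t
toy_dmamap_load(struct toy_tag *tag, uint64_t pa)
{
	assert(tag != NULL && tag->impl != NULL);
	return (tag->impl->load(tag, pa));
}
```

With a 32-bit-limited device, loading a buffer above 4GB through the
bounce backend costs a copy, while the DMAR backend returns a low
I/O virtual address for the same buffer without copying. Choosing the
backend per tag is also what would allow individual devices to be
routed to one implementation or the other.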
I tried to support both i386 and amd64, but on i386 the limited KVA,
together with the busdma interface requirement of never sleeping in
calls from the driver, weakens some of the IOMMU guarantees. For
instance, to unload a map, the code needs to transiently map the DMAR
page table pages, which requires sleepable allocations of sf buffers.
As a result, map unload on i386 is done asynchronously in taskqueue
context, which makes it possible for a buggy device driver or
hardware to perform a transfer to freed pages for some time after the
unload. This problem is not present on amd64. For the same
reason of busdma KPI, I cannot use queued invalidation both for i386
and amd64.
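The window opened by the asynchronous unload on i386 can be shown
with a small userspace C model. This is a sketch under stated
assumptions: the dq_* names are invented, a fixed array stands in for
the page tables, and dq_task() stands in for the taskqueue handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DQ_NPAGES	8

struct dq_domain {
	bool	 mapped[DQ_NPAGES];	/* page is reachable by the device */
	unsigned pending[DQ_NPAGES];	/* pages waiting to be unmapped */
	size_t	 npending;
};

/* Synchronous unload: access is revoked before the call returns. */
static void
dq_unload_sync(struct dq_domain *d, unsigned page)
{
	assert(d != NULL && page < DQ_NPAGES);
	d->mapped[page] = false;
}

/* Deferred unload: the caller may not sleep, so only record the request. */
static void
dq_unload_deferred(struct dq_domain *d, unsigned page)
{
	assert(d != NULL && page < DQ_NPAGES && d->npending < DQ_NPAGES);
	d->pending[d->npending++] = page;
}

/* Stand-in for the taskqueue handler that performs the real unmapping. */
static void
dq_task(struct dq_domain *d)
{
	assert(d != NULL);
	for (size_t i = 0; i < d->npending; i++)
		d->mapped[d->pending[i]] = false;
	d->npending = 0;
}

/* Would a DMA transfer to this page still go through? */
static bool
dq_device_can_access(const struct dq_domain *d, unsigned page)
{
	assert(d != NULL);
	return (page < DQ_NPAGES && d->mapped[page]);
}
```

Between dq_unload_deferred() and the next run of dq_task(), the page
is still reachable by the device in this model, which corresponds to
the period in which a buggy driver or device could write to memory
that the driver already considers released.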
At the moment the code maintains a 1:1 relation between device
contexts and domains, which is fine for busdma. To support PCI
pass-through into virtual machines, the relation should be changed to
N:1 contexts to domains, which is planned but not yet done.
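The difference between the 1:1 and N:1 arrangements is only in how
many contexts point at one domain. A minimal C sketch, with
hypothetical ex_* names that are not taken from the driver, and with
contexts identified by the usual PCI bus/device/function triple:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A domain owns one set of translations (one I/O address space). */
struct ex_domain {
	int	 id;
	unsigned nctx;		/* number of contexts attached */
};

/* A context describes one PCI function and the domain it uses. */
struct ex_ctx {
	uint16_t	  rid;	/* bus[15:8], device[7:3], function[2:0] */
	struct ex_domain *domain;
};

/* Pack bus/device/function into a 16-bit PCI requester ID. */
static uint16_t
ex_rid(unsigned bus, unsigned dev, unsigned func)
{
	return ((uint16_t)(((bus & 0xff) << 8) | ((dev & 0x1f) << 3) |
	    (func & 0x7)));
}

/* Attach a context to a domain; several contexts may share a domain. */
static void
ex_ctx_attach(struct ex_ctx *ctx, uint16_t rid, struct ex_domain *dom)
{
	assert(ctx != NULL && dom != NULL);
	ctx->rid = rid;
	ctx->domain = dom;
	dom->nctx++;
}
```

For busdma each function gets its own domain, so nctx stays at 1. For
pass-through, all functions given to one guest would be attached to
the same domain and therefore share one set of translations.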
The overall state of the code is that I can boot multiuser over the
network from if_igb(4) or if_bce(4), and can use ahci(4) and ata(4)
attached disks without corrupting UFS volumes. Uhci(4) has known
issues because the RMRR mappings are established too late. Extensive
testing of the already written code is not done yet. Plans include:
- providing an external KPI for the VMM consumers
- supporting ATS
- making it possible to select busdma_dmar or busdma_bounce for
individual PCI functions
- stabilization work.
Also, by converting the ISA DMA implementation to use the busdma KPI,
it is possible to make the floppies work reliably again!
It is known that an IOMMU adds overhead due to the mapping and
unmapping done for each I/O. DMAR implementations usually have some
errata, and PCIe devices sometimes do not completely follow the
specification, causing misbehaviour with remapping enabled. For this
reason I do not plan to enable the IOMMU by default, and intend to
provide a way to route individual PCI devices to the bounce busdma
implementation.