GPU passthrough: mixed success on Linux, not yet on Windows

Robert Crowston crowston at protonmail.com
Wed Mar 27 16:16:11 UTC 2019


I added some logging in ppt_pci_reset() and I can confirm the GPU is indeed cycled through the D3 state at the beginning and end of the VM session. AFAIK that’s only D3 hot though; does FreeBSD have support for D3 cold?
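
For anyone curious, here is a minimal sketch of the kind of logging I mean, wrapped around the D3hot/D0 cycle that the power-management fallback performs. The function name is only illustrative and the exact placement inside ppt.c varies by version; ppt.c already pulls in the headers these calls need.

    /* Illustrative only: log the power state on either side of the
     * D3hot/D0 cycle that ppt_pci_reset() falls back to when FLR is
     * unavailable.  Placement inside ppt.c is approximate. */
    static void
    ppt_log_power_toggle(device_t dev)
    {
            device_printf(dev, "ppt reset: power state before toggle: D%d\n",
                pci_get_powerstate(dev));
            pci_set_powerstate(dev, PCI_POWERSTATE_D3);   /* D3hot only */
            pci_set_powerstate(dev, PCI_POWERSTATE_D0);
            device_printf(dev, "ppt reset: power state after toggle: D%d\n",
                pci_get_powerstate(dev));
    }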

On Wed, Mar 27, 2019 at 15:48, Nick Wolff <darkfiberiru at gmail.com> wrote:

> Hi Robert,
>
> So for problems 2 and 3 you may want to look at the ppt_pci_reset(device_t dev) function in ppt.c. I'm not sure why it isn't already dealing with both of your problems. Matt Macy may have some ideas.
>
> Great work, by the way.
>
> Thanks,
>
> Nick Wolff
>
> On Sun, Mar 17, 2019 at 12:23 PM Robert Crowston via freebsd-virtualization <freebsd-virtualization at freebsd.org> wrote:
>
>> Hi folks, this is my first post to the group. Apologies for length.
>>
>> I've been experimenting with GPU passthrough on bhyve. For background, the host system is FreeBSD 12.0-RELEASE on an AMD Ryzen 1700 CPU @ 3.8 GHz, with 32 GB of ECC RAM and two nVidia GPUs. I'm working with a Debian 9 Linux guest and a Windows Server 2019 guest (Desktop Experience installed). I also have a USB controller passed through for Bluetooth and a keyboard.
>>
>> With some unpleasant hacks I have succeeded in starting X on the Linux guest, passing through an nVidia GT 710 under the nouveau driver. I can run the MATE desktop and glxgears, both of which are smooth at 4K. The Unigine Heaven benchmark runs at an embarrassing 0.1 fps, and 2160p x264 video in VLC runs at about 5 fps. Neither appears to be CPU-bound in the host or the guest.
>>
>> The hack I had to make: I found that many instructions that access memory-mapped PCI BARs are not executed by the CPU in guest mode but are passed back to the hypervisor for emulation. This causes an assertion to fail inside passthru_write() in pci_passthru.c ["pi->pi_bar[baridx].type == PCIBAR_IO"], because that code does not expect to perform memory-mapped I/O on behalf of the guest. Examining the to-be-emulated instructions in vmexit_inst_emul() (e.g., movl (%rdi), %eax), I find they look benign, and I have no explanation for why the CPU refuses to execute them in guest mode.
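>>
>> To see exactly what is being handed back, I log the guest-physical address and the raw instruction bytes from each emulation request. A minimal, self-contained sketch of that logging follows; log_inst_emul() is a hypothetical helper of mine, not a bhyve function, and the caller would feed it whatever it pulls out of the vm_exit.
>>
>>     #include <stdint.h>
>>     #include <stdio.h>
>>
>>     /* Dump the guest-physical address and the instruction bytes that
>>      * were handed back to userspace for emulation. */
>>     static void
>>     log_inst_emul(uint64_t gpa, const uint8_t *inst, int ninst)
>>     {
>>             fprintf(stderr, "inst_emul at gpa 0x%016jx:", (uintmax_t)gpa);
>>             for (int i = 0; i < ninst; i++)
>>                     fprintf(stderr, " %02x", inst[i]);
>>             fprintf(stderr, "\n");  /* "8b 07" for movl (%rdi), %eax */
>>     }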
>>
>> As an amateur workaround, I removed the assertion; instead, I take the offset into the guest's BAR, translate that guest address into the host's address space, open(2) /dev/mem, mmap(2) that address, and perform the write directly. I do the same trick in passthru_read(). Ugly and slow, but functional.
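>>
>> A minimal sketch of that access path follows. raw_mmio_write32() is a hypothetical stand-alone helper, not the code as it actually sits inside passthru_write(); it maps and unmaps on every access, which is part of why this is slow, and passthru_read() does the mirror-image read.
>>
>>     #include <fcntl.h>
>>     #include <stdint.h>
>>     #include <sys/mman.h>
>>     #include <unistd.h>
>>
>>     /* Write a 32-bit value at a host-physical address by mapping the
>>      * containing page of /dev/mem. */
>>     static int
>>     raw_mmio_write32(uint64_t hpa, uint32_t val)
>>     {
>>             long pgsz = sysconf(_SC_PAGESIZE);
>>             uint64_t base = hpa & ~((uint64_t)pgsz - 1);
>>             uint8_t *p;
>>             int fd;
>>
>>             if ((fd = open("/dev/mem", O_RDWR)) < 0)
>>                     return (-1);
>>             p = mmap(NULL, pgsz, PROT_READ | PROT_WRITE, MAP_SHARED,
>>                 fd, base);
>>             close(fd);
>>             if (p == MAP_FAILED)
>>                     return (-1);
>>             *(volatile uint32_t *)(p + (hpa - base)) = val;
>>             munmap(p, pgsz);
>>             return (0);
>>     }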
>>
>> This code path is hit continuously whether or not X is running, with more activity whenever anything GPU-heavy runs. The accesses are always to BAR 1, mostly around the same offsets. I added some logging of these events; it produces about 100 lines per second while playing video. An excerpt:
>> ...
>> Unexpected out-of-vm passthrough write #492036 to bar 1 at offset 41100.
>> Unexpected out-of-vm passthrough write #492037 to bar 1 at offset 41100.
>> Unexpected out-of-vm passthrough read #276162 to bar 1 at offset 561280.
>> Unexpected out-of-vm passthrough write #492038 to bar 1 at offset 38028.
>> Unexpected out-of-vm passthrough write #492039 to bar 1 at offset 38028.
>> Unexpected out-of-vm passthrough read #276163 to bar 1 at offset 561184.
>> Unexpected out-of-vm passthrough read #276164 to bar 1 at offset 561184.
>> Unexpected out-of-vm passthrough read #276165 to bar 1 at offset 561184.
>> Unexpected out-of-vm passthrough read #276166 to bar 1 at offset 561184.
>> ...
>>
>> So my question here is:
>> 1. How do I diagnose why the instructions are not being executed in guest mode?
>>
>> Some other problems:
>>
>> 2. Once the virtual machine is shut down, the passed-through GPU doesn't get turned off. Whatever message was on the screen in the final throes of Linux's shutdown stays there. Maybe there is a specific detach command that bhyve or nouveau hasn't yet implemented? Alternatively, maybe I could exploit some power-management feature to reset the card when bhyve exits.
>>
>> 3. It is not possible to reboot the guest and then start X again without an intervening host reboot. The text console works fine. Xorg.0.log has a message like
>>     (EE) [drm] Failed to open DRM device for pci:0000:00:06.0: -19
>>     (EE) open /dev/dri/card0: No such file or directory
>> dmesg is not very helpful either.[0] I suspect that this is related to problem (2).
>>
>> 4. There is a known bug in the version of the Xorg server that ships with Debian 9: if the GPU takes too long to respond to the driver, the switch from an animated mouse cursor back to a static one sends the X server into a busy loop of gradually increasing stack depth.[1] For me, this happens consistently after I type my password into the Debian login dialog, and after roughly 120 minutes it locks up the host by eating all the swap. A workaround is to replace the guest's animated cursors with static ones. The bug is fixed in newer versions of X, but I haven't yet tested whether that fix works for me.
>>
>> 5. The GPU doesn't come to life until the nouveau driver kicks in. What is special about the driver? Why doesn't the UEFI firmware initialize the GPU and send output to it before boot? Any idea whether the problem is on the UEFI side or the hypervisor side?
>>
>> 6. The way Windows probes multi-BAR devices seems to be inconsistent with bhyve's model for storing I/O memory mappings. Specifically, I believe Windows writes the 0xffffffff sizing sentinel to all BARs on a device in one pass, then reads them back and assigns the real addresses afterwards. bhyve, however, treats the multiple 0xffffffff assignments to different BARs as a clash and errors out on the second BAR probe. I removed most of the mmio_rb_tree error handling in mem.c, and that is sufficient for Windows to boot and to detect and correctly identify the GPU. (A better solution might be to handle the initial 0xffffffff write as a special case; a sketch of what I mean follows below.) I can then install the official nVidia drivers over Remote Desktop without problems. However, the GPU never springs to life: I am stuck with a "Windows has stopped this device because it has reported problems. (Code 43)" error in Device Manager, a blank screen, and not much else to go on.
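>>
>> For illustration, here is roughly that special case, using a made-up structure rather than bhyve's real BAR bookkeeping: while a BAR holds the sizing sentinel, remember that fact and skip (re)registering its MMIO range, so two BARs holding 0xffffffff at the same time are never treated as overlapping.
>>
>>     #include <stdint.h>
>>
>>     #define BAR_SIZING_SENTINEL 0xffffffffu
>>
>>     /* Made-up stand-in for a per-BAR bookkeeping structure. */
>>     struct fake_bar {
>>             uint64_t addr;    /* currently programmed base */
>>             uint64_t size;    /* power-of-two BAR size */
>>             int      sizing;  /* guest wrote the sentinel */
>>     };
>>
>>     static void
>>     fake_bar_cfgwrite(struct fake_bar *bar, uint32_t val)
>>     {
>>             if (val == BAR_SIZING_SENTINEL) {
>>                     /* Sizing probe: note it and do not register any
>>                      * MMIO mapping until the real base comes back. */
>>                     bar->sizing = 1;
>>                     return;
>>             }
>>             bar->sizing = 0;
>>             /* Real base written back: align it to the BAR size; the
>>              * MMIO range would be (re)registered here. */
>>             bar->addr = val & ~(bar->size - 1);
>>     }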
>>
>> Is it worth my continuing to hack away at these problems (of course I'm happy to share anything I come up with), or is there an official solution for GPU support in the pipeline that is about to make my efforts redundant? :)
>>
>> Thanks,
>> Robert Crowston.
>>
>> ---
>> Footnotes
>>
>> [0]  Diff'ing dmesg after successful GPU initialization (+) and after failure (-), and cutting out some lines that aren't relevant:
>>  nouveau 0000:00:06.0: bios: version 80.28.a6.00.10
>> +nouveau 0000:00:06.0: priv: HUB0: 085014 ffffffff (1f70820b)
>>  nouveau 0000:00:06.0: fb: 1024 MiB DDR3
>> @@ -466,24 +467,17 @@
>>  nouveau 0000:00:06.0: DRM: DCB conn 00: 00001031
>>  nouveau 0000:00:06.0: DRM: DCB conn 01: 00002161
>>  nouveau 0000:00:06.0: DRM: DCB conn 02: 00000200
>> -nouveau 0000:00:06.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
>> -nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:88/gf119_disp_dmac_init()!
>> -nouveau 0000:00:06.0: disp: ch 1 init: c207009b
>> -nouveau: DRM:00000000:0000927c: init failed with -16
>> -nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:54/gf119_disp_dmac_fini()!
>> -nouveau 0000:00:06.0: disp: ch 1 fini: c2071088
>> -nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:54/gf119_disp_dmac_fini()!
>> -nouveau 0000:00:06.0: disp: ch 1 fini: c2071088
>> +[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>> +[drm] Driver supports precise vblank timestamp query.
>> +nouveau 0000:00:06.0: DRM: MM: using COPY for buffer copies
>> +nouveau 0000:00:06.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffff96fdb39a1800
>> +fbcon: nouveaufb (fb0) is primary device
>> -nouveau 0000:00:06.0: timeout at /build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/coregf119.c:187/gf119_disp_core_fini()
>> -nouveau 0000:00:06.0: disp: core fini: 8d0f0088
>> -[TTM] Finalizing pool allocator
>> -[TTM] Finalizing DMA pool allocator
>> -[TTM] Zone  kernel: Used memory at exit: 0 kiB
>> -[TTM] Zone   dma32: Used memory at exit: 0 kiB
>> -nouveau: probe of 0000:00:06.0 failed with error -16
>> +Console: switching to colour frame buffer device 240x67
>> +nouveau 0000:00:06.0: fb0: nouveaufb frame buffer device
>> +[drm] Initialized nouveau 1.3.1 20120801 for 0000:00:06.0 on minor 0
>>
>> [1] https://devtalk.nvidia.com/default/topic/1028172/linux/titan-v-ubuntu-16-04lts-and-387-34-driver-crashes-badly/post/5230898/#5230898
>> _______________________________________________
>> freebsd-virtualization at freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
>> To unsubscribe, send any mail to "freebsd-virtualization-unsubscribe at freebsd.org"

