[Bug 265487] bhyve drops the PCI passthrough device in a VM if the host is suspended/resumed
Date: Thu, 28 Jul 2022 21:58:10 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265487
            Bug ID: 265487
           Summary: bhyve drops the PCI passthrough device in a VM if the
                    host is suspended/resumed
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: jon@xyinn.org
Hello,
I've been debugging an issue with the creator if wifibox (
PÁLI Gábor János) that I've been experiencing on my Thinkpad X1 Carbon Gen 7 on
13.1-STABLE (n251926-488f9d85278, but I also reproduced it in 13.1-RELEASE -
n250148-fc952ac2212 - while seeing if this was a regression, it is not).
My particular scenario is using 'wifibox' to pass my wifi card to an Alpine
Linux VM using bhyve. Most of the details and logs during our debugging have
been posted in this bug report:
https://github.com/pgj/freebsd-wifibox/issues/31
It seems that when I shut down my laptop (host) while the bhyve VM is running,
it would non-deterministically (but very soon, around 1-5 times of doing
suspend/resume) would resume the system successfully, but the wifi would stop
working. Upon further investigation, we noticed that the wireless interface and
the entire PCI card was completely missing from within the VM. The only way to
"fix" it was to reboot.
Upon further experimentation, I noticed that if I stopped the VM, then
suspended/resumed, and then started the VM again, the PCI card would be
successfully available again. This workaround so far hasn't failed for me (but
I'm still testing it throughout the day). My suspicion is that when bhyve has
passed down a PCI device and the host is suspended, the resources for that
handle aren't being released correctly and the machine may be in an
inconsistent state in regards to this specific passthrough device.
To post some logs, we can see the following before and after inside the bhyve
VM:
Before
-----------
wifibox:~# lspci
00:1f.0 Class 0601: 8086:7000
00:04.2 Class 0100: 1af4:1009
00:04.0 Class 0100: 1af4:1001
00:00.0 Class 0600: 1275:1275
00:04.3 Class 0100: 1af4:1009
00:06.0 Class 0280: 8086:02f0
00:04.1 Class 0100: 1af4:1009
00:05.0 Class 0200: 8086:100f
wifibox:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:98:8A:05:71  
          inet addr:10.1.0.1  Bcast:0.0.0.0  Mask:255.0.0.0
          inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20396 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33557 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4197622 (4.0 MiB)  TX bytes:27342203 (26.0 MiB)
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
wlan0     Link encap:Ethernet  HWaddr F8:E4:E3:EB:35:02  
          inet addr:192.168.1.139  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::fae4:e3ff:feeb:3502/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34620 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20133 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:27437272 (26.1 MiB)  TX bytes:4448962 (4.2 MiB)
wifibox:~# dmesg | grep iwlwifi
[    0.869316] iwlwifi 0000:00:06.0: can't derive routing for PCI INT A
[    0.869318] iwlwifi 0000:00:06.0: PCI INT A: not connected
[    0.870101] iwlwifi 0000:00:06.0: Failed to set affinity mask for IRQ 41
[    0.926465] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by
driver
[    0.926473] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version:
89.3.35.37
[    0.926601] iwlwifi 0000:00:06.0: loaded firmware version 66.f1c864e0.0
QuZ-a0-jf-b0-66.ucode op_mode iwlmvm
[    0.946738] iwlwifi 0000:00:06.0: Detected Intel(R) Wireless-AC 9560 160MHz,
REV=0x354
[    1.061975] iwlwifi 0000:00:06.0: Detected RF JF, rfid=0x105110
[    1.118696] iwlwifi 0000:00:06.0: base HW address: f8:e4:e3:eb:35:02
[    5.207325] iwlwifi 0000:00:06.0: Unhandled alg: 0x3f0707
After
------
wifibox:~# lspci
00:1f.0 Class 0601: 8086:7000
00:04.2 Class 0100: 1af4:1009
00:04.0 Class 0100: 1af4:1001
00:00.0 Class 0600: 1275:1275
00:04.3 Class 0100: 1af4:1009
00:04.1 Class 0100: 1af4:1009
00:05.0 Class 0200: 8086:100f
wifibox:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:98:8A:05:71  
          inet addr:10.1.0.1  Bcast:0.0.0.0  Mask:255.0.0.0
          inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:145 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:143406 (140.0 KiB)  TX bytes:1594 (1.5 KiB)
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
wifibox:~# dmesg | grep iwlwifi
<nothing>
Here we can see that the '00:06.0 Class 0280: 8086:02f0' has completely
dissapeared.
Using my "stop wifibox before sleep to release the PCI resource + start it back
up later" approach in this way:
root@leslie:~ # cat /usr/local/etc/devd/wifibox.conf
# This is a `devd(8)` configuration file to run the resume action of
# wifibox on the ACPI resume event.  Review the contents and create a
# copy of it without the `.sample` extension to use it.  Restart the
# `devd` service once the file has been created.
notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Suspend";
        action "logger 'Stopping wifibox before suspend' &&
/usr/local/sbin/wifibox stop && /etc/rc.suspend acpi $notify";
};
notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Resume";
        action "/etc/rc.resume acpi $notify && logger 'Starting wifibox after
resume and getting IP via DHCP' && /usr/local/sbin/wifibox start &&
/sbin/dhclient wifibox0";
};
we can see the following in the logs (correct flow on the host):
bridge0: Ethernet address: 58:9c:fc:10:ff:99
bridge0: changing name to 'wifibox0'
tap0: Ethernet address: 58:9c:fc:10:c0:3f
tap0: promiscuous mode enabled
wifibox0: link state changed to DOWN
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
tap0: link state changed to UP
wifibox0: link state changed to UP
lo0: link state changed to UP
...
tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ue0: link state changed to UP
tap0: promiscuous mode disabled
...
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
bridge0: Ethernet address: 58:9c:fc:10:ff:99
bridge0: changing name to 'wifibox0'
tap0: Ethernet address: 58:9c:fc:10:c0:3f
tap0: promiscuous mode enabled
wifibox0: link state changed to DOWN
tap0: link state changed to UP
wifibox0: link state changed to UP
tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
tap0: link state changed to UP
wifibox0: link state changed to UP
tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
tap0: promiscuous mode disabled
Regarding the devd rule for wifibox, notice that in the Suspend action, the
"/etc/rc.suspend acpi $notify" call is happening at the end. I'm suspecting
this is behaving like a stack (and makes sense since we are basically reversing
stuff when we suspend/resume). When I made that call in the beginning, the PCI
passthrough issue arose, but when I placed it at the end it was fine. I'm
guessing this is due to the suspend command being triggered and the wifibox
section didn't have enough time to fully release the resources, thus it would
yield the same issue that I mentioned before. Perhaps that rule can help
further narrowing this issue down.
At the moment I'm testing some slight improvements to the devd workaround that 
PÁLI suggested which will help speed up the suspend/resume times and not need
me to explicitly call the dhclient command. Ultimately, we'll want to fix the
bhyve issue. I don't have any experience with bhyve or its internals but I can
help with debugging across RELEASE/STABLE/CURRENT.
- Jonathan
-- 
You are receiving this mail because:
You are the assignee for the bug.