[Bug 242724] bhyve: Unkillable processes stuck in 'STOP' state (vmm::vm_handle_suspend())

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Thu Dec 19 13:04:01 UTC 2019


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242724

            Bug ID: 242724
           Summary: bhyve: Unkillable processes stuck in 'STOP' state
                    (vmm::vm_handle_suspend())
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: bhyve
          Assignee: virtualization at FreeBSD.org
          Reporter: aleksandr.fedorov at itglobal.com

On one of our servers, there are three unkillable bhyve processes stuck in the
'STOP' state. All have the same symptoms:

# procstat -kk 9277 | sort -k 4
  PID    TID COMM                TDNAME              KSTACK
 9277 102504 bhyve               blk-1:0:0-0         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102514 bhyve               blk-1:0:0-1         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102752 bhyve               blk-1:0:0-2         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102760 bhyve               blk-1:0:0-3         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102761 bhyve               blk-1:0:0-4         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102763 bhyve               blk-1:0:0-5         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102764 bhyve               blk-1:0:0-6         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102766 bhyve               blk-1:0:0-7         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102767 bhyve               blk-3:0-0           mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102768 bhyve               blk-3:0-1           mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102769 bhyve               blk-3:0-2           mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102770 bhyve               blk-3:0-3           mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102771 bhyve               blk-3:0-4           mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102772 bhyve               blk-3:0-5           mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102773 bhyve               blk-3:0-6           mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102774 bhyve               blk-3:0-7           mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102775 bhyve               blk-4:0:0-0         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102780 bhyve               blk-4:0:0-1         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102887 bhyve               blk-4:0:0-2         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102888 bhyve               blk-4:0:0-3         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102889 bhyve               blk-4:0:0-4         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102890 bhyve               blk-4:0:0-5         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102891 bhyve               blk-4:0:0-6         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102892 bhyve               blk-4:0:0-7         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102687 bhyve               mevent              mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102941 bhyve               pci-devstat         mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102983 bhyve               pci-reconfig        mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102937 bhyve               rfb                 mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102991 bhyve               vcpu 0              mi_switch+0xe2
sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2
devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f
kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 100782 bhyve               vcpu 1              mi_switch+0xe2
thread_suspend_switch+0xd4 thread_single+0x47b sigexit+0x57 postsig+0x2f8
ast+0x327 fast_syscall_common+0x198
 9277 100788 bhyve               vcpu 2              mi_switch+0xe2
sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2
devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f
kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 101385 bhyve               vcpu 3              mi_switch+0xe2
sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2
devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f
kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 103402 bhyve               vcpu 4              mi_switch+0xe2
sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2
devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f
kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 102206 bhyve               vcpu 5              mi_switch+0xe2
sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2
devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f
kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 102932 bhyve               vtnet-5:0 tx        mi_switch+0xe2
thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f

# procstat threads 9277|grep vcpu
 9277 100782 bhyve               vcpu 1               -1  124 stop    -
 9277 100788 bhyve               vcpu 2               -1  123 stop    vmsusp
 9277 101385 bhyve               vcpu 3               -1  123 stop    vmsusp
 9277 102206 bhyve               vcpu 5               -1  122 stop    vmsusp
 9277 102991 bhyve               vcpu 0               -1  124 stop    vmsusp
 9277 103402 bhyve               vcpu 4               -1  123 stop    vmsusp

As you can see, one of the vCPU threads has exited, while the rest are sleeping
in kernel mode in vm_handle_suspend():
https://svnweb.freebsd.org/base/head/sys/amd64/vmm/vmm.c?view=markup#l1515

The main problem is that when the bhyve process is killed, some of the threads
driving the vCPUs are gone before they can mark themselves suspended, leaving
the remaining vCPU threads waiting in kernel space forever:
https://svnweb.freebsd.org/base/head/sys/amd64/vmm/vmm.c?view=markup#l1507

I found that the same bug was fixed in SmartOS:
https://github.com/joyent/illumos-joyent/commit/dce228e4331f185347c3e0325cab8a3af72d6410#diff-721b4b2a86bbb8d11c6e01a2995868e6

Unfortunately, the site https://smartos.org/bugview/OS-6888 is not reachable,
so the detailed description cannot be viewed.

Can we apply a similar fix to our bhyve code?

