cpu soft lockup bhyve host ubuntu guest

From: tech-lists <tech-lists_at_zyxst.net>
Date: Thu, 18 Nov 2021 00:59:30 UTC
Hi,

I'm seeing an Ubuntu guest freeze on a FreeBSD (12-stable) bhyve host.
Nothing is reported in console.log, messages, all.log or dmesg on the
FreeBSD server that points to a lack of resources or CPU allocation.
There are 44 cores, and they're not oversubscribed. The guest VM has
4 vCPUs and 8G of RAM.

But if I go to the guest console:

[...]
Ubuntu 21.10 foo ttyS0

foo login: [74237.857639] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [sshd:5452]
[74265.874169] watchdog: BUG: soft lockup - CPU#2 stuck for 52s! [sshd:5452]
[74270.328733] rcu: INFO: rcu_sched self-detected stall on CPU
[74270.329269] rcu:     2-....: (14862 ticks this GP) idle=68a/1/0x4000000000000000 softirq=1242684/1242684 fqs=7432
[74296.627733] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[74297.892134] watchdog: BUG: soft lockup - CPU#2 stuck for 82s! [sshd:5452]
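
For anyone wanting to tally these watchdog lines from a longer capture, here is a minimal sketch in Python. The language choice, the regex and the helper name `summarize_lockups` are assumptions made for illustration and are not part of any existing tool. It only looks at the "soft lockup" lines and tolerates the line wrapping seen in pasted console output.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the Linux watchdog "soft lockup" message.
# \s+ is used between fields so that a line wrapped by a mail client
# (e.g. "stuck for\n26s!") still matches.
LOCKUP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(\d+)\s+stuck\s+for\s+(\d+)s!\s+\[([^\]:]+):(\d+)\]"
)


def summarize_lockups(log_text):
    """Return {(cpu, task, pid): longest reported stuck time in seconds}."""
    worst = defaultdict(int)
    for _timestamp, cpu, secs, task, pid in LOCKUP_RE.findall(log_text):
        key = (int(cpu), task, int(pid))
        worst[key] = max(worst[key], int(secs))
    return dict(worst)


# The soft lockup lines from the console output, wrapped as pasted.
SAMPLE = """\
foo login: [74237.857639] watchdog: BUG: soft lockup - CPU#2 stuck for
26s! [sshd:5452]
[74265.874169] watchdog: BUG: soft lockup - CPU#2 stuck for 52s!
[sshd:5452]
[74296.627733] watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[swapper/1:0]
[74297.892134] watchdog: BUG: soft lockup - CPU#2 stuck for 82s!
[sshd:5452]
"""

if __name__ == "__main__":
    for (cpu, task, pid), secs in sorted(summarize_lockups(SAMPLE).items()):
        print(f"CPU#{cpu} {task}[{pid}]: stuck up to {secs}s")
```

Fed the console lines above, this groups the reports by CPU and task, which makes it easier to see that the sshd process on CPU#2 accounts for the longest stall.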

If I hit return a few times, it unlocks itself. I wonder if it's going
to sleep?! Surely not.

I see this sort of message in Ubuntu's dmesg:

[...]
[74237.857639] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [sshd:5452]
[74237.858302] Modules linked in: xt_recent dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common rapl input_leds serio_raw mac_hid ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter sch_fq_codel nf_tables nfnetlink msr drm ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_net aesni_intel net_failover crypto_simd cryptd psmouse virtio_blk failover
[74237.858360] CPU: 2 PID: 5452 Comm: sshd Not tainted 5.13.0-21-generic #21-Ubuntu
[74237.858363] Hardware name:  BHYVE, BIOS 1.00 03/14/2014
[74237.858365] RIP: 0010:smp_call_function_many_cond+0x11a/0x2c0
[74237.858377] Code: 74 24 08 e8 38 48 49 00 3b 05 76 45 00 02 89 c7 73 22 48 63 c7 49 8b 0c 24 48 03 0c c5 00 59 eb 85 8b 41 08 a8 01 74 0a f3 90 <8b> 51 08 83 e2 01 75 f6 eb ca 48 83 c4 40 5b 41 5c 41 5d 41 5e 41
[74237.858379] RSP: 0018:ffffb92f80c9fab8 EFLAGS: 00000202
[74237.858381] RAX: 0000000000000011 RBX: 0000000000000001 RCX: ffff9a8f37c341a0
[74237.858383] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[74237.858384] RBP: ffffb92f80c9fb20 R08: 0000000000000000 R09: 0000000000000000
[74237.858385] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a8f37d2de80
[74237.858386] R13: 0000000000000246 R14: 0000000000000000 R15: ffff9a8f37d2de80
[74237.858388] FS:  00007f7a70523900(0000) GS:ffff9a8f37d00000(0000) knlGS:0000000000000000
[74237.858389] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[74237.858390] CR2: 00007f7a70aa64a6 CR3: 000000010b8d2001 CR4: 00000000000706e0
[74237.858393] Call Trace:
[74237.858399]  ? invalidate_user_asid+0x30/0x30
[74237.858409]  on_each_cpu_cond_mask+0x1d/0x20
[74237.858412]  flush_tlb_kernel_range+0x41/0xa0
[74237.858414]  __purge_vmap_area_lazy+0xbd/0x6f0
[74237.858420]  ? do_jit+0xe3a/0x2390
[74237.858424]  ? purge_fragmented_blocks+0xc3/0x1c0
[74237.858427]  _vm_unmap_aliases.part.0+0x114/0x150
[74237.858430]  vm_unmap_aliases+0x27/0x30
[74237.858432]  change_page_attr_set_clr+0xb7/0x1b0
[74237.858436]  set_memory_ro+0x29/0x30
[74237.858439]  bpf_int_jit_compile+0x353/0x3d0
[74237.858442]  bpf_prog_select_runtime+0xf7/0x130
[74237.858447]  bpf_prepare_filter+0x1cc/0x200
[74237.858456]  ? hardlockup_detector_perf_cleanup+0xa0/0xa0
[74237.858461]  bpf_prog_create_from_user+0xc4/0x120
[74237.858464]  seccomp_set_mode_filter+0x122/0x530
[74237.858467]  do_seccomp+0x37/0x1f0
[74237.858469]  prctl_set_seccomp+0x2c/0x40
[74237.858471]  __do_sys_prctl+0x438/0x6f0
[74237.858476]  __x64_sys_prctl+0x21/0x30
[74237.858478]  do_syscall_64+0x61/0xb0
[74237.858484]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[74237.858490] RIP: 0033:0x7f7a70a0d281
[74237.858495] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 08 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 9d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 17 48 8b 4c 24 18 64 48 2b 0c 25 28 00 00 00
[74237.858497] RSP: 002b:00007fff28e84180 EFLAGS: 00000246 ORIG_RAX: 000000000000009d
[74237.858499] RAX: ffffffffffffffda RBX: 000055c0301fe020 RCX: 00007f7a70a0d281
[...]

thanks,
-- 
J.