Diagnosing virtual machine network issues

From: Alex Arslan <ararslan_at_comcast.net>
Date: Fri, 28 Jun 2024 15:53:36 UTC

Hello,

I originally posted the following to freebsd-questions but was encouraged
to repost here instead.

I work on the Julia language (https://julialang.org) and am the de facto
maintainer of its FreeBSD support. Our continuous integration runs jobs in
FreeBSD 13.2 AMD64 virtual machines with KVM on Linux. This same Linux
machine also runs Windows jobs in VMs with KVM as well as Linux jobs using
a custom sandboxing setup.

We've noticed a number of network-related issues that only occur on the
FreeBSD VMs and cause tests to fail. Currently one test fails reliably: it
expects libcurl to report a host resolution failure for
https://domain.invalid, but on the FreeBSD VMs the request times out instead.
Previously we've also seen timeouts when making requests to httpbingo
and GitHub. However, I've never been able to reproduce any of these test
failures, which makes me suspect there's an issue with how we've set up
networking for the VMs.
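
To make the distinction concrete, here is a minimal sketch (in Python, not
the Julia/libcurl code the tests actually use) of how one might check from
inside a VM whether a lookup fails quickly or hangs. The function name
classify_lookup, its parameters, and the 5-second default are hypothetical
choices for illustration; only socket.getaddrinfo is a real library call.

```python
import queue
import socket
import threading
import time


def classify_lookup(host, timeout=5.0, resolver=socket.getaddrinfo):
    """Classify a name lookup as resolved, failed, or timed out.

    Returns (outcome, elapsed_seconds). `resolver` is injectable so the
    logic can be exercised without touching the network (an assumption
    made for this sketch, not something the original setup does).
    """
    results = queue.Queue()

    def worker():
        try:
            resolver(host, 443)
            results.put("resolved")
        except socket.gaierror:
            # The resolver answered and said the name does not exist.
            results.put("resolution-failure")
        except Exception:
            # Anything else is reported separately rather than as a timeout.
            results.put("other-error")

    start = time.monotonic()
    # Daemon thread: a hung lookup must not keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    try:
        outcome = results.get(timeout=timeout)
    except queue.Empty:
        # No answer at all within the deadline.
        outcome = "timeout"
    return outcome, time.monotonic() - start
```

Calling classify_lookup("domain.invalid") on a guest would be expected to
return "resolution-failure" almost immediately, since .invalid is a
reserved top-level domain. A "timeout" result instead would suggest the
problem lies in the DNS path the VM uses rather than in libcurl itself.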

Can anybody provide guidance on how to determine what, if anything, could
be misconfigured? I apologize for the vagueness of this question; I'm not
really familiar with anything networking- or virtualization-related, so
I'm not sure what information would be helpful to include here. The
complete setup lives in https://github.com/JuliaCI/sandboxed-buildkite-agent
in the freebsd-kvm directory. In base-image/freebsd13.pkr.hcl [1], which
uses Packer to build a base qcow2 image, we set net_device = "virtio-net".
In buildkite-worker/kvm_machine.xml.template [2], we set the target device
to vnet0 with bridge virbr0.

Thank you very much for your time!

Best,
Alex

[1]: https://github.com/JuliaCI/sandboxed-buildkite-agent/blob/main/freebsd-kvm/base-image/freebsd13.pkr.hcl
[2]: https://github.com/JuliaCI/sandboxed-buildkite-agent/blob/main/freebsd-kvm/buildkite-worker/kvm_machine.xml.template