[Bug 287337] LAN interface gets totally wedged, unkillable processes, no packets received
Date: Fri, 06 Jun 2025 07:27:14 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287337
Bug ID: 287337
Summary: LAN interface gets totally wedged, unkillable
processes, no packets received
Product: Base System
Version: 13.4-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: gert@greenie.muc.de
Hi,
so, this is a machine with multiple NICs, and occasionally one of them gets
"wedged" or "stuck", with no packets being received at all (outgoing packets do
seem to get out), and things like "ifconfig $nic down" resulting in an
unkillable "ifconfig" process. Only a reboot helps.
This used to happen on an old HP DL360G8 "once or twice a year", so we assumed
"it's dying hardware", and eventually moved the whole FreeBSD system into a
Proxmox VM cluster. It was running perfectly fine there for about 8 months,
and yesterday the "interface wedged" problem occurred *4 times*, which makes me
think we might actually have hit a FreeBSD bug there.
At this point, the VM has 3 NICs, vtnet0..vtnet2.
vtnet0 is the one that does "all the production traffic", and that's the one
that gets stuck. vtnet1/vtnet2 connect to two isolated network segments, and
keep working perfectly fine - so when the problem happens, I can still ssh in
from another machine connected to vtnet1/vtnet2 and run diagnostics.
At this point, I would primarily ask for "what sort of information should I
gather, and what should I test, if it happens again?"
What we did:
- tcpdump -n -s0 -i vtnet0 --> claims "we send packets, we do not receive
packets, at all"
- flap the virtual NIC link on the hypervisor --> shows up in "dmesg", but
does not change anything
- move the VM to a different cluster node (see if it's something on the KVM
side) --> no change
- try "ifconfig vtnet0 down" --> makes "ifconfig" unkillable, and the
interface is still displayed as "up"
- run "ping 141.1.1.1" --> makes "ping" unkillable
- on "shutdown -r" it tries to kill dhcp6d for 90 seconds, which refuses to
die, then complains about "90 seconds watchdog timeout" and proceeds to
"flushing disks" and gets stuck there - so a press of the (virtual) reset
button is needed to un-stick things.
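
For the next occurrence, the state could be snapshotted from an ssh session
coming in via vtnet1/vtnet2 before any further "ifconfig" or "ping" attempts.
The following is only a rough sketch, not something that was run on this
machine: the choice of commands, the output directory, and the "collect"
helper are assumptions. Each command is started in the background because a
diagnostic tool may itself block in the kernel, and the script should not
hang along with it.

```sh
#!/bin/sh
# Sketch: snapshot system state while an interface is wedged.
# Assumed (hypothetical) defaults: interface name and output directory.
IFACE="${1:-vtnet0}"
OUT="${2:-${TMPDIR:-/tmp}/wedge-$(date +%Y%m%d-%H%M%S)}"
mkdir -p "$OUT"

# collect NAME CMD [ARGS...]
# Runs CMD detached in the background and writes its output to
# $OUT/NAME.txt, so that a command blocking in the kernel cannot
# stall the remaining collection steps.
collect() {
    name="$1"; shift
    : >"$OUT/$name.txt"                      # create the file up front
    if command -v "$1" >/dev/null 2>&1; then
        ( "$@" >"$OUT/$name.txt" 2>&1 & )
    else
        echo "skipped: $1 not found" >"$OUT/$name.txt"
    fi
}

collect ps-wchan   ps -axl                   # MWCHAN: what each process sleeps on
collect procstat   procstat -kk -a           # kernel stacks of all threads
collect netstat-if netstat -I "$IFACE" -d    # per-interface counters incl. drops
collect netstat-m  netstat -m                # mbuf usage
collect vmstat-i   vmstat -i                 # interrupt counters
collect sysctl-dev sysctl dev.vtnet          # vtnet driver statistics

sleep 2                                      # give the background jobs time to finish
echo "diagnostics written to $OUT"
```

The kernel stacks of the stuck "ifconfig", "ping" and DHCPv6 server processes
in the procstat output are probably the most interesting part, since they
should show which lock or wait channel those processes are sleeping on.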
We have first seen this on 13.2-RELEASE (2 times over 9 months), and yesterday
4x on 13.4-RELEASE-p5.
Right now my suspect is the IPv6 DHCP server from
"isc-dhcp44-server-4.4.3P1_2", because this is really the only thing that makes
this machine unique - we have some 20+ more FreeBSD machines on 13.4-RELEASE,
some hardware, some VMs on Proxmox, and no other machine has ever exhibited
this. Some have way more network traffic and sessions.
What we did in the meantime is to upgrade the kernel to 14.2-RELEASE (first
half of "freebsd-update -r 14.2-RELEASE upgrade") to see if that will help -
that machine needs to get work done.