[Bug 287163] if_bridge: network problems under load
Date: Fri, 30 May 2025 12:43:08 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287163
Bug ID: 287163
Summary: if_bridge: network problems under load
Product: Base System
Version: 14.2-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: d8zNeCFG@aon.at
This is a somewhat complex scenario involving four hosts; hopefully this does
not throw off a prospective reader:
- mizar: Laptop running FreeBSD stable/14 ca. Feb. 1
. interface em0 1 Gbps (here using the native driver - see also bug 235031)
. em0 as only member of bridge0 (see also bug 287146)
. serving as a VirtualBox host
- Windows 10 client
- Its network interface is bridged to em0 (using vboxnet); bridging to
bridge0 is not possible and generates an error message
. So the disk traffic goes via iSCSI over bridge0, while the VirtualBox
client's network traffic goes via VirtualBox bridging directly on em0
. mizar runs the UI (KDE).
- orion: Fast modern server running openSUSE Leap 15.6
. serves iSCSI disks, one of them for the Windows 10 VirtualBox client on
mizar
. 1 Gbps interface
. (It could run FreeBSD instead, but then -> bug 286869)
- hal: An old server running FreeBSD stable/14
. 1 Gbps interface
- gandalf: An old laptop running FreeBSD stable/14
. 100 Mbps interface
. Internet gateway including IPv6 via 6to4 (stf)
- The complete network is dual-stack IPv4/IPv6, with RFC1918 addresses for IPv4
and site-local addresses for IPv6. rtsol/rtadv is also running, resulting in
fully routable IPv6 addresses for all hosts (provided auto_linklocal is enabled
on bridge0; see the non-bug 287146).
- The Windows 10 VirtualBox client is started on mizar. It gets its disk via
iSCSI from orion.
- This client contains a cygwin installation. The "find" command is used to
search for files with certain characteristics in c:\Windows.
- This generates a significant load on the disk, which travels via iSCSI to
orion and therefore across bridge0.
- There should not be much load from the vboxnet interface via em0, except
perhaps for Windows background updates and the like.
- In addition, there are xterms, xloads, and other programs running on gandalf,
orion, and hal, which are all displaying on mizar. On mizar, this results in
something like this:
[0]# lsof | grep :x11
Xorg 2677 root 4u IPv6 0xfffff80240772000 0 TCP *:x11 (LISTEN)
Xorg 2677 root 5u IPv4 0xfffff80055560a80 0 TCP *:x11->*:* (LISTEN)
Xorg 2677 root 91u IPv4 0xfffff80324a4b000 0 TCP mizar.xyzzy:x11->gandalf.xyzzy:12378 (ESTABLISHED)
Xorg 2677 root 92u IPv4 0xfffff8005599b540 0 TCP mizar.xyzzy:x11->gandalf.xyzzy:37719 (ESTABLISHED)
Xorg 2677 root 93u IPv4 0xfffff80055560000 0 TCP mizar.xyzzy:x11->gandalf.xyzzy:10580 (ESTABLISHED)
Xorg 2677 root 94u IPv4 0xfffff8048c0bba80 0 TCP mizar.xyzzy:x11->gandalf.xyzzy:30011 (ESTABLISHED)
Xorg 2677 root 95u IPv4 0xfffff800aaac1000 0 TCP mizar.xyzzy:x11->hal.xyzzy:24597 (ESTABLISHED)
Xorg 2677 root 96u IPv4 0xfffff802d8f7c540 0 TCP mizar.xyzzy:x11->hal.xyzzy:54794 (ESTABLISHED)
Xorg 2677 root 97u IPv4 0xfffff8048ca14000 0 TCP mizar.xyzzy:x11->orion.xyzzy:51438 (ESTABLISHED)
Xorg 2677 root 98u IPv4 0xfffff802d8467000 0 TCP mizar.xyzzy:x11->hal.xyzzy:10010 (ESTABLISHED)
Xorg 2677 root 101u IPv4 0xfffff8048c0bc540 0 TCP mizar.xyzzy:x11->orion.xyzzy:41936 (ESTABLISHED)
Xorg 2677 root 102u IPv4 0xfffff803373cb000 0 TCP mizar.xyzzy:x11->orion.xyzzy:41950 (ESTABLISHED)
Xorg 2677 root 103u IPv4 0xfffff800aaac1a80 0 TCP mizar.xyzzy:x11->orion.xyzzy:41952 (ESTABLISHED)
[0]#
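To make it easier to see which remote hosts lose their X connections over
time, the lsof listing can be condensed to a per-host count. The following is
only a sketch: the function name count_x11_peers is made up for illustration,
and it assumes peers are shown as host names (not raw IPv6 addresses) in the
form seen in the listing.

```sh
#!/bin/sh
# count_x11_peers (hypothetical helper): read lsof-style output on stdin and
# print "<remote host> <number of ESTABLISHED x11 connections>" per host.
# Assumes the NAME column looks like "local:x11->remote:port (ESTABLISHED)".
count_x11_peers() {
    # Extract the remote host of every established x11 connection ...
    sed -n 's/.*:x11->\([^:]*\):[0-9]* (ESTABLISHED).*/\1/p' |
        # ... then count occurrences per host.
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage on mizar would be "lsof | count_x11_peers". Run repeatedly (for example
once a minute), a host whose count falls to zero has lost all of its
connections.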
Result:
- After a while, the connections to the remote X programs from orion and hal
are dropped, but not from gandalf (this could be reproduced at least once
already, with net/intel-em-kmod).
- Because gandalf still has a working xterm, the following can be seen there:
. "arp mizar" still displays an entry
. "ndp mizar" has no entry anymore
- Going via gandalf to hal or orion, one can see that they have neither an arp
nor an ndp entry for mizar anymore.
- Strangely enough, the iSCSI connection from VirtualBox to orion continues to
work for a little longer, until it is also dropped and the VirtualBox client
stops with a corresponding error message.
- Some seconds after the VirtualBox client is stopped (and therefore the
network load via the bridge is gone), hal and orion can successfully create arp
and ndp entries for mizar, and from then on direct connections are possible
again.
- Once direct connections were possible again, I resumed the VirtualBox
Windows 10 client.
- After a while, this again results in (some, but not all) x11 connections
being dropped. And then also the iSCSI connection, again stopping the
VirtualBox client.
- What I wrote in
https://forums.freebsd.org/threads/mountd-does-not-respond-via-ipv6-over-a-bridge.97913/
seems to be related.
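The disappearing arp and ndp entries described above could be logged with
timestamps on one of the FreeBSD peers (hal or gandalf), so that the moment
they vanish can be correlated with the load on mizar. This is a rough sketch,
not a tested tool: the function names are invented here, and it assumes that
"arp HOST" and "ndp HOST" report a missing entry with a line containing
"no entry".

```sh
#!/bin/sh
# entry_state (hypothetical helper): read the output of "arp HOST" or
# "ndp HOST" on stdin; print "missing" if it is empty or reports "no entry",
# otherwise print "present".
entry_state() {
    out=$(cat)
    case $out in
        ''|*'no entry'*) echo missing ;;
        *) echo present ;;
    esac
}

# watch_neighbor HOST [INTERVAL] (hypothetical helper): log the arp and ndp
# state for HOST every INTERVAL seconds (default 5) until interrupted.
watch_neighbor() {
    host=$1
    interval=${2:-5}
    while :; do
        a=$(arp "$host" 2>/dev/null | entry_state)
        n=$(ndp "$host" 2>/dev/null | entry_state)
        printf '%s %s arp=%s ndp=%s\n' "$(date '+%H:%M:%S')" "$host" "$a" "$n"
        sleep "$interval"
    done
}
```

For example, "watch_neighbor mizar 5 | tee neighbor.log" on hal while the
Windows client is running its find command.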
Note that if bridge0 is omitted on mizar, everything works fine.
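For readers who want to reproduce the two cases, the difference on mizar
amounts to roughly the following /etc/rc.conf fragment. This is an
illustrative guess, not a copy of the actual configuration: the IPv4 address
is a placeholder, and the exact IPv6 options in use are assumed from the
description above.

```sh
# Failing case (assumed): em0 is the only member of bridge0, and bridge0
# carries the addresses. 192.168.1.10/24 is a placeholder address.
cloned_interfaces="bridge0"
ifconfig_em0="up"
ifconfig_bridge0="inet 192.168.1.10/24 addm em0 up"
ifconfig_bridge0_ipv6="inet6 auto_linklocal accept_rtadv"

# Working case (assumed): drop the three bridge0 lines above and configure
# em0 directly instead:
#ifconfig_em0="inet 192.168.1.10/24"
#ifconfig_em0_ipv6="inet6 accept_rtadv"
```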
It is difficult to draw conclusions:
1. Apparently, using the native em driver instead of the port
net/intel-em-kmod makes no difference to the connectivity issues when bridge0
is under load.
2. I also made some speed measurements with the native em0 using iperf and
iperf3. They were good, so maybe bug 235031 is really resolved, although I
still have some doubts.
3. Something is not working correctly with if_bridge, especially under load.
4. Why is it not possible to make VirtualBox vboxnet bridge to bridge0 instead
of em0?
The main issue is 3.
-- Martin