[Bug 287163] if_bridge: network problems under load

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 30 May 2025 12:43:08 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287163

            Bug ID: 287163
           Summary: if_bridge: network problems under load
           Product: Base System
           Version: 14.2-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: d8zNeCFG@aon.at

This is a somewhat complex scenario involving four hosts; hopefully this does
not throw off a prospective reader:
- mizar: Laptop running FreeBSD stable/14 ca. Feb. 1
  . interface em0 1 Gbps (here using the native driver - see also bug 235031)
  . em0 as only member of bridge0 (see also bug 287146)
  . serving as a VirtualBox host
    - Windows 10 client
    - Its network interface is bridged to em0 (using vboxnet); bridging it to
bridge0 is not possible, as doing so generates an error message
  . So the disk traffic goes via iSCSI over bridge0, while the vbox client's
own network traffic goes via VirtualBox bridging directly on em0
  . mizar runs the UI (KDE).
- orion: Fast modern server running openSUSE Leap 15.6
  . serves iSCSI disks, one of them for the Windows 10 VirtualBox client on
mizar
  . 1 Gbps interface
  . (It could run FreeBSD instead, but then bug 286869 applies)
- hal: An old server running FreeBSD stable/14
  . 1 Gbps interface
- gandalf: An old laptop running FreeBSD stable/14
  . 100 Mbps interface
  . Internet gateway including IPv6 via 6to4 (stf)

- The complete network is dual-stack IPv4/IPv6, with RFC1918 addresses for
IPv4 and site-local addresses for IPv6. rtsol/rtadv are also running, giving
all hosts fully routable IPv6 addresses (provided auto_linklocal is enabled on
bridge0; see the non-bug 287146).
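For readers who want to rebuild the setup, mizar's bridge configuration might
look roughly like the /etc/rc.conf fragment below. This is a hypothetical
sketch: the actual file is not part of this report, the IPv4 address is a
placeholder, and running rtsold on mizar is an assumption. It shows em0 as the
only, address-less member of bridge0, with the host's addresses and
auto_linklocal configured on bridge0 itself.

```sh
# /etc/rc.conf fragment for mizar (hypothetical sketch, not the real file).
# bridge0 is created at boot; em0 is its only member and carries no address.
cloned_interfaces="bridge0"
ifconfig_em0="up"
ifconfig_bridge0="addm em0 up"
ifconfig_bridge0_alias0="inet 192.168.1.10/24"   # placeholder RFC1918 address
# auto_linklocal has to be enabled explicitly on bridge0 (cf. bug 287146);
# accept_rtadv lets rtsold configure routable IPv6 addresses from rtadvd.
ifconfig_bridge0_ipv6="inet6 auto_linklocal accept_rtadv"
rtsold_enable="YES"
```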

- The Windows 10 VirtualBox client is started on mizar. It gets its disk via
iSCSI from orion.
- This client contains a cygwin installation. The "find" command is used to
search for files with certain characteristics in c:\Windows.
- This generates a significant disk load, and therefore significant iSCSI
traffic to orion, which passes through bridge0.
- There should not be much load from the vboxnet interface via em0, apart from
whatever background activity (e.g. updates) Windows may be doing.

- In addition, there are xterms, xloads, and other programs running on gandalf,
orion, and hal, which are all displaying on mizar. On mizar, this results in
something like this:

[0]# lsof | grep :x11
Xorg       2677       root    4u     IPv6    0xfffff80240772000          0      TCP *:x11 (LISTEN)
Xorg       2677       root    5u     IPv4    0xfffff80055560a80          0      TCP *:x11->*:* (LISTEN)
Xorg       2677       root   91u     IPv4    0xfffff80324a4b000          0      TCP mizar.xyzzy:x11->gandalf.xyzzy:12378 (ESTABLISHED)
Xorg       2677       root   92u     IPv4    0xfffff8005599b540          0      TCP mizar.xyzzy:x11->gandalf.xyzzy:37719 (ESTABLISHED)
Xorg       2677       root   93u     IPv4    0xfffff80055560000          0      TCP mizar.xyzzy:x11->gandalf.xyzzy:10580 (ESTABLISHED)
Xorg       2677       root   94u     IPv4    0xfffff8048c0bba80          0      TCP mizar.xyzzy:x11->gandalf.xyzzy:30011 (ESTABLISHED)
Xorg       2677       root   95u     IPv4    0xfffff800aaac1000          0      TCP mizar.xyzzy:x11->hal.xyzzy:24597 (ESTABLISHED)
Xorg       2677       root   96u     IPv4    0xfffff802d8f7c540          0      TCP mizar.xyzzy:x11->hal.xyzzy:54794 (ESTABLISHED)
Xorg       2677       root   97u     IPv4    0xfffff8048ca14000          0      TCP mizar.xyzzy:x11->orion.xyzzy:51438 (ESTABLISHED)
Xorg       2677       root   98u     IPv4    0xfffff802d8467000          0      TCP mizar.xyzzy:x11->hal.xyzzy:10010 (ESTABLISHED)
Xorg       2677       root  101u     IPv4    0xfffff8048c0bc540          0      TCP mizar.xyzzy:x11->orion.xyzzy:41936 (ESTABLISHED)
Xorg       2677       root  102u     IPv4    0xfffff803373cb000          0      TCP mizar.xyzzy:x11->orion.xyzzy:41950 (ESTABLISHED)
Xorg       2677       root  103u     IPv4    0xfffff800aaac1a80          0      TCP mizar.xyzzy:x11->orion.xyzzy:41952 (ESTABLISHED)
[0]# 
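To see at a glance which remote hosts still have X11 connections open to
mizar, the lsof output can be condensed into a per-host count. The sketch
below is a hypothetical helper (the name x11_peers is made up for
illustration); it expects lsof's normal one-line-per-file output and takes the
remote endpoint from the field just before "(ESTABLISHED)".

```sh
#!/bin/sh
# Count ESTABLISHED X11 connections per remote host in lsof output on stdin.
# The remote "host:port" follows "->" in the second-to-last field; only the
# final ":port" is stripped, so IPv6 peers such as [fd00::2]:51438 keep
# their full address.
x11_peers() {
    awk '$NF == "(ESTABLISHED)" {
             peer = $(NF - 1)
             sub(/^.*->/, "", peer)      # drop the local endpoint
             sub(/:[^:]*$/, "", peer)    # drop the remote port
             count[peer]++
         }
         END { for (p in count) print p, count[p] }' | sort
}
```

Usage: `lsof | grep :x11 | x11_peers`. For the listing above this prints
`gandalf.xyzzy 4`, `hal.xyzzy 3` and `orion.xyzzy 4`; running it again after
the drop should show which hosts' counts have disappeared.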

Result:
- After a while, the connections to the remote X programs on orion and hal
are dropped, but not those from gandalf (this had already been reproduced at
least once with net/intel-em-kmod as well).
- Because gandalf still has a working xterm, the following can be seen there:
  . "arp mizar" still displays an entry
  . "ndp mizar" has no entry anymore
- Logging in to hal or orion via gandalf, one can see that they no longer have
either an arp or an ndp entry for mizar.
- Strangely enough, the iSCSI connection from VirtualBox to orion continues to
work for a little longer, until it is also dropped and the VirtualBox client
stops with a corresponding error message.
- Some seconds after the VirtualBox client is stopped (and therefore the
network load via the bridge is gone), hal and orion can successfully create arp
and ndp entries for mizar, and from then on direct connections are possible
again.
- Once direct connections were possible again, I resumed the VirtualBox
Windows 10 client.
- After a while, this again results in some (but not all) x11 connections
being dropped, followed by the iSCSI connection, which again stops the
VirtualBox client.
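To pin down when the neighbor entries for mizar disappear relative to the
load, one could log them periodically on hal or orion. The helper names
neigh_state and watch_neighbors below are hypothetical, and the sketch assumes
that arp(8) and ndp(8) on stable/14 print "... -- no entry" when no entry
exists; any other non-zero exit is reported as "error" rather than guessed at.

```sh
#!/bin/sh
# Classify the output of "arp -n HOST" or "ndp -n HOST".
# $1 = captured output, $2 = exit status of the command.
# Assumption: a missing entry is reported as "... -- no entry".
neigh_state() {
    case "$1" in
        *"-- no entry"*) echo absent ;;
        *) if [ "${2:-1}" -eq 0 ]; then echo present; else echo error; fi ;;
    esac
}

# Usage: watch_neighbors HOST [INTERVAL_SECONDS]; stop with Ctrl-C.
# Prints one timestamped line per interval with the ARP and NDP state.
watch_neighbors() {
    host=$1
    interval=${2:-5}
    while :; do
        a=$(arp -n "$host" 2>&1); sa=$?
        n=$(ndp -n "$host" 2>&1); sn=$?
        printf '%s arp=%s ndp=%s\n' "$(date '+%H:%M:%S')" \
            "$(neigh_state "$a" "$sa")" "$(neigh_state "$n" "$sn")"
        sleep "$interval"
    done
}
```

For example, `watch_neighbors mizar 2` on hal while the VirtualBox client is
running would show the moment the entries go from "present" to "absent".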

- What I wrote in
https://forums.freebsd.org/threads/mountd-does-not-respond-via-ipv6-over-a-bridge.97913/
seems to be related.

Note that if bridge0 is omitted on mizar, everything works fine.

It is difficult to draw conclusions:
1. Evidently, using the native em0 driver instead of the port net/intel-em-kmod
makes no difference to the connectivity issues when bridge0 is under load.
2. I also made some speed measurements with the native em0 using iperf and
iperf3. The results were good, so maybe bug 235031 really is resolved, although
I still have some doubts.
3. Something is not working correctly with if_bridge, especially under load.
4. Why is it not possible to make the VirtualBox vboxnet interface bridge to
bridge0 instead of em0?

The main issue is 3.

-- Martin