Abysmal re(4) performance under 8.1-STABLE (mid-August)

Tue Nov 16 11:55:55 UTC 2010

On Mon, 15.11.2010 at 18:03:25 -0800, Pyun YongHyeon wrote:
> On Sun, Nov 14, 2010 at 10:41:03AM +0100, Ulrich Spörlein wrote:
> > On Mon, 08.11.2010 at 22:41:12 +0100, Ulrich Spörlein wrote:
> > > On Sun, 07.11.2010 at 15:10:20 -0800, Pyun YongHyeon wrote:
> > > > On Sun, Nov 07, 2010 at 12:24:21PM +0100, Ulrich Spörlein wrote:
> > > > > On Sat, 06.11.2010 at 23:19:33 -0700, Pyun YongHyeon wrote:
> > > > > > On Sat, Nov 6, 2010 at 2:37 AM, Ulrich Spörlein <uqs at spoerlein.net> wrote:
> > > > > > > Hello Pyun,
> > > > > > >
> > > > > > > On this new server, I cannot get more than ~280kByte/s up/downstream out of
> > > > > > > re(4) without any tweaking.
> > > > > > >
> > > > > > > re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > > > > > >         options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
> > > > > > >         ether 00:21:85:63:74:34
> > > > > > >         inet6 fe80::221:85ff:fe63:7434%re0 prefixlen 64 scopeid 0x1
> > > > > > >         inet 46.4.12.147 netmask 0xffffffc0 broadcast 46.4.12.191
> > > > > > >         nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
> > > > > > >         media: Ethernet autoselect (100baseTX <half-duplex>)
> > > > > > >         status: active
> > > > > > >
> > > > > > 
> > > > > > It seems the link was resolved to half-duplex. Does link partner
> > > > > > also agree on the resolved speed/duplex?
> > > > > 
> > > > > As this is a dedicated server in a colo hundreds of km away, I have no
> > > > > means to check this easily. Especially I cannot change the setting from
> > > > > auto-neg. Btw, linux will show a negotiated 100/full link via mii-tool.
> > > > > 
> > > > 
> > > > I guess you can contact network administrator of the data center to
> > > > check the switch configuration. IEEE 802.3 says if link partner uses
> > > > forced full-duplex media and you use auto media, the resolved
> > > > duplex is half-duplex by definition. I think RealTek may have
> > > > followed the standard. There is no reason to use manual media
> > > > configuration unless your link partner is severely broken with
> > > > auto-negotiation.
> > > > 
> > > > Due to a silicon bug in RealTek PHYs, rgephy(4) always uses
> > > > auto-negotiation, so manual media configuration is a kind of
> > > > auto-negotiation with a limited set of advertised media.
> > > > I don't know how Linux solves the silicon bug, though. One of the
> > > > magic DSP fixups might fix the issue, but the DSP fixups the vendor
> > > > released are not under a BSD license and the code does not come
> > > > with more detailed information.
> > > 
> > > Luckily the provider switched me to another switch that is set to
> > > autoneg, instead of hardcoded to 100/full. re(4) now happily transfers
> > > with reasonable speeds, i.e. 11MByte/s.
> > 
> > Alas, I spoke too soon. While the throughput is now up to speed, I have
> > severe problems with packet loss on this device. Again, the linux rescue
> > system works fine, but under a recent -STABLE (including your latest
> > MFCs) I get an average packet loss of 10-20%. The loss is not evenly
> > spread, as in every 5th packet or so; instead the box will drop no packets
> > for minutes to hours and then black out for 1-5 min straight (these times
> > are estimates, I haven't used a stop watch or anything.)
> > 
> > At first, putting the card into promisc mode seemed to alleviate the
> > issue, but the average ping packet loss during the last 10h was again up
> > to 10%. Due to the "blackout" nature, this drops all TCP sessions and is
> > really annoying.
> > 
> > Do you have any other ideas that I could try? Or should I simply switch
> > to a different hardware altogether?
> > 
> 
> Could you try latest re(4) in HEAD? It has a new feature that
> displays hardware MAC counters and it contains a couple of PHY
> access enhancements. You would get the MAC counters on console with
> "sysctl dev.re.0.stats=1". And let me know how many frames were
> dropped.

This is very weird: I managed to narrow it down to IPv4 forwarding, which
is making the box unreachable for certain periods.

On the server, I did:
# sysctl net.inet.ip.forwarding=1; sleep 7200; sysctl net.inet.ip.forwarding=0

And on a client, I did:
# ping -c 7200 <ipv4-address>
...
7200 packets transmitted, 5269 packets received, 26.8% packet loss
round-trip min/avg/max/stddev = 5.386/23.155/64.633/21.594 ms
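
For reference, the loss figure in that summary can be recomputed from the two counts. This is only a small Python sketch; the function name is made up for illustration, and the regular expression assumes the FreeBSD ping(8) summary wording shown above.

```python
import re

def packet_loss_percent(summary_line: str) -> float:
    """Recompute the loss percentage from the transmitted/received counts."""
    match = re.search(r"(\d+) packets transmitted, (\d+) packets received",
                      summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    return 100.0 * (sent - received) / sent

line = "7200 packets transmitted, 5269 packets received, 26.8% packet loss"
loss = packet_loss_percent(line)
print(f"{loss:.1f}% loss")  # 26.8% loss
```

Assuming ping's default interval of one probe per second, the 1931 missing replies would add up to roughly 32 minutes of unreachability within the two-hour window.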

I have a couple of virtual interfaces, gif, tun (openvpn) and pf(4)
running on that box, but no routing daemons or anything. I also had an
IPv6 ping running simultaneously (using net/mtr) and this is showing me
~0% packet loss over IPv6 (net.inet6.ip6.forwarding is set, but this
does not exhibit the problem).

Now I had this very same software setup running on another box without
any blackouts, so I'm wondering what the re(4) hardware would have to do
with IPv4 forwarding.

Next up, I'm going to test if net.inet.ip.fastforwarding exhibits the
same blackouts. I will also take down all other interfaces, pf and
openvpn and see if I can make any sense of this.
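
To put numbers on the blackouts instead of estimating them, the lost probes could be grouped into runs of consecutive sequence numbers. The following is a rough Python sketch under my own assumptions: the icmp_seq values of the replies have already been collected into a list (the extraction from the ping output is not shown), and the function name and sample data are made up.

```python
def loss_bursts(received, total):
    """Return (first_missing_seq, run_length) for each run of lost probes."""
    seen = set(received)
    bursts = []
    start = None
    for seq in range(total):
        if seq not in seen:
            if start is None:
                start = seq          # a new run of losses begins here
        elif start is not None:
            bursts.append((start, seq - start))
            start = None
    if start is not None:            # the run lasted until the final probe
        bursts.append((start, total - start))
    return bursts

# Made-up sample: 20 probes, no replies for seq 5-9 and for seq 15.
received = [s for s in range(20) if not (5 <= s <= 9 or s == 15)]
bursts = loss_bursts(received, 20)
print(bursts)  # [(5, 5), (15, 1)]
```

A few long runs would match the blackout pattern, while many runs of length 1 would point to evenly spread loss.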

Uli

re0 statistics:
Tx frames : 391300
Rx frames : 383759
Tx errors : 0
Rx errors : 0
Rx missed frames : 0
Rx frame alignment errs : 0
Tx single collisions : 0
Tx multiple collisions : 0
Rx unicast frames : 369915
Rx broadcast frames : 446
Rx multicast frames : 13398
Tx aborts : 0
Tx underruns : 0
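
A quick sanity check on these counters, as a Python sketch: the parser and the choice of which counters count as errors are my own, not anything the driver provides. It checks that the unicast, broadcast and multicast frames add up to the Rx total and that every error counter is zero.

```python
STATS = """\
Tx frames : 391300
Rx frames : 383759
Tx errors : 0
Rx errors : 0
Rx missed frames : 0
Rx frame alignment errs : 0
Tx single collisions : 0
Tx multiple collisions : 0
Rx unicast frames : 369915
Rx broadcast frames : 446
Rx multicast frames : 13398
Tx aborts : 0
Tx underruns : 0
"""

# Counters treated as error indicators here (my own selection).
ERROR_COUNTERS = (
    "Tx errors", "Rx errors", "Rx missed frames", "Rx frame alignment errs",
    "Tx single collisions", "Tx multiple collisions", "Tx aborts",
    "Tx underruns",
)

def parse_stats(text):
    """Turn 'name : value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters

counters = parse_stats(STATS)
rx_by_type = sum(counters[k] for k in (
    "Rx unicast frames", "Rx broadcast frames", "Rx multicast frames"))
nonzero_errors = {k: counters[k] for k in ERROR_COUNTERS if counters[k] != 0}

print(rx_by_type == counters["Rx frames"])  # True
print(nonzero_errors)                       # {}
```

Both checks pass, so as far as these counters go the MAC did not record any dropped or damaged frames.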