broken re(4)

Wed May 28 00:28:32 UTC 2008

On Tue, May 27, 2008 at 04:52:32PM +0200, Gerrit Kühn wrote:
 > Hi folks,
 > 
 > I have four identical ITX boards from Jetway here, each having two re(4)
 > onboard nics:
 > 
 > re0@pci0:0:9:0: class=0x020000 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
 >     vendor     = 'Realtek Semiconductor'
 >     device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
 >     class      = network
 >     subclass   = ethernet
 > re1@pci0:0:11:0: class=0x020000 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
 >     vendor     = 'Realtek Semiconductor'
 >     device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
 >     class      = network
 >     subclass   = ethernet
 > atapci0@pci0:0:15:0: class=0x01018f card=0x31491106 chip=0x31491106 rev=0x80
 > 
 > 
 > I run FreeBSD 7-stable from early March 08 on three of these
 > machines and have noticed no networking problems with that so far.
 > Some days ago I installed a fourth machine with 7-stable from early May
 > (and, because of the problems described below, updated it some days
 > later to May 17th). With this new machine I see several networking
 > problems. The most prominent are these two:
 > 
 > - heavy networking traffic (in this case backup via tar & NFS) causes hangs
 > for about 10s-30s and sometimes also leads to watchdog timeouts: 
 > May 27 09:04:07 protoserve kernel: re0: watchdog timeout 
 > May 27 09:04:07 protoserve kernel: re0: link state changed to DOWN
 > May 27 09:04:10 protoserve kernel: re0: link state changed to UP
 > 
 > - copying large files (more than some 100MB) via ssh/scp drops the
 > connection due to "corrupted MAC on input":
 > Disconnecting: Corrupted MAC on input.
 > lost connection
 > 
 > In the latter case the networking traffic should actually not be that
 > high, because these are nanobsd systems which are transferring a new image
 > file (system update, 2GB) via ssh (so the bottleneck should be the write
 > speed of the CF card used to hold the system).
 > 
 > 
 > I do not see these problems with the old codebase from March 08 on my old
 > machines. The cvs shows a large MFC for the re-driver in April, so I
 > guessed something came in there which broke things here. Therefore I
 > downgraded the new system to a cvs codebase from March 1st, but the
 > problems persist. They also exist on both interfaces. memtest86 has been
 > running for hours now without finding anything wrong.
 > 
 > Any hints what I should do next to find the culprit?
 > 

There were similar reports on this issue. It seems that it's very
hard to make re(4) work on so many RTL8168/8169/8111 revisions without
documentation, as different revisions require different workarounds.
Anyway, would you try this one? The patch was generated against HEAD
but it would apply to STABLE too.
http://people.freebsd.org/~yongari/re/re.HEAD.20080519
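
To compare behaviour before and after the patch, it may help to count the
watchdog timeouts and measure how long the link stays down. The following
is only a rough sketch in Python, assuming the syslog line format quoted
above; the function name summarize() and the year argument are made up for
illustration and are not part of any FreeBSD tool.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
#   May 27 09:04:07 protoserve kernel: re0: watchdog timeout
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<ifname>re\d+):\s+(?P<msg>.*?)\s*$"
)


def summarize(lines, year=2008):
    """Return (timeouts, outages) per re(4) interface.

    timeouts maps interface name -> number of watchdog timeouts.
    outages maps interface name -> list of link-down durations in seconds.
    Hypothetical helper; the year is an assumption because syslog
    timestamps do not carry one.
    """
    timeouts = {}
    outages = {}
    down_since = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a kernel message from an re(4) interface
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        ifname, msg = m["ifname"], m["msg"]
        if msg == "watchdog timeout":
            timeouts[ifname] = timeouts.get(ifname, 0) + 1
        elif msg == "link state changed to DOWN":
            down_since[ifname] = ts
        elif msg == "link state changed to UP" and ifname in down_since:
            secs = (ts - down_since.pop(ifname)).total_seconds()
            outages.setdefault(ifname, []).append(secs)
    return timeouts, outages
```

It could be fed the lines of /var/log/messages (for example via
open("/var/log/messages")) once with the old driver and once with the
patched one, under the same backup load.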

-- 
Regards,
Pyun YongHyeon