broken re(4)

Tue May 27 15:45:21 UTC 2008

Gerrit Kühn wrote:
> Hi folks,
> 
> I have four identical ITX boards from Jetway here, each having two re(4)
> onboard nics:
> 
> re0 at pci0:0:9:0: class=0x020000 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
>     vendor     = 'Realtek Semiconductor'
>     device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
>     class      = network
>     subclass   = ethernet
> re1 at pci0:0:11:0: class=0x020000 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
>     vendor     = 'Realtek Semiconductor'
>     device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
>     class      = network
>     subclass   = ethernet
> atapci0 at pci0:0:15:0: class=0x01018f card=0x31491106 chip=0x31491106 rev=0x80
> 
> 
> I run FreeBSD 7-stable from early March 08 on three of these
> machines and noticed no problems with networking with that so far.
> Some days ago I installed a fourth machine with 7-stable from early May
> (and updated it some days later, because of the problems described below,
> to May 17th). With this new machine I see several networking problems. The most
> prominent are these two:
> 
> - heavy networking traffic (in this case backup via tar & NFS) causes hangs
> for about 10s-30s and sometimes also leads to watchdog timeouts:
> May 27 09:04:07 protoserve kernel: re0: watchdog timeout
> May 27 09:04:07 protoserve kernel: re0: link state changed to DOWN
> May 27 09:04:10 protoserve kernel: re0: link state changed to UP
> 
> - copying large files (more than some 100MB) via ssh/scp drops the
> connection due to "corrupted MAC on input":
> Disconnecting: Corrupted MAC on input.
> lost connection
> 
> In the latter case the networking traffic should actually not be that
> high, because these are nanobsd systems which are transferring a new image
> file (system update, 2GB) via ssh (so the bottleneck should be the write
> speed of the CF card used to hold the system).
> 
> 
> I do not see these problems with the old codebase from March 08 on my old
> machines. The cvs shows a large MFC for the re-driver in April, so I
> guessed something came in there which broke things here. Therefore I
> downgraded the new system to a cvs codebase from March 1st, but the
> problems persist. They also exist on both interfaces. memtest86 has been
> running for hours now without finding anything wrong.
> 
> Any hints on what I should do next to find the culprit?
> 

I'm running 6.3 on the exact same Jetway board at home, and while I
haven't been bitten by the DOWN/UP issue I have seen the occasional
"corrupted MAC on input" error when doing an ssh/scp. It seems to have
simmered down since moving from 6.3-RELEASE to 6.3-STABLE (last
supped/rebuilt on 5/6/08).

Note this is using only one of the 2 on-board NICs. I disabled the 2nd
one in the BIOS as I don't need it at the moment.

-Proto
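
When comparing machines or codebases, it may help to count how often the
watchdog fires and how long each link outage lasts. The following is only a
rough sketch in Python, not something either poster ran: it assumes syslog
lines shaped like the three quoted above, an English locale for month names,
and a known year (syslog timestamps omit it). The names summarize, LINE_RE
and SAMPLE are made up for illustration.

```python
import re
from datetime import datetime

# Matches kernel messages from re(4) interfaces, e.g.
# "May 27 09:04:07 protoserve kernel: re0: watchdog timeout"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(?P<ifname>re\d+):\s+(?P<msg>.*)$"
)

# The three log lines quoted in the original report.
SAMPLE = [
    "May 27 09:04:07 protoserve kernel: re0: watchdog timeout",
    "May 27 09:04:07 protoserve kernel: re0: link state changed to DOWN",
    "May 27 09:04:10 protoserve kernel: re0: link state changed to UP",
]


def summarize(lines, year=2008):
    """Return (timeouts per interface, list of (interface, outage seconds))."""
    timeouts = {}
    down_since = {}
    outages = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a re(4) kernel message
        # Assumption: the year is supplied by the caller.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        ifname, msg = m["ifname"], m["msg"]
        if msg == "watchdog timeout":
            timeouts[ifname] = timeouts.get(ifname, 0) + 1
        elif msg == "link state changed to DOWN":
            down_since[ifname] = ts
        elif msg == "link state changed to UP" and ifname in down_since:
            start = down_since.pop(ifname)
            outages.append((ifname, (ts - start).total_seconds()))
    return timeouts, outages


if __name__ == "__main__":
    counts, outages = summarize(SAMPLE)
    print(counts)
    print(outages)
```

Feeding it the contents of /var/log/messages from each machine (for example
summarize(open("/var/log/messages")) ) would give numbers that can be compared
between the March and May builds.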