broken re(4)
Gerrit Kühn
gerrit at pmp.uni-hannover.de
Thu May 29 18:45:49 UTC 2008
On Thu, 29 May 2008 18:52:55 +0200 (CEST) Oliver Fromme
<olli at lurza.secnetix.de> wrote about Re: broken re(4):
OF> In that case I would suspect that the one piece of hardware
OF> that is misbehaving is broken and needs to be replaced.
I agree. I just do not know yet which part is broken.
OF> > The only hardware thing that is different in this system from the
OF> > others is an additional SATA-controller. Can there be conflicts with
OF> > this card which are triggering the problems?
OF> I think it's unlikely. Do they share interrupts? (The
OF> output of "vmstat -i" will tell you.)
protoserve# vmstat -i
interrupt                      total       rate
irq0: clk                   31564049       1000
irq7: ppbus0 ppc0                  1          0
irq8: rtc                    4038754        127
irq9: uhci0 uhci1+                 2          0
irq10: re0 re1+              2401340         76
irq11: atapci0+++             655498         20
irq14: ata0                    11167          0
Total                       38670811       1225
Just the two NICs on the same IRQ. A system that is working fine looks
like this:
firefly1# vmstat -i
interrupt                      total       rate
irq0: clk                 2614761182       1000
irq1: atkbd0                     902          0
irq7: ppbus0 ppc0                  1          0
irq8: rtc                  334559120        127
irq10: re0 re1+             24354774          9
irq11: atapci0++++             70905          0
irq14: ata0                   800110          0
Total                     2974546994       1138
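For comparing several machines, the sharing check can be scripted rather than read off by eye. The following is a minimal sketch, not part of the original exchange: the function name shared_irqs is hypothetical, and it assumes that each trailing "+" in a "vmstat -i" device name stands for one further handler whose name did not fit in the column.

```python
import re

def shared_irqs(vmstat_output):
    """Return {irq: [handlers]} for IRQ lines that list more than one handler."""
    shared = {}
    for line in vmstat_output.splitlines():
        # Expected shape: "irqN: name [name...] total rate"
        m = re.match(r"(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$", line.strip())
        if not m:
            continue  # header, "Total" line, or anything unexpected
        irq, names = m.group(1), m.group(2)
        # Assumption: every trailing '+' is one more handler with a truncated name.
        extra = len(names) - len(names.rstrip("+"))
        handlers = names.rstrip("+").split()
        if len(handlers) + extra > 1:
            shared[irq] = handlers + ["?"] * extra
    return shared

# Output of "vmstat -i" on protoserve, as quoted in this message.
sample = """\
interrupt total rate
irq0: clk 31564049 1000
irq7: ppbus0 ppc0 1 0
irq8: rtc 4038754 127
irq9: uhci0 uhci1+ 2 0
irq10: re0 re1+ 2401340 76
irq11: atapci0+++ 655498 20
irq14: ata0 11167 0
Total 38670811 1225
"""

result = shared_irqs(sample)
for irq, handlers in result.items():
    print(irq, handlers)
```

Under that assumption, a "?" entry marks a handler that vmstat did not name, so the "+" after re1 would mean irq10 carries something besides the two NICs.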
OF> In theory it could also be a power supply problem. I
OF> assume that you use rather small (thus possibly weak)
OF> power supplies for your ITX machines. Maybe the SATA
OF> controller in that problematic machine drives the power
OF> supply to its limit, and the re(4) interfaces suffer.
OF> You could check whether removing the SATA controller
OF> improves things. Or try to connect a stronger power
OF> supply if you have one available.
I have Travla C146/C147 chassis for these machines and use the power supply
that comes with them.
However, the definitive test for anything controller-related is simply to
remove the card. I will try this tomorrow (the systems are at work,
and I am at home now - can't unplug a controller via ssh :-).
OF> - Do you see any non-zero numbers in the collision or
OF> error columns of "netstat -i"?
No:
protoserve# netstat -i
Name    Mtu Network      Address              Ipkts Ierrs    Opkts Oerrs  Coll
re0    1500 <Link#1>     00:30:18:af:19:6a   131032     0   271757     0     0
re0    1500 10.117.0.0   protoserve           80442     -   271722     -     -
re1    1500 <Link#2>     00:30:18:af:19:6b  1474484     0  1114542     0     0
re1    1500 192.168.0.0  192.168.2.1        1471156     -  1114457     -     -
plip0  1500 <Link#3>                              0     0        0     0     0
lo0   16384 <Link#4>                              0     0        0     0     0
lo0   16384 fe80:4::1    fe80:4::1                0     -        0     -     -
lo0   16384 localhost    ::1                      0     -        0     -     -
lo0   16384 your-net     localhost                0     -        0     -     -
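To repeat this check over time or across hosts, the counters can be extracted automatically. This is a rough sketch under stated assumptions: interfaces_with_errors is a hypothetical helper, and it relies on the last five columns of "netstat -i" being Ipkts, Ierrs, Opkts, Oerrs and Coll, as in the output above (the Address column may be empty, so the fields are counted from the right).

```python
def interfaces_with_errors(netstat_output):
    """Return {interface: {counter: value}} for non-zero error/collision counters."""
    problems = {}
    for line in netstat_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 7:
            continue  # too short to hold name, mtu and five counters
        name = fields[0]
        # Count from the right because the Address column can be missing.
        _ipkts, ierrs, _opkts, oerrs, coll = fields[-5:]
        for label, value in (("Ierrs", ierrs), ("Oerrs", oerrs), ("Coll", coll)):
            # Per-address rows show "-" instead of a number; those are skipped.
            if value.isdigit() and int(value) > 0:
                problems.setdefault(name, {})[label] = int(value)
    return problems

# Link-level rows from the protoserve output quoted in this message.
sample = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
re0 1500 <Link#1> 00:30:18:af:19:6a 131032 0 271757 0 0
re0 1500 10.117.0.0 protoserve 80442 - 271722 - -
re1 1500 <Link#2> 00:30:18:af:19:6b 1474484 0 1114542 0 0
plip0 1500 <Link#3> 0 0 0 0 0
"""

problems = interfaces_with_errors(sample)
print(problems or "no errors or collisions reported")
```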
OF> - Are you sure the interfaces don't have the same MAC
OF> addresses (it's unlikely, but it doesn't hurt to check
OF> in the ifconfig output).
Yes:
protoserve# ifconfig
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
        ether 00:30:18:af:19:6a
        inet 10.117.15.1 netmask 0xffff0000 broadcast 10.117.255.255
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
        ether 00:30:18:af:19:6b
        inet 192.168.2.1 netmask 0xffff0000 broadcast 192.168.255.255
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
plip0: flags=108810<POINTOPOINT,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
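With more than a handful of machines, the MAC comparison is easy to automate. A minimal sketch follows; duplicate_macs is a hypothetical helper, and it assumes ifconfig output in the layout shown above (an "name: flags=" header per interface followed by an "ether" line).

```python
import re
from collections import defaultdict

def duplicate_macs(ifconfig_output):
    """Map each MAC address seen on more than one interface to those interfaces."""
    by_mac = defaultdict(list)
    current = None
    for line in ifconfig_output.splitlines():
        header = re.match(r"(\w+): flags=", line)
        if header:
            current = header.group(1)  # start of a new interface block
            continue
        ether = re.match(r"\s*ether\s+([0-9a-fA-F:]{17})\b", line)
        if ether and current:
            by_mac[ether.group(1).lower()].append(current)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}

# Trimmed from the protoserve output quoted in this message.
sample = """\
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:30:18:af:19:6a
re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:30:18:af:19:6b
"""

dupes = duplicate_macs(sample)
print(dupes or "no duplicate MAC addresses")
```

To check across hosts rather than within one, the ifconfig output of all machines could be concatenated with interface names prefixed by the host name; that variation is left out of the sketch.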
OF> - Are you sure that media and duplex settings are
OF> correct on both sides (i.e. PC and switch)?
The systems are all on the same switch (I also swapped the switch during
the tests, with no change), and all devices show a 1 Gbit link.
OF> - Have you tried replacing cables, switch ports, or the
OF> whole switch?
Yes, all of that.
OF> - Have you tried to disable hardware support features
OF> of the driver? In 7-stable re(4) supports quite a lot
OF> of hardware features. See "ifconfig -m". You could
OF> check whether disabling RXCSUM, TXCSUM and/or TSO4
OF> makes a difference.
Another good idea, thanks. I will try that tomorrow, too.
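The offload test suggested above amounts to a few ifconfig invocations per interface. The sketch below only prints them so they can be reviewed and run one at a time. It assumes the ifconfig(8) parameters -rxcsum, -txcsum and -tso apply to this driver version; offload_test_commands is a hypothetical helper name.

```python
def offload_test_commands(interface, features=("rxcsum", "txcsum", "tso")):
    """Return one ifconfig command per feature, each disabling that feature."""
    # A leading '-' in front of an ifconfig parameter turns the feature off.
    return [f"ifconfig {interface} -{feature}" for feature in features]

# Both re(4) interfaces of the problematic machine.
commands = [cmd for nic in ("re0", "re1") for cmd in offload_test_commands(nic)]
for cmd in commands:
    print(cmd)
```

Disabling one feature at a time and retesting after each step makes it possible to tell which feature, if any, is involved.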
cu
Gerrit