Diagnosing packet loss

Matthew Seaman m.seaman at infracaninophile.co.uk
Thu Nov 24 11:43:14 UTC 2011


On 24/11/2011 10:07, Kees Jan Koster wrote:
> This seems to be local to my machine. Here is another reason why I
> say that: I can reliably transmit data when I bind to the aliased IP
> address: If I use mtr to measure packet loss from saffron (the stricken
> machine) to cumin (another machine in a different data center) I see the
> following:
> 
>  saffron (ip address a) -> cumin: packet loss
>  saffron (ip address b) -> cumin: no packet loss
> 
>  cumin -> saffron (ip address a): packet loss
>  cumin -> saffron (ip address b): no packet loss
> 
> This is consistent from running mtr for 5 minutes straight. This to
> me shows that the hardware is fine. Using the alias IP address I can
> run with no packet loss for as long as I like.
> 
> Sooo.... Now what? I am completely at a loss. :-/

Hmm... I wouldn't dismiss hardware problems just yet. Earlier you showed
the ifconfig output for your problem machine:

> [kjkoster at saffron ~]$ ifconfig bge0
> bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
> 	ether 00:e0:81:32:ed:b4
> 	inet 91.196.169.165 netmask 0xfffffff8 broadcast 91.196.169.167
> 	inet 91.196.169.166 netmask 0xffffffff broadcast 91.196.169.166
> 	media: Ethernet autoselect (100baseTX <full-duplex,flowcontrol,rxpause,txpause>)
> 	status: active

The two addresses there, .165 and .166, differ only in their lowest two
bits: one is odd and the other is even.  Can you temporarily configure
two even-numbered addresses, then two odd-numbered addresses, and repeat
your mtr tests each time?  If the packet loss correlates with whether the
address is even or odd, then I think that's pretty good evidence for a
dud network interface: a one-bit problem in a memory register somewhere,
occasionally flipping the least significant bit of the address to 0.
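
To make the bit arithmetic concrete, here is a minimal Python sketch. It
assumes the fault does nothing except force the lowest bit of the last
octet to 0; the function names are made up for illustration. Strictly,
.165 and .166 differ in their two lowest bits, but only the lowest one
decides odd versus even, and that is the bit the test isolates.

```python
# Last octets of the two addresses from the ifconfig output above.
ADDR_A = 165  # 91.196.169.165
ADDR_B = 166  # 91.196.169.166


def differing_bits(x, y):
    """Return the bit positions (0 = least significant) where x and y differ."""
    diff = x ^ y
    return [i for i in range(8) if (diff >> i) & 1]


def clear_lsb(octet):
    """Model the hypothesised fault (an assumption): lowest bit forced to 0."""
    return octet & ~1


for octet in (ADDR_A, ADDR_B):
    corrupted = clear_lsb(octet)
    parity = "odd" if octet & 1 else "even"
    effect = "unchanged" if corrupted == octet else f"becomes .{corrupted}"
    print(f".{octet} = {octet:08b}  {parity:4}  {effect}")

print("differing bit positions:", differing_bits(ADDR_A, ADDR_B))
```

Under that assumption only odd addresses can be corrupted, so two
odd-numbered addresses should both show loss and two even-numbered ones
should both be clean.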

Another test would be to swap the configuration order (i.e. make .166
the primary address and .165 the alias) -- if it's always the
first-configured address that has problems, that again indicates memory
trouble in the hardware.

Are these NICs built into your motherboard?  If so, they will almost
certainly share a PHY, which is where the problem would be, and why
swapping the cables between interfaces made no difference.
Unfortunately, in that case, to fix the problem you'll either have to
swap out the motherboard or add a separate NIC to your system.
Hopefully the system is still under warranty.

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
JID: matthew at infracaninophile.co.uk               Kent, CT11 9PW


More information about the freebsd-questions mailing list