Dear All,

Thank you so much for the excellent suggestions. I can tell some of you have a lot of experience troubleshooting this issue.

At this stage I have ruled out hardware and network issues. These are server-grade network interfaces with new cables, and the ifconfig configuration seems to be in order. netstat shows no collisions or packet errors for the past week or so.

I am dead certain there is no duplicate IP. The other machines on the switch (the test and load-test boxes) are currently off, and I still see packet loss. There simply is no other machine on the subnet that might have the same IP.

This seems to be local to my machine. Here is another reason why I say that: I can reliably transmit data when I bind to the aliased IP address. If I use mtr to measure packet loss between saffron (the stricken machine) and cumin (a machine in a different data center), I see the following:

 saffron (ip address a) -> cumin: packet loss
 saffron (ip address b) -> cumin: no packet loss

 cumin -> saffron (ip address a): packet loss
 cumin -> saffron (ip address b): no packet loss

These results stayed consistent while running mtr for 5 minutes straight. To me, this shows that the hardware is fine: using the alias IP address, I can run with no packet loss for as long as I like.
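
For anyone who wants to reproduce the outbound half of this comparison without mtr, here is a minimal sketch of how it could be scripted. It is not what I actually ran. It assumes a Linux box with iputils ping, where -I accepts a source address, and the hostname and both IP addresses are placeholders that need to be replaced with the real ones. The function names are made up for illustration.

```python
import re
import subprocess


def ping_command(source_ip, target, count=20):
    """Build a ping command line that sends from a specific source address.

    Assumption: Linux iputils ping, where -I takes a source address
    (or interface name), -c sets the packet count and -n skips DNS lookups.
    """
    return ["ping", "-n", "-c", str(count), "-I", source_ip, target]


def parse_loss(ping_output):
    """Return the packet loss percentage from ping's summary, or None."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


def measure_loss(source_ip, target, count=20):
    """Run ping bound to source_ip and return the reported loss percentage."""
    result = subprocess.run(
        ping_command(source_ip, target, count),
        capture_output=True,
        text=True,
    )
    return parse_loss(result.stdout)


if __name__ == "__main__":
    target = "cumin.example.net"  # placeholder hostname
    sources = {
        "ip address a": "192.0.2.10",  # placeholder for the primary address
        "ip address b": "192.0.2.11",  # placeholder for the aliased address
    }
    for label, source_ip in sources.items():
        loss = measure_loss(source_ip, target)
        print(f"saffron ({label}) -> cumin: {loss}% packet loss")
```

Run on saffron, this prints one line per source address, so the two can be compared side by side over a longer run by raising count.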

Sooo.... Now what? I am completely at a loss. :-/
