Diagnosing packet loss

Kees Jan Koster kjkoster at gmail.com
Tue Nov 22 21:58:19 UTC 2011

Dear Gary,

Thank you for your reply. Your comment about a duplicate IP reminded me of something that I failed to mention: the interface is aliased. It has two IP addresses: IP address a, and an alias, IP address b. I just tested binding mtr to each of these addresses separately to measure packet loss.

If I use mtr to measure packet loss from saffron (the stricken machine) to cumin (another machine in a different data center) I see the following:

  saffron (ip address a) -> cumin: packet loss
  saffron (ip address b) -> cumin: no packet loss

  cumin -> saffron (ip address a): packet loss
  cumin -> saffron (ip address b): no packet loss

These results were consistent while running mtr for 5 minutes straight. To me, this shows that the hardware is fine: using the alias IP address, I can run with no packet loss for as long as I like.
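
For anyone who wants to repeat this per-address comparison in a scripted way, here is a minimal sketch in Python. It is not what I ran (I used mtr); it assumes FreeBSD's ping, where -S selects the source address, and it assumes the usual "N packets transmitted, M packets received" summary line. The names loss_percent and probe are made up for this example.

```python
import re
import subprocess

# Matches the summary line printed by FreeBSD ping ("... 297 packets received")
# and also the Linux variant ("... 297 received").
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent(ping_output):
    """Return packet loss in percent, computed from a ping summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


def probe(source, target, count=300):
    """Ping target from one local source address and return the loss in percent.

    Assumption: FreeBSD ping, where -S sets the source address, -c the number
    of packets and -q suppresses the per-packet output.
    """
    result = subprocess.run(
        ["ping", "-q", "-c", str(count), "-S", source, target],
        capture_output=True,
        text=True,
        check=False,
    )
    return loss_percent(result.stdout)
```

Calling probe() once per local address (a and b) against the same target, and once more from the remote side, gives the same four measurements as in the table above.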

Hum.... Could it be that my switch does not support IP aliasing? Then why is there packet loss only on one IP and not on both?

This is getting weirder and weirder.

Kees Jan

On 22 Nov 2011, at 22:15, Gary Gatten wrote:

> Well, 1% is not good but I've seen worse for sure!  Sounds like you tried the obvious.  I would recommend a different IP to rule out a dupe ip; else it must be NIC related - either hardware or driver.  Also, perhaps swap cables and ports with a working machine and see if the problem follows or stays put.
> ----- Original Message -----
> From: Kees Jan Koster [mailto:kjkoster at gmail.com]
> Sent: Tuesday, November 22, 2011 02:33 PM
> To: freebsd-questions at freebsd.org <freebsd-questions at freebsd.org>
> Subject: Diagnosing packet loss
> Dear All,
> I am stuck with a machine that shows serious packet loss (about 1% of all traffic is dropped). I tried the obvious (new network cable, different switch port, different ethernet interface on the machine), but the problems remain.
> Another machine that sits in the same rack and is hooked up to the same switch shows no such packet loss issues. The problematic machine is a dual Opteron with FreeBSD 8.2-STABLE from Thu Aug 11 14:05:47 CEST 2011.
> The machine is lightly loaded. A MySQL slave is running, but the machine is not serving queries. Plus a Munin server process.
> I am at a loss where to start diagnosing this. Can you advise me where to look? Are there network buffers that may be overflowing?
> --
> Kees Jan
> http://java-monitor.com/
> kjkoster at kjkoster.org
> +31651838192
> Change is good. Granted, it is good in retrospect, but change is good.

Kees Jan

kjkoster at kjkoster.org

I hate unit tests; I much prefer the illusion that there are no errors in my code.
                                                                 -- Hendrik Muller
