igb network lockups

Barney Cordoba barney_cordoba at yahoo.com
Sat Mar 2 14:28:13 UTC 2013



--- On Mon, 2/25/13, Christopher D. Harrison <harrison at biostat.wisc.edu> wrote:

> From: Christopher D. Harrison <harrison at biostat.wisc.edu>
> Subject: Re: igb network lockups
> To: "Jack Vogel" <jfvogel at gmail.com>
> Cc: freebsd-net at freebsd.org
> Date: Monday, February 25, 2013, 1:38 PM
> Sure,
> The problem appears both on systems running ALTQ and on vanilla ones.
>      -C
> On 02/25/13 12:29, Jack Vogel wrote:
> > I've not heard of this problem, but I think most users do not use
> > ALTQ, and we (Intel) do not test using it. Can it be eliminated
> > from the equation?
> >
> > Jack
> >
> >
> > On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison
> > <harrison at biostat.wisc.edu> wrote:
> >
> >     I recently have been experiencing network "freezes" and network
> >     "lockups" on our FreeBSD 9.1 systems, which are running ZFS and
> >     NFS file servers. I upgraded from 9.0 to 9.1 about 2 months ago
> >     and we have been having issues almost bi-monthly. The issue
> >     manifests as the system becoming unresponsive to any/all NFS
> >     clients. The system is not resource bound, as our I/O to disk is
> >     low and our network is usually in the 20mbit/40mbit range. We do
> >     notice a correlation between temporary I/O spikes and network
> >     freezes, but not enough to send our system into "lockup" mode
> >     for the next 5 min. Currently we have 4 igb NICs in 2 aggr's
> >     with 8 queues per NIC, and our dev.igb reports:
> >
> >     dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4
> >
> >     I am almost certain the problem is with the igb driver, as a
> >     friend is also experiencing the same problem with the same Intel
> >     igb NIC. He has addressed the issue by restarting the network
> >     using netif on his systems. According to my friend, once the
> >     network interfaces get cleared, everything comes back and starts
> >     working as expected.
> >
> >     I have noticed an issue with the igb driver and I was looking
> >     for thoughts on how to help address this problem.
> >     http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html
> >
> >     Thoughts/Ideas are greatly appreciated!!!
> >
> >         -C

Do you have 32 CPUs in the system? With 4 NICs at 8 queues each you have
32 queues. You've created a lock contention nightmare; frankly I'm
surprised that the system runs at all.

Try running with 1 queue per nic. The point of using queues is to spread
the load; the fact that you're even using queues with such a minuscule load
is a commentary on the blind use of "features" without any explanation or
understanding of what they do.
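
A minimal sketch of that change, assuming the igb driver version in use
honors the hw.igb.num_queues loader tunable (verify against the driver
source or igb(4) on your system before relying on it). It is read at boot,
so it goes in /boot/loader.conf and takes effect after a reboot:

```
# /boot/loader.conf
# Assumption: this igb driver version honors hw.igb.num_queues.
# 0 is understood to mean "autoconfigure"; 1 forces one queue per NIC.
hw.igb.num_queues=1
```

The setting applies to every igb interface, so all 4 NICs would drop from
8 queues to 1.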

Does igb still bind to CPUs without regard to whether it's a real CPU or
a hyperthread? This needs to be removed.

I wish that someone who understood this stuff would have a beer with Jack
and explain to him why this design is defective. The "default" for this
driver is almost always the wrong configuration.

You don't need to spread the load with 40Mb/s throughput, and using
multiple queues will use a lot more CPU than using just 1. Do you really
want 4 CPUs using 10% each instead of 1 using 14%?

You also should consider increasing your tx buffers; a property of
applications like ALTQ is that they tend to send out big bursts of
packets and they can overflow the rings. I'm not specifically familiar with
ALTQ so I'm not sure how it handles such things; nor am I sure of how it
handles multiple tx queues, if at all.
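
As a sketch of the tx buffer change, assuming the hw.igb.txd loader tunable
(number of transmit descriptors) documented in igb(4) is present in your
driver version. The value below is illustrative only and must fall within
the range the driver accepts; check igb(4) for the limits on your release:

```
# /boot/loader.conf
# Assumption: hw.igb.txd sets the per-ring transmit descriptor count.
# 4096 is an illustrative value, not a measured recommendation.
hw.igb.txd=4096
```

A larger ring uses more memory per queue, which is one more reason to keep
the queue count low.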

BC

