High CPU interrupt load on intel I350T4 with igb on 8.3

Barney Cordoba barney_cordoba at yahoo.com
Sat May 11 15:59:02 UTC 2013



--- On Fri, 5/10/13, Eugene Grosbein <egrosbein at rdtc.ru> wrote:

> From: Eugene Grosbein <egrosbein at rdtc.ru>
> Subject: Re: High CPU interrupt load on intel I350T4 with igb on 8.3
> To: "Barney Cordoba" <barney_cordoba at yahoo.com>
> Cc: freebsd-net at freebsd.org, "Clément Hermann (nodens)" <nodens2099 at gmail.com>
> Date: Friday, May 10, 2013, 8:56 AM
> On 10.05.2013 05:16, Barney Cordoba wrote:
> 
> >>>> Network device driver is not guilty here, that's just pf's
> >>>> contention running in igb's context.
> >>>
> >>> They're both at play. Single threadedness aggravates subsystems that
> >>> have too many lock points.
> >>>
> >>> It can also be "solved" with using 1 queue, because then you don't
> >>> have 4 queues going into a single thread.
> >>
> >> Again, the problem is within pf(4)'s global lock, not in the
> >> igb(4).
> >>
> > 
> > Again, you're wrong. It's not the bottleneck's fault; it's the fault of
> > the multi-threaded code for only working properly when there are no
> > bottlenecks.
> 
> In practice, the problem is easily solved without any change in the igb code.
> The same problem will occur for other NIC drivers too -
> if several NICs were combined within one lagg(4). So, driver is not guilty and
> solution would be same - eliminate bottleneck and you will be fine and capable
> to spread the load on several CPU cores.
> 
> Therefore, I don't care of CS theory for this particular case.

Clearly you don't understand the problem. Your logic is that because
other drivers are defective too, it's not a driver problem?

The problem is caused by a multi-threaded driver that haphazardly launches
tasks and doesn't handle the case where the rest of the system can't keep
up with the load. It's no different from a driver that barfs when mbuf
clusters are exhausted. The answer isn't to increase memory or mbufs, even
though that may alleviate the problem. The answer is to fix the driver,
so that it doesn't crash the system for an event that is wholly predictable.

igb 1) has too many locks and 2) exacerbates the problem by binding to
CPUs, which forces it to wait not only for the lock to free up, but
also for a specific CPU to become free. So it chugs along happily until
it encounters a bottleneck, at which point it quickly blows up the entire
system in a domino effect. It needs to manage locks more efficiently, and
also to detect when the backup is unmanageable.
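
To make the "don't sit and wait on the lock" point concrete, here is a
rough userland sketch, not igb(4) or kernel code. It assumes a single
"stack busy" flag standing in for something like a global firewall lock,
built from a C11 atomic_flag. The names try_deliver, rx_stats and
stack_busy are made up for illustration. The idea is only that the
receive path tries the lock once and, if it is contended, reports that
back so the caller can queue or drop instead of stalling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue counters; touched by one thread only, so not atomic. */
struct rx_stats {
    unsigned long delivered;  /* packets handed to the protected section */
    unsigned long deferred;   /* attempts that found the lock busy */
};

/*
 * Try once to enter the section guarded by stack_busy.
 * Returns true if the packet was delivered, false if the lock was held
 * and the caller should queue or drop the packet rather than wait.
 */
static bool
try_deliver(atomic_flag *stack_busy, struct rx_stats *st)
{
    assert(stack_busy != NULL && st != NULL);

    /* test_and_set returns the previous value: true means already held. */
    if (atomic_flag_test_and_set_explicit(stack_busy, memory_order_acquire)) {
        st->deferred++;
        return false;
    }

    /* ... the protected work (e.g. a firewall pass) would go here ... */
    st->delivered++;

    atomic_flag_clear_explicit(stack_busy, memory_order_release);
    return true;
}
```

A growing deferred count is one cheap signal that the backup is getting
out of hand and that it is time to shed load.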

Ever since FreeBSD 5 the answer has been "it's fixed in 7, or it's fixed in
9, or it's fixed in 10". There will always be bottlenecks, and no driver
should blow up the system no matter what intermediate code may present a
problem. It's the driver's responsibility to behave and to drop packets
if necessary.
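
As an illustration of "drop packets if necessary", here is a minimal
sketch of a bounded receive queue with tail drop, again plain userland C
and not the igb(4) code. The names rxq, RXQ_CAP, rxq_enqueue and
rxq_dequeue are hypothetical, and ints stand in for packet pointers. When
the consumer falls behind, the producer counts a drop and moves on
instead of growing the backlog or blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RXQ_CAP 4  /* deliberately tiny so the overflow path is easy to hit */

struct rxq {
    int           slots[RXQ_CAP]; /* stand-in for packet pointers */
    size_t        head;           /* index of the next packet to dequeue */
    size_t        count;          /* packets currently queued */
    unsigned long drops;          /* packets discarded because the queue was full */
};

static void
rxq_init(struct rxq *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    q->drops = 0;
}

/* Enqueue one packet; on a full queue, count a drop and return false. */
static bool
rxq_enqueue(struct rxq *q, int pkt)
{
    if (q->count == RXQ_CAP) {
        q->drops++;
        return false;
    }
    q->slots[(q->head + q->count) % RXQ_CAP] = pkt;
    q->count++;
    return true;
}

/* Dequeue the oldest packet into *pkt; returns false if the queue is empty. */
static bool
rxq_dequeue(struct rxq *q, int *pkt)
{
    if (q->count == 0)
        return false;
    *pkt = q->slots[q->head];
    q->head = (q->head + 1) % RXQ_CAP;
    q->count--;
    return true;
}
```

With RXQ_CAP at 4, offering six packets to an idle consumer leaves four
queued and two counted as drops. The load is shed at a known, bounded
point rather than somewhere further up the stack.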

BC

