igb(4) watchdog timeout, lagg(4) fails

Harald Schmalzbauer h.schmalzbauer at omnilan.de
Thu Jan 8 10:22:40 UTC 2015


Regarding Harald Schmalzbauer's message of 08.01.2015 11:05 (localtime):
> Regarding Harry Schmalzbauer's message of 07.01.2015 06:39 (localtime):
>>  Hello,
>>
>> Recently I upgraded one server from 9.1 to 10.1. It has two 82576 ports
>> (one port on each of two Intel ET Dual-Port GbE [kawela] cards), driven
>> by igb(4). I never saw a watchdog timeout with FreeBSD 9.1, but suddenly
>> (with 10-STABLE) I see:
>> igb0: Watchdog timeout -- resetting
>> igb0: Queue(0) tdh = 2974, hw tdt = 2973
>> igb0: TX(0) desc avail = 0,Next TX to Clean = 0
>>
>> My biggest problem is that lagg(4) doesn't detect the problem with
>> igb0. It's configured with 'lagghash l2' and most connections were
>> interrupted until I manually ran 'ifconfig igb0 down'. Then lagg did its
>> job and connectivity was restored via the remaining igb1.
>>
>> Is there a way to automatically bring down ('ifconfig down') an interface
>> that suffers from watchdog timeouts? And is there any way to really reset
>> it without rebooting the machine?
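Since lagg(4) did not react on its own here, one workaround is to automate the manual step. A minimal sketch, assuming the log wording shown above stays stable: a hypothetical helper (the names `watchdog_interfaces` and `down_command` are mine, not FreeBSD's) that scans kernel log lines for `igbN: Watchdog timeout -- resetting` and prints the `ifconfig igbN down` command for each affected port. It is a dry run; it only prints commands and never executes them.

```python
#!/usr/bin/env python3
# Hypothetical helper (not part of FreeBSD): find igb(4) ports that logged a
# watchdog timeout and print the "ifconfig <if> down" command for each, so
# lagg(4) can fail over to the remaining member. Dry run only.
import re
import sys

# Message text copied from the log excerpt above; other driver versions may
# word it differently (assumption). Matches with or without a syslog prefix.
WATCHDOG_RE = re.compile(r"\b(?P<ifname>igb\d+): Watchdog timeout -- resetting")


def watchdog_interfaces(lines):
    """Return interfaces that reported a watchdog timeout, in first-seen order."""
    found = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match and match.group("ifname") not in found:
            found.append(match.group("ifname"))
    return found


def down_command(ifname):
    """Build the command that was run by hand to take the port out of service."""
    return ["ifconfig", ifname, "down"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 igb_watch.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for ifname in watchdog_interfaces(log):
            print(" ".join(down_command(ifname)))
```

Whether to run the printed command automatically is a judgment call; a port taken down this way stays down until someone runs `ifconfig igbN up` again.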
> The igb watchdog timeout happened again :-( about 48 hours after the last
> one, with moderate-to-low average traffic.
>
> This time I could fetch the dev.igb sysctls before igb0 was reset by the
> watchdog. They show a strange IRQ load:

systat shows:
   3 igb1:que 0
1619 igb1:que 1
   3 igb1:que 2
   1 igb1:que 3

sysctl dev.igb, on the other hand, shows:
dev.igb.1.queue0.interrupt_rate: 43478
dev.igb.1.queue1.interrupt_rate: 76923
dev.igb.1.queue2.interrupt_rate: 111111
dev.igb.1.queue3.interrupt_rate: 90909

How should I interpret sysctl's interrupt_rate value?
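The four sysctl values look like a configured ceiling rather than a count: each is the integer part of 1,000,000 divided by 23, 13, 9 and 11 respectively. A minimal sketch, assuming the igb(4) driver derives interrupt_rate from the queue's interrupt throttle interval in microseconds, read from bits 14:2 of the EITR register (my reading of if_igb.c; the mask constant and helper names are assumptions), converts the reported values back to intervals. If that assumption holds, interrupt_rate says how often a queue may interrupt, not how often it did, which would explain why it bears no relation to the systat counts.

```python
# Sketch under assumptions: interrupt_rate = 1000000 // interval_usec, with
# the interval taken from bits 14:2 of the queue's EITR register.
# Verify against your own if_igb.c before relying on this.

EITR_INTERVAL_MASK = 0x7FFC  # assumed interval field, bits 14:2


def rate_from_eitr(reg):
    """Compute the reported rate from a raw EITR register value (assumed layout)."""
    usec = (reg & EITR_INTERVAL_MASK) >> 2
    return 1000000 // usec if usec > 0 else 0


def interval_from_rate(rate):
    """Invert a reported rate back to the throttle interval in microseconds."""
    return 1000000 // rate if rate > 0 else 0


# dev.igb.1.queueN.interrupt_rate values from the output above.
reported = {
    "queue0": 43478,
    "queue1": 76923,
    "queue2": 111111,
    "queue3": 90909,
}

if __name__ == "__main__":
    for queue, rate in reported.items():
        usec = interval_from_rate(rate)
        # Round trip: rebuilding the register value reproduces the sysctl value.
        assert rate_from_eitr(usec << 2) == rate
        print(f"{queue}: interrupt_rate {rate} -> interval {usec} us")
```

Run as a script, it prints one line per queue, mapping the four rates to intervals of 23, 13, 9 and 11 us.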

Thanks,

-Harry



More information about the freebsd-stable mailing list