FreeBSD 7-STABLE mbuf corruption

Tue Sep 13 18:36:54 UTC 2011

Hi,

On Wed, Sep 7, 2011 at 7:57 PM, Jack Vogel <jfvogel at gmail.com> wrote:
> I have seen this, but I don't have any hot ideas right off the top of my
> head yet :(
>
I have now been running the following patches for 19h:
 - backport of kmacy@'s buf_ring(9) API, from FreeBSD 8 (from [0], see
attachment for full diff)
 - conversion of igb(4), from CURRENT, to use buf_ring(9) on FreeBSD
7.1 (see attachment)
 - all the original patches I already sent

It has not crashed yet. The only downside is that after 3h30 and ~4h,
the igb(4) queue handlers started spinning indefinitely, breaking
network connectivity.

I would be tempted to say that the infinite loop issue is an igb(4) bug
(separate from the original crashes), and to attribute the crashes I was
seeing to a race in the legacy IFQ code...

 - Arnaud

[0]: roughly, a cherry-pick of r185162, r185164, r185193, r185543,
r186207, r186213, r191033, r191161, r191899, r193848 and r194518.
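
To make the panic string quoted below easier to read: in both reports the
"got" value equals the "flags" value masked with the "expected" value
(0x3 & 0xffffffff00000000 = 0x0, and 0xbabe0000babe00 &
0xffffffff00000000 = 0xbabe0000000000). That is consistent with a canary
kept in the upper 32 bits of a 64-bit flags word. The following is only a
minimal userland sketch under that assumption. It is not the code from
the attached patches, and FLAGS_CANARY_MASK, flags_canary(),
flags_canary_ok() and flags_canary_selftest() are hypothetical names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical name: upper 32 bits of the flags word act as a canary. */
#define FLAGS_CANARY_MASK UINT64_C(0xffffffff00000000)

/* Extract the canary bits from a 64-bit flags word. */
uint64_t
flags_canary(uint64_t flags)
{
	return (flags & FLAGS_CANARY_MASK);
}

/*
 * Return 1 if the canary is intact, 0 otherwise.  On failure, print a
 * message shaped like the panic string (a real kernel would panic()).
 */
int
flags_canary_ok(uint64_t flags)
{
	uint64_t got = flags_canary(flags);

	if (got == FLAGS_CANARY_MASK)
		return (1);
	fprintf(stderr, "Corrupted unused flags, expected 0x%" PRIx64
	    ", got 0x%" PRIx64 ", flags 0x%" PRIx64 "\n",
	    FLAGS_CANARY_MASK, got, flags);
	return (0);
}

/* Replay the two flags values from the panics quoted in this thread. */
void
flags_canary_selftest(void)
{
	assert(!flags_canary_ok(UINT64_C(0x3)));
	assert(flags_canary(UINT64_C(0xbabe0000babe00)) ==
	    UINT64_C(0xbabe0000000000));
	assert(flags_canary_ok(FLAGS_CANARY_MASK | UINT64_C(0x3)));
}
```

Calling flags_canary_selftest() from a small main() replays both reported
values. A check of this shape only says that the upper bits were
overwritten by the time it ran, not who overwrote them.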

> Jack
>
>
> On Wed, Sep 7, 2011 at 4:19 PM, Arnaud Lacombe <lacombar at gmail.com> wrote:
>>
>> Hi,
>>
>> On Mon, Sep 5, 2011 at 2:59 AM, Arnaud Lacombe <lacombar at gmail.com> wrote:
>> > Hi folks,
>> >
>> > We have been trying to track down an mbuf management bug for about two
>> > weeks on a customized 7.1 base. I have finally been able to reproduce
>> > it with a stock FreeBSD 7-STABLE (kernel from r225276, userland from
>> > 7.4).
>> >
>> > With the help of the attached patches, I have just been able to
>> > trigger the following panic:
>> >
>> > panic: Corrupted unused flags, expected 0xffffffff00000000, got 0x0,
>> > flags 0x3
>> > cpuid = 1
>> > Uptime: 3d10h5m3s
>> > Cannot dump. No dump device defined
>> >
>> The general form of the crash is:
>>
>> panic: Corrupted unused flags, expected 0xffffffff00000000, got
>> 0xbabe0000000000, flags 0xbabe0000babe00
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper(c0874e29,0,c0835757,f4574c48,0,...) at
>> db_trace_self_wrapper+0x26
>> panic(c0835757,0,ffffffff,0,babe00,...) at panic+0x10b
>> igb_txeof(c6a25008,0,c0837083,5ea,17c,...) at igb_txeof+0x399
>> igb_msix_que(c6a2b800,0,c084d367,4b6,c69dd068,...) at igb_msix_que+0x7b
>> ithread_loop(c6a29090,f4574d38,c084d0db,31c,c6a16828,...) at
>> ithread_loop+0xc3
>> fork_exit(c061d520,c6a29090,f4574d38) at fork_exit+0xa6
>> fork_trampoline() at fork_trampoline+0x8
>> --- trap 0, eip = 0, esp = 0xf4574d70, ebp = 0 ---
>> Uptime: 1m42s
>>
>> It happens particularly easily when the box receives a wall of SYNs
>> (about 1000 connection attempts at once) every 5s or so.
>>
>>  - Arnaud
>>
>> >
>> > [cut stuff no one cares about...]
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: buf_ring_backport.diff
Type: text/x-patch
Size: 42539 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20110913/8c925ac5/buf_ring_backport.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-IGB_MULTIQUEUE.patch
Type: text/x-patch
Size: 5642 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20110913/8c925ac5/0001-IGB_MULTIQUEUE.bin