iwn firmware instability with an up-to-date stable kernel

Garrett Cooper yanefbsd at gmail.com
Sat Apr 24 04:59:19 UTC 2010


On Fri, Apr 23, 2010 at 9:42 PM, Garrett Cooper <yanefbsd at gmail.com> wrote:
> On Fri, Apr 23, 2010 at 8:05 PM, Brandon Gooch
> <jamesbrandongooch at gmail.com> wrote:
>> 2010/4/23 Garrett Cooper <yanefbsd at gmail.com>:
>>> 2010/4/23 Garrett Cooper <yanefbsd at gmail.com>:
>>>> 2010/4/18 Olivier Cochard-Labbé <olivier at cochard.me>:
>>>>> 2010/4/18 Bernhard Schmidt <bschmidt at techwires.net>:
>>>>>> Are you able to reproduce this on demand? As in type a few commands and
>>>>>> the firmware error occurs?
>>>>>>
>>>>>
>>>>> No, I'm not able to reproduce on demand this problem.
>>>>
>>>> I'm seeing similar issues on occasion with my Lenovo as well:
>>>>
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: firmware error log:
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error type      =
>>>> "NMI_INTERRUPT_WDG" (0x00000004)
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: program counter = 0x0000046C
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: source line     = 0x000000D0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error data      = 0x0000000207030000
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: branch link     = 0x00008370000004C2
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: interrupt link  = 0x000006DA000018B8
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: time            = 4287402440
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: driver status:
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  0: qid=0  cur=1   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  1: qid=1  cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  2: qid=2  cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  3: qid=3  cur=36  queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  4: qid=4  cur=123 queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  5: qid=5  cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  6: qid=6  cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  7: qid=7  cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  8: qid=8  cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  9: qid=9  cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 10: qid=10 cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 11: qid=11 cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 12: qid=12 cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 13: qid=13 cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 14: qid=14 cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 15: qid=15 cur=0   queued=0
>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: rx ring: cur=8
>>>>
>>>> This may be because the system was under load (I was installing a port
>>>> shortly before the connection dropped). I'll try poking at this
>>>> further because it's going to be an annoying productivity loss :/.
>>>
>>>    Sorry... should have included more helpful details.
>>> Thanks,
>>> -Garrett
>>>
>>> dmesg:
>>>
>>> iwn0: <Intel(R) PRO/Wireless 4965BGN> mem 0xdf2fe000-0xdf2fffff irq 17
>>> at device 0.0 on pci3
>>> iwn0: MIMO 2T3R, MoW1, address 00:1d:e0:7d:9f:c7
>>> iwn0: [ITHREAD]
>>> iwn0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
>>> iwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
>>> iwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
>>> 24Mbps 36Mbps 48Mbps 54Mbps
>>>
>>> pciconf -lv snippet:
>>>
>>> iwn0 at pci0:3:0:0:        class=0x028000 card=0x11108086 chip=0x42308086
>>> rev=0x61 hdr=0x00
>>>    vendor     = 'Intel Corporation'
>>>    device     = 'Intel Wireless WiFi Link 4965AGN (Intel 4965AGN)'
>>>    class      = network
>>> cbb0 at pci0:21:0:0:       class=0x060700 card=0x20c617aa chip=0x04761180
>>> rev=0xba hdr=0x02
>>>
>>> uname -a:
>>>
>>> $ uname -a
>>> FreeBSD garrcoop-fbsd.cisco.com 8.0-STABLE FreeBSD 8.0-STABLE #0
>>> r207006: Wed Apr 21 13:18:44 PDT 2010
>>> root at garrcoop-fbsd.cisco.com:/usr/obj/usr/src/sys/LAPPY_X86  i386
>>
>> I'm actually looking at this right now. For me, it's actually
>> happening when my machine stays on overnight (or for long periods of
>> time, idle).
>>
>> Also, it seems to be causing the kernel to panic, although I'm now
>> wondering if the Machine Check Architecture is somehow catching this
>> device error and causing an exception (hw.mca.enabled=1)(?) -- not
>> possible, right ???
>>
>> Whatever the case, I can't seem to get the firmware error to occur
>> with iwn(4) debugging or wlandebug options enabled, so who knows
>> exactly what leads to this.
>>
>> I know Bernhard has worked hard on this driver, it's a shame that this
>> freaky bug has bit us all now, without leaving many clues :(
>>
>> I've attached a textdump for posterity if nothing else :)
>
>    Connectivity appears to be shoddy in my neck of the woods (kind of
> ironic... but meh). Just running buildworld, buildkernel, then doing a
> tcpdump in parallel causes the pseudo device to go up and down a lot.
> I assume this isn't standard behavior?
>    Just for reference buildworld was started shortly after 19:39:05,
> and it finished at 21:29. The interface has also gone up and down once
> since then while the system's been basically idle.
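
    For anyone who wants to count how often this happens over a long
run, here is a rough sketch of a parser for the firmware error blocks
shown earlier in the thread. It assumes the kernel messages land in a
syslog-style file with each message on one line (the "error type" line
is only wrapped in this mail), and the function and field names are my
own invention, not anything from the iwn(4) driver:

```python
import re

# Assumed syslog layout: "<Mon> <day> <hh:mm:ss> <host> kernel: <message>".
HEADER_RE = re.compile(
    r'^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel: firmware error log:',
    re.MULTILINE,
)
ERROR_TYPE_RE = re.compile(
    r'error type\s*=\s*"([A-Z0-9_]+)"\s*\((0x[0-9A-Fa-f]+)\)'
)
TX_RING_RE = re.compile(
    r'tx ring\s+(\d+): qid=\d+\s+cur=(\d+)\s+queued=(\d+)'
)

# Shortened sample taken from the log quoted in this thread.
SAMPLE = """\
Apr 23 19:25:24 garrcoop-fbsd kernel: firmware error log:
Apr 23 19:25:24 garrcoop-fbsd kernel: error type      = "NMI_INTERRUPT_WDG" (0x00000004)
Apr 23 19:25:24 garrcoop-fbsd kernel: program counter = 0x0000046C
Apr 23 19:25:24 garrcoop-fbsd kernel: driver status:
Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  3: qid=3  cur=36  queued=0
Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  4: qid=4  cur=123 queued=0
Apr 23 19:25:24 garrcoop-fbsd kernel: rx ring: cur=8
"""


def parse_firmware_errors(text):
    """Return one dict per 'firmware error log:' block found in text."""
    headers = list(HEADER_RE.finditer(text))
    events = []
    for i, header in enumerate(headers):
        # A block runs from its header to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        chunk = text[header.end():end]
        err = ERROR_TYPE_RE.search(chunk)
        rings = {
            int(ring): {"cur": int(cur), "queued": int(queued)}
            for ring, cur, queued in TX_RING_RE.findall(chunk)
        }
        events.append({
            "timestamp": header.group(1),
            "error_type": err.group(1) if err else None,
            "error_code": int(err.group(2), 16) if err else None,
            "tx_rings": rings,
        })
    return events


if __name__ == "__main__":
    for event in parse_firmware_errors(SAMPLE):
        print(event["timestamp"], event["error_type"], len(event["tx_rings"]))
```

    Feeding it the contents of the system log instead of SAMPLE should
give one entry per firmware error, which makes it easier to line the
errors up against things like the buildworld start and end times.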

    Hmmm... I seem to be in an excellent position to reproduce this
issue. I've reproduced it twice by merely bringing the interface up
and down several times using:

ifconfig_wlan0="WPA DHCP"

    instead of my usual:

ifconfig_wlan0="WPA ssid <base-station-id1> DHCP"

    Maybe others who are experiencing the issue should try that? I'll
do more testing when I get home...
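
    In case it helps anyone trying this, below is a minimal sketch of
a loop that cycles the interface. It is an assumption on my part that
plain "ifconfig wlan0 down" / "ifconfig wlan0 up" is a close enough
stand-in for how the interface gets bounced; the function name, the
cycle count and the pause length are all made up for illustration, and
it needs to run as root:

```python
import subprocess
import time


def cycle_interface(iface="wlan0", cycles=5, pause=10.0,
                    run=subprocess.run, sleep=time.sleep):
    """Bring iface down and up `cycles` times; return the commands issued.

    `run` and `sleep` are injectable so the loop can be dry-run without
    touching a real interface.
    """
    issued = []
    for _ in range(cycles):
        for state in ("down", "up"):
            cmd = ["ifconfig", iface, state]
            run(cmd, check=True)   # raises if ifconfig exits non-zero
            issued.append(cmd)
            sleep(pause)           # give the interface time to settle
    return issued


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    cycle_interface(cycles=2, pause=0,
                    run=lambda cmd, check: print(" ".join(cmd)))
```

    After a real run, checking the system log for "firmware error log:"
should show whether the cycling triggered the problem.
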
Thanks,
-Garrett

