Problems with BCE network adapter (Dell PE2950)

Tom Judge tom at tomjudge.com
Thu Jun 28 14:29:34 UTC 2007


Dave,

Sorry for the top post, but I have just managed to reproduce this exact 
crash twice on a new PE 1950 system. I have core files available.

It seems that after a couple of reboots the problem goes away. The 
system actually crashed 4 times, but 2 of the cores were corrupt.

It also seems that the system will be stable if the following message is 
not produced shortly after /etc/rc.d/netif start:

bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFF9 > 
0x01FE)!
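
As a side note, 0xFFF9 is the value a 16-bit unsigned counter holds after 
being decremented 7 times past zero, so the message is consistent with the 
free rx_bd count underflowing rather than genuinely reaching 65529. A 
minimal sketch of that arithmetic follows. It assumes the counter is a 
16-bit unsigned integer, which I have not verified against the softc 
declaration, and both helper names are hypothetical, not driver functions.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of if_bce.c): the value a 16-bit
 * unsigned counter holds after `decrements` decrements from `start`.
 * The result wraps modulo 65536.
 */
static uint16_t counter_after_decrements(uint16_t start, unsigned decrements)
{
    return (uint16_t)(start - decrements);
}

/*
 * Hypothetical helper: how many steps below zero a wrapped 16-bit
 * value represents (0 maps to 0).
 */
static unsigned deficit_below_zero(uint16_t wrapped)
{
    return (0x10000u - wrapped) & 0xFFFFu;
}
```

With these, counter_after_decrements(0, 7) gives 0xFFF9, and 
deficit_below_zero() applied to the 0xFFFB and 0xFFF8 values seen in the 
quoted logs gives 5 and 8 respectively.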

I have attached the chip information below.

Any help with this would be appreciated, as we now have 21 PE[12]950 
systems which randomly crash due to the original error:

bce0: discard frame w/o leading ethernet header (len 4294967292 pkt len 
4294967292)
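
For reference, 4294967292 is 0xFFFFFFFC, i.e. -4 printed as an unsigned 
32-bit value. That is what you would get if a reported frame length of 0 
had the 4-byte Ethernet CRC subtracted from it. The sketch below only 
illustrates that arithmetic. Whether the driver actually arrives at the 
value this way is an assumption on my part, and the names CRC_LEN and 
len_after_crc_strip are made up for the example.

```c
#include <stdint.h>

/* Length of the Ethernet frame check sequence in bytes. */
#define CRC_LEN 4u

/*
 * Hypothetical helper (not taken from if_bce.c): subtract the CRC
 * length from a reported frame length using unsigned arithmetic.
 * If reported_len is less than 4 the result wraps modulo 2^32.
 */
static uint32_t len_after_crc_strip(uint32_t reported_len)
{
    return reported_len - CRC_LEN;
}
```

Here len_after_crc_strip(0) yields 4294967292, the value in the message 
above, while a normal length such as 64 yields 60.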

Tom

PE 2950 Chips:

bce0 at pci9:0:0:  class=0x020000 card=0x01b21028 chip=0x164c14e4 rev=0x11 
hdr=0x00
     vendor   = 'Broadcom Corporation'
     class    = network
     subclass = ethernet
--
bce1 at pci5:0:0:  class=0x020000 card=0x01b21028 chip=0x164c14e4 rev=0x11 
hdr=0x00
     vendor   = 'Broadcom Corporation'
     class    = network
     subclass = ethernet



PE1950 Chips:

bce0 at pci9:0:0:  class=0x020000 card=0x01b31028 chip=0x164c14e4 rev=0x12 
hdr=0x00
     vendor   = 'Broadcom Corporation'
     class    = network
     subclass = ethernet
--
bce1 at pci5:0:0:  class=0x020000 card=0x01b31028 chip=0x164c14e4 rev=0x12 
hdr=0x00
     vendor   = 'Broadcom Corporation'
     class    = network
     subclass = ethernet
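
In case it helps when comparing the two models: the chip= and card= fields 
pack two 16-bit IDs into one 32-bit word, with the vendor ID in the low 
half and the device ID in the high half. A small sketch of the decoding 
(the struct and function names are hypothetical, chosen just for this 
example):

```c
#include <stdint.h>

/* Hypothetical container for the two halves of a packed PCI ID. */
struct pci_id {
    uint16_t vendor;  /* low 16 bits of the packed value */
    uint16_t device;  /* high 16 bits of the packed value */
};

/* Split a packed chip= or card= value into vendor and device IDs. */
static struct pci_id split_pci_id(uint32_t packed)
{
    struct pci_id id;

    id.vendor = (uint16_t)(packed & 0xFFFFu);
    id.device = (uint16_t)(packed >> 16);
    return id;
}
```

Applied to the listings above, chip=0x164c14e4 splits into vendor 0x14e4 
(Broadcom) and device 0x164c on both models, while the card= values split 
into subsystem vendor 0x1028 (Dell) with subsystem device 0x01b2 on the 
PE2950 and 0x01b3 on the PE1950. So both machines carry the same chip, at 
rev 0x11 and rev 0x12 respectively.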

Tom Judge wrote:
> David Christensen wrote:
>> Tom,
>>
>> There's already some debug code to watch for unusual size packets.
>> If you can recompile the driver from HEAD with the attached diffs
>> we can print out the first 128 bytes of any unusual sized packets.
>>
>> This does enable other debugging code so performance will drop
>> but that should be OK since this doesn't present as a performance
>> problem.
>>
>> Dave
>>
> <SNIP/>
> I am currently running the driver from RELENG_6 (With the MSI code 
> backed out and your patch applied by hand) on a 6.2-p5 amd64 system 
> (Dell PE2950) and have managed to get the following crash.
> 
> The crash was caused by "cat * >/dev/null" in an NFS mounted directory.
> 
> I'm not sure if this is the same crash, but some other boxes (identical 
> to this one) have crashed the first time they were rebooted with the new 
> driver. Unfortunately I have not managed to get a dump from one of these 
> crashes yet.
> 
> Also I am seeing a lot of these messages on boxes running this driver:
> 
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xD0F5!
> 
> It seems to be caused by NFS traffic.
> 
> I still have the core file if you need any more information.
> 
> Tom
> 
> kgdb  /usr/obj/usr/src/sys/PE2950/kernel.debug vmcore.0
> [GDB will not be able to debug user-mode threads: 
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you 
> are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd".
> 
> Unread portion of the kernel message buffer:
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xD0F5!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0x2F0A!
> <SNIP LOTS OF THESE ERRORS>
> bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFFB > 
> 0x01FE)!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xF043!
> bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFF9 > 
> 0x01FE)!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0x8C5F!
> bce0: /usr/src/sys/dev/bce/if_bce.c(3973): Unexpected mbuf found in 
> rx_bd[0x005A]!
> bce0: ----------------------------  Driver State 
> ----------------------------
> bce0: 0xFFFFFFFF:8B92A000 - (sc) driver softc structure virtual address
> bce0: 0xFFFFFF00:F4000000 - (sc->bce_vhandle) PCI BAR virtual address
> bce0: 0xFFFFFF00:009E3680 - (sc->status_block) status block virtual address
> bce0: 0xFFFFFF00:009D6400 - (sc->stats_block) statistics block virtual 
> address
> bce0: 0xFFFFFFFF:8B92A1B0 - (sc->tx_bd_chain) tx_bd chain virtual adddress
> bce0: 0xFFFFFFFF:8B92A1E8 - (sc->rx_bd_chain) rx_bd chain virtual address
> bce0: 0xFFFFFFFF:8B92B260 - (sc->tx_mbuf_ptr) tx mbuf chain virtual address
> bce0: 0xFFFFFFFF:8B92D260 - (sc->rx_mbuf_ptr) rx mbuf chain virtual address
> bce0:          0x0000357F - (sc->interrupts_generated) h/w intrs
> bce0:          0x00002981 - (sc->rx_interrupts) rx interrupts handled
> bce0:          0x0000212A - (sc->tx_interrupts) tx interrupts handled
> bce0:          0x0000706B - (sc->last_status_idx) status block index
> bce0:          0x0000675E - (sc->tx_prod) tx producer index
> bce0:          0x00006707 - (sc->tx_cons) tx consumer index
> bce0:          0x001B39EA - (sc->tx_prod_bseq) tx producer bseq index
> bce0:          0x0000F25C - (sc->rx_prod) rx producer index
> bce0:          0x0000F059 - (sc->rx_cons) rx consumer index
> bce0:          0x0B850C00 - (sc->rx_prod_bseq) rx producer bseq index
> bce0:          0x000000AB - (sc->rx_mbuf_alloc) rx mbufs allocated
> bce0:          0x0000FFF8 - (sc->free_rx_bd) free rx_bd's
> bce0: 0x00000000/000001FE - (sc->rx_low_watermark) rx low watermark
> bce0:          0x0000001D - (sc->txmbuf_alloc) tx mbufs allocated
> bce0:          0x000000AB - (sc->rx_mbuf_alloc) rx mbufs allocated
> bce0:          0x00000057 - (sc->used_tx_bd) used tx_bd's
> bce0: 0x000001FE/000001FE - (sc->tx_hi_watermark) tx hi watermark
> bce0:          0x00000000 - (sc->mbuf_alloc_failed) failed mbuf alloc
> bce0: 
> ------------------------------------------------------------------------
> bce0: ----------------------------  Status Block 
> ----------------------------
> bce0: attn_bits  = 0x00000001, attn_bits_ack = 0x00000001, index = 0x70BF
> bce0: rx_cons0   = 0x0000F061, tx_cons0      = 0x0000675E
> bce0: status_idx = 0x70BF
> bce0: 
> ------------------------------------------------------------------------
> 
> 
> Fatal trap 3: breakpoint instruction fault while in kernel mode
> cpuid = 4; apic id = 04
> instruction pointer     = 0x8:0xffffffff801ee956
> stack pointer           = 0x10:0xffffffffb6d60b40
> frame pointer           = 0x10:0x5a
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, IOPL = 0
> current process         = 27 (irq16: bce0 bce1)
> trap number             = 3
> panic: breakpoint instruction fault
> cpuid = 4
> Uptime: 3m10s
> Dumping 8191 MB (3 chunks)
>   chunk 0: 1MB (156 pages) ... ok
>   chunk 1: 3327MB (851624 pages) 3311 3295 3279 3263 3247 3231 3215 3199 
> 3183 31
> <SNIP>
> #0  doadump () at pcpu.h:172
> 172     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:172
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff8029e0e7 in boot (howto=260) at 
> /usr/src/sys/kern/kern_shutdown.c:409
> #3  0xffffffff8029e781 in panic (fmt=0xffffff021ef0a4c0 
> "?\206?\036\002?????\036\002???") at /usr/src/sys/kern/kern_shutdown.c:565
> #4  0xffffffff803f9e3f in trap_fatal (frame=0xffffff021ef0a4c0, 
> eva=18446742983307069104) at /usr/src/sys/amd64/amd64/trap.c:660
> #5  0xffffffff803fa2e2 in trap (frame=
>       {tf_rdi = 0, tf_rsi = -2139025408, tf_rdx = 1, tf_rcx = 1915683, 
> tf_r8 = 1048064, tf_r9 = 10, tf_rax = 79, tf_rbx = -1953325056, tf_rbp = 
> 90, tf_r10 = -1227486624, tf_r11 = 4294967208, tf_r12 = -1953325056, 
> tf_r13 = 90, tf_r14 = 61537, tf_r15 = 61530, tf_trapno = 3, tf_addr = 0, 
> tf_flags = -1099501259136, tf_err = 0, tf_rip = -2145457834, tf_cs = 8, 
> tf_rflags = 642, tf_rsp = -1227486384, tf_ss = 16}) at 
> /usr/src/sys/amd64/amd64/trap.c:469
> #6  0xffffffff803e55fb in calltrap () at 
> /usr/src/sys/amd64/amd64/exception.S:168
> #7  0xffffffff801ee956 in bce_breakpoint (sc=0xffffffff8b92a000) at 
> cpufunc.h:63
> #8  0xffffffff801ef0f6 in bce_intr (xsc=0x0) at 
> /usr/src/sys/dev/bce/if_bce.c:3970
> #9  0xffffffff80284919 in ithread_loop (arg=0xffffff00009e4000) at 
> /usr/src/sys/kern/kern_intr.c:682
> #10 0xffffffff802830b7 in fork_exit (callout=0xffffffff802847d0 
> <ithread_loop>, arg=0xffffff00009e4000, frame=0xffffffffb6d60c50) at 
> /usr/src/sys/kern/kern_fork.c:821
> #11 0xffffffff803e595e in fork_trampoline () at 
> /usr/src/sys/amd64/amd64/exception.S:394
> #12 0x0000000000000000 in ?? ()
> #13 0x0000000000000000 in ?? ()
> #14 0x0000000000000001 in ?? ()
> #15 0x0000000000000000 in ?? ()
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000000 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
> <SNIP LOTS OF 0 FRAMES>
> #44 0x00000000007f3000 in ?? ()
> #45 0xffffff021ef286b0 in ?? ()
> #46 0x0000000000000104 in ?? ()
> #47 0x0000000000000000 in ?? ()
> #48 0xffffff021ef286b0 in ?? ()
> #49 0xffffff021ef68000 in ?? ()
> #50 0xffffffffb6d60848 in ?? ()
> #51 0xffffff021ef0a4c0 in ?? ()
> #52 0xffffffff802b4856 in sched_switch (td=0xffffff00009e4000, 
> newtd=0x0, flags=0) at /usr/src/sys/kern/sched_4bsd.c:973
> <SNIP LOTS OF 0 FRAMES>
> #124 0x0000000000000000 in ?? ()
> Cannot access memory at address 0xffffffffb6d61000
> (kgdb) frame 8
> #8  0xffffffff801ef0f6 in bce_intr (xsc=0x0) at 
> /usr/src/sys/dev/bce/if_bce.c:3970
> 3970                            DBRUNIF((!(rxbd->rx_bd_flags & 
> RX_BD_FLAGS_END)),
> (kgdb) list
> 3965
> 3966                    /* The mbuf is stored with the last rx_bd entry 
> of a packet. */
> 3967                    if (sc->rx_mbuf_ptr[sw_chain_cons] != NULL) {
> 3968
> 3969                            /* Validate that this is the last rx_bd. */
> 3970                            DBRUNIF((!(rxbd->rx_bd_flags & 
> RX_BD_FLAGS_END)),
> 3971                                    BCE_PRINTF("%s(%d): Unexpected 
> mbuf found in rx_bd[0x%04X]!\n",
> 3972                                    __FILE__, __LINE__, sw_chain_cons);
> 3973                                    bce_breakpoint(sc));
> 3974
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"


