FreeBSD 9.1 and BCM57711 issues (broadcom 10ge ethernet card)

Sébastien RICCIO sr at swisscenter.com
Tue Jul 23 12:50:06 UTC 2013


root@filer-01-a:/# freebsd-update -r 9.2-BETA1 upgrade

Looking up update.FreeBSD.org mirrors... 4 mirrors found.
Fetching metadata signature for 9.1-RELEASE from update4.freebsd.org... 
done.
Fetching metadata index... done.
Inspecting system... done.

The following components of FreeBSD seem to be installed:
kernel/generic src/src world/base world/lib32

The following components of FreeBSD do not seem to be installed:
world/doc world/games

Does this look reasonable (y/n)? y

Fetching metadata signature for 9.2-BETA1 from update4.freebsd.org... done.
Fetching metadata index... done.

The update metadata is correctly signed, but
failed an integrity check.
Cowardly refusing to proceed any further.


Cowardly ? :)


On 23.07.2013 13:56, Steven Hartland wrote:
> Have you tried a more recent version e.g. 9.2-PRERELEASE or 9/stable?
>
>    Regards
>    Steve
>
> ----- Original Message ----- From: "Sébastien RICCIO" 
> <sr at swisscenter.com>
> To: <freebsd-net at freebsd.org>
> Sent: Tuesday, July 23, 2013 12:28 PM
> Subject: FreeBSD 9.1 and BCM57711 issues (broadcom 10ge ethernet card)
>
>
> Hi freebsd-net!
>
> We recently installed FreeBSD 9.1 64bit on a Dell PowerEdge R510 system
> in which we have two BCM57711 (for a total of four 10Gbit interfaces.)
>
> We're planning to use it as a storage filer using ZFS/NFS.
>
> Currently, for testing, the filer is connected through two 10 Gbit
> interfaces to a 10GbE Dell PowerConnect switch that serves some Linux
> clients, which also use 10GbE cards.
>
> We have run into a lot of trouble trying to get this setup to work.
>
> -- 
>
> First issue:
>
> Without any special tweaking, when we read from or write to the NFS
> server from a client, the network card crashes and becomes
> unresponsive. In the logs I can see:
>
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ---------- Begin crash dump
> ----------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: 
> ------------------------------ Idle Check ------------------------------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CFC: AC > 1 - LCID 39
> CID_CAM 0x7 Value is 0xc
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: VOQ_0, VOQ credit
> is not equal to initial credit. Values are 0xf8 0x140
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: P0 Byte credit is
> not equal to initial credit. Values are 0x5a1c 0x8000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING CCM: XX protection CAM
> is not empty. Value is 0x1
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING XCM: XX protection CAM
> is not empty. Value is 0x1
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING BRB1: BRB is not empty.
> Value is 0x3
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING TCM: FIC0_INIT_CRD is
> not 64. Value is 0x30
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR TSEM: interrupt status 0
> is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CSEM: interrupt status 0
> is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR XSEM: interrupt status 0
> is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: bxe_idle_chk(): Failed with 4
> error(s) and 0 warning(s)!
> Jul 19 11:49:26 filer-01-a kernel: bxe0:
> ------------------------------------------------------------------------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: 
> ------------------------------ Idle Check ------------------------------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CFC: AC > 1 - LCID 39
> CID_CAM 0x7 Value is 0xc
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: VOQ_0, VOQ credit
> is not equal to initial credit. Values are 0xf8 0x140
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: P0 Byte credit is
> not equal to initial credit. Values are 0x5a1c 0x8000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING CCM: XX protection CAM
> is not empty. Value is 0x1
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING XCM: XX protection CAM
> is not empty. Value is 0x1
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING BRB1: BRB is not empty.
> Value is 0x4
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING TCM: FIC0_INIT_CRD is
> not 64. Value is 0x30
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING PRS: TCM current credit
> is not 0. Value is 0x10
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR TSEM: interrupt status 0
> is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CSEM: interrupt status 0
> is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR XSEM: interrupt status 0
> is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: bxe_idle_chk(): Failed with 4
> error(s) and 0 warning(s)!
> Jul 19 11:49:26 filer-01-a kernel: bxe0:
> ------------------------------------------------------------------------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ----------  End crash dump
> ----------
>
> Even a reboot of the system is not enough. After rebooting, I can't
> ping any host on the network. It seems that the crash leaves the
> card in a bogus state that requires a complete power cycle to get the
> cards back in business.
>
> We found out that disabling tso4, txcsum and rxcsum on the cards
> prevents this from happening.
>
> So, although I don't think it is a real fix, let's say we can work
> around this by setting something like this in rc.conf:
> ifconfig_bxe0="inet 10.50.50.11 netmask 255.255.255.0 mtu 9000 -tso4
> -txcsum -rxcsum"
>
> -- 
>
> Second issue,
>
> Issuing an ifconfig mtu 9000 on the interfaces randomly produces this
> error:
>
> Jul 19 09:47:03 filer-01-a kernel: bxe0:
> /usr/src/sys/dev/bxe/if_bxe.c(10934): Memory allocation failure! Cannot
> fill fp[04] RX chain.
> Jul 19 09:47:03 filer-01-a kernel: bxe0:
> /usr/src/sys/dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
> Jul 19 09:47:12 filer-01-a kernel: bxe3:
> /usr/src/sys/dev/bxe/if_bxe.c(10934): Memory allocation failure! Cannot
> fill fp[04] RX chain.
> Jul 19 09:47:12 filer-01-a kernel: bxe3:
> /usr/src/sys/dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>
> That sounds quite bad, and I can't reproduce it with an MTU of 1500.
> (But does it make sense to use an MTU of 1500 on a 10 Gbit local
> network?)
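
To put the MTU question in numbers, here is a rough back-of-the-envelope sketch. It assumes untagged Ethernet with 38 bytes of per-frame overhead on the wire and a shell with 64-bit arithmetic; the function name `max_fps` is made up for illustration.

```shell
#!/bin/sh
# Upper bound on full-size frames per second for a given link speed and MTU.
# 38 bytes of overhead per frame on the wire:
#   14 (Ethernet header) + 4 (FCS) + 8 (preamble/SFD) + 12 (inter-frame gap)
# Assumes no VLAN tag and a shell with 64-bit integer arithmetic.
max_fps() {
    link_bps=$1; mtu=$2
    echo $(( link_bps / ((mtu + 38) * 8) ))
}

max_fps 10000000000 1500   # prints 812743
max_fps 10000000000 9000   # prints 138304
```

At line rate, jumbo frames mean about six times fewer frames for the host to process per second, which is the usual argument for MTU 9000 on a storage network.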
>
> -- 
>
> Third issue,
>
> part 1)
>
> We tried bonding two interfaces (each with an MTU of 9000) using
> lagg, like this:
>
> ifconfig bxe0 up -tso4 -txcsum -rxcsum mtu 9000
> ifconfig bxe2 up -tso4 -txcsum -rxcsum mtu 9000
> ifconfig lagg0 create
> ifconfig lagg0 up laggproto failover laggport bxe0 laggport bxe2
> 10.50.50.11/24
>
> This instantly crashes the kernel and causes the machine to reboot.
> The log says:
>
> Jul 19 09:47:12 filer-01-a kernel:
> Jul 19 09:47:12 filer-01-a kernel:
> Jul 19 09:47:12 filer-01-a kernel: Fatal trap 12: page fault while in
> kernel mode
> Jul 19 09:47:12 filer-01-a kernel: cpuid = 0; apic id = 20
> Jul 19 09:47:12 filer-01-a kernel: fault virtual address        = 0x6d
> Jul 19 09:47:12 filer-01-a kernel: fault code           = supervisor
> read data, page not present
> Jul 19 09:47:12 filer-01-a kernel: instruction pointer  =
> 0x20:0xffffffff808d5879
> Jul 19 09:47:12 filer-01-a kernel: stack pointer                =
> 0x28:0xffffff80003227f0
>              --*** BOOOM REBOOT ***--
> Jul 19 09:49:49 filer-01-a syslogd: kernel boot file is 
> /boot/kernel/kernel
>
> /var/crash/core.txt.0 returns:
>
> Unread portion of the kernel message buffer:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 5; apic id = 33
> fault virtual address   = 0x6d
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff808d5879
> stack pointer           = 0x28:0xffffff80003227f0
> frame pointer           = 0x28:0xffffff8000322820
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (swi6: task queue)
> trap number             = 12
> panic: page fault
> cpuid = 5
> KDB: stack backtrace:
> #0 0xffffffff809208a6 at kdb_backtrace+0x66
> #1 0xffffffff808ea8be at panic+0x1ce
> #2 0xffffffff80bd8240 at trap_fatal+0x290
> #3 0xffffffff80bd857d at trap_pfault+0x1ed
> #4 0xffffffff80bd8b9e at trap+0x3ce
> #5 0xffffffff80bc315f at calltrap+0x8
> #6 0xffffffff8045da8c at bxe_free_buf_rings+0x4c
> #7 0xffffffff8046c0d5 at bxe_init_locked+0x125
> #8 0xffffffff80470cfe at bxe_ioctl+0x4fe
> #9 0xffffffff8099d08f at if_setlladdr+0x1ff
> #10 0xffffffff8174c94a at lagg_port_setlladdr+0x8a
> #11 0xffffffff8092cf55 at taskqueue_run_locked+0x85
> #12 0xffffffff8092d0da at taskqueue_run+0x3a
> #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
> #14 0xffffffff808c0076 at ithread_loop+0xa6
> #15 0xffffffff808bb9ef at fork_exit+0x11f
> #16 0xffffffff80bc368e at fork_trampoline+0xe
> Uptime: 39m41s
> Dumping 1505 out of 32735
> MB:..2%..11%..21%..31%..41%..52%..61%..71%..81%..91%
>
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/if_lagg.ko...Reading symbols from
> /boot/kernel/if_lagg.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/if_lagg.ko
> #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> 224     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> #1  0xffffffff808ea3a1 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:448
> #2  0xffffffff808ea897 in panic (fmt=0x1 <Address 0x1 out of bounds>)
>     at /usr/src/sys/kern/kern_shutdown.c:636
> #3  0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is
> not available.
> )
>     at /usr/src/sys/amd64/amd64/trap.c:857
> #4  0xffffffff80bd857d in trap_pfault (frame=0xffffff8000322740, 
> usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:773
> #5  0xffffffff80bd8b9e in trap (frame=0xffffff8000322740)
>     at /usr/src/sys/amd64/amd64/trap.c:456
> #6  0xffffffff80bc315f in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:228
> #7  0xffffffff808d5879 in free (addr=0xffffff80083e5000,
>     mtp=0xffffffff81198ba0) at uma_int.h:413
> #8  0xffffffff8045da8c in bxe_free_buf_rings (sc=0xffffff8000c1c000)
>     at /usr/src/sys/dev/bxe/if_bxe.c:3787
> #9  0xffffffff8046c0d5 in bxe_init_locked (sc=0x0, load_mode=0)
>     at /usr/src/sys/dev/bxe/if_bxe.c:4063
> #10 0xffffffff80470cfe in bxe_ioctl (ifp=0xfffffe000ec59000,
> command=Variable "command" is not available.
> )
>     at /usr/src/sys/dev/bxe/if_bxe.c:9668
> #11 0xffffffff8099d08f in if_setlladdr (ifp=0xfffffe000ec59000,
>     lladdr=0xfffffe00125da4c8 "", len=6) at /usr/src/sys/net/if.c:3304
> #12 0xffffffff8174c94a in lagg_port_setlladdr (arg=Variable "arg" is not
> available.
> )
>     at /usr/src/sys/modules/if_lagg/../../net/if_lagg.c:495
> #13 0xffffffff8092cf55 in taskqueue_run_locked (queue=0xfffffe000e833980)
>     at /usr/src/sys/kern/subr_taskqueue.c:308
> #14 0xffffffff8092d0da in taskqueue_run (queue=0xfffffe000e833980)
>     at /usr/src/sys/kern/subr_taskqueue.c:322
> #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is
> not available.
> )
>     at /usr/src/sys/kern/kern_intr.c:1262
> #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe000e66c140)
>     at /usr/src/sys/kern/kern_intr.c:1275
> #17 0xffffffff808bb9ef in fork_exit (
>     callout=0xffffffff808bffd0 <ithread_loop>, arg=0xfffffe000e66c140,
>     frame=0xffffff8000322c40) at /usr/src/sys/kern/kern_fork.c:992
> #18 0xffffffff80bc368e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:602
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000001 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0x0000000000000000 in ?? ()
> #40 0x0000000000000000 in ?? ()
> #41 0x0000000000000000 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0x0000000000000005 in ?? ()
> #44 0xffffffff81244180 in tdq_cpu ()
> #45 0xfffffe000e698000 in ?? ()
> #46 0x0000000000000000 in ?? ()
> #47 0xffffff8000322b30 in ?? ()
> #48 0xffffff8000322ad8 in ?? ()
> #49 0xfffffe000e6728e0 in ?? ()
> #50 0xffffffff8091352e in sched_switch (td=0x0, newtd=0xfffffe000e66c140,
>     flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1921
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
>
> Okay, I guess this again has something to do with the MTU of 9000, but
> this time it completely panics the kernel. This is no good.
>
>
> Part 2) Trying bonding with normal MTU 1500
>
> ifconfig bxe0 up -tso4 -txcsum -rxcsum mtu 1500
> ifconfig bxe2 up -tso4 -txcsum -rxcsum mtu 1500
> ifconfig lagg0 create
> ifconfig lagg0 up laggproto failover laggport bxe0 laggport bxe2
> 10.50.50.11/24
>
> This time, no error messages, no crash. Yiha!
>
> But no. Even though everything seems to be correct, the bonding is not
> working. We can't ping any host on the network.
> Also, lagg0 reports: no carrier
>
> see:
>
> bxe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 
> 1500
> options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
>         ether 00:10:18:98:35:f8
>         inet6 fe80::210:18ff:fe98:35f8%bxe0 prefixlen 64 scopeid 0x3
>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>         status: active
> bxe2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 
> 1500
> options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
>         ether 00:10:18:98:35:f8
>         inet6 fe80::210:18ff:fe95:eaa0%bxe2 prefixlen 64 scopeid 0x5
>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-SR <full-duplex>)
>         status: active
> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 
> 1500
> options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
>         ether 00:10:18:98:35:f8
>         inet6 fe80::7a2b:cbff:fe1a:eab1%lagg0 prefixlen 64 scopeid 0x14
>         inet 10.50.50.11 netmask 0xffffff00 broadcast 10.50.50.255
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect
>         status: no carrier
>         laggproto failover lagghash l2,l3,l4
>         laggport: bxe2 flags=0<>
>         laggport: bxe0 flags=1<MASTER>
>
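
A small sketch for spotting this condition from a script. It assumes that on a working failover lagg, ifconfig marks the port carrying traffic with an ACTIVE flag (worth checking against your release); in the output above neither port has it. The function name `inactive_laggports` is hypothetical.

```shell
#!/bin/sh
# Read `ifconfig lagg0` output on stdin and print the name of every
# laggport whose flags do not include ACTIVE.
# inactive_laggports is a hypothetical helper name.
inactive_laggports() {
    awk '$1 == "laggport:" && $3 !~ /ACTIVE/ { print $2 }'
}

# Fed with the two laggport lines from the output above:
inactive_laggports <<'EOF'
        laggport: bxe2 flags=0<>
        laggport: bxe0 flags=1<MASTER>
EOF
```

On the filer itself the input would come from `ifconfig lagg0 | inactive_laggports`; an empty result would mean every port is reported as active.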
> Please note that prior to installing FreeBSD, the machine was running
> a Debian 7 GNU/Linux 64-bit OS where we had the cards bonded and the
> MTU set to 9000 without any crash or stability issue.
> So it looks to me like there is something really wrong with the
> Broadcom driver on FreeBSD 9.1, at least with the NICs used in Dell
> servers.
>
> Given that Broadcom themselves don't supply drivers for FreeBSD, is
> there any possible fix?
>
> Thanks for your attention and your help.
>
> Cheers,
> Sébastien
>
>
>
>
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"


