cvs commit: src/sys/dev/bce if_bce.c if_bcefw.h if_bcereg.h

Peter Wemm peter at
Mon Mar 31 12:35:01 PDT 2008

On Mon, Mar 31, 2008 at 12:13 PM, David Christensen
<davidch at> wrote:
> > On Thu, Feb 21, 2008 at 5:46 PM, David Christensen
>  > <davidch at> wrote:
>  > >   Modified files:
>  > >     sys/dev/bce          if_bce.c if_bcefw.h if_bcereg.h
>  > >   Log:
>  > >   MFC after:      4 weeks
>  > >
>  > >   - Added loose RX MTU functionality to allow frames larger than
>  > >     1500 bytes to be accepted even though the interface MTU is set
>  > >     to 1500.
>  > >   - Implemented new TCP header splitting/jumbo frame support which
>  > >     uses two chains for receive traffic rather than the original
>  > >     single receive chain.
>  > >   - Added additional debug support code.
>  > >
>  > >   Revision  Changes      Path
>  > >   1.36      +1559 -675   src/sys/dev/bce/if_bce.c
>  > >   1.5       +6179 -4850  src/sys/dev/bce/if_bcefw.h
>  > >   1.17      +264 -55     src/sys/dev/bce/if_bcereg.h
>  >
>  > This has been devastating on the cluster.
>  >
>  > Attached are three test runs.  I've done a cold reboot, then 'cd
>  > /usr/src/sys' and doing a 'cvs -Rq update' where the CVSROOT is over
>  > nfs.
>  >
>  > First, the old driver:
>  > svn# time cvs -Rq up
>  > 0.890u 4.577s 1:14.48 7.3%      669+2315k 7379+0io 10094pf+0w
>  >
>  > Now, the same test again, but with this change included in the kernel:
>  > svn# time cvs -Rq up
>  > 0.940u 359.906s 7:01.04 85.7%   648+2242k 7365+0io 10082pf+0w
>  >
>  > Note the massive increase (nearly 80-fold) in system time,
>  > and the almost 6-fold increase in wall clock time.
>  >
>  > Turning on promisc mode helps a lot, but doesn't solve it.  (This was
>  > found when ps@ was using tcpdump to try and figure out what the
>  > problem was)
>  The change is needed to update the FreeBSD driver so that it can
>  continue using production firmware for the controllers.  The previous
>  firmware was specific to FreeBSD and was not being maintained.
>  I didn't see any performance issues running with netperf.  Is the NFS
>  traffic UDP or TCP?  What's the MTU in use?  How much system memory is
>  available?

NFS over UDP.  We're also seeing problems with NIS/YP (also UDP) on
the box with the driver active.  The MTU is the standard 1500.  Both
machines have 8GB of ram.  Both are 64 bit kernels.  Client is a Dell
2950 (2 x quad core2), the server is a HP DL385 (quad opteron with

>  If this is a performance problem then the first place I would look is
>  in the definitions for rx_bd_mbuf_alloc_size and pg_bd_mbuf_alloc_size.
>  The older version of the driver would use multiple 2KB buffers
>  (MCLBYTES in size) from a single chain when building a packet so you
>  would typically have a single mbuf cluster passed to the stack.  The
>  new firmware uses two chains, each of which may be a different size.
>  The current implementation will use MHLEN bytes for the rx chain and
>  MCLBYTES for the pg chain.  When a packet is received the hardware will
>  place as much data as possible into a single mbuf in the rx chain,
>  then place any remaining data into one or more mbufs in the pg chain.
>  The driver will then stitch together the mbufs before passing them up
>  the stack.  This process is supposed to improve performance for TCP
>  because the TCP payload will be split from the TCP header and should
>  be quicker to access.
>  A quick test would be to set rx_bd_mbuf_alloc_size to MCLBYTES, which
>  should for the most part duplicate the older behavior.  The driver
>  will still allocate more mbufs which might be a problem if system
>  memory is already low.  Is anyone else aware of a driver that does
>  TCP header splitting?  It's typical on the TX side to see a packet
>  with two or three mbufs in a chain but I suspect it's less typical
>  on the RX side which could be part of the problem.
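
For anyone following along, here is a rough model of the two-chain
receive layout described above.  It is only a sketch, not the bce
driver's code: split_frame() is a made-up helper, and MHLEN_ASSUMED is a
placeholder value, since the real MHLEN depends on the platform.  The
only figure taken from the thread is the 2KB MCLBYTES cluster size.

```python
# Simplified model of the two-chain receive path described above.
# This is an illustration only, not code from the bce driver.

MCLBYTES = 2048        # mbuf cluster size (the "2KB buffers" mentioned above)
MHLEN_ASSUMED = 200    # hypothetical placeholder; the real MHLEN is platform-dependent


def split_frame(frame_len, rx_buf_size, pg_buf_size):
    """Return the data length placed in each mbuf for one received frame.

    The first entry models the single rx-chain mbuf; any further entries
    model pg-chain mbufs holding the remaining data.
    """
    if frame_len <= 0 or rx_buf_size <= 0 or pg_buf_size <= 0:
        raise ValueError("lengths and buffer sizes must be positive")
    first = min(frame_len, rx_buf_size)
    segments = [first]
    remaining = frame_len - first
    while remaining > 0:
        chunk = min(remaining, pg_buf_size)
        segments.append(chunk)
        remaining -= chunk
    return segments


if __name__ == "__main__":
    # Compare the current rx buffer size with the suggested MCLBYTES test.
    for rx_size in (MHLEN_ASSUMED, MCLBYTES):
        print(rx_size, split_frame(1500, rx_size, MCLBYTES))
```

Under these assumptions a 1500-byte frame needs two mbufs with the small
rx buffer and one mbuf when the rx buffer is MCLBYTES, which is the
difference the suggested quick test is meant to expose.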

The one thing that I'm very sure of is that system memory isn't low,
on either machine.   The extraordinary increase in accumulated system
time of the process makes me wonder if something odd is going on with
the TX path.  When sending packets, the network stack and driver code
path execution times are charged to the user process doing the writes.
 On the receive side, the cpu time will be accumulated in either the
driver ithread or taskqueue, or the netisr kthread.  To be honest, I
hadn't been looking to see if excessive cpu time was accumulating
there, but I did notice that the system's load average was over 2.0
for the duration of the 'cvs update' on an otherwise idle machine.
This suggests to me that both send and receive were bogging down

Perhaps it is something silly like a spin lock being triggered?

>  >
>  > Here's the same test, with the new driver, and promisc mode on:
>  > svn# ifconfig bce0 promisc
>  > svn# time cvs -Rq up
>  > 0.967u 50.919s 2:13.97 38.7%    650+2250k 7379+0io 10094pf+0w
>  >
>  > It is better.. Only double the wall clock time, but still over 10
>  > times as much system time.
>  >
>  It's not clear to me why promiscuous mode would make a difference
>  here as that should only affect which packets are accepted by the
>  MAC.  Is there any teaming or VLANs used in your configuration?
>  The RX MTU settings shouldn't be affected by promiscuous mode.

There is nothing special going on.  Just a plain gige cable to a Cisco
gige switch.  I have no explanation for the promisc thing.  One of the
admins thought the problem was with YP/NIS.  He started up a tcpdump to
observe the NIS interactions during ssh login, and the problem mostly
went away.
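
To make the three runs easier to compare, the ratios can be recomputed
from the time(1) output quoted in this thread.  This is just a small
sketch; to_seconds() and compare() are helper names made up for the
example, and the run labels are mine.

```python
# Recompute system-time and wall-clock ratios from the quoted `time` output.

def to_seconds(elapsed):
    """Convert an elapsed time such as '7:01.04' (min:sec) to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


# (system seconds, elapsed wall clock) as reported for each test run
RUNS = {
    "old driver": (4.577, "1:14.48"),
    "new driver": (359.906, "7:01.04"),
    "new driver + promisc": (50.919, "2:13.97"),
}


def compare(runs, baseline):
    """Return {name: (sys_ratio, wall_ratio)} relative to the baseline run."""
    base_sys, base_elapsed = runs[baseline]
    base_wall = to_seconds(base_elapsed)
    result = {}
    for name, (sys_time, elapsed) in runs.items():
        result[name] = (sys_time / base_sys, to_seconds(elapsed) / base_wall)
    return result


if __name__ == "__main__":
    for name, (sys_ratio, wall_ratio) in compare(RUNS, "old driver").items():
        print(f"{name}: system x{sys_ratio:.1f}, wall clock x{wall_ratio:.1f}")
```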

BTW; I did the test twice.  I ran the machine with cvs HEAD, and
backed the driver out to before the commit.  I also tried a RELENG_7
kernel, and then put the HEAD bce driver on 7.x - the problem goes
with the bce driver change in both 7.x and 8.x/HEAD.

There will be 4 more of these machines online sometime today (7.x and
8.x, both 32 and 64 bit).  We can experiment with those at will.

>  >
>  > So please, don't MFC until this is solved..
>  >
>  I haven't yet as I've received reports from a few other people that
>  they're having problems, though they're functional problems and not
>  performance issues.

Peter Wemm - peter at; peter at; peter at
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell
**WANTED TO BUY: Garmin Streetpilot 2650 or 2660. Not later model! **
