cvs commit: src/sys/dev/bce if_bce.c if_bcefw.h if_bcereg.h

Thu Apr 3 12:21:06 UTC 2008

On Mon, Mar 31, 2008 at 12:34 PM, Peter Wemm <peter at wemm.org> wrote:
>
> On Mon, Mar 31, 2008 at 12:13 PM, David Christensen
>  <davidch at broadcom.com> wrote:
>  > > On Thu, Feb 21, 2008 at 5:46 PM, David Christensen
>  >  > <davidch at freebsd.org> wrote:
>  >  > >   Modified files:
>  >  > >     sys/dev/bce          if_bce.c if_bcefw.h if_bcereg.h
>  >  > >   Log:
>  >  > >   MFC after:      4 weeks
>  >  > >
>  >  > >   - Added loose RX MTU functionality to allow frames larger than
>  >  > >     1500 bytes to be accepted even though the interface MTU is set
>  >  > >     to 1500.
>  >  > >   - Implemented new TCP header splitting/jumbo frame support which
>  >  > >     uses two chains for receive traffic rather than the original
>  >  > >     single receive chain.
>  >  > >   - Added additional debug support code.
>  >  > >
>  >  > >   Revision  Changes      Path
>  >  > >   1.36      +1559 -675   src/sys/dev/bce/if_bce.c
>  >  > >   1.5       +6179 -4850  src/sys/dev/bce/if_bcefw.h
>  >  > >   1.17      +264 -55     src/sys/dev/bce/if_bcereg.h
>  >  >
>  >  > This has been devastating on the freebsd.org cluster.
>  >  >
>  >  > Attached are three test runs.  For each, I did a cold reboot, then
>  >  > 'cd /usr/src/sys' and ran 'cvs -Rq update' where the CVSROOT is
>  >  > over nfs.
>  >  >
>  >  > First, the old driver:
>  >  > svn# time cvs -Rq up
>  >  > 0.890u 4.577s 1:14.48 7.3%      669+2315k 7379+0io 10094pf+0w
>  >  >
>  >  > Now, the same test again, but with this change included in the kernel:
>  >  > svn# time cvs -Rq up
>  >  > 0.940u 359.906s 7:01.04 85.7%   648+2242k 7365+0io 10082pf+0w
>  >  >
>  >  > Note the massive increase (roughly 80 times) in system time,
>  >  > and the nearly 6-fold increase in wall clock time.
>  >  >
>  >  > Turning on promisc mode helps a lot, but doesn't solve it.  (This was
>  >  > found when ps@ was using tcpdump to try and figure out what the
>  >  > problem was)
>  >
>  >  The change is needed to update the FreeBSD driver so that it can
>  >  continue using production firmware for the controllers.  The previous
>  >  firmware was specific to FreeBSD and was not being maintained.
>  >
>  >  I didn't see any performance issues running with netperf.  Is the NFS
>  >  traffic UDP or TCP?  What's the MTU in use?  How much system memory is
>  >  available?
>
>  NFS over UDP.  We're also seeing problems with NIS/YP (also UDP) on
>  the box with the driver active.  The MTU is the standard 1500.  Both
>  machines have 8GB of RAM.  Both are 64 bit kernels.  Client is a Dell
>  2950 (2 x quad core2), the server is an HP DL385 (quad opteron with
>  bge).
>
>
>  >  If this is a performance problem then the first place I would look is
>  >  in the definitions for rx_bd_mbuf_alloc_size and pg_bd_mbuf_alloc_size.
>  >  The older version of the driver would use multiple 2KB buffers
>  >  (MCLBYTES in size) from a single chain when building a packet so you
>  >  would typically have a single mbuf cluster passed to the stack.  The
>  >  new firmware uses two chains, each of which may be a different size.
>  >  The current implementation will use MHLEN bytes for the rx chain and
>  >  MCLBYTES for the pg chain.  When a packet is received the hardware will
>  >  place as much data as possible into a single mbuf in the rx chain,
>  >  then place any remaining data into one or more mbufs in the pg chain.
>  >  The driver will then stitch together the mbufs before passing them up
>  >  the stack.  This process is supposed to improve performance for TCP
>  >  because the TCP payload will be split from the TCP header and should
>  >  be quicker to access.
>  >
>  >  A quick test would be to set rx_bd_mbuf_alloc_size to MCLBYTES, which
>  >  should for the most part duplicate the older behavior.  The driver
>  >  will still allocate more mbufs which might be a problem if system
>  >  memory is already low.  Is anyone else aware of a driver that does
>  >  TCP header splitting?  It's typical on the TX side to see a packet
>  >  with two or three mbufs in a chain but I suspect it's less typical
>  >  on the RX side which could be part of the problem.
>
>  The one thing that I'm very sure of is that system memory isn't low,
>  on either machine.   The extraordinary increase in accumulated system
>  time of the process makes me wonder if something odd is going on with
>  the TX path.  When sending packets, the network stack and driver code
>  path execution times are charged to the user process doing the writes.
>  On the receive side, the cpu time will be accumulated in either the
>  driver ithread or taskqueue, or the netisr kthread.  To be honest, I
>  hadn't been looking to see if excessive cpu time was accumulating
>  there, but I did notice that the system's load average was over 2.0
>  for the duration of the 'cvs update' on an otherwise idle machine.
>  This suggests to me that both send and receive were bogging down
>  somehow.
>
>  Perhaps it is something silly like a spin lock being triggered?
>
>
>  >  >
>  >  > Here's the same test, with the new driver, and promisc mode on:
>  >  > svn# ifconfig bce0 promisc
>  >  > svn# time cvs -Rq up
>  >  > 0.967u 50.919s 2:13.97 38.7%    650+2250k 7379+0io 10094pf+0w
>  >  >
>  >  > It is better: only about double the wall clock time, but still over
>  >  > 10 times as much system time.
>  >  >
>  >
>  >  It's not clear to me why promiscuous mode would make a difference
>  >  here as that should only affect which packets are accepted by the
>  >  MAC.  Is there any teaming or VLANs used in your configuration?
>  >  The RX MTU settings shouldn't be affected by promiscuous mode.
>
>  There is nothing special going on.  Just a plain gige cable to a cisco
>  gige switch.  I have no explanation for the promisc thing - one of the
>  freebsd.org admins thought the problem was with YP/NIS. He started up
>  a tcpdump to observe the NIS interactions during ssh login, and the
>  problem mostly went away.
>
>  BTW; I did the test twice.  I ran the machine with cvs HEAD, and
>  backed the driver out to before the commit.  I also tried a RELENG_7
>  kernel, and then put the HEAD bce driver on 7.x - the problem goes
>  with the bce driver change in both 7.x and 8.x/HEAD.
>
>  There will be 4 more of these machines online sometime today (7.x and
>  8.x, both 32 and 64 bit).  We can experiment with those at will.
>
>
>  >
>  >
>  >  >
>  >  > So please, don't MFC until this is solved.
>  >  >
>  >
>  >  I haven't yet as I've received reports from a few other people that
>  >  they're having problems, though they're functional problems and not
>  >  performance issues.
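
For reference, the ratios from the three timing runs quoted above can be
worked out directly.  This is only a small sketch: the figures are copied
from the 'time' output, the wall clock values are converted to seconds by
hand, and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One row of `time` output: system seconds and wall clock seconds. */
struct run {
    const char *label;
    double sys_s;
    double wall_s;
};

/* Figures copied from the quoted runs; wall clock converted to seconds. */
static const struct run runs[] = {
    { "old driver",             4.577,  74.48 },  /* 1:14.48 */
    { "new driver",           359.906, 421.04 },  /* 7:01.04 */
    { "new driver + promisc",  50.919, 133.97 },  /* 2:13.97 */
};

/* Ratio of a measurement against the old-driver baseline. */
static double ratio(double value, double baseline)
{
    assert(baseline > 0.0);
    return value / baseline;
}

/* Print system-time and wall clock ratios relative to runs[0]. */
static void print_ratios(void)
{
    for (size_t i = 1; i < sizeof(runs) / sizeof(runs[0]); i++)
        printf("%-22s sys x%.1f  wall x%.1f\n", runs[i].label,
            ratio(runs[i].sys_s, runs[0].sys_s),
            ratio(runs[i].wall_s, runs[0].wall_s));
}
```

Calling print_ratios() gives about 79 times the system time and 5.7 times
the wall clock time for the new driver, and about 11 times and 1.8 times
with promisc mode on.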

On 8.0/i386, with PAE enabled, I get messages on the console and the
system hangs when trying to do an NFS mount.  Backing out the driver
fixes it.  The same driver doesn't cause quite as spectacular a
failure on 8.0/amd64, but it isn't exactly happy either.

Additional IP options:.
Mounting NFS file systebcms:e1: link state changed to UP
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)
[..forever..]
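
As far as I can tell, that message comes from the length check at the top
of ether_input() in sys/net/if_ethersubr.c, which drops a frame when the
first mbuf holds fewer bytes than an Ethernet header.  Below is a rough
userland sketch of that check, written from memory.  struct fake_mbuf and
has_leading_ether_header() are simplified stand-ins invented for
illustration; they are not the kernel's struct mbuf or any real kernel
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETHER_HDR_LEN 14  /* 6 dst + 6 src + 2 ethertype */

/* Simplified stand-in for the kernel's struct mbuf (hypothetical). */
struct fake_mbuf {
    struct fake_mbuf *m_next;  /* next mbuf in the chain */
    unsigned int m_len;        /* bytes stored in this mbuf */
    unsigned int m_pkthdr_len; /* total packet length, head mbuf only */
};

/*
 * The check only looks at the first mbuf in the chain.  A head mbuf with
 * m_len == 0 fails it even if later mbufs in the chain carry data.
 */
static bool has_leading_ether_header(const struct fake_mbuf *m)
{
    assert(m != NULL);
    return m->m_len >= ETHER_HDR_LEN;
}
```

A head mbuf reporting "len 0 pkt len 0" fails this test no matter what
follows it in the chain, which would be consistent with the driver handing
up chains whose first mbuf was never filled in.  That is a guess on my
part, not something I have confirmed.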

NFS over UDP, fwiw.  Server is a netapp.
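
To make the two-chain receive layout David describes above concrete, here
is a minimal sketch of how a frame would be divided between the rx chain
and the pg chain, assuming the hardware simply fills one rx buffer first
and spills the rest into pg buffers.  split_frame() and struct split are
hypothetical names, and the buffer sizes are parameters because MHLEN
varies by platform; this is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* How one received frame lands in the two chains (illustrative only). */
struct split {
    size_t rx_bytes;  /* bytes placed in the single rx-chain mbuf */
    size_t pg_bytes;  /* bytes spilled into the pg chain */
    size_t pg_mbufs;  /* number of pg-chain mbufs needed */
};

/*
 * rx_buf stands in for rx_bd_mbuf_alloc_size and pg_buf for
 * pg_bd_mbuf_alloc_size.  Fill the rx buffer first, then round the
 * remainder up to whole pg buffers.
 */
static struct split split_frame(size_t frame_len, size_t rx_buf, size_t pg_buf)
{
    struct split s;

    assert(rx_buf > 0 && pg_buf > 0);
    s.rx_bytes = frame_len < rx_buf ? frame_len : rx_buf;
    s.pg_bytes = frame_len - s.rx_bytes;
    s.pg_mbufs = (s.pg_bytes + pg_buf - 1) / pg_buf;
    return s;
}
```

With a placeholder rx buffer of 200 bytes (an assumed value standing in
for MHLEN) and 2048-byte pg buffers, a 1514-byte frame needs one rx mbuf
plus one pg mbuf.  With the rx buffer raised to 2048 bytes, as in David's
suggested quick test, the same frame fits in the rx mbuf alone.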

-- 
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell