dev.bce.X.com_no_buffers increasing and packet loss
David Christensen
davidch at broadcom.com
Tue Mar 9 23:00:55 UTC 2010
> -----Original Message-----
> From: Ryan Stone [mailto:rysto32 at gmail.com]
> Sent: Tuesday, March 09, 2010 2:31 PM
> To: David Christensen
> Cc: pyunyh at gmail.com; Ian FREISLICH; current at freebsd.org
> Subject: Re: dev.bce.X.com_no_buffers increasing and packet loss
>
> > What's the traffic look like? Jumbo, standard, short frames? Any
> > good ideas on profiling the code? I haven't figured out how to
> > use the CPU TSC, but there is a free-running timer on the device
> > that might be usable to calculate where the driver's time is
> > spent.
> >
> > Dave
>
> In my experience hwpmc is the best and easiest way to profile
> anything on FreeBSD. Here's something I sent to a different
> thread a couple of months ago explaining how to use it:
>
> 1) If device hwpmc is not compiled into your kernel, kldload
> hwpmc (you will need the HWPMC_HOOKS option in either case)
> 2) Run pmcstat to begin taking samples (make sure that
> whatever you are profiling is busy doing work first!):
>
> pmcstat -S unhalted-cycles -O /tmp/samples.out
>
> The -S option specifies what event you want to use to trigger
> sampling. unhalted-cycles is the best event to use if you want
> a general picture of where CPU time is going.
> 3) After you've run pmcstat for "long enough" (a proper
> definition of long enough requires a statistician, which I
> most certainly am not, but I find that for a busy system 10
> seconds is enough), Control-C it to stop it. You can then use
> pmcstat to post-process the samples into human-readable text:
>
> pmcstat -R /tmp/samples.out -G /tmp/graph.txt
>
> The graph.txt file will show leaf functions on the left and
> their callers beneath them, indented to reflect the
> callchain. It's not too easy to describe and I don't have
> sample output available right now.
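Putting your three steps together, the whole sequence I used was
roughly the following (a sketch; it assumes hwpmc.ko is not compiled
into the kernel, which was built with options HWPMC_HOOKS):

  # Load the driver if it isn't compiled in.
  kldload hwpmc

  # Sample while the system is under load; Ctrl-C after ~10 seconds.
  # ("pmcstat -L" lists the event names available on this CPU.)
  pmcstat -S unhalted-cycles -O /tmp/samples.out

  # Post-process the samples into the callchain graph shown below.
  pmcstat -R /tmp/samples.out -G /tmp/graph.txt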
Below is a quick sample I obtained while running netperf. Since
we're interested in the bce(4) driver, I assume the relevant numbers
are the time spent in bce and in the functions it calls. It looks
to me like memory allocation/freeing is a major consumer of CPU
cycles in this test. Am I reading this right?
@ CPU_CLK_UNHALTED_CORE [1091924 samples]

49.25%  [537739]   sched_idletd @ /boot/kernel/kernel
 100.0%  [537739]   fork_exit

20.89%  [228070]   trash_dtor @ /boot/kernel/kernel
 85.45%  [194883]   mb_dtor_clust
  100.0%  [194883]   uma_zfree_arg
   100.0%  [194883]   mb_free_ext
 14.55%  [33186]    mb_dtor_mbuf
  100.0%  [33186]    uma_zfree_arg
   84.27%  [27966]    mb_free_ext
   15.73%  [5220]     m_freem
 00.00%  [1]        mb_dtor_pack
  100.0%  [1]        uma_zfree_arg
   100.0%  [1]        mb_free_ext

02.34%  [25542]    bce_intr @ /boot/kernel/if_bce.ko
 100.0%  [25542]    intr_event_execute_handlers @ /boot/kernel/kernel
  100.0%  [25542]    ithread_loop
   100.0%  [25542]    fork_exit

02.20%  [24055]    trash_ctor @ /boot/kernel/kernel
 96.41%  [23192]    mb_ctor_clust
  100.0%  [23192]    uma_zalloc_arg
   100.0%  [23192]    bce_fill_rx_chain @ /boot/kernel/if_bce.ko
 03.39%  [815]      mb_ctor_mbuf @ /boot/kernel/kernel
  100.0%  [815]      uma_zalloc_arg
   99.39%  [810]      bce_fill_rx_chain @ /boot/kernel/if_bce.ko
   00.49%  [4]        m_copym @ /boot/kernel/kernel
   00.12%  [1]        tcp_output
 00.20%  [48]       uma_zalloc_arg
  100.0%  [48]       bce_fill_rx_chain @ /boot/kernel/if_bce.ko
   100.0%  [48]      bce_intr
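As an aside, to tie the profile back to the symptom in the subject
line, a loop like this (assuming unit 0 for the "X") makes it easy
to watch the counter climb while the test runs:

  # Print the no_buffers counter once a second during the netperf run.
  while true; do sysctl dev.bce.0.com_no_buffers; sleep 1; done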
Dave