dev.bce.X.com_no_buffers increasing and packet loss
David Christensen
davidch at broadcom.com
Tue Mar 9 23:00:55 UTC 2010
> -----Original Message-----
> From: Ryan Stone [mailto:rysto32 at gmail.com]
> Sent: Tuesday, March 09, 2010 2:31 PM
> To: David Christensen
> Cc: pyunyh at gmail.com; Ian FREISLICH; current at freebsd.org
> Subject: Re: dev.bce.X.com_no_buffers increasing and packet loss
>
> > What's the traffic look like? Jumbo, standard, short frames? Any
> > good ideas on profiling the code? I haven't figured out how to
> > use the CPU TSC, but there is a free-running timer on the device
> > that might be usable to calculate where the driver's time is
> > spent.
> >
> > Dave
>
> In my experience hwpmc is the best and easiest way to profile
> anything on FreeBSD. Here's something I sent to a different
> thread a couple of months ago explaining how to use it:
>
> 1) If device hwpmc is not compiled into your kernel, kldload
> hwpmc (you will need the HWPMC_HOOKS option in either case)
> 2) Run pmcstat to begin taking samples (make sure that
> whatever you are profiling is busy doing work first!):
>
> pmcstat -S unhalted-cycles -O /tmp/samples.out
>
> The -S option specifies what event you want to use to trigger
> sampling. unhalted-cycles is the best event to use if you want
> a general picture of where CPU time is going.
> 3) After you've run pmcstat for "long enough" (a proper
> definition of long enough requires a statistician, which I
> most certainly am not, but I find that for a busy system 10
> seconds is enough), Control-C it to stop it. You can then use
> pmcstat to post-process the samples into human-readable text:
>
> pmcstat -R /tmp/samples.out -G /tmp/graph.txt
>
> The graph.txt file will show leaf functions on the left and
> their callers beneath them, indented to reflect the
> callchain. It's not too easy to describe and I don't have
> sample output available right now.
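Putting your three steps together, the whole sequence I used was
roughly the following (a sketch; it assumes hwpmc.ko is not compiled
into the kernel, which was built with options HWPMC_HOOKS):

  # Load the driver if it isn't compiled in.
  kldload hwpmc

  # Sample while the system is under load; Ctrl-C after ~10 seconds.
  # ("pmcstat -L" lists the event names available on this CPU.)
  pmcstat -S unhalted-cycles -O /tmp/samples.out

  # Post-process the samples into the callchain graph shown below.
  pmcstat -R /tmp/samples.out -G /tmp/graph.txt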
Below is a quick sample I obtained while running netperf. Since
we're interested in the bce(4) driver, I assume the relevant numbers
are the time spent in bce and in the functions it calls. It looks
to me like memory allocation/freeing is a major consumer of CPU
cycles in this test. Am I reading this right?
@ CPU_CLK_UNHALTED_CORE [1091924 samples]

49.25%  [537739]   sched_idletd @ /boot/kernel/kernel
 100.0%  [537739]   fork_exit

20.89%  [228070]   trash_dtor @ /boot/kernel/kernel
 85.45%  [194883]   mb_dtor_clust
  100.0%  [194883]   uma_zfree_arg
   100.0%  [194883]   mb_free_ext
 14.55%  [33186]    mb_dtor_mbuf
  100.0%  [33186]    uma_zfree_arg
   84.27%  [27966]    mb_free_ext
   15.73%  [5220]     m_freem
 00.00%  [1]        mb_dtor_pack
  100.0%  [1]        uma_zfree_arg
   100.0%  [1]        mb_free_ext

02.34%  [25542]    bce_intr @ /boot/kernel/if_bce.ko
 100.0%  [25542]    intr_event_execute_handlers @ /boot/kernel/kernel
  100.0%  [25542]    ithread_loop
   100.0%  [25542]    fork_exit

02.20%  [24055]    trash_ctor @ /boot/kernel/kernel
 96.41%  [23192]    mb_ctor_clust
  100.0%  [23192]    uma_zalloc_arg
   100.0%  [23192]    bce_fill_rx_chain @ /boot/kernel/if_bce.ko
 03.39%  [815]      mb_ctor_mbuf @ /boot/kernel/kernel
  100.0%  [815]      uma_zalloc_arg
   99.39%  [810]      bce_fill_rx_chain @ /boot/kernel/if_bce.ko
   00.49%  [4]        m_copym @ /boot/kernel/kernel
   00.12%  [1]        tcp_output
 00.20%  [48]       uma_zalloc_arg
  100.0%  [48]       bce_fill_rx_chain @ /boot/kernel/if_bce.ko
   100.0%  [48]      bce_intr
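As an aside, to tie the profile back to the symptom in the subject
line, a loop like this (assuming unit 0 for the "X") makes it easy
to watch the counter climb while the test runs:

  # Print the no_buffers counter once a second during the netperf run.
  while true; do sysctl dev.bce.0.com_no_buffers; sleep 1; done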
Dave