non-temporal copyin/copyout?

Joseph Koshy joseph.koshy at gmail.com
Fri Feb 17 07:50:33 PST 2006


> I'm bringing this up because I've noticed that FreeBSD 10GbE
> performance is far below Solaris/amd64 and linux/x86_64 when
> using the PCI-e 10GbE adaptor that I'm doing drivers for.
> For example, Solaris can receive a netperf TCP stream at

There was a bug in my port of netperf: I had left the
`HISTOGRAM' option turned on, which slows it down
significantly.

Version 2.3.1,1 of the port is the latest and has this bug fixed.

> 9.75Gb/sec while using only 47% CPU as measured by vmstat.
> (eg, it is using a little less than a single core).  In
> contrast, FreeBSD is limited to 7.7Gb/sec, and uses nearly
> 90% CPU.  When profiling with hwpmc, I see a profile which
> shows up to 70% of the time is spent in copyout.

You could use the following events to probe the system:

 "k8-dc-miss": data cache misses.
 "k8-bu-fill-request-l2-miss,mask=dc-fill": data cache fill
     requests that missed in the L2 cache.
 "k8-dc-misaligned-data-reference": misaligned data
     references, in case there are any.
 "k8-fr-interrupts-masked-while-pending-cycles": cycles during
     which an interrupt was pending but masked; useful for
     finding places in the code where spin locks are held for
     a long time.

You may need to adjust the sampling rate (the -n option to
pmcstat); the default of one sample every 65536 events may be
too coarse or too fine for some of these events.  Running
pmcstat -p EVENT in counting mode first shows how often EVENT
occurs, which gives a feel for a good rate to choose for EVENT.

--
FreeBSD Volunteer,     http://people.freebsd.org/~jkoshy


More information about the freebsd-amd64 mailing list