non-temporal copyin/copyout?

Andrew Gallatin gallatin at cs.duke.edu
Fri Feb 17 07:01:09 PST 2006



Has anybody considered using non-temporal copies for the in-kernel
bcopy on amd64?

A quick test in userspace shows that for large copies, an adapted
pagecopy (from amd64/amd64/support.S) more than doubles bcopy
bandwidth, from 1.2GB/s to 2.5GB/s, on my Athlon64 X2 3800+.
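
For illustration only, here is a minimal userspace sketch of what a
non-temporal copy can look like in C.  It is not the adapted pagecopy
measured above and not FreeBSD kernel code: nt_memcpy is a hypothetical
name, and the SSE2 intrinsics (_mm_stream_si128, _mm_sfence) stand in
for hand-written assembly.  The idea is that the stores bypass the
cache, so a large copy does not evict data the CPU still needs.

```c
#include <assert.h>
#include <emmintrin.h>	/* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a FreeBSD API): copy len bytes from src to
 * dst using non-temporal (streaming) stores for the bulk of the data.
 * The buffers must not overlap.
 */
static void *
nt_memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t head;

	if (len == 0)
		return (dst);
	assert(dst != NULL && src != NULL);

	/* Streaming stores need a 16-byte aligned destination. */
	head = (size_t)(-(uintptr_t)d & 15);
	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Main loop: 64 bytes per iteration, unaligned loads are fine. */
	while (len >= 64) {
		__m128i a = _mm_loadu_si128((const __m128i *)(s + 0));
		__m128i b = _mm_loadu_si128((const __m128i *)(s + 16));
		__m128i c = _mm_loadu_si128((const __m128i *)(s + 32));
		__m128i e = _mm_loadu_si128((const __m128i *)(s + 48));

		_mm_stream_si128((__m128i *)(d + 0), a);
		_mm_stream_si128((__m128i *)(d + 16), b);
		_mm_stream_si128((__m128i *)(d + 32), c);
		_mm_stream_si128((__m128i *)(d + 48), e);
		d += 64;
		s += 64;
		len -= 64;
	}

	/* Order the weakly-ordered streaming stores before later stores. */
	_mm_sfence();

	/* Copy whatever is left (fewer than 64 bytes) the ordinary way. */
	memcpy(d, s, len);
	return (dst);
}
```

A sketch like this is only likely to help for copies much larger than
the cache; for small copies the ordinary cached path is usually the
better choice, so a real implementation would presumably switch on
length.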

I'm bringing this up because I've noticed that FreeBSD 10GbE
performance is far below Solaris/amd64 and Linux/x86_64 when using the
PCI-e 10GbE adaptor that I'm writing drivers for.  For example, Solaris
can receive a netperf TCP stream at 9.75Gb/sec while using only 47%
CPU as measured by vmstat (i.e., it is using a little less than a
single core).  In contrast, FreeBSD is limited to 7.7Gb/sec, and uses
nearly 90% CPU.  When profiling with hwpmc, I see that up to 70% of
the time is spent in copyout.

Thanks,

Drew


