svn commit: r303583 - head/sys/amd64/amd64

Slawa Olhovchenkov slw at zxy.spb.ru
Sun Jul 31 15:48:27 UTC 2016


On Sun, Jul 31, 2016 at 06:26:29PM +0300, Slawa Olhovchenkov wrote:

> On Mon, Aug 01, 2016 at 12:30:14AM +1000, Bruce Evans wrote:
> 
> > On Sun, 31 Jul 2016, Slawa Olhovchenkov wrote:
> > 
> > > On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote:
> > >
> > >> Misalignment of this loop made it almost twice as slow on old Turion2 with
> > >> slow DDR2 memory.  It made no difference on Haswell.  I added an extra
> > >> movnti, but that makes little or no difference.  2 more movnti's wouldn't
> > >> fit in a 16-byte cache line so are slower unless even more care is taken
> > >> with alignment (or with less care, 4 with misalignment are not less than
> > >> twice as slow as 1 with alignment).
> > >>
> > >> I thought that alignment and unrolling didn't matter here, because movnti
> > >> has to wait for memory and almost any loop runs fast enough to keep up.
> > >> The timing on my old system is something like: CPUs at 2 GHz; main memory
> > >> at 4 GB/sec; movnti is only 4 bytes wide on i386 (so this problem
> > >> only affects i386, at least with slow memory).  So sustaining 4 GB/sec
> > >> requires 1 G movnti's/sec, so the loop needs to run at 2 cycles/iteration
> > >> to keep up.  But when it is misaligned, it runs at 3-4 cycles/iteration.
> > >> Alignment makes it take about 2, and the extra movnti is for safety and
> > >> to work with faster memory.
> > >>
> > >> On Haswell with CPUs at 4 GHz, 2 cycles/iteration gives 8 GB/sec on
> > >> i386 and 16 GB/sec on amd64 with wider movnti.  IIRC, 16 GB/sec is about
> > >> the main memory speed so nothing better is possible but just 1 extra
> > >> movnti gives more with faster memory.  This is just worse than bzero()
> > >
> > > What about a modern system with 120 GB/sec main memory speed?
> > 
> > Is there such a system?  It would have main memory almost twice as fast
> > as Haswell L2 and almost half as fast as Haswell L1.
> 
> http://ark.intel.com/products/family/93797/Intel-Xeon-Processor-E7-v4-Family#@Server
> 
> 102 GB/s (sorry, 120 was a misprint)
> 
> > My fastest memory actually does 20001 MB/s according to old memtest
> > and that is about right according to other tests.
> 
> For a short time I had a free 1650v4
> http://ark.intel.com/products/92994/Intel-Xeon-Processor-E5-1650-v4-15M-Cache-3_60-GHz
> with up to 76.8 GB/s (per the datasheet, at DDR4-2400).
> With the installed DDR4-2133 it is up to 68.2 GB/s (theoretical).
> After a short time the system was put into production.
> 
> I was unable to boot UEFI Memtest86 7.0; the old version (4.3.7) shows 15 GB/s.

Benchmarks at http://wccftech.com/intel-broadwell-ep-xeon-e5-2698-v4-processor/
show 110 GB/s write speed.

