L1 cache thrashing affects performance of HIMENO benchmark

Jason Evans jasone at freebsd.org
Sat Jan 5 22:02:24 UTC 2013


On Jan 5, 2013, at 12:47 PM, Adrian Chadd <adrian at freebsd.org> wrote:
> On 5 January 2013 07:38, Hakisho Nukama <nukama at gmail.com> wrote:
>> FreeBSD (PCBSD) is slower compared to Linux and kFreeBSD in this
>> benchmark of HIMENO:
>> http://openbenchmarking.org/prospect/1202215-BY-FREEBSD9683/88ac7a01c6cb355d7e7603224b2ee1e5a4cb881d
>> Also DragonFly BSD compares worse to kFreeBSD and Linux:
>> http://www.phoronix.com/scan.php?page=article&item=dragonfly_linux_32&num=3
>> http://openbenchmarking.org/prospect/1206255-SU-DRAGONFLY55/88ac7a01c6cb355d7e7603224b2ee1e5a4cb881d
>> 
>> Matt, Venkatesh and Alex investigated this performance problem and
>> came to these results:
>> http://leaf.dragonflybsd.org/mailarchive/users/2013-01/msg00011.html
> 
> I've CC'ed jasone on this as it's an interesting side-effect of memory
> allocation logic.
> 
> Jason - any comments?

There are many variations on this class of performance problem, and the short of it is that only the application understands its data structure layout and access patterns well enough to reliably make optimal use of the cache.  However, it is possible for the allocator to lay out memory in a more haphazard fashion than jemalloc, phkmalloc, etc. do, such that the application can be cache-oblivious and (usually) not suffer the worst-case consequences seen in this case.  Extent-based allocators like dlmalloc often get this "for free" for a significant range of allocation sizes.  jemalloc could be modified to this end, but a full solution would necessarily increase internal fragmentation.  It might be worth experimenting with nonetheless.

Thanks,
Jason

