thread aware malloc

Niall Douglas s_sourceforge at
Thu Apr 14 02:38:45 PDT 2005

On 14 Apr 2005 at 1:54, Ivan Voras wrote:

> > KSE threads, FreeBSD v5.3. It's a mixture of process and system
> > threads. As I mentioned in a previous post, it's eight times slower
> > than Linux. See (scroll down to the
> > screenshots).
> Did you test FreeBSD and Linux on "real" hardware (not VMWare)?
> VMWare **greatly** pessimizes low-level operations that depend on
> atomic/bus locks, such as CMPXCHG and similar instructions used in
> synchronization, context switches and multithreading. IO operations
> are also very slow compared to real hardware. It simply cannot be
> used for benchmarks (unless you're benchmarking VMWare rather than
> the guest system :) )

VMWare should penalise Linux and FreeBSD equally in this area. If 
anything it should penalise Linux more, as Linux doesn't have a 
CMPXCHG exemption for kernel builds, and Fedora Linux runs more 
daemons etc. by default.
Julian Elischer wrote:

> did you compile the FreeBSD kernel with the required changes for running
> under vmware?

Yes. It's a custom build tuned to run as fast as I could make it.

Interestingly, v5.3 was about half the speed of v5.2.1.

> Vmware REALLY SUCKS when it comes to emulating the exact instructions we
> use for kernel locks and mutexes. You'd get maybe an order of magnitude
> difference through this under some situations.
> I forget the exact options but they'll be in the list archives.
> Also make sure it's a uniprocessor kernel.

It's also a uniprocessor kernel.

I have finally bought a native 64-bit platform and am just waiting 
for the release of v5.4. I'll post benchmarks here once I have them, 
but I still expect an order of magnitude difference. FreeBSD just 
feels a lot slower at multithreaded tasks despite being faster for 
multiprocess work. That is what one would expect given the added 
complexity of an M:N threading model which hasn't been optimised yet.

Either way, ptmalloc2 is many times faster than the libc memory 
allocator. Under four threads it is over sixty times faster than the 
default Win32 allocator, so one would expect at least a similar 
speed-up here. For real-world code I would expect a six- to 
twelve-fold speed-up.


More information about the freebsd-threads mailing list