The optimization of malloc(3): FreeBSD vs GNU libc

Brooks Davis brooks at one-eyed-alien.net
Mon Aug 14 23:15:12 UTC 2006


On Tue, Aug 15, 2006 at 07:10:47AM +0800, Intron wrote:
> One day, a friend told me that his program was 3 times slower under
> FreeBSD 6.1 than under GNU/Linux (from Red Hat 7.2 to Fedora Core 5).
> I was astonished by the real, repeatable performance difference on an
> AMD Athlon XP 2500+ (1.8GHz, 512KB L2 cache).
> 
> After some digging, I found that the problem lies in malloc(3) of
> FreeBSD libc.
> 
> Download the testing program: http://ftp.intron.ac/tmp/fdtd.tar.bz2
> 
> First compile the program WITHOUT the macro "MY_MALLOC" defined
> (in Makefile), so that it uses the malloc(3) provided by FreeBSD 6.1.
> Then time the resulting binary (on Athlon XP 2500+):
> 
> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
> ...
>        165.24 real       164.19 user         0.02 sys
> 
> Then recompile the program (remember to "make clean") WITH the
> macro "MY_MALLOC" defined (in Makefile), so that it uses my own
> simple implementation of malloc(3) (i.e. my_malloc() in cal.c),
> and time it again:
> 
> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
> ...
>        50.41 real        49.95 user         0.04 sys
> 
> You may repeat this test as many times as you like.
> 
> I guess this kind of performance difference comes from:
> 
> 1. His program uses malloc(3) to obtain a great many small memory
>    blocks.
> 
> 2. In this case, FreeBSD malloc(3) obtains small memory blocks from
>    the kernel and passes them to the application.
> 
>    But malloc(3) of GNU libc obtains large memory blocks from the
>    kernel, splits them, and hands them out to the application as
>    small blocks.
> 
>    You may verify my judgement with truss(1).
> 
> 3. The FreeBSD malloc(3) approach makes the VM page mapping too
>    scattered, which reduces the efficiency of the CPU L2 cache. In
>    contrast, my my_malloc() partially simulates the behavior of GNU
>    libc malloc(3) and avoids that scattering.
> 
> Callgrind is broken under FreeBSD; otherwise I would verify my guess
> with it.
> 
> I have also verified the program on an Intel Pentium 4 511 (2.8GHz, 1MB
> L2 cache, running FreeBSD 6.1 i386 though this CPU supports EM64T),
> first without and then with MY_MALLOC:
> 
> >/usr/bin/time ./fdtd.FreeBSD 500 500 1000
> ...
>       185.30 real       184.28 user         0.02 sys
> 
> >/usr/bin/time ./fdtd.FreeBSD 500 500 1000
> ...
>        36.31 real        35.94 user         0.03 sys
> 
> NOTE: you probably cannot see the performance difference on a CPU with
>    a small L2 cache, such as an Intel Celeron 1.7GHz with 128 KB L2 cache.
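
For readers without the tarball, the strategy described in point 2
above (obtain one large block, then split it into small pieces) can be
sketched roughly as follows.  This is NOT my_malloc() from cal.c, which
is not reproduced here; arena_alloc(), struct arena, ARENA_CHUNK and
ARENA_ROUND are hypothetical names chosen for illustration, and the
sketch assumes the pieces are never freed individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not my_malloc() from cal.c. */
#define ARENA_CHUNK (1024 * 1024)  /* size of each large block requested */
#define ARENA_ROUND 16             /* piece sizes are rounded up to this */

struct arena {
    unsigned char *base;  /* start of the current large block */
    size_t used;          /* bytes already handed out from it */
    size_t size;          /* total bytes in the current block */
};

/*
 * Hand out a small piece of the current large block.  A new large block
 * is requested from the system allocator only when the current one
 * cannot satisfy the request.  Pieces are never freed individually, and
 * the unused tail of an exhausted block is simply wasted.
 */
static void *
arena_alloc(struct arena *a, size_t n)
{
    size_t need;
    void *ret;

    assert(a != NULL);
    if (n == 0 || n > SIZE_MAX - (ARENA_ROUND - 1))
        return NULL;

    /* Round the request up to a multiple of ARENA_ROUND. */
    need = (n + ARENA_ROUND - 1) & ~(size_t)(ARENA_ROUND - 1);

    if (a->base == NULL || need > a->size - a->used) {
        size_t chunk = need > ARENA_CHUNK ? need : ARENA_CHUNK;
        unsigned char *p = malloc(chunk);

        if (p == NULL)
            return NULL;
        /* The old block stays allocated: its pieces are still in use. */
        a->base = p;
        a->used = 0;
        a->size = chunk;
    }

    ret = a->base + a->used;
    a->used += need;
    return ret;
}
```

Starting from a zeroed struct arena, consecutive calls return
neighbouring addresses inside the same large block, which is the
property the quoted explanation attributes to GNU libc malloc(3).
Whether that accounts for the timings above is still only a guess.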

In CURRENT we've replaced phkmalloc with jemalloc.  It would be useful
to see how this benchmark performs with that.  I believe jemalloc does
similar things.
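
When comparing allocators this way, one crude check that needs no
Callgrind is to look at how far apart a run of small allocations lands
in the address space.  The sketch below is only a rough proxy for the
"scattered mapping" guess in point 3 and says nothing about cache
behaviour by itself; addr_span() and malloc_span() are hypothetical
helper names, not part of the fdtd program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Distance in bytes between the lowest and the highest address in
 * addrs[].  Returns 0 for an empty array.
 */
static uintptr_t
addr_span(const uintptr_t *addrs, size_t n)
{
    uintptr_t lo, hi;
    size_t i;

    if (n == 0)
        return 0;
    assert(addrs != NULL);
    lo = hi = addrs[0];
    for (i = 1; i < n; i++) {
        if (addrs[i] < lo)
            lo = addrs[i];
        if (addrs[i] > hi)
            hi = addrs[i];
    }
    return hi - lo;
}

/*
 * Allocate n blocks of `size` bytes with malloc(3) and report the span
 * of their start addresses.  The blocks are freed only after the span
 * has been measured.  Returns 0 if any allocation fails.
 */
static uintptr_t
malloc_span(size_t n, size_t size)
{
    void **blocks = calloc(n, sizeof(*blocks));
    uintptr_t *addrs = calloc(n, sizeof(*addrs));
    uintptr_t span = 0;
    size_t i, got = 0;

    if (blocks != NULL && addrs != NULL) {
        for (got = 0; got < n; got++) {
            blocks[got] = malloc(size);
            if (blocks[got] == NULL)
                break;
            addrs[got] = (uintptr_t)blocks[got];
        }
        if (got == n)
            span = addr_span(addrs, n);
        for (i = 0; i < got; i++)
            free(blocks[i]);
    }
    free(blocks);
    free(addrs);
    return span;
}
```

A tightly packed allocator would give a span not much larger than
n * size; a much larger figure means the blocks are spread over more
pages.  Running the same call against phkmalloc, jemalloc and GNU libc
would give numbers to set beside the timings.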

-- Brooks