The optimization of malloc(3): FreeBSD vs GNU libc

Intron mag at intron.ac
Mon Aug 14 23:10:49 UTC 2006


One day, a friend told me that his program ran 3 times slower under
FreeBSD 6.1 than under GNU/Linux (anything from Red Hat 7.2 to Fedora
Core 5). I was astonished to find that the difference is real and
repeatable on an AMD Athlon XP 2500+ (1.8GHz, 512KB L2 cache).

After some hacking, I found that the problem lies in malloc(3) of
FreeBSD libc.

Download the testing program: http://ftp.intron.ac/tmp/fdtd.tar.bz2

Compile the program WITHOUT the macro "MY_MALLOC" defined (in Makefile),
so that it uses the malloc(3) provided by FreeBSD 6.1. Then time a run
of the binary (on the Athlon XP 2500+):

#/usr/bin/time ./fdtd.FreeBSD 500 500 1000
...
        165.24 real       164.19 user         0.02 sys

Now recompile the program (remember to "make clean" first) WITH the
macro "MY_MALLOC" defined (in Makefile), so that it uses my own simple
implementation of malloc(3) (i.e. my_malloc() in cal.c), and time a
run again:

#/usr/bin/time ./fdtd.FreeBSD 500 500 1000
...
        50.41 real        49.95 user         0.04 sys

You may repeat this test as often as you like; the result stays the same.

I guess this performance difference comes from the following:

1. His program uses malloc(3) to obtain a large number of small memory
    blocks.

2. In this case, FreeBSD malloc(3) obtains small memory blocks from the
    kernel and passes them on to the application.

    malloc(3) of GNU libc, in contrast, obtains large memory blocks from
    the kernel, splits them, and hands them out to the application in
    small pieces.

    You may verify my judgement with truss(1).

3. The FreeBSD approach makes the VM page mapping too chaotic, which
    reduces the efficiency of the CPU L2 cache. My my_malloc() partially
    imitates the behavior of GNU libc malloc(3) and avoids this chaos.
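
To illustrate point 2, here is a minimal sketch of the "grab a large
block, hand it out in small pieces" strategy. It is NOT the my_malloc()
from cal.c; the names bump_alloc, CHUNK_SIZE and ALIGNMENT are made up
for this example, and the 1 MiB chunk size and 16-byte rounding are
arbitrary assumptions. Blocks are never freed individually, which is
only acceptable for a program that allocates everything up front.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes, chosen only for this sketch. */
#define CHUNK_SIZE ((size_t)1 << 20)   /* bytes requested from libc at a time */
#define ALIGNMENT  ((size_t)16)        /* block sizes are rounded up to this */

static unsigned char *chunk_cur;       /* next free byte in the current chunk */
static size_t chunk_left;              /* unused bytes left in the current chunk */

/*
 * Return a block of at least `size` bytes carved out of a large chunk.
 * Consecutive small requests get consecutive addresses. The unused tail
 * of an exhausted chunk is simply abandoned, and nothing is ever freed.
 */
void *bump_alloc(size_t size)
{
    void *p;

    if (size == 0)
        size = 1;

    /* Large requests bypass the arena and go straight to malloc(3). */
    if (size > CHUNK_SIZE / 2)
        return malloc(size);

    /* Round up to a multiple of ALIGNMENT (a power of two). */
    size = (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);

    if (size > chunk_left) {
        unsigned char *chunk = malloc(CHUNK_SIZE);
        if (chunk == NULL)
            return NULL;
        chunk_cur = chunk;
        chunk_left = CHUNK_SIZE;
    }

    p = chunk_cur;
    chunk_cur += size;
    chunk_left -= size;
    return p;
}
```

With this sketch, two successive calls such as bump_alloc(24) return
addresses exactly 32 bytes apart, so a long run of small allocations
ends up densely packed in a few contiguous regions.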

Callgrind is broken under FreeBSD; otherwise I would verify my guess
with it.
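
Short of a real cache profiler, one crude check is to look at how far
apart the addresses returned by malloc(3) are. The function below is
only a sketch; the name alloc_span is made up for this example. It
reports the virtual address range covered by a batch of equal-sized
allocations. It says nothing about physical pages or actual cache
misses, so treat the number as a hint and not as proof.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate `count` blocks of `size` bytes with malloc(3) and return the
 * number of bytes from the start of the lowest block to the end of the
 * highest one. Returns 0 if count or size is 0 or an allocation fails.
 * All blocks are freed again before returning.
 */
size_t alloc_span(size_t count, size_t size)
{
    void **blocks;
    uintptr_t lo = UINTPTR_MAX, hi = 0;
    size_t n, i, span;

    if (count == 0 || size == 0)
        return 0;

    blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return 0;

    for (n = 0; n < count; n++) {
        uintptr_t addr;

        blocks[n] = malloc(size);
        if (blocks[n] == NULL)
            break;
        addr = (uintptr_t)blocks[n];
        if (addr < lo)
            lo = addr;
        if (addr > hi)
            hi = addr;
    }

    /* Only report a span when every allocation succeeded. */
    span = (n == count) ? (size_t)(hi - lo) + size : 0;

    for (i = 0; i < n; i++)
        free(blocks[i]);
    free(blocks);
    return span;
}
```

A result close to count * size means the blocks are tightly packed; a
much larger result means they are spread over a wide address range.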

I have also run the program on an Intel Pentium 4 511 (2.8GHz, 1MB
L2 cache, running FreeBSD 6.1 i386 although this CPU supports EM64T),
first without and then with "MY_MALLOC" defined:

>/usr/bin/time ./fdtd.FreeBSD 500 500 1000
...
       185.30 real       184.28 user         0.02 sys

>/usr/bin/time ./fdtd.FreeBSD 500 500 1000
...
        36.31 real        35.94 user         0.03 sys

NOTE: you probably will not see the performance difference on a CPU with
    a small L2 cache, such as an Intel Celeron 1.7GHz with 128KB L2 cache.

------------------------------------------------------------------------
                                                 From Beijing, China



More information about the freebsd-hackers mailing list