unsatisfying c++/boost::multi_index_container::erase performance on at least FreeBSD 6.0

Jason Evans jasone at FreeBSD.org
Fri Mar 17 01:47:02 UTC 2006


bert hubert wrote:
> Dear FreeBSD hackers,
> 
> I'm working on improving the PowerDNS recursor for a big FreeBSD-loving
> internet provider in The Netherlands and I am hitting some snags. I also
> hope this is the appropriate list to share my concerns.
> 
> Pruning the cache is very very slow on the provider's FreeBSD 6.0 x86 systems
> whereas it flies on other operating systems.
> 
> I've managed to boil down the problem to the code found on
> http://ds9a.nl/tmp/cache-test.cc which can be compiled with:
> 'g++ -O3 -I/usr/local/include cache-test.cc -o cache-test' after installing
> Boost from the ports.
> 
> The problem exists both with the system compiler and with a self-compiled
> g++ 4.1.
> 
> Here are some typical timings:
> $ ./cache-test
> Creating..
> Copying 499950 nodes
> 100                           345 usec            3.45 usec/erase
> 300                           3298 usec           10.99 usec/erase
> 500                           8749 usec           17.50 usec/erase
> 700                           72702 usec          103.86 usec/erase
> 900                           46521 usec          51.69 usec/erase
> 
> On another operating system with almost the same cpu:
> 
> $ ./cache-test
> Creating..
> Copying 499950 nodes
> 100                           62 usec             0.62 usec/erase
> 300                           187 usec            0.62 usec/erase
> 500                           347 usec            0.69 usec/erase
> 700                           419 usec            0.60 usec/erase
> 900                           575 usec            0.64 usec/erase
> 
> I've toyed with MALLOC_OPTIONS, especially the >> options; I've tried
> GLIBCXX_FORCE_NEW, and I've tried specifying a different STL allocator in
> the C++ code, but none of it changes a thing.
> 
> A quick gprof profile shows a tremendous number of calls to 'ifree', but
> that may be due to the copying of the container going on between test runs.
> 
> Any help would be very appreciated as I am all out of clues. 
> 
> Thanks.

I ran cache-test on -current using phkmalloc and a couple of different 
versions of jemalloc.  jemalloc does not appear to have the same issue 
for this test.  It isn't obvious to me why phkmalloc is performing so 
poorly, but I think you can assume that this is a malloc performance 
problem.

The following jemalloc results were run with NO_MALLOC_EXTRAS defined. 
I included the patch results because I expect to commit the patch this week.

phkmalloc and jemalloc have similar memory usage, but jemalloc is 
substantially faster.  The patched jemalloc uses substantially less memory 
than either phkmalloc or the unpatched jemalloc.

Jason

------- phkmalloc: -----------------------------------------------------
onyx:~> MALLOC_OPTIONS=aj LD_PRELOAD=/tmp/phkmalloc/libc/libc.so.6 =time -l ./cache-test
Creating..
Copying 499950 nodes
100                           501 usec            5.01 usec/erase
300                           53183 usec          177.28 usec/erase
500                           5491 usec           10.98 usec/erase
700                           158989 usec         227.13 usec/erase
900                           47491 usec          52.77 usec/erase
1100                          324948 usec         295.41 usec/erase
1300                          106480 usec         81.91 usec/erase
1500                          522414 usec         348.28 usec/erase
1700                          155604 usec         91.53 usec/erase
1900                          685235 usec         360.65 usec/erase
2100                          230939 usec         109.97 usec/erase
2300                          860083 usec         373.95 usec/erase
2500                          234910 usec         93.96 usec/erase
2700                          1226310 usec        454.19 usec/erase
2900                          205739 usec         70.94 usec/erase
3100                          1379395 usec        444.97 usec/erase
3300                          296925 usec         89.98 usec/erase
3500                          1620705 usec        463.06 usec/erase
3700                          312343 usec         84.42 usec/erase
3900                          1835125 usec        470.54 usec/erase
4100                          306443 usec         74.74 usec/erase
4300                          1805999 usec        420.00 usec/erase
4500                          323179 usec         71.82 usec/erase
4700                          1593007 usec        338.94 usec/erase
4900                          316249 usec         64.54 usec/erase
       495.53 real       494.29 user         1.17 sys
     279240  maximum resident set size
         60  average shared memory size
     274524  average unshared data size
        128  average unshared stack size
      78238  page reclaims
          1  page faults
          0  swaps
          0  block input operations
          0  block output operations
          0  messages sent
          0  messages received
          0  signals received
          4  voluntary context switches
       6492  involuntary context switches

------- jemalloc (-current): -------------------------------------------
onyx:~> MALLOC_OPTIONS=aj LD_PRELOAD=/tmp/jemalloc/libc/libc.so.6 =time -l ./cache-test
Creating..
Copying 499950 nodes
100                           281 usec            2.81 usec/erase
300                           586 usec            1.95 usec/erase
500                           1008 usec           2.02 usec/erase
700                           973 usec            1.39 usec/erase
900                           1489 usec           1.65 usec/erase
1100                          2269 usec           2.06 usec/erase
1300                          2493 usec           1.92 usec/erase
1500                          3337 usec           2.22 usec/erase
1700                          3815 usec           2.24 usec/erase
1900                          3511 usec           1.85 usec/erase
2100                          4493 usec           2.14 usec/erase
2300                          4235 usec           1.84 usec/erase
2500                          6043 usec           2.42 usec/erase
2700                          5474 usec           2.03 usec/erase
2900                          7670 usec           2.64 usec/erase
3100                          6104 usec           1.97 usec/erase
3300                          10923 usec          3.31 usec/erase
3500                          4560 usec           1.30 usec/erase
3700                          9998 usec           2.70 usec/erase
3900                          8023 usec           2.06 usec/erase
4100                          15031 usec          3.67 usec/erase
4300                          5588 usec           1.30 usec/erase
4500                          15490 usec          3.44 usec/erase
4700                          6544 usec           1.39 usec/erase
4900                          14565 usec          2.97 usec/erase
        38.58 real        37.98 user         0.57 sys
     275752  maximum resident set size
         60  average shared memory size
         12  average unshared data size
        128  average unshared stack size
      68494  page reclaims
          0  page faults
          0  swaps
          0  block input operations
          0  block output operations
          0  messages sent
          0  messages received
          0  signals received
          1  voluntary context switches
       1180  involuntary context switches

------- jemalloc (patch): ----------------------------------------------
(http://people.freebsd.org/~jasone/jemalloc/patches/jemalloc_20060315a.diff)

onyx:~> MALLOC_OPTIONS=aj LD_PRELOAD=/usr/obj/usr/src/lib/libc/libc.so.6 =time -l ./cache-test
Creating..
Copying 499950 nodes
100                           232 usec            2.32 usec/erase
300                           912 usec            3.04 usec/erase
500                           2514 usec           5.03 usec/erase
700                           2008 usec           2.87 usec/erase
900                           3255 usec           3.62 usec/erase
1100                          2931 usec           2.66 usec/erase
1300                          4010 usec           3.08 usec/erase
1500                          3486 usec           2.32 usec/erase
1700                          4675 usec           2.75 usec/erase
1900                          2992 usec           1.57 usec/erase
2100                          2417 usec           1.15 usec/erase
2300                          4986 usec           2.17 usec/erase
2500                          4000 usec           1.60 usec/erase
2700                          5990 usec           2.22 usec/erase
2900                          3661 usec           1.26 usec/erase
3100                          4702 usec           1.52 usec/erase
3300                          5934 usec           1.80 usec/erase
3500                          7999 usec           2.29 usec/erase
3700                          5998 usec           1.62 usec/erase
3900                          6489 usec           1.66 usec/erase
4100                          6997 usec           1.71 usec/erase
4300                          7965 usec           1.85 usec/erase
4500                          7849 usec           1.74 usec/erase
4700                          8456 usec           1.80 usec/erase
4900                          7814 usec           1.59 usec/erase
        37.13 real        35.86 user         1.22 sys
     222976  maximum resident set size
         59  average shared memory size
         11  average unshared data size
        127  average unshared stack size
     104136  page reclaims
          0  page faults
          0  swaps
          0  block input operations
          0  block output operations
          0  messages sent
          0  messages received
          0  signals received
          2  voluntary context switches
       1162  involuntary context switches


More information about the freebsd-hackers mailing list