mysql performance on 4 * dualcore opteron

Sven Petai hadara at bsd.ee
Tue Apr 4 16:43:51 UTC 2006


hi

Before I begin, let me just say that I'm probably aware of most of the
threads about mysql performance on the various fbsd lists over the last
couple of years, so please let's not concentrate on the usual points made
over and over again, like how filesystems are mounted under linux, how fast
time() is, or how various combinations of scheduler/threading
library/compiler flags give you ~5-10% better performance. It's very
unlikely that any of these reasons, or even all of them together, can
explain performance differences of 2-3x.

so now a little bit of background...
I usually use the MySQL benchmark called super-smack as one of the benchmarks
on all new machines to get a general feeling for the server's performance.
I certainly agree that the default smack workloads are far too simple to say
much about actual production performance, but still... better than nothing...

In general a 2.4GHz amd64 UP box (6.1 betaX) can do about
17400 q/s with select-smack+4bsd+thr combination and
4300 q/s with update-smack+4bsd+thr

on a dualcore 2GHz opteron (6.1 prerelease) the results are:
20000 q/s with select-smack+4bsd+thr and
4500 q/s with update-smack+4bsd+thr

performance for update-smack always seems to be 4XXX q/s, no matter how many
CPUs the box has or what kind of RAID controller/disks are used (I have
tested on about 8 rather different machines). I have no idea whether IO on
all the servers I have tried really maxes out at this point or whether there
is some bottleneck in UFS.
select-smack performance gains on dualcore are not quite as good as one might
expect, but then again that dualcore box uses ECC memory, which is probably
somewhat slower because of the checksum calculations, and synchronisation has
some overhead too...
Anyway, all in all I'm more or less happy with these results, even though
linux will do about twice as many selects on the same hardware.

Today I had a chance to test a 4 * 2GHz dualcore opteron machine, so this
machine has 8 cores in total and 8G of RAM.

Now, on that server I get:
11000 q/s for select-smack+4bsd+thr combination (with KSE it's around 6000 
q/s, ule+thr gives somewhere around 12000 q/s)
4100 q/s for update-smack+4bsd+thr

So the 8 core machine did about 1.6x worse on select than the UP server
(11000 vs 17400 q/s).

After some tinkering I found out that renicing mysqld to -5 makes it push
out 21000 q/s (4bsd, thr), so I suspect part of the problem is in the
scheduling - probably super-smack with its 100 processes otherwise gets a lot
more CPU time than mysql with its 100 threads servicing them.
But even this result is still only about equal to what I get from the
dualcore machine.
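
For anyone who wants to repeat the renice step from a script, here is a
minimal sketch in Python. It is not what I actually ran, just an
illustration of the same call; the helper name renice() and the placeholder
mysqld_pid are made up for this example, and setting a negative nice value
needs root.

```python
import os


def renice(pid: int, nice: int) -> int:
    """Set the nice value of process `pid` and return the value now in effect.

    Hypothetical helper for illustration; pid 0 means the calling process.
    """
    os.setpriority(os.PRIO_PROCESS, pid, nice)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # Harmless demo: re-apply the current nice value of this process.
    current = os.getpriority(os.PRIO_PROCESS, 0)
    print(renice(0, current))
    # For the experiment described above one would instead call, as root,
    # renice(mysqld_pid, -5), where mysqld_pid is the server's PID
    # (a placeholder here, not looked up by this sketch).
```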

As I ran out of good (macro)tuning ideas at this point, and wanted to make 
sure higher scores are indeed achievable, I tried Linux on the same hardware.
Here are the results for the same tests on Suse enterprise linux 9
(2.6.5-7.97-smp):
76857 q/s for select-smack
10050 q/s for update-smack
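
To put the select-smack numbers quoted in this post side by side, here is a
small Python sketch that expresses each result relative to the UP box. The
q/s figures are the ones given above; the labels and the helper name
relative_to() are just made up for this illustration.

```python
# select-smack throughput in queries per second, as quoted in this post.
select_qps = {
    "UP 2.4GHz amd64, FreeBSD": 17400,
    "1 x dualcore opteron, FreeBSD": 20000,
    "4 x dualcore opteron, FreeBSD": 11000,
    "4 x dualcore opteron, FreeBSD, mysqld at nice -5": 21000,
    "4 x dualcore opteron, Suse 9": 76857,
}


def relative_to(baseline: str, results: dict) -> dict:
    """Return each result as a ratio of the baseline result, rounded to 2 places."""
    base = results[baseline]
    return {name: round(qps / base, 2) for name, qps in results.items()}


ratios = relative_to("UP 2.4GHz amd64, FreeBSD", select_qps)
for name, ratio in ratios.items():
    print(f"{ratio:5.2f}x  {name}")
```

On these figures the 8 core box under FreeBSD lands at 0.63x of the UP box
(1.21x when reniced), while the same hardware under Linux is at 4.42x.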

the mysql configuration was identical to the one I used under freebsd
(my-huge).
This Suse uses ReiserFS, but I have no idea what kind of FS guarantees it
provides; I didn't see any sync/async stuff in the mount output.
I also repeated the tests on an identical box that had Fedora installed
(2.6.9-22-ELsmp) and used ext3.
select-smack results were, as expected, almost the same since it doesn't
touch the FS; update was about 8000 q/s.

I'm relatively sure that performance differences this huge can't be
explained by the mere speed difference of time(). I haven't yet tested phk's
and Robert's timer hacks, but at some point I rewrote mysql's timing code
to completely avoid any calls to time() by keeping an internal timestamp that
was updated from the TSC register value. It was certainly very ugly and
imprecise, but worked well enough, since mysql uses these code paths mainly
for statistics and for setting various safeguard timeouts.
Even with ~90% of the time() calls removed, performance still didn't get
measurably better.
Of course it's possible that I fucked up somehow, so if someone has tested
Robert's and phk's changes it would certainly be nice to hear about your
results.
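
To illustrate the idea behind that timestamp hack (this is not the actual
patch, which was C inside mysqld), here is a rough Python sketch: take one
real wall-clock reading at startup and afterwards derive the time from a
cheap tick counter. time.perf_counter_ns stands in for the TSC here, and the
class name CoarseClock and its parameters are made up for the example.

```python
import time


class CoarseClock:
    """Approximate time() from one wall-clock reading plus a cheap tick counter."""

    def __init__(self, read_ticks=time.perf_counter_ns,
                 ticks_per_second=1_000_000_000):
        # read_ticks is a stand-in for reading the TSC; ticks_per_second
        # plays the role of the (assumed known and constant) TSC frequency.
        self._read_ticks = read_ticks
        self._ticks_per_second = ticks_per_second
        self._base_wall = int(time.time())  # the only real time() call
        self._base_ticks = read_ticks()

    def now(self) -> int:
        """Return whole seconds since the epoch, without calling time()."""
        elapsed = self._read_ticks() - self._base_ticks
        return self._base_wall + elapsed // self._ticks_per_second


if __name__ == "__main__":
    clock = CoarseClock()
    print(clock.now())
```

The result drifts if the counter frequency is wrong or changes, which is the
kind of imprecision mentioned above; that is tolerable for statistics and
coarse timeouts but not for anything that needs exact time.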

To make a long story short - does anyone have any good ideas about where
the bottleneck might be and how to debug it?

PS
Here's some system/test information:
super-smack was used with a concurrency of 100 and reqs. set to 10000;
it was running on the same machine as mysqld and connections were made
over a local socket.

timer: acpi-fast in all the cases
mysql: 4.1.18_2 from ports, table type is myisam
mysql configuration file:
http://bsd.ee/~hadara/debug/mysql3/2way/my.cnf
in general it's just my-huge.cnf from mysql distribution, with increased 
max_connections

kernel config is GENERIC-SMP (no, it doesn't have WITNESS enabled)
== 4 * dualcore opteron ==:
vmstat 1, during select-smack test:
http://bsd.ee/~hadara/debug/mysql3/8way/vmstat.txt
dmesg:
http://bsd.ee/~hadara/debug/mysql3/8way/dmesg.boot
sysctl -a:
http://bsd.ee/~hadara/debug/mysql3/8way/sysctl.txt

== 1 * dualcore opteron ==:
vmstat 1, during select-smack test:
http://bsd.ee/~hadara/debug/mysql3/2way/vmstat.txt
dmesg:
http://bsd.ee/~hadara/debug/mysql3/2way/dmesg.boot
sysctl -a:
http://bsd.ee/~hadara/debug/mysql3/2way/sysctl.txt


More information about the freebsd-performance mailing list