RELENG_4 -> 5 -> 6: significant performance regression

Dmitry Pryanishnikov dmitry at atlantis.dp.ua
Fri May 12 13:26:13 PDT 2006


Hello!

On Fri, 28 Apr 2006, Kris Kennaway wrote:
>>>> makeoptions	CONF_CFLAGS=-fno-builtin
> I don't know, it needs to be tested in your particular case.

  I've built another kernel, adding back

makeoptions  CONF_CFLAGS=-fno-builtin
options      QUOTA

The results are almost the same as without these two options. So the
following difference in overhead:

>>>>                 %Sys   %Intr   %Idl
>>>> RELENG_6 + rl0      45      40     15
>>>> RELENG_6 + fxp0     45      35     20
>
>>                   %Sys   %Intr   %Idl  "time md5 -t" wall clock time
>> RELENG_6 + rl0      34      24     42   1:43
>> RELENG_6 + fxp0     30      20     50   1:40

is caused by just these:

options 	INVARIANTS
options 	INVARIANT_SUPPORT
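
For reference, the arithmetic behind that comparison can be sketched in a few
lines of Python. This is only an illustration: the percentages are copied from
the two systat tables quoted above, "overhead" is assumed to mean %Sys + %Intr,
and the names (samples, busy, invariants_cost) are made up for this sketch.

```python
# CPU percentages copied from the systat tables quoted above.
# Keys are (kernel configuration, NIC); values are (%Sys, %Intr, %Idl).
samples = {
    ("with INVARIANTS", "rl0"): (45, 40, 15),
    ("with INVARIANTS", "fxp0"): (45, 35, 20),
    ("without INVARIANTS", "rl0"): (34, 24, 42),
    ("without INVARIANTS", "fxp0"): (30, 20, 50),
}


def busy(sys_pct, intr_pct, idle_pct):
    """Kernel overhead, assumed here to be %Sys + %Intr (idle is ignored)."""
    return sys_pct + intr_pct


def invariants_cost(nic):
    """Percentage points of CPU that disappear when the options are removed."""
    with_opts = busy(*samples[("with INVARIANTS", nic)])
    without_opts = busy(*samples[("without INVARIANTS", nic)])
    return with_opts - without_opts


for nic in ("rl0", "fxp0"):
    print(nic, invariants_cost(nic))
```

On these figures the two options account for 27 percentage points of CPU with
rl0 and 30 with fxp0.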

>> (I'll try to find out which one of these takes which % of overhead when I
>> get free time), but still much worse than under RELENG_4, where this
>> particular (I'd say "quite common") usage pattern takes 24-28% of CPU time,
>> while under RELENG_5 / 6 it takes >= 50% ;(
>
> Thanks.  Silly question: the data transfer rate is the same on both
> 4.x and 6.x, right?  i.e. the data transfer itself takes the same
> time?

  Yes. I'm transferring a large file (an ISO image) from another (much faster,
lightly loaded) machine over a 10 Mbit/s Ethernet link, so the transfer itself
is limited only by the wire speed (the actual transfer rate is very close to
1000 KBytes/sec according to both the ftp client and the 'systat -vm 1' disk
transfer rate in every measurement).

> The next step is for you to run some profiling tests to see
> where the kernel is spending time, e.g. with hwpmc.

  I have to familiarize myself with this feature first, since it is new to
me. Also, hwpmc doesn't exist in RELENG_4, so it will be impossible to compare
its results with RELENG_4. That's a pity, because my tests clearly show that
the main loss of performance (growth of overhead) occurred during the
RELENG_4 -> 5 transition. And last but not least: my test system (a Transcend
TS-ABX31A motherboard based on the Intel BX chipset) does not provide an APIC.
Will hwpmc be useful in this situation?

> Also, when you are trying to quantify performance differences, you
> need to run many copies of the test (at least 10) under identical
> conditions to account for possible variations.  The ministat tool
> (/usr/src/tools/tools/ministat) is good for performing statistically
> meaningful comparisons of data sets when you have them.

  As my transfer takes a long time (say 10 minutes), I've observed the CPU
time percentages many times during the transfer. They don't vary by more than
+/- 2-3% during the main transfer phase (when the transfer speed is stable).
My "time md5 -t" runs were used only as confirmation that systat's numbers
are trustworthy: they simply confirm that there are _far_ fewer CPU cycles
available to applications under RELENG_5/6 than under RELENG_4 (under an
identical load pattern). I ran "time md5 -t" several (3-5) times just to
confirm my assumptions, and the results didn't vary by more than 3%. So I
suppose that ministat isn't necessary for my tests.
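
A spread check of this kind can be sketched in Python as below. It is only a
rough illustration, not a substitute for ministat's statistical comparison. The
run times in the example are HYPOTHETICAL placeholders, not measurements from
these tests, and the function names are made up for the sketch. The "M:SS"
format follows the wall clock column in the table quoted earlier.

```python
from statistics import mean


def parse_wall_clock(text):
    """Convert an 'M:SS' wall clock string such as '1:43' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def relative_spread(values):
    """Return (max - min) / mean as a fraction of the mean."""
    return (max(values) - min(values)) / mean(values)


# HYPOTHETICAL run times, made up purely to exercise the functions.
runs = ["1:40", "1:41", "1:42"]
seconds = [parse_wall_clock(r) for r in runs]

spread = relative_spread(seconds)
print(f"spread: {spread:.1%}, within 3%: {spread <= 0.03}")
```

With real timings in place of the placeholders, a spread well under the
difference being measured supports reading the numbers directly.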

> Kris

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail:  dmitry at atlantis.dp.ua
nic-hdl: LYNX-RIPE

