Performance issue
Ewan Todd
ewan at mathcode.net
Mon May 9 08:00:05 PDT 2005
Hi All,
I have what I think is a serious performance issue with fbsd 5.3
release. I've read about threading issues, and it seems to me that
that is what I'm looking at, but I'm not confident enough to rule out
that it might be a hardware issue, a kernel configuration issue, or
something to do with the python port. I'd appreciate it if someone
would it point out if I'm overlooking something obvious. Otherwise,
if it is the problem I think it is, then there seems entirely too
little acknowledgement of a major issue.
Here's the background. I just got a new (to me) AMD machine and put
5.3 release on it. I'd been very happy with the way my old Intel
machine had been performing with 4.10 stable, and I decided to run a
simple performance diagnostic on both machines, to wow myself with the
amazing performance of the new hardware / kernel combination.
However, the result was pretty disappointing.
Here are what I think are the pertinent dmesg details.
Old rig:
FreeBSD 4.10-RELEASE #0: Thu Jul 1 22:47:08 EDT 2004
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 449235058 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (449.24-MHz 686-class CPU)
New rig:
FreeBSD 5.3-RELEASE #0: Fri Nov 5 04:19:18 UTC 2004
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) Processor (995.77-MHz 686-class CPU)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Timecounter "TSC" frequency 995767383 Hz quality 800
Timecounters tick every 10.000 msec
The diagnostic I selected was a python program to generate 1 million
pseudo-random numbers and then to perform a heap sort on them. That
code is included at the foot of this email. I named the file
"heapsort.py". I ran it on both machines, using the "time" utility in
/usr/bin/ (not the builtin tcsh "time"). So the command line was
/usr/bin/time -al -o heapsort.data ./heapsort.py 1000000
A typical result for the old rig was
130.78 real 129.86 user 0.11 sys
22344 maximum resident set size
608 average shared memory size
20528 average unshared data size
128 average unshared stack size
5360 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
2386 involuntary context switches
Whereas, the typical result for the new rig looked more like
105.36 real 71.10 user 33.41 sys
23376 maximum resident set size
659 average shared memory size
20796 average unshared data size
127 average unshared stack size
5402 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
10548 involuntary context switches
You'll notice that the new rig is indeed a little faster (times in
seconds): 105.36 real (new rig) compared with 130.78 real (old rig).
However, the new rig spends about 33.41 seconds on system overhead
compared with just 0.11 seconds on the old rig. Comparing the rusage
stats, the only significant difference is the "involuntary context
switches" field, where the old rig has 2386 and the new rig has a
whopping 10548. Further, I noticed that the number of context
switches on the new rig seems to be more or less exactly one per 10
msec of real time, that is, one per timecounter tick. (I saw this
when comparing heapsort.py runs with arguments other than 1000000.)
I think the new rig ought to execute this task in about 70 seconds:
just over the amount of user time. Assuming that I'm not overlooking
something obvious, and that I'm not interpreting a feature as a bug,
this business with the context switches strikes me as a bit of a
show-stopper. If that's right, it appears to be severely underplayed
in the release documentation.
I'll be happy if someone would kindly explain to me what's going on
here. I'll be even happier to hear of a fix or workaround to remedy
the situation.
Thanks in advance,
-e
heapsort.py:
#!/usr/local/bin/python -O
# $Id: heapsort-python-3.code,v 1.3 2005/04/04 14:56:45 bfulgham Exp $
#
# The Great Computer Language Shootout
# http://shootout.alioth.debian.org/
#
# Updated by Valentino Volonghi for Python 2.4
# Reworked by Kevin Carson to produce correct results and same intent
import sys
IM = 139968
IA = 3877
IC = 29573
LAST = 42
def gen_random(max) :
global LAST
LAST = (LAST * IA + IC) % IM
return( (max * LAST) / IM )
def heapsort(n, ra) :
ir = n
l = (n >> 1) + 1
while True :
if l > 1 :
l -= 1
rra = ra[l]
else :
rra = ra[ir]
ra[ir] = ra[1]
ir -= 1
if ir == 1 :
ra[1] = rra
return
i = l
j = l << 1
while j <= ir :
if (j < ir) and (ra[j] < ra[j + 1]) :
j += 1
if rra < ra[j] :
ra[i] = ra[j]
i = j
j += j
else :
j = ir + 1;
ra[i] = rra;
def main() :
if len(sys.argv) == 2 :
N = int(sys.argv[1])
else :
N = 1
ary = [None]*(N + 1)
for i in xrange(1, N + 1) :
ary[i] = gen_random(1.0)
heapsort(N, ary)
print "%.10f" % ary[N]
main()
More information about the freebsd-stable
mailing list