Performance issue

Ewan Todd ewan at mathcode.net
Mon May 9 08:00:05 PDT 2005


Hi All,

I have what I think is a serious performance issue with FreeBSD
5.3-RELEASE.  I've read about threading issues, and it seems to me
that that is what I'm looking at, but I'm not confident enough to rule
out a hardware issue, a kernel configuration issue, or something to do
with the python port.  I'd appreciate it if someone would point out
anything obvious I'm overlooking.  Otherwise, if it is the problem I
think it is, then there seems to be far too little acknowledgement of
a major issue.

Here's the background.  I just got a new (to me) AMD machine and put
5.3 release on it.  I'd been very happy with the way my old Intel
machine had been performing with 4.10 stable, and I decided to run a
simple performance diagnostic on both machines, to wow myself with the
amazing performance of the new hardware / kernel combination.
However, the result was pretty disappointing.

Here are what I think are the pertinent dmesg details.

Old rig:

  FreeBSD 4.10-RELEASE #0: Thu Jul  1 22:47:08 EDT 2004
  Timecounter "i8254"  frequency 1193182 Hz
  Timecounter "TSC"  frequency 449235058 Hz
  CPU: Pentium III/Pentium III Xeon/Celeron (449.24-MHz 686-class CPU)

New rig:

  FreeBSD 5.3-RELEASE #0: Fri Nov  5 04:19:18 UTC 2004
  Timecounter "i8254" frequency 1193182 Hz quality 0
  CPU: AMD Athlon(tm) Processor (995.77-MHz 686-class CPU)
  Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
  Timecounter "TSC" frequency 995767383 Hz quality 800
  Timecounters tick every 10.000 msec

The diagnostic I selected was a Python program that generates 1 million
pseudo-random numbers and then heap sorts them.  That code is included
at the foot of this email.  I named the file "heapsort.py".  I ran it
on both machines, using the "time" utility in /usr/bin/ (not the tcsh
builtin "time").  So the command line was

  /usr/bin/time -al -o heapsort.data ./heapsort.py 1000000
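
For anyone who wants the same figures without parsing the time(1)
report, here is a minimal Python 3 sketch (not part of my original
setup) that runs a command and reads the child's rusage through the
standard `resource` module.  It assumes a Unix system, and
`run_with_rusage` is a hypothetical helper name.  The `ru_nivcsw`
field is the counter that `/usr/bin/time -l` prints as "involuntary
context switches".

```python
import resource
import subprocess
import time


def run_with_rusage(cmd):
    """Run cmd to completion and return the wall-clock time, user/sys CPU
    seconds and context-switch counts it used.  CPU and switch figures are
    the difference in RUSAGE_CHILDREN before and after the child is reaped."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # waits for (reaps) the child
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
        "voluntary_csw": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_csw": after.ru_nivcsw - before.ru_nivcsw,
    }
```

Usage, under the same assumptions: `run_with_rusage(["./heapsort.py",
"1000000"])`.  RUSAGE_CHILDREN only accumulates children that have
been waited for, so the before/after difference isolates this one
command as long as nothing else in the calling process reaps children
at the same time.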

A typical result for the old rig was

      130.78 real       129.86 user         0.11 sys
     22344  maximum resident set size
       608  average shared memory size
     20528  average unshared data size
       128  average unshared stack size
      5360  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
      2386  involuntary context switches

Whereas, the typical result for the new rig looked more like

      105.36 real        71.10 user        33.41 sys
     23376  maximum resident set size
       659  average shared memory size
     20796  average unshared data size
       127  average unshared stack size
      5402  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         0  voluntary context switches
     10548  involuntary context switches
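
To compare the two reports field by field, a small Python 3 sketch can
parse the layout above and print new/old ratios.  It assumes the first
line holds the real/user/sys times and every later line is
"<count>  <description>" exactly as printed here.  `parse_time_l` and
`ratios` are hypothetical names, and OLD/NEW are trimmed excerpts of
the two reports.

```python
import re


def parse_time_l(report):
    """Parse output in the layout of '/usr/bin/time -l' shown above
    into a dict mapping each field name to its value."""
    fields = {}
    first, *rest = report.strip().splitlines()
    # First line looks like: "130.78 real   129.86 user   0.11 sys"
    for value, name in re.findall(r"([\d.]+)\s+(real|user|sys)", first):
        fields[name] = float(value)
    # Every other line is "<count>  <description>"
    for line in rest:
        count, name = line.split(None, 1)
        fields[name.strip()] = int(count)
    return fields


def ratios(old, new):
    """Return new/old for every field in both reports with a nonzero old value."""
    return {k: new[k] / old[k] for k in old if k in new and old[k]}


OLD = """
    130.78 real       129.86 user         0.11 sys
     22344  maximum resident set size
      5360  page reclaims
      2386  involuntary context switches
"""
NEW = """
    105.36 real        71.10 user        33.41 sys
     23376  maximum resident set size
      5402  page reclaims
     10548  involuntary context switches
"""

# Print fields from smallest to largest change (new/old).
for field, r in sorted(ratios(parse_time_l(OLD), parse_time_l(NEW)).items(),
                       key=lambda kv: kv[1]):
    print(f"{field:>30}: new/old = {r:8.2f}")
```

On these excerpts the sys time changes by far the most, followed by
the involuntary context switches; the memory and page-reclaim figures
barely move.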

You'll notice that the new rig is indeed a little faster (times in
seconds): 105.36 real (new rig) compared with 130.78 real (old rig).

However, the new rig spends 33.41 seconds of system (kernel) time,
compared with just 0.11 seconds on the old rig.  Apart from the times
themselves, the only significant difference in the rusage stats is the
"involuntary context switches" field, where the old rig has 2386 and
the new rig has a whopping 10548.  Further, I noticed that the number
of context switches on the new rig is almost exactly one per 10 msec
of real time, that is, one per timecounter tick.  (I saw the same
pattern when comparing heapsort.py runs with arguments other than
1000000.)
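
The per-tick observation can be checked with a few lines of
arithmetic.  This sketch takes the real times and involuntary
context-switch counts from the two reports, and assumes the same 10 ms
tick on the old rig, whose dmesg above does not show the tick interval.

```python
TICK = 0.010  # seconds: "Timecounters tick every 10.000 msec" (5.3 dmesg)


def switches_per_tick(real, nivcsw, tick=TICK):
    """Involuntary context switches per clock tick of wall-clock time."""
    return nivcsw / (real / tick)


# (real seconds, involuntary context switches) from the reports above.
# ASSUMPTION: the old rig also ticks every 10 ms; its dmesg does not say.
runs = {
    "old rig (4.10)": (130.78, 2386),
    "new rig (5.3)": (105.36, 10548),
}

for name, (real, nivcsw) in runs.items():
    print(f"{name}: {real / TICK:.0f} ticks, {nivcsw} switches, "
          f"{switches_per_tick(real, nivcsw):.3f} per tick, "
          f"one every {1000 * real / nivcsw:.1f} ms")
```

On these figures the new rig comes out at about 1.001 switches per
tick (one every 10.0 ms), while the old rig manages roughly one every
55 ms.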

I think the new rig ought to execute this task in about 70 seconds:
just over the amount of user time.  Assuming that I'm not overlooking
something obvious, and that I'm not interpreting a feature as a bug, 
this business with the context switches strikes me as a bit of a
show-stopper.  If that's right, it appears to be severely underplayed
in the release documentation.
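
The 70-second estimate follows from the time accounting.  In both runs
user + sys covers more than 99% of real time, so the process is
CPU-bound and the kernel time sits directly on the critical path.  A
sketch of the arithmetic, assuming (as above) that the sys time is pure
overhead that would vanish entirely:

```python
runs = {
    "old rig (4.10)": {"real": 130.78, "user": 129.86, "sys": 0.11},
    "new rig (5.3)": {"real": 105.36, "user": 71.10, "sys": 33.41},
}


def accounting(r):
    """Return (share of real time that is CPU time,
               share of CPU time spent in the kernel,
               real time with the sys time removed)."""
    cpu = r["user"] + r["sys"]
    return cpu / r["real"], r["sys"] / cpu, r["real"] - r["sys"]


for name, r in runs.items():
    cpu_share, sys_share, without_sys = accounting(r)
    print(f"{name}: cpu/real = {cpu_share:.3f}, sys/cpu = {sys_share:.3f}, "
          f"real - sys = {without_sys:.2f} s")
```

For the new rig that leaves about 72 seconds, in line with the
"about 70 seconds" estimate.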

I'll be happy if someone would kindly explain to me what's going on
here.  I'll be even happier to hear of a fix or workaround to remedy
the situation.

Thanks in advance,

-e




heapsort.py:

#!/usr/local/bin/python -O
# $Id: heapsort-python-3.code,v 1.3 2005/04/04 14:56:45 bfulgham Exp $
#
# The Great Computer Language Shootout
# http://shootout.alioth.debian.org/
#
# Updated by Valentino Volonghi for Python 2.4
# Reworked by Kevin Carson to produce correct results and same intent

import sys

IM = 139968
IA =   3877
IC =  29573

LAST = 42
def gen_random(max) :
    global LAST
    LAST = (LAST * IA + IC) % IM
    return( (max * LAST) / IM )

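def heapsort(n, ra) :
    # The loop below assumes at least two elements; with n < 2 it walks
    # ir below 1 and raises IndexError (e.g. when run with no argument).
    if n < 2 :
        return
    ir = n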
    l = (n >> 1) + 1

    while True :
        if l > 1 :
            l -= 1
            rra = ra[l]
        else :
            rra = ra[ir]
            ra[ir] = ra[1]
            ir -= 1
            if ir == 1 :
                ra[1] = rra
                return

        i = l
        j = l << 1
        while j <= ir :
            if (j < ir) and (ra[j] < ra[j + 1]) :
                j += 1

            if rra < ra[j] :
                ra[i] = ra[j]
                i = j
                j += j
            else :
                j = ir + 1
        ra[i] = rra

def main() :
    if len(sys.argv) == 2 :
        N = int(sys.argv[1])
    else :
        N = 1

    ary = [None]*(N + 1)
    for i in xrange(1, N + 1) :
        ary[i] = gen_random(1.0)

    heapsort(N, ary)

    print "%.10f" % ary[N]

main()
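
heapsort.py as posted is Python 2 code (the print statement, xrange)
and will not run under Python 3.  For anyone repeating the measurement
on a current interpreter, here is a minimal port, assuming only the
language changes are wanted: the algorithm and the random-number
sequence are unchanged, the global LAST becomes a closure returned by
a hypothetical make_random helper, and heapsort returns early for
n < 2, since its loop assumes at least two elements.  Timings from a
modern interpreter should not be compared directly with the figures
above.

```python
import sys

IM = 139968
IA = 3877
IC = 29573


def make_random(seed=42):
    """Return gen_random(max_value) using the same linear congruential
    generator as heapsort.py, with its state held in a closure."""
    last = seed

    def gen_random(max_value):
        nonlocal last
        last = (last * IA + IC) % IM
        return max_value * last / IM

    return gen_random


def heapsort(n, ra):
    """Sort ra[1..n] in place, ascending; ra[0] is unused."""
    if n < 2:
        return
    ir = n
    l = (n >> 1) + 1
    while True:
        if l > 1:
            # Heap-building phase: take the next element to sift down.
            l -= 1
            rra = ra[l]
        else:
            # Selection phase: move the current maximum to the end.
            rra = ra[ir]
            ra[ir] = ra[1]
            ir -= 1
            if ir == 1:
                ra[1] = rra
                return
        # Sift rra down from position l.
        i = l
        j = l << 1
        while j <= ir:
            if j < ir and ra[j] < ra[j + 1]:
                j += 1
            if rra < ra[j]:
                ra[i] = ra[j]
                i = j
                j += j
            else:
                j = ir + 1
        ra[i] = rra


def main(argv):
    n = int(argv[1]) if len(argv) == 2 else 1
    gen_random = make_random()
    ary = [None] + [gen_random(1.0) for _ in range(n)]
    heapsort(n, ary)
    print("%.10f" % ary[n])


if __name__ == "__main__":
    main(sys.argv)
```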



More information about the freebsd-stable mailing list