Odd news transit performance problem

Ollie Cook ollie at uk.clara.net
Tue Dec 21 04:58:30 PST 2004


On Mon, Dec 20, 2004 at 06:24:49PM +0000, Dave Williams wrote:
> Various reporting tools (systat, vmstat, top, etc) report that the
> box is idle - there's no significant contention for memory, disk,
> network, etc. that we can see and actually bouncing the box seems
> to bring performance back up to speed again for a period - restarting
> innd doesn't have the same effect.

I'd like to flesh out this thread with some more detail. I've put my
interpretation of the detail in too, but if I'm not correct, clarification
would be appreciated!

The host is built as follows:

  - Intel Xeon 2.8GHz
  - 3GB RAM
  - 1x fxp NIC <Intel 82551 Pro/100 Ethernet>
  - 2x em NICs <Intel(R) PRO/1000 Network Connection, Version - 1.7.16>
  - 2x sym SCSI controllers <1010-66>
  - 14x 18GB U160 SCSI disks <IC35L018UCD210-0> (not quite split equally across
    the two SCSI controllers)
  - vinum is used to stripe volumes across multiple spindles. In particular
    the history database is striped over 10 devices.

The host is fed news from only two sources which on average equates to 25
streams. The total volume this host is handling per day is of the order of
1.4TB inbound. At present it's handling ~50 articles per second (average
article size is 350KB). News is then fed out to a number of other hosts which
equates to a further 40 streams.
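As a quick back-of-the-envelope check (using the averages above, and assuming 1KB = 1024 bytes throughout), the quoted rates are consistent with one another:

```python
# ~50 articles/s at ~350KB each should roughly match ~1.4TB/day inbound.
articles_per_sec = 50
avg_article_bytes = 350 * 1024              # 350KB average article
bytes_per_sec = articles_per_sec * avg_article_bytes
mb_per_sec = bytes_per_sec / (1024 ** 2)    # sustained MB/s
tb_per_day = bytes_per_sec * 86400 / (1024 ** 4)
print(f"{mb_per_sec:.1f} MB/s, {tb_per_day:.2f} TB/day")
```

That works out to roughly 17MB/s sustained, or about 1.4TB/day, so the figures hang together.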

We are seeing this host fail to keep up with all the news being offered to it,
and consequently the hosts feeding it are keeping a backlog of articles to
offer later. The feeding host that is backlogging sits behind the first em
interface.

'top' shows the CPU to be on average 25% idle, and that little swap is being
used:

CPU states: 11.3% user,  1.6% nice, 47.1% system, 15.2% interrupt, 24.9% idle
Mem: 438M Active, 1815M Inact, 658M Wired, 102M Cache, 199M Buf, 4348K Free
Swap: 4096M Total, 156K Used, 4096M Free
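Summing those categories accounts for essentially all of the fitted 3GB (my assumption here is that Buf overlaps with the other categories rather than being separate; I'm not certain how top accounts for it):

```python
# Rough memory accounting from the 'top' header above (values in MB,
# Free in KB). Assumption: Buf is counted within the other categories,
# so it is excluded from the sum.
active, inact, wired, cache = 438, 1815, 658, 102
free_kb = 4348
total_mb = active + inact + wired + cache + free_kb / 1024
print(f"accounted for: {total_mb:.0f} MB")
```

So there's no large chunk of memory unaccounted for.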

'vmstat 1' shows a number of pages being paged out per second, but few ever
being paged in. How does this tally with the fact that < 1MB of swap is in use?
I think my understanding of paging could be at fault here. :)

 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy  cs us sy id
 3 0 0  576472  88672 4854   0   0 139 9465   0   0   1 1474 10821 3655 18 48 33
 2 0 0  577492 148356 4796   1   0  94 9566 19789   0   0 1675 12267 4471 17 53 29
 1 6 0  576088 129444 5655   1   0 139 10133   0   0   1 1511 11115 4033 20 51 29
 1 6 0  576172 106752 5110   0   0 146 10618   0   0   0 1462 12521 4411 24 50 26
 2 0 0  577168 167000 4753   0   1 142 8541 19797   0   4 1664 11800 4307 20 52 28
 3 0 0  580476 143228 6905   2   1  93 11867   0   0   1 1426 10999 3798 19 51 30
 2 6 0  579984 126212 4678   0   0 209 8677   0   0   0 1492 14254 5125 15 44 41
 4 0 0  575016 112616 6526   0   0  94 11638   0   0   1 1736 11860 4002 21 44 35
 2 5 0  576932  93844 5016   0   0  93 8178   0   0   0 1477 8863 2898 16 45 39
 0 7 0  579076 164308 2546   1   7 3553 4244 19793  13   1 3921 10493 2388  9 43 47

Both 'systat -vmstat' and 'iostat' show the disks are not busy. They are
transferring approximately 2.0MB/s on average. 'systat -vmstat' indicates the
devices are <20% busy.

Indeed, writing from /dev/zero to a vinum volume striped over ten disks I can
achieve a further 5MB/s per disk over and above what the system is usually
generating. This still only pushes the disks to 50% busy.
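That tallies with the inbound rate. Sketching the expected per-spindle load (assuming the ~17MB/s inbound stream is written once and striped evenly over the 10 spindles carrying the history volume):

```python
# Expected per-spindle write load, assuming the inbound article stream
# (~50 art/s * ~350KB, from earlier in this mail) is striped evenly
# across 10 disks.
inbound_mb_s = 50 * 350 * 1024 / (1024 ** 2)
spindles = 10
per_disk_mb_s = inbound_mb_s / spindles
print(f"~{per_disk_mb_s:.1f} MB/s per spindle")
```

~1.7MB/s per spindle is close to the ~2.0MB/s actually observed, so the disks really do appear to have plenty of headroom.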

Network-wise the fxp interface does ~45mbit/s (majority outbound), em0 does
~20mbit/s (majority inbound) and em1 does ~150mbit/s (majority outbound).
Duplex settings are all correct and 'netstat' shows very few errors on host
interfaces:

Name  Mtu   Network       Address            Ipkts Ierrs    Opkts Oerrs  Coll
fxp0  1500  <Link#1>    00:e0:18:a4:d4:4d 96710493     0 103018264     2     0
em0   1500  <Link#2>    00:e0:18:a4:d4:4c 393932814     0 429947240     0     0
em1   1500  <Link#3>    00:07:e9:0f:9a:34 465099760     0 559720416     0     0
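To put those per-interface rates in context, 1.4TB/day corresponds to a sustained bit rate of roughly 140mbit/s (a quick conversion, assuming TB means 1024^4 bytes and mbit means 10^6 bits):

```python
# Convert the daily inbound volume to a sustained bit rate for
# comparison with the per-interface figures above.
bytes_per_day = 1.4 * 1024 ** 4
mbit_s = bytes_per_day * 8 / 86400 / 1e6
print(f"~{mbit_s:.0f} mbit/s sustained inbound")
```
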

There does not appear to be a shortage of mbufs:

# netstat -m
1307/5936/262144 mbufs in use (current/peak/max):
        1307 mbufs allocated to data
1301/4970/65536 mbuf clusters in use (current/peak/max)
11424 Kbytes allocated to network (5% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

There doesn't appear to be another bottleneck in the network subsystem because
an scp from one host to the other over em0 generates a further 30mbit/s of
traffic.

Receive queues are about 20k on each inbound news stream. Does this figure
indicate data that has not yet been transferred from the kernel to user
space?

The following sysctls have been set over the years:

net.inet.tcp.inflight_enable=1
vfs.vmiodirenable=1
vfs.lorunningspace=2097152
vfs.hirunningspace=4194304
kern.maxfiles=262144
kern.maxfilesperproc=32768
net.inet.tcp.rfc1323=1
net.inet.tcp.delayed_ack=0
net.inet.tcp.sendspace=131070
net.inet.tcp.recvspace=131070
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=57344
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535
kern.polling.enable=1

as well as the following kernel configuration options:

options         NMBCLUSTERS=65536
options         MAXDSIZ="(1024*1024*1024)"
options         MAXSSIZ="(256*1024*1024)"
options         DFLDSIZ="(256*1024*1024)"
options         DEVICE_POLLING
options         HZ=1000

Given that the CPU is not wedged at 100%, that there is free memory, that the
disks have plenty of bandwidth left, and likewise the network interfaces, I'm
convinced that this host ought to be keeping up and not causing other hosts to
keep a backlog for it.

Does anyone have any suggestions for where else we might look to see why this
host doesn't appear to be performing as well as one might expect it to?

Yours,

Ollie

-- 
Ollie Cook         Systems Architect, Claranet UK
ollie at uk.clara.net               +44 20 7685 8065


More information about the freebsd-performance mailing list