6.1-RELEASE, em0 high interrupt rate and nfsd eats lots of cpu
Rong-en Fan
grafan at gmail.com
Mon May 15 11:15:21 PDT 2006
Hi,
After upgrading one NFS server from 5.5-PRERELEASE to 6.1-RELEASE today,
I noticed that the load is very high, ranging from 4.x to 30.x depending
on how many nfsd processes I run. From the MRTG traffic graph I did not
see any unusually high traffic. This box has 2 physical Xeon CPUs with
HTT enabled. Some screen snapshots:
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
      4593     0    5431035       2122     0    1463331     0
      2224     0    2500421       1459     0    1310224     0
      1929     0    2210035       1252     0    1165426     0
      2381     0    2782648       1724     0    1795611     0
      1975     0    2340899       1314     0    1342320     0
      2114     0    2537347       1254     0    1195396     0
      2050     0    2465473        890     0     611592     0
      1482     0    1660772        985     0     898894     0
      2002     0    2179834       1900     0    2092382     0
      1912     0    2202576       1598     0    1743046     0
      2436     0    3051876       1368     0    1345762     0
      2759     0    2977552       1346     0     730580     0
systat -vmstat 1 (excerpt):

    3 users    Load 19.80 14.39 11.08                May 16 02:12

    68.7% Sys   29.9% Intr   1.4% User   0.0% Nice   0.0% Idle

    Interrupts          61756 total
                           56 52: mpt
                              53: mpt
                              38: ips
                               1: atkb
                          169  4: sio0
                        49713 16: em0
                         3199 cpu0: time
                         2965 cpu3: time
                         2717 cpu1: time
                         2937 cpu2: time

    Mem (KB):  173928 wire   153000 act   455576 inact
                29920 cache  207552 free  113872 buf   698 dirtybuf

    Vnodes:    100000 desiredvnodes   25368 numvnodes   17281 freevnodes
    Namei:     342 calls, 333 name-cache hits (97%)

    Disks        ipsd0    da0    da1  pass0  pass1
    KB/t          4.00  70.83  22.67   0.00   0.00
    tps              1     58     10      0      0
    MB/s          0.00   4.03   0.21   0.00   0.00
    % busy           0     46     18      0      0
vmstat -i:
interrupt                          total       rate
irq52: mpt0                       586784         48
irq53: mpt1                           12          0
irq38: ips0                        74926          6
irq1: atkbd0                           2          0
irq4: sio0                         20363          1
irq16: em0                     100321381       8348
cpu0: timer                     23813454       1981
cpu3: timer                     22903961       1906
cpu1: timer                     21907744       1823
cpu2: timer                     22886458       1904
Total                          192515085      16021
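
To compare that 8348/sec average against the actual packet rate, here is a
rough 10-second delta check I can run. This is just a sketch using the
base-system vmstat(8) and netstat(1); the awk column numbers assume the
usual `netstat -ibn` layout (Ipkts in column 5, Opkts in column 8):

#!/bin/sh
# Rough em0 interrupts/sec vs packets/sec over a 10-second window.
# Column numbers for Ipkts/Opkts assume the usual `netstat -ibn` layout.
i1=$(vmstat -i | awk '/em0/ {print $3}')
p1=$(netstat -ibn -I em0 | awk 'NR==2 {print $5 + $8}')
sleep 10
i2=$(vmstat -i | awk '/em0/ {print $3}')
p2=$(netstat -ibn -I em0 | awk 'NR==2 {print $5 + $8}')
echo "em0 interrupts/s: $(( (i2 - i1) / 10 ))"
echo "em0 packets/s:    $(( (p2 - p1) / 10 ))"

If the netstat lines above are one-second samples, that is only a few
thousand packets/sec total against an average of 8348 em0 interrupts/sec,
i.e. roughly one interrupt per packet or more, so interrupt moderation
does not seem to be doing much.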
The high interrupt rate on em0 looks very suspicious; I even saw 30K~90K
interrupts/sec in systat -vmstat 1's output. As for top's output:
last pid: 21888; load averages: 25.52, 16.86, 12.22 up 0+03:30:42 02:13:06
143 processes: 29 running, 99 sleeping, 2 zombie, 13 waiting
CPU states: 0.5% user, 0.0% nice, 66.7% system, 32.8% interrupt, 0.0% idle
Mem: 152M Active, 566M Inact, 172M Wired, 29M Cache, 111M Buf, 78M Free
Swap: 2048M Total, 100K Used, 2048M
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
15 root 1 -32 -151 0K 8K CPU1 0 48:47 46.83% swi4: cloc
94182 root 1 4 0 1300K 720K RUN 1 11:17 39.31% nfsd
94183 root 1 -4 0 1300K 720K RUN 1 7:15 37.70% nfsd
94186 root 1 -4 0 1300K 720K RUN 0 3:35 30.81% nfsd
17 root 1 -44 -163 0K 8K WAIT 1 32:56 28.71% swi1: net
94185 root 1 -8 0 1300K 720K biowr 1 4:18 28.27% nfsd
94187 root 1 -4 0 1300K 720K RUN 1 3:16 26.42% nfsd
6 root 1 -8 0 0K 8K CPU3 0 18:57 26.03% g_down
94180 root 1 -4 0 1300K 720K RUN 2 4:58 24.85% nfsd
94184 root 1 4 0 1300K 720K RUN 1 2:59 24.76% nfsd
94188 root 1 -4 0 1300K 720K RUN 1 2:39 22.95% nfsd
31 root 1 -68 -187 0K 8K WAIT 0 10:48 20.41% irq16: em0
27 root 1 -64 -183 0K 8K WAIT 1 12:33 15.87% irq52: mpt
21 root 1 -40 -159 0K 8K CPU0 0 8:19 9.18% swi2: camb
40 root 1 -16 0 0K 8K sdflus 1 6:04 5.13% softdepflu
The wait channels of the nfsd processes are usually biord, biowr, ufs, RUN, CPUX, and -.
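
If it helps, I can also sample the nfsd wait channels and diff the
server-side RPC counters over a busy interval, roughly like this (a
sketch using the base-system ps(1) and nfsstat(1)):

#!/bin/sh
# Sample nfsd wait channels a few times, then diff the NFS server RPC
# counters over a 10-second window while the load is high.
for i in 1 2 3 4 5; do
    ps -ax -o pid,state,wchan,command | grep '[n]fsd'
    sleep 2
done
nfsstat -s > /tmp/nfsstat.before
sleep 10
nfsstat -s > /tmp/nfsstat.after
diff /tmp/nfsstat.before /tmp/nfsstat.after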
The kernel config is GENERIC with unneeded hardware removed, plus ipfw2,
FAST_IPSEC, and QUOTA (though I don't have any userquota or groupquota
entries in fstab). I also tuned some sysctls (where each one is set is
noted right after the list):
machdep.hyperthreading_allowed=1
vm.kmem_size_max=419430400
vm.kmem_size_scale=2
net.link.ether.inet.log_arp_wrong_iface=0
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.recvspace=65536
kern.ipc.somaxconn=4096
kern.maxfiles=65535
kern.ipc.shmmax=104857600
kern.ipc.shmall=25600
net.inet.ip.random_id=1
kern.maxvnodes=100000
vfs.read_max=16
kern.cam.da.retry_count=20
kern.cam.da.default_timeout=300
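
As noted above: the vm.kmem_size_* entries are boot-time tunables, so
they live in /boot/loader.conf; as far as I can tell the rest are
ordinary runtime sysctls set from /etc/sysctl.conf, roughly like this:

# /boot/loader.conf -- boot-time tunables
vm.kmem_size_max="419430400"
vm.kmem_size_scale="2"

# /etc/sysctl.conf -- runtime sysctls applied at boot
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
kern.maxvnodes=100000
# ...plus the remaining entries from the list above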
Is there anything I can provide to help nail this problem down?
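For example, I could attach something like the following; this assumes
the 6.1 em(4) driver exposes a dev.em.0 sysctl tree, otherwise that line
just prints nothing:

#!/bin/sh
# Extra data to attach: em(4) sysctls (if the driver exposes them),
# mbuf usage, and two interrupt snapshots 10 seconds apart.
sysctl dev.em.0 2>/dev/null
netstat -m
vmstat -i
sleep 10
vmstat -i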
Thanks,
Rong-En Fan