Interrupt performance

Sun Jan 30 12:55:37 UTC 2011

On Sat, Jan 29, 2011 at 11:54:11PM +1100, Bruce Evans wrote:

> > And I see dramatically fewer context switches in the Linux stats
> > (by dstat).
> 
> FreeBSD uses ithreads for most interrupts, so of course it does many
> more context switches (at least 2 per interrupt).  This doesn't make
> much difference provided there are not too many.  I think the version
> of re that you are using actually uses "fast" interrupts and a task
> queue.  This also seems to be making little difference.  You get a
> relatively lightweight "fast" interrupt followed by a
> context switch to and from the task.  IIRC, your statistics showed 
> about twice as many context switches as interrupts, so the task queue
> isn't doing much to reduce the "interrupt overhead" -- it just gives
> context switches to the task instead of to an ithread.

I have now built a kernel with polling and profiling.
Network performance with profiling turned off does not change.

 procs      memory      page                   disk   faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad0   in   sy   cs us sy id
 1 0 0  98824K   431M     0   0   0   0     0   0   0    0  117 2172  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  123 2176  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2175  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2197  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2175  0  1 99

Network traffic ON:

 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3206  4 96  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107778 3183  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3184  1 99  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107155 3182  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107945 3206  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107613 3182  7 93  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107432 3180  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107523 3181  4 96  0
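
To compare the two vmstat runs above, the per-second rates can be averaged
with a short script. This is only a sketch: it assumes the column order shown
in the vmstat header (so "in", "sy" and "cs" are fields 13 to 15 of each
line), and the names IDLE, LOADED, parse and mean are made up for
illustration.

```python
# vmstat samples copied verbatim from the two runs above.
IDLE = """\
 1 0 0  98824K   431M     0   0   0   0     0   0   0    0  117 2172  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  123 2176  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2175  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2197  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2175  0  1 99
"""

LOADED = """\
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3206  4 96  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107778 3183  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3184  1 99  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107155 3182  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107945 3206  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107613 3182  7 93  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107432 3180  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107523 3181  4 96  0
"""


def parse(text):
    """Return one dict per sample with the interrupt, syscall and
    context-switch rates (assumed to be fields 13, 14 and 15)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 18:
            continue  # skip anything that is not a full sample line
        rows.append({
            "in": int(fields[12]),
            "sy": int(fields[13]),
            "cs": int(fields[14]),
        })
    return rows


def mean(rows, key):
    """Arithmetic mean of one column over all samples."""
    return sum(r[key] for r in rows) / len(rows)


if __name__ == "__main__":
    idle, loaded = parse(IDLE), parse(LOADED)
    for name, rows in (("idle", idle), ("loaded", loaded)):
        print(name,
              "in=%.0f" % mean(rows, "in"),
              "sy=%.0f" % mean(rows, "sy"),
              "cs=%.0f" % mean(rows, "cs"))
    print("extra context switches under load: %.0f"
          % (mean(loaded, "cs") - mean(idle, "cs")))
```

With polling enabled the "in" column stays at zero in both runs, so the
difference in "cs" is the figure to compare against the syscall rate.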

Report from gprof:

granularity: each sample hit covers 16 byte(s) for 0.00% of 75.16 seconds

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 41.4      31.12    31.12        0  100.00%           __mcount [1]
 36.2      58.30    27.18    54341     0.50     0.50  acpi_cpu_c1 [6]
  8.9      65.01     6.71  2521168     0.00     0.00  copyin [17]
  2.8      67.11     2.10   419006     0.01     0.01  in_cksum_skip [23]
  1.0      67.86     0.75 12236575     0.00     0.00  memcpy [29]
  0.8      68.43     0.58  9309659     0.00     0.00  uma_zalloc_arg [25]
  0.6      68.89     0.45  7293157     0.00     0.00  mb_ctor_mbuf [32]
  0.6      69.32     0.43  1008034     0.00     0.00  uma_find_refcnt [34]
  0.5      69.71     0.39  2933058     0.00     0.00  ether_output [24]
  0.5      70.07     0.36  2933058     0.00     0.00  if_transmit [38]
  0.3      70.31     0.25   504035     0.00     0.01  ip_output [18]
  0.3      70.56     0.24  2933257     0.00     0.00  bcmp [48]
  0.3      70.77     0.21   504032     0.00     0.01  m_uiotombuf [19]
  0.3      70.98     0.21  3352048     0.00     0.00  mb_dupcl [51]
  0.3      71.19     0.21  2514036     0.00     0.00  m_copym [28]
  0.3      71.39     0.20   419006     0.00     0.01  ip_fragment [21]
  0.2      71.56     0.17   504017     0.00     0.02  udp_send [16]
  0.2      71.74     0.17  2520731     0.00     0.00  bzero [53]
  0.2      71.91     0.17   504648     0.00     0.03  Xint0x80_syscall [8]
  0.2      72.07     0.16   504017     0.00     0.00  in_pcbconnect_setup [30]
  0.2      72.22     0.15   504017     0.00     0.03  sosend_dgram [15]
  0.2      72.37     0.15 25113400     0.00     0.00  critical_exit <cycle 1> [57]
  0.2      72.51     0.14 25113400     0.00     0.00  critical_enter [59]
  0.2      72.63     0.13   504104     0.00     0.00  mb_ctor_pack [60]
  0.2      72.75     0.11  1512179     0.00     0.00  _rw_runlock [62]
  0.1      72.85     0.10   504017     0.00     0.03  kern_sendit [13]
  0.1      72.95     0.10  9311895     0.00     0.00  uma_zfree_arg [49]
  0.1      73.05     0.10   504114     0.00     0.00  free [54]
  0.1      73.14     0.10  1512161     0.00     0.00  uiomove [20]
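
The flat profile above is dominated by __mcount and acpi_cpu_c1. The sketch
below re-expresses a few of the remaining entries as a share of the time left
after those two are subtracted. It rests on two assumptions that are mine and
not part of the gprof output: that __mcount is purely profiling overhead, and
that acpi_cpu_c1 is idle time (the call graph shows it is reached only through
the idle thread). The names SELF_SECONDS, EXCLUDED, busy_seconds and share are
hypothetical.

```python
# Total sampled time and self seconds, copied from the flat profile above.
TOTAL_SECONDS = 75.16
SELF_SECONDS = {
    "__mcount": 31.12,
    "acpi_cpu_c1": 27.18,
    "copyin": 6.71,
    "in_cksum_skip": 2.10,
    "memcpy": 0.75,
    "uma_zalloc_arg": 0.58,
}

# Assumption: these two entries are not network work
# (profiling hook and idle loop respectively).
EXCLUDED = ("__mcount", "acpi_cpu_c1")


def busy_seconds():
    """Sampled time left after removing the excluded entries."""
    return TOTAL_SECONDS - sum(SELF_SECONDS[name] for name in EXCLUDED)


def share(name):
    """Self time of one function as a percentage of busy_seconds()."""
    return 100.0 * SELF_SECONDS[name] / busy_seconds()


if __name__ == "__main__":
    print("busy seconds: %.2f" % busy_seconds())
    for name in SELF_SECONDS:
        if name not in EXCLUDED:
            print("%-16s %5.1f%%" % (name, share(name)))
```

Under those assumptions copyin accounts for roughly 40% of the remaining
time, which is a very different picture from its 8.9% in the raw listing.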

granularity: each sample hit covers 16 byte(s) for 0.00% of 75.16 seconds

                                  called/total       parents
index  %time    self descendents  called+self    name           index   
                                  called/total       children

                                                     <spontaneous>
[1]     41.4   31.12        0.00                 __mcount [1]

-----------------------------------------------

                                                     <spontaneous>
[2]     36.2    0.01       27.18                 sched_idletd [2]
                0.00       27.18   54341/54341       cpu_idle [4]

-----------------------------------------------

                0.00       27.18   54341/54341       cpu_idle_acpi [5]
[3]     36.2    0.00       27.18   54341         acpi_cpu_idle [3]
               27.18        0.00   54341/54341       acpi_cpu_c1 [6]
                0.00        0.00  108682/108682      AcpiHwRead [157]
                0.00        0.00   54341/54341       acpi_TimerDelta [653]

-----------------------------------------------

                0.00       27.18   54341/54341       sched_idletd [2]
[4]     36.2    0.00       27.18   54341         cpu_idle [4]
                0.00       27.18   54341/54341       cpu_idle_acpi [5]   
                0.00        0.00   54341/54341       mp_grab_cpu_hlt [654]

-----------------------------------------------