CPU affinity with ULE scheduler

Archimedes Gaviola archimedes.gaviola at gmail.com
Thu Nov 13 03:55:02 PST 2008


On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin <jhb at freebsd.org> wrote:
> On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
>> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin <jhb at freebsd.org> wrote:
>> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
>> >> To Whom It May Concern:
>> >>
>> >> Can someone explain how the ULE scheduler (version 2, if I'm not
>> >> mistaken) deals with CPU affinity? Are there any existing FreeBSD
>> >> benchmarks on this? I am currently using the 4BSD scheduler, and
>> >> what I have observed, especially when processing high network
>> >> traffic load on multiple CPU cores, is that only one CPU is being
>> >> stressed with network interrupts while the rest are mostly idle.
>> >> This is an AMD64 (4x dual-core) IBM system with GigE Broadcom
>> >> network interface cards (bce0 and bce1). Below is a snapshot of
>> >> the case.
>> >
>> > Interrupts are routed to a single CPU.  Since bce0 and bce1 are both
>> > on the same interrupt (irq 23), the CPU that interrupt is routed to
>> > is going to end up handling all the interrupts for bce0 and bce1.
>> > This is not something ULE or 4BSD have any control over.
>> >
>> > --
>> > John Baldwin
>> >
>>
>> Hi John,
>>
>> I'm sorry for the wrong snapshot. Here's the right one showing my concern.
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>    17 root        1 171   52     0K    16K CPU0   0  54:28 95.17% idle: cpu0
>>    15 root        1 171   52     0K    16K CPU2   2  55:55 93.65% idle: cpu2
>>    14 root        1 171   52     0K    16K CPU3   3  58:53 93.55% idle: cpu3
>>    13 root        1 171   52     0K    16K RUN    4  59:14 82.47% idle: cpu4
>>    12 root        1 171   52     0K    16K RUN    5  55:42 82.23% idle: cpu5
>>    16 root        1 171   52     0K    16K CPU1   1  58:13 77.78% idle: cpu1
>>    11 root        1 171   52     0K    16K CPU6   6  54:08 76.17% idle: cpu6
>>    36 root        1 -68 -187     0K    16K WAIT   7   8:50 65.53% irq23: bce0 bce1
>>    10 root        1 171   52     0K    16K CPU7   7  48:19 29.79% idle: cpu7
>>    43 root        1 171   52     0K    16K pgzero 2   0:35  1.51% pagezero
>>  1372 root       10  20    0 16716K  5764K kserel 6  58:42  0.00% kmd
>>  4488 root        1  96    0 30676K  4236K select 2   1:51  0.00% sshd
>>    18 root        1 -32 -151     0K    16K WAIT   0   1:14  0.00% swi4: clock s
>>    20 root        1 -44 -163     0K    16K WAIT   0   0:30  0.00% swi1: net
>>   218 root        1  96    0  3852K  1376K select 0   0:23  0.00% syslogd
>>  2171 root        1  96    0 30676K  4224K select 6   0:19  0.00% sshd
>>
>> Actually, I was doing network performance testing on this system with
>> FreeBSD 6.2-RELEASE and its default 4BSD scheduler, using a tool to
>> generate a large amount of traffic, around 600-700 Mbps, traversing
>> the FreeBSD system in both directions, meaning both network
>> interfaces were receiving traffic. What happened was that the CPU
>> (cpu7) handling irq 23 for both interfaces consumed a large amount of
>> CPU time, around 65.53%, which affected other running applications
>> and services like sshd and httpd; the system was no longer accessible
>> while being bombarded with traffic. With only one CPU being stressed,
>> I was thinking of moving to FreeBSD 7.0-RELEASE with the ULE
>> scheduler, because I thought my problem had to do with how the
>> scheduler distributes load across multiple CPU cores, especially when
>> processing network load. So, if this is more a matter of interrupt
>> handling than of the scheduler, is there a way to optimize it? If
>> everything is still routed to only one CPU, that is still inefficient
>> to me. What is responsible for binding interrupts to CPUs, so that a
>> shared IRQ can be avoided? Are there any improvements in FreeBSD 7.0
>> with regard to interrupt handling?
>
> It depends.  In all likelihood, the interrupts from bce0 and bce1 are both
> hardwired to the same interrupt pin and so they will always share the same
> ithread when using the legacy INTx interrupts.  However, bce(4) parts do
> support MSI, and if you try a newer OS snap (6.3 or later) these devices
> should use MSI in which case each NIC would be assigned to a separate CPU.  I
> would suggest trying 7.0 or a 7.1 release candidate and see if it does
> better.
>
> --
> John Baldwin
>

Hi John,

I tried the 7.0 release, and each network interface's interrupt is now
allocated to a separate CPU. MSI is working.
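
(As a quick sanity check, the per-device MSI vectors can also be seen
with vmstat(8); a minimal sketch, with the counter columns omitted:

  # vmstat -i | grep bce
  irq256: bce0                    ...          ...
  irq257: bce1                    ...          ...

Under legacy INTx, both devices instead showed up on the single shared
line for irq23.)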

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root        1 171 ki31     0K    16K CPU6   6 123:55 100.00% idle: cpu6
   15 root        1 171 ki31     0K    16K CPU3   3 123:54 100.00% idle: cpu3
   14 root        1 171 ki31     0K    16K CPU4   4 123:26 100.00% idle: cpu4
   16 root        1 171 ki31     0K    16K CPU2   2 123:15 100.00% idle: cpu2
   17 root        1 171 ki31     0K    16K CPU1   1 123:15 100.00% idle: cpu1
   37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% irq256: bce0
   13 root        1 171 ki31     0K    16K CPU5   5 123:49 99.07% idle: cpu5
   40 root        1 -68    -     0K    16K WAIT   0   4:40 51.17% irq257: bce1
   18 root        1 171 ki31     0K    16K RUN    0 117:48 49.37% idle: cpu0
   11 root        1 171 ki31     0K    16K RUN    7 115:25  0.00% idle: cpu7
   19 root        1 -32    -     0K    16K WAIT   0   0:39  0.00% swi4: clock s
14367 root        1  44    0  5176K  3104K select 2   0:01  0.00% dhcpd
   22 root        1 -16    -     0K    16K -      3   0:01  0.00% yarrow
   25 root        1 -24    -     0K    16K WAIT   0   0:00  0.00% swi6: Giant t
11658 root        1  44    0 32936K  4540K select 1   0:00  0.00% sshd
14224 root        1  44    0 32936K  4540K select 5   0:00  0.00% sshd
   41 root        1 -60    -     0K    16K WAIT   0   0:00  0.00% irq1: atkbd0
    4 root        1  -8    -     0K    16K -      2   0:00  0.00% g_down

The bce0 interrupt (irq256) is now the one being stressed, taking 100%
of CPU7, while irq257 (bce1) on CPU0 is at around 51.17%. Any more
recommendations? Is there anything we can do to optimize things
further with MSI?
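
(One idea I am considering, sketched here as untested commands: newer
releases are growing a cpuset(1) utility that can re-bind an interrupt
to a chosen CPU by IRQ number, which would at least let me move irq256
onto an otherwise idle core, e.g.

  # hypothetical: pin bce0's vector (irq256) to CPU 3
  cpuset -l 3 -x 256

Even then, each MSI vector is serviced by a single ithread, so one busy
NIC can still only use one core for its interrupt work.)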

Thanks,
Archimedes

