From archimedes.gaviola at gmail.com Mon Nov 10 01:06:07 2008
From: archimedes.gaviola at gmail.com (Archimedes Gaviola)
Date: Mon Nov 10 04:26:46 2008
Subject: CPU affinity with ULE scheduler
Message-ID: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>

To Whom It May Concern:

Can someone explain or share how the ULE scheduler (latest version 2, if
I'm not mistaken) deals with CPU affinity? Are there any existing
benchmarks on this with FreeBSD? I am currently using the 4BSD
scheduler, and what I have observed, especially when processing high
network load on multiple CPU cores, is that only one CPU is stressed
with network interrupts while the rest are mostly idle. This is an
AMD-64 (4x) dual-core IBM system with GigE Broadcom network interface
cards (bce0 and bce1). Below is a snapshot of the case.

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   17 root        1 171   52     0K    16K RUN     0  96:04 97.71% idle: cpu0
   15 root        1 171   52     0K    16K RUN     2  98:41 97.07% idle: cpu2
   14 root        1 171   52     0K    16K RUN     3 103:56 95.90% idle: cpu3
   13 root        1 171   52     0K    16K RUN     4 104:17 88.23% idle: cpu4
   12 root        1 171   52     0K    16K RUN     5  97:59 86.57% idle: cpu5
   10 root        1 171   52     0K    16K RUN     7  81:51 82.08% idle: cpu7
   11 root        1 171   52     0K    16K RUN     6  95:28 81.35% idle: cpu6
   16 root        1 171   52     0K    16K RUN     1 102:15 77.78% idle: cpu1
   36 root        1 -68 -187     0K    16K WAIT    7  19:37  4.59% irq23: bce0 bce1
   18 root        1 -32 -151     0K    16K CPU0    0   2:13  0.00% swi4: clock sio
 4488 root        1  96    0 30728K  4292K select  3   1:51  0.00% sshd
   43 root        1 171   52     0K    16K pgzero  3   1:08  0.00% pagezero
  218 root        1  96    0  3852K  1380K select  3   0:38  0.00% syslogd
   20 root        1 -44 -163     0K    16K WAIT    7   0:32  0.00% swi1: net

Thanks,
Archimedes

From ivoras at freebsd.org Mon Nov 10 06:05:20 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Mon Nov 10 06:05:27 2008
Subject: CPU affinity with ULE scheduler
In-Reply-To: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
Message-ID:

Archimedes Gaviola wrote:
> To Whom It May Concerned:
>
> Can someone explain or share about ULE scheduler (latest version 2 if
> I'm not mistaken) dealing with CPU affinity? Is there any existing
> benchmarks on this with FreeBSD? Because I am currently using 4BSD

Yes but not for network loads. See for example benchmarks in
http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf

> scheduler and as what I have observed especially on processing high
> network load traffic on multiple CPU cores, only one CPU were being
> stressed with network interrupt while the rests are mostly in idle
> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
> network interface cards (bce0 and bce1). Below is the snapshot of the
> case.

This is unfortunately so and cannot be changed for now - you are not the
first with this particular performance problem.

BUT, looking at the data in the snapshot you gave, it's not clear that
there is a performance problem in your case - bce is not nearly taking
as much CPU time to be bottlenecking. What exactly do you think is wrong
in your case?

>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>    17 root        1 171   52     0K    16K RUN     0  96:04 97.71% idle: cpu0
>    15 root        1 171   52     0K    16K RUN     2  98:41 97.07% idle: cpu2
>    14 root        1 171   52     0K    16K RUN     3 103:56 95.90% idle: cpu3
>    13 root        1 171   52     0K    16K RUN     4 104:17 88.23% idle: cpu4
>    12 root        1 171   52     0K    16K RUN     5  97:59 86.57% idle: cpu5
>    10 root        1 171   52     0K    16K RUN     7  81:51 82.08% idle: cpu7
>    11 root        1 171   52     0K    16K RUN     6  95:28 81.35% idle: cpu6
>    16 root        1 171   52     0K    16K RUN     1 102:15 77.78% idle: cpu1
>    36 root        1 -68 -187     0K    16K WAIT    7  19:37  4.59% irq23: bce0 bce1
>    18 root        1 -32 -151     0K    16K CPU0    0   2:13  0.00% swi4: clock sio
>  4488 root        1  96    0 30728K  4292K select  3   1:51  0.00% sshd
>    43 root        1 171   52     0K    16K pgzero  3   1:08  0.00% pagezero
>   218 root        1  96    0  3852K  1380K select  3   0:38  0.00% syslogd
>    20 root        1 -44 -163     0K    16K WAIT    7   0:32  0.00% swi1: net

From jhb at freebsd.org Mon Nov 10 14:34:28 2008
From: jhb at freebsd.org (John Baldwin)
Date: Mon Nov 10 14:34:40 2008
Subject: CPU affinity with ULE scheduler
In-Reply-To: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
Message-ID: <200811101733.04547.jhb@freebsd.org>

On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
> To Whom It May Concerned:
>
> Can someone explain or share about ULE scheduler (latest version 2 if
> I'm not mistaken) dealing with CPU affinity? Is there any existing
> benchmarks on this with FreeBSD? Because I am currently using 4BSD
> scheduler and as what I have observed especially on processing high
> network load traffic on multiple CPU cores, only one CPU were being
> stressed with network interrupt while the rests are mostly in idle
> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
> network interface cards (bce0 and bce1). Below is the snapshot of the
> case.

Interrupts are routed to a single CPU. Since bce0 and bce1 are both on
the same interrupt (irq 23), the CPU that interrupt is routed to is
going to end up handling all the interrupts for bce0 and bce1. This is
not something ULE or 4BSD have any control over.
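
The routing described above can be sketched as a toy model. This is only
an illustration under stated assumptions, not how the kernel implements
it: the per-device load figures are made up, and the second case (one
interrupt vector per NIC, numbered 256 and 257 here) is hypothetical.

```python
from collections import defaultdict


def cpu_load(device_irq, irq_cpu, device_load):
    """Sum each device's interrupt load onto the CPU its IRQ is routed to."""
    load = defaultdict(float)
    for dev, irq in device_irq.items():
        load[irq_cpu[irq]] += device_load[dev]
    return dict(load)


# Illustrative (made-up) interrupt load per NIC, in percent of one CPU.
device_load = {"bce0": 30.0, "bce1": 35.0}

# Case from the thread: both NICs share irq 23, which is routed to cpu7.
shared = cpu_load({"bce0": 23, "bce1": 23}, {23: 7}, device_load)

# Hypothetical case: each NIC has its own vector, routed to different CPUs.
split = cpu_load({"bce0": 256, "bce1": 257}, {256: 6, 257: 7}, device_load)

print("shared irq:", shared)
print("separate vectors:", split)
```

With a shared IRQ, the load of both devices lands on the one CPU the IRQ
is routed to, whatever the scheduler does with ordinary threads.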
-- 
John Baldwin

From archimedes.gaviola at gmail.com Mon Nov 10 20:32:57 2008
From: archimedes.gaviola at gmail.com (Archimedes Gaviola)
Date: Mon Nov 10 20:33:03 2008
Subject: CPU affinity with ULE scheduler
In-Reply-To: <200811101733.04547.jhb@freebsd.org>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811101733.04547.jhb@freebsd.org>
Message-ID: <42e3d810811102032w7850a1c0t386d80ce747f37d3@mail.gmail.com>

On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote:
> On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
>> To Whom It May Concerned:
>>
>> Can someone explain or share about ULE scheduler (latest version 2 if
>> I'm not mistaken) dealing with CPU affinity? Is there any existing
>> benchmarks on this with FreeBSD? Because I am currently using 4BSD
>> scheduler and as what I have observed especially on processing high
>> network load traffic on multiple CPU cores, only one CPU were being
>> stressed with network interrupt while the rests are mostly in idle
>> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
>> network interface cards (bce0 and bce1). Below is the snapshot of the
>> case.
>
> Interrupts are routed to a single CPU. Since bce0 and bce1 are both on the
> same interrupt (irq 23), the CPU that interrupt is routed to is going to end
> up handling all the interrupts for bce0 and bce1. This not something ULE or
> 4BSD have any control over.
>
> --
> John Baldwin
>

Hi John,

I'm sorry for the wrong snapshot. Here's the right one with my concern.

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   17 root        1 171   52     0K    16K CPU0    0  54:28 95.17% idle: cpu0
   15 root        1 171   52     0K    16K CPU2    2  55:55 93.65% idle: cpu2
   14 root        1 171   52     0K    16K CPU3    3  58:53 93.55% idle: cpu3
   13 root        1 171   52     0K    16K RUN     4  59:14 82.47% idle: cpu4
   12 root        1 171   52     0K    16K RUN     5  55:42 82.23% idle: cpu5
   16 root        1 171   52     0K    16K CPU1    1  58:13 77.78% idle: cpu1
   11 root        1 171   52     0K    16K CPU6    6  54:08 76.17% idle: cpu6
   36 root        1 -68 -187     0K    16K WAIT    7   8:50 65.53% irq23: bce0 bce1
   10 root        1 171   52     0K    16K CPU7    7  48:19 29.79% idle: cpu7
   43 root        1 171   52     0K    16K pgzero  2   0:35  1.51% pagezero
 1372 root       10  20    0 16716K  5764K kserel  6  58:42  0.00% kmd
 4488 root        1  96    0 30676K  4236K select  2   1:51  0.00% sshd
   18 root        1 -32 -151     0K    16K WAIT    0   1:14  0.00% swi4: clock s
   20 root        1 -44 -163     0K    16K WAIT    0   0:30  0.00% swi1: net
  218 root        1  96    0  3852K  1376K select  0   0:23  0.00% syslogd
 2171 root        1  96    0 30676K  4224K select  6   0:19  0.00% sshd

Actually, I was doing network performance testing on this system with
FreeBSD-6.2 RELEASE using its default scheduler, 4BSD. I used a tool to
generate a large amount of traffic, around 600Mbps-700Mbps, traversing
the FreeBSD system in both directions, meaning both network interfaces
are receiving traffic. What happened was that the CPU (cpu7) that
handles irq 23 for both interfaces reached around 65.53% CPU
utilization, which affects other running applications and services like
sshd and httpd. They are no longer accessible while the traffic is
being sent. With only one CPU being stressed on my FreeBSD system, I
was thinking of moving to FreeBSD-7.0 RELEASE with the ULE scheduler,
because I thought my concern had something to do with how the scheduler
distributes load across multiple CPU cores, especially network
processing load. So, if it is more a matter of interrupt handling and
not of the scheduler, is there a way we can optimize it?
Because if it is still routed to only one CPU, then for me it's still
inefficient. Who handles interrupt scheduling for binding to a CPU, in
order to prevent a shared IRQ? Are there any improvements in
FreeBSD-7.0 with regard to interrupt handling?

Thanks,
Archimedes

From archimedes.gaviola at gmail.com Mon Nov 10 23:02:20 2008
From: archimedes.gaviola at gmail.com (Archimedes Gaviola)
Date: Mon Nov 10 23:02:32 2008
Subject: CPU affinity with ULE scheduler
In-Reply-To: <42e3d810811102032w7850a1c0t386d80ce747f37d3@mail.gmail.com>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811101733.04547.jhb@freebsd.org> <42e3d810811102032w7850a1c0t386d80ce747f37d3@mail.gmail.com>
Message-ID: <42e3d810811102302h3a0e38bcuf1195cf0a89c29a7@mail.gmail.com>

On Tue, Nov 11, 2008 at 12:32 PM, Archimedes Gaviola wrote:
> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote:
>> On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
>>> To Whom It May Concerned:
>>>
>>> Can someone explain or share about ULE scheduler (latest version 2 if
>>> I'm not mistaken) dealing with CPU affinity? Is there any existing
>>> benchmarks on this with FreeBSD? Because I am currently using 4BSD
>>> scheduler and as what I have observed especially on processing high
>>> network load traffic on multiple CPU cores, only one CPU were being
>>> stressed with network interrupt while the rests are mostly in idle
>>> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
>>> network interface cards (bce0 and bce1). Below is the snapshot of the
>>> case.
>>
>> Interrupts are routed to a single CPU. Since bce0 and bce1 are both on the
>> same interrupt (irq 23), the CPU that interrupt is routed to is going to end
>> up handling all the interrupts for bce0 and bce1. This not something ULE or
>> 4BSD have any control over.
>>
>> --
>> John Baldwin
>>
>
> Hi John,
>
> I'm sorry for the wrong snapshot. Here's the right one with my concern.
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>    17 root        1 171   52     0K    16K CPU0    0  54:28 95.17% idle: cpu0
>    15 root        1 171   52     0K    16K CPU2    2  55:55 93.65% idle: cpu2
>    14 root        1 171   52     0K    16K CPU3    3  58:53 93.55% idle: cpu3
>    13 root        1 171   52     0K    16K RUN     4  59:14 82.47% idle: cpu4
>    12 root        1 171   52     0K    16K RUN     5  55:42 82.23% idle: cpu5
>    16 root        1 171   52     0K    16K CPU1    1  58:13 77.78% idle: cpu1
>    11 root        1 171   52     0K    16K CPU6    6  54:08 76.17% idle: cpu6
>    36 root        1 -68 -187     0K    16K WAIT    7   8:50 65.53% irq23: bce0 bce1
>    10 root        1 171   52     0K    16K CPU7    7  48:19 29.79% idle: cpu7
>    43 root        1 171   52     0K    16K pgzero  2   0:35  1.51% pagezero
>  1372 root       10  20    0 16716K  5764K kserel  6  58:42  0.00% kmd
>  4488 root        1  96    0 30676K  4236K select  2   1:51  0.00% sshd
>    18 root        1 -32 -151     0K    16K WAIT    0   1:14  0.00% swi4: clock s
>    20 root        1 -44 -163     0K    16K WAIT    0   0:30  0.00% swi1: net
>   218 root        1  96    0  3852K  1376K select  0   0:23  0.00% syslogd
>  2171 root        1  96    0 30676K  4224K select  6   0:19  0.00% sshd
>
> Actually I was doing a network performance testing on this system with
> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a
> tool to generate big amount of traffic around 600Mbps-700Mbps
> traversing the FreeBSD system in bi-direction, meaning both network
> interfaces are receiving traffic. What happened was, the CPU (cpu7)
> that handles the (irq 23) on both interfaces consumed big amount of
> CPU utilization around 65.53% in which it affects other running
> applications and services like sshd and httpd. It's no longer
> accessible when traffic is bombarded. With the current situation of my
> FreeBSD system with only one CPU being stressed, I was thinking of
> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought
> my concern has something to do with the distributions of load on
> multiple CPU cores handled by the scheduler especially at the network
> level, processing network load. So, if it is more of interrupt
> handling and not on the scheduler, is there a way we can optimize it?
> Because if it still routed only to one CPU then for me it's still
> inefficient. Who handles interrupt scheduling for bounding CPU in
> order to prevent shared IRQ? Is there any improvements with
> FreeBSD-7.0 with regards to interrupt handling?
>
> Thanks,
> Archimedes
>

Hi Ivan,

Archimedes Gaviola wrote:
> To Whom It May Concerned:
>
> Can someone explain or share about ULE scheduler (latest version 2 if
> I'm not mistaken) dealing with CPU affinity? Is there any existing
> benchmarks on this with FreeBSD? Because I am currently using 4BSD

Yes but not for network loads. See for example benchmarks in
http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf

[Archimedes] Ah okay, so based on my understanding, the ULE scheduler
in FreeBSD-7.0 only scales well for scheduling userland applications
such as databases and DNS?

> scheduler and as what I have observed especially on processing high
> network load traffic on multiple CPU cores, only one CPU were being
> stressed with network interrupt while the rests are mostly in idle
> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
> network interface cards (bce0 and bce1). Below is the snapshot of the
> case.

This is unfortunately so and cannot be changed for now - you are not the
first with this particular performance problem.

[Archimedes] Meaning, you still have to improve how the ULE scheduler
handles network load? I have read some papers and articles saying that
FreeBSD is implementing a parallelized network stack. What is the
status of this development? Can it address the processing of high
network load?

BUT, looking at the data in the snapshot you gave, it's not clear that
there is a performance problem in your case - bce is not nearly taking
as much CPU time to be bottlenecking. What exactly do you think is wrong
in your case?

[Archimedes] Oh, I'm sorry, this is not the right one. Here it is below:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   17 root        1 171   52     0K    16K CPU0    0  54:28 95.17% idle: cpu0
   15 root        1 171   52     0K    16K CPU2    2  55:55 93.65% idle: cpu2
   14 root        1 171   52     0K    16K CPU3    3  58:53 93.55% idle: cpu3
   13 root        1 171   52     0K    16K RUN     4  59:14 82.47% idle: cpu4
   12 root        1 171   52     0K    16K RUN     5  55:42 82.23% idle: cpu5
   16 root        1 171   52     0K    16K CPU1    1  58:13 77.78% idle: cpu1
   11 root        1 171   52     0K    16K CPU6    6  54:08 76.17% idle: cpu6
   36 root        1 -68 -187     0K    16K WAIT    7   8:50 65.53% irq23: bce0 bce1
   10 root        1 171   52     0K    16K CPU7    7  48:19 29.79% idle: cpu7
   43 root        1 171   52     0K    16K pgzero  2   0:35  1.51% pagezero
 1372 root       10  20    0 16716K  5764K kserel  6  58:42  0.00% kmd
 4488 root        1  96    0 30676K  4236K select  2   1:51  0.00% sshd
   18 root        1 -32 -151     0K    16K WAIT    0   1:14  0.00% swi4: clock s
   20 root        1 -44 -163     0K    16K WAIT    0   0:30  0.00% swi1: net
  218 root        1  96    0  3852K  1376K select  0   0:23  0.00% syslogd
 2171 root        1  96    0 30676K  4224K select  6   0:19  0.00% sshd

I was doing network performance testing with a traffic generator tool,
pushing 600Mbps-700Mbps through my FreeBSD system in both directions.
As you can see, cpu7 is bound to irq23, which is shared by both network
interfaces bce0 and bce1. cpu7 takes up 65.53% CPU utilization, which
affects some of the applications running on the system like sshd and
httpd. These services are no longer accessible under that amount of
traffic. Since there are still more idle CPUs, I'm concerned about CPU
load distribution so that not only one CPU will be stressed.
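
The reading of the snapshot above can be checked with a short script.
This is a rough sketch, assuming that 100% minus an idle thread's WCPU
approximates how busy that CPU is (WCPU is a weighted average, so the
figures are only indicative). The rows are copied from the snapshot;
the function names are illustrative.

```python
# Idle-thread rows plus the irq23 thread, copied from the snapshot.
SNAPSHOT = """\
17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: cpu0
15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: cpu2
14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: cpu3
13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: cpu4
12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: cpu5
16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: cpu1
11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: cpu6
36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% irq23: bce0 bce1
10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: cpu7
"""


def parse_top(text):
    """Return (cpu, wcpu_percent, command) for each row of top output."""
    rows = []
    for line in text.strip().splitlines():
        # 11 whitespace-separated columns, then the free-form COMMAND.
        fields = line.split(None, 11)
        cpu = int(fields[8])
        wcpu = float(fields[10].rstrip("%"))
        rows.append((cpu, wcpu, fields[11]))
    return rows


def busy_per_cpu(rows):
    """Approximate per-CPU busy time as 100% minus the idle thread's WCPU."""
    return {cpu: round(100.0 - wcpu, 2)
            for cpu, wcpu, cmd in rows if cmd.startswith("idle:")}


rows = parse_top(SNAPSHOT)
busy = busy_per_cpu(rows)
hottest = max(busy, key=busy.get)
irq_cpu, irq_wcpu = next((c, w) for c, w, cmd in rows
                         if cmd.startswith("irq23"))

print(f"busiest CPU: cpu{hottest} ({busy[hottest]:.2f}% busy)")
print(f"irq23 thread last ran on cpu{irq_cpu} at {irq_wcpu:.2f}% WCPU")
```

Under that assumption, cpu7 comes out as the busiest CPU, and the irq23
thread alone accounts for most of its non-idle time.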
Thanks, Archimedes From ivoras at freebsd.org Tue Nov 11 09:06:50 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Tue Nov 11 09:06:56 2008 Subject: CPU affinity with ULE scheduler In-Reply-To: <42e3d810811102302h3a0e38bcuf1195cf0a89c29a7@mail.gmail.com> References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811101733.04547.jhb@freebsd.org> <42e3d810811102032w7850a1c0t386d80ce747f37d3@mail.gmail.com> <42e3d810811102302h3a0e38bcuf1195cf0a89c29a7@mail.gmail.com> Message-ID: Archimedes Gaviola wrote: > Hi Ivan, > > Archimedes Gaviola wrote: >> To Whom It May Concerned: >> =20 >> Can someone explain or share about ULE scheduler (latest version 2 if >> I'm not mistaken) dealing with CPU affinity? Is there any existing >> benchmarks on this with FreeBSD? Because I am currently using 4BSD > > Yes but not for network loads. See for example benchmarks in > http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf > > [Archimedes] Ah okay, so based on my understanding with ULE scheduler > in FreeBSD-7.0, it only scale well with userland applications > scheduling such as database and DNS? The problem you are seeing is probably not solvable by a better scheduler. There are other parts of the system that cause performance bottlenecks. I'd recommend you try 7-STABLE, it might help you, but it probably won't. 
From jhb at freebsd.org Wed Nov 12 05:12:14 2008 From: jhb at freebsd.org (John Baldwin) Date: Wed Nov 12 05:12:24 2008 Subject: CPU affinity with ULE scheduler In-Reply-To: <42e3d810811102032w7850a1c0t386d80ce747f37d3@mail.gmail.com> References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811101733.04547.jhb@freebsd.org> <42e3d810811102032w7850a1c0t386d80ce747f37d3@mail.gmail.com> Message-ID: <200811111216.37462.jhb@freebsd.org> On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: > On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: > > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: > >> To Whom It May Concerned: > >> > >> Can someone explain or share about ULE scheduler (latest version 2 if > >> I'm not mistaken) dealing with CPU affinity? Is there any existing > >> benchmarks on this with FreeBSD? Because I am currently using 4BSD > >> scheduler and as what I have observed especially on processing high > >> network load traffic on multiple CPU cores, only one CPU were being > >> stressed with network interrupt while the rests are mostly in idle > >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom > >> network interface cards (bce0 and bce1). Below is the snapshot of the > >> case. > > > > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on the > > same interrupt (irq 23), the CPU that interrupt is routed to is going to end > > up handling all the interrupts for bce0 and bce1. This not something ULE or > > 4BSD have any control over. > > > > -- > > John Baldwin > > > > Hi John, > > I'm sorry for the wrong snapshot. Here's the right one with my concern. 
> > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: cpu0 > 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: cpu2 > 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: cpu3 > 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: cpu4 > 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: cpu5 > 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: cpu1 > 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: cpu6 > 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% > irq23: bce0 bce1 > 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: cpu7 > 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% pagezero > 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd > 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd > 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: clock s > 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: net > 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd > 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd > > Actually I was doing a network performance testing on this system with > FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a > tool to generate big amount of traffic around 600Mbps-700Mbps > traversing the FreeBSD system in bi-direction, meaning both network > interfaces are receiving traffic. What happened was, the CPU (cpu7) > that handles the (irq 23) on both interfaces consumed big amount of > CPU utilization around 65.53% in which it affects other running > applications and services like sshd and httpd. It's no longer > accessible when traffic is bombarded. With the current situation of my > FreeBSD system with only one CPU being stressed, I was thinking of > moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought > my concern has something to do with the distributions of load on > multiple CPU cores handled by the scheduler especially at the network > level, processing network load. 
So, if it is more of interrupt > handling and not on the scheduler, is there a way we can optimize it? > Because if it still routed only to one CPU then for me it's still > inefficient. Who handles interrupt scheduling for bounding CPU in > order to prevent shared IRQ? Is there any improvements with > FreeBSD-7.0 with regards to interrupt handling?

It depends. In all likelihood, the interrupts from bce0 and bce1 are both hardwired to the same interrupt pin and so they will always share the same ithread when using the legacy INTx interrupts. However, bce(4) parts do support MSI, and if you try a newer OS snap (6.3 or later) these devices should use MSI in which case each NIC would be assigned to a separate CPU. I would suggest trying 7.0 or a 7.1 release candidate and see if it does better.

--
John Baldwin

From ivoras at freebsd.org Wed Nov 12 16:35:45 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Wed Nov 12 16:35:52 2008
Subject: NUMA?
Message-ID:

Hi,

As even Intel's new CPUs have integrated memory controllers and thus become NUMA, I'm interested in what is, in theory (I'm not proposing to do it, I'm just curious), necessary to change in an OS to support NUMA. My guess is:

1) node topology detection - something similar to what ULE does but also recording which memory ranges are "close" to which CPU and the "distance" between nodes/CPUs
2) on new image load (exec), pick a node for it, among "least used" nodes and record the choice per-proc; on fork, keep the new process on the same node
3) schedule threads on a CPU from the proc's node if at all possible (e.g., when a 6-core CPU is still 1 node), then on a "near" node from a list of distances sorted in order of cost
4) allocate new pages for a proc from its node's memory range(s) if at all possible.

Is this all?

On the other hand, did someone do a study of performance increase for today's "consumer" NUMA systems (e.g. 2-4 sockets/nodes x86/x64 systems) - is it worth it?
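Steps 2 to 4 of the list above can be sketched in a few lines of C. This is only a toy model under stated assumptions: the 4-node distance table and the function names pick_home_node() and nearest_node_with_free() are made up for illustration and are not FreeBSD interfaces. It shows one possible way to pick a "least used" home node and then fall back to the nearest node that still has an idle CPU or a free page.

```c
#include <stddef.h>

#define NNODES 4

/*
 * Hypothetical node-to-node "distance" table (step 1 would fill this in
 * from the hardware); smaller means closer. The values are invented.
 */
static const int node_distance[NNODES][NNODES] = {
    { 10, 20, 20, 30 },
    { 20, 10, 30, 20 },
    { 20, 30, 10, 20 },
    { 30, 20, 20, 10 },
};

/* Step 2: on exec, pick the least-used node; ties go to the lowest index. */
static int
pick_home_node(const int load[NNODES])
{
    int best = 0;

    for (int n = 1; n < NNODES; n++)
        if (load[n] < load[best])
            best = n;
    return best;
}

/*
 * Steps 3 and 4: starting from the home node, return the closest node
 * that still has a free resource (an idle CPU or a free page), or -1
 * if no node has any. The home node itself wins when it has one, since
 * its distance to itself is the smallest entry in its row.
 */
static int
nearest_node_with_free(int home, const int free_count[NNODES])
{
    int best = -1;

    for (int n = 0; n < NNODES; n++) {
        if (free_count[n] <= 0)
            continue;
        if (best < 0 || node_distance[home][n] < node_distance[home][best])
            best = n;
    }
    return best;
}
```

For example, with per-node loads {5, 2, 7, 2}, pick_home_node() returns node 1. If only nodes 0 and 3 have idle CPUs, nearest_node_with_free(1, ...) returns node 0, the first of the two equally distant candidates. A real implementation would also have to answer the harder question of when to migrate a process, which this sketch ignores.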
From julian at elischer.org Wed Nov 12 17:19:03 2008
From: julian at elischer.org (Julian Elischer)
Date: Wed Nov 12 17:19:09 2008
Subject: NUMA?
In-Reply-To:
References:
Message-ID: <491B79BE.50800@elischer.org>

Ivan Voras wrote:
> Hi,

I did the AMD course a few weeks ago so I'm also very interested in this..

>
> As even Intel's new CPUs have integrated memory controllers and thus
> become NUMA, I'm interested in what is, in theory (I'm not proposing to
> do it, I'm just curious), necessary to change in an OS to support NUMA.
> My guess is:
>
> 1) node topology detection - something similar to what ULE does but also
> recording which memory ranges are "close" to which CPU and the
> "distance" between nodes/CPUs

At a minimum, this is needed before anything else can really work.

> 2) on new image load (exec), pick a node for it, among "least used"
> nodes and record the choice per-proc; on fork, keep the new process on
> the same node

In some cases it may be worth having multiple copies of the read-only text segments. For example, it may eventually be worth having a /bin/sh text segment in each CPU's memory space.

> 3) schedule threads on a CPU from the proc's node if at all possible
> (e.g, when a 6-core CPU is still 1 node), then on a "near" node from a
> list of distances sorted in order of cost

This is where it really starts getting hairy.. When do you migrate a process? And what if there are as many threads runnable as processors?

> 4) allocate new pages for a proc from its node's memory range(s) if at
> all possible.
>
> Is this all?

There are other interesting effects too.. assigning network interrupts to processors that have good access to the hardware AND the destination if you can..

>
> On the other hand, did someone do a study of performance increase for
> todays "consumer" NUMA systems (e.g. 2-4 sockets/nodes x86/x64 systems)
> - is it worth it?

Caches hide a multitude of sins..
> > _______________________________________________ > freebsd-smp@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-smp > To unsubscribe, send any mail to "freebsd-smp-unsubscribe@freebsd.org" From marc at wiz.com Wed Nov 12 17:32:20 2008 From: marc at wiz.com (Marc Wiz) Date: Wed Nov 12 17:32:26 2008 Subject: NUMA? In-Reply-To: References: Message-ID: <20081113010221.GB20056@wiz.com> On Thu, Nov 13, 2008 at 01:35:28AM +0100, Ivan Voras wrote: > Hi, > > As even Intel's new CPUs have integrated memory controllers and thus > become NUMA, I'm interested in what is, in theory (I'm not proposing to > do it, I'm just curious), necessary to change in an OS to support NUMA. > My guess is: > > 1) node topology detection - something similar to what ULE does but also > recording which memory ranges are "close" to which CPU and the > "distance" between nodes/CPUs > 2) on new image load (exec), pick a node for it, among "least used" > nodes and record the choice per-proc; on fork, keep the new process on > the same node > 3) schedule threads on a CPU from the proc's node if at all possible > (e.g, when a 6-core CPU is still 1 node), then on a "near" node from a > list of distances sorted in order of cost > 4) allocate new pages for a proc from its node's memory range(s) if at > all possible. One good source of information on this topic is IBM's AIX on the Power 4 - 6 processors. There is the concept of distant vs. close memory and processors as well as what is referred to as memory affinity. Marc -- Marc Wiz marc@wiz.com Yes, that really is my last name. 
From archimedes.gaviola at gmail.com Thu Nov 13 03:55:02 2008 From: archimedes.gaviola at gmail.com (Archimedes Gaviola) Date: Thu Nov 13 03:55:09 2008 Subject: CPU affinity with ULE scheduler In-Reply-To: <200811111216.37462.jhb@freebsd.org> References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811101733.04547.jhb@freebsd.org> <42e3d810811102032w7850a1c0t386d80ce747f37d3@mail.gmail.com> <200811111216.37462.jhb@freebsd.org> Message-ID: <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote: > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: >> >> To Whom It May Concerned: >> >> >> >> Can someone explain or share about ULE scheduler (latest version 2 if >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD >> >> scheduler and as what I have observed especially on processing high >> >> network load traffic on multiple CPU cores, only one CPU were being >> >> stressed with network interrupt while the rests are mostly in idle >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom >> >> network interface cards (bce0 and bce1). Below is the snapshot of the >> >> case. >> > >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on > the >> > same interrupt (irq 23), the CPU that interrupt is routed to is going to > end >> > up handling all the interrupts for bce0 and bce1. This not something ULE > or >> > 4BSD have any control over. >> > >> > -- >> > John Baldwin >> > >> >> Hi John, >> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. 
>> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: cpu0 >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: cpu2 >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: cpu3 >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: cpu4 >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: cpu5 >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: cpu1 >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: cpu6 >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% >> irq23: bce0 bce1 >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: cpu7 >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% pagezero >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: > clock s >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: net >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd >> >> Actually I was doing a network performance testing on this system with >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a >> tool to generate big amount of traffic around 600Mbps-700Mbps >> traversing the FreeBSD system in bi-direction, meaning both network >> interfaces are receiving traffic. What happened was, the CPU (cpu7) >> that handles the (irq 23) on both interfaces consumed big amount of >> CPU utilization around 65.53% in which it affects other running >> applications and services like sshd and httpd. It's no longer >> accessible when traffic is bombarded. With the current situation of my >> FreeBSD system with only one CPU being stressed, I was thinking of >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought >> my concern has something to do with the distributions of load on >> multiple CPU cores handled by the scheduler especially at the network >> level, processing network load. 
So, if it is more of interrupt >> handling and not on the scheduler, is there a way we can optimize it? >> Because if it still routed only to one CPU then for me it's still >> inefficient. Who handles interrupt scheduling for bounding CPU in >> order to prevent shared IRQ? Is there any improvements with >> FreeBSD-7.0 with regards to interrupt handling? > > It depends. In all likelihood, the interrupts from bce0 and bce1 are both > hardwired to the same interrupt pin and so they will always share the same > ithread when using the legacy INTx interrupts. However, bce(4) parts do > support MSI, and if you try a newer OS snap (6.3 or later) these devices > should use MSI in which case each NIC would be assigned to a separate CPU. I > would suggest trying 7.0 or a 7.1 release candidate and see if it does > better. > > -- > John Baldwin > Hi John, I try 7.0 release and each network interface were already allocated separately on different CPU. Here, MSI is already working. PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: cpu6 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: cpu3 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: cpu4 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: cpu2 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: cpu1 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: bce0 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: bce1 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: clock s 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: Giant t 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: atkbd0 4 
root 1 -8 - 0K 16K - 2 0:00 0.00% g_down The bce0 interface interrupt (irq256) gets stressed out which already have 100% of CPU7 while CPU0 is around 51.17%. Any more recommendations? Is there anything we can do about optimization with MSI? Thanks, Archimedes From jhb at freebsd.org Thu Nov 13 11:46:33 2008 From: jhb at freebsd.org (John Baldwin) Date: Thu Nov 13 11:46:39 2008 Subject: CPU affinity with ULE scheduler In-Reply-To: <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com> References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811111216.37462.jhb@freebsd.org> <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com> Message-ID: <200811131128.55220.jhb@freebsd.org> On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote: > On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote: > > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: > >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: > >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: > >> >> To Whom It May Concerned: > >> >> > >> >> Can someone explain or share about ULE scheduler (latest version 2 if > >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing > >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD > >> >> scheduler and as what I have observed especially on processing high > >> >> network load traffic on multiple CPU cores, only one CPU were being > >> >> stressed with network interrupt while the rests are mostly in idle > >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom > >> >> network interface cards (bce0 and bce1). Below is the snapshot of the > >> >> case. > >> > > >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on > > the > >> > same interrupt (irq 23), the CPU that interrupt is routed to is going to > > end > >> > up handling all the interrupts for bce0 and bce1. 
This not something ULE > > or > >> > 4BSD have any control over. > >> > > >> > -- > >> > John Baldwin > >> > > >> > >> Hi John, > >> > >> I'm sorry for the wrong snapshot. Here's the right one with my concern. > >> > >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: cpu0 > >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: cpu2 > >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: cpu3 > >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: cpu4 > >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: cpu5 > >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: cpu1 > >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: cpu6 > >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% > >> irq23: bce0 bce1 > >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: cpu7 > >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% pagezero > >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd > >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd > >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: > > clock s > >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: net > >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd > >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd > >> > >> Actually I was doing a network performance testing on this system with > >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a > >> tool to generate big amount of traffic around 600Mbps-700Mbps > >> traversing the FreeBSD system in bi-direction, meaning both network > >> interfaces are receiving traffic. What happened was, the CPU (cpu7) > >> that handles the (irq 23) on both interfaces consumed big amount of > >> CPU utilization around 65.53% in which it affects other running > >> applications and services like sshd and httpd. It's no longer > >> accessible when traffic is bombarded. 
With the current situation of my > >> FreeBSD system with only one CPU being stressed, I was thinking of > >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought > >> my concern has something to do with the distributions of load on > >> multiple CPU cores handled by the scheduler especially at the network > >> level, processing network load. So, if it is more of interrupt > >> handling and not on the scheduler, is there a way we can optimize it? > >> Because if it still routed only to one CPU then for me it's still > >> inefficient. Who handles interrupt scheduling for bounding CPU in > >> order to prevent shared IRQ? Is there any improvements with > >> FreeBSD-7.0 with regards to interrupt handling? > > > > It depends. In all likelihood, the interrupts from bce0 and bce1 are both > > hardwired to the same interrupt pin and so they will always share the same > > ithread when using the legacy INTx interrupts. However, bce(4) parts do > > support MSI, and if you try a newer OS snap (6.3 or later) these devices > > should use MSI in which case each NIC would be assigned to a separate CPU. I > > would suggest trying 7.0 or a 7.1 release candidate and see if it does > > better. > > > > -- > > John Baldwin > > > > Hi John, > > I try 7.0 release and each network interface were already allocated > separately on different CPU. Here, MSI is already working. 
> > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: cpu6 > 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: cpu3 > 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: cpu4 > 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: cpu2 > 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: cpu1 > 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: bce0 > 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 > 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: bce1 > 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 > 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 > 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: clock s > 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd > 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow > 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: Giant t > 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd > 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd > 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: atkbd0 > 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down > > The bce0 interface interrupt (irq256) gets stressed out which already > have 100% of CPU7 while CPU0 is around 51.17%. Any more > recommendations? Is there anything we can do about optimization with > MSI? Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it seems you are hammering your bce0 interface. You might want to try using polling on bce0 and seeing if it keeps up with the traffic better. -- John Baldwin From ivoras at freebsd.org Thu Nov 13 13:31:12 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Thu Nov 13 13:31:18 2008 Subject: NUMA? In-Reply-To: <491B79BE.50800@elischer.org> References: <491B79BE.50800@elischer.org> Message-ID: Julian Elischer wrote: > There are other interesting effects too.. > > assigning network interrupts to processors that have good access to the > hardware AND the destination if you can.. 
UMA also seems to be sensitive to topology. While at that, how do you (if at all) deal with kernel memory allocations with respect to topology? Things that have their own thread or process are easy, but AFAIK there is a lot of "thread-agnostic" code?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 258 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-smp/attachments/20081113/edca562e/signature.pgp

From xiazhongqi at huawei.com Fri Nov 14 01:10:44 2008
From: xiazhongqi at huawei.com (Sam Xia)
Date: Fri Nov 14 01:10:50 2008
Subject: inquiry
In-Reply-To: <20081113120028.C8E0810656F3@hub.freebsd.org>
Message-ID: <000001c94636$c0d4ce40$2f096f0a@china.huawei.com>

Dear all,

I am a newcomer to the FreeBSD kernel. I am reading the code of the FreeBSD kernel. Can anyone help explain the purpose/usage/action of the routine "thread_single()" in kern_thread.c of FreeBSD 7.0? Thank you, everyone, for reading my email.

Best Regards,
Sam Xia

From ivoras at freebsd.org Fri Nov 14 02:10:29 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Fri Nov 14 02:10:35 2008
Subject: inquiry
In-Reply-To: <000001c94636$c0d4ce40$2f096f0a@china.huawei.com>
References: <20081113120028.C8E0810656F3@hub.freebsd.org> <000001c94636$c0d4ce40$2f096f0a@china.huawei.com>
Message-ID:

Sam Xia wrote:
> Dear all,
>
> I am a newcomer to the FreeBSD kernel. I am reading the code of the FreeBSD kernel.
> Can anyone help explain the purpose/usage/action of the routine
> "thread_single()" in kern_thread.c of FreeBSD 7.0?

Have you read the comment describing the function (it's there immediately before the function)?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 252 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-smp/attachments/20081114/67b95a7c/signature.pgp

From julian at elischer.org Fri Nov 14 10:41:46 2008
From: julian at elischer.org (Julian Elischer)
Date: Fri Nov 14 10:41:53 2008
Subject: inquiry
In-Reply-To:
References: <20081113120028.C8E0810656F3@hub.freebsd.org> <000001c94636$c0d4ce40$2f096f0a@china.huawei.com>
Message-ID: <491DBFA6.70705@elischer.org>

Ivan Voras wrote:
> Sam Xia wrote:
>> Dear all,
>>
>> I am a newcomer to the FreeBSD kernel. I am reading the code of the FreeBSD kernel.
>> Can anyone help explain the purpose/usage/action of the routine
>> "thread_single()" in kern_thread.c of FreeBSD 7.0?
>
> Have you read the comment describing the function (it's there
> immediately before the function)?
>

I wrote that a long time ago, and things have changed a lot since then, but..

There are times, in a threaded process, when a thread making some change to the state of the process must ensure that no other threads are running. There are several variants of this. For example:

Your thread is calling exit (or exec) and all the other threads must stop. Now, they can't just be killed (at least those in the kernel can't), as they may hold resources in the kernel that need to be released, so they are asked to suicide after releasing their resources. Your thread is allowed to proceed when there are no other threads alive (in your process).

Your thread is going to do some other action that requires that there be no memory changes in user space, or that resources be stable. In this case it will allow you to continue when all other threads have suspended at the user boundary (where they are guaranteed to not hold resources).
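The two cases described above can be modelled in a few lines of user-space C. This is a toy sketch, not the kernel code: the TOY_* names, toy_at_boundary() and toy_may_proceed() are hypothetical and only mirror the prose. The assumption is that in the exit case the other threads must all be gone, while in the boundary case it is enough that they are parked at the user boundary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy request modes, one per case described above (hypothetical names). */
enum toy_mode { TOY_SINGLE_EXIT, TOY_SINGLE_BOUNDARY };

/* Toy state of each *other* thread in the process. */
enum toy_state {
    TOY_RUNNING,    /* still running, may hold kernel resources */
    TOY_SUSPENDED,  /* parked at the user boundary, holds nothing */
    TOY_EXITED      /* released its resources and exited */
};

/* What another thread does once it reaches the user boundary. */
static enum toy_state
toy_at_boundary(enum toy_mode mode)
{
    return (mode == TOY_SINGLE_EXIT) ? TOY_EXITED : TOY_SUSPENDED;
}

/* May the requesting thread proceed, given the other threads' states? */
static bool
toy_may_proceed(enum toy_mode mode, const enum toy_state *others, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (others[i] == TOY_EXITED)
            continue;   /* a dead thread never blocks the requester */
        if (mode == TOY_SINGLE_BOUNDARY && others[i] == TOY_SUSPENDED)
            continue;   /* parked at the boundary is good enough here */
        return false;   /* still running, or merely suspended on exit */
    }
    return true;
}
```

In this model a thread that is only suspended still blocks an exit request, because it has to run again before it can release its resources and exit; whether the real thread_single() behaves exactly this way should be checked against the source.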
From xiazhongqi at huawei.com Fri Nov 14 22:32:54 2008
From: xiazhongqi at huawei.com (Sam Xia)
Date: Fri Nov 14 22:33:02 2008
Subject: freebsd-smp Digest, Vol 223, Issue 4
In-Reply-To: <20081114120026.294C21065801@hub.freebsd.org>
Message-ID: <001501c946eb$eb4c8810$2f096f0a@china.huawei.com>

Hi Ivan,

Thank you for your response. Yes, I have read the comments, but I am not clear on the difference between "SINGLE_EXIT" and "SINGLE_BOUNDARY". From the comments, I guess that this routine should suspend the other threads so that only one thread can run. But from the internal implementation of "thread_single", all other threads are woken up. I am very confused.

BR,
S.Xia

> Message: 4
> Date: Fri, 14 Nov 2008 11:10:52 +0100
> From: Ivan Voras
> Subject: Re: inquiry
> To: freebsd-smp@freebsd.org
> Message-ID:
> Content-Type: text/plain; charset="utf-8"
>
> Sam Xia wrote:
> > Dear all,
> >
> > I am a newcomer to the FreeBSD kernel. I am reading the code of the FreeBSD kernel.
> > Can anyone help explain the purpose/usage/action of the routine
> > "thread_single()" in kern_thread.c of FreeBSD 7.0?
>
> Have you read the comment describing the function (it's there
> immediately before the function)?
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: signature.asc > Type: application/pgp-signature > Size: 252 bytes > Desc: OpenPGP digital signature > Url : > http://lists.freebsd.org/pipermail/freebsd-smp/attachments/200 81114/67b95a7c/signature-0001.pgp > > ------------------------------ > > _______________________________________________ > freebsd-smp@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-smp > To unsubscribe, send any mail to "freebsd-smp-unsubscribe@freebsd.org" > > End of freebsd-smp Digest, Vol 223, Issue 4 > ******************************************* > From archimedes.gaviola at gmail.com Mon Nov 17 03:11:01 2008 From: archimedes.gaviola at gmail.com (Archimedes Gaviola) Date: Mon Nov 17 03:11:07 2008 Subject: CPU affinity with ULE scheduler In-Reply-To: <200811131128.55220.jhb@freebsd.org> References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811111216.37462.jhb@freebsd.org> <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com> <200811131128.55220.jhb@freebsd.org> Message-ID: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin wrote: > On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote: >> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote: >> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: >> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: >> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: >> >> >> To Whom It May Concerned: >> >> >> >> >> >> Can someone explain or share about ULE scheduler (latest version 2 if >> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing >> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD >> >> >> scheduler and as what I have observed especially on processing high >> >> >> network load traffic on multiple CPU cores, only one CPU were being >> >> >> stressed with network interrupt while the rests are mostly in idle >> >> >> state. 
This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom >> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the >> >> >> case. >> >> > >> >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on >> > the >> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going > to >> > end >> >> > up handling all the interrupts for bce0 and bce1. This not something > ULE >> > or >> >> > 4BSD have any control over. >> >> > >> >> > -- >> >> > John Baldwin >> >> > >> >> >> >> Hi John, >> >> >> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. >> >> >> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >> >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: > cpu0 >> >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: > cpu2 >> >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: > cpu3 >> >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: > cpu4 >> >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: > cpu5 >> >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: > cpu1 >> >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: > cpu6 >> >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% >> >> irq23: bce0 bce1 >> >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: > cpu7 >> >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% > pagezero >> >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd >> >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd >> >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: >> > clock s >> >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: > net >> >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd >> >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd >> >> >> >> Actually I was doing a network performance testing on this system with >> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a >> >> tool to generate big amount of traffic around 600Mbps-700Mbps >> >> traversing the FreeBSD system in bi-direction, 
meaning both network >> >> interfaces are receiving traffic. What happened was, the CPU (cpu7) >> >> that handles the (irq 23) on both interfaces consumed big amount of >> >> CPU utilization around 65.53% in which it affects other running >> >> applications and services like sshd and httpd. It's no longer >> >> accessible when traffic is bombarded. With the current situation of my >> >> FreeBSD system with only one CPU being stressed, I was thinking of >> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought >> >> my concern has something to do with the distributions of load on >> >> multiple CPU cores handled by the scheduler especially at the network >> >> level, processing network load. So, if it is more of interrupt >> >> handling and not on the scheduler, is there a way we can optimize it? >> >> Because if it still routed only to one CPU then for me it's still >> >> inefficient. Who handles interrupt scheduling for bounding CPU in >> >> order to prevent shared IRQ? Is there any improvements with >> >> FreeBSD-7.0 with regards to interrupt handling? >> > >> > It depends. In all likelihood, the interrupts from bce0 and bce1 are both >> > hardwired to the same interrupt pin and so they will always share the same >> > ithread when using the legacy INTx interrupts. However, bce(4) parts do >> > support MSI, and if you try a newer OS snap (6.3 or later) these devices >> > should use MSI in which case each NIC would be assigned to a separate CPU. > I >> > would suggest trying 7.0 or a 7.1 release candidate and see if it does >> > better. >> > >> > -- >> > John Baldwin >> > >> >> Hi John, >> >> I try 7.0 release and each network interface were already allocated >> separately on different CPU. Here, MSI is already working. 
>> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >> 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: > cpu6 >> 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: > cpu3 >> 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: > cpu4 >> 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: > cpu2 >> 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: > cpu1 >> 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: > bce0 >> 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 >> 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: > bce1 >> 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 >> 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 >> 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: > clock s >> 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd >> 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow >> 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: > Giant t >> 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd >> 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd >> 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: > atkbd0 >> 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down >> >> The bce0 interface interrupt (irq256) gets stressed out which already >> have 100% of CPU7 while CPU0 is around 51.17%. Any more >> recommendations? Is there anything we can do about optimization with >> MSI? > > Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it > seems you are hammering your bce0 interface. You might want to try using > polling on bce0 and seeing if it keeps up with the traffic better. > > -- > John Baldwin > With net.isr.direct=0, my IBM system lessens CPU utilization per interface (bce0 and bce1) but swi1:net increase its utilization. Can you explained what's happening here? What does net.isr.direct do with the decrease of CPU utilization on its interface? 
I would really like to know what happens internally as packets are
received by the interfaces and processed, from the device interrupt up
to the software interrupt level, because I am confused about the effect
of enabling/disabling net.isr.direct in sysctl. Is there a tool we can
use to trace this path, to find out which part of the kernel is the
bottleneck, especially when net.isr.direct=1? By the way, with device
polling enabled the system experienced packet errors and the interface
throughput was worse, so I avoid using it.

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
   27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
   52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
   15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
   25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
   51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0

Regards,
Archimedes

From archimedes.gaviola at gmail.com  Mon Nov 17 03:36:41 2008
From: archimedes.gaviola at gmail.com (Archimedes Gaviola)
Date: Mon Nov 17 03:36:48 2008
Subject: CPU affinity with ULE scheduler
In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
	<200811111216.37462.jhb@freebsd.org>
	<42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com>
	<200811131128.55220.jhb@freebsd.org>
	<42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
Message-ID: <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com>

On Mon, Nov 17, 2008 at 7:11 PM, Archimedes Gaviola wrote:
> On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin wrote:
>> On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote:
>>> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote:
>>> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
>>> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote:
>>> >> > On Monday 10 November 2008
03:33:23 am Archimedes Gaviola wrote: >>> >> >> To Whom It May Concerned: >>> >> >> >>> >> >> Can someone explain or share about ULE scheduler (latest version 2 if >>> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing >>> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD >>> >> >> scheduler and as what I have observed especially on processing high >>> >> >> network load traffic on multiple CPU cores, only one CPU were being >>> >> >> stressed with network interrupt while the rests are mostly in idle >>> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom >>> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the >>> >> >> case. >>> >> > >>> >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on >>> > the >>> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going >> to >>> > end >>> >> > up handling all the interrupts for bce0 and bce1. This not something >> ULE >>> > or >>> >> > 4BSD have any control over. >>> >> > >>> >> > -- >>> >> > John Baldwin >>> >> > >>> >> >>> >> Hi John, >>> >> >>> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. 
>>> >> >>> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >>> >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: >> cpu0 >>> >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: >> cpu2 >>> >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: >> cpu3 >>> >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: >> cpu4 >>> >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: >> cpu5 >>> >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: >> cpu1 >>> >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: >> cpu6 >>> >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% >>> >> irq23: bce0 bce1 >>> >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: >> cpu7 >>> >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% >> pagezero >>> >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd >>> >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd >>> >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: >>> > clock s >>> >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: >> net >>> >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd >>> >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd >>> >> >>> >> Actually I was doing a network performance testing on this system with >>> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a >>> >> tool to generate big amount of traffic around 600Mbps-700Mbps >>> >> traversing the FreeBSD system in bi-direction, meaning both network >>> >> interfaces are receiving traffic. What happened was, the CPU (cpu7) >>> >> that handles the (irq 23) on both interfaces consumed big amount of >>> >> CPU utilization around 65.53% in which it affects other running >>> >> applications and services like sshd and httpd. It's no longer >>> >> accessible when traffic is bombarded. 
With the current situation of my >>> >> FreeBSD system with only one CPU being stressed, I was thinking of >>> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought >>> >> my concern has something to do with the distributions of load on >>> >> multiple CPU cores handled by the scheduler especially at the network >>> >> level, processing network load. So, if it is more of interrupt >>> >> handling and not on the scheduler, is there a way we can optimize it? >>> >> Because if it still routed only to one CPU then for me it's still >>> >> inefficient. Who handles interrupt scheduling for bounding CPU in >>> >> order to prevent shared IRQ? Is there any improvements with >>> >> FreeBSD-7.0 with regards to interrupt handling? >>> > >>> > It depends. In all likelihood, the interrupts from bce0 and bce1 are both >>> > hardwired to the same interrupt pin and so they will always share the same >>> > ithread when using the legacy INTx interrupts. However, bce(4) parts do >>> > support MSI, and if you try a newer OS snap (6.3 or later) these devices >>> > should use MSI in which case each NIC would be assigned to a separate CPU. >> I >>> > would suggest trying 7.0 or a 7.1 release candidate and see if it does >>> > better. >>> > >>> > -- >>> > John Baldwin >>> > >>> >>> Hi John, >>> >>> I try 7.0 release and each network interface were already allocated >>> separately on different CPU. Here, MSI is already working. 
>>> >>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >>> 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: >> cpu6 >>> 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: >> cpu3 >>> 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: >> cpu4 >>> 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: >> cpu2 >>> 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: >> cpu1 >>> 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: >> bce0 >>> 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 >>> 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: >> bce1 >>> 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 >>> 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 >>> 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: >> clock s >>> 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd >>> 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow >>> 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: >> Giant t >>> 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd >>> 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd >>> 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: >> atkbd0 >>> 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down >>> >>> The bce0 interface interrupt (irq256) gets stressed out which already >>> have 100% of CPU7 while CPU0 is around 51.17%. Any more >>> recommendations? Is there anything we can do about optimization with >>> MSI? >> >> Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it >> seems you are hammering your bce0 interface. You might want to try using >> polling on bce0 and seeing if it keeps up with the traffic better. >> >> -- >> John Baldwin >> > > With net.isr.direct=0, my IBM system lessens CPU utilization per > interface (bce0 and bce1) but swi1:net increase its utilization. > Can you explained what's happening here? What does net.isr.direct do > with the decrease of CPU utilization on its interface? 
I really wanted > to know what happened internally during the packets being processed > and received by the interfaces then to the device interrupt up to the > software interrupt level because I am confused when enabling/disabling > net.isr.direct in sysctl. Is there a tool that can we used to trace > this process just to be able to know which part of the kernel internal > is doing the bottleneck especially when net.isr.direct=1? By the way > with device polling enabled, the system experienced packet errors and > the interface throughput is worst, so I avoid using it though. > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > > 16 root 1 171 ki31 0K 16K CPU10 a 86:06 89.06% idle: cpu10 > 27 root 1 -44 - 0K 16K CPU1 1 34:37 82.67% swi1: net > 52 root 1 -68 - 0K 16K WAIT b 51:59 59.77% irq32: bce1 > 15 root 1 171 ki31 0K 16K RUN b 69:28 43.16% idle: cpu11 > 25 root 1 171 ki31 0K 16K RUN 1 115:35 24.27% idle: cpu1 > 51 root 1 -68 - 0K 16K CPU10 a 35:21 13.48% irq31: bce0 > > > Regards, > Archimedes > One more thing, I observed that when net.isr.direct=1, bce0 is using irq256 and bce1 is using irq257 while net.isr.direct=0, bce0 is now using irq31 and bce1 is using irq32. What makes it different? From ivoras at freebsd.org Mon Nov 17 05:18:02 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Mon Nov 17 05:18:09 2008 Subject: CPU affinity with ULE scheduler In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811111216.37462.jhb@freebsd.org> <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com> <200811131128.55220.jhb@freebsd.org> <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> Message-ID: Archimedes Gaviola wrote: > With net.isr.direct=0, my IBM system lessens CPU utilization per > interface (bce0 and bce1) but swi1:net increase its utilization. > Can you explained what's happening here? 
What does net.isr.direct do
> with the decrease of CPU utilization on its interface?

The system has a choice between processing the packets in the interrupt
handler (the "irq:bce" process) or in a dedicated network process (the
"swi:net" process). This is about protocol handling not simply receiving
packets.

With net.isr.direct you're toggling between those two options. If
"direct" is 1, the packets are processed in the interrupt handler; if
it's 0, the processing is delegated to swi. It's set to 1 by default
because this setting should yield best latency.

In both cases the code path a packet must go through is very similar: it
has to be received, then processed through firewalls and network stack
code, then delivered to application(s), so it's a serial process. There
are things that could be better parallelized in the stack and people are
working on them, but they will not be finished any time soon.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 252 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-smp/attachments/20081117/5c417e8c/signature.pgp

From jhb at freebsd.org  Mon Nov 17 13:13:50 2008
From: jhb at freebsd.org (John Baldwin)
Date: Mon Nov 17 13:13:58 2008
Subject: CPU affinity with ULE scheduler
In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
	<200811131128.55220.jhb@freebsd.org>
	<42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
Message-ID: <200811171609.15913.jhb@freebsd.org>

On Monday 17 November 2008 06:11:00 am Archimedes Gaviola wrote:
> On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin wrote:
> > On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote:
> >> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote:
> >> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
> >> >> On Tue, Nov 11, 2008 at 6:33
AM, John Baldwin wrote: > >> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: > >> >> >> To Whom It May Concerned: > >> >> >> > >> >> >> Can someone explain or share about ULE scheduler (latest version 2 if > >> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing > >> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD > >> >> >> scheduler and as what I have observed especially on processing high > >> >> >> network load traffic on multiple CPU cores, only one CPU were being > >> >> >> stressed with network interrupt while the rests are mostly in idle > >> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom > >> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the > >> >> >> case. > >> >> > > >> >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on > >> > the > >> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going > > to > >> > end > >> >> > up handling all the interrupts for bce0 and bce1. This not something > > ULE > >> > or > >> >> > 4BSD have any control over. > >> >> > > >> >> > -- > >> >> > John Baldwin > >> >> > > >> >> > >> >> Hi John, > >> >> > >> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. 
> >> >> > >> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >> >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: > > cpu0 > >> >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: > > cpu2 > >> >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: > > cpu3 > >> >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: > > cpu4 > >> >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: > > cpu5 > >> >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: > > cpu1 > >> >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: > > cpu6 > >> >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% > >> >> irq23: bce0 bce1 > >> >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: > > cpu7 > >> >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% > > pagezero > >> >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd > >> >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd > >> >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: > >> > clock s > >> >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: > > net > >> >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd > >> >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd > >> >> > >> >> Actually I was doing a network performance testing on this system with > >> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a > >> >> tool to generate big amount of traffic around 600Mbps-700Mbps > >> >> traversing the FreeBSD system in bi-direction, meaning both network > >> >> interfaces are receiving traffic. What happened was, the CPU (cpu7) > >> >> that handles the (irq 23) on both interfaces consumed big amount of > >> >> CPU utilization around 65.53% in which it affects other running > >> >> applications and services like sshd and httpd. It's no longer > >> >> accessible when traffic is bombarded. 
With the current situation of my > >> >> FreeBSD system with only one CPU being stressed, I was thinking of > >> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought > >> >> my concern has something to do with the distributions of load on > >> >> multiple CPU cores handled by the scheduler especially at the network > >> >> level, processing network load. So, if it is more of interrupt > >> >> handling and not on the scheduler, is there a way we can optimize it? > >> >> Because if it still routed only to one CPU then for me it's still > >> >> inefficient. Who handles interrupt scheduling for bounding CPU in > >> >> order to prevent shared IRQ? Is there any improvements with > >> >> FreeBSD-7.0 with regards to interrupt handling? > >> > > >> > It depends. In all likelihood, the interrupts from bce0 and bce1 are both > >> > hardwired to the same interrupt pin and so they will always share the same > >> > ithread when using the legacy INTx interrupts. However, bce(4) parts do > >> > support MSI, and if you try a newer OS snap (6.3 or later) these devices > >> > should use MSI in which case each NIC would be assigned to a separate CPU. > > I > >> > would suggest trying 7.0 or a 7.1 release candidate and see if it does > >> > better. > >> > > >> > -- > >> > John Baldwin > >> > > >> > >> Hi John, > >> > >> I try 7.0 release and each network interface were already allocated > >> separately on different CPU. Here, MSI is already working. 
> >> > >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >> 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: > > cpu6 > >> 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: > > cpu3 > >> 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: > > cpu4 > >> 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: > > cpu2 > >> 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: > > cpu1 > >> 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: > > bce0 > >> 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 > >> 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: > > bce1 > >> 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 > >> 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 > >> 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: > > clock s > >> 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd > >> 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow > >> 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: > > Giant t > >> 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd > >> 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd > >> 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: > > atkbd0 > >> 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down > >> > >> The bce0 interface interrupt (irq256) gets stressed out which already > >> have 100% of CPU7 while CPU0 is around 51.17%. Any more > >> recommendations? Is there anything we can do about optimization with > >> MSI? > > > > Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it > > seems you are hammering your bce0 interface. You might want to try using > > polling on bce0 and seeing if it keeps up with the traffic better. > > > > -- > > John Baldwin > > > > With net.isr.direct=0, my IBM system lessens CPU utilization per > interface (bce0 and bce1) but swi1:net increase its utilization. > Can you explained what's happening here? What does net.isr.direct do > with the decrease of CPU utilization on its interface? 
I really wanted
> to know what happened internally during the packets being processed
> and received by the interfaces then to the device interrupt up to the
> software interrupt level because I am confused when enabling/disabling
> net.isr.direct in sysctl. Is there a tool that can we used to trace
> this process just to be able to know which part of the kernel internal
> is doing the bottleneck especially when net.isr.direct=1? By the way
> with device polling enabled, the system experienced packet errors and
> the interface throughput is worst, so I avoid using it though.
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>
>    16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
>    27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
>    52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
>    15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
>    25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
>    51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0

With net.isr.direct=1, the ithread tries to pass the received packets up
to IP/UDP/TCP/socket directly. With net.isr.direct=0, the ithread places
received packets on a queue and sends a signal to 'swi1: net'. The swi
thread wakes up, pulls the packets off of the queue and sends them to
IP/UDP/TCP/socket.
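
The two paths described above can be sketched as a toy model in Python.
This is only an illustration of the dispatch idea, not kernel code: the
names NetIsrModel, protocol_input and work_by_thread are made up for the
example, and everything the real stack does around this (locking,
scheduling, queue limits) is left out.

```python
from collections import deque


def protocol_input(packet, log, thread):
    """Stand-in for IP/UDP/TCP/socket processing.

    It only records which thread did the protocol work for which packet.
    """
    log.append((thread, packet))


class NetIsrModel:
    """Toy model of the net.isr.direct switch (hypothetical, for illustration)."""

    def __init__(self, direct):
        self.direct = direct   # models net.isr.direct (True = 1, False = 0)
        self.queue = deque()   # packets waiting for the swi thread
        self.log = []          # (thread name, packet) pairs

    def ithread_receive(self, irq_name, packets):
        """Called in the context of a NIC interrupt thread."""
        for pkt in packets:
            if self.direct:
                # direct=1: the ithread runs the protocol code itself
                protocol_input(pkt, self.log, irq_name)
            else:
                # direct=0: the ithread only queues; swi1: net runs later
                self.queue.append(pkt)

    def swi_net_run(self):
        """The single software interrupt thread drains the shared queue."""
        while self.queue:
            protocol_input(self.queue.popleft(), self.log, "swi1: net")


def work_by_thread(model):
    """Count how many packets each thread processed."""
    counts = {}
    for thread, _ in model.log:
        counts[thread] = counts.get(thread, 0) + 1
    return counts
```

Feeding the same packets through NetIsrModel(direct=True) and
NetIsrModel(direct=False) shows the shift seen in the top output: with
direct=1 the protocol work is charged to the irq threads of bce0 and
bce1, with direct=0 it all ends up in the one 'swi1: net' thread while
the irq threads only enqueue.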
-- John Baldwin From jhb at freebsd.org Mon Nov 17 13:13:57 2008 From: jhb at freebsd.org (John Baldwin) Date: Mon Nov 17 13:14:09 2008 Subject: CPU affinity with ULE scheduler In-Reply-To: <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com> References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com> Message-ID: <200811171609.54527.jhb@freebsd.org> On Monday 17 November 2008 06:36:40 am Archimedes Gaviola wrote: > On Mon, Nov 17, 2008 at 7:11 PM, Archimedes Gaviola > wrote: > > On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin wrote: > >> On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote: > >>> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote: > >>> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: > >>> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: > >>> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: > >>> >> >> To Whom It May Concerned: > >>> >> >> > >>> >> >> Can someone explain or share about ULE scheduler (latest version 2 if > >>> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing > >>> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD > >>> >> >> scheduler and as what I have observed especially on processing high > >>> >> >> network load traffic on multiple CPU cores, only one CPU were being > >>> >> >> stressed with network interrupt while the rests are mostly in idle > >>> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom > >>> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the > >>> >> >> case. > >>> >> > > >>> >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on > >>> > the > >>> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going > >> to > >>> > end > >>> >> > up handling all the interrupts for bce0 and bce1. 
This not something > >> ULE > >>> > or > >>> >> > 4BSD have any control over. > >>> >> > > >>> >> > -- > >>> >> > John Baldwin > >>> >> > > >>> >> > >>> >> Hi John, > >>> >> > >>> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. > >>> >> > >>> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >>> >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: > >> cpu0 > >>> >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: > >> cpu2 > >>> >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: > >> cpu3 > >>> >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: > >> cpu4 > >>> >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: > >> cpu5 > >>> >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: > >> cpu1 > >>> >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: > >> cpu6 > >>> >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% > >>> >> irq23: bce0 bce1 > >>> >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: > >> cpu7 > >>> >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% > >> pagezero > >>> >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd > >>> >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd > >>> >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: > >>> > clock s > >>> >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: > >> net > >>> >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd > >>> >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd > >>> >> > >>> >> Actually I was doing a network performance testing on this system with > >>> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a > >>> >> tool to generate big amount of traffic around 600Mbps-700Mbps > >>> >> traversing the FreeBSD system in bi-direction, meaning both network > >>> >> interfaces are receiving traffic. 
What happened was, the CPU (cpu7) > >>> >> that handles the (irq 23) on both interfaces consumed big amount of > >>> >> CPU utilization around 65.53% in which it affects other running > >>> >> applications and services like sshd and httpd. It's no longer > >>> >> accessible when traffic is bombarded. With the current situation of my > >>> >> FreeBSD system with only one CPU being stressed, I was thinking of > >>> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought > >>> >> my concern has something to do with the distributions of load on > >>> >> multiple CPU cores handled by the scheduler especially at the network > >>> >> level, processing network load. So, if it is more of interrupt > >>> >> handling and not on the scheduler, is there a way we can optimize it? > >>> >> Because if it still routed only to one CPU then for me it's still > >>> >> inefficient. Who handles interrupt scheduling for bounding CPU in > >>> >> order to prevent shared IRQ? Is there any improvements with > >>> >> FreeBSD-7.0 with regards to interrupt handling? > >>> > > >>> > It depends. In all likelihood, the interrupts from bce0 and bce1 are both > >>> > hardwired to the same interrupt pin and so they will always share the same > >>> > ithread when using the legacy INTx interrupts. However, bce(4) parts do > >>> > support MSI, and if you try a newer OS snap (6.3 or later) these devices > >>> > should use MSI in which case each NIC would be assigned to a separate CPU. > >> I > >>> > would suggest trying 7.0 or a 7.1 release candidate and see if it does > >>> > better. > >>> > > >>> > -- > >>> > John Baldwin > >>> > > >>> > >>> Hi John, > >>> > >>> I try 7.0 release and each network interface were already allocated > >>> separately on different CPU. Here, MSI is already working. 
> >>> > >>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >>> 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: > >> cpu6 > >>> 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: > >> cpu3 > >>> 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: > >> cpu4 > >>> 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: > >> cpu2 > >>> 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: > >> cpu1 > >>> 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: > >> bce0 > >>> 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 > >>> 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: > >> bce1 > >>> 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 > >>> 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 > >>> 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: > >> clock s > >>> 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd > >>> 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow > >>> 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: > >> Giant t > >>> 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd > >>> 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd > >>> 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: > >> atkbd0 > >>> 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down > >>> > >>> The bce0 interface interrupt (irq256) gets stressed out which already > >>> have 100% of CPU7 while CPU0 is around 51.17%. Any more > >>> recommendations? Is there anything we can do about optimization with > >>> MSI? > >> > >> Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it > >> seems you are hammering your bce0 interface. You might want to try using > >> polling on bce0 and seeing if it keeps up with the traffic better. > >> > >> -- > >> John Baldwin > >> > > > > With net.isr.direct=0, my IBM system lessens CPU utilization per > > interface (bce0 and bce1) but swi1:net increase its utilization. > > Can you explained what's happening here? 
What does net.isr.direct do
> > with the decrease of CPU utilization on its interface? I really wanted
> > to know what happened internally during the packets being processed
> > and received by the interfaces then to the device interrupt up to the
> > software interrupt level because I am confused when enabling/disabling
> > net.isr.direct in sysctl. Is there a tool that can we used to trace
> > this process just to be able to know which part of the kernel internal
> > is doing the bottleneck especially when net.isr.direct=1? By the way
> > with device polling enabled, the system experienced packet errors and
> > the interface throughput is worst, so I avoid using it though.
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >
> >    16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
> >    27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
> >    52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
> >    15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
> >    25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
> >    51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0
> >
> >
> > Regards,
> > Archimedes
> >
>
> One more thing, I observed that when net.isr.direct=1, bce0 is using
> irq256 and bce1 is using irq257 while net.isr.direct=0, bce0 is now
> using irq31 and bce1 is using irq32. What makes it different?

That is not from net.isr.direct. irq256/257 is when the bce devices are
using MSI. irq31/32 is when the bce devices are using INTx.

-- 
John Baldwin

From takawata at init-main.com  Wed Nov 19 03:42:03 2008
From: takawata at init-main.com (Takanori Watanabe)
Date: Wed Nov 19 03:42:15 2008
Subject: Core i7 anyone else?
Message-ID: <200811191144.mAJBi3Lg004559@sana.init-main.com>

Hi, I recently bought a Core i7 machine (for 145,000 JPY, about $1500)
and it sometimes hangs oddly.
When it is in that state, only some specific processes keep working and
it replies to ping, but it does not return any useful information.

I suspect it may be caused by CPU power management, so I disabled
almost all CPU power management features in the BIOS settings.

Has anyone else encountered this kind of trouble?
Also, on this machine buildworld under SCHED_ULE (15 min.) is slower
than under SCHED_4BSD (12 min.).


===dmesg===
http://www.init-main.com/corei7.dmesg
or
http://pastebin.com/m187f77aa
(if host is down)

=====DSDT====
http://www.init-main.com/corei7.asl
or
http://pastebin.com/m6879984a

==some sysctls==
hw.machine: i386
hw.model: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
hw.ncpu: 8
hw.byteorder: 1234
hw.physmem: 3202322432
hw.usermem: 2956083200
hw.pagesize: 4096
hw.floatingpoint: 1
hw.machine_arch: i386
hw.realmem: 3211264000
==
machdep.enable_panic_key: 0
machdep.adjkerntz: -32400
machdep.wall_cmos_clock: 1
machdep.disable_rtc_set: 0
machdep.disable_mtrrs: 0
machdep.guessed_bootdev: 2686451712
machdep.idle: acpi
machdep.idle_available: spin, mwait, mwait_hlt, hlt, acpi,
machdep.hlt_cpus: 0
machdep.prot_fault_translation: 0
machdep.panic_on_nmi: 1
machdep.kdb_on_nmi: 1
machdep.tsc_freq: 2684011396
machdep.i8254_freq: 1193182
machdep.acpi_timer_freq: 3579545
machdep.acpi_root: 1024240
machdep.hlt_logical_cpus: 0
machdep.logical_cpus_mask: 254
machdep.hyperthreading_allowed: 1
==
kern.sched.preemption: 0
kern.sched.topology_spec:
 0, 1, 2, 3, 4, 5, 6, 7
kern.sched.steal_thresh: 3
kern.sched.steal_idle: 1
kern.sched.steal_htt: 1
kern.sched.balance_interval: 133
kern.sched.balance: 1
kern.sched.affinity: 1
kern.sched.idlespinthresh: 4
kern.sched.idlespins: 10000
kern.sched.static_boost: 160
kern.sched.preempt_thresh: 0
kern.sched.interact: 30
kern.sched.slice: 13
kern.sched.name: ULE
===

From koitsu at FreeBSD.org  Wed Nov 19 03:57:57 2008
From: koitsu at FreeBSD.org (Jeremy Chadwick)
Date: Wed Nov 19 03:58:04 2008
Subject: Core i7 anyone else?
In-Reply-To: <200811191144.mAJBi3Lg004559@sana.init-main.com>
References: <200811191144.mAJBi3Lg004559@sana.init-main.com>
Message-ID: <20081119114714.GA85533@icarus.home.lan>

On Wed, Nov 19, 2008 at 08:44:03PM +0900, Takanori Watanabe wrote:
> Hi, I recently bought Core i7 machine(for 145,000JPY: about $1500)
> and sometimes hangs up oddly.
> When in the state, some specific process only works and
> replys ping, but not reply any useful information.
>
> I suspect it may caused by CPU power management, so I cut
> almost all CPU power management feature on BIOS parameter.
>
> Are there any people encouterd such trouble?
> And on this machine build world in SCHED_ULE(15min.) is slower
> than SCHED_4BSD(12min.).
>
>
> ===dmesg===
> http://www.init-main.com/corei7.dmesg
> or
> http://pastebin.com/m187f77aa
> (if host is down)
>
> =====DSDT====
> http://www.init-main.com/corei7.asl
> or
> http://pastebin.com/m6879984a
>
> ==some sysctls==
> hw.machine: i386
> hw.model: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
> hw.ncpu: 8
> hw.byteorder: 1234
> hw.physmem: 3202322432
> hw.usermem: 2956083200
> hw.pagesize: 4096
> hw.floatingpoint: 1
> hw.machine_arch: i386
> hw.realmem: 3211264000
> ==
> machdep.enable_panic_key: 0
> machdep.adjkerntz: -32400
> machdep.wall_cmos_clock: 1
> machdep.disable_rtc_set: 0
> machdep.disable_mtrrs: 0
> machdep.guessed_bootdev: 2686451712
> machdep.idle: acpi
> machdep.idle_available: spin, mwait, mwait_hlt, hlt, acpi,
> machdep.hlt_cpus: 0
> machdep.prot_fault_translation: 0
> machdep.panic_on_nmi: 1
> machdep.kdb_on_nmi: 1
> machdep.tsc_freq: 2684011396
> machdep.i8254_freq: 1193182
> machdep.acpi_timer_freq: 3579545
> machdep.acpi_root: 1024240
> machdep.hlt_logical_cpus: 0
> machdep.logical_cpus_mask: 254
> machdep.hyperthreading_allowed: 1
> ==
> kern.sched.preemption: 0
> kern.sched.topology_spec:
>
> 0, 1, 2, 3, 4, 5, 6, 7
>
>
>
>
> kern.sched.steal_thresh: 3
> kern.sched.steal_idle: 1
> kern.sched.steal_htt: 1
> kern.sched.balance_interval: 133
> kern.sched.balance: 1
> kern.sched.affinity: 1
> kern.sched.idlespinthresh: 4
> kern.sched.idlespins: 10000
> kern.sched.static_boost: 160
> kern.sched.preempt_thresh: 0
> kern.sched.interact: 30
> kern.sched.slice: 13
> kern.sched.name: ULE
> ===

When building world/kernel, do you see odd behaviour (on CURRENT) such
as the load average being absurdly high, or processes (anything; sh,
make, mutt, etc.) getting stuck in bizarre states?

These things are what caused my buildworld/buildkernel times to increase
(compared to RELENG_7). I was using ULE entirely (on CURRENT and
RELENG_7), but did not try 4BSD.

I documented my experience.

http://wiki.freebsd.org/JeremyChadwick/Bizarre_CURRENT_experience

I have no idea if your problem is the same as mine. This is purely
speculative on my part. (And readers of that Wiki article should note
that the problem was not hardware-related)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

From ivoras at freebsd.org Wed Nov 19 04:05:11 2008
From: ivoras at freebsd.org (Ivan Voras)
Date: Wed Nov 19 04:05:18 2008
Subject: Core i7 anyone else?
In-Reply-To: <200811191144.mAJBi3Lg004559@sana.init-main.com>
References: <200811191144.mAJBi3Lg004559@sana.init-main.com>
Message-ID: <4923FF7E.1080101@freebsd.org>

Takanori Watanabe wrote:
> Hi, I recently bought Core i7 machine(for 145,000JPY: about $1500)
> and sometimes hangs up oddly.
> When in the state, some specific process only works and
> replys ping, but not reply any useful information.
>
> I suspect it may caused by CPU power management, so I cut
> almost all CPU power management feature on BIOS parameter.
>
> Are there any people encouterd such trouble?
> And on this machine build world in SCHED_ULE(15min.) is slower
> than SCHED_4BSD(12min.).
I don't know but this:

> ===dmesg===
> http://www.init-main.com/corei7.dmesg
> or
> http://pastebin.com/m187f77aa
> (if host is down)

CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz (2684.00-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x106a4  Stepping = 4
  Features=0xbfebfbff
  Features2=0x98e3bd
  AMD Features=0x28100000
  AMD Features2=0x1
  Cores per package: 8
  Logical CPUs per core: 2
real memory  = 3211264000 (3062 MB)
avail memory = 3143983104 (2998 MB)
ACPI APIC Table: <7522MS A7522100>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
 cpu4 (AP): APIC ID: 4
 cpu5 (AP): APIC ID: 5
 cpu6 (AP): APIC ID: 6
 cpu7 (AP): APIC ID: 7

is a bit in conflict with this:

> kern.sched.topology_spec:
>
> 0, 1, 2, 3, 4, 5, 6, 7
>
>
>

From what I know of its architecture i7 has hyperthreading - i.e. the
CPU has 4 "real" cores which are hyperthreaded, so you get 8 logical
CPUs total. It probably also includes a different way of enumerating its
topology which might have caused wrong topology detection and your
slowdown in buildworld. (the CPU also has L3 cache, but I think it's not
looked up in topology detection).

I don't know if this particular error could be responsible for your
lockups - probably not. The CPU also introduces some big changes in
power management (dynamic powerdown of individual cores) which could
cause them - but I can't help you there. Are you sure it's not something
trivial like overheating?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 252 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-smp/attachments/20081119/c476043b/signature.pgp

From archimedes.gaviola at gmail.com Wed Nov 26 01:32:45 2008
From: archimedes.gaviola at gmail.com (Archimedes Gaviola)
Date: Wed Nov 26 01:32:52 2008
Subject: CPU affinity with ULE scheduler
In-Reply-To: <200811171609.54527.jhb@freebsd.org>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
    <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
    <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com>
    <200811171609.54527.jhb@freebsd.org>
Message-ID: <42e3d810811260132l53027184s102e8a5e3b70dfb2@mail.gmail.com>

> In both cases the code path a packet must go through is very similar: it
> has to be received, then processed through firewalls and network stack
> code, then delivered to application(s), so it's a serial process. There
> are things that could be better parallelized in the stack and people are
> working on them, but they will not be finished any time soon.

Ah okay, so the project is moving towards network stack parallelism.
What is the benefit of a parallelized network stack compared to the
current serialized one? Are there any known issues with the serialized
network stack when dealing with multiple CPUs? If so, in which aspects,
components or subsystems of the operating system? With network stack
parallelism, what changes to the operating system are necessary? How
should network processing be optimized with a parallelized network
stack?
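(To make the question concrete, here is a minimal sketch of one way
per-connection parallelism can work in principle. It is not FreeBSD
code. It assumes each connection's address/port tuple is hashed to a
worker, so that different connections can be handled on different CPUs
while packets of the same connection stay in arrival order. The names
flow_to_worker, dispatch and NUM_WORKERS are hypothetical, and CRC32 is
only a stand-in for whatever hash a real stack would use.)

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 8  # hypothetical: one worker per CPU on an 8-CPU machine


def flow_to_worker(src_ip, src_port, dst_ip, dst_port, num_workers=NUM_WORKERS):
    """Map a connection 4-tuple to a worker index (illustrative only)."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    # The same tuple always hashes to the same worker, so one connection
    # is never processed by two workers at once.
    return zlib.crc32(key) % num_workers


def dispatch(packets, num_workers=NUM_WORKERS):
    """Group packets per worker, keeping arrival order within each flow."""
    queues = defaultdict(list)
    for pkt in packets:
        queues[flow_to_worker(*pkt["flow"], num_workers)].append(pkt)
    return queues
```

(The trade-off in this sketch is that a single busy connection still
lands on one worker, so it only helps when traffic is spread over many
connections.)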
I have gone through a technical paper on the Internet that evaluates
network stack parallelism strategies in modern operating systems:
http://www.cs.rice.edu/CS/Architecture/docs/willmann-usenix06.pdf
It describes approaches to implementing a parallelized network stack
and uses FreeBSD as the prototype for the different approaches. From
this I would like to know which approach FreeBSD is implementing: is it
message-based parallelism or connection-based parallelism?

Thanks,
Archimedes

From archimedes.gaviola at gmail.com Wed Nov 26 03:18:45 2008
From: archimedes.gaviola at gmail.com (Archimedes Gaviola)
Date: Wed Nov 26 03:18:53 2008
Subject: CPU affinity with ULE scheduler
In-Reply-To: <200811171609.54527.jhb@freebsd.org>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
    <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
    <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com>
    <200811171609.54527.jhb@freebsd.org>
Message-ID: <42e3d810811260318j2656ac57k465c56d1c2b0dcf2@mail.gmail.com>

> Is there a tool that can we used to trace
> this process just to be able to know which part of the kernel internal
> is doing the bottleneck especially when net.isr.direct=1? By the way
> with device polling enabled, the system experienced packet errors and
> the interface throughput is worst, so I avoid using it though.
>

I was really looking for a tool that shows how packets are processed
from the interface up through the network stack to the applications,
but I haven't found one. What I have found is the LOCK_PROFILING tool.
I'm sure it does not really answer my concern, but I tried it because I
need to know something about the locks FreeBSD uses. Some people say
there are a lot of factors and variables affecting network performance
in FreeBSD, so I gave this tool a try.
I also got valuable info from this link:
http://markmail.org/message/3uqxi4pipvvoy6jx#query:lock%20profiling%20freebsd+page:1+mid:ymqgrxqf4min54zd+state:results

Instead of the IBM machine with Broadcom NICs, I used another machine
with 4 x Quad-Core AMD64 CPUs, still with Broadcom NICs, on FreeBSD-7.1
BETA2. I collected results with traffic and without traffic. With
traffic, I used both TCP and UDP to generate load: UDP for upload and
TCP for download in a back-to-back setup. What I have found is that
there is a high wait_total on some of the following locks when there is
traffic:

   max     total wait_total    count avg wait_avg cnt_hold cnt_lock name
   517  24761291    6165864  4460995   5        1   552124  1558183 net/route.c:293 (sleep mutex:radix node head)
   277   1427082     140797   354220   4        0    14476    20674 amd64/amd64/io_apic.c:212 (spin mutex:icu)
    33     25275      20744     5401   4        3        0     5400 amd64/amd64/mp_machdep.c:974 (spin mutex:sched lock 4)
 17283   3346679     104214   107262  31        0     4545     4072 kern/kern_sysctl.c:1334 (sleep mutex:Giant)
   257     28599        386     1302  21        0       35       30 vm/vm_fault.c:667 (sleep mutex:vm object)
   282   2821743       2673   977635   2        0      926      552 net/if_ethersubr.c:405 (sleep mutex:bce1)
    22    743637     157239   256274   2        0     5304    48357 dev/random/randomdev_soft.c:308 (spin mutex:entropy harvest mutex)
   301  16301894     881827  1255534  12        0   241491    45973 dev/bce/if_bce.c:5016 (sleep mutex:bce0)
   273   1228787      55458   103863  11        0     3733     4736 kern/subr_sleepqueue.c:232 (spin mutex:sleepq chain)
   624   4682305    1339783  1251253   3        1    32664   254211 dev/bce/if_bce.c:4320 (sleep mutex:bce1)

With lock profiling, how do we know that a certain kernel structure or
function is causing contention? I have only a little knowledge about
mutexes; can someone elaborate on these, especially sleep and spin
mutexes? Unfortunately the log is too big for the mailing list, so I
have attached the complete log in compressed format.

Thanks,
Archimedes

From salvataha1 at live.co.za Wed Nov 26 05:55:28 2008
From: salvataha1 at live.co.za (Col.
Salva Taha) Date: Wed Nov 26 05:55:34 2008 Subject: PROPERTIES. Message-ID: <200811261355.mAQDtQXk017310@vulcan.highspd.net> Good Day, I wish to introduce myself to you.I am Col.Salva Taha a top Sudanese Goverment official who opposed the war in Dafur in my country Sudan.Due to my oppostion to the war,the goverment of my country has been persecuting me.Consequently my wife,children and I managed to enter a red cross airplane that was evacuating foreigners and we are presently in Cape Town,South Africa. We wish to invest in properties in your country with your assistance and cooperation.If you are in a good position to help my family, please send an email to the email address below indicating your desire to help my family invest the funds in your country and beyond. I await your email. best regards. God bless, Col. Salva Taha Email:salvataha1@live.co.za
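
Coming back to the LOCK_PROFILING output posted earlier in this thread:
one rough way to see which acquisition points are worth a closer look
is to sort the rows by wait_total. The sketch below assumes the column
order shown in that message (max, total, wait_total, count, avg,
wait_avg, cnt_hold, cnt_lock, name) and that a larger wait_total means
more accumulated time spent waiting for that lock. The helper names
parse_rows and rank_by_wait are made up for this example, and SAMPLE
holds four rows copied from the posted table.

```python
import re

# Eight integer columns followed by a free-form name column.
ROW = re.compile(r"^\s*" + r"(\d+)\s+" * 8 + r"(.+?)\s*$")
FIELDS = ("max", "total", "wait_total", "count",
          "avg", "wait_avg", "cnt_hold", "cnt_lock")


def parse_rows(text):
    """Parse lock-profiling rows; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        row = dict(zip(FIELDS, map(int, m.groups()[:8])))
        row["name"] = m.group(9)
        rows.append(row)
    return rows


def rank_by_wait(rows, top=3):
    """Return the `top` rows with the largest wait_total."""
    return sorted(rows, key=lambda r: r["wait_total"], reverse=True)[:top]


SAMPLE = """\
max total wait_total count avg wait_avg cnt_hold cnt_lock name
517 24761291 6165864 4460995 5 1 552124 1558183 net/route.c:293 (sleep mutex:radix node head)
301 16301894 881827 1255534 12 0 241491 45973 dev/bce/if_bce.c:5016 (sleep mutex:bce0)
624 4682305 1339783 1251253 3 1 32664 254211 dev/bce/if_bce.c:4320 (sleep mutex:bce1)
257 28599 386 1302 21 0 35 30 vm/vm_fault.c:667 (sleep mutex:vm object)
"""

for r in rank_by_wait(parse_rows(SAMPLE)):
    print(r["wait_total"], r["name"])
```

On these sample rows the radix node head mutex at net/route.c:293 comes
out first, followed by the two bce driver locks. A high rank is only a
hint about where to look, not proof that the lock is the bottleneck.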