From fabrizio.invernizzi at telecomitalia.it Mon Aug 3 08:46:28 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Mon Aug 3 08:46:35 2009
Subject: Test on 10GBE Intel based network card
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>

Hi all,

I am doing some tests on a BSD system with a 10GbE Intel based network card, and I have some doubts about the configuration, since the performance I am seeing looks (very) poor.

This is the system I am testing on:

- HP 380 G5 (Xeon X5420, CPU speed: 2.50 GHz, bus speed: 1333 MHz, L2 cache size: 12 MB, L2 cache speed: 2.5 GHz) with one quad-core CPU installed.
- Network card: Silicom PE10G2i-LR - Dual Port Fiber (LR) 10 Gigabit Ethernet PCI Express Server Adapter, Intel based (chip 82598EB). Driver ixgbe-1.8.6.
- FreeBSD 7.2-RELEASE (64 bit) with these options compiled into the kernel:

    options ZERO_COPY_SOCKETS    # Turn on zero copy send code
    options HZ=1000
    options BPF_JITTER

I worked on the driver settings in order to have big TX/RX rings and a low interrupt rate (traffic latency is not an issue).

In order to tune the system I started with some echo request tests.
These are the maximum bandwidths I can send without loss:

- 64 byte packets: 312 Mbps (1.64% CPU idle)
- 512 byte packets: 2117 Mbps (1.63% CPU idle)
- 1492 byte packets: 5525 Mbps (1.93% CPU idle)

Am I right to consider these figures lower than expected?
The system is just managing network traffic!

Now I have started with netgraph tests, in particular with ng_bpf, and the overall system is doing even worse.
I sent some HTTP traffic (597-byte packets) and configured an ng_bpf node to filter TCP traffic out from the incoming interface (ix0).
If I use ngctl msg to read the counters on the ng_bpf node, I see extremely poor performance:

- Sending 96 Mbps of this traffic, I measured 0.1% packet loss. This looks very strange. Maybe a counter bug?
- Sending 5500 Mbps, netgraph (not the network card driver) is losing 21% of the packets sent. Below is a snapshot of the CPU load under this traffic:

CPU:  0.0% user,  0.0% nice, 87.0% system,  9.1% interrupt,  3.9% idle
Mem: 16M Active, 317M Inact, 366M Wired, 108K Cache, 399M Buf, 7222M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root        1 171 ki31     0K    16K RUN    2  20.2H 68.80% idle: cpu2
   11 root        1 171 ki31     0K    16K RUN    3  20.1H 64.70% idle: cpu3
   14 root        1 171 ki31     0K    16K RUN    0  20.2H 64.26% idle: cpu0
   13 root        1 171 ki31     0K    16K RUN    1  20.2H 63.67% idle: cpu1
   38 root        1 -68    -     0K    16K CPU1   1   1:28 34.67% ix0 rxq
   40 root        1 -68    -     0K    16K CPU2   0   1:26 34.18% ix0 rxq
   34 root        1 -68    -     0K    16K CPU3   3   1:27 34.08% ix0 rxq
   36 root        1 -68    -     0K    16K RUN    2   1:26 34.08% ix0 rxq
   33 root        1 -68    -     0K    16K WAIT   3   0:40  4.05% irq260: ix0
   39 root        1 -68    -     0K    16K WAIT   2   0:41  3.96% irq263: ix0
   35 root        1 -68    -     0K    16K WAIT   0   0:39  3.66% irq261: ix0
   37 root        1 -68    -     0K    16K WAIT   1   0:42  3.47% irq262: ix0
   16 root        1 -32    -     0K    16K WAIT   0  14:53  2.49% swi4: clock sio

Am I missing something?
Does anyone know of some (more) system tuning that would support a higher traffic rate?

Any help is greatly appreciated.

Fabrizio

------------------------------------------------------------------
Telecom Italia
Fabrizio INVERNIZZI
Technology - TILAB
Accesso Fisso e Trasporto
Via Reiss Romoli, 274 10148 Torino
Tel. +39 011 2285497
Mob. +39 3316001344
Fax +39 06 41867287

Questo messaggio e i suoi allegati sono indirizzati esclusivamente alle persone indicate. La diffusione, copia o qualsiasi altra azione derivante dalla conoscenza di queste informazioni sono rigorosamente vietate. Qualora abbiate ricevuto questo documento per errore siete cortesemente pregati di darne immediata comunicazione al mittente e di provvedere alla sua distruzione, Grazie.

This e-mail and any attachments is confidential and may contain privileged information intended for the addressee(s) only.
Dissemination, copying, printing or use by anybody else is unauthorised. If you are not the intended recipient, please delete this message and any attachments and advise the sender by return e-mail, Thanks.

From stefan.lambrev at moneybookers.com Mon Aug 3 09:41:42 2009
From: stefan.lambrev at moneybookers.com (Stefan Lambrev)
Date: Mon Aug 3 09:41:49 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
Message-ID: <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>

Hi,

The limitation that you see is about the maximum number of packets that FreeBSD can handle - it looks like your best performance is reached at 64 byte packets?
Am I correct that the maximum you can reach is around 639,000 packets per second?
Also, you are not routing the traffic; instead the server handles the requests itself and eats CPU to reply?

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From fabrizio.invernizzi at telecomitalia.it Mon Aug 3 09:53:58 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Mon Aug 3 09:54:05 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
    <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>

Hi

> -----Original Message-----
> From: Stefan Lambrev [mailto:stefan.lambrev@moneybookers.com]
> Sent: Monday 3 August 2009 11.22
> To: Invernizzi Fabrizio
> Cc: freebsd-performance@freebsd.org
> Subject: Re: Test on 10GBE Intel based network card
>
> Hi,
>
> The limitation that you see is about the max number of packets that
> FreeBSD can handle - it looks like your best performance is reached at
> 64 byte packets?

If you mean in terms of packets per second, you are right. These are the packet rates measured during the tests:

64 byte: 610119 pps
512 byte: 516917 pps
1492 byte: 464962 pps

> Am I correct that the maximum you can reach is around 639,000 packets
> per second?

Yes, as you can see the maximum is 610119 pps.
Where does this limit come from?

> Also you are not routing the traffic, but instead the server handles
> the requests itself and eats CPU to reply?

Correct. In these first tests I want to "tune" the system, so I am using what is (let me say) the worst-case scenario.
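
A note on the arithmetic behind the figures quoted so far. The sketch below converts the loss-free bandwidths from the first message into packets per second and compares them with the rates reported in the follow-up. It assumes "Mbps" means 10^6 bits per second and that the stated packet size is the full size counted by the traffic generator; neither is confirmed in the thread. Under that reading the computed values land within about 0.5% of the reported ones. The 639,000 estimate appears to correspond to reading "M" as 2^20 instead. The line-rate helper is an added reference point, assuming the packet size is the whole Ethernet frame and adding the standard 20 bytes of preamble and inter-frame gap per frame.

```python
# Loss-free bandwidth (Mbps) per packet size (bytes), from the first message.
MEASURED_MBPS = {64: 312, 512: 2117, 1492: 5525}
# Packet rates (pps) reported in the follow-up message.
REPORTED_PPS = {64: 610119, 512: 516917, 1492: 464962}


def pps(mbps, size_bytes, mega=10**6):
    """Packets per second for a given bit rate and packet size."""
    return mbps * mega / (size_bytes * 8)


def line_rate_pps(size_bytes, link_bps=10 * 10**9, overhead_bytes=20):
    """Theoretical Ethernet maximum on a 10 Gb/s link.

    Assumption: size_bytes is the full frame, and each frame costs 20 extra
    bytes on the wire (8 bytes preamble/SFD + 12 bytes inter-frame gap).
    """
    return link_bps / ((size_bytes + overhead_bytes) * 8)


for size, mbps in MEASURED_MBPS.items():
    print(f"{size:>5} B: {pps(mbps, size):>9.0f} pps computed, "
          f"{REPORTED_PPS[size]} pps reported, "
          f"{line_rate_pps(size):>10.0f} pps at line rate")

# Reading "M" as 2**20 gives the estimate of roughly 639,000 pps.
print(round(pps(312, 64, mega=2**20)))
```

Seen this way, the roughly flat packet rate across all three packet sizes (460k to 610k pps) is consistent with the suggestion in the reply that the limit is per packet, not per byte.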
From stefan.lambrev at moneybookers.com Mon Aug 3 10:15:13 2009
From: stefan.lambrev at moneybookers.com (Stefan Lambrev)
Date: Mon Aug 3 10:15:20 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
    <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
    <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
Message-ID: <18AAC16B-3CC0-4C70-A009-00A325AB5932@moneybookers.com>

Hi,

On Aug 3, 2009, at 12:53 PM, Invernizzi Fabrizio wrote:

> If you are meaning in term of Packet per second, you are right.
> These are the packet per second measured during tests:
>
> 64 byte: 610119 Pps
> 512 byte: 516917 Pps
> 1492 byte: 464962 Pps
>
>> Am I correct that the maximum you can reach is around 639,000 packets
>> per second?
>
> Yes, as you can see the maximum is 610119 Pps.
> Where does this limit come from?

I don't know - the tests I did before were with SYN packets (random source), which was my worst-case scenario, and the server CPUs were busy generating MD5 checksums for the "syncache" (around 35% of the time).

If I compare my results with yours, you beat me by a factor of 2.5, maybe because you use ICMP for the test and your processor is better than my test stations :)
Also, my experience is only with gigabit cards (em driver) and FreeBSD 7.something_before_1, where the em thread was eating 100% CPU.
If you are lucky, LOCK_PROFILING(9) will help you see where the CPUs spend their time; if not, you will see a kernel panic :)
Once the problematic locks are identified they can be reworked, but I think the first part is already done and work on the second has already started.

In my experience increasing hw.em.rxd and hw.em.txd yielded better results, but I think ixgbe already comes tuned by default, as it does not yet have to support such a large number of different cards.
Also, at the time of my tests there was no support for multiple queues in the OS even if they were supported by the hardware, which has changed in 7.2 (?)

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From fabrizio.invernizzi at telecomitalia.it Mon Aug 3 10:17:53 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Mon Aug 3 10:17:59 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To:
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
    <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
    <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A4560DF961@GRFMBX702BA020.griffon.local>

Hi,

These are the sysctl variables you pointed out:

kern.ipc.somaxconn: 128
net.inet.tcp.recvspace: 65536
net.inet.tcp.sendspace: 32768
kern.ipc.shmmax: 33554432
kern.ipc.shmmni: 192
kern.ipc.shmseg: 128
kern.ipc.semmni: 10
net.local.stream.sendspace: 8192
net.local.stream.recvspace: 8192
net.inet.tcp.local_slowstart_flightsize: 4
net.inet.tcp.nolocaltimewait: 0
net.inet.tcp.hostcache.expire: 3600

kern.maxusers: 384
kern.ipc.nmbclusters: 65635
kern.ipc.maxsockets: 65635
kern.ipc.maxsockbuf: 262144
net.inet.tcp.tcbhashsize: 512
net.inet.tcp.hostcache.hashsize: 512

Fabrizio

________________________________________
From: István [mailto:leccine@gmail.com]
Sent: Monday 3 August 2009 12.02
To: Invernizzi Fabrizio
Cc: Stefan Lambrev; freebsd-performance@freebsd.org
Subject: Re: Test on 10GBE Intel based network card

what about your sysctls?

I would like to see what have you done yet.

Here is a previous conversation about that:
http://www.mail-archive.com/freebsd-performance@freebsd.org/msg02293.html

Actually I am not aware of the changes in 7.2 or what could be the best value set for you, worth to try to fine tune these.

Regards,
Istvan

From leccine at gmail.com Mon Aug 3 10:20:59 2009
From: leccine at gmail.com (István)
Date: Mon Aug 3 10:21:06 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A4560DF961@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
    <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
    <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
    <36A93B31228D3B49B691AD31652BCAE9A4560DF961@GRFMBX702BA020.griffon.local>
Message-ID:

Nice. To analyse the bottlenecks you might want to use DTrace, if it works with 7.2; I am not sure about that.

Regards,
Istvan

--
the sun shines for all

From leccine at gmail.com Mon Aug 3 10:22:36 2009
From: leccine at gmail.com (István)
Date: Mon Aug 3 10:22:48 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
    <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
    <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
Message-ID:

What about your sysctls?

I would like to see what you have done so far.

Here is a previous conversation about that:
http://www.mail-archive.com/freebsd-performance@freebsd.org/msg02293.html

Actually, I am not aware of the changes in 7.2 or of what the best values would be for you, but it is worth trying to fine-tune these.

Regards,
Istvan

--
the sun shines for all

From fabrizio.invernizzi at telecomitalia.it Mon Aug 3 10:27:00 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Mon Aug 3 10:27:08 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <18AAC16B-3CC0-4C70-A009-00A325AB5932@moneybookers.com>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
    <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
    <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
    <18AAC16B-3CC0-4C70-A009-00A325AB5932@moneybookers.com>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A4560DF96A@GRFMBX702BA020.griffon.local>

> If you are lucky LOCK_PROFILING(9) will help you to see where the CPUs
> spend their time, if not you will see kernel panic :)

I will check, thanks for the hint.

> In my experience increasing hw.em.rxd and hw.em.txd yielded better
> results, but I think ixgbe already comes tuned by default
> as it still doesn't have to support such a large number of different
> cards.

I did some tuning in the driver code and recompiled the kernel in order to reduce context switching (interrupt mitigation), since the driver does not support POLLING.

> Also at the time of my tests there were not support for multi queues
> in the OS even if they were supported by the HW, which is changed in
> 7.2 (?)

It looks like multi-queue is working, since I can see the load distributed over the 4 cores.

Fabrizio
From jfvogel at gmail.com Mon Aug 3 15:49:25 2009
From: jfvogel at gmail.com (Jack Vogel)
Date: Mon Aug 3 15:49:32 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A4560DF96A@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <18AAC16B-3CC0-4C70-A009-00A325AB5932@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF96A@GRFMBX702BA020.griffon.local>
Message-ID: <2a41acea0908030820o40438f6erda78927733529a9@mail.gmail.com>

If you go to FreeBSD 8 you will get the improved stack code, and the RX/TX
queue pairs will be pinned to cpus. It should improve performance.

Make sure you have enough mbuf memory allocated, and try increasing the
descriptors.

Jack
From julian at elischer.org Mon Aug 3 16:11:17 2009
From: julian at elischer.org (Julian Elischer)
Date: Mon Aug 3 16:11:23 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
Message-ID: <4A77094C.8030308@elischer.org>

Invernizzi Fabrizio wrote:
> Hi
>
>
>> -----Original Message-----
>> From: Stefan Lambrev [mailto:stefan.lambrev@moneybookers.com]
>> Sent: lunedì 3 agosto 2009 11.22
>> To: Invernizzi Fabrizio
>> Cc: freebsd-performance@freebsd.org
>> Subject: Re: Test on 10GBE Intel based network card
>>
>> Hi,
>>
>> The limitation that you see is about the max number of packets that
>> FreeBSD can handle - it looks like your best performance is reached at
>> 64 byte packets?
>
> If you are meaning in term of Packet per second, you are right. These are the packet per second measured during tests:
>
> 64 byte: 610119 Pps
> 512 byte: 516917 Pps
> 1492 byte: 464962 Pps
>
>
>> Am I correct that the maximum you can reach is around 639,000 packets
>> per second?
>
> Yes, as you can see the maximum is 610119 Pps.
> Where does this limit come from?

ah that's the whole point of tuning :-)
there are several possibilities:
1/ the card's interrupts are probably attached to only 1 cpu,
   so that cpu can do no more work
2/ if more than 1 cpu is working, it may be that there is a lock in
   heavy contention somewhere.

is the machine still responsive to other networks while
running at maximum capacity on this network?
(make sure that the other networks are on a different CPU; hmm, I can't remember how to do that :-)

>
>> Also you are not routing the traffic, but instead the server handles
>> the requests itself and eat CPU to reply?
>
> Correct. In these first tests I want to "tune" the system, so I am using the (let me say) worst scenario.
>
>

From raykinsella78 at gmail.com Mon Aug 3 16:22:43 2009
From: raykinsella78 at gmail.com (Ray Kinsella)
Date: Mon Aug 3 16:22:50 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <584ec6bb0908030819vee58480p43989b742e1b7fd2@mail.gmail.com>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <584ec6bb0908030819vee58480p43989b742e1b7fd2@mail.gmail.com>
Message-ID: <584ec6bb0908030914m74b79dceq9af2581e1b02449a@mail.gmail.com>

Hi Fabrizio,

Ignore my last mail direct to you, 638976 PPS is awful.
(Today is a national holiday here, my brain is not switched on.)

To me it looks like interrupt coalescing is not switched on for some reason.
Are you passing any parameters to the driver in boot.conf?

Could you retest with vmstat switched on ("vmstat 3") and send us the output?
I expect we are going to see a lot of interrupts.

Regards

Ray Kinsella

On Mon, Aug 3, 2009 at 4:19 PM, Ray Kinsella wrote:

> Hi Fabrizio,
>
> I am an Intel Network Software Engineer, I test/improve the performance of
> network device drivers among other things. I will do my best to help you.
>
> The first thing I would say is that I haven't used the 10GB NICs yet, but a
> rate of ~5 million PPS ((312*1024*1024)/64) is good or bad depending on what
> you are doing, i.e. how many NICs are you sending on and how many are you
> receiving on? In a situation where you operate cards in pairs, for instance all the
> traffic from card A goes to card B and all the traffic from card B goes to
> card A, I would consider this quite low. In a situation where any card will
> talk to any card, for instance traffic from card A can go to card B, C or D,
> 5 million pps might be ok.
>
> The first thing you need to do is play with irq affinities, check out this
> blog post http://bramp.net/blog/post. You want to set the irq affinities
> such that the rx threads are bound to one smp thread on one core. The next
> thing is, if possible, to configure network cards that have a 1-1
> relationship to execute on separate smp threads on the same core. This
> should improve the line rate you are seeing.
>
> Regards
>
> Ray Kinsella
From raykinsella78 at gmail.com Mon Aug 3 16:36:52 2009
From: raykinsella78 at gmail.com (Ray Kinsella)
Date: Mon Aug 3 16:37:05 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <2a41acea0908030820o40438f6erda78927733529a9@mail.gmail.com>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <18AAC16B-3CC0-4C70-A009-00A325AB5932@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF96A@GRFMBX702BA020.griffon.local>
 <2a41acea0908030820o40438f6erda78927733529a9@mail.gmail.com>
Message-ID: <584ec6bb0908030907i4371d2d1y63fc23bb889ae06d@mail.gmail.com>

Hi all,

cpuset is the command to set a cpu affinity; there are details at
http://bramp.net/blog/post

vmstat -z is the command you need to determine whether there is contention
for mbufs, although the cpu usage does not suggest the system is memory
constrained.

Regards

Ray Kinsella
From fabrizio.invernizzi at telecomitalia.it Tue Aug 4 07:55:19 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Tue Aug 4 07:55:26 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <4A77094C.8030308@elischer.org>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <4A77094C.8030308@elischer.org>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>

> >> The limitation that you see is about the max number of packets that
> >> FreeBSD can handle - it looks like your best performance is reached at
> >> 64 byte packets?
> >
> > If you are meaning in term of Packet per second, you are right. These are the packet per second measured during tests:
> >
> > 64 byte: 610119 Pps
> > 512 byte: 516917 Pps
> > 1492 byte: 464962 Pps
> >
> >
> >> Am I correct that the maximum you can reach is around 639,000 packets
> >> per second?
> >
> > Yes, as you can see the maximum is 610119 Pps.
> > Where does this limit come from?
>
> ah that's the whole point of tuning :-)
> there are several possibilities:
> 1/ the card's interrupts are probably attached to only 1 cpu,
> so that cpu can do no more work

This seems not to be the problem. See below a top snapshot taken during a 64-byte packet storm:

last pid:  8552;  load averages:  0.40,  0.09,  0.03    up 0+20:36:58  09:40:29
124 processes: 13 running, 73 sleeping, 38 waiting
CPU:  0.0% user,  0.0% nice, 86.3% system, 12.3% interrupt,  1.5% idle
Mem: 13M Active, 329M Inact, 372M Wired, 68K Cache, 399M Buf, 7207M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   11 root        1 171 ki31     0K    16K RUN    3  20.2H 51.17% idle: cpu3
   14 root        1 171 ki31     0K    16K RUN    0  20.2H 50.88% idle: cpu0
   12 root        1 171 ki31     0K    16K RUN    2  20.2H 50.49% idle: cpu2
   13 root        1 171 ki31     0K    16K RUN    1  20.2H 50.10% idle: cpu1
   42 root        1 -68    -     0K    16K RUN    1  14:20 36.47% ix0 rxq
   38 root        1 -68    -     0K    16K CPU0   0  14:15 36.08% ix0 rxq
   44 root        1 -68    -     0K    16K CPU2   2  14:08 34.47% ix0 rxq
   40 root        1 -68    -     0K    16K CPU3   3  13:42 32.37% ix0 rxq
....

It looks like the 4 rxq processes are bound to the 4 available cores with equal distribution.

> 2/ if more than 1 cpu is working, it may be that there is a lock in
> heavy contention somewhere.

This I think is the problem. I am trying to understand how to
1- see where the heavy contention is (context switching? some limiting setting?)
2- mitigate it

>
> is the machine still responsive to other networks while
> running at maximum capacity on this network? (make sure that
> the other networks are on a different CPU (hmm I can't remember how to
> do that :-).

From fabrizio.invernizzi at telecomitalia.it Tue Aug 4 08:14:49 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Tue Aug 4 08:14:56 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <584ec6bb0908030914m74b79dceq9af2581e1b02449a@mail.gmail.com>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <584ec6bb0908030819vee58480p43989b742e1b7fd2@mail.gmail.com>
 <584ec6bb0908030914m74b79dceq9af2581e1b02449a@mail.gmail.com>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A456967237@GRFMBX702BA020.griffon.local>

Ray,

>To me it looks like interrupt coalescing is not switched on for some reason.
>Are you passing any parameters to the driver in boot.conf.
This is my loader.conf:

kern.ipc.nmbclusters=65635
kern.hz=1000
net.bpf_jitter.enable=1
# net.graph.threads=32
# if_em_load="YES"
# NETGRAPH TUNING
net.graph.maxdata=1024
kern.ipc.somaxconn=4096
net.inet.tcp.recvspace=78840
net.inet.tcp.sendspace=78840
kern.ipc.shmmax=67108864
kern.ipc.shmmni=200
kern.ipc.shmseg=128
kern.ipc.semmni=70
net.local.stream.sendspace=82320
net.local.stream.recvspace=82320
net.inet.tcp.local_slowstart_flightsize=10
net.inet.tcp.nolocaltimewait=1
net.inet.tcp.hostcache.expire=3900
kern.maxusers=512
kern.ipc.nmbclusters=32768
kern.ipc.maxsockets=81920
kern.ipc.maxsockbuf=1048576
net.inet.tcp.tcbhashsize=4096
net.inet.tcp.hostcache.hashsize=1024

>Could you retest with vmstat switched on "vmstat 3" and send us the output.
>I expect we are going to see a lot of interrupts.

Sending 535714 pps (64-byte packets):

INTRUDER-64# vmstat 3
 procs      memory      page                     disks     faults        cpu
 r b w    avm   fre  flt  re  pi  po   fr  sr da0 da1    in   sy    cs us sy  id
 0 0 0 95420K 7203M   19   0   0   0   17   0   0   0   642   66  1078  0  2  98
 0 0 0 95420K 7203M    0   0   0   0    0   0   2   0    18   65   527  0  0 100
 0 0 0 95420K 7203M    0   0   0   0    0   0   0   0    18   67   527  0  0 100
 0 0 0 95420K 7203M    0   0   0   0    0   0   0   0    17   64   525  0  0 100
 0 0 0 95420K 7203M    0   0   0   0    0   0   0   0 31526   64 31402  0 87  13
 0 0 0 95420K 7203M    0   0   0   0    0   0   0   0 36767   64 33320  0 99   1
 0 0 0 95420K 7203M  423   0   0   0  406   0   0   0 36174  384 28107  0 99   1
 0 0 0 95420K 7203M    0   0   0   0    0   0   0   0 36706   64 27043  0 99   1
 0 0 0 95420K 7203M    0   0   0   0    0   0   0   0 34006   64 13117  0 91   9
 2 0 0 95420K 7203M    0   0   0   0    0   0   0   0    17   64   550  0  1  99
 0 0 0 95420K 7203M    0   0   0   0    3   0   3   0    19   68   507  0  0 100

dev.ix.0.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 1.7.4
dev.ix.0.%driver: ix
dev.ix.0.%location: slot=0 function=0
dev.ix.0.%pnpinfo: vendor=0x8086 device=0x10c6 subvendor=0x8086 subdevice=0xa15f class=0x020000
dev.ix.0.%parent: pci3
dev.ix.0.stats: -1
dev.ix.0.debug: -1
dev.ix.0.flow_control: 0
dev.ix.0.enable_lro: 1

Adaptive Interrupt Mitigation is enabled:

dev.ix.0.enable_aim: 1

I did not change the AIM settings since they are quite complex to tune. I tried to reverse engineer the AIM algorithm (see attached picture), but I can't obtain tangible improvements by playing with these parameters. My understanding is that I should reduce low_latency, but it does not seem to work.

dev.ix.0.low_latency: 128
dev.ix.0.ave_latency: 400
dev.ix.0.bulk_latency: 1200

Not sure about this:

dev.ix.0.hdr_split: 0

Not sure about the meaning of:

dev.ix.0.rx_processing_limit: 100

These are the settings I am using in the ixgbe driver:

#define DEFAULT_TXD 1024
#define PERFORM_TXD 2048
#define MAX_TXD 4096
#define MIN_TXD 64

#define DEFAULT_RXD 1024
#define PERFORM_RXD 2048
#define MAX_RXD 4096
#define MIN_RXD 64

#define IXGBE_TX_CLEANUP_THRESHOLD (adapter->num_tx_desc / 1)
#define IXGBE_TX_OP_THRESHOLD (adapter->num_tx_desc / 4)

I saw a good performance improvement when setting IXGBE_TX_CLEANUP_THRESHOLD to the tx queue size. This is easy to understand, since it (greatly) reduces context switching in the send path by reducing the number of times the txq function is called. (Of course with this setting send latency is increased, but this is not an issue.)

I can't understand the meaning of these parameters and whether they could help:

/*
 * This parameter controls the maximum no of times the driver will loop in
 * the isr. Minimum Value = 1
 */
#define MAX_LOOP 10

/*
 * This parameter controls the duration of transmit watchdog timer.
 */
#define IXGBE_TX_TIMEOUT 5 /* set to 5 seconds */

Fabrizio
From fabrizio.invernizzi at telecomitalia.it Tue Aug 4 10:26:29 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Tue Aug 4 10:26:37 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <4A77094C.8030308@elischer.org>
 <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A4569672AD@GRFMBX702BA020.griffon.local>

Hi all.

Going on with my tests, I noticed that the ratio of received packets per second to interrupts per second is always about 14 (a very low ratio in my opinion).

If I disable AIM via sysctl

sysctl -w dev.ix.0.enable_aim=0
dev.ix.0.enable_aim: 1 -> 0

this ratio remains the same.

Something is going wrong with interrupt coalescing, but I can't find out what.
Any idea?

Fabrizio
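The ratio of about 14 packets per interrupt mentioned above can be reproduced from numbers already posted in the thread. This is a minimal sketch, assuming that the vmstat "in" column during the test is dominated by the ix0 interrupts and that the card actually receives the full 535714 pps offered rate; the function name is illustrative only.

```python
# Interrupt rates (vmstat "in" column) sampled while the 64-byte test was
# running, copied from the "vmstat 3" output posted earlier in the thread.
interrupts_per_s = [36767, 36174, 36706]

# Send rate stated for that test run.
offered_pps = 535714


def packets_per_interrupt(pps, irq_rate):
    """Average number of packets handled per interrupt."""
    return pps / irq_rate


for rate in interrupts_per_s:
    ratio = packets_per_interrupt(offered_pps, rate)
    print(f"{rate} interrupts/s -> {ratio:.1f} packets per interrupt")
```

With these inputs the ratio stays between roughly 14.5 and 14.9 for the three samples, which is consistent with the figure of about 14 quoted in the message.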
From julian at elischer.org Tue Aug 4 16:17:18 2009
From: julian at elischer.org (Julian Elischer)
Date: Tue Aug 4 16:17:24 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <4A77094C.8030308@elischer.org>
 <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>
Message-ID: <4A785F20.8050807@elischer.org>

Invernizzi Fabrizio wrote:
>>>> The limitation that you see is about the max number of packets that
>>>> FreeBSD can handle - it looks like your best performance is reached at
>>>> 64 byte packets?
>>> If you are meaning in term of Packet per second, you are right. These
>>> are the packet per second measured during tests:
>>> 64 byte: 610119 Pps
>>> 512 byte: 516917 Pps
>>> 1492 byte: 464962 Pps
>>>
>>>> Am I correct that the maximum you can reach is around 639,000 packets
>>>> per second?
>>> Yes, as you can see the maximum is 610119 Pps.
>>> Where does this limit come from?
>> ah that's the whole point of tuning :-)
>> there are several possibilities:
>> 1/ the card's interrupts are probably attached to only 1 cpu,
>> so that cpu can do no more work
>
> This seems not to be the problem.
> See below a top snapshot during a 64byte-long packet storm
>
> last pid:  8552;  load averages:  0.40,  0.09,  0.03    up 0+20:36:58  09:40:29
> 124 processes: 13 running, 73 sleeping, 38 waiting
> CPU:  0.0% user,  0.0% nice, 86.3% system, 12.3% interrupt,  1.5% idle
> Mem: 13M Active, 329M Inact, 372M Wired, 68K Cache, 399M Buf, 7207M Free
> Swap: 2048M Total, 2048M Free
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>    11 root        1 171 ki31     0K    16K RUN    3  20.2H 51.17% idle: cpu3
>    14 root        1 171 ki31     0K    16K RUN    0  20.2H 50.88% idle: cpu0
>    12 root        1 171 ki31     0K    16K RUN    2  20.2H 50.49% idle: cpu2
>    13 root        1 171 ki31     0K    16K RUN    1  20.2H 50.10% idle: cpu1
>    42 root        1 -68    -     0K    16K RUN    1  14:20 36.47% ix0 rxq
>    38 root        1 -68    -     0K    16K CPU0   0  14:15 36.08% ix0 rxq
>    44 root        1 -68    -     0K    16K CPU2   2  14:08 34.47% ix0 rxq
>    40 root        1 -68    -     0K    16K CPU3   3  13:42 32.37% ix0 rxq
> ....
>
> It looks like the 4 rxq processes are bound to the 4 available cores
> with equal distribution.
>
>> 2/ if more than 1 cpu is working, it may be that there is a lock in
>> heavy contention somewhere.
>
> This I think is the problem. I am trying to understand how to
> 1- see where the heavy contention is (context switching? Some limiting setting?)
> 2- mitigate it
>

there is a lock profiling tool that right now I can't remember the name
of..

look it up with google :-) FreeBSD lock profiling tool

ah, first hit...

http://blogs.epfl.ch/article/23832

>> is the machine still responsive to other networks while
>> running at maximum capacity on this network? (make sure that
>> the other networks are on a different CPU (hmm I can't remember how to
>> do that :-).

From jfvogel at gmail.com Tue Aug 4 16:41:34 2009
From: jfvogel at gmail.com (Jack Vogel)
Date: Tue Aug 4 16:41:40 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <4A785F20.8050807@elischer.org>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <4A77094C.8030308@elischer.org>
 <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>
 <4A785F20.8050807@elischer.org>
Message-ID: <2a41acea0908040941y39f16c8cocb84b001e1e9f0de@mail.gmail.com>

Your nmbclusters is very low, you list it twice so I'm assuming the
second value is what it ends up being, 32K :(

I would set it to:

kern.ipc.nmbclusters=262144

Also, I thought you were using the current driver, but now it looks like
you are using something fairly old, use my latest code which is 1.8.8

Jack

From fabrizio.invernizzi at telecomitalia.it Wed Aug 5 09:04:24 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Wed Aug 5 09:04:31 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <2a41acea0908040941y39f16c8cocb84b001e1e9f0de@mail.gmail.com>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <4A77094C.8030308@elischer.org>
 <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>
 <4A785F20.8050807@elischer.org>
 <2a41acea0908040941y39f16c8cocb84b001e1e9f0de@mail.gmail.com>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A456967403@GRFMBX702BA020.griffon.local>

Hi Jack,

Now I have upgraded the driver to 1.8.6 (official intel last release),
but no better performance. I am about to test with the settings you are
suggesting.

Using the same configuration I used before, I see a reduction in
interrupts (from 1 for 14 packets in 1.7.6 to 1 for 40 packets in
1.8.6). This leads to 5% to 10% of free CPU, but no higher packet rate.
I tried working with the AIM settings, but no luck.

Where can I download your 1.8.8 driver version?

Fabrizio

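The interrupt coalescing figures reported above (one interrupt per 14 packets with driver 1.7.6, one per 40 packets with 1.8.6) can be turned into approximate interrupt rates. What follows is only a minimal Python sketch: the function name is made up for illustration, and it assumes the 64-byte packet rate quoted in this thread (610119 pps) holds for both driver versions.

```python
def interrupts_per_second(packets_per_second, packets_per_interrupt):
    """Average interrupt rate implied by a packet rate and a coalescing ratio."""
    return packets_per_second / packets_per_interrupt

# 64-byte packet rate reported in the thread; assumed equal for both drivers.
PPS_64_BYTE = 610119

for driver, ratio in (("1.7.6", 14), ("1.8.6", 40)):
    rate = interrupts_per_second(PPS_64_BYTE, ratio)
    print(f"driver {driver}: about {rate:,.0f} interrupts/s")
```

Under that assumption the interrupt rate drops to roughly a third, which would fit the extra idle CPU that was observed. The work done per packet in the receive path is not changed by coalescing, which may be why the packet rate did not move.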
From fabrizio.invernizzi at telecomitalia.it Wed Aug 5 10:13:26 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Wed Aug 5 10:13:33 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <2a41acea0908040941y39f16c8cocb84b001e1e9f0de@mail.gmail.com>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <4A77094C.8030308@elischer.org>
 <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>
 <4A785F20.8050807@elischer.org>
 <2a41acea0908040941y39f16c8cocb84b001e1e9f0de@mail.gmail.com>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A45696743F@GRFMBX702BA020.griffon.local>

No improvement with kern.ipc.nmbclusters=262144 and 1.8.6 driver :<(((((

++fabrizio

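To put the packet rates quoted in this thread in perspective, they can be converted to bandwidth and compared with the theoretical 10 Gigabit Ethernet frame rate. This is a minimal Python sketch, not a measurement tool. It assumes the quoted packet sizes are full Ethernet frame sizes and that every frame costs an extra 20 bytes on the wire (8 bytes of preamble and start delimiter plus a 12 byte inter-frame gap). The function names are illustrative only.

```python
LINK_BPS = 10_000_000_000   # 10 Gigabit Ethernet
WIRE_OVERHEAD_BYTES = 20    # preamble + start delimiter (8) + inter-frame gap (12)

def throughput_mbps(pps, frame_bytes):
    """Bandwidth carried by frames of the given size, in Mbit/s."""
    return pps * frame_bytes * 8 / 1e6

def line_rate_pps(frame_bytes, link_bps=LINK_BPS):
    """Maximum frames per second the link can carry for a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)

# Packet rates reported in the thread, keyed by packet size in bytes.
measured = {64: 610119, 512: 516917, 1492: 464962}

for size, pps in measured.items():
    share = pps / line_rate_pps(size)
    print(f"{size:>5} bytes: {throughput_mbps(pps, size):7.0f} Mbit/s, "
          f"{share:6.1%} of line rate")
```

With these assumptions the 64-byte and 512-byte rates correspond to the 312 Mbps and 2117 Mbps figures from the first message. The packet rate stays in a narrow band (about 465k to 610k pps) while the packet size grows more than twenty-fold, which suggests a per-packet cost rather than a bandwidth limit.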
From grarpamp at gmail.com Fri Aug 7 09:40:17 2009
From: grarpamp at gmail.com (grarpamp)
Date: Fri Aug 7 16:27:17 2009
Subject: RELENG_7 heavy disk = system crawls
Message-ID:

Hi. I'm running RELENG_7 on an older single P4. It has a lot of
disk on it that does mainly sequential read/write of gigs of data.
The data disks are hanging off a dumb ata133 pdc20269 card, they
use geli aes 128 and zfs sha256, single spindles. Free ram, no swap,
free disk, no net, etc.

In short, whenever I'm doing sequential disk stuff, human interface
system performance tanks, big time. Keystrokes take a second or two
to appear, X11 focus raises take forever, firefox is a dog... after
waiting 10min for the system to load it off disk, run it and paint
it to X11, mouse movements are largely ignored unless moved very
slowly. Even typing killall is painful to do :)

I am very near the limit on wired mem, despite telling zfs to use
only 96MiB. And sometimes X11 can't get the ram it wants and dies
mid session. But neither of those should affect performance, the
kernel will deal with low mem one way or another, drastically, not
by bogging down [no swap enabled anyways] right?

I used to do similar stuff on RELENG_4 on an old dual PII with plain
UFS and I could run gigs of disk data, all the spindles, 8 of them,
through sha1, all at once, and still have good interactive performance.

Yet fire off one sequential read/write, perhaps piped into sha1 for
more cpu load and it's game over. I can renice the user process way
down into idprio with no effect. So I doubt giving rtprio to all the
human processes would do anything.

Is RELENG_7 or GELI+ZFS the dog here? Top of course shows
GELI+ZFS eating all the cpu. Ok, fine, so how do I say, hey, I don't
care about disk, it'll finish eventually, just give me my gui and
keystrokes back?

It just doesn't seem right that this one subsystem should be
monopolizing the cpu this way? Help/cluebat? Thanks.

From freebsd at sopwith.solgatos.com Sat Aug 8 05:53:35 2009
From: freebsd at sopwith.solgatos.com (Dieter)
Date: Sat Aug 8 05:53:42 2009
Subject: RELENG_7 heavy disk = system crawls
In-Reply-To: Your message of "Fri, 07 Aug 2009 05:17:31 EDT."
Message-ID: <200908080533.FAA18701@sopwith.solgatos.com>

> Hi. I'm running RELENG_7 on an older single P4. It has a lot of
> disk on it that does mainly sequential read/write of gigs of data.

> In short, whenever I'm doing sequential disk stuff, human interface
> system performance tanks, big time.

> I used to do similar stuff on RELENG_4 on an old dual PII with plain
> UFS and I could run gigs of disk data, all the spindles, 8 of them,
> through sha1, all at once, and still have good interactive performance.
>
> Yet fire off one sequential read/write, perhaps piped into sha1 for
> more cpu load and it's game over. I can renice the user process way
> down into idprio with no effect. So I doubt giving rtprio to all the
> human processes would do anything.
>
> Is RELENG_7 or GELI+ZFS the dog here? Top of course shows
> GELI+ZFS eating all the cpu. Ok, fine, so how do I say, hey, I don't
> care about disk, it'll finish eventually, just give me my gui and
> keystrokes back?
>
> It just doesn't seem right that this one subsystem should be
> monopolizing the cpu this way? Help/cluebat? Thanks.

Are you *sure* that the cpu is your bottleneck?

I see problems where sequential read/write of gigs of data on one disk
interferes with i/o to other disks, and it isn't a cpu problem in my
case. Just the opposite, cpu bound jobs don't create the problem, i/o
bound jobs do. As expected, idprio doesn't help my problem either.
I suspect that the problem is the i/o bound job is hogging the disk
cache or some similar resource. I run FFS, no ZFS, and I've seen the
same problem on 6.0, 6.2, 7.0 and 7.1.

Unix grew up on machines where cpu was always the scarce resource,
so nice only cares about cpu.
Today i/o is often the scarce resource, so we need a way to nice i/o
up/down similar to nicing cpu up/down.

FreeBSD is supposed to be the "performance" BSD, but watching it
leave several disks idle while one disk does i/o makes me wonder.

From grarpamp at gmail.com Sat Aug 8 09:02:49 2009
From: grarpamp at gmail.com (grarpamp)
Date: Sat Aug 8 12:43:57 2009
Subject: RELENG_7 heavy disk = system crawls
Message-ID:

> Are you *sure* that the cpu is your bottleneck?

Well, I've got a gig free in /usr/local which is ufs2+softdeps. So
I just dd if=/dev/zero bs=1m of=zero there. Disk was 100% busy, cpu
was 10-15% system, 10% user, about 16MiB/sec. Of course since that
is my system spindle, and it was busied out by dd, I had to wait 10
sec or so for vi to load to write this note during it :) But the
rest of the interface was responsive. Though I've never run RELENG_4
on this box, I'd venture it would feel similar in this case.

I can dd if=/dev/ad[n].eli of=/dev/null bs=1m and use 75% system
all in geli, 27% disk busy, 20MiB/sec. Interface was slower but
reasonable.

Now do a dd if=file of=/dev/null bs=1m where file is on zfs on the
geli spindle above and the system melts down. Half cpu in geli,
half spread over about 8 spa_zio's, 4MiB/sec, all system time, disk
100% busy.

I'm not sure yet how to isolate cpu from i/o under my geli+zfs
setup. I think they're mated together.

Curiously, I've got about 104 spa_zio procs all in tq->tq DL state.
Only about 20 or so have more than a little system time on them.
But about 9 zfs mounts so that may be ok, don't know.

Don't get me wrong, FreeBSD has been my primary os since RELENG_2_2.
And it's been great, still is. I recommend and use it all the time.

It's just that this workload has really put the screws to things
and I don't see a way out.
I'd like to deploy geli+zfs everywhere
but if I can't login remotely because some user has it busied out
on something I've no knobs to control, umm, yeah :)

I'm curious to see what others running geli_aes128+zfs_sha256 are
seeing in this regard. And I'd love to see what a fast dual or more
core amd64 system would be like under this workload.

As to your i/o thing, I think back in RELENG_4 that if all the
spindles were on the same pata controller/interrupt, monopolistic
loads could occur. Seemed to be a hardware thing, not BSD. IE: At
the moment I've got a half dozen spindles and filesystems spread out
under RELENG_4 all happily doing find | xargs sha1 at once, no
problems. That hardware is set for update to RELENG_7 or RELENG_8
in a few weeks.

> we need a way to nice i/o up/down

That would be handy for sure. User spindles, system spindles,
storage, net, keyboard, etc.

# systime spread
   11 root        1 171 ki31     0K     8K RUN     46.5H 88.18% idle: cpu0
 3215 root        1  -8    -     0K     8K geli:w 737:30  1.76% g_eli[0] ad6
  607 root        1  -8    -     0K     8K geli:w 158:12  0.00% g_eli[0] ad4
 3235 root        1 -16    -     0K    24K tq->tq  69:41  0.00% spa_zio
 3229 root        1 -16    -     0K    24K tq->tq  69:40  0.00% spa_zio
 3228 root        1 -16    -     0K    24K tq->tq  69:39  0.00% spa_zio
 3234 root        1 -16    -     0K    24K tq->tq  69:39  0.00% spa_zio
 3233 root        1 -16    -     0K    24K tq->tq  69:39  0.00% spa_zio
 3232 root        1 -16    -     0K    24K tq->tq  69:39  0.00% spa_zio
 3231 root        1 -16    -     0K    24K tq->tq  69:38  0.00% spa_zio
 3230 root        1 -16    -     0K    24K tq->tq  69:37  0.00% spa_zio
 1135 user        1  44    0   169M   152M select  56:02  0.00% XFree86
  954 root        1 -16    -     0K    24K tq->tq  17:10  0.00% spa_zio
  958 root        1 -16    -     0K    24K tq->tq  17:10  0.00% spa_zio
  956 root        1 -16    -     0K    24K tq->tq  17:09  0.00% spa_zio
  953 root        1 -16    -     0K    24K tq->tq  17:09  0.00% spa_zio
  957 root        1 -16    -     0K    24K tq->tq  17:09  0.00% spa_zio
  952 root        1 -16    -     0K    24K tq->tq  17:09  0.00% spa_zio
  951 root        1 -16    -     0K    24K tq->tq  17:09  0.00% spa_zio
  955 root        1 -16    -     0K    24K tq->tq  17:09  0.00% spa_zio
  613 root        1  -8    -     0K     8K geli:w  16:12  0.00% g_eli[0] ad7
    3 root        1  -8    -     0K     8K -       15:05  0.00% g_up

From freebsd at sopwith.solgatos.com Sun Aug 9 04:06:55 2009
From: freebsd at sopwith.solgatos.com (Dieter)
Date: Sun Aug 9 04:07:02 2009
Subject: RELENG_7 heavy disk = system crawls
In-Reply-To: Your message of "Sat, 08 Aug 2009 05:02:48 EDT."
Message-ID: <200908090402.EAA08610@sopwith.solgatos.com>

> I can dd if=/dev/ad[n].eli of=/dev/null bs=1m and use 75% system
> all in geli, 27% disk busy, 20MiB/sec. Interface was slower but
> reasonable.

I think I understand now. You're doing encryption in the kernel,
which eats a lot of cpu, and nice only affects userland. So yeah
cpu is a significant part of your problem.

> I'm not sure yet how to isolate cpu from i/o under my geli+zfs
> setup. I think they're mated together.

Agreed.

> It's just that this workload has really put the screws to things
> and I don't see a way out. I'd like to deploy geli+zfs everywhere
> but if I can't login remotely because some user has it busied out
> on something I've no knobs to control, umm, yeah :)

Do you *need* geli+zfs ? If so, you could see if there are any
hardware crypto accelerators with FreeBSD support, or throw lots
of cpu (e.g. phenom2 x4) at it.

> As to your i/o thing, I think back in RELENG_4 that if all the
> spindles were on the same pata controller/interrupt, monopolistic
> loads could occur.

atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xe000-0xe00f at device 6.0 on pci0
atapci1: port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xcc00-0xcc0f mem 0xfebfb000-0xfebfbfff irq 21 at device 7.0 on pci0
atapci2: port 0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xb800-0xb80f mem 0xfebfa000-0xfebfafff irq 22 at device 8.0 on pci0
atapci3: port 0x8c00-0x8c07,0x8800-0x8803,0x8400-0x8407,0x8000-0x8003,0x7c00-0x7c0f mem 0xfe9fe000-0xfe9fffff irq 17 at device 0.0 on pci3
atapci4: port 0x6c00-0x6c7f mem 0xfe6ff000-0xfe6ff07f,0xfe6f8000-0xfe6fbfff irq 16 at device 0.0 on pci4
atapci5: port 0x4c00-0x4c07,0x4800-0x4803,0x4400-0x4407,0x4000-0x4003,0x3c00-0x3c0f mem 0xfe3fe000-0xfe3fffff irq 18 at device 0.0 on pci6

The nForce pata controller doesn't list an irq, seems odd?

From ap00 at mail.ru Mon Aug 10 08:47:18 2009
From: ap00 at mail.ru (Anthony Pankov)
Date: Mon Aug 10 08:47:25 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A45696743F@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local>
 <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com>
 <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local>
 <4A77094C.8030308@elischer.org>
 <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local>
 <4A785F20.8050807@elischer.org>
 <2a41acea0908040941y39f16c8cocb84b001e1e9f0de@mail.gmail.com>
 <36A93B31228D3B49B691AD31652BCAE9A45696743F@GRFMBX702BA020.griffon.local>
Message-ID: <117233873703.20090810123227@mail.ru>

Hello Invernizzi,

First of all, I didn't catch what task you are tuning for: forwarding
or transfer?

In any case you should start with a minimal configuration (no netgraph
modules, no firewall). Choose a test similar to your typical load.
Provide us with the test result, cpu consumption and the output of
 ifconfig
 netstat -s
 netstat -m
 debug values from driver (sysctl (?).debug)).

Wednesday, August 05, 2009, 2:13:23 PM, you wrote:

IF> No improvement with kern.ipc.nmbclusters=262144 and 1.8.6 driver :<(((((

--
Best regards,
Anthony mailto:ap00@mail.ru

From nathan at lenevez.net.au Mon Aug 10 23:53:54 2009
From: nathan at lenevez.net.au (Nathan Le Nevez)
Date: Mon Aug 10 23:54:00 2009
Subject: Very slow I/O performance on HP BL465c
In-Reply-To:
Message-ID:

Hi,

I'm running 7.2-p3 on 2x HP BL465c blade servers, one of which performs very
poorly. Both have the same RAID controller and 2 x 146GB 10k SAS disks
configured in RAID-1. Both controllers have write-cache enabled. Both
servers are running the same BIOS and firmware versions. Neither server is
running any services other than sshd.

Blade with good performance (2 x Opteron 2218, 8GB RAM):

ciss0: port 0x4000-0x40ff mem 0xfdf80000-0xfdffffff,0xfdf70000-0xfdf77fff irq 19 at device 8.0 on pci80
ciss0: [ITHREAD]
da0 at ciss0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
AP CPU #2 Launched!
da0: 135.168MB/s transfers
da0: Command Queueing Enabled
SMP: AP CPU #3 Launched!
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)

Blade with bad performance (2 x Opteron 2352, 16GB RAM):

ciss0: port 0x4000-0x40ff mem 0xfdf80000-0xfdffffff,0xfdf70000-0xfdf77fff irq 19 at device 8.0 on pci80
ciss0: [ITHREAD]
da0 at ciss0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: Command Queueing Enabled
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)

# dbench -t 10   1                2                3                4
blade1           183.456 MB/sec   236.86 MB/sec    299.28 MB/sec    192.675 MB/sec
blade2           6.97931 MB/sec   9.42293 MB/sec   10.2482 MB/sec   12.407 MB/sec

Any help/ideas would be greatly appreciated. I have run through all the
Insight diagnostics tools and they fail to find anything wrong with the slow
server.

Cheers,
Nathan

From grarpamp at gmail.com Tue Aug 11 03:03:01 2009
From: grarpamp at gmail.com (grarpamp)
Date: Tue Aug 11 03:19:20 2009
Subject: RELENG_7 heavy disk = system crawls
Message-ID:

> nice only affects userland

Well, you can set {id,rt}prio and nice on kernel processes.
Then look at top to see the nice column change. Have no idea what
effect it has nor what the non '-' chars on those procs in that
column mean.

> Do you *need* geli+zfs?

Encryption = required. ZFS... well I like the checksum all the way
back to the uberblock feature, raidz2, ditto blocks, compression
and the admin model. geom offers encryption, single block checksum
authority and raid1/3.

> hardware crypto accelerators

The soekris ones work and are cheap. I thought I saw posts that show
openssl -speed on today's fast cpu's being faster than the accel
cards. Disk crypto is symmetric, not initial pki session setup.
And with ~40MB/s of encryption, while building world no less, cpu
may not be the issue. So long as I'm not hitting those geli+zfs
disks, things are smooth. I want to try the system with geli+ufs2
sometime.

type            16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes
aes-128 cbc     36555.26k    38074.95k    38999.93k    39258.38k    39176.80k

> throw lots of cpu (e.g. phenom2 x4) at it

That would only help it get done busying out the system sooner, not
balance things out while actively under load. Which as a file server,
it always is.

Well, ok, it will help after the system is able to do spindle ->
geli -> fs -> process at the max sustained read/write speed of the
spindles. Which is about 56MiB/sec reading in this case. Which is
over 10x faster than I'm getting now. Which means I'd need maybe
10 x 1.8GHz worth of cpu before I have any free cycles to devote
to the user interface :)

Maybe I'm just clueless this month and the list is too busy to beat
me about the head with it.

> The nForce pata controller doesn't list an irq, seems odd?
> device 6.0 on pci0

This one doesn't either:
atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 31.1 on pci0

boot -v and it appears in the irq routing stuff around that line.
Then vmstat -i and systat -vmstat 1 also give some clues when the device
is dd'd. Onboard PATA is always irq14/15 as far as I've seen.
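
As a back-of-the-envelope check on the CPU argument above, the openssl figure can be compared with the 56MiB/sec spindle speed. This is a sketch only: the helper name cores_needed and the disk-count argument are invented for illustration, the openssl "k" column is taken as 1000s of bytes per second, and encryption is assumed to scale linearly across cores, which is an idealisation (the geli+zfs throughput actually reported in the thread is far below this ceiling).

```sh
#!/bin/sh
# cores_needed DISK_MIB_S AES_KBYTES_S NDISKS
# Hypothetical helper: how many cores of pure AES work would be needed to
# keep up with NDISKS drives streaming at DISK_MIB_S each, if one core
# encrypts at AES_KBYTES_S (openssl speed reports 1000s of bytes/second).
cores_needed() {
    awk -v disk="$1" -v aes_k="$2" -v n="$3" 'BEGIN {
        per_core = aes_k * 1000 / 1048576        # bytes/s -> MiB/s
        need = disk * n / per_core
        cores = int(need)
        if (cores < need) cores++                # round up to whole cores
        printf "%.1f MiB/s per core, %d core(s) for %d disk(s)\n", per_core, cores, n
    }'
}

cores_needed 56 39176.80 1    # prints: 37.4 MiB/s per core, 2 core(s) for 1 disk(s)
```

Even in this idealised form a single core cannot encrypt as fast as one spindle can stream, so the cipher alone leaves nothing spare for the rest of the system.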
From freebsd at sopwith.solgatos.com Tue Aug 11 06:47:54 2009
From: freebsd at sopwith.solgatos.com (Dieter)
Date: Tue Aug 11 06:48:00 2009
Subject: RELENG_7 heavy disk = system crawls
In-Reply-To: Your message of "Mon, 10 Aug 2009 23:03:00 EDT."
Message-ID: <200908110637.GAA06250@sopwith.solgatos.com>

> > hardware crypto accelerators
>
> The soekris ones work and are cheap. I thought I saw posts that show
> openssl -speed on today's fast cpu's being faster than the accel
> cards. Disk crypto is symmetric, not initial pki session setup.

The accel cards probably don't get updated as often as cpus do.
Then there is the question of how many cards you would need, and of having
enough free slots to plug them into. And the cards aren't going to
help with zfs.

Does anyone make a disk controller with crypto built in? Then it
would scale with the number of drives. And not need extra slots.

> > throw lots of cpu (e.g. phenom2 x4) at it
>
> That would only help it get done busying out the system sooner, not
> balance things out while actively under load. Which as a file server,
> it always is.
>
> Well, ok, it will help after the system is able to do spindle ->
> geli -> fs -> process at the max sustained read/write speed of the
> spindles. Which is about 56MiB/sec reading in this case. Which is
> over 10x faster than I'm getting now. Which means I'd need maybe
> 10 x 1.8GHz worth of cpu before I have any free cycles to devote
> to the user interface :)

Well, you could look into mainboards with 2 or 4 CPU sockets.
Tyan probably has something.

> Maybe I'm just clueless this month and the list is too busy to beat
> me about the head with it.

Anyone running geli+zfs is sitting there waiting for the user interface
to update their screen. :-)

>>> The data disks are hanging off a dumb ata133 pdc20269 card

Ok, here is a REALLY UGLY way to slow the disks down:

for disk in ad2 ad4 ...
do
    atacontrol mode $disk UDMA66
done

Not a good solution.
If one transfer is eating your user interface, with 8 drives
you'd have to set the speed very very low. You really want a way to
set the priority.

BTW, if your data is so important that encryption is required,
"checksum all the way back to the uberblock", etc. etc. you might
want to upgrade the disks from pata to at least sata. Pata doesn't
do error detection on the control info, so it could happily write
your bits to the wrong sector. Sata checks the control info as
well as the data.

Current 7200 rpm sata speeds:

dd reading the bare drive, no filesystem, at fast end of the drive:

                        extended device statistics
device     r/s   w/s      kr/s    kw/s wait svc_t  %b
ad18    1871.1   0.0  118303.1     0.0    2   0.8  98

dd reading a file from FFS:

28125720000 bytes transferred in 264.921699 secs (106166162 bytes/sec)

Or you could go with the green disks that spin slower.

From krassi at bulinfo.net Tue Aug 11 12:09:18 2009
From: krassi at bulinfo.net (Krassimir Slavchev)
Date: Tue Aug 11 12:09:25 2009
Subject: Very slow I/O performance on HP BL465c
In-Reply-To:
References:
Message-ID: <4A8159A8.3030206@bulinfo.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

What is the output of 'vmstat -i' and 'camcontrol tags da0'?
I have an ML350 running 7-STABLE with the same controller and disks, and
its performance is almost the same as your good server.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFKgVmoxJBWvpalMpkRAvilAJsGF0J34SgD34EcBxX8Ic8Hq6OUBACghpBL
C7YgX2qmvgb7WSvgFhDrKl8=
=JDJx
-----END PGP SIGNATURE-----

From nathan at lenevez.net.au Tue Aug 11 21:29:07 2009
From: nathan at lenevez.net.au (Nathan Le Nevez)
Date: Tue Aug 11 21:29:13 2009
Subject: Very slow I/O performance on HP BL465c
In-Reply-To: <4A8159A8.3030206@bulinfo.net>
References: <4A8159A8.3030206@bulinfo.net>
Message-ID:

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                          18          0
irq5: ohci0 ohci1+                     1          0
irq19: ciss0                      144916          3
irq21: uhci0                          22          0
cpu0: timer                     80002970       1999
irq256: bce0                       17042          0
cpu2: timer                     79994902       1999
cpu1: timer                     79994975       1999
cpu3: timer                     79995009       1999
cpu6: timer                     79994957       1999
cpu5: timer                     79995046       1999
cpu4: timer                     79995041       1999
cpu7: timer                     79995057       1999
Total                          640129956      16000

# camcontrol tags da0
(pass0:ciss0:0:0:0): device openings: 254

Just for clarification, both systems are running amd64.

Thanks,

Nathan

From krassi at bulinfo.net Wed Aug 12 05:41:23 2009
From: krassi at bulinfo.net (Krassimir Slavchev)
Date: Wed Aug 12 05:41:30 2009
Subject: Very slow I/O performance on HP BL465c
In-Reply-To:
References: <4A8159A8.3030206@bulinfo.net>
Message-ID: <4A8255FE.5030500@bulinfo.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Looks okay.
How are your disks partitioned, and from where are you running dbench?
Look at the -D option. For example, with 'dbench -t 10 4' I have:

/ without soft updates  -> Throughput 72.7276 MB/sec 4 procs
/var with soft updates  -> Throughput 286.528 MB/sec 4 procs

Are you sure that you are not running dbench on a zfs or encrypted
partition?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFKglX+xJBWvpalMpkRAjxzAKCpNTFtnrA3GgJn66OPB3hwTAL1FgCdFPJf
5wxj8jV4jw/Hf6sYEwXUOEo=
=9Inl
-----END PGP SIGNATURE-----

From krassi at bulinfo.net Wed Aug 12 07:57:56 2009
From: krassi at bulinfo.net (Krassimir Slavchev)
Date: Wed Aug 12 07:58:02 2009
Subject: Very slow I/O performance on HP BL465c
In-Reply-To:
References:
Message-ID: <4A8275FF.2000201@bulinfo.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Is it possible to exchange disks between your blade1 and blade2 servers?
Or to remove disks from one server and connect them to another?
Also compare the 'tunefs -p /' outputs.
Also compare the read speed of a raw device with e.g. 'dd if=/dev/da0
of=/dev/null bs=1m count=100'

Nathan Le Nevez wrote:
> # df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/da0s1a    496M    224M    232M    49%    /
> devfs          1.0K    1.0K      0B   100%    /dev
> /dev/da0s1e    496M     14K    456M     0%    /tmp
> /dev/da0s1f    119G    623M    109G     0%    /usr
> /dev/da0s1d    4.8G    346K    4.4G     0%    /var
> # mount
> /dev/da0s1a on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/da0s1e on /tmp (ufs, local, soft-updates)
> /dev/da0s1f on /usr (ufs, local, soft-updates)
> /dev/da0s1d on /var (ufs, local, soft-updates)
>
> / - Throughput 6.59862 MB/sec 4 procs
> /usr - Throughput 14.487 MB/sec 4 procs
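
For the raw-device comparison suggested above, it can help to reduce dd's summary line to a single number per blade. The following is a minimal sketch, assuming the FreeBSD dd summary format that appears earlier in this thread ("N bytes transferred in S secs (...)"); the helper name dd_mibs is made up for illustration.

```sh
#!/bin/sh
# dd_mibs: hypothetical helper that reads dd's summary line on stdin and
# prints the throughput in MiB/s, computed from the byte count and the
# elapsed seconds in a line of the form
# "N bytes transferred in S secs (...)".
dd_mibs() {
    awk '/bytes transferred in/ { printf "%.1f MiB/s\n", $1 / $5 / 1048576 }'
}

# Sample input taken from the FFS read quoted earlier in the thread:
echo "28125720000 bytes transferred in 264.921699 secs (106166162 bytes/sec)" | dd_mibs
# prints: 101.2 MiB/s

# On a live system dd writes its summary to stderr, so something like
#   dd if=/dev/da0 of=/dev/null bs=1m count=100 2>&1 | dd_mibs
# could be run on each blade and the two results compared.
```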
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFKgnX+xJBWvpalMpkRAnVpAKCx95MbkyGOenqUwmuSmwnBa/l2awCglwpV
L1cWIc+IauiDJSPn8fsqJKY=
=8QiN
-----END PGP SIGNATURE-----

From grarpamp at gmail.com Wed Aug 12 04:53:26 2009
From: grarpamp at gmail.com (grarpamp)
Date: Wed Aug 12 11:13:32 2009
Subject: RELENG_7 heavy disk = system crawls
Message-ID:

> cards aren't going to help with zfs.

No, but for geli hifn(4) crypto(4)/(9), geli(8) might work
if aes-cbc is indeed the mode geli uses. See the source I guess.

> Does anyone make a disk controller with crypto built in?

Yes. There are trays and cable dongles and things that do aes/des.
And some drives are coming out with it in firmware. Does anyone
like the cost, closed-source, and trust model of such hardware?

> atacontrol

I'm partly up against this because some failing drives are negotiating
lower speeds for themselves. Never thought of it as a test tool
though.

> Pata doesn't do error detection on the control info

ZFS handles that and informs the user about silent corruption. Precisely
because of the chained checksums. It's quite addicting.
http://www.opensolaris.org/os/community/zfs/

Your little to no overhead for ffs sounds right as always.

From wmoran at collaborativefusion.com Wed Aug 12 12:14:35 2009
From: wmoran at collaborativefusion.com (Bill Moran)
Date: Wed Aug 12 12:14:55 2009
Subject: RELENG_7 heavy disk = system crawls
In-Reply-To:
References:
Message-ID: <20090812080434.addbf0e0.wmoran@collaborativefusion.com>

grarpamp wrote:
>
> > cards aren't going to help with zfs.
>
> No, but for geli hifn(4) crypto(4)/(9), geli(8) might work
> if aes-cbc is indeed the mode geli uses. See the source I guess.

Note that only the most expensive cards are faster than a CPU. Unless
you've got a very large budget, you'll get more bang for your buck out
of adding CPUs to the unit. During crypto ops, the system will have a
spare CPU to use to encrypt, and when not doing crypto ops, you have the
added benefit of more processing power for apps.

Additionally, support for FreeBSD is spotty. And we've seen cards that
work in FreeBSD, but do so at far below their claimed throughput. There
are apparently some inefficiencies in the drivers.

--
Bill Moran
Collaborative Fusion Inc.
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

From krassi at bulinfo.net Thu Aug 13 05:35:32 2009
From: krassi at bulinfo.net (Krassimir Slavchev)
Date: Thu Aug 13 05:35:40 2009
Subject: Very slow I/O performance on HP BL465c
In-Reply-To:
References:
Message-ID: <4A83A61E.6010009@bulinfo.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nathan Le Nevez wrote:
> I'm fairly certain this is a hardware problem - swapping the disks from
> a known working install on another blade produced the same lousy
> performance.

Hmm,
Check the SAS cables between the controller and the disk tray.
Also check the connector's pins.
I had such problems on a DL380.

Best Regards

>
> Thanks for your help, time to put a call in with HP although without any
> real errors to show them it is going to be a challenge.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFKg6YexJBWvpalMpkRAt+EAKCIQFP0lAv44TpcUPm2ZC3rP4opSwCfcOZN
Gp4q2KxquOyvYJNYvTPk+Sc=
=GRjk
-----END PGP SIGNATURE-----

From lambert at lambertfam.org Thu Aug 13 18:05:57 2009
From: lambert at lambertfam.org (Scott Lambert)
Date: Thu Aug 13 18:06:03 2009
Subject: Very slow I/O performance on HP BL465c
In-Reply-To: <4A83A61E.6010009@bulinfo.net>
References: <4A83A61E.6010009@bulinfo.net>
Message-ID: <20090813173727.GF91291@sysmon.tcworks.net>

> Nathan Le Nevez wrote:
> > I'm fairly certain this is a hardware problem - swapping the disks from
> > a known working install on another blade produced the same lousy
> > performance.
>
> >>>> Hi,
> >>>>
> >>>> I'm running 7.2-p3 on 2x HP BL465c blade servers, one of which
> >>>> performs very poorly.
> >>>> Both have the same RAID controller and 2 x
> >>>> 146GB 10k SAS disks configured in RAID-1. Both controllers have
> >>>> write-cache enabled. Both servers are running the same BIOS and
> >>>> firmware versions. Neither servers are running any services other
> >>>> than sshd.
> >>>>
> >>>> Blade with good performance (2 x Opteron 2218, 8GB RAM):
> >>>>
> >>>> Blade with bad performance (2 x Opteron 2352, 16GB RAM):
> >>>>
> >>>> # dbench -t 10 1 2 3 4
> >>>> blade1 183.456 MB/sec 236.86 MB/sec 299.28 MB/sec 192.675 MB/sec
> >>>> blade2 6.97931 MB/sec 9.42293 MB/sec 10.2482 MB/sec 12.407 MB/sec

Sorry, I deleted the start of this thread. I figured someone else would
suggest pulling half the RAM in the slow server. That seems to be the
biggest difference in configuration, other than CPU model. I don't
know that it could cause this problem, but it would seem to be easy
enough to test if they are not in production yet.

Just grasping at straws here...

--
Scott Lambert KC5MLE Unix SysAdmin
lambert@lambertfam.org

From bsam at ipt.ru Thu Aug 13 21:49:25 2009
From: bsam at ipt.ru (Boris Samorodov)
Date: Thu Aug 13 21:49:31 2009
Subject: Very slow I/O performance on HP BL465c
In-Reply-To: (Nathan Le Nevez's message of "Tue\, 11 Aug 2009 09\:53\:47 +1000")
References:
Message-ID: <10994730@ipt.ru>

Nathan Le Nevez writes:

> I'm running 7.2-p3 on 2x HP BL465c blade servers, one of which performs very
> poorly. Both have the same RAID controller and 2 x 146GB 10k SAS disks
> configured in RAID-1. Both controllers have write-cache enabled. Both
> servers are running the same BIOS and firmware versions. Neither servers are
> running any services other than sshd.
>
> [...]
>
> # dbench -t 10 1 2 3 4
> blade1 183.456 MB/sec 236.86 MB/sec 299.28 MB/sec 192.675 MB/sec
> blade2 6.97931 MB/sec 9.42293 MB/sec 10.2482 MB/sec 12.407 MB/sec
>
> Any help/ideas would be greatly appreciated. I have run through all the
> Insight diagnostics tools and it fails to find anything wrong with the slow
> server.

The description of those blades at the HP site says "HP Smart Array E200i
storage controller with 64MB read cache and optional battery-backed
write cache". Do both controllers have batteries? I've seen very
poor performance on RAIDs (with write cache enabled) without batteries.
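
To put the gap between the two blades in numbers, the dbench figures from the original report can be divided column by column. This is only a sketch of that arithmetic: the function name slowdown is invented, and the column headings 1 to 4 are assumed to be dbench process counts.

```sh
#!/bin/sh
# slowdown: hypothetical helper that prints, for each dbench column,
# how many times faster blade1 was than blade2 (values copied from the
# original report; columns assumed to be process counts 1..4).
slowdown() {
    awk 'BEGIN {
        split("183.456 236.86 299.28 192.675", good, " ")
        split("6.97931 9.42293 10.2482 12.407", bad, " ")
        for (i = 1; i <= 4; i++)
            printf "procs=%d: %.1fx\n", i, good[i] / bad[i]
    }'
}

slowdown
# prints:
# procs=1: 26.3x
# procs=2: 25.1x
# procs=3: 29.2x
# procs=4: 15.5x
```

A gap of this size across every column points at something systematic in the storage path rather than ordinary benchmark noise.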
--
WBR, bsam

From cpardo at fastsoft.com Fri Aug 14 21:45:25 2009
From: cpardo at fastsoft.com (Carlos Pardo)
Date: Fri Aug 14 21:45:33 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A45696743F@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local><0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com><36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local><4A77094C.8030308@elischer.org><36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local><4A785F20.8050807@elischer.org><2a41acea0908040941y39f16c8cocb84b001e1e9f0de@mail.gmail.com> <36A93B31228D3B49B691AD31652BCAE9A45696743F@GRFMBX702BA020.griffon.local>
Message-ID:

Hi Jack,

I have a quick question. We are getting too many missed packets per minute
running about 3 Gb/s of traffic. We can not use frame control in our
application. We are assuming that there is no way to improve upon the
problem since it seems to be a hardware limitation with the receive FIFO.
We are using the Intel(R) 82598EB 10 Gigabit Ethernet Controller. When can
we expect the next generation card from Intel? Thanks for any information
you may provide.

Typical error count "ix0: Missed Packets = 81174" after a few minutes.

Best Regards,

Cpardo

-----Original Message-----
From: owner-freebsd-performance@freebsd.org [mailto:owner-freebsd-performance@freebsd.org] On Behalf Of Invernizzi Fabrizio
Sent: Wednesday, August 05, 2009 3:13 AM
To: Jack Vogel; Julian Elischer
Cc: freebsd-performance@freebsd.org; Stefan Lambrev
Subject: RE: Test on 10GBE Intel based network card

No improvement with kern.ipc.nmbclusters=262144 and 1.8.6 driver :<(((((

++fabrizio

________________________________
From: Jack Vogel [mailto:jfvogel@gmail.com]
Sent: Tuesday, 4 August 2009 18:42
To: Julian Elischer
Cc: Invernizzi Fabrizio; freebsd-performance@freebsd.org; Stefan Lambrev
Subject: Re: Test on 10GBE Intel based network card

Your nmbclusters is very low. You list it twice, so I'm assuming the second
value is what it ends up being, 32K :(

I would set it to:

kern.ipc.nmbclusters=262144

Also, I thought you were using the current driver, but now it looks like
you are using something fairly old. Use my latest code, which is 1.8.8.

Jack

On Tue, Aug 4, 2009 at 9:17 AM, Julian Elischer wrote:
Invernizzi Fabrizio wrote:

The limitation that you see is about the max number of packets that
FreeBSD can handle - it looks like your best performance is reached at
64 byte packets?

If you mean in terms of packets per second, you are right. These
are the packets per second measured during the tests:

  64 byte:   610119 Pps
  512 byte:  516917 Pps
  1492 byte: 464962 Pps

Am I correct that the maximum you can reach is around 639,000 packets
per second?

Yes, as you can see the maximum is 610119 Pps.
Where does this limit come from?

ah that's the whole point of tuning :-)
there are several possibilities:
1/ the card's interrupts are probably attached to only 1 cpu,
so that cpu can do no more work

This seems not to be the problem. See below a top snapshot during a
64byte-long packet storm

last pid:  8552;  load averages:  0.40,  0.09,  0.03    up 0+20:36:58  09:40:29
124 processes: 13 running, 73 sleeping, 38 waiting
CPU:  0.0% user,  0.0% nice, 86.3% system, 12.3% interrupt,  1.5% idle
Mem: 13M Active, 329M Inact, 372M Wired, 68K Cache, 399M Buf, 7207M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME THR PRI NICE  SIZE   RES STATE  C   TIME   WCPU COMMAND
   11 root       1 171 ki31    0K   16K RUN    3  20.2H 51.17% idle: cpu3
   14 root       1 171 ki31    0K   16K RUN    0  20.2H 50.88% idle: cpu0
   12 root       1 171 ki31    0K   16K RUN    2  20.2H 50.49% idle: cpu2
   13 root       1 171 ki31    0K   16K RUN    1  20.2H 50.10% idle: cpu1
   42 root       1 -68    -    0K   16K RUN    1  14:20 36.47% ix0 rxq
   38 root       1 -68    -    0K   16K CPU0   0  14:15 36.08% ix0 rxq
   44 root       1 -68    -    0K   16K CPU2   2  14:08 34.47% ix0 rxq
   40 root       1 -68    -    0K   16K CPU3   3  13:42 32.37% ix0 rxq
   ....

It looks like the 4 rxq processes are bound to the 4 available cores with
equal distribution.

2/ if more than 1 cpu is working, it may be that there is a lock in
heavy contention somewhere.

This I think is the problem. I am trying to understand how to
1- see where the heavy contention is (context switching? Some limiting
setting?)
2- mitigate it

there is a lock profiling tool that right now I can't remember the name
of..

look it up with google :-) FreeBSD lock profiling tool

ah, first hit...

http://blogs.epfl.ch/article/23832

is the machine still responsive to other networks while
running at maximum capacity on this network? (make sure that
the other networks are on a different CPU (hmm I can't remember how to
do that :-).
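The packet rates quoted above and the bandwidth figures from the first message of this thread (312, 2117 and 5525 Mbps) can be cross-checked against each other. The following is a minimal Python sketch, not something posted by the participants. It assumes the quoted packet sizes are the bytes counted per packet, and it assumes 20 bytes of per-frame wire overhead (preamble plus inter-frame gap) when estimating the 10GbE line-rate ceiling. The names `measured_pps`, `payload_mbps`, `line_rate_pps` and `summarize` are made up for the example.

```python
# Packet rates quoted in the thread (packets per second), keyed by packet size in bytes.
measured_pps = {64: 610119, 512: 516917, 1492: 464962}

LINK_BPS = 10_000_000_000  # nominal 10 Gigabit Ethernet bit rate
WIRE_OVERHEAD = 20         # assumption: 8-byte preamble/SFD + 12-byte inter-frame gap


def payload_mbps(size_bytes, pps):
    """Bandwidth in Mbps counting only the packet bytes themselves."""
    return size_bytes * 8 * pps / 1e6


def line_rate_pps(size_bytes):
    """Upper bound on packets per second for frames of the given size."""
    return LINK_BPS / ((size_bytes + WIRE_OVERHEAD) * 8)


def summarize(table):
    """Return (size, pps, Mbps, fraction of line rate) rows, smallest size first."""
    rows = []
    for size, pps in sorted(table.items()):
        rows.append((size, pps, payload_mbps(size, pps), pps / line_rate_pps(size)))
    return rows


if __name__ == "__main__":
    for size, pps, mbps, frac in summarize(measured_pps):
        print(f"{size:>5} B  {pps:>7} pps  {mbps:8.1f} Mbps  {frac:6.1%} of line rate")
```

Under these assumptions the 64- and 512-byte rows reproduce the reported 312 and 2117 Mbps, the 1492-byte row comes out near 5550 Mbps against the reported 5525, and the 64-byte rate is only a few percent of what the link could carry, which is consistent with a per-packet rather than a per-byte limit.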
From jfvogel at gmail.com Fri Aug 14 22:15:30 2009
From: jfvogel at gmail.com (Jack Vogel)
Date: Fri Aug 14 22:15:37 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To:
References: <36A93B31228D3B49B691AD31652BCAE9A4560DF911@GRFMBX702BA020.griffon.local> <0E567C7E-4EAA-4B89-9A8D-FD0450D32ED7@moneybookers.com> <36A93B31228D3B49B691AD31652BCAE9A4560DF947@GRFMBX702BA020.griffon.local> <4A77094C.8030308@elischer.org> <36A93B31228D3B49B691AD31652BCAE9A45696721F@GRFMBX702BA020.griffon.local> <4A785F20.8050807@elischer.org> <2a41acea0908040941y39f16c8cocb84b001e1e9f0de@mail.gmail.com> <36A93B31228D3B49B691AD31652BCAE9A45696743F@GRFMBX702BA020.griffon.local>
Message-ID: <2a41acea0908141515o45b7c74g40dbddc32d4b754b@mail.gmail.com>

I've talked over the issues with the guy on our team who has been most
involved in 10G performance. He asserts that 3 Gb/s will saturate a single
cpu with a small packet size; this is why you need multiqueue across
multiple cores. He was dubious about the FIFO assertion. It's a relative
thing: if you can keep the FIFO drained it won't be a problem, but doing
that depends on a complex combination of factors: the cpu, the bus, the
memory....

If you don't deal with the systemic issues, just going from an 82598
to an 82599 is not going to solve things.

What about LRO, are you using it, or can you? I never saw an answer to the
forwarding question; you can't use LRO in that case, of course.

Regards,

Jack
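Jack's point that roughly 3 Gb/s of small packets can saturate one CPU can be made concrete with a per-packet cycle budget. This is only a rough Python sketch under stated assumptions: 64-byte frames, 20 bytes of wire overhead per frame, and a 2.5 GHz core (the clock of the Xeon X5420 in the system described at the start of the thread; Carlos's hardware is not stated). The function names are hypothetical.

```python
def packets_per_second(bits_per_second, frame_bytes, wire_overhead_bytes=20):
    """Frames per second needed to fill the given bit rate on the wire."""
    return bits_per_second / ((frame_bytes + wire_overhead_bytes) * 8)


def cycles_per_packet(cpu_hz, pps, cores=1):
    """CPU cycles available per packet if the load spreads evenly over `cores`."""
    return cpu_hz * cores / pps


def budget(bits_per_second, frame_bytes, cpu_hz, cores=1):
    """Return (packets per second, cycles available per packet)."""
    pps = packets_per_second(bits_per_second, frame_bytes)
    return pps, cycles_per_packet(cpu_hz, pps, cores)


if __name__ == "__main__":
    for cores in (1, 4):
        pps, cycles = budget(3e9, 64, 2.5e9, cores)
        print(f"{cores} core(s): {pps:,.0f} pps, {cycles:,.0f} cycles per packet")
```

With these inputs the budget is 560 cycles per packet on one core, and four times that when four receive queues share the work evenly. Whether 560 cycles is enough depends on the driver and stack path, which this sketch does not model.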
From cpardo at fastsoft.com Mon Aug 17 21:52:47 2009
From: cpardo at fastsoft.com (Carlos Pardo)
Date: Mon Aug 17 21:52:54 2009
Subject: Test on 10GBE Intel based network card
Message-ID:

Hi Jack,

Thanks for the quick response. We can not use LRO because of the way we
accelerate on the WAN ports. We just moved from 7.0 to 8.0 to use your
latest driver (1.8.8). There is one thing we do not understand in 8.0: we
are seeing insane numbers for the XON/XOFF Rcvd counters with essentially
no traffic. Driver version 1.2.16 works fine. Who should we contact for
help?

ix0: Std Mbuf Failed = 0
ix0: Missed Packets = 0
ix0: Receive length errors = 0
ix0: Crc errors = 0
ix0: Driver dropped packets = 0
ix0: watchdog timeouts = 0
ix0: XON Rcvd = 7950055973552
ix0: XON Xmtd = 0
ix0: XOFF Rcvd = 7950055973552
ix0: XOFF Xmtd = 0
ix0: Total Packets Rcvd = 2149
ix0: Good Packets Rcvd = 2149
ix0: Good Packets Xmtd = 1001
ix0: TSO Transmissions = 0

ix1: Std Mbuf Failed = 0
ix1: Missed Packets = 0
ix1: Receive length errors = 0
ix1: Crc errors = 0
ix1: Driver dropped packets = 0
ix1: watchdog timeouts = 0
ix1: XON Rcvd = 7946320044993
ix1: XON Xmtd = 0
ix1: XOFF Rcvd = 7946320044993
ix1: XOFF Xmtd = 0
ix1: Total Packets Rcvd = 1002
ix1: Good Packets Rcvd = 1002
ix1: Good Packets Xmtd = 1588
ix1: TSO Transmissions = 0

Regards,

C Pardo
From jfvogel at gmail.com Mon Aug 17 22:03:38 2009
From: jfvogel at gmail.com (Jack Vogel)
Date: Mon Aug 17 22:03:45 2009
Subject: Test on 10GBE Intel based network card
In-Reply-To:
References:
Message-ID: <2a41acea0908171503r3613d430ib154cd3445eb1309@mail.gmail.com>

Who ya gonna call, why me of course, it's my driver :)

Hmmm, the numbers on those look bogus, like some uninitialized variables.
You did say you aren't using flow control, right?

Jack
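One way to see why the XON/XOFF Rcvd values reported above look bogus is to compare them with the most frames a 10GbE link could deliver at all. The Python sketch below is a hedged plausibility check, not the driver's logic. It assumes minimum-size 64-byte frames with 20 bytes of wire overhead each, and the one-day uptime used in the example is hypothetical, since the thread does not say how long the counters had been running.

```python
LINK_BPS = 10_000_000_000  # nominal 10 Gigabit Ethernet bit rate
MIN_FRAME_BYTES = 64       # smallest Ethernet frame; pause frames are this size
WIRE_OVERHEAD_BYTES = 20   # assumption: preamble/SFD plus inter-frame gap

MAX_FRAMES_PER_SECOND = LINK_BPS / ((MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8)

# Values copied from the counter dump posted in the thread.
reported = {
    "ix0 XON Rcvd": 7950055973552,
    "ix0 XOFF Rcvd": 7950055973552,
    "ix1 XON Rcvd": 7946320044993,
    "ix1 XOFF Rcvd": 7946320044993,
    "ix0 Total Packets Rcvd": 2149,
}


def min_days_at_line_rate(count):
    """Shortest time, in days, in which `count` frames could arrive."""
    return count / MAX_FRAMES_PER_SECOND / 86400


def implausible(count, uptime_seconds):
    """True if `count` exceeds what the link could carry in the given uptime."""
    return count > MAX_FRAMES_PER_SECOND * uptime_seconds


if __name__ == "__main__":
    uptime = 86400  # hypothetical: one day since the counters were reset
    for name, value in reported.items():
        flag = "implausible" if implausible(value, uptime) else "ok"
        print(f"{name:<24} {value:>15}  >= {min_days_at_line_rate(value):6.2f} days  {flag}")
```

At line rate it would take roughly six days of nothing but pause frames to reach the reported ix0 value, while the same interface reports only 2149 packets received in total. That fits Jack's reading of the numbers as uninitialized or misread counters rather than real traffic.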
> > > From fabrizio.invernizzi at telecomitalia.it Tue Aug 18 07:21:26 2009 From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio) Date: Tue Aug 18 07:21:33 2009 Subject: Test on 10GBE Intel based network card In-Reply-To: <2a41acea0908171503r3613d430ib154cd3445eb1309@mail.gmail.com> References: <2a41acea0908171503r3613d430ib154cd3445eb1309@mail.gmail.com> Message-ID: <36A93B31228D3B49B691AD31652BCAE9A4569679F5@GRFMBX702BA020.griffon.local> Hi I am using ixgbe 1.8.6 on FreeBSD 7.2-RELEASE (amd64). INT-64# sysctl -a | grep dev.ix | grep desc dev.ix.0.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 1.8.6 dev.ix.1.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 1.8.6 I see same strange big number of XON/XOFF Rcvd. ix0: XON Rcvd = 5828048552040 ix0: XON Xmtd = 0 XOFF Rcvd = 5828048552040 ix0: XOFF Xmtd = 0 Flow control disabled. INT-64# sysctl -a | grep dev.ix | grep flow_control dev.ix.0.flow_control: 0 dev.ix.1.flow_control: 0 Fabrizio > -----Original Message----- > From: owner-freebsd-performance@freebsd.org > [mailto:owner-freebsd-performance@freebsd.org] On Behalf Of Jack Vogel > Sent: marted? 18 agosto 2009 0.04 > To: Carlos Pardo > Cc: freebsd-performance@freebsd.org > Subject: Re: Test on 10GBE Intel based network card > > Who ya gonna call, why me of course, its my driver :) > > Hmmm, the numbers on those look bogus, like some > uninitialized variables. > You did say you aren't using flow control, right? > > Jack > > > On Mon, Aug 17, 2009 at 2:52 PM, Carlos Pardo > wrote: > > > Hi Jack, > > > > > > > > Thanks for the quick response. We can not use LRO because > of the way > > we accelerate on the WAN ports. We just moved from 7.0 to > 8.0 to use > > your latest driver (1.8.8). One thing we do not understand > in 8.0. We > > are having insane numbers for XON/XOFF Rcvd counters with > essentially no traffic. > > Driver version 1.2.16 works fine. Who should we contact for help? 
> >
> > ix0: Std Mbuf Failed = 0
> > ix0: Missed Packets = 0
> > ix0: Receive length errors = 0
> > ix0: Crc errors = 0
> > ix0: Driver dropped packets = 0
> > ix0: watchdog timeouts = 0
> > *ix0: XON Rcvd = 7950055973552*
> > ix0: XON Xmtd = 0
> > *ix0: XOFF Rcvd = 7950055973552*
> > ix0: XOFF Xmtd = 0
> > ix0: Total Packets Rcvd = 2149
> > ix0: Good Packets Rcvd = 2149
> > ix0: Good Packets Xmtd = 1001
> > ix0: TSO Transmissions = 0
> > ix1: Std Mbuf Failed = 0
> > ix1: Missed Packets = 0
> > ix1: Receive length errors = 0
> > ix1: Crc errors = 0
> > ix1: Driver dropped packets = 0
> > ix1: watchdog timeouts = 0
> > *ix1: XON Rcvd = 7946320044993*
> > ix1: XON Xmtd = 0
> > *ix1: XOFF Rcvd = 7946320044993*
> > ix1: XOFF Xmtd = 0
> > ix1: Total Packets Rcvd = 1002
> > ix1: Good Packets Rcvd = 1002
> > ix1: Good Packets Xmtd = 1588
> > ix1: TSO Transmissions = 0
> >
> > Regards,
> >
> > C Pardo
> >
> > From: Jack Vogel [mailto:jfvogel@gmail.com]
> > Sent: Friday, August 14, 2009 3:15 PM
> > To: Carlos Pardo
> > Cc: freebsd-performance@freebsd.org
> > Subject: Re: Test on 10GBE Intel based network card
> >
> > I've talked over the issues with the guy on our team who has been most
> > involved in 10G performance. He asserts that 3 Gb/s will saturate a
> > single CPU with a small packet size; this is why you need multiqueue
> > across multiple cores. He was dubious about the FIFO assertion. It's a
> > relative thing: if you can keep the thing drained it won't be a
> > problem, and doing that is a complex combination of factors:
> > the CPU, the bus, the memory...
> >
> > If you don't deal with the systemic issues, just going from an
> > 82598 to an 82599 is not going to solve things.
> >
> > What about LRO, are you using it, or can you? I never saw an answer
> > about the forwarding question; you can't use LRO in that case, of course.
> >
> > Regards,
> >
> > Jack
> >
> > On Fri, Aug 14, 2009 at 2:33 PM, Carlos Pardo wrote:
> >
> > Hi Jack,
> >
> > I have a quick question. We are getting too many missed packets per
> > minute running about 3 Gb/s of traffic. We cannot use frame control in
> > our application. We are assuming that there is no way to improve on the
> > problem, since it seems to be a hardware limitation of the receive FIFO.
> > We are using the Intel(R) 82598EB 10 Gigabit Ethernet Controller. When
> > can we expect the next-generation card from Intel? Thanks for any
> > information you may provide.
> >
> > Typical error count: "ix0: Missed Packets = 81174" after a few minutes.
> >
> > Best Regards,
> >
> > Cpardo
> >
> > -----Original Message-----
> > From: owner-freebsd-performance@freebsd.org
> > [mailto:owner-freebsd-performance@freebsd.org] On Behalf Of Invernizzi Fabrizio
> > Sent: Wednesday, August 05, 2009 3:13 AM
> > To: Jack Vogel; Julian Elischer
> > Cc: freebsd-performance@freebsd.org; Stefan Lambrev
> > Subject: RE: Test on 10GBE Intel based network card
> >
> > No improvement with kern.ipc.nmbclusters=262144 and 1.8.6 driver
> > :<(((((
> >
> > ++fabrizio
> >
> > ________________________________
> > From: Jack Vogel [mailto:jfvogel@gmail.com]
> > Sent: Tuesday, 4 August 2009 18.42
> > To: Julian Elischer
> > Cc: Invernizzi Fabrizio; freebsd-performance@freebsd.org; Stefan Lambrev
> > Subject: Re: Test on 10GBE Intel based network card
> >
> > Your nmbclusters is very low. You list it twice, so I'm assuming the
> > second value is what it ends up being, 32K :(
> >
> > I would set it to:
> >
> > kern.ipc.nmbclusters=262144
> >
> > Also, I thought you were using the current driver, but now it looks
> > like you are using something fairly old. Use my latest code, which is
> > 1.8.8.
> >
> > Jack
> >
> > On Tue, Aug 4, 2009 at 9:17 AM, Julian Elischer wrote:
> > Invernizzi Fabrizio wrote:
> > The limitation that you see is about the max number of packets that
> > FreeBSD can handle - it looks like your best performance is reached at
> > 64 byte packets?
> >
> > If you mean in terms of packets per second, you are right. These
> > are the packets per second measured during the tests:
> > 64 byte: 610119 pps
> > 512 byte: 516917 pps
> > 1492 byte: 464962 pps
> >
> > Am I correct that the maximum you can reach is around 639,000 packets
> > per second?
> >
> > Yes, as you can see the maximum is 610119 pps.
> > Where does this limit come from?
> >
> > ah that's the whole point of tuning :-) there are several possibilities:
> > 1/ the card's interrupts are probably attached to only 1 cpu, so that
> > cpu can do no more work
> >
> > This seems not to be the problem.
> > See below a top snapshot during a storm of 64-byte packets:
> >
> > last pid: 8552; load averages: 0.40, 0.09, 0.03 up 0+20:36:58 09:40:29
> > 124 processes: 13 running, 73 sleeping, 38 waiting
> > CPU: 0.0% user, 0.0% nice, 86.3% system, 12.3% interrupt, 1.5% idle
> > Mem: 13M Active, 329M Inact, 372M Wired, 68K Cache, 399M Buf, 7207M Free
> > Swap: 2048M Total, 2048M Free
> >
> >  PID USERNAME THR PRI NICE SIZE  RES STATE C  TIME   WCPU COMMAND
> >   11 root       1 171 ki31   0K  16K RUN   3 20.2H 51.17% idle: cpu3
> >   14 root       1 171 ki31   0K  16K RUN   0 20.2H 50.88% idle: cpu0
> >   12 root       1 171 ki31   0K  16K RUN   2 20.2H 50.49% idle: cpu2
> >   13 root       1 171 ki31   0K  16K RUN   1 20.2H 50.10% idle: cpu1
> >   42 root       1 -68    -   0K  16K RUN   1 14:20 36.47% ix0 rxq
> >   38 root       1 -68    -   0K  16K CPU0  0 14:15 36.08% ix0 rxq
> >   44 root       1 -68    -   0K  16K CPU2  2 14:08 34.47% ix0 rxq
> >   40 root       1 -68    -   0K  16K CPU3  3 13:42 32.37% ix0 rxq
> > ....
> >
> > It looks like the 4 rxq processes are bound to the 4 available cores
> > with equal distribution.
> >
> > 2/ if more than 1 cpu is working, it may be that there is a lock in
> > heavy contention somewhere.
> >
> > This I think is the problem. I am trying to understand how to
> > 1- see where the heavy contention is (context switching? Some limiting
> > setting?)
> > 2- mitigate it
> >
> > there is a lock profiling tool that right now I can't remember the
> > name of..
> >
> > look it up with google :-) FreeBSD lock profiling tool
> >
> > ah, first hit...
> >
> > http://blogs.epfl.ch/article/23832
> >
> > is the machine still responsive to other networks while running at
> > maximum capacity on this network? (make sure that the other networks
> > are on a different CPU (hmm I can't remember how to do that :-).
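
As a rough plausibility check on the XON/XOFF Rcvd values reported in this message, the sketch below computes how long a 10 Gb/s link would have to run at its theoretical maximum frame rate to deliver that many frames. It assumes minimum-size 64-byte frames plus 20 bytes of preamble and inter-frame gap per frame; the function names are illustrative and not part of any driver.

```python
# Hypothetical helper, not part of the ixgbe driver: lower bound on the time
# needed to receive a given number of Ethernet frames at 10 Gb/s.

LINE_RATE_BPS = 10_000_000_000  # 10 Gigabit Ethernet
MIN_FRAME_BYTES = 64            # assumed size of each counted frame (minimum Ethernet frame)
OVERHEAD_BYTES = 20             # 8-byte preamble + 12-byte inter-frame gap
SECONDS_PER_DAY = 86_400


def max_frames_per_second(line_rate_bps=LINE_RATE_BPS):
    """Theoretical maximum rate of minimum-size frames on the wire."""
    bits_per_frame = (MIN_FRAME_BYTES + OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_per_frame


def min_days_to_receive(frame_count):
    """Days of fully saturated traffic needed to reach frame_count."""
    return frame_count / max_frames_per_second() / SECONDS_PER_DAY


if __name__ == "__main__":
    # Counter values copied from the messages in this thread.
    reported = {
        "ix0 (1.8.6 on 7.2)": 5828048552040,
        "ix0 (1.8.8 on 8.0)": 7950055973552,
        "ix1 (1.8.8 on 8.0)": 7946320044993,
    }
    for label, count in reported.items():
        print(f"{label}: at least {min_days_to_receive(count):.1f} days at line rate")
```

Each value would need several days of nothing but minimum-size frames at full line rate, which is hard to reconcile with "essentially no traffic" and fits the suspicion that the counters are not reporting real flow-control activity.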
From fabrizio.invernizzi at telecomitalia.it Wed Aug 19 10:13:41 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Wed Aug 19 10:13:48 2009
Subject: Strange CPU distributionat very high level bandwidth
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A456967AF4@GRFMBX702BA020.griffon.local>

Hi all

I am continuing with some performance tests on a 10GbE network card with FreeBSD.

This is the test: I send UDP traffic into both ports of the card, each stream to be forwarded out of the other port.
Using 1492-byte packets, I increase the number of packets per second sent in order to find the maximum bandwidth (or pps) the system can sustain without losses.

The limit seems to be about 1890 Mbps per port (3870 Mbps total).
Looking more in depth at the CPU behaviour, I see this:
- increasing the sent pps increases the interrupt time (to about 90%)
- when I am very close to the limit, interrupt time falls to about 10% and the CPU is almost always (85%) in system (rx/tx driver procedure)

Questions:
- Isn't AIM meant to counter this behaviour by limiting the interrupts sent to the CPU? (nothing changes if I disable it)
- Why does the system start losing packets in that condition?
- Why does the system seem to perform better when it is managing more context switches?

These are my system details:

- HP 380 G5 (XEON X5420, CPU speed: 2.50GHz, BUS speed: 1333 MHz, L2 cache size: 12 MB, L2 cache speed: 2,5 GHz) with 1 quad-core installed.
- Network card: Silicom PE10G2i-LR - Dual Port Fiber (LR) 10 Gigabit Ethernet PCI Express Server Adapter Intel(R) based (chip 82598EB).
- FreeBSD 7.2-RELEASE (64 bit)

Driver ixgbe-1.8.6

hw.intr_storm_threshold:2000000

dev.ix.0.low_latency: 128
dev.ix.0.ave_latency: 400
dev.ix.0.bulk_latency: 1200
dev.ix.1.low_latency: 128
dev.ix.1.ave_latency: 400
dev.ix.1.bulk_latency: 1200

------------------------------------------------------------------
Telecom Italia
Fabrizio INVERNIZZI
Technology - TILAB
Accesso Fisso e Trasporto
Via Reiss Romoli, 274 10148 Torino
Tel. +39 011 2285497
Mob. +39 3316001344
Fax +39 06 41867287

From phoemix at harmless.hu Wed Aug 19 11:14:00 2009
From: phoemix at harmless.hu (Gergely CZUCZY)
Date: Wed Aug 19 11:14:07 2009
Subject: Strange CPU distributionat very high level bandwidth
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A456967AF4@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A456967AF4@GRFMBX702BA020.griffon.local>
Message-ID: <20090819125707.0000396e@unknown>

Hello,

Just a question. May I ask how many pps (packets per second) this traffic is? Forwarding performance actually depends on the pps rate and not on the bandwidth usage, as far as my experience goes. By my calculation from your data it might be around 166 Kpps, but I might be wrong there.

--
Sincerely,
Gergely CZUCZY
Harmless Digital Bt

+36-30-9702963

From fabrizio.invernizzi at telecomitalia.it Wed Aug 19 11:24:14 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Wed Aug 19 11:24:21 2009
Subject: Strange CPU distributionat very high level bandwidth
In-Reply-To: <20090819125707.0000396e@unknown>
References: <36A93B31228D3B49B691AD31652BCAE9A456967AF4@GRFMBX702BA020.griffon.local> <20090819125707.0000396e@unknown>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A456967AFF@GRFMBX702BA020.griffon.local>

There is an error in my previous email: the upper limit I see is 5722 Mbps (total).
I am sending 239700 pps per interface, with 1492-byte packets.

Fabrizio

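
The corrected figures in this exchange can be cross-checked with simple arithmetic. This is a minimal sketch, assuming the stated packet length is the size counted toward bandwidth (no preamble or inter-frame gap) and that Mbps means 10^6 bits per second; the function names are illustrative.

```python
# Illustrative conversions between packet rate and bandwidth.
# Assumption: bandwidth counts only the stated packet bytes.


def mbps_from_pps(pps, packet_bytes):
    """Bandwidth in Mbps for a given packet rate and packet size."""
    return pps * packet_bytes * 8 / 1e6


def pps_from_mbps(mbps, packet_bytes):
    """Packet rate for a given bandwidth in Mbps and packet size."""
    return mbps * 1e6 / (packet_bytes * 8)


if __name__ == "__main__":
    per_port = mbps_from_pps(239_700, 1492)
    print(f"239700 pps x 1492 B = {per_port:.0f} Mbps per port, "
          f"{2 * per_port:.0f} Mbps over two ports")
    print(f"1890 Mbps at 1492 B = {pps_from_mbps(1890, 1492):.0f} pps per port")
```

Under these assumptions, 239700 pps of 1492-byte packets per interface corresponds to the 5722 Mbps total given in the correction, while the earlier figure of 1890 Mbps per port would be roughly 158 Kpps per port.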
From ivoras at freebsd.org Thu Aug 20 23:35:48 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Thu Aug 20 23:35:55 2009
Subject: Strange CPU distributionat very high level bandwidth
In-Reply-To: <36A93B31228D3B49B691AD31652BCAE9A456967AF4@GRFMBX702BA020.griffon.local>
References: <36A93B31228D3B49B691AD31652BCAE9A456967AF4@GRFMBX702BA020.griffon.local>
Message-ID:

Invernizzi Fabrizio wrote:
> Hi all
>
> I am continuing with some performance tests on a 10GbE network card with FreeBSD.
>
> This is the test: I send UDP traffic into both ports of the card, each stream to be forwarded out of the other port.
> Using 1492-byte packets, I increase the number of packets per second sent in order to find the maximum bandwidth (or pps) the system can sustain without losses.
>
> The limit seems to be about 1890 Mbps per port (3870 Mbps total).
> Looking more in depth at the CPU behaviour, I see this:
> - increasing the sent pps increases the interrupt time (to about 90%)
> - when I am very close to the limit, interrupt time falls to about 10% and the CPU is almost always (85%) in system (rx/tx driver procedure)
>
> Questions:
> - Isn't AIM meant to counter this behaviour by limiting the interrupts sent to the CPU? (nothing changes if I disable it)
> - Why does the system start losing packets in that condition?
> - Why does the system seem to perform better when it is managing more context switches?
>
> - FreeBSD 7.2-RELEASE (64 bit)

One idea for you, not directly tied to forwarding as such but to the recent development of multithreaded packet acceptance code, is to use 8.x (currently in BETA, so the usual precautions about debugging being enabled apply) and then play with the netisr and worker thread settings.

See the source here:

http://svn.freebsd.org/viewvc/base/head/sys/net/netisr.c?view=markup&pathrev=195078

and the comments starting at "Three direct dispatch policies are supported".

The code is experimental and thus disabled in 8.0 unless a combination of the following loader tunables is set:

net.isr.direct_force
net.isr.direct
net.isr.maxthreads
net.isr.bindthreads

I think you can start simply by turning off net.isr.direct_force and then increasing net.isr.maxthreads until the benefits (if any) go away. Since it is experimental code, your benchmarks would be nice to have.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 259 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-performance/attachments/20090820/63d1d302/signature.pgp

From fabrizio.invernizzi at telecomitalia.it Fri Aug 21 08:15:55 2009
From: fabrizio.invernizzi at telecomitalia.it (Invernizzi Fabrizio)
Date: Fri Aug 21 08:16:02 2009
Subject: Strange CPU distributionat very high level bandwidth
In-Reply-To:
References: <36A93B31228D3B49B691AD31652BCAE9A456967AF4@GRFMBX702BA020.griffon.local>
Message-ID: <36A93B31228D3B49B691AD31652BCAE9A45770696C@GRFMBX702BA020.griffon.local>

Thanks for your suggestion.
I hope to have time to do some tests on 8.0 and send some results to the ML next week.

------------------------------------------------------------------
Telecom Italia
Fabrizio INVERNIZZI
Technology - TILAB
Accesso Fisso e Trasporto
Via Reiss Romoli, 274 10148 Torino
Tel. +39 011 2285497
Mob. +39 3316001344
Fax +39 06 41867287
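
For the tests on 8.0, the procedure suggested by Ivan Voras (turn off net.isr.direct_force, then raise net.isr.maxthreads step by step) could be prepared as a series of /boot/loader.conf fragments, one per benchmark run. The sketch below only generates the text; the tunable names come from his message, while the helper names, the values, and the choice to leave net.isr.direct and net.isr.bindthreads at their defaults are assumptions made for illustration, not tested recommendations.

```python
# Hypothetical helper for planning benchmark runs; it only builds text and
# does not touch the system. Tunable names are taken from the thread.


def loader_conf_fragment(maxthreads):
    """Return loader.conf lines for one run with the given thread count."""
    settings = {
        "net.isr.direct_force": 0,        # turned off, as suggested in the thread
        "net.isr.maxthreads": maxthreads,  # value to be swept between runs
    }
    return "\n".join(f'{name}="{value}"' for name, value in settings.items())


def sweep(ncpu):
    """One fragment per maxthreads value from 1 up to the number of cores."""
    return [loader_conf_fragment(n) for n in range(1, ncpu + 1)]


if __name__ == "__main__":
    # The test machine in this thread has one quad-core CPU.
    for fragment in sweep(4):
        print(fragment)
        print()
```

Each fragment would be applied in turn, followed by a reboot and a repeat of the same forwarding test, stopping once the measured loss-free rate no longer improves.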