From paul at gtcomm.net Tue Jul 1 00:05:55 2008
From: paul at gtcomm.net (Paul)
Date: Tue Jul 1 00:06:00 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <48695BA6.7060207@ibctech.ca>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <48695BA6.7060207@ibctech.ca>
Message-ID: <4869755A.1020903@gtcomm.net>

I am getting this message with normal routing.

say...

em0 10.1.1.1/24

em1 10.2.2.1/24

using a box 10.1.1.2 on em0
and having another box on 10.2.2.2 on em1

I send packets from 10.1.1.2, which go through em0 and have a route to
10.2.2.2 out em1 of course, and I get MASSIVE RTM_MISS messages but ONLY with
these certain packets. I don't get it. I posted the tcpdump of the types of
packets that generate them and the ones that don't.

RTM_MISS is normal if the box can't find a route; it's the 'destination unreachable' message.
I would prefer a kernel option to disable this message to save CPU cycles, though, as it is completely unnecessary to generate.
I even set the default gateway to the loopback interface and I STILL get the message. Something is wrong in the code somewhere.
Does anyone have any idea how to disable this message? It's causing major CPU usage in my zebra daemon, which is watching the route messages, and it is most likely severely limiting pps throughput :/

It generates the messages with only IPs on em1 and em0, nothing else in the routing table, and a default gateway set. So it has nothing to do with zebra.
It happens in 7-STABLE and (8) -CURRENT; I tested both.
There are no RTM_MISS messages in 7-RELEASE, so someone changed something in -STABLE :/

Paul

Steve Bertrand wrote:
> Mike Tancsa wrote:
>> At 04:04 AM 6/29/2008, Paul wrote:
>>> This is just a question but who can get more than 400k pps
>>> forwarding performance ?
>>
>>
>> OK, I set up 2 boxes on either end of a RELENG_7 box from about May
>> 7th just now, to see with 2 boxes blasting across it how it would
>> work. *However*, this is with no firewall loaded, and I must enable
>> ip fast forwarding. Without that enabled, the box just falls over.
>>
>> Even at 20Kpps, I start seeing all sorts of messages spewing to
>> route -n monitor:
>>
>>
>> got message of size 96 on Mon Jun 30 15:39:10 2008
>> RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno
>> 0, flags:
>> locks: inits:
>> sockaddrs:
>> default
>
> Mike,
>
> Is the monitor running on the 7.0 box in the middle that you are testing?
>
> I set up the same configuration, and even with almost no load
> (<1Kpps) can replicate these error messages by making the remote IP
> address (in your case 'default') disappear (i.e., unplug the cable, DDoS,
> etc.).
>
> ...to go further, I can even replicate the problem at a single packet per
> second by trying to ping an IP address that I know for a fact the
> router cannot reach.
>
> Do you see these error messages if you set up a loopback address with
> an IP on the router, and effectively chop your test environment in
> half? In your case, can the router in the middle actually reach a
> default gateway for external addresses? (When I perform the test, your
> 'default' is replaced with the IP I am trying to reach, so I am
> only assuming that 'default' means the default gateway.)
>
> Steve
>

From pyunyh at gmail.com Tue Jul 1 00:29:11 2008
From: pyunyh at gmail.com (Pyun YongHyeon)
Date: Tue Jul 1 00:29:16 2008
Subject: kern/125024: vr(4) does not see incoming multicast packets in non-promiscuous mode (broadcast is fine); breaks IPv6
In-Reply-To: <48693193.1020404@ab.ote.we.lv>
References: <200806270345.m5R3j1BT036253@freefall.freebsd.org> <48649776.9040509@ab.ote.we.lv> <20080627074948.GC67753@cdnetworks.co.kr> <4864A217.3040309@ab.ote.we.lv> <20080629072932.GA76469@cdnetworks.co.kr> <48693193.1020404@ab.ote.we.lv>
Message-ID: <20080701002659.GC83626@cdnetworks.co.kr>

On Mon, Jun 30, 2008 at 12:18:43PM -0700, Eugene M. Kim wrote:
> Thank you! The new patch fixed the problem. I'll put it under test for
> a few more days and let you know if any regression is seen.
>

Cool, thanks for testing!

> Cheers,
> Eugene
>
> Pyun YongHyeon wrote:
> >On Fri, Jun 27, 2008 at 01:17:27AM -0700, Eugene M. Kim wrote:
> > > Pyun YongHyeon wrote:
> > > >I've updated the patch again. There was a bug that disabled
> > > >the multicast filter. Back out the previous patch and try again.
> > > >The URL is the same as before.
> > > >
> > > > > Regards,
> > > > > Eugene
> > > >
> > >
> > > rtsol still doesn't work with vr0 in non-promiscuous mode.
> > > However, vr0 apparently picked up router solicitations during system
> > > boot-up, as ifconfig shows the correct prefixes autoconfigured. It
> > > seems something goes wrong in between. 'o 'a
> > >
> >
> >Oops, I was accessing the CAM mask register as an 8-bit register when it
> >should be 32-bit! Try the patch at the following URL.
> >
> >http://people.freebsd.org/~yongari/vr/vr.cam.patch2
> >
> > > Eugene
> >

--
Regards,
Pyun YongHyeon

From if at xip.at Tue Jul 1 00:35:57 2008
From: if at xip.at (Ingo Flaschberger)
Date: Tue Jul 1 00:36:02 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <4869755A.1020903@gtcomm.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <48695BA6.7060207@ibctech.ca> <4869755A.1020903@gtcomm.net>
Message-ID:

Dear Paul,

> I am getting this message with normal routing.
>
> say...
>
> em0 10.1.1.1/24
>
> em1 10.2.2.1/24
>
> using a box 10.1.1.2 on em0
> and having another box on 10.2.2.2 on em1
>
> I send packets from 10.1.1.2, which go through em0 and have a route to
> 10.2.2.2 out em1 of course, and I get MASSIVE RTM_MISS messages but ONLY with
> these certain packets. I don't get it. I posted the tcpdump of the types of

There is an open bug report:
http://www.freebsd.org/cgi/query-pr.cgi?pr=124540

Perhaps it has something to do with the multiple FIB stuff?

Kind regards,
Ingo Flaschberger

From alex.wilkinson at dsto.defence.gov.au Tue Jul 1 00:54:53 2008
From: alex.wilkinson at dsto.defence.gov.au (Wilkinson, Alex)
Date: Tue Jul 1 00:54:58 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <200806301944.m5UJifJD081781@lava.sentex.ca>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca>
Message-ID: <20080701004346.GA3898@stlux503.dsto.defence.gov.au>

On Mon, Jun 30, 2008 at 03:44:48PM -0400, Mike Tancsa wrote:

>OK, I set up 2 boxes on either end of a RELENG_7 box from about May
>7th just now, to see with 2 boxes blasting across it how it would
>work. *However*, this is with no firewall loaded, and I must enable
>ip fast forwarding. Without that enabled, the box just falls over.

What is "ip fast forwarding" ?
 -aW

IMPORTANT: This email remains the property of the Australian Defence Organisation and is subject to the jurisdiction of section 70 of the CRIMES ACT 1914. If you have received this email in error, you are requested to contact the sender and delete the email.

From if at xip.at Tue Jul 1 01:00:34 2008
From: if at xip.at (Ingo Flaschberger)
Date: Tue Jul 1 01:00:40 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <20080701004346.GA3898@stlux503.dsto.defence.gov.au>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au>
Message-ID:

Dear Alex,

> >OK, I set up 2 boxes on either end of a RELENG_7 box from about May
> >7th just now, to see with 2 boxes blasting across it how it would
> >work. *However*, this is with no firewall loaded, and I must enable
> >ip fast forwarding. Without that enabled, the box just falls over.
>
> What is "ip fast forwarding" ?

Instead of copying the whole IP packet into system memory, only the IP
header is copied, and a "fast" path then determines whether the packet
can be fast forwarded.
If possible, a new header is created in the other network card's buffer
and the IP data is copied from network-card buffer to network-card buffer
directly.
Kind regards,
Ingo Flaschberger

From alex.wilkinson at dsto.defence.gov.au Tue Jul 1 01:07:49 2008
From: alex.wilkinson at dsto.defence.gov.au (Wilkinson, Alex)
Date: Tue Jul 1 01:07:54 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To:
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au>
Message-ID: <20080701010716.GF3898@stlux503.dsto.defence.gov.au>

On Tue, Jul 01, 2008 at 03:00:31AM +0200, Ingo Flaschberger wrote:
>Dear Alex,
>
>> >OK, I set up 2 boxes on either end of a RELENG_7 box from about May
>> >7th just now, to see with 2 boxes blasting across it how it would
>> >work. *However*, this is with no firewall loaded, and I must enable
>> >ip fast forwarding. Without that enabled, the box just falls over.
>>
>> What is "ip fast forwarding" ?
>
>Instead of copying the whole IP packet into system memory, only the IP
>header is copied, and a "fast" path then determines whether the packet
>can be fast forwarded.
>If possible, a new header is created in the other network card's buffer
>and the IP data is copied from network-card buffer to network-card buffer
>directly.

So how does one enable "ip fast forwarding" on FreeBSD ?

 -aW
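Ingo's description comes down to this: on the fast path the CPU only reads and rewrites the IPv4 header. As a minimal sketch of that per-packet header work (not the actual ip_fastforward() code; `ip_cksum` and `ip_forward_ttl` are hypothetical names, and a 20-byte header without options is assumed), a forwarder checks the TTL, decrements it, and patches the header checksum incrementally per RFC 1624 instead of recomputing it over the whole header:

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes (even length assumed,
 * as for an IPv4 header). Returns the one's complement of the
 * folded 16-bit sum. */
static uint16_t ip_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                 /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Forwarding-path header work on a 20-byte IPv4 header (no options):
 * refuse TTL <= 1, otherwise decrement the TTL and patch the checksum
 * with RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), where m and m' are the
 * old and new 16-bit words holding TTL (byte 8) and protocol (byte 9).
 * Only header bytes are read or written; the payload is never touched.
 * Returns 1 if the packet may be forwarded, 0 otherwise. */
static int ip_forward_ttl(uint8_t *hdr)
{
    if (hdr[8] <= 1)
        return 0;                     /* would need ICMP time exceeded */

    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);

    uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
    uint32_t sum = (uint32_t)(uint16_t)~hc
                 + (uint32_t)(uint16_t)~old_word
                 + new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    uint16_t new_hc = (uint16_t)~sum;
    hdr[10] = (uint8_t)(new_hc >> 8);
    hdr[11] = (uint8_t)(new_hc & 0xff);
    return 1;
}
```

For the example header 45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7, ip_forward_ttl() lowers the TTL from 0x40 to 0x3f and rewrites the checksum from 0xb861 to 0xb961, the same value a full recomputation gives, without summing the other nine header words again.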
From if at xip.at Tue Jul 1 01:10:53 2008
From: if at xip.at (Ingo Flaschberger)
Date: Tue Jul 1 01:10:57 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <20080701010716.GF3898@stlux503.dsto.defence.gov.au>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au>
Message-ID:

Dear Alex,

> >If possible, a new header is created in the other network card's buffer
> >and the IP data is copied from network-card buffer to network-card buffer
> >directly.
>
> So how does one enable "ip fast forwarding" on FreeBSD ?

sysctl -w net.inet.ip.fastforwarding=1

Usually interface polling is also chosen to prevent "lock-ups".
man polling

Kind regards,
Ingo Flaschberger

From crapsh at monkeybrains.net Tue Jul 1 01:23:05 2008
From: crapsh at monkeybrains.net (Support (Rudy))
Date: Tue Jul 1 01:23:15 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To:
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au>
Message-ID: <486986D9.3000607@monkeybrains.net>

Ingo Flaschberger wrote:
> Usually interface polling is also chosen to prevent "lock-ups".
> man polling


I used polling in FreeBSD 5.x and it helped a bunch. I set up a new router with 7.0 and MSI was recommended to me. (I noticed no difference when moving from polling -> MSI; however, on 5.4 polling seemed to help a lot.) What are people using in 7.0?
polling or MSI?

Rudy

From thompsa at FreeBSD.org Tue Jul 1 01:24:10 2008
From: thompsa at FreeBSD.org (Andrew Thompson)
Date: Tue Jul 1 01:24:14 2008
Subject: if_bridge turns off checksum offload of members?
In-Reply-To: <20080630101629.GD79537@cdnetworks.co.kr>
References: <4868A34C.6030304@moneybookers.com> <20080630101629.GD79537@cdnetworks.co.kr>
Message-ID: <20080701012531.GA92392@citylink.fud.org.nz>

On Mon, Jun 30, 2008 at 07:16:29PM +0900, Pyun YongHyeon wrote:
> On Mon, Jun 30, 2008 at 12:11:40PM +0300, Stefan Lambrev wrote:
> > Greetings,
> >
> > I just noticed that when I add an em network card to a bridge, checksum
> > offload is turned off.
> > I even put in my rc.conf:
> > ifconfig_em0="rxcsum up"
> > ifconfig_em1="rxcsum up"
> > but after reboot both em0 and em1 have this feature disabled.
> >
> > Is this expected behavior? Should I care about csum in bridge mode?
> > I noticed that enabling checksum offload manually improves things a little, btw.
> >
>
> AFAIK this is intended; bridge(4) turns off Tx-side checksum
> offload by default. I think disabling Tx checksum offload is
> required because not all members of a bridge may be able to do checksum
> offload. The same is true for TSO, but it seems that bridge(4)
> doesn't disable it.
> If all members of the bridge have the same hardware capabilities, I think
> bridge(4) may not need to disable Tx-side hardware assistance. I
> guess bridge(4) could scan the capabilities of every member interface
> and decide which hardware assistance can be activated instead of
> blindly turning off Tx-side hardware assistance.

This patch should do that; are you able to test it, Stefan?


cheers,
Andrew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bridge_caps.diff
Type: text/x-diff
Size: 4315 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20080701/06a1f7f7/bridge_caps.bin

From steve at ibctech.ca Tue Jul 1 01:25:17 2008
From: steve at ibctech.ca (Steve Bertrand)
Date: Tue Jul 1 01:25:19 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <20080701010716.GF3898@stlux503.dsto.defence.gov.au>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au>
Message-ID: <4869877C.20007@ibctech.ca>

Wilkinson, Alex wrote:

> So how does one enable "ip fast forwarding" on FreeBSD ?

Not to take anything away from Ingo's response, but to make the setting persist across reboots, add the following line to /etc/sysctl.conf:

net.inet.ip.fastforwarding=1

Steve

From steve at ibctech.ca Tue Jul 1 01:27:41 2008
From: steve at ibctech.ca (Steve Bertrand)
Date: Tue Jul 1 01:27:44 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486986D9.3000607@monkeybrains.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net>
Message-ID: <4869880D.8040901@ibctech.ca>

Support (Rudy) wrote:
> Ingo Flaschberger wrote:
>> Usually interface polling is also chosen to prevent "lock-ups".
>> man polling
>
>
> I used polling in FreeBSD 5.x and it helped a bunch. I set up a new
> router with 7.0 and MSI was recommended to me. (I noticed no difference
> when moving from polling -> MSI; however, on 5.4 polling seemed to help
> a lot.)

I'm curious now... how do you change individual device polling via sysctl?
Steve

From mike at sentex.net Tue Jul 1 01:29:34 2008
From: mike at sentex.net (Mike Tancsa)
Date: Tue Jul 1 01:29:40 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <48694A9D.1030001@gtcomm.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <48694A9D.1030001@gtcomm.net>
Message-ID: <200807010129.m611TUV5083067@lava.sentex.ca>

At 05:05 PM 6/30/2008, Paul wrote:
>With hours and days of tweaking I can't even get 500k pps :/ no
>firewall, no anything else..
>What is your kernel config? Sysctl configs?

The only thing that makes a difference is net.inet.ip.fastforwarding=1

>My machine I'm testing on is a dual Opteron 2212, with an Intel 2-port
>82571 NIC..

Xeon dual core on a Supermicro MB. I am using one NIC on the MB and one on the dual port.

em0@pci0:10:1:0: class=0x020000 card=0x11798086 chip=0x10798086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82546EB Dual Port Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    cap 01[dc] = powerspec 2 supports D0 D3 current D0
    cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 05[f0] = MSI supports 1 message, 64 bit
em1@pci0:10:1:1: class=0x020000 card=0x11798086 chip=0x10798086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82546EB Dual Port Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
    cap 01[dc] = powerspec 2 supports D0 D3 current D0
    cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 05[f0] = MSI supports 1 message, 64 bit
em2@pci0:13:0:0: class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82573E Intel Corporation 82573E Gigabit Ethernet Controller (Copper)'
    class = network
    subclass = ethernet
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint
em3@pci0:14:0:0: class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82573L Intel PRO/1000 PL Network Adaptor'
    class = network
    subclass = ethernet
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint

>Using 7-STABLE, and I tried 6-STABLE and -CURRENT.
>I get the RTM_MISS with 7 and -CURRENT but only with certain types of
>packets at a certain rate.. :/

I wonder if it's a bug in the em driver? I don't have any other dual-port cards handy right now to test with.

>I cannot get more than 500kpps.. I tried everything I could think
>of... lowering the rx descriptors on em to 512 instead of 2048 gave
>me some more.. I was stuck at 400kpps until I changed those and
>lowered the rx processing limit.
>My tests go incoming on em0 and outgoing on em1 in one direction
>only, and it has major errors when the em0 taskq gets close to 80% CPU..

I now have 3 boxes generating traffic through the box acting as a router. I will try some other operating systems as well to see how they compare when I'm back at the office on Wednesday.

>I am pretty disappointed that it maxes out a little over 400kpps, and
>even then it gets some errors here and there, mainly missed packets
>due to no buffer and rx overruns (dev.em.0.stats=1)

Something about the MB you are using, perhaps? Just for rough comparison, how long does

# time make -j4 buildkernel > /var/log/build.out.k
670.485u 66.061s 8:29.54 144.5% 5962+1087k 9185+7419io 380pf+0w

take on your machine? The above value is with inet6 and sctp commented out of the kernel.

 ---Mike

>Mike Tancsa wrote:
>>At 04:04 AM 6/29/2008, Paul wrote:
>>>This is just a question but who can get more than 400k pps
>>>forwarding performance ?
>>
>>
>>OK, I set up 2 boxes on either end of a RELENG_7 box from about May
>>7th just now, to see with 2 boxes blasting across it how it would work.
>>*However*, this is with no firewall loaded, and I must enable ip
>>fast forwarding. Without that enabled, the box just falls over.
>>
>>Even at 20Kpps, I start seeing all sorts of messages spewing to
>>route -n monitor:
>>
>>
>>got message of size 96 on Mon Jun 30 15:39:10 2008
>>RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0,
>>errno 0, flags:
>>locks: inits:
>>sockaddrs:
>> default
>>
>>I am starting to wonder if those messages are the result of
>>corrupted packets the machine just can't keep up with?
>>
>>
>>CPU is
>>
>>CPU: Intel(R) Xeon(R) CPU 3070 @ 2.66GHz (2660.01-MHz
>>686-class CPU)
>>
>>
>>            input        (Total)           output
>>   packets  errs      bytes    packets  errs      bytes colls
>>    611945     0   77892098     611955     0   77013002     0
>>    616727     0   78215508     616742     0   77303454     0
>>    617066     0   78162130     617082     0   77238434     0
>>    618238     0   78302314     618225     0   77377582     0
>>    617035     0   78141000     617038     0   77215672     0
>>    617625     0   78225600     617588     0   77301734     0
>>    616190     0   78017320     616165     0   77091774     0
>>    615583     0   78064130     615628     0   77152800     0
>>    617662     0   78254388     617658     0   77332340     0
>>    618000     0   78269912     617950     0   77344554     0
>>    617248     0   78183136     617315     0   77259588     0
>>    617325     0   78204566     617289     0   77282094     0
>>    618391     0   78337734     618357     0   77413756     0
>>    616025     0   78116070     616082     0   77203116     0
>>
>>
>>To generate the packets, I am just using
>>/usr/src/tools/tools/netblast on 2 endpoints starting at about the same time
>>
>># ./netblast 10.10.1.2 500 100 40
>>
>>start:             1214854131.083679919
>>finish:            1214854171.084668592
>>send calls:        20139141
>>send errors:       0
>>approx send rate:  503478
>>approx error rate: 0
>>
>>
>># ./netblast 10.10.1.3 500 10 40
>>
>>start:             1214854273.882202815
>>finish:            1214854313.882319031
>>send calls:        23354971
>>send errors:       18757223
>>approx send rate:  114943
>>approx error rate: 0
>>
>>The box in the middle doing the forwarding
>>
>>1[spare-r7]# ifconfig -u
>>em0: flags=8843 metric 0 mtu 1500
>>
>>options=19b
>>        ether 00:1b:21:08:32:a8
>>        inet 10.20.1.1 netmask 0xffffff00 broadcast
10.20.1.255 >> media: Ethernet autoselect (1000baseTX ) >> status: active >>em1: flags=8843 metric 0 mtu 1500 >> options=9b >> ether 00:1b:21:08:32:a9 >> inet 192.168.43.193 netmask 0xffffff00 broadcast 192.168.43.255 >> media: Ethernet autoselect (100baseTX ) >> status: active >>em3: flags=8843 metric 0 mtu 1500 >> >>options=19b >> ether 00:30:48:90:4c:ff >> inet 10.10.1.1 netmask 0xffffff00 broadcast 10.10.1.255 >> media: Ethernet autoselect (1000baseTX ) >> status: active >>lo0: flags=8049 metric 0 mtu 16384 >> inet 127.0.0.1 netmask 0xff000000 >> >> >>I am going to try a few more tests with and without, firewall rules >>etc as well as an updated kernel to RELENG_7 as of today and see how that goes. >> >> ---Mike >> From if at xip.at Tue Jul 1 01:36:07 2008 From: if at xip.at (Ingo Flaschberger) Date: Tue Jul 1 01:36:12 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486986D9.3000607@monkeybrains.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> Message-ID: Dear Rudy, > I used polling in FreeBSD 5.x and it helped a bunch. I set up a new router > with 7.0 and MSI was recommended to me. (I noticed no difference when moving > from polling -> MSI, however, on 5.4 polling seemed to help a lot. What are > people using in 7.0? > polling or MSI? if you have a inet-router with gige-uplinks, it is possible that there will be (d)dos attacks. only polling helps you then to keep the router manageable (but dropping packets). 
Kind regards,
Ingo Flaschberger

From if at xip.at Tue Jul 1 01:39:02 2008
From: if at xip.at (Ingo Flaschberger)
Date: Tue Jul 1 01:39:06 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <4869880D.8040901@ibctech.ca>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <4869880D.8040901@ibctech.ca>
Message-ID:

Dear Steve,

> I'm curious now... how do you change individual device polling via sysctl?

Not via sysctl, via ifconfig:

# enable interface polling
/sbin/ifconfig em0 polling
/sbin/ifconfig em1 polling
/sbin/ifconfig em2 polling
/sbin/ifconfig em3 polling

(and via /etc/rc.local to make it persist across reboots)

Kind regards,
Ingo Flaschberger

From mike at sentex.net Tue Jul 1 01:57:13 2008
From: mike at sentex.net (Mike Tancsa)
Date: Tue Jul 1 01:57:18 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <48695BA6.7060207@ibctech.ca>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <48695BA6.7060207@ibctech.ca>
Message-ID: <200807010157.m611vAOt083163@lava.sentex.ca>

At 06:18 PM 6/30/2008, Steve Bertrand wrote:
>Mike Tancsa wrote:
>>At 04:04 AM 6/29/2008, Paul wrote:
>>>This is just a question but who can get more than 400k pps
>>>forwarding performance ?
>>
>>OK, I set up 2 boxes on either end of a RELENG_7 box from about May
>>7th just now, to see with 2 boxes blasting across it how it would work.
>>*However*, this is with no firewall loaded, and I must enable ip
>>fast forwarding. Without that enabled, the box just falls over.
>>Even at 20Kpps, I start seeing all sorts of messages spewing to
>>route -n monitor:
>>
>>got message of size 96 on Mon Jun 30 15:39:10 2008
>>RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0,
>>errno 0, flags:
>>locks: inits:
>>sockaddrs:
>> default
>
>Mike,
>
>Is the monitor running on the 7.0 box in the middle that you are testing?

In the middle, on the box that has

0[r7-router]# ifconfig -u
em0: flags=8843 metric 0 mtu 1500
        options=9b
        ether 00:1b:21:08:32:a8
        inet 10.20.1.1 netmask 0xffffff00 broadcast 10.20.1.255
        media: Ethernet autoselect (1000baseTX )
        status: active
em1: flags=8843 metric 0 mtu 1500
        options=9b
        ether 00:1b:21:08:32:a9
        inet 192.168.43.193 netmask 0xffffff00 broadcast 192.168.43.255
        media: Ethernet autoselect (100baseTX )
        status: active
em3: flags=8843 metric 0 mtu 1500
        options=19b
        ether 00:30:48:90:4c:ff
        inet 10.10.1.1 netmask 0xffffff00 broadcast 10.10.1.255
        media: Ethernet autoselect (1000baseTX )
        status: active
lo0: flags=8049 metric 0 mtu 16384
        inet 127.0.0.1 netmask 0xff000000
0[r7-router]#

If I don't have fast forwarding on, then yes, it seems each forwarded packet generates one error message. This sure does feel like a driver bug, but I just tried on another machine with rl0 acting as a forwarding interface, and it too generates a message per packet forwarded!
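The angle-bracketed capability names that ifconfig prints after `options=` were lost in this archive. A small decoder makes the difference between the interfaces visible: em0 and em1 report 9b while em3 reports 19b, and the extra 0x100 bit is TSO4. The bit values below are assumed to match FreeBSD 7's IFCAP_* constants in <net/if.h> (only the low bits are listed), and `decode_caps`/`common_caps` are hypothetical helpers. `common_caps` is also a toy version of the intersection Pyun suggested for bridge(4) earlier in this digest (keep only the offloads every member supports); it is not Andrew's bridge_caps.diff, which was scrubbed from the archive.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Capability bits shown in ifconfig's "options=" field. Values assumed
 * to mirror FreeBSD 7's IFCAP_* definitions; low bits only. */
static const struct { unsigned bit; const char *name; } caps[] = {
    { 0x0001, "RXCSUM" },         { 0x0002, "TXCSUM" },
    { 0x0004, "NETCONS" },        { 0x0008, "VLAN_MTU" },
    { 0x0010, "VLAN_HWTAGGING" }, { 0x0020, "JUMBO_MTU" },
    { 0x0040, "POLLING" },        { 0x0080, "VLAN_HWCSUM" },
    { 0x0100, "TSO4" },           { 0x0200, "TSO6" },
    { 0x0400, "LRO" },
};

/* Render a mask as "hex<NAME,NAME,...>", the way ifconfig shows it. */
static void decode_caps(unsigned mask, char *buf, size_t len)
{
    size_t n = (size_t)snprintf(buf, len, "%x<", mask);
    int first = 1;
    for (size_t i = 0; i < sizeof caps / sizeof caps[0] && n < len; i++) {
        if (mask & caps[i].bit) {
            n += (size_t)snprintf(buf + n, len - n, "%s%s",
                                  first ? "" : ",", caps[i].name);
            first = 0;
        }
    }
    if (n < len)
        snprintf(buf + n, len - n, ">");
}

/* Offloads a bridge could leave on: only those every member supports. */
static unsigned common_caps(const unsigned *member_caps, size_t count)
{
    unsigned acc = ~0u;
    for (size_t i = 0; i < count; i++)
        acc &= member_caps[i];
    return count ? acc : 0;
}
```

With these tables, 0x19b decodes to 19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4> and 0x9b to the same list without TSO4, so bridging em0 (9b) with em3 (19b) would leave 0x9b enabled, dropping TSO4.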
From one end point that sends packets across r7-router:

# ping -c3 -S 10.10.1.2 10.20.1.3
PING 10.20.1.3 (10.20.1.3) from 10.10.1.2: 56 data bytes
64 bytes from 10.20.1.3: icmp_seq=0 ttl=63 time=0.349 ms
64 bytes from 10.20.1.3: icmp_seq=1 ttl=63 time=0.319 ms
64 bytes from 10.20.1.3: icmp_seq=2 ttl=63 time=0.343 ms

--- 10.20.1.3 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.319/0.337/0.349/0.013 ms

Here are all the messages generated on the intermediary router:

1[r7-router]# route -n monitor

got message of size 96 on Mon Jun 30 21:36:41 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Mon Jun 30 21:36:41 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Mon Jun 30 21:36:42 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Mon Jun 30 21:36:42 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Mon Jun 30 21:36:42 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Mon Jun 30 21:36:43 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Mon Jun 30 21:36:43 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Mon Jun 30 21:36:44 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

If I turn on fast forwarding on r7-router, route -n monitor doesn't spew out messages.

>I set up the same configuration, and
>even with almost no load (<1Kpps) can replicate these error messages
>by making the remote IP address (in your case 'default') disappear
>(i.e., unplug the cable, DDoS, etc.).
>
>...to go further, I can even replicate the problem at a single packet
>per second by trying to ping an IP address that I know for a fact
>the router cannot reach.

I get it when I ping a valid host on the other side, which responds.

>Do you see these error messages if you set up a loopback address
>with an IP on the router, and effectively chop your test environment
>in half? In your case, can the router in the middle actually reach
>a default gateway for external addresses? (When I perform the test,
>your 'default' is replaced with the IP I am trying to reach, so I
>am only assuming that 'default' means the default gateway.)

No, it's only when the box is forwarding packets. If I do the following on the router,

1[r7-router]# sysctl -w net.inet.ip.fastforwarding=0
net.inet.ip.fastforwarding: 1 -> 0
0[r7-router]#
0[r7-router]# ifconfig lo0 10.40.1.1/32 alias
0[r7-router]# route -n monitor

...all is quiet on the router if I do the following:

# ping -c3 -S 10.10.1.2 10.40.1.1
PING 10.40.1.1 (10.40.1.1) from 10.10.1.2: 56 data bytes
64 bytes from 10.40.1.1: icmp_seq=0 ttl=64 time=0.221 ms
64 bytes from 10.40.1.1: icmp_seq=1 ttl=64 time=0.177 ms
64 bytes from 10.40.1.1: icmp_seq=2 ttl=64 time=0.213 ms

--- 10.40.1.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.177/0.204/0.221/0.019 ms

 ---Mike

From steve at ibctech.ca Tue Jul 1 01:58:52 2008
From: steve at ibctech.ca (Steve Bertrand)
Date: Tue Jul 1 01:58:55 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <200807010129.m611TUV5083067@lava.sentex.ca>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <48694A9D.1030001@gtcomm.net> <200807010129.m611TUV5083067@lava.sentex.ca>
Message-ID: <48698F5B.2070605@ibctech.ca>

Mike Tancsa wrote:

>>> The box in the middle doing the forwarding

If I can help in any way, a topology map of the setup you are facing would be good. What do you have at either end?

In the interest of pushing 500kpps, I have this, if it helps with troubleshooting:

em0@pci1:0:0: class=0x020000 card=0x00008086 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
em1@pci2:0:0: class=0x020000 card=0x00008086 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
em2@pci3:0:0: class=0x020000 card=0x00008086 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
em3@pci4:0:0: class=0x020000 card=0x00008086 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
em4@pci5:0:0: class=0x020000 card=0x00008086 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
em5@pci6:0:0: class=0x020000 card=0x00008086 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
em6@pci7:0:0: class=0x020000 card=0x00008086 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet

Steve

From mike at sentex.net Tue Jul 1 02:05:06 2008
From: mike at sentex.net (Mike Tancsa)
Date: Tue Jul 1 02:05:11 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <20080701004346.GA3898@stlux503.dsto.defence.gov.au>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au>
Message-ID: <200807010205.m61250IT083199@lava.sentex.ca>

At 08:43 PM 6/30/2008, Wilkinson, Alex wrote:
> On Mon, Jun 30, 2008 at 03:44:48PM -0400, Mike Tancsa wrote:
>
> >OK, I set up 2 boxes on either end of a RELENG_7 box from about May
> >7th just now,
> >to see with 2 boxes blasting across it how it would
> >work. *However*, this is with no firewall loaded, and I must enable
> >ip fast forwarding. Without that enabled, the box just falls over.
>
>What is "ip fast forwarding" ?

From /usr/src/sys/netinet/ip_fastfwd.c

/*
 * ip_fastforward gets its speed from processing the forwarded packet to
 * completion (if_output on the other side) without any queues or netisr's.
 * The receiving interface DMAs the packet into memory, the upper half of
 * driver calls ip_fastforward, we do our routing table lookup and directly
 * send it off to the outgoing interface, which DMAs the packet to the
 * network card. The only part of the packet we touch with the CPU is the
 * IP header (unless there are complex firewall rules touching other parts
 * of the packet, but that is up to you). We are essentially limited by bus
 * bandwidth and how fast the network card/driver can set up receives and
 * transmits.
 *
 * We handle basic errors, IP header errors, checksum errors,
 * destination unreachable, fragmentation and fragmentation needed and
 * report them via ICMP to the sender.
 *
 * Else if something is not pure IPv4 unicast forwarding we fall back to
 * the normal ip_input processing path. We should only be called from
 * interfaces connected to the outside world.
 *
 * Firewalling is fully supported including divert, ipfw fwd and ipfilter
 * ipnat and address rewrite.
 *
 * IPSEC is not supported if this host is a tunnel broker. IPSEC is
 * supported for connections to/from local host.
 *
 * We try to do the least expensive (in CPU ops) checks and operations
 * first to catch junk with as little overhead as possible.
 *
 * We take full advantage of hardware support for IP checksum and
 * fragmentation offloading.
 *
 * We don't do ICMP redirect in the fast forwarding path. I have had my own
 * cases where two core routers with Zebra routing suite would send millions
 * ICMP redirects to connected hosts if the destination router was not the
 * default gateway.
In one case it was filling the routing table of a host * with approximately 300.000 cloned redirect entries until it ran out of * kernel memory. However the networking code proved very robust and it didn't * crash or fail in other ways. */ > -aW > >IMPORTANT: This email remains the property of the Australian Defence >Organisation and is subject to the jurisdiction of section 70 of the >CRIMES ACT 1914. If you have received this email in error, you are >requested to contact the sender and delete the email. > > >_______________________________________________ >freebsd-net@freebsd.org mailing list >http://lists.freebsd.org/mailman/listinfo/freebsd-net >To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From paul at gtcomm.net Tue Jul 1 02:38:21 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 02:38:25 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> Message-ID: <48699915.4020804@gtcomm.net> Well it's supposed to, but it doesn't seem to do it as well as it should :> How about copying header direct DMA from NIC into cache, then copy from cache into output NIC after applying whatever filters/changes/etc? Ingo Flaschberger wrote: > Dear Alex, > >> >OK, I setup 2 boxes on either end of a RELENG_7 box from about May >> >7th just now, to see with 2 boxes blasting across it how it would >> >work. *However*, this is with no firewall loaded and, I must enable >> >ip fast forwarding. Without that enabled, the box just falls over. >> >> What is "ip fast forwarding" ? > > instead of copying the while ip packet into system memory, only the ip > header is copyied and then in a "fast" path determined if it could be > fast forwarded. 
> if possible, a new header is created in the other network card's buffer
> and the IP data is copied from network card buffer to
> network card buffer directly.
>
> Kind regards,
> Ingo Flaschberger
>

From paul at gtcomm.net Tue Jul 1 02:39:35 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 02:39:39 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486986D9.3000607@monkeybrains.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> Message-ID: <48699960.9070100@gtcomm.net>

All the NIC drivers in 7 pretty much use interrupt moderation, so it can never lock the machine anyway.. This effectively kills polling; it really no longer has any use except to be able to have a fraction of the CPU set aside for user space, but you can do that anyway with SMP.

Support (Rudy) wrote:
> Ingo Flaschberger wrote:
>> usually interface polling is also chosen to prevent "lock-ups".
>> man polling
>
>
> I used polling in FreeBSD 5.x and it helped a bunch. I set up a new
> router with 7.0 and MSI was recommended to me. (I noticed no
> difference when moving from polling -> MSI; however, on 5.4 polling
> seemed to help a lot.) What are people using in 7.0?
> polling or MSI?
>
> Rudy
>

From alex.wilkinson at dsto.defence.gov.au Tue Jul 1 03:02:46 2008 From: alex.wilkinson at dsto.defence.gov.au (Wilkinson, Alex) Date: Tue Jul 1 03:02:51 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48699960.9070100@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> Message-ID: <20080701030216.GP3898@stlux503.dsto.defence.gov.au>

0n Mon, Jun 30, 2008 at 10:41:36PM -0400, Paul wrote:
>All the NIC drivers in 7 pretty much use interrupt moderation so it can
>never lock the machine anyway.. This effectively kills polling and it
>really no longer has any use except to be able to have a fraction of the
>cpu set aside for user space but you can do that anyway with SMP

what is "interrupt moderation" ?

-aW
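Alex's question can be illustrated with a toy model. The sketch below assumes a generic NIC that raises one receive interrupt once either `max_frames` frames are pending or the oldest pending frame has waited `delay_us` microseconds. Both knobs and the function name `count_interrupts` are hypothetical; they only mirror the general "interrupt after N frames or T microseconds" idea and do not correspond to em(4)'s actual registers or sysctls. Without moderation, the interrupt count would simply equal the number of frames.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of receive interrupt moderation (coalescing).
 * Hypothetical parameters, not em(4)'s real registers:
 *   max_frames - interrupt as soon as this many frames are pending
 *   delay_us   - ...or once the oldest pending frame has waited this long
 * arrival_us must be sorted in non-decreasing order.
 * Returns the number of receive interrupts the NIC would raise.
 */
size_t count_interrupts(const unsigned long *arrival_us, size_t n,
                        size_t max_frames, unsigned long delay_us)
{
    size_t interrupts = 0;
    size_t pending = 0;
    unsigned long oldest_us = 0;   /* arrival time of oldest pending frame */

    assert(max_frames > 0);
    for (size_t i = 0; i < n; i++) {
        /* The delay timer for already-pending frames expired before this frame. */
        if (pending > 0 && arrival_us[i] - oldest_us >= delay_us) {
            interrupts++;
            pending = 0;
        }
        if (pending == 0)
            oldest_us = arrival_us[i];
        pending++;
        /* Frame-count threshold reached: interrupt immediately. */
        if (pending == max_frames) {
            interrupts++;
            pending = 0;
        }
    }
    /* Any leftover frames are delivered when the delay timer fires. */
    if (pending > 0)
        interrupts++;
    return interrupts;
}
```

In this model, ten back-to-back frames with `max_frames = 4` cost 3 interrupts instead of 10, while frames spaced 1000 µs apart (with `delay_us = 100`) still cost one interrupt each. The price of coalescing is that a frame may wait up to `delay_us` before the host sees it.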
From sepherosa at gmail.com Tue Jul 1 03:05:05 2008 From: sepherosa at gmail.com (Sepherosa Ziehau) Date: Tue Jul 1 03:05:08 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48699960.9070100@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> Message-ID:

On 7/1/08, Paul wrote:
> All the NIC drivers in 7 pretty much use interrupt moderation so it can

I am not quite sure whether em(4)'s RX interrupt moderation works as
expected or not. But, AFAIK, nfe(4) and re(4) do not have RX
interrupt moderation. Their TX interrupt moderation could be mimicked
by using their hardware timer and disabling their TX interrupt.

The lack of RX interrupt moderation is difficult to handle; I could imagine the following way:
- During init, enable the RX intr
- When an RX intr comes, disable the RX intr and set up the hardware timer intr
- When a timer intr comes and no RX has happened, disable the timer intr and enable the RX intr

Properly configured #RX desc and timer intr interval will be required
to make sure that the RX desc collection can keep up with the
hardware speed. I used a pure timer intr (8000Hz) on nfe(4) in dfly w/
good results, i.e. TX/RX @linespeed without livelocking the system.
The drawback of a pure timer intr is that you waste extra CPU power
when there is nothing to process.

> never lock the machine anyway.. This effectively kills polling and it really
> no longer has any use except to be able to have a fraction of the cpu set

Oh? Really? :]

Best Regards,
sephe

--
Live Free or Die

From paul at gtcomm.net Tue Jul 1 03:10:25 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 03:10:30 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> Message-ID: <4869A099.5070206@gtcomm.net>

I have been unable to even come close to livelocking the machine with the em driver's interrupt moderation, so to me that throws polling out the window. I tried 8000 Hz with polling modified to allow a 10000 burst, and it makes no difference in the amount of pps I can jam through. It seems to be limited by the routing path in the kernel more than anything else.

If a driver/hardware didn't support interrupt mitigation, then it would definitely lock the machine.

Sepherosa Ziehau wrote:
> On 7/1/08, Paul wrote:
>
>> All the NIC drivers in 7 pretty much use interrupt moderation so it can
>>
>
> I am not quite sure whether em(4)'s RX interrupt moderation works as
> expected or not. But, AFAIK, nfe(4) and re(4) does not have RX
> interrupt moderation. Their TX interrupt moderation could be mimiced
> by using their hardware timer and disabling their TX interrupt.
>
> The lacking of RX im is difficult to handle, I could imagine following way:
> - During init, enable RX intr
> - When RX intr comes, disable RX intr and set up hardware timer intr
> - When timer intr comes and no RX happens, disable timer intr and enable RX intr
>
> Properly configured #RX desc and timer intr interval will be required
> to make sure that the RX desc collection could keep up with the
> hardware speed. I used pure timer intr (8000Hz) on nfe(4) in dfly w/
> good result, i.e. TX/RX @linespeed without livelocking the system.
> The drawback of pure timer intr is that you waste extra cpu power,
> when there is nothing to process.
>
>
>> never lock the machine anyway.. This effectively kills polling and it really
>> no longer has any use except to be able to have a fraction of the cpu set
>>
>
> Oh? Really? :]
>
> Best Regards,
> sephe
>
>

From pyunyh at gmail.com Tue Jul 1 03:33:37 2008 From: pyunyh at gmail.com (Pyun YongHyeon) Date: Tue Jul 1 03:33:42 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> Message-ID: <20080701033117.GH83626@cdnetworks.co.kr>

On Tue, Jul 01, 2008 at 11:05:03AM +0800, Sepherosa Ziehau wrote:
> On 7/1/08, Paul wrote:
> > All the NIC drivers in 7 pretty much use interrupt moderation so it can
>
> I am not quite sure whether em(4)'s RX interrupt moderation works as
> expected or not. But, AFAIK, nfe(4) and re(4) does not have RX
> interrupt moderation. Their TX interrupt moderation could be mimiced
> by using their hardware timer and disabling their TX interrupt.
>
> The lacking of RX im is difficult to handle, I could imagine following way:
> - During init, enable RX intr
> - When RX intr comes, disable RX intr and set up hardware timer intr
> - When timer intr comes and no RX happens, disable timer intr and enable RX intr
>

I guess adaptive polling would give the same effect without
sacrificing CPU cycles.

> Properly configured #RX desc and timer intr interval will be required
> to make sure that the RX desc collection could keep up with the
> hardware speed. I used pure timer intr (8000Hz) on nfe(4) in dfly w/
> good result, i.e. TX/RX @linespeed without livelocking the system.
I thought that too for a while, but I prefer the hardware interrupt
moderation feature. Of course I still have no clue how to enable
that interrupt feature on nvidia controllers. :-(

> The drawback of pure timer intr is that you waste extra cpu power,
> when there is nothing to process.
>
> > never lock the machine anyway.. This effectively kills polling and it really
> > no longer has any use except to be able to have a fraction of the cpu set
>
> Oh? Really? :]
>
> Best Regards,
> sephe
>
> --
> Live Free or Die

--
Regards,
Pyun YongHyeon

From sepherosa at gmail.com Tue Jul 1 03:34:53 2008 From: sepherosa at gmail.com (Sepherosa Ziehau) Date: Tue Jul 1 03:34:57 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869A099.5070206@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <4869A099.5070206@gtcomm.net> Message-ID:

On 7/1/08, Paul wrote:
> I have been unable to even come close to livelocking the machine with the em
> driver interrupt moderation.

Yeah, the system will not be livelocked. But even with its imtimer set to 4000, the overall system response is still worse than using polling @4000 with a 9402PT.

> So that to me throws polling out the window. I tried 8000hz with polling

I don't believe a high polling rate will improve forwarding performance. I used to set the polling rate to 2000hz, burst max to 750 and each burst to 60.

> modified to allow 10000 burst and it makes no difference
> in the amount of pps I can jam through.. It seems to be limited by the
> routing path in the kernel more than anything else.
>
> If a driver/hardware didn't support interrupt mitigation then it would
> definitely lock the machine.

So polling(4) still has its place.
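The polling settings quoted here can be compared with a little arithmetic. The sketch below assumes the polling(4) model in which each registered interface is polled once per clock tick (`kern.hz`) and hands over at most `kern.polling.burst_max` packets per tick, ignoring idle-loop polling. It also computes the Ethernet line rate, counting the 8-byte preamble/SFD and 12-byte inter-frame gap each frame occupies on the wire. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on packets per second one interface can hand to the stack
 * under polling, assuming at most burst_max packets per clock tick and
 * hz ticks per second (idle-loop polling ignored). Real forwarding rates
 * are usually limited elsewhere, e.g. by per-packet CPU cost.
 */
uint64_t polling_ceiling_pps(uint64_t hz, uint64_t burst_max)
{
    return hz * burst_max;
}

/*
 * Frames per second at Ethernet line rate. Each frame (including FCS)
 * also occupies 8 bytes of preamble/SFD and a 12-byte inter-frame gap.
 */
uint64_t ethernet_line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
    const uint64_t overhead_bytes = 8 + 12;

    assert(frame_bytes >= 64);   /* minimum Ethernet frame size */
    return link_bps / ((frame_bytes + overhead_bytes) * 8);
}
```

With 2000 Hz and a burst max of 750, the ceiling is 1,500,000 pps per interface, close to the 1,488,095 pps of minimum-size frames on gigabit Ethernet. 8000 Hz with a 10000 burst raises the ceiling to 80,000,000 pps. In this model, the polling budget is far above the roughly 400 kpps being discussed, which fits Paul's report that raising it made no difference.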
Best Regards,
sephe

--
Live Free or Die

From sepherosa at gmail.com Tue Jul 1 03:50:25 2008 From: sepherosa at gmail.com (Sepherosa Ziehau) Date: Tue Jul 1 03:50:29 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080701033117.GH83626@cdnetworks.co.kr> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> Message-ID:

On 7/1/08, Pyun YongHyeon wrote:
> On Tue, Jul 01, 2008 at 11:05:03AM +0800, Sepherosa Ziehau wrote:
> > On 7/1/08, Paul wrote:
> > > All the NIC drivers in 7 pretty much use interrupt moderation so it can
> >
> > I am not quite sure whether em(4)'s RX interrupt moderation works as
> > expected or not. But, AFAIK, nfe(4) and re(4) does not have RX
> > interrupt moderation. Their TX interrupt moderation could be mimiced
> > by using their hardware timer and disabling their TX interrupt.
> >
> > The lacking of RX im is difficult to handle, I could imagine following way:
> > - During init, enable RX intr
> > - When RX intr comes, disable RX intr and set up hardware timer intr
> > - When timer intr comes and no RX happens, disable timer intr and enable RX intr
> >
>
>
> I guess adaptive polling would give the same effect without
> sacrificing CPU cycles.

The possible waste is one extra timer intr if there is nothing to process at all. But would it count as waste if the system was that idle? :) We will see the result when I find some free time to implement it :]

> >
> > Properly configured #RX desc and timer intr interval will be required
> > to make sure that the RX desc collection could keep up with the
> > hardware speed. I used pure timer intr (8000Hz) on nfe(4) in dfly w/
> > good result, i.e.
> > TX/RX @linespeed without livelocking the system.
>
>
> I thought that too for a while but I prefer the hardware interrupt
> moderation feature. Of course I still have no clue how to enable
> that interrupt feature on nvidia controllers. :-(

RX/TX intr is not affected at all by the so-called IMTIMER register. The IMTIMER register is actually only a hardware timer counter register. I took a look at Linux's forcedeth several days ago, but I didn't see anything improved in that area.

Best Regards,
sephe

--
Live Free or Die

From paul at gtcomm.net Tue Jul 1 04:03:13 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 04:03:18 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> Message-ID: <4869ACFC.5020205@gtcomm.net>

Dual opteron 270
32 bit GENERIC KERNEL
Nothing changed in sysctl except forwarding and ip forwarding
Broadcom interfaces on board NIC

last pid: 11557;  load averages:  1.13,  0.83,  0.48   up 0+03:24:26  21:58:38
70 processes:  6 running, 46 sleeping, 18 waiting
CPU states:  0.0% user,  0.0% nice,  0.2% system, 28.8% interrupt, 71.1% idle
Mem: 9124K Active, 6844K Inact, 26M Wired, 9776K Buf, 1957M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   13 root      171 ki31     0K     8K CPU1   1 203:39 98.93% idle: cpu1
   14 root      171 ki31     0K     8K RUN    0 203:26 98.05% idle: cpu0
   33 root      -68    -     0K     8K CPU2   2  13:06 93.99% irq24: bge0
   11 root      171 ki31     0K     8K CPU3   3 201:59 79.10% idle: cpu3
   34 root      -68    -     0K     8K WAIT   3   1:49 18.90% irq25: bge1
   12 root      171 ki31     0K     8K RUN    2 190:42  4.00% idle: cpu2

            input        (bge0)           output
   packets  errs      bytes    packets  errs      bytes colls
    410143 65619   24608586          1     0        226     0
    410350 66749   24621006          1     0        178     0
    409452 65382   24567126          1     0        178     0
    410338 63157   24620286          1     0        178     0
    410125 65777   24607446          1     0        178     0
    409794 63018   24587706          1     0        178     0
    408208 67566   24492486          1     0        178     0
    408416 70305   24504906          1     0        178     0
    407919 68339   24475206          1     0        178     0

sysctl -a | grep bge.0.stats
dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 310574781
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 0
dev.bge.0.stats.InputDiscards: 20942213
dev.bge.0.stats.InputErrors: 15
dev.bge.0.stats.RecvThresholdHit: 51440366
dev.bge.0.stats.DmaReadQueueFull: 0
dev.bge.0.stats.DmaReadHighPriQueueFull: 0
dev.bge.0.stats.SendDataCompQueueFull: 0
dev.bge.0.stats.RingSetSendProdIndex: 25738223
dev.bge.0.stats.RingStatusUpdate: 51719806
dev.bge.0.stats.Interrupts: 51719806
dev.bge.0.stats.AvoidedInterrupts: 0
dev.bge.0.stats.SendThresholdHit: 0
dev.bge.0.stats.rx.Octets: 3477501149
dev.bge.0.stats.rx.Fragments: 116
dev.bge.0.stats.rx.UcastPkts: 519429709
dev.bge.0.stats.rx.MulticastPkts: 0
dev.bge.0.stats.rx.FCSErrors: 3
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 0
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.rx.inRangeLengthError: 0
dev.bge.0.stats.rx.outRangeLengthError: 0
dev.bge.0.stats.tx.Octets: 2215096864
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.flowControlDone: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UcastPkts: 25738364
dev.bge.0.stats.tx.MulticastPkts: 0
dev.bge.0.stats.tx.BroadcastPkts: 15
dev.bge.0.stats.tx.CarrierSenseErrors: 0
dev.bge.0.stats.tx.Discards: 0
dev.bge.0.stats.tx.Errors: 0

errors..ERRORS!)(@!*()

Hitting that 400k pps limit again, and this is a slower machine than my dual 2212.
Going to compile 7-STABLE with options for the CPU and will report back.

From paul at gtcomm.net Tue Jul 1 04:16:43 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 04:16:46 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869ACFC.5020205@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> Message-ID: <4869B025.9080006@gtcomm.net>

Dual opteron 2212, amd64 kernel, GENERIC (same setup as my 270 I just posted, except this is 64 bit with a different NIC)
Intel dual-port 82571, nothing changed in sysctl except fw and fastfw

            input         (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    455729 69094   27343822          9     0       1412     0
    455582 67566   27334938          3     0        566     0
    455525 66033   27331518          3     0        566     0
    456632 68384   27397938          3     0        566     0
    457144 70006   27428604          4     0        776     0
    453551 71307   27213226          7     0       1038     0
    455282 74669   27317022          6     0       1068     0
    452268 75878   27136104          4     0        744     0
    455447 76942   27326838          4     0        744     0
    455408 68166   27324522          5     0        922     0
    456113 74600   27366798          4     0        744     0
    456316 68522   27378996          5     0        922     0
    455888 71665   27353322          5     0       1034     0
    453448 76407   27206904          4     0        744     0
    453907 79158   27234444          4     0        744     0
    452921 66532   27175278          4     0        744     0
    456049 60947   27363048          8     0       1216     0

em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 0
em0: Missed Packets = 131239807
em0: Receive No Buffers = 154088217
em0: Receive Length Errors = 0
em0: Receive errors = 0
em0: Crc errors = 0
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 4986006
em0: watchdog timeouts = 0
em0: XON Rcvd = 0
em0: XON Xmtd = 0
em0: XOFF Rcvd = 0
em0: XOFF Xmtd = 0
em0: Good Packets Rcvd = 969917996
em0: Good Packets Xmtd = 98120420
em0: TSO Contexts Xmtd = 2062
em0: TSO Contexts Failed = 0

errrrrrrrrrrorss.....

79 processes:  6 running, 52 sleeping, 3 stopped, 18 waiting
CPU states:  0.0% user,  0.0% nice, 28.2% system,  0.0% interrupt, 71.8% idle
Mem: 18M Active, 787M Inact, 241M Wired, 196K Cache, 213M Buf, 928M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root      171 ki31     0K    16K CPU2   2 348:19 99.02% idle: cpu2
   13 root      171 ki31     0K    16K CPU1   1 347:02 99.02% idle: cpu1
   37 root      -68    -     0K    16K CPU3   3  37:48 99.02% em0 taskq
   14 root      171 ki31     0K    16K RUN    0 340:08 85.94% idle: cpu0
   38 root      -68    -     0K    16K -      1   8:46 12.06% em1 taskq

How do you like those IDLE CPUS :) except 3 of course.. :>

From adrian at freebsd.org Tue Jul 1 04:36:28 2008 From: adrian at freebsd.org (Adrian Chadd) Date: Tue Jul 1 04:36:31 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> Message-ID:

2008/7/1 Sepherosa Ziehau :
> Properly configured #RX desc and timer intr interval will be required
> to make sure that the RX desc collection could keep up with the
> hardware speed. I used pure timer intr (8000Hz) on nfe(4) in dfly w/
> good result, i.e. TX/RX @linespeed without livelocking the system.
> The drawback of pure timer intr is that you waste extra cpu power,
> when there is nothing to process.

What packet rate is "linespeed"? With what size packets?
adrian

From sepherosa at gmail.com Tue Jul 1 05:05:38 2008 From: sepherosa at gmail.com (Sepherosa Ziehau) Date: Tue Jul 1 05:05:41 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> Message-ID:

On 7/1/08, Adrian Chadd wrote:
> 2008/7/1 Sepherosa Ziehau :
>
> > Properly configured #RX desc and timer intr interval will be required
> > to make sure that the RX desc collection could keep up with the
> > hardware speed. I used pure timer intr (8000Hz) on nfe(4) in dfly w/
> > good result, i.e. TX/RX @linespeed without livelocking the system.
> > The drawback of pure timer intr is that you waste extra cpu power,
> > when there is nothing to process.
>
>
> What packet rate is "linespeed" ? With what size packets?

I did not mean pps w/ 64-byte packets; I haven't had time to measure that yet. I only saw that msk(4) and em(4) could accept 64-byte packets @1.4Mpps. I meant netperf -t TCP_STREAM with -s/-S/-m 65536 on an AMD X2 3600+ with 1 GB of RAM in i386 mode.

Best Regards,
sephe

--
Live Free or Die

From mike at sentex.net Tue Jul 1 06:06:49 2008 From: mike at sentex.net (Mike Tancsa) Date: Tue Jul 1 06:06:53 2008 Subject: Route messages In-Reply-To: References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> Message-ID: <200807010606.m6166jFe084204@lava.sentex.ca>

At 10:34 PM 6/27/2008, mike@sentex.net wrote:
>On Sun, 15 Jun 2008 11:16:17 +0100, in sentex.lists.freebsd.net you
>wrote:
>
> >Paul wrote:
> >> Get these with GRE tunnel on
> >> FreeBSD 7.0-STABLE FreeBSD 7.0-STABLE #5: Sun May 11 19:00:57 EDT
> >> 2008 :/usr/obj/usr/src/sys/ROUTER amd64
> >> But do not get them with 7.0-RELEASE
> >>
> >> Any ideas what changed? :) Wish there was some sort of changelog..
> >> # of messages per second seems consistent with packets per second on
> >> GRE interface..
> >> No impact in routing, but definitely impact in cpu usage for all
> >> processes monitoring the route messages.
> >
> >RTM_MISS is actually fairly common when you don't have a default route.
> >
>
>Hi,
>	I am seeing this issue as well on a pair of recently deployed
>boxes, one running MPD and one acting as an area router in front of
>it. The MPD box has a default route and only has 400 routes or so.
>
>A steady stream of those messages, upwards of 500 per second.
>
>got message of size 96 on Fri Jun 27 22:25:42 2008
>RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno
>0, flags:
>locks: inits:
>sockaddrs:
> default
>
>got message of size 96 on Fri Jun 27 22:25:42 2008
>RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno
>0, flags:
>locks: inits:
>sockaddrs:
> default
>
>Is there a way to try and track down what is generating those messages
>? Its eating up a fair bit of cpu with quagga (the zebra process
>specifically)

I narrowed down where the change to RELENG_7 happened. It looks like a commit around April 22nd caused the behaviour to change.

When a box acting as a router has a packet transit it, an RTM_MISS is generated for *each packet*...

Given a setup of

H1 ---- R1 ----- H2

where
H1 is 10.10.1.2/24
H2 is 10.20.1.2/24
and
R1 has 2 interfaces, 10.10.1.1/24 and 10.20.1.1/24

Pinging H2 from H1 makes R1 generate an RTM_MISS for each packet! For routing daemons such as zebra, this eats up a *lot* of CPU. Turning on ip_fast_forwarding stops this behaviour on R1. However, if the interface routing the packet is a netgraph interface (e.g. mpd), fast_forwarding doesn't seem to have an effect and the RTM_MISS messages are generated again for each packet.

The ping packet below is a valid icmp echo request and reply.
e.g.

0[releng7]# ping -c 2 -S 10.20.1.2 10.10.1.2
PING 10.10.1.2 (10.10.1.2) from 10.20.1.2: 56 data bytes
64 bytes from 10.10.1.2: icmp_seq=0 ttl=63 time=0.302 ms
64 bytes from 10.10.1.2: icmp_seq=1 ttl=63 time=0.337 ms

--- 10.10.1.2 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.302/0.320/0.337/0.018 ms
0[releng7]#

generates 4 messages on the router:

[r7-router]# route -n monitor

got message of size 96 on Tue Jul 1 00:42:35 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Tue Jul 1 00:42:35 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Tue Jul 1 00:42:36 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

got message of size 96 on Tue Jul 1 00:42:36 2008
RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno 0, flags:
locks: inits:
sockaddrs:
 default

I am thinking

http://lists.freebsd.org/pipermail/cvs-src/2008-April/090303.html

is the commit? If I revert to the previous version, the issue goes away.
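To put a number on the message rate, one option is to capture `route -n monitor` output while forwarding a known number of packets and then count the RTM_MISS reports. The sketch below is a hypothetical helper, not part of route(8). It assumes each message appears on a line that begins with `RTM_MISS:`, as in the monitor output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count RTM_MISS reports in text captured from `route -n monitor`.
 * Assumes each report sits on its own line starting with "RTM_MISS:";
 * occurrences elsewhere in a line are ignored.
 */
size_t count_rtm_miss(const char *text)
{
    static const char tag[] = "RTM_MISS:";
    size_t count = 0;
    const char *line = text;

    assert(text != NULL);
    while (*line != '\0') {
        if (strncmp(line, tag, sizeof(tag) - 1) == 0)
            count++;
        const char *nl = strchr(line, '\n');   /* advance to the next line */
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return count;
}
```

For the capture above, this count should be 4: two echo requests and two replies crossing R1, or one message per forwarded packet. Dividing the count by the number of forwarded packets gives the per-packet ratio. On a live box, the same counting could be done directly on a PF_ROUTE socket instead of by parsing text.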
kernel is just

0[r7-router]% diff router GENERIC
24,27c24
< ident router
<
< makeoptions MODULES_OVERRIDE="ipfw acpi"
<
---
> ident GENERIC
37,38c34,35
< #options INET6 # IPv6 communications protocols
< #options SCTP # Stream Control Transmission Protocol
---
> options INET6 # IPv6 communications protocols
> options SCTP # Stream Control Transmission Protocol
47c44
< #options NFSLOCKD # Network Lock Manager
---
> options NFSLOCKD # Network Lock Manager
61c58
< #options STACK # stack(9) support
---
> options STACK # stack(9) support
303c300
< #device uslcom # SI Labs CP2101/CP2102 serial adapters
---
> device uslcom # SI Labs CP2101/CP2102 serial adapters

---Mike

From paul at gtcomm.net Tue Jul 1 06:24:00 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 06:24:04 2008 Subject: Route messages In-Reply-To: <200807010606.m6166jFe084204@lava.sentex.ca> References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> Message-ID: <4869CDFA.3090800@gtcomm.net>

Turning fastforwarding on or off has no effect for me. I still get the messages. I also get major ticks of 'destinations found unreachable' in netstat -rs.

Mike Tancsa wrote:
> At 10:34 PM 6/27/2008, mike@sentex.net wrote:
>> On Sun, 15 Jun 2008 11:16:17 +0100, in sentex.lists.freebsd.net you
>> wrote:
>>
>> >Paul wrote:
>> >> Get these with GRE tunnel on
>> >> FreeBSD 7.0-STABLE FreeBSD 7.0-STABLE #5: Sun May 11 19:00:57 EDT
>> >> 2008 :/usr/obj/usr/src/sys/ROUTER amd64
>> >> But do not get them with 7.0-RELEASE
>> >>
>> >> Any ideas what changed? :) Wish there was some sort of changelog..
>> >> # of messages per second seems consistent with packets per second on
>> >> GRE interface..
>> >> No impact in routing, but definitely impact in cpu usage for all
>> >> processes monitoring the route messages.
>> >
>> >RTM_MISS is actually fairly common when you don't have a default route.
>> > >> >> Hi, >> I am seeing this issue as well on a pair of recently deployed >> boxes, one running MPD and one acting as an area router in front of >> it. The MPD box has a default route and only has 400 routes or so. >> >> A steady stream of those messages, upwards of 500 per second. >> >> got message of size 96 on Fri Jun 27 22:25:42 2008 >> RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno >> 0, flags: >> locks: inits: >> sockaddrs: >> default >> >> got message of size 96 on Fri Jun 27 22:25:42 2008 >> RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno >> 0, flags: >> locks: inits: >> sockaddrs: >> default >> >> Is there a way to try and track down what is generating those messages >> ? Its eating up a fair bit of cpu with quagga (the zebra process >> specifically) > > I narrowed down where the change to RELENG_7 happened. It looks like > a commit around April 22nd caused the behaviour to change. > > When a box acting as a router has a packet transit it, an RTM_MISS is > generated for *each packet*... > > > Given a setup of > > H1 ---- R1 ----- H2 > > where > H1 is 10.10.1.2/24 > H2 is 10.20.1.2/24 > and > R1 has 2 interfaces, 10.10.1.1/24 and 10.20.1.1/24 > > Pinging H2 from H1 makes R1 generate a RTM_MISS for each packet! For > routing daemons such as zebra, this eats up a *lot* of CPU. Turning > on ip_fast_forwarding stops this behaviour on R1. However, if the > interface routing the packet is an netgraph interface (e.g. mpd) > fast_forwarding doesnt seem to have an effect and the RTM_MISS > messages are generated again for each packet. > > > The ping packet below is a valid icmp echo request and reply. 
> > e.g > 0[releng7]# ping -c 2 -S 10.20.1.2 10.10.1.2 > PING 10.10.1.2 (10.10.1.2) from 10.20.1.2: 56 data bytes > 64 bytes from 10.10.1.2: icmp_seq=0 ttl=63 time=0.302 ms > 64 bytes from 10.10.1.2: icmp_seq=1 ttl=63 time=0.337 ms > > --- 10.10.1.2 ping statistics --- > 2 packets transmitted, 2 packets received, 0.0% packet loss > round-trip min/avg/max/stddev = 0.302/0.320/0.337/0.018 ms > 0[releng7]# > > generates 4 messages on the router > > [r7-router]# route -n monitor > > got message of size 96 on Tue Jul 1 00:42:35 2008 > RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno > 0, flags: > locks: inits: > sockaddrs: > default > > got message of size 96 on Tue Jul 1 00:42:35 2008 > RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno > 0, flags: > locks: inits: > sockaddrs: > default > > got message of size 96 on Tue Jul 1 00:42:36 2008 > RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno > 0, flags: > locks: inits: > sockaddrs: > default > > got message of size 96 on Tue Jul 1 00:42:36 2008 > RTM_MISS: Lookup failed on this address: len 96, pid: 0, seq 0, errno > 0, flags: > locks: inits: > sockaddrs: > default > > > > I am thinking > > http://lists.freebsd.org/pipermail/cvs-src/2008-April/090303.html > is the commit ? If I revert to the prev version, the issue goes away. 
> > > kernel is just > > 0[r7-router]% diff router GENERIC > 24,27c24 > < ident router > < > < makeoptions MODULES_OVERRIDE="ipfw acpi" > < > --- > > ident GENERIC > 37,38c34,35 > < #options INET6 # IPv6 communications protocols > < #options SCTP # Stream Control Transmission > Protocol > --- > > options INET6 # IPv6 communications protocols > > options SCTP # Stream Control Transmission > Protocol > 47c44 > < #options NFSLOCKD # Network Lock Manager > --- > > options NFSLOCKD # Network Lock Manager > 61c58 > < #options STACK # stack(9) support > --- > > options STACK # stack(9) support > 303c300 > < #device uslcom # SI Labs CP2101/CP2102 serial > adapters > --- > > device uslcom # SI Labs CP2101/CP2102 serial > adapters > > > ---Mike > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From mike at sentex.net Tue Jul 1 06:32:10 2008 From: mike at sentex.net (Mike Tancsa) Date: Tue Jul 1 06:32:15 2008 Subject: Route messages In-Reply-To: <4869CDFA.3090800@gtcomm.net> References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869CDFA.3090800@gtcomm.net> Message-ID: <200807010632.m616W7i2084311@lava.sentex.ca> At 02:26 AM 7/1/2008, Paul wrote: >Turning on / off fastforwarding has no effect for me. I still get >the messages. >I also get major ticks of 'destinations found unreachable' in netstat -rs if you use http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/src/sys/netinet/ip_input.c?rev=1.332.2.1;content-type=text%2Fplain does it fix it for you ? 
I just cvsup'd to a RELENG_7 as of today, but used the older version of ip_input.c and I no longer get the blast of RTM_MISS messages

---Mike

From andre at freebsd.org Tue Jul 1 08:34:38 2008 From: andre at freebsd.org (Andre Oppermann) Date: Tue Jul 1 08:34:42 2008 Subject: Route messages In-Reply-To: <200807010606.m6166jFe084204@lava.sentex.ca> References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> Message-ID: <4869EC1E.8060009@freebsd.org>

Mike Tancsa wrote:
> I am thinking
>
> http://lists.freebsd.org/pipermail/cvs-src/2008-April/090303.html
> is the commit ? If I revert to the prev version, the issue goes away.

Yes, this change doesn't look right. It should only do the route lookup in ip_input.c when there was an EMSGSIZE error returned by ip_output(). The rtalloc_ign() call causes the message to be sent because it always sets report to one.
The default message is RTM_MISS.

I'll try to prep an updated patch which doesn't have these issues later today.

--
Andre

From stefan.lambrev at moneybookers.com Tue Jul 1 08:40:51 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 1 08:40:54 2008 Subject: if_bridge turns off checksum offload of members? In-Reply-To: <20080701012531.GA92392@citylink.fud.org.nz> References: <4868A34C.6030304@moneybookers.com> <20080630101629.GD79537@cdnetworks.co.kr> <20080701012531.GA92392@citylink.fud.org.nz> Message-ID: <4869ED8B.80508@moneybookers.com>

Andrew Thompson wrote:
> On Mon, Jun 30, 2008 at 07:16:29PM +0900, Pyun YongHyeon wrote:
>
>> On Mon, Jun 30, 2008 at 12:11:40PM +0300, Stefan Lambrev wrote:
>> > Greetings,
>> >
>> > I just noticed, that when I add em network card to bridge the checksum
>> > offload is turned off.
>> > I even put in my rc.conf:
>> > ifconfig_em0="rxcsum up"
>> > ifconfig_em1="rxcsum up"
>> > but after reboot both em0 and em1 have this feature disabled.
>> >
>> > Is this expected behavior? Should I care about csum in bridge mode?
>> > I noticed that enabling checksum offload manually improve things little btw.
>> >
>>
>> AFAIK this is intended one, bridge(4) turns off Tx side checksum
>> offload by default. I think disabling Tx checksum offload is
>> required as not all members of a bridge may be able to do checksum
>> offload. The same is true for TSO but it seems that bridge(4)
>> doesn't disable it.
>> If all members of bridge have the same hardware capability I think
>> bridge(4) may not need to disable Tx side hardware assistance. I
>> guess bridge(4) can scan every interface capabilities in a member
>> and can decide what hardware assistance can be activated instead of
>> blindly turning off Tx side hardware assistance.
>>
>
> This patch should do that, are you able to test it Stefan?
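Pyun's suggestion is to enable on the bridge only the hardware assists that every member supports. That is essentially a bitwise AND over the members' capability masks. The sketch below is hypothetical and only shows the set arithmetic. It is not Andrew's patch, and the `CAP_*` constants are illustrative values, not the real `IFCAP_*` values from `<net/if.h>`.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative capability bits (NOT the values from <net/if.h>). */
#define CAP_RXCSUM  0x01u
#define CAP_TXCSUM  0x02u
#define CAP_TSO4    0x04u
#define CAP_LRO     0x08u

/* Offloads the bridge would try to keep consistent across members. */
#define CAP_BRIDGE_MASK (CAP_RXCSUM | CAP_TXCSUM | CAP_TSO4 | CAP_LRO)

/* Return the offload bits that every member supports. With no
 * members, enable nothing rather than everything. */
static unsigned int bridge_common_caps(const unsigned int *member_caps,
                                       size_t nmembers)
{
    unsigned int common = CAP_BRIDGE_MASK;

    if (nmembers == 0)
        return 0;
    assert(member_caps != NULL);
    for (size_t i = 0; i < nmembers; i++)
        common &= member_caps[i];  /* drop anything one member lacks */
    return common;
}
```

Two identical members that both do RX/TX checksum offload keep both bits. Adding a member without TX checksum offload clears CAP_TXCSUM for the whole bridge, which matches the conservative behaviour Pyun describes bridge(4) applying unconditionally today.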
>
===> if_bridge (all)
cc -O2 -fno-strict-aliasing -pipe -march=nocona -D_KERNEL -DKLD_MODULE -std=c99 -nostdinc -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/src/sys/CORE/opt_global.h -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer -I/usr/obj/usr/src/sys/CORE -mcmodel=kernel -mno-red-zone -mfpmath=387 -mno-sse -mno-sse2 -mno-mmx -mno-3dnow -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -c /usr/src/sys/modules/if_bridge/../../net/if_bridge.c
/usr/src/sys/modules/if_bridge/../../net/if_bridge.c: In function 'bridge_capabilities':
/usr/src/sys/modules/if_bridge/../../net/if_bridge.c:787: error: 'IFCAP_TOE' undeclared (first use in this function)
/usr/src/sys/modules/if_bridge/../../net/if_bridge.c:787: error: (Each undeclared identifier is reported only once
/usr/src/sys/modules/if_bridge/../../net/if_bridge.c:787: error: for each function it appears in.)
*** Error code 1
1 error
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error

I'm building without "-j5" to see if the error message will change :)
I'm using 7-STABLE from Jun 27

>
> cheers,
> Andrew
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From stefan.lambrev at moneybookers.com Tue Jul 1 08:49:00 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 1 08:49:03 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869880D.8040901@ibctech.ca> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <4869880D.8040901@ibctech.ca> Message-ID: <4869EF78.5060306@moneybookers.com>

Steve Bertrand wrote:
> Support (Rudy) wrote:
>> Ingo Flaschberger wrote:
>>> usually interface polling is also chosen to prevent "lock-ups".
>>> man polling
>>
>>
>> I used polling in FreeBSD 5.x and it helped a bunch. I set up a new
>> router with 7.0 and MSI was recommended to me. (I noticed no
>> difference when moving from polling -> MSI, however, on 5.4 polling
>> seemed to help a lot.
>
> I'm curious now... how do you change individual device polling via
> sysctl?

Using sysctl for polling is deprecated I think.
You can do it with ifconfig: ifconfig ifX polling (or -polling to turn it off).
You can also add polling to the interface options in rc.conf:

ifconfig_em0="polling up" #bridged interface in my conf

>
> Steve
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From stefan.lambrev at moneybookers.com Tue Jul 1 08:53:30 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 1 08:53:35 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> Message-ID: <4869F082.9010606@moneybookers.com>

Hi,

Ingo Flaschberger wrote:
> Dear Rudy,
>
>> I used polling in FreeBSD 5.x and it helped a bunch. I set up a new
>> router with 7.0 and MSI was recommended to me. (I noticed no
>> difference when moving from polling -> MSI, however, on 5.4 polling
>> seemed to help a lot. What are people using in 7.0?
>> polling or MSI?
>
> if you have a inet-router with gige-uplinks, it is possible that there
> will be (d)dos attacks.
> only polling helps you then to keep the router manageable (but
> dropping packets).

Let me disagree :)
I'm experimenting with a bridge and an Intel 82571EB Gigabit Ethernet Controller.
On a quad-core system I have no problems with the stability of the bridge without polling.
The em0 taskq takes 100% CPU, but the other three CPUs/cores are free and the router is very stable: no lag on other interfaces, and the load average is not very high either.
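The polling-versus-interrupts debate above, and Adrian's later question about how many packets are handled per pass, both come down to a per-pass work budget. The idea is to handle at most N packets, then yield so other work and other interfaces get the CPU. Below is a minimal sketch using a hypothetical receive ring. The struct, field names and budget values are illustrative and are not the em(4) or polling(4) internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive ring: 'head' is the next slot the hardware
 * will fill, 'tail' the next slot software will process.
 * head == tail means empty (one slot is always left unused). */
struct rx_ring {
    size_t head;
    size_t tail;
    size_t size;              /* number of descriptors */
    unsigned long processed;  /* packets handed to the stack so far */
};

static size_t ring_pending(const struct rx_ring *r)
{
    return (r->head + r->size - r->tail) % r->size;
}

/* Process at most 'budget' packets in this pass. Returns nonzero when
 * work remains, meaning the caller should schedule another pass
 * rather than keep looping here. */
static int rx_poll_pass(struct rx_ring *r, size_t budget)
{
    size_t done = 0;

    assert(r->size > 0);
    while (done < budget && ring_pending(r) > 0) {
        /* a real driver would pass the received mbuf up the stack here */
        r->tail = (r->tail + 1) % r->size;
        r->processed++;
        done++;
    }
    return ring_pending(r) > 0;
}
```

The design choice is the usual trade-off. A small budget bounds how long one pass can monopolise a CPU. A large budget reduces per-pass scheduling overhead but behaves more like "drain until empty".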
>
> Kind regards,
> Ingo Flaschberger
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From paul at gtcomm.net Tue Jul 1 09:06:59 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 09:07:04 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869B025.9080006@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> Message-ID: <4869F42E.8040904@gtcomm.net>

[Big list of testing, rebuilding kernel follows]

Dual Opteron 2212, Recompiled kernel with 7-STABLE and removed a lot of junk in the config, added
options NO_ADAPTIVE_MUTEXES
not sure if that makes any difference or not, will test without.
Used ULE scheduler, used preemption, CPUTYPE=opteron in /etc/make.conf
7.0-STABLE FreeBSD 7.0-STABLE #4: Tue Jul 1 01:22:18 CDT 2008 amd64
Max input rate .. 587kpps? Take into consideration that these packets are being forwarded out em1 interface which causes a great impact on cpu usage. If I set up a firewall rule to block the packets it can do over 1mpps on em0 input.
            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    587425 67677   35435456        466     0      25616     0
    587412 26629   35434766        453     0      24866     0
    587043 26874   35412442        410     0      22544     0
    536117 30264   32347300        440     0      24164     0
    546240 61521   32951060        459     0      25350     0
    563568 66881   33998676        435     0      23894     0
    572766 43243   34550840        440     0      24164     0
    572336 44411   34525836        445     0      24558     0
    572539 37013   34536222        457     0      25136     0
    571340 39512   34459008        440     0      24110     0
    572673 55137   34540576        438     0      24056     0
    555506 49918   33505764        457     0      25330     0
    545744 69010   32916908        461     0      25298     0
    559472 75650   33745636        429     0      23694     0
    564358 60130   34039104        433     0      23786     0

last pid: 1134; load averages: 1.04, 0.94, 0.59 up 0+00:14:13 01:49:59
70 processes: 6 running, 46 sleeping, 17 waiting, 1 lock
CPU: 0.0% user, 0.0% nice, 25.6% system, 0.0% interrupt, 74.4% idle
Mem: 11M Active, 6596K Inact, 45M Wired, 156K Cache, 9072K Buf, 1917M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root      171 ki31     0K    16K RUN    1  12:40 97.56% idle: cpu1
   36 root      -68    -     0K    16K *em1   2   9:44 85.06% em0 taskq
   10 root      171 ki31     0K    16K CPU3   3  11:10 82.47% idle: cpu3
   13 root      171 ki31     0K    16K CPU0   0  12:25 73.88% idle: cpu0
   11 root      171 ki31     0K    16K RUN    2   6:43 50.10% idle: cpu2
   37 root      -68    -     0K    16K CPU3   3   1:58 16.46% em1 taskq

I noticed.. em0 taskq isn't using 100% cpu like it was on the generic kernel.. What's up with that? Why do I still have all 4 CPUs pretty idle and em0 taskq isn't near 100%? I'm going to try 4bsd and see if that makes it go back to the other way.
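Runs like these are easier to compare when each set of `netstat` samples is reduced to a mean rate and an aggregate loss ratio. The sketch below assumes that the input `errs` column counts packets that reached the NIC but were not accepted (for em(4) that would include missed packets), so loss = errs / (packets + errs). That reading of the column is an assumption, not something the thread establishes. `nstat_sample`, `mean_pps` and `loss_ratio` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One line of per-second netstat output (input side only). */
struct nstat_sample {
    unsigned long packets;  /* input packets in the interval */
    unsigned long errs;     /* input errors in the interval  */
};

/* Mean accepted packets per interval over n samples. */
static double mean_pps(const struct nstat_sample *s, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += (double)s[i].packets;
    return sum / (double)n;
}

/* Aggregate loss ratio errs / (packets + errs), assuming every error
 * is a packet that was offered but not accepted. */
static double loss_ratio(const struct nstat_sample *s, size_t n)
{
    double pkts = 0.0, errs = 0.0;

    for (size_t i = 0; i < n; i++) {
        pkts += (double)s[i].packets;
        errs += (double)s[i].errs;
    }
    return (pkts + errs) > 0.0 ? errs / (pkts + errs) : 0.0;
}
```

With the first three rows above (587425/67677, 587412/26629, 587043/26874), this gives a mean of about 587,293 pps and a loss ratio of roughly 6.4%.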
em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 0
em0: Missed Packets = 45395545
em0: Receive No Buffers = 95916690
em0: Receive Length Errors = 0
em0: Receive errors = 0
em0: Crc errors = 0
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 2740181
em0: watchdog timeouts = 0
em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em0: XON Rcvd = 0
em0: XON Xmtd = 0
em0: XOFF Rcvd = 0
em0: XOFF Xmtd = 0
em0: Good Packets Rcvd = 450913688
em0: Good Packets Xmtd = 304777
em0: TSO Contexts Xmtd = 94
em0: TSO Contexts Failed = 0

-----Rebooting with:
kern.hz=2000
hw.em.rxd=512
hw.em.txd=512

Seems maybe a little bit slower, but it's hard to tell since I'm generating random packets; the pps varies about 50k +/-, probably depending on the randomness.. About the same PPS/errors..
here's a vmstat 1
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad4 ad6    in   sy    cs us sy id
 0 0 1  52276K  1922M   286   0   1   0   277   0   0   0  7686  838 19436  0 15 85
 0 0 0  52276K  1922M     0   0   0   0     0   0   0   0 13431  127 33430  0 27 73
 0 0 0  52276K  1922M     0   0   0   0     0   0   0   0 13406  115 33222  0 27 73
 0 0 0  52276K  1922M     0   0   0   0     0   0   0   0 13430  115 33393  0 26 74
 0 0 0  52276K  1922M     0   0   0   0     0   0   0   0 13411  115 33322  0 26 74
 0 0 0  52276K  1922M     0   0   0   0     0   0   0   0 13576  123 33415  0 25 75
 0 0 0  52276K  1922M     0   0   0   0     0   0   0   0 13842  115 33354  0 26 74

------Trying kern.hz=250
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad4 ad6    in   sy    cs us sy id
 0 0 1  52288K  1923M   607   1   2   0   582   0   0   0  4885  789 12073  0  8 92
 0 0 0  52288K  1923M     0   0   0   0     0   0   0   0 13793  119 33552  0 27 73
 0 0 0  52288K  1923M     0   0   0   0     0   0   0   0 13959  115 33446  0 26 74
 0 0 0  52288K  1923M     0   0   0   0     0   0   0   0 13861  115 33707  0 30 70
 0 0 0  52288K  1923M     0   0   0   0     0   0   0   0 13784  115 33602  0 26 74
 0 0 0  52288K  1923M     0   0   0   0     0   0   0   0 13886  123 33843  0 26 74
 0 0 0  52288K  1923M     0   0   0   0     0   0   0   0 13913  115 33711  0 26 74
 0 0 0  52288K  1923M     0   0   0   0     0   0   0   0 13920  115 33766  0 27 73

pps still no major difference..
jumps between 530k-580k

-----Putting HZ back to 1000,
recompiling kernel with 4BSD SCHED..
many minutes later.. (can't do make -j with the kernel or it errors)
Well, I have to say.. 4BSD is less pps; it will not go over 530k. However, it seems much more consistent and doesn't jump around as much. It stays between 520-530k most of the time, and I see some ticks at 480's in netstat..
em0 taskq still not using 100%, max around 75-80%

-----Building same as above but with preemption off
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad4 ad6    in   sy    cs us sy id
 0 0 0  52288K  1922M   563   1   2   0   540   0   0   0  6724  725 22195  0 12 88
 0 0 0  52288K  1922M     0   0   0   0     0   0   0   0 13200  119 48075  0 27 73
 0 0 0  52288K  1922M     0   0   0   0     0   0   0   0 13243  123 49137  0 24 76
 0 0 0  52288K  1922M     0   0   0   0     0   0   0   0 13260  115 48633  0 26 74
 0 0 0  52288K  1922M     0   0   0   0     0   0   0   0 13247  115 48625  0 25 75
 0 0 0  52288K  1922M     0   0   0   0     0   0   0   0 13248  115 48687  0 24 76

hmm more context switches..
pps same, maybe a shade lower..

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   11 root      171 ki31     0K    16K RUN    2   3:39 97.12% idle: cpu2
   12 root      171 ki31     0K    16K CPU1   1   3:45 95.70% idle: cpu1
   36 root      -68    -     0K    16K CPU0   0   2:18 82.67% em0 taskq
   10 root      171 ki31     0K    16K CPU3   3   3:24 82.57% idle: cpu3
   13 root      171 ki31     0K    16K RUN    0   2:01 20.07% idle: cpu0
   37 root      -68    -     0K    16K -      3   0:31 15.58% em1 taskq

-------rebuilding with ULE, keeping preemption off
Hmm.. what the?
450-480kpps seems to be max here. That's.. weird..
I'm going to have to rebuild with Preemption on again just to double check this..
            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    464020 95690   28009004        434     0      23728     0
    455318 90105   27484456        469     0      25778     0
    455720 99914   27511970        462     0      25384     0
    465019 86021   28071946        428     0      23392     0
    456024 78336   27528862        440     0      24040     0
    455018 93526   27468908        440     0      24040     0
    461235 91218   27841604        464     0      25336     0
    454345 89812   27427262        424     0      23176     0
    452661 96937   27327392        441     0      24094     0
    456584 90393   27561138        459     0      25222     0
    455021 97441   27470158        450     0      24736     0

 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad4 ad6    in   sy    cs us sy id
 0 0 1  52276K  1655M   456   1   1   0   441   0   0   0  9775 3598 26256  0 20 80
 0 0 0  52276K  1655M     0   0   0   0     0   0   0   0 12817  119 33056  0 25 75
 0 0 0  52276K  1655M     0   0   0   0     0   0   0   0 12700  123 32975  0 27 73
 0 0 0  52276K  1655M     0   0   0   0     0   0   0   0 12659  115 32897  0 27 73

------OK I'm stumped now.. Rebuilt with preemption and ULE and preemption again and it's not doing what it did before..
How could that be? Now about 500kpps..

That kind of inconsistency almost invalidates all my testing.. why would it be so much different after trying a bunch of kernel options and rebooting a bunch of times and then going back to the original config doesn't get you what it did in the beginning..

I'll have to dig into this further.. never seen anything like it :)

Hopefully the ip_input fix will help free up a few cpu cycles.

From adrian at freebsd.org Tue Jul 1 09:09:48 2008 From: adrian at freebsd.org (Adrian Chadd) Date: Tue Jul 1 09:09:52 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869F42E.8040904@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <4869F42E.8040904@gtcomm.net> Message-ID:

There's an option to control how many packets it'll process each pass through the isr thread, isn't there?
It'd be nicer if this stuff were able to be dynamically tuned.

Adrian

From bz at FreeBSD.org Tue Jul 1 09:14:01 2008 From: bz at FreeBSD.org (Bjoern A.
Zeeb) Date: Tue Jul 1 09:14:07 2008 Subject: Route messages In-Reply-To: <4869EC1E.8060009@freebsd.org> References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> Message-ID: <20080701084933.W57089@maildrop.int.zabbadoz.net> On Tue, 1 Jul 2008, Andre Oppermann wrote: Hi, > Mike Tancsa wrote: >> I am thinking >> >> http://lists.freebsd.org/pipermail/cvs-src/2008-April/090303.html >> is the commit ? If I revert to the prev version, the issue goes away. Ha, I finally know why I ended up on Cc: of a thread I had no idea about. Someone could have told me instead of blindly adding me;-) > Yes, this change doesn't look right. It should only do the route > lookup in ip_input.c when there was an EMSGSIZE error returned by > ip_output(). The rtalloc_ign() call causes the message to be sent > because it always sets report to one. The default message is RTM_MISS. > > I'll try to prep an updated patch which doesn't have these issues later > today. Yeah my bad. Sorry. If you do that, do not do an extra route lookup if possible, correct the rtalloc call. Thanks. Bjoern -- Bjoern A. Zeeb Stop bit received. Insert coin for new game. From bms at FreeBSD.org Tue Jul 1 09:14:48 2008 From: bms at FreeBSD.org (Bruce M. Simpson) Date: Tue Jul 1 09:14:52 2008 Subject: HEAD UP: non-MPSAFE network drivers to be disabled (was: 8.0 network stack MPsafety goals (fwd)) In-Reply-To: <20080629180126.F90836@fledge.watson.org> References: <20080524111715.T64552@fledge.watson.org> <20080629180126.F90836@fledge.watson.org> Message-ID: <4869F586.7010708@FreeBSD.org> Robert Watson wrote: > > An FYI on the state of things here: in the last month, John has > updated a number of device drivers to be MPSAFE, and the USB work > remains in-flight. 
I'm holding fire a bit on disabling IFF_NEEDSGIANT > while things settle and I catch up on driver state, and will likely > send out an update next week regarding which device drivers remain on > the kill list, and generally what the status of this project is. Goliath needs to get stoned; it's been a major hurdle in doing IGMPv3/SSM because of the locking fandango. I look forward to it. [For those who ask, what the hell? IGMPv3 potentially makes your wireless multicast better with or without little things like SSM, because of protocol robustness, compact state-changes, and the use of a single link-local IPv4 group for state-change reports, making it easier for your switches to actually do their job.] From stefan.lambrev at moneybookers.com Tue Jul 1 09:17:29 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 1 09:17:31 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869F42E.8040904@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <4869F42E.8040904@gtcomm.net> Message-ID: <4869F621.4000508@moneybookers.com> Greetings Paul, > > > ------OK I'm stumped now.. Rebuilt with preemption and ULE and > preemption again and it's not doing what it did before.. I saw this in my configuration too :) Just leave your test running for a longer time and you will see this strange inconsistency in action. In my configuration I almost always have better throughput after reboot, which drops later (5-10min under flood) by 50-60kpps, and after another 10-15min the number of correctly passed packets increases again.
Looks like "auto tuning" of which I'm not aware :) > How could that be? Now about 500kpps.. > > That kind of inconsistency almost invalidates all my testing.. why > would it be so much different after trying a bunch of kernel options > and rebooting a bunch of times and then going back to the original > config doesn't get you what it did in the beginning.. > > I'll have to dig into this further.. never seen anything like it :) > > Hopefully the ip_input fix will help free up a few cpu cycles. > > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" -- Best Wishes, Stefan Lambrev ICQ# 24134177 From bz at FreeBSD.org Tue Jul 1 09:25:07 2008 From: bz at FreeBSD.org (Bjoern A. Zeeb) Date: Tue Jul 1 09:25:09 2008 Subject: Route messages In-Reply-To: <20080701084933.W57089@maildrop.int.zabbadoz.net> References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> <20080701084933.W57089@maildrop.int.zabbadoz.net> Message-ID: <20080701092254.T57089@maildrop.int.zabbadoz.net> On Tue, 1 Jul 2008, Bjoern A. Zeeb wrote: Hi, > On Tue, 1 Jul 2008, Andre Oppermann wrote: > > Hi, > >> Mike Tancsa wrote: >>> I am thinking >>> >>> http://lists.freebsd.org/pipermail/cvs-src/2008-April/090303.html >>> is the commit ? If I revert to the prev version, the issue goes away. > > Ha, I finally know why I ended up on Cc: of a thread I had no idea > about. Someone could have told me instead of blindly adding me;-) > > >> Yes, this change doesn't look right. It should only do the route >> lookup in ip_input.c when there was an EMSGSIZE error returned by >> ip_output(). The rtalloc_ign() call causes the message to be sent >> because it always sets report to one. The default message is RTM_MISS. 
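Andre's explanation shows why the message rate follows the packet rate. The toy model below is a C sketch, not the ip_input.c code: every name in it is hypothetical, and it simply assumes that the extra lookup misses for the forwarded traffic, as the RTM_MISS output from route -n monitor in this thread shows. It contrasts a lookup done before every ip_output() with report always set to one (the behaviour Andre attributes to rtalloc_ign()) against a lookup done only when ip_output() returns EMSGSIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, not kernel code: all names here are hypothetical. */

/* Stands in for messages written to routing sockets (route -n monitor, zebra). */
static unsigned long rtm_miss_msgs;

/* Hypothetical route lookup: a miss with report set emits one RTM_MISS. */
static bool model_lookup(bool route_found, int report)
{
    assert(report == 0 || report == 1);
    if (!route_found && report)
        rtm_miss_msgs++;
    return route_found;
}

/* Extra lookup before every ip_output(), with report always set to one. */
static void forward_lookup_always(bool lookup_hits, int output_error)
{
    (void)model_lookup(lookup_hits, 1);
    (void)output_error; /* the result of ip_output() is not consulted */
}

/* Suggested fix: look the route up only when ip_output() returned
 * EMSGSIZE, i.e. when the next-hop MTU is needed for the
 * ICMP_UNREACH_NEEDFRAG reply. */
static void forward_lookup_on_emsgsize(bool lookup_hits, int output_error)
{
    if (output_error == EMSGSIZE)
        (void)model_lookup(lookup_hits, 1);
}
```

Forwarding 20,000 packets whose extra lookup misses yields 20,000 messages with the first function and none with the second, which fits Mike seeing a flood of messages at only 20Kpps and a routing daemon such as zebra spending CPU reading them.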
>>
>> I'll try to prep an updated patch which doesn't have these issues later today.
>
> Yeah my bad. Sorry.
>
> If you do that, do not do an extra route lookup if possible, correct the rtalloc call. Thanks.

So I had a very quick look at the code between doing something else. I think the only change needed is this, if I am not mistaken, but my head is far away, nowhere close enough to this code. Andre, could you review this?

Index: sys/netinet/ip_input.c
===================================================================
RCS file: /shared/mirror/FreeBSD/r/ncvs/src/sys/netinet/ip_input.c,v
retrieving revision 1.332.2.2
diff -u -p -r1.332.2.2 ip_input.c
--- sys/netinet/ip_input.c	22 Apr 2008 12:02:55 -0000	1.332.2.2
+++ sys/netinet/ip_input.c	1 Jul 2008 09:23:08 -0000
@@ -1363,7 +1363,6 @@ ip_forward(struct mbuf *m, int srcrt)
 	 * the ICMP_UNREACH_NEEDFRAG "Next-Hop MTU" field described in RFC1191.
 	 */
 	bzero(&ro, sizeof(ro));
-	rtalloc_ign(&ro, RTF_CLONING);
 	error = ip_output(m, NULL, &ro, IP_FORWARDING, NULL, NULL);

-- Bjoern A. Zeeb Stop bit received. Insert coin for new game. From stefan.lambrev at moneybookers.com Tue Jul 1 09:51:50 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 1 09:51:56 2008 Subject: if_bridge turns off checksum offload of members? In-Reply-To: <20080701012531.GA92392@citylink.fud.org.nz> References: <4868A34C.6030304@moneybookers.com> <20080630101629.GD79537@cdnetworks.co.kr> <20080701012531.GA92392@citylink.fud.org.nz> Message-ID: <4869FE2E.4070805@moneybookers.com> Hi, Maybe these are stupid questions, but:
1) There are zero matches of IFCAP_TOE in kernel sources .. there is no support for TOE in 7.0, but maybe this is work in progress for 8-current?
2) In #define BRIDGE_IFCAPS_MASK (IFCAP_TOE|IFCAP_TSO|IFCAP_TXCSUM) - should TOE be replaced with RXCSUM or just removed?
3) Why is RX never checked?
In my case this doesn't matter because em turns off both TX and RX if only one is disabled, but there is probably hardware that can separate them, e.g. RX disabled while TX enabled?
4) I'm not sure why the bridge should not work with two interfaces, one of which supports TX and the other does not. At least if I turn on checksum offload only on one of the interfaces the bridge is still working ... Andrew Thompson wrote: - cut - > > > This patch should do that, are you able to test it Stefan? > > > cheers, > Andrew > P.S. I saw very good results with netisr2 on a kernel from p4 a few months ago .. are there any patches flying around so I can test them with 7-STABLE? :) -- Best Wishes, Stefan Lambrev ICQ# 24134177 From stefan.lambrev at moneybookers.com Tue Jul 1 10:10:17 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 1 10:10:20 2008 Subject: if_bridge turns off checksum offload of members? In-Reply-To: <4869FE2E.4070805@moneybookers.com> References: <4868A34C.6030304@moneybookers.com> <20080630101629.GD79537@cdnetworks.co.kr> <20080701012531.GA92392@citylink.fud.org.nz> <4869FE2E.4070805@moneybookers.com> Message-ID: <486A0281.208@moneybookers.com> Hi, Sorry to reply to myself. Stefan Lambrev wrote: > Hi, > > Maybe these are stupid questions, but: > > 1) There are zero matches of IFCAP_TOE in kernel sources .. there is > no support for TOE in 7.0, but maybe this is work in progress for > 8-current? > 2) In #define BRIDGE_IFCAPS_MASK (IFCAP_TOE|IFCAP_TSO|IFCAP_TXCSUM) - > should TOE be replaced with RXCSUM or just removed? Your patch plus this small change (replacing TOE with RXCSUM) seems to work fine for me - the kernel compiles without a problem and checksum offload is enabled after reboot. > 3) Why is RX never checked? In my case this doesn't matter because em > turns off both TX and RX if only one is disabled, but there is probably > hardware > that can separate them, e.g. RX disabled while TX enabled?
> 4) I'm not sure why the bridge should not work with two interfaces, one of > which supports TX and the other does not. At least if I turn on > checksum offload > only on one of the interfaces the bridge is still working ... > > Andrew Thompson wrote: > > - cut - >> >> >> This patch should do that, are you able to test it Stefan? >> >> >> cheers, >> Andrew >> > P.S. I saw very good results with netisr2 on a kernel from p4 a > few months ago .. are there any patches flying around so I can test them > with 7-STABLE? :) > -- Best Wishes, Stefan Lambrev ICQ# 24134177 From if at xip.at Tue Jul 1 12:27:18 2008 From: if at xip.at (Ingo Flaschberger) Date: Tue Jul 1 12:27:23 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869A099.5070206@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <4869A099.5070206@gtcomm.net> Message-ID: Dear Paul, > I have been unable to even come close to livelocking the machine with the em > driver interrupt moderation. > So that to me throws polling out the window. I tried 8000hz with polling > modified to allow 10000 burst and it makes no difference
Higher hz values give you better latency but less overall speed. 2000hz should be enough.
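Ingo's latency-versus-throughput trade-off can be put in rough numbers. This C sketch is an approximation, not the kern_poll.c logic: it assumes each interface is polled once per clock tick for at most `burst` packets (the limit exposed as kern.polling.burst_max) and ignores idle-time polling. The function names are illustrative.

```c
#include <assert.h>

/* Rough model of tick-driven device polling (an approximation only).
 * hz:    clock ticks per second (kern.hz)
 * burst: max packets taken from one interface per tick */

/* Upper bound on packets per second one interface can deliver via polling. */
static unsigned long poll_pps_ceiling(unsigned long hz, unsigned long burst)
{
    return hz * burst;
}

/* Worst-case wait for the next poll, in microseconds (one full tick). */
static double poll_tick_us(unsigned long hz)
{
    assert(hz > 0);
    return 1e6 / (double)hz;
}

/* Packets arriving between two polls at a given offered rate; the per-tick
 * burst and the receive ring must hold at least this many or frames drop. */
static double pkts_per_tick(double offered_pps, unsigned long hz)
{
    assert(hz > 0);
    return offered_pps / (double)hz;
}
```

At 2000 Hz a packet can wait up to 500 microseconds for the next poll, and at roughly 500kpps about 250 packets arrive per tick. At 8000 Hz the wait drops to 125 microseconds and the per-tick arrivals to about 62, at the cost of four times as many timer interrupts. With Paul's 10000-packet burst, hz times burst is far above the offered load at either setting, which may be why changing hz made no difference for him.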
Kind regards, Ingo Flaschberger From if at xip.at Tue Jul 1 12:43:17 2008 From: if at xip.at (Ingo Flaschberger) Date: Tue Jul 1 12:43:21 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869F42E.8040904@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <4869F42E.8040904@gtcomm.net> Message-ID: Dear Paul, > > Dual Opteron 2212, Recompiled kernel with 7-STABLE and removed a lot of junk > in the config, added > options NO_ADAPTIVE_MUTEXES not sure if that makes any difference > or not, will test without. > Used ULE scheduler, used preemption, CPUTYPE=opteron in /etc/make.conf > 7.0-STABLE FreeBSD 7.0-STABLE #4: Tue Jul 1 01:22:18 CDT 2008 amd64 > Max input rate .. 587kpps? Take into consideration that these packets are > being forwarded out em1 interface which > causes a great impact on cpu usage. If I set up a firewall rule to block the > packets it can do over 1mpps on em0 input. It would be great if you could also test with 32-bit. What value do you have for net.inet.ip.intr_queue_maxlen?
kind regards, Ingo Flaschberger From guru at Sisis.de Tue Jul 1 13:06:57 2008 From: guru at Sisis.de (Matthias Apitz) Date: Tue Jul 1 13:07:06 2008 Subject: RELENG_7 && ath && WPA && stuck when bgscan is active on interface Message-ID: <20080701125452.GA10729@rebelion.Sisis.de> Hello, I'm running the above configuration, RELENG_7 kernel and WPA, on an Asus laptop eeePC 900 for which one must patch the HAL with: http://snapshots.madwifi.org/special/madwifi-ng-r2756+ar5007.tar.gz; all is fine, mostly, but when 'bgscan' is activated on the interface ath0 it gets stuck reproducibly after some time without any traffic through the interface; setting 'ifconfig ath0 -bgscan' makes the problem go away; could it be related to the bug I'm facing on another laptop with bgscan/WPA/iwi0, see: http://www.freebsd.org/cgi/query-pr.cgi?pr=122331 thx matthias -- Matthias Apitz Manager Technical Support - OCLC GmbH Gruenwalder Weg 28g - 82041 Oberhaching - Germany t +49-89-61308 351 - f +49-89-61308 399 - m +49-170-4527211 e - w http://www.oclc.org/ http://www.UnixArea.de/ b http://gurucubano.blogspot.com/ "...una sola vez, que es cuanto basta si se trata de verdades definitivas." "...only once, which is enough if it has to do with definite truth." José Saramago, Historia del Cerca de Lisboa From max at love2party.net Tue Jul 1 13:34:47 2008 From: max at love2party.net (Max Laier) Date: Tue Jul 1 13:34:51 2008 Subject: altq on vlan In-Reply-To: <486A2F5F.6070408@FreeBSD.org> References: <1214651667.267043.71931.nullmailer@cicuta.babolo.ru> <200806291743.15021.max@love2party.net> <486A2F5F.6070408@FreeBSD.org> Message-ID: <200807011532.39508.max@love2party.net> On Tuesday 01 July 2008 15:21:35 Sergey Matveychuk wrote: > Max Laier wrote: > > Now please ... let this die, it's stupid! > > I wrote the patch for a *very* specific purpose. I never wanted to ask for it to be committed, and I did not think anyone would use it seriously.
> > Sorry for touching your religious sense :) Sorry for the harsh language. It's just that this comes up every other month and (inexperienced) users might be using the patch without a clue - hence I wanted to have some kind of "DON'T DO THIS UNLESS YOU KNOW WHAT YOU ARE DOING" recorded in this thread - I hope that Google will help people to actually find it. Would you mind adding some words to that effect to your patch? And yes, the patch has some value in some situations, but most certainly not in the general case. -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News From sem at FreeBSD.org Tue Jul 1 13:41:51 2008 From: sem at FreeBSD.org (Sergey Matveychuk) Date: Tue Jul 1 13:41:54 2008 Subject: altq on vlan In-Reply-To: <200806291743.15021.max@love2party.net> References: <1214651667.267043.71931.nullmailer@cicuta.babolo.ru> <200806291743.15021.max@love2party.net> Message-ID: <486A2F5F.6070408@FreeBSD.org> Max Laier wrote: > > Now please ... let this die, it's stupid! > I wrote the patch for a *very* specific purpose. I never wanted to ask for it to be committed, and I did not think anyone would use it seriously. Sorry for touching your religious sense :) -- Dixi. Sem. From sem at FreeBSD.org Tue Jul 1 13:47:20 2008 From: sem at FreeBSD.org (Sergey Matveychuk) Date: Tue Jul 1 13:47:28 2008 Subject: altq on vlan In-Reply-To: <200807011532.39508.max@love2party.net> References: <1214651667.267043.71931.nullmailer@cicuta.babolo.ru> <200806291743.15021.max@love2party.net> <486A2F5F.6070408@FreeBSD.org> <200807011532.39508.max@love2party.net> Message-ID: <486A356E.5000307@FreeBSD.org> Max Laier wrote: > > Would you mind adding some words to that effect to your patch? > I think I'll hide it from public access instead. Looks like some people prefer to patch the kernel instead of learning how to make a queue on the parent interface. -- Dixi. Sem.
From thompsa at FreeBSD.org Tue Jul 1 14:04:28 2008 From: thompsa at FreeBSD.org (Andrew Thompson) Date: Tue Jul 1 14:04:33 2008 Subject: if_bridge turns off checksum offload of members? In-Reply-To: <4869FE2E.4070805@moneybookers.com> References: <4868A34C.6030304@moneybookers.com> <20080630101629.GD79537@cdnetworks.co.kr> <20080701012531.GA92392@citylink.fud.org.nz> <4869FE2E.4070805@moneybookers.com> Message-ID: <20080701140550.GA379@citylink.fud.org.nz> On Tue, Jul 01, 2008 at 12:51:42PM +0300, Stefan Lambrev wrote: > Hi, > > Maybe these are stupid questions, but: > > 1) There are zero matches of IFCAP_TOE in kernel sources .. there is no > support for TOE in 7.0, but maybe this is work in progress for 8-current? Yes, it's in current only. Just remove IFCAP_TOE. > 2) In #define BRIDGE_IFCAPS_MASK (IFCAP_TOE|IFCAP_TSO|IFCAP_TXCSUM) - should TOE > be replaced with RXCSUM or just removed? > 3) Why is RX never checked? In my case this doesn't matter because em turns > off both TX and RX if only one is disabled, but there is probably > hardware > that can separate them, e.g. RX disabled while TX enabled? Rx does not matter; whatever isn't offloaded in hardware is just computed locally, such as checking the cksum. It's Tx that messes up the bridge: if an outgoing packet is generated locally on an interface that has Tx offloading, it may actually be sent out a different bridge member that does not have that capability. This would cause it to be sent with an invalid checksum, for instance. The bridge used to just disable Tx offloading, but this patch you are testing makes sure each feature is supported by all members. > 4) I'm not sure why the bridge should not work with two interfaces, one of which > supports TX and the other does not. At least if I turn on checksum offload > only on one of the interfaces the bridge is still working ... > > Andrew Thompson wrote: > > - cut - >> >> >> This patch should do that, are you able to test it Stefan? >> >> >> cheers, >> Andrew >> > P.S.
I saw very good results with netisr2 on a kernel from p4 a few > months ago .. are there any patches flying around so I can test them with > 7-STABLE? :) > > -- > > Best Wishes, > Stefan Lambrev > ICQ# 24134177 > From stefan.lambrev at moneybookers.com Tue Jul 1 14:20:25 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 1 14:20:29 2008 Subject: if_bridge turns off checksum offload of members? In-Reply-To: <20080701140550.GA379@citylink.fud.org.nz> References: <4868A34C.6030304@moneybookers.com> <20080630101629.GD79537@cdnetworks.co.kr> <20080701012531.GA92392@citylink.fud.org.nz> <4869FE2E.4070805@moneybookers.com> <20080701140550.GA379@citylink.fud.org.nz> Message-ID: <486A3D23.2020100@moneybookers.com> Greetings Andrew, The patch compiles and works as expected. I noticed something strange btw - swi1: net was consuming 100% WCPU (shown in top -S) but I'm not sure this has something to do with your patch, as I can't reproduce it right now .. Andrew Thompson wrote: > On Tue, Jul 01, 2008 at 12:51:42PM +0300, Stefan Lambrev wrote: > >> Hi, >> >> Maybe these are stupid questions, but: >> >> 1) There are zero matches of IFCAP_TOE in kernel sources .. there is no >> support for TOE in 7.0, but maybe this is work in progress for 8-current? >> > > Yes, it's in current only. Just remove IFCAP_TOE. > > >> 2) In #define BRIDGE_IFCAPS_MASK (IFCAP_TOE|IFCAP_TSO|IFCAP_TXCSUM) - should TOE >> be replaced with RXCSUM or just removed? >> 3) Why is RX never checked? In my case this doesn't matter because em turns >> off both TX and RX if only one is disabled, but there is probably >> hardware >> that can separate them, e.g. RX disabled while TX enabled? >> > > Rx does not matter; whatever isn't offloaded in hardware is just computed
Its Tx that messes up the bridge, if > a outgoing packet is generated locally on an interface that has Tx > offloading, it may actaully be sent out a different bridge member that > does not have that capability. This would cause it to be sent with an > invalid checksum for instance. > > The bridge used to just disable Tx offloading but this patch you are > testing makes sure each feature is supported by all members. > > >> 4) I'm not sure why bridge should not work with two interfaces one of which >> support TX and the other does not? At least if I turn on checksum offload >> only on one of the interfaces the bridge is still working ... >> >> Andrew Thompson wrote: >> >> - cut - >> >>> This patch should do that, are you able to test it Stefan? >>> >>> >>> cheers, >>> Andrew >>> >>> >> P.S. I saw very good results with netisr2 on a kernel from p4 before few >> months .. are there any patches flying around so I can test them with >> 7-STABLE? :) >> >> -- >> >> Best Wishes, >> Stefan Lambrev >> ICQ# 24134177 >> >> > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > -- Best Wishes, Stefan Lambrev ICQ# 24134177 From sam at freebsd.org Tue Jul 1 14:56:46 2008 From: sam at freebsd.org (Sam Leffler) Date: Tue Jul 1 14:56:50 2008 Subject: FreeBSD NAT-T patch integration In-Reply-To: <20080630040103.94730.qmail@mailgate.gta.com> References: <20080630040103.94730.qmail@mailgate.gta.com> Message-ID: <486A45AB.2080609@freebsd.org> Larry Baird wrote: >> And how do I know that it works ? >> Well, when it doesn't work, I do know it, quite quickly most of the >> time ! >> > I have to chime in here. I did most of the initial porting of the > NAT-T patches from Kame IPSec to FAST_IPSEC. I did look at every > line of code during this process. I found no security problems during > the port. 
Like Yvan, my company uses the NAT-T patches commercially. > Like he says, if it had problems, we would hear about it. If the patches > don't get committed, I highly suspect Yvan or myself would try to keep the > patches up to date. So far I have done FAST_IPSEC patches for FreeBSD 4, 5, 6. > Yvan did 7 and 8 by himself. Keeping up gets to be a pain after a while. > I do plan to look at the FreeBSD 7 patches soon, but it sure would be nice > to see it committed. > > This whole issue seems ridiculous. I've been trying to get the NAT-T patches committed for a while, but since I'm not set up to do any IPSEC testing I have deferred to others. If we need to break a logjam I'll pitch in. Sam From sam at freebsd.org Tue Jul 1 15:12:01 2008 From: sam at freebsd.org (Sam Leffler) Date: Tue Jul 1 15:12:06 2008 Subject: if_bridge turns off checksum offload of members? In-Reply-To: <20080701140550.GA379@citylink.fud.org.nz> References: <4868A34C.6030304@moneybookers.com> <20080630101629.GD79537@cdnetworks.co.kr> <20080701012531.GA92392@citylink.fud.org.nz> <4869FE2E.4070805@moneybookers.com> <20080701140550.GA379@citylink.fud.org.nz> Message-ID: <486A40BB.70006@freebsd.org> Andrew Thompson wrote: > On Tue, Jul 01, 2008 at 12:51:42PM +0300, Stefan Lambrev wrote: > >> Hi, >> >> Maybe these are stupid questions, but: >> >> 1) There are zero matches of IFCAP_TOE in kernel sources .. there is no >> support for TOE in 7.0, but maybe this is work in progress for 8-current? >> > > Yes, it's in current only. Just remove IFCAP_TOE. > > >> 2) In #define BRIDGE_IFCAPS_MASK (IFCAP_TOE|IFCAP_TSO|IFCAP_TXCSUM) - should TOE >> be replaced with RXCSUM or just removed? >> 3) Why is RX never checked? In my case this doesn't matter because em turns >> off both TX and RX if only one is disabled, but there is probably >> hardware >> that can separate them, e.g. RX disabled while TX enabled? >> > > Rx does not matter; whatever isn't offloaded in hardware is just computed > locally, such as checking the cksum.
It's Tx that messes up the bridge: if > an outgoing packet is generated locally on an interface that has Tx > offloading, it may actually be sent out a different bridge member that > does not have that capability. This would cause it to be sent with an > invalid checksum, for instance. > > The bridge used to just disable Tx offloading, but this patch you are > testing makes sure each feature is supported by all members. > > >> 4) I'm not sure why the bridge should not work with two interfaces, one of which >> supports TX and the other does not. At least if I turn on checksum offload >> only on one of the interfaces the bridge is still working ... >> >> Andrew Thompson wrote: >> >> - cut - >> >>> This patch should do that, are you able to test it Stefan? >>> >>> >>> cheers, >>> Andrew >>> >>> >> P.S. I saw very good results with netisr2 on a kernel from p4 a few >> months ago .. are there any patches flying around so I can test them with >> 7-STABLE? :) >> >> This issue has come up before. Handling checksum offload in the bridge for devices that are not capable is not a big deal and is important for performance. TSO likewise should be done, but we're missing a generic TSO support routine to do that (I believe NetBSD has one and Linux has a GSO mechanism).
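Both approaches discussed in this thread can be sketched in C. The CAP_* bits and function names are hypothetical stand-ins (the real IFCAP_* values and the if_bridge code are not reproduced), and the mask drops TOE as Andrew suggests. bridge_common_caps() models Andrew's description of the patch, enabling a TX feature only when every member supports it; inet_cksum() is the RFC 1071 Internet checksum that a software fallback, as Sam proposes, would compute for members without TX checksum offload. A real fallback would also have to cover the TCP/UDP pseudo-header and TSO, which this sketch does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits for illustration only; these are not the
 * real IFCAP_* values from <net/if.h>. */
#define CAP_RXCSUM 0x1u
#define CAP_TXCSUM 0x2u
#define CAP_TSO    0x4u

/* TX-side features the bridge must agree on; RX checksum offload is left
 * out because anything not verified in hardware is verified in software. */
#define BRIDGE_CAPS_MASK (CAP_TSO | CAP_TXCSUM)

/* Keep a TX feature only if every bridge member supports it. */
static unsigned bridge_common_caps(const unsigned *member_caps, size_t n)
{
    unsigned common = BRIDGE_CAPS_MASK;
    for (size_t i = 0; i < n; i++)
        common &= member_caps[i];
    return common;
}

/* RFC 1071 Internet checksum: one's-complement sum of big-endian 16-bit
 * words, folded to 16 bits and complemented. */
static uint16_t inet_cksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(data[0] << 8);      /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)~sum;
}
```

With one member lacking TSO, only CAP_TXCSUM survives the mask. For the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 the function returns 0x220d, and running it over the same bytes followed by that checksum returns 0, which is how a receiver verifies a packet.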
Sam From paul at gtcomm.net Tue Jul 1 16:04:03 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 16:04:08 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869F621.4000508@moneybookers.com> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <4869F42E.8040904@gtcomm.net> <4869F621.4000508@moneybookers.com> Message-ID: <486A55ED.2030304@gtcomm.net> Thanks.. I was hoping I wasn't seeing things :> I do not like inconsistencies.. :/ Stefan Lambrev wrote: > > > Greetings Paul, >> >> >> ------OK I'm stumped now.. Rebuilt with preemption and ULE and >> preemption again and it's not doing what it did before.. > I saw this in my configuration too :) Just leave your test running for > a longer time and you will see this strange inconsistency in action. > In my configuration I almost always have better throughput after > reboot, which drops later (5-10min under flood) by 50-60kpps, and > after another 10-15min the number of correctly passed packets increases > again. Looks like "auto tuning" of which I'm not aware :) >> How could that be? Now about 500kpps.. >> >> That kind of inconsistency almost invalidates all my testing.. why >> would it be so much different after trying a bunch of kernel options >> and rebooting a bunch of times and then going back to the original >> config doesn't get you what it did in the beginning.. >> >> I'll have to dig into this further.. never seen anything like it :) >> >> Hopefully the ip_input fix will help free up a few cpu cycles.
>> >> >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From paul at gtcomm.net Tue Jul 1 16:08:23 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 16:08:30 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <4869F42E.8040904@gtcomm.net> Message-ID: <486A56F2.2000400@gtcomm.net> I am going to.. I have an opteron 270 dual set up on 32 bit and the 2212 is set up on 64 bit :) Today should bring some 32 bit results as well as etherchannel results. Ingo Flaschberger wrote: > Dear Paul, > >> >> Dual Opteron 2212, Recompiled kernel with 7-STABLE and removed a lot >> of junk in the config, added >> options NO_ADAPTIVE_MUTEXES not sure if that makes any >> difference or not, will test without. >> Used ULE scheduler, used preemption, CPUTYPE=opteron in /etc/make.conf >> 7.0-STABLE FreeBSD 7.0-STABLE #4: Tue Jul 1 01:22:18 CDT 2008 amd64 >> Max input rate .. 587kpps? Take into consideration that these >> packets are being forwarded out em1 interface which >> causes a great impact on cpu usage. If I set up a firewall rule to >> block the packets it can do over 1mpps on em0 input. > > would be great if you can also test with 32bit. > > what value do you have at net.inet.ip.intr_queue_maxlen? 
> > kind regards, > Ingo Flaschberger > > From petar at smokva.net Tue Jul 1 17:23:07 2008 From: petar at smokva.net (Petar Bogdanovic) Date: Tue Jul 1 17:23:12 2008 Subject: dhclient.c: script_go() vs. priv_script_go() Message-ID: <20080701172304.GA17817@pintail.smokva.net> Hi, it's probably because I don't understand the code, but may I ask what script_go() is supposed to do? The only function in dhclient.c using execve() is priv_script_go(), and this gets executed only once in main() with $reason = PREINIT. That's why I looked at the code in the first place: I can't make dhclient run dhclient-script with anything other than PREINIT. I would expect at least one additional run with, e.g., $reason = BOUND. Thanks for the enlightenment, Petar From paul at gtcomm.net Tue Jul 1 18:56:11 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 1 18:56:15 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4869B025.9080006@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> Message-ID: <486A7E45.3030902@gtcomm.net> For some reason I can't reproduce the 580kpps maximum that I saw when I first compiled, and I don't understand why; even with ULE and preemption the max I get is now about 530kpps, and it dips to 480kpps a lot.. The first time I tried it, it was at 580kpps and dipped to 520kpps...what the?..
(kernel config attached at end)
* noticed that SOMETIMES the em0 taskq jumps around CPUs and doesn't use 100% of any one CPU
* noticed that the netstat packets-per-second rate varies directly with the CPU usage of em0 taskq

(top output with ULE/PREEMPTION compiled in):
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
10 root 171 ki31 0K 16K RUN 3 64:12 94.09% idle: cpu3
36 root -68 - 0K 16K CPU1 1 5:43 89.75% em0 taskq
13 root 171 ki31 0K 16K CPU0 0 63:21 87.30% idle: cpu0
12 root 171 ki31 0K 16K RUN 1 62:44 66.75% idle: cpu1
11 root 171 ki31 0K 16K CPU2 2 62:17 56.49% idle: cpu2
39 root -68 - 0K 16K - 0 0:54 10.64% em3 taskq
this is about a 480-500kpps rate.........

now I wait a minute and
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
10 root 171 ki31 0K 16K CPU3 3 64:56 100.00% idle: cpu3
36 root -68 - 0K 16K CPU2 2 6:21 94.14% em0 taskq
13 root 171 ki31 0K 16K RUN 0 63:55 80.18% idle: cpu0
11 root 171 ki31 0K 16K RUN 2 62:48 67.38% idle: cpu2
12 root 171 ki31 0K 16K CPU1 1 63:04 58.40% idle: cpu1
39 root -68 - 0K 16K - 1 1:00 10.21% em3 taskq
530kpps rate.......

drops to 85%.. 480kpps rate
goes back up to 95%.. 530kpps
it keeps flopping like this...........
None of the CPUs is at 100% use and the CPU figures don't add up: the CPU time of em0 taskq is 94%, so one of the CPUs should be 6% idle, but it's not.
This is with ULE/PREEMPTION.. I see different behavior without preemption and with 4bsd..
and I also see different behavior depending on the time of day lol :) Figure that one out.

I'll post back without preemption and with 4bsd in a min, then I'll move on to the 32 bit platform tests.

From david.kwan at isilon.com Tue Jul 1 20:02:33 2008
From: david.kwan at isilon.com (David Kwan)
Date: Tue Jul 1 20:02:35 2008
Subject: Poor network performance for clients in 100MB to Gigabit environment
Message-ID:

I have a couple of questions regarding the TCP stack:

I have a situation with clients on a 100Mbit network connecting to servers on a Gigabit network, where the client read speeds are very slow from the FreeBSD server and fast from the Linux server; write speeds from the clients to both servers are fast. (Clients on the Gigabit network work fine, with blazing read and write speeds.) The network traces show congestion packets for both servers when the clients are reading (dup ACKs and retransmissions), but the Linux server seems to handle the congestion better. ECN is not enabled on the network, and I don't see any congestion windowing or changes in the clients' window. The 100Mbit/1G switch is dropping packets. I double-checked the network configuration and also swapped switchports between the servers to make sure the switch configuration is the same, and Linux always does better than FreeBSD. Assuming that the network configuration is constant for all clients and servers (speed, duplex, etc.), the only variable is the servers themselves (Linux and FreeBSD). I have tried a couple of FreeBSD machines with 6.1 and 7.0, and they exhibit the same problem, with no luck matching the speed and network utilization of Linux (2 years old). The read speed test I'm referring to is transferring a 100MB file (CIFS, NFS, and FTP): the Linux server does it consistently in around 10 sec (line speed) with a constant network utilization chart, while the FreeBSD servers are orders of magnitude slower with an erratic network utilization chart.
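A rough sanity check on the "around 10 sec (line speed)" figure is to work out the expected transfer time at 100 Mbit/s directly. This is a minimal POSIX sh/awk sketch. `xfer_time` is a hypothetical helper, and the inputs are assumptions not stated in the message:

* 100MB is taken as 100 MiB.
* The MSS is 1460 bytes.
* Each segment carries 40 bytes of TCP/IP headers (no TCP options).
* Each frame carries 38 bytes of Ethernet overhead (header, FCS, preamble, inter-frame gap).

```shell
# Hypothetical helper: estimated time to move a file over a link.
# Assumptions: MSS 1460, 40 bytes TCP/IP headers per segment (no options),
# 38 bytes Ethernet overhead per frame (14 header + 4 FCS + 8 preamble + 12 IFG).
xfer_time() {
    # $1 = file size in bytes, $2 = link rate in bits per second
    awk -v f="$1" -v bps="$2" 'BEGIN {
        mss = 1460
        oh = 40 + 38
        segs = int((f + mss - 1) / mss)     # full-size segments, rounded up
        raw = f * 8 / bps                   # payload bits only
        wire = (f + segs * oh) * 8 / bps    # payload plus headers and framing
        printf "segments=%d raw=%.2fs wire=%.2fs\n", segs, raw, wire
    }'
}

# 100 MiB over a 100 Mbit/s client link
xfer_time 104857600 100000000
```

Under these assumptions the best case is roughly 8.8 s. A steady ~10 s from the Linux server is therefore consistent with running at about line rate. TCP options such as timestamps would push the figure up slightly.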
I've attempted to tweak some network sysctl options on FreeBSD, and the only ones that helped were disabling TSO and inflight. That leads me to think the inter-packet gap was slightly increased, which partially relieved congestion on the switch; not a long-term solution.

My questions are:

1. Have you heard of this problem before with 100Mbit clients and Gigabit servers?

2. Are you aware of any Linux fix/patch in the TCP stack that handles congestion better than FreeBSD? I'm looking to address this issue in FreeBSD, but I'm wondering if the Linux stack does something special that could help FreeBSD's performance.

David K.

From paul at gtcomm.net Tue Jul 1 20:08:10 2008
From: paul at gtcomm.net (Paul)
Date: Tue Jul 1 20:08:14 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486A7E45.3030902@gtcomm.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net>
Message-ID: <486A8F24.5010000@gtcomm.net>

ULE without PREEMPTION is now yielding better results.

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    571595 40639   34564108          1     0        226     0
    577892 48865   34941908          1     0        178     0
    545240 84744   32966404          1     0        178     0
    587661 44691   35534512          1     0        178     0
    587839 38073   35544904          1     0        178     0
    587787 43556   35540360          1     0        178     0
    540786 39492   32712746          1     0        178     0
    572071 55797   34595650          1     0        178     0

*OUCH, IPFW HURTS..
Loading ipfw and adding one ipfw rule, "allow ip from any to any", drops 100Kpps off :/ what's up with THAT?
I unloaded the ipfw module and got the 100kpps back again; that's not right with ONE rule.. :/

em0 taskq is still jumping cpus..
is there any way to lock it to one CPU, or is this just a function of ULE?

Running a tar czpvf all.tgz * and seeing if pps changes.. negligible.. guess the scheduler is doing its job at least..

Hmm. Even when it's getting 50-60k errors per second on the interface, I can still SCP a file through that interface, although it's not fast.. 3-4MB/s..

You know, I wouldn't care if it added 5ms latency to the packets when it was doing 1mpps, as long as it didn't drop any.. Why can't it do that? Queue them up and do them in bigggg chunks so none are dropped........hmm?

32 bit system is compiling now.. it won't do > 400kpps with the GENERIC kernel, whereas 64 bit did 450k with GENERIC, although that could be the difference between the Opteron 270 and the Opteron 2212..

Paul

From paul at gtcomm.net Tue Jul 1 20:19:02 2008
From: paul at gtcomm.net (Paul)
Date: Tue Jul 1 20:19:06 2008
Subject: Poor network performance for clients in 100MB to Gigabit environment
In-Reply-To:
References:
Message-ID: <486A91B0.6040505@gtcomm.net>

What options do you have enabled on the Linux server?

sysctl -a | grep net.ipv4.tcp

and on the BSD:

sysctl -a net.inet.tcp

It sounds like BSD isn't handling the dropped data or ACK packets well. It pushes a burst of data out at more than 100Mbit, and the switch drops the packets. Then BSD waits too long to recover and doesn't scale the transmission back. TCP is supposed to scale the transmission rate down until packets are no longer being dropped, even without ECN. Options such as 'reno' and 'sack' are congestion control and loss recovery mechanisms that work through the congestion window.

From jfvogel at gmail.com Tue Jul 1 20:25:27 2008
From: jfvogel at gmail.com (Jack Vogel)
Date: Tue Jul 1 20:25:31 2008
Subject: Poor network performance for clients in 100MB to Gigabit environment
In-Reply-To:
References:
Message-ID: <2a41acea0807011325n782b1ca7hfe78f9da67ba0462@mail.gmail.com>

Take it from someone who has spent a couple of weeks beating his head against a wall over this... system tuning is essential. If your driver is going to the kernel looking for a resource and having to wait, it's gonna hurt...

Look into kern.ipc and, as Paul said, net.inet.

The off-the-shelf config is more than likely going to be inadequate.

Good luck,

Jack

From paul at gtcomm.net Tue Jul 1 20:34:58 2008
From: paul at gtcomm.net (Paul)
Date: Tue Jul 1 20:35:02 2008
Subject: Maximum ARP Entries
Message-ID: <486A956D.3030001@gtcomm.net>

Does anyone know if there is a maximum number of ARP entries/adjacencies that FBSD can handle before recycling?
I want to route several thousand IPs directly to some interfaces, so it will have 3-4k ARP entries..
I'm curious because in Linux I have to set the sysctl net.ipv4.neigh thresholds a lot higher or it bombs with 'too many neighbors'... I don't see a setting like this in the BSD sysctl.

Thanks!

Paul

From julian at elischer.org Tue Jul 1 20:56:43 2008
From: julian at elischer.org (Julian Elischer)
Date: Tue Jul 1 20:56:47 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486A8F24.5010000@gtcomm.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net>
Message-ID: <486A9A0E.6060308@elischer.org>

Paul wrote:
> ULE without PREEMPTION is now yielding better results.
>             input          (em0)           output
>    packets  errs      bytes    packets  errs      bytes colls
>     571595 40639   34564108          1     0        226     0
>     577892 48865   34941908          1     0        178     0
>     545240 84744   32966404          1     0        178     0
>     587661 44691   35534512          1     0        178     0
>     587839 38073   35544904          1     0        178     0
>     587787 43556   35540360          1     0        178     0
>     540786 39492   32712746          1     0        178     0
>     572071 55797   34595650          1     0        178     0
>
> *OUCH, IPFW HURTS..
> loading ipfw, and adding one ipfw rule allow ip from any to any drops
> 100Kpps off :/ what's up with THAT?
> unloaded ipfw module and back 100kpps more again, that's not right with
> ONE rule.. :/

ipfw needs to gain a lock on the firewall before running,
and is quite complex.. I can believe it..

in FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two
interfaces (bridged), but I think it has slowed down since then due to
the SMP locking.

>
> em0 taskq is still jumping cpus.. is there any way to lock it to one cpu
> or is this just a function of ULE
>
> running a tar czpvf all.tgz * and seeing if pps changes..
> negligible.. guess scheduler is doing its job at least..
>
> Hmm. even when it's getting 50-60k errors per second on the interface I
> can still SCP a file through that interface although it's not fast..
> 3-4MB/s..
>
> You know, I wouldn't care if it added 5ms latency to the packets when it
> was doing 1mpps as long as it didn't drop any.. Why can't it do that?
> Queue them up and do them in bigggg chunks so none are dropped........hmm?
>
> 32 bit system is compiling now.. won't do > 400kpps with GENERIC
> kernel, as with 64 bit did 450k with GENERIC, although that could be
> the difference between opteron 270 and opteron 2212..
>
> Paul
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From paul at gtcomm.net Tue Jul 1 22:47:29 2008
From: paul at gtcomm.net (Paul)
Date: Tue Jul 1 22:47:36 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486A9A0E.6060308@elischer.org>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org>
Message-ID: <486AB47B.2050200@gtcomm.net>

Ok, now THIS is absolutely a whole bunch of ridiculousness..
I set up etherchannel, and I'm evenly distributing packets over em0, em1 and em2 to lagg0, and I get WORSE performance than with a single interface.. Can anyone explain this one? This is horrible.
I got em0-em2 taskqs using 80% CPU EACH, and they are only doing 100kpps EACH.

Looks like:

   packets  errs      bytes    packets  errs      bytes colls
    105050 11066    6303000          0     0          0     0
    104952 13969    6297120          0     0          0     0
    104331 12121    6259860          0     0          0     0
            input          (em1)           output
   packets  errs      bytes    packets  errs      bytes colls
    103734 70658    6223998          0     0          0     0
    103483 75703    6209046          0     0          0     0
    103848 76195    6230886          0     0          0     0
            input          (em2)           output
   packets  errs      bytes    packets  errs      bytes colls
    103299 62957    6197940          1     0        226     0
    106388 73071    6383280          1     0        178     0
    104503 70573    6270180          4     0        712     0

last pid: 1378; load averages: 2.31, 1.28, 0.57   up 0+00:06:27 17:42:32
68 processes: 8 running, 42 sleeping, 18 waiting
CPU: 0.0% user, 0.0% nice, 58.9% system, 0.0% interrupt, 41.1% idle
Mem: 7980K Active, 5932K Inact, 47M Wired, 16K Cache, 8512K Buf, 1920M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
   11 root      171 ki31     0K    16K RUN    2   5:18  80.47% idle: cpu2
   38 root      -68    -     0K    16K CPU3   3   2:30  80.18% em2 taskq
   37 root      -68    -     0K    16K CPU1   1   2:28  76.90% em1 taskq
   36 root      -68    -     0K    16K CPU2   2   2:28  72.56% em0 taskq
   13 root      171 ki31     0K    16K RUN    0   3:32  29.20% idle: cpu0
   12 root      171 ki31     0K    16K RUN    1   3:29  27.88% idle: cpu1
   10 root      171 ki31     0K    16K RUN    3   3:21  25.63% idle: cpu3
   39 root      -68    -     0K    16K -      3   0:32  17.68% em3 taskq

See, that's total wrongness.. something is very wrong here. Does anyone have any ideas? I really need to get this working. I figured that evenly distributing the packets over 3 interfaces would simulate having 3 RX queues, because there is a separate process for each interface. Instead the result is WAY more CPU usage and a little over half the pps throughput of a single port..

If anyone is interested in tackling some of these issues, please e-mail me. It would be greatly appreciated.

Paul

From sam at errno.com Tue Jul 1 22:54:07 2008
From: sam at errno.com (Sam Leffler)
Date: Tue Jul 1 22:54:09 2008
Subject: kern/124753: net80211 discards power-save queue packets early
In-Reply-To:
References: <200806191030.m5JAU36i027140@freefall.freebsd.org>
Message-ID: <486AB58D.7050005@errno.com>

Sepherosa Ziehau wrote:
> On Thu, Jun 19, 2008 at 6:30 PM, wrote:
>
>> Synopsis: net80211 discards power-save queue packets early
>>
>> Responsible-Changed-From-To: freebsd-i386->freebsd-net
>> Responsible-Changed-By: remko
>> Responsible-Changed-When: Thu Jun 19 10:29:47 UTC 2008
>> Responsible-Changed-Why:
>> reassign to networking team.
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=124753
>>
>
> In How-To-Repeat, you said:
> "Then associate a recent Windows Mobile 6.1 device to the FreeBSD box
> running hostapd ..."
>
> In Description, you said:
> "The WM6.1 device recv ps-poll's for packets every 20 seconds ..."
>
> AFAIK, STA sends ps-poll to AP; AP does not send ps-poll to STA. Why
> did your windows STA receive ps-poll from freebsd AP? Did you capture
> it by using 802.11 tap?
>
> And which freebsd driver were you using?
>
> Your problem looks like:
> - Either freebsd AP did not properly configure TIM in beacons, which
> could be easily found out by using 802.11 tap. But I highly suspect
> if you were using ath(4), TIM would be misconfigured.
> - Or your windows STA didn't process TIM according to 802.11 standard.
>
>

The PR states that the listen interval sent by the station is 3 (beacons) and the beacon interval is 100TU. This means the AP is required to buffer unicast frames for only 300TU, which is ~300 ms. But according to the report, the Windows device is polling every 20 seconds, so there's no guarantee any packets will be present (even with the net80211 code arbitrarily using 4x the listen interval specified by the sta). I find it really hard to believe a device would poll every 20 secs, so something seems wrong in what's reported/observed. Given that defeating the aging logic just pushed the problem elsewhere, it sounds like something else is wrong, which (as you note) probably requires a packet capture to understand. I'm pretty sure TIM is handled correctly in RELENG_7, but a packet capture would help us verify that.

Sam

From ru at FreeBSD.org Tue Jul 1 23:00:19 2008
From: ru at FreeBSD.org (Ruslan Ermilov)
Date: Tue Jul 1 23:00:24 2008
Subject: Maximum ARP Entries
In-Reply-To: <486A956D.3030001@gtcomm.net>
References: <486A956D.3030001@gtcomm.net>
Message-ID: <20080701223247.GB8518@edoofus.dev.vega.ru>

On Tue, Jul 01, 2008 at 04:37:01PM -0400, Paul wrote:
> Does anyone know if there is a maximum number of ARP entries/
> adjacencies that FBSD can handle before recycling?
>
In FreeBSD, ARP still uses the routing table as its storage, so the limits on routing table memory apply, and the latter currently has no limit.
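There is no fixed cap on ARP entries, so the practical check when routing several thousand directly connected IPs is to watch how many entries the table actually holds. Below is a minimal POSIX sh/awk sketch. The function name `count_arp` is hypothetical. The `arp -an` line layout it parses is an assumption about FreeBSD 7.x output and should be checked against arp(8) on the box in question.

```shell
# Hypothetical helper: count resolved entries in `arp -an`-style output on stdin.
# Assumed line format: "? (10.1.1.2) at 00:11:22:33:44:55 on em0 [ethernet]".
# Unresolved entries are assumed to read "at (incomplete)" and are skipped.
count_arp() {
    awk '$3 == "at" && $4 != "(incomplete)" { n++ } END { print n + 0 }'
}

# Demo on sample lines; on the router you would run: arp -an | count_arp
printf '%s\n' \
    '? (10.1.1.2) at 00:11:22:33:44:55 on em0 [ethernet]' \
    '? (10.1.1.9) at (incomplete) on em0 [ethernet]' \
    '? (10.2.2.2) at 00:aa:bb:cc:dd:ee on em1 [ethernet]' | count_arp
```

Sampling this periodically while traffic ramps up would show whether the table actually grows to the expected 3-4k entries.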
Cheers,
-- 
Ruslan Ermilov
ru@FreeBSD.org
FreeBSD committer

From paul at gtcomm.net Wed Jul 2 00:06:59 2008
From: paul at gtcomm.net (Paul)
Date: Wed Jul 2 00:07:04 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486AA299.7090904@elischer.org>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486A9BBF.1060308@gtcomm.net> <486AA299.7090904@elischer.org>
Message-ID: <486AC71D.2080804@gtcomm.net>

Apparently lagg hasn't been Giant-fixed :/ Can we do something about this quickly?
With adaptive Giant I get more performance on lagg, but the CPU usage is smashed to 100%.
I get about 50k more pps per interface (so 150kpps total, which STILL is less than a single gigabit port).
Check it out:

68 processes: 9 running, 41 sleeping, 18 waiting
CPU: 0.0% user, 0.0% nice, 89.5% system, 0.0% interrupt, 10.5% idle
Mem: 8016K Active, 6192K Inact, 47M Wired, 108K Cache, 9056K Buf, 1919M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
   38 root      -68    -     0K    16K CPU1   1   3:29 100.00% em2 taskq
   37 root      -68    -     0K    16K CPU0   0   3:31  98.78% em1 taskq
   36 root      -68    -     0K    16K CPU3   3   2:53  82.42% em0 taskq
   11 root      171 ki31     0K    16K RUN    2  22:48  79.00% idle: cpu2
   10 root      171 ki31     0K    16K RUN    3  20:51  22.90% idle: cpu3
   39 root      -68    -     0K    16K RUN    2   0:32  16.60% em3 taskq
   12 root      171 ki31     0K    16K RUN    1  20:16   2.05% idle: cpu1
   13 root      171 ki31     0K    16K RUN    0  20:25   1.90% idle: cpu0

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    122588     0    7355280          0     0          0     0
    123057     0    7383420          0     0          0     0
            input          (em1)           output
   packets  errs      bytes    packets  errs      bytes colls
    174917 11899   10495032          2     0        178     0
    173967 11697   10438038          2     0        356     0
    174630 10603   10477806          2     0        268     0
            input          (em2)           output
   packets  errs      bytes    packets  errs      bytes colls
    175843  3928   10550580          0     0          0     0
    175952  5750   10557120          0     0          0     0

Still less performance than a single gig-e port.. that Giant lock really sucks, and why on earth would LAGG require that.. It seems so simple to fix :/
Anyone up for it? :) I wish I was a programmer sometimes, but network engineering will have to do. :D

Julian Elischer wrote:
> Paul wrote:
>> Is PF better than ipfw? iptables has almost no impact on routing
>> performance unless I add a swath of rules to it, and then it bombs.
>> I need maybe 10 rules max and I don't want a 20% performance drop for
>> that.. :P
>
> well, lots of people have wanted to fix it, and I've investigated
> quite a lot, but it takes someone with 2 weeks of free time and
> all the right clue. It's not inherent in ipfw, but it needs some
> TLC from someone who cares :-).
>
>> Ouch! :) Is this going to be fixed any time soon? We have some
>> money that can be used for development costs to fix things like this,
>> because we use linux and freebsd machines as firewalls for a lot of
>> customers, and with the increasing bandwidth and pps the customers are
>> demanding more, and I can't give them better performance with a brand
>> new dual xeon or opteron machine vs the old p4 machines I have them
>> running on now :/ The only difference in the new machine vs the old
>> machine is that the new one can take in more pps and drop it, but it
>> can't route a whole lot more.
>> Routing/firewalling must still not be lock free, ugh.. :P
>>
>> Thanks

From david.kwan at isilon.com Wed Jul 2 00:30:38 2008
From: david.kwan at isilon.com (David Kwan)
Date: Wed Jul 2 00:30:42 2008
Subject: Poor network performance for clients in 100MB to Gigabit environment
In-Reply-To: <486A91B0.6040505@gtcomm.net>
References: <486A91B0.6040505@gtcomm.net>
Message-ID:

I've attempted many standard and non-standard permutations of the TCP tuning parameters via sysctl, without much success. It feels like FreeBSD is not handling the congestion very well, and that this is beyond sysctl tuning. It's just the clients on the 100Mbit networks that have slow/erratic reads; clients on the Gigabit network are fine and scream, so the original TCP parameters are just fine for them.

For the record, these are the sysctl options for Linux and FreeBSD.
Linux:

net.ipv4.conf.eth0.force_igmp_version = 0
net.ipv4.conf.eth0.disable_policy = 0
net.ipv4.conf.eth0.disable_xfrm = 0
net.ipv4.conf.eth0.arp_ignore = 0
net.ipv4.conf.eth0.arp_announce = 0
net.ipv4.conf.eth0.arp_filter = 0
net.ipv4.conf.eth0.tag = 0
net.ipv4.conf.eth0.log_martians = 0
net.ipv4.conf.eth0.bootp_relay = 0
net.ipv4.conf.eth0.medium_id = 0
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.eth0.accept_source_route = 0
net.ipv4.conf.eth0.send_redirects = 1
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.eth0.shared_media = 1
net.ipv4.conf.eth0.secure_redirects = 1
net.ipv4.conf.eth0.accept_redirects = 1
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.eth0.forwarding = 0
net.ipv4.conf.lo.force_igmp_version = 0
net.ipv4.conf.lo.disable_policy = 1
net.ipv4.conf.lo.disable_xfrm = 1
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.lo.arp_filter = 0
net.ipv4.conf.lo.tag = 0
net.ipv4.conf.lo.log_martians = 0
net.ipv4.conf.lo.bootp_relay = 0
net.ipv4.conf.lo.medium_id = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.lo.accept_source_route = 1
net.ipv4.conf.lo.send_redirects = 1
net.ipv4.conf.lo.rp_filter = 0
net.ipv4.conf.lo.shared_media = 1
net.ipv4.conf.lo.secure_redirects = 1
net.ipv4.conf.lo.accept_redirects = 1
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.lo.forwarding = 0
net.ipv4.conf.default.force_igmp_version = 0
net.ipv4.conf.default.disable_policy = 0
net.ipv4.conf.default.disable_xfrm = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.tag = 0
net.ipv4.conf.default.log_martians = 0
net.ipv4.conf.default.bootp_relay = 0
net.ipv4.conf.default.medium_id = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.send_redirects = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.shared_media = 1
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.default.accept_redirects = 1
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.default.forwarding = 0
net.ipv4.conf.all.force_igmp_version = 0
net.ipv4.conf.all.disable_policy = 0
net.ipv4.conf.all.disable_xfrm = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_announce = 0
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.tag = 0
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.all.bootp_relay = 0
net.ipv4.conf.all.medium_id = 0
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.send_redirects = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.all.shared_media = 1
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.all.accept_redirects = 1
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.all.forwarding = 0
net.ipv4.neigh.eth0.locktime = 99
net.ipv4.neigh.eth0.proxy_delay = 79
net.ipv4.neigh.eth0.anycast_delay = 99
net.ipv4.neigh.eth0.proxy_qlen = 64
net.ipv4.neigh.eth0.unres_qlen = 3
net.ipv4.neigh.eth0.gc_stale_time = 60
net.ipv4.neigh.eth0.delay_first_probe_time = 5
net.ipv4.neigh.eth0.base_reachable_time = 30
net.ipv4.neigh.eth0.retrans_time = 99
net.ipv4.neigh.eth0.app_solicit = 0
net.ipv4.neigh.eth0.ucast_solicit = 3
net.ipv4.neigh.eth0.mcast_solicit = 3
net.ipv4.neigh.lo.locktime = 99
net.ipv4.neigh.lo.proxy_delay = 79
net.ipv4.neigh.lo.anycast_delay = 99
net.ipv4.neigh.lo.proxy_qlen = 64
net.ipv4.neigh.lo.unres_qlen = 3
net.ipv4.neigh.lo.gc_stale_time = 60
net.ipv4.neigh.lo.delay_first_probe_time = 5
net.ipv4.neigh.lo.base_reachable_time = 30
net.ipv4.neigh.lo.retrans_time = 99
net.ipv4.neigh.lo.app_solicit = 0
net.ipv4.neigh.lo.ucast_solicit = 3
net.ipv4.neigh.lo.mcast_solicit = 3
net.ipv4.neigh.default.gc_thresh3 = 1024
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_interval = 30
net.ipv4.neigh.default.locktime = 99
net.ipv4.neigh.default.proxy_delay = 79
net.ipv4.neigh.default.anycast_delay = 99
net.ipv4.neigh.default.proxy_qlen = 64
net.ipv4.neigh.default.unres_qlen = 3
net.ipv4.neigh.default.gc_stale_time = 60
net.ipv4.neigh.default.delay_first_probe_time = 5
net.ipv4.neigh.default.base_reachable_time = 30
net.ipv4.neigh.default.retrans_time = 99
net.ipv4.neigh.default.app_solicit = 0
net.ipv4.neigh.default.ucast_solicit = 3
net.ipv4.neigh.default.mcast_solicit = 3
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_workaround_signed_windows = 1
net.ipv4.tcp_bic_beta = 819
net.ipv4.tcp_tso_win_divisor = 8
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_bic_low_window = 14
net.ipv4.tcp_bic_fast_convergence = 1
net.ipv4.tcp_bic = 1
net.ipv4.tcp_vegas_gamma = 2
net.ipv4.tcp_vegas_beta = 6
net.ipv4.tcp_vegas_alpha = 2
net.ipv4.tcp_vegas_cong_avoid = 0
net.ipv4.tcp_westwood = 0
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.ipfrag_secret_interval = 600
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_frto = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.icmp_ratemask = 6168
net.ipv4.icmp_ratelimit = 1000
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 16384 131072
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.inet_peer_gc_maxtime = 120
net.ipv4.inet_peer_gc_mintime = 10
net.ipv4.inet_peer_maxttl = 600
net.ipv4.inet_peer_minttl = 120
net.ipv4.inet_peer_threshold = 65664
net.ipv4.igmp_max_msf = 10
net.ipv4.igmp_max_memberships = 20
net.ipv4.route.secret_interval = 600
net.ipv4.route.min_adv_mss = 256
net.ipv4.route.min_pmtu = 552
net.ipv4.route.mtu_expires = 600
net.ipv4.route.gc_elasticity = 8
net.ipv4.route.error_burst = 5000
net.ipv4.route.error_cost = 1000
net.ipv4.route.redirect_silence = 20480
net.ipv4.route.redirect_number = 9
net.ipv4.route.redirect_load = 20
net.ipv4.route.gc_interval = 60
net.ipv4.route.gc_timeout = 300
net.ipv4.route.gc_min_interval = 0
net.ipv4.route.max_size = 1048576
net.ipv4.route.gc_thresh = 65536
net.ipv4.route.max_delay = 
10 net.ipv4.route.min_delay = 2 net.ipv4.icmp_errors_use_inbound_ifaddr = 0 net.ipv4.icmp_ignore_bogus_error_responses = 0 net.ipv4.icmp_echo_ignore_broadcasts = 0 net.ipv4.icmp_echo_ignore_all = 0 net.ipv4.ip_local_port_range = 32768 61000 net.ipv4.tcp_max_syn_backlog = 1024 net.ipv4.tcp_rfc1337 = 0 net.ipv4.tcp_stdurg = 0 net.ipv4.tcp_abort_on_overflow = 0 net.ipv4.tcp_tw_recycle = 0 net.ipv4.tcp_syncookies = 0 net.ipv4.tcp_fin_timeout = 60 net.ipv4.tcp_retries2 = 15 net.ipv4.tcp_retries1 = 3 net.ipv4.tcp_keepalive_intvl = 75 net.ipv4.tcp_keepalive_probes = 9 net.ipv4.tcp_keepalive_time = 7200 net.ipv4.ipfrag_time = 30 net.ipv4.ip_dynaddr = 0 net.ipv4.ipfrag_low_thresh = 196608 net.ipv4.ipfrag_high_thresh = 262144 net.ipv4.tcp_max_tw_buckets = 180000 net.ipv4.tcp_max_orphans = 262144 net.ipv4.tcp_synack_retries = 5 net.ipv4.tcp_syn_retries = 5 net.ipv4.ip_nonlocal_bind = 0 net.ipv4.ip_no_pmtu_disc = 0 net.ipv4.ip_autoconfig = 0 net.ipv4.ip_default_ttl = 64 net.ipv4.ip_forward = 0 net.ipv4.tcp_retrans_collapse = 1 net.ipv4.tcp_sack = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_timestamps = 1 FreeBSD: net.inet.ip.portrange.lowfirst: 1023 net.inet.ip.portrange.lowlast: 600 net.inet.ip.portrange.first: 49152 net.inet.ip.portrange.last: 65535 net.inet.ip.portrange.hifirst: 49152 net.inet.ip.portrange.hilast: 65535 net.inet.ip.portrange.reservedhigh: 1023 net.inet.ip.portrange.reservedlow: 0 net.inet.ip.portrange.randomized: 1 net.inet.ip.portrange.randomcps: 10 net.inet.ip.portrange.randomtime: 45 net.inet.ip.forwarding: 1 net.inet.ip.redirect: 1 net.inet.ip.ttl: 64 net.inet.ip.rtexpire: 3600 net.inet.ip.rtminexpire: 10 net.inet.ip.rtmaxcache: 128 net.inet.ip.sourceroute: 0 net.inet.ip.intr_queue_maxlen: 5000 net.inet.ip.intr_queue_drops: 0 net.inet.ip.accept_sourceroute: 0 net.inet.ip.keepfaith: 0 net.inet.ip.subnets_are_local: 0 net.inet.ip.same_prefix_carp_only: 0 net.inet.ip.fastforwarding: 0 net.inet.ip.process_options: 1 net.inet.ip.sendsourcequench: 0 
net.inet.ip.random_id: 0
net.inet.ip.check_interface: 0
net.inet.ip.fragpackets: 0
net.inet.ip.maxfragsperpacket: 32
net.inet.ip.maxfragpackets: 1024
net.inet.icmp.maskrepl: 0
net.inet.icmp.icmplim: 1000
net.inet.icmp.maskfake: 0
net.inet.icmp.drop_redirect: 0
net.inet.icmp.log_redirect: 0
net.inet.icmp.icmplim_output: 1
net.inet.icmp.reply_src:
net.inet.icmp.reply_from_interface: 0
net.inet.icmp.quotelen: 8
net.inet.icmp.bmcastecho: 0
net.inet.tcp.rfc1323: 1
net.inet.tcp.mssdflt: 512
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.sendspace: 131072
net.inet.tcp.recvspace: 131072
net.inet.tcp.keepinit: 75000
net.inet.tcp.delacktime: 100
net.inet.tcp.hostcache.cachelimit: 15360
net.inet.tcp.hostcache.hashsize: 512
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.count: 4
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.log_in_vain: 0
net.inet.tcp.blackhole: 0
net.inet.tcp.delayed_ack: 1
net.inet.tcp.rfc3042: 1
net.inet.tcp.rfc3390: 1
net.inet.tcp.insecure_rst: 0
net.inet.tcp.reass.maxsegments: 8256
net.inet.tcp.reass.cursegments: 0
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.overflows: 0
net.inet.tcp.path_mtu_discovery: 1
net.inet.tcp.slowstart_flightsize: 1
net.inet.tcp.local_slowstart_flightsize: 4
net.inet.tcp.newreno: 1
net.inet.tcp.sndrexmitpack: 0
net.inet.tcp.sndrexmitbyte: 0
net.inet.tcp.do_tso: 1
net.inet.tcp.effective_maxseg_limit: 65535
net.inet.tcp.min_tso_factor: 2
net.inet.tcp.sack.enable: 1
net.inet.tcp.sack.maxholes: 128
net.inet.tcp.sack.globalmaxholes: 65536
net.inet.tcp.sack.globalholes: 0
net.inet.tcp.minmss: 216
net.inet.tcp.minmssoverload: 0
net.inet.tcp.tcbhashsize: 512
net.inet.tcp.do_tcpdrain: 1
net.inet.tcp.pcbcount: 199
net.inet.tcp.icmp_may_rst: 1
net.inet.tcp.isn_reseed_interval: 0
net.inet.tcp.inflight.enable: 1
net.inet.tcp.inflight.debug: 0
net.inet.tcp.inflight.rttthresh: 10
net.inet.tcp.inflight.min: 6144
net.inet.tcp.inflight.max: 1073725440
net.inet.tcp.inflight.stab: 20
net.inet.tcp.min_rtt: 3
net.inet.tcp.max_rexmt_time: 6400
net.inet.tcp.rexmt_dupacks: 3
net.inet.tcp.syncookies: 1
net.inet.tcp.syncache.bucketlimit: 30
net.inet.tcp.syncache.cachelimit: 15359
net.inet.tcp.syncache.count: 0
net.inet.tcp.syncache.hashsize: 512
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.msl: 30000
net.inet.tcp.rexmit_min: 30
net.inet.tcp.rexmit_slop: 200
net.inet.tcp.always_keepalive: 1
net.inet.udp.checksum: 1
net.inet.udp.maxdgram: 9216
net.inet.udp.recvspace: 512000
net.inet.udp.log_in_vain: 0
net.inet.udp.blackhole: 0
net.inet.udp.strict_mcast_mship: 0
net.inet.raw.maxdgram: 8192
net.inet.raw.recvspace: 411648
net.inet.accf.unloadable: 0

David K.

-----Original Message-----
From: owner-freebsd-net@freebsd.org [mailto:owner-freebsd-net@freebsd.org] On Behalf Of Paul
Sent: Tuesday, July 01, 2008 1:21 PM
To: David Kwan
Cc: freebsd-net@freebsd.org
Subject: Re: Poor network performance for clients in 100MB toGigabit environment

What options do you have enabled on the linux server?
sysctl -a | grep net.ipv4.tcp
and on the bsd
sysctl -a net.inet.tcp

It sounds like a problem with BSD not handling the dropped data or ack packets. What happens is it pushes a burst of data out at more than 100mbit, the switch drops the packets, and then BSD waits too long to recover and doesn't scale the transmission back. TCP is supposed to scale down the transmission speed, to the point where packets are no longer dropped, even without ECN. Options such as 'reno' and 'sack' etc. are congestion control algorithms that use congestion windows.

David Kwan wrote:
> I have a couple of questions regarding the TCP Stack:
>
> I have a situation with clients on a 100MB network connecting to servers
> on a Gigabit network where the client read speeds are very slow from the
> FreeBSD server and fast from the Linux server; Write speeds from the
> clients to both servers are fast. (Clients on the gigabit network work
> fine with blazing read and write speeds).
> The network traces show
> congestion packets for both servers when doing reads from the clients
> (dup acks and retransmissions), but the Linux server seems to handle the
> congestion better. ECN is not enabled on the network and I don't see any
> congestion windowing or the clients' window changing. The 100MB/1G switch
> is dropping packets. I double checked the network configuration and
> also swapped switch ports for the servers to use the others to make sure
> the switch configuration is the same, and the Linux always does better
> than FreeBSD. Assuming that the network configuration is a constant for
> all clients and servers (speed, duplex, and etc...), the only variable
> is the servers themselves (Linux and FreeBSD). I have tried a couple of
> FreeBSD machines with 6.1 and 7.0 and they exhibit the same problem,
> with no luck matching the speed and network utilization of Linux (2
> years old). The read speed test I'm referring to is a transfer of
> a 100MB file (cifs, nfs, and ftp), and the Linux server does it
> consistently in around 10 sec (line speed) with a constant network
> utilization chart, while the FreeBSD servers are orders of magnitude slower
> with an erratic network utilization chart. I've attempted to tweak some network
> sysctl options on the FreeBSD, and the only ones that helped were
> disabling TSO and inflight; which leads me to think that the
> inter-packet gap was slightly increased to partially relieve congestion
> on the switch; not a long term solution.
>
> My questions are:
>
> 1. Have you heard of this problem before with 100MB clients to Gigabit
> servers?
>
> 2. Are you aware of any Linux fix/patch in the TCP stack to better
> handle congestion than FreeBSD? I'm looking to address this issue in
> the FreeBSD, but wondering if the Linux stack did something special that
> can help with the FreeBSD performance.
>
> David K.
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From mike at sentex.net Wed Jul 2 02:22:02 2008
From: mike at sentex.net (Mike Tancsa)
Date: Wed Jul 2 02:22:07 2008
Subject: Route messages
In-Reply-To: <20080701092254.T57089@maildrop.int.zabbadoz.net>
References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> <20080701084933.W57089@maildrop.int.zabbadoz.net> <20080701092254.T57089@maildrop.int.zabbadoz.net>
Message-ID: <200807020221.m622Lxnf088882@lava.sentex.ca>

At 05:24 AM 7/1/2008, Bjoern A. Zeeb wrote:

>So I had a very quick look at the code between doing something else.
>I think the only change needed is this if I am not mistaken but my
>head is far away nowhere close enough in this code.

Hi,
The patch seems to work in that there is not an RTM_MISS message generated per packet forwarded on my test box. Is it the "final" / correct version?

---Mike

From mcdouga9 at egr.msu.edu Wed Jul 2 05:10:09 2008
From: mcdouga9 at egr.msu.edu (Adam McDougall)
Date: Wed Jul 2 05:10:14 2008
Subject: Poor network performance for clients in 100MB toGigabit environment
In-Reply-To:
References: <486A91B0.6040505@gtcomm.net>
Message-ID: <20080702045329.GT23350@egr.msu.edu>

Are the NFS mounts UDP or TCP on Linux and FreeBSD? I believe FreeBSD still defaults to UDP, which can behave differently, especially for NFS.

On Tue, Jul 01, 2008 at 05:30:35PM -0700, David Kwan wrote:
I've attempted many standard and non-standard permutations of the tcp tuning parameters via sysctl, without much success.
It feels like FreeBSD is not handling the congestion very well, and the problem is beyond what sysctl tuning can fix. It's just the clients on the 100MB networks that have slow/erratic reads; clients on the Gigabit network are fine and scream, so the original tcp parameters are just fine for them.

David K.

From if at xip.at Wed Jul 2 08:48:36 2008
From: if at xip.at (Ingo Flaschberger)
Date: Wed Jul 2 08:48:40 2008
Subject: Poor network performance for clients in 100MB to Gigabit environment
In-Reply-To:
References:
Message-ID:

Dear David,

try to enable flow-control at the gig-e switch and freebsd network card.

Kind regards,
ingo flaschberger

management
---------------------------
netstorage-crossip-flat:fee
powered by
crossip communications gmbh
---------------------------
sebastian kneipp gasse 1
a-1020 wien
fix: +43-1-726 15 22-217
fax: +43-1-726 15 22-111
---------------------------

From paul at gtcomm.net Wed Jul 2 08:50:35 2008
From: paul at gtcomm.net (Paul)
Date: Wed Jul 2 08:50:39 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486A9A0E.6060308@elischer.org>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org>
Message-ID: <486B41D5.3060609@gtcomm.net>

SMP DISABLED on my Opteron 2212 (ULE, Preemption on)
Yields ~750kpps in em0 and out em1 (one direction)
I am miffed why this yields more pps than
a) with all 4 cpus running and b) 4 cpus with lagg load balanced over 3
incoming connections so 3 taskq threads

I would be willing to set up test equipment (several servers plugged into a
switch) with ipkvm and power port access
if someone or a group of people want to figure out ways to improve the
routing process, ipfw, and lagg.

Maximum PPS with one ipfw rule on UP:
tops out about 570Kpps.. almost 200kpps lower ? (frown)

I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's in here
and see how that scales, using UP same kernel etc I have now.

Julian Elischer wrote:
> Paul wrote:
>> ULE without PREEMPTION is now yielding better results.
>>             input          (em0)           output
>>    packets   errs      bytes    packets  errs      bytes colls
>>     571595  40639   34564108          1     0        226     0
>>     577892  48865   34941908          1     0        178     0
>>     545240  84744   32966404          1     0        178     0
>>     587661  44691   35534512          1     0        178     0
>>     587839  38073   35544904          1     0        178     0
>>     587787  43556   35540360          1     0        178     0
>>     540786  39492   32712746          1     0        178     0
>>     572071  55797   34595650          1     0        178     0
>>
>> *OUCH, IPFW HURTS..
>> loading ipfw, and adding one ipfw rule allow ip from any to any drops
>> 100Kpps off :/ what's up with THAT?
>> unloaded ipfw module and back 100kpps more again, that's not right
>> with ONE rule.. :/
>
> ipfw needs to gain a lock on the firewall before running,
> and is quite complex.. I can believe it..
>
> in FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two
> interfaces (bridged) but I think it has slowed down since then due to
> the SMP locking.
>
>>
>> em0 taskq is still jumping cpus.. is there any way to lock it to one
>> cpu or is this just a function of ULE
>>
>> running a tar czpvf all.tgz * and seeing if pps changes..
>> negligible.. guess scheduler is doing its job at least..
>>
>> Hmm. even when it's getting 50-60k errors per second on the interface
>> I can still SCP a file through that interface although it's not
>> fast.. 3-4MB/s..
>>
>> You know, I wouldn't care if it added 5ms latency to the packets when
>> it was doing 1mpps as long as it didn't drop any.. Why can't it do
>> that? Queue them up and do them in bigggg chunks so none are
>> dropped........hmm?
>>
>> 32 bit system is compiling now.. won't do > 400kpps with GENERIC
>> kernel, as with 64 bit did 450k with GENERIC, although that could be
>> the difference between opteron 270 and opteron 2212..
>>
>> Paul
>

From if at xip.at Wed Jul 2 08:54:02 2008
From: if at xip.at (Ingo Flaschberger)
Date: Wed Jul 2 08:54:06 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486B41D5.3060609@gtcomm.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net>
Message-ID:

Dear Paul,

> SMP DISABLED on my Opteron 2212 (ULE, Preemption on)
> Yields ~750kpps in em0 and out em1 (one direction)
> I am miffed why this yields more pps than
> a) with all 4 cpus running and b) 4 cpus with lagg load balanced over 3
> incoming connections so 3 taskq threads

because less locking, less synchronisation, ....

> I would be willing to set up test equipment (several servers plugged into a
> switch) with ipkvm and power port access
> if someone or a group of people want to figure out ways to improve the
> routing process, ipfw, and lagg.
>
> Maximum PPS with one ipfw rule on UP:
> tops out about 570Kpps.. almost 200kpps lower ? (frown)

can you post the rule here?

> I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's in here
> and see how that scales, using UP same kernel etc I have now.

really, please try 32bit and 1 cpu.
Kind regards,
Ingo Flaschberger

From paul at gtcomm.net Wed Jul 2 09:47:07 2008
From: paul at gtcomm.net (Paul)
Date: Wed Jul 2 09:47:12 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To:
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net>
Message-ID: <486B4F11.6040906@gtcomm.net>

Ipfw rule was simply allow ip from any to any :)
This is 64bit I'm testing now.. I have a 32 bit install I tested on another machine but it only has a bge NIC and wasn't performing as well, so I'll reinstall 32 bit on this 2212 and test, then drop in the 2222 (3ghz) and test.
I still don't like the huge hit ipfw and lagg take :/

** I tried polling in UP mode and I got some VERY interesting results..
CPU is 44% idle (idle polling isn't on) but I'm getting errors! It's doing 530kpps with ipfw loaded, which without polling uses 100% cpu, but now it says my cpu is 44% idle? That makes no sense.. If it was idle why am I getting errors? I only get errors when em taskq was eating 100% cpu..
Idle polling on/off makes no difference.
user_frac is set to 5 ..
last pid: 1598; load averages: 0.01, 0.16, 0.43 up 0+00:34:41 04:04:43
66 processes: 2 running, 46 sleeping, 18 waiting
CPU: 0.0% user, 0.0% nice, 7.3% system, 46.5% interrupt, 46.2% idle
Mem: 8064K Active, 6808K Inact, 43M Wired, 92K Cache, 9264K Buf, 1923M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   10 root      171 ki31     0K    16K RUN     10:10 88.87% idle
 1598 root       45    0  8084K  2052K RUN      0:00  1.12% top
   11 root      -32    -     0K    16K WAIT     0:02  0.24% swi4: clock sio
   13 root      -44    -     0K    16K WAIT    14:13  0.15% swi1: net
 1329 root       44    0 33732K  4572K select   0:00  0.05% sshd

            input          (em0)           output
   packets   errs      bytes    packets  errs      bytes colls
    541186  68741   33107504          1     0          0     0
    540036  70611   33044632          1     0        178     0
    540470  66493   33043148          1     0        178     0
    541903  67981   33125414          1     0        178     0
    541238  84979   33105898          1     0        178     0
    541338  74067   33115984          2     0        356     0
    539116  49286   32991516          2     0        220     0

kldunload ipfw.......

            input          (em0)           output
   packets   errs      bytes    packets  errs      bytes colls
    600589      0   36751064          1     0        226     0
    606294      0   37102868          2     0        220     0
    616802      0   37733866          1     0        178     0
    623017      0   38117436          1     0        178     0
    624800      0   38225470          1     0        178     0
    626791      0   38347426          1     0        178     0

last pid: 1605; load averages: 0.00, 0.13, 0.40 up 0+00:35:30 04:05:32
66 processes: 2 running, 46 sleeping, 18 waiting
CPU: 0.0% user, 0.0% nice, 7.1% system, 36.0% interrupt, 56.9% idle
Mem: 8064K Active, 6812K Inact, 43M Wired, 92K Cache, 9264K Buf, 1923M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   10 root      171 ki31     0K    16K RUN     10:16 95.36% idle
   13 root      -44    -     0K    16K WAIT    14:53  0.24% swi1: net
   36 root      -68    -     0K    16K -        1:03  0.10% em3 taskq
 1605 root       44    0  8084K  2052K RUN      0:00  0.10% top
   11 root      -32    -     0K    16K WAIT     0:02  0.05% swi4: clock sio

add some more PPS......
            input          (em0)           output
   packets   errs      bytes    packets  errs      bytes colls
    749015 169684   46438936          1     0         42     0
    749176 184574   46448916          1     0        178     0
    759576 188462   47093716          1     0        178     0
    762904 182854   47300052          1     0        178     0
    798039 147509   49478422          1     0        178     0
    759528 194297   47090740          1     0        178     0
    746849 195935   46304642          1     0        178     0
    747566 186703   46349096          1     0        178     0
    750011 181630   46500702          2

last pid: 1607; load averages: 0.19, 0.17, 0.40 up 0+00:36:18 04:06:20
66 processes: 2 running, 46 sleeping, 18 waiting
CPU: 0.0% user, 0.0% nice, 12.5% system, 45.4% interrupt, 42.1% idle
Mem: 8068K Active, 6808K Inact, 43M Wired, 92K Cache, 9264K Buf, 1923M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   10 root      171 ki31     0K    16K RUN     10:21 85.64% idle
   36 root      -68    -     0K    16K -        1:07  3.61% em3 taskq
 1607 root       44    0  8084K  2052K RUN      0:00  0.93% top
   13 root      -44    -     0K    16K WAIT    15:32  0.20% swi1: net
   11 root      -32    -     0K    16K WAIT     0:02  0.05% swi4: clock sio

So my maximum without polling is close to 800kpps but if I push that it starts locking me from doing things, or my maximum is 750kpps with polling and the console is very responsive? How on EARTH can my CPU be 42% idle with polling while I'm getting all these errors? The whole thing makes no sense, something is bugged somewhere..

HZ=2000 for this test (512/512 descriptors)
If I lower HZ to 100, I can get a little over 800kpps without polling..

--------Going to reboot with 4000hz and 1024 rx/tx descriptors

.......... about the same..
            input          (em0)           output
   packets   errs      bytes    packets  errs      bytes colls
    720833 244835   44691662          1     0        178     0
    744746 215689   46174256          1     0        178     0
    744943 194252   46186470          1     0        178     0
    743685 199487   46108486          2     0        356     0
    743715 209263   46110346          2     0        356     0

last pid: 1426; load averages: 0.22, 0.65, 0.40 up 0+00:07:17 04:16:43
66 processes: 2 running, 46 sleeping, 18 waiting
CPU: 0.4% user, 0.0% nice, 12.8% system, 44.2% interrupt, 42.6% idle
Mem: 8052K Active, 6192K Inact, 46M Wired, 96K Cache, 8944K Buf, 1921M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   10 root      171 ki31     0K    16K RUN      0:49 82.52% idle
   36 root      -68    -     0K    16K -        0:31  6.84% em3 taskq
 1426 root       45    0  8084K  2052K RUN      0:00  1.32% top
   13 root      -44    -     0K    16K WAIT     3:07  0.59% swi1: net
   11 root      -32    -     0K    16K WAIT     0:00  0.05% swi4: clock sio

------reboot with 2048/2048 descriptors
NOTE: without polling, 128, 256 and 512 give the best performance for some strange reason, maybe cache hits

this is worse..

            input          (em0)           output
   packets   errs      bytes    packets  errs      bytes colls
    646290 269912   40080528          0     0          0     0
    672548 250198   41687440          1     0        178     0
    674856 247162   41841076          1     0        178     0
    665062 248851   41233848          1     0        178     0
    671764 253300   41649372

bah..

------- 10000HZ, 512/512
CPU still will not go below 42% idle
700-720 kpps.. actually got 40% cpu idle lol

Oh well.. Tomorrow hopefully the 2222 test and the 32 bit test.. then I'm done for a while.. :P

Paul

From if at xip.at Wed Jul 2 10:05:52 2008
From: if at xip.at (Ingo Flaschberger)
Date: Wed Jul 2 10:05:57 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486B4F11.6040906@gtcomm.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net>
Message-ID:

Dear Paul,

> I still don't like the huge hit ipfw and lagg take :/

I think you can't use fastforward with lagg.

> ** I tried polling in UP mode and I got some VERY interesting results..
> CPU is 44% idle (idle polling isn't on) but I'm getting errors! It's doing
> 530kpps with ipfw loaded, which without polling uses 100% cpu but now it says
> my cpu is 44% idle? that makes no sense.. If it was idle why am I getting
> errors? I only get errors when em taskq was eating 100% cpu..
> Idle polling on/off makes no difference.
> user_frac is set to 5 ..
what are your values:
kern.polling.reg_frac=
kern.polling.user_frac=
kern.polling.burst_max=

I use:
kern.polling.reg_frac=20
kern.polling.user_frac=20
kern.polling.burst_max=512

if you need more than 1000, you need to change the code:
src/sys/kern/kern_poll.c
#define MAX_POLL_BURST_MAX 1000

> So my maximum without polling is close to 800kpps but if I push that it
> starts locking me from doing things, or

how many kpps do you want to achieve?

> HZ=2000 for this test (512/512 descriptors)

you mean:
hw.em.rxd=512
hw.em.txd=512
?

can you try with polling:
hw.em.rxd=4096
hw.em.txd=4096

Kind regards,
Ingo Flaschberger

From stefan.lambrev at moneybookers.com Wed Jul 2 13:02:47 2008
From: stefan.lambrev at moneybookers.com (Stefan Lambrev)
Date: Wed Jul 2 13:02:53 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To:
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net>
Message-ID: <486B7C69.1010304@moneybookers.com>

Hi

Ingo Flaschberger wrote:
> Dear Paul,
>
>> I still don't like the huge hit ipfw and lagg take :/

You have to try PF, then you will respect IPFW again ;)

-cut-
>
>> So my maximum without polling is close to 800kpps but if I push that
>> it starts locking me from doing things, or
>
> how many kpps do you want to achieve?

Do not know for Paul but, I want to be able to route (and/or bridge to handle) 600-700mbps syn flood,
which is something like 1500kpps in every direction. Is it unrealistic?
If the code is optimized to fully utilize MP I do not see a reason why quad core processor should not be able to do this.
After all single core seems to handle 500kpps, if we utilize four, eight or even more cores we should be able to route 1500kpps + ?
I hope TOE once MFCed to 7-STABLE will help too?

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From adrian at freebsd.org Wed Jul 2 13:06:05 2008
From: adrian at freebsd.org (Adrian Chadd)
Date: Wed Jul 2 13:06:11 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486B7C69.1010304@moneybookers.com>
References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486B7C69.1010304@moneybookers.com>
Message-ID:

2008/7/2 Stefan Lambrev :
> Do not know for Paul but, I want to be able to route (and/or bridge to
> handle) 600-700mbps syn flood,
> which is something like 1500kpps in every direction. Is it unrealistic?
> If the code is optimized to fully utilize MP I do not see a reason why quad
> core processor should not be able to do this.
> After all single core seems to handle 500kpps, if we utilize four, eight or
> even more cores we should be able to route 1500kpps + ?
> I hope TOE once MFCed to 7-STABLE will help too?

But it's not just about CPU use, it's about your NIC, your IO bus path,
your memory interface, your caches .. things get screwy. Especially if
you're holding a full internet routing table.

If you're interested in participating in a group funding project to
make this happen then let me know.
The more the merrier (read: the more that can be achieved :)

Adrian

From mike at sentex.net Wed Jul 2 13:46:42 2008
From: mike at sentex.net (Mike Tancsa)
Date: Wed Jul 2 13:46:47 2008
Subject: Route messages
In-Reply-To: <20080701092254.T57089@maildrop.int.zabbadoz.net>
References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> <20080701084933.W57089@maildrop.int.zabbadoz.net> <20080701092254.T57089@maildrop.int.zabbadoz.net>
Message-ID: <200807021346.m62DkdHx091961@lava.sentex.ca>

At 05:24 AM 7/1/2008, Bjoern A. Zeeb wrote:
>On Tue, 1 Jul 2008, Bjoern A. Zeeb wrote:
>
>Hi,
>
>>On Tue, 1 Jul 2008, Andre Oppermann wrote:
>>
>>Hi,
>>
>>>Mike Tancsa wrote:
>>>>I am thinking
>>>>http://lists.freebsd.org/pipermail/cvs-src/2008-April/090303.html
>>>>is the commit ? If I revert to the prev version, the issue goes away.
>>
>>Ha, I finally know why I ended up on Cc: of a thread I had no idea
>>about. Someone could have told me instead of blindly adding me;-)
>>
>>
>>>Yes, this change doesn't look right. It should only do the route
>>>lookup in ip_input.c when there was an EMSGSIZE error returned by
>>>ip_output(). The rtalloc_ign() call causes the message to be sent
>>>because it always sets report to one. The default message is RTM_MISS.
>>>I'll try to prep an updated patch which doesn't have these issues later
>>>today.
>>
>>Yeah my bad. Sorry.
>>
>>If you do that, do not do an extra route lookup if possible, correct
>>the rtalloc call. Thanks.
>
>So I had a very quick look at the code between doing something else.
>I think the only change needed is this if I am not mistaken but my
>head is far away nowhere close enough in this code.
>
>Andre, could you review this?
>
>Index: sys/netinet/ip_input.c
>===================================================================
>RCS file: /shared/mirror/FreeBSD/r/ncvs/src/sys/netinet/ip_input.c,v
>retrieving revision 1.332.2.2
>diff -u -p -r1.332.2.2 ip_input.c
>--- sys/netinet/ip_input.c 22 Apr 2008 12:02:55 -0000 1.332.2.2
>+++ sys/netinet/ip_input.c 1 Jul 2008 09:23:08 -0000
>@@ -1363,7 +1363,6 @@ ip_forward(struct mbuf *m, int srcrt)
> * the ICMP_UNREACH_NEEDFRAG "Next-Hop MTU" field described
> in RFC1191.
> */
> bzero(&ro, sizeof(ro));
>- rtalloc_ign(&ro, RTF_CLONING);
>
> error = ip_output(m, NULL, &ro, IP_FORWARDING, NULL, NULL);
>

This could also potentially close
http://www.freebsd.org/cgi/query-pr.cgi?pr=124540
http://www.freebsd.org/cgi/query-pr.cgi?pr=123621

---Mike

From andre at freebsd.org Wed Jul 2 13:51:24 2008
From: andre at freebsd.org (Andre Oppermann)
Date: Wed Jul 2 13:51:28 2008
Subject: Route messages
In-Reply-To: <20080701092254.T57089@maildrop.int.zabbadoz.net>
References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> <20080701084933.W57089@maildrop.int.zabbadoz.net> <20080701092254.T57089@maildrop.int.zabbadoz.net>
Message-ID: <486B87DB.3080007@freebsd.org>

Bjoern A. Zeeb wrote:
> On Tue, 1 Jul 2008, Bjoern A. Zeeb wrote:
>
> Hi,
>
>> On Tue, 1 Jul 2008, Andre Oppermann wrote:
>>
>> Hi,
>>
>>> Mike Tancsa wrote:
>>>> I am thinking
>>>>
>>>> http://lists.freebsd.org/pipermail/cvs-src/2008-April/090303.html
>>>> is the commit ? If I revert to the prev version, the issue goes away.
>>
>> Ha, I finally know why I ended up on Cc: of a thread I had no idea
>> about. Someone could have told me instead of blindly adding me;-)
>>
>>
>>> Yes, this change doesn't look right. It should only do the route
>>> lookup in ip_input.c when there was an EMSGSIZE error returned by
>>> ip_output(). The rtalloc_ign() call causes the message to be sent
>>> because it always sets report to one.
>>> The default message is RTM_MISS.
>>>
>>> I'll try to prep an updated patch which doesn't have these issues later
>>> today.
>>
>> Yeah my bad. Sorry.
>>
>> If you do that, do not do an extra route lookup if possible, correct
>> the rtalloc call. Thanks.
>
> So I had a very quick look at the code between doing something else.
> I think the only change needed is this if I am not mistaken but my
> head is far away nowhere close enough in this code.
>
> Andre, could you review this?

Yes, this should fix the problem. I haven't tested the patch though.

--
Andre

> Index: sys/netinet/ip_input.c
> ===================================================================
> RCS file: /shared/mirror/FreeBSD/r/ncvs/src/sys/netinet/ip_input.c,v
> retrieving revision 1.332.2.2
> diff -u -p -r1.332.2.2 ip_input.c
> --- sys/netinet/ip_input.c 22 Apr 2008 12:02:55 -0000 1.332.2.2
> +++ sys/netinet/ip_input.c 1 Jul 2008 09:23:08 -0000
> @@ -1363,7 +1363,6 @@ ip_forward(struct mbuf *m, int srcrt)
> * the ICMP_UNREACH_NEEDFRAG "Next-Hop MTU" field described in
> RFC1191.
> */
> bzero(&ro, sizeof(ro));
> - rtalloc_ign(&ro, RTF_CLONING);
>
> error = ip_output(m, NULL, &ro, IP_FORWARDING, NULL, NULL);
>
>

From bz at FreeBSD.org Wed Jul 2 13:53:22 2008
From: bz at FreeBSD.org (bz@FreeBSD.org)
Date: Wed Jul 2 13:53:27 2008
Subject: kern/124540: [tcp] RTM_MISS with the transit packets
Message-ID: <200807021353.m62DrMnx048486@freefall.freebsd.org>

Synopsis: [tcp] RTM_MISS with the transit packets

Responsible-Changed-From-To: freebsd-net->bz
Responsible-Changed-By: bz
Responsible-Changed-When: Wed Jul 2 13:52:56 UTC 2008
Responsible-Changed-Why:
My fault most likely, patch out for review already.

http://www.freebsd.org/cgi/query-pr.cgi?pr=124540

From bz at FreeBSD.org (Bjoern A. Zeeb)
Date: Wed Jul 2 13:55:07 2008
Subject: Route messages
In-Reply-To: <200807021346.m62DkdHx091961@lava.sentex.ca>
References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> <20080701084933.W57089@maildrop.int.zabbadoz.net> <20080701092254.T57089@maildrop.int.zabbadoz.net> <200807021346.m62DkdHx091961@lava.sentex.ca>
Message-ID: <20080702135402.E57089@maildrop.int.zabbadoz.net>

On Wed, 2 Jul 2008, Mike Tancsa wrote:

Hi,

>> Index: sys/netinet/ip_input.c
>> ===================================================================
>> RCS file: /shared/mirror/FreeBSD/r/ncvs/src/sys/netinet/ip_input.c,v
>> retrieving revision 1.332.2.2
>> diff -u -p -r1.332.2.2 ip_input.c
>> --- sys/netinet/ip_input.c 22 Apr 2008 12:02:55 -0000 1.332.2.2
>> +++ sys/netinet/ip_input.c 1 Jul 2008 09:23:08 -0000
>> @@ -1363,7 +1363,6 @@ ip_forward(struct mbuf *m, int srcrt)
>> * the ICMP_UNREACH_NEEDFRAG "Next-Hop MTU" field described in
>> RFC1191.
>> */
>> bzero(&ro, sizeof(ro));
>> - rtalloc_ign(&ro, RTF_CLONING);
>>
>> error = ip_output(m, NULL, &ro, IP_FORWARDING, NULL, NULL);
>>

Still waiting on any second pairs of eyes to review this.

> This could also potentially close
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=124540
> http://www.freebsd.org/cgi/query-pr.cgi?pr=123621

taken, will handle them.

--
Bjoern A. Zeeb Stop bit received. Insert coin for new game.
From mike at sentex.net Wed Jul 2 13:55:54 2008
From: mike at sentex.net (Mike Tancsa)
Date: Wed Jul 2 13:55:59 2008
Subject: Route messages
In-Reply-To: <486B87DB.3080007@freebsd.org>
References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> <20080701084933.W57089@maildrop.int.zabbadoz.net> <20080701092254.T57089@maildrop.int.zabbadoz.net> <486B87DB.3080007@freebsd.org>
Message-ID: <200807021355.m62Dtpli092002@lava.sentex.ca>

At 09:51 AM 7/2/2008, Andre Oppermann wrote:
>>Andre, could you review this?
>
>Yes, this should fix the problem. I haven't tested the patch though.

It works for me in the lab and on one production machine I patched early this morning.

---Mike

>--
>Andre
>
>>Index: sys/netinet/ip_input.c
>>===================================================================
>>RCS file: /shared/mirror/FreeBSD/r/ncvs/src/sys/netinet/ip_input.c,v
>>retrieving revision 1.332.2.2
>>diff -u -p -r1.332.2.2 ip_input.c
>>--- sys/netinet/ip_input.c 22 Apr 2008 12:02:55 -0000 1.332.2.2
>>+++ sys/netinet/ip_input.c 1 Jul 2008 09:23:08 -0000
>>@@ -1363,7 +1363,6 @@ ip_forward(struct mbuf *m, int srcrt)
>> * the ICMP_UNREACH_NEEDFRAG "Next-Hop MTU" field
>> described in RFC1191.
>> */
>> bzero(&ro, sizeof(ro));
>>- rtalloc_ign(&ro, RTF_CLONING);
>> error = ip_output(m, NULL, &ro, IP_FORWARDING, NULL, NULL);
>

From cokane at FreeBSD.org Wed Jul 2 14:59:09 2008
From: cokane at FreeBSD.org (cokane@FreeBSD.org)
Date: Wed Jul 2 14:59:11 2008
Subject: kern/124225: [ndis] [patch] ndis network driver sometimes loses network connection
Message-ID: <200807021459.m62Ex92w054515@freefall.freebsd.org>

Synopsis: [ndis] [patch] ndis network driver sometimes loses network connection

Responsible-Changed-From-To: freebsd-net->cokane
Responsible-Changed-By: cokane
Responsible-Changed-When: Wed Jul 2 14:56:51 UTC 2008
Responsible-Changed-Why:
PR refers to a recent commit of changes that I made, I will look into solving this problem in my development branch.

http://www.freebsd.org/cgi/query-pr.cgi?pr=124225

From paul at gtcomm.net Wed Jul 2 18:22:55 2008
From: paul at gtcomm.net (Paul)
Date: Wed Jul 2 18:22:59 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To:
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net>
Message-ID: <486BC7F5.5070604@gtcomm.net>

Fastforward works with lagg; lagg just has some issues that need to be fixed, even on a UP system. It has the same issue as IPFW.
kern.polling.idlepoll_sleeping: 1
kern.polling.stalled: 806
kern.polling.suspect: 97861
kern.polling.phase: 0
kern.polling.enable: 0
kern.polling.handlers: 2
kern.polling.residual_burst: 0
kern.polling.pending_polls: 0
kern.polling.lost_polls: 128535
kern.polling.short_ticks: 1455
kern.polling.reg_frac: 1200
kern.polling.user_frac: 0
kern.polling.idle_poll: 0
kern.polling.each_burst: 50
kern.polling.burst_max: 1440
kern.polling.burst: 377

It's doing 720kpps right now, and it's having a lot of errors, but the cpu is 40% idle! I don't understand. Is TOP reporting the wrong cpu usage? Is this a bug?

input (em0) output
packets errs bytes packets errs bytes colls
722012 42861 44764748 1 0 178 0
704432 52679 43674800 2 0 580 0
693297 53536 42984418 1 0 178 0
704046 42525 43650854 2 0 220 0
714959 37876 44327462 1 0 178 0
744923 24202 46185230 1 0 178 0
726069 34699 45016282 1 0 178 0
681837 78581 42273898 1 0 178 0
663106 85699 41112576 1 0 178 0
708274 55414 43912992 1 0 178 0
659659 94430 40898862 1 0 178 0
669235 100248 41492574 1 0 178 0
676510 100102 41943624 1 0 178 0
679847 98972 42150518 1 0 178 0
677700 92586 42017416 2 0 356 0
672639 86454 41703622 1 0 178 0
675841 72821 41902146 1 0 178 0
679522 86423 42130368 1 0 178 0
660737 72883 40965698 1 0 178 0
637085 81303 39499274 1 0 178 0
655463 98183 40638710 1 0 178 0
input (em0) output
packets errs bytes packets errs bytes colls
683650 66140 42386304 1 0 178 0
654910 110089 40604424 1 0 290 0
647969 120709 40174082 1 0 178 0
666260 67037 41308124 1 0 178 0
671570 68276 41637344 1 0 178 0
691683 60819 42884350 1 0 178 0
663656 79528 41146728 2 0 244 0
703917 47860 43642870 2 0 356 0
710988 55792 44081258 2 0 220 0
697062 77661 43217848 1 0 178 0

65 processes: 2 running, 46 sleeping, 17 waiting
CPU: 0.0% user, 0.0% nice, 10.8% system, 45.7% interrupt, 43.5% idle
Mem: 7968K Active, 6028K Inact, 42M Wired, 84K Cache, 8768K Buf, 1925M Free
Swap: 8192M Total, 8192M Free

PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
10 root 171 ki31 0K 16K RUN 4:06 86.43% idle
36 root -68 - 0K 16K - 0:57 3.81% em3 taskq
13 root -44 - 0K 16K WAIT 9:22 1.42% swi1: net
1429 root 44 0 8084K 2052K RUN 0:00 0.29% top
11 root -32 - 0K 16K WAIT 0:01 0.05% swi4: clock sio

net.isr.swi_count: 39442306
net.isr.drop: 0
net.isr.queued: 8
net.isr.deferred: 0
net.isr.directed: 2189
net.isr.count: 2189
net.isr.direct: 1
net.route.netisr_maxqlen: 16384
net.inet.ip.intr_queue_maxlen: 16384
net.inet.ip.intr_queue_drops: 0

em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 0
em0: Missed Packets = 22574958
em0: Receive No Buffers = 65713041
em0: Receive Length Errors = 0
em0: Receive errors = 0
em0: Crc errors = 0
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 52679
em0: watchdog timeouts = 0
em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em0: XON Rcvd = 0
em0: XON Xmtd = 0
em0: XOFF Rcvd = 0
em0: XOFF Xmtd = 0
em0: Good Packets Rcvd = 547984791
em0: Good Packets Xmtd = 5000
em0: TSO Contexts Xmtd = 18
em0: TSO Contexts Failed = 0

kern.hz=2000
hw.em.rxd=512
hw.em.txd=512

-----------Reboot with 4096/4096........(my guess is that it will be a lot worse, more errors..)
........
Without polling, 4096 is horrible, about 200kpps less ... :/
Turning on polling..
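The netstat -w1 samples above are easier to compare across descriptor and HZ settings once they are reduced to a drop fraction. The sketch below is a minimal helper, not anything from the thread: the struct and function names are hypothetical, and it treats the input errs column as frames that were offered but dropped, which is an assumption about how em(4) reports them (the Missed Packets and RX overruns counters in the dump suggest that is what most of them are).

```c
#include <stddef.h>

/* Input half of one `netstat -w1 -I em0` line (hypothetical helper type). */
struct if_sample {
    unsigned long packets; /* input packets accepted */
    unsigned long errs;    /* input errors, treated here as dropped frames */
    unsigned long bytes;   /* input bytes */
};

/* Share of offered frames (packets + errs) that were counted as errors. */
double drop_fraction(const struct if_sample *s)
{
    unsigned long offered = s->packets + s->errs;
    return offered ? (double)s->errs / (double)offered : 0.0;
}

/* Average size of the accepted packets, in bytes. */
double avg_packet_bytes(const struct if_sample *s)
{
    return s->packets ? (double)s->bytes / (double)s->packets : 0.0;
}

/* Drop fraction over a whole run: total errs / total offered frames. */
double run_drop_fraction(const struct if_sample *s, size_t n)
{
    unsigned long long errs = 0, offered = 0;

    for (size_t i = 0; i < n; i++) {
        errs += s[i].errs;
        offered += (unsigned long long)s[i].packets + s[i].errs;
    }
    return offered ? (double)errs / (double)offered : 0.0;
}
```

For the first 512/512 sample (722012 packets, 42861 errs, 44764748 bytes) this gives a drop fraction of about 5.6% and about 62 bytes per accepted packet, so these runs are dominated by minimum-size frames.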
polling on, 4096 is bad,

input (em0) output
packets errs bytes packets errs bytes colls
622379 307753 38587506 1 0 178 0
635689 277303 39412718 1 0 178 0
625552 291235 38784244 2 0 580 0
630143 287872 39068870 1 0 178 0
620225 292071 38453954 1 0 178 0
627499 295329 38904942 1 0 178 0
623854 288086 38678952 1 0 178 0
632433 267698 39210850 1 0 178 0
619177 279541 38388978 1 0 178 0
618049 265926 38319038 2 0 356 0
627026 263882 38875616 1 0 178 0

em0: Missed Packets = 16570461
em0: Receive No Buffers = 9220592
em0: Receive Length Errors = 0
em0: Receive errors = 0
em0: Crc errors = 0
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 40539

------Rebooting with 256/256 descriptors..........
..........
No polling:
843762 25337 52313248 1 0 178 0
763555 0 47340414 1 0 178 0
830189 0 51471722 1 0 178 0
838724 0 52000892 1 0 178 0
813594 939 50442832 1 0 178 0
807303 763 50052790 1 0 178 0
791024 0 49043492 1 0 178 0
768316 1106 47635596 1 0 178 0
Machine is maxed and is unresponsive..

Polling ON:
input (em0) output
packets errs bytes packets errs bytes colls
784138 179079 48616564 1 0 226 0
788815 129608 48906530 2 0 356 0
755555 142997 46844426 2 0 468 0
803670 144459 49827544 1 0 178 0
777649 147120 48214242 1 0 178 0
779539 146820 48331422 1 0 178 0
786201 148215 48744478 2 0 356 0
776013 101660 48112810 1 0 178 0
774239 145041 48002834 2 0 356 0
771774 102969 47850004 1 0 178 0
Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ? I'm really mystified by this..
Every time it maxes out and gets errors, top reports:
CPU: 0.0% user, 0.0% nice, 10.1% system, 45.3% interrupt, 44.6% idle
pretty much the same line every time

256/256 blows away 4096, probably because it fits the descriptors into the cache lines on the cpu, while 4096 has too many cache misses and causes worse performance.
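Paul's cache explanation can be put into rough numbers. This is a back-of-the-envelope sketch under stated assumptions, not a model of the em(4) driver: it assumes 16-byte legacy em descriptors and one standard 2048-byte mbuf cluster per receive descriptor, and the "bytes touched per buffer" parameter is a hypothetical figure (for 64-byte frames, roughly one cache line of headers), not something measured in this thread. All function names are illustrative.

```c
#include <stddef.h>

/* Assumptions for this sketch: em(4) legacy descriptors are 16 bytes and
 * every receive descriptor owns one standard 2048-byte mbuf cluster. */
#define EM_DESC_BYTES 16u
#define CLUSTER_BYTES 2048u

/* Size of the descriptor array for one ring. */
size_t ring_desc_bytes(size_t ndesc)
{
    return ndesc * EM_DESC_BYTES;
}

/* Size of the packet buffers hanging off one receive ring. */
size_t ring_buffer_bytes(size_t ndesc)
{
    return ndesc * CLUSTER_BYTES;
}

/* Rough cache working set of one ring: the descriptors plus the part of
 * each buffer the CPU actually reads (touched_per_buf bytes). */
size_t ring_working_set(size_t ndesc, size_t touched_per_buf)
{
    return ndesc * (EM_DESC_BYTES + touched_per_buf);
}
```

Under these assumptions a 256-entry ring has 4 KB of descriptors and a 4096-entry ring 64 KB, with 512 KB versus 8 MB of clusters behind them; touching 64 bytes per buffer gives roughly 20 KB versus 320 KB per ring, before counting the transmit ring on em1. Whether that spills out of cache depends on the CPU, so this frames the experiment rather than explaining the results.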
This is probably just some nasty programming, and they could optimize it for 4096 by taking larger chunks. We don't need the latency to be 0.08ms for each packet; I don't care if it's 0.3ms as long as it doesn't drop any.
Setting HZ=100 with 256/256 gets a higher maximum than HZ=2000 polling with 256/256, but the box is unresponsive around 850kpps.
If this only worked with SMP and was optimized it could do millions of pps :/

Ingo Flaschberger wrote:
> Dear Paul,
>
>> I still don't like the huge hit ipfw and lagg take :/
>
> I think you can't use fastforward with lagg.
>
>> ** I tried polling in UP mode and I got some VERY interesting results..
>> CPU is 44% idle (idle polling isn't on) but I'm getting errors!
>> It's doing 530kpps with ipfw loaded, which without polling uses 100%
>> cpu but now it says my cpu is 44% idle? that makes no sense.. If it
>> was idle why am I getting errors? I only get errors when em taskq
>> was eating 100% cpu..
>> Idle polling on/off makes no difference.
>> user_frac is set to 5 ..
>
> what are your values:
> kern.polling.reg_frac=
> kern.polling.user_frac=
> kern.polling.burst_max=
>
> I use:
> kern.polling.reg_frac=20
> kern.polling.user_frac=20
> kern.polling.burst_max=512
>
> if you need more than 1000, you need to change the code:
> src/sys/kern/kern_poll.c
> #define MAX_POLL_BURST_MAX 1000
>
>> So my maximum without polling is close to 800kpps but if I push that
>> it starts locking me from doing things, or
>
> how many kpps do you want to achieve?
>
>> HZ=2000 for this test (512/512 descriptors)
>
> you mean:
> hw.em.rxd=512
> hw.em.txd=512
> ?
>
> can you try with polling:
> hw.em.rxd=4096
> hw.em.txd=4096
>
> Kind regards,
> Ingo Flaschberger
>

From paul at gtcomm.net Wed Jul 2 18:25:54 2008
From: paul at gtcomm.net (Paul)
Date: Wed Jul 2 18:25:59 2008
Subject: Route messages
In-Reply-To: <20080702135402.E57089@maildrop.int.zabbadoz.net>
References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> <20080701084933.W57089@maildrop.int.zabbadoz.net> <20080701092254.T57089@maildrop.int.zabbadoz.net> <200807021346.m62DkdHx091961@lava.sentex.ca> <20080702135402.E57089@maildrop.int.zabbadoz.net>
Message-ID: <486BC8AB.60708@gtcomm.net>

Works for me on test machine.. I was expecting a performance increase, but nothing changed.. Just no more route messages, zebra will be happy.

Bjoern A. Zeeb wrote:
> On Wed, 2 Jul 2008, Mike Tancsa wrote:
>
> Hi,
>
>>> Index: sys/netinet/ip_input.c
>>> ===================================================================
>>> RCS file: /shared/mirror/FreeBSD/r/ncvs/src/sys/netinet/ip_input.c,v
>>> retrieving revision 1.332.2.2
>>> diff -u -p -r1.332.2.2 ip_input.c
>>> --- sys/netinet/ip_input.c 22 Apr 2008 12:02:55 -0000 1.332.2.2
>>> +++ sys/netinet/ip_input.c 1 Jul 2008 09:23:08 -0000
>>> @@ -1363,7 +1363,6 @@ ip_forward(struct mbuf *m, int srcrt)
>>> * the ICMP_UNREACH_NEEDFRAG "Next-Hop MTU" field described
>>> in RFC1191.
>>> */
>>> bzero(&ro, sizeof(ro));
>>> - rtalloc_ign(&ro, RTF_CLONING);
>>>
>>> error = ip_output(m, NULL, &ro, IP_FORWARDING, NULL, NULL);
>>>
>
> Still waiting on any second pairs of eyes to review this.
>
>
>
>> This could also potentially close
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=124540
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=123621
>
> taken, will handle them.
>

From rahman.sazzadur at gmail.com Wed Jul 2 18:37:12 2008
From: rahman.sazzadur at gmail.com (sazzadur rahman)
Date: Wed Jul 2 19:30:36 2008
Subject: A query regarding SCTP congestion control
In-Reply-To: <48060748.1090807@cisco.com>
References: <7059EA19D7837E44A3BA7DAB464944B37FDA715193@XMAIL5.sooner.net.ou.edu> <48060748.1090807@cisco.com>
Message-ID: <82bdb5ec0807021137m7819153rbc0631ab6f310d0e@mail.gmail.com>

Hello,

I need to get SCTP congestion window data for research purposes. I collected cwnd data from an SCTP sender running on a FreeBSD 7.0 machine by using the KTR kernel log. After that, I plotted cwnd vs. time. But I am unable to explain the graph: it is very different from the graph shown in the book "Stream Control Transmission Protocol (SCTP)", a reference guide by Randall R. Stewart, page 187, and also from the TCP congestion window.

A typical entry from the log looks like:

749199232185105 Net:0xc7703000 at cwnd_event (SACK) cwnd:25140 flight:0 pq:0 atpc:72 needpc:235 (tsn:0,sendcnt:191,strcnt:191)

I have used 749199232185105 on the x axis as time and cwnd:25140 on the y axis. I have attached the image file of the graph to this mail.

From the log, I found that cwnd varies very frequently across time. Does anyone have any idea regarding this issue? Please let me know if you have any further questions.

Thanks in advance.

Best regards,
Md Sazzadur Rahman
Graduate Student, School of Computer Science,
University of Oklahoma, Norman, Oklahoma, USA

Steps for getting kernel log
------------------------------------------
1. Add options:
options KTR
options KTR_ENTRIES=65536
options KTR_MASK=KTR_SUBSYS

2. Recompile kernel:
config CUSTOM_KERNEL_9_6
cd ../compile/CUSTOM_KERNEL_9_6
make cleandepend; make depend; make all install

3. Tried to enable trace point by:
sysctl -w "net.inet.sctp.log_level=0x00000004"

4. Run SCTP sender.

5. Pull out data:
ktrdump -q -t -o file_name
prtcwndlog -l filename > cwnd.txt
---------------------------------------------------

On Wed, Apr 16, 2008 at 9:03 AM, Randall Stewart wrote:
> Rahman, Md Sazzadur wrote:
>
>> Hi, I would like to get the values of SCTP congestion control
>> algorithm variables (cwnd, ssthresh, flightsize and pba) from any
>> SCTP based application in runtime for research purpose. Does any API
>> exist in SCTP for that? Do I need to dig the SCTP code in kernel to
>> get the values?
>>
>
> There is a socket option to get the cwnd.
>
> However, I think what you really want is some of the researchish
> tracing stuff that SCTP provides.
>
> You can actually get a real time trace of the cwnd/flight etc via the
> various logging functions.
>
> You basically must compile this as an option.. have to go look
> at the options..
>
> And then you can either use ktrace (which I don't recommend since
> it turns on too much overhead in the kernel) or you can
> use SCTP_LOCAL_TRACE_BUF
>
> This will put it into a piece of memory only for SCTP and
> not turn on all the other ktrace points.
>
> After you enable the logging in your compile you must turn
> on the logging level..
>
> SCTP_CWND_LOGGING_ENABLE
>
> would be my recommendation.
>
> It gives you a real time up/down growth of the cwnd/flight/rwnd
>
> I think I wrote a "how to" somewhere.. let me go look..
>
> R
>
>
>
>> I will appreciate any help in this regard.
>>
>> Best Regards, Md Sazzadur Rahman Graduate Student, School of Computer
>> Science, University of Oklahoma, Norman, Oklahoma, USA
>>
>>
>
> --
> Randall Stewart
> NSSTG - Cisco Systems Inc.
> 803-345-0369 803-317-4952 (cell)
>

From fjwcash at gmail.com Wed Jul 2 19:48:30 2008
From: fjwcash at gmail.com (Freddie Cash)
Date: Wed Jul 2 19:48:33 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To:
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <4869880D.8040901@ibctech.ca>
Message-ID:

On Mon, Jun 30, 2008 at 6:39 PM, Ingo Flaschberger wrote:
>> I'm curious now... how do you change individual device polling via sysctl?
>
> not via sysctl, via ifconfig:
>
> # enable interface polling
> /sbin/ifconfig em0 polling
> /sbin/ifconfig em1 polling
> /sbin/ifconfig em2 polling
> /sbin/ifconfig em3 polling
>
> (and via /etc/rc.local also across reboots)

No, you put it into the ifconfig_X lines in /etc/rc.conf as the last option. Or -polling to disable it.

ifconfig_em0="inet 1.2.3.4/24 polling"
ifconfig_em2="inet 1.2.3.5/24 -polling"

--
Freddie Cash
fjwcash@gmail.com

From stef-list at memberwebs.com Thu Jul 3 01:08:33 2008
From: stef-list at memberwebs.com (Stef)
Date: Thu Jul 3 01:08:41 2008
Subject: connect(): Operation not permitted
References: <678A03F5-5E8A-4CF6-90DF-AA9A4F30FBE1@stromnet.se> <1211037564.6326.27.camel@porksoda> <679DB462-75D6-45CC-949C-1BE8E12C22CD@stromnet.se> <482FD877.6050707@infracaninophile.co.uk>
Message-ID: <20080703003955.859BCF180C0@mx.npubs.com>

Kian Mohageri wrote:
> On Sun, May 18, 2008 at 3:33 AM, Johan Ström wrote:
>> On May 18, 2008, at 9:19 AM, Matthew Seaman wrote:
>>
>>> Johan Ström wrote:
>>>
>>>> drop all traffic)?
>>>> A check with pfctl -vsr reveals that the actual rule
>>>> inserted is "pass on lo0 inet from 123.123.123.123 to 123.123.123.123 flags
>>>> S/SA keep state". Where did that "keep state" come from?
>>> 'flags S/SA keep state' is the default now for tcp filter rules -- that
>>> was new in 7.0 reflecting the upstream changes made between the 4.0 and
>>> 4.1 releases of OpenBSD. If you want a stateless rule, append 'no state'.
>>>
>>> http://www.openbsd.org/faq/pf/filter.html#state
>> Thanks! I was actually looking around in the pf.conf manpage but failed to
>> find it yesterday, but looking closer today I now saw it.
>> Applied the no state (and quick) to the rule, and now no state is created.
>> And the problem I had in the first place seems to have been resolved too
>> now, even though it didn't look like a state problem.. (started to deny new
>> connections much earlier than the states was full, although maybe I wasn't
>> looking for updates fast enough or something).
>>
>
> I'd be willing to bet it's because you're reusing the source port on a
> new connection before the old state expires.
>
> You'll know if you check the state-mismatch counter.
>
> Anyway, glad you found a resolution.

I've been experiencing this "Operation not permitted" too. I've been trying to track down the problem for many months, but due to the complexity of my firewalls (scores of jails each with scores of rules), I wasn't brave enough to ask for help :) As a workaround we started creating rules without state whenever we ran into the problem.

Thanks for the pointer about state-mismatch. The state-mismatch counter is in fact high in my case (see below). How would I go about getting the pf state timeout and the reuse of ports for outbound connections to match? Or is this an intractable problem that just needs to be worked around?
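Kian's explanation can be turned into a rough check of when port reuse and state lifetime collide. A minimal sketch under loud assumptions: the stack hands out source ports sequentially from a range of `nports` ports (a stack that picks ports randomly can repeat one sooner, so this is the best case), only new connections to the same destination address and port count, because only those can recreate an identical tuple, and `state_linger_s` stands for however long pf keeps a closed connection's state (check `pfctl -s timeouts` on the box). The function names and the numbers below are hypothetical.

```c
/* Seconds before a sequential allocator cycling through nports ports
 * hands out the same source port again, at conns_per_sec new
 * connections per second to one destination address and port.
 * Returns a negative value when no new connections are made. */
double port_reuse_interval(unsigned nports, double conns_per_sec)
{
    if (conns_per_sec <= 0.0)
        return -1.0;
    return (double)nports / conns_per_sec;
}

/* Nonzero if a reused port can land on a state pf still keeps, given
 * how long (in seconds) a closed connection's state lingers. */
int may_hit_stale_state(unsigned nports, double conns_per_sec,
                        double state_linger_s)
{
    double interval = port_reuse_interval(nports, conns_per_sec);

    return interval >= 0.0 && interval < state_linger_s;
}
```

For example, 16384 ports at 200 new connections per second to one destination come around again after about 82 seconds, which is shorter than a 90-second linger time; at 100 connections per second the interval doubles to about 164 seconds and this particular collision goes away. Shortening the relevant pf timeouts or widening the port range moves the two numbers apart.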
Cheers,
Stef Walter

Status: Enabled for 13 days 23:55:25    Debug: Urgent

Hostid: 0x38ae6776

State Table                Total        Rate
  current entries             65
  searches             819507771     677.7/s
  inserts                1136670       0.9/s
  removals               1136605       0.9/s
Counters
  match                787482855     651.2/s
  bad-offset                   0       0.0/s
  fragment                     0       0.0/s
  short                        0       0.0/s
  normalize                    0       0.0/s
  memory                       0       0.0/s
  bad-timestamp                0       0.0/s
  congestion                   0       0.0/s
  ip-option                    0       0.0/s
  proto-cksum                  0       0.0/s
  state-mismatch             748       0.0/s
  state-insert                 0       0.0/s
  state-limit                  0       0.0/s
  src-limit                    0       0.0/s
  synproxy                     0       0.0/s

From peterjeremy at optushome.com.au Thu Jul 3 02:58:26 2008
From: peterjeremy at optushome.com.au (Peter Jeremy)
Date: Thu Jul 3 02:58:29 2008
Subject: arplookup x.x.x.x failed: host is not on local network
Message-ID: <20080703025822.GA24765@server.vk2pj.dyndns.org>

I'm occasionally seeing pairs of messages like the following on my NAT host:

arplookup 192.168.181.114 failed: host is not on local network
arpresolve: can't allocate route for 192.168.181.114

In my particular configuration, there are dual subnets between the NAT and target host. My initial assumption was that the request was arriving on the other subnet and I added if_xname to the arplookup printf() - but that shows that the interface matches the IP address. I've looked back through the mailing lists but the previous reports of this problem don't match my scenario. I've seen this with FreeBSD 5.3, 6.2 and 7.0. The (in)frequency of the problem makes me wonder if it's actually a resource exhaustion problem. Has anyone got any suggestions?

--
Peter Jeremy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20080703/b33fd8c2/attachment.pgp

From paul at gtcomm.net Thu Jul 3 06:45:59 2008
From: paul at gtcomm.net (Paul)
Date: Thu Jul 3 06:46:06 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To:
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net>
Message-ID: <486C7611.9030905@gtcomm.net>

Preliminary 32 bit results...

When I started out it looked like 32 bit was worse than 64 bit, but it's just that the timers are different. For instance, 4000 hz in 64 bit gives better results than 4000hz in 32 bit. Low HZ gives better results with polling on in 32 bit.

Bottom line, so far I'm not able to get any better performance out of 32 bit at all. In fact I think it might be even a tad slower. So far I haven't seen bursts as high as I did on 64 bit, but I'm still testing.

Tomorrow comes the Opteron 2222, which is 1ghz faster than this one, and I can see if it scales directly with cpu speed or what happens.

I did another SMP test with an interesting result. I took one of the cpus out of the machine, so it was just left with a single 2212 (dual core) and it performed better. Less contention I suppose?
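Because HZ and the polling knobs keep interacting in these runs, one useful back-of-the-envelope bound is the one given in the polling(4) manual page: unless spare CPU cycles are available for polling in the idle loop, each interface can receive at most HZ * kern.polling.burst_max packets per second. The sketch below compares that bound with gigabit Ethernet line rate; the function names are illustrative, and the 20 bytes per frame of preamble, start delimiter and inter-frame gap are standard Ethernet wire overhead.

```c
/* polling(4): without idle-loop polling, an interface can receive at most
 * HZ * kern.polling.burst_max packets per second. */
unsigned long polling_ceiling_pps(unsigned hz, unsigned burst_max)
{
    return (unsigned long)hz * burst_max;
}

/* Ethernet line rate in frames per second for a given frame size
 * (header + payload + FCS). Each frame also costs 8 bytes of
 * preamble/SFD and a 12-byte inter-frame gap on the wire. */
double ethernet_line_rate_pps(double link_bps, unsigned frame_bytes)
{
    return link_bps / ((frame_bytes + 20u) * 8.0);
}

/* Nonzero if the polling ceiling is below what the link can deliver. */
int polling_is_bottleneck(unsigned hz, unsigned burst_max,
                          double link_bps, unsigned frame_bytes)
{
    return (double)polling_ceiling_pps(hz, burst_max) <
           ethernet_line_rate_pps(link_bps, frame_bytes);
}
```

With HZ=1000 and burst_max at the stock cap of 1000 (MAX_POLL_BURST_MAX), the ceiling is 1,000,000 pps, below the roughly 1.49 Mpps a gigabit link carries with 64-byte frames, which is one reason raising HZ or the cap matters. The HZ=2000, burst_max=1440 dump posted earlier gives 2.88 Mpps, but kern.polling.burst, which the kernel adjusts dynamically, was 377 at that moment, i.e. about 754 kpps at HZ=2000; that is in the same range as the ~720 kpps observed, which may be a coincidence but suggests watching kern.polling.burst as well as burst_max.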
Some results:

kern.hz=4000
hw.em.rxd=512
hw.em.txd=512
polling on, idle polling on (the only way I can get reliable netstat output)

         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  681961 117612  42281586       1    0    226     0
  655095  83418  40615892       2    0    220     0
  683881  93559  42400626       1    0    178     0
  683637  90452  42385498       1    0    178     0
  683345  87471  42367394       1    0    178     0
  682737  81483  42329696       2    0    220     0
  683154  95413  42355552       1    0    178     0
  684556 111013  42442476       1    0    178     0
  684365 110960  42430634       1    0    178     0
  679089 116440  42103518       3    0    534     0
  684328 122713  42428340       1    0    178     0
  684852 121387  42460828       1    0    178     0
  685358 113256  42492200       1    0    178     0
  685060 123110  42473724       1    0    178     0
  684463 118335  42436710       1    0    178     0
  677182 127788  41985300       2    0    356     0
  685920 126144  42527044       1    0    178     0
  684946 107034  42466656       1    0    178     0

(reboot)

kern.hz=1000

         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  679611  97394  42136046       5    0    762     0
  663939 104714  41164254       5    0   1322     0
  685538  91102  42503412       4    0    536     0
  676704  94629  41955668       2    0    404     0
  685323 115060  42490030       1    0    178     0
  675954 105506  41909164       2    0    356     0
  655321  92118  40629906       1    0    178     0
  686826  85674  42583228       2    0    356     0
  686378  89983  42555440       1    0    178     0
  685539  80180  42503422       1    0    178     0
  686704  88626  42575652       1    0    178     0
  686567  88596  42567158       1    0    178     0
  687031  82640  42595936       3    0    398     0

sysctl -w kern.polling.each_burst=50
kern.polling.each_burst: 256 -> 50
[root@ircrouter ~]# netstat -w1 -I em0
         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  693036  39992  42968315       3    0    400     0
  695538  58189  43123360       1    0    178     0
  692670  62765  42945544       1    0    178     0
  693219  60755  42979580       2    0    220     0
  692637  64761  42943498       1
sysctl -w kern.polling.each_burst=33
kern.polling.each_burst: 50 -> 33
[root@ircrouter ~]# netstat -w1 -I em0
         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  690530  63359  42812868       1    0    226     0
  689748  57670  42764380       1    0    178     0
  690489  57874  42810322       1    0    178     0
  689655  60606  42758614       1    0    178     0
^C
[root@ircrouter ~]# sysctl -w kern.polling.each_burst=3
kern.polling.each_burst: 33 -> 3
[root@ircrouter ~]# netstat -w1 -I em0
         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  612234 110896  37958512       1    0    226     0
  614391 112506  38092246       1    0    178     0
^C
[root@ircrouter ~]# sysctl -w kern.polling.each_burst=800
kern.polling.each_burst: 3 -> 800
[root@ircrouter ~]# netstat -w1 -I em0
         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  668057  76496  41419538       1    0    226     0
  667689  88674  41396720       2    0    220     0
  670526 106654  41572616       1    0    178     0
  667326  97832  41374216       1    0    178     0
^C
[root@ircrouter ~]# sysctl -w kern.polling.each_burst=66
kern.polling.each_burst: 800 -> 66
[root@ircrouter ~]# netstat -w1 -I em0
         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  690164  89290  42790172       1    0    226     0
  688886  74360  42710936       1    0    178     0
  674079  77027  41792902       1    0    178     0

kern.hz=2000

         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  699116 238016  43345196       1    0    178     0
  698263 225244  43292310       1    0    290     0
  697246 222395  43229256       1    0    178     0
  696749 207766  43198442       1    0    178     0
  697304 217384  43232852       1    0    178     0
  696401 209901  43176866       1    0    178     0
  696508 207757  43183500       1    0    178     0
^C

hz=2000 with 1024/1024 descriptors

         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  670315 235780  41559534       1    0    226     0
  683218 225838  42359520       1    0    178     0
  682998 242551  42345880       1    0    178     0
  681777 239649  42270178       1    0    178     0

hz=1000 with 256/256 descriptors

netstat -w1 -I em0
         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  740584 160355  45916212       2    0      0     0
  746027 165198  46253678       1    0    178     0
  746068 165921  46256220       1    0    178     0
  746505 167527  46283314       1    0    178     0
  743902 175019  46121928       1    0    178     0
  746130 179795  46260064       1    0    178     0
  744457 166448  46156338       1    0    178     0
  746169 176137  46262482       1    0    178     0

hz=667 with 256/256

         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  742614  91687  46042072       1    0    226     0
  739746  85695  45864256       1    0    178     0
  733723  85162  45490840       3    0    398     0
  737561 102207  45728786       1    0    178     0
  739618 127597  45856320       1    0    178     0
^C

Hrm, finally the same pps as 64-bit.....
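To compare runs like these, it helps to reduce each block of `netstat -w1` samples to an average input rate and a drop fraction. The following is a minimal Python sketch, not a tool from the thread. It assumes the seven-column layout shown above. It also assumes that the errs column counts frames that were dropped rather than delivered, so that offered load is packets + errs; that is an interpretation, not something the thread states.

```python
# Sketch: summarize `netstat -w1 -I em0` samples pasted from a terminal.
# Assumption: "errs" counts frames that were dropped and are NOT included
# in "packets", so the offered load is packets + errs.

def parse_rows(text):
    """Return 7-integer tuples for data rows; header and ^C lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7 and all(f.isdigit() for f in fields):
            rows.append(tuple(int(f) for f in fields))
    return rows

def summarize(rows):
    """Return (mean input pps, mean input errs/s, fraction of offered frames dropped)."""
    pkts = sum(r[0] for r in rows)
    errs = sum(r[1] for r in rows)
    n = len(rows)
    return pkts / n, errs / n, errs / (pkts + errs)

# First three rows of the "hz=667 with 256/256" run.
sample = """
         input      (em0)        output
 packets   errs     bytes packets errs  bytes colls
  742614  91687  46042072       1    0    226     0
  739746  85695  45864256       1    0    178     0
  733723  85162  45490840       3    0    398     0
"""

rows = parse_rows(sample)
pps, eps, drop = summarize(rows)
print(f"{pps:.0f} pps in, {eps:.0f} errs/s, {drop:.1%} of offered frames dropped")
```

Under that counting assumption, this sample drops roughly a tenth of what arrives. That puts it in the 10-50% loss range that Bruce later calls uninteresting as a forwarding benchmark.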
Now I wonder what happens if I go back to 64-bit and try hz=1000 with 256/256 descriptors. I don't think I tried that one. Guess another reinstall :>

Installing 64-bit (again), just to be sure.

Paul

Ingo Flaschberger wrote:
> Dear Paul,
>
>> SMP DISABLED on my Opteron 2212 (ULE, Preemption on)
>> Yields ~750kpps in em0 and out em1 (one direction)
>> I am miffed why this yields more pps than
>> a) with all 4 cpus running and b) 4 cpus with lagg load balanced over
>> 3 incoming connections so 3 taskq threads
>
> because less locking, less synchronisation, ....
>
>> I would be willing to set up test equipment (several servers plugged
>> into a switch) with ipkvm and power port access
>> if someone or a group of people want to figure out ways to improve
>> the routing process, ipfw, and lagg.
>>
>> Maximum PPS with one ipfw rule on UP:
>> tops out about 570Kpps.. almost 200kpps lower ? (frown)
>
> can you post the rule here?
>
>> I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's
>> in here and see how that scales, using UP same kernel etc I have now.
>
> really, please try 32bit and 1 cpu.
>
> Kind regards,
> Ingo Flaschberger

From daniel at skytek.it Thu Jul 3 07:05:16 2008
From: daniel at skytek.it (Daniel Ponticello)
Date: Thu Jul 3 07:05:21 2008
Subject: arplookup x.x.x.x failed: host is not on local network
In-Reply-To: <20080703025822.GA24765@server.vk2pj.dyndns.org>
References: <20080703025822.GA24765@server.vk2pj.dyndns.org>
Message-ID: <486C7A2B.1050902@skytek.it>

Hi Peter,

I'm having exactly the same problem, but without a NAT configuration.
It's just a simple host on network 192.168.170.xxx. When it tries to reach a host on 192.168.181.xxx, it gives the same error:

arplookup 192.168.181.253 failed: host is not on local network

The funny thing is that IP connectivity works anyway. The problem occurs only when trying to reach network 192.168.181.xxx, which is definitely not on the local network. The problem started with FreeBSD 5.3, and today, with 7, it is still present.

Daniel

Peter Jeremy wrote:
> I'm occasionally seeing pairs of messages like the following on
> my NAT host:
> arplookup 192.168.181.114 failed: host is not on local network
> arpresolve: can't allocate route for 192.168.181.114
>
> In my particular configuration, there are dual subnets between the NAT
> and target host. My initial assumption was that the request was
> arriving on the other subnet and I added if_xname to the arplookup
> printf() - but that shows that interface matches the IP address.
> I've looked back through the mailing lists but the previous reports
> of this problem don't match my scenario.
>
> I've seen this with FreeBSD 5.3, 6.2 and 7.0.
>
> The (in)frequency of the problem makes me wonder if it's actually a
> resource exhaustion problem.
>
> Has anyone got any suggestions?
-- 
WBR, Kind regards,
Daniel Ponticello, VP of Engineering
Network Coordination Centre of Skytek
---
- For further information about our services:
- Please visit our website at http://www.Skytek.it
---

From brde at optusnet.com.au Thu Jul 3 07:07:32 2008
From: brde at optusnet.com.au (Bruce Evans)
Date: Thu Jul 3 07:07:37 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486BC7F5.5070604@gtcomm.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net>
Message-ID: <20080703160540.W6369@delplex.bde.org>

On Wed, 2 Jul 2008, Paul wrote:

>...
> -----------Reboot with 4096/4096........(my guess is that it will be a lot
> worse, more errors..)
> ........
> Without polling, 4096 is horrible, about 200kpps less ... :/
> Turning on polling..
> polling on, 4096 is bad,
>          input      (em0)        output
>  packets   errs     bytes packets errs  bytes colls
>   622379 307753  38587506       1    0    178     0
>   635689 277303  39412718       1    0    178     0
> ...
> ------Rebooting with 256/256 descriptors..........
> ..........
> No polling:
>   843762  25337  52313248       1    0    178     0
>   763555      0  47340414       1    0    178     0
>   830189      0  51471722       1    0    178     0
>   838724      0  52000892       1    0    178     0
>   813594    939  50442832       1    0    178     0
>   807303    763  50052790       1    0    178     0
>   791024      0  49043492       1    0    178     0
>   768316   1106  47635596       1    0    178     0
> Machine is maxed and is unresponsive..

That's the most interesting one.
Even 1% packet loss would probably destroy performance, so the benchmarks that give 10-50% packet loss are uninteresting.

All indications are that you are running out of CPU and memory (DMA and/or cache fills) throughput. The above apparently hits both limits at the same time, while with more descriptors memory throughput runs out first. 1 CPU is apparently barely enough for 800 kpps (is this all with UP now?), and I think more CPUs could only be slower, as you saw with SMP, especially using multiple em taskqs, since memory traffic would be higher. I wouldn't expect this to be fixed soon (except by throwing better/different hardware at it).

The CPU/DMA balance can probably be investigated by slowing down the CPU/memory system.

You may remember my previous mail about getting higher pps on bge. Again, all indications are that I'm running out of CPU, memory, and bus throughput too, since the bus is only 33 MHz PCI. These interact in a complicated way which I haven't been able to untangle. -current is fairly consistently slower than my ~5.2 by about 10%, apparently due to code bloat (extra CPU and related extra cache misses). OTOH, like you I've seen huge variations for changes that should be null (e.g., disturbing the alignment of the text section without changing anything else). My ~5.2 is very consistent since I rarely change it, while -current changes a lot and shows more variation, but with no sign of getting near the ~5.2 plateau or even its old peaks.

> Polling ON:
>          input      (em0)        output
>  packets   errs     bytes packets errs  bytes colls
>   784138 179079  48616564       1    0    226     0
>   788815 129608  48906530       2    0    356     0
>   755555 142997  46844426       2    0    468     0
>   803670 144459  49827544       1    0    178     0
>   777649 147120  48214242       1    0    178     0
>   779539 146820  48331422       1    0    178     0
>   786201 148215  48744478       2    0    356     0
>   776013 101660  48112810       1    0    178     0
>   774239 145041  48002834       2    0    356     0
>   771774 102969  47850004       1    0    178     0
>
> Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ?
I'm really > mistified by this.. Is this with hz=2000 and 256/256 and no polling in idle? 40% is easy to explain (perhaps incorrectly). Polling can then read at most 256 descriptors every 1/2000 second, giving a max throughput of 512 kpps. Packets < descriptors in general but might be equal here (for small packets). You seem to actually get 784 kpps, which is too high even in descriptors unless, but matches exactly if the errors are counted twice (784 - 179 - 505 ~= 512). CPU is getting short too, but 40% still happens to be left over after giving up at 512 kpps. Most of the errors are probably handled by the hardware at low cost in CPU by dropping packets. There are other types of errors but none except dropped packets is likely. > Every time it maxes out and gets errors, top reports: > CPU: 0.0% user, 0.0% nice, 10.1% system, 45.3% interrupt, 44.6% idle > pretty much the same line every time > > 256/256 blows away 4096 , probably fits the descriptors into the cache lines > on the cpu and 4096 has too many cache misses and causes worse performance. Quite likely. Maybe your systems have memory systems that are weak relative to other resources, so that they this limit sooner than expected. I should look at my "fixes" for bge, one than changes rxd from 256 to 512, and one that increases the ifq tx length from txd = 512 to about 20000. Both of these might thrash caches. The former makes little difference except for polling at < 4000 Hz, but I don't believe in or use polling. The latter works around select() for write descriptors not working on sockets, so that high frequency polling from userland is not needed to determine a good time to retry after ENOBUFs errors. This is probably only important in pps benchmarks. txd = 512 gives good efficiency in my version of bge, but might be too high for good throughput and is mostly wasted in distribution versions of FreeBSD. 
Bruce

From paul at gtcomm.net Thu Jul 3 07:26:21 2008
From: paul at gtcomm.net (Paul)
Date: Thu Jul 3 07:26:26 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <20080703160540.W6369@delplex.bde.org>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org>
Message-ID: <486C7F93.7010308@gtcomm.net>

Bruce Evans wrote:
> On Wed, 2 Jul 2008, Paul wrote:
>
>> ...
>> -----------Reboot with 4096/4096........(my guess is that it will be
>> a lot worse, more errors..)
>> ........
>> Without polling, 4096 is horrible, about 200kpps less ... :/
>> Turning on polling..
>> polling on, 4096 is bad,
>>          input      (em0)        output
>>  packets   errs     bytes packets errs  bytes colls
>>   622379 307753  38587506       1    0    178     0
>>   635689 277303  39412718       1    0    178     0
>> ...
>> ------Rebooting with 256/256 descriptors..........
>> ..........
>> No polling:
>>   843762  25337  52313248       1    0    178     0
>>   763555      0  47340414       1    0    178     0
>>   830189      0  51471722       1    0    178     0
>>   838724      0  52000892       1    0    178     0
>>   813594    939  50442832       1    0    178     0
>>   807303    763  50052790       1    0    178     0
>>   791024      0  49043492       1    0    178     0
>>   768316   1106  47635596       1    0    178     0
>> Machine is maxed and is unresponsive..
>
> That's the most interesting one. Even 1% packet loss would probably
> destroy performance, so the benchmarks that give 10-50% packet loss
> are uninteresting.
>
But you realize that it's outputting all of these packets on em3, and I'm watching them coming out, and they are consistent with the packets received on em0 that netstat shows are 'good' packets.

> All indications are that you are running out of CPU and memory (DMA
> and/or cache fills) throughput. The above apparently hits both limits
> at the same time, while with more descriptors memory throughput runs
> out first. 1 CPU is apparently barely enough for 800 kpps (is this
> all with UP now?), and I think more CPUs could only be slower, as you
> saw with SMP, especially using multiple em taskqs, since memory traffic
> would be higher. I wouldn't expect this to be fixed soon (except by
> throwing better/different hardware at it).
>
> The CPU/DMA balance can probably be investigated by slowing down the CPU/
> memory system.
>
I'm using a server Opteron, which supposedly has the best memory performance of any CPU right now. Plus, Opterons have the biggest L1 cache, but a small L2 cache. Do you think the larger L2 cache on the Xeon (6 MB for 2 cores) would be better? I have an Opteron 2222 coming, which is 1 GHz faster, so we will see what happens :> My NIC is PCIe x4, so there's no bottleneck there.

> You may remember my previous mail about getting higher pps on bge.
> Again, all indications are that I'm running out of CPU, memory, and
> bus throughput too since the bus is only PCI 33MHz. These interact
> in a complicated way which I haven't been able to untangle. -current
> is fairly consistently slower than my ~5.2 by about 10%, apparently
> due to code bloat (extra CPU and related extra cache misses). OTOH,
> like you I've seen huge variations for changes that should be null
> (e.g., disturbing the alignment of the text section without changing
> anything else). My ~5.2 is very consistent since I rarely change it,
> while -current changes a lot and shows more variation, but with no
> sign of getting near the ~5.2 plateau or even its old peaks.
>
>> Polling ON:
>>          input      (em0)        output
>>  packets   errs     bytes packets errs  bytes colls
>>   784138 179079  48616564       1    0    226     0
>>   788815 129608  48906530       2    0    356     0
>>   755555 142997  46844426       2    0    468     0
>>   803670 144459  49827544       1    0    178     0
>>   777649 147120  48214242       1    0    178     0
>>   779539 146820  48331422       1    0    178     0
>>   786201 148215  48744478       2    0    356     0
>>   776013 101660  48112810       1    0    178     0
>>   774239 145041  48002834       2    0    356     0
>>   771774 102969  47850004       1    0    178     0
>>
>> Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ? I'm
>> really mystified by this..
>
> Is this with hz=2000 and 256/256 and no polling in idle? 40% is easy
> to explain (perhaps incorrectly). Polling can then read at most 256
> descriptors every 1/2000 second, giving a max throughput of 512 kpps.
> Packets < descriptors in general but might be equal here (for small
> packets). You seem to actually get 784 kpps, which is too high even
> in descriptors unless, but matches exactly if the errors are counted
> twice (784 - 179 - 505 ~= 512). CPU is getting short too, but 40%
> still happens to be left over after giving up at 512 kpps. Most of
> the errors are probably handled by the hardware at low cost in CPU by
> dropping packets. There are other types of errors but none except
> dropped packets is likely.
>
Read above: it's actually transmitting 770 kpps out of em3, so it can't just be 512 kpps. I suppose multiple packets can fit in one descriptor? I am using VERY small TCP packets..

>> Every time it maxes out and gets errors, top reports:
>> CPU: 0.0% user, 0.0% nice, 10.1% system, 45.3% interrupt, 44.6% idle
>> pretty much the same line every time
>>
>> 256/256 blows away 4096 , probably fits the descriptors into the
>> cache lines on the cpu and 4096 has too many cache misses and causes
>> worse performance.
>
> Quite likely. Maybe your systems have memory systems that are weak
> relative to other resources, so that they hit this limit sooner than expected.
>
> I should look at my "fixes" for bge, one that changes rxd from 256 to
> 512, and one that increases the ifq tx length from txd = 512 to about 20000.
> Both of these might thrash caches. The former makes little difference
> except for polling at < 4000 Hz, but I don't believe in or use polling.
> The latter works around select() for write descriptors not working on
> sockets, so that high-frequency polling from userland is not needed to
> determine a good time to retry after ENOBUFS errors. This is probably
> only important in pps benchmarks. txd = 512 gives good efficiency in
> my version of bge, but might be too high for good throughput and is
> mostly wasted in distribution versions of FreeBSD.
>
I was thinking of trying FreeBSD 4 or 5, but how would that work with this new hardware?

Thanks
Paul

From fox at verio.net Thu Jul 3 07:40:41 2008
From: fox at verio.net (David DeSimone)
Date: Thu Jul 3 07:40:45 2008
Subject: arplookup x.x.x.x failed: host is not on local network
In-Reply-To: <20080703025822.GA24765@server.vk2pj.dyndns.org>
References: <20080703025822.GA24765@server.vk2pj.dyndns.org>
Message-ID: <20080703071629.GA29305@verio.net>

Peter Jeremy wrote:
>
> I'm occasionally seeing pairs of messages like the following on
> my NAT host:
> arplookup 192.168.181.114 failed: host is not on local network
> arpresolve: can't allocate route for 192.168.181.114

We see these too on a 7.0 box (amd64).

My theory is that this is a response to ARP requests. ARP requests are broadcast, so the BSD box hears someone asking for this IP but cannot find it on any local interface, and so complains that it cannot construct a proper reply. I haven't tested it, though.

-- 
David DeSimone == Network Admin == fox@verio.net
"This email message is intended for the use of the person to whom it has been sent, and may contain information that is confidential or legally protected.
If you are not the intended recipient or have received this message in error, you are not authorized to copy, distribute, or otherwise use this message or its attachments. Please notify the sender immediately by return e-mail and permanently delete this message and any attachments. Verio, Inc. makes no warranty that this email is error or virus free. Thank you." --Lawyer Bot 6000

From stefan.lambrev at moneybookers.com Thu Jul 3 07:48:28 2008
From: stefan.lambrev at moneybookers.com (Stefan Lambrev)
Date: Thu Jul 3 07:48:31 2008
Subject: arplookup x.x.x.x failed: host is not on local network
In-Reply-To: <20080703025822.GA24765@server.vk2pj.dyndns.org>
References: <20080703025822.GA24765@server.vk2pj.dyndns.org>
Message-ID: <486C8446.9060302@moneybookers.com>

Hi,

Peter Jeremy wrote:
> I'm occasionally seeing pairs of messages like the following on
> my NAT host:
> arplookup 192.168.181.114 failed: host is not on local network
> arpresolve: can't allocate route for 192.168.181.114
>
Normally this happens on a badly configured LAN. Let's say we have two hosts on the same physical network (the same switch, for example): Host1 is configured as 192.168.1.33/24 and Host2 has 192.168.1.1/30. Now when a broadcast or other packet is sent from Host1, it can reach Host2 without a problem.
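Stefan's asymmetry can be reproduced with a simplified on-link test, in which each host asks whether the destination address falls inside the prefix of its own interface address. The following sketch uses Python's standard `ipaddress` module and his example addresses. A real stack also consults the routing table, so this models only the mask comparison; the helper name `on_link` is hypothetical.

```python
import ipaddress

def on_link(local_iface, dest):
    """True if dest lies inside the prefix configured on local_iface.
    Simplified: ignores the routing table, aliases and point-to-point links."""
    network = ipaddress.ip_interface(local_iface).network
    return ipaddress.ip_address(dest) in network

host1 = "192.168.1.33/24"   # Host1 in Stefan's example
host2 = "192.168.1.1/30"    # Host2, with the too-narrow mask

print(on_link(host1, "192.168.1.1"))    # True: Host1 treats Host2 as local
print(on_link(host2, "192.168.1.33"))   # False: 192.168.1.0/30 covers .0-.3 only
```

Host1 therefore ARPs for Host2 directly, while Host2 considers Host1 off-link. With no gateway covering that address, Host2 has nowhere to send the reply.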
But when Host2 tries to reach Host1 directly, it doesn't know how, and from there you get "can't allocate route ...". I bet 192.168.181.114 has a wrong network mask ;)

> In my particular configuration, there are dual subnets between the NAT
> and target host. My initial assumption was that the request was
> arriving on the other subnet and I added if_xname to the arplookup
> printf() - but that shows that interface matches the IP address.
> I've looked back through the mailing lists but the previous reports
> of this problem don't match my scenario.
>
> I've seen this with FreeBSD 5.3, 6.2 and 7.0.
>
> The (in)frequency of the problem makes me wonder if it's actually a
> resource exhaustion problem.
>
> Has anyone got any suggestions?
>

-- 
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From brde at optusnet.com.au Thu Jul 3 10:42:02 2008
From: brde at optusnet.com.au (Bruce Evans)
Date: Thu Jul 3 10:42:06 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <486C7F93.7010308@gtcomm.net>
References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net>
Message-ID: <20080703195521.O6973@delplex.bde.org>

On Thu, 3 Jul 2008, Paul wrote:

> Bruce Evans wrote:
>>> No polling:
>>>   843762  25337  52313248       1    0    178     0
>>>   763555      0  47340414       1    0    178     0
>>>   830189      0  51471722       1    0    178     0
>>>   838724      0  52000892       1    0    178     0
>>>   813594    939  50442832       1    0    178     0
>>>   807303    763  50052790       1    0    178     0
>>>   791024      0  49043492       1    0    178     0
>>>   768316   1106  47635596       1    0    178     0
>>> Machine is maxed and is unresponsive..
>>
>> That's the most interesting one. Even 1% packet loss would probably
>> destroy performance, so the benchmarks that give 10-50% packet loss
>> are uninteresting.
>>
> But you realize that it's outputting all of these packets on em3 and I'm
> watching them coming out
> and they are consistent with the packets received on em0 that netstat shows
> are 'good' packets.

Well, output is easier. I don't remember seeing the load on a taskq for em3. If there is a memory bottleneck, it might or might not be more related to running only 1 taskq per interrupt, depending on how independent the memory system is for different CPUs. I think Opterons have more independence here than most x86's.

> I'm using a server opteron which supposedly has the best memory performance
> out of any CPU right now.
> Plus opterons have the biggest l1 cache, but small l2 cache. Do you think
> larger l2 cache on the Xeon (6mb for 2 core) would be better?
> I have a 2222 opteron coming which is 1ghz faster so we will see what happens

I suspect lower-latency memory would help more. Big memory systems have inherently higher latency. According to lmbench2, my little old A64 workstation and laptop have main memory latencies 3 times smaller than freebsd.org's new Core2 servers (42 nsec for the overclocked DDR PC3200 one and 55 for the DDR2 PC5400 (?) one, vs 145-155 nsec). If there are a lot of cache misses, then the extra 100 nsec can be important. Profiling of sendto() using hwpmc or perfmon shows a significant number of cache misses per packet (2 or 10?).

>>> Polling ON:
>>>          input      (em0)        output
>>>  packets   errs     bytes packets errs  bytes colls
>>>   784138 179079  48616564       1    0    226     0
>>>   788815 129608  48906530       2    0    356     0
>>> Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ? I'm really
>>> mystified by this..
>>
>> Is this with hz=2000 and 256/256 and no polling in idle?
>> 40% is easy to explain (perhaps incorrectly). Polling can then read at most 256
>> descriptors every 1/2000 second, giving a max throughput of 512 kpps.
>> Packets < descriptors in general but might be equal here (for small
>> packets). You seem to actually get 784 kpps, which is too high even
>> in descriptors unless, but matches exactly if the errors are counted
>> twice (784 - 179 - 505 ~= 512). CPU is getting short too, but 40%
>> still happens to be left over after giving up at 512 kpps. Most of
>> the errors are probably handled by the hardware at low cost in CPU by
>> dropping packets. There are other types of errors but none except
>> dropped packets is likely.
>>
> Read above, it's actually transmitting 770kpps out of em3 so it can't just be
> 512kpps.

Transmitting is easier, but with polling it's even harder to send faster than hz * queue_length than to receive. This is without polling in idle.

> I was thinking of trying 4 or 5.. but how would that work with this new
> hardware?

Poorly, except possibly with polling in FreeBSD-4. FreeBSD-4 generally has lower overheads and latency, but is missing important improvements (mainly TCP optimizations in upper layers, better DMA and/or mbuf handling, and support for newer NICs). FreeBSD-5 is also missing the overhead+latency advantage.

Here are some benchmarks. (ttcp mainly tests sendto().) 4.10 em needed a 2-line change to support a not-so-new PCI em NIC.

Summary:
- my bge NIC can handle about 600 kpps on my faster machine, but only achieves 300 kpps in 4.10 unpatched.
- my em NIC can handle about 400 kpps on my slower machine, except that in later versions it can receive at about 600 kpps.
- only 6.x and later can achieve near wire-speed throughput for 1500-MTU packets (81 kpps vs 76 kpps). This depends on better DMA or mbuf handling...
  I now remember the details -- it is mainly better mbuf handling: old versions split the 1500-MTU packets into 2 mbufs, and this causes 2 descriptors per packet, which causes extra software overheads and even larger overheads for the hardware.

%%%
Results of benchmarks run on 23 Feb 2007:

my~5.2 bge --> ~4.10 em
                           tx                rx
                    kpps load%   ips   kpps load%   ips
ttcp -l5 -u -t       639    98  1660   398*    77    8k
ttcp -l5 -t          6.0   100  3960    6.0     6  5900
ttcp -l1472 -u -t     76    27   395     76    40    8k
ttcp -l1472 -t        51    40   11k     51    26    8k

(*) Same as sender according to netstat -I, but systat -ip shows that almost half aren't delivered to upper layers.

my~5.2 bge --> 4.11 em
                           tx                rx
                    kpps load%   ips   kpps load%   ips
ttcp -l5 -u -t       635    98  1650   399*    74    8k
ttcp -l5 -t          5.8   100  3900    5.8     6  5800
ttcp -l1472 -u -t     76    27   395     76    32    8k
ttcp -l1472 -t        51    40   11k     51    25    8k

(*) Same as sender according to netstat -I, but systat -ip shows that almost half aren't delivered to upper layers.

my~5.2 bge --> my~5.2 em
                           tx                rx
                    kpps load%   ips   kpps load%   ips
ttcp -l5 -u -t       638    98  1660   394*  100-    8k
ttcp -l5 -t          5.8   100  3900    5.8     9  6000
ttcp -l1472 -u -t     76    27   395     76    46    8k
ttcp -l1472 -t        51    40   11k     51    35    8k

(*) Same as sender according to netstat -I, but systat -ip shows that almost half aren't delivered to upper layers. With the em rate limit on ips changed from 8k to 80k, about 95% are delivered up.
my~5.2 bge --> 6.2 em
                           tx                rx
                    kpps load%   ips   kpps load%   ips
ttcp -l5 -u -t       637    98  1660    637  100-   15k
ttcp -l5 -t          5.8   100  3900    5.8     8   12k
ttcp -l1472 -u -t     76    27   395     76    36   16k
ttcp -l1472 -t        51    40   11k     51    37   16k

my~5.2 bge --> ~current em-fastintr
                           tx                rx
                    kpps load%   ips   kpps load%   ips
ttcp -l5 -u -t       641    98  1670    641    99    8k
ttcp -l5 -t          5.9   100  2670    5.9     7    6k
ttcp -l1472 -u -t     76    27   395     76    35    8k
ttcp -l1472 -t        52    43   11k     52    30    8k

~6.2 bge --> ~current em-fastintr
                           tx                rx
                    kpps load%   ips   kpps load%   ips
ttcp -l5 -u -t       309    62  1600    309    64    8k
ttcp -l5 -t          4.9   100  3000    4.9     6    7k
ttcp -l1472 -u -t     76    27   395     76    34    8k
ttcp -l1472 -t        54    28  6800     54    30    8k

~current bge --> ~current em-fastintr
                           tx                rx
                    kpps load%   ips   kpps load%   ips
ttcp -l5 -u -t       602   100  1570    602    99    8k
ttcp -l5 -t          5.3   100  2660    5.3     5  5300
ttcp -l1472 -u -t    81#    19   212    81#    38    8k
ttcp -l1472 -t        53    34   11k     53    30    8k

(#) Wire speed to within 0.5%. This is the only kpps figure in this set of benchmarks that is close to wire speed. Older kernels apparently lose relative to -current because mbufs for MTU-sized packets are not contiguous in older kernels.
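The "wire speed" in footnote (#) can be checked from Ethernet framing overhead. The following Python sketch computes the maximum UDP packet rate for the payload sizes of the -u runs (ttcp -l5 -u and -l1472 -u). It assumes a 1 Gbit/s link, IPv4 without options, no VLAN tag, an 8-byte preamble/SFD, a 12-byte minimum inter-frame gap and the 64-byte minimum frame.

```python
LINK_BPS = 1_000_000_000   # assumed 1 Gbit/s Ethernet
UDP_IPV4_HDR = 8 + 20      # UDP header + IPv4 header without options
ETH_HDR_FCS = 14 + 4       # Ethernet header + frame check sequence
PREAMBLE_IFG = 8 + 12      # preamble/SFD + minimum inter-frame gap
MIN_FRAME = 64             # minimum Ethernet frame, header through FCS

def wire_pps(udp_payload):
    """Maximum UDP packets/s on the wire for a given payload size."""
    frame = max(udp_payload + UDP_IPV4_HDR + ETH_HDR_FCS, MIN_FRAME)
    return LINK_BPS / ((frame + PREAMBLE_IFG) * 8)

for payload in (5, 1472):   # the ttcp -l5 -u and -l1472 -u cases
    print(f"-l{payload}: {wire_pps(payload) / 1000:.1f} kpps at wire speed")
```

For 1472-byte payloads this gives about 81.3 kpps. The 81 kpps measured with ~current bge is therefore within the 0.5% Bruce quotes, while the 76 kpps seen with the older kernels is roughly 6-7% short. For 5-byte payloads the minimum-frame rule dominates and gives about 1.49 Mpps, far above the ~600 kpps any of these machines reach.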
Old results:

~4.10 bge --> my~5.2 em
                         tx                   rx
                   kpps load%   ips     kpps load%   ips
ttcp -l5 -u -t      n/a   n/a   n/a      346    79    8k
ttcp -l5 -t         n/a   n/a   n/a      5.4    10  6800
ttcp -l1472 -u -t   n/a   n/a   n/a       67    40    8k
ttcp -l1472 -t      n/a   n/a   n/a       51    36    8k

~4.10 kernel, =4 bge --> ~current em
                         tx                   rx
                   kpps load%   ips     kpps load%   ips
ttcp -l5 -u -t      n/a   n/a   n/a      347    96   14k
ttcp -l5 -t         n/a   n/a   n/a      5.8    10   14k
ttcp -l1472 -u -t   n/a   n/a   n/a       67    62   14K
ttcp -l1472 -t      n/a   n/a   n/a       52    40   16k

~4.10 kernel, =4+ bge --> ~current em
                         tx                   rx
                   kpps load%   ips     kpps load%   ips
ttcp -l5 -u -t      n/a   n/a   n/a      627   100    9k
ttcp -l5 -t         n/a   n/a   n/a      5.6     9   13k
ttcp -l1472 -u -t   n/a   n/a   n/a       68    63   14k
ttcp -l1472 -t      n/a   n/a   n/a       54    44   16k
%%%

%%%
Results of benchmarks run on 28 Dec 2007:

~5.2 epsplex (em) ttcp:
                       Csw   Trp   Sys   Int   Sof   Sys  Intr  User  Idle
local no sink:        825k     3  206k   229  412k  52.1  45.1   2.8
local with sink:      659k     3  263k   231  131k  66.5  27.3   6.2
tx remote no sink:     35k     3  273k  8237  266k  42.0  52.1   2.3   3.6
tx remote with sink:   26k     3  394k  8224   100  60.0  5.41   3.4  11.2
rx remote no sink:     25k     4    26  8237  373k  20.6  79.4   0.0   0.0
rx remote with sink:   30k     3  203k  8237  398k  36.5  60.7   2.8   0.0

6.3-PR besplex (em) ttcp:
                       Csw   Trp   Sys   Int   Sof   Sys  Intr  User  Idle
local no sink:        417k     1  208k  418k     2  49.5  48.5   2.0
local with sink:      420k     1  276k  145k     2  70.0  23.6   6.4
tx remote no sink:     19k     2  250k  8144     2  58.5  38.7   2.8   0.0
tx remote with sink:   16k     2  361k  8336     2  72.9  24.0   3.1   4.4
rx remote no sink:     429     3    49   888     2   0.3 99.33   0.0   0.4
tx remote with sink:   13k     2  316k  5385     2  31.7  63.8   3.6   0.8

8.0-C epsplex (em-fast) ttcp:
                       Csw   Trp   Sys   Int   Sof   Sys  Intr  User  Idle
local no sink:        442k     3  221k   230  442k  47.2  49.6   2.7
local with sink:      394k     3  262k   228  131k  72.1  22.6   5.3
tx remote no sink:     17k     3  226k  7832   100  94.1   0.2   3.0   0.0
tx remote with sink:   17k     3  360k  7962   100  91.7   0.2   3.7   4.4
rx remote no sink:    saturated -- cannot update systat display
rx remote with sink:   15k     6  358k  8224   100  97.0   0.0   2.5   0.5

~4.10 besplex (bge) ttcp:
                       Csw   Trp   Sys   Int   Sof   Sys  Intr  User  Idle
local no sink:          15     0  425k   228    11  96.3   0.0   3.7
local with sink:        **     0  622k   229    **  94.7   0.3   5.0
tx remote no sink:      29     1  490k  7024    11  47.9  29.8   4.4  17.9
tx remote with sink:    26     1  635k  1883    11  65.7  11.4   5.6  17.3
rx remote no sink:       5     1    68  7025     1   0.0  47.3   0.0  52.7
rx remote with sink:  6679     2  365k  6899    12  19.7  29.2   2.5  48.7

~5.2-C besplex (bge) ttcp:
                       Csw   Trp   Sys   Int   Sof   Sys  Intr  User  Idle
local no sink:          1M     3  271k   229  543k  50.7  46.8   2.5
local with sink:        1M     3  406k   229  203k  67.4  28.2   4.4
tx remote no sink:     49k     3  474k   11k  167k  52.3  42.7   5.0   0.0
tx remote with sink:  6371     3  641k  1900   100  76.0  16.8   6.2   0.9
rx remote no sink:     34k     3    25   11k  270k   0.8  65.4   0.0  33.8
rx remote with sink:   41k     3  365k   10k  370k  31.5  47.1   2.3  19.0

6.3-PR besplex (bge) ttcp (hz = 1000 else stathz broken):
                       Csw   Trp   Sys   Int   Sof   Sys  Intr  User  Idle
local no sink:        540k     0  270k  540k     0  50.5  46.0   3.5
local with sink:      628k     0  417k  210k     0  68.8  27.9   3.3
tx remote no sink:     15k     1  222k  7190     1  28.4  29.3   1.7  40.6
tx remote with sink:  5947     1  315k  2825     1  39.9  14.7   2.6  42.8
rx remote no sink:     13k     1    23  6943     0   0.3  49.5   0.2  50.0
rx remote with sink:   20k     1  371k  6819     0  29.5  30.1   3.9  36.5

8.0-C besplex (bge) ttcp:
                       Csw   Trp   Sys   Int   Sof   Sys  Intr  User  Idle
local no sink:        649k     3  324k   100  649k  53.9  42.9   3.2
local with sink:      649k     3  433k   100  216k  75.2  18.8   6.0
tx remote no sink:     24k     3  432k   10k   100  49.7  41.3   2.4   6.6
tx remote with sink:  3199     3  568k  1580   100  64.3  19.6   4.0  12.2
rx remote no sink:     20k     3    27   10k   100   0.0  46.1   0.0  53.9
rx remote with sink:   31k     3  370k   10k   100  30.7  30.9   4.8  33.5
%%%

Bruce

From if at xip.at Thu Jul 3 10:46:04 2008 From: if at xip.at (Ingo Flaschberger) Date: Thu Jul 3 10:46:10 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486B7C69.1010304@moneybookers.com> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net>
<48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486B7C69.1010304@moneybookers.com> Message-ID: Dear Stefan, >>> So my maximum without polling is close to 800kpps but if I push that it >>> starts locking me from doing things, or >> >> how many kpps do you want to achieve? > Do not know for Paul but, I want to be able to route (and/or bridge to > handle) 600-700mbps syn flood, > which is something like 1500kpps in every direction. Is it unrealistic? Yes, I think so. Look at this project: http://yuba.stanford.edu/NetFPGA/ These cards could do that. The maximum number of routes seems to be limited, but with lpf it should work. A FreeBSD kernel interface is missing. > If the code is optimized to fully utilize MP I do not see a reason why quad > core processor should not be able to do this. > After all single core seems to handle 500kpps, if we utilize four, eight or > even more cores we should be able to route 1500kpps + ? There's a "sun" used by the Quagga developers as a BGP route server: http://quagga.net/route-server.php (but they didn't answer my question about fw performance). > I hope TOE once MFCed to 7-STABLE will help too? I don't think TOE will help.
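Converting a SYN-flood bit rate into packets per second depends on which bytes the bit rate counts. A minimal sketch, assuming option-less IPv4 SYNs (40 bytes of IP + TCP, padded to the 64-byte Ethernet minimum frame) and showing both the "on the wire" convention (preamble and inter-frame gap included) and the "frames only" convention; real flood traffic with TCP options would produce larger frames and fewer packets.

```python
# Rough packet rate of a SYN flood at a given bit rate. Assumes option-less
# IPv4 SYNs (40 bytes of IP + TCP), which are padded to the 64-byte Ethernet
# minimum frame.
MIN_FRAME = 64      # bytes: Ethernet header + padded payload + FCS
WIRE_OVERHEAD = 20  # bytes: 8 preamble/SFD + 12 inter-frame gap


def syn_flood_pps(mbps: float, include_wire_overhead: bool = True) -> float:
    """Packets per second carried by `mbps` of minimum-size frames.

    Whether a quoted bit rate includes preamble and inter-frame gap depends
    on how it was measured, so both conventions are supported.
    """
    bytes_per_packet = MIN_FRAME + (WIRE_OVERHEAD if include_wire_overhead else 0)
    return mbps * 1e6 / (bytes_per_packet * 8)


if __name__ == "__main__":
    for rate in (600, 700, 1000):
        on_wire = syn_flood_pps(rate) / 1e3
        frames_only = syn_flood_pps(rate, include_wire_overhead=False) / 1e3
        print(f"{rate} Mbit/s: {on_wire:.0f} kpps on the wire, "
              f"{frames_only:.0f} kpps counting frames only")
```

By this arithmetic, 700 Mbit/s of minimum-size SYNs is roughly 1.04 to 1.37 Mpps per direction depending on the counting convention, and 1 Gbit/s of them is the familiar 1.488 Mpps line-rate ceiling.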
Kind regards, Ingo Flaschberger From if at xip.at Thu Jul 3 10:52:19 2008 From: if at xip.at (Ingo Flaschberger) Date: Thu Jul 3 10:52:23 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486C7611.9030905@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486C7611.9030905@gtcomm.net> Message-ID: Dear Paul, > Tomorrow comes opteron 2222 so it's 1ghz faster than this one, and I can see > if it scales directly with cpu speed or what happens. Can you send me the output of lspci -v? > I did another SMP test with an interesting results. I took one of the cpus > out of the machine, so it was just left with a single 2212 (dual core) > and it performed better. Less contention I suppose? In SMP, locking is a performance killer.
My next "router" appliance will be: http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429 Kind regards, Ingo Flaschberger From if at xip.at Thu Jul 3 10:57:42 2008 From: if at xip.at (Ingo Flaschberger) Date: Thu Jul 3 10:57:47 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486B7C69.1010304@moneybookers.com> Message-ID: Dear Stefan, >>>> So my maximum without polling is close to 800kpps but if I push that it >>>> starts locking me from doing things, or >>> >>> how many kpps do you want to achieve? >> Do not know for Paul but, I want to be able to route (and/or bridge to >> handle) 600-700mbps syn flood, >> which is something like 1500kpps in every direction. Is it unrealistic? I would also give DragonFly BSD a try, as Mike had the best results with it. Kind regards, Ingo Flaschberger From peterjeremy at optushome.com.au Thu Jul 3 11:52:48 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Thu Jul 3 11:52:51 2008 Subject: arplookup x.x.x.x failed: host is not on local network In-Reply-To: <486C8446.9060302@moneybookers.com> References: <20080703025822.GA24765@server.vk2pj.dyndns.org> <486C8446.9060302@moneybookers.com> Message-ID: <20080703115243.GR29380@server.vk2pj.dyndns.org> OK, my responses to the replies so far. One off-line reply requested a topology and netstat output.
Since the topology may be relevant, below is an extremely simplified approximation (the real network has about 60 subnets and about 70 hosts, each appearing on between two and four subnets).

                            Corp Network
   192.168.10.0/24                |               192.168.12.0/24
+------+-------------+----------| | |----------+-------------+-----+
     .1|           .2|      .254| | |.254    .3|           .4|
   +-------+     +-------+    +-------+    +-------+     +-------+
   |       |     |       |    |       |    |       |     |       |
   | host1 |     | host2 |    |  NAT  |    | host3 |     | host4 |
   |       |     |       |    |       |    |       |     |       |
   +-------+     +-------+    +-------+    +-------+     +-------+
     .1|           .2|      .254|   |.254    .3|           .4|
+------+-------------+----------|   |----------+-------------+-----+
   192.168.11.0/24                                192.168.13.0/24

The errors appear to be randomly spread across hosts and subnets. The problem does not occur consistently and seems to correlate with load (I am getting significant numbers at present, and the NAT host is routing about 90 kpps and 100 MB/s, if netstat can be believed). The problem also shows up on another interior routing host that has visibility across the internal networks, so it isn't related to NAT or directly related to host load (that host is only seeing about 3.5 kpps, but it is also a much slower host). I have managed to capture a tcpdump across the error.
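The kernel message under discussion, "arplookup ... failed: host is not on local network", is about whether an address is directly connected. A minimal sketch of that notion, assuming plain prefix matching against per-interface networks; it is not FreeBSD's actual ARP code, and the interface table is hypothetical (vlan169 carrying 192.168.169.0/24 is inferred from the capture in this message; the vlan168 prefix is invented for illustration).

```python
import ipaddress
from typing import Optional

# Hypothetical interface table for illustration. vlan169 carrying
# 192.168.169.0/24 is inferred from the capture in this thread; the
# vlan168 prefix is an assumption.
INTERFACES = {
    "vlan168": ipaddress.ip_network("192.168.168.0/24"),
    "vlan169": ipaddress.ip_network("192.168.169.0/24"),
}


def on_link(ifname: str, addr: str) -> bool:
    """True if addr lies inside the prefix configured on interface ifname."""
    return ipaddress.ip_address(addr) in INTERFACES[ifname]


def connected_interface(addr: str) -> Optional[str]:
    """Name of the interface whose prefix contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for name, net in INTERFACES.items():
        if ip in net:
            return name
    return None


if __name__ == "__main__":
    src = "192.168.169.26"
    print(src, "on vlan169:", on_link("vlan169", src))  # directly connected
    print(src, "on vlan168:", on_link("vlan168", src))  # arrived on the "wrong" VLAN
    # The netmask decides what counts as local: a /24 keeps .181.x off-link
    # from .170.x, while a (wrong) /16 would make it look local.
    peer = ipaddress.ip_address("192.168.181.114")
    print(peer in ipaddress.ip_network("192.168.170.0/24"),
          peer in ipaddress.ip_network("192.168.0.0/16"))
```

The same address can be local on one interface and "not on local network" on another, which is why a packet seen on the wrong VLAN could plausibly trigger the message even though the address really is on a local subnet.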
syslog reported:

Jul 3 21:28:30 xxxx kernel: arplookup 192.168.169.26 failed: host is not on local network

and the packets for that host during that second are:

21:28:30.320340 00:0b:cd:d6:66:26 > 00:03:ba:ab:6f:ef, ethertype 802.1Q (0x8100), length 64: vlan 169, p 0, ethertype IPv4, IP (tos 0x0, ttl 64, id 29304, offset 0, flags [none], length: 28) 192.168.169.26 > 192.168.169.111: icmp 8: echo request seq 35079
21:28:30.320429 00:d0:b7:20:8f:ee > 00:03:ba:ab:6f:ef, ethertype 802.1Q (0x8100), length 46: vlan 168, p 0, ethertype IPv4, IP (tos 0x0, ttl 63, id 29304, offset 0, flags [none], length: 28) 192.168.169.26 > 192.168.169.111: icmp 8: echo request seq 35079
21:28:30.320445 00:0b:cd:d6:66:26 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 169, p 0, ethertype ARP, arp who-has 192.168.169.250 tell 192.168.169.26
21:28:30.320459 00:0b:cd:d6:66:26 > 00:d0:b7:20:8f:ee, ethertype 802.1Q (0x8100), length 64: vlan 169, p 0, ethertype IPv4, IP (tos 0x0, ttl 64, id 29307, offset 0, flags [none], length: 28) 192.168.169.26 > 192.168.169.250: icmp 8: echo request seq 35079
21:28:30.320493 00:d0:b7:20:8f:ee > 00:0b:cd:d6:66:e4, ethertype 802.1Q (0x8100), length 46: vlan 168, p 0, ethertype IPv4, IP (tos 0x0, ttl 64, id 15305, offset 0, flags [none], length: 28) 192.168.169.250 > 192.168.169.26: icmp 8: echo reply seq 35079
21:28:30.320531 00:d0:b7:20:8f:ee > 00:0b:cd:d6:66:26, ethertype 802.1Q (0x8100), length 46: vlan 169, p 0, ethertype ARP, arp reply 192.168.169.250 is-at 00:d0:b7:20:8f:ee

(This was captured on MAC 00:d0:b7:20:8f:ee.) Possibly I'm seeing packet leakage from the switches, and that is confusing FreeBSD; the first packet above should definitely not be visible.

On 2008-Jul-03 09:05:15 +0200, Daniel Ponticello wrote: >i'm having exactly the same problem, but without NAT configuration.
Just >a simple host on network 192.168.170.xxx >that when tries to reach an host on 192.168.181.xxx: it gives the same error Except that in my case, the addresses _are_ local. On 2008-Jul-03 02:16:30 -0500, David DeSimone wrote: >My theory is that this is a response to ARP requests. ARP requests are >broadcast, so the BSD box hears someone asking for this IP, but cannot >find it on any local interfaces, and so complains that it cannot >construct a proper reply. Except that the address it's complaining about is on a local subnet. Interestingly, in the above case, the host is spuriously seeing a packet and has re-routed it via vlan168 - which is the wrong subnet, though the destination host will still see it there. On 2008-Jul-03 10:48:22 +0300, Stefan Lambrev wrote: >I bet 192.168.181.114 have a wrong network mask ;) You lose. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
From paul at gtcomm.net Thu Jul 3 12:48:52 2008 From: paul at gtcomm.net (Paul) Date: Thu Jul 3 12:48:57 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080703195521.O6973@delplex.bde.org> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde. org> Message-ID: <486CCB29.3080308@gtcomm.net> Bruce Evans wrote: > On Thu, 3 Jul 2008, Paul wrote: > >> Bruce Evans wrote: >>>> No polling: >>>> 843762 25337 52313248 1 0 178 0 >>>> 763555 0 47340414 1 0 178 0 >>>> 830189 0 51471722 1 0 178 0 >>>> 838724 0 52000892 1 0 178 0 >>>> 813594 939 50442832 1 0 178 0 >>>> 807303 763 50052790 1 0 178 0 >>>> 791024 0 49043492 1 0 178 0 >>>> 768316 1106 47635596 1 0 178 0 >>>> Machine is maxed and is unresponsive.. >>> >>> That's the most interesting one. Even 1% packet loss would probably >>> destroy performance, so the benchmarks that give 10-50% packet loss >>> are uninteresting. >>> >> But you realize that it's outputting all of these packets on em3 and >> I'm watching them coming out >> and they are consistent with the packets received on em0 that netstat >> shows are 'good' packets. > > Well, output is easier.
I don't remember seeing the load on a taskq for > em3. If there is a memory bottleneck, it might or might not be more > related > to running only 1 taskq per interrupt, depending on how independent the > memory system is for different CPUs. I think Opterons have more > independence > here than most x86's. > Opterons have an on-CPU memory controller, so that should help a little. :P But I must be getting more than 1 packet per descriptor, because I can set HZ=100 and still get it without polling. Idle polling helps in every polling case I have tested, and seems to help more on 32-bit. >> I'm using a server opteron which supposedly has the best memory >> performance out of any CPU right now. >> Plus opterons have the biggest l1 cache, but small l2 cache. Do you >> think larger l2 cache on the Xeon (6mb for 2 core) would be better? >> I have a 2222 opteron coming which is 1ghz faster so we will see what >> happens > > I suspect lower latency memory would help more. Big memory systems > have inherently higher latency. My little old A64 workstation and > laptop have main memory latencies 3 times smaller than freebsd.org's > new Core2 servers according to lmbench2 (42 nsec for the overclocked > DDR PC3200 one and 55 for the DDR2 PC5400 (?) one, vs 145-155 nsec). > If there are a lot of cache misses, then the extra 100 nsec can be > important. Profiling of sendto() using hwpmc or perfmon shows a > significant number of cache misses per packet (2 or 10?). > The Opterons use 667 MHz registered DDR2; I have a Xeon that is DDR3, but I think its latency is higher than DDR2's. I'll look up those programs you mentioned and see if I can run some tests.
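The polling exchange quoted next (at most 256 descriptors every 1/2000 s, and what it would take to reach 1 Mpps) and Bruce's point about an extra 100 nsec of memory latency per cache miss both reduce to simple arithmetic. A minimal sketch, assuming one packet per descriptor, polling only on clock ticks (no idle polling), and cache misses that are not overlapped; real hardware overlaps some misses, and Paul's measured 784 kpps shows this simple model does not capture everything going on here.

```python
# Back-of-envelope forwarding budgets. Assumptions: one packet per receive
# descriptor, polling only on clock ticks (no idle polling), one CPU.


def polling_ceiling_pps(hz: int, burst: int) -> int:
    """Max packets/s if each of `hz` ticks per second handles at most `burst` descriptors."""
    return hz * burst


def burst_needed(target_pps: int, hz: int) -> int:
    """Smallest per-tick burst that reaches target_pps at the given hz (rounded up)."""
    return -(-target_pps // hz)


def ns_per_packet(pps: float) -> float:
    """CPU time available per packet, in nanoseconds, at a given packet rate."""
    return 1e9 / pps


if __name__ == "__main__":
    print(polling_ceiling_pps(hz=2000, burst=256))  # Bruce's estimate: 512000
    print(burst_needed(1_000_000, hz=2000))         # per-tick burst for 1 Mpps at hz=2000
    budget = ns_per_packet(1_000_000)
    # Illustrative: 10 cache misses per packet, each ~100 ns slower on high-latency memory.
    extra = 10 * 100
    print(f"{budget:.0f} ns per packet; {extra} ns of it could go to extra miss latency")
```

At 1 Mpps a single CPU has about 1000 ns per packet, so if there really are around 10 cache misses per packet, an extra 100 ns each could by itself consume the whole budget, which is why lower-latency memory may matter more than cache size here.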
40% is easy >>> to explain (perhaps incorrectly). Polling can then read at most 256 >>> descriptors every 1/2000 second, giving a max throughput of 512 kpps. >>> Packets < descriptors in general but might be equal here (for small >>> packets). You seem to actually get 784 kpps, which is too high even >>> in descriptors unless, but matches exactly if the errors are counted >>> twice (784 - 179 - 505 ~= 512). CPU is getting short too, but 40% >>> still happens to be left over after giving up at 512 kpps. Most of >>> the errors are probably handled by the hardware at low cost in CPU by >>> dropping packets. There are other types of errors but none except >>> dropped packets is likely. >>> >> Read above, it's actually transmitting 770kpps out of em3 so it can't >> just be 512kpps. > > Transmitting is easier, but with polling its even harder to send > faster than > hz * queue_length than to receive. This is without polling in idle. > What i'm saying though, it that it's not giving up at 512kpps because 784kpps is coming in em0 and going out em3 so obviously it's reading more than 256 every 1/2000th of a second (packets). What would be the best settings (theoretical) for 1mpps processing? I actually don't have a problem 'receiving' more than 800kpps with much lower CPU usage if it's going to blackhole . so obviously it can receive a lot more, maybe even line rate pps but i can't generate that much. >> I was thinking of trying 4 or 5.. but how would that work with this >> new hardware? > > Poorly, except possibly with polling in FreeBSD-4. FreeBSD-4 generally > has lower overheads and latency, but is missing important improvements > (mainly tcp optimizations in upper layers, better DMA and/or mbuf > handling, and support for newer NICs). FreeBSD-5 is also missing the > overhead+latency advantage. > > Here are some benchmarks. (ttcp mainly tests sendto(). 4.10 em needed a > 2-line change to support a not-so-new PCI em NIC. 
Summary: > - my bge NIC can handle about 600 kpps on my faster machine, but only > achieves 300 in 4.10 unpatched. > - my em NIC can handle about 400 kpps on my slower machine, except in > later versions it can receive at about 600 kpps. > - only 6.x and later can achieve near wire throughput for 1500-MTU > packets (81 kpps vs 76 kpps). This depends on better DMA or mbuf > handling... I now remember the details -- it is mainly better mbuf > handling: old versions split the 1500-MTU packets into 2 mbufs and > this causes 2 descriptors per packet, which causes extra software > overheads and even larger overheads for the hardware. > > %%% > Results of benchmarks run on 23 Feb 2007: > > my~5.2 bge --> ~4.10 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 639 98 1660 398* 77 8k > ttcp -l5 -t 6.0 100 3960 6.0 6 5900 > ttcp -l1472 -u -t 76 27 395 76 40 8k > ttcp -l1472 -t 51 40 11k 51 26 8k > > (*) Same as sender according to netstat -I, but systat -ip shows that > almost half aren't delivered to upper layers. > > my~5.2 bge --> 4.11 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 635 98 1650 399* 74 8k > ttcp -l5 -t 5.8 100 3900 5.8 6 5800 > ttcp -l1472 -u -t 76 27 395 76 32 8k > ttcp -l1472 -t 51 40 11k 51 25 8k > > (*) Same as sender according to netstat -I, but systat -ip shows that > almost half aren't delivered to upper layers. > > my~5.2 bge --> my~5.2 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 638 98 1660 394* 100- 8k > ttcp -l5 -t 5.8 100 3900 5.8 9 6000 > ttcp -l1472 -u -t 76 27 395 76 46 8k > ttcp -l1472 -t 51 40 11k 51 35 8k > > (*) Same as sender according to netstat -I, but systat -ip shows that > almost half aren't delivered to upper layers. With the em rate > limit on ips changed from 8k to 80k, about 95% are delivered up. 
> > my~5.2 bge --> 6.2 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 637 98 1660 637 100- 15k > ttcp -l5 -t 5.8 100 3900 5.8 8 12k > ttcp -l1472 -u -t 76 27 395 76 36 16k > ttcp -l1472 -t 51 40 11k 51 37 16k > > my~5.2 bge --> ~current em-fastintr > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 641 98 1670 641 99 8k > ttcp -l5 -t 5.9 100 2670 5.9 7 6k > ttcp -l1472 -u -t 76 27 395 76 35 8k > ttcp -l1472 -t 52 43 11k 52 30 8k > > ~6.2 bge --> ~current em-fastintr > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 309 62 1600 309 64 8k > ttcp -l5 -t 4.9 100 3000 4.9 6 7k > ttcp -l1472 -u -t 76 27 395 76 34 8k > ttcp -l1472 -t 54 28 6800 54 30 8k > > ~current bge --> ~current em-fastintr > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 602 100 1570 602 99 8k > ttcp -l5 -t 5.3 100 2660 5.3 5 5300 > ttcp -l1472 -u -t 81# 19 212 81# 38 8k > ttcp -l1472 -t 53 34 11k 53 30 8k > > (#) Wire speed to within 0.5%. This is the only kppps in this set of > benchmarks that is close to wire speed. Older kernels apparently > lose relative to -current because mbufs for mtu-sized packets are > not contiguous in older kernels. 
> > Old results: > > ~4.10 bge --> my~5.2 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t n/a n/a n/a 346 79 8k > ttcp -l5 -t n/a n/a n/a 5.4 10 6800 > ttcp -l1472 -u -t n/a n/a n/a 67 40 8k > ttcp -l1472 -t n/a n/a n/a 51 36 8k > > ~4.10 kernel, =4 bge --> ~current em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t n/a n/a n/a 347 96 14k > ttcp -l5 -t n/a n/a n/a 5.8 10 14k > ttcp -l1472 -u -t n/a n/a n/a 67 62 14K > ttcp -l1472 -t n/a n/a n/a 52 40 16k > > ~4.10 kernel, =4+ bge --> ~current em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t n/a n/a n/a 627 100 9k > ttcp -l5 -t n/a n/a n/a 5.6 9 13k > ttcp -l1472 -u -t n/a n/a n/a 68 63 14k > ttcp -l1472 -t n/a n/a n/a 54 44 16k > %%% > > %%% > Results of benchmarks run on 28 Dec 2007: > > ~5.2 epsplex (em) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 825k 3 206k 229 412k 52.1 45.1 2.8 > local with sink: 659k 3 263k 231 131k 66.5 27.3 6.2 > tx remote no sink: 35k 3 273k 8237 266k 42.0 52.1 2.3 3.6 > tx remote with sink: 26k 3 394k 8224 100 60.0 5.41 3.4 11.2 > rx remote no sink: 25k 4 26 8237 373k 20.6 79.4 0.0 0.0 > rx remote with sink: 30k 3 203k 8237 398k 36.5 60.7 2.8 0.0 > > 6.3-PR besplex (em) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 417k 1 208k 418k 2 49.5 48.5 2.0 > local with sink: 420k 1 276k 145k 2 70.0 23.6 6.4 > tx remote no sink: 19k 2 250k 8144 2 58.5 38.7 2.8 0.0 > tx remote with sink: 16k 2 361k 8336 2 72.9 24.0 3.1 4.4 > rx remote no sink: 429 3 49 888 2 0.3 99.33 0.0 0.4 > tx remote with sink: 13k 2 316k 5385 2 31.7 63.8 3.6 0.8 > > 8.0-C epsplex (em-fast) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 442k 3 221k 230 442k 47.2 49.6 2.7 > local with sink: 394k 3 262k 228 131k 72.1 22.6 5.3 > tx remote no sink: 17k 3 226k 7832 100 94.1 0.2 3.0 0.0 > tx remote with sink: 17k 3 360k 7962 100 91.7 0.2 3.7 4.4 > rx remote no sink: saturated -- cannot update systat display > rx remote with sink: 15k 6 358k 8224 
100 97.0 0.0 2.5 0.5 > > ~4.10 besplex (bge) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 15 0 425k 228 11 96.3 0.0 3.7 > local with sink: ** 0 622k 229 ** 94.7 0.3 5.0 > tx remote no sink: 29 1 490k 7024 11 47.9 29.8 4.4 17.9 > tx remote with sink: 26 1 635k 1883 11 65.7 11.4 5.6 17.3 > rx remote no sink: 5 1 68 7025 1 0.0 47.3 0.0 52.7 > rx remote with sink: 6679 2 365k 6899 12 19.7 29.2 2.5 48.7 > > ~5.2-C besplex (bge) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 1M 3 271k 229 543k 50.7 46.8 2.5 > local with sink: 1M 3 406k 229 203k 67.4 28.2 4.4 > tx remote no sink: 49k 3 474k 11k 167k 52.3 42.7 5.0 0.0 > tx remote with sink: 6371 3 641k 1900 100 76.0 16.8 6.2 0.9 > rx remote no sink: 34k 3 25 11k 270k 0.8 65.4 0.0 33.8 > rx remote with sink: 41k 3 365k 10k 370k 31.5 47.1 2.3 19.0 > > 6.3-PR besplex (bge) ttcp (hz = 1000 else stathz broken): > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 540k 0 270k 540k 0 50.5 46.0 3.5 > local with sink: 628k 0 417k 210k 0 68.8 27.9 3.3 > tx remote no sink: 15k 1 222k 7190 1 28.4 29.3 1.7 40.6 > tx remote with sink: 5947 1 315k 2825 1 39.9 14.7 2.6 42.8 > rx remote no sink: 13k 1 23 6943 0 0.3 49.5 0.2 50.0 > rx remote with sink: 20k 1 371k 6819 0 29.5 30.1 3.9 36.5 > > 8.0-C besplex (bge) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 649k 3 324k 100 649k 53.9 42.9 3.2 > local with sink: 649k 3 433k 100 216k 75.2 18.8 6.0 > tx remote no sink: 24k 3 432k 10k 100 49.7 41.3 2.4 6.6 > tx remote with sink: 3199 3 568k 1580 100 64.3 19.6 4.0 12.2 > rx remote no sink: 20k 3 27 10k 100 0.0 46.1 0.0 53.9 > rx remote with sink: 31k 3 370k 10k 100 30.7 30.9 4.8 33.5 > %%% > > Bruce > From paul at gtcomm.net Thu Jul 3 12:50:38 2008 From: paul at gtcomm.net (Paul) Date: Thu Jul 3 12:50:47 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> 
<200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486C7611.9030905@gtcomm.net> Message-ID: <486CCB98.5000805@gtcomm.net> Err.. pciconf -lv ? none0@pci0:0:0:0: class=0x050000 card=0x151115d9 chip=0x036910de rev=0xa2 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 Memory Controller' class = memory subclass = RAM isab0@pci0:0:1:0: class=0x060100 card=0x151115d9 chip=0x036410de rev=0xa3 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 LPC Bridge' class = bridge subclass = PCI-ISA none1@pci0:0:1:1: class=0x0c0500 card=0x151115d9 chip=0x036810de rev=0xa3 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 SMBus' class = serial bus subclass = SMBus ohci0@pci0:0:2:0: class=0x0c0310 card=0x151115d9 chip=0x036c10de rev=0xa1 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 USB Controller' class = serial bus subclass = USB ehci0@pci0:0:2:1: class=0x0c0320 card=0x151115d9 chip=0x036d10de rev=0xa2 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 USB Controller' class = serial bus subclass = USB atapci0@pci0:0:4:0: class=0x01018a card=0x151115d9 chip=0x036e10de rev=0xa1 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 IDE' class = mass storage subclass = ATA atapci1@pci0:0:5:0: class=0x010185 card=0x151115d9 chip=0x037f10de rev=0xa3 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 SATA Controller' class = mass storage subclass = ATA atapci2@pci0:0:5:1: class=0x010185 card=0x151115d9 chip=0x037f10de rev=0xa3 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 SATA Controller' class = mass storage subclass = ATA atapci3@pci0:0:5:2: class=0x010185 card=0x151115d9 chip=0x037f10de rev=0xa3 hdr=0x00 vendor = 
'Nvidia Corp' device = 'MCP55 SATA Controller' class = mass storage subclass = ATA pcib1@pci0:0:6:0: class=0x060401 card=0x151115d9 chip=0x037010de rev=0xa2 hdr=0x01 vendor = 'Nvidia Corp' device = 'MCP55 PCI bridge' class = bridge subclass = PCI-PCI nfe0@pci0:0:8:0: class=0x020000 card=0x151115d9 chip=0x037210de rev=0xa3 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 Ethernet' class = network subclass = ethernet nfe1@pci0:0:9:0: class=0x020000 card=0x151115d9 chip=0x037210de rev=0xa3 hdr=0x00 vendor = 'Nvidia Corp' device = 'MCP55 Ethernet' class = network subclass = ethernet pcib2@pci0:0:10:0: class=0x060400 card=0x000010de chip=0x037610de rev=0xa3 hdr=0x01 vendor = 'Nvidia Corp' device = 'MCP55 PCIe bridge' class = bridge subclass = PCI-PCI pcib5@pci0:0:13:0: class=0x060400 card=0x000010de chip=0x037810de rev=0xa3 hdr=0x01 vendor = 'Nvidia Corp' device = 'MCP55 PCIe bridge' class = bridge subclass = PCI-PCI pcib6@pci0:0:14:0: class=0x060400 card=0x000010de chip=0x037510de rev=0xa3 hdr=0x01 vendor = 'Nvidia Corp' device = 'MCP55 PCIe bridge' class = bridge subclass = PCI-PCI pcib7@pci0:0:15:0: class=0x060400 card=0x000010de chip=0x037710de rev=0xa3 hdr=0x01 vendor = 'Nvidia Corp' device = 'MCP55 PCIe bridge' class = bridge subclass = PCI-PCI hostb0@pci0:0:24:0: class=0x060000 card=0x00000000 chip=0x11001022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices (AMD)' device = '(K8) Athlon 64/Opteron HyperTransport Technology Configuration' class = bridge subclass = HOST-PCI hostb1@pci0:0:24:1: class=0x060000 card=0x00000000 chip=0x11011022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices (AMD)' device = '(K8) Athlon 64/Opteron Address Map' class = bridge subclass = HOST-PCI hostb2@pci0:0:24:2: class=0x060000 card=0x00000000 chip=0x11021022 rev=0x00 hdr=0x00 vendor = 'Advanced Micro Devices (AMD)' device = '(K8) Athlon 64/Opteron DRAM Controller' class = bridge subclass = HOST-PCI hostb3@pci0:0:24:3: class=0x060000 card=0x00000000 chip=0x11031022 rev=0x00 hdr=0x00 
vendor = 'Advanced Micro Devices (AMD)' device = '(K8) Athlon 64/Opteron Miscellaneous Control' class = bridge subclass = HOST-PCI vgapci0@pci0:1:6:0: class=0x030000 card=0x151115d9 chip=0x515e1002 rev=0x02 hdr=0x00 vendor = 'ATI Technologies Inc' device = 'Radeon ES1000 Radeon ES1000' class = display subclass = VGA pcib3@pci0:2:0:0: class=0x060400 card=0x00000000 chip=0x01251033 rev=0x07 hdr=0x01 vendor = 'NEC Electronics Hong Kong' class = bridge subclass = PCI-PCI pcib4@pci0:2:0:1: class=0x060400 card=0x00000000 chip=0x01251033 rev=0x07 hdr=0x01 vendor = 'NEC Electronics Hong Kong' class = bridge subclass = PCI-PCI pcib8@pci0:7:0:0: class=0x060400 card=0x00000000 chip=0x8018111d rev=0x04 hdr=0x01 vendor = 'Integrated Device Technology Inc.' class = bridge subclass = PCI-PCI pcib9@pci0:8:0:0: class=0x060400 card=0x00000000 chip=0x8018111d rev=0x04 hdr=0x01 vendor = 'Integrated Device Technology Inc.' class = bridge subclass = PCI-PCI pcib10@pci0:8:1:0: class=0x060400 card=0x00000000 chip=0x8018111d rev=0x04 hdr=0x01 vendor = 'Integrated Device Technology Inc.' class = bridge subclass = PCI-PCI em0@pci0:9:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00 vendor = 'Intel Corporation' device = '82571EB Gigabit Ethernet Controller' class = network subclass = ethernet em1@pci0:9:0:1: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00 vendor = 'Intel Corporation' device = '82571EB Gigabit Ethernet Controller' class = network subclass = ethernet em2@pci0:10:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00 vendor = 'Intel Corporation' device = '82571EB Gigabit Ethernet Controller' class = network subclass = ethernet em3@pci0:10:0:1: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00 vendor = 'Intel Corporation' device = '82571EB Gigabit Ethernet Controller' class = network subclass = ethernet Copyright (c) 1992-2008 The FreeBSD Project. 
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.0-RELEASE #0: Sun Feb 24 10:35:36 UTC 2008 root@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Dual-Core AMD Opteron(tm) Processor 2212 (2010.32-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x40f12 Stepping = 2 Features=0x178bfbff Features2=0x2001 AMD Features=0xea500800 AMD Features2=0x1f Cores per package: 2 usable memory = 1060921344 (1011 MB) avail memory = 1022271488 (974 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 10:34:18) acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of fec00000, 1000 (3) failed acpi0: reservation of fee00000, 1000 (3) failed acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, 3ff00000 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x2008-0x200b on acpi0 cpu0: on acpi0 powernow0: on cpu0 device_attach: powernow0 attach returned 6 cpu1: on acpi0 powernow1: on cpu1 device_attach: powernow1 attach returned 6 Ingo Flaschberger wrote: > Dear Paul, > >> Tomorrow comes opteron 2222 so it's 1ghz faster than this one, and I >> can see if it scales directly with cpu speed or what happens. > > can you send me a lspci -v? > >> I did another SMP test with an interesting results. I took one of the >> cpus out of the machine, so it was just left with a single 2212 (dual >> core) >> and it performed better. Less contention I suppose? > > in smp locking is a performance killer. 
> > My next "router" appliance will be: > http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429 > > Kind regards, > Ingo Flaschberger > From steve at ibctech.ca Thu Jul 3 12:51:49 2008 From: steve at ibctech.ca (Steve Bertrand) Date: Thu Jul 3 12:51:51 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486C7611.9030905@gtcomm.net> Message-ID: <486CCB6A.6070104@ibctech.ca> Ingo Flaschberger wrote: > My next "router" appliance will be: > http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429 This is exactly the device that I have been testing with (just rebranded). 
Steve From if at xip.at Thu Jul 3 12:55:37 2008 From: if at xip.at (Ingo Flaschberger) Date: Thu Jul 3 12:55:41 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486CCB6A.6070104@ibctech.ca> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486C7611.9030905@gtcomm.net> <486CCB6A.6070104@ibctech.ca> Message-ID: Dear Steve, >> My next "router" appliance will be: >> http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429 > > This is exactly the device that I have been testing with (just rebranded). cool. what performance do you reach?
Kind regards, Ingo Flaschberger From steve at ibctech.ca Thu Jul 3 13:02:19 2008 From: steve at ibctech.ca (Steve Bertrand) Date: Thu Jul 3 13:02:26 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486C7611.9030905@gtcomm.net> <486CCB6A.6070104@ibctech.ca> Message-ID: <486CCDE0.9030901@ibctech.ca> Ingo Flaschberger wrote: > Dear Steve, > >>> My next "router" appliance will be: >>> http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429 >> >> This is exactly the device that I have been testing with (just >> rebranded). > > cool. > what performance do you reach? It's hard to say right now as I've really only been testing it with BGP and mpd. The only pps testing I've done has been with 100Mbps hosts, as the only cards I have for hosts on either side are 're' cards, which I have bad luck with. I should be getting more Intel GigE cards today, so I'll be testing proper pps throughput tomorrow. On a side note, do you know where I can keep track of driver progress for 're'? I have a boatload of these cards and can't get them to operate properly under any version of FreeBSD. If anyone has any specific tests that they want run on this box, including any tweaking, let me know. Also, I run this box by booting it from a USB thumbdrive, so it's trivial for me to simply dd the thumbdrive to another, and have multiple configurations.
Steve From fabien.thomas at netasq.com Thu Jul 3 15:36:17 2008 From: fabien.thomas at netasq.com (Fabien Thomas) Date: Thu Jul 3 15:36:21 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486B7C69.1010304@moneybookers.com> Message-ID: <07AF62F2-E35F-4C2B-8C59-9F4E0249BD2A@netasq.com> For your information, we have measured 730 Kpps using pollng and fastforwarding with 64-byte frames without loss (<0.001% packet loss) on a Spirent SmartBits (Pentium D 2.8 GHz + 8x Gig em). You can find the code and a performance report at: http://www.netasq.com/opensource/pollng-rev1-freebsd.tgz The best performance/CPU cost ratio is to use only 1 core; the other cores are then free to do application processing. Fabien From thompsa at FreeBSD.org Thu Jul 3 16:01:24 2008 From: thompsa at FreeBSD.org (Andrew Thompson) Date: Thu Jul 3 16:01:29 2008 Subject: if_bridge turns off checksum offload of members? In-Reply-To: <486A0281.208@moneybookers.com> References: <4868A34C.6030304@moneybookers.com> <20080630101629.GD79537@cdnetworks.co.kr> <20080701012531.GA92392@citylink.fud.org.nz> <4869FE2E.4070805@moneybookers.com> <486A0281.208@moneybookers.com> Message-ID: <20080703160246.GA45363@citylink.fud.org.nz> On Tue, Jul 1, 2008 at 01:10:09PM +0300, Stefan Lambrev wrote: > Hi, > > Sorry to reply to myself.
> > Stefan Lambrev wrote: >> Hi, >> >> Maybe stupid questions, but: >> >> 1) There are zero matches of IFCAP_TOE in kernel sources .. there is no >> support for TOE in 7.0, but maybe this is work in progress for 8-current? >> 2) In #define BRIDGE_IFCAPS_MASK (IFCAP_TOE|IFCAP_TSO|IFCAP_TXCSUM) - TOE >> should be replaced with RXCSUM or just removed? > Your patch plus this small change (replacing TOE with RXCSUM) seems to work > fine for me - kernel compiles without a problem and checksum offload is > enabled after reboot. I have committed an updated version of this patch, thanks for testing. Andrew From kian.mohageri at gmail.com Thu Jul 3 16:20:41 2008 From: kian.mohageri at gmail.com (Kian Mohageri) Date: Thu Jul 3 16:20:45 2008 Subject: connect(): Operation not permitted In-Reply-To: <20080703003955.859BCF180C0@mx.npubs.com> References: <678A03F5-5E8A-4CF6-90DF-AA9A4F30FBE1@stromnet.se> <1211037564.6326.27.camel@porksoda> <679DB462-75D6-45CC-949C-1BE8E12C22CD@stromnet.se> <482FD877.6050707@infracaninophile.co.uk> <20080703003955.859BCF180C0@mx.npubs.com> Message-ID: On Wed, Jul 2, 2008 at 5:39 PM, Stef wrote: > Kian Mohageri wrote: >> On Sun, May 18, 2008 at 3:33 AM, Johan Ström wrote: >>> On May 18, 2008, at 9:19 AM, Matthew Seaman wrote: >>> >>>> Johan Ström wrote: >>>> >>>>> drop all traffic)? A check with pfctl -vsr reveals that the actual rule >>>>> inserted is "pass on lo0 inet from 123.123.123.123 to 123.123.123.123 flags >>>>> S/SA keep state". Where did that "keep state" come from? >>>> 'flags S/SA keep state' is the default now for tcp filter rules -- that >>>> was new in 7.0 reflecting the upstream changes made between the 4.0 and >>>> 4.1 >>>> releases of OpenBSD. If you want a stateless rule, append 'no state'. >>>> >>>> http://www.openbsd.org/faq/pf/filter.html#state >>> Thanks! I was actually looking around in the pf.conf manpage but failed to >>> find it yesterday, but looking closer today I now saw it.
>>> Applied the no state (and quick) to the rule, and now no state is created. >>> And the problem I had in the first place seems to have been resolved too >>> now, even though it didn't look like a state problem.. (started to deny new >>> connections much earlier than the state table was full, although maybe I wasn't >>> looking for updates fast enough or something). >>> >> >> I'd be willing to bet it's because you're reusing the source port on a >> new connection before the old state expires. >> >> You'll know if you check the state-mismatch counter. >> >> Anyway, glad you found a resolution. > > I've been experiencing this "Operation not permitted" too. I've been > trying to track down the problem for many months, but due to the > complexity of my firewalls (scores of jails each with scores of rules), > I wasn't brave enough to ask for help :) > > As a workaround we started creating rules without state, whenever we > would run into the problem. > > Thanks for the pointer about state-mismatch. The state-mismatch counter > is in fact high in my case (see below). How would I go about > getting the pf state timeout and the reuse of ports for outbound > connections to match? Or is this an intractable problem, that just needs > to be worked around? > Make sure your state-mismatch counter is increasing at the same times you experience the problem (and isn't just high from some unrelated issue). A similar/related problem was addressed in OpenBSD 4.3 (http://www.openbsd.org/plus43.html). * In pf(4), allow state reuse if both sides are in FIN_WAIT_2 and a new SYN arrives. I'm not sure if it's been imported yet. If not, you could try tuning your timeout values (see pf.conf(5)). The specific issue I was experiencing was solved by shortening tcp.closed, IIRC. It's been a while though.
-Kian From steve at ibctech.ca Thu Jul 3 16:40:52 2008 From: steve at ibctech.ca (Steve Bertrand) Date: Thu Jul 3 16:40:58 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486C7611.9030905@gtcomm.net> <486CCB6A.6070104@ibctech.ca> Message-ID: <486D011A.7080406@ibctech.ca> Ingo Flaschberger wrote: > Dear Steve, > >>> My next "router" appliance will be: >>> http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429 >> >> This is exactly the device that I have been testing with (just >> rebranded). > > cool. > what performance do you reach? After some very quick testing with everything default, I am witnessing results that are far below what I would have expected. I have a few questions: - how do I identify if polling on an interface is enabled? I see no difference with ifconfig output - do I need to compile a new kernel to be able to enable/disable polling? - without moving some hardware around, I only have a single box connected to a router, and I've been testing from that box to a different interface within the router. Will the test results be optimal if I ping all the way through the router to a second device connected to it? - how are the results affected when generating and receiving the test packets within the router itself (as opposed to using outside devices)?
Steve From if at xip.at Thu Jul 3 17:17:04 2008 From: if at xip.at (Ingo Flaschberger) Date: Thu Jul 3 17:17:09 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486D011A.7080406@ibctech.ca> References: <4867420D.7090406@gtcomm.net> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486C7611.9030905@gtcomm.net> <486CCB6A.6070104@ibctech.ca> <486D011A.7080406@ibctech.ca> Message-ID: Dear Steve, >>>> My next "router" appliance will be: >>>> http://www.axiomtek.com.tw/products/ViewProduct.asp?view=429 >>> >>> This is exactly the device that I have been testing with (just rebranded). >> >> cool. >> what performance do you reach? > > After some very quick testing with everything default, I am witnessing > results that are far below what I would have expected. I have a few > questions: > > - how do I identify if polling on an interface is enabled? I see no > difference with ifconfig output

em0: flags=8843 mtu 1500
        options=5b <---
        ether 00:90:0b:08:d7:90
        media: Ethernet autoselect (1000baseTX )
        status: active

kern.polling.reg_frac=20
kern.polling.user_frac=20
kern.polling.burst_max=512

man polling

Polling does not help you get more pps, but it prevents lock-ups and preserves some %cpu for other tasks (routing daemons, ...).

> - do I need to compile a new kernel to be able to enable/disable polling?

You need "options DEVICE_POLLING" in your kernel config.

> - without moving some hardware around, I only have a single box connected to > a router, and I've been testing from that box to a different interface within > the router.
Will the test results be optimal if I ping all the way through > the router to a second device connected to it? Use any other packet generator. Linux has one in the kernel, and there are many more (iperf, ...). Ping uses a lot of CPU. > - how are the results affected when generating and receiving the test packets > within the router itself (as opposed to using outside devices)? That's not real "pps" forwarding performance over the network cards. Kind regards, Ingo Flaschberger From bakul at bitblocks.com Thu Jul 3 19:23:31 2008 From: bakul at bitblocks.com (Bakul Shah) Date: Thu Jul 3 19:23:35 2008 Subject: arplookup x.x.x.x failed: host is not on local network In-Reply-To: Your message of "Thu, 03 Jul 2008 21:52:43 +1000." <20080703115243.GR29380@server.vk2pj.dyndns.org> Message-ID: <20080703190513.5CD5D5B4C@mail.bitblocks.com> > Possibly, I'm seeing packet leakage from the switches and that is > confusing FreeBSD - definitely the first packet above should not be > visible. Even if the switch broadcasts on all ports (effectively becoming a hub) that should not cause the symptom you are seeing. If the switch sent arp response to the wrong port, you would've seen this ARP request at least on the sending machine. There is no such packet (for .26) in your tcpdump output. That either means there was no such packet or you've failed to capture it! You said you see the problem with different freebsd versions. Did you boot diff. versions on the same hardware or do you mean different versions are running on diff. hosts? If the latter, specific freebsd versions are not ruled out. You might want to capture many more arp failed messages to see if there is a pattern. Earlier you had wondered if resource exhaustion was to blame. That is ruled out by the arp failed message since the reason indicates the route goes to a gateway.
We don't see any ARP request for .26 so this likely means .26 is not the one doing arp lookup (on receiving a request) & the arplookup failed message is on .111, right? We see packets flowing from .26 to .111 but not the other way around. What does netstat -nr look like on .111? If all the clocks are synchronized, you might want to capture tcpdump on *all* the machines! Since the syslog timestamp has a granularity of 1 sec, you want to look at packets within a second before and a second after. BTW, your picture is nice but it doesn't jibe with anything in the tcpdump output you attached! > Corp Network > 192.168.10.0/24 | 192.168.12.0/24 > +------+-------------+----------| | |----------+-------------+-----+ > .1| .2| .254| | |.254 .3| .4| > +-------+ +-------+ +-------+ +-------+ +-------+ > | | | | | | | | | | > | host1 | | host2 | | NAT | | host3 | | host4 | > | | | | | | | | | | > +-------+ +-------+ +-------+ +-------+ +-------+ > .1| .2| .254| |.254 .3| .4| > +------+-------------+----------| |----------+-------------+-----+ > 192.168.11.0/24 192.168.13.0/24 > > The errors appear to be randomly spread across hosts and subnets. It > does not appear consistently and seems to correlate with load (I am > getting significant numbers at present and the NAT host is routing > about 90Kpps and 100MBps if netstat can be believed). The problem > also shows up on another interior routing host that has visibility > across the internal networks so it isn't related to NAT or directly > related to host load (that host is only seeing about 3.5Kpps - but is > also a much slower host). > > I have managed to capture a tcpdump across the error.
syslog reported:
> Jul 3 21:28:30 xxxx kernel: arplookup 192.168.169.26 failed: host is not on local network
> and the packets for that host during that second are:
> 21:28:30.320340 00:0b:cd:d6:66:26 > 00:03:ba:ab:6f:ef, ethertype 802.1Q (0x8100), length 64: vlan 169, p 0, ethertype IPv4, IP (tos 0x0, ttl 64, id 29304, offset 0, flags [none], length: 28) 192.168.169.26 > 192.168.169.111: icmp 8: echo request seq 35079
> 21:28:30.320429 00:d0:b7:20:8f:ee > 00:03:ba:ab:6f:ef, ethertype 802.1Q (0x8100), length 46: vlan 168, p 0, ethertype IPv4, IP (tos 0x0, ttl 63, id 29304, offset 0, flags [none], length: 28) 192.168.169.26 > 192.168.169.111: icmp 8: echo request seq 35079
> 21:28:30.320445 00:0b:cd:d6:66:26 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 169, p 0, ethertype ARP, arp who-has 192.168.169.250 tell 192.168.169.26
> 21:28:30.320459 00:0b:cd:d6:66:26 > 00:d0:b7:20:8f:ee, ethertype 802.1Q (0x8100), length 64: vlan 169, p 0, ethertype IPv4, IP (tos 0x0, ttl 64, id 29307, offset 0, flags [none], length: 28) 192.168.169.26 > 192.168.169.250: icmp 8: echo request seq 35079
> 21:28:30.320493 00:d0:b7:20:8f:ee > 00:0b:cd:d6:66:e4, ethertype 802.1Q (0x8100), length 46: vlan 168, p 0, ethertype IPv4, IP (tos 0x0, ttl 64, id 15305, offset 0, flags [none], length: 28) 192.168.169.250 > 192.168.169.26: icmp 8: echo reply seq 35079
> 21:28:30.320531 00:d0:b7:20:8f:ee > 00:0b:cd:d6:66:26, ethertype 802.1Q (0x8100), length 46: vlan 169, p 0, ethertype ARP, arp reply 192.168.169.250 is-at 00:d0:b7:20:8f:ee
> (this was captured on MAC 00:d0:b7:20:8f:ee).

From zaphod at fsklaw.com Thu Jul 3 19:40:36 2008 From: zaphod at fsklaw.com (zaphod@fsklaw.com) Date: Thu Jul 3 19:41:02 2008 Subject: Tunneling issues Message-ID: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> I have a real poser, and I can't solve it. Currently I have an IPsec VPN tunneling 14 servers through a central server.
Like this: [diagram: the servers, each tunnelled through the central server] I would like to restructure this so that each server talks to each other directly, rather than passing everything through a single server. However, on every other machine I cannot get a second tunnel to come up. Not a gre or gif tunnel. And yet I have 14 on the central machine. The central machine is FreeBSD 5.3, the rest are 6.1 or greater. I also fear that I won't be able to update the central server, because I fear not being able to get the tunnels up. I have just been trying to tunnel. IPSEC isn't the issue as I'm not binding an ipsec policy to the tunnel. I've been googling for days, and can't find anything on this. (Can't find anyone creating more than one tunnel). Any ideas would be appreciated as I'm totally stumped here. TIA Cheers, Zaphod From paul at gtcomm.net Thu Jul 3 20:23:04 2008 From: paul at gtcomm.net (Paul) Date: Thu Jul 3 20:23:08 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080703195521.O6973@delplex.bde.org> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.
org> Message-ID: <486D35A0.4000302@gtcomm.net>

Opteron 2222 UP mode, no polling

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
   1071020     0   66403248          2     0        404     0
   1049793     0   65087174          2     0        356     0
   1040320     0   64499848          2     0        356     0
   1049712     0   65082152          2     0        356     0
   1039504     0   64449256          2     0        356     0
    933118     0   57853324          2     0        356     0

Still has some CPU left, and I can't generate any more packets.

Polling turned on provided better performance on 32-bit, but it gets strange errors on 64-bit. Even at low pps I get small amounts of errors, and at high pps the same thing. You would think that if it got errors at low pps it would get more errors at high pps, but that isn't the case.

Polling on:

   packets  errs      bytes    packets  errs      bytes colls
    979736   963   60743636          1     0        226     0
    991838   496   61493960          1     0        178     0
    996125   460   61759754          1     0        178     0
    979381   326   60721626          1     0        178     0
   1022249   379   63379442          1     0        178     0
    991468   557   61471020          1     0        178     0

Lowering pps a little:

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    818688   151   50758660          1     0        226     0
    837920   179   51951044          1     0        178     0
    826217   168   51225458          1     0        178     0
    801017   100   49663058          1     0        178     0
    761857   287   47235138          1     0        178     0

What could cause this? If I'm going to use a uniprocessor mode system I NEED polling to work, because I have to have CPU cycles left over for userspace processes and I can't afford to have it lock those out. SMP would be no big deal if it actually worked. I'm going to do an SMP test with this CPU now with polling off/on, and then I'm going to apply the polling patch and try that.

Bruce Evans wrote:
> On Thu, 3 Jul 2008, Paul wrote:
>
>> Bruce Evans wrote:
>>>> No polling:
>>>> 843762 25337 52313248 1 0 178 0
>>>> 763555 0 47340414 1 0 178 0
>>>> 830189 0 51471722 1 0 178 0
>>>> 838724 0 52000892 1 0 178 0
>>>> 813594 939 50442832 1 0 178 0
>>>> 807303 763 50052790 1 0 178 0
>>>> 791024 0 49043492 1 0 178 0
>>>> 768316 1106 47635596 1 0 178 0
>>>> Machine is maxed and is unresponsive..
>>>
>>> That's the most interesting one.
Even 1% packet loss would probably >>> destroy performance, so the benchmarks that give 10-50% packet loss >>> are uninteresting. >>> >> But you realize that it's outputting all of these packets on em3 and >> I'm watching them coming out >> and they are consistent with the packets received on em0 that netstat >> shows are 'good' packets. > > Well, output is easier. I don't remember seeing the load on a taskq for > em3. If there is a memory bottleneck, it might or might not be more > related > to running only 1 taskq per interrupt, depending on how independent the > memory system is for different CPU. I think Opterons have more > independence > here than most x86's. > >> I'm using a server opteron which supposedly has the best memory >> performance out of any CPU right now. >> Plus opterons have the biggest l1 cache, but small l2 cache. Do you >> think larger l2 cache on the Xeon (6mb for 2 core) would be better? >> I have a 2222 opteron coming which is 1ghz faster so we will see what >> happens > > I suspect lower latency memory would help more. Big memory systems > have inherently higher latency. My little old A64 workstation and > laptop have main memory latencies 3 times smaller than freebsd.org's > new Core2 servers according to lmbench2 (42 nsec for the overclocked > DDR PC3200 one and 55 for the DDR2 PC5400 (?) one, vs 145-155 nsec). > If there are a lot of cache misses, then the extra 100 nsec can be > important. Profiling of sendto() using hwpmc or perfmon shows a > significant number of cache misses per packet (2 or 10?). > >>>> Polling ON: >>>> input (em0) output >>>> packets errs bytes packets errs bytes colls >>>> 784138 179079 48616564 1 0 226 0 >>>> 788815 129608 48906530 2 0 356 0 >>>> Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ? I'm >>>> really mystified by this.. >>> >>> Is this with hz=2000 and 256/256 and no polling in idle? 40% is easy >>> to explain (perhaps incorrectly).
Polling can then read at most 256 >>> descriptors every 1/2000 second, giving a max throughput of 512 kpps. >>> Packets < descriptors in general but might be equal here (for small >>> packets). You seem to actually get 784 kpps, which is too high even >>> in descriptors unless, but matches exactly if the errors are counted >>> twice (784 - 179 - 505 ~= 512). CPU is getting short too, but 40% >>> still happens to be left over after giving up at 512 kpps. Most of >>> the errors are probably handled by the hardware at low cost in CPU by >>> dropping packets. There are other types of errors but none except >>> dropped packets is likely. >>> >> Read above, it's actually transmitting 770kpps out of em3 so it can't >> just be 512kpps. > > Transmitting is easier, but with polling its even harder to send > faster than > hz * queue_length than to receive. This is without polling in idle. > >> I was thinking of trying 4 or 5.. but how would that work with this >> new hardware? > > Poorly, except possibly with polling in FreeBSD-4. FreeBSD-4 generally > has lower overheads and latency, but is missing important improvements > (mainly tcp optimizations in upper layers, better DMA and/or mbuf > handling, and support for newer NICs). FreeBSD-5 is also missing the > overhead+latency advantage. > > Here are some benchmarks. (ttcp mainly tests sendto(). 4.10 em needed a > 2-line change to support a not-so-new PCI em NIC. Summary: > - my bge NIC can handle about 600 kpps on my faster machine, but only > achieves 300 in 4.10 unpatched. > - my em NIC can handle about 400 kpps on my slower machine, except in > later versions it can receive at about 600 kpps. > - only 6.x and later can achieve near wire throughput for 1500-MTU > packets (81 kpps vs 76 kpps). This depends on better DMA or mbuf > handling... 
I now remember the details -- it is mainly better mbuf > handling: old versions split the 1500-MTU packets into 2 mbufs and > this causes 2 descriptors per packet, which causes extra software > overheads and even larger overheads for the hardware. > > %%% > Results of benchmarks run on 23 Feb 2007: > > my~5.2 bge --> ~4.10 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 639 98 1660 398* 77 8k > ttcp -l5 -t 6.0 100 3960 6.0 6 5900 > ttcp -l1472 -u -t 76 27 395 76 40 8k > ttcp -l1472 -t 51 40 11k 51 26 8k > > (*) Same as sender according to netstat -I, but systat -ip shows that > almost half aren't delivered to upper layers. > > my~5.2 bge --> 4.11 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 635 98 1650 399* 74 8k > ttcp -l5 -t 5.8 100 3900 5.8 6 5800 > ttcp -l1472 -u -t 76 27 395 76 32 8k > ttcp -l1472 -t 51 40 11k 51 25 8k > > (*) Same as sender according to netstat -I, but systat -ip shows that > almost half aren't delivered to upper layers. > > my~5.2 bge --> my~5.2 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 638 98 1660 394* 100- 8k > ttcp -l5 -t 5.8 100 3900 5.8 9 6000 > ttcp -l1472 -u -t 76 27 395 76 46 8k > ttcp -l1472 -t 51 40 11k 51 35 8k > > (*) Same as sender according to netstat -I, but systat -ip shows that > almost half aren't delivered to upper layers. With the em rate > limit on ips changed from 8k to 80k, about 95% are delivered up. 
> > my~5.2 bge --> 6.2 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 637 98 1660 637 100- 15k > ttcp -l5 -t 5.8 100 3900 5.8 8 12k > ttcp -l1472 -u -t 76 27 395 76 36 16k > ttcp -l1472 -t 51 40 11k 51 37 16k > > my~5.2 bge --> ~current em-fastintr > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 641 98 1670 641 99 8k > ttcp -l5 -t 5.9 100 2670 5.9 7 6k > ttcp -l1472 -u -t 76 27 395 76 35 8k > ttcp -l1472 -t 52 43 11k 52 30 8k > > ~6.2 bge --> ~current em-fastintr > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 309 62 1600 309 64 8k > ttcp -l5 -t 4.9 100 3000 4.9 6 7k > ttcp -l1472 -u -t 76 27 395 76 34 8k > ttcp -l1472 -t 54 28 6800 54 30 8k > > ~current bge --> ~current em-fastintr > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t 602 100 1570 602 99 8k > ttcp -l5 -t 5.3 100 2660 5.3 5 5300 > ttcp -l1472 -u -t 81# 19 212 81# 38 8k > ttcp -l1472 -t 53 34 11k 53 30 8k > > (#) Wire speed to within 0.5%. This is the only kppps in this set of > benchmarks that is close to wire speed. Older kernels apparently > lose relative to -current because mbufs for mtu-sized packets are > not contiguous in older kernels. 
> > Old results: > > ~4.10 bge --> my~5.2 em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t n/a n/a n/a 346 79 8k > ttcp -l5 -t n/a n/a n/a 5.4 10 6800 > ttcp -l1472 -u -t n/a n/a n/a 67 40 8k > ttcp -l1472 -t n/a n/a n/a 51 36 8k > > ~4.10 kernel, =4 bge --> ~current em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t n/a n/a n/a 347 96 14k > ttcp -l5 -t n/a n/a n/a 5.8 10 14k > ttcp -l1472 -u -t n/a n/a n/a 67 62 14K > ttcp -l1472 -t n/a n/a n/a 52 40 16k > > ~4.10 kernel, =4+ bge --> ~current em > tx rx > kpps load% ips kpps load% ips > ttcp -l5 -u -t n/a n/a n/a 627 100 9k > ttcp -l5 -t n/a n/a n/a 5.6 9 13k > ttcp -l1472 -u -t n/a n/a n/a 68 63 14k > ttcp -l1472 -t n/a n/a n/a 54 44 16k > %%% > > %%% > Results of benchmarks run on 28 Dec 2007: > > ~5.2 epsplex (em) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 825k 3 206k 229 412k 52.1 45.1 2.8 > local with sink: 659k 3 263k 231 131k 66.5 27.3 6.2 > tx remote no sink: 35k 3 273k 8237 266k 42.0 52.1 2.3 3.6 > tx remote with sink: 26k 3 394k 8224 100 60.0 5.41 3.4 11.2 > rx remote no sink: 25k 4 26 8237 373k 20.6 79.4 0.0 0.0 > rx remote with sink: 30k 3 203k 8237 398k 36.5 60.7 2.8 0.0 > > 6.3-PR besplex (em) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 417k 1 208k 418k 2 49.5 48.5 2.0 > local with sink: 420k 1 276k 145k 2 70.0 23.6 6.4 > tx remote no sink: 19k 2 250k 8144 2 58.5 38.7 2.8 0.0 > tx remote with sink: 16k 2 361k 8336 2 72.9 24.0 3.1 4.4 > rx remote no sink: 429 3 49 888 2 0.3 99.33 0.0 0.4 > tx remote with sink: 13k 2 316k 5385 2 31.7 63.8 3.6 0.8 > > 8.0-C epsplex (em-fast) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 442k 3 221k 230 442k 47.2 49.6 2.7 > local with sink: 394k 3 262k 228 131k 72.1 22.6 5.3 > tx remote no sink: 17k 3 226k 7832 100 94.1 0.2 3.0 0.0 > tx remote with sink: 17k 3 360k 7962 100 91.7 0.2 3.7 4.4 > rx remote no sink: saturated -- cannot update systat display > rx remote with sink: 15k 6 358k 8224 
100 97.0 0.0 2.5 0.5 > > ~4.10 besplex (bge) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 15 0 425k 228 11 96.3 0.0 3.7 > local with sink: ** 0 622k 229 ** 94.7 0.3 5.0 > tx remote no sink: 29 1 490k 7024 11 47.9 29.8 4.4 17.9 > tx remote with sink: 26 1 635k 1883 11 65.7 11.4 5.6 17.3 > rx remote no sink: 5 1 68 7025 1 0.0 47.3 0.0 52.7 > rx remote with sink: 6679 2 365k 6899 12 19.7 29.2 2.5 48.7 > > ~5.2-C besplex (bge) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 1M 3 271k 229 543k 50.7 46.8 2.5 > local with sink: 1M 3 406k 229 203k 67.4 28.2 4.4 > tx remote no sink: 49k 3 474k 11k 167k 52.3 42.7 5.0 0.0 > tx remote with sink: 6371 3 641k 1900 100 76.0 16.8 6.2 0.9 > rx remote no sink: 34k 3 25 11k 270k 0.8 65.4 0.0 33.8 > rx remote with sink: 41k 3 365k 10k 370k 31.5 47.1 2.3 19.0 > > 6.3-PR besplex (bge) ttcp (hz = 1000 else stathz broken): > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 540k 0 270k 540k 0 50.5 46.0 3.5 > local with sink: 628k 0 417k 210k 0 68.8 27.9 3.3 > tx remote no sink: 15k 1 222k 7190 1 28.4 29.3 1.7 40.6 > tx remote with sink: 5947 1 315k 2825 1 39.9 14.7 2.6 42.8 > rx remote no sink: 13k 1 23 6943 0 0.3 49.5 0.2 50.0 > rx remote with sink: 20k 1 371k 6819 0 29.5 30.1 3.9 36.5 > > 8.0-C besplex (bge) ttcp: > Csw Trp Sys Int Sof Sys Intr User Idle > local no sink: 649k 3 324k 100 649k 53.9 42.9 3.2 > local with sink: 649k 3 433k 100 216k 75.2 18.8 6.0 > tx remote no sink: 24k 3 432k 10k 100 49.7 41.3 2.4 6.6 > tx remote with sink: 3199 3 568k 1580 100 64.3 19.6 4.0 12.2 > rx remote no sink: 20k 3 27 10k 100 0.0 46.1 0.0 53.9 > rx remote with sink: 31k 3 370k 10k 100 30.7 30.9 4.8 33.5 > %%% > > Bruce > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From rwatson at FreeBSD.org Fri Jul 4 00:30:19 2008 From: rwatson at 
FreeBSD.org (Robert Watson) Date: Fri Jul 4 00:30:28 2008 Subject: Remaining non-MPSAFE netisr handlers In-Reply-To: <20080526102345.G26343@fledge.watson.org> References: <20080526102345.G26343@fledge.watson.org> Message-ID: <20080704012901.U90881@fledge.watson.org> On Mon, 26 May 2008, Robert Watson wrote: > In the continuing campaign to eliminate the Giant lock from the dregs of the > network stack, I thought I'd send out a list of non-MPSAFE netisr handlers: > > Location Handler Removed with IFF_NEEDSGIANT > dev/usb/usb_ethersubr.c:120 usbintr Yes > net/if_ppp.c:277 pppintr Yes > netinet6/ip6_input.c ip6_input No > > The plan for 8.0 is to remove the NETISR_MPSAFE flag -- all netisr handlers > will be executed without the Giant lock. This doesn't prohibit acquiring > Giant in the handler if required, although that's undesirable for the > obvious reasons (potentially stalling interrupt handling, etc). Obviously, > what would be most desirable is eliminating the remaining requirement for > Giant in the IPv6 input path, primarily consisting of mld6 and nd6. > > With this in mind, my current plan is to remove the flag and add explicit > Giant acquisition for any remaining handlers in June when IFF_NEEDSGIANT > device drivers are disabled. I've now removed the NETISR_MPSAFE flag -- all netisr handlers are now assumed to DTRT with respect to locking. At least until usb and ppp are sorted out, I've introduced NETISR_FORCEQUEUE as an interim measure, which allows protocols to request that they always operate the deferred dispatch, meaning they can acquire Giant if they need to, and modified those two to do so. That should go away by 8.0 also. 
Robert N M Watson Computer Laboratory University of Cambridge From mike at sentex.net Fri Jul 4 01:55:50 2008 From: mike at sentex.net (Mike Tancsa) Date: Fri Jul 4 01:55:57 2008 Subject: Tunneling issues In-Reply-To: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> Message-ID: <200807040155.m641tl8s000607@lava.sentex.ca> At 03:15 PM 7/3/2008, zaphod@fsklaw.com wrote: >I have a real poser, and I can't solve it. > >Currently I have an IPsec VPN tunneling 14 servers through a central server. > >I would like to restructure this so that the servers talk to each other >directly, rather than passing everything through a single server. > >However, on every other machine I cannot get a second tunnel to come up, >neither a gre nor a gif tunnel. And yet I have 14 on the central machine. You would need a lot of policies on each of the boxes (14), but there is no reason it should not work. Does each site have a unique subnet? Do they have static IP addresses? An easier solution might be to use something like OpenVPN, which lets all the boxes authenticate and route through a single server, while still letting them talk to each other with a single config option. ---Mike From fox at verio.net Fri Jul 4 02:32:47 2008 From: fox at verio.net (David DeSimone) Date: Fri Jul 4 02:32:52 2008 Subject: arplookup x.x.x.x failed: host is not on local network In-Reply-To: <20080703190513.5CD5D5B4C@mail.bitblocks.com> References: <20080703115243.GR29380@server.vk2pj.dyndns.org> <20080703190513.5CD5D5B4C@mail.bitblocks.com> Message-ID: <20080704023244.GH29305@verio.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Bakul Shah wrote: > > There is no such packet (for .26) in your tcpdump output.
I spotted this: > > 21:28:30.320445 00:0b:cd:d6:66:26 > ff:ff:ff:ff:ff:ff, ethertype > > 802.1Q (0x8100), length 64: vlan 169, p 0, ethertype ARP, arp > > who-has 192.168.169.250 tell 192.168.169.26 which matches with: > > Jul 3 21:28:30 xxxx kernel: arplookup 192.168.169.26 failed: host > > is not on local network So BSD is complaining about the IP of the source, not the host being looked up. Again, I did see these messages in my environment, but in my case, the error was correct: The IP *was not* on the local network. The reason being that we had multiple subnets configured on the same broadcast domain, so the BSD box could indeed hear ARP for subnets it did not know about. I don't know why the box feels moved to complain about this, however. I would think it should not care. In this case, however, the user claims that the box is indeed a member of the 192.168.169 subnet, and therefore it should not be complaining. - -- David DeSimone == Network Admin == fox@verio.net "This email message is intended for the use of the person to whom it has been sent, and may contain information that is confidential or legally protected. If you are not the intended recipient or have received this message in error, you are not authorized to copy, dis- tribute, or otherwise use this message or its attachments. Please notify the sender immediately by return e-mail and permanently delete this message and any attachments. Verio, Inc. makes no warranty that this email is error or virus free. Thank you." --Lawyer Bot 6000 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFIbYvMFSrKRjX5eCoRAr87AJ9Sr0CBdOazW4SYu3uRbylu1bwz4wCghEcT VvpeTB5KK57fgxuSViz6tb0= =XEIH -----END PGP SIGNATURE-----
From linimon at FreeBSD.org Fri Jul 4 02:57:45 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Fri Jul 4 02:57:57 2008 Subject: kern/125239: [gre] kernel crash when using gre Message-ID: <200807040257.m642viGB085945@freefall.freebsd.org> Synopsis: [gre] kernel crash when using gre Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Fri Jul 4 02:57:25 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=125239 From gcshekhar at sbcglobal.net Fri Jul 4 03:27:18 2008 From: gcshekhar at sbcglobal.net (Shekhar Chandrashekhar) Date: Fri Jul 4 03:27:25 2008 Subject: Incorrect ipv6 prefix detaching behavior? Message-ID: <370399.55106.qm@web81002.mail.mud.yahoo.com> I'm running into an issue where nd6_rtr.c:pfxlist_onlink_check() is possibly not doing the right thing by marking a prefix as not ONLINK - I've noticed this behavior in both FreeBSD 6 and 7. I have an interface (say fxp0) which has a router-advertised address for access outside the local subnet (fc00:10:1:2::/64 prefix) and a static address (fc00:10:1:1::/64) for connecting to servers on the same subnet. The default router only advertises the fc00:10:1:2:: prefix. However, as you can see from the "flags=LD" below, FreeBSD seems to mark the fc00:10:1:1::/64 prefix as detached and always forwards to the default router, even for a destination in the fc00:10:1:1:: subnet.
# ndp -p ;# abbreviated for interesting prefixes
fc00:10:1:2::/64 if=fxp0
 flags=LAO vltime=2592000, pltime-604800, expire=29d23h59m50s, ref=1
 advertised by
 fe80::214:f604:65f0:93f0%fxp0 (reachable)
fc00:10:1:1::/64 if=fxp0
 flags=LD vltime=0, pltime=0, expired, ref=1
 No advertising router

The code in question (around line 1396 of nd6_rtr.c) seems to mark any non-advertised prefix as detached; the comment in front of this segment (around line 1376) indicates this is done to handle a move to a different network:

        if ((pr->ndpr_stateflags & NDPRF_DETACHED) == 0 &&
            find_pfxlist_reachable_router(pr) == NULL)
--->            pr->ndpr_stateflags |= NDPRF_DETACHED;

If this is still the current thinking, then my usage scenario is apparently incorrect, and I would like to understand why. And what is the workaround? If not, and this is a bug, I would suggest an addition to the code so that only non-static prefixes can be detached:

        if ((pr->ndpr_stateflags & NDPRF_DETACHED) == 0 &&
            find_pfxlist_reachable_router(pr) == NULL &&
            pr->ndpr_pltime != ND6_INFINITE_LIFETIME)
                pr->ndpr_stateflags |= NDPRF_DETACHED;

Thanks in advance for your help,
--shekhar
-------------------------------------------------------------------------------------------------
(gcshekhar AT sbc NOSPACE global DOT net)
Confidence is the feeling you have before you understand the situation
From paul at gtcomm.net Fri Jul 4 04:52:36 2008 From: paul at gtcomm.net (Paul) Date: Fri Jul 4 04:52:43 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486D35A0.4000302@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net>
<486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> Message-ID: <486DAD0D.8090604@gtcomm.net>

Numbers are maximums with near 100% cpu usage and some errors occurring, just for testing.

FreeBSD 7.0-STABLE FreeBSD 7.0-STABLE #6: Thu Jul 3 19:32:38 CDT 2008 root@foo:/usr/obj/usr/src/sys/ROUTER amd64
CPU: Dual-Core AMD Opteron(tm) Processor 2222 (3015.47-MHz K8-class CPU)
NON-SMP KERNEL em driver, intel 82571EB NICs
fastforwarding on, isr.direct on, ULE, Preemption (NOTE: Interesting thing, without preemption gets errors similar to polling)

64 bit.. 1.1mpps max with opteron 2222 one direction no routing table, no firewall -> em0 --> em3 ->
64 bit.. 700k max with opteron 2222 one direction no routing table, one ipfw rule -> em0 --> em3 ->
64 bit.. 500kpps max with opteron 2222 one direction no routing table, 20 ipfw rules -> em0 --> em3 ->
64 bit.. 750kpps max with opteron 2222 one direction Full BGP (260k route) table -> em0 --> em3 ->
64 bit.. 400kpps max with opteron 2222 one direction no routing table, 2 pf rules no state -> em0 --> em3 ->

Using the lagg driver in etherchannel mode with 2 ports (em0,em1) reduces the performance by about 8%, which is strange as it shouldn't. In SMP mode the lagg driver reduces it substantially more, and this is where it should increase performance greatly because incoming packets are load balanced over multiple NICs.. :/

32 bit test coming next, then I'm going with a high mhz Xeon or c2d proc 45nm and will post those results (using same source tree/kernel/etc)

I tried polling, and I tried the polling patch that was posted to the list, and both work but generate too many errors (missed packets).
Without polling the packet errors ONLY occur when the cpu is near 100% usage

Paul wrote:
> Opteron 2222 UP mode, no polling
>
>            input  (em0)           output
>   packets  errs      bytes  packets  errs    bytes colls
>   1071020     0   66403248        2     0      404     0
>   1049793     0   65087174        2     0      356     0
>   1040320     0   64499848        2     0      356     0
>   1049712     0   65082152        2     0      356     0
>   1039504     0   64449256        2     0      356     0
>    933118     0   57853324        2     0      356     0
>
> still has some cpu left and I can't generate any more packets
>
> Polling turned on provided better performance on 32 bit, but it gets strange errors on 64 bit..
> Even at low pps I get small amounts of errors, and high pps same thing.. you would think that if it got errors at low pps it would get more errors at high pps but that isn't the case..
> Polling on:
>   packets  errs      bytes  packets  errs    bytes colls
>    979736   963   60743636        1     0      226     0
>    991838   496   61493960        1     0      178     0
>    996125   460   61759754        1     0      178     0
>    979381   326   60721626        1     0      178     0
>   1022249   379   63379442        1     0      178     0
>    991468   557   61471020        1     0      178     0
>
> lowering pps a little.......
>            input  (em0)           output
>   packets  errs      bytes  packets  errs    bytes colls
>    818688   151   50758660        1     0      226     0
>    837920   179   51951044        1     0      178     0
>    826217   168   51225458        1     0      178     0
>    801017   100   49663058        1     0      178     0
>    761857   287   47235138        1     0      178     0
>
>
> what could cause this?
>
> If I'm going to use a uniprocessor mode system I NEED polling to work because I have to have cpu cycles left over for userspace processes and I can't afford to have it lock those out.
> SMP is no big deal if it actually worked..
>
> I'm going to do a SMP test with this cpu now with polling off/on and then I'm going to apply the polling patch and try that.
> > > > Bruce Evans wrote: >> On Thu, 3 Jul 2008, Paul wrote: >> >>> Bruce Evans wrote: >>>>> No polling: >>>>> 843762 25337 52313248 1 0 178 0 >>>>> 763555 0 47340414 1 0 178 0 >>>>> 830189 0 51471722 1 0 178 0 >>>>> 838724 0 52000892 1 0 178 0 >>>>> 813594 939 50442832 1 0 178 0 >>>>> 807303 763 50052790 1 0 178 0 >>>>> 791024 0 49043492 1 0 178 0 >>>>> 768316 1106 47635596 1 0 178 0 >>>>> Machine is maxed and is unresponsive.. >>>> >>>> That's the most interesting one. Even 1% packet loss would probably >>>> destroy performance, so the benchmarks that give 10-50% packet loss >>>> are uninteresting. >>>> >>> But you realize that it's outputting all of these packets on em3 >>> and I'm watching them coming out >>> and they are consistent with the packets received on em0 that >>> netstat shows are 'good' packets. >> >> Well, output is easier. I don't remember seeing the load on a taskq for >> em3. If there is a memory bottleneck, it might or might not be more >> related >> to running only 1 taskq per interrupt, depending on how independent the >> memory system is for different CPUs. I think Opterons have more >> independence >> here than most x86's. >> >>> I'm using a server opteron which supposedly has the best memory >>> performance out of any CPU right now. >>> Plus opterons have the biggest l1 cache, but small l2 cache. Do you >>> think larger l2 cache on the Xeon (6mb for 2 core) would be better? >>> I have a 2222 opteron coming which is 1ghz faster so we will see >>> what happens >> >> I suspect lower latency memory would help more. Big memory systems >> have inherently higher latency. My little old A64 workstation and >> laptop have main memory latencies 3 times smaller than freebsd.org's >> new Core2 servers according to lmbench2 (42 nsec for the overclocked >> DDR PC3200 one and 55 for the DDR2 PC5400 (?) one, vs 145-155 nsec). >> If there are a lot of cache misses, then the extra 100 nsec can be >> important.
>> Profiling of sendto() using hwpmc or perfmon shows a >> significant number of cache misses per packet (2 or 10?). >> >>>>> Polling ON: >>>>> input (em0) output >>>>> packets errs bytes packets errs bytes colls >>>>> 784138 179079 48616564 1 0 226 0 >>>>> 788815 129608 48906530 2 0 356 0 >>>>> Machine is responsive and has 40% idle cpu.. Why ALWAYS 40% ? I'm >>>>> really mystified by this.. >>>> >>>> Is this with hz=2000 and 256/256 and no polling in idle? 40% is easy >>>> to explain (perhaps incorrectly). Polling can then read at most 256 >>>> descriptors every 1/2000 second, giving a max throughput of 512 kpps. >>>> Packets < descriptors in general but might be equal here (for small >>>> packets). You seem to actually get 784 kpps, which is too high even >>>> in descriptors unless, but matches exactly if the errors are counted >>>> twice (784 - 179 - 505 ~= 512). CPU is getting short too, but 40% >>>> still happens to be left over after giving up at 512 kpps. Most of >>>> the errors are probably handled by the hardware at low cost in CPU by >>>> dropping packets. There are other types of errors but none except >>>> dropped packets is likely. >>>> >>> Read above, it's actually transmitting 770kpps out of em3 so it >>> can't just be 512kpps. >> >> Transmitting is easier, but with polling it's even harder to send >> faster than >> hz * queue_length than to receive. This is without polling in idle. >> >>> I was thinking of trying 4 or 5.. but how would that work with this >>> new hardware? >> >> Poorly, except possibly with polling in FreeBSD-4. FreeBSD-4 generally >> has lower overheads and latency, but is missing important improvements >> (mainly tcp optimizations in upper layers, better DMA and/or mbuf >> handling, and support for newer NICs). FreeBSD-5 is also missing the >> overhead+latency advantage. >> >> Here are some benchmarks. (ttcp mainly tests sendto(). 4.10 em needed a >> 2-line change to support a not-so-new PCI em NIC.
Summary: >> - my bge NIC can handle about 600 kpps on my faster machine, but only >> achieves 300 in 4.10 unpatched. >> - my em NIC can handle about 400 kpps on my slower machine, except in >> later versions it can receive at about 600 kpps. >> - only 6.x and later can achieve near wire throughput for 1500-MTU >> packets (81 kpps vs 76 kpps). This depends on better DMA or mbuf >> handling... I now remember the details -- it is mainly better mbuf >> handling: old versions split the 1500-MTU packets into 2 mbufs and >> this causes 2 descriptors per packet, which causes extra software >> overheads and even larger overheads for the hardware. >> >> %%% >> Results of benchmarks run on 23 Feb 2007: >> >> my~5.2 bge --> ~4.10 em >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t 639 98 1660 398* 77 8k >> ttcp -l5 -t 6.0 100 3960 6.0 6 5900 >> ttcp -l1472 -u -t 76 27 395 76 40 8k >> ttcp -l1472 -t 51 40 11k 51 26 8k >> >> (*) Same as sender according to netstat -I, but systat -ip shows that >> almost half aren't delivered to upper layers. >> >> my~5.2 bge --> 4.11 em >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t 635 98 1650 399* 74 8k >> ttcp -l5 -t 5.8 100 3900 5.8 6 5800 >> ttcp -l1472 -u -t 76 27 395 76 32 8k >> ttcp -l1472 -t 51 40 11k 51 25 8k >> >> (*) Same as sender according to netstat -I, but systat -ip shows that >> almost half aren't delivered to upper layers. >> >> my~5.2 bge --> my~5.2 em >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t 638 98 1660 394* 100- 8k >> ttcp -l5 -t 5.8 100 3900 5.8 9 6000 >> ttcp -l1472 -u -t 76 27 395 76 46 8k >> ttcp -l1472 -t 51 40 11k 51 35 8k >> >> (*) Same as sender according to netstat -I, but systat -ip shows that >> almost half aren't delivered to upper layers. With the em rate >> limit on ips changed from 8k to 80k, about 95% are delivered up. 
>> >> my~5.2 bge --> 6.2 em >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t 637 98 1660 637 100- 15k >> ttcp -l5 -t 5.8 100 3900 5.8 8 12k >> ttcp -l1472 -u -t 76 27 395 76 36 16k >> ttcp -l1472 -t 51 40 11k 51 37 16k >> >> my~5.2 bge --> ~current em-fastintr >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t 641 98 1670 641 99 8k >> ttcp -l5 -t 5.9 100 2670 5.9 7 6k >> ttcp -l1472 -u -t 76 27 395 76 35 8k >> ttcp -l1472 -t 52 43 11k 52 30 8k >> >> ~6.2 bge --> ~current em-fastintr >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t 309 62 1600 309 64 8k >> ttcp -l5 -t 4.9 100 3000 4.9 6 7k >> ttcp -l1472 -u -t 76 27 395 76 34 8k >> ttcp -l1472 -t 54 28 6800 54 30 8k >> >> ~current bge --> ~current em-fastintr >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t 602 100 1570 602 99 8k >> ttcp -l5 -t 5.3 100 2660 5.3 5 5300 >> ttcp -l1472 -u -t 81# 19 212 81# 38 8k >> ttcp -l1472 -t 53 34 11k 53 30 8k >> >> (#) Wire speed to within 0.5%. This is the only kpps in this set of >> benchmarks that is close to wire speed. Older kernels apparently >> lose relative to -current because mbufs for mtu-sized packets are >> not contiguous in older kernels.
>> >> Old results: >> >> ~4.10 bge --> my~5.2 em >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t n/a n/a n/a 346 79 8k >> ttcp -l5 -t n/a n/a n/a 5.4 10 6800 >> ttcp -l1472 -u -t n/a n/a n/a 67 40 8k >> ttcp -l1472 -t n/a n/a n/a 51 36 8k >> >> ~4.10 kernel, =4 bge --> ~current em >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t n/a n/a n/a 347 96 14k >> ttcp -l5 -t n/a n/a n/a 5.8 10 14k >> ttcp -l1472 -u -t n/a n/a n/a 67 62 14K >> ttcp -l1472 -t n/a n/a n/a 52 40 16k >> >> ~4.10 kernel, =4+ bge --> ~current em >> tx rx >> kpps load% ips kpps load% ips >> ttcp -l5 -u -t n/a n/a n/a 627 100 9k >> ttcp -l5 -t n/a n/a n/a 5.6 9 13k >> ttcp -l1472 -u -t n/a n/a n/a 68 63 14k >> ttcp -l1472 -t n/a n/a n/a 54 44 16k >> %%% >> >> %%% >> Results of benchmarks run on 28 Dec 2007: >> >> ~5.2 epsplex (em) ttcp: >> Csw Trp Sys Int Sof Sys Intr User >> Idle >> local no sink: 825k 3 206k 229 412k 52.1 45.1 2.8 >> local with sink: 659k 3 263k 231 131k 66.5 27.3 6.2 >> tx remote no sink: 35k 3 273k 8237 266k 42.0 52.1 2.3 >> 3.6 >> tx remote with sink: 26k 3 394k 8224 100 60.0 5.41 3.4 >> 11.2 >> rx remote no sink: 25k 4 26 8237 373k 20.6 79.4 0.0 >> 0.0 >> rx remote with sink: 30k 3 203k 8237 398k 36.5 60.7 2.8 >> 0.0 >> >> 6.3-PR besplex (em) ttcp: >> Csw Trp Sys Int Sof Sys Intr User >> Idle >> local no sink: 417k 1 208k 418k 2 49.5 48.5 2.0 >> local with sink: 420k 1 276k 145k 2 70.0 23.6 6.4 >> tx remote no sink: 19k 2 250k 8144 2 58.5 38.7 2.8 >> 0.0 >> tx remote with sink: 16k 2 361k 8336 2 72.9 24.0 3.1 >> 4.4 >> rx remote no sink: 429 3 49 888 2 0.3 99.33 0.0 >> 0.4 >> tx remote with sink: 13k 2 316k 5385 2 31.7 63.8 3.6 >> 0.8 >> >> 8.0-C epsplex (em-fast) ttcp: >> Csw Trp Sys Int Sof Sys Intr User >> Idle >> local no sink: 442k 3 221k 230 442k 47.2 49.6 2.7 >> local with sink: 394k 3 262k 228 131k 72.1 22.6 5.3 >> tx remote no sink: 17k 3 226k 7832 100 94.1 0.2 3.0 >> 0.0 >> tx remote with sink: 17k 3 360k 7962 100 91.7 0.2 3.7 >> 4.4 >> rx 
remote no sink: saturated -- cannot update systat display >> rx remote with sink: 15k 6 358k 8224 100 97.0 0.0 2.5 >> 0.5 >> >> ~4.10 besplex (bge) ttcp: >> Csw Trp Sys Int Sof Sys Intr User >> Idle >> local no sink: 15 0 425k 228 11 96.3 0.0 3.7 >> local with sink: ** 0 622k 229 ** 94.7 0.3 5.0 >> tx remote no sink: 29 1 490k 7024 11 47.9 29.8 4.4 >> 17.9 >> tx remote with sink: 26 1 635k 1883 11 65.7 11.4 5.6 >> 17.3 >> rx remote no sink: 5 1 68 7025 1 0.0 47.3 0.0 >> 52.7 >> rx remote with sink: 6679 2 365k 6899 12 19.7 29.2 2.5 >> 48.7 >> >> ~5.2-C besplex (bge) ttcp: >> Csw Trp Sys Int Sof Sys Intr User >> Idle >> local no sink: 1M 3 271k 229 543k 50.7 46.8 2.5 >> local with sink: 1M 3 406k 229 203k 67.4 28.2 4.4 >> tx remote no sink: 49k 3 474k 11k 167k 52.3 42.7 5.0 >> 0.0 >> tx remote with sink: 6371 3 641k 1900 100 76.0 16.8 6.2 >> 0.9 >> rx remote no sink: 34k 3 25 11k 270k 0.8 65.4 0.0 >> 33.8 >> rx remote with sink: 41k 3 365k 10k 370k 31.5 47.1 2.3 >> 19.0 >> >> 6.3-PR besplex (bge) ttcp (hz = 1000 else stathz broken): >> Csw Trp Sys Int Sof Sys Intr User >> Idle >> local no sink: 540k 0 270k 540k 0 50.5 46.0 3.5 >> local with sink: 628k 0 417k 210k 0 68.8 27.9 3.3 >> tx remote no sink: 15k 1 222k 7190 1 28.4 29.3 1.7 >> 40.6 >> tx remote with sink: 5947 1 315k 2825 1 39.9 14.7 2.6 >> 42.8 >> rx remote no sink: 13k 1 23 6943 0 0.3 49.5 0.2 >> 50.0 >> rx remote with sink: 20k 1 371k 6819 0 29.5 30.1 3.9 >> 36.5 >> >> 8.0-C besplex (bge) ttcp: >> Csw Trp Sys Int Sof Sys Intr User >> Idle >> local no sink: 649k 3 324k 100 649k 53.9 42.9 3.2 >> local with sink: 649k 3 433k 100 216k 75.2 18.8 6.0 >> tx remote no sink: 24k 3 432k 10k 100 49.7 41.3 2.4 >> 6.6 >> tx remote with sink: 3199 3 568k 1580 100 64.3 19.6 4.0 >> 12.2 >> rx remote no sink: 20k 3 27 10k 100 0.0 46.1 0.0 >> 53.9 >> rx remote with sink: 31k 3 370k 10k 100 30.7 30.9 4.8 >> 33.5 >> %%% >> >> Bruce >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> 
http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> > From brde at optusnet.com.au Fri Jul 4 06:28:57 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Fri Jul 4 06:29:05 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486DAD0D.8090604@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DAD0D.8090604@gtcomm.net> Message-ID: <20080704160950.Y9864@delplex.bde.org> On Fri, 4 Jul 2008, Paul wrote: > Numbers are maximum with near 100% cpu usage and some errors occuring, just > for testing. > FreeBSD 7.0-STABLE FreeBSD 7.0-STABLE #6: Thu Jul 3 19:32:38 CDT 2008 > root@foo:/usr/obj/usr/src/sys/ROUTER amd64 > CPU: Dual-Core AMD Opteron(tm) Processor 2222 (3015.47-MHz K8-class CPU) > NON-SMP KERNEL em driver, intel 82571EB NICs > fastforwarding on, isr.direct on, ULE, Preemption (NOTE: Interesting thing, > without preemption gets errors similar to polling) PREEMPTION is certainly needed with UP. Without it, interrupts don't actually work (to work, they need to preempt the running thread, but they often (usually?) don't do that).
Then with UP, there is a good chance that the interrupt thread doesn't get scheduled to run for a long time, but with SMP (especially with lots of CPUs) there is a good chance that another CPU gets scheduled to run the interrupt thread. em (unless misconfigured) doesn't have an interrupt thread; it uses a taskq which might take even longer to be scheduled than an interrupt thread. I use PREEMPTION with UP and !PREEMPTION with SMP. With polling, missed polls cause the same packet loss as not preempting. > I tried polling, and I tried the polling patch that was posted to the list > and both work but generate too many errors (missed packets). > Without polling the packet errors ONLY occur when the cpu is near 100% usage Polling should also only cause packet loss when the CPU is near 100% usage, but now transients of near 100% usually cause packet loss, while with interrupts it takes a transient of > 100% on the competing interrupt-driven resources to cause packet loss. Please trim quotes. Bruce From if at xip.at Fri Jul 4 09:10:20 2008 From: if at xip.at (Ingo Flaschberger) Date: Fri Jul 4 09:10:28 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486D35A0.4000302@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> Message-ID: Dear Paul, > Opteron 2222 UP mode, no polling > > input (em0) output > packets errs bytes packets errs bytes colls > 1071020 0 66403248 2 0 404 0
that looks good. (but seems to be near the limit). > Polling turned on provided better performance on 32 bit, but it gets strange > errors on 64 bit.. > Even at low pps I get small amounts of errors, and high pps same thing.. you > would think that if > it got errors at low pps it would get more errors at high pps but that isn't > the case.. > Polling on: > packets errs bytes packets errs bytes colls > 979736 963 60743636 1 0 226 0 > 991838 496 61493960 1 0 178 0 > 996125 460 61759754 1 0 178 0 > 979381 326 60721626 1 0 178 0 > 1022249 379 63379442 1 0 178 0 > 991468 557 61471020 1 0 178 0 > > lowering pps a little....... > input (em0) output > packets errs bytes packets errs bytes colls > 818688 151 50758660 1 0 226 0 > 837920 179 51951044 1 0 178 0 > 826217 168 51225458 1 0 178 0 > 801017 100 49663058 1 0 178 0 > 761857 287 47235138 1 0 178 0 > > > what could cause this? *) kern.polling.idle_poll enabled? *) kern.polling.user_frac ? *) kern.polling.reg_frac ? *) kern.polling.burst_max ? *) kern.polling.each_burst ? 
Kind regards, Ingo Flaschberger From paul at gtcomm.net Fri Jul 4 09:45:22 2008 From: paul at gtcomm.net (Paul) Date: Fri Jul 4 09:45:29 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> Message-ID: <486DF1A3.9000409@gtcomm.net> Ingo Flaschberger wrote: > Dear Paul, > >> Opteron 2222 UP mode, no polling >> >> input (em0) output >> packets errs bytes packets errs bytes colls >> 1071020 0 66403248 2 0 404 0 > > that looks good. (but seems to be near the limit). > Yes it is; any more and errors start. >> Polling turned on provided better performance on 32 bit, but it gets >> strange errors on 64 bit.. >> Even at low pps I get small amounts of errors, and high pps same >> thing.. you would think that if >> it got errors at low pps it would get more errors at high pps but >> that isn't the case.. >> Polling on: >> packets errs bytes packets errs bytes colls >> 979736 963 60743636 1 0 226 0 >> 991838 496 61493960 1 0 178 0 >> 996125 460 61759754 1 0 178 0 >> >> >> what could cause this? > > *) kern.polling.idle_poll enabled? > *) kern.polling.user_frac ? > *) kern.polling.reg_frac ? > *) kern.polling.burst_max ? > *) kern.polling.each_burst ? I tried tons of different values for these and nothing made any significant difference. Idle polling makes a difference, allows more pps, but still errors.
Without idle polling it seems PPS is limited to HZ * descriptors, or 1000 * 256 or 512 but 1000 * 1024 is the same as 512.. 4000 * 256 or 2000 * 512 works but starts erroring 600kpps (SMP right now but it happens in UP too) If anyone wants to log into the box and play with settings, recompile the kernel, etc. Let me know. Paul From if at xip.at Fri Jul 4 11:16:16 2008 From: if at xip.at (Ingo Flaschberger) Date: Fri Jul 4 11:16:23 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486DF1A3.9000409@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> Message-ID: Dear Paul, >>> what could cause this? >> >> *) kern.polling.idle_poll enabled? >> *) kern.polling.user_frac ? >> *) kern.polling.reg_frac ? >> *) kern.polling.burst_max ? >> *) kern.polling.each_burst ? > > I tried tons of different values for these and nothing made any significant > difference. > Idle polling makes a difference, allows more pps, but still errors. > Without idle polling it seems PPS is limited to HZ * descriptors, or 1000 * > 256 or 512 > but 1000 * 1024 is the same as 512.. 
4000 * 256 or 2000 * 512 works but > starts erroring 600kpps (SMP right now but it happens in UP too) I have patched src/sys/kern/kern_poll.c to support higher burst_max values:

#define MAX_POLL_BURST_MAX 10000

When setting kern.polling.burst_max to higher values, the server reaches a point where cpu usage goes up without load, so try to stay below these values. I have also set the network card to 4096 rx-"ram", to have more room for late polls. Kind regards, Ingo Flaschberger From koitsu at FreeBSD.org Fri Jul 4 11:32:14 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Fri Jul 4 11:32:32 2008 Subject: connect(): Operation not permitted In-Reply-To: References: <678A03F5-5E8A-4CF6-90DF-AA9A4F30FBE1@stromnet.se> <1211037564.6326.27.camel@porksoda> <679DB462-75D6-45CC-949C-1BE8E12C22CD@stromnet.se> <482FD877.6050707@infracaninophile.co.uk> <20080703003955.859BCF180C0@mx.npubs.com> Message-ID: <20080704113213.GA13586@eos.sc1.parodius.com> On Thu, Jul 03, 2008 at 08:55:21AM -0700, Kian Mohageri wrote: > On Wed, Jul 2, 2008 at 5:39 PM, Stef wrote: > > Kian Mohageri wrote: > >> On Sun, May 18, 2008 at 3:33 AM, Johan Ström wrote: > >>> On May 18, 2008, at 9:19 AM, Matthew Seaman wrote: > >>> > >>>> Johan Ström wrote: > >>>> > >>>>> drop all traffic)? A check with pfctl -vsr reveals that the actual rule > >>>>> inserted is "pass on lo0 inet from 123.123.123.123 to 123.123.123.123 flags > >>>>> S/SA keep state". Where did that "keep state" come from? > >>>> 'flags S/SA keep state' is the default now for tcp filter rules -- that > >>>> was new in 7.0 reflecting the upstream changes made between the 4.0 and > >>>> 4.1 > >>>> releases of OpenBSD. If you want a stateless rule, append 'no state'. > >>>> > >>>> http://www.openbsd.org/faq/pf/filter.html#state > >>> Thanks! I was actually looking around in the pf.conf manpage but failed to > >>> find it yesterday, but looking closer today I now saw it.
> >>> Applied the no state (and quick) to the rule, and now no state is created. > >>> And the problem I had in the first place seems to have been resolved too > >>> now, even though it didn't look like a state problem.. (started to deny new > >>> connections much earlier than the states was full, altough maybee i wasnt > >>> looking for updates fast enough or something). > >>> > >> > >> I'd be willing to bet it's because you're reusing the source port on a > >> new connection before the old state expires. > >> > >> You'll know if you check the state-mismatch counter. > >> > >> Anyway, glad you found a resolution. > > > > I've been experiencing this "Operation not permitted" too. I've been > > trying to track down the problem for many months, but due to the > > complexity of my firewalls (scores of jails each with scores of rules), > > I wasn't brave enough to ask for help :) > > > > As a work around we started creating rules without state, whenever we > > would run into the problem. > > > > Thanks for the pointer about state-mismatch. The state-mismatch counter > > does is in fact high in my case (see below). How would I go about > > getting the pf state timeout and the reuse of ports for outbound > > connections to match? Or is this an intractable problem, that just needs > > to be worked around? > > Make sure your state-mismatch counter is increasing at the same times > you experience the problem (and isn't just high from some unrelated > issue). > > A similar/related problem was addressed in OpenBSD 4.3 > (http://www.openbsd.org/plus43.html). > > * In pf(4), allow state reuse if both sides are in FIN_WAIT_2 and a > new SYN arrives. > > I'm not sure if it's been imported yet. If not, you could try tuning > your timeout values (see pf.conf(5)). > > The specific issue I was experienced was solved by shortening > tcp.closed, IIRC. It's been a while though. When administrators see state-mismatch increasing, they get concerned. 
The common scapegoat is tcp.closed, and what people rarely mention is that pf applies an internal value of 10 seconds on top of it (e.g. tcp.closed=5 means 15 seconds).

You can set tcp.closed as low as you want, but chances are random Internet users will have equipment with IP stacks that re-use outbound sockets which haven't fully closed down within the aforementioned interval. pf cannot fix this.

For example, on our production/hosting systems, we see state-mismatch increase fairly often. I just ran pfctl -F info on our main webserver, and within about 15 minutes, state-mismatch was up to 22. We use a tcp.closed of 5 (which means 15 seconds).

Workarounds such as "no state" suffice, but if you use rdr rules, you MUST track state, which means there's no way of winning in that case. For example, OpenBSD spamd requires the use of rdr rules.

Administrators then ask 3 questions:

1) How do I determine whether an increasing state-mismatch counter is a sign of bad things, or just due to people's broken IP stacks?
2) What happens to packets which cause state-mismatch to increment, e.g. are they blocked, passed, or what?
3) Why isn't state-mismatch described in detail in the documentation?

Finally, the fix in OpenBSD 4.3 should really be backported to FreeBSD ASAP.

-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977.
PGP: 4BD6C0CB | From koitsu at FreeBSD.org Fri Jul 4 12:10:50 2008 From: koitsu at FreeBSD.org (Jeremy Chadwick) Date: Fri Jul 4 12:10:57 2008 Subject: connect(): Operation not permitted In-Reply-To: <20080704113213.GA13586@eos.sc1.parodius.com> References: <678A03F5-5E8A-4CF6-90DF-AA9A4F30FBE1@stromnet.se> <1211037564.6326.27.camel@porksoda> <679DB462-75D6-45CC-949C-1BE8E12C22CD@stromnet.se> <482FD877.6050707@infracaninophile.co.uk> <20080703003955.859BCF180C0@mx.npubs.com> <20080704113213.GA13586@eos.sc1.parodius.com> Message-ID: <20080704121050.GA14604@eos.sc1.parodius.com> On Fri, Jul 04, 2008 at 04:32:13AM -0700, Jeremy Chadwick wrote: > On Thu, Jul 03, 2008 at 08:55:21AM -0700, Kian Mohageri wrote: > > A similar/related problem was addressed in OpenBSD 4.3 > > (http://www.openbsd.org/plus43.html). > > > > * In pf(4), allow state reuse if both sides are in FIN_WAIT_2 and a > > new SYN arrives.

The OpenBSD diff:

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf.c.diff?r2=1.559&r1=1.558&f=H

I've submitted a FreeBSD PR to get the above backported into RELENG_7 and RELENG_6:

http://www.freebsd.org/cgi/query-pr.cgi?pr=125261

-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977.
PGP: 4BD6C0CB | From remko at FreeBSD.org Fri Jul 4 14:50:33 2008 From: remko at FreeBSD.org (remko@FreeBSD.org) Date: Fri Jul 4 14:50:44 2008 Subject: kern/125195: fxp(4) driver failed to initialize device Intel 82801DB Message-ID: <200807041450.m64EoWMa080205@freefall.freebsd.org> Synopsis: fxp(4) driver failed to initialize device Intel 82801DB Responsible-Changed-From-To: freebsd-i386->freebsd-net Responsible-Changed-By: remko Responsible-Changed-When: Fri Jul 4 14:50:18 UTC 2008 Responsible-Changed-Why: Reassign to -NET http://www.freebsd.org/cgi/query-pr.cgi?pr=125195 From remko at FreeBSD.org Fri Jul 4 14:51:07 2008 From: remko at FreeBSD.org (remko@FreeBSD.org) Date: Fri Jul 4 14:51:13 2008 Subject: kern/125258: socket's SO_REUSEADDR option does not work Message-ID: <200807041451.m64Ep7Yk080292@freefall.freebsd.org> Synopsis: socket's SO_REUSEADDR option does not work Responsible-Changed-From-To: freebsd-i386->freebsd-net Responsible-Changed-By: remko Responsible-Changed-When: Fri Jul 4 14:50:44 UTC 2008 Responsible-Changed-Why: reassign to -net http://www.freebsd.org/cgi/query-pr.cgi?pr=125258 From paul at gtcomm.net Fri Jul 4 18:01:23 2008 From: paul at gtcomm.net (Paul) Date: Fri Jul 4 18:01:30 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> Message-ID: <486E65E6.3060301@gtcomm.net> I tried all of this :/ still, 256/512 descriptors seem 
to work the best. Happy to let you log into the machine and fiddle around if you want :) Paul Ingo Flaschberger wrote: > Dear Paul, > >>>> what could cause this? >>> >>> *) kern.polling.idle_poll enabled? >>> *) kern.polling.user_frac ? >>> *) kern.polling.reg_frac ? >>> *) kern.polling.burst_max ? >>> *) kern.polling.each_burst ? >> >> I tried tons of different values for these and nothing made any >> significant difference. >> Idle polling makes a difference, allows more pps, but still errors. >> Without idle polling it seems PPS is limited to HZ * descriptors, or >> 1000 * 256 or 512 >> but 1000 * 1024 is the same as 512.. 4000 * 256 or 2000 * 512 works >> but starts erroring 600kpps (SMP right now but it happens in UP too) > > I have patched src/sys/kern/kern_poll.c to support higher burst_max > values: > #define MAX_POLL_BURST_MAX 10000 > > When setting kern.polling.burst_max to higher values, the server reach > a point, where cpu-usage goes up without load, so try to keep below > this values. I also have set the network card to 4096 rx-"ram", to > have more room for late polls. 
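A back-of-the-envelope sketch of the ceiling Paul describes (packets per second capped at roughly HZ * descriptors when idle polling is off). It is only a simple product, assuming the card is drained about once per clock tick and at most one ring's worth of packets is handled per drain; this is not how kern_poll.c actually schedules work, and it does not explain why 1024 descriptors behaved like 512 in Paul's tests. The function name is hypothetical.

```python
# Back-of-the-envelope model of the polling ceiling described in this thread.
# Assumption: with idle polling off, the NIC is drained about once per clock
# tick and at most one ring's worth of packets (the descriptor count) is
# handled per drain. This is a simplification, not the kern_poll.c algorithm.


def poll_ceiling_pps(hz: int, descriptors: int) -> int:
    """Upper bound on packets per second under the once-per-tick model."""
    return hz * descriptors


# HZ / descriptor combinations reported in the thread.
REPORTED = [(1000, 256), (1000, 512), (4000, 256), (2000, 512)]

if __name__ == "__main__":
    for hz, desc in REPORTED:
        print(f"HZ={hz:<5} descriptors={desc:<4} -> "
              f"{poll_ceiling_pps(hz, desc):>9,} pps ceiling")
```

Under this model, 4000 * 256 and 2000 * 512 both allow about 1.02 Mpps, so if the model holds, the errors reported near 600kpps would have to come from some other limit.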
> > Kind regards, > Ingo Flaschberger > > From kian.mohageri at gmail.com Fri Jul 4 21:30:50 2008 From: kian.mohageri at gmail.com (Kian Mohageri) Date: Fri Jul 4 21:30:57 2008 Subject: connect(): Operation not permitted In-Reply-To: <20080704113213.GA13586@eos.sc1.parodius.com> References: <678A03F5-5E8A-4CF6-90DF-AA9A4F30FBE1@stromnet.se> <1211037564.6326.27.camel@porksoda> <679DB462-75D6-45CC-949C-1BE8E12C22CD@stromnet.se> <482FD877.6050707@infracaninophile.co.uk> <20080703003955.859BCF180C0@mx.npubs.com> <20080704113213.GA13586@eos.sc1.parodius.com> Message-ID: On Fri, Jul 4, 2008 at 4:32 AM, Jeremy Chadwick wrote: > On Thu, Jul 03, 2008 at 08:55:21AM -0700, Kian Mohageri wrote: >> On Wed, Jul 2, 2008 at 5:39 PM, Stef wrote: >> > Kian Mohageri wrote: >> >> On Sun, May 18, 2008 at 3:33 AM, Johan Ström wrote: >> >>> On May 18, 2008, at 9:19 AM, Matthew Seaman wrote: >> >>> >> >>>> Johan Ström wrote: >> >>>> >> >>>>> drop all traffic)? A check with pfctl -vsr reveals that the actual rule >> >>>>> inserted is "pass on lo0 inet from 123.123.123.123 to 123.123.123.123 flags >> >>>>> S/SA keep state". Where did that "keep state" come from? >> >>>> 'flags S/SA keep state' is the default now for tcp filter rules -- that >> >>>> was new in 7.0 reflecting the upstream changes made between the 4.0 and >> >>>> 4.1 >> >>>> releases of OpenBSD. If you want a stateless rule, append 'no state'. >> >>>> >> >>>> http://www.openbsd.org/faq/pf/filter.html#state >> >>> Thanks! I was actually looking around in the pf.conf manpage but failed to >> >>> find it yesterday, but looking closer today I now saw it. >> >>> Applied the no state (and quick) to the rule, and now no state is created. >> >>> And the problem I had in the first place seems to have been resolved too >> >>> now, even though it didn't look like a state problem.. (started to deny new >> >>> connections much earlier than the states was full, altough maybee i wasnt >> >>> looking for updates fast enough or something).
>> >>> >> >> >> >> I'd be willing to bet it's because you're reusing the source port on a >> >> new connection before the old state expires. >> >> >> >> You'll know if you check the state-mismatch counter. >> >> >> >> Anyway, glad you found a resolution. >> > >> > I've been experiencing this "Operation not permitted" too. I've been >> > trying to track down the problem for many months, but due to the >> > complexity of my firewalls (scores of jails each with scores of rules), >> > I wasn't brave enough to ask for help :) >> > >> > As a work around we started creating rules without state, whenever we >> > would run into the problem. >> > >> > Thanks for the pointer about state-mismatch. The state-mismatch counter >> > does is in fact high in my case (see below). How would I go about >> > getting the pf state timeout and the reuse of ports for outbound >> > connections to match? Or is this an intractable problem, that just needs >> > to be worked around? >> >> Make sure your state-mismatch counter is increasing at the same times >> you experience the problem (and isn't just high from some unrelated >> issue). >> >> A similar/related problem was addressed in OpenBSD 4.3 >> (http://www.openbsd.org/plus43.html). >> >> * In pf(4), allow state reuse if both sides are in FIN_WAIT_2 and a >> new SYN arrives. >> >> I'm not sure if it's been imported yet. If not, you could try tuning >> your timeout values (see pf.conf(5)). >> >> The specific issue I was experienced was solved by shortening >> tcp.closed, IIRC. It's been a while though. > > When administrators see state-mismatch increasing, they get concerned. > The common scapegoat is tcp.closed, which people don't even bother to > describe (pf has an internal value of 10 seconds applied to that value, > e.g. tcp.closed=5 means 15 seconds). 
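Jeremy's rule of thumb (tcp.closed=5 behaves like 15 seconds) can be written down as a worst-case lifetime for a closed state, which is the window in which a client that reuses its source port may collide with the old state. The sketch assumes the extra 10 seconds is pf's state-purge `interval` timeout (default 10 s per pf.conf(5)); that link is an interpretation, not something stated in the thread, and the helper names are hypothetical.

```python
# Worst-case lifetime of a closed TCP state in pf, following the description
# in this thread: an extra ~10 s is added on top of tcp.closed.
# Assumption: the extra time is pf's purge interval ("set timeout interval",
# default 10 s), i.e. expired states are only reaped when the purge runs.

PURGE_INTERVAL = 10  # seconds (assumed default)


def worst_case_closed_lifetime(tcp_closed: int,
                               interval: int = PURGE_INTERVAL) -> int:
    """Longest time (s) a closed state may linger before it is purged."""
    return tcp_closed + interval


def may_hit_old_state(seconds_since_close: float, tcp_closed: int,
                      interval: int = PURGE_INTERVAL) -> bool:
    """True if a new SYN reusing the same addresses/ports may still find the
    old state, i.e. the situation in which state-mismatch can increment."""
    return seconds_since_close < worst_case_closed_lifetime(tcp_closed, interval)


if __name__ == "__main__":
    print(worst_case_closed_lifetime(5))  # tcp.closed of 5, as in the thread
    print(may_hit_old_state(12, 5))       # client reused its port after 12 s
```

This only models the timing; as Jeremy notes, no value of tcp.closed can stop a remote stack from reusing a port inside whatever window remains.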
> > You can set tcp.closed as low as you want, but chances are random > Internet users will have equipment with IP stacks that re-use outbound > sockets which haven't fully closed down within the aforementioned > interval. pf cannot fix this. > > For example, on our production/hosting systems, we see state-mismatch > increase fairly often. I just pfctl -F info'd our main webserver, and > within about 15 minutes, state-mismatch was up to 22. We use tcp.closed > of 5 (which means 15 seconds). > > Workarounds such as "no state" suffice, but if you use rdr rules, you > MUST track state, which means there's no way of winning in that case. > For sake of example, OpenBSD spamd requires the use of rdr rules. > > Administrators then ask 3 questions: > For the sake of a helpful archive... > 1) How do I determine whether or not state-mismatch increasing is a > sign of bad things, or due to peoples' broken IP stacks, You can't. Only way you know is probably when people complain, or you notice scripts/page loads failing. > 2) What happens to packets which cause state-mismatch to increment, > e.g. are they blocked, passed, or what? Dropped. In the case of a state-mismatch during TCP handshake, an RST is sent. That's why the failure happens immediately. > 3) Why isn't state-mismatch described in detail in the documentation? > Good question. I guess because it would be difficult to document all of the reasons a state wouldn't match. It would be easier to simply document what a state _is_, but that's already in the archives. -Kian From mnd999 at gmail.com Sat Jul 5 16:00:18 2008 From: mnd999 at gmail.com (Mark Dixon) Date: Sat Jul 5 16:00:24 2008 Subject: Net crash in ath on FREEBSD-7 STABLE Message-ID: <6C1A742B-AB45-4E72-8BA9-F333F607A55E@gmail.com> Hi, I get the following when I run portupgrade and it tries to hit the network for a package on stable. It looks like something has been broken in ath? Seeing as I don't update very often it could have been around for a while. 
Kernel is GENERIC. Modules are kernel, snd_ich, sound, aio and linux.

If anyone wants to look at this and needs more info, let me know.

Mark

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff8025e2f2
stack pointer           = 0x10:0xffffffffaec9e630
frame pointer           = 0x10:0xffffff00018fa100
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 16516 (fetch)
trap number             = 12
panic: page fault
cpuid = 2
Uptime: 2m56s
Physical memory: 2034 MB
Dumping 251 MB: (CTRL-C to abort) 236 220 (CTRL-C to abort) 204 188 172 156 140 124 108 92 76 60 (CTRL-C to abort) 44 28 12

Reading symbols from /boot/kernel/snd_ich.ko...Reading symbols from /boot/kernel/snd_ich.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/snd_ich.ko
Reading symbols from /boot/kernel/sound.ko...Reading symbols from /boot/kernel/sound.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/sound.ko
Reading symbols from /boot/kernel/aio.ko...Reading symbols from /boot/kernel/aio.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/aio.ko
Reading symbols from /boot/kernel/linux.ko...Reading symbols from /boot/kernel/linux.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linux.ko
#0  doadump () at pcpu.h:194
194     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:194
#1  0x0000000000000004 in ?? ()
#2  0xffffffff804964a9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#3  0xffffffff804968ad in panic (fmt=0x104
) at /usr/src/sys/kern/kern_shutdown.c:572
#4  0xffffffff8075ff74 in trap_fatal (frame=0xffffff00044d66a0, eva=18446742974267905128) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0xffffffff80760345 in trap_pfault (frame=0xffffffffaec9e580, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#6  0xffffffff80760c88 in trap (frame=0xffffffffaec9e580) at /usr/src/sys/amd64/amd64/trap.c:410
#7  0xffffffff8074664e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#8  0xffffffff8025e2f2 in ath_start (ifp=0xffffff00011cd800) at /usr/src/sys/dev/ath/if_ath.c:1747
#9  0xffffffff8052fab6 in ether_output_frame (ifp=0xffffff00011cd800, m=0xffffff000421eb00) at /usr/src/sys/net/if_ethersubr.c:405
#10 0xffffffff805300e2 in ether_output (ifp=0xffffff00011cd800, m=0xffffff000421eb00, dst=Variable "dst" is not available.
) at /usr/src/sys/net/if_ethersubr.c:374
#11 0xffffffff805755df in ip_output (m=0xffffff000421eb00, opt=Variable "opt" is not available.
) at /usr/src/sys/netinet/ip_output.c:551
#12 0xffffffff805ce48c in tcp_output (tp=0xffffff0004216b60) at /usr/src/sys/netinet/tcp_output.c:1135
#13 0xffffffff805d880d in tcp_usr_rcvd (so=Variable "so" is not available.
) at /usr/src/sys/netinet/tcp_usrreq.c:738
#14 0xffffffff804ef5cf in soreceive_generic (so=0xffffff001040c570, psa=0x0, uio=0xffffffffaec9eb00, mp0=Variable "mp0" is not available.
) at /usr/src/sys/kern/uipc_socket.c:1825
#15 0xffffffff804cee1d in dofileread (td=0xffffff00044d66a0, fd=3, fp=0xffffff00047e4168, auio=0xffffffffaec9eb00, offset=Variable "offset" is not available.
) at file.h:242
#16 0xffffffff804cf18e in kern_readv (td=0xffffff00044d66a0, fd=3, auio=0xffffffffaec9eb00) at /usr/src/sys/kern/sys_generic.c:192
#17 0xffffffff804cf27c in read (td=0xffffffff80e62000, uap=0xffffff00044d66a0) at /usr/src/sys/kern/sys_generic.c:108
#18 0xffffffff807605c7 in syscall (frame=0xffffffffaec9ec70) at /usr/src/sys/amd64/amd64/trap.c:852
#19 0xffffffff8074685b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
#20 0x0000000800c0297c in ?? ()
Previous frame inner to this frame (corrupt stack?)

From cswiger at mac.com Sat Jul 5 16:57:17 2008 From: cswiger at mac.com (Chuck Swiger) Date: Sat Jul 5 16:57:24 2008 Subject: arplookup x.x.x.x failed: host is not on local network In-Reply-To: <20080704023244.GH29305@verio.net> References: <20080703115243.GR29380@server.vk2pj.dyndns.org> <20080703190513.5CD5D5B4C@mail.bitblocks.com> <20080704023244.GH29305@verio.net> Message-ID: <486FA3B1.6040806@mac.com>

David DeSimone wrote:
[ ... ]
> Again, I did see these messages in my environment, but in my case, the
> error was correct: The IP *was not* on the local network. The reason
> being that we had multiple subnets configured on the same broadcast
> domain, so the BSD box could indeed hear ARP for subnets it did not know
> about. I don't know why the box feels moved to complain about this,
> however. I would think it should not care.

It's good practice for machines intended to be on different subnets to be in different collision domains. Seeing traffic to or from the wrong network should be considered a potential "red flag", warning that there might be a network misconfiguration that could compromise security.

In particular, if you want to securely host a bunch of client machines, setting them up on individual /30 subnets using a multiport firewall or a BSD box with a couple of 4-port NIC cards, rather than a switch, is a good idea.
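A small sketch of the per-client /30 layout Chuck suggests, using Python's standard `ipaddress` module and the 192.0.2.0/24 documentation range (both are illustrative choices, not from the thread). Each /30 leaves exactly two usable addresses, one for the router port and one for the client. The helper `on_local_network` (a hypothetical name) is only a simplified version of the condition behind the "host is not on local network" message: whether an address falls inside the subnet configured on the interface.

```python
import ipaddress

# Illustrative only: carve a documentation-range block into /30s, one per client.
BLOCK = ipaddress.ip_network("192.0.2.0/28")


def client_subnets(block: ipaddress.IPv4Network):
    """Yield (subnet, router_addr, client_addr) for each /30 in the block."""
    for net in block.subnets(new_prefix=30):
        router, client = net.hosts()  # a /30 has exactly two usable hosts
        yield net, router, client


def on_local_network(iface: ipaddress.IPv4Interface, addr: str) -> bool:
    """Simplified check: is addr inside the subnet configured on iface?"""
    return ipaddress.ip_address(addr) in iface.network


if __name__ == "__main__":
    for net, router, client in client_subnets(BLOCK):
        print(f"{net}: router {router}, client {client}, "
              f"broadcast {net.broadcast_address}")
```

With one /30 per port, an ARP request for a neighbouring client's address simply never reaches the wrong interface, which is the isolation Chuck is arguing for.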
While this situation is something which is supposedly well-suited for VLANs, in practice most switches cannot be relied upon to actually prevent traffic from leaking outside of the specified VLAN. This is more common for ARP traffic, which is sent to the all-ones MAC and may well get forwarded to all ports regardless of VLAN tagging, particularly if the switch is under load and has switched to some kind of "fast forwarding" mode or if it tends to consider all ports trunk ports by default or via dubious autolearning algorithms.... Regards, -- -Chuck From spawk at acm.poly.edu Sat Jul 5 19:02:53 2008 From: spawk at acm.poly.edu (Boris Kochergin) Date: Sat Jul 5 19:02:59 2008 Subject: One-liner for setting IPv6 address and IPv4 endpoints on gif interface? Message-ID: <486FC54B.3060706@acm.poly.edu> Hi, list. I set up an IPv6-over-IPv4 tunnel from Hurricane Electric using a gif(4) interface using the commands: ifconfig gif0 tunnel [source IPv4] [destination IPv4] ifconfig gif0 inet6 [source IPv6] [destination IPv6] prefixlen 128 route -n add -inet6 default [destination IPv6] I'm wondering whether there's a one-liner for executing the first two commands, or some non-one-liner way of making it happen through /etc/rc.conf. Thanks. -Boris From yuri.pankov at gmail.com Sat Jul 5 19:42:45 2008 From: yuri.pankov at gmail.com (Yuri Pankov) Date: Sat Jul 5 19:42:51 2008 Subject: One-liner for setting IPv6 address and IPv4 endpoints on gif interface? In-Reply-To: <486FC54B.3060706@acm.poly.edu> References: <486FC54B.3060706@acm.poly.edu> Message-ID: <20080705191811.GA58433@darklight.homeunix.org> On Sat, Jul 05, 2008 at 03:02:35PM -0400, Boris Kochergin wrote: > Hi, list. 
> I set up an IPv6-over-IPv4 tunnel from Hurricane Electric
> using a gif(4) interface using the commands:
>
> ifconfig gif0 tunnel [source IPv4] [destination IPv4]
> ifconfig gif0 inet6 [source IPv6] [destination IPv6] prefixlen 128
> route -n add -inet6 default [destination IPv6]
>
> I'm wondering whether there's a one-liner for executing the first two
> commands, or some non-one-liner way of making it happen through
> /etc/rc.conf. Thanks.
>
> -Boris

Not sure about a one-liner, but this is what I'm using in rc.conf (Hurricane Electric's tunnelbroker.net tunnel):

gif_interfaces="gif0"
gifconfig_gif0="src_ipv4 dst_ipv4"
ipv6_enable="YES"
ipv6_ifconfig_gif0="src_ipv6 dst_ipv6 prefixlen 128"
ipv6_defaultrouter="dst_ipv6"

HTH,
Yuri

From if at xip.at Sat Jul 5 22:10:10 2008 From: if at xip.at (Ingo Flaschberger) Date: Sat Jul 5 22:10:17 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486E65E6.3060301@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> Message-ID:

Dear Paul,

> I tried all of this :/ still, 256/512 descriptors seem to work the best.
> Happy to let you log into the machine and fiddle around if you want :)

Yes, but I'm sure I will not be able to achieve much more pps either. As it seems that you have hit hardware/software-level barriers, my only idea is to test DragonFly BSD, which seems to have less software overhead.
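For reference when reading the wire-speed figures discussed next, the theoretical frame rate of an Ethernet link follows from the frame size plus the fixed per-frame overhead on the wire (8 bytes of preamble/start-of-frame delimiter and a 12-byte inter-frame gap). This sketch only does that arithmetic; it says nothing about what a given x86 box can actually forward, and the function name is illustrative.

```python
# Theoretical Ethernet frame rate for a given frame size and link speed.
# Every frame also occupies 8 bytes of preamble/SFD and a 12-byte
# inter-frame gap on the wire, on top of the frame itself.
PREAMBLE_SFD = 8
INTER_FRAME_GAP = 12


def line_rate_pps(link_bps: int, frame_bytes: int) -> float:
    """Maximum frames per second in one direction of a full-duplex link."""
    bits_per_frame = (frame_bytes + PREAMBLE_SFD + INTER_FRAME_GAP) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    one_way = line_rate_pps(1_000_000_000, 64)
    print(f"64-byte frames at 1 Gbit/s: {one_way:,.0f} pps each way, "
          f"{2 * one_way:,.0f} pps for both directions combined")
```

For 64-byte frames on 1 Gbit/s this works out to about 1.49 Mpps in each direction, or about 2.98 Mpps for both directions of a full-duplex link together.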
I don't think you will be able to route 64-byte packets at 1 Gbit wire speed (2Mpps) with a current x86 platform. I hoped to reach 1Mpps with the hardware I mentioned a few mails ago, but 2Mpps is far, far away. Currently I get 160kpps on a 32-bit/33 MHz PCI bus with a 1.2 GHz mobile Pentium.

Perhaps you will have better luck with different hardware (PPC, MIPS, ...?), or use FreeBSD only for routing-table updates and special network cards (NetFPGA) for the actual routing.

Kind regards,
Ingo Flaschberger

From paul at gtcomm.net Sat Jul 5 23:08:50 2008 From: paul at gtcomm.net (Paul) Date: Sat Jul 5 23:08:58 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> References: <4867420D.7090406@gtcomm.net> <48699960.9070100@gtcomm.net><20080701033117.GH83626@cdnetworks.co.kr><4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> Message-ID: <486FFF70.3090402@gtcomm.net>

ULE + PREEMPTION for non-SMP
no major differences with SMP with ULE/4BSD and preemption ON/OFF

32-bit UP test coming up with a new CPU, and I'm installing DragonFly sometime this weekend :]
UP: 1Mpps in one direction with no firewall/no routing table is not too bad, but 1Mpps in both directions is the goal here
700kpps with a full BGP table in one direction is not too bad
ipfw needs a lot of work, barely gets 500kpps with no routing table and a few ipfw rules loaded..
that's horrible.
Linux barely takes a hit when you start loading iptables rules, but then again Linux has a HUGE problem with routing random packet sources/ports.. grr
My problem is that I need some boxes to do fast routing and some to do firewalling.. :/
I'll have a 32-bit 7-STABLE UP test with ipfw/routing table and then move on to DragonFly.
I'll post the DragonFly results here as well as sign up for their mailing list.

Bart Van Kerckhove wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Paul / Ingo, > >>> I tried all of this :/ still, 256/512 descriptors seem to work the >>> best. Happy to let you log into the machine and fiddle around if you >>> want :) >>> > I've been watching this thread closely, since I'm in a very similair > situation. > A few questions/remarks: > > Does ULE provide better performance than 4BSD for forwarding? > Did you try freebsd4 as well? This thread had a report about that quite > opposite to my own experiences, -4 seemed to be a lot faster at forwarding > than anything else I 've tried so far. > Obviously the thing I'm interested in is IMIX - and 64byte packets. > Does anyone have any benchmarks for DragonFly? I asked around on IRC, but > that nor google turned up any useful results. > > > >> I don't think you will be able to route 64byte packets at 1gbit >> wirespeed (2Mpps) with a current x86 platform. >> > Are there actual hardware related reasons this should not be possible, or > is this purely lack of dedicated work towards this goal? > > > >> Theres a "sun" used at quagga dev as bgp-route-server. >> http://quagga.net/route-server.php >> (but they don't answered my question regarding fw-performance).
>> > > > the Quagga guys are running a sun T1000 (niagara 1) route server - I happen > to have the machine in my racks, > please let me know if you want to run some tests on it, I'm sure they won't > mind ;-) > It should also make a great testbed for SMP performance testing imho (and > they're pretty cheap these days) > Also, feel free to use me as a relay for your questions, they're not always > very reachable. > > > >> Perhaps you have some better luck at some different hardware systems >> (ppc, mips, ..?) or use freebsd only for routing-table-updates and >> special network-cards (netfpga) for real routing. >> > The netfpga site seems more or less dead - is this project still alive? > It does look like a very interesting idea, even though it's currently quite > linux-centric (and according to docs doesn't have VLAN nor ip6 support, the > former being quite a dealbreaker) > > Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a > freebsd4 and/or dragonfly bench if you can..) 
- appreciate the lots of > information you are providing us :) > > Met vriendelijke groet / With kind regards, > > Bart Van Kerckhove > http://friet.net/pgp.txt > > -----BEGIN PGP SIGNATURE----- > > iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi > eca31f7WQ/oXq9tJ8TEDN3CA > =YGYq > -----END PGP SIGNATURE----- > > > From paul at gtcomm.net Sun Jul 6 00:58:20 2008 From: paul at gtcomm.net (Paul) Date: Sun Jul 6 00:58:27 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486FFF70.3090402@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr><4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> Message-ID: <48701921.7090107@gtcomm.net>

UP 32-bit test vs 64-bit:
negligible difference in forwarding performance without polling
slightly better polling performance, but still errors at lower packet rates
same massive hit with ipfw loaded

Installing DragonFly in a bit..
If anyone has a really fast PPC type system or SUN or something i'd love to try it :) Something with a really big L1 cache :P Paul wrote: > ULE + PREEMPTION for non SMP > no major differences with SMP with ULE/4BSD and preemption ON/OFF > > 32 bit UP test coming up with new cpu > and I'm installing dragonfly sometime this weekend :] > UP: 1mpps in one direction with no firewall/no routing table is not > too bad, but 1mpps both directions is the goal here > 700kpps with full bgp table in one direction is not too bad > Ipfw needs a lot of work, barely gets 500kpps with no routing table > with a few ipfw rules loaded.. that's horrible > Linux barely takes a hit when you start loading iptables rules , but > then again linux has a HUGE problem with routing > random packet sources/ports .. grr > My problem Is I need some box to do fast routing and some to do > firewall.. :/ > I'll have 32 bit 7-stable UP test with ipfw/routing table and then > move on to dragonfly. > I'll post the dragonfly results here as well as sign up for their > mailing list. > > > Bart Van Kerckhove wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Paul / Ingo, >> >>>> I tried all of this :/ still, 256/512 descriptors seem to work the >>>> best. Happy to let you log into the machine and fiddle around if you >>>> want :) >> I've been watching this thread closely, since I'm in a very similair >> situation. >> A few questions/remarks: >> >> Does ULE provide better performance than 4BSD for forwarding? >> Did you try freebsd4 as well? This thread had a report about that quite >> opposite to my own experiences, -4 seemed to be a lot faster at >> forwarding >> than anything else I 've tried so far. >> Obviously the thing I'm interested in is IMIX - and 64byte packets. >> Does anyone have any benchmarks for DragonFly? I asked around on IRC, >> but >> that nor google turned up any useful results. 
>> >> >>> I don't think you will be able to route 64byte packets at 1gbit >>> wirespeed (2Mpps) with a current x86 platform. >>> >> Are there actual hardware related reasons this should not be >> possible, or >> is this purely lack of dedicated work towards this goal? >> >> >> >>> Theres a "sun" used at quagga dev as bgp-route-server. >>> http://quagga.net/route-server.php >>> (but they don't answered my question regarding fw-performance). >>> >> >> >> the Quagga guys are running a sun T1000 (niagara 1) route server - I >> happen >> to have the machine in my racks, >> please let me know if you want to run some tests on it, I'm sure they >> won't >> mind ;-) >> It should also make a great testbed for SMP performance testing imho >> (and >> they're pretty cheap these days) >> Also, feel free to use me as a relay for your questions, they're not >> always >> very reachable. >> >> >> >>> Perhaps you have some better luck at some different hardware systems >>> (ppc, mips, ..?) or use freebsd only for routing-table-updates and >>> special network-cards (netfpga) for real routing. >>> >> The netfpga site seems more or less dead - is this project still alive? >> It does look like a very interesting idea, even though it's currently >> quite >> linux-centric (and according to docs doesn't have VLAN nor ip6 >> support, the >> former being quite a dealbreaker) >> >> Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a >> freebsd4 and/or dragonfly bench if you can..) 
- I appreciate all the >> information you are providing us :) >> >> Met vriendelijke groet / With kind regards, >> >> Bart Van Kerckhove >> http://friet.net/pgp.txt >> >> -----BEGIN PGP SIGNATURE----- >> >> iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi >> eca31f7WQ/oXq9tJ8TEDN3CA >> =YGYq >> -----END PGP SIGNATURE----- >> >> >> > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From bart at it-ss.be Sun Jul 6 01:06:16 2008 From: bart at it-ss.be (Bart Van Kerckhove) Date: Sun Jul 6 01:06:23 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] References: <4867420D.7090406@gtcomm.net> <486986D9.3000607@monkeybrains.net><48699960.9070100@gtcomm.net><20080701033117.GH83626@cdnetworks.co.kr><4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> Message-ID: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Paul / Ingo, > >> I tried all of this :/ still, 256/512 descriptors seem to work the >> best. Happy to let you log into the machine and fiddle around if you >> want :) I've been watching this thread closely, since I'm in a very similar situation. A few questions/remarks: Does ULE provide better performance than 4BSD for forwarding? Did you try freebsd4 as well? This thread had a report on that, quite the opposite of my own experience: -4 seemed to be a lot faster at forwarding than anything else I've tried so far.
Obviously the thing I'm interested in is IMIX - and 64byte packets. Does anyone have any benchmarks for DragonFly? I asked around on IRC, but neither that nor Google turned up any useful results. > I don't think you will be able to route 64byte packets at 1gbit > wirespeed (2Mpps) with a current x86 platform. Are there actual hardware-related reasons this should not be possible, or is this purely a lack of dedicated work towards this goal? >There's a "sun" used at quagga dev as bgp-route-server. >http://quagga.net/route-server.php >(but they didn't answer my question regarding fw-performance). The Quagga guys are running a Sun T1000 (niagara 1) route server - I happen to have the machine in my racks, please let me know if you want to run some tests on it, I'm sure they won't mind ;-) It should also make a great testbed for SMP performance testing imho (and they're pretty cheap these days) Also, feel free to use me as a relay for your questions, they're not always very reachable. > Perhaps you have some better luck at some different hardware systems > (ppc, mips, ..?) or use freebsd only for routing-table-updates and > special network-cards (netfpga) for real routing. The netfpga site seems more or less dead - is this project still alive? It does look like a very interesting idea, even though it's currently quite linux-centric (and according to docs doesn't have VLAN nor ip6 support, the former being quite a dealbreaker) Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a freebsd4 and/or dragonfly bench if you can..)
- I appreciate all the information you are providing us :) Met vriendelijke groet / With kind regards, Bart Van Kerckhove http://friet.net/pgp.txt -----BEGIN PGP SIGNATURE----- iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi eca31f7WQ/oXq9tJ8TEDN3CA =YGYq -----END PGP SIGNATURE----- From cmarlatt at rxsec.com Sun Jul 6 03:40:33 2008 From: cmarlatt at rxsec.com (Chris Marlatt) Date: Sun Jul 6 03:40:40 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> References: <4867420D.7090406@gtcomm.net> <48699960.9070100@gtcomm.net><20080701033117.GH83626@cdnetworks.co.kr><4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> Message-ID: <48703855.2040503@rxsec.com> Bart Van Kerckhove wrote: > The netfpga site seems more or less dead - is this project still alive? > It does look like a very interesting idea, even though it's currently quite > linux-centric (and according to docs doesn't have VLAN nor ip6 support, the > former being quite a dealbreaker) > Just last Thursday they made another release, so it certainly doesn't look dead. I've been following the project for a while now to see where it's going to go. The lack of FreeBSD support isn't great, but I doubt it's going to happen until someone steps up and makes it so. The same is likely true for VLAN support. So far it's primarily been a proof of concept from what I can tell and could be molded into any number of different applications with the appropriate support.
Considering all high performance routing platforms separate the management and routing/switching into two (or more) different hardware sections it wouldn't surprise me at all to see this as the only real option to get some serious routing and firewalling performance out of i386/amd64 type servers. Throwing faster and faster cpus at it is only going to get you so far (re: opteron 2212 vs 2222). Even so, 1.1Mpps is a considerable rate. Regards, Chris From julian at elischer.org Sun Jul 6 04:46:14 2008 From: julian at elischer.org (Julian Elischer) Date: Sun Jul 6 04:46:21 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> References: <4867420D.7090406@gtcomm.net> <48699960.9070100@gtcomm.net><20080701033117.GH83626@cdnetworks.co.kr><4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> Message-ID: <48704E15.1070803@elischer.org> Bart Van Kerckhove wrote: > > >> Perhaps you have some better luck at some different hardware systems >> (ppc, mips, ..?) or use freebsd only for routing-table-updates and >> special network-cards (netfpga) for real routing. > The netfpga site seems more or less dead - is this project still alive? > It does look like a very interesting idea, even though it's currently quite > linux-centric (and according to docs doesn't have VLAN nor ip6 support, the > former being quite a dealbreaker) netfpga is very much alive. I'm on the mailing lists.. but it is summer break and it's an academically driven project. 
From andrew at modulus.org Sun Jul 6 08:13:49 2008 From: andrew at modulus.org (Andrew Snow) Date: Sun Jul 6 08:13:56 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48704E15.1070803@elischer.org> References: <4867420D.7090406@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr><4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <48704E15.1070803@elischer.org> Message-ID: <48707EA6.2020506@modulus.org> I'm no expert, but I imagine the problem is that the net processing of FreeBSD is not pipelined enough. We are now able to affordably throw many gigabytes of RAM into a machine, as well as 2 to 8 CPUs. So why not allow for big buffers and multiple processing steps? I'd be happy to give up a bit of latency in order to increase the parallel processing ability of packets travelling through the system. I could be wrong, but I imagine it would be better to treat the processing of packets as a series of stages with queues (that can grow quite large if necessary).
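Andrew's suggestion (treat packet processing as a series of stages connected by queues, trading some latency for parallelism) can be sketched in Python. This is only a structural illustration under stated assumptions: the stage functions `validate`, `lookup` and `decrement_ttl` and the dict-based "packets" are hypothetical, and CPython threads will not show a real parallel speedup because of the global interpreter lock. Each stage runs in its own thread and hands surviving packets to the next stage through a bounded queue.

```python
import queue
import threading

_STOP = object()  # sentinel pushed through the pipeline to shut it down


# Hypothetical stage functions on dict "packets"; a real forwarding path
# would validate headers, look up the route and queue for transmit.
def validate(pkt):
    return pkt if pkt["ttl"] > 1 else None  # None means "drop"


def lookup(pkt):
    pkt["out_if"] = "em1" if pkt["dst"].startswith("10.2.") else "em0"
    return pkt


def decrement_ttl(pkt):
    pkt["ttl"] -= 1
    return pkt


def stage_worker(fn, inq, outq):
    """Apply fn to each item from inq and pass survivors to outq."""
    while True:
        item = inq.get()
        if item is _STOP:
            outq.put(_STOP)  # propagate shutdown to the next stage
            return
        result = fn(item)
        if result is not None:
            outq.put(result)


def run_pipeline(packets, stages, depth=1024):
    # Bounded queues between stages give back-pressure; the last queue is
    # unbounded so the final stage never blocks while we are still feeding.
    queues = [queue.Queue(maxsize=depth) for _ in stages]
    queues.append(queue.Queue())
    workers = [
        threading.Thread(target=stage_worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()
    for pkt in packets:
        queues[0].put(pkt)
    queues[0].put(_STOP)
    out = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        out.append(item)
    for w in workers:
        w.join()
    return out


pkts = [{"dst": "10.2.2.2", "ttl": 64}, {"dst": "10.1.1.9", "ttl": 1}]
print(run_pipeline(pkts, [validate, lookup, decrement_ttl]))
# [{'dst': '10.2.2.2', 'ttl': 63, 'out_if': 'em1'}]
```

A slow stage eventually fills its input queue and stalls the stages before it instead of consuming memory without limit; per-packet latency grows with queue depth, which is the trade-off Andrew says he would accept.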
From sepherosa at gmail.com Sun Jul 6 09:10:01 2008 From: sepherosa at gmail.com (Sepherosa Ziehau) Date: Sun Jul 6 09:10:08 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486E65E6.3060301@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> Message-ID: On Sat, Jul 5, 2008 at 2:03 AM, Paul wrote: > I tried all of this :/ still, 256/512 descriptors seem to work the best. > Happy to let you log into the machine and fiddle around if you want :) I think you need to ktr the packet processing time. Standard gigabit max at ~1488095pps, which means you could be @max rate only if each packet's processing time is <= ~672ns. Since you are using fastforwarding, the calculation should be quite straightforward for you; I don't think you should expect anything beyond the calculated result. Best Regards, sephe -- Live Free or Die From linimon at FreeBSD.org Sun Jul 6 09:14:37 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Sun Jul 6 09:14:49 2008 Subject: kern/123200: [netgraph] Server failure due to netgraph mpd and dhcpclient Message-ID: <200807060914.m669EadM068115@freefall.freebsd.org> Old Synopsis: Server failure due to netgraph mpd and dhcpclient New Synopsis: [netgraph] Server failure due to netgraph mpd and dhcpclient Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Sun Jul 6 09:09:28 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). Note that some feedback has been received. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=123200 From linimon at FreeBSD.org Sun Jul 6 09:16:47 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Sun Jul 6 09:17:45 2008 Subject: kern/122685: It is not visible passing packets in tcpdump Message-ID: <200807060916.m669GkdS068323@freefall.freebsd.org> Synopsis: It is not visible passing packets in tcpdump Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Sun Jul 6 09:16:38 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=122685 From linimon at FreeBSD.org Sun Jul 6 09:18:28 2008 From: linimon at FreeBSD.org (linimon@FreeBSD.org) Date: Sun Jul 6 09:18:35 2008 Subject: kern/122145: [build] error while compiling with device ath_rate_amrr Message-ID: <200807060918.m669IRom068463@freefall.freebsd.org> Old Synopsis: error while compiling with device ath_rate_amrr New Synopsis: [build] error while compiling with device ath_rate_amrr Responsible-Changed-From-To: freebsd-net->sam Responsible-Changed-By: linimon Responsible-Changed-When: Sun Jul 6 09:17:46 UTC 2008 Responsible-Changed-Why: Over to committer for MFC reminder to 6, if desired. 
(Otherwise, just set to 'suspended') http://www.freebsd.org/cgi/query-pr.cgi?pr=122145 From rwatson at FreeBSD.org Sun Jul 6 12:45:53 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Sun Jul 6 12:46:01 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486FFF70.3090402@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> Message-ID: <20080706132148.E44832@fledge.watson.org> On Sat, 5 Jul 2008, Paul wrote: > ULE + PREEMPTION for non SMP no major differences with SMP with ULE/4BSD and > preemption ON/OFF > > 32 bit UP test coming up with new cpu and I'm installing dragonfly sometime > this weekend :] UP: 1mpps in one direction with no firewall/no routing table > is not too bad, but 1mpps both directions is the goal here 700kpps with full > bgp table in one direction is not too bad Ipfw needs a lot of work, barely > gets 500kpps with no routing table with a few ipfw rules loaded.. that's > horrible Linux barely takes a hit when you start loading iptables rules , > but then again linux has a HUGE problem with routing random packet > sources/ports .. grr My problem Is I need some box to do fast routing and > some to do firewall.. :/ I'll have 32 bit 7-stable UP test with ipfw/routing > table and then move on to dragonfly. I'll post the dragonfly results here as > well as sign up for their mailing list. 
First off, I would recommend using an 8-CURRENT kernel where possible (obviously, with all debugging features disabled), because that's where most of the work is going on right now. MFCs are scheduled for quite a bit of it, but over the course of several months, so using the 8-CURRENT kernel would allow you to help us test and exercise the new code, as well as improve our confidence in it so that it can be MFC'd in a timely manner :-). Experience suggests that forwarding workloads see significant lock contention in the routing and transmit queue code. The former needs some kernel hacking to address in order to improve parallelism for routing lookups. The latter is harder to address given the hardware you're using: modern 10gbps cards frequently offer multiple transmit queues that can be used independently (which our cxgb driver supports), but 1gbps cards generally don't. LOCK_PROFILING is an excellent tool for diagnosing locking hot spots -- it has a significant performance hit, but the results are generally accurate despite this. If your hardware supports hwpmc, that is also an excellent tool for monitoring what's going on. Seeing snapshots of, say, 10-20 seconds of profiling in the steady state would help us understand better what is going on in your environment. There's some quite interesting work going on to improve network memory allocator efficiency, but that's a bit away from being committed to 8.x as I understand it, and probably not on the 7.x merge path due to the potential disruption it could cause. There's also a patch going around to offload the em_start function to the task queue that processes its input, which significantly reduces lock contention if you have bursty transmit. I'll see if I can't dig it up. Robert N M Watson Computer Laboratory University of Cambridge > > > Bart Van Kerckhove wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Paul / Ingo, >> >>>> I tried all of this :/ still, 256/512 descriptors seem to work the >>>> best.
Happy to let you log into the machine and fiddle around if you >>>> want :) >> I've been watching this thread closely, since I'm in a very similar >> situation. >> A few questions/remarks: >> >> Does ULE provide better performance than 4BSD for forwarding? >> Did you try freebsd4 as well? This thread had a report on that, quite the >> opposite of my own experience: -4 seemed to be a lot faster at forwarding >> than anything else I've tried so far. >> Obviously the thing I'm interested in is IMIX - and 64byte packets. >> Does anyone have any benchmarks for DragonFly? I asked around on IRC, but >> neither that nor Google turned up any useful results. >> >> >>> I don't think you will be able to route 64byte packets at 1gbit >>> wirespeed (2Mpps) with a current x86 platform. >>> >> Are there actual hardware-related reasons this should not be possible, or >> is this purely a lack of dedicated work towards this goal? >> >> >> >>> There's a "sun" used at quagga dev as bgp-route-server. >>> http://quagga.net/route-server.php >>> (but they didn't answer my question regarding fw-performance). >>> >> >> >> The Quagga guys are running a Sun T1000 (niagara 1) route server - I happen >> to have the machine in my racks, >> please let me know if you want to run some tests on it, I'm sure they won't >> mind ;-) >> It should also make a great testbed for SMP performance testing imho (and >> they're pretty cheap these days) >> Also, feel free to use me as a relay for your questions, they're not always >> very reachable. >> >> >> >>> Perhaps you have some better luck at some different hardware systems >>> (ppc, mips, ..?) or use freebsd only for routing-table-updates and >>> special network-cards (netfpga) for real routing. >>> >> The netfpga site seems more or less dead - is this project still alive?
>> It does look like a very interesting idea, even though it's currently quite >> linux-centric (and according to docs doesn't have VLAN nor ip6 support, the >> former being quite a dealbreaker) >> >> Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a >> freebsd4 and/or dragonfly bench if you can..) - I appreciate all the >> information you are providing us :) >> >> Met vriendelijke groet / With kind regards, >> >> Bart Van Kerckhove >> http://friet.net/pgp.txt >> >> -----BEGIN PGP SIGNATURE----- >> >> iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi >> eca31f7WQ/oXq9tJ8TEDN3CA >> =YGYq >> -----END PGP SIGNATURE----- >> >> >> > From andre at freebsd.org Mon Jul 7 08:47:25 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 08:47:32 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080706132148.E44832@fledge.watson.org> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <20080706132148.E44832@fledge.watson.org> Message-ID: <4871D81B.8070507@freebsd.org> Robert Watson wrote: > Experience suggests that forwarding workloads see significant lock > contention in the routing and transmit queue code.
The former needs > some kernel hacking to address in order to improve parallelism for > routing lookups. The latter is harder to address given the hardware > you're using: modern 10gbps cards frequently offer multiple transmit > queues that can be used independently (which our cxgb driver supports), > but 1gbps cards generally don't. Actually the routing code is not contended. The workload in router is mostly serialized without much opportunity for contention. With many interfaces and any-to-any traffic patterns it may get some contention. The locking overhead per packet is always there and has some impact though. -- Andre From rwatson at FreeBSD.org Mon Jul 7 08:54:09 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 08:54:16 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4871D81B.8070507@freebsd.org> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <20080706132148.E44832@fledge.watson.org> <4871D81B.8070507@freebsd.org> Message-ID: <20080707095013.N63144@fledge.watson.org> On Mon, 7 Jul 2008, Andre Oppermann wrote: > Robert Watson wrote: >> Experience suggests that forwarding workloads see significant lock >> contention in the routing and transmit queue code. The former needs some >> kernel hacking to address in order to improve parallelism for routing >> lookups. 
The latter is harder to address given the hardware you're using: >> modern 10gbps cards frequently offer multiple transmit queues that can be >> used independently (which our cxgb driver supports), but 1gbps cards >> generally don't. > > Actually the routing code is not contended. The workload in router is > mostly serialized without much opportunity for contention. With many > interfaces and any-to-any traffic patterns it may get some contention. The > locking overhead per packet is always there and has some impact though. Yes, I don't see any real sources of contention until we reach the output code, which will run in the input if_em taskqueue threads, as the input path generates little or no contention if the packets are not destined for local delivery. I was a little concerned about mention of degrading performance as firewall complexity grows -- I suspect there's a nice project for someone to do looking at why this is the case. I was under the impression that, in 7.x and later, we use rwlocks to protect firewall state, and that unless stateful firewall rules are used, these are locked read-only rather than writable...
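Robert's point about rwlocks is that stateless rule evaluation only needs shared (read) access, so forwarding threads on different CPUs need not serialize on the firewall lock; only rule changes need exclusive access. The Python sketch below illustrates that pattern with a minimal reader-preferring lock built on `threading.Condition`. It is not FreeBSD's rwlock(9) and not ipfw's code; `RULES`, `check_packet` and `replace_rules` are hypothetical names, and the "rules" are simple first-match source-prefix checks.

```python
import threading


class RWLock:
    """Minimal reader-preferring reader/writer lock (illustration only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # threads currently holding the read lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers


RULES_LOCK = RWLock()
# Hypothetical stateless rule list: (action, source-address prefix).
RULES = [("deny", "10.9."), ("allow", "")]


def check_packet(src):
    """Stateless evaluation only reads RULES, so it takes the shared lock."""
    RULES_LOCK.acquire_read()
    try:
        for action, prefix in RULES:
            if src.startswith(prefix):
                return action  # first matching rule wins
        return "deny"  # default policy if no rule matches
    finally:
        RULES_LOCK.release_read()


def replace_rules(new_rules):
    """Rule changes are rare and take the lock exclusively."""
    RULES_LOCK.acquire_write()
    try:
        RULES[:] = new_rules
    finally:
        RULES_LOCK.release_write()


print(check_packet("10.9.1.1"), check_packet("10.1.1.2"))  # deny allow
```

A reader-preferring lock can starve writers under constant read load; that is acceptable for a sketch, but it is one of the things a production lock implementation has to deal with.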
Robert N M Watson Computer Laboratory University of Cambridge From andre at freebsd.org Mon Jul 7 09:02:08 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 09:02:14 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> Message-ID: <4871DB8E.5070903@freebsd.org> Ingo Flaschberger wrote: > Dear Paul, > >> I tried all of this :/ still, 256/512 descriptors seem to work the best. >> Happy to let you log into the machine and fiddle around if you want :) > > yes, but I'm sure I will also not be able to achieve much more pps. > As it seems that you hit hardware-software-level-barriers, my only idea > is to test dragonfly bsd, which seems to have less software overhead. I tested DragonFly some time ago with an Agilent N2X tester and it was by far the slowest of the pack. > I don't think you will be able to route 64byte packets at 1gbit > wirespeed (2Mpps) with a current x86 platform. You have to take the inter-frame gap and other overheads into account too. That gives about 1.244Mpps max on a 1GigE interface. In general the chipsets and buses are able to transfer quite a bit of data. On a dual-opteron 848 I was able to sink 2.5Mpps into the machine with "ifconfig em[01] monitor" without hitting the cpu ceiling. This means that the bus and interrupt handling is not where most of the time is spent.
When I did my profiling the saturation point was the cache miss penalty for accessing the packet headers. At saturation point about 50% of the time was spent waiting for the memory to make its way into the CPU. > I hoped to reach 1Mpps with the hardware I mentioned some mails before, > but 2Mpps is far far away. > Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium. This is more or less expected. PCI32 is not able to sustain high packet rates. The bus setup times kill the speed. For larger packets the ratio gets much better and some reasonable throughput can be achieved. > Perhaps you have some better luck at some different hardware systems > (ppc, mips, ..?) or use freebsd only for routing-table-updates and > special network-cards (netfpga) for real routing. NetFPGA doesn't have enough TCAM space to be useful for real routing (as in Internet sized routing table). The trick many embedded networking CPUs use is cache prefetching that is integrated with the network controller. The first 64-128bytes of every packet are transferred automatically into the L2 cache by the hardware. This allows relatively slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz Freescale 7448 in NPE-G2) to get more than 1Mpps. Until something like this is possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM speed. 
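A back-of-the-envelope model helps put these numbers together. The usual calculation for minimum-size 64-byte frames counts 20 extra bytes per frame on the wire (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap), which gives roughly 1.488 Mpps on 1 GbE, the same ~1488095pps figure Sepherosa quoted, and a budget of about 672 ns per packet. The sketch below reproduces that arithmetic and then applies an Amdahl-style estimate to Andre's observation that about half the time was spent waiting on memory: hiding those stalls (for example with header prefetch like the embedded CPUs he describes) can at most double the per-core rate. The 1000 ns per-packet cost in the example is a hypothetical input, not a measurement from this thread.

```python
# Ethernet adds 8 bytes of preamble/SFD and a 12-byte minimum inter-frame
# gap to every frame on the wire.
PREAMBLE_SFD_BYTES = 8
IFG_BYTES = 12


def max_pps(link_bps, frame_bytes=64):
    """Theoretical frames per second at line rate for a given frame size
    (frame_bytes includes the Ethernet header and FCS)."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + IFG_BYTES) * 8
    return link_bps / wire_bits


def budget_ns(pps):
    """Nanoseconds available per packet at a given packet rate."""
    return 1e9 / pps


def pps_with_stalls_hidden(ns_per_packet, stall_fraction, hidden_fraction):
    """Amdahl-style estimate: throughput if `hidden_fraction` of the memory
    stall time (itself `stall_fraction` of the per-packet cost) disappears."""
    remaining_ns = ns_per_packet * (1 - stall_fraction * hidden_fraction)
    return 1e9 / remaining_ns


line_rate = max_pps(1e9)
print(round(line_rate), round(budget_ns(line_rate)))  # 1488095 672
# Hypothetical 1000 ns/packet path with half its time in cache-miss stalls:
print(round(pps_with_stalls_hidden(1000, 0.5, 1.0)))  # 2000000
print(round(pps_with_stalls_hidden(1000, 0.5, 0.5)))  # 1333333
```

Real forwarding cost also includes the transmit-side work discussed later in the thread, so the doubling is an upper bound rather than an expected gain.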
-- Andre From andre at freebsd.org Mon Jul 7 09:11:39 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 09:11:46 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707095013.N63144@fledge.watson.org> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <20080706132148.E44832@fledge.watson.org> <4871D81B.8070507@freebsd.org> <20080707095013.N63144@fledge.watson.org> Message-ID: <4871DDC9.6060706@freebsd.org> Robert Watson wrote: > > On Mon, 7 Jul 2008, Andre Oppermann wrote: > >> Robert Watson wrote: >>> Experience suggests that forwarding workloads see significant lock >>> contention in the routing and transmit queue code. The former needs >>> some kernel hacking to address in order to improve parallelism for >>> routing lookups. The latter is harder to address given the hardware >>> you're using: modern 10gbps cards frequently offer multiple transmit >>> queues that can be used independently (which our cxgb driver >>> supports), but 1gbps cards generally don't. >> >> Actually the routing code is not contended. The workload in router is >> mostly serialized without much opportunity for contention. With many >> interfaces and any-to-any traffic patterns it may get some >> contention. The locking overhead per packet is always there and has >> some impact though. 
> > Yes, I don't see any real sources of contention until we reach the > output code, which will run in the input if_em taskqueue threads, as the > input path generates little or no contention if the packets are not > destined for local delivery. I was a little concerned about mention of The interface output was the second largest block after the cache misses IIRC. Compared to the interface input path, the output path seems to have received only moderate attention and little detailed performance analysis. Most network drivers do a write to the hardware for every packet sent, in addition to other overhead that may be necessary for their transmit DMA rings. That adds significant overhead compared to the RX path, where those costs are amortized over a larger number of packets. > degrading performance as firewall complexity grows -- I suspect there's > a nice project for someone to do looking at why this is the case. I was > under the impression that, in 7.x and later, we use rwlocks to protect > firewall state, and that unless stateful firewall rules are used, these > are locked read-only rather than writable... Just looking at the packet (twice) in ipfw or other firewall packages is a huge overhead. The main loop of ipfw is a very large block of code. Unless one compiles the firewall rules to native machine code, there is not much that can be done. With LLVM we will see some very interesting opportunities in that area. Other than that, the ipfw instruction overhead per rule seems to be quite close to the optimum. I'm not saying one shouldn't take a close look with a profiler to verify this is actually the case.
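Andre's contrast between the transmit and receive paths comes down to how often the driver touches the hardware. The sketch below counts tail-register ("doorbell") writes when a driver notifies the NIC once per packet versus once per batch of queued packets. The 200 ns cost of an uncached register write is a hypothetical placeholder, not a measured em(4) figure, and the function names are illustrative.

```python
import math


def doorbell_writes(n_packets, batch):
    """Tail-register writes needed to hand n_packets to the NIC when the
    driver notifies the hardware once per `batch` queued packets."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    return math.ceil(n_packets / batch)


def doorbell_ns_per_packet(write_cost_ns, batch):
    """Doorbell cost amortized over each packet in a full batch."""
    return write_cost_ns / batch


WRITE_COST_NS = 200  # hypothetical cost of one uncached register write

for batch in (1, 8, 32):
    print(batch, doorbell_writes(1_000_000, batch),
          doorbell_ns_per_packet(WRITE_COST_NS, batch))
# 1 1000000 200.0
# 8 125000 25.0
# 32 31250 6.25
```

Batching only pays off if the driver waits for several packets before ringing the doorbell, which adds latency under light load; under a sustained forwarding load the batch fills quickly, much as Andre describes the RX path amortizing its costs.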
-- Andre From andre at freebsd.org Mon Jul 7 09:47:06 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 09:47:14 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48701921.7090107@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr><4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> Message-ID: <4871E618.1080500@freebsd.org> Paul, to get a systematic analysis of the performance, please do the following tests and put them into a table for easy comparison:
1. inbound pps w/o loss with interface in monitor mode (ifconfig em0 monitor)
2. inbound pps w/ fastforward into a single blackhole route
3. inbound pps w/ fastforward into a single blackhole route w/ ipfw and just one allow all rule
4. inbound pps w/ fastforward into a single blackhole route w/ ipfw and just one deny all rule
5. inbound pps w/ fastforward into the disc(4) discard network interface
6. inbound pps w/ fastforward into the disc(4) discard network interface w/ ipfw and just one allow all rule
All surrounding parameters like RX/TX interface queue length, scheduler and so on may be varied but should be noted. -- Andre Paul wrote: > UP 32 bit test vs 64 bit: > negligible difference in forwarding performance without polling > slightly better polling performance but still errors at lower packet rates > same massive hit with ipfw loaded > > Installing dragonfly in a bit..
> If anyone has a really fast PPC type system or SUN or something I'd love > to try it :) > Something with a really big L1 cache :P > > > Paul wrote: >> ULE + PREEMPTION for non SMP >> no major differences with SMP with ULE/4BSD and preemption ON/OFF >> >> 32 bit UP test coming up with new cpu >> and I'm installing dragonfly sometime this weekend :] >> UP: 1mpps in one direction with no firewall/no routing table is not >> too bad, but 1mpps both directions is the goal here >> 700kpps with full bgp table in one direction is not too bad >> Ipfw needs a lot of work, barely gets 500kpps with no routing table >> with a few ipfw rules loaded.. that's horrible >> Linux barely takes a hit when you start loading iptables rules, but >> then again linux has a HUGE problem with routing >> random packet sources/ports .. grr >> My problem is I need some box to do fast routing and some to do >> firewall.. :/ >> I'll have 32 bit 7-stable UP test with ipfw/routing table and then >> move on to dragonfly. >> I'll post the dragonfly results here as well as sign up for their >> mailing list. >> >> >> Bart Van Kerckhove wrote: >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> Paul / Ingo, >>> >>>>> I tried all of this :/ still, 256/512 descriptors seem to work the >>>>> best. Happy to let you log into the machine and fiddle around if you >>>>> want :) >>> I've been watching this thread closely, since I'm in a very similar >>> situation. >>> A few questions/remarks: >>> >>> Does ULE provide better performance than 4BSD for forwarding? >>> Did you try freebsd4 as well? This thread had a report on that, quite >>> the opposite of my own experience: -4 seemed to be a lot faster at >>> forwarding >>> than anything else I've tried so far. >>> Obviously the thing I'm interested in is IMIX - and 64byte packets. >>> Does anyone have any benchmarks for DragonFly? I asked around on IRC, >>> but >>> neither that nor Google turned up any useful results.
>>> >>> >>>> I don't think you will be able to route 64byte packets at 1gbit >>>> wirespeed (2Mpps) with a current x86 platform. >>>> >>> Are there actual hardware related reasons this should not be >>> possible, or >>> is this purely lack of dedicated work towards this goal? >>> >>> >>> >>>> Theres a "sun" used at quagga dev as bgp-route-server. >>>> http://quagga.net/route-server.php >>>> (but they don't answered my question regarding fw-performance). >>>> >>> >>> >>> the Quagga guys are running a sun T1000 (niagara 1) route server - I >>> happen >>> to have the machine in my racks, >>> please let me know if you want to run some tests on it, I'm sure they >>> won't >>> mind ;-) >>> It should also make a great testbed for SMP performance testing imho >>> (and >>> they're pretty cheap these days) >>> Also, feel free to use me as a relay for your questions, they're not >>> always >>> very reachable. >>> >>> >>> >>>> Perhaps you have some better luck at some different hardware systems >>>> (ppc, mips, ..?) or use freebsd only for routing-table-updates and >>>> special network-cards (netfpga) for real routing. >>>> >>> The netfpga site seems more or less dead - is this project still alive? >>> It does look like a very interesting idea, even though it's currently >>> quite >>> linux-centric (and according to docs doesn't have VLAN nor ip6 >>> support, the >>> former being quite a dealbreaker) >>> >>> Paul: I'm looking forward to the C2D 32bit benchmarks (maybe throw in a >>> freebsd4 and/or dragonfly bench if you can..) 
- appreciate the lots of >>> information you are providing us :) >>> >>> Met vriendelijke groet / With kind regards, >>> >>> Bart Van Kerckhove >>> http://friet.net/pgp.txt >>> >>> -----BEGIN PGP SIGNATURE----- >>> >>> iQA/AwUBSG/tMgoIFchBM0BKEQKUSQCcCJqsw2wtUX7HQi050HEDYX3WPuMAnjmi >>> eca31f7WQ/oXq9tJ8TEDN3CA >>> =YGYq >>> -----END PGP SIGNATURE----- >>> >>> >>> >> >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > >

From andre at freebsd.org Mon Jul 7 09:56:46 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 09:56:53 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <486B41D5.3060609@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> Message-ID: <4871E85C.8090907@freebsd.org>

Paul wrote:
> SMP DISABLED on my Opteron 2212 (ULE, Preemption on)
> Yields ~750kpps in em0 and out em1 (one direction)
> I am miffed why this yields more pps than
> a) with all 4 cpus running and b) 4 cpus with lagg load balanced over 3
> incoming connections so 3 taskq threads

SMP adds quite some overhead in the generic case and is currently not well suited for high-performance packet
forwarding. On SMP interrupts are delivered to one CPU but not necessarily the one that will later on handle the taskqueue to process the packets. That adds overhead. Ideally the interrupt for each network interface is bound to exactly one pre-determined CPU and the taskqueue is bound to the same CPU. That way the overhead for interrupt and taskqueue scheduling can be kept at a minimum. Most of the infrastructure to do this binding already exists in the kernel but is not yet exposed to the outside for us to make use of it. I'm also not sure if the ULE scheduler skips the more global locks when interrupt and the thread are on the same CPU. Distributing the interrupts and taskqueues among the available CPUs gives concurrent forwarding with bi- or multi-directional traffic. All incoming traffic from any particular interface is still serialized though. -- Andre > I would be willing to set up test equipment (several servers plugged > into a switch) with ipkvm and power port access > if someone or a group of people want to figure out ways to improve the > routing process, ipfw, and lagg. > > Maximum PPS with one ipfw rule on UP: > tops out about 570Kpps.. almost 200kpps lower ? (frown) > > I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's in > here and see how that scales, using UP same kernel etc I have now. > > > > > > Julian Elischer wrote: >> Paul wrote: >>> ULE without PREEMPTION is now yeilding better results. >>> input (em0) output >>> packets errs bytes packets errs bytes colls >>> 571595 40639 34564108 1 0 226 0 >>> 577892 48865 34941908 1 0 178 0 >>> 545240 84744 32966404 1 0 178 0 >>> 587661 44691 35534512 1 0 178 0 >>> 587839 38073 35544904 1 0 178 0 >>> 587787 43556 35540360 1 0 178 0 >>> 540786 39492 32712746 1 0 178 0 >>> 572071 55797 34595650 1 0 178 0 >>> >>> *OUCH, IPFW HURTS.. >>> loading ipfw, and adding one ipfw rule allow ip from any to any drops >>> 100Kpps off :/ what's up with THAT? 
>>> unloaded ipfw module and back 100kpps more again, that's not right >>> with ONE rule.. :/ >> >> ipfw need sto gain a lock on hte firewall before running, >> and is quite complex.. I can believe it.. >> >> in FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two >> interfaces (bridged) but I think it has slowed down since then due to >> the SMP locking. >> >> >>> >>> em0 taskq is still jumping cpus.. is there any way to lock it to one >>> cpu or is this just a function of ULE >>> >>> running a tar czpvf all.tgz * and seeing if pps changes.. >>> negligible.. guess scheduler is doing it's job at least.. >>> >>> Hmm. even when it's getting 50-60k errors per second on the interface >>> I can still SCP a file through that interface although it's not >>> fast.. 3-4MB/s.. >>> >>> You know, I wouldn't care if it added 5ms latency to the packets when >>> it was doing 1mpps as long as it didn't drop any.. Why can't it do >>> that? Queue them up and do them in bigggg chunks so none are >>> dropped........hmm? >>> >>> 32 bit system is compiling now.. won't do > 400kpps with GENERIC >>> kernel, as with 64 bit did 450k with GENERIC, although that could be >>> the difference between opteron 270 and opteron 2212.. 
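The netstat samples and pps figures quoted in this thread can be turned into per-packet numbers, which makes the "one ipfw rule costs 100kpps" observation easier to reason about. This is a minimal sketch, not a measurement tool: it assumes the `errs` column counts packets dropped on input and that offered load is roughly received + errs. The sample values are copied from the "ULE without PREEMPTION" netstat output, and the 750/570 kpps pair is Paul's UP figure without and with one ipfw rule; the function names are hypothetical.

```python
"""Per-packet arithmetic for the netstat samples and ipfw figures in this thread."""

# (input packets, input errs) per 1-second sample, from the
# "ULE without PREEMPTION" netstat output quoted above.
SAMPLES = [
    (571595, 40639),
    (577892, 48865),
    (545240, 84744),
    (587661, 44691),
    (587839, 38073),
    (587787, 43556),
    (540786, 39492),
    (572071, 55797),
]


def summarize(samples):
    """Mean received pps, mean errs/s, and errs as a fraction of offered load.

    Assumes offered load ~= received packets + input errors.
    """
    n = len(samples)
    received = sum(p for p, _ in samples)
    errs = sum(e for _, e in samples)
    return received / n, errs / n, errs / (received + errs)


def ns_per_packet(pps):
    """Time available per packet at a given rate, in nanoseconds."""
    return 1e9 / pps


def added_cost_ns(pps_before, pps_after):
    """Extra per-packet time implied by a throughput drop (e.g. after loading ipfw)."""
    return ns_per_packet(pps_after) - ns_per_packet(pps_before)


if __name__ == "__main__":
    mean_rx, mean_err, drop = summarize(SAMPLES)
    print(f"received {mean_rx:,.0f} pps, errs {mean_err:,.0f}/s ({drop:.1%} of offered)")
    # UP figures from the thread: ~750 kpps without ipfw, ~570 kpps with one rule.
    print(f"ipfw with one rule: ~{added_cost_ns(750_000, 570_000):.0f} ns extra per packet")
```

Framed this way, a drop from ~750 kpps to ~570 kpps means roughly 420 ns of extra work per packet, which is a more useful number to compare against lock and cache-miss costs than the raw pps difference.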
>>> >>> Paul >>> >>> _______________________________________________ >>> freebsd-net@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-net >>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> >> > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > From brde at optusnet.com.au Mon Jul 7 10:00:22 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jul 7 10:00:28 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4871DB8E.5070903@freebsd.org> References: <4867420D.7090406@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> Message-ID: <20080707191918.B4703@besplex.bde.org> On Mon, 7 Jul 2008, Andre Oppermann wrote: > Ingo Flaschberger wrote: >> I don't think you will be able to route 64byte packets at 1gbit wirespeed >> (2Mpps) with a current x86 platform. > > You have to take inter-frame gap and other overheads too. That gives > about 1.244Mpps max on a 1GigE interface. What are the other overheads? I calculate 1.644Mpps counting the inter-frame gap, with 64-byte packets and 64-header_size payloads. If the 64 bytes is for the payload, then the max is much lower. >> I hoped to reach 1Mpps with the hardware I mentioned some mails before, but >> 2Mpps is far far away. 
>> Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium. > > This is more or less expected. PCI32 is not able to sustain high > packet rates. The bus setup times kill the speed. For larger packets > the ratio gets much better and some reasonable throughput can be achieved. I get about 640 kpps without forwarding (sendto: slightly faster; recvfrom: slightly slower) on a 2.2GHz A64. Underclocking the memory from 200MHz to 100MHz only reduces the speed by about 10%, while not overclocking the CPU by 10% reduces the speed by the same 10%, so the system is apparently still mainly CPU-bound. > NetFPGA doesn't have enough TCAM space to be useful for real routing > (as in Internet sized routing table). The trick many embedded networking > CPUs use is cache prefetching that is integrated with the network > controller. The first 64-128bytes of every packet are transferred > automatically into the L2 cache by the hardware. This allows relatively > slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz Freescale > 7448 in NPE-G2) to get more than 1Mpps. Until something like this is > possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM speed. Does using fa$ter memory (speed and/or latency) help here? 64 bytes is so small that latency may be more of a problem, especially without a prefetch. 
Bruce From rwatson at FreeBSD.org Mon Jul 7 10:48:39 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 10:48:46 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4871E85C.8090907@freebsd.org> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> Message-ID: <20080707114538.K63144@fledge.watson.org> On Mon, 7 Jul 2008, Andre Oppermann wrote: > Distributing the interrupts and taskqueues among the available CPUs gives > concurrent forwarding with bi- or multi-directional traffic. All incoming > traffic from any particular interface is still serialized though. ... although not on multiple input queue-enabled hardware and drivers. While I've really only focused on local traffic performance with my 10gbps Chelsio setup, it should be possible to do packet forwarding from multiple input queues using that hardware and driver today. I'll update the netisr2 patches, which allow work to be pushed to multiple CPUs from a single input queue. However, these necessarily take a cache miss or two on packet header data in order to break out the packets from the input queue into flows that can be processed independently without ordering constraints, so if those cache misses on header data are a big part of the performance of a configuration, load balancing in this manner may not help. 
What would be neat is if the cards without multiple input queues could still tag receive descriptors with a flow identifier generated from the IP/TCP/etc layers that could be used for work placement. Robert N M Watson Computer Laboratory University of Cambridge > > -- > Andre > >> I would be willing to set up test equipment (several servers plugged into a >> switch) with ipkvm and power port access >> if someone or a group of people want to figure out ways to improve the >> routing process, ipfw, and lagg. >> >> Maximum PPS with one ipfw rule on UP: >> tops out about 570Kpps.. almost 200kpps lower ? (frown) >> >> I'm going to drop in a 3ghz opteron instead of the 2ghz 2212 that's in here >> and see how that scales, using UP same kernel etc I have now. >> >> >> >> >> >> Julian Elischer wrote: >>> Paul wrote: >>>> ULE without PREEMPTION is now yeilding better results. >>>> input (em0) output >>>> packets errs bytes packets errs bytes colls >>>> 571595 40639 34564108 1 0 226 0 >>>> 577892 48865 34941908 1 0 178 0 >>>> 545240 84744 32966404 1 0 178 0 >>>> 587661 44691 35534512 1 0 178 0 >>>> 587839 38073 35544904 1 0 178 0 >>>> 587787 43556 35540360 1 0 178 0 >>>> 540786 39492 32712746 1 0 178 0 >>>> 572071 55797 34595650 1 0 178 0 >>>> *OUCH, IPFW HURTS.. >>>> loading ipfw, and adding one ipfw rule allow ip from any to any drops >>>> 100Kpps off :/ what's up with THAT? >>>> unloaded ipfw module and back 100kpps more again, that's not right with >>>> ONE rule.. :/ >>> >>> ipfw need sto gain a lock on hte firewall before running, >>> and is quite complex.. I can believe it.. >>> >>> in FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two >>> interfaces (bridged) but I think it has slowed down since then due to the >>> SMP locking. >>> >>> >>>> >>>> em0 taskq is still jumping cpus.. is there any way to lock it to one cpu >>>> or is this just a function of ULE >>>> >>>> running a tar czpvf all.tgz * and seeing if pps changes.. >>>> negligible.. 
guess scheduler is doing it's job at least.. >>>> >>>> Hmm. even when it's getting 50-60k errors per second on the interface I >>>> can still SCP a file through that interface although it's not fast.. >>>> 3-4MB/s.. >>>> >>>> You know, I wouldn't care if it added 5ms latency to the packets when it >>>> was doing 1mpps as long as it didn't drop any.. Why can't it do that? >>>> Queue them up and do them in bigggg chunks so none are >>>> dropped........hmm? >>>> >>>> 32 bit system is compiling now.. won't do > 400kpps with GENERIC kernel, >>>> as with 64 bit did 450k with GENERIC, although that could be >>>> the difference between opteron 270 and opteron 2212.. >>>> >>>> Paul >>>> >>>> _______________________________________________ >>>> freebsd-net@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net >>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >>> >>> >> >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> >> > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >

From bugmaster at FreeBSD.org Mon Jul 7 11:07:03 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jul 7 11:08:44 2008 Subject: Current problem reports assigned to freebsd-net@FreeBSD.org Message-ID: <200807071107.m67B72MO062112@freefall.freebsd.org>

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/27474   net   [ipf] [ppp] Interactive use of user PPP and ipfilter c
o kern/35442   net   [sis] [patch] Problem transmitting runts in if_sis dri
a kern/38554   net   changing interface ipaddress doesn't seem to work
s kern/39937   net   ipstealth issue
s kern/77195   net   [ipf] [patch] ipfilter ioctl SIOCGNATL does not match
o kern/79895   net   [ipf] 5.4-RC2 breaks ipfilter NAT when using netgraph
s kern/81147   net   [net] [patch] em0 reinitialization while adding aliase
o kern/86103   net   [ipf] Illegal NAT Traversal in IPFilter
s kern/86920   net   [ndis] ifconfig: SIOCS80211: Invalid argument [regress
o kern/87521   net   [ipf] [panic] using ipfilter "auth" keyword leads to k
o kern/92090   net   [bge] bge0: watchdog timeout -- resetting
f kern/92552   net   A serious bug in most network drivers from 5.X to 6.X
o kern/95288   net   [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr
o kern/98978   net   [ipf] [patch] ipfilter drops OOW packets under 6.1-Rel
o kern/101948  net   [ipf] [panic] Kernel Panic Trap No 12 Page Fault - cau
f kern/102344  net   [ipf] Some packets do not pass through network interfa
o bin/105925   net   problems with ifconfig(8) and vlan(4) [regression]
s kern/105943  net   Network stack may modify read-only mbuf chain copies
o kern/106316  net   [dummynet] dummynet with multipass ipfw drops packets
o kern/106438  net   [ipf] ipfilter: keep state does not seem to allow repl
o kern/108542  net   [bce]: Huge network latencies with 6.2-RELEASE / STABL
o bin/108895   net   pppd(8): PPPoE dead connections on 6.2 [regression]
o kern/109308  net   [pppd] [panic] Multiple panics kernel ppp suspected [r
o kern/109733  net   [bge] bge link state issues [regression]
o kern/112528  net   [nfs] NFS over TCP under load hangs with "impossible p
o kern/112686  net   [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o kern/112722  net   [udp] IP v4 udp fragmented packet reject
o kern/113842  net   [ip6] PF_INET6 proto domain state can't be cleared wit
o kern/114714  net   [gre][patch] gre(4) is not MPSAFE and does not support
o kern/114839  net   [fxp] fxp looses ability to speak with traffic
o kern/115239  net   [ipnat] panic with 'kmem_map too small' using ipnat
o kern/116077  net   [ip] [patch] 6.2-STABLE panic during use of multi-cast
o kern/116185  net   [iwi] if_iwi driver leads system to reboot
o kern/116328  net   [bge]: Solid hang with bge interface
o kern/116747  net   [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile
o kern/116837  net   [tun] [panic] [patch] ifconfig tunX destroy: panic
o kern/117043  net   [em] Intel PWLA8492MT Dual-Port Network adapter EEPROM
o kern/117271  net   [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap
o kern/117423  net   [vlan] Duplicate IP on different interfaces
o kern/117448  net   [carp] 6.2 kernel crash [regression]
o kern/118880  net   [ip6] IP_RECVDSTADDR & IP_SENDSRCADDR not implemented
o kern/119225  net   [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr
o kern/119345  net   [ath] Unsuported Atheros 5424/2424 and CPU speedstep n
o kern/119361  net   [bge] bge(4) transmit performance problem
o kern/119945  net   [rum] [panic] rum device in hostap mode, cause kernel
o kern/120130  net   [carp] [panic] carp causes kernel panics in any conste
o kern/120266  net   [panic] gnugk causes kernel panic when closing UDP soc
o kern/120304  net   [netgraph] [patch] netgraph source assumes 32-bit time
o kern/120966  net   [rum] kernel panic with if_rum and WPA encryption
o kern/121080  net   [bge] IPv6 NUD problem on multi address config on bge0
o kern/121181  net   [panic] Fatal trap 3: breakpoint instruction fault whi
o kern/121298  net   [em] [panic] Fatal trap 12: page fault while in kernel
o kern/121437  net   [vlan] Routing to layer-2 address does not work on VLA
o kern/121555  net   [panic] Fatal trap 12: current process = 12 (swi1: net
o kern/121624  net   [em] [regression] Intel em WOL fails after upgrade to
o kern/121872  net   [wpi] driver fails to attach on a fujitsu-siemens s711
o kern/121983  net   [fxp] fxp0 MBUF and PAE
o kern/122033  net   [ral] [lor] Lock order reversal in ral0 at bootup [reg
o kern/122058  net   [em] [panic] Panic on em1: taskq
o kern/122082  net   [in_pcb] NULL pointer dereference in in_pcbdrop
o kern/122195  net   [ed] Alignment problems in if_ed
f kern/122252  net   [ipmi] [bge] IPMI problem with BCM5704 (does not work
o kern/122290  net   [netgraph] [panic] Netgraph related "kmem_map too smal
o kern/122427  net   [apm] [panic] apm and mDNSResponder cause panic during
o kern/122551  net   [bge] Broadcom 5715S no carrier on HP BL460c blade usi
o kern/122685  net   It is not visible passing packets in tcpdump
o kern/122743  net   [panic] vm_page_unwire: invalid wire count: 0
o kern/122772  net   [em] em0 taskq panic, tcp reassembly bug causes radix
f kern/122794  net   [lagg] Kernel panic after brings lagg(8) up if NICs ar
f conf/122858  net   [nsswitch.conf] nsswitch in 7.0 is f*cked up
o kern/122954  net   [lagg] IPv6 EUI64 incorrectly chosen for lagg devices
o kern/122989  net   [swi] [panic] 6.3 kernel panic in swi1: net
o kern/123066  net   [ipsec] [panic] kernel trap with ipsec
o kern/123160  net   [ip] Panic and reboot at sysctl kern.polling.enable=0
f kern/123172  net   [bce] Watchdog timeout problems with if_bce
f kern/123200  net   [netgraph] Server failure due to netgraph mpd and dhcp
o conf/123330  net   [nsswitch.conf] Enabling samba wins in nsswitch.conf c
o kern/123347  net   [bge] bge1: watchdog timeout -- linkstate changed to D
o kern/123429  net   [nfe] [hang] "ifconfig nfe up" causes a hard system lo
o kern/123463  net   [ipsec] [panic] repeatable crash related to ipsec-tool
o bin/123465   net   [ip6] route(8): route add -inet6 -interfac
o kern/123559  net   [iwi] iwi periodically disassociates/associates [regre
o kern/123603  net   [tcp] tcp_do_segment and Received duplicate SYN
o kern/123617  net   [tcp] breaking connection when client downloading file
o bin/123633   net   ifconfig(8) doesn't set inet and ether address in one
o kern/123796  net   [ipf] FreeBSD 6.1+VPN+ipnat+ipf: port mapping does not
o kern/123881  net   [tcp] Turning on TCP blackholing causes slow localhost
o kern/123968  net   [rum] [panic] rum driver causes kernel panic with WPA.
o kern/124021  net   [ip6] [panic] page fault in nd6_output()
o kern/124127  net   [msk] watchdog timeout (missed Tx interrupts) -- recov
o kern/124753  net   [ieee80211] net80211 discards power-save queue packets
o kern/124904  net   [fxp] EEPROM corruption with Compaq NC3163 NIC
o kern/125079  net   [ppp] host routes added by ppp with gateway flag (regr
o kern/125195  net   [fxp] fxp(4) driver failed to initialize device Intel

94 problems total.

Non-critical problems

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o conf/23063   net   [PATCH] for static ARP tables in rc.network
o kern/34665   net   [ipf] [hang] ipfilter rcmd proxy "hangs".
s bin/41647    net   ifconfig(8) doesn't accept lladdr along with inet addr
o kern/54383   net   [nfs] [patch] NFS root configurations without dynamic
s kern/60293   net   FreeBSD arp poison patch
o kern/64556   net   [sis] if_sis short cable fix problems with NetGear FA3
o kern/70904   net   [ipf] ipfilter ipnat problem with h323 proxy support
o kern/77273   net   [ipf] ipfilter breaks ipv6 statefull filtering on 5.3
o kern/77913   net   [wi] [patch] Add the APDL-325 WLAN pccard to wi(4)
o kern/78090   net   [ipf] ipf filtering on bridged packets doesn't work if
o bin/79228    net   [patch] extend arp(8) to be able to create blackhole r
o kern/91594   net   [em] FreeBSD > 5.4 w/ACPI fails to detect Intel Pro/10
s kern/91777   net   [ipf] [patch] wrong behaviour with skip rule inside an
o kern/93378   net   [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo
o kern/95267   net   packet drops periodically appear
o kern/95277   net   [netinet] [patch] IP Encapsulation mask_match() return
o kern/100519  net   [netisr] suggestion to fix suboptimal network polling
o kern/102035  net   [plip] plip networking disables parallel port printing
o conf/102502  net   [patch] ifconfig name does't rename netgraph node in n
o conf/107035  net   [patch] bridge interface given in rc.conf not taking a
o kern/109470  net   [wi] Orinoco Classic Gold PC Card Can't Channel Hop
o kern/112179  net   [sis] [patch] sis driver for natsemi DP83815D autonego
o bin/112557   net   [patch] ppp(8) lock file should not use symlink name
o kern/114915  net   [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o bin/116643   net   [patch] [request] fstat(1): add INET/INET6 socket deta
o bin/117339   net   [patch] route(8): loading routing management commands
o kern/118727  net   [netgraph] [patch] [request] add new ng_pf module
a kern/118879  net   [bge] [patch] bge has checksum problems on the 5703 ch
o bin/118987   net   ifconfig(8): ifconfig -l (address_family) does not wor
o kern/119432  net   [arp] route add -host -iface causes arp e
f kern/119516  net   [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi
o kern/119617  net   [nfs] nfs error on wpa network when reseting/shutdown
o kern/119791  net   [nfs] UDP NFS mount of aliased IP addresses from a Sol
o kern/120232  net   [nfe] [patch] Bring in nfe(4) to RELENG_6
o kern/120566  net   [request]: ifconfig(8) make order of arguments more fr
o kern/121257  net   [tcp] TSO + natd -> slow outgoing tcp traffic
o kern/121443  net   [gif] LOR icmp6_input/nd6_lookup
o kern/121706  net   [netinet] [patch] "rtfree: 0xc4383870 has 1 refs" emit
s kern/121774  net   [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122068  net   [ppp] ppp can not set the correct interface with pptpd
o kern/122295  net   [bge] bge Ierr rate increase (since 6.0R) [regression]
o kern/122319  net   [wi] imposible to enable ad-hoc demo mode with Orinoco
o kern/122697  net   [ath] Atheros card is not well supported
o kern/122780  net   [lagg] tcpdump on lagg interface during high pps wedge
f kern/122839  net   [multicast] FreeBSD 7 multicast routing problem
o kern/122928  net   [em] interface watchdog timeouts and stops receiving p
o kern/123892  net   [tap] [patch] No buffer space available
p kern/123961  net   [vr] [patch] Allow vr interface to handle vlans
o bin/124004   net   ifconfig(8): Cannot assign both an IP and a MAC addres
o kern/124160  net   [libc] connect(2) function loops indefinitely
o kern/124341  net   [ral] promiscuous mode for wireless device ral0 looses
o kern/124609  net   [ipsec] [panic] ipsec 'remainder too big' panic with p
o kern/124767  net   [iwi] Wireless connection using iwi0 driver (Intel 220
o kern/125003  net   [gif] incorrect EtherIP header format.
o kern/125239  net   [gre] kernel crash when using gre
o kern/125258  net   [socket] socket's SO_REUSEADDR option does not work

56 problems total.

From andre at freebsd.org Mon Jul 7 11:13:22 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 11:13:28 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707114538.K63144@fledge.watson.org> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <20080707114538.K63144@fledge.watson.org> Message-ID: <4871FA4F.40206@freebsd.org>

Robert Watson wrote:
>
> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
>> Distributing the interrupts and taskqueues among the available CPUs
>> gives concurrent forwarding with bi- or multi-directional traffic. All
>> incoming traffic from any particular interface is still serialized
>> though.
>
> ... although not on multiple input queue-enabled hardware and drivers.
> While I've really only focused on local traffic performance with my
> 10gbps Chelsio setup, it should be possible to do packet forwarding from
> multiple input queues using that hardware and driver today.
>
> I'll update the netisr2 patches, which allow work to be pushed to
> multiple CPUs from a single input queue. However, these necessarily
> take a cache miss or two on packet header data in order to break out the
> packets from the input queue into flows that can be processed
> independently without ordering constraints, so if those cache misses on
> header data are a big part of the performance of a configuration, load
> balancing in this manner may not help. What would be neat is if the
> cards without multiple input queues could still tag receive descriptors
> with a flow identifier generated from the IP/TCP/etc layers that could
> be used for work placement.

The cache miss is really the elephant in the room. If the network card supports multiple RX rings with separate interrupts and distributes packets among them with a stable hash (over the source and destination IP addresses and ports), the rings can be bound to different CPUs. It is very important to maintain the packet order for flows that go through the router; otherwise TCP and VoIP will suffer.
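The point about a stable hash over addresses and ports can be illustrated with a small simulation. This is a sketch under stated assumptions, not how any particular NIC or driver works: `zlib.crc32` stands in for the hardware hash (RSS-capable NICs typically use a Toeplitz hash), and `Packet`, `distribute()`, `round_robin()` and `max_queues_per_flow()` are hypothetical names. It shows that hashing keeps every flow on exactly one queue, so per-flow order survives when each queue is serviced by its own CPU, whereas spraying packets round-robin spreads a single flow over several queues.

```python
"""Sketch: steer packets to per-CPU RX queues by hashing the flow tuple."""
import zlib
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    seq: int  # position within its flow, for illustration only


def flow_key(pkt: Packet) -> bytes:
    """The fields that identify a flow: source/destination address and port."""
    return f"{pkt.src_ip}|{pkt.dst_ip}|{pkt.src_port}|{pkt.dst_port}".encode()


def queue_for(pkt: Packet, nqueues: int) -> int:
    # crc32 is a stand-in for the NIC's hash; any deterministic function of
    # the flow key sends every packet of a flow to the same queue.
    return zlib.crc32(flow_key(pkt)) % nqueues


def distribute(packets, nqueues):
    """Hash-based steering: each queue keeps its packets in arrival order."""
    queues = defaultdict(list)
    for pkt in packets:
        queues[queue_for(pkt, nqueues)].append(pkt)
    return queues


def round_robin(packets, nqueues):
    """Naive spraying for comparison: ignores flows entirely."""
    queues = defaultdict(list)
    for i, pkt in enumerate(packets):
        queues[i % nqueues].append(pkt)
    return queues


def max_queues_per_flow(queues) -> int:
    """Largest number of queues any single flow was spread over (1 = order-safe)."""
    seen = defaultdict(set)
    for idx, q in queues.items():
        for pkt in q:
            seen[flow_key(pkt)].add(idx)
    return max(len(s) for s in seen.values())


if __name__ == "__main__":
    # Five flows interleaved on one port, four packets each.
    flows = [("10.1.1.2", "10.2.2.2", 1024 + i, 80) for i in range(5)]
    arrivals = [Packet(*f, seq) for seq in range(4) for f in flows]
    print("hashed:     ", max_queues_per_flow(distribute(arrivals, 4)))
    print("round robin:", max_queues_per_flow(round_robin(arrivals, 4)))
```

Note that the hash only guarantees ordering, not balance: with few flows, several of them can land on the same queue, which is why a single large flow still cannot be spread across CPUs.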
-- Andre

From andre at freebsd.org Mon Jul 7 11:18:01 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 11:18:07 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707191918.B4703@besplex.bde.org> References: <4867420D.7090406@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> Message-ID: <4871FB66.1060406@freebsd.org>

Bruce Evans wrote:
> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
>> Ingo Flaschberger wrote:
>>> I don't think you will be able to route 64byte packets at 1gbit
>>> wirespeed (2Mpps) with a current x86 platform.
>>
>> You have to take inter-frame gap and other overheads too. That gives
>> about 1.244Mpps max on a 1GigE interface.
>
> What are the other overheads? I calculate 1.644Mpps counting the
> inter-frame gap, with 64-byte packets and 64-header_size payloads. If the 64 bytes
> is for the payload, then the max is much lower.

The theoretical maximum at 64-byte frames is 1,488,100 pps. I've looked up my notes: the 1.244Mpps number should be corrected to 1.488Mpps.

>>> I hoped to reach 1Mpps with the hardware I mentioned some mails
>>> before, but 2Mpps is far far away.
>>> Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium.
>>
>> This is more or less expected. PCI32 is not able to sustain high
>> packet rates. The bus setup times kill the speed. For larger packets
>> the ratio gets much better and some reasonable throughput can be
>> achieved.
> > I get about 640 kpps without forwarding (sendto: slightly faster; > recvfrom: slightly slower) on a 2.2GHz A64. Underclocking the memory > from 200MHz to 100MHz only reduces the speed by about 10%, while not > overclocking the CPU by 10% reduces the speed by the same 10%, so the > system is apparently still mainly CPU-bound. On PCI32@33MHz? He's using a 1.2GHz Mobile Pentium on top of that. >> NetFPGA doesn't have enough TCAM space to be useful for real routing >> (as in Internet sized routing table). The trick many embedded networking >> CPUs use is cache prefetching that is integrated with the network >> controller. The first 64-128bytes of every packet are transferred >> automatically into the L2 cache by the hardware. This allows relatively >> slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz Freescale >> 7448 in NPE-G2) to get more than 1Mpps. Until something like this is >> possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM speed. > > Does using fa$ter memory (speed and/or latency) help here? 64 bytes > is so small that latency may be more of a problem, especially without > a prefetch. Latency. For IPv4 packet forwarding only one cache line per packet is fetched. More memory speed only helps with the DMA from/to the network card. 
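Bruce's 1.644Mpps and the 1,488,100 pps figure above differ only in whether the 8-byte preamble and start-of-frame delimiter are counted alongside the 12-byte inter-frame gap. A short sketch of the arithmetic, assuming standard IEEE 802.3 framing, with `frame_bytes` covering the Ethernet header through the FCS; the function name is illustrative.

```python
"""Maximum Ethernet frame rate once per-frame wire overhead is counted."""

PREAMBLE_SFD_BYTES = 8  # 7-byte preamble + 1-byte start-of-frame delimiter
IFG_BYTES = 12          # minimum inter-frame gap (96 bit times)


def max_frames_per_second(link_bps, frame_bytes,
                          preamble=PREAMBLE_SFD_BYTES, ifg=IFG_BYTES):
    """frame_bytes covers the Ethernet header through the FCS (64 = minimum frame)."""
    bits_on_wire = 8 * (frame_bytes + preamble + ifg)
    return link_bps / bits_on_wire


if __name__ == "__main__":
    gige = 1_000_000_000
    print(f"{max_frames_per_second(gige, 64):,.0f}")              # 1,488,095
    print(f"{max_frames_per_second(gige, 64, preamble=0):,.0f}")  # 1,644,737 (IFG only)
    print(f"{max_frames_per_second(gige, 1518):,.0f}")            # 81,274
```

A minimum-size frame therefore occupies 672 bit times on the wire, which is where the 1.488Mpps ceiling for GigE comes from; the same formula gives roughly 81 kpps for full-size 1518-byte frames.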
-- Andre From kris at FreeBSD.org Mon Jul 7 11:32:01 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Mon Jul 7 11:32:08 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4871D81B.8070507@freebsd.org> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <20080706132148.E44832@fledge.watson.org> <4871D81B.8070507@freebsd.org> Message-ID: <4871FEAF.1060501@FreeBSD.org> Andre Oppermann wrote: > Robert Watson wrote: >> Experience suggests that forwarding workloads see significant lock >> contention in the routing and transmit queue code. The former needs >> some kernel hacking to address in order to improve parallelism for >> routing lookups. The latter is harder to address given the hardware >> you're using: modern 10gbps cards frequently offer multiple transmit >> queues that can be used independently (which our cxgb driver >> supports), but 1gbps cards generally don't. > > Actually the routing code is not contended. The workload in router > is mostly serialized without much opportunity for contention. With > many interfaces and any-to-any traffic patterns it may get some > contention. The locking overhead per packet is always there and has > some impact though. > Actually contention from route locking is a major bottleneck even on packet generation from multiple CPUs on a single host. It is becoming increasingly necessary that someone look into fixing this. 
Kris From brde at optusnet.com.au Mon Jul 7 12:31:05 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jul 7 12:31:11 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4871FB66.1060406@freebsd.org> References: <4867420D.7090406@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> Message-ID: <20080707213356.G7572@besplex.bde.org> On Mon, 7 Jul 2008, Andre Oppermann wrote: > Bruce Evans wrote: >> What are the other overheads? I calculate 1.644Mpps counting the >> inter-frame >> gap, with 64-byte packets and 64-header_size payloads. If the 64 bytes >> is for the payload, then the max is much lower. > > The theoretical maximum at 64byte frames is 1,488,100. I've looked > up my notes the 1.244Mpps number can be ajusted to 1.488Mpps. Where is the extra? I still get 1.644736 Mpps (10^9/(8*64+96)). 1.488095 is for 64 bits extra (10^9/(8*64+96+64)). >>>> I hoped to reach 1Mpps with the hardware I mentioned some mails before, >>>> but 2Mpps is far far away. >>>> Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium. >>> >>> This is more or less expected. PCI32 is not able to sustain high >>> packet rates. The bus setup times kill the speed. For larger packets >>> the ratio gets much better and some reasonable throughput can be achieved. >> >> I get about 640 kpps without forwarding (sendto: slightly faster; >> recvfrom: slightly slower) on a 2.2GHz A64. 
Underclocking the memory >> from 200MHz to 100MHz only reduces the speed by about 10%, while not >> overclocking the CPU by 10% reduces the speed by the same 10%, so the >> system is apparently still mainly CPU-bound. > > On PCI32@33MHz? He's using a 1.2GHz Mobile Pentium on top of that. Yes. My example shows that FreeBSD is more CPU-bound than I/O-bound up to CPUs considerably faster than a 1.2GHz Pentium (though PentiumM is fast relative to its clock speed). The memory interface may matter more than the CPU clock. >>> NetFPGA doesn't have enough TCAM space to be useful for real routing >>> (as in Internet sized routing table). The trick many embedded networking >>> CPUs use is cache prefetching that is integrated with the network >>> controller. The first 64-128bytes of every packet are transferred >>> automatically into the L2 cache by the hardware. This allows relatively >>> slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz Freescale >>> 7448 in NPE-G2) to get more than 1Mpps. Until something like this is >>> possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM speed. >> >> Does using fa$ter memory (speed and/or latency) help here? 64 bytes >> is so small that latency may be more of a problem, especially without >> a prefetch. > > Latency. For IPv4 packet forwarding only one cache line per packet > is fetched. More memory speed only helps with the DMA from/to the > network card. I use low-end memory, but on the machine that does 640 kpps it somehow has latency almost 4 times as low as on new FreeBSD cluster machines (~42 nsec instead of ~150). perfmon (fixed for AXP and A64) and hwpmc report an average of 11 k8-dc-misses per sendto() while sending via bge at 640 kpps. 11 * 42 accounts for 462 nsec out of the 1562 per packet at this rate. 11 * 150 = 1650 would probably make this rate unachievable despite the system having 20 times as much CPU and bus.
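The per-packet budget arithmetic above can be checked with a few lines of C. This is a back-of-the-envelope sketch that assumes every cache miss stalls the CPU for the full memory latency and that misses never overlap (an assumption, not something measured in the thread); the function names are hypothetical.

```c
/* Time available per packet, in nanoseconds, at a given packet rate. */
double ns_per_packet(double pps)
{
    return 1e9 / pps;
}

/* Stall time per packet, assuming each miss costs the full latency
 * and misses do not overlap (a simplifying assumption). */
double miss_stall_ns(double misses_per_packet, double miss_latency_ns)
{
    return misses_per_packet * miss_latency_ns;
}

/* Upper bound on packets per second imposed by miss stalls alone,
 * ignoring every instruction and bus cost. */
double stall_bound_pps(double misses_per_packet, double miss_latency_ns)
{
    return 1e9 / miss_stall_ns(misses_per_packet, miss_latency_ns);
}
```

With the figures quoted in the message, ns_per_packet(640000) is 1562.5 ns, miss_stall_ns(11, 42) is 462 ns and miss_stall_ns(11, 150) is 1650 ns. The last already exceeds the whole per-packet budget, so stall_bound_pps(11, 150) caps the rate at roughly 606 kpps before any instructions are counted, which is consistent with the claim that 640 kpps would be out of reach at ~150 ns latency.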
Bruce From rwatson at FreeBSD.org Mon Jul 7 12:44:41 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 12:44:48 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707213356.G7572@besplex.bde.org> References: <4867420D.7090406@gtcomm.net> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> Message-ID: <20080707134036.S63144@fledge.watson.org> On Mon, 7 Jul 2008, Bruce Evans wrote: > I use low-end memory, but on the machine that does 640 kpps it somehow has > latency almost 4 times as low as on new FreeBSD cluster machines (~42 nsec > instead of ~150). perfmon (fixed for AXP and A64) and hwpmc report an > average of 11 k8-dc-misses per sendto() while sending via bge at 640 kpps. > 11 * 42 accounts for 442 nsec out of the 1562 per packet at this rate. 11 * > 150 = 1650 would probably make this rate unachievable despite the system > having 20 times as much CPU and bus. Since you're doing fine-grained performance measurements of a code path that interests me a lot, could you compare the cost per-send on UDP for the following four cases: (1) sendto() to a specific address and port on a socket that has been bound to INADDR_ANY and a specific port. (2) sendto() on a specific address and port on a socket that has been bound to a specific IP address (not INADDR_ANY) and a specific port. 
(3) send() on a socket that has been connect()'d to a specific IP address and a specific port, and bound to INADDR_ANY and a specific port. (4) send() on a socket that has been connect()'d to a specific IP address and a specific port, and bound to a specific IP address (not INADDR_ANY) and a specific port. The last of these should really be quite a bit faster than the first of these, but I'd be interested in seeing specific measurements for each if that's possible! Thanks, Robert N M Watson Computer Laboratory University of Cambridge From rwatson at FreeBSD.org Mon Jul 7 12:46:08 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 12:46:14 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707134036.S63144@fledge.watson.org> References: <4867420D.7090406@gtcomm.net> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> <20080707134036.S63144@fledge.watson.org> Message-ID: <20080707134514.P63144@fledge.watson.org> On Mon, 7 Jul 2008, Robert Watson wrote: > The last of these should really be quite a bit faster than the first of > these, but I'd be interested in seeing specific measurements for each if > that's possible! And, if you're feeling particularly subject to suggestion, you might consider comparing 7.0 and recent 8.x along the same dimensions :-).
Robert N M Watson Computer Laboratory University of Cambridge From brde at optusnet.com.au Mon Jul 7 12:56:24 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jul 7 12:56:35 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707134036.S63144@fledge.watson.org> References: <4867420D.7090406@gtcomm.net> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> <20080707134036.S63144@fledge.watson.org> Message-ID: <20080707224659.B7844@besplex.bde.org> On Mon, 7 Jul 2008, Robert Watson wrote: > Since you're doing fine-grained performance measurements of a code path that > interests me a lot, could you compare the cost per-send on UDP for the > following four cases: > > (1) sendto() to a specific address and port on a socket that has been bound > to > INADDR_ANY and a specific port. > > (2) sendto() on a specific address and port on a socket that has been bound > to > a specific IP address (not INADDR_ANY) and a specific port. > > (3) send() on a socket that has been connect()'d to a specific IP address and > a specific port, and bound to INADDR_ANY and a specific port. > > (4) send() on a socket that has been connect()'d to a specific IP address > and a specific port, and bound to a specific IP address (not INADDR_ANY) > and a specific port. 
> > The last of these should really be quite a bit faster than the first of > these, but I'd be interested in seeing specific measurements for each if > that's possible! Not sure if I understand networking well enough to set these up quickly. Does netrate use one of (3) or (4) now? I can tell you vaguely about old results for netrate (send()) vs ttcp (sendto()). send() is lighter weight of course, and this made a difference of 10-20%, but after further tuning the difference became smaller, which suggests that everything ends up waiting for something in common. Now I can measure cache misses better and hope that a simple count of cache misses will be a more reproducible indicator of significant bottlenecks than pps. I got nowhere trying to reduce instruction counts, possibly because it would take avoiding 100's of instructions to get the same benefit as avoiding a single cache miss. Bruce From gavin at FreeBSD.org Mon Jul 7 13:28:38 2008 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Mon Jul 7 13:28:44 2008 Subject: kern/125195: [fxp] fxp(4) driver failed to initialize device Intel 82801DB Message-ID: <200807071328.m67DSbh4079988@freefall.freebsd.org> Synopsis: [fxp] fxp(4) driver failed to initialize device Intel 82801DB State-Changed-From-To: open->feedback State-Changed-By: gavin State-Changed-When: Mon Jul 7 13:27:12 UTC 2008 State-Changed-Why: To submitter: Could you give the output of "pciconf -l |grep fxp" please?
http://www.freebsd.org/cgi/query-pr.cgi?pr=125195 From ertr1013 at student.uu.se Mon Jul 7 13:31:00 2008 From: ertr1013 at student.uu.se (Erik Trulsson) Date: Mon Jul 7 13:31:07 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707213356.G7572@besplex.bde.org> References: <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> Message-ID: <20080707131550.GA69202@owl.midgard.homeip.net> On Mon, Jul 07, 2008 at 10:30:53PM +1000, Bruce Evans wrote: > On Mon, 7 Jul 2008, Andre Oppermann wrote: > > > Bruce Evans wrote: > >> What are the other overheads? I calculate 1.644Mpps counting the > >> inter-frame > >> gap, with 64-byte packets and 64-header_size payloads. If the 64 bytes > >> is for the payload, then the max is much lower. > > > > The theoretical maximum at 64byte frames is 1,488,100. I've looked > > up my notes the 1.244Mpps number can be ajusted to 1.488Mpps. > > Where is the extra? I still get 1.644736 Mpps (10^9/(8*64+96)). > 1.488095 is for 64 bits extra (10^9/(8*64+96+64)). A standard Ethernet frame (on the wire) consists of:

7 octets preamble
1 octet Start Frame Delimiter
6 octets destination address
6 octets source address
2 octets length/type
46-1500 octets data (+padding if needed)
4 octets Frame Check Sequence

followed by (at least) 96 bits interFrameGap before the next frame starts. For minimal packet size this gives a maximum packet rate at 1Gbit/s of 1e9/((7+1+6+6+2+46+4)*8+96) = 1488095 packets/second. You probably missed the preamble and start frame delimiter in your calculation.
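Erik's frame-rate formula can be written as a small C helper. It is a sketch: the constant names are hypothetical, and the count_preamble switch exists only to reproduce both figures from the thread, about 1488095 pps with the 8 octets of preamble and start frame delimiter included and about 1644736 pps with them left out.

```c
/* On-the-wire overhead around an Ethernet frame, per the layout above. */
enum {
    PREAMBLE_OCTETS = 7,
    SFD_OCTETS = 1,
    IFG_BITS = 96, /* minimum inter-frame gap */
    /* destination + source + length/type + minimum data + FCS */
    MIN_FRAME_OCTETS = 6 + 6 + 2 + 46 + 4
};

/* Maximum frames per second on a link of link_bps bits/s for frames of
 * frame_octets (destination address through FCS). */
double max_pps(double link_bps, int frame_octets, int count_preamble)
{
    double octets = frame_octets;

    if (count_preamble)
        octets += PREAMBLE_OCTETS + SFD_OCTETS;
    return link_bps / (octets * 8.0 + IFG_BITS);
}
```

max_pps(1e9, MIN_FRAME_OCTETS, 1) works out to 10^9 / 672 bit times, about 1488095 frames/s, while max_pps(1e9, MIN_FRAME_OCTETS, 0) gives 10^9 / 608, about 1644736, the figure obtained when the preamble and SFD are omitted.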
-- Erik Trulsson ertr1013@student.uu.se From andre at freebsd.org Mon Jul 7 13:37:30 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 13:37:37 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707213356.G7572@besplex.bde.org> References: <4867420D.7090406@gtcomm.net> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> Message-ID: <48721C18.4060109@freebsd.org> Bruce Evans wrote: > On Mon, 7 Jul 2008, Andre Oppermann wrote: > >> Bruce Evans wrote: >>> What are the other overheads? I calculate 1.644Mpps counting the >>> inter-frame >>> gap, with 64-byte packets and 64-header_size payloads. If the 64 bytes >>> is for the payload, then the max is much lower. >> >> The theoretical maximum at 64byte frames is 1,488,100. I've looked >> up my notes the 1.244Mpps number can be ajusted to 1.488Mpps. > > Where is the extra? I still get 1.644736 Mpps (10^9/(8*64+96)). > 1.488095 is for 64 bits extra (10^9/(8*64+96+64)). The preamble has 64 bits and is in addition to the inter-frame gap. >>>>> I hoped to reach 1Mpps with the hardware I mentioned some mails >>>>> before, but 2Mpps is far far away. >>>>> Currently I get 160kpps via pci-32mbit-33mhz-1,2ghz mobile pentium. >>>> >>>> This is more or less expected. PCI32 is not able to sustain high >>>> packet rates. The bus setup times kill the speed. 
For larger packets >>>> the ratio gets much better and some reasonable throughput can be >>>> achieved. >>> >>> I get about 640 kpps without forwarding (sendto: slightly faster; >>> recvfrom: slightly slower) on a 2.2GHz A64. Underclocking the memory >>> from 200MHz to 100MHz only reduces the speed by about 10%, while not >>> overclocking the CPU by 10% reduces the speed by the same 10%, so the >>> system is apparently still mainly CPU-bound. >> >> On PCI32@33MHz? He's using a 1.2GHz Mobile Pentium on top of that. > > Yes. My example shows that FreeBSD is more CPU-bound than I/O bound up > to CPUs considerably faster than a 1.2GHz Pentium (though PentiumM is > fast relative to its clock speed). The memory interface may matter more > than the CPU clock. > >>>> NetFPGA doesn't have enough TCAM space to be useful for real routing >>>> (as in Internet sized routing table). The trick many embedded >>>> networking >>>> CPUs use is cache prefetching that is integrated with the network >>>> controller. The first 64-128bytes of every packet are transferred >>>> automatically into the L2 cache by the hardware. This allows >>>> relatively >>>> slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1 or 1.67-GHz >>>> Freescale >>>> 7448 in NPE-G2) to get more than 1Mpps. Until something like this is >>>> possible on Intel or AMD x86 CPUs we have a ceiling limited by RAM >>>> speed. >>> >>> Does using fa$ter memory (speed and/or latency) help here? 64 bytes >>> is so small that latency may be more of a problem, especially without >>> a prefetch. >> >> Latency. For IPv4 packet forwarding only one cache line per packet >> is fetched. More memory speed only helps with the DMA from/to the >> network card. > > I use low-end memory, but on the machine that does 640 kpps it somehow > has latency almost 4 times as low as on new FreeBSD cluster machines > (~42 nsec instead of ~150). 
perfmon (fixed for AXP and A64) and hwpmc > report an average of 11 k8-dc-misses per sendto() while sending via > bge at 640 kpps. 11 * 42 accounts for 442 nsec out of the 1562 per > packet at this rate. 11 * 150 = 1650 would probably make this rate > unachievable despite the system having 20 times as much CPU and bus. We were talking routing here. That is a packet received via network interface and sent out on another. Crosses the PCI bus twice. -- Andre From rwatson at FreeBSD.org Mon Jul 7 13:39:00 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 13:39:06 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707224659.B7844@besplex.bde.org> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> <20080707134036.S63144@fledge.watson.org> <20080707224659.B7844@besplex.bde.org> Message-ID: <20080707142018.U63144@fledge.watson.org> On Mon, 7 Jul 2008, Bruce Evans wrote: >> (1) sendto() to a specific address and port on a socket that has been bound >> to >> INADDR_ANY and a specific port. >> >> (2) sendto() on a specific address and port on a socket that has been bound >> to >> a specific IP address (not INADDR_ANY) and a specific port. >> >> (3) send() on a socket that has been connect()'d to a specific IP address >> and >> a specific port, and bound to INADDR_ANY and a specific port. 
>> >> (4) send() on a socket that has been connect()'d to a specific IP address >> and a specific port, and bound to a specific IP address (not INADDR_ANY) >> and a specific port. >> >> The last of these should really be quite a bit faster than the first of >> these, but I'd be interested in seeing specific measurements for each if >> that's possible! > > Not sure if I understand networking well enough to set these up quickly. > Does netrate use one of (3) or (4) now? (3) and (4) are effectively the same thing, I think, since connect(2) should force the selection of a source IP address, but I think it's not a bad idea to confirm that. :-) The structure of the desired micro-benchmark here is basically:

int
main(int argc, char **argv)
{
    struct sockaddr_in sin;
    int errors = 0, s;

    /* Parse command line arguments such as address and ports. */

    s = socket(PF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        err(-1, "socket");
    if (bind_desired) {
        /* Set up sockaddr_in. */
        if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
            err(-1, "bind");
    }
    /* Set up destination sockaddr_in. */
    if (connect_desired) {
        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
            err(-1, "connect");
    }
    while (appropriate_condition) {
        if (connect_desired) {
            if (send(s, ...) < 0)
                errors++;
        } else {
            if (sendto(s, ..., (struct sockaddr *)&sin, sizeof(sin)) < 0)
                errors++;
        }
    }
}

> I can tell you vaguely about old results for netrate (send()) vs ttcp > (sendto()). send() is lighter weight of course, and this made a difference > of 10-20%, but after further tuning the difference became smaller, which > suggests that everything ends up waiting for something in common. > > Now I can measure cache misses better and hope that a simple count of cache > misses will be a more reproducible indicator of significant bottlenecks than > pps. I got nowhere trying to reduce instruction counts, possibly because it > would take avoiding 100's of instructions to get the same benefit as > avoiding a single cache miss.
If you look at the design of the higher-performance UDP applications, they will generally bind a specific IP (perhaps every IP on the host with its own socket), and if they do sustained communication to a specific endpoint they will use connect(2) rather than providing an address for each send(2) system call to the kernel. udp_output() makes the trade-offs there fairly clear: with the most recent rev, the optimal case is once connect(2) has been called, allowing a single inpcb read lock and no global data structure access, vs. an application calling sendto(2) for each system call with the local binding remaining INADDR_ANY. Middle-ground applications, such as named(8), will force a local binding using bind(2), but then still have to pass an address to each sendto(2). In the future, this case will be further optimized in our code by using a global read lock rather than a global write lock: we have to check for collisions, but we don't actually have to reserve the new 4-tuple for the UDP socket as it's an ephemeral association rather than a connect(2).
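Robert's four cases can be turned into something compilable along the lines below. This is a minimal sketch, not netrate or ttcp: make_sin(), udp_bench() and struct bench_opts are hypothetical names, the 18-byte payload is an assumption chosen so that each datagram fills a minimum-size 64-byte Ethernet frame (14 header + 20 IP + 8 UDP + 18 payload + 4 FCS), and timing uses clock_gettime(CLOCK_MONOTONIC) rather than hardware counters.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* One benchmark case: wildcard or specific local address, and either
 * sendto() per packet or connect() once followed by send(). */
struct bench_opts {
    const char *local_addr;    /* NULL means INADDR_ANY */
    unsigned short local_port; /* specific local port to bind */
    const char *dst_addr;
    unsigned short dst_port;
    int use_connect;           /* 0: sendto() each time, 1: connect() + send() */
};

/* Fill in a sockaddr_in; a NULL address means INADDR_ANY.
 * Returns 0 on success, -1 if the address does not parse. */
int make_sin(struct sockaddr_in *sin, const char *addr, unsigned short port)
{
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    if (addr == NULL) {
        sin->sin_addr.s_addr = htonl(INADDR_ANY);
        return 0;
    }
    return inet_pton(AF_INET, addr, &sin->sin_addr) == 1 ? 0 : -1;
}

/* Send `count` datagrams; returns elapsed nanoseconds for the send loop,
 * or -1 on a setup error. Failed sends are counted in *errors. */
long long udp_bench(const struct bench_opts *o, long count, long *errors)
{
    struct sockaddr_in local, dst;
    struct timespec t0, t1;
    char payload[18] = {0}; /* assumed size: fills a 64-byte Ethernet frame */
    int s;

    *errors = 0;
    if (make_sin(&local, o->local_addr, o->local_port) < 0 ||
        make_sin(&dst, o->dst_addr, o->dst_port) < 0)
        return -1;
    if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;
    if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        (o->use_connect &&
         connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0)) {
        close(s);
        return -1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < count; i++) {
        ssize_t n = o->use_connect
            ? send(s, payload, sizeof(payload), 0)
            : sendto(s, payload, sizeof(payload), 0,
                     (struct sockaddr *)&dst, sizeof(dst));
        if (n < 0)
            (*errors)++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(s);
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
        (t1.tv_nsec - t0.tv_nsec);
}
```

Case (1) is local_addr = NULL with use_connect = 0, case (2) a specific local_addr with use_connect = 0, case (3) local_addr = NULL with use_connect = 1, and case (4) a specific local_addr with use_connect = 1; dividing the returned nanoseconds by count gives the cost per send. When connect()'d to a port nobody listens on, some systems report the ICMP port-unreachable back as failed send() calls, so the error count should be checked before comparing cases.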
Robert N M Watson Computer Laboratory University of Cambridge From brde at optusnet.com.au Mon Jul 7 15:35:26 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jul 7 15:35:33 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4871E618.1080500@freebsd.org> References: <4867420D.7090406@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr><4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> Message-ID: <20080708002228.G680@besplex.bde.org> On Mon, 7 Jul 2008, Andre Oppermann wrote: > Paul, > > to get a systematic analysis of the performance please do the following > tests and put them into a table for easy comparison: > > 1. inbound pps w/o loss with interface in monitor mode (ifconfig em0 > monitor) >... I won't be running many of these tests, but found this one interesting -- I didn't know about monitor mode. It gives the following behaviour: -monitor ttcp receiving on bge0 at 397 kpps: 35% idle (8.0-CURRENT) 13.6 cm/p monitor ttcp receiving on bge0 at 397 kpps: 83% idle (8.0-CURRENT) 5.8 cm/p -monitor ttcp receiving on em0 at 580 kpps: 5% idle (~5.2) 12.5 cm/p monitor ttcp receiving on em0 at 580 kpps: 65% idle (~5.2) 4.8 cm/p cm/p = k8-dc-misses (bge0 system) cm/p = k7-dc-misses (em0 system) So it seems that the major overheads are not near the driver (as I already knew), and upper layers are responsible for most of the cache misses. 
The packet header is accessed even in monitor mode, so I think most of the cache misses in upper layers are not related to the packet header. Maybe they are due mainly to perfect non-locality for mbufs. Other cm/p numbers:

ttcp sending on bge0 at 640 kpps: (~5.2) 11 cm/p
ttcp sending on bge0 at 580 kpps: (8.0-CURRENT) 9 cm/p
(-current is 10% slower despite having lower cm/p. This seems to be due to extra instructions executed.)
ping -fq -c1000000 localhost at 171 kpps: (8.0-CURRENT) 12-33 cm/p
(This is certainly CPU-bound. lo0 is much slower than bge0. Latency (rtt) is 2 us. It is 3 us in ~5.2 and was 4 in -current until very recently.)
ping -fq -c1000000 etherhost at 40 kpps: (8.0-CURRENT) 55 cm/p
(The rate is quite low because flood ping doesn't actually flood. It tries to limit the rate to max(100, 1/latency), but it tends to go at a rate of ql(t)/latency, where ql(t) is the average hardware queue length at the current time t. ql(t) starts at 1 and builds up after a minute or 2 to a maximum of about 10 on my hardware. Latency is always ~100 us, so the average ql(t) must have been ~4.)
ping -fq -c1000000 etherhost at 20 kpps: (8.0-CURRENT) 45 cm/p
(Another run to record the average latency (it was 121) showed high variance.)
netblast sending on bge0 at 582 kpps: (8.0-CURRENT) 9.8 cm/p
(Packet blasting benchmarks actually flood, unlike flood ping. This is hard to implement, since select() for output-ready doesn't work. netblast has to busy-wait, while ttcp guesses how long to sleep but cannot sleep for a short enough interval unless queues are too large or hz is too small. My systems are configured with HZ = 100 and snd.ifq too large so that sleeping for 1/HZ works for ttcp. netblast still busy-waits. This gives an interesting difference for netblast. It tries to send 800 k packets in 1 second but only successfully sends 582 k. 9.8 cm/p is for #misses / 582k. The 300k unsuccessful sends apparently don't cause many cache misses. But variance is high...)
ttcp sending on bge0 at 577 kpps: (8.0-CURRENT) 15.5 cm/p
(Another run shows high variance.)

ttcp rates have low variance for a given kernel but high variance for different kernels (an extra unrelated byte in the text section can cause a 30% change). High variance would also be explained by non-locality of mbufs. Cycling through lots of mbufs would maximize cache misses, but random reuse of mbufs would give variance. Or the cycling and variance might lie more in general allocation. There is silliness in getsockaddr(): sendit() calls getsockaddr(), and getsockaddr() always uses malloc(), although allocation on the stack would work for the call from sendit(). This malloc() seemed to be responsible for a cache miss or two, but when I changed it to use the stack the results were inconclusive. Bruce From andre at freebsd.org Mon Jul 7 16:20:10 2008 From: andre at freebsd.org (Andre Oppermann) Date: Mon Jul 7 16:20:35 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080708002228.G680@besplex.bde.org> References: <4867420D.7090406@gtcomm.net> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> Message-ID: <48724238.2020103@freebsd.org> Bruce Evans wrote: > On Mon, 7 Jul 2008, Andre Oppermann wrote: > >> Paul, >> >> to get a systematic analysis of the performance please do the following >> tests and put them into a table for easy comparison: >> >> 1.
inbound pps w/o loss with interface in monitor mode (ifconfig em0 >> monitor) >> ... > > I won't be running many of these tests, but found this one interesting -- > I didn't know about monitor mode. It gives the following behaviour: > > -monitor ttcp receiving on bge0 at 397 kpps: 35% idle (8.0-CURRENT) 13.6 > cm/p > monitor ttcp receiving on bge0 at 397 kpps: 83% idle (8.0-CURRENT) 5.8 > cm/p > -monitor ttcp receiving on em0 at 580 kpps: 5% idle (~5.2) 12.5 > cm/p > monitor ttcp receiving on em0 at 580 kpps: 65% idle (~5.2) 4.8 > cm/p > > cm/p = k8-dc-misses (bge0 system) > cm/p = k7-dc-misses (em0 system) > > So it seems that the major overheads are not near the driver (as I already > knew), and upper layers are responsible for most of the cache misses. > The packet header is accessed even in monitor mode, so I think most of > the cache misses in upper layers are not related to the packet header. > Maybe they are due mainly to perfect non-locality for mbufs. Monitor mode doesn't access the payload packet header. It only looks at the mbuf (which has a structure called mbuf packet header). The mbuf header is hot in the cache because the driver just touched it and filled in the information. The packet content (the payload) is cold and just arrived via DMA in DRAM. -- Andre From nettwork at gmx.de Mon Jul 7 16:41:52 2008 From: nettwork at gmx.de (Achim) Date: Mon Jul 7 16:41:59 2008 Subject: smbmount / smbclient : strangely varying transfer speeds Message-ID: <200807071615.40987.nettwork@gmx.de> Hello List, I've experienced the following with both a kubuntu and a FBSD7 client and FBSD7 as server: When I try to copy a file off a *mounted* CIFS/SMB-share I get transfer rates below 1 MByte/sec. If I start a second, concurrent transfer I am getting transfer rates around 8MB/s on *each* transfer (Gigabit link). As soon as one transfer stops, the other drops back to the old rate below 1MB/s.
Copying TO the server works fine, at least from the kubuntu client; the FreeBSD client isn't here at the moment. It speeds up the initial transfer to about 3.5MB/s, about half of what a concurrent download does. A single transfer via smbclient yields ~8MB/s. In other words: performance with a single client is degraded when the client is smbmount and downloading. With a second transfer in any direction, performance improves to about 3.5 or 8 MB/s respectively, depending on whether the second connection is uploading or downloading. Unlike smbmount, single smbclient transfers yield acceptable results. Does anyone have an idea as to what to try? My wireshark skills aren't too advanced and I could not find any notable difference between the captures of each transfer type (single mounted, multiple mounted and single smbclient). Is there anything I should watch out for? The machines are connected over a simple SOHO gigabit switch, no fancy network between them. Thanks in advance, Achim From brde at optusnet.com.au Mon Jul 7 18:16:49 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jul 7 18:16:56 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48724238.2020103@freebsd.org> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> Message-ID: <20080708034304.R21502@delplex.bde.org> On Mon, 7 Jul 2008, Andre Oppermann wrote: > Bruce Evans wrote: >> So it seems that the
major overheads are not near the driver (as I already >> knew), and upper layers are responsible for most of the cache misses. >> The packet header is accessed even in monitor mode, so I think most of >> the cache misses in upper layers are not related to the packet header. >> Maybe they are due mainly to perfect non-locality for mbufs. > > Monitor mode doesn't access the payload packet header. It only looks > at the mbuf (which has a structure called mbuf packet header). The mbuf > header it hot in the cache because the driver just touched it and filled > in the information. The packet content (the payload) is cold and just > arrived via DMA in DRAM. Why does it use ntohs() then? :-). From if_ethersubr.c: % static void % ether_input(struct ifnet *ifp, struct mbuf *m) % { % struct ether_header *eh; % u_short etype; % % if ((ifp->if_flags & IFF_UP) == 0) { % m_freem(m); % return; % } % #ifdef DIAGNOSTIC % if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) { % if_printf(ifp, "discard frame at !IFF_DRV_RUNNING\n"); % m_freem(m); % return; % } % #endif % /* % * Do consistency checks to verify assumptions % * made by code past this point. % */ % if ((m->m_flags & M_PKTHDR) == 0) { % if_printf(ifp, "discard frame w/o packet header\n"); % ifp->if_ierrors++; % m_freem(m); % return; % } % if (m->m_len < ETHER_HDR_LEN) { % /* XXX maybe should pullup? */ % if_printf(ifp, "discard frame w/o leading ethernet " % "header (len %u pkt len %u)\n", % m->m_len, m->m_pkthdr.len); % ifp->if_ierrors++; % m_freem(m); % return; % } % eh = mtod(m, struct ether_header *); Point outside of mbuf header. % etype = ntohs(eh->ether_type); First access outside of mbuf header. But this seems to be bogus and might be fixed by compiler optimization, since etype is not used until after the monitor mode returns. This may have been broken by debugging cruft -- in 5.2, etype is used immediately after here in a printf about discarding oversize frames. 
The compiler might also pessimize things by reordering code.

% 	if (m->m_pkthdr.rcvif == NULL) {
% 		if_printf(ifp, "discard frame w/o interface pointer\n");
% 		ifp->if_ierrors++;
% 		m_freem(m);
% 		return;
% 	}
% #ifdef DIAGNOSTIC
% 	if (m->m_pkthdr.rcvif != ifp) {
% 		if_printf(ifp, "Warning, frame marked as received on %s\n",
% 			m->m_pkthdr.rcvif->if_xname);
% 	}
% #endif
%
% 	if (ETHER_IS_MULTICAST(eh->ether_dhost)) {
% 		if (ETHER_IS_BROADCAST(eh->ether_dhost))
% 			m->m_flags |= M_BCAST;
% 		else
% 			m->m_flags |= M_MCAST;
% 		ifp->if_imcasts++;
% 	}

Another dereference of eh (2 unless optimizable and optimized). Here the result is actually used early, but I think you don't care enough about maintaining if_imcasts to do this.

%
% #ifdef MAC
% 	/*
% 	 * Tag the mbuf with an appropriate MAC label before any other
% 	 * consumers can get to it.
% 	 */
% 	mac_ifnet_create_mbuf(ifp, m);
% #endif
%
% 	/*
% 	 * Give bpf a chance at the packet.
% 	 */
% 	ETHER_BPF_MTAP(ifp, m);

I think this can access the whole packet, but usually doesn't.

%
% 	/*
% 	 * If the CRC is still on the packet, trim it off. We do this once
% 	 * and once only in case we are re-entered. Nothing else on the
% 	 * Ethernet receive path expects to see the FCS.
% 	 */
% 	if (m->m_flags & M_HASFCS) {
% 		m_adj(m, -ETHER_CRC_LEN);
% 		m->m_flags &= ~M_HASFCS;
% 	}
%
% 	ifp->if_ibytes += m->m_pkthdr.len;
%
% 	/* Allow monitor mode to claim this frame, after stats are updated. */
% 	if (ifp->if_flags & IFF_MONITOR) {
% 		m_freem(m);
% 		return;
% 	}

Finally, the monitor-mode return. I don't see any stats update before here except for the stray if_imcasts one.

BTW, stats behave strangely in monitor mode:
- netstat -I 1 works except:
  - the byte counts are 0 every other second (the following second counts the previous 2), while the packet counts are updated every second
  - one system started showing bge0 stats for all interfaces. Perhaps unrelated.
- systat -ip shows all counts 0. I think this is due to stats maintained by the driver working but other stats not.
The mixture seems strange at user level. Bruce From mav at FreeBSD.org Mon Jul 7 18:30:09 2008 From: mav at FreeBSD.org (Alexander Motin) Date: Mon Jul 7 18:30:23 2008 Subject: kern/123200: [netgraph] Server failure due to netgraph mpd and dhcpclient Message-ID: <200807071830.m67IU9ZG008237@freefall.freebsd.org> The following reply was made to PR kern/123200; it has been noted by GNATS. From: Alexander Motin To: bug-followup@FreeBSD.org, zaulychny@yahoo.com Cc: Subject: Re: kern/123200: [netgraph] Server failure due to netgraph mpd and dhcpclient Date: Mon, 07 Jul 2008 21:27:58 +0300 If I understand correctly, you are receiving the route to your VPN server via DHCP. I think you could get into trouble when the DHCP lease expires and you lose that route, making the VPN connection the default route. In turn, this could cause a routing loop by wrapping the tunnel inside itself, causing an in-kernel recursion loop. I have had some feedback that the stack protection mechanisms added to -STABLE allow the system to handle such a case better. Could you upgrade your system to 6-STABLE and try again? -- Alexander Motin From paul at gtcomm.net Mon Jul 7 18:43:00 2008 From: paul at gtcomm.net (Paul) Date: Mon Jul 7 18:43:16 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4871E85C.8090907@freebsd.org> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> Message-ID: <48726422.7050703@gtcomm.net> > one that will later on handle the taskqueue to process the packets. > That adds overhead.
Ideally the interrupt for each network interface > is bound to exactly one pre-determined CPU and the taskqueue is bound > to the same CPU. That way the overhead for interrupt and taskqueue > scheduling can be kept at a minimum. Most of the infrastructure to > do this binding already exists in the kernel but is not yet exposed > to the outside for us to make use of it. I'm also not sure if the > ULE scheduler skips the more global locks when interrupt and the > thread are on the same CPU. > > Distributing the interrupts and taskqueues among the available CPUs > gives concurrent forwarding with bi- or multi-directional traffic. > All incoming traffic from any particular interface is still serialized > though. > I used EtherChannel to distribute incoming packets evenly over 3 separate CPUs, but the output was on one interface. What I got was less performance than with one CPU, and all three CPUs were close to 100% utilized. em0, em1 and em2 were all receiving packets and sending them out em3. The machine had 4 CPUs in it. The em3 taskq had low CPU usage, while em0, em1 and em2 kept cpu0, cpu1 and cpu2 (for example) almost fully busy. With all that CPU power in use, I still got less performance than with 1 CPU :/ Obviously there is a big issue somewhere in SMP. Also, my 82571 NIC supports multiple receive queues and multiple transmit queues, so why hasn't anyone written the driver to support this? It's not a 10Gb card, yet it still supports this, and it's widely available and not too expensive either. The new 82575/6 chips support even more queues; the two-port version will be out this month and the four-port in October (PCI-E cards). Motherboards are already shipping with the 82576.
(The 82571 supports 2x/2x; the 82575/6 support 4x/4x.) Paul From paul at gtcomm.net Mon Jul 7 18:55:15 2008 From: paul at gtcomm.net (Paul) Date: Mon Jul 7 18:55:21 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707213356.G7572@besplex.bde.org> References: <4867420D.7090406@gtcomm.net> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> Message-ID: <4872670A.9050801@gtcomm.net> > I use low-end memory, but on the machine that does 640 kpps it somehow > has latency almost 4 times as low as on new FreeBSD cluster machines > (~42 nsec instead of ~150). perfmon (fixed for AXP and A64) and hwpmc > report an average of 11 k8-dc-misses per sendto() while sending via > bge at 640 kpps. 11 * 42 accounts for 462 nsec out of the 1562 per > packet at this rate. 11 * 150 = 1650 would probably make this rate > unachievable despite the system having 20 times as much CPU and bus. > Buffered DIMMs, DDR3 and high-CAS DDR2 are all going to have a lot more latency than older modules, because of the high frequency or the buffering. The best option is to run DDR2 at the lowest timings it supports, at the highest frequency it can reach with those timings, rather than at its highest supported frequency with looser timings. For instance, I have some 1100 MHz DDR2 RAM that is rated 5-5-5-15, but it will do 5-4-4-12 at 1000 or 900 MHz, so I think the latency may have more impact on the speed than the actual MHz of the RAM itself.
This held for several benchmarks I tested before settling on running the RAM at 1:1 with the FSB (400 MHz FSB, 1600 effective, with the RAM at 800): the latency is a lot lower than with the RAM at 1:1.20, even though the bandwidth is higher at 1:1.20. With higher latency in the 'server' machines we probably need to do things in bigger chunks. Anyone using a FBSD router isn't going to care about a 1ms delay in the packet, but they will care if packets are dropped or reordered. Paul From brde at optusnet.com.au Mon Jul 7 19:15:46 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jul 7 19:16:00 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080708034304.R21502@delplex.bde.org> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net><486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net><486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net><486B4F11.6040906@gtcomm.net><486BC7F5.5070604@gtcomm.net><20080703160540.W6369@delplex.bde.org><486C7F93.7010308@gtcomm.net><20080703195521.O6973@delplex.bde.org><486D35A0.4000302@gtcomm.net><486DF1A3.9000409@gtcomm.net><486E65E6.3060301@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> Message-ID: <20080708045135.V1022@besplex.bde.org> On Tue, 8 Jul 2008, Bruce Evans wrote: > On Mon, 7 Jul 2008, Andre Oppermann wrote: > >> Bruce Evans wrote: >>> So it seems that the major overheads are not near the driver (as I already >>> knew), and upper layers are responsible for most of the cache misses. >>> The packet header is accessed even in monitor mode, so I think most of >>> the cache misses in upper layers are not related to the packet header. >>> Maybe they are due mainly to perfect non-locality for mbufs.
> >> >> Monitor mode doesn't access the payload packet header. It only looks >> at the mbuf (which has a structure called mbuf packet header). The mbuf >> header is hot in the cache because the driver just touched it and filled >> in the information. The packet content (the payload) is cold and just >> arrived via DMA in DRAM. > > Why does it use ntohs() then? :-). From if_ethersubr.c: > ... > % eh = mtod(m, struct ether_header *); > > Point outside of mbuf header. > > % etype = ntohs(eh->ether_type); > > First access outside of mbuf header. > ... > % % /* Allow monitor mode to claim this frame, after stats are updated. > */ > % if (ifp->if_flags & IFF_MONITOR) { > % m_freem(m); > % return; > % } > > Finally return in monitor mode. > > I don't see any stats update before here except for the stray if_imcasts > one.

There are some error stats with printfs, but I've never seen these do anything except with a buggy sk driver.

Testing verifies that accessing eh above gives a cache miss. Under ~5.2 receiving on bge0 at 397 kpps (cm/p = cache misses per packet):

-monitor: 17% idle  19 cm/p  (18% less idle than under -current)
 monitor: 66% idle   8 cm/p  (17% less idle than under -current)
+monitor: 71% idle   7 cm/p  (idle time under -current not measured)

+monitor is monitor mode with the exit moved to the top of ether_input(). If the cache miss takes the time measured by lmbench2 (42 ns), then 397 k of these per second gives 17 ms per second, or 1.7% CPU, which is vaguely consistent with the 5% improvement from not taking this cache miss. Avoiding most of the 19 cache misses should give much more than a 5% improvement. Maybe -current gets its 17% improvement by avoiding some.

More stats weirdness in userland:
- in monitor mode, em0 gives delayed byte counts while bge0 always gives byte counts of 0.
- netstat -I 1 seems to be broken in ~5.2 in all modes -- it gives output for interfaces with drivers but no hardware.

All this is for UP. An SMP kernel on the same UP system loses < 5% for at least tx.
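A minimal userland sketch of the reordering behind the "+monitor" numbers: decide on monitor mode using only cache-hot metadata, and return before the first read of the DMA'd frame, so that read (and its likely cache miss) happens only on the normal input path. The structs and functions are hypothetical stand-ins, not the real struct ifnet, struct mbuf or ether_input(); the multicast classification and BPF tap that precede the real monitor-mode return are left out, and unlike Bruce's "+monitor" test this variant still counts bytes (from metadata only).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETHER_HDR_LEN 14 /* 6-byte dst + 6-byte src + 2-byte type */

/* Hypothetical, heavily simplified stand-ins for struct ifnet and struct mbuf. */
struct sketch_if {
    int up;               /* models IFF_UP */
    int monitor;          /* models IFF_MONITOR */
    unsigned long ibytes; /* models if_ibytes */
};

struct sketch_pkt {
    const unsigned char *data; /* frame bytes, just written by DMA (cache-cold) */
    size_t len;                /* models m_pkthdr.len (cache-hot metadata) */
};

/* Counts reads of the frame contents so a caller can see which path touches them. */
static unsigned payload_reads;

static uint16_t sketch_read_etype(const struct sketch_pkt *m)
{
    payload_reads++;
    /* ether_type is big-endian at offset 12; same value as ntohs(eh->ether_type). */
    return (uint16_t)((m->data[12] << 8) | m->data[13]);
}

/*
 * Returns 0 if the frame is consumed here (dropped, or claimed by monitor
 * mode), 1 if it would be handed up with *etype filled in.  Everything
 * before the monitor-mode check looks at metadata only.
 */
static int sketch_ether_input(struct sketch_if *ifp, const struct sketch_pkt *m,
                              uint16_t *etype)
{
    assert(ifp != NULL && m != NULL && etype != NULL);
    if (!ifp->up)
        return 0;
    if (ifp->monitor) {
        ifp->ibytes += m->len; /* stats from metadata; payload never read */
        return 0;
    }
    if (m->len < SKETCH_ETHER_HDR_LEN)
        return 0;
    ifp->ibytes += m->len;
    *etype = sketch_read_etype(m); /* first access to the frame contents */
    return 1;
}
```

In monitor mode the function returns without ever calling sketch_read_etype(), so payload_reads stays at 0; on the normal path the frame is read exactly once. Whether a real compiler already sinks the ntohs() load below the monitor check, as Bruce suspects it might, is a separate question that only profiling can answer.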
Bruce From fbsdlist at src.cx Mon Jul 7 19:53:45 2008 From: fbsdlist at src.cx (Artem Belevich) Date: Mon Jul 7 19:53:59 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080708045135.V1022@besplex.bde.org> References: <4867420D.7090406@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> Message-ID: Hi, As was already mentioned, we can't avoid all cache misses, as there's data that has recently been updated in memory via DMA and has therefore been evicted from the cache. However, we may hide some of the latency penalty by prefetching 'interesting' data early. For example, we know that we want to access some Ethernet headers, so we may start pulling the relevant data into the cache early. Ideally, by the time we need to access the field, it will already be in the cache. When we're counting nanoseconds per packet, this may bring some performance gain. Just my $0.02.
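To put "counting nanoseconds per packet" into numbers, here is a back-of-the-envelope sketch using the figures quoted earlier in the thread: about 11 cache misses per packet at 640 kpps, ~42 ns versus ~150 ns memory latency, and one extra 42 ns miss per packet at 397 kpps. The helper names are hypothetical, and the model is a deliberate worst case: it assumes each miss stalls the CPU for the full latency with no overlap, which is exactly the cost that prefetching tries to hide.

```c
#include <assert.h>

/* Wall-clock time available per packet (ns) if one CPU handles pps packets/s. */
static double ns_per_packet(double pps)
{
    assert(pps > 0.0);
    return 1e9 / pps;
}

/* Upper bound on the packet rate if each packet costs misses_per_pkt *
   latency_ns of stall time and nothing else (misses serialized, no overlap). */
static double max_pps_if_miss_bound(double misses_per_pkt, double latency_ns)
{
    assert(misses_per_pkt > 0.0 && latency_ns > 0.0);
    return 1e9 / (misses_per_pkt * latency_ns);
}

/* Fraction of one CPU spent stalled on cache misses at a given packet rate. */
static double miss_cpu_fraction(double pps, double misses_per_pkt, double latency_ns)
{
    return pps * misses_per_pkt * latency_ns / 1e9;
}
```

At 640 kpps the budget is about 1562 ns per packet. Eleven misses at 42 ns use about 462 ns of it, while at 150 ns they would need 1650 ns, which on its own caps the rate near 606 kpps. One extra 42 ns miss per packet at 397 kpps costs about 1.7% of a CPU, matching the estimate for the ether_type access in monitor mode.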
--Artem From mike at sentex.net Mon Jul 7 20:06:20 2008 From: mike at sentex.net (Mike Tancsa) Date: Mon Jul 7 20:06:26 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48726422.7050703@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> Message-ID: <200807072006.m67K6GTE020938@lava.sentex.ca> At 02:44 PM 7/7/2008, Paul wrote: >Also my 82571 NIC supports multiple received queues and multiple >transmit queues so why hasn't >anyone written the driver to support this? It's not a 10gb card and >it still supports it and it's widely Intel actually maintains the driver. Not sure if there are plans or not, but perhaps they can comment ? ---Mike >available and not too expensive either. The new 82575/6 chips >support even more queues and the >two port version will be out this month and the 4 port in october >(PCI-E cards). Motherboards are >already shipping with the 82576.. 
(82571 supports 2x/2x 575/6 >support 4x/4x) > >Paul > > > > > > > > >_______________________________________________ >freebsd-net@freebsd.org mailing list >http://lists.freebsd.org/mailman/listinfo/freebsd-net >To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From paul at gtcomm.net Mon Jul 7 20:21:11 2008 From: paul at gtcomm.net (Paul) Date: Mon Jul 7 20:21:18 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <200807072006.m67K6GTE020938@lava.sentex.ca> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807072006.m67K6GTE020938@lava.sentex.ca> Message-ID: <48727B31.3070003@gtcomm.net> I hope so, if they maintain the driver then why wouldn't they make it take advantage of their own hardware? I hope they are stuck focusing on windows users :/ Mike Tancsa wrote: > At 02:44 PM 7/7/2008, Paul wrote: >> Also my 82571 NIC supports multiple received queues and multiple >> transmit queues so why hasn't >> anyone written the driver to support this? It's not a 10gb card and >> it still supports it and it's widely > > > Intel actually maintains the driver. Not sure if there are plans or > not, but perhaps they can comment ? 
> > ---Mike From julian at elischer.org Mon Jul 7 20:25:15 2008 From: julian at elischer.org (Julian Elischer) Date: Mon Jul 7 20:25:20 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> Message-ID: <48727BA9.6020702@elischer.org> Artem Belevich wrote: > Hi, > > As was already mentioned, we can't avoid all cache misses as there's > data that's recently been updated in memory via DMA and therefor > kicked out of cache. > > However, we may hide some of the latency penalty by prefetching > 'interesting' data early. I.e. we know that we want to access some > ethernet headers, so we may start pulling relevant data into cache > early. Ideally, by the time we need to access the field, it will > already be in the cache. When we're counting nanoseconds per packet > this may bring some performance gain. Prefetching when you are already waiting for the data doesn't help. What you need is a speculative prefetch where you can tell the processor "We will probably need the following address, so start getting it while we go do other stuff". As far as I know we have no capacity to do that. > > Just my $0.02.
> --Artem From peterjeremy at optushome.com.au Mon Jul 7 22:13:03 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Mon Jul 7 22:13:11 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48727BA9.6020702@elischer.org> References: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> Message-ID: <20080707221257.GH62764@server.vk2pj.dyndns.org> On 2008-Jul-07 13:25:13 -0700, Julian Elischer wrote: >what you need is a speculative prefetch where you an tell teh >processor "We will probably need the following address so start >getting it while we go do other stuff". This looks like the PREFETCH instructions that exist in at least amd64 and SPARC. Unfortunately, their optimal use is very implementation-dependent and the AMD documentation suggests that incorrect use can degrade performance. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
From julian at elischer.org Mon Jul 7 22:17:23 2008 From: julian at elischer.org (Julian Elischer) Date: Mon Jul 7 22:17:29 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707221257.GH62764@server.vk2pj.dyndns.org> References: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> <20080707221257.GH62764@server.vk2pj.dyndns.org> Message-ID: <487295F2.7070802@elischer.org> Peter Jeremy wrote: > On 2008-Jul-07 13:25:13 -0700, Julian Elischer wrote: >> what you need is a speculative prefetch where you an tell teh >> processor "We will probably need the following address so start >> getting it while we go do other stuff". > > This looks like the PREFETCH instructions that exist in at least amd64 > and SPARC. Unfortunately, their optimal use is very implementation- > dependent and the AMD documentation suggests that incorrect use can > degrade performance.
> It might be worth looking to see if the network processing threads might be able to prefetch the IP header at least :-) From fbsdlist at src.cx Mon Jul 7 22:33:05 2008 From: fbsdlist at src.cx (Artem Belevich) Date: Mon Jul 7 22:33:11 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48727BA9.6020702@elischer.org> References: <4867420D.7090406@gtcomm.net> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> Message-ID: > Prefetching when you are waiting for the data isn't a help. Agreed. You have to start the prefetch around ns before you actually need the data, and move on to other work that does not depend on the data you've just started prefetching. > what you need is a speculative prefetch where you an tell teh processor "We > will probably need the following address so start getting it while we go do > other stuff". It does not have to be 'speculative' either. In this particular case we have a very good idea that we *will* need some data from the Ethernet header and, probably, the IP and TCP headers as well. We might as well tell the hardware to start pulling the data in without stalling the CPU. Intel has instructions specifically for this purpose. I assume AMD has them too.
--Artem From paul at gtcomm.net Mon Jul 7 23:03:56 2008 From: paul at gtcomm.net (Paul) Date: Mon Jul 7 23:04:02 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> Message-ID: <4872A155.70606@gtcomm.net> We could add this as part of the fastforwarding code, turning it on for a router and leaving it off for a server. When I use a FBSD box as a router it doesn't do anything else, so there could be two optimized paths: one for routing/forwarding/firewalling only and one for use as a server. Artem Belevich wrote: >> Prefetching when you are waiting for the data isn't a help. >> > > Agreed. Got to start prefetch around ns > before you actually need the data and move on doing other things that > do not depend on the data you've just started prefetching. > > >> what you need is a speculative prefetch where you an tell teh processor "We >> will probably need the following address so start getting it while we go do >> other stuff". >> > > It does not have to be 'speculative' either. In this particular case > we have very good idea that we *will* need some data from ethernet > header and, probably, IP and TCP headers as well. We might as well tel > the hardware to start pulling data in without stalling the CPU. Intel > has instructions specifically for this purpose. I assume AMD has them > too.
> > --Artem From andrew at modulus.org Mon Jul 7 23:18:35 2008 From: andrew at modulus.org (Andrew Snow) Date: Mon Jul 7 23:18:42 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <200807072006.m67K6GTE020938@lava.sentex.ca> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807072006.m67K6GTE020938@lava.sentex.ca> Message-ID: <4872A3FE.706@modulus.org> Mike Tancsa wrote: > At 02:44 PM 7/7/2008, Paul wrote: >> Also my 82571 NIC supports multiple received queues and multiple >> transmit queues so why hasn't >> anyone written the driver to support this? It's not a 10gb card and >> it still supports it and it's widely > Intel actually maintains the driver. Not sure if there are plans or not, > but perhaps they can comment ? Last time Jack Vogel weighed in, I believe he said that the support for multiple queues and other performance enhancements are on the way. Some work had to be done to the networking infrastructure in FreeBSD first to allow this.
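As a rough illustration of what multiple receive queues buy: the NIC hashes each packet's flow identifiers and uses the result to pick a queue, so each queue (and the CPU servicing it) sees a disjoint set of flows while packets within one flow stay in order, which matters given Paul's point that routers must not reorder. This is a sketch under stated assumptions: Intel's receive-side scaling uses a Toeplitz hash, whereas this sketch substitutes FNV-1a for brevity, and struct flow_key and rx_queue_for() are hypothetical names, not part of the em or igb drivers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the IPv4 5-tuple, in host byte order. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Mixes the low nbytes of v into an FNV-1a hash state, one byte at a time.
   NOT the Toeplitz hash that RSS-capable hardware actually computes. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u; /* FNV 32-bit prime */
    }
    return h;
}

static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u; /* FNV 32-bit offset basis */
    h = fnv1a_mix(h, k->src_ip, 4);
    h = fnv1a_mix(h, k->dst_ip, 4);
    h = fnv1a_mix(h, k->src_port, 2);
    h = fnv1a_mix(h, k->dst_port, 2);
    h = fnv1a_mix(h, k->proto, 1);
    return h;
}

/* Every packet of a flow maps to the same queue, so per-flow ordering is
   preserved even though different flows are handled on different CPUs. */
static unsigned rx_queue_for(const struct flow_key *k, unsigned nqueues)
{
    assert(nqueues > 0);
    return flow_hash(k) % nqueues;
}
```

With nqueues set to 2 (the 82571 figure Paul gives) or 4 (82575/6), the same function spreads flows across queues. Note that a single high-rate flow still lands on one queue, so this only helps with many concurrent flows.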
- Andrew From julian at elischer.org Mon Jul 7 23:22:00 2008 From: julian at elischer.org (Julian Elischer) Date: Mon Jul 7 23:22:25 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> Message-ID: <4872A516.4000306@elischer.org> Artem Belevich wrote: >> Prefetching when you are waiting for the data isn't a help. > > Agreed. Got to start prefetch around ns > before you actually need the data and move on doing other things that > do not depend on the data you've just started prefetching. > >> what you need is a speculative prefetch where you an tell teh processor "We >> will probably need the following address so start getting it while we go do >> other stuff". > > It does not have to be 'speculative' either. "*Will*" is just a very definite subset of 'speculation' :-) > In this particular case > we have very good idea that we *will* need some data from ethernet > header and, probably, IP and TCP headers as well. We might as well tel > the hardware to start pulling data in without stalling the CPU. Intel > has instructions specifically for this purpose. I assume AMD has them > too. 
> > --Artem From mike at sentex.net Tue Jul 8 01:07:36 2008 From: mike at sentex.net (Mike Tancsa) Date: Tue Jul 8 01:07:42 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <48726422.7050703@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> Message-ID: <200807080107.m6817XxO021966@lava.sentex.ca> At 02:44 PM 7/7/2008, Paul wrote: >Also my 82571 NIC supports multiple received queues and multiple >transmit queues so why hasn't >anyone written the driver to support this? It's not a 10gb card and >it still supports it and it's widely >available and not too expensive either. The new 82575/6 chips >support even more queues and the >two port version will be out this month and the 4 port in october >(PCI-E cards). Motherboards are >already shipping with the 82576.. (82571 supports 2x/2x 575/6 >support 4x/4x) Actually, do any of your NICs attach via the igb driver ?
---Mike From paul at gtcomm.net Tue Jul 8 01:20:40 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 8 01:20:47 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <200807080107.m6817XxO021966@lava.sentex.ca> References: <4867420D.7090406@gtcomm.net> <200806301944.m5UJifJD081781@lava.sentex.ca> <20080701004346.GA3898@stlux503.dsto.defence.gov.au> <20080701010716.GF3898@stlux503.dsto.defence.gov.au> <486986D9.3000607@monkeybrains.net> <48699960.9070100@gtcomm.net> <20080701033117.GH83626@cdnetworks.co.kr> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> Message-ID: <4872C161.6080105@gtcomm.net> I read through the igb driver, and it says 82575/6 only, which is the new chip Intel is releasing on cards this month (2-port) and in October (4-port), though the chips are already on some motherboards. Why can't it also support the 82571? That doesn't make any sense. I haven't tried it, but from browsing the driver source it doesn't look like it will work. Mike Tancsa wrote: > At 02:44 PM 7/7/2008, Paul wrote: > >> Also my 82571 NIC supports multiple received queues and multiple >> transmit queues so why hasn't >> anyone written the driver to support this? It's not a 10gb card and >> it still supports it and it's widely >> available and not too expensive either. The new 82575/6 chips >> support even more queues and the >> two port version will be out this month and the 4 port in october >> (PCI-E cards). Motherboards are >> already shipping with the 82576.. (82571 supports 2x/2x 575/6 >> support 4x/4x) > > > > > Actually, do any of your NICs attach via the igb driver ?
> > ---Mike From peterjeremy at optushome.com.au Tue Jul 8 03:32:54 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Tue Jul 8 03:33:02 2008 Subject: smbmount / smbclient : strangely varying transfer speeds In-Reply-To: <200807071615.40987.nettwork@gmx.de> References: <200807071615.40987.nettwork@gmx.de> Message-ID: <20080708033239.GL62764@server.vk2pj.dyndns.org> On 2008-Jul-07 16:15:40 +0000, Achim wrote: >Performance with a single client is degraded when the client is smbmount and >downloading. >With a second transfer in any direction, performance becomes better, to about >3.5 resp. 8 MB/s depending on the second connection up- or downloading. >Unlike smbmount, single smbclient transfers yield acceptable results. Are these two transfers between a single server and a single client, or between a single server and two clients? In the former case, you might like to try transfers involving two clients or two servers to try to identify which end is behaving oddly. -- Peter Jeremy
From kip.macy at gmail.com Tue Jul 8 07:27:17 2008 From: kip.macy at gmail.com (Kip Macy) Date: Tue Jul 8 07:27:23 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <200807080107.m6817XxO021966@lava.sentex.ca> References: <4867420D.7090406@gtcomm.net> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> Message-ID: On Mon, Jul 7, 2008 at 6:07 PM, Mike Tancsa wrote: > At 02:44 PM 7/7/2008, Paul wrote: > >> Also my 82571 NIC supports multiple received queues and multiple transmit >> queues so why hasn't >> anyone written the driver to support this? It's not a 10gb card and it >> still supports it and it's widely >> available and not too expensive either. The new 82575/6 chips support >> even more queues and the >> two port version will be out this month and the 4 port in october (PCI-E >> cards). Motherboards are >> already shipping with the 82576.. (82571 supports 2x/2x 575/6 support >> 4x/4x) > > > > > Actually, do any of your NICs attach via the igb driver ? > I have a pre-production card. With some bug fixes and some tuning of interrupt handling (custom stack; I've been asked to push the changes back into CVS, but I just don't have time right now), an otherwise unoptimized igb can forward 1.04 Mpps from one port to another (1.04 Mpps in on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8-core system.
-Kip From kip.macy at gmail.com Tue Jul 8 07:32:10 2008 From: kip.macy at gmail.com (Kip Macy) Date: Tue Jul 8 07:32:17 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4872C161.6080105@gtcomm.net> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <4872C161.6080105@gtcomm.net> Message-ID: On Mon, Jul 7, 2008 at 6:22 PM, Paul wrote: > I read through the IGB driver, and it says 82575/6 only... which is the new > chip Intel is releasing on the cards this month 2 port > and october 4 port, but the chips are on some of the motherboards right now. > Why can't it also use the 82571 ? doesn't make any sense.. I haven't tried > it but just browsing the driver source > doesn't look like it will work. The igb driver has been written to remove a lot of the cruft that has accumulated to work around deficiencies in earlier 8257x hardware. Although it supports "legacy" descriptor handling it has a new mode of descriptor handling that is ostensibly better. I don't have access to the data sheets for pre-zoar hardware so I'm not sure what it would take to support multiple queues on that hardware. 
-Kip From rwatson at FreeBSD.org Tue Jul 8 07:54:46 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Tue Jul 8 07:54:53 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> Message-ID: <20080708085227.J31157@fledge.watson.org> On Mon, 7 Jul 2008, Artem Belevich wrote: > As was already mentioned, we can't avoid all cache misses as there's data > that's recently been updated in memory via DMA and therefor kicked out of > cache. > > However, we may hide some of the latency penalty by prefetching > 'interesting' data early. I.e. we know that we want to access some ethernet > headers, so we may start pulling relevant data into cache early. Ideally, by > the time we need to access the field, it will already be in the cache. When > we're counting nanoseconds per packet this may bring some performance gain. There were some patches floating around for if_em to do a prefetch of the first bit of packet data on packets before handing them up the stack. My understanding is that they moved the hot spot earlier, but didn't make a huge difference because it doesn't really take that long to get to the point where you're processing the IP header in our current stack (a downside to optimization...). However, that's a pretty anecdotal story, and a proper study of the effects of prefetching would be most welcome. One thing that I'd really like to see someone look at is whether, by doing a bit of appropriately timed prefetching, we can move cache misses out from under hot locks that don't really relate to the data being prefetched. 
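The prefetching idea discussed above (start pulling the first bytes of a packet into cache before the code that parses it runs) can be sketched in userland C. This is not the if_em patch; it is a minimal illustration assuming GCC or Clang, whose `__builtin_prefetch(addr, rw, locality)` builtin issues a non-faulting prefetch hint. `struct pkt` and `count_ipv4` are hypothetical names standing in for an RX ring and the first stage of header processing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one received frame in an RX ring. */
struct pkt {
    const uint8_t *data;  /* start of the Ethernet frame */
    size_t len;           /* frame length in octets */
};

enum {
    ETH_HDR_OCTETS = 14,     /* dst(6) + src(6) + type(2) */
    ETHERTYPE_IPV4 = 0x0800
};

/* EtherType is stored big-endian in octets 12-13 of the header. */
static uint16_t ether_type(const uint8_t *frame)
{
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/*
 * Count IPv4 frames. Before examining frame i, hint the CPU to start
 * loading the first cache line of frame i + 1, so the miss on the next
 * header can overlap with work on the current one.
 */
size_t count_ipv4(const struct pkt *pkts, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            /* rw = 0 (read), locality = 3 (keep in all cache levels) */
            __builtin_prefetch(pkts[i + 1].data, 0, 3);

        if (pkts[i].len >= ETH_HDR_OCTETS &&
            ether_type(pkts[i].data) == ETHERTYPE_IPV4)
            count++;
    }
    return count;
}
```

With a prefetch distance of one packet and very little work per packet, as here, much of the miss latency can remain exposed; a larger distance or more work between the hint and the use is usually needed before the effect becomes measurable.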
Robert N M Watson Computer Laboratory University of Cambridge From stefan.lambrev at moneybookers.com Tue Jul 8 08:15:46 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 8 08:15:53 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <4869ACFC.5020205@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> Message-ID: <4873222B.4080907@moneybookers.com> Hi, Kip Macy wrote: > On Mon, Jul 7, 2008 at 6:07 PM, Mike Tancsa wrote: > >> At 02:44 PM 7/7/2008, Paul wrote: >> >> >>> Also my 82571 NIC supports multiple received queues and multiple transmit >>> queues so why hasn't >>> anyone written the driver to support this? It's not a 10gb card and it >>> still supports it and it's widely >>> available and not too expensive either. The new 82575/6 chips support >>> even more queues and the >>> two port version will be out this month and the 4 port in october (PCI-E >>> cards). Motherboards are >>> already shipping with the 82576.. (82571 supports 2x/2x 575/6 support >>> 4x/4x) >>> >> >> >> Actually, do any of your NICs attach via the igb driver ? >> >> > > I have a pre-production card. With some bug fixes and some tuning of > interrupt handling (custom stack - I've been asked to push the changes > back in to CVS, I just don't have time right now) an otherwise > unoptimized igb can forward 1.04Mpps from one port to another (1.04 > Mpps in on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8 > core system. > > Is this on 1gbps or on 10gbps NIC? 
> -Kip -- Best Wishes, Stefan Lambrev ICQ# 24134177 From kip.macy at gmail.com Tue Jul 8 08:19:11 2008 From: kip.macy at gmail.com (Kip Macy) Date: Tue Jul 8 08:19:17 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <4873222B.4080907@moneybookers.com> References: <4867420D.7090406@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <4873222B.4080907@moneybookers.com> Message-ID: >> I have a pre-production card. With some bug fixes and some tuning of >> interrupt handling (custom stack - I've been asked to push the changes >> back in to CVS, I just don't have time right now) an otherwise >> unoptimized igb can forward 1.04Mpps from one port to another (1.04 >> Mpps in on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8 >> core system. >> >> > > Is this on 1gbps or on 10gbps NIC? >> Hi Stefan, The hardware that igb supports is just the latest revision of the hardware supported by em, i.e. it is 1gbps.
Cheers, Kip From paul at gtcomm.net Tue Jul 8 08:26:43 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 8 08:26:53 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <4873222B.4080907@moneybookers.com> Message-ID: <4873253D.3070707@gtcomm.net> Can someone confirm whether it will support the 82571EB? I don't see why it shouldn't, as it's very similar hardware and it's available now in large quantities, so I think making the 82571 part of igb would be a good idea. Kip Macy wrote: >>> I have a pre-production card. With some bug fixes and some tuning of >>> interrupt handling (custom stack - I've been asked to push the changes >>> back in to CVS, I just don't have time right now) an otherwise >>> unoptimized igb can forward 1.04Mpps from one port to another (1.04 >>> Mpps in on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8 >>> core system. >>> >>> >>> >> Is this on 1gbps or on 10gbps NIC? >> > > Hi Stefan, > The hardware that igb supports is just the latest revision of the > hardware supported by em, i.e. it is 1gbps. > > Cheers, > Kip From joe.kuan at itrinegy.com Tue Jul 8 11:08:45 2008 From: joe.kuan at itrinegy.com (Joe Kuan) Date: Tue Jul 8 11:08:57 2008 Subject: Help: FreeBSD 6.3 - em driver & taskqueue & priority Message-ID: Hi all, I have implemented a network application in kernel space and it is working fine.
The application involves three network interfaces: FreeBSD 6.3 forwards mbufs between em0 and em1 at a rate of 1.3-1.4 million packets per second, and em2 is used for controlling the network application. The problem is that when em0 and em1 are transmitting at 1.3-1.4 million packets per second, the em2 interface becomes unresponsive. My goal is to make the kernelised network application respond as soon as a control packet arrives on em2, i.e. the control packet should jump the queue ahead of all the packets on em0 and em1. I think the problem is that the priority set in the task structure is the same for all the em devices. Am I heading in the right direction? If not, please advise me. Many thanks in advance, Joe From brde at optusnet.com.au Tue Jul 8 11:47:52 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Tue Jul 8 11:47:59 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707131550.GA69202@owl.midgard.homeip.net> References: <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> <20080707131550.GA69202@owl.midgard.homeip.net> Message-ID: <20080708214624.W1168@besplex.bde.org> On Mon, 7 Jul 2008, Erik Trulsson wrote: > On Mon, Jul 07, 2008 at 10:30:53PM +1000, Bruce Evans wrote: >> On Mon, 7 Jul 2008, Andre Oppermann wrote: >>> The theoretical maximum at 64byte frames is 1,488,100. I've looked >>> up my notes the 1.244Mpps number can be ajusted to 1.488Mpps. >> >> Where is the extra? I still get 1.644736 Mpps (10^9/(8*64+96)). >> 1.488095 is for 64 bits extra (10^9/(8*64+96+64)).
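The two rates above differ by exactly 64 bits per frame, which is the 7-octet preamble plus the 1-octet start-frame delimiter that precede every Ethernet frame on the wire. A small C sketch of the per-frame accounting makes this explicit; the macro and function names are illustrative only (not from any FreeBSD header), and it assumes the standard 96-bit minimum inter-frame gap and 46-octet minimum payload.

```c
#include <stdint.h>

/* Per-frame on-the-wire sizes in octets (Ethernet framing). */
#define PREAMBLE_OCTETS 7
#define SFD_OCTETS      1
#define HEADER_OCTETS   14  /* 6 dst + 6 src + 2 length/type */
#define FCS_OCTETS      4
#define MIN_PAYLOAD     46  /* shorter payloads are padded up to this */
#define IFG_BITS        96  /* minimum inter-frame gap */

/* Bits one frame occupies on the wire for a given payload length,
 * counting preamble, SFD, header, padded payload, FCS and the gap. */
static uint64_t wire_bits(uint32_t payload_octets)
{
    uint64_t payload = payload_octets < MIN_PAYLOAD ? MIN_PAYLOAD : payload_octets;
    uint64_t octets = PREAMBLE_OCTETS + SFD_OCTETS + HEADER_OCTETS
                    + payload + FCS_OCTETS;
    return octets * 8 + IFG_BITS;
}

/* Maximum whole frames per second at link_bps for that payload size. */
uint64_t max_pps(uint64_t link_bps, uint32_t payload_octets)
{
    return link_bps / wire_bits(payload_octets);
}
```

For a minimum-size frame `wire_bits(46)` is 672, so `max_pps(1000000000ULL, 46)` gives 1488095 frames/s; dropping the 64 preamble/SFD bits gives 608 bits and the 1.644736 Mpps figure. Full 1500-octet payloads give 81274 frames/s at 1 Gbit/s.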
>
> A standard ethernet frame (on the wire) consists of:
> 7 octets preamble
> 1 octet Start Frame Delimiter
> 6 octets destination address
> 6 octets source address
> 2 octets length/type
> 46-1500 octets data (+padding if needed)
> 4 octets Frame Check Sequence
>
> Followed by (at least) 96 bits interFrameGap, before the next frame starts.
>
> For minimal packet size this gives a maximum packet rate at 1Gbit/s of
> 1e9/((7+1+6+6+2+46+4)*8+96) = 1488095 packets/second
>
> You probably missed the preamble and start frame delimiter in your
> calculation.
Thanks. Yes, that was it. Bruce From ivoras at freebsd.org Tue Jul 8 12:23:44 2008 From: ivoras at freebsd.org (Ivan Voras) Date: Tue Jul 8 12:23:51 2008 Subject: Help: FreeBSD 6.3 - em driver & taskqueue & priority In-Reply-To: References: Message-ID: Joe Kuan wrote: > Hi all, > > I have implemented an network application in kernel space and it is > working fine. The application involves 3 network interfaces that FreeBSD > 6.3 can forward mbuf between em0 and em1 in a rate 1.3 - 1.4 millions > packets per second. Em2 is used for controlling the network application. > > The problem is that when em0 and em1 are transmitting in 1.3 - 1.4 > millions packets per second, the em2 interface becomes irresponsive. > However, my goal is to make the kernelised network application response > as soon as a control packet arrives in em2, ie jumps the queue ahead of > all the packets in em0 and em1. > > I think the problem lies on the priority set on the task structure > are all the same for all the em devices. Am I heading in the right > direction? A wild theory: are the NICs separate / individual and on separate buses? If they are not (e.g. two of them are on the same card or bus) it might be a hardware issue.
From joe.kuan at itrinegy.com Tue Jul 8 12:26:38 2008 From: joe.kuan at itrinegy.com (Joe Kuan) Date: Tue Jul 8 12:26:44 2008 Subject: Help: FreeBSD 6.3 - em driver & taskqueue & priority In-Reply-To: References: Message-ID: <03FF3E77-8B0F-4AF2-91D5-F0062DD8C6A2@itrinegy.com> On 8 Jul 2008, at 12:35, Ivan Voras wrote: > Joe Kuan wrote: >> Hi all, >> >> I have implemented an network application in kernel space and it is >> working fine. The application involves 3 network interfaces that >> FreeBSD >> 6.3 can forward mbuf between em0 and em1 in a rate 1.3 - 1.4 millions >> packets per second. Em2 is used for controlling the network >> application. >> >> The problem is that when em0 and em1 are transmitting in 1.3 - 1.4 >> millions packets per second, the em2 interface becomes irresponsive. >> However, my goal is to make the kernelised network application >> response >> as soon as a control packet arrives in em2, ie jumps the queue >> ahead of >> all the packets in em0 and em1. >> >> I think the problem lies on the priority set on the task structure >> are all the same for all the em devices. Am I heading in the right >> direction? > > A wild theory: are the NICs separate / individual and on separate > buses? > If they are not (e.g. two of them are on the same card or bus) it > might > be a hardware issue. > em0 and em1 are on the same card; em2 is on a separate card.
Thanks, Joe From nettwork at gmx.de Tue Jul 8 13:16:24 2008 From: nettwork at gmx.de (Achim) Date: Tue Jul 8 13:16:31 2008 Subject: smbmount / smbclient : strangely varying transfer speeds In-Reply-To: <20080708033239.GL62764@server.vk2pj.dyndns.org> References: <200807071615.40987.nettwork@gmx.de> <20080708033239.GL62764@server.vk2pj.dyndns.org> Message-ID: <200807081316.52482.nettwork@gmx.de> On Tuesday 08 July 2008 03:32:40 Peter Jeremy wrote: > On 2008-Jul-07 16:15:40 +0000, Achim wrote: > >Performance with a single client is degraded when the client is smbmount > > and downloading. > >With a second transfer in any direction, performance becomes better, to > > about 3.5 resp. 8 MB/s depending on the second connection up- or > > downloading. Unlike smbmount, single smbclient transfers yield acceptable > > results. > > Is this two transfers between a single server and single client or > between a single server and two clients? In the former case, you > might like to try a transfers involving two clients or two servers > to try and identify which end is behaving oddly. Splendid idea. So far, all test cases have indeed used the same two physical machines. However, if the client is FBSD7 instead of the Kubuntu machine, comparable results show up, so I suspect the server (FBSD7) is the source of the oddness. I will try your suggestion as soon as the FBSD notebook is back, which should be tonight. Furthermore, there is an FBSD dual-boot on the Kubuntu machine that I will test later to see if it shows the same behaviour. Funnily enough, if the second transfer is a very low-bandwidth one, like watching a movie instead of transferring it, the positive effect on the first transfer is also a LOT smaller than with a concurrent high-bandwidth transfer.
From stefan.lambrev at moneybookers.com Tue Jul 8 13:40:19 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Tue Jul 8 13:40:26 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <4872C161.6080105@gtcomm.net> Message-ID: <48736E3E.6090200@moneybookers.com> Hi, Kip Macy wrote: > On Mon, Jul 7, 2008 at 6:22 PM, Paul wrote: > >> I read through the IGB driver, and it says 82575/6 only... which is the new >> chip Intel is releasing on the cards this month 2 port >> and october 4 port, but the chips are on some of the motherboards right now. >> Why can't it also use the 82571 ? doesn't make any sense.. I haven't tried >> it but just browsing the driver source >> doesn't look like it will work. >> > > The igb driver has been written to remove a lot of the cruft that has > accumulated to work around deficiencies in earlier 8257x hardware. > Although it supports "legacy" descriptor handling it has a new mode of > descriptor handling that is ostensibly better. I don't have access to > the data sheets for pre-zoar hardware so I'm not sure what it would > take to support multiple queues on that hardware. > Maybe we should ask Jack Vogel? He will probably have some news.
> -Kip -- Best Wishes, Stefan Lambrev ICQ# 24134177 From fbsdlist at src.cx Tue Jul 8 16:34:50 2008 From: fbsdlist at src.cx (Artem Belevich) Date: Tue Jul 8 16:34:57 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080708085227.J31157@fledge.watson.org> References: <4867420D.7090406@gtcomm.net> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <20080708085227.J31157@fledge.watson.org> Message-ID: On 7/8/08, Robert Watson wrote: > There were some patches floating around for if_em to do a prefetch of the > first bit of packet data on packets before handing them up the stack. My I found Andre Oppermann's optimization patch mentioned in the July 2005 status report: http://lists.freebsd.org/pipermail/freebsd-announce/2005-July/001012.html http://www.nrg4u.com/freebsd/tcp_reass+prefetch-20041216.patch Is that the patch you had in mind? In the report Andre says: "Use [of prefetch] in both of these places show a very significant performance gain but not yet fully quantified." The "very significant" bit looks promising. Unfortunately, it does not look like the prefetch changes in the patch made it into the official kernel. I wonder why. It should be easy enough to apply the prefetch-related changes and see if/how they affect forwarding performance.
--Artem From francisgendreau at videotron.ca Tue Jul 8 18:50:04 2008 From: francisgendreau at videotron.ca (Francis Gendreau) Date: Tue Jul 8 18:50:22 2008 Subject: kern/125195: verbrose dmesg from asus m3000n m3n as requested by Gavin Atkinson Message-ID: <200807081850.m68Io4qq087507@freefall.freebsd.org> The following reply was made to PR kern/125195; it has been noted by GNATS. From: Francis Gendreau To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/125195: verbrose dmesg from asus m3000n m3n as requested by Gavin Atkinson Date: Tue, 08 Jul 2008 11:33:22 -0400 verbose dmesg after hard power cycle: Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries Data TLB: 4 KB Pages, 4-way set associative, 128 entries Instruction TLB: 4 MB pages, fully associative, 2 entries 2nd-level cache: 1 MB, 8-way set associative, 64 byte line size 1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size Data TLB: 4 MB Pages, 4-way set associative, 8 entries 1st-level data cache: 32 KB, 8-way set associative, 64 byte line size real memory = 527695872 (503 MB) Physical memory chunk(s): 0x0000000000001000 - 0x000000000009efff, 647168 bytes (158 pages) 0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages) 0x0000000001028000 - 0x000000001ee22fff, 501198848 bytes (122363 pages) avail memory = 502419456 (479 MB) Table 'FACP' at 0x1f740200 Table 'OEMB' at 0x1f750040 MADT: No MADT table found APIC: Could not find any APICs. 
pnpbios: Found PnP BIOS data at 0xc00f2e00 pnpbios: Entry = f0000:39da Rev = 1.0 Other BIOS signatures found: wlan_amrr: wlan: <802.11 Link Layer> firmware: 'ipw_bss' version 130: 209190 bytes loaded at 0xc0d68738 firmware: 'ipw_ibss' version 130: 201138 bytes loaded at 0xc0d9d73c firmware: 'ipw_monitor' version 130: 196458 bytes loaded at 0xc0dd0748 snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024] feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=5 feeder_buffersize=16384 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25 ath_rate: version 1.2 nfslock: pseudo-device kbd: new array size 4 kbd1 at kbdmux0 io: mem: Pentium Pro MTRR support enabled null: random: ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 19:59:27) ACPI: RSDP @ 0x0xf4b70/0x0014 (v 0 ACPIAM) ACPI: RSDT @ 0x0x1f740000/0x002C (v 1 A M I OEMRSDT 0x05000314 MSFT 0x00000097) ACPI: FACP @ 0x0x1f740200/0x0081 (v 2 A M I OEMFACP 0x05000314 MSFT 0x00000097) ACPI: DSDT @ 0x0x1f740300/0x7323 (v 1 0ABBD 0ABBD001 0x00000001 MSFT 0x0100000D) ACPI: FACS @ 0x0x1f750000/0x0040 ACPI: OEMB @ 0x0x1f750040/0x004D (v 1 A M I OEMBIOS 0x05000314 MSFT 0x00000097) npx0: INT 16 interface acpi0: on motherboard acpi0: [MPSAFE] acpi0: [ITHREAD] pci_open(1): mode 1 addr port (0x0cf8) is 0x8000005c pci_open(1a): mode1res=0x80000000 (0x80000000) pci_cfgcheck: device 0 [class=060000] [hdr=80] is there (id=35808086) pcibios: No call entry point AcpiOsDerivePciId: \\_SB_.PCI0.P0P1.CBS0.CBSP -> bus 1 dev 5 func 0 acpi0: Power Button (fixed) acpi0: wakeup code va 0xccd3f000 pa 0x1000 atpic: Programming IRQ9 as level/low AcpiOsDerivePciId: \\_SB_.PCI0.SBRG.FHR0 -> bus 0 dev 31 func 0 AcpiOsDerivePciId: \\_SB_.PCI0.SBRG.IROR -> bus 0 dev 31 func 0 acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, 1f700000 (3) failed ACPI timer: 1/1 1/1 1/0 1/1 1/1 1/1 1/0 1/1 1/1 1/1 -> 10 Timecounter 
"ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0 acpi_ec0: port 0x62,0x66 on acpi0 pci_link0: Index IRQ Rtd Ref IRQs Initial Probe 0 11 N 0 3 4 5 6 7 11 12 Validation 0 11 N 0 3 4 5 6 7 11 12 After Disable 0 255 N 0 3 4 5 6 7 11 12 pci_link1: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 3 4 5 6 7 11 12 Validation 0 255 N 0 3 4 5 6 7 11 12 After Disable 0 255 N 0 3 4 5 6 7 11 12 pci_link2: Index IRQ Rtd Ref IRQs Initial Probe 0 4 N 0 4 12 Validation 0 4 N 0 4 12 After Disable 0 255 N 0 4 12 pci_link3: Index IRQ Rtd Ref IRQs Initial Probe 0 5 N 0 5 6 Validation 0 5 N 0 5 6 After Disable 0 255 N 0 5 6 pci_link4: Index IRQ Rtd Ref IRQs Initial Probe 0 11 N 0 6 11 Validation 0 11 N 0 6 11 After Disable 0 255 N 0 6 11 pci_link5: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 3 7 Validation 0 255 N 0 3 7 After Disable 0 255 N 0 3 7 pci_link6: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 4 7 Validation 0 255 N 0 4 7 After Disable 0 255 N 0 4 7 pci_link7: Index IRQ Rtd Ref IRQs Initial Probe 0 4 N 0 4 6 12 Validation 0 4 N 0 4 6 12 After Disable 0 255 N 0 4 6 12 cpu0: on acpi0 cpu0: switching to generic Cx mode est0: on cpu0 p4tcc0: on cpu0 pcib0: port 0xcf8-0xcff on acpi0 ACPI: Found matching pin for 0.2.INTA at func 0: 11 ACPI: Found matching pin for 0.31.INTA at func 1: 255 ACPI: Found matching pin for 0.31.INTB at func 5: 255 ACPI: Found matching pin for 0.31.INTB at func 6: 255 ACPI: Found matching pin for 0.29.INTA at func 0: 11 ACPI: Found matching pin for 0.29.INTB at func 1: 5 ACPI: Found matching pin for 0.29.INTC at func 2: 4 ACPI: Found matching pin for 0.29.INTD at func 7: 4 pci0: on pcib0 pci0: domain=0, physical bus=0 found-> vendor=0x8086, dev=0x3580, revid=0x02 domain=0, bus=0, slot=0, func=0 class=06-00-00, hdrtype=0x00, mfdev=1 cmdreg=0x0106, statreg=0x2090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3584, revid=0x02 
domain=0, bus=0, slot=0, func=1 class=08-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0006, statreg=0x0080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3585, revid=0x02 domain=0, bus=0, slot=0, func=3 class=08-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0006, statreg=0x0080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3582, revid=0x02 domain=0, bus=0, slot=2, func=0 class=03-00-00, hdrtype=0x00, mfdev=1 cmdreg=0x0007, statreg=0x0090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 powerspec 1 supports D0 D1 D3 current D0 map[10]: type Prefetchable Memory, range 32, base 0xf0000000, size 27, enabled map[14]: type Memory, range 32, base 0xffa80000, size 19, enabled map[18]: type I/O Port, range 32, base 0xdc00, size 3, enabled pcib0: matched entry for 0.2.INTA (src \\_SB_.LNKA:0) pcib0: slot 2 INTA routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x3582, revid=0x02 domain=0, bus=0, slot=2, func=1 class=03-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0007, statreg=0x0090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) powerspec 1 supports D0 D1 D3 current D0 map[10]: type Prefetchable Memory, range 32, base 0xe8000000, size 27, enabled map[14]: type Memory, range 32, base 0xff980000, size 19, enabled found-> vendor=0x8086, dev=0x24c2, revid=0x03 domain=0, bus=0, slot=29, func=0 class=0c-03-00, hdrtype=0x00, mfdev=1 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 map[20]: type I/O Port, range 32, base 0xd480, size 5, enabled pcib0: matched entry for 0.29.INTA (src \\_SB_.LNKA:0) pcib0: slot 29 INTA routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x24c4, revid=0x03 domain=0, bus=0, slot=29, func=1 class=0c-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, 
cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=5 map[20]: type I/O Port, range 32, base 0xd800, size 5, enabled pcib0: matched entry for 0.29.INTB (src \\_SB_.LNKD:0) pcib0: slot 29 INTB routed to irq 5 via \\_SB_.LNKD found-> vendor=0x8086, dev=0x24c7, revid=0x03 domain=0, bus=0, slot=29, func=2 class=0c-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=c, irq=4 map[20]: type I/O Port, range 32, base 0xd880, size 5, enabled pcib0: matched entry for 0.29.INTC (src \\_SB_.LNKC:0) pcib0: slot 29 INTC routed to irq 4 via \\_SB_.LNKC found-> vendor=0x8086, dev=0x24cd, revid=0x03 domain=0, bus=0, slot=29, func=7 class=0c-03-20, hdrtype=0x00, mfdev=0 cmdreg=0x0106, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=d, irq=4 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xffa7fc00, size 10, enabled pcib0: matched entry for 0.29.INTD (src \\_SB_.LNKH:0) pcib0: slot 29 INTD routed to irq 4 via \\_SB_.LNKH found-> vendor=0x8086, dev=0x2448, revid=0x83 domain=0, bus=0, slot=30, func=0 class=06-04-00, hdrtype=0x01, mfdev=0 cmdreg=0x0107, statreg=0x8080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x06 (1500 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x24cc, revid=0x03 domain=0, bus=0, slot=31, func=0 class=06-01-00, hdrtype=0x00, mfdev=1 cmdreg=0x000f, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x24ca, revid=0x03 domain=0, bus=0, slot=31, func=1 class=01-01-8a, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=255 map[20]: type I/O Port, range 32, base 0xffa0, size 4, enabled map[24]: type Memory, range 32, base 0, size 10, memory disabled found-> 
vendor=0x8086, dev=0x24c5, revid=0x03 domain=0, bus=0, slot=31, func=5 class=04-01-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=255 powerspec 2 supports D0 D3 current D0 map[10]: type I/O Port, range 32, base 0xe000, size 8, enabled map[14]: type I/O Port, range 32, base 0xe100, size 6, enabled map[18]: type Memory, range 32, base 0, size 9, memory disabled map[1c]: type Memory, range 32, base 0, size 8, memory disabled found-> vendor=0x8086, dev=0x24c6, revid=0x03 domain=0, bus=0, slot=31, func=6 class=07-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=255 powerspec 2 supports D0 D3 current D0 map[10]: type I/O Port, range 32, base 0xe200, size 8, enabled map[14]: type I/O Port, range 32, base 0xe300, size 7, enabled pci0: at device 0.1 (no driver attached) pci0: at device 0.3 (no driver attached) vgapci0: port 0xdc00-0xdc07 mem 0xf0000000-0xf7ffffff,0xffa80000-0xffafffff irq 11 at device 2.0 on pci0 agp0: on vgapci0 vgapci0: Reserved 0x8000000 bytes for rid 0x10 type 3 at 0xf0000000 vgapci0: Reserved 0x80000 bytes for rid 0x14 type 3 at 0xffa80000 agp0: detected 8060k stolen memory agp0: aperture size is 128M vgapci1: mem 0xe8000000-0xefffffff,0xff980000-0xff9fffff at device 2.1 on pci0 uhci0: port 0xd480-0xd49f irq 11 at device 29.0 on pci0 uhci0: Reserved 0x20 bytes for rid 0x20 type 4 at 0xd480 uhci0: [GIANT-LOCKED] uhci0: [ITHREAD] usb0: on uhci0 usb0: USB revision 1.0 uhub0: on usb0 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xd800-0xd81f irq 5 at device 29.1 on pci0 uhci1: Reserved 0x20 bytes for rid 0x20 type 4 at 0xd800 uhci1: [GIANT-LOCKED] uhci1: [ITHREAD] usb1: on uhci1 usb1: USB revision 1.0 uhub1: on usb1 uhub1: 2 ports with 2 removable, self powered uhci2: port 0xd880-0xd89f irq 4 at device 29.2 on pci0 uhci2: Reserved 
0x20 bytes for rid 0x20 type 4 at 0xd880 uhci2: [GIANT-LOCKED] uhci2: [ITHREAD] usb2: on uhci2 usb2: USB revision 1.0 uhub2: on usb2 uhub2: 2 ports with 2 removable, self powered ehci0: mem 0xffa7fc00-0xffa7ffff irq 4 at device 29.7 on pci0 ehci0: Reserved 0x400 bytes for rid 0x10 type 3 at 0xffa7fc00 ehci0: [GIANT-LOCKED] ehci0: [ITHREAD] usb3: EHCI version 1.0 usb3: companion controllers, 2 ports each: usb0 usb1 usb2 usb3: on ehci0 usb3: USB revision 2.0 uhub3: on usb3 uhub3: 6 ports with 6 removable, self powered pcib1: at device 30.0 on pci0 pcib1: domain 0 pcib1: secondary bus 1 pcib1: subordinate bus 1 pcib1: I/O decode 0xc000-0xcfff pcib1: memory decode 0xff700000-0xff7fffff pcib1: prefetched decode 0xdea00000-0xdeafffff pcib1: Subtractively decoded bridge. ACPI: Found matching pin for 1.8.INTA at func 0: 11 ACPI: Found matching pin for 1.5.INTA at func 0: 255 ACPI: Found matching pin for 1.5.INTB at func 1: 11 ACPI: Found matching pin for 1.4.INTA at func 0: 4 pci1: on pcib1 pci1: domain=0, physical bus=1 found-> vendor=0x8086, dev=0x1043, revid=0x04 domain=0, bus=1, slot=4, func=0 class=02-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0116, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x02 (500 ns), maxlat=0x22 (8500 ns) intpin=a, irq=4 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xff7fd000, size 12, enabled pcib1: requested memory range 0xff7fd000-0xff7fdfff: good pcib1: matched entry for 1.4.INTA (src \\_SB_.LNKC:0) pcib1: slot 4 INTA routed to irq 4 via \\_SB_.LNKC found-> vendor=0x1180, dev=0x0475, revid=0xb8 domain=0, bus=1, slot=5, func=0 class=06-07-00, hdrtype=0x02, mfdev=1 cmdreg=0x0007, statreg=0x0210, cachelnsz=0 (dwords) lattimer=0x20 (960 ns), mingnt=0x80 (32000 ns), maxlat=0x07 (1750 ns) intpin=a, irq=255 powerspec 2 supports D0 D1 D2 D3 current D0 map[10]: type Memory, range 32, base 0, size 12, enabled found-> vendor=0x1180, dev=0x0551, revid=0x00 domain=0, bus=1, slot=5, func=1 
class=0c-00-10, hdrtype=0x00, mfdev=1 cmdreg=0x0106, statreg=0x0210, cachelnsz=0 (dwords) lattimer=0x40 (1920 ns), mingnt=0x02 (500 ns), maxlat=0x04 (1000 ns) intpin=b, irq=11 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xff7fe800, size 11, enabled pcib1: requested memory range 0xff7fe800-0xff7fefff: good pcib1: matched entry for 1.5.INTB (src \\_SB_.LNKA:0) pcib1: slot 5 INTB routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x103e, revid=0x83 domain=0, bus=1, slot=8, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0117, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x08 (2000 ns), maxlat=0x38 (14000 ns) intpin=a, irq=11 powerspec 2 supports D0 D1 D2 D3 current D0 map[10]: type Memory, range 32, base 0xff7ff000, size 12, enabled pcib1: requested memory range 0xff7ff000-0xff7fffff: good map[14]: type I/O Port, range 32, base 0xcc00, size 6, enabled pcib1: requested I/O range 0xcc00-0xcc3f: in range pcib1: matched entry for 1.8.INTA (src \\_SB_.LNKE:0) pcib1: slot 8 INTA routed to irq 11 via \\_SB_.LNKE ipw0: mem 0xff7fd000-0xff7fdfff irq 4 at device 4.0 on pci1 ipw0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xff7fd000 ipw0: bpf attached ipw0: Ethernet address: 00:04:23:71:77:46 ipw0: bpf attached ipw0: bpf attached ipw0: [MPSAFE] ipw0: [ITHREAD] ipw0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps cbb0: at device 5.0 on pci1 pcib1: cbb0 requested memory range 0xff700000-0xff7fffff: good cbb0: Lazy allocation of 0x1000 bytes rid 0x10 type 3 at 0xff700000 cardbus0: on cbb0 pccard0: <16-bit PCCard bus> on cbb0 pcib1: matched entry for 1.5.INTA (src \\_SB_.LNKB:0) pci_link1: Picked IRQ 9 with weight 0 pcib1: slot 5 INTA routed to irq 9 via \\_SB_.LNKB cbb0: [MPSAFE] cbb0: [ITHREAD] cbb0: PCI Configuration space: 0x00: 0x04751180 0x02100007 0x060700b8 0x00822000 0x10: 0xff700000 0x020000dc 0x20030201 0xfffff000 0x20: 0x00000000 0xfffff000 0x00000000 0xfffffffc 0x30: 0x00000000 0xfffffffc 0x00000000 
0x07000109 0x40: 0x17441043 0x00000001 0x00000000 0x00000000 0x50: 0x00000000 0x00000000 0x00000000 0x00000000 0x60: 0x00000000 0x00000000 0x00000000 0x00000000 0x70: 0x00000000 0x00000000 0x00000000 0x00000000 0x80: 0x20a00001 0x00000000 0x04630463 0x00000000 0x90: 0x00000000 0x00000000 0x00000000 0x00000000 0xa0: 0x80000000 0x00000000 0x00000000 0x00000000 0xb0: 0x00000000 0x00000000 0x00000000 0x00000000 0xc0: 0x17441043 0x00000000 0x00000000 0x00000000 0xd0: 0x00000000 0x00000000 0x00000000 0xfe0a0001 0xe0: 0x24c04000 0x00000000 0x00000000 0x00000000 0xf0: 0x00000000 0x00000000 0x00000000 0x00000000 fwohci0: mem 0xff7fe800-0xff7fefff irq 11 at device 5.1 on pci1 fwohci0: Reserved 0x800 bytes for rid 0x10 type 3 at 0xff7fe800 fwohci0: [MPSAFE] fwohci0: [FILTER] fwohci0: OHCI version 1.0 (ROM=1) fwohci0: No. of Isochronous channels is 4. fwohci0: EUI64 00:e0:18:00:03:10:02:07 fwohci0: Phy 1394a available S400, 2 ports. fwohci0: Link S400, max_rec 2048 bytes. firewire0: on fwohci0 fwe0: on firewire0 if_fwe0: Fake Ethernet address: 02:e0:18:10:02:07 fwe0: bpf attached fwe0: Ethernet address: 02:e0:18:10:02:07 fwip0: on firewire0 fwip0: bpf attached fwip0: Firewire address: 00:e0:18:00:03:10:02:07 @ 0xfffe00000000, S400, maxrec 2048 sbp0: on firewire0 dcons_crom0: on firewire0 dcons_crom0: bus_addr 0x1374000 fwohci0: Initiate bus reset fwohci0: BUS reset fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode fxp0: port 0xcc00-0xcc3f mem 0xff7ff000-0xff7fffff irq 11 at device 8.0 on pci1 fxp0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xff7ff000 fxp0: using memory space register mapping fxp0: PCI IDs: 8086 103e 1043 1745 0083 fxp0: Dynamic Standby mode is disabled fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: 
fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: MII without any PHY! device_attach: fxp0 attach returned 6 isab0: at device 31.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0 atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xffa0 ata0: on atapci0 atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0x1f0 atapci0: Reserved 0x1 bytes for rid 0x14 type 4 at 0x3f6 ata0: reset tp1 mask=03 ostat0=50 ostat1=00 ata0: stat0=0x90 err=0x90 lsb=0x90 msb=0x90 ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 ata0: stat1=0x00 err=0x01 lsb=0x00 msb=0x00 ata0: reset tp2 stat0=50 stat1=00 devices=0x1 ata0: [MPSAFE] ata0: [ITHREAD] ata1: on atapci0 atapci0: Reserved 0x8 bytes for rid 0x18 type 4 at 0x170 atapci0: Reserved 0x1 bytes for rid 0x1c type 4 at 0x376 ata1: reset tp1 mask=03 ostat0=50 ostat1=00 ata1: stat0=0x10 err=0x01 lsb=0x14 msb=0xeb ata1: stat1=0x00 err=0x01 lsb=0x7f msb=0x7f ata1: reset tp2 stat0=10 stat1=00 devices=0x4 ata1: [MPSAFE] ata1: [ITHREAD] pcm0: port 0xe000-0xe0ff,0xe100-0xe13f at device 31.5 on pci0 pcm0: Lazy allocation of 0x200 bytes rid 0x18 type 3 at 0x80000000 pcm0: Lazy allocation of 0x100 
bytes rid 0x1c type 3 at 0x80000200 pcib0: matched entry for 0.31.INTB (src \\_SB_.LNKB:0) pcib0: slot 31 INTB routed to irq 9 via \\_SB_.LNKB pcm0: [MPSAFE] pcm0: [ITHREAD] pcm0: pcm0: Codec features headphone, 20 bit DAC, 20 bit ADC, 5 bit master volume, SigmaTel 3D Enhancement pcm0: Primary codec extended features variable rate PCM, reserved 1, AMAP, reserved 4 pcm0: ac97 codec dac ready count: 0 pcm0: Mixer "vol": pcm0: Mixer "pcm": pcm0: Mixer "speaker": pcm0: Mixer "line": pcm0: Mixer "mic": pcm0: Mixer "cd": pcm0: Mixer "rec": pcm0: Mixer "igain": pcm0: Mixer "ogain": pcm0: Mixer "line1": pcm0: Mixer "phin": pcm0: Mixer "phout": pcm0: Mixer "video": pcm0: clone manager: deadline=750ms flags=0x8000001e pcm0: sndbuf_setmap 1620000, 4000; 0xd4d6f000 -> 1620000 pcm0: sndbuf_setmap 162c000, 4000; 0xd4d73000 -> 162c000 pci0: at device 31.6 (no driver attached) acpi_button0: on acpi0 acpi_lid0: on acpi0 acpi_tz0: on acpi0 acpi_acad0: on acpi0 battery0: on acpi0 battery1: on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 atkbd: the current kbd controller command byte 0065 atkbd: keyboard ID 0x41ab (2) kbd0 at atkbd0 kbd0: atkbd0, AT 101/102 (2), config:0x0, flags:0x3d0000 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] psm0: unable to allocate IRQ psmcpnp0: irq 12 on acpi0 psm0: current command byte:0065 psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: [ITHREAD] psm0: model Generic PS/2 mouse, device ID 0-00, 2 buttons psm0: config:00000000, flags:00000008, packet size:3 psm0: syncmask:c0, syncbits:00 sio0: configured irq 3 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: irq maps: 0 0 0 0 sio0: configured irq 3 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: irq maps: 0 0 0 0 sio0 port 0x2f8-0x2ff irq 3 drq 1 flags 0x10 on acpi0 sio0: type 16550A sio0: [FILTER] unknown: status reg test failed ff unknown: status reg test failed ff unknown: status reg test failed ff unknown: status reg test failed ff unknown: 
status reg test failed ff unknown: status reg test failed ff ex_isa_identify() ahc_isa_probe 12: ioport 0xcc00 alloc failed ahc_isa_probe 13: ioport 0xdc00 alloc failed ahc_isa_probe 14: ioport 0xec00 alloc failed ata: ata0 already exists; skipping it ata: ata1 already exists; skipping it atkbdc: atkbdc0 already exists; skipping it sio: sio0 already exists; skipping it pnp_identify: Trying Read_Port at 203 pnp_identify: Trying Read_Port at 243 pnp_identify: Trying Read_Port at 283 pnp_identify: Trying Read_Port at 2c3 pnp_identify: Trying Read_Port at 303 pnp_identify: Trying Read_Port at 343 pnp_identify: Trying Read_Port at 383 pnp_identify: Trying Read_Port at 3c3 PNP Identify complete sc: sc0 already exists; skipping it vga: vga0 already exists; skipping it isa_probe_children: disabling PnP devices isa_probe_children: probing non-PnP devices pmtimer0 on isa0 orm0: at iomem 0xc0000-0xccfff pnpid ORM0000 on isa0 adv0: not probed (disabled) aha0: not probed (disabled) aic0: not probed (disabled) bt0: not probed (disabled) cs0: not probed (disabled) ed0: not probed (disabled) fdc0 failed to probe at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fe0: not probed (disabled) ie0: not probed (disabled) le0: not probed (disabled) ppc0: parallel port found at 0x378 ppc0: using extended I/O port range ppc0: ECP SPP ECP+EPP SPP ppc0: at port 0x378-0x37f irq 7 on isa0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/8 bytes threshold ppbus0: on ppc0 ppbus0: [MPSAFE] ppbus0: [ITHREAD] plip0: on ppbus0 plip0: bpf attached lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 ppc0: [GIANT-LOCKED] ppc0: [ITHREAD] sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sc0: fb0, kbd1, terminal emulator: sc (syscons terminal) sio1 failed to probe at port 0x2f8 irq 3 on isa0 sio2: not probed (disabled) sio3: not probed (disabled) sn0: not probed (disabled) vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 vt0: not probed 
(disabled) isa_probe_children: probing PnP devices Device configuration finished. procfs registered Timecounter "TSC" frequency 600024956 Hz quality 800 Timecounters tick every 1.000 msec lo0: bpf attached hptrr: no controller detected. firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me) firewire0: bus manager 0 (me) ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire acpi_acad0: acline initialization start ad0: setting PIO4 on ICH4 chip ad0: setting UDMA100 on ICH4 chip battery0: battery initialization start battery1: battery initialization start system power profile changed to 'economy' ad0: 38154MB at ata0-master UDMA100 ad0: 78140160 sectors [77520C/16H/63S] 16 sectors/interrupt 1 depth queue GEOM: new disk ad0 acpi_acad0: Off Line acpi_acad0: acline initialization done, tried 1 times battery0: battery initialization done, tried 1 times ad0: Intel check1 failed ad0: Adaptec check1 failed ad0: LSI (v3) check1 failed ad0: LSI (v2) check1 failed ad0: FreeBSD check1 failed ata1-master: pio=PIO4 wdma=WDMA2 udma=UDMA33 cable=40 wire acd0: setting PIO4 on ICH4 chip acd0: setting UDMA33 on ICH4 chip acd0: DVDROM drive at ata1 as master acd0: read 4125KB/s (4125KB/s), 512KB buffer, UDMA33 acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, packet acd0: Writes: acd0: Audio: play, 16 volume levels acd0: Mechanism: ejectable tray, unlocked acd0: Medium: no/blank disc pcm0: measured ac97 link rate at 48017 Hz, will use 48000 Hz acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 (probe0:ata1:0:0:0): error 22 (probe0:ata1:0:0:0): Unretryable Error (probe0:ata1:0:0:0): Down reving Protocol Version from 2 to 0? 
(probe0:ata1:0:0:0): error 6 (probe0:ata1:0:0:0): Unretryable Error acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 (probe0:ata1:0:0:0): error 22 (probe0:ata1:0:0:0): Unretryable Error (probe6:sbp0:0:5:0): error 22 (probe6:sbp0:0:5:0): Unretryable Error (probe1:sbp0:0:0:0): error 22 (probe1:sbp0:0:0:0): Unretryable Error (probe2:sbp0:0:1:0): error 22 (probe2:sbp0:0:1:0): Unretryable Error (probe3:sbp0:0:2:0): error 22 (probe3:sbp0:0:2:0): Unretryable Error (probe4:sbp0:0:3:0): error 22 (probe4:sbp0:0:3:0): Unretryable Error (probe5:sbp0:0:4:0): error 22 (probe5:sbp0:0:4:0): Unretryable Error (probe7:sbp0:0:6:0): error 22 (probe7:sbp0:0:6:0): Unretryable Error pass0 at ata1 bus 0 target 0 lun 0 pass0: Removable CD-ROM SCSI-0 device pass0: 33.000MB/s transfers GEOM: new disk cd0 ATA PseudoRAID loaded (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error cd0 at ata1 bus 0 target 0 lun 0 cd0: Removable CD-ROM SCSI-0 device cd0: 33.000MB/s transfers cd0: Attempt to query device size failed: NOT READY, Medium not present (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error Trying to mount root from ufs:/dev/ad0s1a start_init: trying /sbin/init drm0: on vgapci0 info: [drm] AGP at 0xf0000000 128MB info: [drm] Initialized i915 1.5.0 20060119 drm1: on vgapci1 info: [drm] AGP at 0xf0000000 128MB info: [drm] Initialized i915 1.5.0 20060119 drm0: [MPSAFE] drm0: [ITHREAD] drm0: [MPSAFE] drm0: [ITHREAD] Waiting (max 60 seconds) for system process `vnlru' to stop...done Waiting (max 60 seconds) for system process `bufdaemon' to stop...done Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining...0 0 0 done All buffers synced.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008 root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC Preloaded elf kernel "/boot/kernel/kernel" at 0xc0e72000. Preloaded elf module "/boot/kernel/if_ipw.ko" at 0xc0e7214c. Preloaded elf module "/boot/kernel/snd_ich.ko" at 0xc0e721f8. Preloaded elf module "/boot/kernel/sound.ko" at 0xc0e722a4. Preloaded elf module "/boot/kernel/ipw_bss.ko" at 0xc0e72350. Preloaded elf module "/boot/kernel/ipw_ibss.ko" at 0xc0e723fc. Preloaded elf module "/boot/kernel/ipw_monitor.ko" at 0xc0e724ac. Preloaded elf module "/boot/kernel/atapicam.ko" at 0xc0e7255c. Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0e7260c. Calibrating clock(s) ... i8254 clock: 1193167 Hz CLK_USE_I8254_CALIBRATION not specified - using default frequency Timecounter "i8254" frequency 1193182 Hz quality 0 Calibrating TSC clock ... 
TSC clock: 600023372 Hz CPU: Intel(R) Pentium(R) M processor 1400MHz (600.02-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x695 Stepping = 5 Features=0xa7e9fbbf Features2=0x180 Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries Data TLB: 4 KB Pages, 4-way set associative, 128 entries Instruction TLB: 4 MB pages, fully associative, 2 entries 2nd-level cache: 1 MB, 8-way set associative, 64 byte line size 1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size Data TLB: 4 MB Pages, 4-way set associative, 8 entries 1st-level data cache: 32 KB, 8-way set associative, 64 byte line size real memory = 527695872 (503 MB) Physical memory chunk(s): 0x0000000000001000 - 0x000000000009efff, 647168 bytes (158 pages) 0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages) 0x0000000001028000 - 0x000000001ee22fff, 501198848 bytes (122363 pages) avail memory = 502419456 (479 MB) Table 'FACP' at 0x1f740200 Table 'OEMB' at 0x1f750040 MADT: No MADT table found APIC: Could not find any APICs. 
pnpbios: Found PnP BIOS data at 0xc00f2e00 pnpbios: Entry = f0000:39da Rev = 1.0 Other BIOS signatures found: wlan_amrr: wlan: <802.11 Link Layer> firmware: 'ipw_bss' version 130: 209190 bytes loaded at 0xc0d68738 firmware: 'ipw_ibss' version 130: 201138 bytes loaded at 0xc0d9d73c firmware: 'ipw_monitor' version 130: 196458 bytes loaded at 0xc0dd0748 snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024] feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=5 feeder_buffersize=16384 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25 ath_rate: version 1.2 nfslock: pseudo-device kbd: new array size 4 kbd1 at kbdmux0 io: mem: Pentium Pro MTRR support enabled null: random: ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 19:59:27) ACPI: RSDP @ 0x0xf4b70/0x0014 (v 0 ACPIAM) ACPI: RSDT @ 0x0x1f740000/0x002C (v 1 A M I OEMRSDT 0x05000314 MSFT 0x00000097) ACPI: FACP @ 0x0x1f740200/0x0081 (v 2 A M I OEMFACP 0x05000314 MSFT 0x00000097) ACPI: DSDT @ 0x0x1f740300/0x7323 (v 1 0ABBD 0ABBD001 0x00000001 MSFT 0x0100000D) ACPI: FACS @ 0x0x1f750000/0x0040 ACPI: OEMB @ 0x0x1f750040/0x004D (v 1 A M I OEMBIOS 0x05000314 MSFT 0x00000097) npx0: INT 16 interface acpi0: on motherboard acpi0: [MPSAFE] acpi0: [ITHREAD] pci_open(1): mode 1 addr port (0x0cf8) is 0x8000005c pci_open(1a): mode1res=0x80000000 (0x80000000) pci_cfgcheck: device 0 [class=060000] [hdr=80] is there (id=35808086) pcibios: No call entry point AcpiOsDerivePciId: \\_SB_.PCI0.P0P1.CBS0.CBSP -> bus 1 dev 5 func 0 acpi0: Power Button (fixed) acpi0: wakeup code va 0xccd3f000 pa 0x1000 atpic: Programming IRQ9 as level/low AcpiOsDerivePciId: \\_SB_.PCI0.SBRG.FHR0 -> bus 0 dev 31 func 0 AcpiOsDerivePciId: \\_SB_.PCI0.SBRG.IROR -> bus 0 dev 31 func 0 acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, 1f700000 (3) failed ACPI timer: 1/0 1/0 1/1 1/1 1/0 1/0 1/1 1/1 1/1 1/0 -> 10 Timecounter 
"ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0 acpi_ec0: port 0x62,0x66 on acpi0 pci_link0: Index IRQ Rtd Ref IRQs Initial Probe 0 11 N 0 3 4 5 6 7 11 12 Validation 0 11 N 0 3 4 5 6 7 11 12 After Disable 0 255 N 0 3 4 5 6 7 11 12 pci_link1: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 3 4 5 6 7 11 12 Validation 0 255 N 0 3 4 5 6 7 11 12 After Disable 0 255 N 0 3 4 5 6 7 11 12 pci_link2: Index IRQ Rtd Ref IRQs Initial Probe 0 4 N 0 4 12 Validation 0 4 N 0 4 12 After Disable 0 255 N 0 4 12 pci_link3: Index IRQ Rtd Ref IRQs Initial Probe 0 5 N 0 5 6 Validation 0 5 N 0 5 6 After Disable 0 255 N 0 5 6 pci_link4: Index IRQ Rtd Ref IRQs Initial Probe 0 11 N 0 6 11 Validation 0 11 N 0 6 11 After Disable 0 255 N 0 6 11 pci_link5: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 3 7 Validation 0 255 N 0 3 7 After Disable 0 255 N 0 3 7 pci_link6: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 4 7 Validation 0 255 N 0 4 7 After Disable 0 255 N 0 4 7 pci_link7: Index IRQ Rtd Ref IRQs Initial Probe 0 4 N 0 4 6 12 Validation 0 4 N 0 4 6 12 After Disable 0 255 N 0 4 6 12 cpu0: on acpi0 cpu0: switching to generic Cx mode est0: on cpu0 p4tcc0: on cpu0 pcib0: port 0xcf8-0xcff on acpi0 ACPI: Found matching pin for 0.2.INTA at func 0: 11 ACPI: Found matching pin for 0.31.INTA at func 1: 255 ACPI: Found matching pin for 0.31.INTB at func 5: 255 ACPI: Found matching pin for 0.31.INTB at func 6: 255 ACPI: Found matching pin for 0.29.INTA at func 0: 11 ACPI: Found matching pin for 0.29.INTB at func 1: 5 ACPI: Found matching pin for 0.29.INTC at func 2: 4 ACPI: Found matching pin for 0.29.INTD at func 7: 4 pci0: on pcib0 pci0: domain=0, physical bus=0 found-> vendor=0x8086, dev=0x3580, revid=0x02 domain=0, bus=0, slot=0, func=0 class=06-00-00, hdrtype=0x00, mfdev=1 cmdreg=0x0106, statreg=0x2090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3584, revid=0x02 
domain=0, bus=0, slot=0, func=1 class=08-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0006, statreg=0x0080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3585, revid=0x02 domain=0, bus=0, slot=0, func=3 class=08-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0006, statreg=0x0080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3582, revid=0x02 domain=0, bus=0, slot=2, func=0 class=03-00-00, hdrtype=0x00, mfdev=1 cmdreg=0x0007, statreg=0x0090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 powerspec 1 supports D0 D1 D3 current D0 map[10]: type Prefetchable Memory, range 32, base 0xf0000000, size 27, enabled map[14]: type Memory, range 32, base 0xffa80000, size 19, enabled map[18]: type I/O Port, range 32, base 0xdc00, size 3, enabled pcib0: matched entry for 0.2.INTA (src \\_SB_.LNKA:0) pcib0: slot 2 INTA routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x3582, revid=0x02 domain=0, bus=0, slot=2, func=1 class=03-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0007, statreg=0x0090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) powerspec 1 supports D0 D1 D3 current D0 map[10]: type Prefetchable Memory, range 32, base 0xe8000000, size 27, enabled map[14]: type Memory, range 32, base 0xff980000, size 19, enabled found-> vendor=0x8086, dev=0x24c2, revid=0x03 domain=0, bus=0, slot=29, func=0 class=0c-03-00, hdrtype=0x00, mfdev=1 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 map[20]: type I/O Port, range 32, base 0xd480, size 5, enabled pcib0: matched entry for 0.29.INTA (src \\_SB_.LNKA:0) pcib0: slot 29 INTA routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x24c4, revid=0x03 domain=0, bus=0, slot=29, func=1 class=0c-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, 
cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=5 map[20]: type I/O Port, range 32, base 0xd800, size 5, enabled pcib0: matched entry for 0.29.INTB (src \\_SB_.LNKD:0) pcib0: slot 29 INTB routed to irq 5 via \\_SB_.LNKD found-> vendor=0x8086, dev=0x24c7, revid=0x03 domain=0, bus=0, slot=29, func=2 class=0c-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=c, irq=4 map[20]: type I/O Port, range 32, base 0xd880, size 5, enabled pcib0: matched entry for 0.29.INTC (src \\_SB_.LNKC:0) pcib0: slot 29 INTC routed to irq 4 via \\_SB_.LNKC found-> vendor=0x8086, dev=0x24cd, revid=0x03 domain=0, bus=0, slot=29, func=7 class=0c-03-20, hdrtype=0x00, mfdev=0 cmdreg=0x0106, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=d, irq=4 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xffa7fc00, size 10, enabled pcib0: matched entry for 0.29.INTD (src \\_SB_.LNKH:0) pcib0: slot 29 INTD routed to irq 4 via \\_SB_.LNKH found-> vendor=0x8086, dev=0x2448, revid=0x83 domain=0, bus=0, slot=30, func=0 class=06-04-00, hdrtype=0x01, mfdev=0 cmdreg=0x0107, statreg=0x8080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x06 (1500 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x24cc, revid=0x03 domain=0, bus=0, slot=31, func=0 class=06-01-00, hdrtype=0x00, mfdev=1 cmdreg=0x000f, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x24ca, revid=0x03 domain=0, bus=0, slot=31, func=1 class=01-01-8a, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=255 map[20]: type I/O Port, range 32, base 0xffa0, size 4, enabled map[24]: type Memory, range 32, base 0, size 10, memory disabled found-> 
vendor=0x8086, dev=0x24c5, revid=0x03 domain=0, bus=0, slot=31, func=5 class=04-01-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=255 powerspec 2 supports D0 D3 current D0 map[10]: type I/O Port, range 32, base 0xe000, size 8, enabled map[14]: type I/O Port, range 32, base 0xe100, size 6, enabled map[18]: type Memory, range 32, base 0, size 9, memory disabled map[1c]: type Memory, range 32, base 0, size 8, memory disabled found-> vendor=0x8086, dev=0x24c6, revid=0x03 domain=0, bus=0, slot=31, func=6 class=07-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=255 powerspec 2 supports D0 D3 current D0 map[10]: type I/O Port, range 32, base 0xe200, size 8, enabled map[14]: type I/O Port, range 32, base 0xe300, size 7, enabled pci0: at device 0.1 (no driver attached) pci0: at device 0.3 (no driver attached) vgapci0: port 0xdc00-0xdc07 mem 0xf0000000-0xf7ffffff,0xffa80000-0xffafffff irq 11 at device 2.0 on pci0 agp0: on vgapci0 vgapci0: Reserved 0x8000000 bytes for rid 0x10 type 3 at 0xf0000000 vgapci0: Reserved 0x80000 bytes for rid 0x14 type 3 at 0xffa80000 agp0: detected 8060k stolen memory agp0: aperture size is 128M vgapci1: mem 0xe8000000-0xefffffff,0xff980000-0xff9fffff at device 2.1 on pci0 uhci0: port 0xd480-0xd49f irq 11 at device 29.0 on pci0 uhci0: Reserved 0x20 bytes for rid 0x20 type 4 at 0xd480 uhci0: [GIANT-LOCKED] uhci0: [ITHREAD] usb0: on uhci0 usb0: USB revision 1.0 uhub0: on usb0 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xd800-0xd81f irq 5 at device 29.1 on pci0 uhci1: Reserved 0x20 bytes for rid 0x20 type 4 at 0xd800 uhci1: [GIANT-LOCKED] uhci1: [ITHREAD] usb1: on uhci1 usb1: USB revision 1.0 uhub1: on usb1 uhub1: 2 ports with 2 removable, self powered uhci2: port 0xd880-0xd89f irq 4 at device 29.2 on pci0 uhci2: Reserved 
0x20 bytes for rid 0x20 type 4 at 0xd880 uhci2: [GIANT-LOCKED] uhci2: [ITHREAD] usb2: on uhci2 usb2: USB revision 1.0 uhub2: on usb2 uhub2: 2 ports with 2 removable, self powered ehci0: mem 0xffa7fc00-0xffa7ffff irq 4 at device 29.7 on pci0 ehci0: Reserved 0x400 bytes for rid 0x10 type 3 at 0xffa7fc00 ehci0: [GIANT-LOCKED] ehci0: [ITHREAD] usb3: EHCI version 1.0 usb3: companion controllers, 2 ports each: usb0 usb1 usb2 usb3: on ehci0 usb3: USB revision 2.0 uhub3: on usb3 uhub3: 6 ports with 6 removable, self powered pcib1: at device 30.0 on pci0 pcib1: domain 0 pcib1: secondary bus 1 pcib1: subordinate bus 1 pcib1: I/O decode 0xc000-0xcfff pcib1: memory decode 0xff700000-0xff7fffff pcib1: prefetched decode 0xdea00000-0xdeafffff pcib1: Subtractively decoded bridge. ACPI: Found matching pin for 1.8.INTA at func 0: 11 ACPI: Found matching pin for 1.5.INTA at func 0: 255 ACPI: Found matching pin for 1.5.INTB at func 1: 11 ACPI: Found matching pin for 1.4.INTA at func 0: 4 pci1: on pcib1 pci1: domain=0, physical bus=1 found-> vendor=0x8086, dev=0x1043, revid=0x04 domain=0, bus=1, slot=4, func=0 class=02-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0116, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x02 (500 ns), maxlat=0x22 (8500 ns) intpin=a, irq=4 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xff7fd000, size 12, enabled pcib1: requested memory range 0xff7fd000-0xff7fdfff: good pcib1: matched entry for 1.4.INTA (src \\_SB_.LNKC:0) pcib1: slot 4 INTA routed to irq 4 via \\_SB_.LNKC found-> vendor=0x1180, dev=0x0475, revid=0xb8 domain=0, bus=1, slot=5, func=0 class=06-07-00, hdrtype=0x02, mfdev=1 cmdreg=0x0007, statreg=0x0210, cachelnsz=0 (dwords) lattimer=0x20 (960 ns), mingnt=0x80 (32000 ns), maxlat=0x07 (1750 ns) intpin=a, irq=255 powerspec 2 supports D0 D1 D2 D3 current D0 map[10]: type Memory, range 32, base 0, size 12, enabled found-> vendor=0x1180, dev=0x0551, revid=0x00 domain=0, bus=1, slot=5, func=1 
class=0c-00-10, hdrtype=0x00, mfdev=1 cmdreg=0x0106, statreg=0x0210, cachelnsz=0 (dwords) lattimer=0x40 (1920 ns), mingnt=0x02 (500 ns), maxlat=0x04 (1000 ns) intpin=b, irq=11 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xff7fe800, size 11, enabled pcib1: requested memory range 0xff7fe800-0xff7fefff: good pcib1: matched entry for 1.5.INTB (src \\_SB_.LNKA:0) pcib1: slot 5 INTB routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x103e, revid=0x83 domain=0, bus=1, slot=8, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0117, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x08 (2000 ns), maxlat=0x38 (14000 ns) intpin=a, irq=11 powerspec 2 supports D0 D1 D2 D3 current D0 map[10]: type Memory, range 32, base 0xff7ff000, size 12, enabled pcib1: requested memory range 0xff7ff000-0xff7fffff: good map[14]: type I/O Port, range 32, base 0xcc00, size 6, enabled pcib1: requested I/O range 0xcc00-0xcc3f: in range pcib1: matched entry for 1.8.INTA (src \\_SB_.LNKE:0) pcib1: slot 8 INTA routed to irq 11 via \\_SB_.LNKE ipw0: mem 0xff7fd000-0xff7fdfff irq 4 at device 4.0 on pci1 ipw0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xff7fd000 ipw0: bpf attached ipw0: Ethernet address: 00:04:23:71:77:46 ipw0: bpf attached ipw0: bpf attached ipw0: [MPSAFE] ipw0: [ITHREAD] ipw0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps cbb0: at device 5.0 on pci1 pcib1: cbb0 requested memory range 0xff700000-0xff7fffff: good cbb0: Lazy allocation of 0x1000 bytes rid 0x10 type 3 at 0xff700000 cardbus0: on cbb0 pccard0: <16-bit PCCard bus> on cbb0 pcib1: matched entry for 1.5.INTA (src \\_SB_.LNKB:0) pci_link1: Picked IRQ 9 with weight 0 pcib1: slot 5 INTA routed to irq 9 via \\_SB_.LNKB cbb0: [MPSAFE] cbb0: [ITHREAD] cbb0: PCI Configuration space: 0x00: 0x04751180 0x02100007 0x060700b8 0x00822000 0x10: 0xff700000 0x020000dc 0x20030201 0xfffff000 0x20: 0x00000000 0xfffff000 0x00000000 0xfffffffc 0x30: 0x00000000 0xfffffffc 0x00000000 
0x07000109 0x40: 0x17441043 0x00000001 0x00000000 0x00000000 0x50: 0x00000000 0x00000000 0x00000000 0x00000000 0x60: 0x00000000 0x00000000 0x00000000 0x00000000 0x70: 0x00000000 0x00000000 0x00000000 0x00000000 0x80: 0x20a00001 0x00000000 0x04630463 0x00000000 0x90: 0x00000000 0x00000000 0x00000000 0x00000000 0xa0: 0x80000000 0x00000000 0x00000000 0x00000000 0xb0: 0x00000000 0x00000000 0x00000000 0x00000000 0xc0: 0x17441043 0x00000000 0x00000000 0x00000000 0xd0: 0x00000000 0x00000000 0x00000000 0xfe0a0001 0xe0: 0x24c04000 0x00000000 0x00000000 0x00000000 0xf0: 0x00000000 0x00000000 0x00000000 0x00000000 fwohci0: mem 0xff7fe800-0xff7fefff irq 11 at device 5.1 on pci1 fwohci0: Reserved 0x800 bytes for rid 0x10 type 3 at 0xff7fe800 fwohci0: [MPSAFE] fwohci0: [FILTER] fwohci0: OHCI version 1.0 (ROM=1) fwohci0: No. of Isochronous channels is 4. fwohci0: EUI64 00:e0:18:00:03:10:02:07 fwohci0: Phy 1394a available S400, 2 ports. fwohci0: Link S400, max_rec 2048 bytes. firewire0: on fwohci0 fwe0: on firewire0 if_fwe0: Fake Ethernet address: 02:e0:18:10:02:07 fwe0: bpf attached fwe0: Ethernet address: 02:e0:18:10:02:07 fwip0: on firewire0 fwip0: bpf attached fwip0: Firewire address: 00:e0:18:00:03:10:02:07 @ 0xfffe00000000, S400, maxrec 2048 sbp0: on firewire0 dcons_crom0: on firewire0 dcons_crom0: bus_addr 0x1374000 fwohci0: Initiate bus reset fwohci0: BUS reset fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode fxp0: port 0xcc00-0xcc3f mem 0xff7ff000-0xff7fffff irq 11 at device 8.0 on pci1 fxp0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xff7ff000 fxp0: using memory space register mapping fxp0: PCI IDs: 8086 103e 1043 1745 0083 fxp0: Dynamic Standby mode is disabled fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: 
fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: MII without any PHY! device_attach: fxp0 attach returned 6 isab0: at device 31.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0 atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xffa0 ata0: on atapci0 atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0x1f0 atapci0: Reserved 0x1 bytes for rid 0x14 type 4 at 0x3f6 ata0: reset tp1 mask=03 ostat0=50 ostat1=00 ata0: stat0=0x90 err=0x90 lsb=0x90 msb=0x90 ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 ata0: stat1=0x00 err=0x01 lsb=0x00 msb=0x00 ata0: reset tp2 stat0=50 stat1=00 devices=0x1 ata0: [MPSAFE] ata0: [ITHREAD] ata1: on atapci0 atapci0: Reserved 0x8 bytes for rid 0x18 type 4 at 0x170 atapci0: Reserved 0x1 bytes for rid 0x1c type 4 at 0x376 ata1: reset tp1 mask=03 ostat0=50 ostat1=00 ata1: stat0=0x10 err=0x01 lsb=0x14 msb=0xeb ata1: stat1=0x00 err=0x01 lsb=0x7f msb=0x7f ata1: reset tp2 stat0=10 stat1=00 devices=0x4 ata1: [MPSAFE] ata1: [ITHREAD] pcm0: port 0xe000-0xe0ff,0xe100-0xe13f at device 31.5 on pci0 pcm0: Lazy allocation of 0x200 bytes rid 0x18 type 3 at 0x80000000 pcm0: Lazy allocation of 0x100 
bytes rid 0x1c type 3 at 0x80000200 pcib0: matched entry for 0.31.INTB (src \\_SB_.LNKB:0) pcib0: slot 31 INTB routed to irq 9 via \\_SB_.LNKB pcm0: [MPSAFE] pcm0: [ITHREAD] pcm0: pcm0: Codec features headphone, 20 bit DAC, 20 bit ADC, 5 bit master volume, SigmaTel 3D Enhancement pcm0: Primary codec extended features variable rate PCM, reserved 1, AMAP, reserved 4 pcm0: ac97 codec dac ready count: 0 pcm0: Mixer "vol": pcm0: Mixer "pcm": pcm0: Mixer "speaker": pcm0: Mixer "line": pcm0: Mixer "mic": pcm0: Mixer "cd": pcm0: Mixer "rec": pcm0: Mixer "igain": pcm0: Mixer "ogain": pcm0: Mixer "line1": pcm0: Mixer "phin": pcm0: Mixer "phout": pcm0: Mixer "video": pcm0: clone manager: deadline=750ms flags=0x8000001e pcm0: sndbuf_setmap 1620000, 4000; 0xd4d6f000 -> 1620000 pcm0: sndbuf_setmap 162c000, 4000; 0xd4d73000 -> 162c000 pci0: at device 31.6 (no driver attached) acpi_button0: on acpi0 acpi_lid0: on acpi0 acpi_tz0: on acpi0 acpi_acad0: on acpi0 battery0: on acpi0 battery1: on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 atkbd: the current kbd controller command byte 0065 atkbd: keyboard ID 0x41ab (2) kbd0 at atkbd0 kbd0: atkbd0, AT 101/102 (2), config:0x0, flags:0x3d0000 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] psm0: unable to allocate IRQ psmcpnp0: irq 12 on acpi0 psm0: current command byte:0065 psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: [ITHREAD] psm0: model Generic PS/2 mouse, device ID 0-00, 2 buttons psm0: config:00000000, flags:00000008, packet size:3 psm0: syncmask:c0, syncbits:00 sio0: configured irq 3 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: irq maps: 0 0 0 0 sio0: configured irq 3 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: irq maps: 0 0 0 0 sio0 port 0x2f8-0x2ff irq 3 drq 1 flags 0x10 on acpi0 sio0: type 16550A sio0: [FILTER] unknown: status reg test failed ff unknown: status reg test failed ff unknown: status reg test failed ff unknown: status reg test failed ff unknown: 
status reg test failed ff unknown: status reg test failed ff ex_isa_identify() ahc_isa_probe 12: ioport 0xcc00 alloc failed ahc_isa_probe 13: ioport 0xdc00 alloc failed ahc_isa_probe 14: ioport 0xec00 alloc failed ata: ata0 already exists; skipping it ata: ata1 already exists; skipping it atkbdc: atkbdc0 already exists; skipping it sio: sio0 already exists; skipping it pnp_identify: Trying Read_Port at 203 pnp_identify: Trying Read_Port at 243 pnp_identify: Trying Read_Port at 283 pnp_identify: Trying Read_Port at 2c3 pnp_identify: Trying Read_Port at 303 pnp_identify: Trying Read_Port at 343 pnp_identify: Trying Read_Port at 383 pnp_identify: Trying Read_Port at 3c3 PNP Identify complete sc: sc0 already exists; skipping it vga: vga0 already exists; skipping it isa_probe_children: disabling PnP devices isa_probe_children: probing non-PnP devices pmtimer0 on isa0 orm0: at iomem 0xc0000-0xccfff pnpid ORM0000 on isa0 adv0: not probed (disabled) aha0: not probed (disabled) aic0: not probed (disabled) bt0: not probed (disabled) cs0: not probed (disabled) ed0: not probed (disabled) fdc0 failed to probe at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fe0: not probed (disabled) ie0: not probed (disabled) le0: not probed (disabled) ppc0: parallel port found at 0x378 ppc0: using extended I/O port range ppc0: ECP SPP ECP+EPP SPP ppc0: at port 0x378-0x37f irq 7 on isa0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/8 bytes threshold ppbus0: on ppc0 ppbus0: [MPSAFE] ppbus0: [ITHREAD] plip0: on ppbus0 plip0: bpf attached lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 ppc0: [GIANT-LOCKED] ppc0: [ITHREAD] sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sc0: fb0, kbd1, terminal emulator: sc (syscons terminal) sio1 failed to probe at port 0x2f8 irq 3 on isa0 sio2: not probed (disabled) sio3: not probed (disabled) sn0: not probed (disabled) vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 vt0: not probed 
(disabled) isa_probe_children: probing PnP devices Device configuration finished. procfs registered Timecounter "TSC" frequency 600023372 Hz quality 800 Timecounters tick every 1.000 msec lo0: bpf attachedfirewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me) firewire0: bus manager 0 (me) hptrr: no controller detected. acpi_acad0: acline initialization start battery0: battery initialization start battery1: battery initialization start ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire system power profile changed to 'economy' ad0: setting PIO4 on ICH4 chip acpi_acad0: Off Line acpi_acad0: acline initialization done, tried 1 times ad0: setting UDMA100 on ICH4 chip ad0: 38154MB at ata0-master UDMA100 ad0: 78140160 sectors [77520C/16H/63S] 16 sectors/interrupt 1 depth queue GEOM: new disk ad0 battery0: battery initialization done, tried 1 times ad0: Intel check1 failed ad0: Adaptec check1 failed ad0: LSI (v3) check1 failed ad0: LSI (v2) check1 failed ad0: FreeBSD check1 failed ata1-master: pio=PIO4 wdma=WDMA2 udma=UDMA33 cable=40 wire acd0: setting PIO4 on ICH4 chip acd0: setting UDMA33 on ICH4 chip acd0: DVDROM drive at ata1 as master acd0: read 4125KB/s (4125KB/s), 512KB buffer, UDMA33 acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, packet acd0: Writes: acd0: Audio: play, 16 volume levels acd0: Mechanism: ejectable tray, unlocked acd0: Medium: no/blank disc pcm0: measured ac97 link rate at 48008 Hz, will use 48000 Hz acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 (probe0:ata1:0:0:0): error 22 (probe0:ata1:0:0:0): Unretryable Error (probe0:ata1:0:0:0): Down reving Protocol Version from 2 to 0? 
(probe0:ata1:0:0:0): error 6 (probe0:ata1:0:0:0): Unretryable Error acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 (probe0:ata1:0:0:0): error 22 (probe0:ata1:0:0:0): Unretryable Error (probe1:sbp0:0:0:0): error 22 (probe1:sbp0:0:0:0): Unretryable Error (probe2:sbp0:0:1:0): error 22 (probe2:sbp0:0:1:0): Unretryable Error (probe3:sbp0:0:2:0): error 22 (probe3:sbp0:0:2:0): Unretryable Error (probe4:sbp0:0:3:0): error 22 (probe4:sbp0:0:3:0): Unretryable Error (probe5:sbp0:0:4:0): error 22 (probe5:sbp0:0:4:0): Unretryable Error (probe6:sbp0:0:5:0): error 22 (probe6:sbp0:0:5:0): Unretryable Error (probe7:sbp0:0:6:0): error 22 (probe7:sbp0:0:6:0): Unretryable Error pass0 at ata1 bus 0 target 0 lun 0 pass0: Removable CD-ROM SCSI-0 device pass0: 33.000MB/s transfers GEOM: new disk cd0 ATA PseudoRAID load(cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error cd0 at ata1 bus 0 target 0 lun 0 cd0: Removable CD-ROM SCSI-0 device cd0: 33.000MB/s transfers cd0: Attempt to query device size failed: NOT READY, Medium not present ed (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error Trying to mount root from ufs:/dev/ad0s1a start_init: trying /sbin/init drm0: on vgapci0 info: [drm] AGP at 0xf0000000 128MB info: [drm] Initialized i915 1.5.0 20060119 drm1: on vgapci1 info: [drm] AGP at 0xf0000000 128MB info: [drm] Initialized i915 1.5.0 20060119 drm0: [MPSAFE] drm0: [ITHREAD] battery1: battery initialization failed, giving up umass0: on uhub3 umass0:3:0:-1: Attached to scbus3 (probe0:umass-sim0:0:0:0): error 22 (probe0:umass-sim0:0:0:0): Unretryable Error pass1 at umass-sim0 bus 0 target 0 lun 0 pass1: < > Removable Direct Access SCSI-2 device pass1: 40.000MB/s transfers GEOM: new disk da0 da0 at umass-sim0 bus 0 target 0 lun 0 da0: < > Removable Direct Access SCSI-2 
device da0: 40.000MB/s transfers da0: 3102MB (6354432 512 byte sectors: 255H 63S/T 395C) GEOM_LABEL: Label for provider da0s1a is ufs/usbdrive.

From francisgendreau at videotron.ca Tue Jul 8 18:50:06 2008
From: francisgendreau at videotron.ca (Francis Gendreau)
Date: Tue Jul 8 18:50:22 2008
Subject: kern/125195: verbrose dmesg from asus m3000n m3n as requested by Gavin Atkinson
Message-ID: <200807081850.m68Io6ev087520@freefall.freebsd.org>

The following reply was made to PR kern/125195; it has been noted by GNATS.

From: Francis Gendreau
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/125195: verbrose dmesg from asus m3000n m3n as requested by Gavin Atkinson
Date: Tue, 08 Jul 2008 11:33:22 -0400

verbose dmesg after hard power cycle:

Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries Data TLB: 4 KB Pages, 4-way set associative, 128 entries Instruction TLB: 4 MB pages, fully associative, 2 entries 2nd-level cache: 1 MB, 8-way set associative, 64 byte line size 1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size Data TLB: 4 MB Pages, 4-way set associative, 8 entries 1st-level data cache: 32 KB, 8-way set associative, 64 byte line size real memory = 527695872 (503 MB) Physical memory chunk(s): 0x0000000000001000 - 0x000000000009efff, 647168 bytes (158 pages) 0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages) 0x0000000001028000 - 0x000000001ee22fff, 501198848 bytes (122363 pages) avail memory = 502419456 (479 MB) Table 'FACP' at 0x1f740200 Table 'OEMB' at 0x1f750040 MADT: No MADT table found APIC: Could not find any APICs.
pnpbios: Found PnP BIOS data at 0xc00f2e00 pnpbios: Entry = f0000:39da Rev = 1.0 Other BIOS signatures found: wlan_amrr: wlan: <802.11 Link Layer> firmware: 'ipw_bss' version 130: 209190 bytes loaded at 0xc0d68738 firmware: 'ipw_ibss' version 130: 201138 bytes loaded at 0xc0d9d73c firmware: 'ipw_monitor' version 130: 196458 bytes loaded at 0xc0dd0748 snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024] feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=5 feeder_buffersize=16384 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25 ath_rate: version 1.2 nfslock: pseudo-device kbd: new array size 4 kbd1 at kbdmux0 io: mem: Pentium Pro MTRR support enabled null: random: ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 19:59:27) ACPI: RSDP @ 0x0xf4b70/0x0014 (v 0 ACPIAM) ACPI: RSDT @ 0x0x1f740000/0x002C (v 1 A M I OEMRSDT 0x05000314 MSFT 0x00000097) ACPI: FACP @ 0x0x1f740200/0x0081 (v 2 A M I OEMFACP 0x05000314 MSFT 0x00000097) ACPI: DSDT @ 0x0x1f740300/0x7323 (v 1 0ABBD 0ABBD001 0x00000001 MSFT 0x0100000D) ACPI: FACS @ 0x0x1f750000/0x0040 ACPI: OEMB @ 0x0x1f750040/0x004D (v 1 A M I OEMBIOS 0x05000314 MSFT 0x00000097) npx0: INT 16 interface acpi0: on motherboard acpi0: [MPSAFE] acpi0: [ITHREAD] pci_open(1): mode 1 addr port (0x0cf8) is 0x8000005c pci_open(1a): mode1res=0x80000000 (0x80000000) pci_cfgcheck: device 0 [class=060000] [hdr=80] is there (id=35808086) pcibios: No call entry point AcpiOsDerivePciId: \\_SB_.PCI0.P0P1.CBS0.CBSP -> bus 1 dev 5 func 0 acpi0: Power Button (fixed) acpi0: wakeup code va 0xccd3f000 pa 0x1000 atpic: Programming IRQ9 as level/low AcpiOsDerivePciId: \\_SB_.PCI0.SBRG.FHR0 -> bus 0 dev 31 func 0 AcpiOsDerivePciId: \\_SB_.PCI0.SBRG.IROR -> bus 0 dev 31 func 0 acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, 1f700000 (3) failed ACPI timer: 1/1 1/1 1/0 1/1 1/1 1/1 1/0 1/1 1/1 1/1 -> 10 Timecounter 
"ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0 acpi_ec0: port 0x62,0x66 on acpi0 pci_link0: Index IRQ Rtd Ref IRQs Initial Probe 0 11 N 0 3 4 5 6 7 11 12 Validation 0 11 N 0 3 4 5 6 7 11 12 After Disable 0 255 N 0 3 4 5 6 7 11 12 pci_link1: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 3 4 5 6 7 11 12 Validation 0 255 N 0 3 4 5 6 7 11 12 After Disable 0 255 N 0 3 4 5 6 7 11 12 pci_link2: Index IRQ Rtd Ref IRQs Initial Probe 0 4 N 0 4 12 Validation 0 4 N 0 4 12 After Disable 0 255 N 0 4 12 pci_link3: Index IRQ Rtd Ref IRQs Initial Probe 0 5 N 0 5 6 Validation 0 5 N 0 5 6 After Disable 0 255 N 0 5 6 pci_link4: Index IRQ Rtd Ref IRQs Initial Probe 0 11 N 0 6 11 Validation 0 11 N 0 6 11 After Disable 0 255 N 0 6 11 pci_link5: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 3 7 Validation 0 255 N 0 3 7 After Disable 0 255 N 0 3 7 pci_link6: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 4 7 Validation 0 255 N 0 4 7 After Disable 0 255 N 0 4 7 pci_link7: Index IRQ Rtd Ref IRQs Initial Probe 0 4 N 0 4 6 12 Validation 0 4 N 0 4 6 12 After Disable 0 255 N 0 4 6 12 cpu0: on acpi0 cpu0: switching to generic Cx mode est0: on cpu0 p4tcc0: on cpu0 pcib0: port 0xcf8-0xcff on acpi0 ACPI: Found matching pin for 0.2.INTA at func 0: 11 ACPI: Found matching pin for 0.31.INTA at func 1: 255 ACPI: Found matching pin for 0.31.INTB at func 5: 255 ACPI: Found matching pin for 0.31.INTB at func 6: 255 ACPI: Found matching pin for 0.29.INTA at func 0: 11 ACPI: Found matching pin for 0.29.INTB at func 1: 5 ACPI: Found matching pin for 0.29.INTC at func 2: 4 ACPI: Found matching pin for 0.29.INTD at func 7: 4 pci0: on pcib0 pci0: domain=0, physical bus=0 found-> vendor=0x8086, dev=0x3580, revid=0x02 domain=0, bus=0, slot=0, func=0 class=06-00-00, hdrtype=0x00, mfdev=1 cmdreg=0x0106, statreg=0x2090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3584, revid=0x02 
domain=0, bus=0, slot=0, func=1 class=08-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0006, statreg=0x0080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3585, revid=0x02 domain=0, bus=0, slot=0, func=3 class=08-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0006, statreg=0x0080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3582, revid=0x02 domain=0, bus=0, slot=2, func=0 class=03-00-00, hdrtype=0x00, mfdev=1 cmdreg=0x0007, statreg=0x0090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 powerspec 1 supports D0 D1 D3 current D0 map[10]: type Prefetchable Memory, range 32, base 0xf0000000, size 27, enabled map[14]: type Memory, range 32, base 0xffa80000, size 19, enabled map[18]: type I/O Port, range 32, base 0xdc00, size 3, enabled pcib0: matched entry for 0.2.INTA (src \\_SB_.LNKA:0) pcib0: slot 2 INTA routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x3582, revid=0x02 domain=0, bus=0, slot=2, func=1 class=03-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0007, statreg=0x0090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) powerspec 1 supports D0 D1 D3 current D0 map[10]: type Prefetchable Memory, range 32, base 0xe8000000, size 27, enabled map[14]: type Memory, range 32, base 0xff980000, size 19, enabled found-> vendor=0x8086, dev=0x24c2, revid=0x03 domain=0, bus=0, slot=29, func=0 class=0c-03-00, hdrtype=0x00, mfdev=1 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 map[20]: type I/O Port, range 32, base 0xd480, size 5, enabled pcib0: matched entry for 0.29.INTA (src \\_SB_.LNKA:0) pcib0: slot 29 INTA routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x24c4, revid=0x03 domain=0, bus=0, slot=29, func=1 class=0c-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, 
cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=5 map[20]: type I/O Port, range 32, base 0xd800, size 5, enabled pcib0: matched entry for 0.29.INTB (src \\_SB_.LNKD:0) pcib0: slot 29 INTB routed to irq 5 via \\_SB_.LNKD found-> vendor=0x8086, dev=0x24c7, revid=0x03 domain=0, bus=0, slot=29, func=2 class=0c-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=c, irq=4 map[20]: type I/O Port, range 32, base 0xd880, size 5, enabled pcib0: matched entry for 0.29.INTC (src \\_SB_.LNKC:0) pcib0: slot 29 INTC routed to irq 4 via \\_SB_.LNKC found-> vendor=0x8086, dev=0x24cd, revid=0x03 domain=0, bus=0, slot=29, func=7 class=0c-03-20, hdrtype=0x00, mfdev=0 cmdreg=0x0106, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=d, irq=4 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xffa7fc00, size 10, enabled pcib0: matched entry for 0.29.INTD (src \\_SB_.LNKH:0) pcib0: slot 29 INTD routed to irq 4 via \\_SB_.LNKH found-> vendor=0x8086, dev=0x2448, revid=0x83 domain=0, bus=0, slot=30, func=0 class=06-04-00, hdrtype=0x01, mfdev=0 cmdreg=0x0107, statreg=0x8080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x06 (1500 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x24cc, revid=0x03 domain=0, bus=0, slot=31, func=0 class=06-01-00, hdrtype=0x00, mfdev=1 cmdreg=0x000f, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x24ca, revid=0x03 domain=0, bus=0, slot=31, func=1 class=01-01-8a, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=255 map[20]: type I/O Port, range 32, base 0xffa0, size 4, enabled map[24]: type Memory, range 32, base 0, size 10, memory disabled found-> 
vendor=0x8086, dev=0x24c5, revid=0x03 domain=0, bus=0, slot=31, func=5 class=04-01-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=255 powerspec 2 supports D0 D3 current D0 map[10]: type I/O Port, range 32, base 0xe000, size 8, enabled map[14]: type I/O Port, range 32, base 0xe100, size 6, enabled map[18]: type Memory, range 32, base 0, size 9, memory disabled map[1c]: type Memory, range 32, base 0, size 8, memory disabled found-> vendor=0x8086, dev=0x24c6, revid=0x03 domain=0, bus=0, slot=31, func=6 class=07-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=255 powerspec 2 supports D0 D3 current D0 map[10]: type I/O Port, range 32, base 0xe200, size 8, enabled map[14]: type I/O Port, range 32, base 0xe300, size 7, enabled pci0: at device 0.1 (no driver attached) pci0: at device 0.3 (no driver attached) vgapci0: port 0xdc00-0xdc07 mem 0xf0000000-0xf7ffffff,0xffa80000-0xffafffff irq 11 at device 2.0 on pci0 agp0: on vgapci0 vgapci0: Reserved 0x8000000 bytes for rid 0x10 type 3 at 0xf0000000 vgapci0: Reserved 0x80000 bytes for rid 0x14 type 3 at 0xffa80000 agp0: detected 8060k stolen memory agp0: aperture size is 128M vgapci1: mem 0xe8000000-0xefffffff,0xff980000-0xff9fffff at device 2.1 on pci0 uhci0: port 0xd480-0xd49f irq 11 at device 29.0 on pci0 uhci0: Reserved 0x20 bytes for rid 0x20 type 4 at 0xd480 uhci0: [GIANT-LOCKED] uhci0: [ITHREAD] usb0: on uhci0 usb0: USB revision 1.0 uhub0: on usb0 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xd800-0xd81f irq 5 at device 29.1 on pci0 uhci1: Reserved 0x20 bytes for rid 0x20 type 4 at 0xd800 uhci1: [GIANT-LOCKED] uhci1: [ITHREAD] usb1: on uhci1 usb1: USB revision 1.0 uhub1: on usb1 uhub1: 2 ports with 2 removable, self powered uhci2: port 0xd880-0xd89f irq 4 at device 29.2 on pci0 uhci2: Reserved 
0x20 bytes for rid 0x20 type 4 at 0xd880 uhci2: [GIANT-LOCKED] uhci2: [ITHREAD] usb2: on uhci2 usb2: USB revision 1.0 uhub2: on usb2 uhub2: 2 ports with 2 removable, self powered ehci0: mem 0xffa7fc00-0xffa7ffff irq 4 at device 29.7 on pci0 ehci0: Reserved 0x400 bytes for rid 0x10 type 3 at 0xffa7fc00 ehci0: [GIANT-LOCKED] ehci0: [ITHREAD] usb3: EHCI version 1.0 usb3: companion controllers, 2 ports each: usb0 usb1 usb2 usb3: on ehci0 usb3: USB revision 2.0 uhub3: on usb3 uhub3: 6 ports with 6 removable, self powered pcib1: at device 30.0 on pci0 pcib1: domain 0 pcib1: secondary bus 1 pcib1: subordinate bus 1 pcib1: I/O decode 0xc000-0xcfff pcib1: memory decode 0xff700000-0xff7fffff pcib1: prefetched decode 0xdea00000-0xdeafffff pcib1: Subtractively decoded bridge. ACPI: Found matching pin for 1.8.INTA at func 0: 11 ACPI: Found matching pin for 1.5.INTA at func 0: 255 ACPI: Found matching pin for 1.5.INTB at func 1: 11 ACPI: Found matching pin for 1.4.INTA at func 0: 4 pci1: on pcib1 pci1: domain=0, physical bus=1 found-> vendor=0x8086, dev=0x1043, revid=0x04 domain=0, bus=1, slot=4, func=0 class=02-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0116, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x02 (500 ns), maxlat=0x22 (8500 ns) intpin=a, irq=4 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xff7fd000, size 12, enabled pcib1: requested memory range 0xff7fd000-0xff7fdfff: good pcib1: matched entry for 1.4.INTA (src \\_SB_.LNKC:0) pcib1: slot 4 INTA routed to irq 4 via \\_SB_.LNKC found-> vendor=0x1180, dev=0x0475, revid=0xb8 domain=0, bus=1, slot=5, func=0 class=06-07-00, hdrtype=0x02, mfdev=1 cmdreg=0x0007, statreg=0x0210, cachelnsz=0 (dwords) lattimer=0x20 (960 ns), mingnt=0x80 (32000 ns), maxlat=0x07 (1750 ns) intpin=a, irq=255 powerspec 2 supports D0 D1 D2 D3 current D0 map[10]: type Memory, range 32, base 0, size 12, enabled found-> vendor=0x1180, dev=0x0551, revid=0x00 domain=0, bus=1, slot=5, func=1 
class=0c-00-10, hdrtype=0x00, mfdev=1 cmdreg=0x0106, statreg=0x0210, cachelnsz=0 (dwords) lattimer=0x40 (1920 ns), mingnt=0x02 (500 ns), maxlat=0x04 (1000 ns) intpin=b, irq=11 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xff7fe800, size 11, enabled pcib1: requested memory range 0xff7fe800-0xff7fefff: good pcib1: matched entry for 1.5.INTB (src \\_SB_.LNKA:0) pcib1: slot 5 INTB routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x103e, revid=0x83 domain=0, bus=1, slot=8, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0117, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x08 (2000 ns), maxlat=0x38 (14000 ns) intpin=a, irq=11 powerspec 2 supports D0 D1 D2 D3 current D0 map[10]: type Memory, range 32, base 0xff7ff000, size 12, enabled pcib1: requested memory range 0xff7ff000-0xff7fffff: good map[14]: type I/O Port, range 32, base 0xcc00, size 6, enabled pcib1: requested I/O range 0xcc00-0xcc3f: in range pcib1: matched entry for 1.8.INTA (src \\_SB_.LNKE:0) pcib1: slot 8 INTA routed to irq 11 via \\_SB_.LNKE ipw0: mem 0xff7fd000-0xff7fdfff irq 4 at device 4.0 on pci1 ipw0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xff7fd000 ipw0: bpf attached ipw0: Ethernet address: 00:04:23:71:77:46 ipw0: bpf attached ipw0: bpf attached ipw0: [MPSAFE] ipw0: [ITHREAD] ipw0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps cbb0: at device 5.0 on pci1 pcib1: cbb0 requested memory range 0xff700000-0xff7fffff: good cbb0: Lazy allocation of 0x1000 bytes rid 0x10 type 3 at 0xff700000 cardbus0: on cbb0 pccard0: <16-bit PCCard bus> on cbb0 pcib1: matched entry for 1.5.INTA (src \\_SB_.LNKB:0) pci_link1: Picked IRQ 9 with weight 0 pcib1: slot 5 INTA routed to irq 9 via \\_SB_.LNKB cbb0: [MPSAFE] cbb0: [ITHREAD] cbb0: PCI Configuration space: 0x00: 0x04751180 0x02100007 0x060700b8 0x00822000 0x10: 0xff700000 0x020000dc 0x20030201 0xfffff000 0x20: 0x00000000 0xfffff000 0x00000000 0xfffffffc 0x30: 0x00000000 0xfffffffc 0x00000000 
0x07000109 0x40: 0x17441043 0x00000001 0x00000000 0x00000000 0x50: 0x00000000 0x00000000 0x00000000 0x00000000 0x60: 0x00000000 0x00000000 0x00000000 0x00000000 0x70: 0x00000000 0x00000000 0x00000000 0x00000000 0x80: 0x20a00001 0x00000000 0x04630463 0x00000000 0x90: 0x00000000 0x00000000 0x00000000 0x00000000 0xa0: 0x80000000 0x00000000 0x00000000 0x00000000 0xb0: 0x00000000 0x00000000 0x00000000 0x00000000 0xc0: 0x17441043 0x00000000 0x00000000 0x00000000 0xd0: 0x00000000 0x00000000 0x00000000 0xfe0a0001 0xe0: 0x24c04000 0x00000000 0x00000000 0x00000000 0xf0: 0x00000000 0x00000000 0x00000000 0x00000000 fwohci0: mem 0xff7fe800-0xff7fefff irq 11 at device 5.1 on pci1 fwohci0: Reserved 0x800 bytes for rid 0x10 type 3 at 0xff7fe800 fwohci0: [MPSAFE] fwohci0: [FILTER] fwohci0: OHCI version 1.0 (ROM=1) fwohci0: No. of Isochronous channels is 4. fwohci0: EUI64 00:e0:18:00:03:10:02:07 fwohci0: Phy 1394a available S400, 2 ports. fwohci0: Link S400, max_rec 2048 bytes. firewire0: on fwohci0 fwe0: on firewire0 if_fwe0: Fake Ethernet address: 02:e0:18:10:02:07 fwe0: bpf attached fwe0: Ethernet address: 02:e0:18:10:02:07 fwip0: on firewire0 fwip0: bpf attached fwip0: Firewire address: 00:e0:18:00:03:10:02:07 @ 0xfffe00000000, S400, maxrec 2048 sbp0: on firewire0 dcons_crom0: on firewire0 dcons_crom0: bus_addr 0x1374000 fwohci0: Initiate bus reset fwohci0: BUS reset fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode fxp0: port 0xcc00-0xcc3f mem 0xff7ff000-0xff7fffff irq 11 at device 8.0 on pci1 fxp0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xff7ff000 fxp0: using memory space register mapping fxp0: PCI IDs: 8086 103e 1043 1745 0083 fxp0: Dynamic Standby mode is disabled fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: 
fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: MII without any PHY! device_attach: fxp0 attach returned 6 isab0: at device 31.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0 atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xffa0 ata0: on atapci0 atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0x1f0 atapci0: Reserved 0x1 bytes for rid 0x14 type 4 at 0x3f6 ata0: reset tp1 mask=03 ostat0=50 ostat1=00 ata0: stat0=0x90 err=0x90 lsb=0x90 msb=0x90 ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 ata0: stat1=0x00 err=0x01 lsb=0x00 msb=0x00 ata0: reset tp2 stat0=50 stat1=00 devices=0x1 ata0: [MPSAFE] ata0: [ITHREAD] ata1: on atapci0 atapci0: Reserved 0x8 bytes for rid 0x18 type 4 at 0x170 atapci0: Reserved 0x1 bytes for rid 0x1c type 4 at 0x376 ata1: reset tp1 mask=03 ostat0=50 ostat1=00 ata1: stat0=0x10 err=0x01 lsb=0x14 msb=0xeb ata1: stat1=0x00 err=0x01 lsb=0x7f msb=0x7f ata1: reset tp2 stat0=10 stat1=00 devices=0x4 ata1: [MPSAFE] ata1: [ITHREAD] pcm0: port 0xe000-0xe0ff,0xe100-0xe13f at device 31.5 on pci0 pcm0: Lazy allocation of 0x200 bytes rid 0x18 type 3 at 0x80000000 pcm0: Lazy allocation of 0x100 
bytes rid 0x1c type 3 at 0x80000200 pcib0: matched entry for 0.31.INTB (src \\_SB_.LNKB:0) pcib0: slot 31 INTB routed to irq 9 via \\_SB_.LNKB pcm0: [MPSAFE] pcm0: [ITHREAD] pcm0: pcm0: Codec features headphone, 20 bit DAC, 20 bit ADC, 5 bit master volume, SigmaTel 3D Enhancement pcm0: Primary codec extended features variable rate PCM, reserved 1, AMAP, reserved 4 pcm0: ac97 codec dac ready count: 0 pcm0: Mixer "vol": pcm0: Mixer "pcm": pcm0: Mixer "speaker": pcm0: Mixer "line": pcm0: Mixer "mic": pcm0: Mixer "cd": pcm0: Mixer "rec": pcm0: Mixer "igain": pcm0: Mixer "ogain": pcm0: Mixer "line1": pcm0: Mixer "phin": pcm0: Mixer "phout": pcm0: Mixer "video": pcm0: clone manager: deadline=750ms flags=0x8000001e pcm0: sndbuf_setmap 1620000, 4000; 0xd4d6f000 -> 1620000 pcm0: sndbuf_setmap 162c000, 4000; 0xd4d73000 -> 162c000 pci0: at device 31.6 (no driver attached) acpi_button0: on acpi0 acpi_lid0: on acpi0 acpi_tz0: on acpi0 acpi_acad0: on acpi0 battery0: on acpi0 battery1: on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 atkbd: the current kbd controller command byte 0065 atkbd: keyboard ID 0x41ab (2) kbd0 at atkbd0 kbd0: atkbd0, AT 101/102 (2), config:0x0, flags:0x3d0000 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] psm0: unable to allocate IRQ psmcpnp0: irq 12 on acpi0 psm0: current command byte:0065 psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: [ITHREAD] psm0: model Generic PS/2 mouse, device ID 0-00, 2 buttons psm0: config:00000000, flags:00000008, packet size:3 psm0: syncmask:c0, syncbits:00 sio0: configured irq 3 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: irq maps: 0 0 0 0 sio0: configured irq 3 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: irq maps: 0 0 0 0 sio0 port 0x2f8-0x2ff irq 3 drq 1 flags 0x10 on acpi0 sio0: type 16550A sio0: [FILTER] unknown: status reg test failed ff unknown: status reg test failed ff unknown: status reg test failed ff unknown: status reg test failed ff unknown: 
status reg test failed ff unknown: status reg test failed ff ex_isa_identify() ahc_isa_probe 12: ioport 0xcc00 alloc failed ahc_isa_probe 13: ioport 0xdc00 alloc failed ahc_isa_probe 14: ioport 0xec00 alloc failed ata: ata0 already exists; skipping it ata: ata1 already exists; skipping it atkbdc: atkbdc0 already exists; skipping it sio: sio0 already exists; skipping it pnp_identify: Trying Read_Port at 203 pnp_identify: Trying Read_Port at 243 pnp_identify: Trying Read_Port at 283 pnp_identify: Trying Read_Port at 2c3 pnp_identify: Trying Read_Port at 303 pnp_identify: Trying Read_Port at 343 pnp_identify: Trying Read_Port at 383 pnp_identify: Trying Read_Port at 3c3 PNP Identify complete sc: sc0 already exists; skipping it vga: vga0 already exists; skipping it isa_probe_children: disabling PnP devices isa_probe_children: probing non-PnP devices pmtimer0 on isa0 orm0: at iomem 0xc0000-0xccfff pnpid ORM0000 on isa0 adv0: not probed (disabled) aha0: not probed (disabled) aic0: not probed (disabled) bt0: not probed (disabled) cs0: not probed (disabled) ed0: not probed (disabled) fdc0 failed to probe at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fe0: not probed (disabled) ie0: not probed (disabled) le0: not probed (disabled) ppc0: parallel port found at 0x378 ppc0: using extended I/O port range ppc0: ECP SPP ECP+EPP SPP ppc0: at port 0x378-0x37f irq 7 on isa0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/8 bytes threshold ppbus0: on ppc0 ppbus0: [MPSAFE] ppbus0: [ITHREAD] plip0: on ppbus0 plip0: bpf attached lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 ppc0: [GIANT-LOCKED] ppc0: [ITHREAD] sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sc0: fb0, kbd1, terminal emulator: sc (syscons terminal) sio1 failed to probe at port 0x2f8 irq 3 on isa0 sio2: not probed (disabled) sio3: not probed (disabled) sn0: not probed (disabled) vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 vt0: not probed 
(disabled) isa_probe_children: probing PnP devices Device configuration finished. procfs registered Timecounter "TSC" frequency 600024956 Hz quality 800 Timecounters tick every 1.000 msec lo0: bpf attached hptrr: no controller detected. firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me) firewire0: bus manager 0 (me) ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire acpi_acad0: acline initialization start ad0: setting PIO4 on ICH4 chip ad0: setting UDMA100 on ICH4 chip battery0: battery initialization start battery1: battery initialization start system power profile changed to 'economy' ad0: 38154MB at ata0-master UDMA100 ad0: 78140160 sectors [77520C/16H/63S] 16 sectors/interrupt 1 depth queue GEOM: new disk ad0 acpi_acad0: Off Line acpi_acad0: acline initialization done, tried 1 times battery0: battery initialization done, tried 1 times ad0: Intel check1 failed ad0: Adaptec check1 failed ad0: LSI (v3) check1 failed ad0: LSI (v2) check1 failed ad0: FreeBSD check1 failed ata1-master: pio=PIO4 wdma=WDMA2 udma=UDMA33 cable=40 wire acd0: setting PIO4 on ICH4 chip acd0: setting UDMA33 on ICH4 chip acd0: DVDROM drive at ata1 as master acd0: read 4125KB/s (4125KB/s), 512KB buffer, UDMA33 acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, packet acd0: Writes: acd0: Audio: play, 16 volume levels acd0: Mechanism: ejectable tray, unlocked acd0: Medium: no/blank disc pcm0: measured ac97 link rate at 48017 Hz, will use 48000 Hz acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 (probe0:ata1:0:0:0): error 22 (probe0:ata1:0:0:0): Unretryable Error (probe0:ata1:0:0:0): Down reving Protocol Version from 2 to 0? 
(probe0:ata1:0:0:0): error 6 (probe0:ata1:0:0:0): Unretryable Error acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 (probe0:ata1:0:0:0): error 22 (probe0:ata1:0:0:0): Unretryable Error (probe6:sbp0:0:5:0): error 22 (probe6:sbp0:0:5:0): Unretryable Error (probe1:sbp0:0:0:0): error 22 (probe1:sbp0:0:0:0): Unretryable Error (probe2:sbp0:0:1:0): error 22 (probe2:sbp0:0:1:0): Unretryable Error (probe3:sbp0:0:2:0): error 22 (probe3:sbp0:0:2:0): Unretryable Error (probe4:sbp0:0:3:0): error 22 (probe4:sbp0:0:3:0): Unretryable Error (probe5:sbp0:0:4:0): error 22 (probe5:sbp0:0:4:0): Unretryable Error (probe7:sbp0:0:6:0): error 22 (probe7:sbp0:0:6:0): Unretryable Error pass0 at ata1 bus 0 target 0 lun 0 pass0: Removable CD-ROM SCSI-0 device pass0: 33.000MB/s transfers GEOM: new disk cd0 ATA PseudoRAID loade(cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error cd0 at ata1 bus 0 target 0 lun 0 cd0: Removable CD-ROM SCSI-0 device cd0: 33.000MB/s transfers cd0: Attempt to query device size failed: NOT READY, Medium not present d (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error Trying to mount root from ufs:/dev/ad0s1a start_init: trying /sbin/init drm0: on vgapci0 info: [drm] AGP at 0xf0000000 128MB info: [drm] Initialized i915 1.5.0 20060119 drm1: on vgapci1 info: [drm] AGP at 0xf0000000 128MB info: [drm] Initialized i915 1.5.0 20060119 drm0: [MPSAFE] drm0: [ITHREAD] drm0: [MPSAFE] drm0: [ITHREAD] Waiting (max 60 seconds) for system process `vnlru' to stop...done Waiting (max 60 seconds) for system process `bufdaemon' to stop...done Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining...0 0 0 done All buffers synced. Copyright (c) 1992-2008 The FreeBSD Project. 
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008 root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC Preloaded elf kernel "/boot/kernel/kernel" at 0xc0e72000. Preloaded elf module "/boot/kernel/if_ipw.ko" at 0xc0e7214c. Preloaded elf module "/boot/kernel/snd_ich.ko" at 0xc0e721f8. Preloaded elf module "/boot/kernel/sound.ko" at 0xc0e722a4. Preloaded elf module "/boot/kernel/ipw_bss.ko" at 0xc0e72350. Preloaded elf module "/boot/kernel/ipw_ibss.ko" at 0xc0e723fc. Preloaded elf module "/boot/kernel/ipw_monitor.ko" at 0xc0e724ac. Preloaded elf module "/boot/kernel/atapicam.ko" at 0xc0e7255c. Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0e7260c. Calibrating clock(s) ... i8254 clock: 1193167 Hz CLK_USE_I8254_CALIBRATION not specified - using default frequency Timecounter "i8254" frequency 1193182 Hz quality 0 Calibrating TSC clock ... 
TSC clock: 600023372 Hz CPU: Intel(R) Pentium(R) M processor 1400MHz (600.02-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x695 Stepping = 5 Features=0xa7e9fbbf Features2=0x180 Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries Data TLB: 4 KB Pages, 4-way set associative, 128 entries Instruction TLB: 4 MB pages, fully associative, 2 entries 2nd-level cache: 1 MB, 8-way set associative, 64 byte line size 1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size Data TLB: 4 MB Pages, 4-way set associative, 8 entries 1st-level data cache: 32 KB, 8-way set associative, 64 byte line size real memory = 527695872 (503 MB) Physical memory chunk(s): 0x0000000000001000 - 0x000000000009efff, 647168 bytes (158 pages) 0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages) 0x0000000001028000 - 0x000000001ee22fff, 501198848 bytes (122363 pages) avail memory = 502419456 (479 MB) Table 'FACP' at 0x1f740200 Table 'OEMB' at 0x1f750040 MADT: No MADT table found APIC: Could not find any APICs. 
pnpbios: Found PnP BIOS data at 0xc00f2e00 pnpbios: Entry = f0000:39da Rev = 1.0 Other BIOS signatures found: wlan_amrr: wlan: <802.11 Link Layer> firmware: 'ipw_bss' version 130: 209190 bytes loaded at 0xc0d68738 firmware: 'ipw_ibss' version 130: 201138 bytes loaded at 0xc0d9d73c firmware: 'ipw_monitor' version 130: 196458 bytes loaded at 0xc0dd0748 snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024] feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=5 feeder_buffersize=16384 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25 ath_rate: version 1.2 nfslock: pseudo-device kbd: new array size 4 kbd1 at kbdmux0 io: mem: Pentium Pro MTRR support enabled null: random: ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 19:59:27) ACPI: RSDP @ 0x0xf4b70/0x0014 (v 0 ACPIAM) ACPI: RSDT @ 0x0x1f740000/0x002C (v 1 A M I OEMRSDT 0x05000314 MSFT 0x00000097) ACPI: FACP @ 0x0x1f740200/0x0081 (v 2 A M I OEMFACP 0x05000314 MSFT 0x00000097) ACPI: DSDT @ 0x0x1f740300/0x7323 (v 1 0ABBD 0ABBD001 0x00000001 MSFT 0x0100000D) ACPI: FACS @ 0x0x1f750000/0x0040 ACPI: OEMB @ 0x0x1f750040/0x004D (v 1 A M I OEMBIOS 0x05000314 MSFT 0x00000097) npx0: INT 16 interface acpi0: on motherboard acpi0: [MPSAFE] acpi0: [ITHREAD] pci_open(1): mode 1 addr port (0x0cf8) is 0x8000005c pci_open(1a): mode1res=0x80000000 (0x80000000) pci_cfgcheck: device 0 [class=060000] [hdr=80] is there (id=35808086) pcibios: No call entry point AcpiOsDerivePciId: \\_SB_.PCI0.P0P1.CBS0.CBSP -> bus 1 dev 5 func 0 acpi0: Power Button (fixed) acpi0: wakeup code va 0xccd3f000 pa 0x1000 atpic: Programming IRQ9 as level/low AcpiOsDerivePciId: \\_SB_.PCI0.SBRG.FHR0 -> bus 0 dev 31 func 0 AcpiOsDerivePciId: \\_SB_.PCI0.SBRG.IROR -> bus 0 dev 31 func 0 acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, 1f700000 (3) failed ACPI timer: 1/0 1/0 1/1 1/1 1/0 1/0 1/1 1/1 1/1 1/0 -> 10 Timecounter 
"ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0 acpi_ec0: port 0x62,0x66 on acpi0 pci_link0: Index IRQ Rtd Ref IRQs Initial Probe 0 11 N 0 3 4 5 6 7 11 12 Validation 0 11 N 0 3 4 5 6 7 11 12 After Disable 0 255 N 0 3 4 5 6 7 11 12 pci_link1: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 3 4 5 6 7 11 12 Validation 0 255 N 0 3 4 5 6 7 11 12 After Disable 0 255 N 0 3 4 5 6 7 11 12 pci_link2: Index IRQ Rtd Ref IRQs Initial Probe 0 4 N 0 4 12 Validation 0 4 N 0 4 12 After Disable 0 255 N 0 4 12 pci_link3: Index IRQ Rtd Ref IRQs Initial Probe 0 5 N 0 5 6 Validation 0 5 N 0 5 6 After Disable 0 255 N 0 5 6 pci_link4: Index IRQ Rtd Ref IRQs Initial Probe 0 11 N 0 6 11 Validation 0 11 N 0 6 11 After Disable 0 255 N 0 6 11 pci_link5: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 3 7 Validation 0 255 N 0 3 7 After Disable 0 255 N 0 3 7 pci_link6: Index IRQ Rtd Ref IRQs Initial Probe 0 255 N 0 4 7 Validation 0 255 N 0 4 7 After Disable 0 255 N 0 4 7 pci_link7: Index IRQ Rtd Ref IRQs Initial Probe 0 4 N 0 4 6 12 Validation 0 4 N 0 4 6 12 After Disable 0 255 N 0 4 6 12 cpu0: on acpi0 cpu0: switching to generic Cx mode est0: on cpu0 p4tcc0: on cpu0 pcib0: port 0xcf8-0xcff on acpi0 ACPI: Found matching pin for 0.2.INTA at func 0: 11 ACPI: Found matching pin for 0.31.INTA at func 1: 255 ACPI: Found matching pin for 0.31.INTB at func 5: 255 ACPI: Found matching pin for 0.31.INTB at func 6: 255 ACPI: Found matching pin for 0.29.INTA at func 0: 11 ACPI: Found matching pin for 0.29.INTB at func 1: 5 ACPI: Found matching pin for 0.29.INTC at func 2: 4 ACPI: Found matching pin for 0.29.INTD at func 7: 4 pci0: on pcib0 pci0: domain=0, physical bus=0 found-> vendor=0x8086, dev=0x3580, revid=0x02 domain=0, bus=0, slot=0, func=0 class=06-00-00, hdrtype=0x00, mfdev=1 cmdreg=0x0106, statreg=0x2090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3584, revid=0x02 
domain=0, bus=0, slot=0, func=1 class=08-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0006, statreg=0x0080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3585, revid=0x02 domain=0, bus=0, slot=0, func=3 class=08-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0006, statreg=0x0080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x3582, revid=0x02 domain=0, bus=0, slot=2, func=0 class=03-00-00, hdrtype=0x00, mfdev=1 cmdreg=0x0007, statreg=0x0090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 powerspec 1 supports D0 D1 D3 current D0 map[10]: type Prefetchable Memory, range 32, base 0xf0000000, size 27, enabled map[14]: type Memory, range 32, base 0xffa80000, size 19, enabled map[18]: type I/O Port, range 32, base 0xdc00, size 3, enabled pcib0: matched entry for 0.2.INTA (src \\_SB_.LNKA:0) pcib0: slot 2 INTA routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x3582, revid=0x02 domain=0, bus=0, slot=2, func=1 class=03-80-00, hdrtype=0x00, mfdev=1 cmdreg=0x0007, statreg=0x0090, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) powerspec 1 supports D0 D1 D3 current D0 map[10]: type Prefetchable Memory, range 32, base 0xe8000000, size 27, enabled map[14]: type Memory, range 32, base 0xff980000, size 19, enabled found-> vendor=0x8086, dev=0x24c2, revid=0x03 domain=0, bus=0, slot=29, func=0 class=0c-03-00, hdrtype=0x00, mfdev=1 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 map[20]: type I/O Port, range 32, base 0xd480, size 5, enabled pcib0: matched entry for 0.29.INTA (src \\_SB_.LNKA:0) pcib0: slot 29 INTA routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x24c4, revid=0x03 domain=0, bus=0, slot=29, func=1 class=0c-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, 
cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=5 map[20]: type I/O Port, range 32, base 0xd800, size 5, enabled pcib0: matched entry for 0.29.INTB (src \\_SB_.LNKD:0) pcib0: slot 29 INTB routed to irq 5 via \\_SB_.LNKD found-> vendor=0x8086, dev=0x24c7, revid=0x03 domain=0, bus=0, slot=29, func=2 class=0c-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=c, irq=4 map[20]: type I/O Port, range 32, base 0xd880, size 5, enabled pcib0: matched entry for 0.29.INTC (src \\_SB_.LNKC:0) pcib0: slot 29 INTC routed to irq 4 via \\_SB_.LNKC found-> vendor=0x8086, dev=0x24cd, revid=0x03 domain=0, bus=0, slot=29, func=7 class=0c-03-20, hdrtype=0x00, mfdev=0 cmdreg=0x0106, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=d, irq=4 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xffa7fc00, size 10, enabled pcib0: matched entry for 0.29.INTD (src \\_SB_.LNKH:0) pcib0: slot 29 INTD routed to irq 4 via \\_SB_.LNKH found-> vendor=0x8086, dev=0x2448, revid=0x83 domain=0, bus=0, slot=30, func=0 class=06-04-00, hdrtype=0x01, mfdev=0 cmdreg=0x0107, statreg=0x8080, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x06 (1500 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x24cc, revid=0x03 domain=0, bus=0, slot=31, func=0 class=06-01-00, hdrtype=0x00, mfdev=1 cmdreg=0x000f, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) found-> vendor=0x8086, dev=0x24ca, revid=0x03 domain=0, bus=0, slot=31, func=1 class=01-01-8a, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0280, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=255 map[20]: type I/O Port, range 32, base 0xffa0, size 4, enabled map[24]: type Memory, range 32, base 0, size 10, memory disabled found-> 
vendor=0x8086, dev=0x24c5, revid=0x03 domain=0, bus=0, slot=31, func=5 class=04-01-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=255 powerspec 2 supports D0 D3 current D0 map[10]: type I/O Port, range 32, base 0xe000, size 8, enabled map[14]: type I/O Port, range 32, base 0xe100, size 6, enabled map[18]: type Memory, range 32, base 0, size 9, memory disabled map[1c]: type Memory, range 32, base 0, size 8, memory disabled found-> vendor=0x8086, dev=0x24c6, revid=0x03 domain=0, bus=0, slot=31, func=6 class=07-03-00, hdrtype=0x00, mfdev=0 cmdreg=0x0005, statreg=0x0290, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=255 powerspec 2 supports D0 D3 current D0 map[10]: type I/O Port, range 32, base 0xe200, size 8, enabled map[14]: type I/O Port, range 32, base 0xe300, size 7, enabled pci0: at device 0.1 (no driver attached) pci0: at device 0.3 (no driver attached) vgapci0: port 0xdc00-0xdc07 mem 0xf0000000-0xf7ffffff,0xffa80000-0xffafffff irq 11 at device 2.0 on pci0 agp0: on vgapci0 vgapci0: Reserved 0x8000000 bytes for rid 0x10 type 3 at 0xf0000000 vgapci0: Reserved 0x80000 bytes for rid 0x14 type 3 at 0xffa80000 agp0: detected 8060k stolen memory agp0: aperture size is 128M vgapci1: mem 0xe8000000-0xefffffff,0xff980000-0xff9fffff at device 2.1 on pci0 uhci0: port 0xd480-0xd49f irq 11 at device 29.0 on pci0 uhci0: Reserved 0x20 bytes for rid 0x20 type 4 at 0xd480 uhci0: [GIANT-LOCKED] uhci0: [ITHREAD] usb0: on uhci0 usb0: USB revision 1.0 uhub0: on usb0 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xd800-0xd81f irq 5 at device 29.1 on pci0 uhci1: Reserved 0x20 bytes for rid 0x20 type 4 at 0xd800 uhci1: [GIANT-LOCKED] uhci1: [ITHREAD] usb1: on uhci1 usb1: USB revision 1.0 uhub1: on usb1 uhub1: 2 ports with 2 removable, self powered uhci2: port 0xd880-0xd89f irq 4 at device 29.2 on pci0 uhci2: Reserved 
0x20 bytes for rid 0x20 type 4 at 0xd880 uhci2: [GIANT-LOCKED] uhci2: [ITHREAD] usb2: on uhci2 usb2: USB revision 1.0 uhub2: on usb2 uhub2: 2 ports with 2 removable, self powered ehci0: mem 0xffa7fc00-0xffa7ffff irq 4 at device 29.7 on pci0 ehci0: Reserved 0x400 bytes for rid 0x10 type 3 at 0xffa7fc00 ehci0: [GIANT-LOCKED] ehci0: [ITHREAD] usb3: EHCI version 1.0 usb3: companion controllers, 2 ports each: usb0 usb1 usb2 usb3: on ehci0 usb3: USB revision 2.0 uhub3: on usb3 uhub3: 6 ports with 6 removable, self powered pcib1: at device 30.0 on pci0 pcib1: domain 0 pcib1: secondary bus 1 pcib1: subordinate bus 1 pcib1: I/O decode 0xc000-0xcfff pcib1: memory decode 0xff700000-0xff7fffff pcib1: prefetched decode 0xdea00000-0xdeafffff pcib1: Subtractively decoded bridge. ACPI: Found matching pin for 1.8.INTA at func 0: 11 ACPI: Found matching pin for 1.5.INTA at func 0: 255 ACPI: Found matching pin for 1.5.INTB at func 1: 11 ACPI: Found matching pin for 1.4.INTA at func 0: 4 pci1: on pcib1 pci1: domain=0, physical bus=1 found-> vendor=0x8086, dev=0x1043, revid=0x04 domain=0, bus=1, slot=4, func=0 class=02-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0116, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x02 (500 ns), maxlat=0x22 (8500 ns) intpin=a, irq=4 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xff7fd000, size 12, enabled pcib1: requested memory range 0xff7fd000-0xff7fdfff: good pcib1: matched entry for 1.4.INTA (src \\_SB_.LNKC:0) pcib1: slot 4 INTA routed to irq 4 via \\_SB_.LNKC found-> vendor=0x1180, dev=0x0475, revid=0xb8 domain=0, bus=1, slot=5, func=0 class=06-07-00, hdrtype=0x02, mfdev=1 cmdreg=0x0007, statreg=0x0210, cachelnsz=0 (dwords) lattimer=0x20 (960 ns), mingnt=0x80 (32000 ns), maxlat=0x07 (1750 ns) intpin=a, irq=255 powerspec 2 supports D0 D1 D2 D3 current D0 map[10]: type Memory, range 32, base 0, size 12, enabled found-> vendor=0x1180, dev=0x0551, revid=0x00 domain=0, bus=1, slot=5, func=1 
class=0c-00-10, hdrtype=0x00, mfdev=1 cmdreg=0x0106, statreg=0x0210, cachelnsz=0 (dwords) lattimer=0x40 (1920 ns), mingnt=0x02 (500 ns), maxlat=0x04 (1000 ns) intpin=b, irq=11 powerspec 2 supports D0 D3 current D0 map[10]: type Memory, range 32, base 0xff7fe800, size 11, enabled pcib1: requested memory range 0xff7fe800-0xff7fefff: good pcib1: matched entry for 1.5.INTB (src \\_SB_.LNKA:0) pcib1: slot 5 INTB routed to irq 11 via \\_SB_.LNKA found-> vendor=0x8086, dev=0x103e, revid=0x83 domain=0, bus=1, slot=8, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0117, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x08 (2000 ns), maxlat=0x38 (14000 ns) intpin=a, irq=11 powerspec 2 supports D0 D1 D2 D3 current D0 map[10]: type Memory, range 32, base 0xff7ff000, size 12, enabled pcib1: requested memory range 0xff7ff000-0xff7fffff: good map[14]: type I/O Port, range 32, base 0xcc00, size 6, enabled pcib1: requested I/O range 0xcc00-0xcc3f: in range pcib1: matched entry for 1.8.INTA (src \\_SB_.LNKE:0) pcib1: slot 8 INTA routed to irq 11 via \\_SB_.LNKE ipw0: mem 0xff7fd000-0xff7fdfff irq 4 at device 4.0 on pci1 ipw0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xff7fd000 ipw0: bpf attached ipw0: Ethernet address: 00:04:23:71:77:46 ipw0: bpf attached ipw0: bpf attached ipw0: [MPSAFE] ipw0: [ITHREAD] ipw0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps cbb0: at device 5.0 on pci1 pcib1: cbb0 requested memory range 0xff700000-0xff7fffff: good cbb0: Lazy allocation of 0x1000 bytes rid 0x10 type 3 at 0xff700000 cardbus0: on cbb0 pccard0: <16-bit PCCard bus> on cbb0 pcib1: matched entry for 1.5.INTA (src \\_SB_.LNKB:0) pci_link1: Picked IRQ 9 with weight 0 pcib1: slot 5 INTA routed to irq 9 via \\_SB_.LNKB cbb0: [MPSAFE] cbb0: [ITHREAD] cbb0: PCI Configuration space: 0x00: 0x04751180 0x02100007 0x060700b8 0x00822000 0x10: 0xff700000 0x020000dc 0x20030201 0xfffff000 0x20: 0x00000000 0xfffff000 0x00000000 0xfffffffc 0x30: 0x00000000 0xfffffffc 0x00000000 
0x07000109 0x40: 0x17441043 0x00000001 0x00000000 0x00000000 0x50: 0x00000000 0x00000000 0x00000000 0x00000000 0x60: 0x00000000 0x00000000 0x00000000 0x00000000 0x70: 0x00000000 0x00000000 0x00000000 0x00000000 0x80: 0x20a00001 0x00000000 0x04630463 0x00000000 0x90: 0x00000000 0x00000000 0x00000000 0x00000000 0xa0: 0x80000000 0x00000000 0x00000000 0x00000000 0xb0: 0x00000000 0x00000000 0x00000000 0x00000000 0xc0: 0x17441043 0x00000000 0x00000000 0x00000000 0xd0: 0x00000000 0x00000000 0x00000000 0xfe0a0001 0xe0: 0x24c04000 0x00000000 0x00000000 0x00000000 0xf0: 0x00000000 0x00000000 0x00000000 0x00000000 fwohci0: mem 0xff7fe800-0xff7fefff irq 11 at device 5.1 on pci1 fwohci0: Reserved 0x800 bytes for rid 0x10 type 3 at 0xff7fe800 fwohci0: [MPSAFE] fwohci0: [FILTER] fwohci0: OHCI version 1.0 (ROM=1) fwohci0: No. of Isochronous channels is 4. fwohci0: EUI64 00:e0:18:00:03:10:02:07 fwohci0: Phy 1394a available S400, 2 ports. fwohci0: Link S400, max_rec 2048 bytes. firewire0: on fwohci0 fwe0: on firewire0 if_fwe0: Fake Ethernet address: 02:e0:18:10:02:07 fwe0: bpf attached fwe0: Ethernet address: 02:e0:18:10:02:07 fwip0: on firewire0 fwip0: bpf attached fwip0: Firewire address: 00:e0:18:00:03:10:02:07 @ 0xfffe00000000, S400, maxrec 2048 sbp0: on firewire0 dcons_crom0: on firewire0 dcons_crom0: bus_addr 0x1374000 fwohci0: Initiate bus reset fwohci0: BUS reset fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode fxp0: port 0xcc00-0xcc3f mem 0xff7ff000-0xff7fffff irq 11 at device 8.0 on pci1 fxp0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xff7ff000 fxp0: using memory space register mapping fxp0: PCI IDs: 8086 103e 1043 1745 0083 fxp0: Dynamic Standby mode is disabled fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: 
fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: fxp_miibus_readreg: timed out fxp0: MII without any PHY! device_attach: fxp0 attach returned 6 isab0: at device 31.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0 atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xffa0 ata0: on atapci0 atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0x1f0 atapci0: Reserved 0x1 bytes for rid 0x14 type 4 at 0x3f6 ata0: reset tp1 mask=03 ostat0=50 ostat1=00 ata0: stat0=0x90 err=0x90 lsb=0x90 msb=0x90 ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 ata0: stat1=0x00 err=0x01 lsb=0x00 msb=0x00 ata0: reset tp2 stat0=50 stat1=00 devices=0x1 ata0: [MPSAFE] ata0: [ITHREAD] ata1: on atapci0 atapci0: Reserved 0x8 bytes for rid 0x18 type 4 at 0x170 atapci0: Reserved 0x1 bytes for rid 0x1c type 4 at 0x376 ata1: reset tp1 mask=03 ostat0=50 ostat1=00 ata1: stat0=0x10 err=0x01 lsb=0x14 msb=0xeb ata1: stat1=0x00 err=0x01 lsb=0x7f msb=0x7f ata1: reset tp2 stat0=10 stat1=00 devices=0x4 ata1: [MPSAFE] ata1: [ITHREAD] pcm0: port 0xe000-0xe0ff,0xe100-0xe13f at device 31.5 on pci0 pcm0: Lazy allocation of 0x200 bytes rid 0x18 type 3 at 0x80000000 pcm0: Lazy allocation of 0x100 
bytes rid 0x1c type 3 at 0x80000200 pcib0: matched entry for 0.31.INTB (src \\_SB_.LNKB:0) pcib0: slot 31 INTB routed to irq 9 via \\_SB_.LNKB pcm0: [MPSAFE] pcm0: [ITHREAD] pcm0: pcm0: Codec features headphone, 20 bit DAC, 20 bit ADC, 5 bit master volume, SigmaTel 3D Enhancement pcm0: Primary codec extended features variable rate PCM, reserved 1, AMAP, reserved 4 pcm0: ac97 codec dac ready count: 0 pcm0: Mixer "vol": pcm0: Mixer "pcm": pcm0: Mixer "speaker": pcm0: Mixer "line": pcm0: Mixer "mic": pcm0: Mixer "cd": pcm0: Mixer "rec": pcm0: Mixer "igain": pcm0: Mixer "ogain": pcm0: Mixer "line1": pcm0: Mixer "phin": pcm0: Mixer "phout": pcm0: Mixer "video": pcm0: clone manager: deadline=750ms flags=0x8000001e pcm0: sndbuf_setmap 1620000, 4000; 0xd4d6f000 -> 1620000 pcm0: sndbuf_setmap 162c000, 4000; 0xd4d73000 -> 162c000 pci0: at device 31.6 (no driver attached) acpi_button0: on acpi0 acpi_lid0: on acpi0 acpi_tz0: on acpi0 acpi_acad0: on acpi0 battery0: on acpi0 battery1: on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 atkbd: the current kbd controller command byte 0065 atkbd: keyboard ID 0x41ab (2) kbd0 at atkbd0 kbd0: atkbd0, AT 101/102 (2), config:0x0, flags:0x3d0000 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] psm0: unable to allocate IRQ psmcpnp0: irq 12 on acpi0 psm0: current command byte:0065 psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: [ITHREAD] psm0: model Generic PS/2 mouse, device ID 0-00, 2 buttons psm0: config:00000000, flags:00000008, packet size:3 psm0: syncmask:c0, syncbits:00 sio0: configured irq 3 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: irq maps: 0 0 0 0 sio0: configured irq 3 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: irq maps: 0 0 0 0 sio0 port 0x2f8-0x2ff irq 3 drq 1 flags 0x10 on acpi0 sio0: type 16550A sio0: [FILTER] unknown: status reg test failed ff unknown: status reg test failed ff unknown: status reg test failed ff unknown: status reg test failed ff unknown: 
status reg test failed ff unknown: status reg test failed ff ex_isa_identify() ahc_isa_probe 12: ioport 0xcc00 alloc failed ahc_isa_probe 13: ioport 0xdc00 alloc failed ahc_isa_probe 14: ioport 0xec00 alloc failed ata: ata0 already exists; skipping it ata: ata1 already exists; skipping it atkbdc: atkbdc0 already exists; skipping it sio: sio0 already exists; skipping it pnp_identify: Trying Read_Port at 203 pnp_identify: Trying Read_Port at 243 pnp_identify: Trying Read_Port at 283 pnp_identify: Trying Read_Port at 2c3 pnp_identify: Trying Read_Port at 303 pnp_identify: Trying Read_Port at 343 pnp_identify: Trying Read_Port at 383 pnp_identify: Trying Read_Port at 3c3 PNP Identify complete sc: sc0 already exists; skipping it vga: vga0 already exists; skipping it isa_probe_children: disabling PnP devices isa_probe_children: probing non-PnP devices pmtimer0 on isa0 orm0: at iomem 0xc0000-0xccfff pnpid ORM0000 on isa0 adv0: not probed (disabled) aha0: not probed (disabled) aic0: not probed (disabled) bt0: not probed (disabled) cs0: not probed (disabled) ed0: not probed (disabled) fdc0 failed to probe at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fe0: not probed (disabled) ie0: not probed (disabled) le0: not probed (disabled) ppc0: parallel port found at 0x378 ppc0: using extended I/O port range ppc0: ECP SPP ECP+EPP SPP ppc0: at port 0x378-0x37f irq 7 on isa0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/8 bytes threshold ppbus0: on ppc0 ppbus0: [MPSAFE] ppbus0: [ITHREAD] plip0: on ppbus0 plip0: bpf attached lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 ppc0: [GIANT-LOCKED] ppc0: [ITHREAD] sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sc0: fb0, kbd1, terminal emulator: sc (syscons terminal) sio1 failed to probe at port 0x2f8 irq 3 on isa0 sio2: not probed (disabled) sio3: not probed (disabled) sn0: not probed (disabled) vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 vt0: not probed 
(disabled) isa_probe_children: probing PnP devices Device configuration finished. procfs registered Timecounter "TSC" frequency 600023372 Hz quality 800 Timecounters tick every 1.000 msec lo0: bpf attachedfirewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me) firewire0: bus manager 0 (me) hptrr: no controller detected. acpi_acad0: acline initialization start battery0: battery initialization start battery1: battery initialization start ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire system power profile changed to 'economy' ad0: setting PIO4 on ICH4 chip acpi_acad0: Off Line acpi_acad0: acline initialization done, tried 1 times ad0: setting UDMA100 on ICH4 chip ad0: 38154MB at ata0-master UDMA100 ad0: 78140160 sectors [77520C/16H/63S] 16 sectors/interrupt 1 depth queue GEOM: new disk ad0 battery0: battery initialization done, tried 1 times ad0: Intel check1 failed ad0: Adaptec check1 failed ad0: LSI (v3) check1 failed ad0: LSI (v2) check1 failed ad0: FreeBSD check1 failed ata1-master: pio=PIO4 wdma=WDMA2 udma=UDMA33 cable=40 wire acd0: setting PIO4 on ICH4 chip acd0: setting UDMA33 on ICH4 chip acd0: DVDROM drive at ata1 as master acd0: read 4125KB/s (4125KB/s), 512KB buffer, UDMA33 acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, packet acd0: Writes: acd0: Audio: play, 16 volume levels acd0: Mechanism: ejectable tray, unlocked acd0: Medium: no/blank disc pcm0: measured ac97 link rate at 48008 Hz, will use 48000 Hz acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 (probe0:ata1:0:0:0): error 22 (probe0:ata1:0:0:0): Unretryable Error (probe0:ata1:0:0:0): Down reving Protocol Version from 2 to 0? 
(probe0:ata1:0:0:0): error 6 (probe0:ata1:0:0:0): Unretryable Error acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 (probe0:ata1:0:0:0): error 22 (probe0:ata1:0:0:0): Unretryable Error (probe1:sbp0:0:0:0): error 22 (probe1:sbp0:0:0:0): Unretryable Error (probe2:sbp0:0:1:0): error 22 (probe2:sbp0:0:1:0): Unretryable Error (probe3:sbp0:0:2:0): error 22 (probe3:sbp0:0:2:0): Unretryable Error (probe4:sbp0:0:3:0): error 22 (probe4:sbp0:0:3:0): Unretryable Error (probe5:sbp0:0:4:0): error 22 (probe5:sbp0:0:4:0): Unretryable Error (probe6:sbp0:0:5:0): error 22 (probe6:sbp0:0:5:0): Unretryable Error (probe7:sbp0:0:6:0): error 22 (probe7:sbp0:0:6:0): Unretryable Error pass0 at ata1 bus 0 target 0 lun 0 pass0: Removable CD-ROM SCSI-0 device pass0: 33.000MB/s transfers GEOM: new disk cd0 ATA PseudoRAID load(cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error cd0 at ata1 bus 0 target 0 lun 0 cd0: Removable CD-ROM SCSI-0 device cd0: 33.000MB/s transfers cd0: Attempt to query device size failed: NOT READY, Medium not present ed (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error (cd0:ata1:0:0:0): error 6 (cd0:ata1:0:0:0): Unretryable Error Trying to mount root from ufs:/dev/ad0s1a start_init: trying /sbin/init drm0: on vgapci0 info: [drm] AGP at 0xf0000000 128MB info: [drm] Initialized i915 1.5.0 20060119 drm1: on vgapci1 info: [drm] AGP at 0xf0000000 128MB info: [drm] Initialized i915 1.5.0 20060119 drm0: [MPSAFE] drm0: [ITHREAD] battery1: battery initialization failed, giving up umass0: on uhub3 umass0:3:0:-1: Attached to scbus3 (probe0:umass-sim0:0:0:0): error 22 (probe0:umass-sim0:0:0:0): Unretryable Error pass1 at umass-sim0 bus 0 target 0 lun 0 pass1: < > Removable Direct Access SCSI-2 device pass1: 40.000MB/s transfers GEOM: new disk da0 da0 at umass-sim0 bus 0 target 0 lun 0 da0: < > Removable Direct Access SCSI-2 
device da0: 40.000MB/s transfers da0: 3102MB (6354432 512 byte sectors: 255H 63S/T 395C) GEOM_LABEL: Label for provider da0s1a is ufs/usbdrive. From paul at gtcomm.net Tue Jul 8 20:57:38 2008 From: paul at gtcomm.net (Paul) Date: Tue Jul 8 20:57:44 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> Message-ID: <4873D539.9060107@gtcomm.net> But this is probably with no routing table, and a single source and destination IP, or a very limited number of IPs and ports. The entire problem with Linux is the route cache: try generating random source IPs and random source/destination ports, and it won't even do 100 kpps without problems. I would like to log into the machine and see 1.4 Mpps going through 3 NICs :) Brian McGinty wrote: >> I have a pre-production card. With some bug fixes and some tuning of >> interrupt handling (custom stack - I've been asked to push the changes >> back in to CVS, I just don't have time right now) an otherwise >> unoptimized igb can forward 1.04Mpps from one port to another (1.04 >> Mpps in on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8 >> core system. >> > > I have a 8 core system running stock Linux that easily does line rate > (ie, 1.488 Mpps) on 3 (82575) interfaces. Ie, 3 * 1.48 Mpps! > > Cheers, > Brian.
> > >> -Kip >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> >> > > From kip.macy at gmail.com Tue Jul 8 21:06:19 2008 From: kip.macy at gmail.com (Kip Macy) Date: Tue Jul 8 21:06:26 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> References: <4867420D.7090406@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> Message-ID: On Tue, Jul 8, 2008 at 1:46 PM, Brian McGinty wrote: >> I have a pre-production card. With some bug fixes and some tuning of >> interrupt handling (custom stack - I've been asked to push the changes >> back in to CVS, I just don't have time right now) an otherwise >> unoptimized igb can forward 1.04Mpps from one port to another (1.04 >> Mpps in on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8 >> core system. > > I have a 8 core system running stock Linux that easily does line rate > (ie, 1.488 Mpps) on 3 (82575) interfaces. Ie, 3 * 1.48 Mpps! Hi Brian, I very much doubt that this is ceteris paribus. My test is 384 random IPs -> 384 random IP addresses, with a flow lookup for each packet. Also, I've read through igb on Linux: it has a lot of optimizations that the FreeBSD driver lacks and that I have yet to implement.
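Paul's and Kip's objections both turn on the traffic mix: a per-destination cache pays off when a few address pairs repeat, and does little when nearly every packet brings a new pair. The toy Python simulation below illustrates only that effect. It is not the Linux route cache or any FreeBSD code; the 4,096-entry LRU cache keyed on (source, destination) and the uniformly random addresses are assumptions chosen for illustration.

```python
import random
from collections import OrderedDict


class LRURouteCache:
    """Toy fixed-size cache keyed on (src, dst); evicts the least recently used entry.

    Hypothetical model for illustration, not any real kernel's route cache.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, src, dst):
        key = (src, dst)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        # Miss: stands in for the slow full routing-table lookup.
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(n_packets, n_src, n_dst, capacity, seed=1):
    """Send n_packets with (src, dst) drawn uniformly from n_src x n_dst addresses."""
    rng = random.Random(seed)
    cache = LRURouteCache(capacity)
    for _ in range(n_packets):
        cache.lookup(rng.randrange(n_src), rng.randrange(n_dst))
    return cache.hit_rate()


if __name__ == "__main__":
    # One source/destination pair: every packet after the first hits the cache.
    print("1 x 1 addresses:    ", simulate(100_000, 1, 1, capacity=4096))
    # 384 x 384 = 147,456 pairs against 4,096 entries: hit rate stays under 3%.
    print("384 x 384 addresses:", simulate(100_000, 384, 384, capacity=4096))
```

With one address pair only the first of 100,000 packets misses. With 384 x 384 = 147,456 pairs, the steady-state hit rate is at most 4,096 / 147,456, about 2.8%, so nearly every packet pays for a full lookup.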
Thanks, Kip From brian.mcginty at gmail.com Tue Jul 8 21:13:49 2008 From: brian.mcginty at gmail.com (Brian McGinty) Date: Tue Jul 8 21:14:01 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <4869B025.9080006@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> Message-ID: <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> > I have a pre-production card. With some bug fixes and some tuning of > interrupt handling (custom stack - I've been asked to push the changes > back in to CVS, I just don't have time right now) an otherwise > unoptimized igb can forward 1.04Mpps from one port to another (1.04 > Mpps in on igb0 and 1.04 Mpps out on igb1) using 3.5 cores on an 8 > core system. I have an 8-core system running stock Linux that easily does line rate (i.e., 1.488 Mpps) on 3 (82575) interfaces, i.e., 3 * 1.48 Mpps! Cheers, Brian.
> > -Kip > From brde at optusnet.com.au Wed Jul 9 05:30:24 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Wed Jul 9 05:30:32 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707221257.GH62764@server.vk2pj.dyndns.org> References: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> <20080707221257.GH62764@server.vk2pj.dyndns.org> Message-ID: <20080709142008.H26105@delplex.bde.org> On Tue, 8 Jul 2008, Peter Jeremy wrote: > On 2008-Jul-07 13:25:13 -0700, Julian Elischer wrote: >> what you need is a speculative prefetch where you an tell teh >> processor "We will probably need the following address so start >> getting it while we go do other stuff". > > This looks like the PREFETCH instructions that exist in at least amd64 > and SPARC. Unfortunately, their optimal use is very implementation-dependent > and the AMD documentation suggests that incorrect use can > degrade performance.
I use the following hacks to test these in my version of bge in ~5.2:

% Index: dev/bge/if_bge.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
% retrieving revision 1.84
% diff -u -2 -r1.84 if_bge.c
% --- dev/bge/if_bge.c	12 Mar 2005 06:51:25 -0000	1.84
% +++ dev/bge/if_bge.c	8 Jul 2008 04:49:12 -0000
% @@ -2690,4 +2845,11 @@
%   */
%  
% +int bge_prefetch = 1;
% +int bge_nprefetchnta = 0;
% +int bge_nprefetch = 0x40;
% +int bge_nprefetchw = 0;
% +int bge_nprefetch0 = 0;
% +int bge_nprefetch1 = 0;
% +int bge_nprefetch2 = 0;
%  static void
%  bge_rxeof(sc)
% @@ -2789,4 +2960,35 @@
%  #endif
%  	eh = mtod(m, struct ether_header *);
% +	if (bge_prefetch) {
% +		struct cl {
% +			char	cl_data[64];	/* XXX */
% +		} *clp;
% +		int i, j;
% +
% +		/* XXX misalignment is likely. */
% +		clp = mtod(m, struct cl *);
% +#ifdef __i386__	/* XXX actually 3dnow */
% +		for (i = 0, j = 0; i < bge_nprefetchnta;
% +		    i += sizeof(*clp), j++)
% +			__asm("prefetchnta %0" : : "m" (clp[j]));
% +		for (i = 0, j = 0; i < bge_nprefetch;
% +		    i += sizeof(*clp), j++)
% +			__asm("prefetch %0" : : "m" (clp[j]));
% +		for (i = 0, j = 0; i < bge_nprefetchw;
% +		    i += sizeof(*clp), j++)
% +			__asm("prefetchw %0" : : "m" (clp[j]));
% +#endif
% +#ifdef __amd64__
% +		for (i = 0, j = 0; i < bge_nprefetch0;
% +		    i += sizeof(*clp), j++)
% +			__asm("prefetcht0 %0" : : "m" (clp[j]));
% +		for (i = 0, j = 0; i < bge_nprefetch1;
% +		    i += sizeof(*clp), j++)
% +			__asm("prefetcht1 %0" : : "m" (clp[j]));
% +		for (i = 0, j = 0; i < bge_nprefetch2;
% +		    i += sizeof(*clp), j++)
% +			__asm("prefetcht2 %0" : : "m" (clp[j]));
% +#endif
% +	}
%  	m->m_pkthdr.len = m->m_len = cur_rx->bge_len - ETHER_CRC_LEN;
%  	m->m_pkthdr.rcvif = ifp;
% Index: net/if_ethersubr.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/net/if_ethersubr.c,v
% retrieving revision 1.174
% diff -u -2 -r1.174 if_ethersubr.c
% --- net/if_ethersubr.c	24 Jun 2004 12:31:44 -0000	1.174
% +++ net/if_ethersubr.c	7 Jul 2008 18:31:13 -0000
% @@ -479,4 +479,5 @@
%   * mbuf chain m with the ethernet header at the front.
%   */
% +int monearly = 0;
%  static void
%  ether_input(struct ifnet *ifp, struct mbuf *m)
% @@ -485,4 +486,12 @@
%  	u_short etype;
%  
% +	if (monearly && ifp->if_flags & IFF_MONITOR) {
% +		/*
% +		 * Interface marked for monitoring; discard packet.
% +		 */
% +		m_freem(m);
% +		return;
% +	}
% +
%  	/*
%  	 * Do consistency checks to verify assumptions

The results were underwhelming and contrary to Andre's assertion that the primary bottleneck (apart from PCI32) is hardware-related cache misses (I think it is software-related cache misses). I previously reported that fixing monitor mode avoids 1 cache miss and thus saves 5% CPU. Plain prefetch forces this cache miss (but no other hardware-related ones, since there are no other hardware-related ones in upper layers) to occur asynchronously and always occur. However, it only saves 2% in unfixed normal mode and in unfixed monitor mode (in fixed monitor mode, it makes little difference except to not avoid the cache miss -- since the cache miss is asynchronous it doesn't affect %CPU much). Even 5% is a relatively uninteresting saving, since the non-hardware-related CPU overhead is 10 times as much as that. I'm testing only reception of UDP packets with a payload of 5 bytes (padded), so the whole packet fits in 64 bytes and there is only 1 hardware-related cache miss per packet to avoid or prefetch. The precise size is 60 (64 - CRC_size, I think). m->m_data is always misaligned at an offset of 2 bytes from a 64-byte cache line boundary, so prefetching 64 bytes at this address is not quite right, but since the 60 bytes all fit in 1 cache line, the prefetch fetches enough. prefetchnta, as in Andre's old patch (16 Dec 2004), didn't seem to work.
I also prefetch as soon as possible in the driver interrupt handler, whereas Andre's old patch prefetches in ether_input(), where it is almost certainly too late. The difference between the 5% and the 2% savings may be due to it also being too late in the driver interrupt handler. Someone mentioned not caring about latency. Doing something else while waiting for all the prefetches made by the interrupt handler to complete might help here, but only if you could find something useful to do (hard), and I think latency would just increase the slowness in most cases, since significant latency would require long queues and the long queues would bust caches (starting with discarding all the prefetches). Andre's old patch uses a hard-coded prefetch size of 74 (76 after source alignment and 128 after rounding up), where mine uses a parameter of 64 (66 after virtual source alignment and 64 after rounding down). Rounding up to 128 would cause an unnecessary extra cache miss for small packets. It too only tries to prefetch the packet header, but allows for TCP and TCP options, so a small packet's headers alone are larger than 64 bytes. The extra cache miss for never-accessed data shouldn't cost much since it uses prefetchnta. (All of my tests are on an Athlon64, where prefetchnta actually works, unlike on AthlonXP. But actually working might be responsible for it not being very effective here. To work, it must not be too aggressive or it will cost too much for never-accessed data.)

Timings (some repeated), all for ttcp receiving on bge0 at 397 kpps:

mode       %idle  kernel        cm/p  %idle over repeats
-monitor    35%   8.0-CURRENT    14
monitor     83%   8.0-CURRENT     6
+monitor    85%   8.0-CURRENT     5
-monitor    17%   ~5.2           19   17-19
monitor     66%   ~5.2            8   66-68
+monitor    71%   ~5.2            7   70-75

cm/p = k8-dc-misses (bge0 system)

+monitor is monitor mode with the exit moved to the top of ether_input(). The patch for ~5.2 is now included. Results with prefetch are not actually shown above, since I forgot half of the details.
cm/p was unchanged except for +monitor, where it increased (due to the unused prefetch). %idle decreased by 1-2% (less in -current, where there is less slop) except for +monitor. Note that -current has many improvements over ~5.2 in both %CPU and cache misses for receiving. But for sending, -current gives a 10% lower rate for the same CPU (100%), though it reduces cache misses. Simplified or improved patches for -current:

% diff -c2 ./dev/bge/if_bge.c~ ./dev/bge/if_bge.c
% *** ./dev/bge/if_bge.c~	Fri May 16 16:39:01 2008
% --- ./dev/bge/if_bge.c	Tue Jul 8 07:58:52 2008
% ***************
% *** 3017,3020 ****
% --- 3133,3137 ----
%     */
%   
% + int bge_prefetch = 1;
%   static void
%   bge_rxeof(struct bge_softc *sc)
% ***************
% *** 3126,3129 ****
% --- 3252,3257 ----
%   	m->m_pkthdr.len = m->m_len = cur_rx->bge_len - ETHER_CRC_LEN;
%   	m->m_pkthdr.rcvif = ifp;
% + 	if (bge_prefetch)
% + 		__asm("prefetch %0" : : "m" (*mtod(m, char *)));
%   
%   	if (ifp->if_capenable & IFCAP_RXCSUM) {
% diff -c2 ./net/if_ethersubr.c~ ./net/if_ethersubr.c
% *** ./net/if_ethersubr.c~	Fri May 16 16:41:45 2008
% --- ./net/if_ethersubr.c	Tue Jul 8 07:55:14 2008
% ***************
% *** 509,512 ****
% --- 507,511 ----
%     * mbuf chain m with the ethernet header at the front.
%     */
% + int broken_monitor = 0;
%   static void
%   ether_input(struct ifnet *ifp, struct mbuf *m)
% ***************
% *** 546,550 ****
%   	}
%   	eh = mtod(m, struct ether_header *);
% - 	etype = ntohs(eh->ether_type);
%   	if (m->m_pkthdr.rcvif == NULL) {
%   		if_printf(ifp, "discard frame w/o interface pointer\n");
% --- 545,548 ----
% ***************
% *** 560,564 ****
%   #endif
%   
% ! 	if (ETHER_IS_MULTICAST(eh->ether_dhost)) {
%   		if (ETHER_IS_BROADCAST(eh->ether_dhost))
%   			m->m_flags |= M_BCAST;
% --- 558,564 ----
%   #endif
%   
% ! 	if (((ifp->if_flags & IFF_MONITOR) == 0 || broken_monitor) &&
% ! 	    ETHER_IS_MULTICAST(eh->ether_dhost)) {
% ! 		/* XXX bpf might need this even in monitor mode. */
%   		if (ETHER_IS_BROADCAST(eh->ether_dhost))
%   			m->m_flags |= M_BCAST;
% ***************
% *** 616,619 ****
% --- 616,620 ----
%   	 * TODO: Deal with Q-in-Q frames, but not arbitrary nesting levels.
%   	 */
% + 	etype = ntohs(eh->ether_type);
%   	if ((m->m_flags & M_VLANTAG) == 0 && etype == ETHERTYPE_VLAN) {
%   		struct ether_vlan_header *evl;

Bruce
From brde at optusnet.com.au Wed Jul 9 08:50:30 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Wed Jul 9 08:50:37 2008 Subject: svn commit: r180256 - head/sys/dev/arl In-Reply-To: <200807041748.m64HmZur018637@svn.freebsd.org> References: <200807041748.m64HmZur018637@svn.freebsd.org> Message-ID: <20080705161831.F13262@delplex.bde.org> On Fri, 4 Jul 2008, John Baldwin wrote:
> Author: jhb
> Date: Fri Jul 4 17:48:34 2008
> New Revision: 180256
> URL: http://svn.freebsd.org/changeset/base/180256
>
> Log:
>   Make arl(4) MPSAFE:
> ...
> -	ifp->if_snd.ifq_maxlen = IFQ_MAXLEN;
> +	IFQ_SET_MAXLEN(&ifp->if_snd, IFQ_MAXLEN);

Why do we obfuscate setting of ifq_maxlen using a macro, especially when the setting is to a wrong default value? The macro was introduced with the ALTQ changes, but seems never to have done anything different for ALTQ. ALTQ also introduced an ifq_drv_maxlen field, but the macro provides no help for managing this. Drivers that support ALTQ end up with 2 settings of ifq_*maxlen: a direct one for ifq_drv_maxlen and an obfuscated one for ifq_maxlen. arl apparently doesn't support ALTQ, and you didn't fix this -- it still doesn't set ifq_drv_maxlen. if_attach() uses the correct default value of ifqmaxlen if the driver leaves ifp->if_snd.ifq_maxlen set to 0, but prints a bogus warning about this. Non-driver code under net/ still mostly doesn't use the obfuscation, but uses IFQ_MAXLEN and ifqmaxlen almost perfectly randomly, about 50% each.
Since ifqmaxlen isn't a tuneable or sysctl, and is statically initialized to IFQ_MAXLEN, not using it only makes a difference if someone initializes it differently using a debugger, so these bugs are normally just spelling errors. IFQ_MAXLEN is also too small for 1Gbps or even 100Mbps hardware devices, so only drivers for old hardware and some software drivers can use it anyway. Bruce From rwatson at FreeBSD.org Wed Jul 9 12:14:20 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Wed Jul 9 12:14:26 2008 Subject: svn commit: r180256 - head/sys/dev/arl In-Reply-To: <20080705161831.F13262@delplex.bde.org> References: <200807041748.m64HmZur018637@svn.freebsd.org> <20080705161831.F13262@delplex.bde.org> Message-ID: <20080709131101.S8639@fledge.watson.org> On Sat, 5 Jul 2008, Bruce Evans wrote: > On Fri, 4 Jul 2008, John Baldwin wrote: Since ifqmaxlen isn't a tuneable or > sysctl, and is statically initialized to IFQ_MAXLEN, not using only makes a > difference if someone iniitalizes it diffently using a debugger, so these > bugs are normally just spelling errors. IFQ_MAXLEN is also too small for > 1Gbps or even 100Nbps hardware devices, so only drivers for old hardware and > some software drivers can use it anyway. I was actually thinking about this this morning -- Paul Saab pointed out to me that, on Linux, you can run-time tune the transmit queue limit using ifconfig(8). I think doing something similar would, if nothing else, make it easier to understand the impact of our current queue settings in testing. And, just to put it on the table in e-mail, since I know it has come up a lot at developer summits: the ALTQ infrastructure is decreasingly compatible with current network devices, which often have quite large queues (descriptor rings) in hardware, or where there are multiple transmit queues. One possibility I've been considering is making the whole ifq subsystem a library to device drivers, rather than a required interface to transmit.
This would allow the device driver to instantiate more than one if there are multiple hardware queues that need to be represented, or, for example, allow synthetic encapsulation interfaces (such as vlan) to avoid queueing entirely and directly dispatch to the lower layer interface without requiring a mandatory enqueue/dequeue step. I've started hacking on this every now and then, but it requires a lot of code to be touched -- it's something we do need to address before 8.0, however. Robert N M Watson Computer Laboratory University of Cambridge From sclark46 at earthlink.net Wed Jul 9 14:33:52 2008 From: sclark46 at earthlink.net (Stephen Clark) Date: Wed Jul 9 14:33:58 2008 Subject: 6.3-p2 gre Message-ID: <4874C71D.5000204@earthlink.net> Hello List, I am running ospf over a gre/vpn tunnel. When I run tcpdump on the gre interface, ospf stops working. I see the following errors in the ospfd log:
2008/07/09 10:05:02 OSPF: *** sendmsg in ospf_write failed to 224.0.0.5, id 0, off 0, len 68, interface gre1, mtu 1412: Network is down
2008/07/09 10:05:12 OSPF: *** sendmsg in ospf_write failed to 224.0.0.5, id 0, off 0, len 68, interface gre1, mtu 1412: Network is down
If I then do "ifconfig gre1 down; ifconfig gre1 up", ospf starts working again. This does not happen on FreeBSD 4.x. Also, if I use the -p flag when running tcpdump, things seem to be OK and ospf continues to work normally. Why would putting the gre interface in promiscuous mode cause this problem? Steve -- "They that give up essential liberty to obtain temporary safety, deserve neither liberty nor safety." (Ben Franklin) "The course of history shows that as a government grows, liberty decreases."
(Thomas Jefferson) From brde at optusnet.com.au Wed Jul 9 15:13:10 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Wed Jul 9 15:13:17 2008 Subject: svn commit: r180256 - head/sys/dev/arl In-Reply-To: <20080709131101.S8639@fledge.watson.org> References: <200807041748.m64HmZur018637@svn.freebsd.org> <20080705161831.F13262@delplex.bde.org> <20080709131101.S8639@fledge.watson.org> Message-ID: <20080710002759.Q27395@delplex.bde.org> On Wed, 9 Jul 2008, Robert Watson wrote: > On Sat, 5 Jul 2008, Bruce Evans wrote: > >> On Fri, 4 Jul 2008, John Baldwin wrote: Since ifqmaxlen isn't a tuneable or >> sysctl, and is statically initialized to IFQ_MAXLEN, not using only makes a >> difference if someone iniitalizes it diffently using a debugger, so these >> bugs are normally just spelling errors. IFQ_MAXLEN is also too small for >> 1Gbps or even 100Nbps hardware devices, so only drivers for old hardware >> and some software drivers can use it anyway. > > I was actually thinking about this this morning -- Paul Saab pointed out to > me that, on Linux, you can run-time tune the transmit queue limit using > ifconfig(8). I think doing something similar would, if nothing else, make it > easier to understand the impact of our current queue settings in testing. Yes, the control should really be per-device. However, I don't like the bloat of dynamic everything in every driver. However2, I use a hack (a per-driver global, possibly set by ddb at boot time) to optionally enlarge the tx queue for all drivers that I touch. It was while editing this, and having to change it for ALTQ and its unnecessary macro, that I noticed the bogusness of IFQ_MAXLEN and the ALTQ macro. > And, just to put it on the table in e-mail, since I know it has come up a lot > at developer summits: the ALTQ infrastructure is decreasingly compatible with > current network devices, which often have quite large queues (descriptor > rings) in hardware, or where there are multiple transmit queues. One
One Hardware queues are never large :-). 512 is common, but enlargement gives ~20000. 20000 is too large for most purposes but rarely matters. I don't use ALTQ, and just notice that very rarely, latency can be enormous if the queue length builds up to he maximum. > possibility I've been considering is making the whole ifq subsystem a library > to device drivers, rather than a required interface to transmit. This would > allow the device driver to instantiate more than one if there are multiple > hardware queues that need to be represented, or, for example, allow synthetic > encapsulation interfaces (such as vlan) to avoid queueing entirely and > directly dispatch to the lower layer interface without requiring a mandatory > enqueue/dequeue step. I've started hacking on this every now and then, but > it requires a lot of code to be touched -- it's something we do need to > address before 8.0, however. Could this be more efficient? I think direct dispatch wouldn't work well. It didn't help as much as hoped for rx, and tx is predictable so perfect scheduling of it is possible (only dispatch in bulk in order to be more efficient). Also, the current implementation gives necessary watermark stuff almost automatically -- the queue split gives a virtual low watermark at the split point, and this reduces the chance of the combined queue running dry. Bruce From zaphod at fsklaw.com Wed Jul 9 15:22:34 2008 From: zaphod at fsklaw.com (zaphod@fsklaw.com) Date: Wed Jul 9 15:22:40 2008 Subject: Tunneling issues In-Reply-To: <200807040155.m641tl8s000607@lava.sentex.ca> References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> Message-ID: <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> > At 03:15 PM 7/3/2008, zaphod@fsklaw.com wrote: >>I have a real poser, and I ccan't solve it. >> >>Currently I have a ipsec vpn tunneling 14 servers through a central >> server. 
>> >>I would like to restructure this so that each server talks to each other >>directly, rather than passing everything through a single server. >> >>However, on every other machine I cannot get a second tunnel to come up. >>Not a gre or gif tunnel. And yet I have 14 on the central machine. > > You would need a lot of policies on each of the boxes (14) but there > is no reason it should not work. Do each of the sites have a unique > subnet ? Do they have static IP addresses ? > > > An easier solution might be to use something like OpenVPN which > allows all the boxes to auth and route through a single server, but > they can also talk to each other with a single config option. > > ---Mike Mike, thanks for the response. I agree it should work, but it doesn't. As for your next two questions: yes and yes. I'm not a huge fan of OpenVPN, but the bigger issue is that the gif tunnels come up at boot, as do the routes. Given the client/server nature of OpenVPN, I'm not sure it is suitable: if a server reboots, I'm not certain a client would auto-reconnect. But I have done no testing, and if I can't resolve this I may have to. Cheers, Zaphod > > > From mike at sentex.net Wed Jul 9 15:45:41 2008 From: mike at sentex.net (Mike Tancsa) Date: Wed Jul 9 15:45:47 2008 Subject: Tunneling issues In-Reply-To: <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> Message-ID: <200807091545.m69FjcP4031350@lava.sentex.ca> At 11:21 AM 7/9/2008, zaphod@fsklaw.com wrote: >I agree it should work. But it's not. With respect to the next two >questions, yes and yes. Can you post some of the configs you are using for 3 of the sites so we can perhaps spot the problem(s) you are having? I have a similar setup with 5 sites, all talking to each other via IPSEC tunnels. It's a lot of policies, but they work just fine.
>I'm not a huge fan of OpenVPN, but the bigger issue is that the gif >tunnels come up at boot up. As well as routes. Given the client server >nature of OpenVPN it is suitable, because if a server reboots, I'm not >certain a client would auto re-connect. We have ~400 sites running OpenVPN across Canada that all reconnect just fine after reboots / power cycles etc. We don't let the clients talk to each other, but that would just be a config change to allow that to work. ---Mike From sclark46 at earthlink.net Wed Jul 9 16:50:50 2008 From: sclark46 at earthlink.net (Stephen Clark) Date: Wed Jul 9 16:50:57 2008 Subject: Tunneling issues In-Reply-To: <200807091545.m69FjcP4031350@lava.sentex.ca> References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> <200807091545.m69FjcP4031350@lava.sentex.ca> Message-ID: <4874EC67.6020104@earthlink.net> Mike Tancsa wrote: > At 11:21 AM 7/9/2008, zaphod@fsklaw.com wrote: > >> I agree it should work. But it's not. With respect to the next two >> questions, yes and yes. > > Can you post some of the configs you are using for 3 of the sites so we > can perhaps spot the problem(s) you are having ? I have a similar setup > with 5 sites, all talking to each other via IPSEC tunnels. Its a lot of > policies, but they work just fine. > > > > >> I'm not a huge fan of OpenVPN, but the bigger issue is that the gif >> tunnels come up at boot up. As well as routes. Given the client server >> nature of OpenVPN it is suitable, because if a server reboots, I'm not >> certain a client would auto re-connect. > > We have ~ 400 sites running OpenVPN across Canada that all reconnect > just fine after reboots / power cycles etc. We dont let the clients > talk to each other, but that would just be a config change to allow that > to work.
> > ---Mike > Hi, I do this too, with multiple gre/vpn tunnels carrying ospf, using FreeBSD 4.x and 6.1. Steve -- "They that give up essential liberty to obtain temporary safety, deserve neither liberty nor safety." (Ben Franklin) "The course of history shows that as a government grows, liberty decreases." (Thomas Jefferson) From rwatson at FreeBSD.org Wed Jul 9 17:17:33 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Wed Jul 9 17:17:39 2008 Subject: svn commit: r180256 - head/sys/dev/arl In-Reply-To: <20080710002759.Q27395@delplex.bde.org> References: <200807041748.m64HmZur018637@svn.freebsd.org> <20080705161831.F13262@delplex.bde.org> <20080709131101.S8639@fledge.watson.org> <20080710002759.Q27395@delplex.bde.org> Message-ID: <20080709181531.Q8639@fledge.watson.org> On Thu, 10 Jul 2008, Bruce Evans wrote: >> possibility I've been considering is making the whole ifq subsystem a >> library to device drivers, rather than a required interface to transmit. >> This would allow the device driver to instantiate more than one if there >> are multiple hardware queues that need to be represented, or, for example, >> allow synthetic encapsulation interfaces (such as vlan) to avoid queueing >> entirely and directly dispatch to the lower layer interface without >> requiring a mandatory enqueue/dequeue step. I've started hacking on this >> every now and then, but it requires a lot of code to be touched -- it's >> something we do need to address before 8.0, however. > > Could this be more efficient? > > I think direct dispatch wouldn't work well. It didn't help as much as hoped > for rx, and tx is predictable so perfect scheduling of it is possible (only > dispatch in bulk in order to be more efficient).
Also, the current > implementation gives necessary watermark stuff almost automatically -- the > queue split gives a virtual low watermark at the split point, and this > reduces the chance of the combined queue running dry. In most cases, what I have in mind would simply be a rearrangement rather than a functional change. However, for vlans, I think it would significantly lower overhead without really modifying queueing behavior: notice that we enqueue it at the VLAN layer just to dequeue it a few instructions later so that we can enqueue it a layer lower. Robert N M Watson Computer Laboratory University of Cambridge From zaphod at fsklaw.com Wed Jul 9 17:31:38 2008 From: zaphod at fsklaw.com (zaphod@fsklaw.com) Date: Wed Jul 9 17:31:44 2008 Subject: Tunneling issues In-Reply-To: <200807091545.m69FjcP4031350@lava.sentex.ca> References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> <200807091545.m69FjcP4031350@lava.sentex.ca> Message-ID: > At 11:21 AM 7/9/2008, zaphod@fsklaw.com wrote: > >>I agree it should work. But it's not. With respect to the next two >>questions, yes and yes. > > Can you post some of the configs you are using for 3 of the sites so > we can perhaps spot the problem(s) you are having ? I have a similar > setup with 5 sites, all talking to each other via IPSEC tunnels. Its > a lot of policies, but they work just fine. > > > > >>I'm not a huge fan of OpenVPN, but the bigger issue is that the gif >>tunnels come up at boot up. As well as routes. Given the client server >>nature of OpenVPN it is suitable, because if a server reboots, I'm not >>certain a client would auto re-connect. > > We have ~ 400 sites running OpenVPN across Canada that all reconnect > just fine after reboots / power cycles etc. We dont let the clients > talk to each other, but that would just be a config change to allow > that to work. > > ---Mike > Last first. 
Well that's good info on OpenVPN. As to the first, I'm not even at the ipsec stage yet. I'm just trying to get tunnels up. I wrote a couple of shell scripts to bring them up for testing.

Server1

orange# more mkgif
#/bin/sh
ifconfig gif1 create
ifconfig gif1 1.1.1.1 2.2.2.2
ifconfig gif1 inet 192.168.72.1 192.168.70.1 netmask 255.255.255.0
ifconfig gif1 tunnel 1.1.1.1 2.2.2.2
ifconfig gif1 mtu 1500
route change 192.168.70.0 192.168.70.1 255.255.255.0
route change 192.168.71.0 192.168.70.1 255.255.255.0

Server2

to# more mkgif
#/bin/sh
ifconfig gif1 create
ifconfig gif1 2.2.2.2 1.1.1.1
ifconfig gif1 inet 192.168.70.1 192.168.72.1 netmask 255.255.255.0
ifconfig gif1 tunnel 2.2.2.2 1.1.1.1
ifconfig gif1 mtu 1500
route change 192.168.72.0 192.168.72.1 255.255.255.0

It seems a pretty straightforward tunnel, but nothing heads out. I can't ping a thing. I even tried a gre tunnel; when I did that I got a ping error. Unfortunately I can't find my note on the exact error. Cheers, Zaphod > > From julian at elischer.org Wed Jul 9 17:49:21 2008 From: julian at elischer.org (Julian Elischer) Date: Wed Jul 9 17:49:28 2008 Subject: Tunneling issues In-Reply-To: References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> <200807091545.m69FjcP4031350@lava.sentex.ca> Message-ID: <4874FA1F.40209@elischer.org> zaphod@fsklaw.com wrote: >> At 11:21 AM 7/9/2008, zaphod@fsklaw.com wrote: >> >>> I agree it should work. But it's not. With respect to the next two >>> questions, yes and yes. >> Can you post some of the configs you are using for 3 of the sites so >> we can perhaps spot the problem(s) you are having ? I have a similar >> setup with 5 sites, all talking to each other via IPSEC tunnels. Its >> a lot of policies, but they work just fine. >> >> >> >> >>> I'm not a huge fan of OpenVPN, but the bigger issue is that the gif >>> tunnels come up at boot up. As well as routes.
Given the client server >>> nature of OpenVPN it is suitable, because if a server reboots, I'm not >>> certain a client would auto re-connect. >> We have ~ 400 sites running OpenVPN across Canada that all reconnect >> just fine after reboots / power cycles etc. We dont let the clients >> talk to each other, but that would just be a config change to allow >> that to work. >> >> ---Mike >> > Last first. Well that's good info on OpenVPN. > > As to the first, I'm not even at the ipsec stage yet. I'm just trying to > get tunnels up. I wrote a couple of shell scripts to bring them up for > testing.
>
> Server1
>
> orange# more mkgif
> #/bin/sh
> ifconfig gif1 create
> ifconfig gif1 1.1.1.1 2.2.2.2

^^^^ what's that for?
since you over-ride it in the next line vvvvv

> ifconfig gif1 inet 192.168.72.1 192.168.70.1 netmask 255.255.255.0

(PTP links don't have netmasks)

> ifconfig gif1 tunnel 1.1.1.1 2.2.2.2
> ifconfig gif1 mtu 1500
> route change 192.168.70.0 192.168.70.1 255.255.255.0
> route change 192.168.71.0 192.168.70.1 255.255.255.0
>
> Server2
> to# more mkgif
> #/bin/sh
> ifconfig gif1 create
> ifconfig gif1 2.2.2.2 1.1.1.1
> ifconfig gif1 inet 192.168.70.1 192.168.72.1 netmask 255.255.255.0
> ifconfig gif1 tunnel 2.2.2.2 1.1.1.1
> ifconfig gif1 mtu 1500
> route change 192.168.72.0 192.168.72.1 255.255.255.0
>
> Seems pretty straight forward a tunnel. But nothing heads out. Can't ping > a thing. > > I even tried a gre, when I did that I got a ping error. Unfortunately I > can't find my note on the exact error.
> > Cheers, > > Zaphod >> > > > From mike at sentex.net Wed Jul 9 18:04:35 2008 From: mike at sentex.net (Mike Tancsa) Date: Wed Jul 9 18:04:42 2008 Subject: Tunneling issues In-Reply-To: References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> <200807091545.m69FjcP4031350@lava.sentex.ca> Message-ID: <200807091804.m69I4VOh031916@lava.sentex.ca> At 01:30 PM 7/9/2008, zaphod@fsklaw.com wrote: >Seems pretty straight forward a tunnel. But nothing heads out. Can't ping >a thing. I think your tunnel endpoints are overlapping your remote subnets. The GIF tunnel IP addresses are not supposed to be on the same internal LAN. If server1's public IP is 1.1.1.1, server2's is 2.2.2.2, server1's internal network is 192.168.1.0/24 and server2's inside network is 192.168.2.0/24, this should work:

#!/bin/sh
#server1 to connect to server2
MEOUTSIDE=1.1.1.1
MEINSIDE=10.10.69.1
REMOTEOUTSIDE=2.2.2.2
REMOTEINSIDE=10.10.69.2
REMOTENET=192.168.2.0/24
/sbin/ifconfig gif1 create tunnel $MEOUTSIDE $REMOTEOUTSIDE
/sbin/ifconfig gif1 $MEINSIDE netmask 255.255.255.252 $REMOTEINSIDE
/sbin/route delete $REMOTENET
/sbin/route add $REMOTENET $REMOTEINSIDE

#!/bin/sh
#server2 script to connect to server1
MEOUTSIDE=2.2.2.2
MEINSIDE=10.10.69.2
REMOTEOUTSIDE=1.1.1.1
REMOTEINSIDE=10.10.69.1
REMOTENET=192.168.1.0/24
/sbin/ifconfig gif1 create tunnel $MEOUTSIDE $REMOTEOUTSIDE
/sbin/ifconfig gif1 $MEINSIDE netmask 255.255.255.252 $REMOTEINSIDE
/sbin/route delete $REMOTENET
/sbin/route add $REMOTENET $REMOTEINSIDE

Also, don't confuse using GIF and IPSEC. To create IPSEC tunnels, you don't need gif or gre interfaces. The policies will do that for you.
---Mike

>Server1
>
>orange# more mkgif
>#/bin/sh
>ifconfig gif1 create
>ifconfig gif1 1.1.1.1 2.2.2.2
>ifconfig gif1 inet 192.168.72.1 192.168.70.1 netmask 255.255.255.0
>ifconfig gif1 tunnel 1.1.1.1 2.2.2.2
>ifconfig gif1 mtu 1500
>route change 192.168.70.0 192.168.70.1 255.255.255.0
>route change 192.168.71.0 192.168.70.1 255.255.255.0
>
>Server2
>to# more mkgif
>#/bin/sh
>ifconfig gif1 create
>ifconfig gif1 2.2.2.2 1.1.1.1
>ifconfig gif1 inet 192.168.70.1 192.168.72.1 netmask 255.255.255.0
>ifconfig gif1 tunnel 2.2.2.2 1.1.1.1
>ifconfig gif1 mtu 1500
>route change 192.168.72.0 192.168.72.1 255.255.255.0

From zaphod at fsklaw.com Wed Jul 9 18:04:57 2008
From: zaphod at fsklaw.com (zaphod@fsklaw.com)
Date: Wed Jul 9 18:05:03 2008
Subject: Tunneling issues
In-Reply-To: <4874FA1F.40209@elischer.org>
References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> <200807091545.m69FjcP4031350@lava.sentex.ca> <4874FA1F.40209@elischer.org>
Message-ID: <3d2c56c963f5fc5f6732548548068f69.squirrel@cor>

> zaphod@fsklaw.com wrote:
>>> At 11:21 AM 7/9/2008, zaphod@fsklaw.com wrote:
>>>
>>>> I agree it should work. But it's not. With respect to the next two
>>>> questions, yes and yes.
>>> Can you post some of the configs you are using for 3 of the sites so
>>> we can perhaps spot the problem(s) you are having ? I have a similar
>>> setup with 5 sites, all talking to each other via IPSEC tunnels. Its
>>> a lot of policies, but they work just fine.
>>>
>>>
>>>
>>>
>>>> I'm not a huge fan of OpenVPN, but the bigger issue is that the gif
>>>> tunnels come up at boot up. As well as routes. Given the client
>>>> server
>>>> nature of OpenVPN it is suitable, because if a server reboots, I'm not
>>>> certain a client would auto re-connect.
>>> We have ~ 400 sites running OpenVPN across Canada that all reconnect
>>> just fine after reboots / power cycles etc.
We dont let the clients
>>> talk to each other, but that would just be a config change to allow
>>> that to work.
>>>
>>> ---Mike
>>>
>> Last first. Well that's good info on OpenVPN.
>>
>> As to the first, I'm not even at the ipsec stage yet. I'm just trying
>> to
>> get tunnels up. I wrote a couple of shell scripts to bring them up for
>> testing.
>>
>> Server1
>>
>> orange# more mkgif
>> #/bin/sh
>> ifconfig gif1 create
>> ifconfig gif1 1.1.1.1 2.2.2.2
>
> ^^^^ what's that for?

Well, I added that because someone had said to do it while I was googling the problem, so I tried it. It wasn't there initially. It doesn't work with or without it.

> since you over-ride it in the next line vvvvv
>
>
>> ifconfig gif1 inet 192.168.72.1 192.168.70.1 netmask 255.255.255.0
>
> (PTP links don't have netmasks)
>

snip:

Got it from the manual

# ifconfig gif0 create
# ifconfig gif0 tunnel A.B.C.D W.X.Y.Z
# ifconfig gif0 inet 192.168.1.1 192.168.2.1 netmask 0xffffffff

I'll try it without.

Cheers,

Zaphod

From mike at sentex.net Wed Jul 9 18:26:46 2008
From: mike at sentex.net (Mike Tancsa)
Date: Wed Jul 9 18:26:53 2008
Subject: Tunneling issues
In-Reply-To: <7.1.0.9.0.20080709133535.2396cea8@sentex.net>
References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> <200807091545.m69FjcP4031350@lava.sentex.ca> <7.1.0.9.0.20080709133535.2396cea8@sentex.net>
Message-ID: <200807091826.m69IQiKR032020@lava.sentex.ca>

At 02:04 PM 7/9/2008, Mike Tancsa wrote:
>Also, dont confuse using GIF and IPSEC. To create some IPSEC
>tunnels, you dont need gif or gre interfaces. The policies will do
>that for you.

Here is a simple example that just uses IPSEC tunnels with a static key. You don't need any gif/gre stuff. Don't use this in production, use IPSEC-TOOLS from the ports to do dynamic keying.
To test the tunnel, assuming the inside interface of the freebsd boxes are .1

ping -S 192.168.1.1 192.168.1.2

#/bin/sh server1
MEOUTSIDE=1.1.1.1
MEINSIDE=192.168.1.0/24
REMOTEOUTSIDE=2.2.2.2
REMOTEINSIDE=192.168.5.0/24
IPSECKEY=ZA6PkrlNH6BN11SG1rCa8dxa

setkey -c <

Synopsis: CARP combined with LAGG causes system panic - V7.0 RELEASE#2.0/amd64

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Jul 9 18:48:59 UTC 2008
Responsible-Changed-Why: Over to maintainer(s). Note to submitter: we are going to need the stack trace.

http://www.freebsd.org/cgi/query-pr.cgi?pr=125442

From gnn at freebsd.org Wed Jul 9 21:26:54 2008
From: gnn at freebsd.org (gnn@freebsd.org)
Date: Wed Jul 9 21:27:02 2008
Subject: What's the deal with hardware checksum and net.inet.udp.checksum?
Message-ID:

I would assume that if a card, say the em, has hardware TX checksum that the UDP checksum could be calculated by the hardware, but this seems not to be the case. The manual pages are unhelpful in this regard.

Thanks,
George

From rwatson at FreeBSD.org Thu Jul 10 10:43:24 2008
From: rwatson at FreeBSD.org (Robert Watson)
Date: Thu Jul 10 10:43:31 2008
Subject: What's the deal with hardware checksum and net.inet.udp.checksum?
In-Reply-To:
References:
Message-ID: <20080710114028.T34050@fledge.watson.org>

On Wed, 9 Jul 2008, gnn@freebsd.org wrote:

> I would assume that if a card, say the em, has hardware TX checksum that the
> UDP checksum could be calculated by the hardware, but this seems not to be
> the case. The manual pages are unhelpful in this regard.

On the whole, they should be generated in hardware as long as it's not administratively disabled with ifconfig, and as long as there aren't known bugs in the hardware for the rev you're using. Just for example, hardware checksumming is disabled in software for quite a few early 1gbps cards due to bugs in the hardware causing rather nasty side effects.
What specific problem are you seeing? We do do a software checksum of the pseudo-header, but the UDP data should be checksummed by hardware.

(The usual test for hardware checksum being enabled on transmit is to tcpdump the interface and see tcpdump reporting lots of bad checksums, as the BPF capture happens before hardware checksumming is run -- in principle on the receive side that shouldn't happen!)

Robert N M Watson
Computer Laboratory
University of Cambridge

From steve at ibctech.ca Thu Jul 10 12:25:05 2008
From: steve at ibctech.ca (Steve Bertrand)
Date: Thu Jul 10 12:25:12 2008
Subject: Tunneling issues
In-Reply-To: <3d2c56c963f5fc5f6732548548068f69.squirrel@cor>
References: <8f7879db41dbaecc479a017110e8f32f.squirrel@cor> <200807040155.m641tl8s000607@lava.sentex.ca> <7904ac587e71a42fb86c2bbe77bde0ae.squirrel@cor> <200807091545.m69FjcP4031350@lava.sentex.ca> <4874FA1F.40209@elischer.org> <3d2c56c963f5fc5f6732548548068f69.squirrel@cor>
Message-ID: <4875FFA1.9010608@ibctech.ca>

zaphod@fsklaw.com wrote:
>>> ifconfig gif1 inet 192.168.72.1 192.168.70.1 netmask 255.255.255.0

Above you are assigning a /24 netmask.

> Got it from the manual
>
>
> # ifconfig gif0 create
> # ifconfig gif0 tunnel A.B.C.D W.X.Y.Z
> # ifconfig gif0 inet 192.168.1.1 192.168.2.1 netmask 0xffffffff

In your example from the manual, it is applying a /32 netmask, i.e. 255.255.255.255.

Aside from that, do you see ICMP inbound via tcpdump on the machine that you are trying to ping? Is the traffic making it to the destination, but not the return trip?

Steve

From a_gaviola at yahoo.com.ph Thu Jul 10 14:54:14 2008
From: a_gaviola at yahoo.com.ph (Archimedes Gaviola)
Date: Thu Jul 10 14:54:21 2008
Subject: [Regarding Packet Error]
Message-ID: <44627.52285.qm@web76615.mail.sg1.yahoo.com>

Hi,

I'm running a FreeBSD-6.2 RELEASE system on an Intel Gigabit copper NIC but when it receives packets, I encounter some errors with netstat. Below is the netstat dump of my system.
As you can see, there is no dropping of packets received, only errors at the input level of the interface. Does anyone have any idea of what's going on at the entry of the packets? What could be the possible causes of packet errors? Or what network data has been collected in the netstat when it displays errors? Can someone explain how FreeBSD processes packets when they enter the network system in the kernel? This is to know how my system behaves before doing any step for troubleshooting.

# netstat -I em0 -w 1 -d
            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls drops
     16233  1410   24570946      13785     0     965516     0     0
     16065  1262   24318048      13210     0     930566     0     0
     19091  1152   28897958      15965     0    1119340     0     0
     16105  1341   24380062      13827     0     975404     0     0
     15999  1242   24215216      14073     0     991380     0     0
     16314  1526   24695034      13531     0     946110     0     0
     16319  1134   24699760      12704     0     886526     0     0
     16320  1253   24698795      13752     0     966382     0     0
     15900  1467   24065394      13529     0     953136     0     0
     16014  1285   24239412      13018     0     912128     0     0
     16219  1253   24549782      13003     0     911444     0     0
     16205  1245   24528586      13504     0     947206     0     0
     16003  1430   24222790      14052     0     997390     0     0
     16035  1381   24269784      12641     0     881562     0     0
     15909  1165   24074754      13608     0     956804     0     0
     15961  1511   24151964      13501     0     953862     0     0
     16217  1285   24542454      13245     0     925960     0     0
     16308  1272   24683106      12444     0     861054     0     0
     16323  1004   24707389      12764     0     888100     0     0
     16148  1096   24442256      12663     0     888614     0     0
     16534  1202   25026660      13348     0     941722     0     0

Thank you.

From mike at sentex.net Thu Jul 10 15:46:47 2008
From: mike at sentex.net (Mike Tancsa)
Date: Thu Jul 10 15:46:53 2008
Subject: [Regarding Packet Error]
In-Reply-To: <44627.52285.qm@web76615.mail.sg1.yahoo.com>
References: <44627.52285.qm@web76615.mail.sg1.yahoo.com>
Message-ID: <200807101546.m6AFkiAw036967@lava.sentex.ca>

At 10:27 AM 7/10/2008, Archimedes Gaviola wrote:
>Hi,
>
>I'm running a FreeBSD-6.2 RELEASE system on an Intel Gigabit copper NIC
>but when it receive packets,

There have been a number of bug fixes to the intel nics since 6.2R. Can you at least update to 6.3R or RELENG_6 ? Also, check the stats for your nics

sysctl dev.em.0.stats=1

it will dump them to syslog

Does the switchport you are connected to show any errors ?

---Mike

From gnn at FreeBSD.org Thu Jul 10 19:54:00 2008
From: gnn at FreeBSD.org (gnn@FreeBSD.org)
Date: Thu Jul 10 19:54:06 2008
Subject: What's the deal with hardware checksum and net.inet.udp.checksum?
In-Reply-To: <20080710114028.T34050@fledge.watson.org>
References: <20080710114028.T34050@fledge.watson.org>
Message-ID:

At Thu, 10 Jul 2008 11:43:23 +0100 (BST), rwatson wrote:
>
> On Wed, 9 Jul 2008, gnn@freebsd.org wrote:
>
> > I would assume that if a card, say the em, has hardware TX checksum that the
> > UDP checksum could be calculated by the hardware, but this seems not to be
> > the case. The manual pages are unhelpful in this regard.
>
> On the whole, they should be generated in hardware as long as it's
> not administratively disabled with ifconfig, and as long as there
> aren't know bugs in the hardware for the rev you're using. Just for
> example, hardware checksumming is disabled in software for quite a
> few early 1gbps cards due to bugs in the hardware causing rather
> nasty side effects. What specific problem are you seeing? We do do
> a software checksum of the pseudo-header, but the UDP data should be
> checksummed by hardware.
>
> (The usual test for hardware checksum being enabled on transmit is
> to tcpdump the interface and see tcpdump reporting lots of bad
> checksums, as the BPF capture happens before hardware checksumming
> is run -- in principle on the receive side that shouldn't happen!)
>

If the sysctl is turned off on the transmitter then the receiving machine sees UDP checksums of 0.

Best,
George

From rwatson at FreeBSD.org Thu Jul 10 21:06:33 2008
From: rwatson at FreeBSD.org (Robert Watson)
Date: Thu Jul 10 21:06:39 2008
Subject: What's the deal with hardware checksum and net.inet.udp.checksum?
In-Reply-To:
References: <20080710114028.T34050@fledge.watson.org>
Message-ID: <20080710220201.K34050@fledge.watson.org>

On Thu, 10 Jul 2008, gnn@FreeBSD.org wrote:

> If the sysctl it turned off on the transmitter then the receiving machine
> sees UDP checksums of 0.

Right. If you disable UDP checksumming, we don't generate checksums (hardware or software) in udp_output():

	/*
	 * Set up checksum and output datagram.
	 */
	if (udp_cksum) {
		if (inp->inp_flags & INP_ONESBCAST)
			faddr.s_addr = INADDR_BROADCAST;
		ui->ui_sum = in_pseudo(ui->ui_src.s_addr, faddr.s_addr,
		    htons((u_short)len + sizeof(struct udphdr) + IPPROTO_UDP));
		m->m_pkthdr.csum_flags = CSUM_UDP;
		m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum);
	} else
		ui->ui_sum = 0;

You can disable hardware checksums using the -txcsum flag on ifconfig for each interface -- once the above-generated mbuf header gets to the IP layer and the route out an interface is available, we generate the checksum in software on demand if hardware checksums aren't available or are administratively disabled.
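Robert's explanation splits the UDP checksum into two halves: udp_output() seeds uh_sum with the pseudo-header sum from in_pseudo() and marks the mbuf CSUM_UDP, and ip_output() then computes in software whatever the interface's if_hwassist mask does not cover. The minimal C sketch below models both halves outside the kernel, following the RFC 768 / RFC 1071 one's-complement arithmetic. It is a sketch under stated assumptions: addresses are in host byte order, and the function names and SKETCH_* flag values are hypothetical stand-ins, not the kernel's in_pseudo(), in_delayed_cksum() or CSUM_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only; the kernel's CSUM_* constants differ. */
#define SKETCH_CSUM_IP     0x0001u
#define SKETCH_CSUM_UDP    0x0004u
#define SKETCH_IPPROTO_UDP 17u

/* Add a buffer to a running sum as big-endian 16-bit words (RFC 1071 style). */
static uint32_t sum_words(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                     /* an odd trailing byte is padded with zero */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold the carries back in, giving the 16-bit one's-complement sum. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Unfolded sum over the pseudo-header (src, dst, protocol, UDP length),
 * comparable to what in_pseudo() seeds uh_sum with. */
static uint32_t pseudo_sum(uint32_t src, uint32_t dst, uint16_t udp_len)
{
    return (src >> 16) + (src & 0xffffu) + (dst >> 16) + (dst & 0xffffu) +
           SKETCH_IPPROTO_UDP + udp_len;
}

/* Complete UDP checksum in software; `udp` is header + payload with uh_sum zeroed. */
static uint16_t udp_checksum(uint32_t src, uint32_t dst,
                             const uint8_t *udp, uint16_t udp_len)
{
    assert(udp_len >= 8);             /* a UDP header alone is 8 bytes */
    uint16_t c = (uint16_t)~fold(sum_words(udp, udp_len,
                                           pseudo_sum(src, dst, udp_len)));

    /* RFC 768: a computed 0 is sent as 0xffff, because 0 on the wire means
     * "no checksum", which is what the receiver sees when
     * net.inet.udp.checksum is turned off on the sender. */
    return c == 0 ? 0xffffu : c;
}

/* Same masking idea as ip_output(): whatever the interface cannot offload,
 * software must compute. */
static uint32_t sw_csum_needed(uint32_t requested, uint32_t if_hwassist)
{
    return requested & ~if_hwassist;
}
```

For example, for a datagram from 192.168.0.1:1234 to 192.168.0.2:5678 carrying the two bytes "hi", udp_checksum() returns 0xfb1c; re-summing the datagram with that value in the checksum field folds to 0xffff, which is the check a receiver performs. With an interface that only offloads the IP header checksum, sw_csum_needed() leaves just the UDP flag set, mirroring the case where the stack falls back to a software checksum.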
Viz. ip_output():

	m->m_pkthdr.csum_flags |= CSUM_IP;
	sw_csum = m->m_pkthdr.csum_flags & ~ifp->if_hwassist;
	if (sw_csum & CSUM_DELAY_DATA) {
		in_delayed_cksum(m);
		sw_csum &= ~CSUM_DELAY_DATA;
	}
	m->m_pkthdr.csum_flags &= ifp->if_hwassist;

It's possible to imagine adding a global sysctl that has slightly different policy implications, such as globally disabling hardware checksums, or not generating full checksums if the interface doesn't support hardware checksums rather than generating them.

Robert N M Watson
Computer Laboratory
University of Cambridge

From paul at gtcomm.net Fri Jul 11 05:12:08 2008
From: paul at gtcomm.net (Paul)
Date: Fri Jul 11 05:12:15 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <20080709142008.H26105@delplex.bde.org>
References: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> <20080707221257.GH62764@server.vk2pj.dyndns.org> <20080709142008.H26105@delplex.bde.org>
Message-ID: <4876EC16.5070209@gtcomm.net>

I tested Linux in bridge configuration with the same machine and it CPUed out at about 600kpps through the bridge.. That's a bit low :/ Soft interrupt using all the cpu. Same opteron 2222, 82571EB Pci express NIC. Tried SMP/ non-smp , load balanced irqs, etc..

Good news is using iptables only adds a few percentage onto the CPU usage. But still, what's with that.. So far FreeBSD got the highest pps rating for forwarding. I haven't tried bridge mode. Ipfw probably takes a big hit in that too though.

Looking for an 82575 to test..
Paul From artis.caune at gmail.com Fri Jul 11 05:35:26 2008 From: artis.caune at gmail.com (Artis Caune) Date: Fri Jul 11 05:35:33 2008 Subject: [Regarding Packet Error] In-Reply-To: <44627.52285.qm@web76615.mail.sg1.yahoo.com> References: <44627.52285.qm@web76615.mail.sg1.yahoo.com> Message-ID: <9e20d71e0807102207s7cbebb8fiad965ef46eaf84d@mail.gmail.com> On Thu, Jul 10, 2008 at 5:27 PM, Archimedes Gaviola wrote: > Hi, > > I'm running a FreeBSD-6.2 RELEASE system on an Intel Gigabit copper NIC > but when it receive packets, I encounter some errors with netstat. Below is > the netstat dump of my system. As what you can see, there is no dropping of > packets received but only errors at the input level of the interface. Does > anyone have any idea of what's going on at the entry of the packets? What > could be the possible causes of packet errors? Or what network data has > been collected in the netstat when it displays error? Can someone > explained how FreeBSD process packets when it enters the network system > in the kernel? This is to know how my system behaves before doing any step > for troubleshooting. have you enabled polling on interface? try to disable if yes. is it pci or pci64/x interface? 
try to set the following variables in /boot/loader.conf and reboot:

hw.em.rxd=4096
hw.em.txd=4096

From a_gaviola at yahoo.com.ph Fri Jul 11 07:47:06 2008
From: a_gaviola at yahoo.com.ph (Archimedes Gaviola)
Date: Fri Jul 11 07:47:12 2008
Subject: [Regarding Packet Error]
In-Reply-To: <200807101546.m6AFkiAw036967@lava.sentex.ca>
Message-ID: <948463.97751.qm@web76608.mail.sg1.yahoo.com>

--- On Thu, 7/10/08, Mike Tancsa wrote:

> From: Mike Tancsa
> Subject: Re: [Regarding Packet Error]
> To: a_gaviola@yahoo.com.ph, freebsd-net@freebsd.org
> Date: Thursday, 10 July, 2008, 11:46 PM
> At 10:27 AM 7/10/2008, Archimedes Gaviola wrote:
> >Hi,
> >
> >I'm running a FreeBSD-6.2 RELEASE system on an
> Intel Gigabit copper NIC
> >but when it receive packets,
>
> There have been a number of bug fixes to the intel nics
> since
> 6.2R. Can you at least update to 6.3R or RELENG_6 ?

[Archimedes] Yes, I'll try newer versions of FreeBSD with 6.3 RELEASE and 7.0 RELEASE if errors will still occur.

Also,
> check the
> stats for your nics
>
> sysctl dev.em.0.stats=1
>
> it will dump them to syslog

[Archimedes] Here's the log.
Jul 11 07:01:18 freebsd62 kernel: em0: Excessive collisions = 0
Jul 11 07:01:18 freebsd62 kernel: em0: Sequence errors = 0
Jul 11 07:01:18 freebsd62 kernel: em0: Defer count = 0
Jul 11 07:01:18 freebsd62 kernel: em0: Missed Packets = 294797
Jul 11 07:01:18 freebsd62 kernel: em0: Receive No Buffers = 1377122
Jul 11 07:01:18 freebsd62 kernel: em0: Receive Length Errors = 0
Jul 11 07:01:18 freebsd62 kernel: em0: Receive errors = 0
Jul 11 07:01:18 freebsd62 kernel: em0: Crc errors = 0
Jul 11 07:01:18 freebsd62 kernel: em0: Alignment errors = 0
Jul 11 07:01:18 freebsd62 kernel: em0: Carrier extension errors = 0
Jul 11 07:01:18 freebsd62 kernel: em0: RX overruns = 24092
Jul 11 07:01:18 freebsd62 kernel: em0: watchdog timeouts = 0
Jul 11 07:01:18 freebsd62 kernel: em0: XON Rcvd = 0
Jul 11 07:01:18 freebsd62 kernel: em0: XON Xmtd = 26434
Jul 11 07:01:18 freebsd62 kernel: em0: XOFF Rcvd = 0
Jul 11 07:01:18 freebsd62 kernel: em0: XOFF Xmtd = 319731
Jul 11 07:01:18 freebsd62 kernel: em0: Good Packets Rcvd = 10486744
Jul 11 07:01:18 freebsd62 kernel: em0: Good Packets Xmtd = 7311565

>
> Does the switchport your are connected to show any errors ?
>

[Archimedes] No, it doesn't show any errors on my Gigabit LinkSys switch.

Thanks,
Archimedes

From a_gaviola at yahoo.com.ph Fri Jul 11 08:04:07 2008
From: a_gaviola at yahoo.com.ph (Archimedes Gaviola)
Date: Fri Jul 11 08:04:14 2008
Subject: [Regarding Packet Error]
In-Reply-To: <9e20d71e0807102207s7cbebb8fiad965ef46eaf84d@mail.gmail.com>
Message-ID: <212607.89062.qm@web76615.mail.sg1.yahoo.com>

--- On Fri, 7/11/08, Artis Caune wrote:

> From: Artis Caune
> Subject: Re: [Regarding Packet Error]
> To: a_gaviola@yahoo.com.ph
> Cc: freebsd-net@freebsd.org
> Date: Friday, 11 July, 2008, 1:07 PM
> On Thu, Jul 10, 2008 at 5:27 PM, Archimedes Gaviola
> wrote:
> > Hi,
> >
> > I'm running a FreeBSD-6.2 RELEASE system on an
> Intel Gigabit copper NIC
> > but when it receive packets, I encounter some errors
> with netstat. Below is
> > the netstat dump of my system. As what you can see,
> there is no dropping of
> > packets received but only errors at the input level of
> the interface. Does
> > anyone have any idea of what's going on at the
> entry of the packets? What
> > could be the possible causes of packet errors? Or what
> network data has
> > been collected in the netstat when it displays error?
> Can someone
> > explained how FreeBSD process packets when it enters
> the network system
> > in the kernel? This is to know how my system behaves
> before doing any step
> > for troubleshooting.
>
> have you enabled polling on interface? try to disable if
> yes.

[Archimedes] No, device polling is not enabled on my system.

> is it pci or pci64/x interface?

[Archimedes] This is an Intel PCI-X copper NIC
http://www.intel.com/network/connectivity/resources/doc_library/data_sheets/pro1000mt_sa.pdf

>
> try to set the following variables in /boot/loader.conf and
> reboot:
> hw.em.rxd=4096
> hw.em.txd=4096

[Archimedes] Yes, I need to try these options also.

Thanks,
Archimedes

From stefan.lambrev at moneybookers.com Fri Jul 11 10:09:21 2008
From: stefan.lambrev at moneybookers.com (Stefan Lambrev)
Date: Fri Jul 11 10:09:28 2008
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
In-Reply-To: <4876EC16.5070209@gtcomm.net>
References: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> <20080707221257.GH62764@server.vk2pj.dyndns.org> <20080709142008.H26105@delplex.bde.org> <4876EC16.5070209@gtcomm.net>
Message-ID: <4877314C.5000106@moneybookers.com>

Hi Paul,

Paul wrote:
> I tested Linux in bridge configuration with the same machine and it
> CPUed out at about 600kpps through the bridge..
600kpps incoming or 600kpps incoming+ outgoing ?
> That's a bit low :/ Soft interrupt using all the cpu. Same opteron
> 2222, 82571EB Pci express NIC.
> Tried SMP/ non-smp , load balanced irqs, etc..
Does hwpmc work out of the box (FreeBSD) with those CPUs?
>
> Good news is using iptables only adds a few percentage onto the CPU
> usage. But still, what's with that..
> So far FreeBSD got the highest pps rating for forwarding. I haven't
> tried bridge mode. Ipfw probably takes a big hit in that too though.
>
> Looking for an 82575 to test..

P.S. It was a nice chat, but what can we expect from the future? Any plans, patches etc? Someone suggested to install 8-current and test with it as this is the "fast" way to have something included in FreeBSD. I can do this - I can install 8-current, patch it and put it under load and report results, but need patches :) I guess Paul is in the same situation ..
-- Best Wishes, Stefan Lambrev ICQ# 24134177 From bart at it-ss.be Fri Jul 11 15:15:08 2008 From: bart at it-ss.be (Bart Van Kerckhove) Date: Fri Jul 11 15:15:14 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] References: <2d3001c8def1$f4309b90$020b000a@bartwrkstxp> <486FFF70.3090402@gtcomm.net> <48701921.7090107@gtcomm.net> <4871E618.1080500@freebsd.org> <20080708002228.G680@besplex.bde.org> <48724238.2020103@freebsd.org> <20080708034304.R21502@delplex.bde.org> <20080708045135.V1022@besplex.bde.org> <48727BA9.6020702@elischer.org> <20080707221257.GH62764@server.vk2pj.dyndns.org> <20080709142008.H26105@delplex.bde.org><4876EC16.5070209@gtcomm.net> <4877314C.5000106@moneybookers.com> Message-ID: <003d01c8e368$e3213f50$020b000a@bartwrkstxp> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> Good news is using iptables only adds a few percentage onto the CPU >> usage. But still, what's with that.. >> So far FreeBSD got the highest pps rating for forwarding. I haven't >> tried bridge mode. Ipfw probably takes a big hit in that too though. >> Looking for an 82575 to test.. > > P.S. It was a nice chat, but what we can expect from the future? Any > plans, patches etc? > Someone suggested to install 8-current and test with it as this is the > "fast" way to have something included in FreeBSD. > I can do this - I can install 8-current, patch it and put it under > load and report results, but need patches :) > I guess Paul is in the same situation .. I'm in the same situation as well. Would anyone be interested in very specific work aimed at improving IP forwarding? I would happily put out a bounty for this, and I'm quite sure I'm not alone. PS Paul: idd you get around to testing C2D ? 
Kind regards, Met vriendelijke groet / With kind regards, Bart Van Kerckhove http://friet.net/pgp.txt "There are 10 kinds of ppl; those who read binary and those who don't" -----BEGIN PGP SIGNATURE----- iQA/AwUBSHdqOQoIFchBM0BKEQJSPQCfQKKgD8+xrX088+o0IKmPDdDD0XoAnAv+ SqgNdjkKsEstDYqnFDNUQuK3 =ft58 -----END PGP SIGNATURE----- From gavin at FreeBSD.org Fri Jul 11 20:18:05 2008 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Fri Jul 11 20:18:17 2008 Subject: kern/125502: [ral] ifconfig ral0 scan produces no output unless in shared mode Message-ID: <200807112018.m6BKI4oo086736@freefall.freebsd.org> Old Synopsis: ifconfig ral0 scan produces no output New Synopsis: [ral] ifconfig ral0 scan produces no output unless in shared mode State-Changed-From-To: open->feedback State-Changed-By: gavin State-Changed-When: Fri Jul 11 20:11:15 UTC 2008 State-Changed-Why: To submitter: We'll probably need more details from you before there's any chance of diagnosing the problem. To start with, could you try using the wlandebug tool, from /usr/src/tools/tools/net80211 and see if that reveals anything obvious? # wlandebug -i ral0 +scan+auth+debug+assoc net.wlan.0.debug: 0 => 0xc80000 Responsible-Changed-From-To: freebsd-i386->freebsd-net Responsible-Changed-By: gavin Responsible-Changed-When: Fri Jul 11 20:11:15 UTC 2008 Responsible-Changed-Why: Over to maintainer(s) http://www.freebsd.org/cgi/query-pr.cgi?pr=125502 From robin at icir.org Fri Jul 11 20:52:08 2008 From: robin at icir.org (Robin Sommer) Date: Fri Jul 11 20:52:15 2008 Subject: BPF problems on FreeBSD 7.0 Message-ID: <20080711202737.GB27418@icir.org> Hi all, we're seeing some strange effects with our libpcap-based application (the Bro network intrusion detection system) on a FreeBSD 7-RELEASE system. As the application has always been running fine on 6.x, we're wondering whether this might be triggered by any of the changes that went into 7. 
The problem is that the Bro process, after running fine for a few hours or so, regularly stalls completely; the process seems to enter some odd state, using 0% CPU and with top showing only an empty field in the STATE column. We saw this effect with a Neterion network card and first thought it might be a driver problem. After switching to an Intel card, we see something slightly different: now the process doesn't stall completely anymore but it still gets to some point at which it stops receiving packets from libpcap. We haven't yet seen these problems with any other libpcap application. The only difference between Bro and most other libpcap applications that I can think of right now is that Bro is using select() on the file descriptor. However, with a small test application which mimics Bro's way of using libpcap, we couldn't reproduce the problem so far either. With the Neterion card, we have also tried disabling LRO and MSI explicitly but to no avail. Again, this is all with a Bro installation that works fine when running FreeBSD 6.x (we haven't run 6.x on the same boxes but we see the problems on two separate machines running FreeBSD 7). I'm wondering whether anybody here has seen something similar or might have an idea where to start looking for the cause. Any ideas? Thanks, Robin -- Robin Sommer * Phone +1 (510) 666-2886 * robin@icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org From Mark.Favas at csiro.au Fri Jul 11 21:20:07 2008 From: Mark.Favas at csiro.au (Mark.Favas@csiro.au) Date: Fri Jul 11 21:20:16 2008 Subject: kern/92090: [bge] bge0: watchdog timeout -- resetting Message-ID: <200807112120.m6BLK7Bh091849@freefall.freebsd.org> The following reply was made to PR kern/92090; it has been noted by GNATS.
From:
To: ,
Cc:
Subject: Re: kern/92090: [bge] bge0: watchdog timeout -- resetting
Date: Sat, 12 Jul 2008 05:13:08 +0800

I can confirm that this problem still exists in FreeBSD7.0-STABLE.

Hardware is Dell PowerEdge 2550 with on-board bge interface. And yes, the problem is most reliably triggered by csup.

FreeBSD bienvenue 7.0-STABLE FreeBSD 7.0-STABLE #1: Mon Jul 7 16:14:42 WST 2008 root@bienvenue:/usr/obj/usr/src/sys/BIENVENUE i386

Jul 11 03:50:46 bienvenue kernel: bge0: watchdog timeout -- resetting
Jul 11 03:50:46 bienvenue kernel: bge0: link state changed to DOWN
Jul 11 03:50:48 bienvenue kernel: bge0: link state changed to UP
Jul 12 03:55:09 bienvenue kernel: bge0: watchdog timeout -- resetting
Jul 12 03:55:09 bienvenue kernel: bge0: link state changed to DOWN
Jul 12 03:55:11 bienvenue kernel: bge0: link state changed to UP
From Mark.Favas at csiro.au Fri Jul 11 21:40:08 2008
From: Mark.Favas at csiro.au (Mark.Favas@csiro.au)
Date: Fri Jul 11 21:40:15 2008
Subject: kern/123347: [bge] bge1: watchdog timeout -- linkstate changed to DOWN
Message-ID: <200807112140.m6BLe8Ao094258@freefall.freebsd.org>

The following reply was made to PR kern/123347; it has been noted by GNATS.

From:
To: ,
Cc:
Subject: Re: kern/123347: [bge] bge1: watchdog timeout -- linkstate changed to DOWN
Date: Sat, 12 Jul 2008 05:03:21 +0800

I am currently seeing the same issue with a single bge interface on FreeBSD 7.0-STABLE.

pciconf -lv | grep -A 3 bge
bge0@pci0:1:8:0:        class=0x020000 card=0x00d11028 chip=0x164414e4 rev=0x10 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'BCM5751F NetXtreme Gigabit Ethernet Controller'
    class      = network

Jul 11 03:50:46 bienvenue kernel: bge0: watchdog timeout -- resetting
Jul 11 03:50:46 bienvenue kernel: bge0: link state changed to DOWN
Jul 11 03:50:48 bienvenue kernel: bge0: link state changed to UP
Jul 12 03:55:09 bienvenue kernel: bge0: watchdog timeout -- resetting
Jul 12 03:55:09 bienvenue kernel: bge0: link state changed to DOWN
Jul 12 03:55:11 bienvenue kernel: bge0: link state changed to UP

FreeBSD bienvenue 7.0-STABLE FreeBSD 7.0-STABLE #1: Mon Jul 7 16:14:42 WST 2008 root@bienvenue:/usr/obj/usr/src/sys/BIENVENUE i386
From brian.mcginty at gmail.com Sat Jul 12 06:44:54 2008 From: brian.mcginty at gmail.com (Brian McGinty) Date: Sat Jul 12 06:45:01 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> Message-ID: <601bffc40807112344n7a683f81y516f540e24d87389@mail.gmail.com> > Hi Brian > I very much doubt that this is ceteris paribus. This is 384 random IPs > -> 384 random IP addresses with a flow lookup for each packet. Also, > I've read through igb on Linux - it has a lot of optimizations that > the FreeBSD driver lacks and I have yet to implement. Hey Kip, when will you push the optimization into FreeBSD? Cheers, Brian From bzeeb-lists at lists.zabbadoz.net Sat Jul 12 12:45:07 2008 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Sat Jul 12 12:45:33 2008 Subject: Route messages In-Reply-To: <200807021355.m62Dtpli092002@lava.sentex.ca> References: <4852E23E.2040505@gtcomm.net> <4854EBF1.7020708@FreeBSD.org> <200807010606.m6166jFe084204@lava.sentex.ca> <4869EC1E.8060009@freebsd.org> <20080701084933.W57089@maildrop.int.zabbadoz.net> <20080701092254.T57089@maildrop.int.zabbadoz.net> <486B87DB.3080007@freebsd.org> <200807021355.m62Dtpli092002@lava.sentex.ca> Message-ID: <20080712124250.M57089@maildrop.int.zabbadoz.net> On Wed, 2 Jul 2008, Mike Tancsa wrote: Hi, > It works for me in the lab and on one production machine I patched early this > morning. I just MFCed this to 7-STABLE. So if you update your trees make sure you have rev. 1.332.2.3 of ip_input.c.
/bz

>>> Index: sys/netinet/ip_input.c
>>> ===================================================================
>>> RCS file: /shared/mirror/FreeBSD/r/ncvs/src/sys/netinet/ip_input.c,v
>>> retrieving revision 1.332.2.2
>>> diff -u -p -r1.332.2.2 ip_input.c
>>> --- sys/netinet/ip_input.c 22 Apr 2008 12:02:55 -0000 1.332.2.2
>>> +++ sys/netinet/ip_input.c 1 Jul 2008 09:23:08 -0000
>>> @@ -1363,7 +1363,6 @@ ip_forward(struct mbuf *m, int srcrt)
>>>  	 * the ICMP_UNREACH_NEEDFRAG "Next-Hop MTU" field described in RFC1191.
>>>  	 */
>>>  	bzero(&ro, sizeof(ro));
>>> -	rtalloc_ign(&ro, RTF_CLONING);
>>>  	error = ip_output(m, NULL, &ro, IP_FORWARDING, NULL, NULL);

--
Bjoern A. Zeeb  Stop bit received. Insert coin for new game.
From gavin at FreeBSD.org Sun Jul 13 17:11:00 2008 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Sun Jul 13 17:11:07 2008 Subject: kern/125181: [ndis] [patch] with wep enters kdb.enter.unknown, panics Message-ID: <200807131711.m6DHB0ni082660@freefall.freebsd.org>
Old Synopsis: [ndis] with wep enters kdb.enter.unknown, panics
New Synopsis: [ndis] [patch] with wep enters kdb.enter.unknown, panics

State-Changed-From-To: feedback->open
State-Changed-By: gavin
State-Changed-When: Sun Jul 13 17:06:03 UTC 2008
State-Changed-Why: Over to maintainers for evaluation

Responsible-Changed-From-To: gavin->freebsd-net
Responsible-Changed-By: gavin
Responsible-Changed-When: Sun Jul 13 17:06:03 UTC 2008
Responsible-Changed-Why: Submitter reports my patch fixes things for him

http://www.freebsd.org/cgi/query-pr.cgi?pr=125181
From bugmaster at FreeBSD.org Mon Jul 14 11:07:02 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jul 14 11:08:24 2008 Subject: Current problem reports assigned to freebsd-net@FreeBSD.org Message-ID: <200807141107.m6EB72wk014490@freefall.freebsd.org>
Current FreeBSD problem reports
Critical problems
Serious problems

S Tracker       Resp.      Description
--------------------------------------------------------------------------------
o kern/27474    net        [ipf] [ppp] Interactive use of user PPP and ipfilter c
o kern/35442    net        [sis] [patch] Problem transmitting runts in if_sis dri
a kern/38554    net        changing interface ipaddress doesn't seem to work
s kern/39937    net        ipstealth issue
s kern/77195    net        [ipf] [patch] ipfilter ioctl SIOCGNATL does not match
o kern/79895    net        [ipf] 5.4-RC2 breaks ipfilter NAT when using netgraph
s kern/81147    net        [net] [patch] em0 reinitialization while adding aliase
o kern/86103    net        [ipf] Illegal NAT Traversal in IPFilter
s kern/86920    net        [ndis] ifconfig: SIOCS80211: Invalid argument [regress
o kern/87521    net        [ipf] [panic] using ipfilter "auth" keyword leads to k
o kern/92090    net        [bge] bge0: watchdog timeout -- resetting
f kern/92552    net        A serious bug in most network drivers from 5.X to 6.X
o kern/95288    net        [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr
o kern/98978    net        [ipf] [patch] ipfilter drops OOW packets under 6.1-Rel
o kern/101948   net        [ipf] [panic] Kernel Panic Trap No 12 Page Fault - cau
f kern/102344   net        [ipf] Some packets do not pass through network interfa
o bin/105925    net        problems with ifconfig(8) and vlan(4) [regression]
s kern/105943   net        Network stack may modify read-only mbuf chain copies
o kern/106316   net        [dummynet] dummynet with multipass ipfw drops packets
o kern/106438   net        [ipf] ipfilter: keep state does not seem to allow repl
o kern/108542   net        [bce]: Huge network latencies with 6.2-RELEASE / STABL
o bin/108895    net        pppd(8): PPPoE dead connections on 6.2 [regression]
o kern/109308   net        [pppd] [panic] Multiple panics kernel ppp suspected [r
o kern/109733   net        [bge] bge link state issues [regression]
o kern/112528   net        [nfs] NFS over TCP under load hangs with "impossible p
o kern/112686   net        [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o kern/112722   net        [udp] IP v4 udp fragmented packet reject
o kern/113842   net        [ip6] PF_INET6 proto domain state can't be cleared wit
o kern/114714   net        [gre][patch] gre(4) is not MPSAFE and does not support
o kern/114839   net        [fxp] fxp looses ability to speak with traffic
o kern/115239   net        [ipnat] panic with 'kmem_map too small' using ipnat
o kern/116077   net        [ip] [patch] 6.2-STABLE panic during use of multi-cast
o kern/116185   net        [iwi] if_iwi driver leads system to reboot
o kern/116328   net        [bge]: Solid hang with bge interface
o kern/116747   net        [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile
o kern/116837   net        [tun] [panic] [patch] ifconfig tunX destroy: panic
o kern/117043   net        [em] Intel PWLA8492MT Dual-Port Network adapter EEPROM
o kern/117271   net        [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap
o kern/117423   net        [vlan] Duplicate IP on different interfaces
o kern/117448   net        [carp] 6.2 kernel crash [regression]
o kern/118880   net        [ip6] IP_RECVDSTADDR & IP_SENDSRCADDR not implemented
o kern/119225   net        [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr
o kern/119345   net        [ath] Unsuported Atheros 5424/2424 and CPU speedstep n
o kern/119361   net        [bge] bge(4) transmit performance problem
o kern/119945   net        [rum] [panic] rum device in hostap mode, cause kernel
o kern/120130   net        [carp] [panic] carp causes kernel panics in any conste
o kern/120266   net        [panic] gnugk causes kernel panic when closing UDP soc
o kern/120304   net        [netgraph] [patch] netgraph source assumes 32-bit time
o kern/120966   net        [rum] kernel panic with if_rum and WPA encryption
o kern/121080   net        [bge] IPv6 NUD problem on multi address config on bge0
o kern/121181   net        [panic] Fatal trap 3: breakpoint instruction fault whi
o kern/121298   net        [em] [panic] Fatal trap 12: page fault while in kernel
o kern/121437   net        [vlan] Routing to layer-2 address does not work on VLA
o kern/121555   net        [panic] Fatal trap 12: current process = 12 (swi1: net
o kern/121624   net        [em] [regression] Intel em WOL fails after upgrade to
o kern/121872   net        [wpi] driver fails to attach on a fujitsu-siemens s711
o kern/121983   net        [fxp] fxp0 MBUF and PAE
o kern/122033   net        [ral] [lor] Lock order reversal in ral0 at bootup [reg
o kern/122058   net        [em] [panic] Panic on em1: taskq
o kern/122082   net        [in_pcb] NULL pointer dereference in in_pcbdrop
o kern/122195   net        [ed] Alignment problems in if_ed
f kern/122252   net        [ipmi] [bge] IPMI problem with BCM5704 (does not work
o kern/122290   net        [netgraph] [panic] Netgraph related "kmem_map too smal
o kern/122427   net        [apm] [panic] apm and mDNSResponder cause panic during
o kern/122551   net        [bge] Broadcom 5715S no carrier on HP BL460c blade usi
o kern/122685   net        It is not visible passing packets in tcpdump
o kern/122743   net        [panic] vm_page_unwire: invalid wire count: 0
o kern/122772   net        [em] em0 taskq panic, tcp reassembly bug causes radix
f kern/122794   net        [lagg] Kernel panic after brings lagg(8) up if NICs ar
f conf/122858   net        [nsswitch.conf] nsswitch in 7.0 is f*cked up
o kern/122954   net        [lagg] IPv6 EUI64 incorrectly chosen for lagg devices
o kern/122989   net        [swi] [panic] 6.3 kernel panic in swi1: net
o kern/123066   net        [ipsec] [panic] kernel trap with ipsec
o kern/123160   net        [ip] Panic and reboot at sysctl kern.polling.enable=0
f kern/123172   net        [bce] Watchdog timeout problems with if_bce
f kern/123200   net        [netgraph] Server failure due to netgraph mpd and dhcp
o conf/123330   net        [nsswitch.conf] Enabling samba wins in nsswitch.conf c
o kern/123347   net        [bge] bge1: watchdog timeout -- linkstate changed to D
o kern/123429   net        [nfe] [hang] "ifconfig nfe up" causes a hard system lo
o kern/123463   net        [ipsec] [panic] repeatable crash related to ipsec-tool
o bin/123465    net        [ip6] route(8): route add -inet6 -interfac
o kern/123559   net        [iwi] iwi periodically disassociates/associates [regre
o kern/123603   net        [tcp] tcp_do_segment and Received duplicate SYN
o kern/123617   net        [tcp] breaking connection when client downloading file
o bin/123633    net        ifconfig(8) doesn't set inet and ether address in one
o kern/123796   net        [ipf] FreeBSD 6.1+VPN+ipnat+ipf: port mapping does not
o kern/123881   net        [tcp] Turning on TCP blackholing causes slow localhost
o kern/123968   net        [rum] [panic] rum driver causes kernel panic with WPA.
o kern/124021   net        [ip6] [panic] page fault in nd6_output()
o kern/124127   net        [msk] watchdog timeout (missed Tx interrupts) -- recov
o kern/124753   net        [ieee80211] net80211 discards power-save queue packets
o kern/124904   net        [fxp] EEPROM corruption with Compaq NC3163 NIC
o kern/125079   net        [ppp] host routes added by ppp with gateway flag (regr
f kern/125195   net        [fxp] fxp(4) driver failed to initialize device Intel
o kern/125442   net        [carp][lagg] CARP combined with LAGG causes system pan

95 problems total.

Non-critical problems

S Tracker       Resp.      Description
--------------------------------------------------------------------------------
o conf/23063    net        [PATCH] for static ARP tables in rc.network
o kern/34665    net        [ipf] [hang] ipfilter rcmd proxy "hangs".
s bin/41647     net        ifconfig(8) doesn't accept lladdr along with inet addr
o kern/54383    net        [nfs] [patch] NFS root configurations without dynamic
s kern/60293    net        FreeBSD arp poison patch
o kern/64556    net        [sis] if_sis short cable fix problems with NetGear FA3
o kern/70904    net        [ipf] ipfilter ipnat problem with h323 proxy support
o kern/77273    net        [ipf] ipfilter breaks ipv6 statefull filtering on 5.3
o kern/77913    net        [wi] [patch] Add the APDL-325 WLAN pccard to wi(4)
o kern/78090    net        [ipf] ipf filtering on bridged packets doesn't work if
o bin/79228     net        [patch] extend arp(8) to be able to create blackhole r
o kern/91594    net        [em] FreeBSD > 5.4 w/ACPI fails to detect Intel Pro/10
s kern/91777    net        [ipf] [patch] wrong behaviour with skip rule inside an
o kern/93378    net        [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo
o kern/95267    net        packet drops periodically appear
o kern/95277    net        [netinet] [patch] IP Encapsulation mask_match() return
o kern/100519   net        [netisr] suggestion to fix suboptimal network polling
o kern/102035   net        [plip] plip networking disables parallel port printing
o conf/102502   net        [patch] ifconfig name does't rename netgraph node in n
o conf/107035   net        [patch] bridge interface given in rc.conf not taking a
o kern/109470   net        [wi] Orinoco Classic Gold PC Card Can't Channel Hop
o kern/112179   net        [sis] [patch] sis driver for natsemi DP83815D autonego
o bin/112557    net        [patch] ppp(8) lock file should not use symlink name
o kern/114915   net        [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o bin/116643    net        [patch] [request] fstat(1): add INET/INET6 socket deta
o bin/117339    net        [patch] route(8): loading routing management commands
o kern/118727   net        [netgraph] [patch] [request] add new ng_pf module
a kern/118879   net        [bge] [patch] bge has checksum problems on the 5703 ch
o bin/118987    net        ifconfig(8): ifconfig -l (address_family) does not wor
o kern/119432   net        [arp] route add -host -iface causes arp e
f kern/119516   net        [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi
o kern/119617   net        [nfs] nfs error on wpa network when reseting/shutdown
o kern/119791   net        [nfs] UDP NFS mount of aliased IP addresses from a Sol
o kern/120232   net        [nfe] [patch] Bring in nfe(4) to RELENG_6
o kern/120566   net        [request]: ifconfig(8) make order of arguments more fr
o kern/121257   net        [tcp] TSO + natd -> slow outgoing tcp traffic
o kern/121443   net        [gif] LOR icmp6_input/nd6_lookup
o kern/121706   net        [netinet] [patch] "rtfree: 0xc4383870 has 1 refs" emit
s kern/121774   net        [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122068   net        [ppp] ppp can not set the correct interface with pptpd
o kern/122295   net        [bge] bge Ierr rate increase (since 6.0R) [regression]
o kern/122319   net        [wi] imposible to enable ad-hoc demo mode with Orinoco
o kern/122697   net        [ath] Atheros card is not well supported
o kern/122780   net        [lagg] tcpdump on lagg interface during high pps wedge
f kern/122839   net        [multicast] FreeBSD 7 multicast routing problem
o kern/122928   net        [em] interface watchdog timeouts and stops receiving p
o kern/123892   net        [tap] [patch] No buffer space available
p kern/123961   net        [vr] [patch] Allow vr interface to handle vlans
o bin/124004    net        ifconfig(8): Cannot assign both an IP and a MAC addres
o kern/124160   net        [libc] connect(2) function loops indefinitely
o kern/124341   net        [ral] promiscuous mode for wireless device ral0 looses
o kern/124609   net        [ipsec] [panic] ipsec 'remainder too big' panic with p
o kern/124767   net        [iwi] Wireless connection using iwi0 driver (Intel 220
o kern/125003   net        [gif] incorrect EtherIP header format.
o kern/125181   net        [ndis] [patch] with wep enters kdb.enter.unknown, pani
o kern/125239   net        [gre] kernel crash when using gre
o kern/125258   net        [socket] socket's SO_REUSEADDR option does not work
f kern/125502   net        [ral] ifconfig ral0 scan produces no output unless in

58 problems total.
From brde at optusnet.com.au Mon Jul 14 12:35:04 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jul 14 12:35:11 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <20080707142018.U63144@fledge.watson.org> References: <4867420D.7090406@gtcomm.net> <486A7E45.3030902@gtcomm.net> <486A8F24.5010000@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <486B4F11.6040906@gtcomm.net> <486BC7F5.5070604@gtcomm.net> <20080703160540.W6369@delplex.bde.org> <486C7F93.7010308@gtcomm.net> <20080703195521.O6973@delplex.bde.org> <486D35A0.4000302@gtcomm.net> <486DF1A3.9000409@gtcomm.net> <486E65E6.3060301@gtcomm.net> <4871DB8E.5070903@freebsd.org> <20080707191918.B4703@besplex.bde.org> <4871FB66.1060406@freebsd.org> <20080707213356.G7572@besplex.bde.org> <20080707134036.S63144@fledge.watson.org> <20080707224659.B7844@besplex.bde.org> <20080707142018.U63144@fledge.watson.org> Message-ID: <20080714212912.D885@besplex.bde.org> On Mon, 7 Jul 2008, Robert Watson wrote: > On Mon, 7 Jul 2008, Bruce Evans wrote: > >>> (1) sendto() to a specific address and port on a socket that has been >>> bound to >>> INADDR_ANY and a specific port.
>>> >>> (2) sendto() on a specific address and port on a socket that has been >>> bound to >>> a specific IP address (not INADDR_ANY) and a specific port. >>> >>> (3) send() on a socket that has been connect()'d to a specific IP address >>> and >>> a specific port, and bound to INADDR_ANY and a specific port. >>> >>> (4) send() on a socket that has been connect()'d to a specific IP address >>> and a specific port, and bound to a specific IP address (not >>> INADDR_ANY) >>> and a specific port. >>> >>> The last of these should really be quite a bit faster than the first of >>> these, but I'd be interested in seeing specific measurements for each if >>> that's possible! >> >> Not sure if I understand networking well enough to set these up quickly. >> Does netrate use one of (3) or (4) now? > > (3) and (4) are effectively the same thing, I think, since connect(2) should > force the selection of a source IP address, but I think it's not a bad idea > to confirm that. :-) > > The structure of the desired micro-benchmark here is basically: > ... 
I hacked netblast.c to do this:

% --- /usr/src/tools/tools/netrate/netblast/netblast.c	Fri Dec 16 17:02:44 2005
% +++ netblast.c	Mon Jul 14 21:26:52 2008
% @@ -44,9 +44,11 @@
%  {
% 
% -	fprintf(stderr, "netblast [ip] [port] [payloadsize] [duration]\n");
% -	exit(-1);
% +	fprintf(stderr, "netblast ip port payloadsize duration bind connect\n");
% +	exit(1);
%  }
% 
% +static int	gconnected;
%  static int	global_stop_flag;
% +static struct sockaddr_in *gsin;
% 
%  static void
% @@ -116,6 +118,13 @@
%  			counter++;
%  		}
% -		if (send(s, packet, packet_len, 0) < 0)
% +		if (gconnected && send(s, packet, packet_len, 0) < 0) {
%  			send_errors++;
% +			usleep(1000);
% +		}
% +		if (!gconnected && sendto(s, packet, packet_len, 0,
% +		    (struct sockaddr *)gsin, sizeof(*gsin)) < 0) {
% +			send_errors++;
% +			usleep(1000);
% +		}
%  		send_calls++;
%  	}
% @@ -146,9 +155,10 @@
%  	struct sockaddr_in sin;
%  	char *dummy, *packet;
% -	int s;
% +	int bind_desired, connect_desired, s;
% 
% -	if (argc != 5)
% +	if (argc != 7)
%  		usage();
% 
% +	gsin = &sin;
%  	bzero(&sin, sizeof(sin));
%  	sin.sin_len = sizeof(sin);
% @@ -176,4 +186,7 @@
%  		usage();
% 
% +	bind_desired = (strcmp(argv[5], "b") == 0);
% +	connect_desired = (strcmp(argv[6], "c") == 0);
% +
%  	packet = malloc(payloadsize);
%  	if (packet == NULL) {
% @@ -189,7 +202,19 @@
%  	}
% 
% -	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
% -		perror("connect");
% -		return (-1);
% +	if (bind_desired) {
% +		struct sockaddr_in osin;
% +
% +		osin = sin;
% +		if (inet_aton("0", &sin.sin_addr) == 0)
% +			perror("inet_aton(0)");
% +		if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
% +			err(-1, "bind");
% +		sin = osin;
% +	}
% +
% +	if (connect_desired) {
% +		if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
% +			err(-1, "connect");
% +		gconnected = 1;
%  	}
% 

This also fixes some bugs in usage() (bogus [] around non-optional args and bogus exit code) and adds a sleep after send failure.
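The four socket setups from Robert Watson's list can also be reproduced outside netblast. The following is a minimal sketch assuming a POSIX sockets API; udp_sender_open() and udp_send_one() are hypothetical helper names, not part of netblast or of any FreeBSD library. Passing local == NULL skips bind(2); binding to INADDR_ANY or to a specific local address gives cases (1)/(3) or (2)/(4), and connect_dst chooses between send(2) on a connected socket and sendto(2) with an address on every call. Like the patch, it backs off for about 1 ms after a failed send, but (as a deliberate choice in this sketch) only on ENOBUFS, i.e. a full interface queue.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a UDP socket for one of the four cases.
 *   local == NULL     -> no explicit bind(2)
 *   local != NULL     -> bind(2) to *local (INADDR_ANY or a specific IP,
 *                        with a specific port, or 0 for an ephemeral one)
 *   connect_dst != 0  -> connect(2) to *dst so send(2) can be used later
 * Returns a descriptor, or -1 with errno set.
 */
static int
udp_sender_open(const struct sockaddr_in *dst, const struct sockaddr_in *local,
    int connect_dst)
{
	int s, saved;

	assert(dst != NULL);
	s = socket(PF_INET, SOCK_DGRAM, 0);
	if (s < 0)
		return (-1);
	if (local != NULL &&
	    bind(s, (const struct sockaddr *)local, sizeof(*local)) < 0)
		goto fail;
	if (connect_dst &&
	    connect(s, (const struct sockaddr *)dst, sizeof(*dst)) < 0)
		goto fail;
	return (s);
fail:
	saved = errno;
	close(s);
	errno = saved;
	return (-1);
}

/*
 * Hypothetical helper: send one datagram.  A connected socket uses send(2);
 * otherwise the destination is passed on every call with sendto(2).
 * On ENOBUFS (interface queue full) sleep about 1 ms instead of spinning.
 */
static ssize_t
udp_send_one(int s, const void *buf, size_t len,
    const struct sockaddr_in *dst, int connected)
{
	ssize_t n;
	int saved;

	if (connected)
		n = send(s, buf, len, 0);
	else
		n = sendto(s, buf, len, 0, (const struct sockaddr *)dst,
		    sizeof(*dst));
	if (n < 0 && errno == ENOBUFS) {
		struct timespec ts = { 0, 1000000 };	/* 1 ms */

		saved = errno;
		nanosleep(&ts, NULL);
		errno = saved;	/* keep ENOBUFS visible to the caller */
	}
	return (n);
}
```

For case (3), a benchmark loop would call udp_sender_open(&dst, &any, 1) once, where any holds INADDR_ANY and the chosen local port, and then udp_send_one(s, packet, len, &dst, 1) per packet; for case (1), pass 0 for both flags so that every packet goes through sendto(2) with the destination address.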
Without the sleep, netblast distorts the measurements by taking 100% CPU. This depends on kernel queues having enough buffering to not run dry during the sleep time (rounded up to a tick boundary). I use ifq_maxlen = DRIVER_TX_RING_CNT + imax(2 * tick / 4, 10000) = 10512 for DRIVER = bge and HZ = 100. This is actually wrong now. The magic 2 is to round up to a tick boundary and the magic 4 is for bge taking a minimum of 4 usec per packet on old hardware, but bge actually takes about 1.5 usec on the test hardware and I'd like it to take 0.66 usec. The queues rarely run dry in practice, but running dry just a few times for a few msec each would explain some anomalies. Old SGI ttcp uses a select timeout of 18 msec here. nttcp and netsend use more sophisticated methods that don't work unless HZ is too small. It's just impossible for a program to schedule its sleeps with a fine enough resolution to ensure waking up before the queue runs dry, unless HZ is too small or the queue is too large. select() for writing doesn't work for the queue part of socket i/o.

Results:
~5.2 sendto (1): 630 kpps  98% CPU 11   cm/p (cache misses/packet (min))
-cur sendto:     590 kpps 100% CPU 10   cm/p (July 8 -current)
(2): no significant difference - see below
~5.2 send (3):   620 kpps  75% CPU  9.5 cm/p
-cur send:       520 kpps  60% CPU  8   cm/p
(4): no significant difference - see below

send() has lower CPU overheads as expected. For some reason, send() gets lower throughput than sendto(). I think the reason is just that the queue runs dry due to the lower CPU overhead making it possible for the userland sender to outrun the hardware -- userland sees more ENOBUFS and sleeps more often, so it sometimes sleeps too long due to my out-of-date hack for increasing the queue length. For some reason, this affects -current much more than ~5.2 (the bge drivers in each have lots of modifications which are supposed to be equivalent here). Probably the same reason.
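The queue-length arithmetic above can be checked with a short C sketch. The helper name ifq_maxlen_for() is hypothetical; the sketch assumes that tick is the clock period in microseconds (1000000 / HZ) and that the bge transmit ring has 512 entries, which is what the 10512 total implies. The per-packet time is taken in nanoseconds so that the 1.5 usec and 0.66 usec figures can be plugged in too.

```c
#include <assert.h>

/* Larger of two longs (the kernel's imax() does this for ints). */
static long
max_long(long a, long b)
{
	return (a > b ? a : b);
}

/*
 * Hypothetical helper reproducing the formula from the message:
 *   ifq_maxlen = TX_RING_CNT + imax(2 * tick / usec_per_pkt, 10000)
 * with tick = 1000000 / hz microseconds.  The per-packet time is given
 * in nanoseconds; the factor 2 allows for a sleep being rounded up to
 * the next tick boundary.
 */
static long
ifq_maxlen_for(long tx_ring_cnt, long hz, long nsec_per_pkt)
{
	long tick_usec, pkts_per_2_ticks;

	assert(hz > 0 && nsec_per_pkt > 0);
	tick_usec = 1000000 / hz;
	/* Packets the NIC can drain while the sender sleeps ~2 ticks. */
	pkts_per_2_ticks = 2 * tick_usec * 1000 / nsec_per_pkt;
	return (tx_ring_cnt + max_long(pkts_per_2_ticks, 10000));
}
```

Under these assumptions ifq_maxlen_for(512, 100, 4000) gives 10512, matching the message. At HZ = 100 the 10000 floor wins whenever the NIC needs 2 usec or more per packet, so the magic 4 does not change the result there; plugging in 1500 ns gives 13845 and 660 ns gives 30815.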
sendto() still has 5-10% higher overhead in -current than in ~5.2 and runs out of CPU. It runs out under ~5.2 testing ttcp too. > If you look at the design of the higher performance UDP applications, they > will generally bind a specific IP (perhaps every IP on the host with its own > socket), and if they do sustained communication to a specific endpoint they > will use connect(2) rather than providing an address for each send(2) system > call to the kernel. I couldn't see any effect from binding. I'm only testing sending, and it doesn't seem to be possible to bind to anything except local addresses (0.0.0.0, the NIC's address and 127.0.0.1) but these seem to be equivalent (with no extra work for translation on every packet?) and seem to be used by default anyway. In the above, sin.sin_addr has to be set to the receiver's IP from the command line (else it defaults to a local address), and the above temporarily sets it back to 0.0.0.0 so as to use the same sin for the local bind(). > udp_output(2) makes the trade-offs there fairly clear: with the most recent > rev, the optimal case is one connect(2) has been called, allowing a single > inpcb read lock and no global data structure access, vs. an application > calling sendto(2) for each system call and the local binding remaining > INADDR_ANY. Middle ground applications, such as named(8) will force a local > binding using bind(2), but then still have to pass an address to each > sendto(2). In the future, this case will be further optimized in our code by > using a global read lock rather than a global write lock: we have to check > for collisions, but we don't actually have to reserve the new 4-tuple for the > UDP socket as it's an ephemeral association rather than a connect(2). The July 8 -current should have this rev. Note that I'm not testing SMP or stressing locking, or nontrivial routing tables, or forwarding, and don't plan to.
UP with a direct connection is hard enough and short of CPU enough to understand and make efficient. Locking barely shows up in older tests, only partly because it is mostly inline. Bruce From bms at FreeBSD.org Mon Jul 14 13:44:35 2008 From: bms at FreeBSD.org (Bruce M. Simpson) Date: Mon Jul 14 13:44:41 2008 Subject: BPF problems on FreeBSD 7.0 In-Reply-To: <20080711202737.GB27418@icir.org> References: <20080711202737.GB27418@icir.org> Message-ID: <487B5840.3000401@FreeBSD.org> Robin Sommer wrote: > Hi all, > > we're seeing some strange effects with our libpcap-based application > (the Bro network intrusion detection system) on a FreeBSD 7-RELEASE > system. As the application has always been running fine on 6.x, > we're wondering whether this might be triggered by any of the > changes that went into 7. > ... > I'm wondering whether anybody here has seen something similar or > might have an idea where to start looking for the cause. Any ideas? > One place to start might be: netstat -B output in 7.x (I *think* this got MFCed), this will let us see what the drop count is for the Bro process, and what the flags are for the open BPF descriptors in the system. I'm not hot on current BPF internals, but I hazard a guess this is related to BPF descriptor buffering -- an area where there have been changes, some of which I've eyeballed. cheers BMS From gnn at FreeBSD.org Mon Jul 14 17:42:46 2008 From: gnn at FreeBSD.org (gnn@FreeBSD.org) Date: Mon Jul 14 17:42:53 2008 Subject: What's the deal with hardware checksum and net.inet.udp.checksum? In-Reply-To: <20080710220201.K34050@fledge.watson.org> References: <20080710114028.T34050@fledge.watson.org> <20080710220201.K34050@fledge.watson.org> Message-ID: Ahhhh, thanks, George From gnn at freebsd.org Mon Jul 14 21:45:56 2008 From: gnn at freebsd.org (gnn@freebsd.org) Date: Mon Jul 14 21:46:02 2008 Subject: igb doesn't compile in STABLE? 
Message-ID: Howdy, As of today, this afternoon, I see the following:

linking kernel.debug
e1000_api.o(.text+0xad9): In function `e1000_setup_init_funcs':
../../../dev/em/e1000_api.c:343: undefined reference to `e1000_init_function_pointers_80003es2lan'
e1000_api.o(.text+0xae8):../../../dev/em/e1000_api.c:340: undefined reference to `e1000_init_function_pointers_82571'
e1000_api.o(.text+0xafa):../../../dev/em/e1000_api.c:334: undefined reference to `e1000_init_function_pointers_82541'
e1000_api.o(.text+0xb0c):../../../dev/em/e1000_api.c:328: undefined reference to `e1000_init_function_pointers_82540'
e1000_api.o(.text+0xb1e):../../../dev/em/e1000_api.c:321: undefined reference to `e1000_init_function_pointers_82543'
e1000_api.o(.text+0xb30):../../../dev/em/e1000_api.c:316: undefined reference to `e1000_init_function_pointers_82542'
e1000_ich8lan.o(.text+0x98c): In function `e1000_valid_nvm_bank_detect_ich8lan':
../../../dev/em/e1000_ich8lan.c:1032: undefined reference to `e1000_translate_register_82542'
e1000_ich8lan.o(.text+0xc32): In function `e1000_acquire_swflag_ich8lan':
../../../dev/em/e1000_ich8lan.c:424: undefined reference to `e1000_translate_register_82542'
e1000_ich8lan.o(.text+0xc6e):../../../dev/em/e1000_ich8lan.c:426: undefined reference to `e1000_translate_register_82542'
e1000_ich8lan.o(.text+0xc9d):../../../dev/em/e1000_ich8lan.c:422: undefined reference to `e1000_translate_register_82542'
e1000_ich8lan.o(.text+0xced):../../../dev/em/e1000_ich8lan.c:436: undefined reference to `e1000_translate_register_82542'
e1000_ich8lan.o(.text+0x16bf):../../../dev/em/e1000_ich8lan.c:2700: more undefined references to `e1000_translate_register_82542' follow
*** Error code 1

Thoughts?

Later, George
From jfvogel at gmail.com Mon Jul 14 22:17:37 2008 From: jfvogel at gmail.com (Jack Vogel) Date: Mon Jul 14 22:17:45 2008 Subject: igb doesn't compile in STABLE?
In-Reply-To: References: Message-ID: <2a41acea0807141453s7235d894i31a744a0f673fcc0@mail.gmail.com> Just guessing, did someone change conf/files maybe?? Jack On Mon, Jul 14, 2008 at 2:44 PM, wrote: > Howdy, > > As of today, this afternoon, I see the following:
>
> linking kernel.debug
> e1000_api.o(.text+0xad9): In function `e1000_setup_init_funcs':
> ../../../dev/em/e1000_api.c:343: undefined reference to `e1000_init_function_pointers_80003es2lan'
> e1000_api.o(.text+0xae8):../../../dev/em/e1000_api.c:340: undefined reference to `e1000_init_function_pointers_82571'
> e1000_api.o(.text+0xafa):../../../dev/em/e1000_api.c:334: undefined reference to `e1000_init_function_pointers_82541'
> e1000_api.o(.text+0xb0c):../../../dev/em/e1000_api.c:328: undefined reference to `e1000_init_function_pointers_82540'
> e1000_api.o(.text+0xb1e):../../../dev/em/e1000_api.c:321: undefined reference to `e1000_init_function_pointers_82543'
> e1000_api.o(.text+0xb30):../../../dev/em/e1000_api.c:316: undefined reference to `e1000_init_function_pointers_82542'
> e1000_ich8lan.o(.text+0x98c): In function `e1000_valid_nvm_bank_detect_ich8lan':
> ../../../dev/em/e1000_ich8lan.c:1032: undefined reference to `e1000_translate_register_82542'
> e1000_ich8lan.o(.text+0xc32): In function `e1000_acquire_swflag_ich8lan':
> ../../../dev/em/e1000_ich8lan.c:424: undefined reference to `e1000_translate_register_82542'
> e1000_ich8lan.o(.text+0xc6e):../../../dev/em/e1000_ich8lan.c:426: undefined reference to `e1000_translate_register_82542'
> e1000_ich8lan.o(.text+0xc9d):../../../dev/em/e1000_ich8lan.c:422: undefined reference to `e1000_translate_register_82542'
> e1000_ich8lan.o(.text+0xced):../../../dev/em/e1000_ich8lan.c:436: undefined reference to `e1000_translate_register_82542'
> e1000_ich8lan.o(.text+0x16bf):../../../dev/em/e1000_ich8lan.c:2700: more undefined references to `e1000_translate_register_82542' follow
> *** Error code 1
>
>
> Thoughts?
> > Later, > George > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From freebsdlists at bsdunix.ch Tue Jul 15 12:31:50 2008 From: freebsdlists at bsdunix.ch (Thomas Vogt) Date: Tue Jul 15 12:31:57 2008 Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94) Message-ID: <487C9457.5080609@bsdunix.ch> Hello Since i updated my FreeBSD 6.3 dns server with the latest bind version in the ports (dns/bind94) my system is flooding my log with "too many open file descriptors" messages. Is there something i can do?

Example:
Jul 15 12:08:38 intern named[50840]: socket: too many open file descriptors
Jul 15 12:09:05 intern last message repeated 68 times

sysctl:
kern.ipc.somaxconn=4096
kern.ipc.nmbclusters=65536
kern.ipc.maxsockets=204800
net.inet.tcp.sendspace=65535
net.inet.tcp.recvspace=65535
net.inet.udp.recvspace=65535

loader.conf:
userconfig_script_load="YES"
kern.maxdsiz="900M"
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=100

System:
FreeBSD intern.lan 6.3-RELEASE-p2 FreeBSD 6.3-RELEASE-p2 #4: Fri May 16 11:40:24 UTC 2008 root@intern.lan:/usr/obj/usr/src/sys/UP6 i386

netstat -m
517/773/1290 mbufs in use (current/cache/total)
513/261/774/65536 mbuf clusters in use (current/cache/total/max)
513/255 mbuf+clusters out of packet secondary zone in use (current/cache)
0/0/0/0 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
1155K/715K/1870K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/7/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
105 calls to protocol drain routines

Regards, Thomas
From kris at FreeBSD.org Tue Jul 15 13:14:25 2008 From: kris at FreeBSD.org (Kris Kennaway) Date: Tue Jul 15 13:14:37 2008 Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94) In-Reply-To: <487C9457.5080609@bsdunix.ch> References: <487C9457.5080609@bsdunix.ch> Message-ID: <487CA29F.6080500@FreeBSD.org> Thomas Vogt wrote: > Hello > > Since i updated my FreeBSD 6.3 dns server with the latest bind version > in the ports (dns/bind94) my system is flooding my log with "too many > open file descriptors" messages. > > Is there something i can do? >
> Example:
> Jul 15 12:08:38 intern named[50840]: socket: too many open file descriptors
> Jul 15 12:09:05 intern last message repeated 68 times

Is this a busy name server handling thousands of queries per second? If so, the solution, perhaps not surprisingly, is to increase the number of file descriptors :)

kern.maxfiles: 12328
kern.maxfilesperproc: 11095

Kris
From gnn at freebsd.org Tue Jul 15 17:04:25 2008 From: gnn at freebsd.org (gnn@freebsd.org) Date: Tue Jul 15 17:04:32 2008 Subject: igb doesn't compile in STABLE? In-Reply-To: <2a41acea0807141453s7235d894i31a744a0f673fcc0@mail.gmail.com> References: <2a41acea0807141453s7235d894i31a744a0f673fcc0@mail.gmail.com> Message-ID: At Mon, 14 Jul 2008 14:53:16 -0700, Jack Vogel wrote: > > Just guessing, did someone change conf/files maybe?? > If you build a STABLE kernel with igb AND em then things work and the kernel uses em. I'm not sure which thing needs to be changed in conf/files or otherwise though. Later, George From jfvogel at gmail.com Tue Jul 15 17:07:24 2008 From: jfvogel at gmail.com (Jack Vogel) Date: Tue Jul 15 17:07:35 2008 Subject: igb doesn't compile in STABLE? In-Reply-To: References: <2a41acea0807141453s7235d894i31a744a0f673fcc0@mail.gmail.com> Message-ID: <2a41acea0807151007q29a783c4r2ae63c5a631952ba@mail.gmail.com> Oh, so the problem is if igb alone is defined?

On Tue, Jul 15, 2008 at 10:04 AM, wrote:
> At Mon, 14 Jul 2008 14:53:16 -0700,
> Jack Vogel wrote:
>>
>> Just guessing, did someone change conf/files maybe??
>>
>
> If you build a STABLE kernel with igb AND em then things work and the
> kernel uses em.
>
> I'm not sure which thing needs to be changed in conf/files or
> otherwise though.
>
> Later,
> George
>

From gnn at freebsd.org Tue Jul 15 17:32:07 2008
From: gnn at freebsd.org (gnn@freebsd.org)
Date: Tue Jul 15 17:32:14 2008
Subject: igb doesn't compile in STABLE?
In-Reply-To: <2a41acea0807151007q29a783c4r2ae63c5a631952ba@mail.gmail.com>
References: <2a41acea0807141453s7235d894i31a744a0f673fcc0@mail.gmail.com> <2a41acea0807151007q29a783c4r2ae63c5a631952ba@mail.gmail.com>
Message-ID:

At Tue, 15 Jul 2008 10:07:22 -0700,
Jack Vogel wrote:
>
> Oh, so the problem is if igb alone is defined?
>

Yes.

Best,
George

From jfvogel at gmail.com Tue Jul 15 17:35:59 2008
From: jfvogel at gmail.com (Jack Vogel)
Date: Tue Jul 15 17:36:10 2008
Subject: igb doesn't compile in STABLE?
In-Reply-To:
References: <2a41acea0807141453s7235d894i31a744a0f673fcc0@mail.gmail.com> <2a41acea0807151007q29a783c4r2ae63c5a631952ba@mail.gmail.com>
Message-ID: <2a41acea0807151035w291269abt4ed99989ae45cc8b@mail.gmail.com>

OK, will put on my todo list :)

On Tue, Jul 15, 2008 at 10:31 AM, wrote:
> At Tue, 15 Jul 2008 10:07:22 -0700,
> Jack Vogel wrote:
>>
>> Oh, so the problem is if igb alone is defined?
>>
>
> Yes.
>
> Best,
> George
>

From Jinmei_Tatuya at isc.org Tue Jul 15 18:12:55 2008
From: Jinmei_Tatuya at isc.org (JINMEI Tatuya / 神明達哉)
Date: Tue Jul 15 18:13:02 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <487C9457.5080609@bsdunix.ch>
References: <487C9457.5080609@bsdunix.ch>
Message-ID:

At Tue, 15 Jul 2008 14:13:11 +0200,
Thomas Vogt wrote:

> Since I updated my FreeBSD 6.3 dns server with the latest bind version
> in the ports (dns/bind94) my system is flooding my log with "too many
> open file descriptors" messages.
>
> Is there something I can do?

How many sockets is named actually using while it makes this log
message? Try, e.g.,
% sockstat | grep named | wc -l

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

From freebsdlists at bsdunix.ch Tue Jul 15 20:54:15 2008
From: freebsdlists at bsdunix.ch (Thomas Vogt)
Date: Tue Jul 15 20:54:22 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To:
References: <487C9457.5080609@bsdunix.ch>
Message-ID: <2A7CBD67-7532-4B13-82DD-A6EF5DEAA6BD@bsdunix.ch>

Hello

On 15.07.2008 at 20:12, JINMEI Tatuya / 神明達哉 wrote:

> At Tue, 15 Jul 2008 14:13:11 +0200,
> Thomas Vogt wrote:
>
>> Since I updated my FreeBSD 6.3 dns server with the latest bind
>> version
>> in the ports (dns/bind94) my system is flooding my log with "too many
>> open file descriptors" messages.
>>
>> Is there something I can do?
>
> How many sockets is named actually using while it makes this log
> message?
> Try, e.g.,
> % sockstat | grep named | wc -l

Not that many:
sockstat | grep named | wc -l
996

Regards,
Thomas

Thomas Vogt

From Jinmei_Tatuya at isc.org Tue Jul 15 20:59:07 2008
From: Jinmei_Tatuya at isc.org (JINMEI Tatuya / 神明達哉)
Date: Tue Jul 15 20:59:14 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <2A7CBD67-7532-4B13-82DD-A6EF5DEAA6BD@bsdunix.ch>
References: <487C9457.5080609@bsdunix.ch> <2A7CBD67-7532-4B13-82DD-A6EF5DEAA6BD@bsdunix.ch>
Message-ID:

At Tue, 15 Jul 2008 22:54:11 +0200,
Thomas Vogt wrote:

> >> Since I updated my FreeBSD 6.3 dns server with the latest bind
> >> version
> >> in the ports (dns/bind94) my system is flooding my log with "too many
> >> open file descriptors" messages.
> >>
> >> Is there something I can do?
> >
> > How many sockets is named actually using while it makes this log
> > message? Try, e.g.,
> > % sockstat | grep named | wc -l
>
> Not that many:
> sockstat | grep named | wc -l
> 996

Ah, it's actually quite a lot in this context:-)

If that's regularly happening, I'm afraid recent P1 versions don't
handle that well, and I recommend you try 9.4.3b2 or 9.5.1b1.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

From kris at FreeBSD.org Tue Jul 15 21:09:28 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Tue Jul 15 21:09:35 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To:
References: <487C9457.5080609@bsdunix.ch> <2A7CBD67-7532-4B13-82DD-A6EF5DEAA6BD@bsdunix.ch>
Message-ID: <487D120A.6010001@FreeBSD.org>

JINMEI Tatuya / 神明達哉 wrote:
> At Tue, 15 Jul 2008 22:54:11 +0200,
> Thomas Vogt wrote:
>
>>>> Since I updated my FreeBSD 6.3 dns server with the latest bind
>>>> version
>>>> in the ports (dns/bind94) my system is flooding my log with "too many
>>>> open file descriptors" messages.
>>>>
>>>> Is there something I can do?
>>> How many sockets is named actually using while it makes this log
>>> message? Try, e.g.,
>>> % sockstat | grep named | wc -l
>> Not that many:
>> sockstat | grep named | wc -l
>> 996
>
> Ah, it's actually quite a lot in this context:-)
>
> If that's regularly happening, I'm afraid recent P1 versions don't
> handle that well, and I recommend you try 9.4.3b2 or 9.5.1b1.

Or increase the number of file descriptors as a workaround, per my email :)

Kris

From Jinmei_Tatuya at isc.org Tue Jul 15 21:18:41 2008
From: Jinmei_Tatuya at isc.org (JINMEI Tatuya / 神明達哉)
Date: Tue Jul 15 21:18:48 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <487D120A.6010001@FreeBSD.org>
References: <487C9457.5080609@bsdunix.ch> <2A7CBD67-7532-4B13-82DD-A6EF5DEAA6BD@bsdunix.ch> <487D120A.6010001@FreeBSD.org>
Message-ID:

At Tue, 15 Jul 2008 23:09:30 +0200,
Kris Kennaway wrote:

> > If that's regularly happening, I'm afraid recent P1 versions don't
> > handle that well, and I recommend you try 9.4.3b2 or 9.5.1b1.
>
> Or increase the number of file descriptors as a workaround, per my email :)

Does FreeBSD allow an application to increase FD_SETSIZE (at its
compilation time)? I thought FD_SETSIZE defaults to 1024. Any
9.x.y-P1 versions can only open FD_SETSIZE sockets, regardless of the
# FDs limit.

Besides, I guess that the P1 versions severely suffer from heavy
overhead of select(2) when it regularly opens more than 1000 sockets.
Even if 'too many open file' messages are gone, many users won't
accept the increased load due to the overhead. Beta versions use
kqueue, eliminating the fundamental overhead as well as the (too low)
limitation of # of descriptors.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.
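
Jinmei's point can be checked on any host: whatever kern.maxfiles and kern.maxfilesperproc allow, a select(2)-based loop can only track descriptors numbered below FD_SETSIZE (1024 by default), and the 996 sockets counted with sockstat are already close to that ceiling. The sketch below is not BIND code; it assumes only POSIX getrlimit(2) with RLIMIT_NOFILE and whatever FD_SETSIZE the program is compiled with, and the helper names select_usable_fds() and report_fd_limits() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/*
 * Hypothetical helper: how many descriptor numbers a select(2)-based loop
 * can actually watch, given a per-process open-files limit. A standard
 * fd_set only has room for descriptors 0 .. FD_SETSIZE-1, so raising the
 * limit beyond FD_SETSIZE does not help such a loop.
 */
static long select_usable_fds(long open_files_limit)
{
    assert(open_files_limit >= 0);
    return open_files_limit < (long)FD_SETSIZE ? open_files_limit
                                               : (long)FD_SETSIZE;
}

/*
 * Hypothetical helper: print the RLIMIT_NOFILE soft and hard limits of the
 * calling process next to the select(2) ceiling. Returns 0 on success.
 */
static int report_fd_limits(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* rlim_t is unsigned and may be RLIM_INFINITY; clamp before narrowing. */
    long soft = rl.rlim_cur < (rlim_t)LONG_MAX ? (long)rl.rlim_cur : LONG_MAX;

    printf("open files: soft %llu, hard %llu; select(2) can use %ld"
           " (FD_SETSIZE = %ld)\n",
           (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max,
           select_usable_fds(soft), (long)FD_SETSIZE);
    return 0;
}
```

With Kris's example kern.maxfilesperproc of 11095 as the limit, select_usable_fds() still returns FD_SETSIZE, which matches Jinmei's remark that the P1 releases stop at FD_SETSIZE sockets regardless of the descriptor limit.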

From robin at icir.org Tue Jul 15 21:20:13 2008
From: robin at icir.org (Robin Sommer)
Date: Tue Jul 15 21:20:20 2008
Subject: BPF problems on FreeBSD 7.0
In-Reply-To: <487B5840.3000401@FreeBSD.org>
References: <20080711202737.GB27418@icir.org> <487B5840.3000401@FreeBSD.org>
Message-ID: <20080715212013.GA91123@icir.org>

On Mon, Jul 14, 2008 at 14:44 +0100, Bruce M. Simpson wrote:

> One place to start might be: netstat -B output in 7.x (I *think* this got
> MFCed), this will let us see what the drop count is for the Bro process,
> and what the flags are for the open BPF descriptors in the system.

Thanks for the suggestion. Here's the netstat -B output at the time
it has stalled (after about 6 hours of working normally):

  Pid  Netif   Flags       Recv     Drop    Match   Sblen   Hblen Command
14557  nxge0 p--s--- 2162189525 32514465 42815457 4194248 4194258 bro

Top shows:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
14557 bro         1 -58    0   272M   267M         5  25:53  0.00% bro

A few minutes after starting the process, when Bro was still working
fine, a netstat -B output was:

# netstat -B
  Pid  Netif   Flags       Recv     Drop    Match   Sblen   Hblen Command
14557  nxge0 p--s---    4779235        0    94967       0       0 bro

Thanks,

Robin

--
Robin Sommer * Phone +1 (510) 666-2886 * robin@icir.org
ICSI/LBNL    * Fax   +1 (510) 666-2956 *   www.icir.org

From julian at elischer.org Tue Jul 15 21:27:36 2008
From: julian at elischer.org (Julian Elischer)
Date: Tue Jul 15 21:28:00 2008
Subject: BPF problems on FreeBSD 7.0
In-Reply-To: <20080715212013.GA91123@icir.org>
References: <20080711202737.GB27418@icir.org> <487B5840.3000401@FreeBSD.org> <20080715212013.GA91123@icir.org>
Message-ID: <487D15C7.3040700@elischer.org>

Robin Sommer wrote:
> On Mon, Jul 14, 2008 at 14:44 +0100, Bruce M. Simpson wrote:
>
>> One place to start might be: netstat -B output in 7.x (I *think* this got
>> MFCed), this will let us see what the drop count is for the Bro process,
>> and what the flags are for the open BPF descriptors in the system.
>
> Thanks for the suggestion. Here's the netstat -B output at the time
> it has stalled (after about 6 hours of working normally):
>
>   Pid  Netif   Flags       Recv     Drop    Match   Sblen   Hblen Command
> 14557  nxge0 p--s--- 2162189525 32514465 42815457 4194248 4194258 br

The Recv number is JUST past 2^31. At your rate of receiving packets,
it passed that value about 2 minutes before this snapshot was taken.

>
> Top shows:
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> 14557 bro         1 -58    0   272M   267M         5  25:53  0.00% bro
>
>
>
> A few minutes after starting the process, when Bro was still working
> fine, a netstat -B output was:
>
> # netstat -B
>   Pid  Netif   Flags       Recv     Drop    Match   Sblen   Hblen Command
> 14557  nxge0 p--s---    4779235        0    94967       0       0 bro
>
> Thanks,
>
> Robin
>

From kris at FreeBSD.org Tue Jul 15 21:28:21 2008
From: kris at FreeBSD.org (Kris Kennaway)
Date: Tue Jul 15 21:28:39 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To:
References: <487C9457.5080609@bsdunix.ch> <2A7CBD67-7532-4B13-82DD-A6EF5DEAA6BD@bsdunix.ch> <487D120A.6010001@FreeBSD.org>
Message-ID: <487D1677.8010900@FreeBSD.org>

JINMEI Tatuya / 神明達哉 wrote:
> At Tue, 15 Jul 2008 23:09:30 +0200,
> Kris Kennaway wrote:
>
>>> If that's regularly happening, I'm afraid recent P1 versions don't
>>> handle that well, and I recommend you try 9.4.3b2 or 9.5.1b1.
>> Or increase the number of file descriptors as a workaround, per my email :)
>
> Does FreeBSD allow an application to increase FD_SETSIZE (at its
> compilation time)? I thought FD_SETSIZE defaults to 1024. Any
> 9.x.y-P1 versions can only open FD_SETSIZE sockets, regardless of the
> # FDs limit.
>
> Besides, I guess that the P1 versions severely suffer from heavy
> overhead of select(2) when it regularly opens more than 1000 sockets.
> Even if 'too many open file' messages are gone, many users won't
> accept the increased load due to the overhead.
> Beta versions use
> kqueue, eliminating the fundamental overhead as well as the (too low)
> limitation of # of descriptors.

Ah yes, I hadn't thought about select limitations.

Kris

From robin at icir.org Tue Jul 15 21:51:21 2008
From: robin at icir.org (Robin Sommer)
Date: Tue Jul 15 21:51:27 2008
Subject: BPF problems on FreeBSD 7.0
In-Reply-To: <487D15C7.3040700@elischer.org>
References: <20080711202737.GB27418@icir.org> <487B5840.3000401@FreeBSD.org> <20080715212013.GA91123@icir.org> <487D15C7.3040700@elischer.org>
Message-ID: <20080715215120.GB92009@icir.org>

On Tue, Jul 15, 2008 at 14:25 -0700, you wrote:

>> Thanks for the suggestion. Here's the netstat -B output at the time
>> it has stalled (after about 6 hours of working normally):
[...]
> At your rate of receiving packets, it passed that value about
> 2 minutes before this snapshot was taken.

Sorry, I wasn't precise: the process stalled after about 6 hours but
the netstat output is actually from much later (the next day in fact,
because it stalled late at night) when it was still in that state.

Robin

--
Robin Sommer * Phone +1 (510) 666-2886 * robin@icir.org
ICSI/LBNL    * Fax   +1 (510) 666-2956 *   www.icir.org

From bakul at bitblocks.com Tue Jul 15 22:12:33 2008
From: bakul at bitblocks.com (Bakul Shah)
Date: Tue Jul 15 22:12:38 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: Your message of "Tue, 15 Jul 2008 14:18:41 PDT."
Message-ID: <20080715221231.E087C5B46@mail.bitblocks.com>

On Tue, 15 Jul 2008 14:18:41 PDT JINMEI Tatuya / 神明達哉 wrote:
> At Tue, 15 Jul 2008 23:09:30 +0200,
> Kris Kennaway wrote:
> >
> > > If that's regularly happening, I'm afraid recent P1 versions don't
> > > handle that well, and I recommend you try 9.4.3b2 or 9.5.1b1.
> >
> > Or increase the number of file descriptors as a workaround, per my email :)
>
> Does FreeBSD allow an application to increase FD_SETSIZE (at its
> compilation time)?
> I thought FD_SETSIZE defaults to 1024. Any
> 9.x.y-P1 versions can only open FD_SETSIZE sockets, regardless of the
> # FDs limit.

$ cvs log /sys/kern/kern_generic.c
...
revision 1.19
date: 1996/08/20 07:17:48;  author: smpatel;  state: Exp;  lines: +43 -15
Remove the kernel FD_SETSIZE limit for select().
...

Unless things have changed, you can completely ignore FD_SETSIZE
(& struct fd_set) and decide at runtime how many fds you want in a
select read/write set (subject to the openfiles limit).

Hmm... things have reverted back....

cvs blame -r1.71 /sys/kern/kern_generic.c # the earliest reversal I can find
...
1.71 (peter 07-Feb-01):     * This is kinda bogus. We have fd limits, but that doesn't
1.71 (peter 07-Feb-01):     * map too well to the size of the pfd[] array. Make sure
1.71 (peter 07-Feb-01):     * we let the process use at least FD_SETSIZE entries.
1.71 (peter 07-Feb-01):     * The specs say we only have to support OPEN_MAX entries (64).
1.71 (peter 07-Feb-01):     */
1.71 (peter 07-Feb-01):    lim = min((int)p->p_rlimit[RLIMIT_NOFILE].rlim_cur, maxfilesperproc);
1.71 (peter 07-Feb-01):    lim = min(lim, FD_SETSIZE);
1.71 (peter 07-Feb-01):    if (nfds > lim)
1.71 (peter 07-Feb-01):        return (EINVAL);

Sigh.... This is a mistake. I don't see why a user is not allowed to
select on all the fds he can open. The corresponding log indicates
perhaps the author didn't know select used to work for # of fds >
FD_SETSIZE.

revision 1.71
date: 2001/02/07 23:28:01;  author: peter;  state: Exp;  lines: +16 -8
The code I picked up from NetBSD in '97 had a nasty bug. It limited
the index of the pollfd array to the number of fd's currently open, not
the maximum number of fd's. ie: if you had 0,1,2 open, you could not use
pollfd slots higher than 20. The specs say we only have to support
OPEN_MAX [64] entries but we allow way more than that.

> Besides, I guess that the P1 versions severely suffer from heavy
> overhead of select(2) when it regularly opens more than 1000 sockets.
> Even if 'too many open file' messages are gone, many users won't
> accept the increased load due to the overhead. Beta versions use
> kqueue, eliminating the fundamental overhead as well as the (too low)
> limitation of # of descriptors.

Or more portably you can use poll(2).

From Jinmei_Tatuya at isc.org Tue Jul 15 22:39:10 2008
From: Jinmei_Tatuya at isc.org (JINMEI Tatuya / 神明達哉)
Date: Tue Jul 15 22:39:16 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <20080715221231.E087C5B46@mail.bitblocks.com>
References: <20080715221231.E087C5B46@mail.bitblocks.com>
Message-ID:

At Tue, 15 Jul 2008 15:12:31 -0700,
Bakul Shah wrote:

> > Besides, I guess that the P1 versions severely suffer from heavy
> > overhead of select(2) when it regularly opens more than 1000 sockets.
> > Even if 'too many open file' messages are gone, many users won't
> > accept the increased load due to the overhead. Beta versions use
> > kqueue, eliminating the fundamental overhead as well as the (too low)
> > limitation of # of descriptors.
>
> Or more portably you can use poll(2).

I've not played with poll(2) in BIND9, but as far as I understand it,
it doesn't solve the fundamental overhead issue here. For example,
the application should examine all possible descriptors even if only a
few of them are readable.

Anyway, since this is a FreeBSD specific list, I believe we can safely
assume the existence of kqueue, unless we are talking about a very old
version:-)

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

From bakul at bitblocks.com Tue Jul 15 23:09:19 2008
From: bakul at bitblocks.com (Bakul Shah)
Date: Tue Jul 15 23:09:29 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: Your message of "Tue, 15 Jul 2008 15:39:09 PDT."
Message-ID: <20080715230917.DAC3B5B46@mail.bitblocks.com>

On Tue, 15 Jul 2008 15:39:09 PDT JINMEI Tatuya / 神明達哉 wrote:
> At Tue, 15 Jul 2008 15:12:31 -0700,
> Bakul Shah wrote:
>
> > > Besides, I guess that the P1 versions severely suffer from heavy
> > > overhead of select(2) when it regularly opens more than 1000 sockets.
> > > Even if 'too many open file' messages are gone, many users won't
> > > accept the increased load due to the overhead. Beta versions use
> > > kqueue, eliminating the fundamental overhead as well as the (too low)
> > > limitation of # of descriptors.
> >
> > Or more portably you can use poll(2).
>
> I've not played with poll(2) in BIND9, but as far as I understand it,
> it doesn't solve the fundamental overhead issue here. For example,
> the application should examine all possible descriptors even if only a
> few of them are readable.

IIRC, when poll() returns n, you only look at the first n
values in the pollfd array so it is a win when you expect a
very small number of fds to be ready. In the select case you
have to test the bit array until you see the last ready fd.

> Anyway, since this is a FreeBSD specific list, I believe we can safely
> assume the existence of kqueue, unless we are talking about a very old
> version:-)

Presumably kqueue has a lower cpu usage until the system gets
loaded at which point polling might win.
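
The recollection about poll() is corrected later in the thread: poll(2) returns how many entries have a nonzero revents, not the length of a ready prefix, so the caller still has to walk the array (in Jinmei's polltest.c the only ready socket sits at index 999). A minimal sketch of that scan, assuming only the POSIX struct pollfd; collect_ready() is a hypothetical helper, not BIND code. It can stop once it has seen as many ready entries as poll(2) reported, but in the worst case it still touches every entry.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/*
 * Hypothetical helper: after poll(2) has returned nready > 0, record the
 * indices of entries whose revents is nonzero. poll(2) reports a count, not
 * which entries fired, so the array must be scanned; the loop stops as soon
 * as nready ready entries have been found. Returns the number recorded.
 */
static int collect_ready(const struct pollfd *pfds, nfds_t nfds, int nready,
                         size_t *ready_idx)
{
    int found = 0;

    assert(pfds != NULL || nfds == 0);
    for (nfds_t i = 0; i < nfds && found < nready; i++) {
        if (pfds[i].revents != 0)
            ready_idx[found++] = (size_t)i;
    }
    return found;
}
```

In a real loop nready would be the value returned by poll(pfds, nfds, timeout). The early exit buys nothing when the ready descriptor is the last one, which is the per-call cost kqueue(2) avoids by handing back only the events that actually fired.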

From julian at elischer.org Tue Jul 15 23:19:15 2008
From: julian at elischer.org (Julian Elischer)
Date: Tue Jul 15 23:19:23 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <20080715230917.DAC3B5B46@mail.bitblocks.com>
References: <20080715230917.DAC3B5B46@mail.bitblocks.com>
Message-ID: <487D2FF0.7000706@elischer.org>

Bakul Shah wrote:
> On Tue, 15 Jul 2008 15:39:09 PDT JINMEI Tatuya / 神明達哉 wrote:
>> At Tue, 15 Jul 2008 15:12:31 -0700,
>> Bakul Shah wrote:
>>
>>>> Besides, I guess that the P1 versions severely suffer from heavy
>>>> overhead of select(2) when it regularly opens more than 1000 sockets.
>>>> Even if 'too many open file' messages are gone, many users won't
>>>> accept the increased load due to the overhead. Beta versions use
>>>> kqueue, eliminating the fundamental overhead as well as the (too low)
>>>> limitation of # of descriptors.
>>> Or more portably you can use poll(2).
>> I've not played with poll(2) in BIND9, but as far as I understand it,
>> it doesn't solve the fundamental overhead issue here. For example,
>> the application should examine all possible descriptors even if only a
>> few of them are readable.
>
> IIRC, when poll() returns n, you only look at the first n
> values in the pollfd array so it is a win when you expect a
> very small number of fds to be ready. In the select case you
> have to test the bit array until you see the last ready fd.
>
>> Anyway, since this is a FreeBSD specific list, I believe we can safely
>> assume the existence of kqueue, unless we are talking about a very old
>> version:-)
>
> Presumably kqueue has a lower cpu usage until the system gets
> loaded at which point polling might win.

I don't think so, since kqueue only runs code associated with events
that have actually happened, and then only once until it's processed,
whereas last I looked poll had more to do on each call.

Also, kqueue allows you to associate arbitrary identification information
with each event, so you don't need extra code to go from the fd to the
event. It's just way more efficient.

From Jinmei_Tatuya at isc.org Tue Jul 15 23:37:01 2008
From: Jinmei_Tatuya at isc.org (JINMEI Tatuya / 神明達哉)
Date: Tue Jul 15 23:37:07 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <20080715230917.DAC3B5B46@mail.bitblocks.com>
References: <20080715230917.DAC3B5B46@mail.bitblocks.com>
Message-ID:

At Tue, 15 Jul 2008 16:09:17 -0700,
Bakul Shah wrote:

> IIRC, when poll() returns n, you only look at the first n
> values in the pollfd array so it is a win when you expect a
> very small number of fds to be ready. In the select case you
> have to test the bit array until you see the last ready fd.

% uname -a
FreeBSD opt1.jinmei.org 7.0-RC1 FreeBSD 7.0-RC1 #0: Fri Jan 25 15:17:04 PST 2008 root@opt1.jinmei.org:/usr/src/sys/amd64/compile/GENERIC_NOSMP amd64
(please ignore "RC1":-)
% cat polltest.c
(omitted here, see below)
% cc -o polltest polltest.c
% ./polltest
poll returned: 1
999th socket is ready (fd=1002)

You're probably confusing poll(2) with /dev/poll. The latter
behaves as you described (but is not as portable as poll(2)).

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

Contents of polltest.c:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    int i, n;
    struct pollfd pfds[1000];
    struct sockaddr_in sin;
    socklen_t sin_len;
    char buf[16];

    memset(pfds, 0, sizeof(pfds));
    memset(buf, 0, sizeof(buf));
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_len = sizeof(sin);          /* BSD-specific field */
    inet_pton(AF_INET, "127.0.0.1", &sin.sin_addr);

    /* Open 1000 UDP sockets, each bound to an ephemeral port on loopback. */
    for (i = 0; i < 1000; i++) {
        if ((pfds[i].fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0) {
            perror("socket");
            exit(1);
        }
        if (bind(pfds[i].fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
            perror("bind");
            exit(1);
        }
        pfds[i].events = POLLIN;
    }

    /* Make only the last socket readable by sending a datagram to itself. */
    sin_len = sizeof(sin);
    if (getsockname(pfds[999].fd, (struct sockaddr *)&sin, &sin_len) < 0) {
        perror("getsockname");
        exit(1);
    }
    if (sendto(pfds[999].fd, buf, sizeof(buf), 0,
        (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("sendto");
        exit(1);
    }

    n = poll(pfds, 1000, -1);
    printf("poll returned: %d\n", n);
    for (i = 0; i < 1000; i++) {
        if ((pfds[i].revents & POLLIN) != 0) {
            printf("%dth socket is ready (fd=%d)\n", i, pfds[i].fd);
        }
    }

    exit(0);
}

From peterjeremy at optushome.com.au Tue Jul 15 23:43:15 2008
From: peterjeremy at optushome.com.au (Peter Jeremy)
Date: Tue Jul 15 23:43:23 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <20080715230917.DAC3B5B46@mail.bitblocks.com>
References: <20080715230917.DAC3B5B46@mail.bitblocks.com>
Message-ID: <20080715234254.GZ62764@server.vk2pj.dyndns.org>

On 2008-Jul-15 16:09:17 -0700, Bakul Shah wrote:
>IIRC, when poll() returns n, you only look at the first n
>values in the pollfd array so it is a win when you expect a
>very small number of fds to be ready. In the select case you
>have to test the bit array until you see the last ready fd.

No. Both poll(2) and select(2) return the number of FDs ready for
I/O. You need to scan the pollfd or fd_set array until you find that
many FDs ready.

poll(2) is a win if you only need to test a small number of FDs
compared to the number of FDs that the process has open. In the case
of bind, you have a large number of FDs to test, of which you are
only expecting a very small number to be ready - if you don't
treat fd_set as opaque, select(2) allows you to quickly skip large
(roughly wordsize) chunks of uninteresting FDs.

Note that, based on sys_generic.c in 7.x and -CURRENT, poll(2) is
limited to checking FD_SETSIZE descriptors, whilst select(2) has
no upper limit.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.

From smithi at nimnet.asn.au Wed Jul 16 03:40:48 2008
From: smithi at nimnet.asn.au (Ian Smith)
Date: Wed Jul 16 03:40:56 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <487CA29F.6080500@FreeBSD.org>
Message-ID:

On Tue, 15 Jul 2008, Kris Kennaway wrote:
 > Thomas Vogt wrote:
 > > Hello
 > >
 > > Since I updated my FreeBSD 6.3 dns server with the latest bind version
 > > in the ports (dns/bind94) my system is flooding my log with "too many
 > > open file descriptors" messages.
 > >
 > > Is there something I can do?
 > >
 > > Example:
 > > Jul 15 12:08:38 intern named[50840]: socket: too many open file descriptors
 > > Jul 15 12:09:05 intern last message repeated 68 times
 >
 > Is this a busy name server handling thousands of queries per second?
 >
 > If so, the solution, perhaps not surprisingly, is to increase the number
 > of file descriptors :)
 >
 > kern.maxfiles: 12328
 > kern.maxfilesperproc: 11095

Can you disclose the magic incantation for those particular numbers?

cheers, Ian

From alfred at freebsd.org Wed Jul 16 06:10:49 2008
From: alfred at freebsd.org (Alfred Perlstein)
Date: Wed Jul 16 06:10:57 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: <20080715234254.GZ62764@server.vk2pj.dyndns.org>
References: <20080715230917.DAC3B5B46@mail.bitblocks.com> <20080715234254.GZ62764@server.vk2pj.dyndns.org>
Message-ID: <20080716055322.GQ95574@elvis.mu.org>

FWIW, the userland scan of the files is not nearly as bad as what
happens in the kernel when hundreds or thousands of objects are
accessed and blow out the cache, plus the locking that occurs as well.

* Peter Jeremy [080715 16:43] wrote:
> On 2008-Jul-15 16:09:17 -0700, Bakul Shah wrote:
> >IIRC, when poll() returns n, you only look at the first n
> >values in the pollfd array so it is a win when you expect a
> >very small number of fds to be ready. In the select case you
> >have to test the bit array until you see the last ready fd.
>
> No. Both poll(2) and select(2) return the number of FDs ready for
> I/O. You need to scan the pollfd or fd_set array until you find that
> many FDs ready.
>
> poll(2) is a win if you only need to test a small number of FDs
> compared to the number of FDs that the process has open. In the case
> of bind, you have a large number of FDs to test, of which you are
> only expecting a very small number to be ready - if you don't
> treat fd_set as opaque, select(2) allows you to quickly skip large
> (roughly wordsize) chunks of uninteresting FDs.
>
> Note that, based on sys_generic.c in 7.x and -CURRENT, poll(2) is
> limited to checking FD_SETSIZE descriptors, whilst select(2) has
> no upper limit.
>
> --
> Peter Jeremy
> Please excuse any delays as the result of my ISP's inability to implement
> an MTA that is either RFC2821-compliant or matches their claimed behaviour.

--
- Alfred Perlstein

From bakul at bitblocks.com Wed Jul 16 07:19:26 2008
From: bakul at bitblocks.com (Bakul Shah)
Date: Wed Jul 16 07:19:32 2008
Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94)
In-Reply-To: Your message of "Tue, 15 Jul 2008 16:37:00 PDT."
Message-ID: <20080716071925.2F8145B4D@mail.bitblocks.com>

On Tue, 15 Jul 2008 16:37:00 PDT JINMEI Tatuya / 神明達哉 wrote:
>
> You're probably confusing poll(2) with /dev/poll. The latter
> behaves as you described (but is not as portable as poll(2)).

Indeed I am confused. Not sure where I got that idea.

On Tue, 15 Jul 2008 16:17:04 PDT Julian Elischer wrote:
> Bakul Shah wrote:
> > ...
> > Presumably kqueue has a lower cpu usage until the system gets
> > loaded at which point polling might win.
>
> I don't think so, since kqueue only runs code associated with events
> that have actually happened, and then only once until it's processed,
> whereas last I looked poll had more to do on each call.

Yes. poll/select overhead of scanning the entire list is incurred on
each system call + the kernel overhead (as Alfred pointed out later).

On Wed, 16 Jul 2008 09:42:54 +1000 Peter Jeremy wrote:
> Note that, based on sys_generic.c in 7.x and -CURRENT, poll(2) is
> limited to checking FD_SETSIZE descriptors, whilst select(2) has
> no upper limit.

I strike out here as well. I should've read the code much more
carefully or tested select() before opening my mouth.

All in all it was not a good idea to post anything. My apologies for
wasting everyone's time. And thanks all for correcting me without any
flaming!

From eitans at mellanox.co.il Wed Jul 16 08:08:56 2008
From: eitans at mellanox.co.il (Eitan Shefi)
Date: Wed Jul 16 08:09:04 2008
Subject: "ping" with packets larger then 25152 bytes fails.
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD196E6F@mtlexch01.mtl.com>

When I run "ping" between 2 identical FreeBSD hosts, with packets larger
than 25152 bytes, "ping" fails.

Does anyone have an idea what might cause this failure?

Thanks,
Eitan

From peterjeremy at optushome.com.au Wed Jul 16 11:55:54 2008
From: peterjeremy at optushome.com.au (Peter Jeremy)
Date: Wed Jul 16 11:56:01 2008
Subject: "ping" with packets larger then 25152 bytes fails.
In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD196E6F@mtlexch01.mtl.com>
References: <5D49E7A8952DC44FB38C38FA0D758EAD196E6F@mtlexch01.mtl.com>
Message-ID: <20080716115548.GJ62764@server.vk2pj.dyndns.org>

On 2008-Jul-16 10:41:57 +0300, Eitan Shefi wrote:
>When I run "ping" between 2 identical FreeBSD hosts, with packets larger
>than 25152 bytes, "ping" fails.

Intriguing.

>Does anyone have an idea what might cause this failure?

No, but a few more datapoints:
- it only affects real network connections - localhost is unaffected
- The problem also occurs when pinging FreeBSD 7.x from linux but not
  when the same linux system pings a Winbloze box.
- Pinging either linux or winbloze from FreeBSD 7.x fails.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.

From gnn at freebsd.org Wed Jul 16 15:23:54 2008
From: gnn at freebsd.org (gnn@freebsd.org)
Date: Wed Jul 16 15:24:06 2008
Subject: igb doesn't compile in STABLE?
In-Reply-To: <2a41acea0807151035w291269abt4ed99989ae45cc8b@mail.gmail.com>
References: <2a41acea0807141453s7235d894i31a744a0f673fcc0@mail.gmail.com> <2a41acea0807151007q29a783c4r2ae63c5a631952ba@mail.gmail.com> <2a41acea0807151035w291269abt4ed99989ae45cc8b@mail.gmail.com>
Message-ID:

At Tue, 15 Jul 2008 10:35:57 -0700,
Jack Vogel wrote:
>
> OK, will put on my todo list :)
>

Thanks. A kernel built that way (i.e. with igb and em) does actually
work, which is good, but if you're going to split them up we should
get this right before 7.1.

Best,
George

From atkin901 at yahoo.com Wed Jul 16 16:50:37 2008
From: atkin901 at yahoo.com (Mark Atkinson)
Date: Wed Jul 16 16:50:43 2008
Subject: "ping" with packets larger then 25152 bytes fails.
References: <5D49E7A8952DC44FB38C38FA0D758EAD196E6F@mtlexch01.mtl.com>
Message-ID:

Eitan Shefi wrote:

> When I run "ping" between 2 identical FreeBSD hosts, with packets larger
> than 25152 bytes, "ping" fails.
>
> Does anyone have an idea what might cause this failure?

My first guess is you're probably hitting the limit on the maximum
number of fragments per packet, which is something like 16 per packet
by default.

--
Mark Atkinson
atkin901@yahoo.com
(!wired)?(coffee++):(wired);

From davidch at broadcom.com Wed Jul 16 17:34:35 2008
From: davidch at broadcom.com (David Christensen)
Date: Wed Jul 16 17:34:43 2008
Subject: Enabling MSI-X on -CURRENT for New Network Driver
Message-ID: <5D267A3F22FD854F8F48B3D2B52381932677F1A0C3@IRVEXCHCCR01.corp.ad.broadcom.com>

I'm working on adding MSI-X support for a new network driver and having
some difficulty actually getting an interrupt. Does this look right?

    /* Select and configure the IRQ. */
    sc->bxe_msix_count = pci_msix_count(dev);
    rid = 1;

    /* Try allocating MSI-X interrupts. */
    if ((sc->bxe_cap_flags & BXE_MSIX_CAPABLE_FLAG) &&
        (bxe_msi_enable >= 2) && (sc->bxe_msix_count > 0)) {
        int msix_needed = sc->bxe_msix_count;

        if (pci_alloc_msix(dev, &sc->bxe_msix_count) == 0) {
            if (sc->bxe_msix_count == msix_needed) {
                DBPRINT(sc, BXE_INFO_LOAD, "%s(): Using %d MSI-X "
                    "vector(s).\n", __FUNCTION__, sc->bxe_msix_count);
                sc->bxe_flags |= BXE_USING_MSIX_FLAG;
            } else {
                pci_release_msi(dev);
                sc->bxe_flags &= ~BXE_USING_MSIX_FLAG;
                sc->bxe_msix_count = 0;
            }
        }
    }

    /* Try allocating MSI interrupts if we didn't get MSI-X. */
    ...

    /* Try legacy interrupt. */
    ...

    /* Allocate the interrupt and report any errors. */
    sc->bxe_res_irq = bus_alloc_resource_any(dev, SYS_RES_IRQ,
        &rid, RF_ACTIVE);

    /* Report any IRQ allocation errors. */
    if (sc->bxe_res_irq == NULL) {
        BXE_PRINTF("%s(%d): PCI map interrupt failed!\n",
            __FILE__, __LINE__);
        rc = ENXIO;
        goto bxe_attach_fail;
    }

    sc->bxe_irq_rid = rid;
    sc->bxe_intr = bxe_intr;

The allocation doesn't fail and I usually see an IRQ allocated to the
driver using "vmstat -i" (though not always):

===[root] /usr/src/sys/modules/bxe # vmstat -i
interrupt                  total      rate
irq1: atkbd0                   1         0
irq4: sio0                 46432         6
irq6: fdc0                    10         0
irq14: ata0                   58         0
irq17: atapci1             42684         5
cpu0: timer             15331063      1999
irq256: em0                  917         0
cpu3: timer             15330811      1999
cpu1: timer             15330808      1999
cpu2: timer             15330811      1999
cpu5: timer             15330811      1999
cpu6: timer             15330810      1999
cpu4: timer             15330806      1999
cpu7: timer             15330811      1999
irq258: bxe0                   2         0
Total                  122736835     16010

But my interrupt handler doesn't seem to be called. The goal is to get
a single interrupt working first; multiple queue support comes next.
Any ideas?

Dave

From barney_cordoba at yahoo.com Wed Jul 16 20:04:41 2008
From: barney_cordoba at yahoo.com (Barney Cordoba)
Date: Wed Jul 16 20:04:47 2008
Subject: "ping" with packets larger then 25152 bytes fails.
In-Reply-To: <20080716115548.GJ62764@server.vk2pj.dyndns.org> Message-ID: <813740.41559.qm@web63912.mail.re1.yahoo.com> --- On Wed, 7/16/08, Peter Jeremy wrote: > From: Peter Jeremy > Subject: Re: "ping" with packets larger then 25152 bytes fails. > To: "Eitan Shefi" > Cc: freebsd-net@freebsd.org > Date: Wednesday, July 16, 2008, 7:55 AM > On 2008-Jul-16 10:41:57 +0300, Eitan Shefi > wrote: > >When I run "ping" between 2 identical FreeBSD > hosts, with packets larger > >then 25152 bytes, "ping" fails. > > Intriguing. > > >Does someone has an idea what might cause this failure > ? > > No, but a few more datapoints: > - it only affects real network connections - localhost is > unaffected > - The problem also occurs when pinging FreeBSD 7.x from > linux but not > when the same linux system pings a Winbloze box. > - Pinging either linux or winbloze from FreeBSD 7.x fails. > > -- > Peter Jeremy > Please excuse any delays as the result of my ISP's > inability to implement > an MTA that is either RFC2821-compliant or matches their > claimed behaviour. Isn't this sort of like going to your auto dealer and complaining that you get vibration at 240mph? From peterjeremy at optushome.com.au Wed Jul 16 21:35:31 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Wed Jul 16 21:35:37 2008 Subject: "ping" with packets larger then 25152 bytes fails. In-Reply-To: <813740.41559.qm@web63912.mail.re1.yahoo.com> References: <20080716115548.GJ62764@server.vk2pj.dyndns.org> <813740.41559.qm@web63912.mail.re1.yahoo.com> Message-ID: <20080716211815.GW62764@server.vk2pj.dyndns.org> On 2008-Jul-16 12:37:59 -0700, Barney Cordoba wrote: >> >When I run "ping" between 2 identical FreeBSD hosts, with packets larger >> >then 25152 bytes, "ping" fails. ... >Isn't this sort of like going to your auto dealer and complaining that you get vibration at 240mph? I don't think so. 
There are no specific limits on the size of ICMP ECHO REQUEST or ICMP ECHO REPLY packets, therefore the only limit should be the IP packet limit (64KB). It does work with other IP stacks and with the loopback interface on FreeBSD. Poking around a bit more, the culprit looks like net.inet.ip.maxfragsperpacket - which is set to 16 by default. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. From linxiaosong at keynet.com.cn Thu Jul 17 02:38:15 2008 From: linxiaosong at keynet.com.cn (Wasily Lin) Date: Thu Jul 17 02:38:22 2008 Subject: mpd5.1 MTU problem Message-ID: <487EACC5.1060109@keynet.com.cn> Hello, I set up a PPPoE server on FreeBSD 7.0 (amd64) with mpd 5.1 and it works fine for all clients except my FreeBSD 7.0 (i386) notebook. Connecting works and I get an IP address, but no website can be accessed, not even the one on the PPPoE server itself (Apache installed), and FTP sites fail too. I've tried both mpd 5.1_1 and the built-in pppoe as the PPPoE client, but the problem was the same: HTTP/FTP... do not work, only ICMP works. I thought the problem was the MTU and changed it, but it had no effect. Now my configuration:

PPPoE Server:
startup:
    set netflow peer 127.0.0.1 1813
    set user admin xxxxx admin
    set user operator xxxxx operator
    set user user xxxxx user
    set console open

default:
    load pppoe_server

pppoe_server:
    create bundle template B
    set ippool add pool 10.0.0.100 10.0.0.200
    set iface enable netflow-in
    set iface enable netflow-out
    set iface enable ipacct
    set iface enable proxy-arp
    set iface mtu 1460 <-----------------------!
    set ipcp ranges 10.0.0.1/32 ippool pool
    set ipcp dns 172.18.30.125

    create link template common pppoe
    set link enable pap
    set link disable chap
    set link enable multilink
    set link action bundle B
    load radius

    create link template em0 common
    set link max-children 1000
    set pppoe iface em0
    set link enable incoming

radius:
    set radius server 127.0.0.1 xxxxxxxx 1812 1813
    set radius retries 3
    set radius timeout 3
    set radius me 127.0.0.1
    set auth max-logins 1
    set auth acct-update 300
    set auth enable radius-auth
    set auth enable radius-acct
    set radius enable message-authentic

PPPoE client:
startup:
    set user admin xxxxx admin
    set console open

default:
    load pppoe_client

pppoe_client:
    create bundle static B1
    set iface route default
    set ipcp ranges 0.0.0.0/0 0.0.0.0/0

    create link static L1 pppoe
    set link action bundle B1
    set auth authname xxxxxx
    set auth password xxxxxx
    set link max-redial 0
    set link keep-alive 10 60
    set pppoe iface em0
    set pppoe service ""
    open

After connected:

PPPoE server:
ng15: flags=88d1 metric 0 mtu 1460
        inet 10.0.0.1 --> 10.0.0.115 netmask 0xffffffff

PPPoE client:
ng0: flags=88d1 metric 0 mtu 1460
        inet 10.0.0.115 --> 10.0.0.1 netmask 0xffffffff

tcpdump output:

PPPoE server:
pppoe# tcpdump -i ng15 -ln host 10.0.0.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ng15, link-type NULL (BSD loopback), capture size 96 bytes
10:08:44.469993 IP 10.0.0.115.60331 > 10.0.0.1.80: S 2092758811:2092758811(0) win 65535
10:08:44.470056 IP 10.0.0.1.80 > 10.0.0.115.60331: S 687014728:687014728(0) ack 2092758812 win 65535
10:08:47.469997 IP 10.0.0.1.80 > 10.0.0.115.60331: S 687014728:687014728(0) ack 2092758812 win 65535
10:08:53.469978 IP 10.0.0.1.80 > 10.0.0.115.60331: S 687014728:687014728(0) ack 2092758812 win 65535
10:09:05.469918 IP 10.0.0.1.80 > 10.0.0.115.60331: S 687014728:687014728(0) ack 2092758812 win 65535
10:09:44.972709 IP 10.0.0.115.60331 > 10.0.0.1.80: F 1:1(0) ack 1 win 8272
10:09:44.972744 IP 10.0.0.1.80 > 10.0.0.115.60331: R 687014729:687014729(0) win 0

PPPoE client:
r00t# tcpdump -i ng0 -ln host 10.0.0.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ng0, link-type NULL (BSD loopback), capture size 96 bytes
10:12:06.792399 IP 10.0.0.115.60331 > 10.0.0.1.80: S 2092758811:2092758811(0) win 65535
10:12:06.793151 IP 10.0.0.1.80 > 10.0.0.115.60331: S 687014728:687014728(0) ack 2092758812 win 65535
10:12:06.793178 IP 10.0.0.115.60331 > 10.0.0.1.80: . ack 1 win 8272
10:12:09.793385 IP 10.0.0.1.80 > 10.0.0.115.60331: S 687014728:687014728(0) ack 2092758812 win 65535
10:12:09.793414 IP 10.0.0.115.60331 > 10.0.0.1.80: . ack 1 win 8272
10:12:15.793331 IP 10.0.0.1.80 > 10.0.0.115.60331: S 687014728:687014728(0) ack 2092758812 win 65535
10:12:15.793358 IP 10.0.0.115.60331 > 10.0.0.1.80: . ack 1 win 8272
10:12:27.793227 IP 10.0.0.1.80 > 10.0.0.115.60331: S 687014728:687014728(0) ack 2092758812 win 65535
10:12:27.793255 IP 10.0.0.115.60331 > 10.0.0.1.80: . ack 1 win 8272
10:13:07.294273 IP 10.0.0.115.60331 > 10.0.0.1.80: F 1:1(0) ack 1 win 8272
10:13:07.295358 IP 10.0.0.1.80 > 10.0.0.115.60331: R 687014729:687014729(0) win 0

As you can see, the client's TCP ACKs do not get through, but SYN and FIN segments are fine. What's the reason? I've used the same client to connect to my ISP's ADSL and it works fine, so I am sure the server is refusing my TCP ACKs. But why? Thanks all. BSD4LZX
From brde at optusnet.com.au Thu Jul 17 03:15:33 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Thu Jul 17 03:15:40 2008 Subject: Enabling MSI-X on -CURRENT for New Network Driver In-Reply-To: <5D267A3F22FD854F8F48B3D2B52381932677F1A0C3@IRVEXCHCCR01.corp.ad.broadcom.com> References: <5D267A3F22FD854F8F48B3D2B52381932677F1A0C3@IRVEXCHCCR01.corp.ad.broadcom.com> Message-ID: <20080717131000.K2693@besplex.bde.org> On Wed, 16 Jul 2008, David Christensen wrote: > I'm working on adding MSI-X support for a new network driver > and having some difficulty in actually getting an interrupt. > Does this look right? I don't know, but on FreeBSD cluster machines running RELENG_8 bce generates too many interrupts -- approx. 46000/second to deliver approx. 2 packets/second. bce works normally on FreeBSD cluster machines running RELENG_7 and earlier (2 interrupts/second to deliver systat -v output). Bruce From sepherosa at gmail.com Thu Jul 17 03:47:07 2008 From: sepherosa at gmail.com (Sepherosa Ziehau) Date: Thu Jul 17 03:47:13 2008 Subject: Enabling MSI-X on -CURRENT for New Network Driver In-Reply-To: <20080717131000.K2693@besplex.bde.org> References: <5D267A3F22FD854F8F48B3D2B52381932677F1A0C3@IRVEXCHCCR01.corp.ad.broadcom.com> <20080717131000.K2693@besplex.bde.org> Message-ID: On Thu, Jul 17, 2008 at 11:15 AM, Bruce Evans wrote: > On Wed, 16 Jul 2008, David Christensen wrote: > >> I'm working on adding MSI-X support for a new network driver >> and having some difficulty in actually getting an interrupt. >> Does this look right? > > I don't know, but on FreeBSD cluster machines running RELENG_8 bce > generates too many interrupts -- approx. 46000/second to deliver On dfly, I set the bce_rx_quick_cons_trip to 24 and bce_rx_ticks to 125, else live lock (>40000/sec) is promised when sinking packets @800kpps. I think bce uses the same coal logic as bge, so bce_rx_quick_cons_trip probably could be set to a larger value like 128; didn't have time to test 128 yet. 
Best Regards, sephe -- Live Free or Die From sam at freebsd.org Thu Jul 17 04:10:21 2008 From: sam at freebsd.org (Sam Leffler) Date: Thu Jul 17 04:10:27 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <486A45AB.2080609@freebsd.org> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> Message-ID: <487EC62A.3070301@freebsd.org> Sam Leffler wrote: > Larry Baird wrote: >>> And how do I know that it works ? >>> Well, when it doesn't work, I do know it, quite quickly most of the >>> time ! >>> >> I have to chime in here. I did most of the initial porting of the >> NAT-T patches from Kame IPSec to FAST_IPSEC. I did look at every >> line of code during this process. I found no security problems during >> the port. Like Yvan, my company uses the NAT-T patches commercially. >> Like he says, if it had problems, we would hear about it. If the >> patches >> don't get commited, I highly suspect Yvan or myself would try to keep >> the >> patches up todate. So far I have done FAST_IPSEC pacthes for FreeBSD >> 4,5,6. Yvan did 7 and 8 by himself. Keeping up gets to be a pain >> after a while. I do plan to look at the FreeBSD 7 patches soon, but >> it sure would be nice >> to see it commited. >> Please test/review the following patch against HEAD: http://people.freebsd.org/~sam/nat_t-20080616.patch This adds only the kernel portion of the NAT-T support; you must provide the user-level code from another place. The main difference from the patches floating around are in the ctloutput path (adding proper locking for HEAD) and decap of ESP-in-UDP frames. Assuming folks are ok w/ these changes I'll commit to HEAD. Once this stuff goes in we can look at getting the user-mode mods into the tree. Sam PS. Thanks especially to Matthew Grooms who tested an earlier version and fixed a bug. 
From andrew at modulus.org Thu Jul 17 04:44:31 2008 From: andrew at modulus.org (Andrew Snow) Date: Thu Jul 17 04:44:39 2008 Subject: named.conf: query-source address In-Reply-To: <20080717044106.GA53681@eos.sc1.parodius.com> References: <20080716162042.GA27666@svzserv.kemerovo.su> <487E312E.9090307@infracaninophile.co.uk> <20080717035155.GA81536@svzserv.kemerovo.su> <8DFF6DCD-6619-4251-9944-59CED8DF1B19@mac.com> <20080717044106.GA53681@eos.sc1.parodius.com> Message-ID: <487ECDD7.2050901@modulus.org> Don't forget the souls who find themselves using jails. In this case it is common to want a name server on the parent host but not on any of the jail IPs. From smithi at nimnet.asn.au Thu Jul 17 06:19:44 2008 From: smithi at nimnet.asn.au (Ian Smith) Date: Thu Jul 17 06:19:51 2008 Subject: mpd5.1 MTU problem In-Reply-To: <487EACC5.1060109@keynet.com.cn> Message-ID: On Thu, 17 Jul 2008, Wasily Lin wrote: > Hello, > I set up a PPPoE server on FreeBSD 7.0(amd64) with mpd 5.1 and it works > fine for all clients except for my FreeBSD 7.0(i386) Notebook. > Connecting has no problem and I get ip but all website can not be access > even on PPPoE server itself(Apache installed), so can not ftp site. > I've used mpd 5.1_1 and pppoe(built-in) as pppoe client but the > problem was same - can not access http/ftp..., only icmp works. I think > the problem is MTU then changed that but no effects. Now my configuration: > > PPPoE Server: > startup: > set netflow peer 127.0.0.1 1813 > set user admin xxxxx admin > set user operator xxxxx operator > set user user xxxxx user > set console open > > default: > load pppoe_server > > pppoe_server: > > create bundle template B > set ippool add pool 10.0.0.100 10.0.0.200 > set iface enable netflow-in > set iface enable netflow-out > set iface enable ipacct > set iface enable proxy-arp > set iface mtu 1460 <-----------------------! 
> set ipcp ranges 10.0.0.1/32 ippool pool > set ipcp dns 172.18.30.125 > > create link template common pppoe > set link enable pap > set link disable chap > set link enable multilink > set link action bundle B > load radius > > create link template em0 common > set link max-children 1000 > set pppoe iface em0 > set link enable incoming > > radius: > set radius server 127.0.0.1 xxxxxxxx 1812 1813 > set radius retries 3 > set radius timeout 3 > set radius me 127.0.0.1 > set auth max-logins 1 > set auth acct-update 300 > set auth enable radius-auth > set auth enable radius-acct > set radius enable message-authentic > > PPPoE client: > startup: > set user admin xxxxx admin > set console open > > default: > load pppoe_client > > pppoe_client: > create bundle static B1 > set iface route default > set ipcp ranges 0.0.0.0/0 0.0.0.0/0 > > create link static L1 pppoe > set link action bundle B1 > set auth authname xxxxxx > set auth password xxxxxx > set link max-redial 0 > set link keep-alive 10 60 > set pppoe iface em0 > set pppoe service "" For the same apparent problem, from my working mpd 4.1 client config: # needed? seems so, t23 had trouble with large tcp pkts .. yep, fixed .. 
set iface enable tcpmssfix which I see is still in http://mpd.sourceforge.net/doc5/mpd28.html cheers, Ian > open > > After connected: > > PPPoE server: > ng15: flags=88d1 metric > 0 mtu 1460 > inet 10.0.0.1 --> 10.0.0.115 netmask 0xffffffff > > PPPoE client: > ng0: flags=88d1 metric 0 > mtu 1460 > inet 10.0.0.115 --> 10.0.0.1 netmask 0xffffffff > > tcpdump output: > > PPPoE server: > pppoe# tcpdump -i ng15 -ln host 10.0.0.1 > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on ng15, link-type NULL (BSD loopback), capture size 96 bytes > 10:08:44.469993 IP 10.0.0.115.60331 > 10.0.0.1.80: S > 2092758811:2092758811(0) win 65535 3,sackOK,timestamp 4639873 0> > 10:08:44.470056 IP 10.0.0.1.80 > 10.0.0.115.60331: S > 687014728:687014728(0) ack 2092758812 win 65535 3,sackOK,timestamp 1602770998 4639873> > 10:08:47.469997 IP 10.0.0.1.80 > 10.0.0.115.60331: S > 687014728:687014728(0) ack 2092758812 win 65535 3,sackOK,timestamp 1602770998 4639873> > 10:08:53.469978 IP 10.0.0.1.80 > 10.0.0.115.60331: S > 687014728:687014728(0) ack 2092758812 win 65535 3,sackOK,timestamp 1602770998 4639873> > 10:09:05.469918 IP 10.0.0.1.80 > 10.0.0.115.60331: S > 687014728:687014728(0) ack 2092758812 win 65535 3,sackOK,timestamp 1602770998 4639873> > 10:09:44.972709 IP 10.0.0.115.60331 > 10.0.0.1.80: F 1:1(0) ack 1 win > 8272 > 10:09:44.972744 IP 10.0.0.1.80 > 10.0.0.115.60331: R > 687014729:687014729(0) win 0 > > PPPoE client: > r00t# tcpdump -i ng0 -ln host 10.0.0.1 > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on ng0, link-type NULL (BSD loopback), capture size 96 bytes > 10:12:06.792399 IP 10.0.0.115.60331 > 10.0.0.1.80: S > 2092758811:2092758811(0) win 65535 3,sackOK,timestamp 4639873 0> > 10:12:06.793151 IP 10.0.0.1.80 > 10.0.0.115.60331: S > 687014728:687014728(0) ack 2092758812 win 65535 3,sackOK,timestamp 1602770998 4639873> > 10:12:06.793178 IP 10.0.0.115.60331 > 10.0.0.1.80: . 
ack 1 win 8272 > > 10:12:09.793385 IP 10.0.0.1.80 > 10.0.0.115.60331: S > 687014728:687014728(0) ack 2092758812 win 65535 3,sackOK,timestamp 1602770998 4639873> > 10:12:09.793414 IP 10.0.0.115.60331 > 10.0.0.1.80: . ack 1 win 8272 > > 10:12:15.793331 IP 10.0.0.1.80 > 10.0.0.115.60331: S > 687014728:687014728(0) ack 2092758812 win 65535 3,sackOK,timestamp 1602770998 4639873> > 10:12:15.793358 IP 10.0.0.115.60331 > 10.0.0.1.80: . ack 1 win 8272 > > 10:12:27.793227 IP 10.0.0.1.80 > 10.0.0.115.60331: S > 687014728:687014728(0) ack 2092758812 win 65535 3,sackOK,timestamp 1602770998 4639873> > 10:12:27.793255 IP 10.0.0.115.60331 > 10.0.0.1.80: . ack 1 win 8272 > > 10:13:07.294273 IP 10.0.0.115.60331 > 10.0.0.1.80: F 1:1(0) ack 1 win > 8272 > 10:13:07.295358 IP 10.0.0.1.80 > 10.0.0.115.60331: R > 687014729:687014729(0) win 0 > > As you can see, tcp/ack from client can not go through but tcp/syn, > tcp/fin are fine. > > What's the reason? I've used the same client to connect to ISP's ADSL > and work fine so what I am sure is the server refused my tcp/ack. But why? > > Thanks all. > > BSD4LZX From freebsdlists at bsdunix.ch Thu Jul 17 06:52:19 2008 From: freebsdlists at bsdunix.ch (Thomas Vogt) Date: Thu Jul 17 06:52:30 2008 Subject: too many open file descriptors messages since bind 9.4.2-P1 (port dns94) In-Reply-To: References: <487C9457.5080609@bsdunix.ch> <2A7CBD67-7532-4B13-82DD-A6EF5DEAA6BD@bsdunix.ch> Message-ID: Hello Am 15.07.2008 um 22:59 schrieb JINMEI Tatuya / ????: > At Tue, 15 Jul 2008 22:54:11 +0200, > Thomas Vogt wrote: > >>>> Since i updated my FreeBSD 6.3 dns server with the latest bind >>>> version >>>> in the ports (dns/bind94) my system is flooding my log with "too >>>> many >>>> open file descriptors" messages. >>>> >>>> Is there something i can do? >>> >>> How many sockets is named actually using while it makes this log >>> message? 
Try, e.g, >>> % sockstat | grep named | wc -l >> >> Not that many: >> sockstat | grep named | wc -l >> 996 > > Ah, it's actually quite a lot in this context:-) > > If that's regularly happening, I'm afraid recent P1 versions don't > handle that well, and recommend you try 9.4.3b2 ore 9.5.1b1. I installed 9.4.3b2. I haven't seen any "too many open file descriptors" messages so far. "sockstat | grep named | wc -l" now shows far fewer sockets: during the whole night and at this early hour of the morning we have had just 40-150 open named sockets. Maybe all our customers are enjoying their summer holidays, or 9.4.3b2 handles it much better. Regards, Thomas From mav at FreeBSD.org Thu Jul 17 08:14:31 2008 From: mav at FreeBSD.org (Alexander Motin) Date: Thu Jul 17 08:14:38 2008 Subject: mpd5.1 MTU problem In-Reply-To: <1216275783.00099216.1216262401@10.7.7.3> References: <1216275783.00099216.1216262401@10.7.7.3> Message-ID: <487EF154.5070808@FreeBSD.org> Wasily Lin wrote: > set iface enable netflow-in > set iface enable netflow-out > set iface enable ipacct Strange combination. > set iface enable proxy-arp Are you sure you need it? > set iface mtu 1460 <-----------------------! That's not a problem, but usually 1492 is used for PPPoE. Also, in some situations 'set iface enable tcpmssfix' could help. > As you can see, tcp/ack from client can not go through but tcp/syn, > tcp/fin are fine. > > What's the reason? I've used the same client to connect to ISP's ADSL > and work fine so what I am sure is the server refused my tcp/ack. But why? Since all of these packets are very small, I don't think it is an MTU problem. I would recommend using tcpdump on the Ethernet interface to understand which side actually drops the packets, and possibly why. Also check that you are not using any firewall, and try disabling some features on the server side, such as ipacct.
-- Alexander Motin From biancalana at gmail.com Thu Jul 17 16:02:16 2008 From: biancalana at gmail.com (Alexandre Biancalana) Date: Thu Jul 17 16:02:25 2008 Subject: openospfd+carp Message-ID: <8e10486b0807170902l4a3db309we7f143af6b79235b@mail.gmail.com> Hi list, I'm deploying a new structure between our company and our datacenter that is composed of two L2L (lan-to-lan) 100Mbit links and two redundant gateways/firewalls on each side. I configured one VLAN per 100Mbit link and used CARP (with Max's carpdev patch) to do the failover between the machines on each side; the VLAN interfaces are configured without IP addresses, and only the CARP interfaces have IPs. I want to use OpenOSPFD to do automatic failover and load balancing across these L2L links. Does this work? Does anyone have a similar setup? Any hints? I'm using FreeBSD 7, OpenOSPFD 4 (from ports) and Max's carpdev patch. Best Regards, Alexandre From cokane at FreeBSD.org Thu Jul 17 16:30:03 2008 From: cokane at FreeBSD.org (Coleman Kane) Date: Thu Jul 17 16:30:10 2008 Subject: kern/125181: [ndis] [patch] with wep enters kdb.enter.unknown, panics Message-ID: <200807171630.m6HGU3IZ015801@freefall.freebsd.org> The following reply was made to PR kern/125181; it has been noted by GNATS. From: Coleman Kane To: bug-followup@FreeBSD.org, onemda@gmail.com Cc: thompsa@FreeBSD.org Subject: Re: kern/125181: [ndis] [patch] with wep enters kdb.enter.unknown, panics Date: Thu, 17 Jul 2008 12:09:52 -0400 Andrew, I got directed to this PR by onemda@gmail.com (Paul D. Mahol), who's been helping me track down some edge cases in the if_ndis locking rewrite. I am not 100% familiar with the locking semantics in play here (IEEE80211 and VAPs), so I wanted to run it by you before I determine that it seems to be working well for me.
-- Coleman Kane From thompsa at FreeBSD.org Thu Jul 17 16:50:05 2008 From: thompsa at FreeBSD.org (Andrew Thompson) Date: Thu Jul 17 16:50:12 2008 Subject: kern/125181: [ndis] [patch] with wep enters kdb.enter.unknown, panics Message-ID: <200807171650.m6HGo4aK018239@freefall.freebsd.org> The following reply was made to PR kern/125181; it has been noted by GNATS. From: Andrew Thompson To: Coleman Kane Cc: bug-followup@FreeBSD.org, onemda@gmail.com Subject: Re: kern/125181: [ndis] [patch] with wep enters kdb.enter.unknown, panics Date: Thu, 17 Jul 2008 09:43:42 -0700 On Thu, Jul 17, 2008 at 12:09:52PM -0400, Coleman Kane wrote: > Andrew, > > I got directed to this PR by onemda@gmail.com (Paul D. Mahol), who's > been helping me track down some edge cases in the if_ndis locking > rewrite. I am not 100% familiar with the locking semantics in play here > (IEEE80211 and VAPs), so I wanted to run it by you before I determine > that it seems to be working well for me. I don't think ndis should be reaching into the net80211 lock. Now that ndis uses a regular mutex, it's a good chance to add mtx_asserts in the right places and get the locking up to speed. I will try to post a patch soon unless someone beats me to it. Andrew From julian at elischer.org Thu Jul 17 17:09:18 2008 From: julian at elischer.org (Julian Elischer) Date: Thu Jul 17 17:09:25 2008 Subject: Requesting comments on Multi-routing table usage Message-ID: <487F7C0C.8090303@elischer.org> The current code in -current will add a new interface to all FIBs. So for example when you add a gre interface it shows up everywhere.
This behaviour is probably correct for the base NICs on the system when you boot, but it is probably wrong in other cases. For example, when mpd makes tunnels it probably (but not always) wants to add that set of routes into one FIB. Similarly for other apps that can create tunnels. What is needed is a way to allow the caller to somehow specify the behaviour wanted whenever new interfaces are added. various things crossed my minds..
-------------
Maybe real hardware should go everywhere and virtual should go to the FIB of the creator.
Maybe P2P interfaces should not go everywhere.
Maybe a sysctl can be used to 'flip' the mode from "everywhere" to "specific fib" after boot has completed. (I have code for this but it's not the perfect solution).
Maybe ifconfig can set a new flag somewhere somehow.
Maybe a process can set a flag for itself saying what its mode is..
----------
The trouble is that there is not an "always correct" answer. Some people may want to see a tunnel turn up on all FIBs and others may not. From smithi at nimnet.asn.au Thu Jul 17 17:54:54 2008 From: smithi at nimnet.asn.au (Ian Smith) Date: Thu Jul 17 17:55:01 2008 Subject: Requesting comments on Multi-routing table usage In-Reply-To: <487F7C0C.8090303@elischer.org> Message-ID: On Thu, 17 Jul 2008, Julian Elischer wrote: > The current code in -current will add a new interface to all > FIBs. Consider yanking/reinserting cardbus NICs as one source of fun. > So for example when you add a gre interface irt shows up everywhere. > > This behaviour is probbaly correct for the base NICs on the system > when you boot, but it is probably wrong in other cases. > > For example, when mpd makes tunnels it probably > (but not always) wants to add that set of routes into one > FIB. Similarly for other apps that can create tunnels. > > What is needed is a way to allow the caller to somehow > specify the behaviour wanted whenever new interfaces are added. > > various things crossed my minds..
I'm of two minds myself .. but you seem to have lots more :) > ------------- > Maybe real hardware shoudl go everywhere and virtual should go to > the FIB of the creator > > Maybe P2P interfaces should not go everywhere. > > Maybe a sysctl can be used to 'flip' teh mode from "everywhere" > to "specific fib" after boot has completed. (I have code for this but > it's not the perfect solution). Yes in addition to 'setfib N command' it would be likely useful to have a more global 'setfibto' type command, so you could run whole scripts or shells in a known fib context, to which scripts etc could be oblivious? Tuning by sysctl/s would seem most useful, at least for development? > Maybe ifconfig can set a new flag somewhere somehow. > > Maybe a process can set a flag for itself saying what its mode is.. > ---------- > > > The trouble is that there is not an "always correct" answer. > some people may want to see a tunnel turn up on all FIBs > and others may not. It's the options that drive ya crazy .. but being able to set/tune the forwarding context - one fib, all fibs, or a set of fibs? - may allow flexibility in view of the large set of maybes you (so far) mentioned. Just some popcorn from the peanut gallery .. cheers, Ian From julian at elischer.org Thu Jul 17 19:25:20 2008 From: julian at elischer.org (Julian Elischer) Date: Thu Jul 17 19:25:27 2008 Subject: Requesting comments on Multi-routing table usage In-Reply-To: References: Message-ID: <487F9BED.90402@elischer.org> Ian Smith wrote: > On Thu, 17 Jul 2008, Julian Elischer wrote: > > The current code in -current will add a new interface to all > > FIBs. > > Consider yanking/reinserting cardbus NICs as one source of fun. > > > So for example when you add a gre interface irt shows up everywhere. > > > > This behaviour is probbaly correct for the base NICs on the system > > when you boot, but it is probably wrong in other cases. 
> > > > For example, when mpd makes tunnels it probably > > (but not always) wants to add that set of routes into one > > FIB. Similarly for other apps that can create tunnels. > > > > What is needed is a way to allow the caller to somehow > > specify the behaviour wanted whenever new interfaces are added. > > > > various things crossed my minds.. > > I'm of two minds myself .. but you seem to have lots more :) > > > ------------- > > Maybe real hardware shoudl go everywhere and virtual should go to > > the FIB of the creator > > > > Maybe P2P interfaces should not go everywhere. > > > > Maybe a sysctl can be used to 'flip' teh mode from "everywhere" > > to "specific fib" after boot has completed. (I have code for this but > > it's not the perfect solution). > > Yes in addition to 'setfib N command' it would be likely useful to have > a more global 'setfibto' type command, so you could run whole scripts or > shells in a known fib context, to which scripts etc could be oblivious? That's already possible with setfib: "setfib N sh script" is going to do that. The issue I have is with the routes that are added to routing tables when an interface is added. It's a specific instance that is tricky because it's a side effect rather than a directly requested action. What some people have asked to do is have multiple tunnels to the same place, but have different routing tables specify different tunnels to get to that place, e.g.

gre0 1.1.1.1 2.2.2.2
gre1 3.3.3.3 2.2.2.2
gre2 4.4.4.4 2.2.2.2

where in fib 0 the route to 2.2.2.2 is via gre0, in fib 1 it is via gre1, and in fib 2 it is via gre2. Then you can use setfib in ipfw and pf to use different tunnels to get selected traffic to 2.2.2.2. This is what is being asked for, but you can only add the interfaces like that if ifconfig affects only one specific FIB for each interface. > > Tuning by sysctl/s would seem most useful, at least for development? > > > Maybe ifconfig can set a new flag somewhere somehow.
> > > > Maybe a process can set a flag for itself saying what its mode is.. > > ---------- > > > > > > The trouble is that there is not an "always correct" answer. > > some people may want to see a tunnel turn up on all FIBs > > and others may not. > > It's the options that drive ya crazy .. but being able to set/tune the > forwarding context - one fib, all fibs, or a set of fibs? - may allow > flexibility in view of the large set of maybes you (so far) mentioned. > > Just some popcorn from the peanut gallery .. > > cheers, Ian From julian at elischer.org Thu Jul 17 19:29:51 2008 From: julian at elischer.org (Julian Elischer) Date: Thu Jul 17 19:29:57 2008 Subject: Requesting comments on Multi-routing table usage In-Reply-To: <487F9BED.90402@elischer.org> References: <487F9BED.90402@elischer.org> Message-ID: <487F9CFB.2080901@elischer.org> Julian Elischer wrote: > Ian Smith wrote: >> On Thu, 17 Jul 2008, Julian Elischer wrote: >> > The current code in -current will add a new interface to all >> > FIBs. >> >> Consider yanking/reinserting cardbus NICs as one source of fun. >> >> > So for example when you add a gre interface irt shows up everywhere. >> > > This behaviour is probbaly correct for the base NICs on the >> system > when you boot, but it is probably wrong in other cases. >> > >> > For example, when mpd makes tunnels it probably >> > (but not always) wants to add that set of routes into one >> > FIB. Similarly for other apps that can create tunnels. >> > > What is needed is a way to allow the caller to somehow >> > specify the behaviour wanted whenever new interfaces are added. >> > > various things crossed my minds.. >> >> I'm of two minds myself .. but you seem to have lots more :) >> >> > ------------- >> > Maybe real hardware shoudl go everywhere and virtual should go to >> > the FIB of the creator >> > > Maybe P2P interfaces should not go everywhere. 
>> > > Maybe a sysctl can be used to 'flip' the mode from "everywhere" >> > to "specific fib" after boot has completed. (I have code for this >> but > it's not the perfect solution). >> >> Yes in addition to 'setfib N command' it would be likely useful to have >> a more global 'setfibto' type command, so you could run whole scripts or >> shells in a known fib context, to which scripts etc could be oblivious? > > that's already possible with setfib.. > setfib N sh script is going to do that.. > > The issue I have is with the routes that are added to routing tables > when an interface is added.. It's a specific instance that is tricky > because it's a side effect rather than a directly requested action. > > what some people have asked to do is have multiple tunnels to the same > place but have different routing tables specify different tunnels to get > to that place.. > > e.g. > > gre0 1.1.1.1 2.2.2.2 > gre1 3.3.3.3 2.2.2.2 > gre2 4.4.4.4 2.2.2.2 > > where in fib 0 the route to 2.2.2.2 is via gre0 > and in fib1 it is via gre1 > and in fib2 it is via gre2 > then you can use setfib in ipfw and pf to use different tunnels to get > selected traffic to 2.2.2.2.. > > This is what is being asked for, but you can only add the > interfaces like that if ifconfig only affects different FIBs for each > interface.

hmmm that makes me think that maybe an ifconfig command to associate a FIB with an interface might do the trick... if it's not associated with a FIB, its routes go to all of them, but if you have previously associated it with a FIB, then only that FIB is affected. That may just be a good enough answer.

> > > >> >> Tuning by sysctl/s would seem most useful, at least for development? >> >> > Maybe ifconfig can set a new flag somewhere somehow. >> > > Maybe a process can set a flag for itself saying what its mode is.. >> > ---------- >> > > > The trouble is that there is not an "always correct" answer.
>> > some people may want to see a tunnel turn up on all FIBs >> > and others may not. >> >> It's the options that drive ya crazy .. but being able to set/tune the >> forwarding context - one fib, all fibs, or a set of fibs? - may allow >> flexibility in view of the large set of maybes you (so far) mentioned. >> >> Just some popcorn from the peanut gallery .. >> >> cheers, Ian From lab at gta.com Thu Jul 17 20:21:42 2008 From: lab at gta.com (Larry Baird) Date: Thu Jul 17 20:21:49 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <487EC62A.3070301@freebsd.org> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> Message-ID: <20080717202141.GA65940@gta.com> Sam, > Please test/review the following patch against HEAD: > > http://people.freebsd.org/~sam/nat_t-20080616.patch > > This adds only the kernel portion of the NAT-T support; you must provide > the user-level code from another place. > > The main difference from the patches floating around are in the > ctloutput path (adding proper locking for HEAD) and decap of ESP-in-UDP > frames. Assuming folks are ok w/ these changes I'll commit to HEAD. > Once this stuff goes in we can look at getting the user-mode mods into > the tree. I should have time to begin to look at this tomorrow. I also have an additional patch that needs adding. In sys/netipsec/ipsec_mbuf.c the function m_makespace() has an assert/comment stating "code doesn't handle clusters". If using NAT-T with crypto acceleration you can hit this case. I'll email this patch to you within the next couple of days. Larry -- ------------------------------------------------------------------------ Larry Baird | http://www.gta.com Global Technology Associates, Inc.
| Orlando, FL Email: lab@gta.com | TEL 407-380-0220, FAX 407-380-6080 From danger at FreeBSD.org Thu Jul 17 20:22:29 2008 From: danger at FreeBSD.org (Daniel Gerzo) Date: Thu Jul 17 20:22:40 2008 Subject: etc/rc.firewall6 Message-ID: <743720911.20080717222210@rulez.sk> Hello freebsd-net, would somebody more knowledgeable than I am in ip6 review this [1] small patch for /etc/rc.firewall6? May I get an approval from some src/ committer to commit this (please keep me in the CC: list)? Thank you. [1] http://cvsup.sk.freebsd.org/~danger/rc.ipfw6.diff -- Best regards, Daniel mailto:danger@FreeBSD.org From hrs at FreeBSD.org Thu Jul 17 21:48:17 2008 From: hrs at FreeBSD.org (hrs@FreeBSD.org) Date: Thu Jul 17 21:48:24 2008 Subject: kern/125003: [gif] incorrect EtherIP header format. Message-ID: <200807172148.m6HLmHl9043759@freefall.freebsd.org> Synopsis: [gif] incorrect EtherIP header format. Responsible-Changed-From-To: freebsd-net->hrs Responsible-Changed-By: hrs Responsible-Changed-When: Thu Jul 17 21:47:32 UTC 2008 Responsible-Changed-Why: I will handle this. http://www.freebsd.org/cgi/query-pr.cgi?pr=125003 From dougb at FreeBSD.org Thu Jul 17 23:00:04 2008 From: dougb at FreeBSD.org (Doug Barton) Date: Thu Jul 17 23:00:14 2008 Subject: etc/rc.firewall6 In-Reply-To: <743720911.20080717222210@rulez.sk> References: <743720911.20080717222210@rulez.sk> Message-ID: <487FC8B1.4070003@FreeBSD.org> Daniel Gerzo wrote: > Hello freebsd-net, > > would somebody more knowledgeable than I am in ip6 review this [1] > small patch for /etc/rc.firewall6? May I get an approval from some > src/ committer to commit this (please keep me in the CC: list)? > > Thank you. > > [1] http://cvsup.sk.freebsd.org/~danger/rc.ipfw6.diff > Looks like the right direction to go in for the DNS stuff, yes. About the ntp stuff, 2 questions. First, you did not make the same changes in the NTP section in the second hunk as you did in the first, is that intentional?
Second, wouldn't it be better to specify the port number (123) on both sides? NTP uses that same port for sending and receiving queries, and I've always built firewalls that way successfully. Doug -- This .signature sanitized for your protection From cswiger at mac.com Thu Jul 17 23:21:31 2008 From: cswiger at mac.com (Chuck Swiger) Date: Thu Jul 17 23:21:37 2008 Subject: etc/rc.firewall6 In-Reply-To: <487FC8B1.4070003@FreeBSD.org> References: <743720911.20080717222210@rulez.sk> <487FC8B1.4070003@FreeBSD.org> Message-ID: <615CAFFA-48AF-4207-A838-B8AB58B6EE76@mac.com> On Jul 17, 2008, at 3:33 PM, Doug Barton wrote: [ ... ] > About the ntp stuff, 2 questions. First, you did not make the same > changes in the NTP section in the second hunk as you did in the > first, is that intentional? Second, wouldn't it be better to > specify the port number (123) on both sides? NTP uses that same port > for sending and receiving queries, and I've always built firewalls > that way successfully. David Mills' ntpd uses port 123 on both sides, true. Other NTP implementations tend to use ephemeral ports; a quick histogram of 30 seconds or so of traffic to a stratum-2 NTP server suggests about half of the NTP traffic out there uses other ports. Regards, -- -Chuck # tcpdump -w ntp_packets.dump udp port 123 tcpdump: listening on fxp0, link-type EN10MB (Ethernet), capture size 96 bytes ^C 615 packets captured 897 packets received by filter 0 packets dropped by kernel # tcpdump -nr ntp_packets.dump | wc -l reading from file ntp_packets.dump, link-type EN10MB (Ethernet) 615 # tcpdump -nr ntp_packets.dump | grep '.123 >' | wc -l reading from file ntp_packets.dump, link-type EN10MB (Ethernet) 347 Most of these above were packets sent by my server. 
The rest have quite an assortment of source ports being used:

# tcpdump -nr ntp_packets.dump | grep -v '.123 >' | head
reading from file ntp_packets.dump, link-type EN10MB (Ethernet)
19:06:41.598527 IP 69.144.236.104.3186 > 199.103.21.227.123: NTPv4, Client, length 48
19:06:41.620732 IP 70.169.250.10.297 > 199.103.21.227.123: NTPv3, symmetric active, length 48
19:06:41.755699 IP 63.118.102.151.47817 > 199.103.21.227.123: NTPv4, Client, length 48
19:06:41.932513 IP 65.7.131.67.61897 > 199.103.21.227.123: NTPv3, Client, length 48
19:06:42.041643 IP 69.48.55.134.6 > 199.103.21.227.123: NTPv3, Client, length 48
19:06:42.098282 IP 64.211.94.227.32839 > 199.103.21.227.123: NTPv4, Client, length 48
19:06:42.248041 IP 74.234.132.214.49846 > 199.103.21.227.123: NTPv3, Client, length 48
19:06:42.263930 IP 66.134.96.79.50420 > 199.103.21.227.123: NTPv3, symmetric active, length 48
19:06:42.338483 IP 38.115.128.242.12709 > 199.103.21.227.123: NTPv3, symmetric active, length 48
19:06:42.764847 IP 70.169.250.10.429 > 199.103.21.227.123: NTPv3, symmetric active, length 48
# tcpdump -nr ntp_packets.dump | grep -v '.123 >' | tail
reading from file ntp_packets.dump, link-type EN10MB (Ethernet)
19:07:09.302753 IP 170.235.223.10.47601 > 199.103.21.227.123: NTPv3, symmetric active, length 48
19:07:09.355610 IP 38.105.187.251.278 > 199.103.21.227.123: NTPv3, symmetric active, length 48
19:07:09.360286 IP 70.148.188.206.59640 > 199.103.21.227.123: NTPv4, Client, length 48
19:07:09.502241 IP 138.210.238.176.26487 > 199.103.21.227.123: NTPv3, Client, length 48
19:07:09.838130 IP 66.89.121.68.13587 > 199.103.21.227.123: NTPv3, symmetric active, length 48
19:07:10.064838 IP 76.201.148.100.2050 > 199.103.21.227.123: NTPv3, Client, length 48
19:07:10.121137 IP 217.96.91.6.37920 > 199.103.21.227.123: NTPv4, Client, length 48
19:07:10.124784 IP 70.169.250.10.24 > 199.103.21.227.123: NTPv3, symmetric active, length 48
19:07:10.203358 IP 24.154.104.34.40289 > 199.103.21.227.123: NTPv4, Client, length 48
19:07:10.234026 IP 64.178.45.44.1 > 199.103.21.227.123: NTPv4, Client, length 48

From max at love2party.net Thu Jul 17 23:35:39 2008 From: max at love2party.net (Max Laier) Date: Thu Jul 17 23:35:46 2008 Subject: etc/rc.firewall6 In-Reply-To: <615CAFFA-48AF-4207-A838-B8AB58B6EE76@mac.com> References: <743720911.20080717222210@rulez.sk> <487FC8B1.4070003@FreeBSD.org> <615CAFFA-48AF-4207-A838-B8AB58B6EE76@mac.com> Message-ID: <200807180135.35912.max@love2party.net> On Friday 18 July 2008 01:21:28 Chuck Swiger wrote: > On Jul 17, 2008, at 3:33 PM, Doug Barton wrote: > [ ... ] > > > About the ntp stuff, 2 questions. First, you did not make the same > > changes in the NTP section in the second hunk as you did in the > > first, is that intentional? Second, wouldn't it be better to > > specify the port number (123) on both sides? NTP uses that same port > > for sending and receiving queries, and I've always built firewalls > > that way successfully. > > David Mills' ntpd uses port 123 on both sides, true. Other NTP > implementations tend to use ephemeral ports; a quick histogram of 30 > seconds or so of traffic to a stratum-2 NTP server suggests about half > of the NTP traffic out there uses other ports. Don't forget PNAT. I'd also argue that the rc.firewall6 in base is supposed to work with the ntpd in base. We should, however, not forget about ntpdate, which seems to use ephemeral ports.
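The "NTPv4, Client" and "NTPv3, symmetric active" labels in Chuck's capture come from the first byte of the 48-byte NTP header. RFC 5905 defines that byte as a 2-bit leap indicator, a 3-bit version and a 3-bit mode. The C sketch below decodes the byte and also checks for the "port 123 on both sides" pattern Doug describes. The function and type names are illustrative only; they are not taken from ntpd or tcpdump. Treating every flow that fails the port check as a non-ntpd client (ntpdate, another implementation, or a flow rewritten by NAT) is an assumption drawn from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NTP_PORT       123
#define NTP_HEADER_LEN 48   /* fixed NTP header size, the "length 48" above */

/* Association modes from the NTP header (RFC 5905). */
enum ntp_mode {
    NTP_MODE_RESERVED          = 0,
    NTP_MODE_SYMMETRIC_ACTIVE  = 1,
    NTP_MODE_SYMMETRIC_PASSIVE = 2,
    NTP_MODE_CLIENT            = 3,
    NTP_MODE_SERVER            = 4,
    NTP_MODE_BROADCAST         = 5,
    NTP_MODE_CONTROL           = 6,
    NTP_MODE_PRIVATE           = 7
};

/* Hypothetical decoded view of the first header byte. */
struct ntp_first_byte {
    unsigned leap;     /* LI:   bits 7-6 */
    unsigned version;  /* VN:   bits 5-3 */
    unsigned mode;     /* Mode: bits 2-0 */
};

/* Decode LI/VN/Mode; returns false if the payload is shorter than a header. */
static bool ntp_decode_first_byte(const uint8_t *payload, size_t len,
                                  struct ntp_first_byte *out)
{
    if (payload == NULL || len < NTP_HEADER_LEN)
        return false;
    out->leap    = (payload[0] >> 6) & 0x3;
    out->version = (payload[0] >> 3) & 0x7;
    out->mode    =  payload[0]       & 0x7;
    return true;
}

/* True when both ends use port 123, as the reference ntpd does. */
static bool ntp_uses_fixed_ports(uint16_t src_port, uint16_t dst_port)
{
    return src_port == NTP_PORT && dst_port == NTP_PORT;
}
```

As an example, a first byte of 0x23 decodes to version 4 and mode 3 (client), which corresponds to the "NTPv4, Client" lines above. A first byte of 0x19 decodes to version 3 and mode 1, the "NTPv3, symmetric active" case.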
-- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News From cswiger at mac.com Fri Jul 18 00:07:26 2008 From: cswiger at mac.com (Chuck Swiger) Date: Fri Jul 18 00:07:33 2008 Subject: etc/rc.firewall6 In-Reply-To: <200807180135.35912.max@love2party.net> References: <743720911.20080717222210@rulez.sk> <487FC8B1.4070003@FreeBSD.org> <615CAFFA-48AF-4207-A838-B8AB58B6EE76@mac.com> <200807180135.35912.max@love2party.net> Message-ID: <7CD8CD0E-0150-438C-BD50-D2A8C2210280@mac.com> On Jul 17, 2008, at 4:35 PM, Max Laier wrote: >> David Mills' ntpd uses port 123 on both sides, true. Other NTP >> implementations tend to use ephemeral ports; a quick histogram of 30 >> seconds or so of traffic to a stratum-2 NTP server suggests about >> half >> of the NTP traffic out there uses other ports. > > Don't forget PNAT. I'd also argue that the rc.firewall6 in base is > supposed to work with the ntpd in base. We should, however, not > forget > about ntpdate, which seems to use ephemeral ports. Certainly some forms of NAT might also "scrub" ntpd's use of port 123 to some random higher port, true enough. It's not recommended that machines providing time service to others have NAT in the way, though, so that circumstance wasn't at the top of my mind. :-) -- -Chuck From vanhu at FreeBSD.org Fri Jul 18 08:28:38 2008 From: vanhu at FreeBSD.org (VANHULLEBUS Yvan) Date: Fri Jul 18 08:28:46 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <487EC62A.3070301@freebsd.org> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> Message-ID: <20080718082834.GA11096@zen.inc> On Wed, Jul 16, 2008 at 09:10:18PM -0700, Sam Leffler wrote: [...] 
> Please test/review the following patch against HEAD: > > http://people.freebsd.org/~sam/nat_t-20080616.patch For those who may be interested, I ported Sam's changes to FreeBSD7; the patch is here: http://people.freebsd.org/~vanhu/patch-natt-test-releng7-20080717.diff Please note that this patch has NOT been pushed to the "official" location for NAT-T patches, as I have NOT tested it yet (the kernel compiled successfully, but I'll only be able to switch to it tomorrow, as I currently use the tunnel to that gateway to access it). > This adds only the kernel portion of the NAT-T support; you must provide > the user-level code from another place. Note for people who are interested: user-level code comes from ipsec-tools, as for previous versions of the NAT-T patch. Sam's changes only affect the kernel itself, so if you are already running a FreeBSD kernel+userland with the NAT-T patchset, you'll only need to repatch/rebuild your kernel; rebuilding world (or at least the includes) and ipsec-tools is NOT needed. Of course, if you're running a FreeBSD host which currently knows NOTHING about NAT-T, you'll need to apply the patch, rebuild your kernel, at least rebuild the includes (or ipsec-tools won't detect NAT-T support), then rebuild ipsec-tools. But that was already the procedure with previous versions of the patch. > The main difference from the patches floating around are in the > ctloutput path (adding proper locking for HEAD) and decap of ESP-in-UDP > frames. Assuming folks are ok w/ these changes I'll commit to HEAD. > Once this stuff goes in we can look at getting the user-mode mods into > the tree. I carried your locking changes over to the RELENG7 version (just changing INP_WLOCKS to INP_LOCKS); is that OK?
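Some background on the "decap of ESP-in-UDP frames" part of Sam's patch. Under NAT-T as specified in RFC 3948, IKE and ESP share UDP port 4500, and the receiver tells them apart by the start of the UDP payload:

- a single 0xFF byte is a NAT-keepalive;
- four zero bytes (the non-ESP marker) precede an IKE message;
- anything else is ESP, whose first four bytes are a nonzero SPI.

The user-space C sketch below illustrates only that classification rule. It is not the kernel code from the patch, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kinds of UDP payload that can arrive on the NAT-T port (RFC 3948). */
enum natt_kind {
    NATT_KEEPALIVE,  /* single 0xFF byte */
    NATT_IKE,        /* starts with the 4-byte all-zero non-ESP marker */
    NATT_ESP,        /* starts with a nonzero ESP SPI */
    NATT_INVALID     /* too short to be any of the above */
};

static enum natt_kind natt_classify(const uint8_t *p, size_t len)
{
    if (len == 1 && p[0] == 0xFF)
        return NATT_KEEPALIVE;
    if (len < 4)
        return NATT_INVALID;
    if (p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 0)
        return NATT_IKE;
    /* An ESP header needs at least the 32-bit SPI and 32-bit sequence number. */
    if (len < 8)
        return NATT_INVALID;
    return NATT_ESP;
}
```

The same rule is why an ESP SPI of zero is not allowed in this encapsulation: with a zero SPI, ESP traffic would be indistinguishable from IKE.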
While I'm here, a few words about the authors and contributors of the patch, just to make sure it has been said at least once :-) The original authors of the patch are Emmanuel Dreyfus (manu at NetBSD.org, for the NetBSD version) and me (for the FreeBSD version), from back when the patches for both BSDs were very similar. Larry ported the patch to the FAST_IPSEC stack (Larry, I'm quite sure you also contributed other patches, but I don't remember exactly what). Bjoern sent some fixes. I ported the patch to FreeBSD7 and to the current HEAD, and also did various other work on it. Sam made the changes we're talking about in this thread. Matthew did a LOT of tests with various implementations and reported bugs. I would also like to thank Julien VANHERZEELE, who does IPsec qualification at my workplace and who has also set up lots of NAT-T-related tests over the years. If anyone else sent me patches or bug reports and has not been credited here, please accept my apologies for my bad memory. If anyone has patches, bug reports, etc. related to this patch, please report them as soon as possible! Yvan. -- NETASQ http://www.netasq.com From onemda at gmail.com Fri Jul 18 11:23:28 2008 From: onemda at gmail.com (Paul B. Mahol) Date: Fri Jul 18 11:23:35 2008 Subject: kern/125181: [ndis] [patch] with wep enters kdb.enter.unknown, panics In-Reply-To: <200807171650.m6HGo4aK018239@freefall.freebsd.org> References: <200807171650.m6HGo4aK018239@freefall.freebsd.org> Message-ID: <3a142e750807180358i4e3baa3m9ebe7cad357fe2cf@mail.gmail.com> On 7/17/08, Andrew Thompson wrote: > The following reply was made to PR kern/125181; it has been noted by GNATS.
> > From: Andrew Thompson > To: Coleman Kane > Cc: bug-followup@FreeBSD.org, onemda@gmail.com > Subject: Re: kern/125181: [ndis] [patch] with wep enters kdb.enter.unknown, > panics > Date: Thu, 17 Jul 2008 09:43:42 -0700 > > On Thu, Jul 17, 2008 at 12:09:52PM -0400, Coleman Kane wrote: > > Andrew, > > > > I got directed to this PR by onemda@gmail.com (Paul D. Mahol), who's > > been helping me track down some edge cases in the if_ndis locking > > rewrite. I am not 100% familiar with the locking semantics in play here > > (IEEE80211 and VAPs), so I wanted to run it by you before I determine > > that it seems to be working well for me. > > I don't think ndis should be reaching into the net80211 lock. Now that > ndis uses a regular mutex it's a good chance to add mtx_asserts in the > right places and get the locking up to speed. I will try to post a patch > soon unless someone beats me to it. The patch's impact on performance is marginal, if not completely irrelevant. The only way to improve the code in that file is to rewrite the offending functions. And in the end the net80211 lock would still be there (called via some other function). From mgrooms at shrew.net Fri Jul 18 14:08:29 2008 From: mgrooms at shrew.net (Matthew Grooms) Date: Fri Jul 18 14:08:36 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <4880973B.2010200@shrew.net> References: <4880973B.2010200@shrew.net> Message-ID: <4880A3D7.5020300@shrew.net> > On Wed, Jul 16, 2008 at 09:10:18PM -0700, Sam Leffler wrote: > > > This adds only the kernel portion of the NAT-T support; you must provide > > the user-level code from another place. > > Note for people who are interested: > user-level code comes from ipsec-tools, as for previous versions of > the NAT-T patch.
> > Sam's changes have only impacts on the kernel itself, so if you are > already running a FreeBSD kernel+userland with NAT-T patchset, you'll > only need to repatch/rebuild your kernel, rebuilding world (at least > includes) and ipsec-tools is NOT needed. > > Of course, if you're running a FreeBSD host which actually does know > NOTHING about NAT-T, you'll need to apply the patch, rebuild your > kernel, at least rebuild includes (or ipsec-tools won't detect NAT-T > support), then rebuild ipsec-tools. > For anyone trying to install ipsec-tools to test this patch, it's worth mentioning that the port has a build issue on CURRENT. This has been corrected in CVS and in the 7-branch of ipsec-tools. As a quick remedy, a patch is attached that can be applied to the port's work sources. -Matthew -------------- next part -------------- Index: src/racoon/crypto_openssl.c =================================================================== RCS file: /cvsroot/src/crypto/dist/ipsec-tools/src/racoon/crypto_openssl.c,v retrieving revision 1.11.6.1 diff -u -r1.11.6.1 crypto_openssl.c --- src/racoon/crypto_openssl.c 18 Dec 2006 10:18:10 -0000 1.11.6.1 +++ src/racoon/crypto_openssl.c 18 Jul 2008 13:45:05 -0000 @@ -675,7 +675,7 @@ { plog(LLV_ERROR, LOCATION, NULL, "data is not terminated by NUL."); - hexdump(gen->d.ia5->data, gen->d.ia5->length + 1); + racoon_hexdump(gen->d.ia5->data, gen->d.ia5->length + 1); goto end; } Index: src/racoon/eaytest.c =================================================================== RCS file: /cvsroot/src/crypto/dist/ipsec-tools/src/racoon/eaytest.c,v retrieving revision 1.7.6.1 diff -u -r1.7.6.1 eaytest.c --- src/racoon/eaytest.c 6 Jun 2007 15:36:38 -0000 1.7.6.1 +++ src/racoon/eaytest.c 18 Jul 2008 13:45:05 -0000 @@ -65,7 +65,7 @@ #include "package_version.h" -#define PVDUMP(var) hexdump((var)->v, (var)->l) +#define PVDUMP(var) racoon_hexdump((var)->v, (var)->l) /*#define CERTTEST_BROKEN */ Index: src/racoon/misc.c
=================================================================== RCS file: /cvsroot/src/crypto/dist/ipsec-tools/src/racoon/misc.c,v retrieving revision 1.4 diff -u -r1.4 misc.c --- src/racoon/misc.c 9 Sep 2006 16:22:09 -0000 1.4 +++ src/racoon/misc.c 18 Jul 2008 13:45:05 -0000 @@ -73,7 +73,7 @@ #endif int -hexdump(buf0, len) +racoon_hexdump(buf0, len) void *buf0; size_t len; { Index: src/racoon/misc.h =================================================================== RCS file: /cvsroot/src/crypto/dist/ipsec-tools/src/racoon/misc.h,v retrieving revision 1.4 diff -u -r1.4 misc.h --- src/racoon/misc.h 9 Sep 2006 16:22:09 -0000 1.4 +++ src/racoon/misc.h 18 Jul 2008 13:45:05 -0000 @@ -42,7 +42,7 @@ #define LOCATION debug_location(__FILE__, __LINE__, NULL) #endif -extern int hexdump __P((void *, size_t)); +extern int racoon_hexdump __P((void *, size_t)); extern char *bit2str __P((int, int)); extern void *get_newbuf __P((void *, size_t)); extern const char *debug_location __P((const char *, int, const char *)); Index: src/racoon/racoonctl.c =================================================================== RCS file: /cvsroot/src/crypto/dist/ipsec-tools/src/racoon/racoonctl.c,v retrieving revision 1.7 diff -u -r1.7 racoonctl.c --- src/racoon/racoonctl.c 2 Oct 2006 07:12:26 -0000 1.7 +++ src/racoon/racoonctl.c 18 Jul 2008 13:45:06 -0000 @@ -303,7 +303,7 @@ err(1, "kmpstat"); if (loglevel) - hexdump(combuf, ((struct admin_com *)combuf)->ac_len); + racoon_hexdump(combuf, ((struct admin_com *)combuf)->ac_len); com_init(); From ticso at cicely7.cicely.de Fri Jul 18 14:32:53 2008 From: ticso at cicely7.cicely.de (Bernd Walter) Date: Fri Jul 18 14:33:13 2008 Subject: TCP zombie connections with 7-RELEASE and STABLE from 15th june Message-ID: <20080718135931.GA48087@cicely7.cicely.de> 14:45:58.109631 IP 213.83.6.106.3270 > 85.159.14.110.443: S 470580731:470580731(0) win 32768 14:45:58.109753 IP 85.159.14.110.443 > 213.83.6.106.3270: S 1364510055:1364510055(0) ack 470580732 
win 65535 14:45:58.114324 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 1 win 33304 14:45:59.816810 IP 213.83.6.106.3270 > 85.159.14.110.443: F 1:1(0) ack 1 win 33304 14:45:59.816900 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 2 win 8326 14:45:59.818445 IP 85.159.14.110.443 > 213.83.6.106.3270: F 1:1(0) ack 2 win 8326 14:45:59.822859 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.415401 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 1 win 0 14:46:00.420082 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.420139 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.424772 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.424847 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.429065 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.429089 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.433247 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.433305 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.437641 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.437700 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.442408 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.442445 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.447231 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.447291 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.451525 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.451587 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.455957 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.456024 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:00.460666 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.460732 IP 85.159.14.110.443 > 213.83.6.106.3270: . 
ack 1 win 65535 14:46:00.465092 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:00.465150 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 [...] 14:46:31.182624 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:31.182978 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.183006 IP 85.159.14.110.443 > 213.83.6.106.3270: . ack 1 win 65535 14:46:31.183146 IP 85.159.14.110.443 > 213.83.6.106.3270: F 1:1(0) ack 1 win 65535 14:46:31.183173 IP 85.159.14.110.443 > 213.83.6.106.3270: F 1:1(0) ack 1 win 65535 14:46:31.184038 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.184124 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.184157 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.184740 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.185174 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.186762 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.187366 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.187380 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303 14:46:31.187573 IP 213.83.6.106.3270 > 85.159.14.110.443: . ack 2 win 33303

Port 443 is a self-written server, but it also happens with port 80 and sslproxy. The client is telnet, which disconnects immediately after connecting, so the disconnect is initiated by the client, which seems to be important for triggering this problem. You can see that the FIN handshake completes, and netstat on the client box shows the connection in TIME_WAIT. The server, however, still has the connection in ESTABLISHED state. What happens in the application code looks quite silly. I do a typical accept loop and then process the data in a new thread. After my thread terminates and closes its file descriptor, the select loop accepts the old connection again. This doesn't happen in every case, but almost always.
Finally, after 30 seconds without data to read, my newly created thread closes the zombie connection again. The question is: why does accept() return a file descriptor for a connection that was already returned and should have been closed? -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, etc. From mgrooms at shrew.net Fri Jul 18 19:43:54 2008 From: mgrooms at shrew.net (Matthew Grooms) Date: Fri Jul 18 19:44:01 2008 Subject: Help with tap device configuration oddity Message-ID: <4880F273.1090802@shrew.net> All, I noticed a problem with some software I wrote for FreeBSD using tap devices. It would appear that the SIOCSIFADDR and SIOCSIFNETMASK ioctl calls give different results when used with a tap device than when used with a real Ethernet device. I wrote a quick test program to demonstrate this, which can be found at the following URL ... http://hole.shrew.net/~mgrooms/files/taptest.cpp

g++ taptest.cpp -o taptest

USAGE : taptest
[ifname]

Specify the ifname parameter to configure an existing adapter. Omit the ifname parameter to create a tap device and configure it instead.

When I use this with an Ethernet device on CURRENT, I get normal results ...

# ./taptest 10.1.2.3 255.255.255.0 1350 le1
ii : configured adapter le1 [10.1.2.3/255.255.255.0 MTU 1350]
le1: flags=8843 metric 0 mtu 1350
        options=8
        ether 00:0c:29:bd:60:2b
        inet 10.1.2.3 netmask 0xffffff00 broadcast 10.1.2.255
        media: Ethernet autoselect
        status: active
# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            10.aa.bbb.c        UGS         0       89    le0
10.1.2.0/24        link#2             UC          0        0    le1
...

When I use this with a tap device on CURRENT, I always get a wacky 10/8 route added and no 10.2.3/24 route as you would expect ...

# ./taptest 10.2.3.4 255.255.255.0 1350
creating tap device
ii : opened tap device /dev/tap0
ii : configured adapter tap0 [10.2.3.4/255.255.255.0 MTU 1350]
tap0: flags=8843 metric 0 mtu 1350
        ether 00:bd:59:d2:02:00
        inet 10.1.2.3 netmask 0xffffff00 broadcast 10.1.2.255
        Opened by PID 1497
# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            10.aa.bbb.c        UGS         0       89    le0
10.0.0.0/8         link#5             UC          0        0   tap0

This really messes with traffic that should go out the default route. I tested this on 6.2-RELEASE as well and got similar results ...

# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            10.a.b.c           UGS         0     5940   lnc0
10                 link#7             UC          0        0   tap0

Can someone please explain this to me?
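A detail that may help when reading this output: the unexpected routes (10.0.0.0/8 on CURRENT, shown simply as "10" on 6.2) are exactly the old classful network for any 10.x.x.x address. Before CIDR, any address whose first octet was below 128 was Class A, with an implied /8 mask. One possible reading, not confirmed in this thread, is that the route was installed with that classful default mask when the address was set, before the 255.255.255.0 netmask took effect. The C sketch below computes the classful mask and derives network and broadcast addresses for an explicit mask. The helper names are illustrative, and addresses are handled as host-order 32-bit integers for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order IPv4 address from dotted-quad octets. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Pre-CIDR classful default mask; 0 for class D/E, which have none. */
static uint32_t classful_mask(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)
        return 0xFF000000u;                 /* class A (0-127):   /8  */
    if ((addr & 0xC0000000u) == 0x80000000u)
        return 0xFFFF0000u;                 /* class B (128-191): /16 */
    if ((addr & 0xE0000000u) == 0xC0000000u)
        return 0xFFFFFF00u;                 /* class C (192-223): /24 */
    return 0;
}

/* Number of leading one bits in a contiguous netmask. */
static int mask_prefix_len(uint32_t mask)
{
    int n = 0;
    while (n < 32 && (mask & (0x80000000u >> n)))
        n++;
    return n;
}

static uint32_t network_addr(uint32_t addr, uint32_t mask)
{
    return addr & mask;
}

static uint32_t broadcast_addr(uint32_t addr, uint32_t mask)
{
    return addr | ~mask;
}
```

Applied to the tap case, 10.2.3.4 with the classful /8 mask gives network 10.0.0.0, which matches the 10.0.0.0/8 route in the netstat output. The explicit 255.255.255.0 mask would give the expected 10.2.3.0/24 instead. For 10.1.2.3/24 the sketch reproduces the 10.1.2.255 broadcast address that ifconfig reports for le1.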
Thanks in advance, -Matthew From lab at gta.com Sat Jul 19 01:40:53 2008 From: lab at gta.com (Larry Baird) Date: Sat Jul 19 01:41:00 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <487EC62A.3070301@freebsd.org> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> Message-ID: <20080719014051.GA80850@gta.com> Sam, > The main difference from the patches floating around are in the > ctloutput path (adding proper locking for HEAD) and decap of ESP-in-UDP > frames. Assuming folks are ok w/ these changes I'll commit to HEAD. > Once this stuff goes in we can look at getting the user-mode mods into > the tree. Didn't get the free time I thought I would have today. Hopefully over the weekend I will get time to finish reviewing the patch. I have attached the patch against HEAD for ipsec_mbuf.c. Now that FreeBSD has an SVN repository, creating diffs against HEAD is trivial. (-: Larry -- ------------------------------------------------------------------------ Larry Baird | http://www.gta.com Global Technology Associates, Inc. | Orlando, FL Email: lab@gta.com | TEL 407-380-0220, FAX 407-380-6080 From smithi at nimnet.asn.au Sat Jul 19 03:39:51 2008 From: smithi at nimnet.asn.au (Ian Smith) Date: Sat Jul 19 03:40:02 2008 Subject: Requesting comments on Multi-routing table usage In-Reply-To: <487F9CFB.2080901@elischer.org> Message-ID: On Thu, 17 Jul 2008, Julian Elischer wrote: > Julian Elischer wrote: > > Ian Smith wrote: > >> On Thu, 17 Jul 2008, Julian Elischer wrote: > >> > The current code in -current will add a new interface to all > >> > FIBs. [..] > >> Yes in addition to 'setfib N command' it would be likely useful to have > >> a more global 'setfibto' type command, so you could run whole scripts or > >> shells in a known fib context, to which scripts etc could be oblivious? > > > > that's already possible with setfib.. > > setfib N sh script is going to do that..
Yeah, I guess I was thinking more of setting a fixed FIB context 'from now on', which I think your ifconfig solution below probably best addresses. > > The issue I have is with the routes that are added to routing tables > > when an interface is added.. It's a specific instance that is tricky > > because it's a side effect rather than a directly requested action. > > > > what some people have asked to do is have multiple tunnels to the same > > place but have different routing tables specify different tunnels to get > > to that place.. > > > > e.g. > > > > gre0 1.1.1.1 2.2.2.2 > > gre1 3.3.3.3 2.2.2.2 > > gre2 4.4.4.4 2.2.2.2 > > > > where in fib 0 the route to 2.2.2.2 is via gre0 > > and in fib1 it is via gre1 > > and in fib2 it is via gre2 > > then you can use setfib in ipfw and pf to use different tunnels to get > > selected traffic to 2.2.2.2.. > > > > This is what is being asked for, but you can only add the > > interfaces like that if ifconfig only affects different FIBs for each > > interface. > > hmmm that makes me think that maybe an ifconfig command to associate > a FIB with an interface might do the trick... > if it's not associated with a FIB, its routes go to all of them, but if > you have previously associated it with a FIB, then only that FIB is > affected. > > That may just be a good enough answer. Do you have some suggested syntax for the ifconfig command? I may well just be blowing smoke here, and only lightly browsed this stuff earlier (with interest), but I wonder whether a choice between all FIBs and just one is too, well, binary for all possible situations, and whether some situations might wish to refer to some set of FIBs? And if so, rather than any complicated set manipulation, this could be accomplished - if needed - by having a '-option' syntax as is common to ifconfig arguments, to remove particular FIB(s) from the ALL set?
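To make the two proposals concrete: under Julian's rule, an interface with no FIB association gets its routes in every FIB, while an associated interface (say gre1 tied to FIB 1) affects only its own FIB. Ian's variant starts from the ALL set and removes individual FIBs with a '-option'-style argument. A set of FIBs maps naturally onto a bitmask, as in the C sketch below. This is purely a hypothetical model. None of these names, and no ifconfig syntax, exist in FreeBSD, and the fixed NUM_FIBS value stands in for however many routing tables a given system has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_FIBS 4u                       /* hypothetical number of FIBs */
#define ALL_FIBS ((1u << NUM_FIBS) - 1u)  /* bit n set => FIB n */

typedef uint32_t fibset_t;

/* Hypothetical per-interface state; not a real FreeBSD structure. */
struct iface_fibs {
    bool     associated;  /* has an explicit FIB association been made? */
    fibset_t fibs;        /* FIBs that receive this interface's routes */
};

/* Julian's rule: no association means "add the routes to all FIBs". */
static fibset_t iface_route_fibs(const struct iface_fibs *ifp)
{
    return ifp->associated ? ifp->fibs : ALL_FIBS;
}

/* Associate the interface with exactly one FIB, e.g. gre1 -> FIB 1. */
static void iface_set_fib(struct iface_fibs *ifp, unsigned fib)
{
    assert(fib < NUM_FIBS);
    ifp->associated = true;
    ifp->fibs = (fibset_t)1u << fib;
}

/* Ian's '-option' idea: start from the ALL set and drop one FIB. */
static void iface_remove_fib(struct iface_fibs *ifp, unsigned fib)
{
    assert(fib < NUM_FIBS);
    if (!ifp->associated) {
        ifp->associated = true;
        ifp->fibs = ALL_FIBS;
    }
    ifp->fibs &= ~((fibset_t)1u << fib);
}
```

Suppose gre0, gre1 and gre2 were associated with FIBs 0, 1 and 2 respectively. In this model each tunnel's route to 2.2.2.2 would then appear only in its own table, which is the multi-tunnel setup Julian describes.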
Just till someone equipped with proper net-fu turns up to comment :) cheers, Ian From gonzo at FreeBSD.org Sat Jul 19 13:24:24 2008 From: gonzo at FreeBSD.org (gonzo@FreeBSD.org) Date: Sat Jul 19 13:24:31 2008 Subject: kern/125442: [carp][lagg] CARP combined with LAGG causes system panic - 7.0/amd64 Message-ID: <200807191324.m6JDONgi019873@freefall.freebsd.org> Synopsis: [carp][lagg] CARP combined with LAGG causes system panic - 7.0/amd64 State-Changed-From-To: open->feedback State-Changed-By: gonzo State-Changed-When: Sat Jul 19 13:23:55 UTC 2008 State-Changed-Why: I'll take it. Responsible-Changed-From-To: freebsd-net->gonzo Responsible-Changed-By: gonzo Responsible-Changed-When: Sat Jul 19 13:23:55 UTC 2008 Responsible-Changed-Why: I'll take it. http://www.freebsd.org/cgi/query-pr.cgi?pr=125442 From kungfujesus06 at gmail.com Sat Jul 19 19:08:22 2008 From: kungfujesus06 at gmail.com (Adam Stylinski) Date: Sat Jul 19 19:08:28 2008 Subject: nfe driver Message-ID: <96af083b0807191144p38d49087kdfd3979f9c155ae8@mail.gmail.com> I have an mcp67 nforce networking controller using the nfe driver. I currently cannot set my MTU to anything higher than 1500. The controller definitely supports jumbo frames. Is there any hope of the BSD driver supporting it? I'm more than willing to test things out. I guess another question to ask would be if the newer kernel sources in freebsd-stable have support for jumbo frames on the MCP67 in the nfe driver. From brad at comstyle.com Sat Jul 19 19:28:04 2008 From: brad at comstyle.com (Brad) Date: Sat Jul 19 19:28:10 2008 Subject: nfe driver In-Reply-To: <96af083b0807191144p38d49087kdfd3979f9c155ae8@mail.gmail.com> References: <96af083b0807191144p38d49087kdfd3979f9c155ae8@mail.gmail.com> Message-ID: <200807191527.53302.brad@comstyle.com> On Saturday 19 July 2008 14:44:02 Adam Stylinski wrote: > The controller definitely supports jumbo frames. What proof do you have of this? 
> I guess another question to ask would be if the newer kernel sources in > freebsd-stable have support for jumbo frames on the MCP67 in the nfe > driver. No. From kip.macy at gmail.com Sun Jul 20 01:11:24 2008 From: kip.macy at gmail.com (Kip Macy) Date: Sun Jul 20 01:11:32 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <601bffc40807112344n7a683f81y516f540e24d87389@mail.gmail.com> References: <4867420D.7090406@gtcomm.net> <486A9A0E.6060308@elischer.org> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> <601bffc40807112344n7a683f81y516f540e24d87389@mail.gmail.com> Message-ID: On Fri, Jul 11, 2008 at 11:44 PM, Brian McGinty wrote: >> Hi Brian >> I very much doubt that this is ceteris paribus. This is 384 random IPs >> -> 384 random IP addresses with a flow lookup for each packet. Also, >> I've read through igb on Linux - it has a lot of optimizations that >> the FreeBSD driver lacks and I have yet to implement. > > Hey Kip, > when will you push the optimization into FreeBSD? Hi Brian, I'm hoping to get to it some time in August. I'm a bit behind in my contracts at the moment. FYI: I'm actually able to forward 2.3Mpps between 2 10Gig interfaces on an 8-core system. I'm hoping to push it up to 3Mpps.
Thanks, Kip From brian.mcginty at gmail.com Sun Jul 20 02:17:46 2008 From: brian.mcginty at gmail.com (Brian McGinty) Date: Sun Jul 20 02:17:52 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: References: <4867420D.7090406@gtcomm.net> <486B41D5.3060609@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> <601bffc40807112344n7a683f81y516f540e24d87389@mail.gmail.com> Message-ID: <601bffc40807191917g131bacao6485376365304f55@mail.gmail.com> G'day Kip, > I'm hoping to get to it some time in August. I'm a bit behind in my > contracts at the moment. A few weeks ago, I did a quick comparison of the driver between FreeBSD and Linux, and found quite a few differences that are worth pulling over. The guy from Intel working on FreeBSD, Jack?, is he the one that does this sort of sync-up of the drivers between the two distributions, or you? There have been a lot of changes recently, including full support for multiple Rx/Tx queues that significantly ups the ante on performance. FreeBSD doesn't support multiple Rx/Tx, or does something half arsed. > FYI: I'm actually able to forward 2.3Mpps between 2 10Gig interfaces > on an 8-core system. I'm hoping to push it up to 3Mpps. Is this a no-loss number, and how did you test it? I don't have throughput numbers for the Oplin. I'm waiting to get some time on the Ixia at work to generate performance numbers for 1G and 10G for all packet sizes, on FreeBSD and Linux, on a 16 core system, and blast it to the list.
I expect Linux to do 2-3 times better :-) Later, Brian From kip.macy at gmail.com Sun Jul 20 02:28:06 2008 From: kip.macy at gmail.com (Kip Macy) Date: Sun Jul 20 02:28:13 2008 Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp] In-Reply-To: <601bffc40807191917g131bacao6485376365304f55@mail.gmail.com> References: <4867420D.7090406@gtcomm.net> <4871E85C.8090907@freebsd.org> <48726422.7050703@gtcomm.net> <200807080107.m6817XxO021966@lava.sentex.ca> <601bffc40807081346q454c1f40td47a0f54806d8a8c@mail.gmail.com> <601bffc40807112344n7a683f81y516f540e24d87389@mail.gmail.com> <601bffc40807191917g131bacao6485376365304f55@mail.gmail.com> Message-ID: On Sat, Jul 19, 2008 at 7:17 PM, Brian McGinty wrote: > G'day Kip, > >> I'm hoping to get to it some time in August. I'm a bit behind in my >> contracts at the moment. > > A few weeks ago, I did a quick comparison of the driver between > FreeBSD and Linux, and found quite a few differences that are worth > pulling over. The guy from Intel working on FreeBSD, Jack?, is he the > one that does this sort of sync-up of the drivers between the two > distributions, or you? There have been a lot of changes recently, > including full support for multiple Rx/Tx queues that significantly > ups the ante on performance. FreeBSD doesn't support multiple Rx/Tx, > or does something half arsed. This is on a variant of RELENG_6 FreeBSD with a recent version of ULE and running the Checkpoint firewall. It also uses the full number of queues available to igb (4) and #queues == #cores (8 in this case) for ixgbe. The drivers in CVS have some bugs that I have fixed in this FreeBSD variant. FreeBSD's CVS version of the Intel drivers definitely lags Linux in terms of some optimizations. Even my version doesn't have some of the Linux optimizations. >> FYI: I'm actually able to forward 2.3Mpps between 2 10Gig interfaces >> on an 8-core system. I'm hoping to push it up to 3Mpps.
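Kip mentions running ixgbe with as many queues as cores. On multi-queue NICs, receive traffic is normally spread by hashing each packet's flow identifiers, so every packet of a given flow lands on the same queue (and core) and stays in order relative to the rest of that flow, while different flows spread across cores. The sketch below models only that mapping. It uses FNV-1a as a simple stand-in for the Toeplitz hash that RSS-capable hardware computes, and `struct flow_key`, `flow_hash()` and `flow_to_queue()` are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the IPv4 5-tuple. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Feed the low nbytes of v into an FNV-1a hash, byte by byte, so that
 * structure padding never influences the result. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;            /* FNV-1a 32-bit prime */
    }
    return h;
}

static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = 2166136261u;      /* FNV-1a 32-bit offset basis */
    h = fnv1a_mix(h, k->src_ip, 4);
    h = fnv1a_mix(h, k->dst_ip, 4);
    h = fnv1a_mix(h, k->src_port, 2);
    h = fnv1a_mix(h, k->dst_port, 2);
    h = fnv1a_mix(h, k->proto, 1);
    return h;
}

/* Every packet of a flow maps to the same queue, so per-flow order is kept. */
static unsigned flow_to_queue(const struct flow_key *k, unsigned nqueues)
{
    assert(nqueues > 0);
    return flow_hash(k) % nqueues;
}
```

Ordering is preserved only within a flow: packets of different flows may leave on different cores and be forwarded out of order relative to each other, which is normally harmless.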
This is testing with an IXIA. I don't currently have zero-loss numbers. This is not fully loaded. However, ixgbe spews out pause frames when rx gets backed up, so losses never get much above 0.1%. > Is this a no-loss number, and how did you test it? I don't have > throughput numbers for the Oplin. I'm waiting to get some time on the > Ixia at work to generate performance numbers for 1G and 10G for all > packet sizes, on FreeBSD and Linux, on a 16 core system, and blast it > to the list. I expect Linux to do 2-3 times better :-) Sure, if you don't care about packet reordering. On their own box Checkpoint claims that Linux is currently able to do 20% better than we are seeing. Even they don't claim 200% - 300%. I know people who are switching off of Linux for memcache because they simply can't make it perform. So your mileage really varies depending on the workload. I'm not sure where you get your numbers from. I would really like to get hold of this magical Linux distribution to do a side-by-side comparison on the same workload. A 200% - 300% performance delta would definitely justify switching. Thanks, Kip From luigi.iannone at uclouvain.be Sun Jul 20 13:25:13 2008 From: luigi.iannone at uclouvain.be (Luigi Iannone) Date: Sun Jul 20 13:25:21 2008 Subject: OpenLISP Message-ID: Hello FreeBSD Networking Community, Over the last few years, there have been many discussions about the scalability of the Internet architecture, notably within the IRTF RRG. With IPv6, thanks to its huge addressing space, it is possible to design protocols and mechanisms that are more scalable and more powerful than with IPv4. A typical example is the multihoming problem. This problem occurs when a site is attached to several Internet service providers. With IPv4, the classical solution is for the site to obtain one IPv4 prefix and advertise it by using BGP.
This solution works and traffic engineering is possible, but unfortunately, it contributes to significant growth of the BGP routing tables in the global Internet. Approaches to better scale the Internet architecture are being discussed, notably within the Routing Research Group of the Internet Research Task Force. Several of these approaches rely on separating the two roles of IP addresses: the locator role and the identifier role. In today's IPv4 Internet, IPv4 addresses are used both to indicate the location in the Internet topology of a host (the locator role) and to terminate the transport flows on end-hosts (the identifier role). This means that it is difficult to change the IP address of a host without disrupting transport flows. The techniques that separate identifiers from locators take a different approach. First, an identifier is attached to each end-host. This identifier is used to terminate the transport flows. Second, each identifier may be reachable through multiple locators and a mapping mechanism is used to map an identifier (or a set of identifiers) onto a set of locators. This improves the scalability of the routing system as only the locators need to be distributed by BGP, provided, of course, that the mapping system remains scalable. Furthermore, separating identifiers and locators has several additional benefits in terms of path diversity and performance. Some approaches propose to attach locators to hosts while others prefer to attach locators only to routers. The latter approach is the solution chosen by the proponents of the Locator/Identifier Separation Protocol (LISP). LISP is a router-based solution to solve the scaling problems of the Internet architecture that is currently being developed by Cisco. There are still many open questions, notably concerning the mapping between identifiers and locators. To allow researchers and network operators to experiment with LISP, the IP Networking Lab of UCLouvain is releasing OpenLISP.
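The identifier-to-locator mapping described above can be illustrated with a small map-cache: a border router looks up the destination identifier (an EID, in the LISP draft's terminology) by longest-prefix match and encapsulates towards one of that prefix's locators (RLOCs). This is a hypothetical userland sketch, not OpenLISP code: the structures and function names are invented, addresses are IPv4 values in host byte order, and choosing the reachable locator with the lowest priority value follows the convention in the LISP drafts, which is worth checking against the draft itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RLOCS 4

/* Hypothetical map-cache structures for this sketch only. */
struct rloc {
    uint32_t addr;      /* locator address (IPv4, host byte order) */
    uint8_t  priority;  /* lower value preferred (assumed convention) */
    int      up;        /* reachability flag */
};

struct map_entry {
    uint32_t    eid_prefix;   /* identifier prefix */
    uint8_t     plen;         /* prefix length, 0..32 */
    size_t      nrlocs;
    struct rloc rlocs[MAX_RLOCS];
};

static uint32_t prefix_mask(uint8_t plen)
{
    return plen == 0 ? 0 : UINT32_MAX << (32 - plen);
}

/* Longest-prefix match of a destination identifier in the map-cache. */
static const struct map_entry *
map_lookup(const struct map_entry *cache, size_t n, uint32_t eid)
{
    const struct map_entry *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(cache[i].plen);
        if ((eid & m) == (cache[i].eid_prefix & m) &&
            (best == NULL || cache[i].plen > best->plen))
            best = &cache[i];
    }
    return best;
}

/* Pick the reachable locator with the lowest priority value; returns 0
 * (0.0.0.0) when none is usable, so the caller would drop or queue. */
static uint32_t select_rloc(const struct map_entry *e)
{
    const struct rloc *best = NULL;

    for (size_t i = 0; i < e->nrlocs; i++) {
        const struct rloc *r = &e->rlocs[i];
        if (r->up && (best == NULL || r->priority < best->priority))
            best = r;
    }
    return best != NULL ? best->addr : 0;
}
```

In the multihoming case, a site prefix registered with locators from two providers keeps working when the preferred locator is marked down, and only the locators, not the site prefix, need to appear in BGP.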
OpenLISP is the first publicly available implementation of LISP on the FreeBSD kernel. OpenLISP was designed and implemented by Luigi Iannone. You can find more details about OpenLISP at http://inl.info.ucl.ac.be Any feedback from the FreeBSD Networking community is more than welcome. Best regards, Luigi Iannone luigi.iannone@uclouvain.be From julian at elischer.org Sun Jul 20 18:33:00 2008 From: julian at elischer.org (Julian Elischer) Date: Sun Jul 20 18:33:06 2008 Subject: OpenLISP In-Reply-To: References: Message-ID: <488384E5.3060608@elischer.org> Luigi Iannone wrote: > Hello FreeBSD Networking Community, > hello to you too :-) > The latter approach is the solution chosen by the proponents of the > Locator/Identifier Separation Protocol (LISP). LISP is a router-based > solution to solve the scaling problems of the Internet architecture that > is currently being developed by Cisco. Couldn't possibly come up with a better acronym? "lisp" is kinda taken.. are there any documents with PICTURES you can recommend to us? Does this connect at all with SCTP's capacity to multihome? jelische@cisco.com From Luigi.Iannone at uclouvain.be Sun Jul 20 18:45:53 2008 From: Luigi.Iannone at uclouvain.be (Luigi Iannone) Date: Sun Jul 20 18:46:03 2008 Subject: OpenLISP In-Reply-To: <488384E5.3060608@elischer.org> References: <488384E5.3060608@elischer.org> Message-ID: Hi, On 20 Jul 2008, at 20:33, Julian Elischer wrote: > Luigi Iannone wrote: >> Hello FreeBSD Networking Community, > > hello to you too :-) > > >> The latter approach is the solution chosen by the proponents of >> the Locator/Identifier Separation Protocol (LISP). LISP is a >> router-based solution to solve the scaling problems of the >> Internet architecture that is currently being developed by Cisco. > > Couldn't possibly come up with a better acronym? "lisp" is kinda > taken.. > > are there any documents with PICTURES you can recommend to us?
> Well, the official document can be found here: http://www.ietf.org/internet-drafts/draft-farinacci-lisp-08.txt If you want some _pictures_ you can take a look here: http://rosie.ripe.net/ripe/meetings/ripe-56/presentations/uploads/Tuesday/Plenary%2016:00/upl/Fuller-LISP_Intro_and_Update.gNyX.pps > Does this connect at all with SCTP's capacity to multihome? > Not really. As far as I know SCTP is an end-to-end solution, where end-to-end stands for end-hosts. LISP is meant to be deployed mainly on border routers of stub domains. Cheers Luigi > > > jelische@cisco.com Luigi Iannone luigi.iannone@uclouvain.be From max at love2party.net Sun Jul 20 18:51:06 2008 From: max at love2party.net (Max Laier) Date: Sun Jul 20 18:51:13 2008 Subject: OpenLISP In-Reply-To: <488384E5.3060608@elischer.org> References: <488384E5.3060608@elischer.org> Message-ID: <200807202051.04431.max@love2party.net> On Sunday 20 July 2008 20:33:09 Julian Elischer wrote: > Luigi Iannone wrote: > > Hello FreeBSD Networking Community, > > hello to you too :-) > > > The latter approach is the solution chosen by the proponents of the > > Locator/Identifier Separation Protocol (LISP). LISP is a router-based > > solution to solve the scaling problems of the Internet architecture > > that is currently being developed by Cisco. > > Couldn't possibly come up with a better acronym? "lisp" is kinda > taken.. > > are there any documents with PICTURES you can recommend to us? The draft is quite readable: http://tools.ietf.org/html/draft-farinacci-lisp-08 > Does this connect at all with SCTP's capacity to multihome? (AFAIK) Not at all. They try to solve similar problems (or at least there is some intersection). A word about the implementation. The interception mechanism for LISP tunneled packets in ip_input/forward is *horrible*! Some of that is due to the design, but I believe it can be implemented much cleaner if you were to use the pfil(9) API.
I'd really like to avoid putting this kind of stuff into the main ip code as it hurts readability a lot. -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News From Luigi.Iannone at uclouvain.be Sun Jul 20 19:00:31 2008 From: Luigi.Iannone at uclouvain.be (Luigi Iannone) Date: Sun Jul 20 19:00:38 2008 Subject: OpenLISP In-Reply-To: <200807202051.04431.max@love2party.net> References: <488384E5.3060608@elischer.org> <200807202051.04431.max@love2party.net> Message-ID: <4711A2FE-DFB4-44C0-9FA6-D69BD1B05C2E@uclouvain.be> > Hi, > A word about the implementation. The interception mechanism for LISP > tunneled packets in ip_input/forward is *horrible*! Some of that > is due > to the design, but I believe it can be implemented much cleaner if you > were to use the pfil(9) API. I'd really like to avoid putting this > kind > of stuff into the main ip code as it hurts readability a lot. > Thanks for the hint I'll get a look at that. Cheers Luigi > -- > /"\ Best regards, | mlaier@freebsd.org > \ / Max Laier | ICQ #67774661 > X http://pf4freebsd.love2party.net/ | mlaier@EFnet > / \ ASCII Ribbon Campaign | Against HTML Mail and News > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" Luigi Iannone luigi.iannone@uclouvain.be From julian at elischer.org Sun Jul 20 19:49:27 2008 From: julian at elischer.org (Julian Elischer) Date: Sun Jul 20 19:49:34 2008 Subject: OpenLISP In-Reply-To: <4711A2FE-DFB4-44C0-9FA6-D69BD1B05C2E@uclouvain.be> References: <488384E5.3060608@elischer.org> <200807202051.04431.max@love2party.net> <4711A2FE-DFB4-44C0-9FA6-D69BD1B05C2E@uclouvain.be> Message-ID: <488396CE.1060008@elischer.org> Luigi Iannone wrote: >> > > Hi, > > >> A word about the implementation. 
The interception mechanism for LISP >> tunneled packets in ip_input/forward is *horrible*! Some of that is due >> to the design, but I believe it can be implemented much cleaner if you >> were to use the pfil(9) API. I'd really like to avoid putting this kind >> of stuff into the main ip code as it hurts readability a lot. >> > > Thanks for the hint I'll get a look at that. my head hurts after reading that :-) I think I only 'got' half of it.. I'll read it again later.. The aim of this is to reduce routing table size and allow multihoming with the destination being able to suggest to a remote site how to route back to it right? > > Cheers > > Luigi > > From Luigi.Iannone at uclouvain.be Sun Jul 20 19:58:00 2008 From: Luigi.Iannone at uclouvain.be (Luigi Iannone) Date: Sun Jul 20 19:58:06 2008 Subject: OpenLISP In-Reply-To: <488396CE.1060008@elischer.org> References: <488384E5.3060608@elischer.org> <200807202051.04431.max@love2party.net> <4711A2FE-DFB4-44C0-9FA6-D69BD1B05C2E@uclouvain.be> <488396CE.1060008@elischer.org> Message-ID: On 20 Jul 2008, at 21:49, Julian Elischer wrote: > Luigi Iannone wrote: >>> >> Hi, >>> A word about the implementation. The interception mechanism for >>> LISP tunneled packets in ip_input/forward is *horrible*! Some of >>> that is due to the design, but I believe it can be implemented >>> much cleaner if you were to use the pfil(9) API. I'd really like >>> to avoid putting this kind of stuff into the main ip code as it >>> hurts readability a lot. >>> >> Thanks for the hint I'll get a look at that. > > my head hurts after reading that :-) > > I think I only 'got' half of it.. > I'll read it again later.. > The aim of this is to reduce routing table size and allow multihoming > You got it. > with the destination being able to suggest to a remote site how to > route back to it right? > or at least suggest where to go through ... L.
> >> Cheers >> Luigi Luigi Iannone luigi.iannone@uclouvain.be From kmacy at freebsd.org Sun Jul 20 22:38:35 2008 From: kmacy at freebsd.org (Kip Macy) Date: Sun Jul 20 22:39:04 2008 Subject: moving sockbuf in to its own header Message-ID: <3c1674c90807201514o5bafba72r6be84de6e37a5525@mail.gmail.com> TOE drivers need to be able to directly enqueue data into a socket buffer and thus benefit from having knowledge of sockbuf internals. However, there is no need for them to know about other socket definitions. Thus I would like to move sockbuf and accompanying definitions to their own header. This is a fairly straightforward change so I don't really see the need to wait more than a few days for feedback: http://www.fsmware.com/sockbuf.h.diff From kip.macy at gmail.com Sun Jul 20 23:07:30 2008 From: kip.macy at gmail.com (Kip Macy) Date: Sun Jul 20 23:07:36 2008 Subject: moving sockbuf in to its own header In-Reply-To: <3c1674c90807201514o5bafba72r6be84de6e37a5525@mail.gmail.com> References: <3c1674c90807201514o5bafba72r6be84de6e37a5525@mail.gmail.com> Message-ID: Actually, I'd like to re-factor multiple parts of socketvar into separate files. Please provide feedback on the following: http://www.fsmware.com/socketvar_refactor.diff Thanks, Kip On Sun, Jul 20, 2008 at 3:14 PM, Kip Macy wrote: > TOE drivers need to be able to directly enqueue data into a socket > buffer and thus benefit from having knowledge of sockbuf internals. > However, there is no need for them to know about other socket > definitions. Thus I would like to move sockbuf and accompanying > definitions to their own header.
> > This is a fairly straightforward change so I don't really see the need > to wait more than a few days for feedback: > > http://www.fsmware.com/sockbuf.h.diff > From vanhu at FreeBSD.org Mon Jul 21 08:31:13 2008 From: vanhu at FreeBSD.org (VANHULLEBUS Yvan) Date: Mon Jul 21 08:31:20 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <487EC62A.3070301@freebsd.org> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> Message-ID: <20080721083110.GA21786@zen.inc> On Wed, Jul 16, 2008 at 09:10:18PM -0700, Sam Leffler wrote: [...] > Please test/review the following patch against HEAD: > > http://people.freebsd.org/~sam/nat_t-20080616.patch I have tested the RELENG7 version of the patch, and it works well. But I noticed a misplaced #endif at the beginning of udp_ctloutput(), which will generate problems if INET6 is not defined:

    if (sopt->sopt_level != IPPROTO_UDP) {
#ifdef INET6
        if (INP_CHECK_SOCKAF(so, AF_INET6)) {
            INP_WUNLOCK(inp);
            error = ip6_ctloutput(so, sopt);
#endif
        } else {
            INP_WUNLOCK(inp);
            error = ip_ctloutput(so, sopt);
#ifdef INET6
        }
#endif
        return (error);
    }

The code should be:

    if (sopt->sopt_level != IPPROTO_UDP) {
#ifdef INET6
        if (INP_CHECK_SOCKAF(so, AF_INET6)) {
            INP_WUNLOCK(inp);
            error = ip6_ctloutput(so, sopt);
        } else {
#endif
            INP_WUNLOCK(inp);
            error = ip_ctloutput(so, sopt);
#ifdef INET6
        }
#endif
        return (error);
    }

Yvan. From bzeeb-lists at lists.zabbadoz.net Mon Jul 21 09:30:07 2008 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A.
Zeeb) Date: Mon Jul 21 09:30:14 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <487EC62A.3070301@freebsd.org> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> Message-ID: <20080721085325.B57089@maildrop.int.zabbadoz.net> On Wed, 16 Jul 2008, Sam Leffler wrote: Hi, > Please test/review the following patch against HEAD: > > http://people.freebsd.org/~sam/nat_t-20080616.patch > > This adds only the kernel portion of the NAT-T support; you must provide the > user-level code from another place. > > The main difference from the patches floating around are in the ctloutput > path (adding proper locking for HEAD) and decap of ESP-in-UDP frames. > Assuming folks are ok w/ these changes I'll commit to HEAD. Once this stuff > goes in we can look at getting the user-mode mods into the tree. I have skimmed through the patch. My main concern at the moment is the API (pfkey stuff) to userland as Yvan had stated in <20080626075307.GA1401@zen.inc>. I know that at the moment there seems to be one public (pseudo) reference implementation with which this all works together, but there might be/are other people not using libipsec from ipsec-tools. The point is that a change to the API, once this hits the tree, will be hard to detect at a later point, if at all (unless with a __FreeBSD_version or (another) library version bump/sym versioning). We are still missing other things, I think not mentioned elsewhere, like partial checksum recalculation. I still wonder if we'd have all the information (at the right place) in the kernel so we could easily add support for that at a later time w/o having to change APIs again. Considering that it seems no one using this patch in products has implemented this .. I dunno. It's something that is already mentioned in the introduction of RFC 3947 and in 3.1.2.
of 3948 and thus should be very obvious to anyone who has ever seriously thought of finishing a proper, more than "it works for me", version of the patch. Some minor things I have seen that have not been reported so far: I have seen two printfs that should be changed to proper logging, ... /NAT-T OA present s,bave,have, in "...in the SPD: This means we bave a non-generated" but maybe change the entire comment. "non-generated SPD" is kind of wrong wording. I'd happily go through another patch once the missing and to-be-corrected things have been addressed. /bz -- Bjoern A. Zeeb Stop bit received. Insert coin for new game. From bugmaster at FreeBSD.org Mon Jul 21 11:06:59 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jul 21 11:08:18 2008 Subject: Current problem reports assigned to freebsd-net@FreeBSD.org Message-ID: <200807211106.m6LB6x0X031944@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/27474 net [ipf] [ppp] Interactive use of user PPP and ipfilter c o kern/35442 net [sis] [patch] Problem transmitting runts in if_sis dri a kern/38554 net [patch] changing interface ipaddress doesn't seem to w s kern/39937 net ipstealth issue s kern/77195 net [ipf] [patch] ipfilter ioctl SIOCGNATL does not match o kern/79895 net [ipf] 5.4-RC2 breaks ipfilter NAT when using netgraph s kern/81147 net [net] [patch] em0 reinitialization while adding aliase o kern/86103 net [ipf] Illegal NAT Traversal in IPFilter s kern/86920 net [ndis] ifconfig: SIOCS80211: Invalid argument [regress o kern/87521 net [ipf] [panic] using ipfilter "auth" keyword leads to k o kern/92090 net [bge] bge0: watchdog timeout -- resetting f kern/92552 net A serious bug in most network drivers from 5.X to 6.X o kern/95288 net [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr o kern/98978 net [ipf] [patch] ipfilter drops OOW packets under 6.1-Rel o kern/101948 net
[ipf] [panic] Kernel Panic Trap No 12 Page Fault - cau f kern/102344 net [ipf] Some packets do not pass through network interfa o bin/105925 net problems with ifconfig(8) and vlan(4) [regression] s kern/105943 net Network stack may modify read-only mbuf chain copies o kern/106316 net [dummynet] dummynet with multipass ipfw drops packets o kern/106438 net [ipf] ipfilter: keep state does not seem to allow repl o kern/108542 net [bce]: Huge network latencies with 6.2-RELEASE / STABL o bin/108895 net pppd(8): PPPoE dead connections on 6.2 [regression] o kern/109308 net [pppd] [panic] Multiple panics kernel ppp suspected [r o kern/109733 net [bge] bge link state issues [regression] o kern/112528 net [nfs] NFS over TCP under load hangs with "impossible p o kern/112686 net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38 o kern/112722 net [udp] IP v4 udp fragmented packet reject o kern/113842 net [ip6] PF_INET6 proto domain state can't be cleared wit o kern/114714 net [gre][patch] gre(4) is not MPSAFE and does not support o kern/114839 net [fxp] fxp looses ability to speak with traffic o kern/115239 net [ipnat] panic with 'kmem_map too small' using ipnat o kern/116077 net [ip] [patch] 6.2-STABLE panic during use of multi-cast o kern/116185 net [iwi] if_iwi driver leads system to reboot o kern/116328 net [bge]: Solid hang with bge interface o kern/116747 net [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile o kern/116837 net [tun] [panic] [patch] ifconfig tunX destroy: panic o kern/117043 net [em] Intel PWLA8492MT Dual-Port Network adapter EEPROM o kern/117271 net [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap o kern/117423 net [vlan] Duplicate IP on different interfaces o kern/117448 net [carp] 6.2 kernel crash [regression] o kern/118880 net [ip6] IP_RECVDSTADDR & IP_SENDSRCADDR not implemented o kern/119225 net [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr o kern/119345 net [ath] Unsuported Atheros 5424/2424 and CPU speedstep n o kern/119361 
net [bge] bge(4) transmit performance problem o kern/119945 net [rum] [panic] rum device in hostap mode, cause kernel o kern/120130 net [carp] [panic] carp causes kernel panics in any conste o kern/120266 net [panic] gnugk causes kernel panic when closing UDP soc o kern/120304 net [netgraph] [patch] netgraph source assumes 32-bit time o kern/120966 net [rum] kernel panic with if_rum and WPA encryption o kern/121080 net [bge] IPv6 NUD problem on multi address config on bge0 o kern/121181 net [panic] Fatal trap 3: breakpoint instruction fault whi o kern/121298 net [em] [panic] Fatal trap 12: page fault while in kernel o kern/121437 net [vlan] Routing to layer-2 address does not work on VLA o kern/121555 net [panic] Fatal trap 12: current process = 12 (swi1: net o kern/121624 net [em] [regression] Intel em WOL fails after upgrade to o kern/121872 net [wpi] driver fails to attach on a fujitsu-siemens s711 o kern/121983 net [fxp] fxp0 MBUF and PAE o kern/122033 net [ral] [lor] Lock order reversal in ral0 at bootup [reg o kern/122058 net [em] [panic] Panic on em1: taskq o kern/122082 net [in_pcb] NULL pointer dereference in in_pcbdrop o kern/122195 net [ed] Alignment problems in if_ed f kern/122252 net [ipmi] [bge] IPMI problem with BCM5704 (does not work o kern/122290 net [netgraph] [panic] Netgraph related "kmem_map too smal o kern/122427 net [apm] [panic] apm and mDNSResponder cause panic during o kern/122551 net [bge] Broadcom 5715S no carrier on HP BL460c blade usi o kern/122685 net It is not visible passing packets in tcpdump o kern/122743 net [panic] vm_page_unwire: invalid wire count: 0 o kern/122772 net [em] em0 taskq panic, tcp reassembly bug causes radix f kern/122794 net [lagg] Kernel panic after brings lagg(8) up if NICs ar f conf/122858 net [nsswitch.conf] nsswitch in 7.0 is f*cked up o kern/122954 net [lagg] IPv6 EUI64 incorrectly chosen for lagg devices o kern/122989 net [swi] [panic] 6.3 kernel panic in swi1: net o kern/123066 net [ipsec] [panic] kernel 
trap with ipsec o kern/123160 net [ip] Panic and reboot at sysctl kern.polling.enable=0 f kern/123172 net [bce] Watchdog timeout problems with if_bce f kern/123200 net [netgraph] Server failure due to netgraph mpd and dhcp o conf/123330 net [nsswitch.conf] Enabling samba wins in nsswitch.conf c o kern/123347 net [bge] bge1: watchdog timeout -- linkstate changed to D o kern/123429 net [nfe] [hang] "ifconfig nfe up" causes a hard system lo o kern/123463 net [ipsec] [panic] repeatable crash related to ipsec-tool o bin/123465 net [ip6] route(8): route add -inet6 -interfac o kern/123559 net [iwi] iwi periodically disassociates/associates [regre o kern/123603 net [tcp] tcp_do_segment and Received duplicate SYN o kern/123617 net [tcp] breaking connection when client downloading file o bin/123633 net ifconfig(8) doesn't set inet and ether address in one o kern/123796 net [ipf] FreeBSD 6.1+VPN+ipnat+ipf: port mapping does not o kern/123881 net [tcp] Turning on TCP blackholing causes slow localhost o kern/123968 net [rum] [panic] rum driver causes kernel panic with WPA. o kern/124021 net [ip6] [panic] page fault in nd6_output() o kern/124127 net [msk] watchdog timeout (missed Tx interrupts) -- recov o kern/124753 net [ieee80211] net80211 discards power-save queue packets o kern/124904 net [fxp] EEPROM corruption with Compaq NC3163 NIC o kern/125079 net [ppp] host routes added by ppp with gateway flag (regr f kern/125195 net [fxp] fxp(4) driver failed to initialize device Intel 94 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o conf/23063 net [PATCH] for static ARP tables in rc.network o kern/34665 net [ipf] [hang] ipfilter rcmd proxy "hangs". 
s bin/41647    net  ifconfig(8) doesn't accept lladdr along with inet addr
o kern/54383   net  [nfs] [patch] NFS root configurations without dynamic
s kern/60293   net  FreeBSD arp poison patch
o kern/64556   net  [sis] if_sis short cable fix problems with NetGear FA3
o kern/70904   net  [ipf] ipfilter ipnat problem with h323 proxy support
o kern/77273   net  [ipf] ipfilter breaks ipv6 statefull filtering on 5.3
o kern/77913   net  [wi] [patch] Add the APDL-325 WLAN pccard to wi(4)
o kern/78090   net  [ipf] ipf filtering on bridged packets doesn't work if
o bin/79228    net  [patch] extend arp(8) to be able to create blackhole r
o kern/91594   net  [em] FreeBSD > 5.4 w/ACPI fails to detect Intel Pro/10
s kern/91777   net  [ipf] [patch] wrong behaviour with skip rule inside an
o kern/93378   net  [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo
o kern/95267   net  packet drops periodically appear
o kern/95277   net  [netinet] [patch] IP Encapsulation mask_match() return
o kern/100519  net  [netisr] suggestion to fix suboptimal network polling
o kern/102035  net  [plip] plip networking disables parallel port printing
o conf/102502  net  [patch] ifconfig name does't rename netgraph node in n
o conf/107035  net  [patch] bridge interface given in rc.conf not taking a
o kern/109470  net  [wi] Orinoco Classic Gold PC Card Can't Channel Hop
o kern/112179  net  [sis] [patch] sis driver for natsemi DP83815D autonego
o bin/112557   net  [patch] ppp(8) lock file should not use symlink name
o kern/114915  net  [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o bin/116643   net  [patch] [request] fstat(1): add INET/INET6 socket deta
o bin/117339   net  [patch] route(8): loading routing management commands
o kern/118727  net  [netgraph] [patch] [request] add new ng_pf module
a kern/118879  net  [bge] [patch] bge has checksum problems on the 5703 ch
o bin/118987   net  ifconfig(8): ifconfig -l (address_family) does not wor
o kern/119432  net  [arp] route add -host -iface causes arp e
f kern/119516  net  [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi
o kern/119617  net  [nfs] nfs error on wpa network when reseting/shutdown
o kern/119791  net  [nfs] UDP NFS mount of aliased IP addresses from a Sol
o kern/120232  net  [nfe] [patch] Bring in nfe(4) to RELENG_6
o kern/120566  net  [request]: ifconfig(8) make order of arguments more fr
o kern/121257  net  [tcp] TSO + natd -> slow outgoing tcp traffic
o kern/121443  net  [gif] LOR icmp6_input/nd6_lookup
o kern/121706  net  [netinet] [patch] "rtfree: 0xc4383870 has 1 refs" emit
s kern/121774  net  [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122068  net  [ppp] ppp can not set the correct interface with pptpd
o kern/122295  net  [bge] bge Ierr rate increase (since 6.0R) [regression]
o kern/122319  net  [wi] imposible to enable ad-hoc demo mode with Orinoco
o kern/122697  net  [ath] Atheros card is not well supported
o kern/122780  net  [lagg] tcpdump on lagg interface during high pps wedge
f kern/122839  net  [multicast] FreeBSD 7 multicast routing problem
o kern/122928  net  [em] interface watchdog timeouts and stops receiving p
o kern/123892  net  [tap] [patch] No buffer space available
p kern/123961  net  [vr] [patch] Allow vr interface to handle vlans
o bin/124004   net  ifconfig(8): Cannot assign both an IP and a MAC addres
o kern/124160  net  [libc] connect(2) function loops indefinitely
o kern/124341  net  [ral] promiscuous mode for wireless device ral0 looses
o kern/124609  net  [ipsec] [panic] ipsec 'remainder too big' panic with p
o kern/124767  net  [iwi] Wireless connection using iwi0 driver (Intel 220
o kern/125181  net  [ndis] [patch] with wep enters kdb.enter.unknown, pani
o kern/125239  net  [gre] kernel crash when using gre
o kern/125258  net  [socket] socket's SO_REUSEADDR option does not work
f kern/125502  net  [ral] ifconfig ral0 scan produces no output unless in

57 problems total.
From gavin at FreeBSD.org Mon Jul 21 13:57:58 2008 From: gavin at FreeBSD.org (gavin@FreeBSD.org) Date: Mon Jul 21 13:58:05 2008 Subject: kern/125816: [carp] [bridge] carp stuck in init when using bridge interface Message-ID: <200807211357.m6LDvwPB049993@freefall.freebsd.org> Old Synopsis: carp stuck in init when using bridge interface New Synopsis: [carp] [bridge] carp stuck in init when using bridge interface Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: gavin Responsible-Changed-When: Mon Jul 21 13:53:12 UTC 2008 Responsible-Changed-Why: Over to maintainer(s). Hopefully somebody can establish whether this is an issue with carp or with bridge. http://www.freebsd.org/cgi/query-pr.cgi?pr=125816 From vanhu at FreeBSD.org Mon Jul 21 14:13:30 2008 From: vanhu at FreeBSD.org (VANHULLEBUS Yvan) Date: Mon Jul 21 14:13:38 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <20080721083110.GA21786@zen.inc> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> <20080721083110.GA21786@zen.inc> Message-ID: <20080721141327.GA24677@zen.inc> On Mon, Jul 21, 2008 at 10:31:10AM +0200, VANHULLEBUS Yvan wrote: > On Wed, Jul 16, 2008 at 09:10:18PM -0700, Sam Leffler wrote: > [...] > > Please test/review the following patch against HEAD: > > > > http://people.freebsd.org/~sam/nat_t-20080616.patch > > I have tested the RELENG7 version of the patch, and it works well. > > > But I noticed a misplaced #endif at the beginning of udp_ctloutput(), > which will generate problems if INET6 is not defined: [....] After some more testing, I found another issue: in udp4_espdecap(), when payload <= sizeof(uint64_t) + sizeof(struct esp), packet should not be discarded, but just returned for normal processing. And I also have doubts about a change in udp_ctloutput(), in the switch statement which processes optval and searches for a UDP_ENCAP_ESPINUDP* flag.
The way you changed it forces a flags cleanup anytime. I don't see why someone would set both UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE, but as I was tracking down a problem, I changed it again to be processed "the old way" to ensure it was not the source of the issue. Sam, did you have a good reason to change that part of the code, or was it mostly to have a more compliant coding style ? Updated patches are available for HEAD, RELENG7 and RELENG63 (yeah :-) here: http://people.freebsd.org/~vanhu/NAT-T/ Please all notice that there is still the word "test" in patches names..... Yvan. From vanhu at FreeBSD.org Mon Jul 21 14:27:00 2008 From: vanhu at FreeBSD.org (VANHULLEBUS Yvan) Date: Mon Jul 21 14:27:07 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <20080721085325.B57089@maildrop.int.zabbadoz.net> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> <20080721085325.B57089@maildrop.int.zabbadoz.net> Message-ID: <20080721142657.GB24677@zen.inc> [Larry, I kept you in an explicit CC, even if I guess you subscribed to the list] On Mon, Jul 21, 2008 at 09:26:15AM +0000, Bjoern A. Zeeb wrote: > On Wed, 16 Jul 2008, Sam Leffler wrote: > > Hi, Hi. [...] > My main concern at the moment is the API (pfkey stuff) to userland as > Yvan had stated in <20080626075307.GA1401@zen.inc>. It is also one of my main concerns actually. > I know that at the moment there seems to be one public (pseudo) reference > implementation this all works together but there might be/are other > people not using libipsec from ipsec-tools. Well, people who use another libipsec are expected to "just" not see NAT-T extensions. The only "real issue" is that, actually, NAT-T ports are sent through sockaddr structs, when RFC 2367 says that zeroing ports MUST be done (section 2.3.3).
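Yvan's remaining issue is that the NAT-T ports travel inside the sockaddrs of the PF_KEY address extensions, while RFC 2367 (section 2.3.3) requires those port fields to be zero. A minimal sketch of the compliant direction, assuming the port is carried separately (for example in a dedicated NAT-T port extension, whose layout is not shown); `pfkey_prepare_addr` is a hypothetical helper, not part of libipsec or the kernel.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build the sockaddr_in for a PF_KEY address
 * extension with the port (and sin_zero) zeroed, as RFC 2367
 * section 2.3.3 requires.  The original port is returned in host
 * byte order so the caller can carry it in a separate NAT-T field.
 */
static uint16_t
pfkey_prepare_addr(const struct sockaddr_in *src, struct sockaddr_in *dst)
{
	uint16_t port = ntohs(src->sin_port);

	memset(dst, 0, sizeof(*dst));   /* sin_port and sin_zero become 0 */
	dst->sin_family = AF_INET;
	dst->sin_addr = src->sin_addr;
#ifdef __FreeBSD__
	dst->sin_len = sizeof(*dst);    /* BSD sockaddrs carry a length byte */
#endif
	return port;
}
```

The caller would put `dst` into the address extension and the returned port into whatever separate field the final userland/kernel API defines; that API was still being settled in this thread.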
There is already an open ticket on the ipsec-tools side to clean up that part of the code on the userland side of the PFKey interface, and I hope it will be done for the 0.8.0 release (sorry, no release date for now). As soon as I have a working patch for userland, I'll do the work on FreeBSD's kernel side. I hope everything will be done within a few weeks, but I already know that we'll have backward compatibility issues with various kernels (ipsec-tools runs at least on FreeBSD, NetBSD, Linux and MacOSX). Yvan. From sam at freebsd.org Mon Jul 21 15:20:57 2008 From: sam at freebsd.org (Sam Leffler) Date: Mon Jul 21 15:21:04 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <20080721085325.B57089@maildrop.int.zabbadoz.net> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> <20080721085325.B57089@maildrop.int.zabbadoz.net> Message-ID: <4884A956.5060108@freebsd.org> Bjoern A. Zeeb wrote: > On Wed, 16 Jul 2008, Sam Leffler wrote: > > Hi, > >> Please test/review the following patch against HEAD: >> >> http://people.freebsd.org/~sam/nat_t-20080616.patch >> >> This adds only the kernel portion of the NAT-T support; you must >> provide the user-level code from another place. >> >> The main difference from the patches floating around are in the >> ctloutput path (adding proper locking for HEAD) and decap of >> ESP-in-UDP frames. Assuming folks are ok w/ these changes I'll commit >> to HEAD. Once this stuff goes in we can look at getting the >> user-mode mods into the tree. > > I have skipped through the patch. > > My main concern at the moment is the API (pfkey stuff) to userland as > Yvan had stated in <20080626075307.GA1401@zen.inc>. > > I know that at the moment there seems to be one public (pseudo) reference > implementation this all works together but there might be/are other > people not using libipsec from ipsec-tools.
> > The point is changing the API once this hits the tree will be hard to > detect at a later point if at all (unless with a __FreeBSD_version or > (another) library version bump/sym versioning). > > > We are still missing other things I think not mentioned elsewhere like > partial checksum recalculation. Please send me your specific issues; I haven't seen any comments about "partial checksum recalculations". > I still wonder if we'd have all the information (at the right place) in > the kernel so we could easily add support for that at a later time > w/o having to change APIs again. Considering that it seems no one using > this patch in products has implemented this .. I dunno. > It's something that is already mentioned in the introduction of RFC 3947 > and in 3.1.2. of 3948 and thus should be very obvious to anyone who has ever > seriously thought of finishing a proper more than "it works for me" > version of the patch. I don't see any of the above blocking this work going in. Forcing people to maintain out-of-tree patches for years because of vague concerns is unproductive. This code is used by at least 2 vendors shipping products. > > > Some minor things I had seen not reported so far: > > I have seen two printfs that should be changed to proper logging, ... > /NAT-T OA present > > s,bave,have, in "...in the SPD: This means we bave a non-generated" > but maybe change the entire comment. "non-generated SPD" is kind of > wrong wording. > > > I'd happily go through another patch once the missing/to be corrected > things were addressed. > Please apply your changes to the p4 branch or fix 'em when the code hits CVS. I've seen no concrete rationale for holding this work out.
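The "partial checksum recalculation" Bjoern mentions presumably refers to the transport-mode fix-up in RFC 3948 section 3.1.2: after NAT, the TCP/UDP checksum inside the decrypted packet still covers a pseudo-header built from the original addresses (learned via NAT-OA), so it has to be adjusted. One standard way is the incremental update of RFC 1624, equation 3, HC' = ~(~HC + ~m + m'). The sketch below shows only that arithmetic; it is not code from the patch, and all helper names are hypothetical. Words and addresses are handled in host order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement addition of two 16-bit words (end-around carry). */
static uint16_t
oc_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') for one changed word. */
static uint16_t
cksum_update16(uint16_t hc, uint16_t m, uint16_t m_new)
{
	return (uint16_t)~oc_add(oc_add((uint16_t)~hc, (uint16_t)~m), m_new);
}

/* Adjust a checksum whose pseudo-header held old_addr instead of new_addr. */
static uint16_t
cksum_update_addr(uint16_t hc, uint32_t old_addr, uint32_t new_addr)
{
	hc = cksum_update16(hc, (uint16_t)(old_addr >> 16),
	    (uint16_t)(new_addr >> 16));
	return cksum_update16(hc, (uint16_t)(old_addr & 0xffff),
	    (uint16_t)(new_addr & 0xffff));
}

/* Full Internet checksum over n 16-bit words, used only for comparison. */
static uint16_t
cksum_full(const uint16_t *w, size_t n)
{
	uint16_t s = 0;

	for (size_t i = 0; i < n; i++)
		s = oc_add(s, w[i]);
	return (uint16_t)~s;
}
```

For a transport-mode TCP or UDP checksum, `old_addr` would be the original address reported in NAT-OA and `new_addr` the address actually seen in the received IP header. A UDP checksum of zero means "no checksum" in IPv4 and would have to be left alone.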
Sam From sam at freebsd.org Mon Jul 21 15:33:58 2008 From: sam at freebsd.org (Sam Leffler) Date: Mon Jul 21 15:34:04 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <20080721141327.GA24677@zen.inc> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> <20080721083110.GA21786@zen.inc> <20080721141327.GA24677@zen.inc> Message-ID: <4884AC65.7020605@freebsd.org> VANHULLEBUS Yvan wrote: > On Mon, Jul 21, 2008 at 10:31:10AM +0200, VANHULLEBUS Yvan wrote: > >> On Wed, Jul 16, 2008 at 09:10:18PM -0700, Sam Leffler wrote: >> [...] >> >>> Please test/review the following patch against HEAD: >>> >>> http://people.freebsd.org/~sam/nat_t-20080616.patch >>> >> I have tested the RELENG7 version of the patch, and it works well. >> >> >> But I noticed a misplaced #endif at the beginning of udp_ctloutput(), >> which will generate problems if INET6 is not defined: >> > [....] > > > After some more testing, I found another issue: in udp4_espdecap(), > when payload <= sizeof(uint64_t) + sizeof(struct esp), packet should > not be discarded, but just returned for normal processing. > Please edit the sam_nat_t branch in p4 or send a patch I can apply. > And I also have doubts about a change in udp_ctloutput(), in the > switch statement which processes optval and searches for a > UDP_ENCAP_ESPINUDP* flag. > > The way you changed it forces a flags cleanup anytime. > I don't see why someone would set both UDP_ENCAP_ESPINUDP and > UDP_ENCAP_ESPINUDP_NON_IKE, but as I was tracking down a problem, I > changed it again to be processed "the old way" to ensure it was not > the source of the issue. > Sorry, but I'm not clear on what you are saying. The code changed the behaviour of setting UDP encapsulation so that only one of UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE can be set at a time.
Your original code permitted both flags to be set, but the code that handled the encap/decap assumed only one was set. > Sam, did you have a good reason to change that part of the code, or > was it mostly to have a more compliant coding style ? > See above. > > Updated patches are available for HEAD, RELENG7 and RELENG63 (yeah :-) > here: > http://people.freebsd.org/~vanhu/NAT-T/ > > Please all notice that there is still the word "test" in patches > names..... > Sorry again, I don't understand what you wrote. Sam > > > Yvan. > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > > From sam at freebsd.org Mon Jul 21 15:42:34 2008 From: sam at freebsd.org (Sam Leffler) Date: Mon Jul 21 15:42:41 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <20080721142657.GB24677@zen.inc> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> <20080721085325.B57089@maildrop.int.zabbadoz.net> <20080721142657.GB24677@zen.inc> Message-ID: <4884AE67.4020204@freebsd.org> VANHULLEBUS Yvan wrote: > [Larry, I kept you in an explicit CC, even if I guess you subscribed to > the list] > > On Mon, Jul 21, 2008 at 09:26:15AM +0000, Bjoern A. Zeeb wrote: > >> On Wed, 16 Jul 2008, Sam Leffler wrote: >> >> Hi, >> > > Hi. > > > [...] > >> My main concern at the moment is the API (pfkey stuff) to userland as >> Yvan had stated in <20080626075307.GA1401@zen.inc>. >> > > It is also one of my main concerns actually. > > > >> I know that at the moment there seems to be one public (pseudo) reference >> implementation this all works together but there might be/are other >> people not using libipsec from ipsec-tools. >> > > Well, people who use another libipsec are expected to "just" not see > NAT-T extensions.
> > The only "real issue" is that, actually, NAT-T ports are sent through > sockaddr structs, when RFC 2367 says that zeroing ports MUST be done > (section 2.3.3). > > > There is already an open ticket on the ipsec-tools side to clean up that > part of the code on the userland side of the PFKey interface, and I hope > it will be done for the 0.8.0 release (sorry, no release date for now). > > As soon as I have a working patch for userland, I'll do the work on > FreeBSD's kernel side. I hope everything will be done within a few > weeks, but I already know that we'll have backward compatibility > issues with various kernels (ipsec-tools runs at least on FreeBSD, > NetBSD, Linux and MacOSX). > With regard to changing the kernel API: first, this is HEAD and APIs can change. I intentionally have said nothing about MFC and didn't touch user code. Getting the support into the kernel enables use and testing, which was the point of breaking the logjam so full NAT-T support can ship w/ 8.0. I committed to getting everything necessary into the tree in time for 8.0, but now that you have direct access to FreeBSD's repo I think that's less important. Sam From davidch at broadcom.com Mon Jul 21 17:36:13 2008 From: davidch at broadcom.com (David Christensen) Date: Mon Jul 21 17:36:20 2008 Subject: Status of Multi-Queue (RSS) Support in -CURRENT Message-ID: <5D267A3F22FD854F8F48B3D2B52381932678025873@IRVEXCHCCR01.corp.ad.broadcom.com> I'm working on implementing multi-queue support for a 10Gb device on FreeBSD and I wanted to find out the current state of the OS with regards to supporting this. It seems that support for multiple receive queues can be done today since most of the routing is done in hardware but the transmit side is a different story. I've seen some things in the cxgb driver that suggest changes to the OS (such as a m_pkthdr.rss_hash field) but I don't see any OS code to back that usage model up. What's the state of the art in multi-queue support for FreeBSD?
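On the transmit side, the usual idea behind a per-packet field like `m_pkthdr.rss_hash` is to pick a transmit ring from a per-flow hash, so every packet of one flow lands on the same queue and stays in order. A minimal sketch, assuming a hypothetical `struct pkt_meta` in place of an mbuf packet header and a simple software mixing hash (not Toeplitz) when no hardware hash is available; none of these names are FreeBSD interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet metadata standing in for an mbuf packet header. */
struct pkt_meta {
	bool     has_flowid;    /* is a hardware (RSS) hash available? */
	uint32_t flowid;        /* e.g. the hash reported by the receive path */
	uint32_t saddr, daddr;  /* IPv4 addresses */
	uint16_t sport, dport;  /* TCP/UDP ports */
};

/* Software fallback: mix the 4-tuple into 32 bits (not Toeplitz). */
static uint32_t
flow_hash(const struct pkt_meta *m)
{
	uint32_t h = m->saddr ^ m->daddr ^
	    (((uint32_t)m->sport << 16) | m->dport);

	h ^= h >> 16;           /* fold high bits into low bits */
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h;
}

/* Map a packet to one of nqueues transmit rings: one flow, one ring. */
static unsigned
select_txq(const struct pkt_meta *m, unsigned nqueues)
{
	uint32_t h = m->has_flowid ? m->flowid : flow_hash(m);

	return h % nqueues;
}
```

A real driver would also need per-ring locking (or a lock-free ring) and a policy for full queues; how this should interact with QoS is the open question Kip raises in his reply.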
Dave From mgrooms at shrew.net Mon Jul 21 18:27:04 2008 From: mgrooms at shrew.net (Matthew Grooms) Date: Mon Jul 21 18:27:10 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <4884C28F.4020402@shrew.net> References: <4884C28F.4020402@shrew.net> Message-ID: <4884D4F9.4050707@shrew.net> On Mon, Jul 21, 2008 at 10:31:10AM +0200, VANHULLEBUS Yvan wrote: > > After some more testing, I found another issue: in udp4_espdecap(), > when payload <= sizeof(uint64_t) + sizeof(struct esp), packet should > not be discarded, but just returned for normal processing. > I noticed this too. But the only situation that I could think of where a valid ISAKMP packet will be smaller than this is a NAT-T keep-alive. These are handled previously in the code path so I don't think there is an issue from a functional standpoint. > And I also have doubts about a change in udp_ctloutput(), in the > switch statement which processes optval and searches for a > UDP_ENCAP_ESPINUDP* flag. > > The way you changed it forces a flags cleanup anytime. > I don't see why someone would set both UDP_ENCAP_ESPINUDP and > UDP_ENCAP_ESPINUDP_NON_IKE, but as I was tracking down a problem, I > changed it again to be processed "the old way" to ensure it was not > the source of the issue. > It should be disallowed as in Sam's patch. Allowing them to be mixed would cause problems using any of the patches I have seen. There is no way to distinguish between a Draft 00/01 ISAKMP packet and an RFC ESP packet without matching the port value to a SAD NAT-T mapping. And as you mentioned, I also don't see why anyone would try to set them both. There should never be a situation where you need to evaluate a NON-ESP NAT-T marker on an ISAKMP socket, only NON-ISAKMP markers. On a related note, I noticed the patch unconditionally uses a source port of 500 when processing outbound Draft 00/01 packets. Should this value be obtained from the SAD NAT-T mapping to support an IKE daemon bound to a non-standard port?
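The short-payload discussion can be made concrete with a small classifier for datagrams arriving on a NAT-T UDP socket. This is a sketch under stated assumptions, not the code in udp4_espdecap(): it assumes an 8-byte ESP header (SPI plus sequence number), the RFC 3948 conventions (a single 0xFF byte is a NAT-keepalive; a 4-byte zero non-ESP marker precedes IKE), and the draft 00/01 convention of an 8-byte zero non-IKE marker in front of ESP. All type, constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values do not come from the patch. */
enum encap_type { ENCAP_ESPINUDP, ENCAP_ESPINUDP_NON_IKE };
enum udp_verdict { UDP_DELIVER, UDP_DROP_KEEPALIVE, UDP_DECAP_ESP };

#define ESP_HDR_LEN        8  /* SPI + sequence number (assumed) */
#define NON_IKE_MARKER_LEN 8  /* draft 00/01: zeros in front of ESP */
#define NON_ESP_MARKER_LEN 4  /* RFC 3948: zeros in front of IKE */

static bool
all_zero(const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (p[i] != 0)
			return false;
	return true;
}

/*
 * Decide what to do with a UDP payload on a NAT-T socket.  On
 * UDP_DECAP_ESP, *skip is the number of marker bytes to strip first.
 */
static enum udp_verdict
classify_udp_payload(enum encap_type type, const uint8_t *p, size_t len,
    size_t *skip)
{
	*skip = 0;
	if (len == 1 && p[0] == 0xff)
		return UDP_DROP_KEEPALIVE;        /* RFC 3948 NAT-keepalive */
	/* Too short for marker + ESP header: deliver it, do not discard. */
	if (len <= NON_IKE_MARKER_LEN + ESP_HDR_LEN)
		return UDP_DELIVER;
	if (type == ENCAP_ESPINUDP_NON_IKE) {
		if (!all_zero(p, NON_IKE_MARKER_LEN))
			return UDP_DELIVER;       /* IKE carries no marker here */
		*skip = NON_IKE_MARKER_LEN;
		return UDP_DECAP_ESP;
	}
	if (all_zero(p, NON_ESP_MARKER_LEN))
		return UDP_DELIVER;               /* IKE behind non-ESP marker */
	return UDP_DECAP_ESP;                     /* ESP starts at offset 0 */
}
```

Following Yvan's report, datagrams too short to hold a marker plus an ESP header are handed back for normal UDP delivery instead of being dropped. Keep-alives, which Matthew notes are the only expected short packets, are recognised before the length check.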
Thanks, -Matthew From vanhu at FreeBSD.org Mon Jul 21 19:16:39 2008 From: vanhu at FreeBSD.org (VANHULLEBUS Yvan) Date: Mon Jul 21 19:16:47 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <4884D4F9.4050707@shrew.net> References: <4884C28F.4020402@shrew.net> <4884D4F9.4050707@shrew.net> Message-ID: <20080721191635.GA2846@zen.inc> On Mon, Jul 21, 2008 at 01:27:05PM -0500, Matthew Grooms wrote: > On Mon, Jul 21, 2008 at 10:31:10AM +0200, VANHULLEBUS Yvan wrote: >> >> After some more testing, I found another issue: in udp4_espdecap(), >> when payload <= sizeof(uint64_t) + sizeof(struct esp), packet should >> not be discarded, but just returned for normal processing. >> > > I noticed this too. But the only situation that I could think of where a > valid ISAKMP packet will be smaller than this is a NAT-T keep-alive. > These are handled previously in the code path so I don't think there is > an issue from a functional standpoint. That's what I also supposed when I noticed that, but I was tracking down a negotiation problem (as an initiator, the responder's first exchange in Main mode was seen in tcpdump but not in racoon's log), and it has been solved by fixing that part of the code.... [...] > It should be disallowed as in Sam's patch. Allowing them to be mixed > would cause problems using any of the patches I have seen. There is no > way to distinguish between a Draft 00/01 ISAKMP packet and an RFC ESP > packet without matching the port value to a SAD NAT-T mapping. And as > you mentioned, I also don't see why anyone would try to set them both. > There should never be a situation where you need to evaluate a NON-ESP > NAT-T marker on an ISAKMP socket, only NON-ISAKMP markers. Yes.
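The position Sam and Matthew agree on (at most one encapsulation mode per socket) can be expressed in the option handler by clearing both mode bits before setting the requested one. A minimal sketch, assuming hypothetical option values and flag bits that stand in for UDP_ENCAP_ESPINUDP / UDP_ENCAP_ESPINUDP_NON_IKE and the per-socket flags; it is not the udp_ctloutput() code from either patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for UDP_ENCAP_ESPINUDP_NON_IKE / UDP_ENCAP_ESPINUDP. */
#define OPT_ESPINUDP_NON_IKE 1
#define OPT_ESPINUDP         2

/* Hypothetical per-socket flag bits. */
#define PCB_ESPINUDP_NON_IKE 0x01u
#define PCB_ESPINUDP         0x02u
#define PCB_ESPINUDP_ALL     (PCB_ESPINUDP_NON_IKE | PCB_ESPINUDP)

/*
 * Apply an encapsulation option to a socket's flags.  Both mode bits
 * are cleared first, so at most one mode is ever set; unknown values
 * are rejected and leave the flags untouched.
 */
static int
set_udp_encap(unsigned *flags, int optval)
{
	unsigned bit;

	switch (optval) {
	case OPT_ESPINUDP_NON_IKE:
		bit = PCB_ESPINUDP_NON_IKE;
		break;
	case OPT_ESPINUDP:
		bit = PCB_ESPINUDP;
		break;
	default:
		return EINVAL;
	}
	*flags = (*flags & ~PCB_ESPINUDP_ALL) | bit;
	return 0;
}
```

Because the mode bits are cleared unconditionally, a second option call switches modes instead of accumulating both, which matches the assumption the encap/decap path makes.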
As I said, I was tracking down a problem on a gateway that had been running for a long time with previous patches, so every difference was suspect to me :-) > On a related note, I noticed the patch unconditionally uses a source > port of 500 when processing outbound Draft 00/01 packets. Should this > value be obtained from the SAD NAT-T mapping to support an IKE daemon > bound to a non-standard port? It should really, really not happen..... but yes, it would be cleaner to get it from the SAD than to always set 500. Yvan. From bzeeb-lists at lists.zabbadoz.net Mon Jul 21 19:30:08 2008 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Mon Jul 21 19:30:14 2008 Subject: FreeBSD NAT-T patch integration [CFR/CFT] In-Reply-To: <4884A956.5060108@freebsd.org> References: <20080630040103.94730.qmail@mailgate.gta.com> <486A45AB.2080609@freebsd.org> <487EC62A.3070301@freebsd.org> <20080721085325.B57089@maildrop.int.zabbadoz.net> <4884A956.5060108@freebsd.org> Message-ID: <20080721180626.V57089@maildrop.int.zabbadoz.net> On Mon, 21 Jul 2008, Sam Leffler wrote: Hi Sam, >> We are still missing other things I think not mentioned elsewhere like >> partial checksum recalculation. > > Please send me your specific issues; I haven't seen any comments about > "partial checksum recalculations". So what has kept you from reading the RFCs for the patch you were working on? "It works for me" does not mean "It's right and all done". /bz -- Bjoern A. Zeeb Stop bit received. Insert coin for new game. From rehsack at web.de Mon Jul 21 20:39:53 2008 From: rehsack at web.de (Jens Rehsack) Date: Mon Jul 21 20:40:01 2008 Subject: lo0 not in ioctl( SIOCGIFCONF ) Message-ID: <4884F401.4050103@web.de> Hi, maybe this question is better asked in this list ... I was searching why ports/net/p5-Net-Interface was not working as expected and found some reasons. Most of them I can answer by implementing some test code as attached, but now I'm wondering why em0 is shown twice and lo0 is not included.
The same situation on another machine .. --- BEGIN ifconfig -a (waldorf) em0: flags=8843 metric 0 mtu 1500 options=19b ether 00:15:17:10:84:6c inet 10.62.10.3 netmask 0xffffff00 broadcast 10.62.10.255 media: Ethernet autoselect (100baseTX ) status: active em1: flags=8802 metric 0 mtu 1500 options=19b ether 00:15:17:10:84:6d media: Ethernet autoselect status: no carrier lo0: flags=8049 metric 0 mtu 16384 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 inet6 ::1 prefixlen 128 inet 127.0.0.1 netmask 0xff000000 --- END ifconfig -a (waldorf) ./netif em0 em0 em1 --- BEGIN ifconfig -a (STINGRAY) ifconfig -a fxp0: flags=8843 metric 0 mtu 1500 options=8 ether 00:a0:c9:ce:c8:64 inet 10.62.10.12 netmask 0xffffff00 broadcast 10.62.10.255 media: Ethernet autoselect (100baseTX ) status: active fxp1: flags=8843 metric 0 mtu 1500 options=8 ether 00:a0:c9:ce:db:83 media: Ethernet autoselect (100baseTX ) status: active pflog0: flags=141 metric 0 mtu 33204 lo0: flags=8049 metric 0 mtu 16384 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 inet6 ::1 prefixlen 128 inet 127.0.0.1 netmask 0xff000000 vlan0: flags=8843 metric 0 mtu 1500 ether 00:a0:c9:ce:db:83 media: Ethernet autoselect (100baseTX ) status: active vlan: 7 parent interface: fxp1 tun0: flags=8051 metric 0 mtu 1492 inet 87.149.231.190 --> 217.0.119.167 netmask 0xffffffff Opened by PID 27503 --- END ifconfig -a (STINGRAY) ./netif32 fxp0 fxp0 fxp1 Why aren't lo0, vlan0 and tun0 included? What can I do to get these entries (portable way, please). Best regards, Jens From brooks at freebsd.org Mon Jul 21 20:47:42 2008 From: brooks at freebsd.org (Brooks Davis) Date: Mon Jul 21 20:47:49 2008 Subject: lo0 not in ioctl( SIOCGIFCONF ) In-Reply-To: <4884F401.4050103@web.de> References: <4884F401.4050103@web.de> Message-ID: <20080721204820.GE1699@lor.one-eyed-alien.net> > Hi, > > maybe this question is better asked in this list ... > > I was searching why ports/net/p5-Net-Interface was not working as > expected and found some reasons.
Most of them I can answer by implementing > some test code as attached, but now I'm wondering why em0 is shown twice > and lo0 is not included. > The same situation on another machine .. The attachment didn't make it through. -- Brooks On Mon, Jul 21, 2008 at 08:39:29PM +0000, Jens Rehsack wrote: > > --- BEGIN ifconfig -a (waldorf) > em0: flags=8843 metric 0 mtu 1500 > options=19b > ether 00:15:17:10:84:6c > inet 10.62.10.3 netmask 0xffffff00 broadcast 10.62.10.255 > media: Ethernet autoselect (100baseTX ) > status: active > em1: flags=8802 metric 0 mtu 1500 > options=19b > ether 00:15:17:10:84:6d > media: Ethernet autoselect > status: no carrier > lo0: flags=8049 metric 0 mtu 16384 > inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 > inet6 ::1 prefixlen 128 > inet 127.0.0.1 netmask 0xff000000 > --- END ifconfig -a (waldorf) > ./netif > em0 > em0 > em1 > > --- BEGIN ifconfig -a (STINGRAY) > ifconfig -a > fxp0: flags=8843 metric 0 mtu 1500 > options=8 > ether 00:a0:c9:ce:c8:64 > inet 10.62.10.12 netmask 0xffffff00 broadcast 10.62.10.255 > media: Ethernet autoselect (100baseTX ) > status: active > fxp1: flags=8843 metric 0 mtu 1500 > options=8 > ether 00:a0:c9:ce:db:83 > media: Ethernet autoselect (100baseTX ) > status: active > pflog0: flags=141 metric 0 mtu 33204 > lo0: flags=8049 metric 0 mtu 16384 > inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 > inet6 ::1 prefixlen 128 > inet 127.0.0.1 netmask 0xff000000 > vlan0: flags=8843 metric 0 mtu 1500 > ether 00:a0:c9:ce:db:83 > media: Ethernet autoselect (100baseTX ) > status: active > vlan: 7 parent interface: fxp1 > tun0: flags=8051 metric 0 mtu 1492 > inet 87.149.231.190 --> 217.0.119.167 netmask 0xffffffff > Opened by PID 27503 > --- END ifconfig -a (STINGRAY) > ./netif32 > fxp0 > fxp0 > fxp1 > > Why aren't lo0, vlan0 and tun0 included? What can I do to get these > entries (portable way, please).
> > Best regards, > Jens From kip.macy at gmail.com Mon Jul 21 21:27:15 2008 From: kip.macy at gmail.com (Kip Macy) Date: Mon Jul 21 21:27:21 2008 Subject: Status of Multi-Queue (RSS) Support in -CURRENT In-Reply-To: <5D267A3F22FD854F8F48B3D2B52381932678025873@IRVEXCHCCR01.corp.ad.broadcom.com> References: <5D267A3F22FD854F8F48B3D2B52381932678025873@IRVEXCHCCR01.corp.ad.broadcom.com> Message-ID: On Mon, Jul 21, 2008 at 10:36 AM, David Christensen wrote: > I'm working on implementing multi-queue support for a 10Gb > device on FreeBSD and I wanted to find out the current state > of the OS with regards to supporting this. It seems that > support for multiple receive queues can be done today since > most of the routing is done in hardware but the transmit > side is a different story. I've seen some things in the > cxgb driver that suggest changes to the OS (such as a > m_pkthdr.rss_hash field) but I don't see any OS code to > back that usage model up. What's the state of the art > in multi-queue support for FreeBSD? Unfortunately nothing has gone in yet. Robert has a prototype interface and I *think* that he may have come around to accepting my approach. The right way to integrate QoS and multi-queue cleanly isn't 100% obvious. I think it isn't unreasonable to expect that the new interfaces will go in in time for 7.2 but 7.1 is basically impossible at this point given that the freeze will be happening in the next month.
Thanks, Kip From rehsack at web.de Mon Jul 21 21:31:03 2008 From: rehsack at web.de (Jens Rehsack) Date: Mon Jul 21 21:31:09 2008 Subject: lo0 not in ioctl( SIOCGIFCONF ) In-Reply-To: <20080721204820.GE1699@lor.one-eyed-alien.net> References: <4884F401.4050103@web.de> <20080721204820.GE1699@lor.one-eyed-alien.net> Message-ID: <4884FFFF.9090908@web.de> Brooks Davis wrote: >> Hi, >> >> maybe this question is better asked in this list ... >> >> I was searching why ports/net/p5-Net-Interface was not working as >> expected and found some reasons. Most of them I can answer by implementing >> some test code as attached, but now I'm wondering why em0 is shown twice >> and lo0 is not included. >> The same situation on another machine .. > > The attachment didn't make it through. > > -- Brooks Copy&Paste starts here ... #include #include #include #include #include #include #include #include #ifndef _SIZEOF_ADDR_IFREQ #define _SIZEOF_ADDR_IFREQ(ifr) \ ((ifr).ifr_addr.sa_len > sizeof(struct sockaddr) ? 
\ (sizeof(struct ifreq) - sizeof(struct sockaddr) + \ (ifr).ifr_addr.sa_len) : sizeof(struct ifreq)) #endif int main() { struct ifconf ifc; struct ifreq *ifr, *lifr; int fd; unsigned int n; fd = socket( AF_INET, SOCK_STREAM, 0 ); bzero(&ifc, sizeof(ifc)); n = 3; ifr = calloc( ifc.ifc_len, sizeof(*ifr) ); do { n *= 2; ifr = realloc( ifr, sizeof(*ifr) * n ); bzero( ifr, sizeof(*ifr) * n ); ifc.ifc_req = ifr; ifc.ifc_len = n * sizeof(*ifr); } while( ( ioctl( fd, SIOCGIFCONF, &ifc ) == -1 ) || ( ifc.ifc_len == n * sizeof(*ifr)) ); lifr = (struct ifreq *)&ifc.ifc_buf[ifc.ifc_len]; while (ifr < lifr) { printf( "%s\n", ifr->ifr_name ); ifr = (struct ifreq *)(((char *)ifr) + _SIZEOF_ADDR_IFREQ(*ifr)); } return 0; } From brooks at freebsd.org Mon Jul 21 22:23:37 2008 From: brooks at freebsd.org (Brooks Davis) Date: Mon Jul 21 22:23:43 2008 Subject: lo0 not in ioctl( SIOCGIFCONF ) In-Reply-To: <4884FFFF.9090908@web.de> References: <4884F401.4050103@web.de> <20080721204820.GE1699@lor.one-eyed-alien.net> <4884FFFF.9090908@web.de> Message-ID: <20080721222416.GG1699@lor.one-eyed-alien.net> On Mon, Jul 21, 2008 at 09:30:39PM +0000, Jens Rehsack wrote: > Brooks Davis wrote: >>> Hi, >>> >>> maybe this question is better asked in this list ... >>> >>> I was searching why ports/net/p5-Net-Interface was not working as >>> expected and found some reasons. Most of them I can answer by implementing >>> some test code as attached, but now I'm wondering why em0 is shown twice >>> and lo0 is not included. >>> The same situation on another machine .. >> >> The attachment didn't make it through. >> >> -- Brooks > > Copy&Paste starts here ... > #include > #include > #include > #include > #include > #include > #include > #include > > #ifndef _SIZEOF_ADDR_IFREQ > #define _SIZEOF_ADDR_IFREQ(ifr) \ > ((ifr).ifr_addr.sa_len > sizeof(struct sockaddr) ? 
\ > (sizeof(struct ifreq) - sizeof(struct sockaddr) + \ > (ifr).ifr_addr.sa_len) : sizeof(struct ifreq)) > #endif > > int > main() > { > struct ifconf ifc; > struct ifreq *ifr, *lifr; > int fd; > unsigned int n; > > fd = socket( AF_INET, SOCK_STREAM, 0 ); > bzero(&ifc, sizeof(ifc)); > n = 3; > ifr = calloc( ifc.ifc_len, sizeof(*ifr) ); > do > { > n *= 2; > ifr = realloc( ifr, sizeof(*ifr) * n ); > bzero( ifr, sizeof(*ifr) * n ); > ifc.ifc_req = ifr; > ifc.ifc_len = n * sizeof(*ifr); > } while( ( ioctl( fd, SIOCGIFCONF, &ifc ) == -1 ) || ( ifc.ifc_len > == n * sizeof(*ifr)) ); There are several problems with this loop. First, ioctl won't return an error in the overflow case because that's not how SIOCGIFCONF works. SIOCGIFCONF is badly designed in a number of ways, but that's how it is. Second, checking that the array is completely full isn't at all reliable because what is returned is actually ifreq structures which might or might not vary in length as they contain addresses. Thus you need <=. Third, you should start by allocating a significant amount of space. Yes, your algorithm is O(log n), but allocating a larger value has effectively no cost so you might as well save some system calls on average. > lifr = (struct ifreq *)&ifc.ifc_buf[ifc.ifc_len]; > > while (ifr < lifr) > { > printf( "%s\n", ifr->ifr_name ); > ifr = (struct ifreq *)(((char *)ifr) + _SIZEOF_ADDR_IFREQ(*ifr)); > } This loop has two problems. First, the ifr's are variable length so you immediately go off into the weeds. Second, there is at least one entry per interface and one per address, so you need to keep track of the last interface name and not repeat it. -- Brooks > > return 0; > }
From rehsack at web.de Mon Jul 21 22:37:48 2008 From: rehsack at web.de (Jens Rehsack) Date: Mon Jul 21 22:37:54 2008 Subject: lo0 not in ioctl( SIOCGIFCONF ) In-Reply-To: <20080721222416.GG1699@lor.one-eyed-alien.net> References: <4884F401.4050103@web.de> <20080721204820.GE1699@lor.one-eyed-alien.net> <4884FFFF.9090908@web.de> <20080721222416.GG1699@lor.one-eyed-alien.net> Message-ID: <48850F72.90204@web.de> Brooks Davis wrote: > On Mon, Jul 21, 2008 at 09:30:39PM +0000, Jens Rehsack wrote: >> Brooks Davis wrote: >>>> Hi, >>>> >>>> maybe this question is better asked in this list ... >>>> >>>> I was searching why ports/net/p5-Net-Interface was not working as >>>> expected and found some reasons. Most of them I can answer by implementing >>>> some test code as attached, but now I'm wondering why em0 is shown twice >>>> and lo0 is not included. >>>> The same situation on another machine .. >>> The attachment didn't make it through. >>> >>> -- Brooks >> Copy&Paste starts here ... >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> >> #ifndef _SIZEOF_ADDR_IFREQ >> #define _SIZEOF_ADDR_IFREQ(ifr) \ >> ((ifr).ifr_addr.sa_len > sizeof(struct sockaddr) ?
\ >> (sizeof(struct ifreq) - sizeof(struct sockaddr) + \ >> (ifr).ifr_addr.sa_len) : sizeof(struct ifreq)) >> #endif >> >> int >> main() >> { >> struct ifconf ifc; >> struct ifreq *ifr, *lifr; >> int fd; >> unsigned int n; >> >> fd = socket( AF_INET, SOCK_STREAM, 0 ); >> bzero(&ifc, sizeof(ifc)); >> n = 3; >> ifr = calloc( ifc.ifc_len, sizeof(*ifr) ); >> do >> { >> n *= 2; >> ifr = realloc( ifr, sizeof(*ifr) * n ); >> bzero( ifr, sizeof(*ifr) * n ); >> ifc.ifc_req = ifr; >> ifc.ifc_len = n * sizeof(*ifr); >> } while( ( ioctl( fd, SIOCGIFCONF, &ifc ) == -1 ) || ( ifc.ifc_len >> == n * sizeof(*ifr)) ); > > There are several problems with this loop. First, ioctl won't return > an error in the overflow case because that's not how SIOCGIFCONF works. > SIOCGIFCONF is badly designed in a number of ways, but that's how it > is. Second, checking that the array is completely full isn't at all > reliable because what is returned is actually ifreq structures which > might or might not vary in length as they contain addresses. Thus you > need <=. Third, you should start by allocating a significant amount of > space. Yes, your algorithm is O(log n), but allocating a larger > value has effectively no cost so you might as well save some system calls > on average. Thanks - that was the information I was missing. I'll try tomorrow (it's slightly late here) and send back the result. Using <= should produce an endless loop, but maybe checking if ifc.ifc_len <= (n/2) * sizeof(*ifr) could give the wanted results ... >> lifr =