Proposed 6.2 em RELEASE patch

Clayton Milos clay at milos.co.za
Sat Nov 11 07:08:43 UTC 2006


----- Original Message ----- 
From: "Scott Long" <scottl at samsco.org>
To: "Mike Tancsa" <mike at sentex.net>
Cc: "freebsd-net" <freebsd-net at freebsd.org>; <glebius at freebsd.org>; 
<freebsd-stable at freebsd.org>; "Jack Vogel" <jfvogel at gmail.com>
Sent: Saturday, November 11, 2006 8:42 AM
Subject: Re: Proposed 6.2 em RELEASE patch


> Mike Tancsa wrote:
>> At 05:00 PM 11/10/2006, Jack Vogel wrote:
>>> On 11/10/06, Mike Tancsa <mike at sentex.net> wrote:
>>>>
>>>> Some more tests. I tried again with what was committed to today's
>>>> RELENG_6. I am guessing it's pretty much the same patch.  Polling is
>>>> the only way to avoid livelock at a high pps rate.  Does anyone know
>>>> of any simple tools to measure end-to-end packet loss? Polling will
>>>> end up dropping some packets and I want to be able to compare.  Same
>>>> hardware as in the previous post.
>>>
>>> The commit WAS the last patch I posted. So, making sure I understood you,
>>> you are saying that POLLING is doing better than FAST_INTR, or only
>>> better than the legacy code that went in with my merge?
>>
>> Hi,
>> The last set of tests I posted was run ONLY with what is in today's
>> RELENG_6, i.e. the latest commit. I tried a few variations on the driver,
>> first with
>> #define EM_FAST_INTR 1
>> in if_em.c
>>
>> one without
>>
>> and one with polling in the kernel.
>>
>> With a decent packet rate passing through, the box will lock up.  I'm not
>> sure if I am just hitting the limits of the PCIe bus, or interrupt
>> moderation is not kicking in, or this is a case of "Doctor, it hurts when
>> I send a lot of packets through"... "Well, don't do that"
>>
>> Using polling prevents the lockup, but it will of course drop packets.
>> This is for firewalls with fairly high bandwidth, and I also need them
>> to survive a decent DDoS attack.  I am not looking for 1Mpps, but
>> something more than 100Kpps.
>>
>>         ---Mike
>
> Hi,
>
> Thanks for all of the data.  I know that a good amount of testing was
> done with single-stream stress tests, but it's not clear how much was
> done with multiple streams prior to your efforts.  So, I'm not terribly
> surprised by your results.  I'm still a bit unclear on the exact
> topology of your setup, so if you could explain it some more in private
> email, I'd appreciate it.
>
> For the short term, I don't think that there is anything that can be
> magically tweaked that will safely give better results.  I know that
> Gleb has some ideas on a fairly simple change for the non-INTR_FAST,
> non-POLLING case, but I and several others worry that it's not robust
> in the face of real-world network problems.
>
> For the long term, I have a number of ideas for improving both the RX
> and TX paths in the driver.  Some of it is specific to the if_em driver,
> some involve improvements in the FFWD and PFIL_HOOKS code as well as the
> driver.  What will help me is if you can hook up a serial console to
> your machine and see if it can be made to drop to the debugger while it
> is under load and otherwise unresponsive.  If you can, getting a process
> dump might help confirm where each CPU is spending its time.
>
> Scott


I applied Jack's patch to the em driver and all seemed well until the xl
interfaces started giving me the same issues.

Thanks, Jack. On my machine your first patch looks 100% good.

Since my box does not take much load, and to me a slightly more loaded
machine is better than an unstable one, I recompiled the kernel without SMP,
so I now have a dual-CPU system with only one of the CPUs working.

I've smacked it with about 50G of data using Samba and FTP and it didn't
blink. I am, however, using an fxp card for the live IP side, but the xl
cards are still in the kernel and getting picked up; I just haven't
configured them with IP addresses for traffic. I don't think this is the
issue, though. I'd say something in the SMP code is causing these issues.

I have another box with SMP on it: the same kind of setup, but with a Tyan
Tiger motherboard instead of a Thunder, and two fxp NICs. Most of the time
it's stable, but if I throw a lot of traffic at it, it locks up too. Next
time it does I will post the console message, but as far as I can remember
there were no warnings about watchdog timeouts. It's running 5.5-RELEASE-p8
with SMP enabled.

-Clay


