Intel 10Gb

Alexander Sack pisymbol at gmail.com
Sat May 15 21:49:34 UTC 2010


On Sat, May 15, 2010 at 9:23 AM, Barney Cordoba
<barney_cordoba at yahoo.com> wrote:
>
>
> --- On Fri, 5/14/10, Alexander Sack <pisymbol at gmail.com> wrote:
>
>> From: Alexander Sack <pisymbol at gmail.com>
>> Subject: Re: Intel 10Gb
>> To: "Jack Vogel" <jfvogel at gmail.com>
>> Cc: "Murat Balaban" <murat at enderunix.org>, freebsd-net at freebsd.org, freebsd-performance at freebsd.org, "Andrew Gallatin" <gallatin at cs.duke.edu>
>> Date: Friday, May 14, 2010, 1:20 PM
>> On Fri, May 14, 2010 at 1:01 PM, Jack Vogel <jfvogel at gmail.com> wrote:
>> >
>> >
>> > On Fri, May 14, 2010 at 8:18 AM, Alexander Sack <pisymbol at gmail.com> wrote:
>> >>
>> >> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin <gallatin at cs.duke.edu> wrote:
>> >> > Alexander Sack wrote:
>> >> > <...>
>> >> >>> Using this driver/firmware combo, we can receive minimal
>> >> >>> packets at line rate (14.8Mpps) to userspace.  You can even
>> >> >>> access this using a libpcap interface.  The trick is that the
>> >> >>> fast paths are OS-bypass, and don't suffer from OS overheads,
>> >> >>> like lock contention.  See
>> >> >>> http://www.myri.com/scs/SNF/doc/index.html for details.
>> >> >>
>> >> >> But your timestamps will be atrocious at 10G speeds.  Myricom
>> >> >> doesn't timestamp packets AFAIK.  If you want reliable
>> >> >> timestamps you need to look at companies like Endace, Napatech,
>> >> >> etc.
>> >> >
>> >> > I see your old help ticket in our system.  Yes, our timestamping
>> >> > is not as good as a dedicated capture card with a GPS reference,
>> >> > but it is good enough for most people.
>> >>
>> >> I was told btw that it doesn't timestamp at ALL.  I am assuming
>> >> NOW that is incorrect.
>> >>
>> >> Define *most* people.
>> >>
>> >> I am not knocking the Myricom card.  In fact, I so wish you guys
>> >> would just add the ability to latch to a 1PPS for timestamping;
>> >> then it would be perfect.
>> >>
>> >> We use, I think, an older version of the card internally for
>> >> replay.  It's a great multi-purpose card.
>> >>
>> >> However, with the inter-packet gap at 10G down in the nanoseconds,
>> >> anyone trying to do OWD or RTT measurements will find it difficult
>> >> compared to an Endace or Napatech card.
>> >>
>> >> Btw, I was referring to bpf(4) specifically, so please don't take
>> >> my comments as a knock against it.
>> >>
>> >> >> PS I am not sure, but Intel also supports writing packets
>> >> >> directly into cache (yet I thought the 82599 driver actually
>> >> >> does a prefetch anyway, which had me confused on why that
>> >> >> helps).
>> >> >
>> >> > You're talking about DCA.  We support DCA as well (and I suspect
>> >> > some other 10G NICs do too).  There are a few barriers to using
>> >> > DCA on FreeBSD, not least of which is that FreeBSD doesn't
>> >> > currently have the infrastructure to support it (no IOATDMA or
>> >> > DCA drivers).
>> >>
>> >> Right.
>> >>
>> >> > DCA is also problematic because support from system/motherboard
>> >> > vendors is very spotty.  The vendor must provide the correct tag
>> >> > table in BIOS such that the tags match the CPU/core numbering in
>> >> > the system.  Many motherboard vendors don't bother with this, and
>> >> > you cannot enable DCA on a lot of systems, even though the
>> >> > underlying chipset supports DCA.  I've done hacks to force-enable
>> >> > it in the past, with mixed results.  The problem is that DCA
>> >> > depends on having the correct tag table, so that packets can be
>> >> > prefetched into the correct CPU's cache.  If the tag table is
>> >> > incorrect, DCA is a big pessimization, because it blows the cache
>> >> > in other CPUs.
>> >>
>> >> Right.
>> >>
>> >> > That said, I would *love* it if FreeBSD grew ioatdma/dca
>> >> > support.  Jack, does Intel have any interest in porting DCA
>> >> > support to FreeBSD?
>> >>
>> >> Question for Jack or Drew: what DOES FreeBSD have to do to support
>> >> DCA?  I thought DCA was something you just enable on the NIC
>> >> chipset, and if the system is IOATDMA aware, it just works.  Is
>> >> that not right (assuming cache tags are correct and accessible)?
>> >> i.e. I thought this was more hardware black magic than anything
>> >> specific the OS has to do.
>> >>
>> >
>> > OK, let me see if I can clarify some of this.  First, there IS an
>> > I/OAT driver that I did for FreeBSD like 3 or 4 years ago, in the
>> > timeframe that we put the feature out.  However, at that time all
>> > it was good for was the DMA aspect of things, and Prafulla used it
>> > to accelerate the stack copies; interest did not seem that great,
>> > so I put the code aside.  It's now badly dated and needs to be
>> > brought up to date, as there are a few different versions of the
>> > hardware now.
>> >
>> > At one point maybe a year back I started to take the code apart,
>> > thinking I would JUST do DCA; that got back-burnered due to other
>> > higher-priority issues, but it's still an item in my queue.
>> >
>> > I also had a nibble of interest in using the DMA engine, so
>> > perhaps I should not go down the road of just doing the DCA
>> > support in the I/OAT part of the driver.  The question is how to
>> > make the infrastructure work.
>> >
>> > To answer Alexander's question, DCA support is NOT in the NIC,
>> > it's in the chipset; that's why the I/OAT driver was done as a
>> > separate driver, but the NIC was the user of the info.  It's been
>> > a while since I was into the code, but if memory serves the I/OAT
>> > driver just enables the support in the chipset, and then the NIC
>> > driver configures its engine to use it.
>>
>> Thank you very much Jack!  :)  It was not clear to me from the docs
>> what was where.  I just assumed this was "Intel NIC knows Intel
>> chipset" black magic!  LOL.
>>
>> > DCA and DMA were supported in Linux in the same driver, perhaps
>> > because the chipset features were easily handled together; I'm not
>> > sure :)
>>
>> Ok!  (it was my other reference)
>>
>> > Fabien's data earlier in this thread suggested that a strategically
>> > placed prefetch did you more good than DCA did, if I recall.  What
>> > do you all think of that?
>>
>> I thought there was a thread where prefetch didn't do much
>> for you....lol...
>>
>> If you just prefetch willy-nilly, don't you run the risk of packets
>> landing in the caches of cores other than the one the reading
>> application is running on, thereby defeating the whole purpose of
>> the prefetch?
>>
>> > As far as I'm concerned, right now I am willing to resurrect the
>> > driver, clean it up, and make the features available; we can see
>> > how valuable they are after that.  How does that sound??
>>
>> Sounds good to me.  I'd at least put it somewhere public for people
>> to look at.
>>
>> -aps
>
> Of course none of this has anything to do with the original subject.
> Processing a unidirectional stream is really no problem, nor does it
> require any sort of special design consideration.  All of this
> chatter about card features is largely minutiae.
>
> Modern processors are so fast that it's a waste of brain cells to
> spend time trying to squeeze nanoseconds from packet gathering.  You
> guys sound the same as when you were trying to do 10Mb/s ethernet
> with ISA bus NICs.

It depends on what you really mean and what lock contention you are
specifically talking about.

The NIC features as well as multi-queue bpf(4) are a way to distribute
the load across multiple cores, thereby lowering total CPU overhead
(that's always good) AS WELL AS providing the ability for libpcap
consumers to post-process captured packets while they are still in
cache.  Most third-party capture cards already do just this: they are
typically stream or feed based and allow for flow-based steering to
distribute the load across cores.
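
To make the application side of that concrete, here is a rough sketch
of one pinned capture thread per core.  The interface name ("ix0") and
core count are assumptions, and note that stock bpf(4) would hand every
descriptor a copy of all traffic; a multi-queue bpf is what would let
each pinned thread drain just its own NIC queue:

    /* One libpcap capture thread pinned to each core (FreeBSD). */
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <pcap.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NCORES 4                /* assumption: 4 capture cores */

    static void
    handler(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
    {
        (void)user; (void)h; (void)bytes;
        /* Post-process here: the packet is hot in this core's cache. */
    }

    static void *
    capture(void *arg)
    {
        int core = (int)(intptr_t)arg;
        char errbuf[PCAP_ERRBUF_SIZE];
        cpuset_t mask;

        CPU_ZERO(&mask);
        CPU_SET(core, &mask);
        /* Pin the current thread (id -1) to a single core. */
        if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
            sizeof(mask), &mask) != 0)
            perror("cpuset_setaffinity");

        pcap_t *p = pcap_open_live("ix0", 65535, 1, 100, errbuf);
        if (p == NULL) {
            fprintf(stderr, "pcap_open_live: %s\n", errbuf);
            return (NULL);
        }
        pcap_loop(p, -1, handler, NULL);    /* capture until killed */
        return (NULL);
    }

    int
    main(void)
    {
        pthread_t tid[NCORES];
        int i;

        for (i = 0; i < NCORES; i++)
            pthread_create(&tid[i], NULL, capture, (void *)(intptr_t)i);
        for (i = 0; i < NCORES; i++)
            pthread_join(tid[i], NULL);
        return (0);
    }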

Intel has only recently added this to their 10G chipsets (Jack can
correct me if I'm wrong).

All of these things help both in capture and post-processing.
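
On Jack's prefetch question: a strategically placed prefetch can beat
DCA precisely because it is issued on the core that is about to touch
the data, so there is no BIOS tag table to get wrong.  A minimal sketch
of what "strategically placed" means (the packet descriptor and all
names here are made up purely for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical packet descriptor, for illustration only. */
    struct pkt {
        const uint8_t *data;
        uint32_t       len;
    };

    /*
     * While packet i is being handled, prefetch the header of packet
     * i+1.  The prefetch executes on the core that will actually read
     * the data, so the cache line lands in the right cache.
     */
    static void
    process_burst(struct pkt *ring[], int n, void (*handle)(struct pkt *))
    {
        int i;

        for (i = 0; i < n; i++) {
            if (i + 1 < n)
                __builtin_prefetch(ring[i + 1]->data, 0, 3);
            handle(ring[i]);
        }
    }

    static void
    print_len(struct pkt *p)
    {
        printf("%u bytes\n", (unsigned)p->len);
    }

    int
    main(void)
    {
        uint8_t buf[2][64] = { { 0 } };
        struct pkt a = { buf[0], 64 }, b = { buf[1], 64 };
        struct pkt *ring[] = { &a, &b };

        process_burst(ring, 2, print_len);
        return (0);
    }

The caveat from my earlier question still stands, though: if the rx
path and the consuming application run on different cores, even a
careful prefetch warms the wrong cache.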

> It makes no sense to focus on optimizing tires for a car which can't
> break 80mph.  The entire problem is lock contention.  Until you have
> a driver that can scale to a point where 10Gb/s is workable without
> significant lock contention, you're just feeding a dead body.

Lock contention in bpf(4) or in the NIC driver or in both?  :)

> Unless of course your goal for 10Gb/s for FreeBSD is for it to be a
> really good network monitor.

That is exactly my goal: it would be great to see FreeBSD as a
fantastic general-purpose network monitor at 10Gb/s speeds.  There are
a couple of issues, one of which is timestamping.
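
To put numbers on the timestamping problem (throwaway C, using the
standard Ethernet framing overheads):

    #include <stdio.h>

    int
    main(void)
    {
        const double link_bps  = 10e9;              /* 10GbE line rate */
        /* minimal frame (64B) + preamble (8B) + inter-frame gap (12B) */
        const double wire_bits = (64 + 8 + 12) * 8;

        double pps = link_bps / wire_bits;          /* ~14.88 Mpps     */
        double ns  = 1e9 / pps;                     /* ~67.2 ns/packet */

        printf("%.2f Mpps, %.1f ns/packet\n", pps / 1e6, ns);
        return (0);
    }

That is the 14.8Mpps figure Drew quoted above: with only ~67ns between
minimal packets, a software timestamp taken after interrupt and buffer
handling can easily be off by many packet times, which is why the
1PPS-disciplined capture cards matter for OWD/RTT work.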

-aps

