Intel 10Gb

Barney Cordoba barney_cordoba at yahoo.com
Sat May 15 13:23:59 UTC 2010



--- On Fri, 5/14/10, Alexander Sack <pisymbol at gmail.com> wrote:

> From: Alexander Sack <pisymbol at gmail.com>
> Subject: Re: Intel 10Gb
> To: "Jack Vogel" <jfvogel at gmail.com>
> Cc: "Murat Balaban" <murat at enderunix.org>, freebsd-net at freebsd.org, freebsd-performance at freebsd.org, "Andrew Gallatin" <gallatin at cs.duke.edu>
> Date: Friday, May 14, 2010, 1:20 PM
> On Fri, May 14, 2010 at 1:01 PM, Jack Vogel <jfvogel at gmail.com> wrote:
> >
> >
> > On Fri, May 14, 2010 at 8:18 AM, Alexander Sack
> > <pisymbol at gmail.com> wrote:
> >>
> >> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin
> >> <gallatin at cs.duke.edu> wrote:
> >> > Alexander Sack wrote:
> >> > <...>
> >> >>> Using this driver/firmware combo, we can receive minimal
> >> >>> packets at line rate (14.8Mpps) to userspace.  You can even
> >> >>> access this using a libpcap interface.  The trick is that the
> >> >>> fast paths are OS-bypass, and don't suffer from OS overheads,
> >> >>> like lock contention.  See
> >> >>> http://www.myri.com/scs/SNF/doc/index.html for details.
> >> >>
> >> >> But your timestamps will be atrocious at 10G speeds.  Myricom
> >> >> doesn't timestamp packets AFAIK.  If you want reliable
> >> >> timestamps you need to look at companies like Endace, Napatech,
> >> >> etc.
> >> >
> >> > I see your old help ticket in our system.  Yes, our timestamping
> >> > is not as good as a dedicated capture card with a GPS reference,
> >> > but it is good enough for most people.
> >>
> >> I was told btw that it doesn't timestamp at ALL.  I am assuming
> >> NOW that is incorrect.
> >>
> >> Define *most* people.
> >>
> >> I am not knocking the Myricom card.  In fact I so wish you guys
> >> would just add the ability to latch to a 1PPS for timestamping,
> >> and it would be perfect.
> >>
> >> We use, I think, an older version of the card internally for
> >> replay.  It's a great multi-purpose card.
> >>
> >> However, with the IPG at 10G in the nanoseconds, anyone trying to
> >> do OWDs or RTTs will find it difficult compared to an Endace or
> >> Napatech card.
> >>
> >> Btw, I was referring to bpf(4) specifically, so please don't take
> >> my comments as a knock against it.
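
As a sanity check on those numbers: the 14.8Mpps line-rate figure and
the nanosecond-scale inter-packet gap fall out of the same arithmetic.
A minimal 64-byte frame occupies 64 + 8 (preamble/SFD) + 12
(inter-frame gap) = 84 bytes on the wire, i.e. 672 bit times per
packet at 10Gb/s:

#include <stdio.h>

int main(void)
{
    const double link_bps = 10e9;           /* 10GbE line rate */
    const double wire_bytes = 64 + 8 + 12;  /* frame + preamble + IFG */
    double ns_per_pkt = wire_bytes * 8.0 / link_bps * 1e9;

    /* prints: 67.2 ns/packet, 14.88 Mpps */
    printf("%.1f ns/packet, %.2f Mpps\n",
        ns_per_pkt, link_bps / (wire_bytes * 8.0) / 1e6);
    return (0);
}

So minimal packets can arrive roughly every 67ns, which is why
timestamp resolution matters so much for OWD/RTT work at this speed.
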
> >> >> PS I am not sure but Intel also supports writing packets
> >> >> directly in cache (yet I thought the 82599 driver actually
> >> >> does a prefetch anyway, which had me confused on why that
> >> >> helps)
> >> >
> >> > You're talking about DCA.  We support DCA as well (and I suspect
> >> > some other 10G NICs do too).  There are a few barriers to using
> >> > DCA on FreeBSD, not least of which is that FreeBSD doesn't
> >> > currently have the infrastructure to support it (no IOATDMA or
> >> > DCA drivers).
> >>
> >> Right.
> >>
> >> > DCA is also problematic because support from system/motherboard
> >> > vendors is very spotty.  The vendor must provide the correct tag
> >> > table in BIOS such that the tags match the CPU/core numbering in
> >> > the system.  Many motherboard vendors don't bother with this,
> >> > and you cannot enable DCA on a lot of systems, even though the
> >> > underlying chipset supports DCA.  I've done hacks to
> >> > force-enable it in the past, with mixed results.  The problem is
> >> > that DCA depends on having the correct tag table, so that
> >> > packets can be prefetched into the correct CPU's cache.  If the
> >> > tag table is incorrect, DCA is a big pessimization, because it
> >> > blows the cache in other CPUs.
> >>
> >> Right.
> >>
> >> > That said, I would *love* it if FreeBSD grew ioatdma/dca
> >> > support.  Jack, does Intel have any interest in porting DCA
> >> > support to FreeBSD?
> >>
> >> Question for Jack or Drew: what DOES FreeBSD have to do to support
> >> DCA?  I thought DCA was something you just enable on the NIC
> >> chipset, and if the system is IOATDMA aware, it just works.  Is
> >> that not right (assuming cache tags are correct and accessible)?
> >> i.e. I thought this was hardware black magic more than anything
> >> specific the OS has to do.
> >>
> >
> > OK, let me see if I can clarify some of this.  First, there IS an
> > I/OAT driver that I did for FreeBSD like 3 or 4 years ago, in the
> > timeframe that we put the feature out.  However, at that time all
> > it was good for was the DMA aspect of things, and Prafulla used it
> > to accelerate the stack copies; interest did not seem that great,
> > so I put the code aside.  It's not badly dated, but it needs to be
> > brought up to date due to there being a few different versions of
> > the hardware now.
> >
> > At one point maybe a year back I started to take the code apart,
> > thinking I would JUST do DCA; that got back-burnered due to other
> > higher-priority issues, but it's still an item in my queue.
> >
> > I also had a nibble of interest in using the DMA engine, so perhaps
> > I should not go down the road of just doing the DCA support in the
> > I/OAT part of the driver.  The question is how to make the
> > infrastructure work.
> >
> > To answer Alexander's question, DCA support is NOT in the NIC, it's
> > in the chipset; that's why the I/OAT driver was done as a separate
> > driver, but the NIC was the user of the info.  It's been a while
> > since I was into the code, but if memory serves the I/OAT driver
> > just enables the support in the chipset, and then the NIC driver
> > configures its engine to use it.
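
The split Jack describes might look something like this (all names
here are hypothetical, just to illustrate the layering, not the real
ioatdma or ixgbe interfaces): a chipset-side driver owns DCA
enablement and the tag lookup, and NIC drivers only consume tags when
programming their queues:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface a chipset-side I/OAT/DCA driver would
 * register at attach time. */
struct dca_provider {
    int      (*enable)(void);      /* flip the chipset DCA enable */
    uint32_t (*get_tag)(int cpu);  /* BIOS tag-table lookup */
};

static struct dca_provider *dca_prov;  /* set by the chipset driver */

/* NIC-driver side: no chipset knowledge, it just asks the provider
 * for the tag to stamp on this queue's DMA writes. */
uint32_t
nic_dca_tag_for_cpu(int cpu)
{
    return (dca_prov != NULL ? dca_prov->get_tag(cpu) : 0);
}
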
> 
> Thank you very much Jack!  :)  It was not clear to me from the docs
> what was where.  I just assumed this was Intel-NIC-knows-Intel-chipset
> black magic!  LOL.
>
> > DCA and DMA were supported in Linux in the same driver because the
> > chipset features were easily handled together, perhaps; I'm not
> > sure :)
>
> Ok!  (it was my other reference)
>
> > Fabien's data earlier in this thread suggested that a strategically
> > placed prefetch did you more good than DCA did, if I recall; what
> > do you all think of that?
>
> I thought there was a thread where prefetch didn't do much for
> you....lol...
>
> If you just prefetch willy-nilly, then don't you run the risk of
> packets hitting caches on cores outside of the one the application
> reading them is on, thereby defeating the whole purpose of prefetch?
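
That risk is exactly why a software prefetch has to be issued from the
core that will consume the data.  A minimal sketch (illustrative, not
from any real driver): the prefetch below only warms the cache of
whatever core runs the RX loop, so it helps only if that is also where
the packet gets processed:

#include <stddef.h>

struct pkt {
    char data[2048];
};

void
rx_loop(struct pkt **ring, int n)
{
    for (int i = 0; i < n; i++) {
        /* Hint the next packet into *this* core's cache while we
         * parse the current one. */
        if (i + 1 < n)
            __builtin_prefetch(ring[i + 1]->data, 0, 1);
        /* ... parse headers of ring[i] here ... */
    }
}

Unlike DCA, which pushes data toward whichever core the tag table
selects, a prefetch pulls data into the issuing core, so placement in
the consuming path is what makes it "strategic".
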
> 
> > As far as I'm concerned, right now I am willing to resurrect the
> > driver, clean it up and make the features available; we can see
> > how valuable they are after that.  How does that sound??
>
> Sounds good to me.  I'd at least put it somewhere publicly for people
> to look at.
>
> -aps

Of course none of this has anything to do with the original subject.
Processing a unidirectional stream is really no problem, nor does it
require any sort of special design consideration.  All of this chatter
about card features is largely minutiae.

Modern processors are so fast that it's a waste of brain cells to
spend time trying to squeeze nanoseconds out of packet gathering.  You
guys sound the same as when you were trying to do 10Mb/s ethernet with
ISA-bus NICs.

It makes no sense to focus on optimizing tires for a car that can't
break 80 mph.  The entire problem is lock contention.  Until you have
a driver that can scale to a point where 10Gb/s is workable without
significant lock contention, you're just feeding a dead body.
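
To make the scaling point concrete, here is a minimal sketch of the
shape a low-contention receive path tends to take (illustrative
userland pthreads code, not any real driver): one lock per RX queue,
each queue serviced on its own core, rather than one driver-wide lock
that every core serializes on:

#include <pthread.h>

/* One lock per RX queue instead of a single driver-wide lock: each
 * core drains only its own queue, so queues never block each other
 * and adding cores adds capacity. */
struct rxq {
    pthread_mutex_t lock;  /* contended only within this queue */
    /* descriptor ring, stats, ... */
};

static void
rxq_init(struct rxq *q)
{
    pthread_mutex_init(&q->lock, NULL);
}

static void
rxq_service(struct rxq *q)
{
    pthread_mutex_lock(&q->lock);
    /* ... drain this queue's ring; no other queue is blocked ... */
    pthread_mutex_unlock(&q->lock);
}
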

Unless of course your goal for 10Gb/s on FreeBSD is for it to be a
really good network monitor.

BC


      

