FreeBSD boxes as a 'router'...

Barney Cordoba barney_cordoba at yahoo.com
Thu Nov 22 14:14:14 UTC 2012



--- On Wed, 11/21/12, Adrian Chadd <adrian at freebsd.org> wrote:

> From: Adrian Chadd <adrian at freebsd.org>
> Subject: Re: FreeBSD boxes as a 'router'...
> To: "Andre Oppermann" <andre at freebsd.org>
> Cc: "Barney Cordoba" <barney_cordoba at yahoo.com>, "Jim Thompson" <jim at netgate.com>, "Alfred Perlstein" <bright at mu.org>, khatfield at socllc.net, "freebsd-net at freebsd.org" <freebsd-net at freebsd.org>
> Date: Wednesday, November 21, 2012, 1:26 PM
> On 21 November 2012 00:30, Andre Oppermann <andre at freebsd.org> wrote:
> > On 21.11.2012 08:55, Adrian Chadd wrote:
> >>
> >> Something that has popped up a few times, even recently, is breaking
> >> out of an RX loop after you service a number of frames.
> >
> > That is what I basically described.
> 
> Right, and this can be done right now without too much reworking,
> right? I mean, people could begin by doing a drive-by on drivers for
> this.
> The RX path for a driver shouldn't be too difficult to do; the TX path
> is the racy one.
> 
> >> During stupidly high levels of RX, you may find the NIC happily
> >> receiving frames faster than you can service the RX queue. If this
> >> occurs, you could end up just plain being stuck there.
> 
> > That's the live-lock.
> 
> And again you can solve this without having to devolve into polling.
> Again, polling to me feels like a bludgeon beating around a system
> that isn't really designed for the extreme cases it's facing.
> Maybe your work in the tcp_taskqueue branch addresses the larger scale
> issues here, but I've solved this relatively easily in the past.
> 
> >> So what I've done in the past is to loop over a certain number of
> >> frames, then schedule a taskqueue to service whatever's left over.
> 
> > Taskqueues shouldn't be used anymore.  We've got ithreads now.
> > Contrary to popular belief (and due to poor documentation) an
> > ithread does not run at interrupt level.  Only the fast interrupt
> > handler does that.  The ithread is a normal kernel thread tied to
> > a fast interrupt handler and trailing it whenever it said
> > INTR_SCHEDULE_ITHREAD.
> 
> Sure, but taskqueues are still useful if you want to serialise access
> without relying on mutexes wrapping large parts of the packet handling
> code to enforce said order.
> 
> Yes, normal ithreads don't run at interrupt level.
> 
> And we can change the priority of taskqueues in each driver, right?
> And/or we could change the behaviour of driver ithreads/taskqueues to
> be automatically reniced?

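For reference, the pattern being described above looks roughly like the
sketch below. It is only a sketch: foo_softc, foo_rxeof() (assumed to
return non-zero while descriptors remain) and foo_enable_intr() are
made-up names standing in for a driver's own routines, not any in-tree
driver's API.

    /* the usual driver includes, plus: */
    #include <sys/param.h>
    #include <sys/taskqueue.h>

    struct foo_softc {
        struct ifnet     *ifp;
        struct taskqueue *tq;       /* created in attach */
        struct task      rx_task;   /* TASK_INIT(&sc->rx_task, 0, foo_rx_task, sc) */
        int              rx_budget; /* e.g. 64 frames per pass */
    };

    /* hypothetical helpers, not a real driver's API */
    static int  foo_rxeof(struct foo_softc *, int);  /* non-zero if ring not empty */
    static void foo_enable_intr(struct foo_softc *);

    static void
    foo_intr(void *arg)
    {
        struct foo_softc *sc = arg;

        if (foo_rxeof(sc, sc->rx_budget) != 0) {
            /* Ring still has frames: defer them instead of spinning here. */
            taskqueue_enqueue(sc->tq, &sc->rx_task);
            return;
        }
        foo_enable_intr(sc);    /* re-arm the RX interrupt */
    }

    static void
    foo_rx_task(void *arg, int pending)
    {
        struct foo_softc *sc = arg;

        if (foo_rxeof(sc, sc->rx_budget) != 0) {
            taskqueue_enqueue(sc->tq, &sc->rx_task);    /* still busy */
            return;
        }
        foo_enable_intr(sc);
    }
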
Why schedule a taskqueue? You're just adding more work to a system
that's already overloaded. You'll get another interrupt soon enough,
and you can control the interrupt delay to simulate a "poll" without
adding yet another task to the system.

The idea that you're getting so many packets that the system can't
handle them, and that you therefore have to schedule a task because you
might not get another interrupt, is just bad thinking for anything
other than an end-user application, in which case this conversation
isn't relevant.
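
A minimal sketch of that alternative, reusing the hypothetical foo_*
names from the block above; foo_set_max_intr_rate() is likewise made
up, standing in for whatever the hardware provides (on Intel
e1000-class adapters, the ITR/RDTR-style moderation registers):

    static void
    foo_intr(void *arg)
    {
        struct foo_softc *sc = arg;

        /* Service a bounded batch and stop, even if the ring isn't empty. */
        foo_rxeof(sc, sc->rx_budget);

        /*
         * Re-arm the interrupt.  With the throttle programmed to, say,
         * 8000 interrupts/sec, the next one fires no sooner than ~125us
         * from now, which caps how often this handler runs regardless of
         * offered load: a hardware-paced "poll" with no extra task in
         * the scheduler.
         */
        foo_enable_intr(sc);
    }

    /* called once from attach/init */
    static void
    foo_init_moderation(struct foo_softc *sc)
    {
        foo_set_max_intr_rate(sc, 8000);    /* hypothetical helper */
    }
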
> 
> I'm not knocking your work here, I'm just trying to understand whether
> we can do this stuff as small individual pieces of work rather than
> one big subsystem overhaul.
> 
> And CoDel is interesting as a concept, but it's certainly not new. But
> again, if you don't drop the frames during the driver receive path
> (and try to do it higher up in the stack, eg as part of some firewall
> rule) you still risk reaching a stable state where the CPU is 100%
> pinned because you've wasted cycles pushing those frames into the
> queue only to be dropped.


Queue algorithms that assume all network applications are the same
belong on the scrap heap with ISDN, ATM and the other stupid ideas
designed by IETF "thinkers".

The design goal should be to avoid queuing; drop events are usually not
part of a normal flow, and the notion that a clever algorithm can handle
them assumes you're trying to do too much with too slow a CPU. Crap
designed by Cisco exists only because their hardware never had enough
CPU to do the work that needed to be done.


> What _I_ had to do there was have a quick gate to look up if a frame
> was part of an active session in ipfw and if it was, let it be queued
> to the driver. I also had a second gate in the driver for new TCP
> connections, but that was a separate hack. Anything else was dropped.
> 
> In any case, what I'm trying to say is this - when I was last doing
> this kind of stuff, I didn't just subscribe to "polling will fix all."
> I spent a few months knee deep in the public intel e1000 documentation
> and tuning guide, the em driver and the queue/firewall code, in order
> to figure out how to attack this without using polling.
> 
> And yes, you've also just described NAPI. :-)
> 

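(For context, the gate described above amounts to something like the
following in the driver's RX path; foo_under_pressure() and
foo_session_lookup() are hypothetical stand-ins, not the actual ipfw
hook Adrian used.)

    static void
    foo_rx_input(struct foo_softc *sc, struct mbuf *m)
    {
        struct ifnet *ifp = sc->ifp;

        /*
         * Under load, only frames that belong to an already-established
         * session get handed up the stack; everything else is freed
         * right here, before any cycles are spent queueing it.
         */
        if (foo_under_pressure(sc) && !foo_session_lookup(m)) {
            sc->rx_gate_drops++;        /* hypothetical softc counter */
            m_freem(m);
            return;
        }
        (*ifp->if_input)(ifp, m);       /* normal path into the stack */
    }
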
I'm not sure what you're doing that would cause packets to come in faster
than you can service them, unless you're running on an old XT or something.
A modern $300 CPU can manage an awful lot of packets, depending on your
application.

Packets are like customers: sometimes you have to let them go. It's fairly
easy to determine what a given system running a given application can
handle; if you get more than that, you have little chance of devising a
scheme to manage it.

If you're running an embedded app and you don't have the option of simply
getting a faster machine, then you just have to set a threshold and deal
with it. You can try to be "smart" and peek at packets to drop the "less
important" ones, but in my experience the smarter you try to be, the
dumber you turn out to be.
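
A minimal sketch of that kind of threshold, assuming a few extra
hypothetical softc fields (rx_pps_limit, the measured capacity in
packets per second; rx_this_tick; rx_last_tick) on top of the foo_*
names used earlier:

    static int
    foo_rx_admit(struct foo_softc *sc)
    {
        /* ticks and hz are the kernel's clock tick counter and tick rate. */
        if (sc->rx_last_tick != ticks) {
            sc->rx_last_tick = ticks;   /* new tick: reset the budget */
            sc->rx_this_tick = 0;
        }
        if (sc->rx_this_tick >= sc->rx_pps_limit / hz)
            return (0);                 /* over what the box can handle */
        sc->rx_this_tick++;
        return (1);
    }

The RX loop just calls it per frame and does an m_freem() on the spot
when it returns 0. No peeking, no cleverness.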

With modern CPUs and their big caches, the bottlenecks are almost always
locking, not queuing or memory shuffling, assuming you're not running on
a single-core system. So design accordingly.
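
Concretely, and again with made-up names, "design accordingly" mostly
means keeping hot-path state per ring (or per core) so the fast path
never touches a driver-wide lock:

    struct foo_rx_ring {
        struct mtx  lock;       /* protects this ring only */
        uint64_t    packets;    /* per-ring stat: no shared cache line to bounce */
        /* descriptor ring, mbuf maps, ... */
    };

    static void
    foo_ring_rxeof(struct foo_rx_ring *rxr, int budget)
    {
        mtx_lock(&rxr->lock);
        /*
         * Drain up to 'budget' descriptors from this ring.  No global
         * driver lock and no shared counters are touched, so rings
         * serviced on different cores never serialize against each other.
         */
        rxr->packets += foo_drain_ring(rxr, budget);    /* hypothetical */
        mtx_unlock(&rxr->lock);
    }
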

Unfortunately the multiqueue drivers in FreeBSD aren't usable, so until
someone figures out a proper design that doesn't just suck up more cores
for marginal (if any) gains in capacity, you're stuck.

BC



