IF_HANDOFF vs. IFQ_HANDOFF

Thu Jun 15 16:18:09 UTC 2006

On 15-Jun-2006 Robert Watson wrote:
> On Wed, 14 Jun 2006, John Polstra wrote:
> 
>> Can somebody explain why there is both an IF_HANDOFF macro and an 
>> IFQ_HANDOFF macro?  Except for a slight difference in parameters, they both 
>> seem to do roughly the same thing using completely distinct blocks of code. 
>> Is IF_HANDOFF supposed to be used only when the target queue is not the 
>> interface's if_snd queue?  That seems likely, but a few pieces of code 
>> (e.g., ng_fec.c) use IF_HANDOFF to put a packet into the if_snd queue.
> 
> In essence, yes.  One is used when the default interface queue is used, and 
> the other is used when a specific queue is desired.  However, the names are, 
> in fact, backwards from what you expect, for historical reasons.  You'll 
> notice, also, that the behavior of an interface queue in the presence of altq 
> is quite different than the non-altq case, also for historical reasons, and 
> that the handoff routines are sometimes used without any ifnet at all (e.g., 
> the netisr handoff).  I've looked at cleaning this up a few times, as well as 
> a potential race in the use of the IFF_OACTIVE flag, but have gotten 
> side-tracked in the socket and protocol code for the last 4-6 months.

Yep, the IFF_OACTIVE race was one of the things that made it catch my
eye in the first place.  Luckily (?), most of the FreeBSD drivers use
IFF_OACTIVE completely incorrectly.  (The em and bge drivers certainly
do.)  If they used it properly, the race would matter a lot more.

> On a number of occasions, we've discussed moving towards an
> if_startmbuf() call, which pushes the selection of queueing logic
> into the network device driver, which is increasingly desirable when
> network interfaces support multiple queues, significant queues in
> hardware, and where we have pseudo-interfaces wrapped around real
> ones and want to avoid multiple enqueue operations (e.g., if_vlan).

Yes!  The if_snd queue is obsolete with modern hardware.  All it does
is slow things down.  The network adapter's transmit descriptor ring
is all the send queue we need.  We obviously can't get rid of the
if_snd queue entirely as long as low-end hardware still exists (i.e.,
until Hell freezes over), but it would be nice if it were possible to
bypass it.
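
A minimal userland sketch of what "bypassing" could look like, not
kernel code and not the prototype mentioned in the quoted text.  The
names drv_startmbuf, drv_txeof, struct softc, TX_RING_SIZE and SWQ_SIZE
are all hypothetical, and packets are modelled as plain counters.  The
idea shown: hand the packet straight to the descriptor ring when there
is room, and fall back to a software queue only when the ring is full.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4	/* hypothetical; real rings are much larger */
#define SWQ_SIZE     8	/* stand-in for the if_snd queue */

struct softc {
	int ring_used;	/* descriptors currently owned by the HW */
	int swq_len;	/* packets parked in the software queue */
	int direct;	/* packets that skipped the software queue */
	int queued;	/* packets that had to be queued */
};

/* Hypothetical per-packet entry point, in the spirit of if_startmbuf(). */
static bool
drv_startmbuf(struct softc *sc)
{
	/* Fast path: ring has room and nothing is waiting ahead of us. */
	if (sc->swq_len == 0 && sc->ring_used < TX_RING_SIZE) {
		sc->ring_used++;
		sc->direct++;
		return (true);
	}
	/* Slow path: queue behind earlier packets to preserve ordering. */
	if (sc->swq_len < SWQ_SIZE) {
		sc->swq_len++;
		sc->queued++;
		return (true);
	}
	return (false);		/* both full: the caller drops the packet */
}

/* Transmit-complete: reclaim n descriptors, then refill from the queue. */
static void
drv_txeof(struct softc *sc, int n)
{
	assert(n >= 0 && n <= sc->ring_used);
	sc->ring_used -= n;
	while (sc->swq_len > 0 && sc->ring_used < TX_RING_SIZE) {
		sc->swq_len--;
		sc->ring_used++;
	}
}
```

In this model the software queue is touched only when the ring
overflows, so on hardware with a deep ring most packets would take the
fast path.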

> A related question is whether or not the generice queueing
> primitives should provide implicit locking (as they do now), and
> whether coalescing that locking with existing driver locking makes
> sense.  For example, in the case where oactive isn't currently
> set, we actually perform six lock operations during mbuf handoff:
> lock/unlock the queue for enqueue, enter the device driver, lock
> driver mutex, lock/unlock the queue for dequeue, unlock the device
> driver lock.  In the case where oactive is set (i.e., there's queued
> output), which is frequently the case under load,

Good -- I'm glad somebody else understands how IFF_OACTIVE is supposed
to be used.  Too bad our most important drivers don't use it properly.
The whole point of the IFF_OACTIVE flag is to suppress the call to the
if_start function when it's not needed -- i.e., whenever we can rely
on the driver to come back later on its own initiative and process the
if_snd queue.  In practice, that means the flag should be set whenever
the driver has one or more transmit packets that have been initiated
in the HW but have not yet completed.  When the completion interrupt
comes in, the driver is supposed to check the if_snd queue for more
mbufs and process them.  Only when the transmit side of the HW goes
totally idle should IFF_OACTIVE be cleared again.  Most of our drivers
set the flag only when they run out of transmit descriptors (i.e.,
practically never), which is just plain wrong.
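
The protocol described above can be sketched as a small userland
model.  This is an illustration under stated assumptions, not code from
any FreeBSD driver: struct ifmodel, drv_fill, drv_start, stack_handoff
and drv_txeof are hypothetical names, the "oactive" field stands in for
IFF_OACTIVE, and packets are just counters.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4	/* hypothetical ring depth */

struct ifmodel {
	bool oactive;		/* models IFF_OACTIVE */
	int  snd_len;		/* packets waiting in the if_snd queue */
	int  inflight;		/* handed to the HW, not yet completed */
	int  start_calls;	/* how often the stack called if_start */
};

/* Move as many queued packets to the HW as the ring allows. */
static void
drv_fill(struct ifmodel *ifp)
{
	while (ifp->snd_len > 0 && ifp->inflight < TX_RING_SIZE) {
		ifp->snd_len--;
		ifp->inflight++;
	}
	/* Set while anything is in flight, not only when the ring is full. */
	ifp->oactive = (ifp->inflight > 0);
}

/* Models the driver's if_start routine. */
static void
drv_start(struct ifmodel *ifp)
{
	ifp->start_calls++;
	drv_fill(ifp);
}

/* Models the stack's handoff: enqueue, kick the driver only if idle. */
static void
stack_handoff(struct ifmodel *ifp)
{
	ifp->snd_len++;
	if (!ifp->oactive)
		drv_start(ifp);
}

/* Models the transmit-complete interrupt for n packets. */
static void
drv_txeof(struct ifmodel *ifp, int n)
{
	assert(n >= 0 && n <= ifp->inflight);
	ifp->inflight -= n;
	drv_fill(ifp);	/* the driver drains if_snd on its own initiative */
}
```

With the flag handled this way, a burst of handoffs produces a single
call to the start routine; the rest of the burst is picked up from the
completion path, and the flag clears only once nothing is in flight.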

> this becomes significantly more efficient, because having a separate
> queue mutex avoids contention on the driver mutex during send if
> the driver is busy, etc.  Evaluating the benefits of the additional
> granularity there is something that has never been done thoroughly,
> though.
>
> In short, it's a bit of a mess, but it's non-trivial to fix beyond
> some basics because it will require us to figure out where we want
> to go with interface queuing and handoff.  Moving to if_startmbuf
> is relatively easy (I've prototyped it at least once, and have a
> relatively recent version in p4 somewhere), as you can migrate
> drivers gradually.  On the other hand, without a clear message to
> driver writers about how we want them to do queueing in the general
> case, this also comes with risks.
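
The six lock operations counted earlier in the quoted text can be
tallied with a toy model.  This is only a sketch: struct mtx_model and
its helpers are hypothetical stand-ins for real mutexes, they merely
count calls, and the point at which the model sets "oactive" is an
assumption made to show the second case.

```c
#include <assert.h>
#include <stdbool.h>

struct mtx_model {
	bool held;
	int  ops;	/* number of lock + unlock calls */
};

static void
mtx_lock_m(struct mtx_model *m)
{
	assert(!m->held);
	m->held = true;
	m->ops++;
}

static void
mtx_unlock_m(struct mtx_model *m)
{
	assert(m->held);
	m->held = false;
	m->ops++;
}

struct ifmodel {
	struct mtx_model q_mtx;		/* queue mutex */
	struct mtx_model drv_mtx;	/* driver mutex */
	int  qlen;
	bool oactive;
};

/* Models the driver's start routine. */
static void
drv_start(struct ifmodel *ifp)
{
	mtx_lock_m(&ifp->drv_mtx);	/* 3: lock driver mutex */
	mtx_lock_m(&ifp->q_mtx);	/* 4: lock queue for dequeue */
	if (ifp->qlen > 0)
		ifp->qlen--;		/* packet goes to the HW */
	mtx_unlock_m(&ifp->q_mtx);	/* 5: unlock queue */
	ifp->oactive = true;		/* assumption: output now pending */
	mtx_unlock_m(&ifp->drv_mtx);	/* 6: unlock driver mutex */
}

/* Models the handoff path from the stack. */
static void
handoff(struct ifmodel *ifp)
{
	mtx_lock_m(&ifp->q_mtx);	/* 1: lock queue for enqueue */
	ifp->qlen++;
	mtx_unlock_m(&ifp->q_mtx);	/* 2: unlock queue */
	if (!ifp->oactive)
		drv_start(ifp);
}

static int
total_ops(const struct ifmodel *ifp)
{
	return (ifp->q_mtx.ops + ifp->drv_mtx.ops);
}
```

In this model the first handoff costs six operations, and a handoff
while oactive is set costs two, neither of them on the driver mutex.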

Well, I'm glad somebody has been thinking about it.

John