MQ Patch.
Navdeep Parhar
np at FreeBSD.org
Tue Oct 29 22:03:14 UTC 2013
On 10/29/13 14:25, Andre Oppermann wrote:
> On 29.10.2013 22:03, Navdeep Parhar wrote:
>> On 10/29/13 13:41, Andre Oppermann wrote:
>>> Let me jump in here and explain roughly the ideas/path I'm exploring
>>> in creating and eventually implementing a big picture for drivers,
>>> queues, queue management, various QoS and so on:
>>>
>>> Situation: We're still mostly based on the old 4.4BSD IFQ model, with
>>> a couple of work-arounds (sndring, drbr), and the bit-rotten ALTQ we
>>> have in tree isn't helpful at all.
>>>
>>> Steps:
>>>
>>> 1. take the soft-queuing method out of the ifnet layer and make it
>>> a property of the driver, so that the upper stack (or actually the
>>> protocol L3/L2 mapping/encapsulation layer) calls (*if_transmit)
>>> without any queuing at that point. It is then up to the driver
>>> to decide how it multiplexes multi-core access to its queue(s)
>>> and how they are configured.
>>
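To make step 1 concrete: the entry point Andre is talking about already
exists as the (*if_transmit) member of struct ifnet, taking just the ifnet
and the mbuf. A bare-bones sketch of the handoff, with "l2_handoff" as a
made-up name for wherever this ends up living:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/*
 * The L3/L2 mapping layer hands the finished frame straight to the
 * driver; no IFQ_ENQUEUE or any other queuing above this call.  The
 * driver's (*if_transmit) owns all queuing decisions from here on.
 */
static int
l2_handoff(struct ifnet *ifp, struct mbuf *m)
{

	return ((ifp->if_transmit)(ifp, m));
}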
>> It would work out much better if the kernel was aware of the number of
>> tx queues of a multiq driver and explicitly selected one in if_transmit.
>> The driver has no information on the CPU affinity etc. of the
>> applications generating the traffic; the kernel does. In general, the
>> kernel has a much better "global view" of the system and some of the
>> stuff currently in the drivers really should move up into the stack.
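To illustrate what I meant: if the stack knew how many tx queues the
interface has, the queue selection could happen above the driver. This is
a hypothetical sketch only; neither the queue count nor the three-argument
transmit hook below exists in struct ifnet today.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/pcpu.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/*
 * Hypothetical glue, made up for this sketch: the tx queue count a
 * multiq driver would export, plus a transmit hook that takes an
 * explicit queue index chosen by the stack.
 */
struct if_txq_info {
	u_int	ntxq;
	int	(*transmit_q)(struct ifnet *, struct mbuf *, u_int);
};

/*
 * The stack has the global view (flowid, where the sender is running),
 * so it picks the queue instead of leaving that to the driver.
 */
static int
stack_select_and_transmit(struct ifnet *ifp, struct if_txq_info *txq,
    struct mbuf *m)
{
	u_int qidx;

	if (m->m_flags & M_FLOWID)
		qidx = m->m_pkthdr.flowid % txq->ntxq;
	else
		qidx = curcpu % txq->ntxq;

	return (txq->transmit_q(ifp, m, qidx));
}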
>
> I've been thinking a lot about this and have come to the preliminary
> conclusion that the upper stack should not tell the driver which queue
> to use. There are way too many possible approaches that, depending on
> the use case, perform better or worse. We also have a big problem with
> cores vs. queues mismatches either way (more cores than queues or more
> queues than cores, though the latter is much less of a problem).
>
> For now I see these primary multi-hardware-queue approaches to be
> implemented first:
>
> a) the driver's (*if_transmit) takes the flowid from the mbuf header and
> selects one of the N hardware DMA rings based on it [sketched below,
> after point c]. Each of the DMA rings is protected by a lock. Here the
> assumption is that, by having enough DMA rings, the contention on each
> of them will be relatively low, and ideally a flow and its ring sort of
> stick to a core that sends lots of packets into that flow. Of course it
> is a statistical certainty that some bouncing will be going on.
>
> b) the driver assigns the DMA rings to particular cores, which can then
> drive them lockless through a critnest++ [also sketched below]. The
> driver's (*if_transmit) will look up the core it got called on and push
> the traffic out on that DMA ring. The problem is the actual upper
> stack's affinity, which is not guaranteed. This has two consequences:
> there may be reordering of packets of the same flow because the
> protocol's send function happens to be called from a different core the
> second time. Or the driver's (*if_transmit) has to switch to the right
> core to complete the transmit for this flow if the upper stack
> migrated/bounced around. It is rather difficult to ensure full affinity
> from userspace down through the upper stack and then to the driver.
>
> c) non-multi-queue capable hardware uses a kernel-provided set of
> functions to manage the contention for the single resource of a DMA
> ring.
>
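For concreteness, (a) and (b) would look roughly like this inside a
driver's (*if_transmit). This is only a sketch: struct txring, the softc
layout, and ring_encap_and_doorbell() are made-up stand-ins, not any real
driver's code.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/mbuf.h>
#include <sys/pcpu.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

struct txring {
	struct mtx	 lock;		/* protects this ring in approach (a) */
	/* descriptor ring, doorbell register, stats, ... */
};

struct drv_softc {
	struct txring	*txr;		/* array of ntxr rings */
	u_int		 ntxr;
};

/* Stand-in for the real work: write tx descriptors, ring the doorbell. */
static int
ring_encap_and_doorbell(struct txring *txr, struct mbuf *m)
{

	(void)txr;
	m_freem(m);			/* placeholder; details omitted */
	return (0);
}

/*
 * Approach (a): hash the flowid onto one of N rings, one mutex per ring.
 * Contention stays low as long as the flows spread over enough rings.
 */
static int
drv_transmit_flowid(struct ifnet *ifp, struct mbuf *m)
{
	struct drv_softc *sc = ifp->if_softc;
	struct txring *txr;
	u_int idx;
	int error;

	if (m->m_flags & M_FLOWID)
		idx = m->m_pkthdr.flowid % sc->ntxr;
	else
		idx = curcpu % sc->ntxr;	/* no flowid; use the core */
	txr = &sc->txr[idx];

	mtx_lock(&txr->lock);
	error = ring_encap_and_doorbell(txr, m);
	mtx_unlock(&txr->lock);

	return (error);
}

And (b), with the same made-up structures: one ring per core, driven
without a mutex by entering a critical section (the critnest++ above),
which keeps the thread from being preempted off the core mid-transmit.
As you note, that does nothing about the upper stack having already
bounced between cores, which is where the reordering risk comes from.

/*
 * Approach (b): the ring belongs to the core we are running on, so no
 * lock is taken; critical_enter()/critical_exit() pin us to this core
 * for the duration of the transmit.
 */
static int
drv_transmit_percpu(struct ifnet *ifp, struct mbuf *m)
{
	struct drv_softc *sc = ifp->if_softc;
	struct txring *txr;
	int error;

	critical_enter();
	txr = &sc->txr[curcpu % sc->ntxr];
	error = ring_encap_and_doorbell(txr, m);
	critical_exit();

	return (error);
}

For (c), I'd guess the existing buf_ring/drbr_* helpers are the natural
starting point for the "kernel provided set of functions" that serialize
access to the single ring.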
> The point here is that the driver is the right place to make these
> decisions because the upper stack lacks (and shouldn't care about) the
> actual available hardware and its capabilities. All necessary
> information is available to the driver as well, through the appropriate
> mbuf header fields and the core it is called on.
>
I mildly disagree with most of this, specifically with the part that the
driver is the right place to make these decisions. But you did say this
was a "preliminary conclusion" so there's hope yet ;-)
Let's wait till you have an early implementation and we are all able to
experiment with it. To be continued...
Regards,
Navdeep